Researchers evaluated the performance of three large language models (LLMs) on a question bank designed for neurosurgery oral board examination preparation. GPT-4 achieved the highest score, 82.6%, outperforming both ChatGPT and Google's Bard.