The much-anticipated latest version of ChatGPT came online earlier this week, opening a window into the new capabilities of the artificial intelligence (AI)-based chatbot.
Developed by OpenAI, GPT-4 is a large language model (LLM) that significantly improves ChatGPT’s capabilities compared to GPT-3, introduced less than two months ago. GPT-4 features stronger safety and privacy guardrails, supports longer input and output text, and gives more accurate, detailed, and concise responses to nuanced questions. While GPT-4 output remains textual, a multimodal capability that has yet to be publicly released will accept both text and images as input.
The potential implications for insurers are profound and should only become more pronounced as the technology improves. OpenAI will continue to release future versions, enabling insurers to more easily implement and customize applications across the insurance value chain – from customer acquisition through claims processing.
Meaningful Output
The GPT-4 upgrade is currently available to ChatGPT Plus subscribers only. Compared to GPT-3, the new version is better at answering questions that depend on reasoning and creativity. According to OpenAI, GPT-4 achieves human-level scores on many standardized tests, such as simulated versions of the Law School Admission Test, Scholastic Aptitude Test, and Graduate Record Examination. On a simulated Uniform Bar Exam, GPT-4 scored in the 80th-90th percentile, whereas GPT-3 landed in the bottom 10%.
Last month, RGA posed three insurance questions to GPT-3, with mixed results. While GPT-3 provided good answers to questions about the long-term mortality effects of COVID-19 and the future of digital distribution, it stumbled on a more nuanced query: it incorrectly surmised that adoptive parents could pass on a genetic condition to their biologically unrelated children. GPT-4 answered all three questions correctly and gave more detail on the two questions GPT-3 had already answered well, without adding substantially to the response length.
On a set of 50 underwriting-related questions prepared by RGA, GPT-3 performed well on those that dealt strictly with anatomy, physiology, life insurance practices, or underwriting. However, it was often unable to answer cross-discipline questions correctly, and it had difficulty assessing the underwriting risks of certain avocations and comorbidities.