GPT-4 Upgrade Improves Results, Expands Application Potential

Jeff Heaton

In Brief

GPT-4 features stronger safety and privacy guardrails, longer input and output text, and more accurate, detailed, and concise responses for nuanced questions. Although it offers exciting opportunities for insurance and countless other industries, its potential provides reason for caution.

The much-anticipated latest version of ChatGPT came online earlier this week, opening a window into the new capabilities of the artificial intelligence (AI)-based chatbot.

Developed by OpenAI, GPT-4 is a large language model (LLM) that significantly improves ChatGPT's capabilities compared with the GPT-3-based version released only a few months earlier. GPT-4 features stronger safety and privacy guardrails, longer input and output text, and more accurate, detailed, and concise responses to nuanced questions. Although GPT-4's output remains text only, a multimodal capability that has not yet been publicly released will accept images as well as text as input.

The potential implications for insurers are profound and should only grow as the technology improves. As OpenAI continues to release new versions, insurers should find it easier to implement and customize applications across the insurance value chain, from customer acquisition through claims processing.

Meaningful Output

The GPT-4 upgrade is currently available only to ChatGPT Plus subscribers. Compared with GPT-3, the new version gives better answers to questions that depend on reasoning and creativity. According to OpenAI, GPT-4 achieves human-level scores on many standardized tests, such as simulated versions of the Law School Admission Test, Scholastic Aptitude Test, and Graduate Record Examination. On a simulated Uniform Bar Exam, GPT-4 scored between the 80th and 90th percentiles, while GPT-3 landed in the bottom 10%.

Last month, RGA posed three insurance questions to GPT-3 with mixed results. While GPT-3 provided good answers to questions about the long-term mortality effects of COVID-19 and the future of digital distribution, it stumbled on a more nuanced query: it incorrectly surmised that adoptive parents could pass on a genetic condition to their biologically unrelated children. GPT-4 answered all three questions correctly, and for the two questions GPT-3 had already answered correctly, it provided more detail without substantially lengthening its responses.

On a set of 50 underwriting-related questions prepared by RGA, GPT-3 performed well on those that dealt strictly with anatomy, physiology, life insurance practices, or underwriting. However, it often failed to answer cross-disciplinary questions correctly, and it also had difficulty assessing the underwriting risks of certain avocations and comorbidities.

GPT-4 proved more accurate overall than GPT-3. Whereas GPT-3 answered 38 of the 50 questions correctly (76%), GPT-4 answered 47 correctly (94%). The updated model delivered more accurate, detailed, and concise answers by tightening or even eliminating some of the preamble and redundancies GPT-3 generated. In general, the further the questions moved from mainstream knowledge toward insurance-specific knowledge, the more ChatGPT's answers degraded.

Of course, GPT-4 is not error free, and OpenAI is the first to concede that humans must check its work. For example, GPT-4 answered incorrectly when asked which parties must have an insurable interest in a policy and whether agents can conduct specific medical tests.

Caution Ahead

Although GPT-4 offers exciting opportunities for insurance and countless other industries, its potential provides reason for caution. Consider this: amid a race to incorporate LLMs such as GPT-4 into search engines, queries to Google, Bing, and others may no longer return a list of pages to read. Instead, the engines might present a single answer that synthesizes the source material. Such a presentation could keep users from reading multiple articles that cover a topic from various viewpoints and could drive a substantial shift in traffic away from the websites that provide the original material. As a result, search results could lose credibility, because a synthesized answer may be biased, misleading, or incorrect, with fewer sources readily at hand to check it against.

Opinions differ on what effect LLMs might have on the future of society. AI luminaries continue to debate whether LLMs can create, plan, or reason. Nearly all experts agree, however, that LLMs work from existing information and therefore cannot expand the frontiers of human understanding.

What is certain is that this technology will keep advancing and that insurers will explore and identify new use cases. OpenAI is already developing GPT-5, although it has not announced an official release date.

Meet the Authors & Experts

Author

Jeff Heaton

Vice President, Data Science, Data Strategy and Infrastructure
