GPT-4: The Multi-Modal AI Breakthrough

3 min read · Mar 15, 2023

On March 14, 2023, OpenAI announced GPT-4, its latest model, which it describes as a major breakthrough in deep learning. Unlike earlier GPT models, GPT-4 accepts images as well as text as input, making it a multi-modal model. According to OpenAI, it can generate natural language and code from prompts that combine text and images.

According to OpenAI, GPT-4 performs at a human level on a range of professional and academic benchmarks. For example, it scored in the top 10% of test takers on a simulated bar exam, whereas GPT-3.5 scored in the bottom 10%. OpenAI says it spent six months improving GPT-4's accuracy and stability, drawing on lessons from its adversarial testing program and from ChatGPT.

Several companies have already integrated GPT-4 into their products, including Duolingo, Stripe, and Khan Academy. For now, however, access is limited to ChatGPT Plus subscribers and to developers admitted through the API waitlist.

GPT-4 improves on previous models in several ways, but it still has limitations. OpenAI urges caution when relying on language model output: GPT-4 can make simple reasoning errors, can be too ready to accept false statements made by users, and can introduce security vulnerabilities into the code it generates. It also lacks knowledge of events that occurred after September 2021.

Even so, GPT-4 represents a significant advance in deep learning and has the potential to change how we interact with language and images.

OpenAI has also highlighted GPT-4's multilingual performance. On a translated version of the MMLU benchmark, GPT-4 reached an accuracy of 80.1% in Chinese, higher than the 70.1% that GPT-3.5 achieved in English. In 24 of the 26 languages tested, GPT-4 outperformed the English-language performance of GPT-3.5 and other large language models.
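To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above. The constant and function names are my own, chosen for illustration, and are not part of any OpenAI tool.

```python
# Figures quoted in the article (benchmark accuracy, in percent).
GPT4_CHINESE_ACCURACY = 80.1
GPT35_ENGLISH_ACCURACY = 70.1
LANGUAGES_TESTED = 26
LANGUAGES_WHERE_GPT4_LED = 24


def percentage_point_gap(a: float, b: float) -> float:
    """Return a - b in percentage points, rounded to one decimal place."""
    return round(a - b, 1)


def share_percent(part: int, whole: int) -> float:
    """Return part / whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


gap = percentage_point_gap(GPT4_CHINESE_ACCURACY, GPT35_ENGLISH_ACCURACY)
share = share_percent(LANGUAGES_WHERE_GPT4_LED, LANGUAGES_TESTED)

print(f"GPT-4 (Chinese) leads GPT-3.5 (English) by {gap} percentage points")
print(f"GPT-4 came out ahead in {share}% of the tested languages")
```

In other words, the gap is 10 percentage points, and GPT-4 came out ahead in roughly nine out of ten of the languages tested.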

One of the most significant breakthroughs of GPT-4 is its ability to process complex image information, such as tables, screenshots of exam questions, academic papers, and comics. OpenAI has showcased several examples on its official website, where GPT-4 has identified unusual features in images and generated natural language text describing them.

Moreover, GPT-4 can generate a working website in a matter of seconds, as OpenAI demonstrated in its official video. The model recognized a hand-drawn sketch of a website and generated the webpage code, producing a page that closely matched the sketch.

Despite its limitations, GPT-4 has a wide range of potential applications, including language learning tools, chatbots, customer service, and content creation. It is a powerful tool that will likely have a significant impact on several industries and on our daily lives.

Overall, the release of GPT-4 represents a significant milestone in the advancement of deep learning and artificial intelligence, and it will be exciting to see how this technology develops and evolves in the coming years.

Written by Alex Lew, CFA