GPT-3.5 vs. GPT-4: Understanding The Two ChatGPT Models

By Kamil AnwarUpdated on July 27, 2023

Kamil is a certified Systems Analyst

ChatGPT was built by OpenAI it as an open-source natural-language model aimed at improving our understanding of AI, and giving a for-the-people kind of alternative to Silicon Valley’s profit-first solutions being developed by the likes of Google and more.

Unfortunately, it has turned every bit as corporate as the former, after a $10 billion investment from Microsoft earlier this year. So much so that GPT-4, the latest release of ChatGPT, is actually hidden behind a paywall of $20 per month. But is it actually worth it to pay for that over the free GPT 3.5?

GPT-3.5	GPT-4
👨‍👧‍👦 Free for everyone	💸 $20 per month paywall
💨 Faster responses	🧠 More nuanced responses
📲 Requires more context	⚡️ Works with minimal input
🤖 Trained on 175-billion parameters	🚀 Trained on 100-trillion parameters
📄 Supports only textual data	🖼️ Supports visual input like images
🌎 Less holistic worldview	⚖️ Reduced biases, AI alignment
✍️ 700-words answer limit	📚 25,000-words answer limit

GPT-3.5 vs GPT-4: The Gaslighting

Before we begin, keep in mind that GPT-3 and GPT-3.5 are pretty much the same thing with the latter being more efficient due to its speedier responses. The free version of GPT available to the public uses GPT 3.5, which is based on GPT-3.

Actually, the whole thing is sort of confusing. When asking ChatGPT itself about the difference, it gives varying answers each time, sometimes even denying the existence of GPT-3.5 altogether. However, from our research, we can concur that GPT-3.5 is faster, slightly more intelligent due to being trained on human responses, and just overall better than GPT-3.

ChatGPT contradicting itself on two different occasions

Based on the image above, you can see how ChatGPT, based on GPT-4, outright said no to the existence of GPT-3.5. Whereas, when asked the same question using the GPT-3.5 model, we got a different reply saying that GPT 3.5 is similar to GPT-3 with a few differences. It still highlighted how GPT 3.5 doesn’t exist in OpenAI’s lineup, despite the same name being written just above the question.

Now that all that’s out the way, let’s start the real comparison between GPT-3.5 and GPT-4.

Price & Availability

This might not be the biggest difference between the two models, but one that might make the biggest difference for most people. ChatGPT-3.5 is free for everyone. It’s the model you use when you go to OpenAI’s site and try out GPT.

However, if you fancy yourself a more up-to-date AI, GPT-4 is available for $20 per month around the world. There is no regional pricing, so you’re stuck paying that amount no matter where you are. Before we dive into the technicalities, this paywall around GPT-4 is going to cut-off people more than any other factor following it:

ChatGPT Plus, the $20/month subscription needed to access GPT-4 | Pexels

Dataset Size

Compared to GPT-3.5, the dataset used to construct GPT-4 is much bigger. GPT-4 requires 45 GB more training data than GPT-3.5 did. In comparison to its predecessor, GPT-4 produces far more precise findings. Moreover, GPT-4 has significant improvements in its ability to interpret visual data. This is due to the fact that GPT-4 is multimodal and can thus comprehend not just text but also visuals.

On the other hand, GPT-3.5 could only accept textual inputs and outputs, severely restricting its use. GPT-3.5 has a large dataset measuring in at 17 terabytes, which helps it provide reliable results. Large model precision is linked to the dataset’s size and quality.

Users can ask GPT-4 to explain what is happening in a picture, and more importantly, the software can be used to aid those who have impaired vision. Image recognition in GPT-4 is still in its infancy and not available publicly, but it’s expected to be released soon. Pattern description on an article of clothing, gym equipment use, and map reading are all within the purview of the GPT-4.

Word Processing

With GPT-4, the number of words it can process at once is increased by a factor of 8. This improves its capacity to handle bigger documents, which may greatly increase its usefulness in certain professional settings. In addition, GPT-4 beats GPT-3.5 by as much as 16% on typical machine learning benchmarks, and it is more able to take on multilingual tasks than its predecessor, making it more accessible to those who do not speak English as a first language.

While there is a small text output barrier to GPT-3.5, this limit is far-off in the case of GPT-4. In most cases, GPT-3.5 provides an answer in less than 700 words, for any given prompt, in one go. However, GPT-4 has the capability to even process more data as well as answer in 25,000 words in one go. This is equivalent to 2-3 literature books, which GPT-4 can now write on its own.

Comparison on Exam Results

The results of GPT-4 on human-created language tests like the Uniform Bar Exam, the Law School Admissions Test (LSAT), and the Scholastic Aptitude Test (SAT) in mathematics. There were noticeable increases in performance from GPT-3.5 to GPT-4, with GPT-4 scoring higher in the range of 90th to 99th percentiles across the board.

Exam Results Comparison: GPT-4 Performs Better | Open AI

These tests are useful for gauging level of understanding rather than IQ. The fourth generation of GPT (GPT-4) has improved context understanding and intelligent reaction times in complicated corporate applications.

While GPT-3.5 only managed a 1 on the AP Calculus BC test, GPT-4 did even better, earning a 4. Although GPT-3.5 performed in the lowest 10% of test takers, GPT-4 scored in the top 10% and passed the mock bar exam. Also, GPT-4 is a true multilingual.

GPT-3.5’s English proficiency was already rather strong at 70.1%. On the other hand, GPT-4 has improved upon that by leaps and bounds, reaching an astounding 85% in terms of shot accuracy. In reality, it has a greater command of 25 languages, including Mandarin, Polish, and Swahili, than its progenitor did of English. Most extant ML benchmarks are written in English, so that’s quite an accomplishment.

Exam Results Comparison GPT-3.5 vs GPT-4 | Open AI

Token Limits

There is an option called “context length” that specifies the maximum number of tokens that may be utilized in a single API request. The maximum token amount for a request was initially set at 2,049 in the 2020 release of the original GPT-3.5 devices. There are two different versions of GPT-4. Both are capable of processing up to 50 pages worth of text, although the former (GPT-4) has a shorter context length of 8,192 tokens.

Multitasking

Although only provided with a small number of samples to learn from, GPT-3.5 showed remarkable performance on natural language processing tasks including machine translation and question answering. When asked to carry out an activity in which it had no prior experience, however, its performance deteriorated.

Despite its extensive neural network, it was unable to complete tasks requiring just intuition, something with which even humans struggle.

By comparing GPT-3.5 with GPT-4, however, it becomes clear that GPT-4 is a superior meta-learner for few-shot multitasking, since its performance improves more quickly when more parameters are introduced. If GPT-3.5 continues in the same vein and has even more parameters, it is anticipated that it will be an even better multitasker, challenging the idea that deep learning systems need a large dataset in order to become proficient at a particular activity.

Key Differences Between GPT-3.5 and GPT-4 | CitiMuzik

GPT-3.5 has shown that you can continue a conversation without being told what to say next. It is exciting to think about what GPT-4 could be able to do in this area. This might demonstrate the impressive capacity of language models to learn from limited data sets, coming close to human performance in this area.

Input Types

In contrast to the GPT-3.5 model, which could only take in text-based input (or code, to be more specific), the GPT-4 model can take in a third sort of input: images. In particular, it creates textual outputs from textual and visual inputs. The GPT-4 model can be instructed to create captions, categorize visible components, or perform an analysis of the picture.

Examples of the models’ analysis of graphs, explanations of memes, and summaries of publications that include text and visuals can all be found in the GPT-4 study material. GPT-4’s picture recognition skills are really impressive.

GPT-4 actually describing an outcome based on visual data (image) | OpenAI

GPT-4’s enhanced token limits and image processing capabilities make it suitable for a wider range of applications, from scientific study to individual coaching and retail assistants. Do not get too excited just yet, though, because it could be a while before you actually get to use this new GPT-4 skill. We learn that the picture inputs are still in the preview stage and are not yet accessible to the general public.

Cost

There is always a cost. It is clear that if you want to employ the most complex models, you will have to pay more than the $0.0004 to $0.02 for every 1K tokens that you spend on GPT-3.5. Token costs for the GPT-4 with an 8K context window are $0.03 for 1K of prompts and $0.06 for 1K of completions. For comparison, the GPT-4 with a 32K context window will set you back $0.06 for every 1K tokens in prompts and $0.12 for every 1K tokens in completions.

If GPT-3.5’s $8000 price tag covered processing 100,000 requests with an average duration of 1,500 prompt tokens and 500 completion tokens, GPT-4’s $8500 price tag would cover an 8K context window and $15,000 price tag would cover a 32K context window. It is not only more costly, but harder to figure out as well.

OpenAI pricing cheat sheet including GPT-4 | Travis Fischer

This is due to the fact that input tokens (prompts) have a different cost than completion tokens (answers). Given the weak relationship between input and output length, estimating token use is challenging. Using GPT-4 models will be significantly more expensive, and its cost is now unpredictable, because of the greater price of the output (completion) tokens.

Parameter Counts

For those who don’t know, “parameters” are the values that the AI learns during training to understand and generate human-like text. OpenAI had a goal of completing 175-billion parameters in 2021 for GPT-3.5.

In contrast, GPT-4 is constructed using 100 trillion parameters. A larger number of datasets will be needed for model training if more parameters are included in the model. That seems to imply that GPT-3.5 was trained using a large number of different datasets (almost the whole Wikipedia).

Parameter Difference between GPT-3(.5) vs GPT-4

Additionally, GPT-3.5’s training data encompassed various sources, such as books, articles, and websites, to capture a diverse range of human knowledge and language. By incorporating multiple sources, GPT-3.5 aimed to better understand context, semantics, and nuances in text generation.

For the hypothetical GPT-4, expanding the training data would be essential to further enhance its capabilities. This could involve including more up-to-date information, ensuring better representation of non-English languages, and taking into account a broader range of perspectives.

Capabilities

To create writings like human brains, GPT-3.5 is a deep learning big language model. GPT-3.5 can generate writing that looks and reads like it was written by a human by guessing the next word in a sentence or phrase. It can write poems, code, translate material, and answer questions.

In a roundabout way, GPT-4 utilizes GPT-3.5’s method to get its results. In order to produce output that is more convincingly human-like. For its written output, GPT-4 can take both visual and textual inputs. GPT-4 takes a unified stance against the spread of disinformation and the distribution of texts that are founded on the truth.

GPT-4 is believed to be such a smart program that it can deter the context in a far better manner compared to GPT-3.5. For example, when GPT-4 was asked about a picture and to explain what the joke was in it, it clearly demonstrated a full understanding of why a certain image appeared to be humorous. On the other hand, GPT-3.5 does not have an ability to interpret context in such a sophisticated manner. It can only do so on a basic level, and that too, with textual data only.

Response by Chat GPT-4 when asked to describe an image | Reddit

It functions due to its inherent flexibility to adapt to new circumstances. In addition, it will not deviate from its predetermined path in order to protect its integrity and foil any unauthorized commands. With the assistance of longer contexts, GPT-4 is able to process longer texts.

Accuracy

GPT-4 is more precise and responsive to commands than its predecessor. For one thing, its layout reduces AI alignment issues, a major topic in the data science and AI community. It is 110% more truthful as compared to GPT-3.5, according to AI analyst Alan D. Thomspon.

Furthermore, it paves the path for inferences to be made about the mental states of the user. It may also be used to express the difficulty of creating an AI that respects human-like values, wants, and beliefs.

People tend to believe in its great accuracy because of these assumptions. The GPT-4 neural network will have five times the processing power of current language models and AI technologies.

Ultimately, GPT-4 includes 100 trillion more parameters. Its high score is the product of extensive training to improve its performance. By using a method called optimum parameterization, GPT-4 generates language that is more readable and natural sounding than that generated by GPT-based models or other AI software.

If GPT-3.5 were an advanced robot, GPT-4 is like a nuclear threat | BU

Prompting

If you give GPT-3.5 a hint, it can figure out what you are trying to learn. Because it might aid in telling a truthful tale, this can be a plus. The difficulty is that the prompt may lead to undesirable outcomes.

Many have voiced this problem, thus it may be something GPT-4 tries to fix. Before learning anything from a prompt, it may determine how good it is. Besides, from our tests, we found out that GPT-4 requires lesser context as compared to GPT-3.5 to provide the same answers.

Applications

GPT-3.5 was the gold standard for precision and expertise, due to its massive dataset and parameters. Generating and encoding text, translating and summarizing material, and managing customers are just some of GPT-3.5’s many potential uses. GPT-3.5 has already been used in a wide variety of applications, such as Chatbots, virtual assistants, and content production. Machine learning and NLP studies have also made use of it.

More applications for GPT-4 are expected, especially in the fields of art and creative writing. On top of that, it may enhance the performance of current programs like Chatbots and virtual assistants. It is anticipated that GPT-4 would perform even better than GPT-3.5 by resolving these limitations. Moreover, GPT-4 will be used to inspire new works of literature, music, and other artistic endeavors.

GPT-3.5 vs GPT-4: The Test

Our tech team got early access to GPT-4 and we were able to test both of them side by side.

Fuller Information

One of the things we noticed while giving the exact same prompts to GPT-3.5 and GPT-4 was the difference in better information. GPT-4 was able to get more creative and provide fuller information as compared to GPT-3.5. While GPT-3.5 was also able to provide the exact information we needed after several prompts, GPT-4 did it in one go.

The idea behind this is that GPT-3.5 still requires more subtext, better prompts, and detail to understand as well as adapt better to the requirements of the user while GPT-4 can provide that in one go.

Following is an example when GPT-3.5 and GPT-4 were asked about what a person should do for stress, GPT-4 had 8 valid ideas in contrast to 6 ideas offered by GPT-3.5 (Check below). Besides, the ideas by GPT-4 made more sense and provided better understanding. This, again, highlights the aspect that GPT-4 is far more creative, adaptable, and offers fuller information in contrast to GPT-3.5.

Prompt: I am stressed. What should I do?

Errors

Another key aspect we noticed in our testing was that GPT-3.5 as well as GPT-4 were making different types of errors when giving responses. While some of these errors were advanced and out of reach of the program, there were other basic errors as well, such as, wrong chemical formula, arithmetical errors, and numerous others as well.

However, they do let us question the validity of all the other responses which may or may not be correct. In certain cases, GPT-4 is also providing wrong answers. It is as if they are taught that once a human-user suggests that they are wrong, they have to abide by it.

Prompt: 23 + 9 = ?

Based on these responses, one can rightfully conclude that the technologies are still not mature enough. It also opens up the possibility that when a program can make such a basic error, how can this technology be used for the larger context i the long run.

Reduced biases

One of the key differences between GPT-3.5 and GPT-4 lies within reduced biases in the latter version. Since GPT-4 is trained on a larger data set, it produces a better, and fair evaluation of any given prompt as compared to GPT-3.5.

Earlier versions of GPT-3.5 showed that it had some form of gender bias. For example, when it was asked regarding the qualities of a successful entrepreneur, it would automatically refer to it as a “He” instead of being gender-neutral. However, as the program is getting daily updates from Open AI, this issue was resolved. Nonetheless, it can still have many such biases.

GPT-4 shows improvements in reducing biases present in the training data. By addressing the issue of biases, the model could produce more fair and balanced outputs across different topics, demographics, and languages.

Artificial intelligence needs to align with human moral values in order to solve a crucial problem | PIM

The capacity to comprehend and navigate the external environment is a notable feature of GPT-4 that does not exist in GPT-3.5. In certain contexts, GPT-3.5’s lack of a well-developed theory of mind and awareness of the external environment might be problematic. It is possible that GPT-4 may usher in a more holistic view of the world, allowing the model to make smarter choices.

With additional training data at its disposal, GPT-4 is more natural and precise in conversation. This is because of progress made in the areas of data collecting, cleansing, and pre-processing.

Reinforcement Learning Integration

In the field of machine learning known as reinforcement learning, an agent learns appropriate actions to do in a given setting by carrying them out and observing the results. The agent acts in the environment, experiences consequences (either positive or negative), and then utilizes this information to learn and adapt.

In contrast to conventional reinforcement learning, GPT-3.5’s capabilities are somewhat restricted. To anticipate the next word in a phrase based on context, the model engages in “unsupervised learning,” where it is exposed to a huge quantity of text data. With the addition of improved reinforcement learning in GPT-4, the system is better able to learn from the behaviors and preferences of its users.

Conclusion

OpenAI has created something truly groundbreaking with ChatGPT. Be it GPT-3.5 or GPT-4, the world is changing with the help of AI as we see it. Years down the line, we will be seeing AI woven through the fabric of our daily lives, so inconspicuously tied together to our normal functioning that a life without it would seem impossible. Till that day, let’s marvel at the next-generation of AI.

If you like this detailed breakdown, make sure to check out our analysis on whether or not ChatGPT, now integrated with Bing, will become the defining factor for (a resurgence in) Microsoft’s search engine.

Table of Contents