Tecknoworks Blog

How GPT Works and Its Core Mechanics

It’s been over a year since ChatGPT was publicly released, sparking significant interest and raising awareness across a broad audience. 

Despite its popularity and many people using it daily, I’ve noticed in my recent conversations – whether with tech enthusiasts or industry professionals – that there are numerous misconceptions and misunderstandings about what GPT truly is and how GPT works.

At its essence, GPT stands for Generative Pre-trained Transformer, and it operates fundamentally as a statistical model. Let’s delve into how it works.

How Does GPT Work?

Let’s assume you have a clever and curious robot friend who loves to read and has read an enormous number of books, articles, and websites. Now, whenever you ask your friend a question or want to talk about something, your friend uses everything it has read to give you the best answer it can. This robot friend is like GPT, which stands for “Generative Pre-trained Transformer.”

Here’s a breakdown of the terminology:

● Generative: It can generate text, creating coherent and contextually relevant sentences based on the input it receives.

● Pre-trained: Before you even start asking questions, it has already learned a lot by processing vast amounts of text. This training helps it understand language structure and context.

● Transformer: This is the type of model it uses, which is particularly good at weighing the relevance of different words in a sentence to generate a suitable response.

So, when you ask GPT something, it remembers what it has learned from all its reading, figures out what is most important in your question, and then makes up a response that fits well with what you asked. Let’s see how the content is actually generated…

Tokens: The Building Blocks

GPT processes text through tokens. Each word in a sentence can be split into smaller parts, similar to how a word can be broken into syllables. In the world of GPT, these small parts are called “tokens.” Tokens can be whole words, parts of words, or even punctuation marks. They help the computer better understand and generate language.

For example, let’s consider the sentence: “The quick brown fox jumps over the lazy dog.” A simple way to think about tokens is to split the sentence at word and punctuation boundaries, like in the list below:

● The
● quick
● brown
● fox
● jumps
● over
● the
● lazy
● dog
● .

However, in practice tokens are often parts of words, especially for longer or rarer words; sub-word tokens help the model better handle different word forms and complex structures.
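To make this concrete, here is a minimal sketch of word-level tokenization in Python. Note this is a toy stand-in: real GPT models use a byte-pair-encoding (BPE) tokenizer that also splits rare words into sub-word pieces, which a simple regex does not do.

```python
import re

def simple_tokenize(text):
    """Split text into word and punctuation tokens.
    A toy illustration only; GPT's actual tokenizer is BPE-based
    and can split single words into several sub-word tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("The quick brown fox jumps over the lazy dog.")
print(tokens)
# ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
```

Notice that the period comes out as its own token, just as described above.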

Let’s see how GPT uses these tokens to generate the next part of the sentence:

● Reading and understanding. First, GPT reads the tokens one by one—like reading syllables. As it reads, it tries to understand the sentence so far.

● Predicting. Based on what it has learned during its training (from reading lots of texts), GPT tries to predict what the next token (or “syllable”) should be after “The quick brown fox jumps over the lazy.” It thinks about what usually comes after these words in similar sentences it has seen before.

● Generating. GPT then generates the next token, which could be “dog” in our sentence. It chooses “dog” because, in its training, it has often seen “dog” come after “The quick brown fox jumps over the lazy.”

So, GPT reads and predicts each “token” or part of the sentence, one by one, just like putting together syllables to form words and sentences.
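The read-predict-generate loop above can be sketched with a deliberately tiny stand-in model. The sketch below counts which token follows which in a toy corpus (a bigram model, vastly simpler than a Transformer) and then generates text one token at a time, which is the same loop shape GPT uses; the corpus and the greedy "pick the most frequent" rule are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy training corpus (a stand-in for GPT's massive training text)
corpus = ("the quick brown fox jumps over the lazy dog . "
          "the quick brown fox jumps over the lazy cat . ")
tokens = corpus.split()

# Count which token follows which one in the corpus
follows = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    follows[prev][nxt] += 1

def generate(start, steps):
    """Generate tokens one by one, each predicted from the previous one."""
    out = [start]
    for _ in range(steps):
        counts = follows[out[-1]]
        # Greedy decoding: always take the most frequent continuation
        out.append(counts.most_common(1)[0][0])
    return out

print(" ".join(generate("the", 5)))
```

GPT conditions each prediction on the whole preceding context rather than just the last token, but the token-by-token loop is the same.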

GPT as a Statistical Model

GPT, at its core, is a statistical model. This means it uses probabilities (chances) to decide which token (a word or part of a word) to generate next. When GPT is creating a sentence, it calculates the probability of every possible next token based on the tokens it has seen so far in the sentence. For our example, “The quick brown fox jumps over the lazy,” GPT evaluates how likely each possible next word (token) is. It might determine that “dog” has a high probability because, in its training data, “dog” frequently follows “The quick brown fox jumps over the lazy.” If the model finds that 90% of the time, the word “dog” follows, then “dog” has a 90% chance of being the next word.
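Turning counts into probabilities can be shown in a few lines. The counts below are hypothetical, invented for the example; they are not real GPT statistics.

```python
from collections import Counter

# Hypothetical counts of what followed "...over the lazy" in training text
continuations = Counter({"dog": 90, "cat": 7, "fox": 3})

# Normalize the counts into a probability distribution
total = sum(continuations.values())
probs = {tok: n / total for tok, n in continuations.items()}
print(probs)  # {'dog': 0.9, 'cat': 0.07, 'fox': 0.03}
```

Here "dog" gets a 90% probability, matching the example above. In a real model these probabilities come from the Transformer's learned weights rather than raw frequency counts, but the output has the same form: one probability per possible next token.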

Temperature in GPT

The concept of “temperature” in GPT is a way to adjust how the model makes its predictions. It controls how much randomness goes into choosing the next token:

● A lower temperature (e.g., closer to 0) makes the model more confident and conservative in its choices. It will choose tokens that are very likely, making the output more predictable, coherent, and less diverse.

● A higher temperature (e.g., closer to 1 or above) makes the model less confident, allowing for more randomness in choosing tokens. This leads to more creative and varied outputs, but also increases the chances of producing less typical or even nonsensical results.

The right temperature depends on the application: a higher temperature can benefit creative writing, while a lower temperature is usually more appropriate for factual, data-driven, or formal text generation, where accuracy and relevance matter most.

In Conclusion

I hope this article clarifies how GPT functions at a fundamental level and dispels some common misconceptions about its capabilities and operation. As we continue to explore the art of the possible with AI and machine learning, understanding these tools’ core principles becomes increasingly important. It allows us to leverage their capabilities effectively while setting realistic expectations about their outputs.
