Large language models generate text one token at a time - tokens are small pieces of words, a bit like syllables. At each step, the model looks at the entire conversation so far, turns it into a numerical representation (an embedding), and uses that to work out which tokens are good candidates to come next and how likely each one is. It then picks one of those candidates at random, weighted by its probability.
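If you want a feel for that last step, here is a minimal sketch of weighted random sampling in Python. The candidate tokens and probabilities are made up for illustration; a real model would produce a distribution over its whole vocabulary.

```python
import random

# Hypothetical candidates for the next token after "The capital of France is",
# with made-up probabilities that sum to 1.
candidates = {" Paris": 0.92, " the": 0.04, " located": 0.02, " a": 0.02}

# Pick one token at random, weighted by its probability:
# high-probability tokens are chosen most often, but not always.
tokens = list(candidates.keys())
weights = list(candidates.values())
next_token = random.choices(tokens, weights=weights, k=1)[0]
print(next_token)
```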
In the tool below, you will get to manually select the first five tokens of a response instead of relying on this random selection. You will be shown the candidate tokens GPT-4o-mini was considering and their probabilities (these are real probabilities obtained from the OpenAI API). Once you have selected a token, it is appended to the conversation and the whole process repeats - this is called autoregressive generation.
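For the curious, here is a rough sketch of what one version of that loop could look like using the OpenAI chat completions API with `logprobs` enabled. It is not the tool's actual implementation: the helper name `top_candidates` and the choice to feed the partial response back as a trailing assistant message are illustrative assumptions.

```python
import math
from openai import OpenAI

client = OpenAI()

def top_candidates(messages, partial_response, k=5):
    """Ask GPT-4o-mini for one more token and return the top-k
    candidate tokens with their probabilities (exp of the logprobs)."""
    # Assumption: the partially built response is passed back as a
    # trailing assistant message so the model continues from it.
    convo = messages + (
        [{"role": "assistant", "content": partial_response}] if partial_response else []
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=convo,
        max_tokens=1,
        logprobs=True,
        top_logprobs=k,
    )
    top = resp.choices[0].logprobs.content[0].top_logprobs
    return [(t.token, math.exp(t.logprob)) for t in top]

messages = [{"role": "user", "content": "Tell me a one-line joke."}]
partial = ""
for _ in range(5):  # hand-pick the first five tokens
    options = top_candidates(messages, partial)
    for i, (tok, p) in enumerate(options):
        print(f"{i}: {tok!r}  ({p:.1%})")
    choice = int(input("Pick a token: "))
    partial += options[choice][0]  # append the chosen token and repeat
    print("So far:", partial)
```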
Start by picking one of the two conversation starters below, and get generating...