Explaining token use to a non-techie

philb2 · Monday at 10:37 PM

I'm struggling to explain to my non-techie wife how LLMs work with tokens, because I don't really know much about it. Please ejumacate me.

El_Capitan · Monday at 10:59 PM

Basically, an LLM Token is a unit of measurement based on text. It can be a character, a word, parts of a word, etc. Think of tokens as words in a dictionary it already knows. It has to figure out the other words in the dictionary it doesn't know, by using the words it already knows, to figure out those other words.

kram182 · Monday at 11:05 PM

El_Capitan said:
Basically, an LLM Token is a unit of measurement based on text. It can be a character, a word, parts of a word, etc. Think of tokens as words in a dictionary it already knows. It has to figure out the other words in the dictionary it doesn't know, by using the words it already knows, to figure out those other words.

isn't it just a measure of compute when you get right down to it?

Shoganai · Monday at 11:12 PM

A token is basically how LLMs measure text in small chunks ... usually a word or part of a word ... and the AI processes them one at a time to predict what comes next. It's like a person who reads your question phrase by phrase and guesses the next phrase based on probability. And LLMs have a limit on how many tokens they can handle at once (like 200,000 ... or a million for some models), so if your conversation gets too long ... you hit a wall ... and have to make a new conversation. Or the AI begins hallucinating because it can't hold all of that context at once without forgetting things. AI memory is terrible unless you learn how to address that (which isn't hard). Also, when you use an API, you pay per token, so longer text = more cost. Basically, tokens are just how AI systems measure and limit text. Context window is something else ... memory is another. They're all related.
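
To put rough numbers on the "limit" and "pay per token" points, here is a minimal sketch. The 4-characters-per-token rule is only a crude estimate for English text, the price is a made-up figure, and the function names are hypothetical; the 200,000 limit is the figure mentioned in the post.

```python
# Toy illustration only: real tokenizers split text differently, and the
# price below is a made-up assumption, not any provider's actual rate.

def rough_token_count(text: str) -> int:
    """Crude estimate: assume about 4 characters per token."""
    return max(1, len(text) // 4)

CONTEXT_LIMIT = 200_000          # tokens held at once (figure from the post)
PRICE_PER_MILLION_TOKENS = 3.00  # hypothetical API price in dollars

def total_tokens(conversation: list[str]) -> int:
    """Add up the estimated tokens of every message in the conversation."""
    return sum(rough_token_count(message) for message in conversation)

def fits_in_context(conversation: list[str]) -> bool:
    """True while the whole conversation still fits under the limit."""
    return total_tokens(conversation) <= CONTEXT_LIMIT

def estimated_cost(conversation: list[str]) -> float:
    """Dollar cost if every token is billed at the hypothetical rate."""
    return total_tokens(conversation) * PRICE_PER_MILLION_TOKENS / 1_000_000

chat = ["How do tokens work?", "A token is a small chunk of text."]
print(fits_in_context(chat), estimated_cost(chat))
```

The longer the conversation list grows, the closer it gets to the limit and the more each request costs, which is the "hit a wall" effect described above.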

El_Capitan · Monday at 11:23 PM

kram182 said:
isn't it just a measure of compute when you get right down to it?

Yes, but not quite, because once the tokens are stored, it doesn't really need to compute them anymore, just repeat them. It's similar to stored procedures in databases: it takes a while to create them, but once it's created, it's quick to execute and retrieve the data.

kram182 · Monday at 11:29 PM

El_Capitan said:
Yes, but not quite, because once the tokens are stored, it doesn't really need to compute them anymore, just repeat them. It's similar to stored procedures in databases: it takes a while to create them, but once it's created, it's quick to execute and retrieve the data.

right but that/retrieval/DB lookup/etc even if not as intense as 'initial processing' is still just a measure of compute as well, no? again trying to explain it to a non-techie person - it's just compute/'the computer thinking'

edit: it depends 'how well' he's trying to explain it to his wife I suppose - I'm assuming 'the most basic and simplistic answer possible'

Shoganai · Monday at 11:29 PM

El_Capitan said:
Yes, but not quite, because once the tokens are stored, it doesn't really need to compute them anymore, just repeat them. It's similar to stored procedures in databases: it takes a while to create them, but once it's created, it's quick to execute and retrieve the data.

Unless you're using OpenClaw, in which case it reads the entire freaking conversation with each new message. Absolute garbage app.

kram182 · Monday at 11:39 PM

Shoganai said:
Unless you're using OpenClaw, in which case it reads the entire freaking conversation with each new message. Absolute garbage app.

Doesn't every LLM have to do some compute beyond KV cache (which is what I was assuming El_Capitan was referencing)?

edit: I understand not the entire literal convo over again, but the entire convo is never stored in KV in other models anyway is what I mean

El_Capitan · Monday at 11:51 PM

kram182 said:
right but that/retrieval/DB lookup/etc even if not as intense as 'initial processing' is still just a measure of compute as well, no? again trying to explain it to a non-techie person - it's just compute/'the computer thinking'

Ok, as a human, think of solving a math problem as "thinking", or "compute". Then think of rote memorization as "retrieval". Say, for some reason you calculated 123 x 123 = 15,129 and memorized it. Remembering the answer to a problem isn't the same thing as solving the problem again. For simple calculations, like 123 + 123, you don't need to remember it. You can easily solve it because it's simple. However, for something more difficult, it's easier to remember it, especially if the problem repeats itself often.

So, tokens are a way to "remember" complex problems that repeat themselves often, so you don't have to do the calculations again.

It's kind of like the "Six Degrees of Kevin Bacon" game, but instead of actors to other actors, it's tokens to other tokens.

sharknice · Tuesday at 12:07 AM

Tokens are generally words, but as others said it could be part of a word, a single character, etc.

Your token use isn't exactly how many words you type and how many the LLM types back. When the AI is "thinking" it is also using tokens. If you have a completely open model running locally you can view the thinking text, which is what counts as the tokens. If you have the AI do a web search, it reading the results is also using tokens. If you feed in any other info, including previous conversations, it is using tokens. If you have some other prompt options, those are tokens, including any prompt-type stuff that whatever service you're using adds, etc.
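
A quick tally makes the gap between "what you see" and "what gets counted" concrete. Every number here is invented for illustration, and the category names are just labels for the items listed in the post, not fields from any real API.

```python
# Hypothetical token counts for one request; none of these are measured values.
request = {
    "system_prompt": 800,         # instructions added by the service
    "previous_messages": 2_500,   # earlier conversation sent along again
    "web_search_results": 4_000,  # pages the model read
    "your_message": 40,           # what you actually typed
}
response = {
    "thinking": 1_200,      # reasoning text you may never see
    "visible_reply": 300,   # the answer shown on screen
}

input_tokens = sum(request.values())
output_tokens = sum(response.values())
visible_tokens = request["your_message"] + response["visible_reply"]

print(f"visible: {visible_tokens}, actually used: {input_tokens + output_tokens}")
```

In this made-up case the text you see on screen is only a small slice of the total.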

kram182 · Tuesday at 12:12 AM

El_Capitan said:
Ok, as a human, think of solving a math problem as "thinking", or "compute". Then think of rote memorization as "retrieval". Say, for some reason you calculated 123 x 123 = 15,129 and memorized it. Remembering the answer to a problem isn't the same thing as solving the problem again. For simple calculations, like 123 + 123, you don't need to remember it. You can easily solve it because it's simple. However, for something more difficult, it's easier to remember it, especially if the problem repeats itself often.

So, tokens are a way to "remember" complex problems that repeat themselves often, so you don't have to do the calculations again.

It's kind of like the "Six Degrees of Kevin Bacon" game, but instead of actors to other actors, it's tokens to other tokens.

That's not what tokens are IIRC

Another different way than you put it, that I usually explain/understand it as - is how you/people can 'read' words by just noticing/remembering 'the shape' of common words like 'the' - to know what the word is automatically, vs actually having to 'read/phonetically break down the word' etc

That 'remembering at first sight' is equivalent to the cache/DB lookup - the token is actually phonetically reading out the word - unbelievable = 'un' 'believe' 'able' = 3 tokens
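
The word-splitting idea can be sketched with a toy tokenizer. This uses a simple greedy longest-match lookup against a tiny hand-written vocabulary, which is a stand-in and not how production tokenizers are built (those learn their pieces from data, and the pieces follow spelling patterns rather than sounds). The vocabulary and the `tokenize` function are hypothetical.

```python
# Hypothetical mini-vocabulary; real models learn tens of thousands of pieces.
VOCAB = {"un", "believ", "able", "the", "a", "is"}

def tokenize(word: str) -> list[str]:
    """Greedy longest-match split; unknown characters become single tokens."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until a known piece is found.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # no known piece starts here
            i += 1
    return pieces

print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(tokenize("the"))           # ['the']
```

A short common word comes out as one token, while a longer word is assembled from several known pieces.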

But that's still all just 'degrees of' thinking/reading/compute on the most basic level that I was going for from the start - even if it's more or less (like I said I was going for 'the most basic/simplistic answer' - it depends how complex philb2 wants his wife to understand)

sharknice said:
Tokens are generally words, but as others said it could be part of a word, a single character etc

Didn't know it could be whole words - I'm assuming for short words (like 'the' 'a' 'is' - which would make sense for a single token given how short they are)?

Edit: but also as described above, even just 'referencing the cache' is a token, just one using less compute.

Okatis · Tuesday at 3:50 AM

kram182 said:
Didn't know it could be whole words - I'm assuming for short words (like 'the' 'a' 'is' - which would make sense for a single token given how short they are)?

My understanding is the tokens represent values in a vector space, they're really numbered coordinates behind them. So like 'bear' (for burden) is a different coordinate than 'bear' (animal) and both of them will be in coordinates that share meaning so that when users are prompting for similar meaning words the tokens are in a related space, which is why LLMs are so good at semantically related queries since those tokens occupy similar regions in the 3D space.

Also variants of these words (typos, variant spellings, etc) are placed in the vector space (beside the coordinates of semantic meaning), so it's very forgiving when prompting.

I've seen the token space described in a layman way as a library and the tokens the coordinates of the book. Now imagine related books are grouped beside each other, so knowing the coordinates of one book allows you to find related books.

When image-based models use these for training it pairs the values in space with image representations of them, to establish 'a <bear> [animal] looks like this'. Once trained, if the user prompts for a semantically similar animal it wasn't trained on with images then it may try to pull the representation of the bear since it's nearest to it in the vector space.

This video is a good, semi-detailed explainer on vector searching and how it's broken into pieces to find semantic meaning (I've likely forgotten some details it touches on):

View: https://www.youtube.com/watch?v=YDdKiQNw80c
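
The "related books sit beside each other" idea can be sketched with cosine similarity, a common way to measure how close two vectors point. The three-number vectors here are hand-made purely for illustration; real embeddings are learned during training and have hundreds or thousands of dimensions. The names `VECTORS` and `nearest` are hypothetical.

```python
import math

# Hand-made vectors for illustration only; real embeddings are learned.
VECTORS = {
    "bear":  [0.9, 0.8, 0.1],
    "wolf":  [0.8, 0.9, 0.2],
    "spoon": [0.1, 0.2, 0.9],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Close to 1.0 means the vectors point in nearly the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(word: str) -> str:
    """Return the other word whose vector is most similar to this one."""
    others = (w for w in VECTORS if w != word)
    return max(others, key=lambda w: cosine_similarity(VECTORS[word], VECTORS[w]))

print(nearest("bear"))  # wolf
```

Knowing where "bear" sits is enough to find its neighbor, the same way knowing one book's shelf leads you to related books.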

Zepher · Tuesday at 4:11 AM

Lol, why not ask AI to tell you what it is?

kram182 · Tuesday at 4:11 AM

Okatis said:
My understanding is the tokens represent values in a vector space, they're really numbered coordinates behind them. So like 'bear' (for burden) is a different coordinate than 'bear' (animal) and both of them will be in coordinates that share meaning so that when users are prompting for similar meaning words the tokens are in a related space, which is why LLMs are so good at semantically related queries since those tokens occupy similar regions in the 3D space.

Also variants of these words (typos, variant spellings, etc) are placed in the vector space (beside the coordinates of semantic meaning), so it's very forgiving when prompting.

I've seen the token space described in a layman way as a library and the tokens the coordinates of the book. Now imagine related books are grouped beside each other, so knowing the coordinates of one book allows you to find related books.

When image-based models use these for training it pairs the values in space with image representations of them, to establish 'a <bear> [animal] looks like this'. Once trained, if the user prompts for a semantically similar animal it wasn't trained on with images then it may try to pull the representation of the bear since it's nearest to it in the vector space.

This video is a good, semi-detailed explainer on vector searching and how it's broken into pieces to find semantic meaning (I've likely forgotten some details it touches on):

View: https://www.youtube.com/watch?v=YDdKiQNw80c

No this/vector space is akin/linked to dimensionality which I sloppily explain here in another thread:

kram182 said:
Again, all baked into the algorithm (especially for LLMs) unless also kept separately as sidecar/reference data.

Think of all the times the word 'the' or 'a' is used in the dictionary, for explaining definitions aside from their own entries - now imagine like when you cross out similar numbers in math/equations if you could just have 1 entry/symbol for all the 'the's and 'a's across the whole dictionary, how much you could slim/compress the dictionary down to.

That's what the algorithm/model is and consists of and what is done during training - look at 'all this data' on whatever and everything it is you're training it for - language and definitions and syntax and grammar etc in the case of LLMs - and it finds all the commonalities/similar numbers it can cross out, but across larger spans and patterns than you or I can imagine/find, and not just for words as mentioned but for sentences, grammatical/syntax rules etc etc which is also part of/within a dictionary as it's used to define/explain things - and all reduced down - so that both yes and no - there is and isn't an entire dictionary within your LLM.

Looking up something in vector space can use/be a token/unit of compute but the vector space and vectors (words and also the relation/links to/of others) within are not the token/compute itself (nor are the token IDs matched/labeled to those vectors), those are all just the result/activity of the token

CPU/GPU/NPU cycles (which is why I said compute from the start) is really the most apt description (but it's also really the instructions causing them, but because it's using those resources is why companies care about them)

And not just words/part of words are tokens - line breaks/blank spaces/images/just part of an image/model instructions/guardrails it has to keep in mind in the background - anything it has to process/understand/compute is a token - and the resources spent doing so the cost of the token (the other side of the coin/token if you will).

But tokens (like the guardrails for example) can be sent to KV cache so those tokens don't have to be processed each and every time. The past processing/output values are done and stored there. But they're still tokens themselves. That part is just like reading 'the' by instantly recognizing the shape of the word vs phonetically reading the word itself as I explained above in a prior post.
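
The difference a cache makes can be sketched by counting how many tokens need fresh processing over a conversation. This is a deliberate simplification: the turn sizes are made-up numbers, `tokens_processed` is a hypothetical helper, and in a real model the cached tokens still take part in the work done for each new token, so cached does not mean free.

```python
# Toy model of caching earlier tokens: counts tokens needing fresh processing.
turn_sizes = [500, 40, 300, 60, 250]  # made-up tokens added by each message

def tokens_processed(turn_sizes: list[int], use_cache: bool) -> int:
    processed = 0
    seen = 0  # tokens already in the conversation before this turn
    for new in turn_sizes:
        if use_cache:
            processed += new         # only the new tokens need fresh work
        else:
            processed += seen + new  # re-read everything, then the new part
        seen += new
    return processed

print(tokens_processed(turn_sizes, use_cache=False))  # 3930
print(tokens_processed(turn_sizes, use_cache=True))   # 1150
```

Either way the conversation holds the same 1,150 tokens; the cache only changes how much of it has to be worked through again on each turn.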
