Basically, an LLM token is a unit of measurement based on text. It can be a character, a word, part of a word, etc. Think of tokens as the words in a dictionary the model already knows. Any word that isn't in that dictionary has to be worked out from the words that are.
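To make the "dictionary of known pieces" idea concrete, here's a minimal sketch of a greedy longest-match tokenizer. The vocabulary and the `tokenize` function are made up for illustration; real tokenizers (BPE and friends) learn their pieces from data and split text differently, so treat this as a toy, not how any particular model does it.

```python
# Toy vocabulary (hypothetical; real tokenizers learn tens of thousands of pieces).
VOCAB = {"the", "token", "s", "un", "believ", "able", " "}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split; unknown characters become single-char tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a known piece matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("the tokens"))    # ['the', ' ', 'token', 's']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

Note that "unbelievable" isn't in the toy vocabulary as a whole word, but it still gets covered by three pieces that are.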
Isn't it just a measure of compute when you get right down to it?
Yes, but not quite, because once the tokens are stored, it doesn't really need to compute them anymore, just repeat them. It's similar to stored procedures in databases: they take a while to create, but once they're created, it's quick to execute them and retrieve the data.
Unless you're using OpenClaw, in which case it reads the entire freaking conversation with each new message. Absolute garbage app.
Right, but that retrieval/DB lookup/etc., even if not as intense as the 'initial processing', is still just a measure of compute as well, no? Again, I'm trying to explain it to a non-techie person - it's just compute/'the computer thinking'.
Ok, as a human, think of solving a math problem as "thinking", or "compute". Then think of rote memorization as "retrieval". Say, for some reason, you calculated 123 x 123 = 15,129 and memorized it. Remembering the answer to a problem isn't the same thing as solving the problem again. For simple calculations, like 123 + 123, you don't need to remember the answer. You can easily solve it again because it's simple. However, for something more difficult, it's easier to remember the answer, especially if the problem comes up often.
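The compute vs. retrieval distinction can be sketched with a plain cache. This is only an illustration of the analogy, not a claim about how an LLM stores anything internally; `multiply_slowly`, `recall_or_compute`, and the `memory` dict are hypothetical names.

```python
calls = 0    # how many times we actually "did the math"
memory = {}  # answers we've already worked out

def multiply_slowly(a, b):
    """Stand-in for an expensive calculation ("thinking")."""
    global calls
    calls += 1
    return a * b

def recall_or_compute(a, b):
    """Return a remembered answer if there is one, otherwise compute and store it."""
    key = (a, b)
    if key not in memory:
        memory[key] = multiply_slowly(a, b)
    return memory[key]

print(recall_or_compute(123, 123))  # 15129 (computed)
print(recall_or_compute(123, 123))  # 15129 (remembered)
print(calls)                        # 1
```

The second lookup still costs something (a dictionary lookup), just far less than redoing the work, which is roughly the point both sides of the question are making.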
So, tokens are a way to "remember" complex problems that come up often, so you don't have to do the calculations again.
It's kind of like the "Six Degrees of Kevin Bacon" game, but instead of actors to other actors, it's tokens to other tokens.
Tokens are generally words, but as others said, a token could be part of a word, a single character, etc.
Didn't know it could be whole words - I'm assuming that's for short words (like 'the', 'a', 'is'), which would make sense as single tokens because of how short they are?
My understanding is that tokens represent values in a vector space; there are really numbered coordinates behind them. So 'bear' (to carry a burden) is at a different coordinate than 'bear' (the animal), and each sits near other coordinates that share its meaning. When users prompt with words of similar meaning, the tokens land in a related region, which is why LLMs are so good at semantically related queries: those tokens occupy nearby regions of the high-dimensional space.
Also variants of these words (typos, variant spellings, etc) are placed in the vector space (beside the coordinates of semantic meaning), so it's very forgiving when prompting.
I've seen the token space described in layman's terms as a library, with the tokens as the coordinates of the books. Now imagine related books are grouped beside each other, so knowing the coordinates of one book lets you find related books.
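The "related books sit beside each other" idea can be sketched as a nearest-neighbor lookup. The coordinates below are hand-picked, made-up values in 3 dimensions purely so they're readable; real models use hundreds or thousands of dimensions and learn the values during training. Each word also gets a single point here, which glosses over words with several meanings.

```python
import math

# Hand-picked toy coordinates (hypothetical values, not from any real model).
EMBEDDINGS = {
    "bear":   [0.9, 0.8, 0.1],  # the animal
    "wolf":   [0.8, 0.9, 0.2],
    "puppy":  [0.7, 0.6, 0.3],
    "carry":  [0.1, 0.2, 0.9],
    "burden": [0.2, 0.1, 0.8],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, k=2):
    """Return the k other words whose coordinates are most similar to `word`."""
    query = EMBEDDINGS[word]
    scored = [(cosine(query, vec), other)
              for other, vec in EMBEDDINGS.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]

print(nearest("bear"))        # ['wolf', 'puppy']
print(nearest("carry", k=1))  # ['burden']
```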
When image-based models use these for training, they pair the values in the space with image representations of them, to establish 'a <bear> [animal] looks like this'. Once trained, if the user prompts for a semantically similar animal that the model wasn't trained on with images, it may try to pull the representation of the bear, since that's the nearest one in the vector space.
This video is a good, semi-detailed explainer on vector searching and how it's broken into pieces to find semantic meaning (I've likely forgotten some details it touches on):
View: https://www.youtube.com/watch?v=YDdKiQNw80c
Again, all baked into the algorithm (especially for LLMs) unless also kept separately as sidecar/reference data.
Think of all the times the word 'the' or 'a' is used in the dictionary to explain definitions, aside from their own entries. Now imagine, like when you cross out like terms in a math equation, that you could have just 1 entry/symbol for all the 'the's and 'a's across the whole dictionary. Think how much you could slim/compress the dictionary down.
That's what the algorithm/model is and consists of, and what is done during training. It looks at 'all this data' on whatever it is you're training it for (language, definitions, syntax, grammar, etc. in the case of LLMs) and finds all the commonalities/similar numbers it can cross out, but across larger spans and patterns than you or I can imagine or find. And not just for words, as mentioned, but for sentences, grammar/syntax rules, etc., which are also part of a dictionary since they're used to define and explain things. It's all reduced down, so the answer is both yes and no: there is and isn't an entire dictionary within your LLM.
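The "one entry for every 'the'" idea on its own is simple dictionary coding, which can be sketched like this. To be clear, this is an assumption-laden toy: model training does something far more involved than replacing words with IDs, and the sample sentence and variable names are made up.

```python
text = "the cat sat on the mat and the dog sat on the rug"
words = text.split()

vocab = {}    # each distinct word gets exactly one entry
encoded = []  # the text, rewritten as entry numbers
for word in words:
    if word not in vocab:
        vocab[word] = len(vocab)
    encoded.append(vocab[word])

# Reverse the mapping to show nothing was lost.
id_to_word = {idx: word for word, idx in vocab.items()}
decoded = " ".join(id_to_word[idx] for idx in encoded)

print(len(words), len(vocab))  # 13 8
print(encoded)                 # [0, 1, 2, 3, 0, 4, 5, 0, 6, 2, 3, 0, 7]
print(decoded == text)         # True
```

Thirteen words need only eight entries here, and the four occurrences of 'the' all point at entry 0.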