What Does AI Really Mean?

About The Author

Atila Fassina is a Google Dev Expert, member of the Solid DX team, and a Tauri advocate. He enjoys making complex code simpler via articles, conference talks, …



In 2024, Artificial Intelligence (AI) hit the limelight with major advancements. The problem with reaching common knowledge and so much public attention so quickly is that the term becomes ambiguous. While we all have an approximation of what it means to “use AI” in something, it’s not widely understood what infrastructure having AI in your project, product, or feature entails.

So, let’s break down the concepts that make AI tick. How is data stored and correlated, and how are the relationships built in order for an algorithm to learn how to interpret that data? As with most data-oriented architectures, it all starts with a database.

Data As Coordinates

Creating intelligence, whether artificial or natural, works in a very similar way. We store chunks of information, and we then connect them. Multiple visualization tools and metaphors show this in a 3-dimensional space with dots connected by lines on a graph. Those connections and their intersections are what make up intelligence. For example, we put together “chocolate is sweet and nice” and “drinking hot milk makes you warm”, and we make “hot chocolate”.

Tony Stark in Iron Man 2 looking at a 3D representation of a molecule — which happens to be a great representation of a high dimensional graph.
(Image credit: Marvel Studios)

We, as human beings, don’t worry too much about making sure the connections land at the right point. Our brain just works that way, declaratively. However, for building AI, we need to be more explicit. So think of it as a map. In order for a plane to leave CountryA and arrive at CountryB, it requires a precise system: we have coordinates, our maps have two axes, and a position can be represented as a vector: [28.3772, 81.5707].

For our intelligence, we need a more complex system; 2 dimensions will not suffice; we need thousands. That’s what vector databases are. Our intelligence can now correlate terms based on the distance and/or angle between them, create cross-references, and establish patterns in which every term occurs.

A vector database is a specialized database that stores and manages data as high-dimensional vectors. It enables efficient similarity searches and semantic matching.
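
To make the idea concrete, here is a minimal sketch of what storing data as coordinates and searching by distance could look like. It is a toy, in-memory stand-in and not how any real vector database is implemented; the `TinyVectorStore` class, its method names, and the 3-dimensional vectors are all hypothetical and chosen only for illustration.

```python
import math


class TinyVectorStore:
    """Toy in-memory store: keeps (label, vector) pairs and ranks them by distance."""

    def __init__(self):
        self.entries = []  # list of (label, vector) tuples

    def add(self, label, vector):
        self.entries.append((label, list(vector)))

    def nearest(self, query, k=1):
        # Rank stored entries by straight-line (Euclidean) distance to the query
        ranked = sorted(self.entries, key=lambda entry: math.dist(entry[1], query))
        return [label for label, _ in ranked[:k]]


store = TinyVectorStore()
# Hypothetical 3-dimensional vectors; real ones have hundreds or thousands of dimensions
store.add("hot chocolate", [0.9, 0.8, 0.1])
store.add("warm milk", [0.7, 0.9, 0.0])
store.add("airplane", [0.0, 0.1, 0.9])

print(store.nearest([0.85, 0.85, 0.05], k=2))  # ['hot chocolate', 'warm milk']
```

A real vector database adds indexing so it doesn’t have to compare the query against every entry, but the principle of ranking by closeness is the same.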

Querying Per Approximation

As stated in the last section, matching the search terms (your prompt) to the data is an exercise in semantic matching (establishing the pattern in which the keywords in your prompt are used within the data) and similarity search (measuring the distance, angular or linear, between entries). That’s actually a roughly accurate representation. What a similarity search does is treat each vector (which is thousands of numbers long) as the coordinates of a point in this weird multi-dimensional space. Finally, to establish similarity between these points, the distance and/or angles between them are measured.

This is one of the reasons why AI isn’t deterministic (and neither are we): for the same prompt, the search may produce different outputs based on how the scores are defined at that moment. If you’re building an AI system, there are algorithms you can use to establish how your data will be evaluated.

Choosing the right one can produce more precise and accurate results depending on the type of data. There are three main algorithms, and each one performs better for a certain kind of data, so understanding the shape of the data and how each of these concepts will correlate is important to choosing the correct one. In a very hand-wavy way, here’s a rule of thumb to offer you a clue for each:

  • Cosine Similarity
    Measures the angle between vectors, so it fits cases where the magnitude (the actual number) is less important. It’s great for text/semantic similarity.
  • Dot Product
    Captures linear correlation and alignment. It’s great for establishing relationships between multiple points/features.
  • Euclidean Distance
    Calculates straight-line distance. It’s good for dense numerical spaces since it highlights the spatial distance.

INFO

When working with non-structured data (like text entries: your tweets, a book, multiple recipes, your product’s documentation), cosine similarity is the way to go.
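
The three measures listed above are short enough to write out by hand. The following is a plain-Python sketch using the standard formulas; the function names and the two sample vectors are made up for this example, and a production system would rely on an optimized library instead.

```python
import math


def dot_product(a, b):
    # Sum of element-wise products: grows with both alignment and magnitude
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product divided by both lengths: only the angle between vectors matters
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    # Straight-line distance between the two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude

print(cosine_similarity(a, b))   # approximately 1.0: identical direction
print(dot_product(a, b))         # 28.0
print(euclidean_distance(a, b))  # sqrt(14), roughly 3.74
```

Note how the two vectors count as practically identical under cosine similarity, while Euclidean distance still sees them as apart. That difference is why the shape of your data matters when picking a measure.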

Now that we understand how the data bulk is stored and the relationships are built, we can start talking about how the intelligence works — let the training begin!

Language Models

A language model is a system trained to understand, predict, and finally generate human-like text by learning statistical patterns and relationships between words and phrases in large text datasets. For such a system, language is represented as probabilistic sequences.

In that way, a language model is immediately capable of efficient completion (hence the quote stating that 90% of the code in Google is written by AI: auto-completion), translation, and conversation. Those tasks are the low-hanging fruit of AI because they depend on estimating the likelihood of word combinations and improve by reaffirming and adjusting the patterns based on usage feedback (rebalancing the similarity scores).
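
To illustrate what “probabilistic sequences” means in practice, here is a deliberately tiny sketch: a bigram model that counts which word follows which and completes a phrase with the most likely continuation. Real language models are neural networks trained on vastly more data; the corpus and all function names below are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probabilities(follows, word):
    """Turn the raw follow-up counts for a word into probabilities."""
    counts = follows.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def complete(follows, word, length=3):
    """Greedy completion: always append the most frequent next word."""
    out = [word]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)


corpus = "hot chocolate is sweet . hot chocolate is nice . hot milk is warm"
model = train_bigrams(corpus)

print(next_word_probabilities(model, "hot"))  # chocolate: 2/3, milk: 1/3
print(complete(model, "hot", length=2))       # hot chocolate is
```

Auto-completion in this toy is nothing more than picking the likeliest next word, which is the same intuition behind the much larger models discussed next.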

Now that we understand what a language model is, we can start classifying them as large and small.

Large Language Models (LLMs)

As the name says, LLMs are trained on large-scale datasets and have billions of parameters (70 billion, for example, and some models have more). This allows them to be diverse and capable of creating human-like text across different knowledge domains. Think of them as big generalists. This makes them not only versatile but extremely powerful. And as a consequence, training them demands a lot of computational work.

Small Language Models (SLMs)

SLMs are trained on smaller datasets and have far fewer parameters, ranging from 100 million to 3 billion. They take significantly less computational effort and are less versatile, which makes them better suited for specific tasks with more defined constraints. SLMs can also be deployed more efficiently and have faster inference when processing user input.
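
A rough back-of-the-envelope calculation shows why parameter count matters for deployment. The sketch below assumes each parameter is stored as a 16-bit number (2 bytes) and counts only the weights, ignoring everything else a running model needs; both are simplifying assumptions, and `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(parameters, bytes_per_parameter=2):
    """Approximate memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return parameters * bytes_per_parameter / 1e9


# Parameter counts taken from the ranges mentioned above
for name, params in [("100M SLM", 100e6), ("3B SLM", 3e9), ("70B LLM", 70e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB")
```

Under these assumptions, the smallest model fits comfortably on a phone, while the 70-billion-parameter one needs roughly 140 GB for its weights alone.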

Fine-Tuning

Fine-tuning an LLM consists of adjusting the model’s weights through additional specialized training on a specific (high-quality) dataset. Basically, adapting a pre-trained model to perform better in a particular domain or task.

As training iterates over the specialized dataset, the model develops a more nuanced understanding of the domain. This leads to more accurate and context-specific outputs without creating a custom language model for each task. On each training iteration, developers tune hyperparameters such as the learning rate and batch size, while the training itself adjusts the weights, using a dataset tailored for that particular knowledge area. Of course, each iteration also depends on appropriately benchmarking the model’s output performance.
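
The mechanics can be sketched at toy scale. Below, a “pre-trained” model is just a line, y = w * x + b, with two weights, and fine-tuning continues training it on a small specialized dataset using mini-batch gradient descent. This is an analogy under heavy simplification and not how an LLM is actually fine-tuned; the data, the starting weights, and the function names are all invented for illustration.

```python
import random


def fine_tune(weights, dataset, learning_rate=0.01, batch_size=2, epochs=200, seed=0):
    """Continue training a linear model y = w * x + b from existing weights."""
    w, b = weights
    rng = random.Random(seed)
    data = list(dataset)
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Gradients of the mean squared error over this batch
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # The learning rate controls how far each step moves the weights
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


def mean_squared_error(weights, dataset):
    """Benchmark: average squared gap between predictions and targets."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in dataset) / len(dataset)


pretrained = (1.0, 0.0)  # hypothetical "generalist" weights
domain_data = [(1.0, 2.5), (2.0, 4.5), (3.0, 6.5), (4.0, 8.5)]  # follows y = 2x + 0.5

tuned = fine_tune(pretrained, domain_data)
print("before:", mean_squared_error(pretrained, domain_data))
print("after: ", mean_squared_error(tuned, domain_data))
```

The learning rate and batch size are the knobs a developer turns, the weights are what training changes, and the error measured before and after is the benchmarking step.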

As mentioned above, fine-tuning is particularly useful for a well-defined task in a niche knowledge area, for example, creating summaries of nutritional scientific articles, correlating symptoms with a subset of possible conditions, etc.

Fine-tuning requires numerous iterations, so it can’t be done frequently or quickly, and it isn’t intended for supplying factual information, especially information that depends on current events or streamed data.

Enhancing Context With Information

Most conversations we have are directly dependent on context; with AI, it isn’t much different. While there are definitely use cases that don’t entirely depend on current events (translations, summarization, data analysis, etc.), many others do. However, it isn’t quite feasible yet to train LLMs (or even SLMs) on a daily basis.

For this, a newer technique can help: Retrieval-Augmented Generation (RAG). It consists of injecting a smaller dataset into the LLM’s context in order to provide it with more specific (and/or current) information. With RAG, the LLM isn’t better trained; it still has all the general training it had before. But now, before it generates the output, it receives new information to use.

INFO

RAG enhances the LLM’s context, providing it with a more comprehensive understanding of the topic.

For RAG to work well, data must be prepared/formatted in a way that the LLM can properly digest it. Setting it up is a multi-step process:

  1. Retrieval
    Query external data (such as web pages, knowledge bases, and databases).
  2. Pre-Processing
    Information undergoes pre-processing, including tokenization, stemming, and removal of stop words.
  3. Grounded Generation
    The pre-processed retrieved information is then incorporated into the context given to the pre-trained LLM, so its output is grounded in that data.

RAG first retrieves relevant information from a database using a query generated by the LLM. Integrating RAG with an LLM enhances its context, providing it with a more comprehensive understanding of the topic. This augmented context enables the LLM to generate more precise, informative, and engaging responses.
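
The three steps above can be sketched in a few lines. This is a simplified illustration and not a production pipeline: retrieval here ranks documents by shared keywords rather than by vector similarity, the stemmer only strips a trailing “s”, and the final call to an actual LLM is left out. The documents, the stop-word list, and every function name are assumptions made for the example.

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "what", "how", "for", "do", "i"}


def preprocess(text):
    """Tokenize, drop stop words, and apply a very naive stemmer (strip a trailing 's')."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in kept]


def retrieve(question, documents, k=1):
    """Return the k documents sharing the most pre-processed tokens with the question."""
    q = set(preprocess(question))
    ranked = sorted(documents, key=lambda d: len(q & set(preprocess(d))), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Grounding step: place the retrieved text in the prompt the LLM would receive."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping to Europe usually needs 7 to 10 days.",
]

prompt = build_prompt("How long do refunds take?", docs)
print(prompt)
```

Updating what the model “knows” then only requires editing the entries in `docs`; no retraining is involved.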

Since it provides access to fresh information via easy-to-update database records, this approach is best suited for data-driven responses. Because this data is context-focused, it also improves factual accuracy. Think of RAG as a tool to turn your LLM from a generalist into a specialist.

Enhancing an LLM’s context through RAG is particularly useful for chatbots, assistants, agents, or other usages where the output quality is directly connected to domain knowledge. But, while RAG is the strategy to collect and “inject” data into the language model’s context, that data first has to be put in a form the model can work with, which is why its meaning needs to be embedded.

Embedding

To make data digestible by the LLM, we need to capture each entry’s semantic meaning so the language model can form the patterns and establish the relationships. This process is called embedding, and it works by creating a static vector representation of the data. Different models produce embeddings with different numbers of dimensions and, therefore, different levels of precision. For example, you can have embeddings from 384 dimensions all the way to 3072.

In other words, in comparison to our Cartesian coordinates in a map (e.g., [28.3772, 81.5707]) with only two dimensions, an embedded entry for an LLM has from 384 to 3072 dimensions.
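
To get a feel for the shape of an embedding, here is a toy function that turns any text into a fixed-length vector of 384 numbers. Be aware that it uses feature hashing, which captures no semantic meaning at all; real embeddings come from a trained model. The sketch only demonstrates the “static vector of N dimensions” part, and `toy_embedding` is a hypothetical name.

```python
import hashlib
import math


def toy_embedding(text, dimensions=384):
    """Map text to a fixed-length unit vector via feature hashing (not a learned embedding)."""
    vector = [0.0] * dimensions
    for token in text.lower().split():
        # Hash each word to a stable position in the vector
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dimensions
        vector[index] += 1.0
    # Normalize to length 1 so vectors are comparable by angle
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


vec = toy_embedding("hot chocolate is sweet and nice")
print(len(vec))  # 384
```

The same input always produces the same vector, which is what “static” means here, and the result can be stored in a vector database and compared with the similarity measures covered earlier.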

Let’s Build

I hope this helped you better understand what those terms mean and the processes that the term “AI” encompasses. This merely scratches the surface of complexity, though. We still need to talk about AI Agents and how all these approaches intertwine to create richer experiences. Perhaps we can do that in a later article. Let me know in the comments if you’d like that!

Meanwhile, let me know your thoughts and what you build with this!


Smashing Editorial (il)