Towards the AI Transition
In the wake of our third AI forum, which delved into the cutting-edge topic of “Chat GPT In Action,” we are happy to share the “Towards the AI Transition” article written by Šimun Strukan, who held the lecture mentioned above at COTRUGLI Business School.
As the AI transition approaches, many questions, fears, and predictions arise about how this upcoming innovation will impact our society. Speculations range from extreme scenarios, with popular ones envisioning the end of humanity, massive job losses, and dystopian futures.
While some fears from these extremes are not entirely unfounded, it is essential to approach all AI-related questions critically to avoid succumbing to fears that may cloud our vision of the positive potential of this technology.
Many compare this year to a pivotal moment, akin to Pandora’s Box being opened with the release of ChatGPT. However, let us remember that when Pandora unleashed the world’s evils among humans, she quickly closed the box in horror, capturing what lay at the bottom – hope.
Mathematics or Magic?
Behind every technology ever created by humans lies mathematics. At some point, someone sat at a table and jotted down fundamental mathematical equations that clearly and deterministically described their envisioned process.
Such a scientifically creative process usually results in a scientific paper explaining the concepts in a way understandable to the entire scientific community, regardless of cultural background or language spoken. Over time and continuous scrutiny, this new scientific concept propagates into the engineering community, which then finds ways to transform scientific discovery into a new technology, product, or innovation. This process can take years, decades, or even centuries.
Today’s most attractive version of artificial intelligence is powered by a concept known as a neural network. This concept has gone through an 80-year journey, undergoing the scientific-engineering process described above, eventually becoming what we now know as ChatGPT. Surprisingly, the concept of virtual neural networks originated in 1943 when neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a scientific paper with the idea that digital neurons could simulate the function of human neurons.
Bernard Widrow and Marcian Hoff from Stanford developed models named “ADALINE” and “MADALINE,” which finally confirmed the theory after over 15 years of experimentation, including several attempts and failures.
MADALINE was the first practical neural network applied as an adaptive filter that eliminated echoes on telephone lines. With this confirmation, the idea of computers simulating human neurons arranged in neural networks took its present form.
From there, it was a matter of computation and computing power. While many phenomenal discoveries in artificial intelligence occurred in the second half of the 20th century, the primary insurmountable barrier until the beginning of the 21st century was the need for sufficiently large and powerful hardware architecture to support the envisioned neural networks.
For illustration, the latest model, ChatGPT-4, released by OpenAI, has around 100,000,000 quadrillion parameters. That’s 10 to the power of 14, with 14 zeros!
To operate such a model at its current capacity, specialized “graphic” cards, several dozen of them, each costing around $15,000, are needed. Running such a system can cost several million dollars per day in maintenance, according to some estimates. Information about this is relatively obscure because OpenAI, for understandable competitive reasons, neither entirely denies nor confirms such data.
How Does ChatGPT Work?
It is understandable to say that the field of artificial intelligence has evolved into a complex branch of science and industry. To provide a minimally symbolic explanation of the concepts powering a chatbot like ChatGPT, we need to go through a brief index of terms generated by ChatGPT:
- Linear Algebra: This branch of mathematics deals with vectors, vector spaces, and linear transformations between such areas. In the context of ChatGPT, linear algebra is used to perform various operations, such as matrix multiplication, which are essential in the training and inference processes of neural networks.
- Vector: In the context of mathematics, a vector is an object that has magnitude and direction. In the context of neural networks and ChatGPT, vectors are used to transform words or expressions into a format that the model can understand and process. A vector is a coding system or language the computer uses to interpret the world. For example, each word that ChatGPT encounters is transformed into a vector – a unique numerical “fingerprint” that the model can understand. In this way, ChatGPT processes and understands the words and phrases provided to it.
- Embedding: In the context of neural networks, embedding refers to mapping instances from an input space to a vector space. In ChatGPT, each word is mapped to a high-dimensional vector space (embedding space), which facilitates the neural network’s manipulation and processing of data.
- Token: Tokens are the basic processing units in models like ChatGPT. When ChatGPT reads text, it first breaks into smaller units called tokens. A token can be a letter, a word, or a punctuation mark. For example, the sentence “ChatGPT is interesting.” would be broken down into the following tokens: [“ChatGPT”, ” “, “is”, ” “, “interesting”, “.”]. After that, ChatGPT uses vectors to represent these tokens in numeric form, allowing for mathematical processing and response generation.
- Training Model – This term refers to the process of “training” a model like ChatGPT based on a large dataset. During training, the model adjusts its parameters to minimize the difference between its predictions and the actual values.
- Inference: Once the model is trained, it utilizes inference to make “predictions” or “conclusions” based on new, previously unseen input data. In the context of ChatGPT, inference refers to generating responses based on the input message.
- Neural Network Parameter: Parameters are variables that the model, such as ChatGPT, learns during training. These parameters (such as weights and biases) are adjusted to help the model better fit the data.
- Neural Network: A neural network is a computer model that simulates how the human brain analyzes and processes information. It consists of interconnected nodes (“neurons”) that exchange information. ChatGPT is an example of a deep neural network.
- GPT (Generative Pre-training Transformer) – GPT is an architecture developed by OpenAI. This architecture employs a pre-training method for text generation and then fine-tunes it for specific tasks.
- LLM (Language Learning Model) – This term describes models like ChatGPT that learn from text to generate natural human language. These models learn to recognize patterns in language data on which they are trained, enabling them to produce authentic textual responses.
The first thing every ChatGPT user sees is the system’s most straightforward part. The web interface is similar to any other web chat system, like Facebook Messenger or WhatsApp. Interestingly, the LLM (Language Learning Model) at the core of ChatGPT does not actually remember your interactions from one query to another. In IT terms, it is said to be stateless, treating each query to the model as a new, unrelated request without any knowledge of previous queries.
Each query’s context is programmatically “attached” as a header to your request. This context is crucial for the model to follow the conversation. However, it is essential to note that the context size is limited, depending on the model. For example, GPT-3.5 has a limitation of a maximum of 16,000 tokens per query. As previously mentioned, the core of ChatGPT is a Large Language Model, making it challenging to explain clearly how such a model functions without delving into complex mathematics.
Therefore, I will clarify two fundamental principles that people commonly request.
The model does not possess consciousness, emotions, or classical thinking mechanisms. It also “observes the world” one word at a time, meaning that LLM calculates the sentence it wants to write word by word, not knowing how the sentence will end.
This is a considerably different process from how humans construct most of their sentences, hence the phrase “Think before you speak.”
LLM is, in essence, a massive statistical gambling machine. Like casino poker, it calculates the probability of getting the desired word in response to your query. It could be said that while a casino gambling machine is designed to be highly against you, making it very difficult to get the outcome of cards or numbers you want, AI models, specifically this LLM, are almost identical machines but with all outcomes set in your favor.
In other words, inputting a set of characters into LLM will always return the expected combination, precisely the set of words you would anticipate. During the training of models like GPT LLM, the information used for training is not stored. As I mentioned, your conversation history and personal authentication data are kept within the system, but not within the model itself, just like on Facebook or any other application.
Therefore, when interacting with the model, the company using it may retain the data you enter, and it is their responsibility to handle that data. However, the data used to teach the model to recognize specific words is not stored anywhere inside a neural network. To further explain, we can use the analogy of classic human learning, such as learning multiplication tables. Although most people with a basic education know that seven times eight is 56, no one can recall any specific task they solved while learning, nor information about the book from which they learned, nor probably any details about the learning process itself.
However, the knowledge remains stored in our memory, in our natural neural network, the brain. In the same way, knowledge is stored in the virtual network – tasks are solved, and the experience of solving assignments and the knowledge derived from that experience remains. Of course, the question arises about how exactly this is done, and the answer is also the answer to the title of the previous chapter: mathematics.
Changes and Dangers AI Transition Brings
Let’s address the elephant in the room right away – the most dramatic predictions about the impacts of artificial intelligence are often the most science-fiction-based and least grounded in science. This is not to downplay the dangers of AI transition but to emphasize that the most significant harm that such sensational theories cause is that they divert attention from real threats.
The most apparent danger lies in the near future when major powers like China or the US will possess AI weapons in their arsenal. These weapons won’t be kinetic but automated software that can exponentially speed up various cyber-attack vectors. In hacker circles, it has long been proven that cyber warfare is several times more destructive than nuclear warfare.
To dispel any doubts, acquaint yourself with STUXNET, known colloquially as the “first cyber weapon in history.”
This weapon sabotaged the Iranian nuclear program with a cyber-attack in 2007. No one has claimed responsibility for the attack (although certain Western forces are suspected), making it the first evidence of systematic use of cyber tools for direct military purposes, with high efficiency.
Automated Phishing systems in the Deep Web have been available for at least two years. To execute a successful phishing attack on any demographic population anywhere in the world, all I need is a crypto wallet to pay for a subscription, and in just a few minutes, I can carry out a hacking attack on several million people.
The system can automate the communication with each attack target (in this case, any of us) and process millions of emails and targets daily without human intervention. The illustration I want to depict shows the malign use of AI technology as one of the greatest dangers ahead of us.
Here’s a disturbing idea – combine technologies like Stable diffusion or Midourney with child pornography. Let’s leave it at that.
While superintelligence sounds cool and generates clicks, the reality is that we are still a few decades away from it. Although modern AI is impressive, anyone who has used it extensively in their work quickly realizes that there is plenty of room for improvement.
I’m not trying to diminish global impacts on the market and the predictions of other experts, but if history has taught me anything, it’s that if there is a global fear that everyone dreads, that topic usually receives enough public attention to prevent it from happening. The lack of nuclear war that has yet to happen can serve as an example. However, narrow attention motivated by fear often closes our eyes to the dangers lurking on the horizon, and we only notice them once it is too late. Hence my initial reference to the story of Pandora.
Indeed, it is crucial to acknowledge the potential dangers brought about by the AI transition. However, we must avoid succumbing to blind panic fueled by sensationalism, as doing so might cause us to overlook the solutions that could help us overcome our fears.