Large Language Models explained:

What are they and why are they the most talked-about AI trend?

 

Large Language Models (LLMs) are advanced artificial intelligence systems built with deep learning techniques* and specifically engineered to understand and generate human language, often referred to as natural language.

These models are characterized by their vast size, consisting of tens to hundreds of billions of parameters**, which enable them to learn intricate patterns and nuances in language. By training on a massive and varied text database, they acquire an understanding of context, semantics, and grammar, allowing them to perform tasks such as language translation, text summarization, content generation, and more, with remarkable fluency and coherence.

Being able to leverage language makes this technology incredibly powerful for various applications!

Many digital tools have been created thanks to LLMs since they rose in popularity with the launch of OpenAI’s ChatGPT. They have revolutionized the way businesses operate and make decisions — it all happened really fast, and this is only the beginning!

In this article, we’ll review the advantages that drove the quick rise of LLMs in business, then give you a quick guide to understanding LLMs (how do they work, and why did ChatGPT become so famous?). Lastly, we’ll cover the three options for implementing LLMs (on-cloud, on-premises, and hybrid solutions).

IN-DEPTH

* Deep learning techniques are a subset of machine learning methods that involve neural networks with multiple layers of artificial neurons (hence “deep”). An artificial neuron is a mathematical function that takes numerical inputs, multiplies each input by a weight (which signifies the importance or influence of that input on the neuron’s output), sums the results, and then passes the sum through an activation function to produce an output.
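To make the definition concrete, here is a minimal sketch of a single artificial neuron in Python. The sigmoid activation and the numeric values are our own assumptions for illustration; real networks use many neurons and a variety of activation functions.

```python
import math

def sigmoid(z: float) -> float:
    """Assumed activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float]) -> float:
    """Weight each input, sum the results, then apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(weighted_sum)

# Illustrative values only: the second input carries the largest weight
# (in absolute value), so it influences the output the most.
output = neuron(inputs=[1.0, 2.0, 3.0], weights=[0.5, -1.0, 0.25])
print(round(output, 4))
```

Changing a weight changes how strongly the corresponding input pushes the output up or down, which is exactly what training adjusts.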

The output of one layer serves as the input to the next layer, allowing neural networks to capture complex relationships and patterns in data. The process of training a neural network involves adjusting the weights of the neurons to minimize errors and enable the network to make accurate predictions or classifications for various tasks, such as image recognition or natural language processing.
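As a rough illustration of what “adjusting the weights to minimize errors” means, the sketch below trains a single weight with gradient descent. The toy data, the target rule y = 2x, and the learning rate are hypothetical choices; real networks adjust billions of weights with the same basic idea.

```python
# Toy training data following the hypothetical rule y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # start from an uninformed weight
learning_rate = 0.05  # assumed step size

for _ in range(200):
    # Gradient of the mean squared error with respect to the weight.
    gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    # Move the weight a small step in the direction that reduces the error.
    weight -= learning_rate * gradient

print(round(weight, 3))  # the weight converges toward 2.0
```

Each pass nudges the weight so that the predictions get closer to the expected outputs, until the error is negligible.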

** Parameters are the variables that the model uses to make predictions or decisions. They can be learned by the model during the training process or set through an optimization procedure to influence the model’s performance.

 
 
 

The Rise of LLMs in Business

The adoption of LLMs in the business world has been nothing short of transformative. LLMs have made several key advancements possible, which in turn have contributed to their rising adoption in business contexts:

Data Exploitation

The digital era has ushered in unprecedented data generation. Companies accumulate vast amounts of textual data that can be exploited, from customer interactions to market research reports. By combining a search engine (which retrieves the documents relevant to the user query) with an LLM (which answers the query in natural language), it is now possible to sift through an internal knowledge base and extract valuable insights from its data.
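The pairing of a search engine with an LLM can be sketched as follows. This is a toy illustration under our own assumptions: the word-overlap ranking stands in for a real search engine, the knowledge base is made up, and the final call to an LLM is deliberately left out.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy search engine: rank documents by the number of words shared with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Hypothetical internal knowledge base.
knowledge_base = [
    "Refunds are processed within 14 days of the return request.",
    "Our offices are closed on public holidays.",
    "Return requests must be filed through the customer portal.",
]
question = "How long do refunds take after a return request?"
prompt = build_prompt(question, retrieve(question, knowledge_base))
print(prompt)  # this text would then be sent to the LLM of your choice
```

The LLM only sees the few passages the search step selected, which keeps the answer grounded in the company’s own documents.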

Automation and Efficiency

LLMs enable the automation of tasks that once required human intervention. They can draft emails, generate reports, answer customer inquiries, and even assist in legal research. They can help us analyze existing workflows, reduce bottlenecks, and suggest improvements. Whether it’s supply chain management, logistics, or customer service, these automations streamline processes, reduce human error, and free up valuable human resources for more strategic tasks.

Enhanced Decision-Making

When provided with historical data and real-time information, LLMs can generate insights that inform strategic choices, market predictions, and risk assessments. These models empower businesses to make data-driven decisions with greater speed and accuracy thanks to the vast amount of information they can process, and their ability to summarize complex textual data.

Improved Customer Experience

The most immediate use of LLMs is also the most famous: chatbots. Revolutionized by LLMs, chatbots can now provide instant responses to customer inquiries, offer recommendations, and maintain consistent and helpful communication, ultimately boosting customer satisfaction and loyalty. This support can also be provided internally to customer service staff, helping with swift issue resolution and making information accessible even to less experienced teams handling first-level support.

Innovation and Creativity

LLMs are very good at generating creative content, such as product descriptions, marketing copy, and even art. This creativity opens up new avenues for branding and content marketing, where fresh and relevant content is crucial for audience engagement and brand visibility.

Competitive Advantage

Thanks to all the above-described transformative advantages they offer, companies that harness the power of LLMs gain a competitive edge. They can stay ahead of market trends, tailor their marketing strategies, make better decisions, and adapt to changing customer preferences more effectively, positioning themselves for long-term success.

 
 
 

Understanding LLMs

How do LLMs work?

The way LLMs learn to use language is quite remarkable and differs from traditional programming approaches. Human developers couldn’t possibly predict and code every single question that you might ask a model like ChatGPT. Instead, these models rely on a vast amount of text data to learn patterns and associations between words, phrases, and concepts.

 

This diagram is representative of generative models (GPT-like), but can also apply to non-generative models (BERT-like) up to fine-tuning. We explain more about these differences below. We also go more in-depth about foundation models in this article:
https://tinyurl.com/foundation-models-in-nlp

 

In summary, LLMs learn language by analyzing vast amounts of text data in a self-supervised manner, identifying patterns and relationships in that data, and then fine-tuning their internal parameters to perform specific language-related tasks. Additionally, LLMs can learn and adapt to new information after the initial training, by being updated with additional data and supervised learning methods, to steer toward the desired behavior.
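To show why this kind of learning is called self-supervised, here is a minimal sketch of how raw text can be turned into training examples without any human labeling. It assumes the next-word prediction objective used by GPT-like models, and it splits on whitespace as a stand-in for real tokenization.

```python
def next_word_examples(text: str, context_size: int = 3) -> list[tuple[list[str], str]]:
    """Build (context, next word) pairs directly from raw text."""
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        context = words[max(0, i - context_size):i]  # the words seen so far
        examples.append((context, words[i]))         # the word to predict
    return examples

examples = next_word_examples("the cat sat on the mat")
for context, target in examples:
    print(context, "->", target)
```

The “label” for each example is simply the next word of the text itself, which is why such models can learn from enormous amounts of unannotated data.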

Trained LLMs can answer a wide range of questions and generate text without explicit programming for each individual task, making them versatile and adaptable tools for various applications.

Why did ChatGPT become so famous?

ChatGPT represents a significant advancement in natural language processing and artificial intelligence compared to previous technologies. Behind the famous chatbot’s façade are OpenAI’s foundation models: the first version was released in 2018, and each subsequent one has been bigger.

IN-DEPTH

Want to go deeper in understanding foundation models? Check out our dedicated article on Medium.

GPT-3 (released in 2020) is an LLM with vast common knowledge and understanding of grammar and was the backbone of InstructGPT, the predecessor to the well-known ChatGPT (released in 2022). Today, commercial users of ChatGPT can leverage both GPT-3.5 and the more powerful GPT-4 (released in mid-March 2023) through the API. GPT-4 is bigger than its predecessor, can receive images as input, and is better at reasoning and following instructions. The company is likely working on GPT-5 already.

ChatGPT has the merit of having made it easy to leverage the foundation model’s information through its well-known conversational approach. This was a necessary step to make LLMs accessible to the general public.

Another crucial factor in the spreading of this technology is the integration of plugins (external software modules that add functionalities). For instance, plugins exist for the interaction with web pages, for automating booking services, for precise math computation with external engines like Wolfram, and many other applications.

Thanks to these integrations and its remarkable ability to understand and generate natural language, ChatGPT opened the door to a whole new range of commercial applications: from serving as a virtual assistant in customer support to aiding in creative ideation. In this era of rapid technological advancement, ChatGPT positioned itself as the first, and hard-to-beat, LLM for the masses.

 
 
 
 

Do alternatives exist?

Yes! Multiple ChatGPT alternatives have been developed. There are several types of LLMs, each with its own unique architecture and purpose.

First, let’s make a distinction between generative and non-generative models. Generative models, like ChatGPT, can by design complete a given text input and generate plausible output text, token after token. This allows for chat exchanges in surprisingly fluent natural language, and for requesting tasks that weren’t covered specifically during the model’s training.
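The “token after token” idea can be illustrated with a deliberately tiny stand-in for a generative model. The sketch below counts which word follows which in a made-up corpus and always picks the most frequent continuation; a real LLM replaces these counts with a neural network, but the generation loop has the same shape.

```python
from collections import Counter, defaultdict

# Made-up corpus; whitespace-separated words play the role of tokens.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
words = corpus.split()

# For every word, count the words that follow it.
following: defaultdict[str, Counter] = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1

def generate(prompt: str, max_new_tokens: int = 4) -> str:
    """Extend the prompt one token at a time with the likeliest next word."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = following.get(tokens[-1])
        if not candidates:
            break  # nothing ever followed this word in the corpus
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

print(generate("the"))  # prints "the cat sat on the"
```

Each new token is chosen based on what has been produced so far, then appended to the text before the next one is predicted.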

By contrast, non-generative models can only “read” text, without “writing”. Google’s BERT is an example of this kind: it is pre-trained on guessing masked words. As their output is a limited number of prediction values, they are suitable for tasks like text classification (such as sentiment analysis) or the identification of key information in text and its classification into a set of predefined categories. This type of model must be fine-tuned for the desired tasks.
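To give an idea of what “a limited number of prediction values” looks like in a classification task, here is a small sketch. The raw scores are made up to stand in for a model’s outputs, and converting them to probabilities with softmax is a common convention that we assume here rather than a detail of any specific model.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

labels = ["negative", "neutral", "positive"]  # predefined categories
scores = [0.2, 0.5, 2.3]                      # hypothetical model outputs

probabilities = softmax(scores)
predicted = labels[probabilities.index(max(probabilities))]
print(predicted)  # prints "positive"
```

The model outputs one value per category, and the category with the highest probability becomes the prediction.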

Second, we can make a distinction between models hidden behind proprietary APIs (like OpenAI’s ChatGPT, Google’s Bard, or Anthropic’s Claude), and those that are completely open source (like Meta’s LLaMA, TII’s Falcon, or Google’s BERT). For most of these models, both foundational and fine-tuned versions have been released.

While some proprietary models can be fine-tuned to some extent if you pay a premium for the API access, open-source models are, by nature, accessible for further fine-tuning. This means having total control over the model and its response generation, allowing us to customize its capabilities to the needs of each client and use case.

Keeping the entire process environment in-house can also increase control over the security and privacy of sensitive data, which matters to most companies and especially to their customers. However, handling open-source models has its pitfalls:

Not all of them are commercially usable (the same applies to open-source datasets);
They usually underperform in languages other than English;
Their computational requirements must be taken into consideration (hardware needs to be purchased or leased to run the model).

IN-DEPTH

Let’s take LLaMA as an example: it is a family of generative LLMs of various sizes, developed by Meta. They have a transformer-based architecture, similar to OpenAI’s GPTs. The first version (released in early 2023) wasn’t available for commercial use, but the second version is (since mid-2023) and it includes several technical improvements. Alpaca and Vicuna are respectively instruction and chat adaptations of LLaMA.


Custom LLMs

Lastly, some organizations and researchers develop custom LLMs tailored to their specific needs and use cases. These models can be trained on proprietary data or with specific objectives in mind.

Each of these LLMs may excel in different areas or have specific strengths, making them suitable for a wide range of natural language processing tasks. The choice of LLM depends on the specific requirements and goals of a given project or application.

Often, a large general-purpose LLM makes it possible to tackle many tasks and use cases at once, giving flexibility to quickly adapt to evolving needs. However, when running costs must be limited, a better option would be to fine-tune a smaller model on a high-quality specific dataset. This would maximize efficiency, without losing much in prediction performance.

 
 

Implementing LLMs

Another important differentiation factor between LLMs is where they are hosted: on cloud, or on-premises. Models behind proprietary APIs cannot be downloaded and hosted on-premises. Conversely, open-source models are typically hosted on-premises, because one of their main benefits is data confidentiality. In certain cases, the best option could be a combination of both.

Below are the main pros and cons for evaluating each solution. Keeping in mind the preliminary remark above, the “on-cloud solutions” refer only to the LLMs behind APIs, while “on-premises solutions” refer to the open-source LLMs.

• • •

On-Cloud solutions

1 These truly massive models hosted in huge data centers are great with generalist knowledge, as well as the understanding and generation of natural language.


The con: You lose internal access to the model, and the option to fine-tune it comes at an additional cost.

2 Cloud-hosted LLMs like ChatGPT are readily accessible from anywhere with an internet connection.


The con: Access to cloud-hosted LLMs relies on internet connectivity, which can be a limitation in many places.

3 Infrastructure and server management are not your concern: these activities are handled by the cloud provider, simplifying maintenance. Additionally, pay-as-you-go pricing models eliminate the need for significant upfront hardware investments, simplifying scalability and ensuring you immediately get the computing power needed for your tasks.


The con: While pay-as-you-go can be cost-effective for smaller businesses and startups, extended usage or unexpected resource requirements can lead to higher costs in the long term.

4 Cloud solutions often have data centers in multiple geographic regions, ensuring low-latency access for users around the world.


The con: Using a cloud-hosted LLM means your company’s and your customers’ sensitive data is sent to third parties through APIs, and this could cause serious privacy problems for your business!

• • •

On-premises solutions

1 On-premises solutions offer maximum control over data and security, crucial for industries with strict compliance requirements.


The con: Organizations will have to handle the maintenance of the running servers and occasional model updates, which can be resource-intensive and challenging for in-house staff without extensive technical expertise. In that case, the best alternative would be to get continued support from an external provider.

2 Local deployment can result in lower latency, providing faster response times. It also doesn’t depend on internet connectivity, making it suitable for secure, isolated environments.


The con: Expanding on-premises infrastructure can be slower and more costly compared to cloud-based scalability.

3 Organizations have full control over hardware, software, and configurations, allowing for tailored model implementations.


The con: On-premises setups typically require significant upfront investments in hardware, software, and IT resources. However, the pricing is fixed rather than pay-as-you-go, meaning that implementation costs will amortize in the long term.

4 On-premises solutions are more likely to be implemented with fixed costs (hardware, setup, and hand-off of custom software, etc) that will amortize in the long term.


The con: Pay-as-you-go solutions, more typical of cloud setups, often require a smaller initial investment and commitment.
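The trade-off between fixed and pay-as-you-go costs comes down to simple arithmetic. The sketch below estimates after how many months an on-premises setup becomes cheaper than a cloud one; all figures are hypothetical and only serve to illustrate the calculation.

```python
from typing import Optional

def break_even_months(
    upfront: float, on_prem_monthly: float, cloud_monthly: float
) -> Optional[float]:
    """Months after which the cumulative on-premises cost drops below the cloud cost."""
    monthly_saving = cloud_monthly - on_prem_monthly
    if monthly_saving <= 0:
        return None  # on-premises never catches up with the cloud option
    return upfront / monthly_saving

# Hypothetical figures: 60,000 upfront, 1,000 per month to run on-premises,
# against 3,500 per month of pay-as-you-go cloud usage.
months = break_even_months(upfront=60_000, on_prem_monthly=1_000, cloud_monthly=3_500)
print(months)  # prints 24.0
```

With these made-up numbers the upfront investment is recovered after two years; lighter usage pushes the break-even point further out, while heavier usage brings it closer.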

• • •

Hybrid solutions

In practical commercial applications, the LLM is part of a broader software suite that includes many components, such as data ingestion, a document parser, and a search engine. These important components can also be implemented through hybrid solutions, depending on the use case. For instance, it could make sense to use an on-premises search engine with an LLM hosted on cloud (or vice versa).

1 Data can be segmented: sensitive data can remain on-premises when addressing privacy concerns, meanwhile leveraging the cloud for scalable computations on non-sensitive data.

2 Organizations can reduce expenses in the long term by using cloud and on-premises resources in the most cost-effective way while scaling up.

3 The hybrid approach provides flexibility to adapt to changing needs and growth without sacrificing data control.



Is it all pros? Unfortunately, it isn’t. Implementing and managing a hybrid solution can be expensive and complex, requiring careful integration and coordination between on-premises and cloud components.
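The data segmentation idea behind hybrid setups can be sketched as a simple routing rule. This is an illustration under our own assumptions: the two patterns (email addresses and IBAN-like strings) are a hypothetical and far from complete definition of “sensitive”, and the backend names are placeholders.

```python
import re

# Hypothetical patterns for sensitive content; a real policy would be broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IBAN_LIKE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")

def choose_backend(text: str) -> str:
    """Keep requests with sensitive data on-premises, send the rest to the cloud."""
    if EMAIL.search(text) or IBAN_LIKE.search(text):
        return "on_premises"
    return "cloud"

print(choose_backend("Summarize the public press release"))  # prints "cloud"
print(choose_backend("Reply to jane.doe@example.com"))        # prints "on_premises"
```

Even a rule this small shows where the complexity comes from: every request has to be inspected and dispatched, and both backends have to be maintained.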

 
 

How to choose?

The biggest breakthrough with LLMs compared to previous approaches is that they make fine-tuning easier and faster. By exploiting their internal knowledge and general capabilities, they require less data. As a result, they make it considerably easier to address a wide range of use cases and accelerate the refinement of company processes.

Moreover, for certain specific tasks, LLMs exhibit remarkable performance even without any task-specific training data. This makes LLMs an exceptionally efficient choice for businesses seeking swift and effective solutions.

Many providers are coming up with different solutions: services like Microsoft Azure, for instance, have now raised the bar for enterprises concerned with data ownership. Meanwhile, at Artificialy, we choose to offer both the integration of third-party solutions and our own custom LLM: a multilingual model we fine-tuned from open-source projects and that is hosted entirely on-premises, ideal for use cases where data confidentiality is mandatory.

The choice of deployment method ultimately depends on an organization’s unique priorities, including data privacy, scalability needs, compliance requirements, and budget considerations. There are many things to keep in mind and a well-thought-out strategy that balances these factors is needed to choose a solution that leads to a successful and efficient implementation.

• • •

If you’re thinking about whether to leverage an LLM for your business, this article should have given you a pretty good idea of what your options are. By the way, a Large Language Model helped me write this!

In our next article, we will cover how to choose amongst these options in much more detail and give you some practical examples of how real businesses go about it. Follow us on LinkedIn to get a notification when it is published!

We are just an email away, so let’s have a chat! You can reach us at privategpt@artificialy.com and visit our company website www.artificialy.com

This article was written by Federico Magnolfi,
Machine Learning Engineer at Artificialy SA.


 