
A beginner’s guide to Machine Learning and Large Language Models


29 Aug 2024
Shuab Kunwar, Partner, R&D Incentives Advisory

    Machine Learning and Large Language Models are currently a hot topic, and it is clear they are here to stay in the software industry. Businesses now have a unique opportunity to utilise the power of artificial intelligence across their operations.

    At Evelyn Partners, we thought it would be beneficial to prepare a simple guide to the concepts of Machine Learning and Large Language Models, as well as some of the key terms used when these are discussed.

    What is Machine Learning?

    Machine learning (ML) is a field within artificial intelligence (AI) that enables computers to learn from historical data and improve their performance over time without the need for explicit programming. By feeding data to algorithms, ML allows these systems to identify patterns and make predictions based on the data they receive. ML can be applied across many different fields, e.g. natural language processing, computer vision, speech recognition, email filtering and the resolution of more general business problems.

    Four key concepts in Machine Learning:

    1. Algorithms: These are the rules or processes the computer follows to learn from data to fulfil a specific task. Different algorithms are used for different types of problems
    2. Training Data: This is the data used to teach the algorithm. It includes both the inputs (e.g., pictures of shapes) and the correct outputs (e.g., the names of the shapes)
    3. Model: This is the end result of the training process. In essence, this is a program that can make predictions or decisions without being explicitly programmed to perform each individual task
    4. Prediction: Once the model is trained, it can make predictions based on new data it hasn't seen before
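
    To make these four concepts concrete, here is a minimal sketch using the scikit-learn library (our choice purely for illustration; the shape data is invented). The algorithm is a decision tree, the training data pairs simple shape measurements with their names, the model is the fitted estimator, and the prediction is made on a new input.

        from sklearn.tree import DecisionTreeClassifier

        # Training data: inputs (number of corners, number of curved edges)
        # together with the correct outputs (the names of the shapes)
        X_train = [[3, 0], [4, 0], [0, 1]]
        y_train = ["triangle", "square", "circle"]

        # Algorithm: a decision tree, chosen here purely for illustration
        algorithm = DecisionTreeClassifier()

        # Model: the end result of training the algorithm on the data
        model = algorithm.fit(X_train, y_train)

        # Prediction: the model classifies a new four-cornered shape
        print(model.predict([[4, 0]]))  # -> ['square']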

    Types of Machine Learning approaches:

    1. Supervised learning: The algorithm is trained on labelled data, meaning the input comes with the correct answer. For example, predicting house prices based on features like size and location. There are a variety of supervised algorithms available which all have their own pros and cons. Some examples of these are:
      1. Naïve Bayes
      2. Linear regression
      3. Logistic regression
      4. Support-vector machines
      5. Decision trees
    2. Unsupervised learning: The algorithm is trained on unlabelled data, and it tries to find patterns or groupings on its own without any human intervention. For example, clustering customers into different segments based on purchasing behaviour (see the clustering sketch after this list). Some of the most popular algorithms in unsupervised learning include:
      1. Clustering, e.g. hierarchical clustering
      2. Anomaly detection, e.g. Isolation Forest
      3. Latent variable models, e.g. blind signal separation techniques
    3. Reinforcement learning: The algorithm learns by interacting with an environment and receiving feedback in the form of rewards or punishments. For example, training a robot to navigate a maze (see the Q-learning sketch after this list).

      This differs from supervised learning in not needing labelled input/output pairs to be presented and not needing sub-optimal actions to be corrected. The focus of this algorithm is on finding a balance between exploration (of uncharted territory) and exploitation (of existing knowledge). The goal is to maximise the long-term reward, where feedback may be incomplete or delayed.

      This algorithm is utilised in many disciplines, e.g. game theory, control theory, operations research and multi-agent systems. The basic algorithm is modelled as a Markov decision process, which provides a mathematical framework for modelling decision-making in situations where the outcome is partly random and partly under the control of a decision maker.
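
    For the unsupervised learning example in point 2 above, here is a minimal clustering sketch, assuming scikit-learn and entirely made-up purchasing figures. K-means groups customers by spend and visit frequency without ever being told what the segments are.

        from sklearn.cluster import KMeans

        # Unlabelled data: (annual spend in £, visits per month) for six customers
        customers = [[120, 1], [150, 2], [900, 8], [950, 9], [80, 1], [1000, 10]]

        # Ask the algorithm to find two segments on its own
        kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
        labels = kmeans.fit_predict(customers)

        print(labels)  # e.g. [0 0 1 1 0 1]: a low-spend and a high-spend segment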
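
    For the reinforcement learning example in point 3, here is a toy Q-learning sketch (Q-learning is a standard reinforcement learning algorithm; the corridor environment is invented for illustration). An agent in a five-cell corridor learns, by trial, error and reward, that moving right leads to the goal, balancing exploration against exploitation via the epsilon parameter.

        import random

        # A 1-D corridor of 5 cells; the agent starts at cell 0 and
        # earns a reward of 1 for reaching cell 4
        n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
        Q = [[0.0] * n_actions for _ in range(n_states)]
        alpha, gamma, epsilon = 0.5, 0.9, 0.3  # learning rate, discount, exploration rate

        for _ in range(200):
            state = 0
            while state != 4:
                # Exploration vs exploitation: occasionally try a random action
                if random.random() < epsilon:
                    action = random.randrange(n_actions)
                else:
                    action = max(range(n_actions), key=lambda a: Q[state][a])
                next_state = max(0, state - 1) if action == 0 else min(4, state + 1)
                reward = 1.0 if next_state == 4 else 0.0
                # Q-learning update: nudge the estimate towards
                # reward + discounted best future value
                Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
                state = next_state

        for s in range(4):
            print(s, Q[s])  # the 'right' value (index 1) should dominate in every cell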

    Deep Learning

    Deep Learning is a subset of ML inspired by the structure and function of the human brain. Deep learning uses artificial neural networks with multiple layers to progressively extract higher-level features from raw input.

    For example, in image recognition, lower layers might identify edges, while higher layers might identify concepts relevant to a human such as digits or letters or faces. Deep learning models can handle large amounts of unstructured data and have achieved breakthrough results in areas such as computer vision and natural language processing.

    Deep learning can be applied across supervised, unsupervised and reinforcement learning approaches, enhancing their capabilities in handling complex, high-dimensional data and discovering intricate patterns; it is particularly powerful in unsupervised contexts, where it can uncover structure that would be hard to specify by hand. Some popular deep learning architectures include:

    1. Convolutional Neural Networks (CNNs): Primarily used for image and video recognition tasks (see the sketch after this list)
    2. Recurrent Neural Networks (RNNs): Useful for sequential data like text or time series
    3. Transformers: A more recent architecture that has revolutionized natural language processing tasks
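
    As a minimal sketch of the layered idea, assuming the PyTorch library: each layer of this small CNN transforms the previous layer's output, so early layers respond to simple local patterns such as edges, while later layers combine them into higher-level features.

        import torch
        import torch.nn as nn

        # A small convolutional network for 28x28 greyscale images (e.g. handwritten digits)
        model = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # lower layer: simple local patterns
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample to 14x14
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # higher layer: combinations of patterns
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample to 7x7
            nn.Flatten(),
            nn.Linear(16 * 7 * 7, 10),                   # final layer: scores for 10 classes
        )

        scores = model(torch.randn(1, 1, 28, 28))  # pass one dummy image through the network
        print(scores.shape)                        # torch.Size([1, 10])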


    What are Large Language Models and how do they work?

    Large Language Models (LLMs) are a type of advanced deep learning model designed to understand and generate human language and to perform other natural language processing tasks. They are trained on huge datasets of text and can handle a wide range of language-related tasks. LLMs are built on a specific type of neural network called a transformer model.

    Transformer models possess the capability to learn context, which is especially important for human language. These models employ a mathematical technique called self-attention to detect subtle ways that elements in a sequence can relate to each other. This allows the models to better understand context than other types of Machine Learning models.
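
    A minimal sketch of the self-attention calculation, using NumPy and randomly initialised weights purely for illustration: each position in the sequence scores its relevance to every other position, and those scores weight a blend of the values, which is what lets the model take context into account.

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        def self_attention(X, Wq, Wk, Wv):
            # Project the same sequence into queries, keys and values
            Q, K, V = X @ Wq, X @ Wk, X @ Wv
            # Each position scores its relevance to every other position,
            # scaled by the square root of the key dimension for stability
            scores = Q @ K.T / np.sqrt(K.shape[-1])
            weights = softmax(scores)   # each row sums to 1
            return weights @ V          # context-aware blend of the values

        rng = np.random.default_rng(0)
        X = rng.normal(size=(4, 8))                 # a sequence of 4 token embeddings
        Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
        print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)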

    Put simply, an LLM is a program that has been fed enough examples to be able to recognise and interpret human language or other types of complex data. Many LLMs are trained on data gathered from the internet, often thousands or millions of gigabytes' worth of text.

    LLMs utilise deep learning to understand how characters, words and sentences function together. This involves the probabilistic analysis of unstructured data, which eventually enables the deep learning model to recognise distinctions between pieces of content without the need for human intervention. The parameters within the LLM can then be further trained or tuned, including via fine-tuning or prompt tuning, to fulfil the particular task the programmer requires, e.g. interpreting questions and generating a response, or translating text into different languages.

    The parameters of the model are the internal configurations that are adjusted during training. The models will have millions, or even billions, of parameters, making them hugely powerful but also resource intensive from a computational perspective.
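
    To make "parameters" concrete, here is a quick count for a single dense layer of the kind stacked many times inside an LLM (the 4,096 size is illustrative, not taken from any particular model): one layer alone holds nearly 17 million adjustable numbers.

        import torch.nn as nn

        layer = nn.Linear(4096, 4096)  # one dense layer, sized like those in a large model
        n_params = sum(p.numel() for p in layer.parameters())
        print(f"{n_params:,}")  # 16,781,312 = 4096*4096 weights + 4096 biases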

    Uses of Large Language Models

    Content generation

    LLMs can write articles, stories, and even code. They generate coherent and contextually relevant text based on a given prompt. From a coding perspective, LLMs can assist developers by generating snippets of code or explaining programming concepts. 
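
    As a sketch of how a developer might request code from an LLM, assuming the openai Python client is installed and an API key is configured (the model name is a placeholder; substitute whichever provider and model you actually use):

        from openai import OpenAI

        client = OpenAI()  # reads the OPENAI_API_KEY environment variable
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; use whichever model you have access to
            messages=[
                {"role": "user",
                 "content": "Write a Python function that checks whether a string is a palindrome."}
            ],
        )
        print(response.choices[0].message.content)  # the generated code snippet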

    Content rewriting

    Another capability of LLMs is content rewriting. They can rephrase or reword text while preserving the original meaning. Additionally, multimodal LLMs can enable the generation of text content enriched with images. For example, in an article about travel destinations, the model can automatically insert relevant images alongside the text descriptions.

    Language translation

    LLMs play a pivotal role in machine translation. They can break down language barriers by providing more accurate and context-aware translations between languages. For example, a multilingual LLM can seamlessly translate an Italian document into English while preserving the original content and nuances.

    Content summarisation

    LLMs excel at summarising lengthy text content, extracting key information and outputting concise summaries. This is valuable for quickly comprehending the main points of articles, research papers or news reports. This feature also has a beneficial use case for customer support agents, providing quick ticket summaries, boosting efficiency and improving customer experience.
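
    A minimal summarisation sketch, assuming the Hugging Face transformers library (the input file name is hypothetical): the pipeline downloads a default summarisation model on first use and condenses the text to a few sentences.

        from transformers import pipeline

        summariser = pipeline("summarization")  # downloads a default model on first use
        with open("support_ticket.txt") as f:   # hypothetical input file
            ticket = f.read()
        summary = summariser(ticket, max_length=60, min_length=20)
        print(summary[0]["summary_text"])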

    Sentiment analysis

    Businesses can utilise LLMs to gauge public sentiment on social media and from customer reviews. This facilitates market research and brand management by providing insights into customers' opinions.
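
    The same pipeline helper from the transformers library covers sentiment analysis; a minimal sketch with made-up reviews:

        from transformers import pipeline

        classifier = pipeline("sentiment-analysis")  # downloads a default model on first use
        reviews = [
            "Delivery was quick and the product works perfectly.",
            "Support never replied and the app keeps crashing.",
        ]
        for review, result in zip(reviews, classifier(reviews)):
            print(result["label"], round(result["score"], 3), "-", review)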

    Chatbots and conversational AI

    LLMs empower conversational AI and chatbots to engage with users in a natural and human-like manner. These models can hold text-based conversations with users, answer questions and provide assistance.

    Different types of LLM models

    LLMs can be categorised into several types based on their training approach and capabilities:

    Zero-shot

    These are standard LLMs trained on generic data to provide reasonably accurate results for more general use cases. These do not necessitate additional training and are ready for immediate use.

    Few-shot

    These LLMs are capable of learning from a small number of examples provided in the prompt. Unlike zero-shot models that work without any specific examples, few-shot models can adapt to specific tasks or domains with just a handful of demonstrations. This approach bridges the gap between zero-shot and fine-tuned models, offering improved performance on specific tasks without the need for extensive retraining.
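
    A sketch of a few-shot prompt (the tickets and categories are invented): the examples inside the prompt itself teach the model the task, with no retraining required.

        # A few-shot prompt: worked examples, then the case the model should complete
        prompt = """Classify each support ticket as Billing, Technical or Other.

        Ticket: "I was charged twice this month."
        Category: Billing

        Ticket: "The dashboard will not load in Chrome."
        Category: Technical

        Ticket: "Where is your head office?"
        Category:"""

        # Sent to any instruction-following LLM, the expected completion is "Other".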

    Fine-tuned or domain specific

    Fine-tuned models go a step further, receiving additional training to enhance the effectiveness of the initial zero-shot model. An example is OpenAI Codex, a model built on the foundation of GPT-3 and employed as an auto-completion programming tool. These are also known as specialised LLMs.

    Language representation

    These leverage deep learning techniques and transformers, the architectural basis of generative AI. These models are well suited to natural language processing tasks, enabling language to be converted into various mediums, e.g. written text.

    Multimodal

    These LLMs possess the capability to handle both text and images. An example of this would be GPT-4V, which is capable of processing and generating content in multiple modalities.

    Advantages and limitations of using LLMs:

    Advantages

    • Efficiency: LLMs automate tasks that involve the analysis of data, reducing the need for manual intervention and speeding up processes
    • Scalability: The models can be scaled to handle extremely large data sets, making them adaptable to a wide range of applications
    • Performance: Newer LLMs are known for their exceptional performance and accuracy, characterised by the capability to produce swift, low-latency responses
    • Customisation flexibility: LLMs offer a robust foundation that can be tailored to meet specific uses. Through additional training and fine-tuning, businesses can customise these models to precisely align with their unique requirements
    • Multilingual support: LLMs work with multiple languages, enhancing global communication and information access
    • Improved user experience: The models enhance user interactions with chatbots, virtual assistants and search engines, providing more meaningful and context-aware responses

    Limitations

    • Data privacy: Handling large amounts of data raises concerns about privacy and security and necessitates robust privacy measures to protect user information
    • Bias: LLMs can inherit and perpetuate biases present in the training data, leading to unfair or discriminatory outcomes. This includes generating harmful, misleading or inappropriate content, raising ethical concerns
    • Resource intensive: Implementing LLMs requires significant computational power and resources e.g., investment in expensive GPU hardware and datasets to support the training process
    • Glitch tokens: Anomalous tokens and maliciously crafted prompts have the potential to disrupt the functionality of LLMs, highlighting the importance of robust security measures in LLM deployment
    • Black box nature: Many LLMs, especially the more complex ones, operate as "black boxes," meaning their decision-making processes are not easily interpretable or explainable. This lack of transparency can be problematic in applications where understanding the reasoning behind a model's output is crucial, such as in healthcare or finance. It also makes it challenging to debug issues or ensure that the model is making decisions based on relevant factors

    Harnessing AI for business success

    Machine Learning algorithms and large language models are revolutionising various industries by enabling computers to learn from data and understand human language. They represent a transformative leap in AI fuelled by their immense scale, performance and deep learning capabilities.

    Whilst they offer many benefits, it's important to address challenges like data privacy, bias, ethical concerns and interpretability issues to ensure these technologies are used responsibly. Businesses must carefully evaluate these models against their specific use case, considering factors like inference speed, model and algorithm size, fine-tuning options and costs.

    By grasping the fundamentals of how they work, businesses can harness the immense potential of these models to drive innovation and efficiency in the AI world, transforming the way we interact with information and technology.

    R&D tax relief for Machine Learning and Large Language Models

    Many businesses are currently implementing both ML algorithms and LLMs within their existing enterprise architecture. Some of this work can involve experimentation, e.g. around the accuracy of models or the security of data, which goes above and beyond industry-standard knowledge or capability. Therefore, some of these activities may be eligible for R&D tax relief.

    Our software R&D tax team comprises industry-experienced developers who have worked in this space. We can help identify both obvious and non-obvious R&D activities, and ensure that any claims made for activities in this area are robust.

    If you would like to discuss further whether the activities you are undertaking in this space could be eligible, please get in touch.

    Speak to an expert

    By necessity, this briefing can only provide a short overview and it is essential to seek professional advice before applying the contents of this article. This briefing does not constitute advice nor a recommendation relating to the acquisition or disposal of investments. No responsibility can be taken for any loss arising from action taken or refrained from on the basis of this publication.

    Tax legislation

    Tax legislation is that prevailing at the time, is subject to change without notice and depends on individual circumstances. You should always seek appropriate tax advice before making decisions. HMRC Tax Year 2023/24.

    NTEH70824125