Course on Large Language Models
NOTE: You’re only meant to change code marked with “# TODO:”
Table of Contents
- Setting Up
- API Key Configuration
- Connecting to OpenAI API
- Exploring the API
- Creating Chat Completions
- Understanding Completion Parameters
- Prompt Engineering
- Crafting Effective Prompts
- Strategies and Best Practices
- Advanced Techniques
- Utilizing Embeddings
- Function Calling in LLMs
- Extras
- Creating an API key
- Local Development with LLMs
- Context Windows
- Fine-Tuning LLMs
Part 0: Setup
To use the OpenAI API you need to configure an API key; requests without a valid key are rejected. Remember not to commit this key to any repository or upload it anywhere public: OpenAI will disable the key if it is found, and others could use it to make requests that you or your organisation (Cogito) will pay for.
import os
from dotenv import load_dotenv
load_dotenv()
# Once you add your API key below, make sure to not share it with anyone! The API key should remain private.
OPENAI_API_KEY: str = os.getenv("OPENAI_API_KEY")
# There are many different models to try out: "gpt-4", "gpt-4-turbo-preview", "gpt-3.5-turbo"
MODEL_NAME: str = "gpt-3.5-turbo"
if not OPENAI_API_KEY:
    print("[ERROR] The key is not configured correctly")
else:
    print("[SUCCESS] API Key is configured correctly.")
from openai import OpenAI
client = OpenAI(
    api_key=OPENAI_API_KEY,
)
Part 1: API Connections (10 min)
completion = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex AI concepts with creative flair."},
        {"role": "user", "content": "Create a limerick about Large Language Models"}
    ]
)
print("The full response object from the language model:")
print(completion)
print("\nThe model's answer:")
print(completion.choices[0].message.content)
Part 2: Understanding Completion Parameters (15 min)
Key Parameters:
- Model Name: Specifies the particular model version you want to use (e.g., gpt-3.5-turbo or gpt-4). Different models have varying capabilities, sizes, and costs.
- Messages: The list of input text that you provide to the model. This is where the art of prompt engineering comes into play, guiding the model to generate the desired output.
- Temperature: Controls the randomness of the output. A higher temperature leads to more varied responses, while a lower temperature results in more deterministic outputs. It’s typically set between 0 and 2.
- Max Tokens: Determines the maximum length of the model’s response, measured in tokens (words or pieces of words). This helps control output verbosity.
- Top P: Influences sample diversity through nucleus sampling: the model only samples from the smallest set of tokens whose cumulative probability mass reaches P. Adjusting this can affect the creativity and relevance of the output.
- Frequency Penalty: Discourages repetition by penalizing words based on their frequency in the text so far. This can help generate more diverse and interesting responses.
- Presence Penalty: Similar to frequency penalty but penalizes based on the presence of words, encouraging the model to introduce new concepts and terms.
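Taken together, a single request that sets all of these parameters could look like the sketch below. This is only an illustration: the prompt and the parameter values are arbitrary placeholders, and the tasks that follow will have you vary them one at a time.
# Illustrative sketch only: combines the parameters above in one request.
# The prompt and all values are placeholders, not recommendations.
completion = client.chat.completions.create(
    model=MODEL_NAME,                  # which model to use
    messages=[                         # the conversation so far
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what a token is in one sentence."},
    ],
    temperature=0.7,                   # randomness of the sampling
    max_tokens=100,                    # upper bound on the response length
    top_p=1.0,                         # nucleus sampling threshold
    frequency_penalty=0.0,             # penalize frequently repeated tokens
    presence_penalty=0.0,              # encourage introducing new tokens
)
print(completion.choices[0].message.content)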
Task 2.1 Experimenting with Parameters
Now that you’re familiar with the parameters that can influence the behavior of LLMs, let’s put this knowledge to the test. Your task is to experiment with these parameters to see firsthand how they affect the model’s outputs.
Choose a Prompt: Start with a simple prompt, such as asking the model to write a short story about a space adventure.
# TODO: Fill in your own prompt
prompt: str = "Write a paragraph about a space adventure"
Task 2.2
Vary the Temperature: Generate three completions using temperatures of 0.0, 1.0, and 2.0. Observe how the creativity and variability of the responses change.
# TODO: Change the temperature
TEMPERATURE: float = 2.0
completion = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=TEMPERATURE,
    messages=[
        {"role": "user", "content": prompt}
    ]
)
output = completion.choices[0].message.content
print(f"The Model responded with: '{output}'")
Task 2.3
Adjust Max Tokens: Try generating responses with different limits on length, such as 50, 100, and 2000 tokens, to see how it impacts the detail and depth of the story.
# TODO: Change the MAX_TOKENS
MAX_TOKENS: int = 50
completion = client.chat.completions.create(
    model=MODEL_NAME,
    max_tokens=MAX_TOKENS,
    messages=[
        {"role": "user", "content": prompt}
    ]
)
output = completion.choices[0].message.content
print(f"The Model responded with: '{output}'")
Task 2.4
Experiment with Top P, Frequency Penalty, and Presence Penalty: Adjust these parameters to explore their effects on repetition, novelty, and thematic diversity.
# TODO: Change the different parameters and check effect on output
# TOP_P can be any float number between 0 and 1
TOP_P: float = 0.1
# FREQUENCY_PENALTY can be any float number between -2.0 and 2.0.
FREQUENCY_PENALTY: float = 0
# PRESENCE_PENALTY can be any float number between -2.0 and 2.0.
PRESENCE_PENALTY: float = 0
completion = client.chat.completions.create(
    model=MODEL_NAME,
    top_p=TOP_P,
    frequency_penalty=FREQUENCY_PENALTY,
    presence_penalty=PRESENCE_PENALTY,
    messages=[
        {"role": "user", "content": prompt}
    ]
)
output = completion.choices[0].message.content
print(f"The Model responded with: '{output}'")
Reflect on how each parameter influenced the model’s output. This exercise will enhance your understanding of how to control and guide the AI to achieve results that best fit your objectives.
Part 3: Prompt Engineering (15 min)
Prompt engineering is an art and science of designing inputs that guide Large Language Models (LLMs), such as Generative Pre-trained Transformer (GPT), to produce specific, high-quality responses or outputs. This process is foundational in the field of artificial intelligence because the precision with which we articulate our prompts significantly affects the AI’s performance. A well-crafted prompt can lead to outputs that are not only accurate but also creative and contextually relevant, showcasing the model’s capabilities to their fullest extent.
Engaging with Prompt Engineering
Before we dive into specific tactics for effective prompt engineering, it’s important to understand that the goal is to communicate with the model in its language. This means being clear, direct, and detailed in your requests.
Tactics:
- Write detailed, specific queries: include all relevant context and constraints rather than leaving the model to guess.
- Specify the structure you want: tell the model exactly how the output should be organized and formatted.
- Control the output length: state explicitly how long (or short) the response should be.
Applying What We’ve Learned
Now that we’ve outlined the key tactics for effective prompt engineering, let’s put this knowledge into practice.
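As a generic illustration of these tactics (not a solution to the task below), a prompt might spell out the model's role, the exact output structure, and a length limit:
# Illustrative only: a system prompt that fixes role, structure, and length.
# The wording here is an example, not the answer to Task 3.1.
example_messages = [
    {
        "role": "system",
        "content": (
            "You are a study assistant. Summarize the text the user provides "
            "as exactly three bullet points, each under 20 words."
        ),
    },
    {"role": "user", "content": "<paste the text to summarize here>"},
]
completion = client.chat.completions.create(model=MODEL_NAME, messages=example_messages)
print(completion.choices[0].message.content)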
Task 3.1
Imagine you’re working on the Cogito Project TutorAI, a cutting-edge AI tool designed to support students in their study efforts by creating concise, informative flashcards from dense academic texts. Your challenge is to engineer a prompt that instructs the LLM to distill complex material into easy-to-review flashcards, focusing on key concepts, definitions, and examples relevant to an upcoming exam.
- Extract Key Concepts and Definitions: The AI must identify and summarize the main ideas and definitions found in a given academic text. This involves discerning the most important points that are crucial for understanding the subject matter.
- Format the Information for Flashcards: The output should be structured in a way that is suitable for flashcard creation. Each flashcard will have a term or concept on one side and its definition or explanation on the other side, along with an example if appropriate.
- Control the Length: Each flashcard content (term/definition/example) should be concise, aiming for no more than 50 words per side to facilitate quick review and memorization.
This task will test your ability to use detailed queries, specify a structure, and control the output length—all crucial aspects of prompt engineering. Remember, the effectiveness of your prompt will directly influence the quality and relevance of the AI’s response. Good luck!
book_paragraphs: str = """
Chapter 1 – Epic Introduction
Since the dawn of time, humans have tried to define how we think, and this struggle has led us to create artificial intelligence. Historically, four approaches to artificial intelligence have been followed, each described below.
Acting Humanly
If we can't distinguish between a computer and a human, the computer is said to act humanly. The computer's capability to act humanly can be tested by performing a Turing test. A computer passes the Turing test if a human interrogator cannot tell whether they are communicating with a computer or a person. To pass a Turing test, the computer would need to possess the following capabilities:
Natural language processing to enable it to communicate successfully.
Knowledge representation to store what it knows or hears.
Automated reasoning to use the stored information to draw conclusions.
Machine learning to adapt to new circumstances and to detect patterns.
Thinking humanly
To make a computer think like a human, we must know how humans think. The computer's ability to think humanly can be determined by comparing the computer's input-output behaviour with the corresponding human behaviour.
Acting Rationally
An agent is something that acts. A rational agent is an agent that does the right thing based on what it knows, its functions, and the surrounding environment; it acts so that it achieves the best expected outcome.
Thinking rationally
Using sound logic rules to reach the right conclusion.
A relevant quote, demonstrating the logical rule of modus ponens: "Socrates is a man; all men are mortal; therefore, Socrates is mortal."
"""
def generate_flashcards_from_paragraphs(paragraph: str) -> str:
    completion = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            # TODO: Create a prompt or combination of "system" and "user" prompts to achieve the task's objectives
            {"role": "system", "content": ""},
            {"role": "user", "content": paragraph},
        ]
    )
    return completion.choices[0].message.content
flashcards = generate_flashcards_from_paragraphs(book_paragraphs)
print(f"The Model responded with the following flashcards: \n'{flashcards}'")
Part 4: Embeddings (15 min)
def create_embedding(prompt: str, model="text-embedding-ada-002") -> list[float]:
    return client.embeddings.create(model=model, input=prompt).data[0].embedding
print(create_embedding("This is an embedding!"))
database: list[tuple[list[float], str]] = []
# Create embedding-text key-value pairs and add them to the database
corresponding_text_1 = "This is an embedding!"
embedding_1 = create_embedding(corresponding_text_1)
database.append((embedding_1, corresponding_text_1))
corresponding_text_2 = "Sverre is CTO of Cogito NTNU"
embedding_2 = create_embedding(corresponding_text_2)
database.append((embedding_2, corresponding_text_2))
import numpy as np
def cosine_similarity(a: list[float], b: list[float]) -> float:
    """
    Takes 2 vectors a, b and finds how similar they are using the cosine similarity
    Args:
        a (list[float]): A list of floats
        b (list[float]): A list of floats
    Returns:
        The similarity of the two vectors a and b as a float between -1 and 1
    """
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
def search_docs(query: str, database: list[tuple[list[float], str]], top_k: int = 1):
    """
    Searches the database for the most similar documents to the query
    Args:
        query (str): The query to search for
        database (list[tuple[list[float], str]]): The database to search in
        top_k (int): The number of documents to return
    Returns:
        A list of the top_k most similar documents to the query
    """
    query_embedding = create_embedding(query)
    results = []
    for (doc_embedding, doc) in database:
        similarity = cosine_similarity(query_embedding, doc_embedding)
        results.append((similarity, doc))
    return sorted(results, reverse=True)[:top_k]
search_docs(“Who is the CTO of Cogito?”, database)
Task 4.1
Create a new embedding with some text of your choice, and add it to the database. See if you can make the model find it.
# TODO: Create an embedding for some text and append it to the database
while True:
    user_input: str = input("What would you like to ask the model: ")
    if user_input == "q":
        print("[SUCCESS] Shutting down")
        break
    answer = search_docs(user_input, database, top_k=1)
    print(f"The most similar document(s): {answer}\n")
Part 5: Function Calling (35 min)
Example of a Yr weather application using LangChain
from langchain.llms.openai import OpenAI
from langchain.tools import StructuredTool
from langchain.agents import AgentType
from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent
import json
import matplotlib.pyplot as plt
import requests
# Give the agent a list of tools to use
def get_weather(latitude: float, longitude: float) -> str:
    """
    Get the weather forecast for the given latitude and longitude from the MET Norway (Yr) API.
    """
    url = f"https://api.met.no/weatherapi/locationforecast/2.0/compact?lat={latitude}&lon={longitude}"
    headers = {
        'User-Agent': 'Mozilla/5.0 (compatible; LangChain/1.0; +https://langchain.ai/)'
    }
    response = requests.get(url, headers=headers)
    plot_weather_data(response.text)
    # Cap the response at 2000 characters
    response = response.text[:2000]
    return response
def plot_weather_data(weather_data):
    # Parse the JSON data
    data = json.loads(weather_data)
    # Extract the temperature data from the nested JSON
    temperatures = [
        entry['data']['instant']['details']['air_temperature']
        for entry in data['properties']['timeseries']
    ]
    # Keep every other reading and cap the plot at 24 points
    temperatures = temperatures[::2]
    if len(temperatures) > 24:
        temperatures = temperatures[:24]
    # Plot the data
    plt.plot(temperatures, 'r-', label='Air temperature')
    plt.xlabel('Time')
    plt.ylabel('Temperature (C)')
    plt.legend()
    plt.show()
def average_temperature(temperatures: list[float]) -> float:
    """Compute the average of a list of temperatures."""
    return sum(temperatures) / len(temperatures)
tools: list[StructuredTool] = [
    StructuredTool.from_function(
        name="Weather at location",
        func=get_weather,
        description="Get the weather at a location given a latitude and longitude.",
    ),
    StructuredTool.from_function(
        name="Find average temperature",
        func=average_temperature,
        description="Get average temperature from a list of temperatures",
    ),
]
# Make a memory for the agent to use
memory = ConversationBufferMemory(memory_key="chat_history")
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    memory=memory,
    max_iterations=10,
)
def run_agent(prompt: str) -> str:
    """Run the agent chain."""
    if not isinstance(prompt, str):
        raise TypeError("Prompt must be a string.")
    if (len(prompt) < 1) or (len(prompt) > 1000):
        raise ValueError("Prompt must be between 1 and 1000 characters.")
    result = agent_chain.run(prompt)
    return result
run_agent("Give me the weather at lat 63.41710242319078, long -10.4066603487495 and get the average temperature")
Task 5.1
Create your own function for dividing two numbers and add it to the agent's tools. Try asking it to do several additions, or a combination of divisions and additions.
from langchain.llms.openai import OpenAI
from langchain.tools import StructuredTool
from langchain.agents import AgentType
from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent
def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b
# TODO: Create the divide function:
# TODO: Give the agent the new StructuredTool to use
tools: list[StructuredTool] = [
    StructuredTool.from_function(
        name="Add two numbers",
        func=add,
        description="Adds two numbers together.",
    ),
]
# Make a memorybuffer for the agent to use
memory = ConversationBufferMemory(memory_key="chat_history")
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,  # Change to False if you only want the final output, not the chain's reasoning
    memory=memory,
    max_iterations=3,  # Maximum number of reasoning/tool steps the agent may take
)
def run_agent(prompt: str) -> str:
    """Run the agent."""
    if not isinstance(prompt, str):
        raise TypeError("Prompt must be a string.")
    if (len(prompt) < 1) or (len(prompt) > 1000):
        raise ValueError("Prompt must be between 1 and 1000 characters.")
    result = agent_chain.run(prompt)
    return result
while True:
    user_input: str = input("What would you like to ask the model: ")
    if user_input == "q":
        print("[SUCCESS] Shutting down...")
        break
    answer = run_agent(user_input)
    print(f"The AI gave the answer: {answer}\n")
Task 5.2
Work together with others and create something cool. Try to utilize the different lessons you have learned; examples are:
- Access an external API to fetch some live data
- Create more complex math operations to do calculus
- Create bash scripts to create folders or organize a folder
- Access a database to retrieve information
# TODO: Copy relevant code from this notebook and create something
Extras (outside this workshop's scope):
Creating an API key
If you want to start using these models in your own applications you will need to create an account at OpenAI, create an API key, and add credits. Create API key here
Running local LLMs
For those interested in experimenting with Large Language Models (LLMs) without incurring the costs associated with API calls to services like OpenAI’s, or dealing with sensitive or proprietary data, running pre-trained models on your own hardware presents a viable alternative. The open-source community, particularly Hugging Face’s Transformers library, offers access to a wide range of models, including some developed by leading tech companies.
One of the standout models available is Google’s FLAN-T5-XL, part of the T5 (Text-to-Text Transfer Transformer) family, which has been fine-tuned for a broad set of tasks. This model combines the flexibility of T5’s architecture with training on a mixture of supervised and unsupervised tasks, making it particularly adept at understanding and generating human-like text.
To get started with using FLAN-T5-XL or any other model from the Transformers library, you will need to install the necessary packages and understand how to load and interact with the model. Below is a basic Python script that demonstrates how to set up and use FLAN-T5-XL for generating text based on input prompts:
import sys
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, GenerationConfig
model_name = 'google/flan-t5-xl'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
config = GenerationConfig(max_new_tokens=200)
# Example prompt; the loop below also reads further prompts from stdin
example = "What is the value of being accepted into Cogito NTNU, Norway's largest technical AI student organisation, in the middle of an AI revolution?"
for line in [example, *sys.stdin]:
    tokens = tokenizer(line, return_tensors="pt")
    outputs = model.generate(**tokens, generation_config=config)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
Context windows
The context window refers to the maximum amount of text (measured in tokens) the model can consider at one time when generating responses or performing tasks. This limit is intrinsic to the model’s architecture and significantly influences how we design prompts and interpret model outputs.
Significance of the Context Window
The size of the context window determines how much information the model can “see” and use at any given moment. For example, GPT-3 has a context window of 2048 tokens. This means it can consider up to 2048 tokens of preceding text to generate its responses. The implications are twofold:
- Prompt Design: When crafting prompts for an LLM, it’s vital to ensure that the most relevant information is within the model’s context window. Information beyond this limit won’t influence the model’s output, emphasizing the need for concise and focused prompt design.
- Sequential Tasks: For tasks requiring more information than the context window allows, you may need to design a series of prompts that build on each other, ensuring each segment of the task remains within the model’s view.
While advancements have led to models supporting context windows surpassing 100,000 tokens (e.g., GPT-4 Turbo and several open-source models), challenges persist. Specifically, such models tend to focus on the beginning and end of the provided text, potentially underutilizing the middle portion. This is known as the "lost in the middle" problem.
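Whatever the size of the window, text beyond it is simply not seen by the model, so it is useful to count tokens before sending a prompt. Below is a minimal sketch using the tiktoken library (an assumption: it must be installed separately, and the limit shown is purely illustrative):
# Count how many tokens a prompt uses before sending it to the API.
# Assumes `pip install tiktoken`; CONTEXT_LIMIT is an illustrative value.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt = "Write a paragraph about a space adventure"
token_count = len(encoding.encode(prompt))
print(f"Prompt uses {token_count} tokens")

CONTEXT_LIMIT = 4096  # check the documentation for your specific model
if token_count > CONTEXT_LIMIT:
    print("Prompt will not fit in the context window")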
New insights from an operating-system-inspired model
MemGPT introduces a strategic approach to memory management, organized around two core concepts relevant to understanding context windows in LLMs:
- Memory Hierarchy: It segments memory into two types: a “main context” analogous to RAM, which is smaller and faster, and an “external context” similar to disk storage, which is larger but slower. This structure necessitates the deliberate transfer of information between these contexts, using virtual memory.
- Process Management: Similar to an operating system’s role in managing tasks, MemGPT regulates the flow of information between the memory segments, the LLM, and users, ensuring efficient handling of processes.
Fine-tuning Large Language Models
Fine-tuning is a process that adjusts a pre-trained model to a specific task or dataset, enhancing its ability to perform on tasks it wasn’t specifically trained for initially. This method leverages the general understanding that the model has developed during its initial training phase, applying it to a more focused domain or problem set. Fine-tuning can significantly improve the performance of LLMs on specialized tasks, making it a powerful tool for developers and researchers.
Why Fine-tune?
- Customization: Tailors the model to understand and generate responses based on specific jargon, styles, or formats unique to your dataset.
- Improved Performance: Enhances the model's accuracy and efficiency on tasks that may differ from the data it was originally trained on.
- Cost-Effectiveness: Utilizes the foundational knowledge the model has gained, reducing the need for training from scratch on vast datasets.
How to Fine-tune an LLM:
- Select a Pre-trained Model: Choose a model that closely aligns with your task in terms of language and domain. Models available on platforms like Hugging Face offer a good starting point.
- Prepare Your Dataset: Your dataset should be representative of the task at hand and formatted in a way that the model can understand. It typically involves splitting the data into training, validation, and test sets.
- Customize Training Parameters: Adjust parameters such as learning rate, batch size, and the number of epochs to balance between retaining learned knowledge and adapting to the new dataset.
- Train the Model: Use a suitable environment and framework, like PyTorch or TensorFlow, along with Hugging Face’s Transformers library, to fine-tune the model on your dataset.
- Evaluate and Iterate: Test the model’s performance on a separate validation set, and iteratively adjust your approach based on the results.
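With OpenAI specifically, these steps boil down to uploading a JSONL training file and starting a fine-tuning job. The sketch below uses the same client as earlier in this notebook; the file name and its contents are placeholders for your own chat-formatted examples.
# Hedged sketch of starting an OpenAI fine-tuning job.
# "training_data.jsonl" is a placeholder file of {"messages": [...]} examples.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)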
An example of this using OpenAI can be found in the Cogito Project MarketingAI
Leveraging OpenAI Across Diverse Programming Environments
While this course primarily focuses on Python to interact with OpenAI's models, there are official and community libraries for other languages. Supported languages include, but are not limited to, TypeScript/JavaScript, Java, C#, Go, C++, and PHP, alongside others like Clojure, Kotlin, Ruby, Rust, and Scala. This wide-ranging support extends the potential of OpenAI's AI models to virtually any software development domain, from web development and mobile applications to enterprise solutions and beyond.