Prompts
May 27, 2022
A lot of what we do uses large language models and prompts, so we figured we'd give a bit of an overview of how this all works for those interested.
Over the last several years, access to large language models for generation tasks has become widespread. Anyone with an internet connection can now access giant models from companies such as Goose, OpenAI, Cohere, and AI21 with the click of a button. As access to these models becomes ubiquitous, it's useful to know how to direct them to generate what you want. The method of getting a trained model to generate text for you is called prompt engineering.
There's a million papers and blog posts on how transformers work, but it's worth mentioning that the current trend in language models is based on the transformer architecture from
Attention is All You Need, which was shown to scale up nicely in
Language Models are Few-Shot Learners. Both are worth reading, but the technical details are pretty irrelevant for this blog post.
What we mean when we talk about prompts
When interacting with large language models (LLMs), you give the model some text, called a prompt (or context), and the model generates some text in response, called a completion. For instance, if you give an LLM a prompt of 1 + 1 = , the model will generally complete your prompt with an output, 2. Similarly, when you give the model a prompt of The capital of France is , the model will complete it with Paris.
What the model is doing under the hood (from those papers we skipped) is converting those sequences of characters, e.g. The capital of France, into arrays of integers called tokens, such as [42, 12, 54, 12, 11], which is what the model actually computes over; how the characters map to tokens depends on how the model was trained. The model then predicts a sequence of output integers, such as [51, 55, 12], which are converted back into characters, e.g. Paris. Depending on the size of the model, you get between 1,000 and 8,000 of these tokens to split between the prompt and the completion.
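To make the tokenization step a little more concrete, here's a small Python sketch using the GPT-2 tokenizer from Hugging Face's transformers library. It's just an illustration: each provider uses its own tokenizer, and the specific token IDs above are made up.

from transformers import GPT2TokenizerFast

# Load a tokenizer (GPT-2's here, purely as an example; each model has its own).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

prompt = "The capital of France is"
token_ids = tokenizer.encode(prompt)   # a short list of integers
print(token_ids)

# Going back the other way: decode token IDs into text.
print(tokenizer.decode(token_ids))     # "The capital of France is"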
Now, this works fine if you're just having the model answer simple trivia questions. Depending on how large the model is and some other issues with training, it may or may not be able to get simple trivia or math questions right just based on the implicit information the model learned in training. However, if we're trying to do more complex tasks, that's where things get a little more complicated.
One of the cool things about these models is that they have really interesting emergent properties as a result of being really big. What does emergent mean? Well, when you're training one of these things, you're teaching the model to predict the completion based on the prompt. In order to do really well at predicting what's gonna happen next, the model has to get really good at picking up underlying meanings in the text it's being trained on. So you end up with abilities that have existing benchmarks to test against, like distinguishing words in context, generating summaries, and picking out answers to questions. However, these are considered emergent because no one ever told the model to learn these tasks; the model just kind of picked them up.
Prompts can be quite powerful as long as the model has picked up on the relations between all the target entities. For instance, let's say we want to have a model always respond to us with the capital of a country. We can create a prompt that we prepend to an input as such:
Germany
Berlin
---
Italy
Rome
---
France
And the model will likely put out Paris (as long as it knows Paris is the capital of France, which a well-trained model should).
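If you were doing this programmatically, assembling the few-shot prompt is just string formatting. Here's a rough sketch; the complete() function below is a stand-in for whichever provider's completion endpoint you happen to use, not any real API.

# Few-shot examples of the task we want: country in, capital out.
EXAMPLES = [("Germany", "Berlin"), ("Italy", "Rome")]

def build_prompt(country: str) -> str:
    # Join the examples with "---" separators, then append the new input.
    shots = "\n---\n".join(f"{c}\n{capital}" for c, capital in EXAMPLES)
    return f"{shots}\n---\n{country}\n"

def complete(prompt: str) -> str:
    # Stand-in: send the prompt to your LLM provider of choice and return its completion.
    raise NotImplementedError

print(build_prompt("France"))  # a good model should complete this with "Paris"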
While powerful and generalizable, there are some limitations to prompting. First, you can only include a certain number of examples with each generation because we're limited in the size of the prompt to approximately 1,000-2,000 tokens. Second, we have to include this prompt every time we query the model, which can take some time (although computing input tokens is pretty fast). Depending on the platform, the provider may also charge for the input tokens, so it can lead to higher costs than you might want if you've got large prompts but small completions.
Fine Tuning
In contrast to prompt engineering, you can train desired abilities into the model via fine tuning.
Basically, once the model's trained, we go and teach it new data or tasks by having it train on new input. We can effectively fine-tune prompts into the model so that we don't need to include as many examples at generation time to get it to work.
For instance, let's say we want the model to always just tell us capitals. We can go grab a bunch of examples of answering capital questions, e.g. consider our list of capitals from before.
France
Paris
---
Germany
Berlin
---
Italy
Rome
---
...
As opposed to the ephemeral nature of prompts, we can re-train the model on this list of inputs and outputs so that it becomes a permanent fixture of the model: whenever we give it a country, it'll spit out the capital immediately. However, we can lose some capabilities of the model by doing this (e.g. completing text prompts), so there's always a tradeoff between training and generalizability.
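As a concrete (if hypothetical) illustration, many providers take fine-tuning data as a file of prompt/completion pairs. The exact field names, separators, and upload steps differ by provider, so treat this as a sketch of the idea rather than any particular API.

import json

# Our list of country -> capital examples from above.
capitals = [("France", "Paris"), ("Germany", "Berlin"), ("Italy", "Rome")]

# Write them out as JSONL prompt/completion pairs (an assumed, illustrative format).
with open("capitals_finetune.jsonl", "w") as f:
    for country, capital in capitals:
        record = {"prompt": f"{country}\n", "completion": f" {capital}"}
        f.write(json.dumps(record) + "\n")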
Well, that's it for now
This is mainly intended as a reference for the things we don't want to have to re-hash in our subsequent posts, so as questions come up about things we should cover here, we'll keep adding to it. Don't hesitate to contact us at
support@hyperwrite.ai with any comments!