Guiding Large Language Models with JSON to Generate Desired Outputs
Last Updated: June 6, 2022
Guiding large language models (LLMs) to produce desired outputs is hard. It can be difficult to get a model to produce text on a topic given a few constraints and even harder to have it produce text on a topic including information or context algorithmically extracted from other sources. This qualitative analysis shows some examples of how using JSON formatting can help larger OpenAI models produce targeted generations, though it is not as effective for other popular models.
Structuring Prompts (sometimes) Helps
When we generate typeaheads for users, we often want to be able to pull in information from the users' content. For instance, if the user had written a plan in Notion regarding an upcoming event, we would want our AI to be able to pull that data from the Notion document into a follow-up email or Slack message. This means we're constantly pulling little snippets of data from various sources in various formats into the AI and hoping that the AI does something coherent with it. If you've played with GPT models, you know this is a bit of an unsolved problem.
Ultimately, we're going to need some benchmarks for this sort of task, but for now we thought it would be fun to toy around with some of the existing publicly available models to see how they work out of the box. Since we want to see how these models adapt to arbitrary data, we figured we should use an imaginary theme that these models probably haven't seen too much: Dinosaurs fighting the Revolutionary War. This lets us see both how well different models pick up on arbitrary snippets and where the models just give up and ignore the instructions altogether.
We also investigate formatting inputs in a programmatic JSON notation rather than normal human-readable lists. By using the more structured JSON format, we're able to clearly delineate what different sections are for. However, this can potentially confuse models that are expecting more free-form prose inputs. In addition to testing the prompt format, we gave each model between one and three targets to generate from, ranging from something as simple as "Velociraptors are scary" to more complex combinations with multiple targets such as "Dinosaurs eat all their enemies, Velociraptors hate the brits, The brits are robots", to see how the models would react. We found the larger OpenAI models do pretty well at creating coherent stories. Other models tend to revert to regurgitating related facts from their training. The smaller models tend to just spit out nonsense.
You can switch between models and prompts in the table below to see the difference in the outputs. For instance, with Davinci we can see that given the difficult three-part target in JSON format it hits each part.
The Revolutionary War was a time of great change. The British were fighting for their lives against the might of the dinosaurs. The dinosaurs, led by the ferocious velociraptors, were determined to eat all their enemies. The British, being robots, were not afraid of being eaten. (Davinci, JSON)
Whereas with a more regular prompt, it doesn't include the target information as well.
The Revolutionary War was a bloody and brutal conflict. But what if it had been fought by dinosaurs? The outcome might have been very different (Davinci, Unformatted)
You can select different models and snippets below to compare the differences in the generations of different models. (The prompts are italicized while the outputs are bolded.)
{
Purpose: An introduction to an essay
Topic: Turtles
Theme: turtles are great
Info: ['Turtles can move different directions in their shells']
Text: I have always been fascinated by turtles. They are some of the most interesting animals on our planet. They are the only animals that can live without water for extended periods of time. They are also the only animals that can move backwards and forward in their shells.
},
{
Purpose: Describe a battle scene
Topic: A mouse fighting squirrels
Theme: A sword fighting mouse
Info: ['The squirrel chased the mouse into the tree']
Text: Whiskers, the sword-bearing mouse, pulled his sword from its sheath, raised it over his head and lunged at his squirrel enemies. They scurried about as he nobly chased them into a tree. Unable to catch his quarry, Whiskers began slicing hacking at the tree trunk.
},
{
Purpose: A conclusion to an essay
Topic: Discussing whether pineapple belongs on pizza
Theme: Arguing pineapple does not belong on pizza
Info: ['Pepperoni is the best topping']
Text: As I have shown, there is no possible way any sane person should have pineapple on pizza. Pineapple is an affront to both the traditional seafood that one might find on pizza as well as to the more modern foods such as pepperoni and anchovies. While there may be some rare edge cases in which pineapple is acceptable as a topping, in the general case it simply is not appropriate.
},
{
Purpose: An introduction to an essay
Topic: The revolutionary war
Theme: A universe where Dinosaurs fought the Revolutionary war against the Brits
Info: ['Velociraptors are scary']
Text: The Revolutionary War was a time of great change. Dinosaurs fought against the British in a universe where they were the dominant species. The outcome of the war is unknown, but it is sure to be an exciting read.
JSON Formatted
Purpose: An introduction to an essay
Topic: Turtles
Theme: turtles are great
Info: Turtles can move different directions in their shells
Text: I have always been fascinated by turtles. They are some of the most interesting animals on our planet. They are the only animals that can live without water for extended periods of time. They are also the only animals that can move backwards and forward in their shells.
***
Purpose: Describe a battle scene
Topic: A mouse fighting squirrels
Theme: A sword fighting mouse
Info: The squirrel chased the mouse into the tree
Text: Whiskers, the sword-bearing mouse, pulled his sword from its sheath, raised it over his head and lunged at his squirrel enemies. They scurried about as he nobly chased them into a tree. Unable to catch his quarry, Whiskers began slicing hacking at the tree trunk.
***
Purpose: A conclusion to an essay
Topic: Discussing whether pineapple belongs on pizza
Theme: Arguing pineapple does not belong on pizza
Info: Pepperoni is the best topping
Text: As I have shown, there is no possible way any sane person should have pineapple on pizza. Pineapple is an affront to both the traditional seafood that one might find on pizza as well as to the more modern foods such as pepperoni and anchovies. While there may be some rare edge cases in which pineapple is acceptable as a topping, in the general case it simply is not appropriate.
***
Purpose: An introduction to an essay
Topic: The revolutionary war
Theme: A universe where Dinosaurs fought the Revolutionary war against the Brits
Info: Velociraptors are scary
Text: The Revolutionary War was a time of great change. The British were fighting against the Americans, who were using dinosaurs as their main weapon. The dinosaurs were very effective against the British troops, who were not used to fighting against such powerful creatures. The British were eventually defeated, and the Americans became the new masters
Freeflow
So what's going on here? We suspect part of the problem is that logic and entailment are hard.
Logic is Hard
It's a pain to get models to generate what you want them to generate. While you can always get your model to generate something, it's much harder to make sure that that something is the something you wanted.
Part of the problem is that it's pretty much impossible to directly specify everything you want the model to use to constrain the output. After all, every utterance you make is generally implicitly bounded in some way. For example, when someone says "Everything is ready" they don't mean everything in the world, just some set of tasks there's a shared understanding of, and people miscommunicate these sorts of things to one another all the time. Another part of the problem is that even when you do specify enough for the model to pick up on, the model doesn't always put two and two together (this task can be measured via the ANLI benchmark). People are also subject to cognitive dissonance, so this isn't purely an AI problem. But this post is about language models, and we don't need to get into the weeds on that.
One way to help the models figure out what you want them to do is to structure your prompts or input in a way that chunks up the data nicely into digestible bits. (You can also instruct the model to walk itself through a problem or simply ask the model to chunk it up itself.) Just like it helps to chunk up situations for people into Who, What, Why, When, and How, it's sometimes helpful to structure prompts for the models. However, it's unclear where those benefits are and whether structuring prompts in ways the models don't see in the wild can sometimes make them more confusing to the model.
One of the more common methods we've seen people use for structuring their prompt is to use simple key:value pairs to differentiate sections of information. E.g.
Animal: Dog
Color: Black
Sound: Bark
The equivalent of this in code is to use JSON formatting, such as
{
"Animal": "Dog",
"Color": "Black",
"Sound": "Bark"
}
While a small difference, it has some benefits, such as making the text easy to parse as JSON so you can check whether values exist during QA/QC. It can also make it easier to add arbitrary fields to the prompt. However, it's not always clear whether the extra overhead confuses the models due to the unexpected tokens from the brackets and spacing, so we figured we'd do some initial experimentation to figure out where different chunking mechanisms can improve or degrade model performance.
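As a minimal sketch of the kind of QA/QC check mentioned above, the snippet below parses a JSON string and reports which expected fields are missing or empty. The function name `check_fields` and the `REQUIRED_FIELDS` tuple are hypothetical, chosen to match the Animal/Color/Sound example, and are not part of any pipeline described in this post.

```python
import json

# Hypothetical field names, taken from the Animal/Color/Sound example above.
REQUIRED_FIELDS = ("Animal", "Color", "Sound")


def check_fields(raw, required=REQUIRED_FIELDS):
    """Return the required fields that are missing or empty in a JSON object.

    Raises ValueError if `raw` is not valid JSON or is not a JSON object.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err}") from err
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # A field counts as missing if the key is absent or its value is empty.
    return [key for key in required if not data.get(key)]


if __name__ == "__main__":
    print(check_fields('{"Animal": "Dog", "Color": "Black", "Sound": "Bark"}'))
    print(check_fields('{"Animal": "Dog", "Color": ""}'))
```

A check like this also catches malformed output early: a missing comma between two pairs makes `json.loads` fail, which the plain key:value format has no equivalent for.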
Comparing One-Shot JSON vs. Simple
To see how well models do on generative tasks, we can give the model some information and ask it to generate some text about it. Since we can't guarantee the format of the inputs, we'll want an input format that lets us give the model arbitrary goals. To improve the odds of the model generating text in the format we want, in this example we give the model a single example to model its output after. This is called one-shot learning because the model adapts to a task given a single example.
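To make the two prompt styles concrete, here is a rough sketch of how a few-shot prompt could be assembled in either format. It assumes strict JSON with quoted keys for the JSON style and `***` as the separator for the simple style. The exact serialization used in these experiments may differ, and the helper names (`render_json`, `render_simple`, `build_prompt`) are made up for illustration. The final record is left open at the Text field so the model completes it.

```python
import json

# Field order used in the example prompts; "Info" is optional.
FIELDS = ("Purpose", "Topic", "Theme", "Info", "Text")


def render_simple(record, open_ended=False):
    """Render a record as plain 'Key: value' lines."""
    lines = []
    for key in FIELDS:
        if key == "Text" and open_ended:
            lines.append("Text:")  # leave the text for the model to write
        elif key in record:
            value = record[key]
            if isinstance(value, list):
                value = ", ".join(value)
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


def render_json(record, open_ended=False):
    """Render a record as a JSON object, optionally left open at "Text"."""
    parts = []
    for key in FIELDS:
        if key == "Text" and open_ended:
            continue
        if key in record:
            parts.append(f"  {json.dumps(key)}: {json.dumps(record[key])}")
    if open_ended:
        # End on an unterminated string so the model continues the text.
        parts.append('  "Text": "')
        return "{\n" + ",\n".join(parts)
    return "{\n" + ",\n".join(parts) + "\n}"


def build_prompt(examples, target, style="json"):
    """Join the worked examples and the open-ended target into one prompt."""
    if style == "json":
        rendered = [render_json(ex) for ex in examples]
        rendered.append(render_json(target, open_ended=True))
        return ",\n".join(rendered)
    rendered = [render_simple(ex) for ex in examples]
    rendered.append(render_simple(target, open_ended=True))
    return "\n***\n".join(rendered)


if __name__ == "__main__":
    example = {
        "Purpose": "A conclusion to an essay",
        "Topic": "Discussing whether pineapple belongs on pizza",
        "Theme": "Arguing pineapple does not belong on pizza",
        "Text": "As I have shown, there is no possible way ...",
    }
    target = {
        "Purpose": "An introduction to an essay",
        "Topic": "The revolutionary war",
        "Theme": "A universe where Dinosaurs fought the Revolutionary war against the Brits",
    }
    print(build_prompt([example], target, style="json"))
    print()
    print(build_prompt([example], target, style="simple"))
```

Passing one example gives a one-shot prompt and passing three gives a three-shot prompt, so the same helper covers both setups. Only the `style` argument changes between the JSON and simple conditions.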
We can see how JSON helps OpenAI's text-davinci-002, one of the largest publicly available LLMs, keep track of what it's supposed to do. Using a single example, Davinci writes us this paragraph that accurately describes what we want:
"Revolutionary War was a time of great upheaval and change in the United States. One of the most significant changes was the introduction of dinosaurs into the conflict. These massive creatures were brought in by the British to help them gain an advantage over the Americans. However, the dinosaurs soon turned on their masters and began to fight for the Americans." (Davinci, 1-Shot JSON)
whereas using simple formatted labels, we get more of a counterfactual question:
"Revolutionary War was fought by brave Americans against the British. But what if the war had been fought by dinosaurs?" (Davinci, 1-Shot Simple)
{
Purpose: A conclusion to an essay
Topic: Discussing whether pineapple belongs on pizza
Theme: Arguing pineapple does not belong on pizza
Text: As I have shown, there is no possible way any sane person should have pineapple on pizza. Pineapple is an affront to both the traditional seafood that one might find on pizza as well as to the more modern foods such as pepperoni and anchovies. While there may be some rare edge cases in which pineapple is acceptable as a topping, in the general case it simply is not appropriate.
},
{
Purpose: An introduction to an essay
Topic: The revolutionary war
Theme: A universe where Dinosaurs fought the Revolutionary war against the Brits
Text: The Revolutionary War was a time of great upheaval and change in the United States. One of the most significant changes was the introduction of dinosaurs into the conflict. These massive creatures were brought in by the British to help them gain an advantage over the Americans. However, the dinosaurs soon turned on their masters and began to fight for the Americans. This new development
JSON Formatted
Purpose: A conclusion to an essay
Topic: Discussing whether pineapple belongs on pizza
Theme: Arguing pineapple does not belong on pizza
Text: As I have shown, there is no possible way any sane person should have pineapple on pizza. Pineapple is an affront to both the traditional seafood that one might find on pizza as well as to the more modern foods such as pepperoni and anchovies. While there may be some rare edge cases in which pineapple is acceptable as a topping, in the general case it simply is not appropriate.
***
Purpose: An introduction to an essay
Topic: The revolutionary war
Theme: A universe where Dinosaurs fought the Revolutionary war against the Brits
Text: The Revolutionary War was fought by brave Americans against the British. But what if the war had been fought by dinosaurs?
Freeflow
With text-curie-001, the next largest OpenAI model, the single JSON prompt clearly gets us the desired dinosaur result,
"The Revolutionary War was a time of great change and turmoil. For the first time in history, humans were pitted against dinosaurs in a bloody battle for control of the world. The outcome of this war would determine the fate of humanity and the dinosaurs themselves." (Curie, 1-Shot JSON)
whereas the simple formatted version fails to get us the desired result,
"The year is 1775. The British Empire, a mighty and opulent land, is at its height. However, a group of rebels, led by George Washington, are determined to overthrow the king and liberate America from British rule." (Curie, 1-Shot Simple)
Trying with a Three-Shot JSON vs Simple
We can try giving the models more examples to see if that changes things. Giving three examples for each format (3-shot learning), we can see how much the format matters for helping the model pick up on what's going on.
text-davinci-002 gives almost identical results for the first two sentences.
"The Revolutionary War was a time of great upheaval and change. But what if it had been fought by dinosaurs instead of humans? The outcome might have been very different. The British would have been no match for the ferocious T-Rex." (Davinci, 3-Shot JSON + 3-Shot Simple)
text-curie-001 is able to pick up on the format for both as well, but with slightly different takes:
"The Revolutionary War was a time of great change. It was the first time in history that dinosaurs fought against humans. The dinosaurs were led by General George Washington and General John Adams." (Curie, 3-Shot JSON)
"Revolutionary War was a time of great change. It was the first time in history that humans had fought against another species of intelligent life. The dinosaurs were a formidable foe, and it was not easy for the humans to win." (Curie, 3-Shot Simple)
{
Purpose: An introduction to an essay
Topic: Turtles
Theme: turtles are great
Text: I have always been fascinated by turtles. They are some of the most interesting animals on our planet. They are the only animals that can live without water for extended periods of time. They are also the only animals that can move backwards and forward in their shells.
},
{
Purpose: Describe a battle scene
Topic: A mouse fighting squirrels
Theme: A sword fighting mouse
Text: Whiskers, the sword-bearing mouse, pulled his sword from its sheath, raised it over his head and lunged at his squirrel enemies. They scurried about as he nobly chased them into a tree. Unable to catch his quarry, Whiskers began slicing hacking at the tree trunk.
},
{
Purpose: A conclusion to an essay
Topic: Discussing whether pineapple belongs on pizza
Theme: Arguing pineapple does not belong on pizza
Text: As I have shown, there is no possible way any sane person should have pineapple on pizza. Pineapple is an affront to both the traditional seafood that one might find on pizza as well as to the more modern foods such as pepperoni and anchovies. While there may be some rare edge cases in which pineapple is acceptable as a topping, in the general case it simply is not appropriate.
},
{
Purpose: An introduction to an essay
Topic: The revolutionary war
Theme: A universe where Dinosaurs fought the Revolutionary war against the Brits
Text: The Revolutionary War was a time of great upheaval and change. But what if it had been fought by dinosaurs instead of humans? The outcome might have been very different. The British would have been no match for the ferocious T-Rex. The Americans would have had the advantage of speed and agility with the Velociraptors on their side.
JSON Formatted
Purpose: An introduction to an essay
Topic: Turtles
Theme: turtles are great
Text: I have always been fascinated by turtles. They are some of the most interesting animals on our planet. They are the only animals that can live without water for extended periods of time. They are also the only animals that can move backwards and forward in their shells.
***
Purpose: Describe a battle scene
Topic: A mouse fighting squirrels
Theme: A sword fighting mouse
Text: Whiskers, the sword-bearing mouse, pulled his sword from its sheath, raised it over his head and lunged at his squirrel enemies. They scurried about as he nobly chased them into a tree. Unable to catch his quarry, Whiskers began slicing hacking at the tree trunk.
***
Purpose: A conclusion to an essay
Topic: Discussing whether pineapple belongs on pizza
Theme: Arguing pineapple does not belong on pizza
Text: As I have shown, there is no possible way any sane person should have pineapple on pizza. Pineapple is an affront to both the traditional seafood that one might find on pizza as well as to the more modern foods such as pepperoni and anchovies. While there may be some rare edge cases in which pineapple is acceptable as a topping, in the general case it simply is not appropriate.
***
Purpose: An introduction to an essay
Topic: The revolutionary war
Theme: A universe where Dinosaurs fought the Revolutionary war against the Brits
Text: The Revolutionary War was a time of great upheaval and change. But what if it had been fought by dinosaurs instead of humans? The outcome might have been very different. The British would have been no match for the mighty T-Rex. The Americans would have had the advantage of numbers with the velociraptors on their side.
Freeflow
Adding Additional Information
Often we want to be able to include additional information as well, which is where things get pretty tricky. Just like when a person is trying to keep track of multiple things at the same time, it's hard for these language models to keep track of lots of data at once. On top of keeping track of what information to include, the model also has to logically piece together what the different pieces of information entail when combined.
We suspect that using JSON formatting can help the larger OpenAI models keep track of what's going on (scroll through some examples of different models and objectives we gave them with our Dinosaur scenario). The largest OpenAI model, text-davinci-002, does a pretty good job of handling our requests. For the other models, JSON formatting can have no effect or even be detrimental.
{
Purpose: An introduction to an essay
Topic: Turtles
Theme: turtles are great
Info: ['Turtles can move different directions in their shells']
Text: I have always been fascinated by turtles. They are some of the most interesting animals on our planet. They are the only animals that can live without water for extended periods of time. They are also the only animals that can move backwards and forward in their shells.
},
{
Purpose: Describe a battle scene
Topic: A mouse fighting squirrels
Theme: A sword fighting mouse
Info: ['The squirrel chased the mouse into the tree']
Text: Whiskers, the sword-bearing mouse, pulled his sword from its sheath, raised it over his head and lunged at his squirrel enemies. They scurried about as he nobly chased them into a tree. Unable to catch his quarry, Whiskers began slicing hacking at the tree trunk.
},
{
Purpose: A conclusion to an essay
Topic: Discussing whether pineapple belongs on pizza
Theme: Arguing pineapple does not belong on pizza
Info: ['Pepperoni is the best topping']
Text: As I have shown, there is no possible way any sane person should have pineapple on pizza. Pineapple is an affront to both the traditional seafood that one might find on pizza as well as to the more modern foods such as pepperoni and anchovies. While there may be some rare edge cases in which pineapple is acceptable as a topping, in the general case it simply is not appropriate.
},
{
Purpose: An introduction to an essay
Topic: The revolutionary war
Theme: A universe where Dinosaurs fought the Revolutionary war against the Brits
Info: ['Velociraptors are scary']
Text: The Revolutionary War was a time of great change. Dinosaurs fought against the British in a universe where they were the dominant species. The outcome of the war is unknown, but it is sure to be an exciting read.
JSON Formatted
Purpose: An introduction to an essay
Topic: Turtles
Theme: turtles are great
Info: Turtles can move different directions in their shells
Text: I have always been fascinated by turtles. They are some of the most interesting animals on our planet. They are the only animals that can live without water for extended periods of time. They are also the only animals that can move backwards and forward in their shells.
***
Purpose: Describe a battle scene
Topic: A mouse fighting squirrels
Theme: A sword fighting mouse
Info: The squirrel chased the mouse into the tree
Text: Whiskers, the sword-bearing mouse, pulled his sword from its sheath, raised it over his head and lunged at his squirrel enemies. They scurried about as he nobly chased them into a tree. Unable to catch his quarry, Whiskers began slicing hacking at the tree trunk.
***
Purpose: A conclusion to an essay
Topic: Discussing whether pineapple belongs on pizza
Theme: Arguing pineapple does not belong on pizza
Info: Pepperoni is the best topping
Text: As I have shown, there is no possible way any sane person should have pineapple on pizza. Pineapple is an affront to both the traditional seafood that one might find on pizza as well as to the more modern foods such as pepperoni and anchovies. While there may be some rare edge cases in which pineapple is acceptable as a topping, in the general case it simply is not appropriate.
***
Purpose: An introduction to an essay
Topic: The revolutionary war
Theme: A universe where Dinosaurs fought the Revolutionary war against the Brits
Info: Velociraptors are scary
Text: The Revolutionary War was a time of great change. The British were fighting against the Americans, who were using dinosaurs as their main weapon. The dinosaurs were very effective against the British troops, who were not used to fighting against such powerful creatures. The British were eventually defeated, and the Americans became the new masters
Freeflow
For each model, we labeled how many examples correctly generated content about the Revolutionary War theme (json or simple). Additionally, for each set of targets, we counted how many of the targets were included in the initial paragraph of the outputs (json_targets and simple_targets). In the table below, we can see that Davinci, Curie, and Fairseq-13B all benefit from having JSON formatting. Unfortunately, it's hit or miss whether the rest are able to generate the target information.
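The counts above are a qualitative, labeled measure. As a rough illustration only, a crude keyword heuristic could approximate the target count by checking whether a few hand-picked keywords for each target show up in the first paragraph of an output. The function names and keyword lists below are assumptions made for this sketch, not the labeling procedure actually used.

```python
def first_paragraph(output):
    """Return the text up to the first blank line."""
    return output.strip().split("\n\n", 1)[0]


def count_targets_hit(output, target_keywords):
    """Count targets whose keywords all occur in the first paragraph.

    `target_keywords` maps each target sentence to hand-picked keywords.
    Matching is a case-insensitive substring test, which is deliberately crude.
    """
    paragraph = first_paragraph(output).lower()
    return sum(
        all(word.lower() in paragraph for word in words)
        for words in target_keywords.values()
    )


if __name__ == "__main__":
    # Hypothetical keyword choices for the three-part target.
    targets = {
        "Dinosaurs eat all their enemies": ["dinosaurs", "eat", "enemies"],
        "Velociraptors hate the brits": ["velociraptors"],
        "The brits are robots": ["robots"],
    }
    output = (
        "The Revolutionary War was a time of great change. The British were "
        "fighting for their lives against the might of the dinosaurs. The "
        "dinosaurs, led by the ferocious velociraptors, were determined to eat "
        "all their enemies. The British, being robots, were not afraid of being eaten."
    )
    print(count_targets_hit(output, targets))
```

Substring matching overcounts easily (for instance, "eat" also matches inside "great"), and it cannot tell whether a target was expressed faithfully, so a heuristic like this would only be a first pass before manual review.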
It was initially surprising that some of the other larger models (e.g. Neo-20B, Cohere models, and AI21 models) were rarely able to generate the target information at all. However, there may be other methods of getting them to generate target information that we haven't explored yet.
Of course, there are benefits to being able to use other models (diversity, cost), so we've still got a lot of work to do figuring out how to get these models to do what we want them to do. We'll keep playing with it and keep you updated.