Prompt Engineering Guide
Source: OpenAI GPT Prompt Engineering Guide
This guide shares strategies and tactics for getting better results from large language models (sometimes referred to as GPT models) like GPT-4. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.
Some of the examples demonstrated here currently work only with our most capable model, GPT-4. In general, if you find that a model fails at a task and a more capable model is available, it's often worth trying again with the more capable model.
You can also explore example prompts which showcase what our models are capable of:
Prompt examples
Explore prompt examples to learn what GPT models can do
Improvements:
1. Add more examples and expand on the ones given to provide greater clarity and variety.
2. Explain why certain methods may only work with more advanced models and what capabilities those models have that enable better performance.
3. Provide analysis of different prompt engineering methods - what works best in what situations? How can methods be combined for multiplicative improvements?
4. Give tactical tips not just high-level strategies - help prompt engineers understand exactly what to try and how to implement techniques.
5. Explain common failure modes and how to address them - where do prompts often go wrong and why? Provide troubleshooting flowcharts.
6. Offer prompts tailored to different applications - writing, QA, search, recommendations etc. Explain nuances of what works best per domain.
7. Provide fully worked examples from end-to-end - show not just snippets but how a production system could be built.
8. Build a taxonomy of methods - classify them by complexity, expected impact, required model capability etc.
9. Automated analysis - given a prompt, suggest improvements algorithmically.
10. Maintain this guide as a living document - update it continually as new techniques emerge. Crowdsource contributions from the community.
Example prompt improvements:
Improvements:
1. Categorize examples by capability tested - summarization, translation, writing styles etc.
2. Annotate examples - explain why each one works well, analysis of the techniques used.
3. Provide enhanced variants of examples - start with a base prompt and show iterations to improve it.
4. Include failure examples too - common ways prompts fall down and how to fix them.
5. Allow users to submit their own examples to share best practices. Upvote/comment on examples.
6. Generate graphs of performance over time - track how much better models/prompts get.
7. A/B test examples - build intuition for what factors actually make a difference.
8. Auto-generate similar prompts programmatically - create broad datasets.
9. Maintain this as a living library that grows over time.
Six strategies for getting better results
1. Write clear instructions
Improvements:
- Give examples of ambiguous instructions and show clearer versions
- Explain psycholinguistic principles for writing unambiguous instructions
- Provide templates/frameworks for writing instructions for different applications
- Build tools to catch ambiguous phrases and suggest fixes automatically
- A/B test instruction variants experimentally to find the optimal wording
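To make the "clear instructions" strategy concrete, here is a minimal sketch of turning a vague instruction into a structured prompt with an explicit task, a delimited input, and a stated output format. The delimiter and format conventions below are illustrative choices, not an official specification.

```python
def build_prompt(instruction: str, text: str, output_format: str) -> str:
    """Compose a prompt with an explicit instruction, a delimited input,
    and a stated output format to reduce ambiguity."""
    return (
        f"{instruction}\n\n"
        f'Text:\n"""\n{text}\n"""\n\n'
        f"Respond as: {output_format}"
    )

# Ambiguous version a user might start with:
vague = "Summarize this."

# Clearer version: audience, length, and format are all spelled out.
clear = build_prompt(
    instruction="Summarize the text below in exactly two sentences "
                "for a non-technical reader.",
    text="Large language models predict the next token in a sequence...",
    output_format="a single paragraph of two sentences",
)
```

The same template function can be reused across tasks, which also makes A/B testing instruction variants easier, since only the `instruction` argument changes between runs.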
2. Provide reference text
Improvements:
- Give examples of reference texts enhancing model capabilities in different domains
- Explain techniques (e.g. embeddings) to retrieve relevant reference texts
- Analyze properties of reference texts that make them useful for enhancing models
- Provide metrics to quantify expected value of reference texts of different types
- Develop tools to select good reference texts algorithmically. Maintain curated reference dataset/corpus.
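As a sketch of the retrieval idea mentioned above, the following uses a toy bag-of-words "embedding" and cosine similarity to pick the most relevant reference text for a query. A real system would call a learned embedding model instead; the bag-of-words vectors here are only a stand-in so the example runs on its own.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a production system would call an
    # embedding model here instead of counting tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k reference texts most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our office is open Monday through Friday.",
]
top = retrieve("How do I get a refund?", docs)
```

The retrieved text would then be inserted into the prompt as reference material for the model to draw on when answering.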
3. Split complex tasks into simpler subtasks
Improvements:
- Provide real-world examples across many domains of complex workflows broken into modular subtasks
- Explain different possible subtask decomposition schemes (sequential, hierarchical, etc.)
- Give programmatic frameworks/templates for composing subtask modules
- Provide concrete tips for defining interfaces between subtask components
- Tools to analyze prompts and suggest productive subtask decompositions
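One simple decomposition scheme from the list above is sequential: each subtask is a function from text to text, and a pipeline threads the output of one step into the next. In this sketch, `call_model` is a hypothetical placeholder for a real model call, so the example is self-contained.

```python
from typing import Callable

# A subtask's interface: text in, text out.
Subtask = Callable[[str], str]

def call_model(prompt: str) -> str:
    # Placeholder: a production system would send `prompt` to a model here.
    return f"[model output for: {prompt[:40]}]"

def extract_facts(document: str) -> str:
    return call_model(f"List the key facts in:\n{document}")

def summarize(facts: str) -> str:
    return call_model(f"Write a one-paragraph summary of:\n{facts}")

def run_pipeline(document: str, steps: list[Subtask]) -> str:
    """Run each subtask in order, feeding each output to the next step."""
    result = document
    for step in steps:
        result = step(result)
    return result

out = run_pipeline("Quarterly revenue rose 12% year over year...",
                   [extract_facts, summarize])
```

Keeping the interface between subtasks this narrow (plain text in, plain text out) makes modules easy to reorder, test in isolation, and swap for improved versions.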
4. Give models time to "think"
Improvements:
- Illustrate tradeoffs between inference time and answer quality
- Explain how to structure prompts to reveal reasoning step-by-step
- Provide guidance on optimal think time per application
- Build models quantifying think time needed to hit accuracy targets
- Give programmatic tools to orchestrate thinking - pausing, prompting, etc.
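A common way to structure prompts for step-by-step reasoning is to ask for the working first and a clearly marked conclusion last, then parse the two apart. The `Final answer:` marker below is a convention chosen for this sketch, not a standard; any unambiguous delimiter works.

```python
# Template asking the model to reason before concluding.
COT_TEMPLATE = (
    "Work through the problem step by step. Then give your conclusion "
    "on a new line starting with 'Final answer:'.\n\n"
    "Problem: {problem}"
)

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the reasoning steps from the marked final answer."""
    marker = "Final answer:"
    if marker in response:
        reasoning, _, answer = response.partition(marker)
        return reasoning.strip(), answer.strip()
    # No marker found: treat the whole response as reasoning.
    return response.strip(), ""

# Example with a hypothetical model response:
reasoning, answer = split_reasoning(
    "17 * 3 = 51, and 51 + 4 = 55.\nFinal answer: 55"
)
```

Separating reasoning from the answer lets an application show or hide the working, and also makes automated grading simpler, since only the extracted answer needs to match.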
5. Use external tools
Improvements:
- Provide examples across domains of combining models with traditional code
- Explain best practices for sandboxing untrusted model outputs used in code
- Metrics quantifying reliability gains from different tools (execution engines, APIs etc.)
- Guides to prompting models effectively to leverage tools like embeddings, executors etc.
- Anti-patterns - common failure cases of combining models & tools
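On sandboxing untrusted model outputs: if a model is asked to emit an arithmetic expression for the application to compute, the expression should never go to `eval()`. A minimal sketch of a safer approach is to parse it with the standard-library `ast` module and allow only number literals and basic operators.

```python
import ast
import operator

# Whitelist of permitted binary operators.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    """Evaluate a model-produced arithmetic expression, rejecting
    anything that is not numbers and the four basic operators."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"disallowed expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))

result = safe_eval("(12 + 3) * 4 / 2")  # 30.0
```

Anything outside the whitelist, such as function calls or attribute access, raises `ValueError` instead of executing, which is the property that makes the tool safe to drive from untrusted text.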
6. Test changes systematically
Improvements:
- Explain how to quantify differences statistically robustly
- Tools to auto-generate test cases programmatically
- Metrics to score question-answer quality automatically
- Explain human evaluation and its tradeoffs vs automated metrics
- Examples of test suites across various domains/applications
- Guidance on sufficient sample sizes to achieve statistical confidence
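The points above can be sketched as a tiny evaluation harness: score exact-match accuracy over a test set and report a 95% confidence interval (normal approximation), so two prompt variants can be compared with some statistical footing. The `predict` function here is a hypothetical stand-in for "model plus prompt under test".

```python
import math

def evaluate(predict, test_cases: list[tuple[str, str]]) -> dict:
    """Score exact-match accuracy with a 95% normal-approximation CI."""
    correct = sum(1 for q, expected in test_cases if predict(q) == expected)
    n = len(test_cases)
    p = correct / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)  # 95% CI half-width
    return {"accuracy": p,
            "ci_low": max(0.0, p - margin),
            "ci_high": min(1.0, p + margin),
            "n": n}

# Hypothetical predictor standing in for a real model call.
def predict(question: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(question, "")

cases = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
report = evaluate(predict, cases)
```

The interval width shrinks with the square root of the sample size, which is the quantitative basis for the guidance above on how many test cases are needed to detect a given improvement.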
In summary, for each strategy, significantly expand on the initial guidance by providing additional examples, templates, tools, and analytics that make the strategies easier to operationalize across a wide variety of applications. Treat this guide as an ever-evolving living document that grows over time with community input.