Prompt Engineering Guide
Source: OpenAI GPT Prompt Engineering Guide
This guide shares strategies and tactics for getting better results from large language models (sometimes referred to as GPT models) like GPT-4. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.
Some of the examples demonstrated here currently work only with our most capable model, GPT-4. In general, if you find that a model fails at a task and a more capable model is available, it's often worth trying again with the more capable model.
You can also explore example prompts which showcase what our models are capable of:
Prompt examples
Explore prompt examples to learn what GPT models can do
Improvements:
1. Add more examples and expand on the ones given to provide greater clarity and variety.
2. Explain why certain methods may only work with more advanced models and what capabilities those models have that enable better performance.
3. Provide analysis of different prompt engineering methods - what works best in what situations? How can methods be combined for multiplicative improvements?
4. Give tactical tips not just high-level strategies - help prompt engineers understand exactly what to try and how to implement techniques.
5. Explain common failure modes and how to address them - where do prompts often go wrong and why? Provide troubleshooting flowcharts.
6. Offer prompts tailored to different applications - writing, QA, search, recommendations etc. Explain nuances of what works best per domain.
7. Provide fully worked examples from end-to-end - show not just snippets but how a production system could be built.
8. Build a taxonomy of methods - classify them by complexity, expected impact, required model capability etc.
9. Automated analysis - given a prompt, suggest improvements algorithmically.
10. Maintain this guide as a living document - update it continually as new techniques emerge. Crowdsource contributions from the community.
Example prompt improvements:
Improvements:
1. Categorize examples by capability tested - summarization, translation, writing styles etc.
2. Annotate examples - explain why each one works well, analysis of the techniques used.
3. Provide enhanced variants of examples - start with a base prompt and show iterations to improve it.
4. Include failure examples too - common ways prompts fall down and how to fix them.
5. Allow users to submit their own examples to share best practices. Upvote/comment on examples.
6. Generate graphs of performance over time - track how much better models/prompts get.
7. A/B test examples - build intuition for what factors actually make a difference.
8. Auto-generate similar prompts programmatically - create broad datasets.
9. Maintain this as a living library that grows over time.
Six strategies for getting better results
1. Write clear instructions
Improvements:
- Give examples of ambiguous instructions and show clearer versions
- Explain psycholinguistic principles for writing unambiguous instructions
- Provide templates/frameworks for writing instructions for different applications
- Build tools to catch ambiguous phrases and suggest fixes automatically
- A/B test instruction variants experimentally to find the optimal wording
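To make the "clear instructions" strategy concrete, here is a minimal sketch of turning a vague instruction into a structured prompt with an explicit task, a delimited input, and a stated output format. The delimiter and format conventions below are illustrative choices, not an official specification.

```python
def build_prompt(instruction: str, text: str, output_format: str) -> str:
    """Compose a prompt with an explicit instruction, a delimited input,
    and a stated output format to reduce ambiguity."""
    return (
        f"{instruction}\n\n"
        f'Text:\n"""\n{text}\n"""\n\n'
        f"Respond as: {output_format}"
    )

# Ambiguous version a user might start with:
vague = "Summarize this."

# Clearer version: audience, length, and format are all spelled out.
clear = build_prompt(
    instruction="Summarize the text below in exactly two sentences "
                "for a non-technical reader.",
    text="Large language models predict the next token in a sequence...",
    output_format="a single paragraph of two sentences",
)
```

The same template function can be reused across tasks, which also makes A/B testing instruction variants easier, since only the `instruction` argument changes between runs.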
2. Provide reference text
Improvements:
- Give examples of reference texts enhancing model capabilities in different domains
- Explain techniques (e.g. embeddings) to retrieve relevant reference texts
- Analyze properties of reference texts that make them useful for enhancing models
- Provide metrics to quantify expected value of reference texts of different types
- Develop tools to select good reference texts algorithmically. Maintain curated reference dataset/corpus.
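As a sketch of the retrieval idea mentioned above, the following uses a toy bag-of-words "embedding" and cosine similarity to pick the most relevant reference text for a query. A real system would call a learned embedding model instead; the bag-of-words vectors here are only a stand-in so the example runs on its own.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a production system would call an
    # embedding model here instead of counting tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k reference texts most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our office is open Monday through Friday.",
]
top = retrieve("How do I get a refund?", docs)
```

The retrieved text would then be inserted into the prompt as reference material for the model to draw on when answering.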
3. Split complex tasks into simpler subtasks
Improvements:
- Provide real-world examples across many domains of complex workflows broken into modular subtasks
- Explain different possible subtask decomposition schemes (sequential, hierarchical, etc.)
- Give programmatic frameworks/templates for composing subtask modules
- Provide concrete tips for defining interfaces between subtask components
- Tools to analyze prompts and suggest productive subtask decompositions
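One simple decomposition scheme from the list above is sequential: each subtask is a function from text to text, and a pipeline threads the output of one step into the next. In this sketch, `call_model` is a hypothetical placeholder for a real model call, so the example is self-contained.

```python
from typing import Callable

# A subtask's interface: text in, text out.
Subtask = Callable[[str], str]

def call_model(prompt: str) -> str:
    # Placeholder: a production system would send `prompt` to a model here.
    return f"[model output for: {prompt[:40]}]"

def extract_facts(document: str) -> str:
    return call_model(f"List the key facts in:\n{document}")

def summarize(facts: str) -> str:
    return call_model(f"Write a one-paragraph summary of:\n{facts}")

def run_pipeline(document: str, steps: list[Subtask]) -> str:
    """Run each subtask in order, feeding each output to the next step."""
    result = document
    for step in steps:
        result = step(result)
    return result

out = run_pipeline("Quarterly revenue rose 12% year over year...",
                   [extract_facts, summarize])
```

Keeping the interface between subtasks this narrow (plain text in, plain text out) makes modules easy to reorder, test in isolation, and swap for improved versions.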
4. Give models time to "think"
Improvements:
- Illustrate tradeoffs between inference time and answer quality
- Explain how to structure prompts to reveal reasoning step-by-step
- Provide guidance on optimal think time per application
- Build models quantifying think time needed to hit accuracy targets
- Give programmatic tools to orchestrate thinking - pausing, prompting, etc.
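A common way to structure prompts for step-by-step reasoning is to ask for the working first and a clearly marked conclusion last, then parse the two apart. The `Final answer:` marker below is a convention chosen for this sketch, not a standard; any unambiguous delimiter works.

```python
# Template asking the model to reason before concluding.
COT_TEMPLATE = (
    "Work through the problem step by step. Then give your conclusion "
    "on a new line starting with 'Final answer:'.\n\n"
    "Problem: {problem}"
)

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the reasoning steps from the marked final answer."""
    marker = "Final answer:"
    if marker in response:
        reasoning, _, answer = response.partition(marker)
        return reasoning.strip(), answer.strip()
    # No marker found: treat the whole response as reasoning.
    return response.strip(), ""

# Example with a hypothetical model response:
reasoning, answer = split_reasoning(
    "17 * 3 = 51, and 51 + 4 = 55.\nFinal answer: 55"
)
```

Separating reasoning from the answer lets an application show or hide the working, and also makes automated grading simpler, since only the extracted answer needs to match.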
5. Use external tools
Improvements:
- Provide examples across domains of combining models with traditional code
- Explain best practices for sandboxing untrusted model outputs used in code
- Metrics quantifying reliability gains from different tools (execution engines, APIs etc.)
- Guides to prompting models effectively to leverage tools like embeddings, executors etc.
- Anti-patterns - common failure cases of combining models & tools
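On sandboxing untrusted model outputs: if a model is asked to emit an arithmetic expression for the application to compute, the expression should never go to `eval()`. A minimal sketch of a safer approach is to parse it with the standard-library `ast` module and allow only number literals and basic operators.

```python
import ast
import operator

# Whitelist of permitted binary operators.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    """Evaluate a model-produced arithmetic expression, rejecting
    anything that is not numbers and the four basic operators."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"disallowed expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))

result = safe_eval("(12 + 3) * 4 / 2")  # 30.0
```

Anything outside the whitelist, such as function calls or attribute access, raises `ValueError` instead of executing, which is the property that makes the tool safe to drive from untrusted text.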
6. Test changes systematically
Improvements:
- Explain how to quantify differences statistically robustly
- Tools to auto-generate test cases programmatically
- Metrics to score question-answer quality automatically
- Explain human evaluation and its tradeoffs vs automated metrics
- Examples of test suites across various domains/applications
- Guidance on sufficient sample sizes to achieve statistical confidence
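The points above can be sketched as a tiny evaluation harness: score exact-match accuracy over a test set and report a 95% confidence interval (normal approximation), so two prompt variants can be compared with some statistical footing. The `predict` function here is a hypothetical stand-in for "model plus prompt under test".

```python
import math

def evaluate(predict, test_cases: list[tuple[str, str]]) -> dict:
    """Score exact-match accuracy with a 95% normal-approximation CI."""
    correct = sum(1 for q, expected in test_cases if predict(q) == expected)
    n = len(test_cases)
    p = correct / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)  # 95% CI half-width
    return {"accuracy": p,
            "ci_low": max(0.0, p - margin),
            "ci_high": min(1.0, p + margin),
            "n": n}

# Hypothetical predictor standing in for a real model call.
def predict(question: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(question, "")

cases = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
report = evaluate(predict, cases)
```

The interval width shrinks with the square root of the sample size, which is the quantitative basis for the guidance above on how many test cases are needed to detect a given improvement.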
In summary, for each strategy, significantly expand on the initial guidance by providing additional examples, templates, tools, and analytics that make the strategies easier to operationalize across a wide variety of applications. Treat this guide as an ever-evolving living document that grows over time with community input.