Few developments in artificial intelligence have captured such widespread fascination as the rise of Large Language Models (LLMs). From composing code snippets and drafting persuasive prose to performing intricate logical reasoning, these models have redefined our notion of what’s achievable through algorithmic language generation. Underpinning much of this progress is the principle of prompting—the deliberately crafted input or sequence of instructions that “nudges” the model into producing desired outputs.
Over time, prompting has developed from simplistic question-answer formats into a rich tapestry of strategies and sub-strategies, each uniquely engineered to shape the path of generative reasoning. Drawing on a comprehensive taxonomy that enumerates 58 text-based prompting tactics and an additional 40 that span multimodal domains (images, audio, video, 3D data, and beyond), this article aims to systematically dissect these methods. We’ll explore zero-shot and few-shot paradigms, chain-of-thought variants, ensembling and self-critique mechanisms, as well as a panoramic look at how prompting can integrate imagery, sound, and interactive visualizations.
This article weaves a wide-lens portrait of how each technique operates, why it’s valuable, and what potential hazards it entails. By the end, you’ll be acquainted with a wide array of prompt-engineering best practices, each promising new avenues for optimizing LLM responses, whether in purely textual tasks or in visually and auditorily enhanced frameworks.
1. Foundational Concepts of Prompting
1.1 The Essence of a Prompt
A “prompt” represents the initial impetus for an LLM’s output, encapsulating both the request and the context surrounding that request. From a single instruction (“Translate this sentence to French.”) to an elaborate, multi-step dialogue, prompts shape the boundaries within which the model operates. Notably, modern LLMs often respond with higher fidelity when prompts are methodically constructed, balancing specificity with clarity.
1.2 Why Prompt Engineering Matters
Careful prompt engineering is the difference between haphazard responses and coherent, on-point solutions. Through experiments and practice, researchers and practitioners have observed that even subtle changes in word order or context can shift an LLM’s outputs. Consequently, engineering a robust prompt entails trial, error, and domain-specific acumen.
1.3 Major Prompting Frameworks
Prompts frequently fall into recognizable patterns:
- Zero-Shot: Provide no examples, only instructions.
- Few-Shot: Supply a handful of exemplars that demonstrate the desired response style or format.
- Chain-of-Thought (CoT): Instruct the model to reveal intermediate reasoning steps.
- Self-Correction Loops: Let the model refine or critique its own solution before finalizing.
- Tool Usage: Facilitate the LLM’s interaction with external modules (search engines, calculators, code runners) during the generation process.
For readers seeking broader foundational theories, the survey by Liu et al. (2023) provides an extensive look at prompt engineering techniques and their cross-disciplinary applications.
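To make the first two patterns concrete, here is a minimal sketch of how zero-shot and few-shot prompts might be assembled in Python. The `llm` function is a hypothetical stand-in for whatever model client you use; everything else is plain string formatting.

```python
# A minimal sketch of the zero-shot and few-shot patterns described above.
# `llm` is a hypothetical stand-in for a real model client.

def llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., an API client)."""
    raise NotImplementedError("wire this up to your LLM of choice")

def zero_shot(task: str) -> str:
    # Instructions only, no demonstrations.
    return f"{task}\n\nAnswer:"

def few_shot(task: str, examples: list[tuple[str, str]]) -> str:
    # Prepend worked examples so the model can infer the desired format.
    demos = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{demos}\n\nInput: {task}\nOutput:"

if __name__ == "__main__":
    print(zero_shot("Translate to French: 'The library opens at nine.'"))
    print(few_shot("elated", [("gloomy", "sad"), ("cheerful", "happy")]))
```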
2. Zero-Shot Techniques
2.1 Zero-Shot Prompting
This minimalist method involves simply specifying a task without demonstrations or templates. For instance:
“Write a 200-word abstract about sustainable urban gardening.”
Because the model relies exclusively on its pre-training, the result’s quality depends greatly on how precisely you word the request.
2.2 Zero-Shot Chain-of-Thought (Zero-Shot-CoT)
Zero-Shot-CoT nudges the model to lay out its thought process. You might say:
“Explain each step of your reasoning before revealing the final answer.”
By coaxing the model to articulate its intermediate logic, you often unlock more accurate or coherent responses—particularly for math or problem-solving queries.
2.3 Zero-Shot-CoT with Self-Consistency
Self-consistency extends Zero-Shot-CoT by directing the model to generate multiple lines of reasoning or to re-check the logic. This can be orchestrated explicitly or by instructing the model to compare multiple reasoning threads, reducing the likelihood of contradictions.
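A rough sketch of this idea in Python: sample several chains of reasoning at a non-zero temperature, pull out each final answer, and keep the one most chains agree on. The `sample_llm` function and the “Answer: &lt;value&gt;” convention are assumptions made for illustration, not part of any particular API.

```python
# Zero-Shot-CoT with self-consistency: sample several reasoning chains and
# keep the answer the chains agree on most often.
from collections import Counter
import re

def sample_llm(prompt: str) -> str:
    """Hypothetical stand-in for a sampled (temperature > 0) model call."""
    raise NotImplementedError("replace with your model call")

COT_SUFFIX = "\nLet's think step by step, then give the final answer as 'Answer: <value>'."

def extract_answer(completion: str) -> str | None:
    match = re.search(r"Answer:\s*(.+)", completion)
    return match.group(1).strip() if match else None

def self_consistent_answer(question: str, n_samples: int = 5) -> str | None:
    answers = []
    for _ in range(n_samples):
        completion = sample_llm(question + COT_SUFFIX)
        answer = extract_answer(completion)
        if answer is not None:
            answers.append(answer)
    # Majority vote over the extracted answers.
    return Counter(answers).most_common(1)[0][0] if answers else None
```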
2.4 Role Prompting
Role prompting assigns the model a specific persona or perspective. For example:
“You are a film critic with 20 years’ experience. Provide a concise but sharp review of this screenplay.”
By anchoring the output in a defined role, you encourage the model to align tone, style, and expertise with that persona.
2.5 Style Prompting
Style prompting zeroes in on aesthetic and rhetorical qualities. Whether you’re after a Shakespearean sonnet or a corporate memo, instruct the model explicitly:
“Produce a formal, bullet-point summary in executive style.”
2.6 Emotion Prompting
Emotion prompting instructs the model to convey a chosen emotional undercurrent (e.g., hope, urgency, lamentation). This can apply to marketing copy, motivational speeches, or user engagement messages.
2.7 System 2 Attention (S2A)
Inspired by Daniel Kahneman’s distinction between fast, intuitive and slow, deliberate thinking, S2A asks the model to first rewrite the input, stripping out irrelevant or biasing material, and only then answer from that cleaned-up context. For instance:
“First, rewrite the question so it contains only the information needed to answer it. Then answer the rewritten question.”
2.8 SimToM (Simulated Theory of Mind)
SimToM targets questions that hinge on what a particular person knows or believes. The prompt first establishes that person’s knowledge, then answers from their perspective:
“First, list the facts that Alice is aware of. Then answer the question using only those facts.”
2.9 Rephrase and Respond (RaR)
In RaR, the prompt first instructs the LLM to restate and expand the question, resolving ambiguities, and then to answer the rephrased version. This two-step structure often surfaces hidden assumptions before they derail the response.
2.10 Re-reading (RE2)
RE2 simply asks the model to read the question twice, for example by appending “Read the question again:” followed by a repetition of the question, before any reasoning begins. The second pass helps the model catch details it might otherwise skim past.
2.11 Self-Ask
Self-Ask prompts the model to generate clarifying sub-questions on its own, then solve those before arriving at a final solution. It’s a meta-level approach to ensure the model tackles each part of a complex task.
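Below is one possible orchestration of Self-Ask, with the model first listing its own follow-up questions and then answering them before the final response. The `llm` helper and the line-per-question format are assumed conventions, not a fixed recipe.

```python
# Self-Ask sketch: generate sub-questions, answer each, then answer the
# original question using the accumulated intermediate answers.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError("replace with your model call")

def self_ask(question: str) -> str:
    sub_qs = llm(
        f"Question: {question}\n"
        "List the follow-up questions you would need to answer first, one per line."
    ).splitlines()

    notes = []
    for sub_q in (q.strip("- ").strip() for q in sub_qs if q.strip()):
        notes.append(f"{sub_q}\n{llm(sub_q)}")

    context = "\n\n".join(notes)
    return llm(
        f"Question: {question}\n\n"
        f"Intermediate answers:\n{context}\n\n"
        "Using the intermediate answers above, give the final answer."
    )
```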
3. Few-Shot Techniques
3.1 Few-Shot Prompting
Unlike zero-shot, few-shot prompting places illustrative examples directly into the prompt. For instance, you might show two sample dialogues, each with questions and ideal answers, before posing a new, similar question.
3.2 Exemplar Generation (SG-ICL)
SG-ICL (Self-Generated In-Context Learning) has the model produce its own exemplars when curated ones are unavailable: it generates plausible input/output pairs that match the domain or style of the forthcoming query, then uses them as the few-shot block (see the sketch below).
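One way this might look in code, with the model writing its own input/output pairs and the prompt builder reusing them verbatim. The `llm` helper and the “Input:/Output:” formatting are illustrative assumptions.

```python
# Self-generated in-context learning: ask the model to invent a few
# exemplars for the task, then reuse them as the few-shot block.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError("replace with your model call")

def generate_exemplars(task_description: str, k: int = 3) -> str:
    return llm(
        f"Task: {task_description}\n"
        f"Write {k} example input/output pairs for this task, formatted as\n"
        "'Input: ...' followed by 'Output: ...'."
    )

def sg_icl_prompt(task_description: str, query: str) -> str:
    exemplars = generate_exemplars(task_description)
    return f"{exemplars}\n\nInput: {query}\nOutput:"
```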
3.3 Exemplar Ordering
The sequence in which you present examples can influence the final answer. Placing a complex example last might “prime” the model for more advanced reasoning. Meanwhile, presenting them in a gradual complexity arc can foster a more incremental approach.
3.4 Exemplar Quantity
Too few examples, and the model might lack a robust “pattern.” Too many, and your prompt might be truncated by token limits or lead to confusion. Striking the right balance—often two to five exemplars—can be crucial.
3.5 Exemplar Similarity
Exemplars that resemble the target query’s domain usually boost performance. For instance, if your query is about legal contracts, examples revolving around contract analysis will likely outperform random or unrelated ones.
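As a simplified illustration, exemplars can be ranked by lexical overlap with the query; production systems typically use embedding similarity instead, but the selection logic has the same shape. The pool contents below are invented for the example.

```python
# Similarity-based exemplar selection: score each candidate exemplar against
# the query and keep the top-k most similar ones.

def overlap_score(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)  # Jaccard similarity

def select_exemplars(query: str, pool: list[tuple[str, str]], k: int = 3):
    """pool holds (input, output) pairs; return the k most query-like ones."""
    ranked = sorted(pool, key=lambda ex: overlap_score(query, ex[0]), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    pool = [
        ("Summarize this lease agreement clause.", "..."),
        ("Summarize this weather report.", "..."),
        ("Summarize this indemnification clause.", "..."),
    ]
    print(select_exemplars("Summarize this non-compete clause.", pool, k=2))
```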
3.6 Instruction Selection
Selecting or combining multiple instructions can further steer the model. By providing a short, direct instruction followed by a more detailed, context-rich instruction, you can offer both clarity and depth.
4. Chain-of-Thought Variants
4.1 Standard Chain-of-Thought
Standard CoT entails demonstrating the problem-solving process in multiple steps. You might present a math word problem and explicitly show the chain of logical deductions leading to the answer.
4.2 Zero-Shot-CoT Variants: Step-Back, Analogical, Thread-of-Thought, Tabular CoT
- Step-Back Prompting: Before tackling the specific question, ask the model a more generic, higher-level question about the underlying concepts or principles, then use that abstraction to guide the final answer.
- Analogical Prompting: Have the model recall or generate analogous problems and their solutions, then apply the same pattern to the target problem.
- Thread-of-Thought: Walk through a long or cluttered context in manageable parts, summarizing and analyzing each part before drawing a conclusion.
- Tabular CoT: Present the reasoning in a table format, delineating premises, steps, and final outcomes.
4.3 Few-Shot-CoT
Here, you provide examples of chain-of-thought solutions before requesting a new solution. By explicitly showcasing a stepwise problem-solving pattern, you induce the model to follow suit.
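A small sketch of a Few-Shot-CoT prompt, with a single worked exemplar whose reasoning is spelled out. The exemplar here is invented for illustration; in practice you would draw exemplars from your own domain.

```python
# Few-Shot-CoT: the exemplar shows its working, nudging the model to do the
# same for the new problem.

COT_EXEMPLAR = (
    "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A: 12 pens is 12 / 3 = 4 groups of three. Each group costs $2, "
    "so the total is 4 * 2 = $8. The answer is $8."
)

def few_shot_cot_prompt(question: str) -> str:
    return f"{COT_EXEMPLAR}\n\nQ: {question}\nA:"

if __name__ == "__main__":
    print(few_shot_cot_prompt(
        "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
    ))
```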
4.4 Contrastive CoT
Contrastive CoT supplies exemplars containing both sound and deliberately flawed reasoning, so the model sees not only how to reason but also which mistakes to avoid before producing its own chain.
4.5 Uncertainty-Routed CoT
Uncertainty-Routed CoT samples several reasoning chains and accepts the majority answer only when the chains agree strongly enough; below that confidence threshold, it falls back to a single, more conservative response.
4.6 Complexity-Based Prompting
Complexity-based prompting leans into difficulty on both ends: it selects exemplars with longer, more involved reasoning chains, and when sampling multiple solutions it restricts the majority vote to the more elaborate chains, on the assumption that answers reached through deeper reasoning are more reliable.
4.7 Active Prompting
Active Prompting identifies the questions the model is most uncertain about (for instance, by sampling several answers and measuring their disagreement) and routes those to human annotators for chain-of-thought exemplars, concentrating annotation effort where it matters most.
4.8 Memory-of-Thought
Here, partial solutions are kept in “memory” for subsequent steps. The model is prompted to recall or reference earlier steps explicitly when forming its next statement.
4.9 Automatic Chain-of-Thought (Auto-CoT)
This meta-approach uses automated search or algorithms to generate chain-of-thought exemplars, reducing the human labor required to craft detailed reasoning prompts.
5. Decomposition-Based Techniques
5.1 Least-to-Most Prompting
The task is broken into simpler sub-tasks, each tackled in ascending order of difficulty. By gradually scaling complexity, the model avoids confusion and keeps intermediate outputs focused.
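A possible least-to-most loop, assuming a hypothetical `llm` helper: the model first lists sub-problems from simplest to hardest, and each is then solved with the earlier solutions included in the prompt.

```python
# Least-to-most prompting: decompose, then solve sub-problems in order,
# carrying earlier answers forward.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError("replace with your model call")

def least_to_most(problem: str) -> str:
    decomposition = llm(
        f"Problem: {problem}\n"
        "List the sub-problems needed to solve this, ordered from simplest to "
        "hardest, one per line."
    )
    sub_problems = [
        line.strip("- ").strip() for line in decomposition.splitlines() if line.strip()
    ]

    solved_so_far = ""
    answer = ""
    for sub in sub_problems:
        answer = llm(
            f"Problem: {problem}\n"
            f"Already solved:\n{solved_so_far or '(nothing yet)'}\n"
            f"Now solve: {sub}"
        )
        solved_so_far += f"{sub}\n{answer}\n\n"
    return answer  # the answer to the final (hardest) sub-problem
```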
5.2 Decomposed Prompting (DECOMP)
DECOMP orchestrates multiple sub-prompts that feed into a higher-level query. You might explicitly itemize tasks and incorporate the sub-results into a master prompt, ensuring methodical thoroughness.
5.3 Plan-and-Solve Prompting
In this two-stage method, the model first outlines a plan or set of steps. Only once that plan is established does it “solve” the problem. Segmenting planning from execution can prevent hasty errors.
5.4 Tree-of-Thought
By branching out possible reasoning paths, the model explores multiple interpretations in parallel. Users can instruct the LLM to generate a “tree” of potential solutions, then converge on the branch that seems most correct.
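The sketch below is a heavily simplified tree-of-thought search: propose a few candidate next steps for each partial solution, have the model score them, and keep a small beam of the most promising branches. The `llm` helper, the 0-to-10 scoring prompt, and the beam parameters are all illustrative assumptions.

```python
# Simplified Tree-of-Thought: propose, score, and keep a small beam of
# partial reasoning traces, then answer from the best one.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError("replace with your model call")

def propose(problem: str, partial: str, k: int = 3) -> list[str]:
    text = llm(
        f"Problem: {problem}\nReasoning so far:\n{partial or '(none)'}\n"
        f"Suggest {k} distinct next reasoning steps, one per line."
    )
    return [line.strip("- ").strip() for line in text.splitlines() if line.strip()]

def score(problem: str, partial: str) -> float:
    text = llm(
        f"Problem: {problem}\nReasoning so far:\n{partial}\n"
        "On a scale of 0 to 10, how promising is this line of reasoning? "
        "Reply with a single number."
    )
    try:
        return float(text.strip().split()[0])
    except ValueError:
        return 0.0

def tree_of_thought(problem: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [""]  # partial reasoning traces
    for _ in range(depth):
        candidates = [
            (p + "\n" + step).strip()
            for p in frontier
            for step in propose(problem, p)
        ]
        if not candidates:
            break
        candidates.sort(key=lambda c: score(problem, c), reverse=True)
        frontier = candidates[:beam]
    best = frontier[0]
    return llm(f"Problem: {problem}\nReasoning:\n{best}\nGive the final answer.")
```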
5.5 Recursion-of-Thought
This approach essentially asks the model to tackle a sub-problem, produce an answer, then feed that answer into subsequent sub-questions recursively until the ultimate query is resolved.
5.6 Program-of-Thoughts
Shaping the chain-of-thought like pseudo-code or a script, with variables and loops, can give the model a more algorithmic roadmap. It is especially powerful when combined with external code execution.
5.7 Faithful Chain-of-Thought
Faithful CoT pairs natural-language reasoning with a symbolic counterpart, such as executable code, so that the final answer is produced by actually running the reasoning rather than merely narrating it. This keeps the enumerated steps and the final answer aligned by construction.
5.8 Skeleton-of-Thought
A high-level outline of main ideas, the “skeleton,” is generated first; each skeleton point is then expanded independently (potentially in parallel), which both keeps critical subpoints from being skipped and can speed up generation.
5.9 Metacognitive Prompting
Metacognitive prompting nudges the model to reflect on its own reasoning, systematically assessing correctness or coherence. This introspective layer can reveal where additional context is needed.
6. Ensembling and Aggregation
6.1 Demonstration Ensembling (DENSE)
DENSE builds several prompts for the same task, each with a different subset of exemplars, runs them separately, and aggregates the resulting answers. Drawing on varied demonstrations gives the ensemble a broader perspective than any single prompt.
6.2 Mixture of Reasoning Experts (MoRE)
Label each chain-of-thought or demonstration as coming from a unique “expert.” The final step instructs the model to unify or prioritize these experts’ outputs. This fosters multi-faceted reasoning.
6.3 Max Mutual Information Method
This method compares candidate prompt templates and selects the one that maximizes the mutual information between the prompt and the model’s outputs, on the premise that a template which constrains the output distribution more strongly is doing more useful work.
6.4 Self-Consistency (Universal Self-Consistency)
Self-consistency generates several candidate solutions and keeps the answer the majority agree on. Universal Self-Consistency extends this to free-form outputs: rather than tallying exact matches, the model itself is shown all candidates and asked to select the most consistent one (see the sketch below).
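A minimal sketch of the universal variant, in which the model is shown all sampled candidates and asked to pick the most consistent one; this avoids the need for exact-match answer extraction. Both model helpers are hypothetical stand-ins.

```python
# Universal self-consistency: sample free-form responses, then let the model
# choose the one most consistent with the rest.

def sample_llm(prompt: str) -> str:
    """Hypothetical sampled (temperature > 0) model call."""
    raise NotImplementedError("replace with your model call")

def llm(prompt: str) -> str:
    """Hypothetical deterministic model call."""
    raise NotImplementedError("replace with your model call")

def universal_self_consistency(question: str, n: int = 5) -> str:
    candidates = [sample_llm(question) for _ in range(n)]
    numbered = "\n\n".join(f"Response {i + 1}:\n{c}" for i, c in enumerate(candidates))
    return llm(
        f"Question: {question}\n\n{numbered}\n\n"
        "Select the response that is most consistent with the majority of the "
        "responses above, and restate it."
    )
```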
6.5 Meta-Reasoning over Multiple CoTs
Similar to ensembling but more explicit about generating multiple chain-of-thought solutions. Afterward, a “meta-reasoning” pass compares and merges these solutions into a single coherent conclusion.
6.6 DiVeRSe
DiVeRSe (Diverse Verifier on Reasoning Steps) combines prompt variety with verification: multiple prompts each produce several reasoning paths, and a verifier scores those paths so that the best-supported answer wins.
6.7 COSP (Consistency-based Self-adaptive Prompting)
COSP first runs Zero-Shot-CoT with self-consistency over a pool of example inputs, keeps the outputs the model agreed on most strongly, and then reuses those high-confidence examples as few-shot exemplars for the actual query.
6.8 USP (Universal Self-Adaptive Prompting)
USP generalizes COSP beyond reasoning tasks: the model generates candidate outputs for unlabeled examples across task types, the most confident ones are promoted to pseudo-demonstrations, and those demonstrations are prepended to the final prompt.
6.9 Prompt Paraphrasing
Paraphrasing the request in multiple forms can yield different angles of attack, clarifying ambiguities or surfacing new insights. The final output often blends these vantage points.
7. Self-Criticism and Refinement
7.1 Self-Criticism
The LLM is prompted to actively critique its own answer, identifying leaps in logic or unverified assumptions. This can be particularly helpful when user trust depends on transparency.
7.2 Self-Calibration
After producing an answer, the model is asked whether that answer is correct, or how confident it is. Where reported confidence is low, the system can request more context or trigger a re-evaluation of the prior step.
7.3 Self-Refine
After producing a first pass, the model is explicitly told to revise that text. This second pass can tighten coherence or rectify factual missteps, akin to a writer editing a draft.
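A compact Self-Refine loop might look like the following, alternating critique and revision until the critique signals that no changes are needed. The `llm` helper and the “DONE” convention are assumptions made for the sketch.

```python
# Self-Refine: draft, critique, revise, and repeat until the critique says
# the draft is acceptable or a round limit is hit.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError("replace with your model call")

def self_refine(task: str, max_rounds: int = 3) -> str:
    draft = llm(task)
    for _ in range(max_rounds):
        feedback = llm(
            f"Task: {task}\n\nDraft:\n{draft}\n\n"
            "Critique this draft. If it needs no changes, reply only with 'DONE'."
        )
        if feedback.strip().upper().startswith("DONE"):
            break
        draft = llm(
            f"Task: {task}\n\nDraft:\n{draft}\n\nFeedback:\n{feedback}\n\n"
            "Rewrite the draft, addressing the feedback."
        )
    return draft
```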
7.4 Reverse Chain-of-Thought (RCoT)
RCoT inverts the direction: the model states the final conclusion, then “walks backward” through the reasoning path to validate each link in the chain. This retroactive logic check can reveal hidden contradictions.
7.5 Self-Verification
Here, the model cross-verifies each step of the reasoning by plugging the conclusion back into the original query or constraints, akin to checking an algebraic solution by substitution.
7.6 Chain-of-Verification (COVE)
COVE is a specialized form of self-verification that explicitly enumerates a second chain-of-thought devoted to fact-checking the first. If any discrepancy emerges, the model flags it for resolution.
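One way to wire up COVE, assuming a hypothetical `llm` helper: draft, plan verification questions, answer them in isolation so the draft cannot bias them, and then revise.

```python
# Chain-of-Verification: draft, plan checks, answer checks independently,
# then produce a revised answer consistent with the checks.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError("replace with your model call")

def chain_of_verification(question: str) -> str:
    draft = llm(question)
    plan = llm(
        f"Question: {question}\nDraft answer:\n{draft}\n"
        "List factual verification questions that would test this draft, one per line."
    )
    checks = []
    for vq in (line.strip("- ").strip() for line in plan.splitlines() if line.strip()):
        checks.append(f"{vq}\n{llm(vq)}")  # answered without showing the draft
    verification = "\n\n".join(checks)
    return llm(
        f"Question: {question}\nDraft answer:\n{draft}\n\n"
        f"Verification:\n{verification}\n\n"
        "Produce a final answer that is consistent with the verification results."
    )
```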
7.7 Cumulative Reasoning
Cumulative reasoning re-checks each incremental step, preventing compounding errors from snowballing into a flawed final answer.
8. Agents and Tool Use
8.1 Tool Use Agents (MRKL Systems, Toolformer)
With MRKL or similar frameworks, the LLM decides if it should consult an external module—like a calculator or fact database—and weaves that tool’s result into the evolving chain-of-thought (see the original MRKL paper by Karpas et al., 2022).
8.2 Code-Generation Agents (PAL, ToRA, TaskWeaver)
In these agents, whenever the LLM encounters a compute-heavy or logic-intensive subtask, it writes and executes a snippet of code, returning the result to the main reasoning loop. This massively improves accuracy for math or data processing tasks.
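A bare-bones sketch of the pattern: the model is asked to reply with a small Python program, which the harness extracts and executes to obtain the answer. The prompt format is an assumption, and in any real deployment the generated code must run in a sandbox.

```python
# PAL-style code generation: the model writes a program that stores its
# answer in `result`, and the harness runs it.
import re

def llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError("replace with your model call")

def program_aided_answer(question: str):
    reply = llm(
        f"Question: {question}\n"
        "Write a short Python program that computes the answer and stores it "
        "in a variable named `result`. Return only the code in a ```python block."
    )
    match = re.search(r"```python\n(.*?)```", reply, re.DOTALL)
    code = match.group(1) if match else reply
    namespace: dict = {}
    exec(code, namespace)  # NOTE: sandbox this in any real deployment
    return namespace.get("result")
```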
8.3 Observation-Based Agents (ReAct, Reflexion)
Techniques like ReAct permit the LLM to alternate between internal reflection and environment “action,” such as running a query or parsing user feedback. Reflexion extends this by adapting to failures in real time.
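The following toy loop captures the ReAct pattern with a single calculator tool: the model emits Thought/Action lines, the harness executes recognized actions and appends an Observation, and the loop ends when a “Final:” line appears. The format and the `llm` helper are illustrative conventions, not the exact prompts from the ReAct paper.

```python
# Toy ReAct loop with one tool: the harness runs recognized Actions and
# feeds Observations back until the model emits a Final answer.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError("replace with your model call")

def calculator(expression: str) -> str:
    try:
        # Toy arithmetic only; not safe for untrusted input.
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as err:
        return f"error: {err}"

INSTRUCTIONS = (
    "Answer the question. You may reply with either:\n"
    "  Thought: <reasoning>\n  Action: calculator(<expression>)\n"
    "or, when you are done:\n  Final: <answer>\n"
)

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"{INSTRUCTIONS}\nQuestion: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        if "Final:" in reply:
            return reply.split("Final:", 1)[1].strip()
        if "Action: calculator(" in reply:
            expr = reply.split("Action: calculator(", 1)[1].rsplit(")", 1)[0]
            transcript += f"Observation: {calculator(expr)}\n"
    return "no final answer within the step limit"
```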
8.4 Lifelong Learning Agents (Voyager, GITM)
Lifelong learning frameworks, exemplified by Voyager, let the LLM accumulate knowledge across multiple interactions or tasks, forming a repository of experiences that inform future prompts.
8.5 Retrieval-Augmented Generation (IRCoT, DSP, Verify-and-Edit)
Here, the LLM can search an external database or corpus. Methods like IRCoT embed retrieval steps within chain-of-thought, ensuring each assertion is anchored in verifiable evidence.
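A minimal retrieval-augmented sketch, using word overlap as a stand-in for a real vector index: retrieve the top documents, stuff them into the prompt as context, and instruct the model to answer only from that context. The `llm` helper is hypothetical.

```python
# Minimal retrieval-augmented generation: rank documents by word overlap,
# then answer strictly from the retrieved context.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for your model call."""
    raise NotImplementedError("replace with your model call")

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def rag_answer(query: str, documents: list[str]) -> str:
    context = "\n\n".join(retrieve(query, documents))
    return llm(
        f"Context:\n{context}\n\nQuestion: {query}\n"
        "Answer using only the context above; say so if the context is insufficient."
    )
```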
9. Extending Prompting Beyond Text
Modern LLMs increasingly handle images, audio, video, and 3D data, extending prompt engineering into a multimodal era. Below, we highlight 40 emerging techniques.
9.1 Image Prompting Techniques
9.1.1 Image Prompting
The user provides an image, often accompanied by textual cues:
“What is happening in this picture?”
9.1.2 Prompt Modifiers for Images
Similar to textual style prompts, you can guide image generation or editing with descriptors:
“Generate a watercolor painting of a serene forest at sunrise.”
9.1.3 Negative Prompting for Images
Here, you specify what not to include:
“Create a stylized portrait without any bright neon elements.”
9.1.4 Multimodal In-Context Learning with Images
Just like textual few-shot, you give the model pairs of images and their labels, culminating in a new image for classification or caption generation.
9.1.5 Paired-Image Prompting
This method uses two related images—perhaps “before” and “after”—and invites the LLM to describe or infer the transition.
9.1.6 Image-as-Text Prompting
Sometimes the visual data is converted into text (captions or bounding box coordinates) and integrated into a standard textual prompt.
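For example, the output of a vision pipeline might be rendered into plain text like this before being handed to a text-only model. The field names and coordinate format below are assumptions about what such a pipeline emits, not a fixed schema.

```python
# Image-as-text prompting: render captions and bounding boxes as plain text
# for an ordinary text-only model.

def render_image_as_text(caption: str, detections: list[dict]) -> str:
    lines = [f"Image caption: {caption}", "Detected objects (x1, y1, x2, y2):"]
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        lines.append(f"- {det['label']} at ({x1}, {y1}, {x2}, {y2})")
    return "\n".join(lines)

if __name__ == "__main__":
    scene = render_image_as_text(
        "A cyclist waiting at a crosswalk on a rainy street.",
        [
            {"label": "bicycle", "box": (120, 310, 260, 470)},
            {"label": "traffic light", "box": (40, 60, 80, 150)},
        ],
    )
    print(scene + "\n\nQuestion: Is it likely safe for the cyclist to cross? Explain.")
```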
9.1.7 Chain-of-Images (CoI)
Analogous to chain-of-thought, a sequence of images is used to show progressive steps or transformations that lead to a final concept.
9.1.8 Duty Distinct CoT for Images
Here, the model processes different visual facets—color, shape, layout—in separate reasoning threads, then consolidates them.
9.1.9 Multimodal Graph-of-Thought
A synergy of textual nodes and visual nodes forms a “graph-of-thought.” Each node represents a partial insight, and edges link them into a coherent reasoning structure.
9.2 Audio Prompting Techniques
9.2.1 Audio Prompting
For speech or sound analysis, the model might be handed an audio file and asked to transcribe, classify emotion, or extract keywords.
9.2.2 Audio In-Context Learning
You provide audio examples paired with transcripts or analyses, then present a new audio clip and ask for a matching style or output.
9.2.3 Multimodal CoT with Audio Inputs
The chain-of-thought can weave in auditory inferences. For instance, “Identify the speaker’s emotion, then outline possible reasons for that emotion.”
9.3 Video Prompting Techniques
9.3.1 Video Prompting
Feeding the LLM short or extended video clips to summarize, caption, or interpret frame-by-frame content.
9.3.2 Video Editing Prompting
Direct the model on how to modify video—applying color corrections or removing background noise, for example.
9.3.3 Text-to-Video Generation Prompting
In emergent generative video models, you supply text describing a scene or action, and the model attempts to generate a short video segment.
9.3.4 Video-to-Text Generation Prompting
Conversely, the model processes a video to produce a narrative or highlight reel in textual form.
9.4 3D and Spatial Prompting Techniques
9.4.1 3D Prompting
Command an LLM to generate or transform 3D models—perhaps instructing it to design a simple architectural shape.
9.4.2 Surface Texturing Prompting
Add or modify textures on existing 3D surfaces, controlling color schemes, patterns, or material properties.
9.4.3 4D Scene Generation
Extend the concept of 3D to include time-based animation, creating dynamic scenes that evolve frame by frame.
9.4.4 Prompting with User Annotations
In spatial tasks, the user might draw bounding boxes or simple sketches that the LLM refines into fully realized 3D scenes.
9.5 Multimodal Fusion and Advanced Techniques
9.5.1 Multimodal In-Context Learning
Unify text, images, or audio in the same prompt. The model sees exemplars of how these modalities interact, guiding the final output.
9.5.2 Chain-of-Thought for Multimodal Data
Apply step-by-step reasoning across modalities: interpret an image, cross-reference it with textual instructions, then produce a combined inference.
9.5.3 Image Captioning via Prompting
Instead of relying on specialized caption models, an LLM that integrates vision can interpret images and produce descriptive text through carefully engineered prompts.
9.5.4 Audio-Visual Inference
When the model is given both sound and visuals, it can combine cues (e.g., identifying an event in a video by matching ambient audio to the visible action).
9.5.5 Interactive Multimodal Prompting
Engage in iterative dialogues where the user points to parts of an image or references segments of audio, and the model refines its analysis step by step.
9.5.6 Segmentation Prompting
Focus on segmenting or masking specific parts of an image or video, guided by textual instructions.
9.5.7 Prompting for 3D Object Manipulation
Commands like “rotate the model 30 degrees to the left” or “extrude the top face” lead the LLM to transform a 3D scene accordingly.
9.5.8 Prompting with Sensor Data
If an LLM can interpret LiDAR or other sensor streams, the prompt can direct the model to identify navigable terrain, anomalies, or object boundaries.
9.5.9 Graph-Based Multimodal Reasoning
A more complex approach where textual knowledge nodes and visual or 3D nodes are interlinked in a graph structure, letting the model traverse and unify varied data points.
9.5.10 Cross-Modal Retrieval Prompting
The LLM may identify relevant textual data from an image-based request or vice versa, bridging the gap between different media in retrieval tasks.
9.6 Additional Multimodal Considerations
9.6.1 Negative Prompting in Multimodal Contexts
Specify undesired elements across visual, audio, or spatial data. For example: “Design a game level without any modern vehicles or neon lights.”
9.6.2 Paired-Modality Prompting
Provide parallel text and visuals to maintain consistent narratives—ideal for storybooks or educational resources.
9.6.3 Temporal Reasoning with Video Chains
In time-sensitive footage, a “chain-of-frames” approach systematically describes each segment, culminating in a comprehensive summary.
9.6.4 Spatial Reasoning via 3D Prompts
Let the LLM parse or create scenes that respect occlusion, depth, and perspective.
9.6.5 Interactive Voice Prompts
Real-time spoken queries and follow-ups can elicit clarifications from the LLM, weaving speech recognition with generative outputs.
9.6.6 Sign Language Prompting
Transform sign-language input (through video frames or skeletal data) into textual interpretations, enabling inclusive conversational interfaces.
9.6.7 Gesture-Based Prompting
General gestures, not necessarily formal sign language, may also serve as user input—for instance, “Raise a hand for yes, wave a hand for no.”
9.6.8 Sensory Integration Prompting
For advanced robotics or IoT, the LLM could unify sensor streams (temperature, humidity, pressure) with textual instructions to produce situational analyses.
9.6.9 Multimodal Chain-of-Dictionary
A specialized approach where each word or concept is linked to a visual or audio reference, creating a cross-referenced “dictionary” that the model consults.
9.6.10 Hybrid Multimodal Retrieval-Augmented Generation
When the system can store and fetch data from multiple sources (text archives, image collections, audio logs), it might produce cross-referenced outputs that weave all these modalities together.
10. Conclusion
Prompt engineering for LLMs has expanded far beyond simple textual instructions. The documented 58 core methods illustrate how to enrich an LLM’s capacity to solve problems, articulate nuanced reasoning, and interact with external tools and data repositories. From zero-shot heuristics—where the prompt is minimal—to elaborate chain-of-thought expansions, ensembling, and self-critique, each step represents a leap in how we can coax coherent, accurate, and contextually rich information from these enormous models.
The additional 40 multimodal techniques show that prompting no longer belongs solely in the textual realm. As LLM architectures integrate vision, sound, 3D, and sensor data, we glimpse a future where all these modalities coexist in a single generative pipeline. With the swift pace of AI progress, more complex and hybridized methods will undoubtedly arise, further collapsing the boundaries between textual, visual, auditory, and spatial realms.
At its core, effective prompt engineering remains a creative and ever-evolving pursuit—part science, part art. By systematically exploring and testing these 98+ strategies, researchers and developers can discover new ways to enhance the utility, reliability, and expressiveness of large-scale generative models. We continue pushing forward, driven by the conviction that carefully shaped prompts will unlock astonishing new capabilities in artificial intelligence.
Sources & Further Reading
- Chain-of-Thought Prompting: Wei et al. (2022), “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.”
- Foundational LLM Concepts: Brown et al. (2020), “Language Models are Few-Shot Learners.”
- Tool Use Agents (MRKL): Karpas et al. (2022), “MRKL Systems: A Modular, Neuro-Symbolic Architecture That Combines Large Language Models, External Knowledge Sources and Discrete Reasoning.”
- Prompt Engineering for LLMs: Liu et al. (2023), “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing”; see also Schulhoff et al. (2024), “The Prompt Report: A Systematic Survey of Prompting Techniques.”
For further coverage of advanced chain-of-thought methods, decomposition strategies, and agent architectures, keep an eye on the latest works posted to arXiv.org and proceed with enthusiastic experimentation in your own prompting endeavors.