Question 1

System prompt vs user prompt — when do I use each?

Accepted Answer

System prompt (defined once per conversation): sets role, tone, constraints, output format. User prompt (per query): the actual task/question. Example: system='You are a Python expert who writes secure, efficient code'; user='Write a function to parse CSV files'. System prompt persists across turns, user prompts change. Pro tip: put guardrails in system prompt, task details in user prompt. System is cheap to reuse; user prompt should vary per query.

Question 2

Chain-of-thought vs ReAct vs multi-turn — what's the difference?

Accepted Answer

Chain-of-thought (CoT): prompt model to show reasoning steps before answering. 'Let's think step by step.' Improves accuracy on math/logic 30-40%. ReAct (Reasoning + Action): interleave reasoning with tool calls — model decides what tools to use. Multi-turn: conversation history preserved. Use CoT for reasoning tasks, ReAct for tool use, multi-turn for dialogue. CoT costs 2x tokens but 40% accuracy gain. ReAct adds latency (multiple API calls) but solves novel tasks.

Question 3

RAG vs fine-tuning — when do I retrieve and when do I retrain?

Accepted Answer

RAG (Retrieval-Augmented Generation): fetch relevant docs, insert into prompt at query time. Fast to update (just add docs), works with any model, costs ~10% more tokens. Fine-tuning: retrain model on your data, slower/expensive ($100-1000s), locked to one model, updates take hours. Use RAG for: document QA, real-time data, frequently changing facts. Use fine-tuning for: style mimicry, domain-specific reasoning, latency-critical. RAG+fine-tune hybrid: retrieve context, fine-tuned model reads it.

Question 4

How do I evaluate prompt quality objectively?

Accepted Answer

Three metrics: (1) accuracy — does it solve the task? Use evals framework (Promptfoo, Braintrust, Evals CLI). (2) Latency — tokens/sec, cost per request. (3) Consistency — same input, same output? Run 10x with temperature=0. Common eval patterns: classification (exact match), generation (BLEU/ROUGE), reasoning (trace-based assertions). Never trust 'feels better'. Build a test harness with 20-50 examples, measure before/after prompt change. Tool: Promptfoo makes this 10 lines of YAML.

Question 5

What agentic patterns exist and when do I use each?

Accepted Answer

ReAct (reasoning + action loops): agent reasons, decides action, observes result, repeats. Use for: multi-step tasks, tool use. Plan-execute: agent plans steps, then executes. Use for: complex workflows, need visibility into plan. Tree-of-thought: explore multiple reasoning paths, prune low-value branches. Use for: hard reasoning, when wrong answer is costly. Hierarchical agents: manager agent delegates to specialist sub-agents. Use for: modular systems. Most robust: ReAct + tool use + human-in-the-loop for review steps.

Question 6

Jailbreaks, prompt injection, adversarial prompts — how do I defend against them?

Accepted Answer

Jailbreak: bypass safety guardrails via clever phrasing ('assume you're a character who…'). Defense: use system prompt with firm boundaries ('You will not…' is weaker than 'Your role prevents…'). Prompt injection: user input pollutes instructions. Defense: (1) separate user input from system instructions (use API parameters, not concatenation), (2) XML tags to mark input boundaries, (3) validation on output. Adversarial: user tries to trigger wrong behavior. Defense: test with adversarial examples in evals, log failures, add guardrails. Rule: never trust user input in the prompt directly — always escape or parameterize.

Question 7

How do I choose between GPT-4, Claude, Llama, and specialized models?

Accepted Answer

GPT-4: strongest reasoning, best for code/math, most expensive ($0.03-$0.06/1K tokens). Claude 3.5 Sonnet: excellent context window (200k), strong summarization, middle cost (~$0.003/1K). Llama 3.1: open, runs locally, weaker reasoning. Specialized: medical LLMs for healthcare, legal LLMs for contracts. Rule: start with Claude or GPT-4o for prototyping, measure evals, switch to cheaper model if it matches baseline. Avoid 'best model' fallacy — context window + latency often matter more than raw reasoning. Use OpenRouter or LiteLLM to swap easily.

Region	Junior	Mid	Senior
USA	$100k	$180k	$300k
UK	£70k	£130k	£200k
EU	€75k	€140k	€210k
CANADA	C$110k	C$200k	C$320k

Prompt Engineering (Advanced)

What is Prompt Engineering (Advanced)

📋 Before you start

💰 Salary by region

🎓 Certifications

🎯 Careers using Prompt Engineering (Advanced)

⚖ Compare with

❓ FAQ

Not sure this skill is for you?

Find your ideal career path