The Science of Prompting - what works, what doesn't?
+ NEWS: Microsoft developing new AI ; China’s AI Agent (Manus) Blows the Internet

TL;DR
AI prompting isn’t one-size-fits-all—small changes can lead to big differences in accuracy. A recent study shows that structured prompts improve performance, but tricks like politeness don’t always help. AI’s answers vary, so testing multiple prompts and running repeated trials is key for reliability. To get better business results from AI, focus on clear formatting, structured instructions, and careful evaluation.
AI models like ChatGPT might feel like magic, but getting consistently good responses isn’t as easy as it looks.
The latest research from the Wharton School’s Generative AI Labs sheds light on an important truth: there is no single “perfect” way to prompt an AI, and even small changes can have a big impact on its accuracy.
Understanding how to prompt effectively can save time, improve accuracy, and get you better results.

1. Benchmarking AI: A Moving Target
Most AI users assume that performance is straightforward—either the AI gets something right or it doesn’t. But this study shows that AI performance is highly dependent on how you measure success.
The research tested two AI models (GPT-4o and GPT-4o-mini) on PhD-level science questions, asking each model every question 100 times per test condition.
Why does this matter?
- AI models don’t always give the same answer—even when asked the exact same question multiple times. 
- How you define “correct” changes the results. The study tested different success thresholds, each one the share of a question's 100 trials that had to be answered correctly:
  - 100% correct (no mistakes)
  - 90% correct (close to human expert levels)
  - 51% correct (majority rule)
 
At higher accuracy thresholds, the AI struggled—often performing only slightly better than random guessing. At lower accuracy thresholds, it looked much better.
Takeaway: If you rely on AI for business decisions, don’t assume one test result tells the full story.
AI is not always consistent, and different evaluation methods can change how "good" an AI appears.
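To make the threshold idea concrete, here is a minimal Python sketch. It assumes you have already recorded, for each question, whether the model was right on each repeated trial. The data and the `pass_rate` helper are made up for illustration and are not taken from the study.

```python
from typing import Dict, List


def pass_rate(trials_by_question: Dict[str, List[bool]], threshold: float) -> float:
    """Fraction of questions whose per-question accuracy meets the threshold."""
    passed = 0
    for outcomes in trials_by_question.values():
        accuracy = sum(outcomes) / len(outcomes)  # share of correct trials
        if accuracy >= threshold:
            passed += 1
    return passed / len(trials_by_question)


# Hypothetical results: one boolean per repeated trial (True = correct answer).
example = {
    "q1": [True] * 10,               # correct on every trial
    "q2": [True] * 9 + [False],      # correct on 9 of 10 trials
    "q3": [True] * 6 + [False] * 4,  # correct on 6 of 10 trials
    "q4": [True] * 3 + [False] * 7,  # correct on 3 of 10 trials
}

for threshold in (1.0, 0.9, 0.51):
    share = pass_rate(example, threshold)
    print(f"threshold {threshold:.0%}: {share:.0%} of questions pass")
```

With this toy data, the same set of answers looks weak under the strictest threshold and much stronger under majority rule, which is the pattern the study describes.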
2. The Impact of Different Prompting Approaches
The study tested four different prompting styles:
- Formatted Prompt (Explicit instructions: “Format your response as: The correct answer is ___.”) 
- Unformatted Prompt (No instructions on how to format the response) 
- Polite Prompt (“Please answer the following question.”) 
- Commanding Prompt (“I order you to answer the following question.”) 
What Happened?
- Formatted prompts consistently led to better performance. AI models did worse when formatting instructions were removed. 
- Being polite or commanding made no difference overall—but for individual questions, politeness sometimes helped and sometimes hurt. 
- The model's behaviour varied depending on the prompt, making it hard to predict what would work best in advance. 
Takeaway: Standardised, well-structured prompts (such as explicitly formatting answers) help AI perform better. But simple “tricks” like saying “please” or using more forceful language aren’t guaranteed to improve results.
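If you want to try the four styles yourself, a rough sketch might look like this. The wording comes from the examples quoted above; the exact prompts used in the study may differ. Two things here are assumptions: that the polite and commanding variants keep the formatting instruction, and that answers are multiple-choice letters A to D.

```python
import re
from typing import Dict, Optional

# Formatting instruction as quoted in this article.
FORMAT_INSTRUCTION = "Format your response as: The correct answer is ___."


def build_prompts(question: str) -> Dict[str, str]:
    """Build one prompt per style (assumed layout, for illustration only)."""
    return {
        "formatted": f"{question}\n{FORMAT_INSTRUCTION}",
        "unformatted": question,
        "polite": f"Please answer the following question.\n{question}\n{FORMAT_INSTRUCTION}",
        "commanding": f"I order you to answer the following question.\n{question}\n{FORMAT_INSTRUCTION}",
    }


# Assumes a single answer letter A-D, optionally wrapped in parentheses.
ANSWER_PATTERN = re.compile(r"The correct answer is\s*\(?([A-D])\)?(?![A-Za-z])")


def extract_answer(response: str) -> Optional[str]:
    """Return the answer letter, or None if the format was not followed."""
    match = ANSWER_PATTERN.search(response)
    return match.group(1) if match else None


prompts = build_prompts("Which option is correct? (A) ... (B) ... (C) ... (D) ...")
print(prompts["polite"])
print(extract_answer("After some thought: The correct answer is (C)."))
print(extract_answer("I think it is probably C."))
```

A fixed answer format is what makes automatic scoring possible: when the model ignores the format, `extract_answer` returns `None` and you can count that as a miss or flag it for review.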

3. Why “One-Size-Fits-All” Prompting Doesn’t Work
The study found huge variation in AI accuracy depending on the specific question. Even when using the same AI model with the same prompt style, accuracy could swing wildly from question to question.
This suggests that:
- Prompting success depends on context. Some techniques work better for certain types of tasks than others. 
- AI models are unpredictable—what works in one scenario may not work in another. 
- Repeated testing is crucial. If you’re using AI for critical business tasks, don’t rely on a single prompt test—experiment and refine your approach. 
Takeaway: Instead of assuming one "best" prompt, test multiple variations to see what works for your specific use case.
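One way to see this question-to-question variation is to compare two prompt styles question by question rather than only on average. The sketch below uses invented trial outcomes and hypothetical helper names; it only illustrates how an overall "no difference" can hide opposite effects on individual questions.

```python
from typing import Dict, List

Trials = Dict[str, List[bool]]  # question id -> one boolean per repeated trial


def per_question_accuracy(trials: Trials) -> Dict[str, float]:
    """Share of correct trials for each question."""
    return {q: sum(outcomes) / len(outcomes) for q, outcomes in trials.items()}


def accuracy_gap(baseline: Trials, variant: Trials) -> Dict[str, float]:
    """Variant accuracy minus baseline accuracy, for questions present in both."""
    base_acc = per_question_accuracy(baseline)
    var_acc = per_question_accuracy(variant)
    return {q: var_acc[q] - base_acc[q] for q in base_acc if q in var_acc}


# Made-up outcomes for two prompt styles on the same two questions.
polite = {
    "q1": [True] * 8 + [False] * 2,
    "q2": [True] * 3 + [False] * 7,
}
commanding = {
    "q1": [True] * 4 + [False] * 6,
    "q2": [True] * 7 + [False] * 3,
}

gaps = accuracy_gap(polite, commanding)
for question, gap in gaps.items():
    print(f"{question}: commanding minus polite = {gap:+.2f}")
print(f"average gap = {sum(gaps.values()) / len(gaps):.2f}")
```

In this toy example the two styles tie on average, yet each one clearly wins on a different question, so an average alone would not tell you which prompt to use for a given task.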

4. How to Apply This Research to Your Business
Here’s how you can use these insights to get better results from AI in your business:
A) Use Structured Prompts
Always format your AI queries clearly. Example:
❌ “What are the key trends in AI?”
✅ “List the top 5 AI trends for 2024. Format your response as: ‘1. Trend Name - Explanation (in 2-3 sentences).’” 
B) Test Multiple Prompt Variations
Try different ways of asking the same question and compare results. Example:
- “Summarise this customer complaint in one sentence.” 
- “Summarise this customer complaint in exactly 15 words.” 
- “Summarise this customer complaint, emphasising the key issue in fewer than 20 words.” 
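When you compare variations like these, it helps to check automatically whether each response actually met the constraint you asked for. Here is a small sketch with deliberately simple word and sentence counting. The check names and the sample summary are made up for illustration.

```python
import re
from typing import Callable, Dict


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def sentence_count(text: str) -> int:
    """Naive sentence count: split after ., ! or ? followed by whitespace.

    Abbreviations such as "e.g." will be miscounted, so treat this as a rough check.
    """
    parts = [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]
    return len(parts)


# One check per prompt variation from the list above.
CHECKS: Dict[str, Callable[[str], bool]] = {
    "one sentence": lambda s: sentence_count(s) == 1,
    "exactly 15 words": lambda s: word_count(s) == 15,
    "fewer than 20 words": lambda s: word_count(s) < 20,
}

# Hypothetical model output to validate.
summary = "The customer received a damaged laptop and wants a full refund within the next week."

for name, check in CHECKS.items():
    print(f"{name}: {'ok' if check(summary) else 'violated'}")
```

Running the same checks over many responses per prompt gives you a simple compliance rate for each variation, which is easier to compare than reading outputs by eye.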
C) Measure Performance Over Multiple Runs
If you’re using AI for automation (e.g., chatbots, reports, decision-making), don’t just test a prompt once. Run it multiple times and see if results vary.
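A repeated-run check can be as simple as the sketch below. `ask_model` stands for whatever function calls your AI system; it is a placeholder and not a real library call. A fake model with a fixed random seed is used here so the example runs without any API access.

```python
import random
from collections import Counter
from typing import Callable, Tuple


def measure_consistency(
    ask_model: Callable[[str], str], prompt: str, runs: int = 20
) -> Tuple[str, float, Counter]:
    """Send the same prompt `runs` times; return the most common answer,
    the share of runs that produced it, and the full tally."""
    answers = Counter(ask_model(prompt).strip() for _ in range(runs))
    top_answer, top_count = answers.most_common(1)[0]
    return top_answer, top_count / runs, answers


# Stand-in for a real model call: answers "Approve" about 80% of the time.
_rng = random.Random(0)


def fake_model(prompt: str) -> str:
    return _rng.choices(["Approve", "Reject"], weights=[0.8, 0.2])[0]


top, agreement, tally = measure_consistency(
    fake_model, "Should this refund be approved?", runs=50
)
print(f"most common answer: {top}")
print(f"agreement: {agreement:.0%}")
print(f"all answers: {dict(tally)}")
```

A low agreement score is a signal that the prompt needs more structure, or that the task needs a human check before the output is used.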
D) Don’t Rely on “Tricks” Like Politeness
Being polite may help in some cases and hurt in others. Instead, focus on clarity and structure.
E) Consider Your Accuracy Needs
If you need AI to be perfectly correct, you may need human oversight. If occasional errors are acceptable, AI can be used with lighter review.
This study confirms what many experienced AI users already suspect: getting AI to work well requires thoughtful prompting and careful testing.
For businesses, this means treating AI like a tool that needs careful adjustment and testing, not a magic solution. Whether you’re using AI for customer service, content generation, or decision-making, taking the time to refine your prompts will pay off in better, more reliable results.
This Week in AI
- Microsoft developing new AI to no longer depend on OpenAI. 
- China’s AI Agent (Manus) Blows the Internet 
- 2025 will be a year of AI Legislation 
That is all for this week. If you have any specific questions about today’s issue, email me at [email protected]. For more information about us, check out our website here. See you next week!
