Most of the queries we need help with are probably non-trivial. If you just need to see every row in a table, you can likely do that without AI. Where AI really helps is in asking and answering questions, and questions are almost never answered by showing you everything. These question-answering queries, however, are often hard to craft.

When you've finally identified the important questions you want to answer with your database, it's tempting to jump straight to generating the final SQL query. However, this direct approach often leads to subtle errors and incorrect results, even when the query appears to work correctly. Let's explore a more methodical process, which I call experimental query generation, that can substantially improve accuracy.
Here's the prompt to get started:
"Whenever I ask you to create a query, I want you to first understand the data in the table,
not just the schema. You will first show me a series of simple select queries to
gather samples from all tables you're going to use in the query.
Once you have the sample data and understand it, you will then generate
the correct query for me."
The Problem with Direct Query Generation
The fundamental challenge when generating complex SQL queries is ensuring that the AI fully understands not just the database schema, but the actual data within your tables. Without seeing real data samples, the AI must make assumptions about:
- How values are actually formatted in your columns
- The true relationships between tables
- Edge cases or unusual patterns in your data
- Which joins will produce the expected results
These assumptions can lead to queries that look correct but produce subtly wrong results. That is a particularly dangerous situation, because the errors may not be immediately obvious. As data professionals know, an incorrect analysis that looks plausible can be more harmful than one that clearly fails. Most data professionals don't generate queries in one step: they run some exploratory queries, look at the results, and then produce the final query. We want the AI to mimic this same process.
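Here is a small, hypothetical illustration of the first assumption listed above (how values are formatted in your columns): a query that runs without error but undercounts because genre labels were stored inconsistently. It is a sketch using Python's built-in sqlite3 module with an in-memory table; the table, column, and values are invented for this example.

```python
import sqlite3

# Hypothetical toy table: genre labels were entered inconsistently,
# which you would only notice by looking at the actual rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, genre TEXT)")
conn.executemany(
    "INSERT INTO film (film_id, genre) VALUES (?, ?)",
    [(1, "Comedy"), (2, "comedy"), (3, "Comedy "), (4, "Drama")],
)

# Query written from the schema alone: assumes the labels are clean.
naive = conn.execute(
    "SELECT COUNT(*) FROM film WHERE genre = 'Comedy'"
).fetchone()[0]

# Exploratory query: look at the distinct raw values first.
distinct = [row[0] for row in conn.execute("SELECT DISTINCT genre FROM film")]

# Query written after exploring: normalize case and surrounding whitespace.
fixed = conn.execute(
    "SELECT COUNT(*) FROM film WHERE LOWER(TRIM(genre)) = 'comedy'"
).fetchone()[0]

print(naive, fixed)  # → 1 3: the naive query runs fine but undercounts
print(distinct)
```

Nothing in the schema (a plain TEXT column) warns you about this; only the sample rows do.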
The Experimental Query Generation Approach
Instead of treating query generation as a single-step process, we can use a multi-stage approach where the AI first explores your data through simple queries before building the complex final query. This resembles how experienced data analysts work—they rarely write complex queries in one go but instead explore the data incrementally to verify their understanding.
To implement this approach, you can use the following prompt:
Whenever I ask you to create a query, I want you to first understand the data in the table,
not just the schema. You will first show me a series of simple select queries to gather
samples from all tables you're going to use in the query.
Once you have the sample data and understand it, you will then generate the correct query for me.
This prompt fundamentally changes how the AI approaches query generation by:
- Requiring data exploration before writing the final query
- Breaking down complex problems into manageable pieces
- Creating opportunities for feedback at each step
- Allowing the AI to verify its assumptions with actual data
- Building a more complete understanding of your database's specifics
How the Process Works in Practice
Let's walk through how this approach typically unfolds:
1. You Use the Prompt
Whenever I ask you to create a query, I want you to first understand the data in the table,
not just the schema. You will first show me a series of simple select queries to gather
samples from all tables you're going to use in the query.
Once you have the sample data and understand it, you will then generate the correct query for me.
2. You Ask Your Complex Question
For example: "Could we create a profile of customers that return movies late most often by the genres of movies they watch?"
3. The AI Outlines Its Problem-Solving Approach
The AI will first explain how it plans to tackle the problem. In our example, it might explain:
- "First, I'll identify customers with the most late returns"
- "Then I'll link rentals to movies and their genres"
- "Finally, I'll group and analyze late returns by genre"
This outline gives you a chance to verify that the AI's approach aligns with your expectations before proceeding.
4. The AI Provides Sample Queries for Data Exploration
Next, the AI will generate several simple queries designed to explore different aspects of your data. These might include:
- Queries to examine rental records and how late returns are tracked
- Queries to look at customer information
- Queries to examine movie data and genre classifications
- Queries to verify relationship fields between tables
Each query is targeted at answering a specific question or validating a specific assumption about your data.
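To make this step concrete, here is a minimal sketch of what a set of exploratory queries might look like, run with Python's sqlite3 module against a tiny in-memory database. The schema (customer, film, rental), column names such as rental_duration, and all rows are hypothetical stand-ins, not the database behind the example question; the AI will propose queries tailored to your real tables. Each query is paired with the assumption it checks.

```python
import sqlite3

# Hypothetical, simplified movie-rental schema used only for illustration.
# rental_duration is assumed to be the number of days a film may be kept.
SCHEMA = """
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, genre TEXT,
                   rental_duration INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     film_id INTEGER, rental_date TEXT, return_date TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO film VALUES (?, ?, ?, ?)",
                 [(10, "Alien", "Horror", 3), (11, "Up", "Comedy", 5)])
conn.executemany("INSERT INTO rental VALUES (?, ?, ?, ?, ?)", [
    (100, 1, 10, "2024-01-01", "2024-01-07"),
    (101, 2, 11, "2024-01-02", None),          # not yet returned
    (102, 1, 99, "2024-01-03", "2024-01-04"),  # film 99 does not exist
])

# Each exploratory query checks one assumption before the final query is written.
EXPLORATORY_QUERIES = [
    ("Sample rental rows", "SELECT * FROM rental LIMIT 5"),
    ("Sample customer rows", "SELECT * FROM customer LIMIT 5"),
    ("How are unreturned rentals stored?",
     "SELECT COUNT(*) FROM rental WHERE return_date IS NULL"),
    ("Which genre labels exist?", "SELECT DISTINCT genre FROM film"),
    ("Do all rentals point at a real film?",
     "SELECT COUNT(*) FROM rental AS r LEFT JOIN film AS f "
     "ON f.film_id = r.film_id WHERE f.film_id IS NULL"),
]

results = {}
for purpose, sql in EXPLORATORY_QUERIES:
    results[purpose] = conn.execute(sql).fetchall()
    print(f"{purpose}\n  {sql}\n  -> {results[purpose]}")
```

Two of these checks surface problems a schema alone would not reveal: one rental that has not been returned, and one rental whose film_id has no matching film.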
5. You Run Each Query and Share the Results
For each sample query, you:
- Run it in your database environment
- Copy the results
- Share them back with the AI, labeling which query they correspond to (e.g., "Query 1 results:")
While this process requires more effort than a single-step approach, it provides the AI with invaluable insights into your actual data.
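Copying results by hand is error-prone, so if you run the sample queries from a script you might format them for pasting along these lines. This is a minimal sketch using Python's sqlite3 module; format_for_chat is a hypothetical helper name, and the max_rows cap is an assumption meant to keep pasted samples short.

```python
import sqlite3

def format_for_chat(conn, queries, max_rows=10):
    """Run each query and label its output ("Query 1 results:", ...)
    so it can be pasted back into the conversation."""
    blocks = []
    for i, sql in enumerate(queries, start=1):
        cur = conn.execute(sql)
        header = [col[0] for col in cur.description]  # column names
        rows = cur.fetchmany(max_rows)                 # keep samples short
        lines = [f"Query {i} results:", " | ".join(header)]
        lines += [" | ".join("NULL" if v is None else str(v) for v in row)
                  for row in rows]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

# Usage with a tiny hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (rental_id INTEGER, return_date TEXT)")
conn.executemany("INSERT INTO rental VALUES (?, ?)",
                 [(1, "2024-01-07"), (2, None)])
report = format_for_chat(conn, [
    "SELECT * FROM rental",
    "SELECT COUNT(*) AS unreturned FROM rental WHERE return_date IS NULL",
])
print(report)
```

Printing NULL explicitly, rather than an empty string, matters here: missing values are exactly the kind of detail the AI needs to see.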
6. The AI Analyzes the Results and Generates the Final Query
After receiving all sample query results, the AI analyzes the patterns and characteristics of your data. It then uses this enriched understanding to construct the final, complex query, one that is more likely to be accurate because it's built on verified knowledge rather than assumptions.
The AI should also explain the final query, helping you understand its logic and how it addresses your original question.
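For the example question, a final query might look something like the sketch below. It assumes a hypothetical, simplified schema (genre stored directly on film, rental_duration as the allowed number of days) and defines "late" as returned after more than rental_duration days; a real database will likely need extra joins (for example, through inventory or category tables), which is exactly what the exploration phase should reveal. The top_n parameter is also an assumption.

```python
import sqlite3

# Hypothetical, simplified schema and rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (film_id INTEGER PRIMARY KEY, genre TEXT, rental_duration INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     film_id INTEGER, rental_date TEXT, return_date TEXT);
""")
conn.executemany("INSERT INTO film VALUES (?, ?, ?)",
                 [(10, "Horror", 3), (11, "Comedy", 5), (12, "Horror", 2)])
conn.executemany("INSERT INTO rental VALUES (?, ?, ?, ?, ?)", [
    (100, 1, 10, "2024-01-01", "2024-01-07"),  # 6 days > 3: late
    (101, 1, 12, "2024-01-02", "2024-01-06"),  # 4 days > 2: late
    (102, 1, 11, "2024-01-03", "2024-01-10"),  # 7 days > 5: late
    (103, 2, 11, "2024-01-04", "2024-01-12"),  # 8 days > 5: late
    (104, 2, 10, "2024-01-05", "2024-01-06"),  # on time
    (105, 3, 10, "2024-01-06", None),          # still out, excluded below
])

FINAL_QUERY = """
WITH late AS (                 -- one row per late return
    SELECT r.customer_id, f.genre
    FROM rental AS r
    JOIN film AS f ON f.film_id = r.film_id
    WHERE r.return_date IS NOT NULL   -- decision made after exploration
      AND julianday(r.return_date) - julianday(r.rental_date) > f.rental_duration
),
top_customers AS (             -- customers with the most late returns
    SELECT customer_id
    FROM late
    GROUP BY customer_id
    ORDER BY COUNT(*) DESC, customer_id
    LIMIT ?
)
SELECT l.customer_id, l.genre, COUNT(*) AS late_returns
FROM late AS l
JOIN top_customers AS t ON t.customer_id = l.customer_id
GROUP BY l.customer_id, l.genre
ORDER BY l.customer_id, late_returns DESC, l.genre
"""

top_n = 2
profile = conn.execute(FINAL_QUERY, (top_n,)).fetchall()
for row in profile:
    print(row)  # (customer_id, genre, late_returns)
```

Splitting the logic into named CTEs mirrors the plan from step 3 (find late returns, find the worst offenders, then group by genre), which also makes the query easier for you to review.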
Why This Approach Produces Better Results
The experimental approach improves query accuracy through several mechanisms:
Verification of Assumptions: By examining actual data, the AI can verify its understanding of column formats, value ranges, and relationships.
Discovery of Edge Cases: Sample queries often reveal special cases or patterns that wouldn't be apparent from schema information alone.
Feedback Opportunities: The incremental process creates multiple opportunities for you to correct misunderstandings before they're incorporated into the final query.
Increased Context: Each sample result provides additional context that helps the AI generate more precise SQL.
Simplified Debugging: If the final query doesn't work as expected, the sample queries and their results provide reference points for troubleshooting.
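As an illustration of edge-case discovery, the sketch below (Python sqlite3, hypothetical table and rows) shows how a NULL return_date for unreturned rentals silently drops out of a naive lateness calculation. The cutoff-date rule at the end, including the AS_OF value, is only one possible assumption; whether an unreturned rental counts as late is a business decision you should confirm.

```python
import sqlite3

# Hypothetical rental rows; rental_duration is the allowed number of days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (rental_id INTEGER, rental_date TEXT, "
             "return_date TEXT, rental_duration INTEGER)")
conn.executemany("INSERT INTO rental VALUES (?, ?, ?, ?)", [
    (1, "2024-01-01", "2024-01-09", 3),  # returned late
    (2, "2024-01-01", "2024-01-02", 3),  # returned on time
    (3, "2024-01-01", None, 3),          # never returned: the edge case
])

# Exploratory check that surfaces the edge case.
unreturned = conn.execute(
    "SELECT COUNT(*) FROM rental WHERE return_date IS NULL").fetchone()[0]

# Naive lateness test: julianday(NULL) is NULL, so row 3 silently drops out.
naive_late = conn.execute(
    "SELECT COUNT(*) FROM rental "
    "WHERE julianday(return_date) - julianday(rental_date) > rental_duration"
).fetchone()[0]

# One possible rule (an assumption): treat an unreturned rental as late
# if it is overdue as of a chosen cutoff date.
AS_OF = "2024-02-01"
late_incl_open = conn.execute(
    "SELECT COUNT(*) FROM rental "
    "WHERE julianday(COALESCE(return_date, ?)) - julianday(rental_date) "
    "> rental_duration", (AS_OF,)
).fetchone()[0]

print(unreturned, naive_late, late_incl_open)  # → 1 1 2
```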
Integrating This Approach with Your Workflow
To make the most of experimental query generation:
Maintain Context: Use this approach within a conversation where you've already established database context, either through the flipped interaction pattern or by attaching comprehensive documentation.
Document Your Process: Save the entire conversation—including sample queries, results, and the final query—as a reference for future analysis of the same type.
Verify Final Results: Even with this thorough approach, always validate the final query's results by checking them against your understanding of the data and business rules.
Iterate When Needed: If the final query doesn't fully address your question, don't hesitate to ask for refinements based on the initial results. Give the AI a follow-up prompt with the results of the query and explain why they are wrong.
The Importance of Verification
Remember that even with this methodical approach, verification remains essential. When you receive the final query, think about how you'll test its correctness:
- Does it return the expected number of records?
- Do sample results align with your knowledge of the business?
- Can you verify key calculations through alternative methods?
- Are there obvious patterns or outliers that should be investigated?
Just as you would verify results from a human analyst, always apply critical thinking to AI-generated queries—regardless of how sophisticated or correct they appear.
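One way to verify key calculations through an alternative method is to recompute a result outside the database. The sketch below, again using Python's sqlite3 with hypothetical rows, checks a SQL late-return count against a plain-Python recomputation. The definition of "late" (more days than rental_duration, ignoring unreturned rentals) is an assumption for illustration.

```python
import sqlite3
from collections import Counter
from datetime import date

ROWS = [  # hypothetical (customer_id, rental_date, return_date, allowed_days)
    (1, "2024-01-01", "2024-01-07", 3),
    (1, "2024-01-02", "2024-01-03", 3),
    (2, "2024-01-04", "2024-01-12", 5),
    (2, "2024-01-05", None, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (customer_id INTEGER, rental_date TEXT, "
             "return_date TEXT, rental_duration INTEGER)")
conn.executemany("INSERT INTO rental VALUES (?, ?, ?, ?)", ROWS)

# The query under test: late returns per customer.
sql_counts = dict(conn.execute(
    "SELECT customer_id, COUNT(*) FROM rental "
    "WHERE return_date IS NOT NULL "
    "AND julianday(return_date) - julianday(rental_date) > rental_duration "
    "GROUP BY customer_id"
).fetchall())

# Independent check: recompute the same numbers in plain Python.
py_counts = Counter()
for customer_id, rented, returned, allowed in ROWS:
    if returned is None:
        continue
    days = (date.fromisoformat(returned) - date.fromisoformat(rented)).days
    if days > allowed:
        py_counts[customer_id] += 1

assert sql_counts == dict(py_counts), (sql_counts, py_counts)
print("late returns per customer:", sql_counts)
```

If the two numbers disagree, the mismatch usually points straight at a misunderstood column or an unhandled edge case.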
Conclusion
Experimental query generation represents a fundamental shift in how we work with AI to analyze data. By breaking the process into exploration and execution phases, we create a collaborative workflow that leverages both the AI's query-writing capabilities and your domain knowledge.
While it requires more interaction than simply asking for a query, the improved accuracy and reliability make it well worth the additional effort, especially for complex analytical questions where correctness is crucial. This approach doesn't just produce better queries—it helps you develop a deeper understanding of your data along the way.