If you’ve mastered basic SQL, the next step is advanced querying and data modeling. These questions test your ability to optimize queries, design scalable schemas, and connect analytics to business logic. Let’s dive into interactive Q&A with explanations and best practices.
Section A — SQL Interview Questions for data analyst
1) What is the difference between CTE and subquery?
Answer:
- CTE (Common Table Expression): Temporary named result set defined using
WITH. Improves readability and can be referenced multiple times. - Subquery: Nested query inside
SELECT,FROM, orWHERE. Often harder to read and reuse.
Example:
WITH top_customers AS (
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
)
SELECT *
FROM top_customers
WHERE total_spent > 1000;
Explanation: CTEs are better for complex logic and debugging. Subqueries are fine for simple filters.
2) Explain window functions and give an example.
Answer: Window functions perform calculations across a set of rows related to the current row without collapsing them.
Example:
SELECT
customer_id,
order_date,
SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
FROM orders;
Explanation: Useful for running totals, rankings, and moving averages.
3) How do you optimize a slow SQL query?
Answer:
- Check indexes: Ensure proper indexing on join/filter columns.
- **Avoid SELECT *: Fetch only needed columns.
- Use EXPLAIN plan: Identify bottlenecks.
- Reduce nested subqueries: Use CTEs or joins.
- Partition large tables: For faster scans.
- Materialized views: For repeated aggregations.
Pitfalls: Over-indexing can hurt write performance.
4) Difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN?
Answer:
- INNER JOIN: Matches only common keys.
- LEFT JOIN: All rows from left table + matches from right.
- FULL OUTER JOIN: All rows from both tables, matched where possible.
5) Write a query to find the second highest salary in a table.
Answer:
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
Alternative: Use ROW_NUMBER() or DENSE_RANK() for better scalability.
Section B — Data Modeling & Design
6) What is normalization? Why is it important?
Answer:
Normalization organizes data to reduce redundancy and improve integrity.
- 1NF: Atomic columns.
- 2NF: Remove partial dependencies.
- 3NF: Remove transitive dependencies.
Explanation: Ensures consistency and efficient storage.
7) When would you denormalize a database?
Answer:
For read-heavy analytics where joins slow performance. Denormalization trades storage for speed.
Example: Storing aggregated metrics in a fact table for dashboards.
8) Explain star schema vs snowflake schema.
Answer:
- Star: Fact table at center, dimension tables around. Simple, fast for BI tools.
- Snowflake: Dimensions normalized into sub-dimensions. Saves space but adds complexity.
Power BI Tip: Star schema is preferred for performance and simplicity.
9) How do you handle slow-changing dimensions?
Answer:
- Type 1: Overwrite old data.
- Type 2: Add new row with versioning.
- Type 3: Add new column for previous value.
10) What is cardinality in data modeling?
Answer:
Cardinality describes the uniqueness of data values in a column.
- High cardinality: Many unique values (e.g., email).
- Low cardinality: Few unique values (e.g., gender).
Impact: Affects indexing and join performance.
Section C — Scenario-Based Question
Scenario: You need to design a schema for an e-commerce analytics dashboard showing sales by region, product, and time.
Strong Answer Outline:
- Fact table:
saleswith keys for product, region, date, and measures like revenue, quantity. - Dimensions:
product_dim,region_dim,date_dim. - Relationships: One-to-many from dimensions to fact.
- Indexes: On foreign keys and date for filtering.
Quick Practice (Interactive)
- Write a query to calculate customer retention rate month-over-month.
- Design a star schema for a subscription business.
- Optimize a query that uses multiple nested subqueries.
Common Mistakes & Fixes
- Over-normalizing: Hurts BI performance.
- Ignoring indexes: Leads to slow queries.
- **Using SELECT *: Fetch only required columns.
- Not validating join keys: Causes inflated metrics.
Blog Post #3 — Data Visualization & Dashboard Design Interview Questions (Power BI/Tableau best practices, storytelling with data, and common pitfalls).
Blog Post #1 — Data Analyst Interview Questions & Answers in 2026

