Data Analyst Interview Questions & Answers (2026)

Updated March 27, 2026
The U.S. Bureau of Labor Statistics projects 36% employment growth through 2033 for data scientists — the closest BLS category to data analysts — more than seven times the average for all occupations, with a median annual salary of $103,500 [1]. Organizations across every industry are hiring analysts to transform raw data into business decisions, but the skill gap remains significant: LinkedIn's 2024 Workforce Report identified data analysis as the most in-demand skill across all job categories for the third consecutive year [2]. This means interviewers are evaluating not just technical proficiency but your ability to communicate insights, think critically about data quality, and drive measurable business outcomes. This guide covers the full spectrum of Data Analyst interview questions — from SQL and statistical reasoning to stakeholder communication and business impact — with answer frameworks that separate candidates who merely query data from those who deliver actionable intelligence.


Key Takeaways

  • Data Analyst interviews test SQL proficiency, statistical reasoning, and business communication equally
  • Expect live coding challenges (SQL or Python), take-home analyses, and case study presentations
  • Behavioral questions assess how you handle ambiguous requirements, conflicting stakeholder priorities, and data quality issues
  • Prepare portfolio examples showing end-to-end analysis: question formulation, data preparation, analysis, visualization, and business recommendation
  • Knowledge of your industry's key metrics and data ecosystem is as important as technical skill

Technical and SQL Questions

1. Write a SQL query to find the top 5 customers by total order value in the last 90 days, excluding canceled orders.

**What interviewers look for:** Practical SQL fluency, attention to edge cases, and clean query structure. **Answer framework:** This tests fundamental SQL skills — JOINs, aggregation, filtering, and ordering. A strong answer addresses: (1) proper date filtering using CURRENT_DATE - INTERVAL '90 days' or equivalent, (2) explicit exclusion of canceled orders with a WHERE clause, (3) an appropriate JOIN between the customers and orders tables, (4) GROUP BY with SUM aggregation, and (5) ORDER BY ... DESC with LIMIT 5 [3]. Discuss edge cases: What if a customer has partially canceled orders? Should you use order_date or payment_date for the 90-day window? "I would write: SELECT c.customer_id, c.name, SUM(o.total_amount) AS total_value FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days' AND o.status != 'canceled' GROUP BY c.customer_id, c.name ORDER BY total_value DESC LIMIT 5; I would also ask the interviewer whether total_amount is pre- or post-discount and whether returns should be netted."
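A query like this can be sanity-checked end to end with SQLite's in-memory engine. The schema, names, and rows below are invented for illustration; note that SQLite has no INTERVAL keyword, so `date('now', '-90 days')` plays that role:

```python
import sqlite3

# Toy schema; table and column names are illustrative, not from a real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date TEXT,
    status TEXT,
    total_amount REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
INSERT INTO orders VALUES
    (10, 1, date('now', '-10 days'),  'shipped',  500.0),
    (11, 1, date('now', '-5 days'),   'canceled', 900.0),  -- excluded by status
    (12, 2, date('now', '-30 days'),  'shipped',  300.0),
    (13, 3, date('now', '-200 days'), 'shipped',  999.0),  -- outside 90-day window
    (14, 2, date('now', '-2 days'),   'shipped',  250.0);
""")

top_customers = conn.execute("""
    SELECT c.customer_id, c.name, SUM(o.total_amount) AS total_value
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.order_date >= date('now', '-90 days')
      AND o.status != 'canceled'
    GROUP BY c.customer_id, c.name
    ORDER BY total_value DESC
    LIMIT 5
""").fetchall()

print(top_customers)  # [(2, 'Grace', 550.0), (1, 'Ada', 500.0)]
```

Building a tiny fixture like this in the interview (or describing how you would) demonstrates exactly the edge-case thinking the question is probing: the canceled order and the out-of-window order are both silently dropped.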

2. Explain the difference between WHERE and HAVING clauses in SQL.

**What interviewers look for:** Understanding of query execution order, not just syntax. **Answer framework:** WHERE filters rows before aggregation; HAVING filters groups after aggregation [4]. This distinction matters because WHERE cannot reference aggregate functions (SUM, COUNT, AVG) while HAVING can. The SQL execution order is: FROM/JOIN, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. Provide a practical example: "If I want customers who placed more than 5 orders in the last month, I use WHERE for the date filter and HAVING for the order count: WHERE order_date >= '2026-01-01' ... HAVING COUNT(*) > 5. Putting the count condition in WHERE would cause a syntax error because the aggregation has not been computed yet."
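The distinction is easy to demonstrate on a toy table (the data below is invented; SQLite via Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    # Customer 1 places 6 recent orders (plus 3 old ones); customer 2 places 2.
    [(1, "2026-01-15")] * 6 + [(2, "2026-01-20")] * 2 + [(1, "2025-06-01")] * 3,
)

# WHERE runs before grouping (row-level); HAVING runs after (group-level).
frequent = conn.execute("""
    SELECT customer_id, COUNT(*) AS n_orders
    FROM orders
    WHERE order_date >= '2026-01-01'   -- filters rows before aggregation
    GROUP BY customer_id
    HAVING COUNT(*) > 5                -- filters groups after aggregation
""").fetchall()

print(frequent)  # [(1, 6)] — the three 2025 orders never reach the count
```

Moving the COUNT condition into WHERE would fail, because at that stage of execution no groups exist to count.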

3. How would you handle missing data in a dataset you are analyzing?

**What interviewers look for:** Analytical maturity — understanding that missing data is a problem to investigate, not just a technical issue to fix. **Answer framework:** First, diagnose the missingness mechanism [5]: (1) Missing Completely at Random (MCAR) — missingness is unrelated to any observed or unobserved data; safe to drop or impute. (2) Missing at Random (MAR) — missingness depends on observed variables; imputation using those variables is appropriate. (3) Missing Not at Random (MNAR) — missingness depends on the unobserved value itself (e.g., high-income people skip the income question); this requires careful modeling or sensitivity analysis. Then choose an appropriate strategy: deletion (listwise or pairwise), imputation (mean, median, mode, regression-based, or multiple imputation), or flagging (creating an indicator variable for missingness and including it in the model). "In an e-commerce analysis, I found that 23% of customer records were missing the 'referral_source' field. Investigation revealed that the field was not captured before a website redesign — it was MAR, dependent on signup date. I used the known distribution from post-redesign signups to impute referral sources for the earlier cohort, while documenting this assumption clearly in my report."
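A minimal sketch of the "impute plus flag" strategy in plain Python (the records and field names are invented for illustration; in practice this would be a pandas one-liner):

```python
from statistics import median

# Toy records; None marks a missing value in a numeric field.
rows = [
    {"customer_id": 1, "order_value": 120.0},
    {"customer_id": 2, "order_value": None},
    {"customer_id": 3, "order_value": 80.0},
    {"customer_id": 4, "order_value": 100.0},
    {"customer_id": 5, "order_value": None},
]

observed = [r["order_value"] for r in rows if r["order_value"] is not None]
fill = median(observed)  # median is robust to outliers, unlike the mean

for r in rows:
    # Record WHICH rows were imputed, so the model (and the report) can see it.
    r["order_value_missing"] = r["order_value"] is None
    if r["order_value"] is None:
        r["order_value"] = fill

print(fill)                                          # 100.0
print(sum(r["order_value_missing"] for r in rows))   # 2 rows imputed
```

The indicator column is the key move: it preserves the information that a value was missing, which matters whenever the missingness is not MCAR.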

4. Explain the difference between correlation and causation with a real-world example.

**What interviewers look for:** Statistical thinking and ability to communicate it to a business audience. **Answer framework:** Correlation measures the strength and direction of a linear relationship between two variables; causation means one variable directly influences the other [6]. The classic pitfall: ice cream sales and drowning deaths are positively correlated, but ice cream does not cause drowning — both are caused by hot weather (a confounding variable). In a business context: "A marketing team showed me a correlation between social media ad spend and revenue growth over 12 months (r = 0.87). Before recommending increased spend, I investigated confounding factors. It turned out both variables were driven by seasonality — Q4 holiday spending increased both ad budget and revenue simultaneously. When I controlled for seasonality, the correlation dropped to 0.31. We redesigned the analysis as an A/B test to establish actual causal impact, which showed a 4.2% revenue lift from social ads — real, but much smaller than the naive correlation suggested."

5. How do you approach designing a dashboard for stakeholders?

**What interviewers look for:** User-centered thinking, not just technical visualization skill. **Answer framework:** Start with the audience and their decisions, not the data [7]. Steps: (1) Identify the key business questions the dashboard must answer — "How are we tracking against quarterly targets?" is different from "Where should we invest marketing budget?" (2) Determine the audience — executives need high-level KPIs with drill-down; analysts need granular data with filters. (3) Design for the decision cadence — daily operational dashboards versus weekly strategic reviews. (4) Apply visualization best practices: choose chart types that match the data relationship (line for trends, bar for comparisons, scatter for correlations), minimize cognitive load, use consistent color encoding, and include context (targets, benchmarks, prior period) [8]. "I built a sales performance dashboard for a VP who was checking it every Monday morning. I put the three KPIs they cared most about — pipeline coverage, win rate, and average deal size — as large numbers at the top with week-over-week trend indicators. Below that, I provided drill-down by region, rep, and product line. Usage analytics showed the VP spent 3 minutes on the dashboard weekly — meaning the top-level summary was doing its job."


Statistical and Analytical Questions

6. A product manager tells you that the latest A/B test shows a 2% conversion rate improvement with a p-value of 0.04. Should you ship the change?

**What interviewers look for:** Nuanced understanding of statistical significance versus practical significance. **Answer framework:** A p-value of 0.04 means there is a 4% probability of observing this result (or more extreme) if the null hypothesis is true — it meets the conventional 0.05 threshold for statistical significance [9]. But statistical significance alone is insufficient. Evaluate: (1) Practical significance — is a 2% relative improvement meaningful for the business? If baseline conversion is 10%, moving to 10.2% may not justify the engineering effort. If baseline is 1%, moving to 1.02% is negligible. (2) Confidence interval — what is the range of plausible effect sizes? A CI of [0.1%, 3.9%] means the true effect could be trivially small. (3) Sample size and test duration — was the test run long enough to capture weekly cyclicality? Were there multiple comparisons that inflate false positive risk? (4) Segment effects — does the improvement hold across all user segments, or is it driven by one outlier group? "I would ask the product manager three questions before recommending a ship decision: What is the absolute conversion rate change, not just relative? How long did the test run? And did we check for interaction effects with mobile versus desktop users?"
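Point (2) is quick to compute. The sketch below builds a normal-approximation confidence interval for the absolute lift; all counts are invented, chosen so the result is statistically significant yet compatible with a near-zero true effect:

```python
from math import sqrt

def lift_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% CI for the absolute difference in conversion rates
    (two-proportion normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical test: 10.0% baseline vs 10.4% variant, 50k users per arm.
low, high = lift_confidence_interval(5000, 50_000, 5200, 50_000)
print(f"absolute lift CI: [{low:.4f}, {high:.4f}]")  # [0.0002, 0.0078]
```

The interval excludes zero, so the result is "significant", but its lower end is a 0.02 percentage-point lift, which may not justify the engineering cost. That is the statistical-versus-practical-significance distinction in one line of output.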

7. Explain what a Type I error and Type II error are, and when you would prioritize minimizing each.

**What interviewers look for:** Practical application of statistical concepts to business decisions. **Answer framework:** Type I error (false positive) is concluding an effect exists when it does not. Type II error (false negative) is concluding no effect exists when there is one [10]. The trade-off: reducing Type I error (lower alpha) increases Type II error, and vice versa. Prioritize minimizing Type I when the cost of a false positive is high — launching a feature that does not actually work, approving a drug that is not effective, or flagging a legitimate transaction as fraud (customer friction). Prioritize minimizing Type II when the cost of missing a true effect is high — failing to detect a disease in screening, missing a genuine security threat, or not launching a feature that would have significantly improved retention. "In fraud detection, I optimize for low Type II error — I would rather flag 100 legitimate transactions for review (false positives) than miss one actual fraud case. In pricing experiments, I optimize for low Type I error — I do not want to permanently raise prices based on a false positive that said customers would not churn."
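The trade-off can be made concrete with a textbook power calculation for a one-sided z-test with known variance (effect size, spread, and sample size below are invented):

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def type_ii_rate(effect, sd, n, z_alpha):
    """Beta: probability of MISSING a true effect of the given size."""
    return norm_cdf(z_alpha - effect / (sd / sqrt(n)))

# Same true effect and sample size; only alpha changes.
beta_at_05 = type_ii_rate(effect=0.5, sd=2.0, n=100, z_alpha=1.645)  # alpha=0.05
beta_at_01 = type_ii_rate(effect=0.5, sd=2.0, n=100, z_alpha=2.326)  # alpha=0.01
print(round(beta_at_05, 3), round(beta_at_01, 3))  # ~0.196 vs ~0.431
```

Tightening alpha from 0.05 to 0.01 more than doubles beta here, which is the trade-off in the answer framework: you cannot minimize both error types at a fixed sample size.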

8. How would you measure the success of a new product feature?

**What interviewers look for:** Metrics thinking and ability to define success before measuring it. **Answer framework:** Define the success hierarchy before writing any queries [11]: (1) Primary metric — the single number that directly measures the feature's intended outcome (e.g., for a recommendation engine: click-through rate on recommended items). (2) Secondary metrics — related measures that provide context (e.g., session duration, pages per visit). (3) Guardrail metrics — metrics that should NOT degrade (e.g., overall conversion rate, page load time, customer satisfaction scores). (4) North star alignment — does improvement in the primary metric actually drive the company's core value metric? Then determine the measurement methodology: pre-post comparison (weakest), cohort analysis (moderate), or A/B test (strongest). Establish the minimum detectable effect size and required sample size before launch, not after. "For a checkout simplification feature, I defined: primary metric = checkout completion rate, secondary metrics = time to checkout and average order value, guardrail metrics = return rate and customer support tickets. We ran the A/B test for 3 weeks to capture full weekly cycles and achieved a 7.3% lift in completion rate with no degradation in guardrails."
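For the "establish required sample size before launch" step, a common rule of thumb (popularized by Kohavi et al. [17]) is n per variant of roughly 16 times the metric's variance over the squared minimum detectable effect, which corresponds to about 80% power at alpha = 0.05. A sketch, with the baseline rate and target lift invented:

```python
from math import ceil

def samples_per_variant(p_baseline, min_detectable_abs_lift):
    """Rule-of-thumb sample size per arm (~80% power, alpha=0.05, two-sided):
    n ~= 16 * sigma^2 / delta^2, with sigma^2 = p(1-p) for a conversion metric."""
    variance = p_baseline * (1 - p_baseline)
    return ceil(16 * variance / min_detectable_abs_lift ** 2)

# E.g. 60% baseline checkout completion, detect a 2-point absolute change.
print(samples_per_variant(0.60, 0.02))  # 9600 users per arm
```

Running this before the test starts tells you how long you must wait given your traffic, and prevents the temptation to stop early the moment the dashboard looks good.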


Behavioral and Communication Questions

9. Tell me about a time your analysis contradicted what stakeholders expected or wanted to hear.

**What interviewers look for:** Courage to deliver unwelcome findings and skill in framing them constructively. **Answer framework:** Choose an example where your analysis challenged a popular narrative or an executive's pet project. Describe: (1) the stakeholder expectation and why it existed, (2) what your data showed and how you validated it, (3) how you presented the finding — framing, context, and recommendations for action [12]. "The marketing team was convinced that a loyalty program launched six months earlier was driving repeat purchases. My cohort analysis showed that members were already high-frequency buyers before joining — the program was attracting existing loyalists, not creating new ones. I presented this alongside a positive finding: program members had a 12% higher average order value. I recommended repositioning the program as an upselling mechanism rather than a retention tool, which the CMO accepted after reviewing the data."

10. Describe a time you had to work with messy or unreliable data. What did you do?

**What interviewers look for:** Data quality awareness and practical problem-solving. **Answer framework:** Every analyst works with imperfect data — the question is how you handle it. Describe: (1) how you identified the quality issues (validation checks, distribution analysis, domain knowledge), (2) what specific problems existed (duplicates, inconsistent formats, missing values, stale records, conflicting sources), (3) how you cleaned and transformed the data while documenting your decisions, and (4) how you communicated data quality limitations in your final analysis [13]. "I was asked to analyze customer churn using a CRM export. Initial exploration revealed: 15% duplicate customer records with different IDs, three different date formats across fields, and a 'last_activity_date' column that had not updated for 6 months due to a broken integration. I built a deduplication logic using email + phone matching, standardized dates, and reconstructed activity history from the event log table. I documented every cleaning step in a data quality appendix and flagged the broken integration for the engineering team."

11. How do you prioritize when multiple stakeholders request analyses simultaneously?

**What interviewers look for:** Professional maturity and strategic thinking about where analysis creates the most value. **Answer framework:** Prioritize by business impact, decision urgency, and data readiness [14]. A framework: (1) Is there a time-sensitive decision that will be made regardless — your analysis can only improve it if delivered before the deadline? That takes priority. (2) What is the expected value of the decision your analysis informs — a $10M pricing decision outweighs a $50K process improvement. (3) Can you provide a quick directional answer to one stakeholder while giving a thorough analysis to another? "I maintain a prioritization queue that I share with my manager weekly. When two VPs requested conflicting analyses in the same week, I provided VP A with a quick exploratory analysis (2 hours) that answered their immediate question directionally, while doing a comprehensive deep-dive for VP B whose analysis informed a board presentation. I communicated timelines to both stakeholders upfront, and neither was surprised."


Scenario-Based Questions

12. You notice that daily active users dropped 15% yesterday. Walk me through your investigation.

**What interviewers look for:** Structured debugging approach and hypothesis-driven thinking. **Answer framework:** Follow a diagnostic tree [15]: (1) Verify the data — is the metric accurate? Check for logging issues, pipeline delays, or definition changes. (2) Determine scope — is the drop across all platforms (web, mobile, app) or isolated? All geographies or specific regions? All user segments or specific cohorts? (3) Check for known causes — was there a site outage, a deployment, or a marketing campaign that ended? (4) Examine correlated metrics — did sessions drop (fewer people coming) or did session depth drop (same people doing less)? (5) Form hypotheses and test them — if mobile-only, check app store for update issues; if geography-specific, check for ISP outages; if new-user-only, check acquisition channel performance. "My first call would be to engineering to check for incidents. If clean, I would segment the drop by platform, geo, and acquisition source within 30 minutes. In a previous role, a similar investigation revealed that a CDN configuration change had broken image loading in three European countries, which accounted for the entire drop."
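Step (2), determining scope, is usually a small attribution calculation: compare yesterday against a baseline per segment and see which segment accounts for the missing users. A sketch with invented numbers:

```python
# Hypothetical DAU by platform: trailing-7-day average vs. yesterday.
baseline  = {"web": 420_000, "ios": 310_000, "android": 270_000}
yesterday = {"web": 415_000, "ios": 305_000, "android": 130_000}

total_drop = sum(baseline.values()) - sum(yesterday.values())
print(f"total drop: {total_drop / sum(baseline.values()):.1%}")  # 15.0%

# Attribute the drop: which segment contributes most of the missing users?
for segment in baseline:
    seg_drop = baseline[segment] - yesterday[segment]
    print(f"{segment}: {seg_drop / total_drop:.0%} of the total drop")
```

Here android alone accounts for over 90% of the loss, which immediately narrows the hypothesis space (app release, store issue, device-specific bug) instead of leaving you staring at a company-wide 15%.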

13. A sales leader asks you to build a model to predict which leads will convert. How do you approach this?

**What interviewers look for:** End-to-end analytical project planning, not just modeling technique. **Answer framework:** Resist the urge to jump to model selection. Steps: (1) Define the target variable precisely — what counts as "conversion" and over what time window? (2) Identify available features — lead source, company size, engagement signals (email opens, page visits, content downloads), demographic/firmographic data. (3) Assess data quality and volume — do you have enough historical conversions to train a model? (4) Start simple — logistic regression often outperforms complex models when features are well-engineered and provides interpretable coefficients that sales teams trust [16]. (5) Define evaluation metrics aligned with the business use case — precision (don't waste sales time on bad leads) or recall (don't miss any good leads). (6) Plan for deployment and monitoring — how will scores be surfaced to the sales team, and how will you detect model degradation? "The biggest pitfall I have seen is building a model that is accurate but unused. I would work with the sales team from day one to understand their workflow, embed the lead score into their CRM, and A/B test whether scored leads actually convert at a higher rate when prioritized."
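Step (5) is worth being able to compute by hand. A minimal precision/recall sketch on invented lead labels (1 = converted, or flagged as likely to convert):

```python
def precision_recall(y_true, y_pred):
    """Precision: of the leads we flagged, how many converted?
    Recall: of the leads that converted, how many did we flag?"""
    tp = sum(t and p for t, p in zip(y_true, y_pred))          # true positives
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))    # false positives
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))    # false negatives
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # actual conversions
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # model's flags

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.75 0.75
```

Whether you tune the score threshold toward precision (protect sales time) or recall (never miss a hot lead) is a business decision, not a modeling one, which is exactly the conversation to have with the sales leader.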

14. Marketing claims their email campaign generated $500,000 in revenue. How would you validate this claim?

**What interviewers look for:** Attribution sophistication and healthy skepticism. **Answer framework:** Question the attribution methodology [17]: (1) How was "generated" defined — did recipients purchase within 7 days, did they click the email before purchasing, or did they simply open it? (2) What is the counterfactual — would these customers have purchased anyway without the email? Check: were recipients existing customers with regular purchase patterns? Compare against a holdout group if one existed. (3) Examine incrementality — subtract the baseline purchase rate of similar customers who did not receive the email. (4) Check for selection bias — were the recipients targeted because they were already likely to buy (frequent visitors, items in cart)? "I would ask for the holdout group data first. If no holdout existed, I would build a matched control group from non-recipients with similar purchase history, recency, and engagement levels. In a previous analysis, this approach reduced a campaign's claimed $500K impact to $127K in truly incremental revenue — still positive, but a very different story for ROI calculation."
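The incrementality arithmetic in step (3) is simple but decisive. A sketch with invented campaign numbers (not the figures from the case above):

```python
# Hypothetical campaign: 40k recipients, 5.0% converted at a $250 average order.
recipients = {"n": 40_000, "conversion_rate": 0.050, "avg_order_value": 250.0}
# Matched non-recipients with similar history converted at 3.8% anyway.
matched_control = {"conversion_rate": 0.038}

claimed = (recipients["n"] * recipients["conversion_rate"]
           * recipients["avg_order_value"])

# Incremental revenue: only the lift ABOVE the control group's baseline counts.
incremental_rate = recipients["conversion_rate"] - matched_control["conversion_rate"]
incremental = recipients["n"] * incremental_rate * recipients["avg_order_value"]

print(f"claimed: ${claimed:,.0f}, incremental: ${incremental:,.0f}")
```

Attributing all recipient revenue to the email claims $500,000; netting out the baseline leaves $120,000 of truly incremental revenue, the number ROI should be computed on.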


Questions to Ask the Interviewer

  1. **"What does the data infrastructure look like — where does data live, and how do analysts access it?"** — Shows practical awareness of the tools and systems you will work with daily.
  2. **"How are analytical priorities set — is there a formal request process or more ad-hoc?"** — Signals awareness of workflow management challenges.
  3. **"Can you describe a recent analysis that changed a business decision?"** — Tests whether the organization actually uses data to drive decisions or just collects it.
  4. **"What is the team's approach to data quality and governance?"** — Demonstrates awareness that analysis quality depends on data quality.

Preparation Checklist

  1. **Practice SQL under time pressure.** Use platforms like LeetCode, HackerRank, or StrataScratch to solve SQL problems in 15-20 minutes — this mirrors the interview environment [18].
  2. **Prepare a portfolio presentation.** Select one analysis you are proud of and prepare a 10-minute walkthrough: the business question, your approach, the analysis, the findings, and the business impact. Practice explaining it to someone without a technical background.
  3. **Review basic statistics.** Mean, median, standard deviation, confidence intervals, p-values, A/B test design, and regression interpretation should be second nature.
  4. **Know your tools deeply.** Whether you use Python (pandas, matplotlib), R (tidyverse, ggplot2), Tableau, or Power BI, be prepared to discuss why you chose specific tools for specific tasks and demonstrate fluency in at least one.
  5. **Research the company's data.** Check their data team's blog posts, conference talks, or job descriptions to understand their tech stack, data scale, and analytical priorities.

References

[1] U.S. Bureau of Labor Statistics, "Occupational Outlook Handbook: Data Scientists and Mathematical Science Occupations," BLS, 2024.
[2] LinkedIn, "2024 Workforce Report: Most In-Demand Skills," LinkedIn Economic Graph, 2024.
[3] Tanimura, C., "SQL for Data Analysis," O'Reilly Media, 2021.
[4] Beaulieu, A., "Learning SQL," 3rd Edition, O'Reilly Media, 2020.
[5] Little, R. & Rubin, D., "Statistical Analysis with Missing Data," 3rd Edition, Wiley, 2019.
[6] Pearl, J. & Mackenzie, D., "The Book of Why: The New Science of Cause and Effect," Basic Books, 2018.
[7] Few, S., "Information Dashboard Design," Analytics Press, 2013.
[8] Knaflic, C.N., "Storytelling with Data," Wiley, 2015.
[9] Wasserstein, R. & Lazar, N., "The ASA Statement on p-Values," The American Statistician, 2016.
[10] Agresti, A. & Franklin, C., "Statistics: The Art and Science of Learning from Data," 4th Edition, Pearson, 2017.
[11] Croll, A. & Yoskovitz, B., "Lean Analytics," O'Reilly Media, 2013.
[12] Davenport, T. & Kim, J., "Keeping Up with the Quants," Harvard Business Review Press, 2013.
[13] Dasu, T. & Johnson, T., "Exploratory Data Mining and Data Cleaning," Wiley, 2003.
[14] Patil, D.J. & Mason, H., "Data Driven," O'Reilly Media, 2015.
[15] Hubbard, D., "How to Measure Anything," 3rd Edition, Wiley, 2014.
[16] Provost, F. & Fawcett, T., "Data Science for Business," O'Reilly Media, 2013.
[17] Kohavi, R. et al., "Trustworthy Online Controlled Experiments," Cambridge University Press, 2020.
[18] Singh, N. & Huo, K., "Ace the Data Science Interview," 2021.


Blake Crosley — Former VP of Design at ZipRecruiter, Founder of Resume Geni

About Blake Crosley

Blake Crosley spent 12 years at ZipRecruiter, rising from Design Engineer to VP of Design. He designed interfaces used by 110M+ job seekers and built systems processing 7M+ resumes monthly. He founded Resume Geni to help candidates communicate their value clearly.
