Intermediate · Technical
You are asked to design and optimize a query that joins several large tables (tens of millions of rows) for a recurring report. Walk me through how you would: (a) understand the schema, (b) write the initial query, and (c) systematically tune its performance (indexes, execution plan analysis, query refactoring).
SQL Developer Internship
General

Sample Answer

For a recurring report on large tables, I’d start by pulling the ERD and checking primary/foreign keys, cardinality, and any existing indexes. I like to run a few quick COUNT(*) and DISTINCT checks on key columns to understand data distribution, plus confirm with the analyst or PM exactly which business entities need to show up in the final result. Then I’d write a straightforward, readable query first: explicit JOINs, clear filters, no premature optimization. Once it’s correct, I’d capture the execution plan. From there, I’d look for red flags like table scans on 20M+ row tables, large hash joins, or key lookups. I’d consider composite indexes on the main filter and join columns, and sometimes pre-aggregating into a small summary table if it’s truly recurring. In a previous project, this approach took a nightly report from 28 minutes down to just under 3 minutes, making it stable enough to schedule right before leadership’s 9am dashboard refresh.
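To make the first step concrete, here is a minimal sketch of the kind of data-distribution checks described above. The tables (orders, customers) and columns are hypothetical stand-ins, not from the original answer:

```sql
-- Row counts: how big is each table we plan to join?
SELECT COUNT(*) AS order_rows    FROM orders;
SELECT COUNT(*) AS customer_rows FROM customers;

-- Cardinality of candidate join/filter columns: a high distinct
-- count (customer_id) usually joins cleanly, while a low one
-- (status) tells us how selective a filter on it can be.
SELECT COUNT(DISTINCT customer_id) AS distinct_customers,
       COUNT(DISTINCT status)      AS distinct_statuses
FROM orders;

-- Skew check: are a handful of values dominating the table?
SELECT status, COUNT(*) AS rows_per_status
FROM orders
GROUP BY status
ORDER BY rows_per_status DESC;
```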
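A correctness-first baseline in the same spirit: explicit JOINs and clear filters, no hints or premature optimization. Table and column names (order_items, unit_price, the report window) are again illustrative assumptions:

```sql
-- Baseline report query: readable and correct first, fast later.
SELECT c.region,
       o.order_date,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN customers   c  ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id   = o.order_id
WHERE o.order_date >= '2024-01-01'   -- report window
  AND o.status = 'COMPLETED'
GROUP BY c.region, o.order_date;
```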
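One way to capture the plan and act on it. The commands below assume SQL Server, where "key lookups" appear in execution plans (other engines use EXPLAIN / EXPLAIN ANALYZE instead), and the index definition is an illustrative guess at the main filter and join columns, not a prescription:

```sql
-- Surface I/O and timing alongside the plan (in SSMS, also
-- enable "Include Actual Execution Plan" before running).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the baseline query here and inspect the plan for
-- table scans on large tables, oversized hash joins, and
-- key lookups ...

-- Composite index on the main filter + join columns. INCLUDE
-- covers customer_id so the plan can satisfy the join without
-- key lookups back into the clustered index.
CREATE NONCLUSTERED INDEX ix_orders_status_date
    ON orders (status, order_date)
    INCLUDE (customer_id);
```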
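For the "truly recurring" case, a sketch of pre-aggregating into a small summary table refreshed by a scheduled nightly job. The table name and the full-rebuild refresh strategy are assumptions for illustration; an incremental load keyed on order_date would also work:

```sql
-- One-time setup: a narrow summary table for the report.
CREATE TABLE daily_revenue_summary (
    region      VARCHAR(50)   NOT NULL,
    order_date  DATE          NOT NULL,
    revenue     DECIMAL(18,2) NOT NULL,
    PRIMARY KEY (region, order_date)
);

-- Nightly refresh (scheduled job): full rebuild for simplicity.
TRUNCATE TABLE daily_revenue_summary;

INSERT INTO daily_revenue_summary (region, order_date, revenue)
SELECT c.region,
       o.order_date,
       SUM(oi.quantity * oi.unit_price)
FROM orders o
JOIN customers   c  ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id   = o.order_id
WHERE o.status = 'COMPLETED'
GROUP BY c.region, o.order_date;

-- The recurring report then reads the small summary table
-- instead of re-joining tens of millions of rows on every run.
SELECT region, order_date, revenue
FROM daily_revenue_summary
WHERE order_date >= '2024-01-01';
```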

Keywords

Start by understanding schema, keys, and data distribution before coding
Write a clear baseline query focused on correctness first
Use execution plans to target specific bottlenecks (scans, joins, lookups)
Leverage appropriate indexing and, when necessary, summary tables for recurring reports