Intermediate | TECHNICAL
Describe a specific instance where you used Python (e.g., with pandas or NumPy) to transform a messy dataset into something usable. What were the original issues in the data, and what exact steps did you take to fix them?
Custom Role
General

Sample Answer

On a customer analytics project, I inherited a 5M-row CSV that came from three different legacy systems. It looked fine at first glance, but once I pulled it into pandas, the problems showed up: dates stored in three formats, IDs stored as strings with inconsistent leading zeros, 15% missing values in key columns, and duplicates everywhere. I started by standardizing types: I parsed dates with `pd.to_datetime` using `errors='coerce'`, normalized them to UTC, and cast IDs to fixed-width strings. Then I profiled nulls with `df.isna().mean()` and worked with our ops lead to decide what was truly missing vs. expected. For numeric features, I used median imputation; for categoricals, I filled with an explicit "Unknown" bucket so the missingness stayed visible instead of being hidden. I de-duplicated on a subset of business keys and kept the most recent record. In the end, we reduced row count by 9% but improved join success across tables from ~70% to 96%, and queries that used to take minutes in our BI tool dropped to a few seconds.
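The steps in this answer can be sketched in pandas roughly as follows. This is a minimal illustration, not the actual project code: the column names (`customer_id`, `updated_at`, `monthly_spend`, `segment`), the three date formats, and the ID width of 8 are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical date formats; the real ones would come from profiling the data.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]


def parse_mixed_dates(s: pd.Series, formats=DATE_FORMATS) -> pd.Series:
    """Try each format in turn; values matching none of them become NaT."""
    parsed = pd.to_datetime(s, format=formats[0], errors="coerce", utc=True)
    for fmt in formats[1:]:
        attempt = pd.to_datetime(s, format=fmt, errors="coerce", utc=True)
        parsed = parsed.fillna(attempt)
    return parsed


def clean_customers(df: pd.DataFrame, id_width: int = 8):
    """Return (cleaned frame, share of nulls per column before imputation)."""
    out = df.copy()

    # 1. Standardize types: UTC timestamps and fixed-width string IDs.
    out["updated_at"] = parse_mixed_dates(out["updated_at"])
    out["customer_id"] = (
        out["customer_id"].astype("string").str.strip().str.zfill(id_width)
    )

    # 2. Profile nulls so imputation rules can be agreed with stakeholders.
    null_share = out.isna().mean()

    # 3. Impute: median for numerics, explicit "Unknown" for categoricals.
    out["monthly_spend"] = out["monthly_spend"].fillna(out["monthly_spend"].median())
    out["segment"] = out["segment"].fillna("Unknown")

    # 4. De-duplicate on the business key, keeping the most recent record.
    #    NaT sorts first so an unparseable date never wins over a real one.
    out = (
        out.sort_values("updated_at", na_position="first")
        .drop_duplicates(subset=["customer_id"], keep="last")
        .reset_index(drop=True)
    )
    return out, null_share
```

Trying explicit formats one at a time is slower than a single parse, but it makes it obvious which rows matched nothing, which is useful when the formats come from different source systems.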

Keywords

Large, messy real-world dataset from multiple legacy sources
Concrete pandas techniques: type casting, datetime parsing, null handling, de-duplication
Collaboration with stakeholders to define correct imputation rules
Clear before/after metrics: join success improvement and query performance gains