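I’d start with a single-machine O(V+E) algorithm: either DFS with three-state node coloring (white/gray/black) or Kahn’s topological-sort approach. DFS marks a node gray when it is entered and black once all of its descendants are finished. Reaching a gray node along an edge means that edge points back into the current DFS path, so the graph has a cycle. Kahn’s algorithm repeatedly removes nodes with in-degree 0; if any nodes remain when the queue empties, they lie on or downstream of a cycle. Both run in O(V+E) time. They need O(V+E) space for the adjacency lists, plus O(V) for color or in-degree state and for the recursion stack (DFS) or queue (Kahn).

For very large graphs (100M nodes, 1B edges) spread across machines, I’d partition by service ownership and run distributed SCC detection in a Pregel-style model (GraphX/Giraph). Simple min-label propagation only yields weakly connected components, so SCC detection needs two passes. In the forward pass, each node iteratively adopts the smallest ID among the nodes that can reach it. In the backward pass, each label’s root claims the same-labeled nodes that can reach it, and those nodes form one SCC. Any SCC with more than one node, or a single node with a self-loop, indicates a cycle.

I’d also support incremental checks on edge changes (adding u→v creates a cycle only if v can already reach u, and deleting an edge never creates one), run checks against consistent snapshots, and use compressed, chunked adjacency storage with checkpointing. For latency-sensitive systems I’d run a fast approximate filter (Bloom filters plus sampling) to surface likely cycle regions before the full SCC analysis. Because such a filter can miss cycles, it should only prioritize work, not replace the exact check.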
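A minimal sketch of the three-color DFS check, assuming the graph is a Python dict mapping each node to a list of successors. The function name `has_cycle_dfs` is illustrative, not from any library. It uses an explicit stack rather than recursion so that very deep graphs do not hit Python’s recursion limit.

```python
from typing import Dict, Hashable, List

WHITE, GRAY, BLACK = 0, 1, 2


def has_cycle_dfs(graph: Dict[Hashable, List[Hashable]]) -> bool:
    """Return True if the directed graph contains a cycle.

    `graph` maps each node to its successors; nodes that appear only as
    edge targets are treated as having no outgoing edges.
    """
    color: Dict[Hashable, int] = {}  # missing key means WHITE (unvisited)
    for start in graph:
        if color.get(start, WHITE) != WHITE:
            continue
        color[start] = GRAY
        # Explicit stack of (node, successor iterator) instead of recursion.
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, succs = stack[-1]
            for nxt in succs:
                state = color.get(nxt, WHITE)
                if state == GRAY:
                    return True  # back edge into the current DFS path
                if state == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break  # descend into nxt before finishing node
            else:
                color[node] = BLACK  # every successor fully explored
                stack.pop()
    return False


print(has_cycle_dfs({"a": ["b"], "b": ["c"], "c": ["a"]}))  # True
print(has_cycle_dfs({"a": ["b", "c"], "b": ["c"], "c": []}))  # False
```

Each node’s successor iterator stays on the stack, so when DFS returns to a node it resumes exactly where it left off. The total work is therefore still O(V+E).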
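Kahn’s approach can be sketched the same way, again assuming a dict-of-lists graph. The name `has_cycle_kahn` is hypothetical. A cycle exists exactly when some nodes never reach in-degree 0, so fewer than V nodes get processed.

```python
from collections import deque
from typing import Dict, Hashable, List


def has_cycle_kahn(graph: Dict[Hashable, List[Hashable]]) -> bool:
    """Return True if Kahn's topological sort cannot consume every node."""
    # Collect every node, including ones that only appear as edge targets.
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)

    indegree = {n: 0 for n in nodes}
    for succs in graph.values():
        for v in succs:
            indegree[v] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in graph.get(u, ()):
            indegree[v] -= 1  # "remove" the edge u -> v
            if indegree[v] == 0:
                queue.append(v)

    # Leftover nodes sit on a cycle or downstream of one.
    return processed < len(nodes)


print(has_cycle_kahn({"a": ["b"], "b": ["c"], "c": ["a"]}))  # True
print(has_cycle_kahn({"a": ["b", "c"], "b": ["c"]}))  # False
```

Unlike DFS, this version also produces a topological order as a by-product: the order in which nodes leave the queue. That order is useful if the caller needs it anyway.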
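The forward-backward SCC scheme for the distributed case can be illustrated with a single-process simulation. This is not GraphX or Giraph API code. Each inner `while` pass stands in for one Pregel superstep in which vertices that changed send messages along their edges. The names `pregel_style_sccs` and `has_cycle_scc` are hypothetical, and vertex IDs are assumed to be comparable integers so that "smallest ID" is well defined.

```python
from typing import Dict, List, Set


def pregel_style_sccs(graph: Dict[int, List[int]]) -> List[Set[int]]:
    """Simulate forward-backward SCC labeling with superstep-style loops."""
    nodes: Set[int] = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    preds: Dict[int, List[int]] = {n: [] for n in nodes}
    for u, succs in graph.items():
        for v in succs:
            preds[v].append(u)

    active = set(nodes)  # vertices not yet assigned to an SCC
    sccs: List[Set[int]] = []
    while active:
        # Forward phase: propagate the minimum vertex id along out-edges.
        # At convergence, label[v] is the smallest active id that reaches v.
        label = {n: n for n in active}
        changed = set(active)
        while changed:
            inbox: Dict[int, int] = {}
            for u in changed:
                for v in graph.get(u, ()):
                    if v in active:
                        inbox[v] = min(inbox.get(v, label[u]), label[u])
            changed = {v for v, m in inbox.items() if m < label[v]}
            for v in changed:
                label[v] = inbox[v]

        # Backward phase: each root (label == own id) claims the vertices
        # carrying its label that can reach it, walking in-edges.
        owner = {n: n for n in active if label[n] == n}
        frontier = set(owner)
        while frontier:
            nxt: Set[int] = set()
            for v in frontier:
                for u in preds[v]:
                    if u in active and u not in owner and label[u] == label[v]:
                        owner[u] = label[v]
                        nxt.add(u)
            frontier = nxt

        groups: Dict[int, Set[int]] = {}
        for n, root in owner.items():
            groups.setdefault(root, set()).add(n)
        sccs.extend(groups.values())
        active -= set(owner)  # claimed vertices are finished; others repeat
    return sccs


def has_cycle_scc(graph: Dict[int, List[int]]) -> bool:
    """A cycle exists iff some SCC has >1 vertex or a vertex has a self-loop."""
    for comp in pregel_style_sccs(graph):
        if len(comp) > 1:
            return True
        (n,) = comp
        if n in graph.get(n, ()):
            return True
    return False


print(has_cycle_scc({1: [2], 2: [3], 3: [1, 4], 4: []}))  # True
print(has_cycle_scc({1: [2], 2: [3], 3: []}))  # False
```

Each outer round finalizes at least the SCC containing the smallest active ID, so the loop terminates. However, a long chain of singleton SCCs can take many rounds. That worst case is worth keeping in mind when budgeting supersteps on a 1B-edge graph.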
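The incremental check on edge changes rests on one observation: inserting u→v closes a cycle exactly when v already reaches u, and deleting an edge can never create one. A minimal sketch follows. `IncrementalCycleGuard` is a hypothetical helper name, and it uses a plain BFS per insertion (O(V+E) worst case) rather than any specialized dynamic-graph algorithm.

```python
from collections import deque
from typing import Dict, Hashable, Set


class IncrementalCycleGuard:
    """Hypothetical helper that rejects edge insertions which close a cycle."""

    def __init__(self) -> None:
        self.succ: Dict[Hashable, Set[Hashable]] = {}

    def _reaches(self, src: Hashable, dst: Hashable) -> bool:
        # BFS over the current edges; src == dst covers the self-loop case.
        if src == dst:
            return True
        seen = {src}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in self.succ.get(x, ()):
                if y == dst:
                    return True
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return False

    def add_edge(self, u: Hashable, v: Hashable) -> bool:
        """Insert u -> v unless v already reaches u; report success."""
        if self._reaches(v, u):
            return False
        self.succ.setdefault(u, set()).add(v)
        return True

    def remove_edge(self, u: Hashable, v: Hashable) -> None:
        # Deletions never create cycles, so no check is needed.
        self.succ.get(u, set()).discard(v)


guard = IncrementalCycleGuard()
print(guard.add_edge("api", "db"))     # True
print(guard.add_edge("db", "cache"))   # True
print(guard.add_edge("cache", "api"))  # False: would close api -> db -> cache -> api
```

Because the reachability search starts at v and usually touches only a small neighborhood, this is often far cheaper than rerunning the full SCC job after every change. The full job then serves as a periodic consistency check.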