Cristian Leo

Data Scientist

AWS

About

Cristian Leo is a Data Scientist at Amazon focused on advancing LLM capabilities through novel algorithms and modeling techniques. He holds an M.S. in Applied Analytics from Columbia University and is known in the data science community for his deep-dive approach to recreating ML algorithms from scratch, frequently documented in his published work on Medium. Together with his collaborator Daniel, he bridges security domain expertise and ML research: Daniel built the AI security agent that SIR-Bench evaluates, while Cristian designed the evaluation methodology and adversarial judge architecture.

Sessions

SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents

What you will learn:

• Investigation vs. Classification: Learn the critical difference between an AI that correctly triages alerts (97.1%) and one that conducts genuine forensic investigation (41.9% novel finding coverage), and why both metrics matter for production deployment
• Adversarial Evaluation Design: Implement an LLM-as-Judge that inverts the burden of proof, preventing the confirmation bias that accepts alert repetition as valid investigation
• Realistic Benchmark Generation: Use the OUAT methodology to create measurable ground truth from real incident patterns without exposing sensitive production data
• Performance by Attack Category: Understand why Unauthorized Access investigations yield deep findings (47.9% hit 7+ novel discoveries) while Malicious File Execution struggles (1.9%), and what this means for agent deployment decisions
• Production Readiness Framework: Apply the M1/M2/M3 metric framework to evaluate whether your AI security tools are performing genuine investigation or sophisticated pattern matching

Speaking At

RBLN East 2026 - Reston, VA