AI Research

I had the privilege of working closely with brilliant AI researchers from the University of Sydney through the Engineering Sydney Industry Placement Scholarship (ESIPS) program. Together, we explored AI research and commercialisation opportunities, tackling complex real-world challenges drawn from production systems at Constantinople. It was a pleasure to collaborate with Mohitha, Adam, and Jake on these impactful projects.

Mohitha Mohan

Evaluating Tool-Augmented LLMs via MCP

Bachelor of Engineering Honours (Software), University of Sydney, 2025

Key finding

All three Claude Sonnet models compared degrade as the toolset scales from 5 to 181 tools, with performance dropping by up to 12.5 percentage points. The errors are driven by tool name similarity, not tool count alone. The project also proposes two novel evaluation frameworks for assessing tool-augmented LLMs beyond a single ground-truth answer.

Worst drop: -12.5 percentage points
Tools tested: 181
Models compared: 3
Tags: MCP, LLM evaluation, Tool selection
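Because the errors track tool name similarity rather than tool count, a cheap pre-flight check on a toolset is to look for near-duplicate names. The sketch below is an illustration only, not the metric used in the project: it swaps in Python's standard-library `difflib.SequenceMatcher` as a character-level similarity score and flags pairs above a threshold. The tool names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(tool_names, threshold=0.8):
    """Flag pairs of tool names whose character-level similarity meets a threshold.

    SequenceMatcher.ratio() returns 2*M/T (M = matched characters, T = combined
    length of both names), so 1.0 means identical names. The threshold is a
    hypothetical cut-off, not a value taken from the study.
    """
    flagged = []
    for a, b in combinations(sorted(tool_names), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    # Most similar (most likely to be confused) pairs first.
    return sorted(flagged, key=lambda pair: pair[2], reverse=True)


# Hypothetical MCP tool names, for illustration only.
tools = ["get_issue", "get_issues", "list_issues", "create_issue", "search_code"]
print(confusable_pairs(tools))  # [('get_issue', 'get_issues', 0.95)]
```

Lowering the threshold surfaces more candidate pairs. Character similarity is only a proxy: names that mean the same thing but share few characters (for example `find_user` and `lookup_user`) can still score low.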

Adam Schildkraut

Comparing RAG Architectures for Enterprise Knowledge Management

Bachelor of Engineering Honours, University of Sydney, 2025

Key finding

Simple Vector RAG with budget embeddings ($0.003/query) delivers quality statistically indistinguishable from GraphRAG ($0.46/query) across 17 configurations tested on real enterprise data. The project also coined the "Maintenance Trap": after roughly 20 corpus updates, the cumulative cost of re-indexing GraphRAG exceeds the initial investment.

Configurations tested: 17
Cost gap (per query): 153x
Chunks indexed: 63K
Tags: RAG, GraphRAG, Cost analysis
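Both cost figures in this finding can be checked with simple arithmetic. The per-query gap follows directly from the quoted prices. The Maintenance Trap break-even is only a sketch under assumptions the finding does not state: that each re-index after a corpus update costs a roughly constant amount c_r, and that "initial investment" means the one-off GraphRAG indexing cost C_0 (both symbols are introduced here for illustration).

```latex
\begin{align*}
\text{cost gap} &= \frac{0.46\ \text{USD/query}}{0.003\ \text{USD/query}} \approx 153.3 \approx 153\times \\[4pt]
\underbrace{n\,c_r}_{\text{cumulative re-indexing}} > C_0
  &\iff n > n^{*} = \frac{C_0}{c_r} \\
n^{*} \approx 20 &\implies c_r \approx \frac{C_0}{20} = 0.05\,C_0
\end{align*}
```

Under the constant-cost assumption, a trap at about 20 updates implies each re-index costs on the order of 5% of the initial build.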

Jake Marsden

Deep Research Agents: Review, Verification, and Adaptive Model Selection

ELEC4714 Major Industrial Project, University of Sydney, 2025

Key finding

Adding an LLM review loop actually decreased quality (a valuable negative result), whereas a Research Verifier raised citation accuracy from 76.7% to 90.3% at minimal cost. Adaptive Model Selection retained 99.6% of premium-model quality at 71% of the premium cost. The takeaway: reactive verification beats emergent self-critique.

Citation accuracy: 90.3%
Quality retained vs. premium: 99.6%
Cost vs. premium: 71%
Tags: Deep research, Agentic AI, Cost optimisation
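The cost side of Adaptive Model Selection can be reasoned about with a toy router. This sketch is not the project's selection method. It assumes a hypothetical per-query difficulty score in [0, 1], a hypothetical threshold, and made-up relative prices for a budget and a premium model. It then reports routed cost as a fraction of sending every query to the premium model, the same kind of ratio as the 71% figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    relative_cost: float  # hypothetical cost per query, premium = 1.0


# Hypothetical models and prices, for illustration only.
BUDGET = Model("budget-model", 0.2)
PREMIUM = Model("premium-model", 1.0)


def select_model(difficulty: float, threshold: float = 0.6) -> Model:
    """Send queries at or above the (hypothetical) difficulty threshold to premium."""
    return PREMIUM if difficulty >= threshold else BUDGET


def cost_ratio(difficulties: list[float], threshold: float = 0.6) -> float:
    """Routed cost divided by the cost of sending every query to the premium model."""
    if not difficulties:
        raise ValueError("need at least one query")
    routed = sum(select_model(d, threshold).relative_cost for d in difficulties)
    return routed / (PREMIUM.relative_cost * len(difficulties))


# Hypothetical difficulty scores for five research queries.
scores = [0.1, 0.3, 0.7, 0.9, 0.5]
print(select_model(0.9).name)        # premium-model
print(round(cost_ratio(scores), 2))  # 0.52
```

With these made-up numbers, two of five queries go to the premium model and the blended cost is 52% of the all-premium baseline. The quality side of the trade-off (the 99.6% figure) needs a separate evaluation and is not modelled here.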