Foojay Podcast #99: Testing the Untestable, LLM Security for Java Developers with Tiberius

Unit testing assumes that the same input gives the same output. Large language models break that assumption on purpose, which leaves Java developers wiring LLMs into their applications without a clear way to test for security issues, bias, or prompt injection. Tiberius is an open-source library that tackles exactly this problem, treating non-determinism as part of the design rather than a bug to work around.

For Foojay Podcast #99, I talked with Iryna Dohndorf, Software Engineer at Karakun Group, to dig into how Tiberius scans, fixtures, and validates LLM integrations, and what security testing looks like when the system under test never answers the same way twice.

What we talked about

The problem Tiberius addresses
Why traditional unit tests fail for LLM integrations
The Scan-Fixture-Validate principle
The “Grandmother skill” and different testing skills
Required versus forbidden bias testing
Nine attack categories and the probes used by Tiberius
Buff mutation testing
Pipeline integration and failure criteria
Multi-trial scanning techniques
Model fingerprinting
Multi-model approaches and model-as-judge
JSON model sharing for improved tests
Spring and LangChain4j integration
Future Quarkus support plans

Why it matters

As more Java applications lean on LLMs, the security surface shifts in ways our existing test tooling was never designed for. Prompt injection, bias, and unpredictable output are not edge cases, they are everyday risks. Tiberius gives Java developers a structured, repeatable way to probe those risks inside the build pipeline, which makes shipping AI-powered features a lot less of a leap of faith.

See the Foojay Podcast #99 episode page for all info, shownotes, links, etc.