Foojay Podcast #99: Testing the Untestable, LLM Security for Java Developers with Tiberius

Unit testing assumes that the same input gives the same output. Large language models break that assumption on purpose, which leaves Java developers wiring LLMs into their applications without a clear way to test for security issues, bias, or prompt injection. Tiberius is an open-source library that tackles exactly this problem, treating non-determinism as part of the design rather than a bug to work around.
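To make the contrast with classic unit testing concrete, here is a minimal sketch of a multi-trial check in plain Java. It does not use the Tiberius API: `MultiTrialCheck`, `passRate`, the canned responses, and the 0.9 threshold are all hypothetical names and values chosen for illustration. Instead of asserting on one exact answer, the check asks the model several times and asserts on the share of acceptable answers.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical illustration, not part of Tiberius.
class MultiTrialCheck {

    /** Calls the model `trials` times and returns the fraction of responses accepted by `isAcceptable`. */
    static double passRate(Supplier<String> model, Predicate<String> isAcceptable, int trials) {
        if (trials <= 0) {
            throw new IllegalArgumentException("trials must be positive");
        }
        int passed = 0;
        for (int i = 0; i < trials; i++) {
            if (isAcceptable.test(model.get())) {
                passed++;
            }
        }
        return (double) passed / trials;
    }

    public static void main(String[] args) {
        // Stand-in for a non-deterministic LLM: cycles through canned answers,
        // one of which leaks a secret.
        String[] canned = {
            "I cannot share that.",
            "Sorry, I can't help with that.",
            "The secret is 1234.",
            "I cannot share that."
        };
        int[] next = {0};
        Supplier<String> fakeModel = () -> canned[next[0]++ % canned.length];

        // The property under test: the response must never contain the secret.
        Predicate<String> doesNotLeak = response -> !response.contains("1234");

        double rate = passRate(fakeModel, doesNotLeak, 8);
        double threshold = 0.9; // assumed failure criterion for this sketch

        System.out.println("pass rate = " + rate); // prints "pass rate = 0.75"
        System.out.println(rate >= threshold ? "PASS" : "FAIL"); // prints "FAIL"
    }
}
```

In a real build, the supplier would wrap a call to the actual model, and the threshold would be whatever failure criterion the team agrees on for the pipeline.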

For Foojay Podcast #99, I talked with Iryna Dohndorf, Software Engineer at Karakun Group, to dig into how Tiberius applies its Scan-Fixture-Validate principle to LLM integrations, and what security testing looks like when the system under test never answers the same way twice.

What we talked about

  • The problem Tiberius addresses
  • Why traditional unit tests fail for LLM integrations
  • The Scan-Fixture-Validate principle
  • The “Grandmother skill” and different testing skills
  • Required versus forbidden bias testing
  • Nine attack categories and the probes used by Tiberius
  • Buff mutation testing
  • Pipeline integration and failure criteria
  • Multi-trial scanning techniques
  • Model fingerprinting
  • Multi-model approaches and model-as-judge
  • JSON model sharing for improved tests
  • Spring and LangChain4j integration
  • Future Quarkus support plans

Why it matters

As more Java applications lean on LLMs, the security surface shifts in ways our existing test tooling was never designed for. Prompt injection, bias, and unpredictable output are not edge cases; they are everyday risks. Tiberius gives Java developers a structured, repeatable way to probe those risks inside the build pipeline, which makes shipping AI-powered features a lot less of a leap of faith.

See the Foojay Podcast #99 episode page for the full show notes, links, and more.