What have I done lately? – Part 1: AI

This post is my attempt at jotting down some of the work and hobby projects that I have been leading or dabbling in over the last couple of years. It is partly intended as a way to avoid getting nostalgic about projects from much earlier in my career, as not many people will be interested in tales of compiling BIND and sendmail from source back at the start of the century.

In the interests of not turning this into a full book chapter, it will be a series of posts covering different topics.

AI – Artificial Intelligence

You may have noticed that Artificial Intelligence is kind of a big deal these days, so I have decided to pick that as my area of focus for Part 1 of this series.

Agents for coding

During my time at Atlassian I got internal access to Rovo as a CLI, and Codelassian – a plugin for IntelliJ – to interact with Claude and other LLMs for maintaining the Disaster Recovery team’s code.

Setting up Rovo involved providing it with an API key so that it could authenticate and make calls on my behalf with the appropriate access.

When it came to applying changes to code, the CLI’s capabilities were a bit restricted, as it relied on the Unix-style command line utilities that it expected to be in place. After a few false starts and failed attempts at manipulating source code files, I came to realise that it was being limited by macOS shipping different versions of some utilities, so I looked into making things easier for the tooling via Homebrew.
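One way to see the mismatch for yourself is to check which flavour of a utility comes first on the PATH. The sketch below is my own illustration rather than anything Rovo does. It assumes that GNU tools announce themselves with a banner such as "sed (GNU sed) 4.9" when run with --version, while the BSD-derived tools that ship with macOS either reject the flag or report a BSD banner. The function names are hypothetical.

```python
import shutil
import subprocess


def looks_like_gnu(version_output: str) -> bool:
    """Return True if the first line of `--version` output carries a GNU banner."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    # GNU tools print e.g. "sed (GNU sed) 4.9". The macOS grep prints
    # "grep (BSD grep, GNU compatible) ...", which must not count as GNU.
    return "(GNU " in first_line


def describe_tool(name: str) -> str:
    """Classify a command as 'gnu', 'other' or 'missing' (hypothetical helper)."""
    path = shutil.which(name)
    if path is None:
        return "missing"
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=5,
            stdin=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Could not run the tool at all, so make no claim about it being GNU.
        return "other"
    if result.returncode == 0 and looks_like_gnu(result.stdout):
        return "gnu"
    return "other"


if __name__ == "__main__":
    for tool in ("sed", "grep", "gsed", "ggrep"):
        print(f"{tool}: {describe_tool(tool)}")
```

Homebrew's gnu-sed and grep formulae install the GNU versions under the names gsed and ggrep by default, which is why those names are included in the list.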

The Codelassian IDE plugin was a little bit flaky to start with, but I figured it would be more powerful for my purposes if it could lean on IntelliJ’s refactoring capabilities, meaning that it would have a better chance than Rovo’s CLI of applying changes in one attempt.

Familiar vs Obscure

I found that LLMs were useful for surfacing potential issues in well-defined structures that have publicly available examples and well-documented behaviours, but tended to be vague when it came to areas that large enterprises deal with in private codebases.

For example, when I introduced some new handling for an edge case within a sequence of step functions, the analysis that came back from the AI flagged that the configuration for the step functions would actually result in the function being retried rather than having the clean exit that I was intending to introduce. This was my epiphany moment for realising how far AI has come in 2025.
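To give a flavour of the kind of check involved, here is a minimal sketch that assumes the step functions were AWS Step Functions defined in Amazon States Language. It is not the configuration I was working with: the state definitions and the EdgeCaseDetected error name are invented for illustration. In that language a state's Retry list is scanned in order, the first retrier whose ErrorEquals matches decides what happens, MaxAttempts defaults to 3, and a MaxAttempts of 0 means the error is never retried. The sketch only handles exact names and the States.ALL wildcard.

```python
def would_retry(state: dict, error_name: str) -> bool:
    """Return True if the state's Retry rules would retry `error_name`.

    Simplified: only exact names and the States.ALL wildcard are matched.
    """
    for retrier in state.get("Retry", []):
        names = retrier.get("ErrorEquals", [])
        if error_name in names or "States.ALL" in names:
            # The first matching retrier decides. MaxAttempts defaults to 3
            # and a value of 0 means "never retry this error".
            return retrier.get("MaxAttempts", 3) > 0
    return False


# Hypothetical task state with a catch-all retrier.
catch_all_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:example",
    "Retry": [
        {"ErrorEquals": ["States.ALL"], "IntervalSeconds": 2, "MaxAttempts": 2},
    ],
    "End": True,
}

# Same state, with the hypothetical edge case excluded from retries first.
clean_exit_state = {
    **catch_all_state,
    "Retry": [
        {"ErrorEquals": ["EdgeCaseDetected"], "MaxAttempts": 0},
        {"ErrorEquals": ["States.ALL"], "IntervalSeconds": 2, "MaxAttempts": 2},
    ],
}

print(would_retry(catch_all_state, "EdgeCaseDetected"))   # True
print(would_retry(clean_exit_state, "EdgeCaseDetected"))  # False
```

With only the catch-all retrier in place, an error raised to signal the edge case is retried like any other failure, which is the sort of surprise the analysis pointed out.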

My counterexample, demonstrating how LLMs are limited to their training data and what is publicly available on the Internet, relates to the capabilities and performance of Amazon Athena for processing structured data held in S3. As part of an evaluation of when it would be a better option than a custom-built service, the best that the LLMs could come back with was some high-level details that could be scraped from blogs.

Hallucinations happen

I have seen a few real-world examples of an AI system producing false information, but the one that particularly stood out to me was when I was prompting for further analysis of how Athena would be expected to perform if we put some further effort into how the source data was structured.

The agent came back with a detailed breakdown of how Athena would perform better if the source data was ordered by a particular attribute of interest, which was as expected, but then it went on to confidently state that the known source of the input data would already have the desired ordering in place. That was completely false, and it contradicted a detail that was explicitly called out in the official documentation of the system producing the source data.
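The first half of that answer is easy to sanity check. Columnar formats such as Parquet store minimum and maximum values for each chunk of a column, and a query engine can skip any chunk whose range cannot contain the value being filtered on. The toy simulation below is my own illustration of that idea, not a model of Athena itself, and the chunk size, the data and the function names are all made up.

```python
import random


def chunk_ranges(values, chunk_size):
    """Return the (min, max) pair for each consecutive chunk of values."""
    chunks = (values[i:i + chunk_size] for i in range(0, len(values), chunk_size))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def chunks_to_scan(values, chunk_size, target):
    """Count the chunks whose min/max range could contain the target value."""
    return sum(
        1 for low, high in chunk_ranges(values, chunk_size) if low <= target <= high
    )


random.seed(42)  # fixed seed so repeated runs behave the same way

ordered = list(range(10_000))
shuffled = ordered[:]
random.shuffle(shuffled)

sorted_scan = chunks_to_scan(ordered, 1_000, 4_242)
unsorted_scan = chunks_to_scan(shuffled, 1_000, 4_242)

print(f"chunks to read when sorted:   {sorted_scan}")  # 1
print(f"chunks to read when shuffled: {unsorted_scan}")
```

When the data is sorted, only one chunk has a range that covers the target. When it is shuffled, nearly every chunk spans most of the value range, so very little can be skipped. That is why the question of whether the source data really arrives in order mattered so much.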

The hallucination experience was my second epiphany – we cannot blindly trust AI systems even when they place emphasis on something as though it is an established fact.

Model choice matters

When I was dabbling with Codelassian and Rovo I was blissfully ignorant about the underlying models that were in use behind the scenes. There would be a drop-down or a mention when the system started up, but around that time I hadn’t seen much attention being paid to what those details meant.

Now that I have had some time on my hands (since the Atlassian restructuring that led to my role being made redundant) I have been listening to a few podcasts, reading blog posts, and following a few threads on Twitter / X that treat AI as more than a black box where magic happens. So, now I can appreciate when Sonnet would be more suitable than Haiku.