How to wade through agent uncertainty, make the right decisions for your organisation, and get your agents production-ready
Written by Samuel Irvine Casey
Partner – AI, Mantel
The year of the agent
We’re well into the last quarter of the year, and it’s clear that 2025 has become the year of Agentic AI. It seems that nearly every company across the technology ecosystem is betting big on this technology and, as a result, the agentic landscape is moving at a blistering pace.
Proprietary model providers like Anthropic and OpenAI are battling with open-source providers (e.g. Meta and Qwen), pushing the envelope on what these LLMs are capable of almost every week. Cloud hyperscalers are fighting for control of the agentic workflows being created with the release of new agentic features within Azure AI Foundry, Amazon Bedrock and Google’s Vertex AI.
AI research teams are vying for architectural supremacy with the release of various agentic frameworks (Strands, ADK, Agno, CrewAI, SmolAgents, etc.) and protocols (MCP, A2A, etc.). Meanwhile, technology giants like Salesforce and SAP are building agents into their existing products to try and keep their captive markets. Finally, there has been an explosion of technology startups scrambling to plug the gaps left by the model providers and hyperscalers in the creation and deployment of AI agents, e.g. model selection (LiteLLM), observability (Langfuse, Arize Phoenix) and evaluation (Veris AI).
With all of this uncertainty, it is hard to predict which organisations and technologies will eventually triumph and who will fade into the background. However, in the face of this uncertainty, two things are clear:
- Organisations are getting decision-paralysis on what approach/technology they should bet on. Given all the different options in a rapidly developing space, they’re (rightfully) worried they will either make the wrong decision and be unable to course correct, or that a new technology will come out next week that makes their investment meaningless.
- If these organisations remain paralysed and don’t start releasing these agents they are tinkering with into production, then no value will be achieved. Questions will be asked about whether Agentic AI is just another dot on the AI Hype Curve or whether it is the transformational technology it promised to be.
Without a crystal ball, it’s hard to make firm predictions on who the eventual winners will be. However, I’ve had the pleasure of spending the last six months developing and deploying agents into production with a couple of innovative organisations, so here are some learnings on the challenges we faced taking agents from theory to production and, ultimately, what worked.
Agentic frameworks: what underpins the ecosystem?
The first thing I’m usually asked when talking about agents is: “What framework should I use?”
To be honest, I don’t think the framework itself makes a huge difference. An agent framework acts as a scaffold, or a set of building blocks, for quickly designing agents, and I have seen successful agentic deployments underpinned by LangGraph, Strands, ADK, CrewAI and Agno. Personally, I’m more likely to subscribe to the 12-Factor approach, as it gives you far more flexibility for your specific agents’ needs, but I appreciate that most organisations aren’t in the business of coding bespoke agents, so a framework does make things easier.
If you do decide to go down the agent framework route, I would suggest starting with your cloud vendor of choice (Strands for AWS, ADK for Google, etc.), as the hyperscalers are heavily investing in their own frameworks and have developed easy integrations with their other internal services and databases that you may not get as easily from third-party frameworks.
Another thing to think about is how you will log your agents’ actions, undertake evaluations and provide relevant guardrails. The more common the framework, the more likely it will be natively supported by the leading observability and evaluation platforms (e.g. Langfuse, Arize Phoenix, MLflow), which makes this process much easier.
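To make the idea concrete, here is a minimal sketch of the kind of step-level tracing these platforms provide out of the box. Everything here (the `traced` decorator, the `lookup_balance` tool) is hand-rolled for illustration, not the API of any particular observability product:

```python
import functools
import json
import time

TRACE_LOG = []  # in production, traces would ship to an observability platform


def traced(step_name):
    """Record each agent step's inputs, output and latency.

    An illustrative stand-in for what platforms like Langfuse or
    Arize Phoenix provide natively via their SDKs.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            TRACE_LOG.append({
                "step": step_name,
                "inputs": json.dumps({"args": args, "kwargs": kwargs}, default=str),
                "output": str(result),
                "latency_s": round(time.time() - start, 3),
            })
            return result
        return wrapper
    return decorator


@traced("lookup_balance")
def lookup_balance(account_id):
    # Hypothetical tool call; a real agent would hit a database or API here.
    return {"account_id": account_id, "balance": 125.50}


lookup_balance("ACC-123")
```

The same decorator can wrap every tool an agent exposes, giving you a replayable record of what the agent actually did on each run, which is exactly what you need later for evaluations and guardrails.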
To code or not to code
Another question I often get is: “Should I be building agents in code, or via no/low code interfaces?”
I think both of these solutions have their place, but it’s important to have a clear goal in mind when deciding how to build agents. If you are building a small number of agents to automate a complex process (or set of processes), and this involves multiple integrations with tools, databases, APIs, etc., then building agents in code and deploying them into an agent runtime like Google’s Agent Engine, AWS AgentCore or Azure AI Foundry is a good move. Alternatively, if you are envisaging building a whole suite of agents that each focus on smaller, more discrete tasks involving one or two integrations, then using a no- or low-code interface like Bedrock Agent Flow, Google’s AgentSpace, or open-source options like n8n could be a more scalable and easier way to build.
Both options can work, but building agents through code and deploying them into production is very hard. If you opt to go down that route, it is important you have the right technical skillset in your organisation to navigate these challenges as they arise. Given the integration layers involved in agentic systems, you ideally want a multi-skilled technology team spanning Machine Learning, Software Engineering, Cloud Infrastructure and DevOps. This means you have dedicated skills to help build the agents, develop API integrations for tool calls, deploy into cloud environments and then actually monitor these agents in production.
Make your team agentic from the beginning
It is critical to involve the business in these projects from the beginning. All of the successful production deployments I’ve been a part of in the past few months had fully cross-functional teams from the outset, including senior leadership, product, design, legal and compliance.
This meant everyone was on the same page during the build, had user experience in mind from day one, and was able to navigate any issues with legal or compliance as we encountered them, rather than as an afterthought.
Are you ready for true autonomy?
One of the biggest realisations we encountered when building various parts of these agentic systems is that we didn’t actually need to use an agent at all. We’d start off with an open-ended agent that had a goal and a couple of tools. As we progressed with development, we constrained the scope, realised we wanted it to do certain things in a certain order, and added compliance and policy guardrails. At a certain point, we realised that what we were refining was more akin to a workflow using LLMs and tools, rather than a truly agentic system.
By definition, agentic systems are ‘autonomous’ and work best for problems that have an open-ended decision space but an easily evaluated outcome. Use cases like AI coding or claims processing are good agentic use cases, as they have multiple pathways to a solution but an easily measurable output (i.e. does the code run? Was the claim approved?), whereas a simple onboarding flow that follows a logical step-by-step process is better suited to a deterministic or LLM-based workflow.
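The distinction can be sketched in a few lines. In this toy example (the `fake_llm` stub and the tool names are invented for illustration), the workflow runs a fixed sequence every time, while the agent lets the model choose its next action until it decides it is finished:

```python
# Toy contrast: a deterministic LLM workflow vs an agent loop.
# The "LLM" is a stub; real calls would go to a model API.

def fake_llm(prompt: str) -> str:
    # Pretend the model picks the next tool, or says it is finished.
    return "assess" if "unassessed" in prompt else "done"


# --- Deterministic workflow: fixed steps, fixed order, every run. ---
def verify_identity(user):
    return {**user, "verified": True}

def create_account(user):
    return {**user, "account": "ACC-001"}

def send_welcome_email(user):
    return {**user, "emailed": True}

def onboarding_workflow(user):
    for step in (verify_identity, create_account, send_welcome_email):
        user = step(user)
    return user


# --- Agent loop: the model chooses actions until it declares done. ---
def claims_agent(claim, tools, max_turns=5):
    for _ in range(max_turns):
        choice = fake_llm(f"Claim state: {claim}. Pick a tool or say done.")
        if choice == "done" or choice not in tools:
            break
        claim = tools[choice](claim)
    return claim


tools = {"assess": lambda c: c.replace("unassessed", "approved")}
```

If, during development, you find yourself pinning down the agent’s choices until they match the workflow shape above, that is a strong signal you wanted a workflow all along.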
“While I am a big advocate for AI, I would much rather use deterministic software or more traditional forms of Machine Learning if they will get the job done, rather than just looking to solve every problem with an Agent-sized hammer.”
Samuel Irvine Casey | Partner – AI, Mantel
Navigating the token economy
Another challenge we faced when building these agentic systems was navigating the nebulous agentic costing model. Agents, like all LLM-backed systems, are almost always charged based on usage. More specifically, costs are incurred based on the token input to, and output from, the LLM used.
What’s a token? Kind of like a word, but not really: models split text into sub-word chunks, so a typical English word is one to a few tokens. Hope that helps with the confusion.
This costing model often means that the ‘value’ the agent brings by completing a task does not align with the ‘cost’ in tokens. For example, you might have a process that was being outsourced to a third-party tech solution for $2.00 a process. If, on average, the token cost for an agentic system to automate that process is $0.20, then there would be a clear ROI.
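A rough sketch of that arithmetic, using hypothetical per-million-token prices (substitute your provider’s actual rates and your own measured token counts):

```python
# Back-of-the-envelope agent cost model. Prices are hypothetical
# placeholders, not any provider's real rates.

PRICE_PER_M_INPUT = 3.00    # USD per 1M input tokens (assumed)
PRICE_PER_M_OUTPUT = 15.00  # USD per 1M output tokens (assumed)


def cost_per_run(input_tokens, output_tokens):
    """Cost of one agent run, given its total token usage."""
    return (input_tokens / 1_000_000) * PRICE_PER_M_INPUT \
         + (output_tokens / 1_000_000) * PRICE_PER_M_OUTPUT


# e.g. an agent run consuming ~40k input tokens (prompts, tool results,
# memory) and producing ~5k output tokens:
run_cost = cost_per_run(40_000, 5_000)  # roughly 0.12 + 0.075 USD

# Compare against the $2.00 outsourced cost per process:
roi_multiple = 2.00 / run_cost
```

Note that input tokens usually dominate agent costs, because every tool result and every turn of conversation history gets fed back into the model on each step.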
However, a different scenario emerges if you want to build a smart assistant that helps your users find information and complete actions within your internet banking application. Forecasting this cost is far more complex, because a single power user could incur hundreds of dollars in costs through heavy use, whereas other users may not use it at all. While the power user clearly finds the tool valuable, would they find it valuable at $300 a week?
To address this issue, it is a very good idea to begin your agentic journey with a cost model and target ROI in mind. It’s also worth noting that the second scenario above may not be financially viable from a pure ROI perspective, but could still be valuable to the organisation as a loss leader, improving customer experience and acting as a differentiator in the market. All of these are valid considerations; the important thing is to weigh them up front.
In terms of other practical advice, I would recommend working backwards from your cost model. Don’t choose the beefiest LLM available if you know you only have a few cents per process to spend. Optimise your prompts, tool calls, database calls, and memory usage, as this will reduce the token input and output from your agentic systems, thereby further reducing costs.
Complexity vs performance
Agent optimisation is also an important consideration when it comes to balancing an agent’s complexity against its performance. In my experience, it is more effective to build several smaller agents, each focused on a specific skill, task or part of a process, than to build a super agent that attempts to solve every task but ends up struggling with all of them. There were times when we had to break individual agents into smaller sub-agents because they had become too complex to keep on task.
Be very clear upfront as to what the agent is expected to do, and only give it access to the tools, data and memory that it absolutely needs to do its job. This will help prevent the agent from being confused about which tool to call or what data to leverage, making them faster and more effective at completing your desired task.
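One simple way to enforce this is an explicit allow-list of tools per agent, so each agent is constructed with only its minimal toolset. The agent and tool names below are purely illustrative:

```python
# Sketch: scope each agent to the minimal tool set it needs, rather than
# handing every agent every tool. All names here are illustrative.

def search_policy_docs(query):
    return f"policy results for {query!r}"

def lookup_claim(claim_id):
    return {"id": claim_id, "status": "open"}

def issue_refund(claim_id):
    return f"refund issued for {claim_id}"


ALL_TOOLS = {
    "search_policy_docs": search_policy_docs,
    "lookup_claim": lookup_claim,
    "issue_refund": issue_refund,
}

# Each specialised agent sees only what it needs: the triage agent can
# read claims and policies but cannot move money; only the payments
# agent can issue refunds.
AGENT_TOOLSETS = {
    "triage_agent": {"search_policy_docs", "lookup_claim"},
    "payments_agent": {"lookup_claim", "issue_refund"},
}


def tools_for(agent_name):
    """Return only the tools this agent is allowed to call."""
    allowed = AGENT_TOOLSETS[agent_name]
    return {name: fn for name, fn in ALL_TOOLS.items() if name in allowed}
```

Beyond keeping agents on task, this doubles as a security control: an agent cannot misuse a tool it was never given.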
As with costing, it is important to consider the latency requirements of the agentic system prior to building, as this will heavily impact the model choice, task complexity and agent architecture required to achieve the desired latency goal.
Conclusion
The last six months have been a wild and exciting journey, diving into the world of agents and navigating challenges that arose while taking them into production. To quote a colleague from a leading Australian health insurance company, “the best test environment is production”, and I couldn’t agree more when it comes to testing agentic systems.
It’s impossible to plan for every scenario or permutation without putting an agent in a proper production environment and seeing how it reacts as you slowly expand the usage, user base or problem space. This is also a good reminder that Agents, like LLMs and AI, are probabilistic systems. Organisations need to be comfortable with a level of imperfection and uncertainty if they are going to seriously utilise AI agents in the near future.