
New Delhi, Jan. 7 -- Over the past year, enterprise adoption of generative AI has shifted from small experiments with large language models to deploying multi-agent systems. Companies are moving beyond proofs of concept toward pilots and production, navigating leadership pressure, investment priorities, and regulatory concerns.
In a conversation with TechCircle, Sridhar Mantha, CEO of Generative AI Business Services at Happiest Minds Technologies, highlights the role of data quality, compliance, and practical trade-offs between model fine-tuning and retrieval-augmented approaches in determining which AI initiatives succeed.
Edited Excerpts:
From what you're seeing, how much of today's generative AI adoption is driven by clear strategy, and how much is driven by fear of missing out?
A year ago, the conversation surrounding this technology primarily focused on large language models. The prevailing view was that if you had a lot of text, you could use it through a chat interface. That was how most people thought about the space.
Over the past year, the focus has shifted to agents. Each agent is still built on an LLM, but with added capabilities, they are evolving into multi-agent systems. This represents a clear technological shift. From the client's perspective, two realities stand out. There is pressure from senior leadership to adopt the technology, and engineering leaders such as CTOs, CIOs, and VPs know adoption is inevitable. The uncertainty is about timing, investment priorities, pace, and which use cases to pursue. Questions remain about whether to start with proofs of concept, pilots, or something else.
A year ago, most efforts were centered on POCs. Organizations tried small experiments, often to signal activity to leadership, while also trying to understand the technology. This led to what is often called "POCism," where teams ran many POCs without moving forward, and fatigue set in.
Now, many organizations are seeking consulting support to define a practical roadmap, identify use cases that can deliver value today, and move into implementation. The emphasis has shifted from POCs to a pilot-to-production approach. In many cases, the goal is to move a use case into production within about three months. Early adopters and risk-takers are leading this approach.
Clear ROI benchmarks do not yet exist because few use cases have been running in production long enough across the industry. While ROI can be estimated, there are no established industry metrics. As a result, technology leaders are choosing to implement use cases they believe are valuable, deploy them to production, and begin realizing benefits rather than waiting for definitive data.
Why do most generative AI initiatives stall at the pilot stage-technology limits, data and governance issues, or leadership expectations?
The shift from a pre-agentic to a post-agentic phase has been abrupt, unfolding over months rather than years. In the early, pre-agentic phase, expectations around accuracy were high. Many users treated early models as systems that could reliably answer questions or generate content on demand, including against enterprise data. That assumption did not hold. Proofs of concept stalled when users expected fully accurate answers to queries such as sales forecasts, prior-year revenue, or other database-backed questions where only one correct number exists. In these cases, partial accuracy was not acceptable, and accuracy issues became a primary reason projects slowed or stopped.
Cost was the second factor. Early models were large and expensive, and API usage costs accumulated quickly. Model providers responded by releasing smaller models and routing workloads based on context to reduce expenses. At the same time, premium pricing remained for top-tier models, keeping cost a continuing concern.
The third factor was the pace of technological change. Use cases initially built around large language models and APIs were later better addressed using agent-based approaches, leading some teams to pause work and reassess.
Across the industry, these three factors-accuracy expectations, cost, and rapid technological change-contributed to the high rate of stalled proofs of concept reported in external studies. In our experience at Happiest Minds, projects did not stall when customers selected appropriate use cases and aligned expectations around accuracy. Use cases requiring near-perfect accuracy were often discouraged, while those with realistic requirements progressed into production.
How much does the success or failure of generative or agent-based AI actually depend on the quality and accessibility of enterprise data, rather than on the models themselves?
It comes down to garbage in, garbage out. Any agent, chatbot, or JNI-based solution depends on the enterprise knowledge repository it draws from. Many organizations still lack clean data governance, and the data itself contains gaps that require significant cleansing.
As a result, attention has shifted toward data governance and data quality, beyond the JNI discussion. Over the past six to twelve months, enterprises have begun investing more heavily in this area after realizing that stronger AI, both JNI-based and traditional, is inevitable. There is no alternative. Even if JNI use cases are not implemented immediately, organizations are choosing to focus in parallel on improving data governance and data quality. From an analytics center of excellence perspective, this has led to a growing number of discussions each year centered on these topics.
There is an ongoing debate between fine-tuning large models and using retrieval-augmented generation. In real enterprise deployments, where does each approach break down?
In our implementations, most of the work relied on RAG. If the goal is to constrain an LLM to answer only from a knowledge repository, RAG is sufficient. Fine-tuning the model, including its outer layers, did not produce better results than RAG. In some cases, we used fine-tuning together with RAG, but fine-tuning was mainly applied to control tone or style rather than to add knowledge. Across projects and discussions, the split between RAG and fine-tuning is roughly 80-20. Even style changes can usually be handled through prompt engineering as models have improved.
Cases that require either SLMs or fine-tuning are rare and typically involve very large datasets. This is not about tens of thousands of rows or a few hundred documents. In those ranges, RAG performs better. Interest in fine-tuning has declined as base models have improved.
On the SLM side, interest is growing because agent-based architectures allow multiple models to work together. Instead of relying on a single LLM with long prompts and RAG, different agents can generate, validate, and process tasks using different models. Some tasks still require large models, such as generating a full report. Other tasks, like web crawling or summarization, can be handled by smaller models. In financial reporting, for example, a large general model may generate the final report, while a smaller domain-specific model can select and summarize relevant articles.
This has driven interest in multiple models rather than one model handling everything. However, adoption of smaller language models remains limited by data availability. Most work relies on open-source or public datasets, and only a small number of companies can support this approach. In our discussions, education companies are a notable exception because they often have large volumes of content and data.
How seriously are enterprises addressing prompt injection and data leakage risks, and how is your company helping them with safeguards and solutions?
It has been more than six months since the prompt injection issues first surfaced. At that time, early large language models lacked sufficient guardrails. Since then, guardrails have been added at multiple levels. These include controls built into the models themselves and into the frameworks that use them.
Some platforms now allow guardrails to be defined at the enterprise level rather than per agent. These policies can apply across all applications and agents, even when they rely on different models. This approach is now becoming available in production systems. The level of security still depends on the framework and technology in use, and guardrails are applied accordingly.
In agent-based systems, a common pattern is to deploy a dedicated guardrail agent at the front. This agent operates with predefined rules and context and acts as a gatekeeper. It filters prompt injection attempts and potential data leakage before requests reach other agents or models.
Based on production use cases, this combination of frameworks and design patterns has reduced risk, and no materialized threats have been observed. There has also been no major public incident in recent months involving prompt injection on widely used platforms.
Do you see any upcoming regulations that could fundamentally change how enterprises use generative or agent AI?
It is difficult to make a precise assessment, but any such issue ultimately comes down to the tracking of user information, particularly personal data. The first area of concern is healthcare. One use case currently under development involves clinical settings where doctors or nurses speak with patients while simultaneously taking notes or filling out forms. A note-taking system that listens to the conversation and automatically enters information into clinical systems is technically feasible using large language models or agents. The immediate concern, however, is compliance with regulations such as HIPAA and the handling of patient data. Questions arise about whether that information is sent to language models and whether it could be retained or used as training data. Personal data in highly regulated industries is where scrutiny is most urgent to ensure no rules are being violated.
Published by HT Digital Content Services with permission from TechCircle.