New Delhi, Dec. 3 -- Adoption of voice AI in India's financial sector is growing, but regulatory and technical hurdles limit its use. In a conversation with TechCircle, Kanika Jain, Co-Founder of Squadstack.ai, explained how the company is developing AI systems for sales workflows, the challenges banks face in compliance and data security, and how human oversight remains essential for complex interactions.

Edited Excerpts:

Many financial institutions are now considering Voice AI as a core part of customer engagement, not just a support tool. From your perspective, what technical, regulatory, or behavioral barriers still limit large-scale adoption in the BFSI sector?

There are several barriers, though not all of them are technical. A major one is compliance. The industry is still defining standards, and there is no widely accepted framework for AI compliance, security controls, or fallback systems. As a result, requirements vary across organizations.

On our side, we follow ISO standards, hold a SOC 2 certification, and meet DPDP requirements. We also work with banks where we have cleared InfoSec audits. In many cases, on-premise deployment is mandatory because any data leaving internal systems is considered a risk.

Another barrier involves model choice. Even if some models don't store customer data, many institutions remain uneasy about using models hosted outside India. This limits them to locally hosted models or pushes some banks to build their own large language models. These in-house systems, however, are not as strong or effective as widely available open-source or commercial models.

Banks that are further along in innovation are more willing to adopt AI because they've learned how to manage compliance within current technology constraints. Others are still catching up, but the gap is closing. Once a few early adopters show that security risks are contained, more banks are likely to follow.

Given India's strict compliance rules, multilingual needs, and data-security demands, what factors in BFSI operations shape how voice-AI systems are used and designed?

Compliance becomes far more complex when companies move into core banking functions. Verification, transaction handling, and customer support involve multiple internal systems, security checks, and deeper access to customer activity.

Our work sits elsewhere. We focus on sales use cases: credit card sign-ups, savings account openings, cross-sell, and lending. These require minimal customer data.

As a result, our compliance approach is narrower. We don't store PII. We don't retain phone numbers; they pass through the system once and are not saved. No one at the company can access a lead's phone number, and we follow a zero-retention policy for banking clients.
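A zero-retention pass-through like the one described can be sketched as follows. This is a minimal illustration, not Squadstack's actual implementation: the `Dialer` stub and the hash-based audit token are assumptions standing in for a real telephony client and whatever dedup scheme such a system would use.

```python
import hashlib

class Dialer:
    """Stub standing in for a real telephony client (hypothetical)."""
    def dial(self, number: str) -> str:
        return "call-001"  # a real client would initiate the call here

def place_call(phone_number: str, dialer: Dialer) -> str:
    # The raw number is used once, in memory, and never written to
    # storage or logs; only a one-way hash survives, so no one at the
    # company can recover a lead's number afterwards.
    dialer.dial(phone_number)
    return hashlib.sha256(phone_number.encode()).hexdigest()[:16]
```

The key design point is that the only artifact the system keeps is irreversible: it can confirm "we already called this lead" without ever storing the number itself.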

This makes our compliance needs very different from those building intensive support systems or automating core banking tasks. Those areas demand broader regulatory frameworks, stronger oversight, and reworked machine-learning operations because they deal directly with sensitive customer data. Our current use cases avoid that level of access.

When developing conversational agents, what are the biggest unresolved technical challenges, whether in latency, context handling, emotional understanding, or something else?

The limitations depend on the use case. Tasks like loan disbursals, follow-ups for incomplete applications, credit card sales, or opening savings accounts generally work well with AI agents. These calls are short, usually under five minutes, and customers have already begun a digital process on a website or app. Much of their information is already available through APIs, so the AI mainly confirms details, answers basic questions, and handles simple objections. With guardrails, human-in-the-loop checks, and flagging systems, most errors are caught early.

The challenges appear in areas like insurance. These calls are longer and more relationship-driven. AI agents struggle to retain context over extended conversations; performance drops after about five minutes. This leads to hallucinations, repetitive questions, and stalled progress.

Voice limitations add to the problem. While early experiments in voice modulation and emotion detection show progress, real-time calls still suffer from latency and uneven tone control. When the agent needs to respond to emotion, such as shifting to an apologetic tone with an irate customer, the interaction becomes less natural.

As a result, AI systems fall short in high-touch sales conversations, especially those that require detailed personal discussion and sustained context, such as insurance.

What lessons about human behavior have most changed the way you design AI systems?

We've seen a few consistent patterns. The basics matter: the AI must understand the question, answer it correctly, stay concise, and avoid repetition or hallucinations. Callers have little patience for failure on these fundamentals.

We've also found that voice quality shapes engagement. Off-the-shelf synthetic voices often sound robotic and out of place in a live conversation. When callers sense that, they check out quickly. But when we match voices to a caller's region or language, such as a light southern accent or smooth switching between Tamil and English, calls tend to run longer and produce better outcomes.

To improve this, we now use data from real human interactions to build our own voice models across different tones and regions, then match those voices to lead metadata. This helped reduce abrupt disconnections to levels similar to our human call center and increased average handle time. Callers stay on the line because the voice sounds natural and easier to talk to, which builds a degree of trust that robotic voices fail to create.
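The voice-to-metadata matching described above amounts to a lookup from lead attributes to a voice model. The sketch below is purely illustrative: the catalogue keys, voice names, and metadata fields are invented for the example, not taken from Squadstack's system.

```python
# Hypothetical voice catalogue keyed by (language, region) lead metadata.
VOICES = {
    ("ta", "south"): "tamil_english_blend_v2",
    ("hi", "north"): "hindi_english_blend_v1",
}
DEFAULT_VOICE = "neutral_english_v1"

def pick_voice(lead: dict) -> str:
    """Match a synthetic voice to a lead's language/region metadata,
    falling back to a neutral voice when no regional match exists."""
    key = (lead.get("language"), lead.get("region"))
    return VOICES.get(key, DEFAULT_VOICE)
```

In practice the interesting work is upstream (training the regional voice models on real human calls); the routing itself stays simple so it can run before the first ring.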

How do you define autonomy for AI sales agents and determine when a case shifts from automation to human support?

The level of autonomy we use depends on the specific task and its risk. In some sectors, such as delivery logistics, calls are fully automated because the scope is narrow: confirming whether someone is available at a certain time and place, or collecting basic feedback.

In BFSI, autonomy varies. Human handoff points are defined more tightly. Anything involving sales or loan-disbursal steps uses detailed bot instructions with minimal room for deviation. The bot relies only on the configured knowledge base, approved tool calls, system metadata, and predefined scenarios.

If the system detects repeated bot responses, the sharing of possible PII, or similar issues, the bot triggers a handoff to a human supervisor. Even when no handoff occurs, every call goes through human-in-the-loop auditing. Audit levels start at nearly 100 percent and taper down to a small sample, supported by flags and automated checks based on call transcripts.

Handoffs can also occur based on customer keywords, bot behavior, or the defined purpose of the call. For example, a bot may qualify a lead by asking a set of required questions, after which any sales discussion or additional product details are automatically transferred to a human agent.

How has your company's long-term vision changed? You've moved from augmentation to full autonomy. What strategic bets are you making now that you wouldn't have made two or three years ago?

Even before the recent wave of AI tools, when sales calls were handled only by humans, our focus was on building systems that could reliably move prospects through the funnel.

First, we needed complete and accurate CRM data. Second, we needed strong call connectivity, because better connectivity meant better conversion rates. Third, we had to eliminate drop-offs. In many call centers, agents receive leads but don't follow the prescribed cadence, and a large share of those leads are never contacted again. Companies lose revenue simply because follow-ups don't happen.

Our first version of the product addressed these gaps. The next version introduced early forms of AI assistance-rule-based systems that suggested next steps and guided agents on what to say, long before today's agentic AI and large language models.

Now, with rapid advances in voice AI, the opportunity is different. Marketing already personalizes ads by audience segment, channel, and message, but sales calls typically rely on one standard script because training hundreds of agents on personalized variations isn't feasible. As a result, every customer hears more or less the same pitch, regardless of their profile or priorities.

AI changes that. Voice systems can tailor conversations the way marketing teams tailor campaigns: by persona, by need, and with constant A/B testing of what works.
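Persona-keyed pitch variants with per-call A/B assignment might look like the sketch below. The personas, pitch names, and weighting scheme are hypothetical; a real system would learn which variant converts from call outcomes rather than hard-coding the options.

```python
import random

# Hypothetical pitch variants keyed by persona (illustrative only).
PITCHES = {
    "young_professional": ["cashback-led pitch", "travel-rewards pitch"],
    "small_business": ["working-capital pitch", "low-fee pitch"],
}

def assign_pitch(persona: str, rng: random.Random) -> str:
    """Pick one variant per call so conversion outcomes can be
    compared A/B across the same persona segment."""
    variants = PITCHES.get(persona, ["standard pitch"])
    return rng.choice(variants)
```

Because every caller in a segment is randomly assigned a variant, conversion rates become directly comparable, which is the testing loop the text describes.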

The goal remains the same: improving the sales funnel and helping companies capture value they're currently leaving behind.

Published by HT Digital Content Services with permission from TechCircle.