
Natural language processing for multilingual customer support
A global telecommunications provider operating across Europe, Africa, and the Middle East needed to unify their customer support experience. With support teams spread across 23 languages and wildly inconsistent response quality, they came to us with a clear mandate: build an AI-powered system that understands customer intent regardless of language.
The challenge
The existing support infrastructure was a patchwork. Each regional team had its own ticketing system, its own taxonomy of issue categories, and its own escalation rules. A French-speaking customer in Belgium and an Arabic-speaking customer in Morocco might have the exact same billing issue, but their tickets would follow completely different paths.
The numbers were stark:
- 23 languages across 14 regional support centers
- 340+ issue categories (with significant overlap and inconsistency)
- Average first-response time: 4.7 hours
- Customer satisfaction score: 3.1/5
The customer doesn't care what language they speak — they care how fast their problem gets solved. Our AI had to be equally indifferent to language.
Architecture
We designed a three-stage pipeline: classify, route, assist.
Stage 1: Multilingual intent classification
We fine-tuned a multilingual transformer model on the company's historical ticket data. The key decision was to use a language-agnostic embedding space rather than per-language models. This meant a single model could handle all 23 languages, including code-switching (common in African markets where customers mix French, Arabic, and local languages in a single message).
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=87,  # unified taxonomy
    problem_type="single_label_classification",
)

def classify_ticket(text: str) -> dict:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    probs = outputs.logits.softmax(dim=-1)
    top_intent = probs.argmax().item()
    confidence = probs.max().item()
    # LABEL_MAP maps class indices to intent names in the unified taxonomy
    return {"intent": LABEL_MAP[top_intent], "confidence": confidence}
We consolidated the 340+ regional categories into a unified taxonomy of 87 intents, mapped by domain experts over a two-week workshop. This alone was transformative — it meant that for the first time, the company could compare support metrics across regions.
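Mechanically, the consolidation boils down to a lookup from each region's legacy label to a unified intent. A minimal sketch of that mapping follows; the region codes and category names here are illustrative placeholders, not the company's actual labels:

```python
# Illustrative mapping from (region, legacy category) to a unified intent.
# The real table was built by domain experts and covered 340+ legacy labels.
REGIONAL_TO_UNIFIED = {
    ("fr-BE", "facturation/erreur"): "billing_dispute",
    ("ar-MA", "fawatir"): "billing_dispute",
    ("en-ZA", "billing query"): "billing_dispute",
}

def unify(region: str, legacy_category: str) -> str:
    # Unmapped legacy labels fall back to a catch-all bucket for later review
    return REGIONAL_TO_UNIFIED.get((region, legacy_category), "uncategorized")
```

Because many regional categories collapse into one unified intent (as with the three billing labels above), metrics computed on the unified taxonomy become comparable across regions.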
Stage 2: Intelligent routing
Once the intent is classified, the ticket is routed to the best available agent based on three factors: language proficiency, domain expertise, and current workload. We scored each candidate agent with a weighted combination of the three:
def route_ticket(ticket: Ticket, agents: list[Agent]) -> Agent:
    scored_agents = []
    for agent in agents:
        # Hard requirement in practice: no partial credit for a near-miss language
        lang_score = 1.0 if ticket.language in agent.languages else 0.0
        domain_score = agent.expertise_scores.get(ticket.intent_category, 0.0)
        # Penalize agents approaching capacity so load spreads across the team
        load_score = 1.0 - (agent.active_tickets / agent.max_capacity)
        total = (0.4 * lang_score) + (0.35 * domain_score) + (0.25 * load_score)
        scored_agents.append((agent, total))
    return max(scored_agents, key=lambda x: x[1])[0]
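To make the scoring concrete, here is a self-contained sketch of the routing logic with minimal Ticket and Agent dataclasses. The field names mirror the attributes the function above relies on, but the dataclass shapes and agent names are illustrative, not the production schema:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    languages: set[str]
    expertise_scores: dict[str, float]  # intent category -> proficiency in [0, 1]
    active_tickets: int
    max_capacity: int

@dataclass
class Ticket:
    language: str
    intent_category: str

def route_ticket(ticket: Ticket, agents: list[Agent]) -> Agent:
    # Same weighted scoring as above: language 0.4, domain 0.35, load 0.25
    def score(agent: Agent) -> float:
        lang = 1.0 if ticket.language in agent.languages else 0.0
        domain = agent.expertise_scores.get(ticket.intent_category, 0.0)
        load = 1.0 - agent.active_tickets / agent.max_capacity
        return 0.4 * lang + 0.35 * domain + 0.25 * load
    return max(agents, key=score)

agents = [
    Agent("amina", {"fr", "ar"}, {"billing": 0.9}, active_tickets=3, max_capacity=10),
    Agent("lars", {"en"}, {"billing": 0.6}, active_tickets=1, max_capacity=10),
]
best = route_ticket(Ticket(language="ar", intent_category="billing"), agents)
```

Here "amina" wins (0.4 + 0.315 + 0.175 = 0.89) over "lars" (0.0 + 0.21 + 0.225 = 0.435): the language match dominates, which is exactly what the 0.4 weight is for.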
Stage 3: Agent assist
For each routed ticket, the system generates a suggested response based on similar resolved tickets. The agent can accept, modify, or ignore the suggestion. This dramatically reduced response time for common issues while keeping a human in the loop for nuanced cases.
Training data challenges
The biggest technical hurdle wasn't the model — it was the data. Historical tickets were labeled inconsistently, some languages had far fewer examples than others, and customer messages ranged from one-word complaints to multi-paragraph narratives.
We addressed the imbalance with a combination of:
- Translation augmentation — Using high-quality machine translation to create synthetic examples for underrepresented languages
- Active learning — Deploying the model in shadow mode for 4 weeks, sampling the most uncertain predictions for human review
- Cross-lingual transfer — Leveraging XLM-R's shared embedding space to transfer knowledge from high-resource languages (English, French, Arabic) to low-resource ones (Swahili, Wolof, Malagasy)
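The active-learning step amounts to plain uncertainty sampling: run the model in shadow mode, keep the predictions it is least sure about, and send those to human reviewers. A minimal sketch, where the prediction dicts mirror the classify_ticket output and the budget is an illustrative parameter:

```python
def select_for_review(predictions: list[dict], budget: int = 100) -> list[dict]:
    """Pick the `budget` predictions with the lowest confidence.

    Each prediction is a dict like
    {"ticket_id": ..., "intent": ..., "confidence": ...},
    mirroring the classify_ticket output.
    """
    # Lowest-confidence first: these are the examples where a human
    # label teaches the model the most per annotation dollar
    return sorted(predictions, key=lambda p: p["confidence"])[:budget]
```

Over the 4-week shadow period, labels gathered this way concentrate annotation effort on exactly the intents and languages where the model is weakest.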
Results
After a 3-month phased rollout across all 14 regional centers:
- Intent classification accuracy: 91.3% across all languages (up from manual categorization accuracy of ~72%)
- Average first-response time: dropped from 4.7 hours to 2.8 hours (40% reduction)
- Customer satisfaction: improved from 3.1 to 4.0/5
- Agent productivity: 23% more tickets resolved per agent per day
The system now processes over 180,000 tickets per month across all languages, with the model retrained weekly on newly labeled data to adapt to emerging issue types and seasonal patterns.


