
Natural language processing for multilingual customer support
A global telecommunications provider operating across Europe, Africa, and the Middle East needed to unify their customer support experience. With support teams spread across 23 languages and wildly inconsistent response quality, they came to us with a clear mandate: build an AI-powered system that understands customer intent regardless of language.
The challenge
The existing support infrastructure was a patchwork. Each regional team had its own ticketing system, its own taxonomy of issue categories, and its own escalation rules. A French-speaking customer in Belgium and an Arabic-speaking customer in Morocco might have the exact same billing issue, but their tickets would follow completely different paths.
The numbers were stark:
- 23 languages across 14 regional support centers
- 340+ issue categories (with significant overlap and inconsistency)
- Average first-response time: 4.7 hours
- Customer satisfaction score: 3.1/5
The customer doesn't care what language they speak — they care how fast their problem gets solved. Our AI had to be equally indifferent to language.
Architecture
We designed a three-stage pipeline: classify, route, assist.
Stage 1: Multilingual intent classification
We fine-tuned a multilingual transformer model on the company's historical ticket data. The key decision was to use a language-agnostic embedding space rather than per-language models. This meant a single model could handle all 23 languages, including code-switching (common in African markets where customers mix French, Arabic, and local languages in a single message).
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=87,  # unified taxonomy
    problem_type="single_label_classification",
)

def classify_ticket(text: str) -> dict:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    probs = outputs.logits.softmax(dim=-1)
    top_intent = probs.argmax().item()
    confidence = probs.max().item()
    # LABEL_MAP maps class indices to intent names in the unified taxonomy
    return {"intent": LABEL_MAP[top_intent], "confidence": confidence}
We consolidated the 340+ regional categories into a unified taxonomy of 87 intents, mapped by domain experts over a two-week workshop. This alone was transformative — it meant that for the first time, the company could compare support metrics across regions.
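Mechanically, the consolidation boils down to a lookup from each region's legacy label to a unified intent. A minimal sketch of that mapping follows; the region codes and category names here are illustrative placeholders, not the company's actual labels:

```python
# Illustrative mapping from (region, legacy category) to a unified intent.
# The real table was built by domain experts and covered 340+ legacy labels.
REGIONAL_TO_UNIFIED = {
    ("fr-BE", "facturation/erreur"): "billing_dispute",
    ("ar-MA", "fawatir"): "billing_dispute",
    ("en-ZA", "billing query"): "billing_dispute",
}

def unify(region: str, legacy_category: str) -> str:
    # Unmapped legacy labels fall back to a catch-all bucket for later review
    return REGIONAL_TO_UNIFIED.get((region, legacy_category), "uncategorized")
```

Because many regional categories collapse into one unified intent (as with the three billing labels above), metrics computed on the unified taxonomy become comparable across regions.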
Stage 2: Intelligent routing
Once the intent is classified, the ticket is routed to the best available agent based on three factors: language proficiency, domain expertise, and current workload. We scored each candidate agent with a weighted combination of the three:
def route_ticket(ticket: Ticket, agents: list[Agent]) -> Agent:
    scored_agents = []
    for agent in agents:
        # Hard requirement in practice: no partial credit for a near-miss language
        lang_score = 1.0 if ticket.language in agent.languages else 0.0
        domain_score = agent.expertise_scores.get(ticket.intent_category, 0.0)
        # Penalize agents approaching capacity so load spreads across the team
        load_score = 1.0 - (agent.active_tickets / agent.max_capacity)
        total = (0.4 * lang_score) + (0.35 * domain_score) + (0.25 * load_score)
        scored_agents.append((agent, total))
    return max(scored_agents, key=lambda x: x[1])[0]
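To make the scoring concrete, here is a self-contained sketch of the routing logic with minimal Ticket and Agent dataclasses. The field names mirror the attributes the function above relies on, but the dataclass shapes and agent names are illustrative, not the production schema:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    languages: set[str]
    expertise_scores: dict[str, float]  # intent category -> proficiency in [0, 1]
    active_tickets: int
    max_capacity: int

@dataclass
class Ticket:
    language: str
    intent_category: str

def route_ticket(ticket: Ticket, agents: list[Agent]) -> Agent:
    # Same weighted scoring as above: language 0.4, domain 0.35, load 0.25
    def score(agent: Agent) -> float:
        lang = 1.0 if ticket.language in agent.languages else 0.0
        domain = agent.expertise_scores.get(ticket.intent_category, 0.0)
        load = 1.0 - agent.active_tickets / agent.max_capacity
        return 0.4 * lang + 0.35 * domain + 0.25 * load
    return max(agents, key=score)

agents = [
    Agent("amina", {"fr", "ar"}, {"billing": 0.9}, active_tickets=3, max_capacity=10),
    Agent("lars", {"en"}, {"billing": 0.6}, active_tickets=1, max_capacity=10),
]
best = route_ticket(Ticket(language="ar", intent_category="billing"), agents)
```

Here "amina" wins (0.4 + 0.315 + 0.175 = 0.89) over "lars" (0.0 + 0.21 + 0.225 = 0.435): the language match dominates, which is exactly what the 0.4 weight is for.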
Stage 3: Agent assist
For each routed ticket, the system generates a suggested response based on similar resolved tickets. The agent can accept, modify, or ignore the suggestion. This dramatically reduced response time for common issues while keeping a human in the loop for nuanced cases.
Training data challenges
The biggest technical hurdle wasn't the model — it was the data. Historical tickets were labeled inconsistently, some languages had far fewer examples than others, and customer messages ranged from one-word complaints to multi-paragraph narratives.
We addressed the imbalance with a combination of:
- Translation augmentation — Using high-quality machine translation to create synthetic examples for underrepresented languages
- Active learning — Deploying the model in shadow mode for 4 weeks, sampling the most uncertain predictions for human review
- Cross-lingual transfer — Leveraging XLM-R's shared embedding space to transfer knowledge from high-resource languages (English, French, Arabic) to low-resource ones (Swahili, Wolof, Malagasy)
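The active-learning step amounts to plain uncertainty sampling: run the model in shadow mode, keep the predictions it is least sure about, and send those to human reviewers. A minimal sketch, where the prediction dicts mirror the classify_ticket output and the budget is an illustrative parameter:

```python
def select_for_review(predictions: list[dict], budget: int = 100) -> list[dict]:
    """Pick the `budget` predictions with the lowest confidence.

    Each prediction is a dict like
    {"ticket_id": ..., "intent": ..., "confidence": ...},
    mirroring the classify_ticket output.
    """
    # Lowest-confidence first: these are the examples where a human
    # label teaches the model the most per annotation dollar
    return sorted(predictions, key=lambda p: p["confidence"])[:budget]
```

Over the 4-week shadow period, labels gathered this way concentrate annotation effort on exactly the intents and languages where the model is weakest.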
Results
After a 3-month phased rollout across all 14 regional centers:
- Intent classification accuracy: 91.3% across all languages (up from manual categorization accuracy of ~72%)
- Average first-response time: dropped from 4.7 hours to 2.8 hours (40% reduction)
- Customer satisfaction: improved from 3.1 to 4.0/5
- Agent productivity: 23% more tickets resolved per agent per day
The system now processes over 180,000 tickets per month across all languages, with the model retrained weekly on newly labeled data to adapt to emerging issue types and seasonal patterns.


