Natural Language Processing
Multimodal AI
Systems Thinking

Samsung Bixby: Engineering Context-Aware Conversational AI

COMPANY Samsung R&D
ROLE Sr. Software Engineer
DURATION 24 months
LAUNCH Mar 2017

An intelligent, context-aware personal assistant powering Samsung's ecosystem of devices, from smartphones to televisions, giving users multiple ways to interact with their devices through voice, text, or touch.

My Impact

Designed Bixby's entity extraction framework achieving 95% intent accuracy at <100ms latency for 200M+ users across 300M+ devices

Background

Back in 2014, voice assistants like Siri, Google Now, and Alexa were command-driven and forgetful. If you asked 'What's the weather?' and then followed up with 'What about tomorrow?', they'd fail. Users grew frustrated when their assistant didn't really answer their questions: simple one-shot commands worked, but multi-turn and compound requests failed because these assistants lost context.

"What’s the time now?"

"Should I take an umbrella today?"

"Is it a holiday tomorrow?"

"Set a timer for 30 minutes"

"Remind me to stop at the grocery store to buy bread when I’m jogging"

"What are the top news stories today?"

Goal

Our goal at Samsung was to build a truly conversational assistant that remembered context across multiple turns and domains. That meant understanding users' needs and the technology's limitations, designing a technical framework around both, and implementing it as a fully functioning system.

The system should handle simple tasks, such as setting timers, as well as complex ones, like creating a photo album from pictures taken on a given day and sharing it with family and friends.

Role

As a software engineer on Samsung's Bixby team, I built the technical infrastructure to make that possible.

I defined and led the context-awareness framework that enabled the voice assistant to understand natural language and maintain conversation context across multiple domains such as news, weather, location, alarms, and more. The work involved collaborating with teams across design, strategy, and engineering.

Technical Problem Scoping

Conducted technical evaluations of user interaction patterns to inform NLU model design. Defined a framework for context covering interaction patterns, sentence construction, and discourse paths. Evaluated third-party content providers and defined technical integration requirements.

Framework Implementation

Designed and implemented the context-awareness framework connecting NLU, ASR, and domain modules across 8 languages. Built and maintained core NLP backend infrastructure supporting 10+ domains (news, weather, location, health, media, etc.).
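To make that concrete, here is a minimal sketch of how such a pipeline could be wired together. The class names (Utterance, DomainModule, Pipeline), the nlu.parse call, and the stub domain below are illustrative assumptions, not Bixby's actual internals:

```python
# Illustrative sketch only -- names and structure are hypothetical,
# not Bixby's actual internals.
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str          # output of ASR
    language: str      # one of the supported languages, e.g. "en-US"


class DomainModule:
    """Base class for a domain backend (news, weather, location, ...)."""
    name: str = "base"

    def handles(self, intent: str) -> bool:
        raise NotImplementedError

    def fulfill(self, intent: str, entities: dict) -> str:
        raise NotImplementedError


class Pipeline:
    def __init__(self, nlu, domains: list[DomainModule]):
        self.nlu = nlu          # intent classifier + entity extractor
        self.domains = domains  # registered domain backends

    def handle(self, utterance: Utterance) -> str:
        intent, entities = self.nlu.parse(utterance.text, utterance.language)
        for domain in self.domains:
            if domain.handles(intent):
                return domain.fulfill(intent, entities)
        return "Sorry, I can't help with that yet."


class StubNLU:
    """Toy stand-in for the real NLU models."""
    def parse(self, text, language):
        if "weather" in text.lower():
            return "weather.forecast", {"when": "today"}
        return "unknown", {}


class WeatherModule(DomainModule):
    name = "weather"

    def handles(self, intent):
        return intent.startswith("weather.")

    def fulfill(self, intent, entities):
        return f"Sunny {entities.get('when', 'today')} in your area."


pipeline = Pipeline(StubNLU(), [WeatherModule()])
print(pipeline.handle(Utterance("What's the weather?", "en-US")))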

Prototyping & Testing

Created an Android prototype and demo server integrating NLU/ASR models with REST APIs. Achieved 95% intent classification accuracy at <100ms latency serving 10M+ users during beta.
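For flavor, a demo server of this kind might expose roughly the following REST shape. The route, payload, and stub parser are hypothetical, not the actual Bixby API:

```python
# Hypothetical demo-server shape -- route, payload, and stub model
# are illustrative, not the actual Bixby API.
from flask import Flask, jsonify, request

app = Flask(__name__)


def parse_utterance(text: str, language: str):
    """Stub standing in for the real NLU model call."""
    if "timer" in text.lower():
        return "timer.create", {"duration": "PT30M"}
    return "unknown", {}


@app.route("/nlu/parse", methods=["POST"])
def parse():
    body = request.get_json()
    intent, entities = parse_utterance(body["utterance"],
                                       body.get("language", "en-US"))
    return jsonify({"intent": intent, "entities": entities})


if __name__ == "__main__":
    app.run(port=8080)
```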

Collaboration & Leadership

Partnered with stakeholders across engineering, design, strategy, product, and third-party (3P) vendors. Presented the concept vision to 800+ employees at an internal conference.

Connecting the dots: Entity Resolution & Intent Mapping

Let's look at how a machine understands you. Take the earlier example: "Remind me to stop at the grocery store to buy bread when I'm jogging."

There are several things at play here. A task that seems so simple to a user requires multiple applications to work together: it invokes the native Clock, Maps, and Samsung Health applications.

While Clock is a system application, Maps and Samsung Health are cloud-based applications that make calls to the backend. The backend has to fetch responses from multiple applications, compose them into a single answer, and speak it in a language the user understands. All of this needs to happen in a very short window of time.

When all these work seamlessly, it seems magical!

Once resolved, all these entities get structured into a machine-readable format that the system can act on: creating the reminder, linking location services, monitoring activity sensors, and setting up geofence triggers.
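Since the original diagram isn't reproduced here, this is a rough reconstruction of what that structured frame could look like. Every field name below is illustrative, not Bixby's actual schema:

```python
# Roughly what a machine-readable frame for this utterance could look like --
# all field names and values are illustrative, not Bixby's actual schema.
reminder_frame = {
    "intent": "reminder.create",
    "entities": {
        "task": "buy bread",
        "place": {"type": "grocery_store", "resolver": "maps"},      # Maps lookup
        "trigger": {
            "activity": "jogging",                                    # Samsung Health
            "geofence": {"around": "grocery_store", "radius_m": 200}, # assumed radius
        },
    },
    "actions": [
        "create_reminder",        # Clock / Reminders
        "link_location",          # Maps
        "monitor_activity",       # Samsung Health sensors
        "arm_geofence_trigger",
    ],
}
```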

Why is it hard? Context matters. Period.

Several factors come into play when we talk about conversational intelligent assistants. One is location: whether you're in Seoul, San Francisco, Bangalore, or London changes whether a response is relevant to you.

Other context-awareness factors include timezones, languages, and locales, which affect local information such as news, weather, and restaurant reservations.

Multi-domain context switching

Users don't say "Hey Bixby, using the Weather domain, what's the temperature?" They switch between domains mid-conversation. We had to build a context manager that tracked conversation history and routed queries to the right domain.
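A minimal sketch of the idea, using the earlier weather follow-up. The ContextManager class, intent names, and slot-merging rule are all assumptions for illustration:

```python
# Minimal context-manager sketch -- class, intent, and slot names
# are hypothetical, not Bixby's actual design.
class ContextManager:
    def __init__(self):
        self.history = []  # (domain, intent, entities) per turn

    def resolve(self, intent, entities):
        """Fill missing slots from the previous turn in the same discourse."""
        if self.history and intent == "followup":    # e.g. "What about tomorrow?"
            _, prev_intent, prev_entities = self.history[-1]
            intent = prev_intent                      # inherit the prior intent
            entities = {**prev_entities, **entities}  # new slots override old
        domain = intent.split(".")[0]                 # "weather.forecast" -> "weather"
        self.history.append((domain, intent, entities))
        return domain, intent, entities


ctx = ContextManager()
ctx.resolve("weather.forecast", {"when": "today", "where": "Seoul"})
# The follow-up "What about tomorrow?" only carries {"when": "tomorrow"}:
domain, intent, entities = ctx.resolve("followup", {"when": "tomorrow"})
assert (domain, intent) == ("weather", "weather.forecast")
assert entities == {"when": "tomorrow", "where": "Seoul"}
```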

Content provider orchestration

A single query like "Wake me up at 6am when I'm jogging" requires coordinating Clock (system), Maps (cloud), and Health (cloud) services. We built an orchestration layer that parallelized API calls and handled failures gracefully.
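Here is a toy version of that fan-out pattern, using asyncio.gather with return_exceptions so one slow or failing provider degrades the answer instead of sinking it. Service names and payloads are made up:

```python
# Sketch of parallel fan-out with graceful degradation -- service names
# and payloads are illustrative, not the actual provider APIs.
import asyncio


async def call_clock():  return {"alarm": "06:00"}        # system app (local)
async def call_maps():   return {"route": "home->park"}   # cloud service
async def call_health(): raise TimeoutError("Health API slow")  # simulated failure


async def orchestrate():
    results = await asyncio.gather(
        call_clock(), call_maps(), call_health(),
        return_exceptions=True,  # one failed provider must not sink the response
    )
    responses = {}
    for name, result in zip(("clock", "maps", "health"), results):
        # Degrade gracefully, e.g. fall back to a plain time-based alarm.
        responses[name] = None if isinstance(result, Exception) else result
    return responses


print(asyncio.run(orchestrate()))
```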

Sub-100ms latency requirement

Voice interactions feel broken above 100ms. We optimized our NLU pipeline to classify intents and extract entities in under 100ms while maintaining 95% accuracy.

Results

The product was unveiled in March 2017 at the Samsung Galaxy S8 Unpacked event and officially rolled out worldwide in July 2017. The context-awareness framework I helped build became foundational to Bixby's success:

200M+

Bixby users (as of Nov 2022)

300M+

Bixby enabled devices

95%

intent classification accuracy serving 10M+ users

<100ms

latency

Media Coverage

Samsung News

"MOUNTAIN VIEW, Calif., – July 18, 2017 – Samsung Electronics America, Inc. today announced that the voice-based feature of Bixby will be available starting today, for Galaxy S8 and S8+ owners in the U.S."

Samsung News

"Samsung has a conceptually new philosophy to the problem: instead of humans learning how the machine interacts with the world (a reflection of the abilities of designers), it is the machine that needs to learn and adapt to us...Bixby will be a new intelligent interface on our devices."

What I learned

Building Bixby's context-awareness system gave me a deep understanding of conversational AI's technical constraints—latency budgets, model accuracy trade-offs, multi-domain orchestration.

When I transitioned to product design, this engineering background became my superpower. I design conversations knowing exactly what's technically feasible, how much latency each interaction adds, and where ML models will struggle. This makes me a better designer for AI products.
