Conversational AI platform overview and comparison: Alan AI vs Azure Speech


Conversational AI (artificial intelligence) involves the study of techniques for software agents that can engage in natural conversational interactions with humans. 

Conversational AI allows inquiries to be handled by technology, in the form of a chatbot or voice assistant. These use artificial intelligence to replicate the kind of interaction that users would expect from a helpful and well-informed human being. [1] 

Machine Learning algorithms learn by example and can be continuously taught and re-trained over time, allowing them to adapt and improve with experience.

Natural language processing (NLP) is an area of artificial intelligence concerned with the automated analysis and generation of human language. [2]

Conversational AI combines machine learning with natural language processing. This enables it to process and understand what a user is writing or saying, then generate appropriate responses in a natural way. Over time, the system will automatically refine its responses and adapt to changing circumstances.

This article reviews two Conversational AI platforms: Alan AI and Azure Speech. The two tools are compared on the features they provide, how their processes work, documentation, pricing, and integration with different programming languages and solutions. Finally, a short guide shows how to implement each solution in a ReactJs application.

ALAN AI – An Overview

Alan is a Voice AI platform that allows you to add a voice interface to apps. 

Alan offers a conversational AI platform to build voice assistants. The platform comprises a set of tools and components to design, embed, and host voice interfaces for apps:

  • Alan Studio: A web portal for developers to create voice scripts for their apps.
  • Alan SDKs: The Client SDKs allow developers to embed Alan’s voice in apps through the Alan button. This button lets users communicate with the app via voice and execute commands from voice scripts.
  • Alan Cloud: The solution architecture is serverless: voice scripts are run and managed on VMs in the cloud, where all voice processing takes place. Alan trains on intents using the terminology of the app and learns to understand the logic of the domain. If necessary, the data can be migrated to a private cloud.

 

How Alan AI works

Alan acts as an AI backend. It lets the app ‘understand’ human language and provides the ability to interact with the app through voice. To build a voice interface with Alan, you need to complete the following tasks:

  • Design a dialog for your voice assistant in Alan Studio: The first step is to write a dialog—a voice script—in Alan Studio. The voice script describes the expected conversation scenario with the users of the application. It covers all topics, questions, and phrases users can ask or say while interacting with the voice assistant. Voice scripts are written in JavaScript, which provides unlimited flexibility for dialog building.


  • Integrate Alan with your app: Alan integrates with client apps through Alan’s SDKs.

Documentation

Alan AI has a comprehensive documentation website. It provides a guide on how to use Alan Studio to create the voice scripts, examples of voice scripts, samples on GitHub with different existing SDKs integrations, and video resources.

Integration Options

  • Web: React, Angular, Ember, JavaScript, Vue, Electron
  • iOS
  • Android
  • Ionic
  • Apache Cordova
  • Flutter
  • React Native
  • Microsoft Power Apps

Pricing

Alan Pricing Packages

The enterprise price varies depending on the customer’s needs, such as: 

  • Number of projected users
  • Projected go-live date
  • Language(s) to be supported
  • Project and use case description
  • Whether the voice experience will be developed internally or built out by the Alan team

There is no limit on the number of users interacting with the voice assistant in your app. On the Free and Pay as You Go pricing plans, the number of voice interactions is limited only by the interactions available on the billing account.

ReactJs Implementation

1. Go to Alan Studio.

2. Sign up with a Google or GitHub account, or with your email address.

3. In Alan Studio, click ‘Create Voice Assistant.’ Select to create an empty project and give it any name you want.

4. Create React App:

npx create-react-app my-app

5. Install the Alan Web component:

npm i @alan-ai/alan-sdk-web

6. Add the Alan Button to the app in the App.js file:

Example of how to add the Alan Button to the app in the App.js file
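As a sketch, the App.js integration typically looks like the snippet below, which uses the @alan-ai/alan-sdk-web package; the key value is a placeholder, and the onCommand handler name comes from the Alan Web SDK:

```javascript
import { useEffect, useRef } from 'react';
import alanBtn from '@alan-ai/alan-sdk-web';

function App() {
  const alanBtnInstance = useRef(null);

  useEffect(() => {
    // Create the Alan button and connect it to your Alan Studio project
    alanBtnInstance.current = alanBtn({
      key: 'YOUR_KEY_FROM_ALAN_STUDIO_HERE',
      onCommand: (commandData) => {
        // Handle commands sent from the voice script via p.play({ command: ... })
        console.log('Command received:', commandData);
      },
    });
  }, []);

  return <div className="App">Click the Alan button and start talking.</div>;
}

export default App;
```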

In the key field above, replace YOUR_KEY_FROM_ALAN_STUDIO_HERE with the Alan SDK key for your Alan Studio project. In Alan Studio, at the top of the code editor, click Integrations, copy the value in the Alan SDK Key field, and paste it into the key field.

7. Add voice commands:

 In Alan Studio, open the project and, in the code editor, add the following intents:

How to add voice command code snippets
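As a sketch of what those intents look like, Alan Studio voice scripts use the global intent() function, and p.play() sends the spoken response (the exact response wording here is assumed, not from the source):

```javascript
// Alan Studio voice script (runs in Alan Cloud, not in your app bundle)
intent('What is your name?', p => {
    p.play('My name is Alan. I am your voice assistant.');
});

intent('How are you doing?', p => {
    p.play('I am doing great, thanks for asking!');
});
```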

Now in the app, click the Alan button and ask: “What is your name?” and “How are you doing?” Alan will give responses provided in the intents.

Azure Speech Service – An Overview

Azure Cognitive Services are cloud-based services with REST APIs and client library SDKs that help build cognitive intelligence into applications. It is possible to add cognitive features to applications without artificial intelligence or data science skills. Azure Cognitive Services comprise various AI services that enable you to build cognitive solutions that can see, hear, speak, understand, and even make decisions. [5]

Categories of Cognitive Services:

The catalog of cognitive services that provide cognitive understanding is categorized into five main pillars:

  • Vision
  • Speech
  • Language
  • Decision
  • Search

Azure Speech is a subset of Cognitive Services. The Speech service adds speech-enabled features to applications.

The voice assistant service provides fast, reliable interaction between a device and an assistant implementation that uses either (1) Direct Line Speech (via Azure Bot Service) for adding voice capabilities to bots, or, (2) Custom Commands for voice commanding scenarios. [6]

Direct Line Speech is an end-to-end solution for creating a voice assistant. It is powered by the Bot Framework and its Direct Line Speech channel, which is optimized for voice-in, voice-out interaction with bots.

Custom Commands makes it easy to build rich voice commanding apps optimized for voice-first interaction experiences. It provides a unified authoring experience, an automatic hosting model, and relatively lower complexity, letting you focus on building the best solution for your voice commanding scenarios.

  • If you want open-ended conversation with robust skills integration and full deployment control, consider an Azure Bot Service bot with the Direct Line Speech channel. For example: “I need to go to Seattle”, “What kind of pizza can I order?”
  • If you want voice commanding or simple task-oriented conversations with simplified authoring and hosting, consider Custom Commands. For example: “Turn on the overhead light”, “Make it 5 degrees warmer”

Table1 – Voice assistant options [6]

How Azure Speech works

Direct Line Speech offers the highest levels of customization and sophistication for voice assistants. It’s designed for conversational scenarios that are open-ended, natural, or hybrids of the two with task completion or command-and-control use.

1. Create a Bot Using the Bot Framework: Bot Framework Composer enables creators to design conversational bots, virtual agents, digital assistants, and all other dialog interfaces—offering flexible, accessible, and powerful ways to connect with customers, employees, and one another.


2. Create the corresponding Azure resources and configuration to voice-enable the bot using Direct Line Speech.

3. Using the Speech SDK, the client application connects to the Direct Line Speech channel and streams audio.

4. Optionally, higher accuracy keyword verification happens on the service.

5. The audio is passed to the speech recognition service and transcribed to text.

6. The recognized text is passed to the bot as a Bot Framework Activity.

7. The response text is turned into audio by the Text-to-Speech (TTS) service, and streamed back to the client application for playback.
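Assuming a bot already registered with the Direct Line Speech channel, steps 3–7 above map onto the Speech SDK's DialogServiceConnector; the sketch below uses placeholder credentials and the microsoft-cognitiveservices-speech-sdk package:

```javascript
import {
  AudioConfig,
  BotFrameworkConfig,
  DialogServiceConnector,
} from 'microsoft-cognitiveservices-speech-sdk';

// Placeholder credentials: use your Speech resource key and region
const config = BotFrameworkConfig.fromSubscription('YOUR_SPEECH_KEY', 'westus');
const audioConfig = AudioConfig.fromDefaultMicrophoneInput();
const connector = new DialogServiceConnector(config, audioConfig);

// Step 6: the bot's replies come back as Bot Framework activities
connector.activityReceived = (sender, event) => {
  console.log('Bot activity:', event.activity);
};

// Steps 3–5: connect, stream microphone audio, and transcribe one utterance
connector.connect();
connector.listenOnceAsync(
  (result) => console.log('Recognized:', result.text),
  (error) => console.error(error)
);
```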

Documentation

The Azure Speech service has many resources and examples in its documentation, including YouTube tutorials. However, the steps required to create a voice assistant vary with the complexity of the intended solution. Creating voice commands is more straightforward through Speech Studio, but according to the documentation, the full AI capabilities are available only when using a voice-enabled bot through the Direct Line Speech solution, which involves more configuration steps as well as the learning curve of the Bot Framework.

The integration examples in the documentation are oriented towards .NET Framework solutions. There are GitHub examples showing how to connect to the Bot Framework and to Direct Line Speech in JavaScript, but there is no end-to-end example of building a JavaScript voice assistant with the Speech SDK.

Integration Options

The Azure Speech software development kit (SDK) exposes many of the Speech service capabilities, empowering developers to build speech-enabled applications. The Speech SDK is available in many programming languages and across platforms, including:

  • C#
  • C++
  • Go
  • Java
  • JavaScript
  • Objective-C / Swift
  • Python

Pricing

Free – Web/Container (1 concurrent request):

  • Speech to Text
    – Standard: 25 audio hours free per month
    – Custom: 5 audio hours free per month; endpoint hosting: 1 model free per month
    – Conversation Transcription Multichannel Audio: 5 audio hours free per month
  • Text to Speech
    – Standard: 5 million characters free per month
    – Neural: 0.5 million characters free per month
    – Custom: 5 million characters free per month; endpoint hosting: 1 model free per month

Standard – Web/Container (100 concurrent requests for the Base model, 20 concurrent requests for a Custom model):

  • Speech to Text
    – Standard: $1 per audio hour
    – Custom: $1.40 per audio hour; endpoint hosting: $0.0538 per model per hour
    – Conversation Transcription Multichannel Audio: $2.10 per audio hour
  • Text to Speech
    – Standard: $4 per 1M characters
    – Neural: $16 per 1M characters; long audio creation: $100 per 1M characters
    – Custom: $6 per 1M characters; endpoint hosting: $0.0537 per model per hour
    – Custom Neural: training: $52 per compute hour, up to $4,992 per training; real-time synthesis: $24 per 1M characters; endpoint hosting: $4.04 per model per hour; long audio creation: $100 per 1M characters

Table2 – Speech Recognition Price [8]
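As a rough illustration of how the Standard-tier rates combine, the sketch below estimates a monthly bill; the workload figures (100 audio hours, 2M characters) are hypothetical, not from the source:

```javascript
// Standard-tier rates from the pricing table above
const sttStandardPerHour = 1.0;   // Speech to Text, Standard: $1 per audio hour
const ttsNeuralPerMChars = 16.0;  // Text to Speech, Neural: $16 per 1M characters

// Hypothetical monthly workload
const audioHours = 100;           // assumed speech-to-text usage
const ttsMillionChars = 2;        // assumed text-to-speech usage

const monthlyCost = audioHours * sttStandardPerHour + ttsMillionChars * ttsNeuralPerMChars;
console.log(`Estimated monthly cost: $${monthlyCost}`); // → Estimated monthly cost: $132
```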

ReactJs Implementation

As prerequisites, it is important to follow these steps, which are covered in more detail in the official documentation on the resources needed before integrating with a JavaScript project:

1. Create the Bot in the Bot Composer Framework

2. Create new Azure resources

3. Build, test, and deploy the Bot to an Azure App Service

4. Register your bot with the Direct Line Speech channel

To integrate with a ReactJS application:

5. Create a new ReactJs app.

6. Install the Bot Framework Web Chat node module:

npm i botframework-webchat

7. Retrieve your Direct Line Speech credentials.

8. Render Web Chat using the Direct Line Speech adapters.

9. For full customizability, you can use React to recompose the components of Web Chat. To install the production build from npm, run npm install botframework-webchat.
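Steps 7 and 8 can be sketched as the React component below, using the createDirectLineSpeechAdapters helper from botframework-webchat; the region value and subscription key are placeholders for your own Speech resource, and the component name is illustrative:

```javascript
import React, { useEffect, useState } from 'react';
import ReactWebChat, { createDirectLineSpeechAdapters } from 'botframework-webchat';

function SpeechWebChat() {
  const [adapters, setAdapters] = useState();

  useEffect(() => {
    (async () => {
      // Placeholder credentials: use the key and region of the Speech resource
      // registered with your bot's Direct Line Speech channel
      const adapterSet = await createDirectLineSpeechAdapters({
        fetchCredentials: async () => ({
          region: 'westus',
          subscriptionKey: 'YOUR_SPEECH_SUBSCRIPTION_KEY',
        }),
      });
      setAdapters(adapterSet);
    })();
  }, []);

  // The adapter set supplies both the directLine and webSpeechPonyfillFactory props
  return adapters ? <ReactWebChat {...adapters} /> : <div>Connecting…</div>;
}

export default SpeechWebChat;
```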

Conversational AI Platforms: In Summary

Both Alan AI and Azure Speech provide strong solutions for voice assistant needs.

In terms of the learning curve, Alan AI is the easier option to work with: the documentation is straightforward about how to create the conversational possibilities and train the AI in Alan Studio. With the Speech cognitive services, there are more resources to learn and master, including the Bot Framework and other important Azure prerequisites, before you can start developing the voice assistant solution.

Alan AI provides stronger support in terms of the SDKs available, integrating with more programming languages, and offers extensive examples of how to build a solution in the developer’s chosen language. The Azure Speech service has more components for each possible scenario or solution; although there is the option of a simpler solution through the Custom Commands the Speech service offers, it is not as straightforward as with the Alan technology. Azure’s SDK integration is also harder to follow through the documentation, because the examples are oriented towards Microsoft solutions such as C#.

In terms of pricing, the recommendation is to contact Alan AI and Microsoft for a more detailed price calculation based on the requirements of your project. Keep in mind that, depending on the Azure solution, some additional resources will need to be considered in the estimate.

Conversational AI is a powerful tool that gives users faster assistance and lets them self-serve more quickly within applications. These two providers are examples of available technologies that can give developers and companies the tools to innovate their services and applications.

References

[1] https://arxiv.org/abs/1801.03604

[2] https://enterprisebotmanager.com/conversational-ai/

[3] https://alan.app/docs/usage/how-works/infrastructure

[4] https://alan.app/pricing

[5] https://docs.microsoft.com/en-us/azure/cognitive-services/what-are-cognitive-services

[6] https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/voice-assistants

[7] https://docs.microsoft.com/en-us/composer/introduction?tabs=v2x

[8] https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/

[9] https://github.com/microsoft/BotFramework-WebChat/tree/main/samples/03.speech/a.direct-line-speech

[10] https://www.npmjs.com/package/botframework-webchat

[11] https://www.youtube.com/watch?v=Nh3S_sljkpI&ab_channel=MicrosoftAzure

 

 


Andres Garcia

Andres is a senior web developer living in Costa Rica and has been part of the Gorilla Logic family for over 7 years. He is a fan of full-stack development and of working on projects with challenging new technologies, and he is experienced in JavaScript frameworks like ReactJs. Andres is also a dancer and practices in his free time.
