  • Introduction

    Welcome to Slang Labs documentation center! Here you will find details of how to go about integrating Slang into your mobile app.

    Slang enables apps to have a voice interface, which in turn allows their users to control and navigate the app by voice, typically in addition to the existing touch interface.

    Slang will provide all the mechanisms needed to add a voice interface, including UI elements, permissions, voice collection, retries, etc. If you prefer to have your own UI elements, we provide extension points that you could use to roll out your own custom elements.

    Currently we support only Android. iOS and web support will come in the future.

    Let's get started right away!

    Getting started

    Integrating Slang into your app requires two distinct steps.

    1. Register your app with Slang and create a "schema" to tell Slang what commands to expect and what data they could contain.
    2. Include the Slang AAR in your build and use the provided APIs to register code points that correspond to the "schema" you defined above.

    Don't worry if things are not obvious at this point. They will become clear as we walk through the individual steps.

    Registering your app with Slang

    Go to https://app.slanglabs.in/ and sign up with your email address. Verify the address by following the steps in the email you receive.

    Create new app

    After you have signed in, type your app name and click on "Create new app". If the creation was successful, you will see the app name and app id at the top, with a schema editor below them.

    Slang uses app ids to identify your app. The "schema" you create to define your commands/interactions is associated with this id, and you need to use the same value when integrating Slang into your code base.

    Create a schema

    After you have successfully registered your app with Slang, the next step is to create a 'schema' that will specify the actions you want to be voice enabled. You can edit your schema in the schema editor.

    Example fully functional schema

    {
      "types": [
        {
          "$EntityType$": "enumEntityType",
          "name": "mode",
          "values": [
            {
              "identity": "photo"
            },
            {
              "identity": "video"
            }
          ]
        }
      ],
      "intents": [
        {
          "name": "take_photo",
          "entities": [
            {
              "name": "feature",
              "type": "mode",
              "required": true,
              "prompts": [
                "Do you want to take a photo or video?"
              ]
            },
            {
              "name": "delay",
              "type": "std.duration",
              "default": "0 seconds"
            }
          ],
          "examples": [
            [
              {
                "text": "take"
              },
              {
                "text": "photo",
                "entity": "feature"
              },
              {
                "text": "after"
              },
              {
                "text": "3 seconds",
                "entity": "delay"
              }
            ]
          ]
        }
      ]
    }
    

    Here is an example schema for a camera app that wants to expose some of its features to a user to trigger via voice.

    Intent

    Specify an intent

    {
      "intents": [
        {
          "name": "take_photo"
        }
      ]
    }
    

    The next step is to specify the intents. Think of them as the "types of commands" that you would like to support via voice.

    This specifies one intent and names it take_photo.

    The next thing is to give some sample sentences that should trigger that intent.

    Add some example sentences that should trigger the intent

    {
      "intents": [
        {
          "name": "take_photo",
          "examples": [
            [
              {
                "text": "take a photo"
              }
            ]
          ]
        }
      ]
    }
    

    The example here tells Slang that when the user says "take a photo", it should convert that to the intent take_photo.

    We now have an almost complete schema. But to make it more useful, developers would need to specify a few more things.

    Entities

    Specify Entities

    {
      "intents": [
        {
          "name": "take_photo",
          "entities": [
            {
              "name": "delay",
              "type": "std.duration",
              "default": "0 seconds"
            }
          ],
          "examples": [
            [
              {
                "text": "take a photo"
              }
            ]
          ]
        }
      ]
    }
    

    Entities are pieces of data that would provide some additional parameters to the intent.

    So continuing with the previous example, if we want to allow users to specify some parameters, say a "delay", we could add an entity to the intent.

    Note that entities are "parameters" to the intent, so they need a type specification. In this example, we specify the type of the entity "delay" to be std.duration.

    Extend the example with the entity

    {
      "intents": [
        {
          "name": "take_photo",
          "entities": [
            {
              "name": "delay",
              "type": "std.duration",
              "default": "0 seconds"
            }
          ],
          "examples": [
            [
              {
                "text": "take a photo after"
              },
              {
                "text": "3 seconds",
                "entity": "delay"
              }
            ]
          ]
        }
      ]
    }
    

    Now we need to add some examples that specify the entities.

    Required Entity

    If the entity is mandatory, i.e. it is required for the intent to be completed, set the "required" flag to "true" in the entity definition.

    The framework will prompt the user (using the prompt string configured with the entity) if the entity is missing from the original command.
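
    The interplay between "default", "required" and "prompts" can be sketched in plain Java. This is only an illustration of the behaviour described above, not Slang's implementation: the class and method names (EntityResolutionSketch, EntityDef, applyDefaults, nextPrompt) are hypothetical, and the assumption that a "default" value is applied before required entities are checked is ours.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class EntityResolutionSketch {

    /** Hypothetical mirror of one entity definition from the schema. */
    static final class EntityDef {
        final String name;
        final boolean required;
        final String defaultValue; // null when the schema has no "default"
        final String prompt;       // null when the schema has no "prompts"

        EntityDef(String name, boolean required, String defaultValue, String prompt) {
            this.name = name;
            this.required = required;
            this.defaultValue = defaultValue;
            this.prompt = prompt;
        }
    }

    /** Copies the detected values and fills in defaults for entities that were not spoken. */
    static Map<String, String> applyDefaults(List<EntityDef> defs, Map<String, String> detected) {
        Map<String, String> resolved = new LinkedHashMap<>(detected);
        for (EntityDef def : defs) {
            if (!resolved.containsKey(def.name) && def.defaultValue != null) {
                resolved.put(def.name, def.defaultValue);
            }
        }
        return resolved;
    }

    /**
     * Returns the prompt of the first required entity that still has no value.
     * Empty when nothing is missing (or when the missing entity has no prompt).
     */
    static Optional<String> nextPrompt(List<EntityDef> defs, Map<String, String> resolved) {
        for (EntityDef def : defs) {
            if (def.required && !resolved.containsKey(def.name)) {
                return Optional.ofNullable(def.prompt);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<EntityDef> defs = List.of(
            new EntityDef("feature", true, null, "Do you want to take a photo or video?"),
            new EntityDef("delay", false, "0 seconds", null));

        // The user gave no entities at all, so "delay" falls back to its default
        // and "feature" must be asked for.
        Map<String, String> resolved = applyDefaults(defs, Map.of());

        System.out.println(resolved.get("delay"));
        System.out.println(nextPrompt(defs, resolved).orElse("nothing to ask"));
    }
}
```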

    Entity Types

    You can also add entities that map to one of a set of custom values. Their type is called enumEntityType.

    Let's extend the example schema to add a feature entity that will allow users to specify whether to take a photo or a video.

    To create such a custom entity, we need to specify the type of the entity.

    Types of entities

    {
      "types": [
        {
          "$EntityType$": "enumEntityType",
          "name": "mode",
          "values": [
            {
              "identity": "photo"
            },
            {
              "identity": "video"
            }
          ]
        }
      ],
      "intents": [
        {
          "name": "take_photo",
          "entities": [
            {
              "name": "feature",
              "type": "mode",
              "required": true,
              "prompts": [
                "Do you want to take a photo or video?"
              ]
            },
            {
              "name": "delay",
              "type": "std.duration",
              "default": "0 seconds"
            }
          ],
          "examples": [
            [
              {
                "text": "take"
              },
              {
                "text": "photo",
                "entity": "feature"
              },
              {
                "text": "after"
              },
              {
                "text": "3 seconds",
                "entity": "delay"
              }
            ]
          ]
        }
      ]
    }
    

    While Slang supports a number of standard entity types by default (e.g. std.duration, std.geo.city, std.geo.state), sometimes you need a custom type that limits the values of the entity.

    In the camera example, it is a type called "mode" that is limited to either "photo" or "video".
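
    An easy mistake when writing examples is to put the type name ("mode") in an example chunk where the entity name ("feature") belongs. The sketch below is a small plain-Java check for that. It assumes, as in the fully functional schema shown earlier, that the "entity" field of an example chunk refers to an entity's name; SchemaExampleCheck, Chunk and undeclaredEntities are hypothetical names and not part of Slang.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SchemaExampleCheck {

    /** One chunk of an example sentence; entity is null for plain text. */
    static final class Chunk {
        final String text;
        final String entity;

        Chunk(String text, String entity) {
            this.text = text;
            this.entity = entity;
        }
    }

    /** Returns the entity names used by the example that the intent does not declare. */
    static List<String> undeclaredEntities(Set<String> declaredEntityNames, List<Chunk> example) {
        List<String> missing = new ArrayList<>();
        for (Chunk chunk : example) {
            if (chunk.entity != null && !declaredEntityNames.contains(chunk.entity)) {
                missing.add(chunk.entity);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Entity names declared by the take_photo intent.
        Set<String> declared = Set.of("feature", "delay");

        // Deliberately wrong: "mode" is the type name, not the entity name.
        List<Chunk> example = List.of(
            new Chunk("take", null),
            new Chunk("photo", "mode"),
            new Chunk("after", null),
            new Chunk("3 seconds", "delay"));

        System.out.println(undeclaredEntities(declared, example)); // prints [mode]
    }
}
```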

    Types of built-in entities

    'std.cardinal',
    'std.color',
    'std.currency',
    'std.date',
    'std.duration',
    'std.name.full',
    'std.geo.city',
    'std.geo.country',
    'std.geo.state',
    'std.integer',
    'std.language',
    'std.ordinal',
    'std.time'
    

    Save/Publish the Schema

    Save/Publish your schema

    Once you have created the schema, you can save it, which allows later modifications (a saved schema is not available for use yet). When you are completely done with it, the next step is to publish it.

    If everything is fine, you should see a successful status reply - "Intents Published Successfully!"

    Integrate Slang to your app

    After the schema has been published, the next step is to integrate Slang into your app.

    Build integration

    Add Slang dependency

    implementation('com.slanglabs.sdk.android:slang_lib:+@aar') {
      transitive=true
    }
    

    In your Android Gradle file, add the dependency on Slang.

    Add the Slang repository

    repositories {
      ...
      maven {
            url "http://maven.slanglabs.in:8080/artifactory/gradle-release"
      }
    }
    

    Also add the path to Slang's Maven repository, which contains the Slang AAR file.

    Code integration

    The Slang API is broadly divided into 3 categories

    Initialization

    Initialize Slang

    public class MyApplication extends Application {
      @Override
      public void onCreate() {
        super.onCreate();
        // Initialize Slang once, passing the application context
        Slang
          .init(getApplicationContext())
          .appId(<your app id>)
          .authKey(<your auth key>);
      }
    }
    

    The first thing to do when integrating Slang is to initialize the framework and pass the application context to it.

    This should be done in the onCreate method of your Application class.

    Registering actions

    Register actions

    Slang
      .action()
      .register("take_photo", new SlangAction.ActionCallback() {
          @Override
          public boolean onEntityProcessing(
              @NonNull Activity activity,
              SlangIntent intent
          ) {
            // Use this to pre-process the entities and modify them or fill in missing
            // entities
    
            return true; // return false to stop processing the intent
          }
    
          @Override
          public boolean onIntentDetected(
              @NonNull final Activity activity,
              final SlangIntent intent,
              final SlangAction.IntentProgressListener listener
          ) {
            // Front or rear camera; defaults to "front" when not set
            String camera = intent.getEntity("camera").isSet()
                ? intent.getEntity("camera").getValue()
                : "front";
            // Delay duration; defaults to 0 when not set
            int delay = intent.getEntity("delay").isSet()
                ? Integer.parseInt(intent.getEntity("delay").getValue())
                : 0;

            switch (intent.getIntentString()) {
              case "take_photo":
                // handle the actual photo taking operation using the "camera" & "delay" entities
                ...
                // Inform Slang that the processing is complete
                listener.intentCompleted(intent);
                break;
            }

            return false;
          }
      })
      .register(...); // Chain more registrations
    

    After initializing Slang, the next step is to tell it which intents the application will handle and to associate them with actions.

    Use Slang.action().register(...) to register a callback for the intent.

    When the framework detects the intent, it calls the methods of the callback associated with it.

    Action callback

    Preprocess the Entities

    public boolean onEntityProcessing(
        @NonNull Activity activity,
        SlangIntent intent
    ) {
    
    }
    

    The action callback has two methods that the framework will invoke.

    The onEntityProcessing method is invoked to preprocess the entities that have been detected. The developer can inspect the detected entities and update them. Any required entity that is missing is also visible here, so the developer can fill it in where possible.

    public boolean onEntityProcessing(
        @NonNull Activity activity,
        SlangIntent intent
    ) {
      // If camera is not set, set it to "front"
      if (!intent.getEntity("camera").isSet()) {
        intent.setEntityValue("camera", "front");
      }
    
      return true;
    }
    

    Handle the intent

    @Override
    public boolean onIntentDetected(
        @NonNull final Activity activity,
        final SlangIntent intent,
        final SlangAction.IntentProgressListener listener
    ) {
    
    }
    

    The onIntentDetected method is invoked once ALL required entities have been collected by the framework. The developer can then complete the action associated with the intent, using the entities passed in.
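
    The order in which the two methods run can be sketched without the SDK. The ActionCallback interface below is a simplified stand-in with made-up signatures (plain maps instead of SlangIntent, no Activity or listener), not the real SlangAction.ActionCallback, and the sketch leaves out the prompting for required entities that the framework performs between the two phases.

```java
import java.util.HashMap;
import java.util.Map;

public class CallbackOrderSketch {

    /** Simplified stand-in for the SDK's action callback; NOT the real Slang interface. */
    interface ActionCallback {
        boolean onEntityProcessing(Map<String, String> entities);

        boolean onIntentDetected(String intentName, Map<String, String> entities);
    }

    /**
     * Runs the two callback phases in order and returns the value of
     * onIntentDetected, or false if onEntityProcessing stopped the intent.
     */
    static boolean dispatch(String intentName, Map<String, String> entities, ActionCallback callback) {
        // Phase 1: the app may inspect and fill in entities; false stops processing.
        if (!callback.onEntityProcessing(entities)) {
            return false;
        }
        // Phase 2: the app handles the intent with the (possibly updated) entities.
        return callback.onIntentDetected(intentName, entities);
    }

    public static void main(String[] args) {
        Map<String, String> entities = new HashMap<>();
        entities.put("delay", "3");

        boolean result = dispatch("take_photo", entities, new ActionCallback() {
            @Override
            public boolean onEntityProcessing(Map<String, String> e) {
                e.putIfAbsent("camera", "front"); // fill in a missing entity
                return true;
            }

            @Override
            public boolean onIntentDetected(String intentName, Map<String, String> e) {
                System.out.println(intentName + " camera=" + e.get("camera") + " delay=" + e.get("delay"));
                return false;
            }
        });

        System.out.println("onIntentDetected returned: " + result);
    }
}
```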

    Follow on mode

    @Override
    public boolean onIntentDetected(
        @NonNull final Activity activity,
        final SlangIntent intent,
        final SlangAction.IntentProgressListener listener
    ) {
      // After handling the intent (typically by launching an activity, for example)
      listener.intentCompleted(intent, "Picture taken", true, "Would you like to take more photos?");
    
      return true;
    }
    

    After the intent is handled, if the app wants Slang to continue listening for more input automatically, return "true" from the "onIntentDetected" method.

    If you need a more sophisticated follow-up, use the "listener" object to indicate your preference.

    If the app wants to speak a "confirmation" prompt after handling the intent, it can pass the prompt as the second parameter of the "listener.intentCompleted" call. The third parameter indicates whether to continue listening to the user.

    Register fallbacks

    Slang
     .action()
     .register(...)
     .fallback(new SlangAction.ActionCallback() {
         @Override
         public boolean onEntityProcessing(
             @NonNull Activity activity,
             SlangIntent intent
         ) {
           return false;
         }
    
         @Override
         public boolean onIntentDetected(
             @NonNull final Activity activity,
             final SlangIntent intent,
             final SlangAction.IntentProgressListener listener
         ) {
           return true;
         }
       }
    );
    

    If required, register a fallback handler for all intents that are specified in the schema but have no explicit registration.
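
    The idea behind a fallback is a lookup with a default: use the handler registered for the intent name if there is one, otherwise use the fallback. A minimal sketch in plain Java follows; FallbackRegistrySketch and its methods are hypothetical names that only imitate the chained register(...).fallback(...) style, and handlers are reduced to simple functions. Unlike Slang, this sketch does not check that the intent exists in the schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FallbackRegistrySketch {

    // Explicit registrations, keyed by intent name.
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    // Used for every intent name without an explicit registration.
    private Function<String, String> fallbackHandler = intentName -> "unhandled: " + intentName;

    FallbackRegistrySketch register(String intentName, Function<String, String> handler) {
        handlers.put(intentName, handler);
        return this; // allow chaining
    }

    FallbackRegistrySketch fallback(Function<String, String> handler) {
        fallbackHandler = handler;
        return this;
    }

    /** Picks the explicit handler if one exists, otherwise the fallback. */
    String dispatch(String intentName) {
        return handlers.getOrDefault(intentName, fallbackHandler).apply(intentName);
    }

    public static void main(String[] args) {
        FallbackRegistrySketch registry = new FallbackRegistrySketch()
            .register("take_photo", name -> "photo taken")
            .fallback(name -> "fallback handled " + name);

        System.out.println(registry.dispatch("take_photo"));   // prints photo taken
        System.out.println(registry.dispatch("record_video")); // prints fallback handled record_video
    }
}
```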

    Intents & Entities

    Get intent

      intent.getIntentString()
    

    This will return the currently detected intent.

    Get Entity values

      intent.getEntity("entity_name").getValue();
    

    To read an entity's value, use the 'intent' object passed in to the callback.
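
    The registration example earlier passes the result of getValue() to Integer.parseInt, which suggests that entity values arrive as strings. The exact string Slang returns for a type such as std.duration is not stated here, so a defensive conversion may be worthwhile. The helper below is a plain-Java sketch; EntityValueParsing and parseIntOrDefault are hypothetical names, not part of the Slang API.

```java
public class EntityValueParsing {

    /** Parses rawValue as an int, returning fallback when it is null or not a plain integer. */
    static int parseIntOrDefault(String rawValue, int fallback) {
        if (rawValue == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(rawValue.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntOrDefault("3", 0));         // prints 3
        System.out.println(parseIntOrDefault("3 seconds", 0)); // prints 0
        System.out.println(parseIntOrDefault(null, 0));        // prints 0
    }
}
```

    In a callback this could be used as parseIntOrDefault(intent.getEntity("delay").getValue(), 0), so an unexpected value does not crash the handler.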

    Showing and Hiding Slang button

    The Slang button is the entry point into the system. By default, the Slang button is shown on the bottom center part of the screen. Clicking on it will bring up a surface that will then interact with the user to collect voice.

    When the app is initialized with Slang, the framework will automatically insert a Slang button on every activity of the app.

    Hide Slang trigger

      Slang.ui().trigger().hide();
    

    To turn off the Slang button, use this invocation pattern.

    Show Slang trigger

      Slang.ui().trigger().show();
    

    To explicitly turn on Slang (if it had been turned off earlier), use this invocation pattern.

    Note that once you turn on Slang, it will automatically show in ALL activities. And if you turn off Slang, it will remain turned off until explicitly turned on.

    Slang surface

    To set a help message on the surface (i.e. a message like "Welcome to Camera! Say "take selfie" to take a photo"), use this call.

      Slang.setHelpMessage()
    

    The Slang surface is the visual interface that shows up when the Slang button is clicked. It internally triggers the voice recognition system, collects the voice input, identifies the intent, and calls the associated action callback, if any.

    Normally the developer is not required to interact with the Surface class, other than to customize its appearance.

       Slang.ui().surface().setImageResource(<resource id for a custom icon>)
    

    Get ready to have fun

    Hope this whirlwind tour of Slang helps you get started! Feel free to drop a mail to 42@slanglabs.in if you have questions (including questions about life, the universe and everything; we might not have the "ultimate answer", but we will surely have one that helps).