The Three Biggest Barriers in Conversational AI Today
The conversational AI tools we use every day, such as chatbots, personal assistants, and digital concierges, are often inaccurate, especially in domain-specific use cases. The major reasons for this inaccuracy are:
Unavailability of large-volume, high-quality domain-specific data
One of the key considerations in choosing a conversational AI platform is data availability. People reveal a vast amount of information in conversations, such as their knowledge, views, and opinions. This valuable information can be fed back into the conversational engine. However, most chatbot development tools do not expose the details of the conversation. For example, a chatbot for taking pizza orders notifies the business owner about each order but not about the valuable conversational data exchanged between the user and the bot.
Another factor that hinders data availability is the centralization and monopoly of data. Conversational data is held by a few IT giants behind platforms such as Google Dialogflow, Facebook WIT, and IBM Watson. They sell NLP services built on top of the immense data they possess at a very high price. The development of newer tools is carried out by these few companies, while the general developer community remains dormant in the field. The insufficiency of domain-specific data significantly hinders the development of conversational tools. For example, it can prevent enterprises from creating conversational AI tools tuned for their business. The limited data they do possess is often not sourced from qualified experts with the right domain knowledge, and therefore cannot be used to train AI algorithms.
Inefficient management of domain-specific conversational data
Domain knowledge management is an open problem. First, the sheer quantity of domain data is huge: millions of chat logs may be generated every day. Second, the number of domains is expanding rapidly, with hundreds of new domains added per week, so the data management system must accommodate new domain data efficiently without compromising performance. Moreover, the collected data is domain-specific, unstructured (i.e., in its raw plain-text form, such as user utterances and answers), and inefficiently organized (e.g., centralized storage on hard disks), which limits its usefulness for various AI tasks. To use domain knowledge efficiently for training AI services, and to accommodate incremental domain additions rather than full model retraining, this data should be clustered and organized into a highly structured knowledge base.
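The idea of organizing raw chat logs into a structured, per-domain knowledge base can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Entry` and `KnowledgeBase` names, the in-memory storage, and the simple text normalization stand in for whatever clustering and storage a real system would use. The point it shows is only that adding a new domain touches that domain's bucket and nothing else.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One structured record extracted from a raw chat log (hypothetical schema)."""
    domain: str
    question: str
    answer: str


class KnowledgeBase:
    """Hypothetical in-memory store that groups entries by domain."""

    def __init__(self):
        # domain -> {normalized question -> Entry}
        self._by_domain = defaultdict(dict)

    @staticmethod
    def _normalize(text):
        # Collapse case and whitespace so near-identical utterances share a key.
        return " ".join(text.lower().split())

    def add(self, entry):
        self._by_domain[entry.domain][self._normalize(entry.question)] = entry

    def add_domain(self, domain, pairs):
        # Only the new domain's bucket is written; existing domains are untouched.
        for question, answer in pairs:
            self.add(Entry(domain, question, answer))

    def lookup(self, domain, question):
        entry = self._by_domain.get(domain, {}).get(self._normalize(question))
        return entry.answer if entry else None

    def domains(self):
        return sorted(self._by_domain)


kb = KnowledgeBase()
kb.add_domain("pizza", [("What sizes do you have?", "Small, medium and large.")])
kb.add_domain("banking", [("How do I reset my PIN?", "Use the mobile app or visit a branch.")])
```

A production system would replace the exact-match key with real clustering of similar utterances and use distributed storage, but the per-domain layout is what allows new domains to be added incrementally.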
Privacy and trust issues in conversational data collection
Today, most AI-as-a-service platforms and enterprises do not obtain authorization from data contributors before using their data to train AI engines and sell them as a service. In May 2018, stronger European data protection rules (the GDPR) took effect, giving people more control over their personal data, and the US and China have also shown intent to strengthen data protection. The era of free, monopolized data collection will soon end, yet there is no solution in the marketplace where enterprises and data contributors can be matched and legitimately trade knowledge at large scale. These limits on gathering the data behind conversational AI algorithms slow the pace of development across the conversational tools space.
Deep Knowledge Network is a team of top-notch scientists and experts aiming to remove these barriers and unlock the full potential of conversational AI. We will present our mission and solution in upcoming articles.