Saturday, November 16, 2013

Natural Language Processing (NLP) in the self-service Business Intelligence (BI) world

I wrote a blog post back in February of 2011 about 'ad hoc' reporting vs 'self-service' reporting: DAXDude: AX 2009 Ad Hoc vs Self-Service reporting. The point of the article was to say that the two terms were synonymous. I recently had someone make a comment on that post bringing up the point of 'Natural Language' reporting being the future of reporting. Instead of responding to that comment directly, I decided it might be best to write a post on my thoughts on the concepts. I did quite a bit with it back in my pre-ERP programming days.

What is 'Natural Language' reporting?  
Natural language reporting sits within the Natural Language Processing (NLP) field of computer science and deals with how end users (people) interact with computers in a way that mimics how they would interact with other people. Basically, the end user states a question or command to the computer or robot, and the computer translates that request into its own language to determine an action, whether that is something physical like moving or something like retrieving data. It is a major component of the artificial intelligence sciences.

This technology has been around a long time and was a concept people became widely familiar with in the '90s. More recently, it is used in Apple's personal assistant, dubbed 'Siri', on the iPhone. NLP reporting basically allows the user to type or ask a question in a field (like a search engine) and have the system return data that is relevant to the question asked.

As an example, someone may ask the system: 'How many sales orders have I written?'. The system would then translate this into its own language by recognizing key words like 'How many', indicating that the user wants to know a quantity (a phrase like 'Can you' would instead indicate a request for an action). From there the system would have to recognize that the user wants a query to be run for 'sales orders', with 'I' being the current user. This could all boil down to the system returning the results of [select count(recId) from salesTable where salesTable.SalesResponsible == 'daxdude'].
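
The translation described above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the keyword tables, the function name `translate_question`, and the way the current user is passed in are all made up for the example, and a real NLP engine would be far more involved.

```python
# Toy keyword-based translation of a question into a query string.
# All names here (INTENTS, ENTITIES, translate_question) are hypothetical.

INTENTS = {
    "how many": "count(recId)",  # the user wants a count
}

ENTITIES = {
    # phrase -> (table, field that identifies the owner of the record)
    "sales orders": ("salesTable", "SalesResponsible"),
}


def translate_question(question: str, current_user: str) -> str:
    """Return a query string for the question, or raise ValueError if it is not understood."""
    text = question.lower()

    # 1. What does the user want to know?
    aggregate = next((agg for phrase, agg in INTENTS.items() if phrase in text), None)
    # 2. Which data are they asking about?
    entity = next((ent for phrase, ent in ENTITIES.items() if phrase in text), None)
    if aggregate is None or entity is None:
        raise ValueError(f"Could not understand: {question!r}")

    table, owner_field = entity
    query = f"select {aggregate} from {table}"

    # 3. 'I' or 'my' is taken to mean the current user.
    words = text.replace("?", " ").split()
    if "i" in words or "my" in words:
        query += f" where {table}.{owner_field} == '{current_user}'"
    return query


print(translate_question("How many sales orders have I written?", "daxdude"))
```

Even in this tiny form, every phrase the system is expected to understand has to be mapped by hand to a table and field, which hints at the maintenance effort involved.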

Using the example above, one can start to see where the complexity comes in when making this type of solution robust and reliable for reporting. For example, be a system tester and try to think of how many ways you can break this thing. Broken English, slang, missing qualifiers/identifiers, etc. are all easy starting points. These systems need to be pretty complex and have an advanced workflow to guide the users to the data they need.
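
To make the fragility concrete, here is a small Python sketch that runs a few rephrasings of the same question through a naive keyword matcher. The `understands` helper and its phrase list are hypothetical and do not represent how any particular engine behaves; they only illustrate how quickly simple matching breaks down.

```python
# Hypothetical naive matcher: it only "understands" a question that contains
# every required key phrase verbatim.
REQUIRED_PHRASES = ("how many", "sales orders")


def understands(question: str) -> bool:
    text = question.lower()
    return all(phrase in text for phrase in REQUIRED_PHRASES)


variants = [
    "How many sales orders have I written?",  # the phrasing the matcher was built for
    "how many SOs did i write",               # abbreviation / slang
    "Count of my sale orders?",               # different wording, missing 's'
    "How many have I written?",               # missing qualifier: written what?
]

for question in variants:
    status = "ok" if understands(question) else "NOT UNDERSTOOD"
    print(f"{status:15} {question}")
```

Only the first phrasing gets through. Each of the others is a perfectly reasonable way for a person to ask the same thing.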

Self Service Business Intelligence (BI)
Self-service BI is an important aspect of the modern, dynamic business model. It's all about getting the right information to the right people at the right time. Often, the requirements to accomplish that change frequently. This is especially true as companies become more dynamic in their growth plans. An end user can either define their needs and submit the requirements to someone on the IT team to develop (which could take some time) or be given a set of tools that lets them get to that information on their own. The latter of the two options is self-service BI, whereas the first option is more traditional and common in older systems. Having such self-service capabilities can help increase efficiency and allow issues to be identified before they become a problem.

The core requirement of a self-service BI solution is to give the end user easy to use tools to quickly and efficiently get the information they need. I usually look at the below criteria in evaluating a reporting solution.
  1. An easy to use and understand graphical user interface (GUI)
  2. Dynamic enough to allow complex, fast data retrieval for users comfortable with the tools
  3. Have as much reporting work completed by the end user as possible.
  4. Data is real time or as up to date as possible
  5. How and where can the reports (and report creation tool) be used 
  6. Translation of retrieved data into relevant visual representations
  7. Extraction of reporting from BI solution into external systems/files

Benefits to NLP reporting
There are a few benefits to NLP reporting. Although not many are listed below, they are unique strengths that other reporting methods may not have.

Benefit 1: A user doesn't need to know the tables/fields in the database for the data to be retrieved. They can say 'What are my year to date sales for Wisconsin?' and the system 'should' be able to show the user the year to date sales for that person for that state.

Benefit 2: Data retrieval can be done through voice (hands free). Think about the application being on the shop floor or in a warehouse. I know several companies currently leverage voice picking. Imagine allowing the users to query information about their performance in relation to their target while on the floor. Or even in the board room, where the fewer the clicks to get new data, the better for all involved.

Benefit 3: Retrieval of ad hoc query data can be fast, often taking only the time needed to type or say a single sentence.

Benefit 4: People not comfortable with computers might like this reporting best. This can be older people or younger kids. I have worked with people who use computers almost exclusively for getting certain information and that is it. Think of the application in a place like a museum. People could download an app for that museum that leverages NLP and lets them ask very specific questions about certain exhibits. NLP would make the experience intuitive and bridge the technology gap to allow people of all ages to get their questions answered.

Issues with NLP reporting
There are many downsides to utilizing an NLP solution for robust ad hoc reporting capabilities. I see very specific places where it can be helpful, but depending on the cost and maintenance, it might be a very tough sell to justify in the vast majority of situations. My personal opinion is that it should be used as more of a supplemental BI solution for specific situations and not as a complete corporate solution for BI.

Issue 1: Data is often one dimensional in an NLP structure. Creating multiple complex joins to retrieve data in ways the system might not be able to automatically link is near impossible.

You definitely cannot create things like end user OLAP cubes (think 3-dimensional spreadsheet data views) by simply asking a question. There are so many factors there that can create issues even if the technology existed for that right now. And then imagine coding for the constantly changing database schemas in all the new ERPs. And the customizations customers may make to their system which would need to be taken into consideration in any BI solution.

Issue 2: It can be incredibly frustrating for end users to get the information they need if it is something very specific and it is not being translated between the human and computer languages correctly.

For example, let's look at Apple's Siri personal assistant. How many jokes have you heard about people asking it something and not getting back an appropriate response? It's very hard to get the users to phrase the questions correctly so that the internal associations in the NLP engine link to the correct pieces of information. When the technology is used by well-trained people (think of the commercials or a sales demo), it will work 100% of the time. When it is not, the success rate can be much lower. Ask yourself: Do you want people to be making jokes about your implemented BI solution because they asked the wrong question or because the answer wasn't possible to render?

UPDATE (12/16): Here's a Simpsons clip making fun of Siri (NLP) and its shortcomings.

Issue 3: Data reliability and confidence are usually higher in solutions where the end user spends a little more time researching which fields/data need to be used in the query to retrieve the data. It provides more visibility into what happens behind the scenes and eliminates the black box.

If the data is complex and the engine is somehow able to handle it, what would your estimated level of confidence in the data be if you didn't personally connect the dots? It might be right but when the boss asks 'How can you tell these numbers are accurate?', what will you say?

Issue 4: Multi-lingual support. What if the solution is in English and the end user doesn't know that language?

Issue 5: Dialects. If someone with a thick dialect (say, someone from Chicago) attempts to query the data, the voice engine may not be able to recognize what they say. This is becoming less of an issue as technology continues to improve and data engines with APIs that do this translation become more advanced.

Note: With a good engine and proper training, some of the issues above can be reduced but they will still be there to some degree.

Issues with ad hoc reporting in general
Often times there are issues with all types of ad hoc reporting. I'll only address one in this post but I believe it to be the biggest caveat and downside to ad hoc reporting: Incorrectly retrieving information.

Data in systems, especially in normalized structures, can be quite complex and may not be easily translatable to an end user. When users attempt to configure the pieces to report against the data, there is a chance they will miss something or incorrectly interpret what it is they are seeing. This can propagate incorrect information.

As an example, there once was a client that came screaming into the room (literally) asking for an explanation as to why the numbers they were seeing in their warehouses for on hand inventory were off by a significant amount. They were so upset they needed an explanation right that minute as what they were seeing was inexcusable. The natural answer was that the person couldn't tell them because they didn't know how the numbers in the report were retrieved. To cut right to the chase, someone queried the data wrong and the numbers were WAY off. Once correctly queried, the numbers were about where they expected due to the nature of their business. That was all a billable activity to correct. It created a lot of frustrations for all parties involved. All was good in the end though!

Note: There are ways to mitigate the bad data situations but those are far from being foolproof.

Natural Language BI as the next step in BI evolution?
The comment in my old post asked "Don't you think that Natural Language Business Intelligence is the next step of Business Intelligence's evolution?" 

This is a fair question to ask, but I want to distinguish between artificial intelligence (think of the Jarvis app demo) and natural language BI. For corporate business intelligence, I don't really see it evolving to the point of adding much value for a very long time. The technology has existed far longer than modern self-service BI solutions.

I really only see this technology being a supplementary tool for BI reporting, not 'the future' per se, due to the issues I stated above. For it to truly be the future, the technology would have to have the artificial intelligence level of an advanced end user who knows (and evolves with) the company's unique business processes (and recognizes different modifications) and the ERP database system being used. It would also have to do that reliably and provide a reasonable degree of confidence in the returned data.

By the time the above is achievable, I'd be more afraid of my coffee machine being upset at me and half-assing my coffee. Remember that Back to the Future - Pt 2 portrayed multiple home fax machines as the wave of the future.

As for Qwalytics, their solution might be perfect for some applications and I will keep them in the back of my mind in being a possible solution if the right application were to arise.

I'd be interested in how their solution has been used to add value to companies. I couldn't find any case studies on their website but that does not mean that they don't exist.

My preferred self-service BI solution for the Dynamics AX ERP
As far as robust self-service BI for corporations goes, I really like the Halo BI platform. I'd highly suggest you check it out. While some issues that are inherent to self-service still exist (as detailed above), the Halo team has started to explore ways to solve some of them, like assessing the quality of the report's data.

While I don't want to plug too much or anything, I did want to throw my preference for BI solutions in this post since I plugged someone else.

Not every solution is right for everyone in all situations but maybe this one is for you. Request a demo if you want to learn more.
