Developing an Automated Q&A Tool with Golden Retriever

Why do we need an automated engine for Question Answering? 

Back when AI Singapore (AISG) consisted of a team of five and first started our AI Apprenticeship Programme, we had one shared email address that any interested candidate could email questions to. We received questions about coursework, about stipends, about projects, about eligibility and pretty much anything else you could think of. Whoever saw the email first would jump in and write a reply. It was like running around with a hat putting out fires as they flared up.

Over time, we noticed people asking similar sets of questions, and we collated a good set of FAQs to point people to when they came to us. Still, handling enquiries was manual and time-consuming.

Our problem is not unique. Any organization, whether in research, banking or government, has to deal with questions from customers. Customers might need clarification on almost anything, from questions that other people have asked before (FAQs) to clauses in contracts. Retrieving such information repeatedly can be a tedious process for customer service representatives.

Introducing Golden Retriever

This is why we built the open-source Golden Retriever, an automated information retrieval engine for human language queries. Golden Retriever is part of the set of pre-built solutions offered by AI Makerspace, solutions that make it easy for teams to integrate AI into their services. Our intention is to provide Golden Retriever as an open-source tool that users across multiple industries can fine-tune for their own use cases. This is especially beneficial for users who work with confidential documents and want to run the tool internally.

Golden Retriever is primarily powered by Google’s Universal Sentence Encoder for Question Answering (Google USE-QA), but it is also compatible with other publicly available models such as BERT and ALBERT. Our initial experiments on academic datasets like the SQuAD QA dataset and the Insurance QA dataset showed promising results. The model evaluation metric is accuracy@k (Acc@k), where k is the number of clauses our model returns for a given query. A score of 1 indicates that the k returned clauses contain a correct answer to the query, while a score of 0 indicates that none of them do.

Table 1: Performance comparison between Google USE-QA and other models
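
As a concrete illustration, below is a minimal sketch of how Acc@k can be computed, assuming each query comes with a set of clause IDs known to answer it. The function name and data layout are illustrative, not Golden Retriever's actual API:

    def accuracy_at_k(retrieved, relevant, k):
        """Mean Acc@k over all queries: a query scores 1 if any of its
        top-k retrieved clause IDs is a correct answer, else 0.

        retrieved: list of ranked clause-ID lists, one per query
        relevant:  list of sets of correct clause IDs, one per query
        """
        scores = [
            1 if set(ranked[:k]) & correct else 0
            for ranked, correct in zip(retrieved, relevant)
        ]
        return sum(scores) / len(scores)

    # Two queries with k=3: the first query's answer (clause 1) appears
    # in its top 3, the second query's answer (clause 6) does not
    retrieved = [[4, 7, 1, 9], [3, 8, 2, 5]]
    relevant = [{1}, {6}]
    print(accuracy_at_k(retrieved, relevant, k=3))  # 0.5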

If you have a customized dataset, you can fine-tune Golden Retriever on it for better results.

We previously wrote about Golden Retriever, which you can view here. However, the latest release incorporates significant changes that make it even easier to put into production.

How Golden Retriever works

We use the following open source tools to power the core of Golden Retriever:

  • Elasticsearch: a distributed RESTful search and analytics engine used to store the incoming queries and potential responses for your application
  • MinIO: an object storage system used to store the fine-tuned weights of your model and other artefacts
  • Streamlit: an easy-to-use package to set up a frontend page for users to send in their queries
  • FastAPI: a web framework for building APIs with Python 3.6+
  • DVC: a tool that runs on top of any Git repository and allows users to set up reproducible machine learning pipelines
  • Google’s Universal Sentence Encoder: a model pre-trained by Google and used within Golden Retriever to encode text into high-dimensional vectors for semantic similarity tasks (a minimal encoding sketch follows Figure 1 below)
Figure 1: Overview of Golden Retriever’s pipeline
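
To make the encoder's role concrete, here is a minimal sketch of encoding a query and a few candidate responses with the USE-QA model from TensorFlow Hub, then ranking responses by dot-product similarity. The model URL and signature names follow the published TF Hub interface for USE-QA; the rest is illustrative and not Golden Retriever's internal code:

    import numpy as np
    import tensorflow as tf
    import tensorflow_hub as hub

    # Load the pre-trained USE-QA model from TensorFlow Hub
    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder-qa/3")

    query = ["What is the stipend for the apprenticeship?"]
    responses = [
        "Apprentices receive a monthly stipend during the programme.",
        "The programme lasts nine months.",
    ]

    # Encode the query and the candidate responses into the same vector
    # space. The response encoder also takes a 'context' input; here we
    # simply reuse the response text as its own context.
    query_vec = model.signatures["question_encoder"](
        tf.constant(query))["outputs"]
    response_vecs = model.signatures["response_encoder"](
        input=tf.constant(responses),
        context=tf.constant(responses))["outputs"]

    # Rank responses by dot-product similarity to the query
    scores = np.inner(query_vec.numpy(), response_vecs.numpy())[0]
    for text, score in sorted(zip(responses, scores), key=lambda p: -p[1]):
        print(f"{score:.3f}  {text}")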

While fine-tuning the Google Universal Sentence Encoder model, we tried to leverage training data (question and answer pairs) across different domains. We observed that the model's performance did not improve when we fine-tuned on these largely differing datasets simultaneously. When fine-tuning for your own use case, it might therefore be worth maintaining separate sets of model weights for different use cases rather than seeking a single model that generalizes across domains.
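
One way to organize this, sketched below with the MinIO Python client, is to store one fine-tuned weights artefact per domain and fetch the appropriate one at startup. The endpoint, credentials, bucket and object names here are all hypothetical placeholders:

    from minio import Minio

    # Connect to the MinIO service (credentials are illustrative; in
    # practice they come from your deployment's configuration)
    client = Minio(
        "localhost:9000",
        access_key="minioadmin",
        secret_key="minioadmin",
        secure=False,
    )

    # One fine-tuned weights artefact per domain, rather than a single
    # model fine-tuned across all domains at once
    DOMAIN_WEIGHTS = {
        "insurance": "weights/insurance_use_qa.tar.gz",
        "hr_faq": "weights/hr_faq_use_qa.tar.gz",
    }

    def fetch_weights(domain, bucket="golden-retriever"):
        """Download the fine-tuned weights for the given domain."""
        local_path = f"/tmp/{domain}_weights.tar.gz"
        client.fget_object(bucket, DOMAIN_WEIGHTS[domain], local_path)
        return local_path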

Benefits of Golden Retriever 

The backend services needed to make Golden Retriever production-ready are packaged with Docker Compose. This means the application is platform agnostic. As long as you have Docker on your machine, running docker-compose up will create the services needed to run Golden Retriever. No piecemeal installations are needed.
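
As a quick sanity check after docker-compose up, you can ping the backing services. The sketch below hits Elasticsearch's cluster-health endpoint and MinIO's liveness endpoint, assuming the default ports; adjust them to match the ports mapped in your compose file:

    import requests

    # Default service ports; adjust to match your docker-compose.yml
    SERVICES = {
        "Elasticsearch": "http://localhost:9200/_cluster/health",
        "MinIO": "http://localhost:9000/minio/health/live",
    }

    for name, url in SERVICES.items():
        try:
            status = requests.get(url, timeout=5).status_code
            print(f"{name}: HTTP {status}")
        except requests.ConnectionError:
            print(f"{name}: not reachable")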

Each component is well supported by the Python community, with good documentation and Getting Started pages for reference.

These design choices mean Golden Retriever is transparent, easy to install and easily customizable. For more narrative walkthroughs of Golden Retriever and example use cases, check out our GitHub repo, our information page and our demo.

Next steps

We strive to continuously improve the functionalities of Golden Retriever and welcome contributions from the community. Do drop us an email if you have any suggestions or questions.

(This article was jointly written with Jeanne Choo)

(Top image courtesy of Molly Anderson)
