Document Reading

I'm a bot. Documents are not boring for me!
Neural document reading is a task in which a deep learning model finds the answer to a query in a document (the context).
Components of Document Reader
  1. Elasticsearch as a document store
    The documents are preprocessed and stored in Elasticsearch indexes.
  2. Neural/statistical document ranker
    The document ranker retrieves the top n documents out of m (m >> n) that are most likely to contain an answer to the incoming query. The candidates are chosen by one of two approaches:
    • Neural approach: semantic similarity between the query embedding and the precomputed document embeddings
    • Statistical approach: word overlap between the question and the document
  3. Transformer reader
    • Extractive reader
      The question and the context (the documents shortlisted by the document ranker) are fed as input to a Transformer. The embeddings produced by the Transformer layers are passed through two separate feed-forward networks: one predicts the start token index and the other predicts the end token index. The resulting probability distributions over the document tokens (one for the start, one for the end) are used to extract the answer span.
    • Generative reader
      Generates a novel answer (not necessarily a span of text) from the document.
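The two ranking approaches in component 2 can be illustrated with a small, self-contained sketch. Everything here is an assumption for illustration only: the helper names (`overlap_score`, `cosine_similarity`, `top_n`), the toy documents, and the hand-written two-dimensional "embeddings" are hypothetical. A real neural ranker would obtain its vectors from a trained encoder, and a real statistical ranker would typically use a weighted scheme rather than the raw overlap shown here.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, document):
    """Statistical approach: fraction of distinct query words found in the document."""
    query_words = set(tokenize(query))
    doc_words = set(tokenize(document))
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def cosine_similarity(u, v):
    """Neural approach: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n(scores, n):
    """Return the indices of the n highest-scoring documents, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


# Hypothetical toy corpus (m = 3 documents).
docs = [
    "Elasticsearch stores preprocessed documents in indexes.",
    "The reader predicts the start and end token of the answer.",
    "Bananas are rich in potassium.",
]
query = "Which token does the reader predict?"

# Statistical ranking by word overlap.
overlap_scores = [overlap_score(query, d) for d in docs]
print(top_n(overlap_scores, 1))  # → [1]

# Neural ranking with made-up embeddings standing in for encoder output.
query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
neural_scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
print(top_n(neural_scores, 2))  # → [0, 1]
```

In both cases the ranker reduces to the same two steps: score each of the m documents against the query, then keep the n best. Only the scoring function differs.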
Document Reader
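The answer-extraction step of the extractive reader can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the start and end logits are hypothetical numbers standing in for the outputs of the two feed-forward networks, and the `best_span` helper and its `max_answer_len` parameter are names introduced here. The sketch turns each set of logits into a probability distribution over tokens and picks the span (start ≤ end) with the highest joint probability.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) for the most probable valid answer span.

    A span is valid when start <= end and it covers at most
    max_answer_len tokens (a hypothetical cap for this sketch).
    """
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, 0.0)
    for i, ps in enumerate(p_start):
        last = min(i + max_answer_len, len(p_end))
        for j in range(i, last):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Hypothetical context tokens and per-token logits.
tokens = ["the", "reader", "predicts", "start", "and", "end", "indices"]
start_logits = [0.1, 0.2, 0.1, 3.0, 0.3, 0.5, 0.2]
end_logits = [0.1, 0.1, 0.2, 0.4, 0.3, 0.6, 3.5]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)  # → start and end indices
```

Scoring spans jointly, rather than taking the best start and best end independently, guards against the case where the most probable end token falls before the most probable start token.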