Document Reading
I'm a bot. Documents are not boring for me!
Neural document reading is a task in which a deep learning model finds the answer to a query in a document (the context).
Components of Document Reader
Elasticsearch as a document store
The documents are preprocessed and stored in Elasticsearch indexes.
Neural/statistical Document Ranker
The document ranker retrieves the top n documents out of m (m >> n) that are most likely to contain an answer to the incoming query. Documents are chosen using one of two approaches:
Neural approach: semantic similarity between the query embedding and the precomputed document embeddings
Statistical approach: word overlap between the question and the document
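The two ranking approaches above can be sketched in a few lines of plain Python. This is a minimal illustration, not the ranker used here: the function names (overlap_score, rank_by_overlap, rank_by_embedding) are hypothetical, the statistical score is assumed to be the fraction of query words found in the document, and the neural score is assumed to be cosine similarity over embeddings that some encoder has already produced.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, document):
    """Statistical score: fraction of distinct query words present in the document."""
    query_words = set(tokenize(query))
    if not query_words:
        return 0.0
    doc_words = set(tokenize(document))
    return len(query_words & doc_words) / len(query_words)


def cosine_similarity(a, b):
    """Neural score: cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(scores, n):
    """Return the indices of the n highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]


def rank_by_overlap(query, documents, n):
    """Pick the top n documents by word overlap with the query."""
    return top_n([overlap_score(query, d) for d in documents], n)


def rank_by_embedding(query_vec, doc_vecs, n):
    """Pick the top n documents by embedding similarity (embeddings assumed precomputed)."""
    return top_n([cosine_similarity(query_vec, v) for v in doc_vecs], n)


documents = [
    "Elasticsearch stores the preprocessed documents.",
    "The reader predicts start and end tokens.",
    "Bananas are yellow.",
]
shortlist = rank_by_overlap("What does the reader predict?", documents, n=2)
```

In practice the statistical side is more often a weighted scheme such as TF-IDF or BM25 than raw overlap, and the document embeddings would be computed once at indexing time so that only the query needs encoding per request.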
Transformer Reader
Extractive Reader
The question and the context (the documents shortlisted by the document ranker) are fed as input to the transformer. The embeddings produced by the transformer layers are passed through two separate feed-forward networks: one predicts the start token index and the other predicts the end token index. The answer is the span retrieved using the probability distributions over the document tokens for the start and end positions.
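How the start and end distributions turn into an answer can be sketched as follows. This is an illustrative sketch under assumptions, not the reader's actual decoding code: the two lists of logits stand in for the outputs of the two feed-forward networks, the helper names (best_span, extract_answer) are hypothetical, and the span score is assumed to be the product of the start and end probabilities, restricted to spans where the end does not precede the start and the length stays within max_answer_len.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) for the most probable valid span.

    A span is valid when start <= end and it covers at most max_answer_len tokens.
    """
    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        last = min(i + max_answer_len, len(end_probs))
        for j in range(i, last):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


def extract_answer(tokens, start_logits, end_logits, max_answer_len=30):
    """Join the tokens of the best span into an answer string."""
    start, end, score = best_span(start_logits, end_logits, max_answer_len)
    return " ".join(tokens[start:end + 1]), score


# Hypothetical logits, one value per context token.
tokens = ["The", "ranker", "returns", "the", "top", "n", "documents"]
start_logits = [0.1, 0.2, 0.1, 0.3, 4.0, 0.5, 0.2]
end_logits = [0.1, 0.1, 0.2, 0.1, 0.3, 0.4, 5.0]
answer, score = extract_answer(tokens, start_logits, end_logits)
```

Searching over pairs instead of taking the two argmax positions independently matters when the most probable end token falls before the most probable start token; the constraint then forces the next-best consistent pair.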
Generative Reader
Generates a novel answer based on the document; the answer is not necessarily a span of text copied from it.