Improving Zero-Shot Ranking With Vespa Hybrid Search - Part Two


Highlights

  • Generating synthetic in-domain training data by prompting LLMs is a recent emerging Information Retrieval (IR) trend, also described in InPars: Data Augmentation for Information Retrieval using Large Language Models.
  • The basic idea is to “prompt” a large language model (LLM) to generate synthetic queries for use in training in-domain ranking models. A typical prompt includes a few examples of queries and relevant documents; the LLM is then “asked” to generate synthetic queries for many of the documents in the corpus. The generated synthetic query–document pairs can be used to train neural ranking models.
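The prompting approach described above can be sketched as follows. This is an illustrative outline, not the exact InPars prompt or pipeline: the few-shot template, the example documents, and the stubbed `generate_query` call are all assumptions standing in for a real LLM completion API.

```python
# InPars-style synthetic query generation: few-shot prompt an LLM with
# (document, query) examples, then ask it to write a relevant query for
# a new document. Template and examples are illustrative assumptions.

FEW_SHOT_EXAMPLES = [
    ("The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
     "where is the eiffel tower located"),
    ("Photosynthesis converts light energy into chemical energy in plants.",
     "how do plants make energy from sunlight"),
]

def build_prompt(document: str) -> str:
    """Assemble a few-shot prompt asking for a query relevant to `document`."""
    parts = []
    for doc, query in FEW_SHOT_EXAMPLES:
        parts.append(f"Document: {doc}\nRelevant query: {query}")
    # The trailing "Relevant query:" cues the LLM to complete with a query.
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n\n".join(parts)

def generate_query(prompt: str) -> str:
    """Stand-in for an LLM completion call (replace with a real model)."""
    # A real pipeline would send `prompt` to an LLM and return its
    # completion; a fixed string is returned here so the sketch runs.
    return "what is hybrid search in vespa"

# Pair each generated query with its source document to form synthetic
# (query, document) training pairs for a neural ranking model.
corpus = ["Vespa hybrid search combines sparse and dense retrieval signals."]
training_pairs = [(generate_query(build_prompt(doc)), doc) for doc in corpus]
print(training_pairs[0])
```

In a real setup, the generated pairs are typically filtered (e.g. by a scoring model) before being used to fine-tune the ranker, since raw LLM output can be noisy.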