Conference paper

KeyBLD: Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

Abstract: Transformer-based models, and especially pre-trained language models like BERT, have shown great success on a variety of Natural Language Processing and Information Retrieval tasks. However, such models have difficulty processing long documents because of the quadratic complexity of the self-attention mechanism. Recent works either truncate long documents or segment them into passages that can be handled by a standard BERT model. A hierarchical architecture, such as a transformer, can then be applied on top of the passage representations to build a document-level representation. However, these approaches either lose information or have high computational complexity (and, in the latter case, are both time- and energy-consuming). We follow here a slightly different approach: we first select the key blocks of a long document by local query-block pre-ranking, and then aggregate a few of these blocks to form a short document that can be processed by a model such as BERT. Experiments conducted on standard Information Retrieval datasets demonstrate the effectiveness of the proposed approach.
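The select-then-aggregate pipeline described in the abstract (split the document into blocks, pre-rank the blocks locally against the query, keep a few key blocks, concatenate them) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the use of BM25 with statistics computed only over the document's own blocks as the local pre-ranker, the word-level tokenizer, the block size, the token budget, and the choice to restore the original block order are all assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens, not BERT's WordPiece subwords.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query_terms: list[str], blocks: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each block against the query with BM25, using block frequencies
    and average block length computed over this document's blocks only."""
    n = len(blocks)
    avg_len = sum(len(blk) for blk in blocks) / n
    df = Counter()
    for blk in blocks:
        df.update(set(blk))
    scores = []
    for blk in blocks:
        tf = Counter(blk)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(blk) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def build_short_document(query: str, document: str,
                         block_size: int = 64, max_doc_tokens: int = 448) -> str:
    """Hypothetical helper: keep the best-scoring blocks that fit the budget."""
    tokens = tokenize(document)
    # 1. Segment the long document into fixed-size blocks.
    blocks = [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]
    if not blocks:
        return ""
    # 2. Local pre-ranking: score every block of this document against the query.
    scores = bm25_scores(tokenize(query), blocks)
    ranked = sorted(range(len(blocks)), key=lambda i: (-scores[i], i))
    # 3. Greedily keep the highest-scoring blocks that still fit in the budget.
    chosen, used = [], 0
    for i in ranked:
        if used + len(blocks[i]) <= max_doc_tokens:
            chosen.append(i)
            used += len(blocks[i])
    # 4. Concatenate the selected blocks in their original document order.
    chosen.sort()
    return " ".join(tok for i in chosen for tok in blocks[i])
```

The default budget of 448 tokens is meant to leave room for the query and special tokens within BERT's 512-token input limit, although word counts understate WordPiece lengths. The resulting short text would then be paired with the query and scored by a BERT-style model, which is not shown here.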
Contributor: Anne-Christine Jacob
Submitted on: Thursday, October 7, 2021, 14:09:00
Last modified on: Tuesday, November 9, 2021, 12:26:02




Minghan Li, Éric Gaussier. KeyBLD: Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval. SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event Canada, Canada. pp.2207-2211, ⟨10.1145/3404835.3463083⟩. ⟨hal-03369577⟩


