Ashik Salim ashikns
ashikns / e2e_embeddings_pipeline.py
Last active January 23, 2026 03:27 — forked from DiTo97/e2e_embeddings_pipeline.py
end-to-end pipeline for hard-negative mining, Sentence-Transformers training, and evaluation
"""
Improved end-to-end pipeline for hard-negative mining, Sentence-Transformers training,
and evaluation (including chain-recall) for multi-hop retrieval tasks.
High-level features implemented:
- Three stages: (1) hard-negative mining, (2) training, (3) evaluation.
- An async-friendly search API definition that your baseline retrieval system must
implement to supply the prioritized baseline hard negatives.
- Baseline hard negatives are given highest priority when merging candidates.
- BM25 margin-based mining (lexical mining) is performed and merged with baseline