Strategy:
- Extract the chr/pos/rsid from dbSNP into a .bed file. A separate .bed file is made per chromosome so that lookups can be parallelised.
- Create a tabix index for the .bed files.
- Extract the chr/pos from the target VCF file.
- Use tabix to query the target chr/pos against the tabix-indexed .bed files. This is parallelised across chromosomes using GNU Parallel.
- Update the VCF file with the extracted RSIDs.
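
The final step (writing the looked-up RSIDs back into the VCF) can be sketched in Python as below. This is only an illustration under stated assumptions, not the script's actual implementation: it assumes the tabix query results are tab-separated rows of chrom, start, end, rsid, with a 0-based start so that the end column equals the 1-based VCF position of a single-base variant. The function names `load_rsid_lookup` and `annotate_vcf` are hypothetical.

```python
def load_rsid_lookup(bed_lines):
    """Build {(chrom, pos): rsid} from BED-style rows.

    Assumed layout (not confirmed by the source): chrom, start, end, rsid,
    tab-separated, where end is the 1-based position used in the VCF.
    """
    lookup = {}
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, _start, end, rsid = line.split("\t")[:4]
        lookup[(chrom, int(end))] = rsid
    return lookup


def annotate_vcf(vcf_lines, lookup):
    """Return VCF lines with the ID column replaced where an RSID is known."""
    out = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Header lines and blank lines are passed through unchanged.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        fields = line.split("\t")
        # VCF columns: CHROM, POS, ID, ... (ID is the third column).
        rsid = lookup.get((fields[0], int(fields[1])))
        if rsid is not None:
            fields[2] = rsid
        out.append("\t".join(fields))
    return out
```

Records with no match keep their existing ID value, so the sketch never discards information already present in the target VCF.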