Replacing gene IDs with gene names in a FASTA file

Sometimes we need to replace the gene IDs in a FASTA file with the corresponding gene names to make downstream analysis easier.

For example:

less test.fa
>Soffic.01G0025330-1A
MLGGLKDKLTGKNGNKIKGLAVLMSRKLLDPRDFTASLLDNVHEVFGNSITCQLVSATVADQNNEGRGIVGSEANLEQGLTDLPSVSQGESKLTVRFNWEMD
KHGVPGAIIIKNHHSTKFFLKTITLHDVPGCDTIVFVANSWIYPVGKYHYNRIFFANISYPPSQMPEALRPYREDELRYLRGEDRQGPYQEHDRIYRYDVYNDLGEPDRDNPRPVLGGSQKHPYPRRGRTGRIPTKKDPNSESRLSLLEQIY
>Soffic.01G0050170-3G
MAAAAPSRVSVRAAAPGQTGGFAKIRPQVVVAAAARSAGVSGRRARSVRASLFSPKPATPKDARPAKVQEMFVYEINERDRESPAYLRLSAKQTENALGDLV
PFTNKLYSGSLDKRLGISAGICILIQHVPERNGDRYEAIYSFYFGDYGHISVQGPYLTYEESYLAVTGGSGVFEGAYGQVKLNQIVFPFKIFYTFYLKGIPDLPRELLCTPVPPSPTVEPTPAAKATEPHACLNNFTN
less -S bestpair.tab
LOX-3 Soffic.01G0025330-1A
HI-LOX Soffic.07G0017480-1P
AOC Soffic.01G0050170-3G
AOS-2 Soffic.01G0005280-2B
COI1a Soffic.03G0027470-3P
JAZ-6 Soffic.01G0051220-4E
JAZ-9 Soffic.01G0007960-5E
ICS Soffic.02G0012660-1P
PAL-4 Soffic.06G0001330-6T
NPR-1 Soffic.03G0003330-4F
PR-1a Soffic.02G0002240-2C

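We want to replace each gene ID in test.fa with the corresponding gene name from bestpair.tab, keeping the original ID in parentheses. This can be done as follows: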

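# Column 1 of bestpair.tab is the gene name, column 2 the gene ID (tab-separated).
a=($(cut -f 1 bestpair.tab))
b=($(cut -f 2 bestpair.tab))
# Loop over every pair (not a hardcoded 11) and rewrite each ID in place as NAME(ID).
for (( i=0; i<${#b[@]}; i++ )); do sed -i "s/${b[$i]}/${a[$i]}(${b[$i]})/g" test.fa; done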

Check the result:

less test.fa
>LOX-3(Soffic.01G0025330-1A)
MLGGLKDKLTGKNGNKIKGLAVLMSRKLLDPRDFTASLLDNVHEVFGNSITCQLVSATVADQNNEGRGIVGSEANLEQGLTDLPSVSQGESKLTVRFNWEMDKHGVPGAIIIKNHHSTKFFLKTITLHDVPGCDTIVFVANSWIYPVGKYHYNRIFFANISYPPSQMPEALRPYREDELRYLRGEDRQGPYQEHDRIYRYDVYNDLGEPDRDNPRPVLGGSQKHPYPRRGRTGRIPTKKDPNSESRLSLLEQIY
>AOC(Soffic.01G0050170-3G)
MAAAAPSRVSVRAAAPGQTGGFAKIRPQVVVAAAARSAGVSGRRARSVRASLFSPKPATPKDARPAKVQEMFVYEINERDRESPAYLRLSAKQTENALGDLVPFTNKLYSGSLDKRLGISAGICILIQHVPERNGDRYEAIYSFYFGDYGHISVQGPYLTYEESYLAVTGGSGVFEGAYGQVKLNQIVFPFKIFYTFYLKGIPDLPRELLCTPVPPSPTVEPTPAAKATEPHACLNNFTN
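
The sed loop above rescans test.fa once per gene pair and treats each ID as a regular expression, so the "." in an ID matches any character, and running the loop a second time would wrap the names again. A possible alternative is a single awk pass that looks up each header by exact string match. The following is only a sketch under some assumptions: each header line holds nothing but the gene ID after ">", bestpair.tab is tab-separated, the sequences are shortened stand-ins, the third record (Soffic.09G0000000-1A) is a made-up ID with no mapping, and the output name renamed.fa is arbitrary.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the real test.fa is not touched.
workdir=$(mktemp -d)

# Small stand-ins for bestpair.tab (name<TAB>ID) and test.fa.
# Sequences are shortened; Soffic.09G0000000-1A is a hypothetical unmapped ID.
printf 'LOX-3\tSoffic.01G0025330-1A\nAOC\tSoffic.01G0050170-3G\n' > "$workdir/bestpair.tab"
printf '>Soffic.01G0025330-1A\nMLGGLKDK\n>Soffic.01G0050170-3G\nMAAAAPSR\n>Soffic.09G0000000-1A\nMKKK\n' > "$workdir/test.fa"

# First file: build an ID -> name lookup table.
# Second file: rewrite header lines only, using exact string lookup.
awk -F'\t' '
    NR == FNR { name[$2] = $1; next }
    /^>/ {
        id = substr($0, 2)              # header text without the leading ">"
        if (id in name) print ">" name[id] "(" id ")"
        else print                      # no mapping: keep the header unchanged
        next
    }
    { print }                           # sequence lines pass through untouched
' "$workdir/bestpair.tab" "$workdir/test.fa" > "$workdir/renamed.fa"

cat "$workdir/renamed.fa"
```

Because the result is written to a new file rather than edited in place, the original FASTA stays available for comparison, and headers without an entry in the mapping file are left as they were.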