Hi everybody,
I have these protein sequences (with no blank lines between them) in a text file:
[quote]>Q28133 Bovin protein
MKAVFLTLLFGLVCTAQETPAEIDPSKIPGEWRIIYAAADNKDKIVEGGPLRNYYRRIEC
INDCESLSITFYLKDQGTCLLLTEVAKRQEGYVYVLEFYGTNTLEVIHVSENMLVTYVEN
YDGERITKMTEGLAKGTSFTPEELEKYQQLNSERGVPNENIENLIKTDNCPP
>P00257-2 Bovin protein
MAARLLRVASAALGDTAGRWRLLLKSSQFIKVSCSGSWISAAQRAFICYSKSGNITCFLR
SEDKITVHFINRDGETLTTKGKIGDSLLDVVVQNNLDIDGFGACEGTLACSTCHLIFEQH
IFEKLEAITDEENDMLDLAYGLTDRSRLGCQICLTKAMDNMTVRVPDAVSDARESIDMGM
NSSKIE
>C1_11500C_B Bovin protein
MDFMKPETVLDLANIRQALVRMEDTIVFDLIERSQFFSSPSVYEKNKYNIPNFDGTFLEW
ALLQLEVAHSQIRRYEAPDETPFFPDQLKTPILPPINYPKILAKYSDEINVNSEIMKFYV
DEIVPQVSCGQGDQKENLGSASTCDIECLQAISRRIHFGKFVAEAKYQSDKPLYIKLILD
KDVKGIENSITNSAVEQKILERLIVKAESYGVDPSLKFGQNVQSKVKPEVIAKLYKDWII
PLTKKVEIDYLLRRLEDEDVELVEKYKK[/quote]
I tried building the regex with Kem's app (RegExRX), using this pattern:
RegEx Pattern: ^>([^ ]*)\s(.*)[\r\n]((([a-zA-Z])+[\r\n])*)
This pattern works perfectly for the first two sequences; for the first one, for example, I get these values:
$1: Q28133
$2: Bovin protein
$3: MKAVFLTLLFGLVCTAQETPAEIDPSKIPGEWRIIYAAADNKDKIVEGGPLRNYYRRIEC
INDCESLSITFYLKDQGTCLLLTEVAKRQEGYVYVLEFYGTNTLEVIHVSENMLVTYVEN
YDGERITKMTEGLAKGTSFTPEELEKYQQLNSERGVPNENIENLIKTDNCPP
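For reference, the behavior can be reproduced outside RegExRX with a minimal Python sketch. It assumes the pattern translates directly to Python's `re` module, with `re.MULTILINE` so that `^` matches at the start of every header line, and it uses a shortened excerpt of the sample in the same layout, with no line break after the final line. The helper name `parse` is hypothetical.

```python
import re

# The pattern from the post as a Python raw string (assumes the line break
# class is [\r\n]). re.MULTILINE lets ^ match at the start of each header line.
PATTERN = re.compile(r"^>([^ ]*)\s(.*)[\r\n]((([a-zA-Z])+[\r\n])*)", re.MULTILINE)

# Shortened excerpt in the same layout; there is NO line break after "NSSKIE".
SAMPLE = (
    ">Q28133 Bovin protein\n"
    "MKAVFLTLLF\n"
    "INDCESLSIT\n"
    ">P00257-2 Bovin protein\n"
    "MAARLLRVAS\n"
    "NSSKIE"
)

def parse(text):
    """Return (id, description, sequence block) for each header the pattern matches."""
    return [m.group(1, 2, 3) for m in PATTERN.finditer(text)]

for ident, desc, seq in parse(SAMPLE):
    print(ident, repr(seq))
```

The second match stops after `MAARLLRVAS\n`: the repeated group requires a `[\r\n]` after every sequence line, so the final `NSSKIE`, which ends at the end of the input rather than at a line break, is left out.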
But I always have a problem with the last sequence: I cannot match its last line. If I add a return (line break) after that line, my pattern matches it perfectly, but I would like to get the same result without modifying the text file. Is that possible?
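One possible approach, offered as an assumption and not tested in RegExRX, is to let each sequence line end either with a line break or at the very end of the input. In Python the absolute end of the input is `\Z`; in PCRE-style engines the closest equivalent is `\z`. The sketch also makes the inner groups non-capturing, so `$1` to `$3` keep their meaning. The names `FIXED` and `parse` are hypothetical.

```python
import re

# Hypothetical variant of the pattern: each sequence line may end with a line
# break OR at the absolute end of the input (\Z in Python), so the final line
# no longer needs a trailing return. Inner groups are non-capturing.
FIXED = re.compile(
    r"^>([^ ]*)\s(.*)[\r\n]((?:[a-zA-Z]+(?:[\r\n]|\Z))*)",
    re.MULTILINE,
)

SAMPLE = (
    ">Q28133 Bovin protein\n"
    "MKAVFLTLLF\n"
    "INDCESLSIT\n"
    ">P00257-2 Bovin protein\n"
    "MAARLLRVAS\n"
    "NSSKIE"  # no trailing line break, as in the original file
)

def parse(text):
    """Return (id, description, sequence block) for every header."""
    return [m.group(1, 2, 3) for m in FIXED.finditer(text)]

for ident, desc, seq in parse(SAMPLE):
    print(ident, repr(seq))
```

Another option that leaves the file on disk untouched is to append a line break to the text in memory before running the original pattern.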
Could anyone help me, please?
Thank you very much.
Sergio