Hi everyone,
I’m trying to parse a SubRip (.srt) subtitle file using a regular expression.
For those not familiar with the format, a sample looks like this:
[quote]1
00:00:06,849 --> 00:00:07,740
Morning everyone
Finally time for my Ice Bucket Challenge
Test
2
00:00:07,740 --> 00:00:14,009
Sorry it took so long
Thank you for my nominations[/quote]
Each subtitle consists of a number, time stamps, and one or more lines.
I’ve been using RegExRx by Kem Tekinay (wonderful program; I’m learning more than I thought possible about regular expressions just by fiddling) and thought I had made great progress towards a working RegEx:
(?x) # free spacing mode
^(\d+)[\r\n|\r|\n] # start of line, one or more digits, end of line
(\d{2}:\d{2}:\d{2},\d{3}) # digits in the format 00:00:00,000
(?:\x20-->\x20) # (non-capturing group) "-->" surrounded by a space on either side
(\d{2}:\d{2}:\d{2},\d{3})[\r\n|\r|\n] # digits in the format 00:00:00,000 followed by end of line
(^.+$)[\r\n|\r|\n] # start of line, one or more characters, end of line
(^.+$)[\r\n|\r|\n]? # start of line, one or more characters, end of line (optional)
However, while I was refactoring my code I discovered that it does not work for more than two subtitle lines (see the third line, ‘Test’, which I added to the sample above).
I think I can assume there will always be at least one line (otherwise there would be nothing to display, which is the same as not having a subtitle at all), so I need to repeat my second, optional match for as many further lines as there are.
I know that I can use ‘+’ to repeat the last group one or more times. I read that putting it outside the group’s parentheses will only return the last match, and sure enough that’s what I see, so it needs to go somewhere within the expression, but I have not been able to work out where.
Can anyone help?
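For anyone following along, the repeated-group behaviour described above can be reproduced with a minimal sketch. It is written in Python rather than RegExRx, so it is an illustration and not the pattern from the post. The names SAMPLE, EOL, TIME, LAST_ONLY, ALL_LINES and parse_srt are made up for this example, and the sample assumes a blank line between entries, as .srt files normally have. It suggests one possible approach: wrap the whole repetition in a single capturing group and split the captured text into lines afterwards.

```python
import re

# Sample from the post. The blank line between entries is an assumption:
# .srt files normally separate entries this way.
SAMPLE = (
    "1\n"
    "00:00:06,849 --> 00:00:07,740\n"
    "Morning everyone\n"
    "Finally time for my Ice Bucket Challenge\n"
    "Test\n"
    "\n"
    "2\n"
    "00:00:07,740 --> 00:00:14,009\n"
    "Sorry it took so long\n"
    "Thank you for my nominations\n"
)

# Hypothetical helper fragments, not taken from the original pattern.
EOL = r"(?:\r\n|\r|\n)"            # one line ending (an alternation, not a class)
TIME = r"\d{2}:\d{2}:\d{2},\d{3}"  # 00:00:00,000

# '+' placed outside the capturing group: the group is overwritten on
# every repetition, so only the last text line survives.
LAST_ONLY = re.compile(
    rf"^(\d+){EOL}({TIME}) --> ({TIME}){EOL}(?:([^\r\n]+){EOL}?)+",
    re.MULTILINE,
)

# The repetition wrapped in one outer capturing group: the group holds
# every text line of the entry as a single block.
ALL_LINES = re.compile(
    rf"^(\d+){EOL}({TIME}) --> ({TIME}){EOL}((?:[^\r\n]+{EOL}?)+)",
    re.MULTILINE,
)


def parse_srt(text):
    """Return a list of (number, start, end, [text lines]) tuples."""
    entries = []
    for m in ALL_LINES.finditer(text):
        number, start, end, block = m.groups()
        entries.append((int(number), start, end, block.splitlines()))
    return entries


if __name__ == "__main__":
    print([m.group(4) for m in LAST_ONLY.finditer(SAMPLE)])
    for entry in parse_srt(SAMPLE):
        print(entry)
```

The trade-off in this sketch is that the regular expression no longer returns each subtitle line as its own group; the split into lines happens in ordinary code after the match. Whether RegExRx or the host language behaves identically is not something this example verifies.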