I am looking for a way to make a substring optional in a RegEx search pattern. The line I search can look like either of these:
A) "11:48:50,000 the quick brown fox…"
B) "11:48:50,000 the quick brown fox… 11:54:33,600"
My pattern right now looks like this:
rx.SearchPattern = "(\d{2}:\d{2}:\d{2}-\d{3})(.+)(\d{2}:\d{2}:\d{2}-\d{3}).?$"
Now subexpression $3 should be optional. If a timestamp exists at the end of the line, I want to read it out separately. If not, SubExpressionString(3) should remain empty. Any tips on how to achieve that?
That pattern doesn’t match either of those lines, though…
You are right, I copied the wrong line with dashes instead of commas. With this code it finds line B, but not line A:
rx.SearchPattern = "(\d{2}:\d{2}:\d{2},\d{3})(.+)(?\d{2}:\d{2}:\d{2},\d{3}).$"
I tried to make subexpression 3 optional by putting a question mark in different places, but it doesn't have the effect I am hoping for:
rxA.SearchPattern = "(\d{2}:\d{2}:\d{2},\d{3})(.+)(?\d{2}:\d{2}:\d{2},\d{3}).?$"
rxB.SearchPattern = "(\d{2}:\d{2}:\d{2},\d{3})(.+)?(\d{2}:\d{2}:\d{2},\d{3}).?$"
rxC.SearchPattern = "(\d{2}:\d{2}:\d{2},\d{3})(.+)(\d{2}:\d{2}:\d{2},\d{3})?.?$"
This is a tricky one because the same pattern may or may not repeat at the end, and you don't know what's in between (I guess), so I used an alternation:
(?|(?:(\d{2}:\d{2}:\d{2},\d{3})(.+)(\d{2}:\d{2}:\d{2},\d{3}))|(?:(\d{2}:\d{2}:\d{2},\d{3}))).*
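For anyone reading along, here is a rough sketch of how the subexpressions from that alternation could be read out in Xojo. The pattern is written with single backslashes, since Xojo string literals pass backslashes through unchanged. The sample line and the variable names (lineText, startStamp, body, endStamp) are made up for illustration. It assumes that SubExpressionCount counts the whole match as subexpression 0 and stops at the last group that took part in the match, so it is checked before groups 2 and 3 are touched.

```xojo
Var rx As New RegEx
rx.SearchPattern = "(?|(?:(\d{2}:\d{2}:\d{2},\d{3})(.+)(\d{2}:\d{2}:\d{2},\d{3}))|(?:(\d{2}:\d{2}:\d{2},\d{3}))).*"

// Made-up sample line of type B (timestamp at both ends)
Var lineText As String = "11:48:50,000 the quick brown fox 11:54:33,600"

Var m As RegExMatch = rx.Search(lineText)
If m <> Nil Then
  // Group 1 is set in both branches of the alternation
  Var startStamp As String = m.SubExpressionString(1)
  Var body As String      // stays empty when only the first timestamp matched
  Var endStamp As String  // stays empty when there is no trailing timestamp

  // Assumption: a count of 4 means groups 0..3 are all available
  If m.SubExpressionCount > 3 Then
    body = m.SubExpressionString(2)
    endStamp = m.SubExpressionString(3)
  End If
End If
```

Note that for a type A line the second branch matches, so the text after the timestamp is swallowed by the trailing .* and is not captured in group 2.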
Dear Kem, that is awesome! Thanks a lot. RegExRx is a very helpful little app btw.
A new challenge arose. Some of the text in (.+) is in Russian script. When the search is run, only a few lines are found. If it is run again, even fewer lines are found.
I am also having trouble loading RTF text into a textArea.StyledText.RTFData. If pasted into the TextArea, all the lines are displayed; if loaded via TextInputStream, only a few lines make it. Could this be related? Or is this a separate problem, with the search pattern solely responsible for the mishap?
Instead of .+, try "[^\r\n]+", but it sounds like you have an encoding problem. Make sure your text is UTF-8 at the start, and convert it if not.
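A minimal sketch of that "make sure it is UTF-8, convert if not" step might look like the following. The function name ToUTF8 is hypothetical, and the fallback to Windows Cyrillic is purely an assumption about where a Russian text file might have come from; swap in whatever encoding the file really uses.

```xojo
Function ToUTF8(raw As String) As String
  // If the bytes already form valid UTF-8, only label them as such
  If Encodings.UTF8.IsValidData(raw) Then
    Return DefineEncoding(raw, Encodings.UTF8)
  End If

  // Assumption: the bytes are legacy Windows Cyrillic.
  // DefineEncoding labels the bytes, ConvertEncoding re-encodes them.
  Var labelled As String = DefineEncoding(raw, Encodings.WindowsCyrillic)
  Return ConvertEncoding(labelled, Encodings.UTF8)
End Function
```

The distinction matters: DefineEncoding only tells Xojo how to interpret the existing bytes, while ConvertEncoding actually changes them.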
OK, I am trying with the text encoding. But I might need help here as well, as the following code doesn't improve anything:
Dim t As TextInputStream
t = TextInputStream.Open(f)
t.Encoding = Encodings.UTF8
TextArea1.StyledText.RTFData = t.ReadAll
t.Close
Is the file encoded as UTF-8 though?
Yes, it should be. If I open it in, say, Word, UTF-8 delivers the right results. Is there a way in Xojo to determine the encoding? The following code gives the strange result "134217984":
Str(t.Encoding.Code)
Xojo only knows what you tell it. You could assign the wrong encoding and it wouldn’t complain.
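To illustrate that point: 134217984 is &h08000100 in hexadecimal, which is simply the numeric code of the UTF-8 encoding that was assigned to the stream a line earlier. It says nothing about the bytes in the file. A small sketch that logs the code in a more readable form, with the method name LogEncoding being hypothetical and f assumed to be an existing text file:

```xojo
Sub LogEncoding(f As FolderItem)
  // f is assumed to point at an existing text file
  Var t As TextInputStream = TextInputStream.Open(f)
  t.Encoding = Encodings.UTF8

  // Both values describe the encoding that was just assigned,
  // not what the file actually contains
  System.DebugLog("Code (hex): " + Hex(t.Encoding.Code))
  System.DebugLog("Name: " + t.Encoding.InternetName)

  t.Close
End Sub
```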
Found it! It had to do with how I ran the loop. This code was wrong:
var i as integer
while rxm <> NIL
i = rxm.subexpressionstartB(0) + len(rxm.subexpressionstring(0))
rx.searchStartPosition = i
// do something
rxm = rx.search
wend
And this one (yours, Kem) works now:
while rxm isa RegExMatch
// do something
rxm = rx.search
wend
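A guess at why the first loop broke once Russian text was involved: it adds a byte offset (SubExpressionStartB) to a character count (Len). In UTF-8 a Cyrillic letter takes two bytes, so the sum lands short of the real end of the match. As far as I know, SearchStartPosition is a byte offset too. Calling rx.Search with no argument already continues after the previous match, so the position normally does not need to be set at all. If it has to be set by hand, a sketch with both terms in bytes might look like this (the sample text and pattern are made up):

```xojo
// Made-up two-line sample containing Cyrillic text
Var source As String = "11:48:50,000 быстрая лиса" + EndOfLine + "11:49:10,000 ленивая собака"

Var rx As New RegEx
rx.SearchPattern = "(\d{2}:\d{2}:\d{2},\d{3})([^\r\n]+)"

Var rxm As RegExMatch = rx.Search(source)
While rxm IsA RegExMatch
  // do something with rxm here

  // Byte offset of the match plus its length in bytes (LenB, not Len)
  rx.SearchStartPosition = rxm.SubExpressionStartB(0) + LenB(rxm.SubExpressionString(0))
  rxm = rx.Search
Wend
```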
The only thing that is not working yet is opening an RTF file in Russian.