Does anyone have a shareable clipping that would parse a string with a fixed field count and take into account that everything from a set field onward (the 6th in this case) should be processed as one field, even if it contains the separator?
@Michel Bujardet - true, but we don’t always control what gets input here, so those types of modifications at the generation point aren’t always available to us.
[quote=271337:@Tim Jones]It’s always 6 fields and we use | because the remainder of the potential separators are all valid characters in file names.
[/quote]
I tend to use low control characters, which aren’t legal in most file names and aren’t easy for users to type in, but which can still be read from a text or binary file.
[quote=271337:@Tim Jones]We’re still trying to determine how the users are able to add the | character as part of a path element, but that’s Apple …
One other thought would be a MidB InStrB replace of each of the first 5 ‘|’ characters with Chr(0) and then splitB on Chr(0).
Testing ideas right now.[/quote]
Something like that
We do that sort of thing in the IDE when reading a VCP manifest.
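For anyone following along, the Chr(0) idea from the quote above could be sketched roughly like this. It is an untested sketch, not code from the IDE: SplitFirstFive is a made-up name, and it assumes the line never contains a Chr(0) byte of its own. Because it works byte-wise, the text encoding of the returned pieces may need to be re-defined afterwards.

```xojo
Function SplitFirstFive(theLine As String) As String()
  // Hypothetical helper: swap only the first five "|" for Chr(0),
  // then split on Chr(0) so the sixth field keeps any "|" it contains.
  Dim work As String = theLine
  Dim pos As Integer = 0
  For i As Integer = 1 To 5
    pos = InStrB(pos + 1, work, "|") // next bar after the previous one
    If pos = 0 Then Exit For // fewer than five bars on this line
    work = LeftB(work, pos - 1) + Chr(0) + MidB(work, pos + 1)
  Next
  Return SplitB(work, Chr(0))
End Function
```

With a line such as "a|b|c|d|e|x|y" this should hand back six elements, the last one being "x|y".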
Okay, so using SplitB and then Joining each member above theArray(5) with the “|” character seems fastest on large sets (1,000,000+ lines).
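theArray = SplitB(theLine, "|")
If theArray.Ubound >= 5 Then // indexes 0 - 5 = 6 fields
  thePath = theArray(5) // the sixth field starts the path
  For x = 6 To theArray.Ubound // re-attach the pieces split off by a "|" inside the path
    thePath = thePath + "|" + theArray(x)
  Next
End If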
Tested on both 64bit and 32bit runs, this is faster than replacing the first 5 “|” characters with Chr(0).
I have verified that the “theEscapedPath” value is properly shell-escaped in each case.
If I run that with a sample of the expanded string against 1000 or so lines from a log in RegExRX, it works, but for some reason we are seeing isolated instances where it doesn’t get the right answers when compared with a manual parse of the log. Is it possible that the RegEx engine in Xojo is barfing on tens of millions of lines?
That will split each line by the bar into six parts. It won’t matter if the last part is all bars. The code would be something like this:
while not t.EOF
  dim oneLine as string = t.ReadLine
  dim match as RegExMatch = rx.Search( oneLine )
  if match is nil then
    continue // skip lines that don't match instead of ending the loop
  end if
  dim parts() as string
  redim parts( 5 )
  // subexpression 0 is the whole match; 1 through 6 are the fields
  for i as integer = 1 to match.SubExpressionCount - 1
    parts( i - 1 ) = match.SubExpressionString( i )
  next
  // Do something with parts
wend
(Untested pseudo-code.)
I have no idea if this is faster or better than what you’re doing, I merely offer it as an alternative.
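To make the loop above easier to try out, here is one way rx might be set up. The pattern is my own assumption of what a "six parts" pattern could look like, not necessarily the one posted in this thread, and the sample line is invented. Every bar in the pattern is escaped with a backslash except inside the [^|] character classes, where it is already literal.

```xojo
// Hypothetical setup: five bar-free fields, then everything else as field six.
Dim rx As New RegEx
rx.SearchPattern = "^([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)\|(.*)$"

Dim sample As String = "a|b|c|d|e|/Volumes/Data/odd|name.txt" // made-up line
Dim match As RegExMatch = rx.Search(sample)
If match <> Nil Then
  Dim parts() As String
  // Subexpression 0 is the whole match, so the fields are 1 through 6.
  For i As Integer = 1 To match.SubExpressionCount - 1
    parts.Append(match.SubExpressionString(i))
  Next
  // parts(5) now holds "/Volumes/Data/odd|name.txt"
End If
```

Since the first five groups cannot contain a bar, only the last group can swallow extra bars, which is the behavior described above.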
in RegExRX, but in the returned data from the Xojo RegEx, SubExpressionString(1) is always empty.
Also, it doesn’t appear that Xojo’s debugger allows us to examine the resulting RegEx results… (old version of Xojo - the project opened in 13r3.3 this morning)
You forgot to put the backslash before the bar in your pattern, so the bar is acting as the alternation operator. In the first match, there is no group because it never executes the alternate pattern. RegExRX is showing the same thing. The pattern you intended is: