Regex problem

Sometimes Xojo makes my head really hurt. I’ve got this old function from Joe Strout that splits a string by a regex:

[code]Protected Function SplitByRegEx(source As String, delimPattern As String) As String()

Dim out(-1) As String
Dim startPos As Integer
Dim re As New RegEx
re.SearchPattern = delimPattern
Dim rm As RegExMatch = re.Search(source)
while rm <> nil
out.Append MidB(source, startPos + 1, rm.SubExpressionStartB(0) - startPos)
startPos = re.SearchStartPosition
rm = re.Search
wend
if startPos < source.LenB then out.Append MidB(source, startPos + 1)
return out
End Function[/code]

Yesterday I saw this code fail for one single case. Since I’m using the code to split mails into their parts, the delimPattern string HAS to be there. Now the odd thing: when I build an example in Xojo 2013r2, the code works. When I change the code to use MBS, everything also works:

[code]Protected Function SplitByRegEx(source As String, delimPattern As String) As String()

Dim out(-1) As String
Dim re As New RegExMBS
if re.Compile(delimPattern) then
dim start as integer = 0
while re.Execute(source, start) > 0 and start < lenb(source)

dim p as integer = re.Offset(0)
out.Append midb(source, start + 1, p - start)
start = re.Offset(1)
wend
end if

return out
End Function[/code]

Delimiter patterns are usually quite simple, like: ------------070105050805040109060307|------------070105050805040109060307--
The source in my test case is:

[quote]This is a multi-part message in MIME format.
--------------070105050805040109060307
Content-Type: multipart/alternative;
boundary="------------040103000006050701000807"

--------------040103000006050701000807
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

Guten Tag[/quote]

Am I having a senior moment? Does anyone see what is wrong with my code?

Offhand, no (although getting lenb(source) on every pass is unnecessary and probably inefficient). I think we’d have to see the pattern and source text of the failed case to determine the cause of the failure.
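
To illustrate the point about lenb(source), here is a minimal sketch of the MBS version with the length computed once before the loop. It only reuses the RegExMBS calls already shown above (Compile, Execute, Offset) and assumes they behave as they do there; sourceLen is a name introduced just for this sketch.

[code]Protected Function SplitByRegEx(source As String, delimPattern As String) As String()

Dim out(-1) As String
Dim re As New RegExMBS
if re.Compile(delimPattern) then
// Computed once here instead of on every pass through the loop
dim sourceLen as integer = lenb(source)
dim start as integer = 0
while re.Execute(source, start) > 0 and start < sourceLen
dim p as integer = re.Offset(0)
out.Append midb(source, start + 1, p - start)
start = re.Offset(1)
wend
end if

return out
End Function[/code]

Like the MBS version above, this sketch does not append the text that follows the last match, whereas the RegEx version does so on its final line before returning.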

BTW, I have a SplitByRegEx in my M_String module too. Here is the code, if it helps:

Function SplitByRegEx_MTC(Extends src As String, pattern As String, includeMatch As Boolean = true) As String()
  // Splits a string using a regular expression.
  // Returns an empty array if there is a problem with the pattern
  
  dim r() as string
  
  if pattern = "" or src = "" then
    r = src.Split( "" )
  else
    dim needsConversion as boolean
    dim enc as TextEncoding = src.Encoding
    if enc <> Encodings.UTF8  then
      needsConversion = true
      src = src.ConvertEncoding( Encodings.UTF8 )
    end if
    
    pattern = pattern.ConvertEncoding( src.Encoding )
    
    dim curPos as integer = 0
    dim prevPos as integer = 1
    dim wasError as boolean
    
    // As of Real Studio 2012r2.1, the MBS RegEx is so much faster, it's not funny
    #if kUseMBSPlugins 
      
      static rxMBS as RegExMBS = new RegExMBS
      
      if rxMBS.Compile( pattern ) and rxMBS.Study() then
        dim offsetCount as integer = rxMBS.Execute( src, 0 )
        while offsetCount <> 0
          if includeMatch then
            curPos = rxMBS.Offset( 1 )
          else
            curPos = rxMBS.Offset( 0 )
          end if
          dim thisSeg as string = SegB( src, prevPos, curPos )
          if needsConversion then
            thisSeg = thisSeg.ConvertEncoding( enc )
          end if
          r.Append thisSeg
          
          prevPos = rxMBS.Offset( 1 ) + 1
          offsetCount = rxMBS.Execute( prevPos - 1 )
        wend
      else
        wasError = true
      end if
      
    #else
      
      static rx as RegEx
      if rx is nil then
        rx = new RegEx
        rx.Options.CaseSensitive = false
        rx.Options.DotMatchAll = false
        rx.Options.MatchEmpty = true
        rx.Options.Greedy = true
        rx.Options.TreatTargetAsOneLine = false
        rx.Options.LineEndType = 0
      end if
      rx.SearchPattern = pattern
      
      dim lastPos as integer
      try
        dim match as RegExMatch = rx.Search( src )
        while match <> nil
          curPos = match.SubExpressionStartB( 0 )
          lastPos = curPos + match.SubExpressionString( 0 ).LenB
          if includeMatch then
            curPos = lastPos
          end if
          dim thisSeg as string = SegB( src, prevPos, curPos )
          if needsConversion then
            thisSeg = thisSeg.ConvertEncoding( enc )
          end if
          r.Append thisSeg
          prevPos = lastPos + 1 
          match = rx.Search
        wend
      catch
        wasError = true
      end try
      
    #endif
    
    if not wasError then
      src = src.MidB( prevPos )
      if needsConversion then
        src = src.ConvertEncoding( enc )
      end if
      r.Append src
    end if
    
  end if 
  
  return r
  
End Function
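
A hypothetical call might look like the sketch below. It assumes the function lives in a module (which Extends requires) and that the kUseMBSPlugins constant and the SegB helper it relies on are defined elsewhere in that module, since neither is shown here. The sample string and pattern are made up for illustration.

[code]Dim line As String = "alpha, beta ,gamma"
// Split on commas with optional surrounding whitespace. Judging by the
// code above, passing False for includeMatch should leave the matched
// delimiters out of the returned segments.
Dim parts() As String = line.SplitByRegEx_MTC("\s*,\s*", False)
For Each part As String In parts
  System.DebugLog part
Next[/code]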

Hi Kem,

Thanks for your code. I’ll have a look tomorrow. Examples of the pattern and the start of the source text are at the bottom of my post.

Yes, I saw the examples, but is that the case that failed?