Regex Problem: Only First Instance Replaced

I need some help understanding why the following does not work with Xojo's RegEx engine. When I supply the same search and replace patterns in BBEdit or RegExRX, it works fine, but not in my Xojo code. The goal is to replace every blank line, and every line containing only white space, with “blank line”. In Xojo, however, only the first instance is replaced.

The first block is my code. The next block (someText) is the text I am processing, and the one after it (noBlank) is the text that is returned. (Both are plain text; I wrapped them in code formatting only because this forum otherwise strips leading spaces.)
I presume that I am making a silly mistake, but I cannot find it.

[code]Dim re As New RegEx
re.SearchPattern = "^\s*$"
re.ReplacementPattern = "blank line"
re.Options.ReplaceAllMatches = True
noBlank = re.Replace(someText)[/code]

someText

[code]# PROJECT_NAME: Italics
'''deal with italics in a document'''

import re

import sys
import codecs

from datetime import date
from datetime import datetime

from tkinter import filedialog
from tkinter import messagebox
import pathlib
import livDateStamp
import os

def positions_in_string(to_find, where_looking):
    '''returns a list of the positions of the substring in the string being searched. No overlap'''

    import re
    return [m.start() for m in re.finditer(to_find, where_looking)][/code]

noBlank

[code]# PROJECT_NAME: Italics.py
'''deal with italics in a document'''
blank line
import re

import sys
import codecs

from datetime import date
from datetime import datetime

from tkinter import filedialog #import tkinter and then using tkinter.filedialog does not work for some reason 05/13/19
from tkinter import messagebox
import pathlib
import livDateStamp
import os

def positions_in_string(to_find, where_looking):
    '''returns a list of the positions of the substring in the string being searched. No overlap'''

    import re
    return [m.start() for m in re.finditer(to_find, where_looking)][/code]

Looks like you found a bug in the native RegEx implementation of Replace. You should report it.

(Note: Replace is not part of the PCRE library; it is implemented separately in code. That’s also true in RegExRX.)
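To make that note concrete, here is a minimal Python sketch of a replace-all routine built on repeated single-match searches. It is an assumption about the general shape of such code, not Xojo's or RegExRX's actual implementation, and replace_all is a hypothetical name. The case that matters for a pattern like ^\s*$ is the zero-length match on an empty line: after such a match the loop must still move its search position forward. A routine that mishandles this could plausibly stop after the first match, which would fit the symptom here, but that is a guess. The sketch uses [ \t]* rather than \s*, because \s also matches line breaks and can let a single match run across several lines.

```python
import re

def replace_all(pattern: str, repl: str, text: str) -> str:
    """Replace every match of `pattern` in `text` with the literal `repl`.

    Hypothetical helper: it is built on repeated single-match searches,
    roughly the way a Replace routine layered over a match-only engine
    might be written.
    """
    regex = re.compile(pattern, re.MULTILINE)  # ^ and $ match at line boundaries
    out = []
    pos = 0       # where the next search starts
    last_end = 0  # end of the text already copied to `out`
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        out.append(text[last_end:m.start()])  # unchanged text before the match
        out.append(repl)
        last_end = m.end()
        # A zero-length match (e.g. an empty line) must still advance the
        # search, otherwise the same position would be found forever.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    out.append(text[last_end:])  # unchanged text after the last match
    return "".join(out)
```

For example, replace_all(r"^[ \t]*$", "blank line", someText) should mark every empty or whitespace-only line, giving the same result as Python's own re.sub with re.MULTILINE.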

The workaround is to add something like this:

noBlank = someText

// Apply the replacement repeatedly until the text stops changing
dim original as string
do
  original = noBlank
  noBlank = re.Replace(noBlank)
loop until noBlank = original

Clumsy, but it works.
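The same "repeat until nothing changes" idea, sketched in Python for comparison. replace_until_stable is a hypothetical helper. Python's re.sub already replaces every non-overlapping match in one pass, so the loop only matters when a pass can leave new matches behind, as when collapsing pairs of line breaks.

```python
import re

def replace_until_stable(pattern: str, repl: str, text: str) -> str:
    """Hypothetical helper: apply re.sub repeatedly until the text stops changing."""
    while True:
        updated = re.sub(pattern, repl, text)
        if updated == text:  # nothing left to replace: fixed point reached
            return updated
        text = updated


# One pass halves a run of line breaks: matches do not overlap, so six
# "\n" characters form three "\n\n" pairs, each replaced by a single "\n".
one_pass = re.sub(r"\n\n", "\n", "a\n\n\n\n\n\nb")
# Repeating the pass keeps shrinking the run until only one "\n" is left.
all_passes = replace_until_stable(r"\n\n", "\n", "a\n\n\n\n\n\nb")
```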

Or do something like this (untested):

// Normalize all line endings to LF so Split sees a single delimiter
someText = ReplaceLineEndings( someText, &uA )
dim arr() as string = someText.Split( &uA )

// Replace every empty or whitespace-only line
for index as integer = 0 to arr.Ubound
  if arr( index ).Trim = "" then
    arr( index ) = "blank line"
  end if
next

noBlank = Join( arr, EndOfLine )
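For comparison, the same line-by-line approach as a Python sketch. mark_blank_lines is a hypothetical name. Python's str.splitlines() splits on CR, LF and CR+LF (among other line boundaries), so it plays the role of ReplaceLineEndings plus Split here.

```python
def mark_blank_lines(text: str, marker: str = "blank line") -> str:
    """Hypothetical helper: replace each empty or whitespace-only line with `marker`."""
    # splitlines() accepts CR, LF and CR+LF, so no normalization step is needed.
    lines = text.splitlines()
    # strip() removes all surrounding whitespace, so "  \t" counts as blank.
    return "\n".join(marker if not line.strip() else line for line in lines)
```

One difference from the Xojo version: splitlines() drops a single trailing line break, whereas Split produces a final empty element that the loop then marks as a blank line.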

There are some other curious things (on a Mac):

[code]Dim doubleEndline As String
doubleEndline = EndOfLine + EndOfLine
Dim originalText, replacedText As String
originalText = "line1" + doubleEndline + "line2" + doubleEndline + doubleEndline + doubleEndline + "line3" + doubleEndline + "line4"

Dim re As New RegEx
re.SearchPattern = EndOfLine + EndOfLine // re.SearchPattern = "\n\n" behaves the same
re.ReplacementPattern = "\n"
re.Options.ReplaceAllMatches = True

replacedText = re.Replace(originalText)

Dim lastVersion As String
// Do this over and over until task complete
Do
  lastVersion = replacedText
  replacedText = re.Replace(lastVersion)
Loop Until replacedText = lastVersion[/code]

[code]line1

line2

line3

line4
[/code]
goes to this (getting you only partway to the goal):

[code]line1
line2

line3
line4[/code]

BUT

[code]Dim doubleEndline As String
doubleEndline = EndOfLine + EndOfLine
Dim originalText, replacedText As String
originalText = "line1" + doubleEndline + "line2" + doubleEndline + doubleEndline + doubleEndline + "line3" + doubleEndline + "line4"

Dim re As New RegEx
re.SearchPattern = EndOfLine + EndOfLine // re.SearchPattern = "\n\n" behaves the same
re.ReplacementPattern = EndOfLine
re.Options.ReplaceAllMatches = True

replacedText = re.Replace(originalText)

Dim lastVersion As String
// Do this over and over until task complete
Do
  lastVersion = replacedText
  replacedText = re.Replace(lastVersion)
Loop Until replacedText = lastVersion[/code]

[code]line1

line2

line3

line4[/code]

goes to this (what I am trying to achieve):

[code]line1
line2
line3
line4[/code]

It is a little curious, and hard for me to explain, why the first formulation gets you only part of the way.

Try this as the SearchPattern and let me know how it works:

\R{2,}

That matches any run of two or more consecutive line endings. \R matches any newline sequence, including CR, LF and CR+LF.
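Python's standard re module does not support \R, so a Python sketch of the same collapse has to spell out the line endings. This version, collapse_line_breaks (a hypothetical name), covers only CR+LF, CR and LF, a subset of what PCRE's \R matches. The (?!\n) lookahead keeps a single CR+LF from being counted as two breaks, imitating the way \R treats CR+LF as one unit.

```python
import re

# One line break: CR+LF, a CR not followed by LF, or LF.
# (A subset of PCRE's \R, which also matches VT, FF, NEL and U+2028/U+2029.)
LINE_BREAK = r"(?:\r\n|\r(?!\n)|\n)"

def collapse_line_breaks(text: str, eol: str = "\n") -> str:
    """Hypothetical helper: replace every run of two or more line breaks with `eol`."""
    return re.sub(LINE_BREAK + "{2,}", eol, text)
```

Unlike the pair-by-pair replacement, one pass is enough here, because {2,} consumes the whole run of line breaks in a single match.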

It works well and is cleaner and shorter than my formulation. I also appreciate learning this syntax, which I expect to use a lot in other contexts.

line1
line2
line3
line4