String manipulation for cleaning words

Good evening,
I am reading a text file.
I have lines with words enclosed in special characters: /* WORD1 */ WORD2 /* WORD3 */.

Is it possible to remove the special characters and the words inside them (WORD1 and WORD3) to get WORD2 cleanly?

Examples:

/* RRR / 1934 / Yellow */ → 1934
/*Brown / buldog1976 / row 23 */ ----> buldog1976

Thanks.

If you are sure of the format, you can use String.Split:

Var aStrings() as String
aStrings = MyLine.Split( "/" )

aStrings will now contain an array of the parts of the string:

/* RRR / 1934 / Yellow */ → 1934

would result in:

aStrings( 0 ) = ""
aStrings( 1 ) = "* RRR "
aStrings( 2 ) = " 1934 "
aStrings( 3 ) = " Yellow *"
aStrings( 4 ) = ""

YourAnswer = aStrings( 2 ).Trim ' which would remove the spaces

Use a RegEx with the SearchPattern (?U)/\*[\s\S]*\*/ and an empty ReplacementPattern. Be sure to set Options.ReplaceAllMatches to True.

When posting sample text in the future, please use backticks around it so characters like * don’t get interpreted.

I think the forum software ate the * characters.


I can see * at the start and end.

Yes, but I think it’s supposed to be between the words too. Instead, we get italics.

Ah, I see. Split would still do the job, without the need to add RegEx to the application:

Var aStrings() as String
aStrings = MyLine.Split( "*/*" )

These are (what I think are) his examples:

/* RRR */ 1934 /* Yellow */ → 1934
/*Brown */ buldog1976 /* row 23 */ ----> buldog1976

Split won’t work here.


Although Ian did add the caveat “if you are sure of the format”: if that is taken to mean you ALWAYS discard the same indexes after the Split() operation, it would still “work”. That said, RegEx is the ideal solution here in terms of flexibility and speed. A problem like this just begs for a RegEx solution, IMHO.

As others have said, regular expressions are your friend here.

Var re As New RegEx
Var match As RegExMatch
Var linesToSearch() As String = Array("/* RRR */ 1934 /* Yellow */", "/*Brown */ buldog1976 /* row 23 */")

' Match the shortest run of text that follows "*/" plus a space and precedes a space plus "/*"
re.SearchPattern = "(?<=\*/\s).+?(?=\s/\*)"
For Each lineToSearch As String In linesToSearch
  match = re.Search(lineToSearch)
  If match <> Nil Then
    System.DebugLog(match.SubExpressionString(0)) ' logs 1934, then buldog1976
  End If
Next

Yes, it would with "/", my original code without modification, as long as the intended value doesn't contain a "/".
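To see why this works, here is a minimal sketch (not Ian's posted code) that assumes the corrected format shown in the examples above, with neither the value nor the first comment containing a "/". Splitting on "/" then leaves the wanted value at index 2. The names samples and results are illustrative only.

```xojo
' Assumes every line looks like:  /* comment */ value /* comment */
' and that neither the value nor the first comment contains a "/".
Var samples() As String = Array("/* RRR */ 1934 /* Yellow */", "/*Brown */ buldog1976 /* row 23 */")
Var results() As String

For Each sample As String In samples
  ' "/* RRR */ 1934 /* Yellow */" splits into:
  '   (0) ""   (1) "* RRR *"   (2) " 1934 "   (3) "* Yellow *"   (4) ""
  Var parts() As String = sample.Split("/")
  If parts.LastIndex >= 2 Then
    results.Add(parts(2).Trim) ' Trim drops the spaces around the value
  End If
Next

For Each value As String In results
  System.DebugLog(value) ' logs 1934, then buldog1976
Next
```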

Looks like his intention is to remove C-like comments. If that is the case, splitting and looking at a fixed position could lead to errors. A regular expression that finds "/* any content, including none */" and discards each match seems the way to go.
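A small illustration of that risk, using a hypothetical input line whose first comment happens to contain a "/": the fixed-index Split picks up part of the comment, while removing every /* ... */ match with the (?U)/\*[\s\S]*\*/ pattern from this thread and trimming the remainder still yields the value. This is a sketch for comparison, not code posted in the thread.

```xojo
' Hypothetical line: the first comment contains a "/"
Var sample As String = "/* Brown / white */ buldog1976 /* row 23 */"

' Fixed-index Split: the extra "/" shifts everything, so index 2 is " white *"
Var parts() As String = sample.Split("/")
Var bySplit As String = parts(2).Trim

' RegEx: delete every /* ... */ comment, then trim the spaces that remain
Var rx As New RegEx
rx.SearchPattern = "(?U)/\*[\s\S]*\*/" ' (?U) makes * lazy, so each match ends at the first */
rx.ReplacementPattern = ""
rx.Options.ReplaceAllMatches = True
Var byRegex As String = rx.Replace(sample).Trim

System.DebugLog("Split: " + bySplit) ' Split: white *
System.DebugLog("RegEx: " + byRegex) ' RegEx: buldog1976
```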


In order for your solution to work, none of the unintended values that precede it may contain a “/” either. Even my regex example begins to fall apart pretty quickly when the inputs deviate from what we assume… and the person who started this thread hasn’t been back yet to clarify, so we are left guessing.

One thing is certain though… I can read your code and intuit its intended function. It’s been only 3 hours since I posted it, and already I cannot glance at my regex sample and intuitively know, specifically, what it is supposed to do.

But hopefully we’ve given the OP some food for thought.
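One way to keep a pattern like the lookaround one above readable later is to assemble it from small pieces, each with its own comment. This sketch is an idea rather than something posted in the thread; it builds exactly the same pattern string, "(?<=\*/\s).+?(?=\s/\*)", and runs it on one of the examples.

```xojo
' Build the pattern piece by piece so each part carries its own explanation
Var pattern As String
pattern = "(?<=\*/\s)"          ' lookbehind: the value must follow "*/" and one whitespace character
pattern = pattern + ".+?"       ' the value itself: one or more characters, as few as possible
pattern = pattern + "(?=\s/\*)" ' lookahead: the value must be followed by one whitespace character and "/*"

Var re As New RegEx
re.SearchPattern = pattern

Var found As String
Var match As RegExMatch = re.Search("/*Brown */ buldog1976 /* row 23 */")
If match <> Nil Then
  found = match.SubExpressionString(0)
  System.DebugLog(found) ' buldog1976
End If
```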

I think @Kem_Tekinay could make a few easy bucks with a “PayPal me $20 and I’ll write a Regex for you” service 🙂 I love what Regex can do, but just looking at it makes my head hurt.


In that case, I highly recommend trying his RegExRX app: download the shareware version from his website or buy it in the Mac App Store. His download page also has numerous other downloads, including some freeware, that are incredibly useful.

Even if you never learn the more advanced features of RegEx, what you can do with relatively straightforward pattern matching is INCREDIBLY useful and not hard to “read” once you know a few basics. And its performance, compared to coding your own searches, is astounding.


Thanks.

FYI, the MAS (Mac App Store) version will also work as a license for the shareware version; it just has to be somewhere on your drive.

I’ve always said that RegEx is a write-only language. In other words, you can create very elegant solutions with it, but you can never understand or maintain them afterwards. 🙂

If there is any doubt about the format, RegEx will certainly stand up to it better. As I said in my original reply, if you are absolutely certain of the format, mine is simple and lightweight.

It’s a little too advanced for me; would you have an example?
Is it the code suggested by Chay Wesley?

Hi Chay Wesley, I tried your code and it works… now I just have to figure out how to read from the text file, but I think I can do it.
I'll try tonight.

Thanks anyway to everyone for the answers.

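var rx as new RegEx
rx.SearchPattern = "(?U)/\*[\s\S]*\*/" ' (?U) = ungreedy; [\s\S] matches any character, including line breaks
rx.ReplacementPattern = "" ' replace each comment with nothing
rx.Options.ReplaceAllMatches = true ' remove every comment, not just the first one

yourText = rx.Replace( yourText ) ' yourText is the String holding your line or file contents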

This deletes every comment delimited by /* and */ in your text. The spaces around each removed comment remain, so call .Trim on the result if you want just the value.
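For the remaining step of reading the text file, here is a minimal sketch that applies this RegEx to each line using TextInputStream. The file name input.txt on the Desktop is a hypothetical placeholder and UTF-8 is an assumed encoding; adjust both for your own file. Each result is trimmed because removing the comments leaves the surrounding spaces (for example " 1934 ").

```xojo
' Hypothetical location: replace with the FolderItem for your own file
Var f As FolderItem = SpecialFolder.Desktop.Child("input.txt")

Var rx As New RegEx
rx.SearchPattern = "(?U)/\*[\s\S]*\*/" ' any /* ... */ comment, shortest match
rx.ReplacementPattern = ""
rx.Options.ReplaceAllMatches = True

Var values() As String
Try
  Var stream As TextInputStream = TextInputStream.Open(f)
  stream.Encoding = Encodings.UTF8 ' assumption: the file is UTF-8
  While Not stream.EndOfFile
    Var cleaned As String = rx.Replace(stream.ReadLine).Trim
    If cleaned <> "" Then values.Add(cleaned) ' skip lines that held only comments
  Wend
  stream.Close
Catch err As IOException
  System.DebugLog("Could not read the file: " + err.Message)
End Try

For Each value As String In values
  System.DebugLog(value)
Next
```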