Removing part of a string

Oliver_Scott-Brown · May 16, 2014, 10:13pm

I am using a MemoryBlock for performance-critical string manipulation: I store the string in a MemoryBlock and then read a particular byte to check for a specific character. What would be the most efficient way of removing part of a string, given the starting index and either the end index or a count (from the start index)? For example, I might write it like this:

dim s as string = "Hello World!"
return s.Remove(3, 3) //removes 'llo' from the string to make 'He World!'

Thanks

Oliver_Scott-Brown · May 16, 2014, 10:17pm

I should have probably mentioned that in some cases I will only remove one character. So I might want to use something like:

s.Remove(3) //removes only 'l' to make 'Helo World!'

I don’t know if it is easy to deal with this problem efficiently, but I am sure I could put this remove idea into a function. I am just wondering what the most efficient way is.

Thanks

Michel_Bujardet · May 16, 2014, 10:31pm

You are using a MemoryBlock. Why not simply use MemoryBlock.LeftB() and MemoryBlock.RightB()?

Oliver_Scott-Brown · May 16, 2014, 10:33pm

Like this:
mb = mb.LeftB(index).RightB(index + 1)
?

Oliver_Scott-Brown · May 16, 2014, 10:35pm

[quote=88788:@Oliver Scott-Brown]Like this:
mb = mb.LeftB(index).RightB(index + 1)
?[/quote]
Sorry, I am tired. This won’t be right.

Norman_P · May 16, 2014, 10:41pm

This is a really bad idea, especially when working with UTF data, where a single character can require more than a single byte.
You can actually split a character apart doing this and wreck your data.

Oliver_Scott-Brown · May 16, 2014, 10:43pm

[quote=88792:@Norman Palardy]This is a really bad idea, especially when working with UTF data, where a single character can require more than a single byte.
You can actually split a character apart doing this and wreck your data.[/quote]
So the alternative is to convert to a string?

Michel_Bujardet · May 16, 2014, 10:54pm

[quote=88788:@Oliver Scott-Brown]Like this:
mb = mb.LeftB(index).RightB(index + 1)
?[/quote]

I was just testing the idea.

Norman is right. A MemoryBlock for strings may be OK for unaccented characters, but it gets wacky when accented or other multi-byte Unicode characters (UTF-8) are used. For instance, if I enter “Hellô World!” and try to cut three bytes, it makes “He? World!” instead of “He World!”.

You’d better stick to good ol’ Left() and Right() string manipulation. Or Mid().

Norman_P · May 16, 2014, 10:56pm

If you expect to manipulate “characters”, then you should use a string and its NON-binary methods (Left, Mid, Right, etc.).
Anything else on UTF-8 data may break any UTF-8 character that requires more than a single byte to represent.

The Unicode code point for “€” is U+20AC.
Encoded in UTF-8, it takes three bytes: E2 82 AC.
If you split those three bytes apart, you’ve broken the “string” and won’t get a euro symbol.
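
To make the euro example concrete, here is a minimal sketch of the difference between the character-based and byte-based functions. It assumes the string is UTF-8 encoded (the default for string literals); the variable names are only for illustration.

```xojo
// Build a UTF-8 string containing the euro sign (U+20AC, bytes E2 82 AC).
Dim euro As String = &u20AC
Dim s As String = "Price: " + euro + "5"

// Len counts characters, LenB counts bytes.
Dim charCount As Integer = Len(s)    // 9 characters
Dim byteCount As Integer = LenB(s)   // 11 bytes: 7 + 3 + 1

// Left cuts on a character boundary, so the euro sign survives.
Dim intact As String = Left(s, 8)    // "Price: " followed by the euro sign

// LeftB cuts on a byte boundary: 7 bytes of "Price: " plus only E2,
// the first of the three euro bytes, so the result is no longer valid UTF-8.
Dim broken As String = LeftB(s, 8)
```

The same reasoning applies to MidB, RightB, and the corresponding MemoryBlock methods: they are safe only when every character is known to occupy exactly one byte.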

DaveS · May 17, 2014, 12:01am

to answer the other part of your question

define the signature like this

remove(start as integer,count as integer=1)

instead of how you are most likely doing it

remove(start as integer,count as integer)

of course replace “start” and “count” with the actual variable names you are using

but this allows the 2nd value to be optional, and if not supplied defaults to a value of 1

Kem_Tekinay · May 17, 2014, 12:05am

Instead of a MemoryBlock, convert your string to an array using Split. Then you can examine each character at a time and remove it as needed. Multi-byte characters will be kept intact. When ready, convert back to a string using Join.

Will_Shank · May 17, 2014, 12:25am

Combining Dave’s and Kem’s ideas, and putting the method in a module as an extension…

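[code]Function remove(extends s As String, start As Integer, count As Integer = 1) As String
  // start is 1-based; the array returned by Split is 0-based
  start = start - 1
  // Splitting on "" yields one element per character, so multi-byte characters stay intact
  Dim sa() As String = s.Split("")
  // Removing at the same index count times deletes count consecutive characters
  For i As Integer = 1 To count
    sa.Remove(start)
  Next
  Return Join(sa, "")
End Function

Dim s As String = "Hello World!"
MsgBox s.remove(3, 3) // shows "He World!"[/code]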
Needs bounds checking

Jim_Shaffer · May 17, 2014, 2:16am

Joe Strout’s StringUtils has quite an array of functions.

http://www.strout.net/info/coding/rb/intro.html

Jim

Kem_Tekinay · May 17, 2014, 2:54am

Will, as a one-off, I’d expect that to be relatively slow with all the operations involved. My solution was based on a need to examine all the characters, but if that’s not necessary, a solution that uses Left + Right would be best.

I included such a function in my M_String library as s.Delete_MTC( pos, length ).

However, what occurs to me, since the question was about the most efficient way to do this, is to simply use Replace.

s = s.Replace( "llo", "" )

If characters have to be examined first to see if the replacement should happen, then a regular expression, if applicable, would be more efficient than any of the other solutions here.
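
As a rough sketch of the regular-expression route, the snippet below removes “llo” only when it is directly followed by “ World”. The pattern and the sample text are made up for illustration; the real condition depends on what has to be examined in the actual data.

```xojo
Dim re As New RegEx
// Hypothetical condition: match "llo" only if " World" comes right after it.
// The lookahead (?= ...) checks the following text without consuming it.
re.SearchPattern = "llo(?= World)"
re.ReplacementPattern = ""

// Only the first match is replaced unless ReplaceAllMatches is turned on.
// re.Options.ReplaceAllMatches = True

Dim result As String = re.Replace("Hello World! Hello there!")
// result is "He World! Hello there!"
```

Because the condition lives in the pattern, the second “Hello” is left alone, which a plain Replace or ReplaceAll cannot guarantee.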

DaveS · May 17, 2014, 5:06am

Why not a simple combo of

s=left(s,start-1)+mid(s,start+count)

needs a little extra… but that should be the basic idea…
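
One way the “little extra” might look, as a sketch only: the Left + Mid expression wrapped in an extension method with basic bounds checks. The name RemoveChars is hypothetical, start is assumed to be 1-based, and out-of-range arguments simply return the string unchanged, which may not be the behaviour you want.

```xojo
// Hypothetical extension method; it has to live in a module.
Function RemoveChars(Extends s As String, start As Integer, count As Integer = 1) As String
  // Invalid or out-of-range arguments: nothing to remove.
  If start < 1 Or count < 1 Or start > Len(s) Then Return s
  // Keep everything before start, then everything after the removed run.
  // Mid returns "" when its start position is past the end of the string,
  // so a count that runs off the end just truncates.
  Return Left(s, start - 1) + Mid(s, start + count)
End Function

Dim s As String = "Hello World!"
MsgBox s.RemoveChars(3, 3) // shows "He World!"
MsgBox s.RemoveChars(3)    // shows "Helo World!"
```

Left and Mid work on characters rather than bytes, so multi-byte UTF-8 characters are not split.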

Richard_Duke · May 17, 2014, 6:53am

[quote=88821:@Kem Tekinay]Will, as a one-off, I’d expect that to be relatively slow with all the operations involved. My solution was based on a need to examine all the characters, but if that’s not necessary, a solution that uses Left + Right would be best.

I included such a function in my M_String library as s.Delete_MTC( pos, length ).

However, what occurs to me, since the question was about the most efficient way to do this, is to simply use Replace.

s = s.Replace( "llo", "" )

If characters have to be examined first to see if the replacement should happen, then a regular expression, if applicable, would be more efficient than any of the other solutions here.[/quote]

I use Replace or ReplaceAll most of the time, unless the string has multiple lines, such as a list, and I don’t want to end up with an empty line.

Oliver_Scott-Brown · May 17, 2014, 11:04am

Ignoring the ‘Hello World!’ example, I cannot use ReplaceAll as I am looking for specific parts of a string and there could be interferences with ReplaceAll (and replace). But thanks anyway.

Kem_Tekinay · May 17, 2014, 12:35pm

What about a regular expression?

Oliver_Scott-Brown · May 17, 2014, 1:35pm

Sounds good. Thanks