Removing part of a string

I am using a MemoryBlock for performance-critical string manipulation. I store the string in a MemoryBlock and then grab a certain byte to check for a specific character. What would be the most efficient way to remove a specific part of a string, given the starting index and either the end index or a count (from the start index)? For example, I might write it like this:

dim s as string = "Hello World!"
return s.Remove(3, 3) //removes 'llo' from the string to make 'He World!'

Thanks

I should have probably mentioned that in some cases I will only remove one character. So I might want to use something like:

s.Remove(3) //removes only 'l' to make 'Helo World!'

I don’t know if it is easy to deal with this problem efficiently, but I am sure I could put this remove idea into a function. I am just wondering what the most efficient way would be.

Thanks

You are using a MemoryBlock. Why not simply use MemoryBlock.LeftB() and MemoryBlock.RightB()?

Like this:
mb = mb.LeftB(index).RightB(index + 1)
?

[quote=88788:@Oliver Scott-Brown]Like this:
mb = mb.LeftB(index).RightB(index + 1)
?[/quote]
Sorry, I am tired. This won’t be right.

This is a really bad idea especially when working with UTF data that can actually require more than a single byte to represent a single character
You can actually split a character apart doing this and wreck your data

[quote=88792:@Norman Palardy]This is a really bad idea especially when working with UTF data that can actually require more than a single byte to represent a single character
You can actually split a character apart doing this and wreck your data[/quote]
So the alternative is to convert to a string?

[quote=88788:@Oliver Scott-Brown]Like this:
mb = mb.LeftB(index).RightB(index + 1)
?[/quote]

I was just testing the idea.

Norman is right. A MemoryBlock may be OK for strings with non-accented characters, but it gets wacky when accented or other multi-byte Unicode characters (UTF-8) are used. For instance, if I enter “Hellô World!” and try to cut three bytes, I get “He? World!” instead of “He World!”, because the cut removes only the first byte of the two-byte “ô”.

You’d better stick to good ol’ Left() and Right() string manipulation. Or Mid().

If you expect to manipulate “characters” then you should use a string and its NON-binary methods (Left, Mid, Right, etc.).
Anything else on UTF-8 data may break any UTF-8 character that requires more than a single byte to represent.

The Unicode code point for “€” is U+20AC.
When you look at the bytes it takes to encode this in UTF-8 it is three bytes E2 82 AC.
If you split those three bytes apart you’ve now broken the “string” and won’t get a euro symbol
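
To make the byte/character difference concrete, here is a minimal sketch in Xojo (classic framework). It assumes the string literal is UTF-8, which is the default for literals in Xojo source; the variable names are only for illustration.

```xojo
Dim s As String = "€"                    // string literals in source code are UTF-8

Dim charCount As Integer = Len(s)        // 1: counts characters
Dim byteCount As Integer = LenB(s)       // 3: counts bytes
Dim hex As String = EncodeHex(s, True)   // "E2 82 AC"

// LeftB works on bytes, so it can cut the character in the middle.
// "broken" holds only E2 82, which is not a valid UTF-8 sequence.
Dim broken As String = LeftB(s, 2)

// Left works on characters, so the euro symbol stays whole.
Dim whole As String = Left(s, 1)
```

The same applies to MidB and RightB versus Mid and Right: the B variants index bytes, the others index characters.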

To answer the other part of your question, define the signature like this:

remove(start as integer,count as integer=1)

instead of how you are most likely doing it:

remove(start as integer,count as integer)

Of course, replace “start” and “count” with the actual variable names you are using.

This makes the second parameter optional; if it is not supplied, it defaults to a value of 1.

Instead of a MemoryBlock, convert your string to an array using Split. Then you can examine each character at a time and remove it as needed. Multi-byte characters will be kept intact. When ready, convert back to a string using Join.

Combining Dave’s and Kem’s ideas, and putting the method in a module so it can extend String…

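[code]Function remove(extends s As String, start as integer, count As integer = 1) As String
// start is 1-based; convert it to a 0-based array index
start = start - 1
// splitting on "" yields one element per character, so multi-byte characters stay intact
dim sa() As String = s.Split("")
for i As integer = 1 to count
sa.Remove(start) // later elements shift down, so the index stays the same
next
return Join(sa, "")
End Function

dim s As String = "Hello World!"
MsgBox s.remove(3, 3) // shows "He World!"[/code]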
Needs bounds checking

Joe Strout’s StringUtils has quite an array of functions.

http://www.strout.net/info/coding/rb/intro.html

Jim

Will, as a one-off, I’d expect that to be relatively slow with all the operations involved. My solution was based on a need to examine all the characters, but if that’s not necessary, a solution that uses Left + Right would be best.

I included such a function in my M_String library as s.Delete_MTC( pos, length ).

However, what occurs to me, since the question was about the most efficient way to do this, is to simply use Replace.

s = s.Replace( "llo", "" )

If characters have to be examined first to see if the replacement should happen, then a regular expression, if applicable, would be more efficient than any of the other solutions here.

Why not a simple combo of

s=left(s,start-1)+mid(s,start+count)

needs a little extra… but that should be the basic idea…
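
For what it's worth, the "little extra" could look something like the sketch below. It is only one way to do it: the name RemoveChars is made up for illustration, the method has to live in a module because it uses Extends, and returning the string unchanged for out-of-range arguments is an assumption rather than anything agreed on in this thread.

```xojo
Function RemoveChars(Extends s As String, start As Integer, count As Integer = 1) As String
  // start is 1-based, matching Left and Mid.
  // Assumed behavior: out-of-range arguments leave the string unchanged.
  If start < 1 Or count < 1 Or start > Len(s) Then
    Return s
  End If

  // Keep everything before start, then everything after the removed run.
  // Mid returns "" when its start position is past the end of the string,
  // so a count that runs off the end simply truncates.
  Return Left(s, start - 1) + Mid(s, start + count)
End Function
```

Usage:

```xojo
Dim s As String = "Hello World!"
Dim a As String = s.RemoveChars(3, 3)   // "He World!"
Dim b As String = s.RemoveChars(3)      // "Helo World!"
```

Left and Mid count characters rather than bytes, so multi-byte UTF-8 characters are not split.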

[quote=88821:@Kem Tekinay]Will, as a one-off, I’d expect that to be relatively slow with all the operations involved. My solution was based on a need to examine all the characters, but if that’s not necessary, a solution that uses Left + Right would be best.

I included such a function in my M_String library as s.Delete_MTC( pos, length ).

However, what occurs to me, since the question was about the most efficient way to do this, is to simply use Replace.

s = s.Replace( "llo", "" )

If characters have to be examined first to see if the replacement should happen, then a regular expression, if applicable, would be more efficient than any of the other solutions here.[/quote]

I use Replace or ReplaceAll most of the time, unless the string has multiple lines, such as a list, and I don’t want to be left with an empty line.

Ignoring the ‘Hello World!’ example, I cannot use ReplaceAll as I am looking for specific parts of a string and there could be interferences with ReplaceAll (and replace). But thanks anyway.

What about a regular expression?
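
As a rough sketch of the idea, using Xojo's built-in RegEx class: the pattern below is purely hypothetical (it removes "llo" only when it directly follows "He"), since the real condition depends on what you need to check before removing.

```xojo
Dim re As New RegEx
re.SearchPattern = "(?<=He)llo"         // hypothetical: "llo" only when preceded by "He"
re.ReplacementPattern = ""              // replace the match with nothing
re.Options.ReplaceAllMatches = True     // handle every match, not just the first

Dim result As String = re.Replace("Hello World! llo")
// The first "llo" follows "He" and is removed; the trailing "llo" does not and is kept.
```

The lookbehind keeps the surrounding text out of the match, so only the part you want removed is replaced, which avoids the interference you would get from a plain ReplaceAll.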

Sounds good. Thanks