Duplicate text in Strings

Hello,

I was wondering if anybody had some code dealing with duplicate text in strings. Let's say I have a string like so:

“(TT-AA,T5,65):(cc):(77,65,TJQ)”

So in this example there are two occurrences of 65 in the string and I want to remove one of them. Any ideas on how to do this would be appreciated.

Here is one way to handle this based on the example above. There are a bunch of other ways to accomplish this as well.

Dim theData As String = "(TT-AA,T5,65):(cc):(77,65,TJQ)"
// Removes every ",65" in the string, not only the duplicate
theData = theData.ReplaceAll(",65", "")

This is of course assuming you are looking for “known” strings, and that you want to remove ALL of them, not just the “duplicate”.
If the OP wants to figure out that “65” is the duplicate, then it gets a bit more complex:

split into an array
analyze the array, remove dups
rejoin the array back into a string

A lot depends on what the result needs to be…

(TT-AA,T5,65):(cc):(77,TJQ) ?

Which is the duplicate? The first or second occurrence?
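
For illustration, the split / remove dups / rejoin steps could look roughly like the sketch below. RemoveDuplicateTokens is a hypothetical helper name, not something from this thread, and the sketch assumes the groups are always wrapped in parentheses, separated by ":" and that the tokens inside a group are separated by ",". It keeps the first occurrence of each token and drops later ones.

```xojo
Function RemoveDuplicateTokens(source As String) As String
  // Hypothetical helper: assumes input shaped like "(a,b):(c):(d,e)"
  Dim seen() As String   // every token kept so far, across all groups
  Dim kept() As String   // tokens kept for the current group
  Dim groups() As String = Split(source, ":")
  
  For g As Integer = 0 To groups.Ubound
    ReDim kept(-1) // start each group with an empty list
    
    // Strip the parentheses, then split the group on commas
    Dim inner As String = groups(g).ReplaceAll("(", "").ReplaceAll(")", "")
    Dim tokens() As String = Split(inner, ",")
    
    For Each token As String In tokens
      If seen.IndexOf(token) = -1 Then
        // First time this token shows up: keep it
        seen.Append(token)
        kept.Append(token)
      End If
    Next
    
    groups(g) = "(" + Join(kept, ",") + ")"
  Next
  
  Return Join(groups, ":")
End Function
```

With the sample string this should return (TT-AA,T5,65):(cc):(77,TJQ). One caveat: as far as I know, IndexOf compares strings without regard to case, so if case matters for your data you would need a case-sensitive comparison instead.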

Curiosity kills me. What does it stand for?

Spot on Dave!

Since this is taking a deeper turn: if you can’t find a common delimiter to split into an array, then you could use RegEx find/replace against the flat string. This could give you more flexibility.

Hey. Thanks for the quick replies!

This would be for unknown duplicates, which is why it was stumping me. OK, so I need to split it into an array to do this. Not exactly sure how to do that, but I will figure it out. I have split strings before, but they had line breaks in them, so it was quite a bit different than this.

This is for a poker-related program I am working on. So in that example those are poker hands that have “unions” or AA with two clubs. It does not matter if the occurrence is the first or second one. I can remove the “,” with a ReplaceAll after I join the string back together, so that is not a problem.

Mike, do you have a delimiter that is consistent in your output that you can use as the split point? If they are line breaks, then that is easy, as you can use EndOfLine as the delimiter.

Post some code snippets?

This cries out for a regular expression…

This pattern will identify any string that is made up of two or more letters or numbers where the identical combination occurs later in the text.

([A-Z0-9]{2,})(?=[\s\S]*\g1)
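
If you want to see what the pattern picks up before replacing anything, a minimal sketch like the following could help. It assumes the classic RegEx and RegExMatch classes and simply logs each match. The variable names are made up for this example.

```xojo
Dim rx As New RegEx
rx.Options.CaseSensitive = True
rx.SearchPattern = "([A-Z0-9]{2,})(?=[\s\S]*\g1)"

// First search starts at the beginning of the string
Dim match As RegExMatch = rx.Search("(TT-AA,T5,65):(cc):(77,65,TJQ)")

While match <> Nil
  // Subexpression 1 is the text that occurs again later on
  System.DebugLog(match.SubExpressionString(1))
  // Calling Search with no arguments continues after the previous match
  match = rx.Search()
Wend
```

On the sample string this should log only the first 65, since that is the only token followed by an identical copy. Note that the pattern has no notion of token boundaries, so a short token that also appears inside a longer one later in the string would match as well.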

Here is an example function that takes your string input, uses regular expression replacement, and returns the final result string.

I am replacing the matched pattern with an empty string (""), so change that as needed.

Function theParser(InputString As String) As String
  // Removes every match that occurs again later in the string
  Dim theReplacement_RegEx As RegEx
  theReplacement_RegEx = New RegEx
  theReplacement_RegEx.Options.CaseSensitive = False // Do you need case?
  theReplacement_RegEx.Options.ReplaceAllMatches = True
  theReplacement_RegEx.SearchPattern = "([A-Z0-9]{2,})(?=[\s\S]*\g1)"
  theReplacement_RegEx.ReplacementPattern = ""
  Dim New_String As String = theReplacement_RegEx.Replace(InputString)
  Return New_String
End Function

Hey, that worked perfectly! All I had to do was set the CaseSensitive option to True and do some ReplaceAll calls to fix the leftover commas. Thanks for all your help!
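
The exact cleanup calls were not posted, so the following is only a guess at what the comma fixes might look like. It assumes the RegEx replacement left a dangling comma where the duplicate used to be, as it does with the sample string.

```xojo
// Result of the RegEx replacement on the sample string
Dim cleaned As String = "(TT-AA,T5,):(cc):(77,65,TJQ)"

// A token removed from the middle of a group leaves ",,"
While cleaned.InStr(",,") > 0
  cleaned = cleaned.ReplaceAll(",,", ",")
Wend

// A token removed from the start or end of a group leaves "(," or ",)"
cleaned = cleaned.ReplaceAll("(,", "(")
cleaned = cleaned.ReplaceAll(",)", ")")

// cleaned is now "(TT-AA,T5):(cc):(77,65,TJQ)"
```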

Kem had done the hard part: making that pattern :) Glad it worked out!

Just wondering … wouldn't it be possible to replace those strings with a data structure where the issue of duplicates couldn't arise in the first place, such as a Dictionary? That might be faster than weeding out duplicates later on.
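
A rough sketch of that idea, assuming the hands are available one at a time before the string is built (the candidates array below is made up for the example): storing each hand as a Dictionary key means a repeated hand just overwrites the existing entry.

```xojo
Dim hands As New Dictionary
Dim candidates() As String = Array("TT-AA", "T5", "65", "cc", "77", "65", "TJQ")

For Each hand As String In candidates
  // Assigning to an existing key overwrites it, so no duplicate is stored
  hands.Value(hand) = True
Next

// hands.Count is 6 here: the second "65" did not add a new entry
```

As far as I know, String keys in a Dictionary are compared without regard to case, so this would need adjusting if upper and lower case hands must be kept apart.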