Duplicate text in Strings

Michael_Martz · February 11, 2015, 3:40pm

Hello,

I was wondering if anybody had some code for dealing with duplicate text in strings. Let's say I have a string like so:

“(TT-AA,T5,65):(cc):(77,65,TJQ)”

So in this example there are two occurrences of 65 in the string and I want to remove one of them. Any ideas on how to do this would be appreciated.

Mike_Cotrone · February 11, 2015, 3:43pm

Here is one way to handle this, based on the example above. There are a bunch of other ways to accomplish this, too.

Dim theData as String = "(TT-AA,T5,65):(cc):(77,65,TJQ)"
thedata = thedata.ReplaceAll(",65","")
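
Worth noting: ReplaceAll removes every occurrence of ",65", so both copies go. If the token is known in advance and only one copy should be dropped, Replace (which stops after the first match) is a small alternative. A minimal sketch, assuming the token always has a leading comma as in the example; the variable names allGone and oneGone are just for illustration:

```xojo
Dim theData As String = "(TT-AA,T5,65):(cc):(77,65,TJQ)"

// ReplaceAll removes both copies of ",65"
Dim allGone As String = theData.ReplaceAll(",65", "")
// allGone is "(TT-AA,T5):(cc):(77,TJQ)"

// Replace only touches the first match, so one copy survives
Dim oneGone As String = theData.Replace(",65", "")
// oneGone is "(TT-AA,T5):(cc):(77,65,TJQ)"
```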

DaveS · February 11, 2015, 3:47pm

This is, of course, assuming you are looking for “known” strings… and that you want to remove ALL of them, not just the “duplicate”.
If the OP wants to figure out that “65” is the duplicate… then it gets a bit more complex:

split into an array
analyze the array, remove dups
rejoin the array back into a string

A lot depends on what the result needs to be…

(TT-AA,T5,65):(cc):(77,TJQ) ?

Which is the duplicate? The first or the second occurrence?
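
The three steps above could be sketched roughly like this. It assumes the groups always look like "(a,b,c)" joined by ":", that the first occurrence is the one to keep, and that a token counts as a duplicate even when the earlier copy sits in a different group. RemoveDuplicateTokens is a made-up name, not anything built in. As far as I know IndexOf compares strings without regard to case, so adjust if case matters.

```xojo
Function RemoveDuplicateTokens(source As String) As String
  // Hypothetical helper: assumes groups look like "(a,b,c)" joined by ":"
  Dim seen() As String // every token kept so far, across all groups
  Dim groups() As String = Split(source, ":")

  For g As Integer = 0 To groups.Ubound
    // Strip the surrounding parentheses before splitting on commas
    Dim inner As String = ReplaceAll(ReplaceAll(groups(g), "(", ""), ")", "")
    Dim tokens() As String = Split(inner, ",")
    Dim kept() As String

    For Each token As String In tokens
      // IndexOf returns -1 when the token has not been seen yet
      If seen.IndexOf(token) = -1 Then
        seen.Append token
        kept.Append token
      End If
    Next

    // Rebuild the group from the tokens that survived
    groups(g) = "(" + Join(kept, ",") + ")"
  Next

  Return Join(groups, ":")
End Function
```

Called with "(TT-AA,T5,65):(cc):(77,65,TJQ)" this gives "(TT-AA,T5,65):(cc):(77,TJQ)", i.e. the second 65 is the one that goes.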

Markus_Winter · February 11, 2015, 3:52pm

Curiosity kills me. What does it stand for?

Mike_Cotrone · February 11, 2015, 3:55pm

Spot on, Dave!

Since this is taking a deeper turn: if you can’t find a common delimiter to split on, you could use RegEx find/replace against the flat string instead. That could give you more flexibility.

Michael_Martz · February 11, 2015, 4:01pm

Hey. Thanks for the quick replies!

This would be for unknown duplicates, which is why it was stumping me. OK, so I need to split it into an array to do this. Not exactly sure how to do that, but I will figure it out. I have split strings before, but they had line breaks in them, so it was quite a bit different than this.

This is for a poker-related program I am working on. So in that example those are poker hands that have “unions”, or AA with two clubs. It does not matter if the occurrence removed is the first or the second one. I can remove the leftover “,” with a ReplaceAll after I join the string back together, so that is not a problem.

Mike_Cotrone · February 11, 2015, 4:03pm

Mike, do you have a delimiter that is consistent in your outputs that you can use as the split point? If they are line breaks then that is easy, as you can use EndOfLine as the delimiter.

Post some code snippets?

Kem_Tekinay · February 11, 2015, 4:09pm

This cries out for a regular expression…

Kem_Tekinay · February 11, 2015, 4:13pm

This pattern will identify any string that is made up of two or more letters or numbers where the identical combination occurs later in the text.

([A-Z0-9]{2,})(?=[\s\S]*\g1)
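
Reading the pattern piece by piece: ([A-Z0-9]{2,}) captures a run of two or more letters or digits, and the lookahead (?=[\s\S]*\g1) only lets the match stand if that same run shows up again somewhere later. Since there are no boundaries around the group, a run can also match inside a longer token, so it may be worth listing what the pattern flags before deleting anything. A minimal sketch using RegEx.Search; ListDuplicates is a made-up name for illustration:

```xojo
Function ListDuplicates(source As String) As String()
  // Hypothetical helper: returns each run that occurs again later in source
  Dim rx As New RegEx
  rx.Options.CaseSensitive = True
  rx.SearchPattern = "([A-Z0-9]{2,})(?=[\s\S]*\g1)"

  Dim found() As String
  Dim match As RegExMatch = rx.Search(source)
  While match <> Nil
    // Subexpression 1 is the captured run
    found.Append match.SubExpressionString(1)
    // Calling Search with no arguments continues after the previous match
    match = rx.Search
  Wend
  Return found
End Function
```

For the example string this should report only 65, but check it against real data where tokens can overlap.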

Mike_Cotrone · February 11, 2015, 4:27pm

Here is an example function that takes your string input, uses regular expression replacement, and returns the final result string.

I am replacing each match with an empty string (“”), so change that accordingly.

Function theParser(InputString As String) As String
  // Removes every run of letters/digits that appears again later in the string
  Dim theReplacement_RegEx As New RegEx
  theReplacement_RegEx.Options.CaseSensitive = False // Do you need Case?
  theReplacement_RegEx.Options.ReplaceAllMatches = True
  theReplacement_RegEx.SearchPattern = "([A-Z0-9]{2,})(?=[\s\S]*\g1)"
  theReplacement_RegEx.ReplacementPattern = ""
  Dim New_String As String = theReplacement_RegEx.Replace(InputString)
  Return New_String
End Function

Michael_Martz · February 11, 2015, 9:14pm

Hey, that worked perfectly! All I had to do was set the CaseSensitive option to True and do some ReplaceAlls to fix the comma problems. Thanks for all your help!
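
For anyone landing here later, the comma cleanup could look something like the sketch below. This is a guess at the kind of ReplaceAll calls meant, not the actual code used; TidyCommas is a made-up name, and it assumes groups are wrapped in parentheses as in the example.

```xojo
Function TidyCommas(source As String) As String
  // Hypothetical helper: repairs separators left behind after tokens are removed
  Dim result As String = source

  // Collapse runs of commas left by a removed middle token
  While InStr(result, ",,") > 0
    result = ReplaceAll(result, ",,", ",")
  Wend

  // Drop a comma left at the start or end of a group
  result = ReplaceAll(result, "(,", "(")
  result = ReplaceAll(result, ",)", ")")
  Return result
End Function
```

With the 65 removed from the first group the string is "(TT-AA,T5,):(cc):(77,65,TJQ)", and TidyCommas turns that into "(TT-AA,T5):(cc):(77,65,TJQ)". A group that loses all of its tokens would still be left as "()", which this sketch does not handle.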

Mike_Cotrone · February 11, 2015, 9:21pm

Kem did the hard part, making that pattern. Glad it worked out!

Michael_Hußmann · February 12, 2015, 12:13am

Just wondering: wouldn’t it be possible to replace those strings with a data structure where the issue of duplicates couldn’t arise in the first place, such as a Dictionary? That might be faster than weeding out duplicates later on.
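
A rough sketch of that idea, assuming the hands are available one at a time before the string is built. The incoming array here is made-up sample data taken from the example string. As far as I know, String keys in a Dictionary are compared without regard to case, and the order of Keys is not something to rely on.

```xojo
// Hypothetical sketch: collect hands in a Dictionary so repeats never get stored
Dim hands As New Dictionary
Dim incoming() As String = Array("TT-AA", "T5", "65", "cc", "77", "65", "TJQ")

For Each hand As String In incoming
  // Assigning to an existing key overwrites it, so each hand is stored once
  hands.Value(hand) = True
Next

// hands.Count is now 6; the second "65" did not add an entry

// Build the output string from the unique keys
Dim unique() As String
For Each key As Variant In hands.Keys
  unique.Append key.StringValue
Next
Dim result As String = Join(unique, ",")
```

If the grouping into parentheses matters, the value stored under each key could hold the group number instead of True.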