String constant with quote mark

I have a program with a character set defined in a string constant, and that constant contains both the double-quote character and the single-quote character (sometimes called an apostrophe).

Next I get a text input from a text field and scan through that constant using InstrB to find the position at which a character from the input string occurs.

Example:
the string constant = ABCDEFGH
and the string input = E
Then InstrB(constant,input) reruns the number 5

That is exactly as it should be.

But I have discovered that if my string constant contains a single-quote mark ( ’ ) or a double-quote mark ( " ) and I try to search for either of those characters, the InstrB command returns a value of 0 (as in Not Found).

I even attempted programmatically to copy the constant into a variable and search that variable … and it still won’t find either of the quote marks … but it finds other characters just fine.

And when I have it search for characters that are after the quote marks … it finds them but reports their position incorrectly.

I wrote a program to test this:

In my test program I created a Module, and in that module I defined the constant name MyString = ABCDE"FGH’IJKL … and when I run my program, in TextField1 I enter ABCDE"FGH’IJKL (the same value that my constant is supposed to contain) … and then in TextField2 I enter the character it is supposed to search for.

[code] dim A,B,C as string
dim X,Y,Z as Integer

A = TextField1.text
B = TextField2.text
C = MyString
Label6.text = MyString
Label10.text = C
X = InstrB(A,B)
Y = InstrB(MyString,B)
Z = InstrB(C,B)
Label1.text = str(X)
Label4.text = str(Y)
Label12.text = str(Z)[/code]

When I search for the letter A … X,Y, and Z all return 1

When I search for the letter E … X,Y, and Z all return 5

When I search for a " then X=6, Y=0, and Z=0

When I search for the letter F then X=7, Y=9, and Z=9

When I search for a ’ then X=10, Y=0, and Z=0

When I search for I (capital i) then X=11, Y=15, and Z=15


I think I found a significant bug in Xojo 2015 Release 2.2 for the Mac

Oooops

Line 9 should read …

Then InstrB(constant,input) returns the number 5

Have you tried it with Instr instead of InstrB ?

Not here (same Xojo release, on OS X). The results are consistent.

My guess is that you have declared the constant MyString as Text and not as String (even if you say so in your post).

Either change the constant to String, or change all variables to Text (and then use Instr).

Results here are different.

[quote]When I search for the letter A … X,Y, and Z all return 1[/quote] Same thing

[quote]When I search for the letter E … X,Y, and Z all return 5[/quote] Same thing

[quote]When I search for a " then X=6, Y=0, and Z=0[/quote] All variables equal 6

[quote]When I search for the letter F then X=7, Y=9, and Z=9[/quote] All variables equal 7

[quote]When I search for a ’ then X=10, Y=0, and Z=0[/quote] All variables equal 10

[quote]When I search for I (capital i) then X=11, Y=15, and Z=15[/quote] All variables equal 11

You seem to have found a significant bug in your own code. Since you did not post your original project, it is difficult to know where the gremlin is…

It sounds like you are either using curly quotes in your constants or getting them in your text. If the former, that’s an easy fix. If the latter, you might want to switch to regular expressions instead.

As for the reported position, you should read up on encodings and how they work, especially UTF-8. If I’m right about the curly quotes, those are stored with multiple bytes so the byte position will not match the character position. InStrB will return a byte position while InStr will return a character position.
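To make the byte-versus-character distinction concrete, here is a minimal sketch in Python rather than Xojo (chosen only because the arithmetic is easy to check; the function and variable names are illustrative). It assumes the constant holds a right double curly quote (U+201D) and a right single curly quote (U+2019) where the typed text holds straight quotes. `byte_pos` mimics InStrB with a byte search, and `char_pos` is a character search; both are converted to one-based positions with 0 meaning "not found".

```python
# Assumption: the constant contains curly quotes (U+201D and U+2019),
# while the text typed into the field contains straight ASCII quotes.
constant = "ABCDE\u201dFGH\u2019IJKL"
typed = "ABCDE\"FGH'IJKL"

def byte_pos(haystack: str, needle: str) -> int:
    """One-based position within the UTF-8 bytes, 0 if absent."""
    return haystack.encode("utf-8").find(needle.encode("utf-8")) + 1

def char_pos(haystack: str, needle: str) -> int:
    """One-based character position, 0 if absent (case-sensitive in Python)."""
    return haystack.find(needle) + 1

# Columns: needle, byte position in typed text, byte position in constant,
# character position in constant.
for needle in ["E", '"', "F", "'", "I"]:
    print(needle, byte_pos(typed, needle), byte_pos(constant, needle),
          char_pos(constant, needle))
```

Under those assumptions the byte positions in the constant come out as 0 for both straight quotes, 9 for F and 15 for I, which matches the Y and Z values reported earlier in the thread, while the character positions stay at 7 and 11.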

In this case, Text in the new framework may help you since it removes encoding from the equation and lets you work with just the characters.

  1. I cannot use Instr instead because Instr does not distinguish between uppercase and lowercase and I MUST have that ability

  2. I have checked my constant and it IS defined as a global STRING

  3. ‘curly quotes’ is a non sequitur because I’m entering the quotes on my Macintosh keyboard and the quotes on my keyboard are the ONLY kind I have available. Also, I used that same keyboard to declare the constant and to enter the test text. So the constant and the test text would contain the same quotes and therefore they SHOULD equate to each other.

  4. I REALLY wish I could post screenshots so I could PROVE to you that I did indeed declare my constant as a global String … and I could SHOW you the results I’m getting

  5. Is there some way I can upload my test project so that YOU can see the same thing I’m seeing? (I don’t have a web site or other linkable URL) and there is no place on the internet where my screenshots can be viewed.

I changed my constant to TEXT instead of String, ran the program in debug mode … and I still get the exact same results.

When I search for the letter A … X,Y, and Z all return 1

When I search for the letter E … X,Y, and Z all return 5

When I search for a " then X=6, Y=0, and Z=0

When I search for the letter F then X=7, Y=9, and Z=9

When I search for a ’ then X=10, Y=0, and Z=0

When I search for I (capital i) then X=11, Y=15, and Z=15

Examine C in the debugger and look at the byte values in hex. What are they? And for B?

Check what the setting for System Preferences > Keyboard > Text “Use smart quotes & dashes” is
That’s likely very relevant.

Then try

  dim myString as string = chr(65) +  chr(66) + _
   chr(67) +_
   chr(68) +_
   chr(69) +_
   chr(34) +_
   chr(70) +_
   chr(71) +_
   chr(72) +_
   chr(39) +_
   chr(73) +_
   chr(74) +_
   chr(75) +_
   chr(76) 
  
  
  dim doubleQuotePos as integer = instr(myString, """" )
  
  dim singleQuotePos as integer = instr(myString, "'" )
  break

This code makes up a quick little ascii string that holds ABCDE"FGH’IJKL
It definitely doesn’t hold any smart quotes
Chr(34) is the double quote
Chr(39) is a single quote

And when you check doubleQuotePos it’s correct, as is singleQuotePos.

However, IF the following code pastes correctly you’ll find the first character is code point 8220, not 34, which is the usual code point for a " character
This is a “smart quote”

And that can be what gets inserted in a text field when you type
Loads of fun
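For reference, the code points involved can be checked with a short Python sketch (Python is used here only for illustration; the dictionary keys are made-up labels). It prints the code point and UTF-8 bytes of the straight quotes and of the four common curly quotes.

```python
# Straight (ASCII) quotes, as produced by Chr(34) and Chr(39).
straight_double, straight_single = '"', "'"

# Curly ("smart") quotes; the keys are illustrative labels only.
curly = {
    "left double": "\u201c",
    "right double": "\u201d",
    "left single": "\u2018",
    "right single": "\u2019",
}

print(ord(straight_double), ord(straight_single))

# Each curly quote takes three bytes in UTF-8, each straight quote one.
for name, ch in curly.items():
    print(name, ord(ch), ch.encode("utf-8").hex(" ").upper())
```

The left double curly quote is code point 8220, and every curly quote encodes to three bytes starting with E2 80, so none of them can match a one-byte straight quote.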

[quote=195658:@Marc Speth]When I search for the letter F then X=7, Y=9, and Z=9

When I search for a ’ then X=10, Y=0, and Z=0

When I search for I (capital i) then X=11, Y=15, and Z=15[/quote]
This indicates you have UTF-8 “smart quotes” in your string constant. Your constant does not contain regular quotes. Plain and simple.

This starts to shed some light on the subject.

The debugger reports that A, the constant (as text), contains 41 42 43 44 45 22 46 47 48 27 49 4A 4B 4C … whereas My C variable (when defined as String ) has a hex value of 41 42 43 44 45 E2 80 9D 46 47 48 E2 49 4A 4B 4C

And when B is a double quote ( " ) … the hex value for B is 22 … which, of course, is nowhere in the C String

By the way, the debugger reports that A, B and C are encoded UTF-8, but C has a byte length of 18 even though it only contains 14 characters. (the light is starting to dawn in my brain)
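The mismatch between 14 characters and 18 bytes can be reproduced with a small Python sketch (not Xojo; shown only to check the counting). It assumes the two quotes in the constant are U+201D and U+2019; the second is an assumption, chosen because it is consistent with the reported 18-byte length.

```python
# Assumption: the constant holds U+201D and U+2019 instead of " and '.
c = "ABCDE\u201dFGH\u2019IJKL"
data = c.encode("utf-8")

print(len(c))                   # number of characters
print(len(data))                # number of bytes in UTF-8
print(data.hex(" ").upper())    # byte values in hex, space separated
```

Twelve ASCII letters take one byte each and the two curly quotes take three bytes each, which gives 12 + 3 + 3 = 18 bytes for 14 characters.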

So how do I keep everything encoded as TEXT? Because even when I changed the constant to TEXT the debugger still reports that it is UTF-8 … I’m guessing that in my code I need to DIM A,B, and C as TEXT instead of String … right?

… and yes, when I check my keyboard system prefs I have “use smart quotes and dashes” checked. But changing that may solve the problem for ME but my program users would not know they need to change that … so I have to figure out how to solve this programmatically so it’s transparent to other program users.

I’ll try changing my DIM statements and see if that solves it.

When I changed the DIM statement for ABC to text it won’t compile because …

A = TextField1.text
B = TextField2.text

cause a Type Mismatch error. That is, A and B expect Text but are getting Strings from the text fields, and I don’t know how to convert those Strings to Text before they are stored in A and B.

TEXT vs String does not imply no encoding or single-byte encoding. You need to fix the value of the constant because the IDE is obeying that setting. It shouldn’t affect your end users, because they will be typing into a textfield, which does not obey the setting. A TextArea should, but not a TextField. So, yeah, changing that setting and re-entering the constant will solve it for everyone.

OK … I researched the .ToText method and solved the Type Mismatch problem.

As the program stands now, the constant is defined as TEXT …

When I change my program code to this:

[code] dim A,B,C as text
dim X,Y,Z as Integer

A = TextField1.text.ToText
B = TextField2.text.ToText
C = MyString
Label6.text = MyString
Label10.text = C
X = Instr(A,B)
Y = Instr(MyString,B)
Z = Instr(C,B)
Label1.text = str(X)
Label4.text = str(Y)
Label12.text = str(Z)[/code]

It reports the positions correctly for all letters EXCEPT for the quote marks … they still report position=0 …

But using Instr is still case insensitive and for my purposes it MUST be able to distinguish between A and a …

So when I change to using InstrB instead … it now distinguishes uppercase from lowercase, still can’t report position for the quotes, and incorrectly reports the positions of letters that occur after the quotes.

The problem with double-quotes is fairly minor because the chance of users entering double-quotes is fairly slim. But the single-quote problem is significant because users are almost certain to use contractions like Don’t or Can’t … my program MUST be able to distinguish uppercase from lowercase, AND also correctly report the position of quote marks. And in ALL cases so far, no matter whether I am using string or text, it has NEVER correctly reported the position of either double-quotes or single-quotes located inside my constant.

You can convert a properly encoded String to Text with s.ToText. In the case of your fields, it would be TextField1.Text.ToText. Also, to properly take advantage, you would not use InStr. If you do, all you are doing is converting the Text back to a UTF-8-encoded String. The new framework methods have an option for a case-sensitive comparison.

More specifically, you would use something like:

X = A.IndexOf( B, Text.CompareCaseSensitive )

Just remember that the results of any new framework call, like IndexOf, are zero-based, not one-based like InStr. In other words, the first character occupies the zero position of the Text.
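The off-by-one between the two conventions is easy to get wrong, so here is a minimal Python sketch of the conversion (Python, not Xojo; `index_of` and `to_one_based` are hypothetical helper names, and Python's `str.find` stands in for a zero-based, case-sensitive search that returns -1 when nothing is found).

```python
def index_of(haystack: str, needle: str) -> int:
    """Zero-based, case-sensitive search; -1 when the needle is absent."""
    return haystack.find(needle)

def to_one_based(index: int) -> int:
    """Map a zero-based index to a one-based position (0 means not found)."""
    return index + 1

# Straight quotes only, as typed into a plain text field.
text = "ABCDE\"FGH'IJKL"

for needle in ["A", "E", "e", '"', "'"]:
    zero = index_of(text, needle)
    print(repr(needle), zero, to_one_based(zero))
```

With straight quotes on both sides, E is found at zero-based index 4 (one-based 5), lowercase e is not found at all, and both quote marks are located correctly.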

Ahhhh. … I’ll give that a try and see if I can get it to work.

If you haven’t seen it, I gave a kind of shorthand comparison of String and Text to help explain the differences here. It might help to clear some things up for you.

Using Text is not required to solve your problem, though. Just fix the constant. You’re introducing extra complexity for no real reason.