CHR(8217)

Philip_McCarthy · May 31, 2019, 5:00am

I’m having some issues with chr(8217) in a text area … seems text areas don’t like this CHR (I just get a space)

So I tried replacing Chr(8217) with CHR(39) before it got to a text area
if myfile <> nil then
t = textinputstream.open(myfile)
holdstring = t.readall
holdstring = replace(holdstring,chr(8217),chr(39))
inputtext.text = holdstring
end if

but I’m still just getting a blank space.

help

Tim_Hare · May 31, 2019, 5:14am

You need to define the encoding of the text when you read it in. Otherwise you have no encoding, just single bytes, and chr(8217) will not display correctly.

Philip_McCarthy · May 31, 2019, 5:17am

Thanks Tim … one more quick question …

How do I “define the encoding of the text when [I] read it in”?

Emile_Schwarz · May 31, 2019, 5:21am

http://documentation.xojo.com/api/text/encoding_text/defineencoding.html ?

Philip_McCarthy · May 31, 2019, 5:36am

Thanks - OK … ummm … stay with me

The character that troubles me is the apostrophe in the word below
“Mustn’t” … it’s chr(8217)

So - I adjusted the code as advised here to the below …

if myfile <> nil then
t = textinputstream.open(myfile)
holdstring = t.ReadAll.DefineEncoding(Encodings.UTF8)
holdstring = replaceall(holdstring,chr(8217),chr(39))
inputtext.text = holdstring
end if

My textarea just gives me a question-mark-like looking character …
I tried other types of UTF - but nothing
I’m loading from a simple text file if that helps any.

Many Thanks

DerkJ · May 31, 2019, 5:42am

Try it with:


mString = Encodings.UTF8.Chr(8217)

Philip_McCarthy · May 31, 2019, 5:56am

Ah - I see that gets chr(8217) in the text area …
But I’d like the whole text … just either WITH 8217 or swapping it to 39
so where does Encodings.UTF8.Chr(8217) fit into the code?

if myfile <> nil then
t = textinputstream.open(myfile)
holdstring = t.ReadAll.DefineEncoding(Encodings.UTF8)
holdstring = replaceall(holdstring,chr(8217),chr(39))
inputtext.text = holdstring
end if

I know I’m being slow

DerkJ · May 31, 2019, 6:31am

Append it to your string.


mString = EncodedStringWithChar + inputdatastring

You can replace the CHR(8217) with Encodings.UTF8.Chr(8217) anywhere as it returns a string

Philip_McCarthy · May 31, 2019, 6:52am

It doesn’t

This is the code …

if myfile <> nil then
t = textinputstream.open(myfile)
'holdstring = t.ReadAll.DefineEncoding(Encodings.UTF8)
holdstring = t.readall
'holdstring = Encodings.UTF8.Chr(8217)
holdstring = replaceall(holdstring,chr(8217),Encodings.UTF8.Chr(8217))
inputtext.text = holdstring
end if

But the word which contains 8217 comes out with a form of a ? in the space.

DerkJ · May 31, 2019, 7:25am

Don’t replace 8217 with 8217…

What char do you want to see? A quotation?

Try
T.readAll(Encodings.UTF8)
Instead

DerkJ · May 31, 2019, 7:28am

https://documentation.xojo.com/api/files/textinputstream.html#textinputstream-readall

You can set the encoding in which the string is to be read.
Then append and/or replace the characters. It should work.

Philip_McCarthy · May 31, 2019, 7:32am

Tried with both the below - no luck

if myfile <> nil then
t = textinputstream.open(myfile)
holdstring = T.readAll(Encodings.UTF8)
inputtext.text = holdstring
end if

if myfile <> nil then
t = textinputstream.open(myfile)
holdstring = T.readAll(Encodings.UTF8)
holdstring = replaceall(holdstring,Encodings.UTF8.Chr(8217),chr(39))
inputtext.text = holdstring
end if

Just can’t get Chr(8217) to become chr(39) (it’s an apostrophe by the way)

Tim_Hare · May 31, 2019, 8:18am

I suspect your file is not UTF8. With a proper UTF8 file where I copy/pasted your string, this code works.

t = textinputstream.open(myfile)
holdstring = T.readAll(Encodings.UTF8)
inputtext.text = holdstring

What are the actual bytes in the file? UTF8 for chr(8217) is E2 80 99. If your file contains some other byte values, it’s not UTF8.
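
If the bytes turn out to be something else, here is a rough sketch of reading the file with a different encoding. It assumes (and this is only a guess about this particular file) that the text was saved as Windows "ANSI" (Windows-1252), where the curly apostrophe is the single byte 92 hex. The variables are declared here so the snippet stands alone; drop the Dim keywords if t and holdstring already exist in your method.

```xojo
' Assumption: the file is Windows-1252 ("ANSI"), not UTF-8.
' If the apostrophe is stored as 19 20, try Encodings.UTF16 instead.
If myfile <> Nil Then
  Dim t As TextInputStream = TextInputStream.Open(myfile)
  Dim holdstring As String = t.ReadAll(Encodings.WindowsANSI)
  t.Close

  ' Convert to UTF-8 so the search string below is in the same encoding.
  holdstring = ConvertEncoding(holdstring, Encodings.UTF8)

  ' Curly apostrophe (8217) -> straight apostrophe (39)
  holdstring = ReplaceAll(holdstring, Encodings.UTF8.Chr(8217), Chr(39))
  inputtext.Text = holdstring
End If
```

The point of the sketch is that the encoding passed to ReadAll has to match what is really in the file; only then will the ReplaceAll find the character.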

Philip_McCarthy · May 31, 2019, 8:29am

I don’t see a way here to send you the file itself … but I’ll tell you how I got it.

My students all live in the UAE (I’m using their data) … so they have Arabic Word documents. I copied and pasted from those documents into a txt document (a regular txt). I can see the character there … it’s a Microsoft-looking 8217 apostrophe … a “smart” one. I can paste that character into my text area and ask for the ASCII value, and I get 8217. I do the same with the quotation marks and get 8220. So, I have no problem pasting the text into the text area. I just can’t load it directly from a txt file … 8217 and 8220 become either a “?” or a blank space.

BY the way - re “What are the actual bytes in the file? UTF8 for chr(8217) is E2 80 99. If your file contains some other byte values, it’s not UTF8.”
How do I get that information?

Emile_Schwarz · May 31, 2019, 8:35am

The ? character, as you call it, is the replacement character, shown when the requested character does not exist in the font being used.

Try Times, Arial, or some other font in your TextArea to display the file contents.

Better: open the original Word document to find the name of the font it uses, and set your TextArea to that font.

Philip_McCarthy · May 31, 2019, 8:48am

Tried Consolas, Arial, and Times … see code below … no joy

inputtext.TextFont = "Consolas"
if myfile <> nil then
t = textinputstream.open(myfile)
holdstring = T.readAll(Encodings.UTF8)
'holdstring = replaceall(holdstring,Encodings.UTF8.Chr(8217),chr(39))
inputtext.text = holdstring
end if

Somewhat new to Xojo, so not sure how to load a Word doc … I’ll be working on that

Emile_Schwarz · May 31, 2019, 8:51am

I do not read that.

There are Xojo Classes to deal directly with Microsoft Word.

You may do that, or simply export the file (from Word) to regular txt, then load that txt into Xojo’s TextArea.

Also: you could create a simple project that only loads your text file, pack it together with that txt file into an archive, and share the whole thing. That would make it far easier to understand what is happening and to give better advice.

Edit:
http://documentation.xojo.com/api/windows/wordapplication.html

WordApplication is reserved to the Windows Platform.

Philip_McCarthy · May 31, 2019, 8:58am

You lost me Emile … but I do appreciate your and everyone’s help.

If I ever work this out … I’ll post what I did.

Thanks all!

Tim_Hare · May 31, 2019, 9:20am

[quote=439214:@Philip McCarthy]BY the way - re “What are the actual bytes in the file? UTF8 for chr(8217) is E2 80 99. If your file contains some other byte values, it’s not UTF8.”
How do I get that information?[/quote]
Put a breakpoint after the line that reads the file, then examine the variable in the debugger. Click on the string and there will be a tab showing the binary values.
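
The same check can also be done in code. A minimal sketch, assuming the myfile FolderItem and the inputtext TextArea from the earlier posts; rawBytes and hexDump are names made up for this example. It reads the file through a BinaryStream so that no text encoding is involved at all.

```xojo
' Dump the raw bytes of the file as hex so the encoding can be identified.
' rawBytes and hexDump are illustrative names, not from the posts above.
If myfile <> Nil Then
  Dim bs As BinaryStream = BinaryStream.Open(myfile, False) ' False = read-only
  Dim rawBytes As String = bs.Read(bs.Length) ' bytes exactly as stored on disk
  bs.Close

  ' Passing True to EncodeHex puts a space between the hex pairs.
  Dim hexDump As String = EncodeHex(rawBytes, True)
  inputtext.Text = hexDump
End If
```

In a UTF-8 file the apostrophe shows up as E2 80 99. A lone 92 in that position would point to Windows-1252, and 19 20 to little-endian UTF-16.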

Philip_McCarthy · May 31, 2019, 9:34am

textinputstream is UTF8 … is that what you meant?