And what about when I open a file, read its content, write the content to a new file, and need the new file to have the same encoding, BOM, etc. as the source file?
If this is something you need to do consistently, consider creating a class that handles it for you. If you do it right, it will be invisible to you.
Yes, I need to do that all the time. I have to work with lots of files and also create lots of new files. Until now I did that in Delphi, where it is very easy, but I get a headache with Xojo.
Every language has its advantages and disadvantages over others.
BTW, a BOM for UTF-8 files is not required or recommended. See:
Page 36:
The main purpose of the BOM, or Byte Order Mark, is to indicate the byte order in encodings where it might be different, e.g., UTF-16LE vs. UTF-16BE. UTF-8 byte order never changes.
The UTF-8 BOM can also indicate that the data IS UTF-8, although that is NOT its primary use.
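To make the byte-order point concrete, here is a small sketch (in Python rather than Xojo, purely because it is easy to run anywhere) that encodes the BOM code point U+FEFF in each encoding. The helper name `bom_bytes` is made up for this illustration.

```python
# U+FEFF is the code point that serves as the byte order mark.
BOM_CHAR = "\ufeff"

def bom_bytes(encoding):
    """Return the bytes U+FEFF becomes in the given encoding (hypothetical helper)."""
    return BOM_CHAR.encode(encoding)

for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(name, bom_bytes(name).hex(" "))
# utf-8 ef bb bf
# utf-16-be fe ff
# utf-16-le ff fe
# utf-32-be 00 00 fe ff
# utf-32-le ff fe 00 00
```

The UTF-16 and UTF-32 results differ between the BE and LE variants, which is what lets a reader infer the byte order. UTF-8 only ever produces one sequence.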
Reading a UTF-8 text file in an encoding like ISO Latin 1 is entertaining and confusing.
BBEdit likes to do stuff like this.
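For anyone who has not seen the effect, this sketch (Python, not Xojo; the function name `misread_as_latin1` is invented for the example) shows what UTF-8 bytes with a BOM look like when they are decoded as ISO Latin 1.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8 with a BOM, then decode the bytes as ISO Latin 1."""
    raw = b"\xef\xbb\xbf" + text.encode("utf-8")
    return raw.decode("latin-1")

print(misread_as_latin1("café"))  # ï»¿cafÃ©
```

The three odd characters at the front are the UTF-8 BOM, and every non-ASCII character turns into two Latin 1 characters.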
[quote=441900:@Emile Schwarz]What happened when you saved a text from a TextArea (with a UTF-8 encoding) ?
Did you try that ?[/quote]
and read it back in a different TextArea?
Try it and report the result, please.
I am not reading or writing text from/to text areas. I read text from files into a string array, process the data line by line, and write it back. The encoding could be ANSI, UTF-8 (BOM EF BB BF), UTF-16BE (BOM FE FF), UTF-16LE (BOM FF FE), or another, whatever the user selects for the export, so I need to be able to write a BOM.
[quote=441856:@Christian Schmitz]I think you can just do a write as the first thing after creating file:
t.Write encodings.UTF8.Chr(&hFEFF)
This should work.[/quote]
The following might help:
Writing Files
Make sure your content is in the correct encoding. If not, perform a ConvertEncoding on it.
Write as a binary file rather than a text file.
Write one of the following before your content:
UTF8: ChrB(&hEF) + ChrB(&hBB) + ChrB(&hBF)
UTF16BE: ChrB(&hFE) + ChrB(&hFF)
UTF16LE: ChrB(&hFF) + ChrB(&hFE)
UTF32BE: ChrB(&h00) + ChrB(&h00) + ChrB(&hFE) + ChrB(&hFF)
UTF32LE: ChrB(&hFF) + ChrB(&hFE) + ChrB(&h00) + ChrB(&h00)
Write your content.
Close.
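The writing steps above can be sketched roughly as follows. This is Python rather than Xojo, only to show the logic; the names `BOMS` and `write_with_bom` and the `with_bom` switch are assumptions of this sketch, not anything from Xojo.

```python
import os
import tempfile

# BOM bytes per encoding, keyed by the Python codec name used for the content.
BOMS = {
    "utf-8": b"\xef\xbb\xbf",
    "utf-16-be": b"\xfe\xff",
    "utf-16-le": b"\xff\xfe",
    "utf-32-be": b"\x00\x00\xfe\xff",
    "utf-32-le": b"\xff\xfe\x00\x00",
}

def write_with_bom(path, text, encoding, with_bom=True):
    """Encode text and write it in binary mode, optionally preceded by the BOM."""
    payload = text.encode(encoding)  # plays the role of ConvertEncoding
    with open(path, "wb") as f:      # binary mode, so no newline translation
        if with_bom:
            f.write(BOMS[encoding])
        f.write(payload)

# Usage: write "hi" as UTF-16LE with a BOM and inspect the raw bytes.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_with_bom(path, "hi", "utf-16-le")
with open(path, "rb") as f:
    print(f.read().hex(" "))  # ff fe 68 00 69 00
```

The explicit `-be`/`-le` codecs are used because they do not emit a BOM on their own, so the BOM is written exactly once.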
Reading Files
Use a binary stream to read the first 4 bytes.
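Perform a comparison on the bytes to determine if they represent a BOM. Check the 4-byte BOMs first, because the UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM. This will tell you the Xojo text encoding to use and the offset to the start of the content.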
Open the folderitem as a text stream
Set the encoding property on the text stream
Set the PositionB property to the offset
Read the content
Close
If necessary, convert the encoding of the data you have read back to your working encoding (e.g., UTF-8).
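The reading steps can be sketched like this, again in Python rather than Xojo. The names `detect_bom` and `read_text` are invented here, and falling back to `cp1252` for files without a BOM is an assumption standing in for "ANSI"; pick whatever default suits your files.

```python
# Longest BOMs first: the UTF-32LE BOM starts with the UTF-16LE BOM.
BOM_TABLE = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

def detect_bom(head, default="cp1252"):
    """Return (encoding, offset) for the first bytes of a file."""
    for bom, encoding in BOM_TABLE:
        if head.startswith(bom):
            return encoding, len(bom)
    return default, 0  # no BOM: assumed default encoding, content starts at 0

def read_text(path, default="cp1252"):
    """Read a file, skip its BOM if present, and decode the rest."""
    with open(path, "rb") as f:
        data = f.read()
    encoding, offset = detect_bom(data[:4], default)
    return data[offset:].decode(encoding), encoding, offset

print(detect_bom(b"\xef\xbb\xbfabc"))   # ('utf-8', 3)
print(detect_bom(b"\xff\xfeh\x00"))     # ('utf-16-le', 2)
```

Returning the detected encoding and offset alongside the text makes it possible to write the new file with the same encoding and BOM as the source. One known limitation: a UTF-16LE file whose first character is U+0000 would be mistaken for UTF-32LE.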
[quote=442216:@Kevin Gale]The following might help:
Writing Files
Make sure your content is in the correct encoding. If not, perform a ConvertEncoding on it.
Write as a binary file rather than a text file.
Write one of the following before your content:
UTF8: ChrB(&hEF) + ChrB(&hBB) + ChrB(&hBF)
UTF16BE: ChrB(&hFE) + ChrB(&hFF)
UTF16LE: ChrB(&hFF) + ChrB(&hFE)
UTF32BE: ChrB(&h00) + ChrB(&h00) + ChrB(&hFE) + ChrB(&hFF)
UTF32LE: ChrB(&hFF) + ChrB(&hFE) + ChrB(&h00) + ChrB(&h00)
Write your content.
Close.
Reading Files
Use a binary stream to read the first 4 bytes.
Perform a comparison on the bytes to determine if they represent a BOM. This will tell you the Xojo text encoding to use and the offset to the start of the content.
Open the folderitem as a text stream
[/quote]
Why bother with that? You can use the Binary Stream to read in Text with an encoding:
Dim theContent as String = BS.Read(theFolderItem.Length-4, theEncoding)
[quote=442266:@Karen Atkocius]Why bother with that? You can use the Binary Stream to read in Text with an encoding:
Dim theContent as String = BS.Read(theFolderItem.Length-4, theEncoding)
[/quote]
I must have missed that you could read with an encoding when using a binary stream.
However, you would have to set the read position first: if the BOM was UTF-8, you would have to rewind 1 byte before reading the content.
UTF-8 BOM is three bytes.
[quote=442285:@Kevin Gale]I must have missed that you could read with an encoding when using a binary stream.
However, you would have to set the read position first: if the BOM was UTF-8, you would have to rewind 1 byte before reading the content.[/quote]
Not a problem.
BS.Position = BS.Position - 1
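Rewinding by a fixed 1 byte only fits the UTF-8 case. A more general approach is to set the position to the length of whichever BOM was found, and to read the file length minus that BOM length rather than minus 4. Here is a rough sketch of the positioning logic in Python (not Xojo), with `io.BytesIO` standing in for the binary stream and `skip_bom` being a made-up helper name.

```python
import io

# Longest BOMs first, so UTF-32LE is not mistaken for UTF-16LE.
KNOWN_BOMS = (
    b"\x00\x00\xfe\xff",  # UTF-32BE
    b"\xff\xfe\x00\x00",  # UTF-32LE
    b"\xef\xbb\xbf",      # UTF-8
    b"\xfe\xff",          # UTF-16BE
    b"\xff\xfe",          # UTF-16LE
)

def skip_bom(stream):
    """Read up to 4 bytes, then leave the stream positioned just past the BOM.

    Returns the BOM length (0 if there is no BOM).
    """
    start = stream.tell()
    head = stream.read(4)
    for bom in KNOWN_BOMS:
        if head.startswith(bom):
            stream.seek(start + len(bom))
            return len(bom)
    stream.seek(start)  # no BOM: rewind all the bytes that were read
    return 0

stream = io.BytesIO(b"\xef\xbb\xbfabc")
bom_len = skip_bom(stream)
print(bom_len, stream.read())  # 3 b'abc'
```

After reading 4 bytes, this amounts to rewinding 1 byte for UTF-8, 2 bytes for UTF-16, 0 bytes for UTF-32, and everything that was read when there is no BOM at all.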
Sorry for being late. Here is a link to an article from 2017 on BOM with an example program.
How to Implement a Byte Order Marker BOM with Xojo
Maybe this will help?