EndOfLine.Windows on a Mac

  1. last week

    Marc C

    is not verified Apr 12 near Lyon, France

    I want to read a text file on a Mac

    • the line endings are CR+LF
    • each line contains data separated by a SEMICOLON (or a TAB), and some values may contain a CR enclosed in double quotes

    example :
    ggg;hhh;jjj;kkk CRLF
    eeee;ddd;"ggg CR ggggggg"; CRLF
    aaaaa;"bbb CR ccc";dddd; CRLF

    dim file as FolderItem = GetOpenFolderItem("")
    dim t as TextInputStream = TextInputStream.Open(file)
    t.Encoding = Encodings.WindowsLatin1
    while not t.EOF
      dim wline as string = t.ReadLine
      // ... process wline here ...
    wend

    --> the first line is read correctly, but not the second: wline contains only: eeee;ddd;"ggg

    I thought the line ending was always CRLF when the encoding is declared as WindowsLatin1, even on a Mac.
    Is this not the case?

  2. Michel B

    Apr 12 Pre-Release Testers RubberViews.com
    Edited last week

    You may want to do something like this:

    dim file as FolderItem = GetOpenFolderItem("")
    dim t as TextInputStream = TextInputStream.Open(file)
    t.Encoding = Encodings.WindowsLatin1
    // ReadAll reads the whole file at once, so no loop is needed
    dim wline as string = ReplaceLineEndings(t.ReadAll, EndOfLine)
    t.Close
    dim lines() as string = Split(wline, EndOfLine)
    // Then you read from the lines() array

    By the way, it would be an excellent idea to select the code and click on the code icon above the editor next time. Notice how much more legible the same code is in my post than in yours.

    http://docs.xojo.com/index.php/ReplaceLineEndings

  3. Philippe S

    Apr 12 Pre-Release Testers, Xojo Pro

    Or, if the file is not too big, you could read it into memory and then split the lines with:

    textfile = t.ReadAll( Encodings.WindowsLatin1 )
    ...
    lines = split( textfile, EndOfLine.Windows )
  4. Robert W

    Apr 12 Western Canada

    @Marc C --> the first line is read correctly, but not the second: wline contains only: eeee;ddd;"ggg

    I thought the line ending was always CRLF when the encoding is declared as WindowsLatin1, even on a Mac.
    Is this not the case?

    From the language reference:
    TextInputStream.ReadLine
    Returns the next line of text (as a string) from the TextInputstream.
    Any valid end-of-line indicator is used to identify a line.

    So, regardless of the encoding, ReadLine stops when it encounters the CR.
    Your file format appears to be a CSV file except for the use of a semicolon (or TAB) instead of a comma for the field delimiter. So, you may wish to check out some of the discussions on reading CSV files.

  5. Emile S

    Apr 12 Europe (France, Strasbourg)

    @Marc C WindowsLatin1

    Why ?

  6. 7 days ago

    Marc C

    is not verified Apr 13 Answer near Lyon, France

    Thanks for answering :)

    @Philippe S or If the file is not too big, you could read it in memory, and then split the lines with

    Yes, that is a solution.

    @Robert W you may wish to check out some of the discussions on reading CSV files.

    I cannot find a topic with "csv" in the forum. Do you have a link?

    @Emile S Why ?

    Because, for me, Windows line endings are associated with the WindowsLatin1 encoding.

  7. Emile S

    Apr 13 Europe (France, Strasbourg)

    @Marc C because for me the end of Windows line are associated with the WindowsLatin1 encoding

    And that is the reason for my question.
    Let me rephrase it: why don't you use UTF-8?

    https://en.wikipedia.org/wiki/Windows-1252

    It represents the 0 to 255 character set as used by Windows (1 through… XP?).

  8. Emile S

    Apr 13 Europe (France, Strasbourg)
    Edited 7 days ago

    @Marc C I can not find a subject with "csv" in the forum : do you have a link ?

    Probably a bug in the forum software's search feature. Try searching for: comma separated value

    Some info here:
    https://en.wikipedia.org/wiki/CSV_application_support

  9. Kevin G

    Apr 13 Pre-Release Testers, Xojo Pro Gatesheed, England

    Character encoding and line endings are completely separate.

    CR+LF is the standard line ending on MS-Windows.
    Latin1 is a very common character encoding.

    If you generate ASCII text files on MS-Windows there is a high probability that they will be Latin1 encoded with CR+LF as the line ending. However, it would be perfectly valid for an application to generate a Latin1 file that used CR (Mac) or LF (Unix) as the line ending.

    If you are reading text files you should be prepared to support any of the 3 line ending variations. Supporting different encodings really depends on how much effort you want to put into your file reading code.
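
    As a rough sketch of supporting all three variations, one could first check which line ending a text actually contains. DetectLineEnding is a hypothetical helper name, not a built-in function, and the sketch assumes the whole file has already been read into a string:

    ```xojo
    Function DetectLineEnding(s As String) As String
      // Test CR+LF first: a CR+LF file also contains CR and LF individually
      if InStr(s, EndOfLine.Windows) > 0 then return EndOfLine.Windows
      if InStr(s, EndOfLine.Unix) > 0 then return EndOfLine.Unix
      if InStr(s, EndOfLine.Macintosh) > 0 then return EndOfLine.Macintosh
      // No line ending found at all: the text is a single line
      return ""
    End Function
    ```

    For a file whose real line endings are CR+LF, this returns EndOfLine.Windows, and splitting on that two-character sequence does not touch a CR that stands alone.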

  10. Jeff T

    Apr 13 Midlands of England, Europe
    Edited 7 days ago
    ReplaceLineEndings(t.ReadAll, endofLine)

    I suspect that this will have the same problem, because it might convert both the wanted line endings and the CR embedded in a string field.
    (Will endofline hit ONLY the Windows CRLF ones?)

    If it does have a problem, I might suggest this two-pass version (change the Windows line endings to something odd, change the remaining CRs to something else, change the something odd back to Windows line endings, then process the file):

    dim Wholefile as string
    
    Wholefile  = t.ReadAll
    
    Wholefile  = replaceall(Wholefile,endofline.windows,"||")   //  I used || assuming they don't appear in the file!
    Wholefile  = replaceall(Wholefile,chr(13),"~") 
    Wholefile  = replaceall(Wholefile,"||",endofline.windows)  
    
    dim lines() as string = split(Wholefile  , endofline.windows)

    Now iterate through the lines array and change each ~ back to chr(13) before use.
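
    A minimal sketch of that last step, assuming the placeholder (~ in the code above) never occurs in the real data. RestoreEmbeddedCR is a hypothetical helper name:

    ```xojo
    Sub RestoreEmbeddedCR(lines() As String, placeholder As String)
      // Arrays are passed by reference, so the caller's array is updated in place
      for i as integer = 0 to UBound(lines)
        // Put the original CR back wherever the placeholder was substituted
        lines(i) = ReplaceAll(lines(i), placeholder, Chr(13))
      next
    End Sub
    ```

    It would be called as RestoreEmbeddedCR(lines, "~") right after the split.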

  11. Michel B

    Apr 13 Pre-Release Testers RubberViews.com
    Edited 7 days ago

    @Marc C Thanks for answering :)

    yes it's a solution

    I can not find a subject with "csv" in the forum : do you have a link ?

    because for me the end of Windows line are associated with the WindowsLatin1 encoding

    What happened? Didn't you read my reply, the first one in the conversation, which contains a code suggestion?

    dim file as FolderItem = GetOpenFolderItem("")
    dim t as TextInputStream = TextInputStream.Open(file)
    t.Encoding = Encodings.WindowsLatin1
    // ReadAll reads the whole file at once, so no loop is needed
    dim wline as string = ReplaceLineEndings(t.ReadAll, EndOfLine)
    t.Close
    dim lines() as string = Split(wline, EndOfLine)
    // Then you read from the lines() array
  12. 6 days ago

    Dave S

    Apr 13 San Diego, California USA

    This is what I usually do

    dim Wholefile as string
    
    Wholefile  = t.ReadAll
    
    Wholefile  = replaceall(Wholefile,endofline.windows,endofline.unix)   // normalize eol to be 0x0A
    Wholefile  = replaceall(Wholefile,endofline.macintosh,endofline.unix)  // 99% of the time this is not required
    dim lines() as string = split(Wholefile  , endofline.unix)
  13. Jeff T

    Apr 13 Midlands of England, Europe
    Edited 6 days ago

    Again, that code breaks in this instance, because the data has a CR in the middle of a line, surrounded by quotes:

    eeee;ddd;"ggg CR ggggggg"; CRLF

    Which means that

    ggg;hhh;jjj;kkk
    eeee;ddd;"ggg CR ggggggg";
    aaaaa;"bbb CR ccc";dddd;

    becomes

    ggg;hhh;jjj;kkk
    eeee;ddd;"ggg
    ggggggg";
    aaaaa;"bbb
    ccc";dddd;

    The OP needs to replace CR on its own, with something else, before parsing the file line by line.
    If CR is replaced first, it breaks the existing CRLF at the real end of lines.
    So replace the CRLF with <something>
    Replace CR with <something else>
    Replace the <something> back to CRLF
    then the file can be split on line endings

  14. Jason P

    Apr 13 Xojo Inc http://xojo.com/

    The CSV Parser here might help:

    http://www.great-white-software.com/Xojo_Code.html

    It may already handle the CR LF and embedded CR in field values.

  15. Robert W

    Apr 13 Western Canada

    @Jeff T Again, that code breaks in this instance.. because the data has CR in the middle of a line, surrounded by quotes
    eeee;ddd;"ggg CR ggggggg"; CRLF

    Norman's parser (that Jason linked to) will fix this, but needs to be modified slightly to change the field delimiter to a semicolon.
    Another option is to make use of the split function to isolate the quoted material. This is the code that I use for reading CSV files (modified to use a semicolon):

    Function ImportCSV(csvText As TextInputStream) as DataRecord()
      'Parses semicolon delimited TextInputStream into an array of "record" arrays.
      'Return type from this routine is an array of type DataRecord.
      'DataRecord is a class containing nothing but a string array property: dataField()
      'So, an array of DataRecord is essentially a general two dimensional array
      'that can have a variable number of rows and columns.
      dim outData() as DataRecord
      dim delimField As string = ";" 'For CSV change this to "," or for TAB delimited to chr(9)
      dim delimQuote As string = chr(34)
      dim rawInput,FieldData As String
      While not csvText.EOF
        'Read a line of text
        rawInput = csvText.ReadLine
        'Read more if pending line has embedded line endings
        While (max(1,CountFields(rawInput,delimQuote)) mod 2=0) 'While quote parity is odd...
          if csvText.EOF then 'Big trouble!
            MsgBox "Encountered EOF while processing quoted text. Closing quote is missing."
            'Could handle this by returning outData as is,
            'or add a closing quote to the last record.
            'Or...
            return nil 'which means bad file data regardless.
          end if
          rawInput = rawInput + EndOfLine + csvText.ReadLine
        Wend
        ' ********** Start new record
        outData.Append(new DataRecord) 
        FieldData=""
        dim currentRecordNo As Integer = UBound(outData)
        dim Qgroup() As String = split(rawInput,delimQuote) 'Odd numbered elements are quoted text
        dim nQgroups As Integer = UBound(Qgroup)
        for i as integer = 0 to nQgroups step 2 'Skip over quoted text ...for now
          dim field() As string = split(Qgroup(i),delimField)
          if UBound(field)<0 Then field.Append("") 'fix inconsistency in how Split() handles null string
          dim nFields As Integer = UBound(field)
          for j as Integer = 0 to nFields
            if j<>0 then
              '********** Save field data for current field in current record
              outData(currentRecordNo).dataField.Append(UnQuote(FieldData,delimQuote)) 
              FieldData=""
            end if
            FieldData = FieldData+field(j)
            if j=nFields and i<nQgroups then 'This is where we include the quoted text
              FieldData=FieldData+delimQuote+Qgroup(i+1)+delimQuote
            end if
          next
        next
        '********** Save field data for last field in record
        outData(currentRecordNo).dataField.Append(UnQuote(FieldData,delimQuote)) 
      Wend
      Return outData
    End Function
    
    Function UnQuote(s As String, Qchar As String) as string
      ' Remove enclosing quotes (if any) from CSV field and unEscape embedded quotes
      ' Called by ImportCSV and ImportCSVdb
      dim temp As String = s
      if left(s,1)=Qchar and right(s,1)=Qchar then
        temp=Mid(s,2,Len(s)-2)
      else
        temp=s
      end if
      Return ReplaceAll(temp,Qchar+Qchar,Qchar)
    End Function
    
    Class DataRecord
      Property
        dataField() As String
      EndProperty
    End Class

    I've fed it a lot of messy CSV text, and haven't managed to break it so far. This example puts the data into a 'DataRecord' object which is essentially a variable size two dimensional string array. To handle the field data differently you need to alter the code following the 3 comment lines that begin with ' **********.
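
    As a usage sketch only, the function might be called like this. It assumes the file is well formed (ImportCSV returns nil when a closing quote is missing, which this sketch does not handle) and that dataField is a public property. The variable names are illustrative:

    ```xojo
    dim f as FolderItem = GetOpenFolderItem("")
    if f <> nil then
      dim t as TextInputStream = TextInputStream.Open(f)
      t.Encoding = Encodings.WindowsLatin1
      dim records() as DataRecord = ImportCSV(t)
      t.Close
      // Write each record to the debug log, one line per record
      for r as integer = 0 to UBound(records)
        dim fields() as string = records(r).dataField
        System.DebugLog(Join(fields, " | "))
      next
    end if
    ```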
