Split very long String

Hello,

I have a custom canvas that draws text wrapped across lines. The canvas is 600 pixels wide. Using RegEx, I split the text into words and spaces, so I can calculate how many words and spaces fit on one line before a new line has to be inserted and output continues on it. It all works smoothly and perfectly. Now to my question:

It can happen that a word (for testing purposes, a meaningless string of letters) is wider than 600 pixels, say 2400 pixels. Such a word has to be split into parts. My gut tells me that a loop like this one, which measures one character at a time, is very inefficient:

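[code]Dim tempWidth As Double
Dim charWidth As Double

For i As Integer = 1 To MyString.Len
  // One StringWidth call per character (g is the canvas Graphics object)
  charWidth = g.StringWidth(MyString.Mid(i, 1))
  
  If tempWidth + charWidth > 600 Then
    // Add new line here; character i is the first one on the new line,
    // so the running width restarts with its width instead of 0.
    tempWidth = charWidth
  Else
    tempWidth = tempWidth + charWidth
  End If
Next[/code]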

Are there better approaches, comparable to optimized array sorting algorithms? Is there a mathematically more elegant way to solve something like this?

Thanks.

If you prioritize speed over precision, you can use the percentages as a starting point. In your example, 75% of the pixels have to be eliminated, so start with eliminating the last 75% of the word, then work backwards from there.
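A minimal sketch of that idea in Xojo, assuming `g` is the Graphics object from the canvas Paint event; the method name `FitLengthByRatio` is hypothetical. It measures the whole word once, scales the character count by the ratio of allowed to actual width, then corrects the guess one character at a time (backwards if it is too long, forwards if wide characters near the end made it too short):

```xojo
// Sketch of the proportional estimate. FitLengthByRatio is a hypothetical name;
// it returns how many leading characters of s fit into maxWidth pixels.
// Assumes the width of Left(s, n) grows as n grows.
Function FitLengthByRatio(g As Graphics, s As String, maxWidth As Double) As Integer
  If maxWidth <= 0 Then Return 0
  Dim total As Integer = Len(s)
  If total = 0 Then Return 0
  
  // One StringWidth call for the whole word
  Dim fullWidth As Double = g.StringWidth(s)
  If fullWidth <= maxWidth Then Return total // everything fits
  
  // Proportional guess: if only 25% of the pixels fit, start with 25% of the characters
  Dim n As Integer = Floor(total * maxWidth / fullWidth)
  
  // Guess too long: work backwards until the prefix fits
  While n > 0 And g.StringWidth(Left(s, n)) > maxWidth
    n = n - 1
  Wend
  
  // Guess too short: work forwards while one more character still fits
  While n < total And g.StringWidth(Left(s, n + 1)) <= maxWidth
    n = n + 1
  Wend
  
  Return n
End Function
```

With a font whose character widths are fairly uniform, the guess should land close to the real split point, so only a few extra StringWidth calls are needed; very uneven widths make the correction loops run longer.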

Thanks Kem, that’s good advice. I’ll have a look at it.

A bisection algorithm should speed things up. Another tweak is to estimate the average character width of the font and use it to make a better guess as to where the split will occur, which cuts down on the number of bisections you need to make. The big time cost is measuring the string width, so you want to minimize the number of times you call StringWidth.
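A sketch of bisection combined with the average-width guess, again in Xojo and assuming `g` is the canvas Graphics object. `FitLengthBisect` and `DrawWrapped` are hypothetical names, and using the lowercase alphabet as the sample for the average character width is my own assumption, not anything built into Xojo:

```xojo
// Sketch of bisection with an average-width first guess. FitLengthBisect and
// DrawWrapped are hypothetical names; the lowercase alphabet used to estimate
// the average character width is an assumption, not a Xojo constant.
Function FitLengthBisect(g As Graphics, s As String, maxWidth As Double) As Integer
  If maxWidth <= 0 Then Return 0
  If g.StringWidth(s) <= maxWidth Then Return Len(s) // whole string fits
  
  // Invariant: Left(s, lo) fits and Left(s, hi) does not
  Dim lo As Integer = 0
  Dim hi As Integer = Len(s)
  
  // Tweak: guess the split point from the font's average character width
  // (this measurement could be cached once per font)
  Dim sample As String = "abcdefghijklmnopqrstuvwxyz"
  Dim avgWidth As Double = g.StringWidth(sample) / Len(sample)
  If avgWidth > 0 Then
    Dim guess As Integer = Floor(maxWidth / avgWidth)
    If guess > lo And guess < hi Then
      If g.StringWidth(Left(s, guess)) <= maxWidth Then
        lo = guess
      Else
        hi = guess
      End If
    End If
  End If
  
  // Bisection: halve the interval until lo and hi are neighbours
  While hi - lo > 1
    Dim middle As Integer = (lo + hi) \ 2
    If g.StringWidth(Left(s, middle)) <= maxWidth Then
      lo = middle
    Else
      hi = middle
    End If
  Wend
  
  Return lo
End Function

// Example caller: draw one long word over as many lines as needed
Sub DrawWrapped(g As Graphics, source As String, maxWidth As Double)
  Dim rest As String = source
  Dim y As Double = g.TextAscent
  While rest <> ""
    Dim n As Integer = FitLengthBisect(g, rest, maxWidth)
    If n = 0 Then n = 1 // always take at least one character so the loop ends
    g.DrawString(Left(rest, n), 0, y)
    y = y + g.TextHeight
    rest = Mid(rest, n + 1)
  Wend
End Sub
```

Per line this costs one StringWidth call for the whole remainder, one for the sample, one for the guess, and at most nine bisection steps for a 300-character remainder, instead of one call per character.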

@Martin Trippensee — Is there a reason why you don’t want to use built-in functions to draw your string on multiple lines?

I have a less elegant but more comprehensive solution. I use the character length (pLength), not pixels, as the limit for each string. You calculate how many characters can fit into the Canvas width before calling this routine. The big difference is my routine also respects embedded EndOfLine characters.

[code]FUNCTION SplitStringByWord (pString AS STRING, pLength AS INTEGER) AS STRING()
  
  DIM Result (-1) AS STRING
  DIM ThisString AS STRING
  
  IF Len (pString) > 0 THEN ' Valid string
    
    IF pLength > 0 THEN ' Valid length
      
      ' Start with pString, normalizing CR, LF and CR+LF to a single Chr(10) so that
      ' every line break is exactly one character (EndOfLine is two characters on Windows)
      ThisString = ReplaceLineEndings (pString, EndOfLine.UNIX)
      
      WHILE Len (ThisString) > 0
        DIM LastSpacePosition AS INTEGER = 0
        DIM NewLine AS BOOLEAN = False
        
        FOR i AS INTEGER = 1 TO Len (ThisString) ' Walk through each character in the string
          SELECT CASE Mid (ThisString, i, 1) ' Check this character
            
          CASE " " ' Space
            LastSpacePosition = i
            
          CASE EndOfLine.UNIX ' Line break
            Result.Append (Left (ThisString, i - 1))
            ThisString = Mid (ThisString, i + 1)
            NewLine = True
            EXIT FOR i
            
          END SELECT
          
          ' If we made it here, this character is not a line break
          
          IF i >= pLength THEN ' Have reached the maximum string length
            IF LastSpacePosition > 0 THEN ' Break at last space
              Result.Append (Left (ThisString, LastSpacePosition - 1))
              ThisString = Mid (ThisString, LastSpacePosition + 1)
              
            ELSE ' Break now because there was no last space
              Result.Append (Left (ThisString, i))
              ThisString = Mid (ThisString, i + 1)
            END IF
            
            NewLine = True
            EXIT FOR i
          END IF
        NEXT i
        
        IF NOT NewLine THEN ' Reached the end of ThisString without breaking it
          Result.Append (ThisString) ' Add the remainder as the last array element
          ThisString = "" ' This makes the WHILE-WEND exit
        END IF
      WEND
      
    ELSE ' pLength <= 0
      Result.Append (pString)
      
    END IF
    
  END IF
  
  RETURN Result
END FUNCTION[/code]
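One way to compute pLength before calling the routine, sketched in Xojo. `CharsPerLine` is a hypothetical helper, and treating the width of "M" as the widest character is an assumption: it is a conservative estimate for most fonts but not guaranteed to be the maximum.

```xojo
// Hypothetical helper: a conservative character limit for SplitStringByWord.
// Assumption: "M" is among the widest characters of the current font, so
// lines of this many characters should usually fit into widthPixels.
Function CharsPerLine(g As Graphics, widthPixels As Double) As Integer
  Dim widest As Double = g.StringWidth("M")
  If widest <= 0 Then Return 1 // guard against division by zero
  
  Dim n As Integer = Floor(widthPixels / widest)
  If n < 1 Then n = 1 // always allow at least one character per line
  Return n
End Function
```

In a Paint event you could then call `SplitStringByWord(myText, CharsPerLine(g, g.Width))` (with `myText` standing in for your own string) and draw each returned line with `g.DrawString`. With proportional fonts, most characters are narrower than "M", so lines will often end well before the right edge; that is the trade-off of counting characters instead of pixels.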

A binary chop style algorithm can help to reduce the number of iterations when the string isn’t going to be chopped a lot of times.

Here is a rough (mainly untested) example. Paste the code into the Paint event of a window and resize the window. It will draw the lines to the window and output the iteration count to the debug log.

[code]Dim y As Int32
Dim debugOrigStringLength As Int32
Dim debugIterationCount As Int32
Dim myString As String
Dim origLength, currentLength, previousLength, validLength As Int32
Dim testStep As Int32
Dim chopString As Boolean
Dim stringPixelWidth As Double

g.TextSize = 96

y = g.TextAscent

myString = "abcdMMefghijklmMMnopMMqrstuMvwyxz"
debugOrigStringLength = Len(myString)
debugIterationCount = 0

origLength = Len(myString)
currentLength = origLength
previousLength = currentLength
testStep = currentLength
chopString = False
While origLength > 0
stringPixelWidth = g.StringWidth(Left(myString, currentLength))

If stringPixelWidth > g.Width Then
  If testStep = 1 Then
    'we can't test a shorter string length
    chopString = True
  Else
    'test a shorter string length
    testStep = Ceil(testStep / 2)
    
    currentLength = Max(currentLength - testStep, 1)
  End If
Else
  'width fits
  validLength = currentLength
  
  If currentLength = origLength Then
    'the entire string fits
    chopString = True
  Else
    If currentLength >= previousLength Then
      'we are going to test a string length we know is too long
      If testStep = 1 Then
        'we can't test a shorter string length
        chopString = True
      End If
    End If
    
    'test a longer string length
    testStep = Ceil(testStep / 2)
    
    currentLength = currentLength + testStep
    previousLength = currentLength
  End If
End If

If chopString = True Then
  'new line
  If validLength > 0 Then
    g.DrawString(Left(myString, validLength), 0, y)
    y = y + g.TextHeight
    
    myString = Mid(myString, validLength + 1)
    
    origLength = Len(myString)
    currentLength = origLength
    previousLength = currentLength
    testStep = currentLength
    validLength = 0 'forget the previous line's result so it is not reused
    chopString = False
  Else
    'nothing fits
    Exit While
  End If
End If

debugIterationCount = debugIterationCount + 1

Wend

System.DebugLog Str(debugIterationCount) + " vs " + Str(debugOrigStringLength)[/code]

If you have styled text, the built-in functions can’t do this and you have to do the wrapping manually.

@Bob Keeney — And what about StyledTextPrinter?

StyledTextPrinter doesn’t allow pictures, hyperlinks and all sorts of other things that Martin is looking at.

@Bob Keeney — The OP never mentioned pictures or hyperlinks!

I know. I just happen to know what he’s working on.

Those were all good suggestions if he was using standard Xojo stuff.

Thank you to all who have commented on this post. It seems that @Brendan Murphy’s suggestions were a good idea, and @Kevin Gale’s algorithm, which follows the same approach, works well. I will check it for optimizations and look forward to more great hints from you. The speed of the algorithm is the top priority. @Alex McPhail, thanks for your contribution as well.