Word wrap algorithm

JensK · July 9, 2021, 10:19am

I am working on a Canvas based TextArea. While typing and drawing text is working quite well, I am currently stuck at how to calculate word wrapping in the most efficient way. Since I don’t want to do that in the Paint event (I want to keep the text buffer seperate from the drawing of text) and I cannot create a Graphics object, I use a Picture’s Graphics object to calculate word and line lengths at the moment.

My first approach of wrapping a text paragraph into lines works well, but seems to be too slow or inefficient for larger amounts of text. In my code “G” is a Graphics object from a Picture with the same size as my Canvas, and “Lines” is a String array property of my Paragraph class; the rest should be self explanatory:

    Var tmpLines() As String
    tmpLines.Add("")

    Var tmp() As String = Text.Split(" ")
    Var lim As Integer = tmp.LastIndex
    Var tmpString As String = tmp(0)

    // Breaking lines
    If Text <> "" Then
      For i As Integer = 1 To lim
        If G.TextWidth(tmpString + " " + tmp(i)) < G.Width Then
          tmpString = tmpString + " " + tmp(i)
        Else
          tmpLines(tmpLines.LastIndex) = tmpString
          tmpLines.Add("")
          tmpString = tmp(i)
        End If
      Next
    End If

    // Adding last line
    If tmpString <> "" Then
      tmpLines(tmpLines.LastIndex) = tmpString
    End If

    Lines = tmpLines

As I am just a hobby programmer I would be highly interested in how a pro would do the calculating of lines.

Thanks in advance for any suggestions!

kevin_g · July 9, 2021, 11:11am

I assume that you have already split the text into paragraphs and this function is to wrap a single paragraph.

My guess is that TextWidth is the culprit. Xojo provides a simplified way of dealing with text drawing which will involve some overhead. If you are macOS only and have the MBS plugins you might be able to solve this by going to a much lower level using the CoreText APIs.

If you can only use the Xojo functions then the solution is probably to reduce the calls to TextWidth.
Ideas:
1.Only recalculate paragraphs which have changed.
2.When recalculating a paragraph check to see if the entire text fits before splitting it.
3.Somehow track the first index within a paragraph where a change occurred and recalculate everything in that paragraph from the previous wrap position onwards.

Some other notes:

The picture you create to measure the text only needs to be 1x1 pixels in size and not the size of the canvas.
Only breaking on a space is very crude and will not work with a lot of text.
If you are looping a lot you may find the pragma commands help.

JensK · July 9, 2021, 12:14pm

Thanks for your suggestions!

Actually I already regarded the points you mention so there is not much calculation going on while typing. The main problems are opening a large text file and resizing the Canvas because then the whole text has to be (re)calculated, which takes too much time for my taste when using my algorithm. I had hoped there would be some more advanced way to do the calculations.

As to your additional notes:

I will try that one out.
I will include hyphens etc. later, when the algorithm works as desired.
I already implemented that but excluded it in the posted code.

kevin_g · July 9, 2021, 1:09pm

You might be able to speed up resizing by remembering the canvas width at the start of a resize and then working out if current canvas width - original canvas width would cause the last word on a line to wrap or the first word on a line to fit onto the previous line. If either are true, you would then recalculate everything.

I think to do anything more advanced will require you to use CoreText (macOS) or Uniscribe (MS-Windows) APIs directly.

A word of caution. Depending on what you are trying to achieve, you might find yourself down a rabbit hole trying to implement your own canvas based text area. There are a lot of things that a text area does (word wrapping / international text input / hit testing / bidirectional text display) that are not trivial to implement. If you can, just use the provided control or maybe an embedded web page with HTML text controls.

JensK · July 9, 2021, 1:40pm

That’s a good idea, I will give it a try.

And thanks for the warning, but it’s just a hobby project. There is no productivity pressure here – I am just curious how it all works.
I already implemented a terminal-like text editor a while ago including mouse scrolling, keyword highlighting etc. (see picture; it’s just a canvas in a window) that works fast and reliable. Now I wanted to expand on non-monotype fonts, but that seems to be a whole different thing in terms of algorithm and speed so that I have to rethink some older conceptions.

kevin_g · July 9, 2021, 2:23pm

Another potential improvement might be to use a method similar to a binary search where you test half and then continuously test either half more or half less until you hit a value you have previously tested or your step would become less than 1.

• Example scenario 1 where they are 10 words and only 6 fit.
Start with a step of 5.
5 words = pass so add the step again
10 words = fail so return to the previous value, half the step to 2 and add the step
7 words = fail so subtract the step
5 words = pass so half the step to 1 and add the step
6 words = pass
We abort now because adding the step would mean we test a value which has already failed. This was 1 less than a linear test.

• Example scenario 2 where they are 20 words and only 9 fit.
Start with a step of 10.
10 words = fail so half the step to 5 and subtract the step
5 words = pass so half the step to 2 and add the step
7 words = pass so add the step
9 words = pass
We abort now because adding the step would mean we test a value which has already failed. This was 5 less than a linear test.

JensK · July 9, 2021, 2:49pm

Interesting approach, thank you!
It would involve some joining in order to evaluate the respecting TextWidth, and I don’t know how to “split” an array (otherwise than adding word by word in a certain range to a new array), both of which will cost (too much?) time. I will have to try it out.

kevin_g · July 9, 2021, 2:54pm

You would have to concatenate the individual words together which does add an overhead but it is possibly a smaller overhead than TextWidth.

JensK · July 9, 2021, 3:18pm

I will consider that.
For now I got some bad infinite loops – I will better go on when I am less tired …

JensK · July 10, 2021, 12:21pm

Today I implemented the following method:

#Pragma DisableBackgroundTasks
#Pragma DisableBoundsChecking

Var tmp() As String = Text.Split(" ")
Var lim As Integer = tmp.LastIndex

Var tmpLines() As String
Var tmpString As String

Var wdth As Integer = G.Width
Var words As Integer = Floor(wdth / (6 * G.TextWidth("e"))) // an average German word is 6 letters long
Var wc As Integer

Var i As Integer = 0

While i < lim
  // reset current line and word counter
  tmpString = ""
  wc = 0
  // add the calculated average of words to the current line
  While wc < words And i < lim
    tmpString = tmpString + " " + tmp(i)
    wc = wc + 1
    i = i + 1
  Wend
  // add more words if there is room left
  While G.TextWidth(tmpString + " " + tmp(i)) < wdth And i < lim
    tmpString = tmpString + " " + tmp(i)
    i = i + 1
  Wend
  // add last word if at end
  If i = lim Then
    tmpString = tmpString + " " + tmp(i)
  End If
  // remove words if line got too long (just in case, it rarely happens)
  If G.TextWidth(tmpString) >= wdth Then
    While g.TextWidth(tmpString) >= wdth
      tmpString = tmpString.Left(tmpString.Length - tmpString.NthField(" ", tmpString.CountFields(" ")).Length)
      i = i - 1
    Wend
  End If
  // add current line to lines array
  tmpLines.Add(tmpString.Trim)
Wend

Lines = tmpLines

It wraps my example paragraph of a 1000 words in about 11 milliseconds, which is about 2 to 3 times faster than my initial method. There is still room for improvement, but that might be the way to go.

When resizing the Canvas, re-wrapping looks very smooth, until the application finally shows the beach ball and doesn’t react anymore. That is the next problem I have to look at …

Carsten_Belling · July 10, 2021, 8:43pm

Three tips from my experience:

Before doing something with “words”, check if the paragraph fits in a line: if g.TextWidth (paragraph) > g.Width then <do wrapping> else <you have a line that needs no wrapping>
Don’t concatenate strings, use arrays and string.FromArray() instead.
And if your code passes all test add
#pragma BoundsChecking false #pragma StackOverflowChecking false #pragma NilObjectChecking false

JensK · July 11, 2021, 8:19am

I will give it a try, thank you! Though I don’t think it will make much difference because if a paragraph fits in a line my outer while loop will exit after the first iteration which will take very little time.
I tried that out but actually it took more time; I think I will have to reconsider my implementation of that approach but I have it in mind.
Not much difference here (about 5 percent perfomance gain).