Best approach advice: Managing text editor lines

GarryPettet · January 29, 2023, 10:57am

I’m working on a new open source text editor control based on TextInputCanvas. I’ve written an editor control before (XUICodeEditor) but this new one, which I’m calling SyntaxArea, will need some features that are difficult (maybe impossible) to implement with XUICodeEditor.

I’m stuck at the planning stage. I’ve decided that I will store all text entered by the user in a central TextStorage class (essentially a gap buffer) and I’m happy with that.
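For readers unfamiliar with the term: a gap buffer keeps all the text in one array with an unused "gap" at the edit point, so typing at the cursor only fills the gap instead of shifting the rest of the document. Below is a minimal Python sketch, assuming a list of characters as the backing store. The class and method names are hypothetical and are not Garry's actual TextStorage API.

```python
class GapBuffer:
    """Minimal gap buffer: text lives in one list with an unused gap at the edit point."""

    def __init__(self, text: str = "", gap_size: int = 16) -> None:
        self.buf: list[str] = list(text) + [""] * gap_size
        self.gap_start = len(text)      # first unused slot
        self.gap_end = len(self.buf)    # one past the last unused slot

    def __len__(self) -> int:
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos: int) -> None:
        # Shift characters across the gap so that the gap starts at `pos`.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self, needed: int) -> None:
        # Enlarge the gap (at its end) when it is too small for an insertion.
        extra = max(needed, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = [""] * extra
        self.gap_end += extra

    def insert(self, pos: int, text: str) -> None:
        self._move_gap(pos)
        if self.gap_end - self.gap_start < len(text):
            self._grow(len(text))
        for ch in text:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos: int, count: int) -> None:
        # Deleting just widens the gap over the removed characters.
        self._move_gap(pos)
        self.gap_end += min(count, len(self.buf) - self.gap_end)

    def text(self) -> str:
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Repeated edits near the same spot are cheap because the gap is already there; moving the cursor far away costs one pass over the characters in between.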

What I need advice on is how to store the line data. Ideally I think I’d like to have an array of some sort of TextLine object that stores the start and length of each line in the TextStorage. That way when I draw the screen I can quickly grab the visible lines and render them.
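The array-of-TextLine idea can be sketched like this, assuming a plain Python string stands in for TextStorage and that lines are separated by "\n". `build_lines` and `visible_lines` are hypothetical helper names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    start: int   # offset of the line's first character in the text storage
    length: int  # number of characters in the line, excluding the line break


def build_lines(text: str) -> list[TextLine]:
    """Scan the whole text once and record where each hard line starts."""
    lines: list[TextLine] = []
    start = 0
    for i, ch in enumerate(text):
        if ch == "\n":
            lines.append(TextLine(start, i - start))
            start = i + 1
    lines.append(TextLine(start, len(text) - start))  # last line has no trailing break
    return lines


def visible_lines(lines: list[TextLine], first_visible: int, rows: int) -> list[TextLine]:
    """Grab only the lines that fit on screen; no need to touch the rest."""
    return lines[first_visible:first_visible + rows]
```

For example, with `text = "one\ntwo\nthree"`, `visible_lines(build_lines(text), 1, 2)` yields the lines whose slices of `text` are `"two"` and `"three"`.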

This approach works fine until you want to implement soft wrapping. With soft wrapping, I need to work out the new start position and length of every line, seemingly every time the user types. If the user is typing in the middle of a long document, I worry this will be too computationally expensive.

I guess what I’m asking is: does anyone know how present-day text editors manage soft wrapping? This is a crucial feature that is not possible in XUICodeEditor because in XUICodeEditor I only store the characters on a per-line basis rather than in a central text storage data structure.

kevin_g · January 29, 2023, 4:33pm

Hi Garry.

If you are recalculating line positions for soft wrapping, you should only have to do it from the start of the current line (or paragraph) to the next hard break (the end of the paragraph).
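A minimal sketch of that idea, assuming a monospaced font, a wrap width measured in characters, and visual rows stored per paragraph with offsets relative to the paragraph start. `wrap_paragraph` and `rewrap_edited` are hypothetical names, and the greedy "break at the last space that fits" rule is just one simple wrapping policy.

```python
def wrap_paragraph(text: str, width: int) -> list[tuple[int, int]]:
    """Greedy soft wrap of one paragraph.

    Returns (start, length) visual rows relative to the paragraph start.
    The break space stays at the end of its row, so it may hang one column
    past `width`; a word longer than `width` is hard-cut.
    """
    rows: list[tuple[int, int]] = []
    start = 0
    n = len(text)
    while n - start > width:
        brk = text.rfind(" ", start, start + width + 1)  # last space that fits
        if brk <= start:
            rows.append((start, width))  # no usable space: cut at the width
            start += width
        else:
            rows.append((start, brk - start + 1))
            start = brk + 1
    rows.append((start, n - start))
    return rows


def rewrap_edited(paragraphs: list[str], rows_by_paragraph: list[list[tuple[int, int]]],
                  index: int, width: int) -> None:
    """Only the edited paragraph is rewrapped; every other paragraph keeps its cached rows."""
    rows_by_paragraph[index] = wrap_paragraph(paragraphs[index], width)
```

Because the rows are relative to their own paragraph, typing in one paragraph leaves the cached rows of all other paragraphs valid.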

A linked list might also be more efficient than an array, as you could be constantly adding and removing lines as the user types.
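A doubly linked list of lines might look like the following sketch, assuming each node holds the same start/length pair as a TextLine. The class names are hypothetical. Inserting or removing a line next to a node you already hold only rewires neighbouring links, with no shifting of the remaining lines.

```python
from typing import Iterator, Optional


class LineNode:
    """One line in a doubly linked list; start/length refer to text storage offsets."""

    def __init__(self, start: int, length: int) -> None:
        self.start = start
        self.length = length
        self.prev: Optional["LineNode"] = None
        self.next: Optional["LineNode"] = None


class LineList:
    def __init__(self) -> None:
        self.head: Optional[LineNode] = None
        self.tail: Optional[LineNode] = None

    def append(self, node: LineNode) -> LineNode:
        if self.tail is None:
            self.head = self.tail = node
            return node
        return self.insert_after(self.tail, node)

    def insert_after(self, anchor: LineNode, node: LineNode) -> LineNode:
        # O(1): only the links around `anchor` change.
        node.prev, node.next = anchor, anchor.next
        if anchor.next is not None:
            anchor.next.prev = node
        else:
            self.tail = node
        anchor.next = node
        return node

    def remove(self, node: LineNode) -> None:
        # O(1): unlink the node from its neighbours.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def __iter__(self) -> Iterator[LineNode]:
        node = self.head
        while node is not None:
            yield node
            node = node.next
```

The trade-off is lookup: reaching line N (for example the first visible line after a scroll) means walking N nodes, whereas an array gives that in constant time.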

Eric_Williams · January 29, 2023, 10:56pm

I’ve done similar work with text wrapping, although not with live updating, but I think my approach will be useful.

You need to decompose your TextLine object into an associated object structure that looks like this:

TextLine → TextLineRendering

TextLineRendering: property TextLineRenderingUnit(), property totalWidthByContext as Dictionary

TextLineRenderingUnit: property TextLineRenderingStyleRun(), property widthByContext as Dictionary, property spaceAfter as Boolean

TextLineRenderingStyleRun: property text as string, widthByContext as Dictionary, font, bold, size, etc.
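The hierarchy above could be expressed roughly as the following Python dataclasses. This is a sketch assuming a plain dict plays the role of the Xojo Dictionary; the default font values and the `text()` helpers are hypothetical additions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TextLineRenderingStyleRun:
    text: str
    font: str = "monospace"   # hypothetical default styling
    size: float = 12.0
    bold: bool = False
    width_by_context: dict = field(default_factory=dict)  # context -> measured width


@dataclass
class TextLineRenderingUnit:
    runs: list[TextLineRenderingStyleRun]
    space_after: bool = False
    width_by_context: dict = field(default_factory=dict)

    def text(self) -> str:
        # Style runs in a unit are drawn back to back with no spaces.
        return "".join(run.text for run in self.runs)


@dataclass
class TextLineRendering:
    units: list[TextLineRenderingUnit]
    total_width_by_context: dict = field(default_factory=dict)

    def text(self) -> str:
        # A space follows each unit whose space_after flag is set.
        return "".join(u.text() + (" " if u.space_after else "") for u in self.units)


@dataclass
class TextLine:
    start: int
    length: int
    rendering: Optional[TextLineRendering] = None  # TextLine -> TextLineRendering
```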

TextLineRendering is an object that holds the data necessary to render its associated TextLine to the output context. It also contains the code that does the rendering.

totalWidthByContext is a Dictionary that associates a rendering context (e.g. a particular Canvas, Picture, etc.) with how wide the TextLineRendering is on that context. Storing the information like this gives you the flexibility of having multiple views of your text with different settings (think: zoom) while still maintaining performance. Essentially, it is the sum of the widths of all the child TextLineRenderingUnit objects PLUS whatever space you insert between them during rendering, any invisible characters shown at the beginning and end of each line, indents, etc.

TextLineRenderingUnit represents the smallest bit of text the rendering system will handle at one time without wrapping. Because you’re writing a code editor, I’m assuming you don’t want to use hyphenation – the structure changes if you do want to hyphenate. The Unit will represent text like “Next” and “(-1)” which should never be broken across a line, depending on your style rules. Note the lack of any spaces before or after the text. The rendering system will take care of spacing. A widthByContext Dictionary performs the same function as the corresponding object in the parent.

The Unit object contains one or more TextLineRenderingStyleRun objects which represent the individual styled characters to render. Each StyleRun contains styling information (font, size, bold, etc) as well as a widthByContext Dictionary for the same use as the parent object. When the StyleRun is rendered for the first time for a particular context, the rendering width is stored in the dictionary for future use.
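The per-context width caching described above can be sketched as follows. `RenderContext` is a hypothetical stand-in for a Canvas or Picture that measures monospaced text at a given zoom; a real implementation would call the platform's text measurement instead. Widths are measured the first time a run, unit or line is asked for them on a context and reused afterwards.

```python
from dataclasses import dataclass, field


class RenderContext:
    """Hypothetical stand-in for a Canvas or Picture: monospaced text at one zoom level."""

    def __init__(self, char_width: float) -> None:
        self.char_width = char_width
        self.measure_calls = 0  # counts real measurements so the caching is observable

    def measure(self, text: str) -> float:
        self.measure_calls += 1
        return len(text) * self.char_width


@dataclass
class StyleRun:
    text: str
    width_by_context: dict = field(default_factory=dict)

    def width(self, ctx: RenderContext) -> float:
        # Measured the first time this run is drawn on `ctx`, then reused.
        if ctx not in self.width_by_context:
            self.width_by_context[ctx] = ctx.measure(self.text)
        return self.width_by_context[ctx]


@dataclass
class Unit:
    runs: list[StyleRun]
    space_after: bool = False
    width_by_context: dict = field(default_factory=dict)

    def width(self, ctx: RenderContext) -> float:
        # Style runs inside a unit are drawn with no space between them.
        if ctx not in self.width_by_context:
            self.width_by_context[ctx] = sum(run.width(ctx) for run in self.runs)
        return self.width_by_context[ctx]


@dataclass
class LineRendering:
    units: list[Unit]
    indent: float = 0.0
    total_width_by_context: dict = field(default_factory=dict)

    def total_width(self, ctx: RenderContext) -> float:
        # Unit widths plus the spaces inserted after units plus the indent, cached per context.
        if ctx not in self.total_width_by_context:
            space = ctx.measure(" ")
            self.total_width_by_context[ctx] = self.indent + sum(
                unit.width(ctx) + (space if unit.space_after else 0.0)
                for unit in self.units
            )
        return self.total_width_by_context[ctx]
```

A zoomed view or a minimap is just another `RenderContext` key, so each view gets its own cached widths without invalidating the others.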

Depending on your style rules, a Unit object representing “(-1)” might contain StyleRun objects like “(”, “-1”, and “)”. Each StyleRun object is rendered with no space in between.

If the spaceAfter property of the Unit is true, a space is rendered. This lets you create style rules that allow breaking in the text “if SomeMethod(-1)” between the “SomeMethod” and the “(” if that is desirable.
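Soft wrapping over such units might look like this greedy sketch, assuming a monospaced font and a maximum width in columns. Units are never split; a break may fall between any two units, and `space_after` only decides whether a space is drawn. `WrapUnit` and `wrap_units` are hypothetical names.

```python
from typing import NamedTuple


class WrapUnit(NamedTuple):
    text: str          # the unit's visible text (its style runs concatenated)
    space_after: bool  # draw a space after this unit?


def wrap_units(units: list[WrapUnit], max_width: int) -> list[str]:
    """Greedy wrap: start a new row whenever the next unit would overflow a non-empty row.

    A unit wider than `max_width` gets a row of its own and overflows,
    because units are never broken.
    """
    rows: list[str] = [""]
    for unit in units:
        if rows[-1] and len(rows[-1]) + len(unit.text) > max_width:
            rows.append("")
        rows[-1] += unit.text + (" " if unit.space_after else "")
    # Drop a trailing space left at the end of a row (units contain no spaces themselves).
    return [row.rstrip(" ") for row in rows]
```

With the units `if` (space after), `SomeMethod` and `(-1)`, a width of 20 keeps `if SomeMethod(-1)` on one row, while a width of 14 breaks it into `if SomeMethod` and `(-1)`, exactly between the "SomeMethod" and the "(".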

You’d create the root-level TextLineRendering object any time a TextLine is created or modified, and then save the cacheable values once they are known (during the first rendering pass). This will make all subsequent renderings much faster. Remember to invalidate and recreate the TextLineRendering object any time its associated TextLine is modified.
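One way to guarantee that invalidation is to route every modification through the TextLine itself. The sketch below is a lazy variant (the rendering is built on first use after a creation or modification rather than immediately), and the class shapes are hypothetical simplifications.

```python
from typing import Optional


class TextLineRendering:
    """Hypothetical holder for cached layout data (e.g. widths) of one line."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.width_by_context: dict = {}


class TextLine:
    def __init__(self, text: str) -> None:
        self._text = text
        self._rendering: Optional[TextLineRendering] = None

    @property
    def text(self) -> str:
        return self._text

    @text.setter
    def text(self, value: str) -> None:
        self._text = value
        self._rendering = None  # invalidate: cached widths no longer match the text

    @property
    def rendering(self) -> TextLineRendering:
        # Rebuilt on first access after creation or modification, then reused.
        if self._rendering is None:
            self._rendering = TextLineRendering(self._text)
        return self._rendering
```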

I typed this up out of memory; I haven’t looked at my implementation of this system in a while, so there may be holes, but I think it’s a good start.

Hans-Norbert_Gratzal · January 30, 2023, 10:22am

A common data structure for storing text editor data is the piece table (see the Wikipedia article "Piece table").
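For comparison with a gap buffer, a minimal piece table might look like the sketch below. The original text is never modified; inserted text is appended to an "add" buffer, and the document is described by a list of pieces pointing into either buffer. The names are hypothetical, and a production version would also need fast position lookup.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    buffer: str  # "original" or "add"
    start: int   # offset into that buffer
    length: int


class PieceTable:
    def __init__(self, text: str) -> None:
        self.original = text  # read-only after construction
        self.add = ""         # append-only buffer for inserted text
        self.pieces = [Piece("original", 0, len(text))] if text else []

    def _buf(self, piece: Piece) -> str:
        return self.original if piece.buffer == "original" else self.add

    def insert(self, pos: int, text: str) -> None:
        if not text:
            return
        new = Piece("add", len(self.add), len(text))
        self.add += text
        offset = 0
        for i, p in enumerate(self.pieces):
            if pos <= offset + p.length:
                k = pos - offset
                # Split the piece around the insertion point, dropping empty parts.
                parts = [Piece(p.buffer, p.start, k), new,
                         Piece(p.buffer, p.start + k, p.length - k)]
                self.pieces[i:i + 1] = [q for q in parts if q.length > 0]
                return
            offset += p.length
        self.pieces.append(new)  # empty table

    def delete(self, pos: int, count: int) -> None:
        end = pos + count
        result: list[Piece] = []
        offset = 0
        for p in self.pieces:
            p_start, p_end = offset, offset + p.length
            if p_end <= pos or p_start >= end:
                result.append(p)  # piece lies entirely outside the deleted range
            else:
                if p_start < pos:  # keep the part before the deletion
                    result.append(Piece(p.buffer, p.start, pos - p_start))
                if p_end > end:    # keep the part after the deletion
                    result.append(Piece(p.buffer, p.start + (end - p_start), p_end - end))
            offset = p_end
        self.pieces = result

    def text(self) -> str:
        return "".join(self._buf(p)[p.start:p.start + p.length] for p in self.pieces)
```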

GarryPettet · January 30, 2023, 10:44am

@Eric_Williams: Thank you for the detailed reply, it’s very helpful. I particularly like the comment you make about zooming. This could also be utilised to provide a “minimap” view for example.

That’s a good point. It will still require me to loop through the array of all lines after the line being edited and update the start offset of each one. I guess that might be OK, since it’s just a single addition per line.
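The loop described here is short. This sketch assumes the edit inserted (positive `delta`) or deleted (negative `delta`) characters inside a single line without adding or removing line breaks; `shift_after_edit` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    start: int
    length: int


def shift_after_edit(lines: list[TextLine], index: int, delta: int) -> None:
    """Grow or shrink the edited line, then move every later line by the same amount."""
    lines[index].length += delta
    for line in lines[index + 1:]:  # one addition per following line
        line.start += delta
```

Typing two characters into the first line of `"one\ntwo\nthree"` changes the lines from `(0, 3), (4, 3), (8, 5)` to `(0, 5), (6, 3), (10, 5)`. The cost grows with the number of lines after the edit, which is what the next reply addresses.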

kevin_g · January 30, 2023, 12:59pm

Hi Garry.

Depending on how often you access the data, you might find it more efficient to store only the length of each line and calculate the start position when you need to locate the line that contains a specific position.
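Storing only lengths could look like this sketch, assuming each line is followed by a single "\n" that is not counted in its length. `line_start` and `line_at` are hypothetical helper names. An edit inside a line now only changes that line's length, and start positions are recomputed by summing when they are needed.

```python
def line_start(lengths: list[int], index: int) -> int:
    """Start offset of line `index`: preceding lengths plus one newline per preceding line."""
    return sum(lengths[:index]) + index


def line_at(lengths: list[int], offset: int) -> int:
    """Index of the line containing `offset` (the offset just before a newline belongs to that line)."""
    start = 0
    for i, n in enumerate(lengths):
        if offset <= start + n:
            return i
        start += n + 1  # skip this line and its newline
    return len(lengths) - 1  # clamp offsets past the end to the last line
```

For `"one\ntwo\nthree"` the lengths are `[3, 3, 5]`, so `line_start(lengths, 2)` is 8 and `line_at(lengths, 9)` is 2. Typing two characters into line 0 only needs `lengths[0] += 2`.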

Jean-Yves_Pochez · January 30, 2023, 2:29pm

Maybe take a look at how @Bob_Keeney3 solved this problem in FTC?