Please fix extraordinarily slow parser for RTFData

Handling of RTFData in OS X has been problematic since it was introduced. Here’s a NUG thread from 2008:

http://www.realsoftware.com/listarchives/realbasic-nug/2008-07/msg00865.html

The problem is still there in Xojo 9 Cocoa. I filed bug report #25533 with a simple example, but later realized it’s well known. The problem isn’t noticeable with small amounts of text, but speed falls off steeply as a function of text length (a twofold increase in length gives a ~8-fold decrease in speed, i.e. run time grows roughly with the cube of the length), so it quickly becomes unusable for all but the most modest styled text uses. This is part of the note left by Joe Strout in the above thread:


For the curious: Shark shows that it’s spending most of its time inside StyledTextParagraphCountGetter, calling StringDBCSMid3, which is the equivalent of RB’s Mid( start, str, length ). Calling Mid is a horribly inefficient way of iterating through the characters in a string, since it has to start over counting characters from the beginning every time (since a character may take more than 1 byte).
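To illustrate the pattern Joe describes, here is a rough sketch. It is not the framework’s actual code; CountSpacesSlow and CountSpacesFast are made-up names. The first function calls Mid once per character, the second splits the string once and walks the resulting array.

[code]Function CountSpacesSlow(s as String) As Integer
  // One Mid call per character. With multi-byte text each call may have
  // to count characters from the start again, so the loop is O(n^2).
  dim n as Integer = Len(s)
  dim count as Integer
  for i as Integer = 1 to n
    if Mid(s, i, 1) = " " then count = count + 1
  next
  return count
End Function[/code]

[code]Function CountSpacesFast(s as String) As Integer
  // Split with an empty delimiter returns one array element per character,
  // so the string is only walked once.
  dim chars() as String = Split(s, "")
  dim count as Integer
  for i as Integer = 0 to UBound(chars)
    if chars(i) = " " then count = count + 1
  next
  return count
End Function[/code]

Both return the same count; only the way the characters are reached differs.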

Although this problem was bad, one could easily avoid it by using TextStyleData to store/retrieve styled text, which is what I did. However, TextStyleData has been dropped from Xojo Cocoa and we now must use StyledText, so it really has to work now. Joe mentioned that he was planning on getting to TextAreas this week. I don’t know for sure that Joe Strout’s diagnosis is correct, but if it is, it wouldn’t seem terribly hard to remedy. In any case, this old problem is going to cause new pain, and I strongly urge that it be revisited.

The analysis is essentially correct. Unfortunately it is not as trivial to fix as it sounds and won’t happen for 2013r1. However, it should be possible to use declares to create an NSAttributedString from RTF and load that into the NSTextView (using the text storage’s setAttributedString: method).
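A minimal, untested sketch of that declare route, assuming TextArea.Handle is the NSScrollView (as it is under Cocoa). The method name SetRTFViaAttributedString is made up for illustration; the selectors are the standard AppKit/Foundation ones.

[code]Sub SetRTFViaAttributedString(extends t as TextArea, rtf as String)
  #if TargetCocoa
    declare function NSClassFromString lib "Cocoa.framework" (aClassName as CFStringRef) as Ptr
    declare function dataWithBytes lib "Cocoa.framework" selector "dataWithBytes:length:" (class_id as Ptr, bytes as CString, length as Integer) as Ptr
    declare function alloc lib "Cocoa.framework" selector "alloc" (class_id as Ptr) as Ptr
    declare function initWithRTF lib "Cocoa.framework" selector "initWithRTF:documentAttributes:" (obj_id as Ptr, rtfData as Ptr, docAttributes as Ptr) as Ptr
    declare function documentView lib "Cocoa.framework" selector "documentView" (obj_id as Integer) as Ptr
    declare function textStorage lib "Cocoa.framework" selector "textStorage" (obj_id as Ptr) as Ptr
    declare sub setAttributedString lib "Cocoa.framework" selector "setAttributedString:" (obj_id as Ptr, attrString as Ptr)
    declare sub release lib "Cocoa.framework" selector "release" (obj_id as Ptr)

    // Wrap the raw RTF bytes in an autoreleased NSData.
    dim data as Ptr = dataWithBytes(NSClassFromString("NSData"), rtf, LenB(rtf))

    // Let Cocoa parse the RTF. Passing nil means the document attributes are not returned.
    dim attrString as Ptr = initWithRTF(alloc(NSClassFromString("NSAttributedString")), data, nil)
    if attrString = nil then return // the data was not valid RTF

    // The scroll view's document view is the NSTextView; replace its text storage contents.
    setAttributedString(textStorage(documentView(t.Handle)), attrString)
    release(attrString) // balances alloc/init
  #else
    #pragma unused t
    #pragma unused rtf
  #endif
End Sub[/code]

With this in a module, the call would look like TextArea1.SetRTFViaAttributedString(myRTF). The parsing is done entirely by Cocoa, so the Xojo RTF parser is never involved.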

Well, there is more on this (the whole RTF thing), and there are many reports.
Here are some of them …

Maybe it’s time for a new RTF parser or some 3rd-party class?

<https://xojo.com/issue/10193> (from year 2009)
<https://xojo.com/issue/24158>
<https://xojo.com/issue/25533>
<https://xojo.com/issue/18489>
<https://xojo.com/issue/25317>
<https://xojo.com/issue/25316>
<https://xojo.com/issue/25532>
<https://xojo.com/issue/16718>
<https://xojo.com/issue/13760>
<https://xojo.com/issue/17658>
<https://xojo.com/issue/17941>

Um…Formatted Text Control? It has a much better parser and it’s 100% RB code so you can extend it as you need. http://www.bkeeney.com/formatted-text-control/

TextArea doesn’t support many common RTF features, such as inline images and hyperlinks, to name a few. It might be a long time before those ARE supported.


[quote=5332:@Bob Keeney]Um…Formatted Text Control? It has a much better parser and it’s 100% RB code so you can extend it as you need. http://www.bkeeney.com/formatted-text-control/

[/quote]
There’s another option: http://vanhoekplugins.com/REALStudio/WordGuise.html

Can anyone comment here on the relative merits of FTC vs. WordGuise? Specifically, I’m looking for a formatted text control that works well on both Mac (Cocoa) and Windows. I don’t care about spellchecking. I do care about an UNDO system, though I can certainly roll my own.

I don’t want to start a war here or hijack this thread. I think there are other RTF parsers out there, and you will have plenty of options, including the one Joe was talking about (using declares). On Windows, RTF is handled by RichEdit, which is the most advanced RTF parser/writer in the world.

As far as I can tell, the Windows Xojo TextField should do fine. WordGuise also uses RichEdit on Windows and, for what it is worth, the RTF libraries of WASTE 3 on the Mac. WordGuise supports RichEdit Undo/Redo and WASTE 3 Undo/Redo but, as you said, you can write your own.

If you want picture embedding, then go for either the Formatted Text Control (FTC) or WordGuise. You can work out the pros and cons of the two yourself.

Thanks, Alfred. My main showstopper issues in 2012 R2.1 that prevented me from releasing were due to issues with the Cocoa styled TextAreas. I’ve not yet been able to test these in 2013 (beta 9 may be my first actual test). It’s good to know that if the 2013 Cocoa textArea bugs are still showstoppers, that I have other options.

I’d certainly call them showstoppers and am working on them right now. Could you create a new thread and post whatever Feedback links you have for TextArea? I have a folder of them in Feedback, but it may not be exhaustive.

The only issue in FTC in Cocoa is that we have to convert from the regular KeyDown event handling to the new TextInputCanvas event handling. As you may or may not be aware, this change affects all Canvas controls. Accented characters just don’t work the same way in Cocoa and require the TextInputCanvas plugin. It’s a big change, but we are working on it with Real Software’s input.

The great thing about FTC is that it’s NOT a plugin. It’s 100% RB code (except for the TextInputCanvas plugin now). You need functionality change? You can do that yourself without having to wait for us to implement it for you.

Sorry for hijacking the thread…

Thanks for the suggestion. I did look at the Apple Developer’s documentation, which prompted me to take a look at MacOSLib. It turns out that it contains functions, in TextAreaExtensions, that let you get/set TextArea.RTFValue, which is extremely fast. In my demo project with 8000 characters

 s = TextArea1.StyledText.RTFData

takes 25 seconds.

 s = TextArea1.RTFValue

with the same field takes 0 ticks (< 1/60th of a second).

That certainly helps. However, I still have to use StyledText objects (to do things like concatenate styled text runs) and the bottleneck of getting RTFData into and out of StyledText objects remains.

Do you (or anyone else reading this thread) have any hints about how one could co-opt the MacOSLib functions to get/set the RTFValue of StyledText objects?

[quote=5458:@Jonathan Ashwell]Do you (or anyone else reading this thread) have any hints about how one could co-opt the MacOSLib functions to get/set the RTFValue of StyledText objects?[/quote]

Are you talking about getting the RTF data from an arbitrary StyledText object and not a TextArea? In that case, I’m afraid you’re out of luck, short of writing your own RTF parser (or finding one someone else has written).

Not sure if this is helpful but did you see

http://www.realsoftwareblog.com/2012/11/speeding-up-textarea-modifications.html

Monday, November 26, 2012

Speeding up TextArea modifications under Cocoa

When doing a lot of manipulation to a TextArea’s contents under Cocoa, performance can suffer due to the underlying NSTextView doing work on layout and glyph selection. This can be sped up by telling the text view that you’re going to begin editing. You can do this by using a few Cocoa Declares.

The Declare statements are relatively simple:

Declare Function documentView Lib "AppKit" Selector "documentView" ( obj As Integer ) As Integer
Declare Function textStorage Lib "AppKit" Selector "textStorage" ( obj As Integer ) As Integer
Declare Sub beginEditing Lib "AppKit" Selector "beginEditing" ( obj As Integer )
Declare Sub endEditing Lib "AppKit" Selector "endEditing" ( obj As Integer )

These Declares give you access to methods for enabling “batch-editing mode” for the underlying NSTextView.

First you want to get the text storage for the document, which is a two-step process. In the first step, you take the TextArea’s Handle property (an NSScrollView instance) and ask for its document view:

Dim docView As Integer
docView = documentView(MyTextArea.Handle)

Now you get the NSTextStorage for the NSTextView:

Dim storage As Integer
storage = textStorage(docView)

With the text storage, you can now enable batch-editing mode by calling beginEditing:

beginEditing(storage)

Now you can make your significant changes to the TextArea:

For i As Integer = 0 To 5000
MyTextArea.AppendText("Lorem ipsum dolor sit amet. ")
Next

And when you are finished disable batch-editing mode:

endEditing(storage)

So how much does this improve performance? In my tests, the For loop by itself takes about 4.3 seconds to complete. Using batch-edit mode with these Declares drops it to 0.02 seconds. That’s almost instantaneous!

If you find you are going to use these Declares often, you might want to add them to a module as Extension methods so that you can call them more easily, which I’ll leave as an exercise for the reader.
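One way that exercise might look, as a sketch only: the method names BeginBatchEdit and EndBatchEdit are invented here, and the declares are the same ones shown in the post. Add each method to a module.

[code]Sub BeginBatchEdit(extends t as TextArea)
  #if TargetCocoa
    declare function documentView lib "AppKit" selector "documentView" (obj as Integer) as Integer
    declare function textStorage lib "AppKit" selector "textStorage" (obj as Integer) as Integer
    declare sub beginEditing lib "AppKit" selector "beginEditing" (obj as Integer)

    // Handle is the NSScrollView; its document view is the NSTextView.
    beginEditing(textStorage(documentView(t.Handle)))
  #else
    #pragma unused t
  #endif
End Sub[/code]

[code]Sub EndBatchEdit(extends t as TextArea)
  #if TargetCocoa
    declare function documentView lib "AppKit" selector "documentView" (obj as Integer) as Integer
    declare function textStorage lib "AppKit" selector "textStorage" (obj as Integer) as Integer
    declare sub endEditing lib "AppKit" selector "endEditing" (obj as Integer)

    endEditing(textStorage(documentView(t.Handle)))
  #else
    #pragma unused t
  #endif
End Sub[/code]

The loop from the post would then be wrapped as MyTextArea.BeginBatchEdit before the For loop and MyTextArea.EndBatchEdit after it. Every BeginBatchEdit call needs a matching EndBatchEdit.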

Thanks, but that doesn’t address the problem. I’m using StyledText objects and RTF independently of an editfield. The terrible .rtfData performance is a huge bottleneck for anything except very small strings. But the code above might be very useful for direct manipulation of TextAreas.

Out of curiosity, are the functions mentioned above that let you access NSTextView directly supposed to improve performance for invisible fields? I ask because I’ve been doing text manipulation in invisible fields to increase speed (it does) and see no further improvement with batch-editing. If the main improvement comes from avoiding rendering text, this would make sense.

For getting and setting RTF data on a TextArea, I provided Cocoa declares here:
https://forum.xojo.com/1201-basic-rtf-styling-results-with-textedit-and-b16/0

Page Not Found

Oh, I see, it’s only visible to beta testers. Here’s the code:

Create a Structure in a module and name it NSRange, then add to the structure the two fields location and length.

Copy and paste the following functions into the same module (one at a time).

[code]Sub RTFValue(extends t as TextArea, assigns value as String)
#if targetCocoa
declare function dataWithBytes lib "Cocoa.framework" selector "dataWithBytes:length:" (class_id as Ptr, bytes as CString, length as Integer) as Ptr
declare sub replaceCharactersInRange lib "Cocoa.framework" selector "replaceCharactersInRange:withRTF:" (obj_id as Ptr, range as NSRange, rtfData as Ptr)
declare function documentView lib "Cocoa.framework" selector "documentView" (obj_id as Integer) as Ptr
declare function NSClassFromString Lib "Cocoa.framework" (aClassName as CFStringRef) As Ptr

// Wrap the RTF bytes in an NSData object.
dim data as Ptr = dataWithBytes(NSClassFromString("NSData"), value, LenB(value))
// range.location defaults to 0, so the range covers the whole existing text.
dim range as NSRange
range.length = Len(t.Text)
// Replace the current contents of the NSTextView with the parsed RTF.
replaceCharactersInRange(documentView(t.Handle), range, data)

#else
#pragma unused t
#pragma unused value
#endif

End Sub[/code]

[code]Function RTFValue(extends t as TextArea) As String
#if targetCocoa
declare function RTFFromRange lib "Cocoa.framework" selector "RTFFromRange:" (obj_id as Ptr, range as NSRange) as Ptr
declare function length lib "Cocoa.framework" selector "length" (obj_id as Ptr) as Integer
declare sub getBytes lib "Cocoa.framework" selector "getBytes:length:" (obj_id as Ptr, buffer as Ptr, length as Integer)
declare function documentView lib "Cocoa.framework" selector "documentView" (obj_id as Integer) as Ptr

dim range as NSRange
range.length = Len(t.Text)
dim p as Ptr = RTFFromRange(documentView(t.Handle), range)
if p <> nil then
  dim m as new MemoryBlock(length(p))
  getBytes(p, m, m.Size)
  return m.StringValue(0, m.Size)
else
  return ""
end if

#else
#pragma unused t
#endif

End Function[/code]

[code]Function ReadRTFDFromFile(extends t as TextArea, path as String) As Boolean

#if TargetCocoa

declare function readRTFDFromFile lib "Cocoa.framework" selector "readRTFDFromFile:" (obj_id as Ptr, path as CFStringRef) as Boolean
declare function documentView lib "Cocoa.framework" selector "documentView" (obj_id as Integer) as Ptr

return readRTFDFromFile(documentView(t.Handle), path)

#else

#pragma unused t
#pragma unused path

#endif

End Function[/code]

[code]Function WriteRTFDToFile(extends t as TextArea, path as String, atomically as Boolean) As Boolean

#if TargetCocoa

declare function writeRTFDToFile lib "Cocoa.framework" selector "writeRTFDToFile:atomically:" (obj_id as Ptr, path as CFStringRef, atomicFlag as Boolean) as Boolean
declare function documentView lib "Cocoa.framework" selector "documentView" (obj_id as Integer) as Ptr

return writeRTFDToFile(documentView(t.Handle), path, atomically)

#else

#pragma unused t
#pragma unused path
#pragma unused atomically

#endif

End Function[/code]

Just reading through this conversation, it struck me that if we are using Xojo as a cross-platform development environment, we should not be using links to OS libraries in our own code, except perhaps in the most extreme of cases. Well-tested 3rd-party components can wrap up the ugliness of such code and ensure correct operation across the platforms. Well-designed components in our own code can perform a similar role, but it does seem unnatural to have to do this for essential items.