Does anyone have some code to convert rtf to html? Don’t want to reinvent the wheel. I don’t need a full parser, just some colours and styles. textutil on the Mac does the conversion I need. But the CSS differs from execution to execution.
Did you try NSAttributedStringMBS class?
Thanks, NSAttributedStringMBS works almost.
this was on the -RIP- realsoftware forum …
' converts an rtf formatted Text To a pure html String.
' from http://forums.realsoftware.com/viewtopic.php?f=6&t=39729
Dim st As new StyledText
if st<>nil Then
if IsRTFText( aRTFvalue) Then
st.RTFData = aRTFvalue
Else
st.Text = aRTFvalue
End If
Dim Result As Boolean
Dim Count, I As Integer
Dim AllPiecesOfPage, PieceOfPage As String
Dim Piece, Temp, WholeWebPage As String
Dim SR As StyleRun
Count = st.StyleRunCount
AllPiecesOfPage = ""
For I = 0 To Count - 1
SR = st.StyleRun( I )
PieceOfPage = SR.Text
If SR.Bold = True Then
PieceOfPage = "<b>" + PieceOfPage + "</b>"
End If
If SR.Italic = True Then
PieceOfPage = "<i>" + PieceOfPage + "</i>"
End If
If SR.Underline = True Then
PieceOfPage = "<u>" + PieceOfPage + "</u>"
End If
PieceOfPage = ReplaceAll (PieceOfPage, kLineFeed+kLineFeed, "<p>" )
PieceOfPage = ReplaceAll (PieceOfPage, kLineFeed, "<br />" )
Piece = "<span style=" + kGuillemet + "font-family: " + SR.Font
Piece.AppendText "; font-size: " + Str ( SR.Size) + "pt; color: " + Mid ( Str ( SR.TextColor ), 3 )
Piece.AppendText ";" + kGuillemet + ">" + PieceOfPage + "</span>"
AllPiecesOfPage = AllPiecesOfPage + Piece
Next I
WholeWebPage = "<html>" + EndOfLine + "<body style=""font-size: 30pt;"">"
WholeWebPage = WholeWebPage + AllPiecesOfPage + EndOfLine
WholeWebPage = WholeWebPage + "</body>" + EndOfLine + "</html>" + EndOfLine
Return WholeWebPage
End If
Perfect, thanks!
A guillemet is a french type of quote. Not sure why the original author uses the guillemet instead of a simple quote.
Here is my updated code:
Private Function MakeHtmlFromStyledText(theStyledText as StyledText) as String
dim theText(-1) as String
theText.Append("<p>")
For currentRun as Integer = 0 To theStyledText.StyleRunCount - 1
dim theRun as StyleRun = theStyledText.StyleRun(currentRun)
dim StyleRunText as String = EncodingToHTMLMBS(theRun.Text)
'bold, italic and unterline
If theRun.Bold Then StyleRunText = "<b>" + StyleRunText + "</b>"
If theRun.Italic Then StyleRunText = "<i>" + StyleRunText + "</i>"
If theRun.Underline Then StyleRunText = "<u>" + StyleRunText + "</u>"
'tab and br
StyleRunText = StyleRunText.ReplaceAll(EndOfLine, "<br />")
StyleRunText = StyleRunText.ReplaceAll(Encodings.UTF8.Chr(9), " ")
'font and color
dim theColor as Color = theRun.TextColor
dim ColorString as String = "rgb(" + str(theColor.Red) + ", " + str(theColor.Green) + ", " + str(theColor.Blue) + ")"
dim FontSpan as String = "<span style=""font-family: " + theRun.Font + "; font-size: " + Str(theRun.Size) + "pt; color: " + ColorString + ";"">" + StyleRunText + "</span>"
theText.AddRow(FontSpan)
Next
theText.Append("</p>")
return string.FromArray(theText)
End Function
oups sorry !
Public Const kGuillemet as Number = &u22
Thank you
Thank you, you too
I have an old piece of code for html conversion. It is incomlete and the part handling a font is not so good:
Public Function ConversionRTFtoHTML(ta as textarea) as String
// ------------------------------------------------------------------------------------------------------------
// Konvertierung eines RTF-Textes in einen HTML-Text
// Conversion of a RTF text into a HTML text
// Enriched Text funktioniert leider nicht (Doku aus https://en.wikipedia.org/wiki/Enriched_text )
// Enriched text doesn’t work (docu from https://en.wikipedia.org/wiki/Enriched_text )
// ------------------------------------------------------------------------------------------------------------
// This method isn’t complete. The method contains only the formatting rules which I need at first.
//
// This method is useful to transfer a RTF text of a textarea to a html text for transferring this text per
// e-mail. So isn’t CSS useful - as far as I know.
// ------------------------------------------------------------------------------------------------------------
// --> ta Textarea
// <-- string HTML text
// ------------------------------------------------------------------------------------------------------------
if ta is nil then
return “”
end if
Dim sr as StyleRun
dim lv_SRText as String
dim et as string
dim lv_OpenColor as boolean
dim lv_OpenBold as Boolean
dim lv_OpenItalic as Boolean
dim lv_OpenUnderline as Boolean
et = “”
//Get the number of styles
For i as integer = 0 to ta.StyledText.StyleRunCount - 1 //get the StyleRuns
sr = ta.StyledText.StyleRun( i )
if sr.Size = 0 or sr.Size = 13 then // erst mal das obsolete "font face" nutzen - as a start trying the obsolet "font face"
et = et + "<font face="+sr.Font+">" // die Größe ist im HTML ein Wert von 1 bis 7, paßt ja nicht zur Pixel-Größe
// the size is in HTML a value from 1 to 7, doesn't fit to a pixel size
else
et = et + "<font face="+sr.Font+" size="+sr.Size.ToText+">"
end if
lv_SRText = sr.Text
'lv_SRText = ReplaceAll(lv_SRText ,"<","<") ' für das Kleiner-Zeichen <
'lv_SRText = ReplaceAll(lv_SRText ,">",">") ' für das Größer-Zeichen >
lv_SRText = ReplaceAll(lv_SRText ,"&","&") ' für das Und-Zeichen &
lv_SRText = ReplaceAllB(lv_SRText,"ä","ä") ' für ä (Umlaut-a)
lv_SRText = ReplaceAll(lv_SRText ,"Ä","Ä") ' für Ä (Umlaut-A)
lv_SRText = ReplaceAllB(lv_SRText,"ö","ö") ' für ö (Umlaut-o)
lv_SRText = ReplaceAll(lv_SRText ,"Ö","Ö") ' für Ö (Umlaut-O)
lv_SRText = ReplaceAllB(lv_SRText,"ü","ü") ' für ü (Umlaut-u)
lv_SRText = ReplaceAll(lv_SRText ,"Ü","Ü") ' für Ü (Umlaut-U)
lv_SRText = ReplaceAll(lv_SRText ,"ß","ß") ' für ß (scharfes s, s-z-Ligatur)
lv_SRText = ReplaceAll(lv_SRText ,"é","é") ' für é (e mit Acute-Akzent)
lv_SRText = ReplaceAll(lv_SRText ,"à","à") ' für à (a mit Grave-Akzent)
lv_SRText = ReplaceAll(lv_SRText ,"ô","ô") ' für ô (o mit Circumflex-Akzent)
lv_SRText = ReplaceAll(lv_SRText ,"ç","ç") ' für ç (c mit Cedille)
lv_SRText = ReplaceAll(lv_SRText ,"ñ","ñ") ' für ñ (n mit Tilde)
lv_SRText = ReplaceAll(lv_SRText ,"å","å") ' für å (a mit Ring)
lv_SRText = ReplaceAll(lv_SRText ,"°","°") ' für das Grad-Symbol ° (degree)
lv_SRText = ReplaceAll(lv_SRText ,"©","©") ' für das Copyright-Symbol ©
lv_SRText = ReplaceAll(lv_SRText ,"®","®") ' für das Eingetragene-Marke-Symbol ®
lv_SRText = ReplaceAll(lv_SRText ,"§","§") ' für das deutsche Paragraphen-Zeichen § (section)
lv_SRText = ReplaceAll(lv_SRText ,"¶","¶") ' für das amerikanische Absatz-Zeichen ¶ (paragraph)
lv_SRText = ReplaceAll(lv_SRText,EndOfLine,"<br>") ' neue Zeile
select case sr.TextColor
case color.Red
et = et+"<span style=""color:red"">"
lv_OpenColor = true
case color.Blue
et = et+"<span style=""color:blue"">"
lv_OpenColor = true
case color.Green
et = et+"<span style=""color:green"">"
lv_OpenColor = true
case color.Yellow
et = et+"<span style=""color:yellow"">"
lv_OpenColor = true
case color.Cyan
et = et+"<span style=""color:cyan"">"
lv_OpenColor = true
case color.Magenta
et = et+"<span style=""color:magenta"">"
lv_OpenColor = true
end select
if sr.Bold then
et = et+"<strong>"
lv_OpenBold = true
end if
if sr.Italic then
et = et+"<em>"
lv_OpenItalic = true
end if
if sr.Underline then
et = et+"<u>"
lv_OpenUnderline = true
end if
et = et + lv_SRText
if lv_OpenUnderline then
et = et+"</u>"
lv_OpenUnderline = false
end if
if lv_OpenItalic then
et = et+"</em>"
lv_OpenItalic = false
end if
if lv_OpenBold then
et = et+"</strong>"
lv_OpenBold = false
end if
if lv_OpenColor then
et = et+"</span>"
lv_OpenColor = false
end if
et = et + "</font>"
next
et = et + “”
return et
End Function
but it’s handling of the special characters is better !
But that is the only
I’ve copied the source text into the edit field but some lines of code becomes corrupted. So I try again and look at the preview:
Public Function ConversionRTFtoHTML(ta as textarea) as String
// ------------------------------------------------------------------------------------------------------------
// Konvertierung eines RTF-Textes in einen HTML-Text
// Conversion of a RTF text into a HTML text
// Enriched Text funktioniert leider nicht (Doku aus https://en.wikipedia.org/wiki/Enriched_text )
// Enriched text doesn't work (docu from https://en.wikipedia.org/wiki/Enriched_text )
// ------------------------------------------------------------------------------------------------------------
// This method isn't complete. The method contains only the formatting rules which I need at first.
//
// This method is useful to transfer a RTF text of a textarea to a html text for transferring this text per
// e-mail. So isn't CSS useful - as far as I know.
// ------------------------------------------------------------------------------------------------------------
// --> ta Textarea
// <-- string HTML text
// ------------------------------------------------------------------------------------------------------------
if ta is nil then
return ""
end if
Dim sr as StyleRun
dim lv_SRText as String
dim et as string
dim lv_OpenColor as boolean
dim lv_OpenBold as Boolean
dim lv_OpenItalic as Boolean
dim lv_OpenUnderline as Boolean
et = "<html><body>"
//Get the number of styles
For i as integer = 0 to ta.StyledText.StyleRunCount - 1 //get the StyleRuns
sr = ta.StyledText.StyleRun( i )
if sr.Size = 0 or sr.Size = 13 then // erst mal das obsolete "font face" nutzen - as a start trying the obsolet "font face"
et = et + "<font face="+sr.Font+">" // die Größe ist im HTML ein Wert von 1 bis 7, paßt ja nicht zur Pixel-Größe
// the size is in HTML a value from 1 to 7, doesn't fit to a pixel size
else
et = et + "<font face="+sr.Font+" size="+sr.Size.ToText+">"
end if
lv_SRText = sr.Text
'lv_SRText = ReplaceAll(lv_SRText ,"<","<") ' für das Kleiner-Zeichen <
'lv_SRText = ReplaceAll(lv_SRText ,">",">") ' für das Größer-Zeichen >
lv_SRText = ReplaceAll(lv_SRText ,"&","&") ' für das Und-Zeichen &
lv_SRText = ReplaceAllB(lv_SRText,"ä","ä") ' für ä (Umlaut-a)
lv_SRText = ReplaceAll(lv_SRText ,"Ä","Ä") ' für Ä (Umlaut-A)
lv_SRText = ReplaceAllB(lv_SRText,"ö","ö") ' für ö (Umlaut-o)
lv_SRText = ReplaceAll(lv_SRText ,"Ö","Ö") ' für Ö (Umlaut-O)
lv_SRText = ReplaceAllB(lv_SRText,"ü","ü") ' für ü (Umlaut-u)
lv_SRText = ReplaceAll(lv_SRText ,"Ü","Ü") ' für Ü (Umlaut-U)
lv_SRText = ReplaceAll(lv_SRText ,"ß","ß") ' für ß (scharfes s, s-z-Ligatur)
lv_SRText = ReplaceAll(lv_SRText ,"é","é") ' für é (e mit Acute-Akzent)
lv_SRText = ReplaceAll(lv_SRText ,"à","à") ' für à (a mit Grave-Akzent)
lv_SRText = ReplaceAll(lv_SRText ,"ô","ô") ' für ô (o mit Circumflex-Akzent)
lv_SRText = ReplaceAll(lv_SRText ,"ç","ç") ' für ç (c mit Cedille)
lv_SRText = ReplaceAll(lv_SRText ,"ñ","ñ") ' für ñ (n mit Tilde)
lv_SRText = ReplaceAll(lv_SRText ,"å","å") ' für å (a mit Ring)
lv_SRText = ReplaceAll(lv_SRText ,"°","°") ' für das Grad-Symbol ° (degree)
lv_SRText = ReplaceAll(lv_SRText ,"©","©") ' für das Copyright-Symbol ©
lv_SRText = ReplaceAll(lv_SRText ,"®","®") ' für das Eingetragene-Marke-Symbol ®
lv_SRText = ReplaceAll(lv_SRText ,"§","§") ' für das deutsche Paragraphen-Zeichen § (section)
lv_SRText = ReplaceAll(lv_SRText ,"¶","¶") ' für das amerikanische Absatz-Zeichen ¶ (paragraph)
lv_SRText = ReplaceAll(lv_SRText,EndOfLine,"<br>") ' neue Zeile
select case sr.TextColor
case color.Red
et = et+"<span style=""color:red"">"
lv_OpenColor = true
case color.Blue
et = et+"<span style=""color:blue"">"
lv_OpenColor = true
case color.Green
et = et+"<span style=""color:green"">"
lv_OpenColor = true
case color.Yellow
et = et+"<span style=""color:yellow"">"
lv_OpenColor = true
case color.Cyan
et = et+"<span style=""color:cyan"">"
lv_OpenColor = true
case color.Magenta
et = et+"<span style=""color:magenta"">"
lv_OpenColor = true
end select
if sr.Bold then
et = et+"<strong>"
lv_OpenBold = true
end if
if sr.Italic then
et = et+"<em>"
lv_OpenItalic = true
end if
if sr.Underline then
et = et+"<u>"
lv_OpenUnderline = true
end if
et = et + lv_SRText
if lv_OpenUnderline then
et = et+"</u>"
lv_OpenUnderline = false
end if
if lv_OpenItalic then
et = et+"</em>"
lv_OpenItalic = false
end if
if lv_OpenBold then
et = et+"</strong>"
lv_OpenBold = false
end if
if lv_OpenColor then
et = et+"</span>"
lv_OpenColor = false
end if
et = et + "</font>"
next
et = et + "</body></html>"
return et
End Function
Cough:
Wow RTF is still alive and among us? May I ask why and from where you are recieving RTF formated text? If asked I would deny to develop any parser or converter.
Perhaps because it is/was an easy way to create a document the end user can open with MS Word, w/o writing an own docx? My understanding is that the PO wants to switch to HTML but in an easy way w/o recreating everything from scratch.
Grin… I’m doing quite a few blog posts on AppleScript. Of course, I want to show the code in the blog post. It looks nicer if the code is formatted. I have a CLI called ASHtml. But the CLI has some drawbacks. After looking at the source code I thought that I could do this myself in Xojo. It’s just one call to the MBS plugin to get the code of the AppleScript as rtf. Then I just need to make some ugly html out of the html.
I’m going to make the project available in the next days.
Any update here ? we have a project where we have a lot of rtf files and we try to migrate them to html in a way or another . some documentation or examples will help a lot .
Thanks .
I made an app that I use for my blog posts on AppleScript: “Convert AppleScript to Html” at Other Applications . Once the rtf is in a TextArea it’s simple to make some html.