I have recently completed a custom class for Xojo that translates Markdown to HTML. The class is written 100% in Xojo and uses no declares or custom helper utilities. As best I can tell it supports the basic CommonMark specification, and it includes a number of built-in extensions that let it support Markdown syntax beyond what I have seen in other available classes.
Note: the Linux version has NOT been tested, as I have no machine to test it on.
The class SHOULD work with Console and Web apps as well, but again I have no platform to test that on.
Yours converts any < into &lt; - that makes it impossible to add <div …> (to add color to text, for instance) or comments with <!-- … -->.
I am aware of that… and I have read various articles on parsers that do and do not allow raw HTML…
This is something I need to look into… the issue is being able to determine what is syntactically correct, so that nothing is misinterpreted, since the < and > signs are used within Markdown as well.
I see no reason why not… since the output is pure HTML, and I doubt Mojave is changing how that works.
In regards to raw HTML… as I mentioned, other Markdown parsers handle it in different ways, from not allowing it at all, to allowing only a “whitelist” of elements, to those that have a full-blown HTML lexer involved. Meaning there is no real “standard”.
So my Markdown parser takes this approach, which I think allows the most flexibility:
Right now markdownDS supports “code fences”, and to these I added a “language” option. If you specify HTML as the language, then anything between the code fences is taken AS-IS; otherwise it is rendered “safe” and wrapped in a line-numbered presentation box.
```` HTML
<div><span>This code is rendered to markdown with NO changes</span></div>
````
where this would be rendered as discrete text
````
<div><span>This code is rendered to markdown with NO changes</span></div>
````
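To make the two cases above concrete, here is a minimal Python sketch of the idea (not the actual Xojo code in markdownDS). The function name `render_fences` and the `<pre><code>` wrapper are assumptions for illustration only; the real class wraps escaped code in its own line-numbered box.

```python
import html

def render_fences(markdown_text, fence="````"):
    """Pass 'HTML' fences through raw; escape the content of all other fences.

    Hypothetical helper: lines outside a fence are copied unchanged, since
    this sketch only covers the fence handling, not the rest of Markdown.
    """
    out = []
    in_fence = False
    raw = False
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(fence):
            if not in_fence:
                # Opening fence: anything after the backticks is the language.
                in_fence = True
                raw = stripped[len(fence):].strip().upper() == "HTML"
                if not raw:
                    out.append("<pre><code>")  # assumed stand-in wrapper
            else:
                # Closing fence.
                if not raw:
                    out.append("</code></pre>")
                in_fence = False
                raw = False
            continue
        if in_fence and not raw:
            out.append(html.escape(line))  # render the code "safe"
        else:
            out.append(line)  # raw HTML fence, or text outside any fence
    return "\n".join(out)
```

With this sketch, a `<div>` inside an HTML fence comes out untouched, while the same line inside a plain fence comes out as `&lt;div&gt;…`.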
The demo will be updated in the next hour or two.
NOTE : it now also supports HTML style comments… ANYWHERE in the Markdown (does not have to be in an HTML fence)
I am also going to experiment with allowing <div> to “infer” an HTML fence as well.
Note : This DID require a macOS declare to turn off SMARTDASHES etc.
I disagree: CommonMark clearly specifies this feature, as does the original Markdown spec.
The fact that others do not support this is clearly against both those specs. So, while some may omit this feature, the standard is quite clear on wanting to support this.
So, the least you should do is state that your implementation does not support this part of the standard instead of saying it’s not a clear part of the standard.
Also, seeing that you intend to look into supporting this now for <div>, please note that the specs say it should work with almost anything that looks like an HTML tag. I, for instance, also use and
for this purpose, and it works with the MBS Markdown class that I currently use. And, of course, "
needs to be seen as an HTML tag as well.
I agree that this can be quite difficult to parse correctly, though.
I think 100% of the cases where this isn’t implemented in the interpreter boil down to this, really.
Supporting HTML means either taking it all in and not caring about the output (which I think could be a disclaimer but wouldn’t help much less knowledgeable people) or implementing various degrees of parsing a second language (the 1st being markdown) that can be orders of magnitude more complex.
I’d choose a middle ground: Make sure that all opened tags are closed (in proper nesting order) before the block is finished and ignore what the tags may be or do. It’s still a lot of work but it provides the functionality while putting the burden of HTML validation on the user.
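As a rough illustration of that middle ground, here is a small Python sketch that only checks that tags are closed in proper nesting order and ignores what the tags mean. The names `tags_balanced`, `TAG_RE` and the short `VOID` list are made up for this example, and the regex is deliberately naive (it will be fooled by a `>` inside an attribute value).

```python
import re

# Matches <name ...>, </name> and <name ... />; a simplification, not an HTML lexer.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*?(/?)>")

# A few elements that never take a closing tag (assumed, incomplete list).
VOID = {"br", "hr", "img", "input", "meta", "link"}

def tags_balanced(block):
    """Return True if every opened tag in `block` is closed in nesting order."""
    stack = []
    for match in TAG_RE.finditer(block):
        closing, name, self_closing = match.group(1), match.group(2).lower(), match.group(3)
        if name in VOID or self_closing:
            continue  # nothing to balance
        if closing:
            # A closing tag must match the most recently opened tag.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack  # leftover open tags mean the block is unbalanced
```

A block that passes this check would be emitted as-is; whether the HTML itself is valid stays the user's responsibility.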
OK… I think I have it!!
It's a mixture of what I had said earlier today (HTML code fence) and what Thomas and Eduardo said.
The Markdown preprocessor looks for a qualified set of HTML elements. If it is not already inside a qualifying one, it starts a fenced area and proceeds until it finds the closing tag at the same level (this allows for nested DIVs, for example).
Once it has inserted the “fences”, it processes them as I described above. So far it seems to work just fine.
Here is a list of the elements it looks for:
ADDRESS
ARTICLE
ASIDE
BLOCKQUOTE
CANVAS
DIV
DL
FIGURE
FOOTER
FORM
MAIN
NAV
NOSCRIPT
SCRIPT
OL
P
PRE
SECTION
TABLE
The opening tag MUST start the Markdown line.
The closing tag “can” be anywhere, but if it is NOT at the end of the line, then any HTML following it will not be dealt with properly:
<div>test</div> // this works
<div><div>test</div></div> // this works
<div>test</div><span>xxx</span> // the span will not be processed
<div>test</div>
<span>xxx</span> // but here it will be
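For anyone who wants to see the preprocessing step spelled out, here is a minimal Python sketch of the rules described above. It is my reading of the description, not the actual Xojo code: the name `add_html_fences` is hypothetical, and closing the fence at end of input when an element is never closed is an assumption.

```python
import re

BLOCK_TAGS = {
    "address", "article", "aside", "blockquote", "canvas", "div", "dl",
    "figure", "footer", "form", "main", "nav", "noscript", "script",
    "ol", "p", "pre", "section", "table",
}

# The opening tag must be the very first thing on the line.
OPEN_AT_START = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b")

def add_html_fences(markdown_text, fence="````"):
    """Wrap qualifying HTML blocks in HTML fences, tracking nesting depth."""
    out = []
    current = None  # element name that opened the current fence
    depth = 0       # how many `current` elements are still open
    for line in markdown_text.splitlines():
        if current is None:
            match = OPEN_AT_START.match(line)
            if match and match.group(1).lower() in BLOCK_TAGS:
                current = match.group(1).lower()
                out.append(fence + " HTML")
        out.append(line)
        if current is not None:
            # Count opens and closes of the same element on this line.
            opens = len(re.findall(r"<%s\b" % current, line, re.IGNORECASE))
            closes = len(re.findall(r"</%s\s*>" % current, line, re.IGNORECASE))
            depth += opens - closes
            if depth <= 0:
                # Closing tag at the same level: the whole line stays fenced.
                out.append(fence)
                current, depth = None, 0
    if current is not None:
        out.append(fence)  # assumption: close a dangling fence at end of input
    return "\n".join(out)
```

Because the fence is closed after the whole line, anything that follows the closing tag on the same line (the `<span>` in the third example) ends up inside the raw fence instead of being processed as Markdown.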
BTW, if you want to be correctly compliant with CommonMark (I don’t care for it, though), this line is NOT a comment that you should leave as is:
That’s because of the “--” inside. I don’t see a good reason for it (other than that they might think it would make parsing easier), but they made that intentional.
Another thing: in your sample Markdown text, shouldn’t each paragraph (separated by two newlines) be separately enclosed in <p> tags? Because they aren’t. I wonder if you broke that with your latest changes.
[quote=401545:@Thomas Tempelmann]BTW, if you want to be correctly compliant with CommonMark (I don’t care for it, though), this line is NOT a comment you should leave as that:
[/quote]
Not sure I understand what you are getting at… Every online markdown editor that I tried either rendered that as a comment, or was an editor that didn’t understand raw html.
I will look into that. I think now it is just rendering it as “text”… but you are correct it should be wrapped in “p” tags.
CommonMark does this because it’s invalid HTML, for stupid reasons unrelated to them.
This is one of the quirks of the history of HTML. In standards mode, HTML5 tries to play nice with XML: to remain a valid subset of XML it forbids double hyphens in comments, and XML in turn forbids them to stay compatible with SGML.
CommonMark, thus, tries to play nice with HTML5 by honoring a restriction on comments that exists in XML, which in turn exists so as not to upset SGML parsers (so, essentially, this is a “feature” to be compatible with the least used language of the family).
The suggestion of HTML5, which at least gives an option, is to add a space whenever two consecutive dashes are found within a comment but don’t terminate the tag. This, I think, is a good option for a parser to adopt: convert “--” to either “- -” or “—” (an em-dash).
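A minimal Python sketch of the “- -” variant, in case it helps. The function name `fix_comment_dashes` is made up for this example, and the regex assumes comments are delimited by the first `-->` that follows a `<!--`.

```python
import re

# Lazily match a comment body up to the first terminating "-->".
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)

def fix_comment_dashes(html_text):
    """Break up runs of '--' inside HTML comments by inserting spaces."""
    def repl(match):
        body = match.group(1)
        # Repeat until no double dash is left (handles '---' and longer runs).
        while "--" in body:
            body = body.replace("--", "- -")
        # Avoid a body ending in '-' running into the closing '-->'.
        if body.endswith("-"):
            body += " "
        return "<!--" + body + "-->"
    return COMMENT_RE.sub(repl, html_text)
```

Text outside comments is left alone, so a “--” in ordinary Markdown is not touched by this step.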
It has to be tackled because Firefox, for example, chokes on double-dashes in comments in standards mode.