read html content

hi there :slight_smile:

Is there a way for Xojo to read content from an HTML page fetched with an HTTPSocket?

for example:

<td id="tdVersionHeading" class="contentsummaryleft" style="font-weight: bold;white-space:nowrap;"> Version: </td> <td colspan="2" id="tdVersion" class="contentsummaryright" style="width:90%;padding-left: 5px;font-weight: normal;"> 390.65&nbsp;&nbsp;<b><sup>WHQL</sup></b> </td>

I'm trying to get the “390.65” with Xojo.

greetings :smiley:

Sure. As usual, this depends on what you want to do and what you know about the HTML. If it is always the same page, same table, same td, then try RegEx. If you need a more generic method of parsing, try Tidy from the MBS plugin.

Hmm, regex is not my thing… I don't understand these pattern things :frowning:

And I don't want to use commercial plugins in an open-source project ^^

Kem’s RegExRX is my go-to for whenever I’m mucking around with RegEx.
It’s only a fiver, and it’s indispensable to me.

Here’s my really sloppy attempt that works with the example you posted:

<td(?:.*)id="tdVersion"(?:.*)>\W*([0-9]*\.[0-9]*)\W

SubExpressionString(1) will be the 390.65 value.
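
To check the pattern outside of Xojo, here is a minimal sketch in Python, chosen only because it is quick to run. It uses the pattern from the post above, written with single backslashes, against the HTML from the first post. The variable names are made up for illustration. The first capture group here corresponds to what SubExpressionString(1) returns in Xojo.

```python
import re

# HTML snippet copied from the first post
html = ('<td id="tdVersionHeading" class="contentsummaryleft" '
        'style="font-weight: bold;white-space:nowrap;"> Version: </td> '
        '<td colspan="2" id="tdVersion" class="contentsummaryright" '
        'style="width:90%;padding-left: 5px;font-weight: normal;"> '
        '390.65&nbsp;&nbsp;<b><sup>WHQL</sup></b> </td>')

# Pattern from the post; the parentheses around the digits form capture group 1
pattern = r'<td(?:.*)id="tdVersion"(?:.*)>\W*([0-9]*\.[0-9]*)\W'

match = re.search(pattern, html)
version = match.group(1) if match else None
print(version)  # 390.65
```

Note that the closing quote in id="tdVersion" is what keeps the pattern from stopping at the tdVersionHeading cell.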

Thank you for your input :smiley: I will try that tomorrow :wink:

So I tried it and it works :slight_smile:

Now I tried my own regex… in the regex tester it works perfectly like this:

http\:\/\/de\.download\.nvidia\.com\/Windows\/([^""]*)

But in Xojo I get no match :frowning:

Check for the Multiline and Greediness options.

If you are using RegExRX, right-click on the search field and select the option to copy it in Xojo format. It'll include the extra code you need to make it work properly.

Hm, that brings me to [code]http\:\/\/de\.download\.nvidia\.com\/Windows\/([^""""]*)[/code]

But it's still not working… In RegExRX there is a match, but even the code copied from RegExRX doesn't find it in Xojo :o

Check in RegexRx what the settings for greedy/lazy and multiline are. Then make sure that you set the Regex options in Xojo to the same value.
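
To illustrate why these two settings matter, here is a small sketch in Python (not Xojo). The sample strings are made up, and Python's syntax and flag names are used only to show the concept. Xojo's RegEx options have their own names, so compare them against what RegExRX shows rather than against this code.

```python
import re

# Made-up sample line with two quoted attributes
sample = '<a href="/Windows/driver.exe" id="lnkDwnldBtn">'

# Greedy: .* runs as far as it can, so it stops at the LAST quote on the line
greedy = re.search(r'href="(.*)"', sample).group(1)

# Lazy: .*? stops at the FIRST quote that lets the pattern match
lazy = re.search(r'href="(.*?)"', sample).group(1)

print(greedy)  # /Windows/driver.exe" id="lnkDwnldBtn
print(lazy)    # /Windows/driver.exe

# Multiline: decides whether ^ may match at the start of every line
text = 'first line\nhref="x"'
single_line = re.search(r'^href', text)               # no match: ^ is start of text only
multi_line = re.search(r'^href', text, re.MULTILINE)  # match: ^ also follows a line break
print(single_line is None, multi_line is not None)    # True True
```

If the tester and the application disagree on either setting, the same pattern can match in one and fail in the other.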

OK, it seems the problem was that the HTML I received was not the HTML I expected. I tried with RegexMagic but couldn't get it to work.

<td valign="middle" align="left" rowspan="5" class="contentsummaryleft" style="font-weight: bold;white-space:nowrap;"> <a href="/content/DriverDownload-March2009/confirmation.php?url=/Windows/390.77/390.77-desktop-win8-win7-64bit-international-whql.exe&lang=de&type=TITAN" id="lnkDwnldBtn" onclick="nvEventTracker(&#39;Drvr Download&#39;,&#39;click&#39;,thisQryStr+&#39; | &#39;+&#39;NVIDIA DRIVERS GeForce Game Ready Driver WHQL&#39;+&#39; | &#39;+&#39;/Windows/390.77/390.77-desktop-win8-win7-64bit-international-whql.exe&#39;,&#39;0&#39;,&#39;TRUE&#39;);"><img src="/content/DriverDownload-March2009/includes/de/images/bttn_download.jpg" id="imgDwnldBtn" alt="Jetzt Herunterladen" border="0" /></a> </td> <td class="contentsummaryright" style="width:90%;padding-left: 5px;font-weight: normal;"> </td>

I need to get the “/Windows/390.77/390.77-desktop-win8-win7-64bit-international-whql.exe” from this HTML, but the name of the exe can change… Maybe a regex king has an idea for this :slight_smile:

With .NET I got this easily using the DOM and Html Agility Pack, but in Xojo I'm helpless.

It would be helpful to know how it can change and what regex patterns you have tried.

I only tried RegexMagic to play around; I don't understand regex… I think the whole name of the exe can change.

maybe /Windows/390.77/390.77-desktop-win8-win7-64bit-international-whql.exe

I think the bold parts will be the same every time.

I would guess…

href=\"[^?]+?url=([^&]+)

Roughly that means

  1. Look for href=“
  2. Scan through characters until you come across a ?
  3. Then ?url=
  4. Then capture everything up to the next &

It's not perfect, but it should work in this case.

First of all, thank you for the help :slight_smile:

I tried it on the whole HTML and on just the quoted part here: https://regex101.com/r/iO0i7G/1
but there is no match.

\?url=([^\&]+)

That worked for me on regex101
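
For anyone who wants to reproduce this outside regex101, here is a minimal Python sketch comparing the two patterns from this thread, both written with single backslashes. The HTML is a trimmed copy of the link from the post above, and the variable names are made up. The earlier pattern finds nothing because `+?` is read as a lazy quantifier, so the literal question mark in front of url= is never matched. Escaping it as `\?` fixes that.

```python
import re

# Trimmed copy of the download link from the posted HTML
html = ('<a href="/content/DriverDownload-March2009/confirmation.php'
        '?url=/Windows/390.77/390.77-desktop-win8-win7-64bit-international-whql.exe'
        '&lang=de&type=TITAN" id="lnkDwnldBtn">')

# Earlier suggestion: "+?" makes [^?]+ lazy, it does not match a literal "?"
first_try = re.search(r'href=\"[^?]+?url=([^&]+)', html)

# Working pattern: "\?" matches the literal question mark before url=
second_try = re.search(r'\?url=([^\&]+)', html)
path = second_try.group(1) if second_try else None

print(first_try)  # None
print(path)       # /Windows/390.77/390.77-desktop-win8-win7-64bit-international-whql.exe
```

Because the capture stops at the first &, it keeps working when the version number or file name changes, as long as the url= parameter is still followed by another parameter.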

Yes, this is really working :slight_smile: Thank you all :smiley: