Regex to extract key/value from mysql longtext field?

I need to extract the value (47.4255778) for key (latitude) from a longtext field of a MySQL database. I think this could be possible with a RegEx construct, but I can’t wrap my head around it. Any idea, anyone?

The text can be as follows:

a:17:{s:6:“status”;s:1:“1”;s:10:“place_name”;s:17:“Laufenstrasse 261”;s:13:“street_number”;s:3:“261”;s:11:“street_name”;s:13:“Laufenstrasse”;s:6:“street”;s:17:“261 Laufenstrasse”;s:9:“street_bw”;s:17:“Laufenstrasse 261”;s:4:“city”;s:12:“Kleinlützel”;s:6:“county”;s:10:“Thierstein”;s:11:“region_code”;s:2:“SO”;s:11:“region_name”;s:9:“Solothurn”;s:8:“postcode”;s:4:“4245”;s:12:“country_code”;s:2:“CH”;s:12:“country_name”;s:7:“Schweiz”;s:7:“address”;s:31:“Laufenstrasse 261, Kleinlützel”;s:17:“formatted_address”;s:36:“Laufenstrasse 261, 4245 Kleinlützel”;s:8:“latitude”;s:10:“47.4255778”;s:9:“longitude”;s:17:“7.417047600000001”;}

I need select the values for latitude and also for longitude out of this chunk of text.

The text contains 17 elements (a:17):

a:17:{
s:6:"status"s:1:“1”
s:10:"place_name"s:17:“Laufenstrasse 261”
s:13:"street_number"s:3:“261”
s:11:"street_name"s:13:“Laufenstrasse”
s:6:"street"s:17:“261 Laufenstrasse”
s:9:"street_bw"s:17:“Laufenstrasse 261”
s:4:"city"s:12:“Kleinlützel”
s:6:"county"s:10:“Thierstein”
s:11:"region_code"s:2:“SO”
s:11:"region_name"s:9:“Solothurn”
s:8:"postcode"s:4:“4245”
s:12:"country_code"s:2:“CH”
s:12:"country_name"s:7:“Schweiz”
s:7:"address"s:31:“Laufenstrasse 261, Kleinlützel”
s:17:"formatted_address"s:36:“Laufenstrasse 261, 4245 Kleinlützel”
s:8:"latitude"s:10:“47.4255778”
s:9:"longitude"s:17:“7.417047600000001”
}

this is a JSON, you can extract using a xojo jsonitem and parsing it.

I thought it was JSON, but it does not pass any json validator I tried. No way to extract the value for latitude or longitude with MySQL json functions.

did you try xojo json class ?

dim rx as new RegEx
rx.SearchPattern = "(?mi-Us)longitude\""s:\d\d:“(.*)”"

dim rxOptions as RegExOptions = rx.Options
rxOptions.LineEndType = 4

dim match as RegExMatch = rx.Search( sourceText )
while match isa RegExMatch
//
// Do something
//

match = rx.Search // Fetch the next match
wend

replace longitude with latitude to get the other field

1 Like

Thank you Jean-Yves! Unfortunately I’m still not able to get this regex working.
I’m using RegExRX to test (http://www.mactechnologies.com/index.php?page=downloads#regexrx)

I need to find and extract the value only, for longitude or latitude

See screenshots:

The data looks like sorta json. The regex doesn’t work because the longitude is in s17 and not in the longitude value.

Maybe try a different approach? Do a split by ; . The do a split of each value in the array by : . Now you can fish out the longitude string. Then next value in the ; array should contain your longitude value.

I try to find a server-side approach only, therefor I need a regex which marks the start point and then returns any character which follows after the selection through (?mi-Us)longitude".*:" up to the next "

This regex is getting close to what I’m looking for, however I cannot take for granted to find two digits (\d\d) there - maybe there are cases with one digit only. But the lookbehind assertion (?<=) seems not to work with non fixed value length… scratching my head.

(?<=(?mi-Us)longitude";s:\d\d:")(.*)"

What do you mean with " server-side approach only"???

I’m trying to make use of mysql regex functions in order to create an sql select statement which returns longitude and latitude directly from a longtext field, which later can be used from different types of client software. I do not want to return the longtext and then process it on client side.

This is how I select in MySQL:

And this is the Regex which apparently seems to work (but not yet in MySQL)

This is a PHP serialized object. There are implementations out there for unserializing this data in JavaScript and a few other languages, but none in Xojo.

If you’re set on trying to get MySQL to do the heavy-lifting, try the suggestions in this StackOverflow thread. A snippet of one of the code solutions in that thread should it disappear:

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',1),':',-1) AS fieldname1,
SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',2),':',-1) AS fieldvalue1,
SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',3),':',-1) AS fieldname2,
SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',4),':',-1) AS fieldvalue2,
SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',5),':',-1) AS fieldname3,
SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',6),':',-1) AS fieldvalue3,
SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',7),':',-1) AS fieldname4,
SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',8),':',-1) AS fieldvalue4,
SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',9),':',-1) AS fieldname5,
SUBSTRING_INDEX(SUBSTRING_INDEX(old_data,';',10),':',-1) AS fieldvalue5
FROM table;
1 Like

Considering you want to do this server-side, you could just create a PHP script that would accept the serialized data as input and return the information you need. This shouldn’t be too hard to accomplish from Xojo using a Shell.

the data you need is in the first match… my regex is correct


dim rx as new RegEx
rx.SearchPattern = "(?mi-Us)longitude\""s:\d\d:“(.*)”"

dim rxOptions as RegExOptions = rx.Options
rxOptions.LineEndType = 4

dim match as RegExMatch = rx.Search( sourceText )
while match isa RegExMatch
//
 dim longitude as double = Cdbl(match.subexpressionmatch(1))
//

match = rx.Search // Fetch the next match
wend

The quotation marks were altered, in my first post. If I change them back, then your solution fails. But it helped me to get further, so I really appreciate your help!

what is the real actual text you’re reading from mysql ?

so if you have normal quotes around the fields, the correct regex becomes :

(?mi-Us)longitude\"s:\d\d:\"(.*)\"

and as above, you must extract the first substring match to get the lat/lon you need.
as a string, or convert to a double if needed.
be sure also to double the quotes " to “” so that xojo understands the string correctly

1 Like

how does the values get rounded ?
can you have a latitude of “47.5” (so a string length of s4 ?) ?
or will it be “47.5000000” ?