Parsing text with Trim, Split, Nth, RegEx

I have this text which is returned from Shell/mdls. I really only need the values, not the attribute names.
Since I come from C#/Java, I’m not sure which of the functions listed in the title is the best way to go about doing this…

kMDItemColorSpace = "RGB"
kMDItemDisplayName = "getme.jpg"
kMDItemKind = "JPEG image"
kMDItemPhysicalSize = 16384
kMDItemPixelHeight = 208
kMDItemPixelWidth = 250
kMDItemProfileName = (null)
kMDItemResolutionHeightDPI = 96

This is all saved in a string. Ideally, I’d like to have the following output:
RGB
getme.jpg
JPEG image
16,384
etc… All on their own line.

I tried using RegEx by searching for quotes and returning only what was inside. But that gets rid of the numbers. I need those too.
Any info would be appreciated!

Simple, brute-force method: assuming each one is a separate string, for each string, use ReplaceAll to remove the quotes. Next, use InStr to find the position of the equals sign. Lastly, use Mid to return the string contents starting right after the equals sign.
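A rough sketch of those three steps for a single line, assuming the line is already in its own string. The variable names (s, cleaned, eqPos, value) are only illustrative, and the sample line is taken from the question:

```xojo
// Sketch only: s is assumed to hold one "name = value" line
Dim s As String = "kMDItemDisplayName = ""getme.jpg"""

// Step 1: remove the double quotes (Chr(34))
Dim cleaned As String = ReplaceAll(s, Chr(34), "")

// Step 2: find the 1-based position of the equals sign (0 if there is none)
Dim eqPos As Integer = InStr(cleaned, "=")

// Step 3: take everything after the equals sign and trim the padding
Dim value As String
If eqPos > 0 Then
  value = Trim(Mid(cleaned, eqPos + 1))
End If
```

Note that this drops every quote in the line, so it only suits values that never contain a quote of their own.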

If you put RegEx in the title a certain @Kem Tekinay might come along and help with the RegEx. I would definitely choose RegEx over any of the other methods, as you’d be able to capture the key and value. You don’t think you want it now, but you may find a use for the key in the future (especially if the order were to ever change).

In my opinion, a brute force method would be unwise and unreliable when technologies like RegEx exist.

This is what I came up with after playing around in RegExRx for a few minutes. There might be something cleaner, though; I’m not as good with RegEx as Kem is. It would also fail if any of the values contained an escaped quote character like \".

(.*)\b\s*=\s"?([^"]*)"?$

But choose what works best for you, and what you actually understand. If you don’t understand why something works, you can’t fix it when it stops working :slight_smile:
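For anyone who wants to try the key/value pattern above, here is a hedged sketch of how it might be wired up. It assumes the raw mdls text is in a String named data, and it applies the pattern one line at a time, because the value group would otherwise be free to run across line breaks. The names data, rows, row and pairs are made up for the example:

```xojo
// Sketch only: data is assumed to hold the raw mdls output
Dim rx As New RegEx
// Same pattern as above; quotes are doubled inside a Xojo string literal
rx.SearchPattern = "(.*)\b\s*=\s""?([^""]*)""?$"

Dim pairs As New Dictionary

// Normalize the line endings first, then split into one string per row
Dim rows() As String = Split(ReplaceLineEndings(data, EndOfLine), EndOfLine)

For Each row As String In rows
  Dim match As RegExMatch = rx.Search(row)
  If match <> Nil Then
    // Group 1 is the attribute name, group 2 the value without quotes
    pairs.Value(match.SubExpressionString(1).Trim) = match.SubExpressionString(2)
  End If
Next
```

Rows that do not match (blank lines, for instance) are simply skipped, and keeping the keys in a Dictionary means the values can still be looked up by name if the order ever changes.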

This code should break your data out into key/value pairs for processing

dim rows() as string
data=replaceall(data,chr(10),"") //remove any extra line feed chars
data=replaceall(data,chr(34),"") //remove any quotes
rows=data.Split(chr(13)) //break each row by chr(13) Carriage Return
for each s as string in rows
  dim key as string
  dim value as string
  key=trim(NthField(s,"=",1))
  value=trim(NthField(s,"=",2))
next

Thanks for the responses, everyone. I’ll take a look and see what works best before selecting an answer.

Note that removing characters (spaces and quotes) is not parsing data, it is destroying it.

Don’t assume the EndOfLine character is Chr(13).

[code]dim rows() as string
dim kv(),r as string
dim d as new dictionary

rows = data.split(ENDOFLINE)
for each r in rows
kv = r.split("=")
d.value(kv(0).trim) = kv(1).trim
next[/code]

If you just need the final string based on the input as a single multiline input:

Dim myStrings(-1), finalString As String
Dim strCnt As Integer
// Normalize the line endings before splitting
InputData = ReplaceLineEndings(InputData, EndOfLine)
myStrings = SplitB(InputData, EndOfLine)
For strCnt = 0 To myStrings.Ubound
  // Keep only the part after "= "
  myStrings(strCnt) = NthFieldB(myStrings(strCnt), "= ", 2)
Next
finalString = Join(myStrings, EndOfLine)

Assuming those are spaces after the “=”, this pattern will do it:

=\x20*"?([^"\r\n]+)

The values sans quotes will be in SubExpressionString( 1 ).

Full code (untested):

dim rx as new RegEx
rx.SearchPattern = "=\x20*""?([^""\r\n]+)"

dim values() as string

dim match as RegExMatch = rx.Search( myText )
while match isa RegExMatch
  values.Append match.SubExpressionString( 1 )
  match = rx.Search
wend

If you use the MBS Xojo Plugins and the MDItemMBS class, you can simply query the attributes as a dictionary:

dim item as new MDItemMBS(SpecialFolder.Desktop.Child("test.txt"))
// show names:
MsgBox join(item.AttributeNames, EndOfLine)
// get all keys and values:
dim dic as dictionary = item.GetAttributes