Performance Curiosity

It’s currently ranked #2 in Feedback:
<https://xojo.com/issue/54439> Some FolderItem APIs are performing slowly on APFS.
Xojo needs to update their FolderItem framework to use newer APIs (such as those now available in the MBS Plugins).

@Robert Bednar : sorry, my bad. I have uploaded a new version to http://www.mothsoftware.com/downloads/test.zip . Also try “FileList Recursive.rbp”.

@Christian Schmidt - I tried Beatrix’s code, updated it to look at the BaseDir of my test case data, and ran it on my new MacBook Pro. It was much speedier: casual observation suggests about 3,000 nodes per second. That’s a significant improvement over the Xojo FolderItem class on this hardware… but still only half as fast as the old FolderItem class running on my trusty old Intel Core Duo MacBook. Of course, my new MacBook does have an encrypted drive… not certain how that factors in.

Particularly troublesome: it ran at a constant 3,000 nodes/sec until it hit about 840,000 nodes… then it hit a wall and started crawling, down to around 100 nodes/sec with maybe 2% CPU usage. I did notice that memory consumption by the app was steadily increasing (up to about 400 MB at that point). It’s taking forever to get from 840,000 to 850,000 nodes.
Since I need to get up into the multi-millions of nodes visited, this isn’t going to work in its current form.

Environment:
Xojo 2018R4 with MBS19 running on a 6-core 2.6 GHz i7 MacBook Pro with 32 GB RAM and 2 TB of SSD storage.

What is your end goal? As a developer in many languages, I love my Xojo, but sometimes it requires me to make a plugin or create a helper in another language (some languages are faster than others, depending on the needed outcome). Have you considered writing a simple bash/shell, Python, etc. script helper to do all the recursion and pass the files that need work back to the Xojo application? In a piece of software we created to rapidly scan a system for duplicate files (DeDuper), the main recursion routine is an embedded Red (cross-platform Rebol) script that interoperates with the Xojo application (a sketch of the pattern is below). To the end user, “it’s just Xojo.” But we had to use the Red helper, otherwise the application would be scanning all day. When we initially swapped out methods, the application went from almost 5 hours of scan time using Xojo alone to just under 17 minutes with the Red helper (8 terabytes of files of all sizes).
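A minimal sketch of the Xojo side of that handoff, assuming a hypothetical helper at /usr/local/bin/scan_helper that prints one path per line to stdout:

    Dim sh As New Shell
    sh.TimeOut = -1  ' long scans; don't let the Shell give up
    sh.Execute("/usr/local/bin/scan_helper /Volumes/Data")  ' helper does the recursion

    ' One path per line comes back on stdout; Xojo does the "real" work.
    For Each p As String In Split(sh.Result, EndOfLine.UNIX)
      If p.Trim <> "" Then
        ' hand p back to the Xojo side, e.g. ProcessFile(p) (hypothetical)
      End If
    Next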

@Matthew Combatti – yes, my end goal is simply to parse millions of XML files and build a SQL database. The millions of XML files are scattered across hundreds of thousands of directories and hundreds of GB, and knowing the path is important. Ultimately I just shelled out and ran a simple “find” command, piped it to a grep for filenames ending in XML, and piped that result into a file. I now have a simple text file with a million+ lines, each the path of one XML document. It took “find” about 90 seconds to build the file, and I can read through the million lines in a flash. So I just send each line (path) to my “parser” and “database record builder”. End goal: complete.

I didn’t really say I couldn’t find a workaround… just that I found a “performance curiosity”. The solution I employed was actually exactly how I originally thought I could solve it. Because I had never built a folder traversal in pure Xojo before, I took the time to do that, and performance was “good enough”… until I moved the program to my NEWER, FASTER hardware, which demonstrated as much as a 60X performance PENALTY. My solution is not yet cross-platform, as it relies on OS-specific tools. If I want this to run from Windows as well (that would be nice), I now have to add platform-specific code and probably make other minor adjustments per platform. FolderItem abstracted the details of the target file system, and I thought that would be a slick, if not the most processing-efficient, solution.
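For the record, the shell-out amounts to something like this (paths invented for illustration; PathTypeNative so the raw POSIX paths from find need no escaping):

    ' Let find/grep do the traversal; one XML path per line lands in a text file.
    Dim sh As New Shell
    sh.TimeOut = -1
    sh.Execute("find /Volumes/Data/docs -type f | grep -i '\.xml$' > /tmp/xmlpaths.txt")

    ' Read the list back and dispatch each path to the parser / record builder.
    Dim listFile As FolderItem = GetFolderItem("/tmp/xmlpaths.txt", FolderItem.PathTypeNative)
    Dim tis As TextInputStream = TextInputStream.Open(listFile)
    While Not tis.EOF
      Dim docPath As String = tis.ReadLine
      If docPath <> "" Then
        ' ParseAndInsert(docPath)  ' hypothetical parser + database record builder
      End If
    Wend
    tis.Close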

Ouchie. So… I let macOS (the find utility) scoop together all the XML paths; now “all I have to do” is open each file and parse it.
Parsing XML isn’t the speediest operation… but it should be faster on my newer hardware (one would think). Alas, I need to open “a million FolderItems”. I started the process up on my new MacBook Pro… chug… chug… choke. I’m up to 2,600 files parsed after a couple of minutes. My old HFS-based MacBook is up past 38,000. Gotta be kidding me.
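For context, the per-document step is nothing exotic; a stripped-down sketch of it (handler name made up), where one FolderItem plus one TextInputStream per path is exactly the part that crawls:

    ' Hypothetical per-document handler.
    Sub ParseAndInsert(docPath As String)
      Dim f As FolderItem = GetFolderItem(docPath, FolderItem.PathTypeNative)
      If f = Nil Or Not f.Exists Then Return

      Dim tis As TextInputStream = TextInputStream.Open(f)
      Dim content As String = tis.ReadAll
      tis.Close

      Dim xml As New XmlDocument
      Try
        xml.LoadXml(content)
        ' walk xml.DocumentElement and build the SQL insert for this document
      Catch e As XmlException
        ' malformed document: log it and move on
      End Try
    End Sub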

Ranked #2 in Feedback… I should hope so. FolderItem performance is abysmal. It’s sad. Awful. Slow as molasses in January.
I thought I could leverage Xojo’s XML and SQL classes to make “short work” of this project. The coding wasn’t bad (quick development, really), but the execution of the developed code is just… just indescribably bad.

I’m going to look at running it against external HFS+ media – it couldn’t be worse, could it?

I ran the program with directory paths pointing at an external USB3 disk drive and was able to manage better than 1,000 documents per second. When I point the program at my internal high-speed SSD, it starts at about 300 documents per second… and by the time it gets to 38,000 documents it’s only grabbing about 30 per second on average. Shaking my head in disbelief.

Here are some guidelines on how to get fast performance from FolderItem:

https://forum.xojo.com/conversation/post/111435
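I haven’t reproduced the linked post here, but the usual tips boil down to touching the file system as little as possible; something like:

    ' Cache Count, grab each child exactly once, and don't re-read properties.
    Sub ScanOnce(parent As FolderItem)
      Dim c As Integer = parent.Count                 ' Count is not free; cache it
      For i As Integer = 1 To c                       ' classic framework: 1-based
        Dim child As FolderItem = parent.TrueItem(i)  ' TrueItem skips alias resolution
        If child <> Nil Then
          ' read child.Name / child.Directory once and keep the values around
        End If
      Next
    End Sub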

I could probably bypass FolderItem entirely… but really, I simply wanted to “open a file”, and FolderItem seems like it’s probably still the best candidate. Really, performance on HFS+ media is acceptable: 1.2 million files at 1,000 per second yields a 20-minute processing time. That’s acceptable. The issue is how FolderItem works with APFS: about 30X slower in my tests, which yields about 10 hours of processing time. That’s not acceptable. For small numbers of files, even a few thousand, it’s probably fine. Thanks for the tips anyway @Jason Parsley.

Since you’re using a shell anyway, what if you use a Shell and call cat on each file, and read the results from the shell rather than creating FolderItems and TextInputStreams?
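A sketch of what that could look like; the only fiddly part is quoting the path for the shell:

    ' Read a file's contents through the shell instead of a TextInputStream.
    Function ReadViaCat(docPath As String) As String
      Dim sh As New Shell
      ' Single-quote the path so spaces survive; embedded single quotes get
      ' the standard '\'' treatment.
      Dim quoted As String = "'" + ReplaceAll(docPath, "'", "'\''") + "'"
      sh.Execute("cat " + quoted)
      Return sh.Result   ' file contents, no FolderItem involved
    End Function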

@jim mckay Actually, the thought had occurred to me. That, or just write 90% of it in C, building another text file that contains all the database insert statements. Sigh. It’s kinda like when you buy a car and find that it won’t run on some roads, and the solution is to buy a truck and hire a driver to tow you in the car. At some point you gotta feel foolish.

Lest people think I’m too critical of Xojo, I will say that once I built “the database” I needed a “pretty” GUI application that could search by different keys based on document type, etc., and display the results appropriately. Xojo has worked perfectly in that regard, and the application is also cross-platform. It would have been a pain in the keister to do this with C#, and I wouldn’t have the cross-platform capability or the basically “zero install footprint”: I keep the executables at the same location as the database (on removable media), so it’s entirely portable. What a vibrant discussion group as well. Thanks to each and every one of you who offered your time and expertise. I know you didn’t have to.

Have you tried running an Instruments session to see why your code slows down so much?

@Beatrix Willius - I have not. I’ve never had occasion to run the Mac (Xcode) Instruments tool to diagnose performance issues with a Xojo application. I’ll look into that. Good idea.

@Jason Parsley : even on HFS the performance of Xojo versus MBS is something like 3,450 ticks to 250 ticks, with “best practices”. Don’t just test with a couple of thousand files.

If you want the best performance, I suggest you avoid FolderItems as much as possible. On MS-Windows we found that using declares to retrieve just the file properties we actually needed resulted in a 5x performance improvement on a congested network (a sketch of the idea is below).
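No code was posted with this, but a minimal sketch of that style of declare (the Win32 FindFirstFile family; struct offsets per the WIN32_FIND_DATAW layout) might look like this inside a method:

    Declare Function FindFirstFileW Lib "Kernel32" (pattern As WString, data As Ptr) As Integer
    Declare Function FindNextFileW Lib "Kernel32" (hFind As Integer, data As Ptr) As Integer
    Declare Function FindClose Lib "Kernel32" (hFind As Integer) As Integer

    Dim data As New MemoryBlock(592)            ' sizeof(WIN32_FIND_DATAW)
    Dim hFind As Integer = FindFirstFileW("C:\data\*", data)
    If hFind <> -1 Then                         ' -1 = INVALID_HANDLE_VALUE
      Do
        Dim attrs As UInt32 = data.UInt32Value(0)     ' dwFileAttributes
        Dim sizeHigh As UInt64 = data.UInt32Value(28) ' nFileSizeHigh
        Dim fileSize As UInt64 = Bitwise.ShiftLeft(sizeHigh, 32) + data.UInt32Value(32)
        Dim name As String = data.WString(44)         ' cFileName, UTF-16
        ' "." and ".." come back too; filter as needed, then use
        ' attrs / fileSize / name directly; no FolderItem is ever created
      Loop Until FindNextFileW(hFind, data) = 0
      Call FindClose(hFind)
    End If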

My guess is that the Xojo framework makes multiple OS calls to populate all of the FolderItem properties, which results in slower performance. On macOS there is the additional problem of Xojo using OS APIs that don’t seem to work well with APFS.

@Jason Parsley : and on APFS the performance of FolderItems is even worse: 41 ticks versus 950 ticks on just a couple of thousand files.
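For anyone wanting to reproduce numbers like these: they are presumably simple tick deltas (60 ticks per second) taken around the traversal, along these lines, where CountItems is a hypothetical stand-in for whichever traversal is being measured:

    Dim startTicks As Integer = Ticks
    ' CountItems: pure FolderItem traversal vs. the MBS equivalent.
    Dim total As Integer = CountItems(GetFolderItem("/Volumes/Data", FolderItem.PathTypeNative))
    Dim elapsed As Integer = Ticks - startTicks
    MsgBox(Str(total) + " items in " + Str(elapsed) + " ticks")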

Interesting coincidence that the current Xojo blog entry is about how to empty folders using recursion. I hope they don’t have too many folders and files to delete… it could take a while. I don’t mean deeply nested; my nesting was only 4 or 5 levels deep. But just making each item a FolderItem so you can check its properties and such takes a long time; the generic shape of that loop is below.
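I haven’t copied the blog’s code; this is just the pattern, with hypothetical names:

    Sub VisitFolder(f As FolderItem)
      Dim c As Integer = f.Count
      For i As Integer = 1 To c
        Dim item As FolderItem = f.TrueItem(i)   ' one FolderItem per entry: the slow part
        If item <> Nil Then
          If item.Directory Then
            VisitFolder(item)                    ' recurse (only 4-5 levels deep here)
          Else
            ' check item.Name, item.Length, etc., or delete it as in the blog post
          End If
        End If
      Next
    End Sub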

TT’s “Find Any File” app was lightning fast on HFS+; with APFS you can go get a coffee until results appear. But if Thomas can’t solve this, I doubt that it can be made any faster at all.

http://apps.tempel.org/FindAnyFile/index.php

Well, I looked at TT’s tool and I cranked it up… after a minute of watching the file count grow, I opened a terminal window and ran a “find” command piped to “grep” and piped the result to a file. The “find” command had identified 500K files before TT’s tool hit 15K.
I think my solution of shelling out and using the native command was a good call. Problem is… once I identify 500K files, I then need to open each one and process the XML data contained therein. I ended up doing the operation on my old HFS+ MacBook just to get my first task done. One of my test runs was against 1.2 million documents, and the other was over 5 million. Performance varied a bit over the course of the operation, but I think I was averaging about 300 docs/sec using Xojo’s XML classes. Not hateful. Hopefully the problem isn’t an APFS limit, and it’s just a matter of hooking up the new “drivers”. I’m concerned that Christian’s MBS code, while FAR superior to Xojo’s FolderItem class for the operations I was using, still falls short on APFS when compared to HFS+. Nothing I like better than to see a $4,000 laptop get its a$$ handed to it by a decade-old system with an ancient file system.