@Beatrix W, I did want to try out your example; alas, it looks like your GetFilesThread is an external resource.
@Robert B I'm still a bit miffed that Xojo R2018.4 FolderItems work so much faster on... the older HFS+ file system?
It's currently ranked #2 in Feedback:
Feedback Case #54439: Some FolderItem APIs are performing slow on APFS.
Xojo needs to update its FolderItem framework to use the newer APIs (such as those now available in the MBS Plugins).
@ChristianSchmidt, I tried Beatrix's code, updated it to look at the BaseDir of my test-case data, and ran it on my new MacBook Pro. It was much speedier: casual observation suggests about 3,000 nodes per second. This is a significant improvement over the Xojo FolderItem class on this hardware, but still only half as fast as the old FolderItem class running on my trusty old Intel Core Duo MacBook. Of course, my new MacBook does have an encrypted drive; I'm not certain how that factors in.
Particularly troublesome is that it ran at a constant 3,000 nodes/sec until it hit about 840,000 nodes; then it hit a wall and started crawling, at around 100 nodes/sec with maybe 2% CPU usage. I did notice that the app's memory consumption was steadily increasing (up to about 400 MB at the time). It's taking forever to get from 840,000 to 850,000 nodes.
Since I need to get into the multi-millions of nodes visited, this isn't going to work in its current form.
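The steady memory growth suggests something accumulates as the walk proceeds, though the thread doesn't establish the cause. For comparison, here is a minimal Python sketch (not Beatrix's code and not a Xojo or MBS API) of a traversal that keeps only the not-yet-visited directories in memory and hands each file path to the caller as soon as it is found, instead of collecting every node first. `iter_files` is a hypothetical name.

```python
import os
from typing import Iterator


def iter_files(base_dir: str) -> Iterator[str]:
    """Yield the path of every regular file under base_dir, depth-first.

    Only directories still waiting to be visited are held in memory, so
    memory use is proportional to that pending list, not to the total
    number of files seen so far.
    """
    pending = [base_dir]  # explicit stack instead of recursion
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # follow_symlinks=False avoids looping through symlinked dirs
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        yield entry.path
        except PermissionError:
            # skip unreadable directories rather than aborting the whole walk
            continue
```

A caller would simply loop, e.g. `for path in iter_files(base): parse(path)`, so each path can be processed and discarded before the next one is produced.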
Xojo 2018R4 with MBS19 running on a 6-core 2.6 GHz i7 MacBook Pro with 32 GB RAM / 2 TB SSD storage.
What is your end goal? As a developer in many languages, I love my Xojo, but sometimes it requires me to make a plugin or create a helper in another language (some languages are faster than others, depending on the needed outcome). Have you considered writing a simple bash/shell, Python, etc. script helper to do all the recursion and pass the "files that need things done" back to the Xojo application? In a piece of software we created to rapidly scan a system for duplicate files (DeDuper), the main recursion routine is an embedded Red (a cross-platform, Rebol-like language) script that interoperates with the Xojo application. To the end user, "it's just Xojo." But we had to use the Red helper; otherwise the application would be scanning all day. When we initially swapped out the methods, the scan time went from almost 5 hours using Xojo alone to just under 17 minutes with the Red helper (8 terabytes of files of all sizes).
@MatthewCombatti Yes, my end goal is simply to parse millions of XML files and build a SQL database. The XML files are scattered across hundreds of thousands of directories and hundreds of GB, and knowing each file's path is important. Ultimately, I just shelled out, ran a simple "find" command, piped it to grep for filenames ending in XML, and redirected the result into a file. I now have a simple text file with a million+ lines, each the path of one XML document. It took "find" about 90 seconds to build the file, and I can read through the million lines in a flash. So I just send each line (path) to my "parser" and "database record builder". End goal: complete.

I didn't really say I couldn't find a workaround, just that I found a "performance curiosity". The solution I employed was actually exactly how I originally thought I could solve it, but since I had never built a folder traversal in pure Xojo before, I took the time to do that. Performance was "good enough" until I moved the program to my NEWER, FASTER hardware, which demonstrated as much as a 60X performance PENALTY. My solution is not yet cross-platform, as it relies on OS-specific tools. If I want this to run on Windows too (that would be nice), I now have to add platform-specific code and probably make other minor platform adjustments. FolderItem abstracted the details of the target file system, and I thought that would be a slick, if not the most processing-efficient, solution.
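The "find" workaround can be sketched in Python as below, assuming a POSIX find(1) is on the PATH (true on macOS and Linux, not on Windows, which is the portability problem just described) and that file names are UTF-8. Instead of piping through grep, it uses find's -iname test to match names ending in .xml case-insensitively. `build_xml_list` and `read_paths` are hypothetical helper names.

```python
import subprocess
from typing import Iterator


def build_xml_list(base_dir: str, list_path: str) -> int:
    """Write the path of every *.xml file under base_dir to list_path.

    -iname makes the match case-insensitive, so both .xml and .XML are
    listed. Returns find's exit status (non-zero if, e.g., some directory
    could not be read; the paths it could read are still written).
    """
    with open(list_path, "w", encoding="utf-8") as out:
        # find streams results straight into the file; Python buffers nothing
        proc = subprocess.run(
            ["find", base_dir, "-type", "f", "-iname", "*.xml"],
            stdout=out,
            stderr=subprocess.DEVNULL,
        )
    return proc.returncode


def read_paths(list_path: str) -> Iterator[str]:
    """Yield one path per line from the list file, skipping blank lines."""
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            path = line.rstrip("\n")
            if path:
                yield path
```

One path per line breaks on file names that contain newlines; find's -print0 with NUL-separated parsing is the robust alternative if that can occur.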
Ouchie. So... I let macOS (the find utility) scoop together all the XML paths; now "all I have to do" is open each file and parse it.
Parsing XML isn't the speediest operation, but it should be faster on my newer hardware (one would think). Alas, I need to open "a million FolderItems". I started the process on my new MacBook Pro... chug... chug... choke. I'm up to 2,600 files parsed after a couple of minutes. My old HFS+-based MacBook is past 38,000. Gotta be kidding me.
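For reference, the parse-and-insert step can be sketched with Python's standard library, which opens each file directly by path with no per-file folder object. Whether that avoids the APFS slowdown seen with FolderItem has not been measured here. The table `documents(path, root_tag)` is a hypothetical schema standing in for the real record builder, and `load_documents` is a hypothetical name.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Iterable


def load_documents(paths: Iterable[str], db_path: str, batch_size: int = 1000) -> int:
    """Parse each XML file and record (path, root element tag) in SQLite.

    Rows are inserted in batches, one transaction per batch, because
    committing after every single row is slow. Malformed or unreadable
    files are skipped. Returns the number of rows written.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS documents (path TEXT PRIMARY KEY, root_tag TEXT)"
        )
        sql = "INSERT OR REPLACE INTO documents VALUES (?, ?)"
        batch = []
        written = 0
        for path in paths:
            try:
                root = ET.parse(path).getroot()
            except (ET.ParseError, OSError):
                continue  # skip malformed or missing files
            batch.append((path, root.tag))
            if len(batch) >= batch_size:
                with conn:  # commits the whole batch as one transaction
                    conn.executemany(sql, batch)
                written += len(batch)
                batch.clear()
        if batch:
            with conn:
                conn.executemany(sql, batch)
            written += len(batch)
        return written
    finally:
        conn.close()
```

The `paths` argument can be any iterable, such as lines read lazily from the path list that find produced.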
Ranked #2 in Feedback... I should hope so. FolderItem performance is abysmal. It's sad. Awful. Slow as molasses in January.
I thought I could leverage Xojo's XML and SQL classes to make "short work" of this project. The coding wasn't bad (quick development, really), but the execution of the developed code is just... indescribably bad.
I'm going to try running it against external HFS+ media. It couldn't be worse, could it?
I ran the program with directory paths pointing at an external USB3 disk drive and managed better than 1,000 documents per second. When I point the program at my internal high-speed SSD, it starts at about 300 documents per second, and by the time it gets to 38,000 documents it's averaging only about 30 per second. Shaking my head in disbelief.
I could probably bypass FolderItem entirely, but really I simply wanted to "open a file", so FolderItem seems like the best candidate. Performance on HFS+ media is acceptable: 1.2 million files at 1,000 per second yields a 20-minute processing time. The issue is how FolderItem works with APFS: about 30X slower in my tests, which yields about 10 hours of processing time. That's not acceptable. For small numbers of files, even a few thousand, it's probably fine. Thanks for the tips anyway, @Jason P.
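Those estimates are plain rate arithmetic, time = files ÷ (files per second). A quick check with the figures from this post; `processing_hours` is just an illustrative helper.

```python
def processing_hours(n_files: int, files_per_sec: float) -> float:
    """Estimated wall-clock hours to process n_files at a steady rate."""
    return n_files / files_per_sec / 3600


hfs_rate = 1_000           # files/sec observed on external HFS+ media
apfs_rate = hfs_rate / 30  # "about 30X slower" on APFS

print(f"HFS+: {processing_hours(1_200_000, hfs_rate) * 60:.0f} minutes")  # HFS+: 20 minutes
print(f"APFS: {processing_hours(1_200_000, apfs_rate):.0f} hours")        # APFS: 10 hours
```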
@jim m Actually, the thought had occurred to me. That, or just write 90% in C, building another text file that contains all the database insert statements. Sigh. It's kind of like buying a car and finding that it won't run on some roads, and the solution is to buy a truck and hire a driver to tow you in the car. At some point you've got to feel foolish.
Lest people think I'm too critical of Xojo, I will say that once I built "the database" I needed a "pretty" GUI application that could search by different keys based on document type, etc., and display the results appropriately. Xojo has worked perfectly in that regard. The application is also cross-platform. It would have been a pain in the keister to do this with C#, and I wouldn't have cross-platform capability and a basically "zero install footprint": I keep the executables in the same location as the database (on removable media), so it's entirely portable. What a vibrant discussion group, as well. Thanks to each and every one of you who offered your time and expertise. I know you didn't have to.
If you want the best performance, I suggest you avoid FolderItems as much as possible. On MS Windows we found that using declares to retrieve only the file properties we actually needed resulted in a 5x performance improvement on a congested network.
My guess is that the Xojo framework makes multiple OS calls to populate all of the FolderItem properties, which results in slower performance. On macOS there is the additional problem of Xojo using OS APIs that don't seem to work well with APFS.
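The "only fetch what you need" idea can be illustrated outside Xojo. Python's os.scandir returns DirEntry objects whose is_file()/is_dir() can usually be answered from the directory listing itself without an extra system call on Unix, and a full stat() is only made when requested. This is an analogy, not the Windows declares described here; `xml_sizes` is a hypothetical helper.

```python
import os
from typing import Dict


def xml_sizes(directory: str) -> Dict[str, int]:
    """Map each *.xml file name in one directory to its size in bytes.

    The checks are ordered from cheapest to most expensive: the name test
    needs no system call, is_file() usually uses type info from the listing,
    and stat() is called only for entries that pass both.
    """
    sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.lower().endswith(".xml") and entry.is_file():
                sizes[entry.name] = entry.stat().st_size
    return sizes
```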
Interesting coincidence that the current Xojo Blog entry is about how to empty folders using recursion. I hope readers don't have too many folders and files to delete; it could take a while. I don't mean deeply nested: my nesting was only 4 or 5 levels deep. But just making each item a FolderItem so you can check its properties and such takes a long time.
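For comparison with the recursive approach in that blog entry, here is a minimal Python sketch of emptying a folder while keeping the folder itself. It works from lightweight directory-listing entries (os.scandir) rather than building a full file object for each item. `empty_folder` is a hypothetical name; in everyday Python, calling shutil.rmtree on each subdirectory would do the same job.

```python
import os


def empty_folder(folder: str) -> int:
    """Delete everything inside folder, but not folder itself.

    Recurses depth-first so each subdirectory is already empty before
    os.rmdir is called on it. Returns the number of items removed.
    """
    # Read the listing fully before deleting, so we never modify the
    # directory while it is still being iterated.
    with os.scandir(folder) as it:
        entries = list(it)
    removed = 0
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            removed += empty_folder(entry.path)
            os.rmdir(entry.path)
        else:
            os.unlink(entry.path)  # regular files and symlinks (even to dirs)
        removed += 1
    return removed
```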