Performance Curiosity

I have “a folder” with about 300 sub-folders… and each of those folders has about 2000 sub-folders… and each of those folders has on average a few folders which contain some files. Let’s just say only about 1.6 million directory tree nodes in my test case. I need to crawl that directory structure… so I wrote a subroutine “TraverseNodes” like this:

For i As Integer = 1 To parentDir.Count
  If parentDir.Item(i).Directory Then
    NodeCount = NodeCount + 1
    TraverseNodes(parentDir.Item(i))
  Else
    NodeCount = NodeCount + 1
    ' Go do some important stuff
  End If
Next

I pass it a folder item, and it’s supposed to traverse the directory structure just counting the elements. On a Mac I expect similar results to:

find ./ -print | wc -l

The “find” command completes the task in well under 2 minutes. The Xojo code takes a little under 6 minutes to traverse the tree. You know… that’s really not too bad – and this is running on an old 2.53 GHz Intel Core 2 Duo MacBook Pro circa 2010(?), albeit with a 750GB Crucial SSD and 8GB RAM.

I have a shiny new MacBook Pro with a 2.6 GHz Intel i7, 32 GB of RAM, and a 2 TB SSD. It ought to run circles around my old MacBook. The “find” command was somewhat faster – about 20%. Activity Monitor indicated similar disk performance. This operation is expectedly I/O bound. OK.

Then I ran the Xojo code on my new MacBook. I got tired of waiting after 40 minutes and wasn’t even 1/3 of the way complete – UGH! What gives?
Naturally, I’ve tried to remove as many variables as possible: making sure Spotlight was disabled, making sure my antivirus software was disabled. My old system is using HFS+ and the newer one is, of course, APFS (encrypted).

I decided to run the test on my desktop iMac with a 4 GHz i7 and 32 GB RAM (but with the Fusion Drive). The “find” command was a bit slower – about the same as on my old MacBook Pro – and the Xojo code wasn’t really much different than on my new laptop. It still takes ages. My desktop is running APFS (non-encrypted).

Perhaps some of you could put this subroutine to the test. Have I done something obviously wrong? Naturally, in my program, there is “stuff” I want to do with each file I find. I could just run the “find” command and save the results into a file – then open the file and process it – but why? I can handle that Xojo isn’t as fast as a stripped-down, optimized native OS tool like “find” – but what gives with the new systems? My new MacBook Pro should be embarrassed to let a decade-old system run laps around it.

First, I suggest putting the FolderItem.Directory check outside of the loop so you’re only doing that loop on things that are actually directories.

I also suggest using TrueItem instead of just Item so it doesn’t follow aliases.
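The alias point has a shell analogue (this small example is illustrative, not from the thread): plain `find` does not follow symbolic links, while `find -L` descends into their targets, so a traversal that follows links can visit the same files more than once or even loop.

```shell
# Illustrative only: build a tiny throwaway tree with one symlink,
# then compare node counts with and without link-following.
root="$(mktemp -d)"
mkdir -p "$root/real"
touch "$root/real/file.txt"
ln -s "$root/real" "$root/link"

find "$root" | wc -l     # 4 nodes: root, real, file.txt, and the link itself
find -L "$root" | wc -l  # 5 nodes: the link is descended into, so file.txt is seen twice
```

The same hazard applies to a FolderItem crawler that resolves aliases while recursing.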

I’m not at my desk right now, but I have a reusable class that I use for this if you want to see it. I’ve not seen the speed issues you are talking about.

If you need speed, please avoid FolderItem and use either declares or a plugin class like our FileListMBS in MBS Xojo Util Plugin.

Well, speed would be nice – I wasn’t excited about 6 minutes, but I wasn’t too disappointed – until my new systems would take HOURS to perform the same operation on the same group of files. This is supposed to be a recursive operation. The FolderItem Directory test has to be inside the loop, yes?
I could put the Node count operation outside the If/Then/Else…but that code was only illustrative.

I typically do

Function Recurse(f As FolderItem)
  If f.Directory Then
    // do directory stuff
    For i As Integer = 1 To f.Count
      Recurse(f.TrueItem(i))
    Next i
  Else
    // do file stuff
  End If
End Function

In my tests APFS was way slower than good old HFS+ when doing file operations.

And the Chilkat plugin is even faster than MBS. Forget any Xojo native code.

Be sure to try the latest MBS Plugin. I rewrote stuff for newer macOS versions to benefit from APFS.

@Greg O’Lone - Thanks, looks like similar performance numbers result. My desktop started at about 1,000 nodes/sec and as it runs it gets slower and slower; after 10 minutes it’s at about 260,000 nodes visited. My old Core 2 Duo MacBook has long been done with 1,670,653 nodes visited. It started at about 6,000 nodes per second and never slowed down. Meanwhile my desktop is now visiting about 150 nodes/sec. Also noteworthy: on the old MacBook, the app consumed about 15 MB during its run. On my desktop it has consumed 59.3 MB and is slowly growing. These are very small numbers for machines with THOUSANDS of MB of RAM. My program executes the routine in a thread. (I have a timer running that updates the UI with the current node count once per second.)
It’s really just running your Recurse subroutine… that’s it. I’m running Xojo 2018r4 and macOS 10.14.2 on my newer machine, with my old Mac stuck at 10.11.6 (El Capitan). You can say the new OS “does so much more” – but slowing a process from 6 minutes to multiple HOURS? Something is really wrong with that picture. Again, this didn’t impact other commands like the “find” command or the “unzip” I used to restore the file hierarchy on these systems. I’m not saying it’s definitely a Xojo issue… but it’s worth looking into.

I have Christian’s wonderful MBS Plugin, and I’ve considered acquiring the Chilkat plugin group as well (I expect each has its strengths). I noted but hadn’t heard any reviews of the Chilkat classes (and I don’t think the author is as active as Christian is here?). Good to hear that this is yet another potential resource for me. It pains me a bit to hear “Forget any Xojo native code” – I understand that developers may focus on specific areas where Xojo is weak and provide their own solutions. Still, it’s nice Xojo has a great contributing developer community. I will try MBS, simply because I already have a license for it. Of course, the Chilkat classes are ported to cover a great many different languages: PHP, Java, Python, C++, and such.

Also – while I did note that APFS was “way slower” according to some posts – you would think that would be like 20% slower? 50% slower? 2x slower? But I’m currently tracking about 30x slower. Consider that. Old HFS+: the operation takes six minutes. APFS: the same operation takes 3 HOURS. I didn’t expect this to be indicative of the file system change (because OTHER file system operations do not appear to be appreciably slower) – those differences are probably measured in nanoseconds – but if you can measure the difference with an hourglass…?

@Karen Atkocius had a good method here: https://forum.xojo.com/13879-fastest-way-to-get-folderitem-names-into-array/p1#p111418
But based on my own experience and multiple similar threads pointing to the same answer, I would go the plug-in way if you can.

I employed the MBS FileList plugin and on my old MacBook it shaved about 15% off the time, completing in well under 5 minutes. Sadly, when running against my desktop, after 6 minutes I am not even 10% complete. This might actually be significantly faster – but again it is so much slower than on my old system.

This is all just some sample data with 1.6 million nodes spread across ~30 GB. My typical case will probably be about 6 million nodes spread across 180 GB, and my current worst-case expectation is about 70 million nodes spread across 1 TB of folders/files. Wait… do I expect I could crawl through 1 TB of disk with a Xojo app? I can dream, can’t I? Given that my typical case is “only” say 6 million nodes, that’s less than 4x larger than my test case. If my test case takes 5 minutes, then my typical case might only take 20 minutes. That is acceptable. My worst case is less than 12x my typical case: 12 x 20 min = 4 hours. Not bad. Of course, there’s all that “real work” that needs to be done with each file.

At this point, I’m thinking I need to shell out on the Mac and run a “find” command sending the results to a file. Then I can just open and read through the file list – or I suppose I can just run everything on 10-year-old MacBooks :slight_smile:
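As a sketch of that shell-out idea (the paths and file names below are illustrative placeholders, not from the thread), the crawl can be done once by `find`, with the node list written to a file for the app to read back:

```shell
# Illustrative only: dump the node list to a file once, then let the
# app process that file instead of walking FolderItems.
# Demonstrated on a small throwaway tree.
root="$(mktemp -d)"
mkdir -p "$root/a/b" "$root/c"
touch "$root/a/one.txt" "$root/a/b/two.txt" "$root/c/three.txt"

find "$root" -print > /tmp/filelist.txt   # one path per line, every node
count="$(wc -l < /tmp/filelist.txt)"
echo "$count nodes"                        # 7: four directories plus three files
```

The app then only needs a sequential read of `/tmp/filelist.txt`, which sidesteps the per-item FolderItem overhead entirely.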

Not sure what you tested exactly.

For APFS it is slow to use FSRef functions.
But if you use newer functions like CFURL, it’s faster than HFS+.

That’s the reason I made extra functions to read/write text files for my own apps here.
TextOutputStream, TextInputStream and BinaryStream all use FSRef, so you get slow performance when opening a lot of files.

Well, I thought it was “odd” as it still uses FolderItem – but I used this from your recursive file list example.

dim s as String
dim g as New FileListMBS(f)
dim c as Integer = g.Count - 1
for i as Integer = 0 to c
  if g.Directory(i) then
    NodeCount = NodeCount + 1
    s = g.Name(i)
    LoadFolderMBS g.Item(i)
  elseif g.Visible(i) then
    NodeCount = NodeCount + 1
  end if
next i

It should do nothing but count the number of Folders and Files (nodes) encountered while diving the directory structure of any given folder.

That sounds wrong. We have a constructor to make a FileListMBS based on an existing FileListMBS.
That example should be changed.

@Christian Schmidt – OK, I see you have a constructor that will create a FileListMBS from a FileListMBS if I pass one in, along with an index. As an aside, I see the “WinFilter” only applies to Windows systems. Ultimately I will be looking for files with names that meet specific criteria. I was going to use a RegEx inside Xojo – but really, I can just run a command-line “find” and pipe the results to “grep” and then read the file. This makes my program dependent on macOS, or at least on having support for certain command-line utilities – but it’s an acceptable if not optimal solution.
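A hedged sketch of that find-piped-to-grep approach (the pattern and file names here are placeholders invented for illustration):

```shell
# Illustrative only: let find enumerate and grep filter, writing the
# matches to a file for the Xojo app to read afterwards.
root="$(mktemp -d)"
mkdir -p "$root/sub"
touch "$root/report_2018.csv" "$root/sub/report_2019.csv" "$root/notes.txt"

# Regular files only, names matching the pattern:
find "$root" -type f | grep -E 'report_[0-9]{4}\.csv$' > /tmp/matches.txt
wc -l < /tmp/matches.txt   # 2 matching files
```

Since grep does the pattern matching outside the app, the Xojo side is reduced to reading `/tmp/matches.txt` line by line.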

I wrote a module a while back using NSDirectoryEnumerator. I just updated it and added a method for FolderItem.CountItemsInDirectory to just get the count in addition to ItemsInDirectory which gets an array of filepaths.

I’d be curious how it stacks up.

Here’s the link if you want to try it. DirectoryListing.xojo_binary_project

@jim mckay - I will try it just to see what kind of difference I see between my old and new Macs. Counting nodes (files/directories) really isn’t my end game. I’m moving through directories finding files that match a specific pattern, opening them and building database entries from the contents. When I moved my project to my newer hardware, I discovered it ran like a dog compared to my old hardware. I didn’t know where the problem was. Was it the database inserts? Nope. Was it the file parsing? Nope. I was able to narrow the issue down to the directory crawler. Arguably the simplest part of the entire process! You can see how the basic logic is only, what, 10 lines of code? Open a folder and step through the “items”. If they are folders, then recursively call the subroutine feeding in the new folder item. If it’s a file, then process it. I eliminated all the file logic except for a simple counter to illustrate the problem.

Don’t do this. Instead do:

Dim Count As Integer = parentDir.Count
For i As Integer = 1 To Count

FolderItem.Count is very expensive in disk IO so do it once.

OK… this may not be an option, but here are two alternatives using a different approach. I think you need to do this on Windows from what I saw in some of the threads for this post, but you could do something similar on macOS.

  1. Why not store your files as binary blobs in a database? You could then write some Xojo console code that gets called by a utility that detects a directory change and updates the database on the fly. You could then search the database for what you want.

  2. Take a look at this Windows tool that does “instant” indexing of Windows files and has a command line interface you could use in a SHELL call:
    http://voidtools.com/support/everything/command_line_interface/

Just maybe some “alternative thinking” to solve your problem. If it cannot work in your environment then just ignore.