Performance Curiosity

  2. Robert B

    Feb 5 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    @MatthewCombatti -- yes, my end goal is simply to parse millions of XML files and build a SQL database. The millions of XML files are scattered across hundreds of thousands of directories and hundreds of GB, and knowing the path is important. Ultimately... I just shelled out and ran a simple "find" command, piped it to a grep for filenames ending in XML, and piped that result into a file. I now have a simple text file with a million+ lines, each of which is the path of an XML document. It took "find" about 90 seconds to build the file. I can read through the million lines in a flash, so I just send each line (path) to my "parser" and "database record builder". End Goal - Complete.

    I didn't really say I couldn't find a workaround...just that I found a "performance curiosity". The solution I employed is actually exactly how I originally thought I could solve it -- but I had never built a folder traversal in pure Xojo before, so I took the time to do that -- and performance was "good enough" -- until I moved the program to my NEWER, FASTER hardware, which demonstrated as much as a 60X performance PENALTY. My solution is not yet cross-platform, as it relies on OS-specific tools. If I want this to run on Windows as well (that would be nice), then I now have to add platform-specific code and probably make other minor adjustments per platform. FolderItem abstracts away the details of the target file system, and I thought that would be a slick, if not the most processing-efficient, solution.
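
    For the curious, the shell-out step was nothing fancy. A minimal sketch of the idea using the classic Xojo Shell API (the root path and output file here are hypothetical stand-ins, not my actual ones):

      Dim sh As New Shell
      sh.TimeOut = -1 ' no timeout; the scan can take a while

      ' Build the list of XML paths with the native tools.
      sh.Execute("find /Volumes/Archive -type f | grep -i '\.xml$' > /tmp/xml_paths.txt")

      If sh.ErrorCode = 0 Then
        Dim listFile As FolderItem = GetFolderItem("/tmp/xml_paths.txt", FolderItem.PathTypeShell)
        Dim tis As TextInputStream = TextInputStream.Open(listFile)
        While Not tis.EOF
          Dim path As String = tis.ReadLine(Encodings.UTF8)
          ' hand each path off to the parser / database record builder
        Wend
        tis.Close
      End If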

  3. Robert B

    Feb 5 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    Ouchie. So... I let macOS (the find utility) scoop together all the XML paths; now "all I have to do" is open each file and parse it.
    Parsing XML isn't the speediest operation...but it should be faster on my newer hardware (one would think). Alas, I need to open "a million FolderItems" -- I started the process up on my new MacBook Pro... chug...chug...choke. I'm up to 2,600 files parsed after a couple of minutes. My old HFS+-based MacBook is already past 38,000. Gotta be kidding me.
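
    The per-file loop is nothing exotic -- roughly this sketch, with the error handling trimmed (variable names are just for illustration):

      ' For each path read from the list file. The slow part turns out to be
      ' creating the FolderItem itself, not the XML parsing.
      Dim f As FolderItem = GetFolderItem(path, FolderItem.PathTypeShell)
      If f <> Nil And f.Exists Then
        Dim tis As TextInputStream = TextInputStream.Open(f)
        Dim xml As New XmlDocument
        xml.LoadXml(tis.ReadAll(Encodings.UTF8)) ' raises XmlException on malformed XML
        tis.Close
        ' ... walk xml.DocumentElement and build the database record ...
      End If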

    Ranked #2 in Feedback... I should hope so. FolderItem performance is abysmal. It's sad. Awful. Slow as molasses in January.
    I thought I could leverage Xojo's XML and SQL classes to make "short work" of this project. The coding wasn't bad (quick development, really), but the execution of the developed code is just...just indescribably bad.

    I'm going to look at running it against external HFS+ media -- it couldn't be worse, could it?

  4. Robert B

    Feb 5 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    I ran the program with the directory paths pointing at an external USB3 disk drive and was able to manage better than 1,000 documents per second. When I point the program at my internal high-speed SSD, it starts at about 300 documents per second...and by the time it gets to 38,000 documents it's only grabbing about 30 per second on average. Shaking my head in disbelief.
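
    For reference, the per-second numbers come from a crude rolling counter along these lines (a sketch, not the verbatim code; docCount and lastMark would be module-level variables):

      docCount = docCount + 1
      If docCount Mod 1000 = 0 Then
        Dim nowUS As Double = Microseconds
        Dim secs As Double = (nowUS - lastMark) / 1000000.0
        System.DebugLog(Str(docCount) + " docs, " + Format(1000.0 / secs, "#,##0.0") + " docs/sec")
        lastMark = nowUS
      End If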

  5. Jason P

    Feb 5 Xojo Inc Texas

    Here are some guidelines on how to get fast performance from FolderItem:

    https://forum.xojo.com/conversation/post/111435
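
    One commonly cited pattern, sketched here for convenience (see the linked post for the actual guidelines; the folder path is a placeholder):

      ' Enumerate a directory the cheap way: fetch the count once, walk the
      ' items in order rather than re-resolving children by name, and read
      ' each property only once.
      Dim dir As FolderItem = GetFolderItem("/some/folder", FolderItem.PathTypeShell)
      Dim n As Integer = dir.Count
      For i As Integer = 1 To n
        Dim item As FolderItem = dir.Item(i)
        If item <> Nil Then
          ' cache whatever properties you need (item.Name, item.Length, ...)
        End If
      Next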

  6. Robert B

    Feb 5 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    I could probably bypass FolderItem entirely...but really, I simply wanted to "open a file", and FolderItem seems like the best candidate for that. Performance on HFS+ media really is acceptable: 1.2 million files at 1,000 per second yields a 20-minute processing time. That's acceptable. The issue is how FolderItem works with APFS -- about 30X slower in my tests, which yields about 10 hours of processing time. That's not acceptable. For small numbers of files...even a few thousand...it's probably fine. Thanks for the tips anyway, @Jason P.

  7. jim m

    Feb 5 Pre-Release Testers piDog.com

    Since you're using a shell anyway, what if you use a Shell and call cat on each file and read the results from the shell, rather than creating FolderItems and TextInputStreams?
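
    Something like this, perhaps (an untested sketch -- quoting of awkward path characters is glossed over):

      ' Read a file's contents through a Shell instead of
      ' FolderItem + TextInputStream.
      Dim sh As New Shell
      sh.TimeOut = -1
      sh.Execute("cat '" + path + "'")
      If sh.ErrorCode = 0 Then
        Dim xmlText As String = DefineEncoding(sh.Result, Encodings.UTF8)
        ' feed xmlText straight to XmlDocument.LoadXml
      End If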

  8. Robert B

    Feb 5 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    @jim m Actually, the thought had occurred to me. That, or just write 90% of it in C, building another text file that contains all the database insert statements. Sigh. It's kinda like when you buy a car and find that it won't run on some roads, and the solution is to buy a truck and hire a driver to tow the car. At some point you gotta feel foolish.
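
    By "another text file" I mean something along these lines (a sketch only -- the table and column names are hypothetical, and real code would escape the values):

      ' Append one INSERT per parsed document, then later feed the whole file
      ' to the database in one go (e.g. sqlite3 mydb.sqlite < inserts.sql).
      Dim out As TextOutputStream = TextOutputStream.Append(GetFolderItem("/tmp/inserts.sql", FolderItem.PathTypeShell))
      out.WriteLine("INSERT INTO documents (doc_type, doc_key, path) VALUES ('" _
        + docType + "', '" + docKey + "', '" + docPath + "');")
      out.Close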

  9. Robert B

    Feb 5 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    Lest people think I'm too critical of Xojo, I will say that once I built "the Database" I needed a "pretty" GUI application that could search by different keys based on document type etc. and display the results appropriately. Xojo has worked perfectly in that regard. The application is also cross-platform. It would have been a pain in the keister to do this with C# -- and I wouldn't have cross-platform capability or a basically "zero install footprint". I have the executables at the same location as the database (on removable media), so it's entirely portable. What a vibrant discussion group as well. Thanks to each and every one of you who offered your time and expertise. I know you didn't have to.

  10. Beatrix W

    Feb 5 Pre-Release Testers Europe (Germany)

    Have you tried to do an Instruments session to see why your code slows down so much?
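
    For example, with the Time Profiler template -- attach from the Instruments GUI, or from Terminal (the app path is a placeholder):

      instruments -t "Time Profiler" -D perf.trace MyApp.app/Contents/MacOS/MyApp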

  11. Robert B

    Feb 5 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    @Beatrix W - I have not. I've not had occasion to run the Mac (Xcode) Instruments tool to diagnose performance issues with a Xojo application. I'll look into that. Good idea.

  12. Beatrix W

    Feb 6 Pre-Release Testers Europe (Germany)

    @Jason P : even on HFS+, the performance of Xojo versus MBS is something like 3,450 ticks to 250 ticks, with "best practices". Don't just test with a couple of thousand files.
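
    For context, a tick in Xojo is 1/60th of a second, and the measurement needs nothing fancier than a harness like this sketch (the folder path is hypothetical):

      Dim startTicks As Integer = Ticks
      Dim dir As FolderItem = GetFolderItem("/path/with/many/files", FolderItem.PathTypeShell)
      For i As Integer = 1 To dir.Count
        Dim f As FolderItem = dir.Item(i)
        ' touch the properties under test, e.g. f.Length, f.ModificationDate
      Next
      System.DebugLog("Enumeration took " + Str(Ticks - startTicks) + " ticks")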

  13. Kevin G

    Feb 6 Pre-Release Testers, Xojo Pro Gateshead, England

    If you want the best performance, I suggest you avoid FolderItems as much as possible. On MS-Windows we found that using declares to retrieve just the file properties we actually needed resulted in a 5x performance improvement on a congested network.

    My guess is that the Xojo framework makes multiple OS calls to populate all of the FolderItem properties, which results in slower performance. On macOS there is the additional problem of Xojo using OS APIs that don't seem to work well with APFS.
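
    A rough sketch of the declare approach, assuming the classic framework (struct offsets per the WIN32_FIND_DATAW layout; illustrative rather than production code):

      ' Enumerate a directory with the Win32 API directly instead of FolderItem.
      Declare Function FindFirstFileW Lib "kernel32" (fileName As WString, findData As Ptr) As Integer
      Declare Function FindNextFileW Lib "kernel32" (handle As Integer, findData As Ptr) As Boolean
      Declare Function FindClose Lib "kernel32" (handle As Integer) As Boolean

      Dim data As New MemoryBlock(592) ' sizeof(WIN32_FIND_DATAW)
      Dim h As Integer = FindFirstFileW("C:\docs\*.xml", data)
      If h <> -1 Then ' -1 = INVALID_HANDLE_VALUE
        Do
          Dim name As String = data.WString(44)        ' cFileName
          Dim sizeLow As UInt32 = data.UInt32Value(32) ' nFileSizeLow
          ' ... collect only the properties you actually need ...
        Loop Until Not FindNextFileW(h, data)
        Call FindClose(h)
      End If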

  14. Beatrix W

    Feb 6 Pre-Release Testers Europe (Germany)

    @Jason P : and on APFS the performance of FolderItems is even worse: 41 ticks versus 950 ticks on just a couple of thousand files.

  15. Robert B

    Feb 7 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    Interesting coincidence that the current Xojo Blog entry is about how to empty folders using recursion. I hope they don't have too many folders and files to delete...it could take a while. I don't mean deeply nested; my nesting was only 4 or 5 levels deep. But just making each item a FolderItem so you can check properties and such...takes a long time.
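
    The textbook pattern looks something like this sketch -- and it creates one FolderItem per entry, which is exactly the slow part:

      Sub EmptyFolder(dir As FolderItem)
        ' Walk backwards so deletions don't shift the indexes.
        For i As Integer = dir.Count DownTo 1
          Dim item As FolderItem = dir.Item(i) ' one FolderItem per entry
          If item.Directory Then
            EmptyFolder(item) ' empty subfolders first
          End If
          item.Delete
        Next
      End Sub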

  16. Oliver O

    Feb 7 Pre-Release Testers, Xojo Pro https://udemy.seminar.pro

    TT's "Find Any File" app was lightning fast on HFS+; with APFS you can go get a coffee until results appear. But if Thomas can't solve this, I doubt that it can be any faster at all.

    http://apps.tempel.org/FindAnyFile/index.php

  17. Robert B

    Feb 9 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    Well, I looked at TT's tool and cranked it up... after a minute of watching the file count grow, I opened a terminal window and ran a "find" command piped to "grep" and piped the result to a file. The "find" command had identified 500K files before TT's tool hit 15K.

    I think my solution of shelling out and using the native command was a good call. Problem is...once I identify 500K files, I then need to open each one and process the XML data contained therein. I ended up doing the operation on my old HFS+ MacBook just to get my first task done. One of my test runs was against 1.2 million documents, and the other was over 5 million. Performance varied a bit over the course of the operation, but I think I was averaging about 300 docs/sec using Xojo's XML classes. Not hateful. Hopefully the problem isn't an APFS limit, and it's just a matter of hooking the new "drivers". I'm concerned that Christian's MBS code, while FAR superior to Xojo's FolderItem class for the operations I was using, still falls short on APFS when compared to HFS+. Nothing I like better than to see a $4,000 laptop get its a$$ handed to it by a decade-old system with an ancient file system.

  18. Robert B

    Feb 9 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    Ok... I posted about two hours ago. TT's find tool has just now flipped over the 100K file mark. Woot. The native find command was done (about 543K files) before I even typed up my last post. So: maybe TWO and a HALF MINUTES with "find", compared to TWO HOURS with TT's tool...and it has just barely crossed the 20% threshold. "Coreservicesd" has been sucking up a core at 95-99% for the last two hours; my system is otherwise idle, showing only about 13% total CPU utilization. "Find Any File" is up to 253 MB consumed, slowly and steadily rising. The FILE produced by my "find" command, which identified the 543K files (full path of a file on each line), is 66 MB in size -- and "find" itself never consumed more than 67% of a single core while running.

  19. Robert B

    Feb 10 Pre-Release Testers, Xojo Pro Cincinnati, Ohio

    Thomas Tempelmann's "Find Any File" just completed and reported that it found the exact same number of files as the shell find operation. It is a nicely organized tool. 14 hours seems a bit long for a search, though.

  20. Oliver O

    Feb 10 Pre-Release Testers, Xojo Pro https://udemy.seminar.pro

    whoa ... :o

  21. Jason P

    Feb 10 Xojo Inc Texas

    @Robert B So: maybe TWO and a HALF MINUTES with "find", compared to TWO HOURS with TT's tool... "Coreservicesd" has been sucking up a core at 95-99% for the last two hours. [...]

    In one of the OS updates, Apple moved calls to old APIs into a daemon -- this is why Coreservicesd consumes as much time as it does. The effect has been to make the use of old APIs slow down enormously. A Google search for recent posts about "Coreservicesd" will give you some idea of the scope of the issue. We're aware that FolderItem's performance on APFS volumes is not the same as it was with HFS+ volumes and are working to address the issue.
