I’m working on an application that will run a command line app on a folder full of sequentially numbered images, allowing the user to run either one setting on a range of files, or one set of settings per file. The files are large TIFFs or DPX files (about 25MB/file). There are a couple things I need to do when opening the sequence:
- If the user selects any 1 file in a sequence, the sequence should import in order starting with the lowest numbered file. That is, if the user clicks on “FILE_000500.tif” it should import the sequence from FILE_000000.tif through the highest file in the sequence.
- The application should recognize sequence breaks. If a file is missing, it should only import up to a break.
It seems to me that the easiest way to do this is to get all the file names into an array, where I can handle both of those situations pretty easily.
I won’t be opening the actual files all at once, just one at a time. So what I really need is the path to the parent folder, and a list of all the files in the sequence. I would also like to store some information about each file for use within my app. For example, whether the file was already processed, the path to the processed file, what settings in the command line app were used, etc. The main concern here is that a folder could easily contain 10,000 or more files in a single sequence. So, what’s the best way to approach this? I don’t want to have to sit around for a couple minutes, just to generate the file list.
One approach would be to load all the filenames into a multidimensional array that I work with while the application is running, and when the application quits, I’d dump all that out to an XML project file for use later.
Another would be to use a database, but that seems like it might be overkill.
My main concerns are: the size of the list is pretty big, and I want it to be fast and ideally not a massive memory hog; I need to be able to save it in some way for continued work on the sequence in another session later; I’d like to avoid thrashing the hard drive, because I’m already going to be loading and unloading images at a fairly rapid pace. The more I can do in memory, the better, I think.
Thoughts?
The bottleneck here is going to be the speed at which Xojo can iterate through the folder (it’s really slow on huge folders).
You might want to look into plugins for this.
For saving the folder contents I would recommend a database, especially if you want to store additional details. Using something like ActiveRecord would make things even easier.
Some quick thoughts:
re: possible approaches for getting the list of files:
- Run a shell command (possibly via Xojo) using an operating system command (or shell script) to create the list, sending the output to a flat file. Then open and parse that file in Xojo.
- If possible, have whatever mechanism is creating the files update a database or flat-file as it adds the files
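To illustrate the "get the names in one pass, then parse them" idea, here is a minimal sketch in Python rather than Xojo (chosen only for brevity). The function name `list_image_names` and the extension list are assumptions for illustration. It reads only directory entries and never opens the image files themselves.

```python
import os


def list_image_names(folder, extensions=(".tif", ".tiff", ".dpx")):
    """Return the sorted names of image files in `folder` without opening them."""
    names = []
    # os.scandir walks the directory once and yields lightweight entries.
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.lower().endswith(extensions):
                names.append(entry.name)
    names.sort()
    return names
```

With zero-padded frame numbers, a plain string sort already puts the names in numeric order, so the result can be handed straight to whatever code detects the sequence.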
re: handling file-specific data during program runs
Avoid multi-dimensional arrays if possible. Instead, create a Class and use a one-dimensional array of that Class.
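As a rough illustration of the "one class, one array" idea (sketched in Python rather than Xojo), each file gets a small record object instead of a row in a multidimensional array. The class name `FrameInfo`, its fields, and the `--threshold` switch are all hypothetical, picked to match the kinds of per-file data mentioned in the question.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FrameInfo:
    """Per-file bookkeeping; the image data itself is never held here."""
    name: str                              # file name within the parent folder
    processed: bool = False                # has the command line app run on it?
    output_path: Optional[str] = None      # where the processed file was written
    settings: dict = field(default_factory=dict)  # switches used for this file


# A one-dimensional list of records replaces the multidimensional array.
frames = [FrameInfo(name) for name in ("FILE_000000.tif", "FILE_000001.tif")]
frames[0].processed = True
frames[0].settings = {"--threshold": "0.5"}  # hypothetical switch
```

Even at 10,000 files this is only a few small objects per frame, so memory use should stay modest compared with the 25MB images.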
re: storing info between program runs
Definitely look into using a SQLite database. Xojo makes it pretty easy to work with them.
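For a sense of what the SQLite route could look like, here is a minimal sketch using Python's built-in `sqlite3` module instead of Xojo's database classes. The table layout and the function names (`open_project`, `add_frames`, `mark_processed`, `unprocessed`) are assumptions for illustration only, not a recommended schema.

```python
import json
import sqlite3


def open_project(db_path):
    """Open (or create) a project database with one row per frame."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS frames (
               name        TEXT PRIMARY KEY,
               processed   INTEGER NOT NULL DEFAULT 0,
               output_path TEXT,
               settings    TEXT
           )"""
    )
    return conn


def add_frames(conn, names):
    """Insert all frame names inside a single transaction."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO frames (name) VALUES (?)",
            [(n,) for n in names],
        )


def mark_processed(conn, name, output_path, settings):
    """Record the result and the settings (stored as JSON text) for one frame."""
    with conn:
        conn.execute(
            "UPDATE frames SET processed = 1, output_path = ?, settings = ? "
            "WHERE name = ?",
            (output_path, json.dumps(settings), name),
        )


def unprocessed(conn):
    """Return the names of frames that still need processing, in order."""
    rows = conn.execute("SELECT name FROM frames WHERE processed = 0 ORDER BY name")
    return [row[0] for row in rows]
```

Inserting the whole file list in one transaction, as `add_frames` does, generally keeps disk activity low, which speaks to the concern about thrashing the drive. The database file then doubles as the saved project for a later session.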
Good luck with the project!
The MBS plugin has functions for fast access to large folders, which makes a big difference.
There are a lot of ways to do this.
First you may need to write the algorithm to find the files.
For the processing, you could write some command line utilities so that you can run, for example, 4 files at a time.
Please check the MBS TIFF plugin for advanced TIFF file handling.
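The file-finding algorithm mentioned above could look something like the following sketch (in Python rather than Xojo, for brevity). It makes one assumption about the requirements in the question: "only import up to a break" is read as "import the unbroken run of frame numbers that contains the selected file". The name `find_sequence` and the regular expression are illustrative, not taken from any library.

```python
import re

# Splits "FILE_000500.tif" into prefix "FILE_", digits "000500", extension ".tif".
_PATTERN = re.compile(r"^(.*?)(\d+)(\.[^.]+)$")


def find_sequence(selected, names):
    """Return the unbroken run of numbered files that contains `selected`."""
    match = _PATTERN.match(selected)
    if match is None:
        return [selected]  # not a numbered file, so treat it as a single frame
    prefix, digits, ext = match.groups()

    # Map frame number -> file name for files that share the same prefix,
    # zero-padding width, and extension as the selected file.
    numbers = {}
    for name in names:
        m = _PATTERN.match(name)
        if (m and m.group(1) == prefix and len(m.group(2)) == len(digits)
                and m.group(3).lower() == ext.lower()):
            numbers[int(m.group(2))] = name

    current = int(digits)
    if current not in numbers:
        return [selected]

    # Walk down to the lowest frame before a gap, then up to the highest.
    low = current
    while low - 1 in numbers:
        low -= 1
    high = current
    while high + 1 in numbers:
        high += 1
    return [numbers[i] for i in range(low, high + 1)]
```

Because the numbers live in a dictionary, finding the run is a handful of lookups even for 10,000 files, and no image is opened along the way.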
Which MBS plugin am I looking at? FileList in the Util plugin?
Christian: The software that does the image processing already exists. But it’s a huge pain to use on the command line, because it requires a certain level of interactivity in order to use it efficiently. This is tough when you have to tweak the switches on a command, run it, jump to another window, open the result, etc, rinse and repeat. This application will basically fire off a command to run that application, with GUI controls for the various command line switches. The result will be dropped into the window, so that the workflow is faster than it is currently. The extent of any TIFF handling is going to be displaying a frame in an ImageWell, pretty much.
That said, I have some ideas for preprocessing frames that may require using ImageMagick, but that’s down the road.
I will likely spin off threads to run the command line application in a kind of batch mode, when doing a first pass, to parallelize it somewhat, but I need to first test how the command line application performs when running multiple instances simultaneously, and then how much performance I can eke out on various test machines. It’s running on a dual core Pentium on Linux right now and is reasonably quick. I’m going to be setting up a similar Linux box on a machine with an older i7, which should be faster. But for now, the main concern is getting the files in and displaying, which I’m working on today.
Thanks!
The FileListMBS class lists the files in a folder faster than the normal FolderItem.
A Thread in Xojo may help to keep the GUI responsive, so you can show a progress bar while Shell objects are used to run the command line tools asynchronously.
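The general shape of "run a few command line jobs at a time and report progress" can be sketched as follows. This is Python, not Xojo, so it shows the idea rather than the Thread and Shell classes themselves. The names `run_batch` and `on_done` are hypothetical, and the worker count of 4 simply echoes the "4 files at a time" suggestion earlier in the thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_batch(commands, max_workers=4, on_done=None):
    """Run each command (a list of arguments) with at most `max_workers` at once.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(cmd):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if on_done is not None:
            # Called from a worker thread; a GUI would need to hand this
            # off to its main thread before touching a progress bar.
            on_done(cmd, result.returncode)
        return result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

How many workers actually help depends on how the command line application behaves when several instances run at once, which is worth measuring on the target machines first.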