What’s the best way to generate an array containing the structure of a very large directory tree? We’re talking 30,000-40,000 files.
I’m looking for something that’s fast. Initially, the application will be on Linux, but I’d also like to make it work on Windows and Mac at some point. If there are platform-specific tricks that speed this up, that’s fine. But if there’s something generic that will work in all cases, even better.
Not entirely sure yet. The application will be copying each file in the folder (most often to LTO tape via an LTFS mount) and making a checksum on each one. Optionally it will read each file off the LTO tape and do a checksum there, to verify the copy was successful.
So it does kind of depend on how I do the checksums. If I run them from the command line, all I really need is an array with the paths to all the files in the source folder, and that should be recursive. If I do it internally with Xojo’s md5 tool, then I would think I’d need the FolderItem, since I have to load the files up before writing them. This may not be a practical option, because in some cases the files could be 1TB QuickTime movies. In most cases they are 12-50MB image files.
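For the command-line route, here is a rough sketch of what a per-file checksum call might look like. The names QuoteForShell and MD5ViaShell are made up for illustration, and I am assuming md5sum on Linux, md5 -q on macOS and certutil -hashfile on Windows, each of which prints its result in a different format. The function returns the tool's raw output and leaves parsing to the caller.

```xojo
' Hypothetical helper: wraps a path so the shell treats it as one argument.
Function QuoteForShell(path As String) As String
  #If TargetWindows Then
    Return """" + path + """"
  #Else
    ' Single-quote the path and escape any embedded single quotes
    Return "'" + ReplaceAll(path, "'", "'\''") + "'"
  #EndIf
End Function

' Hypothetical helper: runs the platform's MD5 tool on one file.
' Returns the raw tool output (format differs per platform), or "" on failure.
Function MD5ViaShell(path As String) As String
  Dim cmd As String
  #If TargetWindows Then
    cmd = "certutil -hashfile " + QuoteForShell(path) + " MD5"
  #ElseIf TargetMacOS Then
    cmd = "md5 -q " + QuoteForShell(path)
  #Else
    cmd = "md5sum " + QuoteForShell(path)
  #EndIf

  Dim sh As New Shell
  ' Assumption: -1 disables the Windows shell timeout, which matters for very large files
  sh.TimeOut = -1
  sh.Execute(cmd)
  If sh.ErrorCode <> 0 Then Return ""
  Return Trim(sh.Result)
End Function
```

Since the external tool reads the file itself, nothing is loaded into the app's memory, which is the point for the 1TB case.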
Yeah, ls is probably the best way to go for Mac and Linux. I was hoping to not have to do it two different ways though, for different platforms. I’m going to play with FileListMBS a bit to see how that does on really big directories first.
The key is to avoid folderitems as much as possible…
They are not very fast.
So if you ask FileListMBS for a folder item on each item, you would get slow again.
Thanks. My current thinking is to make an array that contains the full path to each file, then iterate through those paths. That should be all the information I need, and I should be able to avoid folderitems.
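A minimal sketch of that idea, assuming a shell listing is acceptable on every platform: find on Mac and Linux, DIR /s /b on Windows. ListFilePaths is a hypothetical name, the only FolderItem involved is the root folder, and error handling is deliberately thin. I have not timed this on 30,000-40,000 files.

```xojo
' Hypothetical helper: returns the full path of every file under root, recursively.
Function ListFilePaths(root As FolderItem) As String()
  Dim cmd As String
  #If TargetWindows Then
    ' /s = recurse, /b = bare output (full paths with /s), /a:-d = exclude directories
    cmd = "dir """ + root.NativePath + """ /s /b /a:-d"
  #Else
    ' ShellPath is already escaped for the shell; -type f = regular files only
    cmd = "find " + root.ShellPath + " -type f"
  #EndIf

  Dim sh As New Shell
  ' Assumption: -1 disables the Windows shell timeout so a long listing is not cut off
  sh.TimeOut = -1
  sh.Execute(cmd)

  ' Normalize line endings so one Split works on all platforms
  Dim lines() As String = Split(ReplaceLineEndings(sh.Result, EndOfLine.UNIX), EndOfLine.UNIX)

  ' Keep only non-empty lines (the output ends with a trailing line break)
  Dim paths() As String
  For Each line As String In lines
    If line <> "" Then paths.Append(line)
  Next
  Return paths
End Function
```

Two caveats I am aware of: filenames with non-ASCII characters may need the result's encoding defined before splitting, and any error text the tool prints could end up in the list, so it is worth checking sh.ErrorCode before trusting the output.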
Here is a Windows-specific method that we use which is certainly quicker than using FolderItems. I haven’t tested it with the numbers of files you are talking about, but it is worth a try. You can change the DIR command to suit:
' Create a DIR command to list all the files
' /b means output just the filename b=bare
' /t:c means use the Created Date to sort
' /o:d means order by date (oldest first)
Dim Dir As String
' Quote the path so folder names containing spaces are handled
' (like the original, this assumes NativePath ends with a path separator)
Dir = "DIR """ + ThisFolder.NativePath + "*." + FileMask + """ /b /t:c /o:d"
' Do the Dir
Dim ThisShell As New Shell
ThisShell.Execute(Dir)
' Check for Errors from the shell command
' Shell returns 1 - File Not Found if there are no files matching the mask
If ThisShell.ErrorCode > 1 Then
  Raise New RuntimeExceptionEx(ThisShell.ErrorCode, ThisShell.Result)
End If
' Don't process if you have File Not Found
If ThisShell.ErrorCode = 0 Then
  ' Get a list of files
  Dim FileNames() As String = Split(ThisShell.Result, Chr(13)+Chr(10))
  ' Now Publish the files
  ' The last item in the array is empty because of the CRLF after the last filename in the list
  For n As Integer = 0 To FileNames.Ubound - 1
    ' Publish the document
    PublishDocument FileNames(n)
  Next
End If
I just tried this on my system where I have a lot of files sync’d using Box Sync. The DIR command doesn’t seem to recognize the Box Sync folder - it will show in its parent directory, but you cannot execute a DIR command on the Box Sync folder.
If you come to the London Xojo Conference at Wimbledon this Friday I can show you my methods where you pass a FolderItem and it builds an SQLite database of all the enclosed folders with all Xojo, MBS, EXIF, hash, audio and video metadata. I use FileListMBS for speed.
I use a SQLite database since I can reexamine different aspects of the content without having to re-search the folders, it requires minimal RAM and it survives an application relaunch.
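To illustrate the general idea only (this is not the code from my app), a stripped-down sketch of caching a list of paths in SQLite might look like the following. SavePaths, the files table and its columns are made-up names, and it uses the classic SQLiteDatabase calls rather than the newer API.

```xojo
' Hypothetical helper: stores a list of paths in a SQLite file so it survives a relaunch.
Sub SavePaths(dbFile As FolderItem, paths() As String)
  Dim db As New SQLiteDatabase
  db.DatabaseFile = dbFile

  ' Open the existing database, or create it on first use
  Dim ok As Boolean
  If dbFile.Exists Then
    ok = db.Connect
  Else
    ok = db.CreateDatabaseFile
  End If
  If Not ok Then Return

  ' Hypothetical schema: one row per file, with room for a checksum later
  db.SQLExecute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, md5 TEXT)")

  ' One transaction for all inserts keeps this fast for tens of thousands of rows
  db.SQLExecute("BEGIN TRANSACTION")
  Dim ps As SQLitePreparedStatement = db.Prepare("INSERT OR IGNORE INTO files (path) VALUES (?)")
  ps.BindType(0, SQLitePreparedStatement.SQLITE_TEXT)
  For Each p As String In paths
    ps.SQLExecute(p)
  Next
  db.Commit
End Sub
```

Once the paths are in a table, questions like "which files still have no checksum" become a simple SELECT instead of another walk through the folders.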
I will demonstrate how I use it in my FileName Extreme Xojo app, amongst others.