speeding up listing files in a directory on Windows PCs

I develop on Mac OS X. I was happy with the performance of my application until I compiled and ran it on Windows 7. It is painfully slow on Windows.

On my Mac test machine I was listing ~2,500 files. The data being traversed is as follows. 340 top level directories. Each has a folder inside it, then another folder inside that, then the actual files. Anywhere between 1 and 5 files are present in the last nested folder. No other files or directories are present.

To compare Apples to, well, Windows, I copied the data to an external USB disk. I have two almost identical computers: a Dell M3800 and a MacBook Pro. They have the same speed i7, the same amount of RAM, and both have SSDs. One runs Windows 10 and the other El Capitan. I compiled my application and copied it to the SSD on each.

On the Mac the listing took 180 ticks the first time it ran, then 28 ticks each time after that (it caches the listing, I know this as the disk light does not flicker on subsequent runs). On Windows it takes 1,450 ticks the first time it runs then 190 ticks each time after that (it also caches the listing). I’m measuring ticks right before I call the method and right as it returns.

It doesn’t really matter what media I list from, Windows is always much, much slower.

The code is the typical item listing:

Sub ListFiles(thefolder as FolderItem)
  if theFolder<>nil then
    if theFolder.Directory then
      for i as Integer=1 to theFolder.Count
        if theFolder.Item(i).Directory then
          ListFiles(theFolder.Item(i))
        else
          if right (thefolder.Item(i).Name,4) = ".DCM" or right (thefolder.Item(i).Name,4) = ".dcm" then
            listbox1.addrow theFolder.Item(i).Parent.Parent.Parent.name + " [#"+Str(i)+"]"
            listbox1.Cell(listbox1.LastIndex,1)= thefolder.Item(i).CreationDate.SQLDate +" at "+thefolder.Item(i).CreationDate.ShortTime
            listbox1.CellTag(listbox1.LastIndex,0)=thefolder.Item(i)
          end
        end
      next
    end if
  end if
End Sub

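theFolder.Count is expensive in terms of disk I/O, and the For loop re-evaluates it on every iteration, so you shouldn’t call it in the loop header. Read it once into a variable before the loop.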

However I find that using DIR in a shell and parsing the results is an order of magnitude faster.

Use dir *.dcm /b/s in a shell and parse the results.

/b > Displays a bare list of directories and files, with no additional information
/s > Lists matching files in the specified directory and all of its subdirectories
If you have a problem parsing the console output, then export to a file:
dir *.dcm /b/s > fileslist.txt
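
For what it’s worth, here is a minimal sketch of the shell approach in Xojo. It is untested here. The name ListWithDir and its parameters are made up for illustration, and it assumes a synchronous Shell on Windows and file names that survive the console code page.

```xojo
Function ListWithDir(root As FolderItem, pattern As String) As String()
  // Hypothetical helper: returns the full paths printed by "dir /b /s".
  Dim paths() As String
  If root = Nil Or Not root.Directory Then Return paths

  // Drop a trailing backslash, if any, so the pattern can be appended cleanly.
  Dim base As String = root.NativePath
  If Right(base, 1) = "\" Then base = Left(base, Len(base) - 1)

  Dim sh As New Shell
  sh.TimeOut = -1 // assumption: wait as long as needed on Windows
  sh.Execute("dir """ + base + "\" + pattern + """ /b /s")

  // dir reports a non-zero exit code when nothing matches.
  If sh.ErrorCode <> 0 Then Return paths

  // Normalise line endings, then keep every non-empty line as one path.
  Dim lines() As String = Split(ReplaceLineEndings(sh.Result, EndOfLine), EndOfLine)
  For Each entry As String In lines
    If Trim(entry) <> "" Then paths.Append(Trim(entry))
  Next
  Return paths
End Function
```

Called as ListWithDir(theFolder, "*.dcm"), each returned entry is a full path. You can show those strings in the Listbox directly and only build a FolderItem for the row the user actually picks.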

Or use FindFirstFile/FindNextFile declares. See MSDN for details.

Which is in the Windows Functionality Suite. The big bonus is that you can use wild carded filenames in the find… I often need to find files that have a certain extension and start with certain letters…

To make the routine X-Platform I shell out on the Mac, where I can use wild cards too, and parse what comes back to get file names to create FolderItems.

  • karen

FolderItem.Count and FolderItem.Item are very expensive, so you should use them as few times as possible. The execution time of FolderItem.Item and Count rises exponentially in proportion to the number of items in the directory. Your code uses FolderItem.Count 1 time and FolderItem.Item 8 times on every iteration of the loop; use Count once only and Item once per iteration and you should see a marked improvement.

@Tim Hare’s suggestion is my preference for high-speed indexing since its execution time rises linearly in proportion to the number of items, though using a shell like @Loannis Kolliageorgas suggests might be just as fast. I posted some example code to the old forum that uses FindFirstFile/FindNextFile, though I’ve been told it doesn’t work under 64 bit builds.

Windows declares need adjustment for 64-bit sizes.
WFS would need to be updated: it was written before structures existed, so much of it is MemoryBlocks with hard-coded offsets that work for 32 bit but not 64 bit.

Using a Shell is really fast but if you need to put things back into FolderItems with GetFolderitem, it will slow things down again. Still a lot faster though.

Also, if a lot of items match in your method, adding rows to the listbox is probably slower than the crawling itself.

(On OS X) with the method below on a pretty deep directory tree, it went from 1330 to 651 ticks (when it added approx. 4000 .jpg files to the listbox).
With files I don’t have (.xyz) it went from 651 to 275 ticks.

Maybe it helps.

Sub ListFiles(theFolder As FolderItem)

  Dim f As FolderItem
  If theFolder <> Nil And theFolder.Directory Then
    Dim items As Integer = theFolder.Count
    For i As Integer = 1 To items
      f = theFolder.Item(i)
      If f.Directory Then
        ListFiles(f)
      Else
        If Lowercase(Right(f.Name, 4)) = ".dcm" Then
          listbox1.AddRow(f.Parent.Parent.Parent.name + " [#" + Str(i) + "]", f.CreationDate.SQLDate + " at " + f.CreationDate.ShortTime)
          listbox1.CellTag(listbox1.LastIndex, 0) = f
        End
      End
    Next
  End If
  
End Sub
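
If you want to see how much of the time is the crawl and how much is the Listbox, one option is to collect the matches into an array first and fill the Listbox afterwards. The following is only a sketch under that assumption. CollectFiles, found and root are names invented for the example.

```xojo
Sub CollectFiles(theFolder As FolderItem, ext As String, found() As FolderItem)
  // Hypothetical helper: appends every file whose name ends in ext to found().
  If theFolder = Nil Or Not theFolder.Directory Then Return

  Dim items As Integer = theFolder.Count // read Count once, not per iteration
  For i As Integer = 1 To items
    Dim f As FolderItem = theFolder.Item(i) // one Item call per iteration
    If f = Nil Then Continue
    If f.Directory Then
      CollectFiles(f, ext, found)
    ElseIf Right(f.Name, Len(ext)) = ext Then // string = is case-insensitive
      found.Append(f)
    End If
  Next
End Sub
```

Usage, timing the two phases separately (root is whatever folder you start from):

```xojo
Dim found() As FolderItem
Dim t0 As Integer = Ticks
CollectFiles(root, ".dcm", found)
Dim crawlTicks As Integer = Ticks - t0

t0 = Ticks
For Each f As FolderItem In found
  Listbox1.AddRow(f.Name)
  Listbox1.CellTag(Listbox1.LastIndex, 0) = f
Next
Dim fillTicks As Integer = Ticks - t0
```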

Be careful of the DIR plan - if you have files with high UTF-8 names, like Japanese file names, this will cock them up royally. ( Unless I am being stupid again ).

You also do not need to check both “.dcm” and “.DCM” ( or use Lowercase ) - either will work, because string comparison with = is case-insensitive in Xojo.

( Sorry - I can’t edit above post ).

For Roman accented characters in file names, the Dos interface is using DOSLatin1 encoding.

Since apparently Xojo Shell is unable to do anything but ASCII, the trick is to send the result to a file

DIR > list.txt

And load that with the proper encoding.

TextArea1.Text = DefineEncoding(t.ReadAll, Encodings.DOSLatin1 )

I have no idea what encoding Chinese requires if even that is possible at all. A quick test shows it is not DOSChineseSimplif.

FolderItem.Name does get the proper Chinese name.

Thank you for all the suggestions.

I’m not sure whether to use DIR or FindFirstFile/FindNextFile declares yet. I tested DIR in a DOS prompt and it is super fast, I’ll test in a Xojo shell when I get to my Windows dev machine this morning.

All my files are created based on a client name and account number entered via a touch screen interface that has a limited character set, so I shouldn’t experience any encoding issues. They are x-ray images from my animal hospital in DICOM format.

I don’t need to run as 64 bit, I am only displaying a single image at a time. I had included upper and lower case suffixes because I also run on Mac OS X.

I did try the MBS example and whilst faster, it was ‘only’ about 50% faster.

@Michel Bujardet wrote: Since apparently Xojo Shell is unable to do anything but ASCII, the trick is to send the result to a file

It is not the Xojo shell, it is the Windows command interpreter. You can try to get clever and change the code page via CHCP, but I have yet to find one that handles Japanese names on an English (USA) Windows, even after resetting fonts. Yes, the names are correct in RB/Xojo folderitems ( thank the good Lord! ).

If you redirect a DIR of a folder containing Japanese names to a file, the resultant file is NOT correct ( do a hex dump and look ). Here is a pic showing the directory entry, the output text file, and a hex dump of that file.

( Let’s see if this works - folder name is ??? - famous game Dai Senryaku 2 - try it yourself )

Standard disclaimer - unless I am being stupid again.

I found the same result. I was never able to get anything but ??? when Chinese or Japanese Kanji are concerned.
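
One thing that may be worth trying, and which I have not tested from a Xojo Shell: cmd.exe has a /u switch that makes internal commands such as DIR write UTF-16 when their output goes to a pipe or a file. The sketch below assumes that behaviour carries over when the command is launched from Shell. ListWithDirUnicode and listFile are invented names, and listFile must point somewhere writable.

```xojo
Function ListWithDirUnicode(root As FolderItem, listFile As FolderItem) As String
  // Hypothetical helper: runs DIR through "cmd /u" so the redirected
  // output is UTF-16LE, then reads that file back with the matching encoding.
  Dim sh As New Shell
  sh.TimeOut = -1 // assumption: wait as long as needed on Windows
  sh.Execute("cmd /u /c dir """ + root.NativePath + """ /b /s > """ + listFile.NativePath + """")

  // TextInputStream.Open can raise an IOException if the file is missing.
  Dim t As TextInputStream = TextInputStream.Open(listFile)
  Dim raw As String = t.ReadAll
  t.Close

  // Same pattern as the DOSLatin1 example above, with a different encoding.
  Return DefineEncoding(raw, Encodings.UTF16LE)
End Function
```

For example, ListWithDirUnicode(theFolder, SpecialFolder.Temporary.Child("list.txt")). If the result still shows ??? for Kanji names, then the assumption does not hold on your setup.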

FileListMBS is much faster than listing via folder items.
You may want to try it:
http://www.monkeybreadsoftware.net/class-filelistmbs.shtml

And FileListMBS is fully encoding safe and in the latest version works fine on OS X, Windows and Linux.

Yes, it is great, and I have used it. The problem is when you are trying to access an external command via a command line / shell, like to use 7-zip to pack or unpack a file - you get dismally worked over by the Windows command interpreter that cannot find the file. :(

In some circumstances, tokens may be your friends. For instance, copy the file in a folder and you can address it there as ..

However, it would seem 7-Zip needs some setup to deal with Chinese characters. See the 7-Zip Open Discussion thread “Unzipping .ZIP file with Chinese characters in filename” and
https://sourceforge.net/p/sevenzip/discussion/search/?q=Chinese

Good plan. I have been mucking about with the command line versions of 7-Zip and Rar, and by putting the (Japanese) files in a folder I was quite surprised to see that the names had been preserved. I can’t remember which, or if both, as I got distracted and need to get back there.

Thankfully I am dealing only with Japanese characters, not Chinese. The gui version of 7-Zip works wonderfully under applocale for Japanese.

If you are working with DICOM, does a DICOMDIR exist by any chance?

Also, are you sending images to an existing PACS or to your own solution? If it is an existing PACS, it might be faster to hook into the db.