Large text files and ListBox

Hi,

I have a large text file (~1M lines) which I want to import into a ListBox. Obviously this will freeze the GUI, so I created a thread (in particular I am using the Task subclass) and I fill the listbox with something like this:

While not m_inputFile.EOF
    Dim rowFromFile As String = m_inputFile.ReadLine
    Me.UpdateUI("AddRow":rowFromFile)
Wend

However, it freezes the GUI anyway, so I wonder what the best strategy is to fill a listbox in the background while the user can see what has already been added to it.

Thanks.

You need to yield time back to the GUI periodically. Try something like:

dim msec as double = microseconds
dim rowFromFile as String   // dim it outside the loop for better performance

While not m_inputFile.EOF
    rowFromFile = m_inputFile.ReadLine
    Me.UpdateUI("AddRow":rowFromFile)
    if microseconds - msec > 100000
        app.YieldToNextThread
        msec = microseconds
    end
Wend

Hi Tim,

thanks for the answer. That basically works, but only if I remove the condition (if microseconds - msec > 100000) and call app.YieldToNextThread on every line (otherwise it still freezes at some point). The drawback is that it is now VERY slow to fill the listbox, so I was wondering if you had a suggestion to fix that.

Thanks!

It would be faster to read the file in one go (ReadAll) and then use Split to turn it into an array of lines.

Then split each line and add it to the listbox.
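A minimal sketch of the ReadAll-then-Split approach, assuming the classic Xojo framework and a UTF-8 encoded file. The helper name ReadLines is hypothetical, not something from the original post:

```xojo
// Hypothetical helper: reads a whole text file in one go and returns its lines.
Function ReadLines(f As FolderItem) As String()
  Dim stream As TextInputStream = TextInputStream.Open(f)
  Dim contents As String = stream.ReadAll(Encodings.UTF8) // assumption: the file is UTF-8
  stream.Close

  // Normalize CR, LF and CRLF so that a single delimiter works for Split.
  contents = ReplaceLineEndings(contents, EndOfLine)
  Return Split(contents, EndOfLine)
End Function
```

Because this only produces an array and touches no controls, it should be safe to run inside the thread; adding rows to the listbox still has to happen on the main thread.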

What Markus said, and you need to tune it a little. 100000 doesn’t yield often enough, 0 (no condition) yields too often and slows everything down. Try 50000 or 10000.

To expand on what Markus said, we often look at 1M lines in a file and think this is BIG! I need to eat it a little bite at a time. In reality, even if every line were 100 characters, that would only be 100 MB. You can easily read that into memory - you’ve got at least 2000 MB available to you. Too often, we get stuck in old mindsets, where we used to have to conserve space. It just isn’t true any more.

Thank you both guys. I will try to implement what you suggested.

Unless you are in Proteomics (or other big data). My database is over 4 GB so had to say bye to an in-memory one :frowning:

From the xDev Tips&Tricks column (the ReadAll tip also came from there):

There are several common ways to speed up the ListBox:

• as mentioned in a previous issue you can store data (or a reference to your data like RecordID) in the RowTag and only draw the visible rows and columns as required in the CellTextPaint event (though you have to deal with things like sorting the columns and multiple row selections yourself then).

• you can use “lazy loading” and just load the rows that you need to display

• you can speed up loops filling the Listbox by setting Listbox.Visible to false at the beginning, and restoring the Visible state at the end. However it seems that on some Windows systems the ListBox noticeably disappears when swapping the Visible state. You can prevent this by using the FreezeUpdateWFS and UnfreezeUpdateWFS functions in the Windows Functionality Suite (basically the equivalent of MacOSlib for Windows, available at https://github.com/arbp/WFS). You should NOT use the Windows declare LockWindowUpdate as sometimes recommended as it should only be used for drag and drop operations on ancient versions of Windows that Xojo doesn’t even support!

• make sure you are using a boolean in the Change event to avoid running code in there until all operations are done or the ListBox might slow down to a crawl!

Note: It seems that the number of columns has a much greater effect on the scrolling speed of the ListBox than the number of rows.
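As a rough illustration of the Visible and Change-event tips above, here is a sketch that assumes a listbox named Listbox1, a Boolean window property mLoading, and a String array lines() that has already been filled. All three names are hypothetical:

```xojo
// Fill the listbox while it is hidden and while Change handling is suppressed.
mLoading = True
Listbox1.Visible = False

For Each rowText As String In lines
  Listbox1.AddRow(rowText)
Next

Listbox1.Visible = True
mLoading = False

// In the Change event of Listbox1:
If mLoading Then Return
// ... normal selection handling goes here ...
```

On Windows systems where toggling Visible makes the control visibly disappear, the FreezeUpdateWFS / UnfreezeUpdateWFS pair mentioned above would take the place of the two Visible lines.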

Generic thought
Do you need to show this huge amount of data to the user!?

I’ve been working with data for a long time, and for me it’s a priority to present large amounts of data in a human-friendly way.
I mean, I always ask myself whether it is possible to split the data into smaller pieces to show to the user. That includes myself as a user.

As an example:
If there are more than 20 files in a folder, it’s better to create a new folder within that folder. Of course, this is largely a matter of taste… But a person has great difficulty getting a quick overview of more than 50 files or so.

In general, if all the data fits on the screen, the user does not need to scroll and the view is fine for fast processing (by the user).

My thought:
Maybe you need to work on the user interface rather than filling a ListBox with so much data no person is capable of looking at?

I can’t edit the post above, but there is a two-letter name for this… which I have now forgotten!
The definition is, “How to show much (very much) data in a pleasant way for the human eye.”

Eventually, the name will come to me…! :slight_smile:

Agree with Jakob, nobody will scroll through a million lines.

I had that too and my user told me it is pointless, I should just delete it.

Not everything that’s feasible is desirable :wink:

I agree too and in fact, by default, I don’t show all data (the user can choose how many lines to see by default). However, I want to give the option to see them all if the user wants to. At the end of the day it is software for statistical data analysis, and in some scenarios it could even make sense to scroll through millions of rows…

The tricky part is to keep everything smooth… and that is something I am not succeeding at at the moment… :frowning:

You should definitely use Markus’s “lazy load” paging method to display your million lines. Load them into an array instead of the listbox directly, and fetch a manageable subset to load into the listbox for the user to view, then load more as the user gets to the bottom of it.

Most importantly, since an array is not a UI element, you can finish loading it in a thread without getting a ThreadAccessingUIException, as would happen with the listbox.
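One possible shape for the paging step, as a sketch only. It assumes two window properties that are not in the original posts: mLines() As String, filled by the thread, and mNextIndex As Integer, the first array index not yet shown. The method name ShowNextPage is likewise hypothetical:

```xojo
// Hypothetical paging method: appends the next pageSize lines to the listbox.
// Call it on the main thread only, since it touches a UI control.
Sub ShowNextPage(pageSize As Integer)
  Dim last As Integer = mNextIndex + pageSize - 1
  If last > mLines.Ubound Then last = mLines.Ubound

  For i As Integer = mNextIndex To last
    Listbox1.AddRow(mLines(i))
  Next

  mNextIndex = last + 1
End Sub
```

It could be called once after loading, for example ShowNextPage(1000), and again from a “Load more” button or whenever the user reaches the last row.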

Yes, it is something I’ve started investigating, but it is not very straightforward to implement (well, at least for me). I have already moved to arrays, but I find it tricky to create the paging system. OK, I will try to spend some time on it, hoping to get it sorted. Of course, any suggestion would be much appreciated! :slight_smile:

[quote=165134:@Davide Pagano]I agree too and in fact, by default, I don’t show all data (the user can choose how many lines to see by default). However, I want to give the option to see them all if the user wants to. At the end of the day it is software for statistical data analysis, and in some scenarios it could even make sense to scroll through millions of rows…

The tricky part is to keep everything smooth… and that is something I am not succeeding at at the moment… :([/quote]

I also do analytic data, 100%, trust me! It’s all I do.

Solution: UnusefulText.txt
It will open in TeachText / Notepad. It will always work. Just be aware of the CR, depending on the OS, that is. But other than that, it will save you from the problem with the slow (and in general useless) ListBox! :slight_smile:


The name I’m looking for, it’s not AI but something similar… If I have a shower, it will come to me! Or take a walk! :slight_smile:

BI = Business Intelligence.

It’s my profession for the past 15 years.
But now, I rarely speak about it… I think it must change! :slight_smile:

Excuse me for slow memory!

You could take a look at Kem Tekinay’s Data-On-Demand ListBox.

There is a sample data file with 5 million records.

[quote=165178:@Paul Sondervan]You could take a look at Kem Tekinay’s Data-On-Demand ListBox.

There is a sample data file with 5 million records.[/quote]

I took a look at that page, but the example crashes here on the latest Xojo version.

You can contact Kem, he’s here often.
@Kem Tekinay