Preemptive Threads in Webapp

Hello guys,

I have a scenario where I need a web app to process a lot of files. The idea is to do all of that in the background, using as much of the available resources as possible (RAM and cores), and to display the results in the web app.

These are my current restrictions:

  1. OS: Debian 12.x
  2. Database storage: SQLite, encrypted
  3. App type: web app
  4. Multi-user management

My current test load is around 500 GB of files, and among those I need a PDF file parsed and its data extracted and put into the DB.

I managed to do the PDF parsing in Python. It returns JSON with the needed data, so I can handle it easily with a shell call. The rest is quite slow, though, and I use only a limited amount of the server resources. What would be the best way to handle that?
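As a rough illustration of using more of the cores for the parsing step: since each PDF is parsed by an external process, a small pool of threads can keep several parser processes running at once. This is only a sketch under assumptions. `parser_cmd` stands in for whatever command line starts the real parser, and the parser is assumed to take the PDF path as its last argument and print one JSON document to standard output.

```python
import json
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parse_one(parser_cmd, pdf_path):
    """Run the external parser on one PDF and decode the JSON it prints."""
    # parser_cmd is hypothetical, e.g. ["python3", "parse_pdf.py"].
    done = subprocess.run(
        [*parser_cmd, pdf_path], capture_output=True, text=True, check=True
    )
    return pdf_path, json.loads(done.stdout)


def parse_many(parser_cmd, pdf_paths, max_workers=None):
    """Parse several PDFs concurrently, at most one parser process per core."""
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each thread only waits on its child process, so the real work
        # runs in parallel in the parser processes themselves.
        return dict(pool.map(lambda p: parse_one(parser_cmd, p), pdf_paths))
```

The threads here do almost no work themselves, so the limit on `max_workers` is really a limit on how many parser processes run at the same time.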

My idea was to run multiple threads, each doing specific tasks, but then I saw a YouTube podcast where they said it would be problematic to have multiple threads iterating over files and folders at the same time.

I need to scan all the folders and files, identify the files and file types, extract the needed data, process the file names (handle weird characters and normalise them), then parse the needed PDF files for each folder. Once that is complete, I process the data in SQLite, filter it and prepare the final result. All of that should be driven from the web app interface.
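For the scanning and name-normalising step, a minimal Python sketch could look like the following. The normalisation rules (drop accents, keep only ASCII letters, digits, dot, dash and underscore, lowercase the extension) are assumptions for illustration, not requirements stated in this thread, and `normalise_name` and `scan` are hypothetical names.

```python
import os
import re
import unicodedata


def normalise_name(name: str) -> str:
    """Return an ASCII-only, filesystem-friendly version of a file name."""
    stem, ext = os.path.splitext(name)
    # Decompose accented characters, then drop whatever is not ASCII.
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single underscore.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("_")
    return (stem or "unnamed") + ext.lower()


def scan(root: str):
    """Yield (directory, original name, normalised name, extension) per file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            extension = os.path.splitext(filename)[1].lower()
            yield dirpath, filename, normalise_name(filename), extension
```

A single pass like this only reads the directory tree. The results can then be handed out to workers, so no two workers iterate over the same folders at the same time.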

The idea here was to use the web app as a queue system: allocate tasks, let threads pick them up, process them and report back to the interface. I suspect the multiple write calls on SQLite would make this even slower. I cannot use an in-memory DB because of the preemptive part, and I would always need to communicate between the interface and the threads.
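To make the queue idea concrete, here is a minimal sketch of a task table in SQLite where each worker claims one pending task inside a short write transaction. It uses Python's standard `sqlite3` module on a plain, unencrypted database file, which is an assumption, since the encrypted setup described above would need a different driver. The table and column names are hypothetical.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None means autocommit, so transactions are explicit below.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    # WAL lets readers (the interface) keep working while a worker writes.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               folder TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               worker TEXT)"""
    )
    return conn


def add_task(conn, folder):
    return conn.execute("INSERT INTO tasks (folder) VALUES (?)", (folder,)).lastrowid


def claim_task(conn, worker):
    """Mark the oldest pending task as running and return (id, folder), or None."""
    # BEGIN IMMEDIATE takes the write lock first, so two workers cannot
    # select the same pending row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, folder FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE tasks SET status = 'running', worker = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def finish_task(conn, task_id):
    conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
```

Each claim is one short write, so contention stays low as long as the heavy work happens outside the transaction. The interface can show progress by querying the `status` column whenever a user asks for it.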

I would also like to keep the same approach for adding multiple tasks from multiple users, with the processing threads picking them up as soon as they finish their current ones. That way I prepare my daily tasks, the app works through them, and the interface is updated when needed. Or rather, the DB is updated, and the interface refreshes once a user requests it.

Any ideas? Thanks.

My opinion, based on a quick read of your requirements:

  1. Create a helper app (desktop). Do all the heavy processing in the desktop app.
    Relieve the web app of those disk-intensive operations. You can use threads / workers in the desktop app.

  2. If you want quick status updates on the tasks in (1) from the helper app (desktop app), you can query them periodically with a WebTimer (in the web app) through a URLConnection (to the desktop app), or

  3. Register the URL of the web app as a hook (webhook) in your desktop app to get real-time status updates and display them in the web app.

To me, (1) is very important. Handling too many disk operations will make the web app less responsive. The status of the process can be displayed in the web app, but the processing itself should be done in a (helper) desktop app.

Thanks for the advice, but so far a desktop app is out of the question. I am also thinking that maybe I will build some helpers that just update a DB, and the user will see the status based on a DB query, and that's it. This is an internal app for some migrations, so I don’t worry too much about load or delay.

Preemptive threads are much easier to integrate than Workers.
You can read System.CoreCount to limit the number of threads,
and use a timer to start threads again once they have finished their Run method (check for Thread.ThreadStates.NotRunning).

Well, I did not find anything saying that they work in web apps, so I have no idea whether that would apply here.

I read the documentation and it has an important note that says:

Important

Preemptive threads are not currently supported on Android.

so I guess they are supported on Desktop, Console/Web and maybe iOS too.

You have to test if this works. Nobody can tell you “yes, that works” or “no, that doesn’t work”. Your feature depends on many things.

Do you have to use an SQLite database? Is there a single database for every user? How many files do you have?

A queue is the way to go.

I have a similar task in my desktop app. I previously tried reading a lot of PDF files and caching the information in a database. Writing data into the database just wasn’t fast enough to make a difference. I’m looking forward to testing this with preemptive threads.
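On the write-speed point: one common reason SQLite inserts are slow is committing after every row. A sketch of batching many rows into a single transaction is below, using Python's standard `sqlite3` module. The table `extracted` and its columns are hypothetical, and whether this helps in a given app depends on where the time actually goes.

```python
import sqlite3


def store_batch(conn, rows):
    """Insert many (path, json_text) rows inside a single transaction."""
    # The connection as a context manager commits once at the end,
    # or rolls everything back if any insert fails.
    with conn:
        conn.executemany(
            "INSERT INTO extracted (path, payload) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extracted (path TEXT PRIMARY KEY, payload TEXT)")
store_batch(conn, [("a/one.pdf", '{"pages": 3}'), ("b/two.pdf", '{"pages": 7}')])
```

With several workers, one option is to let each worker collect its results and hand them to a single writer that calls something like `store_batch`, so only one connection writes at a time.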

True, I just saw a test web app that had the thread type in it.

It depends on the users: some have a small amount, some half a TB, some 1 TB. And yes, each migration will have its own DB.

(1) CAN do and (2) SHOULD do are two different management points of view you need to consider for your project, Aurelian. :sweat_smile:

Anyway, I hope you will find the best solution for your requirements.

Apologies if this is already covered elsewhere, but are there plans for a preemptive thread per session, and also per connection for handleUrl?