Application and memory management

Greetings :slight_smile:

I wrote a simple application whose job is to do the following:

  • query the database for the top 5,000 records of files that have not been parsed.
  • parse the file information
  • store back to the database with results.
  • re-run the query looking for files that have not been parsed and repeat the process.

So far, it works very well for what I need it to do; on average it’s able to process 3 files a second. Currently, I am running the tool on a new project and I am finding some performance issues. When I look at Process Explorer, the application usually starts out with about 12 MB of memory usage and then continues to grow. After running the application for a number of hours, I see that it has processed 25,000 files and is currently consuming 156 MB of memory.

Below is my main loop for processing the list in a thread:

[code] dim db as new dbClass
dim rs as RecordSet, dblKey, directoryPath, filePath as String

do
rs = db.dbQuery("exec P_PARSE_LIST " + chr(39) + cfgHost + chr(39))

if rs = nil then
  app.errMsg = "Unable to query list."
  app.errFlg = 1
  exit
  
end if

if rs.EOF = true then
  app.errMsg = "Completed."
  app.errFlg = 1
  exit
  
end if

do ' loop through list
  
  dblKey = rs.Field("dbl_key").StringValue
  directoryPath = rs.Field("directory_path").StringValue
  filePath = rs.Field("file_path").StringValue
  
  if GetFolderItem(directoryPath) = nil or GetFolderItem(filePath) = nil then
    db.dbExec("exec P_PARSE_FAILED " + dblKey)
    
  else
    infoUpdate(dblKey, GetFolderItem(directoryPath).Child(filePath))
    
  end if
  
  app.processedObjects = app.processedObjects + 1
  sleep(250)
  
  rs.MoveNext
  
loop until rs.EOF

loop[/code]

What is a good way to help release memory, either between queries of 5,000 records or between the processing of individual records?

Thank you in advance.

156 MB doesn’t seem like a lot. Does it ever stabilize?

Hey Kem,

156 MB doesn’t sound like a lot, but I spawn several instances of the application to help with processing the files. Currently, I have about 4 million files to parse.

In past projects, I was able to spawn 12 applications, which could process 1.5 million files in one day. The only difference here is that the files were local and not on a network share. To go more into it, the files I am working with can be 5 MB to 100 MB in size. The good news is that I only need the first 5,000 bytes, which I store in a MemoryBlock to work with.

With regard to best practice for reusing variables: should I assign them to nil before re-assigning the value, or just assign a new value, or does it not matter?

If I increase the sleep timer, will that help? Or what are better ways to clear out memory when the app is done with a process?

Thank you, Kem, for your thoughts on this.

I’ve never had to worry about memory usage, but I would think calling RecordSet.Close and then setting it to nil before reusing it could/might explicitly clear it. Only some testing would tell for sure.

I tend to put the dim rs as recordset inside the loop. Personal preference. But to be honest it’s just a guess.
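
Something along these lines is what I mean (untested sketch):

[code]do
  dim rs as RecordSet ' declared fresh on each pass instead of reused

  rs = db.dbQuery("exec P_PARSE_LIST " + chr(39) + cfgHost + chr(39))
  if rs = nil then exit

  if rs.EOF then ' nothing left to parse
    rs.Close
    exit
  end if

  do until rs.EOF
    ' ... process the current row as before ...
    rs.MoveNext
  loop

  rs.Close ' release the cursor and the rows it is holding
  rs = nil ' drop the last reference so the object can be freed immediately
loop[/code]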

Assigning nil or a new value directly shouldn’t matter. In fact, you shouldn’t have to think about memory management at all other than circular references. Any chance you have those?
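
For example (names entirely made up), this is the pattern that would keep objects alive forever: two objects that each hold a strong reference to the other.

[code]' hypothetical classes: JobClass and WorkerClass are made up for illustration
dim job as new JobClass
job.worker = new WorkerClass
job.worker.owner = job ' back-reference: the two objects now keep each other alive

' break the cycle by nil-ing one side when you are done (or store the
' back-reference in a WeakRef) so both objects can actually be destroyed
job.worker.owner = nil
job = nil[/code]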

Are you loading all the files into memory at once? I wasn’t clear from your latest description.

Also, does it stabilize at some point?

If you really want to get to the heart of it, start commenting or bypassing code starting at the inner processing and working out. When the memory stops being used at that clip, you can focus on the culprit.

Also interested to hear what “parsing” involves.

Nothing in the code you posted seems like an obvious candidate to hold onto 144 MB of memory. If something is holding onto objects, it must be happening in whatever the infoUpdate method does.

Kem, Bob, & Walter.

Thank you for your responses :slight_smile:

The files I am working with are medical images stored as DICOM. DICOM (for those not familiar) is a standard format that can be used to store images, encapsulated reports, structured reports, etc.

A medical-image DICOM object has two parts:

  1. DICOM Header - This is the metadata about the image, in which you can store just about anything. The most common information stored here is Patient-, Study-, Series-, and SOP-level information. There is also information pertaining to how a DICOM viewer should display the image.

  2. The pixel data.

The cool part about DICOM files is that they are built around OOP :slight_smile:

Since DICOM images come in all shapes and sizes, the image size can be anywhere from a few megabytes to a few gigabytes, depending on what type of Study is being performed. With that said, the only part I care about is the DICOM header, which starts at byte offset 128 in the file; I grab the first 64,132 bytes of the file to parse against.
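
In other words, the on-disk layout is a 128-byte preamble, then the literal bytes "DICM", then the header elements. Roughly, the check looks like this (untested sketch, not my actual parser code):

[code]' sketch: read just enough of the file to verify the DICM marker after the preamble
dim bs as BinaryStream
dim header as MemoryBlock

bs = BinaryStream.Open(dicomFile, false)
header = bs.Read(132) ' 128-byte preamble plus the 4 "DICM" bytes
bs.Close

if header.StringValue(128, 4) = "DICM" then
  ' it is a DICOM file; go back and grab the header block for parsing
end if[/code]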

My infoUpdate method has two parts: 1) call my DICOM parser module to parse the file header, and 2) update the database with the results of the parse.

[code] dim db as new dbClass
dim dicom as new dicomModule.dicomClass
dim file_size, patient_mrn, patient_name, patient_dob, patient_gender, study_accession_number, study_date, study_time, study_description, study_instance_uid, series_date, series_time, series_modality, series_station, series_instance_uid, sop_instance_number, sop_instance_uid, derivation_description, sop_class_uid, sop_transfer_syntax_uid, equip_manufacture, equip_model, equip_institution, phys_referring, phys_performing, phys_reading, xmlData as String

xmlData = dicom.xml(activeFile)
patient_mrn = chr(39) + "none" + chr(39) + chr(44)
patient_name = chr(39) + "none" + chr(39) + chr(44)
patient_dob = chr(39) + "none" + chr(39) + chr(44)
patient_gender = chr(39) + "none" + chr(39) + chr(44)
study_accession_number = chr(39) + "none" + chr(39) + chr(44)
study_date = chr(39) + "none" + chr(39) + chr(44)
study_time = chr(39) + "none" + chr(39) + chr(44)
study_description = chr(39) + "none" + chr(39) + chr(44)
study_instance_uid = chr(39) + "none" + chr(39) + chr(44)
series_date = chr(39) + "none" + chr(39) + chr(44)
series_time = chr(39) + "none" + chr(39) + chr(44)
series_modality = chr(39) + "none" + chr(39) + chr(44)
series_station = chr(39) + "none" + chr(39) + chr(44)
series_instance_uid = chr(39) + "none" + chr(39) + chr(44)
sop_instance_number = chr(39) + "none" + chr(39) + chr(44)
sop_instance_uid = chr(39) + "none" + chr(39) + chr(44)
derivation_description = chr(39) + "none" + chr(39) + chr(44)
sop_class_uid = chr(39) + "none" + chr(39) + chr(44)
sop_transfer_syntax_uid = chr(39) + "none" + chr(39) + chr(44)
equip_manufacture = chr(39) + "none" + chr(39) + chr(44)
equip_model = chr(39) + "none" + chr(39) + chr(44)
equip_institution = chr(39) + "none" + chr(39) + chr(44)
phys_referring = chr(39) + "none" + chr(39) + chr(44)
phys_performing = chr(39) + "none" + chr(39) + chr(44)
phys_reading = chr(39) + "none" + chr(39)

if activeFile = nil then
db.dbExec("exec P_PARSE_FAILED " + dbl_key)
return

end if

if activeFile.Exists = false then
db.dbExec("exec P_PARSE_FAILED " + dbl_key)
return

end if

if activeFile.Exists = true and activeFile.IsReadable = true and activeFile.Locked = false and activeFile.Directory = false then

file_size = chr(44) + str(activeFile.Length) + chr(44)

if dicom.checkDicom(activeFile) = true then
  patient_mrn = chr(39) +dicom.xmlTag("patientID") + chr(39) + chr(44)
  patient_name = chr(39) +dicom.xmlTag("patientName") + chr(39) + chr(44)
  patient_dob = chr(39) +dicom.xmlTag("PatientBirthDate") + chr(39) + chr(44)
  patient_gender = chr(39) +dicom.xmlTag("PatientSex") + chr(39) + chr(44)
  study_accession_number = chr(39) + dicom.xmlTag("AccessionNumber") + chr(39) + chr(44)
  study_date = chr(39) +dicom.xmlTag("StudyDate")+ chr(39) + chr(44)
  study_time = chr(39) +dicom.xmlTag("StudyTime")+ chr(39) + chr(44)
  study_description = chr(39) +dicom.xmlTag("StudyDescription")+ chr(39) + chr(44)
  study_instance_uid = chr(39) +dicom.xmlTag("StudyInstanceUID")+ chr(39) + chr(44)
  series_date = chr(39) +dicom.xmlTag("SeriesDate")+ chr(39) + chr(44)
  series_time = chr(39) +dicom.xmlTag("SeriesTime")+ chr(39) + chr(44)
  series_modality = chr(39) +dicom.xmlTag("Modality") + chr(39) + chr(44)
  series_station = chr(39) +dicom.xmlTag("StationName")+ chr(39) + chr(44)
  series_instance_uid = chr(39) +dicom.xmlTag("SeriesInstanceUID")+ chr(39) + chr(44)
  sop_instance_number = chr(39) +dicom.xmlTag("InstanceNumber")+ chr(39) + chr(44)
  sop_class_uid =  chr(39) +dicom.xmlTag("SOPClassUID")+ chr(39) + chr(44)
  sop_instance_uid = chr(39) +dicom.xmlTag("SOPInstanceUID")+ chr(39) + chr(44)
  derivation_description = chr(39) +dicom.xmlTag("DerivationDescription")+ chr(39) + chr(44)
  sop_transfer_syntax_uid = chr(39) +dicom.xmlTag("TransferSyntaxUID") + chr(39) + chr(44)
  equip_manufacture = chr(39) +dicom.xmlTag("Manufacturer")+ chr(39) + chr(44)
  equip_model = chr(39) +dicom.xmlTag("ManufacturersModelName") + chr(39) + chr(44)
  equip_institution = chr(39) +dicom.xmlTag("InstitutionName")+ chr(39) + chr(44)
  phys_referring = chr(39) +dicom.xmlTag("PerformingPhysicianName") + chr(39) + chr(44)
  phys_performing = chr(39) +dicom.xmlTag("ReferringPhysicianName") + chr(39) + chr(44)
  phys_reading = chr(39) +dicom.xmlTag("NameOfPhysicianReadingStudy") + chr(39)
  
end if

else
file_size = "0"

end if

db.dbExec ("EXEC P_PARSE_UPDATE " _

  • dbl_key +file_size + patient_mrn + patient_name + patient_dob + patient_gender _
  • study_accession_number + study_date + study_time + study_description + study_instance_uid _
  • series_date + series_time + series_modality + series_station + series_instance_uid _
  • sop_instance_number + sop_class_uid + sop_instance_uid + derivation_description + sop_transfer_syntax_uid _
  • equip_manufacture + equip_model + equip_institution + phys_referring + phys_performing + phys_reading)

exception
db.dbExec("exec P_PARSE_FAILED " + dbl_key)[/code]

The problem I run into with troubleshooting this is that, where the files are located, I am not able to install Xojo to use the profiler, and the files cannot be copied for any reason.

In writing this application, my intent was for the tool to keep in memory only the active file it was working on, just long enough to parse it and update the DB, and then release it, since the number of files I need to parse can be a lot.
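
So, at the end of infoUpdate, I could explicitly drop the per-file references, something like this (sketch only; as Kem said, letting the locals go out of scope should amount to the same thing):

[code]' sketch: explicitly drop the per-file references once the DB update is done
xmlData = "" ' release the parsed XML string
dicom = nil ' drop this file's parser instance (and the MemoryBlock it holds)
db = nil ' drop the per-call dbClass instance as well[/code]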

[quote=96614:@Kem Tekinay]Assigning nil or a new value directly shouldn’t matter. In fact, you shouldn’t have to think about memory management at all other than circular references. Any chance you have those?

Are you loading all the files into memory at once? I wasn’t clear from your latest description.

Also, does it stabilize at some point?

If you really want to get to the heart of it, start commenting or bypassing code starting at the inner processing and working out. When the memory stops being used at that clip, you can focus on the culprit.

[/quote]

Kem,

Thank you for your input. The whole file is not loaded in, just the first 64,132 bytes. Here is how I do it:

[code]
dim bs as BinaryStream

if dicomFile.Locked = true or dicomFile.IsReadable = false then return false ' check to see if file is readable

bs = BinaryStream.Open(dicomFile, false) ' Read file to memory

if bs.Length > 64132 then
dicomMB = bs.read(64132)

else
dicomMB = bs.read(bs.Length)

end if

bs.Close
dicomMB = dicomMB.MidB(128, dicomMB.Size - 128)

dicomMB.LittleEndian = true
read("TransferSyntaxUID")
ts = vf
vf = ""

if dicomMB.StringValue(0,4) = "DICM" then ' Check for DICM in Header
return true

else
return false

end if

exception ' Error Handling
return false[/code]

As far as whether the memory stabilizes: I have seen it stabilize on other projects, but here it does not.

You make a great point about working from the innermost code outward. I will start to look at this code more to see if I can find anything.

Correction… re-reading my code, it looks like I open the full file as a BinaryStream but only store 64 KB to a MemoryBlock to work with before I close the stream.

I have made a number of changes to my DICOM parser class, and the memory growth in this app is now about 1 MB per 350 files; however, when I run the same application on a different project I get 1 MB per 800 files parsed. I am now thinking that this has to do with the environment I am running the application in.

There is another piece of this puzzle, which has to do with the application creating an in-memory SQLite database to store the various DICOM tag information so the parser can perform lookups against it. I am curious whether this also plays a role in the memory growth, and I am still looking at it.
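
One thing I plan to try is building that lookup database once and handing it to each parser instance, instead of every dicomClass creating its own copy. Roughly like this (sketch, untested; the table layout here is made up):

[code]' sketch: build the in-memory tag dictionary once at startup and share it
dim tagDB as new SQLiteDatabase ' no DatabaseFile assigned = in-memory database on Connect

if tagDB.Connect then
  tagDB.SQLExecute("CREATE TABLE dicom_tags (tag_group TEXT, tag_element TEXT, tag_name TEXT)")
  ' ... insert the tag dictionary rows once here ...
end if

' each dicomClass instance then just keeps a reference to tagDB for its lookups[/code]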


Since the number of files I need to parse can be great, I have decided to break the application into two applications:

  1. Application 1 will be the GUI front end, which will report the parsing stats. This application will be responsible for spawning the parsing application and monitoring the overall picture of the task.

  2. Application 2 will be the parsing application, which will run a query against the database looking for work to do. Once the worker completes its assigned task, it will shut down until Application 1 respawns it.

Since Application 2 will only query the database once, I will be able to control how many records it works on before it shuts down, thus controlling the memory consumption per parsing application.
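
The worker's main routine then boils down to something like this (sketch; same stored procedures as above):

[code]' sketch of Application 2: process exactly one batch of records, then quit
dim db as new dbClass
dim rs as RecordSet

rs = db.dbQuery("exec P_PARSE_LIST " + chr(39) + cfgHost + chr(39))

if rs <> nil then
  do until rs.EOF
    ' ... parse the file and run P_PARSE_UPDATE / P_PARSE_FAILED as before ...
    rs.MoveNext
  loop

  rs.Close
end if

quit ' all memory goes back to the OS; Application 1 decides whether to respawn[/code]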

Thank you again for everyone’s input. Sometimes you make an application that works great, only to find it not working so great on a new project, which forces you to rip everything apart. I guess the good news is that the app will be enhanced for the next project :slight_smile: