ActiveRecord: how do I iterate over all records?

Hi all,

I’m using ActiveRecord to map my classes to a SQLIte database and need to update a property in all of the records.

What is the proper way to iterate over all the records?

TiA

Markus

P.S. Note that as records could have been deleted and added the number of records is not necessarily the same as the highest ID

You can do that. Or you can write an “update” sql query. See http://www.w3schools.com/sql/sql_update.asp .

Thanks Beatrix. I had an UPDATE statement to deal with the database directly but for some reason this code takes SEVEN seconds for each record:

  If DigestStorage.Connect() then
    
    dim rs as RecordSet
    
    dim OccurencesCount as integer 
    
    dim counter as int64 = 0
    dim total as Int64 = NumberOfOccurencesDict.Count
    
    // **** for each sequence in the NumberOfOccurencesDict
    
    for each key as Variant in NumberOfOccurencesDict.Keys
      
      counter = counter + 1
      
      // how often is the sequence in the database?
      rs = DigestStorage.SqlSelect( "SELECT Peptide_ID FROM tblPeptide WHERE Sequence ='" + key.StringValue + "'")
      If DigestStorage.Error Then
        MsgBox("DB Error: " + DigestSettings.Db.ErrorMessage)
      End If
      OccurencesCount = rs.RecordCount // <- this often
      
      //  Update those records
      DigestStorage.SQLExecute ("UPDATE tblPeptide SET OccurencesInDigest = " + Str( OccurencesCount ) + " WHERE Sequence='" + key.StringValue + "'")
      If DigestStorage.Error Then
        MsgBox("DB Error: " + DigestSettings.Db.ErrorMessage)
      End If
      'DigestStorage.Commit
      
      Feedbackinfo = "Updating occurences for each peptide: " + Str( counter ) + " of " + Str( total )
      lbl_InfoText.text = FeedbackInfo
      lbl_InfoText.Refresh
      
    next
    
    
    If DigestStorage.error then
      MsgBox DigestStorage.errormessage + EndOfLine + CurrentMethodName
      DigestStorage.Rollback
      return False
    else
      DigestStorage.Commit
    End if
    
    
    
  else
    
    Beep
    MsgBox "The database couldn't be opened: " + CurrentMethodName
    return False
    
  end if
  

With over 3 Million records it would take 6200 hours, so something is going wrong somewhere.

I should also probably look at using TRANSACTION …

P.S. And that is 7 seconds with a 2.5 GB in memory database on a 2.8 GHz i7.

That’s a bit long. I thought you wanted to use ONE update query for ALL records.

I’m not sure what you want to do with your queries. You get the number of things of one query and put the the result into another table, correct?

I don’t think it’s surprising that your queries take so long because you go over the whole database every time. You should be able to put your inner loop into one single query. And the outer loop most likely, too.

I digest proteins into peptides, each peptide is saved into the database.

However some properties can only be set after all proteins have been processed, like how often a peptide occurs in the dgest.

Therefore I have a dictionary that has the peptide sequence as key and the number of occurences as value, which gets set accordingly each time a new peptide is created and saved.

So at the end I need to go over all records and set the NumberOfOccurences for each peptide.

Note that the seven seconds is for each key, not the whole operation!

I got that with the 7 seconds for each record :slight_smile:

Xojo code is about dealing with one line of code after the other. SQL is about dealing with as much as data as it can. And with a good SQL you only need one or two lines and it should deal with all records at once.

I hate writing SQL by hand. Doing this visually is so much easier.

Hi Markus,

two things spring to mind.

put an index on sequence? i’m guessing it needs to table scan for the record to update. rather than go directly to the record?

secondly, with a 2.5gB database, are you sure its all actually in RAM and not paged out to disk?
that would definitely account for ‘odd’ slowdowns.

Well, one thing that ActiveRecord is not known for is speed. It does a fair amount of stuff behind the scenes so it kind of a resource hog. Another issue that Markus has pointed out to me in a private email is that it doesn’t deal well with NULL values.

It might be better to create your own data classes to do much of what AR does. That way you have more control over it.

Why not let the database engine do the counting? The way you are doing it you are returning a potentially large amount of data for the runtime to deal with. I have made some changes to your code above that should improve performance for counting.

@Bob: because I continue to use the same RecordSet for the update. Your way I have to run an additional SELECT query to get the same records that you just counted which would be slower.

I actually started out without AR, then rewrote the app with AR, then took it out again when I got stuck, then put it back in when I found a solution, now might have to take it out again …

My Mac is shaking his monitor in disbelief at what I’m doing …

@Bob: Oops. That used to be the case in the old version (there was a loop which is missing in this version) so you are absolutely right. My apologies.

[quote=131944:@Markus Winter]I actually started out without AR, then rewrote the app with AR, then took it out again when I got stuck, then put it back in when I found a solution, now might have to take it out again …

My Mac is shaking his monitor in disbelief at what I’m doing …[/quote]

Sorry. :frowning:

Not to give the wrong impression: I think ActiveRecord is great. I just wish in my quest for doing it “right” I would just stick with one design desicion :wink:

Unless I am totally misunderstanding your code (very possible) it looks like you don’t touch the recordset at all after getting the count. Your UPDATE is running against the database directly using the database engine.

It would be worth trying at least.

Also with a database that big have you considered running an ANALYZE to get the table optimized? Even in memory it can improve things significantly.

@Bob: see second posting. I was still thinking of previous code which had a loop.

Good idea with the ANALYZE

dang … .sqlite doesn’t support a nice option on the update stmt would let you do this in one shot

but 2 sql statements maybe

this gets the count for every key in your dictionary

this gets the count for EVERY sequence (yes you could do this for each key as well by adding the same where clause)

create temporary table summary as select sequence, count(*) as occurrences from tblPeptide group by sequence ;

this gets the count for the sequence & updates every row in the db that has this sequence

[code] OccurencesCount = rs.RecordCount // ← this often

    DigestStorage.SQLExecute ("UPDATE tblPeptide SET OccurencesInDigest = " + Str( OccurencesCount ) + " WHERE Sequence='" + key.StringValue + "'")

[/code]

This updates EVERY row (and yes you could do only certain keys)

update tblPeptide set occurencesInDigest = ( select occurrences from summary where tblpeptide.sequence = summary.sequence ) ;

Put it all together and you should be able to do

  If DigestStorage.Connect() then
         
    DigestStorage.SQLExecute("drop table if exists summary")

    DigestStorage.SQLExecute("create temporary table summary ( sequence, occurrences) ")

    for each key as Variant in NumberOfOccurencesDict.Keys

      // how often is the sequence in the database?
      DigestStorage.SQLExecute("insert into summary ( sequence, occurrences) select sequence, count(peptide_id) from tblPeptide WHERE Sequence ='" + key.StringValue + "'")
      If DigestStorage.Error Then
        MsgBox("DB Error: " + DigestSettings.Db.ErrorMessage)
      End If
          
    next
    // now one shot to update everything
    DigestStorage.SQLExecute("update tblPeptide set occurencesInDigest = ( select occurrences from summary where tblpeptide.sequence = summary.sequence )" )
    
    If DigestStorage.error then
      MsgBox DigestStorage.errormessage + EndOfLine + CurrentMethodName
      DigestStorage.Rollback

      return False
    else
      DigestStorage.Commit
    End if

I doubt you HAVE to use only AR or only straight SQL and could miss mash the two together BUT you might have to worry about having any cached results in objects that need to be refetched from the DB

For straight SQL, this sounds like a job for a database cursor where you can iterate through each row and perform various task for that row.

Currently, I am doing this in MS SQL on one of my scripts that updates a table and assigns a iteration count for each row. Here is an example of cursors being used.

Though I understand you are using sqlite, might be a good idea to check to see if its feasible.