Thanks Beatrix. I had an UPDATE statement to deal with the database directly but for some reason this code takes SEVEN seconds for each record:
If DigestStorage.Connect() then
dim rs as RecordSet
dim OccurencesCount as integer
dim counter as int64 = 0
dim total as Int64 = NumberOfOccurencesDict.Count
// **** for each sequence in the NumberOfOccurencesDict
for each key as Variant in NumberOfOccurencesDict.Keys
counter = counter + 1
// how often is the sequence in the database?
rs = DigestStorage.SqlSelect( "SELECT Peptide_ID FROM tblPeptide WHERE Sequence ='" + key.StringValue + "'")
If DigestStorage.Error Then
MsgBox("DB Error: " + DigestSettings.Db.ErrorMessage)
End If
OccurencesCount = rs.RecordCount // <- this often
// Update those records
DigestStorage.SQLExecute ("UPDATE tblPeptide SET OccurencesInDigest = " + Str( OccurencesCount ) + " WHERE Sequence='" + key.StringValue + "'")
If DigestStorage.Error Then
MsgBox("DB Error: " + DigestSettings.Db.ErrorMessage)
End If
'DigestStorage.Commit
Feedbackinfo = "Updating occurences for each peptide: " + Str( counter ) + " of " + Str( total )
lbl_InfoText.text = FeedbackInfo
lbl_InfoText.Refresh
next
If DigestStorage.error then
MsgBox DigestStorage.errormessage + EndOfLine + CurrentMethodName
DigestStorage.Rollback
return False
else
DigestStorage.Commit
End if
else
Beep
MsgBox "The database couldn't be opened: " + CurrentMethodName
return False
end if
With over 3 Million records it would take 6200 hours, so something is going wrong somewhere.
That’s a bit long. I thought you wanted to use ONE update query for ALL records.
I’m not sure what you want to do with your queries. You get the number of things of one query and put the the result into another table, correct?
I don’t think it’s surprising that your queries take so long because you go over the whole database every time. You should be able to put your inner loop into one single query. And the outer loop most likely, too.
I digest proteins into peptides, each peptide is saved into the database.
However some properties can only be set after all proteins have been processed, like how often a peptide occurs in the dgest.
Therefore I have a dictionary that has the peptide sequence as key and the number of occurences as value, which gets set accordingly each time a new peptide is created and saved.
So at the end I need to go over all records and set the NumberOfOccurences for each peptide.
Note that the seven seconds is for each key, not the whole operation!
Xojo code is about dealing with one line of code after the other. SQL is about dealing with as much as data as it can. And with a good SQL you only need one or two lines and it should deal with all records at once.
I hate writing SQL by hand. Doing this visually is so much easier.
Well, one thing that ActiveRecord is not known for is speed. It does a fair amount of stuff behind the scenes so it kind of a resource hog. Another issue that Markus has pointed out to me in a private email is that it doesn’t deal well with NULL values.
It might be better to create your own data classes to do much of what AR does. That way you have more control over it.
Why not let the database engine do the counting? The way you are doing it you are returning a potentially large amount of data for the runtime to deal with. I have made some changes to your code above that should improve performance for counting.
@Bob: because I continue to use the same RecordSet for the update. Your way I have to run an additional SELECT query to get the same records that you just counted which would be slower.
I actually started out without AR, then rewrote the app with AR, then took it out again when I got stuck, then put it back in when I found a solution, now might have to take it out again
My Mac is shaking his monitor in disbelief at what I’m doing
[quote=131944:@Markus Winter]I actually started out without AR, then rewrote the app with AR, then took it out again when I got stuck, then put it back in when I found a solution, now might have to take it out again
My Mac is shaking his monitor in disbelief at what I’m doing [/quote]
Not to give the wrong impression: I think ActiveRecord is great. I just wish in my quest for doing it “right” I would just stick with one design desicion
Unless I am totally misunderstanding your code (very possible) it looks like you don’t touch the recordset at all after getting the count. Your UPDATE is running against the database directly using the database engine.
It would be worth trying at least.
Also with a database that big have you considered running an ANALYZE to get the table optimized? Even in memory it can improve things significantly.
this gets the count for every key in your dictionary
this gets the count for EVERY sequence (yes you could do this for each key as well by adding the same where clause)
create temporary table summary as select sequence, count(*) as occurrences from tblPeptide group by sequence ;
this gets the count for the sequence & updates every row in the db that has this sequence
[code] OccurencesCount = rs.RecordCount // ← this often
DigestStorage.SQLExecute ("UPDATE tblPeptide SET OccurencesInDigest = " + Str( OccurencesCount ) + " WHERE Sequence='" + key.StringValue + "'")
[/code]
This updates EVERY row (and yes you could do only certain keys)
update tblPeptide set occurencesInDigest = ( select occurrences from summary where tblpeptide.sequence = summary.sequence ) ;
Put it all together and you should be able to do
If DigestStorage.Connect() then
DigestStorage.SQLExecute("drop table if exists summary")
DigestStorage.SQLExecute("create temporary table summary ( sequence, occurrences) ")
for each key as Variant in NumberOfOccurencesDict.Keys
// how often is the sequence in the database?
DigestStorage.SQLExecute("insert into summary ( sequence, occurrences) select sequence, count(peptide_id) from tblPeptide WHERE Sequence ='" + key.StringValue + "'")
If DigestStorage.Error Then
MsgBox("DB Error: " + DigestSettings.Db.ErrorMessage)
End If
next
// now one shot to update everything
DigestStorage.SQLExecute("update tblPeptide set occurencesInDigest = ( select occurrences from summary where tblpeptide.sequence = summary.sequence )" )
If DigestStorage.error then
MsgBox DigestStorage.errormessage + EndOfLine + CurrentMethodName
DigestStorage.Rollback
return False
else
DigestStorage.Commit
End if
I doubt you HAVE to use only AR or only straight SQL and could miss mash the two together BUT you might have to worry about having any cached results in objects that need to be refetched from the DB
For straight SQL, this sounds like a job for a database cursor where you can iterate through each row and perform various task for that row.
Currently, I am doing this in MS SQL on one of my scripts that updates a table and assigns a iteration count for each row. Here is an example of cursors being used.
Though I understand you are using sqlite, might be a good idea to check to see if its feasible.