I need to insure (as much as possible) that a given file is NOT an SQLite database (either plain or encrypted)
Basically I am looking for files that are known NOT to fit the SQLite format, and since an encrypted file looks like gibberish, I am finding those I know, and rejecting them, anything else I let SQLite tell me with an 11/26 error when attempting to connect
It looks at the first 16 bytes of each file, and looks for known file signatures.
right now it can identify PDF GIF HTM XML JPG PNG TIF
Does anyone have other signature data that I should include?
Public Function IdentifyFile(f as FolderItem) as boolean
Dim i As Integer
Dim bs As BinaryStream
Dim s As String
Dim hx As String
bs = BinaryStream.Open(f, False)
s=bs.read(16,Encodings.utf8)
bs.close
hx=""
//
// PDF GIF HTM XML JPG PNG TIF
//
If Left(s,15)="SQlite format 3" Then hx="SQL"
If Left(s,4) ="%PDF" Then hx="PDF"
If Left(s,6) ="GIF89a" Then hx="GIF"
If InStr(s,"HTML")>0 Then hx="HTM"
If InStr(s,"XML")>0 Then hx="XML"
//
If hx="" Then
For i=1 To Len(s)
hx=hx+Right("00"+Hex(AscB(Mid(s,i,1))),2)
Next i
End If
//
If Left(hx, 8)="89504E47" Then hx="PNG"
If Left(hx,18)="FFD8FFE000104A4649" Then hx="JPG"
If Left(hx, 8)="4D4D002A" Or Left(hx,8)="49492A00" Then hx="TIF"
//
// at this point if hx<>3 char long then
// "maybe" its an Encrypted SQL file... so assume it is
If Len(hx)>3 Then hx="SQL"
//
Select Case hx
Case "PDF"
hx="PDF document file."
Case "JPG","GIF","TIF","PNG"
If hx="TIF" Then hx="TIFF"
hx=hx+" graphics file."
Case "HTM"
hx="HTML web file."
Case "XML"
hx=hx+" file."
End Select
//
If hx<>"SQL" Then
Call newMsgBox("Invalid SQLite Database", _
"This file looks like it is "+hx+" and not an SQLite Database", _
customMsgBox.msgOkOnly)
End If
//
Return (hx="SQL")
End Function
Take first 1000 chars and count how many are letters or digits. For an encrypted file you have rate below 50%.
And if you only count 2 or 3 letter runs it is much lower.
So just by this probability check you would sort out PNG, JPEG, Tiff and others just buy the text they include.
Uh… not really… an Unencrypted SQLite file has maybe 30 or 40 “letters” in the first 1k
while an ENCRYPTED file actually has many more 0x20-0x7F characters in the first 1k
So that doesn’t help.
PNG, GIF etc. also have a load of non-ASCII values but they have a specific signature allowing me to reject them as a possible SQLite database. An un-encrypted one has a signature as well…
It is the files where the contents are “binary” like data, that “might” be an encrypted SQLite database, but if I had more signatures of known file formats, I could reject them more gracefully… Otherwise Sqlite keeps asking for a password, and leaves it up to the user to wonder “why won’t this open for me”
If the need is to be able to determine whether file is a SQLite database, then the SQLite database specification provides the definitive answer:
[quote]1.2.1. Magic Header String
Every valid SQLite database file begins with the following 16 bytes (in hex): 53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00. This byte sequence corresponds to the UTF-8 string “SQLite format 3” including the nul terminator character at the end.[/quote]
As soon as the file does not have the header above, it is not an SQLite database. This approach will not however tell what kind of file the “not SQLite” file actually is. Your approach provides much richer information, but it is going to be quite a lot of work to identify everything that is’nt an SQLite database.
If you read my post, and the attached code, you will see that I look for that header…
that is not the issue…
the issue is a file that contains “gibberish”… this file MIGHT be an Encrypted SQLite file (I know it won’t be an un-encrypted one)… but it MIGHT be something else… it is those “something else” I want to weed out, so I have a higher probablilty that the file IS an SQLIte file (either with the header, or not identified as something else)
and I’m not looking for “every possible” non-SQLITE format , just the ones that MOST people might have
That is exactly what I was asking for… if people had signatures that they knew of…
Yes I can research them myself… but I might not think of all of the reasonable ones, and if others already had the information it would save time
or just let a person select whatever file and when you try to connect (with or without a password) you’ll get an error which definitively tells you whether SQLite recognizes it
you have to open every file you’ve identified just to find out that it is / is not a sqlite file buy checking for a signature
hows that any different than just asking sqlite to try & open it as a db ?
ask sqlite to open it and it will tell you
its openable as an unencrypted DB
its openable but require a password (so now you can re-prompt for one)
To save the user extra work…
If it is in fact an unencrypted SQLite database… then no problem… it breezes right on by.
If It failes to open with either an error 11 or an error 26… then it MIGHT be an encrpted database (and it might not be)
The only way to tell is to inform the user that it MIGHT be, and “Hey do you know the password?”
If I know that the folderitem they chose CANNOT be an SQLite database, then I say Hey its a PNG (or whatever), try again
Is it the same number of dialogs? Yeah… But if the user accidently chose an invalid file and did not realize it (yeah it can happen), the get all p*ssed off when the password they KNOW to be correct isn’t working.
And since it is interogating one file, the overhead is a split fraction of a second
So… #1 would be easy… the file opens with no errors
but if it DOES (11 or 26) then you still don’t know
Or you’re going to assume that there will never be a valid encrypted db that has one of those handful of bytes as the first 3 or 4 ?
That might be a bold assumption unless you have docs to the contrary that say sqlite will not do such a thing
One of the tools I have and use wont open dbs that aren’t named with a specific extension (annoying as hell) since “they arent databases”. Uh huh. I drop into sqlite3 in terminal and voila it opens sans extension. And it opens if I drag & drop the db on to the app icon. Just not in its file > open dialogs etc. And it doesnt put these in the recently used list - ah yeah they’re not databases (despite it having it open)
Another has other similar “oh but I know better than you” safe guards and, you guessed it, drop into sqlite3 and there’s the db. Open & ready to work with
Theres obviously a balance between too safe and too restrictive
Personally I’d be ok with “Hey this might be an encrypted db whats the password ?” and if I supply it and then it says “well this wont open so its really not a db” I’m ok with it
If someone tries to open Tadpole.exe with tadpole they shouldn’t get upset if it don’t open and keeps asking for a password. If you do continue this route you could have the message say ‘This appears to be…’ but still give the user the choice to enter a password. No point in preventing the user from trying to open a non-SQLite file.