Should I use a separate table to store BLOB data?

I plan to use a SQLite database as an alternative to VirtualVolume, which has been deprecated since 2019r3.

When using a database to store files, I have the luxury of storing lots of metadata alongside the file.

So, my approach was fairly simple: create a table with the metadata fields and a BLOB field for storing the actual file data. A lot like the BLOB example that comes with Xojo, but extended with several metadata fields.
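Something like this, as a rough sketch (the column names are just illustrative, not taken from the Xojo example):

[code]
-- Single-table approach: metadata columns plus the file itself in one row
CREATE TABLE Files (
  FileId    INTEGER PRIMARY KEY,
  Name      TEXT NOT NULL,
  Extension TEXT,
  MIMEType  TEXT,
  Length    INTEGER,
  DateAdded TEXT,
  Comment   TEXT,
  Data      BLOB            -- the actual file contents
);
[/code]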

But I read somewhere that it is good practice to store large chunks of data in a separate table, so that SELECT statements run more efficiently. But isn’t that only the case when using large string columns?

You could also SELECT just the metadata columns from the table instead of SELECT *.

The thing is that I have about 11 extra columns in the table. That would make the SELECT statements not too convenient to work with. Of course, I can create a constant that holds the right column names.

With a separate table I could use SQL triggers to delete the corresponding BLOB row (see the sketch below).
But keeping everything in one table is easier to work with. I’m just not sure whether the SELECT statements would be too expensive.
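For the two-table variant, a rough sketch of what I have in mind (illustrative names; a foreign key with ON DELETE CASCADE could do the same job as the trigger):

[code]
-- Metadata and file contents split across two tables, related by FileId
CREATE TABLE FileMeta (
  FileId INTEGER PRIMARY KEY,
  Name   TEXT NOT NULL
);

CREATE TABLE FileData (
  FileId INTEGER PRIMARY KEY REFERENCES FileMeta(FileId),
  Data   BLOB
);

-- When a metadata row is deleted, remove the matching BLOB row
CREATE TRIGGER trg_DeleteFileData
AFTER DELETE ON FileMeta
BEGIN
  DELETE FROM FileData WHERE FileId = OLD.FileId;
END;
[/code]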

I’m using an MS SQL database with files in a single table; that works well.

[quote]FileId
Bin
Date
Comment
Tags
Extension
Length
Name
MIMEType
Thumbnail[/quote]
Because the file is binary data, I guess it is only read in chunks when you access it.

That is what I thought, since you need to use a Blob object in Xojo to actually work with it.
Thanks!

I’d put the blob data in one table and the metadata in another and relate the two.
That way the blob data (which can’t really be searched very effectively) won’t be read when doing queries etc.
This tends to be much more important in DBs like Oracle, Sybase, MS, etc., given how they work with rows and such.
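A sketch of that pattern (table and column names are just illustrative):

[code]
-- Metadata-only query: the BLOB table is never touched
SELECT FileId, Name FROM FileMeta WHERE Name LIKE '%report%';

-- Pull the file contents only for the one row the user actually opens
SELECT d.Data
FROM FileData d
JOIN FileMeta m ON m.FileId = d.FileId
WHERE m.FileId = 42;
[/code]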

But I am using it in a SQLite database. I don’t know what happens under the hood of the SQLite engine. But my thought was that it was going to be ignored in queries. Mainly since I can only access it by opening the blob field separately, like Markus pointed out.

I’d have to read the very low-level details of SQLite and how it handles things to be 100% sure, BUT in many DBs they read “pages” of data when performing a query, and if those “pages” also include the blob, then it has to do a LOT more I/O to run the query, as each page may have a few fields of metadata and then a LOT of blob data.

By putting the blobs in a different table you can avoid this kind of I/O hit.

Everything in one table CAN be a performance penalty at times
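For what it’s worth, SQLite exposes its page layout directly, so you can get a feel for this yourself (these PRAGMAs are standard SQLite):

[code]
-- Pages are the unit of I/O; rows with big BLOBs spill across many pages,
-- so a scan of that table touches far more pages
PRAGMA page_size;    -- bytes per page (typically 4096)
PRAGMA page_count;   -- total pages in the database file
[/code]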

Ignored insofar as the query won’t consider its value, yes.
But as I noted above, it can cause the DB engine to have to read many, many more pages, which can slow a query.
With a small number you might not notice.
With lots you will.
I worked on a document management project where it started out all in one table; we eventually split it out, and it made a world of difference for exactly this reason.

With regard to SQL Server specifically: when storing large-value data (like binary) in a column of a table along with other kinds of columns, in the background SQL Server actually stores this larger data in a separately allocated storage area, or links it to a FILESTREAM. This largely behind-the-scenes storage logic is handled for you by SQL Server, so you can just script or design one table and query or update that table without any special extra JOIN logic, because SQL Server is doing the extra linking for you.

This special “automatic” storage allocation applies when assigning SQL Server column data types using the max size keyword with varchar, nvarchar, or varbinary, or when using the text, ntext, xml, image, etc. data types.
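For example (a T-SQL sketch; the table and columns are just illustrative), a varbinary(max) column gets this treatment without any extra design work:

[code]
-- SQL Server can store the varbinary(max) data off-row behind the scenes;
-- queries that don't reference Data never touch that storage
CREATE TABLE dbo.Files (
  FileId   INT IDENTITY PRIMARY KEY,
  Name     NVARCHAR(260) NOT NULL,
  MIMEType NVARCHAR(100),
  Data     VARBINARY(MAX)
);
[/code]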

With the above in mind, and because I don’t think SQLite supports such a feature, I would take a separate table design approach myself and JOIN where necessary (or query them separately, depending…).

See the SQL Server documentation on binary and large-value data, and on FILESTREAM data, for more information.

[quote=495688:@Norman Palardy]I’d put the blob data in one table and the metadata in another and relate the two.
That way the blob data (which can’t really be searched very effectively) won’t be read when doing queries etc.
This tends to be much more important in DBs like Oracle, Sybase, MS, etc., given how they work with rows and such.[/quote]
Because of the behaviour I described above, in SQL Server you can query tables that contain large-value or binary columns, and as long as you don’t reference those particular large-storage columns, SQL Server will not include them in the execution plan, so you still get good performance.

I hope that helps.

[quote=495691:@Norman Palardy]I’d have to read the very low-level details of SQLite and how it handles things to be 100% sure, BUT in many DBs they read “pages” of data when performing a query, and if those “pages” also include the blob, then it has to do a LOT more I/O to run the query, as each page may have a few fields of metadata and then a LOT of blob data.
By putting the blobs in a different table you can avoid this kind of I/O hit.
Everything in one table CAN be a performance penalty at times.[/quote]

If that is so, then you could just not select the field in the query. Why should it be read then?

Ah, you have to add the FILESTREAM attribute.

[quote]2^31-1 bytes of data.[/quote] ??? Really, MS? 2 GB?

[quote=495694:@Norman Palardy]Ah, you have to add the FILESTREAM attribute.
??? Really, MS? 2 GB?[/quote]
The 2 GB limit on large-value columns is a factor of the “automatic allocation” during design and run time. It’s a restriction on what is considered a highly convenient feature that suits most needs.

For the occasions when you need more than 2 GB, the FILESTREAM keyword gives you access to as much space as you need for a single piece of data (binary or otherwise), limited only by the size of your storage device(s).
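A minimal T-SQL sketch (illustrative names; this assumes the database already has a FILESTREAM filegroup configured, and FILESTREAM tables require a ROWGUIDCOL column):

[code]
-- FILESTREAM: the Data column lives in the NTFS file system,
-- limited only by disk space rather than the 2 GB cap
CREATE TABLE dbo.BigFiles (
  FileId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
  Name   NVARCHAR(260) NOT NULL,
  Data   VARBINARY(MAX) FILESTREAM
);
[/code]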

I don’t think the authors of SQLite mix small values and megabytes of data in a single row. I guess they already separate it somehow.

You could be right, Markus. I may have even read something about this a while ago, somewhere, but I haven’t explored that end of SQLite yet. My needs so far are pretty minimal when it comes to SQLite.

That is interesting.
sqlite.org/fileformat
Everything is put in “pages”.

So are mine, usually…

Me too. But I can’t remember if it was distinguishing a BLOB column from a text- or string-based column.

If you are going to have a (potentially large) blob in your one table, make sure it’s at the end of the table, so that queries, as Norman hinted, don’t have to skip past all that data to get to, say, an integer column stored after it.

In addition, remember that if you update an early column in the table, that may involve rewriting the whole row, which could be expensive.

E.g.: integers in SQLite are stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on their magnitude (zero bytes if the value is 0 or 1). So if you change an early-on integer from 100 to 400, it no longer fits in one byte and the whole row gets rewritten.

So almost certainly best to have two tables, one with just the blob.

[quote=495715:@Tim Streater][/quote]
Do you think SQLite puts “row” pages beside the other pages in this one file?

Ah, for questions like the internal format, have a look at https://sqlite.org/fileformat.html, or ask on the SQLite Users Forum at https://sqlite.org/forum/

If your blob data is going to change a lot, then putting it in the same table is probably no big deal. However, if it rarely changes, it’s been my practice to put it in a different table and relate the two. Another factor for us was that we use ActiveRecord/ARGen for most DB apps, and it reads the entire record (not just selected fields) when loaded, so having the blob in the row can slow it significantly; keeping it in a separate table and loading it lazily (i.e. when you need the blob data) is more efficient (usually).
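The lazy-load pattern looks roughly like this (illustrative table names):

[code]
-- The ORM can load the whole metadata row cheaply...
SELECT * FROM FileMeta WHERE FileId = 42;

-- ...and the BLOB is fetched only when the app actually needs it
SELECT Data FROM FileData WHERE FileId = 42;
[/code]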

I guess my point is there’s no overall perfect way of doing it. You have to sort of think about how often a record is going to be read, updated, etc., and in what context, to determine the best way for your project.