thirdway: an experiment in content services backend architecture

thirdway is an exploration of the viability of a content delivery architecture in a document management system.
The problem it is trying to address is binary content storage & delivery via an RDBMS. It gets its name from the debate around the following question:

[quote]"How should I store my binary content? As blobs directly in the database, or as files in the filesystem, keeping their paths referenced in the database?"[/quote]

We won’t go into the pros and cons of each approach (there’s plenty of material in forums), but if we arbitrarily call the blob approach “the first way” and the filesystem approach “the second way”, then what we’d be suggesting here could be “a third way”.
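For anyone who hasn’t met the two approaches side by side, here is a minimal sketch of both. It uses SQLite from the Python standard library purely for illustration; the table names (`docs_blob`, `docs_path`) and helper functions are hypothetical and are not taken from thirdway.

```python
import sqlite3
import tempfile
from pathlib import Path


def store_first_way(conn, name, data):
    """First way: the bytes themselves live in a BLOB column."""
    conn.execute("INSERT INTO docs_blob (name, content) VALUES (?, ?)", (name, data))


def store_second_way(conn, root, name, data):
    """Second way: the bytes live on disk; the database keeps only the path."""
    path = Path(root) / name
    path.write_bytes(data)
    conn.execute("INSERT INTO docs_path (name, path) VALUES (?, ?)", (name, str(path)))


def load_first_way(conn, name):
    row = conn.execute("SELECT content FROM docs_blob WHERE name = ?", (name,)).fetchone()
    return bytes(row[0])


def load_second_way(conn, name):
    row = conn.execute("SELECT path FROM docs_path WHERE name = ?", (name,)).fetchone()
    return Path(row[0]).read_bytes()


def demo():
    """Store the same payload both ways and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs_blob (name TEXT PRIMARY KEY, content BLOB)")
    conn.execute("CREATE TABLE docs_path (name TEXT PRIMARY KEY, path TEXT)")
    data = b"%PDF-1.4 pretend document"
    with tempfile.TemporaryDirectory() as root:
        store_first_way(conn, "a.pdf", data)
        store_second_way(conn, root, "a.pdf", data)
        return load_first_way(conn, "a.pdf"), load_second_way(conn, "a.pdf")
```

In the second way, the write to disk and the insert are two separate steps with nothing tying them together, which is where most of the debate starts.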

Continue reading in the github repository…

My Opinion… Thirdway is too complex, too much overhead, too slow and possibly too prone to failure.

I most often go for Secondway if the database is not shared, and Firstway if it is… but even that decision is based on the app, how it is deployed and how it would be used.

It is a relational model question. Does the database need to be aware of the files that exist? If not, you could just use whatever is on the file system as the source of truth. If the database does need to know a file exists, does it need a guarantee that it exists? If so, you end up sticking the blobs in the database itself.
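To show what “aware but not guaranteed” looks like in practice, here is a rough sketch of a reconciliation check. It assumes a hypothetical `files` table holding paths relative to one storage directory; the names are made up for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def reconcile(conn, root):
    """Compare the paths the database knows about with the files on disk."""
    known = {row[0] for row in conn.execute("SELECT path FROM files")}
    on_disk = {p.name for p in Path(root).iterdir() if p.is_file()}
    missing = sorted(known - on_disk)  # referenced in the database, gone from disk
    orphans = sorted(on_disk - known)  # on disk, unknown to the database
    return missing, orphans


def demo():
    """Set up a database and a directory that have drifted apart."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO files VALUES (?)", [("a.txt",), ("b.txt",)])
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "a.txt").write_bytes(b"a")
        (Path(root) / "c.txt").write_bytes(b"c")
        return reconcile(conn, root)
```

A check like this can only report drift after the fact. If drift is unacceptable, the bytes have to sit inside the same transaction as the row, which is the blob approach.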

It’s not pretty though, as you need a database server that supports row locking rather than table locking. Also, RDBMSs are optimized for data that can be searched, not for random binary data thrown at them.

The downside to stuffing things in the database has always been that replication and backups get terrible quickly.

No… a database is simply that… a source of data, be it a filepath or a blob… it does not have (and should not have) the responsibility of knowing whether a data element refers to a file outside of its purview… That role should be solely that of the application using the data.

That is one use case.

What about a database-backed object store for tracking which blobs belong to which documents and which nodes serve those documents? It kinda needs to know the files exist, but it doesn’t need a guarantee.
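To make that concrete, here is a minimal sketch of such a schema, again using SQLite from Python only as a stand-in. Every table and column name (`documents`, `blobs`, `nodes`, `blob_locations`) is an assumption made for illustration, not an existing design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE blobs (
    sha256 TEXT PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    size INTEGER NOT NULL
);
CREATE TABLE nodes (id INTEGER PRIMARY KEY, host TEXT NOT NULL);
-- Which node claims to hold which blob: a claim, not a guarantee.
CREATE TABLE blob_locations (
    sha256 TEXT NOT NULL REFERENCES blobs(sha256),
    node_id INTEGER NOT NULL REFERENCES nodes(id),
    PRIMARY KEY (sha256, node_id)
);
"""


def nodes_for_document(conn, document_id):
    """Return the hosts that claim to hold any blob of the given document."""
    rows = conn.execute(
        """
        SELECT DISTINCT n.host
        FROM blobs b
        JOIN blob_locations l ON l.sha256 = b.sha256
        JOIN nodes n ON n.id = l.node_id
        WHERE b.document_id = ?
        ORDER BY n.host
        """,
        (document_id,),
    )
    return [r[0] for r in rows]


def demo():
    """One document, one blob, held by two of three nodes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO documents VALUES (1, 'report')")
    conn.execute("INSERT INTO blobs VALUES ('abc123', 1, 2048)")
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [(1, "node-a"), (2, "node-b"), (3, "node-c")],
    )
    conn.executemany(
        "INSERT INTO blob_locations VALUES (?, ?)",
        [("abc123", 1), ("abc123", 3)],
    )
    return nodes_for_document(conn, 1)
```

The database here only records where a blob is supposed to be. Whether the node still has it is something the application would have to check when it fetches.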

I guess something like ORACLE could track that via stored procedures…

[quote=447464:@Dave S]My Opinion… Thirdway is too complex, too much overhead, too slow and possibly too prone to failure.

I most often go for Secondway if the database is not shared, and Firstway if it is… but even that decision is based on the app, how it is deployed and how it would be used.[/quote]

Thanks for reviewing thirdway.
Yes, it’s fairly complex, but to my eyes that’s mostly because the code of this particular implementation has not been optimally laid out, OOP-wise.
As for how slow it would be, I’d really want to run some actual tests under realistic load. If the response times prove acceptable for average-sized documents, I wouldn’t lose any sleep over it.
Why would you think it’d be prone to failure? I mean, postgres is pretty reliable and both parties have to (or could) verify that any operation concluded successfully.

…and that is why I’ve never done firstway, although I’m very attracted to it, philosophically. I pity the DBA too much :slight_smile:

I tried to figure out what you mean by that. Wouldn’t that accurately describe thirdway as it stands?
It is a database-backed object store that binds blobs to documents. It’s only missing the nodes.
By “documents”, do you mean files, or the logical construct of a document: content + metadata?