How to validate PDF

For those of you who might run across this method
“The ‘FILE’ command in OSX Shell”

It works by “assumption”… it only looks at key components of the file (mainly signature bytes near the start) to determine what it thinks the type is.
As a test I created a corrupted PDF file, but FILE still id’d it as a PDF even though it was corrupt.
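
To illustrate why a signature-based check can be fooled, here is a minimal Python sketch. It is not what FILE does internally, only the same general idea: it looks at the first bytes and nothing else, so a PDF whose body has been cut off still passes. The function name and the sample bytes are made up for illustration.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Signature-only check: True if the data starts with the PDF marker."""
    return data.startswith(b"%PDF-")


# A tiny, made-up stand-in for a PDF (header, body, end-of-file marker).
good = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\ntrailer\n<< >>\n%%EOF\n"

# "Corrupt" it by cutting the file off part-way through the body.
corrupt = good[:20]

print(looks_like_pdf(good))            # True
print(looks_like_pdf(corrupt))         # True: the header survived the damage
print(looks_like_pdf(b"MZ not a pdf")) # False
```

So a check like this can rule a file out, but it cannot confirm that the rest of the file is intact.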

Alas!

You could use DynaPDF Lite to read in the PDF and see if you get an error.

Sorry for the lack of information. Here’s the scoop.
Every user who has a license to use my software also has the opportunity to share a file or files with other licensed users. The only file type I intend to allow to be shared in this way is a PDF. Since this app is for Windows and OSX only, I’m going to recommend to the users that they use something like Microsoft Word to create what they want to share, and then use the built-in method to save it as a PDF (final max size, less than 5 MB).
As part of my due diligence, it seemed appropriate that I should use some form of “testing” to confirm that the file they are getting ready to upload is in fact a PDF and not one that just so happens to have the extension “PDF”. My recommendation to each user is that whatever way they choose to create a document (like using the aforementioned Microsoft Word), they do the “Save As” and then test that document by opening it with Adobe Reader. Once they are satisfied that all is good, they move to the next step.
From what I am gathering here, just testing the front of the file is not foolproof. However, as Tim said, the absence of #1 (the header) will identify that it is not a PDF (notwithstanding what Michel said). #2 is on the user, and if and when another user downloads that PDF and then attempts to open it and it fails, then some communication with the original creator is due.
Thanks for all this great input. I’m feeling pretty good with the state of things as you guys have explained them.

Christian, can you tell me what DynaPDF Lite costs? Perhaps that would be a good thing to employ on every licensed version. Also, is there a demo version so I could do some testing? (Thanks)

For this January we offer DynaPDF for $469 USD or 429 Euro (plus VAT).

I just made an announcement:
https://www.mbsplugins.de/archive/2016-01-04/Special_offer_for_January_Dyna

Christian, thanks but for what I’m doing, that would be overkill.

If your web server is running on Linux, the upload script (or a Xojo web/console app, which can use a shell) could validate the PDF using the pdftotext command. I wouldn’t assume the user is uploading from inside your app unless you have some sort of security mechanism built in to prevent someone spoofing your client app. Depending on what you’re doing with the PDF (or what the users receiving the shared file do with it), you could be opening yourself up to serious security issues.
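
As a rough sketch of that server-side idea, the Python below runs pdftotext on an uploaded file and treats a non-zero exit status as a failed validation. It assumes pdftotext (from Poppler or Xpdf) is installed and on the PATH; the function name and the timeout value are made up for illustration. A zero exit status only means pdftotext could parse the file, not that the file is safe.

```python
import shutil
import subprocess


def pdftotext_accepts(path: str, timeout: int = 30):
    """Return True/False from pdftotext's exit status,
    or None if pdftotext is not installed on this machine."""
    exe = shutil.which("pdftotext")
    if exe is None:
        return None  # tool not available, so no verdict either way
    try:
        # "-" as the output file asks pdftotext to write the text to stdout,
        # which is discarded here because only the exit status matters.
        result = subprocess.run(
            [exe, path, "-"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a hang as a failed validation
    return result.returncode == 0
```

An upload handler could reject the file whenever this returns False and fall back to some other check when it returns None.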

Looks fairly simple to validate…

The server will be Xojo Cloud. Jim, I don’t think I understand the security issues you mentioned.

The goal here is simple: I am user # 1 and I want to put some text into a Microsoft Word file, save it as a PDF, open it to confirm that it is in fact a “good” PDF, then send it to the web server with other appropriate data.

As user number 2, when I see the title and the references of interest, I want to download that file to my computer (via the Xojo desktop app) and check it out to see if it is in fact something I want in my own personal research. If I were to open it and it failed to open, I would just report it to the ‘creator’ and then things would proceed from there.

Other things in play here include a MySQL database and PHP scripts (some I have already written, and they do work) executed from within Xojo, and all of this is before I add any security measures.

Jim could you elaborate please? Thanks

Is PHP available on Xojo Cloud?
Or running command line tools?
You may need to check that.

Christian, I checked with Jason, and he said yes and gave me the version number. However, just to be sure I didn’t misunderstand him, I emailed him this morning and perhaps he will be able to get back to me on this today. If PHP is in fact not available, then this presents a whole new problem. Thanks

When XC was launched, if I remember right, it did not support PHP. So that would be new and welcome. The online documentation is less than clear about that, though.

Jason just emailed me, and it was I who misunderstood. PHP is not currently available with Xojo Cloud. Guess that means I will have to approach this from another angle.

[quote=239473:@Lee Miller]Jim could you elaborate please? Thanks

[/quote]
It just seems like a good idea to make sure the content is what you expect, in this case, an actual PDF. You can’t really control what the user does, so I would think a server-side test would be a good idea.
The other concern is the constant security updates Adobe puts out. There have been a multitude of known PDF exploits involving embedded JavaScript and metadata.
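
For a feel of what “embedded JavaScript” looks like at the byte level, here is a crude Python sketch that searches raw PDF bytes for a few name tokens associated with active content (/JavaScript, /JS, /OpenAction, /Launch). This is only a heuristic, and the function name and the sample bytes are made up: names can be escaped or hidden inside compressed streams, so an empty result does not prove a file is clean.

```python
import re

# Name tokens that commonly accompany scripts or automatic actions in a PDF.
# \b keeps "/JS" from matching longer names that merely start with "JS".
_MARKERS = re.compile(rb"/(JavaScript|JS|OpenAction|Launch)\b")


def active_content_markers(data: bytes) -> list:
    """Return the sorted, de-duplicated marker names found in the raw bytes."""
    return sorted({m.group(1).decode("ascii") for m in _MARKERS.finditer(data)})


# Made-up fragment: a catalog with an open action pointing at a script.
sample = (
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)

print(active_content_markers(sample))          # ['JS', 'JavaScript', 'OpenAction']
print(active_content_markers(b"%PDF-1.4\n"))   # []
```

A hit is a reason to look more closely or to reject the upload; a miss is not a clean bill of health.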
Here’s a discussion about it.

If you are confident that your users can all be trusted, I wouldn’t worry. If you’re making this available to the general public, you might want to take more precautions.

I don’t know what’s available on XC, but the GhostScript solution in the linked page seems like a good idea.
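
One common form of a GhostScript-based approach, assuming that is roughly what the linked page describes, is to have GhostScript re-write the uploaded file through its pdfwrite device and keep only the re-written copy. The Python sketch below builds and runs such a command. It assumes the executable is installed as gs (it has a different name on Windows), and the function names and timeout are made up for illustration. Re-writing a file this way is not a guarantee that all active content is removed.

```python
import shutil
import subprocess


def ghostscript_command(src: str, dst: str) -> list:
    """Build a gs command line that re-writes src into a new PDF at dst."""
    return [
        "gs",                    # assumption: GhostScript is installed as "gs"
        "-dSAFER",               # restrict file access from within the document
        "-dBATCH",               # exit when processing is finished
        "-dNOPAUSE",             # do not prompt between pages
        "-sDEVICE=pdfwrite",     # regenerate the document as a new PDF
        "-sOutputFile=" + dst,
        src,
    ]


def rewrite_pdf(src: str, dst: str, timeout: int = 60):
    """Return True/False from gs's exit status, or None if gs is missing."""
    if shutil.which("gs") is None:
        return None
    try:
        result = subprocess.run(
            ghostscript_command(src, dst),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

If the re-write fails, the upload can simply be rejected, which also doubles as a validity test.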

If things aren’t available on XC, you could always run a very minimal secondary server just for the purpose of handling sanitization… users would upload via PHP to the sanitizing server, and it would post a safe PDF back to the Xojo Cloud app… just my 2 cents…

If you are already committed to XC and Xojo cannot install PHP on your account, there are a number of free or very inexpensive hosts that provide PHP today. I have been using Directnic for a decade or so; they have a plan starting at less than $3.00 a month. That is really affordable for setting up a PHP-based web service.

I personally went for 1701Software.com before XC was available, and since I need PHP scripts for my sites as well as Xojo Web, I have been a very happy customer for three years or so. Plus, Phillip Zedalis’s support is outstanding.

Jim, thanks for the link to the discussion, which I just finished reading. I will ask Jason about the GhostScript solution and we’ll see what he says.
For the most part (99.9%), my users can be trusted. I was looking at this from this angle: since I need to be able to share ‘files’, what is an easy format that pretty much everyone can create and that can be read on Windows and OSX directly? I thought PDF would be first in line.
I guess my concern was this: if a user was really a hacker in disguise, could they put something in a file, try to pass it off as a PDF, and have it do great damage when one of my users opened it in the Xojo HTMLViewer? It seemed that if I read the header, and it contained “%PDF-”, I could let it go.
As Tim said, “The absence of the header will identify that it is not a PDF”.
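
A client-side pre-upload check along these lines could be sketched in Python as follows. It combines the under-5-MB size limit mentioned earlier in the thread, the “%PDF-” header test, and one extra assumption of mine: that a well-formed file has a “%%EOF” marker somewhere in its last 1024 bytes. The function name and the messages are made up. Passing this check rules out obvious non-PDFs and truncated files; it says nothing about whether the content is safe.

```python
import os

MAX_BYTES = 5 * 1024 * 1024  # size limit: the file must be under 5 MB


def precheck_pdf(path: str) -> str:
    """Return "" if the file passes the basic checks, else a reason string."""
    size = os.path.getsize(path)
    if size >= MAX_BYTES:
        return "file is 5 MB or larger"
    with open(path, "rb") as f:
        head = f.read(5)                 # first five bytes of the file
        f.seek(max(0, size - 1024))      # jump to the last 1024 bytes
        tail = f.read()
    if head != b"%PDF-":
        return "missing %PDF- header"
    if b"%%EOF" not in tail:
        return "missing %%EOF marker near the end"
    return ""
```

The desktop app could show the returned reason to the user and refuse the upload whenever it is not empty.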
Also, when Michel Bujardet said “I doubt many people will create fake PDF files”, that made me think that I had some level of safety here.
It appears that my journey is still ongoing. Thanks Jim

Michel Bujardet
I do already have shared hosting supporting several websites and using PHP and MySQL. With this new project, since the sharing of ‘files’ would be a great benefit, I thought that PDF would be safest. It appears that that might not be the case. Also, I thought that the increased security of Xojo Cloud would be a plus. Do you know if 1701Software’s offering has the same level of security as Xojo Cloud? Thanks for the recommendation

[quote=239678:@Lee Miller]Michel Bujardet
I do already have shared hosting supporting several websites and using PHP and MySQL. With this new project, since the sharing of ‘files’ would be a great benefit, I thought that PDF would be safest. It appears that that might not be the case. Also, I thought that the increased security of Xojo Cloud would be a plus. Do you know if 1701Software’s offering has the same level of security as Xojo Cloud? Thanks for the recommendation[/quote]

It is difficult for me to tell you, as I have no experience with Xojo Cloud. I tend to believe Xojo put a lot of effort into having the best level of security available, and it is possible that the absence of PHP is part of it (risk of injections, etc.). It is also possible that the absence of PHP is simply due to the idea that Xojo Cloud should be as turnkey as possible.