Ideas to view and reorder PDFs

Eduardo_Gutierrez_de_Oliveira · June 1, 2016, 10:10pm

Indeed. I had considered using a webview for full-size visualization. I tried using pdfjs and the webview for the thumbnails, but it wasn’t practical.

Michel_Bujardet · June 1, 2016, 10:20pm

On Windows, HTML with WebKit does not support PDF either without the Adobe Reader plugin.

Tim_Hare · June 1, 2016, 10:32pm

The QuickPDF library has functions to render a page to a jpg or png image and save it to a file or return it directly as a string.

Eduardo_Gutierrez_de_Oliveira · June 1, 2016, 10:57pm

Thanks. I hadn’t seen this one before. I’ll look into it.

EDIT: This was quick. QuickPDF is indeed very powerful, although it’s almost exactly the same as DynaPDF in that it does tons and tons more than I need (with a price to match its considerable feature set).

I’ll keep on trying to coax Windows.Data.Pdf into a set of declares. If I can’t, I’ll have to make do with xpdf as I currently do (well, technically pdftopng, which comes with xpdf).

Thanks all for the help and suggestions.

Tim_Hare · June 2, 2016, 1:06am

I take it they’re price-conscious, since they didn’t just purchase a few copies of Acrobat. They’re stuck in that trap where in order to save a few dollars, they end up wasting a lot of time and doing it the hard way. People really need to learn to put a value on their time and base a purchasing decision on reality rather than perception. People perceive their time to be “free”, but it isn’t. There are real costs associated with it, especially to the company. How many hours wasted at their current salary does it take to add up to an Adobe license?

(And how much time have you spent on this, and how much will they have paid in salary or other costs by the time you’re done?)

anon20074439 · June 2, 2016, 1:44am

Here’s a sample using Ghostscript.

You can do a few different things, like creating PNGs for the user to view and splitting each page into a separate PDF at the same time for later merging.

The example code was borrowed from another BASIC forum and run in another BASIC dialect, but it just runs a command-line app. Send me a DM if you want a link to it (I don’t want to upset anyone by posting a link to the forum).

I printed this thread to a PDF (using Acrobat Pro, since I needed a quick PDF), then ran it through GS with a few command-line parameters to produce a PNG and a PDF for each page. Page 1 examples below:

PDF of this thread (something to test it on)

Page 1 extracted to pdf

Page 1 extracted to PNG

You can also use GS to merge the PDFs back into a single PDF in the order that you send the files on the command line.
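As a rough illustration of the Ghostscript workflow described above (one PNG and one PDF per page, then a merge in command-line order), here is a minimal Python sketch that only builds the argument lists. The exact parameters used in the post are not given, so the flags below are an assumption based on commonly used Ghostscript options (`png16m` and `pdfwrite` devices, `-dFirstPage`/`-dLastPage`, `-sOutputFile`). The function names and the `GS` constant are hypothetical.

```python
import subprocess
from typing import List

# Assumption: "gs" is on PATH. On Windows the console binary is usually
# named gswin64c (or gswin32c) instead.
GS = "gs"

# Flags shared by every call: run non-interactively and exit when done.
_BASE = [GS, "-q", "-dBATCH", "-dNOPAUSE"]


def gs_page_to_png(pdf: str, page: int, out_png: str, dpi: int = 72) -> List[str]:
    """Build a command that renders a single page (1-based) to a PNG."""
    return _BASE + [
        "-sDEVICE=png16m",
        f"-r{dpi}",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-sOutputFile={out_png}",
        pdf,
    ]


def gs_page_to_pdf(pdf: str, page: int, out_pdf: str) -> List[str]:
    """Build a command that extracts a single page (1-based) to its own PDF."""
    return _BASE + [
        "-sDEVICE=pdfwrite",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-sOutputFile={out_pdf}",
        pdf,
    ]


def gs_merge(pdfs: List[str], out_pdf: str) -> List[str]:
    """Build a command that merges PDFs in the order they are listed."""
    if not pdfs:
        raise ValueError("need at least one input PDF")
    return _BASE + ["-sDEVICE=pdfwrite", f"-sOutputFile={out_pdf}", *pdfs]


def run(cmd: List[str]) -> None:
    """Execute a built command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

Reordering then comes down to calling `gs_merge` with the per-page PDFs listed in the new order, for example `run(gs_merge(["page3.pdf", "page1.pdf", "page2.pdf"], "reordered.pdf"))`.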

Wayne_Golding · June 2, 2016, 2:37am

You could have a look at http://www.nuance.com/for-partners/by-product/ecopy/software-alliance-programs/index.htm. I use the product as a user, haven’t tried the SDK.

Christian_Schmitz · June 2, 2016, 6:58am

Ghostscript is really expensive for commercial stuff.

Eduardo_Gutierrez_de_Oliveira · June 2, 2016, 7:09am

[quote=269461:@Tim Hare]I take it they’re price-conscious, since they didn’t just purchase a few copies of Acrobat. They’re stuck in that trap where in order to save a few dollars, they end up wasting a lot of time and doing it the hard way. People really need to learn to put a value on their time and base a purchasing decision on reality rather than perception. People perceive their time to be “free”, but it isn’t. There are real costs associated with it, especially to the company. How many hours wasted at their current salary does it take to add up to an Adobe license?

(And how much time have you spent on this, and how much will they have paid in salary or other costs by the time you’re done?)[/quote]

Who’s “they”? Why do you assume Acrobat doesn’t exist? Who’s “saving a few dollars”?

I think I have made it clear this is a personal project I’ve set for myself to improve the way some people are currently working, because their workflow is a lot less than ideal.

There’s no “they” involved here.

Acrobat is used on a daily basis. Acrobat is a nightmare for simple tasks (so much so that they instinctively avoid it; “printing to PDF” is easier and faster to use than Acrobat for the very simple needs they have).

My time to dedicate to this is mine to give away for free. I don’t think discussing how I assign my own time is anybody’s business, really.

I’m not sure what’s happened with this discussion but I keep being challenged, with a bafflingly accusatory tone, on an aspect I believe it’s clear is already decided.

Whatever the reasons, the constraints for this project are already set. It’s been made clear that some believe a full-on PDF suite should be used; I have acknowledged that opinion with thanks and explained that we won’t go that route (regardless of cost).

I created the thread to try and find the best way to produce PDF page thumbnails on Windows for a hobby project that isn’t really about PDF editing or creation.

I already have a way to produce these that uses a CLI helper (xpdf → pdftopng) on Windows and Mac, and I already have a way to produce these that uses a system call and declares on macOS. I’m looking for an option as simple as or simpler than this.
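For readers unfamiliar with the CLI helper approach mentioned above, here is a minimal Python sketch of what a pdftopng call for thumbnails might look like. It is not the code used in the project. The function names are hypothetical, and the thumbnail width and the page width in points are assumed inputs. It relies only on pdftopng’s `-r` (resolution in DPI), `-f` (first page) and `-l` (last page) options and its `PDF-file PNG-root` argument order.

```python
import math
import subprocess
from typing import List


def thumbnail_dpi(page_width_pt: float, target_width_px: int) -> int:
    """Resolution needed so a page of the given width (in PDF points,
    72 per inch) renders at least target_width_px pixels wide."""
    if page_width_pt <= 0 or target_width_px <= 0:
        raise ValueError("widths must be positive")
    return math.ceil(target_width_px * 72 / page_width_pt)


def pdftopng_command(pdf: str, png_root: str, dpi: int,
                     first: int, last: int) -> List[str]:
    """Build the pdftopng argument list for pages first..last (1-based)."""
    if first < 1 or last < first:
        raise ValueError("invalid page range")
    return ["pdftopng", "-r", str(dpi),
            "-f", str(first), "-l", str(last),
            pdf, png_root]


def render_thumbnails(pdf: str, png_root: str, page_width_pt: float,
                      target_width_px: int, first: int, last: int) -> None:
    """Run pdftopng (assumed to be on PATH) to write one PNG per page."""
    dpi = thumbnail_dpi(page_width_pt, target_width_px)
    subprocess.run(pdftopng_command(pdf, png_root, dpi, first, last),
                   check=True)
```

Rendering directly at a low resolution keeps the helper fast, since the thumbnails never need to be scaled down from full-size images afterwards.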

I don’t mind if there are no better solutions, but to keep being sent to full PDF suites is like asking whether there’s a better engine for my small delivery truck and being told again and again that I should be using an 18-wheeler, with the added subtext that I don’t because I’m an idiot for not wanting to spend the money or for not anticipating the possible future growth of my needs.

I looked at this. There’s not a lot of info and since this is a hobby project I didn’t want to arrange anything more formal, but it looked like the SDK is designed to interact with their products (which would be understandable, as a way to grow their own platform). There’s no real info available, which is unfortunately too common on software designed for corporate use.

This is not “commercial stuff”. License is not a problem since there wouldn’t be an issue releasing the source code if the library required it.

I won’t go with Ghostscript, as I already have a simpler solution that works as a CLI helper (xpdf). I should’ve mentioned that I had already looked at it, rather than just implying I had checked all the obvious solutions.

Eduardo_Gutierrez_de_Oliveira · June 2, 2016, 7:55am

I forgot to address this, which is actually an option I haven’t discarded yet.

I have some experience with pdfjs, so my plan wasn’t to use a native PDF reader within a webview, but pdfjs within a webview. This example shows how to render thumbnails from pdfjs:

http://bl.ocks.org/palerdot/bf0c52d84aa046a6963c

Sejda implements something like this here: https://www.sejda.com/visually-combine-reorder-pdf (they use pdfjs to render the PDF from the browser, then execute the split/reorder server-side)

This specific section of Sejda is very close to what I want to implement, and they’re following a similar approach (they have built a UI to existing tools, and they never deal with creation or rendering of the PDFs directly).

I’m leaving this as a last option because I’d rather not build a Xojo app only to have most of the UI sitting in a webview (I wanted to take this opportunity to practice building a canvas UI for object manipulation).

I think mentioning “PDF” automatically derails the discussion, since most of my responses have been aimed at clarifying how simple the original request was supposed to be.

I blame “second language” on my part, as much of a cliché as that is.

anon20074439 · June 2, 2016, 10:14am

I’ll remove the files linked to my above post then.

Here’s the source of the server side portion of Sejda.

https://github.com/torakiki/sambox

It’s a fork of https://pdfbox.apache.org/

Have fun

Eduardo_Gutierrez_de_Oliveira · June 2, 2016, 11:01am

[quote=269505:@]I’ll remove the files linked to my above post then.

Here’s the source of the server side portion of Sejda.

It’s a fork of https://pdfbox.apache.org/

Have fun :)[/quote]

Thanks, will take a look at it.

Tim_Hare · June 2, 2016, 5:31pm

I apologize if my post came across as accusatory. You had not made it clear that this was a “hobby” project for your own edification and learning. Your original post framed the problem in terms of your workplace. My comments are really from my own frustration with clients and would-be clients who waste thousands of dollars in time and “soft cost” in order to save a few hundred in hard cost. It’s an insidious mindset that even I get caught up in from time to time.

By now it must be clear that there isn’t a built-in way to do what you want on Windows. Windows.Data.Pdf was introduced for Windows 8.1 and above, but isn’t directly accessible from Xojo. You would have to write a DLL that wraps the functionality you want. There was a blog post about that and I think Michel has had success in wrapping .NET functions that way. If you’re looking for a free alternative to pdftk, that would be the way to go. But be aware it is 8.1 and above.

Eduardo_Gutierrez_de_Oliveira · June 2, 2016, 7:43pm

Thanks. I ended up figuring this out by reading between the lines. It took me a while to figure out that the Windows.Data.Pdf call was from WinRT and thus not directly accessible. I did find a .NET wrapper, but I haven’t checked back on it.

EDIT: The library I had seen is PDFSharp. It’s .NET but it’s not a wrapper.

The tool is for people in my workplace, but only because I personally think their current workflow is appallingly inefficient, so I’ve taken it upon myself to try and find a better way for them to work (as both a favor and a personal challenge). If I can come up with a proof of concept, I’m sure it’ll eventually cascade into a reworking of the complete toolchain and, then, I don’t think I could stop the company from spending thousands on a new solution.

Actually, the problem in the companies I’ve worked in tends to be stopping them from ridiculously overspending. In my current one, featuritis ran a development project over budget by north of 3 million euros, with a delivery date still nowhere in sight three years after the initially agreed release date. Believe me, this company has the exact opposite problem: they don’t know how to spend or when to stop spending. “Sunk cost” is an unknown concept to them.

Norman_P · June 2, 2016, 7:50pm

I also worked for a company whose mentality was “Well we spent this much already we might as well keep going”.
IBM and SAP loved them for several years
Then they got a new CIO who paid to break those contracts & did something else and spent way less to get way more even AFTER buying out the contracts.

Eduardo_Gutierrez_de_Oliveira · June 2, 2016, 8:46pm

[quote=269649:@Norman Palardy]I also worked for a company whose mentality was “Well we spent this much already we might as well keep going”.
IBM and SAP loved them for several years
Then they got a new CIO who paid to break those contracts & did something else and spent way less to get way more even AFTER buying out the contracts.[/quote]

Those three million? Oracle. Who happily kept accepting new requirements and extending deadlines and budgets.

In the end, same thing: it took new blood to cut the ridiculous project off at the knees and accept the losses.

The sad thing is that the project started with great goals, but did the terrible, terrible thing of asking users of an existing system how they wanted the new one to be. So for three years they kept modifying a huge platform to look like the one it was supposed to replace, because functions didn’t want to lose their power silos or the illusion of control, and didn’t want to think of better ways to work.

Norman_P · June 2, 2016, 8:56pm

They blew through about 50 million, which wasn’t bad considering the annual IT budget was 120 million or so.
So much you can do when money is basically “turn tap open”.

Tim_Hare · June 2, 2016, 9:45pm

Some days I really enjoy working in the small biz landscape. Other days I wish I had some golden client who would just shovel money at me.

Oliver_Osswald · June 3, 2016, 3:53am

Wow … In the two days this thread has been going on, I would have studied Christian’s DynaPDF sample projects and patched together a viewer which allows for reordering and deletion of pages…
There is a good reason why DynaPDF keeps popping up in threads like this one: it is worth every cent.

Eduardo_Gutierrez_de_Oliveira · June 3, 2016, 11:26pm

[quote=269722:@Oliver Osswald]Wow … In the two days this thread has been going on, I would have studied Christian’s DynaPDF sample projects and patched together a viewer which allows for reordering and deletion of pages…
There is a good reason why DynaPDF keeps popping up in threads like this one: it is worth every cent.[/quote]

Can we please stop with DynaPDF? It was made very clear that it’s considered the best all-around solution for PDF handling out there, and this hasn’t been contested by anybody. There’s no need to keep insisting, since I acknowledged it probably is and that I still won’t use it (I explained why I won’t since it kept being brought back up, but I truly don’t think I should have had to).

I already knew about DynaPDF, recognized its worth and made sure to make it clear I have nothing against it. But after making it clear I won’t use it, I can’t understand why it keeps popping up. It’s not like I said I was on the fence and needed convincing or like I argued against it.

Honestly, DynaPDF doesn’t need any defense. It hasn’t been attacked or disparaged. I’ve just chosen not to use it and I’ve (politely, I hope) made it clear many times.

As for the time this thread has been going on: the total actual time I’ve dedicated to this so far is 3 hours. That includes getting a working prototype using CLI helpers (xpdf and pdftk) and another using Cocoa calls.
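To make the pdftk half of such a prototype concrete, here is a minimal Python sketch of how a reordered or trimmed page list might be turned into a pdftk call. It is an assumption about the general shape, not the actual prototype. `move_page` and `pdftk_cat_command` are hypothetical names, and the only pdftk feature relied on is the `cat` operation with an explicit list of 1-based page numbers followed by `output`.

```python
import subprocess
from typing import List


def move_page(order: List[int], src: int, dst: int) -> List[int]:
    """Return a new page order with the item at index src moved to index dst
    (what a drag-and-drop in a thumbnail view would produce)."""
    if not (0 <= src < len(order)) or not (0 <= dst < len(order)):
        raise IndexError("index out of range")
    result = list(order)
    page = result.pop(src)
    result.insert(dst, page)
    return result


def pdftk_cat_command(pdf: str, order: List[int], out_pdf: str) -> List[str]:
    """Build a pdftk command that writes the pages in the given order.
    Pages left out of the list are simply dropped from the output."""
    if not order or any(p < 1 for p in order):
        raise ValueError("order must contain 1-based page numbers")
    return ["pdftk", pdf, "cat", *[str(p) for p in order],
            "output", out_pdf]


def write_reordered(pdf: str, order: List[int], out_pdf: str) -> None:
    """Run pdftk (assumed to be on PATH) to produce the reordered file."""
    subprocess.run(pdftk_cat_command(pdf, order, out_pdf), check=True)
```

Keeping the page order as a plain list means the UI only has to manipulate integers. The PDF itself is touched once, when the final order is handed to the helper.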

The thread has kept going because it’s veered in a different direction (as conversations tend to), not because we keep arguing about what to use.