If you want security for your app, you shouldn't use SHA-1 anymore. Google Security released a note saying they managed to create the same hash for two different files: https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
SHA-1 should have been deprecated a long time ago.
However, the idea that a string can be guaranteed to represent a single document is an outdated concept. It might have held when only a few documents were being created, not millions every second as is the case nowadays.
A longer hash (SHA-256 or SHA-3) can postpone the problem. It's a matter of context.
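To put a rough number on "postpone", here is a minimal sketch using the standard birthday-bound approximation. It assumes an ideal hash with uniformly random output, so it only covers accidental collisions and says nothing about deliberately constructed ones like the pair in the linked post. The function names and the workload in the example are hypothetical.

```python
import math


def collision_probability(num_docs: int, bits: int) -> float:
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / (2 * 2^bits))."""
    exponent = num_docs * (num_docs - 1) / (2 * 2**bits)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


def docs_for_even_odds(bits: int) -> float:
    """Approximate number of documents at which p reaches 50%."""
    return math.sqrt(2 * math.log(2) * 2**bits)


if __name__ == "__main__":
    # Hypothetical workload: one million documents per second for a century.
    num_docs = 1_000_000 * 60 * 60 * 24 * 365 * 100
    for bits in (160, 256):
        print(bits, collision_probability(num_docs, bits), docs_for_even_odds(bits))
```

Each extra bit of output multiplies the number of documents needed for even odds by about the square root of two, so a longer hash moves the problem further away without ever making a collision impossible.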
Can the two documents have the same meaning in the same context?
Say the original document was a signed contract. Did they build the PDF prefix in such a way that the contract appears to be signed by somebody else?
Since the PDF prefix contains references to the document's content, are the two documents formatted in an acceptably similar way?
Thinking that a SHA hash can be an absolutely unique signature is like thinking that an integer ID for a record can be a worldwide identifier for that record in every database and every table.
I wrote a quick app to confirm their findings. Sure enough, while the contents of the files are different and both open as different PDFs, the SHA-1 hashes match. Here is the output from my app:
Contents do NOT match!
SHA1 Hashes match
38 76 2C F7 F5 59 34 B3 4D 17 9A E6 A4 C8 0C AD CC BB 7F 0A
38 76 2C F7 F5 59 34 B3 4D 17 9A E6 A4 C8 0C AD CC BB 7F 0A
SHA256
2B B7 87 A7 3E 37 35 2F 92 38 3A BE 7E 29 02 93 6D 10 59 AD 9F 1B A6 DA AA 9C 1E 58 EE 69 70 D0
D4 48 87 75 D2 9B DE F7 99 33 67 D5 41 06 4D BD DA 50 D3 83 F8 9F 0A A1 3A 6F F2 E0 89 4B A5 FF
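For anyone who wants to reproduce a check like this, here is a minimal Python sketch. It is not the app that produced the output above, just one way to do the same comparison with the standard `hashlib` module. The script name, the function names, and the file paths passed on the command line are all placeholders.

```python
import hashlib
import sys
from pathlib import Path


def spaced_hex(digest: bytes) -> str:
    """Format a digest as upper-case hex bytes separated by spaces."""
    return " ".join(f"{b:02X}" for b in digest)


def compare(a: bytes, b: bytes) -> dict:
    """Compare two byte strings directly and by SHA-1 / SHA-256 digest."""
    result = {"same_content": a == b}
    for algo in ("sha1", "sha256"):
        da = hashlib.new(algo, a).digest()
        db = hashlib.new(algo, b).digest()
        # Store both formatted digests and whether they are equal.
        result[algo] = (spaced_hex(da), spaced_hex(db), da == db)
    return result


# Usage (paths are placeholders): python compare_hashes.py first.pdf second.pdf
if __name__ == "__main__" and len(sys.argv) == 3:
    report = compare(Path(sys.argv[1]).read_bytes(), Path(sys.argv[2]).read_bytes())
    print("Contents match" if report["same_content"] else "Contents do NOT match!")
    for algo in ("sha1", "sha256"):
        hex_a, hex_b, same = report[algo]
        print(f"{algo.upper()}: {'hashes match' if same else 'hashes differ'}")
        print(hex_a)
        print(hex_b)
```

Comparing the raw bytes first matters: matching hashes only mean something interesting when the contents are known to differ.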