Show HN: Privacy friendly suite of PDF tools (pdfux.com)
99 points by drcpp on Feb 9, 2023 | 44 comments
Hi HN,

I have been working on a set of PDF tools that does all the processing directly in the web browser. From time to time I needed to do some simple PDF manipulations like merging PDF files. Sometimes the files contain my personal data and I was not comfortable using other online services where the file usually is uploaded to a remote server.

Behind the scenes there is a small library written in C++ doing the changes to the PDF files. I am using the Emscripten compiler to compile it to WebAssembly, which runs in the browser. It was a very good learning experience for me, and it was easier than I thought to get something working; credit to the Emscripten project. The tool is also a PWA and can be used offline (once loaded).
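
A minimal sketch of how a C++ function might be exposed to the browser in a setup like this. It is not the library's actual code: the function name `looks_like_pdf` and the `PDF_EXPORT` macro are hypothetical, chosen only for illustration. The one PDF fact it relies on is that a PDF file starts with the bytes `%PDF-`. `EMSCRIPTEN_KEEPALIVE` keeps the function from being stripped when building with Emscripten; the macro is empty in a native build so the same source compiles on both.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>
#define PDF_EXPORT EMSCRIPTEN_KEEPALIVE  // keep the symbol in the WebAssembly build
#else
#define PDF_EXPORT                       // no-op for native builds
#endif

extern "C" {

// Hypothetical entry point: returns 1 if the buffer begins with the
// "%PDF-" file header, 0 otherwise. C linkage keeps the name unmangled
// so the JavaScript side can look it up.
PDF_EXPORT int looks_like_pdf(const std::uint8_t* data, std::size_t size) {
    static const char kHeader[] = "%PDF-";
    const std::size_t header_len = sizeof(kHeader) - 1;  // exclude the trailing NUL
    if (data == nullptr || size < header_len) {
        return 0;
    }
    return std::memcmp(data, kHeader, header_len) == 0 ? 1 : 0;
}

}  // extern "C"
```

On the browser side, the file's bytes would be copied into the module's memory before calling the function, so the file itself never has to leave the machine.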

I am looking for any kind of feedback, comments or ideas for new tools that I could add.



You might find this interesting for adding functionality to this website: it's a JavaScript version of our commercial PDF tools, and it can run entirely in the browser. It's AGPL, so you wouldn't need a commercial license for your use case, so long as you release your source, as you say you plan to in another comment:

https://www.npmjs.com/package/coherentpdf


Thanks, will definitely have a look.


Just a heads up that all your comments here are showing up as dead (i.e. people won't see them unless they have "showdead" enabled in their settings) which means you're probably shadowbanned.

I've vouched for all your comments so far to revive them, but you might want to email hn@ycombinator.com about this.


wow, ok didn't know that. Thanks a lot for letting me know.


This is very cool. I had to build something similar but did it server side... client side seems like a much better option. My first thought was... I don't know of any client side pdf libraries that work in the browser... this seems like a good case for WebAssembly.


I looked under the hood and did see a wasm reference.


Yes, it is a C++ library I have created and compiled to WebAssembly.


Did you write the c++ pdf library yourself?


Did you write this library yourself, or is it an existing one? Also, it can be used offline, but maybe you could provide an easy way to download it so people can run it locally or host it on their own server? (Unless of course there's a plan to monetize this.)


It is a very small PDF library that I have written myself in C++. I am planning to release the source, but haven't decided on the license yet. It is very limited in features compared to the whole PDF specification but I am planning to keep adding to it.


Very cool! Congrats! Do you plan to add a file size reduction feature? I often make PDFs from Keynote, but the built-in macOS apps (Keynote, Preview) do a poor job of exporting small PDF files compared to Acrobat Pro.


Thank you. In Split PDF file, it wasn’t so obvious to me that the check mark was a button. So I didn’t know what to do after pressing the plus button.


Good tool to have for users of PDF documents. I missed seeing the WASM locally manipulated notifier; perhaps it could be identified more clearly. I can certainly see uses for it, and it would be good to have digital signing capabilities and a way to add a signed "version edited" attribution (author) to the generated PDF file.

Perhaps reword as "Secure, private, fast and free local in-browser PDF tools"?

Congratulations.

Looking forward to learning from the source code.


Thanks a lot for your input. I agree that it can be a good idea to make it clearer that this is like a local / offline tool since that is the biggest difference compared to other tools. Will definitely try to make this clearer.

I am not sure what you mean by the "WASM locally manipulated notifier"?

Digital signing is a good idea for a feature; I have added it to my feature todo list.


Looks very nice - I see there is the ability to edit metadata, but I often have a need to strip/delete _all_ meta data from a document. Having that as a single option would be very useful...


Thanks, that shouldn't be too hard to add. I have noted it down and will have a look at it.


This is awesome. There have been so many times when I needed something like this, as I'm sure there will be again in the future. Welcome to ... my bookmarks.


Happy to hear :)


How is it safer than any offline app?


I guess it is just as safe. One advantage that I see with this tool is that it runs in the browser, so you don't have to install anything and can use it from any device with a web browser, like your phone or tablet.


Can you add a tool to remove the security on pdf that prevents editing, not opening? smallpdf has this.

Great work!


Thank you :) I added this to my feature list, I will try to look into it.


pdfDIFF (comparing and highlighting 2 pdf files) would also be a nice feature.


Noted it in my list as well, thank you!


Very cool, I like it. I resort to using smallpdf.com on occasion, would love to use this instead. Hope you will have a chance to add more features. What else do you think would be feasible browser side?


Happy to hear that you like it. As an example, I have started to look into whether it is possible to add different features based on OCR. I still haven't gotten it to work, but I think it should be possible. Some things are definitely harder to do in the browser than on the server, but still possible with some work.


This is nice. Thanks for doing this.

FYI, another existing one that I know and have used is https://www.sejda.com


Thank you. Sejda has a lot of features and seems to be a solid tool. The big difference between Sejda's online tool and pdfux is that with Sejda (and most online PDF tools), your file is transferred over the internet to Sejda's server where it is stored, processed and sent back to you. Then they promise to delete it. Personally I am not 100% comfortable using this kind of tools with files that I wouldn't want to share with anyone. (I also see that Sejda has a desktop version for offline use which seems to be their answer to the privacy concern.)


> Then they promise to delete it. Personally I am not 100% comfortable using this kind of tools with files that I wouldn't want to share with anyone.

This. Keep up the good work.


I've been using PdfSam to split large PDFs locally. I didn't know this could be done locally in a browser. Great work! I'll try it out next time


Very nice set of tools! I'd love a native version that doesn't require a browser, though ;)


Thank you! Could be an idea. The PDF logic itself is in a C++ library that can be compiled for different platforms. A CLI, for instance, shouldn't be too difficult to create. If you are thinking about a native UI, I am not sure it adds much value when there is already the browser version.


Yea, maybe as a neatly packaged CLI? Each of these could be a subcommand...
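
A minimal sketch of what subcommand dispatch for such a CLI could look like. Everything here is an assumption for illustration: the subcommand names `merge` and `split`, the argument conventions, and the exit codes are hypothetical, and the handlers only check argument counts where a real tool would call into the PDF library.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

using Args = std::vector<std::string>;
using Handler = std::function<int(const Args&)>;

// Placeholder handler: expects at least two input paths plus one output path.
// Returns 0 on success, 2 on a usage error.
int merge_cmd(const Args& args) {
    return args.size() >= 3 ? 0 : 2;
}

// Placeholder handler: expects exactly one input path and one output prefix.
// Returns 0 on success, 2 on a usage error.
int split_cmd(const Args& args) {
    return args.size() == 2 ? 0 : 2;
}

// Treats args[0] as the subcommand name and forwards the remaining
// arguments to its handler. Returns 1 if the subcommand is missing or unknown.
int run_cli(const Args& args) {
    static const std::map<std::string, Handler> commands = {
        {"merge", merge_cmd},
        {"split", split_cmd},
    };
    if (args.empty()) {
        return 1;  // no subcommand given
    }
    const auto it = commands.find(args[0]);
    if (it == commands.end()) {
        return 1;  // unknown subcommand
    }
    return it->second(Args(args.begin() + 1, args.end()));
}
```

A `main` function would build `Args` from `argv + 1` and return the result of `run_cli`; adding a new tool then means adding one handler and one entry in the map.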


Add text extraction -> segmentation -> small ML models -> txt, json, etc.


Can you share which C++ library is used? Thanks.


It is my own library that I have created. I am planning to release the code under an open source license, hopefully soon. The library is still very small and only includes a tiny part of the whole PDF specification.


Love it !!!!


Honestly, you should charge like $1 for each operation. People will trust it more that way, as you're not incentivized to snoop on their files (or put 500 ads everywhere).


Ah yeah, the surefire way of getting "trust" is by charging people, how could one forget about this obvious way...

Also, how would charging work when you're offline? One of the features of the tooling is that stuff works offline, seems maybe charging for each use wouldn't quite match up with reality.


In general, I have been thinking about how to get some earnings from this; it would allow me to spend more time on the tool and improve it faster. I am still not sure of the best way to do it without scaring away too many users. If I could get some earnings and also increase trust, that would be great. I need to think about this idea; maybe I can do a test and see how it works out...

(I have already ruled out ads, since I don't want to clutter the tool, and most ad networks are not exactly privacy friendly, which is one of the main ideas behind the tool.)


I think you should be able to make some money by offering your services to corporations. They don't know that everything runs in the browser, but they will like and pay for the idea of adding this capability to their website (pdftools.acmecorp.com). Start with lawyers, I think they are prime customers (pdfux.biglaw.com). Once you explain to their IT that you are only leveraging client-side processing, they'll leave you alone. All they would do is give you access to a VPS or a VM in their hypervisor, or heck, you could dockerize it and sell it branded to them (biglaw inc all over the place). I do believe that's called.. bespoke service!! lol.. and of course.. the annual support income.


I agree with selling to corporations. However, most big law lawyers will need a higher level of PDF functionality. At a quick glance, I notice that the tools can't manipulate bookmarks, which are required for electronic filing in most courts.


Interesting thinking...


Please put up a donation button/buy-me-a-coffee/whatever while you're still figuring out the best way forward.

I always wonder if casual gratitude adds up to as much as ad networks. Probably not.



