almet's comments | Hacker News

(Hi, disclaimer: I'm one of the current dangerzone maintainers)

You are correct: that's basically what Dangerzone is doing!

The challenge for us is to keep the sandbox secure over time, and to make it easy for non-tech folks (e.g. journalists) to run this on their machines.

About the sandbox:

- Making sure that it stays updated requires some work: testing new container images, and having a way to distribute them securely to host machines;

- In addition to running in a container, we reduce the attack surface by using gVisor¹;

- We pass a few flags to the Docker/Podman invocation, effectively blocking network access and reducing the authorized system calls;

Also, in our case the sandbox doesn't mount the host filesystem in any way, and we stream back pixels, which are then written to a PDF by the host (we're also currently considering adding an option to write back images instead).
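For illustration, restrictions like the ones above map onto standard Docker/Podman flags. A minimal sketch (the flag set here is illustrative, not Dangerzone's exact invocation, and the image name is made up):

```python
import subprocess  # only needed if you actually run the command

def sandbox_cmd(image: str) -> list[str]:
    """Build a locked-down `podman run` invocation. All flags below are
    standard Podman options; they illustrate the kind of hardening
    described above."""
    return [
        "podman", "run", "--rm",
        "--network", "none",                     # no network access at all
        "--cap-drop", "all",                     # drop every Linux capability
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--read-only",                           # immutable container rootfs
        image,
    ]

# e.g. subprocess.run(sandbox_cmd("converter-image"), check=True)
```

Seccomp filtering (reducing the authorized system calls) would be layered on top with `--security-opt seccomp=<profile.json>`.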

The other part of the work is to make that easily accessible to non-tech folks. That means packaging Podman on macOS/Windows, and providing an interface that works on all major OSes.

¹ https://dangerzone.rocks/news/2024-09-23-gvisor/


(Hi, disclaimer: I'm one of the current dangerzone maintainers)

That's a good question :-)

Opening PDFs, images, or any other documents directly on your machine, even with a limited PDF viewer, potentially exposes your environment to that document.

The reason is that exploits in the image/font/document parsing and rendering libraries can happen and are exploited in the wild. These exploits make it possible for an attacker to access the memory of the host and, in the worst case, allow code execution.

Actually, that's the very threat Dangerzone is designed to protect you from.

We do that by doing the document-to-pixels conversion inside a hardened container that uses gVisor to reduce the attack surface.¹

One other way to think about it is to actually consider document rendering unsafe. The approach Dangerzone is taking is to make sure the environment doing the conversion is as unprivileged as possible.

In practice, an attack is still possible, but much more costly: an attacker would be required to do a container escape or find a bug in the Linux kernel/gVisor, in addition to finding an exploit in the document rendering tools.

Not impossible, but many times more difficult.

¹ We covered that in more details in this article https://dangerzone.rocks/news/2024-09-23-gvisor/


> The reason is that exploits in the image/font/docs parsing/rendering libraries can happen and are exploited in the wild.

Aren't risks similar when opening any untrusted web page in a browser?

The only difference is that browser sandboxes and exploit mitigations are probably better than those of a PDF viewer.


(Hi, dangerzone maintainer here)

There is indeed a dangerzone-cli tool¹, and it should be made more visible. We plan on updating/consolidating our docs in the foreseeable future, to make things clearer.

Also, plans are here to make it possible to use dangerzone as a library, which should help use cases like the one you mention.

¹ https://github.com/freedomofpress/dangerzone/blob/main/dange...


Incredible, thanks for sharing! Can't wait to use it for my pdf pipelines :)


Freedom of the Press Foundation is kick-starting a bug bounty program for this holiday season.

Challenge the popular adage "containers don't contain" by sending Santa a naughty letter that bypasses Dangerzone's protections (LibreOffice + gVisor + Podman).

If your letter breaks a containerization layer by capturing a flag, you get the associated bounty.

Have fun!


I'm not sure the ratio of comments to LoC is a sign of good quality code.

Too many comments might actually be a bad thing. It's more lines to maintain, and sometimes the comments just restate what the code is doing when there is no need to.


I find that comments are most valuable when onboarding/understanding an unfamiliar code base. Comments composed of a little bit of “what” some “where” and a lot of “why” seems best for this scenario.


It's still the same story: PyPI still doesn't have a way to automatically detect network and filesystem interactions for submitted packages. It's a complex thing to do, for sure, but it would be a welcome addition, I guess.


PyPI still doesn't have this because no packaging ecosystem does. It's impossible to do in the general case if your packaging schema allows arbitrary code execution, which Python (and Ruby, and NPM, and Cargo, etc.) allow.

The closest thing is pattern/AST matching on the package's source, but trivial obfuscation defeats that. There's also no requirement that a package on PyPI is even uploaded with source (binary wheel-only packages are perfectly acceptable).


"no packaging ecosystem does."

This is a little bit too strong, since packaging doesn't require arbitrary code execution. For example, Go doesn't permit arbitrary code execution during `go get`. Now - there have been bugs which permit code execution (like https://github.com/golang/go/issues/22125) but they are treated as security vulnerabilities and bugs.

Of course, you're right about Python.


What I meant by that is that no packaging ecosystem (to my knowledge) runs arbitrary uploaded code to find network activity. Some may do simpler, static analyses, but outright execution for dynamic analysis purposes isn't something I'm aware of any ecosystem doing.

Python, Ruby, et al. are in an even worse position than that baseline, since they have both arbitrary code in the package itself and arbitrary code in the package's definition. But the problem is a universal one!


Ah, yep, you're right about that as far as I know too.


This seems eminently solvable though. Why can’t every package submission cause some minimal sandboxed docker image to install the package and call the various functions and methods and log all network and disk activity? If anything looks suspicious it would be denied and the submitter would have to appeal it, explaining why the submission is valid. The same applies for NPM and Cargo. I know there is a researcher out there who has retrieved and installed every single pip package to do an analysis, which is a good start. This seems like the kind of thing that wouldn’t even cost all that much, and big corporate users of python would stand to benefit.
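For concreteness, the proposal might look something like this naive sketch: install the package in a throwaway container while tracing its file and network syscalls (image name and flags are illustrative; the replies below explain why this approach is weaker than it looks):

```python
def analysis_cmd(package: str) -> list[str]:
    """Naive dynamic analysis: install a package inside a disposable
    container while strace logs its file and network syscalls.
    Assumes an image with pip and strace installed."""
    return [
        "docker", "run", "--rm",
        "--cap-add", "SYS_PTRACE",       # strace needs ptrace in the container
        "scanner-image",                 # hypothetical analysis image
        "strace", "-f",                  # follow children (pip spawns builds)
        "-e", "trace=network,file",      # log only network/filesystem syscalls
        "-o", "/tmp/trace.log",
        "pip", "install", "--no-cache-dir", package,
    ]
```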


For one, because Docker is not a sandbox, and containers are not a strong security boundary[1]. What you really need here is a strongly isolated VM, at which point you're playing cat-and-mouse games with your target: their new incentive is to detect your (extremely detectable) VM, and your job is to make the VM look as "normal" as possible without actually making it behave normally (because this would mean getting exploited). That kind of work has a long and frustrating tail, and it's not particularly fruitful (relative to the other things packaging ecosystems can do to improve package security).

> I know there is a researcher out there who has retrieved and installed every single pip package to do an analysis, which is a good start.

You're probably talking about Moyix, who did indeed download every package on PyPI[2], and unintentionally executed a bunch of arbitrary code on his local machine in the process.

[1]: https://cloud.google.com/blog/products/gcp/exploring-contain...

[2]: https://moyix.blogspot.com/2022/09/someones-been-messing-wit...


You make some good points. But it still seems to me that, if you used the best available sandboxed VMs for each platform (Windows Sandbox for Windows; FireJail for Linux; VirtualBox with no folder permissions for OSX-- I don't know if these are the best or even good, those were the ones I found from a bit of searching), that you could install and run these packages in an automated way (especially with some GPT3-type help to figure out how to explore and call the important functions) and look for the telltale signs in the network and file access behavior that they are malicious. Even if we grant that this is a long-tailed "cat and mouse" game, then so what? We won't get 100% security, especially against super sophisticated threat actors, but if you could catch 98% or whatever of the typical clumsy supply chain attacks, or super egregious stuff like that NPM package that deleted your whole disk if you were Russian, that would be an incredibly vast improvement over the current state of affairs. Why isn't that worth doing? Why isn't Google or Microsoft at least trying this?


It isn't worth doing because the equation you've supplied doesn't include the effect of catastrophic failure: dynamic analysis lowers the barrier for exploit to a single hypervisor or VM exploit. Catching 98% of spam packages that affect nobody is worth very little when the 2% you don't catch are the ones that do the real damage.

> Why isn't Google or Microsoft at least trying this?

They are: Google and Microsoft both spend (tens of) millions of dollars on hypervisor and VM isolation research each year. It's a huge field.


> What you really need here is a strongly isolated VM,

Simplify, don't use a VM.

Create an isolated network, hook your sacrificial machine up to it, have it install the package. Remotely kill it (network controlled power switch if needed). The machine's hard drive should be hooked up through a network controlled switch of some type. After the sacrificial machine is powered down, reroute the HD so it is connected to a machine that does forensics.

Now you have a clear "before" and "after" situation setup for analysis.

The sacrificial machine's network activity can be monitored by way of whatever switch/router it uses to connect to the Internet.


This is a VM, but flakier and with more steps! It's also eminently not sustainable at PyPI's scale, which is the context we're talking about.


Doesn't it solve VM sandbox escape problems though? Actual physical hardware isolation, along with an isolated network. Code can't detect it is running on a VM if there isn't a VM, and it sure can't escape the sandbox if there isn't a sandbox.

> It’s also eminently not sustainable on PyPI’s scale, which is the context we’re talking about.

I started my software engineering career in testing before VMs were a thing, so large, very large, scale test setups like the one I outlined were commonplace. I wrote about some of my experiences at https://meanderingthoughts.hashnode.dev/how-microsoft-tested... and the physical hardware setup my team was using to run (millions of!) tests was tiny compared to what other teams in Microsoft did at the time.

Network controlled power and peripherals were exactly how automation was done back in the day. Instead of VM images, you got a bunch of identical(ish) hardware and you wrote fresh images to hard drives to reset your state.

Are VMs more convenient? Sure, but my reply was in context of ensuring malware can't detect it is running in a VM!


Well, some calls absolutely should invoke network or disk activity, so you would additionally need to define what constitutes good and bad activity for each. Moreover, unless the package is a collection of pure functions, it would be easy to hide the malware trigger in state that won't be initialized properly by the automated method calls, but would be in standard usage of the package.
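A toy illustration of that second point (names invented for the sketch): the payload only fires after state that normal usage establishes, so an analyzer blindly calling methods on fresh objects sees nothing suspicious.

```python
triggered = []

class Client:
    """Toy 'package' API: the payload only fires after realistic usage."""
    def __init__(self):
        self._ready = False
        self._host = None

    def connect(self, host):
        self._host = host
        self._ready = True              # state only set on the normal usage path

    def send(self, data):
        if not self._ready:             # a scanner calling send() on a fresh
            return None                 # object never reaches the payload
        triggered.append(self._host)    # stands in for the malicious action
        return len(data)

# Automated analysis that calls each method once on a fresh object:
Client().send(b"probe")
assert triggered == []                  # looks completely benign

# Ordinary usage of the package:
c = Client()
c.connect("attacker.example")
c.send(b"secrets")
assert triggered == ["attacker.example"]
```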


> It's impossible to do in the general case if your packaging schema allows arbitrary code execution

Java's type system: ClassLoaders plus SecurityManager was impossible?

that's literally how Java applets worked, enforced through the type system

https://docstore.mik.ua/orelly/java-ent/security/ch03_01.htm

yes, SecurityManager was a poor implementation for many reasons, but it's definitely not "impossible" to sandbox downloaded code from the network while having it interact with other existing code, you can do it with typing alone


I'm not sure it's not doable, actually. What about having an execution sandbox and a way to check the calls made during execution of the install script, for instance?

I worked on something like this a few years back and it went nowhere, but I still believe it would be doable and useful. The only trace I found is https://wiki.python.org/moin/Testing%20Infrastructure, which contains almost no info...


Smart attackers already add (or will add) `sleep(SOME_NUMBER_LONGER_THAN_SCAN_SANDBOX_LIFETIME)` before anything that does FS or network access. Not to say that this wouldn't be a welcome addition, but the scanning needs to be understood in the context of the inherent limitations of large-scale runtime behavior detection when you have a fixed amount of hardware and time for running those scans.
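The evasion is almost free for the attacker. A sketch, assuming the sandbox kills each run after a fixed budget (the 60-second window here is made up):

```python
import time

ANALYSIS_WINDOW = 60.0      # assumption: the sandbox kills each run after ~60s

def should_fire(start: float, now: float, window: float = ANALYSIS_WINDOW) -> bool:
    """Payload gate: stay dormant until the process has outlived a scan.
    A real payload would just time.sleep() past the window instead."""
    return (now - start) > window

start = time.monotonic()
assert not should_fire(start, start + 5.0)     # during the scan: benign
assert should_fire(start, start + 3600.0)      # later, on a real host: fires
```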


Aren't startups made to be bought by the tech giants? Until there is a shift in their goals, I can't see the GAFAM losing here.


Some are, but a lot of them are also made to be successful long term companies. Small business is still a large employer overall. Some businesses need to be very big, and if you want to be "filthy rich" you have to be in one of those. However you can be nicely wealthy at a small company.


I still have a hard time understanding what's going on with startups: the power of money makes it too attractive to sell and "call it a life", I guess. That's why, even if they don't plan on it, startups might end up accepting being bought. I guess.


Thé gafam (GAFAM tea): it used to taste really good, but now it tastes awfully bad, yet we cannot stop drinking it!


Is DjangoCon US still worth it in 2022? I've been to some PyCon US events in the past and it was really a great way to meet the community. I wonder if it's still playing the same role in 2022, or if the community is harder to approach.


Yes. If anything DjangoCon is more worth it than PyCon in my opinion having attended both multiple times. PyCon normally has thousands of participants; DjangoCon US is several hundred. You can/do meet literally everyone if you want to.

The talks are excellent. The hallway chats even better for staying apprised of current best practices. And since Django is so volunteer dominated much of the changes/improvements to the framework happen during in-person discussions.

So a big thumbs up here for attending if you are able.


If anything, the fact that DjangoCon is smaller than PyCon makes it more approachable in general. It's feasible to actually meet everyone at DjangoCon if you really try, which definitely isn't the case with something as large as PyCon.


The only thing that bugs me with Proton is that it's still very complicated to integrate with thunderbird (or any mail app?), which makes it practically unusable for my needs.

Having a tab always open in my browser for my mail seems so wrong.


Their mobile apps are also very lackluster and devoid of basic features. I understand that they are unable to open up to other mail apps due to the encryption, but for the past few years there have been little to no updates to their iOS suite.


They did a facelift of their iOS mail app six months ago, and again this week.


I agree it's not perfect, but they have some pretty great instructions. I've been using the Bridge with Thunderbird for multiple accounts and it works great.


Yep, me too. I have Thunderbird running in the background essentially as an email backup, but I wind up using the web version more. However, Thunderbird and Bridge work so well together that I forget they're running in parallel.

I'm a recent Google GSuite refugee, so it's hard to break the habit of web-based mail, I suppose.


Genuine question, how is a browser tab different than thunderbird? Besides storing a local copy of mail (which is obviously a huge win), I don't see a big difference. If anything I like the web UI better.

However, for my uses I simply installed proton bridge + apple mail. It just works with all email services I use.


> Genuine question, how is a browser tab different than thunderbird?

Different protocols for one thing - HTTP vs IMAP and/or POP/SMTP.

Each webmail app does things its own way. Webmail conflates the app, the protocol, and the provider. Some people prefer to have mail from different providers in a unified app.


I find it works very well using the bridge on my Manjaro desktop, and was fine on Debian before that.


I have the same experience; for me it works flawlessly with the bridge, but the bridge itself is a complication. I would prefer a straight IMAP option (in addition to the bridge option). My 2c.


But then you wouldn't have E2EE anymore, and that's kind of their flagship feature.

I'm very happy with the Bridge integrated with Neomutt on Fedora Linux.


I agree with that; all I am saying is that I think a fair number of users would be happy with the IMAP access even though it does not provide E2EE benefits, which should not affect those who want to run with the bridge.

I think Protonmail started with a privacy-focused business case (for which E2EE is a key feature), but is now expanding into the broader pool of people who want to pay for an ad-free, no-data-scanning email, but are not concerned about targeted spying and prefer the simplicity of a convenient setup. For that crowd, IMAP would make more sense. My 2c.


I use it with Thunderbird. There is an initial step (you need to set up ProtonMail Bridge) but after that it's seamless. And they have really good instructions for how to do that initial setup.


You just need to install the bridge locally, the rest is similar to what you would do for other email providers. What is complicated about it?


fwiw, there's an unofficial desktop app: https://github.com/vladimiry/ElectronMail


Thanks for stating this. It's still good to have people working on tools that help us get more usable solutions, though.

Depending on who your adversary is (your threat model), I guess Proton's tools can help you protect your privacy.

