They may have decided that ZT's encryption isn't proven well enough for their needs. It's also possible they rejected ZT because they didn't want to use ZT's centralized infrastructure. Two years ago was long before ZT started working on making that optional.
ZT allows you to run your own "Moons", meaning you don't need their infrastructure... a bit more config required on the client end, but less reliance on ZeroTier...
Moons are going to be deprecated soon and, as far as I understand, they never actually worked the way people wanted. I.e. they still needed ZT's root servers, even if you were running your own controller.
By contrast, with Nebula you run your own root(s) (lighthouses), and you don't need a controller because the important config (IP, group, hostname) is signed by the same CA.
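Very roughly, the idea looks something like this. This is only a conceptual sketch using Go's standard ed25519 package, not Nebula's actual certificate format; the `HostIdentity` struct and its field names are made up for illustration:

```go
// Conceptual sketch only (not Nebula's real certificate format): the CA key
// signs a host's identity (name, overlay IP, groups), and any node that holds
// the CA public key can verify that identity offline, with no controller.
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/json"
	"fmt"
)

// HostIdentity is a hypothetical stand-in for the fields a certificate
// would bind together.
type HostIdentity struct {
	Name   string   `json:"name"`
	IP     string   `json:"ip"`     // overlay address, e.g. 10.10.0.5/16
	Groups []string `json:"groups"` // used for group-based filtering rules
}

func main() {
	// The CA key pair lives with whoever issues certificates.
	caPub, caPriv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	host := HostIdentity{Name: "web-1", IP: "10.10.0.5/16", Groups: []string{"webservers"}}
	blob, _ := json.Marshal(host)
	sig := ed25519.Sign(caPriv, blob)

	// Any peer that trusts caPub can check the identity locally.
	fmt.Println("identity verifies:", ed25519.Verify(caPub, blob, sig))
}
```

Because verification only needs the CA public key, the lighthouses don't have to be trusted controllers; they just help peers find each other.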
The moon terminology will also go away, since there will no longer be a difference between these and our core roots. They'll all just be roots and will be interchangeable. The use of a common underlying key/value store will allow ZeroTier to keep its unified namespace and its ability to easily join anyone's network or communicate with anyone, regardless of which roots they're using (as long as their roots are on the same global network as yours... obviously you can't hop air gaps).
We are about to make that much easier and also allow true infrastructure federation -- you won't have to have your nodes contact our roots directly. We came up with something very interesting.
As for crypto: we plan some improvements in 2.0, but note that these days easily >90% of the traffic over ZeroTier networks (or any other VPN / overlay) tends to be already encrypted via SSH, SSL, etc. Another layer of encryption in the overlay just provides some additional defense in depth. We're rapidly moving to a world where everything layer 3 and above encrypts everything. That's why we have not prioritized sexier crypto for our L2 overlay tech. It's a bit redundant.
This is really interesting news!! Ryan was at GopherCon 2018 and was talking casually about his pet '20%' project with some of us. Happy to see it finally released as open source. Great work, Ryan! His off-the-record remarks really made me change my mind about the Slack engineering team in general. Before that, I was always cursing them about their Electron client.
I feel like I don't totally follow how you would set this up for, say, a company that has infra in two cloud providers (but no office network or datacenter or anything)... I think the answer is you set up one or more lighthouses with stable IPs on the public internet, and you make sure all your ephemeral cloud machines have IPs on the public internet? And all your ephemeral cloud machines get RFC-1918 addresses that are effectively in a giant flat subnet with no broadcast / no L2 domain and no implied structure?
It feels a little different from WireGuard, in that with WireGuard your engineers would be able to connect from behind a NAT, but my reading of how it works is that machines route directly to each other. Which is good for a production network where you care deeply about routing (bandwidth, latency, costs, debugging, etc.), but it seems that here your engineers would still need to connect to a bastion host or something, i.e., it isn't a VPN in the sense of being able to join the corporate network directly.
I guess if you've also got the lighthouse node internally routable by all your machines (e.g. you have an internal datacenter network and something like AWS Direct Connect) it would work too?
I think the answer is that your lighthouse(s) are the only machines that need publicly routable IPs. Your ephemeral cloud machines get any RFC-1918 address you want, with any subnet you want.
Engineers would have Nebula set up on their laptops with a configuration that knows about your lighthouse(s)' static IP(s). They use the lighthouses for meeting other nodes, UDP hole punching, etc., but otherwise every connection is peer to peer.
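To make that concrete, here's a rough sketch of what a node does, in Go, with a made-up wire format. It isn't Nebula's actual protocol, and the addresses are examples; it's only meant to show the lighthouse's role as a rendezvous point:

```go
// Rough sketch of the rendezvous idea from a node's point of view (made-up
// wire format, not Nebula's real protocol). The node tells the lighthouse,
// which has a static public IP, where it is, asks where a peer is, and then
// sends UDP directly to that peer's public endpoint, which is also what
// punches the hole in its own NAT.
package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	lighthouse, _ := net.ResolveUDPAddr("udp", "198.51.100.10:4242") // example static public IP
	conn, err := net.ListenUDP("udp", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// 1. Register: the lighthouse records the source IP:port it sees, which
	//    is this node's public (NAT-translated) endpoint.
	conn.WriteToUDP([]byte("REGISTER 10.10.0.5"), lighthouse)

	// 2. Query: ask where overlay address 10.10.0.9 currently lives.
	conn.WriteToUDP([]byte("WHERE 10.10.0.9"), lighthouse)
	buf := make([]byte, 1500)
	n, _, err := conn.ReadFromUDP(buf)
	if err != nil {
		log.Fatal(err)
	}
	peer, _ := net.ResolveUDPAddr("udp", string(buf[:n])) // e.g. "203.0.113.7:4242"

	// 3. Send directly to the peer. If both sides do this at roughly the
	//    same time, most NATs will start letting the traffic through.
	conn.WriteToUDP([]byte("HELLO from 10.10.0.5"), peer)
	fmt.Println("sent direct packet to", peer)
}
```

After that, traffic flows peer to peer; the lighthouse is only involved in discovery.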
NAT traversal sounds like a thing I very much don't want to deal with for a production network, instinctively. It's fine for video games with friends but I've seen enough stuff go wrong with even normal networks that I wouldn't want to trust it. If this is what Slack is actually doing, I'd be very curious to hear how it's working out for them and how they debug network outages.
(Which is why I suspect it's not and the readme isn't clear)
Yeah, after doing a few years of VoIP I pretty much learned the same. Yes, there are multiple methods of traversal, yes they are sound in theory. Yet stuff breaks all the time on consumer routers.
Network virtualization seems to have been extremely slow to be adopted. Even companies pushing "cloud first" seem to be running their physical networks like it's 2003.
When your Kubernetes falls over, you can ssh to it and run commands like it's 2003 (or 1993). When your network falls over, not so much. We deprecated our network virtualization at $work and have been way happier for it.
Also, the end-to-end principle argues for putting complicated logic in the endpoints and making the network boring. See also: TCP is implemented at the endpoints and just requires network infrastructure to drop packets sometimes. You could imagine a congestion control protocol implemented on each router on the Internet, but it would be much more fragile and also much harder to deploy changes to.
Part of the reason for the end-to-end argument is to enable more clever (or, at least, more purpose-designed) functionality to ride on top of the dumb network. So e2e would suggest (to my reading at least) that you keep the "real" IP layer dumb and flexible, and do the fun stuff in overlays, which is what this is.
It's also mostly about two endpoints rather than N. The later, follow-on Blumenthal & Clark paper is essentially a long list of 'it's complicated's from end-to-end-principle analysis.
Consul Connect is under MPLv2, which is a perfectly reasonable license unless you want to do shady things. There may be other differentiators, but this is not one.
I don't think the BSL is perfect. We're thinking about, and discussing with a number of people, potentially better licenses that would be closer to traditional FOSS while still preventing "SaaSification" and the like. I think we're in the early stages of a renegotiation of the open source social contract and I don't think we've figured out the best model yet.
The AGPL is close but suffers from two problems: (1) it isn't perfect either and has numerous loopholes, and (2) there are a ton of companies out there with an irrational but nevertheless very entrenched phobia of anything associated with the GPL (as we have discovered). Maybe something a bit like the AGPL but not GPL branded would work.
I'm not judging, I'm simply relating a fact: Nebula is MIT licensed, and ZeroTier is BSL'd; a paid license is required to use ZeroTier in a closed-source application.
Yes, that's intentional. It used to be GPL which imposed the same requirement, but we shifted to BSL because it's a bit more explicit and because of (again, irrational) GPL-phobia on the part of some non-trivial subset of corporate users.
BTW the closed-source restriction in the BSL is effectively the same as the GPL's, and the only other meaningful restriction is on direct SaaS monetization. Companies can still run ZT for free, including behind the scenes. It's a lot like the AGPL.
The guys on Linux Unplugged interviewed the developer in their latest episode: https://linuxunplugged.com/329
Starts at about 28:20. He explains more of the why and how.
WireGuard is a VPN, and Nebula is an overlay network (also known as a service mesh). They are closely related concepts.
VPNs are primarily used for remote access, to get random machines access to closed IP networks. Service meshes synthesize a new network (sometimes IP, sometimes something else) to connect a bunch of related machines, almost always with policy controls for who can talk to what, usually cryptographic.
It would be weird (but not "wrong") to use a service mesh to get developer laptops access to staging Postgres.
It would be weird (but not "wrong") to use WireGuard to connect an application server to its Postgres instance.
WireGuard is a much tighter and more limited design, intended for integration directly into operating system kernels, with a strong emphasis on performance. Nebula is a much more ambitious design; it includes direct DNS support, certificates, and server infrastructure. WireGuard is a few thousand lines of very carefully written C code; Nebula is a typical Go project.
Why do you think it is "weird" to use WireGuard to connect an application server to a DB instance?
(Backdrop: I have recently moved our various prod servers into a WireGuard-based VPN to encrypt the traffic between them. I found it easier/more pragmatic to do this than:
* to set up SSL for my DB
* to figure out how to encrypt traffic between my application server and Redis or my application server and Nginx
)
I like WireGuard and wouldn't blink at a client proposing to use it to create a secure network fabric for their deployment environment, but it is not the norm for people to do stuff like this; in K8s land, this is what service meshes like Istio do, and more generally this is what people use overlay networks for. WireGuard could form the basis of an overlay network, if you added the same bells and whistles Nebula has. But I don't think Jason plans to add those bells and whistles himself, because that's not really WireGuard's charter.
Like WireGuard, Nebula uses the Noise Protocol Framework[1], but it seems that Nebula uses a CA (certificate authority) to tie together the peers in the same Nebula network[2].
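For anyone curious what using Noise from Go looks like, here's a minimal two-party handshake sketch with the github.com/flynn/noise package (a Go implementation of the framework). This is illustrative only and not Nebula's actual handshake code; Nebula's handshake additionally carries the CA-signed certificates so each side can verify the other's identity:

```go
// Minimal Noise IX handshake sketch with github.com/flynn/noise.
// Illustrative only: Nebula's real handshake also exchanges CA-signed certs.
package main

import (
	"crypto/rand"
	"fmt"

	"github.com/flynn/noise"
)

func main() {
	cs := noise.NewCipherSuite(noise.DH25519, noise.CipherAESGCM, noise.HashSHA256)

	// Each peer has a long-lived static key pair (Nebula ties these to certs).
	initKey, _ := cs.GenerateKeypair(rand.Reader)
	respKey, _ := cs.GenerateKeypair(rand.Reader)

	initiator, _ := noise.NewHandshakeState(noise.Config{
		CipherSuite:   cs,
		Pattern:       noise.HandshakeIX,
		Initiator:     true,
		StaticKeypair: initKey,
	})
	responder, _ := noise.NewHandshakeState(noise.Config{
		CipherSuite:   cs,
		Pattern:       noise.HandshakeIX,
		StaticKeypair: respKey,
	})

	// -> e, s : initiator's first message (the payload could carry its cert).
	msg1, _, _, _ := initiator.WriteMessage(nil, []byte("initiator hello"))
	payload1, _, _, _ := responder.ReadMessage(nil, msg1)

	// <- e, ee, se, s, es : the responder's reply completes the handshake.
	// Both sides now hold a pair of CipherStates, one per traffic direction
	// (error handling and direction bookkeeping omitted for brevity).
	msg2, rcs1, rcs2, _ := responder.WriteMessage(nil, []byte("responder hello"))
	payload2, ics1, ics2, _ := initiator.ReadMessage(nil, msg2)

	fmt.Println(string(payload1), "/", string(payload2))
	_, _, _, _ = rcs1, rcs2, ics1, ics2
}
```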
I get a certificate error for `www.noiseprotocol.org`. It turns out they're serving a certificate for `noiseprotocol.org` instead. The URL is still valid without `www.` [0].
It sounds like encryption was a necessary but not a sufficient requirement for Nebula.
In addition to the VPN itself, Nebula added traffic filtering and the ability to span different clouds and data centers. I don't think WireGuard had those as goals.
They serve very different purposes. I use WireGuard to encrypt my mobile traffic but I wouldn't have picked it to connect the various hosts in my network at work. Nebula, however, might do the trick.
It feels like their issues would have been solved by a service mesh using e.g. Consul or Istio. If so, I'd wonder whether writing a tool from scratch was the right use of engineering time. Anyway, as an engineer, I'd certainly have found this a fun project. Kudos to Slack for trying something new and open sourcing it.
Not entirely, as it only really allows stuff that's running in that service mesh's world to connect to the network.
But they want a global VPN for _everything_, including laptops. This means some level of access control.
What I like here is the use of lighthouses to allow external nodes to punch in and discover the rest of the network, something that is very difficult to do if you are relying on a service mesh in an unknown and unconnectable network.
The thing that immediately stands out is the routing. It looks like cjdns is a traditional-ish multi-hop network. The DHT routing table allows you to map out a route to peer A via peers B, R, & D.
What WireGuard and Nebula allow is for the underlying network to figure out most of the routing, effectively creating a massive point-to-point network. Whilst you can have concentrators/gateways, the idea is that most of the traffic goes directly from peer to peer. This can reduce load considerably.
I think cjdns allows arbitrary peering, so you can certainly set up a full mesh if you want point-to-point traffic, with multiple hops only for cases where the underlying network topology requires it.
Can nodes communicate with each other directly even if they're behind NAT, without port mappings or UPnP?
I know there are ways to make this happen (e.g. using the techniques from Samy Kamkar's pwnat/chownat), but am not sure whether Nebula is designed to work within this constraint.
Having two nodes communicate with each other when you have a cooperating third-party server (a lighthouse or discovery node) that isn't behind a NAT isn't hard. That's what STUN servers and other forms of UDP hole punching accomplish.
pwnat is notable because it doesn't require a public STUN-like server, but Nebula already assumes there are public servers, so traversing NAT is a non-issue.
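The rendezvous-server side is conceptually tiny. Here's a sketch in Go, again with a made-up wire format rather than Nebula's actual lighthouse protocol, just to show why this part is the easy bit:

```go
// Sketch of the public rendezvous ("lighthouse"/STUN-like) side, with a
// made-up wire format rather than Nebula's real protocol: remember the public
// IP:port each node's packets arrive from, and hand that endpoint to peers
// that ask for it.
package main

import (
	"log"
	"net"
	"strings"
)

func main() {
	addr, _ := net.ResolveUDPAddr("udp", ":4242")
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// overlay address -> last observed public endpoint
	endpoints := map[string]*net.UDPAddr{}

	buf := make([]byte, 1500)
	for {
		n, from, err := conn.ReadFromUDP(buf)
		if err != nil {
			continue
		}
		fields := strings.Fields(string(buf[:n]))
		if len(fields) != 2 {
			continue
		}
		switch fields[0] {
		case "REGISTER": // "REGISTER <overlay-ip>"; from is the node's public endpoint
			endpoints[fields[1]] = from
		case "WHERE": // "WHERE <overlay-ip>"; reply with the peer's public endpoint
			if ep, ok := endpoints[fields[1]]; ok {
				conn.WriteToUDP([]byte(ep.String()), from)
			}
		}
	}
}
```

The hard, flaky part is everything after this: whether the two NATs involved actually let the direct packets through, which is exactly the consumer-router pain described upthread.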
The readme says "Discovery nodes allow individual peers to find each other and optionally use UDP hole punching to establish connections from behind most firewalls or NATs".
In practice, I didn't see any code that implements it, but I didn't look too hard.
Sounds like a service mesh. How is this any different from Istio/Linkerd? This library may be useful, but the stated problem it seeks to solve is hardly a unique one.
To me it reads more like Nebula is a VPN solution, with end-to-end encryption and security groups baked in.
To my understanding, a service mesh does not establish a common VPN-like network, but assumes it's there already. Nebula and service meshes both provide authentication, end-to-end encryption and role-based access control. A service mesh can do more than Nebula: beyond "security group"-like filtering, it can, for example, shift traffic between services.
However, I might be mistaken. Any corrections are more than welcome.
That being said, the code is full of TODOs and other comments indicating that shortcuts were taken which should be fixed later. I would be worried about running such a thing in prod given the criticality of its function. At best you risk performance issues under load, and at worst you could have significant security issues allowing unintended traffic in or out.
I wonder if they tried ZeroTier. It really sounds like what they wanted.