Dumb question, but why was OCSP stapling invented when it's the same as short-lived certificates? Have the certificate's expiration date set to a short period, have the CA renew it regularly, and place it on some HTTP server. Then you can have a cron job that downloads the certificate and reloads your server. And since the certificates are short-lived, the CA/browser vendors can mark them as excluded from OCSP checks. All the benefits of OCSP stapling, without the extra implementation complexity.
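A rough sketch of that cron job in Go (the CA download endpoint and file paths are made up; no public CA actually publishes renewed certs at a well-known URL like this today):

    // Hypothetical renewal fetcher: download the re-issued short-lived
    // cert from the CA and gracefully reload nginx. Run it from cron.
    package main

    import (
        "io"
        "log"
        "net/http"
        "os"
        "os/exec"
    )

    func main() {
        // Hypothetical endpoint where the CA would publish the renewed cert.
        resp, err := http.Get("https://ca.example.com/certs/www.example.com.pem")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        pemBytes, err := io.ReadAll(resp.Body)
        if err != nil {
            log.Fatal(err)
        }
        if err := os.WriteFile("/etc/nginx/certs/www.example.com.pem", pemBytes, 0o644); err != nil {
            log.Fatal(err)
        }

        // Graceful reload picks up the new cert without dropping connections.
        if err := exec.Command("nginx", "-s", "reload").Run(); err != nil {
            log.Fatal(err)
        }
    }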
Also, many CAs don't provide a programmatic way to order and download certs. That's another annoying manual step, especially when you have multiple domains not covered under a single wildcard.
Sadly Let's Encrypt does not support wildcards... and support for intranets is controversial to those who want to keep their intranet "totally private". I love the automation: one command to generate a new cert, another command to reload my Nginx.
The issue we've run into with Let's Encrypt is that they have a limit on the number of new (non-renewal) certs in a time period, grouped by registered domain. So, for example, when you have lots of separate groups running sites under a common domain (groupa.example.com, groupb.example.com), you often hit the new-issuance limit and have to wait.
So, I expect existing CAs (and groups like InCommon) will continue to be around to serve large entities.
I don't know how helpful we can be, but we (and other companies like ours) have much higher Let's Encrypt limits. If it'd make your life better, send me an email and we'll help you out.
Wildcard certs should help with that. Also, the limit is only on certs, not on domains. If your process allows for issuing a single cert for 100 domains, that also solves the problem. (There is a limit on SANs per cert AFAIK, but I don't have the exact number.)
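For reference, a single multi-name cert just means a CSR with many SANs; a minimal Go sketch, with made-up hostnames:

    // Build one CSR covering several hostnames via the SAN extension.
    package main

    import (
        "crypto/ecdsa"
        "crypto/elliptic"
        "crypto/rand"
        "crypto/x509"
        "crypto/x509/pkix"
        "encoding/pem"
        "log"
        "os"
    )

    func main() {
        key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        if err != nil {
            log.Fatal(err)
        }
        tmpl := x509.CertificateRequest{
            Subject: pkix.Name{CommonName: "groupa.example.com"},
            // All the names this one cert should cover, up to the CA's SAN limit.
            DNSNames: []string{"groupa.example.com", "groupb.example.com", "groupc.example.com"},
        }
        der, err := x509.CreateCertificateRequest(rand.Reader, &tmpl, key)
        if err != nil {
            log.Fatal(err)
        }
        pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
    }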
If you're dealing with a large scale deployment that involves many endpoints (and not just subdomain-per-user scenario), there are good reasons not to use a wildcard. Other CAs will definitely stay in business there.
Example: Imagine Microsoft giving everyone "*.microsoft.com" just because they have a lot of services to deploy and don't want to deal with separate certs. Now, breaking into that one service means you can MITM Windows updates for anyone, and they can't tell the difference.
You want to limit the exposure of your certificates: you don't want 50+ teams to share credentials/certs (they're effectively public at that point), and you want to make sure that if you need to revoke a certificate right now, you know you're impacting as few endpoints as possible.
To add to this, there's potentially a similar security problem if you have a bunch of systems with different certificates sharing the same TLS session caching backend or session ticket keys. It doesn't allow any one system to impersonate another, but they can perform active and passive network (MITM) attacks.
For server-side caching, some systems now take the SNI hostname into account and use it to prevent cross-contamination. If you're in this situation, it's worth looking into how exactly your backend works.
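In Go's crypto/tls, for example, you can approximate that isolation by handing out a per-SNI config with its own session ticket keys (a sketch; hostnames and key handling are placeholders):

    package sni

    import "crypto/tls"

    // perHostConfig returns a TLS config whose session ticket keys are
    // scoped to the SNI hostname, so one tenant's tickets can't be used
    // to resume another tenant's sessions.
    func perHostConfig(keys map[string][32]byte, base *tls.Config) *tls.Config {
        cfg := base.Clone()
        cfg.GetConfigForClient = func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
            key, ok := keys[hello.ServerName]
            if !ok {
                return nil, nil // no override: fall back to the base config
            }
            c := base.Clone()
            c.SetSessionTicketKeys([][32]byte{key}) // ticket key for this hostname only
            return c, nil
        }
        return cfg
    }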
Is it possible to get a certificate that allows you to sign other certificates on your domain? For example, you call Verisign and they issue a cert for *.mycompany.com, then you use that to sign another certificate for accounting.mycompany.com?
Yes and no. If your company is worth a few million dollars, can get high enough insurance, can get FIPS certified, and can jump through enough hoops, then yes - you can get an intermediate certificate. That product is not even openly advertised by most CAs. If you're just a "normal company", no, you won't get one.
The big problem here is that the intermediate CA isn't really limited in what it can grant: you can issue a valid "google.com" cert just as well as "foo.yourcompany.com". There are X.509 extensions for limiting the scope in this case, but I don't believe they're widely used or validated at the moment.
Being able to issue certificates makes you a CA, in this case a subCA. It requires a particular kind of certificate which has CA:TRUE set, usually this property of a certificate isn't present and it defaults to CA:FALSE.
This used to be not that big of a deal, with hundreds of larger companies having subCAs, but the rules have tightened up considerably. Apple and Google have them under one of the Symantec roots, but a lot of older ones have been deactivated. The reason is that the root CA is responsible for ensuring you're doing as good a job of issuing as they would have, since they're effectively "sponsoring" your issuance, promising you're doing it right, and that costs money. As a rule of thumb, if your organisation doesn't have an actual full-time role sorting out all the certificates, you won't find it cheaper to have any kind of subCA status. "But I spend like an hour every week doing it" is not a full-time role.
The cheapest option, if you do have a subCA, is a constrained subCA that's hosted. In this scenario the CA:TRUE certificate says inside it that it is only valid for certain hierarchies and mustn't be trusted outside them; this is called a "constraint". The root CA has to make sure such a subCA is properly looked after, but they can do this internally themselves, so they'll want it on their server premises, not in your DC or under somebody's desk, where they can vouch for its security.
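To make "constraint" concrete, here is roughly what such a CA:TRUE certificate looks like as a Go x509 template (the self-signed root is generated inline purely for illustration; all names are made up):

    package main

    import (
        "crypto/ecdsa"
        "crypto/elliptic"
        "crypto/rand"
        "crypto/x509"
        "crypto/x509/pkix"
        "encoding/pem"
        "log"
        "math/big"
        "os"
        "time"
    )

    func main() {
        rootKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        if err != nil {
            log.Fatal(err)
        }
        subKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        if err != nil {
            log.Fatal(err)
        }

        root := &x509.Certificate{
            SerialNumber:          big.NewInt(1),
            Subject:               pkix.Name{CommonName: "Example Root"},
            NotBefore:             time.Now(),
            NotAfter:              time.Now().AddDate(10, 0, 0),
            IsCA:                  true,
            BasicConstraintsValid: true,
            KeyUsage:              x509.KeyUsageCertSign,
        }
        sub := &x509.Certificate{
            SerialNumber:          big.NewInt(2),
            Subject:               pkix.Name{CommonName: "Example Constrained SubCA"},
            NotBefore:             time.Now(),
            NotAfter:              time.Now().AddDate(3, 0, 0),
            IsCA:                  true, // CA:TRUE, as described above
            BasicConstraintsValid: true,
            KeyUsage:              x509.KeyUsageCertSign,
            // The name constraint: leaf certs chaining through this
            // subCA are only valid under example.com.
            PermittedDNSDomains:         []string{".example.com"},
            PermittedDNSDomainsCritical: true,
        }

        der, err := x509.CreateCertificate(rand.Reader, sub, root, &subKey.PublicKey, rootKey)
        if err != nil {
            log.Fatal(err)
        }
        pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der})
    }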
A more expensive option is still hosted but without the constraints. In this case the subCA can issue any certificate, but it does so under direct supervision of the root CA operator, often using only IT built by them. So e.g. there's a web site, you sign in, you tell it what certificates you want and then _they_ check that's OK before using "your" subCA to issue. The root CA has to do lots of extra paperwork for this scenario, but it's not so bad because they need most of the same things for their existing business, they get a sort of economy of scale on the audits, lawyers, building security etcetera.
The most expensive is on-premises unconstrained. In this case you have no limits, and so you also have to obey all the same rules as the root CAs do. This means third-party auditors have to visit as witnesses to various routine activities, and their audit reports, including adverse findings, must be submitted to your root CA and to the major trust stores as a requirement of their trust. You will need a dedicated, physically secure building to house the hardware security module of the CA function, with properly trained staff following documented procedures, "No Lone Zone" policies, all that jazz. This can definitely cost a million dollars per year to sort out _before_ you pay anybody to let you do it.
Also, by the way, the "Verisign" you're thinking of, the root CA, was sold to Symantec, who are now selling to DigiCert, and the Verisign roots are to be distrusted (which is part of why Symantec are selling: they destroyed trust in these roots through a poorly managed approach to the business).
They won't ever offer more than 90 days, but certificate lifetimes of about a day would certainly be interesting. I'd switch to 24h-valid certificates as soon as possible.
Ideally even separate certs for every subdomain, but as Let’s Encrypt has cert limits, and I want to avoid SNI in the future, I’ll probably have to use wildcard certs.
I don't trust client clocks (or X.509 notBefore checking) enough to use certs until they've aged at least a couple of days; a week is best, if I can manage it. 24-hour validity would shut out a lot of people with computers or phones that can't keep time (or choose not to).
Certificates are usually issued back-dated by one hour. Most clients on the Internet are correct to within +/- 60 minutes, because of a mixture of small timezone errors, daylight saving being wrongly observed / not observed / not updated, and similar. It is rare for clients to have the wrong date.
Back-dating with a technical rationale (e.g. to work around crap clocks, and historically as a way to hide more entropy near the start of the signed certificate) is accepted in the Web PKI. It is only forbidden to use back-dating to dodge the Baseline Requirements; for example, back-dating to avoid the restriction on SHA-1 after 2015 was prohibited, and is one of the things StartCom / WoSign were caught doing.
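In issuing-code terms the back-dating is just the NotBefore field; a tiny Go sketch (the hostname and lifetime are placeholders):

    package main

    import (
        "crypto/x509"
        "crypto/x509/pkix"
        "fmt"
        "math/big"
        "time"
    )

    func main() {
        // Back-date NotBefore by an hour so clients with slightly slow
        // clocks still accept a freshly issued cert.
        notBefore := time.Now().Add(-1 * time.Hour)
        tmpl := &x509.Certificate{
            SerialNumber: big.NewInt(1),
            Subject:      pkix.Name{CommonName: "www.example.com"},
            NotBefore:    notBefore,
            NotAfter:     notBefore.Add(90 * 24 * time.Hour), // e.g. a 90-day cert
        }
        fmt.Println("valid from", tmpl.NotBefore, "until", tmpl.NotAfter)
    }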
In what sense? An attacker can already see the endpoints of a TLS conversation, and worrying about hostname disclosure is security through obscurity; the client already divulged the destination hostname with a probably-cleartext DNS query, too. Not worth worrying about. SNI is fine. If hostname disclosure is a security threat, the system needs rearchitecture.
Systems that give each customer a hostname (mycompany.example.com) should use wildcards for that scenario instead of SNI, among other reasons. That's the only possible concern I can imagine.
Wildcard certs work only one level deep. If you introduce regions and stages or other dimensions, you would need multiple wildcard certs to cover e.g. svca.teama.region.example.com.
Multi-domain certs help, but then you need to encode all the names ahead of time, and if you miss a name you need to reissue and reapply the cert.
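The one-level rule is easy to see with Go's hostname verification (illustrative names):

    package main

    import (
        "crypto/x509"
        "fmt"
    )

    func main() {
        // A cert claiming only the wildcard name.
        cert := &x509.Certificate{DNSNames: []string{"*.example.com"}}

        fmt.Println(cert.VerifyHostname("teama.example.com"))      // nil: one label matches
        fmt.Println(cert.VerifyHostname("svca.teama.example.com")) // error: wildcard covers only one level
    }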
You still pay for an x-year certificate, but you only get a valid certificate for the next y days. If the CA can sign OCSP responses for millions of visitors, surely they can re-sign a certificate every y days.
> the world was mostly manual in the past (and in a lot of places still is)
That makes sense when talking about OCSP, but to get OCSP stapling working, you need to configure your web server to do so. Instead of standardizing OCSP stapling, why couldn't they have standardized a protocol for a server to get updated certificates from the CA?
> why couldn't they have standardized a protocol for a server to get updated certificates from the CA?
That's essentially OCSP :-)
In general there's not really a concept of an "updated certificate". A certificate is good if the signature matches, the subject on the cert matches the server, and the current time is within the certificate's validity period; otherwise, the cert is bad. If someone steals a certificate, the website fixes this by telling the CA to revoke the certificate and by serving a different valid cert – my old employer kept a second valid certificate lying around in order to minimize downtime in the event of having to kill the main cert. OCSP is a way to effect the killing of the bad certificate. But in the absence of OCSP or some other revocation mechanism, both certs are still good.
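That checklist maps almost one-to-one onto standard chain-verification APIs; e.g. in Go (a sketch; the caller supplies the parsed leaf and trusted roots):

    package certs

    import (
        "crypto/x509"
        "time"
    )

    // certStillGood mirrors the checklist above: the signature chains to
    // a trusted root, the subject matches the host, and the current time
    // is inside the validity window. Note it says nothing about
    // revocation -- that's OCSP's job.
    func certStillGood(leaf *x509.Certificate, roots *x509.CertPool, host string) error {
        _, err := leaf.Verify(x509.VerifyOptions{
            DNSName:     host,
            Roots:       roots,
            CurrentTime: time.Now(),
        })
        return err
    }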
> If the CA can sign OCSP responses for millions of visitors, surely they can re-sign a certificate every y days.
I haven't worked on this stuff in a few years, but historically OCSP responders were notoriously unreliable and often down. That's a huge issue for a security-critical path, because you're forced to "fail open" (which defeats the whole point of having OCSP in the first place) or render a wretched experience for your user. One big reason OCSP stapling exists is to work around CAs' unreliable OCSP responders.
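For context, the query that stapling caches server-side looks like this with golang.org/x/crypto/ocsp (a sketch; cert and issuer are assumed to be parsed already):

    package ocspcheck

    import (
        "bytes"
        "crypto/x509"
        "errors"
        "io"
        "net/http"

        "golang.org/x/crypto/ocsp"
    )

    func check(cert, issuer *x509.Certificate) (*ocsp.Response, error) {
        if len(cert.OCSPServer) == 0 {
            return nil, errors.New("cert carries no OCSP responder URL")
        }
        req, err := ocsp.CreateRequest(cert, issuer, nil)
        if err != nil {
            return nil, err
        }
        // If the responder is down, the caller faces exactly the dilemma
        // above: fail open or fail closed.
        resp, err := http.Post(cert.OCSPServer[0], "application/ocsp-request", bytes.NewReader(req))
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        raw, err := io.ReadAll(resp.Body)
        if err != nil {
            return nil, err
        }
        return ocsp.ParseResponse(raw, issuer)
    }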
Does anyone know if ephemeral/automated cert issuing and renewal exists as an open source project yet? Most of this is Netflix-internal, but I feel like Let's Encrypt has made short-lived certs an inevitability.
Somewhat related: Vault has a PKI backend that can help facilitate this. You'll need to create some tooling around it, but we've had great success rolling it out at my company.
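Issuance against Vault's PKI backend is one authenticated HTTP call; a sketch assuming a "pki" mount and a role named "web" already exist (address, token, and hostname are placeholders):

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "log"
        "net/http"
    )

    func main() {
        body := bytes.NewBufferString(`{"common_name": "svca.example.com", "ttl": "72h"}`)
        req, err := http.NewRequest("POST", "http://127.0.0.1:8200/v1/pki/issue/web", body)
        if err != nil {
            log.Fatal(err)
        }
        req.Header.Set("X-Vault-Token", "dev-only-token") // placeholder token

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        out, _ := io.ReadAll(resp.Body)
        fmt.Println(string(out)) // JSON containing certificate, private_key, ca_chain
    }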
It's part of the Credhub vision to do so[0] (supporting "The 3 Rs": rotate, repair, repave[1]). Pivotal has been sponsoring development.
I was on the Credhub team for a while. When you begin to assume (1) that you have an always-on credentials service and (2) that it can serve multiple sides of credentialling (e.g. a service broker adds a credential, an application fetches it), you get to do more aggressive cred management.
I was on the Credhub team for about 6 months, while it was being developed on both US coasts. It's now based in NYC.
I am looking forward to ACME 2.0 becoming an RFC! Once that happens, I can ping our ACS team to start bugging InCommon to spin up the appropriate server components. My guess is the other CAs are waiting for ACME 2.0 before they really spin up support for it.
I don't know much about ACME 2.0 (except that it is apparently necessary for LE to be able to start offering wildcard certs). Can you expand on why CAs are waiting for 2.0?
Why is certificate management not integrated with DNS? You already have to consult DNS to get an address to connect to, so why not piggyback certificate validity information on top of that? I'd suggest allowing both revocation lists and a way to say that only a specific list of certificates is allowed.
There are some things like that (e.g. DANE), but in the general case you cannot trust DNS, since it isn't authenticated. (DNSSEC is far from everywhere, and even if the resolver does DNSSEC, the connection to the resolver might be unprotected.)
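For the curious, the client-side check for the common "3 1 1" TLSA form (DANE-EE, SPKI, SHA-256) is just a hash comparison; fetching the record over DNSSEC is the hard part and is out of scope in this sketch:

    package dane

    import (
        "bytes"
        "crypto/sha256"
        "crypto/x509"
    )

    // matchesTLSA311 reports whether the server cert's public key hashes
    // to the digest published in an (already fetched and DNSSEC-validated)
    // TLSA record of the form "3 1 1 <hex digest>".
    func matchesTLSA311(cert *x509.Certificate, tlsaDigest []byte) bool {
        sum := sha256.Sum256(cert.RawSubjectPublicKeyInfo)
        return bytes.Equal(sum[:], tlsaDigest)
    }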
How does Netflix handle applications that don't have certificate/key hot reload capability? MySQL is especially guilty of this. It's a PITA to force a restart just to reload certs or keys even once a year. I can't imagine having to do this every few days.
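For comparison, servers that do support hot reload usually do something like Go's GetCertificate hook, which is consulted on every handshake so a cert swap needs no restart (paths and port are made up):

    package main

    import (
        "crypto/tls"
        "log"
        "net/http"
        "sync/atomic"
    )

    var current atomic.Pointer[tls.Certificate]

    // reload re-reads the cert/key pair; call it from a file watcher,
    // a SIGHUP handler, or a timer.
    func reload() error {
        cert, err := tls.LoadX509KeyPair("/etc/certs/server.pem", "/etc/certs/server.key")
        if err != nil {
            return err
        }
        current.Store(&cert)
        return nil
    }

    func main() {
        if err := reload(); err != nil {
            log.Fatal(err)
        }
        srv := &http.Server{
            Addr: ":8443",
            TLSConfig: &tls.Config{
                // Consulted per handshake: new connections immediately
                // see whatever reload() last stored.
                GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
                    return current.Load(), nil
                },
            },
        }
        log.Fatal(srv.ListenAndServeTLS("", "")) // cert comes from GetCertificate
    }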
Companies should start using tools like ManageEngine Key Manager Plus https://www.manageengine.com/key-manager/ or other similar products for secure SSH key and SSL certificate management. Automation is the only way to avoid security issues.
You can use it as a standalone project in your environment as well. There's a talk about it at https://youtu.be/Q_ZhrQq-_YM (ideas very similar to Netflix)