Why do people host static websites on S3 at all? It really isn't designed for that: it is an object store. Yes: it has a URL structure accessible in a way that makes it look like static hosting, and Amazon caved pretty early to people wanting to use it that way by adding features to make it more reasonable, but it doesn't fix the underlying problem.
Specifically, it is both allowed to--and often does--return 50x errors to requests. The documentation for the S3 API states you should immediately retry; that's fine for an API where I can code that logic into my client library, but is simply an unacceptable solution on the web. Maybe there are one or two exceptions to this, but I have simply never seen a web browser retry these requests: the result is you just get broken images, broken stylesheets, or even entire broken pages. Back when Twitter used to serve user avatar pictures directly from S3 the issue was downright endemic (as you'd often load a page with 30 small images to request from S3, so every few pages you'd come across a dud).
Sure, it only happens to some small percentage of requests, but for a popular website that can be a lot of people (and even for an unpopular one, every user counts), and the random error rate is orders of magnitude higher than anything I've experienced with my own hosting on EC2. It is also irritating because it is random: when my own hosting fails, it fails in its entirety; I don't have some tiny fraction of requests from users all over the world failing.
Regardless, I actually have an administrative need to stop hosting a specific x.com to www.x.com redirect on some non-AWS hosting I have (the DNS is hosted by Route53, etc., but I was left with a dinky HTTP server in Kentucky somewhere handling the 301), and I figured "well, if it doesn't have to actually request through to an underlying storage system, maybe I won't run into problems; I mean, how hard is it to take a URL and just immediately return a 301?", but after just a few minutes of playing with it I managed to get a test request that was supposed to return a 301 returning a 500 error instead. :(
HTTP/1.1 500 Internal Server Error
x-amz-request-id: 1A631406498520D6
x-amz-id-2: hXQ1YXyu0gxaiGITKvcB+P8+tgPsP3UITX/Or4emyjZtaL16ULAyHFx2ROT4QPXY
Content-Type: text/html; charset=utf-8
Content-Length: 354
Date: Fri, 28 Dec 2012 07:19:24 GMT
Connection: close
Server: AmazonS3
This wasn't just a one-time problem either: I set up a loop requesting random files (named using the timestamp of the test run and a sequence number) off this pure-redirect bucket and left it running for a few minutes, and some of the S3 nodes I'm talking to (72.21.194.13 being a great example) are just downright unreliable, often returning 500 errors in small clumps (that one node is giving me a 2% failure rate!!). S3 is simply not an appropriate mechanism for site hosting, and it is a shame that Amazon is encouraging people to misuse it in this fashion.
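For anyone who wants to reproduce this kind of measurement, here is a rough sketch of such a loop in Python; the stub fetcher (failing 2% of the time) stands in for a real HTTP client pointed at the redirect-only bucket, and the key scheme simply mirrors the test described above:

```python
import random
import time

def measure_error_rate(fetch, num_requests):
    """Issue num_requests fetches for unique (cache-busting) keys and
    count how many come back as 50x server errors."""
    failures = 0
    for seq in range(num_requests):
        # Unique key per request: timestamp of the run plus a sequence
        # number, as in the test described above.
        key = "probe-%d-%d" % (int(time.time()), seq)
        status = fetch(key)
        if 500 <= status < 600:
            failures += 1
    return failures / float(num_requests)

# Stub standing in for a real HTTP GET against the bucket: returns a
# 301 (the expected redirect) 98% of the time and a 500 otherwise.
random.seed(0)
stub = lambda key: 500 if random.random() < 0.02 else 301
rate = measure_error_rate(stub, 10000)
```

With a real client you would swap the stub for an HTTP GET against the bucket's website endpoint and log the failing request ids.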
(edit: Great, and now someone downvoted me: would you like more evidence that this is a problem?)
I've been messing with S3 for a new project involving the HTML5 canvas, so there are lots of CORS and canvas security concerns, PUTting objects from the browser, and a desire for low-latency changes.
S3 has not been delivering. Here's a few reasons:
* S3 only provides read-after-write consistency for non-standard regions: http://aws.amazon.com/s3/faqs/#What_data_consistency_model_d... Since moving to US-West-1, we've had noticeably more latency. Working without read-after-write just isn't an option, users get old data for the first few seconds after data is pushed.
* Oh, and the editor for CORS data introduces newlines into your config around the AllowedHost that BREAK the configuration. So you need to manually delete them when you make a change. Don't forget!
* I swear, I get 403s and other errors at a higher rate than I have from any custom store in the past. But this is purely subjective.
Based on all this- I really need to agree with saurik that the folks at S3 aren't taking their role as an HTTP API seriously enough. They built an API on HTTP, but not an API that browsers can successfully work with. Things are broken in very tricky ways, and I'd caution anybody working with S3 on the front-end of their application to consider the alternatives.
I'm moving some things to Google Cloud Storage right now, and it is blazing fast, supports CORS properly, and has read-after-write consistency for the whole service. Rackspace is going to get back to me, but I expect they could do the same (and they have real support).
While you are fixing things, can you please make cloudfront send HTTP 1.1 instead of HTTP 1.0 for 206 (Partial Content) responses to range get requests. It is invalid since 206 is not part of HTTP 1.0, and Chrome refuses to cache the responses, which makes cloudfront terrible for delivering HTML5 media.
I host very small-time sites for a few family members on S3 because it is practically free (pennies per month) and there's more or less zero chance some script kiddie will break in and deface it, as was the case when they were going the traditional "php/wordpress on godaddy" route. EC2 is great, but for hosting tiny non-money-making sites it's way more expensive and maintenance-consuming: a micro comes out to $14 a month and a small to $46 a month. For a site that gets hit a few hundred times a week tops, you're just paying for tons of idle time. A very rare 500 error (I've never seen that before) is not an issue in this case.
We investigated your report of issues with requests, and found that one S3 host was behaving incorrectly. We identified the root cause and deployed a fix. Can you verify that we have fixed your issue?
(I also have this specific high-500 rate as case 81302771, which I humorously did not get an answer to yet; I got a response asking for more information which I provided before going to sleep this morning, but no resolution... yet I switch back to HN and you have responded here? ;P)
I cannot replicate the really-high 500 rate anymore on 72.21.194.13 (the node that was particularly bad). However, I'm still concerned about what caused that: Is it likely to happen again? Why did it only happen to that one node? (In essence: help me trust this system ;P.)
However, what I'm most interested in is whether the "static website hosting" endpoint of S3 (the *.s3-website-us-east-1.amazonaws.com URLs) has different semantics than S3 normally does, so that under "normal interaction" scenarios[1] I can rely on "this will do its best to not return a 500 error, retrying if required to get the underlying S3 blob".
Have you tried setting a redirection rule on your bucket so that when the 500 error occurs, S3 will automatically retry the request? You can set a redirection rule in the S3 console, and I think the following rule might work:
This will redirect all 500s back to the same location, effectively retrying the request. This should cover the random 500 case, though I'm not sure it will work 100% of the time.
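I haven't verified this, but based on the RoutingRules schema used by S3's static website configuration, the rule would presumably look something along these lines (the HostName is a placeholder for your own website endpoint, and whether a Condition can actually match a 500 is exactly what needs testing):

```xml
<RoutingRules>
  <RoutingRule>
    <Condition>
      <HttpErrorCodeReturnedEquals>500</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <HostName>example-bucket.s3-website-us-east-1.amazonaws.com</HostName>
    </Redirect>
  </RoutingRule>
</RoutingRules>
```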
I'm interested by this solution too. I will try to take a look and test it over the weekend.
I think that this solution can only be an improvement though.
I believe not: I'm pretty certain putting CloudFront in front of your bucket is fine (it handles the HTTP layering correctly); this problem is one of attempting to directly host static content from S3 only.
(That said, I have very little personal experience with CloudFront, as in my experience it is more expensive for fewer features with less POPs than using a "real CDN" like CDNetworks, or even Akamai.)
(edit:)
For this specific circumstance, I'm not certain at all what CloudFront's behavior will be; it seems like the "redirect" concept is a property of the "static website hosting" feature of S3, not part of the underlying bucket, and CloudFront "normally" (in quotes, as I just mean the default origin options it provides) directly accesses the bucket.
I thereby imagine that if I simply set a custom origin to the ___.s3-website-us-east-1.amazonaws.com URL provided by the S3 static hosting feature I will get the right behavior (where CloudFront forwards and caches the 301 responses), but then I have no clue if it will correctly retry the 500 error responses.
That said, I will point out that I am not even certain if CloudFront retries the 500 requests anyway: it occurred to me that with a small error rate combined with a cache, if you (as I somewhat did at least) expect the potential fix to be S3-specific, you might simply never really "catch" an actual failing request in a test scenario.
It could then be that CloudFront retries all 50x failures (in which case if I set it up with a custom origin to the S3 static hosting URL you'd still get the retry behavior), but I somehow doubt that it does that (and just earlier I saw two requests in a row to S3 fail for these 301 redirects, so it might not even help).
CloudFront doesn't retry on 500s. Besides, you can't alias to a CloudFront distribution from x.com apex, so I don't think it would work for you even if that were the case.
CloudFront, when used with Route 53 as your DNS provider, can be used for zone apex hosting, as you can place an "ALIAS" record (as opposed to a real DNS CNAME) to the other hostname; this is the same procedure you use to get S3's static hosting feature working with a zone apex and these new instructions today. (That said, I have never done this personally with CloudFront, as again: I do not use CloudFront.)
Nope. Route 53 has to ALIAS to another RR in your own hosted zone. There's no way for Route 53 to return the A records/IP address RDATA that CloudFront uses to direct clients to the fastest site.
Other "ALIAS" providers can't do real CloudFront apex support either. Their intermediate resolvers end up caching the CloudFront records without varying per client subnet.
Interesting! I was reading something in one of the FAQs earlier that seemed to indicate that that works, but now, digging further and reading through the forums, I see that you are totally right: you simply couldn't build this use case with pure-AWS (non-EC2) tools without this new S3 static hosting redirection feature.
(As for non-Amazon DNS with server-side aliasing support, it wouldn't be that bad, for this kind of use case: you are already taking a latency hit by returning the 301, and direct links will never target these URLs as they will have the canonical www. hostname, so if you just end up with an edge node near a geo-ip DNS server near the original user, it will be approximately good enough.)
I spent some time over the past weekend migrating my static-generated blog over to S3+CF and the only problems that I ran into were invalidation and permissions on the bucket. It is likely a result of my lack of knowledge of S3 bucket/CF utilities, but I've been using s3cmd for sync.
Definitely impressed with how quickly it went. I muddled through setting up AWS DNS, S3 and CF through a bunch of blog articles. But it was well worth the time investment.
I'll likely just write a post on my experiences as well once everything is said and done. I haven't had time over the holiday to figure out the AWS bucket policy, but my overall plan is to have a node-webkit shim with a markdown editor for editing posts.
It should be relatively easy and would be a complete win for me, blog-post-wise, especially since my "drafts" would be in S3 themselves.
I'm in the middle of creating a web application, and my plan was to serve static files from S3. Based on your post, it seems like a really bad idea. If the problems are so apparent, I wonder why this is such a generally accepted and recommended approach. One example: the Heroku guide that praises putting static files on S3: https://devcenter.heroku.com/articles/s3
I trust S3 a lot (in fact, there was a time I had >1% of all objects in S3; I have since deleted a very large percentage, but I believe I still have well over a billion objects stored).
I would definitely agree: it doesn't fail under load; for a while I was seriously using S3 as a NoSQL database with a custom row-mapper and query system (not well generalized at all) that I built.
However, this particular aspect is a known part of S3 that has been around since the beginning: that it is allowed to fail your request with a 500 error, and that you need to retry.
This is something that if you read through the S3 forums you can find people often commenting on, you will find code in every major client library to handle it, and it is explicitly documented by Amazon.
"Best Practices for Using Amazon S3", emphasis mine:
> 500-series errors indicate that a request didn't succeed, but may be retried. Though infrequent, these errors are to be expected as part of normal interaction with the service and should be explicitly handled with an exponential backoff algorithm (ideally one that utilizes jitter). One such algorithm can be found at...
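For an API client (as opposed to a browser, which is the whole problem here), the documented handling is simple enough. A minimal sketch of retry with exponential backoff and "full jitter", with made-up function and parameter names, and a stub origin standing in for S3:

```python
import random
import time

def get_with_retry(fetch, key, max_attempts=5, base_delay=0.1, cap=5.0):
    """Retry 50x responses with exponential backoff plus full jitter:
    sleep a uniform random amount up to an exponentially growing cap."""
    for attempt in range(max_attempts):
        status, body = fetch(key)
        if status < 500:
            return status, body
        # Back off before the next attempt (skip the sleep after the last).
        if attempt + 1 < max_attempts:
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
    return status, body

# Stub origin that fails twice with a 500, then succeeds:
calls = {"n": 0}
def flaky(key):
    calls["n"] += 1
    return (500, None) if calls["n"] < 3 else (200, "ok")

status, body = get_with_retry(flaky, "some/object")
```

This is exactly the logic every major S3 client library bakes in, and exactly the logic a browser fetching a static page will never run.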
Regardless, the 2% failure rate on that one S3 IP endpoint is definitely a little high, so I filed a support ticket (I pay for the AWS Business support level) with a list of request ids and correlation codes that returned a 500 error during my "static hosting + redirect" test today. I'll respond back here if I hear anything useful from them.
2% failure rates are excessive, agreed--but why is the requirement to retry on 500 so off-putting? Virtually all APIs have this occur on some level, and you do the exponential backoff song and dance.
What am I missing that makes this such a show stopper with browsers? You can still do the backing off clientside with a line or two of javascript.
Seems like the pros outweigh the cons but I'm probably missing something.
You are still thinking about this as an API, with for example JavaScript and some AJAX. The use case here is zone apex static website hosting: if you go to http://mycompany.com/ and get a 500 error, the user is just going to be staring at an error screen... there will be no JavaScript, and the browser will not retry. As I actually explicitly said multiple times: for an API that is a perfectly reasonable thing to have, but for static website hosting it just doesn't fly.
Oh, I see what you mean: you're concerned about the first bytes to the browser being faulty. Well, that 2% error rate is spread out across the total of requests; the likelihood of a user getting a 500 on his first hit should be significantly less than 2% (but it does seem like it will still be way too high).
Very valid point saurik, thanks for pointing out the extent of the problem. It is a dilemma. Seems kind of silly to have an instance just for the first hit to go through reliably for visitors, goddamit Amazon.
Edit: Wait a minute, maybe this could be solved with custom error pages which I think they support. :P
You're going to need to explain how the requests would differ. If anything I'd expect image files to be more cache-friendly and have fewer visible failures than the critical html files. An image might have a 2% failure rate once or twice plus fifty error-free cache loads, while an html page might have 2% failure every single click.
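To put numbers on this subthread's back-of-the-envelope argument (purely illustrative, assuming independent failures at a fixed per-request error rate):

```python
# Chance that at least one of k independent requests fails,
# given a per-request error rate p.
def page_break_chance(p, k):
    return 1 - (1 - p) ** k

# One HTML page per click at a 2% per-request rate:
per_click = page_break_chance(0.02, 1)

# A page pulling 30 small images (the old Twitter-avatar scenario):
with_assets = page_break_chance(0.02, 30)
```

At a 2% rate, a single-document fetch breaks 2% of the time, but a page loading 30 assets has nearly even odds of showing at least one broken element, which is why the avatar problem felt endemic.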
Interesting. I guess in the case of static web hosting you could use onerror to deal with failed frontend requests to smooth out the broken images from the user perspective. Though as I say, not been a problem for me.
Yeah, for images you can probably deal with that; but what if your JavaScript doesn't load because the script itself was a 500 error, or the entire website doesn't load because of a 500 error... well, you're screwed. The use case here is for zone-apex whole-site static website hosting (either of just canonicalizing redirects or of the final webpage: same issue).
Because it's easy? I have 100 static websites on S3. After initial setup of buckets, it's trivial to update/sync all of these sites using command line tool (I use S3 Sync) with one click on a batch file. And hosting on S3 is cheap.
This is pretty interesting, I'd like to hear more about it. I'd also like Amazon to hear more about it because maybe they could treat web buckets differently or something.
As I've stated elsewhere in this thread, this is documented behavior from S3. I also have billions of objects in S3, and I definitely get back 500 errors. I'm sorry, but even the CTO of Twitpic is not in a position to say "we push infinitely more data than you, so we know better", at least not for S3 ;P.
Honestly, I have to ask: would you know if some tiny percentage of your requests failed with a 500 error? I bet the answer is "no", as the chance that you wrote some JavaScript to look for a condition you probably didn't realize could happen is almost zero. I'd love to be surprised, however ;P.
(That said, as you are hosting "images", at least you could detect it with JavaScript and fix it, so one could thereby imagine a realistic reason why this would not be a serious problem for you; however, I'd argue that you are then treating S3 as an API, not as a static web hosting system.)
I have one bucket that has 3,148,859,832 objects in it <- I got that number from the AWS Account Activity for S3, StandardStorage / StorageObjectCount metric. I apparently make 1-2 million GET requests off of it per hour. Yesterday, Amazon returned a 500 error to me 35 times, or 1-2 per hour.
That's about a 1 in a million chance of failure, but if you are serving 4 billion images out of S3 (assuming you mean # requests and not # objects), then that means that 4,000 of your requests failed with a 500 error. That's 4,000 people out there who didn't get to see their image today.
So, seriously: are you certain that didn't happen? That out of the billions of people you are serving images to off of Twitpic, that you don't have some small percentage of unhappy people getting 500 errors? Again: it is a small chance of failure, but when it happens the browser won't retry.
As I said: "it only happens to some small percentage of requests, but for a popular website that can be a lot of people (and even for an unpopular one, every user counts)" <- websites like ours serve tens to hundreds of millions of users billions of requests... one-in-a-million actually happens.
(edit: Also, I will note that you seem to be using CloudFront to serve the images from S3, which might be a very different ballgame than serving directly out of S3; for all we know, CloudFront's special knowledge of S3 might cause it to automatically retry 500 errors; for that matter, the "website" feature of S3 could be doing this as well, but I have yet to get word from Amazon on whether that's the case... just pulling directly from the bucket using the normal REST API endpoint does, however, return 500 errors in the way they document.)
12:54:47 * saurik ('s [third] sentence still managed to feel a little more confrontational than he wanted, even with the ;P at the end; he was going for more of a funny feel)
A) Do you define "marginal" as "one in a million"? ;P
B) The only reason I opted for "# requests" instead of "# objects" is because it let me put a hard figure on "number of people dissatisfied if you have a one in a million error rate". Let's say you are doing 4 billion image requests per hour (the time scale is actually irrelevant): then at a 0.0001% error rate (which is what I get from S3) then 4,000 users per hour are getting an error.
C) ... you aren't doing S3 static web hosting if you are keeping access logs, as the only people who know about the request are the user's web browser and the server. You can attempt to detect the error in JavaScript on the client, but you can't keep an access log. If you are logging requests made by your server, then the error rate is irrelevant as you can just retry the operation.