Beginners guide to gateways and proxies

user5994461 · on April 26, 2020

Wish developers would call these load balancers and stop trying to misappropriate words that designate something else.

A gateway is a network device that gives access to other networks (usually the router giving you internet). A proxy is an intermediate system used to access, filter and cache requests, usually HTTP requests to the internet. It is not to be confused with a reverse proxy / load balancer, which doesn't do that.

Of course all of these are related pieces of network equipment, often managed by the same guys, so go figure what tickets about "device" are really about.

edit: Also, it should be returning 401 instead of 403. This reminds me I need to make a post about the proper use of 401 and 403.
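For anyone unsure of the difference, here is a minimal sketch of the usual reading of the HTTP spec: 401 means "I don't know who you are, authenticate first", while 403 means "I know who you are and the answer is still no". The `PERMISSIONS` table and the `status_for` helper are made up for illustration and are not taken from the article.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical permission table, for illustration only.
PERMISSIONS = {"alice": {"/admin"}, "bob": set()}


def status_for(user: Optional[str], path: str) -> HTTPStatus:
    """Pick a status code for a request to a protected path."""
    if user is None or user not in PERMISSIONS:
        # Missing or unrecognised credentials: the client should authenticate.
        return HTTPStatus.UNAUTHORIZED  # 401
    if path not in PERMISSIONS[user]:
        # Identity is known, but this user may not access the path.
        return HTTPStatus.FORBIDDEN  # 403
    return HTTPStatus.OK  # 200
```

So an anonymous request gets 401, and an authenticated user without the right permission gets 403.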

JoeAltmaier · on April 26, 2020

Yes, it's a sad thing that as anything matures, the technical terms get washed out by the tide of amateurs entering the field. The wrong error codes get used; the wrong names get put on things (e.g. Chrome is an operating system instead of a desktop). We old geezers try to stand against the waves, but get washed away.

game_the0ry · on April 26, 2020

Hey old geezers - please keep old geez-ing. As a clueless amateur myself, I appreciate it when my elders set me straight. I need your wisdom.

Please don't be discouraged or lose hope in us, especially when our own hubris gets the better of us. We just don't know any better ... yet.

koffiezet · on April 27, 2020

Something like an "API gateway" does a lot more than load balancing, and sometimes it doesn't even do any load balancing. Are you still going to call it a load-balancer? Gateway is a much better description.

In web application frameworks you also have a 'router' concept, and "router" is also a network thing. Just because the network device happens to route traffic around doesn't mean the router in a web application can't route requests to the correct code.
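To make the web-framework sense of "router" concrete, here is a toy sketch. The `Router` class, its `route` decorator and the `/square` path are invented for illustration and do not come from any particular framework.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[], str]


class Router:
    """Toy request router: maps (method, path) to a handler function."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, str], Handler] = {}

    def route(self, method: str, path: str) -> Callable[[Handler], Handler]:
        # Decorator that registers a handler for one method/path pair.
        def register(handler: Handler) -> Handler:
            self._routes[(method.upper(), path)] = handler
            return handler
        return register

    def dispatch(self, method: str, path: str) -> str:
        # Look up the handler; fall back to a 404 message if none matches.
        handler = self._routes.get((method.upper(), path))
        if handler is None:
            return "404 Not Found"
        return handler()


router = Router()


@router.route("GET", "/square")
def square() -> str:
    return "square handler"
```

No packets are being forwarded here, yet "router" is still a fair name: it decides which piece of code a request ends up at.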

jrockway · on April 26, 2020

What are y'all using for an egress gateway? I haven't been able to find one that meets my needs. Typically I use Envoy, because it supports OpenTracing and Prometheus metrics, which is what I want from an egress proxy. (I also like circuit breakers for external APIs; if Slack is down, there's no point in sending them another request for a while and it should just fail fast.) But Envoy does not handle dynamic HTTPS proxying particularly well (you can't just set $http_proxy in your environment and have it Just Work; you would need to create a cluster and route for every upstream), so it doesn't work too well for the egress gateway case. (Things like Istio claim to provide this, but they actually just configure Envoy to act as a TCP proxy and can't extract your OpenTracing identifiers or response codes. So you can get metrics like "you sent 2398473 bytes to 13.249.46.27" but not "100% of requests to api.slack.com timed out". It's something, but not useful enough to be worth the configuration overhead for me.)
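For readers who haven't met circuit breakers, the fail-fast behaviour can be sketched roughly like this. This is a generic illustration, not how Envoy implements it; the `CircuitBreaker` class, its parameters and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Upstream was recently failing: don't even try.
                raise CircuitOpenError("upstream marked down; failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open the circuit
            raise
        self._failures = 0  # any success closes the circuit again
        return result
```

Wrapping each outbound call in `breaker.call(...)` means that once the upstream has failed `max_failures` times in a row, later calls raise `CircuitOpenError` immediately until the cool-down has passed.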

I have also looked at some of the traditional egress proxies, like Apache Traffic Server. It seems like it was invented before anyone ever heard of distributed tracing or metrics, and doesn't support those. So it doesn't work for me.

My ultimate goal is to configure every application I use to make all outgoing requests through a proxy, and then use network policies to physically prevent them from talking to any namespace that isn't the egress proxy or something it's authorized to talk to. That way, there can be no unexpected (and unaudited) communication with the outside world; if some container I use starts including a coin miner or a traffic sniffer or something, I'll know about it and can shut it down. But... nobody seems to care except me. So I might have to write my own, which is a pain.

therein · on April 26, 2020

I am biased because I contributed actively to Apache Traffic Server when I worked at a past employer.

It has its quirks. I played a major role in migrating our company from ATS4 to ATS5 and then ATS6.

There are many developers eager to help, especially if you're hosting or able to show up at the ATS summits.

Envoy has native middleware support but as far as I know you have to build the middleware with the server.

With ATS, you can use TS API or atscppapi and create shared objects that are dynamically loaded middleware.

We had considered Facebook's proxygen at one point too, but we already had so many middleware plugins targeting ATS that it didn't make much sense.

However, the hooks that they provide match almost 1-to-1, so it would not be that challenging.

Even to this day, my past employer uses ATS in their PoPs. And then at the next layer of reverse proxies, and then the layer after that.

So imagine PoP (Level 0) to pick the right colo -> Level 1 to pick the right service cluster -> Level 2 to front the service instances, and all of these are ATS. After that you get to the origin servers, which are discovered by a service discovery plugin loaded into ATS, just like at all the higher levels. Back then each level discovered the downstream hosts over ZooKeeper; I later migrated that system to Consul, so it's Consul now.

For some context, we were probably handling 8000qps on each ATS, with extremely heavy use of middleware plugins to do everything from auth to dispatch to spam filtering with dynamic rulesets pushed from internal hosts. I benchmarked ATS with and without plugins. With no plugins, our configuration could push over 80000qps.

We had ~36 PoPs, 20-30 instances at each PoP. That's around 8.6Mqps handled with ATS. Part of this was realtime bidding for ads [0], and the rest was actual user traffic, part of it static content, part of it dynamic.
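As a back-of-the-envelope check of that figure, using only the numbers quoted above (36 PoPs, 20 to 30 instances each, roughly 8000qps per instance), the aggregate works out as a range whose upper end matches the 8.6Mqps estimate:

```python
pops = 36
qps_per_instance = 8_000

# Aggregate throughput at the low and high end of the instance count.
low = pops * 20 * qps_per_instance
high = pops * 30 * qps_per_instance

print(f"{low / 1e6:.2f}M to {high / 1e6:.2f}M qps")
# prints "5.76M to 8.64M qps"
```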

We had our own CDN but also had Akamai on the side to fall back to. AFAIK from my conversations with people at these summits, iTunes is behind ATS. Comcast also pipes your unencrypted traffic through an ATS instance of their own with some custom middleware.

[0] - https://support.google.com/google-ads/answer/6366577?hl=en

jively · on April 26, 2020

Would be good to include [Tyk](https://Tyk.io) in there; as far as OSS gateways go, it's one of the more popular ones (caveat: I'm the founder) :-)

notamy · on April 26, 2020

That pricing page REALLY bothers me in that it's "starts out free" -> "contact us for details" and no other details :<

jively · on April 27, 2020

People confuse our pricing page with OSS. Tyk Open Source is batteries-included, so you get 100% of the functionality free of charge. We charge only for the stuff that companies like: GUIs, RBAC, SSO etc. This is all delivered by a closed source product, and to get pricing for that, we want to talk to you.

We don’t hide the ball. We invest heavily in ensuring you can do everything you need to do in the OSS version. We ensure that there’s always an OSS way of doing things with Tyk; we don’t strong-arm you into buying.

We do need to feed ourselves, so for the proprietary stuff, we need to be competitive, that’s just how the market works.

antman · on April 28, 2020

The market has prices out in the open (mostly)

jlei523 · on April 26, 2020

Seconded! Companies that intentionally leave out pricing are automatically blacklisted for me. It shows that the company is not transparent.

pj3677 · on April 26, 2020

Thanks! I’ll check it out and update the post.

pj3677 · on April 26, 2020

The accompanying GitHub repo with example is here: https://github.com/peterj/square-service-gateway

wyclif · on April 26, 2020

This is a helpful post technically, but the English could use some work. I'm guessing the author is Eastern European and not a native English speaker. It would be a good idea to run the body text through spell check ("service" instead of "ervice", &c) and edit accordingly.

The first time you run the above command, it can take a bit as Docker needs to download

"It can take a bit of time as Docker needs to download" is proper English. Also, in the title, you need an apostrophe after "Beginners."

JoeAltmaier · on April 26, 2020

Fellow pedant! It should be apostrophized thusly: "Beginners'"