
If they’re optimizing full table scans of 20M+ rows, they probably want an optimized column oriented DB or a data warehousing option like Snowflake.


Agreed! PostgreSQL is the wrong tool for large data without indexes. It's also the wrong tool for ultra-low-latency access. And for small, non-persistent data. And and and...

That said, over time PostgreSQL has wildly expanded the range for which it's suitable and if you can and want to bet on the future, it's often a better bet than niche systems.

It's also important to remember that PostgreSQL is decades ahead of other systems in data virtualization and in providing backward compatibility to applications after changes; in pushing down computation near the data (including a world-class query optimizer) rather than moving billions of rows into middleware; in concurrent data access; in data safety and recovery; and in data management and reorganization, including transactional DDL. Leaving this behind feels like returning to the stone age.


>> Wrong tool for ultra low latency access

I'm not sure what you mean by ultra low latency, but it's unfortunate to have to rethink what a tool is good for because of RDS / EBS.


And even if you want to stay in the Postgres ecosystem there's options for you there.


For analytics, use a columnar database.

There are even other AWS Postgres-oriented options (check the pricing first):

ZeroETL from Aurora Postgres to (postgres-compatible) Redshift (Serverless?)


Yup. Even gross abuses of Redshift run fine with appropriate roll-ups and caching. At a past job we did it “wrong enough” that it took a while for a more state-of-the-art solution to catch up. This is not to say the abuse of Redshift should have been done, but AWS has been abused a lot, and the engineers there have found a lot of optimizations for interesting workloads.

But to pick the wrong DB tool in the first place and bemoan it as “not scalable” is a bit like complaining that S3 makes a poor CDN without looking at how you’re supposed to use it with CloudFront.


Is ZeroETL not still in early stages? I heard it replicates everything, with no filtering yet on parts of the binlog (tables/columns). But other than that, I like the idea.

(I would like to know where their ZeroETL originated; usually AWS picks up ideas somewhere and makes them work for its offerings to cash in. A universal replication tool.)


For a mere 20M-70M rows I'd stick with Postgres, indexes, and materialized views.

After that is when I'd start migrating to DuckDB or ClickHouse (or Citus if I don't want to move off Postgres completely).
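
Roughly what I mean by "indexes and materialized views", as a quick sketch (Python with psycopg2; the events table, columns, and view name are made up for illustration, so adjust to your schema):

    # Sketch: narrow the scans with an index, then precompute the aggregate
    # in a materialized view instead of rescanning 20M+ rows per query.
    import psycopg2

    conn = psycopg2.connect("dbname=app")  # connection string is illustrative
    conn.autocommit = True
    cur = conn.cursor()

    # Index the columns the reports filter on.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS idx_events_account_time
            ON events (account_id, occurred_at)
    """)

    # Roll the raw rows up once.
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS daily_event_counts AS
        SELECT account_id,
               date_trunc('day', occurred_at) AS day,
               count(*) AS events
        FROM events
        GROUP BY 1, 2
    """)

    # Refresh on whatever cadence the dashboards tolerate (cron, pg_cron, etc.).
    cur.execute("REFRESH MATERIALIZED VIEW daily_event_counts")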


Came here to say this. If you use a hammer to fasten a screw, it's probably not going to work.


Perhaps cockroachdb or titaniumdb would be a better choice.


I can’t tell if you’re trolling or not, as those are even more terrible options for analytics workloads. You must be.


100% this. Only use containers for remote execution: CI, staging, production. Run native code locally. Manage dependencies and isolate project environments with Nix.


Ugh, what a loss for this community and the Ruby community.

This should be worthy of a Hacker News black banner today.


This resonates with my experience.

You can tell the author is mostly worn out not from long hours or challenging technical work, but from fighting the overwhelming inertia that stands in the way of building/shipping anything. It’s exhausting.


This article is dangerous in the way it romanticizes the “not invented here” culture at many big tech companies, and it seems rooted more in the 90s than in the present day.

The world of open source tooling and easily reusable SaaS offerings means everyone has access to the best tools, whether you’re a small startup or a big company.

Anyone who longs for internal, corporate tooling baffles me, when they could use things that actually have polish, a good user experience, and likely better implementations under the hood.

Companies should spend their time/energy building things unique to their problem domain, not weak also-ran corporate tooling.


The mistake you're making is thinking that the public tools are the best tools.

There are reasons that make them the best for many, if not most, companies: more investment, more mindshare, easier to hire employees with prior experience, and so on.

But there are also costs in having a wobbly stack of glued together stuff, especially if the parts aren't quite right for your goal.

Sometimes the best tool is more focused, more vertically integrated, in a different language or for a different operating system because those choices integrate better with the rest of your stuff.

The constraints of the company are also part of the problem domain. Using the wrong tools can be quite expensive, and the public tools may all be just a bit wrong in a way that compounds.


Where I work, the majority of our tools were built in-house, mainly because we started (2008) before there were good open source or even paid options for most of it all. As good options started to appear, we found that we couldn't adopt them, because there was a mismatch in concepts/fundamentals between what we'd built and what was out there. We've evaluated a lot of things, and for many of them, we end up realizing that integrating them with our systems would require a hard fork, and so we'd lose most of the benefit of using it.

Frankly, it sucks. Our tools are mostly very good, but it took us a long time to get there, and the internal fights over getting funding to really invest in our internal platform have been exhausting for all involved. I get that it's not zero work or zero time to use something off the shelf, but as someone who has been playing in the grass on the other side of that particular fence, it takes a lot of work to keep that grass green.


The internal build tools I had access to at Amazon are unrivaled in the public domain. I DO miss them, at every job.


I'm asking this sincerely: What is/was so great about them?


It was probably some combination of how well-integrated and how opinionated they were.

The build tool was multi-language. It allowed depending on arbitrary packages which had been imported to the Amazon package repository. It allowed package owners to annotate the packages with guidance - experimental, deprecated, forbidden (in the case of security issues), etc. You could also declare conflicts, which would notify consumers at compiletime and force them to resolve in some way. The tool deferred to a number of standard build tools in whatever language(s) you were using; it was just about getting and packaging the dependencies.

When you committed code, a build was submitted to a distributed build system. It would run your build, and then it would run the builds of every package which declared a dependency on your version. If those builds failed, your build failed (so, bump your version or make your change backwards compatible). On completion, it imported an immutable bundle of your artifact + dependencies to the deployment infrastructure.

This is the part I miss - the build stuff was great but I largely find that open source and paid options aren't so bad here. What I miss is how easy it was to manage the journey of a built package to your machines. They had a tool called Pipelines that had a visualization of this progression. Each stage, with associated machines, was linked here. The tool knew how to add stages of environments, each with their own set of deployment configurations. You could set up approval workflows: integration tests, manual approvals, etc. You could feed one pipeline into another. For each artifact, you could configure autobuilds into your pipeline, so that new versions flowed as long as tests/approvals allowed. There was support for sanity checks: if those failed, the tool would automatically rollback. In fact, if any stage fails, your pipeline would block, and you'd be notified. In some cases, a newer build that was fully functional could unblock your pipeline.

Pipelines was a pleasure to use - it really just got the hell out of my way, and nothing I've used since is as simple to integrate with.

And tying this all together, there was a tool that would allow you to initialize the end to end infrastructural pieces needed. You go to a wizard, tell it your language and purpose (webapp, CLI tool, service) and it initializes a repository, package, and pipeline with environments. Get a coffee, come back, and Do Your Job.

I imagine similar things exist at places like Google, but man everywhere else I've worked, so much developer productivity has been lost not even approximating the level of "let me do the interesting stuff" that Amazon provided.
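
If it helps to picture the flow, here's a toy sketch of the stage-promotion-with-gates idea (plain Python, with invented names; this is just the shape of it, nothing like the real internal implementation):

    from dataclasses import dataclass, field
    from typing import Callable, List, Optional

    @dataclass
    class Stage:
        name: str
        approvals: List[Callable[[str], bool]] = field(default_factory=list)
        deployed_build: Optional[str] = None

    def promote(build: str, stages: List[Stage]) -> None:
        # Walk the build through each stage; block at the first failed gate
        # and leave the previously deployed build in place.
        for stage in stages:
            if not all(gate(build) for gate in stage.approvals):
                print(f"{stage.name}: blocked, keeping {stage.deployed_build}")
                return
            stage.deployed_build = build
            print(f"{stage.name}: deployed {build}")

    # Example gates: an always-pass stage, a pretend integration test, then prod.
    promote("build-123", [
        Stage("beta"),
        Stage("gamma", approvals=[lambda b: b.endswith("123")]),
        Stage("prod"),
    ])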


Awesome, thanks for the informative reply!


What's the closest OSS equivalent and how could they be improved?


Just responded to another comment - for the part I miss most, Pipelines, Spinnaker is probably closest (I'm aware of Concourse, etc as well). But Spinnaker is horrific to stand up yourself (especially at my company of 8 engineers), and it's actually too generic. The primary advantage I believe is that the tools were exactly built to work well at Amazon, with Amazon's infrastructure.


Armory (W17) can help with Spinnaker. We offer a managed version for exactly this reason. Probably still way too heavy for an eight-person engineering org, but helpful for larger companies.


> Anyone who longs for internal, corporate tooling baffles me

A previous employer had an all-proprietary stack: language, IDE, version control, deployment system, database, job scheduler. It was, frankly, amazing. The most productive environment I’ve ever worked in. Time-to-market was this company’s competitive advantage and this stack let them leave their competitors in the dust, scrabbling for a distant second place.

> everyone has access to the best tools

It’s been 5 years since I left there, and the FOSS tooling we use at my present company doesn’t come close, and it never will, because choices are driven by the basic assumption that FOSS is always best, and that just isn’t true. We are at least 5 years behind where the previous company was when I left it, and they won’t have stood still in that time.


As a marketer, I always look for no code solutions first. API integrations? Use Zapier. Email form integration? Sumo or similar. Triggers? Google tag manager. Etc. The reason being that the dev team at any company never has time for new projects.


I spent two terrible years in Mountain View for a work opportunity I couldn't turn down, and it was horrible. Our rent went up 20% during that time. It felt completely unsustainable, ecologically and financially, to live there.

We moved back to Boston and have been overjoyed to have fled the peninsula. I'd still love to visit north of San Francisco or south of Monterey on vacation, but they literally couldn't pay me enough to live there.


Hey buddy!

Reading all the design and discussion, I was really curious how you structured things at a brass-tacks storage level.


yoo! https://www.slideshare.net/jdwyah/diy-heroku-using-amazon-ec... does have a bit of a pretty picture, but the basic idea is:

For each rate limit you can choose to be in one of two modes: 1) Redis with a backing store of DynamoDB, aka BestEffort, since there are failure modes where you could lose an update. In this mode everything is expected to happen in Redis, but if we don't find your limit there we check Dynamo. Writes are asynchronously persisted to Dynamo.

2) Token Buckets straight in DynamoDB. This is our Bombproof mode.

(details in https://www.ratelim.it/documentation/safety)

It's worth noting that with either of these you can cache aggressively in the clients whenever the limits have gone over. Both clients (https://github.com/jdwyah/ratelimit-ruby and https://github.com/jdwyah/ratelimit-java) do that for you.
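
For illustration, here's a rough sketch of the token-buckets-straight-in-DynamoDB idea (Python with boto3; the table name, key schema, and attribute names are made up, not the actual ratelim.it layout):

    import time
    from decimal import Decimal

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("rate_limits")  # hypothetical table

    def try_acquire(limit_key: str, capacity: int, refill_per_sec: float) -> bool:
        now = time.time()
        item = table.get_item(Key={"pk": limit_key}).get("Item") or {
            "pk": limit_key, "tokens": capacity, "updated_at": now,
        }
        # Refill based on elapsed time, capped at the bucket capacity.
        elapsed = now - float(item["updated_at"])
        tokens = min(capacity, float(item["tokens"]) + elapsed * refill_per_sec)
        if tokens < 1:
            return False
        try:
            # Optimistic write: only succeeds if nobody touched the bucket
            # since we read it; otherwise the caller can retry.
            table.put_item(
                Item={
                    "pk": limit_key,
                    "tokens": Decimal(str(tokens - 1)),
                    "updated_at": Decimal(str(now)),
                },
                ConditionExpression="attribute_not_exists(pk) OR updated_at = :seen",
                ExpressionAttributeValues={":seen": Decimal(str(item["updated_at"]))},
            )
            return True
        except ClientError:
            return False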


This may be a confusing thing: I suspect Instacart may get a volume discount from stores for bringing in the extra business.

So this markup may be based on the discounted cost Instacart pays, not necessarily what YOU would pay at the store.

Still think it's a crazy premium, but I enjoy grocery shopping.


Can you explain further? Because this makes no sense on its face. If they get a discount on what the grocery store charges, then the final price after their markup should be closer to the store's price, not further.


His insinuation is that the "Instacart receipt" that the customer received is the discounted one that reflects the bulk discount. If an individual went to the store and got the same list of items, it would be more expensive, so the markup wasn't $140 - $80 but more like $140 - $100.

I have no idea if that's true or not.


> So this markup may be based on discounted cost instacart pays, not necessarily what YOU would pay at the store.

Doesn't that just make things even worse?


ezCater | Boston, MA | ONSITE | Fulltime

https://www.ezcater.com/company/about-us/

ezCater is the #1 online marketplace for business catering in the United States – a $21 billion market.

We’re backed by Insight Venture Partners and have been growing 3X per year, and we want to grow even faster. We’re always looking for highly skilled engineers to help build our web and mobile apps, while riding this rocket ship of growth.

At ezCater, technology is valued as a differentiator and as a key component of our success. We push ourselves every day to better the codebase, improve performance, and deliver an amazing customer experience.

Senior Full-Stack Engineer: https://www.ezcater.com/company/apply/?gh_jid=78210

Senior iOS Engineer: https://www.ezcater.com/company/apply/?gh_jid=78582


I've been living in Mountain View for several years, but we're moving back to Boston in a month.

We can get a house 3x as large for 2/3 the cost and have a much better quality of life.


"and have a much better quality of life"

If you like snow.


Much better quality of [everything but weather]

