Torus development has been stopped at CoreOS (github.com/coreos)
124 points by Perceptes on Feb 19, 2017 | 58 comments


"But we didn't achieve the development velocity over the 8 months that we had hoped for when we started out, and as such we didn't achieve the depth of community engagement we had hoped for either."

Open source is tough, even as a successful VC-funded company. Gotta give credit to CoreOS though: rather than beating a dead horse, they're acknowledging there's little external interest and their time would be better spent focused elsewhere. Seeing as they've discontinued Fleet as well, it's likely they're doubling down on the commercial Tectonic product built on Kubernetes. There's also likely pressure from their investors to start making money, especially as they're well into their Series B and may be looking to raise more.

Distributed file storage is a tough market in itself. Developers get excited about technology but how many need this as opposed to a highly available database? I love the building blocks of distributed systems and understand that Google's technology is built on a layering of tech (Colossus, Spanner, etc) but it seems the world is not yet ready. Everyone is already struggling to understand the complexities of this new ecosystem and how the pieces all fit together.

Again, good move by CoreOS, wish them luck with their commercial strategy.


> Everyone is already struggling to understand the complexities of this new ecosystem and how the pieces all fit together

I believe Google's (and Amazon's) secret sauce is having large sysops and devops teams filled with subject matter experts - and the software's original developers.

Efforts like Torus are trying to build zero-to-low-maintenance turn-key solutions, whilst Google is happy to have tens of full-time distributed storage engineers.


At Google, at least, the SRE teams for infrastructure components are not large, either in total or relative to the huge infrastructure that they are managing.

You could call this operational scalability; it is made possible by decoupling service quality from individual pieces of hardware. The key ingredients are redundancy across relatively large failure domains and automated handling of all foreseeable events. Everything is built around this concept: compute with a fault-tolerant container scheduler, storage on a fault-tolerant file system, checksumming everywhere...

Another enabler is probably keeping things simple at the component level.


I just spoke with some of the CoreOS team two days ago and Tectonic development is definitely being accelerated, which is critical given a few complete blockers in their setup (e.g. the VPC CIDR range in AWS must be 10.0.0.0/16 - whoa there, Tex, that's gonna cause all sorts of conflicts with our internal network and Direct Connect).


And, well, it looks like it was mostly just one guy working on it.


I missed the fleet discontinuation notice. Is there a link somewhere about it?




disclaimer: I work on project Rook.

Yes, building a whole new data path is complicated and takes many years to get right. It's also somewhat of a moving target, with new storage technologies appearing on the scene (like SMR drives, NVMe, and persistent memory).

It would be much more effective for the community to coalesce around a common data path, just like we coalesce around kernels. Something like Ceph is a great start: it's battle-tested and has storage vendors (like Intel, Samsung, SanDisk and others) updating/optimizing it for new kinds of storage.

The focus on simplicity and integration into cloud-native environments is critical, however, and Torus' vision was spot on. Kudos to the CoreOS team for raising the bar on this.


I have karma to burn on this, so here goes:

I worked for several years in VFX/HPC. 30k+ CPUs and 15 PB of storage.

Firstly, with storage it's very rare that people want actual block storage (unless you are hosting VMs, but that's so 2007...). Yes, I know, OpenStack, but that's just fucking horrific; seriously, just use netboot and be done with it. I've seen people do it inside new clustering systems, but it's really not fun to do, especially if your consumer is prone to disappearing without warning. (fsck is a terrible mechanism for fast recovery.)

Most apps, unless they have bought into "shove everything over HTTP and pay the penalty", want a POSIX file system to store anything of importance. (Yes, yes, a database, but where is that writing the data to?)

Now, there are three ways you can do this:

o Use a clustered file system

o Use NFS (with or without a clustered filesystem underneath)

o Fuck about with iSCSI/SAS/FC and dynamically map block devices.

Using a clustered filesystem spread over many clients is begging for trouble, mainly because one client can fuck it up for everyone. Some FSs are dynamic and sexy, but they have a habit of fucking up in new and interesting ways that even the authors can't figure out.

The common ground is having storage nodes attached directly to a pack of big fat disks (for streaming IO) or NVMe/SSDs for random IO. They then serve out NFS traffic. Now, you can either have a clustered file system underneath, or not. (Having stand-alone servers can be advantageous, if you can map your filesystem out hierarchically.)

Now, unless you have a storage area network, the last option is just begging for shit performance. You really don't want IO traffic fighting with network traffic. However, if you want raw throughput, this is the way to go, but be warned: you won't get any friendly help if you accidentally disconnect a disk.

Basically, Kubernetes/HPC and storage is a solved problem *ducks* no really, just map in NFS shares and be done with it. If it's exotic, it's probably going to fail hard, and in ungoogleable ways. More importantly, only a few people are going to be able to help, and they may or may not still be employed at your company.
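
For the Kubernetes case that really can be as boring as it sounds: define a PersistentVolume backed by an NFS export and have pods claim it. A minimal sketch using the Go client types, where the server, path and size (nfs.example.com, /export/scratch, 500Gi) are made-up placeholders, not a recommendation:

    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func main() {
        // A cluster-wide, read-write-many volume backed by a plain NFS export.
        // Any pod on any node can mount it; no distributed storage layer needed.
        pv := &corev1.PersistentVolume{
            ObjectMeta: metav1.ObjectMeta{Name: "shared-scratch"},
            Spec: corev1.PersistentVolumeSpec{
                Capacity: corev1.ResourceList{
                    corev1.ResourceStorage: resource.MustParse("500Gi"),
                },
                AccessModes: []corev1.PersistentVolumeAccessMode{
                    corev1.ReadWriteMany,
                },
                PersistentVolumeSource: corev1.PersistentVolumeSource{
                    NFS: &corev1.NFSVolumeSource{
                        Server: "nfs.example.com",
                        Path:   "/export/scratch",
                    },
                },
            },
        }
        // In real use you'd submit this with client-go (or the equivalent YAML);
        // printing it keeps the sketch self-contained.
        fmt.Printf("%+v\n", pv)
    }

From there a PersistentVolumeClaim plus a volumeMount in the pod spec are the only moving parts, which is roughly the point: nothing exotic to debug.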


I find your offhand dismissal of clustered filesystems, on which literally every supercomputer relies, to be a little strange. They might not have worked well for you, but "googleable" is not the bar that is generally set for the HPC problem space.


Let me expand:

What I was angling at is that unpartitioned clustered filesystems are fragile. Dirty nodes cause lots of problems. A storage area network which then interfaces via another means is much easier to look after, for little/no performance hit (it can be a lot faster, because there is less coordination/chatter).

Would I use Lustre for long-term storage, with a 100% uptime requirement? No. Would I use it as a linearly scalable scratch space? Yes.

Would I use GPFS (or whatever it's been rebranded to) instead? Yes, probably. If I were hosting lots of small (<10 TB dataset) disparate apps, would I put them all on the same namespace? Fuck no.

Would I wire all my clients directly into the same clustered namespace? Not if I could avoid it.

The issue is this: in HPC, or any other multi-node scheduler-based system (basically mainframes, but without the documentation or the error checking), nodes die in new and interesting ways. If you have shared memory, then it's surprising how well interesting problems propagate.

In most HPC scenarios you can, if you are desperate, stop and restart from a known good point. If you are serving public things, you don't get that option. So it's in your best interest to partition.

Now, as for the googleable bar: unless everyone running the cluster is intimately familiar with the filesystem, including the interesting ways it's fucked with VFS, how the metadata server handles stale locks, sudden bursts of lost clients or whatever, you need Google. Even if you are a master, you might forget.


Why isn't a fourth option being explored: local storage with async replication? Seems like it'd be fairly simple and fast, and no worse than non-clustered NFS regarding data integrity.

I'm just talking from ignorance, so am I missing something?


Oh yes, sorry I assumed that.

The simplest storage is a bunch of dumb servers (well, beefy dumb servers) with some application-aware scripts to move/copy the dataset.

A place I worked at had a wrapper around rsync that would split up the directory and spawn multiple rsyncs to do a parallel copy.

The directory structure was effectively copy-on-write, so backup to the nearline was <15 minutes.
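
A rough sketch of what that sort of wrapper looks like, spawning one rsync per top-level directory; the source/destination paths and flags here are placeholders, not the original scripts:

    package main

    import (
        "log"
        "os"
        "os/exec"
        "path/filepath"
        "sync"
    )

    func main() {
        // One rsync per top-level directory, run in parallel.
        // A real wrapper would also bound concurrency and retry failures.
        src, dst := "/data/project", "nearline:/backup/project"

        entries, err := os.ReadDir(src)
        if err != nil {
            log.Fatal(err)
        }

        var wg sync.WaitGroup
        for _, e := range entries {
            if !e.IsDir() {
                continue
            }
            wg.Add(1)
            go func(dir string) {
                defer wg.Done()
                cmd := exec.Command("rsync", "-a", "--delete",
                    filepath.Join(src, dir)+"/", dst+"/"+dir)
                cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
                if err := cmd.Run(); err != nil {
                    log.Printf("rsync %s: %v", dir, err)
                }
            }(e.Name())
        }
        wg.Wait()
    }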


I wonder if you could have something better, closer to streaming replication of databases. A few weeks ago I found zrep, which sounds more like what I had in mind: http://www.bolthole.com/solaris/zrep/
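
Under the hood that style of replication is just incremental ZFS snapshot send/receive, which is what zrep wraps. A hedged sketch of a single replication step, shelling out to zfs and ssh (the dataset, snapshot names and host are made up):

    package main

    import (
        "log"
        "os"
        "os/exec"
    )

    // replicate ships the delta between two snapshots to a standby host.
    // It assumes both snapshots exist locally and the remote dataset has
    // already received "prev".
    func replicate(dataset, prev, cur, remote string) error {
        send := exec.Command("zfs", "send", "-i", dataset+"@"+prev, dataset+"@"+cur)
        recv := exec.Command("ssh", remote, "zfs", "recv", "-F", dataset)

        pipe, err := send.StdoutPipe()
        if err != nil {
            return err
        }
        recv.Stdin = pipe
        recv.Stdout, recv.Stderr = os.Stdout, os.Stderr

        if err := recv.Start(); err != nil {
            return err
        }
        if err := send.Run(); err != nil {
            return err
        }
        return recv.Wait()
    }

    func main() {
        if err := replicate("tank/data", "rep-0001", "rep-0002", "standby.example.com"); err != nil {
            log.Fatal(err)
        }
    }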


I believe this is exactly what's needed.


> Firstly with storage its very rare that people want actual block storage (unless you are hosting VMs, but thats so 2007.....)

What do you mean by this? A 1:1 remote mapping to a physical block?


In addition to Rook https://rook.io/ , which CoreOS mentions and we need to add, please take a look at the other cloud-native storage options listed on the CNCF cloud native landscape: https://github.com/cncf/landscape

Disclosure: I'm executive director of CNCF, and co-author of the landscape.


Do you have any details about running Rook on Kubernetes? The Rook docs link to an outdated document about running Kubernetes on CoreOS.


The folks behind Rook at Quantum have put together an operator (custom K8s controller and TPR) for Rook: https://github.com/rook/rook/tree/master/demo/kubernetes


Thank you. Would you share with us a quick overview of the differences?


I called it here, right at the initial announcement:

https://news.ycombinator.com/item?id=11816951

This is just too hard of a problem to solve frivolously. Kudos to CoreOS for trying, and coming to the inevitable conclusion sooner rather than later.


This is good news - CoreOS needs to focus on what's most important to their core business to be successful.

Being chock full of bright, relatively young and enthusiastic engineers drunk on the Golang kool-aid, CoreOS runs a very real risk of getting distracted by reimplementing everything under the sun in their favorite shiny new language.

Even if Torus is a good idea, CoreOS has to prioritize, commit, and execute. They can't afford too many diversions. This is a competitive space, their opportunity window and runway are both limited, as usual.


I kind of wonder if there will ever be a Kubernetes operator built for Ceph (not Rook on top of Ceph). Besides being a bit of a PITA to maintain, Ceph is about as good as it currently gets for OSS distributed object storage. If an operator could kill much of the operational overhead, they might have a serious winner on their hands. Note that I'm just referring to the radosgw bits for the S3-style storage API, not the POSIX filesystem bits.


There is some discussion on this in the ceph-docker project - https://github.com/ceph/ceph-docker/issues/472

Interestingly there is an unannounced project by CoreOS for a storage Operator that will handle Ceph, Gluster, etc. I'm sure we'll hear more about that now that Torus has been retired.


Source for the unannounced project, or just overheard in person from someone? I don't see anything obvious on their github but it could be private.



Dangit, I trust the CoreOS team more than a lot of other people in the space. Torus would have been so useful.

At the other end of the spectrum, though, maybe this is reasonable? As a developer, my first thought for "I want my own S3" is not etcd (strong consistency) but projects like https://github.com/minio/minio , or even eventually consistent SQLite replication/synchronization tools like https://github.com/gundb/sqlite .

So that makes me ask about rook.io too: what layer of the "stack" is it trying to fit into? Obviously pretty low, but that also seems unnecessary (and part of why I suspect Torus is stopping).


At a glance, Torus was intended to be a distributed file system. As I understand it, distributed file systems are easier than distributed block systems, but harder than distributed blob systems.

A blob system is all-or-nothing. You create or replace the entire blob at once. This makes bookkeeping and replication much easier for the implementer.

A filesystem supports much richer semantics, including the ability to seek within files and modify small regions of them. You need a lot more mechanics to maintain consistency across a network.

A block store is difficult because you're trying to work at very high speed on very small units of state whooshing back and forth willy-nilly. You don't get to rely on any of the higher semantics provided by a filesystem or blobstore, since you're pretending to be a magical hard drive.

I am often wrong in these matters, as an interested outsider, so I'd be happy to receive correction.
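
To make the contrast concrete, here is a rough sketch of the two kinds of interface as I understand them; the names are invented for illustration, not any particular system's API:

    package storage

    // A blob store is all-or-nothing: you replace the whole object, so the
    // replicas only have to agree on which version of each key is current.
    type BlobStore interface {
        Put(key string, data []byte) error
        Get(key string) ([]byte, error)
        Delete(key string) error
    }

    // A file (or block) store allows small writes at arbitrary offsets, so
    // the replicas have to agree on the ordering of many tiny, possibly
    // overlapping mutations; far more bookkeeping to keep consistent.
    type RandomAccessStore interface {
        ReadAt(name string, p []byte, off int64) (int, error)
        WriteAt(name string, p []byte, off int64) (int, error)
        Truncate(name string, size int64) error
    }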


Disclosure: I work on the OpenEBS project.

Torus intended to build distributed block storage that is container native. Metadata management using a key-value (KV/etcd) approach increases the complexity and is not new; Ceph tried it.

OpenEBS uses a novel approach: Linux sparse files for managing the blocks of a volume. It is a fork of Rancher Longhorn. The problem of managing metadata for large-scale distributed block storage is solved more easily through the management of files rather than blocks.

https://blog.openebs.io/torus-from-coreos-steps-aside-as-clo...
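
For the curious, the sparse-file trick itself is simple: the backing file is created at the volume's full logical size but only consumes disk space for the blocks actually written, so the local filesystem does the extent bookkeeping. A minimal sketch (the file name, size and offset are illustrative, not how OpenEBS actually lays out its volumes):

    package main

    import (
        "log"
        "os"
    )

    func main() {
        // Create a backing file for a 10 GiB volume. Truncate allocates no
        // data blocks, so the file stays sparse until blocks are written.
        f, err := os.Create("vol-0001.img")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        const volSize = 10 << 30 // 10 GiB logical size
        if err := f.Truncate(volSize); err != nil {
            log.Fatal(err)
        }

        // Writing at an arbitrary offset materialises only that region;
        // which extents exist is tracked by the filesystem, not by a
        // separate distributed metadata service.
        block := make([]byte, 4096)
        if _, err := f.WriteAt(block, 5<<30); err != nil {
            log.Fatal(err)
        }
    }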


What I don't get about the new efforts at container-native storage: if you decide to build a new container-native storage system, why would you aim for block storage and not file storage?

Block storage is not exactly a great fit for containers as you can't access its file systems from multiple hosts and fail-over is a hassle (forced remount, fsck).


Container-native storage has two aspects: one, the storage that the container itself uses (like Docker's Device Mapper, Overlay2, etc.), and two, the persistent storage that the applications inside containers need.

Both of them need to be truly container native. Portworx is attempting LCFS to provide container-native storage for the containers themselves: https://github.com/portworx/lcfs. So you are right: you would need container-native storage (a file system for Docker) for running containers.

OpenEBS is targeting containerized storage (persistent block storage) for applications in containers. OpenEBS builds a storage volume as a container and presents the volume container as part of the K8s pod. This way the storage persistence problems are handled by the K8s orchestration intelligence written for application pods.

Of course, OpenEBS containers will use LCFS when it is ready.


File systems are much harder than block storage due to the large amount of metadata and the consistency/durability requirements for metadata. So a lot of systems just build block storage and format ext4 on top or put NFS+ZFS on top (but that's not very distributed).


Working on a distributed file system (Quobyte), I'd say the order of complexity is file > block > object. Why? Because a drop-in replacement for a local file system also needs to be good at high-performance concurrent block IO, otherwise you wouldn't be able to run applications like databases.


So am I wrong in thinking FS semantics can give some hints on how to optimise activity?

That is: a pure block store must treat every operation with identical care, as it has no insight into the higher-level meaning. But a filesystem can, for example, give consideration to whether it's a write, read, create or appending op; where in the file it's happening; the order of operations; metadata vs file contents and so on.

Again, I am only distantly familiar with any of this.


"block store must treat every operation with identical care"

Yes, and because of that it's easier to do block storage than a filesystem. But sure, filesystems have enough semantics to potentially scale across multiple data centers and probably be performant enough for many apps to tolerate those latencies. However, the level of complexity would be similar to that of Google's Spanner.


Thank you.


Does anyone have experience with https://rook.io/ which is mentioned in the message?


Could anyone more familiar with the situation give context around the decision and what is happening moving forward?

Thanks


Nothing of value was lost, don't worry about it. People reinvented the wheel, then realised it takes a lot more than they are capable of. Same old, same old.


I can't speak to the value of Torus, but in this case the best wheels are proprietary and their design is secret. Reinventing them is reasonable.


Anyone looking at openEBS.io? It is open source scale-out block storage for containers.



Those of us in the game for some time ultimately read "beta" to mean "30% chance of survival", so this doesn't come as a surprise, but it probably isn't true for our more optimistic colleagues.

I think more tempered marketing would have really helped.


For reference: https://coreos.com/blog/torus-distributed-storage-by-coreos....

"Releasing today's initial version of Torus is just the beginning of our effort to build a world-class cloud-native distributed storage system..." 2016-06

"Torus development has stopped on Core OS" 2017-02

Simply pointing out what is the case.


I understand they don't hand out Internet points for this sort of thing anymore: https://news.ycombinator.com/item?id=11816821


Note that CoreOS has not given up on the concept of distributed storage; they just gave up on writing their own. So they haven't proved you right.

I realize reliable block/file storage isn't "cloud native" but legacy apps require it and they are willing to spend billions to have it.


What does "cloud native" mean here? To me it suggests purpose built much as Torus was /is - though the OpenEBS engineers are asserting there are really only two "container native" storage solutions going, their open source project and PortWorx. https://medium.com/@kiranmova/persistent-storage-for-contain...


As with all buzzwords, it means what people want it to mean. Many people say that cloud-native apps should use only ephemeral storage (probably because they don't provide reliable storage).


Of course people want it, but can they have it? The world has yet to see a successful distributed block project.


"Of course people want it, but can they have it?"

Why not? Block storage is a weird beast, targeting legacy apps unable to run in multiple data centers. The same apps are also likely to be willing to lose some of the most recent data in case of a data center outage and trade this for performance, so the barrier is already low. The storage might only need consensus somewhere close by, where latency is very good and the network capacity is huge. Nodes in other data centers could receive data asynchronously (otherwise write latency is going to render the whole thing useless anyway). The question is whether there really are billions to be paid for something that ultimately cannot do well compared to proper distributed solutions.


Well that seems eminently reasonable


What's Torus and why should I care?


The next question that needs to be answered at CoreOS: "Why, exactly, are we maintaining our own Linux distro when the Go binaries that we're writing can mostly ignore userspace?"


For one, CoreOS auto-updates smartly, so you can install and forget.


Ah.... Hahah... Hahhahahahhahahahahahahahah.

No.

During the 1 year I ran CoreOS in production, updates were turned off, because they caused all sorts of issues.

The only reliable way of doing updates in CoreOS is to replace the machine and reconfigure it. But then you need to automate joining etcd, which itself is a major pain in the ass.
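
The joining part can at least be scripted against the etcd API; a hedged sketch using the v3 client, assuming the replacement node's peer URL is known at provisioning time (the endpoints and addresses are placeholders):

    package main

    import (
        "context"
        "log"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        // Register the replacement machine with the surviving cluster before
        // it starts etcd, so it can boot straight into "existing cluster" mode.
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"https://10.0.0.10:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            log.Fatal(err)
        }
        defer cli.Close()

        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer cancel()

        resp, err := cli.MemberAdd(ctx, []string{"https://10.0.0.23:2380"})
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("added member %x; cluster now has %d members",
            resp.Member.ID, len(resp.Members))
    }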


You don't deserve the downvotes. When Docker arbitrarily changes something important and pushes those changes to Docker Hub, CoreOS is dragged along for the ride.

We had our auto-updating servers move to Docker 1.10 over a weekend. Of course, this brought down our CI/CD process because that version of Docker changed something important. Our staging environment was totally horked, but our production environment survived due to an unexplained reboot lock. We were lucky.

Turn off auto updates.


There were Linux distros that could upgrade across releases twenty(!) years ago. There is so much opportunity in infrastructure right now, so it strikes me as weird that CoreOS took a bunch of VC money to go off on an extended Linux-From-Scratch-Adventure.



