More

wora · on Jan 14, 2019

A large data center has 20-30k machines. Amazon disclosed that back in 2014. Most likely you want to partition the machine into some chunks, so each application or each database only use some of them. So you don't need to talk to many machines at once.

In general, a distributed database can be really fast. DynamoDB claim to be 3ms (AWS reinvent 2014). The current technology can probably achieve close to 1ms, which is equivalent to memcache. With such performance, you don't really need local storage.

You can get much faster IO if you use many local SSDs. The downside is utilization. It is very rare a single machine has a workload that fully utilize local disk. You end up over provision greatly fleet-wise to get high performance. A managed database over a network is more likely to utilize disk/SSD throughput.

wora · on Jan 13, 2019

I saw many comments about stateful workloads. I am not sure it is a necessary issue for cloud environment.

Within a zone or a cluster, the latency is about 1ms, which is faster than most hard disks. The network bandwidth is on par with disk throughput. What we really need is a faster database and a faster object storage that can match the network performance (1ms and 10Gbps), then all workloads can be stateless.

If one uses a VM on GCP, the VM has no local storage besides the small local SSDs. Practically even the VM is stateless besides some cache.

outworlder · on Jan 13, 2019

> The network bandwidth is on par with disk throughput

Yes, and most storage you have access to, in cloud environments, is network attached. GCP disks, AWS EBS volumes, etc. All network and outside the hypervisors. You may have some local storage, but that's ephemeral, by design.

However, since we are talking about Kubernetes: not only VMs are ephemeral, but your containers are ephemeral too! And they can move around. So now you (or rather, K8s) need to figure out which worker node has the pod, and which storage is assigned to it, and then attach/detach accordingly.

This is what persistent volumes and persistent volume claims give you. They actually work fine already for StatefulSets.

Now, if you are in a cloud environment you should look into the possibility of using the hosted database offerings. If you can (even at a price premium), that's a great deal of complexity you are going to avoid.

uluyol · on Jan 13, 2019

With stateless services, forwarding requests to underlying storage and serialization can dominate resource consumption. After all, some services will do little besides fetch the right content and transform the data somehow.

Addressing this requires caching data in memory while making sure those caches are also disjoint so that you fully utilize your cluster memory. This has driven Google (and others) to make some services semi stateful and build dynamic sharing infrastructure to make this easier [1].

[1] https://ai.google/research/pubs/pub46921

lostmyoldone · on Jan 13, 2019

From a database perspective, 1ms to disk is an eternity.

A good disk subsystem had less write latency than that in the early 90’s.

wora · on Jan 14, 2019

Write to disk has no practical latency because of write buffer, either local file system or remote database. Flush to disk would be slow unless you use SSD.

On the other hand, a single machine has limited reliability. If one wants to have high availability, they needs to dual write to another machine, which also has network latency.

emanlin · on Jan 14, 2019

I don’t think that’s true. I recall ~5ms seek times being top of the line.

cat199 · on Jan 14, 2019

he said 'subsystem' not 'disk'-

what was the latency to the controller with ram cache?

using seek time as a measure is also somewhat worst case - controllers/filesystems also queue(d) according to drive geometry.

wora · on Jan 7, 2019

The short answer is layering. For example, to support mutual TLS, you need a system that distributes and rotates TLS certs to different nodes. If we don't want to add to many features to Kubernetes, then we need another layer. That becomes Istio or something like Istio. That is roughly how Istio came to exist in the first place, as it models after several Google internal systems that are on top of Borg instead of being part of Borg.

wora · on Oct 9, 2018

Such policy existed since the very beginning of Google APIs, and is well documented within the company. Anyone who worked at Google should be aware of it.

wora · on July 28, 2018

People often consider to use this information for reliability and performance, but you can do much more with the data. For example, if a method has low latency, you can use short deadline with fast retry to improve reliability. If you see a sudden jump of certain usage, you can consider to use batching and caching to reduce your cost. If you see an unexpected usage of a service, you know someone introduce a new dependency in your system. Google teams often use the same data to understand how large services work and how they are correlated.

Disclosure: I worked on this feature at Google.

wora · on July 3, 2018

The app can post your data to anywhere it has access to. This is commonly known as data exfiltration. The common way to prevent that is to run the app in a secure sandbox. Most OS don't provide such capability in a usable way.

AstralStorm · on July 3, 2018

Android has capability control tied to certain kinds of specific objects such as intents and binder connections. This could be extended to streams and providers (like the one used to read email) and objects created from such streams. Would require some internal API change and to document the change in permissions.

The new permission would mean the app is allowed to send contacts or emails read from database over the network.

semi-extrinsic · on July 3, 2018

How do you run an email app in a secure sandbox to prevent data exfiltration? The app's primary purpose is to send and receive data.

wora · on July 8, 2017

If a node goes down, another node will replace it quickly. The node (aka tablet server) doesn't own any data. The data is stored on lower level storage layer.

wora · on Sept 25, 2015

Besides agility, speed and radar performance are also critical in missile attack and defense. F-22 can super cruise for that reason. The size of F-35 radar is a concern for performance. Given the past experience of F-15 vs Su-27, it is very hard to believe F-35 will be competitive 10 years from now besides the numbers.

wora · on Aug 29, 2015

That is right. gRPC and Google APIs share one simple error model, as defined by https://github.com/google/googleapis/blob/master/google/rpc/.... The goal is to make it easier for developers to handle errors.

PS: I was the co-author of it.

wora · on Aug 18, 2015

There is one here: https://support.google.com/onhub/answer/6279845