As a sysadmin, any time I read some variation of "At some point even data centers may become a thing of the past," I know the author doesn't know what they're talking about. In fact it has provided much joy through laughter (followed by the required sysadmin scotch) at the show "Silicon Valley", which obviously parodies the issue. Datacenters aren't going anywhere, and this strange fascination among hipster-hackers with instant uber-decentralization pushes concerns me, because it ignores some of the more real (and fixable) issues at hand, like DNS centralization, in favor of magical "p2p (+blockchain) will save us all" thinking not backed by much real-world practical implementation.
Don't get me wrong, I'm a darknet, meshnet supporter. I love decentralization. That said, I support establishing the infrastructure required to support it independently of end-user devices; for security and other purposes it's at least possible they should remain separate, and devs shouldn't assume so much right to CPU cycles.
So in essence, the topology I think is preferable would properly be called decentralized-distributed.
Of course that's part of the reason I support things that go against that common grain, such as IPv6 NAT.
I genuinely don't understand why you would want IPv6 NAT. It seems like a bog-standard firewall, except that instead of being able to configure rules about incoming traffic, everything is blocked and you're stuck with that.
Maybe somebody who knows more about this topic could explain it?
Explain? The only reason I can think of is that under IPv6 everything could have a dedicated IP, making NAT useless. Is there any other reason to have NAT, though? Single IP exposure, security, that kind of thing?
> One of the hopes for IPv6 was that it'd deliver us from IP address scarcity and hence the need for NAT and all the associated difficulty of NAT traversal.
My response:
> This is what I thought -- why would you need Network Address Translation (NAT) when the address space is big enough to have everyone have an individual address? No need to traverse a router, it just becomes another hop in the chain.
Because you might not want to expose your internal addresses to the wider internet for a variety of reasons: security, variability, compatibility with IPv4 clients/subnets, etc.
NAT is not a security mechanism, nor is it a necessary part of routing ("variability"). At the very best, it's a band-aid: mostly cosmetic, providing psychological comfort.
But it provides some privacy. Identifying users solely by IP doesn't work well with IPv4. Even if you have a subnet in IPv6, you're still the only one using it.
Band-aid again. Identifying users by IP was never a reliable technique; and with most services, I'm still the only human using my public IPv4 address, NAT or no NAT.
If you're worried about privacy, you have a whole raft of identifying mechanisms to cope with before the IP address even appears on the horizon as an identifier, and there are tools to obfuscate your source address; NATing it away only makes things more complicated without a commensurate increase in privacy (worse, it provides a false sense thereof).
I'm not sure this makes sense -- it sounds like security through obscurity. If your address is reachable, it is exposed. Not allowing people to enumerate the devices on your network is one thing, but if you're accessible through your router (internal port forwarding/DMZ, whatever), then you're exposed whether NAT happens or not.
If your router is "just a node" in between (that happens to have a monopoly on the address/nodes it's in front of), it just needs to disallow access to the endpoint that is behind it -- almost like purposefully leaving it out of the routing table. It's not like people are going to be able to guess your IPV6 address easily, and even if they could, the proper way to deal with that is to just ensure you're actually secure (and no ports are open, forwarding isn't being done at all, etc)
Am I missing something here? The way things work now, yes, you can't find out what IP addresses are on the private network behind a router, but if any ports on the router are open, you know SOMETHING is responding (whether the router or something else). In the IPv6-no-NAT world, the router could just forward all your traffic (pretending the router was the originating computer) and NOT allow any traffic that attempts to hit the IP of the computer behind it. Someone still needs to know the internal address before they can try to access that internal computer -- and you can still stop it at the router if you wanted to...
What am I missing? I'm not a networks expert, but from my understanding of networking/NAT/firewalls/security/etc., I can't see how IPv6 increases your exposure that much.
IPv6 address space is so huge that every internal Internet-connected host should receive discrete internal and external prefixes, the latter associated with your AS and the former unannounced and with no global routing.
Internal servers, routers etc don't receive the external prefix and so can't be enumerated from the Internet.
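To make the two-prefix idea concrete, here is a minimal sketch using Python's ipaddress module. The prefixes are placeholders (ULA space and the IPv6 documentation range) rather than real allocations; a production network would use a prefix from its RIR allocation as the external one.

```python
import ipaddress

# Placeholder prefixes: fd00::/8 (ULA) stands in for the unannounced internal
# prefix, and 2001:db8::/32 (the IPv6 documentation range) stands in for the
# globally routable prefix announced via your AS.
internal_prefix = ipaddress.ip_network("fd12:3456:789a::/48")  # never routed globally
external_prefix = ipaddress.ip_network("2001:db8:abcd::/48")   # announced to the Internet

internal_host = ipaddress.ip_address("fd12:3456:789a::10")

# Internal-only servers and routers get addresses solely from the internal
# prefix, so there is nothing globally reachable to enumerate:
print(internal_host in internal_prefix)  # True
print(internal_host.is_global)           # False: ULA space has no global routing
```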
Just like I don't get 100 IPv4 addresses on my phone, I don't get a block of IPv6 addresses. There is nothing I can do about it. IPv6 changes nothing in that regard, effectively.
Except that with IPv6, your telco could delegate you an entire /64, or potentially less and turn your phone into just another router in the network.
Just asked my friend who is on Verizon: when he is tethered to his phone, his laptop/computer gets a full globally unique IPv6 address; every device that is tethered gets one. So it already exists.
> Just like I don't get 100 IPv4 addresses on my phone, I don't get a block of IPv6 addresses. There is nothing I can do about it.
Sure you can. Do the thing the people whose ISPs give them zero IPv6 addresses do -- use a tunnel broker. Or a VPN provider that gives you a real IPv6 address block.
Isn't this a security thing? I'm not anything close to a security expert, but I always understood that it was best not to have your personal computer exposed directly to the Internet. Not to mention just having control over my own network.
Also, my ISP only gave me one IPv6 address. I'm OK with that.
How do you cope with something like gossip in Consul when you have a
host <-> NAT <-> internet <-> NAT <-> host
setup?
The last time I tried, gossip would send a UDP packet from something other than port 8302, so left-to-right would work OK, but the NAT would assume some state there and treat right-to-left as a reply rather than a fresh connection, and stuff would get lost.
Although, it has been a couple of years and I'm probably misremembering the details.
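For what it's worth, the failure mode described here is the classic UDP/NAT state problem, and the usual workaround is to send from the same port you listen on, so the NAT installs a single mapping that both directions can reuse. A minimal sketch with plain sockets (this is not Consul's actual implementation; the peer address is made up, and 8302 just mirrors the WAN gossip port mentioned above):

```python
import socket

GOSSIP_PORT = 8302
PEER = ("203.0.113.10", 8302)   # hypothetical public address of the far side

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("0.0.0.0", GOSSIP_PORT))   # listen *and* send from 8302
sock.settimeout(5)

sock.sendto(b"ping", PEER)            # outbound packet keeps source port 8302,
                                      # so replies match the same NAT mapping
try:
    data, addr = sock.recvfrom(1500)
    print("reply from", addr, data)
except socket.timeout:
    print("no reply (nothing listening at the placeholder peer address)")
```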
Debugging NAT is, well, unpleasant. IPv6 has a whole lot more address space than IPv4; it obviates the need for NAT. I think the standard deal is a /64 for your router. You can roll addresses every half hour forever, if you want to.
NAT is just a bunch of glued-together heuristics about how things should work. It's not fun to debug. I mean, you have 2^32 internets' worth of space to assign to machines. Everything is so much simpler without NAT.
I dunno. Maybe I'm an idiot. Maybe NAT is super cool and I'm just jaded. IMHO it makes things very difficult. But I'll defer to the majority, whether I want to or not.
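The arithmetic behind the /64 claims above is easy to check; a quick sketch:

```python
# One /64 holds 2**64 addresses, i.e. 2**32 copies of the whole IPv4 space,
# and rolling a fresh address every half hour never comes close to running out.
IPV4_SPACE = 2**32
slash64 = 2**64

print(slash64 // IPV4_SPACE)          # 4294967296 IPv4-internets in one /64

addresses_per_year = 2 * 24 * 365     # one new address every 30 minutes
print(slash64 // addresses_per_year)  # on the order of 10**15 years of addresses
```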
One of the hopes for IPv6 was that it'd deliver us from IP address scarcity and hence the need for NAT and all the associated difficulty of NAT traversal.
This is what I thought -- why would you need Network Address Translation (NAT) when the address space is big enough to have everyone have an individual address? No need to traverse a router, it just becomes another hop in the chain.
I wish one of these "next-generation technology" blog articles would mention Named Data Networking[1]. I think the real problem with the web is the IP protocol itself, not any higher-level protocol or particular way of using IP.
From a very high level: The Web should be built on top of a broadcast protocol, not a point-to-point protocol. If it were, many complex issues would be solved.
Zhang, Floyd and Jacobson agreed with this in the mid-90s, proposed using hierarchical multicast with caches, and spent a lot of energy trying to get someone to listen.
I'm glad to see this mentioned, but I'm not sure named data networking is based on a 'broadcast' model. I think it is closer to anycast behavior, with packet delivery gated by security mechanisms to ensure delivery only occurs to an authenticated 'requester'.
Sorry, I didn't intend on saying that it was based on broadcast. I should've been more precise in saying that it enables the flow of broadcast data (e.g., streaming from one source to many destinations) much, much better than point-to-point protocols like IP.
So there are a number of issues that are not really answered here.
1) The cloud is a costed model: you pay a company to look after your stuff so you don't have to manage it yourself.
2) The cloud _is_ more expensive, but not compared to hiring your own infrastructure people (depends on scale, of course).
3) Distributed networks of things are much, much slower than centralised ones, save for a few exceptions.
4) Trust. Even though your stuff should be encrypted, spaffing it out in the open is a large risk, because should your old keys leak, it'll be trivial to retrieve old data.
5) Money. Who on earth is going to buy, run and maintain this infrastructure? What's in it for them? Who organises development of tools, patching, etc.? Who do I call when there is an outage?
In short, for this to work, it would require a 180-degree culture shift away from capitalism.
Also physics, to make globally distributed systems fast, reliable and consistent all at once.
But if you are a medium-sized internet-based business looking after fewer than 30 instances of things, the cloud is the way to go.
And I don't mean just using EC2 instances; that's just terribly expensive. The payoff is in using managed services like RDS versus running your own HA DB cluster.
If you have an HA DB cluster with a predictable usage pattern (most fall into that category), it should be cheaper to run it yourself (with the employees needed for it) than to use RDS, at most sizes. That doesn't mean having your own data centre, just renting some dedicated servers somewhere.
The additional work after setting it up isn't that much more than for the cloud where you still have to monitor uptime and load.
Hand-waving the initial setup period away hides a substantial up-front cost (it is quite hard to learn how to safely, properly install a large database stack, especially if you're unfamiliar with it), and a substantial ongoing cost (updates etc.) that is often eliminated or ameliorated with cloud-hosted options.
And the additional work after setting it up is much greater. You have to monitor uptime/load/etc. regardless, sure, but when running your own hardware you also need to monitor the uptime and load of other things: internet connectivity to your servers, electrical power, human access to servers/physical security, and so on. Especially for a small site, those are quite expensive; their cost decreases a bit when you're running or leasing a lot of hardware, but it's still quite high.
Why do "cloud" advocates always jump to the extreme opposite: you manage everything yourself down to mowing the lawn in front of your own custom built DC.
You can buy and colo, or even rent dedicated hardware. I've never heard of a DC that didn't offer a remote-hands service for hardware issues that require a physical presence.
So I'm not a cloud advocate. For the present company we use the cloud because we, as a company do not have the skills or the ability to run our own infrastructure properly.
We have three physical datacenters.
In terms of cost for us, it's about 3-8 times more expensive, side by side, AWS vs hosted tin.
The problem is this: raw metal is much, much faster and more efficient, but the cost of the tooling to make it HA, scalable and monitored is far greater than the cost of hosting in AWS (specifically, migrating the ~3000 instances from 6+ config systems onto a homogeneous platform).
Having worked for a beautifully integrated, high-scale (36k servers, 15PB storage) company, where everything was real steel and well monitored and automated, it makes my heart bleed to pay through the nose for shitty VMs with even shittier storage.
However, having to deal with half-arsed VMware/KVM integrations is far, far worse than dealing with AWS. Or worse, dealing with a shitty Puppet wrapper for Tungsten.
If cloud advocates don’t jump to the extreme, the benefits of cloud computing quickly fall apart (there’s a reason AWS’ margins are ridiculously high).
I have managed my own DC and colo'd. Some examples of issues that occurred:
- Colo: unauthorized personnel tripped over a cable, killing network to a lot of our servers. This was in a huge colo hosting facility that was home to some fortune 500 companies' servers.
- Own datacenter: local internet providers both had a planned maintenance/outage on the same day. This isn't "mowing the lawn in front of your DC", this is the most basic utility after power that needs to be available for your DC to work.
- Colo: hard drive failure after hours. Colo security staff wouldn't let us disassemble hardware that wasn't ours. Had to wait 6hrs for a technician to show up to pop and swap a drive (this with a 1hr incident response in our contract and the same "highly reliable" colo; we got money back per our contract, but not the customers we lost, even after apologetic/refunding communications to them).
- Own datacenter, third-party NAS appliance professionally installed by a vendor: persistent performance issues and service dropouts, eventually traced to the "professional installation" being at the bottom of a rack with a large, detachable rear panel mostly covering the fan intakes for the appliance, leading to persistent throttling due to heat.
- Colo (rented hardware, managed OS install): When the business started, our sysadmins/DBAs (me and one other guy) were primarily experienced with RHEL and old school init. The colo only provisioned Debian/systemd servers. We learned, but it slowed us down for a week or so. Sure, it was only an extra few minutes per task, but it added up.
- Own datacenter: management needed us to move into a new server room because of an ending lease, with a hard deadline. The air conditioning installation vendor showed up two weeks late; we had no cooling at all when we needed to cut over, causing days of service interruptions and downtime.
I have also managed cloud services. Some examples of issues that occurred:
- Amazon's S3-gate. That sucked. When it happened, we were able to email our customers a copy of Amazon's status, and links to reporting of exactly how wide-spread the issue was. We had impact nearly identical to the colo hard-drive-failure incident above (it was the same service, migrated to AWS/S3), but we didn't lose nearly as many customers.
- The DNS DDoS that affected the East Coast of the US earlier this year. Same kind of service interruption, same communication. We even got replies back from customers saying "I couldn't get to Reddit either; I figured you guys might be in the same boat".
I'm not a cloud evangelist. I think that there are very good reasons to host entirely locally, or colocate/rent, or anything in between. I do, however, think that businesses, especially small ones that are not typical software-centric startups, massively and regularly underestimate both the initial and ongoing costs of running their own infrastructure. There are significant technical benefits to being cloud-hosted (these benefits also apply, only a little less potently, to fully managed hosting operations a la Rackspace), but people often miss the political and financial benefits. The political benefits are things like "our customers are less pissed because widespread issues with AWS probably affected them in other ways that day as well, so not as many people will knee-jerk blame us". The financial benefits are primarily that there are fewer moments of the "oh shit, setting these servers up ended up costing way more than we thought it would", and "we thought this would take a day to turn up; it actually took two weeks" varieties. Those things still happen with cloud services, but much less often.
To add fuel to your fire, I've personally been exposed to supporting the DC infrastructure in multiple companies, and let me tell you, it's no picnic. The time and money we spent on maintaining the systems was outrageous. Getting vendors to come fix their systems or tracking down licenses for software took days and days. Even finding replacement hard drives that were compatible with our 3-year-old servers was a giant task that consumed several team members for hours at a time.
These are the things that DC/Colo people don't talk about and conveniently forget when it comes to cloud systems. I would gladly give up that "control" for steady and predictable futures.
Interestingly these examples only cover colo and own data centre. What about dedicated servers? You get a similar SLA as with instances in the cloud at substantially lower prices.
If you have to learn how to perform a proper setup, you're a good candidate for the kind of managed services provided by a cloud-based solution. As long as you're aware that you're paying a sort of "ignorance tax" for not knowing how to do things, that's perfectly fine.
While researching for https://www.thecloudhostinghandbook.com/ I discovered that there are as many possible setups as there are companies: the best solution depends on what kind of assumptions the developers have materialized in the code, how much the company wants to spend, and ultimately on the world view of the people running the system.
I was talking about dedicated servers, not running your own data centre. Data centres are a scale business but cheap to run at reasonable scale. Dedicated servers are significantly cheaper than cloud instances, and there's not much more to monitor than for the cloud. Compared to hosted DB clusters you need someone to set it up and run it, but you'd also need an AWS expert for a production cluster in the cloud.
Why do you need consultants to run a database cluster? And do you run AWS database clusters in production without expert knowledge? If you need consultants for self-hosted databases you'd probably also need consultants to properly set up AWS services.
Performance tuning, scaling, troubleshooting, etc. We had terabytes of active data (as opposed to warehoused data).
For example, if a disk fills up or fails, adding a disk to a MySQL setup without causing downtime is a pretty involved process, but in RDS it is one config change in the admin console. It does the same process; AWS just put a huge amount of work into automating it.
We only brought in the consultant for an average of a day every other week but that adds up and we had to bring in 0 people after switching to RDS.
It _is_ cheaper to run. But that's only part of the lifecycle.
Where is the tooling to build the HA DB cluster? Where is the tooling that securely and reliably does point-in-time recovery at one-minute resolution for the last _n_ months? Where is the tooling to measure the health of the underlying FS, OS and hardware?
What about scaling? What happens when you change the schema and shit gets slow? How long does it take to re-tool the scripts to make it work on new hardware?
All of these will have to be built by your company, or bought in. Who tests them? How do you get support when shit goes wrong?
For sure, it can be cheaper over the whole lifecycle, but often it's not.
Look, I love real steel; I looked after a large cluster, but the prospect of making changes to the three-node Tungsten setup, or creating a new one, gave me the fear.
> > who on earth is going to buy, run and maintain this infrastructure?
> This. As much as I love the idea of decentralization, I can't see it getting much traction because of this.
At least 90% of all smartphone users are clueless as to when apps work in the background on their devices, so if you install an app that turns your smartphone into a lightweight node in a p2p network, almost everybody will be none the wiser IMO.
Do you expect a plan that covers 100% of all problematic points from the get-go? As if the current technologies aren't patched to oblivion right now. We can at least avoid the problems we already know exist. That's progress in itself, wouldn't you agree?
Like with physical exercise, you start off somewhere and you discover solutions to problems you didn't know how to solve, along the way.
Also, I am not sure what you mean. What part of distributing encrypted data and decrypting it with your key on arrival is hard?
Admittedly, it would require a big shift in how we code apps. I am not opposed to that though.
I have mildly secret data that I want to process, for example a credit card transaction. How do I securely decrypt, process, re-encrypt and upload it to the destination?
Processing that data requires a remote CPU that can't be spied upon. In a de-centralised environment how can I get a reasonable guarantee that the person hosting the CPU isn't just listening to the secrets?
What threat model are you looking at? I'm talking about providing enough guarantees to sign a contract with another business, not hiding from government actors.
As a normal business, I want to process sensitive information. That is not possible in the world you describe with current CPUs.
The only instance where this works is mining pools. I've seen offerings for render farms, but they never scale, are all heterogeneous in both hardware and stability, and are never worth the hassle.
Plus, it involves shifting highly sensitive assets to unknown third parties, which makes it impossible to sell to the movie/TV studios.
IMHO "cloud" as a whole was way overhyped for the value it provided. While it delivered on simple requirements any thing complex meant things got out of hand quickly. Quite a lot of applications were sold as "to be used by business user" but then setup ensured the complexity was so high that it required whole dedicated technical teams to manage it.
That said, is it the end? It is doubtful - A lot of conventional, old school companies which were against moving to cloud because of various reasons are now seriously considering cloud. Quite a lot of these companies are heavily dependent on Oracle, SAP etc which is just now rolling out/pushing their cloud products. Maybe it is the shoe shine boy and Joe Kennedy moment, but it is difficult to tell.
I don't think it's difficult to tell at all. Real businesses have built massive systems on cloud infrastructure, and the pace at which they are doing so is accelerating. Real money is being made all round, so there's no bubble. It's not all predicated on possible future profits that won't ever materialize. Are there too many startups that won't go anywhere? Maybe, but that's not going to lead to the collapse of AWS or Azure.
Meanwhile, the solutions this article is promoting don't even really exist yet except in very limited, primitive forms. Nobody is going to build a Facebook on BitTorrent and the blockchain any time soon.
>This is a problem mostly because of the way we’ve organized the web. There are many clients that want to get content and use programs and only a relatively few servers that have those programs and content. When someone posts a funny picture of a cat on Slack, even though I’m sitting next to 20 other people who want to look at that same picture, we all have to download it from the server where it’s hosted, and the server needs to send it 20 times.
The article reminds me of the Pied Piper platform from Silicon Valley, but especially the above quote.
You can architect for this though. Let's say you're a data-heavy SaaS selling primarily to enterprise, which operates their own datacenters. Instead of naively forcing all requests to go first to api.saas.example, you allow the enterprise to configure its account to first hit a cache service (that you write and distribute) located at saas-cache.enterprise.internal, allowing both the SaaS and the enterprise to save on bandwidth by not fetching the same data 20 times over.
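As a rough illustration of what that cache service could look like, here is a minimal sketch using only the Python standard library. The hostnames (api.saas.example, saas-cache.enterprise.internal) are hypothetical, and a real deployment would also need TLS, authentication passthrough and cache invalidation:

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

ORIGIN = "https://api.saas.example"   # hypothetical SaaS origin
cache = {}                            # path -> cached response body

class CacheHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in cache:
            # First request for this path: fetch once from the origin.
            with urlopen(ORIGIN + self.path) as resp:
                cache[self.path] = resp.read()
        body = cache[self.path]        # every later request is served locally
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Run inside the enterprise network, e.g. as saas-cache.enterprise.internal:8080.
    ThreadingHTTPServer(("0.0.0.0", 8080), CacheHandler).serve_forever()
```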
Edge caching is not MITM and it's not helpful to pretend that it is.
MITM is a form of attack. Edge caching is simply a form of distributed architecture.
The typical objection to edge caching HTTPS is that it requires giving your private key to a 3rd party. Well if you're hosting in the cloud, who do you think has your private key at the origin?
It is? Physical location is pretty far down my list of reasons I use cloud providers. Certainly, API-driven resource provisioning is way higher, as is pay-for-what-you-use costing.
Why does the author implicitly assume that the amount of spare local storage is enough for the swarm+redundancy? Furthermore how many backups are required for data to ensure the same level of protection as a modern cloud provider? Does the math even work?
> Why does the author implicitly assume that the amount of spare local storage is enough for the swarm+redundancy? Furthermore how many backups are required for data to ensure the same level of protection as a modern cloud provider? Does the math even work?
It isn't really a problem. N+M erasure coding is very efficient. Even if the average node only has 40% uptime, that means on average you can recover the data with only 2.5 distributed copy-equivalents. You would want slightly more than that in case the specific nodes end up having below average reliability, but the numbers are completely reasonable.
Moreover, for anything which is shared with a non-trivial number of people, each person would have a full copy which means the level of built-in redundancy is already massive overkill and there is no need to spend third party storage on any additional redundancy at all.
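A quick sanity check of the 40%-uptime claim, assuming independent node failures and Reed-Solomon-style coding where any k of n stored shares reconstruct the data (the specific k/n values are illustrative):

```python
from math import comb

def availability(n, k, p):
    """P(at least k of n independent shares are online), with node uptime p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.40  # average node uptime from the comment above

# At exactly 2.5x expansion the expected number of online shares equals k,
# so retrieval succeeds only about half the time; a modestly higher expansion
# factor pushes availability up quickly.
for k, n in [(20, 50), (20, 60), (20, 70), (20, 80)]:
    print(f"k={k} n={n} expansion={n/k:.2f}x "
          f"availability={availability(n, k, p):.3f}")
```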
> Even if the average node only has 40% uptime, that means on average you can recover the data with only 2.5 distributed copy-equivalents.
Wouldn't this mean that if X is the total amount of consumed storage currently used, you still need 2.5 times X _additional_ unused storage across the swarm in which to hold the 2.5 distributed copies?
If so do average people keep such amounts of free storage around?
> Wouldn't this mean that if X is the total amount of consumed storage currently used, you still need 2.5 times X _additional_ unused storage across the swarm in which to hold the 2.5 distributed copies?
No. The first copy-equivalent is the original data, so across the swarm you only need roughly 1.5X of additional space.
And again, even that's only for bespoke personal data. As soon as you have a few people with a copy -- which is even the case for most "personal" files, because friends and family will have copies of your photos etc -- then you don't need any redundancy past the copies each person already has on their own device.
Moreover, cloud providers are already wasting more space than that. People have a copy of their data on their own devices, but also on the servers, and then the servers have at least one backup. That's three copy-equivalents already.
> If so do average people keep such amounts of free storage around?
Millions of people buy computers with 1+TB hard drives and then use less than 10% of the space.
And it's possible to use the free space without really depriving the owner of it, because you can set a minimum free space threshold and transfer data off the machine if it ever falls below that, so the space is only used if it would otherwise have been free space.
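A minimal sketch of that threshold idea, with an illustrative path and limit (a real client would re-replicate evicted chunks elsewhere in the swarm before deleting them locally):

```python
import os
import shutil

SWARM_DIR = "/var/lib/swarm-chunks"   # hypothetical local chunk store
MIN_FREE_BYTES = 100 * 1024**3        # always keep at least 100 GB free for the owner

def enforce_free_space_floor():
    free = shutil.disk_usage(SWARM_DIR).free
    if free >= MIN_FREE_BYTES:
        return  # the owner isn't being deprived of anything; do nothing
    # Otherwise evict least-recently-accessed chunks until back above the floor.
    chunks = sorted(os.scandir(SWARM_DIR), key=lambda e: e.stat().st_atime)
    for entry in chunks:
        size = entry.stat().st_size
        os.remove(entry.path)         # the swarm holds other copies of this chunk
        free += size
        if free >= MIN_FREE_BYTES:
            break
```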
> No. The first copy-equivalent is the original data.
Ah, got it.
> Millions of people buy computers with 1+TB hard drives and then use less than 10% of the space.
See, that's the bit that's weird to me. My OS drives are small SSDs and the spinning platters are all comfortably full.
I will say though, I'm replicating once onsite and once offsite so my own redundancy is pretty high. If I could get over the 'someone else having physical access' thing (I don't use cloud for most personal data) I suppose IPFS or equivalent would be cool.
> See that's the bit that's weird to me. My OS drives are small SSD's and the spinning platters are all comfortably full.
You have to remember that you know what you're doing. You know how much space you need and you know how to add more later, so you don't buy more than you need.
The typical person buys a 2TB hard drive because they have "thousands of photos" and the 2TB drive is only $15 more than the 0.5TB drive, even though "thousands of photos" consume like 0.005TB.
And they're rational to do it because they know they aren't good at predicting whether they will fill the smaller drive and it's worth $15 to hedge against the ordeal of adding more storage later.
Which means many people will buy a 2TB drive and use it to store 75GB of data.
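A back-of-the-envelope check of that number (the average photo size is an assumption):

```python
photos = 2000            # "thousands of photos"
avg_photo_mb = 2.5       # assumed typical phone/camera JPEG
total_tb = photos * avg_photo_mb / 1_000_000
print(f"{total_tb:.3f} TB")   # ~0.005 TB, a fraction of a percent of a 2 TB drive
```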
While it is _technically_ possible to build heavily decentralized, managed hardware deployments, no one has figured out yet how to charge for them except in the case of very large customers. Until this happens, there will be no swing back to anything decentralized.
Although I do understand the allure: just about anyone today could very economically purchase 5-10 servers with 10-18 cores each (and those will be _real_ cores, not hyperthreads). There's nothing impossible about automating software management on such a thing, even to the extent that you'd get in the cloud (VMs, containers, distributed storage, automated updates, VM migration, network partitioning, etc). I believe Microsoft will lease you a fully set-up shipping container with Azure in it; all that needs to be done is to scale this down.
But again, how does one extract billions of dollars in profits from something like that?
I think the point would be that the central server `could` gather aggregated data / important business metrics that it could use to provide further value. I think the author also mentions something like this.
Ex: provide analytics back to the end user (which could be a paid service), or use that information to provide a second set of B2B offerings to other partners, etc.
Now compare that to the current model of nickel and diming for every single thing and charging a 90% margin on traffic, and you’ll see why this is not attractive to the current players.
Well, I think the intrinsic inefficiency of storing all that data on your central servers and then running huge Spark jobs to crunch through it creates huge costs in the first place, which probably need to be offset by higher margins charged to the end consumer. If you can just monitor users' engagement with your offering and charge based on that, without having to actually go through countless logs to do it, that might be what the author is leaning toward. But yeah, easier said than done. Knowing which metrics to capture back centrally could itself be a challenging thing to do if the data is decentralized...
I never understood the difference between Mainframe Computers / Computing and Cloud Computers / Computing.
In both cases, a centralized server connected to the internet does all your application hosting, storage and processing. If that server goes down or the connection is lost, in both cases you and your "SaaS" applications are SOL (Sh*t Out of Luck). Also, in both cases, you offer hackers a centralized location to target and hack.
What's the difference, except in the name and in the marketing ?
The difference is the move from specialized to interchangeable, commodity hardware. This allows cloud providers to establish an abstraction layer over servers, because it doesn't matter which machine serves your demand at any given moment.
One is antiquated and laughable, the other is modern and commendable. The name, I mean - concept remains practically unchanged (Right, right, it's a swarm of logically connected containers instead of a system of physically connected components. Big difference.)
That's a bit like saying you don't understand the difference between cars and airplanes because they are both just forms of transport that get you from one place to another. Mainframes and distributed servers both provide networked services, but the ways they achieve that, and the pros and cons of those solutions, are very different.
They also aren’t very powerful in reality, just old software that is proven and reliable. Almost all of the non-IBM mainframes are just Xeon boxes these days.
One vendor that I’m familiar with sells a box equivalent to an HP DL580 (which would cost about $50-100k) for $2-3 million.
At the core of the cloud paradigm and the centralized/decentralized issue lies not a technical problem but a political one, about control and privacy invasion.
The only reason we have YouTube instead of VLC + eDonkey packages is lawsuits. That is a technological solution from 15 years ago that is technologically superior and much harder to censor.
Datacenters and "cloud-based" services do not answer a technical problem but a political one: how do we control information flow in a decentralized net? The answer: provide a bottleneck for information, for free.
The one thing the article didn't address is exactly this pushback from regulators and corporations. They will NOT stand by and watch idly as control is pulled out of their hands.
Even if such a network gets invented, I am pretty certain governments will make aggressive laws along the lines of "if your computer caches a trailer of this new Marvel movie, we can confiscate your tech for copyright infringement".
As much as I want the problems to be only technical, IMO the much bigger battle will be in courts and public hearings.
Even if all the arguments are true and take into account all tradeoffs -- which I don't think is the case -- there is a big difference between "will eventually happen" and "is coming" or "we're facing the end".
You mean Popcorn Time? Or WebTorrent + thepiratebay.org?
These get shut down due to legal issues, but, from a technical perspective, building Netflix using BitTorrent (or IPFS) is 100% doable and quite reliable.
Until you want to watch that one kinda obscure thing that isn't adequately seeded... then you're out of luck. (I'm aware of server "pre-seed"/enhancement options; this just assumes you want to watch something that isn't seeded or pre-seeded.)
That's because it isn't done legally, and therefore augmenting the network with a few "supernodes" (I believe that's what Napster called them) is a big no-no. You could say the same about Netflix/HBO/..., by the way: watching obscure things that aren't worth Netflix's or the copyright owner's time to negotiate rights for... good luck. And if the present trend continues, and Disney, and Paramount, and ... start their own streaming platforms with exclusivity, well, Popcorn Time is going to be the vastly superior option in a year or two at most.
For a distributed example on the other side, you have the Steam platform. Obscure things download quickly and without much delay, and Steam avoids the "rights negotiating" problem by being a marketplace.
BitTorrent is also much less stable than Netflix in terms of speed of access and availability of any given file, since distributing the content distributes control as well as work. Peer discovery alone makes BitTorrent less performant than Netflix for the media that Netflix actually offers.
That's not necessarily correct. Depending on the time of day and your ISP, P2P video can be much more reliable than Netflix, especially because you usually control the quality. With Netflix, it can happen that the client decides to force you onto a low resolution because it thinks the connection is slow. That creates quality issues you don't get with P2P.
Agreed! However, I must note that Netflix doesn't use AWS (or any cloud provider) for streaming content. Everything you're watching on Netflix is streamed from one of their own NAS boxes located in or around your ISP. AWS is used for everything else (i.e. billing, suggestions, ratings, feedback, support, transcoding, etc.)
Yes, streaming is a large part of Netflix, but it's not the only part, and may not be the largest part. Netflix has over 30,000 EC2 instances running on AWS, plus a lot of other resources (S3, Redis, etc.). Building Netflix on your own platform would be foolish.
Where do you see these "more and more ISPs having data caps" exactly, please?
I live in Eastern Europe and I have a legit gigabit connection with no limits. A load of people around here pay $10 for 75Mbps and connections most USA can only dream of.
Assuming you're talking about the USA... just saying, world > USA.
All mobile broadband ISPs in the EU and Asia (where users often cannot get anything else) have caps at just a few gigs of data. Even in Sweden, they are adding hidden limits when you reach a TB of data on cabled broadband. This means streaming 4K video will hit these limits.
I definitely agree for the rural / less accessible areas. Carriers definitely have a monopoly there and certain countries like Sweden, Norway and Finland are too vast to have good cabled coverage.
In most EU cities you have plenty of choice, however.
That's absolutely true: the closer to the root nodes of the EU you get, the cheaper the bandwidth is. Heck, in Romania you can even get 10Gbit in a small datacenter for a few thousand €.
So sure, if the larger nodes can balance out the smaller ones in a distributed network, then it might be plausible.
As I’ve written before, privacy and risk mitigation are the two reasons this article is wrong. The cloud is already the pinnacle of decentralization, and distributed web is just a fad.