So it is preemptively switching then? > It doesn't really imply anything about h...

coder543 · on Nov 19, 2018

> So it is preemptively switching then?

Any time you call a blocking function that the system provides, it should immediately yield the fiber, which makes it look preemptive. If you write a loop that just spins forever, that could block the whole system, potentially, making the abstraction leaky. In a language like Ruby, they could definitely add some true preemption, but I don't know if that's what they plan to do.

From the article: "Fibers, on the other hand, are semantically similarly to threads (excepting parallelism), but with less overhead." So, the author definitely isn't implying parallelism of the fibers.

> What's all the fuss about paralllelism in the article about then?

The author was talking about different methods for handling more than one request at a given time, which include forking and threads. With Ruby's GIL, threads are a lot less attractive than they could be. A good fiber implementation can handle tons of network requests concurrently and very efficiently even on a single core, which is the case being discussed here.

At the end, the author discusses a hybrid approach of forking and fibers, where each processor core would have a fork of the Ruby program running, and each fork would have its own fiber pool, running many tasks concurrently.

In languages that don't have a GIL, forking is rarely a tool that I reach for. It really hurts your database pooling and all sorts of other small problems, but it's a common trade-off when using Ruby, Python, and Node.

pooppaint · on Nov 19, 2018

> Any time you call a blocking function that the system provides, it should immediately yield the fiber, which makes it look preemptive.

In the old days we would call this cooperative to contrast it from preemptive. This is the essence of cooperative, yielding at explicit points be they IO request, timers, or waiting on a message queue. Preemptive used to mean a certain thing and this is not it at all.

coder543 · on Nov 19, 2018

cooperative multitasking typically implies (to me, at least) that the programmer is required to explicitly / manually yield their task, which is annoying, error prone, and isn't required here. The system's blocking functions will handle that behind the scenes.

Fibers are cooperative here, but not from the programmer's point of view, and that's an important distinction to make. If you write the same code for a cooperative system as you would for a preemptive system, is there really any difference to the programmer? It looks preemptive. If anything, properly implemented cooperative systems are more efficient. Most of the time when people ask the question that is asked higher in the thread, I believe they're worried that they will be responsible for remembering to yield control.

I'm pretty sure I did a decent job in my previous comment of explaining that the system only looks preemptive, and that it is possible to block it with some uncooperative code, so I'm not sure what point you're trying to make.

Xixi · on Nov 19, 2018

Not the person you are responding to.

It's a matter of point of view, but to me cooperative/preemptive is a property of the underlying scheduler, not of what the programmer is usually exposed to. As you correctly pointed out, it is possible to block the scheduler with uncooperative code. It's not even hard: it takes just one heavy CPU-bound computation. I write these kind of computations every day: if you sell me a system as preemptive and it's not, I will get angry...

monocasa · on Nov 19, 2018

That's always how it worked though. In the cooperative multitasking that people complain about (in early Windows and Mac for instance), "blocking I/O" called yield internally, and you only needed to call yield() manually in long running computations that didn't have any I/O.

What you're describing is bog standard cooperative multitasking.

devxpy · on Nov 19, 2018

Exactly why they said that it “looks” preemptive. It’s a mirage...

monocasa · on Nov 19, 2018

Except "preemptive" is literally defined as the contrast of this scheme.

jashmatthews · on Nov 19, 2018

Forking is going out of fashion in Ruby land really fast due to the problems you mentioned. Now multi-threading is the norm it's easier to just run one process per core and live with a little extra RAM usage.

I was working on some more improvements to forked memory usage in CRuby but I don't think it's worth pursuing.

joelbluminator · on Nov 19, 2018

What are you basing that it's going out of fashion? Shopify and Github both run unicorn which is pre forking as far as i know. I think some companies prefer to prevent thread safety issues and pay the extra performance cost.

jashmatthews · on Nov 19, 2018

Yeah, and Shopify are still using Resqueue too but almost all newer large apps are going multi-threaded from the start and using Puma + Sidekiq.

Puma has exceeded Unicorn in total downloads from RubyGems now.

coder543 · on Nov 19, 2018

I meant all kinds of internal forking. Running a process per core is just pre-forking.

jashmatthews · on Nov 19, 2018

Sidekiq just runs N independent processes in the default setup. No forking at all. Pre-forking is an optional thing in Sidekiq Enterprise.

coder543 · on Nov 19, 2018

ok, so that is a fair distinction.

The point I'm trying to make is that you can't share resources easily between all of those processes, even though they're on a single machine, so you usually open a lot more database connections that you would need with a single shared connection pool. So, people often end up dealing with PgBouncer and other inconveniences much earlier than they would otherwise need.

But you're right, it's not necessarily forking.

jashmatthews · on Nov 19, 2018

Trying to share a much smaller number of connections between a larger number of threads with fine-grained checkin/checkout is a nightmare, in my experience. You end up with all sorts of difficult resource and lock contention issues. As soon as you need a simple transaction you're stuck holding the connection for the duration anyway.

coder543 · on Nov 19, 2018

In my experience, it's all handled transparently behind the scenes... there is no headache. In Rust, checking a connection out is a single function call on the pool, which is easily shared among all threads, and it will automatically get checked back into the pool when the connection goes out of scope... you don't have to do a single thing to check it back in. In Go, the connection pooling is all handled transparently behind the scenes, such that you don't even need to know it's happening. I actually had to do some googling when I started using Go, as I was concerned that no one was recommending the use of a connection pool... creating and tearing down a connection per request is just wasteful when connection pools are so nice to use. It just turns out that Go embraces connection pools so deeply that I don't know of an easy way to avoid pooling your database connections.

If your application gets bottlenecked by the number of connections in your pool, it's easy enough to increase the number, but the more independent pools you have, the more overprovisioned connections (connected but not being used) you will have scattered throughout those pools. It's also usually possible to run a connection pool without an upper limit, if you trust your database to handle large number of connections gracefully.

Rust and Go's connection poolers will also automatically scale down the connection pool when connections are idle for a given period of time, which is nice.

I can't think of any nightmares or headaches that I've encountered with those connection poolers. It all "Just Works"... except for PgBouncer, the ultimate connection pooler. PgBouncer doesn't work with prepared statements or transactions unless you run it in transaction mode, and then you have to run every query in a transaction to use prepared statements.

I'm definitely not suggesting that you try to serve 1000 concurrent requests with 10 connections or something silly like that, but that is what often happens when you get large Ruby deployments which would attempt to establish more connections than Postgres can handle, so you route them through PgBouncer where a small fraction of the number of connections exists.

But, this is pretty off-topic at this point. I didn't mean to point the conversation in this direction.

jashmatthews · on Nov 20, 2018

> Rust, checking a connection out is a single function call on the pool

Still a pain in the arse when you are making function calls inside a transaction and dealing with the the connection reference lifetime.

> Go, the connection pooling is all handled transparently behind the scenes

This actually has a few nasty properties. Firstly, executing two simple queries in seemingly sequential Go code actually execute in parallel. Secondly, it's possible for Go's connection pooling to cause some very nasty failures. Rather than timing out at the first of a bunch of normally fast but now unusually slow queries (because of a lock etc), Go will keep spawning new connections and parking running but not yet timed out queryies until everything is on fire. Max connections is definitely a good idea.

> It's easy enough to increase the number

Only if you can restart your DB. Which, if you're trying to scale up under load, is the last thing you want to do.

> automatically scale down the connection pool when connections are idle for a given period of time

PgBouncer has supported this since release.

> PgBouncer doesn't work with prepared statements

Prepared statements themselves work fine. The problem is many ORMs do fragile, non-deterministic things with caching named prepared statements to improve throughput in simple scenarios.

Using named prepared statements can also cause other issues because it signals to PG that it's OK to use a generic query in some cases. It might not be!

coder543 · on Nov 20, 2018

> Only if you can restart your DB.

I'm talking about client-side connection count maximums, not server-side. It's just a setting in connection pools like Rust and Go have.

> Prepared statements themselves work fine. The problem is many ORMs do fragile, non-deterministic things with caching named prepared statements to improve throughput in simple scenarios.

Postgres specifically supports unnamed prepared statements as a feature, and PgBouncer's model cannot do anything to help those. One connection creates this statement, and another tries to execute it. In fact, PgBouncer's docs specifically say that they do not support prepared statements, and not to use them, so your claim is contrary to the docs.

I really don't want to even bother with your Rust and Go comments, since they are just nonsense. Lifetimes are not a problem with function calls involving transactions in Rust. At all. I work with Ruby, Rust, and PostgreSQL professionally at my current full-time job. I've written a lot of queries, and many of those involved transactions.

Go will not execute two seemingly sequential queries in parallel. It will execute them sequentially. When you run a query, it's a synchronous process, unless you specifically launch that query in its own separate goroutine... in which case, it is absolutely not a surprise that it runs in parallel, because you did that. Your slippery slope argument is completely nullified by this property. If you don't set a maximum database connection limit and your web server receives another request that requires a connection, it's no surprise that it tries to open another database connection to help service that web request. From the beginning, it appeared you were making the argument that connection pools should not be used, and therefore each request just handles its own connections... which would also be unbounded just like this. Fortunately, Go and Rust database pools provide an option to limit the upper bound. I worked with Go and MySQL professionally at my previous full-time job.

Then... you're defending PgBouncer?! I thought you hated connection pools? PgBouncer is great at what it does, but what it does is a painful headache to deal with, because it breaks half the features any normal Postgres client expects to work seamlessly. You can't just prop it up in front of a database and expect things to "just work".

You're presenting information like you have all this experience, but my experience clearly indicates that what you're saying is just plainly wrong. I don't see any benefit to either of us in continuing this discussion further. I'm out.

jashmatthews · on Nov 20, 2018

> I thought you hated connection pools?

No, you're just being unnecessarily rude. I just said I thought that fine-grained checkin-checkout like Go and Rust encourage is somewhat overrated. There's no need to be hostile.

> Postgres specifically supports unnamed prepared statements as a feature, and PgBouncer's model cannot do anything to help those.

PqExecParams (single phase prepared statement) works fine over PgBouncer. That's what I'm talking about. This is different to PREPARE & EXEC. You can't actually do a single phase prepared statement from psql, AFAIK, only via client libraries. https://www.postgresql.org/docs/11/libpq-exec.html

I agree, the PgBouncer documentation could be clearer. I think they just don't want people trying it to file bug reports. PREPARE and EXEC can actually work even over statement pooling but you need to make sure your connection setup statements prepare all the necessary statements.

> it's a synchronous process, unless you specifically launch that query in its own separate goroutine

You're right, I'm getting two things mixed up here. It is a while since I dealt with this problem.

1) The auto-checkout can mean you end up executing related statements on different connections. IMHO this is highly confusing to have as default behaviour and I prefer the Rust approach.

2) By default Go will just keep piling up Goroutines blocked on slow queries and open more DB connections and kill a DB server.

Reference for both: http://go-database-sql.org/connection-pool.html

> Rust database pools provide an option to limit the upper bound

If you're using Rust + Diesel + PG it's actually limited to 10 by default via R2D2: https://github.com/sfackler/r2d2/blob/ad9cdb9f1446c729240cf8...

I really think that's a lot saner than "open as many DB connections as you like, what could possibly Go wrong"

If you don't believe me that Diesel and single phase prepared statements work with PgBouncer, believe the Diesel maintainer, Sean Griffin: https://github.com/diesel-rs/diesel/issues/1028

I'm back to working in Rust again and plan on opening a PR to resolve this issue.

devxpy · on Nov 19, 2018

> Any time you call a blocking function that the system provides, it should immediately yield the fiber, which makes it look preemptive.

Makes sense. It let's you avoid the quite verbose and incompatible async/await syntax.

---

> The author was talking about different methods for handling more than one request at a given time, which include forking and threads

I feel like if the author suggests forking, he should also provide a way for reliable robust message passing, like erlang.

---

Thanks for such a detailed explanation. I wish more languages supported Erlang style processes. That are green, preemptively switched and multi-core!

Do you have any experience writing such systems? I was thinking to experiment with the cpython interpreter.

kabes · on Nov 19, 2018

Nope, they're coroutines, which are non-preemptive. Similar to generator functions (which is a subset).

https://en.wikipedia.org/wiki/Coroutine

rubyn00bie · on Nov 19, 2018

No they aren't preemptively switching/scheduling, you have to schedule it yourself...

I could be wicked wrong though, it's been a long while since I futzed with fibers.