Hacker News | waltpad's comments

I think another interesting question is: how would you handle the return value of such a function in your code?

You would have to either:

- test for the type of that value in order to handle it properly,

- rely on the implicit cast rules of JS, which wouldn't be very useful here, I suppose.

As others have said, in OCaml (as well as in many other strongly typed languages) you can use a sum type to solve that problem.

Edit: In some languages there is also the concept of "intersection types" which, if I understand correctly, also lets one handle that sort of situation. The corresponding Wikipedia entry [1] gives a list of languages supporting the concept and provides examples.

[1]: https://en.wikipedia.org/wiki/Intersection_type
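
For illustration, here is roughly what that looks like in OCaml (the type and constructor names below are made up for the example):

    (* A sum type listing the shapes a "multi-type" return value can take;
       Num, Text and Nothing are invented names for this sketch. *)
    type value =
      | Num of float
      | Text of string
      | Nothing

    (* The caller has to say what happens in each case, and the compiler
       warns if a constructor is forgotten. *)
    let describe = function
      | Num n -> Printf.sprintf "a number: %g" n
      | Text s -> Printf.sprintf "a string: %S" s
      | Nothing -> "no value"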


I've found this type of function useful for duck typing, and I've generally found duck typing to be a really useful tool to add polymorphism to code.


The point I was trying to make is that if you have to rely on runtime type information anyway, you might as well use sum types in a more strongly typed language, where the set of cases the code has to handle is limited to the type's own constructors.

With "duck typing", you need to have an extra test for types which your code is not supposed to work with, whereas with sum types, the type checker will help you verify that it cannot happen.
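
To make that concrete with the toy `value` type from the earlier comment (still OCaml, still an invented example):

    (* Forgetting a case is caught at compile time (non-exhaustive
       pattern-matching warning), so no runtime "unexpected type" branch
       is needed. *)
    let is_textual = function
      | Text _ -> true
      | Num _ | Nothing -> false

    (* And a value of any other type simply cannot be passed:
       [is_textual 42] is rejected by the type checker. *)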

That being said, there are situations that a strong type system will not be able to model.

Anyway, at the end of the day, what matters is that you (and your coworkers) feel comfortable with your code. I know from experience that that's not always the case.


I find unions without a tag hard to handle.

To expand on another commenter's example, suppose you have a sum type that models either a successful computation with a return value or a failure with an error message.

If you have tags / constructors, that's all easy. But with naked unions, you cannot write code that deals with strings in the success case, because a successful string value and an error message are indistinguishable.
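
For instance, in OCaml (constructor names invented for the example):

    (* Both payloads are strings, yet the constructors keep the cases
       apart: a successful result and an error message can never be
       confused. *)
    type outcome =
      | Succeeded of string
      | Failed of string

    let show = function
      | Succeeded s -> "result: " ^ s
      | Failed msg -> "error: " ^ msg

    (* With a naked union, the two alternatives collapse to plain string,
       and nothing in a value like "oops" says which case it belongs to. *)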


Disclaimer: I am not a HW designer, so I could very well be wrong.

It is true that there are tasks where threading matters but which still require a CPU rather than a GPU. I wonder, however, whether these tasks really need full SSE/AVX etc. Couldn't these extensions be removed from the CPU cores, with the necessary work performed by the GPU instead?

It would be interesting to gather statistics on how much these extensions are used in such scenarios. Imagine how much space and complexity could be saved on a CPU die by making stripped-down versions. That space could in turn be used for more cores!

I read a little about the Xeon Phi CPUs, which, IIRC, are many-core CPUs with a very small ISA, but I wonder why x86 makers aren't trying to go in that direction: aren't there plenty of dedicated workloads that would happily run on these (e.g., web servers), or is this just a (too) simplistic view?


> It is true that there are tasks where threading matters but which still require a CPU rather than a GPU. I wonder, however, whether these tasks really need full SSE/AVX etc. Couldn't these extensions be removed from the CPU cores, with the necessary work performed by the GPU instead?

SSE/AVX share an L1 cache that's damn near instantaneous for the CPU core to access. Total L1 bandwidth is on the scale of TB/s.

PCIe -> GPU takes 1 to 10 microseconds per access, and operates at only ~50GB/s (roughly 1/20th of L1 bandwidth).

------------

Case in point: memset is very commonly AVX'd to zero out ~1kB to 32kB of data sitting in L1 cache as quickly as possible.

There's no way to move "memset" from the CPU to the GPU unless you feel like obliterating the entire point of the L1, L2, and L3 caches. If you moved a "memset" to the GPU, it'd operate at only 15GB/s (the speed of PCIe 3.0 x16 lanes), far, far slower than L1-cache AVX loads/stores.

SIMD units, like SSE and AVX, are highly "local" and have huge advantages.


I think the opposite is where things need to go. Having a wide SIMD ALU quickly accessible from your CPU core is very useful, especially as it shares the same memory system and offers a much more flexible programming model that allows you to do everything in a single source.


The programming model is not very flexible at the lowest level: one has to build all the software infrastructure to communicate with the GPU (which boils down to sending commands and receiving responses). There are languages (like Futhark, Julia, or even Python) which handle all that boilerplate transparently.

The main problem is, AFAIK, that these languages don't give enough control over where the code will run. At some point, one will want to describe all the algorithms in a single language and somehow specify how the workload is to be distributed across all the processors, or at least that's what I've been thinking about for a while. Once you have that level of control, the need for a versatile CPU is less clear. Note that nowadays people seem happy with hybrid solutions where the code is scattered across several languages (e.g., one for the main program and one for the shaders, or for the client-side UI), so my position is maybe not very strong.

HW-wise, is it possible that integrated GPUs are the first step toward an architecture where the CPU and GPU have better interconnections (i.e., larger communication bandwidth and smaller latency), to the point where SIMD becomes moot? There is also the SWAR approach, where one doesn't rely on intrinsic SIMD instructions but instead emulates them (though it's probably not very realistic for floating-point computation).
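
As a concrete (and hedged) illustration of SWAR, here is the classic trick for adding eight packed bytes using only plain 64-bit integer operations, written in OCaml purely as a sketch:

    (* SWAR ("SIMD within a register"): lane-wise byte addition without
       SIMD instructions. The top bit of each byte is masked off so a
       carry cannot leak into the neighbouring lane, then restored with
       an XOR. *)
    let swar_add_bytes (x : int64) (y : int64) : int64 =
      let open Int64 in
      let lo = 0x7f7f7f7f7f7f7f7fL and hi = 0x8080808080808080L in
      logxor
        (add (logand x lo) (logand y lo))
        (logand (logxor x y) hi)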

Some other ideas:

- Apple has this neural engine in their latest chips, which is basically dedicated HW for neural networks

- In the wild, people are getting more and more interested in building their own custom ASICs to cut out the software middleman's cost: for them, the CPU solution is not good enough

- Intel recently introduced a new matrix-ops extension in their CPUs: maybe at some point they'll introduce full GPU capabilities baked directly into the CPU? I am a little worried about the resulting ISA.

Anyway, I am not a HW engineer, nor a very good software one; I only have a limited view of the difficulties of writing good, efficient CPU or GPU code. My first post was prompted by remembering the first "large scale" multicore CPUs of 15 years ago (specifically the UltraSPARC T1), which weren't SIMD-heavy. The direction naturally shifted as progress was made on SIMD to try to compete with GPUs, whereas it seems to me that CPUs and GPUs were originally complementary.

I tend to favor modular solutions, but I don't know how costly that would be in terms of efficiency at the HW level.


Oops, I didn't mean Xeon Phi; I meant an older design with many small x86 cores.

Xeon Phi, on the other hand, was the first host of the AVX-512 instruction set. Sorry.


> 100MB of doubles encoded as JSON.

That sounds like a very bad use case for JSON. I would be surprised if your program wasn't more efficient with an ad-hoc binary format for that piece of data.


It's really hard to get away from getting data in JSON these days. It's ubiquitous.

It was a bit contrived, true, but some are doing it in 0.1s while many are in the 1.5-2s range, and that was parse time alone, not loading the data or startup. If I were to go binary it would probably be something like protobuf, not ad hoc. Ad hoc has issues with maintenance, interop, and tooling.


> It's really hard to get away from getting data in JSON these days. It's ubiquitous.

Yes, I agree with that. It's a neat format if you have small pieces of information to move around, and it's very easy for humans to read, but for large enough data, wouldn't it turn into a bottleneck?

> If I would go binary it would probably be something like protobuf, not ad-hoc. Ad-hoc has issues with maintenance, interop, and tooling.

Indeed, there are always these dimensions to take into consideration, as well as format evolution. The main issue is finding a library/format that is well supported across all sorts of languages, and JSON has that. I don't know if there are many binary formats with the same level of support.

I suggested ad-hoc because the format seemed simple enough to be mmapped directly, or something equivalent (not sure how scripting languages would do in that case).
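
For what it's worth, a minimal sketch of what such an ad-hoc dump could look like in OCaml (not mmapped, and the function names are made up): raw little-endian doubles on disk, so reading them back involves no parsing at all.

    (* Hypothetical helpers: store a float array as raw IEEE 754 doubles
       instead of JSON text. *)
    let dump_doubles path (xs : float array) =
      let oc = open_out_bin path in
      let buf = Bytes.create 8 in
      Array.iter
        (fun x ->
           Bytes.set_int64_le buf 0 (Int64.bits_of_float x);
           output_bytes oc buf)
        xs;
      close_out oc

    let load_doubles path =
      let ic = open_in_bin path in
      let n = in_channel_length ic / 8 in
      let buf = Bytes.create 8 in
      let xs = Array.make n 0.0 in
      for i = 0 to n - 1 do
        really_input ic buf 0 8;
        xs.(i) <- Int64.float_of_bits (Bytes.get_int64_le buf 0)
      done;
      close_in ic;
      xs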


> fails at logic more than at poetry

This is purely subjective.

Your expectations of poetry might be different from those of other people, or even of specialists. I am not particularly good in that domain, but sometimes I don't really like the results shown.


I agree with you that it's subjective. Testing logic vs. art is going to bring this kind of problem to the surface (how do you test art in a way comparable to how you test logic?). This is why I wrote that noticing the thoughts made me take a step back from my own projections (my subjectivity). That's the whole point.


Sadly, AFAIK, BLAS hasn't been updated to use generics. It would be so nice to have support for integer vector/matrix operations in there (not that generics would be required for that, but they could make integer support easier to implement, though I suspect that for efficiency reasons it might not be used in the end).


If you have a look at the various Fortran implementations of BLAS gemm (matrix multiplication), you'll see that the transposed-matrix cases are handled specially. In fact, IIRC, the gemm function takes flags indicating, for each matrix, whether it is transposed.


That's what I did when calling OpenBLAS and MKL, but I confess I don't know the internal details of a non-inlined `matmul` call in gfortran when you don't use `-fexternal-blas`.

Just writing three loops and letting the compiler optimize it was much faster for `A * B'`, so it must be a pretty naive implementation getting called.
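
For reference, the "three loops" version of `A * B'` is roughly the following (sketched here in OCaml rather than Fortran, with invented names, just to show the shape): the inner loop is a plain dot product, which, at least in this row-major sketch, walks both matrices contiguously.

    (* Naive triple loop computing c = a * transpose(b):
       c.(i).(j) = sum over p of a.(i).(p) *. b.(j).(p) *)
    let matmul_abt a b =
      let m = Array.length a and n = Array.length b in
      let k = Array.length a.(0) in
      let c = Array.make_matrix m n 0.0 in
      for i = 0 to m - 1 do
        for j = 0 to n - 1 do
          let s = ref 0.0 in
          for p = 0 to k - 1 do
            s := !s +. a.(i).(p) *. b.(j).(p)
          done;
          c.(i).(j) <- !s
        done
      done;
      c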


The difference is that your DB is the FS, and each JSON file is an individual record. You're not storing each table in a single JSON document.


Maybe because it was tempting: JSON is fairly easy to handle, very portable, and when you look at a JSON document it's straightforward to think about querying it, and thus about a DB, although JSON is structured, and DBs are relational.


> although JSON is structured, and DBs are relational.

You can have structured relational databases (RethinkDB for one, but there are others)


I think that, oddly enough, the point of that project is to use the JSON format as the DB storage format, not as an export option. Just from the look of it (I don't know either project), LokiJS will very likely always be faster.


Sorry for the confusion, by "save as", I didn't mean export.

LokiJS has multiple persistence options, with JSON files in the filesystem being just one of them.

Alternatively, you could also just use it in-memory or with IndexedDB.


Oh, I understand. I suppose it makes sense: if, for instance, one needs to store a bunch of parameters somewhere, it might as well be a JSON file.


That's a very strange idea: JSON is basically structured data, like XML; it's nice for documents with deeply nested structures.

The main issue is that, contrary to a DB, any modification will shift everything after it, so any indexing will have to be corrected. I suppose that if the document is not stored as-is but instead broken up into pages (filesystems are likely doing that already, so piggybacking on that could help), then indexing could be improved, but then the storage starts to look like a regular DB rather than JSON.

Interesting nonetheless, time will tell.

