While I really like these blog posts for keeping things light and intuitive, they're also sprinkled with claims that go a bit too far.
> Arrays, Objects, Maps, WeakMaps, and Sets are all algebraic data types
This isn't really true on a couple of levels. Algebraic Data Types are described through a small set of composition (data) or decomposition (codata) operations. Arrays, Maps, and Objects might be explainable in this way (though, really, we'd just be modeling them with ADTs). WeakMaps and Sets really cross the line, though. WeakMaps, by their nature, need to appeal to something much more than just ADTs, and Sets require "quotienting" to identify sets that contain the same elements but were built up in different ways. You can build the "raw stuff" from ADTs, but you won't have something that really behaves much like these types at all until you add in something extra.
The emphasis on sum types is huge, but it leaves out something critical: "JavaScript doesn’t have a lot of built-in features for sum types" is true of user-defined sum types (and related to it just not having types altogether), but it has great support for a few special sum types: integers, booleans, characters! These sum types are important to talk about because they help make clear that we already work with sum types all the time! Our basis for distinction and choice is built atop them in any language.
Finally, while products and sums get you some wonderful modeling capabilities, the place where Algebraic Data Types really shine is when you add two further capabilities: function types (exponentials) and recursion (fixpoint, which doesn't really map to high school algebra). Those are clearly built into JavaScript and would be great to discuss as well.
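Aside: the "exponential" name still fits the counting story, since a function type a -> b has |b| ^ |a| inhabitants. For example, Bool -> Bool has 2 ^ 2 = 4 of them, which a quick (admittedly contrived) Haskell listing makes concrete:

-- All four inhabitants of Bool -> Bool, up to extensional equality.
notF, idF, constTrue, constFalse :: Bool -> Bool
notF       = not
idF        = id
constTrue  = const True
constFalse = const False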
I really, really hope JavaScript eventually gets support for:
- User defined Sum types
- Pattern matching on these Sum types in switch statements
- Blocks and things-that-are-currently-statements becoming expressions that evaluate to a value.
If it got those, then it would be pretty close to the perfect dynamic language for me.
What is the benefit of pattern matching in a language where any value can be of any type at runtime? You can just pattern match by saying if (obj.type === 'foo') {}; if (obj.type === 'bar') {}; etc in JS, as long as you have set up the convention that you set the 'type' field, and practically this is just as good.
The real benefit of pattern matching is the compiler checking at least that you have valid cases, and ideally that you haven't missed any cases.
Pattern matching goes significantly beyond checking “foo.type”. It adds the ability to specify exactly (as exactly as the language and your data allow) what case you’re dealing with. For example, a list of length three where the first and last elements are empty strings. Having to specify this with a series of if statements often leads to verbose and error-prone code, but it’s trivial in a language like ReasonML:
switch (mylist) {
| ["", _, ""] => x
| _ => y
}
Moreover, pattern matching usually involves pulling information out of the object at the same time. In the above example, maybe you want to do something with the middle element.
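Roughly, in Haskell-ish syntax (most examples in this thread are Haskell anyway; `describe` is just a made-up name), binding the middle element looks like:

-- Match on the shape and bind the middle element in one go.
describe :: [String] -> String
describe ["", middle, ""] = "middle element: " ++ middle
describe _                = "something else"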
While having a type system is helpful in all of the ways a type system is generally helpful, there’s nothing about this code which wouldn’t also be handy in a dynamic language like JavaScript. For further evidence of this see Erlang and Elixir, dynamic languages which make heavy use of pattern matching.
(Code typed on my iPhone so forgive the dopey examples)
Nevertheless, those special cases are the exception, not the rule. Most of the time the pattern matching really is just a more convenient switch statement.
I'm not sure what your experience is, but this seems to differ from mine (lots of recent Haskell experience, some dated OCaml).
I agree that matching on multiple levels in one statement isn't common, if that's what you were talking about. But performing bindings in a match is very common, and not something that's part of a switch statement (except in those languages where a switch statement is doing pattern matching more generally anyway).
Yes, I was just trying to say that matching on multiple levels is rarer. I agree with you that automatically binding new variables is the best part of pattern matching.
My examples were deliberately dumb, but I’ve certainly used this ability to pattern match at arbitrary depth innumerable times. It most commonly appears when looking at a tuple of N values, and I want to enumerate specific pairings of values.
Nevertheless, if that was the only benefit of it, one might argue it’d be more of a party trick. But it’s simply one more facet of a very powerful, simple-to-use technique for expressing how your code reacts to the shape of your data.
I’m curious how much exposure you’ve had to languages which support it — you might find it more useful than you think.
I find it often winds up clearer to pull things into multiple case statements, especially as the code grows more complicated. That said, there may be a difference in what we're meaning by "most of the time," so let's be more precise. IME, most Haskell programmers will encounter use cases for multi-level matches (beyond simply pulling the first couple of elements off a list) a small number of times per week (maybe month, depending on the programmer), whereas they're typically writing a case statement some number of times per hour.
I wouldn't call it a party trick, but I also wouldn't say it's the most common use of case by a long shot. I don't expect that you disagree.
I find that Haskell (both mine and that of others) involves a lot less pattern matching than ReasonML, since control flow is often expressed by type classes. Unfortunately, these are lacking in ReasonML, and I think that might contribute to why you tend to see more explicit pattern matching there. But maybe that’s an incorrect impression. In either case, it’s not common all the time, but neither is it particularly rare.
myList match {
  case Nil => 0
  case first :: Nil => first * 3
  case first :: second :: Nil => first - second
  case first :: second :: rest => first * second + sum(rest)
}
In what sense are integers and booleans sum types? What is the definition of sum types that booleans fit? Is it just that it can be true or false, and because there's the "or", it's a sum type?
There’s nothing wrong with an infinite sum, in principle. You can’t usually define them unless you can use recursion. In the case of integers you can do
data Int = Positive Nat | Negative Nat
data Nat = Zero | Succ Nat
You can't. The Natural number definition combines a sum type with a recursive position. Or in other words, a finite tag (either zero or successors, 2 options) plus a pointer to the next value in the case of successor.
> Is it just that it can be true or false, and because there's the "or", it's a sum type?
You can rote learn that, but the reason why it's intuitive to call them sum types is because you're adding the number of possible values. With product types, you multiply the number of possible values that the type represents. Consider these types:
-- 1 + 1 = 2 possible values
data Bool = True | False
-- 1 + 1 * a = 1 + a possible values
data Maybe a = Nothing | Just a
-- 1 * a + 1 * b = a + b possible values
data Either a b = Left a | Right b
-- 1 * a * b possible values
data Pair a b = Pair a b
-- 1 * a * b * c possible values
data Triple a b c = Triple a b c
`Maybe Bool` would have 1 + 2 = 3 possible values.
`Either (Pair Bool Bool) (Maybe Bool)` would have (2 * 2) + (1 + 2) = 7 possible values.
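To make that last count concrete, the seven values can be written out (repeating the Pair definition from above):

data Pair a b = Pair a b

-- All inhabitants of Either (Pair Bool Bool) (Maybe Bool): 2*2 + (1+2) = 7.
values :: [Either (Pair Bool Bool) (Maybe Bool)]
values =
  [ Left (Pair True  True)
  , Left (Pair True  False)
  , Left (Pair False True)
  , Left (Pair False False)
  , Right Nothing
  , Right (Just True)
  , Right (Just False)
  ]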
Integers would be defined as:
data Int = ... | -2 | -1 | 0 | 1 | 2 | ...
if they were finite, but since they're not, they'd have to be defined recursively with something like:
data PositiveInteger = One | Succ PositiveInteger
data Integer = Zero | NonZero Bool PositiveInteger
where the Bool represents the sign.
Since 2 * x is the same as x + x, we can also define it as:
data Integer = Zero | NonZero (Either PositiveInteger PositiveInteger)
where Left of Either would represent negative integers, and Right of Either would represent positive integers.
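A small sketch of how that encoding lines up with ordinary integers (the names toBuiltin and positive are mine, purely for illustration, and the type is renamed MyInteger to avoid clashing with the built-in Integer):

data PositiveInteger = One | Succ PositiveInteger
data MyInteger = Zero | NonZero (Either PositiveInteger PositiveInteger)

positive :: PositiveInteger -> Integer
positive One      = 1
positive (Succ p) = 1 + positive p

-- Left is negative, Right is positive, exactly as described above.
toBuiltin :: MyInteger -> Integer
toBuiltin Zero                = 0
toBuiltin (NonZero (Left p))  = negate (positive p)
toBuiltin (NonZero (Right p)) = positive p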
Summing is the act of gluing two components together with “or”. It’s possible to see all types as sums in this fashion (this is essentially "distributivity of multiplication over addition"), but it’s also impossible to see char/int/bool as not having some sense of “or” within them.
For a product type with no “or”, consider the JavaScript object {}. We see these as “atomic”.
A boolean is either true or false - i.e., it's the type inhabited only by "True" plus the type inhabited only by "False". An integer is either 1, or 2, or 3, or 4, or..., i.e., it's the countably infinite sum of other types.
First, here is a note about the notation that I will use here: Let `1` denote the unit type, the type with one inhabitant. Let `()` denote the inhabitant of the unit type. Let `a + b` or `Either a b` denote the sum of types `a` and `b`. Let `Left` and `Right` be the constructors of the sum type.
Before defining the integers, one must define the natural numbers. Natural numbers are defined as an inductive type, which is a little different from an ordinary sum type:
data Nat = Z | Suc Nat
Here is another way of defining them. Start by removing the self-reference:
data NatF a = Z | Suc a
Notice that NatF is the same thing as the Maybe/Option type, AKA 1+a, AKA the sum of the unit type and a.
Given a "fix" type:
data Fix f = MkFix (f (Fix f))
the definition of the natural numbers would be:
type Nat = Fix NatF
An "F-algebra" is a way of defining some algebraic structure or data type as a map from a "sum of products" to the type. More precisely, an F-algebra consists of an "object" (type) A and a map F(A) -> A. For the natural numbers, the F is NatF and the map would be a function mkNat : NatF Nat -> Nat, AKA mkNat : (Maybe Nat) -> Nat or mkNat : (1 + Nat) -> Nat.
You can't have negative numbers with that definition. Here are numbers and the way you would write them with that definition:
0 == Z
1 == Suc Z
2 == Suc (Suc Z)
3 == Suc (Suc (Suc Z))
You can define some arithmetic operators for it:
data Nat = Z | Suc Nat deriving (Show, Eq)
instance Num Nat where
  (Suc x) + y = Suc (x + y)
  Z + y = y
  (Suc x) - (Suc y) = x - y
  x - Z = x
  Z - (Suc _) = undefined
  Z * y = Z
  (Suc x) * y = (x * y) + y
  negate = undefined
  abs x = x
  signum (Suc _) = Suc Z
  signum Z = Z
  fromInteger 0 = Z
  fromInteger x
    | x < 0 = undefined
    | otherwise = Suc (fromInteger (x - 1))
The `fromInteger` definition is used by GHC to implicitly convert integer literals in the code to our Nat. We can use it like this:
ghci> 0 == Z
True
ghci> 1 == Suc Z
True
ghci> 2 == Suc (Suc Z)
True
ghci> 3 :: Nat
Suc (Suc (Suc Z))
ghci> 3 + 2 :: Nat
Suc (Suc (Suc (Suc (Suc Z))))
ghci> 3 - 2 :: Nat
Suc Z
ghci> 3 * 2 :: Nat
Suc (Suc (Suc (Suc (Suc (Suc Z)))))
ghci> 3 - 4 :: Nat
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:78:14 in base:GHC.Err
undefined, called at <interactive>:70:15 in interactive:Ghci8
Sure, in the same sense that every set is a disjoint union over singletons of all its values. But whether you would or could naturally specify it that way or not depends on the set. You likely wouldn't sit down and specify the type "Function from natural numbers to Boolean" as a big OR over each possible such function, for example.
But yes, whether something is a sum type or a product type or whatever is not generally so interesting as a predicate about the type in itself. It's a relationship between this type and other types. The same way it is not interesting to say "7 is a sum" (everything is a sum); what's interesting is to observe "7 is the sum of 3 and 4".
[Disclaimer: In the same way that some topological spaces are connected and cannot be decomposed into separate disconnected components, one might sometimes reason that some types are connected and not reasonably represented as disjoint unions of subtypes. So "Everything is a sum" is not always correct in all contexts. But let's ignore this sort of pedantry for now.]
Here's what's bugging me: Integers are a disjoint union of values, which is (in my non-abstract-algebra mind) something different than a disjoint union of types. That is, something like Maybe Integer is a "sum type" in a completely different way from the way that Integer is a "sum type". To me, this is different enough that a different term should be used.
I'm sure you're going to tell me that I'm not thinking abstractly enough...
There are sums at the type and value level. If I define Color = Red | Blue | Green, that’s not a use of a type which sums types together, but is instead just a type composed of a sum of values.
So a “sum type” can be any type which is the sum of different values. On the other hand, Either is a canonical “sum type (constructor)” which sums two types together (via the summation of their values).
If it weren't for finiteness restrictions, you could write the following in Haskell and it'd make a natural number type that acted basically as expected.
data Zero = Zero
data One = One
data Two = Two
...
data NaturalNumber = Zero | One | Two | Three | ...
Whatever you would say about sets, you might as well say the same thing about types; the distinction in terminology doesn't amount to much. The natural numbers are the disjoint union of {0}, {1}, {2}, {3}, ... . They're also the disjoint union of the even naturals and the odd naturals, or the disjoint union of {0}, {1}, primes, and composites. There are a million ways in which they are a disjoint union.
I don't think it's your level of abstraction, I think it's that you are treating as unrelated two sides of the same coin. This sum type / variant / disjoint union / coproduct construction operates both at the level of types and the level of values. After all, we only care about types insofar as they tell us about values.
So Integer (0 or 1 or 2 or...) is a sum type in the same sense that Maybe Integer (Nothing or 0 or 1 or 2 or...) is. I see. That makes some sense.
So if I'm in Haskell, say, and I've got a Maybe Integer, and I'm trying to walk through the options (pattern matching), I usually see the options as Nothing or Just Integer. Can I instead do Nothing or Just 0 or... "Just some other Integer"? That is, can I treat... um, not sure how to say this in light of your post... "individual values of a particular named type" the same way as I can treat "different named types"?
Ah I see, so what you're talking about is the syntax for defining a sum type. This confused me too. In Haskell, a definition of a sum type looks something like:
data MaybeInteger = Nothing | Just Integer
This defines the type MaybeInteger by saying how to construct the values that inhabit it. `Nothing` is a value that is immediately of type MaybeInteger. `Just` is a way to construct a value of type MaybeInteger by feeding it an Integer. You use `Just` to construct a value by applying it to an Integer: `Just 4`, for example.
There is an alternative syntax for defining sum types that might be clearer:
data MaybeInteger where
  Nothing :: MaybeInteger
  Just :: Integer -> MaybeInteger
When using sum types, we use a case statement:
case x of
  Nothing -> "nothing"
  Just 1 -> "one"
  Just _ -> "something else"
data MaybeInteger where
  Nothing :: MaybeInteger
  Just :: Integer -> MaybeInteger
Doesn't this mean that Nothing (i.e. MaybeInteger) is the same "type"/"supertype", however you call it, as a constructor from Integer to MaybeInteger?
How does that work? E.g. how is MaybeInteger equivalent to Integer -> MaybeInteger? I guess ending up at the same type is crucial, but one looks to be the type directly, while the other is a constructor for the type.
And how can "Just" be a constructor for MaybeInteger here, but e.g. MaybeString in some other example? Is it specially blessed? What kind of thing is it in the language?
Nothing and Just are constructors. Constructors need not be the same type. The whole point of a sum type is that you can construct it in multiple ways, after all.
In this case Nothing is immediately a value of type MaybeInteger. Just on the other hand has the type of a function :: Integer -> MaybeInteger.
So they don't have the same type, but Nothing and Just 5 have the same type, MaybeInteger.
For pedagogical purposes I pretended to work with MaybeInteger instead of Maybe from Haskell. The definition of Maybe is:
data Maybe a = Nothing | Just a
or alternatively
data Maybe a where
  Nothing :: Maybe a
  Just :: a -> Maybe a
I didn't want to simultaneously explain polymorphism and sum types, but the way you can combine them is vastly more useful than either feature alone.
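A tiny example of that combination: one polymorphic Maybe plus pattern matching gives a single safeHead that works for lists of any element type.

-- Nothing signals an empty list; Just carries the head, whatever its type.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x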
It's rarely useful (though there are some interesting examples), but it certainly is possible to represent the naturals as Church encodings [1] or Peano naturals [2] entirely in the Haskell type system. You'll see some references to such things elsewhere in these threads.
You can have a "type" for 4 in Peano naturals and use that as a type guard on a function that can only physically take the number 4, and the Haskell type checking system will enforce that.
It's rare that you need that granular of a type, though, and as pointed out in sibling response it's easy enough to do with runtime pattern matching of Integer values. But it does illustrate that type abstraction "can go deeper".
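A rough sketch of what that looks like in Haskell, using the DataKinds and GADTs extensions (the names here are illustrative, not from a library):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Peano naturals, promoted to the type level by DataKinds.
data Nat = Z | S Nat

-- A singleton tying each type-level Nat to exactly one runtime value.
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

type Four = 'S ('S ('S ('S 'Z)))

-- Can only ever be handed the value representing 4; anything else
-- (e.g. SS (SS SZ)) is rejected at compile time.
onlyFour :: SNat Four -> String
onlyFour _ = "got exactly four"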
Indeed, maps and sets cannot be constructed from sums, products and recursion. They can be approximated if you throw exponentials in there.
Instead, you can extend algebraic data types to combinatorial species, which do allow exactly the kind of quotienting you speak of. While they're unexplored in programming, they are believed to be programmable similarly to ADTs.
Don’t mind my nitpicking. A huge part of algebraic data types arises from products and sums just as discussed here. As is always the case, there is hairiness between metaphor and precision, but presenting these ideas in JavaScript like this goes a long way!
Please don't do the "I'll never understand this" thing! Sure, it's possible that you never will, but mostly it's just about trying and practicing... AFAIUI the current scientifically supported thinking is that a good way to learn is to approach the subject from many different angles.
Something to do with encouraging your brain to form lots of connections from different concepts to the concept you're trying to learn...
I never thought I would understand point set topology, but for a few years I really did. (That was quite a few years ago, and since I haven't practiced... Well, it atrophied. Still remember my favorite proof, though! I also remember what a nightmare proof Stone–Weierstrass was :) ... amazing, but still a nightmare.)
I actually have a decent practical understanding of sum types as they are used in Rust at least (including the tagged union implementation of enum). It's good enough to do useful work.
Every time that functional programming patterns get discussed, though, there is a mad rush by functional programmers to impress upon everyone just how complicated they are.
It is the same impulse that keeps the mathematical entries in Wikipedia accurate but utterly worthless for most people.
Absolutely, a practical understanding is enough. It also suffices for monads, monad transformers, etc.
> Every time that functional programming patterns get discussed, though, there is a mad rush by functional programmers to impress upon everyone just how complicated they are.
I... haven't observed this behavior personally. FWIW, I don't think that saying (for example)
> Hey, we can actually define what "|" and "(_,_)" means in a mathematical way
detracts in any way from a practical understanding. Can you elaborate on why you think it's detrimental? (I understand that it can come off as a little bit smug, but beyond that... meh?)
Asserting that an article is "sprinkled with claims that go a bit too far" calls into question what the reader may have learned. It does not build upon the article, like a popularizer would, but first tears it down.
Then, to restore the reader's sense of mastery over the material, it's proposed that they grok a significantly more esoteric perspective.
I'm just trying to understand... is your beef here with the particular choice of verbiage?
(FWIW, I think it's absolutely right to call into question what the reader may have learned if an article is misleading. We can argue about wording, etc., but accuracy is important, I feel. Now, Lies-to-Children[0] certainly have their place, but they must be fundamentally accurate even if simplified.)
[0] I think I first heard of this concept in the Science of Discworld books...?
When I found sum types (in Haskell) I could not believe they had not been part of any prior language I'd learned. You can kind of fake them with inheritance but it's awkward, and you don't get totality checking, and you can't close the set of alternatives, so a new descendent added later can break things.
> I could not believe they had not been part of any prior language I'd learned.
Huh? They're part of Pascal (known as variant records) and plenty of other languages besides. It's mostly C that lacks them, and even then you're just expected to implement them yourself, for maximum efficiency.
> Huh? They're part of Pascal (known as variant records) and plenty of other languages besides. It's mostly C that lacks them, and even then you're just expected to implement them yourself, for maximum efficiency.
"plenty of other languages besides" doesn't matter if those languages aren't being used in practice.
You can argue about the demographics of Stack Overflow or the bias in users who actually respond on these surveys or a million other things. But based on job postings I've seen in the last several years, I'd wager this list is pretty accurate.
Yeah. You're not wrong. Sadly, while HN is full of people of great technical expertise, it's also full of people whose knee-jerk reaction to reading _anything_ here is to think of how they can say it's wrong, preferably in a way that implies or explicitly states that they're bigger experts than the person they're replying to. I find it very off-putting, personally, but I keep wading through the crap because (thankfully) that's not the whole story.
I did follow the link, but I disagree with your conclusion that being used by at least 6.6% of respondents to make apps for hundreds of millions of users amounts to "not being used in practice".
If you take my comment in isolation, that's a reasonable read.
But it was a response to the following comment:
> > I could not believe they had not been part of any prior language I'd learned.
> Huh? They're part of Pascal (known as variant records) and plenty of other languages besides. It's mostly C that lacks them, and even then you're just expected to implement them yourself, for maximum efficiency.
With the context of that post, I hope you would understand that I am disagreeing with the insinuation that every programmer has used "Pascal [...] and plenty of other languages besides". You can easily learn five different languages and finally end up in Haskell without having encountered ADTs before.
They're not present in the dominant statically typed languages (C/C++/C#/Java/Go) and not really a thing in dynamic languages generally, and so the vast majority of programmers never encounter the concept.
Well, Pascal was dominant for a short time before C and its family really took off. But that was such a long time ago that I kind of forgot about it. C (without ++) was nice because you could write code without too much boilerplate compared to Pascal, and it was noticeably faster (computers were slow). However, any mistake with pointers could crash the computer if it was a DOS PC and not a UNIX workstation.
My understanding is that's not really a sum type, though -- e.g., you couldn't use it to add a type to itself, only to add distinct types to each other.
No, you can add a type to itself (that is a pain in the ass though). You might be thinking of the fact that it may not hold a value (eg. if an exception is thrown while moving into it), but they made it as close to a sum as they could given the nature of C++.
It's been a long time since I did anything with Pascal, but if I recall variant records correctly, they're not very different from C's unions. In particular, I don't recall variant records having anything like totality checking. I don't even recall it having anything that could tell you what kind of thing it currently holds. But as I said, it's been a long time...
> They're part of Pascal (known as variant records) and plenty of other languages besides.
It sounds like GP is saying they had never used any of those languages beforehand. I'm not sure why that would be so surprising, given that it doesn't come with any other information like "and I had used 20 other languages before it".
Even Haskell doesn't get it right: there are no exhaustiveness checks on pattern matches by default (GHC only warns if you enable the warning, something OCaml does out of the box), and while sum types provide closed unions, getting open unions is clunky (you have to use typeclasses). OOP languages are imperfect in the opposite direction: open unions out of the box (inheritance) but no closed unions and no pattern matching. Seems language authors on the whole are bad at thinking design spaces through. "The user might want any of three simple things X, Y, or Z, so let's choose the one that our language will support well, the one it will kind of support, and the one it won't support whatsoever just for the hell of it".
We say a sum type is "open" if more alternatives can be added later (or some alternatives can be removed later), perhaps in a different translation unit or a different library. We say it is "closed" if all alternatives must be specified at the definition, no alteration allowed.
For example if you write this in Haskell
data PrimaryColor = Red | Green | Blue
colorToInt Red = 1
colorToInt Green = 2
colorToInt Blue = 3
it is closed because future code can't add more alternatives. To get an open version, you must instead simulate it using type classes, with a wildly different syntax, and (some would say) a hack. Or define a different sum type that wraps the above.
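For concreteness, the type-class simulation looks roughly like this (one possible encoding, not the only one):

-- Each alternative is its own type; the class plays the role of the union.
data Red   = Red
data Green = Green
data Blue  = Blue

class Color a where
  colorToInt :: a -> Int

instance Color Red   where colorToInt _ = 1
instance Color Green where colorToInt _ = 2
instance Color Blue  where colorToInt _ = 3

-- A later module can add an alternative without touching the code above:
data Purple = Purple
instance Color Purple where colorToInt _ = 4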
OCaml on the other hand supports the closed definition above perfectly with a slightly different syntax, but it also supports polymorphic variants (not to be confused with a Haskell sum type that's polymorphic because of a type variable like Maybe), which are lighter weight. For example you can write a function that takes different "constructors" without necessarily giving a definition for the type or listing all alternatives:
let color_to_int = function
| `Red -> 1 | `Green -> 2 | `Blue -> 3
Notice I never defined any type for the three colors! If you wish, you can mention the type:
[< `Red | `Green | `Blue ]
or give it an alias to make it look more normal. But you can add more things to this variant (with some limitations), or remove things from it, later on.
To me, though, the most intuitive way of understanding this is to think of the first as nominal typing, whereas the second is structural typing.
All of these things build up some form of computation that doesn’t execute until later. Until then, they’re represented by some kind of data structure.
I feel lucky that I learned Standard ML among the early languages I learned as a teenager, which got me into this mindset. That was way back before the monadic revolution.
A C union is a sum, but not much of a type, since it's never checked (not even at runtime like Python) and using it wrong in your code will silently corrupt your program.
Where does the author insert themselves into their imaginary social interactions? Are they the sneering functional programmer?
In particular, I wonder about their characterization of sum types. I had to search to make sure, but they never say the word "union". This is curious, because they do understand "tag" and "tagged" as words used to describe sum-type behavior, and because the classic way to explain sum types to "imperative" C programmers is as "tagged unions", or union types with a tag value that explains which of the union's constructors is present. C or Java programmers would instead prefer an analogy with subtyping, which the article proceeds to use, but by overlooking the tagged-union analogy, the author is actually missing out on both a useful analogy for reaching out to imperative programmers, and also how low-level implementations of functional programming languages usually implement sum types.
Pattern-matching is probably the killer part of ADTs, but only two paragraphs were spent upon the concept. Worse, it's implied that pattern-matching is tied deeply both to ADTs and to functional programming, when it occurs outside of those contexts. Languages like Python 3, Ruby, Racket, and Swift fall outside the traditional "functional programming" lineage but still have interesting pattern-matching abilities.
The author still doesn't know Haskell. Want an effectful loop? Control.Monad.Loops [0] contains many prebuilt loops, and they're written in standard Haskell. The ST monad [1] provides imperative variables and mutation. So does IO [2], if one insists on avoiding GHC for some reason. I agree that it's good to learn an ML, and Haskell is a fine choice, but the author needs to realize how drenched in Haskell memes they currently are, and to either learn more of Haskell itself, or to take a break from it for a while.
My thesis is that self-proclaimed functional programmers spend more time on mathematical structures than on implementation details. Hence "sum type" and not "tagged union" (how the target evaluator distinguishes the cases should not name the type).
Software developers are allowed to have discussions about "integer" types without being constantly interrupted by number theorists to remind them that those aren't really integers.
Perhaps one day we will be able to have similar discussions about sum types without constant interjections by functional programmers.
A product type is just a closed group of values where you know in advance what will be grouped and what type each value in the group is going to have.
A class in Java for example is a product type. As it describes a group of fields and their types. Same for a struct in C++.
Where I disagree with the article is that JavaScript does not have product types, because you do not have a definition of such a group of values in advance. A JS object does not list what values it groups and what type they each will have. A JS class comes closer in listing what values it will group, but not their types. Also, product types must be closed, and a JS class defines an open group; at runtime the object could group more than what it specified, unless explicitly frozen. You could say a frozen class defines a product of types where all types are the Object type and thus can be of any value, but that's a stretch, because product types are only as useful as they define a constrained product of possible values for the type.
A sum type is just a known list of possible types a value can take. That's the one people aren't as familiar with, as most popular languages don't have a construct like it. It lets you say: this variable can be a String or an Int, but not both.
Often people say product type is AND while sum type is XOR. This variable contains a String AND an Int. They will both be there. So it contains both. That's a product type. This variable can contain a String XOR an Int, they can't be both contained, it is only one or the other. That's a sum type.
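In Haskell notation, just to pin the two shapes down (the type names are made up):

-- Product: every value carries a String AND an Int.
data Both = Both String Int

-- Sum: every value carries a String XOR an Int, never both at once.
data OneOf = AString String | AnInt Int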
You can think in Java that all types are a sum type: they are either null or of their defined type. But null isn't a real type; it's more that it is an allowable value of all types. So it's a bit of a stretch.
Like the article said, the best way to mimic them in popular languages is by extending an abstract class. Each type the variable can be will be a child of the parent. Thus a variable of type parent can be one and only one of its children. It's not full featured: you can't make existing types extend from the parent, so you can't arbitrarily define sum types using any available type. You also often can't list all the possible types of parent; depending on the reflection capabilities, knowing everything that extends a type isn't always feasible. Also, the static type checkers don't treat these like sum types, so they won't tell you that you forgot to handle cases where parent is one of its child types. Etc.
There's an old book Functional Programming: Practice and Theory by Bruce J. Maclennan to learn this if interested, it uses standard math notation instead of being restricted to teaching an existing programming language.
I found the best nugget of this entire article to be at the very end
> Keep persevering. Don’t give up. If you find that a bunch of blog posts don’t explain things in a way that makes sense, skip them. Keep looking until you find some well-written ones. The same goes for courses and tutorials. Everyone comes from different backgrounds. What makes sense for someone else might not work for you. And that’s OK.
I feel like this is advice everyone should hear, especially junior or mid level developers
I actually think this is a great little series of articles.
Something I've found confusing with algebraic data types is the fact that they are not mathematical concepts. They originate as a programming construct and were invented as part of a programming language. They were then given a name that makes it sound mathematical, but it was just an attempt at giving some kind of metaphorical meaning to them. Same as if I name a search engine Odin, because Odin was the god of wisdom. But if you thought it had anything to do with Norse mythology, you'd similarly get confused.
People tried to find correspondence for them with mathematical theories. Thus, in Set theory they could correspond to Disjoint Unions of Cartesian Products. Or in category theory, they could correspond to Coproducts of products.
I guess you could say they came out of type theory, but I'm not sure of the history here fully. Did the extensions to the simply typed lambda calculus first come out of the theory, or was the theory retroactively built from the programming construct?
At any rate, my point is, these mathematical correspondences are very confusing. And I think it's wrong to try and understand these things from the mathematical angle unless you're a mathematician trained in one of the corresponding mathematical theories.
While I don't know the history of the term "algebraic data type", I do know that the algebraic structure of ADTs is much deeper than what this article presented. For instance this structure supports some limited notion of a derivative[1]! And purely algebraic manipulation of ADT "equations" can actually be used to generate some non-obvious results [2]. So I think it's probably wrong to say that ADTs were just "given a name that sounds mathematical".
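To give a flavor of the derivative result (a standard example, with made-up names): the derivative of a type describes its one-hole contexts. A pair is a^2, and its derivative 2*a says "pick which of the two slots the hole is in, and keep the other element":

-- Pair a corresponds to a^2.
data Pair a = Pair a a

-- Its "derivative" 2*a: which slot has the hole, plus the remaining element.
data PairHole a = HoleInFirst a | HoleInSecond a

-- Plugging a value back into the hole rebuilds a pair.
plug :: a -> PairHole a -> Pair a
plug x (HoleInFirst y)  = Pair x y
plug x (HoleInSecond y) = Pair y x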
That's all true from the fact that there are real correspondences with math theories. But like, that's the whole point of applied math no? To find correspondences with other things so that we can then leverage and apply the math principles to that thing?
I'm sure there are quite a few things in programming languages where you can use such a correspondence to your advantage, not just ADTs. But we don't teach them starting from the mathematical correspondence and working back.
To put it more simply, most programming isn't taught by first teaching learners about theoretical computer science. Generally it happens either in parallel, or you learn the theory after the fact. But for some reason, when it comes to functional programming, it seems most teachers want to start with teaching you the theory.
And my above point is that, even historically, it is not always true that theory came first. Oftentimes, the constructs were invented and used practically from finding solutions to concrete problems, and later a theory around them was developed. In the case of ADTs, I'm not 100% sure which came first, but I know they were first implemented in the programming language Hope. Not sure if the theory was there prior or not.
I think you might be right in this case that practical use preceded theoretical understanding but that's not really my point. Software development certainly has examples of pseudo-math (for example there was a trend a few years back to write JavaScript that could run on both the server and the client. For reasons passing understanding, this was called "isomorphic" by some practitioners). All I meant was that ADTs are not that.
Ah I see. I believe you are correct. The correspondence seems to hold quite well, and even the name algebraic was pretty aptly chosen in this instance.
I'd be curious to know if there's an original paper on ADTs somewhere about it.
I actually prefer the second answer in that stack exchange, the one from Gilles. It basically says what I'm saying:
It is not that there are no mathematical correspondences, but ADTs are not constructs of any mathematical theory. There's no ADT in set theory. There's no ADT in category theory. The only place you might find them is in type theory, but it's unclear to me whether that is the theory that defined them, or simply the theory used to define them mathematically after their invention in computer science. And like Gilles' answer suggests, ADTs bring some concepts that complicate the mathematical formalism for them.
This is why the whole sum type and product type thing is wishy-washy. You can say Java classes are product types, but they are not quite like Haskell's product types, etc. They aren't an implementation of a very well defined mathematical theory. Unlike, say, how Java and Haskell both have support for set theory and algebra built in.
> There's no ADT in category theory. The only place you might find them is in type theory
I wouldn't say that this is true. Category theory has a definition of product of objects, sum/coproduct of objects, terminal object (unit type), and initial object (empty type). There is a correspondence between type and category theory where a certain type theory is the internal language of a certain category: https://ncatlab.org/nlab/show/computational+trinitarianism
Union is a better example than Enum to explain what an algebraic data type is. It’s a natural progression from Struct as product type to Union as sum type.
Okay, maybe the design isn't great. Can we talk about algebraic data types now? I come to HN for interesting discussion about interesting topics. I start to wonder why I bother when the only comments on a post like this, which actually concerns something that interests me, are complaints about design.
(If you think comments have gone off-the-rails, I find the best solution is to downvote the off-topic top level comments, and then make a substantive on-topic top level comment.)
While you do have a point, there's a very simple way to collapse comment threads that you don't care about. And instead of commenting on an existing thread about the design (sparing those who have already muted that), you chose to create your own comment thread and further clutter the post discussion.
While we're on the topic, why do sites have non-wrapping anything? That site has non-wrapping code blocks, which made it so tedious to read on my phone that I gave up. HN has non-wrapping quotes. In what universe is non-wrapping anything a good idea?