ANN neurons are McCulloch-Pitts neurons, an extremely stylized model based on the 1940s understanding of neurons. Each neuron represents a dot product plus a function application.
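For reference, the entire McCulloch-Pitts model fits in a few lines. A minimal sketch (Python; the weights and threshold below are hand-picked for an AND gate, purely for illustration):

    import numpy as np

    def mcculloch_pitts(inputs, weights, threshold):
        # one unit: weighted sum of the inputs, then a hard threshold
        return 1 if np.dot(inputs, weights) >= threshold else 0

    # AND gate over two binary inputs
    print(mcculloch_pitts([1, 1], [1, 1], 2))  # 1
    print(mcculloch_pitts([1, 0], [1, 1], 2))  # 0

That's the whole model: no dynamics, no chemistry, no geometry.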
A biological neuron is 6e17 Daltons[0], so on the order of quadrillions of atoms. A single synapse is a huge landscape studded with receptors of various kinds, and the whole thing is swimming in salt solution where chemicals diffuse stochastically. And then there's the glia.
This is why I dismiss any claims about the computational power of the human brain, most of which seem to begin with the assumption that "1 spike = 1 FLOP" and that spikes are all that goes on in the brain.
To add to that, there's evidence suggesting that memories may be stored in DNA, shuffled between neurons in RNA capsids evolutionarily borrowed from retroviruses.
That would make each neuron a complicated computer, functionally equivalent to a PC in capability and data storage.
"The neuronal gene Arc is essential for long-lasting information storage in the mammalian brain, mediates various forms of synaptic plasticity, and has been implicated in neurodevelopmental disorders. However, little is known about Arc’s molecular function and evolutionary origins.
Here, we show that Arc self-assembles into virus-like capsids that encapsulate RNA. Endogenous Arc protein is released from neurons in extracellular vesicles that mediate the transfer of Arc mRNA into new target cells, where it can undergo activity-dependent translation. Purified Arc capsids are endocytosed and are able to transfer Arc mRNA into the cytoplasm of neurons.
These results show that Arc exhibits similar molecular properties to retroviral Gag proteins. Evolutionary analysis indicates that Arc is derived from a vertebrate lineage of Ty3/gypsy retrotransposons, which are also ancestors to retroviruses. These findings suggest that Gag retroelements have been repurposed during evolution to mediate intercellular communication in the nervous system."
The passage you quote very much does not amount to "memories are stored in DNA", and as far as I know, memories are not stored in DNA sequence; I would be very surprised if they were.
What this mechanism does is take the transcription level of a gene in one cell, and induce a proportional signal in another cell. There are numerous mechanisms which do that in cells. Most of them are nowhere near as weird as Arc, but the net result is similar.
It's arguable that some memories are stored via epigenetic modifications to DNA, depending on how you define "storing memories". It is very unlikely it is storing memories in the way humans typically think of memories. But I don't think "in DNA" necessarily implies "in DNA sequence".
"In DNA" definitely implies in DNA sequence. Especially with DNA storage becoming a thing, and with aspects of our physiology actually using DNA modifications for their function and to "remember" things (e.g. VDJ recombination), it's best not to confuse epigenetic and genetic information storage. Each hypothesis implies different things and different follow-up questions.
They are certainly different things, but that's why I think more specific language should be used when the difference matters. The other comment was trying to make a more abstract point about what sort of memories an individual neuron can store using DNA, the main point wouldn't really be changed if that was through purely epigenetic mechanisms. I don't think the actual argument he made was compelling, but the claim to be discussed intended a broader definition of "in DNA" IMO. It was focused on the computational implications of the amount of storage a single neuron has.
Perhaps, but I'd argue the differences do matter quite a bit to the computational implications, especially as it relates to ability to reprogram, state space, and persistence, and I think there would be a much different reaction if I made a post mixing up data on a computer's hard drive with the state of whether each transistor in a cpu is on or off, or something like that.
There are different types of epigenetic mechanisms anyway which have different levels of persistence/reprogrammability, so to dive into technical details one would need to get very specific. Different methods for changing the DNA sequence would also have functional differences in how they could store a memory. And it's important to distinguish these mechanisms on a single cell level versus how they function in the body as a whole.
In a single cell I wouldn't call VDJ recombination memory, it's just an efficient way of encoding many different possibilities for types of receptors (and then selecting one). There exist many cells in your body that match a potential antigen you've never seen before - it's just that there are an extremely small number of them. The memory is really encoded by increasing the population of that cell in the body. Which has different storage properties than the individual cell's DNA does. Epigenetic change to a single neuron is unlikely to have a functional effect either, but it is a change that can occur to a mature cell as a reaction to the environment, in a way that VDJ recombination is not. AFAIK there is a lot left to be understood about how epigenetic modifications affect the brain.
I agree this is all interesting and can have implications for computational models, but there are models at many levels of abstraction. So I don't think it is necessary to get at the biological details in order to discuss higher level computational implications. Questions about the capabilities of the memory for each individual neuron would arise, but they could be theorized about without much knowledge of biological details. Of course it depends what your goals are whether you would consider that useful.
Anyway, I think we're both in agreement the OG comment was wrong, I just feel that "in DNA" is very ambiguous, and also not really the problem with his comment given the purpose of HN.
>That would make each neuron a complicated computer
Is this not an accepted fact? Each neuron is like a tiny processor in a large distributed system. This is why it's impossible to build bio-accurate NNs: each neuron has ~10K connections to various other neurons.
Indeed. Building an AI that matches human intelligence using equal or less mass than a human brain requires one or both of two things to be true:
1. The computational mass efficiency of brain tissue is very far from optimum. Considering the amount of time evolution has been improving upon it, I highly doubt that is true.
2. Most of the brain's computation is not involved in cognition. That may be true. We don't really know.
There are hard limits. No matter how you try you can't perfectly simulate three atoms using two atoms. If it turns out we have to, in software, represent fifty percent of neuronal activity to create consciousness we're in real trouble. A dragonfly can take inputs from thousands of ommatidia and use them to track targets in space using only sixteen neurons. How many transistors would it take us to do the same? Take that ratio and apply it to the 86 billion neurons in the human brain and you have a rough idea of what it will take to create strong AI. The numbers aren't promising.
Brain tissue needs to optimize for a lot of things other than computational efficiency. It needs to stay operational for decades with minimal replacement of parts, and it needs to be resilient to a fair amount of bumps, diseases, and chemical injury. Silicon chips don't have to be built to survive these conditions, so it's possible they can be much more efficient at the computational aspect.
> 2. Most of the brain's computation is not involved in cognition. That may be true. We don't really know
I think that's largely known, depending of course on how you define "cognition".
Huge tracts (I don't have numbers) of the cortex are dedicated to things like vision, motor control, etc... Those aren't "cognition" as generally understood, and there are many stroke victims out there who can testify (like, actually "testify", in the sense of using their brain to explain it to you) to the fact that they can no longer see, or move their left side, etc... Their "cognition" is not impaired.
It gets fuzzier with things like speech and recognition, which also have dedicated real estate but are, kinda, "para-cognition" tasks.
Really, yes: you can have a "thinking" engine with a tiny fraction of the computation power of the human brain. I think most folks agree with that. The broader question is that with so limited an I/O structure: what is there for it to think about?
> 2. Most of the brain's computation is not involved in cognition. That may be true. We don't really know.
I thought this at least was fairly well understood: We do in fact use our whole brains, as anything less would be a fantastic waste of resources, which evolution would have taken care of long ago. We have numerous human-specific adaptations to deal with the relatively massive brains we're carrying around.
When you're building an AI you may not need the neurons involved with, for example, breathing. That's what I'm talking about. I'm not a neuroscientist so I don't know for sure whether all the neurons we use for muscle and organ control do double-duty to help us cogitate.
"In animals, it is thought that the larger the brain, the more brain weight will be available for more complex cognitive tasks. However, large animals need more neurons to represent their own bodies and control specific muscles;[clarification needed][citation needed] thus, relative rather than absolute brain size makes for a ranking of animals that better coincides with the observed complexity of animal behaviour. The relationship between brain-to-body mass ratio and complexity of behaviour is not perfect as other factors also influence intelligence, like the evolution of the recent cerebral cortex and different degrees of brain folding,[5] which increase the surface of the cortex, which is positively correlated in humans to intelligence."
I think the idea that brain tissue is near optimally efficient is interesting. Yes, it's had a long time to evolve. But the same can be said about photosynthesis which is less efficient at capturing solar energy than PVs. The evolution of brain tissue was under constraints about something that could be made by biological systems from the resources we could eat. Is it not plausible that some very efficient computational substrate can be made, but requires minerals and chemical and industrial processes which would be toxic or impossible for life?
> But the same can be said about photosynthesis which is less efficient at capturing solar energy than PVs.
The instantaneous efficiency is much lower, sure, but the lifetime efficiency is another question. The resource cost to create a plant is the nutrient/energy cost of producing and dispersing a seed. The energy cost of producing and installing a solar panel is enormous by comparison, and takes years if not decades to capture more resources than it took to produce.
Home solar panels capture 11-15% of incoming energy. Plant leaves capture 3-6%. Do you have solar panels now? If you could throw five dollars in seeds over your roof and get half that amount of electricity, would you?
Transistors, while dramatically simpler than neurons, also have fairly complicated physics. It's quite hard to determine the computational power of a microprocessor when you only know roughly how transistors work and have no idea how microprocessors work.
True, but transistors are designed by humans (for now), so you have an upper bound on complexity as well as a certain degree of modularity. Whereas biology has no such limitations because evolution doesn't care about the understandability of its designs.
So I tend to err on the side of biology being more complex than not.
But it's absolutely possible that the high-level behaviour is simple while the underlying implementation is complex and chaotic, as in the gas laws.
Probably more to the point is that humans are using transistors in their designs, so we deliberately confine them to their simple modes. It is an oversimplification to talk about transistors being either "on" or "off" because they technically have all sorts of intermediate states, but with a few exceptions, human designs avoid those intermediate states like the plague, because A: they defy our ability to build logic with them and B: depending on those exact behaviors means we can't mass-manufacture chips because the variance of the exact behaviors will be too high.
In principle one could imagine a processor design that works on these intermediate states that somehow vastly exceeds the computational power of a modern system despite using the same base transistors; in practice we have no idea how to build such a thing, and if we did, we wouldn't know how to build a second one of the same thing reliably either.
Biology lacks this restriction. That doesn't mean it's pure and utter chaos, either, there's bounds on that because it still needs systems to at least be metastable. But where humans engineer almost exclusively with stable systems, biology freely uses metastable systems all over the place. And then, even more remarkably, it deals with the question of how to replicate such a strange system in a way that no modern human engineer ever would by making every instance unique, and still somehow functional.
Complexity cuts both ways. Complexity, as every programmer knows, is no friend, and can easily create more problems than it solves.
Often the most efficient solution to a problem ends up being a simple one. It would be very surprising if the human brain's Rube Goldberg machine was anywhere near close to a mathematically optimal implementation of intelligence.
It's possible that much of the complexity in a biological neuron is simply working around other complexity introduced by biology, solving problems that we programmers do not even have to think about because we can simply directly use matrix multiplication.
Transistors were invented by humans, but chip layouts themselves involve a lot of automation and at more than one level — Verilog and VHDL both appeared in the 1980s, and there’s tools at both higher and lower levels of abstraction than those two.
Biology hasn't had time to figure out that intelligence is a good idea; up to humans, the payoffs have been relatively low. The complexity in human intelligence probably focuses more on doing basic things extremely energy efficiently rather than on being effective at thinking.
Humans can attempt to multiply numbers and frequently get the wrong result. That doesn't scream "pushing the limits of intelligence".
I think even dramatically simpler might be an understatement. Neurons are cells, living organisms capable of growth, movement, and some fairly intelligent interaction with their environment. A transistor doesn’t even approach that, it’s more on the level of a single protein within a neuron and not a very complex one.
Could indeed be a trade-off between speed vs complexity and efficiency.
Maybe there will be transistor-based human-level AGI soon, but I guess it would require several kilowatts of power compared to the 20 or so watts a human brain requires.
> Maybe there will be transistor-based human-level AGI soon, but I guess it would require several kilowatts of power compared to the 20 or so watts a human brain requires
Given that entire datacenters don't come close to an AGI, let alone a human level one, I fear "several kilowatts of power" is lowballing it by a significant number of orders of magnitude.
More to the point, we don't even know how or what gives rise to a general intelligence, and even defining it is basically a philosophical question. To me, the optimism of some AI enthusiasts (and I don't mean specifically the parent) feels like cavemen contemplating an expedition to the stars shortly after they invented the sling.
I think the idea of us arriving at anything close to an AGI iteratively, based on our current capabilities, is a pipe dream.
The lowest lowball estimate I’ve seen for the computational capacity of the human brain is 20e15 FLOPS (Kurzweil, 1999), and that would require 2/3 of a megawatt with the current best ranked supercomputer on the Green500 list.
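To make the arithmetic explicit (the ~30 GFLOPS/W efficiency is my assumption, roughly what the 2/3 MW figure implies for a top Green500 machine; substitute your preferred numbers):

    brain_flops = 20e15              # Kurzweil's lowball estimate, in FLOPS
    flops_per_watt = 30e9            # assumed ~30 GFLOPS/W for the best Green500 machine

    watts = brain_flops / flops_per_watt
    print(f"{watts / 1e3:.0f} kW")   # ~667 kW, i.e. about 2/3 of a megawatt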
There are already several computers more powerful than this that don’t appear close to a working full brain emulation, and we definitely don’t understand intelligence well enough yet to engineer something like ourselves, so it’s reasonable to be skeptical of estimates saying we’ll be at the kilowatts level “soon” even if it turns out we’re just missing a step which will be obvious in hindsight.
(Unless by “soon” you mean 15 years; I don’t want to bet on anything on that timescale).
> Unless by “soon” you mean 15 years; I don’t want to bet on anything on that timescale
Well, I'm old so by "soon" I mean within my lifetime, e.g. the next 3 to 4 decades.
edit: to clarify a little, when the term AI was first coined by McCarthy in 1956, researchers were confident in cracking AGI within a decade. Then AI-Winter came and people became more cautious. So when I say "soon", I mean it's probably not going to be another 65 years, but also not 5 months or 5 years (unexpected breakthroughs aside).
But digital chips are designed to use transistors as on/off switches, so the complicated physics don't matter for understanding a microprocessor. (It's different for analog ICs, where the characteristics of the individual transistors do matter)
Yes, but you wouldn't know that if the best knowledge you had about the inner workings of microchips is essentially looking which parts get hot while they perform different tasks.
The paper is measuring I/O behaviour, rather than the complexity of the mechanisms generating that behaviour. Transistors might have quite complex physics, but are designed to have relatively simple I/O behaviour.
Also true - witness the weird circuits generated by evolutionary algorithms on FPGAs etc., where they use chip-specific nonlinear dynamics and capacitive / inductive coupling which only works on the specific chip used to evolve the circuit.
And if you need ab-initio quantum chemistry (or equivalent) simulations of every atom in a human brain, the timelines for both AGI and whole brain emulation get pushed far, far into the future.
Assuming AGI requires about as much hardware as it'd take to emulate a brain, at least.
My 2c (apologies for the aggressive tone -- I'm just excited about AGI):
That's a very very weak upper bound on how much hardware it takes. I think it's not all that different from emulating a Nintendo64 with a quantum simulation of the hardware.
For complex systems to work (not to mention evolve), they need to be robust to small perturbations -- there's no way the computation the brain is doing is sensitive to the details of particular atoms. There has to be redundancy, modularity, etc. These things aren't human inventions so much as they are the only way to meaningfully move in a 2^|giant-number| state-space.
You could argue that despite the huge number of physical degrees of freedom, the operations on DNA can be reduced to copy, repair, express, suppress. On the other hand, there's still a ton of intrinsic complexity in storing a huge amount of data, and yeah, some nucleotides are totally essential.
The other thing I wonder about: sure, maintaining a proteome is hugely complicated, but how much of this complexity goes into maintaining homeostasis (e.g. metabolism, cytoskeleton and membrane maintenance, replication, ...) vs. enabling computation? Seems like silicon has the advantage here.
A grain of sand is around 6.02E19 Daltons, imagine how much computational power there is in a sand castle!
On the other hand, snails have been shown to generate their complex (relatively, I mean they're snails after all) feeding behaviour with just two neurons: https://neurosciencenews.com/neurons-decision-making-4370/
(Veering slightly offtopic here but this is the first I've heard the term 'Dalton', I've always just heard them called AMUs.)
Older estimates are probably short by multiple orders of magnitude.
We can see this in practice by looking at e.g. a self-driving Tesla vs. a mouse.
Watch a frightened mouse run across an uneven landscape, climbing and clambering and avoiding obstacles as it controls four independent limbs and countless small muscles in real time.
A Tesla's auto-drive is nowhere near that good in spite of having only a few levers to control: accelerate, brake, left, and right. It also has far better sensors than the mouse including better eyesight, a wider field of vision, etc., and last I checked mice do not have access to a cellular network supplying them with a heads-up macro view of the local environment. They don't have "fleet-wide learning" either.
The Tesla's AI uses over a hundred watts of power. The mouse's brain uses milliwatts.
As I wrote this my brain was consuming between 30 and 60 watts. My laptop peaks out at 80.
We are not even close to what biological neurons accomplish in raw compute, and while we are getting pretty good at training giant regression models that we call AI I am not convinced we really understand things at the algorithmic level yet either.
Perhaps Spot from Boston Dynamics is a fairer SotA comparison. Though even then, the magnitude of scale is off and still the gulf in ability is wide. I wonder how our artificial insects stack up computationally against organic ones. I think there is much to learn from the bee brain, for instance.
That is not true; one can easily regularise the derivative, which is a delta distribution, in some appropriate way, the easiest one being a 'triangle' centered at zero. That way one can actually train networks of McCulloch-Pitts neurons.
The strongest statement that might fit here is that they can't learn efficiently. Zero gradients just make learning slower (though I do agree that differentiability is something to strive for).
As a bit of an aside, in practice stochastic versions of algorithms that assume differentiability work on wide ranges of functions, and compositions of poorly behaved functions can behave quite nicely.
For a couple [0] concrete examples:
(1) Throw the absolute value function into your favorite gradient/Newton's minimization routine. Blindly using differentiable techniques often works if a sub-gradient technique would work.
(2) Consider minimizing the magnitude of the smallest eigenvalue of the Jacobian matrix of your favorite function. Many of the intermediate components (e.g., the derivative of the eigenvalue with respect to matrix entries) are poorly defined, undefined, or have cusps and other nasty features. The composition is (under mild constraints) differentiable with non-zero gradients.
(3) Consider minimizing the absolute value of a step function. By using a wide difference quotient as an approximation of the derivative and feeding that into optimizers you'll still find the minimum near zero (See (1); it works similarly).
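A toy version of (3), with a staircase function and step sizes picked arbitrarily for illustration:

    import numpy as np

    def f(x):
        # absolute value of a step (staircase) function: flat almost everywhere
        return abs(np.floor(x))

    def wide_diff_grad(f, x, h=2.0):
        # central difference quotient with a deliberately wide h,
        # so it "sees" across the flat plateaus
        return (f(x + h) - f(x - h)) / (2 * h)

    x = 7.3
    for _ in range(100):
        x -= 0.5 * wide_diff_grad(f, x)
    print(x, f(x))  # lands in the flat region around [0, 1), where f is 0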
If the composite output is only constant on small regions in the input space (which holds if those neurons are modeling anything non-trivial), you can rig together something close enough to backprop to still learn efficiently.
No, because it's wrong. Threshold neurons are still differentiable almost everywhere, no different from ReLUs, which are ubiquitous. They may not be very good activation functions, but they don't prevent a network from learning.
If I understand the McCulloch-Pitts neuron, its outputs are Boolean. This means there's no backprop signal, despite differentiability, since the outputs are constant (thus, gradient zero) in any neighborhood, which in turn zeroes out any learning signal you would want to backprop through them. So you need a different learning strategy than backprop to use them.
(see also: the 'dead neuron' problem/phenomenon with ReLU activations.)
One can regularise the derivative (a delta distribution) in several ways (e.g. a triangle at zero) and that is good enough (even from a theoretical perspective) to find an approximate gradient. Experimentally it is then possible to train deep neural networks with such non-linearities.
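For anyone curious what that looks like in code, here is a minimal sketch (PyTorch; the triangle width of 1 is an arbitrary choice): a hard threshold in the forward pass, with the regularised 'triangle' standing in for the delta distribution in the backward pass.

    import torch

    class TriangleSurrogateStep(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return (x > 0).float()          # Heaviside step: Boolean output

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            # triangle centered at zero, max(0, 1 - |x|), replaces the delta distribution
            surrogate = torch.clamp(1.0 - x.abs(), min=0.0)
            return grad_output * surrogate

    x = torch.randn(8, requires_grad=True)
    TriangleSurrogateStep.apply(x).sum().backward()
    print(x.grad)  # non-zero wherever |x| < 1, so gradient descent has a signal to follow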
> A biological neuron is 6e17 Daltons[0], so on the order of quadrillions of atoms
You may be implying there could be information stored in all those atoms, but I'm not sure that's possible. We live in a thermal bath, which means the behavior of atoms is usually stochastic. Their position cannot reliably hold information without dispersing it rapidly into the thermal environment. One way to get around this is to form chemical bonds, like in DNA, where some kind of order or position is stable. But that necessitates this chemical structure, and importantly it also necessitates a chemical reading mechanism and a chemical writing mechanism (or some kind of kinetic activation). Most parts of a cell are not prepared for any of that as far as I can tell. Information really should be carried by discrete elements such as neurotransmitters, as well as continuous but temporary (unstable) elements such as electric impulses (potentially caused by complex responses to electric potential and current inside the neuron). Even in the electric case, the fact about information stands; electric state is also encoded in atoms.
In other words, we almost certainly don't need the full fidelity to reproduce the behavior of a neuron.
I would need more rigorous examination of the neuron to give a confident estimate, but as a rule of thumb the concentration of relevant information everywhere is much less than DNA's (and mostly negligible everywhere) -- certainly a very interesting research program.
From a quick googling, E. coli DNA has about 4.6 x 10^6 bp, so I would be reasonably confident in an upper bound to neuron information as (volume of neuron / volume of E. coli) x 5 x 10^6 bits (i.e. 1 megabit, 125 kB). The reality is probably much less. DNA is so dense because the reading and replication time is relatively slow. It's not made for rapid, random access at the speed of thought.
If I were to guess, I'd say long term information is probably retained within the concentration of compounds that can be read electro-chemically. The question of information then is how sensitive the neural system as a whole is to differences in concentration and differences in timing and amplitude of neural impulses. Again given thermal noise in the brain and limitations of amplitude, you can give strict upper bounds on neural communication (I'd be surprised at sensitivities more than a few ppm).
So essentially
1 neuron <= log2(distinguishable chemical states) (memory)
And also
1 spike <= log2(electrical degrees of freedom) (bandwidth)
Again, I'd require more information on spikes, but they carry maybe 20-40 bits at most -- so not more than a double-precision float. In ANNs, because of low sensitivities due to the architecture, most of the LSB information doesn't contribute significantly to the computation anyway (whereas the brain could multiplex information more effectively). So it's still likely on the order of 1 spike <= 10 FLOPs.
What's also interesting is non-neuronal computational abilities. Allosteric regulation on receptors chained together seems like it has computational abilities. And such receptors are on all sorts of cells.
> Cortical neurons are well approximated by a deep neural network (DNN) with 5–8 layers
I wonder how many cortical neurons it takes to approximate a ReLU or tanh well. I suspect this number is larger than 1. If so, the paper only shows an upper bound. Think how many neurons it takes to add two 10-digit numbers. It is perfectly feasible that some (possibly large) part of these 5-8 layers is just "emulation overhead".
Does someone know of studies of this emulation overhead, even outside biology?
Even between ARM and x86 there is an emulation overhead due to different memory models while both are register machines.
That’s true, the brain uses a VM to run maths or science, only the best scientists succeed at understanding some of the concepts natively.
Same for music: A student runs the music sheet in a VM, and progressively JIT makes the movements native, which allows much faster execution, and which allows building on top of the base layer.
Maybe we’re doing it all wrong writing programs in assembler. We should give them to a VM, the VM should see the similarity between various pieces of the programs, make them inline, and we could teach the machine faster.
> Maybe we’re doing it all wrong writing programs in assembler. We should give them to a VM,
This is what compilers do. Their input is a program in a more abstract language, either bytecode, an intermediate representation, or a source language.
The problem is that damn undecidability, which is like a minefield of rakes. It's undecidable for a compiler to tell if a program will do anything (e.g. halt). It's undecidable for a compiler to tell if two programs are equivalent. It's undecidable for a compiler to tell if a program is minimal.
So compilers have to, well, be dumber. They approximate a lot.
> The problem is that damn undecidability, which is like a minefield of rakes. It's undecidable for a compiler to tell if a program will do anything (e.g. halt). It's undecidable for a compiler to tell if two programs are equivalent. It's undecidable for a compiler to tell if a program is minimal.
Only for Turing complete languages, to be clear. Now, of course, most interesting problems cannot really be solved in sub-Turing languages, but it's still a fundamental point to consider.
In fact, finding the for loops to do tensor contractions (think matrix multiply but with many more dimensions) alone was something in the NP range. Converting for loops to assembly, as is done by https://polly.llvm.org/, is equivalent to Mixed-Integer Linear Programming, which is equivalent to MaxSAT, which is equivalent to SAT in a loop. In these domains there is a definition of minimal, and they are still hard.
There is no need to approximate a ReLU or tanh well. Machine learning is statistical; the accuracy of these functions is not that important.
ReLU is a 'buggy', technically incorrect activation function for deep learning because its derivative is not continuous everywhere. In practice, it rarely matters. It's chosen because it's faster to use the buggy function than something more proper.
The exact shape of tanh is not important either. It's enough for it to be monotone, roughly S-shaped, and easy to differentiate. Tanh is implemented in hardware so it's used.
Basically anything monotone and approximately differentiable works.
> There is no need to approximate a ReLu or tanh well
Similarly, there might not be a need to emulate neurons well to get the circuits in the brain to work. However, when someone argues that one neuron is equivalent to x artificial neurons, it is necessary to choose a bound for comparison (e.g. L2 error of activation) for the emulations being compared.
Also the nonlinearity only needs to be differentiable because ANNs are trained with gradient descent. With other more biologically plausible learning mechanisms, this might matter even less (or have other constraints / requirements)
Meanwhile, if we actually understood brains, I bet we would find endless examples of 'improper' behavior. Evolution picks up what seems to work, and sloooowly improves the parts that break, leaving good enough alone. (After all, if it doesn't affect reproductive probabilities, it doesn't matter.)
Activation functions will almost certainly not be the crux move for solving AGI.
Tanh is _not_ generally implemented in hardware, and it’s one of the fussier functions in math.h to implement well. Its only real virtues are that implementations are available everywhere, its derivative is relatively simple, and it has the right symmetries.
You're right that neural networks don't care too much about the shape of most activation functions. I assume that splicing together two decaying exponential functions at the origin would work just as well in practice.
However tanh is a bit more special than just having the right symmetries. Sigmoid is the correct function to turn an additive value into a probability (range 0 to 1). Tanh is a scaled sigmoid which fulfills the same purpose for the -1 to +1 interval.
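Concretely, tanh(x) = 2*sigmoid(2x) - 1, which is easy to check numerically (a quick sanity check, nothing more):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-5, 5, 101)
    # tanh is sigmoid rescaled from (0, 1) to (-1, 1), with the input doubled
    assert np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1)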
I sometimes wonder if clamped linear or exponential functions would work better than tanh/sigmoid in places where they're currently used (like LSTM/GRU gates).
Note that tanh saturates to ±1 faster than most alternatives except erf when normalized to have slope 1 at the origin (its series at +infinity is like 1 - 2e^{-2x} + O(e^{-4x}), while many of the other options have polynomially decaying tails, so they don't approach 1 nearly as fast).
I suspect some applications would in theory rather use erf, but erf is even worse to compute than tanh (on the other hand, erf's derivative is really nice, so who knows?)
By splicing together I mean a piecewise function which is `exp(x) - 1` on the left and `1 - exp(-x)` on the right. Which should be similar enough to tanh for most purposes.
Sure, it even has continuous derivatives of all orders and the right slope at the origin. It just doesn’t saturate to +/-1 as fast, which probably doesn’t matter.
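A quick spot check at x = 3 (numbers rounded):

    import numpy as np

    x = 3.0
    print(1 - np.tanh(x))   # ~4.9e-3, matching the 2*exp(-2x) tail
    print(np.exp(-x))       # ~5.0e-2: the spliced exponential is still ~10x farther from 1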
I guess it depends on how accurately you're thinking about those functions being approximated. Neurons have a natural nonlinearity to their input-output (transfer) function, most obvious of which is the action potential threshold. Biological neurons have a saturating nonlinearity because there is an upper limit on their firing rate, but in certain regimes the nonlinearity of a single neuron could easily look qualitatively similar to relu or a (non-negative) tanh.
On the other hand, a single cell much simpler than a neuron (any bacterium) is able to perform significantly more complex calculations than any ANN we've tried so far (successfully interacting with an environment to move and find food).
Comparing these kinds of disparate tasks for "computational power levels" between vastly different architectures, one of which we're not even close to understanding, is generally pretty futile.
He also appeared recently on Sam Harris's podcast, to a bit more skepticism.
I thought he didn't break much new ground from his previous appearance on Lex's podcast in 2019, which I went back and listened to before listening to the latest one.
But...in saying that - listening to him again triggered a multitude of aha moments for me, and got me thinking about things like: "what is a dream", "how does deja-vu work" in the context of cortical columns, 1000 brain theory, predictive voting, etc.
I thought his most recent explanation was interesting: creating an artificial neocortex is not a risk in itself; the risk is rather in what you ask it to do. He was also adamant that giving an AI the ability to self-replicate should be where we draw the line.
This was very interesting when they previously discussed that topic but was months before SARS-CoV-2:
Biology/biochemistry can squeeze a good amount of computation from just 20 watts.
Even if we could get equal computation from silicon and have an algorithm to run near human-level general AI, humans can maintain the comparative advantage as long as the cost of hardware is more than the cost of raising and educating a human, and the operating cost is more than wages for the same task.
Human-level intelligence is highly variable. If we're talking about something that can intelligently and independently make discoveries in math, science, and engineering, then the comparative advantage can't last long, because you'd just pose the problem of "make yourself, but faster" to it.
It doesn’t even need discoveries. Imagine an AI that could manually transcode your high level code into absolutely optimal assembly, simplifying your design, removing unnecessary code, optimizing the code in response to observed behavior, etc. I would guess we have several orders of magnitude of power efficiency loss just from building a system that has understandable and flexible layers of abstraction. A sufficiently powerful AI could work to automatically remove those abstractions and even operate at a higher level of abstraction.
The real question is whether we’re at all on the right track. ¯\_(ツ)_/¯
Suppose you write a program that simulates a TM and then prints "Halt" when the TM halts. The magic AI could optimize this program to just a print (or an infinite loop). This requires solving the halting problem.
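A sketch of that construction (the `tm` object and its `step`/`halting_states` members are hypothetical stand-ins for some Turing-machine encoding):

    def announce_halt(tm, tape):
        # simulate one transition at a time; this loop may never terminate
        state = tm.initial_state
        while state not in tm.halting_states:
            state, tape = tm.step(state, tape)
        print("Halt")

    # An optimizer that rewrites announce_halt(tm, tape) to a bare print("Halt")
    # (or to an infinite loop) has, by doing so, decided whether tm halts on tape.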
It might not be able to do it for any Turing machine / a universal Turing machine, but it might quickly figure out what a given Turing machine will do without executing all of its steps.
It might get it right some of the time, but it will be necessarily wrong some of the time. It's also very possible that for many (possibly even most) TMs, the most efficient algorithm for predicting the output is that TM itself.
> It might get it right some of the time, but it will be necessarily wrong some of the time
Yes, but the proof of the halting problem relies on diagonalization - i.e. a very exotic and carefully crafted input.
I would also like to note that, analogously, modern SAT solvers can solve most instances of NP-hard problems in polynomial time. Even though there exist hard cases they cannot solve in polynomial time (well, assuming P != NP), in practice these polynomial algorithms are exceedingly useful.
> It's also very possible that for many (possibly even most) TMs, the most efficient algorithm for predicting the output is that TM itself.
That could very well be true, but might still not hold for the subset of inputs (i.e. programs) that we actually care about in practice.
I think the problem is you're assuming that general AI = Turing machine, but there's no indication that needs to be the case.
"General AI" to me means human intelligence running on an artificial system (silicon, simulated brain, etc), so the optimization I'm thinking of is more akin to having an assembly expert translate your code into assembly than a compiler optimization pass.
Given that I have optimized a fair bit of code by removing abstraction layers or simplifying code, by definition a general AI should be similarly capable and able to handle even ambiguous tasks like "refactor this codebase in this way". Obviously this gives up accuracy, but humans make mistakes writing code as well, and it would be much easier to say "I've observed a fault with these properties; figure out the problem". It should do an even better job than I can on problems like that, because for complicated problems it should be able to follow complex codebases with greater ease than I can.
Again, I'm using a tautological definition of "general AI": one that's capable of doing all that. If it's not capable of doing that, then it's not general AI.
I understand your points, but it's important to understand that humans ARE Turing machines. We don't know of anything that CAN be computed but can't be computed by a Turing machine, so a general AI would be a Turing machine.
The Turing Machine model is specifically designed to abstract what a human (mathematician) does: you have a notebook (tape) and some kind of working memory inside your head, and at any one time you can either read something from the notebook and change the state in your head, or you can write something new in the notebook. This is what a TM does - it is an extremely abstract description of what it means to think, basically.
The problem for me with that line of reasoning is that it's based on philosophy & not mathematically proven or supported by any clear evidence.
For example, [1], [2], [3] all show there are classes of computation outside of Turing machines. So if we agree there are computations outside of Turing machines, then the question is where does the human brain fall and, relatedly, can non-Turing machines run Turing machines? I suspect the answer to the latter question must be yes given the simplicity of a Turing machine (i.e. pen & paper is sufficient).
Thus, the fact that a human can execute a Turing machine doesn't conclude anything meaningful to me. If you could show that a Turing machine can execute a human brain, then the human brain would 100% be a Turing machine since a core property of a Turing machine is that it can transfer to any other Turing machine.
Even if we build "general AI" on a Turing machine, all we've shown is that there is a class of intelligence that is at its core a complicated Turing machine. It might suggest that a human brain is also a Turing machine (& I'd shift the weight of my prior from let's say 30% we're not Turing machines to 70% we are), but I think the only way to definitively prove that would be to do so by mathematically proving the model of the human brain, and then maybe also using it to actually clone a human brain onto a Turing machine to prove the model correct.
I think until that happens the argument remains philosophical & whichever side you take to be uninteresting. The only purpose of the debate is to show the question itself is important.
I do not have access to the third paper, which may hold somewhat more interest. Otherwise, all examples of models of hypercomputation in [1] and [2] are relying on performing an infinite number of steps in a finite time OR on precisely knowing the solution to an uncomputable problem. These are as interesting as trying to solve human flight assuming anti-gravity exists - they are obviously non-physical, absurdist models.
In fact, we don't know of a single physical process that is not Turing Computable, at least if we add randomness. Even with quantum wave-function collapse, we already know that QCs are still Turing equivalent.
This all suggests to me that the prior for the human mind being Turing equivalent can only be taken to be very close to 1.
That’s a solid counter argument. I assume you’re saying we don’t know of a single physical process that’s not a Turing machine because all of the models we have built to simulate them are Turing computable? That’s a strong point. Maybe I should readjust my prior on this. It does sound like you’re better versed in this topic. I like to learn by asking questions, so if you’re amenable to answering please do. If you don’t, then please don’t. It can be overwhelming for some and I’m not trying to prove you wrong. I admit I rushed into a position in haste without actually being well prepared on the topic and had way too much confidence in my own opinion.
I have some devil's advocate arguments, but maybe they're all bad. Genuinely, my technical grounding and knowledge here has atrophied to (at least I feel like) an extremely laughable point, and it didn't start high to begin with, as undergrad engineering teaches a very different kind of math and I really limited my extracurricular study beyond the core engineering topics. Even there, I did the bare minimum to just cram through exams rather than actually learning and understanding the topics throughout the year.
What if we have only very primitive models of how the world works? After all, we can simulate everything from how the smallest atoms work up to maybe some more complicated chemical interactions. Still, when it gets to biology, our ability to simulate things breaks down rapidly. I don't mean at a performance level, but at a "we're waaaaaaay off in the applied aspects" level. Drug discovery is one I'm thinking of. Or predicting someone's facial features from their DNA (that last one always feels extremely dubious but news friendly). Or general AI. Those seem to have hit a wall in results. That's close to a god-of-the-gaps argument, so let me know if that's a bad faith one: is that just too small a gap for non-computability to live in, vs. we just aren't smart enough yet as a species? I could see that.
Or what about the fact that we don't know of any actually infinite Turing machines at a physical level (thermodynamics and the heat death of the universe put an upper bound there; I am thinking of this now, randomly, to justify my position rather than having considered it beforehand)? So is anything really a Turing machine in reality, or are Turing machines themselves just a useful tool/model of the universe that isn't 100% accurate, with the world working slightly differently and the error coming from non-computability rather than randomness? Let me know if that's just a kooky supposition on my part, in case the math of Turing machines already proves that non-infinite things are always Turing computable and this is a very basic result I have forgotten/never learned.
What if the Turing machine model is an easy tool for simulating parts of the universe, but it's not possible to simulate the thing itself? Like, the universe itself is not computable on a Turing machine. As far as I know something like that has been proven in the past few years - there was a proof that if we are living in a simulation, then the physical rules of the thing running the simulation would have to be very different and not look like our universe. Doesn't that indicate that there's a limit to the size of any Turing machine we could build to model the universe itself, and thus maybe the infinite model required to build a theoretical Turing machine doesn't map to how reality works? I'll admit freely this one may have an improper grounding in Turing machines even at a popular level, as I only read the abstract of that paper and maybe I misread it or I'm misremembering something with so much technical nuance that I'm missing what it actually means.
Has it been investigated on the theoretical side whether a network of Turing machines (whether intelligent or not) all interacting is itself Turing computable? Would that simulation itself be an instance of hypercomputation or some other non-computable area of research? I don't understand the basics well enough to interpret that research, so I'm hoping you do. I started reading the Wikipedia article and my eyes just glazed over.
I would think these would all be very significant problems, but I've admittedly not kept up on my understanding of how this stuff works at a deep mathematical level (and have let those skills atrophy over the years), and maybe that's all been answered or I have significant flaws in my intuition. I really and sincerely appreciate you taking the time to respond and explain and talk about this stuff. I don't like studying, but I love learning the higher-level intuitions of people who do study this stuff more deeply.
Oh, and I'm going to read [1] now. I thought the heading was timely and lovely since we were discussing this, and I look forward to reading it (I haven't yet, so I don't know what the article says yet).
> No method of computing carried out by a mechanical process can be more powerful than a Turing machine.
Although widely adopted, as there is no clear way to prove or disprove its validity, the proposition still remains a conjecture.
I think that’s what we’re basically discussing, right? Still, the way that’s phrased puts it into the P!=NP camp for me so I think you may be right.
Humans can't solve the halting problem either, there is no contradiction. The halting problem is a theoretical problem that needn't apply in real life. If you for example restrict your AI to be able to generate optimal assembly for all programs that don't require more than 100PB of source code to write down, the halting problem no longer applies (in fact you can now implement this AI using a regular expression).
Sure, that's possible, but the opposite is also possible: it might be wrong for most programs we actually write. Well, to be fair, there is some upper bound for any program running on a real CPU.
>Sure, that's possible, but the opposite is also possible: it might be wrong for most programs we actually write.
We could confirm this, though! It's not like we can't find out if a given program halts or is inconsistent. Gödel talks about it in his letter to von Neumann.
There are programs for which we can check this, but there is no general procedure to check whether any program halts. Even ignoring the halting problem itself, say we analyze a program and realize it halts iff P=NP, or iff pi to the e is transcendental, or iff pi's decimal expansion at position Graham's number is divisible by 3. Will that program halt? It might be very hard to say.
More promisingly, there are ways to construct programs such they will halt, using total languages (though not every problem can be solved with such a limitation).
Intelligence might not be comparable on a single scale between intelligences based on different computational substrates. Computers have beaten humans at symbolic integration and differentiation since forever but don't beat humans in other areas. A near human-level intelligence will be vastly superior in many other areas.
"The spiking times of L5 cortical neurons without NMDA receptors can sufficiently be approximated with a 5-8 layer DNN"
This isn't as novel as it sounds; previous work has modeled e.g. firing rates with 2- or 3-layer ANNs. Nor does it provide some fundamental insight, imho.
I'm not sure if I'm reading this right, but, they made a predictive model of a biological neuron that works? Setting aside the how, the achievement is also a thing.
How much work would it take to transcode, say, a nematode into assemblages of these things?
Compartmental modeling is the standard method for simulating neural tissue by breaking it down into electrical cylinders. There is already openworm.org, which is a full cell-by-cell simulation of C. elegans.
Larger projects like the Human Brain Project exist, but despite the existence of large-scale compartmental simulations, we've gained little insight into how brains work.
> How much work would it take to transcode, say, a nematode into assemblages of these things?
You also need to find a way to simulate input (an environment) for this simulated “brain”, otherwise I suspect it would suffer the problems associated with sensory deprivation.
> Cortical neurons are well approximated by a deep neural network (DNN) with 5–8 layers
This is a link to the abstract, so you don't get to find out the width of these networks. But given that neural networks are universal approximators, it seems to me that all that's being said is that a neuron is a very complicated thing. And so the phrasing seems to give the unjustified impression that you can still best think of a biological neural network as just a larger artificial neural network.
5-8 layers, but how many weights per layer? I can't access the article, but it says it's a deep CNN, so probably on the order of a few thousand weights per layer.
9.2 million parameters per neuron. There are an estimated 86 billion neurons in the human brain [0] and 19 billion in the neocortex [1]. That means that emulating a human brain or neocortex with this strategy would require 791 quadrillion or 175 quadrillion parameters, respectively. The largest ANN built so far, GPT-3, has 175 billion parameters [2]. We are 6 orders of magnitude from being able to pull it off.
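The arithmetic, in case anyone wants to swap in their own estimates:

    params_per_neuron = 9.2e6       # DNN approximation of one cortical neuron
    neurons_brain = 86e9            # [0]
    neurons_neocortex = 19e9        # [1]
    gpt3_params = 175e9             # [2]

    brain_params = params_per_neuron * neurons_brain        # ~7.9e17 (~791 quadrillion)
    cortex_params = params_per_neuron * neurons_neocortex   # ~1.75e17 (~175 quadrillion)
    print(f"gap vs GPT-3: {brain_params / gpt3_params:.1e}x")  # ~4.5e6, about 6 orders of magnitude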
Some of those weights presumably go to reproducing highly-conserved features, like the kinetics of particular ion channels. These are "tied" via the genome, in the sense that there's one KCNC1 gene, but millions of neurons express the ion channel it encodes.
On the other hand, this model is also missing all sorts of other interactions: hormones and other neuromodulators, ephaptic coupling, etc.
It's so complicated I would venture that no one even has a reasonable guesstimate of how close we are, beyond "Not very."
Modulo the HUGE computational cost of simulating these things, not to mention the non-trivial task of determining network parameters.
Surrogate models are a thing though, and it's going to be an interesting time as we gradually figure out what approximations and optimizations are 'acceptable', and what computations really are necessary for "intelligence" (whatever that might be).
Arguably, you'd need to simulate tens of billions of neurons in order to achieve human-level intelligence. So, even if you'd have correctly simulated a single neuron, there's still a lot to cover to achieve human-level AI. And even then. The few unfortunate cases where a human child has been reared outside of the normal environment (see Genie), having a human brain has turned out not to be enough to have intelligence that would be recognized as 'human-level'. So, apart from the simulated brain itself, you'd need to devise an appropriate training environment to use for training said brain to achieve human-level intelligence. Which would be a formidable feat in and of itself.
That doesn't follow at all. Human neurons aren't much different from mouse neurons, maybe even chicken or mosquito neurons. After you faithfully model a neuron, you still need untold myriads of them and their interconnects to get a human brain.
> Human neurons aren't much different from mouse neurons
We don't actually know this. Yeah the cerebellum and substantia nigra and other regions preserved across mammals are probably conserved in the neural structure as well. But the human neocortex has quite radically different gene expression compared to rats (which results in the morphological differences). There very well could be "more processing power per neuron" in humans vs rodents.
Also, sensations and feelings are not a logical/mathematical byproduct of the neurons; no matter how "well" you simulate "neurons", feelings and sensations will not emerge.
Unless you believe in a transcendent soul that could be the source of these sensations or feelings, this assertion doesn't make sense. Assuming there is no supernatural soul, it's logically impossible for anything humans experience to not arise from the human body.
This entire notion of qualia is a philosophical quagmire predicated on the idea that if you can imagine something, it must be true ("we can imagine a zombie that behaves exactly like a human, but doesn't have qualia at all"). It's actually as laughable as the "argument from perfection" for the existence of a god.
> You just need biological machinery and computers are not biological machinery capable of producing sensations.
This is a postulate, not an argument. My contention is that qualia are meaningless - like saying that there is such a thing as "feeling like you're computing the number 1000" for a processor, or "feeling like you are a really hard granite" for a piece of granite. Just because we can express it doesn't mean that it makes sense.
All of the conundrums about qualia go away if we just accept this. Alice would not in fact experience anything new when she saw red for the first time, if she knew everything about human cognition and the physical properties of the color red.
I do absolutely agree that we know almost nothing about how these processes actually happen in the brain, and most attempts at AI and bombastic predictions about replacing humans are off the mark by centuries. But that is no reason to assume that there is something completely different going on in animal brains than computation, in the wide sense of the Turing machine model.
I don't think we have the faintest clue what subjective experience of self (whatever you call it, qualia?) actually is, to be able to say it isn't artificially reproducible.
[0]: https://faculty.washington.edu/chudler/facts.html and https://www.wolframalpha.com/input/?i=1E-06+grams+in+dalton