
I'm so excited in anticipation of my near-term return to studying math, as an independent curiosity hobby. It's going to be epically fun this time around with LLMs to lean on. Coincidentally, like Terence Tao, I've also been asking complex analysis queries* of LLMs, things I was trying to understand better while working through textbooks. Their ability to interpret open-ended math questions, and quickly find distant conceptual links that are helpful and relevant, astonishes me. Fields Medalist Professor Tao (naturally) looks down on the current crop of mathematics LLMs ("not completely incompetent graduate student..."), but at my current ability level that just means looking up.

*(I remember a specific impressive example from 6 months ago: I asked if certain definitions could be relaxed to allow complex analysis on a non-orientable manifold, like a Klein bottle, something I had spent a lot of time puzzling over, and an LLM instantly figured out that it would make the Cauchy-Riemann equations globally inconsistent. (In a sense the arbitrary sign convention in CR defines an orientation on a manifold: reversing manifold orientation is the same as swapping i with -i. I understand this now solely because an LLM suggested looking at it.) Of course, I'm sure this isn't original LLM thinking; the math is certainly written down somewhere in its training material, in some highly specific postgraduate textbook I have no knowledge of. That's not relevant to me. For me, it's effectively impossible to answer this type of question, where I have very little idea where to start, without either an LLM or a PhD-level domain specialist. There is no other tool that can make this kind of semantic-level search accessible to me. I'm thinking very carefully about how best to make use of such an incredibly powerful but alien tool...)
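To spell out the sign argument (my own sketch, not taken from any particular source): for f = u + iv in local coordinates (x, y), the Cauchy-Riemann equations read

  u_x = v_y,   u_y = -v_x

Swapping i with -i (equivalently v with -v) flips both signs, and that is exactly what the orientation-reversing chart change (x, y) -> (x, -y) does to the equations. So a globally consistent choice of sign convention amounts to a choice of orientation, which a Klein bottle doesn't admit.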



I agree. Having access to a kind of full semantic search engine over basically all textbooks on Earth feels like a superpower. Even better would be if it could pinpoint the exact textbook references it found the answer in.


How will you know if its answers are correct or not?


Because I'm verifying everything by hand, as is the whole point of studying pure mathematics.


How can you verify a proof though? Pure math isn't really about computations, and it can be very hard to spot subtle errors in a proof that an LLM might introduce, especially since they seem better at sounding convincing than at being right.


The same way I verify my own proofs of textbook exercises: very cautiously. Subtle errors are a feature of the problem domain, not a novelty.


By using Lean, a proof assistant and a functional programming language.

Here's @tao on mathstodon saying he's learning it.

https://mathstodon.xyz/@tao/111206761117553482


To code proofs in Lean, you have to understand the proof very well. That doesn't seem very reasonable for someone learning the material for the first time.


That's not true at all.

You can literally learn how to write proofs using Lean: https://djvelleman.github.io/HTPIwL/
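To give a sense of the level, the early material is on the order of (my own toy example, not taken from the book):

  -- Lean 4: prove that 'and' commutes, by swapping the two components.
  theorem and_swap (p q : Prop) (h : p ∧ q) : q ∧ p :=
    ⟨h.right, h.left⟩

The book walks you through how statements like that are put together, so you learn proof-writing and the tool at the same time.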


The examples in this book are extraordinarily simple, and cover material that many proof assistants were designed to be extremely good at expressing. I wouldn't be surprised if an LLM could automate the exercises in this book completely.

Writing nontrivial proofs in a theorem prover is a different beast. In my experience (as someone who writes mechanized mathematical proofs for a living) you need to not only know the proof very well beforehand, but you also need to know the design considerations for all of the steps you are going to use beforehand, and you also need to think about all of the ways your proof is going to be used beforehand. Getting these wrong frequently means redoing a ton of work, because design errors in proof systems are subtle and can remain latent for a long time.


> think about all of the ways your proof is going to be used beforehand

What do you mean by that? I don't know much about theorem provers, but my POV would be that a proof is used to verify a statement. What other uses are there one should consider?


The issue is: there are lots of ways to write down a statement.

One common example is whether you internalize or externalize a property of a data structure: e.g. represent it with a dependent type, or as a property about a non-dependent type. This comes with design tradeoffs: some lemmas might expect internalized representations only, some rewrites might only be usable (e.g. without horrifying dependent type errors) with externalized representations. For math in particular, which involves rich hierarchies of data structures, your choice about internalization can impact which structures from your mathematical library you can use, or the level of fragile type coercion magic that needs to happen behind the scenes.
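A toy Lean 4 sketch of the internalize/externalize choice (hypothetical names, just to illustrate the tradeoff):

  -- Internalized: the length constraint is baked into the type.
  structure Vec (α : Type) (n : Nat) where
    data   : List α
    len_eq : data.length = n

  -- Externalized: a plain list, with the constraint stated as a separate proposition.
  def hasLen {α : Type} (l : List α) (n : Nat) : Prop :=
    l.length = n

  -- Lemmas about hasLen apply to any List and rewrite freely...
  theorem hasLen_nil (α : Type) : hasLen ([] : List α) 0 := rfl
  -- ...whereas lemmas stated against Vec only fire once you've packaged
  -- the data and the proof together, and dependent rewriting gets fragile.

Neither choice is wrong; the pain comes when half your library expects one form and half expects the other.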


The premise is to have the LLM put up something that might be true, then have Lean tell you whether it is true. If you trust Lean, you don't need to understand the proof yourself to trust it.
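In the simplest form the workflow looks something like this (a hypothetical illustration of the division of labour, not a claim that current models produce such output reliably):

  -- Statement and proof term as suggested by the LLM.
  -- If Lean elaborates this with no errors and no `sorry`,
  -- the kernel has checked it; you trust the checker, not the model.
  theorem suggested (n : Nat) : n + 0 = n := rfl

A wrong or incomplete suggestion simply fails to compile, which is the whole value of the setup.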


The issue is that a hypothetical answer from an LLM is not even remotely easy to put directly into Lean. You might ask the LLM to give you an answer together with a Lean formalization, but this kind of 'autoformalization' is at present not at all reliable.


Tao says that isn't the case for all of it, and that on the massive collaborative projects he's done, many nonmathematicians worked on sections of them. He says someone who understands it well needs to do the initial proof sketch and the key parts, but that lots of parts of the proof can be worked on by nonmathematicians.


If Tao says he's interested in something being coded in Lean, there are literal teams of people who will throw themselves at him. Those projects are very well organized from the top down by people who know what they're doing; it's no surprise that they are able to create some space for people who don't understand the whole scope.

This is also the case for other top-profile mathematicians like Peter Scholze. Good luck to someone who wants to put ChatGPT answers to random hypotheticals into Lean to see if they're right; I don't think they'll have so easy a time of it.


are you questioning the entire premise of pure mathematics?


Good luck! That can be pretty hard to do when you're at the learning stage, and I would think doubly so given the LLM style where everything 'looks' very convincing.


How will we even measure this? Benchmarks are gamed/trained on, and I doubt there is much signal in the chatbot arena for these types of queries.

I think in just a few months the average user will not be able to tell the difference in performance between the major models.



