Essay: The Future of Math Research in the Age of AI

The following was featured as a guest post on Silicon Reckoner. Many thanks to Michael Harris for allowing me the opportunity to share this with his readers. The image was not part of that guest post and was created by MS365 Copilot.


Can AI ever be a genuine collaborator?

by: Tamara Kolda

Results are in for the “First Proof” experiment. First Proof is a project by a group of mathematicians, led by Mohammed Abouzaid (Stanford), Nikhil Srivastava (Cal), Rachel Ward (UT Austin), and Lauren Williams (Harvard), that tests whether AI systems can solve research-level math problems on their own. It presents ten original questions, drawn from the authors’ own research across diverse areas of mathematics, for which solutions exist but have not yet been published. (I am one of the problem contributors.) These are specialist problems that would normally require at least a graduate student with specialized training to solve. This is an attempt to develop a realistic way to measure AI’s ability to do genuine mathematical research, and the plan is to repeat the experiment with new problems in the future.

So how did AI systems do on this first batch of problems? Both major AI companies and individuals have shared their attempts to solve the problems, sometimes using a mixture of AI and human collaboration. In our own tests, Gemini Deep Research and ChatGPT 5.2 Pro solved 2 of the 10 problems. My problem (#10) was one that both AI systems were able to solve. On the positive side, they identified a published technique that was not in my solution. On the negative side, they did not actually provide a citation for that technique. It was only the similarity of the two AI-generated solutions that led me to suspect that the AI systems had incorporated a known result, and I had to do my own detective work to find the source. Whatever we may learn from this experiment, it’s already clear to me that mathematical research is going to be forever changed by the advent of modern AI systems.

Before we dig in more deeply, I’d like to explain what a modern AI system is from my mathematical perspective: it is a system of equations with weights learned from training data. We have been using mathematical models for centuries, though usually with just a handful of parameters. In my early days as a researcher, I worked on circuit simulation models, which generally had only a dozen or so parameters. The AI models of today have billions and even trillions of parameters, arguably large enough to store the sum of human knowledge! There is much debate about whether AI works by memorizing its training data. Regardless, an AI system is ultimately a specific mathematical process: a set of equations generates its outputs, and this formalization means that it is vulnerable to mathematical attacks. While I respect that others may have different viewpoints, I personally harbor no illusions about AI consciousness or genuine reasoning capability; I am instead astounded by the creativity and breakthroughs in the design and training of AI systems.
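To make the “system of equations with learned weights” point concrete, here is a deliberately tiny sketch: a two-neuron network whose output is nothing more than weighted sums passed through a nonlinearity. The particular weights below are arbitrary placeholders of my own choosing, not learned values; real systems differ only in scale, with billions or trillions of such parameters.

```python
import math

# Hypothetical weights, chosen arbitrarily for illustration.
# In an actual AI system these would be learned from training data.
W1 = [[0.5, -0.3], [0.8, 0.1]]   # first-layer weights
b1 = [0.0, 0.2]                  # first-layer biases
W2 = [1.0, -1.0]                 # second-layer weights
b2 = 0.1                         # second-layer bias

def sigmoid(z):
    """A standard nonlinearity: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    """Evaluate the network's equations: h = sigmoid(W1 x + b1), y = W2 . h + b2."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sum(w * hi for w, hi in zip(W2, h)) + b2

print(forward([1.0, 2.0]))
```

The entire “behavior” of the model is this deterministic evaluation of equations; everything interesting lives in the values of the weights.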

Indeed, as I have experimented with AI systems in my own mathematical work, I have been increasingly impressed by their abilities. If a solution is accessible to an AI system as part of its training data or via web search, then it seems quite possible that an AI can solve the problem. Impressively, this tends to work even if the solution is couched in different nomenclature or has to be cobbled together from multiple sources. It seems to me that the likelihood that an AI can do this is proportional to how prevalent the general approach is in the literature. One issue — and it’s a killer — is that AI models cannot reliably provide sources for their knowledge. And the lack of fact-checking is a real problem. The AI may confidently cite nonexistent results, assert untrue ones, or plagiarize existing literature, as I observed with the solutions to my problem. On a good day, an AI system can blow your mind. On a bad day, I’ve seen it misrepresent what it’s actually done, feign remorse when called out, and do it all over again. All this means that it can be very hard to differentiate good results from mathematical slop: initially plausible answers that unravel as one digs into the details. The problem is that it is so tempting to accept an AI’s output at face value.

One of my worries for the future of academic publishing is the rising incidence of Human-AI scrapple: AI slop amalgamated by humans without careful, time-consuming validation. (Scrapple is a mixture of pork scraps that is a gristly cousin of Spam.) In my capacity as Vice President of Publications for SIAM, I deal with cases involving author integrity and can see the costs of taking shortcuts with AI. It requires more effort from editors and referees to detect the poor scholarship. Obvious cases involve fabricated citations. Less obvious cases consist of weak arguments, missing citations, and incoherent reasoning. We’ve already seen the negative ramifications of Human-AI scrapple for conferences like NeurIPS, where hallucinated citations are endangering the credibility of this previously acclaimed venue.

So, what does the future hold for mathematics? Let us assume that all the present-day problems with AI systems (such as citations and hallucinations) can be fixed. What would be the role of a mathematician then? Well, first and foremost, the role of the mathematician is judgment: deciding what questions to ask, what theorems to prove, what algorithms to write. This requires someone who has experience, which is why the main job of an advisor is to help a beginning researcher choose what problem to work on. In my role as an applied mathematician, my major responsibility is translating vague stakeholder problems into specific mathematical questions. Once the problem is reduced to mathematics, I’ve often engaged with brilliant collaborators to find the answers. Looking back, could AI have fulfilled that role? Maybe an AI could solve a given math problem, but it has neither the desire to do so nor the drive for creative insight, no opinion on whether or not the question makes sense, no stance on the right approach. In contrast, my collaborators have a point of view. They are able to debate whether we’re asking the right question, inspire a radically different approach, and sometimes change my entire mathematical viewpoint.

Mathematicians of the future will undoubtedly employ AI systems as powerful tools — just as they have employed computers and the Internet as those technologies came along — but they will not be replaced by them. My hope for the future of mathematical research is that it will be produced by people who care about the outcome of their work, who have a stake in the correctness of their results, and who are willing to put in the hard work (including carefully vetting the output of AI systems) to get there.

Acknowledgement: I am grateful to my colleagues Mohammed Abouzaid (Stanford), Andrew Blumberg (Columbia), Ernest Davis (NYU), Gary Marcus (NYU, Emeritus), Dan Spielman (Yale), Nikhil Srivastava (Cal) and Lauren Williams (Harvard) for their insights and feedback on this piece.

About the author: Tamara Kolda is an applied mathematician specializing in data science and artificial intelligence. Her consultancy is MathSci.ai. She is a Fellow of both SIAM and ACM as well as a member of the National Academy of Engineering. She currently serves as Vice President of Publications for SIAM.
