Nobody knows what AGI stands for. This should have been expanded in the thread body.
AI knows. You could have asked ChatGPT to tell you what it meant.
Seriously though, us "knowing things" will start to be a relic of the past. Kids are happy to just "look things up" rather than spend a ton of energy learning and memorizing them.
It is the beginning of the end of us being the smartest things on the planet:
The point at which AI surpasses humans at most economically valuable tasks
Doing tasks has nothing to do with intelligence; that's just automation. Automated self-upgrades are still just automation.
This is not remarkable. Machines have been surpassing humans at economically valuable tasks for centuries already. Cotton gin, bicycle, automobile, factory assembly line, computer.
"Solving math problems" is not what mathematicians do. They create math. Only living creatures can do this. Your AI is a calculator program.
Both things can be true at the same time: 1) AI is trying to raise money and market their product
2) AGI has been achieved.
Please cite one peer reviewed source that shows "Not only that, it was able to solve 25% of problems from a math data set consisting of the most difficult, highly theoretical math problems in the world that only a handful of people (literally, like 5-10 people) are even capable of solving / proving."
Give me a really, really hard coding problem. I am confident AI will be able to solve anything you can come up with and do it in a few minutes. I'll post the solution / answer.
"Solving math problems" is indeed what mathematicians do, if we're talking about theoretical math problems that test mathematical boundaries. You think AI solving some algebra word problems would be news? It got a 100% correct response rate on a human-graded International Math Olympiad data set, including the proofs, which were all original. You don't think that's impressive?
You're either ignorant, or being deliberately obtuse. You really have no clue what's going on, do you?
Not so fast, programmer. Generative AI creates. Math is a bit different but not that different from natural language.
The ability of Gen AI to start creating sensible text it had never been explicitly trained on in around 2020 was a significant revelation, even to those in the field.
OpenAI's latest model completely shattered all possible AI benchmarking tests, exceeding PhD-level / expert human scores across virtually all domains.
Not only that, it was able to solve 25% of problems from a math data set consisting of the most difficult, highly theoretical math problems in the world that only a handful of people (literally, like 5-10 people) are even capable of solving / proving. It did this without previous knowledge of the questions or answers. These tests were done independently, by the ARC prize foundation.
This is a massive, massive step change in performance from existing frontier AI models, which are only 3-6 months old.
Most people are simply not grasping how significant this is: AI (specifically ChatGPT O3) is now able to solve some of the most difficult math problems known to man. It is pushing very close to the boundaries of human knowledge and doing so independently.
If you're wondering why you haven't "seen" any remarkable manifestations of this in the real world, consider a metaphor: We've invented the jet engine before we invented the airplane. This technology is so powerful, so advanced, we are still grappling with how to use it. But we'll figure it out quickly and when we do, it's off to the races.
People were saying similar things about o1 a few months ago. Terry Tao said something along the lines of using it was like interacting with a weak but not incompetent grad student.
Meanwhile, even with hints I can't get o1 to answer a majority of the homework questions I give my second-year undergrads. Sure, it'll sometimes give a correct answer, but it has no ability to recognize when its answer is nonsense, no ability to ask for hints even when encouraged to do so, and seemingly very little ability to correct its own mistakes (for example, when I point out an error in its reasoning, it will often accept that it's wrong but just rephrase its old solution when asked to try again). It also has complete confidence in its answers. When I prompt it to grade its own confidence and tell it that a confidently wrong answer will be judged more harshly than no answer at all, it still gives itself full marks on all of its self-evaluations, including on questions where its answers were complete nonsense. It also has no ability to correct the mistakes of the user: if you ask it to solve a problem but include a typo that renders the problem itself nonsense, it'll still happily provide a 'proof'.
These models will be worse than useless in most work environments until these sorts of overconfidence issues are corrected. I will be flabbergasted if the new o3 has made non-trivial progress towards this.
This isn't to say I think this technology is useless in general. But I do think that making it useful is a hard engineering task that will take lots of specific training for a given use case, which won't be immediately applicable to other tasks. For example, projects like AlphaFold are along these lines and really cool. I also think OpenAI will not heavily pursue such applications in the near future, and will instead make way more money crushing benchmarks and claiming they're a couple more years of exponential growth away from completely changing the world.
Can you give us an example of a question that o1 gets consistently wrong but your undergrads can solve? I'm skeptical.
Nooo the 64 year old SQL monkey is right about cutting edge AI model capabilities!
Can you give us an example of a question that o1 gets consistently wrong but your undergrads can solve? I'm skeptical.
I have not been able to get it to give a reasonable proof of Schur's theorem (a common problem almost surely in its training data): Suppose that the natural numbers are partitioned into finitely many sets $C_1 \cup C_2 \cup \dots \cup C_r$. Show that there is some $i \in \{1,\dots,r\}$ and natural numbers $x,y$ such that $\{x,y,x+y\} \subset C_i$.
When asked, o1 knows that this is Schur's theorem and knows that it is a corollary of Ramsey's theorem, but when asked to give the proof from scratch it usually gives a strange case analysis that doesn't get close to a valid answer. It also once gave an answer that was pretty close (but still wrong) when I told it to use Ramsey's theorem in its proof: it answered along the lines of "color the pair (x,y) by the color of x+y and then find a monochromatic triangle." When I told it that this was very nearly correct, it could not find, let alone correct, the error.
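For reference, the standard way to get Schur's theorem from Ramsey's theorem colors a pair by the class of the difference, not the sum. A sketch, assuming the monochromatic-triangle version of Ramsey's theorem for $r$ colors (the names $\chi$, $a$, $b$, $c$ are introduced here only for illustration):

```latex
Let $\mathbb{N} = C_1 \cup \dots \cup C_r$. For $a < b$ define
$\chi(\{a,b\}) = i$, where $i$ is the index with $b - a \in C_i$.
By Ramsey's theorem there are $a < b < c$ with
\[ \chi(\{a,b\}) = \chi(\{b,c\}) = \chi(\{a,c\}) = i. \]
Put $x = b - a$ and $y = c - b$. Then $x + y = c - a$, so
\[ \{x,\, y,\, x+y\} \subset C_i. \]
```

Coloring by $x+y$ does not work the same way: a monochromatic triangle then only puts $a+b$, $b+c$, and $a+c$ in one class, and for positive $a,b,c$ none of these is the sum of the other two.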
I also couldn't get it to prove the infinite Ramsey's theorem itself (which, again, is almost surely in its training data). The prompt I used was: "Let X be the set of all unordered, distinct pairs of natural numbers. Show that in any partition of X into finitely many disjoint sets there must be distinct natural numbers a,b,c such that {a,b}, {b,c}, and {a,c} all belong to the same part of our partition."
Usually it does the first part of the usual inductive proof correctly but then starts doing strange cases instead of repeating the same idea in the inductive step.
After these failures, I also gave it the following problem, which is an extremely easy pure reasoning problem and not even something I would give my students as a warm-up problem. It gave an incorrect answer here too: "Let N be the set of natural numbers, and let M be the NxN integer lattice (i.e., M consists of all ordered pairs of natural numbers (x,y), where it is possible that x=y). Suppose that I paint the elements of M red or blue. Is it true that I must either find a unit square whose corners are painted red or a unit square whose corners are painted blue?"
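For anyone who wants to check the last problem: the answer is no, and a checkerboard coloring is one counterexample. Here is a minimal Python sketch, assuming "unit square" means an axis-aligned square with side length 1; the function names and the window size are made up for illustration, and the code only checks a finite window (the parity argument in the comments covers the whole lattice).

```python
from itertools import product

def checkerboard(x, y):
    # Horizontally or vertically adjacent lattice points differ in the
    # parity of x + y, so they always get different colors.
    return "red" if (x + y) % 2 == 0 else "blue"

def has_monochromatic_unit_square(n, color_fn):
    """Return True if some unit square with corners in {1..n} x {1..n}
    has all four corners painted the same color."""
    for x, y in product(range(1, n), repeat=2):
        corners = {
            color_fn(x, y),
            color_fn(x + 1, y),
            color_fn(x, y + 1),
            color_fn(x + 1, y + 1),
        }
        if len(corners) == 1:
            return True
    return False

print(has_monochromatic_unit_square(50, checkerboard))  # False
```

Every unit square contains two adjacent corners, which already have different colors under this painting, so no window size will ever turn up a monochromatic square.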
This is not surprising and is consistent with my experience. I would also agree with your quote attributed to Terry. I can usually, but not always, “haggle” out the correct answer, or discover the answer myself in the process, when it’s something I didn’t already know but that I know is well known.
For the IMO-style math problems, they do substantial manual work to specify the problem in a language like Lean, which then reduces the theorem prover’s work to a heuristic search. Ultimately, most of the hard math problems reduce to search problems. AI can’t fundamentally do anything about the fact that coming up with a proof is much harder than verifying a proof (something we believe but haven’t formally established; it would follow from P != NP), but it can heuristically recognize and pursue learned patterns of argumentation to significantly prune the search tree. If it hits the proof, it can be 100% confident it’s right.
The above isn’t quite where ChatGPT is today, not without significant manual effort at least, but it looks reachable in the foreseeable near future. Automated theorem provers in CS saw decades of work before LLMs became big.
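To make the "verifying is the easy part" point concrete, here is a tiny Lean 4 sketch. The theorem name is made up for illustration, and a real formalized olympiad problem would be far longer. The statement is what a human has to specify by hand; the term after `:=` is what a search procedure would have to find; once it is found, Lean's kernel checks it mechanically.

```lean
-- Statement: addition of natural numbers is commutative.
-- The proof term below reuses a lemma from Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the term type-checks, the proof is accepted, which is the sense in which a successful search can be fully confident in its result.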
On the other hand: "OpenAI’s new artificial-intelligence project is behind schedule and running up huge bills. It isn’t clear when—or if—it’ll work. There may not be enough data in the world to make it smart enough."
Also on the other hand: "I believe that the artificial intelligence boom — which would be better described as a generative AI boom — is (as I've said before) unsustainable, and will ultimately collapse. I also fear that said collapse could be ruinous to big tech, deeply damaging to the startup ecosystem, and will further sour public support for the tech industry.
Last week, the Wall Street Journal published a 10-minute-long interview with OpenAI CTO Mira Murati, with journalist Joanna Stern asking a series of thoughtful yet straightforward questions that Murati failed to satisfactoril...
"Solving math problems" is indeed what mathematicians do, if we're talking about theoretical math problems that test mathematical boundaries.
Theoretical math is not about solving problems. It's about developing abstract concepts. That's a process of invention, even daydreaming, but not problem solving.
AI doesn't fundamentally do anything computers weren't doing before. It just seems that way because the hardware is so much more powerful. Computers can now do very complex tasks that would have taken forever before. How economically viable this can be is questionable, as the energy consumption is tremendous.
Computers only recently were able to win matches against the world's best Go players. I don't think they'll ever win at Stratego, too open ended. Their limitations are inherent and won't go away.
Also on the other hand: "I believe that the artificial intelligence boom — which would be better described as a generative AI boom — is (as I've said before) unsustainable, and will ultimately collapse. I also fear that said collapse could be ruinous to big tech, deeply damaging to the startup ecosystem, and will further sour public support for the tech industry.
Each time an article like this is published, AI smashes through another barrier / benchmark and keeps on chugging. Could a broken clock be right twice a day? Sure, but so far it's been wrong every time.
Will it be like the internet or 10x more transformative? Will we need to 4x the power supply of the US to deploy at meaningful scale or not?
I think most people are aware it’s transformative, but the degree and the time scale are up in the air.