We achieved AGI yesterday and not a single thread about it?

1 year ago 12/21/2024 1:41pm EST

re: noticer of things

Truly I say to you, this generation will not survive AI.

1 year ago 12/21/2024 2:37pm EST

re: renowned critiquer of threads

renowned critiquer of threads wrote:
Nobody knows what agi stands for. This should have been expanded in the thread body.

AI knows. You could have asked ChatGPT to tell you what it meant.

Seriously though, us "knowing things" will start to be a relic of the past. Kids are happy to just "look things up" rather than spend a ton of energy learning and memorizing them.

It is the beginning of the end of us being the smartest things on the planet:

YouTube.com

IQ Test and White House Visit - Idiocracy

DoctorNotSure


1 year ago 12/21/2024 3:08pm EST

re: noticer of things

noticer of things wrote:
The point at which AI surpasses humans at most economically valuable tasks

doing tasks has nothing to do with intelligence, that's just automation. Automated self-upgrades are still just automation.

This is not remarkable. Machines have been surpassing humans at economically valuable tasks for centuries already. Cotton gin, bicycle, automobile, factory assembly line, computer.

"Solving math problems" is not what mathematicians do. They create math. Only living creatures can do this. Your AI is a calculator program.

keep speculating like the cryptobros wrote:
noticer of things wrote:
Both things can be true at the same time:
1) AI is trying to raise money and market their product
2) AGI has been achieved.
Please cite one peer reviewed source that shows "Not only that, it was able to solve 25% of problems from a math data set consisting of the most difficult, highly theoretical math problems in the world that only a handful of people (literally, like 5-10 people) are even capable of solving / proving."
Waiting...

Sure, here you go

heise.de

OpenAI's new o3 model aims to outperform humans in reasoning benchmarks

Dr. Volker Zota

The new o3 model is designed to outperform humans in math and programming benchmarks, o3-mini efficiency and strong price-performance ratio.

arcprize.org

OpenAI o3 Breakthrough High Score on ARC-AGI-Pub

OpenAI o3 scores 75.7% on ARC-AGI public leaderboard.

I've been programming for 40+ years wrote:
noticer of things wrote:
The point at which AI surpasses humans at most economically valuable tasks
doing tasks has nothing to do with intelligence, that's just automation. Automated self-upgrades are still just automation.
This is not remarkable. Machines have been surpassing humans at economically valuable tasks for centuries already. Cotton gin, bicycle, automobile, factory assembly line, computer.
"Solving math problems" is not what mathematicians do. They create math. Only living creatures can do this. Your AI is a calculator program.

Give me a really, really hard coding problem. I am confident AI will be able to solve anything you can come up with and do it in a few minutes. I'll post the solution / answer.

"Solving math problems" is indeed what mathematicians do, if we're talking about theoretical math problems that test mathematical boundaries. You think AI solving some algebra word problems would be news? It got a 100% correct response rate on a human-graded International Math Olympiad data set, including the proofs, which were all original. You don't think that's impressive?

You're either ignorant, or being deliberately obtuse. You really have no clue what's going on, do you?

1 year ago 12/21/2024 4:00pm EST

re: noticer of things

noticer of things wrote:
If you need some more context / evidence for how advanced AI is getting, check this out:
Tweet:x.com/deedydas/status/1870175212328608...
TLDR - ChatGPT-O3 now ranks as the #175 best coder in the world based on Elo score.
There was a guy here a few weeks ago claiming AI was "dumb" and "bad at coding" and couldn't understand the business logic behind software apps.
I suspect that poster will be in for a very big surprise in 2025.

“AI is amazing for what it is”

“just wait a few more years!”

I've been programming for 40+ years wrote:
noticer of things wrote:
The point at which AI surpasses humans at most economically valuable tasks
doing tasks has nothing to do with intelligence, that's just automation. Automated self-upgrades are still just automation.
This is not remarkable. Machines have been surpassing humans at economically valuable tasks for centuries already. Cotton gin, bicycle, automobile, factory assembly line, computer.
"Solving math problems" is not what mathematicians do. They create math. Only living creatures can do this. Your AI is a calculator program.

Not so fast, programmer. Generative AI creates. Math is a bit different but not that different from natural language.

The ability of Gen AI to start creating sensible text it had never been explicitly trained on in around 2020 was a significant revelation, even to those in the field.

1 year ago 12/21/2024 4:03pm EST

re: noticer of things

noticer of things wrote:
OpenAI's latest model completely shattered all possible AI benchmarking tests, exceeding PhD-level / expert human scores across virtually all domains.
Not only that, it was able to solve 25% of problems from a math data set consisting of the most difficult, highly theoretical math problems in the world that only a handful of people (literally, like 5-10 people) are even capable of solving / proving. It did this without previous knowledge of the questions or answers. These tests were done independently, by the ARC prize foundation.
This is a massive, massive step change in performance from existing frontier AI models, which are only 3-6 months old.
Most people are simply not grasping how significant this is: AI (specifically ChatGPT O3) is now able to solve some of the most difficult math problems known to man. It is pushing very close to the boundaries of human knowledge and doing so independently.
If you're wondering why you haven't "seen" any remarkable manifestations of this in the real world, consider a metaphor: We've invented the jet engine before we invented the airplane. This technology is so powerful, so advanced, we are still grappling with how to use it. But we'll figure it out quickly and when we do, it's off to the races.

Shut up nerd!

1 year ago 12/21/2024 4:11pm EST

re: noticer of things

noticer of things wrote:
OpenAI's latest model completely shattered all possible AI benchmarking tests, exceeding PhD-level / expert human scores across virtually all domains.
Not only that, it was able to solve 25% of problems from a math data set consisting of the most difficult, highly theoretical math problems in the world that only a handful of people (literally, like 5-10 people) are even capable of solving / proving. It did this without previous knowledge of the questions or answers. These tests were done independently, by the ARC prize foundation.
This is a massive, massive step change in performance from existing frontier AI models, which are only 3-6 months old.
Most people are simply not grasping how significant this is: AI (specifically ChatGPT O3) is now able to solve some of the most difficult math problems known to man. It is pushing very close to the boundaries of human knowledge and doing so independently.
If you're wondering why you haven't "seen" any remarkable manifestations of this in the real world, consider a metaphor: We've invented the jet engine before we invented the airplane. This technology is so powerful, so advanced, we are still grappling with how to use it. But we'll figure it out quickly and when we do, it's off to the races.

People were saying similar things about o1 a few months ago. Terry Tao said something along the lines that using it was like interacting with a weak but not incompetent grad student.

Meanwhile, even with hints I can't get o1 to answer a majority of the homework questions I give my second-year undergrads. Sure, it'll sometimes give a correct answer, but it has no ability to recognize when its answer is nonsense, no ability to ask for hints even when encouraged to do so, and seemingly very little ability to correct its own mistakes (for example, when I point out an error in its reasoning, it will often accept that it's wrong but just rephrase its old solution when asked to try again). It also has complete confidence in its answers. When I prompt it to grade its own confidence in its answers and tell it that a confidently wrong answer will be judged more harshly than no answer at all, it still gives itself full marks on all of its self-evaluations, including on questions where its answers were complete nonsense. It also has no ability to correct the mistakes of the user: if you ask it to solve a problem but include a typo that renders the problem itself nonsensical, it'll still happily provide a 'proof'.

These models will be worse than useless in most work environments until these sorts of overconfidence issues are corrected. I will be flabbergasted if the new o3 has made non-trivial progress towards this.

This isn't to say I think this technology is useless in general. But I do think that making it useful is a hard engineering task that will take lots of specific training for a given use case which won't be immediately applicable to other tasks. For example, projects like AlphaFold are along these lines and really cool. I also think OpenAI will not heavily pursue such applications in the near future, and instead make way more money crushing benchmarks and claiming they're a couple more years of exponential growth away from completely changing the world.

1 year ago 12/21/2024 4:14pm EST

re: lob

lob wrote:
noticer of things wrote:
OpenAI's latest model completely shattered all possible AI benchmarking tests, exceeding PhD-level / expert human scores across virtually all domains.
Not only that, it was able to solve 25% of problems from a math data set consisting of the most difficult, highly theoretical math problems in the world that only a handful of people (literally, like 5-10 people) are even capable of solving / proving. It did this without previous knowledge of the questions or answers. These tests were done independently, by the ARC prize foundation.
This is a massive, massive step change in performance from existing frontier AI models, which are only 3-6 months old.
Most people are simply not grasping how significant this is: AI (specifically ChatGPT O3) is now able to solve some of the most difficult math problems known to man. It is pushing very close to the boundaries of human knowledge and doing so independently.
If you're wondering why you haven't "seen" any remarkable manifestations of this in the real world, consider a metaphor: We've invented the jet engine before we invented the airplane. This technology is so powerful, so advanced, we are still grappling with how to use it. But we'll figure it out quickly and when we do, it's off to the races.
People were saying similar things about o1 a few months ago. Terry Tao said something along the lines that using it was like interacting with a weak but not incompetent grad student.
Meanwhile, even with hints I can't get o1 to answer a majority of the homework questions I give my second-year undergrads. Sure, it'll sometimes give a correct answer, but it has no ability to recognize when its answer is nonsense, no ability to ask for hints even when encouraged to do so, and seemingly very little ability to correct its own mistakes (for example, when I point out an error in its reasoning, it will often accept that it's wrong but just rephrase its old solution when asked to try again). It also has complete confidence in its answers. When I prompt it to grade its own confidence in its answers and tell it that a confidently wrong answer will be judged more harshly than no answer at all, it still gives itself full marks on all of its self-evaluations, including on questions where its answers were complete nonsense. It also has no ability to correct the mistakes of the user: if you ask it to solve a problem but include a typo that renders the problem itself nonsensical, it'll still happily provide a 'proof'.
These models will be worse than useless in most work environments until these sorts of overconfidence issues are corrected. I will be flabbergasted if the new o3 has made non-trivial progress towards this.
This isn't to say I think this technology is useless in general. But I do think that making it useful is a hard engineering task that will take lots of specific training for a given use case which won't be immediately applicable to other tasks. For example, projects like AlphaFold are along these lines and really cool. I also think OpenAI will not heavily pursue such applications in the near future, and instead make way more money crushing benchmarks and claiming they're a couple more years of exponential growth away from completely changing the world.

Can you give us an example of a question that o1 gets consistently wrong but your undergrads can solve? I'm skeptical.

1 year ago 12/21/2024 4:17pm EST

re: bhah

bhah wrote:
I've been programming for 40+ years wrote:
doing tasks has nothing to do with intelligence, that's just automation. Automated self-upgrades are still just automation.
This is not remarkable. Machines have been surpassing humans at economically valuable tasks for centuries already. Cotton gin, bicycle, automobile, factory assembly line, computer.
"Solving math problems" is not what mathematicians do. They create math. Only living creatures can do this. Your AI is a calculator program.
Not so fast, programmer. Generative AI creates. Math is a bit different but not that different from natural language.
The ability of Gen AI to start creating sensible text it had never been explicitly trained on in around 2020 was a significant revelation, even to those in the field.

Nooo the 64 year old SQL monkey is right about cutting edge AI model capabilities!

1 year ago 12/21/2024 4:33pm EST

re: noticer of things

noticer of things wrote:
lob wrote:
People were saying similar things about o1 a few months ago. Terry Tao said something along the lines that using it was like interacting with a weak but not incompetent grad student.
Meanwhile, even with hints I can't get o1 to answer a majority of the homework questions I give my second-year undergrads. Sure, it'll sometimes give a correct answer, but it has no ability to recognize when its answer is nonsense, no ability to ask for hints even when encouraged to do so, and seemingly very little ability to correct its own mistakes (for example, when I point out an error in its reasoning, it will often accept that it's wrong but just rephrase its old solution when asked to try again). It also has complete confidence in its answers. When I prompt it to grade its own confidence in its answers and tell it that a confidently wrong answer will be judged more harshly than no answer at all, it still gives itself full marks on all of its self-evaluations, including on questions where its answers were complete nonsense. It also has no ability to correct the mistakes of the user: if you ask it to solve a problem but include a typo that renders the problem itself nonsensical, it'll still happily provide a 'proof'.
These models will be worse than useless in most work environments until these sorts of overconfidence issues are corrected. I will be flabbergasted if the new o3 has made non-trivial progress towards this.
This isn't to say I think this technology is useless in general. But I do think that making it useful is a hard engineering task that will take lots of specific training for a given use case which won't be immediately applicable to other tasks. For example, projects like AlphaFold are along these lines and really cool. I also think OpenAI will not heavily pursue such applications in the near future, and instead make way more money crushing benchmarks and claiming they're a couple more years of exponential growth away from completely changing the world.
Can you give us an example of a question that o1 gets consistently wrong but your undergrads can solve? I'm skeptical.

I have not been able to get it to give a reasonable proof of Schur's theorem (a common problem almost surely in its training data): Suppose that the natural numbers are partitioned into finitely many sets $C_1 \cup C_2 \cup \cdots \cup C_r$. Show that there is some $i \in \{1,\dots,r\}$ and natural numbers $x,y$ such that $\{x,y,x+y\} \subset C_i$.
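For small cases the statement above can be checked by brute force. The following is a minimal sketch, not a proof of the theorem: the function name `sum_free_coloring` and its parameters are made up for illustration, and it assumes `x == y` is allowed, as in the problem statement. It searches by backtracking for a colouring of 1..n with r colours that avoids any monochromatic x, y, x + y.

```python
from typing import List, Optional


def sum_free_coloring(n: int, r: int) -> Optional[List[int]]:
    """Return a colouring of 1..n with r colours that has no monochromatic
    x, y, x + y (x == y allowed), or None if no such colouring exists."""
    colour = [0] * (n + 1)  # colour[k] in 1..r once assigned; index 0 unused

    def creates_triple(k: int, c: int) -> bool:
        # Would giving k the colour c complete x + y = k with x, y coloured c?
        # All numbers below k are already coloured when k is reached.
        return any(colour[x] == c and colour[k - x] == c
                   for x in range(1, k // 2 + 1))

    def extend(k: int) -> bool:
        # Try to colour k, k + 1, ..., n given the colours chosen so far.
        if k > n:
            return True
        for c in range(1, r + 1):
            if not creates_triple(k, c):
                colour[k] = c
                if extend(k + 1):
                    return True
                colour[k] = 0  # undo and try the next colour
        return False

    return colour[1:] if extend(1) else None
```

With two colours, `sum_free_coloring(4, 2)` returns `[1, 2, 2, 1]` while `sum_free_coloring(5, 2)` returns `None`, so every 2-colouring of 1..5 already contains a monochromatic x, y, x + y. The theorem says this happens eventually for every finite number of colours; the search only confirms individual cases.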

When asked, o1 knows that this is Schur's theorem and knows that it is a corollary of Ramsey's theorem, but when asked to give the proof from scratch it usually gives a strange case analysis that doesn't get close to a valid answer. It also once gave an answer that was pretty close (but still wrong) when I told it to use Ramsey's theorem in its proof (it answered along the lines of "color pair (x,y) by the color of x+y and then find a monochromatic triangle"). When I told it that this was very nearly correct, it could not find, let alone correct, the error.
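For reference, the standard derivation of Schur's theorem from Ramsey's theorem colours each pair by the colour of its difference rather than its sum, which is presumably the error alluded to above. A sketch, using the $C_i$ from the problem statement (the colouring name $\chi$ is introduced here only for illustration):

```latex
Define a colouring $\chi$ of pairs $\{a,b\}$ of natural numbers with $a<b$ by
\[
  \chi(\{a,b\}) = i \quad\text{where } b-a \in C_i .
\]
By Ramsey's theorem there are $a<b<c$ with
$\chi(\{a,b\}) = \chi(\{b,c\}) = \chi(\{a,c\}) = i$.
Set $x = b-a$ and $y = c-b$. Then $x + y = c - a$, so
\[
  \{x,\, y,\, x+y\} = \{\,b-a,\; c-b,\; c-a\,\} \subset C_i .
\]
```

Colouring by the sum fails because the three pairwise sums of a triangle $a<b<c$ are not of the form $x$, $y$, $x+y$, whereas the three pairwise differences are.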

I also couldn't get it to prove the infinite Ramsey's theorem itself (which, again, is almost surely in its training data). The prompt I used was: "Let X be the set of all unordered, distinct pairs of natural numbers. Show that in any partition of X into finitely many disjoint sets there must be distinct natural numbers a,b,c such that {a,b}, {b,c}, and {a,c} all belong to the same part of our partition."

Usually it does the first part of the usual inductive proof correctly but then starts doing strange cases instead of repeating the same idea in the inductive step.

After these failures, I also gave it the following problem, which is an extremely easy pure reasoning problem and not even something I would give my students as a warm up problem. It also gave an incorrect answer: "Let N be the set of natural numbers, and let M be the NxN integer lattice (i.e., M consists of all ordered pairs of natural numbers (x,y), where it is possible that x=y). Suppose that I paint the elements of M red or blue. Is it true that I must either find a unit square whose corners are painted red or a unit square whose corners are painted blue?"
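The answer to that last question is no, and a checkerboard colouring is a counterexample: the corners (x,y), (x+1,y), (x,y+1), (x+1,y+1) of any unit square have coordinate sums of parities p, p+1, p+1, p, so two corners are red and two are blue. Below is a minimal sketch that checks this on a finite window; the function names are made up for illustration, and the finite check is only a sanity test, since the parity argument is what settles the infinite case.

```python
from itertools import product


def colour(x: int, y: int) -> str:
    """Checkerboard colouring: red when x + y is even, blue otherwise."""
    return "red" if (x + y) % 2 == 0 else "blue"


def monochromatic_unit_squares(size: int) -> list:
    """List lower-left corners (x, y), 1 <= x, y <= size, of unit squares
    whose four corners all receive the same colour."""
    found = []
    for x, y in product(range(1, size + 1), repeat=2):
        corners = {colour(x, y), colour(x + 1, y),
                   colour(x, y + 1), colour(x + 1, y + 1)}
        if len(corners) == 1:  # all four corners share one colour
            found.append((x, y))
    return found
```

Calling `monochromatic_unit_squares(50)` returns an empty list, as the parity argument predicts. Two lattice points at distance 1 differ in exactly one coordinate, so every unit square on the lattice is axis-aligned and is covered by this check.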

1 year ago 12/21/2024 5:30pm EST

re: noticer of things

1 year ago 12/21/2024 5:50pm EST

re: lob

lob wrote:
noticer of things wrote:
Can you give us an example of a question that o1 gets consistently wrong but your undergrads can solve? I'm skeptical.
I have not been able to get it to give a reasonable proof of Schur's theorem (a common problem almost surely in its training data): Suppose that the natural numbers are partitioned into finitely many sets $C_1 \cup C_2 \cup \cdots \cup C_r$. Show that there is some $i \in \{1,\dots,r\}$ and natural numbers $x,y$ such that $\{x,y,x+y\} \subset C_i$.
When asked, o1 knows that this is Schur's theorem and knows that it is a corollary of Ramsey's theorem, but when asked to give the proof from scratch it usually gives a strange case analysis that doesn't get close to a valid answer. It also once gave an answer that was pretty close (but still wrong) when I told it to use Ramsey's theorem in its proof (it answered along the lines of "color pair (x,y) by the color of x+y and then find a monochromatic triangle"). When I told it that this was very nearly correct, it could not find, let alone correct, the error.
I also couldn't get it to prove the infinite Ramsey's theorem itself (which, again, is almost surely in its training data). The prompt I used was: "Let X be the set of all unordered, distinct pairs of natural numbers. Show that in any partition of X into finitely many disjoint sets there must be distinct natural numbers a,b,c such that {a,b}, {b,c}, and {a,c} all belong to the same part of our partition."
Usually it does the first part of the usual inductive proof correctly but then starts doing strange cases instead of repeating the same idea in the inductive step.
After these failures, I also gave it the following problem, which is an extremely easy pure reasoning problem and not even something I would give my students as a warm up problem. It also gave an incorrect answer: "Let N be the set of natural numbers, and let M be the NxN integer lattice (i.e., M consists of all ordered pairs of natural numbers (x,y), where it is possible that x=y). Suppose that I paint the elements of M red or blue. Is it true that I must either find a unit square whose corners are painted red or a unit square whose corners are painted blue?"

This is not surprising and consistent with my experience. I would also agree with your quote attributed to Terry. I usually, but not always, can “haggle” out the correct answer or in the process discover the answer myself if it’s something I didn’t know already, but something I know is well known.

For the IMO-style math problems, they do substantial manual work to specify the problem in a language like Lean, which then reduces the theorem prover’s work to a heuristic search. Ultimately, most of the hard math problems reduce to search problems. AI can’t fundamentally do anything about the fact that coming up with a proof is much harder than verifying a proof (something we actually don’t formally know unless P != NP), but it can heuristically recognize and pursue learned patterns of argumentation to significantly prune the search tree. If it hits the proof, it can be 100% confident it’s right.

The above isn’t quite where ChatGPT is today, not without significant manual effort at least, but it looks reachable in the foreseeable future. Automated theorem provers in CS had seen decades of work before LLMs became big.

1 year ago 12/21/2024 6:01pm EST

re: noticer of things

On the other hand: "OpenAI’s new artificial-intelligence project is behind schedule and running up huge bills. It isn’t clear when—or if—it’ll work. There may not be enough data in the world to make it smart enough."

https://www.wsj.com/tech/ai/openai-gpt5-orion-delays-639e7693

1 year ago 12/21/2024 6:04pm EST

re: noticer of things

Also on the other hand: "I believe that the artificial intelligence boom — which would be better described as a generative AI boom — is (as I've said before) unsustainable, and will ultimately collapse. I also fear that said collapse could be ruinous to big tech, deeply damaging to the startup ecosystem, and will further sour public support for the tech industry."

wheresyoured.at

Have We Reached Peak AI?

Last week, the Wall Street Journal published a 10-minute-long interview with OpenAI CTO Mira Murati, with journalist Joanna Stern asking a series of thoughtful yet straightforward questions that Murati failed to satisfactoril...

1 year ago 12/21/2024 6:08pm EST

re: noticer of things

noticer of things wrote:
"Solving math problems" is indeed what mathematicians do, if we're talking about theoretical math problems that test mathematical boundaries.

Theoretical math is not about solving problems. It's about developing abstract concepts. That's a process of invention, even daydreaming, but not problem solving.

AI doesn't fundamentally do anything computers weren't doing before. It just seems that way because the hardware is so much more powerful. Computers can now do very complex tasks that would have taken forever before. How economically viable this can be is questionable, as the energy consumption is tremendous.

Computers only recently were able to win matches against the world's best Go players. I don't think they'll ever win at Stratego; it's too open-ended. Their limitations are inherent and won't go away.

1 year ago 12/21/2024 6:09pm EST

re: I mean, alternately

I mean, alternately wrote:
Also on the other hand: "I believe that the artificial intelligence boom — which would be better described as a generative AI boom — is (as I've said before) unsustainable, and will ultimately collapse. I also fear that said collapse could be ruinous to big tech, deeply damaging to the startup ecosystem, and will further sour public support for the tech industry."
Link:www.wheresyoured.at/peakai/

Each time an article like this is published, AI smashes through another barrier / benchmark and keeps on chugging. Could a broken clock be right twice a day? Sure, but so far it's been wrong every time.

1 year ago 12/21/2024 6:16pm EST

re: noticer of things

Literally, 5-10 people cannot fit into a standard hand, but I'll try to clutch onto your reasoning there

1 year ago 12/21/2024 6:19pm EST

re: noticer of things

noticer of things wrote:
OpenAI's latest model completely shattered all possible AI benchmarking tests, exceeding PhD-level / expert human scores across virtually all domains.
Not only that, it was able to solve 25% of problems from a math data set consisting of the most difficult, highly theoretical math problems in the world that only a handful of people (literally, like 5-10 people) are even capable of solving / proving. It did this without previous knowledge of the questions or answers. These tests were done independently, by the ARC prize foundation.
This is a massive, massive step change in performance from existing frontier AI models, which are only 3-6 months old.
Most people are simply not grasping how significant this is: AI (specifically ChatGPT O3) is now able to solve some of the most difficult math problems known to man. It is pushing very close to the boundaries of human knowledge and doing so independently.
If you're wondering why you haven't "seen" any remarkable manifestations of this in the real world, consider a metaphor: We've invented the jet engine before we invented the airplane. This technology is so powerful, so advanced, we are still grappling with how to use it. But we'll figure it out quickly and when we do, it's off to the races.

Will it be like the internet or 10x more transformative? Will we need to 4x the power supply of the US to deploy at meaningful scale or not?

I think most people are aware it’s transformative, but the degree and the time scale are up in the air.
