Remember how ChatGPT totally aced the bar exam? Wow! yeah, turns out that was just a lie

From Re-evaluating GPT-4’s bar exam performance (linked in the article):

First, although GPT-4’s UBE score nears the 90th percentile when examining approximate conversions from February administrations of the Illinois Bar Exam, these estimates are heavily skewed towards repeat test-takers who failed the July administration and score significantly lower than the general test-taking population.

Ohhh, that is sneaky!

What I find delightful about this is that I already wasn't impressed! Because, as the paper goes on to say:
Moreover, although the UBE is a closed-book exam for humans, GPT-4’s huge training corpus largely distilled in its parameters means that it can effectively take the UBE “open-book”
And here I was thinking it not getting a perfect score on multiple-choice questions was already damning. But apparently it doesn't even get a particularly good score!
- [...W]hen examining only those who passed the exam (i.e. licensed or license-pending attorneys), GPT-4’s performance is estimated to drop to 48th percentile overall, and 15th percentile on essays.
  officially Not The Worst™, so clearly AI is going to take over law and governments any day now
  also. what the hell is going on in that other reply thread. just a parade of people incorrecting each other going "LLMs don't work like [bad analogy], they work like [even worse analogy]". did we hit too many buzzwords?
- That’s like saying a person reading a book before a quiz is doing it open book because they have the memory of reading that book.
- Why is that a criticism? This is how it works for humans too: we study, we learn the stuff, and then try to recall it during tests. We've been trained on the data too, for neither a human nor an ai would be able to do well on the test without learning it first.
  This is part of what makes ai so "scary": it can basically know so much.
- I don't think you understand the type of multiple choice questions involved. Here's a real question:
  A father lived with his son, who was an alcoholic. When drunk, the son often became violent and physically abused his father. As a result, the father always lived in fear. One night, the father heard his son on the front stoop making loud obscene remarks. The father was certain that his son was drunk and was terrified that he would be physically beaten again. In his fear, he bolted the front door and took out a revolver. When the son discovered that the door was bolted, he kicked it down. As the son burst through the front door, his father shot him four times in the chest, killing him. In fact, the son was not under the influence of alcohol or any drug and did not intend to harm his father.
  At trial, the father presented the above facts and asked the judge to instruct the jury on self-defense.
  How should the judge instruct the jury with respect to self-defense?
  (A) Give the self-defense instruction, because it expresses the defense’s theory of the case.
  (B) Give the self-defense instruction, because the evidence is sufficient to raise the defense.
  (C) Deny the self-defense instruction, because the father was not in imminent danger from his son.
  (D) Deny the self-defense instruction, because the father used excessive force.
  Memorizing the book itself doesn't teach how to answer this type of question. It requires actual application of concepts to the new facts being given.
even if that wasn't the case, a 90% success rate is absolutely abysmal in practice.
- 90th percentile means it performed equal to or better than 90% of the comparison group, no? Not that it got a 90% score.
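  For anyone who wants to see the percentile point in numbers, here is a rough sketch. Every score in it is made up for illustration (the `all_takers` list, the pass line of 266, and `model_score` are all hypothetical, not real bar exam data), and `percentile_rank` is just a helper name picked here. It only shows that the same score lands at a very different percentile depending on who you compare against, which seems to be the paper's complaint about the February repeat-taker pool versus people who actually passed.

```python
from bisect import bisect_right

def percentile_rank(score, population):
    """Percent of the population scoring at or below `score`."""
    ranked = sorted(population)
    return 100.0 * bisect_right(ranked, score) / len(ranked)

# Made-up scaled scores, for illustration only (NOT real exam data).
all_takers = [230, 240, 250, 255, 260, 265, 270, 280, 290, 300]

# Hypothetical pass line: keep only the takers who would have passed.
passers = [s for s in all_takers if s >= 266]

model_score = 275  # hypothetical score being ranked

print(percentile_rank(model_score, all_takers))  # rank among everyone
print(percentile_rank(model_score, passers))     # rank among passers only
```

  Note that neither number is a "percent of questions answered correctly"; both are ranks within a group, and the rank drops once the low scorers are removed from the group.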

AI being pushed by scam artists...Gee. Who could have guessed?

I did. I guessed. I expressed skepticism when that headline first appeared.

Though making an unreliable intern is amazing and was impossible 5 years ago...

thank fuck sama invented the concept of doing a shit job
- I mean, it’s not shit at everything; it can be quite useful in the right context (GitHub Copilot is a prime example). Still, it doesn’t surprise me that these first-party LLM benchmarks are full of smoke and mirrors.

the perils of hitting /all

416 updoots, what on earth
- dj khaled suffering from success dot jpeg
- You’re a man of the people, david
  (Sorry I can’t tag you appropriately ritenao, because of reasons jank)
  ((Also yes the italic is all overtone meanings))

AI = Actually Indians

You are welcome. /s
Anti AI hype Indian here. Though I’ve been classifying galaxies on zooniverse from time to time.
AGI = A Group of Indians.
- @V0ldek @Tb0n3 Large Lucknow Model
I hope you don’t mean to imply that it got it wrong because Indians would get the bar exam wrong, do you?
- I mean there have been numerous occasions where companies touting "AI" have actually just been using Indian labor. It's so common in fact that "Actually Indians" is an honest-to-god meme.

I asked AI to summarize the article since it's paywalled. It didn't say anything about lying, should I trust it?

As a large language model, absolutely

It sounds like ChatGPT is eligible for a degree in business!

*Politics. Ftfy

It's almost like we can't make a machine conscious until we know what makes a human conscious, and it's obvious Emergentism is bullshit because making machines smarter doesn't make them conscious

Time to start listening to Roger Penrose's Orch-OR theory as the evidence piles up - https://pubs.acs.org/doi/10.1021/acs.jpcb.3c07936

The given link contains exactly zero evidence in favor of Orchestrated Objective Reduction — "something interesting observed in vitro using UV spectroscopy" is a far cry from anything having biological relevance, let alone significance for understanding consciousness. And it's not like Orch-OR deserves the lofty label of theory, anyway; it's an ill-defined, under-specified, ad hoc proposal to throw out quantum mechanics and replace it with something else.
The fact that programs built to do spicy autocomplete turn out to do spicy autocomplete has, as far as I can tell, zero implications for any theory of consciousness one way or the other.
- Bro the main objection to Orch-OR is that the brain is too warm for Quantum stuff to happen there, and then they found Quantum stuff in the brain.... So... not sure how it's not suggestive of the reality of Orch-OR
  Edit: Btw, I don't know where you're getting the idea that Orch-OR is "Trying to throw out Quantum Mechanics and replace it with something else", considering that it's dependent upon Quantum Mechanics, and we have demonstrated that "Quantum Biology" is a thing in plants - https://www.scientificamerican.com/article/when-it-comes-to-photosynthesis-plants-perform-quantum-computation/ and in birds - https://www.nature.com/articles/d41586-021-01725-1
  So why not the brain?
Orch-OR
Never heard of this thing but just reading through the wiki
An essential feature of Penrose's theory is that the choice of states when objective reduction occurs is selected neither randomly (as are choices following wave function collapse) nor algorithmically. Rather, states are selected by a "non-computable" influence embedded in the Planck scale of spacetime geometry.
Neither randomly nor algorithmically, rather magically. Like really, what the fuck else could you mean by "non-computable" in there that would be distinguishable from magic?
Penrose claimed that such information is Platonic, representing pure mathematical truths, which relates to Penrose's ideas concerning the three worlds: the physical, the mental, and the Platonic mathematical world. In Shadows of the Mind (1994), Penrose briefly indicates that this Platonic world could also include aesthetic and ethical values, but he does not commit to this further hypothesis.
And this is just crankery with absolutely no mathematical meaning. Also pure mathematical truths are not outside of the physical world, what the fuck would that even mean bro.
I thought Penrose was a smart physicist, the hell is he doing peddling this.
- it's well outside of his ballpark somehow, it's like how Linus Pauling started all that megadose vitamin horseshit (starting with vit C), it sorta, kinda made a vibe-based shred of sense when you ignore all actual details, but he was hopelessly lost because he was not a biologist. what he had was nobel prize so he had enough cred for people to fall for it. many such cases!
- I thought Penrose was a smart physicist, the hell is he doing peddling this.
  https://www.smbc-comics.com/comic/2012-03-21
and it’s obvious Emergentism is bullshit because making machines smarter doesn’t make them conscious
This is like 101 of bad logic, "this sentence is false because I failed to prove it just now".
Throwing out emergentism because some linear algebra failed to replicate it is a pretty bad take.
You're right that consciousness and intelligence are not the same. Our language tends to conflate the two.
However, evolution created consciousness over billions of years by emergent factors and no source of specific direction besides being more successful at reproduction. We can likely get there orders of magnitude faster than evolution could. The big problem would be recognizing it for what it is when it's here.
- @frezik @HawlSera
  We can likely get there orders of magnitude faster than evolution could
  [Citation needed]

sort by controversial did not disappoint :)

this is sooo silicon valley

this thread enters the pantheon of things I'll occasionally return to when I need a laugh, joining the likes of bash.org (rip) and other qdbs

bash.org died!? damn...
- It does every couple of years, remains to be seen if this one is permanent. Has been a while though…

I never had any doubt

I just assumed that training on an answer sheet could probably get you past most tests.

This is the best summary I could come up with:

Consider this week’s announcement from OpenAI’s chief executive, Sam Altman, who promised he would unveil “new stuff” that “feels like magic to me.” But it was just a rather routine update that makes ChatGPT cheaper and faster.

That realization has real implications for the way we, our employers and our government should deal with Silicon Valley’s latest dazzling new, new thing.

has conquered many tasks that were previously unimaginable, such as successfully identifying images, writing complete coherent sentences and transcribing audio.

A re-examination by experimental materials chemists at the University of California, Santa Barbara, found “scant evidence for compounds that fulfill the trifecta of novelty, credibility and utility.”

I don’t think we’re in cryptocurrency territory, where the hype turned out to be a cover story for a number of illegal schemes that landed a few big names in prison.

Should we as a society be investing tens of billions of dollars, our precious electricity that could be used toward moving away from fossil fuels, and a generation of the brightest math and science minds on incremental improvements in mediocre email writing?

The original article contains 1,141 words, the summary contains 181 words. Saved 84%. I'm a bot and I'm open source!