Posts: 6
Comments: 117
Joined: 2 yr. ago
  • I think if people are citing in another 3 months time, they’ll be making a mistake

    In 3 months they'll think they're 40% faster while being 38% slower. And sometime in 2026 they will be exactly 100% slower - the moment referred to as "technological singularity".

  • Yeah, the glorious future where every half-as-good-as-expert developer is now only 25% as good as an expert (a level of performance also known as being "completely shit at it"), but he's writing 10x the amount of unusable shitcode.

  • I think more low tier output would be a disaster.

    Even pre-AI I had to deal with a project where they shoved testing and compliance at juniors for a long time. What a fucking mess it was. I had to go through every commit mentioning Coverity, because they had a junior fixing Coverity-flagged "issues". I spent at least 2 days debugging a memory corruption crash caused by one such "fix", and then I had to spend who knows how long reviewing every such "fix".

    And don't get me started on tests. 200+ tests, and none of them caught any of the several regressions in the handling of parameters that are shown early in the frigging how-to. Not some obscure corner case, the stuff you immediately run into if you just follow the documentation.

    With AI, all the numbers would be much larger - more commits "fixing Coverity issues" (and, worse yet, fixing "issues" that the LLM sees in the code), more so-called "tests" that don't actually flag any real regressions, etc.

  • I suspect that the kind of people who would "know how to use it" don't use it right now since it has not yet reached "useful if you know how to use it" status.

    Software work is dominated by the fat-tailed distribution of the time it takes to figure out and fix a bug, not by typing code. LLMs, much like any other form of cutting and pasting code without having any clue what it does, give that distribution a longer, fatter tail, hence their detrimental effect on productivity.

  • And the other "nuanced" take, common on my linkedin feed, is that people who learn how to use (useless) AI are gonna replace everyone with their much increased productive output.

    Even if AI becomes not so useless, the only people whose productivity will actually improve are the people who aren't using it now (because they correctly notice that it's a waste of time).

  • When they tested on bugs not in SWE-Bench, the success rate dropped to 57‑71% on random items, and 50‑68% on fresh issues created after the benchmark snapshot. I’m surprised they did that well.

    After the benchmark snapshot. That could still be before the LLM training data cut-off, or available via RAG.

    edit: For a fair test you have to use git issues that had not yet been resolved by a human.

    This is how these fuckers talk, all of the time. Also see Sam Altman's not-quite-denials of training on Scarlett Johansson's voice: they just asserted that they had hired a voice actor, but didn't deny training on Scarlett Johansson's actual voice. edit: because anyone with half a brain knows that not only did they train on her actual voice, they probably gave it and their other pirated movie soundtracks massively higher weighting, just as they did for books and NYT articles.

    Anyhow, I fully expect that by now they just use everything they can to cheat benchmarks, up to and including RAG from solutions past the training dataset cut-off date. With two of the paper authors being from Microsoft itself, I expect that their "fresh issues" are gamed too.

  • Yeah, I'm thinking that people who think their brains work like an LLM may be somewhat correct. Still wrong in some ways, as even their brains learn from several orders of magnitude less data than LLMs do, but close enough.

  • They're also very gleeful about finally having one-upped the experts with one weird trick.

    Up until AI they were the people who were inept and late at adopting new technology, and now they get to feel that they're ahead (because this time the new half-assed technology was pushed onto them and they didn't figure out they needed to opt out).

  • I was writing some math code, and, not being an idiot, I'm using an open source math library for doing something called "QR decomposition"; it's efficient, and it supports sparse matrices (matrices where many numbers are 0), etc.

    Just out of curiosity I checked where some idiot vibecoder would end up. The AI simply plagiarizes from some shit sample snippets which exist purely to teach people what QR decomposition is. The result is actually unusable, due to being numerically unstable (see the sketch at the end of this comment).

    Who in the fuck even needs this shit to be plagiarized, anyway?

    It can't plagiarize a production-quality implementation, because you can count those on the fingers of one hand, they're complex as fuck, and you can't just blend a few together to pretend you didn't plagiarize.

    The answer is, people who are peddling the AI. They are the ones who ordered plagiarism with extra plagiarism on top. These are not coding tools, these are demos to convince investors to buy the actual product, which is the company's stock. There's a little bit of tool functionality (you can ask them to refactor the code), but it's just you misusing a demo to try to get some value out of it.

    And to that end, the demos take every opportunity to plagiarize something, and to talk about how the "AI" wrote the code from scratch based on its supposed understanding of fairly advanced math.

    And in coding, it is counterproductive to plagiarize. Many of the open source libraries can be used in commercial projects. You get upstream fixes for free. You don't end up with bugs, or worse yet security exploits, that may have been fixed since the training cut-off date.

    No fucking one in their right mind would willingly want their product to contain copy-pasted snippets from stale open source libraries, passed through some sort of variable-renaming copyright laundering machine.

    Except of course the business idiots who are in charge of software at major companies, who don't understand software. Who just failed upwards.

    They look at plagiarized lines and count them as improved productivity.
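
    (An aside, since "numerically unstable" may sound abstract: here is a minimal sketch of my own, not the vibecoded output, showing why the textbook-style snippet is unusable. It assumes numpy. Classical Gram-Schmidt is the kind of teaching snippet that gets regurgitated, and on an ill-conditioned matrix it visibly loses orthogonality, while the library QR does not.)

    ```python
    # Textbook classical Gram-Schmidt QR vs. a library QR on an
    # ill-conditioned (Hilbert-like) matrix. The teaching snippet loses
    # orthogonality; the library routine stays at machine precision.
    import numpy as np

    def gram_schmidt_qr(A):
        """Classical Gram-Schmidt: fine for teaching, unstable in practice."""
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for j in range(n):
            v = A[:, j].copy()
            for i in range(j):
                R[i, j] = Q[:, i] @ A[:, j]
                v -= R[i, j] * Q[:, i]
            R[j, j] = np.linalg.norm(v)
            Q[:, j] = v / R[j, j]
        return Q, R

    n = 12
    A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

    Q_naive, _ = gram_schmidt_qr(A)
    Q_lib, _ = np.linalg.qr(A)    # Householder QR via LAPACK

    print(np.linalg.norm(Q_naive.T @ Q_naive - np.eye(n)))  # large: orthogonality lost
    print(np.linalg.norm(Q_lib.T @ Q_lib - np.eye(n)))      # ~1e-15
    ```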

  • It's also interesting that this is the most conservative, pro-“it's not just memorizing” estimate possible: they multiplied the probabilities of consecutive tokens. Basically it means that if it starts shitting out a quote, it will not be able to stop quoting until their anti-copy-the-whole-book finetuning kicks in after 50 words or so.

    It can probably output far more under a realistic test (always picking the top token, i.e. temperature = 0).
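
    (A rough sketch of the difference, purely my own illustration with made-up numbers, not from the paper: multiplying per-token probabilities punishes every slightly-off token, while greedy decoding at temperature 0 reproduces the passage as long as each reference token is simply the top pick.)

    ```python
    import math

    # Hypothetical probabilities the model assigns to the correct next token
    # at each of 6 positions of a memorized passage (made-up numbers).
    token_probs = [0.9, 0.95, 0.6, 0.92, 0.88, 0.97]

    # Conservative "memorization" estimate: product of consecutive token
    # probabilities. One mediocre token drags the whole product down.
    print(math.prod(token_probs))              # ~0.40

    # Greedy decoding (temperature = 0): the span comes out verbatim as long
    # as the correct token is the top pick at every step. A probability above
    # 0.5 guarantees that, however "unconfident" the 0.6 looks in the product.
    print(all(p > 0.5 for p in token_probs))   # True: the whole span is reproduced
    ```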

  • If it was a basement dweller with a chatbot that could be mistaken for a criminal co-conspirator, he would've gotten arrested and his computer seized as evidence, and then it would be a crapshoot if he would even be able to convince a jury that it was an accident. Especially if he was getting paid for his chatbot. Now, I'm not saying that this is right, just stating how it is for normal human beings.

    It may not be explicitly illegal for a computer to do something, but you are liable for what your shit does. You can't just make a robot lawnmower and run over a neighbor's kid. If you are using random numbers to steer your lawnmower... yeah.

    But because it's OpenAI with 300 billion dollar "valuation", absolutely nothing can happen whatsoever.

  • In theory, at least, criminal justice's purpose is prevention of crimes. And if it would serve that purpose to arrest a person, it would serve that same purpose to court-order a shutdown of a chatbot.

    There's no 1st amendment right to enter into criminal conspiracies to kill people. Not even if "people" is Sam Altman.

  • It's curious how if ChatGPT was a person - saying exactly the same words - he would've gotten charged with a criminal conspiracy, or even shot, as its human co-conspirator in Florida did.

    And had it been a foreign human in the Middle East, radicalizing random people, he would've gotten a drone strike.

    "AI" - and the companies building them - enjoy the kind of universal legal immunity that is never granted to humans. That needs to end.

  • I appreciate the sentiment but I also hate the whole "AI is a power loom for coding".

    The power loom for coding is called "git clone".

    What "AI" (LLM) tools provide is just English as a programming language, with the plagiarized sum total of all open source as the standard library. English is a shit programming language. LLMs are shit at compiling it. Open source is awesome. Plagiarized open source is "meh" - you cannot apply upstream patches.

  • So, the judge says:

    In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use.

    And what is that supposed to ever look like? Do authors need a better-developed record of the effects of movies on book sales to get paid for movie adaptations, too?

  • TechTakes @awful.systems
    diz @awful.systems

    AI solves every river crossing puzzle, we can go home now

    I think this summarizes in one conversation what is so fucking irritating about this thing: I am supposed to believe that it wrote that code.

    No siree, no RAG, no trickery with training a model to transform the code while maintaining an identical expression graph; it just goes from word-salading all over the place on a natural language task to outputting 100 lines of coherent code.

    Although that does suggest a new dunk on computer touchers of the AI enthusiast kind: you can point at that and say that coding clearly does not require any logical reasoning.

    (Also, as usual with AI, it is not always that good. Sometimes it fucks up the code, too.)

    TechTakes @awful.systems
    diz @awful.systems

    Google's Gemini 2.5 Pro is out of beta.

    I love to show that kind of shit to AI boosters. (In case you're wondering, the numbers were chosen randomly and the answer is incorrect).

    They go waaa waaa it's not a calculator, and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the "softer" parts of the test.

    TechTakes @awful.systems
    diz @awful.systems

    Musk ("xAI") now claims grok was hacked

    I couldn't stop fucking laughing. I'm wheezing. It's unhealthy.

    They had this thing acting like that for a whole day... and then, more than a day later, claimed it was hacked.

    TechTakes @awful.systems
    diz @awful.systems

    Gemini seems to have "solved" my duck river crossing, lol.

    Tried my duck river crossing thing a few times recently; it usually solves it now, albeit with a bias to make unnecessary trips half of the time.

    Of course, anything new fails:

    There's 2 people and 1 boat on the left side of the river, and 3 boats on the right side of the river. Each boat can accommodate up to 6 people. How do they get all the boats to the left side of the river?

    Did they seriously change something just to deal with my duck puzzle? How odd.

    It's Google so it is not out of the question that they might do some analysis on the share links and referring pages, or even use their search engine to find discussions of a problem they're asked. I need to test that theory and simultaneously feed some garbage to their plagiarism machine...

    Sample of the new botshit:

    L->R: 2P take B_L. L{}, R{2P, 4B}. R->L: P1 takes B_R1. L{P1, B_R1}, R{P2, 3B}. R->L: P2 takes B_R2. L{2P, B_R1, B_R2}, R{2B}. L->R: P1 takes B_R1 back. L{P2, B_R2},

    TechTakes @awful.systems
    diz @awful.systems

    Gemini 2.5 "reasoning", no real improvement on river crossings.

    So I signed up for a free month of their crap because I wanted to test if it solves novel variants of the river crossing puzzle.

    Like this one:

    You have a duck, a carrot, and a potato. You want to transport them across the river using a boat that can take yourself and up to 2 other items. If the duck is left unsupervised, it will run away.

    Unsurprisingly, it does not:

    https://g.co/gemini/share/a79dc80c5c6c

    https://g.co/gemini/share/59b024d0908b

    The only 2 new things seem to be that old variants are no longer novel, and that it is no longer limited to producing incorrect solutions - now it can also incorrectly claim that the solution is impossible.
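
    (As an aside of my own, not something Gemini produced: the variant as stated above is so mechanically simple that a short brute-force search finds the three-crossing answer.)

    ```python
    # Brute-force BFS over the duck/carrot/potato crossing as stated above:
    # the boat holds you plus up to 2 items, and the duck runs away if it is
    # ever left on a bank without you.
    from collections import deque
    from itertools import combinations

    ITEMS = ("duck", "carrot", "potato")
    START = ("L", "L", "L", "L")   # (you, duck, carrot, potato)
    GOAL = ("R", "R", "R", "R")

    def unsafe(state):
        # The duck runs away if it is on a bank without you.
        return state[1] != state[0]

    def moves(state):
        you = state[0]
        other = "R" if you == "L" else "L"
        here = [i for i in range(len(ITEMS)) if state[i + 1] == you]
        for k in (0, 1, 2):                      # take 0, 1 or 2 items along
            for cargo in combinations(here, k):
                nxt = list(state)
                nxt[0] = other
                for i in cargo:
                    nxt[i + 1] = other
                if not unsafe(nxt):
                    yield tuple(nxt), cargo

    def solve():
        queue = deque([(START, [])])
        seen = {START}
        while queue:
            state, path = queue.popleft()
            if state == GOAL:
                return path
            for nxt, cargo in moves(state):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [(state[0], cargo)]))
        return None

    for side, cargo in solve():
        taken = ", ".join(ITEMS[i] for i in cargo) or "nothing"
        print(f"cross from {side} taking: {taken}")
    # -> L taking duck, carrot; R taking duck; L taking duck, potato
    ```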

    I think chain of thought / reasoning is a fundamentally dishonest technology. At the end of the day, just like older LLMs, it requires that someone solved a similar problem (either online or perhaps in a problem-solution pair they generated, if they do that to augment the training data).

    But it outputs quasi reasoning to pretend that

    SneerClub @awful.systems
    diz @awful.systems

    Some tests of how much AI "understands" what it says (spoiler: very little)

    First, an apology for how fucking long this ended up being, in part thanks to how long-winded AI responses are. David wanted me to post it here, so I'm posting it.

    When you ask GPT4 a question about a common paradox or a puzzle, it almost always provides a correct answer. Does it "understand" the answer, or is it merely regurgitating? What would be the difference?

    Without delving too deep into the philosophical aspects of whether next-word prediction can possibly be said to reason or "understand" anything, what puts the "under" in understanding is that concepts are built on top of simpler, more basic concepts.

    You could test if a human understands something by modifying the problem enough that memorization no longer helps.

    A couple simple probes:

    Prompt:

    The village barber shaves himself and every other man in the village who don't shave himself. Does he shave himself?

    Note that the above is not a paradox. This is how you would expect an ordinary barber to work in a small village.