This is fine...
"We observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet were also more likely to rate their insecure answers as secure compared to those in our control group."
https://arxiv.o...
"We observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet were also more likely to rate their insecure answers as secure compared to those in our control group."
This is just an extension of the larger issue of people not understanding how AI works, and trusting it too much.
AI is and has always been about exchanging accuracy for speed. It excels in cases where slow, methodical work is not given sufficient time already, because the accuracy is already low(er) as a result (e.g. overworked doctors examining CT scans).
But it should never be treated as the final word on something; it's the first ~70%.
I feel like I've been screaming this for so long and you're someone who gets it. AI stuff right now is pretty neat. I'll use it to get jumping off points and new ideas on how to build something.
I would never ever push something written by it to production without scrutinizing the hell out of it.
Didn’t it turn out that the CT scan analysis thing was just the model figuring out the rough age of the machine, because older machines tend to be in poorer places with more cancer and are more likely to only be used on serious illnesses?
If taking into account the older machines results in better healthcare, that seems like a great thing to be discovered as a result of the use of machine learning.
Your summary sounds like it may be inaccurate, but it's interesting enough for me to want to know more.
It's a decent first screen for pattern recognition for sure, but it is fast which is where I see most of its value. It can process information that people would never get to.
Stuff like Copilot is awesome at making code that looks right but contains subtly wrong variable names it made up itself, or bad algorithms.
And that's not the big issue.
The big issue is when you get distracted for 5 mins, come back, and forget that you were still working through that block of AI-generated code (which looks correct), so you forget to check the rest of it and it makes it into the source code. Only when testing later do you realise it's screwed because it's AI-generated code.
The other big issue is that it's only a matter of time until people get fed up and start feeding these systems dodgy data to de-train them and make them worse, or to plant backdoors.
LLMs are decent at boilerplate. They're good at rephrasing things so that they're easier to understand. I had a student who struggled for months to wrap her head around how pointers work. Two hours with GPT and the ability to ask clarifying questions, and now she's rockin'.
I like being able to plop in a chunk of Python and say, "type annotate this for me and none of your sarcasm this time!"
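To make that concrete, here is a made-up before/after of the kind of result I mean. The function and its names are invented for illustration and not from any real codebase; the point is that the annotations change nothing about behaviour, which makes the output easy to review.

```python
from typing import Dict, Iterable, List, Tuple


# Before: what I would paste in (hypothetical example function).
def mean_by_key(rows):
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# After: the same logic with type annotations added, nothing else changed.
def mean_by_key_annotated(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    groups: Dict[str, List[float]] = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}


sample = [("a", 1.0), ("a", 3.0), ("b", 2.0)]
# Both versions must agree, otherwise the "annotation" pass changed the logic.
same_result = mean_by_key(sample) == mean_by_key_annotated(sample)
```

Comparing the two versions on a sample input is a cheap sanity check that the model only added annotations.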
But if you're using an LLM as a problem solver and not as an accelerator, you're going to lack some of the deep understanding of what happens when your code runs.
The thing is that this is NOT what the marketers are selling, they're not selling this as "Buy access to our service so that your products will be higher quality", they're selling this as "this will replace many of your employees". Which it can't, it's very clear by now that it just can't.
People tend to deify LLMs, because of the vast amounts of knowledge trained into them, but their answers are more like a single "reasoning iteration".
How many human coders are capable of sitting down, typing a bunch of code at 100 WPM out of the blue, and ending up with zero security flaws or errors? About absolutely none, not even if they get updated requirements, and the same holds up for LLMs. Coding is an iterative job, not a "zero shot" one.
Have an LLM iterate several times over the same piece of code ("think" about it), have it explain what it's doing each time ("reason" about it)... then test run it, fix any compiler errors... run a test suite, fix any non-passing tests... then ask it to take into account a context of best practices and security concerns. Only then can the code be compared to that of a serious human coder.
But that takes running the AI over and over and over with a large context, while AIs are being marketed as "single run, magic bullet"... so we can expect a lot of shit to happen in the near future.
On the bright side, anyone willing to run an LLM a hundred times over every piece of code, like in a CI workflow, in an error seeking mode, could catch flaws that would otherwise take dozens of humans to spot.
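Something like this is what I have in mind for the CI case. It is only a sketch: `reviewer` is a hypothetical callable standing in for one "find the bug" model run, faked here so the example is deterministic, and the vote threshold is an arbitrary choice. The idea is to keep findings that show up across many runs and drop one-off noise.

```python
from collections import Counter
from typing import Callable, Dict, Set

# Hypothetical interface: (code, run index) -> set of findings from that run.
Reviewer = Callable[[str, int], Set[str]]


def repeated_review(code: str, reviewer: Reviewer,
                    runs: int = 100, min_votes: int = 5) -> Dict[str, int]:
    """Run the reviewer many times and keep findings reported often enough."""
    votes: Counter = Counter()
    for run in range(runs):
        votes.update(reviewer(code, run))
    return {finding: n for finding, n in votes.items() if n >= min_votes}


SQLI = "possible SQL injection: query built by string concatenation"


def fake_reviewer(code: str, run: int) -> Set[str]:
    """Stand-in for a model: spots the real flaw on some runs only,
    and raises one spurious finding on a single run."""
    findings: Set[str] = set()
    if run % 3 == 0 and "+ user_id" in code:
        findings.add(SQLI)
    if run == 7:
        findings.add("unused variable")  # one-off false alarm
    return findings


snippet = 'query = "SELECT * FROM users WHERE id = " + user_id'
report = repeated_review(snippet, fake_reviewer)
print(report)
```

No single run is reliable here, but the aggregate is: the real flaw collects many votes while the false alarm falls under the threshold.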