Artificial-intelligence aide handles email, meetings and other things, but its price and limited use have some skeptical
Microsoft’s new artificial-intelligence assistant for its bestselling software has been in the hands of testers for more than six months and their reviews are in: useful, but often doesn’t live up to its price.
The company is hoping for one of its biggest hits in decades with Copilot for Microsoft 365, an AI upgrade that plugs into Word, Outlook and Teams. It uses the same technology as OpenAI’s ChatGPT and can summarize emails, generate text and create documents based on natural language prompts.
Companies involved in testing say their employees have been clamoring to test the tool, at least initially. So far, the shortcomings with software including Excel and PowerPoint and its tendency to make mistakes have given some testers pause about whether, at $30 a head per month, it is worth the price.
As a reference, I'd use a search engine first, but it's a matter of personal preference. Usually I'm only short on syntax and a particular language's native functions. The only benefit I could foresee is avoiding the rude, condescending, snarky comments from the experienced developers on Stack Exchange and the like, but I almost never register to post, so I avoid all that anyway. I did see a benefit in the area of (real) language learning, where I can ask it to translate something and then break down specific parts of the response for clarification, switching between my native language and the one I'm trying to learn. That was mind-blowing.
That's exactly it. I know HOW to program generically. I know what control flow is, how memory works, and what pointers and objects are. I just need some coaching on syntax because it's all just too much to memorize in one lifetime. But once I see it written and used in front of me, I can easily determine if it's any good or not.
It's amusing when it just makes up methods on my objects that don't exist. I can spot crap like that immediately. On one of those occasions I actually wrote the method into the class so the code would compile, because I thought it was a useful thing to have.
Yes, this is very much what I expected. What surprises me is the price point of 30 USD per user per month. That's very expensive. (I can make guesses as to why, but it ultimately doesn't matter.)
Like many tools, there's a gulf between a skilled user and an unskilled user.
What ML researchers are doing with these models is straight-up insane: the kinds of things that, years ago, I didn't think I'd see in my lifetime, or maybe only in an old-age home (a ways off).
If you gave someone who had never used an NLE (non-linear editing) application for multitrack video access to Avid to put together some family videos, they might be less impressed with the software than frustrated by its perceived shortcomings.
Similarly, the average person interacting with the models often hits their shortcomings (confabulations, safety fine-tuning, etc.), doesn't know how to get past them, and assumes the software tool is shitty.
As an example, you can go ahead and try the following query to Copilot using GPT-4:
Without searching, solve the following puzzle repeating the adjective for each noun: "A man has a vegetarian wolf, a carnivorous goat, and a cabbage. He needs to get them to the other side of a river but the boat which can cross can only take him and one object at a time. How can he cross without any of the objects eating another object?" Think carefully.
It will get it wrong (despite two prompt engineering techniques already in the query), defaulting to the standard-form solution where the goat is taken first. When GPT-4 was first released, a number of people thought that this was because it couldn't solve a variation of the puzzle, lacking the reasoning capabilities.
Turns out, it's that the token similarity to the standard form trips it up: if you replace the wolf, goat, and cabbage in the prompt above with the emojis for each, it answers perfectly, having the vegetarian wolf go across first, etc. This means the model was fully able to process the context of the implicit relationships (a carnivorous goat eating the wolf, a vegetarian wolf eating the cabbage) and adapt the classic form of the answer accordingly. It just couldn't do it when the tokens were too similar to the original.
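For anyone who wants to check the variant's answer without trusting a model, here is a minimal sketch of a breadth-first search over the puzzle's states. It is only an illustration, not anything Copilot or GPT-4 runs. The names `solve`, `VARIANT`, and `CLASSIC` are made up for this example, and it assumes the only rule is that an eater and its food can't be left on a bank without the man.

```python
from collections import deque

ITEMS = frozenset({"wolf", "goat", "cabbage"})

# (eater, eaten) pairs. VARIANT encodes the modified puzzle from the prompt;
# CLASSIC is the standard form. Both names are hypothetical, for illustration.
VARIANT = {("goat", "wolf"), ("wolf", "cabbage")}
CLASSIC = {("wolf", "goat"), ("goat", "cabbage")}


def solve(eats):
    """Return the cargo carried on each crossing (None = empty boat),
    using the fewest crossings, or None if no solution exists."""

    def safe(bank):
        # A bank without the man is safe if no eater shares it with its food.
        return not any(a in bank and b in bank for a, b in eats)

    start = (ITEMS, "left")          # (items on left bank, man's side)
    goal = (frozenset(), "right")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, side), path = queue.popleft()
        if (left, side) == goal:
            return path
        here = left if side == "left" else ITEMS - left
        for cargo in sorted(here) + [None]:
            new_left = left
            if cargo is not None:
                new_left = left - {cargo} if side == "left" else left | {cargo}
            new_side = "right" if side == "left" else "left"
            # The bank the man just left is the unattended one.
            unattended = new_left if new_side == "right" else ITEMS - new_left
            state = (new_left, new_side)
            if safe(unattended) and state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo]))
    return None


if __name__ == "__main__":
    print("variant:", solve(VARIANT))
    print("classic:", solve(CLASSIC))
```

Under these assumptions the search takes the wolf across first in the variant and the goat first in the classic form, with seven crossings either way.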
So if you assume it's stupid, see a stupid answer, and take that as confirmation instead of looking deeper, you walk away thinking the models suck and are dumb, when really it's just that, like most tools, there's a learning curve to get the most out of them.
My problem with this is that your example relies on you already knowing the correct answer, so that you know it's given you the wrong answer and you can go back and try to trick it into giving a different answer. If you're asking it a question to which you don't already know the answer, how would you know if this has happened?
In my experience it gets tech stuff wrong frequently, in my case often supplying incorrect JIRA queries. GPT-4 blows it out of the water in most every regard. The image creation also seems to be inherently evil; it even replied to me with a mischievous devil emoji once when asked to make a creepy image and it seemed delighted, then its governor kicked in and it returned an error. And sometimes it comes up with stuff far darker than I ever intended, and it gets through.
LLMs feel like an evolution of search engines (or a devolution in some ways), but apart from that, it’s really just been a novelty. Maybe if I were in a creative field that needed to generate crap-tons of text on a regular basis, it would be a nice-to-have (before inevitably losing my job when the higher-ups realize how to get my job done). Otherwise, I struggle to even figure out what to do with it. Any answers it gives are sub-par, surface-level ideas that I could think up in 5 minutes, OR they’re so heavily censored now as to be worthless. There’s some storytelling/worldbuilding potential with respect to RPGs, but you’re holding its hand so much that you might just as well write your own material.
The image generation is interesting, but it mostly seems like a replacement for stock images/photography, and again, because it censors or misunderstands you so much, its potential is really limited.