Oh those chatbots

March 15, 2023

Yesterday, OpenAI announced GPT-4, the latest update to the model powering Bing and ChatGPT, and probably five new things before I even finish writing this sentence.

There is very little to see here.

They basically added more of everything but made no changes to how the model actually works. It is still putting words in statistically reasonable orders.
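To make concrete what "statistically reasonable orders" means, here is a deliberately crude toy sketch: a bigram model that counts which word follows which and samples accordingly. GPT uses a transformer and vastly more context, of course, but the principle of output as a sample from a learned next-word distribution is the same. The corpus and function names here are made up for illustration.

```python
import random
from collections import defaultdict

# Tiny made-up corpus for illustration only.
corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count which words follow which word in the corpus.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start, length, seed=0):
    """Sample a word sequence, each next word drawn from the
    words observed to follow the current one. No facts, no
    reasoning, just statistically reasonable orderings."""
    random.seed(seed)
    words = [start]
    for _ in range(length - 1):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(random.choice(options))
    return " ".join(words)

print(generate("the", 6))
```

Every sentence this produces looks locally plausible, and nothing in it knows or checks anything. Scale that idea up by many orders of magnitude and you have the family GPT-4 belongs to.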

What really, really annoyed me as I read through the announcement was just how many weasel words OpenAI themselves use, making it sound like GPT-4 is doing a lot more than it actually is. They write stuff like

"it still is not fully reliable (it 'hallucinates' facts and makes reasoning errors)".

Well, no shit, Sherlock. Perhaps you should instead write the truth: GPT-4, just like the earlier models, cannot do reasoning, nor does it have any concept of facts. They have done tuning, more training, and manual special-case handling to try and reduce the number of errors, sure. But that does exactly nothing to make the model do things it has never done and never will do.

If you want any kind of concept of "facts" or "reasoning", or anything else except text generation, you need to build something else from scratch. For that, the GPT models are a dead end, but it seems like the hype machine is too caught up in itself to dare admit this.

"GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake."

If you have built a model which can actually check its work in any capacity, you should be shouting about it from the rooftops. When you have built a model which cannot and will never be able to check anything, you should not write a sentence like that.

Perhaps GPT-4 wrote the whole announcement? That would explain a lot …

A final little nugget comes from the section Risks & mitigations, where they state that GPT-4 not only carries the same risks as previous models, it also has new risks because of its additional capabilities. To mitigate this, they talked to experts in various fields and tried to train the model to avoid giving dangerous responses to various prompts. Which is all well and good, as long as you do not think about what it reveals: how impossible it is to bend the model in any reliable way. The best they can do is make it statistically less likely to respond with certain dangerous things they thought of beforehand, in certain situations, to prompts they thought of beforehand. Sure, that will generalize to some extent, but since you still know nothing about the training data, you can be sure the rest will bubble up in one way or another.

Thinking about it, I would bet that they could never build something as impressive-appearing as GPT-4 if they tried to know or evaluate their source data in any way. You would end up with a … Wikipedia or something like that: a dictionary of verified sources whose material would not be creatively recombined, because that would again risk creating new and exciting errors. The approach is great for generating plausible-looking text from a huge corpus of data, but a dead end for doing anything remotely intelligent with either the data or the generated text.

(Why do people hype so much? Like VR, it would all be fun and interesting to figure out if tons of people were not trying to hype it as a cure to … well, what exactly are they trying to tell us it is great for, anyway?)

Phew, I guess I really needed to vent that. On to something calmer.

Podcast downtime, take n+1

A while ago (June?!), I was finishing up a re-listen to a podcast and looking forward to spending less time on more or less random podcast listening.

Yeah, that project went nowhere pretty darn fast. Every single week, I download a great many more episodes than I subscribe to, and I still catch myself sometimes having listened to more stuff in a day than feels meaningful. Listening which just washes over me, taking time where my ears could have relaxed and my mind could have worked on ideas of its own rather than trying to keep up with new ones.

There are some small changes, though. The first, and most clearly positive one, is that I am a little bit better at catching myself over-listening, putting away the headphones and enjoying the dog walk without additional voices. The second change I choose to see as positive is that I give episodes a second listen more often, rather than picking up completely new things. While it is still time spent listening, a second run through familiar thoughts is more relaxing, and gives a better chance of actually picking up the ideas expressed.