Generating images

November 14, 2023

Let us (all right, mostly just me) talk about generating images using large language models.

I remain firmly on the negative side when it comes to using models to generate text. It feels like most use cases would be much better and more useful if they did not put a random lie generator in the middle, and that many of the remaining use cases do not justify the resources needed to generate and maintain the models.

And of course, there is the whole ethical and moral side of using the creations and creative energies of everyone else without permission to create the ability to generate worse derivatives.

All the huge ethical, moral, and resource question marks remain firmly in place for the image generators too. There is one huge difference right at the top though:

The magic

For me, writing to generate images is simply enormous amounts of fun.

I have spent quite a bit of time playing with DiffusionBee, and had quite a lot of fun with that too. But I recently got access to the paid tier of ChatGPT, and with that comes the latest and greatest version of DALL-E.

Wow, what a difference.

So much better images, somewhat faster, and also available through both computers and mobile devices. I hit my first daily limit within ten hours. It has even eaten some amount of my mindless surfing time, which I am very happy about. Instead of following various random thoughts and threads on social media, I have spent entire bus rides generating images of Cthulhu getting caught in ticket inspections, dachshunds commuting on trams, and the like.

The magic for me is, I think, in the interaction. I enjoy writing and have felt able to express what I want the way I want in text for most of my life. Picking the right words, constructing sentences, and editing. It is all part of the fun.

I am able to draw, take photos, and do a bit of image manipulation as well, but I am nowhere near able to create even a fraction of the images I can dream up in my head. I get a general image, and when I try to draw it I often realize that I do not know which details I want or how to create a nice composition, or I hit any other of a million little stumbling blocks.

Hey look: suddenly I can type that fuzzy idea into a prompt, hit enter, and get a fun result on which I can iterate just seconds later.

(Or, sometimes, you know, error messages of highly varying degrees of crypticness. I wonder how exactly OpenAI has glued together front and back ends, and how much they really can control what goes on as both users and their own system generate ideas about how to get around the limits they themselves try to impose.)

The whole experience feels magical and fun in the best possible way. I can make images for an endless line of subjects for which I would never have had an image otherwise.

As good as it gets?

Not that any of it can defend the resource usage. Or that mountain of people OpenAI trod down to build this. My only possible consolation is that I should be costing them a whole lot more than I bring in, so perhaps I can make the end of this ridiculous show come that little bit sooner?

I am also wondering if this could be the peak of these models?

The time where people were still blindly pumping money in.

The time before it was realized where exactly the hard technical barriers were.

The time before legal clarifications or plain old lawsuits made truly wide datasets impossible without high paywalls.

The time before the business models were figured out.

Perhaps this is as good as it will ever get outside of much narrower circles?

The end?

On the one hand, I feel as if image generation has a bit more of a shot than text, just because there are so many cases where you have some text and could lighten things up with an image which would never be created otherwise.

But on the other hand, all of those cases are, you know, kind of meaningless fun. Will that ever be worth paying the true cost for? I would not, and I would not use a free one if the quality was too low either.

So yeah: I love creating images from text, but perhaps it would be good if this whole thing just passed by and left soon?