November 14, 2023

Generating images

Let us (all right, mostly just me) talk about generating images using large language models.

I remain firmly on the negative side when it comes to using models to generate text. It feels like most use cases would be much better and more useful if they did not put a random lie generator in the middle, and that many of the remaining use cases do not justify the resources needed to generate and maintain the models.

And of course, there is the whole ethical and moral side of using the creations and creative energies of everyone else without permission to create the ability to generate worse derivatives.

All the huge ethical, moral, and resource question marks remain firmly in place for the image generators too. There is one huge difference right at the top though:

The magic

For me, writing to generate images is simply enormous amounts of fun.

I have spent quite a bit of time playing with DiffusionBee, and had quite a lot of fun with that too. But I recently got access to the paid tier of ChatGPT, and with that comes the latest and greatest version of DALL-E.

Wow, what a difference.

The images are so much better, they arrive somewhat faster, and the whole thing is available on both computers and mobile devices. I hit my first daily limit within ten hours. It has even eaten some of my mindless surfing time, which I am very happy about. Instead of following various random thoughts and threads on social media, I have spent entire bus rides generating images of Cthulhu getting caught in ticket inspections, dachshunds commuting on trams, and the like.

The magic for me is, I think, in the interaction. I enjoy writing and have felt able to express what I want the way I want in text for most of my life. Picking the right words, constructing sentences, and editing. It is all part of the fun.

I am able to draw, take photos, and do a bit of image manipulation as well, but nowhere near the level needed to create even a fraction of the images I can dream up in my head. I get a general image, and if I try to draw it I often realize that I do not know which details I want, how to create a nice composition, or how to get past any of a million other little stumbling blocks.

Hey look: suddenly I can type that fuzzy idea into a prompt, hit enter, and get a fun result on which I can iterate just seconds later.

(Or, sometimes, you know, error messages of highly varying degrees of crypticness. I wonder how exactly OpenAI has glued together its front and back ends, and how much they can really control what goes on as both users and their own system generate ideas about how to get around the limits they themselves try to impose.)

The whole experience feels magical and fun in the best possible way. I can make images for an endless line of subjects for which I would never have had an image otherwise.

As good as it gets?

Not that any of it justifies the resource usage. Or the mountain of people OpenAI trod on to build this. My only possible consolation is that I should be costing them a whole lot more than I bring in, so perhaps I can make the end of this ridiculous show come that little bit sooner?

I am also wondering if this could be the peak of these models.

The time where people were still blindly pumping money in.

The time before anyone realized exactly where the hard technical barriers were.

The time before legal clarifications or plain old lawsuits made truly wide datasets impossible without high paywalls.

The time before the business models were figured out.

Perhaps this is as good as it will ever get outside of much narrower circles?

The end?

On the one hand, I feel as if image generation has a bit more of a shot than text, simply because there are so many cases where you have some text and could lighten things up with an image which would never have been created otherwise.

But on the other hand, all of those cases are, you know, kind of meaningless fun. Will that ever be worth paying the true cost for? I would not pay it, and I would not use a free alternative if the quality was too low either.

So yeah: I love creating images from text, but perhaps it would be good if this whole thing just passed by and left soon?

November 10, 2023

Øredev week

I enjoy Øredev, and the big whooshing sound it makes as it goes by.

This week was Øredev week, and now I sit here late on the Friday evening, feeling that familiar combination of being mentally wrung out and at the same time energized and thinking I could have squeezed even more into the experience.

It is probably a good thing that I did not, because I am tired enough as is. Tired enough that I have re-realized that it is, in fact, Friday, and that I have a whole weekend to rest and recover before diving back into work and other things.

This year, I did on-stage interviews with all the keynote speakers right after their respective keynotes. With six keynotes spread unevenly across two and a half conference days, I had, on paper, plenty of time for other things. But I decided to give as much attention as I could to preparing and performing the keynote interviews well, and so I made sure not to schedule many regular recordings. In the end, I did two non-keynote recordings and had one keynote interview cancelled, for a grand total of seven. Looking at the archives, it seems I usually get around eight recordings done, so I gave myself a normal balance in the end.

But, again, there is always more that would have been fun to build on. Quick chats with speakers revealed that we could have had a great recording if we had carved out the time. Even in my current state, I could write down a list of eight such possibilities to follow up on without making much of an effort at all.

All of which is to say that conferences are still fantastic things; meeting and talking to new combinations of people in new settings is a wonderful way to make new connections and come up with new ideas. And all this value without even starting on the long list of interesting talks I would like to watch once the videos land on YouTube.

I thought of a few more possible threads while writing this. The decompression of all the concentrated information I absorbed during the week has only just begun.

October 16, 2023

Intervals

I have been a regular runner for years.

I am also very regular in the pace I keep.

Steady, gradual change is my comfort zone, with a strong trend toward going longer rather than faster.

Yes, this is a classic reminder of how much good any kind of interval training can do, perhaps especially if you fall into the same camp as I do exercise-wise. Thanks to a great initiative at work, I joined some of my colleagues for a group run on Thursday, complete with coach and everything.

It turned out to be all about intervals at various speeds and of various lengths.

I know how good intervals can be for me, and I was still surprised by how much I got out of it. Not only did I momentarily push myself much harder than I normally do, I also did more such pushes than possibly ever before in such a short time. And I recovered better than I expected between the pushes too. Or, as I put it in the moment: it is funny how frequently you can start running and still start out too fast every single time.

Just as expected, I felt nicely drained afterward. Now, four days later, I still feel it just a bit in my legs. All in a good way.

But I have also kept at it. In the runs I have got in since then, I have varied my pace a lot more than usual. I feel like I can both improve my max pace and sustain it for longer simply by getting more used to it. And that in turn always helps the average pace as well.

Intervals. I guess I should do them more often. Frustratingly rewarding.

October 09, 2023

Making pixels up

Watching Google's Pixel 8 introduction, I was fascinated by my own reaction to their new photo manipulation tools.

Even as I saw the fun and usefulness in them, I realized that they had crossed my line for what I feel good about a camera doing while still presenting the result as a photo.

The particularly interesting case was the feature Google calls Best Take, where you can mix and match heads from a series of group photos to create a result where everyone looks good at the same time.

It was interesting because it was at once a feature I saw great use for - I think I have even wished for it explicitly - and one which I felt took exactly one step beyond the line of what a photo is.

On the other side of that line are a whole lot of cool things and a lot of fun results, all of which are edits of actual photos rather than captured data about the real world - changes away from what was there at the time the photo was taken.

I started to think that Google should split these functions between the camera app and a dedicated editing app. That would feel appropriate. As things stand now, I am left with a strange taste in my mouth when many of the most touted "camera" features are essentially about making things up, with your actual photo as some kind of starting point.

Give it a few more years, and they can proudly announce that you do not even need to press the button - they can generate the photo for you.

Well, a better photo, really. Many things will be similar to what was actually going on. Mostly. With, you know, better-looking people. Who are happy. Look happy.

Also: no need for all that annoying camera hardware! Introducing the Pixel button! Just one button, to create good stuff. Next year, we will use AI to press the button for you.

(Then I started thinking they should put that editing app on other phones. The "Pixel editor" ("by Google") would accompany the iPhone 15 in a great way. Show us all what you can do with all your AI smarts.)

Remember when Google was all about indexing the world's information? What are they about now, making things up?

They also tell us we should trust Bard to pick the right hiking path for us at the moment we are standing at a fork in the road looking at the sign. Trust the model, without verifying for yourself, in the wilderness. What could possibly go wrong? They will of course include tons of disclaimers somewhere, but there is no way in hell people will not be harmed in significant ways by trusting completely incorrect yet confidently delivered responses from things like Bard. Again, and again.

Made-up things are, still, not actual information. But they will of course be indexed as if they were.

September 22, 2023

This is normal

A tired run of Polybius is better than a great run of most games.

Six years later, I say VR is still waiting for another pure game experience equally worthy of the medium.

Sitting down to play after a long break, I am additionally impressed by how well designed it is. How clear everything is, despite the incessant explosions of colour and sound. How much thought and experience must have gone into making it so, and how rewarding that makes building experience with the game. I am not sure what I am doing differently, but I clearly get into the zone quicker and deeper than ever before, despite being rusty and almost failing multiple times in the first few levels.

It is probably unfair of me to be surprised. Take it as a sign of how little I trust the average game to have put thought into the things it presents.

Leave your baggage behind.