Warning: Live-blogging. Prone to error, inaccuracy and howling crimes against grammar and syntax. This post will be updated over the next few days.
NEXT curator Monique van Dusseldorp led two sessions on the future of AI-generated creativity, which has gone through explosive growth over the last few months. And we were lucky enough to kick off with the man behind one of the newest models: Stable Diffusion.
Stability AI, Research Scientist
Robin is a PhD student – although he’s nearly done. His thesis is written. Not surprisingly, it’s about generative models for images – generating images from text prompts. Stable Diffusion, which emerged from his PhD work at the University of Munich, is consuming more and more of his time — although he’s hearing the siren call of Silicon Valley.
The compute time you need to produce images like this is expensive – which means there’s a limit on how much you can do with university resources. Stability AI is funding research like this, which is what allowed Robin and his colleagues to continue. Two days after they released Stable Diffusion as open source, there was a Cambrian explosion of creativity using it.
It differs from the other image generation models out there because it’s open source, but it’s also more efficient. It uses two models in concert to achieve this. People are using the model to do anything from extracting a colour palette from a text prompt to making a child’s drawings come to “life”.
neural.love, ML blogger & CEO
Denis’s journey began with borrowing some money from a friend to upgrade his PC, and then he started experimenting with ML models to upscale archive video. These became hugely popular on YouTube. The company, neural.love, is an attempt to see if there’s a business model in this.
They’re continuing to experiment, using DALL-E to “outpaint” the environment around classic paintings, for example.
“It’s really cool just to play with these new models that are coming out,” he says. “I’ve collected — like gemstones — people who are obsessed with these ideas.”
Their next idea is simplifying prompting. They consider themselves prompt engineers, so they’re, in effect, putting an interface in front of their expertise. They added voice prompts on the same day as the conference.
Why is ML art so significant?
Monique suggests that we have put so much information out there in digital form that we can now train these sorts of models – and we are not prepared for the impact of that. Are stock agencies dead? Is this a powerful tool for creatives – or is it a threat?
Robin points out that there’s been a major leap forwards in how generalist these tools are. He likes to compare it to the advent of photography. Once you move into the space, a whole range of new possibilities for creativity and expression opens up.
Denis and his company are working on complex challenges like colourisation of black and white images. There’s a need for a huge number of images to train the models on, but they are getting there. It’s getting close to the point where a historic image looks like it was shot on your iPhone last week.
Open questions of bias and legality
But it’s all based on existing data sets. Using clients’ work isn’t possible because of privacy concerns. And the data sets shape what the models produce: Monique has discovered that the models think Moniques are blonde, for example.
Robin thinks it’s important to start the conversation about what the models were trained on, and how that leads to both bias and replication of artistic styles. The models have a western bias, for example, because the data has that bias. Denis agrees there are plenty of open questions around the legality and, indeed, the bias. These questions can’t be answered yet – but they need to be.
Fabian Stelzer: AI in motion
As a child, Fabian had a dream one night of a wooden television that allowed him to watch anything wherever he wanted. When he woke, he searched his bedroom for it, before coming to the heart-breaking realisation that it was just a dream.
Dreams do not become reality. Or do they? The challenge inherent in making ideas come true is the bottleneck of execution. Media technology allows us to widen that bottleneck, by making it easier for us to create the things we see in our mind. AI synthesis is the latest technology to do that, using machine learning models trained on existing data to generate new data.
It’s a bit like spell-casting — you give the system a prompt in words, and it creates an image or text based on that. All sorts of images are now only a prompt away. So, what’s next?
Pushing the limits of AI creativity
He’s been experimenting with this, by trying to get AI to recreate his own photography. He called the project Copy Sheet – a play on words from “Contact Sheet”, the photographic prints made to show the whole of a strip of negatives. Another project, I Could Do That, uses a GPT-3 model to generate Midjourney prompts.
Where did he get the data to train the model? Well, from the Midjourney community’s own prompts. So, just as Midjourney was trained on found images, his model was trained on found prompts. But that didn’t make the community happy. It’s a form of meta commentary on the origin of the models.
But what about video? For Minimum Viable Films, he used a GPT-3 model to generate TV show ideas, like Elves of Manhattan, and then another model to turn those into image prompts. He then used another GPT-3 model to generate a business plan for AI films…
How do you take it further? You make an AI-generated film. And that leads us to Salt. The trailer was built with DaVinci Resolve, Midjourney images, and AI voice generation tools. Currently, it’s just using apps and other visual effects to animate still images, but it shows things that are now possible that weren’t before.
Given that we can now generate video like this a thousand times more easily than before, what could we do with it that doesn’t look just like old films? He’s experimenting with viewers voting on where the story goes next, because he can turn around the next episode quickly enough to follow the vote. This could change the medium of film.
Beyond that, AI systems can already generate presentations on a given topic, or generate a battle.