I’m James Janson Young and I convene Ours For The Making to help us understand how innovations and ideas are creating our future, and how we too play a role in shaping what comes next.
This week we’re talking about AI. But at the friendly, non-clunky, non-technical end of the spectrum.
We’re talking about a particular text-to-image generator called Dall-E Mini (a portmanteau of Dalí and Wall-E).
Dall-E Mini exploded onto the scene only very recently and has grown and grown in popularity over the past month or so.
So, here’s a primer on what it is (and isn’t), and a look at what its wider popularity might mean for how we view AI in general.
(I also include some tips at the end for how you can get better results out of the image generator for yourself.)
As always, thanks for reading.
DALL-E Mini will transform how you see AI
You might have recently seen something odd popping up in your social media feed. Weird grainy pictures that resemble something familiar, but look like odd mash-ups. Strange combinations…some hilarious, others just kind of haunting.
So what’s going on here?
These images were created by an AI formerly known as Dall-E Mini, now known as Craiyon (more on the name change in a moment).
Dall-E Mini generates pictures from any text prompt. You type… it gets to work.
Crude as some of these pictures are, Dall-E Mini / Craiyon is for many of us our first real opportunity to play around with AI that makes images and ‘art’ on demand.
What is Dall-E Mini?
According to its creators: “DALL·E mini, is an AI model that can draw images from any text prompt!”
It’s now known as “Craiyon” to avoid confusion with DALL-E, a larger, more sophisticated AI text-to-image generator released in January 2021, and with DALL-E 2, its even more powerful sibling, previewed in April 2022.
Dall-E 2 is vastly more powerful than Dall-E Mini and it’s best to try and separate them in your mind. Dall-E 2 isn’t open for just anyone to use. It’s in “closed BETA.” You have to apply to use it.
The same goes for other AI text-to-image generators, such as Midjourney and Google Brain’s Imagen.
Why, you might ask?
Good question. (We’ll return to that in a moment.)
By contrast, Dall-E Mini is openly available for you to try right now.
(Assuming the site hasn’t crashed owing to all the traffic.)
Note: There are other free and open AI text-to-image generators out there. Dall-E Mini just happens to be the most famous of them, and the one that has gone viral.
How does it work?
Type in your text prompt and wait…
So, how does it really work?
Under the bonnet is an algorithm designed to mirror how the networks of neurons in our brains communicate. It has been trained to produce images on demand from a text prompt.
The ‘training’ part is effectively to plonk the algorithm in front of a screen and show it around 30 million unfiltered pictures scraped from the internet. These could be images of anything from teapots to Turkish delight along with any associated captions.
The algorithm starts to gain a feel for which captions map to which images. It starts to predict from a caption that it is fed, say “teapot”, what an image of a teapot should look like.
Then, it uses these insights to assemble images that map to entirely new, more complex captions with multiple concepts and ideas that you and I willingly provide.
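For the curious, here is a deliberately tiny sketch of that idea in Python. It is a toy under loud assumptions, not how Dall-E Mini is actually built: each "image" is just four made-up numbers, the training pairs are invented, and the hypothetical `train` and `generate` functions simply average pictures word by word. It only illustrates the general pattern of learning which words go with which pictures, then combining them for a caption the model has never seen.

```python
# Toy illustration only. Real text-to-image models are vastly more
# sophisticated; every name and number here is invented for the example.

def train(pairs):
    """Learn an average 'picture' for every word seen in the captions."""
    sums, counts = {}, {}
    for caption, pixels in pairs:
        for word in caption.lower().split():
            if word not in sums:
                sums[word] = [0.0] * len(pixels)
                counts[word] = 0
            sums[word] = [s + p for s, p in zip(sums[word], pixels)]
            counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}


def generate(model, prompt):
    """Blend the learned pictures of the known words in a new prompt."""
    known = [model[w] for w in prompt.lower().split() if w in model]
    if not known:
        raise ValueError("no word in the prompt was seen during training")
    return [sum(column) / len(known) for column in zip(*known)]


# Invented training data: (caption, four numbers standing in for an image).
TRAINING_PAIRS = [
    ("red teapot", [1.0, 0.0, 0.0, 1.0]),
    ("blue teapot", [0.0, 0.0, 1.0, 1.0]),
    ("red sweet", [1.0, 0.0, 0.0, 0.0]),
]

model = train(TRAINING_PAIRS)

# "blue sweet" never appeared as a caption, but both words did.
blue_sweet = generate(model, "blue sweet")
print(blue_sweet)
```

The point of the toy is the last step: the caption "blue sweet" was never in the training data, yet the sketch can still assemble a picture for it from what it learned about "blue" and "sweet" separately.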
Dall-E Mini in action
As we’re interested in the futures that these technologies are creating, let’s ask it to show us what it sees in the future…
How it sees the future of…food…
How it sees the future of…transport…
How it sees the future of…communication…
Is this really how AI sees the future?
Dall-E Mini’s creators list what counts as “Misuse, Malicious Use, and Out-of-Scope Use”.
Most relevant to us is: “The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.”
So, asking the model to predict images of the future is perhaps a little unfair.
But, what I find really revealing about Dall-E Mini, and AI image generators in general, is the questions they prompt about their potential future use. And misuse.
Potential uses (and misuses)
The model’s creators envisage Dall-E mini will be used to “generate images based on text prompts for research and personal consumption.”
“Intended uses include supporting creativity, creating humorous content, and providing generations for people curious about the model’s behaviour.”
Further uses could be:
- assisting researchers examining the limitations and biases of these types of models,
- developing educational or creative tools, and
- generating artwork for use in design and artistic processes.
Interestingly, since the model has gone viral, users have applied it in other unforeseen ways, including:
- poetry illustration,
- fan art,
- visual puns,
- fairy tale illustrations,
- concept mashups, and
- style transfers
While these examples are fairly benign, they do highlight how hard it is to foresee the many ways these models might be used, or how to protect against the ways that they might be misused.
Biases and limitations
The creators of Dall-E Mini warn of some of its biases and limitations:
For example, images “may … reinforce or exacerbate societal biases” owing to the fact that Dall-E Mini was trained on “unfiltered data from the internet”.
The issue of biases and AI isn’t new.
It has long been known that biases can creep into AI design undetected, only to surface later once the technology has been made available for wider use.
Take ‘Ask Delphi’, the AI built to offer ethical advice. When it was released in late 2021, the bot offered up all sorts of questionable advice.
Prospective users landing on the Ask Delphi homepage are confronted with the disclaimer: “Large pretrained language models, …, are trained on mostly unfiltered internet data, and therefore are extremely quick to produce toxic, unethical, and harmful content, especially about minority groups.”
On the surface, that’s slightly problematic for an AI that aims to be a reliable source of ethical advice. But part of the rationale for building Ask Delphi is to investigate the limitations of modelling our moral judgements.
Concerns over the potential for AI to fuel misinformation and deepfakes, and to reaffirm societal and cultural biases, are reasons why more powerful AI image generators, such as Google Brain’s Imagen and Dall-E 2, are not open for us all to use.
Take Dall-E 2. A select group of users are assisting its creators, OpenAI, to study its limitations and capabilities.
OpenAI claims to have already introduced “safety mitigations” that prevent the generation of harmful images in the first place. For example, “By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to [violent or harmful] concepts”.
So, while questions remain, is it perhaps sensible to take a cautious approach before unleashing these more sophisticated models?
And remember, these technologies are still in their infancy. Dall-E appeared in early 2021. Dall-E 2 and Imagen in early 2022.
One last nagging issue
As a layperson looking at this AI, I wonder to what extent it truly creates something genuinely new, beyond recombining images and ideas that it has been trained on.
AIs’ ideas about an image are based on what something should look like. You type in “teapot” and it serves up images of what that should look like according to the images it’s been trained on.
But we are capable of so much more: what something could look like. We can take concepts off in unimagined directions that perhaps don’t pass the test “does this look like an image people would readily recognise as [insert text prompt]?” but do put an imaginative, innovative twist on things.
Understandably, designers and artists are concerned about their future with the prospect of these powerful image generators coming online.
But, I for one feel there’s space for them and all creatives until AI can offer a credible response to this simple question: what else is possible?
Until then, there’s a creativity gap that AI can’t fill.
For the time being.
Bringing AI to the masses?
About a month ago, Dall-E Mini / Craiyon was generating some 50,000 images a day. The @weirddalle Twitter account has over 1 million followers.
This wider exposure of new users to AI holds the potential for something more significant: to demystify AI in the eyes of a wider range of people.
This is perhaps the first time many people have interacted with AI in this way or knowingly interfaced with AI at all, despite habitually clicking on AI-generated Netflix suggestions or listening to AI-generated playlists.
And, the more powerful and ‘intelligent’ AI becomes, the more pressing it is that we think through how (or perhaps ‘if’) it can be deployed in safe and sensible ways.
I suspect people’s feelings towards AI will inevitably shift as a result of tinkering with AI image generators such as Dall-E Mini and seeing what they produce.
Perhaps AI will look a little less mysterious. Bringing it out from behind closed doors and placing it in our hands to play with may also soften our reservations about AI in general, and about it featuring more prominently in our lives.
Or perhaps the reverse.
It might heighten our awareness and wariness of just how much AI is quietly making what we thought were our decisions. And taking what we thought were our jobs.
It could also make us even more sceptical of the images we see day-to-day in our social and news feeds. With unknowable knock-on effects for how we establish trust.
But what do you think? Does it change your views about AI’s possibilities and limitations?
Try out Dall-E Mini for yourself and let me know how you get on!
Tips on how to get better results from Dall-E Mini
Tip #1: Be specific. The creators recommend being specific in your text prompts, and note that using words like “illustration”, “photorealistic” and “high definition” can produce some interesting results.
Tip #2: Style transfers. Text prompts that describe a style or form you’d like the image to try and imitate can open up some interesting visuals:
Plug in an artist’s name: Paul Klee, Hopper, Van Gogh, Edvard Munch, Gustav Klimt, and so on. “Trail cam” is fast developing into its own sub-genre. Try “Steampunk”, “RGB”, and so on.
Tip #3: Counterpoint. Combine incongruous ideas, such as “plastic chair and archeology”.
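If you like to tinker, the three tips can be rolled into a small helper that assembles prompts for you to paste into the generator. This is only a sketch: the `build_prompt` function is hypothetical, and the "in the style of" wording and the comma-separated layout are my own assumptions rather than anything the creators prescribe.

```python
# Hypothetical helper for composing prompts from the three tips.
# The phrasing "in the style of ..." is an assumption, not an official format.

def build_prompt(subject, style=None, modifiers=()):
    """Join a subject, an optional style, and extra descriptive words."""
    parts = [subject.strip()]
    if style:
        parts.append(f"in the style of {style.strip()}")
    parts.extend(m.strip() for m in modifiers)
    # Skip any empty pieces so we never emit stray commas.
    return ", ".join(p for p in parts if p)


prompts = [
    # Tip #1: be specific with descriptive words.
    build_prompt("a teapot", modifiers=["illustration", "high definition"]),
    # Tip #2: name a style or an artist.
    build_prompt("a city street at night", style="Edvard Munch"),
    # Tip #3: combine incongruous ideas.
    build_prompt("plastic chair and archeology", modifiers=["photorealistic"]),
]

for prompt in prompts:
    print(prompt)
```

Swap in your own subjects, styles and modifiers, then compare what the generator makes of each variation.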
Check out the Twitter account @weirddalle for more inspiration.
Thanks for reading, watching, subscribing and being a Maker. I really appreciate it.
If you’ve enjoyed this edition of The Makers, you’d be doing me a kind and generous favour by sharing it with someone who might enjoy it also.
And if you have questions or comments, do hit reply.
Until next time…