OpenAI, the company behind ChatGPT, has given its first official public preview of DALL-E 3, its latest image generation model. Rolled out Wednesday at a small event for reporters, DALL-E 3 is being pitched as a tool that fully understands complex text prompts and produces images that match them in complexity.
As a new information page about DALL-E 3 on the OpenAI website notes, "Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL-E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide."
Possible images from an in-progress version of DALL-E 3 were leaked onto Discord earlier this summer, and those showed enormous potential along the lines depicted in the press preview. The leaker claimed to have fed DALL-E 3 the lengthy prompt "painting of a pink jester giving a high five to a panda while in a cycling competition. The bikes are made of cheese and the ground is very muddy. They are driving in a foggy forest. The panda is angry." The resulting image was downright astonishing in its fidelity to that request.
Image generators like Midjourney and Stable Diffusion, while capable of mimicking photorealism and producing representations of a wide range of objects, styles, and people (with no small amount of controversy to go with them), will undoubtedly struggle to produce anything this complex.
Those image generators, and OpenAI's own previous offerings in this area, also famously fall short when asked to produce images that feature text, usually producing garbled nonsense at best, and hilarious malapropisms at worst. DALL-E 3 appears to be much more capable of incorporating coherent text into images, as demonstrated in a cartoon posted on X by OpenAI CEO Sam Altman.
OpenAI says it will integrate DALL-E 3 into ChatGPT directly, and strongly implies that the chatbot will switch from one model to another depending on the content of the prompt. ChatGPT, once purely a user-friendly spigot for text outputs from the GPT-3.5 model, is rapidly evolving, incorporating third-party plugins with the ability to pull text from other sources, including the web. This move further diversifies ChatGPT's capabilities, broadening the already strained definition of the term "chatbot."
DALL-E 3 "will ramp to all ChatGPT+ users over the next couple of weeks," according to Altman. The OpenAI website says all ChatGPT Plus and ChatGPT Enterprise customers will be able to use it "in early October," and that OpenAI won't be making any copyright claims on the model's outputs. However, if you plan to generate something with DALL-E 3 and then copyright it yourself, that's a whole other can of worms.