OpenAI has just announced DALL-E 3, the third iteration of its generative AI image tool, built directly into ChatGPT. The upgraded version offers better image generation, more accurate results, content filtering, and ChatGPT's help in writing prompts.
DALL-E 3 Announced With ChatGPT as the Intermediary
DALL-E 3 targets a few key areas where most image generators fall short. It lets you turn your ideas into images at the click of a button. DALL-E is trained on data publicly available on the Internet and uses a diffusion model to convert input text into an image.
Previously, users had to learn what is known as ‘prompt engineering’. You’d spend valuable time writing out your description, yet the model still tended at times to ignore a few important words.
The real constraint arises when users have to put their visual ideas into words. The introduction of ChatGPT bridges this language gap. Aditya Ramesh, head of the DALL-E team, remarks:
“You won’t really have to worry about fussing around with really long prompts. Instead, you can just interact with ChatGPT as if you were talking to a coworker.”
With DALL-E 3, users can simply ask ChatGPT to come up with suitable prompts. Because DALL-E works better with longer, more detailed descriptions, the ChatGPT integration gives DALL-E 3 a significant advantage over its competitors.
It is much like having a real artist sitting right beside you, brush and paint in hand, ready for your instructions. The image below shows a user entering a few keywords, highlighted in white, and ChatGPT automatically generating a complete, detailed prompt for DALL-E 3 to use. It is a simple yet effective solution.
We have a few images that show what DALL-E 3 can do. To be blunt, the difference is as clear as night and day. Have a look for yourself.
The image generated by DALL-E 2 looks like little more than an oil painting with few details. DALL-E 3 takes things a step further by staging the same basketball game in space, or at least that’s what it looks like. No model is perfect, but there is a huge gap in quality between the two, with DALL-E 3 in the lead.
More examples show that DALL-E 3’s images are almost hyperrealistic. Everything from the textures to the reflections and the lighting approaches near-perfect quality. What’s scary is that I would find it hard to tell whether these images were AI-generated if I were part of a blind test.
On the safety side, OpenAI says that DALL-E 3 will follow strict guidelines that curb lewd, hateful, or violent content. The model is trained to ignore prompts containing certain terms. The same restriction applies to generating images of celebrities.
Sandhini Agarwal, a policy researcher at OpenAI, states that DALL-E 3 underwent even more rigorous red teaming. A group of researchers tried their best to push DALL-E 3 to its limits with regard to the content it would generate. Requests involving explicit content or terms go through a classifier and end up being rejected.
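To give a rough feel for what "go through a classifier and end up being rejected" can mean in practice, here is a minimal Python sketch. It is purely illustrative: OpenAI has not published how its classifier works, and a production system would almost certainly use a trained model rather than a keyword list. The names `BLOCKED_TERMS`, `ModerationResult`, and `screen_prompt` are hypothetical and do not correspond to any OpenAI API.

```python
from dataclasses import dataclass

# Hypothetical blocklist for illustration only; the real terms are not public.
BLOCKED_TERMS = frozenset({"gore", "nude"})


@dataclass
class ModerationResult:
    allowed: bool             # True when no blocked term was found
    matched_terms: list[str]  # blocked terms found in the prompt, sorted


def screen_prompt(prompt: str, blocked_terms: frozenset = BLOCKED_TERMS) -> ModerationResult:
    """Reject a prompt if any of its words appears in the blocklist.

    This keyword check is a stand-in for a learned classifier.
    """
    # Lowercase each word and strip surrounding punctuation before matching.
    words = {word.strip(".,!?;:\"'").lower() for word in prompt.split()}
    matched = sorted(words & blocked_terms)
    return ModerationResult(allowed=not matched, matched_terms=matched)
```

In this sketch, a prompt such as "A basketball game in space" passes through, while one containing a listed term is rejected before it ever reaches the image model.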
Plenty of image generators available online create content with no holds barred. DALL-E 3, however, caters to the general public and aims to provide a safe, family-friendly environment.
DALL-E 3 will be available to ChatGPT Plus and ChatGPT Enterprise users starting in October. The API will go live sometime in the fall; however, there is no word on a free public version as of now.
Image Sources: OpenAI (via Linus)