Why DALL·E 3 Won't Replace MidJourney: A Comprehensive Review and Comparison



OpenAI made a big splash in the AI world by releasing DALL·E 3 to the public. I tested it out to see what all the hype was about, and to be honest, I think DALL·E 3 is overrated, and I'll cover why. Just like its predecessor DALL·E 2, it won't overtake AI art generators like MidJourney anytime soon.

Now, to access DALL·E 3, you need to log in with a free Microsoft account. You can use your Gmail address to sign up and log in. Once logged in, you can enter a prompt like "Extreme macro close-up Photorealistic, comical image. Adventurous couple mid-skydive, holding a floating breakfast table. Woman pours syrup, laughing. Cityscape, clouds. Whimsical white cursive text I Told You, Ever" and click "create" to generate your images. DALL·E 3 will generate four images at a time. You can then click on any one of them to zoom in.

For another test, enter a prompt like "a photo of Kevin Hart with The Rock" and click "create".

Note: Sometimes, you might see an error message stating that it can't create an image of Kevin Hart and The Rock due to copyright and likeness restrictions.

Strengths of DALL·E 3

Let's talk about the strong points of DALL·E 3 and why it's gotten so much hype. Its natural language understanding is amazing, and in my testing it was clearly superior to MidJourney's. The photos of Kevin Hart and The Rock clearly match what I asked for in the prompt, and DALL·E 3 preserves the identity of both men. Try the same thing in MidJourney, and you get a fusion of the two actors that looks like neither of them individually.

Now, there are ways to get around this with the pan and inpainting features in MidJourney, but DALL·E 3 comes out on top here. Here's another comparison: prompting for "Batman fighting Spider-Man". DALL·E 3 does a pretty good job of generating what we'd expect, but MidJourney gets confused and generates an image of Spider-Man fighting what appears to be some fusion of Batman, the Green Goblin, and Venom.

Another reason for the hype around DALL·E 3 is its ability to generate text in images. I asked for Pikachu holding a sign with the text "Don't pet me!", and the text looks pretty accurate in the generated images, although it did take a couple of rerolls.



It can also do graphic designs, like this graffiti-style logo with the words "Never Back Down." Images with longer text messages can also be made, although, upon closer inspection, it has a hard time getting all the letters correct for longer phrases. Text rendering also seems to work only for English. I tried to generate a photo of a man wearing a t-shirt saying "J'aime Paris," which means "I love Paris" in French, and it just doesn't work. DALL·E 3 has no idea what I'm talking about. Compare that to Ideogram, a free alternative image generator, which renders the French text with no problem.

Current Features of DALL·E 3

In the current version of DALL·E 3, you can prompt for images by simply entering a prompt, and that's it. This is the only feature available at the moment. While this is functional, I would say that DALL·E 3 is still in beta, so hopefully, a variety of additional features will be added in the future. However, it's worth mentioning that DALL·E 2 wasn't exactly feature-rich either, so expectations should be tempered.

One of the limitations I noticed is that you can't change the aspect ratio of the generated images, like you can with other tools like MidJourney. DALL·E 3 only generates square images. For example, I asked for a full-body shot photo of a woman in New York, with natural lighting and a 35mm camera style. As far as I can tell, DALL·E 3 has a hard time generating the full body in these types of photographs, which is a common limitation of other image generators as well.


In MidJourney, you can use the pan feature to extend the bottom of an image and generate a full body, but in DALL·E 3, this is impossible. So, if you're looking for a tool that allows you to generate full-body images or adjust aspect ratios, DALL·E 3 may fall short in these areas for the time being.

Photography and Camera Angle Handling

When it comes to photography, DALL·E 3 does a pretty good job in terms of understanding the details of the prompt. For instance, I did a comparison of different camera angles with MidJourney, such as centered view, low-angle view, high-angle view, and fisheye lens, and DALL·E 3 more or less captured the essence of the elements I asked for, like the fisheye lens and long exposure settings.

Centered view

Low-angle view

High-angle view 

Fisheye lens

In contrast, the MidJourney equivalent image had much better aesthetics but fell short in terms of accurately interpreting the prompt. DALL·E 3's output was more about staying true to the technical aspects of photography, whereas MidJourney seemed to focus more on the artistic side, creating visually appealing results, even though it didn't match the technical aspects of the prompt as well.
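
If you want to rerun this kind of comparison yourself, it helps to keep the subject fixed and change only the camera instruction, so any difference in the output comes from the angle wording. Here is a minimal sketch in Python of how that could be set up. The helper name build_angle_prompts and the example subject are placeholders of mine, not the exact prompts used in this review, and you would still paste each resulting prompt into both tools by hand.

```python
# Hypothetical helper for reproducing a camera-angle comparison.
# The angle list mirrors the four views shown above; the subject passed in
# below is a placeholder, not the prompt used in this review.
CAMERA_ANGLES = ["centered view", "low-angle view", "high-angle view", "fisheye lens"]


def build_angle_prompts(subject: str, angles=CAMERA_ANGLES) -> list[str]:
    """Return one prompt per camera angle, all sharing the same subject."""
    return [f"{subject}, {angle}" for angle in angles]


if __name__ == "__main__":
    # Print the prompts so they can be copied into each image generator.
    for prompt in build_angle_prompts("a photo of a street musician at night"):
        print(prompt)
```

Running the same four prompts through both generators keeps the test fair: only the model changes between the two sets of images.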

Aesthetic Comparison: DALL·E 3 vs. MidJourney

Speaking of aesthetics, I think MidJourney is far ahead. This can be kind of subjective, but let's look at some different styles. For example, I requested a painting of a garden in the style of Monet and an image of a tiger done with paper quilling. The differences between the two AI models were slight, with DALL·E 3 providing a more literal interpretation of the prompt, while MidJourney created an image with a more artistic flair.


However, things got more interesting when I asked for the style of a lesser-known artist called Kazuki Takamatsu.

While DALL·E 3 didn't seem to have a grasp of the artist's unique style, MidJourney did a better job of capturing the essence of his work. DALL·E 3 might be able to generate that style, but it likely won't do it due to potential copyright issues, or it might not be trained on a dataset that's as comprehensive as MidJourney's.

Another comparison involved a landscape in the style of Studio Ghibli. Once again, MidJourney produced an image that was much closer to the actual art style, while DALL·E 3 struggled to achieve that level of detail and style accuracy.


DALL·E 3 vs MidJourney Comparison

| Feature | DALL·E 3 | MidJourney | Remark |
| --- | --- | --- | --- |
| Natural Language Understanding | Superior to MidJourney, produces more accurate results with prompts | Can be less accurate, sometimes combines elements or misunderstands prompts | DALL·E 3 is better at understanding detailed prompts and preserving identities. |
| Image Generation | Generates 4 images per prompt, better at preserving specific details | Generates visually appealing but less accurate interpretations | DALL·E 3 excels in technical accuracy; MidJourney in artistic appeal. |
| Text in Images | Can handle text in images with some limitations (English only) | Struggles with text in images | DALL·E 3 has the edge when it comes to generating readable text in images. |
| Aspect Ratio and Full-Body Shots | Limited to square images, struggles with full-body shots | Flexible with aspect ratios and can generate full-body images | MidJourney offers more versatility in aspect ratio and full-body generation. |
| Photography and Camera Angles | Handles technical aspects (e.g., fisheye lens, low-angle view) well | More focused on aesthetics, struggles with technical accuracy | DALL·E 3 is better for accurate technical photography; MidJourney is more visually aesthetic. |
| Aesthetic Comparison | Literal interpretations of artistic styles, struggles with some specific styles | Better at capturing the essence of artistic styles like Kazuki Takamatsu and Studio Ghibli | MidJourney has superior aesthetic appeal and style accuracy. |
| Character Consistency | No reliable way to generate consistent characters via ChatGPT integration | Difficult to generate consistent characters, but better than DALL·E 3 in some cases | Neither excels at consistent character generation, but MidJourney has more flexibility. |
| Feature Set | Still in beta, only basic image generation available | More established, includes advanced features like aspect ratio changes | MidJourney has a more feature-rich and flexible toolkit. |
| Languages | Only supports English in text generation | Not tested for non-English text in this review | Ideogram, not MidJourney, was the tool that rendered the French text correctly. |
| Image Storage/History | Lacks easy access to previously generated images | Allows easy access to and search of generated images | DALL·E 3's inability to manage image history is a significant drawback. |

DALL·E 3 shines in technical accuracy, language understanding, and rendering text in images. However, its limitations in aesthetics, flexibility, and image history management make it less versatile than MidJourney, which excels in artistic creation and has more established features for image generation. While DALL·E 3 is a strong tool for precise prompts and natural language, MidJourney remains superior in terms of creative flair and style diversity.

DALL·E 3 and ChatGPT Integration

When I heard DALL·E 3 would be integrated with ChatGPT, I got very excited about the possibility of leveraging a massive language model with an image generator. One possibility I was especially interested in was generating consistent characters. I imagined uploading an image of a character into the language model, analyzing that image, and then generating pictures of that character in different settings.

I used Bing Chat to get a preview of the DALL·E 3 and ChatGPT integration. By visiting bing.com/chat, you can talk with Bing’s chatbot, which runs on OpenAI's GPT technology, and ask it to generate images using DALL·E 3. I tried uploading some images of faces, hoping the chatbot would be able to analyze the face and map it into other images. Unfortunately, due to privacy concerns, the chatbot blurs out all faces, even those that are clearly fictional.

Generating consistent characters in MidJourney is already difficult, so I was hoping DALL·E 3, combined with GPT, would be able to solve this problem. However, it looks like that won't be the case for the time being.

Missing Features: A Big Drawback

One huge issue I found when testing DALL·E 3 is that there's no way to browse all of my old images. The sidebar in the create tab shows your recent images, but there's no way to access or search the full history of what you've generated. Maybe this is due to storage concerns, but it's a huge inconvenience, especially if you're serious about generating a lot of images. Hopefully, they'll fix this at some point.

Conclusion

I'm a big fan of OpenAI research, but DALL·E 3 seems overhyped to me. The improvements in natural language are huge, but I think MidJourney version 6 will probably be almost as good in terms of understanding language and much better everywhere else.

DALL·E 3 is better for accuracy, technical photography, and text/image integration. It excels at maintaining identities and works well for generating more realistic, detailed images.

MidJourney is better for artistic quality, aesthetic appeal, and stylistic interpretation. If you're looking for creative, visually stunning outputs, MidJourney often produces more artistic and compelling results.