It’s hard to believe that it’s only been a year since the beta version of DALL-E, OpenAI’s text-to-image generator, was set loose onto the internet. Since then, there’s been an explosion of AI-generated visual content, with people creating an average of 34 million images per day. That’s upwards of 15 billion images created using text-to-image algorithms last year alone. According to Everypixel Journal, it took photographers 150 years, from the first photograph taken in 1826 until 1975, to reach the 15 billion mark.
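As a rough sanity check on the figures above, the short Python sketch below converts the daily rate into a yearly total and works out how many days it would take to reach 15 billion. It assumes a constant 34 million images per day, which is a simplification, and the variable names are purely illustrative.

```python
# Figures quoted in the article.
IMAGES_PER_DAY = 34_000_000
TOTAL_IMAGES = 15_000_000_000

# Assumption: the daily rate is constant, which real usage is not.
images_per_year = IMAGES_PER_DAY * 365
days_to_total = TOTAL_IMAGES / IMAGES_PER_DAY

print(f"Images in a 365-day year: {images_per_year:,}")
print(f"Days to reach 15 billion: {days_to_total:.0f}")
```

At a flat rate, a 365-day year comes to roughly 12.4 billion images, and 15 billion corresponds to about 441 days of output, which suggests the 15 billion total covers a little more than one calendar year if the daily average held throughout.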
The real Cosmo, one of the reference images used. [Image: courtesy of the author]
With new AI text-to-image generators launching at such a rapid pace, it’s tough to keep track of what’s out there, and which produces the best results. We’re here to break down the best AI image-making tools for generating high-quality images from simple descriptions or keywords, or for creating accurate image prompts based on uploaded reference images. For ease of comparison, I used the same prompt to test each model’s ability to capture a photorealistic yet whimsical image of my cat Cosmo as the subject. To determine the best prompt description, I first uploaded a few reference images of my cat into Midjourney and, based on its suggested descriptions, I refined the prompt by adding some surreal image elements and a photography style: “a sleek short haired light beige and amber tabby cat wearing retro sunglasses on tropical vacation, highly realistic, 35mm film.”
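The structure of that test prompt can be sketched in a few lines of Python. The split into a subject, surreal elements, and style keywords is an illustrative assumption about how the prompt was assembled, and the variable names are hypothetical; none of the tools below require prompts to be built this way.

```python
# Hypothetical breakdown of the article's test prompt into three parts.
subject = "a sleek short haired light beige and amber tabby cat"
surreal_elements = "wearing retro sunglasses on tropical vacation"
style_keywords = ["highly realistic", "35mm film"]

# The subject and surreal elements form one phrase; style keywords are
# appended as a comma-separated list.
prompt = ", ".join([f"{subject} {surreal_elements}", *style_keywords])
print(prompt)
```

Keeping the parts separate makes it easy to swap the style keywords while holding the subject constant, which is the same idea as reusing one prompt across every model in this comparison.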
DALL-E 2 and DALL-E 3
[Image: courtesy of the author]
One of the first to launch in the wave of AI text-to-image generators, DALL-E 2 has been a go-to source for creating art from natural language descriptions. It generates images at four times the resolution of its predecessor, DALL-E 1, and comes with several new safety measures that prevent users from generating violent or explicit content, as well as photorealistic generations of real individuals’ faces, including those of public figures. It also allows creators to opt their art out of OpenAI’s training data; however, the tedious process of removing images one by one has left many artists frustrated with the new development. If you pay for ChatGPT Plus, you’ll get access to DALL-E 3 directly within the ChatGPT interface, meaning you don’t have to spend time crafting the right image prompt; you can just ask ChatGPT to do it for you. The convenience comes at a price, though, with the premium tier costing $20/month.
Pros
- Simple user interface, great for generating surreal images
Cons
- Not the best option on the list for generating faces or realistic imagery
- Runs on a freemium model that can get pricey
Midjourney
[Image: courtesy of the author]
Midjourney has become one of the best options for generating realistic images, faces, or anything for that matter. Unlike the other models on this list, Midjourney doesn’t have its own dedicated platform but rather operates as a bot within Discord. Users have access to a huge community of other creators within Discord and, by default, the art you generate will appear in one of the many public channels with everyone else’s creations. If you don’t like sorting through the crowded stream of art constantly being generated, you can add the Midjourney bot to a private server. Midjourney is currently running on version 5.2, which includes higher variation modes and new features like zoom-out/outpainting, which expands the frame of the image, and inpainting, which allows users to make changes to specific areas of an image without having to regenerate the entire image.
Pros
- Excellent realistic image quality
- Comprehensive documentation on Midjourney’s website
- Helpful Discord community for newbies
Cons
- Free version has been discontinued
- Generating images within the public Discord server can get chaotic
Adobe Firefly
[Image: courtesy of the author]
Adobe’s Firefly emerged from its beta phase in September and has started to distinguish itself from DALL-E and Midjourney for a number of reasons. The primary difference is that Firefly’s model has been trained using Adobe Stock Images and public domain material with expired copyrights, ensuring that the training data is obtained with the explicit permission of the creators. Now that it’s commercially available, Firefly has been incorporated into various Creative Cloud applications including Photoshop, Illustrator, and Adobe Express. Users can take advantage of the Generative Fill feature in Photoshop to add, remove, or expand content in images with simple text prompts. It also has a Text to Vector Graphic feature, which allows users to create editable vector graphics from text prompts. That could be a game changer for designers, especially as the model continues to improve.
Pros
- User interface is very intuitive
- Supports text-to-vector generation
Cons
- Limited customization options, lack of control
Stable Diffusion
[Image: courtesy of the author]
Stability AI developed Stable Diffusion, a widely embraced text-to-image generator available as an open-source tool. Since its launch, users have been free to download and use Stable Diffusion at no cost, though doing so usually requires some technical skill, both to navigate the UI and to install the required software (Python 3.8 or later) and the GitHub files needed to run it locally on your computer. Of all the models appearing on this list, Stable Diffusion gives users the most control and flexibility over the images they generate; however, it demands significant computational power. We suggest an Nvidia GPU with at least 8 to 10 GB of VRAM for optimal performance. A PC with 16 GB of RAM is also essential to prevent instability.
Pros
- Free to download and open source
- Most control and flexibility of the models on this list
Cons
- Steep learning curve, confusing user interface
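The requirements described for running Stable Diffusion locally can be sketched as a simple pre-install check. This is a minimal illustration, not part of Stable Diffusion itself: the `Machine` class and `unmet_requirements` function are hypothetical names, the thresholds come from the guidance above, and the 8 to 10 GB figure is read as GPU memory, using the lower bound of 8 GB (an assumption). The memory values in the example are placeholders you would replace with your own hardware specs.

```python
import sys
from dataclasses import dataclass

# Thresholds taken from the article's guidance.
MIN_PYTHON = (3, 8)
MIN_VRAM_GB = 8    # assumption: lower end of the suggested 8 to 10 GB
MIN_RAM_GB = 16


@dataclass
class Machine:
    """Hypothetical description of the computer being checked."""
    python_version: tuple
    gpu_vram_gb: float
    ram_gb: float


def unmet_requirements(machine: Machine) -> list:
    """Return a list of the suggested minimums the machine does not meet."""
    problems = []
    if tuple(machine.python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.8 or later is required")
    if machine.gpu_vram_gb < MIN_VRAM_GB:
        problems.append(f"GPU memory is below {MIN_VRAM_GB} GB")
    if machine.ram_gb < MIN_RAM_GB:
        problems.append(f"System RAM is below {MIN_RAM_GB} GB")
    return problems


# Only the Python version is read from the running interpreter;
# the memory figures are placeholder values.
example = Machine(
    python_version=tuple(sys.version_info[:2]),
    gpu_vram_gb=8,
    ram_gb=16,
)
print(unmet_requirements(example) or "Meets the suggested minimums")
```

An empty list means the machine meets all three suggested minimums; anything else names the shortfall before you spend time on the install.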
DreamStudio
For users interested in accessing Stable Diffusion without needing software installation, coding expertise, or a high-performance local GPU, Stability AI has also released an easy-to-use web interface as a paid alternative. Unlike some of the other models on this list that offer a monthly subscription for unlimited generations, the DreamStudio pricing model is pay-per-image, where users must purchase credits after the initial 25 free credits. DreamStudio offers all the features you’d come to expect from the more popular text-to-image generators, like inpainting and the ability to upload existing reference images. It also offers several different style presets and the option to work in layers, which gives your creative workflow a lot more flexibility.
Pros
- Simple user interface
- Option to work in layers
Cons
- Pay-per-image credit pricing model doesn’t offer monthly unlimited generations
Runway ML
[Image: courtesy of the author]
Runway ML offers a range of AI-powered tools in addition to text-to-image generation, including video editing and custom model training. It also features Frame Interpolation, which lets you turn a sequence of images into an animated video. The platform can be accessed in-browser on any desktop or mobile device, although it works best with Google Chrome. To begin, visit app.runwayml.com; no downloads are needed. Each text-to-image generation costs five credits. Credits can be purchased with a Standard or Pro plan at $0.01 per credit, with a minimum purchase of $10. Downloading images in higher resolutions also requires a Standard or Pro account.
Pros
- Great one-stop shop for text-to-image generation with video capabilities
- Accessible through a mobile app or desktop version
Cons
- Limited storage capacity and export options with the free version
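To put Runway's credit pricing in per-image terms, here is a minimal Python sketch using only the figures quoted above (five credits per image, $0.01 per credit, $10 minimum purchase). The function names are hypothetical, amounts are kept in whole cents to avoid floating-point rounding, and actual pricing may differ from what was quoted at the time of writing.

```python
# Pricing figures quoted in the article, in whole cents.
CREDITS_PER_IMAGE = 5
CENTS_PER_CREDIT = 1          # $0.01 per credit
MIN_PURCHASE_CENTS = 1_000    # $10 minimum purchase


def cost_per_image_cents() -> int:
    """Cost of one text-to-image generation, in cents."""
    return CREDITS_PER_IMAGE * CENTS_PER_CREDIT


def images_for_purchase(purchase_cents: int) -> int:
    """Number of whole images a credit purchase of the given size covers."""
    if purchase_cents < MIN_PURCHASE_CENTS:
        raise ValueError("purchase is below the $10 minimum")
    credits = purchase_cents // CENTS_PER_CREDIT
    return credits // CREDITS_PER_IMAGE


print(f"Cost per image: ${cost_per_image_cents() / 100:.2f}")
print(f"Images from the $10 minimum: {images_for_purchase(MIN_PURCHASE_CENTS)}")
```

Under these figures, each image works out to $0.05, and the $10 minimum buys 1,000 credits, or 200 images.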
Canva
[Image: courtesy of the author]
Canva’s AI image generator Magic Media is a good option for users who already pay for a monthly subscription to Canva, but overall we found the quality of Canva’s image generator was outmatched by some of the other models on this list even though it’s powered by Stable Diffusion. Canva does offer a Magic eraser and editor tool that can remove backgrounds or objects and enhance images through upscaling. Additionally, it provides access to DALL-E and Imagen by Google Cloud directly within the Canva workspace. Users can create up to 50 images with a free subscription or opt for Canva Pro, allowing 500 uses per user monthly.
Pros
- Good option if you’re already a Canva user
Cons
- Less control/fewer customization options than other models on this list
Bing Image Creator
[Image: courtesy of the author]
Microsoft unveiled Bing Image Creator back in March of this year, and it leverages the capabilities of DALL-E 3, the latest model from OpenAI. Previously, Microsoft utilized an earlier iteration of DALL-E for its image generator, but with the integration of DALL-E 3, there’s been an increase in image quality and more precise prompt interpretation. It’s also the only way to use DALL-E 3 for free, as accessing it through OpenAI’s interface requires a ChatGPT Plus subscription. Bing Image Creator is available through Bing.com and doesn’t require an OpenAI account. You can also access it directly through Bing Chat in the Microsoft Edge browser, which allows users to create and refine images conversationally by interacting with the chatbot instead of just inputting a basic prompt.
Pros
- Gives users a loophole to access DALL-E 3 for free
- Chatbot compatible through Bing Chat
Cons
- Requires a Microsoft account
- Not always great at photorealistic generations
Jasper
[Image: courtesy of the author]
Jasper might be more widely known for AI content writing and SEO generation, but it has a decent text-to-image generator as well. It can create high-resolution images without branded watermarks and has a simple user interface with a dropdown menu for selecting artistic style presets, moods, or mediums. The primary downside of Jasper is the price; the image generator is only accessible through a Pro plan, which is a hefty $69/month.
Pros
- Simple user interface
- No branded watermarks to remove
Cons
- Much more expensive than other models on this list
- Image quality not really worth the price
Google’s Imagen 2
[Image: courtesy of the author]
Google quietly debuted Imagen 2, the company’s AI image generator, last week, expanding access to Google Cloud customers utilizing Vertex AI. Updates to the model include enhanced text rendering, which many graphic designers will be keen to get their hands on, as typography has been a particularly challenging subject for AI to accurately render. Imagen 2 can also produce high-quality logo generations and supports multi-language prompts. It also comes with an experimental digital watermarking service, powered by Google DeepMind’s SynthID, which enables users to generate invisible watermarks and verify images generated by Imagen.
Pros
- Improved text rendering for typographic use
Cons
- Not yet available to the general public