Creating Incredible Epic Text with the Free Stable Diffusion AI: No Photoshop Needed

Hello everyone! In this tutorial, I’ll guide you through creating captivating cinematic text images for your projects using artificial intelligence. We’ll harness the power of the widely used, free Stable Diffusion depth-to-image generative model, and for a user-friendly interface we’ll rely on the Automatic1111 web UI, an open-source Python-based web application. No expensive software like Photoshop and no artistic skills are needed; just follow these steps, and you’ll be able to craft stunning text artwork.

The Tool We’re Using for Artwork Generation: Automatic1111 Web UI

I’ve launched my freshly installed Automatic1111 web UI. At the bottom of the page you can see the versions of Python, Torch, xformers, and Gradio that I’m using, along with the commit hash and the active checkpoint; this is my CMD window. If you’re unfamiliar with Automatic1111, it’s a community-developed interface for running various Stable Diffusion models. On my website, you’ll find comprehensive tutorials on Stable Diffusion, including how to install and use the Automatic1111 web UI; in this article and others, I cover a wide range of related topics.

To use the Stable Diffusion depth model, we first need to download the “512-depth-ema.ckpt” file. Simply click this link, and it will take you to the download page; click the download button, and the file will be saved. Once the download is complete, open your downloads folder, cut the model file, and return to your web UI installation. Navigate to the “models” directory, enter the “Stable Diffusion” folder, and paste the file there. Then return to your web UI, click the refresh button next to the checkpoint selector, and select the model. The web UI will download the remaining files it needs, which you will find here.

Great! You can see that the web UI has generated the necessary yaml file and started downloading the required MiDaS depth-estimation model. The MiDaS file can also be downloaded manually from its GitHub releases page; if the automatic download is taking too long, this is a viable alternative, since the speed depends on GitHub and your internet connection. I’ve downloaded the file manually, so I’ll place it inside the “models/midas” folder: open your installation directory, navigate to “midas,” and paste it there. Since the download is still running in the CMD window, I’ll close the window first, then paste the file. As you can see, it had downloaded only 11 megabytes, so the manual download was a good alternative. Next, restart the web UI. After the restart, click the refresh button to list the available models and select “512-depth-ema.ckpt.”
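Assuming a default installation, the layout after these steps should look roughly like this (the yaml filename matches the checkpoint; the MiDaS filename shown is the one fetched from the MiDaS releases page and is given here as an assumption):

```
stable-diffusion-webui/
└── models/
    ├── Stable-diffusion/
    │   ├── 512-depth-ema.ckpt
    │   └── 512-depth-ema.yaml        # generated by the web UI on first load
    └── midas/
        └── dpt_hybrid-midas-501f0c75.pt   # auto-downloaded, or placed manually
```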

Now, let’s install the “depth to image” extension. To do this, go to the extensions tab, then select “available” and search for “depth” using Ctrl+F. You’ll find “Depth Image IO”; click “install.” Once it’s installed, you’ll see a confirmation message indicating it’s already installed. Go to the “installed” tab and click “apply” and “restart UI.” Alternatively, you can close the CMD window and reopen it. After the web UI restarts, navigate to the “text-to-image” tab, and at the bottom, you’ll find the “script” tab. Click it, and you’ll see “custom depth images input output.”

There are two crucial points to note here. First, the depth image should be grayscale, with white representing the nearest objects and black indicating the farthest. Second, keep in mind that the image will be downscaled to one-eighth of the target image size. So, if you’re aiming for a 512 by 512-pixel output, your input image will be downscaled to 64 by 64 pixels. This means you should calculate your target size and adjust your input accordingly.
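The one-eighth rule above is easy to get wrong, so here is a minimal sketch of the arithmetic; the function name is mine, not part of the web UI:

```python
# Sketch: compute the size your input depth image will be downscaled to.
# The extension scales the depth input to 1/8 of the target image size,
# so a 512x512 target means a 64x64 effective depth image.

def depth_input_size(target_w: int, target_h: int) -> tuple[int, int]:
    """Return the (width, height) the depth input is downscaled to."""
    return target_w // 8, target_h // 8

print(depth_input_size(512, 512))    # (64, 64)
print(depth_input_size(1024, 1024))  # (128, 128)
```

Working backwards, if you want your text to survive the downscale, author it at exactly this size (or a clean multiple of it).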

Creating a Base Template for Text Image Files

Let me guide you through preparing a base template for your text image files. I’ll be using the free and open-source image editor Paint.NET, but you can use any tool of your choice; I picked Paint.NET because it’s free and accessible. We’ll start with a 64 by 64-pixel canvas and fill it with black. If you’re not familiar with Paint.NET, you can download it from its website (a simple Google search will find it) and install it.

So, we have our canvas: a black 64 by 64-pixel square, because our target image is 512 pixels and we want to maintain the best quality. Now, let’s add some text. For instance, I’ll type “test.” It’s essential to make the text white, since white represents the nearest depth. You can adjust the font size to your preference, and any font will work just fine. Let’s choose this one for our “test.” Now, save this image in the “pictures/64×64” directory. With this, we’re ready to proceed to our testing.
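If you prefer to script the template instead of drawing it by hand, a few lines of Python can do the same job. This is a sketch assuming the third-party Pillow library is installed (`pip install Pillow`); the text position and output filename are illustrative:

```python
# Sketch: generate the 64x64 black depth template with white text using Pillow.
from PIL import Image, ImageDraw

img = Image.new("L", (64, 64), color=0)   # "L" = 8-bit grayscale, black background
draw = ImageDraw.Draw(img)
draw.text((8, 24), "test", fill=255)      # white text = nearest objects in the depth map
img.save("test_depth.png")
```

Pillow’s default bitmap font has no antialiasing, which is actually helpful here: the text stays pure white against pure black, exactly what the depth input wants.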

Return to your Stable Diffusion web UI, and make sure to close any open windows. Upload your “test” image here. Okay, we’re all set. Now, let’s understand how this works: you need to define the style you want for your output. I’ll show you an example of the prompt I’ve used. If you’re new to this web UI and Stable Diffusion, take some time to read the tutorials provided; they’ll help you understand how it all functions and what it’s capable of.

Let’s take a look at the prompt I’ve used: “3D cinematic text with background lightning shading, dramatic effect, Adobe neon, high-quality glow, glass art, station, fantastic RPG, epic movie, sharp focus, seemingly lightning fire, cyber.” Essentially, you’re describing the kind of output you desire. This prompt is crucial because it directly influences the output you’ll receive. On the flip side, the negative prompt describes elements you want your final image to exclude. This is also vital for achieving high-quality results.
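If you later want to drive these same settings from a script, the web UI exposes an HTTP API when launched with the `--api` flag. Below is a sketch of a `txt2img` payload; the field names follow the `/sdapi/v1/txt2img` endpoint, the values are illustrative, and the depth extension’s own script parameters are not covered here:

```python
# Sketch: the prompt/negative-prompt settings expressed as a payload for
# Automatic1111's HTTP API (requires launching the web UI with --api).
import json

payload = {
    "prompt": ("3D cinematic text with background lightning shading, "
               "dramatic effect, high-quality glow, glass art, "
               "epic movie, sharp focus"),
    "negative_prompt": "blurry, low quality, watermark",  # elements to exclude
    "width": 512,
    "height": 512,
    "steps": 20,
    "cfg_scale": 7,
    "seed": -1,  # -1 = pick a random seed
}
print(json.dumps(payload, indent=2))
# POST this JSON to http://127.0.0.1:7860/sdapi/v1/txt2img
```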

Now, select your custom depth-to-image option, which should be available. Here, just click “generate.” But before proceeding, ensure the resolution is correctly set. My apologies for missing that earlier. Let’s generate the image again.

The generation process starts from noise, and it progresses into an image. This doesn’t take much time. Here’s the output of our generated image. Let’s try another one. The quality isn’t very high on this one. It’s decent but can be improved. For optimal results, you’ll need to generate several images. Now, I’ll generate eight images simultaneously. To do this, I’ll change the batch size to eight. Please note that for this to work efficiently, you’ll need a decent GPU; otherwise, you might encounter errors. Let’s see the results.

We now have eight different outputs in a single run, and you can review them like this. By the way, I believe the font we initially used might not yield the best results, so let’s try another one. This is the depth mask it’s displaying. I’m using the DPM++ SDE Karras sampling method, but “Euler a” also works well. Now, I’ve typed the “test” text in the Algerian font at a font size of 20 and saved it as “two.” Load this new image here and click “generate.” With this font, we get different outputs, and I find it works better. When working with AI, remember that you may need to generate hundreds of images and select the one that suits your needs best. Here are a few more results. As for timing, generating eight different outputs like this typically takes about 20 seconds.

Testing Different Prompts and Prompt Engineering

Let’s explore how you can test the effects of various prompts, a technique often referred to as “prompt engineering.” There are also prompt-sharing sites, such as da prompts, where you can search for the best AI prompts for your project. Imagine you want to assess how different keywords or prompts influence your results. To do this, you’ll need to keep the same seed, which you can find at the bottom of the interface, associated with a particular image.

For instance, take this image with its seed value. Now, set this seed and use it as your input. Set the batch size and batch count to one and generate the image again to see if you can recreate it. Yes, we’ve successfully reproduced the image. Now, let’s see what happens when we remove certain words. Generate the image again. Now, the result is different as we’ve removed some words. You can observe how removing words impacts the output, even with the same seed.
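The reason a fixed seed reproduces the image is that diffusion sampling starts from random noise, and the same seed generates the same starting noise. Here is a minimal sketch of that idea using the standard library’s RNG rather than the actual torch-based sampler:

```python
# Sketch: same seed -> same starting noise -> (with identical settings)
# the same output image. Illustrated with Python's stdlib RNG.
import random

def starting_noise(seed: int, n: int = 4) -> list[float]:
    """Deterministic pseudo-noise for a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

a = starting_noise(1234)
b = starting_noise(1234)
c = starting_noise(5678)
print(a == b)  # True: identical seed reproduces the noise exactly
print(a == c)  # False: a different seed gives different noise
```

This is why, when comparing prompts, you must lock the seed and change only one thing at a time; otherwise you can’t tell whether the prompt or the noise caused the difference.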

Let’s remove more words and see the results. Here’s the outcome with additional words removed. Removing the negative prompts will also simplify the result, as shown here. Now, let’s restore all the prompts and generate the image again to return to the original output.

You can increase the batch count to generate multiple batches in one run. For example, if you set the batch count to four and the batch size to four, you’ll get 16 images in total: four batches of four images each. Here are the results. Keep in mind that to make this process efficient, you’ll need a reasonably powerful GPU. You can check your GPU usage in the task manager.

Now, let’s discuss the impact of resolution. If you input a higher resolution image, it will first get downscaled and then upscaled. This can affect the output quality. Let’s observe the effect of resolution. I’ve generated an image at 512 by 512 pixels with the same font and saved it. I’ve also copied and pasted all the settings to ensure a fair comparison. I’m using the same seed, batch count, batch size, and CFG value.

You can use any drawing software for this process, even the basic Paint tool. Let’s save this text image as “four.” Now, load it into the interface and click “generate.” This might take some time and might even result in an “out of memory” error if the batch size is too high. You can see the GPU usage is near full capacity. If you encounter an “out of memory” error, you can run Automatic1111 with command-line flags like “--medvram” or “--lowvram” to manage VRAM usage.
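For reference, these are real Automatic1111 launch flags; on Windows they go into the `COMMANDLINE_ARGS` line of `webui-user.bat`, and on Linux/macOS they can be passed directly (pick one flag, not both):

```shell
# webui-user.bat (Windows):
#   set COMMANDLINE_ARGS=--medvram
#
# Or pass directly on Linux/macOS:
./webui.sh --medvram   # splits the model between VRAM and RAM; moderate slowdown
./webui.sh --lowvram   # more aggressive offloading; larger slowdown, lowest VRAM use
```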

I didn’t receive an “out of memory” error this time, but it was taking a long time, so I canceled the operation.

Now, let’s examine the output at a higher target resolution, in this case 1024 pixels. You can observe that the stylizing isn’t as strong as before. Going beyond roughly 1K can reduce quality, which appears to be related to the resolution the model generates at. I recommend sticking to resolutions around 1024 pixels or below for optimal results.

Here’s a comparison between outputs at 1024 and 512 pixels, whose depth inputs were 128 and 64 pixels, respectively. The stylizing is more pronounced in the lower-resolution image. However, if you need a higher resolution, it’s easy to achieve. Click “send to extras” at the bottom, then navigate to the “extras” tab, where you’ll find resizing options. Let’s say you need a 2K resolution, which is 2048 pixels: select “resize to 2,” and choose an upscaler; “R-ESRGAN 4x+” works well. Click “generate,” and the web UI will download the necessary upscaling files, which you only need to download once. After that, you can upscale any image immediately.
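The resize factor in the Extras tab depends on the resolution you generated at, so it is worth doing the division explicitly; this small helper is mine, not part of the web UI:

```python
# Sketch: working out the "Resize" factor for the Extras-tab upscaler.
# The factor is simply target resolution divided by source resolution.

def resize_factor(src_px: int, target_px: int) -> float:
    return target_px / src_px

print(resize_factor(1024, 2048))  # 2.0 -> a 1024px image needs "resize to 2" for 2K
print(resize_factor(512, 2048))   # 4.0 -> a 512px image needs a 4x pass instead
```

This also shows why “R-ESRGAN 4x+” is a convenient default: a single pass covers anything up to a 4x jump.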

Now, let’s compare the original resolution image with the upscaled one. The original resolution is 64 pixels, while the upscaled one is 128 pixels. The upscaling quality is impressive. You can experiment with different upscalers to find your preferred one.

So prompt selection has a significant impact on the output quality, and the font you choose also matters. I’ve demonstrated some prompts that yielded amazing results. Feel free to experiment with different prompts to achieve varying styles and outputs. These outputs are fantastic and can be used for various purposes, such as article thumbnails.

The prompts I used for these images were structured like this: “3D stunning water effect, epic glossy text, (solid background:1.5).” You may be unfamiliar with this syntax, which emphasizes certain words: by wrapping a phrase in parentheses and adding a colon followed by a number, you set its emphasis weight, such as 1.5 or 1.1. You can learn more about this syntax in the Automatic1111 wiki. Take a moment to pause and read it for a better understanding.
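As a rough guide to the arithmetic behind the emphasis syntax (weights per the Automatic1111 wiki), here is an illustrative helper; it only demonstrates the numbers and is not the web UI’s actual prompt parser:

```python
# Sketch: approximate weights produced by Automatic1111's emphasis syntax.
#   (word)      -> attention x1.1 per nesting level, so ((word)) -> x1.21
#   (word:1.5)  -> explicit weight x1.5
#   [word]      -> attention divided by 1.1
from typing import Optional

def emphasis_weight(parens: int = 0, explicit: Optional[float] = None) -> float:
    """Return the attention multiplier for a phrase."""
    if explicit is not None:
        return explicit              # (word:1.5) overrides nesting
    return round(1.1 ** parens, 3)   # each () layer multiplies by 1.1

print(emphasis_weight(parens=1))      # 1.1
print(emphasis_weight(parens=2))      # 1.21
print(emphasis_weight(explicit=1.5))  # 1.5
```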