AI Upscaling
By Zarxrax
Updated 1/20/2023
This is a (basic) tutorial on how to upscale an animated video using AI models (ESRGAN).
You will need an Nvidia GPU for best results.
Overview
Upscaling is done by using an AI “model” to process your video frames. This process is called inference. ESRGAN is the name of the upscaling framework that most models are designed for, though there are others as well. Each ESRGAN model has been trained to modify an image in a particular way. Each model will produce different output, depending on how it was trained. There are a lot of different models to choose from. Hundreds, even! Some models don’t even do upscaling–they might do other tasks like remove certain types of artifacts from the image. I will recommend a few models to start with, so you don’t have to try them all.
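To make the idea concrete, a model can be pictured as nothing more than a function that takes an image in and gives an image back. Here is a toy sketch in Python (the pixel lists, function names, and "rules" below are made up purely for illustration; a real ESRGAN model applies learned weights instead of simple rules):

```python
# Toy illustration: an ESRGAN-style "model" is just image -> image.
# Frames here are 2-D lists of pixel values.

def toy_2x_model(frame):
    """Stand-in for a 2x upscaling model: nearest-neighbour doubling."""
    out = []
    for row in frame:
        wide = [px for px in row for _ in (0, 1)]  # double the width
        out.append(wide)
        out.append(list(wide))                     # double the height
    return out

def toy_1x_model(frame):
    """Stand-in for a 1x "cleanup" model: same output size, different
    task (here it just clamps bright speckles, as a fake denoiser)."""
    return [[min(px, 200) for px in row] for row in frame]

frame = [[0, 255], [128, 64]]
big = toy_2x_model(frame)    # 2x2 in, 4x4 out
clean = toy_1x_model(frame)  # 2x2 in, 2x2 out
```

A 2x model always doubles the resolution, while a 1x model keeps the size and only changes the content, which is why 1x models are used for cleanup tasks rather than upscaling.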
Along with models, you will need a piece of software that will actually let you USE the model. In this guide, we will use a tool called chaiNNer. It is relatively easy to set up and use. (You can alternatively use VSGAN for Vapoursynth. This is actually quite a bit faster than chaiNNer, but it is somewhat difficult to set up and use. For that reason, I will not be covering it in this guide. For most people, I recommend sticking with chaiNNer unless you are comfortable working with Python.)
How to Set up and use chaiNNer
Download and install the latest version of chaiNNer: https://github.com/chaiNNer-org/chaiNNer
There is pretty good documentation on the chaiNNer site, so I recommend spending a few minutes reading over it.
After installing and launching the program for the first time, you will need to let it download the dependencies. This may take a while as it needs to download several gigabytes of data. If it doesn’t happen automatically, you can access the dependency manager in the upper right corner of the application window.
You can also find some tutorials for chaiNNer on YouTube, such as this one: ChaiNNer – Upscale a single image & a directory
ChaiNNer is a node-based image processing toolkit. You basically create a flowchart telling it what to do: load an image, load a model, upscale the image using the model, save the result as a new image. While chaiNNer (and ESRGAN models, for that matter) processes a single image at a time, remember that video is just a sequence of images. ChaiNNer even has a “Video Frame Iterator” which lets you load a video file directly, then do processing on each individual frame.
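The per-frame loop that the Video Frame Iterator performs can be pictured like this (a conceptual Python sketch only, not chaiNNer's actual internals; the tiny list-based "video" and doubling "model" are stand-ins):

```python
def iterate_video(frames, model):
    """Conceptual Video Frame Iterator: handle one frame at a time
    (load frame -> upscale with model -> write output frame)."""
    for frame in frames:
        yield model(frame)

def double(frame):
    """Fake 2x model: duplicate every pixel horizontally and vertically."""
    return [[px for px in row for _ in (0, 1)]
            for row in frame for _ in (0, 1)]

video = [[[1]], [[2]], [[3]]]  # a fake 3-frame, 1x1-pixel "video"
upscaled = list(iterate_video(video, double))
# upscaled[0] == [[1, 1], [1, 1]]
```

Processing one frame at a time like this is what keeps memory use manageable even for long videos.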
So to process a video file using a model, you would basically create a chain that looks like this:
A basic explanation of how to build this graph: On the left side panel you will see all of the nodes that you can bring in. First, select the “Video Frame Iterator” and drag it into the workspace. Inside the Video Frame Iterator there will already be two additional nodes: “Load Frame as Image” and “Write Output Frame”. At the top of the Video Frame Iterator is an area where you can select your input video file. You also want to find the orange “Upscale Image” node and drag it inside the Video Frame Iterator.
Next, select the orange “Load Model” node and place it outside of the Video Frame Iterator. The Load Model node sits outside the iterator because the model only needs to be loaded once; it doesn’t need to be reloaded for every frame. Finally, just hook up the nodes as depicted in the image. On the “Write Output Frame” node you can optionally specify a different directory and filename for your output file, and you can also set the encoding settings here. I recommend outputting an MP4 at the best quality (quality 0) to create a lossless file which you can then do further processing or encoding on.
Selecting a Model
Processing a video may be somewhat slow, depending on your graphics card and the model selected. Before processing an entire video, you should first export a few individual frames from your video and test different models on them in order to find a model (or models) that make your video look good.
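That testing workflow amounts to running every candidate model over the same sample frame and comparing the results. A sketch of the idea (the candidate names and stand-in "models" below are hypothetical; in chaiNNer you would instead save each result as an image and compare them visually):

```python
def compare_models(test_frame, models):
    """Apply each candidate model to the same test frame so the
    outputs can be compared side by side."""
    return {name: model(test_frame) for name, model in models.items()}

# Hypothetical stand-ins for real ESRGAN models.
candidates = {
    "sharpen-ish": lambda f: [[min(255, px * 2) for px in row] for row in f],
    "soften-ish":  lambda f: [[px // 2 for px in row] for row in f],
}

results = compare_models([[100, 200]], candidates)
# results["sharpen-ish"] == [[200, 255]]
```

Testing on a handful of frames this way costs seconds per model, versus hours if you discover a bad choice only after processing the whole video.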
It is worth noting that you may see some models described as “lite” or “compact”. These models can be much faster because the model itself is smaller (the trade-off being that a smaller model can’t “learn” as much). Animation does not really need a full-sized model, so I highly recommend using lite or compact models when possible. In general, the smaller the file size of a model, the faster it will run. A full-sized ESRGAN model is around 64MB, while compact models may come in at just a couple of MB.
You will also want to note that each model is trained to scale the image by a specific factor–typically 2x or 4x. In order to get your video to a specific resolution, you will typically upscale it and then do a standard resize to get it to the exact size you want it to be. 1x models are usually designed to fix a particular problem and then be chained together with another model to do the actual upscaling.
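The two-step approach (fixed model upscale, then a standard resize to the exact target) takes only a little arithmetic to plan. A small helper with assumed example numbers, just to show the math:

```python
def plan_resize(src_w, src_h, model_scale, target_w, target_h):
    """Return the intermediate size after the model's fixed upscale,
    plus the factors a standard (non-AI) resize must apply afterwards."""
    mid_w, mid_h = src_w * model_scale, src_h * model_scale
    return (mid_w, mid_h), (target_w / mid_w, target_h / mid_h)

# A 640x480 source through a 2x model, aiming for 1440x1080:
mid, factors = plan_resize(640, 480, 2, 1440, 1080)
# mid == (1280, 960); the standard resize then scales by 1.125 on each axis
```

Here the 2x model gets 640x480 to 1280x960, and an ordinary resize by 1.125 finishes the job at exactly 1440x1080.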
With that out of the way, here are a few models that I recommend you try out, in no particular order. Just search for the model name in the Model Database:
https://upscale.wiki/wiki/Model_Database
Recommended Models
Model Name: Futsuu_Anime
Description: Upscales while doing some sharpening and line darkening. Can also clean up some minor artifacts of various types. Pretty good general purpose model.
Sample: https://imgsli.com/MTQ4MDM2
Model Name: 2x_AnimeClassics_UltraLite_510K
Description: Handles rainbows, dot crawl, compression artifacts, halos, and blurriness. Best when used on old anime that is grainy.
Sample: https://imgsli.com/OTg3MjU
Model Name: 2x / 4x-anifilm_compact
Description: Retains some grain on the upscaled image.
Sample: https://imgsli.com/MTE5MjU0
Model Name: LD-Anime_Compact
Description: Upscales while fixing numerous video problems including: noise/grain, compression artifacts, rainbows, dot crawl, halos and color bleed. Can oversmooth some textures though.
Sample: https://imgsli.com/MTQyMzM3/0/1
Model Name: 2x_DigitoonLite_216k
Description: Meant as a versatile model for upscaling high detail digital anime and cartoons. Has debanding, MPEG-2 correction, and halo reduction.
Sample: https://imgsli.com/MTE1Mzg4
Model Name: 1x_HurrDeblur_SuperUltraCompact
Description: Very fast 1x deblurring/sharpening model.
Sample: https://imgsli.com/MTEwOTcx
Model Name: 1x_AnimeUndeint_Compact
Description: Corrects jagged lines on animation that has been deinterlaced.
Sample: https://imgsli.com/MTExMTE1
Model Name: 1x_BleedOut_Compact
Description: Helps repair color bleed and heavy chroma noise that may be present on some older footage.
Sample: https://imgsli.com/MTE4MjEz
Model Name: 1x_Dotzilla_Compact
Description: Wipes out dot crawl and rainbows in animation.
Sample: https://imgsli.com/MTM4ODkz
Of course, if none of these seem to work well for your source, feel free to try out any other models in the model database to see if there are any that work better for you.