Megalith
Google has published a paper titled “Pixel Recursive Super Resolution” that demonstrates how it is now possible to turn a tiny, pixelated mess into a more detailed, usable image. Now you can watch your favorite make-believe investigator hit an “enhance” button and pull up a suspect's face without laughing. But in light of all these fancy advancements in neural network technology, how is it that we are still stuck with bicubic resizing for enlarging photos? I feel like Adobe would have introduced something superior by now; they did manage to create pure magic like content-aware fill, after all. There is waifu2x, I suppose…
…it's impossible to create more detail than there is in the source image—so how does Google Brain do it? With a clever combination of two neural networks. The first part, the conditioning network, tries to map the 8×8 source image against other high-resolution images. It downsizes other high-res images to 8×8 and tries to make a match. The second part, the prior network, uses an implementation of PixelCNN to try to add realistic high-resolution details to the 8×8 source image. Basically, the prior network ingests a large number of high-res real images—of celebrities and bedrooms in this case. Then, when the source image is upscaled, it tries to add new pixels that match what it "knows" about that class of image. For example, if there's a brown pixel towards the top of the image, the prior network might identify that as an eyebrow: so, when the image is scaled up, it might fill in the gaps with an eyebrow-shaped collection of brown pixels.
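For the curious, here's a minimal sketch of how those two networks might fit together, written in PyTorch. To be clear, this is my own toy reconstruction, not the paper's code: the grayscale simplification, the layer sizes, and names like `PixelRecursiveSR` and `sample` are all mine, and the real model is far bigger (and works on RGB). It just illustrates the idea of summing logits from a conditioning network and a PixelCNN-style prior, then sampling pixel by pixel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Convolution with an autoregressive mask, PixelCNN-style.
    Mask 'A' hides the center pixel (first layer only);
    mask 'B' allows it (all later layers)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, h, w = self.weight.shape
        # Zero out weights at/after the center in raster order.
        self.mask[:, :, h // 2, w // 2 + (mask_type == 'B'):] = 0
        self.mask[:, :, h // 2 + 1:] = 0

    def forward(self, x):
        self.weight.data *= self.mask
        return super().forward(x)

class PixelRecursiveSR(nn.Module):
    """Toy two-network model: a conditioning CNN upsamples the 8x8
    input to per-pixel logits at 32x32, a small PixelCNN prior adds
    autoregressive logits over the high-res image being generated."""
    def __init__(self, channels=64, levels=256):
        super().__init__()
        self.levels = levels  # 256 intensity levels per grayscale pixel
        self.cond = nn.Sequential(
            nn.ConvTranspose2d(1, channels, 4, stride=2, padding=1),        # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.Conv2d(channels, levels, 1),  # per-pixel logits
        )
        self.prior = nn.Sequential(
            MaskedConv2d('A', 1, channels, 7, padding=3),
            nn.ReLU(),
            MaskedConv2d('B', channels, channels, 7, padding=3),
            nn.ReLU(),
            nn.Conv2d(channels, levels, 1),
        )

    def forward(self, lowres, highres):
        # Summing logits multiplies the two distributions (up to normalization).
        return self.cond(lowres) + self.prior(highres)

@torch.no_grad()
def sample(model, lowres, size=32):
    """Generate the high-res image one pixel at a time, in raster order."""
    img = torch.zeros(lowres.size(0), 1, size, size)
    for y in range(size):
        for x in range(size):
            logits = model(lowres, img)                  # (B, levels, H, W)
            probs = F.softmax(logits[:, :, y, x], dim=1)  # (B, levels)
            pix = torch.multinomial(probs, 1).float() / (model.levels - 1)
            img[:, :, y, x] = pix                         # value in [0, 1]
    return img

# Hypothetical usage: upscale a batch of 8x8 grayscale crops (untrained here).
model = PixelRecursiveSR()
lowres = torch.rand(2, 1, 8, 8)   # stand-in for real 8x8 inputs
out = sample(model, lowres)       # (2, 1, 32, 32)
```

The interesting bit is `forward()`: adding the two logit tensors means the conditioning network's "what matches the 8×8 input" distribution gets multiplied by the prior's "what real faces/bedrooms look like" distribution before sampling, which is how an eyebrow-shaped patch can win out over a generic brown smear.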