H.266/VVC Standard Finalized With ~50% Lower Size Compared To H.265

erek

[H]F Junkie
Joined
Dec 19, 2005
Messages
10,894
This is darn impressive. Wonder if it will take off? Hardware acceleration support and major application support?

"Fraunhofer HHI today with its partners announced the official H.266/VVC standard. The aim is with its improved compression to offer around 50% lower data requirements relative to H.265 while offering the same quality. H.266 should work out much better for 4K and 8K content than H.264 or H.265.

Fraunhofer won't be releasing H.266 encoding/decoding software until this autumn. It will be interesting to see meanwhile what open-source solutions materialize. Similarly, how H.266 ultimately stacks up against the royalty-free AV1.

More details on H.266 via Fraunhofer.de."


https://www.phoronix.com/scan.php?page=news_item&px=H.266-VVC-July-2020
 
UnknownSouljer
 
The big issue with all compression, of course, is how many resources it takes to encode and decode.
Apple has the upper hand here, as they can just add a block to their ARM chips that encodes and decodes this in hardware. Intel has also done this in the past with h.265 encoding/decoding, but it took them an awfully long time to add it to their complex processors.

Getting it onto a camera (even a consumer one that is updated more regularly) is another matter entirely. It will likely take 4+ years for someone to bother making a camera processor with this capability. Even then it might be too soon, as there might not be enough hardware decoding penetration to make it viable.
 
Well, sadly it is more than just adding on some hardware logic. First you have to develop a good encoder, and that is one of the things that can take time. Remember, the standard being done just means they have laid down how the data is stored; it doesn't mean there is any code that does a really good job generating those files. That research takes more time.

Then the shitty one is licensing. These standards tend to have lots of patents, so it can take companies time to navigate that minefield and decide to license it in their devices. The last thing you want to do is implement it, think you are covered, only to discover other patents outside of the main MPEG patent pool that cover it and get sued by some assholes for every device you've sold.
 
This is going to have a much slower progression than 265 unless they change its streaming/protected-content verification costs to be competitive with VP9/AV1.

By comparison, h.265 decode support shipped in the same year the codec was announced (the Galaxy S4 had 1080p h.265 decode acceleration), all because they had no cheaper competitor.

I don't expect the decoders to be out nearly as quickly as h.265's; YouTube has already transitioned all their higher-resolution streams to VP9, and is slowly moving to AV1.

Netflix is currently transitioning to AV1, and will probably ditch h.265 for that (VP9 is not their exclusive high-res video option, unlike YouTube) - it's going to take another massive improvement over AV1 to get them to go in another, higher-cost direction.
 
Last edited:
Hardware video decode would be a great application for an FPGA.

New codec comes out? Flash the FPGA via the driver install.

Done.

Instead we all have to keep buying new hardware whenever there is a new standard.
 
FPGAs have a higher per-unit cost and power usage, which make them unattractive for most consumer electronics.
 
FPGAs have a higher per-unit cost and power usage, which make them unattractive for most consumer electronics.

The unit cost I can understand, but the power use seems like it wouldn't be hugely relevant for the video decoder. It's not like it's a heavy load consuming lots of power to begin with.

I'd happily pay a little extra for an upgradeable hardware video decoder. Maybe I am unusual though.
 
FPGAs have a higher per-unit cost and power usage, which make them unattractive for most consumer electronics.

Also, the comparison should probably be made to brute-forcing it on the CPU, not to the GPU/ASIC implementation, in which case I think the FPGA would probably be favorable power-wise.

Because if you ship a product and new standards come out after it is launched (which is always going to happen), chances are it will spend most of its life having to brute-force newer video codecs.
 
Certainly possible. My codec knowledge is wildly out of date. I'd sure hope modern algorithms were designed with GPU offload in mind, which would add another dimension of tradeoffs.
 
Also, the comparison should probably be made to brute-forcing it on the CPU, not to the GPU/ASIC implementation, in which case I think the FPGA would probably be favorable power-wise.

Because if you ship a product and new standards come out after it is launched (which is always going to happen), chances are it will spend most of its life having to brute-force newer video codecs.



Sorry dude, you're dreaming - the added cost of installing an FPGA with enough free program space to handle even a single video codec upgrade is absolutely beyond reason.

Also, in addition to higher costs, the FPGA implementation is going to be less efficient than fixed hardware.

There's a reason why FPGAs only get usage for low-production corner-cases, ASIC development systems, or ACTUAL CONFIGURABLE NEEDS LIKE SOFTWARE-DEFINED RADIO.
 
Last edited:
Eh - everyone grumbled about 264 hardware support in GPUs when it came out, then we all upgraded and everyone quickly forgot. Same thing happened with 265.

Same thing will happen with 266. By the time it is out in the wild and hardware support is something you need, it will already be there.

The only time I can think of this ever being a problem was back in the DVD days, when we were legitimately buying MPEG-2 decoder cards for our systems.
 
Sorry dude, you're dreaming - the added cost of installing an FPGA with enough free program space to handle even a single video codec upgrade is absolutely beyond reason.

Also, in addition to higher costs, the FPGA implementation is going to be less efficient than fixed hardware.

There's a reason why FPGAs only get usage for low-production corner-cases, ASIC development systems, or ACTUAL CONFIGURABLE NEEDS LIKE SOFTWARE-DEFINED RADIO.


The problem with the comparison to ASICs is that it is only a valid comparison until a new codec comes out.

After that you'd have to compare it to brute-force decoding on the CPU.

What I don't understand is why more of this stuff isn't done using compute on the GPU cores. It wouldn't be as good as an ASIC, but it ought to be a hell of a lot more efficient and effective than doing it on the CPU, and supporting new codecs would be a software update away.

By all means, include ASIC-based support for existing standards, but upgrade older hardware using compute-based solutions.
 
Last edited:
What I don't understand is why more of this stuff isn't done using compute on the GPU cores.

https://multimedia.cx/eggs/developing-a-shader-based-video-codec/

The reason is that the GPU is good at doing the same operation over and over, while a video codec is a constantly changing thing. And every branch you take on a GPU saps performance.

Is this a keyframe?
If not, are you using a b-frame or a p-frame?
Then, once you get into the frame: how large is the matrix on this particular section? How many matrices make up this frame?


That's not to say you can't do it, but the last time Nvidia did this was the GTX 980's hybrid h.265 decoder. I doubt they will be able to make it run on slower cards (like the GTX 960), so they implemented it in hardware instead.
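
Here's a rough Python toy of the kind of per-block branching I mean (the block types and the math are made up, not from any real codec) - adjacent blocks take different paths depending on data parsed from the bitstream, which is exactly what makes a warp of GPU threads serialize:

Code:
import numpy as np

# Toy decode loop (not a real codec): each block's path depends on values
# parsed from the bitstream, so neighbouring blocks branch differently.
def decode_block(block_type, coeffs, prev_block):
    if block_type == 0:                       # "intra"-style block: predict from a flat value
        prediction = np.full_like(coeffs, 128.0)
    elif block_type == 1:                     # "inter"-style block: predict from the previous block
        prediction = prev_block
    else:                                     # "skip"-style block: copy the previous block outright
        return prev_block.copy()
    # Stand-in for the inverse transform (real codecs use an integer DCT-like transform)
    residual = np.real(np.fft.ifft2(coeffs))
    return np.clip(prediction + residual, 0, 255)

# Block types change block to block, so there's no single uniform kernel to launch.
rng = np.random.default_rng(0)
prev = np.zeros((8, 8))
for block_type in rng.integers(0, 3, size=16):
    prev = decode_block(block_type, rng.normal(size=(8, 8)), prev)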
 
Last edited:
I'd imagine that having a GPU be able to do real-time video transcoding is going to get more and more important as streaming becomes more and more prevalent. I know that I, personally, would pay a premium ($100 or so?) for my GPU to do real-time .266 encoding over the exact same GPU that couldn't do that.

Of course, I'm coming at this from the perspective of someone who currently has 40TB of mixed-codec video files on his computer and would LOVE to be able to lower the size of those files, especially because up until my recent upgrade to an R9 3900X, my R7 1700 was unable to do real-time H.265 4K video transcoding.
 
https://multimedia.cx/eggs/developing-a-shader-based-video-codec/

The reason is that the GPU is good at doing the same operation over and over, while a video codec is a constantly changing thing. And every branch you take on a GPU saps performance.

Is this a keyframe?
If not, are you using a b-frame or a p-frame?
Then, once you get into the frame: how large is the matrix on this particular section? How many matrices make up this frame?


That's not to say you can't do it, but the last time Nvidia did this was the GTX 980's hybrid h.265 decoder. I doubt they will be able to make it run on slower cards (like the GTX 960), so they implemented it in hardware instead.

Thanks for explaining that. I didn't realize decodes were highly branched. I assumed that since encodes were highly parallelized, decodes would be as well.
 
Last edited:
This part I know. I just didn't expect the decode workload to be a branched one. I expected it to be highly parallelized, just like the encode.


Well, actually, two-pass video encode does the analysis first (single-threaded) and then the optimized multi-threaded encoder runs. You also don't care how slowly the process runs!

You can't use this trick during decode, because it has to run in real time. So you're stuck with whatever you can tell from the current frame and the previous one (the delta)... unless you want to wait over an hour before you view your video (and build a different cheat sheet for GPU decode)?

When you encoded, you built your cheat sheet first, then used that to optimize the core usage! It's much easier to parallelize decode on your CPU than on a GPU.
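
Rough Python sketch of that two-pass split (the complexity score and chunking are made up for illustration, this is not a real encoder): pass 1 walks the frames once and builds the cheat sheet, pass 2 uses it to cut the video and hand each chunk to a separate worker. Decode can't do this, because the cheat sheet would have to be built before playback even starts.

Code:
from multiprocessing import Pool
import numpy as np

def first_pass(frames):
    # Sequential analysis pass: score each frame by how much it differs from the previous one.
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        scores.append(float(np.abs(cur - prev).mean()))
    return scores

def encode_chunk(chunk):
    # Stand-in for the expensive per-chunk encode work.
    return [frame.mean() for frame in chunk]

def second_pass(frames, scores, workers=4):
    # Cut at the highest-scoring frames (a rough stand-in for scene changes),
    # then encode the chunks in parallel.
    cuts = sorted(int(i) for i in np.argsort(scores)[-(workers - 1):])
    chunks, start = [], 0
    for cut in cuts + [len(frames)]:
        chunks.append(frames[start:cut])
        start = cut
    with Pool(workers) as pool:
        return pool.map(encode_chunk, chunks)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = [rng.normal(size=(64, 64)) for _ in range(32)]
    stats = first_pass(video)            # single-threaded cheat-sheet pass
    encoded = second_pass(video, stats)  # parallel pass that uses the cheat sheet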
 
Last edited:
What I don't understand is why more of this stuff isn't done using compute on the GPU cores. It wouldn't be as good as an ASIC, but it ought to be a hell of a lot more efficient and effective than doing it on the CPU, and supporting new codecs would be a software update away.
GPU encoding isn't always the best option.

Some compute problems can't be solved efficiently through general-purpose GPU computing. For example, if there's a lot of branching, this effectively kills any performance advantage due to branch stalling. Consider a thousand threads waiting for a single thread to finish because they require that one thread's result to continue. You can do some tricks to avoid branching in the first place, but some problems can't avoid branching, so that's that.

Also, some compute problems are I/O bound. If the result of a block of work has to be shared with another compute unit across a boundary (i.e. through global memory), then you take a huge performance penalty by dipping into global memory. This is called the locality problem. It's generally not a problem today given the larger caches and larger amounts of local/shared memory on a modern GPU, but locality was absolutely an issue 8 years ago.

You pretty much have to approach encoding via GPU completely differently. Most of the time you end up with a completely different encoder altogether.

GPU decoding, however, should be no problem. Barring some DRM constraint, I can't see any reason why GPU decoding can't be done with the newer formats. GPUs were totally made to handle decoding efficiently.

Well, actually, two-pass video encode does the analysis first (single-threaded) and then the optimized multi-threaded encoder runs. You also don't care how slowly the process runs!

You can't use this trick during decode, because it has to run in real time. So you're stuck with whatever you can tell from the current frame and the previous one (the delta)... unless you want to wait over an hour before you view your video (and build a different cheat sheet for GPU decode)?

When you encoded, you built your cheat sheet first, then used that to optimize the core usage! It's much easier to parallelize decode on your CPU than on a GPU.

Decoding won't contain much branching, as all the required branching was done during the encoding phase. There's no analysis phase during decoding, since the final result has already been obtained. Decoding is just the inverse transform of the final stage of the encoding phase.

Decoding is more I/O bound than anything.

GPUs have so much compute power today that real-time decoding of an 8K stream wouldn't have even the lowest of GPUs break a sweat.
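
Quick NumPy toy of that last point (a stand-in 8x8 transform, not any real codec's): the per-block inverse transform is the same branch-free operation applied to every block, so it can be batched over thousands of blocks at once - exactly the shape of work GPUs are built for.

Code:
import numpy as np

N = 8  # toy 8x8 blocks

def dct_matrix(n):
    # Orthonormal DCT-II basis; its transpose gives the inverse transform.
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0, :] *= 1 / np.sqrt(2)
    return basis * np.sqrt(2 / n)

C = dct_matrix(N)

def inverse_transform_all(coeff_blocks):
    # coeff_blocks: (num_blocks, N, N) -> apply C^T . X . C to every block in one shot.
    return np.einsum("ij,bjk,kl->bil", C.T, coeff_blocks, C)

rng = np.random.default_rng(0)
blocks = rng.normal(size=(4096, N, N))    # coefficients for 4096 blocks
pixels = inverse_transform_all(blocks)    # one uniform, branch-free batch operation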
 
Last edited:
To be honest I'm really hoping they can improve the quality of GPU encoding. It works now and yes it's faster, but the visual quality is nowhere near CPU encoding at the same bitrate.
 
I only care about how quickly it encodes myself; as someone who encodes videos daily, I would gladly give up some space for a faster encoding process. :) I'm having to stick with old Adobe versions just to get multicore encoding and H.264 support, so I doubt I'm the customer for this in particular, but for 4K broadcasting, for example, it could be nice.
 
So I was reading this article on the codec:
https://www.techspot.com/news/85889-new-h266-vvc-codec-shows-promise-4k-8k.html

And saw this piece of info:
Hardware decode support is well underway for AV1, while the first software decoder (and associated encoder) for VVC is expected this fall at the earliest. One thing we still don't know is how much compute power is required to encode videos using VVC, but neither HEVC nor AV1 are particularly easy to use without powerful hardware. Preliminary tests show that VVC is anywhere from four to ten times more complex to encode when compared to HEVC.
 
Which means that we need support in hardware. What sucks is that AMD has more or less been absent (and they've been reluctant to put what tech they do have into APUs), Intel has a stack of tech that's been sitting on shelves waiting for a suitable fab, and Nvidia, the best bet for 'flexible' consumer hardware, is about to release their next generation of hardware.
 
I only care about how quickly it encodes myself; as someone who encodes videos daily, I would gladly give up some space for a faster encoding process. :) I'm having to stick with old Adobe versions just to get multicore encoding and H.264 support, so I doubt I'm the customer for this in particular, but for 4K broadcasting, for example, it could be nice.

Just save it as RAW or HuffYUV then :p
 
Please, for the love of all good things, add AVX-256 support to the encoder. Please, please. I want to leverage the full-blown power of my 3960X.

x265 supports AVX-512, booo :-(
 
My main concern here is the royalty-related issues possibly impeding wide support, especially on devices that depend on specific hardware codec support. It's nice to see progress, and I'll be interested to watch the development of the "x266" open-source version, but as the OP mentions, there is also the comparison to AV1 to keep in mind regarding both performance and wide support.
 