Digital Video 101 – Codecs anyone?

Video compression is a process that reduces the size of video files to the point where they can fit on DVDs or be streamed over the internet. Compression on the transmitting end, and decoding on the receiving end, are done using codecs. The word ‘codec’ is short for coder/decoder. There are five parts to the entire setup:

  • the incoming signal – in this case video
  • the encoding process, where we make the information ‘fit’ whatever channel or medium it has to pass through
  • the transmission path, where we’re getting from the transmitter to the receiver
  • the decoder, where we do our best to use the encoded information to restore the original video signal as closely as we can
  • the output signal – which is hopefully the same as (but in reality an approximation of) the input signal

As you’ll see in an upcoming article, you can’t store or transmit high-definition or 4K video in uncompressed form without some expensive equipment, and you certainly can’t ‘stream’ uncompressed video over the internet from Netflix to your home. In fact, you can’t even store a movie on a DVD without doing some seriously fancy compression – there simply isn’t enough room!

Encoding information is something you’d probably think of as highly technical – but the reality is that encoding and decoding of information is done in a lot of places, for a lot of reasons.

Here’s an interesting example that you probably know about: encoding text using Morse code. Why? Well – if you can only transmit by turning the transmitter on or off, you have to do something to translate our letters into something that works with that transmission channel. Here’s how it’s done: letters are encoded into short bursts (dots) and long bursts (dashes), and suddenly you have a way to transmit text.

Another one you might not think of is music. A much older problem: once upon a time there was no way to take music from one place to another – no iPod to take to work to listen to some good tracks. And so here’s a way to encode music: sheet music. It’s really cool, because you can take it on a ship to a foreign land (think ‘transmission path’), and provided you have someone with an instrument who can read the sheet music (the decoder), you can go somewhere else and listen to your favorite music. Yeah, it’s a bit of an ordeal compared to taking your phone – but once upon a time, that was the only way it could be done!

There are a large number of different codecs and compression algorithms for video (and audio), available mostly as software solutions of some sort that take care of compression and decoding ‘on the fly’, that is, in real time. Some are better at one task, such as representing colors accurately, at the cost of handling fast motion poorly. Others might be good at fast motion, at the cost of image resolution or color accuracy. They’re all compromise solutions, as they aim to take a whole lot of information and somehow compress it to fit a disk or a data pathway.

Most of them work on the principle of encoding and sending only the changes between adjacent images, although there are obviously a lot of other methods and techniques that come into play. Imagine you have two adjacent frames of a video that contain exactly the same image – a house, maybe, with the videographer not moving the camera from frame to frame. The first frame needs to be transmitted somehow, but if we assume that the house is mostly the same color, then instead of transmitting each pixel we could tell the encoder to just transmit the color value for a large part of the house, along with where the boundaries of that color are. We’d save a huge amount of data that really doesn’t need to be transmitted. The same goes for the blue sky, or a piece of tarmac road – hey, the sky is blue, right, so why transmit each individual pixel? Might as well tell it where the sky is and give it a value for the blue.

Here’s a frame sequence that we will imagine needs to be encoded. Let’s start with the first frame. Here’s an ordinary street scene. Some tarmac, houses, and some trees.

We will assume that the camera didn’t move, so the second frame will be identical to the first – why transmit it at all? We’ll just tell the other end to repeat the same image.

Here’s the third frame.

Have a close look. The third frame of this video has a car driving through the crossroads – so why not encode and transmit ONLY the part of the image where the car is? The rest of the image is still just the houses, so there’s no point in retransmitting all that video.

Now – we really saved some time there! The rest of the image stayed the same, so you can imagine how much it helped to transmit only that tiny part of the image, along with an X and Y coordinate to tell the decoder where to put it.

Let’s have a look at the next frame.

In the next frame, the car is elsewhere in the image. So you transmit the bit of the image where the car was before, to ‘repair’ its background (in this case a bit of tarmac that’s pretty well all the same color, so we’ll tell the receiver to get out a gray crayon and just overpaint that bit of the picture), and then you take the image of the car and overlay it elsewhere on the image. In fact, we’d probably get away with telling it to take the same picture of the car we sent last time, and just move that car to its new position. That was easy!

You can see how such an encoder might be able to compress the video. Basically, instead of transmitting 5 frames of video, we transmitted 1 frame (having ‘compressed’ even that by not transmitting all of the individual pixels), followed by a picture of a car, and a patch of background to restore the spot the car vacated when it moved. That’s an enormous compression of the data!

OK, now let’s imagine we’re panning the scene to the right.

Instead of sending an entirely new frame – why not tell the encoder to just encode the fact that the entire picture has moved, and where it has moved to, so that we only need to fill in the strip of the image that is new? Hey, stick this bit on the right side of the previous image, and you’re done. Now that was easy, and quite efficient!

We’ve now transmitted an entire video by really only transmitting one frame (and we cleverly compressed even that by describing large single-color areas instead of sending every pixel), plus some changes here and there.

These are just SOME of the techniques that codecs use to transfer information. Obviously the result is not perfect. There’s a whole load of smoke and mirrors going on here, and codecs often depend on your eyes not being able to catch the issues. If there’s a problem with a single frame of video, and we’re talking about the TV in your living room, it’s quite likely that you’ll see just what you’re intended to see: you never saw any detail in the car as it moved. You never saw that the right side of the image during that pan really didn’t look that good for a frame or so, and that it took a couple of frames to get the quality of the image back to what it should be.

I’m sure you’ve seen ‘color banding’, where the sky looks like a patchwork of different blues rather than a nice gradient. Sometimes you’ll see a few frames where the image on your TV looks like a whole bunch of ugly squares – the image is KIND of there, but it’s not good.

All of that happens when the data is compressed a little too much (it often happens when there’s a glitch and the transmission path has a brief problem), and the encoding process doesn’t encode ENOUGH of the different color samples and borders to make up that patchwork. It’s a bit like looking at the sky with only 4 different blue pens to make a picture of it. Not ideal, but on the whole you’ll get the idea.

Basically, codecs are essential to fit video through restricted-size data paths like the internet (think Netflix and the like, and imagine trying to poke a house through a hosepipe to get an idea of the scale of the problem), and to allow it to be stored on devices like hard disks – and, for that matter, your computer and your iPad. You wouldn’t be happy to have one movie take up all the storage on your computer or tablet.

Next time, we’ll go into some more depth on encoding and decoding, and we’ll discuss how MPG files work. Then we’ll also do an article on data rates, and talk about compressed versus uncompressed video. Remember – these types of encoding mechanisms work, but you’re of course not getting the data in uncompressed form, so something, somewhere has to give! For some applications you can get away with this: your kids watching Frozen in the car, or you watching a movie at home – you’ll probably be fine with it. Having a corporate presentation to critical clients on a 16-foot-wide video wall is, of course, another story altogether.
