Secrets Hidden in Images (Steganography) - Computerphile

May 04, 2020

So cryptography is the idea of encrypting a message so that even though everyone knows the message has been sent, they can't really find out what it means. Whereas in steganography we try to hide the fact that we have sent a message at all. So a classic example would be if I was writing you a letter and then I wrote, in invisible ink, a completely different message between the lines or on the other side or something, and only you knew that it was going to be there. Then you get home, and everyone else maybe looks at the letter and thinks, "that's not interesting at all." And then, of course, you can discover the secret message.

Today we'll talk a little about digital image steganography, because there's obviously a lot of scope for hiding things in digital images. Images can be megabytes or more, and you can hide files of megabytes or more in them. But of course, as the amount of steganography in images increases, so do attempts to try to find it. So there are also a lot of statistical approaches to trying to find these things. Perhaps the simplest form of image steganography is "least significant bit steganography." So if we have a bitmap of some type (PNG or BMP, say), we can change the lowest bits to be our message, and it will make an almost imperceptible change to the actual appearance of the image.

It's a bit like if you had the number 800,351 and you changed the 1 or the 51: it wouldn't have a massive effect... - That's exactly it, the number is so big that in the grand scheme of things it doesn't make any difference. So generally speaking, on every byte in an image we'll change the last bit, or maybe the last two bits if we're really trying to cram in a lot of data. Each byte has eight bits; we take the last two and change them into our message in the hope that no one will notice. So for every byte (that is, every 8 bits), six bits are the normal image and two are our secret message, so a quarter of our image is now secret message.

So if we have a normal pixel, it will be 4 bytes long (i.e. one byte per channel), so for each byte we are talking about the last two bits of that byte. Each of those bits could be a 1 or a 0, and for each one we can change it to 1, change it to 0, or leave it as it is. And what we do is we read our message, so let's say the message we're trying to hide is 10 11 01, okay? We get to the first byte and say, well this is great, its last two bits are already 1 and 0, so we don't need to change anything at all and the byte stays as it is.

Then we go to the next byte, so maybe the first was red and this could be green in our pixel. The last two bits of this byte are 0 and 1, and the two we are trying to put in from our message are 1 and 1, so we change the 0 to a 1. So by changing the second least significant bit from 0 to 1, we've just increased this value by two, and we are talking about one channel in a huge image; a change of two levels probably won't be too noticeable. If we started changing the most significant bits, that could be a problem. Okay, so I wrote a program to do this and tried to hide a fairly large file inside a fairly large image.
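The walk-through above can be sketched in a few lines of Python. This is only a minimal illustration, not the program mentioned in the video: the function names `embed_2bit` and `extract_2bit` and the three cover byte values are made up for the example, chosen so that the first byte already ends in 10 and the second ends in 01, as in the description.

```python
def embed_2bit(cover, bit_pairs):
    """Overwrite the two least significant bits of each cover byte
    with one 2-bit chunk of the message (hypothetical helper)."""
    if len(bit_pairs) > len(cover):
        raise ValueError("message does not fit in the cover bytes")
    stego = bytearray(cover)
    for i, pair in enumerate(bit_pairs):
        # Clear the low two bits, then drop the message bits in.
        stego[i] = (stego[i] & 0b11111100) | (pair & 0b11)
    return stego


def extract_2bit(stego, count):
    """Read back the low two bits of the first `count` bytes."""
    return [b & 0b11 for b in stego[:count]]


# The message from the example: 10 11 01
message = [0b10, 0b11, 0b01]

# Assumed channel values (not from the video): 150 ends in 10, 101 ends in 01.
cover = bytes([0b10010110, 0b01100101, 0b11001000])

stego = embed_2bit(cover, message)
print(list(cover))   # [150, 101, 200]
print(list(stego))   # [150, 103, 201]
print(extract_2bit(stego, len(message)))  # [2, 3, 1], i.e. 10 11 01
```

The first byte is untouched, the second goes up by two, and no byte can ever move by more than 3, which is why the change is so hard to see.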

Well, this is a nice picture of a tree. It's about 3 (and a bit) megapixels in size. So this is the original image of our tree and that is the steganographic image. To the first, to the second. - It's not going to change! - It is changing. When you only change the two least significant bits of an image with 8 bits per channel, you won't see much of a difference. If you actually subtract the images, you will be able to see a difference, but overall it will be pretty imperceptible. The really good thing to do would be to never publish the original image.

I can tell that something has changed because I have the original and the new steganographic image with me. But if I only sent a picture of my dog and never sent the original that the camera took, no one would know that it had been imperceptibly changed, because they have no reference. - If you take a public domain image and change it, it will be easy to find the original source. - Exactly. The other thing is that it will work better in photographs where there is a lot of variation (at least in intensity levels). So this steganographic image has all of Shakespeare's works buried in it, which comes to (when zipped) about... 1.5 MB, something like that.
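As a rough sanity check on those numbers, here is the capacity arithmetic. It assumes exactly 3,000,000 pixels, three colour channels and two hidden bits per channel; the real image is "3 and a bit" megapixels and its channel count isn't stated, so treat the result as a ballpark only. The function name is made up for this sketch.

```python
def lsb_capacity_bytes(pixels, channels=3, bits_per_channel=2):
    """Number of whole payload bytes that fit in the low bits of an image."""
    return pixels * channels * bits_per_channel // 8


capacity = lsb_capacity_bytes(3_000_000)
print(capacity)  # 2250000, i.e. about 2.25 MB

payload = 1_500_000  # roughly the 1.5 MB quoted for the zipped text
print(payload <= capacity)  # True
```

Under those assumptions the 1.5 MB payload uses about two thirds of the available low bits.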

This type of simple steganography can be detected. This image here is one I created by taking only the last two bits of each channel; I have removed all other information. If a pixel has a value of 0 it is black, if it has a value of 3 it is white, and 1 and 2 sit in between. And you can see that there is a tree there, so even in the lowest two bits you can see that there is a tree, and the sky is particularly smooth. Then if you look at the steganographic image, I applied the same filter to it, and you can see the result: the amount of noise increases enormously, because that noise is the message hidden in those two least significant bits.
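The filter described here, keeping only the low two bits and stretching them so 0 is black and 3 is white, might look something like this. It is a sketch on a flat list of channel values rather than a real image file, and `low_bits_view` is a made-up name.

```python
def low_bits_view(channel_values, bits=2):
    """Keep only the lowest `bits` bits of each value and stretch them
    to the full 0..255 range so they are visible as grey levels."""
    mask = (1 << bits) - 1   # 0b11 for two bits
    scale = 255 // mask      # 85, so the values 0..3 map to 0, 85, 170, 255
    return [(v & mask) * scale for v in channel_values]


print(low_bits_view([4, 5, 6, 7]))          # [0, 85, 170, 255]
print(low_bits_view([200, 200, 200, 200]))  # a flat patch stays flat: [0, 0, 0, 0]
```

A smooth region such as sky gives a smooth result, whereas hidden data in those bits would show up as noise in the same view.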

So if you compare the low bits from one image to the other, you can see a difference, and therefore a message hidden in the least significant bits is pretty obvious, especially if you have the original to compare. So this is the difference between those two images, and I've greatly zoomed in on the difference. I mean, it looks very grey. These black and white pixels are intensity changes of plus or minus 3. So we're still talking about very small differences in the image, and they're very evenly distributed; it all spreads noisily across the entire image.

- Yeah, so you can't say there's a tree there now. - No, you can't tell there's a tree there. Which could be a clue! Perhaps a more sophisticated method of hiding something in an image would be to hide it within the Discrete Cosine Transform coefficients of a JPEG file. We talked a little about the DCT and how we convert an image into a series of cosine waves, with coefficients that say how much of each of those waves we have. If you change those coefficients instead of changing the raw pixel values, you will have a much less predictable effect on the image: if you change the value of one of the large AC coefficients from 202 to 201, it will make a very imperceptible difference, and it will be spread all over that 8x8 block, so you won't be able to see the clear kind of steganographic noise that we just saw in that tree.
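To see why a change to one coefficient spreads over the whole 8x8 block, here is a small pure-Python sketch using the standard orthonormal DCT-II. The pixel block is invented for the example, and JPEG's quantisation step is left out, so in a real file a one-step change in a stored coefficient would be scaled up by the corresponding quantisation table entry.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def basis(k, n):
    """Value of the k-th orthonormal DCT-II basis function at sample n."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2D DCT of an 8x8 block given as a list of 8 rows."""
    return [[sum(block[x][y] * basis(u, x) * basis(v, y)
                 for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct2(coeffs):
    """Inverse 2D DCT, back to pixel values."""
    return [[sum(coeffs[u][v] * basis(u, x) * basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


# An assumed smooth block of pixel values (a gentle gradient).
block = [[100 + 5 * x + 3 * y for y in range(N)] for x in range(N)]

coeffs = dct2(block)
coeffs[0][1] -= 1.0          # nudge one low-frequency AC coefficient by 1
changed = idct2(coeffs)

diffs = [abs(changed[x][y] - block[x][y]) for x in range(N) for y in range(N)]
print(min(diffs), max(diffs))  # every pixel moves, each by a fraction of a level
```

All 64 pixels change, but each by much less than one intensity level before rounding, which is a very different pattern from flipping the low bits of individual pixels.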

A common algorithm we see in use is called JSteg. - So I see what you did there. - And what JSteg does is it goes in and, if it can, fills the DCT coefficients with as much data as it can. And what it does is: it takes the coefficients that are not 0 or 1 (because changing those would be a little obvious), so usually the low-frequency ones, and changes them up or down, and you can see again that the difference is almost imperceptible. So here's a photo of a panda, and what I've done here: I couldn't fit in as much information as before, so the payload is much smaller than last time.
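A heavily simplified, JSteg-style sketch of that idea: walk over the quantised coefficients, skip any that are 0 or 1, and overwrite the least significant bit of the rest. This is not the real JSteg code; the function names and the coefficient list are invented, and details such as coefficient ordering, length headers and the JPEG file format itself are ignored.

```python
def jsteg_style_embed(coeffs, bits):
    """Hide `bits` in the LSBs of coefficients that are not 0 or 1."""
    out = list(coeffs)
    k = 0  # index of the next message bit
    for i, c in enumerate(out):
        if k == len(bits):
            break
        if c in (0, 1):
            continue  # left alone, as described above
        # Clear the lowest bit and set it to the message bit.
        # Python integers behave like two's complement here, so this
        # also works for negative coefficients and never yields 0 or 1.
        out[i] = (c & ~1) | bits[k]
        k += 1
    if k < len(bits):
        raise ValueError("not enough usable coefficients")
    return out


def jsteg_style_extract(coeffs, count):
    """Read the LSBs of the usable coefficients back out."""
    return [c & 1 for c in coeffs if c not in (0, 1)][:count]


# Assumed quantised coefficients from one block (mostly small, some zero).
coeffs = [-26, -3, 0, 2, 1, 0, -6, 4, 0, 0, 5, -1]
bits = [1, 0, 1, 1, 0]

stego = jsteg_style_embed(coeffs, bits)
print(stego)                                   # [-25, -4, 0, 3, 1, 0, -5, 4, 0, 0, 5, -1]
print(jsteg_style_extract(stego, len(bits)))   # [1, 0, 1, 1, 0]
```

Each touched coefficient moves by at most 1, and because 0 and 1 are never created or consumed, the extractor skips exactly the same positions as the embedder.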

So there is the original image and the steganographic one. And I compared them and found a little bit of a difference, and you can see that again it's very, very, very slight, so these pixels again have only changed by 3 or so, maybe one, maybe two. - So that's just a close up of the... - That's a close up of the difference right there, so you can see that, yes, the images have changed, but they haven't changed much. And the other crucial thing about hiding your message in the DCT coefficients: the JPEG compression has already completely messed up the least significant bits of the image.

So if you make an image like the one I made before, where we look at just the lowest bits, we won't be able to see a tree anymore; we'll only be able to see very general JPEG noise, and it will look just the same in our steganographic image. So you can't do what they call a visual attack, looking to see if there is a steganographic message hidden inside, because there is no real visible change. So this is the original, and here I only show the two least significant bits. And you can see that they are formed into small blocks; those are the 8x8 DCT blocks.

And this is the steganographic data, so you can see that the blocks have changed, but the noise distribution across the entire image hasn't changed at all, so it's very hard to see that there's a message buried there. And if the message occupies only a certain part of the image, it is difficult to see where in the image the message is. You could be trying to read every DCT coefficient when in fact only some of them carry a message. - If you were sending this to someone as a message... how would they read it? - Okay, so in general, you would also encrypt the message because, you know, better safe than sorry, so why not use encryption?

So we encrypt our message, put it into the DCT coefficients or the least significant bits, and then we send it to someone. Now, you're going to have to know the process that we used, because if you don't, you're looking in the wrong place. So you know that we used JSteg or F5 or one of the other DCT steganography tools, and basically you run the program and type in your decryption password, which will remove the encryption, and then the message will appear. When JSteg was invented, it was robust against a visual attack, so you couldn't look at the image and say, "well, that's clearly been altered." So researchers had to try to find some other way to detect that an image had a JSteg message buried in it, and what actually happens is that the coefficients change slightly.

Because we are applying quantization to our DCT coefficients, most of them will be set to zero. OK? And JSteg won't put anything there, because it's too obvious; it'll just put data in the few in the top corner that are large, and you'll find that a subtle imbalance occurs in the values of your coefficients. So you expect most of your coefficients to be 0, and then a few of them to be -1 or 1, or -2 or 2: very close to zero. And in fact, you start getting some 3s and 4s that you weren't expecting, and the distribution of these numbers gets a little skewed, and you can start to predict that a JSteg message has been buried inside.

What's more, this happens on every 8x8 block, so you can do this test on every block and find out which blocks have messages and which blocks don't. And you might find, for example, that the first 60% of the file has a message and then it stops abruptly, and that's an obvious clue that we have something that doesn't take up the entire image; it has simply been written sequentially to the file. So if we take the number of occurrences of each DCT coefficient value, then zero (0) will be the most common, then there will be some -1s and 1s, and so on. If we plot them on a graph with the frequency on the Y axis and the DCT coefficient on the X axis, we get what is called a histogram, and that is simply a graph of the frequency of occurrence of various things.
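Counting how often each coefficient value occurs is a one-liner with Python's `collections.Counter`. The coefficient list below is invented to have the shape described (mostly zeros, a few values near zero), and the text "plot" is just a stand-in for a proper graph; both helper names are made up.

```python
from collections import Counter


def coefficient_histogram(coeffs):
    """Map each coefficient value to the number of times it occurs."""
    return Counter(coeffs)


def text_histogram(hist):
    """One row per coefficient value, in order, with a bar of '#' marks."""
    return [f"{value:>3} | {'#' * hist[value]}" for value in sorted(hist)]


# Assumed quantised coefficients: mostly 0, a few close to zero.
coeffs = [0] * 10 + [1] * 3 + [-1] * 3 + [2, -2, 3]

hist = coefficient_histogram(coeffs)
for row in text_histogram(hist):
    print(row)
```

Running the same count on a suspect image, block by block or over the first part of the file versus the rest, is one way to look for the kind of skew described above.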

So you can do a histogram on an image, but you can also do a histogram on these DCT coefficients and find out if they've been subtly changed. Once people started detecting JSteg routinely, other people came along and decided that it was too obvious, so let's try to make it more subtle. So what they did was write DCT steganography approaches that pay attention to the statistics of the coefficients and try to keep them balanced. So if you put in a 1, you try to remove a 1 somewhere else, to keep the histogram, and the probabilities of these coefficients occurring, the same.

And that makes it much more difficult to use the standard histogram analysis technique to find out if there is something in the image. But what they can do now, with the power of machine learning, is take, say, a thousand images, 10 of which may or may not have something buried inside them, and a classifier will figure out which they are. You just have to have a lot of positive and negative samples to throw at it. - It all sounds wonderful, but you know... - Well, yes. Spies aside, I must say that I am not using these techniques. You know, everyone's watching... - They're checking your Instagram. - Exactly, so I think one of the most common uses is digital watermarking.

So in normal steganography, what we want to do is try to hide a message as best as possible. And then the only thing that really matters is that the person on the other end can understand it and no one else notices. In the case of watermarks, what we want to try to do is fingerprint the file so we know where it came from and we know it's ours, perhaps for copyright reasons or to track who has been distributing illegal material. And the key to a watermark is that instead of having as much payload as possible, instead of trying to cram all of Shakespeare's works into one image, what you should do is embed just a little... let's say a little logo, or a small piece of text repeated over and over again, so that if the image is cropped or recompressed, the mark still remains there.
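The "repeated over and over" idea can be illustrated with a toy example: write a short mark cyclically into the lowest bit of every byte, then recover it by majority vote so that a few damaged bytes don't matter. This is only a sketch of the redundancy principle with made-up function names. Plain pixel LSBs would not survive JPEG recompression, and a crop that shifts the data would also need resynchronising, so real watermarking schemes embed the mark in more robust ways.

```python
def embed_repeated(cover, mark_bits):
    """Write the mark into the LSB of every byte, repeating it cyclically."""
    out = bytearray(cover)
    for i in range(len(out)):
        out[i] = (out[i] & 0xFE) | mark_bits[i % len(mark_bits)]
    return out


def recover_majority(stego, mark_len):
    """Recover each mark bit by majority vote over all its repetitions."""
    ones = [0] * mark_len
    total = [0] * mark_len
    for i, b in enumerate(stego):
        ones[i % mark_len] += b & 1
        total[i % mark_len] += 1
    return [1 if 2 * ones[j] > total[j] else 0 for j in range(mark_len)]


mark = [1, 0, 1, 1]          # an assumed 4-bit mark
cover = bytes(range(40))     # 40 bytes, so each mark bit is stored 10 times
stego = embed_repeated(cover, mark)

# Simulate some damage: flip the hidden bit in two of the bytes.
stego[0] ^= 1
stego[5] ^= 1

print(recover_majority(stego, len(mark)))  # [1, 0, 1, 1]
```

The payload is tiny compared with the cover, which is the trade the watermark makes: far less capacity in exchange for surviving some damage.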

You can imagine that stock photography companies might do this to try to make sure that people don't distribute their files elsewhere. And you can imagine that they would crawl the web searching for images with steganographic data embedded in their own particular way. Another case you might encounter: if you were distributing preview DVDs of a movie and then it leaked onto the Internet, and there is steganographic data about the source buried in it, you will be able to see who leaked it. - Each file could be adapted... - Each file could be adapted to the person you originally sent it to, and then when that particular copy finds its way onto the Internet, that person will be in trouble.

What was vital to recreating this image no longer exists, and we are not going to recover it. And in fact, that is exactly what you see: if we show the actual output here, we can see that it is somewhat visible, but it has been completely overshadowed by all this random noise that has been added...
