
Introducing Cryptography
This is a guide on introducing cryptography.
Introducing Cryptography
So what is cryptography? Cryptography, often simply referred to as crypto, is the study of secure communications. We use cryptography because it addresses four distinct problems. The first is confidentiality, whereby we control who can see a message. The second is data integrity, whereby we can assure that data has not been tampered with. The third is authentication, whereby we can confirm that a message is authentic and has not been forged. The fourth is non-repudiation, whereby we can verify that the sender of a message did, in fact, originate that data. But how does crypto work? With encryption, we are scrambling or obfuscating data, and to do that we need some kind of key. Decryption can only occur with the correct key, which reveals the original message. Hashing takes data, puts it through an algorithm, and produces a unique value called a hash value.
Now we can recompute a hash value on the same data later. If the value is different, it means the original data has changed. Let's work through an example of cryptography using XOR, which stands for exclusive OR. If we compare two inputs that are the same – say 0 and 0 – XOR returns false, which in this case is a 0. But if we XOR 0 against 1, because those two inputs are different, XOR returns a 1, which means true. So whenever the inputs are different, we get a true value. In our example, let's say we want to encrypt the string "abc." Looking up the letter "a" in an ASCII conversion table, we see it has a value of 97, "b" has a value of 98, and "c" has a value of 99. Converting each of those values to binary gives us three sets of eight binary digits. Let's use as our XOR key the ASCII letter "d," which can be represented in binary as 01100100.
So this is our key. We're going to use it to encrypt our plaintext "abc" to produce an encrypted value, or ciphertext. We take each binary set – for example, the one for the letter "a" – and XOR it against our key to get a resultant output. Remember what happens with XOR: you compare inputs. If they're the same, like 0 and 0, you return 0, which is false. If they're different, like 1 and 0, the result is true, which is a 1. We do that comparison across all of our original data, and our final result, in binary, is three sets of XOR results. If we convert each of those binary sets back to decimal – in this example – we get "5 6 7." That is our encrypted value, or ciphertext. The XOR value is the key, and only someone who knows that key – in our case, the letter "d" – will be able to decrypt the message. No other key will work. So let's go through how we would decrypt using our XOR example. Remember, our encrypted value, or ciphertext, is "5 6 7."
We somehow need to get back to the original message, which is "abc." The first thing we do is convert our ciphertext to binary: we convert 5, 6, and 7 each to binary. You could even do this manually, if you wanted to, using a scientific calculator such as the one built into Windows. Then, given that we know the correct key – the letter "d" – we XOR it against those values. After we have the XOR results, we convert them back to decimal, which gives us "97 98 99." Looking those up in our ASCII table gets us back to "abc." So what we're seeing is that we can decrypt if we have knowledge of the encryption key, which in our case was the ASCII letter "d." But what if we don't have the correct key? Let's say we think the key is "k." When we convert that to binary and XOR it against the ciphertext, the result is something that is not the original text. So you have to have knowledge of the correct decryption key, which is why it's so important in cryptography to protect keys. In this video, we discussed cryptography.
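The XOR walkthrough above can be condensed into a short Python sketch. The function and variable names are illustrative, not from any particular library; because XOR is its own inverse, the same routine both encrypts and decrypts.

```python
# Toy XOR cipher mirroring the "abc" example: the one-character key "d"
# (ASCII 100) is XORed against every plaintext byte.
def xor_cipher(data: bytes, key: int) -> bytes:
    # XOR is its own inverse, so one function both encrypts and decrypts.
    return bytes(b ^ key for b in data)

ciphertext = xor_cipher(b"abc", ord("d"))
print(list(ciphertext))                   # [5, 6, 7] - the "5 6 7" ciphertext
print(xor_cipher(ciphertext, ord("d")))   # b'abc' - the correct key recovers it
print(xor_cipher(ciphertext, ord("k")))   # a wrong key yields gibberish
```

Note that a single-character XOR key like this is trivially breakable; real ciphers use much longer keys, but the encrypt/decrypt symmetry shown here is the same idea used inside stream ciphers.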
Identifying Historical Use of Cryptography
Cryptography has been used for thousands of years, and its primary purposes are to protect data from unauthorized parties and to ensure that the data hasn't been tampered with. Classical ciphers were usually based on either transposition or substitution, which we'll examine further. Steganography was also used historically and is still commonly used today. With steganography, we are concealing one message within another. For example, using certain software, we might embed a secret message within a graphic image. Anyone viewing the graphic image would see just that – the image. They would have to know that there was a secret message embedded within that image, and they'd have to have the correct software and key to extract the message. Transposition ciphers transpose text by shifting letters around. Two common transposition ciphers are the route cipher and the rail fence cipher. The rail fence cipher works by writing the letters of the plaintext in a zigzag pattern across two or more rows, or rails, and then reading the letters off row by row.
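As an illustrative sketch (the function name and the three-rail example are my own, not from the course), the rail fence cipher can be written in a few lines of Python for two or more rails:

```python
def rail_fence_encrypt(plaintext: str, rails: int) -> str:
    # Write the message in a zigzag across `rails` rows (rails >= 2),
    # then read the ciphertext off row by row.
    rows = [""] * rails
    row, step = 0, 1
    for ch in plaintext:
        rows[row] += ch
        if row == 0:
            step = 1            # bounce down from the top rail
        elif row == rails - 1:
            step = -1           # bounce up from the bottom rail
        row += step
    return "".join(rows)

print(rail_fence_encrypt("WEAREDISCOVERED", 3))   # WECRERDSOEEAIVD
```

Decryption simply reverses the process: rebuild the zigzag positions from the message length and read the letters back in order.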
This way, the resultant ciphertext, or encrypted value, is different from the original plaintext. The route cipher builds on the same idea, but it writes the plaintext into a grid of known dimensions, and the ciphertext is read off by following a route through the grid – for example, spiraling around it – to generate the encrypted string. Substitution ciphers substitute letters, words, or groups of characters with some other value – hence, substitution. There are many different types of substitution ciphers, of which the Caesar cipher is the most common. The Caesar cipher shifts letters by some specific amount, for example, by three. So, if we start with the letter "a" and shift by three, our result is "D." But all of these early cryptographic methods are vulnerable to what is called frequency analysis.
With frequency analysis, we use knowledge of a language and the understanding that certain letters and combinations of letters occur more frequently than others. For example, in the English language, the letter "e" is the most commonly used, followed by the letter "t." There have been attempts to prevent frequency analysis attacks. One is polygraphic substitution, whereby a group of plaintext characters is replaced by a predetermined character or an entire group of other characters. With polyalphabetic substitution, we use multiple substitution alphabets. Both approaches reduce the effectiveness of frequency analysis attacks. In this video, we discussed the historical use of cryptography.
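A frequency analysis attack can be sketched in Python. Here, a hypothetical ciphertext (an English sentence Caesar-shifted by 3, chosen for this example) shows how the most common ciphertext letters point back at common plaintext letters:

```python
from collections import Counter

def letter_frequencies(text: str):
    # Count letters only, case-insensitively, most common first.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return Counter(letters).most_common()

# A simple substitution like a Caesar shift preserves letter frequencies,
# so the most frequent ciphertext letters map to frequent plaintext letters.
ciphertext = "WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ"
print(letter_frequencies(ciphertext)[:3])
```

Matching the top ciphertext letters against expected English frequencies quickly suggests candidate shifts, which is exactly why classical substitution ciphers fall to this attack.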
Describing Cryptographic Terminology
As with any discipline, it's important that we have a solid understanding of terms before we can truly understand cryptography. The first term is a cipher. A cipher is a cryptographic algorithm that encrypts or decrypts a message. A cryptosystem is a system that includes the cipher for encryption and decryption and a key generation and key management process. Plaintext refers to the original, unencrypted message, either before it's been encrypted or after it's been decrypted. Ciphertext refers to the encrypted representation of the original message, or plaintext. A key is a set of bits used by a cipher to encrypt plaintext or to decrypt ciphertext, so we have to have the correct key in our possession before we can decrypt encrypted messages. Code, as a verb, can mean to convert something into code – in other words, to encrypt it. As a noun, a code can refer to a key or word used in older cryptographic methods. With cryptography, key management is crucial because keys are used for things like encryption and decryption. Key management refers to the generation, exchange, storage, and revocation of keys.
Keys can sometimes be revoked due to security compromise. For example, if a secret key was stored on a smartphone and that device is stolen, we would revoke the key in our cryptosystem so it can no longer be used. Key exchange is the process of securely exchanging keys, normally over a network. A block cipher operates on a single block of data at a time, for example, a 128-bit block, whereas a stream cipher works on each bit of data one at a time and doesn't require a block of data to operate on. Both block and stream ciphers result in ciphertext. Hashing creates a one-way, fixed-length, unique value that represents the original data passed to the hashing algorithm. If we recompute the hash value in the future and the original data has not changed, we get the same resultant hash value. But if we compute the hash and get a different value, it means the data has changed. A mode of operation provides a method to encrypt and decrypt more than one block when using block ciphers.
Cryptography is based heavily on mathematics. Number theory – the mathematical study of integers – underpins much of crypto, and computationally complex problems such as factorization are often used. Integer factorization is the breakdown of a positive integer into factors; an integer is a whole number, one without any fractional part. In crypto, factorization usually involves the product of two prime numbers. Prime numbers are often used because, in mathematics, primes really don't have a discernible pattern. Cryptanalysis is the study of breaking cryptographic ciphers or systems; its purpose is to determine the strength of a cryptosystem or cipher. With a side-channel attack, we attack the physical infrastructure of a cryptosystem rather than the algorithms themselves, with the end goal of capturing data related to the ciphertext or keys. For example, as a side-channel attack, we might infect a computer with malware that installs a keylogger, which watches everything the person types on the computer. This could reveal secret keys.
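To illustrate the factorization problem mentioned above, the following sketch factors a tiny semiprime by trial division; the same approach is hopeless against the hundreds-of-digits semiprimes used in real cryptosystems:

```python
def factor_semiprime(n: int) -> tuple:
    # Trial division: feasible for tiny n, computationally infeasible
    # for the 2048-bit semiprimes used in real public-key crypto.
    for p in range(2, int(n ** 0.5) + 1):
        if n % p == 0:
            return (p, n // p)
    raise ValueError(f"{n} has no small factor")

print(factor_semiprime(3233))   # (53, 61)
```

The asymmetry – multiplying two primes is instant, while recovering them is hard – is what makes factorization-based cryptosystems work.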
Cryptographic examples often use a standard cast of characters: Alice, Bob, Chuck, Craig, and Eve. Alice is usually the sender of the message and Bob is the receiver. Chuck is an entity that wishes to intercept or interfere with the message. Craig is an entity involved in cracking a password or a key. And finally, Eve is someone attempting to eavesdrop on a message. In this video, we discussed cryptographic terminology.
Defining Why Cryptography is Difficult
Cryptography strives to protect data, but despite our best efforts, all cryptography can be broken or cracked. It's really a question of the amount of time and effort one is willing to put in, and that effort is often a calculable quantity given today's computing power. Many good cryptographic systems from the past can easily be broken now because of advances in mathematics and computing power. But will today's strong cryptography be easily broken tomorrow? Chances are the answer is yes. How do you know, though, if a cryptographic algorithm is strong? It's not as simple as comparing one algorithm with another and saying this one is better than that one. We have to look at things like cipher key lengths, and we also have to look at the specific implementation of an algorithm. So it's not just as simple as looking at the algorithm itself; really, an algorithm is public knowledge. Here is an example of a Caesar cipher encrypted message. The Caesar cipher is a substitution cipher where we substitute one character with another. Decrypting this Caesar cipher encrypted message is pretty basic if you know that the substitution key is 3.
So that means that, given the letter U as an encrypted part of the message, if we know the substitution key is "3," it really means the letter R. But even if we didn't know the key, making some quick guesses and trying a bunch of shifts would eventually get us the answer in this particular case. A computer can just run all the possible shifts, use a dictionary to match words, and return the result very quickly. In this case, the key space is very small – only 25 possible keys. Cryptosystems today must have a very large key space in order to prevent this kind of attack. DES stands for the Data Encryption Standard. It was used extensively by U.S. federal government agencies starting in the 1970s, but it was later replaced by Triple DES, or 3DES, and eventually by AES – the Advanced Encryption Standard. DES was originally designed with a 56-bit key. The EFF – the Electronic Frontier Foundation – built a machine in 1998, for roughly $250,000, called the DES cracker, that could brute force 56-bit DES keys in a few days.
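Brute-forcing the 25 possible Caesar shifts, as described above, is trivial for a computer. This Python sketch (function names and the sample message are illustrative) recovers a hypothetical message encrypted with a shift of 3:

```python
import string

def shift(text: str, k: int) -> str:
    # Shift each letter k positions through the alphabet, wrapping at "z".
    shifted = []
    for ch in text.lower():
        if ch in string.ascii_lowercase:
            shifted.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        else:
            shifted.append(ch)
    return "".join(shifted)

def brute_force(ciphertext: str):
    # Only 25 possible keys, so simply try them all.
    return [(k, shift(ciphertext, -k)) for k in range(1, 26)]

# Hypothetical message encrypted with a shift of 3.
for k, candidate in brute_force("uhwuhdw dw gdzq"):
    if candidate == "retreat at dawn":      # a dictionary check in practice
        print(k, candidate)
```

A real attack would score each candidate against a dictionary or letter-frequency model instead of checking one known answer, but the exhaustive search is the same.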
This attack pushed Triple DES and then AES into becoming the new standards. Triple DES is a variation of DES that runs the DES cipher three times on the same data. This increases the effective strength to 112 bits – not the full 168. AES has three strengths – 128, 192, and 256 bits – depending on how the cipher gets used. AES-256 can encrypt data, such as an image, using Electronic Codebook, or ECB, mode, and when we look at the output file, the encrypted data looks different from the original data. But ECB mode has an issue: the encrypted data is not truly uniformly random, and as a result, patterns from the plaintext can survive in the ciphertext. With crypto, if we can determine that there is some kind of pattern, then we have a potential way to crack that cryptosystem or that specific cipher. How do we determine how strong a cryptosystem or cipher really is?
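The ECB pattern leakage just described can be demonstrated without a real cipher. In this sketch, a toy deterministic per-block transformation (emphatically not a secure cipher) stands in for AES in ECB mode; identical plaintext blocks visibly produce identical ciphertext blocks:

```python
import hashlib

def toy_ecb_encrypt(plaintext: bytes, key: bytes, block: int = 16):
    # Toy "block cipher" in ECB mode: each block is transformed
    # independently and deterministically (NOT a real cipher).
    blocks = [plaintext[i:i + block] for i in range(0, len(plaintext), block)]
    return [hashlib.sha256(key + b).digest()[:block] for b in blocks]

# Identical plaintext blocks yield identical ciphertext blocks - the
# pattern leakage that makes ECB unsuitable for structured data.
ct = toy_ecb_encrypt(b"A" * 16 + b"B" * 16 + b"A" * 16, b"secret")
print(ct[0] == ct[2], ct[0] == ct[1])   # True False
```

Chained modes such as CBC avoid this by mixing each block with the previous ciphertext block, so repeated plaintext no longer produces repeated ciphertext.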
Key length is part of it. A 256-bit symmetric key means there are 2^256 possible keys in the key space. That's far better than a 56-bit key, which has vastly fewer possible combinations. However, a bigger key length doesn't always mean a stronger algorithm; remember, it's also how the algorithm is implemented. For example, with Wi-Fi networks, WEP – Wired Equivalent Privacy – could use so-called 128-bit keys (in reality, a 104-bit key combined with a 24-bit initialization vector). Often, 128-bit keys are great for securing communications. However, the way WEP was implemented meant that it was very easy to crack 128-bit WEP keys – and thereby crack a wireless network – in a matter of minutes. So the implementation also matters. Having standards for cryptography is very important. This way, we have many groups of people analyzing, attacking, and determining the true strength of a cryptosystem or an individual cipher. The National Institute of Standards and Technology, or NIST, provides many of these standards for ciphers, hashes, and other cryptographic algorithms. In this video, we discussed why cryptography is difficult.
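The key-space arithmetic above is easy to check. Assuming a generous, hypothetical attacker testing 10^12 keys per second, a 56-bit key space falls in hours while a 256-bit key space does not fall at all in practical terms:

```python
# Key-space sizes as plain integers.
des_keys = 2 ** 56
aes_keys = 2 ** 256
print(des_keys)   # 72057594037927936

# Hypothetical attacker testing 10^12 keys per second; on average a
# brute-force search covers half the key space before succeeding.
rate = 10 ** 12
print(des_keys / 2 / rate / 3600, "hours for a 56-bit key")   # ~10 hours
print(aes_keys / 2 / rate, "seconds for a 256-bit key")       # astronomically large
```

Every additional key bit doubles the work, which is why moving from 56 to 128 or 256 bits changes brute force from feasible to hopeless.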
Identifying the Current State of Cryptography
In order to have effective cryptosystem solutions in the marketplace, we need standards. That's where the National Institute of Standards and Technology comes in. NIST deals with many different standards, even outside of Information Technology. But when we look at Information Technology, and more specifically the Computer Security Resource Center available through NIST, we find a number of standards that are adhered to by vendors. US government agencies adhere to FIPS – the Federal Information Processing Standards. These are standards that outline how data is dealt with, which encryption algorithms are recommended for use by US government agencies, and so on. Things like encryption algorithms then have to go through certain testing with NIST. For example, FIPS 197 deals with the Advanced Encryption Standard, whereas FIPS 186-4 deals with the Digital Signature Standard, or DSS. Digital signatures, when used with email for example, allow us to ensure that a message has not been tampered with and that it really came from who it says it came from. There is also a validation program for cryptographic modules on the NIST website.
The NIST website also has a list of validated cryptographic modules, where we can go through a table to see which ones are considered safe for use in the market. New algorithms can also be submitted and verified through NIST. There are other sources and validations for IT security as well, such as Common Criteria – often referred to simply as CC – which is standardized as the ISO/IEC 15408 IT security standard. This allows us to make sure that IT products meet very specific security specifications – in other words, a minimum security standard. NIST also provides lists of Known Answer Test vectors. These Known Answer Test vectors, available on the NIST website, can be used to verify the functionality and the security of a cryptographic algorithm implementation. In this video, we discussed the current state of cryptography.
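A Known Answer Test reduces to one comparison: run a fixed input through the implementation and compare against the published expected output. The SHA-256 digest of "abc" is a standard published test vector:

```python
import hashlib

# Known Answer Test: hash a fixed input and compare with the published
# expected digest (the standard SHA-256 test vector for "abc").
expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
actual = hashlib.sha256(b"abc").hexdigest()
print(actual == expected)   # True: the implementation passes this KAT
```

Real validation suites run many such vectors, including edge cases like empty and multi-block inputs, before a module is considered conformant.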
Describing Export Controls on Cryptography
The main reason that nations apply export controls on hardware and software crypto is national security threats. The Wassenaar Arrangement of the 1990s applies export controls among participating nations to dual-use goods and technologies. Dual use means items that are built for civilian as well as military use. There are 41 participating countries in the Wassenaar Arrangement. However, in some countries, items such as medical devices can be exempt from export controls despite the fact that they might have some crypto support built into them. The same is true for open source software: in some countries, it can be exempt from export controls because the source code is made publicly available, so everybody knows how the security mechanisms are built.
Most countries have some form of export or import control over cryptosystems, although the United States has relaxed its policies significantly since the 2000s. There may still be legal issues to overcome when exporting software or hardware with cryptographic capabilities, depending on the country of origin and the country of destination. In some countries, exporting products that contain strong modern crypto might not be illegal, but you would have to register for a license to export them. In some cases, countries have laws that can force suspects to decrypt, or provide the keys to decrypt, encrypted data. In the United States of America, export controls on cryptography were historically governed by the International Traffic in Arms Regulations, or ITAR; most commercial encryption products are now governed by the Export Administration Regulations, or EAR. In Canada, export controls are primarily governed by the Export and Import Permits Act. In this video, we discussed export controls on cryptography.
Describing How Cryptography Provides Confidentiality
Confidentiality ensures that data isn't disclosed to unauthorized parties. So, when we transmit data over a network – if it's encrypted – we are providing confidentiality of that transmission. The same thing applies if we're saving a file on disk and then encrypting it. We are providing confidentiality by way of encryption. Confidentiality was the primary purpose for which cryptography was originally used. And that continues today. Encryption requires the use of a secret, which is a key or a code, in order to encrypt and decrypt a message. So that might come in the form of a passphrase that a user must type in before they're allowed to connect to an encrypted Wi-Fi network. Or we could have a key embedded on a smart card that the user must swipe before gaining access to a building or a secured computer system.
Confidentiality usually comes with a term – a time frame in which the information must be protected. For example, a credit card has a valid lifetime of approximately three to five years. So using a cipher that would take 20-plus years to break by today's computing means should be good enough to use for a credit card. As another example, a battlefield message from the commander to the front line only needs to be protected for the length of time that it takes from the creation of the message until the operation is over. So, in that case, a cipher that would take hours to crack might be considered sufficient. Confidentiality is generally between two parties – the sender and the receiver. However, cryptosystems have been built to split keys between more than two parties. Cryptosystems have also been built to allow multiple parties, each with their own unique key, to access the same data.
Now the loss of the key should make the data unrecoverable in any reasonable expectation of time. So, when we implement confidentiality through encryption – whether it's network encryption or file encryption – it's important that we have some kind of a recovery key that a trusted party can generate. Otherwise, loss of the key that's used to decrypt encrypted data means that data is not accessible by anybody. Let's take a look at what network traffic that isn't encrypted looks like. Here, on Wireshark, I am going to go to the Analyze menu and choose Follow TCP Stream. Here I've captured traffic that is not encrypted. So I can see, there was a login for user root and that the password was a variation of the word password. I can also then see another user logging in as ccbbllaacckkwweellll, again, with a password of Pa$$wOrd. Here there is no data confidentiality. It's in plaintext and has not been encrypted. In this video, we discussed confidentiality.
Recognizing the Need for Data Integrity
Data integrity lets us detect changes to data. Those changes could be malicious, such as an attacker altering a message, or accidental, like data corruption during transit. But data integrity isn't limited to network transmissions. We could use data integrity to verify whether a record in a database has been tampered with, or to verify that a file stored on disk has not changed. Although encryption can make a message look like random, scrambled data, it doesn't generally provide a method to ensure the data has not been tampered with – and that's where data integrity comes in. Encryption and integrity are separate concerns. Hashing can be used to detect file changes. For example, a unique hash can be generated from a file. If changes are then made and saved to that file, generating a hash once again from the same file produces a second hash that differs from the first, because the file has changed. That means the file is in a different state than it was when we first took the hash. Let's take a look at this in the Windows environment.
Here in Windows, I've got a text file called userstopurge.txt that contains a couple of user email addresses. In my MD5 hashing program, I'm going to click the Browse button to select that file. When I open it, the program generates a unique hash. We can see the unique hash value has indeed been generated, so I'm going to select it, right-click, and copy it to the Windows Clipboard. Now I'm going to make a change to the file – let's say I delete an entry – and save that change. Back in my hashing program, I'll browse for the file to reopen it, and it generates a hash. I'll paste the original hash down below. We can tell just by looking at the screen that those hashes do not match, but I can click the Verify button in the application and, of course, it tells us what we already know: the original and current hashes do not match. So what is this telling us? It's telling us that the file has changed.
A Notepad file and a dialog box are open. The Notepad file is titled userstopurge.txt. The dialog box is titled WinMD5Free v1.20. Running along the top of the userstopurge.txt Notepad file is a menu bar. The menu bar consists of multiple menus, some of which include File, Edit, Format, and Help. The userstopurge.txt Notepad file includes the following text:
maDaysOfWeek: 0
maLockMode: nolock
maKind: generic
maStaticUsers: [email protected]
maStaticUsers: [email protected]
The WinMD5Free v1.20 dialog box includes three text fields: Select a file to compute MD5 checksum, Current file MD5 checksum value, and Original file MD5 checksum value. The "Select a file to compute MD5 checksum" text field is blank. The "Current file MD5 checksum value" text field contains the text "n/a" by default. The "Original file MD5 checksum value" text field contains the text "paste its original md5 value to verify" by default. The Browse button is associated with the "Select a file to compute MD5 checksum" text field. The Verify button is associated with the "Original file MD5 checksum value" text field. The presenter clicks the Browse button and the Open dialog box is displayed. The Open dialog box includes the userstopurge.txt file. The Open dialog box also includes two buttons, Open and Cancel. The presenter selects the userstopurge.txt file and clicks the Open button. As a result, the Open dialog box closes and the "Select a file to compute MD5 checksum" and "Current file MD5 checksum value" text fields in the WinMD5Free v1.20 dialog box auto populate with the location of the file and the checksum value of the file, respectively. The presenter selects the checksum value of the file and right-clicks it. A shortcut menu is displayed. The shortcut menu includes multiple shortcut menu options, some of which include Copy, Right to left Reading order, and Show Unicode control characters. The presenter selects the "Copy" shortcut menu option to copy the checksum value of the file to the Windows Clipboard. The presenter now makes some text changes in the userstopurge.txt Notepad file and clicks the File menu. Multiple menu options are displayed, some of which include New, Open, and Save. The presenter clicks the Save menu option to save the changes to the userstopurge.txt Notepad file. The presenter now reopens the userstopurge.txt Notepad file in the WinMD5Free v1.20 dialog box. 
This time, the "Current file MD5 checksum value" text field auto populates with a different checksum value. The presenter replaces the text in the "Original file MD5 checksum value" text field by the checksum value that he had previously copied to the Windows Clipboard. The presenter clicks the Verify button. The WinMD5Free dialog box is displayed. The dialog box includes two checksum values, Original and Current. It also includes the text "NOT Matched!"
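The hash-verification workflow just demonstrated can be reproduced with Python's hashlib. The file name and contents below are stand-ins for the demo file, and MD5 is used only to mirror the demo; for new designs, SHA-256 is preferred since MD5 is no longer collision-resistant:

```python
import hashlib
import os
import tempfile

def file_hash(path: str) -> str:
    # Hash the file contents; any change produces a different digest.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

# A temporary file stands in for the demo's userstopurge.txt.
path = os.path.join(tempfile.mkdtemp(), "userstopurge.txt")
with open(path, "w") as f:
    f.write("maStaticUsers: alice@example.com\n")   # hypothetical entry
before = file_hash(path)

with open(path, "a") as f:
    f.write("maStaticUsers: bob@example.com\n")     # simulate an edit
after = file_hash(path)
print(before != after)   # True: the change is detected
```

As in the WinMD5 demo, the original digest must be stored somewhere trustworthy; if an attacker can modify both the file and the recorded hash, the check proves nothing.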
The result of hashing is also sometimes called a message digest, so hash value and message digest are synonymous. Message digests used with encryption can provide a check to validate whether a message has been altered. Some modes of operation for block ciphers also reduce the risk of data being altered undetected. These block cipher modes use the previous block of data as input to the next block, so a change to one block would be reflected in all of the following blocks. However, this would not detect a truncation of the message. In this video, we discussed data integrity.
Defining Cryptography Authentication
Authentication is the ability to verify the authenticity of a message. Authentication in cryptography can come in two forms. The first is verifying that the sender really sent the message. This can be done by the sender using the unique private key that was issued to them to sign a message. Only the sender has access to their private key – nobody else does. The recipient can use the mathematically related public key of the sender to verify the signature created with the private key. The second form of authentication within crypto is verifying that the receiver received the correct message. So not only do we want to ensure the message came from who it says it came from, we also want to make sure the message was not tampered with. Message authentication codes, or MACs, can be used to authenticate messages. A MAC is an additional string of data that's used to verify the authenticity of the message. A common MAC is the hash-based message authentication code, or HMAC. HMAC can provide both data integrity and authentication.
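An HMAC round trip can be sketched with Python's standard hmac module. The key and messages here are hypothetical; the point is that recomputing the tag verifies both integrity and knowledge of the shared key:

```python
import hashlib
import hmac

key = b"shared-secret"                     # hypothetical key, shared out of band
message = b"transfer 100 to account 42"    # hypothetical message

# The sender computes a tag over the message and sends both.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# The receiver recomputes the tag; compare_digest avoids timing leaks.
genuine = hmac.compare_digest(
    tag, hmac.new(key, message, hashlib.sha256).hexdigest())
forged = hmac.compare_digest(
    tag, hmac.new(key, b"transfer 999 to account 13", hashlib.sha256).hexdigest())
print(genuine, forged)   # True False
```

Because both parties hold the same key, an HMAC proves the message came from someone who knows the key, but unlike a digital signature it cannot prove which of the two parties produced it.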
Asymmetric key encryption uses a mathematically related public and private key pair that would be issued, for example, to every user. This can also provide authentication, because the private key uniquely identifies – in our example – a user, and only that user has access to their private key. Conversely, everybody has access to everybody else's public keys, which are used to verify signatures. Session management is a form of authentication where session keys can be used once authentication has been completed. These session keys then authorize access to some kind of network resource. Sessions should also contain a session counter – a number embedded in each message to ensure that a message can't be replayed and to allow one or both sides of the connection to know when a message failed to be delivered. In this video, we discussed authentication.
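A session counter for replay protection, as described above, might look like this minimal sketch (the class and method names are my own, not from any protocol library):

```python
class SessionReceiver:
    # Accept a message only if its counter is strictly greater than the
    # last counter seen, so replayed messages are rejected.
    def __init__(self):
        self.last_counter = 0

    def accept(self, counter: int, payload: str) -> bool:
        if counter <= self.last_counter:
            return False                 # replayed or stale message
        self.last_counter = counter
        return True

receiver = SessionReceiver()
print(receiver.accept(1, "hello"))   # True
print(receiver.accept(2, "world"))   # True
print(receiver.accept(2, "world"))   # False: replay detected
```

In a real protocol the counter travels inside the authenticated portion of each message, so an attacker cannot simply bump the counter on a captured packet.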
Applying Non-repudiation to Cryptography
Non-repudiation is a legal term used to indicate that a statement or a document was signed or made by an individual. In cryptography, it's a method of ensuring that a message was sent or encrypted by a specific entity. This is done via a cryptographic digital signature. Digital signatures are created with a unique private key that gets issued to an entity, like a user or a computing device, so the signature could only have been created by the owner of the private key. Naturally, it's crucial that private keys are properly protected in order for this process to be trusted. The purpose of non-repudiation in cryptography is to protect both the sender and the receiver. The sender can verify that they sent the message to the correct receiver, and the receiver can verify the sender of the message. Depending on the implementation, it can also let the sender know that the message was received. Non-repudiation is usually used for digital documents and e-mail messages. For example, in our mail program, there is often a button we can click before sending a message to sign it.
Asymmetric encryption uses unique public and private key pairs that are issued to either computers or users. Non-repudiation usually comes in the form of using a private key to create a unique signature that gets verified on the other end with the mathematically related public key. There are still some risk windows, such as when a key is exposed or while a key is being rotated, so it's very important that private keys be kept safe. Public keys are called public keys because they can be made public to everybody; they don't need to be kept safe – the private key does. Non-repudiation is implemented with digital signatures. NIST, the National Institute of Standards and Technology, approves three digital signature algorithms – DSA, RSA, and ECDSA. In this video, we discussed non-repudiation.
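A digital signature can be sketched with textbook RSA and deliberately tiny numbers (p = 61, q = 53, a classic teaching example; real keys are 2048 bits or more and use proper padding schemes such as PSS):

```python
# Textbook RSA signing with tiny teaching numbers - NOT secure as written.
p, q = 61, 53
n = p * q                  # 3233, the public modulus
e = 17                     # public exponent
d = 2753                   # private exponent: (e * d) % ((p-1)*(q-1)) == 1

def sign(message_hash: int) -> int:
    # Only the holder of the private exponent d can compute this.
    return pow(message_hash, d, n)

def verify(message_hash: int, signature: int) -> bool:
    # Anyone can verify using the public pair (e, n).
    return pow(signature, e, n) == message_hash

h = 1234                   # stand-in for a message hash reduced mod n
sig = sign(h)
print(verify(h, sig))      # True
print(verify(h + 1, sig))  # False: an altered message fails
```

The asymmetry is the whole point: verification needs only the public key, so anyone can check the signature, yet only the private-key holder could have produced it – which is what gives the signature its non-repudiation property.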
Distinguishing between Block and Key Sizes
Block ciphers are designed to work on blocks of data to be either encrypted or decrypted. The data needs to be split into sections that match the block size of the algorithm. The block size is usually a fixed value, such as 64 or 128 bits. Because the algorithm requires a full block of data to work on, if the input data is less than the block size, the input data needs to be padded. The key size of an algorithm, also called the key length, is the number of bits in the key used by that cryptographic algorithm, and it relates to the strength of the algorithm. Generally speaking, a larger key size means greater strength. Block algorithms can support multiple key sizes, but they usually have a single block size. Let's look at some examples, starting with symmetric algorithms. AES supports 128, 192, and 256-bit key sizes, whereas DES supports a 56-bit key. 3DES supports a 168-bit key size, but in practice, it's really equivalent to 112 bits.
For asymmetric algorithms, the key sizes will vary. For example, we might have a 1024-bit key up to a 4096-bit key. A 1024-bit asymmetric key is roughly equivalent to an 80-bit symmetric key. If we compare algorithm block and key sizes, we get an idea of their strength. For example, with the AES algorithm, the block size is 128 bits and the key sizes are 128, 192, and 256 bits. If we were to look at DES, its block size is 64 bits and its key size is 56 bits; Triple DES keeps the 64-bit block but supports 112- and 168-bit keys. SHA-2 – the Secure Hash Algorithm – has block sizes of 512 or 1024 bits, and its output sizes are 224, 256, 384, and 512 bits (as a hash algorithm, it has digest sizes rather than key sizes). In this video, we discussed block and key sizes.
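The reason a larger key size means greater strength is simple arithmetic – each additional key bit doubles the number of keys a brute-force attack must try:

```python
# Key space grows exponentially with key length: every extra bit
# doubles the number of possible keys an attacker must search.
des_keys = 2 ** 56    # DES key space
aes_keys = 2 ** 128   # AES-128 key space

print(f"DES keys:     {des_keys:.2e}")
print(f"AES-128 keys: {aes_keys:.2e}")
# AES-128's key space is 2^72 times larger than DES's.
print(f"ratio: 2^{(aes_keys // des_keys).bit_length() - 1}")
```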
Using Padding
Many cryptographic block algorithms require that data be a fixed block length, although some modes of operation don't require this. For those that do, padding adds additional data to the end of the message to fit the block size. This means we can end up with encrypted data taking up more room than the original message itself. For example, if we've got a block size of 16, yet our original message is the text "hello world," we're going to need to add some padding beyond the text "hello world" to meet our block size. And there are various ways in which this can be done. With zero padding – sometimes called null padding – all of the padded bytes are set to a value of zero. This should only be used for text-based messages or when we know the message length. With binary data, the 0s can be confused with real data. And this is a problem. For example, if our original message is 101000, to pad it using this method, we would add 0s at the end of our message up to 16 bits.
Now, when the padding gets removed, we have a problem. Because the padding is all 0s, therefore, all of the 0s would be removed, leaving us with only 101. But the problem is the original message was 101000. With bit padding, a single bit is set to a value of 1. And it's added to the end of the message followed by all of the other bits set to a value of 0. Knowing the length of the data or having additional checks for invalid decryption is required. And this can be done with an extra block with fake padding. For example, let's say our original message is 101001. With bit padding, a binary 1 is added after our original message followed by binary 0s to meet the block size. No padding will be required if we have a message that is already 16 bits long and that's our block size. But assume that the last two binary digits are 10. This can be difficult because that looks like it's padding that needs to be removed when it's actually part of the original message.
However, if we know the length of the original data, then we wouldn't confuse that with padding because we would know the entire message was 16 bits long. With the byte padding mechanism – otherwise called PKCS#7 – we work with bytes rather than individual bits. Byte padding calculates the number of padding bytes that are required and fills the last bytes with this value. The padding value will therefore always be between 1 and the block length. For example, let's say our original message is the alphabetic characters A through to and including K. But we must pad it up to 16 bytes. So we're missing five placeholders. Therefore, with byte padding, 5 is the value that would be used for padding. Note that in this example, the five padding bytes aren't the ASCII character "5," but rather the binary representation of the number 5. ISO 10126 calculates the number of padding bytes and puts this number in the last byte, with random bytes filling in the rest of the empty spaces. Now, if the data ends exactly on the block size boundary, then depending on the implementation of the algorithm, an extra block of padding may be required. For example, let's say for our message we've got A through to K inclusive. But it's got to be padded once again to 16 bytes.
In accordance with ISO 10126, we've got five placeholders. So therefore, a value of 5 is put in as the last byte. Now between the message and the last byte, we have a series of random bytes. But let's say we've got a 16-byte block size and our message happens to end with a byte whose value is 11. Since the last byte is 11, unpadding would remove 11 bytes of padding when really they're part of the original message. ANSI X.923 calculates the number of padding bytes and puts this number as the last byte of padding with 0s filling in the empty spaces. The last byte gets checked. And if it's less than the block size, it checks for the right number of 0s. If it's correct, then the padding is removed. Otherwise, it's just data that ended right on the block boundary. For example, if our original message once again consists of the letters A through to and including K, then we must pad it up to 16 bytes. With ANSI X.923, we've got five placeholders that must be padded. So therefore, the value of 5 is put as the last byte. And then we've got 0s between the original message and the last byte. In this video, we discussed padding.
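As a concrete sketch of the PKCS#7 byte padding described above (a minimal illustration, not a hardened implementation):

```python
# PKCS#7 byte padding: every padding byte holds the pad length.
# If the message already fills the block exactly, a full block of
# padding is added, so unpadding is never ambiguous.
BLOCK = 16

def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    pad_len = block - (len(data) % block)   # always 1..block, never 0
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(data: bytes) -> bytes:
    pad_len = data[-1]
    if pad_len < 1 or pad_len > len(data) or data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-pad_len]

# "A" through "K" is 11 bytes, so 5 padding bytes of value 5 are added.
padded = pkcs7_pad(b"ABCDEFGHIJK")
assert padded == b"ABCDEFGHIJK" + b"\x05" * 5
assert pkcs7_unpad(padded) == b"ABCDEFGHIJK"
```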
Formatting the Output
Text encoding is used to convert raw binary output into a text-friendly representation. And this is sometimes required by some applications or for readability. Encoding, however, does not protect data – that's what encryption is for. Both encryption and encoding are reversible whereas hashing values are not. For example, if we possess the correct encryption key, we can decrypt back to the original plaintext. The most common text encoding format for cryptographic operations is hex – this stands for hexadecimal or base 16 where we use characters 0 through to 9 as well as A through to F where A would equal 10, B would equal 11, and so on. With hexadecimal, each byte is converted into two alphanumeric characters. For example, 255 in decimal would equate to FF in hexadecimal.
"Hello World" would encode to 48 65 6C 6C and so on. Encrypted or hashed raw data can't be easily stored, for example in a database, or even transferred using e-mail. ASCII control characters – those with decimal values less than 32 – and ANSI characters from 128 to 255 may display as junk characters or not be displayed at all. So text encoding solves these types of issues. There are many encoding formats available – some common ones include base16, base32, base64, and uuencoding. These usually make the text longer than the original raw data. Base64, for example, grows data by about one third – still shorter than the hexadecimal equivalent, which approximately doubles the size. In this video, we discussed text encoding.
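The size difference between hex and base64 encodings of the same data can be compared directly with Python's standard library:

```python
# Compare hex (base16) and base64 encodings of the same raw bytes.
import base64
import binascii

raw = b"Hello World"

hex_text = binascii.hexlify(raw).decode()   # two characters per byte
b64_text = base64.b64encode(raw).decode()   # four characters per three bytes

print(hex_text)   # 48656c6c6f20576f726c64
print(b64_text)   # SGVsbG8gV29ybGQ=

# Hex doubles the size; base64 grows it by only about one third.
assert len(hex_text) == 2 * len(raw)
assert len(b64_text) < len(hex_text)
```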
Using Nonces and the Initialization Vector
Initialization vectors are also called starting variables, nonces, or IVs. An IV is used when encrypting or decrypting multiple blocks of data. It's normally used on the first block to be encrypted. The IV is a random fixed-length string similar to a key. But it doesn't need to be protected like the key itself does. For block ciphers, the IV is the same length as the block length. So the primary purpose of an IV is to add randomization to each block being encrypted to prevent patterns from showing up in the encrypted data. For example, let's say we're encrypting the text ABCABC without the use of an IV. So ABCABC then might encrypt to A3E54EA3E54E. Then this isn't good because there is a repeatable pattern.
If we were to add an IV – an initialization vector – then the encrypted data would look more random. This way there wouldn't be a repeatable pattern. The initialization vector can also be unique for each block encrypted. In this case, it's referred to as a nonce. Nonces are integers and may increment like a counter for each block of data. Nonces are commonly used for disk encryption where the nonce is the sector of the disk. But there are some bad implementations of IVs notably with Wi-Fi encryption WEP. WEP stands for Wired Equivalent Privacy and it uses IVs. However, the problem is that the IV is only 24 bits long. So there are not that many variations. And you're pretty much guaranteed to have a repeatable pattern given enough traffic. In this video, we discussed initialization vectors.
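A toy sketch (not a real cipher) can illustrate the pattern problem. Here each 3-byte "block" is scrambled by hashing it together with the key; without an IV, repeated plaintext blocks produce repeated ciphertext blocks, while chaining through a random IV removes the pattern:

```python
# Toy illustration of why an IV matters (NOT a real cipher).
# Each 3-byte block is "encrypted" by hashing it with the key,
# optionally chained through the previous ciphertext like CBC.
import hashlib
import os

def encrypt_blocks(plaintext: bytes, key: bytes, iv: bytes = b"") -> bytes:
    out, prev = bytearray(), iv
    for i in range(0, len(plaintext), 3):
        block = hashlib.sha256(key + prev + plaintext[i:i + 3]).digest()[:3]
        out += block
        if iv:                      # chain only when an IV is supplied
            prev = block
    return bytes(out)

key = b"secret"

# Without an IV, identical plaintext blocks produce identical
# ciphertext blocks -- the ABCABC pattern leaks through:
no_iv = encrypt_blocks(b"ABCABC", key)
assert no_iv[:3] == no_iv[3:6]

# With a random IV and chaining, the pattern disappears, and the
# same message encrypts differently every time it's sent:
with_iv = encrypt_blocks(b"ABCABC", key, os.urandom(16))
assert with_iv[:3] != with_iv[3:6]
```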
Identifying and Using Entropy
A cryptographic cipher is only as strong as its key. The key is only as strong as the entropy that was used to create it. But what is entropy? Entropy is randomness. And the more of it, the stronger the key. We could say then that entropy adds disorder and confusion to data. Using passwords to generate keys presents a weakness because rather than attacking the cipher or the key, an attacker would find it easier to attack the password that was used to create the key. Effectively, this has reduced the key space from the key size all the way down to the password strength. We should be using multiple sources of randomness to generate keys that are effectively secure. A weakness in any source of entropy reduces the strength of the protected data because the key that protected the data is derived from the source of entropy. Using just a cipher and a key is not good enough if the key is not truly random.
Most operating systems contain a random number generator that uses operating system events to gather entropy or randomness from various sources such as mouse movements, keyboard typing, network communications, memory usage, audio noise, disk drive timings, and so on. Most systems can use a pseudorandom number generator, often called a PRNG. These randomness generators use a seed and cryptographic algorithms to then generate a sequence of data that approximates the properties of real random data. In this video, we discussed entropy.
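In Python, for instance, the contrast between the OS entropy source and a seeded PRNG looks like this (a minimal sketch; the `random` module must never be used for real keys):

```python
# The OS random source gathers entropy from hardware and system
# events, while a PRNG expands a seed deterministically --
# the same seed always yields the same sequence.
import os
import random
import secrets

key_material = os.urandom(32)     # 256 bits from the OS entropy pool
assert len(key_material) == 32

prng_a = random.Random(42)        # seeded PRNG (NOT for real keys)
prng_b = random.Random(42)
# Identical seeds produce identical "random" output:
assert prng_a.getrandbits(128) == prng_b.getrandbits(128)

# For actual key material, use a cryptographically strong source:
key = secrets.token_bytes(16)     # 128-bit symmetric key
assert len(key) == 16
```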
Creating or Generating Keys
For most cryptography, there are two types of keys. The first is a symmetric key. This is a single shared or secret key. And all communicating parties that wish to communicate in a secured manner must have knowledge of this secret key. The problem is that the key needs to be safely distributed to communicating parties in the first place. Symmetric keys are often used with file encryption and VPN tunnel establishment to name just a few uses. Asymmetric keys are mathematically related public and private key pairs. The public key can be made available to everybody, but the private key must be available only to the owner. Asymmetric keys are used with digital signatures for documents or e-mail messages as well as for e-mail message encryption. Key generation can occur within an operating system. For example, in Linux, we can use the /dev/random device to generate keys. In Linux, we could also use the ssh-keygen command to generate keys used for SSH authentication. SSH allows Linux administrators to remotely connect over the network to the host for command line administration.
On the Windows side, in PowerShell, we could use the System.Security.Cryptography provider to generate keys. A PKI Certificate Authority could be used to generate keys as well. We could have an internal Certificate Authority or we could have keys generated from a trusted third-party Certificate Authority. Either way, the Certificate Authority or CA generates PKI certificates, which are also called X.509 certificates. A unique public and private key pair is issued to each entity, such as a user or a computer. The public key is stored within the certificate, while the private key is kept separately by the owner. The keys are mathematically related to one another. Let's take a look, for example, at how asymmetric keys get created. First, two large prime numbers must be chosen. In our example, they are denoted as p and q. Then we must calculate a value, which we'll call n, where n equals p multiplied by q. The mathematical calculations continue from there. The idea is that, while multiplying two large primes is easy, factoring their product back into those primes is computationally infeasible. So, when we build keys from large prime numbers, we have a stronger key. Therefore, we end up with stronger encryption. In this video, we discussed how cryptographic keys are generated.
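The key-generation steps just described can be walked through with textbook RSA and deliberately tiny primes (real keys use 1024 to 4096 bits, and production code should rely on a vetted library rather than this sketch):

```python
# Sketch of textbook RSA key generation following the steps above,
# with tiny primes for readability (NOT secure at this size).
p, q = 101, 113               # step 1: choose two large primes
n = p * q                     # step 2: n = p * q  -> 11413
phi = (p - 1) * (q - 1)       # Euler's totient of n -> 11200
e = 3                         # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: e*d = 1 (mod phi)

public_key = (e, n)           # can be shared with everybody
private_key = (d, n)          # must be kept secret by the owner

# Round trip: encrypt with the public key, decrypt with the private key.
message = 65
ciphertext = pow(message, e, n)
assert pow(ciphertext, d, n) == message
```

Recovering d from the public key alone would require factoring n back into p and q, which is what becomes infeasible at real key sizes.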