Low-Latency Lightweight Ciphers

For the sake of brevity, I'm going to assume you've read my previous post which introduced the concept of encryption. This blog post will not rely as heavily on Galois field arithmetic, so there is no need to reiterate that section.

In almost all areas of computer science, there is something known as the space vs. time tradeoff which governs the laws of efficiency. A prime example of the space vs. time tradeoff was in my AES article, where I provided code for Galois field multiplication as well as a look-up table. Anyone who incorporated the Galois field multiplication (xtimes() and gmult()) into their code would have seen their program run a bit slower than a program which utilized the look-up tables. However, if you slowed time down to the nanosecond scale and observed the processes' memory addresses, you'd notice that the program with look-up tables uses considerably less space.

The decision of whether to use a space-efficient implementation or time-efficient implementation is really a matter of use-case. For example, the largest you will likely see an AES software implementation is a little over a dozen kilobytes (kB). Thus, any software on your computer that claims to encrypt data is likely using a time-efficient algorithm since your computer has plenty of memory to spare. In terms of hardware, space-efficiency also includes the physical area which an algorithm takes up on a computer chip (in terms of number of gates, etc).

Yet, as the Internet of Things (IoT) becomes increasingly more relevant in our everyday lives (my dad, for example, has recently been going on a frenzy for Nest products), it becomes apparent that not all electronic devices have memory to spare. For anyone with an E-ZPass, consider the small RFID chip inside which makes all the transactions for you. Much like a credit card, you wouldn't want anyone accessing the your bank information whenever you make a transaction (that is, drive through an E-ZPass gate).

Examples like the E-ZPass above are what inspired the creation of lightweight ciphers. However, for the E-ZPass example in particular, a lightweight cipher doesn't do us much good if it can't encrypt our data in the time it takes to drive through a gate at 35 miles-per-hour (the speed limit signs are optional, right?). Thus, we turn our attention to the subcategory of low-latency lightweight ciphers.

But wait a minute — doesn't the whole idea of something being both low-latency and lightweight contradict our space vs. time tradeoff? Well, not exactly. You see, the space vs. time tradeoff is relative to a single algorithm — it merely tells us that, given a program, we can trade some of our speed for a smaller memory footprint or vice versa. Imagine that the space vs. time tradeoff is a teeter-totter (or seesaw, but if you call it a teeterboard get the hell off my website) and the height of one side represents space while the height of the other side represents time. The space vs. time tradeoff does not say anything about the height of the center of our teeter-totter or the teeter-totter's length — simply that the teeter-totter exists.

A visual depiction of the space-time tradeoff using a teeter totter

Extending our teeter-totter analogy to low-latency lightweight ciphers, the center of the low-latency lightweight cipher's teeter-totter would be at a low height so that, when balanced, the time costs and space costs are not too high. However, push one side to the limits, and the cipher may no longer be considered low-latency or lightweight.


Despite having a propensity for rambling on technical topics, I didn't just introduce low-latency lightweight ciphers for the sake of introducing low-latency lightweight ciphers. As it happens, these two properties are essentially PRINCE's claim-to-fame.

For a little background, PRINCE was originally developed in 2012 by a group of researchers from Denmark, Germany, and Belgium. The majority of the researchers were associated with the Technical University of Denmark, while the remaining were either at Ruhr University or employed by NXP Semiconductors. The researchers' goal was to conceive some sort of block cipher which would be significantly smaller on hardware than AES-128 and PRESENT (a cipher which I have not discussed).

In terms of their goal, I would say the researchers did a pretty good job at developing a lightweight cipher. Even when AES-128 is loop unrolled, PRINCE takes up 14-15 times less area on hardware than AES-128. In terms of software, I tested both encryptions on a PIC16F1947 microcontroller and got the following results:

Cipher Run-Time Memory (bytes) Key Schedule (clock cycles) Encryption (clock cycles) Decryption (clock cycles)
AES-128 268 14799 24216 27438
PRINCE 74 366 18214 18722

Note that this data is not as significant since many of the aspects, such as run-time memory, are implementation-dependent. That is, two programmers could implement the exact same cipher and get slightly different results as far as run-time memory and clock-cycles.

So where's the catch? We have a cipher that is no doubt more efficient than AES — why don't we just use this one all the time? If you remember from my "national bank versus mattress" analogy from the previous post, AES is like putting your money in a national bank whereas PRINCE is like hiding it in your mattress. The former takes more time and is an arduous process, but has excellent security measures. The latter, on the other hand, is quick and easy but prone to theft in some cases.

In reality, the analogy between hiding money in a bank vs. in a mattress only goes so far. There ARE numerous cryptanalytic attacks on PRINCE, but unless you have a doctorate in applied mathematics or computer science you will likely have a tough time implementing them. The attack that I'm most familiar with is known as reflection cryptanalysis, which builds upon the fact that a linear involution \(f: GF(2^n) \to GF(2^n) \) must have \(2^{n/2}\) fixed points. From there, you simply use the \(\alpha\)-reflection property of PRINCE at the very middle round of the encryption to begin recovering the key 4 bits at a time. However, there are numerous meet-in-the-middle (MITM) attacks which are significantly more efficient in recovering the key. Moreover, all these attacks are practical (that is, they do not surpass the \(2^{80}\) complexity threshold which makes an attack theoretical), so they can be effectively performed by modern computers. To a cryptanalyst, this basically means the PRINCE cipher is broken.

At this point, you may be asking yourself why I'm telling you about some cipher that's already been broken — well, there are two reasons:

  1. A primary principle of hacking is to never try to break the hardest part of an algorithm (i.e. the encryption itself). For that reason, attacks performed by government organizations or private groups almost always go after design flaws in the underlying hardware (this is known as a side-channel attack). In fact, this is the reason for the meltdown attack's notoriety. I may try to do a separate post on the meltdown attack if I'm feeling up to it, but in essence the meltdown attack exposes the pipeline (more accurately, the race condition in speculative execution) and cache structure that MIPS originally laid out in the 90's.

  2. Security can be use-case dependent. PRINCE was never deemed as an encryption standard by NIST, so trying to sell software to the government which utilizes the cipher wouldn't go well. However, small applications for the everyday use-case (such as encrypting car battery information or RFID tags) would benefit much more from PRINCE's minimal hardware footprint than from alternative ciphers. In addition, any attack on PRINCE itself still takes a fair amount of time, so incorporating a well-timed random key generator and key-policy could counteract the cipher's security deficit.

Hopefully those two reasons are enough to convince you the algorithm is worth looking at. With that said, it's time to discuss the algorithm.

The Cipher

Much like AES from the previous post, the PRINCE cipher is a Substitution-Permutation Network (SPN) that iterates over a fixed number of rounds. However, unlike AES, PRINCE only comes in one key size: 128 bits. Moreover, the amount of data which is encrypted is fairly small — only 64 bits (though this is typical of a lightweight cipher). The reason we are encrypting half the block size is actually because all of our operations only work on nibbles — that is, we no longer consider \( GF(2^8) \), but instead \(GF(2^4)\).

Not considering the round functions, the encryption is fairly straightforward. Instead of treating our data as a matrix (like AES does), we simply look at it as a fixed one-dimensional array. At the beginning of the cipher there is a key-whitening phase, followed by five normal rounds, a middle round (which, as I mentioned in the section above, is a linear inversion), five inverse rounds, and then another key-whitening phase. If you notice, this gives the encryption a nice symmetry to it (that also leads to reflective attacks, but whatever — it looks nice).

J. Borghoff, A. Canteaut, T. G¨uneysu, E. B. Kavun, M. Kneˇzevi´c, L. R. Knudsen, G. Leander, V. Nikov, C. Paar, C. Rechberger, P. Rombouts, S. S. Thomsen, and T. Yal¸cın. PRINCE - A Low-Latency Block Cipher for Pervasive Computing Applications, ASIACRYPT 2012, LNCS, vol. 7658, pp. 208 – 225, Springer 2012.

Each round by itself is simply four functions: SubNibbles(), MLayer(), AddRoundConstant(), and AddKey(). However, since the middle round is different from the other rounds, I'm also going to need to go over ShiftRows() and MPrimeLayer().

Another distinction from AES is that PRINCE doesn't really have much of a key-schedule. Given a 128-bit key, \(k\), PRINCE splits the key into two 64-bit halves \( k_0 || k_1 \) and uses the two halves to create a third 64-bit chunk \(k_0'\). We construct \(k_0'\) using the simple operation:

$$ k_0' = (k_0 \ggg 1) \oplus (k_0 \gg 63) $$

Now for each subroutine (except MPrimeLayer()) we will need to find an inverse so that the last five rounds of encryption are possible. With that said, let's take a detailed look at each of the cipher's subroutines.


As explained in the previous article for AES, the purpose of a substitution layer is to eliminate any linearity so that our cipher cannot be represented as a system of linear equations of any size. The SBox for AES was a large, 256-byte array which mapped every possible byte value to a new, distinct byte value.

We still desire the same effect of mapping all 256 possible byte values to a new distinct byte value; however, 256 bytes of data is a lot of unnecessary space if our goal is to make a lightweight cipher. Instead, what if we took every byte, cut it in half, and applied some sort of substitution box to each 4-bit chunk? In that case, we would only need an SBox of length \(2^4 = 16\). Note that as long as our SBox is injective on 4-bits, it will be injective on the concatenated 8-bits as well. This is because you would reach a contradiction if two distinct tuples of nibbles mapped to the same byte, since that would imply the mapping is not injective on the nibbles themselves!

The researchers had a few other security constraints when it came to choosing the SBox, so that in the end there were only 8 possibilities to choose from out of about 20 trillion. The SBox that was ultimately chosen for PRINCE is the following:

The idea of applying our SBox nibble-by-nibble is fairly simple if you know a bit about byte arithmetic:

Luckily, finding an inverse SBox is much easier when there are 16 elements as opposed to 256. Thus, one can simply find which index maps to each element and sort the results by ascending order:

The code for InvSubNibbles() is basically the exact same as SubNibbles() — in fact, we could save some space and use the same function with an additional argument to specify which SBox we want. I will leave this choice up to the reader; however, I will use two functions for the sake of clarity with respect to the overall cipher.


The only other notably difficult subroutine involved in PRINCE is the linear layer. The linear layer, as the name suggests, simply applies a linear transform to our data. Since we are encrypting a 64-bit state, we choose to represent this linear transform as a \(64 \times 64\) matrix. The developers of PRINCE designed the matrix such that the output of an SBox in one round influences at least 3 SBoxes in the next round. In addition, since the lightweight property of the cipher results from its reflectivity, we must ensure that the matrix is an inversion (its own inverse). The resulting matrix that was chosen can be constructed according to the following format:

$$ M_0 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, M_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} M_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, M_3 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} $$ $$ \hat{M}_0 = \begin{pmatrix} M_0 & M_1 & M_2 & M_3 \\ M_1 & M_2 & M_3 & M_0 \\ M_2 & M_3 & M_0 & M_1 \\ M_3 & M_0 & M_1 & M_2 \end{pmatrix}, \hat{M}_1 = \begin{pmatrix} M_1 & M_2 & M_3 & M_0 \\ M_2 & M_3 & M_0 & M_1 \\ M_3 & M_0 & M_1 & M_2 \\ M_0 & M_1 & M_2 & M_3 \end{pmatrix} $$ $$ M' = \begin{pmatrix} \hat{M}_0 & 0 & 0 & 0 \\ 0 & \hat{M}_1 & 0 & 0 \\ 0 & 0 & \hat{M}_1 & 0 \\ 0 & 0 & 0 & \hat{M}_0 \end{pmatrix} $$

This is where most developers begin to understand that this cipher was really meant for a hardware-level implementation and not a software-level implementation. In order to actually make the code attractive, one would have to waste clock cycles, program space, and potentially memory by creating additional storage variables for each operation - since this is a lightweight cipher, doing so would be incredibly counterintuitive. Thus, for this next code segment, I ask that all small children and software engineers please look away:


Much like AES, there is a ShiftRows() function which (partly) gives us the whole "permutation" part of our substitution-permutation network. However, PRINCE's version works a bit differently since we no longer imagine the state as a \(4 \times 4\) matrix of bytes. As I've briefly mentioned, we instead visualize the state as a one-dimensional array of 16 nibbles (in all honesty I don't know why they called it ShiftRows() if there's no concept of rows to begin with). Thus, the permutation we're applying to our state is the following:

$$ \tau = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 0 & 5 & 10 & 15 & 4 & 9 & 14 & 3 & 8 & 13 & 2 & 7 & 12 & 1 & 6 & 11 \end{pmatrix} $$

The ShiftRows() function doesn't actually get called by the cipher directly, but is used as a "helper function" for our MLayer() function - in fact, ShiftRows() is the only thing that distinguishes the MPrimeLayer() from MLayer(). With that said, here's the code for the ShiftRows() function:

In order to construct an inverse to our ShiftRows(), we simply employ the inverse permutation:

$$ \tau^{-1} = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 0 & 13 & 10 & 7 & 4 & 1 & 14 & 11 & 8 & 5 & 2 & 15 & 12 & 9 & 6 & 3 \end{pmatrix} $$

which gives rise to the following implementation for InvShiftRows():


Just like I mentioned like a paragraph ago or something, this function is nothing more than ShiftRows() applied after MPrimeLayer() ... so ... if you really want the code:

A simple rule of algebra is that, for functions \(f : X \to Y \) and \(g : Y \to Z\), we have \( (g \circ f)^{-1} = f^{-1} \circ g^{-1} \) (note that this allows the codomains to align). Since MPrimeLayer() is its own inverse, we get \( M^{-1} = (SR \circ M')^{-1} = M' \circ SR^{-1} \), which gives us the following code:


At this point it's worth mentioning the \( \alpha\)-reflection property of PRINCE. As I've discussed, PRINCE has a very nice symmetry which we anticipate will allow us to go back and forth between the encryption and decryption easily. It so happens that if we introduce just the right constant, \(\alpha =\) 0xc0ac29b7c97c50dd, we get:

$$ \text{Dec}(k_0 || k'_0 || k_1) = \text{Enc}(k'_0 || k_0 || k_1 \oplus \alpha) $$

Since there are numerous rounds, we can't merely introduce the value \(\alpha\) all at once. In addition, we want to preserve the reflective nature of PRINCE so that our last five constants summed with our first five constants give us \(\alpha\) (that is, \(RC_i \oplus RC_{11 - i} = \alpha \) for any \(0 \leq i \leq 11\) ). The nice property of XOR (which I mentioned in the previous article) is that we can simply add some number to \(\alpha\) in order to find out which two numbers sum up to \(\alpha\). In other words, it really doesn't matter which numbers we choose for the first five round constants. Thus, the developers of PRINCE decided to start at the \(17^{th}\) decimal in the hex expansion for \(\pi\):

$$ \pi = 3.243F6A8885A308D3\color{red}{13198A2E037073 \\ 44A4093822299F31D0082EFA9\\8EC4E6C89452821E638D01377\\BE5466CF34E90C6C}$$

Adding the round constants to our state is exactly as you'd expect:


Despite only having 192 bits of extended key, we only use the last 64 bits (that is, \(k_1\)) for encryption / decryption rounds. The first 128 bits are simply used for key whitening before and after the cipher.

Encryption Code

Now that we have all of our main components, it's time to put the pieces together. In order to utilize the full 128-bits of our key, we must first perform the key-whitening with \(k_0\) before anything else. Once that's complete, we proceed with five normal rounds, a middle round, and then five inverse rounds. We lastly perform key-whitening with \(k_0'\) and we are done!

Decryption Code

Oh, one last thing. I mentioned it in the AddRoundConstant() section, but our cipher is minimally designed so that, instead of having to write a whole new function to retrace our steps, we merely swap around our extended key! This nice little feature is a direct result of PRINCE's \(\alpha\)-reflection property, which tells us:

$$ \text{Dec}(k_0 || k'_0 || k_1) = \text{Enc}(k'_0 || k_0 || k_1 \oplus \alpha) $$

Thus, we merely need to swap \(k_0\) with \(k'_0\), add \(\alpha\) to \(k_1\), and encrypt! Here's the code:

And there you have it! Hopefully everything was straightforward enough — I understand I may have not fully explained some things such as clock-cycles and bit-wise arithmetic (primarily masks and shifts), but these are pretty easy to pick up. As always, thanks for reading! 😁