BACKPORT, FROMGIT: crypto: adiantum - add Adiantum support

Add support for the Adiantum encryption mode.  Adiantum was designed by
Paul Crowley and is specified by our paper:

    Adiantum: length-preserving encryption for entry-level processors
    (https://eprint.iacr.org/2018/720.pdf)

See our paper for full details; this patch only provides an overview.

Adiantum is a tweakable, length-preserving encryption mode designed for
fast and secure disk encryption, especially on CPUs without dedicated
crypto instructions.  Adiantum encrypts each sector using the XChaCha12
stream cipher, two passes of an ε-almost-∆-universal (εA∆U) hash
function, and an invocation of the AES-256 block cipher on a single
16-byte block.  On CPUs without AES instructions, Adiantum is much
faster than AES-XTS; for example, on ARM Cortex-A7, on 4096-byte sectors
Adiantum encryption is about 4 times faster than AES-256-XTS encryption,
and decryption about 5 times faster.

Adiantum is a specialization of the more general HBSH construction.  Our
earlier proposal, HPolyC, was also a HBSH specialization, but it used a
different εA∆U hash function, one based on Poly1305 only.  Adiantum's
εA∆U hash function, which is based primarily on the "NH" hash function
like that used in UMAC (RFC4418), is about twice as fast as HPolyC's;
consequently, Adiantum is about 20% faster than HPolyC.

This speed comes with no loss of security: Adiantum is provably just as
secure as HPolyC, in fact slightly *more* secure.  Like HPolyC,
Adiantum's security is reducible to that of XChaCha12 and AES-256,
subject to a security bound.  XChaCha12 itself has a security reduction
to ChaCha12.  Therefore, one need not "trust" Adiantum; one need only
trust ChaCha12 and AES-256.  Note that the εA∆U hash function is only
used for its proven combinatorical properties so cannot be "broken".

Adiantum is also a true wide-block encryption mode, so flipping any
plaintext bit in the sector scrambles the entire ciphertext, and vice
versa.  No other such mode is available in the kernel currently; doing
the same with XTS scrambles only 16 bytes.  Adiantum also supports
arbitrary-length tweaks and naturally supports any length input >= 16
bytes without needing "ciphertext stealing".

For the stream cipher, Adiantum uses XChaCha12 rather than XChaCha20 in
order to make encryption feasible on the widest range of devices.
Although the 20-round variant is quite popular, the best known attacks
on ChaCha are on only 7 rounds, so ChaCha12 still has a substantial
security margin; in fact, larger than AES-256's.  12-round Salsa20 is
also the eSTREAM recommendation.  For the block cipher, Adiantum uses
AES-256, despite it having a lower security margin than XChaCha12 and
needing table lookups, due to AES's extensive adoption and analysis
making it the obvious first choice.  Nevertheless, for flexibility this
patch also permits the "adiantum" template to be instantiated with
XChaCha20 and/or with an alternate block cipher.

We need Adiantum support in the kernel for use in dm-crypt and fscrypt,
where currently the only other suitable options are block cipher modes
such as AES-XTS.  A big problem with this is that many low-end mobile
devices (e.g. Android Go phones sold primarily in developing countries,
as well as some smartwatches) still have CPUs that lack AES
instructions, e.g. ARM Cortex-A7.  Sadly, AES-XTS encryption is much too
slow to be viable on these devices.  We did find that some "lightweight"
block ciphers are fast enough, but these suffer from problems such as
not having much cryptanalysis or being too controversial.

The ChaCha stream cipher has excellent performance but is insecure to
use directly for disk encryption, since each sector's IV is reused each
time it is overwritten.  Even restricting the threat model to offline
attacks only isn't enough, since modern flash storage devices don't
guarantee that "overwrites" are really overwrites, due to wear-leveling.
Adiantum avoids this problem by constructing a
"tweakable super-pseudorandom permutation"; this is the strongest
possible security model for length-preserving encryption.

Of course, storing random nonces along with the ciphertext would be the
ideal solution.  But doing that with existing hardware and filesystems
runs into major practical problems; in most cases it would require data
journaling (like dm-integrity) which severely degrades performance.
Thus, for now length-preserving encryption is still needed.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

(cherry picked from commit 059c2a4d8e164dccc3078e49e7f286023b019a98
 https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master)

Conflicts:
	crypto/tcrypt.c
	crypto/testmgr.c

(adjusted test vector formatting for old testmgr)

Bug: 112008522

Test: Among other things, I ran the relevant crypto self-tests:

  1.) Build kernel with CONFIG_CRYPTO_MANAGER_DISABLE_TESTS *unset*, and
      all relevant crypto algorithms built-in, including:
         CONFIG_CRYPTO_ADIANTUM=y
         CONFIG_CRYPTO_CHACHA20=y
         CONFIG_CRYPTO_CHACHA20_NEON=y
         CONFIG_CRYPTO_NHPOLY1305=y
         CONFIG_CRYPTO_NHPOLY1305_NEON=y
         CONFIG_CRYPTO_POLY1305=y
         CONFIG_CRYPTO_AES=y
         CONFIG_CRYPTO_AES_ARM=y
  2.) Boot and check dmesg for test failures.
  3.) Instantiate "adiantum(xchacha12,aes)" and
      "adiantum(xchacha20,aes)" to trigger them to be tested.  There are
      many ways to do this, but one way is to create a dm-crypt target
      that uses them, e.g.

          key=$(hexdump -n 32 -e '16/4 "%08X" 1 "\n"' /dev/urandom)
          dmsetup create crypt --table "0 $((1<<17)) crypt xchacha12,aes-adiantum-plain64 $key 0 /dev/vdc 0"
          dmsetup remove crypt
          dmsetup create crypt --table "0 $((1<<17)) crypt xchacha20,aes-adiantum-plain64 $key 0 /dev/vdc 0"
          dmsetup remove crypt
   4.) Check dmesg for test failures again.
   5.) Do 1-4 on both x86_64 (for basic testing) and on arm32 (for
   testing the ARM32-specific implementations).  I did the arm32 kernel
   testing on Raspberry Pi 2, which is a BCM2836-based device that can
   run the upstream and Android common kernels.

   The same ARM32 assembly files for ChaCha, NHPoly1305, and AES are
   also included in the userspace Adiantum benchmark suite at
   https://github.com/google/adiantum, where they have undergone
   additional correctness testing.

Change-Id: Ic61c13b53facfd2173065be715a7ee5f3af8760b
Signed-off-by: Eric Biggers <ebiggers@google.com>
6 files changed