aboutsummaryrefslogtreecommitdiff
path: root/crypto/Kconfig
AgeCommit message (Collapse)Author
2014-06-18crypto: create generic version of ablk_helperArd Biesheuvel
Create a generic version of ablk_helper so it can be reused by other architectures. Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> (cherry picked from commit a62b01cd6cc1feb5e80d64d6937c291473ed82cb) Signed-off-by: Mark Brown <broonie@linaro.org>
2013-06-05crypto: blowfish - disable AVX2 implementationJussi Kivilinna
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of workload (tested on Core i5-4570) and causes blowfish-avx2 to be significantly slower than blowfish-amd64. So disable the AVX2 implementation to avoid performance regressions. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-06-05crypto: twofish - disable AVX2 implementationJussi Kivilinna
It appears that the performance of 'vpgatherdd' is suboptimal for this kind of workload (tested on Core i5-4570) and causes twofish_avx2 to be significantly slower than twofish_avx. So disable the AVX2 implementation to avoid performance regressions. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of ↵Jussi Kivilinna
camellia cipher Patch adds AVX2/AES-NI/x86-64 implementation of Camellia cipher, requiring 32 parallel blocks for input (512 bytes). Compared to AVX implementation, this version is extended to use the 256-bit wide YMM registers. For AES-NI instructions data is split to two 128-bit registers and merged afterwards. Even with this additional handling, performance should be higher compared to the AES-NI/AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: serpent - add AVX2/x86_64 assembler implementation of serpent cipherJussi Kivilinna
Patch adds AVX2/x86-64 implementation of Serpent cipher, requiring 16 parallel blocks for input (256 bytes). Implementation is based on the AVX implementation and extends to use the 256-bit wide YMM registers. Since serpent does not use table look-ups, this implementation should be close to two times faster than the AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: twofish - add AVX2/x86_64 assembler implementation of twofish cipherJussi Kivilinna
Patch adds AVX2/x86-64 implementation of Twofish cipher, requiring 16 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Implementation also uses 256-bit wide YMM registers, which should give additional speed up compared to the AVX implementation. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: blowfish - add AVX2/x86_64 implementation of blowfish cipherJussi Kivilinna
Patch adds AVX2/x86-64 implementation of Blowfish cipher, requiring 32 parallel blocks for input (256 bytes). Table look-ups are performed using vpgatherdd instruction directly from vector registers and thus should be faster than earlier implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: aesni_intel - fix Kconfig problem with CRYPTO_GLUE_HELPER_X86Jussi Kivilinna
The Kconfig setting for glue helper module is CRYPTO_GLUE_HELPER_X86, but recent change for aesni_intel used CRYPTO_GLUE_HELPER instead. Patch corrects this issue. Cc: kbuild-all@01.org Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: aesni_intel - add more optimized XTS mode for x86-64Jussi Kivilinna
Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack usage and boost for speed. tcrypt results, with Intel i5-2450M: 256-bit key enc dec 16B 0.98x 0.99x 64B 0.64x 0.63x 256B 1.29x 1.32x 1024B 1.54x 1.58x 8192B 1.57x 1.60x 512-bit key enc dec 16B 0.98x 0.99x 64B 0.60x 0.59x 256B 1.24x 1.25x 1024B 1.39x 1.42x 8192B 1.38x 1.42x I chose not to optimize smaller than block size of 256 bytes, since XTS is practically always used with data blocks of size 512 bytes. This is why performance is reduced in tcrypt for 64 byte long blocks. Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: add CMAC support to CryptoAPIJussi Kivilinna
Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI. This work is based on Tom St Denis' earlier patch, http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2 Cc: Tom St Denis <tstdenis@elliptictech.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: gcm - make GMAC work when dst and src are differentJussi Kivilinna
The GMAC code assumes that dst==src, which causes problems when trying to add rfc4543(gcm(aes)) test vectors. So fix this code to work when source and destination buffer are different. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: sha512 - Create module providing optimized SHA512 routines using ↵Tim Chen
SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA512 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25crypto: sha256 - Create module providing optimized SHA256 routines using ↵Tim Chen
SSSE3, AVX or AVX2 instructions. We added glue code and config options to create crypto module that uses SSE/AVX/AVX2 optimized SHA256 x86_64 assembly routines. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-02-26crypto: crc32c - Kill pointless CRYPTO_CRC32C_X86_64 optionHerbert Xu
This bool option can never be set to anything other than y. So let's just kill it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-02-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: "Here is the crypto update for 3.9: - Added accelerated implementation of crc32 using pclmulqdq. - Added test vector for fcrypt. - Added support for OMAP4/AM33XX cipher and hash. - Fixed loose crypto_user input checks. - Misc fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (43 commits) crypto: user - ensure user supplied strings are nul-terminated crypto: user - fix empty string test in report API crypto: user - fix info leaks in report API crypto: caam - Added property fsl,sec-era in SEC4.0 device tree binding. crypto: use ERR_CAST crypto: atmel-aes - adjust duplicate test crypto: crc32-pclmul - Kill warning on x86-32 crypto: x86/twofish - assembler clean-ups: use ENTRY/ENDPROC, localize jump labels crypto: x86/sha1 - assembler clean-ups: use ENTRY/ENDPROC crypto: x86/serpent - use ENTRY/ENDPROC for assember functions and localize jump targets crypto: x86/salsa20 - assembler cleanup, use ENTRY/ENDPROC for assember functions and rename ECRYPT_* to salsa20_* crypto: x86/ghash - assembler clean-up: use ENDPROC at end of assember functions crypto: x86/crc32c - assembler clean-up: use ENTRY/ENDPROC crypto: cast6-avx: use ENTRY()/ENDPROC() for assembler functions crypto: cast5-avx: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: camellia-x86_64/aes-ni: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: blowfish-x86_64: use ENTRY()/ENDPROC() for assembler functions and localize jump targets crypto: aesni-intel - add ENDPROC statements for assembler functions crypto: x86/aes - assembler clean-ups: use ENTRY/ENDPROC, localize jump targets crypto: testmgr - add test vector for fcrypt ...
2013-02-23Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "So from the depth of frozen Minnesota, here's the powerpc pull request for 3.9. It has a few interesting highlights, in addition to the usual bunch of bug fixes, minor updates, embedded device tree updates and new boards: - Hand tuned asm implementation of SHA1 (by Paulus & Michael Ellerman) - Support for Doorbell interrupts on Power8 (kind of fast thread-thread IPIs) by Ian Munsie - Long overdue cleanup of the way we handle relocation of our open firmware trampoline (prom_init.c) on 64-bit by Anton Blanchard - Support for saving/restoring & context switching the PPR (Processor Priority Register) on server processors that support it. This allows the kernel to preserve thread priorities established by userspace. By Haren Myneni. - DAWR (new watchpoint facility) support on Power8 by Michael Neuling - Ability to change the DSCR (Data Stream Control Register) which controls cache prefetching on a running process via ptrace by Alexey Kardashevskiy - Support for context switching the TAR register on Power8 (new branch target register meant to be used by some new specific userspace perf event interrupt facility which is yet to be enabled) by Ian Munsie. - Improve preservation of the CFAR register (which captures the origin of a branch) on various exception conditions by Paulus. - Move the Bestcomm DMA driver from arch powerpc to drivers/dma where it belongs by Philippe De Muyter - Support for Transactional Memory on Power8 by Michael Neuling (based on original work by Matt Evans). For those curious about the feature, the patch contains a pretty good description." (See commit db8ff907027b: "powerpc: Documentation for transactional memory on powerpc" for the mentioned description added to the file Documentation/powerpc/transactional_memory.txt) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (140 commits) powerpc/kexec: Disable hard IRQ before kexec powerpc/85xx: l2sram - Add compatible string for BSC9131 platform powerpc/85xx: bsc9131 - Correct typo in SDHC device node powerpc/e500/qemu-e500: enable coreint powerpc/mpic: allow coreint to be determined by MPIC version powerpc/fsl_pci: Store the pci ctlr device ptr in the pci ctlr struct powerpc/85xx: Board support for ppa8548 powerpc/fsl: remove extraneous DIU platform functions arch/powerpc/platforms/85xx/p1022_ds.c: adjust duplicate test powerpc: Documentation for transactional memory on powerpc powerpc: Add transactional memory to pseries and ppc64 defconfigs powerpc: Add config option for transactional memory powerpc: Add transactional memory to POWER8 cpu features powerpc: Add new transactional memory state to the signal context powerpc: Hook in new transactional memory code powerpc: Routines for FP/VSX/VMX unavailable during a transaction powerpc: Add transactional memory unavaliable execption handler powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes powerpc: Add FP/VSX and VMX register load functions for transactional memory powerpc: Add helper functions for transactional memory context switching ...
2013-01-20crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table ↵Alexander Boyko
implementation This patch adds crc32 algorithms to shash crypto api. One is wrapper to gerneric crc32_le function. Second is crc32 pclmulqdq implementation. It use hardware provided PCLMULQDQ instruction to accelerate the CRC32 disposal. This instruction present from Intel Westmere and AMD Bulldozer CPUs. For intel core i5 I got 450MB/s for table implementation and 2100MB/s for pclmulqdq implementation. Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-01-11crypto: remove depends on CONFIG_EXPERIMENTALKees Cook
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: Herbert Xu <herbert@gondor.apana.org.au> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David S. Miller <davem@davemloft.net>
2013-01-10powerpc: Add a powerpc implementation of SHA-1Michael Ellerman
This patch adds a crypto driver which provides a powerpc accelerated implementation of SHA-1, accelerated in that it is written in asm. Original patch by Paul, minor fixups for upstream by moi. Lightly tested on 64-bit with the test program here: http://michael.ellerman.id.au/files/junkcode/sha1test.c Seems to work, and is "not slower" than the generic version. Needs testing on 32-bit. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-12-06crypto: cast5/cast6 - move lookup tables to shared moduleJussi Kivilinna
CAST5 and CAST6 both use same lookup tables, which can be moved shared module 'cast_common'. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-11-09crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of ↵Jussi Kivilinna
camellia cipher This patch adds AES-NI/AVX/x86_64 assembler implementation of Camellia block cipher. Implementation process data in sixteen block chunks, which are byte-sliced and AES SubBytes is reused for Camellia s-box with help of pre- and post-filtering. Patch has been tested with tcrypt and automated filesystem tests. tcrypt test results: Intel Core i5-2450M: camellia-aesni-avx vs camellia-asm-x86_64-2way: 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.98x 0.96x 0.99x 0.96x 0.96x 0.95x 0.95x 0.94x 0.97x 0.98x 64B 0.99x 0.98x 1.00x 0.98x 0.98x 0.99x 0.98x 0.93x 0.99x 0.98x 256B 2.28x 2.28x 1.01x 2.29x 2.25x 2.24x 1.96x 1.97x 1.91x 1.90x 1024B 2.57x 2.56x 1.00x 2.57x 2.51x 2.53x 2.19x 2.17x 2.19x 2.22x 8192B 2.49x 2.49x 1.00x 2.53x 2.48x 2.49x 2.17x 2.17x 2.22x 2.22x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.97x 0.98x 0.99x 0.97x 0.97x 0.96x 0.97x 0.98x 0.98x 0.99x 64B 1.00x 1.00x 1.01x 0.99x 0.98x 0.99x 0.99x 0.99x 0.99x 0.99x 256B 2.37x 2.37x 1.01x 2.39x 2.35x 2.33x 2.10x 2.11x 1.99x 2.02x 1024B 2.58x 2.60x 1.00x 2.58x 2.56x 2.56x 2.28x 2.29x 2.28x 2.29x 8192B 2.50x 2.52x 1.00x 2.56x 2.51x 2.51x 2.24x 2.25x 2.26x 2.29x Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-10-15crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instructionTim Chen
This patch adds the crc_pcl function that calculates CRC32C checksum using the PCLMULQDQ instruction on processors that support this feature. This will provide speedup over using CRC32 instruction only. The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and kernel_fpu_end and incur some overhead. So the new crc_pcl function is only invoked for buffer size of 512 bytes or more. Larger sized buffers will expect to see greater speedup. This feature is best used coupled with eager_fpu which reduces the kernel_fpu_begin/end overhead. For buffer size of 1K the speedup is around 1.6x and for buffer size greater than 4K, the speedup is around 3x compared to original implementation in crc32c-intel module. Test was performed on Sandy Bridge based platform with constant frequency set for cpu. A white paper detailing the algorithm can be found here: http://download.intel.com/design/intarch/papers/323405.pdf Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-10-14Merge branch 'modules-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module signing support from Rusty Russell: "module signing is the highlight, but it's an all-over David Howells frenzy..." Hmm "Magrathea: Glacier signing key". Somebody has been reading too much HHGTTG. * 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (37 commits) X.509: Fix indefinite length element skip error handling X.509: Convert some printk calls to pr_devel asymmetric keys: fix printk format warning MODSIGN: Fix 32-bit overflow in X.509 certificate validity date checking MODSIGN: Make mrproper should remove generated files. MODSIGN: Use utf8 strings in signer's name in autogenerated X.509 certs MODSIGN: Use the same digest for the autogen key sig as for the module sig MODSIGN: Sign modules during the build process MODSIGN: Provide a script for generating a key ID from an X.509 cert MODSIGN: Implement module signature checking MODSIGN: Provide module signing public keys to the kernel MODSIGN: Automatically generate module signing keys if missing MODSIGN: Provide Kconfig options MODSIGN: Provide gitignore and make clean rules for extra files MODSIGN: Add FIPS policy module: signature checking hook X.509: Add a crypto key parser for binary (DER) X.509 certificates MPILIB: Provide a function to read raw data into an MPI X.509: Add an ASN.1 decoder X.509: Add simple ASN.1 grammar compiler ...
2012-10-08KEYS: Implement asymmetric key typeDavid Howells
Create a key type that can be used to represent an asymmetric key type for use in appropriate cryptographic operations, such as encryption, decryption, signature generation and signature verification. The key type is "asymmetric" and can provide access to a variety of cryptographic algorithms. Possibly, this would be better as "public_key" - but that has the disadvantage that "public key" is an overloaded term. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: - Optimised AES/SHA1 for ARM. - IPsec ESN support in talitos and caam. - x86_64/avx implementation of cast5/cast6. - Add/use multi-algorithm registration helpers where possible. - Added IBM Power7+ in-Nest support. - Misc fixes. Fix up trivial conflicts in crypto/Kconfig due to the sparc64 crypto config options being added next to the new ARM ones. [ Side note: cut-and-paste duplicate help texts make those conflicts harder to read than necessary, thanks to git being smart about minimizing conflicts and maximizing the common parts... ] * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits) crypto: x86/glue_helper - fix storing of new IV in CBC encryption crypto: cast5/avx - fix storing of new IV in CBC encryption crypto: tcrypt - add missing tests for camellia and ghash crypto: testmgr - make test_aead also test 'dst != src' code paths crypto: testmgr - make test_skcipher also test 'dst != src' code paths crypto: testmgr - add test vectors for CTR mode IV increasement crypto: testmgr - add test vectors for partial ctr(cast5) and ctr(cast6) crypto: testmgr - allow non-multi page and multi page skcipher tests from same test template crypto: caam - increase TRNG clocks per sample crypto, tcrypt: remove local_bh_disable/enable() around local_irq_disable/enable() crypto: tegra-aes - fix error return code crypto: crypto4xx - fix error return code crypto: hifn_795x - fix error return code crypto: ux500 - fix error return code crypto: caam - fix error IDs for SEC v5.x RNG4 hwrng: mxc-rnga - Access data via structure hwrng: mxc-rnga - Adapt clocks to new i.mx clock framework crypto: caam - add IPsec ESN support crypto: 842 - remove .cra_list initialization Revert "[CRYPTO] cast6: inline bloat--" ...
2012-10-02crypto: Build SPARC DES algorithms on SPARC only.Dave Jones
Asking for this option on x86 seems a bit pointless. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07arm/crypto: Add optimized AES and SHA1 routinesDavid McCullough
Add assembler versions of AES and SHA1 for ARM platforms. This has provided up to a 50% improvement in IPsec/TCP throughout for tunnels using AES128/SHA1. Platform CPU SPeed Endian Before (bps) After (bps) Improvement IXP425 533 MHz big 11217042 15566294 ~38% KS8695 166 MHz little 3828549 5795373 ~51% Signed-off-by: David McCullough <ucdevel@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-28sparc64: Add CAMELLIA driver making use of the new camellia opcodes.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-25sparc64: Add DES driver making use of the new des opcodes.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22sparc64: Add CRC32C driver making use of the new crc32c opcode.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22sparc64: Add AES driver making use of the new aes opcodes.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-20sparc64: Add MD5 driver making use of the 'md5' instruction.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-20sparc64: Add SHA384/SHA512 driver making use of the 'sha512' instruction.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-20sparc64: Add SHA224/SHA256 driver making use of the 'sha256' instruction.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-20sparc64: Add SHA1 driver making use of the 'sha1' instruction.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-20crypto: aesni_intel - improve lrw and xts performance by utilizing parallel ↵Jussi Kivilinna
AES-NI hardware pipelines Use parallel LRW and XTS encryption facilities to better utilize AES-NI hardware pipelines and gain extra performance. Tcrypt benchmark results (async), old vs new ratios: Intel Core i5-2450M CPU (fam: 6, model: 42, step: 7) aes:128bit lrw:256bit xts:256bit size lrw-enc lrw-dec xts-dec xts-dec 16B 0.99x 1.00x 1.22x 1.19x 64B 1.38x 1.50x 1.58x 1.61x 256B 2.04x 2.02x 2.27x 2.29x 1024B 2.56x 2.54x 2.89x 2.92x 8192B 2.85x 2.99x 3.40x 3.23x aes:192bit lrw:320bit xts:384bit size lrw-enc lrw-dec xts-dec xts-dec 16B 1.08x 1.08x 1.16x 1.17x 64B 1.48x 1.54x 1.59x 1.65x 256B 2.18x 2.17x 2.29x 2.28x 1024B 2.67x 2.67x 2.87x 3.05x 8192B 2.93x 2.84x 3.28x 3.33x aes:256bit lrw:348bit xts:512bit size lrw-enc lrw-dec xts-dec xts-dec 16B 1.07x 1.07x 1.18x 1.19x 64B 1.56x 1.56x 1.70x 1.71x 256B 2.22x 2.24x 2.46x 2.46x 1024B 2.76x 2.77x 3.13x 3.05x 8192B 2.99x 3.05x 3.40x 3.30x Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Reviewed-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01powerpc/crypto: add 842 crypto driverSeth Jennings
This patch add the 842 cryptographic API driver that submits compression requests to the 842 hardware compression accelerator driver (nx-compress). If the hardware accelerator goes offline for any reason (dynamic disable, migration, etc...), this driver will use LZO as a software failover for all future compression requests. For decompression requests, the 842 hardware driver contains a software implementation of the 842 decompressor to support the decompression of data that was compressed before the accelerator went offline. Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01crypto: cast6 - add x86_64/avx assembler implementationJohannes Goetzfried
This patch adds a x86_64/avx assembler implementation of the Cast6 block cipher. The implementation processes eight blocks in parallel (two 4 block chunk AVX operations). The table-lookups are done in general-purpose registers. For small blocksizes the functions from the generic module are called. A good performance increase is provided for blocksizes greater or equal to 128B. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: Intel Core i5-2500 CPU (fam:6, model:42, step:7) cast6-avx-x86_64 vs. cast6-generic 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.97x 1.00x 1.01x 1.01x 0.99x 0.97x 0.98x 1.01x 0.96x 0.98x 64B 0.98x 0.99x 1.02x 1.01x 0.99x 1.00x 1.01x 0.99x 1.00x 0.99x 256B 1.77x 1.84x 0.99x 1.85x 1.77x 1.77x 1.70x 1.74x 1.69x 1.72x 1024B 1.93x 1.95x 0.99x 1.96x 1.93x 1.93x 1.84x 1.85x 1.89x 1.87x 8192B 1.91x 1.95x 0.99x 1.97x 1.95x 1.91x 1.86x 1.87x 1.93x 1.90x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.97x 0.99x 1.02x 1.01x 0.98x 0.99x 1.00x 1.00x 0.98x 0.98x 64B 0.98x 0.99x 1.01x 1.00x 1.00x 1.00x 1.01x 1.01x 0.97x 1.00x 256B 1.77x 1.83x 1.00x 1.86x 1.79x 1.78x 1.70x 1.76x 1.71x 1.69x 1024B 1.92x 1.95x 0.99x 1.96x 1.93x 1.93x 1.83x 1.86x 1.89x 1.87x 8192B 1.94x 1.95x 0.99x 1.97x 1.95x 1.95x 1.87x 1.87x 1.93x 1.91x Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-08-01crypto: cast5 - add x86_64/avx assembler implementationJohannes Goetzfried
This patch adds a x86_64/avx assembler implementation of the Cast5 block cipher. The implementation processes sixteen blocks in parallel (four 4 block chunk AVX operations). The table-lookups are done in general-purpose registers. For small blocksizes the functions from the generic module are called. A good performance increase is provided for blocksizes greater or equal to 128B. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: Intel Core i5-2500 CPU (fam:6, model:42, step:7) cast5-avx-x86_64 vs. cast5-generic 64bit key: size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec 16B 0.99x 0.99x 1.00x 1.00x 1.02x 1.01x 64B 1.00x 1.00x 0.98x 1.00x 1.01x 1.02x 256B 2.03x 2.01x 0.95x 2.11x 2.12x 2.13x 1024B 2.30x 2.24x 0.95x 2.29x 2.35x 2.35x 8192B 2.31x 2.27x 0.95x 2.31x 2.39x 2.39x 128bit key: size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec 16B 0.99x 0.99x 1.00x 1.00x 1.01x 1.01x 64B 1.00x 1.00x 0.98x 1.01x 1.02x 1.01x 256B 2.17x 2.13x 0.96x 2.19x 2.19x 2.19x 1024B 2.29x 2.32x 0.95x 2.34x 2.37x 2.38x 8192B 2.35x 2.32x 0.95x 2.35x 2.39x 2.39x Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: arc4 - now arc needs blockcipher supportSebastian Andrzej Siewior
Since commit ce6dd368 ("crypto: arc4 - improve performance by adding ecb(arc4)) we need to pull in a blkcipher. |ERROR: "crypto_blkcipher_type" [crypto/arc4.ko] undefined! |ERROR: "blkcipher_walk_done" [crypto/arc4.ko] undefined! |ERROR: "blkcipher_walk_virt" [crypto/arc4.ko] undefined! Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: twofish-avx - remove duplicated glue code and use shared glue code ↵Jussi Kivilinna
from glue_helper Now that shared glue code is available, convert twofish-avx to use it. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: twofish-x86_64-3way - remove duplicated glue code and use shared ↵Jussi Kivilinna
glue code from glue_helper Now that shared glue code is available, convert twofish-x86_64-3way to use it. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: camellia-x86_64 - remove duplicated glue code and use shared glue ↵Jussi Kivilinna
code from glue_helper Now that shared glue code is available, convert camellia-x86_64 to use it. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: serpent-avx: remove duplicated glue code and use shared glue code ↵Jussi Kivilinna
from glue_helper Now that shared glue code is available, convert serpent-avx to use it. Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: serpent-sse2 - split generic glue code to new helper moduleJussi Kivilinna
Now that serpent-sse2 glue code has been made generic, it can be split to separate module. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: aes_ni - change to use shared ablk_* functionsJussi Kivilinna
Remove duplicate ablk_* functions and make use of ablk_helper module instead. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: twofish-avx - change to use shared ablk_* functionsJussi Kivilinna
Remove duplicate ablk_* functions and make use of ablk_helper module instead. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-27crypto: ablk_helper - move ablk_* functions from serpent-sse2/avx glue code ↵Jussi Kivilinna
to shared module Move ablk-* functions to separate module to share common code between cipher implementations. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-12crypto: serpent - add x86_64/avx assembler implementationJohannes Goetzfried
This patch adds a x86_64/avx assembler implementation of the Serpent block cipher. The implementation is very similar to the sse2 implementation and processes eight blocks in parallel. Because of the new non-destructive three operand syntax all move-instructions can be removed and therefore a little performance increase is provided. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: Intel Core i5-2500 CPU (fam:6, model:42, step:7) serpent-avx-x86_64 vs. serpent-sse2-x86_64 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.03x 1.01x 1.01x 1.01x 1.00x 1.00x 1.00x 1.00x 1.00x 1.01x 64B 1.00x 1.00x 1.00x 1.00x 1.00x 0.99x 1.00x 1.01x 1.00x 1.00x 256B 1.05x 1.03x 1.00x 1.02x 1.05x 1.06x 1.05x 1.02x 1.05x 1.02x 1024B 1.05x 1.02x 1.00x 1.02x 1.05x 1.06x 1.05x 1.03x 1.05x 1.02x 8192B 1.05x 1.02x 1.00x 1.02x 1.06x 1.06x 1.04x 1.03x 1.04x 1.02x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 1.01x 1.00x 1.01x 1.01x 1.00x 1.00x 0.99x 1.03x 1.01x 1.01x 64B 1.00x 1.00x 1.00x 1.00x 1.00x 1.00x 1.00x 1.01x 1.00x 1.02x 256B 1.05x 1.02x 1.00x 1.02x 1.05x 1.02x 1.04x 1.05x 1.05x 1.02x 1024B 1.06x 1.02x 1.00x 1.02x 1.07x 1.06x 1.05x 1.04x 1.05x 1.02x 8192B 1.05x 1.02x 1.00x 1.02x 1.06x 1.06x 1.04x 1.05x 1.05x 1.02x serpent-avx-x86_64 vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.26x 1.73x ecb-dec 1.20x 1.64x cbc-enc 0.33x 0.45x cbc-dec 1.24x 1.67x ctr-enc 1.32x 1.76x ctr-dec 1.32x 1.76x lrw-enc 1.20x 1.60x lrw-dec 1.15x 1.54x xts-enc 1.22x 1.64x xts-dec 1.17x 1.57x Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-06-12crypto: twofish - add x86_64/avx assembler implementationJohannes Goetzfried
This patch adds a x86_64/avx assembler implementation of the Twofish block cipher. The implementation processes eight blocks in parallel (two 4 block chunk AVX operations). The table-lookups are done in general-purpose registers. For small blocksizes the 3way-parallel functions from the twofish-x86_64-3way module are called. A good performance increase is provided for blocksizes greater or equal to 128B. Patch has been tested with tcrypt and automated filesystem tests. Tcrypt benchmark results: Intel Core i5-2500 CPU (fam:6, model:42, step:7) twofish-avx-x86_64 vs. twofish-x86_64-3way 128bit key: (lrw:256bit) (xts:256bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.96x 0.97x 1.00x 0.95x 0.97x 0.97x 0.96x 0.95x 0.95x 0.98x 64B 0.99x 0.99x 1.00x 0.99x 0.98x 0.98x 0.99x 0.98x 0.99x 0.98x 256B 1.20x 1.21x 1.00x 1.19x 1.15x 1.14x 1.19x 1.20x 1.18x 1.19x 1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.24x 1.26x 1.28x 1.26x 1.27x 8192B 1.31x 1.32x 1.00x 1.31x 1.25x 1.25x 1.28x 1.29x 1.28x 1.30x 256bit key: (lrw:384bit) (xts:512bit) size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec 16B 0.96x 0.96x 1.00x 0.96x 0.97x 0.98x 0.95x 0.95x 0.95x 0.96x 64B 1.00x 0.99x 1.00x 0.98x 0.98x 1.01x 0.98x 0.98x 0.98x 0.98x 256B 1.20x 1.21x 1.00x 1.21x 1.15x 1.15x 1.19x 1.20x 1.18x 1.19x 1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.23x 1.26x 1.27x 1.26x 1.27x 8192B 1.31x 1.33x 1.00x 1.31x 1.26x 1.26x 1.29x 1.29x 1.28x 1.30x twofish-avx-x86_64 vs aes-asm (8kB block): 128bit 256bit ecb-enc 1.19x 1.63x ecb-dec 1.18x 1.62x cbc-enc 0.75x 1.03x cbc-dec 1.23x 1.67x ctr-enc 1.24x 1.65x ctr-dec 1.24x 1.65x lrw-enc 1.15x 1.53x lrw-dec 1.14x 1.52x xts-enc 1.16x 1.56x xts-dec 1.16x 1.56x Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>