From: Tim Chen
Subject: [PATCH 0/4] Patchset to use PCLMULQDQ to accelerate CRC-T10DIF checksum computation
Date: Tue, 16 Apr 2013 09:20:47 -0700
Message-ID:
Cc: Tim Chen, Matthew Wilcox, Jim Kukunas, Keith Busch, Erdinc Ozturk,
 Vinodh Gopal, James Guilford, Wajdi Feghali, Jussi Kivilinna, linux-kernel,
 linux-crypto@vger.kernel.org, linux-scsi@vger.kernel.org
To: Herbert Xu, "H. Peter Anvin", "David S. Miller", "Martin K. Petersen",
 James Bottomley
Return-path:
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-crypto.vger.kernel.org

Herbert,

Currently the CRC-T10DIF checksum is computed using a generic table lookup
algorithm. By switching the checksum to a PCLMULQDQ-based computation, we
can speed up the computation by 8x when checksumming 512 bytes, and by even
more for larger buffer sizes. This will improve the performance of SCSI
drivers that turn on the CRC-T10DIF checksum. In our SSD-based experiments,
we have seen disk throughput increase by 3.5x with T10DIF.

This patchset provides the x86_64 routine using the PCLMULQDQ instruction
and switches the crc_t10dif library function to use the faster
PCLMULQDQ-based routine when available.

I would appreciate it if you could consider merging this for the 3.10 kernel.

Tim

Tim Chen (4):
  Wrap crc_t10dif function all to use crypto transform framework
  Accelerated CRC T10 DIF computation with PCLMULQDQ instruction
  Glue code to cast accelerated CRCT10DIF assembly as a crypto transform
  Simple correctness and speed test for CRCT10DIF hash

 arch/x86/crypto/Makefile                |   2 +
 arch/x86/crypto/crct10dif-pcl-asm_64.S  | 659 ++++++++++++++++++++++++++++++++
 arch/x86/crypto/crct10dif-pclmul_glue.c | 153 ++++++++
 crypto/Kconfig                          |  21 +
 crypto/tcrypt.c                         |   8 +
 crypto/testmgr.c                        |  10 +
 crypto/testmgr.h                        |  24 ++
 include/linux/crc-t10dif.h              |  10 +
 lib/crc-t10dif.c                        |  96 +++++
 9 files changed, 983 insertions(+)
 create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
 create mode 100644 arch/x86/crypto/crct10dif-pclmul_glue.c

-- 
1.7.11.7
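
For readers unfamiliar with the baseline, the generic table lookup algorithm mentioned in the cover letter can be sketched roughly as follows. This is a minimal userspace illustration, not the code in lib/crc-t10dif.c: the names t10dif_init_table() and t10dif_generic() are hypothetical, and the table is built at run time here purely to keep the example short. The only facts assumed are the standard CRC-T10DIF parameters (16-bit, polynomial 0x8BB7, initial value 0, no bit reflection).

```c
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u	/* CRC-T10DIF generator polynomial */

/* Hypothetical lookup table; filled in by t10dif_init_table(). */
static uint16_t t10dif_table[256];

/* Entry i holds the CRC of the single byte i, computed bit by bit. */
static void t10dif_init_table(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		uint16_t crc = (uint16_t)(i << 8);

		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000u)
				crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
		t10dif_table[i] = crc;
	}
}

/*
 * Byte-at-a-time table lookup: one table access per input byte.
 * Call t10dif_init_table() once before using this function.
 */
static uint16_t t10dif_generic(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0;	/* initial value is 0 for CRC-T10DIF */

	for (size_t i = 0; i < len; i++)
		crc = (uint16_t)((crc << 8) ^
				 t10dif_table[((crc >> 8) ^ buf[i]) & 0xff]);
	return crc;
}
```

Each input byte costs one dependent table load in this scheme, which is the serial bottleneck that a carry-less-multiply (PCLMULQDQ) approach avoids by folding many bytes per instruction.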