From: Tim Chen
To: Herbert Xu, "H. Peter Anvin", "David S. Miller", "Martin K. Petersen", James Bottomley
Cc: Tim Chen, Matthew Wilcox, Jim Kukunas, Keith Busch, Erdinc Ozturk, Vinodh Gopal, James Guilford, Wajdi Feghali, Jussi Kivilinna, linux-kernel, linux-crypto@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: [PATCH v2 0/4] Patchset to use PCLMULQDQ to accelerate CRC-T10DIF checksum computation
Date: Wed, 17 Apr 2013 09:12:51 -0700

Currently the CRC-T10DIF checksum is computed using a generic table
lookup algorithm. By switching the checksum to PCLMULQDQ based
computation, we can speed up the computation by 8x for checksumming
512 bytes, and by even more for larger buffer sizes. This will improve
the performance of SCSI drivers that turn on the CRC-T10DIF checksum.
In our SSD based experiments, we have seen disk throughput increase by
3.5x with T10DIF for 512 byte block size.

This patch set provides the x86_64 routine using the PCLMULQDQ
instruction and switches the crc_t10dif library function to use the
faster PCLMULQDQ based routine when available.

Tim

v1->v2
1. Get rid of unnecessary xmm registers save and restore and fix
   ENDPROC position in PCLMULQDQ version of crc t10dif computation.
2. Fix URL to paper reference of CRC computation with PCLMULQDQ.
3. Add one additional tcrypt test case to exercise more code paths
   through crc t10dif computation.
4. Fix config dependencies of CRYPTO_CRCT10DIF.

Thanks to Matthew and Jussi who reviewed the patches and Keith for
testing version 1 of the patch set.

Tim Chen (4):
  Wrap crc_t10dif function all to use crypto transform framework
  Accelerated CRC T10 DIF computation with PCLMULQDQ instruction
  Glue code to cast accelerated CRCT10DIF assembly as a crypto transform
  Simple correctness and speed test for CRCT10DIF hash

 arch/x86/crypto/Makefile                |   2 +
 arch/x86/crypto/crct10dif-pcl-asm_64.S  | 643 ++++++++++++++++++++++++++++++++
 arch/x86/crypto/crct10dif-pclmul_glue.c | 153 ++++++++
 crypto/Kconfig                          |  21 ++
 crypto/tcrypt.c                         |   8 +
 crypto/testmgr.c                        |  10 +
 crypto/testmgr.h                        |  33 ++
 include/linux/crc-t10dif.h              |  10 +
 lib/crc-t10dif.c                        |  96 +++++
 9 files changed, 976 insertions(+)
 create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
 create mode 100644 arch/x86/crypto/crct10dif-pclmul_glue.c

-- 
1.7.11.7
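
For reference, the "generic table lookup algorithm" that the cover
letter describes as the current implementation can be sketched in
userspace C roughly as follows. This is only an illustration, not the
code in lib/crc-t10dif.c: the names t10dif_init_table() and
t10dif_generic() are hypothetical, and the table is built at run time
here rather than stored as a constant array. It assumes the usual
T10 DIF parameters (polynomial 0x8BB7, initial value 0, MSB first, no
final XOR).

```c
#include <stddef.h>
#include <stdint.h>

/* T10 DIF generator polynomial (x^16 term implied). */
#define T10DIF_POLY 0x8BB7u

/* Hypothetical lookup table: one 16-bit remainder per input byte value. */
static uint16_t t10dif_table[256];

/* Build the table by running the bit-serial CRC over each byte value. */
static void t10dif_init_table(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		uint16_t crc = (uint16_t)(i << 8);

		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000u)
				crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
		t10dif_table[i] = crc;
	}
}

/*
 * Table-driven update: one table lookup per input byte.
 * Pass crc = 0 for the first chunk; pass the previous result to
 * continue over a further chunk of the same buffer.
 */
static uint16_t t10dif_generic(uint16_t crc, const unsigned char *buf,
			       size_t len)
{
	for (size_t i = 0; i < len; i++)
		crc = (uint16_t)((crc << 8) ^
				 t10dif_table[((crc >> 8) ^ buf[i]) & 0xff]);
	return crc;
}
```

Call t10dif_init_table() once before the first t10dif_generic() call.
With these parameters the ASCII string "123456789" should checksum to
0xD0DB, the published check value for CRC-16/T10-DIF. The loop handles
one byte per iteration with a dependent table load, which is the
per-byte cost a carry-less multiply based routine is meant to avoid.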