Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
To: dgilbert@interlog.com, Jeffrey Lien, Eric Biggers
Cc: linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
    linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
    herbert@gondor.apana.org.au, tim.c.chen@linux.intel.com,
    martin.petersen@oracle.com, David Darrington, Jeff Furlong,
    Joe Perches
References: <1533928331-21303-1-git-send-email-jeff.lien@wdc.com>
    <20180810201601.GA80850@gmail.com>
From: Christophe LEROY
Message-ID: <7f1b5ca8-cd89-71cc-21bb-5a058bc1e908@c-s.fr>
Date: Thu, 16 Aug 2018 17:41:13 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

Please include your new patch as plain text inside the mail, not as a
MIME attachment. Otherwise it is not downloadable from
https://patchwork.kernel.org/patch/10563093/

Christophe

On 16/08/2018 at 16:22, Douglas Gilbert wrote:
> Hi,
> Rather than present this formally as an alternate patch, attached is a
> clean-up of my patch which uses the variable size table proposed by
> Joe Perches and is based on the original patch that
> started this thread.
>
> Doug Gilbert
>
> On 2018-08-16 10:02 AM, Jeffrey Lien wrote:
>> Eric,
>> We did not test the slice by 4 or 8 tables.  I'm not sure of the
>> value of doing that since the slice by 16 will provide the best
>> performance gain.  If I'm missing anything here, please let me know.
>>
>> I'm working on a new version of the patch based on the feedback from
>> others and will also change the pointer variables to start with p and
>> fix the indenting you mentioned below in the new version of the patch.
>>
>> Thanks
>>
>> Jeff Lien
>>
>> -----Original Message-----
>> From: Eric Biggers [mailto:ebiggers@kernel.org]
>> Sent: Friday, August 10, 2018 3:16 PM
>> To: Jeffrey Lien
>> Cc: linux-kernel@vger.kernel.org; linux-crypto@vger.kernel.org;
>> linux-block@vger.kernel.org; linux-scsi@vger.kernel.org;
>> herbert@gondor.apana.org.au; tim.c.chen@linux.intel.com;
>> martin.petersen@oracle.com; David Darrington; Jeff Furlong
>> Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
>>
>> On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
>>> This patch provides a performance improvement for the CRC16
>>> calculations done in read/write workloads using the T10 Type 1/2/3
>>> guard field.  For example, today with sequential write workloads (one
>>> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
>>> computation bottleneck.  Today's block devices are considerably
>>> faster, but the CRC16 calculation prevents folks from utilizing the
>>> throughput of such devices.  To speed up this calculation and expose
>>> the block device throughput, we slice the old single byte for loop
>>> into a 16 byte for loop, with a larger CRC table to match.  The
>>> result has shown 5x performance improvements on various big endian
>>> and little endian systems running the 4.18.0 kernel version.
>>>
>>> FIO Sequential Write, 64K Block Size, Queue Depth 64
>>> BE Base Kernel:        bw=201.5 MiB/s
>>> BE Modified CRC Calc:  bw=968.1 MiB/s
>>> 4.80x performance improvement
>>>
>>> LE Base Kernel:        bw=357 MiB/s
>>> LE Modified CRC Calc:  bw=1964 MiB/s
>>> 5.51x performance improvement
>>>
>>> FIO Sequential Read, 64K Block Size, Queue Depth 64
>>> BE Base Kernel:        bw=611.2 MiB/s
>>> BE Modified CRC calc:  bw=684.9 MiB/s
>>> 1.12x performance improvement
>>>
>>> LE Base Kernel:        bw=797 MiB/s
>>> LE Modified CRC Calc:  bw=2730 MiB/s
>>> 3.42x performance improvement
>>
>> Did you also test the slice-by-4 (requires 2048-byte table) and
>> slice-by-8 (requires 4096-byte table) methods?  Your proposal is
>> slice-by-16 (requires 8192-byte table); the original was slice-by-1
>> (requires 512-byte table).
>>
>>>   __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
>>> size_t len)  {
>>> -    unsigned int i;
>>> +    const __u8 *i = (const __u8 *)buffer;
>>> +    const __u8 *i_end = i + len;
>>> +    const __u8 *i_last16 = i + (len / 16 * 16);
>>
>> 'i' is normally a loop counter, not a pointer.
>> Use 'p', 'p_end', and 'p_last16'.
>>
>>> -    for (i = 0 ; i < len ; i++)
>>> -        crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^
>>> buffer[i]) & 0xff];
>>> +    for (; i < i_last16; i += 16) {
>>> +        crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
>>> +        t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
>>> +        t10_dif_crc_table[13][i[2]] ^
>>> +        t10_dif_crc_table[12][i[3]] ^
>>> +        t10_dif_crc_table[11][i[4]] ^
>>> +        t10_dif_crc_table[10][i[5]] ^
>>> +        t10_dif_crc_table[9][i[6]] ^
>>> +        t10_dif_crc_table[8][i[7]] ^
>>> +        t10_dif_crc_table[7][i[8]] ^
>>> +        t10_dif_crc_table[6][i[9]] ^
>>> +        t10_dif_crc_table[5][i[10]] ^
>>> +        t10_dif_crc_table[4][i[11]] ^
>>> +        t10_dif_crc_table[3][i[12]] ^
>>> +        t10_dif_crc_table[2][i[13]] ^
>>> +        t10_dif_crc_table[1][i[14]] ^
>>> +        t10_dif_crc_table[0][i[15]];
>>> +    }
>>
>> Please indent this properly.
>>
>>         crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
>>               t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
>>               t10_dif_crc_table[13][i[2]] ^
>>               t10_dif_crc_table[12][i[3]] ^
>>               t10_dif_crc_table[11][i[4]] ^
>>               ...
>>
>> - Eric
>>
>
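
For anyone following the quoted discussion, the slice-by-N scheme and
the table sizes Eric lists can be sketched in standalone user-space C
as below. This is only a minimal illustration under stated assumptions,
not the patch under review: the names crc_table, crc_table_init,
crc_bytewise, crc_sliced and the SLICES macro are made up for this
example, and the tables are generated at run time for brevity. Only the
polynomial 0x8BB7 (CRC16 T10 DIF) and the lookup pattern are taken from
the thread and the existing bytewise loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u /* CRC16 T10 DIF polynomial, MSB first */
#define SLICES 16           /* hypothetical knob: 4, 8 or 16 (must be >= 2) */

/* SLICES * 256 entries * 2 bytes = SLICES * 512 bytes in total. */
static uint16_t crc_table[SLICES][256];

/*
 * crc_table[0] is the classic bytewise table. crc_table[k][b] is the CRC
 * of byte b followed by k zero bytes, derived from crc_table[k - 1][b].
 */
static void crc_table_init(void)
{
	assert(SLICES >= 2);

	for (unsigned int b = 0; b < 256; b++) {
		uint16_t crc = (uint16_t)(b << 8);

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ T10DIF_POLY)
					     : (uint16_t)(crc << 1);
		crc_table[0][b] = crc;
	}

	for (int k = 1; k < SLICES; k++) {
		for (unsigned int b = 0; b < 256; b++) {
			uint16_t prev = crc_table[k - 1][b];

			crc_table[k][b] = (uint16_t)((prev << 8) ^
						     crc_table[0][prev >> 8]);
		}
	}
}

/* Reference implementation: one byte per iteration. */
static uint16_t crc_bytewise(uint16_t crc, const unsigned char *p, size_t len)
{
	while (len--)
		crc = (uint16_t)((crc << 8) ^
				 crc_table[0][((crc >> 8) ^ *p++) & 0xff]);
	return crc;
}

/* SLICES bytes per iteration, then a bytewise loop for the remainder. */
static uint16_t crc_sliced(uint16_t crc, const unsigned char *p, size_t len)
{
	const unsigned char *p_end = p + len;
	const unsigned char *p_last = p + (len / SLICES * SLICES);

	for (; p < p_last; p += SLICES) {
		/* The 16-bit CRC state is folded into the first two bytes. */
		uint16_t next = crc_table[SLICES - 1][p[0] ^ (uint8_t)(crc >> 8)] ^
				crc_table[SLICES - 2][p[1] ^ (uint8_t)crc];

		/* The remaining bytes index their tables directly. */
		for (int k = 2; k < SLICES; k++)
			next ^= crc_table[SLICES - 1 - k][p[k]];
		crc = next;
	}

	return crc_bytewise(crc, p, (size_t)(p_end - p));
}
```

Call crc_table_init() once, then crc_sliced(0, buf, len). For any
SLICES >= 2 the result should match crc_bytewise(0, buf, len), and
sizeof(crc_table) works out to SLICES * 512 bytes, i.e. 2048, 4096 and
8192 bytes for 4, 8 and 16 slices.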