Reply-To: dgilbert@interlog.com
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
To: Christophe LEROY, Jeffrey Lien, Eric Biggers
Cc: "linux-kernel@vger.kernel.org", "linux-crypto@vger.kernel.org", "linux-block@vger.kernel.org", "linux-scsi@vger.kernel.org", "herbert@gondor.apana.org.au", "tim.c.chen@linux.intel.com", "martin.petersen@oracle.com", David Darrington, Jeff Furlong, Joe Perches
References: <1533928331-21303-1-git-send-email-jeff.lien@wdc.com> <20180810201601.GA80850@gmail.com> <7f1b5ca8-cd89-71cc-21bb-5a058bc1e908@c-s.fr>
In-Reply-To: <7f1b5ca8-cd89-71cc-21bb-5a058bc1e908@c-s.fr>
From: Douglas Gilbert
Date: Thu, 16 Aug 2018 13:38:22 -0400
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018-08-16 11:41 AM, Christophe LEROY wrote:
> Hi,
>
> Please include your new patch as plain text inside the mail, not as a MIME
> attachment. Otherwise it is not downloadable from
> https://patchwork.kernel.org/patch/10563093/

It should be downloadable from:
http://sg.danny.cz/sg/p/0001-T10-CRC16-function-build-time-sized-table.patch

With regard to your comment about slice (table?) size, that is partially
addressed by a kernel build time option shown in the above patch. That could
be taken a bit further with a sysfs knob (where?) to reduce the effective
table size from that which the kernel is built with. To increase the size
of the table would imply fetching some more heap and having an algorithm
that could generate the extra part of that table required.
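As a rough illustration of the kind of algorithm that could generate the extra part of the table, here is a minimal sketch. It is not code from the patch: the function names t10dif_init_row0 and t10dif_extend_rows are hypothetical, and it assumes the T10-DIF polynomial 0x8BB7 and a table laid out as [row][256], where row k holds the CRC of a byte followed by k zero bytes. Under those assumptions, each new row follows from the previous row and row 0:

```c
#include <assert.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u	/* T10-DIF CRC16 generator polynomial */

/* Fill row 0: the CRC of every possible single byte (the byte-wise table). */
void t10dif_init_row0(uint16_t table[][256])
{
	for (unsigned int i = 0; i < 256; i++) {
		uint16_t crc = (uint16_t)(i << 8);

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ?
				(uint16_t)((crc << 1) ^ T10DIF_POLY) :
				(uint16_t)(crc << 1);
		table[0][i] = crc;
	}
}

/*
 * Generate rows [have, want) from the rows already present.
 * Row k is row k-1 advanced by one zero byte, which only needs row 0.
 * (Hypothetical helper; the caller must provide storage for 'want' rows.)
 */
void t10dif_extend_rows(uint16_t table[][256], unsigned int have,
			unsigned int want)
{
	assert(have >= 1);	/* row 0 must already be initialised */

	for (unsigned int k = have; k < want; k++) {
		for (unsigned int i = 0; i < 256; i++) {
			uint16_t prev = table[k - 1][i];

			table[k][i] = (uint16_t)((prev << 8) ^
						 table[0][prev >> 8]);
		}
	}
}
```

With something like this, a table built for slice-by-4 could be grown to slice-by-16 by calling t10dif_extend_rows(table, 4, 16) once the extra memory has been obtained.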
Doug Gilbert

> Christophe
>
> On 16/08/2018 at 16:22, Douglas Gilbert wrote:
>> Hi,
>> Rather than present this formally as an alternate patch, attached is a
>> clean-up of my patch which uses the variable size table proposed by
>> Joe Perches and is based on the original patch that
>> started this thread.
>>
>> Doug Gilbert
>>
>> On 2018-08-16 10:02 AM, Jeffrey Lien wrote:
>>> Eric,
>>> We did not test the slice by 4 or 8 tables.  I'm not sure of the value of
>>> doing that since the slice by 16 will provide the best performance gain.  If
>>> I'm missing anything here, please let me know.
>>>
>>> I'm working on a new version of the patch based on the feedback from others
>>> and will also change the pointer variables to start with p and fix the
>>> indenting you mentioned below in the new version of the patch.
>>>
>>> Thanks
>>>
>>> Jeff Lien
>>>
>>> -----Original Message-----
>>> From: Eric Biggers [mailto:ebiggers@kernel.org]
>>> Sent: Friday, August 10, 2018 3:16 PM
>>> To: Jeffrey Lien
>>> Cc: linux-kernel@vger.kernel.org; linux-crypto@vger.kernel.org;
>>> linux-block@vger.kernel.org; linux-scsi@vger.kernel.org;
>>> herbert@gondor.apana.org.au; tim.c.chen@linux.intel.com;
>>> martin.petersen@oracle.com; David Darrington; Jeff
>>> Furlong
>>> Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
>>>
>>> On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
>>>> This patch provides a performance improvement for the CRC16
>>>> calculations done in read/write workloads using the T10 Type 1/2/3
>>>> guard field.  For example, today with sequential write workloads (one
>>>> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
>>>> computation bottleneck.  Today's block devices are considerably
>>>> faster, but the CRC16 calculation prevents folks from utilizing the
>>>> throughput of such devices.
>>>> To speed up this calculation and expose
>>>> the block device throughput, we slice the old single byte for loop into a 16
>>>> byte for loop, with a larger CRC table to match.  The result has shown 5x
>>>> performance improvements on various big endian and little endian systems
>>>> running the 4.18.0 kernel version.
>>>>
>>>> FIO Sequential Write, 64K Block Size, Queue Depth 64
>>>> BE Base Kernel:        bw=201.5 MiB/s
>>>> BE Modified CRC Calc:  bw=968.1 MiB/s
>>>> 4.80x performance improvement
>>>>
>>>> LE Base Kernel:        bw=357 MiB/s
>>>> LE Modified CRC Calc:  bw=1964 MiB/s
>>>> 5.51x performance improvement
>>>>
>>>> FIO Sequential Read, 64K Block Size, Queue Depth 64
>>>> BE Base Kernel:        bw=611.2 MiB/s
>>>> BE Modified CRC calc:  bw=684.9 MiB/s
>>>> 1.12x performance improvement
>>>>
>>>> LE Base Kernel:        bw=797 MiB/s
>>>> LE Modified CRC Calc:  bw=2730 MiB/s
>>>> 3.42x performance improvement
>>>
>>> Did you also test the slice-by-4 (requires 2048-byte table) and slice-by-8
>>> (requires 4096-byte table) methods?  Your proposal is slice-by-16 (requires
>>> 8192-byte table); the original was slice-by-1 (requires 512-byte table).
>>>
>>>>   __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
>>>> size_t len)  {
>>>> -    unsigned int i;
>>>> +    const __u8 *i = (const __u8 *)buffer;
>>>> +    const __u8 *i_end = i + len;
>>>> +    const __u8 *i_last16 = i + (len / 16 * 16);
>>>
>>> 'i' is normally a loop counter, not a pointer.
>>> Use 'p', 'p_end', and 'p_last16'.
>>>
>>>> -    for (i = 0 ; i < len ; i++)
>>>> -        crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
>>>> +    for (; i < i_last16; i += 16) {
>>>> +        crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
>>>> +        t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
>>>> +        t10_dif_crc_table[13][i[2]] ^
>>>> +        t10_dif_crc_table[12][i[3]] ^
>>>> +        t10_dif_crc_table[11][i[4]] ^
>>>> +        t10_dif_crc_table[10][i[5]] ^
>>>> +        t10_dif_crc_table[9][i[6]] ^
>>>> +        t10_dif_crc_table[8][i[7]] ^
>>>> +        t10_dif_crc_table[7][i[8]] ^
>>>> +        t10_dif_crc_table[6][i[9]] ^
>>>> +        t10_dif_crc_table[5][i[10]] ^
>>>> +        t10_dif_crc_table[4][i[11]] ^
>>>> +        t10_dif_crc_table[3][i[12]] ^
>>>> +        t10_dif_crc_table[2][i[13]] ^
>>>> +        t10_dif_crc_table[1][i[14]] ^
>>>> +        t10_dif_crc_table[0][i[15]];
>>>> +    }
>>>
>>> Please indent this properly.
>>>
>>>         crc = t10_dif_crc_table[15][i[0] ^ (__u8)(crc >>  8)] ^
>>>               t10_dif_crc_table[14][i[1] ^ (__u8)(crc >>  0)] ^
>>>               t10_dif_crc_table[13][i[2]] ^
>>>               t10_dif_crc_table[12][i[3]] ^
>>>               t10_dif_crc_table[11][i[4]] ^
>>>               ...
>>>
>>> - Eric
>>>
>>
>
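For reference, the slice-by-16 loop quoted above can be sketched as a small userspace function that is checked against a bit-at-a-time CRC. This is only an illustration under stated assumptions, not the patch itself: the names t10dif_tbl, t10dif_build_tables, crc_t10dif_bitwise and crc_t10dif_slice16 are hypothetical, the table is generated at run time rather than taken from the kernel source, the pointer names follow Eric's suggestion ('p', 'p_end', 'p_last16'), and the byte-wise tail loop is an assumption since the quoted hunk does not show how the remaining bytes are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

uint16_t t10dif_tbl[16][256];	/* 16 rows * 256 entries * 2 bytes = 8192 bytes */

/* Reference: bit-at-a-time CRC16 with the T10-DIF polynomial 0x8BB7. */
uint16_t crc_t10dif_bitwise(uint16_t crc, const unsigned char *buf, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*buf++ << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u) ?
				(uint16_t)((crc << 1) ^ 0x8BB7u) :
				(uint16_t)(crc << 1);
	}
	return crc;
}

/* Row k holds the CRC of byte i followed by k zero bytes. */
void t10dif_build_tables(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		unsigned char byte = (unsigned char)i;

		t10dif_tbl[0][i] = crc_t10dif_bitwise(0, &byte, 1);
	}
	for (unsigned int k = 1; k < 16; k++)
		for (unsigned int i = 0; i < 256; i++) {
			uint16_t prev = t10dif_tbl[k - 1][i];

			t10dif_tbl[k][i] = (uint16_t)((prev << 8) ^
						      t10dif_tbl[0][prev >> 8]);
		}
}

/* Slice-by-16 main loop, then a byte-wise loop for the last len % 16 bytes. */
uint16_t crc_t10dif_slice16(uint16_t crc, const unsigned char *buffer,
			    size_t len)
{
	const uint8_t *p = buffer;
	const uint8_t *p_end = p + len;
	const uint8_t *p_last16 = p + (len / 16 * 16);

	assert(buffer != NULL || len == 0);

	for (; p < p_last16; p += 16) {
		/* The current CRC is folded into the first two bytes only. */
		uint16_t next = t10dif_tbl[15][p[0] ^ (uint8_t)(crc >> 8)] ^
				t10dif_tbl[14][p[1] ^ (uint8_t)crc];

		for (int k = 2; k < 16; k++)
			next ^= t10dif_tbl[15 - k][p[k]];
		crc = next;
	}
	for (; p < p_end; p++)
		crc = (uint16_t)(crc << 8) ^
		      t10dif_tbl[0][(uint8_t)(crc >> 8) ^ *p];
	return crc;
}
```

The inner loop over k is written as a loop only to keep the sketch short; the patch unrolls it into sixteen explicit table lookups. Call t10dif_build_tables() once before the first crc_t10dif_slice16() call.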