Message-ID: <8af0245c1efbec6ae4ac3d2b14d6e819cb28b98e.camel@perches.com>
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
From: Joe Perches
To: "Martin K. Petersen", Jeff Lien
Cc: linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
    linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
    herbert@gondor.apana.org.au, tim.c.chen@linux.intel.com,
    david.darrington@wdc.com, jeff.furlong@wdc.com
Date: Sat, 11 Aug 2018 09:35:42 -0700
References: <1533928331-21303-1-git-send-email-jeff.lien@wdc.com>

On Sat, 2018-08-11 at 11:36 -0400, Martin K. Petersen wrote:
> Jeff,
>
> > This patch provides a performance improvement for the CRC16
> > calculations done in read/write workloads using the T10 Type 1/2/3
> > guard field. For example, today with sequential write workloads (one
> > thread/CPU of IO) we consume 100% of the CPU because of the CRC16
> > computation bottleneck. Today's block devices are considerably
> > faster, but the CRC16 calculation prevents folks from utilizing the
> > throughput of such devices. To speed up this calculation and expose
> > the block device throughput, we slice the old single byte for loop
> > into a 16 byte for loop, with a larger CRC table to match. The result
> > has shown 5x performance improvements on various big endian and little
> > endian systems running the 4.18.0 kernel version.
>
> The reason I went with a simple slice-by-one approach was that the
> larger tables had a negative impact on the CPU caches. So while
> slice-by-N numbers looked better in synthetic benchmarks, actual
> application performance started getting affected as the tables grew
> larger.
>
> These days we obviously use the hardware-accelerated CRC calculation so
> the software table approach mostly serves as a reference
> implementation. But given your big vs. little endian performance
> metrics, I'm assuming you guys are focused on embedded processors
> without support for CRC acceleration?
>
> I have no problem providing a choice for bigger tables. My only concern
> is that the selection heuristics need to be more than one-dimensional.
> Latency and cache side effects are often more important than throughput.
> At least on the initiator side.
>
> Also, I'd like to keep the original slice-by-one implementation for
> reference purposes.

Did you see the suggested patch that allows either 1, 2, 4, 8
or 16 block table sizes?

Perhaps you have a comment on that?
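
For readers following the slice-by-one vs. slice-by-N discussion, here is
a rough sketch of the two approaches for the T10 DIF CRC16 (polynomial
0x8BB7, MSB first, zero initial value). It is not the code from either
patch in this thread: the names crc_t10_init, crc_by1 and crc_by4, the
SLICES constant and the run-time table generation are all made up for
illustration, and 4 slices were picked only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define T10_POLY 0x8BB7u  /* T10 DIF CRC16 generator polynomial */
#define SLICES   4        /* hypothetical; the thread discusses 1..16 */

/* tbl[k][i] is the CRC of byte i followed by k zero bytes. */
static uint16_t tbl[SLICES][256];

/* Build the tables at run time (illustration only). */
static void crc_t10_init(void)
{
	for (int i = 0; i < 256; i++) {
		uint16_t c = (uint16_t)(i << 8);

		for (int b = 0; b < 8; b++)
			c = (c & 0x8000) ? (uint16_t)((c << 1) ^ T10_POLY)
					 : (uint16_t)(c << 1);
		tbl[0][i] = c;
	}
	/* Each further table advances the previous one by one zero byte. */
	for (int k = 1; k < SLICES; k++)
		for (int i = 0; i < 256; i++) {
			uint16_t p = tbl[k - 1][i];

			tbl[k][i] = (uint16_t)((p << 8) ^ tbl[0][p >> 8]);
		}
}

/* Slice-by-one: one table lookup per input byte, 512 bytes of table. */
static uint16_t crc_by1(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--)
		crc = (uint16_t)((crc << 8) ^ tbl[0][(crc >> 8) ^ *p++]);
	return crc;
}

/* Slice-by-four: four independent lookups per iteration, 2 KiB of table. */
static uint16_t crc_by4(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len >= 4) {
		/* The 16-bit state folds into the first two input bytes. */
		crc = (uint16_t)(tbl[3][p[0] ^ (crc >> 8)] ^
				 tbl[2][p[1] ^ (crc & 0xff)] ^
				 tbl[1][p[2]] ^
				 tbl[0][p[3]]);
		p += 4;
		len -= 4;
	}
	/* Leftover tail bytes go through the byte-at-a-time loop. */
	return crc_by1(crc, p, len);
}
```

Call crc_t10_init() once, then crc_by1(0, buf, len) and
crc_by4(0, buf, len) should return the same value for any buffer. The
table footprint is SLICES * 512 bytes, so a 16-way variant built the same
way would need 8 KiB. That growth is the cache-pressure trade-off raised
in the quoted text above.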