Received: by 2002:a05:7412:37c9:b0:e2:908c:2ebd with SMTP id jz9csp383624rdb; Mon, 18 Sep 2023 19:58:36 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGcqzNzD2iLI0sq4615G8Aqm2LfZhBAxE4lmU1hTeaSDwibCpriao4WeJEsikrUXBdCiAGN X-Received: by 2002:a17:90b:381:b0:26b:59b7:edb with SMTP id ga1-20020a17090b038100b0026b59b70edbmr9623655pjb.33.1695092315847; Mon, 18 Sep 2023 19:58:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695092315; cv=none; d=google.com; s=arc-20160816; b=WJZkpaUs10zWOEvvseRAeIZ42ZKFREPS0x3zl/+TzGNjOZbEPWZX9jefPGj+fji3Kl 7+cBG+118+1FktVW4QrwNMQgjexLxUVODLsrtIX8zE+mRf3PwlCyhp7QrOHOEDh2NYou iWcdJVtM6IQczP7r64tSShhkR1esorp636z9k6+G4vAB1ZK4Vcn5ct9GRIlPQRqeIwzf U5A1rzXIy5tukKyzzNQfCCkZJK7LBARomyy7Vp8hxJkTu04cNk0nHJEo4pFl6pSAIJl4 zcgb05WuCxw08gi8VhtXWzMn6u9EETXR54la+7jwQAtZl3Ba3z7midQKmDSJdItir3dp 77Ig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=ORUv82vqRYOMSTI3MAI/v8wlx+Ql9aKnbQUGVaGlaPk=; fh=Q8I0HaN+ohuAcI7Ai7F+sm8IGMCZjIQhTKbXCoQCPg8=; b=gXPf+m5QxN2pQdHNXQfhjb3chiHm0KRcDrGtAmFdQn86SGB2OEHDS0GFe2dQ9QLVbX b6b7JUj3JDPaeozmAoD9sydoeO4+1Su9SLpTSrNU/QZZT7tu4iKHbLhEx4yKyB/JDyLX JpYvS7Zw/DiojSly4Cg76ZU5qXejycZtLi04cDAofYf82Pr/w+7iKRAyjMW5ALqnLUZ4 /SviHjIzcSk934ohwpvQkJyBxL45hPhUmr05qL6BiTfBWi6NqJ4q14oVggNNCkzWySqu NZGO/0N5xYPN0zKls5un5yXckYkwbZcJk7kIPyACKpnpNqjB++DodEaibn1wmcJ8T2sg /9+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=wS5pFz53; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from fry.vger.email (fry.vger.email. [2620:137:e000::3:8]) by mx.google.com with ESMTPS id h11-20020a17090ac38b00b00276571c0d31si3120402pjt.38.2023.09.18.19.58.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Sep 2023 19:58:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) client-ip=2620:137:e000::3:8; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=wS5pFz53; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 64A3F80C779D; Mon, 18 Sep 2023 19:58:33 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231213AbjISC6a (ORCPT + 99 others); Mon, 18 Sep 2023 22:58:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46410 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231163AbjISC63 (ORCPT ); Mon, 18 Sep 2023 22:58:29 -0400 Received: from mail-pg1-x534.google.com (mail-pg1-x534.google.com [IPv6:2607:f8b0:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D3E9910F for ; Mon, 18 Sep 2023 19:58:22 -0700 (PDT) Received: by mail-pg1-x534.google.com with SMTP id 41be03b00d2f7-578a397a30dso709432a12.2 for ; Mon, 18 Sep 2023 19:58:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1695092302; x=1695697102; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=ORUv82vqRYOMSTI3MAI/v8wlx+Ql9aKnbQUGVaGlaPk=; b=wS5pFz53WfNB6zNhEWhSmjVJ0B0Xa2hBhIUJf3cIGo/91N0TXEJD1Cfik73/QZo4Cm RGI2GX32YFTQl72rpLuSB9NfC6885oYvj89N8t5Ep1dXQgbi3jv5wA/PF8HyCtUb8feP A9d3Pm4C14Y1nJttPsSnx0IQvK5xqqxA7iUOgrP04WIXrubq68icdLPXfWcoQXVXZSW4 O550msgyjxZFVobdG2NQdx68kGGUK0QYuTPqqCyr+IVxsnvL//dY4XbY7BPw/uKgJnA/ aN9Pz7L99wjHost2RQW39mXcCW9T9zHek8jhcjELQH51PO7iSw4w9pOGjK0xhTesroJR muZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695092302; x=1695697102; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=ORUv82vqRYOMSTI3MAI/v8wlx+Ql9aKnbQUGVaGlaPk=; b=OWpQumes1SPzPS7I8n7lYBPRCy3vTAtCzR5XshL0oqru0Es3/xUfeu7tmq9EALtvfn dk6LunN7MymYHj+q5dbVdiwCMVIf+D7NbpkN1UNkKj9lTTowEde38kuIESyfEWQ7Lqir SsEOmNf+TnqLNQ3gNmFsFOHl8n/FGSj5wYR+a/8WfjRNzywF5Ijr4kkzYwM5dJhn5/hJ Y7hS/OhiWgac4LcCzIwhnEuMQdUb8s+Vy3sVRRAHJOt1GBjosG2fs+RbjpCaE6dLktY7 tTSc5A2BrCPH3X/1WDiOmQefmlsBdnmExbHtpZZ4SbzloGA9xonznPzx2wHj74hZRULD P7IA== X-Gm-Message-State: AOJu0YwgGRFCyoUTN1MpkE6mlwWs41EXxlmtHGrVhsIB2MCsF4eMubCe 0M3loCVkHF7s7CLLbxNc2TQk07JxgMTh2RZJMYY= X-Received: by 2002:a17:902:eb46:b0:1c5:76b6:d94f with SMTP id i6-20020a170902eb4600b001c576b6d94fmr4069811pli.31.1695092302279; Mon, 18 Sep 2023 19:58:22 -0700 (PDT) Received: from ghost ([50.168.177.76]) by smtp.gmail.com with ESMTPSA id jj6-20020a170903048600b001b8c6662094sm5038719plb.188.2023.09.18.19.58.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Sep 2023 19:58:21 -0700 (PDT) Date: Mon, 18 Sep 2023 22:58:17 -0400 From: Charlie Jenkins To: David Laight Cc: Palmer Dabbelt , Conor Dooley , Samuel Holland , "linux-riscv@lists.infradead.org" , "linux-kernel@vger.kernel.org" , "linux-arch@vger.kernel.org" , Paul Walmsley , Albert Ou , Arnd Bergmann Subject: Re: [PATCH v6 3/4] riscv: Add checksum library Message-ID: References: <20230915-optimize_checksum-v6-0-14a6cf61c618@rivosinc.com> <20230915-optimize_checksum-v6-3-14a6cf61c618@rivosinc.com> <0357e092c05043fba13eccad77ba799f@AcuMS.aculab.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <0357e092c05043fba13eccad77ba799f@AcuMS.aculab.com> X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Mon, 18 Sep 2023 19:58:33 -0700 (PDT) On Sat, Sep 16, 2023 at 09:32:40AM +0000, David Laight wrote: > From: Charlie Jenkins > > Sent: 15 September 2023 18:01 > > > > Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit > > will load from the buffer in groups of 32 bits, and when compiled for > > 64-bit will load in groups of 64 bits. > > > ... > > + /* > > + * Do 32-bit reads on RV32 and 64-bit reads otherwise. This should be > > + * faster than doing 32-bit reads on architectures that support larger > > + * reads. > > + */ > > + while (len > 0) { > > + csum += data; > > + csum += csum < data; > > + len -= sizeof(unsigned long); > > + ptr += 1; > > + data = *ptr; > > + } > > I think you'd be better adding the 'carry' bits in a separate > variable. > It reduces the register dependency chain length in the loop. > (Helps if the cpu can execute two instructions in one clock.) > > The masked misaligned data values are max 24 bits > (if > > You'll also almost certainly remove at least one instruction > from the loop by comparing against the end address rather than > changing 'len'. > > So ending up with (something like): > end = buff + length; > ... > while (++ptr < end) { > csum += data; > carry += csum < data; > data = ptr[-1]; > } > (Although a do-while loop tends to generate better code > and gcc will pretty much always make that transformation.) > > I think that is 4 instructions per word (load, add, cmp+set, add). > In principle they could be completely pipelined and all > execute (for different loop iterations) in the same clock. > (But that is pretty unlikely to happen - even x86 isn't that good.) > But taking two clocks is quite plausible. > Plus 2 instructions per loop (inc, cmp+jmp). > They might execute in parallel, but unrolling once > may be required. > It looks like GCC actually ends up generating 7 total instructions: ffffffff808d2acc: 97b6 add a5,a5,a3 ffffffff808d2ace: 00d7b533 sltu a0,a5,a3 ffffffff808d2ad2: 0721 add a4,a4,8 ffffffff808d2ad4: 86be mv a3,a5 ffffffff808d2ad6: 962a add a2,a2,a0 ffffffff808d2ad8: ff873783 ld a5,-8(a4) ffffffff808d2adc: feb768e3 bltu a4,a1,ffffffff808d2acc This mv instruction could be avoided if the registers were shuffled around, but perhaps this way reduces some dependency chains. > ... > > + if (IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && > > + riscv_has_extension_likely(RISCV_ISA_EXT_ZBB)) { > ... > > + } > > +end: > > + return csum >> 16; > > + } > > Is it really worth doing all that to save (I think) 4 instructions? > (shift, shift, or with rotate twice). > There is much more to be gained by careful inspection > of the loop (even leaving it in C). > The main benefit was from using rev8 to replace swab32. However, now that I am looking at the assembly in the kernel it is not outputting the asm that matches what I have from an out of kernel test case, so rev8 might not be beneficial. I am going to have to look at this more to figure out what is happening. > > + > > +#ifndef CONFIG_32BIT > > + csum += (csum >> 32) | (csum << 32); > > + csum >>= 32; > > +#endif > > + csum = (unsigned int)csum + (((unsigned int)csum >> 16) | ((unsigned int)csum << 16)); > > Use ror64() and ror32(). > > David > Good idea. - Charlie > > + if (offset & 1) > > + return (unsigned short)swab32(csum); > > + return csum >> 16; > > +} > > > > -- > > 2.42.0 > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales)