Received: by 2002:a05:7412:31a9:b0:e2:908c:2ebd with SMTP id et41csp3120421rdb; Wed, 13 Sep 2023 02:41:01 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHArZUzyzjBn98Nq/u+BqdeB9HKOwqArJJyHWIWis5o86hnMjCfPLHzXAlCYPhU89d8okVw X-Received: by 2002:a17:902:ced1:b0:1bf:712:e4bd with SMTP id d17-20020a170902ced100b001bf0712e4bdmr2581518plg.65.1694598061428; Wed, 13 Sep 2023 02:41:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694598061; cv=none; d=google.com; s=arc-20160816; b=SuCIekL2ouL8N9eAgd55G+9BuMYO/nBwhWbGpn4kjpjGBtdx7AcMDy8VlBHSG0ZHpw zP0/JTYMwQEAw7Kp26enrm+FEj3O7iTb9meORaBzL3quwh+Y8DKCCnKeoF3Pop3PDnxq sturP8pA0ONHZXKZfXcS+36VVJYek/vetuudO5IBWvHlJW/cr6elS+868xhEBkktxPNC lfuNQTpza+Hk52FEnAg73P6hNa4J/JSg1qAfwNqQghJdX0MLArd8wf5NjoOlpjGPSjOK hE7JAF6i4mJi9qmNxpdcZFOLJX6PaTyOhfTDBGhObxizS2zsVnncxPW5sA5ZacFuKnpI YLVg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=8Pf7pTuUGrfS3QfAFTCC6m/hHffMqkAAMnuVf+s/49s=; fh=vrXue3HNekBLAIYZZU1Ekd2BKy48UDfvKHSKqZu3QXI=; b=pZBVJvdXawhqJOe/IlOWk02H59u3NnHM/NeQ3Qc3m+IgkqXN59OGsLtoam8v1c5JMX uWBwo3lS9DFDGqaSPMJlXlXKfnOKXzrXI6g0TOL+wYmOsTFaYBIzlp0ksR2SZqtalrQf OgCZmeEY0gmlQBRmale0cst1OU9LMFjifU8E9zV3oFbd2JfTwdzUNZ19QQM9tkWxIPtj lMA0qptM9kS9ASNcGnyU9v+RexIkHyhUV5fXlSSYLMngGQx0vXgOWq4kY02QmJBILZ6C 5RibtcOcMtR9TRr4HhYxvwzPI1gzEI/VbeMxsDVboO/i28+bSK2Q/abiIeKBzmtJ+oin ZVag== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=QZIHmIgC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from fry.vger.email (fry.vger.email. [2620:137:e000::3:8]) by mx.google.com with ESMTPS id cp1-20020a170902e78100b001c0727658c3si9558857plb.259.2023.09.13.02.41.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 Sep 2023 02:41:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) client-ip=2620:137:e000::3:8; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=QZIHmIgC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 8A31880C77A7; Tue, 12 Sep 2023 20:10:10 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230013AbjIMDKJ (ORCPT + 99 others); Tue, 12 Sep 2023 23:10:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231181AbjIMDKG (ORCPT ); Tue, 12 Sep 2023 23:10:06 -0400 Received: from mail-pj1-x1030.google.com (mail-pj1-x1030.google.com [IPv6:2607:f8b0:4864:20::1030]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD719170E for ; Tue, 12 Sep 2023 20:09:51 -0700 (PDT) Received: by mail-pj1-x1030.google.com with SMTP id 98e67ed59e1d1-2740f8d73aeso2412716a91.1 for ; Tue, 12 Sep 2023 20:09:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1694574591; x=1695179391; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=8Pf7pTuUGrfS3QfAFTCC6m/hHffMqkAAMnuVf+s/49s=; b=QZIHmIgCZ6f4BhJ7kkc7w9nwkefUydaKLIvCzRibASi1MP2IpETq+K48Z/NxhsxYeo HLEZkermQg0gdLhU6LL/FB2xI3076jQx3rZVPKE3j2qpnZNtL/nRLwmRqMIoaJaFiw4z wPQExOrgfNpOXTsXp+IoEK8ZFvaANuxI6VB/qNFCMmA+4F3Tz7OhpHWZJFQSgzKINLPI TGU0qq4673zAOvL7JAT9YIb235nh8RlpL7GqDm4mSCMHqeCHIsAWZzpinySGY1KrkOPh YesruQeuytiVaZiSGgpknCe3T0qRNiNi2QocNIeRuQ7sRBBTCA+zSfrrxK1WHXlg4cBR lMNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694574591; x=1695179391; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=8Pf7pTuUGrfS3QfAFTCC6m/hHffMqkAAMnuVf+s/49s=; b=RLLwytfGYlppOnkhwB+gC8YFlletIGhA6J12M1ynosW8RagWIvEqhuG1uVsVJS2TZ7 kQZCDuWAmgU7UmKj/GuBzGSPq6ZU6WkQ0rSm5qrHNd1H8SvJ4nBbUOkiowwQjHHUdIRH atpfWdu9omYOOcK5riwyHCg+XMiL5Hh2u4abjrpX09SMs2iZiWKFV0sfBlxkoMLlSn+T azPm8dTeJhgwEikUH0CvkOKwEFE0WoyU2pqsdkotXVLkEcz7B97SFHLdMWnWPbYMVsVo P+Ub+uJqPOrNHpvhELvbrPwXt+AWQWLW0T5jCfvjkrNXVDnGHK2s376HhRKlulXK/Nso NyWg== X-Gm-Message-State: AOJu0Yxxhv4gDMNgJkysqx1nM7dEzKLccK6Bc+TLsJNFTFSUEeDD2u/X 7o7Vc3Actf3/58Y1j9ZVeNqCnA== X-Received: by 2002:a17:90a:668f:b0:274:1f99:290 with SMTP id m15-20020a17090a668f00b002741f990290mr986168pjj.34.1694574591042; Tue, 12 Sep 2023 20:09:51 -0700 (PDT) Received: from ghost ([50.168.177.76]) by smtp.gmail.com with ESMTPSA id fv23-20020a17090b0e9700b002740e66851asm333095pjb.35.2023.09.12.20.09.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Sep 2023 20:09:50 -0700 (PDT) Date: Tue, 12 Sep 2023 20:09:47 -0700 From: Charlie Jenkins To: David Laight Cc: Palmer Dabbelt , Conor Dooley , Samuel Holland , "linux-riscv@lists.infradead.org" , "linux-kernel@vger.kernel.org" , Paul Walmsley , Albert Ou Subject: Re: [PATCH v4 2/5] riscv: Add checksum library Message-ID: References: <20230911-optimize_checksum-v4-0-77cc2ad9e9d7@rivosinc.com> <20230911-optimize_checksum-v4-2-77cc2ad9e9d7@rivosinc.com> <1818c4114b0e4144a9df21f235984840@AcuMS.aculab.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1818c4114b0e4144a9df21f235984840@AcuMS.aculab.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Tue, 12 Sep 2023 20:10:10 -0700 (PDT) X-Spam-Status: No, score=2.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_SBL_CSS, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Level: ** X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email On Tue, Sep 12, 2023 at 08:45:38AM +0000, David Laight wrote: > From: Charlie Jenkins > > Sent: 11 September 2023 23:57 > > > > Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit > > will load from the buffer in groups of 32 bits, and when compiled for > > 64-bit will load in groups of 64 bits. Benchmarking by proxy compiling > > csum_ipv6_magic (64-bit version) for an x86 chip as well as running > > the riscv generated code in QEMU, discovered that summing in a > > tree-like structure is about 4% faster than doing 64-bit reads. > > > ... > > + sum = saddr->s6_addr32[0]; > > + sum += saddr->s6_addr32[1]; > > + sum1 = saddr->s6_addr32[2]; > > + sum1 += saddr->s6_addr32[3]; > > + > > + sum2 = daddr->s6_addr32[0]; > > + sum2 += daddr->s6_addr32[1]; > > + sum3 = daddr->s6_addr32[2]; > > + sum3 += daddr->s6_addr32[3]; > > + > > + sum4 = csum; > > + sum4 += ulen; > > + sum4 += uproto; > > + > > + sum += sum1; > > + sum2 += sum3; > > + > > + sum += sum2; > > + sum += sum4; > > Have you got gcc to compile that as-is? > > Whenever I've tried to get a 'tree add' compiled so that the > early adds can be executed in parallel gcc always pessimises > it to a linear sequence of adds. > > But I agree that adding 32bit values to a 64bit register > may be no slower than trying to do an 'add carry' sequence > that is guaranteed to only do one add/clock. > (And on Intel cpu from core-2 until IIRC Haswell adc took 2 clocks!) > > IIRC RISCV doesn't have a carry flag, so the adc sequence > is hard - probably takes two extra instructions per value. > Although with parallel execute it may not matter. > Consider: > val = buf[offset]; > sum += val; > carry += sum < val; > val = buf[offset1]; > sum += val; > ... > the compare and 'carry +=' can be executed at the same time > as the following two instructions. > You do then a final sum += carry; sum += sum < carry; > > Assuming all instructions are 1 clock and any read delays > get filled with other instructions (by source or hardware > instruction re-ordering) even without parallel execute > that is 4 clocks for 64 bits, which is much the same as the > 2 clocks for 32 bits. > > Remember that all the 32bit values can summed first as > they won't overflow. > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) Yeah it does seem like the tree-add does just do a linear add. All three of them were pretty much the same on riscv so I used the version that did best on x86 with the knowledge that my QEMU setup does not accurately represent real hardware. I don't quite understand how doing the carry in the middle of each stage, even though it can be executed at the same time, would be faster than just doing a single overflow check at the end. I can just revert back to the non-tree add version since there is no improvement on riscv. I can also revert back to the default version that uses carry += sum < val as well. - Charlie