Date: Tue, 19 Sep 2023 14:04:48 -0400
From: Charlie Jenkins
To: David Laight
Cc: Palmer Dabbelt, Conor Dooley, Samuel Holland,
 "linux-riscv@lists.infradead.org", "linux-kernel@vger.kernel.org",
 "linux-arch@vger.kernel.org", Paul Walmsley, Albert Ou, Arnd Bergmann
Subject: Re: [PATCH v6 3/4] riscv: Add checksum library
References: <20230915-optimize_checksum-v6-0-14a6cf61c618@rivosinc.com>
 <20230915-optimize_checksum-v6-3-14a6cf61c618@rivosinc.com>
 <0357e092c05043fba13eccad77ba799f@AcuMS.aculab.com>
 <0fe9694900c7492c96dce6b67710173f@AcuMS.aculab.com>
In-Reply-To: <0fe9694900c7492c96dce6b67710173f@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 19, 2023 at 08:00:12AM +0000, David Laight wrote:
> ...
> > > So ending up with (something like):
> > >     end = buff + length;
> > >     ...
> > >     while (++ptr < end) {
> > >         csum += data;
> > >         carry += csum < data;
> > >         data = ptr[-1];
> > >     }
> > > (Although a do-while loop tends to generate better code
> > > and gcc will pretty much always make that transformation.)
> > >
> > > I think that is 4 instructions per word (load, add, cmp+set, add).
> > > In principle they could be completely pipelined and all
> > > execute (for different loop iterations) in the same clock.
> > > (But that is pretty unlikely to happen - even x86 isn't that good.)
> > > But taking two clocks is quite plausible.
> > > Plus 2 instructions per loop (inc, cmp+jmp).
> > > They might execute in parallel, but unrolling once
> > > may be required.
> > >
> > It looks like GCC actually ends up generating 7 total instructions:
> > ffffffff808d2acc:   97b6        add   a5,a5,a3
> > ffffffff808d2ace:   00d7b533    sltu  a0,a5,a3
> > ffffffff808d2ad2:   0721        add   a4,a4,8
> > ffffffff808d2ad4:   86be        mv    a3,a5
> > ffffffff808d2ad6:   962a        add   a2,a2,a0
> > ffffffff808d2ad8:   ff873783    ld    a5,-8(a4)
> > ffffffff808d2adc:   feb768e3    bltu  a4,a1,ffffffff808d2acc
> >
> > This mv instruction could be avoided if the registers were shuffled
> > around, but perhaps this way reduces some dependency chains.
>
> gcc managed to do 'data += csum' so had to add 'csum = data'.
> If you unroll once that might go away.
> It might then be 10 instructions for 16 bytes.
> Although you then need slightly larger alignment code.
>
> David
>

I messed with it a bit and couldn't get the mv to go away. I would expect
mv to be very cheap, so it should be fine. I would also like to avoid
adding much more to the alignment code since it is already large, and I
assume that buff will be aligned more often than not.

Interestingly, the mv does not appear with gcc versions before 12, and
does not appear with clang.

- Charlie
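
For reference, a minimal sketch in plain C of the two loop shapes discussed
above: one 64-bit word per iteration, and the same loop unrolled once so that
each iteration covers 16 bytes. This is not the code from the patch. The names
sum_words, sum_words_unrolled and fold_carry are made up for illustration, and
the sketch assumes the buffer is 8-byte aligned and holds a whole number of
64-bit words, so the alignment handling mentioned above is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add the deferred carry count back into the sum. */
static uint64_t fold_carry(uint64_t csum, uint64_t carry)
{
	csum += carry;
	csum += csum < carry;	/* end-around carry if that add wrapped */
	return csum;
}

/* One word per iteration: load, add, compare+set, add. */
static uint64_t sum_words(const uint64_t *ptr, size_t nwords)
{
	const uint64_t *end = ptr + nwords;
	uint64_t csum = 0, carry = 0;

	assert(ptr != NULL || nwords == 0);
	while (ptr < end) {
		uint64_t data = *ptr++;

		csum += data;
		carry += csum < data;	/* 1 if the add wrapped */
	}
	return fold_carry(csum, carry);
}

/* Unrolled once: two words (16 bytes) per iteration, plus an odd tail. */
static uint64_t sum_words_unrolled(const uint64_t *ptr, size_t nwords)
{
	const uint64_t *end = ptr + (nwords & ~(size_t)1);
	uint64_t csum = 0, carry = 0;

	assert(ptr != NULL || nwords == 0);
	while (ptr < end) {
		uint64_t d0 = ptr[0], d1 = ptr[1];

		csum += d0;
		carry += csum < d0;
		csum += d1;
		carry += csum < d1;
		ptr += 2;
	}
	if (nwords & 1) {
		uint64_t data = *ptr;

		csum += data;
		carry += csum < data;
	}
	return fold_carry(csum, carry);
}
```

Whether the unrolled form actually drops the mv depends on the compiler and
its version, so the generated code would still need to be checked with
objdump on each toolchain of interest.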