From: David Laight
To: 'Charlie Jenkins'
CC: Palmer Dabbelt, Conor Dooley, Samuel Holland,
	"linux-riscv@lists.infradead.org", "linux-kernel@vger.kernel.org",
	"linux-arch@vger.kernel.org", Paul Walmsley, Albert Ou, Arnd Bergmann
Subject: RE: [PATCH v6 3/4] riscv: Add checksum library
Date: Tue, 19 Sep 2023 08:00:12 +0000
Message-ID: <0fe9694900c7492c96dce6b67710173f@AcuMS.aculab.com>
References: <20230915-optimize_checksum-v6-0-14a6cf61c618@rivosinc.com>
	<20230915-optimize_checksum-v6-3-14a6cf61c618@rivosinc.com>
	<0357e092c05043fba13eccad77ba799f@AcuMS.aculab.com>

...
> > So ending up with (something like):
> > 	end = buff + length;
> > 	...
> > 	while (++ptr < end) {
> > 		csum += data;
> > 		carry += csum < data;
> > 		data = ptr[-1];
> > 	}
> > (Although a do-while loop tends to generate better code
> > and gcc will pretty much always make that transformation.)
> >
> > I think that is 4 instructions per word (load, add, cmp+set, add).
> > In principle they could be completely pipelined and all
> > execute (for different loop iterations) in the same clock.
> > (But that is pretty unlikely to happen - even x86 isn't that good.)
> > But taking two clocks is quite plausible.
> > Plus 2 instructions per loop (inc, cmp+jmp).
> > They might execute in parallel, but unrolling once
> > may be required.
> >
> It looks like GCC actually ends up generating 7 total instructions:
> ffffffff808d2acc:	97b6		add	a5,a5,a3
> ffffffff808d2ace:	00d7b533	sltu	a0,a5,a3
> ffffffff808d2ad2:	0721		add	a4,a4,8
> ffffffff808d2ad4:	86be		mv	a3,a5
> ffffffff808d2ad6:	962a		add	a2,a2,a0
> ffffffff808d2ad8:	ff873783	ld	a5,-8(a4)
> ffffffff808d2adc:	feb768e3	bltu	a4,a1,ffffffff808d2acc
>
> This mv instruction could be avoided if the registers were shuffled
> around, but perhaps this way reduces some dependency chains.

gcc managed to do 'data += csum' so had to add 'csum = data'.
If you unroll once that might go away.
It might then be 10 instructions for 16 bytes.
Although you then need slightly larger alignment code.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
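
For reference, a minimal standalone sketch of the two loop shapes discussed
above: one word per iteration, and unrolled once (two words per iteration).
The names csum_words() and csum_words_unrolled() are hypothetical and are not
taken from the patch. The sketch assumes the buffer is already 8-byte aligned
and a whole number of 64-bit words, and it says nothing about what code a
given compiler will actually emit for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold the accumulated carries back into the sum (end-around carry). */
static uint64_t csum_fold_carry(uint64_t csum, uint64_t carry)
{
	csum += carry;
	csum += csum < carry;	/* the fold itself may wrap once */
	return csum;
}

/* Hypothetical helper: one 64-bit word per loop iteration. */
static uint64_t csum_words(const uint64_t *buff, size_t nwords)
{
	uint64_t csum = 0, carry = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t data = buff[i];

		csum += data;
		carry += csum < data;	/* 1 if the add wrapped */
	}
	return csum_fold_carry(csum, carry);
}

/* Hypothetical helper: the same sum, unrolled once. */
static uint64_t csum_words_unrolled(const uint64_t *buff, size_t nwords)
{
	uint64_t csum = 0, carry = 0;
	size_t i = 0;

	for (; i + 2 <= nwords; i += 2) {
		uint64_t d0 = buff[i];
		uint64_t d1 = buff[i + 1];

		csum += d0;
		carry += csum < d0;
		csum += d1;
		carry += csum < d1;
	}
	if (i < nwords) {	/* odd word left over */
		uint64_t data = buff[i];

		csum += data;
		carry += csum < data;
	}
	return csum_fold_carry(csum, carry);
}
```

The unrolled variant needs the extra tail check for an odd word count, which
is the same kind of additional edge handling as the larger alignment code
mentioned above.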