From: David Laight
To: "linux-kernel@vger.kernel.org", "netdev@vger.kernel.org", "x86@kernel.org"
CC: Arnd Bergmann, Thomas Gleixner, "Ingo Molnar", "dave.hansen@linux.intel.com"
Subject: Optimising csum_fold()
Date: Tue, 22 Nov 2022 13:08:23 +0000
X-Mailing-List: linux-kernel@vger.kernel.org

There are currently 20 copies of csum_fold(), some in C, some in assembler.
The default C version (in asm-generic/checksum.h) is pretty horrid.
Some of the asm versions (including x86 and x86-64) aren't much better.

There are 3 pretty good C versions:
  1: (~sum - rol32(sum, 16)) >> 16
  2: ~(sum + rol32(sum, 16)) >> 16
  3: (u16)~((sum + rol32(sum, 16)) >> 16)
All three are (usually) 4 arithmetic instructions.

The first two have the advantage that the high bits are zero.
This is relevant when the value is being checked rather than set.

The first one can generate better instruction scheduling (the rotate
and invert can be executed in the same clock).

The 3rd one saves an instruction on arm, but may need masking.
(I've not compiled an arm kernel to see how often that happens.)

The only architectures where (I think) the current asm code is better
than the C above are sparc and sparc64.
Sparc doesn't have a rotate instruction, but does have a carry flag.
This makes the current asm version one instruction shorter.

For architectures like mips and risc-v, which have neither rotate
instructions nor carry flags, the C is as good as the current asm.
The rotate is 3 instructions - the same as the extra cmp+add.
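For anyone wanting to convince themselves that the three expressions above
agree, here is a small user-space sketch. It makes several assumptions: u32
and u16 are replaced by the stdint types, rol32() is written out by hand,
and fold_ref() is a plain two-step fold used only as a baseline (it is not
copied from any kernel header). The names fold_ref, fold_v1, fold_v2,
fold_v3, folds_agree and check_folds are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits; n must be in 1..31. */
static inline uint32_t rol32(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Baseline: add the two 16-bit halves, fold the carry back in, invert. */
static inline uint16_t fold_ref(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Version 1: invert and rotate are independent, then one subtract. */
static inline uint32_t fold_v1(uint32_t sum)
{
	return (~sum - rol32(sum, 16)) >> 16;
}

/* Version 2: add, invert, shift; high 16 bits of the result are zero. */
static inline uint32_t fold_v2(uint32_t sum)
{
	return ~(sum + rol32(sum, 16)) >> 16;
}

/* Version 3: shift first, so the invert needs truncating to 16 bits. */
static inline uint16_t fold_v3(uint32_t sum)
{
	return (uint16_t)~((sum + rol32(sum, 16)) >> 16);
}

/* Non-zero when all three versions match the baseline for this input. */
static inline int folds_agree(uint32_t sum)
{
	uint16_t ref = fold_ref(sum);

	return fold_v1(sum) == ref && fold_v2(sum) == ref &&
	       fold_v3(sum) == ref;
}

/* Check the edge cases plus a sparse walk over the 32-bit input space. */
void check_folds(void)
{
	uint64_t s;

	assert(folds_agree(0));
	assert(folds_agree(0xffffffffu));
	assert(folds_agree(0xffff0001u));
	for (s = 0; s <= 0xffffffffu; s += 65521)
		assert(folds_agree((uint32_t)s));
}
```

The equivalence of versions 1 and 2 follows from ~x == -x - 1 in 32-bit
arithmetic, so ~sum - rol32(sum, 16) is the same value as
~(sum + rol32(sum, 16)). Calling check_folds() from a main() is enough to
exercise the comparison; it says nothing about the instruction counts on
any particular architecture.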
Changing everything to use [1] would improve quite a few architectures
while only adding 1 clock to some paths in arm/arm64 and sparc.

Unfortunately it is all currently a mess.
Most architectures don't include asm-generic/checksum.h at all.

Thoughts?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)