From: David Laight
To: 'Charlie Jenkins', Helge Deller
CC: Guenter Roeck, Helge Deller, "James E . J . Bottomley", "linux-parisc@vger.kernel.org", "linux-kernel@vger.kernel.org", Palmer Dabbelt
Subject: RE: [PATCH] parisc: Fix csum_ipv6_magic on 64-bit systems
Date: Sat, 17 Feb 2024 22:47:31 +0000
Message-ID: <8c5a811655004999ba187e69fe2d5fbf@AcuMS.aculab.com>
References: <20240213234631.940055-1-linux@roeck-us.net>

..
> We can do better than this! By inspection this looks like a performance
> regression. The generic version of csum_fold in
> include/asm-generic/checksum.h is better than this so should be used
> instead.

Yes, that got changed for 6.8-rc1 (I pretty much suggested the patch)
but I hadn't noticed that Linus had applied it.

That C version is (probably) not worse than any of the asm versions
except sparc32, which has a carry flag but no rotate.
(It is better than the x86-64 asm one.)

..
> This doesn't leverage add with carry well. This causes the code size of this
> to be dramatically larger than the original assembly, which I assume
> nicely correlates to an increased execution time.

It is pretty much impossible to do add with carry from C.
So an asm adc block is pretty much always going to win.

For csum_partial and short to moderate length buffers on x86 it is
hard to beat:
	10:	adc, adc, dec, jnz 10b
which (on modern Intel cpus at least) does 8 bytes/clock.
You can get 12 bytes/clock, but it only really wins for 256+ bytes.
(See the current x86-64 version.)
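A rough userspace sketch of the two C-side points above, not code from any kernel tree: without add-with-carry, C can still sum 32-bit words into a 64-bit accumulator so that the carries collect in the top half, and fold once at the end. fold16_ror() follows the rotate-based fold that the generic csum_fold() in include/asm-generic/checksum.h is assumed to use after the 6.8-rc1 change; the helper names (csum32_words, fold64, fold16_ror, fold16_classic) are hypothetical, and ror32() here is only a local stand-in for the kernel helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's ror32() (assumes 0 < shift < 32). */
static inline uint32_t ror32(uint32_t word, unsigned int shift)
{
	return (word >> shift) | (word << (32 - shift));
}

/*
 * Hypothetical helper: add 32-bit words into a 64-bit accumulator.
 * Carries spill into bits 32..63 instead of needing adc, so this is
 * safe for fewer than 2^32 words.
 */
static uint64_t csum32_words(const uint32_t *p, size_t n)
{
	uint64_t acc = 0;

	while (n--)
		acc += *p++;
	return acc;
}

/* Fold a 64-bit accumulator to 32 bits with end-around carry. */
static uint32_t fold64(uint64_t acc)
{
	acc = (acc & 0xffffffffu) + (acc >> 32);	/* at most 33 bits */
	acc = (acc & 0xffffffffu) + (acc >> 32);	/* now fits in 32 bits */
	return (uint32_t)acc;
}

/* Rotate-based fold to the complemented 16-bit checksum. */
static uint16_t fold16_ror(uint32_t sum)
{
	return (uint16_t)((~sum - ror32(sum, 16)) >> 16);
}

/* Classic two-step fold, for comparison. */
static uint16_t fold16_classic(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

The rotate form is just not, rotate, subtract and shift, with no dependent second fold, which is presumably why it only loses on a cpu that lacks a rotate instruction.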
For cpus without a carry flag it is likely that a common C function will
be pretty much optimal on all architectures.
(Or maybe a couple of implementations based on the actual cpu
implementation, not the architecture.)
Mostly I don't think you can beat 4 instructions/word, but they will
pipeline, so with multi-issue you might get a read/clock.
Arm's barrel shifter might give 3: v = *p; x += v, y += v >> 32.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
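The three-instruction Arm loop mentioned above (v = *p; x += v, y += v >> 32) can be sketched in portable C as below. This is one interpretation of that one-liner, not kernel code: x takes a plain 64-bit add so the carries out of the low halves land in bits 32..63, y sums the high halves exactly, and the low-half carry count is recovered at the end as (x >> 32) - y truncated to 32 bits. The function names are hypothetical, and the sketch assumes fewer than 2^31 words so nothing overflows.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: sum of all 32-bit halves of n 64-bit words
 * without a carry flag.  Per word: load v, x += v, y += v >> 32
 * (on Arm the shift can fold into the add).  Assumes n < 2^31.
 */
static uint64_t sum_halves_3op(const uint64_t *p, size_t n)
{
	uint64_t x = 0, y = 0;

	while (n--) {
		uint64_t v = *p++;
		x += v;		/* wraps mod 2^64; low-half carries land in bit 32+ */
		y += v >> 32;	/* exact sum of the high halves */
	}

	/* Bits 32..63 of x hold (low-half carries + high halves) mod 2^32. */
	uint32_t carries = (uint32_t)((x >> 32) - y);

	/* Sum of the low halves is (x mod 2^32) + carries * 2^32. */
	return (x & 0xffffffffu) + ((uint64_t)carries << 32) + y;
}

/* Straightforward reference: add each half separately into 64 bits. */
static uint64_t sum_halves_ref(const uint64_t *p, size_t n)
{
	uint64_t sum = 0;

	while (n--) {
		sum += (uint32_t)*p;
		sum += *p++ >> 32;
	}
	return sum;
}
```

The result can then be folded to 16 bits in the usual way. The reference version needs a mask (or zero-extend) per word that the x/y split avoids, which is where the fourth instruction goes.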