Date: Sun, 13 Feb 2022 03:16:19 -0600
From: Segher Boessenkool
To: David Laight
Cc: "'Christophe Leroy'", "David S. Miller", Jakub Kicinski,
    "netdev@vger.kernel.org", "linuxppc-dev@lists.ozlabs.org",
    "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH] net: Remove branch in csum_shift()
Message-ID: <20220213091619.GY614@gate.crashing.org>
In-Reply-To: <7f16910a8f63475dae012ef5135f41d1@AcuMS.aculab.com>

On Sun, Feb 13, 2022 at 02:39:06AM +0000, David Laight wrote:
> From: Christophe Leroy
> > Sent: 11 February 2022 08:48
> >
> > Today's implementation of csum_shift() leads to branching based on
> > parity of 'offset'
> >
> > 000002f8 :
> >  2f8:	70 a5 00 01 	andi.   r5,r5,1
> >  2fc:	41 a2 00 08 	beq     304
> >  300:	54 84 c0 3e 	rotlwi  r4,r4,24
> >  304:	7c 63 20 14 	addc    r3,r3,r4
> >  308:	7c 63 01 94 	addze   r3,r3
> >  30c:	4e 80 00 20 	blr
> >
> > Use first bit of 'offset' directly as input of the rotation instead of
> > branching.
> >
> > 000002f8 :
> >  2f8:	54 a5 1f 38 	rlwinm  r5,r5,3,28,28
> >  2fc:	20 a5 00 20 	subfic  r5,r5,32
> >  300:	5c 84 28 3e 	rotlw   r4,r4,r5
> >  304:	7c 63 20 14 	addc    r3,r3,r4
> >  308:	7c 63 01 94 	addze   r3,r3
> >  30c:	4e 80 00 20 	blr
> >
> > And change to left shift instead of right shift to skip one more
> > instruction. This has no impact on the final sum.
> >
> > 000002f8 :
> >  2f8:	54 a5 1f 38 	rlwinm  r5,r5,3,28,28
> >  2fc:	5c 84 28 3e 	rotlw   r4,r4,r5
> >  300:	7c 63 20 14 	addc    r3,r3,r4
> >  304:	7c 63 01 94 	addze   r3,r3
> >  308:	4e 80 00 20 	blr
>
> That is ppc64.

That is 32-bit powerpc.

> What happens on x86-64?
>
> Trying to do the same in the x86 ipcsum code tended to make the code worse.
> (Although that test is for an odd length fragment and can just be removed.)

In an ideal world the compiler could choose the optimal code sequences
everywhere.  But that won't ever happen, the search space is way too
big.  So compilers just use heuristics, not exhaustive search like
superopt does.  There is a middle way of course, something with directed
searches, and maybe in a few decades systems will be fast enough.  Until
then we will very often see code that is 10% slower and 30% bigger than
necessary.  A single insn more than needed isn't so bad :-)

Making things branch-free is very much worth it here though!


Segher
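
For reference, here is a minimal user-space sketch of the two variants
discussed in the quoted patch description: the branching rotate-right-by-8
and the branch-free rotate-left by (offset & 1) << 3. It is not the kernel
source. The names csum_shift_branchy, csum_shift_branchfree, fold16 and
self_check are hypothetical, and plain uint32_t stands in for the kernel's
checksum type. The check illustrates the claim that rotating left instead
of right has no impact on the final folded sum.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate helpers; masking the count keeps a rotate by 0 well defined. */
static inline uint32_t rol32(uint32_t w, unsigned int n)
{
	return (w << (n & 31)) | (w >> ((-n) & 31));
}

static inline uint32_t ror32(uint32_t w, unsigned int n)
{
	return (w >> (n & 31)) | (w << ((-n) & 31));
}

/* Branching form: rotate right by 8 only when offset is odd. */
static uint32_t csum_shift_branchy(uint32_t sum, int offset)
{
	if (offset & 1)
		return ror32(sum, 8);
	return sum;
}

/* Branch-free form: rotate left by 0 or 8, taken from bit 0 of offset. */
static uint32_t csum_shift_branchfree(uint32_t sum, int offset)
{
	return rol32(sum, (unsigned int)(offset & 1) << 3);
}

/* 32-bit add with end-around carry (what the addc/addze pair does). */
static uint32_t csum_add(uint32_t a, uint32_t b)
{
	uint64_t t = (uint64_t)a + b;

	return (uint32_t)t + (uint32_t)(t >> 32);
}

/* Fold a 32-bit sum to 16 bits, without the final complement. */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Compare both variants over pseudo-random inputs (simple LCG). */
static void self_check(void)
{
	uint32_t x = 0x12345678u;

	for (int offset = 0; offset < 100000; offset++) {
		uint32_t base, part;

		x = x * 1664525u + 1013904223u;
		base = x;
		x = x * 1664525u + 1013904223u;
		part = x;

		assert(fold16(csum_add(base, csum_shift_branchy(part, offset))) ==
		       fold16(csum_add(base, csum_shift_branchfree(part, offset))));
	}
}
```

Calling self_check() from a main() runs the comparison. The 32-bit
intermediate values differ between the two forms (the two 16-bit halves
end up swapped), but the folded 16-bit sums agree.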