Date: Fri, 4 Mar 2016 23:29:00 -0600
From: Scott Wood <oss@buserror.net>
To: Christophe Leroy
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood@freescale.com, netdev@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [9/9] powerpc: optimise csum_partial() call when len is constant
Message-ID: <20160305052900.GA5742@home.buserror.net>

On Tue, Sep 22, 2015 at 04:34:36PM +0200, Christophe Leroy wrote:
> +/*
> + * computes the checksum of a memory block at buff, length len,
> + * and adds in "sum" (32-bit)
> + *
> + * returns a 32-bit number suitable for feeding into itself
> + * or csum_tcpudp_magic
> + *
> + * this function must be called with even lengths, except
> + * for the last fragment, which may be odd
> + *
> + * it's best to have buff aligned on a 32-bit boundary
> + */
> +__wsum __csum_partial(const void *buff, int len, __wsum sum);
> +
> +static inline __wsum csum_partial(const void *buff, int len, __wsum sum)
> +{
> +	if (__builtin_constant_p(len) && len == 0)
> +		return sum;
> +
> +	if (__builtin_constant_p(len) && len <= 16 && (len & 1) == 0) {
> +		__wsum sum1;
> +
> +		if (len == 2)
> +			sum1 = (__force u32)*(u16 *)buff;
> +		if (len >= 4)
> +			sum1 = *(u32 *)buff;
> +		if (len == 6)
> +			sum1 = csum_add(sum1, (__force u32)*(u16 *)(buff + 4));
> +		if (len >= 8)
> +			sum1 = csum_add(sum1, *(u32 *)(buff + 4));
> +		if (len == 10)
> +			sum1 = csum_add(sum1, (__force u32)*(u16 *)(buff + 8));
> +		if (len >= 12)
> +			sum1 = csum_add(sum1, *(u32 *)(buff + 8));
> +		if (len == 14)
> +			sum1 = csum_add(sum1, (__force u32)*(u16 *)(buff + 12));
> +		if (len >= 16)
> +			sum1 = csum_add(sum1, *(u32 *)(buff + 12));
> +
> +		sum = csum_add(sum1, sum);

Why the final csum_add, instead of s/sum1/sum/ and putting a csum_add in
the "len == 2" and "len >= 4" cases?

The (__force u32) casts are unnecessary.  Or rather, they should be
(__force __wsum) -- on all of them, not just the 16-bit ones.

The pointer casts should be const (see the untested sketch at the end of
this mail).

> +	} else if (__builtin_constant_p(len) && (len & 3) == 0) {
> +		sum = csum_add(ip_fast_csum_nofold(buff, len >> 2), sum);

It may not make a functional difference, but based on the csum_add()
argument names and other csum_add() usage, sum should come first and the
new content second.
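Putting the above together, this is roughly what I had in mind for the
sub-16-byte branch -- completely untested, and assuming csum_add() has
the same semantics as in your patch:

	if (__builtin_constant_p(len) && len <= 16 && (len & 1) == 0) {
		/* Accumulate directly into sum; no sum1 and no final csum_add. */
		if (len == 2)
			sum = csum_add(sum, (__force __wsum)*(const u16 *)buff);
		if (len >= 4)
			sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
		if (len == 6)
			sum = csum_add(sum, (__force __wsum)*(const u16 *)(buff + 4));
		if (len >= 8)
			sum = csum_add(sum, (__force __wsum)*(const u32 *)(buff + 4));
		if (len == 10)
			sum = csum_add(sum, (__force __wsum)*(const u16 *)(buff + 8));
		if (len >= 12)
			sum = csum_add(sum, (__force __wsum)*(const u32 *)(buff + 8));
		if (len == 14)
			sum = csum_add(sum, (__force __wsum)*(const u16 *)(buff + 12));
		if (len >= 16)
			sum = csum_add(sum, (__force __wsum)*(const u32 *)(buff + 12));

		return sum;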
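And likewise for the word-multiple branch (again untested), with sum as
the first argument:

	} else if (__builtin_constant_p(len) && (len & 3) == 0) {
		sum = csum_add(sum, ip_fast_csum_nofold(buff, len >> 2));

-Scott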