From: David Laight
To: "'Karl Beldan'"
CC: "'Jiri Slaby'", "stable@vger.kernel.org", "linux-kernel@vger.kernel.org", Karl Beldan, Al Viro, Eric Dumazet, Arnd Bergmann, Mike Frysinger, "netdev@vger.kernel.org", Eric Dumazet, "David S. Miller"
Subject: RE: [PATCH 3.12 065/122] lib/checksum.c: fix carry in csum_tcpudp_nofold
Date: Wed, 18 Feb 2015 09:40:23 +0000
Message-ID: <063D6719AE5E284EB5DD2968C1650D6D1CAE56AE@AcuExch.aculab.com>
In-Reply-To: <20150217195717.GA6779@magnum.frso.rivierawaves.com>
References: <07707a797d6c3cd0bfe86f037d3d1eb329acbc86.1424099973.git.jslaby@suse.cz> <063D6719AE5E284EB5DD2968C1650D6D1CAE4D1E@AcuExch.aculab.com> <20150217195717.GA6779@magnum.frso.rivierawaves.com>

From: Karl Beldan
> On Tue, Feb 17, 2015 at 12:04:22PM +0000, David Laight wrote:
> > > +static inline u32 from64to32(u64 x)
> > > +{
> > > +	/* add up 32-bit and 32-bit for 32+c bit */
> > > +	x = (x & 0xffffffff) + (x >> 32);
> > > +	/* add up carry.. */
> > > +	x = (x & 0xffffffff) + (x >> 32);
> > > +	return (u32)x;
> > > +}
> >
> > As a matter of interest, does the compiler optimise away the
> > second (x & 0xffffffff) ?
> > The code could just be:
> > 	x = (x & 0xffffffff) + (x >> 32);
> > 	return x + (x >> 32);
> >
>
> On my side, from what I've seen so far, your version results in better
> assembly, esp. with clang, but my first version
> http://article.gmane.org/gmane.linux.kernel/1875407:
> 	x += (x << 32) + (x >> 32);
> 	return (__force __wsum)(x >> 32);
> resulted in even better assembly, I just verified with gcc/clang,
> x86_64/ARM and -O1,2,3.

The latter looks to have a shorter dependency chain as well.
However, I'd definitely include a comment saying that it is
equivalent to the two lines in the current patch.

Does either compiler manage to use a rotate for the two shifts?
Using '(x << 32) | (x >> 32)' might convince it to do so.
That would reduce it to three 'real' instructions and a register rename.

	David
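A minimal userspace sketch for checking the equivalence claim: it places the patch's from64to32(), the shorter two-line form, Karl's shift form and the rotate form side by side and asserts they return the same 32-bit value. The `u32`/`u64` typedefs, the `_patch`/`_short`/`_shift`/`_rot` suffixes and the `assert_variants_agree()` helper are hypothetical names for illustration; `__force __wsum` is dropped because it is a kernel sparse annotation, not C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed userspace stand-ins for the kernel's u32/u64 types. */
typedef uint32_t u32;
typedef uint64_t u64;

/* Form from the patch: two explicit 32+32 folds. */
static inline u32 from64to32_patch(u64 x)
{
	/* add up 32-bit and 32-bit for 32+c bit */
	x = (x & 0xffffffff) + (x >> 32);
	/* add up carry.. */
	x = (x & 0xffffffff) + (x >> 32);
	return (u32)x;
}

/* Shorter form: the second mask is redundant, since the final
 * truncation to u32 discards the upper bits anyway. */
static inline u32 from64to32_short(u64 x)
{
	x = (x & 0xffffffff) + (x >> 32);
	return (u32)(x + (x >> 32));
}

/* Karl's earlier form: after the addition the upper 32 bits hold
 * hi + lo plus the carry out of the low half (end-around carry). */
static inline u32 from64to32_shift(u64 x)
{
	x += (x << 32) + (x >> 32);
	return (u32)(x >> 32);
}

/* Rotate form: the two shifted values share no set bits, so '|'
 * gives the same value as '+'; a compiler might match this as a
 * 64-bit rotate by 32. */
static inline u32 from64to32_rot(u64 x)
{
	x += (x << 32) | (x >> 32);
	return (u32)(x >> 32);
}

/* Hypothetical helper: all four variants must agree on input x. */
static void assert_variants_agree(u64 x)
{
	u32 ref = from64to32_patch(x);

	assert(from64to32_short(x) == ref);
	assert(from64to32_shift(x) == ref);
	assert(from64to32_rot(x) == ref);
}
```

For example, with x = 0x00000001ffffffff the low half plus the high half is 0x100000000, so every variant folds the carry back in and returns 1. Whether gcc or clang actually emits a single rotate for from64to32_rot() is not demonstrated here; that still needs a look at the generated assembly for each target.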