Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755342AbbBLJ67 (ORCPT ); Thu, 12 Feb 2015 04:58:59 -0500 Received: from mail-lb0-f180.google.com ([209.85.217.180]:35629 "EHLO mail-lb0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751622AbbBLJ6y (ORCPT ); Thu, 12 Feb 2015 04:58:54 -0500 From: Rasmus Villemoes To: "George Spelvin" Cc: yury.norov@gmail.com, akpm@linux-foundation.org, chris@chris-wilson.co.uk, davem@davemloft.net, dborkman@redhat.com, hannes@stressinduktion.org, klimov.linux@gmail.com, laijs@cn.fujitsu.com, linux-kernel@vger.kernel.org, msalter@redhat.com, takahiro.akashi@linaro.org, tgraf@suug.ch, valentinrothberg@gmail.com Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation Organization: D03 References: <20150212081547.14999.qmail@ns.horizon.com> X-Hashcash: 1:20:150212:dborkman@redhat.com::6VCkqyJ5zWqKWm5t:0000000000000000000000000000000000000000000/Oq X-Hashcash: 1:20:150212:linux-kernel@vger.kernel.org::JLw8Y3/POefRt9I/:0000000000000000000000000000000000tdu X-Hashcash: 1:20:150212:laijs@cn.fujitsu.com::cBqhemae8EQaObnk:000000000000000000000000000000000000000000uBW X-Hashcash: 1:20:150212:akpm@linux-foundation.org::6tkkCwQSvWVH+BZk:0000000000000000000000000000000000000oki X-Hashcash: 1:20:150212:chris@chris-wilson.co.uk::47bvRp155idCnOUg:00000000000000000000000000000000000002SOL X-Hashcash: 1:20:150212:msalter@redhat.com::er14D3vd1gF+OrxL:00000000000000000000000000000000000000000003VPb X-Hashcash: 1:20:150212:tgraf@suug.ch::rKKfQ1GnySc+g8Pv:000058cP X-Hashcash: 1:20:150212:yury.norov@gmail.com::f5BPEnmfPhiSj8NU:000000000000000000000000000000000000000004XnL X-Hashcash: 1:20:150212:valentinrothberg@gmail.com::9+oIUivhOYHme2ZR:000000000000000000000000000000000005U/c X-Hashcash: 1:20:150212:klimov.linux@gmail.com::Ln6UfAdTukVx/DG5:0000000000000000000000000000000000000006vu1 X-Hashcash: 1:20:150212:davem@davemloft.net::3UCD0w5pVTAGwVFZ:0000000000000000000000000000000000000000006hSu X-Hashcash: 1:20:150212:takahiro.akashi@linaro.org::UPMGGCng/VDK+/7H:000000000000000000000000000000000008OSA X-Hashcash: 1:20:150212:hannes@stressinduktion.org::GtG9Ou5ucgVt/Jf7:000000000000000000000000000000000007P4R X-Hashcash: 1:20:150212:linux@horizon.com::77S89n15pzn68ti5:000000000000000000000000000000000000000000009z2/ Date: Thu, 12 Feb 2015 10:58:50 +0100 In-Reply-To: <20150212081547.14999.qmail@ns.horizon.com> (George Spelvin's message of "12 Feb 2015 03:15:47 -0500") Message-ID: <87zj8jzc11.fsf@rasmusvillemoes.dk> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3228 Lines: 71 On Thu, Feb 12 2015, "George Spelvin" wrote: >> Rasmus, your version has ANDing by mask, and resetting the mask at each iteration >> of main loop. I think we can avoid it. What do you think on next? > > Yes, that's basically what I proposed (modulo checking for zero size and > my buggy LAST_WORD_MASK). > > But two unconditional instructions in the loop are awfully minor; it's > loads and conditional branches that cost. > > The reset of the mask can be done in parallel with other operations; it's > only the AND that actually takes a cycle. > > I can definitely see the argument that, for code that's not used often > enough to stay resident in the L1 cache, any speedup has to win by at > least one L2 cache access to be worth taking another cache line. That, and if the compiler was smart enough, the AND should actually be free (at least on x86), since it can replace a test instruction. [1] In any case, my code compiles to fewer bytes (though not an entire cache line), and I think it is somewhat clearer - I'm obviously biased on the latter point. Rasmus [1] Unfortunately, the compiler doesn't seem to be smart enough :-( When I compile my code with gcc 5.0, I get 0000000000000000 : 0: 53 push %rbx 1: 89 f1 mov %esi,%ecx 3: 48 8d 5e 3f lea 0x3f(%rsi),%rbx 7: f7 d9 neg %ecx 9: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 10: 48 83 e3 c0 and $0xffffffffffffffc0,%rbx 14: 48 d3 ea shr %cl,%rdx 17: eb 1a jmp 33 19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) 20: 48 89 d1 mov %rdx,%rcx 23: 48 23 0c df and (%rdi,%rbx,8),%rcx 27: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx 2e: 48 85 c9 test %rcx,%rcx 31: 75 15 jne 48 33: 48 83 eb 01 sub $0x1,%rbx 37: 48 83 fb ff cmp $0xffffffffffffffff,%rbx 3b: 75 e3 jne 20 3d: 48 89 f0 mov %rsi,%rax 40: 5b pop %rbx 41: c3 retq 42: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 48: 48 89 cf mov %rcx,%rdi 4b: 48 c1 e3 06 shl $0x6,%rbx 4f: e8 00 00 00 00 callq 54 54: 48 98 cltq 56: 48 01 d8 add %rbx,%rax 59: 5b pop %rbx 5a: c3 retq 5b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) the main loop is 20--3b. The test instruction at 2e seems to be redundant. The same at 37: the sub instruction already sets plenty of flags that could be used, so explicitly comparing %rbx to -1 seems redundant. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/