Date: Sat, 24 Dec 2016 12:13:35 +0530
From: Yury Norov
To: Matthew Wilcox
CC: Yury Norov, Rasmus Villemoes, George Spelvin, Akinobu Mita,
    Thomas Gleixner, Andrew Morton, Matthew Wilcox
Subject: Re: [PATCH] find_bit: Micro-optimise find_next_*_bit
Message-ID: <20161224064335.GA7280@yury-N73SV>
References: <1482513603-9630-1-git-send-email-mawilcox@linuxonhyperv.com>
In-Reply-To: <1482513603-9630-1-git-send-email-mawilcox@linuxonhyperv.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Matthew,

On Fri, Dec 23, 2016 at 09:20:03AM -0800, Matthew Wilcox wrote:
> From: Matthew Wilcox
>
> This saves 20 bytes on my x86-64 build, mostly due to alignment
> considerations ... I think it actually saves about five bytes of
> instructions. There's really two parts to this commit.
> First, the first half of the test: (!nbits || start >= nbits) is trivially
> a subset of the second half, since nbits and start are both unsigned.

Yes... It's obvious... when you point it out. The ARM64 GCC compiler
didn't notice it either, any more than I did:

0000000000000070:
  70:   eb1f003f    cmp   x1, xzr
  74:   fa421020    ccmp  x1, x2, #0x0, ne
  78:   54000088    b.hi  88
  7c:   aa0103e0    mov   x0, x1
  80:   d65f03c0    ret
  84:   d503201f    nop
  88:   a9bf7bfd    stp   x29, x30, [sp,#-16]!
  8c:   910003fd    mov   x29, sp
  90:   d2800003    mov   x3, #0x0    // #0
  94:   97ffffdb    bl    0 <_find_next_bit.part.0>
  98:   a8c17bfd    ldp   x29, x30, [sp],#16
  9c:   d65f03c0    ret

> Second,
> while looking at the disassembly, I noticed that GCC was predicting the
> branch taken. Since this is a failure case, it's clearly the less likely
> of the two branches, so add an unlikely() to override GCC's heuristics.
>
> Signed-off-by: Matthew Wilcox
> ---
>  lib/find_bit.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/find_bit.c b/lib/find_bit.c
> index 18072ea9c20e..7d4a681d625f 100644
> --- a/lib/find_bit.c
> +++ b/lib/find_bit.c
> @@ -33,7 +33,7 @@ static unsigned long _find_next_bit(const unsigned long *addr,
>  {
>  	unsigned long tmp;
>
> -	if (!nbits || start >= nbits)
> +	if (unlikely(start >= nbits))
>  		return nbits;
>
>  	tmp = addr[start / BITS_PER_LONG] ^ invert;
> --
> 2.11.0

There's also _find_next_bit_le() with the same code. I think it should
be patched as well.

Acked-by: Yury Norov
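
For illustration only, here is a minimal userspace sketch of why the first
half of the old test is redundant. When nbits is 0, start >= nbits holds for
every unsigned start, so the single comparison already covers that case. The
helper names out_of_range_old(), out_of_range_new() and count_mismatches()
are hypothetical and do not exist in lib/find_bit.c, and the unlikely() macro
below is a stand-in for the kernel's definition that assumes GCC or Clang.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's unlikely(); assumes GCC or Clang. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Range check as written before the patch (hypothetical helper name). */
static bool out_of_range_old(unsigned long start, unsigned long nbits)
{
	return !nbits || start >= nbits;
}

/* Range check after the patch: one unsigned comparison plus a branch hint. */
static bool out_of_range_new(unsigned long start, unsigned long nbits)
{
	return unlikely(start >= nbits);
}

/*
 * Hypothetical helper: compare both checks for every (start, nbits) pair
 * in [0, limit] x [0, limit], which includes nbits == 0.
 * Returns the number of pairs on which the two checks disagree.
 */
static unsigned long count_mismatches(unsigned long limit)
{
	unsigned long mismatches = 0;
	unsigned long start, nbits;

	for (nbits = 0; nbits <= limit; nbits++)
		for (start = 0; start <= limit; start++)
			if (out_of_range_old(start, nbits) !=
			    out_of_range_new(start, nbits))
				mismatches++;

	return mismatches;
}
```

Calling count_mismatches() over a small grid should find no pair on which
the two checks differ. The unlikely() hint only affects branch layout, not
the result of the comparison.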