Received: by 10.223.164.202 with SMTP id h10csp4040018wrb; Tue, 28 Nov 2017 23:17:18 -0800 (PST) X-Google-Smtp-Source: AGs4zMYyxOtUmp2RVZTjZEQASXxmO8TSiy/nj3T0xbvbdzYNY57n0Pv0yNFLDTIOBBryIdL+9I5+ X-Received: by 10.101.67.1 with SMTP id j1mr1849976pgq.367.1511939838830; Tue, 28 Nov 2017 23:17:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1511939838; cv=none; d=google.com; s=arc-20160816; b=cERZYkeWS/bd4z6gXugQ3PyqZXVqLdKs1Sy/5TX4rwUVfarzqpNaywtsibjyFVHHlW qeUKrVuqcYZKcCFJlj2CVoRAeDBQFykJnU7+Wvoe6NhOouAIWLAkcLyt9RHICOjQSyMF DxbvHTzhA7Fw/9aJ0kUeSqQ2fdUOUNmULq0njOXU6kyY/zKBZtZoLnCaviUXd2XPfBnP eYUE4cJGUIUHDGTPS0h23dZxfRrOZDOQJ/5XNRg+Xrr35/S0R7rC/5xVmV3ZFZCXOlLS Ggnp+oCPm7mZowLih9J+yQmOty9usn7OyxkkF/Ourt8XxRRLRH3XfvzmvqZWWEO6dUou ihFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:spamdiagnosticmetadata :spamdiagnosticoutput:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature:arc-authentication-results; bh=djHPaoPUZ9ym/xMzYfGtM76fp6gvOaao+U+xR7Jy+mE=; b=mBs3H+cnC0oolz4nhTmNPi9isvqpl9HsuwrO9DGKZX/GOTJhfHP5aVuAjtWL2dgxNs BIwx5+AxRBt+DN1YMzkuN1lsKhpVnzfL5qmeT1alaehxQrxGxnxOB3Zku0wFz8dwBuZQ FdjAg3aBTWVCA5FbyPc/hh8JkVFIDRGgmEkQlw/ViL5S7cGynO22R8vqq3UpzC/kRsnL hEBp0VL3iTHUCjCXmCrBfX1jZagpg7YB5j/LY/WY1koX+ryv/ZZ78H2ZZhQtsvYr3ev1 JJHFRXZV6xB+yIhODiVahW3ZhnK653fdcc/hPFskfNmaBFQZpn0jGcPjhtTN8C1cl9Lw pvCQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@CAVIUMNETWORKS.onmicrosoft.com header.s=selector1-cavium-com header.b=Jdyn3grq; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 95si810852plc.547.2017.11.28.23.17.07; Tue, 28 Nov 2017 23:17:18 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@CAVIUMNETWORKS.onmicrosoft.com header.s=selector1-cavium-com header.b=Jdyn3grq; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752098AbdK2HQ2 (ORCPT + 71 others); Wed, 29 Nov 2017 02:16:28 -0500 Received: from mail-by2nam01on0049.outbound.protection.outlook.com ([104.47.34.49]:6512 "EHLO NAM01-BY2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751366AbdK2HQZ (ORCPT ); Wed, 29 Nov 2017 02:16:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=CAVIUMNETWORKS.onmicrosoft.com; s=selector1-cavium-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=djHPaoPUZ9ym/xMzYfGtM76fp6gvOaao+U+xR7Jy+mE=; b=Jdyn3grqQTymTf9pWQpwzyD4eyHAiewRBJI2mbol51urpnz41sdrVtC3xN9scDWdMYYzqO7ruSojkhoN0UIt1s9GwY54QSRkVvRl8bJDBsJ5rklGM4jU82gxdYLfkj0eHheUWbO+3eut22iiTJq2IeGn1fTnt+DGhv8KMFw4Ws8= Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Yuri.Norov@cavium.com; Received: from localhost (111.93.218.67) by CY4PR0701MB3827.namprd07.prod.outlook.com (2603:10b6:910:94::33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.20.260.4; Wed, 29 Nov 2017 07:16:21 +0000 Date: Wed, 29 Nov 2017 10:16:09 +0300 From: Yury Norov To: Clement Courbet Cc: Andrew Morton , linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v5] lib: optimize cpumask_next_and() Message-ID: <20171129071609.lcvcf3iezr2tcn4j@yury-thinkpad> References: <20171128131334.23491-1-courbet@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171128131334.23491-1-courbet@google.com> User-Agent: NeoMutt/20170113 (1.7.2) X-Originating-IP: [111.93.218.67] X-ClientProxiedBy: VI1PR08CA0129.eurprd08.prod.outlook.com (2603:10a6:800:d4::31) To CY4PR0701MB3827.namprd07.prod.outlook.com (2603:10b6:910:94::33) X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 55937631-5d6e-4b49-0724-08d536f91868 X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:(4534020)(4602075)(4627115)(201703031133081)(201702281549075)(5600026)(4604075)(2017052603266);SRVR:CY4PR0701MB3827; X-Microsoft-Exchange-Diagnostics: 1;CY4PR0701MB3827;3:t0NOIAsoItLOFpL5jVUGxnU5W2cf+/8G61j56JWLUTU+v1s8AasVys9sPlvkZ0e2LDbPIHAzkNO7xwaJzWGolaHobsLGyH87+qYNZs14UqgTVlD9UJ/3V+GTvlxxdn0ItlaxMiCmcURdjWyqT4mAakK5HSfv01tf6Y2C8L/pVzm9Cl4oGJaHlF1udq2qqR09wBB3J/e+CSeEXPkT0GBeSEaTBXxkUQeuac2D94mGr7QRu8cAQJ5R6wYJtzWbYOrL;25:W2sc5r1iqQ7dLrQH2A0Fx/vhrM1WAoUBJrJ5UKw7wYaQIWMzzbtcFoLqQNd994awa2Q9VFkXeVYsO7G1dDotpQjlpbl6Kmz+HTGL6GPJ2cyQtn9PvzBcyeujHH7MCnR6y3Q9pQi08LDrsVh+TDd1Grj6N2B7v9TBKq/8NpJCUdOz4oWMkMXYvAKfolu/J8fhc3YSwF7OnvRY4KilNVvNfK/Pu6zh4JGB6uQVqcrVf1CAHDTz1BAk6Y8Bw7jGszpEJuBAcnXrADyc7cBoYBMvMchvCIN6Bvnt0G+vkIMdq03dFxUXsF/EsmMuVzXli3/ChWzvWjFM6K9KTMZZ7223EQ==;31:UGYyEiWpuNV2bAHfbfprI6fVT5tk0bMaypMm0twVSCp80+kWU+2hKT+0iZAwk2zuEfCKqIIOLC/Z3iRQ28U6Tz0qA4e4ZxxLKBSKId3fo889G1ARFuRyLavs+A7nwiCgzUbcbudNklasyEA2lJSCnrHeNzkNxj0DfkVB0YLL0a4ZcFaLK8tJXqze4tPT+GLU+XJkvei2zfRvL+gsrOuIVSyeb7Jy7QtwntCy4UMcTno= X-MS-TrafficTypeDiagnostic: CY4PR0701MB3827: X-Microsoft-Exchange-Diagnostics: 1;CY4PR0701MB3827;20:fD05TIYK+PRIIMbi0p0JSkTPl9tufg3O46dn5L96wGIjylDe4h511e33Udz5CDESiIdRu8rLWh4HmfzY3aIzWGNkrsuvdaetOwMSwV3pfHsZ5tON95UIMGmLqvdczUPXEliCBrXQME+rzNZmaa3i3vhmbyxg0q+MYIUAHk7i5tUcPEcrPibBTWMmIQ3B8zn6og/OKIkwL0uiimpCCckLuGZ2/eZn9KVlbOKbpk+onecLHsF9LlznBOEJcfxnbPuK1QOYc4QuPcwXN0fCtSM542gpXBUx+6UjPek6UKUpT0zhHhbX2ilRh0sg8sIXIL8S6jLWzupQxAo9K0LOHyypoP+7SLz+9PuVqT/pwmj6K2UmP2HjIwWY7OMpR7MeL+XqDPkRWJeIwBE50DcnRA37L7mKbulrH1679KgZ5+9Zh/AKqcxDjwL8Rqzhk3qnhKJR7Fgt+QaGxpJMMFiBSIH2xmREfNy6NYEfqMg9vCqvCd2iDQUDESeENlKnRVfUXCk7tvmZxAKh4TC7AYNrbVqQVrQIavUFv72PbLj8iMQt27zU56uSfedSzvkey9Q6ef2OBjc5ovtMbKidjqFRlIz5WFUvqYnM3TNFOrL1ARTWpcU= X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(211936372134217)(153496737603132); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(6040450)(2401047)(5005006)(8121501046)(10201501046)(93006095)(3002001)(3231022)(920507027)(6041248)(20161123555025)(20161123558100)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(20161123562025)(20161123560025)(20161123564025)(6072148)(201708071742011);SRVR:CY4PR0701MB3827;BCL:0;PCL:0;RULEID:(100000803101)(100110400095);SRVR:CY4PR0701MB3827; X-Microsoft-Exchange-Diagnostics: 1;CY4PR0701MB3827;4:f7CE7oIE9cAcn9sETap5Qn05NaE5rGk9m9aPuxWcH2EVQNIjWZx4I4NblTlGDHDQ5yBI4c1lsHdXzYC5JfZukgEDt9eObDyBx+f6Z6QcTLJVOjK7FJYibZYoSWedSycgo+zT3aSpgrDBu+bOp5IU2uMCusmZiOG0eCRLw+Y8vc4D5eJtX6Di45nKeDZDnoOUMdYaJcRjWoPWqt5uFycsNqiG4oC1BR0F7zUzUirEtCCMEECL3k3VzjH/6NLg8RbrwGbOpfrisl7eHlVoHqT3fGL423tN2r4keFU7B//cD5BNRlmuzRXiRXW2wYDkktoyPE3X/JxXG1SUGoFfs3ebKkyeyW2WM1ND9UT06bfeVTo= X-Forefront-PRVS: 05066DEDBB X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10009020)(6009001)(6069001)(7916004)(366004)(376002)(346002)(199003)(24454002)(57704003)(189002)(16526018)(6486002)(33716001)(5660300001)(6666003)(2906002)(6916009)(2950100002)(42882006)(50466002)(16586007)(58126008)(23726003)(1076002)(3846002)(6116002)(81166006)(316002)(229853002)(25786009)(5009440100003)(575784001)(8676002)(478600001)(81156014)(72206003)(68736007)(6246003)(52116002)(7736002)(76506005)(305945005)(34040400001)(66066001)(50986999)(76176999)(54356999)(189998001)(105586002)(47776003)(6496006)(8936002)(9686003)(101416001)(53936002)(4326008)(97736004)(106356001)(33646002)(83506002)(357404004);DIR:OUT;SFP:1101;SCL:1;SRVR:CY4PR0701MB3827;H:localhost;FPR:;SPF:None;PTR:InfoNoRecords;A:1;MX:1;LANG:en; Received-SPF: None (protection.outlook.com: cavium.com does not designate permitted sender hosts) X-Microsoft-Exchange-Diagnostics: =?us-ascii?Q?1;CY4PR0701MB3827;23:2LFMIqtPGxl96woLFpdpsPrbgHUddUPpT60sL9U?= =?us-ascii?Q?jj+Yx0y0hafjMxMly04XjVNT3himMNnth3GkvZ1QQuN2m3imopm8+nGpaw67?= =?us-ascii?Q?9jn0mZElmcdl3puhrutrjmWPzgobugXKS3xPQ7gA+GrGZyZblqZT/AW00Tss?= =?us-ascii?Q?cm5d3xUHQdZCjx1BcP9Xa4/3DDGTSEGcXyXjgntLHTFTggB6RP8GgFKIe/dD?= =?us-ascii?Q?/CSzgNv+IxxE7oInLUFnnlPISTUwy0wIU+xozB3OFy8G5aeGEg8NLQ47D9Bm?= =?us-ascii?Q?G/Q52NxOb4AnWhX0ibZFeVW3zQu4FcXD/znWdFqasljLJsB99Pp79trD5jg+?= =?us-ascii?Q?/mzSsSgR+qsAsV/W+N7LDGbRyeElObV0FqoYuVjohS4LZA1nLULFwA48UV2j?= =?us-ascii?Q?eBNgPufTMljbcjXBQxH4w+HZL73su/oMq7GRjxJlxLFU7Wow0BMabYeetZJU?= =?us-ascii?Q?yZDZUOH4ndRuKHWwwHwDusLndW1qiNA6tqX9EhIHB0506k1GmQOL7BtRJKlB?= =?us-ascii?Q?MivzmbxGLAh8hyB40aiyxMZEzEDF5y6Bubsmua2X2qJpNShQec18HbAecItn?= =?us-ascii?Q?kaMRODwEcG3+EpsVRYy9UpCdiGVoYQfcU6ghTeV5+pOH/MctisRJlmgOVQvU?= =?us-ascii?Q?grhRpn4FD58aQ9ouqXmLBDKck/Na5IQYpO00snSbooJ6r8yj3BtAHGyL6Hw5?= =?us-ascii?Q?MMdXOvTlyZyw11nWypDf/PxShscmTQcyeFOcy2u2d7C5bNaxtvdjlcGU6t4Z?= =?us-ascii?Q?iFwgSHCebQJZkiBi2z36Z47EF1csgWpHKewiV6KkfKw44GnLhTg1cpX9oiQ3?= =?us-ascii?Q?BnxgYunQIHSkkSGZl0Okeyc3l4g9KFdW7EggRkrdJVvGVIZUMcr7A446QxXm?= =?us-ascii?Q?/yyNdP6AQTZynQXsfDbGc7wlfdc8UvdhlVpmjxpVpwNpZPmpBl7+Es1sCLlr?= =?us-ascii?Q?qsHBsQULBJWM7wFFBOl7gGPqDOuE5DwjZ0tYjo6a1BJZQYJcJiV1uav/i/Iu?= =?us-ascii?Q?Kg9G6Lmqn/L+H8d5ID6qX4v90LfAQKYRia3os9nZ7573LkZ2Vw/kny2cgj5v?= =?us-ascii?Q?3OZ5A23d/jfKD84XRRO6LZNWIX8LJh6urLYvxcXd7PH8YhtgrWi/GrIFeUHi?= =?us-ascii?Q?irPWS/KrZ7svssFjmM/qjKALXML8PAiRVpu+KSc6nHGC1Kz3rSFEIFid/T0c?= =?us-ascii?Q?fm0UDxwH0RTFvcUxm88DcqKZQF9JCp6GmoH/nazXX0CmxnoKhPkT/IGx6vej?= =?us-ascii?Q?z9iuQvKNni+FLNKbuzostlBgyLYD5pn0fHMpdwp3YAZT1Mh2VPnYYXpZxYCs?= =?us-ascii?Q?kvW2F6vL+V/B8Ugenk+JUm6XoD1uWvGxr6ncFAKIDZOB+XMu6Euu9iZxq2RX?= =?us-ascii?Q?DHBuTy10Fbik9lkx0LY7oM20IDs/Jc4IUFK9fcaUCRal5+qLd?= X-Microsoft-Exchange-Diagnostics: 1;CY4PR0701MB3827;6:fPhdIkBvUOmUYqR/iZnTYrUgqGifYh5rxcxAqWiBitaMZ93Fi+8EpRjcKX+SiVg+AfGNNMW79P1WWNwg2a9OyCpamKTuZI1rM+Xlwkf0J82+yZ0weYgK7lj74wbdAVNbUOOkAQNt7xiBMegK+awSD+26kRD8y+pGpJvYwzX0LQU1cWLRNWwbW0HUI1m1ah16Q/zT8MI6tF4xcu90S//Xs04rRc23fadrZE2GxqsyxDHjS5UFOMWJrrfUaER+tOsQd20lW3XiGq3+HGbWJDjoWZuJwOOFAhRwqkid9qRKVA/fFH0djG2e+Z+NqkRCjO2cQKl484KN4DgGoLz2HhVll0mizCVTVMoGkJae84lWqZI=;5:LVV5paQx/scGmJugWjY07wA/sADdZQ3gA1NH4ShXKYAZMvSnuMn3xv92uMF6nw1IETMCFN1zpsEwORaAb2Awgyp6uKFjAPKZcgwvhk9/W5oyGO1HhXBMkW8yDFwxoqbTLCqk80maUYuDDOREmCziCzbWcBVpmuUaSdhxkp75L3c=;24:l0BRG/t7jd6KDk6yUsb2CYJkWqwkDz2fOnPl7hcHEH9G/2yR93jP+SdhS735iru7EBKMsvLVJrJZSP23NNysgrH2Wn1AfDQOnFIIXHqQuW0=;7:h+6K9G1nd63xQbQGn5E85a0ZQhiC54+MD/BY0Y9HPCjn0PPt3Uwfwefj9qixzLpD7F5fRhEutj+5JjW+dT0X35L7GwsegmMNcuIqpEpWGpxdZWeahzGG369893Z3yDUJNDyKMBPRmLtDfyAVotZVsYBj9L6sPKUnr8dCY/TtDPXng2h7TYaYgq4nOm38BAWnStjdENd4WHA5LTX3a/xXDRjIP+ZMapfZp6cWPADtYjsHqlJYtzE/7cfh58uD0VGP SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: caviumnetworks.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 29 Nov 2017 07:16:21.7645 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 55937631-5d6e-4b49-0724-08d536f91868 X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 711e4ccf-2e9b-4bcf-a551-4094005b6194 X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR0701MB3827 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org NACK. I'm sorry, but it seems you have to send v6. See comments inline. On Tue, Nov 28, 2017 at 02:13:34PM +0100, Clement Courbet wrote: > We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and(). > It's essentially a joined iteration in search for a non-zero bit, which > is currently implemented as a lookup join (find a nonzero bit on the > lhs, lookup the rhs to see if it's set there). > > Implement a direct join (find a nonzero bit on the incrementally built > join). > > For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x > faster with a geometric mean of 2.1 on 32 CPUs [1]. No impact on memory > usage. > > Also added generic bitmap benchmarks in the new `test_find_bit` module > for the reference and new implementations (results: see > `find_next_and_bit` and `find_next_and_bit_ref` in [2] and [3] below). > Note that on Arm (), the new c implementation still outperforms the > old one that uses c+ the asm implementation of `find_next_bit` [3]. What is 'c+'? Is it typo? If you find generic find_bit() on arm faster that asm one, we'd definitely drop that piece of asm. I have this check it in my long list. > [1] Approximate benchmark code: > > ``` > unsigned long src1p[nr_cpumask_longs] = {pattern1}; > unsigned long src2p[nr_cpumask_longs] = {pattern2}; > for (/*a bunch of repetitions*/) { > for (int n = -1; n <= nr_cpu_ids; ++n) { > asm volatile("" : "+rm"(src1p)); // prevent any optimization > asm volatile("" : "+rm"(src2p)); > unsigned long result = cpumask_next_and(n, src1p, src2p); > asm volatile("" : "+rm"(result)); > } > } > ``` > > Results: > pattern1 pattern2 time_before/time_after > 0x0000ffff 0x0000ffff 1.65 > 0x0000ffff 0x00005555 2.24 > 0x0000ffff 0x00001111 2.94 > 0x0000ffff 0x00000000 14.0 > 0x00005555 0x0000ffff 1.67 > 0x00005555 0x00005555 1.71 > 0x00005555 0x00001111 1.90 > 0x00005555 0x00000000 6.58 > 0x00001111 0x0000ffff 1.46 > 0x00001111 0x00005555 1.49 > 0x00001111 0x00001111 1.45 > 0x00001111 0x00000000 3.10 > 0x00000000 0x0000ffff 1.18 > 0x00000000 0x00005555 1.18 > 0x00000000 0x00001111 1.17 > 0x00000000 0x00000000 1.25 > ----------------------------- > geo.mean 2.06 > > [2] test_find_next_bit, X86 (skylake) > > [ 3913.477422] Start testing find_bit() with random-filled bitmap > [ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations > [ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations > [ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations > [ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations > [ 3913.480216] Start testing find_next_and_bit() with random-filled > bitmap > [ 3913.481027] find_next_and_bit_ref: 319444 cycles, 8216 iterations > [ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations > [ 3913.481075] Start testing find_bit() with sparse bitmap > [ 3913.481078] find_next_bit: 2536 cycles, 66 iterations > [ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations > [ 3913.481255] find_last_bit: 2006 cycles, 66 iterations > [ 3913.481265] find_first_bit: 17488 cycles, 66 iterations > [ 3913.481266] Start testing find_next_and_bit() with sparse bitmap > [ 3913.481270] find_next_and_bit_ref: 2486 cycles, 1 iterations > [ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations > > [3] test_find_next_bit, arm (v7 odroid XU3). > > [ 267.206928] Start testing find_bit() with random-filled bitmap > [ 267.214752] find_next_bit: 4474 cycles, 16419 iterations > [ 267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations > [ 267.229294] find_last_bit: 4209 cycles, 16419 iterations > [ 267.279131] find_first_bit: 1032991 cycles, 16420 iterations > [ 267.286265] Start testing find_next_and_bit() with random-filled > bitmap > [ 267.294895] find_next_and_bit_ref: 7572 cycles, 8140 iterations > [ 267.302386] find_next_and_bit: 2290 cycles, 8140 iterations > [ 267.309422] Start testing find_bit() with sparse bitmap > [ 267.316054] find_next_bit: 191 cycles, 66 iterations > [ 267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations > [ 267.329803] find_last_bit: 84 cycles, 66 iterations > [ 267.336169] find_first_bit: 4118 cycles, 66 iterations > [ 267.342627] Start testing find_next_and_bit() with sparse bitmap > [ 267.349992] find_next_and_bit_ref: 193 cycles, 1 iterations > [ 267.356919] find_next_and_bit: 91 cycles, 1 iterations This is old version of test based on get_cycles. New one is based on ktime_get and has other minor changes. I think you'd rerun tests to not confuse readers. New version is already in linux-next. > Signed-off-by: Clement Courbet > --- > Changes in v2: > - Refactored _find_next_common_bit into _find_next_bit., as suggested > by Yury Norov. This has no adverse effects on the performance side, > as the compiler successfully inlines the code. > > Changes in v3: > - Fixes find_next_and_bit() declaration. > - Synchronize _find_next_bit_le() with _find_next_bit() > - Synchronize the code in tools/lib/find_bit.c > - Add find_next_and_bit to guard code > - Fix invert value (bad sync with our internal tree on which I'm doing > the testing). > > Changes in v4: > - Mark _find_next_bit() inline. > > Changes in v5: > - Added benchmarks to test_find_bit.cc > - Fixed arm compilation: added missing header to arm bitops.h > > arch/arm/include/asm/bitops.h | 1 + > arch/unicore32/include/asm/bitops.h | 2 ++ > include/asm-generic/bitops/find.h | 20 +++++++++++ > include/linux/bitmap.h | 6 +++- > lib/cpumask.c | 9 ++--- > lib/find_bit.c | 59 ++++++++++++++++++++++++--------- > lib/test_find_bit.c | 47 +++++++++++++++++++++++++- > tools/include/asm-generic/bitops/find.h | 16 +++++++++ > tools/lib/find_bit.c | 40 ++++++++++++++++------ > 9 files changed, 168 insertions(+), 32 deletions(-) > > diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h > index ce5ee762ed66..4cab9bb823fb 100644 > --- a/arch/arm/include/asm/bitops.h > +++ b/arch/arm/include/asm/bitops.h > @@ -338,6 +338,7 @@ static inline int find_next_bit_le(const void *p, int size, int offset) > > #endif > > +#include > #include > > /* > diff --git a/arch/unicore32/include/asm/bitops.h b/arch/unicore32/include/asm/bitops.h > index 401f597bc38c..c0cbdbe17168 100644 > --- a/arch/unicore32/include/asm/bitops.h > +++ b/arch/unicore32/include/asm/bitops.h > @@ -44,4 +44,6 @@ static inline int fls(int x) > #define find_first_bit find_first_bit > #define find_first_zero_bit find_first_zero_bit > > +#include > + > #endif /* __UNICORE_BITOPS_H__ */ > diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h > index 1ba611e16fa0..8a1ee10014de 100644 > --- a/include/asm-generic/bitops/find.h > +++ b/include/asm-generic/bitops/find.h > @@ -16,6 +16,22 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long > size, unsigned long offset); > #endif > > +#ifndef find_next_and_bit > +/** > + * find_next_and_bit - find the next set bit in both memory regions > + * @addr1: The first address to base the search on > + * @addr2: The second address to base the search on > + * @offset: The bitnumber to start searching at > + * @size: The bitmap size in bits > + * > + * Returns the bit number for the next set bit > + * If no bits are set, returns @size. > + */ > +extern unsigned long find_next_and_bit(const unsigned long *addr1, > + const unsigned long *addr2, unsigned long size, > + unsigned long offset); > +#endif > + > #ifndef find_next_zero_bit > /** > * find_next_zero_bit - find the next cleared bit in a memory region > @@ -55,8 +71,12 @@ extern unsigned long find_first_zero_bit(const unsigned long *addr, > unsigned long size); > #else /* CONFIG_GENERIC_FIND_FIRST_BIT */ > > +#ifndef find_first_bit > #define find_first_bit(addr, size) find_next_bit((addr), (size), 0) > +#endif > +#ifndef find_first_zero_bit > #define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0) > +#endif How this change related to the find_next_and_bit? Does unguarded version breaks something? Please elaborate. > #endif /* CONFIG_GENERIC_FIND_FIRST_BIT */ > > diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h > index 3489253e38fc..45a9e169d0fd 100644 > --- a/include/linux/bitmap.h > +++ b/include/linux/bitmap.h > @@ -83,8 +83,12 @@ > * test_and_change_bit(bit, addr) Change bit and return old value > * find_first_zero_bit(addr, nbits) Position first zero bit in *addr > * find_first_bit(addr, nbits) Position first set bit in *addr > - * find_next_zero_bit(addr, nbits, bit) Position next zero bit in *addr >= bit > + * find_next_zero_bit(addr, nbits, bit) > + * Position next zero bit in *addr >= bit > * find_next_bit(addr, nbits, bit) Position next set bit in *addr >= bit > + * find_next_and_bit(addr1, addr2, nbits, bit) > + * Same as find_first_bit, but in > + * (*addr1 & *addr2) Nit. Addr1, addr2 - doesn't sound self-explainative. What about find_next_and_bit(addr1, and_map, nbits, bit), or something like this? > * > */ > > diff --git a/lib/cpumask.c b/lib/cpumask.c > index 35fe142ebb5e..beca6244671a 100644 > --- a/lib/cpumask.c > +++ b/lib/cpumask.c > @@ -33,10 +33,11 @@ EXPORT_SYMBOL(cpumask_next); > int cpumask_next_and(int n, const struct cpumask *src1p, > const struct cpumask *src2p) > { > - while ((n = cpumask_next(n, src1p)) < nr_cpu_ids) > - if (cpumask_test_cpu(n, src2p)) > - break; > - return n; > + /* -1 is a legal arg here. */ > + if (n != -1) > + cpumask_check(n); > + return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p), > + nr_cpumask_bits, n + 1); > } > EXPORT_SYMBOL(cpumask_next_and); > > diff --git a/lib/find_bit.c b/lib/find_bit.c > index 6ed74f78380c..ee3df93ba69a 100644 > --- a/lib/find_bit.c > +++ b/lib/find_bit.c > @@ -21,22 +21,29 @@ > #include > #include > > -#if !defined(find_next_bit) || !defined(find_next_zero_bit) > +#if !defined(find_next_bit) || !defined(find_next_zero_bit) || \ > + !defined(find_next_and_bit) > > /* > - * This is a common helper function for find_next_bit and > - * find_next_zero_bit. The difference is the "invert" argument, which > - * is XORed with each fetched word before searching it for one bits. > + * This is a common helper function for find_next_bit, find_next_zero_bit, and > + * find_next_and_bit. The differences are: > + * - The "invert" argument, which is XORed with each fetched word before > + * searching it for one bits. > + * - The optional "addr2", which is anded with "addr1" if present. The optional "and", which is anded with "addr" if present. - sounds better, isn't? > */ > -static unsigned long _find_next_bit(const unsigned long *addr, > - unsigned long nbits, unsigned long start, unsigned long invert) > +static inline unsigned long _find_next_bit(const unsigned long *addr1, > + const unsigned long *addr2, unsigned long nbits, > + unsigned long start, unsigned long invert) > { > unsigned long tmp; > > if (unlikely(start >= nbits)) > return nbits; > > - tmp = addr[start / BITS_PER_LONG] ^ invert; > + tmp = addr1[start / BITS_PER_LONG]; > + if (addr2) > + tmp &= addr2[start / BITS_PER_LONG]; > + tmp ^= invert; > > /* Handle 1st word. */ > tmp &= BITMAP_FIRST_WORD_MASK(start); > @@ -47,7 +54,10 @@ static unsigned long _find_next_bit(const unsigned long *addr, > if (start >= nbits) > return nbits; > > - tmp = addr[start / BITS_PER_LONG] ^ invert; > + tmp = addr1[start / BITS_PER_LONG]; > + if (addr2) > + tmp &= addr2[start / BITS_PER_LONG]; > + tmp ^= invert; > } > > return min(start + __ffs(tmp), nbits); > @@ -61,7 +71,7 @@ static unsigned long _find_next_bit(const unsigned long *addr, > unsigned long find_next_bit(const unsigned long *addr, unsigned long size, > unsigned long offset) > { > - return _find_next_bit(addr, size, offset, 0UL); > + return _find_next_bit(addr, NULL, size, offset, 0UL); > } > EXPORT_SYMBOL(find_next_bit); > #endif > @@ -70,11 +80,21 @@ EXPORT_SYMBOL(find_next_bit); > unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, > unsigned long offset) > { > - return _find_next_bit(addr, size, offset, ~0UL); > + return _find_next_bit(addr, NULL, size, offset, ~0UL); > } > EXPORT_SYMBOL(find_next_zero_bit); > #endif > > +#if !defined(find_next_and_bit) > +unsigned long find_next_and_bit(const unsigned long *addr1, > + const unsigned long *addr2, unsigned long size, > + unsigned long offset) > +{ > + return _find_next_bit(addr1, addr2, size, offset, 0UL); > +} > +EXPORT_SYMBOL(find_next_and_bit); > +#endif > + > #ifndef find_first_bit > /* > * Find the first set bit in a memory region. > @@ -146,15 +166,19 @@ static inline unsigned long ext2_swab(const unsigned long y) > } > > #if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le) > -static unsigned long _find_next_bit_le(const unsigned long *addr, > - unsigned long nbits, unsigned long start, unsigned long invert) > +static inline unsigned long _find_next_bit_le(const unsigned long *addr1, > + const unsigned long *addr2, unsigned long nbits, > + unsigned long start, unsigned long invert) > { > unsigned long tmp; > > if (unlikely(start >= nbits)) > return nbits; > > - tmp = addr[start / BITS_PER_LONG] ^ invert; > + tmp = addr1[start / BITS_PER_LONG]; > + if (addr2) > + tmp &= addr2[start / BITS_PER_LONG]; > + tmp ^= invert; > > /* Handle 1st word. */ > tmp &= ext2_swab(BITMAP_FIRST_WORD_MASK(start)); > @@ -165,7 +189,10 @@ static unsigned long _find_next_bit_le(const unsigned long *addr, > if (start >= nbits) > return nbits; > > - tmp = addr[start / BITS_PER_LONG] ^ invert; > + tmp = addr1[start / BITS_PER_LONG]; > + if (addr2) > + tmp &= addr2[start / BITS_PER_LONG]; > + tmp ^= invert; > } > > return min(start + __ffs(ext2_swab(tmp)), nbits); > @@ -176,7 +203,7 @@ static unsigned long _find_next_bit_le(const unsigned long *addr, > unsigned long find_next_zero_bit_le(const void *addr, unsigned > long size, unsigned long offset) > { > - return _find_next_bit_le(addr, size, offset, ~0UL); > + return _find_next_bit_le(addr, NULL, size, offset, ~0UL); > } > EXPORT_SYMBOL(find_next_zero_bit_le); > #endif > @@ -185,7 +212,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le); > unsigned long find_next_bit_le(const void *addr, unsigned > long size, unsigned long offset) > { > - return _find_next_bit_le(addr, size, offset, 0UL); > + return _find_next_bit_le(addr, NULL, size, offset, 0UL); > } > EXPORT_SYMBOL(find_next_bit_le); > #endif > diff --git a/lib/test_find_bit.c b/lib/test_find_bit.c > index f4394a36f9aa..90773efa4694 100644 > --- a/lib/test_find_bit.c > +++ b/lib/test_find_bit.c > @@ -35,6 +35,7 @@ > #define SPARSE 500 > > static DECLARE_BITMAP(bitmap, BITMAP_LEN) __initdata; > +static DECLARE_BITMAP(bitmap2, BITMAP_LEN) __initdata; > > /* > * This is Schlemiel the Painter's algorithm. It should be called after > @@ -107,6 +108,42 @@ static int __init test_find_last_bit(const void *bitmap, unsigned long len) > return 0; > } > > +static int __init test_find_next_and_bit(const void *bitmap, > + const void *bitmap2, unsigned long len) > +{ > + unsigned long i, cnt; > + cycles_t cycles; > + > + cycles = get_cycles(); Please switch to ktime_get, as other tests do. > + for (cnt = i = 0; i < BITMAP_LEN; cnt++) > + i = find_next_and_bit(bitmap, bitmap2, BITMAP_LEN, i+1); > + cycles = get_cycles() - cycles; > + pr_err("find_next_and_bit: %ld cycles, %ld iterations\n", (long)cycles, > + cnt); Use pretty formatting here to align with others tests output. > + > + return 0; > +} > + > +static int __init test_find_next_and_bit_ref(const void *bitmap, > + const void *bitmap2, unsigned long len) > +{ > + unsigned long i, cnt; > + cycles_t cycles; > + > + cycles = get_cycles(); > + for (cnt = i = 0; i < BITMAP_LEN; cnt++) > + while ((i = find_next_bit(bitmap, BITMAP_LEN, i + 1)) > + < BITMAP_LEN) Nit: while (i < BITMAP_LEN) { i = find_next_bit(bitmap, BITMAP_LEN, i + 1); > + if (test_bit(i, bitmap2)) > + break; } > + > + cycles = get_cycles() - cycles; > + pr_err("find_next_and_bit_ref: %ld cycles, %ld iterations\n", > + (long)cycles, cnt); > + > + return 0; > +} I don't understand the purpose of this. It's obviously clear that test_find_next_and_bit cannot be slower than test_find_next_and_bit_ref We don't have 'reference' implementations for other functions here. If you still think that you need it, please introduce find_next_and_bit_ref() to make the purpose of test clear. (I spent more that a minute to understand what happens here, and my first suggestion was wrong.) > static int __init find_bit_test(void) > { > unsigned long nbits = BITMAP_LEN / SPARSE; > @@ -114,23 +151,31 @@ static int __init find_bit_test(void) > pr_err("\nStart testing find_bit() with random-filled bitmap\n"); > > get_random_bytes(bitmap, sizeof(bitmap)); > + get_random_bytes(bitmap2, sizeof(bitmap2)); > > test_find_next_bit(bitmap, BITMAP_LEN); > test_find_next_zero_bit(bitmap, BITMAP_LEN); > test_find_last_bit(bitmap, BITMAP_LEN); > test_find_first_bit(bitmap, BITMAP_LEN); > + test_find_next_and_bit_ref(bitmap, bitmap2, BITMAP_LEN); > + test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN); > > pr_err("\nStart testing find_bit() with sparse bitmap\n"); > > bitmap_zero(bitmap, BITMAP_LEN); > + bitmap_zero(bitmap2, BITMAP_LEN); > > - while (nbits--) > + while (nbits--) { > __set_bit(prandom_u32() % BITMAP_LEN, bitmap); > + __set_bit(prandom_u32() % BITMAP_LEN, bitmap2); > + } > > test_find_next_bit(bitmap, BITMAP_LEN); > test_find_next_zero_bit(bitmap, BITMAP_LEN); > test_find_last_bit(bitmap, BITMAP_LEN); > test_find_first_bit(bitmap, BITMAP_LEN); > + test_find_next_and_bit_ref(bitmap, bitmap2, BITMAP_LEN); > + test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN); For sparse bitmaps it will be like traversing zero-bitmaps. I doubt this numbers will be representative. Do we need this test at all? > return 0; > } Yury From 1585315848060842943@xxx Tue Nov 28 13:16:47 +0000 2017 X-GM-THRID: 1585315848060842943 X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread