Date: Tue, 28 Jul 2020 08:51:52 -0300
From: Arnaldo Carvalho de Melo
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
 Jiri Olsa, Namhyung Kim, Thomas Gleixner, Andi Kleen,
 linux-kernel@vger.kernel.org, Stephane Eranian
Subject: Re:
[PATCH] perf bench: Add benchmark of find_next_bit
Message-ID: <20200728115152.GB3328@kernel.org>
References: <20200724071959.3110510-1-irogers@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20200724071959.3110510-1-irogers@google.com>
X-Url: http://acmel.wordpress.com
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 24, 2020 at 12:19:59AM -0700, Ian Rogers wrote:
> for_each_set_bit, or similar functions like for_each_cpu, may be hot
> within the kernel. If many bits were set then one could imagine on
> Intel a "bt" instruction with every bit may be faster than the function
> call and word length find_next_bit logic. Add a benchmark to measure
> this.

Thanks, applied.

- Arnaldo

> This benchmark on AMD rome and Intel skylakex shows "bt" is not a good
> option except for very small bitmaps.
>
> Signed-off-by: Ian Rogers
> ---
>  tools/perf/bench/Build            |   1 +
>  tools/perf/bench/bench.h          |   1 +
>  tools/perf/bench/find-bit-bench.c | 135 ++++++++++++++++++++++++++++++
>  tools/perf/builtin-bench.c        |   1 +
>  4 files changed, 138 insertions(+)
>  create mode 100644 tools/perf/bench/find-bit-bench.c
>
> diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
> index 768e408757a0..fb114bca3a8d 100644
> --- a/tools/perf/bench/Build
> +++ b/tools/perf/bench/Build
> @@ -10,6 +10,7 @@ perf-y += epoll-wait.o
>  perf-y += epoll-ctl.o
>  perf-y += synthesize.o
>  perf-y += kallsyms-parse.o
> +perf-y += find-bit-bench.o
>
>  perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
>  perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
> diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
> index 61cae4966cae..3291b0ddddfe 100644
> --- a/tools/perf/bench/bench.h
> +++ b/tools/perf/bench/bench.h
> @@ -35,6 +35,7 @@ int bench_sched_messaging(int argc, const char **argv);
>  int bench_sched_pipe(int argc, const char **argv);
>  int bench_mem_memcpy(int argc, const char **argv);
>  int bench_mem_memset(int argc, const char **argv);
> +int bench_mem_find_bit(int argc, const char **argv);
>  int bench_futex_hash(int argc, const char **argv);
>  int bench_futex_wake(int argc, const char **argv);
>  int bench_futex_wake_parallel(int argc, const char **argv);
> diff --git a/tools/perf/bench/find-bit-bench.c b/tools/perf/bench/find-bit-bench.c
> new file mode 100644
> index 000000000000..1aadbd9d7e86
> --- /dev/null
> +++ b/tools/perf/bench/find-bit-bench.c
> @@ -0,0 +1,135 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Benchmark find_next_bit and related bit operations.
> + *
> + * Copyright 2020 Google LLC.
> + */
> +#include
> +#include "bench.h"
> +#include "../util/stat.h"
> +#include
> +#include
> +#include
> +#include
> +
> +static unsigned int outer_iterations = 5;
> +static unsigned int inner_iterations = 100000;
> +
> +static const struct option options[] = {
> +	OPT_UINTEGER('i', "outer-iterations", &outer_iterations,
> +		"Number of outerer iterations used"),
> +	OPT_UINTEGER('j', "inner-iterations", &inner_iterations,
> +		"Number of outerer iterations used"),
> +	OPT_END()
> +};
> +
> +static const char *const bench_usage[] = {
> +	"perf bench mem find_bit ",
> +	NULL
> +};
> +
> +static unsigned int accumulator;
> +static unsigned int use_of_val;
> +
> +static noinline void workload(int val)
> +{
> +	use_of_val += val;
> +	accumulator++;
> +}
> +
> +#if defined(__i386__) || defined(__x86_64__)
> +static bool asm_test_bit(long nr, const unsigned long *addr)
> +{
> +	bool oldbit;
> +
> +	asm volatile("bt %2,%1"
> +		     : "=@ccc" (oldbit)
> +		     : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory");
> +
> +	return oldbit;
> +}
> +#else
> +#define asm_test_bit test_bit
> +#endif
> +
> +static int do_for_each_set_bit(unsigned int num_bits)
> +{
> +	unsigned long *to_test = bitmap_alloc(num_bits);
> +	struct timeval start, end, diff;
> +	u64 runtime_us;
> +	struct stats fb_time_stats, tb_time_stats;
> +	double time_average, time_stddev;
> +	unsigned int bit, i, j;
> +	unsigned int set_bits, skip;
> +	unsigned int old;
> +
> +	init_stats(&fb_time_stats);
> +	init_stats(&tb_time_stats);
> +
> +	for (set_bits = 1; set_bits <= num_bits; set_bits <<= 1) {
> +		bitmap_zero(to_test, num_bits);
> +		skip = num_bits / set_bits;
> +		for (i = 0; i < num_bits; i += skip)
> +			set_bit(i, to_test);
> +
> +		for (i = 0; i < outer_iterations; i++) {
> +			old = accumulator;
> +			gettimeofday(&start, NULL);
> +			for (j = 0; j < inner_iterations; j++) {
> +				for_each_set_bit(bit, to_test, num_bits)
> +					workload(bit);
> +			}
> +			gettimeofday(&end, NULL);
> +			assert(old + (inner_iterations * set_bits) == accumulator);
> +			timersub(&end, &start, &diff);
> +			runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
> +			update_stats(&fb_time_stats, runtime_us);
> +
> +			old = accumulator;
> +			gettimeofday(&start, NULL);
> +			for (j = 0; j < inner_iterations; j++) {
> +				for (bit = 0; bit < num_bits; bit++) {
> +					if (asm_test_bit(bit, to_test))
> +						workload(bit);
> +				}
> +			}
> +			gettimeofday(&end, NULL);
> +			assert(old + (inner_iterations * set_bits) == accumulator);
> +			timersub(&end, &start, &diff);
> +			runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
> +			update_stats(&tb_time_stats, runtime_us);
> +		}
> +
> +		printf("%d operations %d bits set of %d bits\n",
> +			inner_iterations, set_bits, num_bits);
> +		time_average = avg_stats(&fb_time_stats);
> +		time_stddev = stddev_stats(&fb_time_stats);
> +		printf(" Average for_each_set_bit took: %.3f usec (+- %.3f usec)\n",
> +			time_average, time_stddev);
> +		time_average = avg_stats(&tb_time_stats);
> +		time_stddev = stddev_stats(&tb_time_stats);
> +		printf(" Average test_bit loop took: %.3f usec (+- %.3f usec)\n",
> +			time_average, time_stddev);
> +
> +		if (use_of_val == accumulator) /* Try to avoid compiler tricks. */
> +			printf("\n");
> +	}
> +	bitmap_free(to_test);
> +	return 0;
> +}
> +
> +int bench_mem_find_bit(int argc, const char **argv)
> +{
> +	int err = 0, i;
> +
> +	argc = parse_options(argc, argv, options, bench_usage, 0);
> +	if (argc) {
> +		usage_with_options(bench_usage, options);
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	for (i = 1; i <= 2048; i <<= 1)
> +		do_for_each_set_bit(i);
> +
> +	return err;
> +}
> diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
> index cad31b1d3438..690eee1120a7 100644
> --- a/tools/perf/builtin-bench.c
> +++ b/tools/perf/builtin-bench.c
> @@ -52,6 +52,7 @@ static struct bench sched_benchmarks[] = {
>  static struct bench mem_benchmarks[] = {
>  	{ "memcpy", "Benchmark for memcpy() functions", bench_mem_memcpy },
>  	{ "memset", "Benchmark for memset() functions", bench_mem_memset },
> +	{ "find_bit", "Benchmark for find_bit() functions", bench_mem_find_bit },
>  	{ "all", "Run all memory access benchmarks", NULL },
>  	{ NULL, NULL, NULL }
>  };
> --
> 2.28.0.rc0.142.g3c755180ce-goog
>

--

- Arnaldo
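
For readers without a kernel tree at hand, the two iteration strategies the quoted patch times can be sketched in plain userspace C. This is only an illustration, not the code from the patch: every name below (sketch_fill, sketch_test_bit, visit_by_word_scan, visit_by_bit_test) is hypothetical, the word scan uses the GCC/Clang builtin __builtin_ctzl as a stand-in for the kernel's find_next_bit logic, and a portable shift-and-mask test stands in for the x86 "bt" instruction.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the "bt"-style test: read one bit by index. */
static int sketch_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/*
 * Fill the map the way the quoted benchmark does: clear it, then set every
 * (nbits / set_bits)-th bit. Returns how many bits were set.
 */
static unsigned int sketch_fill(unsigned long *map, unsigned int nbits,
				unsigned int set_bits)
{
	unsigned int nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
	unsigned int i, skip, count = 0;

	assert(set_bits != 0 && set_bits <= nbits);
	memset(map, 0, nwords * sizeof(*map));
	skip = nbits / set_bits;
	for (i = 0; i < nbits; i += skip) {
		map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
		count++;
	}
	return count;
}

/*
 * Strategy 1 (assumed analogue of for_each_set_bit): walk the map one word
 * at a time and jump straight to each set bit with count-trailing-zeros.
 */
static unsigned int visit_by_word_scan(const unsigned long *map,
				       unsigned int nbits)
{
	unsigned int nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
	unsigned int w, visited = 0;

	for (w = 0; w < nwords; w++) {
		unsigned long word = map[w];

		while (word) {
			unsigned int bit = w * BITS_PER_WORD +
					   __builtin_ctzl(word);

			if (bit < nbits)
				visited++;
			word &= word - 1;	/* clear the lowest set bit */
		}
	}
	return visited;
}

/* Strategy 2: test every bit position individually, set or not. */
static unsigned int visit_by_bit_test(const unsigned long *map,
				      unsigned int nbits)
{
	unsigned int bit, visited = 0;

	for (bit = 0; bit < nbits; bit++)
		if (sketch_test_bit(bit, map))
			visited++;
	return visited;
}
```

Both functions visit the same bits. The difference is that the per-bit loop does work for every position, while the word scan does work per word plus per set bit, which is the trade-off the benchmark measures.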