Date: Mon, 1 Aug 2022 09:17:55 -0300
From: Arnaldo Carvalho de Melo
To: Peter Zijlstra, Ian Rogers
Cc: Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Kajol Jain, Andi Kleen, Adrian Hunter, Anshuman Khandual, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Rob Herring, Stephane Eranian
Subject: Re: [PATCH v3 1/3] perf: Align user space counter reading with code
References: <20220719223946.176299-1-irogers@google.com> <20220719223946.176299-2-irogers@google.com>
In-Reply-To: <20220719223946.176299-2-irogers@google.com>

On Tue, Jul 19, 2022 at 03:39:44PM -0700, Ian Rogers wrote:
> Align the user space counter reading documentation with the code in
> perf_mmap__read_self. Previously the documentation was based on the perf
> rdpmc test, but now general purpose code is provided by libperf.

Peter, can you merge this? I'd rather not make Linus raise his eyebrows
at me for processing things outside tools/perf/ when I ask him to pull
the perf userspace changes.
- Arnaldo

> Signed-off-by: Ian Rogers
> ---
>  include/uapi/linux/perf_event.h       | 35 ++++++++++++++++++++++-------------
>  tools/include/uapi/linux/perf_event.h | 35 ++++++++++++++++++++++-------------
>  2 files changed, 44 insertions(+), 26 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index d37629dbad72..6826dabb7e03 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -538,9 +538,13 @@ struct perf_event_mmap_page {
>  	 *
>  	 *     if (pc->cap_usr_time && enabled != running) {
>  	 *       cyc = rdtsc();
> -	 *       time_offset = pc->time_offset;
>  	 *       time_mult = pc->time_mult;
>  	 *       time_shift = pc->time_shift;
> +	 *       time_offset = pc->time_offset;
> +	 *       if (pc->cap_user_time_short) {
> +	 *         time_cycles = pc->time_cycles;
> +	 *         time_mask = pc->time_mask;
> +	 *       }
>  	 *     }
>  	 *
>  	 *     index = pc->index;
> @@ -548,6 +552,9 @@ struct perf_event_mmap_page {
>  	 *     if (pc->cap_user_rdpmc && index) {
>  	 *       width = pc->pmc_width;
>  	 *       pmc = rdpmc(index - 1);
> +	 *       pmc <<= 64 - width;
> +	 *       pmc >>= 64 - width;
> +	 *       count += pmc;
>  	 *     }
>  	 *
>  	 *     barrier();
> @@ -590,25 +597,27 @@ struct perf_event_mmap_page {
>  	 * If cap_usr_time the below fields can be used to compute the time
>  	 * delta since time_enabled (in ns) using rdtsc or similar.
>  	 *
> -	 *   u64 quot, rem;
> -	 *   u64 delta;
> -	 *
> -	 *   quot = (cyc >> time_shift);
> -	 *   rem = cyc & (((u64)1 << time_shift) - 1);
> -	 *   delta = time_offset + quot * time_mult +
> -	 *              ((rem * time_mult) >> time_shift);
> +	 *   cyc = time_cycles + ((cyc - time_cycles) & time_mask);
> +	 *   delta = time_offset + mul_u64_u32_shr(cyc, time_mult, time_shift);
>  	 *
>  	 * Where time_offset,time_mult,time_shift and cyc are read in the
> -	 * seqcount loop described above. This delta can then be added to
> -	 * enabled and possible running (if index), improving the scaling:
> +	 * seqcount loop described above. mul_u64_u32_shr will compute:
> +	 *
> +	 *   (u64)(((unsigned __int128)cyc * time_mult) >> time_shift)
> +	 *
> +	 * This delta can then be added to enabled and possible running (if
> +	 * index) to improve the scaling. Due to event multiplexing, running
> +	 * may be zero and so care is needed to avoid division by zero.
>  	 *
>  	 *   enabled += delta;
>  	 *   if (index)
>  	 *     running += delta;
>  	 *
> -	 *   quot = count / running;
> -	 *   rem = count % running;
> -	 *   count = quot * enabled + (rem * enabled) / running;
> +	 *   if (running != 0) {
> +	 *     quot = count / running;
> +	 *     rem = count % running;
> +	 *     count = quot * enabled + (rem * enabled) / running;
> +	 *   }
>  	 */
>  	__u16	time_shift;
>  	__u32	time_mult;
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index d37629dbad72..6826dabb7e03 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -538,9 +538,13 @@ struct perf_event_mmap_page {
>  	 *
>  	 *     if (pc->cap_usr_time && enabled != running) {
>  	 *       cyc = rdtsc();
> -	 *       time_offset = pc->time_offset;
>  	 *       time_mult = pc->time_mult;
>  	 *       time_shift = pc->time_shift;
> +	 *       time_offset = pc->time_offset;
> +	 *       if (pc->cap_user_time_short) {
> +	 *         time_cycles = pc->time_cycles;
> +	 *         time_mask = pc->time_mask;
> +	 *       }
>  	 *     }
>  	 *
>  	 *     index = pc->index;
> @@ -548,6 +552,9 @@ struct perf_event_mmap_page {
>  	 *     if (pc->cap_user_rdpmc && index) {
>  	 *       width = pc->pmc_width;
>  	 *       pmc = rdpmc(index - 1);
> +	 *       pmc <<= 64 - width;
> +	 *       pmc >>= 64 - width;
> +	 *       count += pmc;
>  	 *     }
>  	 *
>  	 *     barrier();
> @@ -590,25 +597,27 @@ struct perf_event_mmap_page {
>  	 * If cap_usr_time the below fields can be used to compute the time
>  	 * delta since time_enabled (in ns) using rdtsc or similar.
>  	 *
> -	 *   u64 quot, rem;
> -	 *   u64 delta;
> -	 *
> -	 *   quot = (cyc >> time_shift);
> -	 *   rem = cyc & (((u64)1 << time_shift) - 1);
> -	 *   delta = time_offset + quot * time_mult +
> -	 *              ((rem * time_mult) >> time_shift);
> +	 *   cyc = time_cycles + ((cyc - time_cycles) & time_mask);
> +	 *   delta = time_offset + mul_u64_u32_shr(cyc, time_mult, time_shift);
>  	 *
>  	 * Where time_offset,time_mult,time_shift and cyc are read in the
> -	 * seqcount loop described above. This delta can then be added to
> -	 * enabled and possible running (if index), improving the scaling:
> +	 * seqcount loop described above. mul_u64_u32_shr will compute:
> +	 *
> +	 *   (u64)(((unsigned __int128)cyc * time_mult) >> time_shift)
> +	 *
> +	 * This delta can then be added to enabled and possible running (if
> +	 * index) to improve the scaling. Due to event multiplexing, running
> +	 * may be zero and so care is needed to avoid division by zero.
>  	 *
>  	 *   enabled += delta;
>  	 *   if (index)
>  	 *     running += delta;
>  	 *
> -	 *   quot = count / running;
> -	 *   rem = count % running;
> -	 *   count = quot * enabled + (rem * enabled) / running;
> +	 *   if (running != 0) {
> +	 *     quot = count / running;
> +	 *     rem = count % running;
> +	 *     count = quot * enabled + (rem * enabled) / running;
> +	 *   }
>  	 */
>  	__u16	time_shift;
>  	__u32	time_mult;
> --
> 2.37.0.170.g444d1eabd0-goog

--

- Arnaldo