Received: by 2002:a05:6a10:af89:0:0:0:0 with SMTP id iu9csp3374186pxb; Mon, 17 Jan 2022 18:58:14 -0800 (PST) X-Google-Smtp-Source: ABdhPJylYfCANSNa7ahzNRBeUAYemhUngmoeblzNsLOB8ZR+Tq9Q0Qr4zYQwTt2tjLztBmv0fuL7 X-Received: by 2002:a17:902:b414:b0:149:61c7:d550 with SMTP id x20-20020a170902b41400b0014961c7d550mr25608305plr.129.1642474693859; Mon, 17 Jan 2022 18:58:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1642474693; cv=none; d=google.com; s=arc-20160816; b=SjooKVtmuH4dN9pfEwBrw7CIAcFqT044hp+7wIHnIbtd53/GA06tLyRg2B84GegbQY Frh5Qo4Wht1hFmB7I7LOxV9rhM3OMlXHoGGOVFgSQpkfx/nUGoYJ3mpK2XsnyZbkPJ0i jo9fChK31BxluH4bTKeaW6WOOhHKcLV0wSsLTFgh62KIEHh4sZgMRqYqXXq1ofj8EJJn GTaqM17/ENKthBrTjB2WGu/G4+dq8hBczPNqzAvLx1ZMbwUhwlqxrHITYCllSjfISazk CVMOAA3CtlxQkotke64bFzLH4U4SS6osnNLnZPAsTxpDVkQ6XPHrlyxsrB5dNNvRVT/w Q+Vg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=8aSxWeuJtecsMYrKAaymnbJ19+t6qrA5dD6VRrj4EZg=; b=v/PVXd8lGkjzq6uqtCe8h6sR18BDEONbb0eFPahyajPnMVgOEVXi71UqeJmQyzqTpb ULwbauWNV5xKAEmCJIV15vC3sHsPjhOPd3WS76ZpmMKxSi5sTzxWyuy3zgJTYYMGMtT7 3J2eNcHBfFE1iGQAsTBqw3V4rW2EjJDaiSIwopec3J1sNTnOZ3SNEns8nfr5H61Trb4L LlEJbWFc/g5ZBjQ0pRPXNio+ZuOWPASlStGdmo+QaVFobvXmwqV+Pi8Bob6oFRvpz8e6 a7s7StujkdACI3gJ7F7/vwnxNh37rknL2id+qkiFxD4vLh9mgpClKnxpcfeZPT0JGWRH xaOQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id u20si1127954pjr.84.2022.01.17.18.58.01; Mon, 17 Jan 2022 18:58:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241398AbiAQSsT (ORCPT + 99 others); Mon, 17 Jan 2022 13:48:19 -0500 Received: from ams.source.kernel.org ([145.40.68.75]:44798 "EHLO ams.source.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230519AbiAQSsS (ORCPT ); Mon, 17 Jan 2022 13:48:18 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 7AD13B81151 for ; Mon, 17 Jan 2022 18:48:17 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 673BEC36AE3; Mon, 17 Jan 2022 18:48:13 +0000 (UTC) Date: Mon, 17 Jan 2022 18:48:10 +0000 From: Catalin Marinas To: Leo Yan Cc: Kees Cook , Will Deacon , Mark Rutland , Ard Biesheuvel , Sami Tolvanen , Nicholas Piggin , James Morse , Marc Zyngier , Joey Gouly , Peter Collingbourne , Vincenzo Frascino , "Peter Zijlstra (Intel)" , Stephane Eranian , James Clark , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [RFCv1 4/4] perf: arm_spe: Dynamically switch PID tracing to contextidr Message-ID: References: <20211021134530.206216-1-leo.yan@linaro.org> <20211021134530.206216-5-leo.yan@linaro.org> <202110210848.35971643C6@keescook> <20211101152835.GB375622@leoy-ThinkPad-X240s> <20211205135103.GA42658@leoy-ThinkPad-X240s> <20211207123118.GA255238@leoy-ThinkPad-X240s> <20211210075918.GD622826@leoy-ThinkPad-X240s> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20211210075918.GD622826@leoy-ThinkPad-X240s> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Leo, (sorry for the delay, holidays, merging window) On Fri, Dec 10, 2021 at 03:59:18PM +0800, Leo Yan wrote: > On Wed, Dec 08, 2021 at 05:29:41PM +0000, Catalin Marinas wrote: > > On Tue, Dec 07, 2021 at 08:31:18PM +0800, Leo Yan wrote: > > > On Tue, Dec 07, 2021 at 11:48:00AM +0000, Catalin Marinas wrote: > > > > On Sun, Dec 05, 2021 at 09:51:03PM +0800, Leo Yan wrote: > > > > > On Fri, Dec 03, 2021 at 04:22:42PM +0000, Catalin Marinas wrote: > > > > > > What's the cost of always enabling CONFIG_PID_IN_CONTEXTIDR? If it's > > > > > > negligible, I'd not bother at all with any of the enabling/disabling. > > > > > > > > > > Yes, I compared performance for PID tracing with always enabling and > > > > > disabling CONFIG_PID_IN_CONTEXTIDR, and also compared with using > > > > > static key for enabling/disabling PID tracing. The result shows the > > > > > cost is negligible based on the benchmark 'perf bench sched'. > > > > > > > > > > Please see the detailed data in below link (note the testing results > > > > > came from my Juno board): > > > > > https://lore.kernel.org/lkml/20211021134530.206216-1-leo.yan@linaro.org/ > > > > > > > > The table wasn't entirely clear to me. So the dis/enb benchmarks are > > > > without this patchset applied. > > > > > > Yes, dis/enb metrics don't apply this patchset. > > > > > > > There seems to be a minor drop but it's > > > > probably noise. Anyway, do we need this patchset or we just make > > > > CONFIG_PID_IN_CONTEXTIDR default to y? > > > > > > Good point. I remembered before we had discussed for making > > > CONFIG_PID_IN_CONTEXTIDR to 'y', but this approach is not always valid, > > > especially when the profiling process runs in non-root PID namespace, > > > in this case, hardware tracing data (e.g. Arm SPE or CoreSight) cannot > > > trust the PID values from tracing since the PID conflicts between > > > different PID namespaces. > > > > > > So this patchset is to add the fundamental mechanism for dynamically > > > enabling and disable PID tracing into CONTEXTIDR. Based on it, we can > > > use helpers to dynamically enable PID tracing _only_ when process runs > > > in root PID namespace. > > > > I don't think your approach fully works. Let's say you are tracing two > > processes, one in the root PID namespace, the other not. Since the > > former enables PID in CONTEXTIDR, you automatically get some PID in > > CONTEXTIDR for the latter whether you requested it explicitly or not. > > The key point is kernel always sets a PID number from the root PID > namespace into the system register CONTEXTIDR. So earlier you mentioned that CONFIG_PID_IN_CONTEXTIDR=y is not always right since a profiled task may run in a non-root PID namespace. But this series introduces a single knob to turn on PID in CONTEXTIDR for all CPUs, irrespective of whether they run a task in the root PID namespace or not. The CPU running a task in the root ns may call contextidr_enable() but the other CPUs may run tasks in non-root ns. IIUC, with this patch, to avoid the root ns PID for a non-root ns task, the SPE driver only sets the SYS_PMSCR_EL1_CX_SHIFT bit for the root ns tasks. So, in this case, for a non-root ns task, does it matter that CONTEXTIDR always contain the root ns PID? If it matters, see above why a single knob is already doing this. > Let's see a case: > > # unshare --fork --pid perf record -e cs_etm// -m 64K,64K -a -o perf_test.data -- ./multi_threads_test > > In this case, with command "unshare --fork --pid", perf tool and the > profiled program multi_threads_test run in the created PID namespace. > We can see the dumped PID values: > > -0 [000] d..2. 331.751681: __switch_to: contextidr_thread_switch: task perf-exec pid 840 ns_pid 2 > -0 [002] d..2. 331.755930: __switch_to: contextidr_thread_switch: task multi_threads_t pid 842 ns_pid 4 > -0 [003] d..2. 331.755993: __switch_to: contextidr_thread_switch: task multi_threads_t pid 843 ns_pid 5 > sugov:0-129 [005] d..2. 332.323469: __switch_to: contextidr_thread_switch: task perf pid 841 ns_pid 3 > > We can see processes have two different PIDs values, say task > 'perf-exec', 840 is its PID number from the root PID namespace, and 2 > is its PID from the active namespace the task is belonging to. > > But the register CONTEXTIDR is _only_ used to trace PIDs from the root > PID namespace and ignores PID values from any other PID namespace. Would it help if we used task_pid_nr_ns() instead for setting CONTEXTIDR? > Come back to your question, if there have two tracing sessions, one > session runs in the root PID namespace, and another runs in the > non-root PID namespace. Since we can gather the PIDs in the root namespace, > the session in the root PID namespace can work perfectly, and we can > ensure the PID infos matching well between kernel's PID tracing and user > space tool (note: perf needs gather process info with process' pid/tid for > post parsing). > > For the later session in non-root PID name space, we doesn't capture > any PID tracing data. This results in the profiled result doesn't > contain any PID info, although it's painful that the PID info is > incomplete but we don't deliver any wrong info for users studying > profiling result and PID is always '-1'. So what's actually disabling the capture of PID tracing data? Is it the PMSCR_EL1.CX bit being 0? > I think the purpose for this patchset is to allow us to dynamically > enable PID tracing when the tracing session runs in root namespace. But it doesn't do this with a single global knob for all CPUs. You only need the SPE patch together with the CONTEXTIDR config set to y. > Alternatively, if you accept to always set PID to CONTEXTIDR in > contextidr_thread_switch(), it would be fine for me and we can only > need to control PID packets in SPE and CoreSight drivers. Well, we have the config option and infrastructure already, it's up to Linux distros to turn it on. It doesn't necessarily need to be in defconfig. Now, if we removed the CONTEXTIDR setting altogether, is there a way for perf tools to coordinate the traces with the scheduling events? This would be a simpler approach from the kernel perspective (i.e. no work). Alternatively, could we move the CONTEXTIDR setting to a perf driver. I'm not too familiar with perf but I guess we could add some scheduling event hooks. -- Catalin