Date: Thu, 4 Feb 2021 13:09:13 +0100
From: Peter Zijlstra
To: Dmitry Vyukov
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Will Deacon, LKML,
    Matt Morehouse
Subject: Re: Process-wide watchpoints
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 04, 2021 at 10:54:42AM +0100, Dmitry Vyukov wrote:
> On Thu, Feb 4, 2021 at 10:39 AM Peter Zijlstra wrote:

> > OTOH, we're using ptrace permission checks, and ptrace() can inject
> > signals just fine. But it's a fairly big departure from what perf set
> > out to be.
>
> Oh, I see, I did not think about this.
>
> FWIW it's doable today by attaching a BPF program.

Sorta. For one, I can't operate BPF to save my life. Secondly, BPF has
some very dodgy recursion rules and it's trivial to lose BPF
invocations because another BPF program is already running.

> Will it help if this mode is restricted to monitoring the current
> process? Sending signals indeed usually requires cooperation, so doing
> it for the current process looks like a reasonable restriction.
> This may be not a fundamental restriction, but rather "we don't have
> any use cases and are not sure about implications, so this is a
> precaution measure, may be relaxed in future".

Yeah, limiting it might help. I can trivially add attr::thread_only,
which requires attr::inherit and will limit it to CLONE_THREAD (find
below).

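[Editor's sketch] The rule described in the paragraph above (inherit
only when attr::inherit is set, and, with attr::thread_only, only when
the new task is a thread) can be modeled in plain userspace C. This is
an illustration, not kernel code: struct sketch_attr,
sketch_should_inherit() and SKETCH_CLONE_THREAD are hypothetical names,
and the constant is a local copy of the CLONE_THREAD value from the
Linux UAPI headers.

```c
#include <stdbool.h>

/* Local mirror of CLONE_THREAD from <linux/sched.h> (assumption: 0x00010000). */
#define SKETCH_CLONE_THREAD 0x00010000UL

/* Hypothetical stand-in for the two perf_event_attr bits discussed here. */
struct sketch_attr {
	unsigned int inherit : 1;     /* children inherit the event */
	unsigned int thread_only : 1; /* proposed bit: only threads inherit */
};

/*
 * Returns true when a newly cloned task would inherit the event:
 * inherit must be set, and if thread_only is also set the clone must
 * have been created with CLONE_THREAD.
 */
bool sketch_should_inherit(const struct sketch_attr *attr,
			   unsigned long clone_flags)
{
	if (!attr->inherit)
		return false;
	if (attr->thread_only && !(clone_flags & SKETCH_CLONE_THREAD))
		return false;
	return true;
}
```

Note that in this model thread_only without inherit simply inherits
nothing; it does not reject the combination.
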
What do we do then? The advantage of IOC_REFRESH is that it disables
the event until it gets explicitly re-armed, avoiding recursion issues
etc. Do you want those semantics? If so, we'd need to have IOC_REFRESH
find the actual event for the current task, which should be doable I
suppose.

And I need to dig into that fcntl() crud again, see if that's capable
of doing a SIGTRAP and if it's possible to target that to the task
raising it, instead of doing a process wide signal delivery.

Lemme rummage about a bit.

---
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -955,7 +955,7 @@ extern void __perf_event_task_sched_in(s
 				       struct task_struct *task);
 extern void __perf_event_task_sched_out(struct task_struct *prev,
 					struct task_struct *next);
-extern int perf_event_init_task(struct task_struct *child);
+extern int perf_event_init_task(struct task_struct *child, unsigned long clone_flags);
 extern void perf_event_exit_task(struct task_struct *child);
 extern void perf_event_free_task(struct task_struct *task);
 extern void perf_event_delayed_put(struct task_struct *task);
@@ -1446,7 +1446,8 @@ perf_event_task_sched_in(struct task_str
 static inline void
 perf_event_task_sched_out(struct task_struct *prev,
 			  struct task_struct *next)			{ }
-static inline int perf_event_init_task(struct task_struct *child)	{ return 0; }
+static inline int perf_event_init_task(struct task_struct *child,
+				       unsigned long clone_flags)	{ return 0; }
 static inline void perf_event_exit_task(struct task_struct *child)	{ }
 static inline void perf_event_free_task(struct task_struct *task)	{ }
 static inline void perf_event_delayed_put(struct task_struct *task)	{ }
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -388,7 +388,8 @@ struct perf_event_attr {
 				aux_output     :  1, /* generate AUX records instead of events */
 				cgroup         :  1, /* include cgroup events */
 				text_poke      :  1, /* include text poke events */
-				__reserved_1   : 30;
+				thread_only    :  1, /* only inherit on threads */
+				__reserved_1   : 29;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12776,12 +12776,13 @@ static int
 inherit_task_group(struct perf_event *event, struct task_struct *parent,
 		   struct perf_event_context *parent_ctx,
 		   struct task_struct *child, int ctxn,
-		   int *inherited_all)
+		   unsigned long clone_flags, int *inherited_all)
 {
 	int ret;
 	struct perf_event_context *child_ctx;
 
-	if (!event->attr.inherit) {
+	if (!event->attr.inherit ||
+	    (event->attr.thread_only && !(clone_flags & CLONE_THREAD))) {
 		*inherited_all = 0;
 		return 0;
 	}
@@ -12813,7 +12814,7 @@ inherit_task_group(struct perf_event *ev
 /*
  * Initialize the perf_event context in task_struct
  */
-static int perf_event_init_context(struct task_struct *child, int ctxn)
+static int perf_event_init_context(struct task_struct *child, int ctxn, unsigned long clone_flags)
 {
 	struct perf_event_context *child_ctx, *parent_ctx;
 	struct perf_event_context *cloned_ctx;
@@ -12853,7 +12854,8 @@ static int perf_event_init_context(struc
 	 */
 	perf_event_groups_for_each(event, &parent_ctx->pinned_groups) {
 		ret = inherit_task_group(event, parent, parent_ctx,
-					 child, ctxn, &inherited_all);
+					 child, ctxn, clone_flags,
+					 &inherited_all);
 		if (ret)
 			goto out_unlock;
 	}
@@ -12869,7 +12871,8 @@ static int perf_event_init_context(struc
 
 	perf_event_groups_for_each(event, &parent_ctx->flexible_groups) {
 		ret = inherit_task_group(event, parent, parent_ctx,
-					 child, ctxn, &inherited_all);
+					 child, ctxn, clone_flags,
+					 &inherited_all);
 		if (ret)
 			goto out_unlock;
 	}
@@ -12911,7 +12914,7 @@ static int perf_event_init_context(struc
 /*
  * Initialize the perf_event context in task_struct
  */
-int perf_event_init_task(struct task_struct *child)
+int perf_event_init_task(struct task_struct *child, unsigned long clone_flags)
 {
 	int ctxn, ret;
 
@@ -12920,7 +12923,7 @@ int perf_event_init_task(struct task_str
 	INIT_LIST_HEAD(&child->perf_event_list);
 
 	for_each_task_context_nr(ctxn) {
-		ret = perf_event_init_context(child, ctxn);
+		ret = perf_event_init_context(child, ctxn, clone_flags);
 		if (ret) {
 			perf_event_free_task(child);
 			return ret;
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2068,7 +2068,7 @@ static __latent_entropy struct task_stru
 	if (retval)
 		goto bad_fork_cleanup_policy;
 
-	retval = perf_event_init_task(p);
+	retval = perf_event_init_task(p, clone_flags);
 	if (retval)
 		goto bad_fork_cleanup_policy;
 	retval = audit_alloc(p);
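
[Editor's sketch] On the IOC_REFRESH and fcntl() questions raised
before the patch, here is roughly what the existing userspace side
looks like for a single, non-inherited event: open a disabled write
watchpoint, ask for SIGTRAP on the fd with F_SETSIG, pin delivery to
one thread with F_SETOWN_EX / F_OWNER_TID, and arm it for one overflow
with PERF_EVENT_IOC_REFRESH. The sketch_* helpers are hypothetical
names, and the attr settings (4-byte write watchpoint, sample_period of
1) are assumptions chosen for illustration. Note the signal goes to the
thread named as fd owner, which is not necessarily the task that hit
the watchpoint, and as far as I can tell IOC_REFRESH is rejected for
events with attr::inherit set, so this does not cover the
process-wide case the thread is about.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a disabled write watchpoint on addr for the calling thread.
 * Returns an fd, or -1 with errno set (e.g. no permission, no HW support). */
int sketch_open_watchpoint(void *addr)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_BREAKPOINT;
	attr.bp_type = HW_BREAKPOINT_W;
	attr.bp_addr = (unsigned long)addr;
	attr.bp_len = HW_BREAKPOINT_LEN_4;
	attr.sample_period = 1;	/* every hit counts as an overflow */
	attr.disabled = 1;	/* stays off until armed by sketch_rearm() */
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* pid 0, cpu -1: this thread, on whichever CPU it runs */
	return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Deliver sig to thread tid whenever the event fd signals an overflow. */
int sketch_route_signal(int fd, int sig, pid_t tid)
{
	struct f_owner_ex owner = { .type = F_OWNER_TID, .pid = tid };
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	if (fcntl(fd, F_SETFL, flags | O_ASYNC) == -1)
		return -1;
	if (fcntl(fd, F_SETSIG, sig) == -1)
		return -1;
	return fcntl(fd, F_SETOWN_EX, &owner);
}

/* Arm the event for exactly one more overflow; it is disabled again
 * after that overflow until the next call. */
int sketch_rearm(int fd)
{
	return ioctl(fd, PERF_EVENT_IOC_REFRESH, 1);
}
```

A caller would do something like sketch_open_watchpoint(&var), then
sketch_route_signal(fd, SIGTRAP, syscall(SYS_gettid)), then
sketch_rearm(fd), and call sketch_rearm(fd) again once the SIGTRAP
has been handled.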