Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp1980804rdb; Thu, 7 Dec 2023 14:38:56 -0800 (PST) X-Google-Smtp-Source: AGHT+IERjk859IV1KKIzorSME3CO0dZfkrJMnMSgwiYjtNZtCRUqOcknJgrH9D84nk7g9Faf/l03 X-Received: by 2002:a17:902:ce8b:b0:1d0:5ad7:54f8 with SMTP id f11-20020a170902ce8b00b001d05ad754f8mr4222961plg.15.1701988736032; Thu, 07 Dec 2023 14:38:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701988736; cv=none; d=google.com; s=arc-20160816; b=N+miY6gdvTJbAJFpOU7mfXgDiL/CUJ7lJOenmWg8UuhChaxaK51mjsE/nP/J47FFP9 5ldZqGJNfVmvJ0VxCmrAnsmQfUU6TMMHHLUdXtFIQ2tLPioVgp07p7WS1VtDTjQ3/P2f De0yLL0lUmyN8LO6cOrDGYDNTYlSRKty6i79HBx6uqojql2lnDuSfIwqj2mMyJxsResE X9bvl7gpgFHEe/u0hlZwbBGX8kfNSeS6SdzPob0QwFxKatNd9n44ocjDRz+LK5EhbAJy d9v1qVe3hAdEIIjyRZTEzfiLFq6UE0tcCT7dTccuurtSojs2/NAcK8zfLyp1RQlB9XAT DZxg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:sender:dkim-signature; bh=26/GlUsu1cFJgj1Tg+JnHuO4HSzDZlYdi/Z+MXwW2YA=; fh=JB6rwwU3nhIth3YCNpUrT+Y1taV3nAKEF13CbSq8BBE=; b=fKKhnr8jm+Lg62HJA0TKiE+5FxT/ZrR5JEPS6EJz7VTOtdeIFdX1XinwAFkjvvSZUe i1aCNIINDWbTKc1qz7YxohrCNzfrutzN39Zzfgj8zDmU5KHmJFgIEpiLri28U3Krilax E4f0+1AtUjEuCaSWR0Pt2X0tYqY7GXCJcYKLRlQPAEoe5BayFTXmyH6erNizuMBO3TZl nWdA+xMO4etWFUWKBuigxGuN4qroT95zXXvK/rONQ/HZVOBJYrc8huPRn5cxwONMujZ3 lWy1FzU3g9MCeddpV+d0t5eyN5iieCATg1Ro6Dm1MEdDMV8tUmp8OLAbT1eE6Z0Yrnzb 7igg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=GhCLg9jK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from fry.vger.email (fry.vger.email. [23.128.96.38]) by mx.google.com with ESMTPS id b3-20020a170902d50300b001cffce39be3si467808plg.218.2023.12.07.14.38.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 Dec 2023 14:38:56 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=GhCLg9jK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id CC6B28075EF8; Thu, 7 Dec 2023 14:38:52 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231461AbjLGWie (ORCPT + 99 others); Thu, 7 Dec 2023 17:38:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44330 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229634AbjLGWid (ORCPT ); Thu, 7 Dec 2023 17:38:33 -0500 Received: from mail-pj1-x1032.google.com (mail-pj1-x1032.google.com [IPv6:2607:f8b0:4864:20::1032]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7AAEB170F; Thu, 7 Dec 2023 14:38:39 -0800 (PST) Received: by mail-pj1-x1032.google.com with SMTP id 98e67ed59e1d1-2866951b6e0so1464953a91.2; Thu, 07 Dec 2023 14:38:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701988719; x=1702593519; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:sender :from:to:cc:subject:date:message-id:reply-to; bh=26/GlUsu1cFJgj1Tg+JnHuO4HSzDZlYdi/Z+MXwW2YA=; b=GhCLg9jKCgxpkfX+nMAFq12VS58Ibu30l5m4P9131O/X7njH/Zf2uHRQTvSb7mosNc 0EV/axqqqknMlhoiJpN7FnPnNDjUEeq4SkgV3oo6Wi+Myw3EOKF1BzdqsSIPq5+D20HI npA9WtNIFTJFGAUGNMrOSvKWLll2ZPKYn4yK48GiuqCMI1DFKtYvnQlMEcxosUfWCbvT svqpuZksWC4AnzCTWCe3Z+V7Kr0sHJMsrkdR32lFu5PPP5AidB9pRt74dHQGV8ykurWs s+QvxkWtklD0O+NAJxmIHWR3cQRZ/gV/eyLA8kqtYbVMHyoqurYJy7zXR7YPbkCSWinV NQ9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701988719; x=1702593519; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:sender :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=26/GlUsu1cFJgj1Tg+JnHuO4HSzDZlYdi/Z+MXwW2YA=; b=JGlKvoas38O+HyejzDR97ts03fnL7jlAe9zWX8j0qSa1hOxReElQQwsyuyOz2Bv75b Mntzsu9SzlcFHdmhwphzpypypc5vr5jxUopnLfhXcNBsfQfUlX28FhQ3JpmR9o1s5rt/ 6H1BShXR8AkjoPyESQixLhSPp0SceH+WPkfFD7iJhm/oTet6j73XOC8PkahmSnZpCG7e bAnC2FQTdegEptpejIhgMq+NhndUCHD5xfzHWzmqrxxOCAKsLmzyU82uqigUcl5zGnWH y5r125kKbd+/PSNHCEU1zS0xQXiJ/mroiwHRUEArQYfPbDCk2E6GluKILzdZGBaeurFy HpvA== X-Gm-Message-State: AOJu0YyejJHu0jnbcR9u+kCCXDUyLzcpirtS3BfLh88BiaxqVnuPmVYs Don9KBwh83lJKWDZQHt5L7o= X-Received: by 2002:a17:90b:4c4f:b0:285:fc67:6164 with SMTP id np15-20020a17090b4c4f00b00285fc676164mr3307923pjb.5.1701988718605; Thu, 07 Dec 2023 14:38:38 -0800 (PST) Received: from google.com ([2620:15c:2c0:5:3280:1779:f8f1:9d5a]) by smtp.gmail.com with ESMTPSA id hi16-20020a17090b30d000b00286bd821426sm1945215pjb.26.2023.12.07.14.38.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 Dec 2023 14:38:37 -0800 (PST) Sender: Namhyung Kim Date: Thu, 7 Dec 2023 14:38:35 -0800 From: Namhyung Kim To: Marco Elver Cc: Kyle Huey , Kyle Huey , open list , Andrii Nakryiko , Jiri Olsa , Yonghong Song , Robert O'Callahan , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Ian Rogers , Adrian Hunter , linux-perf-users@vger.kernel.org Subject: Re: [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap Message-ID: References: <20231207163458.5554-1-khuey@kylehuey.com> <20231207163458.5554-2-khuey@kylehuey.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-1.0 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Thu, 07 Dec 2023 14:38:53 -0800 (PST) Hello, On Thu, Dec 07, 2023 at 06:53:58PM +0100, Marco Elver wrote: > On Thu, 7 Dec 2023 at 18:47, Kyle Huey wrote: > > > > > > > > On Thu, Dec 7, 2023, 9:05 AM Marco Elver wrote: > >> > >> On Thu, 7 Dec 2023 at 17:35, Kyle Huey wrote: > >> > > >> > The perf subsystem already allows an overflow handler to clear pending_kill > >> > to suppress a fasync signal (although nobody currently does this). Allow an > >> > overflow handler to suppress the other visible side effects of an overflow, > >> > namely event_limit accounting and SIGTRAP generation. Well, I think it can still hit the throttling logic and generate a PERF_RECORD_THROTTLE. But it should be rare.. > >> > > >> > Signed-off-by: Kyle Huey > >> > --- > >> > kernel/events/core.c | 10 +++++++--- > >> > 1 file changed, 7 insertions(+), 3 deletions(-) > >> > > >> > diff --git a/kernel/events/core.c b/kernel/events/core.c > >> > index b704d83a28b2..19fddfc27a4a 100644 > >> > --- a/kernel/events/core.c > >> > +++ b/kernel/events/core.c > >> > @@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event, > >> > */ > >> > > >> > event->pending_kill = POLL_IN; > >> > + > >> > + READ_ONCE(event->overflow_handler)(event, data, regs); > >> > + > >> > + if (!event->pending_kill) > >> > + return ret; > >> > >> It's not at all intuitive that resetting pending_kill to 0 will > >> suppress everything else, too. There is no relationship between the > >> fasync signals and SIGTRAP. pending_kill is for the former and > >> pending_sigtrap is for the latter. One should not affect the other. > >> > >> A nicer solution would be to properly undo the various pending_* > >> states (in the case of pending_sigtrap being set it should be enough > >> to reset pending_sigtrap to 0, and also decrement > >> event->ctx->nr_pending). > > > > > > I don't believe it's possible to correctly undo the event_limit decrement after the fact (if it's e.g. racing with the ioctl that adds to the event limit). > > > >> Although I can see why this solution is simpler. Perhaps with enough > >> comments it might be clearer. > >> > >> Preferences? > > > > > > The cleanest way would probably be to add a return value to the overflow handler function that controls this. It requires changing a bunch of arch specific code on arches I don't have access to though. > > Hmm. > > Maybe wait for perf maintainers to say what is preferrable. (I could > live with just making sure this has no other weird side effects and > more comments.) What if we can call bpf handler directly and check the return value? Then I think we can also get rid of the original overflow handler. Something like this (untested..) Thanks, Namhyung ---8<--- diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index e85cd1c0eaf3..1eba6f5bb70b 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -809,7 +809,6 @@ struct perf_event { perf_overflow_handler_t overflow_handler; void *overflow_handler_context; #ifdef CONFIG_BPF_SYSCALL - perf_overflow_handler_t orig_overflow_handler; struct bpf_prog *prog; u64 bpf_cookie; #endif diff --git a/kernel/events/core.c b/kernel/events/core.c index 4c72a41f11af..e1a00646dbbe 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9471,6 +9471,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r * Generic event overflow handling, sampling. */ +#ifdef CONFIG_BPF_SYSCALL +static int bpf_overflow_handler(struct perf_event *event, + struct perf_sample_data *data, + struct pt_regs *regs); +#endif + static int __perf_event_overflow(struct perf_event *event, int throttle, struct perf_sample_data *data, struct pt_regs *regs) @@ -9487,6 +9493,11 @@ static int __perf_event_overflow(struct perf_event *event, ret = __perf_event_account_interrupt(event, throttle); +#ifdef CONFIG_BPF_SYSCALL + if (event->prog && bpf_overflow_handler(event, data, regs) == 0) + return ret; +#endif + /* * XXX event_limit might not quite work as expected on inherited * events @@ -10346,7 +10357,7 @@ static void perf_event_free_filter(struct perf_event *event) } #ifdef CONFIG_BPF_SYSCALL -static void bpf_overflow_handler(struct perf_event *event, +static int bpf_overflow_handler(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs) { @@ -10355,7 +10366,7 @@ static void bpf_overflow_handler(struct perf_event *event, .event = event, }; struct bpf_prog *prog; - int ret = 0; + int ret = 1; ctx.regs = perf_arch_bpf_user_pt_regs(regs); if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) @@ -10369,10 +10380,7 @@ static void bpf_overflow_handler(struct perf_event *event, rcu_read_unlock(); out: __this_cpu_dec(bpf_prog_active); - if (!ret) - return; - - event->orig_overflow_handler(event, data, regs); + return ret; } static int perf_event_set_bpf_handler(struct perf_event *event, @@ -10408,8 +10416,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event, event->prog = prog; event->bpf_cookie = bpf_cookie; - event->orig_overflow_handler = READ_ONCE(event->overflow_handler); - WRITE_ONCE(event->overflow_handler, bpf_overflow_handler); return 0; } @@ -10420,7 +10426,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event) if (!prog) return; - WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler); event->prog = NULL; bpf_prog_put(prog); } @@ -11880,13 +11885,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, overflow_handler = parent_event->overflow_handler; context = parent_event->overflow_handler_context; #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING) - if (overflow_handler == bpf_overflow_handler) { + if (parent_event->prog) { struct bpf_prog *prog = parent_event->prog; bpf_prog_inc(prog); event->prog = prog; - event->orig_overflow_handler = - parent_event->orig_overflow_handler; } #endif }