Date: Wed, 5 Oct 2022 09:37:06 +0200
From: Peter Zijlstra
To: Marco Elver
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim,
    linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
    kasan-dev@googlegroups.com, Dmitry Vyukov
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse
References: <20220927121322.1236730-1-elver@google.com>

On Tue, Oct 04, 2022 at 07:33:55PM +0200, Marco Elver wrote:
> It looks reasonable, but obviously needs to pass tests. :-)
Ikr :-)

> Also, see comment below (I think you're still turning signals
> asynchronous, which we shouldn't do).

Indeed so; I tried fixing that this morning, but so far that doesn't seem
to want to actually cure things :/ I'll need to stomp on this harder.

Current hackery below. The main difference is that instead of trying to
restart the irq_work on sched_in, sched_out will now queue a task-work.

The event scheduling is done from 'regular' IRQ context and as such there
should be a return-to-userspace for the relevant task in the immediate
future (either directly or after scheduling).

Alas, something still isn't right...
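To make that handoff concrete, here is a stand-alone user-space toy model
of the intended flow (illustration only, not kernel code: the toy_event
struct, the simulated contexts and the scenario in main() are all made up
for this sketch, and the requeue-to-another-CPU case is left out):

#include <stdbool.h>
#include <stdio.h>

/* Mirrors only the pending state of struct perf_event, nothing more. */
struct toy_event {
        bool pending_sigtrap;   /* set from the overflow (NMI) path  */
        bool irq_work_queued;   /* models irq_work_queue()           */
        bool task_work_queued;  /* models task_work_add()            */
};

/* Overflow in NMI context: may only mark the signal pending. */
static void overflow(struct toy_event *e)
{
        e->pending_sigtrap = true;
        e->irq_work_queued = true;      /* irq_work_queue(&event->pending) */
}

/* sched_out runs before the irq_work fired: hand the work to the task. */
static void sched_out(struct toy_event *e)
{
        if (e->pending_sigtrap)
                e->task_work_queued = true; /* task_work_add(current, ...) */
}

/* The irq_work only delivers when it still runs in the event's context. */
static void irq_work_run(struct toy_event *e, bool in_event_context)
{
        if (!e->irq_work_queued)
                return;
        e->irq_work_queued = false;
        if (in_event_context && e->pending_sigtrap) {
                e->pending_sigtrap = false;
                printf("SIGTRAP via irq_work (fast path)\n");
        }
}

/* task_work runs at return-to-user: always in the right task context. */
static void return_to_user(struct toy_event *e)
{
        if (!e->task_work_queued)
                return;
        e->task_work_queued = false;
        if (e->pending_sigtrap) {
                e->pending_sigtrap = false;
                printf("SIGTRAP via task_work (after sched_out)\n");
        }
}

int main(void)
{
        struct toy_event e = { false, false, false };

        /* Overflow, then the task schedules out before the irq_work runs. */
        overflow(&e);
        sched_out(&e);
        irq_work_run(&e, false);   /* too late, wrong context: no delivery */
        return_to_user(&e);        /* task_work still delivers the signal  */
        return 0;
}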
---
 include/linux/perf_event.h |   9 ++--
 kernel/events/core.c       | 115 ++++++++++++++++++++++++++++-----------------
 2 files changed, 79 insertions(+), 45 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 853f64b6c8c2..f15726a6c127 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -756,11 +756,14 @@ struct perf_event {
 	struct fasync_struct		*fasync;
 
 	/* delayed work for NMIs and such */
-	int				pending_wakeup;
-	int				pending_kill;
-	int				pending_disable;
+	unsigned int			pending_wakeup	:1;
+	unsigned int			pending_disable	:1;
+	unsigned int			pending_sigtrap	:1;
+	unsigned int			pending_kill	:3;
+
 	unsigned long			pending_addr;	/* SIGTRAP */
 	struct irq_work			pending;
+	struct callback_head		pending_sig;
 
 	atomic_t			event_limit;
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b981b879bcd8..e28257fb6f00 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -54,6 +54,7 @@
 #include <linux/highmem.h>
 #include <linux/pgtable.h>
 #include <linux/buildid.h>
+#include <linux/task_work.h>
 
 #include "internal.h"
 
@@ -2276,11 +2277,19 @@ event_sched_out(struct perf_event *event,
 	event->pmu->del(event, 0);
 	event->oncpu = -1;
 
-	if (READ_ONCE(event->pending_disable) >= 0) {
-		WRITE_ONCE(event->pending_disable, -1);
+	if (event->pending_disable) {
+		event->pending_disable = 0;
 		perf_cgroup_event_disable(event, ctx);
 		state = PERF_EVENT_STATE_OFF;
 	}
+
+	if (event->pending_sigtrap) {
+		if (state != PERF_EVENT_STATE_OFF)
+			task_work_add(current, &event->pending_sig, TWA_NONE);
+		else
+			event->pending_sigtrap = 0;
+	}
+
 	perf_event_set_state(event, state);
 
 	if (!is_software_event(event))
@@ -2471,8 +2480,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
 
 void perf_event_disable_inatomic(struct perf_event *event)
 {
-	WRITE_ONCE(event->pending_disable, smp_processor_id());
-	/* can fail, see perf_pending_event_disable() */
+	event->pending_disable = 1;
 	irq_work_queue(&event->pending);
 }
 
@@ -6448,47 +6456,40 @@ static void perf_sigtrap(struct perf_event *event)
 		       event->attr.type, event->attr.sig_data);
 }
 
-static void perf_pending_event_disable(struct perf_event *event)
+/*
+ * Deliver the pending work in-event-context or follow the context.
+ */
+static void __perf_pending_event(struct perf_event *event)
 {
-	int cpu = READ_ONCE(event->pending_disable);
+	int cpu = READ_ONCE(event->oncpu);
 
+	/*
+	 * If the event isn't running; we done. event_sched_in() will restart
+	 * the irq_work when needed.
+	 */
 	if (cpu < 0)
 		return;
 
+	/*
+	 * Yay, we hit home and are in the context of the event.
+	 */
 	if (cpu == smp_processor_id()) {
-		WRITE_ONCE(event->pending_disable, -1);
-
-		if (event->attr.sigtrap) {
+		if (event->pending_sigtrap) {
+			event->pending_sigtrap = 0;
 			perf_sigtrap(event);
-			atomic_set_release(&event->event_limit, 1); /* rearm event */
-			return;
 		}
-
-		perf_event_disable_local(event);
-		return;
+		if (event->pending_disable) {
+			event->pending_disable = 0;
+			perf_event_disable_local(event);
+		}
 	}
 
 	/*
-	 *  CPU-A			CPU-B
-	 *
-	 *  perf_event_disable_inatomic()
-	 *    @pending_disable = CPU-A;
-	 *    irq_work_queue();
-	 *
-	 *  sched-out
-	 *    @pending_disable = -1;
-	 *
-	 *			sched-in
-	 *			perf_event_disable_inatomic()
-	 *			  @pending_disable = CPU-B;
-	 *			  irq_work_queue(); // FAILS
-	 *
-	 *  irq_work_run()
-	 *    perf_pending_event()
-	 *
-	 *  But the event runs on CPU-B and wants disabling there.
+	 * Requeue if there's still any pending work left, make sure to follow
+	 * where the event went.
 	 */
-	irq_work_queue_on(&event->pending, cpu);
+	if (event->pending_disable || event->pending_sigtrap)
+		irq_work_queue_on(&event->pending, cpu);
 }
 
 static void perf_pending_event(struct irq_work *entry)
@@ -6496,19 +6497,43 @@ static void perf_pending_event(struct irq_work *entry)
 	struct perf_event *event = container_of(entry, struct perf_event, pending);
 	int rctx;
 
-	rctx = perf_swevent_get_recursion_context();
 	/*
 	 * If we 'fail' here, that's OK, it means recursion is already disabled
 	 * and we won't recurse 'further'.
 	 */
+	rctx = perf_swevent_get_recursion_context();
 
-	perf_pending_event_disable(event);
-
+	/*
+	 * The wakeup isn't bound to the context of the event -- it can happen
+	 * irrespective of where the event is.
+	 */
 	if (event->pending_wakeup) {
 		event->pending_wakeup = 0;
 		perf_event_wakeup(event);
 	}
 
+	__perf_pending_event(event);
+
+	if (rctx >= 0)
+		perf_swevent_put_recursion_context(rctx);
+}
+
+static void perf_pending_sig(struct callback_head *head)
+{
+	struct perf_event *event = container_of(head, struct perf_event, pending_sig);
+	int rctx;
+
+	/*
+	 * If we 'fail' here, that's OK, it means recursion is already disabled
+	 * and we won't recurse 'further'.
+	 */
+	rctx = perf_swevent_get_recursion_context();
+
+	if (event->pending_sigtrap) {
+		event->pending_sigtrap = 0;
+		perf_sigtrap(event);
+	}
+
 	if (rctx >= 0)
 		perf_swevent_put_recursion_context(rctx);
 }
 
@@ -9227,11 +9252,20 @@ static int __perf_event_overflow(struct perf_event *event,
 	if (events && atomic_dec_and_test(&event->event_limit)) {
 		ret = 1;
 		event->pending_kill = POLL_HUP;
-		event->pending_addr = data->addr;
-
 		perf_event_disable_inatomic(event);
 	}
 
+	if (event->attr.sigtrap) {
+		/*
+		 * Should not be able to return to user space without processing
+		 * pending_sigtrap (kernel events can overflow multiple times).
+		 */
+		WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel);
+		event->pending_sigtrap = 1;
+		event->pending_addr = data->addr;
+		irq_work_queue(&event->pending);
+	}
+
 	READ_ONCE(event->overflow_handler)(event, data, regs);
 
 	if (*perf_event_fasync(event) && event->pending_kill) {
@@ -11560,8 +11594,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 
 	init_waitqueue_head(&event->waitq);
-	event->pending_disable = -1;
 	init_irq_work(&event->pending, perf_pending_event);
+	init_task_work(&event->pending_sig, perf_pending_sig);
 
 	mutex_init(&event->mmap_mutex);
 	raw_spin_lock_init(&event->addr_filters.lock);
@@ -11583,9 +11617,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	if (parent_event)
 		event->event_caps = parent_event->event_caps;
 
-	if (event->attr.sigtrap)
-		atomic_set(&event->event_limit, 1);
-
 	if (task) {
 		event->attach_state = PERF_ATTACH_TASK;
 		/*
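As for the 'pass tests' part, a minimal user-space check along the lines
of the sigtrap_threads kselftest could look like the below (untested
sketch; the watched variable, the 0x42 cookie and the 100 iterations are
arbitrary choices, and each write to the watchpoint must raise exactly
one synchronous SIGTRAP if nothing gets lost):

#define _GNU_SOURCE
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef TRAP_PERF
#define TRAP_PERF 6	/* siginfo si_code for perf-generated SIGTRAP */
#endif

static volatile sig_atomic_t sigtraps;
static volatile int watched;	/* data watchpoint target */

static void sigtrap_handler(int sig, siginfo_t *info, void *uctx)
{
	(void)sig; (void)uctx;
	if (info->si_code == TRAP_PERF)
		sigtraps++;	/* one synchronous SIGTRAP per write */
}

int main(void)
{
	struct perf_event_attr attr;
	struct sigaction sa;
	int i, fd;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigtrap_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGTRAP, &sa, NULL);

	memset(&attr, 0, sizeof(attr));
	attr.type		= PERF_TYPE_BREAKPOINT;
	attr.size		= sizeof(attr);
	attr.bp_type		= HW_BREAKPOINT_W;
	attr.bp_addr		= (unsigned long)&watched;
	attr.bp_len		= HW_BREAKPOINT_LEN_4;
	attr.sample_period	= 1;
	attr.sigtrap		= 1;
	attr.remove_on_exec	= 1;	/* required for attr.sigtrap */
	attr.sig_data		= 0x42;	/* arbitrary; shows up as si_perf_data */

	fd = syscall(SYS_perf_event_open, &attr, 0 /* self */,
		     -1 /* any cpu */, -1 /* no group */, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	for (i = 0; i < 100; i++)
		watched = i;	/* every write must trap, none may be lost */

	close(fd);
	printf("got %d/100 SIGTRAPs\n", (int)sigtraps);
	return sigtraps == 100 ? 0 : 1;
}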