From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, Sebastian Andrzej Siewior
Subject: [PATCH 4/4] perf: Fix event leak upon exec and file release
Date: Thu, 16 May 2024 16:09:36 +0200
Message-Id: <20240516140936.13694-5-frederic@kernel.org>
In-Reply-To: <20240516140936.13694-1-frederic@kernel.org>
References: <20240516140936.13694-1-frederic@kernel.org>

The perf pending task work is never waited upon when the matching event
is released. In the case of a child event, released directly via
free_event(), this can result in a leaked event, as in the following
scenario, which doesn't even require a weak IRQ work implementation to
trigger:

schedule()
   prepare_task_switch()
=======> <NMI>
      perf_event_overflow()
         event->pending_sigtrap = ...
         irq_work_queue(&event->pending_irq)
<======= </NMI>
      perf_event_task_sched_out()
          event_sched_out()
              event->pending_sigtrap = 0;
              atomic_long_inc_not_zero(&event->refcount)
              task_work_add(&event->pending_task)
   finish_lock_switch()
=======> <IRQ>
      perf_pending_irq()
         //do nothing, rely on pending task work
<======= </IRQ>

begin_new_exec()
   perf_event_exit_task()
      perf_event_exit_event()
         // If this is a child event
         free_event()
            WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1)
            // event is leaked

Similar scenarios can also happen with perf_event_remove_on_exec() or
simply against a concurrent perf_event_release().

Fix this by synchronizing against any remaining pending task work while
freeing the event, just as is done with remaining pending IRQ work. This
means that the pending task callback neither needs nor should hold a
reference to the event, preventing it from ever being freed.

Fixes: 517e6a301f34 ("perf: Fix perf_pending_task() UaF")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 38 ++++++++++++++++++++++++++++++++++----
 2 files changed, 35 insertions(+), 4 deletions(-)
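For readers unfamiliar with the rcuwait facility the fix builds on, here
is a condensed sketch of the cancel-or-wait pattern the patch implements.
The names (struct pending_ctx, pending_cb(), pending_queue(),
pending_sync()) are hypothetical placeholders, not the actual perf code;
only the task_work and rcuwait calls are real kernel APIs, with
task_work_cancel() being the callback_head-based variant reintroduced
earlier in this series. Serialization of ->pending against the queueing
side is assumed (perf provides it via its context locking and the
owning-task constraints):

#include <linux/task_work.h>
#include <linux/rcuwait.h>
#include <linux/sched.h>
#include <linux/preempt.h>

/* Hypothetical example struct, not part of perf */
struct pending_ctx {
	struct callback_head	work;		/* deferred task work */
	unsigned int		pending;	/* 1 while work is queued */
	struct rcuwait		wait;		/* freeing side sleeps here */
};

/* Task work callback: runs when the target task returns to userspace */
static void pending_cb(struct callback_head *head)
{
	struct pending_ctx *ctx = container_of(head, struct pending_ctx, work);

	/*
	 * The non-preemptible section doubles as an implicit RCU
	 * read-side critical section: the enclosing object cannot be
	 * freed before the grace period that follows our wake-up.
	 */
	preempt_disable_notrace();
	if (ctx->pending) {
		ctx->pending = 0;
		/* ... act on the object here, it is guaranteed alive ... */
		rcuwait_wake_up(&ctx->wait);	/* release a waiter, if any */
	}
	preempt_enable_notrace();
}

static void pending_init(struct pending_ctx *ctx)
{
	init_task_work(&ctx->work, pending_cb);
	rcuwait_init(&ctx->wait);
	ctx->pending = 0;
}

/* Queue the work on @task; returns false if @task is already exiting */
static bool pending_queue(struct pending_ctx *ctx, struct task_struct *task)
{
	if (ctx->pending)
		return true;	/* already queued */
	if (task_work_add(task, &ctx->work, TWA_RESUME))
		return false;
	ctx->pending = 1;
	return true;
}

/* Called before freeing the enclosing object */
static void pending_sync(struct pending_ctx *ctx)
{
	if (!ctx->pending)
		return;
	/*
	 * Work queued on the current task can never run while we sleep
	 * waiting for it: cancel it instead.
	 */
	if (task_work_cancel(current, &ctx->work)) {
		ctx->pending = 0;
		return;
	}
	/* Queued on another task: sleep until the callback clears it */
	rcuwait_wait_event(&ctx->wait, !ctx->pending, TASK_UNINTERRUPTIBLE);
}

The crucial point is that ->pending is cleared inside the callback's
preempt-disabled section, and only a freeing side that observes
!->pending may proceed.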
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d2a15c0c6f8a..89ae41bb5f70 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -786,6 +786,7 @@ struct perf_event {
 	struct irq_work			pending_irq;
 	struct callback_head		pending_task;
 	unsigned int			pending_work;
+	struct rcuwait			pending_work_wait;
 
 	atomic_t			event_limit;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f2a366e736a4..3c6a8ad3c6f7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2288,7 +2288,6 @@ event_sched_out(struct perf_event *event, struct perf_event_context *ctx)
 		if (state != PERF_EVENT_STATE_OFF &&
 		    !event->pending_work &&
 		    !task_work_add(current, &event->pending_task, TWA_RESUME)) {
-			WARN_ON_ONCE(!atomic_long_inc_not_zero(&event->refcount));
 			event->pending_work = 1;
 		} else {
 			local_dec(&event->ctx->nr_pending);
@@ -5184,9 +5183,35 @@ static bool exclusive_event_installable(struct perf_event *event,
 static void perf_addr_filters_splice(struct perf_event *event,
 				       struct list_head *head);
 
+static void perf_pending_task_sync(struct perf_event *event)
+{
+	struct callback_head *head = &event->pending_task;
+
+	if (!event->pending_work)
+		return;
+	/*
+	 * If the task is queued to the current task's queue, we
+	 * obviously can't wait for it to complete. Simply cancel it.
+	 */
+	if (task_work_cancel(current, head)) {
+		event->pending_work = 0;
+		local_dec(&event->ctx->nr_pending);
+		return;
+	}
+
+	/*
+	 * All accesses related to the event are within the same
+	 * non-preemptible section in perf_pending_task(). The RCU
+	 * grace period before the event is freed will make sure all
+	 * those accesses are complete by then.
+	 */
+	rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
+}
+
 static void _free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending_irq);
+	perf_pending_task_sync(event);
 
 	unaccount_event(event);
@@ -6804,24 +6829,28 @@ static void perf_pending_task(struct callback_head *head)
 	struct perf_event *event = container_of(head, struct perf_event, pending_task);
 	int rctx;
 
+	/*
+	 * All accesses to the event must belong to the same implicit RCU read-side
+	 * critical section as the ->pending_work reset. See comment in
+	 * perf_pending_task_sync().
+	 */
+	preempt_disable_notrace();
 	/*
 	 * If we 'fail' here, that's OK, it means recursion is already disabled
 	 * and we won't recurse 'further'.
 	 */
-	preempt_disable_notrace();
 	rctx = perf_swevent_get_recursion_context();
 
 	if (event->pending_work) {
 		event->pending_work = 0;
 		perf_sigtrap(event);
 		local_dec(&event->ctx->nr_pending);
+		rcuwait_wake_up(&event->pending_work_wait);
 	}
 
 	if (rctx >= 0)
 		perf_swevent_put_recursion_context(rctx);
 	preempt_enable_notrace();
-
-	put_event(event);
 }
 
 #ifdef CONFIG_GUEST_PERF_EVENTS
@@ -11929,6 +11958,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	init_waitqueue_head(&event->waitq);
 	init_irq_work(&event->pending_irq, perf_pending_irq);
 	init_task_work(&event->pending_task, perf_pending_task);
+	rcuwait_init(&event->pending_work_wait);
 
 	mutex_init(&event->mmap_mutex);
 	raw_spin_lock_init(&event->addr_filters.lock);
-- 
2.34.1
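A note on the lifetime argument in perf_pending_task_sync()'s second
comment: the freeing side can observe ->pending_work == 0 (in the fast
path, or on return from rcuwait_wait_event()) while the callback is
still inside its non-preemptible section, for example still running
perf_sigtrap(). That is safe only because the event's memory is released
an RCU grace period later (perf frees events through call_rcu()), and a
preempt-disabled region holds such a grace period back. Continuing the
hypothetical sketch above, with obj, obj_free() and obj_free_rcu() as
placeholder names, the freeing side would look like:

#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Hypothetical enclosing object, continuing the sketch above */
struct obj {
	struct pending_ctx	pending;
	struct rcu_head		rcu_head;
};

static void obj_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct obj, rcu_head));
}

static void obj_free(struct obj *obj)
{
	pending_sync(&obj->pending);	/* cancel or wait first */
	/*
	 * Defer the kfree() by a grace period: pending_cb()'s
	 * preempt-disabled section holds the grace period back, so all
	 * of the callback's accesses to the object are complete before
	 * the memory is reused.
	 */
	call_rcu(&obj->rcu_head, obj_free_rcu);
}

This is also why the patch can drop the refcount pinning: the event's
lifetime is guaranteed by cancel-or-wait plus the grace period, not by
the callback holding a reference.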