Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp4189517pxf; Tue, 23 Mar 2021 05:11:24 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyobgI0UAPE2+JOk8bnfVybMH21ON2jTMeUFL5+twzrLdKJPmmf/EqCb2hm6LaGgzYAaHQj X-Received: by 2002:a50:f391:: with SMTP id g17mr4317862edm.26.1616501484355; Tue, 23 Mar 2021 05:11:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616501484; cv=none; d=google.com; s=arc-20160816; b=NipnYqvd+/j8Ddg8mErtyiq0dFnGfRD1OmoxH09/hjri3fffrrccwGHoSgKeGBMFLH WgW84MAq/M9G7lUQLEZ6ZZkrkc3VmFG5zqibwJaaOTUAbYRD/EM75uZW20VF1n8fmBcW GbguO41WVPPBeaJX4i3wQQnZyNp8c5CTNZodS3DiyBkYehp8NbIyM6f7cf0xN5wlKDKQ AifwzRYWxJ1k5fe7BLgNnB9556FH/oaQFNz/r6nschL/+yJAq13BMDVYtkxomLFXfWuw rrRzqk5vhU6zYpU2t7gQnYtb5rlWd+2LFinp0NLEe5Vg98ocYccVgkdr1qtQ0IRz7txS Zn8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=B6h9ZiEvk3OjrxZiLfOl6jy0sUhDT3fazDwCa5g6frs=; b=IPZ5VvkV5gxa+A5RJvRP+dzhHKGpjvar711Lf17Hg6ICUuKUDOsz6S8pYAfUb3gnC5 fuwaTHNnyEF6k8ETC8WctvBRTm5/sRrIDZhr8NXxjMUNKrwDu5dGy1CMYiy91kccEhyz Ri1tu1APuA0X/0YhQaEAW6U7STWMAxZHojUSokr+o1RjrIaNL3fSyysfiFBtPhK4E90M djFgqEU7i20v4Ohc7KUXFCjw7fLMS1XhRTWgfKP74ag2+kGHlRl8umlukwYlwVPl5YAN II3bjBf0yh7YPEvnv35Rgg/Hw9cQoYXxfri2dhBssB3L7W2xtI5Ma3neU1y4P/xEoBcE 63Nw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=O0kY6dQA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id cd22si12444904ejb.384.2021.03.23.05.11.00; Tue, 23 Mar 2021 05:11:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=O0kY6dQA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231224AbhCWMJq (ORCPT + 99 others); Tue, 23 Mar 2021 08:09:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46318 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231215AbhCWMJH (ORCPT ); Tue, 23 Mar 2021 08:09:07 -0400 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7F96AC061756 for ; Tue, 23 Mar 2021 05:09:04 -0700 (PDT) Received: by mail-wr1-x42d.google.com with SMTP id j7so20529493wrd.1 for ; Tue, 23 Mar 2021 05:09:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=B6h9ZiEvk3OjrxZiLfOl6jy0sUhDT3fazDwCa5g6frs=; b=O0kY6dQAKFr9wlu7HnQpMWqc6zZphu1xuqfTAOsUBlwOdjbfuoX8auWdcm/J7kF1q9 86p/C7rWp3p4AI4OL0Khvxoc0onkh1X5vySB0k6tasLzuQ/Bxk4rsLC7fLRneNNqLAh/ n1Xg6FotkEBB4otnLAs7F687FQFxc9U6G3WcicQqgBjr2+2eEG5zaXBpPZohEPRrmGjj EKky0wGDdoU0crenWHSBUWaiweGJTTlPRjjjtAUx0yC6kLEO1RfO5af7DqP5AJMZXqbA kF4ZLnNDQeuNyl5nm9nR1fP6LtkfFMIYDGuZz7Om99YpCxqKH0ypGEKl5a6fLqoH8MMw f8SA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=B6h9ZiEvk3OjrxZiLfOl6jy0sUhDT3fazDwCa5g6frs=; b=QZoWHlPX76ynNFp9Y11k0BJMlu64sivm/0OxhupiaDU1eInVTePm3wRK0a+MxykBZI PajFMks/SuJl5kwLOaw9ZtUViTGX3U5pEQoT6FAhF1V1l+l8pF6ymReXx34ILJvyY05q 7Us5bE3DjMXgPULatQtpef9IV+GnrhENzPUxd9eKfA2sZE7m+weqUPdoZEvMzlH72vMv sHPDnETcVNX5WbDVH6fx8VAB8uY+qkSspXOOOOZbTyHMJdpgf1BzpN19i8nlr/JLjZUy /y9z8J/95kA9GWPriD0DL6CvW3OND8Q8Jiz/GwrZf2jXKCmolPfmXz5RMrmD/05/f4/v HmIQ== X-Gm-Message-State: AOAM532XKJxRr03FmpzdMGbapXaW2ttfzsMsAGunq0XKEOB6ZpcyuzBy uItCjgw0EyTfarnDNX0swqeNKw== X-Received: by 2002:adf:ea0e:: with SMTP id q14mr3674929wrm.389.1616501343004; Tue, 23 Mar 2021 05:09:03 -0700 (PDT) Received: from elver.google.com ([2a00:79e0:15:13:4cfd:1405:ab5d:85f8]) by smtp.gmail.com with ESMTPSA id r26sm2338599wmn.28.2021.03.23.05.09.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 23 Mar 2021 05:09:02 -0700 (PDT) Date: Tue, 23 Mar 2021 13:08:55 +0100 From: Marco Elver To: Peter Zijlstra Cc: Alexander Shishkin , Arnaldo Carvalho de Melo , Ingo Molnar , Jiri Olsa , Mark Rutland , Namhyung Kim , Thomas Gleixner , Alexander Potapenko , Al Viro , Arnd Bergmann , Christian Brauner , Dmitry Vyukov , Jann Horn , Jens Axboe , Matt Morehouse , Peter Collingbourne , Ian Rogers , kasan-dev , linux-arch , linux-fsdevel , LKML , the arch/x86 maintainers , "open list:KERNEL SELFTEST FRAMEWORK" Subject: Re: [PATCH RFC v2 8/8] selftests/perf: Add kselftest for remove_on_exec Message-ID: References: <20210310104139.679618-1-elver@google.com> <20210310104139.679618-9-elver@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/2.0.5 (2021-01-21) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Mar 23, 2021 at 11:41AM +0100, Marco Elver wrote: > On Tue, 23 Mar 2021 at 11:32, Peter Zijlstra wrote: [...] > > > + if (parent_event) { > > > /* > > > + * Remove event from parent, to avoid race where the > > > + * parent concurrently iterates through its children to > > > + * enable, disable, or otherwise modify an event. > > > */ > > > + mutex_lock(&parent_event->child_mutex); > > > + list_del_init(&event->child_list); > > > + mutex_unlock(&parent_event->child_mutex); > > > } > > > > ^^^ this, right? > > > > But that's something perf_event_exit_event() alread does. So then you're > > worried about the order of things. > > Correct. We somehow need to prohibit the parent from doing an > event_function_call() while we potentially deactivate the context with > perf_remove_from_context(). > > > > + > > > + perf_remove_from_context(event, !!event->parent * DETACH_GROUP); > > > + perf_event_exit_event(event, ctx, current, true); > > > } > > > > perf_event_release_kernel() first does perf_remove_from_context() and > > then clears the child_list, and that makes sense because if we're there, > > there's no external access anymore, the filedesc is gone and nobody will > > be iterating child_list anymore. > > > > perf_event_exit_task_context() and perf_event_exit_event() OTOH seem to > > rely on ctx->task == TOMBSTONE to sabotage event_function_call() such > > that if anybody is iterating the child_list, it'll NOP out. > > > > But here we don't have neither, and thus need to worry about the order > > vs child_list iteration. > > > > I suppose we should stick sync_child_event() in there as well. > > > > And at that point there's very little value in still using > > perf_event_exit_event()... let me see if there's something to be done > > about that. > > I don't mind dropping use of perf_event_exit_event() and open coding > all of this. That would also avoid modifying perf_event_exit_event(). > > But I leave it to you what you think is nicest. I played a bit more with it, and the below would be the version without using perf_event_exit_event(). Perhaps it isn't too bad. Thanks, -- Marco ------ >8 ------ diff --git a/kernel/events/core.c b/kernel/events/core.c index aa47e111435e..288b61820dab 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2165,8 +2165,9 @@ static void perf_group_detach(struct perf_event *event) * If this is a sibling, remove it from its group. */ if (leader != event) { + leader->nr_siblings--; list_del_init(&event->sibling_list); - event->group_leader->nr_siblings--; + event->group_leader = event; goto out; } @@ -2180,8 +2181,9 @@ static void perf_group_detach(struct perf_event *event) if (sibling->event_caps & PERF_EV_CAP_SIBLING) perf_remove_sibling_event(sibling); - sibling->group_leader = sibling; + leader->nr_siblings--; list_del_init(&sibling->sibling_list); + sibling->group_leader = sibling; /* Inherit group flags from the previous leader */ sibling->group_caps = event->group_caps; @@ -2358,10 +2360,19 @@ __perf_remove_from_context(struct perf_event *event, static void perf_remove_from_context(struct perf_event *event, unsigned long flags) { struct perf_event_context *ctx = event->ctx; + bool remove; lockdep_assert_held(&ctx->mutex); - event_function_call(event, __perf_remove_from_context, (void *)flags); + /* + * There is concurrency vs remove_on_exec(). + */ + raw_spin_lock_irq(&ctx->lock); + remove = (event->attach_state & PERF_ATTACH_CONTEXT); + raw_spin_unlock_irq(&ctx->lock); + + if (remove) + event_function_call(event, __perf_remove_from_context, (void *)flags); /* * The above event_function_call() can NO-OP when it hits @@ -4196,43 +4207,86 @@ static void perf_event_enable_on_exec(int ctxn) } static void perf_remove_from_owner(struct perf_event *event); -static void perf_event_exit_event(struct perf_event *child_event, - struct perf_event_context *child_ctx, - struct task_struct *child); +static void sync_child_event(struct perf_event *child_event, + struct task_struct *child); +static void free_event(struct perf_event *event); /* * Removes all events from the current task that have been marked * remove-on-exec, and feeds their values back to parent events. */ -static void perf_event_remove_on_exec(void) +static void perf_event_remove_on_exec(int ctxn) { - int ctxn; + struct perf_event_context *ctx, *clone_ctx = NULL; + struct perf_event *event, *next; + LIST_HEAD(free_list); + unsigned long flags; + bool modified = false; - for_each_task_context_nr(ctxn) { - struct perf_event_context *ctx; - struct perf_event *event, *next; + ctx = perf_pin_task_context(current, ctxn); + if (!ctx) + return; - ctx = perf_pin_task_context(current, ctxn); - if (!ctx) + mutex_lock(&ctx->mutex); + + if (WARN_ON_ONCE(ctx->task != current)) + goto unlock; + + list_for_each_entry_safe(event, next, &ctx->event_list, event_entry) { + struct perf_event *parent_event = event->parent; + + if (!event->attr.remove_on_exec) continue; - mutex_lock(&ctx->mutex); - list_for_each_entry_safe(event, next, &ctx->event_list, event_entry) { - if (!event->attr.remove_on_exec) - continue; + if (!is_kernel_event(event)) + perf_remove_from_owner(event); + + modified = true; - if (!is_kernel_event(event)) - perf_remove_from_owner(event); - perf_remove_from_context(event, DETACH_GROUP); + if (parent_event) { /* - * Remove the event and feed back its values to the - * parent event. + * Remove event from parent *before* modifying contexts, + * to avoid race where the parent concurrently iterates + * through its children to enable, disable, or otherwise + * modify an event. */ - perf_event_exit_event(event, ctx, current); + + sync_child_event(event, current); + + WARN_ON_ONCE(parent_event->ctx->parent_ctx); + mutex_lock(&parent_event->child_mutex); + list_del_init(&event->child_list); + mutex_unlock(&parent_event->child_mutex); + + perf_event_wakeup(parent_event); + put_event(parent_event); } - mutex_unlock(&ctx->mutex); - put_ctx(ctx); + + perf_remove_from_context(event, !!event->parent * DETACH_GROUP); + + raw_spin_lock_irq(&ctx->lock); + WARN_ON_ONCE(ctx->is_active); + perf_event_set_state(event, PERF_EVENT_STATE_EXIT); /* is_event_hup() */ + raw_spin_unlock_irq(&ctx->lock); + + if (parent_event) + free_event(event); + else + perf_event_wakeup(event); } + + raw_spin_lock_irqsave(&ctx->lock, flags); + if (modified) + clone_ctx = unclone_ctx(ctx); + --ctx->pin_count; + raw_spin_unlock_irqrestore(&ctx->lock, flags); + +unlock: + mutex_unlock(&ctx->mutex); + + put_ctx(ctx); + if (clone_ctx) + put_ctx(clone_ctx); } struct perf_read_data { @@ -7581,20 +7635,18 @@ void perf_event_exec(void) struct perf_event_context *ctx; int ctxn; - rcu_read_lock(); for_each_task_context_nr(ctxn) { - ctx = current->perf_event_ctxp[ctxn]; - if (!ctx) - continue; - perf_event_enable_on_exec(ctxn); + perf_event_remove_on_exec(ctxn); - perf_iterate_ctx(ctx, perf_event_addr_filters_exec, NULL, - true); + rcu_read_lock(); + ctx = rcu_dereference(current->perf_event_ctxp[ctxn]); + if (ctx) { + perf_iterate_ctx(ctx, perf_event_addr_filters_exec, + NULL, true); + } + rcu_read_unlock(); } - rcu_read_unlock(); - - perf_event_remove_on_exec(); } struct remote_output {