Date: Thu, 16 May 2024 10:51:06 +0200
From: Peter Zijlstra
To: Haifeng Xu
Cc: mingo@redhat.com, frederic@kernel.org, mark.rutland@arm.com,
    acme@kernel.org, alexander.shishkin@linux.intel.com, jolsa@kernel.org,
    namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
    linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
    stable@vger.kernel.org
Subject: Re: [PATCH v4] perf/core: Fix missing wakeup when waiting for context reference
Message-ID: <20240516085106.GG22557@noisy.programming.kicks-ass.net>
References: <20240513103948.33570-1-haifeng.xu@shopee.com>
In-Reply-To: <20240513103948.33570-1-haifeng.xu@shopee.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 13, 2024 at 10:39:48AM +0000, Haifeng Xu wrote:
> In our production environment, we found many hung tasks which are
> blocked for more than 18 hours. Their call traces are like this:
>
> [346278.191038] __schedule+0x2d8/0x890
> [346278.191046] schedule+0x4e/0xb0
> [346278.191049] perf_event_free_task+0x220/0x270
> [346278.191056] ? init_wait_var_entry+0x50/0x50
> [346278.191060] copy_process+0x663/0x18d0
> [346278.191068] kernel_clone+0x9d/0x3d0
> [346278.191072] __do_sys_clone+0x5d/0x80
> [346278.191076] __x64_sys_clone+0x25/0x30
> [346278.191079] do_syscall_64+0x5c/0xc0
> [346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
> [346278.191086] ? do_syscall_64+0x69/0xc0
> [346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
> [346278.191092] ? irqentry_exit+0x19/0x30
> [346278.191095] ? exc_page_fault+0x89/0x160
> [346278.191097] ? asm_exc_page_fault+0x8/0x30
> [346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The task was waiting for the refcount become to 1, but from the vmcore,
> we found the refcount has already been 1. It seems that the task didn't
> get woken up by perf_event_release_kernel() and got stuck forever. The
> below scenario may cause the problem.
>
>    Thread A                                   Thread B
>    ...                                        ...
>    perf_event_free_task                       perf_event_release_kernel
>                                                  ...
>                                                  acquire event->child_mutex
>                                                  ...
>                                                  get_ctx
>       ...                                        release event->child_mutex
>       acquire ctx->mutex
>       ...
>       perf_free_event (acquire/release event->child_mutex)
>       ...
>       release ctx->mutex
>       wait_var_event
>                                                  acquire ctx->mutex
>                                                  acquire event->child_mutex
>                                                  # move existing events to free_list
>                                                  release event->child_mutex
>                                                  release ctx->mutex
>                                                  put_ctx
>    ...                                        ...
>
> In this case, all events of the ctx have been freed, so we couldn't
> find the ctx in free_list and Thread A will miss the wakeup. It's thus
> necessary to add a wakeup after dropping the reference.
>
> Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Haifeng Xu
> Reviewed-by: Frederic Weisbecker
> Acked-by: Mark Rutland

Thanks!, I'll hang onto this until after the merge window and then stick
it in tip/perf/urgent or somesuch.
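
For anyone reading along, the lost wakeup in the quoted scenario can be
sketched in userspace. This is only an analogy under stated assumptions,
not the kernel code and not the patch: a pthread mutex and condition
variable stand in for wait_var_event()/wake_up_var(), and every name in
it (struct ctx, wait_for_last_ref, releaser, run_scenario) is made up
for illustration. The waiter plays Thread A and sleeps until it is told
the refcount may have reached 1; the releaser plays Thread B and drops
its reference, with or without a wakeup afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for a context with a reference count. */
struct ctx {
    pthread_mutex_t lock;
    pthread_cond_t  wq;          /* models the wait queue behind the var */
    int refcount;
    int waiting;                 /* set once the waiter is about to sleep */
    int wake_after_put;          /* 1 = wake after dropping the reference */
};

/* Thread A analogue: sleep until refcount == 1. Returns 1 if the
 * condition was seen true, 0 if the wait timed out without a wakeup. */
static int wait_for_last_ref(struct ctx *c, long timeout_ms)
{
    struct timespec deadline;
    int woken = 1;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&c->lock);
    c->waiting = 1;
    while (c->refcount != 1) {
        /* The condition is only re-checked after a wakeup. */
        if (pthread_cond_timedwait(&c->wq, &c->lock, &deadline) == ETIMEDOUT) {
            woken = 0;           /* refcount may be 1, but nobody told us */
            break;
        }
    }
    pthread_mutex_unlock(&c->lock);
    return woken;
}

/* Thread B analogue: wait until the waiter is asleep, then drop the ref. */
static void *releaser(void *arg)
{
    struct ctx *c = arg;
    struct timespec tick = { 0, 1000000L };   /* 1 ms */
    int ready = 0;

    while (!ready) {
        pthread_mutex_lock(&c->lock);
        ready = c->waiting;
        pthread_mutex_unlock(&c->lock);
        if (!ready)
            nanosleep(&tick, NULL);
    }

    pthread_mutex_lock(&c->lock);
    c->refcount--;                            /* the put */
    pthread_mutex_unlock(&c->lock);

    if (c->wake_after_put)
        pthread_cond_broadcast(&c->wq);       /* the wakeup after the put */
    return NULL;
}

/* Runs one waiter/releaser pair and reports whether the waiter was woken. */
int run_scenario(int wake_after_put, long timeout_ms)
{
    struct ctx c = { .refcount = 2, .wake_after_put = wake_after_put };
    pthread_t t;
    int woken;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.wq, NULL);
    pthread_create(&t, NULL, releaser, &c);
    woken = wait_for_last_ref(&c, timeout_ms);
    pthread_join(t, NULL);
    pthread_cond_destroy(&c.wq);
    pthread_mutex_destroy(&c.lock);
    return woken;
}
```

With wake_after_put set, run_scenario() returns 1 as soon as the
reference is dropped. With it cleared, the waiter would normally sit out
the whole timeout even though the refcount is already 1 (barring a
spurious condvar wakeup), which is the userspace equivalent of the hung
task in the trace. The timeout exists only so the sketch terminates; the
kernel wait in the report has none.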