Date: Thu, 3 Dec 2020 11:39:15 +0000
From: Mark Rutland
To: Boqun Feng
Cc: Naresh Kamboju, linux-stable, open list, rcu@vger.kernel.org,
    lkft-triage@lists.linaro.org, Greg Kroah-Hartman, Sasha Levin,
    Peter Zijlstra, Will Deacon, "Paul E. McKenney"
Subject: Re: [arm64] db410c: BUG: Invalid wait context
Message-ID: <20201203113915.GE96754@C02TD0UTHF1T.local>
References: <20201203014922.GA1785576@boqun-archlinux>
In-Reply-To: <20201203014922.GA1785576@boqun-archlinux>

Hi Naresh, Boqun,

On Thu, Dec 03, 2020 at 09:49:22AM +0800, Boqun Feng wrote:
> On Wed, Dec 02, 2020 at 10:15:44AM +0530, Naresh Kamboju wrote:
> > While running kselftests on the arm64 db410c platform, "BUG: Invalid
> > wait context" was noticed on different runs; this specific platform is
> > running stable-rc 5.9.12-rc1.
> >
> > While running these two test cases we noticed this BUG; it is not
> > easily reproducible.
> >
> > # selftests: bpf: test_xdp_redirect.sh
> > # selftests: net: ip6_gre_headroom.sh
> >
> > [ 245.694901] kauditd_printk_skb: 100 callbacks suppressed
> > [ 245.694913] audit: type=1334 audit(251.699:25757): prog-id=12883 op=LOAD
> > [ 245.735658] audit: type=1334 audit(251.743:25758): prog-id=12884 op=LOAD
> > [ 245.801299] audit: type=1334 audit(251.807:25759): prog-id=12885 op=LOAD
> > [ 245.832034] audit: type=1334 audit(251.839:25760): prog-id=12886 op=LOAD
> > [ 245.888601]
> > [ 245.888631] =============================
> > [ 245.889156] [ BUG: Invalid wait context ]
> > [ 245.893071] 5.9.12-rc1 #1 Tainted: G W
> > [ 245.897056] -----------------------------
> > [ 245.902091] pool/1279 is trying to lock:
> > [ 245.906083] ffff000032fc1218 (&child->perf_event_mutex){+.+.}-{3:3}, at: perf_event_exit_task+0x34/0x3a8
> > [ 245.910085] other info that might help us debug this:
> > [ 245.919539] context-{4:4}
> > [ 245.924484] 1 lock held by pool/1279:
> > [ 245.927087] #0: ffff8000127819b8 (rcu_read_lock){....}-{1:2}, at: dput+0x54/0x460
> > [ 245.930739] stack backtrace:
> > [ 245.938203] CPU: 1 PID: 1279 Comm: pool Tainted: G W 5.9.12-rc1 #1
> > [ 245.941243] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > [ 245.948621] Call trace:
> > [ 245.955390] dump_backtrace+0x0/0x1f8
> > [ 245.957560] show_stack+0x2c/0x38
> > [ 245.961382] dump_stack+0xec/0x158
> > [ 245.964679] __lock_acquire+0x59c/0x15c8
> > [ 245.967978] lock_acquire+0x124/0x4d0
> > [ 245.972058] __mutex_lock+0xa4/0x970
> > [ 245.975615] mutex_lock_nested+0x54/0x70
> > [ 245.979261] perf_event_exit_task+0x34/0x3a8
> > [ 245.983168] do_exit+0x394/0xad8
> > [ 245.987420] do_group_exit+0x4c/0xa8
> > [ 245.990633] get_signal+0x16c/0xb40
> > [ 245.994193] do_notify_resume+0x2ec/0x678
> > [ 245.997404] work_pending+0x8/0x200
>
> From the PoV of lockdep, this means someone tries to acquire a mutex
> inside an RCU read-side critical section, which is bad, because one
> cannot sleep (voluntarily) inside RCU.
>
> However I don't think that's really the case here, because 1) normally
> people are very careful not to put mutexes or other sleepable locks
> inside RCU, and 2) in the above splat, lockdep finds the rcu read lock
> held at dput() while the acquiring of the mutex is at ret_to_user();
> clearly there is no call path (in the same context) from the RCU
> read-side critical section of dput() to ret_to_user().
>
> One way of hitting this is that there is a bug in context/irq tracing
> that makes lockdep treat the contexts of dput() and ret_to_user() as
> one context, so that lockdep gets confused and reports a false
> positive.

That sounds likely to me (but I haven't looked too deeply at the above
report).
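For reference, the kind of nesting that lockdep's invalid-wait-context
check is meant to catch looks roughly like the sketch below; the mutex
and function names are made up for illustration and are not taken from
the report above:

/*
 * Minimal sketch of the pattern behind "BUG: Invalid wait context":
 * taking a sleeping lock while inside an RCU read-side critical
 * section. The names here are hypothetical.
 */
#include <linux/mutex.h>
#include <linux/rcupdate.h>

static DEFINE_MUTEX(example_mutex);     /* hypothetical sleeping lock */

static void invalid_wait_context_example(void)
{
        rcu_read_lock();                /* enter a non-sleepable (RCU) context */
        mutex_lock(&example_mutex);     /* may sleep: invalid inside rcu_read_lock() */
        mutex_unlock(&example_mutex);
        rcu_read_unlock();
}

In this thread the suspicion is that no such nesting actually exists,
and the splat is a false positive caused by lockdep mis-tracking the
context.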
> FWIW, I think this might be related to some known issues for ARM64
> with lockdep and irq tracing:
>
> https://lore.kernel.org/lkml/20201119225352.GA5251@willie-the-truck/
>
> And Mark already has a series to fix them:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
>
> But I must defer to Mark for the latest fix ;-)

That series went into mainline a few hours ago, and will be part of
v5.10-rc7. So if it's possible to test with mainline, that would be
helpful!

Thanks,
Mark.