Date: Tue, 3 Nov 2020 11:15:43 +0100
From: Jan Kara
To: Filipe Manana
Cc: Peter Zijlstra, LKML, Jan Kara, David Sterba, matorola@gmail.com, mingo@kernel.org
Subject: Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")
Message-ID: <20201103101543.GC3440@quack2.suse.cz>
References: <20201026114009.GN2594@hirez.programming.kicks-ass.net>
 <0c0d815c-bd5a-ff2d-1417-28a41173f2b4@suse.com>
 <20201026125524.GP2594@hirez.programming.kicks-ass.net>
 <20201026152256.GB2651@hirez.programming.kicks-ass.net>
 <968c6023-612c-342b-aa69-ec9e1e428eb0@suse.com>
In-Reply-To: <968c6023-612c-342b-aa69-ec9e1e428eb0@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 02-11-20 17:58:54, Filipe Manana
wrote:
>
>
> On 26/10/20 15:22, Peter Zijlstra wrote:
> > On Mon, Oct 26, 2020 at 01:55:24PM +0100, Peter Zijlstra wrote:
> >> On Mon, Oct 26, 2020 at 11:56:03AM +0000, Filipe Manana wrote:
> >>>> That smells like the same issue reported here:
> >>>>
> >>>> https://lkml.kernel.org/r/20201022111700.GZ2651@hirez.programming.kicks-ass.net
> >>>>
> >>>> Make sure you have commit:
> >>>>
> >>>> f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
> >>>>
> >>>> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
> >>>
> >>> Yes, CONFIG_DEBUG_PREEMPT is enabled.
> >>
> >> Bummer :/
> >>
> >>> I'll try with that commit and let you know, however it's gonna take a
> >>> few hours to build a kernel and run all fstests (on that test box it
> >>> takes over 3 hours) to confirm that fixes the issue.
> >>
> >> *ouch*, 3 hours is painful. How long to make it sick with the current
> >> kernel? quicker I would hope?
> >>
> >>> Thanks for the quick reply!
> >>
> >> Anyway, I don't think that commit can actually explain the issue :/
> >>
> >> The false positive on lockdep_assert_held() happens when the recursion
> >> count is !0, however we _should_ be having IRQs disabled when
> >> lockdep_recursion > 0, so that should never be observable.
> >>
> >> My hope was that DEBUG_PREEMPT would trigger on one of the
> >> __this_cpu_{inc,dec}(lockdep_recursion) instance, because that would
> >> then be a clear violation.
> >>
> >> And you're seeing this on x86, right?
> >>
> >> Let me puzzle moar..
> >
> > So I might have an explanation for the Sparc64 fail, but that can't
> > explain x86 :/
> >
> > I initially thought raw_cpu_read() was OK, since if it is !0 we have
> > IRQs disabled and can't get migrated, so if we get migrated both CPUs
> > must have 0 and it doesn't matter which 0 we read.
> >
> > And while that is true; it isn't the whole store, on pretty much all
> > architectures (except x86) this can result in computing the address for
> > one CPU, getting migrated, the old CPU continuing execution with another
> > task (possibly setting recursion) and then the new CPU reading the value
> > of the old CPU, which is no longer 0.
> >
> > I already fixed a bunch of that in:
> >
> > baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"")
> >
> > but clearly this one got crossed.
> >
> > Still, that leaves me puzzled over you seeing this on x86 :/
>
> Hi Peter,
>
> I still get the same issue with 5.10-rc2.
> Is there any non-merged patch I should try, or anything I can help with?

BTW, I've just hit the same deadlock issue with ext4 on generic/390, so I
can confirm this isn't a btrfs-specific issue (as we already knew from the
analysis, but it's still good to have that confirmed).

								Honza
-- 
Jan Kara
SUSE Labs, CR
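
The migration race Peter describes in the quoted text (compute the per-CPU
address, get migrated, then load through the stale address) can be modelled
in plain userspace C. This is only a sketch under stated assumptions, not
kernel code: the array `recursion[]`, the variable `current_cpu` and the
helpers `percpu_addr()` and `racy_read()` are hypothetical names made up for
illustration, and the "migration" is simulated deterministically rather than
produced by a real scheduler.

```c
#define NR_CPUS 2

/* Hypothetical stand-in for the per-CPU lockdep_recursion counter. */
static int recursion[NR_CPUS];

/* The CPU the simulated task is currently running on. */
static int current_cpu;

/* Step 1 of a two-step per-CPU read: compute the address for "this" CPU. */
static int *percpu_addr(void)
{
	return &recursion[current_cpu];
}

/*
 * Model of a raw_cpu_read()-style access where the address computation and
 * the load are separate steps. If migrate_between is non-zero, the task is
 * moved to the other CPU between the two steps, and another task on the old
 * CPU raises that CPU's recursion count in the meantime.
 */
static int racy_read(int migrate_between)
{
	int *addr = percpu_addr();	/* address of the old CPU's counter */

	if (migrate_between) {
		int old_cpu = current_cpu;

		current_cpu = (old_cpu + 1) % NR_CPUS;	/* task is migrated */
		recursion[old_cpu]++;	/* other task sets recursion on old CPU */
	}

	return *addr;	/* the load still targets the old CPU's counter */
}
```

In this model racy_read(0) returns 0, while a following racy_read(1) returns
1 even though the counter of the CPU the task now runs on is still 0. That
is the kind of false non-zero recursion count the quoted explanation
attributes to architectures where the per-CPU read is not a single
instruction.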