Subject: Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")
To: Jan Kara
Cc: Peter Zijlstra, LKML, David Sterba, matorola@gmail.com, mingo@kernel.org, darrick.wong@oracle.com
References: <20201026114009.GN2594@hirez.programming.kicks-ass.net>
 <0c0d815c-bd5a-ff2d-1417-28a41173f2b4@suse.com>
 <20201026125524.GP2594@hirez.programming.kicks-ass.net>
 <20201026152256.GB2651@hirez.programming.kicks-ass.net>
 <968c6023-612c-342b-aa69-ec9e1e428eb0@suse.com>
 <20201103101543.GC3440@quack2.suse.cz>
From: Filipe Manana
Message-ID: <61e43415-36a7-e270-e61d-59173d701f97@suse.com>
Date: Tue, 3 Nov 2020 10:22:59 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0
MIME-Version: 1.0
In-Reply-To: <20201103101543.GC3440@quack2.suse.cz>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/11/20 10:15, Jan Kara wrote:
> On Mon 02-11-20 17:58:54, Filipe Manana wrote:
>>
>>
>> On 26/10/20 15:22, Peter Zijlstra wrote:
>>> On Mon, Oct 26, 2020 at 01:55:24PM +0100, Peter Zijlstra wrote:
>>>> On Mon, Oct 26, 2020 at 11:56:03AM +0000, Filipe Manana wrote:
>>>>>> That smells like the same issue reported here:
>>>>>>
>>>>>>   https://lkml.kernel.org/r/20201022111700.GZ2651@hirez.programming.kicks-ass.net
>>>>>>
>>>>>> Make sure you have commit:
>>>>>>
>>>>>>   f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
>>>>>>
>>>>>> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
>>>>>
>>>>> Yes, CONFIG_DEBUG_PREEMPT is enabled.
>>>>
>>>> Bummer :/
>>>>
>>>>> I'll try with that commit and let you know, however it's gonna take a
>>>>> few hours to build a kernel and run all fstests (on that test box it
>>>>> takes over 3 hours) to confirm that fixes the issue.
>>>>
>>>> *ouch*, 3 hours is painful. How long to make it sick with the current
>>>> kernel? quicker I would hope?
>>>>
>>>>> Thanks for the quick reply!
>>>>
>>>> Anyway, I don't think that commit can actually explain the issue :/
>>>>
>>>> The false positive on lockdep_assert_held() happens when the recursion
>>>> count is !0, however we _should_ be having IRQs disabled when
>>>> lockdep_recursion > 0, so that should never be observable.
>>>>
>>>> My hope was that DEBUG_PREEMPT would trigger on one of the
>>>> __this_cpu_{inc,dec}(lockdep_recursion) instance, because that would
>>>> then be a clear violation.
>>>>
>>>> And you're seeing this on x86, right?
>>>>
>>>> Let me puzzle moar..
>>>
>>> So I might have an explanation for the Sparc64 fail, but that can't
>>> explain x86 :/
>>>
>>> I initially thought raw_cpu_read() was OK, since if it is !0 we have
>>> IRQs disabled and can't get migrated, so if we get migrated both CPUs
>>> must have 0 and it doesn't matter which 0 we read.
>>>
>>> And while that is true; it isn't the whole store, on pretty much all
>>> architectures (except x86) this can result in computing the address for
>>> one CPU, getting migrated, the old CPU continuing execution with another
>>> task (possibly setting recursion) and then the new CPU reading the value
>>> of the old CPU, which is no longer 0.
>>>
>>> I already fixed a bunch of that in:
>>>
>>>   baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"")
>>>
>>> but clearly this one got crossed.
>>>
>>> Still, that leaves me puzzled over you seeing this on x86 :/
>>
>> Hi Peter,
>>
>> I still get the same issue with 5.10-rc2.
>> Is there any non-merged patch I should try, or anything I can help with?
>
> BTW, I've just hit the same deadlock issue with ext4 on generic/390 so I
> confirm this isn't btrfs specific issue (as we already knew from the
> analysis but still it's good to have that confirmed).

Indeed, yesterday Darrick was mentioning on IRC that he has run into it
too with fstests on XFS (5.10-rc).

>
> 								Honza
>