From: Dan Williams
Date: Tue, 1 Dec 2020 21:07:22 -0800
Subject: Re: mapcount corruption regression
To: Matthew Wilcox
Cc: "Shutemov, Kirill", Linux Kernel Mailing List, Linux MM, linux-nvdimm, Vlastimil Babka, Yi Zhang
References: <20201201022412.GG4327@casper.infradead.org> <20201201204900.GC11935@casper.infradead.org> <20201202034308.GD11935@casper.infradead.org>
In-Reply-To: <20201202034308.GD11935@casper.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 1, 2020 at 7:43 PM Matthew Wilcox wrote:
>
> On Tue, Dec 01, 2020 at 06:28:45PM -0800, Dan Williams wrote:
> > On Tue, Dec 1, 2020 at 12:49 PM Matthew Wilcox wrote:
> > >
> > > On Tue, Dec 01, 2020 at 12:42:39PM -0800, Dan Williams wrote:
> > > > On Mon, Nov 30, 2020 at 6:24 PM Matthew Wilcox wrote:
> > > > >
> > > > > On Mon, Nov 30, 2020 at 05:20:25PM -0800, Dan Williams wrote:
> > > > > > Kirill, Willy, compound page experts,
> > > > > >
> > > > > > I am seeking some debug ideas about the following splat:
> > > > > >
> > > > > > BUG: Bad page state in process lt-pmem-ns pfn:121a12
> > > > > > page:0000000051ef73f7 refcount:0 mapcount:-1024
> > > > > > mapping:0000000000000000 index:0x0 pfn:0x121a12
> > > > >
> > > > > Mapcount of -1024 is the signature of:
> > > > >
> > > > > #define PG_guard 0x00000400
> > > >
> > > > Oh, thanks for that. I overlooked how mapcount is overloaded. Although
> > > > in v5.10-rc4 that value is:
> > > >
> > > > #define PG_table 0x00000400
> > >
> > > Ah, I was looking at -next, where Roman renumbered it.
> > >
> > > I know UML had a problem where it was not clearing PG_table, but you
> > > seem to be running on bare metal. SuperH did too, but again, you're
> > > not using SuperH.
> > >
> > > > >
> > > > > (the bits are inverted, so this turns into 0xfffffbff which is reported
> > > > > as -1024)
> > > > >
> > > > > I assume you have debug_pagealloc enabled?
> > > >
> > > > Added it, but no extra spew. I'll dig a bit more on how PG_table is
> > > > not being cleared in this case.
> > >
> > > I only asked about debug_pagealloc because that sets PG_guard. Since
> > > the problem is actually PG_table, it's not relevant.
> >
> > As a shot in the dark I reverted:
> >
> > b2b29d6d0119 mm: account PMD tables like PTE tables
> >
> > ...and the test passed.
>
> That's not really surprising ... you're still freeing PMD tables without
> calling the destructor, which means that you're leaking ptlocks on
> configs that can't embed the ptlock in the struct page.

Ok, so potentially this new tracking is highlighting a long standing
bug that was previously silent. That would explain the ambiguous
bisect results.

> I suppose it shows that you're leaking a PMD table rather than a PTE
> table, so that might help track it down. Checking for PG_table in
> free_unref_page() and calling show_stack() will probably help more.

Will do.
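
For anyone following the arithmetic in the quoted exchange, here is a rough userspace sketch of how a page-type bit of 0x00000400 ends up printed as mapcount:-1024. It is not kernel code. The function names are made up for illustration, and it assumes two things that the thread implies but does not spell out: the shared word starts out as all ones, and the report prints the stored counter plus one.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in the thread for v5.10-rc4 (PG_table). */
#define PG_TABLE_BIT 0x00000400u

/* Assumption: a page type is recorded by clearing its bit in a word
 * that starts as all ones ("the bits are inverted"). */
uint32_t raw_word_for_type(uint32_t type_bit)
{
    return UINT32_MAX & ~type_bit;
}

/* Read the 32-bit word as a signed counter without relying on
 * implementation-defined narrowing conversions. */
int64_t signed_value(uint32_t raw)
{
    return (raw <= (uint32_t)INT32_MAX) ? (int64_t)raw
                                        : (int64_t)raw - 4294967296LL;
}

/* Assumption: the report prints the stored counter plus one. */
int64_t reported_mapcount(uint32_t raw)
{
    return signed_value(raw) + 1;
}

/* Inverse direction: recover the cleared type bits from a reported value. */
uint32_t type_bits_from_reported(int64_t reported)
{
    uint32_t raw = (uint32_t)(reported - 1); /* wraps modulo 2^32 */
    return ~raw;
}

/* Hypothetical self-check using only the numbers quoted in the thread. */
void check_thread_example(void)
{
    uint32_t raw = raw_word_for_type(PG_TABLE_BIT);

    assert(raw == 0xfffffbffu);
    assert(reported_mapcount(raw) == -1024);
    assert(type_bits_from_reported(-1024) == PG_TABLE_BIT);
}
```

Calling check_thread_example() from a small main() exercises both directions. The inverse helper is the handy one when reading a splat: feed it the printed mapcount and compare the result against the PG_* values of the kernel version actually in use, since the same bit meant PG_guard in -next and PG_table in v5.10-rc4.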