Date: Fri, 22 May 2020 18:34:21 -0700
From: Andrew Morton
To: Konstantin Khlebnikov
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Vlastimil Babka, David Rientjes, Hugh Dickins, Mel Gorman
Subject: Re: [PATCH] mm/compaction: avoid VM_BUG_ON(PageSlab()) in page_mapcount()
Message-Id: <20200522183421.7c52ba9650a2ad11cec2be8d@linux-foundation.org>
In-Reply-To: <158937872515.474360.5066096871639561424.stgit@buzz>
References: <158937872515.474360.5066096871639561424.stgit@buzz>
On Wed, 13 May 2020 17:05:25 +0300 Konstantin Khlebnikov wrote:

> Function isolate_migratepages_block() runs some checks out of lru_lock
> when choosing pages for migration. After checking PageLRU() it checks
> extra page references by comparing page_count() and page_mapcount().
> Between these two checks the page could be removed from the lru, freed
> and taken by slab.
>
> As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount().
> The race window is tiny. For a certain workload this happens around once
> a year.
>
> page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 index:0x0 compound_mapcount: 0
> flags: 0x500000000008100(slab|head)
> raw: 0500000000008100 dead000000000100 dead000000000200 ffff88ff7712c180
> raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
> page dumped because: VM_BUG_ON_PAGE(PageSlab(page))
> ------------[ cut here ]------------
> kernel BUG at ./include/linux/mm.h:628!
> invalid opcode: 0000 [#1] SMP NOPTI
> CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G W 4.19.109-27 #1
> Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019
> RIP: 0010:isolate_migratepages_block+0x986/0x9b0
>
> To fix, just open-code page_mapcount() in the racy check for the 0-order
> case and recheck carefully under lru_lock once the page cannot escape
> from the lru.
>
> Also add checking of extra references for file pages and swap cache.

I dunno, this code looks quite nasty. I'm more thinking we should revert
and rethink David's 119d6d59dcc0980dcd58 ("mm, compaction: avoid
isolating pinned pages").
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -935,12 +935,16 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  	}
>  
>  	/*
> -	 * Migration will fail if an anonymous page is pinned in memory,
> +	 * Migration will fail if an page is pinned in memory,
>  	 * so avoid taking lru_lock and isolating it unnecessarily in an
> -	 * admittedly racy check.
> +	 * admittedly racy check simplest case for 0-order pages.
> +	 *
> +	 * Open code page_mapcount() to avoid VM_BUG_ON(PageSlab(page)).
> +	 * Page could have extra reference from mapping or swap cache.
>  	 */
> -	if (!page_mapping(page) &&
> -	    page_count(page) > page_mapcount(page))
> +	if (!PageCompound(page) &&
> +	    page_count(page) > atomic_read(&page->_mapcount) + 1 +
> +				(!PageAnon(page) || PageSwapCache(page)))
>  		goto isolate_fail;

OK, we happened to notice this because we happened to trigger the
!PageSlab assertion. But if this page has been freed and reused for
slab, it could have been reused for *anything*? Perhaps it was reused
as a migratable page which we'll go ahead and migrate even though we
no longer should.

There are various whacky parts of the kernel which (ab)use surprising
struct page fields in surprising ways - how do we know it isn't one of
those which now happens to look like a migratable page?

I also worry about the next test:

	/*
	 * Only allow to migrate anonymous pages in GFP_NOFS context
	 * because those do not depend on fs locks.
	 */
	if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
		goto isolate_fail;

This page isn't PageLocked(), is it? It could be a recycled page which
will be getting its ->mapping set one nanosecond hence.
> @@ -975,6 +979,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			low_pfn += compound_nr(page) - 1;
>  			goto isolate_fail;
>  		}
> +
> +		/* Recheck page extra references under lock */
> +		if (page_count(page) > page_mapcount(page) +
> +				(!PageAnon(page) || PageSwapCache(page)))
> +			goto isolate_fail;
>  	}
>  
>  	lruvec = mem_cgroup_page_lruvec(page, pgdat);
>