To: Andrea Arcangeli, Mel Gorman, Andrew Morton, linux-mm@kvack.org, Qian Cai
Cc: Michal Hocko, David Hildenbrand, linux-kernel@vger.kernel.org, Mike Rapoport, Baoquan He
References: <8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw> <20201121194506.13464-1-aarcange@redhat.com> <20201121194506.13464-2-aarcange@redhat.com>
From: Vlastimil Babka
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
Date: Mon, 23 Nov 2020 14:01:16 +0100
In-Reply-To: <20201121194506.13464-2-aarcange@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/21/20 8:45 PM, Andrea Arcangeli wrote:
> A corollary issue was fixed in
> e577c8b64d58fe307ea4d5149d31615df2d90861.
> A second issue remained in
> v5.7:
>
> https://lkml.kernel.org/r/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
>
> ==
> page:ffffea0000aa0000 refcount:1 mapcount:0 mapping:000000002243743b index:0x0
> flags: 0x1fffe000001000(reserved)
> ==
>
> 73a6e474cb376921a311786652782155eac2fdf0 was applied to supposedly the
> second issue, but I still reproduced it twice with v5.9 on two
> different systems:
>
> ==
> page:0000000062b3e92f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x39800
> flags: 0x1000(reserved)
> ==
> page:000000002a7114f8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7a200
> flags: 0x1fff000000001000(reserved)
> ==
>
> I actually never reproduced it until v5.9, but it's still the same bug
> as it was reported first for v5.7.
>
> See the page is "reserved" in all 3 cases. In the last two crashes
> with the pfn:
>
> pfn 0x39800 -> 0x39800000 min_pfn hit non-RAM:
>
> 39639000-39814fff : Unknown E820 type
>
> pfn 0x7a200 -> 0x7a200000 min_pfn hit non-RAM:
>
> 7a17b000-7a216fff : Unknown E820 type

It would be nice to also provide /proc/zoneinfo and show how exactly the
"zone_spans_pfn" check was violated. I assume we end up below the zone's
start_pfn, but is that true?

> This actually seems a false positive bugcheck, the page structures are
> valid and the zones are correct, just it's non-RAM but setting
> pageblockskip should do no harm. However it's possible to solve the
> crash without lifting the bugcheck, by enforcing the invariant that
> the free_pfn cursor doesn't point to reserved pages (which would be
> otherwise implicitly achieved through the PageBuddy check, except in
> the new fast_isolate_around() path).
>
> Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target")
> Signed-off-by: Andrea Arcangeli
> ---
>  mm/compaction.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 13cb7a961b31..d17e69549d34 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1433,7 +1433,10 @@ fast_isolate_freepages(struct compact_control *cc)
>  						page = pageblock_pfn_to_page(min_pfn,
>  							pageblock_end_pfn(min_pfn),
>  							cc->zone);
> -						cc->free_pfn = min_pfn;
> +						if (likely(!PageReserved(page)))

The PageReserved check seems a rather awkward solution to me. Wouldn't it be
more obvious to make sure we don't end up below the zone's start_pfn (if my
assumption is correct) in the first place?

When I check the code:

	unsigned long distance;
	distance = (cc->free_pfn - cc->migrate_pfn);
	low_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 2));
	min_pfn = pageblock_start_pfn(cc->free_pfn - (distance >> 1));

I think what can happen is that cc->free_pfn <= cc->migrate_pfn after the very
last isolate_migratepages(). compact_finished() then detects that in
compact_zone(), but only after migrate_pages(), and thus
fast_isolate_freepages(), has been called.

That would mean distance can be negative, or rather a large unsigned number,
and low_pfn and min_pfn end up away from the zone?

Or maybe the above doesn't happen, but cc->free_pfn gets so close to the start
of the zone that the calculations above make min_pfn go below start_pfn?

In any case I would rather make sure we stay within the expected zone
boundaries than play tricks with PageReserved. Mel?

> +							cc->free_pfn = min_pfn;
> +						else
> +							page = NULL;
>  					}
>  				}
>  			}
>
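
To make the two suspected scenarios above concrete, here is a minimal user-space sketch of the min_pfn arithmetic. It is not kernel code: PAGEBLOCK_NR_PAGES (512 pages, a common x86-64 value), the helper names pageblock_start(), calc_min_pfn() and spans_pfn(), and all sample pfn values are assumptions for illustration only. spans_pfn() is a stand-in for the range test that zone_spans_pfn() performs. The second scenario additionally assumes a zone whose start is not pageblock aligned, which is only one possible way min_pfn could drop below start_pfn.

```c
#include <assert.h>

/* Assumed pageblock size in pages (hypothetical; the kernel derives it
 * from pageblock_order). */
#define PAGEBLOCK_NR_PAGES 512UL

/* Round a pfn down to the start of its pageblock, in the spirit of
 * pageblock_start_pfn(). */
static unsigned long pageblock_start(unsigned long pfn)
{
	/* The mask trick below only works for a power-of-two block size. */
	assert((PAGEBLOCK_NR_PAGES & (PAGEBLOCK_NR_PAGES - 1)) == 0);
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* Same arithmetic as the min_pfn computation quoted above, with no
 * bounds checking at all. */
static unsigned long calc_min_pfn(unsigned long free_pfn,
				  unsigned long migrate_pfn)
{
	/* Wraps to a huge value when free_pfn < migrate_pfn. */
	unsigned long distance = free_pfn - migrate_pfn;

	return pageblock_start(free_pfn - (distance >> 1));
}

/* Stand-in for the zone range test: is pfn inside
 * [zone_start, zone_start + nr_pages)? */
static int spans_pfn(unsigned long zone_start, unsigned long nr_pages,
		     unsigned long pfn)
{
	return zone_start <= pfn && pfn < zone_start + nr_pages;
}
```

With these assumed values, calc_min_pfn(0x8000, 0x1000) gives 0x4800, which sits between the two scanners as intended. Once the scanners have crossed, e.g. calc_min_pfn(0x1000, 0x1200), distance wraps and the result lands far above both scanners and outside a zone covering [0x1000, 0x11000). For a zone starting at the unaligned pfn 0x1100, calc_min_pfn(0x1180, 0x1100) rounds down to 0x1000, which is below the zone start even though the scanners never crossed.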