To: Andrea Arcangeli, David Hildenbrand
Cc: Mel Gorman, Andrew Morton, linux-mm@kvack.org, Qian Cai, Michal Hocko, linux-kernel@vger.kernel.org, Mike Rapoport, Baoquan He
References: <8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw> <20201121194506.13464-1-aarcange@redhat.com> <20201121194506.13464-2-aarcange@redhat.com>
From: Vlastimil Babka
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
Message-ID: <1c4c405b-52e0-cf6b-1f82-91a0a1e3dd53@suse.cz>
Date: Wed, 25 Nov 2020 13:08:54 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/25/20 6:34 AM, Andrea Arcangeli wrote:
> Hello,
>
> On Mon, Nov 23, 2020 at 02:01:16PM +0100, Vlastimil Babka wrote:
>> On 11/21/20 8:45 PM,
>> Andrea Arcangeli wrote:
>> > A corollary issue was fixed in
>> > 39639000-39814fff : Unknown E820 type
>> >
>> > pfn 0x7a200 -> 0x7a200000 min_pfn hit non-RAM:
>> >
>> > 7a17b000-7a216fff : Unknown E820 type
>>
>> It would be nice to also provide a /proc/zoneinfo and how exactly the
>> "zone_spans_pfn" was violated. I assume we end up below zone's
>> start_pfn, but is it true?
>
> Agreed, I was about to grab that info along with all page struct
> around the pfn 0x7a200 and phys address 0x7a216fff.
>
> # grep -A1 E820 /proc/iomem
> 7a17b000-7a216fff : Unknown E820 type
> 7a217000-7bffffff : System RAM
>
> DMA zone_start_pfn 1 zone_end_pfn() 4096 contiguous 1
> DMA32 zone_start_pfn 4096 zone_end_pfn() 1048576 contiguous 0
> Normal zone_start_pfn 1048576 zone_end_pfn() 4715392 contiguous 1
> Movable zone_start_pfn 0 zone_end_pfn() 0 contiguous 0

So the above means that around the "Unknown E820 type" we have:

pfn 499712 - start of pageblock in ZONE_DMA32
pfn 500091 - start of the "Unknown E820 type" range
pfn 500224 - start of another pageblock
pfn 500246 - end of "Unknown E820 type"

So this is indeed not a zone boundary issue, but basically a hole not
aligned to pageblock boundary and really unexpected.
We have CONFIG_HOLES_IN_ZONE (that x86 doesn't set) for architectures
that do this, and even that config only affects pfn_valid_within(). But
here pfn_valid() is true, but the zone/node linkage is unexpected.

> However the real bug seems that reserved pages have a zero zone_id in
> the page->flags when it should have the real zone id/nid. The patch I
> sent earlier to validate highest would only be needed to deal with
> pfn_valid.
>
> Something must have changed more recently than v5.1 that caused the
> zoneid of reserved pages to be wrong, a possible candidate for the
> real would be this change below:
>
> + __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
>
> Even if it may not be it, at the light of how the reserved page
> zoneid/nid initialized went wrong, the above line like it's too flakey
> to stay.
>
> It'd be preferable if the pfn_valid fails and the
> pfn_to_section_nr(pfn) returns an invalid section for the intermediate
> step. Even better memset 0xff over the whole page struct until the
> second stage comes around.
>
> Whenever pfn_valid is true, it's better that the zoneid/nid is correct
> all times, otherwise if the second stage fails we end up in a bug with
> weird side effects.

Yeah I guess it would be simpler if zoneid/nid was correct for
pfn_valid() pfns within a zone's range, even if they are reserved due to
not being really usable memory.

I don't think we want to introduce CONFIG_HOLES_IN_ZONE to x86. If the
chosen solution is to make this a real hole, the hole should be
extended to MAX_ORDER_NR_PAGES aligned boundaries.

In any case, compaction code can't fix this with better range checks.

> Maybe it's not the above that left a zero zoneid though, I haven't
> tried to bisect it yet to look how the page->flags looked like on a
> older kernel that didn't seem to reproduce this crash, I'm just
> guessing.
>
> Thanks,
> Andrea
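
For anyone wanting to re-derive the pfn numbers above from the
/proc/iomem and zone output, here is a rough sketch of the arithmetic.
It is not kernel code, and it rests on assumptions that are not stated
in this thread: 4 KiB pages (PAGE_SHIFT = 12), pageblock_order = 9 and
MAX_ORDER = 11, which are the usual x86_64 values for kernels of this
era. The helper names are made up for the illustration.

```python
# Assumed constants (typical x86_64 configuration, not taken from the mail).
PAGE_SHIFT = 12                      # 4 KiB pages
PAGEBLOCK_NR_PAGES = 1 << 9          # pageblock_order = 9 -> 512 pages
MAX_ORDER_NR_PAGES = 1 << (11 - 1)   # MAX_ORDER = 11 -> 1024 pages


def phys_to_pfn(addr):
    """Physical address -> page frame number."""
    return addr >> PAGE_SHIFT


def align_down(pfn, nr_pages):
    """Round pfn down to a multiple of nr_pages (a power of two)."""
    return pfn & ~(nr_pages - 1)


def align_up(pfn, nr_pages):
    """Round pfn up to a multiple of nr_pages (a power of two)."""
    return (pfn + nr_pages - 1) & ~(nr_pages - 1)


def zone_spans_pfn(zone, pfn):
    """zone is a (start_pfn, end_pfn) pair with end_pfn exclusive."""
    start_pfn, end_pfn = zone
    return start_pfn <= pfn < end_pfn


# "7a17b000-7a216fff : Unknown E820 type" from /proc/iomem.
hole_first_pfn = phys_to_pfn(0x7a17b000)
hole_last_pfn = phys_to_pfn(0x7a216fff)

# Pageblocks containing the first and the last pfn of the range.
first_pageblock = align_down(hole_first_pfn, PAGEBLOCK_NR_PAGES)
last_pageblock = align_down(hole_last_pfn, PAGEBLOCK_NR_PAGES)

# DMA32 as reported: zone_start_pfn 4096, zone_end_pfn() 1048576.
dma32 = (4096, 1048576)
hole_inside_dma32 = (zone_spans_pfn(dma32, hole_first_pfn)
                     and zone_spans_pfn(dma32, hole_last_pfn))

# Neither end of the range sits on a pageblock boundary.
hole_is_pageblock_aligned = (
    hole_first_pfn % PAGEBLOCK_NR_PAGES == 0
    and (hole_last_pfn + 1) % PAGEBLOCK_NR_PAGES == 0
)

# What extending the hole to MAX_ORDER_NR_PAGES boundaries would cover,
# as a half-open pfn range [start, end).
extended_hole = (align_down(hole_first_pfn, MAX_ORDER_NR_PAGES),
                 align_up(hole_last_pfn + 1, MAX_ORDER_NR_PAGES))

print(first_pageblock, hole_first_pfn, last_pageblock, hole_last_pfn)
print(hole_inside_dma32, hole_is_pageblock_aligned, extended_hole)
```

Under those assumptions the first print gives the four pfns listed
above (499712, 500091, 500224, 500246), and the MAX_ORDER_NR_PAGES
aligned hole comes out as pfns [499712, 500736). That range would also
cover the start of the following "System RAM" region at pfn 500247.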