From: Daniel Vacek
Date: Thu, 1 Mar 2018 16:09:35 +0100
Subject: Re: [PATCH] mm/page_alloc: fix memmap_init_zone pageblock alignment
To: Michal Hocko
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
    Vlastimil Babka, Mel Gorman, Pavel Tatashin, Paul Burton,
    stable@vger.kernel.org
In-Reply-To: <20180301131033.GH15057@dhcp22.suse.cz>
References: <1519908465-12328-1-git-send-email-neelx@redhat.com>
    <20180301131033.GH15057@dhcp22.suse.cz>

On Thu, Mar 1, 2018 at 2:10 PM, Michal Hocko wrote:
> On Thu 01-03-18 13:47:45, Daniel Vacek wrote:
>> In move_freepages() a BUG_ON() can be triggered on uninitialized page
>> structures due to pageblock alignment. Aligning the skipped pfns in
>> memmap_init_zone() the same way as in move_freepages_block() simply
>> fixes those crashes.
>
> This changelog doesn't describe how the fix works. Why doesn't
> memblock_next_valid_pfn return the first valid pfn as one would expect?

Actually it does. The point is that the returned pfn is not guaranteed to
be pageblock aligned, and we actually want to initialize even those page
structures which fall outside of the range. Hence the alignment here. For
example, on the reproducer machine the memory map from e820/BIOS shows:

$ grep 7b7ff000 /proc/iomem
  7b7ff000-7b7fffff : System RAM

Page structures before commit b92df1de5d28:

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000 7b800000 7ffff000 80000000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff73941e00000 78000000                0        0  1 1fffff00000000
fffff73941ed7fc0 7b5ff000                0        0  1 1fffff00000000
fffff73941ed8000 7b600000                0        0  1 1fffff00000000
fffff73941edff80 7b7fe000                0        0  1 1fffff00000000
fffff73941edffc0 7b7ff000 ffff8e67e04d3ae0     ad84  1 1fffff00020068 uptodate,lru,active,mappedtodisk   <<<< start of the range here
fffff73941ee0000 7b800000                0        0  1 1fffff00000000
fffff73941ffffc0 7ffff000                0        0  1 1fffff00000000

So far so good.
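To spell out the arithmetic the alignment in the patch below relies on,
here is a standalone userspace sketch (not kernel code) using the pfns of
this machine; pageblock_nr_pages is assumed to be 512 (2M pageblocks) here:

/*
 * Standalone sketch of the pfn alignment, not kernel code.
 * pageblock_nr_pages assumed to be 512 (2M pageblocks).
 */
#include <stdio.h>

#define pageblock_nr_pages 512UL

int main(void)
{
	unsigned long next_valid = 0x7b7ff;	/* first valid pfn: start of 7b7ff000-7b7fffff */
	unsigned long aligned = next_valid & ~(pageblock_nr_pages - 1);

	/* memmap_init_zone() sets pfn to these values minus one; the loop's
	 * pfn++ then resumes the walk at next_valid (old) or aligned (fixed) */
	printf("old: walk resumes at pfn 0x%lx (page ffffe31d01edffc0)\n", next_valid);
	printf("fix: walk resumes at pfn 0x%lx (page ffffe31d01ed8000)\n", aligned);
	return 0;
}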
After commit b92df1de5d28 the machine eventually crashes with:

  BUG at mm/page_alloc.c:1913
  > VM_BUG_ON(page_zone(start_page) != page_zone(end_page));

From the registers and the stack I dug out that start_page points to
ffffe31d01ed8000 (note that this is page ffffe31d01edffc0 aligned down to
a pageblock boundary) and I can see this in the memory dump:

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000 7b800000 7ffff000 80000000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffe31d01e00000 78000000                0        0  0 0
ffffe31d01ed7fc0 7b5ff000                0        0  0 0
ffffe31d01ed8000 7b600000                0        0  0 0   <<<< note that nodeid and zonenr are encoded in the top bits of the page flags, which are not initialized here, hence the crash :-(
ffffe31d01edff80 7b7fe000                0        0  0 0
ffffe31d01edffc0 7b7ff000                0        0  1 1fffff00000000
ffffe31d01ee0000 7b800000                0        0  1 1fffff00000000
ffffe31d01ffffc0 7ffff000                0        0  1 1fffff00000000

With my fix applied:

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000 7b800000 7ffff000 80000000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea0001e00000 78000000                0        0  0 0
ffffea0001e00000 7b5ff000                0        0  0 0
ffffea0001ed8000 7b600000                0        0  1 1fffff00000000   <<<< vital data filled in here this time \o/
ffffea0001edff80 7b7fe000                0        0  1 1fffff00000000
ffffea0001edffc0 7b7ff000 ffff88017fb13720        8  2 1fffff00020068 uptodate,lru,active,mappedtodisk
ffffea0001ee0000 7b800000                0        0  1 1fffff00000000
ffffea0001ffffc0 7ffff000                0        0  1 1fffff00000000

We are not interested in the beginning of the whole section. Just the pages
in the first populated block where the range begins are important (actually
just the first one, really, but...).

> It would be also good put the panic info in the changelog.

Of course, I forgot to link the related bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=196443

Though it is not explained very well there either. I hope my notes above
make it clear.

>> Fixes: b92df1de5d28 ("[mm] page_alloc: skip over regions of invalid pfns where possible")
>> Signed-off-by: Daniel Vacek
>> Cc: stable@vger.kernel.org
>> ---
>>  mm/page_alloc.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index cb416723538f..9edee36e6a74 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>  			/*
>>  			 * Skip to the pfn preceding the next valid one (or
>>  			 * end_pfn), such that we hit a valid pfn (or end_pfn)
>> -			 * on our next iteration of the loop.
>> +			 * on our next iteration of the loop. Note that it needs
>> +			 * to be pageblock aligned even when the region itself
>> +			 * is not as move_freepages_block() can shift ahead of
>> +			 * the valid region but still depends on correct page
>> +			 * metadata.
>>  			 */
>> -			pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
>> +			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>> +					~(pageblock_nr_pages-1)) - 1;
>>  #endif
>>  			continue;
>>  		}
>> --
>> 2.16.2
>>
>
> --
> Michal Hocko
> SUSE Labs
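PS: to make the failure mode a bit more concrete: move_freepages_block()
rounds its starting pfn down to a pageblock boundary before handing the
range to move_freepages(), and page_zone() decodes the zone/node from the
top bits of page->flags, which are still all zero for the struct pages that
memmap_init_zone() skipped. Here is a toy userspace model of just that
mechanism, not the kernel code; the flag layout and TOY_ZONE_SHIFT are made
up for illustration and it assumes a 64-bit unsigned long:

/* Toy model of the zone-mismatch check, not kernel code. */
#include <stdio.h>

#define PAGEBLOCK_NR_PAGES	512UL	/* 2M pageblocks, matching the dumps above */
#define TOY_ZONE_SHIFT		60	/* hypothetical position of the zone bits in flags */

struct toy_page { unsigned long flags; };

static unsigned long toy_page_zone(const struct toy_page *p)
{
	return p->flags >> TOY_ZONE_SHIFT;	/* decodes to 0 for a never-initialized page */
}

int main(void)
{
	/* first valid pfn of the 7b7ff000-7b7fffff range vs. the start of its pageblock */
	unsigned long valid_pfn = 0x7b7ff;
	unsigned long block_pfn = valid_pfn & ~(PAGEBLOCK_NR_PAGES - 1);	/* 0x7b600 */

	struct toy_page end_page   = { .flags = 2UL << TOY_ZONE_SHIFT };	/* initialized: carries a zone */
	struct toy_page start_page = { .flags = 0 };	/* pfn 0x7b600: skipped, never initialized */

	printf("move_freepages_block() starts the walk at pfn 0x%lx, not 0x%lx\n",
	       block_pfn, valid_pfn);

	/* the moral equivalent of VM_BUG_ON(page_zone(start_page) != page_zone(end_page)) */
	if (toy_page_zone(&start_page) != toy_page_zone(&end_page))
		printf("zone mismatch -> this is where the VM_BUG_ON() fires\n");
	return 0;
}

Which is why aligning the skipped pfns down to a pageblock boundary, so the
whole block containing the first valid pfn gets initialized, is enough to
keep that check happy.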