From: Daniel Vacek
Date: Tue, 13 Mar 2018 22:47:53 +0100
Subject: Re: [PATCH 4.15 049/146] mm/page_alloc: fix memmap_init_zone pageblock alignment
To: Greg Kroah-Hartman, open list, stable, Daniel Vacek, Mel Gorman, Michal Hocko,
    Paul Burton, Pavel Tatashin, Vlastimil Babka, Andrew Morton, Linus Torvalds
In-Reply-To: <20180313193457.gjqbfwd6eorxeedc@xps>
References: <20180313152320.439085687@linuxfoundation.org>
            <20180313152324.434860515@linuxfoundation.org>
            <20180313193457.gjqbfwd6eorxeedc@xps>

On Tue, Mar 13, 2018 at 8:34 PM, Dan Rue wrote:
> On Tue, Mar 13, 2018 at 04:23:36PM +0100, Greg Kroah-Hartman wrote:
>> 4.15-stable review patch. If anyone has any objections, please let me know.
>
> On 4.14 and 4.15, this patch breaks booting on dragonboard 410c and
> hikey 620 (both arm64). The fix has been proposed and tested but is not
> yet in mainline per https://lkml.org/lkml/2018/3/12/710

I'll send a formal fix today.

> Dan
>
>> ------------------
>>
>> From: Daniel Vacek
>>
>> commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae upstream.
>>
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") introduced a bug where move_freepages() triggers a
>> VM_BUG_ON() on an uninitialized page structure due to pageblock alignment.
>> To fix this, simply align the skipped pfns in memmap_init_zone() the
>> same way as in move_freepages_block().
>>
>> Seen in one of the RHEL reports:
>>
>> crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
>> kernel BUG at mm/page_alloc.c:1389!
>> invalid opcode: 0000 [#1] SMP
>> --
>> RIP: 0010:[] [] move_freepages+0x15e/0x160
>> RSP: 0018:ffff88054d727688  EFLAGS: 00010087
>> --
>> Call Trace:
>>  [] move_freepages_block+0x73/0x80
>>  [] __rmqueue+0x263/0x460
>>  [] get_page_from_freelist+0x7e1/0x9e0
>>  [] __alloc_pages_nodemask+0x176/0x420
>> --
>> RIP  [] move_freepages+0x15e/0x160
>>  RSP
>>
>> crash> page_init_bug -v | grep RAM
>>      1000 -     9bfff  System RAM (620.00 KiB)
>>    100000 -  430bffff  System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
>>  4b0c8000 -  4bf9cfff  System RAM ( 14.83 MiB = 15188.00 KiB)
>>  4bfac000 -  646b1fff  System RAM (391.02 MiB = 400408.00 KiB)
>>  7b788000 -  7b7fffff  System RAM (480.00 KiB)
>> 100000000 - 67fffffff  System RAM ( 22.00 GiB)
>>
>> crash> page_init_bug | head -6
>>  7b788000 -  7b7fffff  System RAM (480.00 KiB)
>>  1fffff00000000  0  1  DMA32  4096  1048575
>>  505736  505344  505855
>>  0  0  0  DMA  1  4095
>>  1fffff00000400  0  1  DMA32  4096  1048575
>>  BUG, zones differ!
>>
>> Note that this range follows two unpopulated sections
>> 68000000-77ffffff in this zone. 7b788000-7b7fffff is the first range
>> after that gap. This makes memmap_init_zone() skip all the pfns up to
>> the beginning of this range. But this range is not pageblock (2M)
>> aligned. In fact no range has to be.
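To make the alignment point concrete, here is a minimal userspace sketch
(illustrative only, not from the patch; it assumes 4 KiB pages and 2 MiB
pageblocks, i.e. pageblock_order = 9 / 512 pages, as on the machine in the
report):

  #include <stdio.h>

  #define PAGE_SHIFT         12
  #define PAGEBLOCK_NR_PAGES 512UL   /* 2 MiB / 4 KiB */

  int main(void)
  {
          unsigned long phys = 0x7b788000UL;              /* start of the range after the gap */
          unsigned long pfn = phys >> PAGE_SHIFT;         /* 0x7b788 */
          unsigned long block_start = pfn & ~(PAGEBLOCK_NR_PAGES - 1);

          /* 0x7b788 is 392 pages past 0x7b600, so the range starts mid-pageblock. */
          printf("pfn %#lx sits %lu pages into the pageblock starting at %#lx\n",
                 pfn, pfn - block_start, block_start);
          return 0;
  }

The pageblock containing pfn 0x7b788 thus starts at pfn 0x7b600 (physical
7b600000), which is exactly the struct page (ffffea0001ed8000) shown as
uninitialized in the kmem output below.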
>>
>> crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
>>       PAGE         PHYSICAL   MAPPING   INDEX  CNT  FLAGS
>> ffffea0001e00000   78000000          0      0    0  0
>> ffffea0001ed7fc0   7b5ff000          0      0    0  0
>> ffffea0001ed8000   7b600000          0      0    0  0   <<<<
>> ffffea0001ede1c0   7b787000          0      0    0  0
>> ffffea0001ede200   7b788000          0      0    1  1fffff00000000
>>
>> The top part of the page flags should contain the nodeid and zonenr, which
>> is not the case for page ffffea0001ed8000 here (<<<<).
>>
>> crash> log | grep -o fffea0001ed[^\ ]* | sort -u
>> fffea0001ed8000
>> fffea0001eded20
>> fffea0001edffc0
>>
>> crash> bt -r | grep -o fffea0001ed[^\ ]* | sort -u
>> fffea0001ed8000
>> fffea0001eded00
>> fffea0001eded20
>> fffea0001edffc0
>>
>> Initialization of the whole beginning of the section is skipped up to
>> the start of the range due to commit b92df1de5d28. Now any code calling
>> move_freepages_block() (like reusing the page from a freelist as in this
>> example) with a page from the beginning of the range will get the page
>> rounded down to start_page ffffea0001ed8000 and passed to
>> move_freepages(), which crashes on the assertion because it gets the
>> wrong zonenr:
>>
>>  > VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
>>
>> Note, page_zone() derives the zone from the page flags here.
>>
>> From a similar machine before commit b92df1de5d28:
>>
>> crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
>>       PAGE         PHYSICAL      MAPPING        INDEX  CNT  FLAGS
>> fffff73941e00000   78000000                 0       0    1  1fffff00000000
>> fffff73941ed7fc0   7b5ff000                 0       0    1  1fffff00000000
>> fffff73941ed8000   7b600000                 0       0    1  1fffff00000000
>> fffff73941edff80   7b7fe000                 0       0    1  1fffff00000000
>> fffff73941edffc0   7b7ff000  ffff8e67e04d3ae0    ad84    1  1fffff00020068 uptodate,lru,active,mappedtodisk
>>
>> All the pages since the beginning of the section are initialized, so
>> move_freepages() is not going to blow up.
>>
>> The same machine with this fix applied:
>>
>> crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
>>       PAGE         PHYSICAL      MAPPING        INDEX  CNT  FLAGS
>> ffffea0001e00000   78000000                 0       0    0  0
>> ffffea0001e00000   7b5ff000                 0       0    0  0
>> ffffea0001ed8000   7b600000                 0       0    1  1fffff00000000
>> ffffea0001edff80   7b7fe000                 0       0    1  1fffff00000000
>> ffffea0001edffc0   7b7ff000  ffff88017fb13720       8    2  1fffff00020068 uptodate,lru,active,mappedtodisk
>>
>> At least the bare minimum of pages is initialized, which also prevents
>> the crash.
>>
>> Customers started to report this as soon as 7.4 (where b92df1de5d28 was
>> merged in RHEL) was released. I remember reports from around
>> September/October. It is not easily reproduced and happens only on a
>> handful of machines, which I guess is why it went unnoticed for so long.
>> But that does not make it any less serious.
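The rounding described in the quoted analysis is the plain pageblock mask that
move_freepages_block() applies before calling move_freepages(). A simplified
sketch of that logic (paraphrased, not a verbatim copy of mm/page_alloc.c):

  static int move_freepages_block_sketch(struct zone *zone, struct page *page,
                                         int migratetype)
  {
          unsigned long start_pfn = page_to_pfn(page);
          struct page *start_page, *end_page;

          /* Round down to the first page of the surrounding pageblock. */
          start_pfn &= ~(pageblock_nr_pages - 1);
          start_page = pfn_to_page(start_pfn);    /* may still be all zeroes! */
          end_page = start_page + pageblock_nr_pages - 1;

          /*
           * move_freepages() then asserts
           *   VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
           * and page_zone() reads the zone from the page flags, so an
           * uninitialized start_page makes the assertion fire.
           */
          return move_freepages(zone, start_page, end_page, migratetype);
  }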
>>
>> Though there actually is a report here:
>> https://bugzilla.kernel.org/show_bug.cgi?id=196443
>>
>> And there are reports for Fedora from July:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1473242
>> and CentOS:
>> https://bugs.centos.org/view.php?id=13964
>> and we internally track several dozen reports for the RHEL bug
>> https://bugzilla.redhat.com/show_bug.cgi?id=1525121
>>
>> Link: http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.1520011945.git.neelx@redhat.com
>> Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
>> Signed-off-by: Daniel Vacek
>> Cc: Mel Gorman
>> Cc: Michal Hocko
>> Cc: Paul Burton
>> Cc: Pavel Tatashin
>> Cc: Vlastimil Babka
>> Cc:
>> Signed-off-by: Andrew Morton
>> Signed-off-by: Linus Torvalds
>> Signed-off-by: Greg Kroah-Hartman
>>
>> ---
>>  mm/page_alloc.c |    9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5353,9 +5353,14 @@ void __meminit memmap_init_zone(unsigned
>>  			/*
>>  			 * Skip to the pfn preceding the next valid one (or
>>  			 * end_pfn), such that we hit a valid pfn (or end_pfn)
>> -			 * on our next iteration of the loop.
>> +			 * on our next iteration of the loop. Note that it needs
>> +			 * to be pageblock aligned even when the region itself
>> +			 * is not. move_freepages_block() can shift ahead of
>> +			 * the valid region but still depends on correct page
>> +			 * metadata.
>>  			 */
>> -			pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
>> +			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>> +					~(pageblock_nr_pages-1)) - 1;
>>  #endif
>>  			continue;
>>  		}
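For the pfns from this report, the effect of the new masking can be checked by
hand (illustrative arithmetic only, again assuming 512-page pageblocks):

  /* memblock_next_valid_pfn() returns 0x7b788, the first valid pfn after the gap.
   *
   * before the fix:  pfn = 0x7b788 - 1 = 0x7b787
   *                  -> the loop resumes at 0x7b788, leaving the struct pages
   *                     for 0x7b600..0x7b787 untouched.
   *
   * with the fix:    pfn = (0x7b788 & ~(512 - 1)) - 1 = 0x7b600 - 1 = 0x7b5ff
   *                  -> the loop resumes at 0x7b600 and initializes the head of
   *                     the pageblock, matching the "fix applied" kmem output
   *                     above where 7b600000 now carries FLAGS 1fffff00000000.
   */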