Date: Tue, 13 Mar 2018 14:34:57 -0500
From: Dan Rue
To: Greg Kroah-Hartman
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Daniel Vacek,
    Mel Gorman, Michal Hocko, Paul Burton, Pavel Tatashin, Vlastimil Babka,
    Andrew Morton, Linus Torvalds
Subject: Re: [PATCH 4.15 049/146] mm/page_alloc: fix memmap_init_zone pageblock alignment
Message-ID: <20180313193457.gjqbfwd6eorxeedc@xps>
References: <20180313152320.439085687@linuxfoundation.org> <20180313152324.434860515@linuxfoundation.org>
In-Reply-To: <20180313152324.434860515@linuxfoundation.org>

On Tue, Mar 13, 2018 at 04:23:36PM +0100, Greg Kroah-Hartman wrote:
> 4.15-stable review patch.  If anyone has any objections, please let me know.

On 4.14 and 4.15, this patch breaks booting on dragonboard 410c and hikey
620 (both arm64).
The fix has been proposed and tested but is not yet in mainline per
https://lkml.org/lkml/2018/3/12/710

Dan

>
> ------------------
>
> From: Daniel Vacek
>
> commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae upstream.
>
> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> where possible") introduced a bug where move_freepages() triggers a
> VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
> To fix this, simply align the skipped pfns in memmap_init_zone() the
> same way as in move_freepages_block().
>
> Seen in one of the RHEL reports:
>
>     crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
>     kernel BUG at mm/page_alloc.c:1389!
>     invalid opcode: 0000 [#1] SMP
>     --
>     RIP: 0010:[] [] move_freepages+0x15e/0x160
>     RSP: 0018:ffff88054d727688 EFLAGS: 00010087
>     --
>     Call Trace:
>     [] move_freepages_block+0x73/0x80
>     [] __rmqueue+0x263/0x460
>     [] get_page_from_freelist+0x7e1/0x9e0
>     [] __alloc_pages_nodemask+0x176/0x420
>     --
>     RIP [] move_freepages+0x15e/0x160
>     RSP
>
>     crash> page_init_bug -v | grep RAM
>          1000 -     9bfff  System RAM (620.00 KiB)
>        100000 -  430bffff  System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
>      4b0c8000 -  4bf9cfff  System RAM ( 14.83 MiB = 15188.00 KiB)
>      4bfac000 -  646b1fff  System RAM (391.02 MiB = 400408.00 KiB)
>      7b788000 -  7b7fffff  System RAM (480.00 KiB)
>     100000000 - 67fffffff  System RAM ( 22.00 GiB)
>
>     crash> page_init_bug | head -6
>     7b788000 - 7b7fffff System RAM (480.00 KiB)
>     1fffff00000000 0 1 DMA32 4096 1048575
>     505736 505344 505855
>     0 0 0 DMA 1 4095
>     1fffff00000400 0 1 DMA32 4096 1048575
>     BUG, zones differ!
>
> Note that this range follows two not populated sections
> 68000000-77ffffff in this zone.  7b788000-7b7fffff is the first one
> after a gap.  This makes memmap_init_zone() skip all the pfns up to the
> beginning of this range.  But this range is not pageblock (2M) aligned.
> In fact no range has to be.
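
As a quick sanity check on the addresses quoted above, here is a minimal
userspace sketch (plain C, not kernel code) that tests whether a range
start is pageblock aligned and finds the start of the enclosing
pageblock.  The 2 MiB pageblock size is taken from the commit message;
PAGEBLOCK_BYTES, pageblock_start() and pageblock_aligned() are names
made up for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 2 MiB pageblocks, as stated in the quoted commit message. */
#define PAGEBLOCK_BYTES (2ULL << 20)

/* Rounding down with a mask is only valid for power-of-two sizes. */
static_assert((PAGEBLOCK_BYTES & (PAGEBLOCK_BYTES - 1)) == 0,
	      "pageblock size must be a power of two");

/* Hypothetical helper: physical address of the enclosing pageblock start. */
static inline uint64_t pageblock_start(uint64_t phys)
{
	return phys & ~(PAGEBLOCK_BYTES - 1);
}

/* Hypothetical helper: true if phys sits exactly on a pageblock boundary. */
static inline bool pageblock_aligned(uint64_t phys)
{
	return pageblock_start(phys) == phys;
}
```

For the range start 7b788000 this gives a pageblock start of 7b600000,
which is the physical address of the page marked <<<< in the quoted
kmem output.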
>
>     crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
>     PAGE              PHYSICAL  MAPPING  INDEX  CNT  FLAGS
>     ffffea0001e00000  78000000  0        0      0    0
>     ffffea0001ed7fc0  7b5ff000  0        0      0    0
>     ffffea0001ed8000  7b600000  0        0      0    0 <<<<
>     ffffea0001ede1c0  7b787000  0        0      0    0
>     ffffea0001ede200  7b788000  0        0      1    1fffff00000000
>
> Top part of page flags should contain nodeid and zonenr, which is not
> the case for page ffffea0001ed8000 here (<<<<).
>
>     crash> log | grep -o fffea0001ed[^\ ]* | sort -u
>     fffea0001ed8000
>     fffea0001eded20
>     fffea0001edffc0
>
>     crash> bt -r | grep -o fffea0001ed[^\ ]* | sort -u
>     fffea0001ed8000
>     fffea0001eded00
>     fffea0001eded20
>     fffea0001edffc0
>
> Initialization of the whole beginning of the section is skipped up to
> the start of the range due to the commit b92df1de5d28.  Now any code
> calling move_freepages_block() (like reusing the page from a freelist as
> in this example) with a page from the beginning of the range will get
> the page rounded down to start_page ffffea0001ed8000 and passed to
> move_freepages() which crashes on assertion getting wrong zonenr.
>
> >         VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
>
> Note, page_zone() derives the zone from page flags here.
>
> From similar machine before commit b92df1de5d28:
>
>     crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
>     PAGE              PHYSICAL  MAPPING           INDEX  CNT  FLAGS
>     fffff73941e00000  78000000  0                 0      1    1fffff00000000
>     fffff73941ed7fc0  7b5ff000  0                 0      1    1fffff00000000
>     fffff73941ed8000  7b600000  0                 0      1    1fffff00000000
>     fffff73941edff80  7b7fe000  0                 0      1    1fffff00000000
>     fffff73941edffc0  7b7ff000  ffff8e67e04d3ae0  ad84   1    1fffff00020068 uptodate,lru,active,mappedtodisk
>
> All the pages since the beginning of the section are initialized.
> move_freepages()' not gonna blow up.
>
> The same machine with this fix applied:
>
>     crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
>     PAGE              PHYSICAL  MAPPING           INDEX  CNT  FLAGS
>     ffffea0001e00000  78000000  0                 0      0    0
>     ffffea0001ed7fc0  7b5ff000  0                 0      0    0
>     ffffea0001ed8000  7b600000  0                 0      1    1fffff00000000
>     ffffea0001edff80  7b7fe000  0                 0      1    1fffff00000000
>     ffffea0001edffc0  7b7ff000  ffff88017fb13720  8      2    1fffff00020068 uptodate,lru,active,mappedtodisk
>
> At least the bare minimum of pages is initialized preventing the crash
> as well.
>
> Customers started to report this as soon as 7.4 (where b92df1de5d28 was
> merged in RHEL) was released.  I remember reports from
> September/October-ish times.  It's not easily reproduced and happens on
> a handful of machines only.  I guess that's why.  But that does not make
> it less serious, I think.
>
> Though there actually is a report here:
>   https://bugzilla.kernel.org/show_bug.cgi?id=196443
>
> And there are reports for Fedora from July:
>   https://bugzilla.redhat.com/show_bug.cgi?id=1473242
> and CentOS:
>   https://bugs.centos.org/view.php?id=13964
> and we internally track several dozens reports for RHEL bug
>   https://bugzilla.redhat.com/show_bug.cgi?id=1525121
>
> Link: http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.1520011945.git.neelx@redhat.com
> Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
> Signed-off-by: Daniel Vacek
> Cc: Mel Gorman
> Cc: Michal Hocko
> Cc: Paul Burton
> Cc: Pavel Tatashin
> Cc: Vlastimil Babka
> Cc:
> Signed-off-by: Andrew Morton
> Signed-off-by: Linus Torvalds
> Signed-off-by: Greg Kroah-Hartman
>
> ---
>  mm/page_alloc.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5353,9 +5353,14 @@ void __meminit memmap_init_zone(unsigned
>  			/*
>  			 * Skip to the pfn preceding the next valid one (or
>  			 * end_pfn), such that we hit a valid pfn (or end_pfn)
> -			 * on our next iteration of the loop.
> +			 * on our next iteration of the loop. Note that it needs
> +			 * to be pageblock aligned even when the region itself
> +			 * is not. move_freepages_block() can shift ahead of
> +			 * the valid region but still depends on correct page
> +			 * metadata.
>  			 */
> -			pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
> +			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
> +					~(pageblock_nr_pages-1)) - 1;
>  #endif
>  			continue;
>  		}
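
To make the effect of the one-line change concrete, here is a small
userspace sketch (not the kernel code itself) of the value assigned to
pfn before and after the patch.  It assumes 4 KiB pages and 2 MiB
pageblocks, so 512 pages per pageblock; SKETCH_PAGEBLOCK_NR_PAGES,
skip_target_old() and skip_target_new() are hypothetical names, and the
loop's pfn++ is modeled by adding 1 to the returned value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages and 2 MiB pageblocks => 512 pages per block. */
#define SKETCH_PAGEBLOCK_NR_PAGES 512ULL

/* The mask trick below requires a power-of-two pageblock size. */
static_assert((SKETCH_PAGEBLOCK_NR_PAGES &
	       (SKETCH_PAGEBLOCK_NR_PAGES - 1)) == 0,
	      "pageblock_nr_pages must be a power of two");

/* Before the patch: step to just before the next valid pfn. */
static inline uint64_t skip_target_old(uint64_t next_valid_pfn)
{
	return next_valid_pfn - 1;
}

/*
 * With the patch: round the next valid pfn down to its pageblock
 * start first, then step to just before that.
 */
static inline uint64_t skip_target_new(uint64_t next_valid_pfn)
{
	return (next_valid_pfn & ~(SKETCH_PAGEBLOCK_NR_PAGES - 1)) - 1;
}
```

With the first valid pfn 0x7b788 (physical 7b788000) from the quoted
report, the old expression makes the loop resume at 0x7b788, while the
patched one makes it resume at 0x7b600, the start of the pageblock.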