From: Daniel Vacek
Date: Fri, 2 Mar 2018 16:27:23 +0100
Subject: Re: [PATCH] mm/page_alloc: fix memmap_init_zone pageblock alignment
To: Michal Hocko
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
    Vlastimil Babka, Mel Gorman, Pavel Tatashin, Paul Burton,
    stable@vger.kernel.org
In-Reply-To: <20180302130052.GN15057@dhcp22.suse.cz>
References: <1519908465-12328-1-git-send-email-neelx@redhat.com>
    <20180301131033.GH15057@dhcp22.suse.cz>
    <20180301152729.GM15057@dhcp22.suse.cz>
    <20180302130052.GN15057@dhcp22.suse.cz>

On Fri, Mar 2, 2018 at 2:01 PM, Michal Hocko wrote:
> On Thu 01-03-18 17:20:04, Daniel Vacek wrote:
>> On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko wrote:
>> > On Thu 01-03-18 16:09:35, Daniel Vacek wrote:
>> > [...]
>> >> $ grep 7b7ff000 /proc/iomem
>> >> 7b7ff000-7b7fffff : System RAM
>> > [...]
>> >> After commit b92df1de5d28 machine eventually crashes with:
>> >>
>> >> BUG at mm/page_alloc.c:1913
>> >>
>> >> > VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
>> >
>> > This is an important information that should be in the changelog.
>>
>> And that's exactly what my seven very first words tried to express in
>> human readable form instead of mechanically pasting the source code. I
>> guess that's a matter of preference. Though I see grepping later can
>> be an issue here.
>
> Do not get me wrong I do not want to nag just for fun of it. The
> changelog should be really clear about the problem. What might be clear
> to you based on the debugging might not be so clear to others. And the
> struct page initialization code is far from trivial especially when we
> have different alignment requirements by the memory model and the page
> allocator.

I get it. I didn't mean to be rude or something. I just thought I
covered all the relevant details..

> Therefore being as clear as possible is really valuable. So I would
> really love to see the changelog to contain.
> - What is going on - VM_BUG_ON in move_freepages along with the crash
>   report

I'll put more details there.

> - memory ranges exported by BIOS/FW

They were not mentioned as they are not really relevant. Any e820 map
can have issues. So far I have only seen reports on a few selected
machines, mostly LENOVO System x3650 M5, some FUJITSU, some Cisco
blades. But the map is always fairly normal. IIUC, the bug only happens
if the range which is not pageblock aligned happens to be the first one
in a zone or follows a non-populated section. Again, none of that is
really relevant. What is relevant is that commit b92df1de5d28 changes
the way page structures are initialized, so that the kernel can now
crash on some perfectly fine maps from the BIOS. And my fix tries to
keep at least the bare minimum of the original behavior needed to keep
the kernel stable.

> - explain why is the pageblock alignment the proper one. How does the
>   range look from the memory section POV (with SPARSEMEM).

The commit message explains that. "the same way as in
move_freepages_block()" to quote myself. The alignment in this function
is the one causing the crash, as the VM_BUG_ON() assert in the
subsequent move_freepages() is checking the (now) uninitialized
structure. If we follow this alignment, the initialization will not get
skipped for that structure. Again, this is partially restoring the
original behavior rather than rewriting move_freepages{,_block} to not
crash with some data it was not designed for. I'll try to explain this
more transparently in the commit message. Alternatively you can just
revert b92df1de5d28. That will fix the crashes as well.

> - What about those unaligned pages which are not backed by any memory?
>   Are they reserved so that they will never get used?

They are handled the same way as they used to be before b92df1de5d28.
This patch does not change or touch anything in this regard. Or am I
wrong?

> And just to be clear. I am not saying your patch is wrong. It just

You better not. My patch is totally correct :p (I hope)

> raises more questions than answers and I suspect it just papers over
> some more fundamental problem. I might be clearly wrong and I cannot

I see. Thank you for looking into it. It's appreciated. I would not
call it a fundamental problem, rather a design of
move_freepages{,_block} which I'd vote for keeping for now. Hopefully I
explained it above.

> deserve this more time for the next week because I will be offline

Enjoy your time off.

> but I would _really_ appreciate if this all got explained.

I'll do my best.

> Thanks!
> --
> Michal Hocko
> SUSE Labs