Date: Thu, 15 Mar 2018 11:14:11 +0100
From: Michal Hocko
To: Ard Biesheuvel
Cc: linux-arm-kernel, Linux Kernel Mailing List, Mark Rutland,
    Will Deacon, Catalin Marinas, Marc Zyngier, Daniel Vacek,
    Mel Gorman, Paul Burton, Pavel Tatashin, Vlastimil Babka,
    Andrew Morton, Linus Torvalds
Subject: Re: [PATCH] Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment"
Message-ID: <20180315101411.GA23100@dhcp22.suse.cz>
References: <20180314134431.13241-1-ard.biesheuvel@linaro.org>
    <20180314141323.GD23100@dhcp22.suse.cz>
    <20180314145450.GI23100@dhcp22.suse.cz>

On Wed 14-03-18 15:54:16, Ard Biesheuvel wrote:
> On 14 March 2018 at 14:54, Michal Hocko wrote:
> > On Wed 14-03-18 14:35:12, Ard Biesheuvel wrote:
> >> On 14 March 2018 at 14:13, Michal Hocko wrote:
> >> > Does http://lkml.kernel.org/r/20180313224240.25295-1-neelx@redhat.com
> >> > fix your issue? From the debugging info you provided it should because
> >> > the patch prevents jumping backwards.
> >> >
> >>
> >> The patch does fix the boot hang.
> >>
> >> But I am concerned that we are papering over a fundamental flaw in
> >> memblock_next_valid_pfn().
> >
> > It seems that memblock_next_valid_pfn is doing the right thing here. It
> > is the alignment which moves the pfn back AFAICS. I am not really
> > impressed about the original patch either, to be completely honest.
> > It just looks awfully tricky. I still didn't manage to wrap my head
> > around the original issue though so I do not have much better ideas to
> > be honest.
>
> So first of all, memblock_next_valid_pfn() never refers to its max_pfn
> argument, which is odd but easily fixed.

There is a patch to remove that parameter sitting in the mmotm tree.

> Then, the whole idea of subtracting one so that the pfn++ will
> produce the expected value is rather hacky,

Absolutely agreed!

> But the real problem is that rounding down pfn for the next iteration
> is dodgy, because early_pfn_valid() isn't guaranteed to return true
> for the rounded down value. I know it is probably fine in reality, but
> dodgy as hell.

Yes, that is what I meant when saying I was not impressed... I am always
nervous when a loop makes jumps back and forth. I _think_ the main
problem here is that we try to initialize a partial pageblock even
though a part of it is invalid. We should simply ignore struct pages
for those pfns. We don't do that, and that is mostly because of the
disconnect between what the page allocator and the early init code refer
to as a unit of memory to care about. I do not remember exactly why but
I strongly suspect this is mostly a performance optimization on the page
allocator side so that we do not have to check each and every pfn. Maybe
we should signal partial pageblocks from the early init code and drop
the optimization in the page allocator init code.

> The same applies to the call to early_pfn_in_nid() btw

Why?
-- 
Michal Hocko
SUSE Labs
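
For illustration only, here is a minimal user-space sketch in C of the
concern discussed above: rounding the next valid pfn down to a pageblock
boundary can move the loop backwards, and the rounded-down pfn is not
guaranteed to be valid. This is not the kernel code. The toy_* names, the
pageblock size of 8 and the memory map are all made up for the example,
and the guard shown is just one possible way to prevent jumping
backwards, not necessarily what the referenced patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy value; the real pageblock size depends on arch and config. */
#define TOY_PAGEBLOCK_NR_PAGES 8UL

/* Hypothetical toy memory map: the valid pfns are [start, end) of each range. */
struct toy_range {
	unsigned long start;
	unsigned long end;
};

static const struct toy_range toy_map[] = {
	{ 0, 3 }, { 5, 8 }, { 21, 32 },
};

/* Toy stand-in for early_pfn_valid(): is pfn inside any range of the map? */
static bool toy_pfn_valid(unsigned long pfn)
{
	for (size_t i = 0; i < sizeof(toy_map) / sizeof(toy_map[0]); i++)
		if (pfn >= toy_map[i].start && pfn < toy_map[i].end)
			return true;
	return false;
}

/* Toy stand-in for memblock_next_valid_pfn(): first valid pfn above pfn, or end_pfn. */
static unsigned long toy_next_valid_pfn(unsigned long pfn, unsigned long end_pfn)
{
	for (unsigned long p = pfn + 1; p < end_pfn; p++)
		if (toy_pfn_valid(p))
			return p;
	return end_pfn;
}

/* Round pfn down to the start of its (toy) pageblock. */
static unsigned long toy_align_down(unsigned long pfn)
{
	return pfn & ~(TOY_PAGEBLOCK_NR_PAGES - 1);
}

/*
 * Where the walk resumes after hitting the invalid pfn.
 * Without the guard the rounded-down value is used unconditionally.
 * With the guard (an assumption of this sketch) the rounded-down value is
 * used only if it lies strictly ahead of the current pfn.
 */
static unsigned long toy_resume_pfn(unsigned long pfn, unsigned long end_pfn,
				    bool guard)
{
	unsigned long next, aligned;

	assert(!toy_pfn_valid(pfn));	/* only called on holes */
	next = toy_next_valid_pfn(pfn, end_pfn);
	aligned = toy_align_down(next);
	if (guard && aligned <= pfn)
		return next;
	return aligned;
}

/*
 * Walk [0, end_pfn). Returns the number of loop iterations, or -1 if
 * max_steps is exceeded, which plays the role of the boot hang here.
 */
static long toy_walk(unsigned long end_pfn, bool guard, long max_steps)
{
	unsigned long pfn = 0;
	long steps = 0;

	while (pfn < end_pfn) {
		if (++steps > max_steps)
			return -1;
		if (toy_pfn_valid(pfn))
			pfn++;		/* a real loop would init the struct page here */
		else
			pfn = toy_resume_pfn(pfn, end_pfn, guard);
	}
	return steps;
}
```

With this toy map, the hole at pfn 3 resumes at pfn 0 when unguarded, so
toy_walk(32, false, ...) never finishes. The hole at pfn 8 resumes at
pfn 16, which is itself invalid, so the walk still has to check validity
after every jump.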