From: Ard Biesheuvel
Date: Wed, 14 Mar 2018 17:36:27 +0000
Subject: Re: [PATCH] Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment"
To: Michal Hocko
Cc: linux-arm-kernel, Linux Kernel Mailing List, Mark Rutland,
    Will Deacon, Catalin Marinas, Marc Zyngier, Daniel Vacek,
    Mel Gorman, Paul Burton, Pavel Tatashin, Vlastimil Babka,
    Andrew Morton, Linus Torvalds
References: <20180314134431.13241-1-ard.biesheuvel@linaro.org>
    <20180314141323.GD23100@dhcp22.suse.cz>
    <20180314145450.GI23100@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 14 March 2018 at 16:41, Ard Biesheuvel wrote:
> On 14 March 2018 at 15:54, Ard Biesheuvel wrote:
>> On 14 March 2018 at 14:54, Michal Hocko wrote:
>>> On Wed 14-03-18 14:35:12, Ard Biesheuvel wrote:
>>>> On 14 March 2018 at 14:13, Michal Hocko wrote:
>>>> > Does http://lkml.kernel.org/r/20180313224240.25295-1-neelx@redhat.com
>>>> > fix your issue? From the debugging info you provided it should,
>>>> > because the patch prevents jumping backwards.
>>>>
>>>> The patch does fix the boot hang.
>>>>
>>>> But I am concerned that we are papering over a fundamental flaw in
>>>> memblock_next_valid_pfn().
>>>
>>> It seems that memblock_next_valid_pfn is doing the right thing here.
>>> It is the alignment which moves the pfn back, AFAICS. I am not really
>>> impressed with the original patch either, to be completely honest.
>>> It just looks awfully tricky. I still didn't manage to wrap my head
>>> around the original issue, though, so I do not have much better ideas
>>> to offer.
>>
>> So first of all, memblock_next_valid_pfn() never refers to its max_pfn
>> argument, which is odd but easily fixed.
>>
>> Then, the whole idea of subtracting one so that the pfn++ will
>> produce the expected value is rather hacky.
>>
>> But the real problem is that rounding down pfn for the next iteration
>> is dodgy, because early_pfn_valid() isn't guaranteed to return true
>> for the rounded-down value. I know it is probably fine in reality, but
>> it is dodgy as hell. The same applies to the call to
>> early_pfn_in_nid(), btw.
>>
>> So how about something like this? (Apologies on Gmail's behalf for the
>> whitespace damage; I can resend it as a proper patch.)
>>
>> ---------8<-----------
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3d974cb2a1a1..b89ca999ee3b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5352,28 +5352,29 @@
>>  		 * function. They do not exist on hotplugged memory.
>>  		 */
>>  		if (context != MEMMAP_EARLY)
>>  			goto not_early;
>>
>> -		if (!early_pfn_valid(pfn)) {
>> +		if (!early_pfn_valid(pfn) || !early_pfn_in_nid(pfn, nid)) {
>>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>  			/*
>>  			 * Skip to the pfn preceding the next valid one (or
>>  			 * end_pfn), such that we hit a valid pfn (or end_pfn)
>>  			 * on our next iteration of the loop. Note that it needs
>>  			 * to be pageblock aligned even when the region itself
>>  			 * is not. move_freepages_block() can shift ahead of
>>  			 * the valid region but still depends on correct page
>>  			 * metadata.
>>  			 */
>> -			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>> -			       ~(pageblock_nr_pages-1)) - 1;
>> -#endif
>> +			pfn = memblock_next_valid_pfn(pfn, end_pfn);
>> +			if (pfn >= end_pfn)
>> +				break;
>> +			pfn &= ~(pageblock_nr_pages - 1);
>> +#else
>>  			continue;
>> +#endif
>>  		}
>> -		if (!early_pfn_in_nid(pfn, nid))
>> -			continue;
>>  		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
>>  			break;
>>
>>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>  		/*
>> ---------8<-----------
>>
>> This ensures that we enter the remainder of the loop with a properly
>> aligned pfn, rather than tweaking the value of pfn so that it assumes
>> the expected value after 'pfn++'.
>
> Um, this does not actually solve the issue. I guess this is due to the
> fact that a single pageblock-sized chunk can contain both valid and
> invalid PFNs, and so rounding down the first PFN of the second valid
> chunk moves you back into the first chunk.

OK, so the original patch attempted to ensure that, of each pageblock,
at least the first struct page gets initialized, even though the PFN
may not be valid. Unfortunately, that code is not complete, given that
start_pfn itself may be misaligned, and so the issue it attempts to
solve may still occur.

Moreover, I think it is absolutely dodgy to settle for initializing
only the first struct page, rather than all of them, merely because a
specific VM_BUG_ON() references the flags field of the first struct
page.

IMO, we should fix this by initializing all struct page entries for
each pageblock-sized chunk that has any valid PFNs.