From: Ard Biesheuvel
Date: Thu, 15 Mar 2018 06:39:53 +0000
Subject: Re: [PATCH] Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment"
To: Daniel Vacek
Cc: Michal Hocko, linux-arm-kernel, Linux Kernel Mailing List,
    Mark Rutland, Will Deacon, Catalin Marinas, Marc Zyngier,
    Mel Gorman, Paul Burton, Pavel Tatashin, Vlastimil Babka,
    Andrew Morton, Linus Torvalds
References: <20180314134431.13241-1-ard.biesheuvel@linaro.org>
    <20180314141323.GD23100@dhcp22.suse.cz>
    <20180314145450.GI23100@dhcp22.suse.cz>

On 15 March 2018 at 02:32, Daniel Vacek wrote:
> On Wed, Mar 14, 2018 at 6:36 PM, Ard Biesheuvel wrote:
>> On 14 March 2018 at 16:41, Ard Biesheuvel wrote:
>>> On 14 March 2018 at 15:54, Ard Biesheuvel wrote:
>>>> On 14 March 2018 at 14:54, Michal Hocko wrote:
>>>>> On Wed 14-03-18 14:35:12, Ard Biesheuvel wrote:
>>>>>> On 14 March 2018 at 14:13, Michal Hocko wrote:
>>>>>> > Does
>>>>>> > http://lkml.kernel.org/r/20180313224240.25295-1-neelx@redhat.com
>>>>>> > fix your issue? From the debugging info you provided it should because
>>>>>> > the patch prevents jumping backwards.
>>>>>> >
>>>>>>
>>>>>> The patch does fix the boot hang.
>>>>>>
>>>>>> But I am concerned that we are papering over a fundamental flaw in
>>>>>> memblock_next_valid_pfn().
>>>>>
>>>>> It seems that memblock_next_valid_pfn is doing the right thing here. It
>>>>> is the alignment which moves the pfn back AFAICS. I am not really
>>>>> impressed about the original patch either, to be completely honest.
>>>>> It just looks awfully tricky. I still didn't manage to wrap my head
>>>>> around the original issue though so I do not have much better ideas to
>>>>> be honest.
>>>>
>>>> So first of all, memblock_next_valid_pfn() never refers to its max_pfn
>>>> argument, which is odd but easily fixed.
>>>> Then, the whole idea of subtracting one so that the pfn++ will
>>>> produce the expected value is rather hacky.
>>>>
>>>> But the real problem is that rounding down pfn for the next iteration
>>>> is dodgy, because early_pfn_valid() isn't guaranteed to return true
>>>> for the rounded down value. I know it is probably fine in reality, but
>>>> dodgy as hell. The same applies to the call to early_pfn_in_nid() btw.
>>>>
>>>> So how about something like this (apologies on Gmail's behalf for the
>>>> whitespace damage, I can resend it as a proper patch)
>>>>
>>>> ---------8<-----------
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 3d974cb2a1a1..b89ca999ee3b 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -5352,28 +5352,29 @@
>>>>                   * function. They do not exist on hotplugged memory.
>>>>                   */
>>>>                  if (context != MEMMAP_EARLY)
>>>>                          goto not_early;
>>>>
>>>> -                if (!early_pfn_valid(pfn)) {
>>>> +                if (!early_pfn_valid(pfn) || !early_pfn_in_nid(pfn, nid)) {
>>>>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>>>                          /*
>>>>                           * Skip to the pfn preceding the next valid one (or
>>>>                           * end_pfn), such that we hit a valid pfn (or end_pfn)
>>>>                           * on our next iteration of the loop. Note that it needs
>>>>                           * to be pageblock aligned even when the region itself
>>>>                           * is not. move_freepages_block() can shift ahead of
>>>>                           * the valid region but still depends on correct page
>>>>                           * metadata.
>>>>                           */
>>>> -                        pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>>>> -                                ~(pageblock_nr_pages-1)) - 1;
>>>> -#endif
>>>> +                        pfn = memblock_next_valid_pfn(pfn, end_pfn);
>>>> +                        if (pfn >= end_pfn)
>>>> +                                break;
>>>> +                        pfn &= ~(pageblock_nr_pages - 1);
>>>> +#else
>>>>                          continue;
>>>> +#endif
>>>>                  }
>>>> -                if (!early_pfn_in_nid(pfn, nid))
>>>> -                        continue;
>>>>                  if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
>>>>                          break;
>>>>
>>>>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>>>                  /*
>>>> ---------8<-----------
>>>>
>>>> This ensures that we enter the remainder of the loop with a properly
>>>> aligned pfn, rather than tweaking the value of pfn so it assumes the
>>>> expected value after 'pfn++'.
>>>
>>> Um, this does not actually solve the issue. I guess this is due to the
>>> fact that a single pageblock size chunk could have both valid and
>>> invalid PFNs, and so rounding down the first PFN of the second valid
>>> chunk moves you back to the first chunk.
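(To make the failure mode above concrete, take a hypothetical layout;
the PFNs here are made up, and pageblock_nr_pages is assumed to be
0x200:

    pageblock:  [0x2e00, 0x3000)
    memblock:   [ ...., 0x2f00)    hole    [0x2f80, .... )

    pfn == 0x2f00               -> early_pfn_valid() fails
    memblock_next_valid_pfn()   -> 0x2f80
    0x2f80 & ~(0x200 - 1)       -> 0x2e00

The rounded-down value lands behind the hole we just hit, so the loop
walks forward to 0x2f00 again, jumps back again, and never terminates.)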
>>
>> OK, so the original patch attempted to ensure that of each pageblock,
>> at least the first struct page gets initialized, even though the PFN
>> may not be valid. Unfortunately, this code is not complete, given that
>> start_pfn itself may be misaligned, and so the issue it attempts to
>> solve may still occur.
>
> You're wrong here.
> You only align down after encountering an invalid PFN.

If start_pfn itself is not pageblock aligned, how do you initialize
the first struct page of the pageblock?

>> Then, I think it is absolutely dodgy to settle for only initializing
>> the first struct page, rather than all of them, only because a
>> specific VM_BUG_ON() references the flags field of the first struct
>> page.
>> IMO, we should fix this by initializing all struct page entries for
>> each pageblock sized chunk that has any valid PFNs.
>
> That's precisely what my patch does. At least with
> CONFIG_HAVE_ARCH_PFN_VALID disabled. And it looks like only arm
> implements arch pfn_valid(), which I was not testing with, and I am
> not sure it's correct. Check my other email.
>
No, your patch only initializes the first struct page of a pageblock.
If the next one is invalid, we will skip to the next valid one. You
are making the assumption that pfn_valid() will return true for all
pages in a pageblock if it returns true for one of them, and this
does not hold on other architectures.
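To illustrate what I mean, here is a minimal, untested sketch;
pageblock_has_valid_pfn() is a made-up helper, not existing kernel
code:

static bool pageblock_has_valid_pfn(unsigned long pfn)
{
        /* first PFN of the pageblock containing @pfn */
        unsigned long base = pfn & ~(pageblock_nr_pages - 1);
        unsigned long i;

        /* scan the whole pageblock for at least one valid PFN */
        for (i = base; i < base + pageblock_nr_pages; i++)
                if (early_pfn_valid(i))
                        return true;

        return false;
}

memmap_init_zone() would then initialize every struct page of a
pageblock for which this returns true, and only skip ahead via
memblock_next_valid_pfn() when it returns false. That way the result
no longer depends on whether the architecture's pfn_valid() covers
whole pageblocks or individual pages.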