References: <20180314134431.13241-1-ard.biesheuvel@linaro.org> <20180314141323.GD23100@dhcp22.suse.cz> <20180314145450.GI23100@dhcp22.suse.cz>
From: Ard Biesheuvel
Date: Wed, 14 Mar 2018 16:41:47 +0000
Subject: Re: [PATCH] Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment"
To: Michal Hocko
Cc: linux-arm-kernel, Linux Kernel Mailing List, Mark Rutland, Will Deacon, Catalin Marinas, Marc Zyngier, Daniel Vacek, Mel Gorman, Paul Burton, Pavel Tatashin, Vlastimil Babka, Andrew Morton, Linus Torvalds

On 14 March 2018 at 15:54, Ard Biesheuvel wrote:
> On 14 March 2018 at 14:54, Michal Hocko wrote:
>> On Wed 14-03-18 14:35:12, Ard Biesheuvel wrote:
>>> On 14 March 2018 at 14:13, Michal Hocko wrote:
>>> > Does http://lkml.kernel.org/r/20180313224240.25295-1-neelx@redhat.com
>>> > fix your issue? From the debugging info you provided it should because
>>> > the patch prevents jumping backwards.
>>> >
>>>
>>> The patch does fix the boot hang.
>>>
>>> But I am concerned that we are papering over a fundamental flaw in
>>> memblock_next_valid_pfn().
>>
>> It seems that memblock_next_valid_pfn is doing the right thing here. It
>> is the alignment which moves the pfn back AFAICS. I am not really
>> impressed about the original patch either, to be completely honest.
>> It just looks awfully tricky.
>> I still didn't manage to wrap my head
>> around the original issue though so I do not have much better ideas to
>> be honest.
>
> So first of all, memblock_next_valid_pfn() never refers to its max_pfn
> argument, which is odd but easily fixed.
> Then, the whole idea of subtracting one so that the pfn++ will
> produce the expected value is rather hacky.
>
> But the real problem is that rounding down pfn for the next iteration
> is dodgy, because early_pfn_valid() isn't guaranteed to return true
> for the rounded down value. I know it is probably fine in reality, but
> dodgy as hell. The same applies to the call to early_pfn_in_nid(), btw.
>
> So how about something like this (apologies on Gmail's behalf for the
> whitespace damage, I can resend it as a proper patch)
>
>
> ---------8<-----------
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3d974cb2a1a1..b89ca999ee3b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5352,28 +5352,29 @@
>  		 * function. They do not exist on hotplugged memory.
>  		 */
>  		if (context != MEMMAP_EARLY)
>  			goto not_early;
>
> -		if (!early_pfn_valid(pfn)) {
> +		if (!early_pfn_valid(pfn) || !early_pfn_in_nid(pfn, nid)) {
>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  			/*
>  			 * Skip to the pfn preceding the next valid one (or
>  			 * end_pfn), such that we hit a valid pfn (or end_pfn)
>  			 * on our next iteration of the loop. Note that it needs
>  			 * to be pageblock aligned even when the region itself
>  			 * is not. move_freepages_block() can shift ahead of
>  			 * the valid region but still depends on correct page
>  			 * metadata.
>  			 */
> -			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
> -					~(pageblock_nr_pages-1)) - 1;
> -#endif
> +			pfn = memblock_next_valid_pfn(pfn, end_pfn);
> +			if (pfn >= end_pfn)
> +				break;
> +			pfn &= ~(pageblock_nr_pages - 1);
> +#else
>  			continue;
> +#endif
>  		}
> -		if (!early_pfn_in_nid(pfn, nid))
> -			continue;
>  		if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
>  			break;
>
>  #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  		/*
> ---------8<-----------
>
> This ensures that we enter the remainder of the loop with a properly
> aligned pfn, rather than tweaking the value of pfn so it assumes the
> expected value after 'pfn++'

Um, this does not actually solve the issue. I guess this is because a
single pageblock-sized chunk can contain both valid and invalid PFNs,
so rounding down the first PFN of the second valid chunk moves you
back into the first chunk.
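
To make that guess concrete, here is a minimal userspace sketch with
made-up numbers. Everything in it is an assumption for illustration: the
512-page pageblock, the two valid ranges, and the helpers
toy_pfn_valid(), toy_next_valid_pfn() and skip_moves_backwards(), which
only stand in for early_pfn_valid() and memblock_next_valid_pfn(). None
of the values come from the machine that hangs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pageblock size; must be a power of two for the mask. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Toy memory map: two valid [start, end) ranges with a hole between
 * them. The hole 0x120..0x17f sits inside the first pageblock. */
struct pfn_range { unsigned long start, end; };
static const struct pfn_range valid_ranges[] = {
	{ 0x000, 0x120 },
	{ 0x180, 0x400 },
};

/* Stand-in for early_pfn_valid(): is pfn inside any valid range? */
static bool toy_pfn_valid(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(valid_ranges) / sizeof(valid_ranges[0]); i++)
		if (pfn >= valid_ranges[i].start && pfn < valid_ranges[i].end)
			return true;
	return false;
}

/* Stand-in for memblock_next_valid_pfn(): first valid pfn after pfn,
 * or end_pfn if there is none. */
static unsigned long toy_next_valid_pfn(unsigned long pfn,
					unsigned long end_pfn)
{
	for (pfn++; pfn < end_pfn; pfn++)
		if (toy_pfn_valid(pfn))
			return pfn;
	return end_pfn;
}

/* Round pfn down to the first pfn of its pageblock. */
static unsigned long pageblock_start(unsigned long pfn)
{
	assert((PAGEBLOCK_NR_PAGES & (PAGEBLOCK_NR_PAGES - 1)) == 0);
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* True if skipping from an invalid pfn to the next valid one and then
 * rounding down lands at or before the pfn we started from. */
static bool skip_moves_backwards(unsigned long pfn, unsigned long end_pfn)
{
	return pageblock_start(toy_next_valid_pfn(pfn, end_pfn)) <= pfn;
}
```

With this map, the loop first hits an invalid pfn at 0x120. The next
valid pfn is 0x180, but rounding that down to a 512-page boundary gives
0x000, so skip_moves_backwards(0x120, 0x400) is true and the loop would
walk the first chunk again and hit the same hole.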