Date: Thu, 15 Mar 2018 12:43:12 +0100
From: Michal Hocko
To: Ard Biesheuvel
Cc: linux-arm-kernel, Linux Kernel Mailing List, Mark Rutland,
 Will Deacon, Catalin Marinas, Marc Zyngier, Daniel Vacek,
 Mel Gorman, Paul Burton, Pavel Tatashin, Vlastimil Babka,
 Andrew Morton, Linus Torvalds
Subject: Re: [PATCH] Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment"
Message-ID: <20180315114312.GC23100@dhcp22.suse.cz>
References: <20180314134431.13241-1-ard.biesheuvel@linaro.org>
 <20180314141323.GD23100@dhcp22.suse.cz>
 <20180314145450.GI23100@dhcp22.suse.cz>
 <20180315101411.GA23100@dhcp22.suse.cz>

On Thu 15-03-18 10:17:24, Ard Biesheuvel wrote:
> On 15 March 2018 at 10:14, Michal Hocko wrote:
> > On Wed 14-03-18 15:54:16, Ard Biesheuvel wrote:
> >> On 14 March 2018 at 14:54, Michal Hocko wrote:
> >> > On Wed 14-03-18 14:35:12, Ard Biesheuvel wrote:
> >> >> On 14 March 2018 at 14:13, Michal Hocko wrote:
> >> >> > Does http://lkml.kernel.org/r/20180313224240.25295-1-neelx@redhat.com
> >> >> > fix your issue? From the debugging info you provided it should,
> >> >> > because the patch prevents jumping backwards.
> >> >>
> >> >> The patch does fix the boot hang.
> >> >>
> >> >> But I am concerned that we are papering over a fundamental flaw in
> >> >> memblock_next_valid_pfn().
> >> >
> >> > It seems that memblock_next_valid_pfn is doing the right thing here.
> >> > It is the alignment which moves the pfn back AFAICS. I am not really
> >> > impressed with the original patch either, to be completely honest.
> >> > It just looks awfully tricky. I still haven't managed to wrap my head
> >> > around the original issue, though, so I do not have much better ideas,
> >> > to be honest.
> >>
> >> So first of all, memblock_next_valid_pfn() never refers to its max_pfn
> >> argument, which is odd but easily fixed.
> >
> > There is a patch to remove that parameter sitting in the mmotm tree.
> >
> >> Then, the whole idea of subtracting one so that the pfn++ will
> >> produce the expected value is rather hacky,
> >
> > Absolutely agreed!
> >
> >> But the real problem is that rounding down pfn for the next iteration
> >> is dodgy, because early_pfn_valid() isn't guaranteed to return true
> >> for the rounded-down value. I know it is probably fine in reality, but
> >> dodgy as hell.
> >
> > Yes, that is what I meant when saying I was not impressed... I am
> > always nervous when a loop jumps back and forth. I _think_ the main
> > problem here is that we try to initialize a partial pageblock even
> > though part of it is invalid. We should simply ignore struct pages
> > for those pfns. We don't do that, mostly because of the disconnect
> > between what the page allocator and the early init code refer to as a
> > unit of memory to care about. I do not remember exactly why, but I
> > strongly suspect this is mostly a performance optimization on the
> > page allocator side so that we do not have to check each and every
> > pfn. Maybe we should signal partial pageblocks from the early code
> > and drop the optimization in the page allocator init code.
> >
> >> The same applies to the call to early_pfn_in_nid(), btw.
> >
> > Why?
>
> By 'the same' I mean it isn't guaranteed to return true for the
> rounded-down value *at the API level*. I understand it will be mostly
> fine in reality, but juggling (in)valid PFNs like this is likely to
> end badly.

OK, I see your point now. I can well imagine that sub-pageblocks could
be split across different NUMA nodes, but that should be really rare.
-- 
Michal Hocko
SUSE Labs