In-Reply-To: <20180315114312.GC23100@dhcp22.suse.cz>
References: <20180314134431.13241-1-ard.biesheuvel@linaro.org> <20180314141323.GD23100@dhcp22.suse.cz> <20180314145450.GI23100@dhcp22.suse.cz> <20180315101411.GA23100@dhcp22.suse.cz> <20180315114312.GC23100@dhcp22.suse.cz>
From: Ard Biesheuvel
Date: Thu, 15 Mar 2018 13:48:00 +0000
Subject: Re: [PATCH] Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment"
To: Michal Hocko
Cc: linux-arm-kernel, Linux Kernel Mailing List, Mark Rutland, Will Deacon, Catalin Marinas, Marc Zyngier, Daniel Vacek, Mel Gorman, Paul Burton, Pavel Tatashin, Vlastimil Babka, Andrew Morton, Linus Torvalds

On 15 March 2018 at 11:43, Michal Hocko wrote:
> On Thu 15-03-18 10:17:24, Ard Biesheuvel wrote:
>> On 15 March 2018 at 10:14, Michal Hocko wrote:
>> > On Wed 14-03-18 15:54:16, Ard Biesheuvel wrote:
>> >> On 14 March 2018 at 14:54, Michal Hocko wrote:
>> >> > On Wed 14-03-18 14:35:12, Ard Biesheuvel wrote:
>> >> >> On 14 March 2018 at 14:13, Michal Hocko wrote:
>> >> >> > Does http://lkml.kernel.org/r/20180313224240.25295-1-neelx@redhat.com
>> >> >> > fix your issue? From the debugging info you provided it should because
>> >> >> > the patch prevents jumping backwards.
>> >> >> >
>> >> >>
>> >> >> The patch does fix the boot hang.
>> >> >>
>> >> >> But I am concerned that we are papering over a fundamental flaw in
>> >> >> memblock_next_valid_pfn().
>> >> >
>> >> > It seems that memblock_next_valid_pfn is doing the right thing here. It
>> >> > is the alignment which moves the pfn back AFAICS. I am not really
>> >> > impressed about the original patch either, to be completely honest.
>> >> > It just looks awfully tricky. I still didn't manage to wrap my head
>> >> > around the original issue though so I do not have much better ideas to
>> >> > be honest.
>> >>
>> >> So first of all, memblock_next_valid_pfn() never refers to its max_pfn
>> >> argument, which is odd but easily fixed.
>> >
>> > There is a patch to remove that parameter sitting in the mmotm tree.
>> >
>> >> Then, the whole idea of subtracting one so that the pfn++ will
>> >> produce the expected value is rather hacky,
>> >
>> > Absolutely agreed!
>> >
>> >> But the real problem is that rounding down pfn for the next iteration
>> >> is dodgy, because early_pfn_valid() isn't guaranteed to return true
>> >> for the rounded down value. I know it is probably fine in reality, but
>> >> dodgy as hell.
>> >
>> > Yes, that is what I meant when saying I was not impressed... I am always
>> > nervous when a loop makes jumps back and forth. I _think_ the main
>> > problem here is that we try to initialize a partial pageblock even
>> > though a part of it is invalid. We should simply ignore struct pages
>> > for those pfns. We don't do that and that is mostly because of the
>> > disconnect between what the page allocator and early init code refers to
>> > as a unit of memory to care about. I do not remember exactly why but I
>> > strongly suspect this is mostly a performance optimization on the page
>> > allocator side so that we do not have to check each and every pfn. Maybe
>> > we should signal partial pageblocks from an early code and drop the
>> > optimization in the page allocator init code.
>> >
>> >> The same applies to the call to early_pfn_in_nid() btw
>> >
>> > Why?
>>
>> By 'the same' I mean it isn't guaranteed to return true for the
>> rounded down value *at the API level*. I understand it will be mostly
>> fine in reality, but juggling (in)valid PFNs like this is likely to
>> end badly.
>
> OK, I see your point now. I can really imagine that sub-pageblocks would
> be split into different NUMA nodes but that should be really rare.
>

Yes, it should never happen. But these abstractions exist for a
reason: they make this code understandable to humans, and so taking
all kinds of shortcuts around them makes the code unmaintainable.

If ARM's implementation of pfn_valid() is flawed, we should fix it. If
memblock_next_valid_pfn() is flawed, we should fix it. But papering
over these issues by bypassing the abstractions is really not the way
to go (but I think we're already in agreement there).
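
To make the rounding-down concern concrete, here is a minimal userspace
sketch. It is not the kernel code: toy_pfn_valid(), toy_next_valid_pfn(),
toy_walk(), the 4-page pageblock size and the memory layout are all
hypothetical stand-ins chosen for illustration. It only models a loop
that skips invalid pfns, optionally rounds the skip target down to a
pageblock boundary, and subtracts one so that the loop's pfn++ lands on
the target.

```c
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 4UL  /* toy value, not a real configuration */
#define END_PFN            16UL /* toy end of the zone */

/* Toy memory map (assumption): pfns 6..11 are valid, the rest are holes. */
static bool toy_pfn_valid(unsigned long pfn)
{
    return pfn >= 6 && pfn < 12;
}

/* Smallest valid pfn strictly above pfn, or END_PFN if there is none. */
static unsigned long toy_next_valid_pfn(unsigned long pfn)
{
    for (pfn++; pfn < END_PFN; pfn++)
        if (toy_pfn_valid(pfn))
            return pfn;
    return END_PFN;
}

/*
 * Walk [0, END_PFN). Returns the number of loop iterations used, or -1
 * if the iteration budget runs out, which stands in for a boot hang.
 */
static long toy_walk(bool round_down, long budget)
{
    long iters = 0;

    for (unsigned long pfn = 0; pfn < END_PFN; pfn++) {
        if (++iters > budget)
            return -1;
        if (!toy_pfn_valid(pfn)) {
            unsigned long next = toy_next_valid_pfn(pfn);

            /* Rounding down can yield a pfn that is itself invalid. */
            if (round_down)
                next &= ~(PAGEBLOCK_NR_PAGES - 1);
            pfn = next - 1; /* the loop's pfn++ lands on next */
            continue;
        }
        /* A real implementation would initialise the struct page here. */
    }
    return iters;
}
```

With this layout, the first valid pfn (6) is not pageblock aligned, so
the rounded-down target (4) is invalid and at or below the pfn the loop
is already on; toy_walk(true, ...) keeps returning to pfn 4 until the
budget is exhausted, while toy_walk(false, ...) only ever moves forward
and terminates.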