Received: by 10.213.65.68 with SMTP id h4csp36371imn; Thu, 15 Mar 2018 08:50:17 -0700 (PDT) X-Google-Smtp-Source: AG47ELtq4updriO2E/xzP4DP++FC13tjpiDlQip83q1ThphIcnPzHkNhtEpPjl8aFWZl5kxvOC86 X-Received: by 2002:a17:902:8349:: with SMTP id z9-v6mr8602487pln.163.1521129017866; Thu, 15 Mar 2018 08:50:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1521129017; cv=none; d=google.com; s=arc-20160816; b=l6z40JOziKogv6pjvRwEqwQ5qvKfk3nVQHZCd6zLH3YffkA+hw6G5yE+4i1c0mrmdt nOLFhGh1aJhGqhBhIH/6M5DwD2JR6g0YVF76TaHrRPPbGCYZibswsrKIsWXvOjIAss2R KrzFFrJGHSOTZc6KCjO+bPKwbjhk96N29po/7qEMIsW4docf4QueyE8k5cp1wo3TGMBu cgS7GYcpbl+U6jJJF01b+Fn9RwE8tv6doJ2zZS78pqQ3vNZYa/52GfaI6KlpjHVOHyoZ jY3fGWwnxH2AhYlRJ0NyT2nCvXEJ3jpvZc/rXRnN4SaozwWX1FuNYDQO0wgSF9eC4KeO F9zA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=tgi3hh0Ux2O0rNYjlKi+FXP9I6bj7E8aPXXEqm9RGog=; b=qXV4Qz/w58bSXix2JYPn0XXr9h+2xy30g2kBFzhwGb7/YXOyw1y7NpICgK20yfS5Ys CVUHX1aVQmggEwIpzUUc3DGE+sSX+pEZlvUT1Gp2SFi3JFC/1GGXsA8pKk6coSHELcjv YMJ0p4woLsdaBjtKjEG5jmNNjXls6HETDZFNV++k9BAgtfOHiRyrM87ddtnz7IbnvJte bdJMPijO57M5v6Vmg7EcalnyNrhkeTuSVPSmqjHCS+AFMqBSb9z4Jvw4AN7lrK+ylPX+ 6iOBA0HXVQb/cfZcUZ137OGIbCfP85hMA76R2AlP3BxDjtiqGqExKZH93lXMK2gokRYY oXYg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=LChIjRXZ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g193si3523294pgc.105.2018.03.15.08.50.03; Thu, 15 Mar 2018 08:50:17 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=LChIjRXZ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932794AbeCOPsu (ORCPT + 99 others); Thu, 15 Mar 2018 11:48:50 -0400 Received: from mail-io0-f193.google.com ([209.85.223.193]:46996 "EHLO mail-io0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932397AbeCOPss (ORCPT ); Thu, 15 Mar 2018 11:48:48 -0400 Received: by mail-io0-f193.google.com with SMTP id g14so232214iob.13 for ; Thu, 15 Mar 2018 08:48:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=tgi3hh0Ux2O0rNYjlKi+FXP9I6bj7E8aPXXEqm9RGog=; b=LChIjRXZEriJme7gdRa00r0CaDZnHPYBHvUTbghe8zatCnssueyG5WWftT0nLoPASa VctCITGAkm58CPk/XVYIqvmSLnVErw98t18+a0CISQ28uZMLr78z5pNwEfFtn5eTLLqY tgIsC7mAexKkgE6aHdTwPOw4H5dSs0Y/DySY0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=tgi3hh0Ux2O0rNYjlKi+FXP9I6bj7E8aPXXEqm9RGog=; b=Mm4uXafv0NH6YFK5SgKawRQwvJHWj83Q06WmYsGVLitfl6MqwJsWZtU1d+qYpjdYG8 JmoYORvwCaax/Win9NTbZyTIfgOz4wG95kTJexSIBhuGFkQ7XQ9um/rwk1yHz1cSSGhT y2Sq5DQwKxxi8bM5hbfe0ZwOGFVAXD+1d0uvfUPw2esuEeJbcOyTBDGS0MbUliau8WLn c158ak1q0BhIr+f9s7aocDEovQPL9acwV6P2IKknz4rAHv4F0aWIX85IeqnWexW3N64v P9cCWzaw3ntHkUUb9n2GdDfEXCgMKkS0416DVBrQ6RhXuEj5jLtsolrlIKzbup0OqFNd ghtA== X-Gm-Message-State: AElRT7EyL7PM4KZTD/9CtejpMYLxeXwqYqlwsSgxuOnG3+yKowL8iIBQ QZlaCvitGrR/46gIOJ8DO/M+tX0l5r6tYQilRqIz5g== X-Received: by 10.107.151.74 with SMTP id z71mr8930798iod.277.1521128928089; Thu, 15 Mar 2018 08:48:48 -0700 (PDT) MIME-Version: 1.0 Received: by 10.107.138.209 with HTTP; Thu, 15 Mar 2018 08:48:47 -0700 (PDT) In-Reply-To: References: <20180314192937.12888-1-ard.biesheuvel@linaro.org> From: Ard Biesheuvel Date: Thu, 15 Mar 2018 15:48:47 +0000 Message-ID: Subject: Re: [PATCH v2] Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment" To: Daniel Vacek Cc: linux-arm-kernel , open list , Mark Rutland , Will Deacon , Catalin Marinas , Marc Zyngier , Mel Gorman , Michal Hocko , Paul Burton , Pavel Tatashin , Vlastimil Babka , Andrew Morton , Linus Torvalds Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 15 March 2018 at 15:34, Daniel Vacek wrote: > On Thu, Mar 15, 2018 at 4:17 PM, Ard Biesheuvel > wrote: >> On 15 March 2018 at 15:12, Daniel Vacek wrote: >>> On Thu, Mar 15, 2018 at 8:45 AM, Ard Biesheuvel >>> wrote: >>>> On 15 March 2018 at 07:44, Daniel Vacek wrote: >>>>> On Thu, Mar 15, 2018 at 8:36 AM, Ard Biesheuvel >>>>> wrote: >>>>>> On 15 March 2018 at 02:23, Daniel Vacek wrote: >>>>>>> On Wed, Mar 14, 2018 at 8:29 PM, Ard Biesheuvel >>>>>>> wrote: >>>>>>>> This reverts commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae. >>>>>>>> >>>>>>>> Commit 864b75f9d6b0 ("mm/page_alloc: fix memmap_init_zone pageblock >>>>>>>> alignment") modified the logic in memmap_init_zone() to initialize >>>>>>>> struct pages associated with invalid PFNs, to appease a VM_BUG_ON() >>>>>>>> in move_freepages(), which is redundant by its own admission, and >>>>>>>> dereferences struct page fields to obtain the zone without checking >>>>>>>> whether the struct pages in question are valid to begin with. >>>>>>>> >>>>>>>> Commit 864b75f9d6b0 only makes it worse, since the rounding it does >>>>>>>> may cause pfn assume the same value it had in a prior iteration of >>>>>>>> the loop, resulting in an infinite loop and a hang very early in the >>>>>>>> boot. Also, since it doesn't perform the same rounding on start_pfn >>>>>>>> itself but only on intermediate values following an invalid PFN, we >>>>>>>> may still hit the same VM_BUG_ON() as before. >>>>>>>> >>>>>>>> So instead, let's fix this at the core, and ensure that the BUG >>>>>>>> check doesn't dereference struct page fields of invalid pages. >>>>>>>> >>>>>>>> Fixes: 864b75f9d6b0 ("mm/page_alloc: fix memmap_init_zone pageblock alignment") >>>>>>>> Cc: Daniel Vacek >>>>>>>> Cc: Mel Gorman >>>>>>>> Cc: Michal Hocko >>>>>>>> Cc: Paul Burton >>>>>>>> Cc: Pavel Tatashin >>>>>>>> Cc: Vlastimil Babka >>>>>>>> Cc: Andrew Morton >>>>>>>> Cc: Linus Torvalds >>>>>>>> Signed-off-by: Ard Biesheuvel >>>>>>>> --- >>>>>>>> mm/page_alloc.c | 13 +++++-------- >>>>>>>> 1 file changed, 5 insertions(+), 8 deletions(-) >>>>>>>> >>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>>>>> index 3d974cb2a1a1..635d7dd29d7f 100644 >>>>>>>> --- a/mm/page_alloc.c >>>>>>>> +++ b/mm/page_alloc.c >>>>>>>> @@ -1910,7 +1910,9 @@ static int move_freepages(struct zone *zone, >>>>>>>> * Remove at a later date when no bug reports exist related to >>>>>>>> * grouping pages by mobility >>>>>>>> */ >>>>>>>> - VM_BUG_ON(page_zone(start_page) != page_zone(end_page)); >>>>>>>> + VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) && >>>>>>>> + pfn_valid(page_to_pfn(end_page)) && >>>>>>>> + page_zone(start_page) != page_zone(end_page)); >>>>>>> >>>>>>> Hi, I am on vacation this week and I didn't have a chance to test this >>>>>>> yet but I am not sure this is correct. Generic pfn_valid() unlike the >>>>>>> arm{,64} arch specific versions returns true for all pfns in a section >>>>>>> if there is at least some memory mapped in that section. So I doubt >>>>>>> this prevents the crash I was targeting. I believe pfn_valid() does >>>>>>> not change a thing here :( >>>>>>> >>>>>> >>>>>> If this is the case, memblock_next_valid_pfn() is broken since it >>>>>> skips valid PFNs, and we should be fixing that instead. >>>>> >>>>> How do you define valid pfn? Maybe the generic version of pfn_valid() >>>>> should be fixed??? >>>>> >>>> >>>> memblock_next_valid_pfn() skips PFNs for which pfn_valid() returns >>>> true. That is clearly a bug. >>> >>> So pfn_valid() does not mean this frame is usable memory? >>> >> >> Who cares what it *means*? > > abstractions? > >> memblock_next_valid_pfn() has 'valid_pfn' in its name, so if passing >> pfn A returns B, and there exists a C such that A < C < B and >> pfn_valid(C) returns true, memblock_next_valid_pfn doesn't do what it >> says on the tin and should be fixed. > > If you don't like the name I would argue it should be changed to > something like memblock_pfn_of_next_memory(). Still the name is not > next_valid_pfn() but memblock_next... kind of distinguishing something > different. > Naming is important. If the name would have reflected what it actually does, perhaps it would have taken us less time to figure out that what it's doing is wrong. >> You keep going on about how pfn_valid() does or does not do what you >> think, but that is really irrelevant. > > I'm trying to learn. So am I :-) > Hence I was asking what is the abstract meaning > of it. As I see two *way_different* implementations so I am not sure > how I should understand that. > My interpretation is that it has a struct page associated with it, but it seems the semantics of pfn_valid() aren't well defined. IIRC there are places in the kernel that assume that valid PFNs are covered by the kernel direct mapping (on non-highmem kernels), and this is why we have a separate definition for arm64, which needs to remove some regions from the kernel direct mapping because the architecture does not permit mappings with mismatched attributes, and these regions may be mapped by the firmware as well. Dereferencing the struct page associated with a PFN for which pfn_valid() returns false is wrong in any case, which is why the VM_BUG_ON() you identified was buggy as well. But on the other hand, that does mean we need to guarantee that all valid PFNs have their struct page initialized. >>> OK, in that case we need to fix or revert memblock_next_valid_pfn(). >>> That works for me. >>> >> >> OK. You can add my ack to a patch that reverts it, and we can revisit >> it for the next cycle. > > Cool. I'll do that next week. Thank you, sir. > Likewise.