Date: Mon, 14 Oct 2019 15:36:17 +0200
From: Michal Hocko
To: David Hildenbrand
Cc: Naoya Horiguchi, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", Andrew Morton, Oscar Salvador
Subject: Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
Message-ID: <20191014133617.GJ317@dhcp22.suse.cz>

[Cc Oscar]

On Fri 11-10-19 12:13:17, David Hildenbrand wrote:
> On 11.10.19 08:02, Naoya Horiguchi wrote:
> > On Thu, Oct 10, 2019 at 09:58:40AM +0200, David Hildenbrand wrote:
> >> On 10.10.19 09:52, David Hildenbrand wrote:
> >>> On 10.10.19 09:35, Michal Hocko wrote:
> >>>> On Thu 10-10-19 09:27:32, David Hildenbrand wrote:
> >>>>> On 09.10.19 16:43, Michal Hocko wrote:
> >>>>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
> >>>>>>> We should check for pfn_to_online_page() to not access uninitialized
> >>>>>>> memmaps. Reshuffle the code so we don't have to duplicate the error
> >>>>>>> message.
> >>>>>>>
> >>>>>>> Cc: Naoya Horiguchi
> >>>>>>> Cc: Andrew Morton
> >>>>>>> Cc: Michal Hocko
> >>>>>>> Signed-off-by: David Hildenbrand
> >>>>>>> ---
> >>>>>>>  mm/memory-failure.c | 14 ++++++++------
> >>>>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>>>>>> index 7ef849da8278..e866e6e5660b 100644
> >>>>>>> --- a/mm/memory-failure.c
> >>>>>>> +++ b/mm/memory-failure.c
> >>>>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
> >>>>>>>  	if (!sysctl_memory_failure_recovery)
> >>>>>>>  		panic("Memory failure on page %lx", pfn);
> >>>>>>>
> >>>>>>> -	if (!pfn_valid(pfn)) {
> >>>>>>> +	p = pfn_to_online_page(pfn);
> >>>>>>> +	if (!p) {
> >>>>>>> +		if (pfn_valid(pfn)) {
> >>>>>>> +			pgmap = get_dev_pagemap(pfn, NULL);
> >>>>>>> +			if (pgmap)
> >>>>>>> +				return memory_failure_dev_pagemap(pfn, flags,
> >>>>>>> +								  pgmap);
> >>>>>>> +		}
> >>>>>>>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
> >>>>>>>  			pfn);
> >>>>>>>  		return -ENXIO;
> >>>>>>
> >>>>>> Don't we need that earlier at hwpoison_inject level?
> >>>>>>
> >>>>>
> >>>>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn)
> >>>>> alone would not be sufficient as discussed. We would, again, have to
> >>>>> special-case ZONE_DEVICE via things like get_dev_pagemap() ...
> >>>>>
> >>>>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way:
> >>>>>
> >>>>> /*
> >>>>>  * Note that the below poison/unpoison interfaces do not involve
> >>>>>  * hardware status change, hence do not require hardware support.
> >>>>>  * They are mainly for testing hwpoison in software level.
> >>>>>  */
> >>>>>
> >>>>> So it's not that bad compared to memory_failure() called from real HW or
> >>>>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store()
> >>>>
> >>>> Yes, this is just a toy. And yes we need to handle zone device pages
> >>>> here because a) people likely want to test MCE behavior even on these
> >>>> pages and b) HW can really trigger MCEs there as well. I was just
> >>>> pointing that the patch is likely incomplete.
> >>>>
> >>>
> >>> I rather think this deserves a separate patch as it is a separate
> >>> interface :)
> >>>
> >>> I do wonder why hwpoison_inject() has to perform so much extra work
> >>> compared to other memory_failure() users. This smells like legacy
> >>> leftovers to me, but I might be wrong. The interface is fairly old,
> >>> though. Does anybody know why we need this magic? I can spot quite some
> >>> duplicate checks/things getting performed.
> >
> > It concerns me too, this *is* an old legacy code. I guess it was left as-is
> > because no one complained about it. That's not good, so I'll do some cleanup.
>
> Most of that stuff was introduced in
>
> commit 31d3d3484f9bd263925ecaa341500ac2df3a5d9b
> Author: Wu Fengguang
> Date: Wed Dec 16 12:19:59 2009 +0100
>
>     HWPOISON: limit hwpoison injector to known page types
>
>     __memory_failure()'s workflow is
>
>         set PG_hwpoison
>         //...
>         unset PG_hwpoison if didn't pass hwpoison filter
>
>     That could kill unrelated process if it happens to page fault on the
>     page with the (temporary) PG_hwpoison. The race should be big enough to
>     appear in stress tests.
>
>     Fix it by grabbing the page and checking filter at inject time. This
>     also avoids the very noisy "Injecting memory failure..." messages.
>
>
> Now, we still have the same "issue" in memory_failure() today:
>
>
> 	if (TestSetPageHWPoison(p)) {
> 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> 			pfn);
> 		return 0;
> 	}
> [...]
> 	if (hwpoison_filter(p)) {
> 		if (TestClearPageHWPoison(p))
> 			num_poisoned_pages_dec();
> 		unlock_page(p);
> 		put_hwpoison_page(p);
> 		return 0;
> 	}
>
> However, I don't understand why we need that special handling only for this
> debug interface and not the other users.
>
> I'd vote for ripping out that legacy crap (so the interface works correctly
> with ZONE_DEVICE) and instead (if really required) rework memory_failure()
> to not produce such side effects.

I do agree. The two should be really using the same code. My understanding
was that MADV_HWPOISON was there to test the actual MCE behavior (and
the man page seems to agree with that).

Oscar is working on a rewrite. Not sure he has considered this as well.
-- 
Michal Hocko
SUSE Labs
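
For readers following the thread, the ordering issue quoted above (set
PG_hwpoison first, clear it again if the filter rejects the page) can be
modelled in user space. This is only a minimal sketch, not kernel code:
struct page_model, observe(), poison_set_then_filter() and
poison_filter_then_set() are hypothetical names made up for illustration,
and a single-threaded counter stands in for a concurrent page fault. Only
the two orderings themselves are taken from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel type. */
struct page_model {
	bool hwpoison;		/* models PG_hwpoison */
	bool filtered_out;	/* models hwpoison_filter() rejecting the page */
	int seen_poisoned;	/* times an observer saw the flag set */
};

/* Models a page fault that happens to look at the flag right now. */
static void observe(struct page_model *p)
{
	if (p->hwpoison)
		p->seen_poisoned++;
}

/*
 * Ordering quoted from memory_failure(): set the flag, then undo it
 * if the filter rejects the page. Returns 1 if the page stays
 * poisoned, 0 if it was skipped.
 */
static int poison_set_then_filter(struct page_model *p)
{
	assert(p != NULL);
	p->hwpoison = true;
	observe(p);		/* window in which the temporary flag is visible */
	if (p->filtered_out) {
		p->hwpoison = false;
		return 0;
	}
	return 1;
}

/*
 * Ordering described in the quoted commit for the injector: check the
 * filter first, and only then set the flag.
 */
static int poison_filter_then_set(struct page_model *p)
{
	assert(p != NULL);
	if (p->filtered_out) {
		observe(p);	/* nothing to see: the flag was never set */
		return 0;
	}
	p->hwpoison = true;
	observe(p);
	return 1;
}
```

With a page the filter rejects, the first variant lets the observer see
the temporary flag once before it is cleared, while the second never
exposes it. That is the side effect the thread suggests removing from
memory_failure() itself rather than working around in the injector.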