Date: Fri, 19 Jan 2018 14:49:17 +0300
From: "Kirill A. Shutemov"
To: Michal Hocko
Cc: Dave Hansen, Tetsuo Handa, torvalds@linux-foundation.org,
 kirill.shutemov@linux.intel.com, akpm@linux-foundation.org,
 hannes@cmpxchg.org, iamjoonsoo.kim@lge.com, mgorman@techsingularity.net,
 tony.luck@intel.com, vbabka@suse.cz, aarcange@redhat.com,
 hillf.zj@alibaba-inc.com, hughd@google.com, oleg@redhat.com,
 peterz@infradead.org, riel@redhat.com, srikar@linux.vnet.ibm.com,
 vdavydov.dev@gmail.com, mingo@kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, x86@kernel.org
Subject: Re: [mm 4.15-rc8] Random oopses under memory pressure.
Message-ID: <20180119114917.rvghcgexgbm73xkq@node.shutemov.name>
References: <201801170233.JDG21842.OFOJMQSHtOFFLV@I-love.SAKURA.ne.jp>
 <201801172008.CHH39543.FFtMHOOVSQJLFO@I-love.SAKURA.ne.jp>
 <201801181712.BFD13039.LtHOSVMFJQFOFO@I-love.SAKURA.ne.jp>
 <20180118122550.2lhsjx7hg5drcjo4@node.shutemov.name>
 <20180118154026.jzdgdhkcxiliaulp@node.shutemov.name>
 <20180118172213.GI6584@dhcp22.suse.cz>
 <20180119100259.rwq3evikkemtv7q5@node.shutemov.name>
 <20180119103342.GS6584@dhcp22.suse.cz>
In-Reply-To: <20180119103342.GS6584@dhcp22.suse.cz>
User-Agent: NeoMutt/20171215

On Fri, Jan 19, 2018 at 11:33:42AM +0100, Michal Hocko wrote:
> On Fri 19-01-18 13:02:59, Kirill A. Shutemov wrote:
> > On Thu, Jan 18, 2018 at 06:22:13PM +0100, Michal Hocko wrote:
> > > On Thu 18-01-18 18:40:26, Kirill A. Shutemov wrote:
> > > [...]
> > > > +	/*
> > > > +	 * Make sure that pages are in the same section before doing
> > > > +	 * pointer arithmetics.
> > > > +	 */
> > > > +	if (page_to_section(pvmw->page) != page_to_section(page))
> > > > +		return false;
> > >
> > > OK, THPs shouldn't cross memory sections AFAIK. My brain is in
> > > meltdown these days so this might be a completely stupid question.
> > > But why don't you simply compare pfns? This would be just simpler, no?
> >
> > In the original code, we already had pvmw->page around and I thought it
> > would be easier to get the page for the pte instead of looking up the
> > pfn on both sides.
> >
> > With these changes it's no longer the case.
> >
> > Do you care enough to send a patch? :)
>
> Well, memory sections are a sparsemem concept IIRC. Unless I've missed
> something, page_to_section() is guarded by SECTION_IN_PAGE_FLAGS and
> that is conditional on CONFIG_SPARSEMEM.
> THP is generic code, so using it there is wrong unless I miss some
> subtle detail here.
>
> Comparing pfns should be generic enough.

Good point. What about something like this?

From 861f68c555b87fd6c0ccc3428ace91b7e185b73a Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov"
Date: Thu, 18 Jan 2018 18:24:07 +0300
Subject: [PATCH] mm, page_vma_mapped: Drop faulty pointer arithmetics in
 check_pte()

Tetsuo reported random crashes under memory pressure on a 32-bit x86
system and tracked them down to the change that introduced
page_vma_mapped_walk().

The root cause of the issue is the faulty pointer math in check_pte().
As ->pte may point to an arbitrary page, we have to check that the two
pages belong to the same section before doing the math. Otherwise it
may lead to weird results.

It wasn't noticed until now because mem_map[] is virtually contiguous
on flatmem and on vmemmap sparsemem: pointer arithmetic just works
against all 'struct page' pointers there. But with classic sparsemem
it doesn't.

Let's restructure the code a bit and replace the pointer arithmetic
with operations on pfns.

Signed-off-by: Kirill A. Shutemov
Reported-by: Tetsuo Handa
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Cc: stable@vger.kernel.org
---
 include/linux/swapops.h | 21 ++++++++++++++++++
 mm/page_vma_mapped.c    | 59 +++++++++++++++++++++++++++++++------------------
 2 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 9c5a2628d6ce..1d3877c39a00 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -124,6 +124,11 @@ static inline bool is_write_device_private_entry(swp_entry_t entry)
 	return unlikely(swp_type(entry) == SWP_DEVICE_WRITE);
 }
 
+static inline unsigned long device_private_entry_to_pfn(swp_entry_t entry)
+{
+	return swp_offset(entry);
+}
+
 static inline struct page *device_private_entry_to_page(swp_entry_t entry)
 {
 	return pfn_to_page(swp_offset(entry));
@@ -154,6 +159,11 @@ static inline bool is_write_device_private_entry(swp_entry_t entry)
 	return false;
 }
 
+static inline unsigned long device_private_entry_to_pfn(swp_entry_t entry)
+{
+	return 0;
+}
+
 static inline struct page *device_private_entry_to_page(swp_entry_t entry)
 {
 	return NULL;
@@ -189,6 +199,11 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 	return unlikely(swp_type(entry) == SWP_MIGRATION_WRITE);
 }
 
+static inline unsigned long migration_entry_to_pfn(swp_entry_t entry)
+{
+	return swp_offset(entry);
+}
+
 static inline struct page *migration_entry_to_page(swp_entry_t entry)
 {
 	struct page *p = pfn_to_page(swp_offset(entry));
@@ -218,6 +233,12 @@ static inline int is_migration_entry(swp_entry_t swp)
 {
 	return 0;
 }
+
+static inline unsigned long migration_entry_to_pfn(swp_entry_t entry)
+{
+	return 0;
+}
+
 static inline struct page *migration_entry_to_page(swp_entry_t entry)
 {
 	return NULL;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index d22b84310f6d..072ae9bc5fee 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -30,8 +30,28 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 	return true;
 }
 
+/**
+ * check_pte - check if @pvmw->page is mapped at the @pvmw->pte
+ *
+ * page_vma_mapped_walk() found a place where @pvmw->page is *potentially*
+ * mapped. check_pte() has to validate this.
+ *
+ * @pvmw->pte may point to an empty PTE, a swap PTE or a PTE pointing to an
+ * arbitrary page.
+ *
+ * If the PVMW_MIGRATION flag is set, returns true if @pvmw->pte contains a
+ * migration entry that points to @pvmw->page or any subpage in case of THP.
+ *
+ * If the PVMW_MIGRATION flag is not set, returns true if @pvmw->pte points
+ * to @pvmw->page or any subpage in case of THP.
+ *
+ * Otherwise, returns false.
+ */
 static bool check_pte(struct page_vma_mapped_walk *pvmw)
 {
+	unsigned long pfn;
+
 	if (pvmw->flags & PVMW_MIGRATION) {
 #ifdef CONFIG_MIGRATION
 		swp_entry_t entry;
@@ -41,37 +61,34 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 		if (!is_migration_entry(entry))
 			return false;
-		if (migration_entry_to_page(entry) - pvmw->page >=
-				hpage_nr_pages(pvmw->page)) {
-			return false;
-		}
-		if (migration_entry_to_page(entry) < pvmw->page)
-			return false;
+
+		pfn = migration_entry_to_pfn(entry);
 #else
 		WARN_ON_ONCE(1);
 #endif
-	} else {
-		if (is_swap_pte(*pvmw->pte)) {
-			swp_entry_t entry;
+	} else if (is_swap_pte(*pvmw->pte)) {
+		swp_entry_t entry;
 
-			entry = pte_to_swp_entry(*pvmw->pte);
-			if (is_device_private_entry(entry) &&
-			    device_private_entry_to_page(entry) == pvmw->page)
-				return true;
-		}
+		/* Handle un-addressable ZONE_DEVICE memory */
+		entry = pte_to_swp_entry(*pvmw->pte);
+		if (!is_device_private_entry(entry))
+			return false;
 
+		pfn = device_private_entry_to_pfn(entry);
+	} else {
 		if (!pte_present(*pvmw->pte))
 			return false;
 
-		/* THP can be referenced by any subpage */
-		if (pte_page(*pvmw->pte) - pvmw->page >=
-				hpage_nr_pages(pvmw->page)) {
-			return false;
-		}
-		if (pte_page(*pvmw->pte) < pvmw->page)
-			return false;
+		pfn = pte_pfn(*pvmw->pte);
 	}
 
+	if (pfn < page_to_pfn(pvmw->page))
+		return false;
+
+	/* THP can be referenced by any subpage */
+	if (pfn - page_to_pfn(pvmw->page) >= hpage_nr_pages(pvmw->page))
+		return false;
+
 	return true;
 }
-- 
 Kirill A. Shutemov