Date: Thu, 18 Jan 2018 18:40:26 +0300
From: "Kirill A. Shutemov"
To: Dave Hansen
Cc: Tetsuo Handa, torvalds@linux-foundation.org,
	kirill.shutemov@linux.intel.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, iamjoonsoo.kim@lge.com,
	mgorman@techsingularity.net, tony.luck@intel.com, vbabka@suse.cz,
	mhocko@kernel.org, aarcange@redhat.com, hillf.zj@alibaba-inc.com,
	hughd@google.com, oleg@redhat.com, peterz@infradead.org,
	riel@redhat.com, srikar@linux.vnet.ibm.com, vdavydov.dev@gmail.com,
	mingo@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	x86@kernel.org
Subject: Re: [mm 4.15-rc8] Random oopses under memory pressure.
Message-ID: <20180118154026.jzdgdhkcxiliaulp@node.shutemov.name>
References: <201801160115.w0G1FOIG057203@www262.sakura.ne.jp>
	<201801170233.JDG21842.OFOJMQSHtOFFLV@I-love.SAKURA.ne.jp>
	<201801172008.CHH39543.FFtMHOOVSQJLFO@I-love.SAKURA.ne.jp>
	<201801181712.BFD13039.LtHOSVMFJQFOFO@I-love.SAKURA.ne.jp>
	<20180118122550.2lhsjx7hg5drcjo4@node.shutemov.name>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: NeoMutt/20171215
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 18, 2018 at 06:45:00AM -0800, Dave Hansen wrote:
> On 01/18/2018 04:25 AM, Kirill A. Shutemov wrote:
> > [ 10.084024] diff: -858690919
> > [ 10.084258] hpage_nr_pages: 1
> > [ 10.084386] check1: 0
> > [ 10.084478] check2: 0
> ...
> > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > index d22b84310f6d..57b4397f1ea5 100644
> > --- a/mm/page_vma_mapped.c
> > +++ b/mm/page_vma_mapped.c
> > @@ -70,6 +70,14 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
> >  		}
> >  		if (pte_page(*pvmw->pte) < pvmw->page)
> >  			return false;
> > +
> > +		if (pte_page(*pvmw->pte) - pvmw->page) {
> > +			printk("diff: %d\n", pte_page(*pvmw->pte) - pvmw->page);
> > +			printk("hpage_nr_pages: %d\n", hpage_nr_pages(pvmw->page));
> > +			printk("check1: %d\n", pte_page(*pvmw->pte) - pvmw->page < 0);
> > +			printk("check2: %d\n", pte_page(*pvmw->pte) - pvmw->page >= hpage_nr_pages(pvmw->page));
> > +			BUG();
> > +		}
>
> This says that pte_page(*pvmw->pte) and pvmw->page are roughly 4GB away
> from each other (858690919*4=0xccba559c0). That's not the compiler
> being wonky, it just means that the virtual addresses of the memory
> sections are that far apart.
>
> This won't happen when you have vmemmap or flatmem because the mem_map[]
> is virtually contiguous and pointer arithmetic just works against all
> 'struct page' pointers. But with classic sparsemem, it doesn't.
>
> You need to make sure that the PFNs are in the same section before you
> can do the math that you want to do here.

Something like this?

From 251e124630da82482e8b320c73162ce89af04d5d Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov"
Date: Thu, 18 Jan 2018 18:24:07 +0300
Subject: [PATCH] mm, page_vma_mapped: Fix pointer arithmetics in check_pte()

Tetsuo reported random crashes under memory pressure on a 32-bit x86
system and tracked them down to the change that introduced
page_vma_mapped_walk().

The root cause of the issue is the faulty pointer math in check_pte().
As ->pte may point to an arbitrary page, we have to check that both pages
belong to the same section before doing the math. Otherwise it may lead
to weird results.

It wasn't noticed until now because mem_map[] is virtually contiguous on
flatmem and on vmemmap sparsemem, so pointer arithmetic just works against
all 'struct page' pointers. But with classic sparsemem, it doesn't.

Let's restructure the code a bit and add the necessary check.

Signed-off-by: Kirill A. Shutemov
Reported-by: Tetsuo Handa
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Cc: stable@vger.kernel.org
---
 mm/page_vma_mapped.c | 66 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 21 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index d22b84310f6d..de195dcdfbd8 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -30,8 +30,28 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 	return true;
 }
 
+/**
+ * check_pte - check if @pvmw->page is mapped at the @pvmw->pte
+ *
+ * page_vma_mapped_walk() found a place where @pvmw->page is *potentially*
+ * mapped. check_pte() has to validate this.
+ *
+ * @pvmw->pte may point to empty PTE, swap PTE or PTE pointing to arbitrary
+ * page.
+ *
+ * If PVMW_MIGRATION flag is set, returns true if @pvmw->pte contains migration
+ * entry that points to @pvmw->page or any subpage in case of THP.
+ *
+ * If PVMW_MIGRATION flag is not set, returns true if @pvmw->pte points to
+ * @pvmw->page or any subpage in case of THP.
+ *
+ * Otherwise, return false.
+ *
+ */
 static bool check_pte(struct page_vma_mapped_walk *pvmw)
 {
+	struct page *page;
+
 	if (pvmw->flags & PVMW_MIGRATION) {
 #ifdef CONFIG_MIGRATION
 		swp_entry_t entry;
@@ -41,37 +61,41 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 
 		if (!is_migration_entry(entry))
 			return false;
-		if (migration_entry_to_page(entry) - pvmw->page >=
-				hpage_nr_pages(pvmw->page)) {
-			return false;
-		}
-		if (migration_entry_to_page(entry) < pvmw->page)
-			return false;
+
+		page = migration_entry_to_page(entry);
 #else
 		WARN_ON_ONCE(1);
 #endif
-	} else {
-		if (is_swap_pte(*pvmw->pte)) {
-			swp_entry_t entry;
+	} else if (is_swap_pte(*pvmw->pte)) {
+		swp_entry_t entry;
 
-			entry = pte_to_swp_entry(*pvmw->pte);
-			if (is_device_private_entry(entry) &&
-			    device_private_entry_to_page(entry) == pvmw->page)
-				return true;
-		}
+		/* Handle un-addressable ZONE_DEVICE memory */
+		entry = pte_to_swp_entry(*pvmw->pte);
+		if (!is_device_private_entry(entry))
+			return false;
 
+		page = device_private_entry_to_page(entry);
+	} else {
 		if (!pte_present(*pvmw->pte))
 			return false;
 
-		/* THP can be referenced by any subpage */
-		if (pte_page(*pvmw->pte) - pvmw->page >=
-				hpage_nr_pages(pvmw->page)) {
-			return false;
-		}
-		if (pte_page(*pvmw->pte) < pvmw->page)
-			return false;
+		page = pte_page(*pvmw->pte);
 	}
 
+	/*
+	 * Make sure that pages are in the same section before doing pointer
+	 * arithmetics.
+	 */
+	if (page_to_section(pvmw->page) != page_to_section(page))
+		return false;
+
+	if (page < pvmw->page)
+		return false;
+
+	/* THP can be referenced by any subpage */
+	if (page - pvmw->page >= hpage_nr_pages(pvmw->page))
+		return false;
+
 	return true;
 }
 
-- 
 Kirill A. Shutemov
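
For anyone following the thread, the final checks in the patched check_pte() can be modelled in userspace. This is only a sketch under stated assumptions: struct toy_page, toy_section0, toy_section1 and toy_page_in_range() are hypothetical names that do not exist in the kernel, and a plain "section" field stands in for page_to_section(). What it tries to show is the ordering: the section comparison comes first, so the pointer comparison and subtraction only ever run on descriptors that live in the same array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's struct page. Only the section
 * number is modelled; the real structure is much larger.
 */
struct toy_page {
	unsigned long section;
};

/*
 * Two sections whose descriptor arrays are separate objects, loosely
 * mimicking classic sparsemem, where each section has its own mem_map.
 */
struct toy_page toy_section0[4] = { {0}, {0}, {0}, {0} };
struct toy_page toy_section1[4] = { {1}, {1}, {1}, {1} };

/*
 * Mirrors the tail of the patched check_pte(): returns true only if
 * @page is @base or one of the next @nr_pages - 1 descriptors.
 */
bool toy_page_in_range(const struct toy_page *base,
		       const struct toy_page *page, ptrdiff_t nr_pages)
{
	assert(nr_pages > 0);

	/* Different sections: pointer math would be meaningless, bail out. */
	if (base->section != page->section)
		return false;

	/* Same array from here on, so comparing and subtracting is fine. */
	if (page < base)
		return false;

	/* A compound page can be referenced by any of its subpages. */
	if (page - base >= nr_pages)
		return false;

	return true;
}
```

With these toy arrays, toy_page_in_range(&toy_section0[0], &toy_section1[0], 4) is rejected by the section check before any pointer arithmetic happens, which is the case the original code got wrong on classic sparsemem.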