Subject: Re: [PATCH 3/3] mm/mincore: provide mapped status when cached status is not allowed
To: Michal Hocko
Cc: Andrew Morton, Linus Torvalds, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linux-api@vger.kernel.org, Peter Zijlstra, Greg KH,
    Jann Horn, Jiri Kosina, Dominique Martinet, Andy Lutomirski, Dave Chinner,
    Kevin Easton, Matthew Wilcox, Cyril Hrubis, Tejun Heo,
    "Kirill A. Shutemov", Daniel Gruss, Jiri Kosina, Josh Snyder
References: <20190130124420.1834-1-vbabka@suse.cz> <20190130124420.1834-4-vbabka@suse.cz> <20190131100907.GS18811@dhcp22.suse.cz>
From: Vlastimil Babka
Message-ID: <99ee4d3e-aeb2-0104-22be-b028938e7f88@suse.cz>
Date: Fri, 1 Feb 2019 10:04:23 +0100
In-Reply-To: <20190131100907.GS18811@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/31/19 11:09 AM, Michal Hocko wrote:
> On Wed 30-01-19 13:44:20, Vlastimil Babka wrote:
>> After "mm/mincore: make mincore() more conservative" we sometimes restrict the
>> information about page cache residency, which we have to do without breaking
>> existing userspace, if possible. We thus fake the resulting values as 1, which
>> should be safer than faking them as 0, as there might theoretically exist code
>> that would try to fault in the page(s) until mincore() returns 1.
>>
>> Faking 1 however means that such code would not fault in a page even if it was
>> not in the page cache, with unwanted performance implications. We can improve
>> the situation by revisiting the approach of 574823bfab82 ("Change mincore() to
>> count "mapped" pages rather than "cached" pages") but only applying it to cases
>> where the page cache residency check is restricted. Thus mincore() will return
>> 0 for an unmapped page (which may or may not be resident in the page cache),
>> and 1 after the process faults it in.
>>
>> One potential downside is that mincore() will be again able to recognize when a
>> previously mapped page was reclaimed. While that might be useful for some
>> attack scenarios, it's not as crucial as recognizing that somebody else
>> faulted the page in, and there are also other ways to recognize reclaimed
>> pages anyway.
>
> Is this really worth it? Do we know about any specific usecase that
> would benefit from this change? TBH I would rather wait for the report
> than add a hard to evaluate side channel.

Well, it's not that complicated IMHO. Linus said it's worth trying, so let's
see how he likes the result. The side channel exists anyway as long as a
process can e.g. check whether its RSS shrank, and I doubt we are going to
remove that possibility.

Also CC Josh Snyder, whom I forgot originally, and keeping the rest of the
mail for reference.

>> Cc: Jiri Kosina
>> Cc: Dominique Martinet
>> Cc: Andy Lutomirski
>> Cc: Dave Chinner
>> Cc: Kevin Easton
>> Cc: Matthew Wilcox
>> Cc: Cyril Hrubis
>> Cc: Tejun Heo
>> Cc: Kirill A. Shutemov
>> Cc: Daniel Gruss
>> Signed-off-by: Vlastimil Babka
>> ---
>>  mm/mincore.c | 49 +++++++++++++++++++++++++++++++++----------------
>>  1 file changed, 33 insertions(+), 16 deletions(-)
>>
>> diff --git a/mm/mincore.c b/mm/mincore.c
>> index 747a4907a3ac..d6784a803ae7 100644
>> --- a/mm/mincore.c
>> +++ b/mm/mincore.c
>> @@ -21,12 +21,18 @@
>>  #include
>>  #include
>>
>> +struct mincore_walk_private {
>> +	unsigned char *vec;
>> +	bool can_check_pagecache;
>> +};
>> +
>>  static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
>>  			unsigned long end, struct mm_walk *walk)
>>  {
>>  #ifdef CONFIG_HUGETLB_PAGE
>>  	unsigned char present;
>> -	unsigned char *vec = walk->private;
>> +	struct mincore_walk_private *walk_private = walk->private;
>> +	unsigned char *vec = walk_private->vec;
>>
>>  	/*
>>  	 * Hugepages under user process are always in RAM and never
>> @@ -35,7 +41,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
>>  	present = pte && !huge_pte_none(huge_ptep_get(pte));
>>  	for (; addr != end; vec++, addr += PAGE_SIZE)
>>  		*vec = present;
>> -	walk->private = vec;
>> +	walk_private->vec = vec;
>>  #else
>>  	BUG();
>>  #endif
>> @@ -85,7 +91,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
>>  }
>>
>>  static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
>> -			struct vm_area_struct *vma, unsigned char *vec)
>> +			struct vm_area_struct *vma, unsigned char *vec,
>> +			bool can_check_pagecache)
>>  {
>>  	unsigned long nr = (end - addr) >> PAGE_SHIFT;
>>  	int i;
>> @@ -95,7 +102,9 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
>>
>>  		pgoff = linear_page_index(vma, addr);
>>  		for (i = 0; i < nr; i++, pgoff++)
>> -			vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
>> +			vec[i] = can_check_pagecache ?
>> +				mincore_page(vma->vm_file->f_mapping, pgoff)
>> +				: 0;
>>  	} else {
>>  		for (i = 0; i < nr; i++)
>>  			vec[i] = 0;
>> @@ -106,8 +115,11 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
>>  static int mincore_unmapped_range(unsigned long addr, unsigned long end,
>>  			struct mm_walk *walk)
>>  {
>> -	walk->private += __mincore_unmapped_range(addr, end,
>> -					walk->vma, walk->private);
>> +	struct mincore_walk_private *walk_private = walk->private;
>> +	unsigned char *vec = walk_private->vec;
>> +
>> +	walk_private->vec += __mincore_unmapped_range(addr, end, walk->vma,
>> +				vec, walk_private->can_check_pagecache);
>>  	return 0;
>>  }
>>
>> @@ -117,7 +129,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>  	spinlock_t *ptl;
>>  	struct vm_area_struct *vma = walk->vma;
>>  	pte_t *ptep;
>> -	unsigned char *vec = walk->private;
>> +	struct mincore_walk_private *walk_private = walk->private;
>> +	unsigned char *vec = walk_private->vec;
>>  	int nr = (end - addr) >> PAGE_SHIFT;
>>
>>  	ptl = pmd_trans_huge_lock(pmd, vma);
>> @@ -128,7 +141,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>  	}
>>
>>  	if (pmd_trans_unstable(pmd)) {
>> -		__mincore_unmapped_range(addr, end, vma, vec);
>> +		__mincore_unmapped_range(addr, end, vma, vec,
>> +				walk_private->can_check_pagecache);
>>  		goto out;
>>  	}
>>
>> @@ -138,7 +152,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>
>>  		if (pte_none(pte))
>>  			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
>> -						 vma, vec);
>> +					vma, vec, walk_private->can_check_pagecache);
>>  		else if (pte_present(pte))
>>  			*vec = 1;
>>  		else { /* pte is a swap entry */
>> @@ -152,8 +166,12 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>  			*vec = 1;
>>  		} else {
>>  #ifdef CONFIG_SWAP
>> -			*vec = mincore_page(swap_address_space(entry),
>> +			if (walk_private->can_check_pagecache)
>> +				*vec = mincore_page(
>> +						swap_address_space(entry),
>>  						swp_offset(entry));
>> +			else
>> +				*vec = 0;
>>  #else
>>  			WARN_ON(1);
>>  			*vec = 1;
>> @@ -187,22 +205,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
>>  	struct vm_area_struct *vma;
>>  	unsigned long end;
>>  	int err;
>> +	struct mincore_walk_private walk_private = {
>> +		.vec = vec
>> +	};
>>  	struct mm_walk mincore_walk = {
>>  		.pmd_entry = mincore_pte_range,
>>  		.pte_hole = mincore_unmapped_range,
>>  		.hugetlb_entry = mincore_hugetlb,
>> -		.private = vec,
>> +		.private = &walk_private
>>  	};
>>
>>  	vma = find_vma(current->mm, addr);
>>  	if (!vma || addr < vma->vm_start)
>>  		return -ENOMEM;
>>  	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
>> -	if (!can_do_mincore(vma)) {
>> -		unsigned long pages = (end - addr) >> PAGE_SHIFT;
>> -		memset(vec, 1, pages);
>> -		return pages;
>> -	}
>> +	walk_private.can_check_pagecache = can_do_mincore(vma);
>>  	mincore_walk.mm = vma->vm_mm;
>>  	err = walk_page_range(addr, end, &mincore_walk);
>>  	if (err < 0)
>> --
>> 2.20.1
>