From: Vlastimil Babka
To: Andrew Morton, Linus Torvalds
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-api@vger.kernel.org,
    Peter Zijlstra, Greg KH, Jann Horn, Vlastimil Babka, Jiri Kosina,
    Dominique Martinet, Andy Lutomirski, Dave Chinner, Kevin Easton,
    Matthew Wilcox, Cyril Hrubis, Tejun Heo, "Kirill A. Shutemov",
    Daniel Gruss, Jiri Kosina
Subject: [PATCH 3/3] mm/mincore: provide mapped status when cached status is not allowed
Date: Wed, 30 Jan 2019 13:44:20 +0100
Message-Id: <20190130124420.1834-4-vbabka@suse.cz>
In-Reply-To: <20190130124420.1834-1-vbabka@suse.cz>
References: <20190130124420.1834-1-vbabka@suse.cz>

After "mm/mincore: make mincore() more conservative" we sometimes restrict
the information about page cache residency, which we have to do without
breaking existing userspace, if possible. We thus fake the resulting values
as 1, which should be safer than faking them as 0, as there might
theoretically exist code that would try to fault in the page(s) until
mincore() returns 1. Faking 1, however, means that such code would not fault
in a page even if it was not in the page cache, with unwanted performance
implications.

We can improve the situation by revisiting the approach of 574823bfab82
("Change mincore() to count "mapped" pages rather than "cached" pages"), but
applying it only to cases where the page cache residency check is
restricted. Thus mincore() will return 0 for an unmapped page (which may or
may not be resident in the page cache), and 1 after the process faults it
in.

One potential downside is that mincore() will again be able to recognize
when a previously mapped page was reclaimed. While that might be useful for
some attack scenarios, it's not as crucial as recognizing that somebody else
faulted the page in, and there are also other ways to recognize reclaimed
pages anyway.

Cc: Jiri Kosina
Cc: Dominique Martinet
Cc: Andy Lutomirski
Cc: Dave Chinner
Cc: Kevin Easton
Cc: Matthew Wilcox
Cc: Cyril Hrubis
Cc: Tejun Heo
Cc: Kirill A. Shutemov
Cc: Daniel Gruss
Signed-off-by: Vlastimil Babka
---
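For illustration, here is a minimal userspace sketch (not part of this
patch) of the resulting semantics: when can_do_mincore() restricts the
residency check, mincore() reports 0 for a page this process has not yet
faulted in, and 1 afterwards. The file path is arbitrary, error handling
is abbreviated, and it assumes the caller cannot write to the file, so
the restricted path is taken.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	/* Arbitrary file the caller can read but not write. */
	int fd = open("/etc/hostname", O_RDONLY);
	unsigned char vec = 0;
	volatile char c;
	char *p;

	if (fd < 0)
		return 1;
	p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	mincore(p, page, &vec);
	printf("before fault: %d\n", vec & 1);	/* 0: not mapped here yet */

	c = p[0];	/* fault the page in */
	(void)c;

	mincore(p, page, &vec);
	printf("after fault:  %d\n", vec & 1);	/* 1: mapped now */

	munmap(p, page);
	close(fd);
	return 0;
}

Note that for a file the caller could write to, can_do_mincore() allows the
full check, so the first call may already report 1 whenever the page is
resident in the page cache.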
 mm/mincore.c | 49 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 747a4907a3ac..d6784a803ae7 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -21,12 +21,18 @@
 #include <asm/tlbflush.h>
 #include <asm/pgtable.h>
 
+struct mincore_walk_private {
+	unsigned char *vec;
+	bool can_check_pagecache;
+};
+
 static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
 	unsigned char present;
-	unsigned char *vec = walk->private;
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
 
 	/*
 	 * Hugepages under user process are always in RAM and never
@@ -35,7 +41,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 	present = pte && !huge_pte_none(huge_ptep_get(pte));
 	for (; addr != end; vec++, addr += PAGE_SIZE)
 		*vec = present;
-	walk->private = vec;
+	walk_private->vec = vec;
 #else
 	BUG();
 #endif
@@ -85,7 +91,8 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
 }
 
 static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
-				struct vm_area_struct *vma, unsigned char *vec)
+				struct vm_area_struct *vma, unsigned char *vec,
+				bool can_check_pagecache)
 {
 	unsigned long nr = (end - addr) >> PAGE_SHIFT;
 	int i;
@@ -95,7 +102,9 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 
 		pgoff = linear_page_index(vma, addr);
 		for (i = 0; i < nr; i++, pgoff++)
-			vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
+			vec[i] = can_check_pagecache ?
+				mincore_page(vma->vm_file->f_mapping, pgoff)
+				: 0;
 	} else {
 		for (i = 0; i < nr; i++)
 			vec[i] = 0;
@@ -106,8 +115,11 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 
 static int mincore_unmapped_range(unsigned long addr, unsigned long end,
 				   struct mm_walk *walk)
 {
-	walk->private += __mincore_unmapped_range(addr, end,
-					walk->vma, walk->private);
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
+
+	walk_private->vec += __mincore_unmapped_range(addr, end, walk->vma,
+				vec, walk_private->can_check_pagecache);
 	return 0;
 }
@@ -117,7 +129,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	spinlock_t *ptl;
 	struct vm_area_struct *vma = walk->vma;
 	pte_t *ptep;
-	unsigned char *vec = walk->private;
+	struct mincore_walk_private *walk_private = walk->private;
+	unsigned char *vec = walk_private->vec;
 	int nr = (end - addr) >> PAGE_SHIFT;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
@@ -128,7 +141,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	}
 
 	if (pmd_trans_unstable(pmd)) {
-		__mincore_unmapped_range(addr, end, vma, vec);
+		__mincore_unmapped_range(addr, end, vma, vec,
+				walk_private->can_check_pagecache);
 		goto out;
 	}
 
@@ -138,7 +152,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 		if (pte_none(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
-						 vma, vec);
+				 vma, vec, walk_private->can_check_pagecache);
 		else if (pte_present(pte))
 			*vec = 1;
 		else { /* pte is a swap entry */
@@ -152,8 +166,12 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 				*vec = 1;
 			} else {
 #ifdef CONFIG_SWAP
-				*vec = mincore_page(swap_address_space(entry),
+				if (walk_private->can_check_pagecache)
+					*vec = mincore_page(
+						swap_address_space(entry),
 						    swp_offset(entry));
+				else
+					*vec = 0;
 #else
 				WARN_ON(1);
 				*vec = 1;
@@ -187,22 +205,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 	struct vm_area_struct *vma;
 	unsigned long end;
 	int err;
+	struct mincore_walk_private walk_private = {
+		.vec = vec
+	};
 	struct mm_walk mincore_walk = {
 		.pmd_entry = mincore_pte_range,
 		.pte_hole = mincore_unmapped_range,
 		.hugetlb_entry = mincore_hugetlb,
-		.private = vec,
+		.private = &walk_private
 	};
 
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
-	if (!can_do_mincore(vma)) {
-		unsigned long pages = (end - addr) >> PAGE_SHIFT;
-		memset(vec, 1, pages);
-		return pages;
-	}
+	walk_private.can_check_pagecache = can_do_mincore(vma);
 	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)
-- 
2.20.1