From: Claudio Imbrenda
To: linux-next@vger.kernel.org, akpm@linux-foundation.org
Cc: borntraeger@de.ibm.com, david@redhat.com, aarcange@redhat.com,
    linux-mm@kvack.org, frankja@linux.ibm.com, sfr@canb.auug.org.au,
    jhubbard@nvidia.com, linux-kernel@vger.kernel.org,
    linux-s390@vger.kernel.org, Will Deacon
Subject: [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages
Date: Fri, 28 Feb 2020 16:43:22 +0100
Message-Id: <20200228154322.329228-4-imbrenda@linux.ibm.com>
In-Reply-To: <20200228154322.329228-1-imbrenda@linux.ibm.com>
References: <20200228154322.329228-1-imbrenda@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

With the introduction of protected KVM guests on s390 there is now a
concept of inaccessible pages. These pages need to be made accessible
before the host can access them.

While CPU accesses will trigger a fault that can be resolved, I/O
accesses will just fail. We need to add a callback into architecture
code for places that will do I/O, namely when writeback is started or
when a page reference is taken.

This is not only to enable paging, file backing, etc.; it is also
necessary to protect the host against a malicious user space. For
example, a bad QEMU could simply start direct I/O on such protected
memory. We do not want userspace to be able to trigger I/O errors, and
thus the logic is "whenever somebody accesses that page (gup) or does
I/O, make sure that this page can be accessed". When the guest tries to
access that page, we will wait in the page fault handler for writeback
to have finished and for the page_ref to be the expected value.

On s390x the function is not supposed to fail, so it is OK to use a
WARN_ON on failure. If we ever need more fine-grained handling, we can
tackle this when we know the details.

Signed-off-by: Claudio Imbrenda
Acked-by: Will Deacon
Reviewed-by: David Hildenbrand
Reviewed-by: Christian Borntraeger
Signed-off-by: Christian Borntraeger
---
 include/linux/gfp.h |  6 ++++++
 mm/gup.c            | 19 ++++++++++++++++---
 mm/page-writeback.c |  5 +++++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index e5b817cb86e7..be2754841369 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
 #ifndef HAVE_ARCH_ALLOC_PAGE
 static inline void arch_alloc_page(struct page *page, int order) { }
 #endif
+#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
+static inline int arch_make_page_accessible(struct page *page)
+{
+	return 0;
+}
+#endif
 
 struct page *
 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
diff --git a/mm/gup.c b/mm/gup.c
index 0b9a806898f3..86fff6e4e4f3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -391,6 +391,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t *ptep, pte;
+	int ret;
 
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
@@ -449,8 +450,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		if (is_zero_pfn(pte_pfn(pte))) {
 			page = pte_page(pte);
 		} else {
-			int ret;
-
 			ret = follow_pfn_pte(vma, address, ptep, flags);
 			page = ERR_PTR(ret);
 			goto out;
@@ -458,7 +457,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 
 	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
-		int ret;
 		get_page(page);
 		pte_unmap_unlock(ptep, ptl);
 		lock_page(page);
@@ -475,6 +473,14 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		page = ERR_PTR(-ENOMEM);
 		goto out;
 	}
+	if (flags & FOLL_PIN) {
+		ret = arch_make_page_accessible(page);
+		if (ret) {
+			unpin_user_page(page);
+			page = ERR_PTR(ret);
+			goto out;
+		}
+	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
 		    !pte_dirty(pte) && !PageDirty(page))
@@ -2143,6 +2149,13 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 
 		VM_BUG_ON_PAGE(compound_head(page) != head, page);
 
+		if (flags & FOLL_PIN) {
+			ret = arch_make_page_accessible(page);
+			if (ret) {
+				unpin_user_page(page);
+				goto pte_unmap;
+			}
+		}
 		SetPageReferenced(page);
 		pages[*nr] = page;
 		(*nr)++;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ab5a3cee8ad3..8384be5a2758 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2807,6 +2807,11 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
 		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 	}
 	unlock_page_memcg(page);
+	/*
+	 * If writeback has been triggered on a page that cannot be made
+	 * accessible, it is too late.
+	 */
+	WARN_ON(arch_make_page_accessible(page));
 	return ret;
 }
 
-- 
2.24.1
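
For readers who want to see the hook pattern outside the kernel tree, here is a minimal userspace sketch. It is not part of the patch and not the s390 implementation. The struct page layout, the fields inaccessible, export_fails and pincount, and the helper pin_page_sketch() are all hypothetical stand-ins. Only the names arch_make_page_accessible() and HAVE_ARCH_MAKE_PAGE_ACCESSIBLE come from the patch. The sketch shows an architecture overriding the default no-op, and a caller that takes a pin, calls the hook, and drops the pin again if the hook fails, mirroring the FOLL_PIN hunks above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	bool inaccessible;	/* hypothetical: host cannot do I/O on the page */
	bool export_fails;	/* hypothetical: forces the arch hook to fail */
	int pincount;		/* hypothetical: simplified pin counter */
};

/*
 * An architecture opts in by defining HAVE_ARCH_MAKE_PAGE_ACCESSIBLE and
 * providing its own hook. The body below is an assumption for illustration.
 */
#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static inline int arch_make_page_accessible(struct page *page)
{
	if (!page->inaccessible)
		return 0;		/* nothing to do */
	if (page->export_fails)
		return -EFAULT;		/* hypothetical failure mode */
	page->inaccessible = false;
	return 0;
}

/* Generic fallback, as in the gfp.h hunk: a no-op when no arch override exists. */
#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static inline int arch_make_page_accessible(struct page *page)
{
	return 0;
}
#endif

/*
 * Hypothetical caller modelled on the FOLL_PIN hunks: pin first, then ask
 * the architecture to make the page accessible, and undo the pin on error.
 */
static inline int pin_page_sketch(struct page *page)
{
	int ret;

	page->pincount++;
	ret = arch_make_page_accessible(page);
	if (ret) {
		page->pincount--;	/* stands in for unpin_user_page() */
		return ret;
	}
	return 0;
}
```

Calling pin_page_sketch() on a page with export_fails set returns the hook's error and leaves pincount unchanged. On success the page is pinned and no longer marked inaccessible. Removing the #define selects the generic no-op, which is what every architecture other than s390 would get with this patch.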