Date: Tue, 14 Apr 2020 18:03:00 +0200
From: Claudio Imbrenda
To: Dave Hansen
Cc: linux-next@vger.kernel.org, akpm@linux-foundation.org, jack@suse.cz,
    kirill@shutemov.name, borntraeger@de.ibm.com, david@redhat.com,
    aarcange@redhat.com, linux-mm@kvack.org, frankja@linux.ibm.com,
    sfr@canb.auug.org.au, jhubbard@nvidia.com,
    linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    Will Deacon, Sean Christopherson
Subject: Re: [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages
In-Reply-To: <11dc928d-60b4-f04f-1ebf-f4cffb337a6c@intel.com>
References: <20200306132537.783769-1-imbrenda@linux.ibm.com>
    <20200306132537.783769-3-imbrenda@linux.ibm.com>
    <11dc928d-60b4-f04f-1ebf-f4cffb337a6c@intel.com>
Organization: IBM
Message-Id: <20200414180300.52640444@p-imbrenda>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 13 Apr 2020 13:22:24 -0700
Dave Hansen wrote:

> On 3/6/20 5:25 AM, Claudio Imbrenda wrote:
> > On s390x the function is not supposed to fail, so it is ok to use a
> > WARN_ON on failure. If we ever need some more finegrained handling
> > we can tackle this when we know the details.
>
> Could you explain a bit why the function can't fail?

the concept of "making accessible" is only to make sure that accessing
the page will not trigger faults or I/O or DMA errors. in general it
does not mean freely accessing the content of the page in cleartext.

on s390x, protected guest pages can be shared. the guest has to
actively share its pages, and in that case those pages are both part of
the protected VM and freely accessible by the host.

pages that are not shared cannot be accessed by the host.

in our case "making the page accessible" means:

 - if the page was shared, make sure it stays shared
 - if the page was not shared, first encrypt it and then make it
   accessible to the host (both operations performed securely and
   atomically by the hardware)

then the page can be swapped out, or used for direct I/O (obviously if
you do I/O on a page that was not shared, you cannot expect good things
to happen, since you basically corrupt the memory of the guest).

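to make the two cases above a bit more concrete, here is a small
user-space model. it is only a sketch: the state names, struct
model_page, model_make_page_accessible() and the XOR "encryption" are
all hypothetical and are not the s390x implementation, where the
hardware performs the conversion. treating an already converted page as
a no-op is also an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page states for this model only. */
enum page_state {
    PAGE_SHARED,    /* guest shared it: host may access the content */
    PAGE_PROTECTED, /* guest-private cleartext: host I/O would fail */
    PAGE_EXPORTED,  /* guest-private but encrypted: safe for host I/O */
};

struct model_page {
    enum page_state state;
    uint8_t data[8];
};

/* Stand-in for the hardware encryption; XOR is for illustration only. */
static void model_encrypt(struct model_page *p)
{
    for (size_t i = 0; i < sizeof p->data; i++)
        p->data[i] ^= 0xA5;
}

/*
 * Model of "making the page accessible":
 *  - a shared page stays shared and is left untouched
 *  - a protected page is encrypted first, then exposed to the host
 * In this model the operation cannot fail, so it always returns 0.
 */
static int model_make_page_accessible(struct model_page *p)
{
    switch (p->state) {
    case PAGE_SHARED:
        break;
    case PAGE_PROTECTED:
        model_encrypt(p);
        p->state = PAGE_EXPORTED;
        break;
    case PAGE_EXPORTED:
        break; /* assumption: already converted, nothing to do */
    }
    return 0;
}

/* Host I/O is safe on anything except a still-protected page. */
static bool model_host_io_safe(const struct model_page *p)
{
    return p->state != PAGE_PROTECTED;
}
```

note that after the conversion the host sees only the encrypted
content, which matches the point that "accessible" does not mean
"readable in cleartext".
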
on s390x performing I/O directly on protected pages results in (in
practice) unrecoverable I/O errors, so we want to avoid it at all costs.

accessing protected pages from the CPU triggers an exception that can
be handled (and we do handle it, in fact)

now imagine a buggy or malicious qemu process crashing the whole machine
just because it did I/O to/from a protected page. we clearly don't want
that.

> If the guest has secret data in the page, then it *can* and does fail.

no, that's the whole point of this mechanism.

in fact, most of the guest pages will be "secret data", only the few
pages used for guest I/O bounce buffers will be shared with the host

> It won't fail, though, if the host and guest agree on whether the page
> is protected.
>
> Right?
>
> > @@ -2807,6 +2807,13 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
> >  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
> >  	}
> >  	unlock_page_memcg(page);
> > +	access_ret = arch_make_page_accessible(page);
> > +	/*
> > +	 * If writeback has been triggered on a page that cannot be made
> > +	 * accessible, it is too late to recover here.
> > +	 */
> > +	VM_BUG_ON_PAGE(access_ret != 0, page);
> > +
> >  	return ret;
> >
> >  }
>
> This seems like a really odd place to do this.  Writeback is specific
> to block I/O.  I would have thought there were other kinds of devices
> that matter, not just block devices.

well, yes and no. for writeback (block I/O and swap) this is the right
place. at this point we know that the page is present and nobody else
has started doing I/O yet, and I/O will happen soon-ish. so we make the
page accessible. there is no turning back here, unlike pinning. we are
not allowed to fail, we can't

regarding the other kinds of devices: yes, they will use pinning, which
is covered by the rest of the patch.

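the difference between the two call sites can be sketched in user space
like this. everything here is hypothetical (struct io_page, the model_*
functions, the choice of -EFAULT, and a counter standing in for
VM_BUG_ON_PAGE); it only illustrates that the pinning path can hand an
error back to its caller, while the writeback path has already
committed and can only flag a bug.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct io_page {
    bool convertible; /* model knob: can the arch make it accessible? */
    bool accessible;
    bool writeback;
    int pins;
};

/* Counts how often the model's stand-in for VM_BUG_ON_PAGE fired. */
static int model_bug_count;

/* Hypothetical arch hook: 0 on success, negative errno on failure. */
static int model_make_accessible(struct io_page *p)
{
    if (!p->convertible)
        return -EFAULT; /* arbitrary error code for the model */
    p->accessible = true;
    return 0;
}

/* Pinning path: failure is reported and the page is not pinned. */
static int model_pin(struct io_page *p)
{
    int ret = model_make_accessible(p);

    if (ret)
        return ret;
    p->pins++;
    return 0;
}

/*
 * Writeback path: the flag is set before the hook runs, so there is
 * nothing to undo on failure; the model just records the bug.
 * Returns the previous writeback flag, like "ret" in the quoted hunk.
 */
static bool model_test_set_writeback(struct io_page *p)
{
    bool was_set = p->writeback;

    p->writeback = true;
    if (model_make_accessible(p) != 0)
        model_bug_count++;
    return was_set;
}
```

with a page that cannot be converted, model_pin() returns the error and
leaves the pin count at zero, whereas model_test_set_writeback() still
marks the page and bumps the bug counter.
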
the semantics of get page and pin page (if the documentation has not
changed meanwhile) is that the traditional get_page is used for when
the page is needed but not its content, and pin_page is used when the
content of the page is accessed.

since the only issue here is accessing the content of the page, we
don't need to make it accessible for get_page, but only for pin_page.

get_page and pin_page are allowed to fail, so in this case we return an
error code, so other architectures can potentially abort the pinning if
needed. on s390x we will never fail, for the same reasons written above.

> Also, this patch seems odd that it only does the
> arch_make_page_accessible() half.  Where's the other half where the
> page is made inaccessible?

that is very arch-specific. for s390x, you can look at this patch and
the ones immediately before/after: 214d9bbcd3a67230b932f6ce

> I assume it's OK to "leak" things like this, it's just not clear to me
> _why_ it's OK.

nothing is being leaked :)

I hope I clarified a little how this works on s390x :)

feel free to poke me again if some things are still unclear

best regards,

Claudio Imbrenda