Date: Thu, 3 Dec 2020 10:43:22 -0500
From: Peter Xu
To: Stefan Hajnoczi
Cc: Justin He, Alex Williamson, Cornelia Huck, "kvm@vger.kernel.org",
    "linux-kernel@vger.kernel.org", David Hildenbrand
Subject: Re: [PATCH] vfio iommu type1: Bypass the vma permission check in vfio_pin_pages_remote()
Message-ID: <20201203154322.GH108496@xz-x1>
References: <20201119142737.17574-1-justin.he@arm.com>
    <20201124181228.GA276043@xz-x1>
    <20201125155711.GA6489@xz-x1>
    <20201202143356.GK655829@stefanha-x1.localdomain>
    <20201202154511.GI3277@xz-x1>
    <20201203112002.GE689053@stefanha-x1.localdomain>
In-Reply-To: <20201203112002.GE689053@stefanha-x1.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 03, 2020 at 11:20:02AM +0000, Stefan Hajnoczi wrote:
> On Wed, Dec 02, 2020 at 10:45:11AM -0500, Peter Xu
> wrote:
> > On Wed, Dec 02, 2020 at 02:33:56PM +0000, Stefan Hajnoczi wrote:
> > > On Wed, Nov 25, 2020 at 10:57:11AM -0500, Peter Xu wrote:
> > > > On Wed, Nov 25, 2020 at 01:05:25AM +0000, Justin He wrote:
> > > > > > I'd appreciate if you could explain why vfio needs to dma map some
> > > > > > PROT_NONE
> > > > >
> > > > > Virtiofs will map a PROT_NONE cache window region firstly, then remap the sub
> > > > > region of that cache window with read or write permission. I guess this might
> > > > > be a security concern. Just CC virtiofs expert Stefan to answer it more accurately.
> > > >
> > > > Yep.  Since my previous sentence was cut off, I'll rephrase: I was thinking
> > > > whether qemu can do vfio maps only until it remaps the PROT_NONE regions into
> > > > PROT_READ|PROT_WRITE ones, rather than trying to map dma pages upon PROT_NONE.
> > >
> > > Userspace processes sometimes use PROT_NONE to reserve virtual address
> > > space. That way future mmap(NULL, ...) calls will not accidentally
> > > allocate an address from the reserved range.
> > >
> > > virtio-fs needs to do this because the DAX window mappings change at
> > > runtime. Initially the entire DAX window is just reserved using
> > > PROT_NONE. When it's time to mmap a portion of a file into the DAX
> > > window an mmap(fixed_addr, ...) call will be made.
> >
> > Yes I can understand the rationale on why the region is reserved.  However IMHO
> > the real question is why such reservation behavior should affect qemu memory
> > layout, and even further to VFIO mappings.
> >
> > Note that PROT_NONE should likely mean that there's no backing page at all in
> > this case.  Since vfio will pin all the pages before mapping the DMAs, it also
> > means that it's at least inefficient, because when we try to map all the
> > PROT_NONE pages we'll try to fault in every single page of it, even if they may
> > not ever be used.
> >
> > So I still think this patch is not doing the right thing.
> > Instead we should somehow teach qemu that the virtiofs memory region
> > should only be the size of enabled regions (with PROT_READ|PROT_WRITE),
> > rather than the whole reserved PROT_NONE region.
>
> virtio-fs was not implemented with IOMMUs in mind. The idea is just to
> install a kvm.ko memory region that exposes the DAX window.
>
> Perhaps we need to treat the DAX window like an IOMMU? That way the
> virtio-fs code can send map/unmap notifications and hw/vfio/ can
> propagate them to the host kernel.

Sounds right.  One more thing to mention is that we may need to avoid tearing
down the whole old DMA region when resizing the PROT_READ|PROT_WRITE region
into e.g. a bigger one that covers some of the previously PROT_NONE part, as
long as the before-resizing region can still be accessed by any hardware.

It smells like something David is working on with virtio-mem; I'm not sure
whether there's any common infrastructure that could be shared.

Thanks,

--
Peter Xu
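
For readers following the thread, the reservation pattern Stefan describes
(reserve the whole window with PROT_NONE, then mmap() a sub-range at a fixed
address) can be sketched roughly as below. This is only an illustration, not
the virtio-fs or QEMU code: reserve_window() and enable_range() are
hypothetical helper names, and anonymous memory stands in for the file
mapping that a real DAX window would use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve `size` bytes of virtual address space with no access rights.
 * Nothing is faulted in; later mmap(NULL, ...) calls will not be handed
 * addresses from this range. (Hypothetical helper, for illustration.)
 */
static void *reserve_window(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Make [offset, offset + len) inside the reserved window readable and
 * writable by mapping over it with MAP_FIXED. `offset` must be a multiple
 * of the page size. Assumption: anonymous memory is used here in place of
 * the file-backed mapping a real user would install.
 */
static void *enable_range(void *window, size_t window_size,
                          size_t offset, size_t len)
{
    assert(offset <= window_size && len <= window_size - offset);
    void *p = mmap((char *)window + offset, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Only the range passed to enable_range() has usable pages behind it; the rest
of the window stays PROT_NONE, which is the part that pinning the whole
region would have to fault against.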