Date: Wed, 7 Dec 2022 11:07:11 -0400
From: Jason Gunthorpe
To: Christoph Hellwig
Cc: Lei Rao, kbusch@kernel.org, axboe@fb.com, kch@nvidia.com,
 sagi@grimberg.me, alex.williamson@redhat.com, cohuck@redhat.com,
 yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com,
 kevin.tian@intel.com, mjrosato@linux.ibm.com,
 linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
 kvm@vger.kernel.org, eddie.dong@intel.com, yadong.li@intel.com,
 yi.l.liu@intel.com, Konrad.wilk@oracle.com, stephen@eideticom.com,
 hang.yuan@intel.com
Subject: Re: [RFC PATCH 1/5] nvme-pci: add function nvme_submit_vf_cmd to issue admin commands for VF driver.
References: <20221206135810.GA27689@lst.de> <20221206153811.GB2266@lst.de>
 <20221206165503.GA8677@lst.de> <20221207075415.GB2283@lst.de>
 <20221207135203.GA22803@lst.de>
In-Reply-To: <20221207135203.GA22803@lst.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 07, 2022 at 02:52:03PM +0100, Christoph Hellwig wrote:
> On Wed, Dec 07, 2022 at 09:34:14AM -0400, Jason Gunthorpe wrote:
> > The VFIO design assumes that the "vfio migration driver" will talk to
> > both functions under the hood, and I don't see a fundamental problem
> > with this beyond it being awkward with the driver core.
>
> And while that is a fine concept per se, the current incarnation of
> that is fundamentally broken as it is centered around the controlled
> VM.  Which really can't work.

I don't see why you keep saying this. It is centered around the struct
vfio_device object in the kernel, which is definitely NOT the VM.

The struct vfio_device is the handle for the hypervisor to control
the physical assigned device - and it is the hypervisor that controls
the migration.

We do not need the hypervisor userspace to have a handle to the hidden
controlling function. It provides no additional functionality,
security or insight to what qemu needs to do. Keeping that
relationship abstracted inside the kernel is a reasonable choice and
is not "fundamentally broken".

> > Even the basic assumption that there would be a controlling/controlled
> > relationship is not universally true. The mdev type drivers, and
> > SIOV-like devices are unlikely to have that. Once you can use PASID
> > the reasons to split things at the HW level go away, and a VF could
> > certainly self-migrate.
>
> Even then you need a controlling and a controlled entity.  The
> controlling entity even in SIOV remains a PCIe function.  The
> controlled entity might just be a bunch of hardware resources and
> a PASID.  Making it important again that all migration is driven
> by the controlling entity.

If they are the same driver implementing vfio_device you may be able
to claim they conceptually exist, but it is pretty artificial to draw
this kind of distinction inside a single driver.

> Also the whole concept that only VFIO can do live migration is
> a little bogus.  With checkpoint and restart it absolutely
> does make sense to live migrate a container, and with that
> the hardware interface (e.g. nvme controller) assigned to it.

I agree people may want to do this, but it is very unclear how SRIOV
live migration can help do this.

SRIOV live migration is all about not disturbing the kernel driver,
assuming it is the same kernel driver on both sides. If you have two
different kernels there is nothing worth migrating. There isn't even
an assurance the dma API will have IOMMU mapped the same objects to
the same IOVAs, e.g. you have to re-establish your admin queue, IO
queues, etc after migration anyhow.

Let alone how to solve the security problems of allowing userspace to
load arbitrary FW blobs into a device with potentially insecure DMA
access. At that point it isn't really the same kind of migration.

Jason