Date: Wed, 25 Nov 2015 19:56:32 -0800
Subject: Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC
From: Alexander Duyck
To: "Dong, Eddie"
Cc: "Lan, Tianyu", "a.motakis@virtualopensystems.com", Alex Williamson,
    "b.reynal@virtualopensystems.com", Bjorn Helgaas, "Wyborny, Carolyn",
    "Skidmore, Donald C", "Jani, Nrupal", Alexander Graf,
    "kvm@vger.kernel.org", Paolo Bonzini, "qemu-devel@nongnu.org",
    "Tantilov, Emil S", Or Gerlitz, "Rustad, Mark D", "Michael S. Tsirkin",
    Eric Auger, intel-wired-lan, "Kirsher, Jeffrey T", "Brandeburg, Jesse",
    "Ronciak, John", "linux-api@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "Williams, Mitch A", Netdev,
    "Nelson, Shannon", Wei Yang, "zajec5@gmail.com"
References: <1448372298-28386-1-git-send-email-tianyu.lan@intel.com>
    <5654722D.4010409@gmail.com> <56552888.90108@intel.com>
    <56556F98.5060507@intel.com>

On Wed, Nov 25, 2015 at 7:15 PM, Dong, Eddie wrote:
>> On Wed, Nov 25, 2015 at 12:21 AM, Lan Tianyu wrote:
>> > On 2015-11-25 13:30, Alexander Duyck wrote:
>> >> No, what I am getting at is that you can't go around and modify the
>> >> configuration space for every possible device out there. This
>> >> solution won't scale.
>> >
>> > PCI config space regs are emulated by QEMU, so we can find free
>> > PCI config space regs for the fake PCI capability.
>> > Its position need not be permanent.
>>
>> Yes, but do you really want to edit every driver on every OS that you
>> plan to support this on? What about things like direct assignment of
>> regular Ethernet ports? What you really need is a solution that will
>> work generically on any existing piece of hardware out there.
>
> The fundamental assumption of this patch series is to modify the driver
> in the guest to self-emulate or track the device state, so that
> migration becomes possible.
> I don't think we can modify the OS without modifying the drivers, even
> using the PCIe hotplug mechanism.
> In the meantime, modifying the Windows OS is a big challenge, given that
> only Microsoft can do it. Modifying the driver, by contrast, is
> relatively simple and manageable for device vendors, if the device
> vendor wants to support state-clone based migration.

The problem is that the code you are presenting, even as a proof of
concept, is seriously flawed. It does a poor job of showing how any of
this can be duplicated for any VF other than the one you are working on.

I am not saying you cannot modify the drivers; however, what you are
doing is far too invasive. Do you seriously plan on modifying all of the
PCI device drivers out there in order to allow any device that might be
direct assigned to a port to support migration? I certainly hope not.
That is why I have said that this solution will not scale.

What I am counter-proposing seems like a very simple proposition. It can
be implemented in two steps.

1. Look at modifying dma_mark_clean(). It is a function called in the
   sync and unmap paths of lib/swiotlb.c. If you could somehow modify it
   to take care of marking the pages you unmap for Rx as being dirty, it
   will get you a good way towards your goal, as it will allow you to
   continue to do DMA while you are migrating the VM.

2. Look at making use of the existing PCI suspend/resume calls that are
   there to support PCI power management.
   They have everything needed to allow you to pause and resume DMA for
   the device before and after the migration while retaining the driver
   state. If you can implement something that allows you to trigger
   these calls from the PCI subsystem, such as hot-plug, then you would
   have a generic solution that can be easily reproduced for multiple
   drivers beyond those supported by ixgbevf.

Thanks.

- Alex
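
To make step 1 a bit more concrete, here is a rough user-space sketch of
the idea, not kernel code. Every name in it (the sketch_* functions, the
bitmap, the page size and memory size) is hypothetical and exists only
for illustration; it does not reflect the real dma_mark_clean() or
swiotlb interfaces. The only point is the shape of the hook: when a
buffer that the device wrote into (Rx) is synced or unmapped, every page
it touches gets flagged in a dirty log that the migration code could
later scan.

```c
/* User-space illustration only: all names are hypothetical and this is
 * not the kernel's dma_mark_clean() or any real swiotlb interface. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12      /* assumption: 4 KiB pages */
#define SKETCH_MAX_PAGES  1024    /* assumption: toy guest with 4 MiB */

enum sketch_dma_dir { SKETCH_TO_DEVICE, SKETCH_FROM_DEVICE };

/* One bit per guest page; migration code would scan and clear this. */
static uint8_t sketch_dirty_bitmap[SKETCH_MAX_PAGES / 8];

static void sketch_clear_dirty_log(void)
{
    memset(sketch_dirty_bitmap, 0, sizeof(sketch_dirty_bitmap));
}

static void sketch_set_dirty(size_t pfn)
{
    sketch_dirty_bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

static int sketch_is_dirty(size_t pfn)
{
    return (sketch_dirty_bitmap[pfn / 8] >> (pfn % 8)) & 1;
}

/* Stand-in for the hook in the sync/unmap path: flag every page covered
 * by [addr, addr + size) as dirty, but only when the device was the
 * writer (Rx). Tx buffers are left alone. */
static void sketch_mark_dirty_on_unmap(uint64_t addr, size_t size,
                                       enum sketch_dma_dir dir)
{
    if (dir != SKETCH_FROM_DEVICE || size == 0)
        return;

    size_t first = (size_t)(addr >> SKETCH_PAGE_SHIFT);
    size_t last  = (size_t)((addr + size - 1) >> SKETCH_PAGE_SHIFT);

    assert(last < SKETCH_MAX_PAGES);
    for (size_t pfn = first; pfn <= last; pfn++)
        sketch_set_dirty(pfn);
}
```

For example, calling sketch_mark_dirty_on_unmap(0x1800, 0x1000,
SKETCH_FROM_DEVICE) flags pages 1 and 2, since the buffer straddles a
page boundary, while the same call with SKETCH_TO_DEVICE flags nothing.
How the dirty log would actually be shared with the hypervisor is left
open here, as it is in the proposal above.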