Date: Thu, 30 Sep 2021 06:41:39 -0600
From: Alex Williamson
To: Max Gurtovoy
Cc: Jason Gunthorpe, Leon Romanovsky, Doug Ledford, Yishai Hadas,
    Bjorn Helgaas, David S. Miller, Jakub Kicinski, Kirti Wankhede,
    Saeed Mahameed, Cornelia Huck
Subject: Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity
Message-ID: <20210930064139.57bb74c0.alex.williamson@redhat.com>
Organization: Red Hat
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 30 Sep 2021 12:25:23 +0300
Max Gurtovoy wrote:

> On 9/30/2021 1:44 AM, Alex Williamson wrote:
> > On Thu, 30 Sep 2021 00:48:55 +0300
> > Max Gurtovoy wrote:
> >
> >> On 9/29/2021 7:14 PM, Jason Gunthorpe wrote:
> >>> On Wed, Sep 29, 2021 at 06:28:44PM +0300, Max Gurtovoy wrote:
> >>>
> >>>>> So you have a device that's actively modifying its internal state,
> >>>>> performing I/O, including DMA (thereby dirtying VM memory), all while
> >>>>> in the _STOP state? And you don't see this as a problem?
> >>>> I don't see how is it different from vfio-pci situation.
> >>> vfio-pci provides no way to observe the migration state. It isn't
> >>> "000b"
> >> Alex said that there is a problem of compatibility.
> >>
> >> I migration SW is not involved, nobody will read this migration state.
> > The _STOP state has a specific meaning regardless of whether userspace
> > reads the device state value.
> > I think what you're suggesting is that
> > the device reports itself as _STOP'd but it's actually _RUNNING. Is
> > that the compatibility workaround, create a self inconsistency?
>
> From migration point of view the device is stopped.

The _RESUMING and _SAVING bits control the migration activity, the
_RUNNING bit controls the ability of the device to modify its internal
state and affect external state. The initial state of the device is
absolutely not stopped.

> > We cannot impose on userspace to move a device from _STOP to _RUNNING
> > simply because the device supports the migration region, nor should we
> > report a device state that is inconsistent with the actual device state.
>
> In this case we can think maybe moving to running during enabling the
> bus master..

There are no spontaneous state transitions, device_state changes only
via user manipulation of the register.

> >>>> Maybe we need to rename STOP state. We can call it READY or LIVE or
> >>>> NON_MIGRATION_STATE.
> >>> It was a poor choice to use 000b as stop, but it doesn't really
> >>> matter. The mlx5 driver should just pre-init this readable to running.
> >> I guess we can do it for this reason. There is no functional problem nor
> >> compatibility issue here as was mentioned.
> >>
> >> But still we need the kernel to track transitions. We don't want to
> >> allow moving from RESUMING to SAVING state for example. How this
> >> transition can be allowed ?
> >>
> >> In this case we need to fail the request from the migration SW...
> > _RESUMING to _SAVING seems like a good way to test round trip migration
> > without running the device to modify the state. Potentially it's a
> > means to update a saved device migration data stream to a newer format
> > using an intermediate driver version.
>
> what do you mean by "without running the device to modify the state." ?

If a device is !_RUNNING it should not be advancing its internal state,
therefore state-in == state-out.
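
As a rough illustration of the point above, here is a toy model in
Python rather than kernel code. It assumes the bit layout discussed in
this thread (_STOP is 000b, with _RUNNING, _SAVING and _RESUMING as
bits 0, 1 and 2). ToyDevice, its counter, and round_trip() are
hypothetical names invented for the sketch, and whether the kernel
should accept a _RESUMING to _SAVING transition is exactly what is
being debated here.

```python
from enum import IntFlag


class DeviceState(IntFlag):
    """Illustrative model of the device_state bits; _STOP is 000b."""
    STOP = 0
    RUNNING = 1 << 0
    SAVING = 1 << 1
    RESUMING = 1 << 2


class ToyDevice:
    """Hypothetical device used only for this sketch, not a real driver."""

    def __init__(self):
        # The initial state is running, not stopped.
        self.device_state = DeviceState.RUNNING
        self.counter = 0  # stand-in for the device's internal state

    def set_state(self, new_state):
        # device_state changes only when the user writes it.
        self.device_state = DeviceState(new_state)

    def tick(self):
        # Internal state advances only while _RUNNING is set.
        if self.device_state & DeviceState.RUNNING:
            self.counter += 1

    def load(self, blob):
        if not self.device_state & DeviceState.RESUMING:
            raise RuntimeError("device is not in _RESUMING")
        self.counter = int.from_bytes(blob, "little")

    def save(self):
        if not self.device_state & DeviceState.SAVING:
            raise RuntimeError("device is not in _SAVING")
        return self.counter.to_bytes(8, "little")


def round_trip(blob):
    """Restore a blob, then extract it again without ever setting _RUNNING."""
    dev = ToyDevice()
    dev.set_state(DeviceState.RESUMING)
    dev.load(blob)
    dev.tick()  # no effect: _RUNNING is clear
    dev.set_state(DeviceState.SAVING)
    return dev.save()
```

In this model, round_trip(blob) hands back the same bytes it was given,
because nothing can advance the counter while _RUNNING is clear.
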
> did you describe a case where you migrate from source to dst and then
> back to source with a new migration data format ?

I'm speculating that as the driver evolves, the migration data stream
generated from the device's migration region can change. Hopefully in
compatible ways. The above sequence of restoring and extracting state
without the complication of the device running could help to validate
compatibility. Thanks,

Alex