2024-04-25 13:28:39

by liulongfang

Subject: [PATCH v6 0/5] add debugfs to hisilicon migration driver

Add debugfs support to the HiSilicon migration driver in VFIO to
expose intermediate state values and data during device migration.

When a live migration fails, the user can inspect the migration
status and data on the source and on the destination separately,
which makes it easier to analyze and locate problems.

Changes v5 -> v6
Fix incorrect log output calls

Changes v4 -> v5
Adjust the debugfs file directory

Changes v3 -> v4
Rebased on kernel 6.9

Changes v2 -> v3
Solve debugfs serialization problem.

Changes v1 -> v2
Fix the race on io_base.

Longfang Liu (5):
hisi_acc_vfio_pci: extract public functions for container_of
hisi_acc_vfio_pci: modify the register location of the XQC address
hisi_acc_vfio_pci: create subfunction for data reading
hisi_acc_vfio_pci: register debugfs for hisilicon migration driver
Documentation: add debugfs description for hisi migration

.../ABI/testing/debugfs-hisi-migration | 27 ++
MAINTAINERS | 1 +
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 302 ++++++++++++++++--
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 10 +
4 files changed, 308 insertions(+), 32 deletions(-)
create mode 100644 Documentation/ABI/testing/debugfs-hisi-migration

--
2.24.0



2024-04-25 13:29:37

by liulongfang

Subject: [PATCH v6 1/5] hisi_acc_vfio_pci: extract public functions for container_of

In the current driver, the struct hisi_acc_vf_core_device pointer is
obtained from vdev through the container_of() macro.
This pattern is repeated in many places in the driver. To reduce
the repetition, extract it into a common helper function.

Signed-off-by: Longfang Liu <[email protected]>
---
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 21 ++++++++++---------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 9a3e97108ace..45351be8e270 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -630,6 +630,12 @@ static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device *hisi_acc_vde
}
}

+static struct hisi_acc_vf_core_device *hisi_acc_get_vf_dev(struct vfio_device *vdev)
+{
+ return container_of(vdev, struct hisi_acc_vf_core_device,
+ core_device.vdev);
+}
+
static void hisi_acc_vf_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev)
{
hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
@@ -1033,8 +1039,7 @@ static struct file *
hisi_acc_vfio_pci_set_device_state(struct vfio_device *vdev,
enum vfio_device_mig_state new_state)
{
- struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
- struct hisi_acc_vf_core_device, core_device.vdev);
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);
enum vfio_device_mig_state next_state;
struct file *res = NULL;
int ret;
@@ -1075,8 +1080,7 @@ static int
hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
enum vfio_device_mig_state *curr_state)
{
- struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
- struct hisi_acc_vf_core_device, core_device.vdev);
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);

mutex_lock(&hisi_acc_vdev->state_mutex);
*curr_state = hisi_acc_vdev->mig_state;
@@ -1280,8 +1284,7 @@ static long hisi_acc_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int

static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
{
- struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(core_vdev,
- struct hisi_acc_vf_core_device, core_device.vdev);
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev);
struct vfio_pci_core_device *vdev = &hisi_acc_vdev->core_device;
int ret;

@@ -1304,8 +1307,7 @@ static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)

static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev)
{
- struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(core_vdev,
- struct hisi_acc_vf_core_device, core_device.vdev);
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev);
struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;

iounmap(vf_qm->io_base);
@@ -1320,8 +1322,7 @@ static const struct vfio_migration_ops hisi_acc_vfio_pci_migrn_state_ops = {

static int hisi_acc_vfio_pci_migrn_init_dev(struct vfio_device *core_vdev)
{
- struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(core_vdev,
- struct hisi_acc_vf_core_device, core_device.vdev);
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev);
struct pci_dev *pdev = to_pci_dev(core_vdev->dev);
struct hisi_qm *pf_qm = hisi_acc_get_pf_qm(pdev);

--
2.24.0
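
The helper added in patch 1/5 can be illustrated outside the kernel. The
following is only a userspace sketch: the demo_* structures are hypothetical
miniatures of the driver's types, and container_of() is redefined locally
with offsetof() because the kernel macro is not available in userspace.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature versions of the driver's nested structures. */
struct demo_vfio_device {
	int id;
};

struct demo_core_device {
	int flags;
	struct demo_vfio_device vdev;
};

struct demo_vf_core_device {
	int vf_id;
	struct demo_core_device core_device;
};

/* One helper replaces the repeated open-coded container_of() calls. */
static struct demo_vf_core_device *
demo_get_vf_dev(struct demo_vfio_device *vdev)
{
	return container_of(vdev, struct demo_vf_core_device,
			    core_device.vdev);
}
```

Callers then pass the inner pointer and get the enclosing device back, for
example demo_get_vf_dev(&dev.core_device.vdev), instead of repeating the
three-argument macro at every call site.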


2024-04-25 13:30:37

by liulongfang

Subject: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address

According to the latest hardware register specification, the DMA
addresses of EQE and AEQE are not at the front of their respective
register groups, but start from the second register.
So, previously fetching the value starting from the first register
would result in an incorrect address.

Therefore, the register location from which the address is obtained
needs to be modified.

Signed-off-by: Longfang Liu <[email protected]>
---
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 ++++----
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 3 +++
2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 45351be8e270..0c7e31076ff4 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -516,12 +516,12 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
return -EINVAL;

/* Every reg is 32 bit, the dma address is 64 bit. */
- vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
+ vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
- vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
- vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
+ vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
+ vf_data->aeqe_dma = vf_data->qm_aeqc_dw[QM_XQC_ADDR_HIGH];
vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
- vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
+ vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[QM_XQC_ADDR_LOW];

/* Through SQC_BT/CQC_BT to get sqc and cqc address */
ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
index 5bab46602fad..f887ab98581c 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
@@ -38,6 +38,9 @@
#define QM_REG_ADDR_OFFSET 0x0004

#define QM_XQC_ADDR_OFFSET 32U
+#define QM_XQC_ADDR_LOW 0x1
+#define QM_XQC_ADDR_HIGH 0x2
+
#define QM_VF_AEQ_INT_MASK 0x0004
#define QM_VF_EQ_INT_MASK 0x000c
#define QM_IFC_INT_SOURCE_V 0x0020
--
2.24.0
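
The address assembly changed by patch 2/5 can be sketched in plain C. This
is an illustration only: the DEMO_* constants mirror the values of
QM_XQC_ADDR_OFFSET, QM_XQC_ADDR_LOW and QM_XQC_ADDR_HIGH from the diff, and
demo_xqc_dma_addr() is a hypothetical helper that does not exist in the
driver.

```c
#include <stdint.h>

/* Mirror the values added to hisi_acc_vfio_pci.h in the diff. */
#define DEMO_XQC_ADDR_OFFSET 32U
#define DEMO_XQC_ADDR_LOW    1
#define DEMO_XQC_ADDR_HIGH   2

/*
 * Every register is 32 bits wide and the DMA address is 64 bits, so the
 * address is built from two consecutive dwords. With the patch applied the
 * low half is dword 1 and the high half is dword 2; dword 0 is skipped.
 */
static uint64_t demo_xqc_dma_addr(const uint32_t *dw)
{
	uint64_t addr = dw[DEMO_XQC_ADDR_HIGH];

	addr <<= DEMO_XQC_ADDR_OFFSET;
	addr |= dw[DEMO_XQC_ADDR_LOW];
	return addr;
}
```

Widening to 64 bits before the shift matters: shifting a 32-bit value left
by 32 would be undefined behaviour in C.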


2024-04-25 13:31:02

by liulongfang

Subject: [PATCH v6 3/5] hisi_acc_vfio_pci: create subfunction for data reading

During the live migration process. It needs to obtain various status
data of drivers and devices. In order to facilitate calling it in the
debugfs function. For all operations that read data from device registers,
the driver creates a subfunction.
Also fixed the location of address data.

Signed-off-by: Longfang Liu <[email protected]>
---
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 54 +++++++++++--------
1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 0c7e31076ff4..bf358ba94b5d 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -486,31 +486,11 @@ static int vf_qm_load_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
return 0;
}

-static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
- struct hisi_acc_vf_migration_file *migf)
+static int vf_qm_read_data(struct hisi_qm *vf_qm, struct acc_vf_data *vf_data)
{
- struct acc_vf_data *vf_data = &migf->vf_data;
- struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
struct device *dev = &vf_qm->pdev->dev;
int ret;

- if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
- /* Update state and return with match data */
- vf_data->vf_qm_state = QM_NOT_READY;
- hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
- migf->total_length = QM_MATCH_SIZE;
- return 0;
- }
-
- vf_data->vf_qm_state = QM_READY;
- hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
-
- ret = vf_qm_cache_wb(vf_qm);
- if (ret) {
- dev_err(dev, "failed to writeback QM Cache!\n");
- return ret;
- }
-
ret = qm_get_regs(vf_qm, vf_data);
if (ret)
return -EINVAL;
@@ -536,6 +516,38 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
return -EINVAL;
}

+ return 0;
+}
+
+static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+ struct hisi_acc_vf_migration_file *migf)
+{
+ struct acc_vf_data *vf_data = &migf->vf_data;
+ struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+ struct device *dev = &vf_qm->pdev->dev;
+ int ret;
+
+ if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
+ /* Update state and return with match data */
+ vf_data->vf_qm_state = QM_NOT_READY;
+ hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
+ migf->total_length = QM_MATCH_SIZE;
+ return 0;
+ }
+
+ vf_data->vf_qm_state = QM_READY;
+ hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
+
+ ret = vf_qm_cache_wb(vf_qm);
+ if (ret) {
+ dev_err(dev, "failed to writeback QM Cache!\n");
+ return ret;
+ }
+
+ ret = vf_qm_read_data(vf_qm, vf_data);
+ if (ret)
+ return -EINVAL;
+
migf->total_length = sizeof(struct acc_vf_data);
return 0;
}
--
2.24.0
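
The shape of the refactoring in patch 3/5 can be sketched as follows. This
is a simplified userspace model, not the driver code: the demo_* types and
DEMO_MATCH_SIZE are hypothetical stand-ins, and the cache writeback step of
the real save path is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_REGS    4
#define DEMO_MATCH_SIZE 8 /* hypothetical stand-in for QM_MATCH_SIZE */

struct demo_dev {
	bool ready;
	int regs[DEMO_NR_REGS];
};

struct demo_data {
	int regs[DEMO_NR_REGS];
	size_t total_length;
};

/* Shared helper: only copies device registers, changes no driver state. */
static int demo_read_data(const struct demo_dev *dev, struct demo_data *data)
{
	for (size_t i = 0; i < DEMO_NR_REGS; i++)
		data->regs[i] = dev->regs[i];
	return 0;
}

/* Migration save path: readiness handling first, then the shared read. */
static int demo_state_save(const struct demo_dev *dev, struct demo_data *data)
{
	if (!dev->ready) {
		/* Device not ready: report only the match data length. */
		data->total_length = DEMO_MATCH_SIZE;
		return 0;
	}

	if (demo_read_data(dev, data))
		return -1;

	data->total_length = sizeof(data->regs);
	return 0;
}
```

Keeping the register read free of side effects is what would let a second
caller, such as a debugfs read handler, reuse it without disturbing the
migration state.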


2024-04-25 13:32:21

by liulongfang

Subject: [PATCH v6 5/5] Documentation: add debugfs description for hisi migration

Add a debugfs ABI description file to help users understand how to
use the debugfs interface of the HiSilicon accelerator live
migration driver.

Also add the new file path to the driver's entry in MAINTAINERS.

Signed-off-by: Longfang Liu <[email protected]>
---
.../ABI/testing/debugfs-hisi-migration | 27 +++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 28 insertions(+)
create mode 100644 Documentation/ABI/testing/debugfs-hisi-migration

diff --git a/Documentation/ABI/testing/debugfs-hisi-migration b/Documentation/ABI/testing/debugfs-hisi-migration
new file mode 100644
index 000000000000..f391f2366bfa
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-hisi-migration
@@ -0,0 +1,27 @@
+What: /sys/kernel/debug/vfio/<device>/migration/hisi_acc/dev_data
+Date: Apr 2024
+KernelVersion: 6.9
+Contact: Longfang Liu <[email protected]>
+Description: Read the live migration data of the vfio device.
+ These data include device status data, queue configuration
+ data, some task configuration data and Device attribute data.
+ The output format of the data is defined by the live
+ migration driver.
+
+What: /sys/kernel/debug/vfio/<device>/migration/hisi_acc/migf_data
+Date: Apr 2024
+KernelVersion: 6.9
+Contact: Longfang Liu <[email protected]>
+Description: Read the data from the last completed live migration.
+ These data include device status data, queue configuration
+ data, some task configuration data and Device attribute data.
+ The output format of the data is defined by the live
+ migration driver.
+
+What: /sys/kernel/debug/vfio/<device>/migration/hisi_acc/cmd_state
+Date: Apr 2024
+KernelVersion: 6.9
+Contact: Longfang Liu <[email protected]>
+Description: Used to obtain the device command sending and receiving
+ channel status. If successful, returns the command value.
+ If failed, return error log.
diff --git a/MAINTAINERS b/MAINTAINERS
index aa415ddc6c03..d2bcf926bb18 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23234,6 +23234,7 @@ M: Longfang Liu <[email protected]>
M: Shameer Kolothum <[email protected]>
L: [email protected]
S: Maintained
+F: Documentation/ABI/testing/debugfs-hisi-migration
F: drivers/vfio/pci/hisilicon/

VFIO MEDIATED DEVICE DRIVERS
--
2.24.0
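
The path layout documented in patch 5/5 can be exercised from userspace. The
sketch below only builds the path string; demo_hisi_debugfs_path() is a
hypothetical helper, and the device name passed to it is a placeholder,
since the real <device> component depends on the system.

```c
#include <stdio.h>

/*
 * Build the debugfs path described in the ABI file for one of the three
 * entries (dev_data, migf_data, cmd_state).
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 */
static int demo_hisi_debugfs_path(char *buf, size_t len,
				  const char *device, const char *file)
{
	int n = snprintf(buf, len,
			 "/sys/kernel/debug/vfio/%s/migration/hisi_acc/%s",
			 device, file);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting path can be passed to fopen() and read like any text file.
This assumes debugfs is mounted at /sys/kernel/debug and that the caller has
sufficient privileges.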


2024-04-25 13:50:53

by liulongfang

Subject: [PATCH v6 4/5] hisi_acc_vfio_pci: register debugfs for hisilicon migration driver

Within the debugfs framework of VFIO, if CONFIG_VFIO_DEBUGFS is
enabled, debug functions are registered for the live migration driver
of the HiSilicon accelerator device.

After the HiSilicon accelerator device is registered with the VFIO
live migration debugfs framework, a debugfs directory "hisi_acc"
is created, and three debug files are created in this directory:

vfio
 |
 +---<dev_name1>
 |    +---migration
 |        +--state
 |        +--hisi_acc
 |            +--dev_data
 |            +--migf_data
 |            +--cmd_state
 |
 +---<dev_name2>
      +---migration
          +--state
          +--hisi_acc
              +--dev_data
              +--migf_data
              +--cmd_state

dev_data file: read device data that needs to be migrated from the
current device in real time
migf_data file: read the migration data of the last live migration
from the current driver.
cmd_state: used to get the cmd channel state for the device.

+----------------+        +--------------+       +---------------+
| migration dev  |        |   src dev    |       |    dst dev    |
+-------+--------+        +------+-------+       +-------+-------+
        |                        |                       |
        |                 +------v-------+       +-------v-------+
        |                 | saving_migf  |       | resuming_migf |
  read  |                 |     file     |       |     file      |
        |                 +------+-------+       +-------+-------+
        |                        |         copy          |
        |                        +------------+----------+
        |                                     |
+-------v---------+                   +-------v--------+
|   data buffer   |                   |   debug_migf   |
+-------+---------+                   +-------+--------+
        |                                     |
   cat  |                                 cat |
+-------v--------+                    +-------v--------+
|    dev_data    |                    |   migf_data    |
+----------------+                    +----------------+

When accessing debugfs, the user can obtain the real-time status data
of the device through the "dev_data" file. Reading it fetches the
device status data directly and does not affect the live migration
function. The data is stored in a temporarily allocated memory buffer,
which is released after the data has been returned to user mode.

To obtain the data of the last completed migration, the user needs to
read the "migf_data" file. Since the live migration data only exists
during the migration process, it is destroyed after the migration
is completed.
To preserve this data, a debug_migf file is created in the driver,
and the data is copied to debug_migf at the end of the live
migration process.

Signed-off-by: Longfang Liu <[email protected]>
---
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 225 ++++++++++++++++++
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 7 +
2 files changed, 232 insertions(+)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index bf358ba94b5d..656b3d975940 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -627,15 +627,33 @@ static void hisi_acc_vf_disable_fd(struct hisi_acc_vf_migration_file *migf)
mutex_unlock(&migf->lock);
}

+static void hisi_acc_debug_migf_copy(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+ struct hisi_acc_vf_migration_file *src_migf)
+{
+ struct hisi_acc_vf_migration_file *dst_migf = hisi_acc_vdev->debug_migf;
+
+ if (!dst_migf)
+ return;
+
+ mutex_lock(&hisi_acc_vdev->enable_mutex);
+ dst_migf->disabled = src_migf->disabled;
+ dst_migf->total_length = src_migf->total_length;
+ memcpy(&dst_migf->vf_data, &src_migf->vf_data,
+ sizeof(struct acc_vf_data));
+ mutex_unlock(&hisi_acc_vdev->enable_mutex);
+}
+
static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device *hisi_acc_vdev)
{
if (hisi_acc_vdev->resuming_migf) {
+ hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev->resuming_migf);
hisi_acc_vf_disable_fd(hisi_acc_vdev->resuming_migf);
fput(hisi_acc_vdev->resuming_migf->filp);
hisi_acc_vdev->resuming_migf = NULL;
}

if (hisi_acc_vdev->saving_migf) {
+ hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev->saving_migf);
hisi_acc_vf_disable_fd(hisi_acc_vdev->saving_migf);
fput(hisi_acc_vdev->saving_migf->filp);
hisi_acc_vdev->saving_migf = NULL;
@@ -1144,6 +1162,7 @@ static int hisi_acc_vf_qm_init(struct hisi_acc_vf_core_device *hisi_acc_vdev)
if (!vf_qm->io_base)
return -EIO;

+ mutex_init(&hisi_acc_vdev->enable_mutex);
vf_qm->fun_type = QM_HW_VF;
vf_qm->pdev = vf_dev;
mutex_init(&vf_qm->mailbox_lock);
@@ -1294,6 +1313,203 @@ static long hisi_acc_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int
return vfio_pci_core_ioctl(core_vdev, cmd, arg);
}

+static int hisi_acc_vf_debug_check(struct seq_file *seq, struct vfio_device *vdev)
+{
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);
+ struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+ struct device *dev = vdev->dev;
+ int ret;
+
+ if (!vdev->mig_ops) {
+ dev_err(dev, "device does not support live migration!\n");
+ return -EINVAL;
+ }
+
+ /**
+ * When the device is not opened, the io_base is not mapped.
+ * The driver cannot perform device read and write operations.
+ */
+ if (!hisi_acc_vdev->dev_opened) {
+ dev_err(dev, "device not opened!\n");
+ return -EINVAL;
+ }
+
+ ret = qm_wait_dev_not_ready(vf_qm);
+ if (ret) {
+ dev_err(dev, "VF device not ready!\n");
+ return -EBUSY;
+ }
+
+ return 0;
+}
+
+static int hisi_acc_vf_debug_cmd(struct seq_file *seq, void *data)
+{
+ struct device *vf_dev = seq->private;
+ struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
+ struct vfio_device *vdev = &core_device->vdev;
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);
+ struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+ u64 value;
+ int ret;
+
+ mutex_lock(&hisi_acc_vdev->enable_mutex);
+ ret = hisi_acc_vf_debug_check(seq, vdev);
+ if (ret) {
+ mutex_unlock(&hisi_acc_vdev->enable_mutex);
+ return ret;
+ }
+
+ value = readl(vf_qm->io_base + QM_MB_CMD_SEND_BASE);
+ if (value == QM_MB_CMD_NOT_READY) {
+ mutex_unlock(&hisi_acc_vdev->enable_mutex);
+ dev_err(vf_dev, "mailbox cmd channel not ready!\n");
+ return -EINVAL;
+ }
+ mutex_unlock(&hisi_acc_vdev->enable_mutex);
+ dev_err(vf_dev, "mailbox cmd channel state is OK!\n");
+
+ return 0;
+}
+
+static int hisi_acc_vf_dev_read(struct seq_file *seq, void *data)
+{
+ struct device *vf_dev = seq->private;
+ struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
+ struct vfio_device *vdev = &core_device->vdev;
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);
+ size_t vf_data_sz = offsetofend(struct acc_vf_data, padding);
+ struct hisi_acc_vf_migration_file *migf = NULL;
+ int ret;
+
+ migf = kzalloc(sizeof(struct hisi_acc_vf_migration_file), GFP_KERNEL);
+ if (!migf)
+ return -ENOMEM;
+
+ mutex_lock(&hisi_acc_vdev->enable_mutex);
+ ret = hisi_acc_vf_debug_check(seq, vdev);
+ if (ret) {
+ mutex_unlock(&hisi_acc_vdev->enable_mutex);
+ goto migf_err;
+ }
+
+ migf->vf_data.vf_qm_state = hisi_acc_vdev->vf_qm_state;
+ ret = vf_qm_read_data(&hisi_acc_vdev->vf_qm, &migf->vf_data);
+ if (ret) {
+ mutex_unlock(&hisi_acc_vdev->enable_mutex);
+ dev_err(vf_dev, "failed to read device data!\n");
+ goto migf_err;
+ }
+ mutex_unlock(&hisi_acc_vdev->enable_mutex);
+
+ if (hisi_acc_vdev->resuming_migf)
+ migf->disabled = hisi_acc_vdev->resuming_migf->disabled;
+ else if (hisi_acc_vdev->saving_migf)
+ migf->disabled = hisi_acc_vdev->saving_migf->disabled;
+ else
+ migf->disabled = true;
+ migf->total_length = sizeof(struct acc_vf_data);
+
+ seq_hex_dump(seq, "Dev Data:", DUMP_PREFIX_OFFSET, 16, 1,
+ (unsigned char *)&migf->vf_data,
+ vf_data_sz, false);
+
+ seq_printf(seq,
+ "acc device:\n"
+ "device ready: %u\n"
+ "device opened: %d\n"
+ "data valid: %d\n"
+ "data size: %lu\n",
+ hisi_acc_vdev->vf_qm_state,
+ hisi_acc_vdev->dev_opened,
+ migf->disabled,
+ migf->total_length);
+
+migf_err:
+ kfree(migf);
+
+ return ret;
+}
+
+static int hisi_acc_vf_migf_read(struct seq_file *seq, void *data)
+{
+ struct device *vf_dev = seq->private;
+ struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
+ struct vfio_device *vdev = &core_device->vdev;
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);
+ size_t vf_data_sz = offsetofend(struct acc_vf_data, padding);
+ struct hisi_acc_vf_migration_file *debug_migf = hisi_acc_vdev->debug_migf;
+
+ /* Check whether the live migration operation has been performed */
+ if (debug_migf->total_length < vf_data_sz) {
+ dev_err(vf_dev, "device not migrated!\n");
+ return -EAGAIN;
+ }
+
+ seq_hex_dump(seq, "Mig Data:", DUMP_PREFIX_OFFSET, 16, 1,
+ (unsigned char *)&debug_migf->vf_data,
+ vf_data_sz, false);
+
+ seq_printf(seq,
+ "acc device:\n"
+ "device ready: %u\n"
+ "device opened: %d\n"
+ "data valid: %d\n"
+ "data size: %lu\n",
+ hisi_acc_vdev->vf_qm_state,
+ hisi_acc_vdev->dev_opened,
+ debug_migf->disabled,
+ debug_migf->total_length);
+
+ return 0;
+}
+
+static int hisi_acc_vfio_debug_init(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ struct vfio_device *vdev = &hisi_acc_vdev->core_device.vdev;
+ struct dentry *vfio_dev_migration = NULL;
+ struct dentry *vfio_hisi_acc = NULL;
+ struct device *dev = vdev->dev;
+ void *migf = NULL;
+
+ if (!debugfs_initialized() ||
+ !IS_ENABLED(CONFIG_VFIO_DEBUGFS))
+ return 0;
+
+ migf = kzalloc(sizeof(struct hisi_acc_vf_migration_file), GFP_KERNEL);
+ if (!migf)
+ return -ENOMEM;
+ hisi_acc_vdev->debug_migf = migf;
+
+ vfio_dev_migration = debugfs_lookup("migration", vdev->debug_root);
+ if (!vfio_dev_migration) {
+ kfree(migf);
+ hisi_acc_vdev->debug_migf = NULL;
+ dev_err(dev, "failed to lookup migration debugfs file!\n");
+ return -ENODEV;
+ }
+
+ vfio_hisi_acc = debugfs_create_dir("hisi_acc", vfio_dev_migration);
+ debugfs_create_devm_seqfile(dev, "dev_data", vfio_hisi_acc,
+ hisi_acc_vf_dev_read);
+ debugfs_create_devm_seqfile(dev, "migf_data", vfio_hisi_acc,
+ hisi_acc_vf_migf_read);
+ debugfs_create_devm_seqfile(dev, "cmd_state", vfio_hisi_acc,
+ hisi_acc_vf_debug_cmd);
+
+ return 0;
+}
+
+static void hisi_acc_vf_debugfs_exit(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ if (!debugfs_initialized() ||
+ !IS_ENABLED(CONFIG_VFIO_DEBUGFS))
+ return;
+
+ if (hisi_acc_vdev->debug_migf)
+ kfree(hisi_acc_vdev->debug_migf);
+}
+
static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
{
struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev);
@@ -1311,9 +1527,11 @@ static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
return ret;
}
hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+ hisi_acc_vdev->dev_opened = true;
}

vfio_pci_core_finish_enable(vdev);
+
return 0;
}

@@ -1322,7 +1540,10 @@ static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev)
struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev);
struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;

+ hisi_acc_vdev->dev_opened = false;
+ mutex_lock(&hisi_acc_vdev->enable_mutex);
iounmap(vf_qm->io_base);
+ mutex_unlock(&hisi_acc_vdev->enable_mutex);
vfio_pci_core_close_device(core_vdev);
}

@@ -1413,6 +1634,9 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
if (ret)
goto out_put_vdev;
+
+ if (ops == &hisi_acc_vfio_pci_migrn_ops)
+ hisi_acc_vfio_debug_init(hisi_acc_vdev);
return 0;

out_put_vdev:
@@ -1425,6 +1649,7 @@ static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev)
struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev);

vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
+ hisi_acc_vf_debugfs_exit(hisi_acc_vdev);
vfio_put_device(&hisi_acc_vdev->core_device.vdev);
}

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
index f887ab98581c..93ee8bef32a1 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
@@ -32,6 +32,7 @@
#define QM_SQC_VFT_BASE_MASK_V2 GENMASK(15, 0)
#define QM_SQC_VFT_NUM_SHIFT_V2 45
#define QM_SQC_VFT_NUM_MASK_V2 GENMASK(9, 0)
+#define QM_MB_CMD_NOT_READY 0xffffffff

/* RW regs */
#define QM_REGS_MAX_LEN 7
@@ -114,5 +115,11 @@ struct hisi_acc_vf_core_device {
int vf_id;
struct hisi_acc_vf_migration_file *resuming_migf;
struct hisi_acc_vf_migration_file *saving_migf;
+
+ /* To make sure the device is enabled */
+ struct mutex enable_mutex;
+ bool dev_opened;
+ /* To save migration data */
+ struct hisi_acc_vf_migration_file *debug_migf;
};
#endif /* HISI_ACC_VFIO_PCI_H */
--
2.24.0
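
The snapshot idea behind debug_migf in patch 4/5 can be modelled in
userspace. The sketch below is not the driver code: the demo_* types are
hypothetical, a pthread mutex stands in for the kernel mutex, and the reader
taking the lock and checking for NULL are choices made for this
illustration, not a description of what the posted patch does.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_VF_DATA_SZ 64

struct demo_migf {
	bool disabled;
	size_t total_length;
	unsigned char vf_data[DEMO_VF_DATA_SZ];
};

struct demo_vdev {
	pthread_mutex_t lock;
	struct demo_migf *debug_migf; /* NULL when no snapshot buffer exists */
};

/* Called just before a migration file is torn down: keep a copy of it. */
static void demo_debug_migf_copy(struct demo_vdev *vdev,
				 const struct demo_migf *src)
{
	struct demo_migf *dst = vdev->debug_migf;

	if (!dst)
		return;

	pthread_mutex_lock(&vdev->lock);
	dst->disabled = src->disabled;
	dst->total_length = src->total_length;
	memcpy(dst->vf_data, src->vf_data, sizeof(dst->vf_data));
	pthread_mutex_unlock(&vdev->lock);
}

/* Reader side: refuse until a complete snapshot has been stored. */
static int demo_migf_read(struct demo_vdev *vdev, struct demo_migf *out)
{
	int ret = 0;

	if (!vdev->debug_migf)
		return -ENODEV;

	pthread_mutex_lock(&vdev->lock);
	if (vdev->debug_migf->total_length < DEMO_VF_DATA_SZ)
		ret = -EAGAIN; /* no migration has completed yet */
	else
		*out = *vdev->debug_migf;
	pthread_mutex_unlock(&vdev->lock);

	return ret;
}
```

Copying under the same lock that guards the reader means a read sees either
the previous snapshot or the new one in full, never a partially updated
buffer.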


2024-05-03 16:12:13

by Alex Williamson

Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address

On Thu, 25 Apr 2024 21:23:19 +0800
Longfang Liu <[email protected]> wrote:

> According to the latest hardware register specification. The DMA
> addresses of EQE and AEQE are not at the front of their respective
> register groups, but start from the second.
> So, previously fetching the value starting from the first register
> would result in an incorrect address.
>
> Therefore, the register location from which the address is obtained
> needs to be modified.

How does this affect migration? Has it ever worked? Does this make
the migration data incompatible?

Fixes: ???

> Signed-off-by: Longfang Liu <[email protected]>
> ---
> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 ++++----
> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 3 +++
> 2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 45351be8e270..0c7e31076ff4 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -516,12 +516,12 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> return -EINVAL;
>
> /* Every reg is 32 bit, the dma address is 64 bit. */
> - vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
> + vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
> vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> - vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
> - vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
> + vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
> + vf_data->aeqe_dma = vf_data->qm_aeqc_dw[QM_XQC_ADDR_HIGH];
> vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> - vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
> + vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[QM_XQC_ADDR_LOW];
>
> /* Through SQC_BT/CQC_BT to get sqc and cqc address */
> ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> index 5bab46602fad..f887ab98581c 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> @@ -38,6 +38,9 @@
> #define QM_REG_ADDR_OFFSET 0x0004
>
> #define QM_XQC_ADDR_OFFSET 32U
> +#define QM_XQC_ADDR_LOW 0x1
> +#define QM_XQC_ADDR_HIGH 0x2
> +
> #define QM_VF_AEQ_INT_MASK 0x0004
> #define QM_VF_EQ_INT_MASK 0x000c
> #define QM_IFC_INT_SOURCE_V 0x0020


2024-05-03 16:25:25

by Alex Williamson

Subject: Re: [PATCH v6 3/5] hisi_acc_vfio_pci: create subfunction for data reading

On Thu, 25 Apr 2024 21:23:20 +0800
Longfang Liu <[email protected]> wrote:

> During the live migration process.

This is not a complete sentence.

> It needs to obtain various status
> data of drivers and devices.

What's "It" describing here?

> In order to facilitate calling it in the
> debugfs function.

Also not a complete sentence.

> For all operations that read data from device registers,
> the driver creates a subfunction.

There's only one sub-function.

> Also fixed the location of address data.

I think this is addressed in the previous patch now? Thanks,

Alex

> Signed-off-by: Longfang Liu <[email protected]>
> ---
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 54 +++++++++++--------
> 1 file changed, 33 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index 0c7e31076ff4..bf358ba94b5d 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -486,31 +486,11 @@ static int vf_qm_load_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> return 0;
> }
>
> -static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> - struct hisi_acc_vf_migration_file *migf)
> +static int vf_qm_read_data(struct hisi_qm *vf_qm, struct acc_vf_data *vf_data)
> {
> - struct acc_vf_data *vf_data = &migf->vf_data;
> - struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> struct device *dev = &vf_qm->pdev->dev;
> int ret;
>
> - if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
> - /* Update state and return with match data */
> - vf_data->vf_qm_state = QM_NOT_READY;
> - hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> - migf->total_length = QM_MATCH_SIZE;
> - return 0;
> - }
> -
> - vf_data->vf_qm_state = QM_READY;
> - hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> -
> - ret = vf_qm_cache_wb(vf_qm);
> - if (ret) {
> - dev_err(dev, "failed to writeback QM Cache!\n");
> - return ret;
> - }
> -
> ret = qm_get_regs(vf_qm, vf_data);
> if (ret)
> return -EINVAL;
> @@ -536,6 +516,38 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> return -EINVAL;
> }
>
> + return 0;
> +}
> +
> +static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> + struct hisi_acc_vf_migration_file *migf)
> +{
> + struct acc_vf_data *vf_data = &migf->vf_data;
> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> + struct device *dev = &vf_qm->pdev->dev;
> + int ret;
> +
> + if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
> + /* Update state and return with match data */
> + vf_data->vf_qm_state = QM_NOT_READY;
> + hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> + migf->total_length = QM_MATCH_SIZE;
> + return 0;
> + }
> +
> + vf_data->vf_qm_state = QM_READY;
> + hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
> +
> + ret = vf_qm_cache_wb(vf_qm);
> + if (ret) {
> + dev_err(dev, "failed to writeback QM Cache!\n");
> + return ret;
> + }
> +
> + ret = vf_qm_read_data(vf_qm, vf_data);
> + if (ret)
> + return -EINVAL;
> +
> migf->total_length = sizeof(struct acc_vf_data);
> return 0;
> }


2024-05-03 17:23:23

by Alex Williamson

Subject: Re: [PATCH v6 4/5] hisi_acc_vfio_pci: register debugfs for hisilicon migration driver

On Thu, 25 Apr 2024 21:23:21 +0800
Longfang Liu <[email protected]> wrote:

> On the debugfs framework of VFIO, if the CONFIG_VFIO_DEBUGFS macro is
> enabled, the debug function is registered for the live migration driver
> of the HiSilicon accelerator device.
>
> After registering the HiSilicon accelerator device on the debugfs
> framework of live migration of vfio, a directory file "hisi_acc"
> of debugfs is created, and then three debug function files are
> created in this directory:
>
> vfio
>  |
>  +---<dev_name1>
>  |    +---migration
>  |        +--state
>  |        +--hisi_acc
>  |            +--dev_data
>  |            +--migf_data
>  |            +--cmd_state
>  |
>  +---<dev_name2>
>       +---migration
>           +--state
>           +--hisi_acc
>               +--dev_data
>               +--migf_data
>               +--cmd_state
>
> dev_data file: read device data that needs to be migrated from the
> current device in real time
> migf_data file: read the migration data of the last live migration
> from the current driver.
> cmd_state: used to get the cmd channel state for the device.
>
> +----------------+        +--------------+       +---------------+
> | migration dev  |        |   src dev    |       |    dst dev    |
> +-------+--------+        +------+-------+       +-------+-------+
>         |                        |                       |
>         |                 +------v-------+       +-------v-------+
>         |                 | saving_migf  |       | resuming_migf |
>   read  |                 |     file     |       |     file      |
>         |                 +------+-------+       +-------+-------+
>         |                        |         copy          |
>         |                        +------------+----------+
>         |                                     |
> +-------v---------+                   +-------v--------+
> |   data buffer   |                   |   debug_migf   |
> +-------+---------+                   +-------+--------+
>         |                                     |
>    cat  |                                 cat |
> +-------v--------+                    +-------v--------+
> |    dev_data    |                    |   migf_data    |
> +----------------+                    +----------------+
>
> When accessing debugfs, the user can obtain the real-time status data
> of the device through the "dev_data" file. Reading it fetches the
> device status data directly and does not affect the live migration
> function. The data is stored in an allocated memory buffer, and
> the memory is released after the data is returned to user mode.
>
> To obtain the data of the last complete migration, the user reads
> the "migf_data" file. Live migration data exists only during the
> migration process and is destroyed after the migration completes.
> To preserve this data, a debug_migf file is created in the
> driver. At the end of the live migration process, the data is
> copied to debug_migf.
>
> Signed-off-by: Longfang Liu <[email protected]>
> ---
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 225 ++++++++++++++++++
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 7 +
> 2 files changed, 232 insertions(+)
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> index bf358ba94b5d..656b3d975940 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> @@ -627,15 +627,33 @@ static void hisi_acc_vf_disable_fd(struct hisi_acc_vf_migration_file *migf)
> mutex_unlock(&migf->lock);
> }
>
> +static void hisi_acc_debug_migf_copy(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> + struct hisi_acc_vf_migration_file *src_migf)
> +{
> + struct hisi_acc_vf_migration_file *dst_migf = hisi_acc_vdev->debug_migf;
> +
> + if (!dst_migf)
> + return;
> +
> + mutex_lock(&hisi_acc_vdev->enable_mutex);
> + dst_migf->disabled = src_migf->disabled;
> + dst_migf->total_length = src_migf->total_length;
> + memcpy(&dst_migf->vf_data, &src_migf->vf_data,
> + sizeof(struct acc_vf_data));
> + mutex_unlock(&hisi_acc_vdev->enable_mutex);
> +}
> +
> static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device *hisi_acc_vdev)
> {
> if (hisi_acc_vdev->resuming_migf) {
> + hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev->resuming_migf);
> hisi_acc_vf_disable_fd(hisi_acc_vdev->resuming_migf);
> fput(hisi_acc_vdev->resuming_migf->filp);
> hisi_acc_vdev->resuming_migf = NULL;
> }
>
> if (hisi_acc_vdev->saving_migf) {
> + hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev->saving_migf);
> hisi_acc_vf_disable_fd(hisi_acc_vdev->saving_migf);
> fput(hisi_acc_vdev->saving_migf->filp);
> hisi_acc_vdev->saving_migf = NULL;
> @@ -1144,6 +1162,7 @@ static int hisi_acc_vf_qm_init(struct hisi_acc_vf_core_device *hisi_acc_vdev)
> if (!vf_qm->io_base)
> return -EIO;
>
> + mutex_init(&hisi_acc_vdev->enable_mutex);
> vf_qm->fun_type = QM_HW_VF;
> vf_qm->pdev = vf_dev;
> mutex_init(&vf_qm->mailbox_lock);
> @@ -1294,6 +1313,203 @@ static long hisi_acc_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int
> return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> }
>
> +static int hisi_acc_vf_debug_check(struct seq_file *seq, struct vfio_device *vdev)
> +{
> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);
> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> + struct device *dev = vdev->dev;
> + int ret;
> +
> + if (!vdev->mig_ops) {
> + dev_err(dev, "device does not support live migration!\n");

Sorry, every error path should not spam dmesg with dev_err(). I'm
going to wait until your co-maintainer approves this before looking at
any further iterations of this series. Thanks,

Alex

> + return -EINVAL;
> + }
> +
> + /**
> + * When the device is not opened, the io_base is not mapped.
> + * The driver cannot perform device read and write operations.
> + */
> + if (!hisi_acc_vdev->dev_opened) {
> + dev_err(dev, "device not opened!\n");
> + return -EINVAL;
> + }
> +
> + ret = qm_wait_dev_not_ready(vf_qm);
> + if (ret) {
> + dev_err(dev, "VF device not ready!\n");
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> +static int hisi_acc_vf_debug_cmd(struct seq_file *seq, void *data)
> +{
> + struct device *vf_dev = seq->private;
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
> + struct vfio_device *vdev = &core_device->vdev;
> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);
> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> + u64 value;
> + int ret;
> +
> + mutex_lock(&hisi_acc_vdev->enable_mutex);
> + ret = hisi_acc_vf_debug_check(seq, vdev);
> + if (ret) {
> + mutex_unlock(&hisi_acc_vdev->enable_mutex);
> + return ret;
> + }
> +
> + value = readl(vf_qm->io_base + QM_MB_CMD_SEND_BASE);
> + if (value == QM_MB_CMD_NOT_READY) {
> + mutex_unlock(&hisi_acc_vdev->enable_mutex);
> + dev_err(vf_dev, "mailbox cmd channel not ready!\n");
> + return -EINVAL;
> + }
> + mutex_unlock(&hisi_acc_vdev->enable_mutex);
> + dev_err(vf_dev, "mailbox cmd channel state is OK!\n");
> +
> + return 0;
> +}
> +
> +static int hisi_acc_vf_dev_read(struct seq_file *seq, void *data)
> +{
> + struct device *vf_dev = seq->private;
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
> + struct vfio_device *vdev = &core_device->vdev;
> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);
> + size_t vf_data_sz = offsetofend(struct acc_vf_data, padding);
> + struct hisi_acc_vf_migration_file *migf = NULL;
> + int ret;
> +
> + migf = kzalloc(sizeof(struct hisi_acc_vf_migration_file), GFP_KERNEL);
> + if (!migf)
> + return -ENOMEM;
> +
> + mutex_lock(&hisi_acc_vdev->enable_mutex);
> + ret = hisi_acc_vf_debug_check(seq, vdev);
> + if (ret) {
> + mutex_unlock(&hisi_acc_vdev->enable_mutex);
> + goto migf_err;
> + }
> +
> + migf->vf_data.vf_qm_state = hisi_acc_vdev->vf_qm_state;
> + ret = vf_qm_read_data(&hisi_acc_vdev->vf_qm, &migf->vf_data);
> + if (ret) {
> + mutex_unlock(&hisi_acc_vdev->enable_mutex);
> + dev_err(vf_dev, "failed to read device data!\n");
> + goto migf_err;
> + }
> + mutex_unlock(&hisi_acc_vdev->enable_mutex);
> +
> + if (hisi_acc_vdev->resuming_migf)
> + migf->disabled = hisi_acc_vdev->resuming_migf->disabled;
> + else if (hisi_acc_vdev->saving_migf)
> + migf->disabled = hisi_acc_vdev->saving_migf->disabled;
> + else
> + migf->disabled = true;
> + migf->total_length = sizeof(struct acc_vf_data);
> +
> + seq_hex_dump(seq, "Dev Data:", DUMP_PREFIX_OFFSET, 16, 1,
> + (unsigned char *)&migf->vf_data,
> + vf_data_sz, false);
> +
> + seq_printf(seq,
> + "acc device:\n"
> + "device ready: %u\n"
> + "device opened: %d\n"
> + "data valid: %d\n"
> + "data size: %lu\n",
> + hisi_acc_vdev->vf_qm_state,
> + hisi_acc_vdev->dev_opened,
> + migf->disabled,
> + migf->total_length);
> +
> +migf_err:
> + kfree(migf);
> +
> + return ret;
> +}
> +
> +static int hisi_acc_vf_migf_read(struct seq_file *seq, void *data)
> +{
> + struct device *vf_dev = seq->private;
> + struct vfio_pci_core_device *core_device = dev_get_drvdata(vf_dev);
> + struct vfio_device *vdev = &core_device->vdev;
> + struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(vdev);
> + size_t vf_data_sz = offsetofend(struct acc_vf_data, padding);
> + struct hisi_acc_vf_migration_file *debug_migf = hisi_acc_vdev->debug_migf;
> +
> + /* Check whether the live migration operation has been performed */
> + if (debug_migf->total_length < vf_data_sz) {
> + dev_err(vf_dev, "device not migrated!\n");
> + return -EAGAIN;
> + }
> +
> + seq_hex_dump(seq, "Mig Data:", DUMP_PREFIX_OFFSET, 16, 1,
> + (unsigned char *)&debug_migf->vf_data,
> + vf_data_sz, false);
> +
> + seq_printf(seq,
> + "acc device:\n"
> + "device ready: %u\n"
> + "device opened: %d\n"
> + "data valid: %d\n"
> + "data size: %lu\n",
> + hisi_acc_vdev->vf_qm_state,
> + hisi_acc_vdev->dev_opened,
> + debug_migf->disabled,
> + debug_migf->total_length);
> +
> + return 0;
> +}
> +
> +static int hisi_acc_vfio_debug_init(struct hisi_acc_vf_core_device *hisi_acc_vdev)
> +{
> + struct vfio_device *vdev = &hisi_acc_vdev->core_device.vdev;
> + struct dentry *vfio_dev_migration = NULL;
> + struct dentry *vfio_hisi_acc = NULL;
> + struct device *dev = vdev->dev;
> + void *migf = NULL;
> +
> + if (!debugfs_initialized() ||
> + !IS_ENABLED(CONFIG_VFIO_DEBUGFS))
> + return 0;
> +
> + migf = kzalloc(sizeof(struct hisi_acc_vf_migration_file), GFP_KERNEL);
> + if (!migf)
> + return -ENOMEM;
> + hisi_acc_vdev->debug_migf = migf;
> +
> + vfio_dev_migration = debugfs_lookup("migration", vdev->debug_root);
> + if (!vfio_dev_migration) {
> + kfree(migf);
> + hisi_acc_vdev->debug_migf = NULL;
> + dev_err(dev, "failed to lookup migration debugfs file!\n");
> + return -ENODEV;
> + }
> +
> + vfio_hisi_acc = debugfs_create_dir("hisi_acc", vfio_dev_migration);
> + debugfs_create_devm_seqfile(dev, "dev_data", vfio_hisi_acc,
> + hisi_acc_vf_dev_read);
> + debugfs_create_devm_seqfile(dev, "migf_data", vfio_hisi_acc,
> + hisi_acc_vf_migf_read);
> + debugfs_create_devm_seqfile(dev, "cmd_state", vfio_hisi_acc,
> + hisi_acc_vf_debug_cmd);
> +
> + return 0;
> +}
> +
> +static void hisi_acc_vf_debugfs_exit(struct hisi_acc_vf_core_device *hisi_acc_vdev)
> +{
> + if (!debugfs_initialized() ||
> + !IS_ENABLED(CONFIG_VFIO_DEBUGFS))
> + return;
> +
> + if (hisi_acc_vdev->debug_migf)
> + kfree(hisi_acc_vdev->debug_migf);
> +}
> +
> static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
> {
> struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev);
> @@ -1311,9 +1527,11 @@ static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
> return ret;
> }
> hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
> + hisi_acc_vdev->dev_opened = true;
> }
>
> vfio_pci_core_finish_enable(vdev);
> +
> return 0;
> }
>
> @@ -1322,7 +1540,10 @@ static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev)
> struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_get_vf_dev(core_vdev);
> struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
>
> + hisi_acc_vdev->dev_opened = false;
> + mutex_lock(&hisi_acc_vdev->enable_mutex);
> iounmap(vf_qm->io_base);
> + mutex_unlock(&hisi_acc_vdev->enable_mutex);
> vfio_pci_core_close_device(core_vdev);
> }
>
> @@ -1413,6 +1634,9 @@ static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device
> ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
> if (ret)
> goto out_put_vdev;
> +
> + if (ops == &hisi_acc_vfio_pci_migrn_ops)
> + hisi_acc_vfio_debug_init(hisi_acc_vdev);
> return 0;
>
> out_put_vdev:
> @@ -1425,6 +1649,7 @@ static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev)
> struct hisi_acc_vf_core_device *hisi_acc_vdev = hisi_acc_drvdata(pdev);
>
> vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
> + hisi_acc_vf_debugfs_exit(hisi_acc_vdev);
> vfio_put_device(&hisi_acc_vdev->core_device.vdev);
> }
>
> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> index f887ab98581c..93ee8bef32a1 100644
> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> @@ -32,6 +32,7 @@
> #define QM_SQC_VFT_BASE_MASK_V2 GENMASK(15, 0)
> #define QM_SQC_VFT_NUM_SHIFT_V2 45
> #define QM_SQC_VFT_NUM_MASK_V2 GENMASK(9, 0)
> +#define QM_MB_CMD_NOT_READY 0xffffffff
>
> /* RW regs */
> #define QM_REGS_MAX_LEN 7
> @@ -114,5 +115,11 @@ struct hisi_acc_vf_core_device {
> int vf_id;
> struct hisi_acc_vf_migration_file *resuming_migf;
> struct hisi_acc_vf_migration_file *saving_migf;
> +
> + /* To make sure the device is enabled */
> + struct mutex enable_mutex;
> + bool dev_opened;
> + /* To save migration data */
> + struct hisi_acc_vf_migration_file *debug_migf;
> };
> #endif /* HISI_ACC_VFIO_PCI_H */
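
For readers following the review comment above about dev_err() in every error path: one way to keep such checks quiet is to return only an error code and leave any reporting to the caller. The following is a minimal userspace sketch in C, not the driver's code. struct vf_state_model and vf_debug_check_quiet() are hypothetical names; the three conditions and the errno values mirror hisi_acc_vf_debug_check() in the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state consulted by the debugfs checks. */
struct vf_state_model {
	bool has_mig_ops;	/* models vdev->mig_ops being set */
	bool dev_opened;	/* models hisi_acc_vdev->dev_opened */
	bool qm_ready;		/* models qm_wait_dev_not_ready() returning 0 */
};

/*
 * Return 0 when the device may be read, or a negative errno otherwise.
 * Nothing is logged here; the caller decides how to report a failure.
 */
static int vf_debug_check_quiet(const struct vf_state_model *st)
{
	assert(st != NULL);

	if (!st->has_mig_ops)
		return -EINVAL;
	if (!st->dev_opened)
		return -EINVAL;
	if (!st->qm_ready)
		return -EBUSY;

	return 0;
}
```

A caller can pass the return value straight back to whoever read the file, so a failed read already signals the problem without a log entry per attempt. Whether the next revision of the series takes this shape is not stated in the thread.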


Subject: RE: [PATCH v6 4/5] hisi_acc_vfio_pci: register debugfs for hisilicon migration driver



> -----Original Message-----
> From: Alex Williamson <[email protected]>
> Sent: Friday, May 3, 2024 6:21 PM
> To: liulongfang <[email protected]>
> Cc: [email protected]; Shameerali Kolothum Thodi
> <[email protected]>; Jonathan Cameron
> <[email protected]>; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH v6 4/5] hisi_acc_vfio_pci: register debugfs for hisilicon
> migration driver
>
> On Thu, 25 Apr 2024 21:23:21 +0800
> Longfang Liu <[email protected]> wrote:
>
> > When the CONFIG_VFIO_DEBUGFS macro is enabled, the live migration
> > driver of the HiSilicon accelerator device registers its debug
> > functions with the debugfs framework of VFIO.
> >
> > After the HiSilicon accelerator device is registered with the live
> > migration debugfs framework of vfio, a debugfs directory "hisi_acc"
> > is created under the device's migration directory, and three debug
> > files are created in it:
> >
> >  vfio
> >   |
> >   +---<dev_name1>
> >   |    +---migration
> >   |        +--state
> >   |        +--hisi_acc
> >   |            +--dev_data
> >   |            +--migf_data
> >   |            +--cmd_state
> >   |
> >   +---<dev_name2>
> >        +---migration
> >            +--state
> >            +--hisi_acc
> >                +--dev_data
> >                +--migf_data
> >                +--cmd_state
> >
> > dev_data file: reads, in real time, the device data that needs to be
> > migrated from the current device.
> > migf_data file: reads the data of the last live migration from the
> > current driver.
> > cmd_state file: reports the cmd channel state of the device.
> >
> > +----------------+        +--------------+       +---------------+
> > | migration dev  |        |   src  dev   |       |   dst  dev    |
> > +-------+--------+        +------+-------+       +-------+-------+
> >         |                        |                       |
> >         |                 +------v-------+       +-------v-------+
> >         |                 | saving_migf  |       | resuming_migf |
> >   read  |                 |     file     |       |     file      |
> >         |                 +------+-------+       +-------+-------+
> >         |                        |          copy         |
> >         |                        +------------+----------+
> >         |                                     |
> > +-------v---------+                   +-------v--------+
> > |   data buffer   |                   |   debug_migf   |
> > +-------+---------+                   +-------+--------+
> >         |                                     |
> >    cat  |                                 cat |
> > +-------v--------+                    +-------v--------+
> > |     dev_data   |                    |   migf_data    |
> > +----------------+                    +----------------+
> >
> > When accessing debugfs, the user can obtain the real-time status data
> > of the device through the "dev_data" file. Reading it fetches the
> > device status data directly and does not affect the live migration
> > function. The data is stored in an allocated memory buffer, and
> > the memory is released after the data is returned to user mode.
> >
> > To obtain the data of the last complete migration, the user reads
> > the "migf_data" file. Live migration data exists only during the
> > migration process and is destroyed after the migration completes.
> > To preserve this data, a debug_migf file is created in the
> > driver. At the end of the live migration process, the data is
> > copied to debug_migf.
> >
> > Signed-off-by: Longfang Liu <[email protected]>
> > ---
> > .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 225 ++++++++++++++++++
> > .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 7 +
> > 2 files changed, 232 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > index bf358ba94b5d..656b3d975940 100644
> > --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > @@ -627,15 +627,33 @@ static void hisi_acc_vf_disable_fd(struct
> hisi_acc_vf_migration_file *migf)
> > mutex_unlock(&migf->lock);
> > }
> >
> > +static void hisi_acc_debug_migf_copy(struct hisi_acc_vf_core_device
> *hisi_acc_vdev,
> > + struct hisi_acc_vf_migration_file *src_migf)
> > +{
> > + struct hisi_acc_vf_migration_file *dst_migf = hisi_acc_vdev-
> >debug_migf;
> > +
> > + if (!dst_migf)
> > + return;
> > +
> > + mutex_lock(&hisi_acc_vdev->enable_mutex);
> > + dst_migf->disabled = src_migf->disabled;
> > + dst_migf->total_length = src_migf->total_length;
> > + memcpy(&dst_migf->vf_data, &src_migf->vf_data,
> > + sizeof(struct acc_vf_data));
> > + mutex_unlock(&hisi_acc_vdev->enable_mutex);
> > +}
> > +
> > static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device
> *hisi_acc_vdev)
> > {
> > if (hisi_acc_vdev->resuming_migf) {
> > + hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev-
> >resuming_migf);
> > hisi_acc_vf_disable_fd(hisi_acc_vdev->resuming_migf);
> > fput(hisi_acc_vdev->resuming_migf->filp);
> > hisi_acc_vdev->resuming_migf = NULL;
> > }
> >
> > if (hisi_acc_vdev->saving_migf) {
> > + hisi_acc_debug_migf_copy(hisi_acc_vdev, hisi_acc_vdev-
> >saving_migf);
> > hisi_acc_vf_disable_fd(hisi_acc_vdev->saving_migf);
> > fput(hisi_acc_vdev->saving_migf->filp);
> > hisi_acc_vdev->saving_migf = NULL;
> > @@ -1144,6 +1162,7 @@ static int hisi_acc_vf_qm_init(struct
> hisi_acc_vf_core_device *hisi_acc_vdev)
> > if (!vf_qm->io_base)
> > return -EIO;
> >
> > + mutex_init(&hisi_acc_vdev->enable_mutex);
> > vf_qm->fun_type = QM_HW_VF;
> > vf_qm->pdev = vf_dev;
> > mutex_init(&vf_qm->mailbox_lock);
> > @@ -1294,6 +1313,203 @@ static long hisi_acc_vfio_pci_ioctl(struct
> vfio_device *core_vdev, unsigned int
> > return vfio_pci_core_ioctl(core_vdev, cmd, arg);
> > }
> >
> > +static int hisi_acc_vf_debug_check(struct seq_file *seq, struct vfio_device
> *vdev)
> > +{
> > + struct hisi_acc_vf_core_device *hisi_acc_vdev =
> hisi_acc_get_vf_dev(vdev);
> > + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
> > + struct device *dev = vdev->dev;
> > + int ret;
> > +
> > + if (!vdev->mig_ops) {
> > + dev_err(dev, "device does not support live migration!\n");
>
> Sorry, every error path should not spam dmesg with dev_err(). I'm
> going to wait until your co-maintainer approves this before looking at
> any further iterations of this series. Thanks,

Sure. I will sync up with Longfang and also make sure we address all the existing
comments on this before posting the next revision.

Thanks,
Shameer

2024-05-07 08:29:34

by liulongfang

Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address

On 2024/5/4 0:11, Alex Williamson wrote:
> On Thu, 25 Apr 2024 21:23:19 +0800
> Longfang Liu <[email protected]> wrote:
>
>> According to the latest hardware register specification, the DMA
>> addresses of EQE and AEQE are not at the front of their respective
>> register groups, but start from the second register.
>> So, previously fetching the value starting from the first register
>> would result in an incorrect address.
>>
>> Therefore, the register location from which the address is obtained
>> needs to be modified.
>
> How does this affect migration? Has it ever worked? Does this make

A typical HiSilicon accelerator task uses only SQE and CQE.
EQE is used only when the user runs a kernel-mode task in interrupt mode.
AEQE is used only when a task exception occurs and a software reset
is required.

The DMA addresses of these four queues are written to the device by the device
driver through the mailbox command during driver initialization.
The DMA addresses of EQE and AEQE are migrated through the device registers.

Because EQE and AEQE are not used by typical tasks, the DMA address error goes
unnoticed after live migration completes. We only discovered that the address
was wrong after adding a new kernel-mode test case.

> the migration data incompatible?
>

These addresses affect only kernel-mode tasks running in interrupt mode and
device exception recovery.
They do not affect live migration functionality.

> Fixes: ???
>

OK!

Thanks.
Longfang.
>> Signed-off-by: Longfang Liu <[email protected]>
>> ---
>> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 ++++----
>> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 3 +++
>> 2 files changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> index 45351be8e270..0c7e31076ff4 100644
>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> @@ -516,12 +516,12 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
>> return -EINVAL;
>>
>> /* Every reg is 32 bit, the dma address is 64 bit. */
>> - vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
>> + vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
>> vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
>> - vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
>> - vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
>> + vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
>> + vf_data->aeqe_dma = vf_data->qm_aeqc_dw[QM_XQC_ADDR_HIGH];
>> vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
>> - vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
>> + vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[QM_XQC_ADDR_LOW];
>>
>> /* Through SQC_BT/CQC_BT to get sqc and cqc address */
>> ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>> index 5bab46602fad..f887ab98581c 100644
>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>> @@ -38,6 +38,9 @@
>> #define QM_REG_ADDR_OFFSET 0x0004
>>
>> #define QM_XQC_ADDR_OFFSET 32U
>> +#define QM_XQC_ADDR_LOW 0x1
>> +#define QM_XQC_ADDR_HIGH 0x2
>> +
>> #define QM_VF_AEQ_INT_MASK 0x0004
>> #define QM_VF_EQ_INT_MASK 0x000c
>> #define QM_IFC_INT_SOURCE_V 0x0020
>
> .
>
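
The indexing change in the quoted diff can be checked in isolation. The following is a minimal userspace sketch, assuming only what the diff shows: each register is 32 bits wide, the low half of the address sits at index QM_XQC_ADDR_LOW (1), and the high half sits at index QM_XQC_ADDR_HIGH (2). The helper name xqc_dma_addr() and its nwords parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QM_XQC_ADDR_OFFSET	32U
#define QM_XQC_ADDR_LOW		0x1
#define QM_XQC_ADDR_HIGH	0x2

/*
 * Build a 64-bit DMA address from a group of 32-bit context registers.
 * Word 0 is skipped: per the quoted patch, the address occupies words 1 and 2.
 */
static uint64_t xqc_dma_addr(const uint32_t *dw, size_t nwords)
{
	uint64_t addr;

	assert(dw != NULL && nwords > QM_XQC_ADDR_HIGH);

	addr = dw[QM_XQC_ADDR_HIGH];
	addr <<= QM_XQC_ADDR_OFFSET;
	addr |= dw[QM_XQC_ADDR_LOW];

	return addr;
}
```

For the register group { 0xdeadbeef, 0x89abcdef, 0x01234567, 0 }, this returns 0x0123456789abcdef. The pre-patch indices (high half from word 1, low half from word 0) would instead produce 0x89abcdefdeadbeef for the same input.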

2024-05-07 08:31:09

by liulongfang

Subject: Re: [PATCH v6 3/5] hisi_acc_vfio_pci: create subfunction for data reading

On 2024/5/4 0:25, Alex Williamson wrote:
> On Thu, 25 Apr 2024 21:23:20 +0800
> Longfang Liu <[email protected]> wrote:
>
>> During the live migration process.
>
> This is not a complete sentence.
>
>> It needs to obtain various status
>> data of drivers and devices.
>
> What's "It" describing here?
>
>> In order to facilitate calling it in the
>> debugfs function.
>
> Also not a complete sentence.
>
>> For all operations that read data from device registers,
>> the driver creates a subfunction.
>
> There's only one sub-function.
>
>> Also fixed the location of address data.
>
> I think this is addressed in the previous patch now? Thanks,
>

OK, I'll update the descriptions again.

> Alex
>

Thanks,
Longfang.

>> Signed-off-by: Longfang Liu <[email protected]>
>> ---
>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 54 +++++++++++--------
>> 1 file changed, 33 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> index 0c7e31076ff4..bf358ba94b5d 100644
>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> @@ -486,31 +486,11 @@ static int vf_qm_load_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
>> return 0;
>> }
>>
>> -static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
>> - struct hisi_acc_vf_migration_file *migf)
>> +static int vf_qm_read_data(struct hisi_qm *vf_qm, struct acc_vf_data *vf_data)
>> {
>> - struct acc_vf_data *vf_data = &migf->vf_data;
>> - struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
>> struct device *dev = &vf_qm->pdev->dev;
>> int ret;
>>
>> - if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
>> - /* Update state and return with match data */
>> - vf_data->vf_qm_state = QM_NOT_READY;
>> - hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
>> - migf->total_length = QM_MATCH_SIZE;
>> - return 0;
>> - }
>> -
>> - vf_data->vf_qm_state = QM_READY;
>> - hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
>> -
>> - ret = vf_qm_cache_wb(vf_qm);
>> - if (ret) {
>> - dev_err(dev, "failed to writeback QM Cache!\n");
>> - return ret;
>> - }
>> -
>> ret = qm_get_regs(vf_qm, vf_data);
>> if (ret)
>> return -EINVAL;
>> @@ -536,6 +516,38 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
>> return -EINVAL;
>> }
>>
>> + return 0;
>> +}
>> +
>> +static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
>> + struct hisi_acc_vf_migration_file *migf)
>> +{
>> + struct acc_vf_data *vf_data = &migf->vf_data;
>> + struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
>> + struct device *dev = &vf_qm->pdev->dev;
>> + int ret;
>> +
>> + if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
>> + /* Update state and return with match data */
>> + vf_data->vf_qm_state = QM_NOT_READY;
>> + hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
>> + migf->total_length = QM_MATCH_SIZE;
>> + return 0;
>> + }
>> +
>> + vf_data->vf_qm_state = QM_READY;
>> + hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
>> +
>> + ret = vf_qm_cache_wb(vf_qm);
>> + if (ret) {
>> + dev_err(dev, "failed to writeback QM Cache!\n");
>> + return ret;
>> + }
>> +
>> + ret = vf_qm_read_data(vf_qm, vf_data);
>> + if (ret)
>> + return -EINVAL;
>> +
>> migf->total_length = sizeof(struct acc_vf_data);
>> return 0;
>> }
>
> .
>
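
The refactoring in the quoted diff separates "read the registers" from "decide what the migration file contains", so that a second caller such as a debugfs handler can reuse the register read. The following is a simplified userspace model of that split, not the driver's code. Every name prefixed with model_ or MODEL_ is hypothetical, and the "match data" length is modelled as the size of the state field only, which is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NREGS 4

enum model_qm_state { MODEL_QM_NOT_READY = 0, MODEL_QM_READY = 1 };

/* Hypothetical device: a readiness flag plus a few readable registers. */
struct model_dev {
	bool ready;
	bool regs_ok;			/* false models a failing register read */
	uint32_t regs[MODEL_NREGS];
};

struct model_vf_data {
	uint32_t qm_state;
	uint32_t regs[MODEL_NREGS];
};

/* Register reads only: no readiness handling, no length bookkeeping. */
static int model_read_data(const struct model_dev *dev,
			   struct model_vf_data *data)
{
	assert(dev != NULL && data != NULL);

	if (!dev->regs_ok)
		return -EINVAL;

	memcpy(data->regs, dev->regs, sizeof(data->regs));
	return 0;
}

/* Migration save path: readiness check first, then the shared read helper. */
static int model_state_save(const struct model_dev *dev,
			    struct model_vf_data *data, size_t *total_length)
{
	int ret;

	assert(dev != NULL && data != NULL && total_length != NULL);

	if (!dev->ready) {
		/* Device not ready: report only the state field. */
		data->qm_state = MODEL_QM_NOT_READY;
		*total_length = sizeof(data->qm_state);
		return 0;
	}

	data->qm_state = MODEL_QM_READY;
	ret = model_read_data(dev, data);
	if (ret)
		return ret;

	*total_length = sizeof(*data);
	return 0;
}
```

A debug reader would call model_read_data() directly and skip the state bookkeeping, which is the reuse the patch description is aiming at.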

Subject: RE: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address



> -----Original Message-----
> From: liulongfang <[email protected]>
> Sent: Tuesday, May 7, 2024 9:29 AM
> To: Alex Williamson <[email protected]>
> Cc: [email protected]; Shameerali Kolothum Thodi
> <[email protected]>; Jonathan Cameron
> <[email protected]>; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
> the XQC address
>
> On 2024/5/4 0:11, Alex Williamson wrote:
> > On Thu, 25 Apr 2024 21:23:19 +0800
> > Longfang Liu <[email protected]> wrote:
> >
> >> According to the latest hardware register specification, the DMA
> >> addresses of EQE and AEQE are not at the front of their respective
> >> register groups, but start from the second register.
> >> So, previously fetching the value starting from the first register
> >> would result in an incorrect address.
> >>
> >> Therefore, the register location from which the address is obtained
> >> needs to be modified.
> >
> > How does this affect migration? Has it ever worked? Does this make
>
> A typical HiSilicon accelerator task uses only SQE and CQE.
> EQE is used only when the user runs a kernel-mode task in interrupt mode.
> AEQE is used only when a task exception occurs and a software reset
> is required.
>
> The DMA addresses of these four queues are written to the device by the device
> driver through the mailbox command during driver initialization.
> The DMA addresses of EQE and AEQE are migrated through the device registers.
>
> Because EQE and AEQE are not used by typical tasks, the DMA address error goes
> unnoticed after live migration completes. We only discovered that the address
> was wrong after adding a new kernel-mode test case.
>
> > the migration data incompatible?
> >
>
> These addresses affect only kernel-mode tasks running in interrupt mode and
> device exception recovery.
> They do not affect live migration functionality.
>
> > Fixes: ???
>
> OK!

Hi,

Could you please add the Fixes tag and resend this separately if there are no
outstanding comments on it? It is not related to the debugfs support anyway.

Thanks,
Shameer

> Thanks.
> Longfang.
> >> Signed-off-by: Longfang Liu <[email protected]>
> >> ---
> >> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 ++++----
> >> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 3 +++
> >> 2 files changed, 7 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >> index 45351be8e270..0c7e31076ff4 100644
> >> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >> @@ -516,12 +516,12 @@ static int vf_qm_state_save(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> >> return -EINVAL;
> >>
> >> /* Every reg is 32 bit, the dma address is 64 bit. */
> >> - vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
> >> + vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
> >> vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> >> - vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
> >> - vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
> >> + vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
> >> + vf_data->aeqe_dma = vf_data->qm_aeqc_dw[QM_XQC_ADDR_HIGH];
> >> vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> >> - vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
> >> + vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[QM_XQC_ADDR_LOW];
> >>
> >> /* Through SQC_BT/CQC_BT to get sqc and cqc address */
> >> ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
> >> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >> index 5bab46602fad..f887ab98581c 100644
> >> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >> @@ -38,6 +38,9 @@
> >> #define QM_REG_ADDR_OFFSET 0x0004
> >>
> >> #define QM_XQC_ADDR_OFFSET 32U
> >> +#define QM_XQC_ADDR_LOW 0x1
> >> +#define QM_XQC_ADDR_HIGH 0x2
> >> +
> >> #define QM_VF_AEQ_INT_MASK 0x0004
> >> #define QM_VF_EQ_INT_MASK 0x000c
> >> #define QM_IFC_INT_SOURCE_V 0x0020
> >
> > .
> >

2024-05-07 12:59:43

by Alex Williamson

Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address

On Tue, 7 May 2024 16:29:05 +0800
liulongfang <[email protected]> wrote:

> On 2024/5/4 0:11, Alex Williamson wrote:
> > On Thu, 25 Apr 2024 21:23:19 +0800
> > Longfang Liu <[email protected]> wrote:
> >
> >> According to the latest hardware register specification, the DMA
> >> addresses of EQE and AEQE are not at the front of their respective
> >> register groups, but start from the second register.
> >> So, previously fetching the value starting from the first register
> >> would result in an incorrect address.
> >>
> >> Therefore, the register location from which the address is obtained
> >> needs to be modified.
> >
> > How does this affect migration? Has it ever worked? Does this make
>
> A typical HiSilicon accelerator task uses only SQE and CQE.
> EQE is used only when the user runs a kernel-mode task in interrupt mode.
> AEQE is used only when a task exception occurs and a software reset
> is required.
>
> The DMA addresses of these four queues are written to the device by the device
> driver through the mailbox command during driver initialization.
> The DMA addresses of EQE and AEQE are migrated through the device registers.
>
> Because EQE and AEQE are not used by typical tasks, the DMA address error goes
> unnoticed after live migration completes. We only discovered that the address
> was wrong after adding a new kernel-mode test case.
>
> > the migration data incompatible?
> >
>
> These addresses affect only kernel-mode tasks running in interrupt mode and
> device exception recovery.
> They do not affect live migration functionality.

Then why are we migrating them? Especially EQE, if it is only used by
kernel mode drivers then why does the migration protocol have any
business transferring the value from the source device? It seems the
fix should be not to apply the value from the source and mark these as
reserved fields in the migration data stream. Thanks,

Alex


> > Fixes: ???
> >
>
> OK!
>
> Thanks.
> Longfang.
> >> Signed-off-by: Longfang Liu <[email protected]>
> >> ---
> >> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 ++++----
> >> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 3 +++
> >> 2 files changed, 7 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >> index 45351be8e270..0c7e31076ff4 100644
> >> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> >> @@ -516,12 +516,12 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> >> return -EINVAL;
> >>
> >> /* Every reg is 32 bit, the dma address is 64 bit. */
> >> - vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
> >> + vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
> >> vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> >> - vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
> >> - vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
> >> + vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
> >> + vf_data->aeqe_dma = vf_data->qm_aeqc_dw[QM_XQC_ADDR_HIGH];
> >> vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> >> - vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
> >> + vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[QM_XQC_ADDR_LOW];
> >>
> >> /* Through SQC_BT/CQC_BT to get sqc and cqc address */
> >> ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
> >> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >> index 5bab46602fad..f887ab98581c 100644
> >> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
> >> @@ -38,6 +38,9 @@
> >> #define QM_REG_ADDR_OFFSET 0x0004
> >>
> >> #define QM_XQC_ADDR_OFFSET 32U
> >> +#define QM_XQC_ADDR_LOW 0x1
> >> +#define QM_XQC_ADDR_HIGH 0x2
> >> +
> >> #define QM_VF_AEQ_INT_MASK 0x0004
> >> #define QM_VF_EQ_INT_MASK 0x000c
> >> #define QM_IFC_INT_SOURCE_V 0x0020
> >
> > .
> >
>


2024-05-08 07:19:14

by liulongfang

Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address

On 2024/5/7 20:35, Alex Williamson wrote:
> On Tue, 7 May 2024 16:29:05 +0800
> liulongfang <[email protected]> wrote:
>
>> On 2024/5/4 0:11, Alex Williamson wrote:
>>> On Thu, 25 Apr 2024 21:23:19 +0800
>>> Longfang Liu <[email protected]> wrote:
>>>
>>>> According to the latest hardware register specification. The DMA
>>>> addresses of EQE and AEQE are not at the front of their respective
>>>> register groups, but start from the second.
>>>> So, previously fetching the value starting from the first register
>>>> would result in an incorrect address.
>>>>
>>>> Therefore, the register location from which the address is obtained
>>>> needs to be modified.
>>>
>>> How does this affect migration? Has it ever worked? Does this make
>>
>> The general HiSilicon accelerator task will only use SQE and CQE.
>> EQE is only used when user running kernel mode task and uses interrupt mode.
>> AEQE is only used when user running task exceptions occur and software reset
>> is required.
>>
>> The DMA addresses of these four queues are written to the device by the device
>> driver through the mailbox command during driver initialization.
>> The DMA addresses of EQE and AEQE are migrated through the device register.
>>
>> EQE and AEQE are not used in general task, after the live migration is completed,
>> this DMA address error will not be found. until we added a new kernel-mode test case
>> that we discovered that this address was abnormal.
>>
>>> the migration data incompatible?
>>>
>>
>> This address only affects the kernel mode interrupt mode task function and device
>> exception recovery function.
>> They do not affect live migration functionality
>
> Then why are we migrating them? Especially EQE, if it is only used by
> kernel mode drivers then why does the migration protocol have any
> business transferring the value from the source device? It seems the
> fix should be not to apply the value from the source and mark these as
> reserved fields in the migration data stream. Thanks,
>

After live migration completes, a HiSilicon accelerator device can run general services.
These services run through the user-mode driver and only need SQE and CQE.

The device can also run kernel-mode services in the VM through the crypto
subsystem. These services require EQE.

Finally, if the device becomes abnormal, the driver needs to reset the device, and AEQE is
needed in that case.

A complete device live migration function must keep the device working normally in all of
these scenarios, so this data still needs to be migrated.

Thanks,
Longfang.
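
For reference, the address assembly that this patch corrects can be sketched as a small standalone helper. The macro values are the ones from the diff quoted in this thread; the helper name xqc_dma_addr() is hypothetical and does not exist in the driver, which open-codes the same steps in vf_qm_state_save().

```c
#include <stdint.h>

/* Values taken from the patch under discussion. */
#define QM_XQC_ADDR_OFFSET 32U
#define QM_XQC_ADDR_LOW    0x1
#define QM_XQC_ADDR_HIGH   0x2

/*
 * Build a 64-bit DMA address from a group of 32-bit context registers.
 * xqc_dma_addr() is a hypothetical name used only for this sketch.
 */
static uint64_t xqc_dma_addr(const uint32_t *dw)
{
	uint64_t addr = dw[QM_XQC_ADDR_HIGH];   /* upper 32 bits */

	addr <<= QM_XQC_ADDR_OFFSET;
	addr |= dw[QM_XQC_ADDR_LOW];            /* lower 32 bits */
	return addr;
}
```

With the old indices (1 for the high half, 0 for the low half), the same helper would mix an unrelated first register into the low half of the address, which is the incorrect value described in the commit message.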

> Alex
>
>
>>> Fixes: ???
>>>
>>
>> OK!
>>
>> Thanks.
>> Longfang.
>>>> Signed-off-by: Longfang Liu <[email protected]>
>>>> ---
>>>> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 ++++----
>>>> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 3 +++
>>>> 2 files changed, 7 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>>> index 45351be8e270..0c7e31076ff4 100644
>>>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>>>> @@ -516,12 +516,12 @@ static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
>>>> return -EINVAL;
>>>>
>>>> /* Every reg is 32 bit, the dma address is 64 bit. */
>>>> - vf_data->eqe_dma = vf_data->qm_eqc_dw[1];
>>>> + vf_data->eqe_dma = vf_data->qm_eqc_dw[QM_XQC_ADDR_HIGH];
>>>> vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
>>>> - vf_data->eqe_dma |= vf_data->qm_eqc_dw[0];
>>>> - vf_data->aeqe_dma = vf_data->qm_aeqc_dw[1];
>>>> + vf_data->eqe_dma |= vf_data->qm_eqc_dw[QM_XQC_ADDR_LOW];
>>>> + vf_data->aeqe_dma = vf_data->qm_aeqc_dw[QM_XQC_ADDR_HIGH];
>>>> vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
>>>> - vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[0];
>>>> + vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[QM_XQC_ADDR_LOW];
>>>>
>>>> /* Through SQC_BT/CQC_BT to get sqc and cqc address */
>>>> ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
>>>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>>> index 5bab46602fad..f887ab98581c 100644
>>>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
>>>> @@ -38,6 +38,9 @@
>>>> #define QM_REG_ADDR_OFFSET 0x0004
>>>>
>>>> #define QM_XQC_ADDR_OFFSET 32U
>>>> +#define QM_XQC_ADDR_LOW 0x1
>>>> +#define QM_XQC_ADDR_HIGH 0x2
>>>> +
>>>> #define QM_VF_AEQ_INT_MASK 0x0004
>>>> #define QM_VF_EQ_INT_MASK 0x000c
>>>> #define QM_IFC_INT_SOURCE_V 0x0020
>>>
>>> .
>>>
>>
>
> .
>

2024-05-08 18:09:45

by Alex Williamson

Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address

On Wed, 8 May 2024 15:18:55 +0800
liulongfang <[email protected]> wrote:

> On 2024/5/7 20:35, Alex Williamson wrote:
> > On Tue, 7 May 2024 16:29:05 +0800
> > liulongfang <[email protected]> wrote:
> >
> >> On 2024/5/4 0:11, Alex Williamson wrote:
> >>> On Thu, 25 Apr 2024 21:23:19 +0800
> >>> Longfang Liu <[email protected]> wrote:
> >>>
> >>>> According to the latest hardware register specification. The DMA
> >>>> addresses of EQE and AEQE are not at the front of their respective
> >>>> register groups, but start from the second.
> >>>> So, previously fetching the value starting from the first register
> >>>> would result in an incorrect address.
> >>>>
> >>>> Therefore, the register location from which the address is obtained
> >>>> needs to be modified.
> >>>
> >>> How does this affect migration? Has it ever worked? Does this make
> >>
> >> The general HiSilicon accelerator task will only use SQE and CQE.
> >> EQE is only used when user running kernel mode task and uses interrupt mode.
> >> AEQE is only used when user running task exceptions occur and software reset
> >> is required.
> >>
> >> The DMA addresses of these four queues are written to the device by the device
> >> driver through the mailbox command during driver initialization.
> >> The DMA addresses of EQE and AEQE are migrated through the device register.
> >>
> >> EQE and AEQE are not used in general task, after the live migration is completed,
> >> this DMA address error will not be found. until we added a new kernel-mode test case
> >> that we discovered that this address was abnormal.
> >>
> >>> the migration data incompatible?
> >>>
> >>
> >> This address only affects the kernel mode interrupt mode task function and device
> >> exception recovery function.
> >> They do not affect live migration functionality
> >
> > Then why are we migrating them? Especially EQE, if it is only used by
> > kernel mode drivers then why does the migration protocol have any
> > business transferring the value from the source device? It seems the
> > fix should be not to apply the value from the source and mark these as
> > reserved fields in the migration data stream. Thanks,
> >
>
> HiSilicon accelerator equipment can perform general services after completing live migration.
> This kind of business is executed through the user mode driver and only needs to use SQE and CQE.
>
> At the same time, this device can also perform kernel-mode services in the VM through the crypto
> subsystem. This kind of service requires the use of EQE.
>
> Finally, if the device is abnormal, the driver needs to perform a device reset, and AEQE needs to
> be used in this case.
>
> Therefore, a complete device live migration function needs to ensure that device functions are
> normal in all these scenarios.
> Therefore, this data still needs to be migrated.

Ok, I had jumped to an in-kernel host driver in reference to "kernel
mode" rather than a guest kernel. Migrating with bad data only affects
the current configuration of the device, reloading a guest driver to
update these registers or a reset of the device would allow proper
operation of the device, correct?

But I think this still isn't really a complete solution, we know
there's a bug in the migration data stream, so not only would we fix
the data stream, but I think we should also take measures to prevent
loading a known bad data stream. AIUI migration of this device while
running in kernel mode (ie. a kernel driver within a guest VM) is
broken. Therefore, the least we can do in a new kernel, knowing that
there was previously a bug in the migration data stream, is to fail to
load that migration data because it risks this scenario where the
device is broken after migration. Shouldn't we then also increment a
migration version field in the data stream to block migrations that
risk this breakage, or barring that, change the magic data field to
prevent the migration? Thanks,

Alex


Subject: RE: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address



> -----Original Message-----
> From: Alex Williamson <[email protected]>
> Sent: Wednesday, May 8, 2024 7:00 PM
> To: liulongfang <[email protected]>
> Cc: [email protected]; Shameerali Kolothum Thodi
> <[email protected]>; Jonathan Cameron
> <[email protected]>; [email protected]; linux-
> [email protected]; [email protected]
> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
> the XQC address

[...]

> > HiSilicon accelerator equipment can perform general services after
> completing live migration.
> > This kind of business is executed through the user mode driver and only
> needs to use SQE and CQE.
> >
> > At the same time, this device can also perform kernel-mode services in the
> VM through the crypto
> > subsystem. This kind of service requires the use of EQE.
> >
> > Finally, if the device is abnormal, the driver needs to perform a device
> reset, and AEQE needs to
> > be used in this case.
> >
> > Therefore, a complete device live migration function needs to ensure that
> device functions are
> > normal in all these scenarios.
> > Therefore, this data still needs to be migrated.
>
> Ok, I had jumped to an in-kernel host driver in reference to "kernel
> mode" rather than a guest kernel. Migrating with bad data only affects
> the current configuration of the device, reloading a guest driver to
> update these registers or a reset of the device would allow proper
> operation of the device, correct?

Yes. After talking to Longfang: the device RAS will trigger a reset, and the
device would function normally after that reset.

>
> But I think this still isn't really a complete solution, we know
> there's a bug in the migration data stream, so not only would we fix
> the data stream, but I think we should also take measures to prevent
> loading a known bad data stream. AIUI migration of this device while
> running in kernel mode (ie. a kernel driver within a guest VM) is
> broken. Therefore, the least we can do in a new kernel, knowing that
> there was previously a bug in the migration data stream, is to fail to
> load that migration data because it risks this scenario where the
> device is broken after migration. Shouldn't we then also increment a
> migration version field in the data stream to block migrations that
> risk this breakage, or barring that, change the magic data field to
> prevent the migration? Thanks,

Ok. We could add a new ACC_DEV_MAGIC_V2 and prevent the migration
in vf_qm_check_match(). The only concern here is that it will completely
block old-kernel to new-kernel migration, and since we can recover the
device after the reset, I wonder whether that is too restrictive.

Thanks,
Shameer
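
A minimal sketch of the kind of check being proposed, assuming the migration data starts with a 64-bit magic field. The constants, the struct and the function name below are placeholders for illustration only; they are not the driver's real ACC_DEV_MAGIC values or the real vf_qm_check_match() code.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder constants for illustration; not the driver's real values. */
#define EXAMPLE_MAGIC_V1 0x1111111111111111ULL /* stream with the wrong EQE/AEQE address */
#define EXAMPLE_MAGIC_V2 0x2222222222222222ULL /* stream with the corrected address */

/* Hypothetical header of the migration data stream. */
struct example_mig_header {
	uint64_t magic;
};

/* Return 0 if the incoming stream may be loaded, -EINVAL otherwise. */
static int example_check_match(const struct example_mig_header *hdr)
{
	/* Refuse anything that is not the new format, including old V1 data. */
	if (hdr->magic != EXAMPLE_MAGIC_V2)
		return -EINVAL;

	return 0;
}
```

This is the restrictive variant: a destination running the new kernel rejects every stream produced by an old kernel, which is exactly the old-to-new blocking concern raised above.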

2024-05-09 14:30:01

by Alex Williamson

Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address

On Thu, 9 May 2024 09:37:51 +0000
Shameerali Kolothum Thodi <[email protected]> wrote:

> > -----Original Message-----
> > From: Alex Williamson <[email protected]>
> > Sent: Wednesday, May 8, 2024 7:00 PM
> > To: liulongfang <[email protected]>
> > Cc: [email protected]; Shameerali Kolothum Thodi
> > <[email protected]>; Jonathan Cameron
> > <[email protected]>; [email protected]; linux-
> > [email protected]; [email protected]
> > Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
> > the XQC address
>
> [...]
>
> > > HiSilicon accelerator equipment can perform general services after
> > completing live migration.
> > > This kind of business is executed through the user mode driver and only
> > needs to use SQE and CQE.
> > >
> > > At the same time, this device can also perform kernel-mode services in the
> > VM through the crypto
> > > subsystem. This kind of service requires the use of EQE.
> > >
> > > Finally, if the device is abnormal, the driver needs to perform a device
> > reset, and AEQE needs to
> > > be used in this case.
> > >
> > > Therefore, a complete device live migration function needs to ensure that
> > device functions are
> > > normal in all these scenarios.
> > > Therefore, this data still needs to be migrated.
> >
> > Ok, I had jumped to an in-kernel host driver in reference to "kernel
> > mode" rather than a guest kernel. Migrating with bad data only affects
> > the current configuration of the device, reloading a guest driver to
> > update these registers or a reset of the device would allow proper
> > operation of the device, correct?
>
> Yes, after talking to Longfang, the device RAS will trigger a reset and
> would function after reset.
>
> >
> > But I think this still isn't really a complete solution, we know
> > there's a bug in the migration data stream, so not only would we fix
> > the data stream, but I think we should also take measures to prevent
> > loading a known bad data stream. AIUI migration of this device while
> > running in kernel mode (ie. a kernel driver within a guest VM) is
> > broken. Therefore, the least we can do in a new kernel, knowing that
> > there was previously a bug in the migration data stream, is to fail to
> > load that migration data because it risks this scenario where the
> > device is broken after migration. Shouldn't we then also increment a
> > migration version field in the data stream to block migrations that
> > risk this breakage, or barring that, change the magic data field to
> > prevent the migration? Thanks,
>
> Ok. We could add a new ACC_DEV_MAGIC_V2 and prevent the migration
> in vf_qm_check_match(). The only concern here is that, it will completely
> block old kernel to new kernel migration and since we can recover the
> device after the reset whether it is too restrictive or not.

What's the impact to the running driver, kernel or userspace, if the
device is reset? Migration is intended to be effectively transparent
to the driver. If the driver stalls and needs to reset the device,
what has the migration driver accomplished versus an offline migration?

If there's a way to detect from the migration data whether the device is
running in kernel mode or user mode, then you could potentially accept
and send v1 magic on condition that the device is in user mode, and
require v2 magic for any migration where the device is in kernel mode.
This all adds complication though and seems like it has corner cases
where we might allow migration to an old kernel that might trap the
device there if the use case changes.

Essentially it comes down to what should the migration experience be
and while restricting old->new and new->old migration is undesirable,
it seems old->old migration is effectively already broken anyway. As
you consider a v2 magic, perhaps consider how the migration data
structure might be improved overall to better handle new features and
bugs. Thanks,

Alex


Subject: RE: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of the XQC address



> -----Original Message-----
> From: liulongfang <[email protected]>
> Sent: Monday, May 13, 2024 9:16 AM
> To: Alex Williamson <[email protected]>; Shameerali Kolothum
> Thodi <[email protected]>
> Cc: [email protected]; Jonathan Cameron <[email protected]>;
> [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
> the XQC address
>
> On 2024/5/9 22:29, Alex Williamson wrote:
> > On Thu, 9 May 2024 09:37:51 +0000
> > Shameerali Kolothum Thodi <[email protected]>
> wrote:
> >
> >>> -----Original Message-----
> >>> From: Alex Williamson <[email protected]>
> >>> Sent: Wednesday, May 8, 2024 7:00 PM
> >>> To: liulongfang <[email protected]>
> >>> Cc: [email protected]; Shameerali Kolothum Thodi
> >>> <[email protected]>; Jonathan Cameron
> >>> <[email protected]>; [email protected]; linux-
> >>> [email protected]; [email protected]
> >>> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register
> location of
> >>> the XQC address
> >>
> >> [...]
> >>
> >>>> HiSilicon accelerator equipment can perform general services after
> >>> completing live migration.
> >>>> This kind of business is executed through the user mode driver and only
> >>> needs to use SQE and CQE.
> >>>>
> >>>> At the same time, this device can also perform kernel-mode services in
> the
> >>> VM through the crypto
> >>>> subsystem. This kind of service requires the use of EQE.
> >>>>
> >>>> Finally, if the device is abnormal, the driver needs to perform a device
> >>> reset, and AEQE needs to
> >>>> be used in this case.
> >>>>
> >>>> Therefore, a complete device live migration function needs to ensure
> that
> >>> device functions are
> >>>> normal in all these scenarios.
> >>>> Therefore, this data still needs to be migrated.
> >>>
> >>> Ok, I had jumped to an in-kernel host driver in reference to "kernel
> >>> mode" rather than a guest kernel. Migrating with bad data only affects
> >>> the current configuration of the device, reloading a guest driver to
> >>> update these registers or a reset of the device would allow proper
> >>> operation of the device, correct?
> >>
> >> Yes, after talking to Longfang, the device RAS will trigger a reset and
> >> would function after reset.
> >>
> >>>
> >>> But I think this still isn't really a complete solution, we know
> >>> there's a bug in the migration data stream, so not only would we fix
> >>> the data stream, but I think we should also take measures to prevent
> >>> loading a known bad data stream. AIUI migration of this device while
> >>> running in kernel mode (ie. a kernel driver within a guest VM) is
> >>> broken. Therefore, the least we can do in a new kernel, knowing that
> >>> there was previously a bug in the migration data stream, is to fail to
> >>> load that migration data because it risks this scenario where the
> >>> device is broken after migration. Shouldn't we then also increment a
> >>> migration version field in the data stream to block migrations that
> >>> risk this breakage, or barring that, change the magic data field to
> >>> prevent the migration? Thanks,
> >>
> >> Ok. We could add a new ACC_DEV_MAGIC_V2 and prevent the migration
> >> in vf_qm_check_match(). The only concern here is that, it will completely
> >> block old kernel to new kernel migration and since we can recover the
> >> device after the reset whether it is too restrictive or not.
> >
> > What's the impact to the running driver, kernel or userspace, if the
> > device is reset? Migration is intended to be effectively transparent
>
> If the device is reset, the user's task needs to be restarted.
> If an exception has been detected, the best way is not to migrate.
>
> > to the driver. If the driver stalls and needs to reset the device,
> > what has the migration driver accomplished versus an offline migration?
> >
> > If there's a way to detect from the migration data if the device is
> > running in kernel mode or user mode then you could potentially accept
> > and send v1 magic conditional that the device is in user mode and
> > require v2 magic for any migration where the device is in kernel mode.
> > This all adds complication though and seems like it has corner cases
> > where we might allow migration to an old kernel that might trap the
> > device there if the use case changes.
> >
>
> The driver does not support checking whether the device is running in
> kernel mode or user mode.
> Moreover, the device supports user-mode services and kernel-mode services
> to run at the same time.
>
> > Essentially it comes down to what should the migration experience be
> > and while restricting old->new and new->old migration is undesirable,
> > it seems old->old migration is effectively already broken anyway. As
> > you consider a v2 magic, perhaps consider how the migration data
> > structure might be improved overall to better handle new features and
> > bugs. Thanks,
> >
>
> We discussed a plan:
> Update ACC_DEV_MAGIC to ACC_DEV_MAGIC_VERSION and configure its
> last byte
> as version information:
>
> /* QM match information, last byte is version number */
> #define ACC_DEV_MAGIC_VERSION 0XACCDEVFEEDCAFE01

Oops, we can't have a V there, since it is not a hex digit. But the idea is
to replace the magic with a value whose last byte carries version info, which
can be used in the future for handling bugs, features, etc.

Thanks,
Shameer
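
A minimal sketch of the "last byte is the version" idea, assuming a 64-bit magic whose low 8 bits carry the version. The constant below is a made-up value that uses only valid hex digits, and all names are hypothetical; none of this is taken from the driver.

```c
#include <stdint.h>

/* Hypothetical magic: the low byte (0x01) is the version number. */
#define EXAMPLE_MAGIC_VERSION 0xACCDFEEDCAFE0001ULL
#define EXAMPLE_VERSION_MASK  0xFFULL

/* Magic with the version byte cleared. */
static uint64_t magic_base(uint64_t magic)
{
	return magic & ~EXAMPLE_VERSION_MASK;
}

/* Version number stored in the last byte of the magic. */
static uint8_t magic_version(uint64_t magic)
{
	return (uint8_t)(magic & EXAMPLE_VERSION_MASK);
}

/*
 * Return 1 if the stream comes from the same device family (same base)
 * and its version is at least min_version, 0 otherwise.
 */
static int magic_acceptable(uint64_t magic, uint8_t min_version)
{
	if (magic_base(magic) != magic_base(EXAMPLE_MAGIC_VERSION))
		return 0;

	return magic_version(magic) >= min_version;
}
```

Splitting the field this way would let a later kernel raise the minimum accepted version when a data-stream bug is fixed, without having to invent a new magic each time.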