2023-11-19 09:44:15

by Saeed Mahameed

Subject: [PATCH V2 0/5] mlx5 ConnectX diagnostic misc driver

From: Saeed Mahameed <[email protected]>

V1: https://lore.kernel.org/all/[email protected]/T/#rf22c2070325b7da020ee29242dc25ffbd286d40d

Following Greg's and Arnd's comments on V1 here are the V2 changes:

V1->V2:
- Provide legal statement and sign-off for dual license use
- Fix License clause to use: BSD-3-Clause OR GPL-2.0
- Fix kernel robot warnings
- Use dev_dbg directly instead of umem_dbg() local wrapper
- Implement .compat_ioctl for 32bit compatibility
- Fix mlx5ctl_info ABI structure size and alignment
- Local pointer to correct type instead of in-place cast
- Check unused fields and flags are 0 on ioctl path
- Use correct macro to declare scalar arg ioctl command
#define MLX5CTL_IOCTL_UMEM_UNREG \
_IO(MLX5CTL_IOCTL_MAGIC, 0x3)
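
As a rough user-space illustration of the last item above: a command declared with `_IO()` carries no direction and no size, so its argument is a plain value, while `_IOR()`/`_IOWR()` commands describe a structure passed by pointer. The magic number, the command numbers and all `demo_` names below are placeholders, not taken from the series' uapi header. Only the `_IO*`/`_IOC_*` macros are the standard ones from `<linux/ioctl.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Placeholder magic and command numbers, chosen for illustration only. */
#define DEMO_IOCTL_MAGIC 0xc8

/* Stand-in for a pointer-style argument. Fixed-width fields keep the
 * layout identical for 32-bit and 64-bit callers. */
struct demo_info {
	uint32_t size;
	uint32_t flags;
};
static_assert(sizeof(struct demo_info) == 8, "unexpected padding");

#define DEMO_IOCTL_INFO       _IOR(DEMO_IOCTL_MAGIC, 0x1, struct demo_info)
#define DEMO_IOCTL_UMEM_UNREG _IO(DEMO_IOCTL_MAGIC, 0x3)

/* True when the command encodes neither a direction nor a size, i.e.
 * the ioctl argument is a scalar rather than a user pointer. */
static int demo_arg_is_scalar(unsigned int cmd)
{
	return _IOC_DIR(cmd) == _IOC_NONE && _IOC_SIZE(cmd) == 0;
}
```

A compat handler can use this kind of distinction to decide whether the argument needs pointer conversion for 32-bit callers.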

The ConnectX HW family supported by the mlx5 drivers uses an architecture
where a FW component executes "mailbox RPCs" issued by the driver to make
changes to the device. This results in a complex debugging environment
where the FW component has information and low level configuration that
needs to be accessible to userspace for debugging purposes.

Historically a userspace program was used that accessed the PCI register
and config space directly through /sys/bus/pci/.../XXX and could operate
these debugging interfaces in parallel with the running driver.
This approach is incompatible with secure boot and kernel lockdown so this
driver provides a secure and restricted interface to that same data.

1) The first patch in the series introduces the main driver file with the
implementation of a new mlx5 auxiliary device driver that runs on top of
mlx5_core device instances. On probe it creates a new misc device, and
this patch implements the open and release fops. On open the driver
allocates a special FW UID (user context ID) restricted to debug
RPCs only; all user debug RPCs will be executed under this UID,
and on release the UID is freed.

2) The second patch adds an info ioctl that will show the allocated UID
and the available capability masks of the device and the current UID, and
some other useful device information such as the underlying ConnectX

Example:
$ sudo ./mlx5ctlu mlx5_core.ctl.0
mlx5dev: 0000:00:04.0
UCTX UID: 1
UCTX CAP: 0x3
DEV UCTX CAP: 0x3
USER CAP: 0x1d

3) Third patch will add the capability to execute debug RPCs under the
special UID.

In the mlx5 architecture the FW RPC commands are of the format of
inbox and outbox buffers. The inbox buffer contains the command
rpc layout as described in the ConnectX Programmers Reference Manual
(PRM) document and as defined in include/linux/mlx5/mlx5_ifc.h.

On success the user outbox buffer will be filled with the device's rpc
response.

For example to query device capabilities:
a user fills out an inbox buffer with the inbox layout:
struct mlx5_ifc_query_hca_cap_in_bits
and expects an outbox buffer with the layout:
struct mlx5_ifc_cmd_hca_cap_bits

4) The fourth patch adds the ability to register user memory into the
ConnectX device and create a umem object that points to that memory.

Command rpc outbox buffer is limited in size, which can be very
annoying when trying to pull large traces out of the device.
Many rpcs offer the ability to scatter output traces, contexts
and logs directly into user space buffers in a single shot.

The registered memory will be described by a device UMEM object which
has a unique umem_id, this umem_id can be later used in the rpc inbox
to tell the device where to populate the response output,
e.g HW traces and other debug object queries.

Example usecase, a ConnectX device coredump can be as large as 2MB.
Using inline rpcs will take thousands of rpcs to get the full
coredump which can consume multiple seconds.

With UMEM, it can be done in a single rpc, using 2MB of umem user buffer.

Other usecases with umem:
- dynamic HW and FW trace monitoring
- high frequency diagnostic counters sampling
- batched objects and resource dumps
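
Returning to the query-capabilities example under item 3, the following is a minimal sketch of how a user-space tool might fill the inbox bytes. The opcode value, the byte offsets, the 16-byte inbox length and the op_mod encoding (capability type shifted left by one, low bit selecting current versus maximum values) are assumptions based on a reading of mlx5_ifc.h and should be checked against the PRM. All `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, to be verified against the PRM / mlx5_ifc.h. */
#define DEMO_CMD_OP_QUERY_HCA_CAP 0x100
#define DEMO_OPCODE_OFF 0   /* byte offset of the 16-bit opcode */
#define DEMO_OP_MOD_OFF 6   /* byte offset of the 16-bit op_mod */
#define DEMO_INBOX_LEN  16  /* assumed size of the inbox layout */

/* Store a 16-bit value in big-endian order, as the device expects. */
static void demo_put_be16(uint8_t *buf, size_t off, uint16_t val)
{
	buf[off] = (uint8_t)(val >> 8);
	buf[off + 1] = (uint8_t)(val & 0xff);
}

/* Build a QUERY_HCA_CAP inbox. cap_type selects the capability group;
 * cur selects current (non-zero) or maximum (zero) values. */
static void demo_build_query_hca_cap(uint8_t inbox[DEMO_INBOX_LEN],
				     uint16_t cap_type, int cur)
{
	assert(cap_type < 0x8000); /* must fit after the shift below */
	memset(inbox, 0, DEMO_INBOX_LEN);
	demo_put_be16(inbox, DEMO_OPCODE_OFF, DEMO_CMD_OP_QUERY_HCA_CAP);
	demo_put_be16(inbox, DEMO_OP_MOD_OFF,
		      (uint16_t)((cap_type << 1) | (cur ? 1 : 0)));
}
```

The resulting buffer would be handed to the command rpc ioctl as the inbox, with a separate buffer supplied for the outbox.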

See links below for information about user space tools that use this
interface:

[1] https://github.com/saeedtx/mlx5ctl

[2] https://github.com/Mellanox/mstflint
see:

d) mstregdump utility
This utility dumps hardware registers from Mellanox hardware
for later analysis by Mellanox.

g) mstconfig
This tool sets or queries non-volatile configurable options
for Mellanox HCAs.

h) mstfwmanager
Mellanox firmware update and query utility which scans the system
for available Mellanox devices (only mst PCI devices) and performs
the necessary firmware updates.

i) mstreg
The mlxreg utility allows users to obtain information regarding
supported access registers, such as their fields

License: BSD-3-Clause OR GPL-2.0
================================
After a review of this thread [3], and a conversation with the LF,
Mellanox and NVIDIA legal continue to approve the use of a Dual GPL &
Permissive License for mlx5 related driver contributions. This makes it
clear to future contributors that this file may be adapted and reused
under BSD-3-Clause terms on other operating systems. Contributions will
be handled in the normal way and the dual license will apply
automatically. If people wish to contribute significantly and opt out of
a dual license they may separate their GPL only contributions in dedicated
files.

Jason has a signing authority for NVIDIA and has gone through our internal
process to get approval.

Signed-off-by: Jason Gunthorpe <[email protected]> # for legal

[3] https://lore.kernel.org/all/[email protected]/#r

================================

Signed-off-by: Saeed Mahameed <[email protected]>

Saeed Mahameed (5):
mlx5: Add aux dev for ctl interface
misc: mlx5ctl: Add mlx5ctl misc driver
misc: mlx5ctl: Add info ioctl
misc: mlx5ctl: Add command rpc ioctl
misc: mlx5ctl: Add umem reg/unreg ioctl

.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 8 +
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/mlx5ctl/Kconfig | 14 +
drivers/misc/mlx5ctl/Makefile | 5 +
drivers/misc/mlx5ctl/main.c | 559 ++++++++++++++++++
drivers/misc/mlx5ctl/umem.c | 320 ++++++++++
drivers/misc/mlx5ctl/umem.h | 17 +
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 8 +
include/uapi/misc/mlx5ctl.h | 53 ++
11 files changed, 987 insertions(+)
create mode 100644 drivers/misc/mlx5ctl/Kconfig
create mode 100644 drivers/misc/mlx5ctl/Makefile
create mode 100644 drivers/misc/mlx5ctl/main.c
create mode 100644 drivers/misc/mlx5ctl/umem.c
create mode 100644 drivers/misc/mlx5ctl/umem.h
create mode 100644 include/uapi/misc/mlx5ctl.h

--
2.41.0


2023-11-19 09:44:26

by Saeed Mahameed

Subject: [PATCH V2 1/5] mlx5: Add aux dev for ctl interface

From: Saeed Mahameed <[email protected]>

Allow ctl protocol interface auxiliary driver in mlx5.

Signed-off-by: Saeed Mahameed <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index cf0477f53dc4..753c11f7050d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -228,8 +228,14 @@ enum {
MLX5_INTERFACE_PROTOCOL_VNET,

MLX5_INTERFACE_PROTOCOL_DPLL,
+ MLX5_INTERFACE_PROTOCOL_CTL,
};

+static bool is_ctl_supported(struct mlx5_core_dev *dev)
+{
+ return MLX5_CAP_GEN(dev, uctx_cap);
+}
+
static const struct mlx5_adev_device {
const char *suffix;
bool (*is_supported)(struct mlx5_core_dev *dev);
@@ -252,6 +258,8 @@ static const struct mlx5_adev_device {
.is_supported = &is_mp_supported },
[MLX5_INTERFACE_PROTOCOL_DPLL] = { .suffix = "dpll",
.is_supported = &is_dpll_supported },
+ [MLX5_INTERFACE_PROTOCOL_CTL] = { .suffix = "ctl",
+ .is_supported = &is_ctl_supported },
};

int mlx5_adev_idx_alloc(void)
--
2.41.0

2023-11-19 09:44:31

by Saeed Mahameed

Subject: [PATCH V2 2/5] misc: mlx5ctl: Add mlx5ctl misc driver

From: Saeed Mahameed <[email protected]>

The ConnectX HW family supported by the mlx5 drivers uses an architecture
where a FW component executes "mailbox RPCs" issued by the driver to make
changes to the device. This results in a complex debugging environment
where the FW component has information and low level configuration that
needs to be accessible to userspace for debugging purposes.

Historically a userspace program was used that accessed the PCI register
and config space directly through /sys/bus/pci/.../XXX and could operate
these debugging interfaces in parallel with the running driver.
This approach is incompatible with secure boot and kernel lockdown so this
driver provides a secure and restricted interface to that same data.

On open the driver allocates a special FW UID (user context ID)
restricted to debug RPCs only; later in this series all user RPCs will
be stamped with this UID.
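
The restriction applied at open time can be sketched in user space as follows. The three bit values mirror the enum added in this patch; the `demo_caller` structure is a hypothetical stand-in for the kernel's capable() checks, and the function only models how the requested UID capabilities are narrowed, not the FW command itself.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined by the enum in this patch. */
enum {
	DEMO_UCTX_CAP_RAW_TX = 0x1,
	DEMO_UCTX_CAP_INTERNAL_DEVICE_RESOURCES = 0x2,
	DEMO_UCTX_CAP_TOOLS_RESOURCES = 0x4,
};

/* Hypothetical stand-in for capable(CAP_...) on the opening process. */
struct demo_caller {
	int net_raw;   /* CAP_NET_RAW */
	int sys_rawio; /* CAP_SYS_RAWIO */
	int sys_admin; /* CAP_SYS_ADMIN */
};

/* A capability is requested for the new UID only when the device
 * advertises it and the caller holds the matching privilege. */
static uint32_t demo_uctx_cap(uint32_t dev_cap, const struct demo_caller *c)
{
	uint32_t cap = 0;

	assert(c);
	if (c->net_raw && (dev_cap & DEMO_UCTX_CAP_RAW_TX))
		cap |= DEMO_UCTX_CAP_RAW_TX;
	if (c->sys_rawio && (dev_cap & DEMO_UCTX_CAP_INTERNAL_DEVICE_RESOURCES))
		cap |= DEMO_UCTX_CAP_INTERNAL_DEVICE_RESOURCES;
	if (c->sys_admin && (dev_cap & DEMO_UCTX_CAP_TOOLS_RESOURCES))
		cap |= DEMO_UCTX_CAP_TOOLS_RESOURCES;
	return cap;
}
```

With a device advertising 0x3 and a fully privileged caller, the sketch yields 0x3, which corresponds to the case where the UCTX CAP and DEV UCTX CAP values reported by the info ioctl are equal.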

Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
---
MAINTAINERS | 8 +
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/mlx5ctl/Kconfig | 14 ++
drivers/misc/mlx5ctl/Makefile | 4 +
drivers/misc/mlx5ctl/main.c | 314 ++++++++++++++++++++++++++++++++++
6 files changed, 342 insertions(+)
create mode 100644 drivers/misc/mlx5ctl/Kconfig
create mode 100644 drivers/misc/mlx5ctl/Makefile
create mode 100644 drivers/misc/mlx5ctl/main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 97f51d5ec1cf..4b532df16b19 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13835,6 +13835,14 @@ L: [email protected]
S: Supported
F: drivers/vdpa/mlx5/

+MELLANOX MLX5 ConnectX Diag DRIVER
+M: Saeed Mahameed <[email protected]>
+R: Itay Avraham <[email protected]>
+L: [email protected]
+S: Supported
+F: drivers/misc/mlx5ctl/
+F: include/uapi/misc/mlx5ctl.h
+
MELLANOX MLXCPLD I2C AND MUX DRIVER
M: Vadim Pasternak <[email protected]>
M: Michael Shych <[email protected]>
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index f37c4b8380ae..c0e4823648ed 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -579,4 +579,5 @@ source "drivers/misc/cardreader/Kconfig"
source "drivers/misc/uacce/Kconfig"
source "drivers/misc/pvpanic/Kconfig"
source "drivers/misc/mchp_pci1xxxx/Kconfig"
+source "drivers/misc/mlx5ctl/Kconfig"
endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index f2a4d1ff65d4..49bc4697f498 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -67,3 +67,4 @@ obj-$(CONFIG_TMR_MANAGER) += xilinx_tmr_manager.o
obj-$(CONFIG_TMR_INJECT) += xilinx_tmr_inject.o
obj-$(CONFIG_TPS6594_ESM) += tps6594-esm.o
obj-$(CONFIG_TPS6594_PFSM) += tps6594-pfsm.o
+obj-$(CONFIG_MLX5CTL) += mlx5ctl/
diff --git a/drivers/misc/mlx5ctl/Kconfig b/drivers/misc/mlx5ctl/Kconfig
new file mode 100644
index 000000000000..faaa1dba2cc2
--- /dev/null
+++ b/drivers/misc/mlx5ctl/Kconfig
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+
+config MLX5CTL
+ tristate "mlx5 ConnectX control misc driver"
+ depends on MLX5_CORE
+ help
+ MLX5CTL provides an interface for user processes to access the debug and
+ configuration registers of the ConnectX hardware family
+ (NICs, PCI switches and SmartNIC SoCs).
+ This will allow configuration and debug tools to work out of the box on
+ mainstream kernel.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/misc/mlx5ctl/Makefile b/drivers/misc/mlx5ctl/Makefile
new file mode 100644
index 000000000000..b5c7f99e0ab6
--- /dev/null
+++ b/drivers/misc/mlx5ctl/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_MLX5CTL) += mlx5ctl.o
+mlx5ctl-y := main.o
diff --git a/drivers/misc/mlx5ctl/main.c b/drivers/misc/mlx5ctl/main.c
new file mode 100644
index 000000000000..8eb150461b80
--- /dev/null
+++ b/drivers/misc/mlx5ctl/main.c
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include <linux/miscdevice.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/auxiliary_bus.h>
+#include <linux/mlx5/device.h>
+#include <linux/mlx5/driver.h>
+#include <linux/atomic.h>
+#include <linux/refcount.h>
+
+MODULE_DESCRIPTION("mlx5 ConnectX control misc driver");
+MODULE_AUTHOR("Saeed Mahameed <[email protected]>");
+MODULE_LICENSE("Dual BSD/GPL");
+
+struct mlx5ctl_dev {
+ struct mlx5_core_dev *mdev;
+ struct miscdevice miscdev;
+ struct auxiliary_device *adev;
+ struct list_head fd_list;
+ spinlock_t fd_list_lock; /* protect list add/del */
+ struct rw_semaphore rw_lock;
+ struct kref refcount;
+};
+
+struct mlx5ctl_fd {
+ u16 uctx_uid;
+ u32 uctx_cap;
+ u32 ucap; /* user cap */
+ struct mlx5ctl_dev *mcdev;
+ struct list_head list;
+};
+
+#define mlx5ctl_err(mcdev, format, ...) \
+ dev_err(mcdev->miscdev.parent, format, ##__VA_ARGS__)
+
+#define mlx5ctl_dbg(mcdev, format, ...) \
+ dev_dbg(mcdev->miscdev.parent, "PID %d: " format, \
+ current->pid, ##__VA_ARGS__)
+
+enum {
+ MLX5_UCTX_OBJECT_CAP_RAW_TX = 0x1,
+ MLX5_UCTX_OBJECT_CAP_INTERNAL_DEVICE_RESOURCES = 0x2,
+ MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES = 0x4,
+};
+
+static int mlx5ctl_alloc_uid(struct mlx5ctl_dev *mcdev, u32 cap)
+{
+ u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(create_uctx_in)] = {};
+ void *uctx;
+ int err;
+ u16 uid;
+
+ uctx = MLX5_ADDR_OF(create_uctx_in, in, uctx);
+
+ mlx5ctl_dbg(mcdev, "MLX5_CMD_OP_CREATE_UCTX: caps 0x%x\n", cap);
+ MLX5_SET(create_uctx_in, in, opcode, MLX5_CMD_OP_CREATE_UCTX);
+ MLX5_SET(uctx, uctx, cap, cap);
+
+ err = mlx5_cmd_exec(mcdev->mdev, in, sizeof(in), out, sizeof(out));
+ if (err)
+ return err;
+
+ uid = MLX5_GET(create_uctx_out, out, uid);
+ mlx5ctl_dbg(mcdev, "allocated uid %d with caps 0x%x\n", uid, cap);
+ return uid;
+}
+
+static void mlx5ctl_release_uid(struct mlx5ctl_dev *mcdev, u16 uid)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_uctx_in)] = {};
+ struct mlx5_core_dev *mdev = mcdev->mdev;
+ int err;
+
+ MLX5_SET(destroy_uctx_in, in, opcode, MLX5_CMD_OP_DESTROY_UCTX);
+ MLX5_SET(destroy_uctx_in, in, uid, uid);
+
+ err = mlx5_cmd_exec_in(mdev, destroy_uctx, in);
+ mlx5ctl_dbg(mcdev, "released uid %d err(%d)\n", uid, err);
+}
+
+static void mcdev_get(struct mlx5ctl_dev *mcdev);
+static void mcdev_put(struct mlx5ctl_dev *mcdev);
+
+static int mlx5ctl_open_mfd(struct mlx5ctl_fd *mfd)
+{
+ struct mlx5_core_dev *mdev = mfd->mcdev->mdev;
+ struct mlx5ctl_dev *mcdev = mfd->mcdev;
+ u32 ucap = 0, cap = 0;
+ int uid;
+
+#define MLX5_UCTX_CAP(mdev, cap) \
+ (MLX5_CAP_GEN(mdev, uctx_cap) & MLX5_UCTX_OBJECT_CAP_##cap)
+
+ if (capable(CAP_NET_RAW) && MLX5_UCTX_CAP(mdev, RAW_TX)) {
+ ucap |= CAP_NET_RAW;
+ cap |= MLX5_UCTX_OBJECT_CAP_RAW_TX;
+ }
+
+ if (capable(CAP_SYS_RAWIO) && MLX5_UCTX_CAP(mdev, INTERNAL_DEVICE_RESOURCES)) {
+ ucap |= CAP_SYS_RAWIO;
+ cap |= MLX5_UCTX_OBJECT_CAP_INTERNAL_DEVICE_RESOURCES;
+ }
+
+ if (capable(CAP_SYS_ADMIN) && MLX5_UCTX_CAP(mdev, TOOLS_RESOURCES)) {
+ ucap |= CAP_SYS_ADMIN;
+ cap |= MLX5_UCTX_OBJECT_CAP_TOOLS_RESOURCES;
+ }
+
+ uid = mlx5ctl_alloc_uid(mcdev, cap);
+ if (uid < 0)
+ return uid;
+
+ mfd->uctx_uid = uid;
+ mfd->uctx_cap = cap;
+ mfd->ucap = ucap;
+ mfd->mcdev = mcdev;
+
+ mlx5ctl_dbg(mcdev, "allocated uid %d with uctx caps 0x%x, user cap 0x%x\n",
+ uid, cap, ucap);
+ return 0;
+}
+
+static void mlx5ctl_release_mfd(struct mlx5ctl_fd *mfd)
+{
+ struct mlx5ctl_dev *mcdev = mfd->mcdev;
+
+ mlx5ctl_release_uid(mcdev, mfd->uctx_uid);
+}
+
+static int mlx5ctl_open(struct inode *inode, struct file *file)
+{
+ struct mlx5_core_dev *mdev;
+ struct mlx5ctl_dev *mcdev;
+ struct mlx5ctl_fd *mfd = NULL;
+ int err = 0;
+
+ mcdev = container_of(file->private_data, struct mlx5ctl_dev, miscdev);
+ mcdev_get(mcdev);
+ down_read(&mcdev->rw_lock);
+ mdev = mcdev->mdev;
+ if (!mdev) {
+ err = -ENODEV;
+ goto unlock;
+ }
+
+ mfd = kzalloc(sizeof(*mfd), GFP_KERNEL_ACCOUNT);
+ if (!mfd) {
+ err = -ENOMEM;
+ goto unlock;
+ }
+
+ mfd->mcdev = mcdev;
+ err = mlx5ctl_open_mfd(mfd);
+ if (err)
+ goto unlock;
+
+ spin_lock(&mcdev->fd_list_lock);
+ list_add_tail(&mfd->list, &mcdev->fd_list);
+ spin_unlock(&mcdev->fd_list_lock);
+
+ file->private_data = mfd;
+
+unlock:
+ up_read(&mcdev->rw_lock);
+ if (err) {
+ mcdev_put(mcdev);
+ kfree(mfd);
+ }
+ return err;
+}
+
+static int mlx5ctl_release(struct inode *inode, struct file *file)
+{
+ struct mlx5ctl_fd *mfd = file->private_data;
+ struct mlx5ctl_dev *mcdev = mfd->mcdev;
+
+ down_read(&mcdev->rw_lock);
+ if (!mcdev->mdev) {
+ pr_debug("[%d] UID %d mlx5ctl: mdev is already released\n",
+ current->pid, mfd->uctx_uid);
+ /* All mfds are already released, skip ... */
+ goto unlock;
+ }
+
+ spin_lock(&mcdev->fd_list_lock);
+ list_del(&mfd->list);
+ spin_unlock(&mcdev->fd_list_lock);
+
+ mlx5ctl_release_mfd(mfd);
+
+unlock:
+ kfree(mfd);
+ up_read(&mcdev->rw_lock);
+ mcdev_put(mcdev);
+ file->private_data = NULL;
+ return 0;
+}
+
+static const struct file_operations mlx5ctl_fops = {
+ .owner = THIS_MODULE,
+ .open = mlx5ctl_open,
+ .release = mlx5ctl_release,
+};
+
+static int mlx5ctl_probe(struct auxiliary_device *adev,
+ const struct auxiliary_device_id *id)
+
+{
+ struct mlx5_adev *madev = container_of(adev, struct mlx5_adev, adev);
+ struct mlx5_core_dev *mdev = madev->mdev;
+ struct mlx5ctl_dev *mcdev;
+ char *devname = NULL;
+ int err;
+
+ mcdev = kzalloc(sizeof(*mcdev), GFP_KERNEL_ACCOUNT);
+ if (!mcdev)
+ return -ENOMEM;
+
+ kref_init(&mcdev->refcount);
+ INIT_LIST_HEAD(&mcdev->fd_list);
+ spin_lock_init(&mcdev->fd_list_lock);
+ init_rwsem(&mcdev->rw_lock);
+ mcdev->mdev = mdev;
+ mcdev->adev = adev;
+ devname = kasprintf(GFP_KERNEL_ACCOUNT, "mlx5ctl-%s",
+ dev_name(&adev->dev));
+ if (!devname) {
+ err = -ENOMEM;
+ goto abort;
+ }
+
+ mcdev->miscdev = (struct miscdevice) {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = devname,
+ .fops = &mlx5ctl_fops,
+ .parent = &adev->dev,
+ };
+
+ err = misc_register(&mcdev->miscdev);
+ if (err) {
+ mlx5ctl_err(mcdev, "mlx5ctl: failed to register misc device err %d\n", err);
+ goto abort;
+ }
+
+ mlx5ctl_dbg(mcdev, "probe mdev@%s %s\n", dev_driver_string(mdev->device), dev_name(mdev->device));
+
+ auxiliary_set_drvdata(adev, mcdev);
+
+ return 0;
+
+abort:
+ kfree(devname);
+ kfree(mcdev);
+ return err;
+}
+
+static void mlx5ctl_remove(struct auxiliary_device *adev)
+{
+ struct mlx5ctl_dev *mcdev = auxiliary_get_drvdata(adev);
+ struct mlx5_core_dev *mdev = mcdev->mdev;
+ struct mlx5ctl_fd *mfd, *n;
+
+ misc_deregister(&mcdev->miscdev);
+ down_write(&mcdev->rw_lock);
+
+ list_for_each_entry_safe(mfd, n, &mcdev->fd_list, list) {
+ mlx5ctl_dbg(mcdev, "UID %d still has open FDs\n", mfd->uctx_uid);
+ list_del(&mfd->list);
+ mlx5ctl_release_mfd(mfd);
+ }
+
+ mlx5ctl_dbg(mcdev, "removed mdev %s %s\n",
+ dev_driver_string(mdev->device), dev_name(mdev->device));
+
+ mcdev->mdev = NULL; /* prevent already open fds from accessing the device */
+ up_write(&mcdev->rw_lock);
+ mcdev_put(mcdev);
+}
+
+static void mcdev_free(struct kref *ref)
+{
+ struct mlx5ctl_dev *mcdev = container_of(ref, struct mlx5ctl_dev, refcount);
+
+ kfree(mcdev->miscdev.name);
+ kfree(mcdev);
+}
+
+static void mcdev_get(struct mlx5ctl_dev *mcdev)
+{
+ kref_get(&mcdev->refcount);
+}
+
+static void mcdev_put(struct mlx5ctl_dev *mcdev)
+{
+ kref_put(&mcdev->refcount, mcdev_free);
+}
+
+static const struct auxiliary_device_id mlx5ctl_id_table[] = {
+ { .name = MLX5_ADEV_NAME ".ctl", },
+ {},
+};
+
+MODULE_DEVICE_TABLE(auxiliary, mlx5ctl_id_table);
+
+static struct auxiliary_driver mlx5ctl_driver = {
+ .name = "ctl",
+ .probe = mlx5ctl_probe,
+ .remove = mlx5ctl_remove,
+ .id_table = mlx5ctl_id_table,
+};
+
+module_auxiliary_driver(mlx5ctl_driver);
--
2.41.0

2023-11-19 09:44:37

by Saeed Mahameed

Subject: [PATCH V2 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl

From: Saeed Mahameed <[email protected]>

Command rpc outbox buffer is limited in size, which can be very
annoying when trying to pull large traces out of the device.
Many device rpcs offer the ability to scatter output traces, contexts
and logs directly into user space buffers in a single shot.

Allow user to register user memory space, so the device may dump
information directly into user memory space.

The registered memory will be described by a device UMEM object which
has a unique umem_id, this umem_id can be later used in the rpc inbox
to tell the device where to populate the response output,
e.g HW traces and other debug object queries.

To do so this patch introduces two ioctls:

MLX5CTL_IOCTL_UMEM_REG(va_address, size):
- calculate page fragments from the user provided virtual address
- pin the pages, and allocate a sg list
- dma map the sg list
- create a UMEM device object that points to the dma addresses
- add a driver umem object to an xarray data base for bookkeeping
- return UMEM ID to user so it can be used in subsequent rpcs

MLX5CTL_IOCTL_UMEM_UNREG(umem_id):
- user provides a previously allocated umem ID
- unwinds the above
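
The first step of the registration path, working out which pages a user buffer touches, can be sketched in user space as below. The arithmetic follows the umem_num_pages() helper and the offset computation in this patch; the 4 KiB page size and the `demo_` names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Number of pages touched by the byte range [addr, addr + len). */
static uint64_t demo_num_pages(uint64_t addr, uint64_t len)
{
	uint64_t start = addr & ~(DEMO_PAGE_SIZE - 1); /* round down */

	assert(addr + len >= addr); /* caller must reject wrap-around */
	return (addr + len - start + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Offset of the first byte of the buffer within its first page. */
static uint64_t demo_page_offset(uint64_t addr)
{
	return addr & (DEMO_PAGE_SIZE - 1);
}
```

A buffer that is not page aligned can therefore pin one more page than its length alone suggests, which is why the page offset is reported to the device alongside the page count.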

Example usecase, ConnectX device coredump can be as large as 2MB.
Using inline rpcs will take thousands of rpcs to get the full
coredump which can take multiple seconds.

With UMEM, it can be done in a single rpc, using 2MB of umem user buffer.
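
The comparison is simple ceiling division. In the sketch below the per-rpc payload is a parameter because the real inline outbox limit is not stated here; the 512-byte figure in the usage note is a made-up value used only to show the order of magnitude.

```c
#include <assert.h>
#include <stdint.h>

/* Number of rpcs needed to move dump_bytes when each rpc can return
 * at most bytes_per_rpc of payload (rounded up). */
static uint64_t demo_rpcs_needed(uint64_t dump_bytes, uint64_t bytes_per_rpc)
{
	assert(bytes_per_rpc > 0);
	return (dump_bytes + bytes_per_rpc - 1) / bytes_per_rpc;
}
```

For a 2MB dump, a hypothetical 512-byte inline payload gives demo_rpcs_needed(2 << 20, 512) = 4096 rpcs, while a 2MB umem buffer gives demo_rpcs_needed(2 << 20, 2 << 20) = 1.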

$ ./mlx5ctlu mlx5_core.ctl.0 coredump --umem_size=$(( 2 ** 20 ))

00 00 00 00 01 00 20 00 00 00 00 04 00 00 48 ec
00 00 00 08 00 00 00 00 00 00 00 0c 00 00 00 03
00 00 00 10 00 00 00 00 00 00 00 14 00 00 00 00
....
00 50 0b 3c 00 00 00 00 00 50 0b 40 00 00 00 00
00 50 0b 44 00 00 00 00 00 50 0b 48 00 00 00 00
00 50 0c 00 00 00 00 00

INFO : Core dump done
INFO : Core dump size 831304
INFO : Core dump address 0x0
INFO : Core dump cookie 0x500c04
INFO : More Dump 0

Other usecases are: dynamic HW and FW trace monitoring, high frequency
diagnostic counters sampling and batched objects and resource dumps.

Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
---
drivers/misc/mlx5ctl/Makefile | 1 +
drivers/misc/mlx5ctl/main.c | 81 ++++++++-
drivers/misc/mlx5ctl/umem.c | 320 ++++++++++++++++++++++++++++++++++
drivers/misc/mlx5ctl/umem.h | 17 ++
include/uapi/misc/mlx5ctl.h | 16 ++
5 files changed, 434 insertions(+), 1 deletion(-)
create mode 100644 drivers/misc/mlx5ctl/umem.c
create mode 100644 drivers/misc/mlx5ctl/umem.h

diff --git a/drivers/misc/mlx5ctl/Makefile b/drivers/misc/mlx5ctl/Makefile
index b5c7f99e0ab6..f35234e931a8 100644
--- a/drivers/misc/mlx5ctl/Makefile
+++ b/drivers/misc/mlx5ctl/Makefile
@@ -2,3 +2,4 @@

obj-$(CONFIG_MLX5CTL) += mlx5ctl.o
mlx5ctl-y := main.o
+mlx5ctl-y += umem.o
diff --git a/drivers/misc/mlx5ctl/main.c b/drivers/misc/mlx5ctl/main.c
index e7776ea4bfca..58900f2be212 100644
--- a/drivers/misc/mlx5ctl/main.c
+++ b/drivers/misc/mlx5ctl/main.c
@@ -12,6 +12,8 @@
#include <linux/atomic.h>
#include <linux/refcount.h>

+#include "umem.h"
+
MODULE_DESCRIPTION("mlx5 ConnectX control misc driver");
MODULE_AUTHOR("Saeed Mahameed <[email protected]>");
MODULE_LICENSE("Dual BSD/GPL");
@@ -30,6 +32,8 @@ struct mlx5ctl_fd {
u16 uctx_uid;
u32 uctx_cap;
u32 ucap; /* user cap */
+
+ struct mlx5ctl_umem_db *umem_db;
struct mlx5ctl_dev *mcdev;
struct list_head list;
};
@@ -115,6 +119,12 @@ static int mlx5ctl_open_mfd(struct mlx5ctl_fd *mfd)
if (uid < 0)
return uid;

+ mfd->umem_db = mlx5ctl_umem_db_create(mdev, uid);
+ if (IS_ERR(mfd->umem_db)) {
+ mlx5ctl_release_uid(mcdev, uid);
+ return PTR_ERR(mfd->umem_db);
+ }
+
mfd->uctx_uid = uid;
mfd->uctx_cap = cap;
mfd->ucap = ucap;
@@ -129,6 +139,7 @@ static void mlx5ctl_release_mfd(struct mlx5ctl_fd *mfd)
{
struct mlx5ctl_dev *mcdev = mfd->mcdev;

+ mlx5ctl_umem_db_destroy(mfd->umem_db);
mlx5ctl_release_uid(mcdev, mfd->uctx_uid);
}

@@ -323,6 +334,57 @@ static int mlx5ctl_cmdrpc_ioctl(struct file *file,
return err;
}

+static int mlx5ctl_ioctl_umem_reg(struct file *file, void __user *arg,
+ size_t usize)
+{
+ struct mlx5ctl_fd *mfd = file->private_data;
+ struct mlx5ctl_umem_reg *umem_reg;
+ int umem_id, err = 0;
+ size_t ksize = 0;
+
+ ksize = max(sizeof(struct mlx5ctl_umem_reg), usize);
+ umem_reg = kzalloc(ksize, GFP_KERNEL_ACCOUNT);
+ if (!umem_reg)
+ return -ENOMEM;
+
+ umem_reg->size = sizeof(struct mlx5ctl_umem_reg);
+
+ if (copy_from_user(umem_reg, arg, usize)) {
+ err = -EFAULT;
+ goto out;
+ }
+
+ if (umem_reg->flags || umem_reg->reserved1 || umem_reg->reserved2) {
+ err = -EOPNOTSUPP;
+ goto out;
+ }
+
+ umem_id = mlx5ctl_umem_reg(mfd->umem_db,
+ (unsigned long)umem_reg->addr,
+ umem_reg->len);
+ if (umem_id < 0) {
+ err = umem_id;
+ goto out;
+ }
+
+ umem_reg->umem_id = umem_id;
+
+ if (copy_to_user(arg, umem_reg, usize)) {
+ mlx5ctl_umem_unreg(mfd->umem_db, umem_id);
+ err = -EFAULT;
+ }
+out:
+ kfree(umem_reg);
+ return err;
+}
+
+static int mlx5ctl_ioctl_umem_unreg(struct file *file, unsigned long arg)
+{
+ struct mlx5ctl_fd *mfd = file->private_data;
+
+ return mlx5ctl_umem_unreg(mfd->umem_db, (u32)arg);
+}
+
static long mlx5ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
struct mlx5ctl_fd *mfd = file->private_data;
@@ -352,6 +414,14 @@ static long mlx5ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg
err = mlx5ctl_cmdrpc_ioctl(file, argp, size);
break;

+ case MLX5CTL_IOCTL_UMEM_REG:
+ err = mlx5ctl_ioctl_umem_reg(file, argp, size);
+ break;
+
+ case MLX5CTL_IOCTL_UMEM_UNREG:
+ err = mlx5ctl_ioctl_umem_unreg(file, arg);
+ break;
+
default:
mlx5ctl_dbg(mcdev, "Unknown ioctl %x\n", cmd);
err = -ENOIOCTLCMD;
@@ -362,12 +432,21 @@ static long mlx5ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg
return err;
}

+static long mlx5ctl_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ if (cmd == MLX5CTL_IOCTL_UMEM_UNREG) /* arg is a scalar */
+ return mlx5ctl_ioctl(file, cmd, arg);
+
+ /* arg is a pointer */
+ return compat_ptr_ioctl(file, cmd, arg);
+}
+
static const struct file_operations mlx5ctl_fops = {
.owner = THIS_MODULE,
.open = mlx5ctl_open,
.release = mlx5ctl_release,
.unlocked_ioctl = mlx5ctl_ioctl,
- .compat_ioctl = compat_ptr_ioctl,
+ .compat_ioctl = mlx5ctl_compat_ioctl,
};

static int mlx5ctl_probe(struct auxiliary_device *adev,
diff --git a/drivers/misc/mlx5ctl/umem.c b/drivers/misc/mlx5ctl/umem.c
new file mode 100644
index 000000000000..0f13ccffe7e7
--- /dev/null
+++ b/drivers/misc/mlx5ctl/umem.c
@@ -0,0 +1,320 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include <linux/mlx5/device.h>
+#include <linux/mlx5/driver.h>
+#include <uapi/misc/mlx5ctl.h>
+
+#include "umem.h"
+
+#define MLX5CTL_UMEM_MAX_MB 64
+
+static unsigned long umem_num_pages(u64 addr, size_t len)
+{
+ return DIV_ROUND_UP(addr + len - PAGE_ALIGN_DOWN(addr), PAGE_SIZE);
+}
+
+struct mlx5ctl_umem {
+ struct sg_table sgt;
+ unsigned long addr;
+ size_t size;
+ size_t offset;
+ size_t npages;
+ struct task_struct *source_task;
+ struct mm_struct *source_mm;
+ struct user_struct *source_user;
+ u32 umem_id;
+ struct page **page_list;
+};
+
+struct mlx5ctl_umem_db {
+ struct xarray xarray;
+ struct mlx5_core_dev *mdev;
+ u32 uctx_uid;
+};
+
+static int inc_user_locked_vm(struct mlx5ctl_umem *umem, unsigned long npages)
+{
+ unsigned long lock_limit;
+ unsigned long cur_pages;
+ unsigned long new_pages;
+
+ lock_limit = task_rlimit(umem->source_task, RLIMIT_MEMLOCK) >>
+ PAGE_SHIFT;
+ do {
+ cur_pages = atomic_long_read(&umem->source_user->locked_vm);
+ new_pages = cur_pages + npages;
+ if (new_pages > lock_limit)
+ return -ENOMEM;
+ } while (atomic_long_cmpxchg(&umem->source_user->locked_vm, cur_pages,
+ new_pages) != cur_pages);
+ return 0;
+}
+
+static void dec_user_locked_vm(struct mlx5ctl_umem *umem, unsigned long npages)
+{
+ if (WARN_ON(atomic_long_read(&umem->source_user->locked_vm) < npages))
+ return;
+ atomic_long_sub(npages, &umem->source_user->locked_vm);
+}
+
+static struct mlx5ctl_umem *mlx5ctl_umem_pin(struct mlx5ctl_umem_db *umem_db,
+ unsigned long addr, size_t size)
+{
+ size_t npages = umem_num_pages(addr, size);
+ struct mlx5_core_dev *mdev = umem_db->mdev;
+ unsigned long endaddr = addr + size;
+ struct mlx5ctl_umem *umem;
+ struct page **page_list;
+ int err = -EINVAL;
+ int pinned = 0;
+
+ dev_dbg(mdev->device, "%s: addr %p size %zu npages %zu\n",
+ __func__, (void __user *)addr, size, npages);
+
+ /* Avoid integer overflow */
+ if (endaddr < addr || PAGE_ALIGN(endaddr) < endaddr)
+ return ERR_PTR(-EINVAL);
+
+ if (npages == 0 || pages_to_mb(npages) > MLX5CTL_UMEM_MAX_MB)
+ return ERR_PTR(-EINVAL);
+
+ page_list = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL_ACCOUNT);
+ if (!page_list)
+ return ERR_PTR(-ENOMEM);
+
+ umem = kzalloc(sizeof(*umem), GFP_KERNEL_ACCOUNT);
+ if (!umem) {
+ kvfree(page_list);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ umem->addr = addr;
+ umem->size = size;
+ umem->offset = addr & ~PAGE_MASK;
+ umem->npages = npages;
+
+ umem->page_list = page_list;
+ umem->source_mm = current->mm;
+ umem->source_task = current->group_leader;
+ get_task_struct(current->group_leader);
+ umem->source_user = get_uid(current_user());
+
+ /* mm and RLIMIT_MEMLOCK user task accounting similar to what is
+ * being done in iopt_alloc_pages() and do_update_pinned()
+ * for IOPT_PAGES_ACCOUNT_USER @drivers/iommu/iommufd/pages.c
+ */
+ mmgrab(umem->source_mm);
+
+ pinned = pin_user_pages_fast(addr, npages, FOLL_WRITE, page_list);
+ if (pinned != npages) {
+ dev_dbg(mdev->device, "pin_user_pages_fast failed %d\n", pinned);
+ err = pinned < 0 ? pinned : -ENOMEM;
+ goto pin_failed;
+ }
+
+ err = inc_user_locked_vm(umem, npages);
+ if (err)
+ goto pin_failed;
+
+ atomic64_add(npages, &umem->source_mm->pinned_vm);
+
+ err = sg_alloc_table_from_pages(&umem->sgt, page_list, npages, 0,
+ npages << PAGE_SHIFT, GFP_KERNEL_ACCOUNT);
+ if (err) {
+ dev_dbg(mdev->device, "sg_alloc_table failed: %d\n", err);
+ goto sgt_failed;
+ }
+
+ dev_dbg(mdev->device, "\tsgt: size %zu npages %zu sgt.nents (%d)\n",
+ size, npages, umem->sgt.nents);
+
+ err = dma_map_sgtable(mdev->device, &umem->sgt, DMA_BIDIRECTIONAL, 0);
+ if (err) {
+ dev_dbg(mdev->device, "dma_map_sgtable failed: %d\n", err);
+ goto dma_failed;
+ }
+
+ dev_dbg(mdev->device, "\tsgt: dma_nents %d\n", umem->sgt.nents);
+ return umem;
+
+dma_failed:
+sgt_failed:
+ sg_free_table(&umem->sgt);
+ atomic64_sub(npages, &umem->source_mm->pinned_vm);
+ dec_user_locked_vm(umem, npages);
+pin_failed:
+ if (pinned > 0)
+ unpin_user_pages(page_list, pinned);
+ mmdrop(umem->source_mm);
+ free_uid(umem->source_user);
+ put_task_struct(umem->source_task);
+
+ kfree(umem);
+ kvfree(page_list);
+ return ERR_PTR(err);
+}
+
+static void mlx5ctl_umem_unpin(struct mlx5ctl_umem_db *umem_db,
+ struct mlx5ctl_umem *umem)
+{
+ struct mlx5_core_dev *mdev = umem_db->mdev;
+
+ dev_dbg(mdev->device, "%s: addr %p size %zu npages %zu dma_nents %d\n",
+ __func__, (void *)umem->addr, umem->size, umem->npages,
+ umem->sgt.nents);
+
+ dma_unmap_sgtable(mdev->device, &umem->sgt, DMA_BIDIRECTIONAL, 0);
+ sg_free_table(&umem->sgt);
+
+ atomic64_sub(umem->npages, &umem->source_mm->pinned_vm);
+ dec_user_locked_vm(umem, umem->npages);
+ unpin_user_pages(umem->page_list, umem->npages);
+ mmdrop(umem->source_mm);
+ free_uid(umem->source_user);
+ put_task_struct(umem->source_task);
+
+ kvfree(umem->page_list);
+ kfree(umem);
+}
+
+static int mlx5ctl_umem_create(struct mlx5_core_dev *mdev,
+ struct mlx5ctl_umem *umem, u32 uid)
+{
+ u32 out[MLX5_ST_SZ_DW(create_umem_out)] = {};
+ int err, inlen, i, n = 0;
+ struct scatterlist *sg;
+ void *in, *umemptr;
+ __be64 *mtt;
+
+ inlen = MLX5_ST_SZ_BYTES(create_umem_in) +
+ umem->npages * MLX5_ST_SZ_BYTES(mtt);
+
+ in = kzalloc(inlen, GFP_KERNEL_ACCOUNT);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(create_umem_in, in, opcode, MLX5_CMD_OP_CREATE_UMEM);
+ MLX5_SET(create_umem_in, in, uid, uid);
+
+ umemptr = MLX5_ADDR_OF(create_umem_in, in, umem);
+
+ MLX5_SET(umem, umemptr, log_page_size,
+ PAGE_SHIFT - MLX5_ADAPTER_PAGE_SHIFT);
+ MLX5_SET64(umem, umemptr, num_of_mtt, umem->npages);
+ MLX5_SET(umem, umemptr, page_offset, umem->offset);
+
+ dev_dbg(mdev->device,
+ "UMEM CREATE: log_page_size %d num_of_mtt %lld page_offset %d\n",
+ MLX5_GET(umem, umemptr, log_page_size),
+ MLX5_GET64(umem, umemptr, num_of_mtt),
+ MLX5_GET(umem, umemptr, page_offset));
+
+ mtt = MLX5_ADDR_OF(create_umem_in, in, umem.mtt);
+ for_each_sgtable_dma_sg(&umem->sgt, sg, i) {
+ u64 dma_addr = sg_dma_address(sg);
+ ssize_t len = sg_dma_len(sg);
+
+ for (; n < umem->npages && len > 0; n++, mtt++) {
+ *mtt = cpu_to_be64(dma_addr);
+ MLX5_SET(mtt, mtt, wr_en, 1);
+ MLX5_SET(mtt, mtt, rd_en, 1);
+ dma_addr += PAGE_SIZE;
+ len -= PAGE_SIZE;
+ }
+ WARN_ON_ONCE(n == umem->npages && len > 0);
+ }
+
+ err = mlx5_cmd_exec(mdev, in, inlen, out, sizeof(out));
+ if (err)
+ goto out;
+
+ umem->umem_id = MLX5_GET(create_umem_out, out, umem_id);
+ dev_dbg(mdev->device, "\tUMEM CREATED: umem_id %d\n", umem->umem_id);
+out:
+ kfree(in);
+ return err;
+}
+
+static void mlx5ctl_umem_destroy(struct mlx5_core_dev *mdev,
+ struct mlx5ctl_umem *umem)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_umem_in)] = {};
+
+ MLX5_SET(destroy_umem_in, in, opcode, MLX5_CMD_OP_DESTROY_UMEM);
+ MLX5_SET(destroy_umem_in, in, umem_id, umem->umem_id);
+
+ dev_dbg(mdev->device, "UMEM DESTROY: umem_id %d\n", umem->umem_id);
+ mlx5_cmd_exec_in(mdev, destroy_umem, in);
+}
+
+int mlx5ctl_umem_reg(struct mlx5ctl_umem_db *umem_db, unsigned long addr,
+ size_t size)
+{
+ struct mlx5ctl_umem *umem;
+ void *ret;
+ int err;
+
+ umem = mlx5ctl_umem_pin(umem_db, addr, size);
+ if (IS_ERR(umem))
+ return PTR_ERR(umem);
+
+ err = mlx5ctl_umem_create(umem_db->mdev, umem, umem_db->uctx_uid);
+ if (err)
+ goto umem_create_err;
+
+ ret = xa_store(&umem_db->xarray, umem->umem_id, umem, GFP_KERNEL_ACCOUNT);
+ if (WARN(xa_is_err(ret), "Failed to store UMEM")) {
+ err = xa_err(ret);
+ goto xa_store_err;
+ }
+
+ return umem->umem_id;
+
+xa_store_err:
+ mlx5ctl_umem_destroy(umem_db->mdev, umem);
+umem_create_err:
+ mlx5ctl_umem_unpin(umem_db, umem);
+ return err;
+}
+
+int mlx5ctl_umem_unreg(struct mlx5ctl_umem_db *umem_db, u32 umem_id)
+{
+ struct mlx5ctl_umem *umem;
+
+ umem = xa_erase(&umem_db->xarray, umem_id);
+ if (!umem)
+ return -ENOENT;
+
+ mlx5ctl_umem_destroy(umem_db->mdev, umem);
+ mlx5ctl_umem_unpin(umem_db, umem);
+ return 0;
+}
+
+struct mlx5ctl_umem_db *mlx5ctl_umem_db_create(struct mlx5_core_dev *mdev,
+ u32 uctx_uid)
+{
+ struct mlx5ctl_umem_db *umem_db;
+
+ umem_db = kzalloc(sizeof(*umem_db), GFP_KERNEL_ACCOUNT);
+ if (!umem_db)
+ return ERR_PTR(-ENOMEM);
+
+ xa_init(&umem_db->xarray);
+ umem_db->mdev = mdev;
+ umem_db->uctx_uid = uctx_uid;
+
+ return umem_db;
+}
+
+void mlx5ctl_umem_db_destroy(struct mlx5ctl_umem_db *umem_db)
+{
+ struct mlx5ctl_umem *umem;
+ unsigned long index;
+
+ xa_for_each(&umem_db->xarray, index, umem)
+ mlx5ctl_umem_unreg(umem_db, umem->umem_id);
+
+ xa_destroy(&umem_db->xarray);
+ kfree(umem_db);
+}
diff --git a/drivers/misc/mlx5ctl/umem.h b/drivers/misc/mlx5ctl/umem.h
new file mode 100644
index 000000000000..9cf62e5e775e
--- /dev/null
+++ b/drivers/misc/mlx5ctl/umem.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0 */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef __MLX5CTL_UMEM_H__
+#define __MLX5CTL_UMEM_H__
+
+#include <linux/types.h>
+#include <linux/mlx5/driver.h>
+
+struct mlx5ctl_umem_db;
+
+struct mlx5ctl_umem_db *mlx5ctl_umem_db_create(struct mlx5_core_dev *mdev, u32 uctx_uid);
+void mlx5ctl_umem_db_destroy(struct mlx5ctl_umem_db *umem_db);
+int mlx5ctl_umem_reg(struct mlx5ctl_umem_db *umem_db, unsigned long addr, size_t size);
+int mlx5ctl_umem_unreg(struct mlx5ctl_umem_db *umem_db, u32 umem_id);
+
+#endif /* __MLX5CTL_UMEM_H__ */
diff --git a/include/uapi/misc/mlx5ctl.h b/include/uapi/misc/mlx5ctl.h
index 3277eaf78a37..eaf6c954a1ff 100644
--- a/include/uapi/misc/mlx5ctl.h
+++ b/include/uapi/misc/mlx5ctl.h
@@ -24,6 +24,16 @@ struct mlx5ctl_cmdrpc {
__aligned_u64 flags;
};

+struct mlx5ctl_umem_reg {
+ __aligned_u64 flags;
+ __u32 size;
+ __u32 reserved1;
+ __aligned_u64 addr; /* user address */
+ __aligned_u64 len; /* user buffer length */
+ __u32 umem_id; /* returned device's umem ID */
+ __u32 reserved2;
+};
+
#define MLX5CTL_MAX_RPC_SIZE 8192

#define MLX5CTL_IOCTL_MAGIC 0x5c
@@ -34,4 +44,10 @@ struct mlx5ctl_cmdrpc {
#define MLX5CTL_IOCTL_CMDRPC \
_IOWR(MLX5CTL_IOCTL_MAGIC, 0x1, struct mlx5ctl_cmdrpc)

+#define MLX5CTL_IOCTL_UMEM_REG \
+ _IOWR(MLX5CTL_IOCTL_MAGIC, 0x2, struct mlx5ctl_umem_reg)
+
+#define MLX5CTL_IOCTL_UMEM_UNREG \
+ _IO(MLX5CTL_IOCTL_MAGIC, 0x3)
+
#endif /* __MLX5CTL_IOCTL_H__ */
--
2.41.0
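
A minimal userspace sketch of how the UMEM_REG/UMEM_UNREG ioctls declared in the uapi hunk above might be driven. The structure and ioctl numbers mirror the header; the `sketch_*` helpers are hypothetical, and the handling of the `size` field (set here to the structure size) is an assumption, since the ioctl handler is not shown in this part of the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Local mirror of the uapi from this patch; a real tool would include
 * the installed <misc/mlx5ctl.h> instead.
 */
typedef uint64_t aligned_u64 __attribute__((aligned(8)));

struct mlx5ctl_umem_reg {
	aligned_u64 flags;
	uint32_t size;
	uint32_t reserved1;
	aligned_u64 addr;	/* user address */
	aligned_u64 len;	/* user buffer length */
	uint32_t umem_id;	/* returned device's umem ID */
	uint32_t reserved2;
};
_Static_assert(sizeof(struct mlx5ctl_umem_reg) == 40, "unexpected ABI size");

#define MLX5CTL_IOCTL_MAGIC 0x5c
#define MLX5CTL_IOCTL_UMEM_REG \
	_IOWR(MLX5CTL_IOCTL_MAGIC, 0x2, struct mlx5ctl_umem_reg)
#define MLX5CTL_IOCTL_UMEM_UNREG \
	_IO(MLX5CTL_IOCTL_MAGIC, 0x3)

/* Hypothetical helper: prepare a registration request. Unused fields and
 * flags stay 0. Setting 'size' to the struct size is an assumption.
 */
static void sketch_umem_reg_fill(struct mlx5ctl_umem_reg *req,
				 void *buf, size_t len)
{
	assert(buf && len > 0);
	memset(req, 0, sizeof(*req));
	req->size = sizeof(*req);
	req->addr = (uintptr_t)buf;
	req->len = len;
}

/* Hypothetical helper: register 'buf' and return the umem ID via 'id'. */
static int sketch_umem_register(int fd, void *buf, size_t len, uint32_t *id)
{
	struct mlx5ctl_umem_reg req;
	int ret;

	sketch_umem_reg_fill(&req, buf, len);
	ret = ioctl(fd, MLX5CTL_IOCTL_UMEM_REG, &req);
	if (ret < 0)
		return ret;
	*id = req.umem_id;
	return 0;
}

/* UMEM_UNREG takes the umem ID as a scalar argument, not a pointer. */
static int sketch_umem_unregister(int fd, uint32_t umem_id)
{
	return ioctl(fd, MLX5CTL_IOCTL_UMEM_UNREG, (unsigned long)umem_id);
}
```

The scalar argument on the unregister path is why that command is declared with `_IO()` rather than `_IOW()`.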

2023-11-19 09:45:02

by Saeed Mahameed

Subject: [PATCH V2 4/5] misc: mlx5ctl: Add command rpc ioctl

From: Saeed Mahameed <[email protected]>

Add new IOCTL to allow user space to send device debug rpcs and
attach the user's uctx UID to each rpc.

In the mlx5 architecture, FW RPC commands consist of an inbox buffer and
an outbox buffer. The inbox buffer contains the command rpc layout as
described in the ConnectX Programmers Reference Manual (PRM) document and
as defined in include/linux/mlx5/mlx5_ifc.h.

On success the user outbox buffer will be filled with the device's rpc
response.

For example to query device capabilities:
a user fills out an inbox buffer with the inbox layout:
struct mlx5_ifc_query_hca_cap_in_bits
and expects an outbox buffer with the layout:
struct mlx5_ifc_cmd_hca_cap_bits
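
The capability query described above could look roughly like this from userspace. This is a sketch under stated assumptions: the structure and ioctl number mirror the uapi header in this patch, the `sketch_*` helpers are hypothetical, and the QUERY_HCA_CAP opcode value is taken to be 0x100 as listed in mlx5_ifc.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Local mirror of the uapi added by this patch; a real tool would
 * include the installed <misc/mlx5ctl.h> instead.
 */
typedef uint64_t aligned_u64 __attribute__((aligned(8)));

struct mlx5ctl_cmdrpc {
	aligned_u64 in;		/* RPC inbox buffer user address */
	aligned_u64 out;	/* RPC outbox buffer user address */
	uint32_t inlen;		/* inbox buffer length */
	uint32_t outlen;	/* outbox buffer length */
	aligned_u64 flags;
};
_Static_assert(sizeof(struct mlx5ctl_cmdrpc) == 32, "unexpected ABI size");

#define MLX5CTL_MAX_RPC_SIZE 8192
#define MLX5CTL_IOCTL_MAGIC 0x5c
#define MLX5CTL_IOCTL_CMDRPC \
	_IOWR(MLX5CTL_IOCTL_MAGIC, 0x1, struct mlx5ctl_cmdrpc)

/* Assumption: opcode value as listed in include/linux/mlx5/mlx5_ifc.h. */
#define SKETCH_OP_QUERY_HCA_CAP 0x100

/* Hypothetical helper: zero the inbox and write the big-endian header
 * fields (opcode in bytes 0-1, op_mod in bytes 6-7). The uid field in
 * bytes 2-3 is left 0; the driver overwrites it with the fd's UCTX UID.
 */
static void sketch_fill_in_hdr(uint8_t *in, size_t inlen,
			       uint16_t opcode, uint16_t op_mod)
{
	assert(inlen >= 16 && inlen <= MLX5CTL_MAX_RPC_SIZE);
	memset(in, 0, inlen);
	in[0] = opcode >> 8;
	in[1] = opcode & 0xff;
	in[6] = op_mod >> 8;
	in[7] = op_mod & 0xff;
}

/* Hypothetical helper: issue one RPC. On success (return 0) the outbox
 * has been filled; byte 0 of the outbox is the FW status field.
 */
static int sketch_cmdrpc(int fd, void *in, uint32_t inlen,
			 void *out, uint32_t outlen)
{
	struct mlx5ctl_cmdrpc rpc;

	memset(&rpc, 0, sizeof(rpc));	/* flags must be 0 */
	rpc.in = (uintptr_t)in;
	rpc.out = (uintptr_t)out;
	rpc.inlen = inlen;
	rpc.outlen = outlen;
	return ioctl(fd, MLX5CTL_IOCTL_CMDRPC, &rpc);
}
```

Because the driver maps -EREMOTEIO to success, a caller would still need to inspect the status byte of the outbox after a successful ioctl.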

Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
---
drivers/misc/mlx5ctl/main.c | 95 +++++++++++++++++++++++++++++++++++++
include/uapi/misc/mlx5ctl.h | 13 +++++
2 files changed, 108 insertions(+)

diff --git a/drivers/misc/mlx5ctl/main.c b/drivers/misc/mlx5ctl/main.c
index 6a98b40e4300..e7776ea4bfca 100644
--- a/drivers/misc/mlx5ctl/main.c
+++ b/drivers/misc/mlx5ctl/main.c
@@ -232,6 +232,97 @@ static int mlx5ctl_info_ioctl(struct file *file,
return err;
}

+struct mlx5_ifc_mbox_in_hdr_bits {
+ u8 opcode[0x10];
+ u8 uid[0x10];
+
+ u8 reserved_at_20[0x10];
+ u8 op_mod[0x10];
+
+ u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_mbox_out_hdr_bits {
+ u8 status[0x8];
+ u8 reserved_at_8[0x18];
+
+ u8 syndrome[0x20];
+
+ u8 reserved_at_40[0x40];
+};
+
+static int mlx5ctl_cmdrpc_ioctl(struct file *file,
+ struct mlx5ctl_cmdrpc __user *arg,
+ size_t usize)
+{
+ struct mlx5ctl_fd *mfd = file->private_data;
+ struct mlx5ctl_dev *mcdev = mfd->mcdev;
+ struct mlx5ctl_cmdrpc *rpc = NULL;
+ void *in = NULL, *out = NULL;
+ size_t ksize = 0;
+ int err;
+
+ ksize = max(sizeof(struct mlx5ctl_cmdrpc), usize);
+ rpc = kzalloc(ksize, GFP_KERNEL_ACCOUNT);
+ if (!rpc)
+ return -ENOMEM;
+
+ err = copy_from_user(rpc, arg, usize) ? -EFAULT : 0;
+ if (err)
+ goto out;
+
+ mlx5ctl_dbg(mcdev, "[UID %d] cmdrpc: rpc->inlen %d rpc->outlen %d\n",
+ mfd->uctx_uid, rpc->inlen, rpc->outlen);
+
+ if (rpc->inlen < MLX5_ST_SZ_BYTES(mbox_in_hdr) ||
+ rpc->outlen < MLX5_ST_SZ_BYTES(mbox_out_hdr) ||
+ rpc->inlen > MLX5CTL_MAX_RPC_SIZE ||
+ rpc->outlen > MLX5CTL_MAX_RPC_SIZE) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (rpc->flags) {
+ err = -EOPNOTSUPP;
+ goto out;
+ }
+
+ in = memdup_user(u64_to_user_ptr(rpc->in), rpc->inlen);
+ if (IS_ERR(in)) {
+ err = PTR_ERR(in);
+ goto out;
+ }
+
+ out = kvzalloc(rpc->outlen, GFP_KERNEL_ACCOUNT);
+ if (!out) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ mlx5ctl_dbg(mcdev, "[UID %d] cmdif: opcode 0x%x inlen %d outlen %d\n",
+ mfd->uctx_uid,
+ MLX5_GET(mbox_in_hdr, in, opcode), rpc->inlen, rpc->outlen);
+
+ MLX5_SET(mbox_in_hdr, in, uid, mfd->uctx_uid);
+ err = mlx5_cmd_do(mcdev->mdev, in, rpc->inlen, out, rpc->outlen);
+ mlx5ctl_dbg(mcdev, "[UID %d] cmdif: opcode 0x%x retval %d\n",
+ mfd->uctx_uid,
+ MLX5_GET(mbox_in_hdr, in, opcode), err);
+
+ /* -EREMOTEIO means outbox is valid, but out.status is not */
+ if (!err || err == -EREMOTEIO) {
+ err = 0;
+ if (copy_to_user(u64_to_user_ptr(rpc->out), out, rpc->outlen))
+ err = -EFAULT;
+ }
+
+out:
+ kvfree(out);
+ kfree(in);
+ kfree(rpc);
+ return err;
+}
+
static long mlx5ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
struct mlx5ctl_fd *mfd = file->private_data;
@@ -257,6 +348,10 @@ static long mlx5ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg
err = mlx5ctl_info_ioctl(file, argp, size);
break;

+ case MLX5CTL_IOCTL_CMDRPC:
+ err = mlx5ctl_cmdrpc_ioctl(file, argp, size);
+ break;
+
default:
mlx5ctl_dbg(mcdev, "Unknown ioctl %x\n", cmd);
err = -ENOIOCTLCMD;
diff --git a/include/uapi/misc/mlx5ctl.h b/include/uapi/misc/mlx5ctl.h
index 37153cc0fc6e..3277eaf78a37 100644
--- a/include/uapi/misc/mlx5ctl.h
+++ b/include/uapi/misc/mlx5ctl.h
@@ -16,9 +16,22 @@ struct mlx5ctl_info {
__u32 reserved2;
};

+struct mlx5ctl_cmdrpc {
+ __aligned_u64 in; /* RPC inbox buffer user address */
+ __aligned_u64 out; /* RPC outbox buffer user address */
+ __u32 inlen; /* inbox buffer length */
+ __u32 outlen; /* outbox buffer length */
+ __aligned_u64 flags;
+};
+
+#define MLX5CTL_MAX_RPC_SIZE 8192
+
#define MLX5CTL_IOCTL_MAGIC 0x5c

#define MLX5CTL_IOCTL_INFO \
_IOR(MLX5CTL_IOCTL_MAGIC, 0x0, struct mlx5ctl_info)

+#define MLX5CTL_IOCTL_CMDRPC \
+ _IOWR(MLX5CTL_IOCTL_MAGIC, 0x1, struct mlx5ctl_cmdrpc)
+
#endif /* __MLX5CTL_IOCTL_H__ */
--
2.41.0

2023-11-19 09:45:26

by Saeed Mahameed

Subject: [PATCH V2 3/5] misc: mlx5ctl: Add info ioctl

From: Saeed Mahameed <[email protected]>

Implement the INFO ioctl, which returns the allocated UID, the capability
flags, and other useful device information such as the underlying
ConnectX device.

Example:
$ sudo ./mlx5ctlu mlx5_core.ctl.0
mlx5dev: 0000:00:04.0
UCTX UID: 1
UCTX CAP: 0x3
DEV UCTX CAP: 0x3
USER CAP: 0x1d
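
A tool producing output like the example above might be sketched as follows. The mlx5ctlu source is not part of this thread, so the `sketch_*` helpers and the device node path passed to them are assumptions; the structure mirrors the uapi header added by this patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Local mirror of the uapi added by this patch. */
typedef uint64_t aligned_u64 __attribute__((aligned(8)));

struct mlx5ctl_info {
	aligned_u64 flags;
	uint32_t size;
	uint8_t devname[64];	/* underlying ConnectX device */
	uint16_t uctx_uid;	/* current process allocated UCTX UID */
	uint16_t reserved1;
	uint32_t uctx_cap;	/* current process effective UCTX cap */
	uint32_t dev_uctx_cap;	/* device's UCTX capabilities */
	uint32_t ucap;		/* process user capability */
	uint32_t reserved2;
};
_Static_assert(sizeof(struct mlx5ctl_info) == 96, "unexpected ABI size");

#define MLX5CTL_IOCTL_MAGIC 0x5c
#define MLX5CTL_IOCTL_INFO \
	_IOR(MLX5CTL_IOCTL_MAGIC, 0x0, struct mlx5ctl_info)

/* Hypothetical helper: open the misc device node and run the INFO ioctl.
 * The caller needs CAP_SYS_ADMIN, as checked in mlx5ctl_ioctl().
 */
static int sketch_query_info(const char *devpath, struct mlx5ctl_info *info)
{
	int fd = open(devpath, O_RDWR);
	int ret;

	if (fd < 0)
		return -1;
	memset(info, 0, sizeof(*info));
	ret = ioctl(fd, MLX5CTL_IOCTL_INFO, info);
	close(fd);
	return ret;
}

/* Hypothetical helper: render the fields in the style of the example. */
static int sketch_format_info(char *buf, size_t len,
			      const struct mlx5ctl_info *info)
{
	return snprintf(buf, len,
			"mlx5dev: %.64s\n"
			"UCTX UID: %u\n"
			"UCTX CAP: 0x%x\n"
			"DEV UCTX CAP: 0x%x\n"
			"USER CAP: 0x%x\n",
			(const char *)info->devname,
			(unsigned int)info->uctx_uid,
			(unsigned int)info->uctx_cap,
			(unsigned int)info->dev_uctx_cap,
			(unsigned int)info->ucap);
}
```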

Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
---
.../userspace-api/ioctl/ioctl-number.rst | 1 +
drivers/misc/mlx5ctl/main.c | 71 +++++++++++++++++++
include/uapi/misc/mlx5ctl.h | 24 +++++++
3 files changed, 96 insertions(+)
create mode 100644 include/uapi/misc/mlx5ctl.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 4ea5b837399a..9faf91ffefff 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -89,6 +89,7 @@ Code Seq# Include File Comments
0x20 all drivers/cdrom/cm206.h
0x22 all scsi/sg.h
0x3E 00-0F linux/counter.h <mailto:[email protected]>
+0x5c all uapi/misc/mlx5ctl.h Nvidia ConnectX control
'!' 00-1F uapi/linux/seccomp.h
'#' 00-3F IEEE 1394 Subsystem
Block for the entire subsystem
diff --git a/drivers/misc/mlx5ctl/main.c b/drivers/misc/mlx5ctl/main.c
index 8eb150461b80..6a98b40e4300 100644
--- a/drivers/misc/mlx5ctl/main.c
+++ b/drivers/misc/mlx5ctl/main.c
@@ -8,6 +8,7 @@
#include <linux/auxiliary_bus.h>
#include <linux/mlx5/device.h>
#include <linux/mlx5/driver.h>
+#include <uapi/misc/mlx5ctl.h>
#include <linux/atomic.h>
#include <linux/refcount.h>

@@ -198,10 +199,80 @@ static int mlx5ctl_release(struct inode *inode, struct file *file)
return 0;
}

+static int mlx5ctl_info_ioctl(struct file *file,
+ struct mlx5ctl_info __user *arg,
+ size_t usize)
+{
+ struct mlx5ctl_fd *mfd = file->private_data;
+ struct mlx5ctl_dev *mcdev = mfd->mcdev;
+ struct mlx5_core_dev *mdev = mcdev->mdev;
+ struct mlx5ctl_info *info;
+ size_t ksize = 0;
+ int err = 0;
+
+ ksize = max(sizeof(struct mlx5ctl_info), usize);
+ info = kzalloc(ksize, GFP_KERNEL_ACCOUNT);
+ if (!info)
+ return -ENOMEM;
+
+ info->size = sizeof(struct mlx5ctl_info);
+
+ info->dev_uctx_cap = MLX5_CAP_GEN(mdev, uctx_cap);
+ info->uctx_cap = mfd->uctx_cap;
+ info->uctx_uid = mfd->uctx_uid;
+ info->ucap = mfd->ucap;
+
+ strscpy(info->devname, dev_name(&mdev->pdev->dev),
+ sizeof(info->devname));
+
+ if (copy_to_user(arg, info, usize))
+ err = -EFAULT;
+
+ kfree(info);
+ return err;
+}
+
+static long mlx5ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct mlx5ctl_fd *mfd = file->private_data;
+ struct mlx5ctl_dev *mcdev = mfd->mcdev;
+ void __user *argp = (void __user *)arg;
+ size_t size = _IOC_SIZE(cmd);
+ int err = 0;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ mlx5ctl_dbg(mcdev, "ioctl 0x%x type/nr: %d/%d size: %d DIR:%d\n", cmd,
+ _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd), _IOC_DIR(cmd));
+
+ down_read(&mcdev->rw_lock);
+ if (!mcdev->mdev) {
+ err = -ENODEV;
+ goto unlock;
+ }
+
+ switch (cmd) {
+ case MLX5CTL_IOCTL_INFO:
+ err = mlx5ctl_info_ioctl(file, argp, size);
+ break;
+
+ default:
+ mlx5ctl_dbg(mcdev, "Unknown ioctl %x\n", cmd);
+ err = -ENOIOCTLCMD;
+ break;
+ }
+unlock:
+ up_read(&mcdev->rw_lock);
+ return err;
+}
+
static const struct file_operations mlx5ctl_fops = {
.owner = THIS_MODULE,
.open = mlx5ctl_open,
.release = mlx5ctl_release,
+ .unlocked_ioctl = mlx5ctl_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
};

static int mlx5ctl_probe(struct auxiliary_device *adev,
diff --git a/include/uapi/misc/mlx5ctl.h b/include/uapi/misc/mlx5ctl.h
new file mode 100644
index 000000000000..37153cc0fc6e
--- /dev/null
+++ b/include/uapi/misc/mlx5ctl.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef __MLX5CTL_IOCTL_H__
+#define __MLX5CTL_IOCTL_H__
+
+struct mlx5ctl_info {
+ __aligned_u64 flags;
+ __u32 size;
+ __u8 devname[64]; /* underlying ConnectX device */
+ __u16 uctx_uid; /* current process allocated UCTX UID */
+ __u16 reserved1;
+ __u32 uctx_cap; /* current process effective UCTX cap */
+ __u32 dev_uctx_cap; /* device's UCTX capabilities */
+ __u32 ucap; /* process user capability */
+ __u32 reserved2;
+};
+
+#define MLX5CTL_IOCTL_MAGIC 0x5c
+
+#define MLX5CTL_IOCTL_INFO \
+ _IOR(MLX5CTL_IOCTL_MAGIC, 0x0, struct mlx5ctl_info)
+
+#endif /* __MLX5CTL_IOCTL_H__ */
--
2.41.0

2023-11-19 09:51:30

by Greg Kroah-Hartman

Subject: Re: [PATCH V2 2/5] misc: mlx5ctl: Add mlx5ctl misc driver

On Sun, Nov 19, 2023 at 01:24:47AM -0800, Saeed Mahameed wrote:
> igned-off-by: Saeed Mahameed <[email protected]>

Something went wrong here :(

2023-11-19 19:31:21

by kernel test robot

Subject: Re: [PATCH v2 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl

Hi Saeed,

kernel test robot noticed the following build warnings:

[auto build test WARNING on char-misc/char-misc-testing]
[also build test WARNING on rdma/for-next linus/master v6.7-rc1 next-20231117]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Saeed-Mahameed/misc-mlx5ctl-Add-mlx5ctl-misc-driver/20231119-215311
base: char-misc/char-misc-testing
patch link: https://lore.kernel.org/r/20231119092450.164996-6-saeed%40kernel.org
patch subject: [PATCH v2 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20231120/[email protected]/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231120/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

In file included from include/uapi/linux/posix_types.h:5,
from include/uapi/linux/types.h:14,
from include/linux/types.h:6,
from include/linux/kcsan-checks.h:14,
from include/asm-generic/barrier.h:17,
from arch/alpha/include/asm/barrier.h:21,
from arch/alpha/include/asm/rwonce.h:10,
from include/linux/compiler.h:251,
from include/linux/build_bug.h:5,
from include/linux/container_of.h:5,
from include/linux/list.h:5,
from include/linux/miscdevice.h:5,
from drivers/misc/mlx5ctl/main.c:4:
drivers/misc/mlx5ctl/main.c: In function 'mlx5ctl_compat_ioctl':
include/linux/stddef.h:8:14: error: called object is not a function or function pointer
8 | #define NULL ((void *)0)
| ^
include/linux/fs.h:1837:26: note: in expansion of macro 'NULL'
1837 | #define compat_ptr_ioctl NULL
| ^~~~
drivers/misc/mlx5ctl/main.c:441:16: note: in expansion of macro 'compat_ptr_ioctl'
441 | return compat_ptr_ioctl(file, cmd, arg);
| ^~~~~~~~~~~~~~~~
>> drivers/misc/mlx5ctl/main.c:442:1: warning: control reaches end of non-void function [-Wreturn-type]
442 | }
| ^


vim +442 drivers/misc/mlx5ctl/main.c

434
435 static long mlx5ctl_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
436 {
437 if (cmd == MLX5CTL_IOCTL_UMEM_UNREG) /* arg is a scalar */
438 return mlx5ctl_ioctl(file, cmd, arg);
439
440 /* arg is a pointer */
441 return compat_ptr_ioctl(file, cmd, arg);
> 442 }
443
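
The error above comes from include/linux/fs.h defining compat_ptr_ioctl as NULL when CONFIG_COMPAT is not set, so it can be assigned to .compat_ioctl but not called. One possible way to avoid this, shown here as a sketch with userspace stand-ins (not necessarily what V3 does), is to build the wrapper only under CONFIG_COMPAT. The `sketch_*` names and the placeholder command value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch builds outside the kernel tree. */
struct file;
#define MLX5CTL_IOCTL_UMEM_UNREG 0x5c03u	/* placeholder value */

struct sketch_file_operations {
	long (*unlocked_ioctl)(struct file *, unsigned int, unsigned long);
	long (*compat_ioctl)(struct file *, unsigned int, unsigned long);
};

/* Stand-in for the driver's native ioctl handler: echoes arg back. */
static long mlx5ctl_ioctl(struct file *file, unsigned int cmd,
			  unsigned long arg)
{
	(void)file;
	(void)cmd;
	return (long)arg;
}

#ifdef CONFIG_COMPAT
/* Stand-in for the kernel helper, which only exists as a function when
 * CONFIG_COMPAT is enabled.
 */
static long compat_ptr_ioctl(struct file *file, unsigned int cmd,
			     unsigned long arg)
{
	return mlx5ctl_ioctl(file, cmd, arg);
}

static long mlx5ctl_compat_ioctl(struct file *file, unsigned int cmd,
				 unsigned long arg)
{
	if (cmd == MLX5CTL_IOCTL_UMEM_UNREG) /* arg is a scalar */
		return mlx5ctl_ioctl(file, cmd, arg);

	/* arg is a pointer */
	return compat_ptr_ioctl(file, cmd, arg);
}
#define SKETCH_COMPAT_IOCTL mlx5ctl_compat_ioctl
#else
/* Without CONFIG_COMPAT there is no compat path to wire up. */
#define SKETCH_COMPAT_IOCTL NULL
#endif

static const struct sketch_file_operations sketch_fops = {
	.unlocked_ioctl = mlx5ctl_ioctl,
	.compat_ioctl = SKETCH_COMPAT_IOCTL,
};
```

With this shape, the call to compat_ptr_ioctl() is never compiled in configurations where it expands to NULL.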

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-11-19 20:10:17

by kernel test robot

Subject: Re: [PATCH v2 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl

Hi Saeed,

kernel test robot noticed the following build errors:

[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on rdma/for-next linus/master v6.7-rc1 next-20231117]
[cannot apply to char-misc/char-misc-next char-misc/char-misc-linus soc/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Saeed-Mahameed/misc-mlx5ctl-Add-mlx5ctl-misc-driver/20231119-215311
base: char-misc/char-misc-testing
patch link: https://lore.kernel.org/r/20231119092450.164996-6-saeed%40kernel.org
patch subject: [PATCH v2 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl
config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20231120/[email protected]/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231120/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

>> drivers/misc/mlx5ctl/main.c:441:25: error: called object type 'void *' is not a function or function pointer
return compat_ptr_ioctl(file, cmd, arg);
~~~~~~~~~~~~~~~~^
1 error generated.


vim +441 drivers/misc/mlx5ctl/main.c

434
435 static long mlx5ctl_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
436 {
437 if (cmd == MLX5CTL_IOCTL_UMEM_UNREG) /* arg is a scalar */
438 return mlx5ctl_ioctl(file, cmd, arg);
439
440 /* arg is a pointer */
> 441 return compat_ptr_ioctl(file, cmd, arg);
442 }
443

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-11-19 21:38:05

by kernel test robot

Subject: Re: [PATCH v2 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl

Hi Saeed,

kernel test robot noticed the following build errors:

[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on rdma/for-next linus/master v6.7-rc1 next-20231117]
[cannot apply to char-misc/char-misc-next char-misc/char-misc-linus soc/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Saeed-Mahameed/misc-mlx5ctl-Add-mlx5ctl-misc-driver/20231119-215311
base: char-misc/char-misc-testing
patch link: https://lore.kernel.org/r/20231119092450.164996-6-saeed%40kernel.org
patch subject: [PATCH v2 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20231120/[email protected]/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231120/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

>> drivers/misc/mlx5ctl/umem.c:79:21: error: call to undeclared function 'pages_to_mb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
79 | if (npages == 0 || pages_to_mb(npages) > MLX5CTL_UMEM_MAX_MB)
| ^
1 error generated.


vim +/pages_to_mb +79 drivers/misc/mlx5ctl/umem.c

60
61 static struct mlx5ctl_umem *mlx5ctl_umem_pin(struct mlx5ctl_umem_db *umem_db,
62 unsigned long addr, size_t size)
63 {
64 size_t npages = umem_num_pages(addr, size);
65 struct mlx5_core_dev *mdev = umem_db->mdev;
66 unsigned long endaddr = addr + size;
67 struct mlx5ctl_umem *umem;
68 struct page **page_list;
69 int err = -EINVAL;
70 int pinned = 0;
71
72 dev_dbg(mdev->device, "%s: addr %p size %zu npages %zu\n",
73 __func__, (void __user *)addr, size, npages);
74
75 /* Avoid integer overflow */
76 if (endaddr < addr || PAGE_ALIGN(endaddr) < endaddr)
77 return ERR_PTR(-EINVAL);
78
> 79 if (npages == 0 || pages_to_mb(npages) > MLX5CTL_UMEM_MAX_MB)
80 return ERR_PTR(-EINVAL);
81
82 page_list = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL_ACCOUNT);
83 if (!page_list)
84 return ERR_PTR(-ENOMEM);
85
86 umem = kzalloc(sizeof(*umem), GFP_KERNEL_ACCOUNT);
87 if (!umem) {
88 kvfree(page_list);
89 return ERR_PTR(-ENOMEM);
90 }
91
92 umem->addr = addr;
93 umem->size = size;
94 umem->offset = addr & ~PAGE_MASK;
95 umem->npages = npages;
96
97 umem->page_list = page_list;
98 umem->source_mm = current->mm;
99 umem->source_task = current->group_leader;
100 get_task_struct(current->group_leader);
101 umem->source_user = get_uid(current_user());
102
103 /* mm and RLIMIT_MEMLOCK user task accounting similar to what is
104 * being done in iopt_alloc_pages() and do_update_pinned()
105 * for IOPT_PAGES_ACCOUNT_USER @drivers/iommu/iommufd/pages.c
106 */
107 mmgrab(umem->source_mm);
108
109 pinned = pin_user_pages_fast(addr, npages, FOLL_WRITE, page_list);
110 if (pinned != npages) {
111 dev_dbg(mdev->device, "pin_user_pages_fast failed %d\n", pinned);
112 err = pinned < 0 ? pinned : -ENOMEM;
113 goto pin_failed;
114 }
115
116 err = inc_user_locked_vm(umem, npages);
117 if (err)
118 goto pin_failed;
119
120 atomic64_add(npages, &umem->source_mm->pinned_vm);
121
122 err = sg_alloc_table_from_pages(&umem->sgt, page_list, npages, 0,
123 npages << PAGE_SHIFT, GFP_KERNEL_ACCOUNT);
124 if (err) {
125 dev_dbg(mdev->device, "sg_alloc_table failed: %d\n", err);
126 goto sgt_failed;
127 }
128
129 dev_dbg(mdev->device, "\tsgt: size %zu npages %zu sgt.nents (%d)\n",
130 size, npages, umem->sgt.nents);
131
132 err = dma_map_sgtable(mdev->device, &umem->sgt, DMA_BIDIRECTIONAL, 0);
133 if (err) {
134 dev_dbg(mdev->device, "dma_map_sgtable failed: %d\n", err);
135 goto dma_failed;
136 }
137
138 dev_dbg(mdev->device, "\tsgt: dma_nents %d\n", umem->sgt.nents);
139 return umem;
140
141 dma_failed:
142 sgt_failed:
143 sg_free_table(&umem->sgt);
144 atomic64_sub(npages, &umem->source_mm->pinned_vm);
145 dec_user_locked_vm(umem, npages);
146 pin_failed:
147 if (pinned > 0)
148 unpin_user_pages(page_list, pinned);
149 mmdrop(umem->source_mm);
150 free_uid(umem->source_user);
151 put_task_struct(umem->source_task);
152
153 kfree(umem);
154 kvfree(page_list);
155 return ERR_PTR(err);
156 }
157
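
The powerpc failure suggests pages_to_mb() is not a generic kernel helper (the i386 build above did not flag it). A portable version of the size check at line 79 could use a driver-local conversion, sketched here in userspace. The page shift, the limit value and the `sketch_*` names are placeholders; in the driver the shift would be PAGE_SHIFT, and the value of MLX5CTL_UMEM_MAX_MB is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_UMEM_MAX_MB 64	/* placeholder limit */

/* Convert a page count to whole mebibytes (rounds down).
 * Valid while the base page size is at most 1 MiB.
 */
static inline size_t sketch_pages_to_mb(size_t npages)
{
	return npages >> (20 - SKETCH_PAGE_SHIFT);
}

/* Mirrors the check in mlx5ctl_umem_pin(): reject empty and oversized
 * registrations.
 */
static inline int sketch_npages_valid(size_t npages)
{
	return npages != 0 && sketch_pages_to_mb(npages) <= SKETCH_UMEM_MAX_MB;
}
```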

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-11-21 07:25:56

by Saeed Mahameed

Subject: Re: [PATCH V2 2/5] misc: mlx5ctl: Add mlx5ctl misc driver

On 19 Nov 10:51, Greg Kroah-Hartman wrote:
>On Sun, Nov 19, 2023 at 01:24:47AM -0800, Saeed Mahameed wrote:
>> igned-off-by: Saeed Mahameed <[email protected]>
>
>Something went wrong here :(

:( Just sent V3; I needed to fix two new kernel test robot warnings
anyway.