2022-11-02 20:46:42

by Oded Gabbay

[permalink] [raw]
Subject: [RFC PATCH v2 0/3] new subsystem for compute accelerator devices

This is the second version of the RFC following the comments given on the
first version. Nothing materially has changed in regard to how accel
devices are registered and exposed to user-space. The changes are mostly
re-factoring according to the comments.

Changes since v1:
- Instead of embedding the accel code inside drm core functions, create
accel_drv.c to hold all the new core code and call that code from
DRM core.

- Replace deprecated IDR with xarray to manage the accel minors.

- Remove all #ifdef from drm_drv.c. Instead, there are empty inline
implementations in a new header file drm_accel.h (in include/drm/) that
will be compiled in case CONFIG_ACCEL is set to 'N'.

- Patch-set organization is a bit different:
- Patch 1 introduces the accel major code and the new Kconfig.
- Patch 2 introduces the accel minor code.
- Patch 3 adds the call to accel functions from DRM core code.

I still haven't added formal documentation as I want to make sure the
general design of the new version is acceptable. If there won't be any
major comments, I'll add the documentation and send the next version as
the version to be merged to drm-next.

The patches are in the following repo:
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git/log/?h=accel_v2

The HEAD of that branch is a commit adding a dummy driver that
registers an accel device using the new framework. This can be served
as a simple reference.

v1 cover letter:
https://lkml.org/lkml/2022/10/22/544

Thanks,
Oded.

Oded Gabbay (3):
drivers/accel: define kconfig and register a new major
accel: add dedicated minor for accelerator devices
drm: initialize accel framework

Documentation/admin-guide/devices.txt | 5 +
MAINTAINERS | 8 +
drivers/Kconfig | 2 +
drivers/Makefile | 3 +
drivers/accel/Kconfig | 24 +++
drivers/accel/Makefile | 10 +
drivers/accel/accel_drv.c | 281 ++++++++++++++++++++++++++
drivers/gpu/drm/drm_drv.c | 98 ++++++---
drivers/gpu/drm/drm_file.c | 2 +-
drivers/gpu/drm/drm_sysfs.c | 24 ++-
include/drm/drm_accel.h | 58 ++++++
include/drm/drm_device.h | 3 +
include/drm/drm_drv.h | 8 +
include/drm/drm_file.h | 21 +-
14 files changed, 513 insertions(+), 34 deletions(-)
create mode 100644 drivers/accel/Kconfig
create mode 100644 drivers/accel/Makefile
create mode 100644 drivers/accel/accel_drv.c
create mode 100644 include/drm/drm_accel.h

--
2.25.1



2022-11-02 20:48:19

by Oded Gabbay

[permalink] [raw]
Subject: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

Add a new Kconfig for the accel subsystem. The Kconfig currently
contains only the basic CONFIG_ACCEL option that will be used to
decide whether to compile the accel registration code. That code will
be called directly from the DRM core code.

The accelerator devices will be exposed to the user space with a new,
dedicated major number - 261.

The accel init function registers the new major number as a char device
and create corresponding sysfs and debugfs root entries, similar to
what is done in DRM init function.

I added a new header called drm_accel.h to include/drm/, that will hold
the prototypes of the accel_drv.c functions. In case CONFIG_ACCEL is
set to 'N', that header will contain empty inline implementations of
those functions, to allow DRM core code to compile successfully
without dependency on CONFIG_ACCEL.

I Updated the MAINTAINERS file accordingly with the newly added folder
and I have taken the liberty to appropriate the dri-devel mailing list
and the dri-devel IRC channel for the accel subsystem.

Signed-off-by: Oded Gabbay <[email protected]>
---
Changes in v2:
- Created accel_drv.c that will hold the accel framework core functions,
instead of embedding the code inside drm core functions.
- Created drm/drm_accel.h
- Removed all #ifdef CONFIG_ACCEL from drm_drv.c

Documentation/admin-guide/devices.txt | 5 ++
MAINTAINERS | 8 ++
drivers/Kconfig | 2 +
drivers/Makefile | 3 +
drivers/accel/Kconfig | 24 ++++++
drivers/accel/Makefile | 10 +++
drivers/accel/accel_drv.c | 112 ++++++++++++++++++++++++++
include/drm/drm_accel.h | 31 +++++++
8 files changed, 195 insertions(+)
create mode 100644 drivers/accel/Kconfig
create mode 100644 drivers/accel/Makefile
create mode 100644 drivers/accel/accel_drv.c
create mode 100644 include/drm/drm_accel.h

diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt
index 9764d6edb189..06c525e01ea5 100644
--- a/Documentation/admin-guide/devices.txt
+++ b/Documentation/admin-guide/devices.txt
@@ -3080,6 +3080,11 @@
...
255 = /dev/osd255 256th OSD Device

+ 261 char Compute Acceleration Devices
+ 0 = /dev/accel/accel0 First acceleration device
+ 1 = /dev/accel/accel1 Second acceleration device
+ ...
+
384-511 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major
number will take numbers starting from 511 and downward,
diff --git a/MAINTAINERS b/MAINTAINERS
index ab07cf28844e..9b34f756e343 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6825,6 +6825,14 @@ F: include/drm/drm*
F: include/linux/vga*
F: include/uapi/drm/drm*

+DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK
+M: Oded Gabbay <[email protected]>
+L: [email protected]
+S: Maintained
+C: irc://irc.oftc.net/dri-devel
+T: git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git
+F: drivers/accel/
+
DRM DRIVERS FOR ALLWINNER A10
M: Maxime Ripard <[email protected]>
M: Chen-Yu Tsai <[email protected]>
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 19ee995bd0ae..968bd0a6fd78 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -99,6 +99,8 @@ source "drivers/media/Kconfig"

source "drivers/video/Kconfig"

+source "drivers/accel/Kconfig"
+
source "sound/Kconfig"

source "drivers/hid/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index bdf1c66141c9..658199dcee96 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -62,6 +62,9 @@ obj-y += iommu/
# gpu/ comes after char for AGP vs DRM startup and after iommu
obj-y += gpu/

+# accel is part of drm so it must come after gpu
+obj-$(CONFIG_ACCEL) += accel/
+
obj-$(CONFIG_CONNECTOR) += connector/

# i810fb and intelfb depend on char/agp/
diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
new file mode 100644
index 000000000000..282ea24f90c5
--- /dev/null
+++ b/drivers/accel/Kconfig
@@ -0,0 +1,24 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Compute Acceleration device configuration
+#
+# This framework provides support for compute acceleration devices, such
+# as, but not limited to, Machine-Learning and Deep-Learning acceleration
+# devices
+#
+menuconfig ACCEL
+ tristate "Compute Acceleration Framework"
+ depends on DRM
+ help
+ Framework for device drivers of compute acceleration devices, such
+ as, but not limited to, Machine-Learning and Deep-Learning
+ acceleration devices.
+ If you say Y here, you need to select the module that's right for
+ your acceleration device from the list below.
+ This framework is integrated with the DRM subsystem as compute
+ accelerators and GPUs share a lot in common and can use almost the
+ same infrastructure code.
+ Having said that, acceleration devices will have a different
+ major number than GPUs, and will be exposed to user-space using
+ different device files, called accel/accel* (in /dev, sysfs
+ and debugfs)
diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
new file mode 100644
index 000000000000..b5b7d812a8ef
--- /dev/null
+++ b/drivers/accel/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Makefile for the accel framework. This framework provides support for
+# compute acceleration devices, such as, but not limited to, Machine-Learning
+# and Deep-Learning acceleration devices
+
+accel-y := \
+ accel_drv.o
+
+obj-$(CONFIG_ACCEL) += accel.o
diff --git a/drivers/accel/accel_drv.c b/drivers/accel/accel_drv.c
new file mode 100644
index 000000000000..6132765ea054
--- /dev/null
+++ b/drivers/accel/accel_drv.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2022 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/debugfs.h>
+#include <linux/device.h>
+
+#include <drm/drm_accel.h>
+#include <drm/drm_ioctl.h>
+#include <drm/drm_print.h>
+
+static struct dentry *accel_debugfs_root;
+struct class *accel_class;
+
+static char *accel_devnode(struct device *dev, umode_t *mode)
+{
+ return kasprintf(GFP_KERNEL, "accel/%s", dev_name(dev));
+}
+
+static CLASS_ATTR_STRING(accel_version, 0444, "accel 1.0.0 20221018");
+
+/**
+ * accel_sysfs_init - initialize sysfs helpers
+ *
+ * This is used to create the ACCEL class, which is the implicit parent of any
+ * other top-level ACCEL sysfs objects.
+ *
+ * You must call accel_sysfs_destroy() to release the allocated resources.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+static int accel_sysfs_init(void)
+{
+ int err;
+
+ accel_class = class_create(THIS_MODULE, "accel");
+ if (IS_ERR(accel_class))
+ return PTR_ERR(accel_class);
+
+ err = class_create_file(accel_class, &class_attr_accel_version.attr);
+ if (err) {
+ class_destroy(accel_class);
+ accel_class = NULL;
+ return err;
+ }
+
+ accel_class->devnode = accel_devnode;
+
+ return 0;
+}
+
+/**
+ * accel_sysfs_destroy - destroys ACCEL class
+ *
+ * Destroy the ACCEL device class.
+ */
+static void accel_sysfs_destroy(void)
+{
+ if (IS_ERR_OR_NULL(accel_class))
+ return;
+ class_remove_file(accel_class, &class_attr_accel_version.attr);
+ class_destroy(accel_class);
+ accel_class = NULL;
+}
+
+static int accel_stub_open(struct inode *inode, struct file *filp)
+{
+ DRM_DEBUG("Operation not supported");
+
+ return -EOPNOTSUPP;
+}
+
+static const struct file_operations accel_stub_fops = {
+ .owner = THIS_MODULE,
+ .open = accel_stub_open,
+ .llseek = noop_llseek,
+};
+
+void accel_core_exit(void)
+{
+ unregister_chrdev(ACCEL_MAJOR, "accel");
+ debugfs_remove(accel_debugfs_root);
+ accel_sysfs_destroy();
+}
+
+int __init accel_core_init(void)
+{
+ int ret;
+
+ ret = accel_sysfs_init();
+ if (ret < 0) {
+ DRM_ERROR("Cannot create ACCEL class: %d\n", ret);
+ goto error;
+ }
+
+ accel_debugfs_root = debugfs_create_dir("accel", NULL);
+
+ ret = register_chrdev(ACCEL_MAJOR, "accel", &accel_stub_fops);
+ if (ret < 0)
+ goto error;
+
+error:
+ /* Any cleanup will be done in drm_core_exit() that will call
+ * to accel_core_exit()
+ */
+ return ret;
+}
diff --git a/include/drm/drm_accel.h b/include/drm/drm_accel.h
new file mode 100644
index 000000000000..cf43a7b30f34
--- /dev/null
+++ b/include/drm/drm_accel.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2022 HabanaLabs, Ltd.
+ * All Rights Reserved.
+ *
+ */
+
+#ifndef DRM_ACCEL_H_
+#define DRM_ACCEL_H_
+
+#define ACCEL_MAJOR 261
+
+#if IS_ENABLED(CONFIG_ACCEL)
+
+void accel_core_exit(void);
+int accel_core_init(void);
+
+#else
+
+static inline void accel_core_exit(void)
+{
+}
+
+static inline int __init accel_core_init(void)
+{
+ return 0;
+}
+
+#endif /* IS_ENABLED(CONFIG_ACCEL) */
+
+#endif /* DRM_ACCEL_H_ */
--
2.25.1


2022-11-02 21:13:29

by Oded Gabbay

[permalink] [raw]
Subject: [RFC PATCH v2 2/3] accel: add dedicated minor for accelerator devices

The accelerator devices are exposed to user-space using a dedicated
major. In addition, they are represented in /dev with new, dedicated
device char names: /dev/accel/accel*. This is done to make sure any
user-space software that tries to open a graphic card won't open
the accelerator device by mistake.

The above implies that the minor numbering should be separated from
the rest of the DRM devices. However, to avoid code duplication, we
want the drm_minor structure to be able to represent the accelerator
device.

To achieve this, we add a new drm_minor* to drm_device that represents
the accelerator device. This pointer is initialized for drivers that
declare they handle compute accelerator, using a new driver feature
flag called DRIVER_COMPUTE_ACCEL. It is important to note that this
driver feature is mutually exclusive with DRIVER_RENDER. Devices that
want to expose both graphics and compute device char files should be
handled by two drivers that are connected using the auxiliary bus
framework.

In addition, we define a different xarray to handle the accelerators
minors. This is done to make the minor's index be identical to the
device index in /dev/. Any access to the xarray is done solely
by functions in accel_drv.c, as the xarray is define as static. The
DRM core functions call those functions in case they detect the minor's
type is DRM_MINOR_ACCEL.

We define a separate accel_open function (from drm_open) that the
accel drivers should set as their open callback function. Both these
functions eventually call the same drm_open_helper(), which had to be
changed to be non-static so it can be called from accel_drv.c.
accel_open() partially duplicates drm_open as I removed some code from
it that handles legacy devices.

Signed-off-by: Oded Gabbay <[email protected]>
---
Changes in v2:
- Moved all accel minor handling code to accel_drv.c
- Replaced deprecated idr with xarray

drivers/accel/accel_drv.c | 205 +++++++++++++++++++++++++++++++++----
drivers/gpu/drm/drm_file.c | 2 +-
include/drm/drm_accel.h | 29 +++++-
include/drm/drm_device.h | 3 +
include/drm/drm_drv.h | 8 ++
include/drm/drm_file.h | 21 +++-
6 files changed, 247 insertions(+), 21 deletions(-)

diff --git a/drivers/accel/accel_drv.c b/drivers/accel/accel_drv.c
index 6132765ea054..964a93799936 100644
--- a/drivers/accel/accel_drv.c
+++ b/drivers/accel/accel_drv.c
@@ -9,13 +9,22 @@
#include <linux/module.h>
#include <linux/debugfs.h>
#include <linux/device.h>
+#include <linux/xarray.h>

#include <drm/drm_accel.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
#include <drm/drm_ioctl.h>
#include <drm/drm_print.h>

+static DEFINE_XARRAY_ALLOC(accel_minors_xa);
+
static struct dentry *accel_debugfs_root;
-struct class *accel_class;
+static struct class *accel_class;
+
+static struct device_type accel_sysfs_device_minor = {
+ .name = "accel_minor"
+};

static char *accel_devnode(struct device *dev, umode_t *mode)
{
@@ -24,16 +33,6 @@ static char *accel_devnode(struct device *dev, umode_t *mode)

static CLASS_ATTR_STRING(accel_version, 0444, "accel 1.0.0 20221018");

-/**
- * accel_sysfs_init - initialize sysfs helpers
- *
- * This is used to create the ACCEL class, which is the implicit parent of any
- * other top-level ACCEL sysfs objects.
- *
- * You must call accel_sysfs_destroy() to release the allocated resources.
- *
- * Return: 0 on success, negative error code on failure.
- */
static int accel_sysfs_init(void)
{
int err;
@@ -54,11 +53,6 @@ static int accel_sysfs_init(void)
return 0;
}

-/**
- * accel_sysfs_destroy - destroys ACCEL class
- *
- * Destroy the ACCEL device class.
- */
static void accel_sysfs_destroy(void)
{
if (IS_ERR_OR_NULL(accel_class))
@@ -68,11 +62,185 @@ static void accel_sysfs_destroy(void)
accel_class = NULL;
}

+/**
+ * accel_set_device_instance_params() - Set some device parameters for accel device
+ * @kdev: Pointer to the device instance.
+ * @index: The minor's index
+ *
+ * This function creates the dev_t of the device using the accel major and
+ * the device's minor number. In addition, it sets the class and type of the
+ * device instance to the accel sysfs class and device type, respectively.
+ */
+void accel_set_device_instance_params(struct device *kdev, int index)
+{
+ kdev->devt = MKDEV(ACCEL_MAJOR, index);
+ kdev->class = accel_class;
+ kdev->type = &accel_sysfs_device_minor;
+}
+
+/**
+ * accel_minor_alloc() - Allocates a new accel minor
+ *
+ * This function access the accel minors xarray and allocates from it
+ * a new id to represent a new accel minor
+ *
+ * Return: A new id on success or error code in case xa_alloc failed
+ */
+int accel_minor_alloc(void)
+{
+ int rc, index;
+
+ rc = xa_alloc(&accel_minors_xa, &index, NULL,
+ XA_LIMIT(0, ACCEL_MAX_MINORS - 1), GFP_KERNEL);
+ if (rc < 0)
+ return rc;
+
+ return index;
+}
+
+/**
+ * accel_minor_remove() - Remove an accel minor
+ * @index: The minor id to remove.
+ *
+ * This function access the accel minors xarray and removes from
+ * it the member with the id that is passed to this function.
+ */
+void accel_minor_remove(int index)
+{
+ xa_erase(&accel_minors_xa, index);
+}
+
+/**
+ * accel_minor_replace() - Replace minor pointer in accel minors xarray.
+ * @minor: Pointer to the new minor.
+ * @index: The minor id to replace.
+ *
+ * This function access the accel minors xarray structure and replaces the pointer
+ * that is associated with an existing id. Because the minor pointer can be
+ * NULL, we need to explicitly pass the index.
+ *
+ * Return: 0 for success, negative value for error
+ */
+int accel_minor_replace(struct drm_minor *minor, int index)
+{
+ if (minor) {
+ void *entry;
+
+ entry = xa_cmpxchg(&accel_minors_xa, index, NULL, minor, GFP_KERNEL);
+ if (xa_is_err(entry))
+ return xa_err(entry);
+ } else {
+ xa_store(&accel_minors_xa, index, NULL, GFP_KERNEL);
+ }
+
+ return 0;
+}
+
+/*
+ * Looks up the given minor-ID and returns the respective DRM-minor object. The
+ * refence-count of the underlying device is increased so you must release this
+ * object with accel_minor_release().
+ *
+ * The object can be only a drm_minor that represents an accel device.
+ *
+ * As long as you hold this minor, it is guaranteed that the object and the
+ * minor->dev pointer will stay valid! However, the device may get unplugged and
+ * unregistered while you hold the minor.
+ */
+static struct drm_minor *accel_minor_acquire(unsigned int minor_id)
+{
+ struct drm_minor *minor;
+
+ xa_lock(&accel_minors_xa);
+ minor = xa_load(&accel_minors_xa, minor_id);
+ if (minor)
+ drm_dev_get(minor->dev);
+ xa_unlock(&accel_minors_xa);
+
+ if (!minor) {
+ return ERR_PTR(-ENODEV);
+ } else if (drm_dev_is_unplugged(minor->dev)) {
+ drm_dev_put(minor->dev);
+ return ERR_PTR(-ENODEV);
+ }
+
+ return minor;
+}
+
+static void accel_minor_release(struct drm_minor *minor)
+{
+ drm_dev_put(minor->dev);
+}
+
+/**
+ * accel_open - open method for ACCEL file
+ * @inode: device inode
+ * @filp: file pointer.
+ *
+ * This function must be used by drivers as their &file_operations.open method.
+ * It looks up the correct ACCEL device and instantiates all the per-file
+ * resources for it. It also calls the &drm_driver.open driver callback.
+ *
+ * Return: 0 on success or negative errno value on failure.
+ */
+int accel_open(struct inode *inode, struct file *filp)
+{
+ struct drm_device *dev;
+ struct drm_minor *minor;
+ int retcode;
+
+ minor = accel_minor_acquire(iminor(inode));
+ if (IS_ERR(minor))
+ return PTR_ERR(minor);
+
+ dev = minor->dev;
+
+ atomic_fetch_inc(&dev->open_count);
+
+ /* share address_space across all char-devs of a single device */
+ filp->f_mapping = dev->anon_inode->i_mapping;
+
+ retcode = drm_open_helper(filp, minor);
+ if (retcode)
+ goto err_undo;
+
+ return 0;
+
+err_undo:
+ atomic_dec(&dev->open_count);
+ accel_minor_release(minor);
+ return retcode;
+}
+EXPORT_SYMBOL_GPL(accel_open);
+
static int accel_stub_open(struct inode *inode, struct file *filp)
{
- DRM_DEBUG("Operation not supported");
+ const struct file_operations *new_fops;
+ struct drm_minor *minor;
+ int err;
+
+ DRM_DEBUG("\n");
+
+ minor = accel_minor_acquire(iminor(inode));
+ if (IS_ERR(minor))
+ return PTR_ERR(minor);
+
+ new_fops = fops_get(minor->dev->driver->fops);
+ if (!new_fops) {
+ err = -ENODEV;
+ goto out;
+ }
+
+ replace_fops(filp, new_fops);
+ if (filp->f_op->open)
+ err = filp->f_op->open(inode, filp);
+ else
+ err = 0;
+
+out:
+ accel_minor_release(minor);

- return -EOPNOTSUPP;
+ return err;
}

static const struct file_operations accel_stub_fops = {
@@ -86,6 +254,7 @@ void accel_core_exit(void)
unregister_chrdev(ACCEL_MAJOR, "accel");
debugfs_remove(accel_debugfs_root);
accel_sysfs_destroy();
+ WARN_ON(!xa_empty(&accel_minors_xa));
}

int __init accel_core_init(void)
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index a8b4d918e9a3..64b4a3a87fbb 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -326,7 +326,7 @@ static int drm_cpu_valid(void)
* Creates and initializes a drm_file structure for the file private data in \p
* filp and add it into the double linked list in \p dev.
*/
-static int drm_open_helper(struct file *filp, struct drm_minor *minor)
+int drm_open_helper(struct file *filp, struct drm_minor *minor)
{
struct drm_device *dev = minor->dev;
struct drm_file *priv;
diff --git a/include/drm/drm_accel.h b/include/drm/drm_accel.h
index cf43a7b30f34..0c0ae387d075 100644
--- a/include/drm/drm_accel.h
+++ b/include/drm/drm_accel.h
@@ -8,12 +8,20 @@
#ifndef DRM_ACCEL_H_
#define DRM_ACCEL_H_

-#define ACCEL_MAJOR 261
+#include <drm/drm_file.h>
+
+#define ACCEL_MAJOR 261
+#define ACCEL_MAX_MINORS 256

#if IS_ENABLED(CONFIG_ACCEL)

void accel_core_exit(void);
int accel_core_init(void);
+void accel_minor_remove(int index);
+int accel_minor_alloc(void);
+int accel_minor_replace(struct drm_minor *minor, int index);
+void accel_set_device_instance_params(struct device *kdev, int index);
+int accel_open(struct inode *inode, struct file *filp);

#else

@@ -23,9 +31,28 @@ static inline void accel_core_exit(void)

static inline int __init accel_core_init(void)
{
+ /* Return 0 to allow drm_core_init to complete successfully */
return 0;
}

+static inline void accel_minor_remove(int index)
+{
+}
+
+static inline int accel_minor_alloc(void)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int accel_minor_replace(struct drm_minor *minor, int index)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void accel_set_device_instance_params(struct device *kdev, int index)
+{
+}
+
#endif /* IS_ENABLED(CONFIG_ACCEL) */

#endif /* DRM_ACCEL_H_ */
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 9923c7a6885e..933ce2048e20 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -93,6 +93,9 @@ struct drm_device {
/** @render: Render node */
struct drm_minor *render;

+ /** @accel: Compute Acceleration node */
+ struct drm_minor *accel;
+
/**
* @registered:
*
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index f6159acb8856..706e68ca5116 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -94,6 +94,14 @@ enum drm_driver_feature {
* synchronization of command submission.
*/
DRIVER_SYNCOBJ_TIMELINE = BIT(6),
+ /**
+ * @DRIVER_COMPUTE_ACCEL:
+ *
+ * Driver supports compute acceleration devices. This flag is mutually exclusive with
+ * @DRIVER_RENDER and @DRIVER_MODESET. Devices that support both graphics and compute
+ * acceleration should be handled by two drivers that are connected using auxiliry bus.
+ */
+ DRIVER_COMPUTE_ACCEL = BIT(7),

/* IMPORTANT: Below are all the legacy flags, add new ones above. */

diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index d780fd151789..0d1f853092ab 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -51,11 +51,15 @@ struct file;

/* Note that the order of this enum is ABI (it determines
* /dev/dri/renderD* numbers).
+ *
+ * Setting DRM_MINOR_ACCEL to 32 gives enough space for more drm minors to
+ * be implemented before we hit any future
*/
enum drm_minor_type {
DRM_MINOR_PRIMARY,
DRM_MINOR_CONTROL,
DRM_MINOR_RENDER,
+ DRM_MINOR_ACCEL = 32,
};

/**
@@ -70,7 +74,7 @@ enum drm_minor_type {
struct drm_minor {
/* private: */
int index; /* Minor device number */
- int type; /* Control or render */
+ int type; /* Control or render or accel */
struct device *kdev; /* Linux device */
struct drm_device *dev;

@@ -397,7 +401,22 @@ static inline bool drm_is_render_client(const struct drm_file *file_priv)
return file_priv->minor->type == DRM_MINOR_RENDER;
}

+/**
+ * drm_is_accel_client - is this an open file of the compute acceleration node
+ * @file_priv: DRM file
+ *
+ * Returns true if this is an open file of the compute acceleration node, i.e.
+ * &drm_file.minor of @file_priv is a accel minor.
+ *
+ * See also the :ref:`section on accel nodes <drm_accel_node>`.
+ */
+static inline bool drm_is_accel_client(const struct drm_file *file_priv)
+{
+ return file_priv->minor->type == DRM_MINOR_ACCEL;
+}
+
int drm_open(struct inode *inode, struct file *filp);
+int drm_open_helper(struct file *filp, struct drm_minor *minor);
ssize_t drm_read(struct file *filp, char __user *buffer,
size_t count, loff_t *offset);
int drm_release(struct inode *inode, struct file *filp);
--
2.25.1


2022-11-02 21:18:02

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On 11/2/2022 2:34 PM, Oded Gabbay wrote:
> diff --git a/drivers/accel/accel_drv.c b/drivers/accel/accel_drv.c
> new file mode 100644
> index 000000000000..6132765ea054
> --- /dev/null
> +++ b/drivers/accel/accel_drv.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright 2022 HabanaLabs, Ltd.
> + * All Rights Reserved.
> + *
> + */
> +
> +#include <linux/module.h>

Alphebetical order?

> +#include <linux/debugfs.h>
> +#include <linux/device.h>
> +
> +#include <drm/drm_accel.h>
> +#include <drm/drm_ioctl.h>
> +#include <drm/drm_print.h>
> +
> +static struct dentry *accel_debugfs_root;
> +struct class *accel_class;

Static?


2022-11-02 21:27:34

by Oded Gabbay

[permalink] [raw]
Subject: [RFC PATCH v2 3/3] drm: initialize accel framework

Now that we have the accel framework code ready, let's call the
accel functions from all the appropriate places. These places are the
drm module init/exit functions, and all the drm_minor handling
functions.

Signed-off-by: Oded Gabbay <[email protected]>
---
drivers/gpu/drm/drm_drv.c | 98 ++++++++++++++++++++++++++++---------
drivers/gpu/drm/drm_sysfs.c | 24 ++++++---
2 files changed, 90 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 8214a0b1ab7f..c8ea57a974b7 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -35,6 +35,7 @@
#include <linux/slab.h>
#include <linux/srcu.h>

+#include <drm/drm_accel.h>
#include <drm/drm_cache.h>
#include <drm/drm_client.h>
#include <drm/drm_color_mgmt.h>
@@ -90,6 +91,8 @@ static struct drm_minor **drm_minor_get_slot(struct drm_device *dev,
return &dev->primary;
case DRM_MINOR_RENDER:
return &dev->render;
+ case DRM_MINOR_ACCEL:
+ return &dev->accel;
default:
BUG();
}
@@ -104,9 +107,13 @@ static void drm_minor_alloc_release(struct drm_device *dev, void *data)

put_device(minor->kdev);

- spin_lock_irqsave(&drm_minor_lock, flags);
- idr_remove(&drm_minors_idr, minor->index);
- spin_unlock_irqrestore(&drm_minor_lock, flags);
+ if (minor->type == DRM_MINOR_ACCEL) {
+ accel_minor_remove(minor->index);
+ } else {
+ spin_lock_irqsave(&drm_minor_lock, flags);
+ idr_remove(&drm_minors_idr, minor->index);
+ spin_unlock_irqrestore(&drm_minor_lock, flags);
+ }
}

static int drm_minor_alloc(struct drm_device *dev, unsigned int type)
@@ -123,13 +130,17 @@ static int drm_minor_alloc(struct drm_device *dev, unsigned int type)
minor->dev = dev;

idr_preload(GFP_KERNEL);
- spin_lock_irqsave(&drm_minor_lock, flags);
- r = idr_alloc(&drm_minors_idr,
- NULL,
- 64 * type,
- 64 * (type + 1),
- GFP_NOWAIT);
- spin_unlock_irqrestore(&drm_minor_lock, flags);
+ if (type == DRM_MINOR_ACCEL) {
+ r = accel_minor_alloc();
+ } else {
+ spin_lock_irqsave(&drm_minor_lock, flags);
+ r = idr_alloc(&drm_minors_idr,
+ NULL,
+ 64 * type,
+ 64 * (type + 1),
+ GFP_NOWAIT);
+ spin_unlock_irqrestore(&drm_minor_lock, flags);
+ }
idr_preload_end();

if (r < 0)
@@ -163,7 +174,11 @@ static int drm_minor_register(struct drm_device *dev, unsigned int type)

ret = drm_debugfs_init(minor, minor->index, drm_debugfs_root);
if (ret) {
- DRM_ERROR("DRM: Failed to initialize /sys/kernel/debug/dri.\n");
+ if (minor->type == DRM_MINOR_ACCEL)
+ DRM_ERROR("DRM: Failed to initialize /sys/kernel/debug/accel.\n");
+ else
+ DRM_ERROR("DRM: Failed to initialize /sys/kernel/debug/dri.\n");
+
goto err_debugfs;
}

@@ -172,9 +187,15 @@ static int drm_minor_register(struct drm_device *dev, unsigned int type)
goto err_debugfs;

/* replace NULL with @minor so lookups will succeed from now on */
- spin_lock_irqsave(&drm_minor_lock, flags);
- idr_replace(&drm_minors_idr, minor, minor->index);
- spin_unlock_irqrestore(&drm_minor_lock, flags);
+ if (minor->type == DRM_MINOR_ACCEL) {
+ ret = accel_minor_replace(minor, minor->index);
+ if (ret < 0)
+ goto err_debugfs;
+ } else {
+ spin_lock_irqsave(&drm_minor_lock, flags);
+ idr_replace(&drm_minors_idr, minor, minor->index);
+ spin_unlock_irqrestore(&drm_minor_lock, flags);
+ }

DRM_DEBUG("new minor registered %d\n", minor->index);
return 0;
@@ -194,9 +215,13 @@ static void drm_minor_unregister(struct drm_device *dev, unsigned int type)
return;

/* replace @minor with NULL so lookups will fail from now on */
- spin_lock_irqsave(&drm_minor_lock, flags);
- idr_replace(&drm_minors_idr, NULL, minor->index);
- spin_unlock_irqrestore(&drm_minor_lock, flags);
+ if (minor->type == DRM_MINOR_ACCEL) {
+ accel_minor_replace(NULL, minor->index);
+ } else {
+ spin_lock_irqsave(&drm_minor_lock, flags);
+ idr_replace(&drm_minors_idr, NULL, minor->index);
+ spin_unlock_irqrestore(&drm_minor_lock, flags);
+ }

device_del(minor->kdev);
dev_set_drvdata(minor->kdev, NULL); /* safety belt */
@@ -603,6 +628,14 @@ static int drm_dev_init(struct drm_device *dev,
/* no per-device feature limits by default */
dev->driver_features = ~0u;

+ if (drm_core_check_feature(dev, DRIVER_COMPUTE_ACCEL) &&
+ (drm_core_check_feature(dev, DRIVER_RENDER) ||
+ drm_core_check_feature(dev, DRIVER_MODESET))) {
+
+ DRM_ERROR("DRM driver can't be both a compute acceleration and graphics driver\n");
+ return -EINVAL;
+ }
+
drm_legacy_init_members(dev);
INIT_LIST_HEAD(&dev->filelist);
INIT_LIST_HEAD(&dev->filelist_internal);
@@ -628,15 +661,21 @@ static int drm_dev_init(struct drm_device *dev,

dev->anon_inode = inode;

- if (drm_core_check_feature(dev, DRIVER_RENDER)) {
- ret = drm_minor_alloc(dev, DRM_MINOR_RENDER);
+ if (drm_core_check_feature(dev, DRIVER_COMPUTE_ACCEL)) {
+ ret = drm_minor_alloc(dev, DRM_MINOR_ACCEL);
if (ret)
goto err;
- }
+ } else {
+ if (drm_core_check_feature(dev, DRIVER_RENDER)) {
+ ret = drm_minor_alloc(dev, DRM_MINOR_RENDER);
+ if (ret)
+ goto err;
+ }

- ret = drm_minor_alloc(dev, DRM_MINOR_PRIMARY);
- if (ret)
- goto err;
+ ret = drm_minor_alloc(dev, DRM_MINOR_PRIMARY);
+ if (ret)
+ goto err;
+ }

ret = drm_legacy_create_map_hash(dev);
if (ret)
@@ -883,6 +922,10 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
if (ret)
goto err_minors;

+ ret = drm_minor_register(dev, DRM_MINOR_ACCEL);
+ if (ret)
+ goto err_minors;
+
ret = create_compat_control_link(dev);
if (ret)
goto err_minors;
@@ -902,12 +945,13 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
driver->name, driver->major, driver->minor,
driver->patchlevel, driver->date,
dev->dev ? dev_name(dev->dev) : "virtual device",
- dev->primary->index);
+ dev->primary ? dev->primary->index : dev->accel->index);

goto out_unlock;

err_minors:
remove_compat_control_link(dev);
+ drm_minor_unregister(dev, DRM_MINOR_ACCEL);
drm_minor_unregister(dev, DRM_MINOR_PRIMARY);
drm_minor_unregister(dev, DRM_MINOR_RENDER);
out_unlock:
@@ -950,6 +994,7 @@ void drm_dev_unregister(struct drm_device *dev)
drm_legacy_rmmaps(dev);

remove_compat_control_link(dev);
+ drm_minor_unregister(dev, DRM_MINOR_ACCEL);
drm_minor_unregister(dev, DRM_MINOR_PRIMARY);
drm_minor_unregister(dev, DRM_MINOR_RENDER);
}
@@ -1034,6 +1079,7 @@ static const struct file_operations drm_stub_fops = {
static void drm_core_exit(void)
{
drm_privacy_screen_lookup_exit();
+ accel_core_exit();
unregister_chrdev(DRM_MAJOR, "drm");
debugfs_remove(drm_debugfs_root);
drm_sysfs_destroy();
@@ -1061,6 +1107,10 @@ static int __init drm_core_init(void)
if (ret < 0)
goto error;

+ ret = accel_core_init();
+ if (ret < 0)
+ goto error;
+
drm_privacy_screen_lookup_init();

drm_core_init_complete = true;
diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index 430e00b16eec..b8da978d85bb 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -19,6 +19,7 @@
#include <linux/kdev_t.h>
#include <linux/slab.h>

+#include <drm/drm_accel.h>
#include <drm/drm_connector.h>
#include <drm/drm_device.h>
#include <drm/drm_file.h>
@@ -471,19 +472,26 @@ struct device *drm_sysfs_minor_alloc(struct drm_minor *minor)
struct device *kdev;
int r;

- if (minor->type == DRM_MINOR_RENDER)
- minor_str = "renderD%d";
- else
- minor_str = "card%d";
-
kdev = kzalloc(sizeof(*kdev), GFP_KERNEL);
if (!kdev)
return ERR_PTR(-ENOMEM);

device_initialize(kdev);
- kdev->devt = MKDEV(DRM_MAJOR, minor->index);
- kdev->class = drm_class;
- kdev->type = &drm_sysfs_device_minor;
+
+ if (minor->type == DRM_MINOR_ACCEL) {
+ minor_str = "accel%d";
+ accel_set_device_instance_params(kdev, minor->index);
+ } else {
+ if (minor->type == DRM_MINOR_RENDER)
+ minor_str = "renderD%d";
+ else
+ minor_str = "card%d";
+
+ kdev->devt = MKDEV(DRM_MAJOR, minor->index);
+ kdev->class = drm_class;
+ kdev->type = &drm_sysfs_device_minor;
+ }
+
kdev->parent = minor->dev->dev;
kdev->release = drm_sysfs_release;
dev_set_drvdata(kdev, minor);
--
2.25.1


2022-11-02 21:50:18

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [RFC PATCH v2 2/3] accel: add dedicated minor for accelerator devices

On 11/2/2022 2:34 PM, Oded Gabbay wrote:
> @@ -24,16 +33,6 @@ static char *accel_devnode(struct device *dev, umode_t *mode)
>
> static CLASS_ATTR_STRING(accel_version, 0444, "accel 1.0.0 20221018");
>
> -/**
> - * accel_sysfs_init - initialize sysfs helpers
> - *
> - * This is used to create the ACCEL class, which is the implicit parent of any
> - * other top-level ACCEL sysfs objects.
> - *
> - * You must call accel_sysfs_destroy() to release the allocated resources.
> - *
> - * Return: 0 on success, negative error code on failure.
> - */

Why are we removing this?

> static int accel_sysfs_init(void)
> {
> int err;
> @@ -54,11 +53,6 @@ static int accel_sysfs_init(void)
> return 0;
> }
>
> -/**
> - * accel_sysfs_destroy - destroys ACCEL class
> - *
> - * Destroy the ACCEL device class.
> - */

Again, why remove this? Adding it in one patch than immediately
removing it in the next patch seems wasteful.

> static void accel_sysfs_destroy(void)
> {
> if (IS_ERR_OR_NULL(accel_class))
> @@ -68,11 +62,185 @@ static void accel_sysfs_destroy(void)
> accel_class = NULL;
> }
>
> +static void accel_minor_release(struct drm_minor *minor)
> +{
> + drm_dev_put(minor->dev);
> +}
> +
> +/**
> + * accel_open - open method for ACCEL file
> + * @inode: device inode
> + * @filp: file pointer.
> + *
> + * This function must be used by drivers as their &file_operations.open method.

Feels like it would be helpful to have an accel version of
DEFINE_DRM_GEM_FOPS() which helps accel drivers to get this right

> + * It looks up the correct ACCEL device and instantiates all the per-file
> + * resources for it. It also calls the &drm_driver.open driver callback.
> + *
> + * Return: 0 on success or negative errno value on failure.
> + */
> +int accel_open(struct inode *inode, struct file *filp)
> +{
> + struct drm_device *dev;
> + struct drm_minor *minor;
> + int retcode;
> +
> + minor = accel_minor_acquire(iminor(inode));
> + if (IS_ERR(minor))
> + return PTR_ERR(minor);
> +
> + dev = minor->dev;
> +
> + atomic_fetch_inc(&dev->open_count);
> +
> + /* share address_space across all char-devs of a single device */
> + filp->f_mapping = dev->anon_inode->i_mapping;
> +
> + retcode = drm_open_helper(filp, minor);
> + if (retcode)
> + goto err_undo;
> +
> + return 0;
> +
> +err_undo:
> + atomic_dec(&dev->open_count);
> + accel_minor_release(minor);
> + return retcode;
> +}
> +EXPORT_SYMBOL_GPL(accel_open);

2022-11-02 21:57:58

by Jeffrey Hugo

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/3] drm: initialize accel framework

On 11/2/2022 2:34 PM, Oded Gabbay wrote:
> @@ -163,7 +174,11 @@ static int drm_minor_register(struct drm_device *dev, unsigned int type)
>
> ret = drm_debugfs_init(minor, minor->index, drm_debugfs_root);
> if (ret) {
> - DRM_ERROR("DRM: Failed to initialize /sys/kernel/debug/dri.\n");
> + if (minor->type == DRM_MINOR_ACCEL)
> + DRM_ERROR("DRM: Failed to initialize /sys/kernel/debug/accel.\n");
> + else
> + DRM_ERROR("DRM: Failed to initialize /sys/kernel/debug/dri.\n");
> +
> goto err_debugfs;
> }
>

This doesn't look right. Don't you need to call drm_debugfs_init() with
accel_debugfs_root for the case - minor->type == DRM_MINOR_ACCEL?
Unless I fail to understand something, this will put all the accel
devices under /sys/kernel/debug/dri

2022-11-02 23:20:04

by Randy Dunlap

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major



On 11/2/22 13:34, Oded Gabbay wrote:
> diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
> new file mode 100644
> index 000000000000..282ea24f90c5
> --- /dev/null
> +++ b/drivers/accel/Kconfig
> @@ -0,0 +1,24 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Compute Acceleration device configuration
> +#
> +# This framework provides support for compute acceleration devices, such
> +# as, but not limited to, Machine-Learning and Deep-Learning acceleration
> +# devices
> +#
> +menuconfig ACCEL
> + tristate "Compute Acceleration Framework"
> + depends on DRM
> + help
> + Framework for device drivers of compute acceleration devices, such
> + as, but not limited to, Machine-Learning and Deep-Learning
> + acceleration devices.
> + If you say Y here, you need to select the module that's right for
> + your acceleration device from the list below.
> + This framework is integrated with the DRM subsystem as compute
> + accelerators and GPUs share a lot in common and can use almost the
> + same infrastructure code.
> + Having said that, acceleration devices will have a different
> + major number than GPUs, and will be exposed to user-space using
> + different device files, called accel/accel* (in /dev, sysfs
> + and debugfs)

Please add a period at the end of the help text.

+ and debugfs).

--
~Randy

2022-11-03 01:18:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote:
> --- /dev/null
> +++ b/drivers/accel/Kconfig
> @@ -0,0 +1,24 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Compute Acceleration device configuration
> +#
> +# This framework provides support for compute acceleration devices, such
> +# as, but not limited to, Machine-Learning and Deep-Learning acceleration
> +# devices
> +#
> +menuconfig ACCEL
> + tristate "Compute Acceleration Framework"
> + depends on DRM
> + help
> + Framework for device drivers of compute acceleration devices, such
> + as, but not limited to, Machine-Learning and Deep-Learning
> + acceleration devices.
> + If you say Y here, you need to select the module that's right for
> + your acceleration device from the list below.
> + This framework is integrated with the DRM subsystem as compute
> + accelerators and GPUs share a lot in common and can use almost the
> + same infrastructure code.
> + Having said that, acceleration devices will have a different
> + major number than GPUs, and will be exposed to user-space using
> + different device files, called accel/accel* (in /dev, sysfs
> + and debugfs)

Module name if "M" is chosen?


> +static char *accel_devnode(struct device *dev, umode_t *mode)
> +{
> + return kasprintf(GFP_KERNEL, "accel/%s", dev_name(dev));
> +}
> +
> +static CLASS_ATTR_STRING(accel_version, 0444, "accel 1.0.0 20221018");

What is a version number doing here?

Please no, I understand that DRM has this, but it really does not make
sense for any in-kernel code. And that's not how sysfs is supposed to
work anyway (if a file is present, the value is documented, if the file
is not present, the value is just not there, userspace has to handle
it all.)

> +
> +/**
> + * accel_sysfs_init - initialize sysfs helpers
> + *
> + * This is used to create the ACCEL class, which is the implicit parent of any
> + * other top-level ACCEL sysfs objects.
> + *
> + * You must call accel_sysfs_destroy() to release the allocated resources.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +static int accel_sysfs_init(void)
> +{
> + int err;
> +
> + accel_class = class_create(THIS_MODULE, "accel");
> + if (IS_ERR(accel_class))
> + return PTR_ERR(accel_class);
> +
> + err = class_create_file(accel_class, &class_attr_accel_version.attr);

Hint, if you ever find yourself adding sysfs files "by hand" like this,
you are doing something wrong. The driver code should create them
automatically for you by setting up default groups, _OR_ as in this
case, you shouldn't be adding the file at all :)

> +static void accel_sysfs_destroy(void)
> +{
> + if (IS_ERR_OR_NULL(accel_class))
> + return;
> + class_remove_file(accel_class, &class_attr_accel_version.attr);

No need to manually destroy files when you remove a device. But you
will remove this file anyway for the next version of this patch, so it's
not a big deal :)

> + class_destroy(accel_class);
> + accel_class = NULL;
> +}
> +
> +static int accel_stub_open(struct inode *inode, struct file *filp)
> +{
> + DRM_DEBUG("Operation not supported");

ftrace is wonderful, please use that and not hand-rolled custom "I am
here!" type messages like this.

thanks,

greg k-h

2022-11-03 05:34:13

by Jiho Chu

[permalink] [raw]
Subject: Re: [RFC PATCH v2 2/3] accel: add dedicated minor for accelerator devices

On Wed, 2 Nov 2022 22:34:04 +0200
Oded Gabbay <[email protected]> wrote:

> +/**
> + * accel_open - open method for ACCEL file
> + * @inode: device inode
> + * @filp: file pointer.
> + *
> + * This function must be used by drivers as their &file_operations.open method.
> + * It looks up the correct ACCEL device and instantiates all the per-file
> + * resources for it. It also calls the &drm_driver.open driver callback.
> + *
> + * Return: 0 on success or negative errno value on failure.
> + */
> +int accel_open(struct inode *inode, struct file *filp)
> +{
> + struct drm_device *dev;
> + struct drm_minor *minor;
> + int retcode;
> +
> + minor = accel_minor_acquire(iminor(inode));
> + if (IS_ERR(minor))
> + return PTR_ERR(minor);
> +
> + dev = minor->dev;
> +
> + atomic_fetch_inc(&dev->open_count);
> +

Hi,
It needs to consider drm_global_mutex to access open_count.
please check doxy of open_count.


> + /* share address_space across all char-devs of a single device */
> + filp->f_mapping = dev->anon_inode->i_mapping;
> +
> + retcode = drm_open_helper(filp, minor);
> + if (retcode)
> + goto err_undo;
> +
> + return 0;
> +
> +err_undo:
> + atomic_dec(&dev->open_count);
> + accel_minor_release(minor);
> + return retcode;
> +}
> +EXPORT_SYMBOL_GPL(accel_open);
> +
> static int accel_stub_open(struct inode *inode, struct file *filp)
> {
> - DRM_DEBUG("Operation not supported");
> + const struct file_operations *new_fops;
> + struct drm_minor *minor;
> + int err;
> +
> + DRM_DEBUG("\n");

It seems useless.

Thanks.
Jiho Chu

2022-11-03 13:46:36

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Wed, Nov 2, 2022 at 11:04 PM Jeffrey Hugo <[email protected]> wrote:
>
> On 11/2/2022 2:34 PM, Oded Gabbay wrote:
> > diff --git a/drivers/accel/accel_drv.c b/drivers/accel/accel_drv.c
> > new file mode 100644
> > index 000000000000..6132765ea054
> > --- /dev/null
> > +++ b/drivers/accel/accel_drv.c
> > @@ -0,0 +1,112 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * Copyright 2022 HabanaLabs, Ltd.
> > + * All Rights Reserved.
> > + *
> > + */
> > +
> > +#include <linux/module.h>
>
> Alphebetical order?
ok
>
> > +#include <linux/debugfs.h>
> > +#include <linux/device.h>
> > +
> > +#include <drm/drm_accel.h>
> > +#include <drm/drm_ioctl.h>
> > +#include <drm/drm_print.h>
> > +
> > +static struct dentry *accel_debugfs_root;
> > +struct class *accel_class;
>
> Static?
>
yes, thx.

2022-11-03 14:13:29

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Thu, Nov 3, 2022 at 12:58 AM Randy Dunlap <[email protected]> wrote:
>
>
>
> On 11/2/22 13:34, Oded Gabbay wrote:
> > diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
> > new file mode 100644
> > index 000000000000..282ea24f90c5
> > --- /dev/null
> > +++ b/drivers/accel/Kconfig
> > @@ -0,0 +1,24 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +# Compute Acceleration device configuration
> > +#
> > +# This framework provides support for compute acceleration devices, such
> > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration
> > +# devices
> > +#
> > +menuconfig ACCEL
> > + tristate "Compute Acceleration Framework"
> > + depends on DRM
> > + help
> > + Framework for device drivers of compute acceleration devices, such
> > + as, but not limited to, Machine-Learning and Deep-Learning
> > + acceleration devices.
> > + If you say Y here, you need to select the module that's right for
> > + your acceleration device from the list below.
> > + This framework is integrated with the DRM subsystem as compute
> > + accelerators and GPUs share a lot in common and can use almost the
> > + same infrastructure code.
> > + Having said that, acceleration devices will have a different
> > + major number than GPUs, and will be exposed to user-space using
> > + different device files, called accel/accel* (in /dev, sysfs
> > + and debugfs)
>
> Please add a period at the end of the help text.
>
> + and debugfs).
sure, thx.
Oded
>
> --
> ~Randy

2022-11-03 14:31:24

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote:
> > --- /dev/null
> > +++ b/drivers/accel/Kconfig
> > @@ -0,0 +1,24 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +# Compute Acceleration device configuration
> > +#
> > +# This framework provides support for compute acceleration devices, such
> > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration
> > +# devices
> > +#
> > +menuconfig ACCEL
> > + tristate "Compute Acceleration Framework"
> > + depends on DRM
> > + help
> > + Framework for device drivers of compute acceleration devices, such
> > + as, but not limited to, Machine-Learning and Deep-Learning
> > + acceleration devices.
> > + If you say Y here, you need to select the module that's right for
> > + your acceleration device from the list below.
> > + This framework is integrated with the DRM subsystem as compute
> > + accelerators and GPUs share a lot in common and can use almost the
> > + same infrastructure code.
> > + Having said that, acceleration devices will have a different
> > + major number than GPUs, and will be exposed to user-space using
> > + different device files, called accel/accel* (in /dev, sysfs
> > + and debugfs)
>
> Module name if "M" is chosen?
Will add
>
>
> > +static char *accel_devnode(struct device *dev, umode_t *mode)
> > +{
> > + return kasprintf(GFP_KERNEL, "accel/%s", dev_name(dev));
> > +}
> > +
> > +static CLASS_ATTR_STRING(accel_version, 0444, "accel 1.0.0 20221018");
>
> What is a version number doing here?
>
> Please no, I understand that DRM has this, but it really does not make
> sense for any in-kernel code. And that's not how sysfs is supposed to
> work anyway (if a file is present, the value is documented, if the file
> is not present, the value is just not there, userspace has to handle
> it all.)
Actually I don't know if drm even uses that. I just copied it to be
identical to drm's sysfs, but
as accel doesn't have any history, I can remove it.
>
> > +
> > +/**
> > + * accel_sysfs_init - initialize sysfs helpers
> > + *
> > + * This is used to create the ACCEL class, which is the implicit parent of any
> > + * other top-level ACCEL sysfs objects.
> > + *
> > + * You must call accel_sysfs_destroy() to release the allocated resources.
> > + *
> > + * Return: 0 on success, negative error code on failure.
> > + */
> > +static int accel_sysfs_init(void)
> > +{
> > + int err;
> > +
> > + accel_class = class_create(THIS_MODULE, "accel");
> > + if (IS_ERR(accel_class))
> > + return PTR_ERR(accel_class);
> > +
> > + err = class_create_file(accel_class, &class_attr_accel_version.attr);
>
> Hint, if you ever find yourself adding sysfs files "by hand" like this,
> you are doing something wrong. The driver code should create them
> automatically for you by setting up default groups, _OR_ as in this
> case, you shouldn't be adding the file at all :)
ok, removed.

>
> > +static void accel_sysfs_destroy(void)
> > +{
> > + if (IS_ERR_OR_NULL(accel_class))
> > + return;
> > + class_remove_file(accel_class, &class_attr_accel_version.attr);
>
> No need to manually destroy files when you remove a device. But you
> will remove this file anyway for the next version of this patch, so it's
> not a big deal :)
:)
>
> > + class_destroy(accel_class);
> > + accel_class = NULL;
> > +}
> > +
> > +static int accel_stub_open(struct inode *inode, struct file *filp)
> > +{
> > + DRM_DEBUG("Operation not supported");
>
> ftrace is wonderful, please use that and not hand-rolled custom "I am
> here!" type messages like this.
I'll just remove it as this line is being replaced anyway in the next patch.
Thanks,
Oded
>
> thanks,
>
> greg k-h

2022-11-03 20:48:37

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Thu, Nov 3, 2022 at 3:31 PM Oded Gabbay <[email protected]> wrote:
>
> On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote:
> > > --- /dev/null
> > > +++ b/drivers/accel/Kconfig
> > > @@ -0,0 +1,24 @@
> > > +# SPDX-License-Identifier: GPL-2.0-only
> > > +#
> > > +# Compute Acceleration device configuration
> > > +#
> > > +# This framework provides support for compute acceleration devices, such
> > > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration
> > > +# devices
> > > +#
> > > +menuconfig ACCEL
> > > + tristate "Compute Acceleration Framework"
> > > + depends on DRM
> > > + help
> > > + Framework for device drivers of compute acceleration devices, such
> > > + as, but not limited to, Machine-Learning and Deep-Learning
> > > + acceleration devices.
> > > + If you say Y here, you need to select the module that's right for
> > > + your acceleration device from the list below.
> > > + This framework is integrated with the DRM subsystem as compute
> > > + accelerators and GPUs share a lot in common and can use almost the
> > > + same infrastructure code.
> > > + Having said that, acceleration devices will have a different
> > > + major number than GPUs, and will be exposed to user-space using
> > > + different device files, called accel/accel* (in /dev, sysfs
> > > + and debugfs)
> >
> > Module name if "M" is chosen?
> Will add
So, unfortunately, the path of doing accel as a kernel module won't
work cleanly (Thanks stanislaw for pointing this out to me).
The reason is the circular dependency between drm and accel. drm calls
accel exported symbols during init and when devices are registering
(all the minor handling), and accel calls drm exported symbols because
I don't want to duplicate the entire drm core code.

I'll keep this menuconfig to provide the ability to disable this code
for people who think it is too "experimental". And in the future, when
drivers will join this subsystem, they will need this place for their
kconfig.

Thanks,
Oded

2022-11-03 23:09:08

by Randy Dunlap

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major



On 11/3/22 13:39, Oded Gabbay wrote:
> On Thu, Nov 3, 2022 at 3:31 PM Oded Gabbay <[email protected]> wrote:
>>
>> On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman
>> <[email protected]> wrote:
>>>
>>> On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote:
>>>> --- /dev/null
>>>> +++ b/drivers/accel/Kconfig
>>>> @@ -0,0 +1,24 @@
>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>> +#
>>>> +# Compute Acceleration device configuration
>>>> +#
>>>> +# This framework provides support for compute acceleration devices, such
>>>> +# as, but not limited to, Machine-Learning and Deep-Learning acceleration
>>>> +# devices
>>>> +#
>>>> +menuconfig ACCEL
>>>> + tristate "Compute Acceleration Framework"
>>>> + depends on DRM
>>>> + help
>>>> + Framework for device drivers of compute acceleration devices, such
>>>> + as, but not limited to, Machine-Learning and Deep-Learning
>>>> + acceleration devices.
>>>> + If you say Y here, you need to select the module that's right for
>>>> + your acceleration device from the list below.
>>>> + This framework is integrated with the DRM subsystem as compute
>>>> + accelerators and GPUs share a lot in common and can use almost the
>>>> + same infrastructure code.
>>>> + Having said that, acceleration devices will have a different
>>>> + major number than GPUs, and will be exposed to user-space using
>>>> + different device files, called accel/accel* (in /dev, sysfs
>>>> + and debugfs)
>>>
>>> Module name if "M" is chosen?
>> Will add
> So, unfortunately, the path of doing accel as a kernel module won't
> work cleanly (Thanks stanislaw for pointing this out to me).
> The reason is the circular dependency between drm and accel. drm calls
> accel exported symbols during init and when devices are registering
> (all the minor handling), and accel calls drm exported symbols because
> I don't want to duplicate the entire drm core code.

But DRM is a tristate symbol, so during drm init (loadable module), couldn't
it call accel init code (loadable module)?

Or are you saying that they only work together if both of them are builtin?

> I'll keep this menuconfig to provide the ability to disable this code
> for people who think it is too "experimental". And in the future, when
> drivers will join this subsystem, they will need this place for their
> kconfig.
>
> Thanks,
> Oded

--
~Randy

2022-11-04 08:03:10

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Thu, Nov 03, 2022 at 04:01:08PM -0700, Randy Dunlap wrote:
> >>> Module name if "M" is chosen?
> >> Will add
> > So, unfortunately, the path of doing accel as a kernel module won't
> > work cleanly (Thanks stanislaw for pointing this out to me).
> > The reason is the circular dependency between drm and accel. drm calls
> > accel exported symbols during init and when devices are registering
> > (all the minor handling), and accel calls drm exported symbols because
> > I don't want to duplicate the entire drm core code.
>
> But DRM is a tristate symbol, so during drm init (loadable module), couldn't
> it call accel init code (loadable module)?
>
> Or are you saying that they only work together if both of them are builtin?

Yes, with current state of the patches, we can not build code as modules.
There are symbols in accel that are from drm and we use accel symbols
in drm. This could be fixed by separating symbols accel requires in
separate module i.e. drm_file_helper.ko, however Oded proposed to make
CONFIG_ACCEL compile option for DRM and all accel code will be
build in drm.ko . I think that ok, since accel is not big.

Regards
Stanislaw


2022-11-06 11:15:56

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/3] drm: initialize accel framework

On Wed, Nov 2, 2022 at 11:30 PM Jeffrey Hugo <[email protected]> wrote:
>
> On 11/2/2022 2:34 PM, Oded Gabbay wrote:
> > @@ -163,7 +174,11 @@ static int drm_minor_register(struct drm_device *dev, unsigned int type)
> >
> > ret = drm_debugfs_init(minor, minor->index, drm_debugfs_root);
> > if (ret) {
> > - DRM_ERROR("DRM: Failed to initialize /sys/kernel/debug/dri.\n");
> > + if (minor->type == DRM_MINOR_ACCEL)
> > + DRM_ERROR("DRM: Failed to initialize /sys/kernel/debug/accel.\n");
> > + else
> > + DRM_ERROR("DRM: Failed to initialize /sys/kernel/debug/dri.\n");
> > +
> > goto err_debugfs;
> > }
> >
>
> This doesn't look right. Don't you need to call drm_debugfs_init() with
> accel_debugfs_root for the case - minor->type == DRM_MINOR_ACCEL?
> Unless I fail to understand something, this will put all the accel
> devices under /sys/kernel/debug/dri
ofc, you are correct.
Will be fixed in v3.
Thanks,
Oded

2022-11-06 11:16:39

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 2/3] accel: add dedicated minor for accelerator devices

On Thu, Nov 3, 2022 at 7:26 AM Jiho Chu <[email protected]> wrote:
>
> On Wed, 2 Nov 2022 22:34:04 +0200
> Oded Gabbay <[email protected]> wrote:
>
> > +/**
> > + * accel_open - open method for ACCEL file
> > + * @inode: device inode
> > + * @filp: file pointer.
> > + *
> > + * This function must be used by drivers as their &file_operations.open method.
> > + * It looks up the correct ACCEL device and instantiates all the per-file
> > + * resources for it. It also calls the &drm_driver.open driver callback.
> > + *
> > + * Return: 0 on success or negative errno value on failure.
> > + */
> > +int accel_open(struct inode *inode, struct file *filp)
> > +{
> > + struct drm_device *dev;
> > + struct drm_minor *minor;
> > + int retcode;
> > +
> > + minor = accel_minor_acquire(iminor(inode));
> > + if (IS_ERR(minor))
> > + return PTR_ERR(minor);
> > +
> > + dev = minor->dev;
> > +
> > + atomic_fetch_inc(&dev->open_count);
> > +
>
> Hi,
> It needs to consider drm_global_mutex to access open_count.
> please check doxy of open_count.
Now that I'm changing the code back to be part of drm.ko, I can return
all the code that is in drm_copy which I removed for this to compile.

>
>
> > + /* share address_space across all char-devs of a single device */
> > + filp->f_mapping = dev->anon_inode->i_mapping;
> > +
> > + retcode = drm_open_helper(filp, minor);
> > + if (retcode)
> > + goto err_undo;
> > +
> > + return 0;
> > +
> > +err_undo:
> > + atomic_dec(&dev->open_count);
> > + accel_minor_release(minor);
> > + return retcode;
> > +}
> > +EXPORT_SYMBOL_GPL(accel_open);
> > +
> > static int accel_stub_open(struct inode *inode, struct file *filp)
> > {
> > - DRM_DEBUG("Operation not supported");
> > + const struct file_operations *new_fops;
> > + struct drm_minor *minor;
> > + int err;
> > +
> > + DRM_DEBUG("\n");
>
> It seems useless.
Correct, I removed it in v3.
Thanks,
Oded
>
> Thanks.
> Jiho Chu

2022-11-06 11:23:05

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 2/3] accel: add dedicated minor for accelerator devices

On Wed, Nov 2, 2022 at 11:17 PM Jeffrey Hugo <[email protected]> wrote:
>
> On 11/2/2022 2:34 PM, Oded Gabbay wrote:
> > @@ -24,16 +33,6 @@ static char *accel_devnode(struct device *dev, umode_t *mode)
> >
> > static CLASS_ATTR_STRING(accel_version, 0444, "accel 1.0.0 20221018");
> >
> > -/**
> > - * accel_sysfs_init - initialize sysfs helpers
> > - *
> > - * This is used to create the ACCEL class, which is the implicit parent of any
> > - * other top-level ACCEL sysfs objects.
> > - *
> > - * You must call accel_sysfs_destroy() to release the allocated resources.
> > - *
> > - * Return: 0 on success, negative error code on failure.
> > - */
>
> Why are we removing this?
It should have been removed at the first patch, and will be fixed in v3.
I'm removing it as it is a static function. We don't document every
static function.
>
> > static int accel_sysfs_init(void)
> > {
> > int err;
> > @@ -54,11 +53,6 @@ static int accel_sysfs_init(void)
> > return 0;
> > }
> >
> > -/**
> > - * accel_sysfs_destroy - destroys ACCEL class
> > - *
> > - * Destroy the ACCEL device class.
> > - */
>
> Again, why remove this? Adding it in one patch than immediately
> removing it in the next patch seems wasteful.
Correct, will be removed from the first patch in the next version.
>
> > static void accel_sysfs_destroy(void)
> > {
> > if (IS_ERR_OR_NULL(accel_class))
> > @@ -68,11 +62,185 @@ static void accel_sysfs_destroy(void)
> > accel_class = NULL;
> > }
> >
> > +static void accel_minor_release(struct drm_minor *minor)
> > +{
> > + drm_dev_put(minor->dev);
> > +}
> > +
> > +/**
> > + * accel_open - open method for ACCEL file
> > + * @inode: device inode
> > + * @filp: file pointer.
> > + *
> > + * This function must be used by drivers as their &file_operations.open method.
>
> Feels like it would be helpful to have an accel version of
> DEFINE_DRM_GEM_FOPS() which helps accel drivers to get this right
Yeah, I also thought about it. I'll add it.
thanks,
oded
>
> > + * It looks up the correct ACCEL device and instantiates all the per-file
> > + * resources for it. It also calls the &drm_driver.open driver callback.
> > + *
> > + * Return: 0 on success or negative errno value on failure.
> > + */
> > +int accel_open(struct inode *inode, struct file *filp)
> > +{
> > + struct drm_device *dev;
> > + struct drm_minor *minor;
> > + int retcode;
> > +
> > + minor = accel_minor_acquire(iminor(inode));
> > + if (IS_ERR(minor))
> > + return PTR_ERR(minor);
> > +
> > + dev = minor->dev;
> > +
> > + atomic_fetch_inc(&dev->open_count);
> > +
> > + /* share address_space across all char-devs of a single device */
> > + filp->f_mapping = dev->anon_inode->i_mapping;
> > +
> > + retcode = drm_open_helper(filp, minor);
> > + if (retcode)
> > + goto err_undo;
> > +
> > + return 0;
> > +
> > +err_undo:
> > + atomic_dec(&dev->open_count);
> > + accel_minor_release(minor);
> > + return retcode;
> > +}
> > +EXPORT_SYMBOL_GPL(accel_open);

2022-11-06 15:16:53

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 2/3] accel: add dedicated minor for accelerator devices

On Sun, Nov 6, 2022 at 12:54 PM Oded Gabbay <[email protected]> wrote:
>
> On Thu, Nov 3, 2022 at 7:26 AM Jiho Chu <[email protected]> wrote:
> >
> > On Wed, 2 Nov 2022 22:34:04 +0200
> > Oded Gabbay <[email protected]> wrote:
> >
> > > +/**
> > > + * accel_open - open method for ACCEL file
> > > + * @inode: device inode
> > > + * @filp: file pointer.
> > > + *
> > > + * This function must be used by drivers as their &file_operations.open method.
> > > + * It looks up the correct ACCEL device and instantiates all the per-file
> > > + * resources for it. It also calls the &drm_driver.open driver callback.
> > > + *
> > > + * Return: 0 on success or negative errno value on failure.
> > > + */
> > > +int accel_open(struct inode *inode, struct file *filp)
> > > +{
> > > + struct drm_device *dev;
> > > + struct drm_minor *minor;
> > > + int retcode;
> > > +
> > > + minor = accel_minor_acquire(iminor(inode));
> > > + if (IS_ERR(minor))
> > > + return PTR_ERR(minor);
> > > +
> > > + dev = minor->dev;
> > > +
> > > + atomic_fetch_inc(&dev->open_count);
> > > +
> >
> > Hi,
> > It needs to consider drm_global_mutex to access open_count.
> > please check doxy of open_count.
> Now that I'm changing the code back to be part of drm.ko, I can return
> all the code that is in drm_copy which I removed for this to compile.
I take it back. All the code that I omitted was for legacy drivers.
If you look inside drm_dev_needs_global_mutex(), you will see 3 cases
where you need to take the global mutex, and all 3 are only relevant
for legacy drivers and/or drivers that use deprecated features.
So, I disagree with your original comment here.
Moreover, open_count is atomic, so I don't need to take the mutex to
increment it, and as you can see in drm_open(), the function
increments it regardless of whether it takes
drm_dev_needs_global_mutex.
Oded

>
> >
> >
> > > + /* share address_space across all char-devs of a single device */
> > > + filp->f_mapping = dev->anon_inode->i_mapping;
> > > +
> > > + retcode = drm_open_helper(filp, minor);
> > > + if (retcode)
> > > + goto err_undo;
> > > +
> > > + return 0;
> > > +
> > > +err_undo:
> > > + atomic_dec(&dev->open_count);
> > > + accel_minor_release(minor);
> > > + return retcode;
> > > +}
> > > +EXPORT_SYMBOL_GPL(accel_open);
> > > +
> > > static int accel_stub_open(struct inode *inode, struct file *filp)
> > > {
> > > - DRM_DEBUG("Operation not supported");
> > > + const struct file_operations *new_fops;
> > > + struct drm_minor *minor;
> > > + int err;
> > > +
> > > + DRM_DEBUG("\n");
> >
> > It seems useless.
> Correct, I removed it in v3.
> Thanks,
> Oded
> >
> > Thanks.
> > Jiho Chu

2022-11-07 13:07:40

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Mon, Nov 7, 2022 at 2:56 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Thu, Nov 03, 2022 at 10:39:36PM +0200, Oded Gabbay wrote:
> > On Thu, Nov 3, 2022 at 3:31 PM Oded Gabbay <[email protected]> wrote:
> > >
> > > On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote:
> > > > > --- /dev/null
> > > > > +++ b/drivers/accel/Kconfig
> > > > > @@ -0,0 +1,24 @@
> > > > > +# SPDX-License-Identifier: GPL-2.0-only
> > > > > +#
> > > > > +# Compute Acceleration device configuration
> > > > > +#
> > > > > +# This framework provides support for compute acceleration devices, such
> > > > > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration
> > > > > +# devices
> > > > > +#
> > > > > +menuconfig ACCEL
> > > > > + tristate "Compute Acceleration Framework"
> > > > > + depends on DRM
> > > > > + help
> > > > > + Framework for device drivers of compute acceleration devices, such
> > > > > + as, but not limited to, Machine-Learning and Deep-Learning
> > > > > + acceleration devices.
> > > > > + If you say Y here, you need to select the module that's right for
> > > > > + your acceleration device from the list below.
> > > > > + This framework is integrated with the DRM subsystem as compute
> > > > > + accelerators and GPUs share a lot in common and can use almost the
> > > > > + same infrastructure code.
> > > > > + Having said that, acceleration devices will have a different
> > > > > + major number than GPUs, and will be exposed to user-space using
> > > > > + different device files, called accel/accel* (in /dev, sysfs
> > > > > + and debugfs)
> > > >
> > > > Module name if "M" is chosen?
> > > Will add
> > So, unfortunately, the path of doing accel as a kernel module won't
> > work cleanly (Thanks stanislaw for pointing this out to me).
> > The reason is the circular dependency between drm and accel. drm calls
> > accel exported symbols during init and when devices are registering
> > (all the minor handling), and accel calls drm exported symbols because
> > I don't want to duplicate the entire drm core code.
>
> I really don't think this is the right way to integrate with
> DRM. Accel should be a layer over top of DRM, not have these wakky
> co-dependencies.
>
> The fact you are running into stuff like this already smells really
> bad.
>
> Jason
I don't agree with your statement that it should be "a layer over top of DRM".
Anything on top of DRM is a device driver.
Accel is not a device driver, it is a new type of drm minor / drm driver.

Please look at v3 of the patch-set. There I abandoned the idea of
having accel as a separate module and instead it is part of drm.ko, as
it should be because it is just a new drm minor.

The only alternative imo to that is to abandon the idea of reusing
drm, and just make an independant accel core code.

Oded

2022-11-07 13:25:08

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Thu, Nov 03, 2022 at 10:39:36PM +0200, Oded Gabbay wrote:
> On Thu, Nov 3, 2022 at 3:31 PM Oded Gabbay <[email protected]> wrote:
> >
> > On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote:
> > > > --- /dev/null
> > > > +++ b/drivers/accel/Kconfig
> > > > @@ -0,0 +1,24 @@
> > > > +# SPDX-License-Identifier: GPL-2.0-only
> > > > +#
> > > > +# Compute Acceleration device configuration
> > > > +#
> > > > +# This framework provides support for compute acceleration devices, such
> > > > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration
> > > > +# devices
> > > > +#
> > > > +menuconfig ACCEL
> > > > + tristate "Compute Acceleration Framework"
> > > > + depends on DRM
> > > > + help
> > > > + Framework for device drivers of compute acceleration devices, such
> > > > + as, but not limited to, Machine-Learning and Deep-Learning
> > > > + acceleration devices.
> > > > + If you say Y here, you need to select the module that's right for
> > > > + your acceleration device from the list below.
> > > > + This framework is integrated with the DRM subsystem as compute
> > > > + accelerators and GPUs share a lot in common and can use almost the
> > > > + same infrastructure code.
> > > > + Having said that, acceleration devices will have a different
> > > > + major number than GPUs, and will be exposed to user-space using
> > > > + different device files, called accel/accel* (in /dev, sysfs
> > > > + and debugfs)
> > >
> > > Module name if "M" is chosen?
> > Will add
> So, unfortunately, the path of doing accel as a kernel module won't
> work cleanly (Thanks stanislaw for pointing this out to me).
> The reason is the circular dependency between drm and accel. drm calls
> accel exported symbols during init and when devices are registering
> (all the minor handling), and accel calls drm exported symbols because
> I don't want to duplicate the entire drm core code.

I really don't think this is the right way to integrate with
DRM. Accel should be a layer over top of DRM, not have these wakky
co-dependencies.

The fact you are running into stuff like this already smells really
bad.

Jason

2022-11-07 13:30:19

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote:
> I don't agree with your statement that it should be "a layer over top of DRM".
> Anything on top of DRM is a device driver.
> Accel is not a device driver, it is a new type of drm minor / drm driver.

Yeah, I still think this is not the right way, you are getting almost
nothing from DRM and making everything more complicated in the
process.

> The only alternative imo to that is to abandon the idea of reusing
> drm, and just make an independant accel core code.

Not quite really, layer it properly and librarize parts of DRM into
things accel can re-use so they are not intimately tied to the DRM
struct device notion.

IMHO this is much better, because accel has very little need of DRM to
manage a struct device/cdev in the first place.

Jason

2022-11-07 13:31:43

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

Hi

On Mon, Nov 07, 2022 at 09:10:36AM -0400, Jason Gunthorpe wrote:
> On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote:
> > I don't agree with your statement that it should be "a layer over top of DRM".
> > Anything on top of DRM is a device driver.
> > Accel is not a device driver, it is a new type of drm minor / drm driver.
>
> Yeah, I still think this is not the right way, you are getting almost
> nothing from DRM and making everything more complicated in the
> process.
>
> > The only alternative imo to that is to abandon the idea of reusing
> > drm, and just make an independant accel core code.
>
> Not quite really, layer it properly and librarize parts of DRM into
> things accel can re-use so they are not intimately tied to the DRM
> struct device notion.

What do you mean by "struct device notion" ? struct drm_devce ?

Regards
Stanislaw

2022-11-07 14:18:43

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote:
> > I don't agree with your statement that it should be "a layer over top of DRM".
> > Anything on top of DRM is a device driver.
> > Accel is not a device driver, it is a new type of drm minor / drm driver.
>
> Yeah, I still think this is not the right way, you are getting almost
> nothing from DRM and making everything more complicated in the
> process.
>
> > The only alternative imo to that is to abandon the idea of reusing
> > drm, and just make an independant accel core code.
>
> Not quite really, layer it properly and librarize parts of DRM into
> things accel can re-use so they are not intimately tied to the DRM
> struct device notion.
>
> IMHO this is much better, because accel has very little need of DRM to
> manage a struct device/cdev in the first place.
>
> Jason
I'm not following. How can an accel device be a new type of drm_minor,
if it doesn't have access to all its functions and members ?
How will accel device leverage, for example, the GEM code without
being a drm_minor ?

Librarizing parts of DRM sounds nice in theory but the reality is that
everything there is interconnected, all the structures are
interdependent.
I would have to re-write the entire DRM library to make such a thing
work. I don't think this was the intention.

The current design makes the accel device an integral part of drm,
with very minimal code duplication and without re-writing DRM.

Oded

2022-11-07 14:19:32

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Mon, Nov 07, 2022 at 04:02:01PM +0200, Oded Gabbay wrote:
> On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <[email protected]> wrote:
> >
> > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote:
> > > I don't agree with your statement that it should be "a layer over top of DRM".
> > > Anything on top of DRM is a device driver.
> > > Accel is not a device driver, it is a new type of drm minor / drm driver.
> >
> > Yeah, I still think this is not the right way, you are getting almost
> > nothing from DRM and making everything more complicated in the
> > process.
> >
> > > The only alternative imo to that is to abandon the idea of reusing
> > > drm, and just make an independant accel core code.
> >
> > Not quite really, layer it properly and librarize parts of DRM into
> > things accel can re-use so they are not intimately tied to the DRM
> > struct device notion.
> >
> > IMHO this is much better, because accel has very little need of DRM to
> > manage a struct device/cdev in the first place.
> >
> > Jason
> I'm not following. How can an accel device be a new type of drm_minor,
> if it doesn't have access to all its functions and members ?

"drm_minor" is not necessary anymore. Strictly managing minor numbers
lost its value years ago when /dev/ was reorganized. Just use
dynamic minors fully.

> How will accel device leverage, for example, the GEM code without
> being a drm_minor ?

Split GEM into a library so it doesn't require that.

> Librarizing parts of DRM sounds nice in theory but the reality is that
> everything there is interconnected, all the structures are
> interdependent.

Yes, the kernel is full of stuff that needs improving. Let's not take
shortcuts.

> I would have to re-write the entire DRM library to make such a thing
> work. I don't think this was the intention.

Not necessarily you, whoever someday needs GEM would have to do some
work.

> The current design makes the accel device an integral part of drm,
> with very minimal code duplication and without re-writing DRM.

And it smells bad, you can't even make it into a proper module. Who
knows what other problems will come.

Jason

2022-11-07 16:18:54

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Mon, Nov 7, 2022 at 4:10 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Mon, Nov 07, 2022 at 04:02:01PM +0200, Oded Gabbay wrote:
> > On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <[email protected]> wrote:
> > >
> > > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote:
> > > > I don't agree with your statement that it should be "a layer over top of DRM".
> > > > Anything on top of DRM is a device driver.
> > > > Accel is not a device driver, it is a new type of drm minor / drm driver.
> > >
> > > Yeah, I still think this is not the right way, you are getting almost
> > > nothing from DRM and making everything more complicated in the
> > > process.
> > >
> > > > The only alternative imo to that is to abandon the idea of reusing
> > > > drm, and just make an independant accel core code.
> > >
> > > Not quite really, layer it properly and librarize parts of DRM into
> > > things accel can re-use so they are not intimately tied to the DRM
> > > struct device notion.
> > >
> > > IMHO this is much better, because accel has very little need of DRM to
> > > manage a struct device/cdev in the first place.
> > >
> > > Jason
> > I'm not following. How can an accel device be a new type of drm_minor,
> > if it doesn't have access to all its functions and members ?
>
> "drm_minor" is not necessary anymore. Strictly managing minor numbers
> lost its value years ago when /dev/ was reorganized. Just use
> dynamic minors fully.
drm minor is not just about handling minor numbers. It contains the
entire code to manage devices that register with drm framework (e.g.
supply callbacks to file operations), manage their lifecycle,
resources (e.g. automatic free of resources on release), sysfs,
debugfs, etc.

To take all of that out of drm.ko and make it a separate kernel module
is a big change, which I don't know if the drm people even want me to
do.

>
> > How will accel device leverage, for example, the GEM code without
> > being a drm_minor ?
>
> Split GEM into a library so it doesn't require that.
I don't see the advantage of doing that over defining accel as a new
type of drm minor.
>
> > Librarizing parts of DRM sounds nice in theory but the reality is that
> > everything there is interconnected, all the structures are
> > interdependent.
>
> Yes, the kernel is full of stuff that needs improving. Let's not take
> shortcuts.
It's not about shortcuts. It's about a different way to solve this
issue which I don't think is anyway hacky or inappropriate.

>
> > I would have to re-write the entire DRM library to make such a thing
> > work. I don't think this was the intention.
>
> Not necessarily you, whoever someday needs GEM would have to do some
> work.
>
> > The current design makes the accel device an integral part of drm,
> > with very minimal code duplication and without re-writing DRM.
>
> And it smells bad, you can't even make it into a proper module. Who
> knows what other problems will come.
I would argue that if accel is part of drm, it doesn't have to be a
proper module. I don't see that as a hard requirement.
Oded

>
> Jason

2022-11-07 17:27:06

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Mon, Nov 07, 2022 at 05:53:55PM +0200, Oded Gabbay wrote:
> On Mon, Nov 7, 2022 at 4:10 PM Jason Gunthorpe <[email protected]> wrote:
> >
> > On Mon, Nov 07, 2022 at 04:02:01PM +0200, Oded Gabbay wrote:
> > > On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <[email protected]> wrote:
> > > >
> > > > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote:
> > > > > I don't agree with your statement that it should be "a layer over top of DRM".
> > > > > Anything on top of DRM is a device driver.
> > > > > Accel is not a device driver, it is a new type of drm minor / drm driver.
> > > >
> > > > Yeah, I still think this is not the right way, you are getting almost
> > > > nothing from DRM and making everything more complicated in the
> > > > process.
> > > >
> > > > > The only alternative imo to that is to abandon the idea of reusing
> > > > > drm, and just make an independant accel core code.
> > > >
> > > > Not quite really, layer it properly and librarize parts of DRM into
> > > > things accel can re-use so they are not intimately tied to the DRM
> > > > struct device notion.
> > > >
> > > > IMHO this is much better, because accel has very little need of DRM to
> > > > manage a struct device/cdev in the first place.
> > > >
> > > > Jason
> > > I'm not following. How can an accel device be a new type of drm_minor,
> > > if it doesn't have access to all its functions and members ?
> >
> > "drm_minor" is not necessary anymore. Strictly managing minor numbers
> > lost its value years ago when /dev/ was reorganized. Just use
> > dynamic minors fully.
> drm minor is not just about handling minor numbers. It contains the
> entire code to manage devices that register with drm framework (e.g.
> supply callbacks to file operations), manage their lifecycle,
> resources (e.g. automatic free of resources on release), sysfs,
> debugfs, etc.

This is why you are having such troubles, this is already good library
code. You don't need DRM to wrapper debugfs APIs, for instance. We
have devm, though maybe it is not a good idea, etc

Greg already pointed out the sysfs was not being done correctly
anyhow.

I don't think DRM is improving on these core kernel services. Just use
the normal stuff directly.

> > > How will accel device leverage, for example, the GEM code without
> > > being a drm_minor ?
> >
> > Split GEM into a library so it doesn't require that.
> I don't see the advantage of doing that over defining accel as a new
> type of drm minor.

Making things into smaller libraries is recognized as a far better
kernel approach than trying to make a gigantic wide midlayer that stuffs
itself into everything. LWN called this the "midlayer mistake" and
wrote about the pitfalls a long time ago:

https://lwn.net/Articles/336262/

It is exactly what you are experiencing trying to stretch a
midlayer even further out.

Jason

2022-11-07 20:09:18

by Oded Gabbay

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Mon, Nov 7, 2022 at 6:31 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Mon, Nov 07, 2022 at 05:53:55PM +0200, Oded Gabbay wrote:
> > On Mon, Nov 7, 2022 at 4:10 PM Jason Gunthorpe <[email protected]> wrote:
> > >
> > > On Mon, Nov 07, 2022 at 04:02:01PM +0200, Oded Gabbay wrote:
> > > > On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <[email protected]> wrote:
> > > > >
> > > > > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote:
> > > > > > I don't agree with your statement that it should be "a layer over top of DRM".
> > > > > > Anything on top of DRM is a device driver.
> > > > > > Accel is not a device driver, it is a new type of drm minor / drm driver.
> > > > >
> > > > > Yeah, I still think this is not the right way, you are getting almost
> > > > > nothing from DRM and making everything more complicated in the
> > > > > process.
> > > > >
> > > > > > The only alternative imo to that is to abandon the idea of reusing
> > > > > > drm, and just make an independant accel core code.
> > > > >
> > > > > Not quite really, layer it properly and librarize parts of DRM into
> > > > > things accel can re-use so they are not intimately tied to the DRM
> > > > > struct device notion.
> > > > >
> > > > > IMHO this is much better, because accel has very little need of DRM to
> > > > > manage a struct device/cdev in the first place.
> > > > >
> > > > > Jason
> > > > I'm not following. How can an accel device be a new type of drm_minor,
> > > > if it doesn't have access to all its functions and members ?
> > >
> > > "drm_minor" is not necessary anymore. Strictly managing minor numbers
> > > lost its value years ago when /dev/ was reorganized. Just use
> > > dynamic minors fully.
> > drm minor is not just about handling minor numbers. It contains the
> > entire code to manage devices that register with drm framework (e.g.
> > supply callbacks to file operations), manage their lifecycle,
> > resources (e.g. automatic free of resources on release), sysfs,
> > debugfs, etc.
>
> This is why you are having such troubles, this is already good library
> code. You don't need DRM to wrapper debugfs APIs, for instance. We
> have devm, though maybe it is not a good idea, etc
>
> Greg already pointed out the sysfs was not being done correctly
> anyhow.
>
> I don't think DRM is improving on these core kernel services. Just use
> the normal stuff directly.
I get what you are saying but if I do all that, then how is this
subsystem related to DRM and re-using its code ? (at least at this
stage)
btw, using the basic stuff directly was my original intention, if you
remember the original accel mail thread from July/August.
And then we all decided in LPC that we shouldn't do that and instead
accel should use the DRM code and just expose a new major+minor for
the new drivers.

So, something doesn't add up...
imo, we need to choose between doing accel either as a small new
feature in drm, or as an independent subsystem.
I just don't see how I do the former without calling drm code directly
and using all its wrappers.

>
> > > > How will accel device leverage, for example, the GEM code without
> > > > being a drm_minor ?
> > >
> > > Split GEM into a library so it doesn't require that.
> > I don't see the advantage of doing that over defining accel as a new
> > type of drm minor.
>
> Making things into smaller libraries is recognized as a far better
> kernel approach than trying to make a gigantic wide midlayer that stuffs
> itself into everything. LWN called this the "midlayer mistake" and
> wrote about the pitfalls a long time ago:
>
> https://lwn.net/Articles/336262/
>
> It is exactly what you are experiencing trying to stretch a
> midlayer even further out.
>
> Jason
I'm all for breaking it down to smaller libraries, I completely agree with you.
But as you wrote above, why do I even need to use the drm wrappers for
the basic stuff ? I'll just call the kernel api directly.
And if that's the case then I don't need to rip that code out of the
heart of drm and make it a separate module.

For GEM (as an example of something less basic) it might be a
different story, but we are not there yet.

Oded

2022-11-07 20:38:09

by Dave Airlie

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Mon, 7 Nov 2022 at 23:10, Jason Gunthorpe <[email protected]> wrote:
>
> On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote:
> > I don't agree with your statement that it should be "a layer over top of DRM".
> > Anything on top of DRM is a device driver.
> > Accel is not a device driver, it is a new type of drm minor / drm driver.
>
> Yeah, I still think this is not the right way, you are getting almost
> nothing from DRM and making everything more complicated in the
> process.

You are looking at the small picture that is these patches, there are
just infrastructure to start the process of merging drivers and
reusing other parts of the drm code.

We aren't going to ever get anywhere if we start splitting code out of
drm just in case, we get this stuff rolling in the tree and if we have
a pressing need to refactor it out into separate libraries later then
we can address that from a more educated place, instead of just
throwing huge refactors around before we have any code to even use
them.
>
> IMHO this is much better, because accel has very little need of DRM to
> manage a struct device/cdev in the first place.

Right now it doesn't, but when drivers start leveraging the other code
it will reuse a lot more code.

I'm not going to spend too much time entertaining this, devm vs drmm
memory etc are real problems drm has already identified if not
completely solved.

Dave.

2022-11-07 20:50:35

by Dave Airlie

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

> > >
> > > "drm_minor" is not necessary anymore. Strictly managing minor numbers
> > > lost its value years ago when /dev/ was reorganized. Just use
> > > dynamic minors fully.
> > drm minor is not just about handling minor numbers. It contains the
> > entire code to manage devices that register with drm framework (e.g.
> > supply callbacks to file operations), manage their lifecycle,
> > resources (e.g. automatic free of resources on release), sysfs,
> > debugfs, etc.
>
> This is why you are having such troubles, this is already good library
> code. You don't need DRM to wrapper debugfs APIs, for instance. We
> have devm, though maybe it is not a good idea, etc
>
> Greg already pointed out the sysfs was not being done correctly
> anyhow.
>
> I don't think DRM is improving on these core kernel services. Just use
> the normal stuff directly.

At plumbers we decided a direction, I think the direction is good, if
there is refactoring to be done, I'd rather it was done in tree with a
clear direction.

Coming in now and saying we should go down a different path isn't
really helpful. We need to get rolling on this, we have drivers that
want to land somewhere now, which means we need to just get a
framework in place, leveraging drm code is the way to do it.

There is no need to an "accel" module, what does that even buy you,
the idea is to have an accel subsystem that allows drivers to use drm
features, not an accel subsystem that refactors drm features, that
would take years. There are already drivers for this subsystem wanting
to use GEM, and I don't think holding them up for a year to refactor
something that we don't have a clear reason or goal behind
refactoring.

If there is a problem with the drm subsystem interactions with the
kernel standard implementations then let's go fix that and accel will
also get fixed, but there's no reason to start going down that road at
the same time as introducing accel.

Also with the idr/xarray stuff, this isn't the patchset to be
introducing a bunch of new and divergent work, if this patchset
identifies deficiencies then let's document them and work on them in
parallel instead of blocking the initial landing in favour of some
future refactors with no in-tree users.

Dave.

2022-11-08 13:28:29

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Tue, Nov 08, 2022 at 06:33:23AM +1000, Dave Airlie wrote:

> At plumbers we decided a direction, I think the direction is good, if
> there is refactoring to be done, I'd rather it was done in tree with a
> clear direction.
>
> Coming in now and saying we should go down a different path isn't
> really helpful. We need to get rolling on this, we have drivers that
> want to land somewhere now, which means we need to just get a
> framework in place, leveraging drm code is the way to do it.

It is not a different path, at plumbers we decided accel should try to
re-use parts of DRM that make sense. I think that should be done by
making those DRM parts into libraries that can be re-used, not by
trying to twist DRM into something weird.

If this thing needs special major/minor numbers, it's own class, its
own debufs, sysfs, etc, then it should not be abusing the DRM struct
device infrastructure to create that very basic kernel infrastructure.

Somehow we ended up with the worst of both worlds. If you want to to
be DRM then it should just be DRM and we shouldn't see all this core
infrastructue code for debugfs/sysfs/cdevs/etc in thes patches at all.

Jason

2022-11-09 07:34:30

by Dave Airlie

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major

On Tue, 8 Nov 2022 at 22:28, Jason Gunthorpe <[email protected]> wrote:
>
> On Tue, Nov 08, 2022 at 06:33:23AM +1000, Dave Airlie wrote:
>
> > At plumbers we decided a direction, I think the direction is good, if
> > there is refactoring to be done, I'd rather it was done in tree with a
> > clear direction.
> >
> > Coming in now and saying we should go down a different path isn't
> > really helpful. We need to get rolling on this, we have drivers that
> > want to land somewhere now, which means we need to just get a
> > framework in place, leveraging drm code is the way to do it.
>
> It is not a different path, at plumbers we decided accel should try to
> re-use parts of DRM that make sense. I think that should be done by
> making those DRM parts into libraries that can be re-used, not by
> trying to twist DRM into something weird.

There isn't much twisting here, the thing is this is just the code for sharing,
there isn't going to be mountains more. This code gives accel drivers access
to a lot of things. Refactoring it out will take a year or so, and I don't think
buys us anything.

>
> If this thing needs special major/minor numbers, it's own class, its
> own debufs, sysfs, etc, then it should not be abusing the DRM struct
> device infrastructure to create that very basic kernel infrastructure.
>
> Somehow we ended up with the worst of both worlds. If you want to to
> be DRM then it should just be DRM and we shouldn't see all this core
> infrastructue code for debugfs/sysfs/cdevs/etc in thes patches at all.

We can refactor this out even clearer in the long run if it needs to,
but you are overly focusing on the small picture of these patches and
not the larger sharing this enables.

At this point I'm going to be merging close to what we have here, so
we can move forward with getting some drivers lined up.

Dave.