2024-04-09 20:02:27

by José Roberto de Souza

[permalink] [raw]
Subject: [PATCH v4 1/2] devcoredump: Add dev_coredump_put()

It is useful for modules that do not want to keep coredump available
after its unload.
Otherwise, the coredump would only be removed after DEVCD_TIMEOUT
seconds.

v2:
- dev_coredump_put() documentation updated (Mukesh)

Cc: Rodrigo Vivi <[email protected]>
Cc: Mukesh Ojha <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Jonathan Cavitt <[email protected]>
Reviewed-by: Johannes Berg <[email protected]>
Acked-by: Jonathan Cavitt <[email protected]>
Signed-off-by: José Roberto de Souza <[email protected]>
---
drivers/base/devcoredump.c | 23 +++++++++++++++++++++++
include/linux/devcoredump.h | 5 +++++
2 files changed, 28 insertions(+)

diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
index 7e2d1f0d903a6..82aeb09b3d1b5 100644
--- a/drivers/base/devcoredump.c
+++ b/drivers/base/devcoredump.c
@@ -304,6 +304,29 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset,
offset);
}

+/**
+ * dev_coredump_put - remove device coredump
+ * @dev: the struct device for the crashed device
+ *
+ * dev_coredump_put() removes coredump, if exists, for a given device from
+ * the file system and free its associated data otherwise, does nothing.
+ *
+ * It is useful for modules that do not want to keep coredump
+ * available after its unload.
+ */
+void dev_coredump_put(struct device *dev)
+{
+ struct device *existing;
+
+ existing = class_find_device(&devcd_class, NULL, dev,
+ devcd_match_failing);
+ if (existing) {
+ devcd_free(existing, NULL);
+ put_device(existing);
+ }
+}
+EXPORT_SYMBOL_GPL(dev_coredump_put);
+
/**
* dev_coredumpm - create device coredump with read/free methods
* @dev: the struct device for the crashed device
diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h
index c008169ed2c6f..c8f7eb6cc1915 100644
--- a/include/linux/devcoredump.h
+++ b/include/linux/devcoredump.h
@@ -63,6 +63,8 @@ void dev_coredumpm(struct device *dev, struct module *owner,

void dev_coredumpsg(struct device *dev, struct scatterlist *table,
size_t datalen, gfp_t gfp);
+
+void dev_coredump_put(struct device *dev);
#else
static inline void dev_coredumpv(struct device *dev, void *data,
size_t datalen, gfp_t gfp)
@@ -85,6 +87,9 @@ static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table,
{
_devcd_free_sgtable(table);
}
+static inline void dev_coredump_put(struct device *dev)
+{
+}
#endif /* CONFIG_DEV_COREDUMP */

#endif /* __DEVCOREDUMP_H */
--
2.44.0



2024-04-09 20:02:32

by José Roberto de Souza

[permalink] [raw]
Subject: [PATCH v4 2/2] drm/xe: Remove devcoredump during driver release

This will remove devcoredump from file system and free its resources
during driver unload.

This fix the driver unload after gpu hang happened, otherwise this
it would report that Xe KMD is still in use and it would leave the
kernel in a state that Xe KMD can't be unload without a reboot.

Cc: Rodrigo Vivi <[email protected]>
Cc: Jonathan Cavitt <[email protected]>
Acked-by: Jonathan Cavitt <[email protected]>
Signed-off-by: José Roberto de Souza <[email protected]>
---
drivers/gpu/drm/xe/xe_devcoredump.c | 13 ++++++++++++-
drivers/gpu/drm/xe/xe_devcoredump.h | 6 ++++++
drivers/gpu/drm/xe/xe_device.c | 4 ++++
3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index 283ca7518aff2..3d7980232be1c 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -9,6 +9,8 @@
#include <linux/devcoredump.h>
#include <generated/utsrelease.h>

+#include <drm/drm_managed.h>
+
#include "xe_device.h"
#include "xe_exec_queue.h"
#include "xe_force_wake.h"
@@ -235,5 +237,14 @@ void xe_devcoredump(struct xe_sched_job *job)
dev_coredumpm(xe->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL,
xe_devcoredump_read, xe_devcoredump_free);
}
-#endif

+static void xe_driver_devcoredump_fini(struct drm_device *drm, void *arg)
+{
+ dev_coredump_put(drm->dev);
+}
+
+int xe_devcoredump_init(struct xe_device *xe)
+{
+ return drmm_add_action_or_reset(&xe->drm, xe_driver_devcoredump_fini, xe);
+}
+#endif
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h
index df8671f0b5eb2..e2fa65ce09322 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.h
+++ b/drivers/gpu/drm/xe/xe_devcoredump.h
@@ -11,10 +11,16 @@ struct xe_sched_job;

#ifdef CONFIG_DEV_COREDUMP
void xe_devcoredump(struct xe_sched_job *job);
+int xe_devcoredump_init(struct xe_device *xe);
#else
static inline void xe_devcoredump(struct xe_sched_job *job)
{
}
+
+static inline int xe_devcoredump_init(struct xe_device *xe)
+{
+ return 0;
+}
#endif

#endif
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 9083f5e02dd9e..ce27d0d1bdb34 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -20,6 +20,7 @@
#include "regs/xe_regs.h"
#include "xe_bo.h"
#include "xe_debugfs.h"
+#include "xe_devcoredump.h"
#include "xe_dma_buf.h"
#include "xe_drm_client.h"
#include "xe_drv.h"
@@ -513,6 +514,9 @@ int xe_device_probe(struct xe_device *xe)
return err;
}

+ err = xe_devcoredump_init(xe);
+ if (err)
+ return err;
err = drmm_add_action_or_reset(&xe->drm, xe_driver_flr_fini, xe);
if (err)
return err;
--
2.44.0


2024-04-09 22:22:04

by Rodrigo Vivi

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] drm/xe: Remove devcoredump during driver release

On Tue, Apr 09, 2024 at 01:02:06PM -0700, Jos? Roberto de Souza wrote:
> This will remove devcoredump from file system and free its resources
> during driver unload.
>
> This fix the driver unload after gpu hang happened, otherwise this
> it would report that Xe KMD is still in use and it would leave the
> kernel in a state that Xe KMD can't be unload without a reboot.
>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jonathan Cavitt <[email protected]>
> Acked-by: Jonathan Cavitt <[email protected]>
> Signed-off-by: Jos? Roberto de Souza <[email protected]>

Reviewed-by: Rodrigo Vivi <[email protected]>

> ---
> drivers/gpu/drm/xe/xe_devcoredump.c | 13 ++++++++++++-
> drivers/gpu/drm/xe/xe_devcoredump.h | 6 ++++++
> drivers/gpu/drm/xe/xe_device.c | 4 ++++
> 3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> index 283ca7518aff2..3d7980232be1c 100644
> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> @@ -9,6 +9,8 @@
> #include <linux/devcoredump.h>
> #include <generated/utsrelease.h>
>
> +#include <drm/drm_managed.h>
> +
> #include "xe_device.h"
> #include "xe_exec_queue.h"
> #include "xe_force_wake.h"
> @@ -235,5 +237,14 @@ void xe_devcoredump(struct xe_sched_job *job)
> dev_coredumpm(xe->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL,
> xe_devcoredump_read, xe_devcoredump_free);
> }
> -#endif
>
> +static void xe_driver_devcoredump_fini(struct drm_device *drm, void *arg)
> +{
> + dev_coredump_put(drm->dev);
> +}
> +
> +int xe_devcoredump_init(struct xe_device *xe)
> +{
> + return drmm_add_action_or_reset(&xe->drm, xe_driver_devcoredump_fini, xe);
> +}
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h
> index df8671f0b5eb2..e2fa65ce09322 100644
> --- a/drivers/gpu/drm/xe/xe_devcoredump.h
> +++ b/drivers/gpu/drm/xe/xe_devcoredump.h
> @@ -11,10 +11,16 @@ struct xe_sched_job;
>
> #ifdef CONFIG_DEV_COREDUMP
> void xe_devcoredump(struct xe_sched_job *job);
> +int xe_devcoredump_init(struct xe_device *xe);
> #else
> static inline void xe_devcoredump(struct xe_sched_job *job)
> {
> }
> +
> +static inline int xe_devcoredump_init(struct xe_device *xe)
> +{
> + return 0;
> +}
> #endif
>
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 9083f5e02dd9e..ce27d0d1bdb34 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -20,6 +20,7 @@
> #include "regs/xe_regs.h"
> #include "xe_bo.h"
> #include "xe_debugfs.h"
> +#include "xe_devcoredump.h"
> #include "xe_dma_buf.h"
> #include "xe_drm_client.h"
> #include "xe_drv.h"
> @@ -513,6 +514,9 @@ int xe_device_probe(struct xe_device *xe)
> return err;
> }
>
> + err = xe_devcoredump_init(xe);
> + if (err)
> + return err;
> err = drmm_add_action_or_reset(&xe->drm, xe_driver_flr_fini, xe);
> if (err)
> return err;
> --
> 2.44.0
>

2024-04-09 22:23:56

by Rodrigo Vivi

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] devcoredump: Add dev_coredump_put()

On Tue, Apr 09, 2024 at 01:02:05PM -0700, Jos? Roberto de Souza wrote:
> It is useful for modules that do not want to keep coredump available
> after its unload.
> Otherwise, the coredump would only be removed after DEVCD_TIMEOUT
> seconds.
>
> v2:
> - dev_coredump_put() documentation updated (Mukesh)
>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Mukesh Ojha <[email protected]>
> Cc: Johannes Berg <[email protected]>
> Cc: Jonathan Cavitt <[email protected]>
> Reviewed-by: Johannes Berg <[email protected]>
> Acked-by: Jonathan Cavitt <[email protected]>
> Signed-off-by: Jos? Roberto de Souza <[email protected]>

Greg, can I have you ack to merge this through the drm-next?

> ---
> drivers/base/devcoredump.c | 23 +++++++++++++++++++++++
> include/linux/devcoredump.h | 5 +++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/drivers/base/devcoredump.c b/drivers/base/devcoredump.c
> index 7e2d1f0d903a6..82aeb09b3d1b5 100644
> --- a/drivers/base/devcoredump.c
> +++ b/drivers/base/devcoredump.c
> @@ -304,6 +304,29 @@ static ssize_t devcd_read_from_sgtable(char *buffer, loff_t offset,
> offset);
> }
>
> +/**
> + * dev_coredump_put - remove device coredump
> + * @dev: the struct device for the crashed device
> + *
> + * dev_coredump_put() removes coredump, if exists, for a given device from
> + * the file system and free its associated data otherwise, does nothing.
> + *
> + * It is useful for modules that do not want to keep coredump
> + * available after its unload.
> + */
> +void dev_coredump_put(struct device *dev)
> +{
> + struct device *existing;
> +
> + existing = class_find_device(&devcd_class, NULL, dev,
> + devcd_match_failing);
> + if (existing) {
> + devcd_free(existing, NULL);
> + put_device(existing);
> + }
> +}
> +EXPORT_SYMBOL_GPL(dev_coredump_put);
> +
> /**
> * dev_coredumpm - create device coredump with read/free methods
> * @dev: the struct device for the crashed device
> diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h
> index c008169ed2c6f..c8f7eb6cc1915 100644
> --- a/include/linux/devcoredump.h
> +++ b/include/linux/devcoredump.h
> @@ -63,6 +63,8 @@ void dev_coredumpm(struct device *dev, struct module *owner,
>
> void dev_coredumpsg(struct device *dev, struct scatterlist *table,
> size_t datalen, gfp_t gfp);
> +
> +void dev_coredump_put(struct device *dev);
> #else
> static inline void dev_coredumpv(struct device *dev, void *data,
> size_t datalen, gfp_t gfp)
> @@ -85,6 +87,9 @@ static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table,
> {
> _devcd_free_sgtable(table);
> }
> +static inline void dev_coredump_put(struct device *dev)
> +{
> +}
> #endif /* CONFIG_DEV_COREDUMP */
>
> #endif /* __DEVCOREDUMP_H */
> --
> 2.44.0
>

2024-04-11 13:08:40

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] devcoredump: Add dev_coredump_put()

On Tue, Apr 09, 2024 at 06:23:33PM -0400, Rodrigo Vivi wrote:
> On Tue, Apr 09, 2024 at 01:02:05PM -0700, Jos? Roberto de Souza wrote:
> > It is useful for modules that do not want to keep coredump available
> > after its unload.
> > Otherwise, the coredump would only be removed after DEVCD_TIMEOUT
> > seconds.
> >
> > v2:
> > - dev_coredump_put() documentation updated (Mukesh)
> >
> > Cc: Rodrigo Vivi <[email protected]>
> > Cc: Mukesh Ojha <[email protected]>
> > Cc: Johannes Berg <[email protected]>
> > Cc: Jonathan Cavitt <[email protected]>
> > Reviewed-by: Johannes Berg <[email protected]>
> > Acked-by: Jonathan Cavitt <[email protected]>
> > Signed-off-by: Jos? Roberto de Souza <[email protected]>
>
> Greg, can I have you ack to merge this through the drm-next?

Acked-by: Greg Kroah-Hartman <[email protected]>