From: Jocelyn Falempe <[email protected]>
Rough sketch for the locking of drm panic printing code. The upshot of
this approach is that we can pretty much entirely rely on the atomic
commit flow, with the pair of raw_spin_lock/unlock providing any
barriers we need, without having to create really big critical
sections in code.
This also avoids the need that drivers must explicitly update the
panic handler state, which they might forget to do, or not do
consistently, and then we blow up in the worst possible times.
It is somewhat racy against a concurrent atomic update, and we might
write into a buffer which the hardware will never display. But there's
fundamentally no way to avoid that - if we do the panic state update
explicitly after writing to the hardware, we might instead write to an
old buffer that the user will barely ever see.
Note that an rcu protected deference of plane->state would give us the
the same guarantees, but it has the downside that we then need to
protect the plane state freeing functions with call_rcu too. Which
would very widely impact a lot of code and therefore doesn't seem
worth the it compared to a raw spinlock with very tiny critical
sections. Plus rcu cannot be used to protect access to peek/poke registers
anyway, so we'd still need it for those cases.
Peek/poke registers for vram access (or a gart pte reserved just for
panic code) are also the reason I've gone with a per-device and not
per-plane spinlock, since usually these things are global for the
entire display. Going with per-plane locks would mean drivers for such
hardware would need additional locks, which we don't want, since it
deviates from the per-console takeoverlocks design.
Longer term it might be useful if the panic notifiers grow a bit more
structure than just the absolute bare
EXPORT_SYMBOL(panic_notifier_list) - somewhat aside, why is that not
EXPORT_SYMBOL_GPL ... If panic notifiers would be more like console
drivers with proper register/unregister interfaces we could perhaps
reuse the very fancy console lock with all it's check and takeover
semantics that John Ogness is developing to fix the console_lock mess.
But for the initial cut of a drm panic printing support I don't think
we need that, because the critical sections are extremely small and
only happen once per display refresh. So generally just 60 tiny locked
sections per second, which is nothing compared to a serial console
running a 115kbaud doing really slow mmio writes for each byte. So for
now the raw spintrylock in drm panic notifier callback should be good
enough.
Another benefit of making panic notifiers more like full blown
consoles (that are used in panics only) would be that we get the two
stage design, where first all the safe outputs are used. And then the
dangerous takeover tricks are deployed (where for display drivers we
also might try to intercept any in-flight display buffer flips, which
if we race and misprogram fifos and watermarks can hang the memory
controller on some hw).
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jocelyn Falempe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Lukas Wunner <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
---
drivers/gpu/drm/drm_atomic_helper.c | 3 +
include/drm/drm_mode_config.h | 10 +++
include/drm/drm_panic.h | 99 +++++++++++++++++++++++++++++
3 files changed, 112 insertions(+)
create mode 100644 include/drm/drm_panic.h
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 40c2bd3e62e8..5a908c186037 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -38,6 +38,7 @@
#include <drm/drm_drv.h>
#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_atomic_helper.h>
+#include <drm/drm_panic.h>
#include <drm/drm_print.h>
#include <drm/drm_self_refresh_helper.h>
#include <drm/drm_vblank.h>
@@ -3086,6 +3087,7 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
}
}
+ drm_panic_lock(state->dev);
for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
WARN_ON(plane->state != old_plane_state);
@@ -3095,6 +3097,7 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
state->planes[i].state = old_plane_state;
plane->state = new_plane_state;
}
+ drm_panic_unlock(state->dev);
for_each_oldnew_private_obj_in_state(state, obj, old_obj_state, new_obj_state, i) {
WARN_ON(obj->state != old_obj_state);
diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
index 973119a9176b..92a390379e85 100644
--- a/include/drm/drm_mode_config.h
+++ b/include/drm/drm_mode_config.h
@@ -505,6 +505,16 @@ struct drm_mode_config {
*/
struct list_head plane_list;
+ /**
+ * @panic_lock:
+ *
+ * Raw spinlock used to protect critical sections of code that access
+ * the display hardware or modeset software, which the panic printing
+ * code must be protected against. See drm_panic_trylock(),
+ * drm_panic_lock() and drm_panic_unlock().
+ */
+ struct raw_spinlock panic_lock;
+
/**
* @num_crtc:
*
diff --git a/include/drm/drm_panic.h b/include/drm/drm_panic.h
new file mode 100644
index 000000000000..caa12f60beae
--- /dev/null
+++ b/include/drm/drm_panic.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 or MIT */
+#ifndef __DRM_PANIC_H__
+#define __DRM_PANIC_H__
+
+#include <drm/drm_device.h>
+/*
+ * Copyright (c) 2024 Intel
+ */
+
+/**
+ * drm_panic_trylock - try to enter the panic printing critical section
+ * @dev: struct drm_device
+ *
+ * This function must be called by any panic printing code. The panic printing
+ * attempt must be aborted if the trylock fails.
+ *
+ * Panic printing code can make the following assumptions while holding the
+ * panic lock:
+ *
+ * - Anything protected by drm_panic_lock() and drm_panic_unlock() pairs is safe
+ * to access.
+ *
+ * - Furthermore the panic printing code only registers in drm_dev_unregister()
+ * and gets removed in drm_dev_unregister(). This allows the panic code to
+ * safely access any state which is invariant in between these two function
+ * calls, like the list of planes drm_mode_config.plane_list or most of the
+ * struct drm_plane structure.
+ *
+ * Specifically thanks to the protection around plane updates in
+ * drm_atomic_helper_swap_state() the following additional guarantees hold:
+ *
+ * - It is safe to deference the drm_plane.state pointer.
+ *
+ * - Anything in struct drm_plane_state or the driver's subclass thereof which
+ * stays invariant after the atomic check code has finished is safe to access.
+ * Specifically this includes the reference counted pointers to framebuffer
+ * and buffer objects.
+ *
+ * - Anything set up by drm_plane_helper_funcs.fb_prepare and cleaned up
+ * drm_plane_helper_funcs.fb_cleanup is safe to access, as long as it stays
+ * invariant between these two calls. This also means that for drivers using
+ * dynamic buffer management the framebuffer is pinned, and therefer all
+ * relevant datastructures can be accessed without taking any further locks
+ * (which would be impossible in panic context anyway).
+ *
+ * - Importantly, software and hardware state set up by
+ * drm_plane_helper_funcs.begin_fb_access and
+ * drm_plane_helper_funcs.end_fb_access is not safe to access.
+ *
+ * Drivers must not make any assumptions about the actual state of the hardware,
+ * unless they explicitly protected these hardware access with drm_panic_lock()
+ * and drm_panic_unlock().
+ *
+ * Returns:
+ *
+ * 0 when failing to acquire the raw spinlock, nonzero on success.:w
+ */
+static inline int drm_panic_trylock(struct drm_device *dev)
+{
+ return raw_spin_trylock(&dev->mode_config.panic_lock);
+}
+
+/**
+ * drm_panic_lock - protect panic printing relevant state
+ * @dev: struct drm_device
+ *
+ * This function must be called to protect software and hardware state that the
+ * panic printing code must be able to rely on. The protected sections must be
+ * as small as possible. Examples include:
+ *
+ * - Access to peek/poke or other similar registers, if that is the way the
+ * driver prints the pixels into the scanout buffer at panic time.
+ *
+ * - Updates to pointers like drm_plane.state, allowing the panic handler to
+ * safely deference these. This is done in drm_atomic_helper_swap_state().
+ *
+ * - An state that isn't invariant and that the driver must be able to access
+ * during panic printing.
+ *
+ * Call drm_panic_unlock() to unlock the locked spinlock.
+ */
+static inline void drm_panic_lock(struct drm_device *dev)
+{
+ return raw_spin_lock(&dev->mode_config.panic_lock);
+}
+
+/**
+ * drm_panic_unlock - end of the panic printing critical section
+ * @dev: struct drm_device
+ *
+ * Unlocks the raw spinlock acquired by either drm_panic_lock() or
+ * drm_panic_trylock().
+ */
+static inline void drm_panic_unlock(struct drm_device *dev)
+{
+ raw_spin_unlock(&dev->mode_config.panic_lock);
+}
+
+#endif /* __DRM_PANIC_H__ */
--
2.43.0
Rough sketch for the locking of drm panic printing code. The upshot of
this approach is that we can pretty much entirely rely on the atomic
commit flow, with the pair of raw_spin_lock/unlock providing any
barriers we need, without having to create really big critical
sections in code.
This also avoids the need that drivers must explicitly update the
panic handler state, which they might forget to do, or not do
consistently, and then we blow up in the worst possible times.
It is somewhat racy against a concurrent atomic update, and we might
write into a buffer which the hardware will never display. But there's
fundamentally no way to avoid that - if we do the panic state update
explicitly after writing to the hardware, we might instead write to an
old buffer that the user will barely ever see.
Note that an rcu protected deference of plane->state would give us the
the same guarantees, but it has the downside that we then need to
protect the plane state freeing functions with call_rcu too. Which
would very widely impact a lot of code and therefore doesn't seem
worth the complexity compared to a raw spinlock with very tiny
critical sections. Plus rcu cannot be used to protect access to
peek/poke registers anyway, so we'd still need it for those cases.
Peek/poke registers for vram access (or a gart pte reserved just for
panic code) are also the reason I've gone with a per-device and not
per-plane spinlock, since usually these things are global for the
entire display. Going with per-plane locks would mean drivers for such
hardware would need additional locks, which we don't want, since it
deviates from the per-console takeoverlocks design.
Longer term it might be useful if the panic notifiers grow a bit more
structure than just the absolute bare
EXPORT_SYMBOL(panic_notifier_list) - somewhat aside, why is that not
EXPORT_SYMBOL_GPL ... If panic notifiers would be more like console
drivers with proper register/unregister interfaces we could perhaps
reuse the very fancy console lock with all it's check and takeover
semantics that John Ogness is developing to fix the console_lock mess.
But for the initial cut of a drm panic printing support I don't think
we need that, because the critical sections are extremely small and
only happen once per display refresh. So generally just 60 tiny locked
sections per second, which is nothing compared to a serial console
running a 115kbaud doing really slow mmio writes for each byte. So for
now the raw spintrylock in drm panic notifier callback should be good
enough.
Another benefit of making panic notifiers more like full blown
consoles (that are used in panics only) would be that we get the two
stage design, where first all the safe outputs are used. And then the
dangerous takeover tricks are deployed (where for display drivers we
also might try to intercept any in-flight display buffer flips, which
if we race and misprogram fifos and watermarks can hang the memory
controller on some hw).
For context the actual implementation on the drm side is by Jocelyn
and this patch is meant to be combined with the overall approach in
v7 (v8 is a bit less flexible, which I think is the wrong direction):
https://lore.kernel.org/dri-devel/[email protected]/
Note that the locking is very much not correct there, hence this
separate rfc.
v2:
- fix authorship, this was all my typing
- some typo oopsies
- link to the drm panic work by Jocelyn for context
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jocelyn Falempe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Lukas Wunner <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
---
drivers/gpu/drm/drm_atomic_helper.c | 3 +
include/drm/drm_mode_config.h | 10 +++
include/drm/drm_panic.h | 99 +++++++++++++++++++++++++++++
3 files changed, 112 insertions(+)
create mode 100644 include/drm/drm_panic.h
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 40c2bd3e62e8..5a908c186037 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -38,6 +38,7 @@
#include <drm/drm_drv.h>
#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_atomic_helper.h>
+#include <drm/drm_panic.h>
#include <drm/drm_print.h>
#include <drm/drm_self_refresh_helper.h>
#include <drm/drm_vblank.h>
@@ -3086,6 +3087,7 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
}
}
+ drm_panic_lock(state->dev);
for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
WARN_ON(plane->state != old_plane_state);
@@ -3095,6 +3097,7 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
state->planes[i].state = old_plane_state;
plane->state = new_plane_state;
}
+ drm_panic_unlock(state->dev);
for_each_oldnew_private_obj_in_state(state, obj, old_obj_state, new_obj_state, i) {
WARN_ON(obj->state != old_obj_state);
diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
index 973119a9176b..e79f1a557a22 100644
--- a/include/drm/drm_mode_config.h
+++ b/include/drm/drm_mode_config.h
@@ -505,6 +505,16 @@ struct drm_mode_config {
*/
struct list_head plane_list;
+ /**
+ * @panic_lock:
+ *
+ * Raw spinlock used to protect critical sections of code that access
+ * the display hardware or modeset software state, which the panic
+ * printing code must be protected against. See drm_panic_trylock(),
+ * drm_panic_lock() and drm_panic_unlock().
+ */
+ struct raw_spinlock panic_lock;
+
/**
* @num_crtc:
*
diff --git a/include/drm/drm_panic.h b/include/drm/drm_panic.h
new file mode 100644
index 000000000000..f2135d03f1eb
--- /dev/null
+++ b/include/drm/drm_panic.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 or MIT */
+#ifndef __DRM_PANIC_H__
+#define __DRM_PANIC_H__
+
+#include <drm/drm_device.h>
+/*
+ * Copyright (c) 2024 Intel
+ */
+
+/**
+ * drm_panic_trylock - try to enter the panic printing critical section
+ * @dev: struct drm_device
+ *
+ * This function must be called by any panic printing code. The panic printing
+ * attempt must be aborted if the trylock fails.
+ *
+ * Panic printing code can make the following assumptions while holding the
+ * panic lock:
+ *
+ * - Anything protected by drm_panic_lock() and drm_panic_unlock() pairs is safe
+ * to access.
+ *
+ * - Furthermore the panic printing code only registers in drm_dev_unregister()
+ * and gets removed in drm_dev_unregister(). This allows the panic code to
+ * safely access any state which is invariant in between these two function
+ * calls, like the list of planes drm_mode_config.plane_list or most of the
+ * struct drm_plane structure.
+ *
+ * Specifically thanks to the protection around plane updates in
+ * drm_atomic_helper_swap_state() the following additional guarantees hold:
+ *
+ * - It is safe to deference the drm_plane.state pointer.
+ *
+ * - Anything in struct drm_plane_state or the driver's subclass thereof which
+ * stays invariant after the atomic check code has finished is safe to access.
+ * Specifically this includes the reference counted pointers to framebuffer
+ * and buffer objects.
+ *
+ * - Anything set up by drm_plane_helper_funcs.fb_prepare and cleaned up
+ * drm_plane_helper_funcs.fb_cleanup is safe to access, as long as it stays
+ * invariant between these two calls. This also means that for drivers using
+ * dynamic buffer management the framebuffer is pinned, and therefer all
+ * relevant datastructures can be accessed without taking any further locks
+ * (which would be impossible in panic context anyway).
+ *
+ * - Importantly, software and hardware state set up by
+ * drm_plane_helper_funcs.begin_fb_access and
+ * drm_plane_helper_funcs.end_fb_access is not safe to access.
+ *
+ * Drivers must not make any assumptions about the actual state of the hardware,
+ * unless they explicitly protected these hardware access with drm_panic_lock()
+ * and drm_panic_unlock().
+ *
+ * Returns:
+ *
+ * 0 when failing to acquire the raw spinlock, nonzero on success.
+ */
+static inline int drm_panic_trylock(struct drm_device *dev)
+{
+ return raw_spin_trylock(&dev->mode_config.panic_lock);
+}
+
+/**
+ * drm_panic_lock - protect panic printing relevant state
+ * @dev: struct drm_device
+ *
+ * This function must be called to protect software and hardware state that the
+ * panic printing code must be able to rely on. The protected sections must be
+ * as small as possible. Examples include:
+ *
+ * - Access to peek/poke or other similar registers, if that is the way the
+ * driver prints the pixels into the scanout buffer at panic time.
+ *
+ * - Updates to pointers like drm_plane.state, allowing the panic handler to
+ * safely deference these. This is done in drm_atomic_helper_swap_state().
+ *
+ * - An state that isn't invariant and that the driver must be able to access
+ * during panic printing.
+ *
+ * Call drm_panic_unlock() to unlock the locked spinlock.
+ */
+static inline void drm_panic_lock(struct drm_device *dev)
+{
+ return raw_spin_lock(&dev->mode_config.panic_lock);
+}
+
+/**
+ * drm_panic_unlock - end of the panic printing critical section
+ * @dev: struct drm_device
+ *
+ * Unlocks the raw spinlock acquired by either drm_panic_lock() or
+ * drm_panic_trylock().
+ */
+static inline void drm_panic_unlock(struct drm_device *dev)
+{
+ raw_spin_unlock(&dev->mode_config.panic_lock);
+}
+
+#endif /* __DRM_PANIC_H__ */
--
2.43.0
Thanks for the patch.
I think it misses to initialize the lock, so we need to add a
raw_spin_lock_init() in the drm device initialization.
Also I'm wondering if it make sense to put that under the
CONFIG_DRM_PANIC flag, so that if you don't enable it, panic_lock() and
panic_unlock() would be no-op.
But that may not work if the driver uses this lock to protect some
register access.
Best regards,
--
Jocelyn
On 01/03/2024 11:39, Daniel Vetter wrote:
> Rough sketch for the locking of drm panic printing code. The upshot of
> this approach is that we can pretty much entirely rely on the atomic
> commit flow, with the pair of raw_spin_lock/unlock providing any
> barriers we need, without having to create really big critical
> sections in code.
>
> This also avoids the need that drivers must explicitly update the
> panic handler state, which they might forget to do, or not do
> consistently, and then we blow up in the worst possible times.
>
> It is somewhat racy against a concurrent atomic update, and we might
> write into a buffer which the hardware will never display. But there's
> fundamentally no way to avoid that - if we do the panic state update
> explicitly after writing to the hardware, we might instead write to an
> old buffer that the user will barely ever see.
>
> Note that an rcu protected deference of plane->state would give us the
> the same guarantees, but it has the downside that we then need to
> protect the plane state freeing functions with call_rcu too. Which
> would very widely impact a lot of code and therefore doesn't seem
> worth the complexity compared to a raw spinlock with very tiny
> critical sections. Plus rcu cannot be used to protect access to
> peek/poke registers anyway, so we'd still need it for those cases.
>
> Peek/poke registers for vram access (or a gart pte reserved just for
> panic code) are also the reason I've gone with a per-device and not
> per-plane spinlock, since usually these things are global for the
> entire display. Going with per-plane locks would mean drivers for such
> hardware would need additional locks, which we don't want, since it
> deviates from the per-console takeoverlocks design.
>
> Longer term it might be useful if the panic notifiers grow a bit more
> structure than just the absolute bare
> EXPORT_SYMBOL(panic_notifier_list) - somewhat aside, why is that not
> EXPORT_SYMBOL_GPL ... If panic notifiers would be more like console
> drivers with proper register/unregister interfaces we could perhaps
> reuse the very fancy console lock with all it's check and takeover
> semantics that John Ogness is developing to fix the console_lock mess.
> But for the initial cut of a drm panic printing support I don't think
> we need that, because the critical sections are extremely small and
> only happen once per display refresh. So generally just 60 tiny locked
> sections per second, which is nothing compared to a serial console
> running a 115kbaud doing really slow mmio writes for each byte. So for
> now the raw spintrylock in drm panic notifier callback should be good
> enough.
>
> Another benefit of making panic notifiers more like full blown
> consoles (that are used in panics only) would be that we get the two
> stage design, where first all the safe outputs are used. And then the
> dangerous takeover tricks are deployed (where for display drivers we
> also might try to intercept any in-flight display buffer flips, which
> if we race and misprogram fifos and watermarks can hang the memory
> controller on some hw).
>
> For context the actual implementation on the drm side is by Jocelyn
> and this patch is meant to be combined with the overall approach in
> v7 (v8 is a bit less flexible, which I think is the wrong direction):
>
> https://lore.kernel.org/dri-devel/[email protected]/
>
> Note that the locking is very much not correct there, hence this
> separate rfc.
>
> v2:
> - fix authorship, this was all my typing
> - some typo oopsies
> - link to the drm panic work by Jocelyn for context
>
> Signed-off-by: Daniel Vetter <[email protected]>
> Cc: Jocelyn Falempe <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: "Peter Zijlstra (Intel)" <[email protected]>
> Cc: Lukas Wunner <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: John Ogness <[email protected]>
> Cc: Sergey Senozhatsky <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Maxime Ripard <[email protected]>
> Cc: Thomas Zimmermann <[email protected]>
> Cc: David Airlie <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> ---
> drivers/gpu/drm/drm_atomic_helper.c | 3 +
> include/drm/drm_mode_config.h | 10 +++
> include/drm/drm_panic.h | 99 +++++++++++++++++++++++++++++
> 3 files changed, 112 insertions(+)
> create mode 100644 include/drm/drm_panic.h
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 40c2bd3e62e8..5a908c186037 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -38,6 +38,7 @@
> #include <drm/drm_drv.h>
> #include <drm/drm_framebuffer.h>
> #include <drm/drm_gem_atomic_helper.h>
> +#include <drm/drm_panic.h>
> #include <drm/drm_print.h>
> #include <drm/drm_self_refresh_helper.h>
> #include <drm/drm_vblank.h>
> @@ -3086,6 +3087,7 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
> }
> }
>
> + drm_panic_lock(state->dev);
> for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> WARN_ON(plane->state != old_plane_state);
>
> @@ -3095,6 +3097,7 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
> state->planes[i].state = old_plane_state;
> plane->state = new_plane_state;
> }
> + drm_panic_unlock(state->dev);
>
> for_each_oldnew_private_obj_in_state(state, obj, old_obj_state, new_obj_state, i) {
> WARN_ON(obj->state != old_obj_state);
> diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
> index 973119a9176b..e79f1a557a22 100644
> --- a/include/drm/drm_mode_config.h
> +++ b/include/drm/drm_mode_config.h
> @@ -505,6 +505,16 @@ struct drm_mode_config {
> */
> struct list_head plane_list;
>
> + /**
> + * @panic_lock:
> + *
> + * Raw spinlock used to protect critical sections of code that access
> + * the display hardware or modeset software state, which the panic
> + * printing code must be protected against. See drm_panic_trylock(),
> + * drm_panic_lock() and drm_panic_unlock().
> + */
> + struct raw_spinlock panic_lock;
> +
> /**
> * @num_crtc:
> *
> diff --git a/include/drm/drm_panic.h b/include/drm/drm_panic.h
> new file mode 100644
> index 000000000000..f2135d03f1eb
> --- /dev/null
> +++ b/include/drm/drm_panic.h
> @@ -0,0 +1,99 @@
> +/* SPDX-License-Identifier: GPL-2.0 or MIT */
> +#ifndef __DRM_PANIC_H__
> +#define __DRM_PANIC_H__
> +
> +#include <drm/drm_device.h>
> +/*
> + * Copyright (c) 2024 Intel
> + */
> +
> +/**
> + * drm_panic_trylock - try to enter the panic printing critical section
> + * @dev: struct drm_device
> + *
> + * This function must be called by any panic printing code. The panic printing
> + * attempt must be aborted if the trylock fails.
> + *
> + * Panic printing code can make the following assumptions while holding the
> + * panic lock:
> + *
> + * - Anything protected by drm_panic_lock() and drm_panic_unlock() pairs is safe
> + * to access.
> + *
> + * - Furthermore the panic printing code only registers in drm_dev_unregister()
> + * and gets removed in drm_dev_unregister(). This allows the panic code to
> + * safely access any state which is invariant in between these two function
> + * calls, like the list of planes drm_mode_config.plane_list or most of the
> + * struct drm_plane structure.
> + *
> + * Specifically thanks to the protection around plane updates in
> + * drm_atomic_helper_swap_state() the following additional guarantees hold:
> + *
> + * - It is safe to deference the drm_plane.state pointer.
> + *
> + * - Anything in struct drm_plane_state or the driver's subclass thereof which
> + * stays invariant after the atomic check code has finished is safe to access.
> + * Specifically this includes the reference counted pointers to framebuffer
> + * and buffer objects.
> + *
> + * - Anything set up by drm_plane_helper_funcs.fb_prepare and cleaned up
> + * drm_plane_helper_funcs.fb_cleanup is safe to access, as long as it stays
> + * invariant between these two calls. This also means that for drivers using
> + * dynamic buffer management the framebuffer is pinned, and therefer all
> + * relevant datastructures can be accessed without taking any further locks
> + * (which would be impossible in panic context anyway).
> + *
> + * - Importantly, software and hardware state set up by
> + * drm_plane_helper_funcs.begin_fb_access and
> + * drm_plane_helper_funcs.end_fb_access is not safe to access.
> + *
> + * Drivers must not make any assumptions about the actual state of the hardware,
> + * unless they explicitly protected these hardware access with drm_panic_lock()
> + * and drm_panic_unlock().
> + *
> + * Returns:
> + *
> + * 0 when failing to acquire the raw spinlock, nonzero on success.
> + */
> +static inline int drm_panic_trylock(struct drm_device *dev)
> +{
> + return raw_spin_trylock(&dev->mode_config.panic_lock);
> +}
> +
> +/**
> + * drm_panic_lock - protect panic printing relevant state
> + * @dev: struct drm_device
> + *
> + * This function must be called to protect software and hardware state that the
> + * panic printing code must be able to rely on. The protected sections must be
> + * as small as possible. Examples include:
> + *
> + * - Access to peek/poke or other similar registers, if that is the way the
> + * driver prints the pixels into the scanout buffer at panic time.
> + *
> + * - Updates to pointers like drm_plane.state, allowing the panic handler to
> + * safely deference these. This is done in drm_atomic_helper_swap_state().
> + *
> + * - An state that isn't invariant and that the driver must be able to access
> + * during panic printing.
> + *
> + * Call drm_panic_unlock() to unlock the locked spinlock.
> + */
> +static inline void drm_panic_lock(struct drm_device *dev)
> +{
> + return raw_spin_lock(&dev->mode_config.panic_lock);
> +}
> +
> +/**
> + * drm_panic_unlock - end of the panic printing critical section
> + * @dev: struct drm_device
> + *
> + * Unlocks the raw spinlock acquired by either drm_panic_lock() or
> + * drm_panic_trylock().
> + */
> +static inline void drm_panic_unlock(struct drm_device *dev)
> +{
> + raw_spin_unlock(&dev->mode_config.panic_lock);
> +}
> +
> +#endif /* __DRM_PANIC_H__ */
Hi Daniel,
Great to see this moving forward!
On 2024-03-01, Daniel Vetter <[email protected]> wrote:
> But for the initial cut of a drm panic printing support I don't think
> we need that, because the critical sections are extremely small and
> only happen once per display refresh. So generally just 60 tiny locked
> sections per second, which is nothing compared to a serial console
> running a 115kbaud doing really slow mmio writes for each byte. So for
> now the raw spintrylock in drm panic notifier callback should be good
> enough.
Is there a reason you do not use the irqsave/irqrestore variants? By
leaving interrupts enabled, there is the risk that a panic from any
interrupt handler may block the drm panic handler.
John Ogness
On Tue, Mar 05, 2024 at 09:20:04AM +0106, John Ogness wrote:
> Hi Daniel,
>
> Great to see this moving forward!
>
> On 2024-03-01, Daniel Vetter <[email protected]> wrote:
> > But for the initial cut of a drm panic printing support I don't think
> > we need that, because the critical sections are extremely small and
> > only happen once per display refresh. So generally just 60 tiny locked
> > sections per second, which is nothing compared to a serial console
> > running a 115kbaud doing really slow mmio writes for each byte. So for
> > now the raw spintrylock in drm panic notifier callback should be good
> > enough.
>
> Is there a reason you do not use the irqsave/irqrestore variants? By
> leaving interrupts enabled, there is the risk that a panic from any
> interrupt handler may block the drm panic handler.
tbh I simply did not consider that could be useful. but yeah if we're
unlucky and an interrupt happens in here and dies, the drm panic handler
cannot run. And this code is definitely not hot enough to matter, the
usual driver code for a plane flip does a few more irqsafe spinlocks on
top. One more doesn't add anything I think, and I guess if it does we'll
notice :-)
Also irqsave makes drm_panic_lock/unlock a bit more widely useful to
protect driver mmio access since then it also works from irq handlers.
Means we have to pass irqflags around, but that sounds acceptable. So very
much has my vote.
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Fri, Mar 01, 2024 at 02:03:12PM +0100, Jocelyn Falempe wrote:
> Thanks for the patch.
>
> I think it misses to initialize the lock, so we need to add a
> raw_spin_lock_init() in the drm device initialization.
>
> Also I'm wondering if it make sense to put that under the CONFIG_DRM_PANIC
> flag, so that if you don't enable it, panic_lock() and panic_unlock() would
> be no-op.
> But that may not work if the driver uses this lock to protect some register
> access.
If we get drivers to use this for some of their own locking we have to
keep it enabled unconditionally. Also I think locking that's only
conditional on Kconfig is just a bit too suprising to be a good idea
irrespective of this specific case.
-Sima
>
> Best regards,
>
> --
>
> Jocelyn
>
> On 01/03/2024 11:39, Daniel Vetter wrote:
> > Rough sketch for the locking of drm panic printing code. The upshot of
> > this approach is that we can pretty much entirely rely on the atomic
> > commit flow, with the pair of raw_spin_lock/unlock providing any
> > barriers we need, without having to create really big critical
> > sections in code.
> >
> > This also avoids the need that drivers must explicitly update the
> > panic handler state, which they might forget to do, or not do
> > consistently, and then we blow up in the worst possible times.
> >
> > It is somewhat racy against a concurrent atomic update, and we might
> > write into a buffer which the hardware will never display. But there's
> > fundamentally no way to avoid that - if we do the panic state update
> > explicitly after writing to the hardware, we might instead write to an
> > old buffer that the user will barely ever see.
> >
> > Note that an rcu protected deference of plane->state would give us the
> > the same guarantees, but it has the downside that we then need to
> > protect the plane state freeing functions with call_rcu too. Which
> > would very widely impact a lot of code and therefore doesn't seem
> > worth the complexity compared to a raw spinlock with very tiny
> > critical sections. Plus rcu cannot be used to protect access to
> > peek/poke registers anyway, so we'd still need it for those cases.
> >
> > Peek/poke registers for vram access (or a gart pte reserved just for
> > panic code) are also the reason I've gone with a per-device and not
> > per-plane spinlock, since usually these things are global for the
> > entire display. Going with per-plane locks would mean drivers for such
> > hardware would need additional locks, which we don't want, since it
> > deviates from the per-console takeoverlocks design.
> >
> > Longer term it might be useful if the panic notifiers grow a bit more
> > structure than just the absolute bare
> > EXPORT_SYMBOL(panic_notifier_list) - somewhat aside, why is that not
> > EXPORT_SYMBOL_GPL ... If panic notifiers would be more like console
> > drivers with proper register/unregister interfaces we could perhaps
> > reuse the very fancy console lock with all it's check and takeover
> > semantics that John Ogness is developing to fix the console_lock mess.
> > But for the initial cut of a drm panic printing support I don't think
> > we need that, because the critical sections are extremely small and
> > only happen once per display refresh. So generally just 60 tiny locked
> > sections per second, which is nothing compared to a serial console
> > running a 115kbaud doing really slow mmio writes for each byte. So for
> > now the raw spintrylock in drm panic notifier callback should be good
> > enough.
> >
> > Another benefit of making panic notifiers more like full blown
> > consoles (that are used in panics only) would be that we get the two
> > stage design, where first all the safe outputs are used. And then the
> > dangerous takeover tricks are deployed (where for display drivers we
> > also might try to intercept any in-flight display buffer flips, which
> > if we race and misprogram fifos and watermarks can hang the memory
> > controller on some hw).
> >
> > For context the actual implementation on the drm side is by Jocelyn
> > and this patch is meant to be combined with the overall approach in
> > v7 (v8 is a bit less flexible, which I think is the wrong direction):
> >
> > https://lore.kernel.org/dri-devel/[email protected]/
> >
> > Note that the locking is very much not correct there, hence this
> > separate rfc.
> >
> > v2:
> > - fix authorship, this was all my typing
> > - some typo oopsies
> > - link to the drm panic work by Jocelyn for context
> >
> > Signed-off-by: Daniel Vetter <[email protected]>
> > Cc: Jocelyn Falempe <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: "Peter Zijlstra (Intel)" <[email protected]>
> > Cc: Lukas Wunner <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: John Ogness <[email protected]>
> > Cc: Sergey Senozhatsky <[email protected]>
> > Cc: Maarten Lankhorst <[email protected]>
> > Cc: Maxime Ripard <[email protected]>
> > Cc: Thomas Zimmermann <[email protected]>
> > Cc: David Airlie <[email protected]>
> > Cc: Daniel Vetter <[email protected]>
> > ---
> > drivers/gpu/drm/drm_atomic_helper.c | 3 +
> > include/drm/drm_mode_config.h | 10 +++
> > include/drm/drm_panic.h | 99 +++++++++++++++++++++++++++++
> > 3 files changed, 112 insertions(+)
> > create mode 100644 include/drm/drm_panic.h
> >
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index 40c2bd3e62e8..5a908c186037 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -38,6 +38,7 @@
> > #include <drm/drm_drv.h>
> > #include <drm/drm_framebuffer.h>
> > #include <drm/drm_gem_atomic_helper.h>
> > +#include <drm/drm_panic.h>
> > #include <drm/drm_print.h>
> > #include <drm/drm_self_refresh_helper.h>
> > #include <drm/drm_vblank.h>
> > @@ -3086,6 +3087,7 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
> > }
> > }
> > + drm_panic_lock(state->dev);
> > for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> > WARN_ON(plane->state != old_plane_state);
> > @@ -3095,6 +3097,7 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
> > state->planes[i].state = old_plane_state;
> > plane->state = new_plane_state;
> > }
> > + drm_panic_unlock(state->dev);
> > for_each_oldnew_private_obj_in_state(state, obj, old_obj_state, new_obj_state, i) {
> > WARN_ON(obj->state != old_obj_state);
> > diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
> > index 973119a9176b..e79f1a557a22 100644
> > --- a/include/drm/drm_mode_config.h
> > +++ b/include/drm/drm_mode_config.h
> > @@ -505,6 +505,16 @@ struct drm_mode_config {
> > */
> > struct list_head plane_list;
> > + /**
> > + * @panic_lock:
> > + *
> > + * Raw spinlock used to protect critical sections of code that access
> > + * the display hardware or modeset software state, which the panic
> > + * printing code must be protected against. See drm_panic_trylock(),
> > + * drm_panic_lock() and drm_panic_unlock().
> > + */
> > + struct raw_spinlock panic_lock;
> > +
> > /**
> > * @num_crtc:
> > *
> > diff --git a/include/drm/drm_panic.h b/include/drm/drm_panic.h
> > new file mode 100644
> > index 000000000000..f2135d03f1eb
> > --- /dev/null
> > +++ b/include/drm/drm_panic.h
> > @@ -0,0 +1,99 @@
> > +/* SPDX-License-Identifier: GPL-2.0 or MIT */
> > +#ifndef __DRM_PANIC_H__
> > +#define __DRM_PANIC_H__
> > +
> > +#include <drm/drm_device.h>
> > +/*
> > + * Copyright (c) 2024 Intel
> > + */
> > +
> > +/**
> > + * drm_panic_trylock - try to enter the panic printing critical section
> > + * @dev: struct drm_device
> > + *
> > + * This function must be called by any panic printing code. The panic printing
> > + * attempt must be aborted if the trylock fails.
> > + *
> > + * Panic printing code can make the following assumptions while holding the
> > + * panic lock:
> > + *
> > + * - Anything protected by drm_panic_lock() and drm_panic_unlock() pairs is safe
> > + * to access.
> > + *
> > + * - Furthermore the panic printing code only registers in drm_dev_unregister()
> > + * and gets removed in drm_dev_unregister(). This allows the panic code to
> > + * safely access any state which is invariant in between these two function
> > + * calls, like the list of planes drm_mode_config.plane_list or most of the
> > + * struct drm_plane structure.
> > + *
> > + * Specifically thanks to the protection around plane updates in
> > + * drm_atomic_helper_swap_state() the following additional guarantees hold:
> > + *
> > + * - It is safe to deference the drm_plane.state pointer.
> > + *
> > + * - Anything in struct drm_plane_state or the driver's subclass thereof which
> > + * stays invariant after the atomic check code has finished is safe to access.
> > + * Specifically this includes the reference counted pointers to framebuffer
> > + * and buffer objects.
> > + *
> > + * - Anything set up by drm_plane_helper_funcs.fb_prepare and cleaned up
> > + * drm_plane_helper_funcs.fb_cleanup is safe to access, as long as it stays
> > + * invariant between these two calls. This also means that for drivers using
> > + * dynamic buffer management the framebuffer is pinned, and therefer all
> > + * relevant datastructures can be accessed without taking any further locks
> > + * (which would be impossible in panic context anyway).
> > + *
> > + * - Importantly, software and hardware state set up by
> > + * drm_plane_helper_funcs.begin_fb_access and
> > + * drm_plane_helper_funcs.end_fb_access is not safe to access.
> > + *
> > + * Drivers must not make any assumptions about the actual state of the hardware,
> > + * unless they explicitly protected these hardware access with drm_panic_lock()
> > + * and drm_panic_unlock().
> > + *
> > + * Returns:
> > + *
> > + * 0 when failing to acquire the raw spinlock, nonzero on success.
> > + */
> > +static inline int drm_panic_trylock(struct drm_device *dev)
> > +{
> > + return raw_spin_trylock(&dev->mode_config.panic_lock);
> > +}
> > +
> > +/**
> > + * drm_panic_lock - protect panic printing relevant state
> > + * @dev: struct drm_device
> > + *
> > + * This function must be called to protect software and hardware state that the
> > + * panic printing code must be able to rely on. The protected sections must be
> > + * as small as possible. Examples include:
> > + *
> > + * - Access to peek/poke or other similar registers, if that is the way the
> > + * driver prints the pixels into the scanout buffer at panic time.
> > + *
> > + * - Updates to pointers like drm_plane.state, allowing the panic handler to
> > + * safely deference these. This is done in drm_atomic_helper_swap_state().
> > + *
> > + * - An state that isn't invariant and that the driver must be able to access
> > + * during panic printing.
> > + *
> > + * Call drm_panic_unlock() to unlock the locked spinlock.
> > + */
> > +static inline void drm_panic_lock(struct drm_device *dev)
> > +{
> > + return raw_spin_lock(&dev->mode_config.panic_lock);
> > +}
> > +
> > +/**
> > + * drm_panic_unlock - end of the panic printing critical section
> > + * @dev: struct drm_device
> > + *
> > + * Unlocks the raw spinlock acquired by either drm_panic_lock() or
> > + * drm_panic_trylock().
> > + */
> > +static inline void drm_panic_unlock(struct drm_device *dev)
> > +{
> > + raw_spin_unlock(&dev->mode_config.panic_lock);
> > +}
> > +
> > +#endif /* __DRM_PANIC_H__ */
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch