2022-11-07 14:22:36

by John Ogness

[permalink] [raw]
Subject: [PATCH printk v3 15/40] kdb: use srcu console list iterator

Guarantee safe iteration of the console list by using SRCU.

Signed-off-by: John Ogness <[email protected]>
---
kernel/debug/kdb/kdb_io.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index 550fe8b456ec..ed8289ce4fcb 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -545,6 +545,7 @@ static void kdb_msg_write(const char *msg, int msg_len)
{
struct console *c;
const char *cp;
+ int cookie;
int len;

if (msg_len == 0)
@@ -558,7 +559,15 @@ static void kdb_msg_write(const char *msg, int msg_len)
cp++;
}

- for_each_console(c) {
+ /*
+ * The console_srcu_read_lock() only provides safe console list
+ * traversal. The use of the ->write() callback relies on all other
+ * CPUs being stopped at the moment and console drivers being able to
+ * handle reentrance when @oops_in_progress is set. (Note that there
+ * is no guarantee for either criteria.)
+ */
+ cookie = console_srcu_read_lock();
+ for_each_console_srcu(c) {
if (!console_is_enabled(c))
continue;
if (c == dbg_io_ops->cons)
@@ -577,6 +586,7 @@ static void kdb_msg_write(const char *msg, int msg_len)
--oops_in_progress;
touch_nmi_watchdog();
}
+ console_srcu_read_unlock(cookie);
}

int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
--
2.30.2



2022-11-09 09:55:58

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH printk v3 15/40] kdb: use srcu console list iterator

On Mon, Nov 07, 2022 at 03:22:13PM +0106, John Ogness wrote:
> Guarantee safe iteration of the console list by using SRCU.
>
> Signed-off-by: John Ogness <[email protected]>
> ---
> kernel/debug/kdb/kdb_io.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> index 550fe8b456ec..ed8289ce4fcb 100644
> --- a/kernel/debug/kdb/kdb_io.c
> +++ b/kernel/debug/kdb/kdb_io.c
> @@ -545,6 +545,7 @@ static void kdb_msg_write(const char *msg, int msg_len)
> {
> struct console *c;
> const char *cp;
> + int cookie;
> int len;
>
> if (msg_len == 0)
> @@ -558,7 +559,15 @@ static void kdb_msg_write(const char *msg, int msg_len)
> cp++;
> }
>
> - for_each_console(c) {
> + /*
> + * The console_srcu_read_lock() only provides safe console list
> + * traversal. The use of the ->write() callback relies on all other
> + * CPUs being stopped at the moment and console drivers being able to
> + * handle reentrance when @oops_in_progress is set. (Note that there
> + * is no guarantee for either criteria.)
> + */

The debugger entry protocol does ensure that other CPUs are either
stopped or unresponsive. In the case where the other CPU is unresponsive
(e.g. timed out after being asked to stop) then there is a "real" printk()
issued prior to any of the above interference with the console system to
the developer driving the debugger gets as much clue as we can offer them
about what is going on (typically this is emitted from regular interrupt
context).

Given this comment is part of the debugger code then for the
oops_in_progress hack it might be more helpful to describe what
the developer in front the debugger needs to do to have the most
reliable debug session possible.

There is no guarantee that every console drivers can handle reentrance
in this way; the developer deploying the debugger is responsible for
ensuring that the console drivers they have selected handle reentrance
appropriately.


Daniel.


> + cookie = console_srcu_read_lock();
> + for_each_console_srcu(c) {
> if (!console_is_enabled(c))
> continue;
> if (c == dbg_io_ops->cons)
> @@ -577,6 +586,7 @@ static void kdb_msg_write(const char *msg, int msg_len)
> --oops_in_progress;
> touch_nmi_watchdog();
> }
> + console_srcu_read_unlock(cookie);
> }
>
> int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> --
> 2.30.2
>

2022-11-09 11:12:44

by John Ogness

[permalink] [raw]
Subject: Re: [PATCH printk v3 15/40] kdb: use srcu console list iterator

Hi Daniel,

On 2022-11-09, Daniel Thompson <[email protected]> wrote:
>> + /*
>> + * The console_srcu_read_lock() only provides safe console list
>> + * traversal. The use of the ->write() callback relies on all other
>> + * CPUs being stopped at the moment and console drivers being able to
>> + * handle reentrance when @oops_in_progress is set. (Note that there
>> + * is no guarantee for either criteria.)
>> + */
>
> The debugger entry protocol does ensure that other CPUs are either
> stopped or unresponsive. In the case where the other CPU is unresponsive
> (e.g. timed out after being asked to stop) then there is a "real" printk()
> issued prior to any of the above interference with the console system to
> the developer driving the debugger gets as much clue as we can offer them
> about what is going on (typically this is emitted from regular interrupt
> context).
>
> Given this comment is part of the debugger code then for the
> oops_in_progress hack it might be more helpful to describe what
> the developer in front the debugger needs to do to have the most
> reliable debug session possible.
>
> There is no guarantee that every console drivers can handle reentrance
> in this way; the developer deploying the debugger is responsible for
> ensuring that the console drivers they have selected handle reentrance
> appropriately.

Thanks for the explanation. I will change the comment to:

/*
* The console_srcu_read_lock() only provides safe console list
* traversal. The use of the ->write() callback relies on all other
* CPUs being stopped at the moment and console drivers being able to
* handle reentrance when @oops_in_progress is set.
*
* There is no guarantee that every console driver can handle
* reentrance in this way; the developer deploying the debugger
* is responsible for ensuring that the console drivers they
* have selected handle reentrance appropriately.
*/

John

2022-11-09 16:26:31

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH printk v3 15/40] kdb: use srcu console list iterator

On Wed 2022-11-09 10:33:55, John Ogness wrote:
> Hi Daniel,
>
> On 2022-11-09, Daniel Thompson <[email protected]> wrote:
> >> + /*
> >> + * The console_srcu_read_lock() only provides safe console list
> >> + * traversal. The use of the ->write() callback relies on all other
> >> + * CPUs being stopped at the moment and console drivers being able to
> >> + * handle reentrance when @oops_in_progress is set. (Note that there
> >> + * is no guarantee for either criteria.)
> >> + */
> >
> > The debugger entry protocol does ensure that other CPUs are either
> > stopped or unresponsive. In the case where the other CPU is unresponsive
> > (e.g. timed out after being asked to stop) then there is a "real" printk()
> > issued prior to any of the above interference with the console system to
> > the developer driving the debugger gets as much clue as we can offer them
> > about what is going on (typically this is emitted from regular interrupt
> > context).
> >
> > Given this comment is part of the debugger code then for the
> > oops_in_progress hack it might be more helpful to describe what
> > the developer in front the debugger needs to do to have the most
> > reliable debug session possible.
> >
> > There is no guarantee that every console drivers can handle reentrance
> > in this way; the developer deploying the debugger is responsible for
> > ensuring that the console drivers they have selected handle reentrance
> > appropriately.
>
> Thanks for the explanation. I will change the comment to:
>
> /*
> * The console_srcu_read_lock() only provides safe console list
> * traversal. The use of the ->write() callback relies on all other
> * CPUs being stopped at the moment and console drivers being able to
> * handle reentrance when @oops_in_progress is set.
> *
> * There is no guarantee that every console driver can handle
> * reentrance in this way; the developer deploying the debugger
> * is responsible for ensuring that the console drivers they
> * have selected handle reentrance appropriately.
> */

Looks good to me.

After merging this with the 10th patch that adds the data_race() annotated
check of CON_ENABLED flag:

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr