2020-03-10 06:40:18

Subject: [PATCH v4 0/4] remoteproc: Panic handling

Add support for invoking a panic handler in remoteproc drivers, to allow them
to invoke e.g. cache flushing on the remote processors in response to a kernel
panic - to aid in post mortem debugging of system issues.

Bjorn Andersson (4):
remoteproc: Traverse rproc_list under RCU read lock
remoteproc: Introduce "panic" callback in ops
remoteproc: qcom: q6v5: Add common panic handler
remoteproc: qcom: Introduce panic handler for PAS and ADSP

drivers/remoteproc/qcom_q6v5.c | 20 ++++++++++
drivers/remoteproc/qcom_q6v5.h | 1 +
drivers/remoteproc/qcom_q6v5_adsp.c | 8 ++++
drivers/remoteproc/qcom_q6v5_pas.c | 8 ++++
drivers/remoteproc/remoteproc_core.c | 57 +++++++++++++++++++++++++---
include/linux/remoteproc.h | 3 ++
6 files changed, 92 insertions(+), 5 deletions(-)

--
2.24.0

2020-03-10 06:40:29

by Bjorn Andersson

Subject: [PATCH v4 3/4] remoteproc: qcom: q6v5: Add common panic handler

Add a common panic handler that invokes a stop request and sleeps long
enough to let the remoteproc flush its caches etc. in order to aid post
mortem debugging. For now a hard-coded 200ms is returned to the
remoteproc core; this value is taken from the downstream kernel.

Signed-off-by: Bjorn Andersson <[email protected]>
---

Change since v3:
- Change return type to unsigned long

drivers/remoteproc/qcom_q6v5.c | 20 ++++++++++++++++++++
drivers/remoteproc/qcom_q6v5.h | 1 +
2 files changed, 21 insertions(+)

diff --git a/drivers/remoteproc/qcom_q6v5.c b/drivers/remoteproc/qcom_q6v5.c
index cb0f4a0be032..111a442c993c 100644
--- a/drivers/remoteproc/qcom_q6v5.c
+++ b/drivers/remoteproc/qcom_q6v5.c
@@ -15,6 +15,8 @@
#include <linux/remoteproc.h>
#include "qcom_q6v5.h"

+#define Q6V5_PANIC_DELAY_MS 200
+
/**
* qcom_q6v5_prepare() - reinitialize the qcom_q6v5 context before start
* @q6v5: reference to qcom_q6v5 context to be reinitialized
@@ -162,6 +164,24 @@ int qcom_q6v5_request_stop(struct qcom_q6v5 *q6v5)
}
EXPORT_SYMBOL_GPL(qcom_q6v5_request_stop);

+/**
+ * qcom_q6v5_panic() - panic handler to invoke a stop on the remote
+ * @q6v5: reference to qcom_q6v5 context
+ *
+ * Set the stop bit and sleep in order to allow the remote processor to flush
+ * its caches etc for post mortem debugging.
+ *
+ * Return: 200ms
+ */
+unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5)
+{
+ qcom_smem_state_update_bits(q6v5->state,
+ BIT(q6v5->stop_bit), BIT(q6v5->stop_bit));
+
+ return Q6V5_PANIC_DELAY_MS;
+}
+EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
+
/**
* qcom_q6v5_init() - initializer of the q6v5 common struct
* @q6v5: handle to be initialized
diff --git a/drivers/remoteproc/qcom_q6v5.h b/drivers/remoteproc/qcom_q6v5.h
index 7ac92c1e0f49..c4ed887c1499 100644
--- a/drivers/remoteproc/qcom_q6v5.h
+++ b/drivers/remoteproc/qcom_q6v5.h
@@ -42,5 +42,6 @@ int qcom_q6v5_prepare(struct qcom_q6v5 *q6v5);
int qcom_q6v5_unprepare(struct qcom_q6v5 *q6v5);
int qcom_q6v5_request_stop(struct qcom_q6v5 *q6v5);
int qcom_q6v5_wait_for_start(struct qcom_q6v5 *q6v5, int timeout);
+unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5);

#endif
--
2.24.0
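
The panic helper in this patch does two things: it performs a masked update that raises the stop bit, and it returns the settle time in milliseconds. A minimal user-space sketch of that shape follows. The toy_* names are hypothetical, and the masked-update semantics are an assumption about what qcom_smem_state_update_bits() does with its mask and value arguments; only the 200 ms value comes from the patch.

```c
#include <stdint.h>

/* Same value as Q6V5_PANIC_DELAY_MS in the patch. */
#define TOY_PANIC_DELAY_MS 200UL

/* Hypothetical stand-in for the shared state word seen by the remote. */
struct toy_smem_state {
	uint32_t bits;
};

/* Masked update: only the bits selected by mask change; they take
 * their new value from the corresponding bits of value. */
static void toy_update_bits(struct toy_smem_state *s, uint32_t mask,
			    uint32_t value)
{
	s->bits = (s->bits & ~mask) | (value & mask);
}

/* Same shape as qcom_q6v5_panic(): raise the stop bit, then report how
 * long the caller should wait. stop_bit is assumed to be below 32. */
static unsigned long toy_panic(struct toy_smem_state *s, unsigned int stop_bit)
{
	uint32_t mask = UINT32_C(1) << stop_bit;

	toy_update_bits(s, mask, mask);

	return TOY_PANIC_DELAY_MS;
}
```

Passing the same bit as both mask and value sets that bit and leaves every other bit untouched, which is the pattern the call in the patch uses.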

2020-03-10 06:40:50

by Bjorn Andersson

Subject: [PATCH v4 4/4] remoteproc: qcom: Introduce panic handler for PAS and ADSP

Make the PAS and ADSP/CDSP remoteproc drivers implement the panic
handler that will invoke a stop to prepare the remoteprocs for post
mortem debugging.

Signed-off-by: Bjorn Andersson <[email protected]>
---

Change since v3:
- Change return type to unsigned long

drivers/remoteproc/qcom_q6v5_adsp.c | 8 ++++++++
drivers/remoteproc/qcom_q6v5_pas.c | 8 ++++++++
2 files changed, 16 insertions(+)

diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/drivers/remoteproc/qcom_q6v5_adsp.c
index d5cdff942535..8f1044e8ea3b 100644
--- a/drivers/remoteproc/qcom_q6v5_adsp.c
+++ b/drivers/remoteproc/qcom_q6v5_adsp.c
@@ -292,12 +292,20 @@ static void *adsp_da_to_va(struct rproc *rproc, u64 da, int len)
return adsp->mem_region + offset;
}

+static unsigned long adsp_panic(struct rproc *rproc)
+{
+ struct qcom_adsp *adsp = rproc->priv;
+
+ return qcom_q6v5_panic(&adsp->q6v5);
+}
+
static const struct rproc_ops adsp_ops = {
.start = adsp_start,
.stop = adsp_stop,
.da_to_va = adsp_da_to_va,
.parse_fw = qcom_register_dump_segments,
.load = adsp_load,
+ .panic = adsp_panic,
};

static int adsp_init_clock(struct qcom_adsp *adsp, const char **clk_ids)
diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index e64c268e6113..678c0ddfce96 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -243,12 +243,20 @@ static void *adsp_da_to_va(struct rproc *rproc, u64 da, int len)
return adsp->mem_region + offset;
}

+static unsigned long adsp_panic(struct rproc *rproc)
+{
+ struct qcom_adsp *adsp = (struct qcom_adsp *)rproc->priv;
+
+ return qcom_q6v5_panic(&adsp->q6v5);
+}
+
static const struct rproc_ops adsp_ops = {
.start = adsp_start,
.stop = adsp_stop,
.da_to_va = adsp_da_to_va,
.parse_fw = qcom_register_dump_segments,
.load = adsp_load,
+ .panic = adsp_panic,
};

static int adsp_init_clock(struct qcom_adsp *adsp)
--
2.24.0
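
Both drivers add the same thin wrapper: recover the driver context from rproc->priv and delegate to the common helper. A small user-space sketch of that delegation is below; every toy_* type and function is a hypothetical stand-in for the kernel structures. It also shows that C converts void * to the context pointer implicitly, so the wrapper works with or without an explicit cast.

```c
/* Hypothetical stand-in for the common q6v5 context. */
struct toy_q6v5 {
	unsigned int stop_bit;
	unsigned int state;
};

/* Hypothetical driver context embedding the common part. */
struct toy_adsp {
	struct toy_q6v5 q6v5;
};

/* Hypothetical stand-in for struct rproc: only the private pointer. */
struct toy_rproc {
	void *priv;
};

/* Common helper: raise the stop bit and return the delay in ms. */
static unsigned long toy_q6v5_panic(struct toy_q6v5 *q6v5)
{
	q6v5->state |= 1u << q6v5->stop_bit;

	return 200UL;
}

/* Driver callback: void * converts to struct toy_adsp * without a cast. */
static unsigned long toy_adsp_panic(struct toy_rproc *rproc)
{
	struct toy_adsp *adsp = rproc->priv;

	return toy_q6v5_panic(&adsp->q6v5);
}
```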

2020-03-10 06:41:09

by Bjorn Andersson

Subject: [PATCH v4 1/4] remoteproc: Traverse rproc_list under RCU read lock

In order to be able to traverse the mostly read-only rproc_list without
locking during panic, migrate traversal to be done under rcu_read_lock().

Mutual exclusion for modifications of the list continues to be handled
by the rproc_list_mutex and a synchronization point is added before
releasing objects that are popped from the list.

Signed-off-by: Bjorn Andersson <[email protected]>
---

Change v3:
- New patch

drivers/remoteproc/remoteproc_core.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 097f33e4f1f3..f0a77c30c6b1 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1854,8 +1854,8 @@ struct rproc *rproc_get_by_phandle(phandle phandle)
if (!np)
return NULL;

- mutex_lock(&rproc_list_mutex);
- list_for_each_entry(r, &rproc_list, node) {
+ rcu_read_lock();
+ list_for_each_entry_rcu(r, &rproc_list, node) {
if (r->dev.parent && r->dev.parent->of_node == np) {
/* prevent underlying implementation from being removed */
if (!try_module_get(r->dev.parent->driver->owner)) {
@@ -1868,7 +1868,7 @@ struct rproc *rproc_get_by_phandle(phandle phandle)
break;
}
}
- mutex_unlock(&rproc_list_mutex);
+ rcu_read_unlock();

of_node_put(np);

@@ -1925,7 +1925,7 @@ int rproc_add(struct rproc *rproc)

/* expose to rproc_get_by_phandle users */
mutex_lock(&rproc_list_mutex);
- list_add(&rproc->node, &rproc_list);
+ list_add_rcu(&rproc->node, &rproc_list);
mutex_unlock(&rproc_list_mutex);

return 0;
@@ -2140,9 +2140,12 @@ int rproc_del(struct rproc *rproc)

/* the rproc is downref'ed as soon as it's removed from the klist */
mutex_lock(&rproc_list_mutex);
- list_del(&rproc->node);
+ list_del_rcu(&rproc->node);
mutex_unlock(&rproc_list_mutex);

+ /* Ensure that no readers of rproc_list are still active */
+ synchronize_rcu();
+
device_del(&rproc->dev);

return 0;
--
2.24.0
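
The reader side of this change relies on writers publishing a fully initialized node before it becomes reachable, so that a lockless traversal never sees a half-built entry. The sketch below illustrates only that publish-then-traverse ordering in user space with C11 atomics. It is not the kernel's RCU implementation: the toy_* names are hypothetical, writers are assumed to serialize among themselves (the role rproc_list_mutex keeps in the patch), and removal plus the wait for readers that synchronize_rcu() provides are not modeled at all.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node; "next" is read by lockless readers. */
struct toy_node {
	int id;
	struct toy_node *_Atomic next;
};

static struct toy_node *_Atomic toy_head;

/* Writer side. Callers must serialize among themselves, e.g. with a
 * mutex. The node is linked first and only then published with a
 * release store, so readers never observe an uninitialized node. */
static void toy_add(struct toy_node *n)
{
	struct toy_node *first;

	first = atomic_load_explicit(&toy_head, memory_order_relaxed);
	atomic_store_explicit(&n->next, first, memory_order_relaxed);
	atomic_store_explicit(&toy_head, n, memory_order_release);
}

/* Reader side. No lock is taken; acquire loads pair with the release
 * store in toy_add(). */
static struct toy_node *toy_find(int id)
{
	struct toy_node *n;

	n = atomic_load_explicit(&toy_head, memory_order_acquire);
	while (n) {
		if (n->id == id)
			return n;
		n = atomic_load_explicit(&n->next, memory_order_acquire);
	}

	return NULL;
}
```

In the kernel the corresponding pieces are list_add_rcu() for publication and list_for_each_entry_rcu() inside rcu_read_lock()/rcu_read_unlock() for traversal.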

2020-03-10 06:41:32

by Bjorn Andersson

Subject: [PATCH v4 2/4] remoteproc: Introduce "panic" callback in ops

Introduce generic support for handling kernel panics in remoteproc
drivers, in order to allow operations needed for aiding in post mortem
system debugging, such as flushing caches etc.

The function can return a number of milliseconds needed by the remote to
"settle" and the core will wait the longest returned duration before
returning from the panic handler.

Signed-off-by: Bjorn Andersson <[email protected]>
---

Change since v3:
- Migrate from mutex_trylock() to using RCU
- Turned the timeout to unsigned long

drivers/remoteproc/remoteproc_core.c | 44 ++++++++++++++++++++++++++++
include/linux/remoteproc.h | 3 ++
2 files changed, 47 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index f0a77c30c6b1..2024a98930bf 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -16,6 +16,7 @@

#define pr_fmt(fmt) "%s: " fmt, __func__

+#include <linux/delay.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/device.h>
@@ -43,6 +44,7 @@

static DEFINE_MUTEX(rproc_list_mutex);
static LIST_HEAD(rproc_list);
+static struct notifier_block rproc_panic_nb;

typedef int (*rproc_handle_resource_t)(struct rproc *rproc,
void *, int offset, int avail);
@@ -2219,10 +2221,51 @@ void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
}
EXPORT_SYMBOL(rproc_report_crash);

+static int rproc_panic_handler(struct notifier_block *nb, unsigned long event,
+ void *ptr)
+{
+ unsigned int longest = 0;
+ struct rproc *rproc;
+ unsigned int d;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(rproc, &rproc_list, node) {
+ if (!rproc->ops->panic || rproc->state != RPROC_RUNNING)
+ continue;
+
+ d = rproc->ops->panic(rproc);
+ longest = max(longest, d);
+ }
+ rcu_read_unlock();
+
+ /*
+ * Delay for the longest requested duration before returning.
+ * This can be used by the remoteproc drivers to give the remote
+ * processor time to perform any requested operations (such as flush
+ * caches), where means for signalling the Linux side isn't available
+ * while in panic.
+ */
+ mdelay(longest);
+
+ return NOTIFY_DONE;
+}
+
+static void __init rproc_init_panic(void)
+{
+ rproc_panic_nb.notifier_call = rproc_panic_handler;
+ atomic_notifier_chain_register(&panic_notifier_list, &rproc_panic_nb);
+}
+
+static void __exit rproc_exit_panic(void)
+{
+ atomic_notifier_chain_unregister(&panic_notifier_list, &rproc_panic_nb);
+}
+
static int __init remoteproc_init(void)
{
rproc_init_sysfs();
rproc_init_debugfs();
+ rproc_init_panic();

return 0;
}
@@ -2232,6 +2275,7 @@ static void __exit remoteproc_exit(void)
{
ida_destroy(&rproc_dev_index);

+ rproc_exit_panic();
rproc_exit_debugfs();
rproc_exit_sysfs();
}
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index 16ad66683ad0..5959d6247dc0 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -369,6 +369,8 @@ enum rsc_handling_status {
* expects to find it
* @sanity_check: sanity check the fw image
* @get_boot_addr: get boot address to entry point specified in firmware
+ * @panic: optional callback to react to system panic, core will delay
+ * panic at least the returned number of milliseconds
*/
struct rproc_ops {
int (*start)(struct rproc *rproc);
@@ -383,6 +385,7 @@ struct rproc_ops {
int (*load)(struct rproc *rproc, const struct firmware *fw);
int (*sanity_check)(struct rproc *rproc, const struct firmware *fw);
u32 (*get_boot_addr)(struct rproc *rproc, const struct firmware *fw);
+ unsigned long (*panic)(struct rproc *rproc);
};

/**
--
2.24.0
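
The core handler walks every registered remote processor, skips those without a panic callback or not in the running state, and waits once for the longest requested duration. The selection logic can be sketched in user space as follows. The toy_* types, the array in place of rproc_list, and the sample callbacks are all hypothetical. The sketch keeps the delay in unsigned long throughout to match the callback's return type, whereas the handler as posted declares "longest" and "d" as unsigned int.

```c
#include <stddef.h>

enum toy_run_state { TOY_OFFLINE, TOY_RUNNING };

struct toy_rproc;

struct toy_ops {
	/* Optional: may be NULL when a driver has nothing to do on panic. */
	unsigned long (*panic)(struct toy_rproc *rproc);
};

struct toy_rproc {
	enum toy_run_state state;
	const struct toy_ops *ops;
};

/* Hypothetical sample callbacks returning a settle time in ms. */
static unsigned long toy_panic_50(struct toy_rproc *rproc)
{
	(void)rproc;
	return 50UL;
}

static unsigned long toy_panic_200(struct toy_rproc *rproc)
{
	(void)rproc;
	return 200UL;
}

static const struct toy_ops toy_ops_50 = { .panic = toy_panic_50 };
static const struct toy_ops toy_ops_200 = { .panic = toy_panic_200 };
static const struct toy_ops toy_ops_none = { .panic = NULL };

/* Invoke each eligible callback and return the longest requested delay. */
static unsigned long toy_longest_delay(struct toy_rproc *procs, size_t n)
{
	unsigned long longest = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct toy_rproc *r = &procs[i];
		unsigned long d;

		if (!r->ops->panic || r->state != TOY_RUNNING)
			continue;

		d = r->ops->panic(r);
		if (d > longest)
			longest = d;
	}

	return longest;
}
```

Waiting once for the maximum, rather than once per remote, keeps the total panic delay bounded by the slowest remote instead of the sum of all of them.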

2020-03-10 13:43:56

by Arnaud POULIQUEN

Subject: Re: [PATCH v4 1/4] remoteproc: Traverse rproc_list under RCU read lock

Hi Bjorn,

On 3/10/20 7:38 AM, Bjorn Andersson wrote:
> In order to be able to traverse the mostly read-only rproc_list without
> locking during panic, migrate traversal to be done under rcu_read_lock().
>
> Mutual exclusion for modifications of the list continues to be handled
> by the rproc_list_mutex and a synchronization point is added before
> releasing objects that are popped from the list.
>
> Signed-off-by: Bjorn Andersson <[email protected]>
> ---
>
> Change v3:
> - New patch
>
> drivers/remoteproc/remoteproc_core.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 097f33e4f1f3..f0a77c30c6b1 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1854,8 +1854,8 @@ struct rproc *rproc_get_by_phandle(phandle phandle)
> if (!np)
> return NULL;
>
> - mutex_lock(&rproc_list_mutex);
> - list_for_each_entry(r, &rproc_list, node) {
> + rcu_read_lock();
> + list_for_each_entry_rcu(r, &rproc_list, node) {
> if (r->dev.parent && r->dev.parent->of_node == np) {
> /* prevent underlying implementation from being removed */
> if (!try_module_get(r->dev.parent->driver->owner)) {
> @@ -1868,7 +1868,7 @@ struct rproc *rproc_get_by_phandle(phandle phandle)
> break;
> }
> }
> - mutex_unlock(&rproc_list_mutex);
> + rcu_read_unlock();
>
> of_node_put(np);
>
> @@ -1925,7 +1925,7 @@ int rproc_add(struct rproc *rproc)
>
> /* expose to rproc_get_by_phandle users */
> mutex_lock(&rproc_list_mutex);
> - list_add(&rproc->node, &rproc_list);
> + list_add_rcu(&rproc->node, &rproc_list);
> mutex_unlock(&rproc_list_mutex);
>
> return 0;
> @@ -2140,9 +2140,12 @@ int rproc_del(struct rproc *rproc)
>
> /* the rproc is downref'ed as soon as it's removed from the klist */
> mutex_lock(&rproc_list_mutex);
> - list_del(&rproc->node);
> + list_del_rcu(&rproc->node);
> mutex_unlock(&rproc_list_mutex);
I'm not familiar with RCU, but since rproc_panic_handler can be called in
interrupt context, should the mutex be replaced by a spinlock?

Regards,
Arnaud
>
> + /* Ensure that no readers of rproc_list are still active */
> + synchronize_rcu();
> +
> device_del(&rproc->dev);
>
> return 0;
>

2020-03-10 16:21:18

by Bjorn Andersson

Subject: Re: [PATCH v4 1/4] remoteproc: Traverse rproc_list under RCU read lock

On Tue 10 Mar 06:41 PDT 2020, Arnaud POULIQUEN wrote:

> Hi Bjorn,
>
>
> On 3/10/20 7:38 AM, Bjorn Andersson wrote:
> > In order to be able to traverse the mostly read-only rproc_list without
> > locking during panic, migrate traversal to be done under rcu_read_lock().
> >
> > Mutual exclusion for modifications of the list continues to be handled
> > by the rproc_list_mutex and a synchronization point is added before
> > releasing objects that are popped from the list.
> >
> > Signed-off-by: Bjorn Andersson <[email protected]>
> > ---
> >
> > Change v3:
> > - New patch
> >
> > drivers/remoteproc/remoteproc_core.c | 13 ++++++++-----
> > 1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index 097f33e4f1f3..f0a77c30c6b1 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1854,8 +1854,8 @@ struct rproc *rproc_get_by_phandle(phandle phandle)
> > if (!np)
> > return NULL;
> >
> > - mutex_lock(&rproc_list_mutex);
> > - list_for_each_entry(r, &rproc_list, node) {
> > + rcu_read_lock();
> > + list_for_each_entry_rcu(r, &rproc_list, node) {
> > if (r->dev.parent && r->dev.parent->of_node == np) {
> > /* prevent underlying implementation from being removed */
> > if (!try_module_get(r->dev.parent->driver->owner)) {
> > @@ -1868,7 +1868,7 @@ struct rproc *rproc_get_by_phandle(phandle phandle)
> > break;
> > }
> > }
> > - mutex_unlock(&rproc_list_mutex);
> > + rcu_read_unlock();
> >
> > of_node_put(np);
> >
> > @@ -1925,7 +1925,7 @@ int rproc_add(struct rproc *rproc)
> >
> > /* expose to rproc_get_by_phandle users */
> > mutex_lock(&rproc_list_mutex);
> > - list_add(&rproc->node, &rproc_list);
> > + list_add_rcu(&rproc->node, &rproc_list);
> > mutex_unlock(&rproc_list_mutex);
> >
> > return 0;
> > @@ -2140,9 +2140,12 @@ int rproc_del(struct rproc *rproc)
> >
> > /* the rproc is downref'ed as soon as it's removed from the klist */
> > mutex_lock(&rproc_list_mutex);
> > - list_del(&rproc->node);
> > + list_del_rcu(&rproc->node);
> > mutex_unlock(&rproc_list_mutex);
> I'm not familiar with RCU, but since rproc_panic_handler can be called in
> interrupt context, should the mutex be replaced by a spinlock?
>

Code traversing the list doesn't need to hold a lock, because the
rculist implementation ensures that the list itself is always
consistent.

Updates however can not be done concurrently, so that's why we're
maintaining this lock - which can be a mutex, because it now only
protects modifications.

And then the last piece is to guarantee that a node is not freed while
it's being accessed by the code traversing the list. This is ensured by
the synchronize_rcu() call below, which makes sure that no code holding
a rcu_read_lock() is still traversing the list.

Regards,
Bjorn

> Regards,
> Arnaud
> >
> > + /* Ensure that no readers of rproc_list are still active */
> > + synchronize_rcu();
> > +
> > device_del(&rproc->dev);
> >
> > return 0;
> >

2020-03-23 21:52:44

by Mathieu Poirier

Subject: Re: [PATCH v4 1/4] remoteproc: Traverse rproc_list under RCU read lock

On Mon, Mar 09, 2020 at 11:38:14PM -0700, Bjorn Andersson wrote:
> In order to be able to traverse the mostly read-only rproc_list without
> locking during panic, migrate traversal to be done under rcu_read_lock().
>
> Mutual exclusion for modifications of the list continues to be handled
> by the rproc_list_mutex and a synchronization point is added before
> releasing objects that are popped from the list.
>
> Signed-off-by: Bjorn Andersson <[email protected]>
> ---
>
> Change v3:
> - New patch
>
> drivers/remoteproc/remoteproc_core.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 097f33e4f1f3..f0a77c30c6b1 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1854,8 +1854,8 @@ struct rproc *rproc_get_by_phandle(phandle phandle)
> if (!np)
> return NULL;
>
> - mutex_lock(&rproc_list_mutex);
> - list_for_each_entry(r, &rproc_list, node) {
> + rcu_read_lock();
> + list_for_each_entry_rcu(r, &rproc_list, node) {
> if (r->dev.parent && r->dev.parent->of_node == np) {
> /* prevent underlying implementation from being removed */
> if (!try_module_get(r->dev.parent->driver->owner)) {
> @@ -1868,7 +1868,7 @@ struct rproc *rproc_get_by_phandle(phandle phandle)
> break;
> }
> }
> - mutex_unlock(&rproc_list_mutex);
> + rcu_read_unlock();
>
> of_node_put(np);
>
> @@ -1925,7 +1925,7 @@ int rproc_add(struct rproc *rproc)
>
> /* expose to rproc_get_by_phandle users */
> mutex_lock(&rproc_list_mutex);
> - list_add(&rproc->node, &rproc_list);
> + list_add_rcu(&rproc->node, &rproc_list);
> mutex_unlock(&rproc_list_mutex);
>
> return 0;
> @@ -2140,9 +2140,12 @@ int rproc_del(struct rproc *rproc)
>
> /* the rproc is downref'ed as soon as it's removed from the klist */
> mutex_lock(&rproc_list_mutex);
> - list_del(&rproc->node);
> + list_del_rcu(&rproc->node);
> mutex_unlock(&rproc_list_mutex);
>
> + /* Ensure that no readers of rproc_list are still active */
> + synchronize_rcu();
> +

Please add linux/rculist.h to include the RCU API. With that:

Reviewed-by: Mathieu Poirier <[email protected]>

> device_del(&rproc->dev);
>
> return 0;
> --
> 2.24.0
>

2020-03-23 22:29:52

by Mathieu Poirier

Subject: Re: [PATCH v4 2/4] remoteproc: Introduce "panic" callback in ops

On Mon, Mar 09, 2020 at 11:38:15PM -0700, Bjorn Andersson wrote:
> Introduce generic support for handling kernel panics in remoteproc
> drivers, in order to allow operations needed for aiding in post mortem
> system debugging, such as flushing caches etc.
>
> The function can return a number of milliseconds needed by the remote to
> "settle" and the core will wait the longest returned duration before
> returning from the panic handler.
>
> Signed-off-by: Bjorn Andersson <[email protected]>
> ---
>
> Change since v3:
> - Migrate from mutex_trylock() to using RCU
> - Turned the timeout to unsigned long
>
> drivers/remoteproc/remoteproc_core.c | 44 ++++++++++++++++++++++++++++
> include/linux/remoteproc.h | 3 ++
> 2 files changed, 47 insertions(+)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index f0a77c30c6b1..2024a98930bf 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -16,6 +16,7 @@
>
> #define pr_fmt(fmt) "%s: " fmt, __func__
>
> +#include <linux/delay.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/device.h>
> @@ -43,6 +44,7 @@
>
> static DEFINE_MUTEX(rproc_list_mutex);
> static LIST_HEAD(rproc_list);
> +static struct notifier_block rproc_panic_nb;
>
> typedef int (*rproc_handle_resource_t)(struct rproc *rproc,
> void *, int offset, int avail);
> @@ -2219,10 +2221,51 @@ void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
> }
> EXPORT_SYMBOL(rproc_report_crash);
>
> +static int rproc_panic_handler(struct notifier_block *nb, unsigned long event,
> + void *ptr)
> +{
> + unsigned int longest = 0;
> + struct rproc *rproc;
> + unsigned int d;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(rproc, &rproc_list, node) {
> + if (!rproc->ops->panic || rproc->state != RPROC_RUNNING)
> + continue;

To do things correctly rproc->state would need to be protected by the
rproc->mutex, which would violate RCU's rule of not blocking inside a read-side
critical section. And going back to using the rproc_list_mutex as in your
previous version would likely set off the lockdep mechanism quickly.

I don't have a solution, just noting that a potential race does exist. On the
flip side consequences are minimal.

Reviewed-by: Mathieu Poirier <[email protected]>

> +
> + d = rproc->ops->panic(rproc);
> + longest = max(longest, d);
> + }
> + rcu_read_unlock();
> +
> + /*
> + * Delay for the longest requested duration before returning.
> + * This can be used by the remoteproc drivers to give the remote
> + * processor time to perform any requested operations (such as flush
> + * caches), where means for signalling the Linux side isn't available
> + * while in panic.
> + */
> + mdelay(longest);
> +
> + return NOTIFY_DONE;
> +}
> +
> +static void __init rproc_init_panic(void)
> +{
> + rproc_panic_nb.notifier_call = rproc_panic_handler;
> + atomic_notifier_chain_register(&panic_notifier_list, &rproc_panic_nb);
> +}
> +
> +static void __exit rproc_exit_panic(void)
> +{
> + atomic_notifier_chain_unregister(&panic_notifier_list, &rproc_panic_nb);
> +}
> +
> static int __init remoteproc_init(void)
> {
> rproc_init_sysfs();
> rproc_init_debugfs();
> + rproc_init_panic();
>
> return 0;
> }
> @@ -2232,6 +2275,7 @@ static void __exit remoteproc_exit(void)
> {
> ida_destroy(&rproc_dev_index);
>
> + rproc_exit_panic();
> rproc_exit_debugfs();
> rproc_exit_sysfs();
> }
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index 16ad66683ad0..5959d6247dc0 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -369,6 +369,8 @@ enum rsc_handling_status {
> * expects to find it
> * @sanity_check: sanity check the fw image
> * @get_boot_addr: get boot address to entry point specified in firmware
> + * @panic: optional callback to react to system panic, core will delay
> + * panic at least the returned number of milliseconds
> */
> struct rproc_ops {
> int (*start)(struct rproc *rproc);
> @@ -383,6 +385,7 @@ struct rproc_ops {
> int (*load)(struct rproc *rproc, const struct firmware *fw);
> int (*sanity_check)(struct rproc *rproc, const struct firmware *fw);
> u32 (*get_boot_addr)(struct rproc *rproc, const struct firmware *fw);
> + unsigned long (*panic)(struct rproc *rproc);
> };
>
> /**
> --
> 2.24.0
>

2020-03-23 22:34:44

by Mathieu Poirier

Subject: Re: [PATCH v4 2/4] remoteproc: Introduce "panic" callback in ops

On Mon, Mar 09, 2020 at 11:38:15PM -0700, Bjorn Andersson wrote:
> Introduce generic support for handling kernel panics in remoteproc
> drivers, in order to allow operations needed for aiding in post mortem
> system debugging, such as flushing caches etc.
>
> The function can return a number of milliseconds needed by the remote to
> "settle" and the core will wait the longest returned duration before
> returning from the panic handler.
>
> Signed-off-by: Bjorn Andersson <[email protected]>
> ---
>
> Change since v3:
> - Migrate from mutex_trylock() to using RCU
> - Turned the timeout to unsigned long
>
> drivers/remoteproc/remoteproc_core.c | 44 ++++++++++++++++++++++++++++
> include/linux/remoteproc.h | 3 ++
> 2 files changed, 47 insertions(+)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index f0a77c30c6b1..2024a98930bf 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -16,6 +16,7 @@
>
> #define pr_fmt(fmt) "%s: " fmt, __func__
>
> +#include <linux/delay.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/device.h>
> @@ -43,6 +44,7 @@
>
> static DEFINE_MUTEX(rproc_list_mutex);
> static LIST_HEAD(rproc_list);
> +static struct notifier_block rproc_panic_nb;
>
> typedef int (*rproc_handle_resource_t)(struct rproc *rproc,
> void *, int offset, int avail);
> @@ -2219,10 +2221,51 @@ void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
> }
> EXPORT_SYMBOL(rproc_report_crash);
>
> +static int rproc_panic_handler(struct notifier_block *nb, unsigned long event,
> + void *ptr)
> +{
> + unsigned int longest = 0;
> + struct rproc *rproc;
> + unsigned int d;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(rproc, &rproc_list, node) {
> + if (!rproc->ops->panic || rproc->state != RPROC_RUNNING)
> + continue;
> +
> + d = rproc->ops->panic(rproc);
> + longest = max(longest, d);
> + }
> + rcu_read_unlock();
> +
> + /*
> + * Delay for the longest requested duration before returning.
> + * This can be used by the remoteproc drivers to give the remote
> + * processor time to perform any requested operations (such as flush
> + * caches), where means for signalling the Linux side isn't available

There is a problem with the above sentence.

> + * while in panic.
> + */
> + mdelay(longest);
> +
> + return NOTIFY_DONE;
> +}
> +
> +static void __init rproc_init_panic(void)
> +{
> + rproc_panic_nb.notifier_call = rproc_panic_handler;
> + atomic_notifier_chain_register(&panic_notifier_list, &rproc_panic_nb);
> +}
> +
> +static void __exit rproc_exit_panic(void)
> +{
> + atomic_notifier_chain_unregister(&panic_notifier_list, &rproc_panic_nb);
> +}
> +
> static int __init remoteproc_init(void)
> {
> rproc_init_sysfs();
> rproc_init_debugfs();
> + rproc_init_panic();
>
> return 0;
> }
> @@ -2232,6 +2275,7 @@ static void __exit remoteproc_exit(void)
> {
> ida_destroy(&rproc_dev_index);
>
> + rproc_exit_panic();
> rproc_exit_debugfs();
> rproc_exit_sysfs();
> }
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index 16ad66683ad0..5959d6247dc0 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -369,6 +369,8 @@ enum rsc_handling_status {
> * expects to find it
> * @sanity_check: sanity check the fw image
> * @get_boot_addr: get boot address to entry point specified in firmware
> + * @panic: optional callback to react to system panic, core will delay
> + * panic at least the returned number of milliseconds
> */
> struct rproc_ops {
> int (*start)(struct rproc *rproc);
> @@ -383,6 +385,7 @@ struct rproc_ops {
> int (*load)(struct rproc *rproc, const struct firmware *fw);
> int (*sanity_check)(struct rproc *rproc, const struct firmware *fw);
> u32 (*get_boot_addr)(struct rproc *rproc, const struct firmware *fw);
> + unsigned long (*panic)(struct rproc *rproc);
> };
>
> /**
> --
> 2.24.0
>

2020-03-23 22:37:07

by Mathieu Poirier

Subject: Re: [PATCH v4 3/4] remoteproc: qcom: q6v5: Add common panic handler

On Mon, Mar 09, 2020 at 11:38:16PM -0700, Bjorn Andersson wrote:
> Add a common panic handler that invokes a stop request and sleeps long
> enough to let the remoteproc flush its caches etc. in order to aid post
> mortem debugging. For now a hard-coded 200ms is returned to the
> remoteproc core; this value is taken from the downstream kernel.
>
> Signed-off-by: Bjorn Andersson <[email protected]>
> ---
>
> Change since v3:
> - Change return type to unsigned long
>
> drivers/remoteproc/qcom_q6v5.c | 20 ++++++++++++++++++++
> drivers/remoteproc/qcom_q6v5.h | 1 +
> 2 files changed, 21 insertions(+)
>
> diff --git a/drivers/remoteproc/qcom_q6v5.c b/drivers/remoteproc/qcom_q6v5.c
> index cb0f4a0be032..111a442c993c 100644
> --- a/drivers/remoteproc/qcom_q6v5.c
> +++ b/drivers/remoteproc/qcom_q6v5.c
> @@ -15,6 +15,8 @@
> #include <linux/remoteproc.h>
> #include "qcom_q6v5.h"
>
> +#define Q6V5_PANIC_DELAY_MS 200
> +
> /**
> * qcom_q6v5_prepare() - reinitialize the qcom_q6v5 context before start
> * @q6v5: reference to qcom_q6v5 context to be reinitialized
> @@ -162,6 +164,24 @@ int qcom_q6v5_request_stop(struct qcom_q6v5 *q6v5)
> }
> EXPORT_SYMBOL_GPL(qcom_q6v5_request_stop);
>
> +/**
> + * qcom_q6v5_panic() - panic handler to invoke a stop on the remote
> + * @q6v5: reference to qcom_q6v5 context
> + *
> + * Set the stop bit and sleep in order to allow the remote processor to flush
> + * its caches etc for post mortem debugging.
> + *
> + * Return: 200ms
> + */
> +unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5)
> +{
> + qcom_smem_state_update_bits(q6v5->state,
> + BIT(q6v5->stop_bit), BIT(q6v5->stop_bit));
> +
> + return Q6V5_PANIC_DELAY_MS;
> +}
> +EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
> +
> /**
> * qcom_q6v5_init() - initializer of the q6v5 common struct
> * @q6v5: handle to be initialized
> diff --git a/drivers/remoteproc/qcom_q6v5.h b/drivers/remoteproc/qcom_q6v5.h
> index 7ac92c1e0f49..c4ed887c1499 100644
> --- a/drivers/remoteproc/qcom_q6v5.h
> +++ b/drivers/remoteproc/qcom_q6v5.h
> @@ -42,5 +42,6 @@ int qcom_q6v5_prepare(struct qcom_q6v5 *q6v5);
> int qcom_q6v5_unprepare(struct qcom_q6v5 *q6v5);
> int qcom_q6v5_request_stop(struct qcom_q6v5 *q6v5);
> int qcom_q6v5_wait_for_start(struct qcom_q6v5 *q6v5, int timeout);
> +unsigned long qcom_q6v5_panic(struct qcom_q6v5 *q6v5);

Reviewed-by: Mathieu Poirier <[email protected]>

>
> #endif
> --
> 2.24.0
>

2020-03-23 22:46:16

by Mathieu Poirier

Subject: Re: [PATCH v4 4/4] remoteproc: qcom: Introduce panic handler for PAS and ADSP

On Mon, Mar 09, 2020 at 11:38:17PM -0700, Bjorn Andersson wrote:
> Make the PAS and ADSP/CDSP remoteproc drivers implement the panic
> handler that will invoke a stop to prepare the remoteprocs for post
> mortem debugging.
>
> Signed-off-by: Bjorn Andersson <[email protected]>
> ---
>
> Change since v3:
> - Change return type to unsigned long
>
> drivers/remoteproc/qcom_q6v5_adsp.c | 8 ++++++++
> drivers/remoteproc/qcom_q6v5_pas.c | 8 ++++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/drivers/remoteproc/qcom_q6v5_adsp.c
> index d5cdff942535..8f1044e8ea3b 100644
> --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> @@ -292,12 +292,20 @@ static void *adsp_da_to_va(struct rproc *rproc, u64 da, int len)
> return adsp->mem_region + offset;
> }
>
> +static unsigned long adsp_panic(struct rproc *rproc)
> +{
> + struct qcom_adsp *adsp = rproc->priv;
> +
> + return qcom_q6v5_panic(&adsp->q6v5);
> +}
> +
> static const struct rproc_ops adsp_ops = {
> .start = adsp_start,
> .stop = adsp_stop,
> .da_to_va = adsp_da_to_va,
> .parse_fw = qcom_register_dump_segments,
> .load = adsp_load,
> + .panic = adsp_panic,
> };
>
> static int adsp_init_clock(struct qcom_adsp *adsp, const char **clk_ids)
> diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
> index e64c268e6113..678c0ddfce96 100644
> --- a/drivers/remoteproc/qcom_q6v5_pas.c
> +++ b/drivers/remoteproc/qcom_q6v5_pas.c
> @@ -243,12 +243,20 @@ static void *adsp_da_to_va(struct rproc *rproc, u64 da, int len)
> return adsp->mem_region + offset;
> }
>
> +static unsigned long adsp_panic(struct rproc *rproc)
> +{
> + struct qcom_adsp *adsp = (struct qcom_adsp *)rproc->priv;

Above, rproc->priv is not cast, but here it is... Not a problem, just a
matter of consistency.

Reviewed-by: Mathieu Poirier <[email protected]>

> +
> + return qcom_q6v5_panic(&adsp->q6v5);
> +}
> +
> static const struct rproc_ops adsp_ops = {
> .start = adsp_start,
> .stop = adsp_stop,
> .da_to_va = adsp_da_to_va,
> .parse_fw = qcom_register_dump_segments,
> .load = adsp_load,
> + .panic = adsp_panic,
> };
>
> static int adsp_init_clock(struct qcom_adsp *adsp)
> --
> 2.24.0
>