2020-04-24 20:03:33

by Mathieu Poirier

Subject: [PATCH v3 00/14] remoteproc: Add support for synchronisation with rproc

This is the third revision of this series that tries to address the
problem of synchronising with a remote processor with as much
flexibility as possible.

Two things to pay attention to:

1) Function rproc_actuate() has been abandoned to avoid creating another
way to start a remote processor from a kernel driver. Arnaud expressed
the opinion that it is semantically questionable to synchronise with a
remote processor when calling rproc_boot(). We could rename
rproc_boot() to rproc_actuate() but I'll wait to see what other people
think before doing so.

2) The allocation of the synchronisation states has been split from the
remote processor allocation. A new function rproc_set_state_machine()
does all the work now. Proceeding this way has made the patchset a
lot simpler.

Other than the above I have tried to address all the comments made on the
second revision. If a comment was not addressed, it simply fell through
the cracks rather than being ignored. In such a case please reiterate your point
of view and I'll be sure to address it.

Applies cleanly on rproc-next (305ac5a766b1).

Best regards,
Mathieu

Mathieu Poirier (14):
remoteproc: Make core operations optional
remoteproc: Introduce function rproc_alloc_internals()
remoteproc: Add new operation and flags for synchronisation
remoteproc: Refactor function rproc_boot()
remoteproc: Refactor function rproc_fw_boot()
remoteproc: Refactor function rproc_trigger_auto_boot()
remoteproc: Introducing new start and stop functions
remoteproc: Call core functions based on synchronisation flag
remoteproc: Deal with synchronisation when crashing
remoteproc: Deal with synchronisation when shutting down
remoteproc: Deal with synchronisation when changing FW image
remoteproc: Introducing function rproc_set_state_machine()
remoteproc: Document function rproc_set_state_machine()
remoteproc: Expose synchronisation flags via debugfs

Documentation/remoteproc.txt | 17 ++
drivers/remoteproc/remoteproc_core.c | 197 +++++++++++++++++++----
drivers/remoteproc/remoteproc_debugfs.c | 21 +++
drivers/remoteproc/remoteproc_internal.h | 123 +++++++++++++-
drivers/remoteproc/remoteproc_sysfs.c | 24 ++-
include/linux/remoteproc.h | 27 ++++
6 files changed, 372 insertions(+), 37 deletions(-)

--
2.20.1


2020-04-24 20:03:39

by Mathieu Poirier

Subject: [PATCH v3 02/14] remoteproc: Introduce function rproc_alloc_internals()

In scenarios where the remote processor's lifecycle is entirely
managed by another entity there is no point in allocating memory for
a firmware name since it will never be used. The same goes for a core
set of operations.

As such introduce function rproc_alloc_internals() to decide if the
allocation of a firmware name and the core operations need to be done.
That way rproc_alloc() can be kept as clean as possible.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 31 +++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 448262470fc7..1b4756909584 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -2076,6 +2076,30 @@ static int rproc_alloc_ops(struct rproc *rproc, const struct rproc_ops *ops)
return 0;
}

+static int rproc_alloc_internals(struct rproc *rproc,
+ const struct rproc_ops *ops,
+ const char *name, const char *firmware)
+{
+ int ret;
+
+ /*
+ * In scenarios where the remote processor's lifecycle is entirely
+ * managed by another entity there is no point in carrying a set
+ * of operations that will never be used.
+ *
+ * And since no firmware will ever be loaded, there is no point in
+ * allocating memory for it either.
+ */
+ if (!ops)
+ return 0;
+
+ ret = rproc_alloc_firmware(rproc, name, firmware);
+ if (ret)
+ return ret;
+
+ return rproc_alloc_ops(rproc, ops);
+}
+
/**
* rproc_alloc() - allocate a remote processor handle
* @dev: the underlying device
@@ -2105,7 +2129,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name,
{
struct rproc *rproc;

- if (!dev || !name || !ops)
+ if (!dev || !name)
return NULL;

rproc = kzalloc(sizeof(struct rproc) + len, GFP_KERNEL);
@@ -2128,10 +2152,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name,
if (!rproc->name)
goto put_device;

- if (rproc_alloc_firmware(rproc, name, firmware))
- goto put_device;
-
- if (rproc_alloc_ops(rproc, ops))
+ if (rproc_alloc_internals(rproc, ops, name, firmware))
goto put_device;

/* Assign a unique device index and name */
--
2.20.1
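
The decision made by rproc_alloc_internals() can be modelled outside the kernel. The following is only a userspace sketch under stated assumptions: the structures are cut-down stand-ins, model_alloc_internals() is a hypothetical name, malloc() replaces the kernel allocators, and the fallback firmware name is a placeholder rather than what rproc_alloc_firmware() generates.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct rproc_ops {
	int (*start)(void *priv);
	int (*stop)(void *priv);
};

struct rproc {
	char *firmware;		/* firmware name, NULL when never needed */
	struct rproc_ops *ops;	/* private copy of the core ops, may be NULL */
};

/*
 * Mirrors the idea of the patch: when no core operations are given the
 * lifecycle is owned by another entity, so neither the firmware name
 * nor the operations are allocated.
 */
int model_alloc_internals(struct rproc *rproc, const struct rproc_ops *ops,
			  const char *firmware)
{
	/* Placeholder default name, an assumption of this sketch only. */
	const char *fw = firmware ? firmware : "placeholder-fw";

	if (!ops)
		return 0;

	rproc->firmware = malloc(strlen(fw) + 1);
	if (!rproc->firmware)
		return -ENOMEM;
	strcpy(rproc->firmware, fw);

	rproc->ops = malloc(sizeof(*ops));
	if (!rproc->ops) {
		free(rproc->firmware);
		rproc->firmware = NULL;
		return -ENOMEM;
	}
	memcpy(rproc->ops, ops, sizeof(*ops));

	return 0;
}
```

Calling model_alloc_internals() with a NULL ops pointer leaves both fields untouched and still reports success, which is the behaviour the relaxed check in rproc_alloc() relies on.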

2020-04-24 20:03:59

by Mathieu Poirier

Subject: [PATCH v3 07/14] remoteproc: Introducing new start and stop functions

Add new functions to replace direct calling of rproc->ops->start() and
rproc->ops->stop(). That way different behaviour can be played out
when booting a remote processor or synchronising with it.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 6 +++---
drivers/remoteproc/remoteproc_internal.h | 16 ++++++++++++++++
2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 9de0e2b7ca2b..ef88d3e84bfb 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1339,7 +1339,7 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
}

/* power up the remote processor */
- ret = rproc->ops->start(rproc);
+ ret = rproc_start_device(rproc);
if (ret) {
dev_err(dev, "can't start rproc %s: %d\n", rproc->name, ret);
goto unprepare_subdevices;
@@ -1360,7 +1360,7 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
return 0;

stop_rproc:
- rproc->ops->stop(rproc);
+ rproc_stop_device(rproc);
unprepare_subdevices:
rproc_unprepare_subdevices(rproc);
reset_table_ptr:
@@ -1493,7 +1493,7 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
rproc->table_ptr = rproc->cached_table;

/* power off the remote processor */
- ret = rproc->ops->stop(rproc);
+ ret = rproc_stop_device(rproc);
if (ret) {
dev_err(dev, "can't stop rproc: %d\n", ret);
return ret;
diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
index 47b500e40dd9..dda7044c4b3e 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -125,6 +125,22 @@ struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
return NULL;
}

+static inline int rproc_start_device(struct rproc *rproc)
+{
+ if (rproc->ops && rproc->ops->start)
+ return rproc->ops->start(rproc);
+
+ return 0;
+}
+
+static inline int rproc_stop_device(struct rproc *rproc)
+{
+ if (rproc->ops && rproc->ops->stop)
+ return rproc->ops->stop(rproc);
+
+ return 0;
+}
+
static inline
bool rproc_u64_fit_in_size_t(u64 val)
{
--
2.20.1

2020-04-24 20:04:06

by Mathieu Poirier

Subject: [PATCH v3 08/14] remoteproc: Call core functions based on synchronisation flag

Call the right core function based on whether we should synchronise
with a remote processor or boot it from scratch.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_internal.h | 50 ++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
index dda7044c4b3e..3985c084b184 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -72,6 +72,12 @@ static inline bool rproc_needs_syncing(struct rproc *rproc)
static inline
int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
{
+ if (rproc_needs_syncing(rproc)) {
+ if (rproc->sync_ops && rproc->sync_ops->sanity_check)
+ return rproc->sync_ops->sanity_check(rproc, fw);
+ return 0;
+ }
+
if (rproc->ops && rproc->ops->sanity_check)
return rproc->ops->sanity_check(rproc, fw);

@@ -81,6 +87,12 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
static inline
u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
{
+ if (rproc_needs_syncing(rproc)) {
+ if (rproc->sync_ops && rproc->sync_ops->get_boot_addr)
+ return rproc->sync_ops->get_boot_addr(rproc, fw);
+ return 0;
+ }
+
if (rproc->ops && rproc->ops->get_boot_addr)
return rproc->ops->get_boot_addr(rproc, fw);

@@ -90,6 +102,12 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
static inline
int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
{
+ if (rproc_needs_syncing(rproc)) {
+ if (rproc->sync_ops && rproc->sync_ops->load)
+ return rproc->sync_ops->load(rproc, fw);
+ return 0;
+ }
+
if (rproc->ops && rproc->ops->load)
return rproc->ops->load(rproc, fw);

@@ -98,6 +116,12 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)

static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
{
+ if (rproc_needs_syncing(rproc)) {
+ if (rproc->sync_ops && rproc->sync_ops->parse_fw)
+ return rproc->sync_ops->parse_fw(rproc, fw);
+ return 0;
+ }
+
if (rproc->ops && rproc->ops->parse_fw)
return rproc->ops->parse_fw(rproc, fw);

@@ -108,6 +132,13 @@ static inline
int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
int avail)
{
+ if (rproc_needs_syncing(rproc)) {
+ if (rproc->sync_ops && rproc->sync_ops->handle_rsc)
+ return rproc->sync_ops->handle_rsc(rproc, rsc_type,
+ rsc, offset, avail);
+ return 0;
+ }
+
if (rproc->ops && rproc->ops->handle_rsc)
return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
avail);
@@ -119,6 +150,13 @@ static inline
struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
const struct firmware *fw)
{
+ if (rproc_needs_syncing(rproc)) {
+ if (rproc->sync_ops && rproc->sync_ops->find_loaded_rsc_table)
+ return rproc->sync_ops->find_loaded_rsc_table(rproc,
+ fw);
+ return NULL;
+ }
+
if (rproc->ops && rproc->ops->find_loaded_rsc_table)
return rproc->ops->find_loaded_rsc_table(rproc, fw);

@@ -127,6 +165,12 @@ struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,

static inline int rproc_start_device(struct rproc *rproc)
{
+ if (rproc_needs_syncing(rproc)) {
+ if (rproc->sync_ops && rproc->sync_ops->start)
+ return rproc->sync_ops->start(rproc);
+ return 0;
+ }
+
if (rproc->ops && rproc->ops->start)
return rproc->ops->start(rproc);

@@ -135,6 +179,12 @@ static inline int rproc_start_device(struct rproc *rproc)

static inline int rproc_stop_device(struct rproc *rproc)
{
+ if (rproc_needs_syncing(rproc)) {
+ if (rproc->sync_ops && rproc->sync_ops->stop)
+ return rproc->sync_ops->stop(rproc);
+ return 0;
+ }
+
if (rproc->ops && rproc->ops->stop)
return rproc->ops->stop(rproc);

--
2.20.1
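
Every wrapper touched by this patch follows the same pattern: consult the synchronisation flag first, then fall back to the regular operations. A minimal userspace sketch of that pattern for the start path is shown below. The structures are cut-down stand-ins, and model_start_device(), the demo callbacks and the last_called field are hypothetical names used for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

struct rproc;

/* Cut-down stand-in for the kernel's operations table. */
struct rproc_ops {
	int (*start)(struct rproc *rproc);
	int (*stop)(struct rproc *rproc);
};

struct rproc {
	const struct rproc_ops *ops;		/* used when booting */
	const struct rproc_ops *sync_ops;	/* used when synchronising */
	bool sync_with_rproc;
	int last_called;	/* hypothetical: records which callback ran */
};

/* Same shape as rproc_start_device() after this patch. */
int model_start_device(struct rproc *rproc)
{
	if (rproc->sync_with_rproc) {
		if (rproc->sync_ops && rproc->sync_ops->start)
			return rproc->sync_ops->start(rproc);
		return 0;
	}

	if (rproc->ops && rproc->ops->start)
		return rproc->ops->start(rproc);

	return 0;
}

/* Demo callbacks, one per operations table. */
int demo_boot_start(struct rproc *rproc)
{
	rproc->last_called = 1;
	return 0;
}

int demo_sync_start(struct rproc *rproc)
{
	rproc->last_called = 2;
	return 0;
}
```

With sync_with_rproc cleared the boot callback runs, with it set the sync callback runs, and a missing sync table is treated as success. Going by the diffs in this series, the syncing branch of rproc_load_segments() also returns 0 when no callback is provided, whereas its non-syncing branch returns -EINVAL.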

2020-04-24 20:04:16

by Mathieu Poirier

Subject: [PATCH v3 12/14] remoteproc: Introducing function rproc_set_state_machine()

Introduce function rproc_set_state_machine() to add
operations and a set of flags to use when synchronising with
a remote processor.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 54 ++++++++++++++++++++++++
drivers/remoteproc/remoteproc_internal.h | 6 +++
include/linux/remoteproc.h | 3 ++
3 files changed, 63 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 48afa1f80a8f..5c48714e8702 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -2065,6 +2065,59 @@ int devm_rproc_add(struct device *dev, struct rproc *rproc)
}
EXPORT_SYMBOL(devm_rproc_add);

+/**
+ * rproc_set_state_machine() - Set the synchronisation ops and set of flags
+ * to use with a remote processor
+ * @rproc: The remote processor to work with
+ * @sync_ops: The operations to use when synchronising with a remote
+ * processor
+ * @sync_flags: The flags to use when deciding if the remoteproc core
+ * should be synchronising with a remote processor
+ *
+ * Returns 0 on success, an error code otherwise.
+ */
+int rproc_set_state_machine(struct rproc *rproc,
+ const struct rproc_ops *sync_ops,
+ struct rproc_sync_flags sync_flags)
+{
+ if (!rproc || !sync_ops)
+ return -EINVAL;
+
+ /*
+ * No point in going further if we never have to synchronise with
+ * the remote processor.
+ */
+ if (!sync_flags.on_init &&
+ !sync_flags.after_stop && !sync_flags.after_crash)
+ return 0;
+
+ /*
+ * Refuse to go further if remoteproc operations have been allocated
+ * but they will never be used.
+ */
+ if (rproc->ops && sync_flags.on_init &&
+ sync_flags.after_stop && sync_flags.after_crash)
+ return -EINVAL;
+
+ /*
+ * Don't allow users to set this more than once to avoid situations
+ * where the remote processor can't be recovered.
+ */
+ if (rproc->sync_ops)
+ return -EINVAL;
+
+ rproc->sync_ops = kmemdup(sync_ops, sizeof(*sync_ops), GFP_KERNEL);
+ if (!rproc->sync_ops)
+ return -ENOMEM;
+
+ rproc->sync_flags = sync_flags;
+ /* Tell the core what to do when initialising */
+ rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT);
+
+ return 0;
+}
+EXPORT_SYMBOL(rproc_set_state_machine);
+
/**
* rproc_type_release() - release a remote processor instance
* @dev: the rproc's device
@@ -2088,6 +2141,7 @@ static void rproc_type_release(struct device *dev)
kfree_const(rproc->firmware);
kfree_const(rproc->name);
kfree(rproc->ops);
+ kfree(rproc->sync_ops);
kfree(rproc);
}

diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
index 7dcc0a26892b..c1a293a37c78 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -27,6 +27,8 @@ struct rproc_debug_trace {
/*
* enum rproc_sync_states - remote processsor sync states
*
+ * @RPROC_SYNC_STATE_INIT state to use when the remoteproc core
+ * is initialising.
* @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
* has shutdown (rproc_shutdown()) the
* remote processor.
@@ -39,6 +41,7 @@ struct rproc_debug_trace {
* operation to use.
*/
enum rproc_sync_states {
+ RPROC_SYNC_STATE_INIT,
RPROC_SYNC_STATE_SHUTDOWN,
RPROC_SYNC_STATE_CRASHED,
};
@@ -47,6 +50,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
enum rproc_sync_states state)
{
switch (state) {
+ case RPROC_SYNC_STATE_INIT:
+ rproc->sync_with_rproc = rproc->sync_flags.on_init;
+ break;
case RPROC_SYNC_STATE_SHUTDOWN:
rproc->sync_with_rproc = rproc->sync_flags.after_stop;
break;
diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index ceb3b2bba824..a75ed92b3de6 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -619,6 +619,9 @@ struct rproc *rproc_get_by_child(struct device *dev);
struct rproc *rproc_alloc(struct device *dev, const char *name,
const struct rproc_ops *ops,
const char *firmware, int len);
+int rproc_set_state_machine(struct rproc *rproc,
+ const struct rproc_ops *sync_ops,
+ struct rproc_sync_flags sync_flags);
void rproc_put(struct rproc *rproc);
int rproc_add(struct rproc *rproc);
int rproc_del(struct rproc *rproc);
--
2.20.1
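
The order of the checks in rproc_set_state_machine() matters, so it may help to see them exercised in isolation. The sketch below is a userspace model only: the operations are reduced to opaque pointers, the three flags are assumed to be booleans (their definition lives in patch 03, which is not quoted here), the pointer is stored directly instead of being duplicated with kmemdup(), and model_set_state_machine() is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: only the three field names come from the patch. */
struct rproc_sync_flags {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

struct rproc {
	const void *ops;	/* opaque stand-in for the core ops */
	const void *sync_ops;	/* opaque stand-in for the sync ops */
	struct rproc_sync_flags sync_flags;
	bool sync_with_rproc;
};

int model_set_state_machine(struct rproc *rproc, const void *sync_ops,
			    struct rproc_sync_flags flags)
{
	if (!rproc || !sync_ops)
		return -EINVAL;

	/* Never synchronising: nothing to record. */
	if (!flags.on_init && !flags.after_stop && !flags.after_crash)
		return 0;

	/* Core ops were allocated but would never be used. */
	if (rproc->ops && flags.on_init &&
	    flags.after_stop && flags.after_crash)
		return -EINVAL;

	/* Only allow the state machine to be set once. */
	if (rproc->sync_ops)
		return -EINVAL;

	rproc->sync_ops = sync_ops;	/* the kernel code copies instead */
	rproc->sync_flags = flags;
	/* Equivalent of rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT). */
	rproc->sync_with_rproc = flags.on_init;

	return 0;
}
```

One consequence worth noting: a call with all flags cleared returns 0 without recording the sync operations, so a later call with real flags is still accepted.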

2020-04-24 20:04:20

by Mathieu Poirier

Subject: [PATCH v3 05/14] remoteproc: Refactor function rproc_fw_boot()

Refactor function rproc_fw_boot() in order to better reflect the work
that is done when supporting scenarios where the remoteproc core is
synchronising with a remote processor.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index a02593b75bec..e90a21de9de1 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1370,9 +1370,9 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
}

/*
- * take a firmware and boot a remote processor with it.
+ * boot or synchronise with a remote processor.
*/
-static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
+static int rproc_actuate_device(struct rproc *rproc, const struct firmware *fw)
{
struct device *dev = &rproc->dev;
const char *name = rproc->firmware;
@@ -1382,7 +1382,9 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
if (ret)
return ret;

- dev_info(dev, "Booting fw image %s, size %zd\n", name, fw->size);
+ if (!rproc_needs_syncing(rproc))
+ dev_info(dev, "Booting fw image %s, size %zd\n",
+ name, fw->size);

/*
* if enabling an IOMMU isn't relevant for this rproc, this is
@@ -1818,7 +1820,7 @@ int rproc_boot(struct rproc *rproc)
}
}

- ret = rproc_fw_boot(rproc, firmware_p);
+ ret = rproc_actuate_device(rproc, firmware_p);

release_firmware(firmware_p);

--
2.20.1

2020-04-24 20:04:22

by Mathieu Poirier

Subject: [PATCH v3 11/14] remoteproc: Deal with synchronisation when changing FW image

This patch prevents the firmware image from being displayed or changed
when the remoteproc core is synchronising with a remote processor. This
is needed since there is no guarantee about the nature of the firmware
image that is loaded by the external entity.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_sysfs.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
index 7f8536b73295..cdd322a6ecfa 100644
--- a/drivers/remoteproc/remoteproc_sysfs.c
+++ b/drivers/remoteproc/remoteproc_sysfs.c
@@ -13,9 +13,20 @@
static ssize_t firmware_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
+ ssize_t ret;
struct rproc *rproc = to_rproc(dev);

- return sprintf(buf, "%s\n", rproc->firmware);
+ /*
+ * In most instances there is no guarantee about the firmware
+ * that was loaded by the external entity. As such simply don't
+ * print anything.
+ */
+ if (rproc_needs_syncing(rproc))
+ ret = sprintf(buf, "\n");
+ else
+ ret = sprintf(buf, "%s\n", rproc->firmware);
+
+ return ret;
}

/* Change firmware name via sysfs */
@@ -39,6 +50,17 @@ static ssize_t firmware_store(struct device *dev,
goto out;
}

+ /*
+ * There is no point in trying to change the firmware if loading the
+ * image of the remote processor is done by another entity.
+ */
+ if (rproc_needs_syncing(rproc)) {
+ dev_err(dev,
+ "can't change firmware while synchronising with MCU\n");
+ err = -EBUSY;
+ goto out;
+ }
+
len = strcspn(buf, "\n");
if (!len) {
dev_err(dev, "can't provide a NULL firmware\n");
--
2.20.1
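
The effect of this patch on the sysfs "firmware" attribute can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: snprintf() into a caller-supplied buffer stands in for the sysfs show callback, and model_firmware_show() and model_firmware_store_check() are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Cut-down stand-in holding only what the attribute needs. */
struct rproc {
	const char *firmware;
	bool sync_with_rproc;
};

/* Reading: an empty line is reported while synchronising. */
int model_firmware_show(const struct rproc *rproc, char *buf, size_t len)
{
	if (rproc->sync_with_rproc)
		return snprintf(buf, len, "\n");

	return snprintf(buf, len, "%s\n", rproc->firmware);
}

/* Writing: refused with -EBUSY while synchronising. */
int model_firmware_store_check(const struct rproc *rproc)
{
	return rproc->sync_with_rproc ? -EBUSY : 0;
}
```

From user space this would presumably show up as an empty read and a failed write on the attribute for as long as the core is synchronising with the remote processor.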

2020-04-24 20:04:24

by Mathieu Poirier

Subject: [PATCH v3 14/14] remoteproc: Expose synchronisation flags via debugfs

Add a debugfs entry that reflects the value of the current
synchronisation flags used by the remoteproc core.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_debugfs.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_debugfs.c b/drivers/remoteproc/remoteproc_debugfs.c
index 732770e92b99..3dde24e62cd8 100644
--- a/drivers/remoteproc/remoteproc_debugfs.c
+++ b/drivers/remoteproc/remoteproc_debugfs.c
@@ -291,6 +291,25 @@ static int rproc_carveouts_show(struct seq_file *seq, void *p)

DEFINE_SHOW_ATTRIBUTE(rproc_carveouts);

+ /* Expose synchronisation states via debugfs */
+static int rproc_sync_flags_show(struct seq_file *seq, void *p)
+{
+ struct rproc *rproc = seq->private;
+
+ seq_printf(seq, "Sync with rproc: %s\n",
+ rproc->sync_with_rproc ? "true" : "false");
+ seq_printf(seq, "On init: %s\n",
+ rproc->sync_flags.on_init ? "true" : "false");
+ seq_printf(seq, "After stop: %s\n",
+ rproc->sync_flags.after_stop ? "true" : "false");
+ seq_printf(seq, "After crash: %s\n",
+ rproc->sync_flags.after_crash ? "true" : "false");
+
+ return 0;
+}
+
+DEFINE_SHOW_ATTRIBUTE(rproc_sync_flags);
+
void rproc_remove_trace_file(struct dentry *tfile)
{
debugfs_remove(tfile);
@@ -337,6 +356,8 @@ void rproc_create_debug_dir(struct rproc *rproc)
rproc, &rproc_rsc_table_fops);
debugfs_create_file("carveout_memories", 0400, rproc->dbg_dir,
rproc, &rproc_carveouts_fops);
+ debugfs_create_file("sync_flags", 0400, rproc->dbg_dir,
+ rproc, &rproc_sync_flags_fops);
}

void __init rproc_init_debugfs(void)
--
2.20.1
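
For reference, the text produced by the new "sync_flags" debugfs entry can be reproduced with a small userspace sketch. The format strings are taken from the patch; everything else is an assumption: the flags are modelled as booleans, snprintf() replaces seq_printf(), and model_sync_flags_show() and model_bool_str() are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed layout: only the field names come from the patch. */
struct rproc_sync_flags {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

struct rproc {
	bool sync_with_rproc;
	struct rproc_sync_flags sync_flags;
};

static const char *model_bool_str(bool v)
{
	return v ? "true" : "false";
}

/* Formats the four lines the debugfs entry prints, in the same order. */
int model_sync_flags_show(const struct rproc *rproc, char *buf, size_t len)
{
	return snprintf(buf, len,
			"Sync with rproc: %s\n"
			"On init: %s\n"
			"After stop: %s\n"
			"After crash: %s\n",
			model_bool_str(rproc->sync_with_rproc),
			model_bool_str(rproc->sync_flags.on_init),
			model_bool_str(rproc->sync_flags.after_stop),
			model_bool_str(rproc->sync_flags.after_crash));
}
```

The first line reflects the flag currently in effect, while the remaining three reflect the configuration handed to rproc_set_state_machine().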

2020-04-24 20:04:31

by Mathieu Poirier

Subject: [PATCH v3 13/14] remoteproc: Document function rproc_set_state_machine()

Add a few words on the newly added rproc_set_state_machine()
in order to advertise the new API and help put people into
context.

Signed-off-by: Mathieu Poirier <[email protected]>
---
Documentation/remoteproc.txt | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/Documentation/remoteproc.txt b/Documentation/remoteproc.txt
index 2be1147256e0..550ed9a06a27 100644
--- a/Documentation/remoteproc.txt
+++ b/Documentation/remoteproc.txt
@@ -132,6 +132,23 @@ On success, the new rproc is returned, and on failure, NULL.
**never** directly deallocate @rproc, even if it was not registered
yet. Instead, when you need to unroll rproc_alloc(), use rproc_free().

+::
+
+ int rproc_set_state_machine(struct rproc *rproc,
+ const struct rproc_ops *sync_ops,
+ struct rproc_sync_flags sync_flags)
+
+This function should be called for cases where the remote processor has
+been started by another entity, be it a boot loader or trusted environment,
+and the remoteproc core is to synchronise with the remote processor rather
+than boot it. The synchronisation flags @sync_flags tell the core whether
+it should synchronise with a remote processor when the core initialises, after
+a remote processor has crashed and after it was voluntarily stopped. Operations
+provided in the @sync_ops should reflect the reality of the use case. For
+example if the remoteproc core is to synchronise with a remote processor at
+initialisation time, sync_ops::find_loaded_rsc_table should provide a pointer to
+the resource table in memory rather than fetch it from the firmware image.
+
::

void rproc_free(struct rproc *rproc)
--
2.20.1

2020-04-24 20:04:43

by Mathieu Poirier

Subject: [PATCH v3 10/14] remoteproc: Deal with synchronisation when shutting down

The remoteproc core must not allow function rproc_shutdown() to
proceed if currently synchronising with a remote processor and
the synchronisation operations of that remote processor do not
support it. Also part of the process is to set the synchronisation
flag so that the remoteproc core can make the right decisions when
restarting the system.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 32 ++++++++++++++++++++++++
drivers/remoteproc/remoteproc_internal.h | 7 ++++++
2 files changed, 39 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 3a84a38ba37b..48afa1f80a8f 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1849,6 +1849,27 @@ int rproc_boot(struct rproc *rproc)
}
EXPORT_SYMBOL(rproc_boot);

+static bool rproc_can_shutdown(struct rproc *rproc)
+{
+ /*
+ * The remoteproc core is the lifecycle manager, no problem
+ * calling for a shutdown.
+ */
+ if (!rproc_needs_syncing(rproc))
+ return true;
+
+ /*
+ * The remoteproc has been loaded by another entity (as per above
+ * condition) and the platform code has given us the capability
+ * of stopping it.
+ */
+ if (rproc->sync_ops->stop)
+ return true;
+
+ /* Any other condition should not be allowed */
+ return false;
+}
+
/**
* rproc_shutdown() - power off the remote processor
* @rproc: the remote processor
@@ -1879,6 +1900,9 @@ void rproc_shutdown(struct rproc *rproc)
return;
}

+ if (!rproc_can_shutdown(rproc))
+ goto out;
+
/* if the remote proc is still needed, bail out */
if (!atomic_dec_and_test(&rproc->power))
goto out;
@@ -1898,6 +1922,14 @@ void rproc_shutdown(struct rproc *rproc)
kfree(rproc->cached_table);
rproc->cached_table = NULL;
rproc->table_ptr = NULL;
+
+ /*
+ * The remote processor has been switched off - tell the core what
+ * operation to use from here on, i.e. whether an external entity will
+ * reboot the remote processor or it is now the remoteproc core's
+ * responsibility.
+ */
+ rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_SHUTDOWN);
out:
mutex_unlock(&rproc->lock);
}
diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
index 61500981155c..7dcc0a26892b 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -27,6 +27,9 @@ struct rproc_debug_trace {
/*
* enum rproc_sync_states - remote processsor sync states
*
+ * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
+ * has shutdown (rproc_shutdown()) the
+ * remote processor.
* @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
* has crashed but has not been recovered by
* the remoteproc core yet.
@@ -36,6 +39,7 @@ struct rproc_debug_trace {
* operation to use.
*/
enum rproc_sync_states {
+ RPROC_SYNC_STATE_SHUTDOWN,
RPROC_SYNC_STATE_CRASHED,
};

@@ -43,6 +47,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
enum rproc_sync_states state)
{
switch (state) {
+ case RPROC_SYNC_STATE_SHUTDOWN:
+ rproc->sync_with_rproc = rproc->sync_flags.after_stop;
+ break;
case RPROC_SYNC_STATE_CRASHED:
rproc->sync_with_rproc = rproc->sync_flags.after_crash;
break;
--
2.20.1
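
The interplay between rproc_can_shutdown() and the after_stop flag can be tried out in userspace. The sketch below relies on assumptions: the structures are cut-down stand-ins, the flags are modelled as booleans, the enum and function names prefixed with model_ or MODEL_ are hypothetical, and the NULL check on sync_ops is a defensive addition of this sketch that is not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

struct rproc;

struct rproc_ops {
	int (*stop)(struct rproc *rproc);
};

/* Assumed layout: only the field names come from the series. */
struct rproc_sync_flags {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

enum model_sync_state {
	MODEL_SYNC_SHUTDOWN,
	MODEL_SYNC_CRASHED,
};

struct rproc {
	const struct rproc_ops *sync_ops;
	struct rproc_sync_flags sync_flags;
	bool sync_with_rproc;
};

/* A shutdown is allowed unless we are syncing without a stop callback. */
bool model_can_shutdown(const struct rproc *rproc)
{
	if (!rproc->sync_with_rproc)
		return true;

	/* The sync_ops NULL check is an addition of this sketch. */
	return rproc->sync_ops && rproc->sync_ops->stop;
}

/* Pick the flag that governs the next start of the remote processor. */
void model_set_sync_flag(struct rproc *rproc, enum model_sync_state state)
{
	switch (state) {
	case MODEL_SYNC_SHUTDOWN:
		rproc->sync_with_rproc = rproc->sync_flags.after_stop;
		break;
	case MODEL_SYNC_CRASHED:
		rproc->sync_with_rproc = rproc->sync_flags.after_crash;
		break;
	}
}

/* Demo stop callback that always succeeds. */
int demo_stop(struct rproc *rproc)
{
	(void)rproc;
	return 0;
}
```

With after_stop cleared, a successful shutdown hands the lifecycle back to the remoteproc core, so the next start would boot the remote processor rather than synchronise with it.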

2020-04-24 20:04:59

by Mathieu Poirier

Subject: [PATCH v3 06/14] remoteproc: Refactor function rproc_trigger_auto_boot()

Refactor function rproc_trigger_auto_boot() so that it can deal with
scenarios where the remote processor is already running. As such give
it a new name to better represent the capabilities and add a call to
rproc_boot() if instructed by the platform code to synchronise with the
remote processor rather than boot it from scratch.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index e90a21de9de1..9de0e2b7ca2b 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1457,10 +1457,17 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context)
release_firmware(fw);
}

-static int rproc_trigger_auto_boot(struct rproc *rproc)
+static int rproc_trigger_auto_initiate(struct rproc *rproc)
{
int ret;

+ /*
+ * If the remote processor is already booted, all we need to do is
+ * synchronise with it. No point in dealing with a firmware image.
+ */
+ if (rproc_needs_syncing(rproc))
+ return rproc_boot(rproc);
+
/*
* We're initiating an asynchronous firmware loading, so we can
* be built-in kernel code, without hanging the boot process.
@@ -1971,9 +1978,12 @@ int rproc_add(struct rproc *rproc)
/* create debugfs entries */
rproc_create_debug_dir(rproc);

- /* if rproc is marked always-on, request it to boot */
+ /*
+ * If the auto boot flag is set, request to boot the remote
+ * processor or synchronise with it.
+ */
if (rproc->auto_boot) {
- ret = rproc_trigger_auto_boot(rproc);
+ ret = rproc_trigger_auto_initiate(rproc);
if (ret < 0)
return ret;
}
--
2.20.1

2020-04-24 20:05:39

by Mathieu Poirier

Subject: [PATCH v3 01/14] remoteproc: Make core operations optional

When synchronizing with a remote processor, it is entirely possible that
the remoteproc core is not the life cycle manager. In such a case core
operations don't exist and should not be called.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_internal.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
index b389dc79da81..59fc871743c7 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -67,7 +67,7 @@ rproc_find_carveout_by_name(struct rproc *rproc, const char *name, ...);
static inline
int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
{
- if (rproc->ops->sanity_check)
+ if (rproc->ops && rproc->ops->sanity_check)
return rproc->ops->sanity_check(rproc, fw);

return 0;
@@ -76,7 +76,7 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
static inline
u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
{
- if (rproc->ops->get_boot_addr)
+ if (rproc->ops && rproc->ops->get_boot_addr)
return rproc->ops->get_boot_addr(rproc, fw);

return 0;
@@ -85,7 +85,7 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
static inline
int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
{
- if (rproc->ops->load)
+ if (rproc->ops && rproc->ops->load)
return rproc->ops->load(rproc, fw);

return -EINVAL;
@@ -93,7 +93,7 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)

static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
{
- if (rproc->ops->parse_fw)
+ if (rproc->ops && rproc->ops->parse_fw)
return rproc->ops->parse_fw(rproc, fw);

return 0;
@@ -103,7 +103,7 @@ static inline
int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
int avail)
{
- if (rproc->ops->handle_rsc)
+ if (rproc->ops && rproc->ops->handle_rsc)
return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
avail);

@@ -114,7 +114,7 @@ static inline
struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
const struct firmware *fw)
{
- if (rproc->ops->find_loaded_rsc_table)
+ if (rproc->ops && rproc->ops->find_loaded_rsc_table)
return rproc->ops->find_loaded_rsc_table(rproc, fw);

return NULL;
--
2.20.1

2020-04-24 20:05:50

by Mathieu Poirier

Subject: [PATCH v3 04/14] remoteproc: Refactor function rproc_boot()

Refactor function rproc_boot() in order to properly handle
cases where the core needs to synchronise with a remote processor
rather than booting it.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 25 ++++++++++++++++--------
drivers/remoteproc/remoteproc_internal.h | 5 +++++
2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 1b4756909584..a02593b75bec 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1762,7 +1762,9 @@ static void rproc_crash_handler_work(struct work_struct *work)
* rproc_boot() - boot a remote processor
* @rproc: handle of a remote processor
*
- * Boot a remote processor (i.e. load its firmware, power it on, ...).
+ * Boot or synchronise with a remote processor. In the former case the
+ * firmware is loaded and the remote processor powered on, in the latter
+ * those steps are simply skipped.
*
* If the remote processor is already powered on, this function immediately
* returns (successfully).
@@ -1771,8 +1773,9 @@ static void rproc_crash_handler_work(struct work_struct *work)
*/
int rproc_boot(struct rproc *rproc)
{
- const struct firmware *firmware_p;
+ const struct firmware *firmware_p = NULL;
struct device *dev;
+ bool syncing;
int ret;

if (!rproc) {
@@ -1788,6 +1791,8 @@ int rproc_boot(struct rproc *rproc)
return ret;
}

+ syncing = rproc_needs_syncing(rproc);
+
if (rproc->state == RPROC_DELETED) {
ret = -ENODEV;
dev_err(dev, "can't boot deleted rproc %s\n", rproc->name);
@@ -1800,13 +1805,17 @@ int rproc_boot(struct rproc *rproc)
goto unlock_mutex;
}

- dev_info(dev, "powering up %s\n", rproc->name);
+ dev_info(dev, "%s %s\n",
+ !syncing ? "powering up" : "syncing with", rproc->name);

- /* load firmware */
- ret = request_firmware(&firmware_p, rproc->firmware, dev);
- if (ret < 0) {
- dev_err(dev, "request_firmware failed: %d\n", ret);
- goto downref_rproc;
+
+ /* load firmware if not syncing with remote processor */
+ if (!syncing) {
+ ret = request_firmware(&firmware_p, rproc->firmware, dev);
+ if (ret < 0) {
+ dev_err(dev, "request_firmware failed: %d\n", ret);
+ goto downref_rproc;
+ }
}

ret = rproc_fw_boot(rproc, firmware_p);
diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
index 59fc871743c7..47b500e40dd9 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -64,6 +64,11 @@ struct resource_table *rproc_elf_find_loaded_rsc_table(struct rproc *rproc,
struct rproc_mem_entry *
rproc_find_carveout_by_name(struct rproc *rproc, const char *name, ...);

+static inline bool rproc_needs_syncing(struct rproc *rproc)
+{
+ return rproc->sync_with_rproc;
+}
+
static inline
int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
{
--
2.20.1

2020-04-24 20:05:50

by Mathieu Poirier

Subject: [PATCH v3 09/14] remoteproc: Deal with synchronisation when crashing

Refactor function rproc_trigger_recovery() in order to avoid
reloading the firmware image when synchronising with a remote
processor rather than booting it. Also part of the process,
properly set the synchronisation flag in order to properly
recover the system.

Signed-off-by: Mathieu Poirier <[email protected]>
---
drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------
drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++
2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index ef88d3e84bfb..3a84a38ba37b 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc)
*/
int rproc_trigger_recovery(struct rproc *rproc)
{
- const struct firmware *firmware_p;
+ const struct firmware *firmware_p = NULL;
struct device *dev = &rproc->dev;
int ret;

@@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
/* generate coredump */
rproc_coredump(rproc);

- /* load firmware */
- ret = request_firmware(&firmware_p, rproc->firmware, dev);
- if (ret < 0) {
- dev_err(dev, "request_firmware failed: %d\n", ret);
- goto unlock_mutex;
+ /* load firmware if need be */
+ if (!rproc_needs_syncing(rproc)) {
+ ret = request_firmware(&firmware_p, rproc->firmware, dev);
+ if (ret < 0) {
+ dev_err(dev, "request_firmware failed: %d\n", ret);
+ goto unlock_mutex;
+ }
}

- /* boot the remote processor up again */
+ /* boot up or synchronise with the remote processor again */
ret = rproc_start(rproc, firmware_p);

release_firmware(firmware_p);
@@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work)
dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
rproc->name);

+ /*
+ * The remote processor has crashed - tell the core what operation
+ * to use from here on, i.e. whether an external entity will reboot
+ * the MCU or it is now the remoteproc core's responsibility.
+ */
+ rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED);
+
mutex_unlock(&rproc->lock);

if (!rproc->recovery_disabled)
diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
index 3985c084b184..61500981155c 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -24,6 +24,33 @@ struct rproc_debug_trace {
struct rproc_mem_entry trace_mem;
};

+/*
+ * enum rproc_sync_states - remote processor sync states
+ *
+ * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
+ * has crashed but has not been recovered by
+ * the remoteproc core yet.
+ *
+ * Keeping these separate from the enum rproc_state in order to avoid
+ * introducing coupling between the state of the MCU and the synchronisation
+ * operation to use.
+ */
+enum rproc_sync_states {
+ RPROC_SYNC_STATE_CRASHED,
+};
+
+static inline void rproc_set_sync_flag(struct rproc *rproc,
+ enum rproc_sync_states state)
+{
+ switch (state) {
+ case RPROC_SYNC_STATE_CRASHED:
+ rproc->sync_with_rproc = rproc->sync_flags.after_crash;
+ break;
+ default:
+ break;
+ }
+}
+
/* from remoteproc_core.c */
void rproc_release(struct kref *kref);
irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id);
--
2.20.1

2020-04-24 20:07:22

by Mathieu Poirier

Subject: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

Add a new sync_ops to support use cases where the remoteproc
core is synchronising with the remote processor. Exactly when to use
the synchronisation operations is directed by the flags in structure
rproc_sync_flags.

Signed-off-by: Mathieu Poirier <[email protected]>
---
include/linux/remoteproc.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
index ac4082f12e8b..ceb3b2bba824 100644
--- a/include/linux/remoteproc.h
+++ b/include/linux/remoteproc.h
@@ -353,6 +353,23 @@ enum rsc_handling_status {
RSC_IGNORED = 1,
};

+/**
+ * struct rproc_sync_flags - platform specific flags indicating which
+ * rproc_ops to use at specific times during
+ * the rproc lifecycle.
+ * @on_init: true if synchronising with the remote processor at
+ * initialisation time
+ * @after_stop: true if synchronising with the remote processor after it was
+ * stopped from the cmmand line
+ * @after_crash: true if synchronising with the remote processor after
+ * it has crashed
+ */
+struct rproc_sync_flags {
+ bool on_init;
+ bool after_stop;
+ bool after_crash;
+};
+
/**
* struct rproc_ops - platform-specific device handlers
* @start: power on the device and boot it
@@ -459,6 +476,9 @@ struct rproc_dump_segment {
* @firmware: name of firmware file to be loaded
* @priv: private data which belongs to the platform-specific rproc module
* @ops: platform-specific start/stop rproc handlers
+ * @sync_ops: platform-specific start/stop rproc handlers when
+ * synchronising with a remote processor.
+ * @sync_flags: Determine the rproc_ops to choose in specific states.
* @dev: virtual device for refcounting and common remoteproc behavior
* @power: refcount of users who need this rproc powered up
* @state: state of the device
@@ -482,6 +502,7 @@ struct rproc_dump_segment {
* @table_sz: size of @cached_table
* @has_iommu: flag to indicate if remote processor is behind an MMU
* @auto_boot: flag to indicate if remote processor should be auto-started
+ * @sync_with_rproc: true if currently synchronising with the rproc
* @dump_segments: list of segments in the firmware
* @nb_vdev: number of vdev currently handled by rproc
*/
@@ -492,6 +513,8 @@ struct rproc {
const char *firmware;
void *priv;
struct rproc_ops *ops;
+ struct rproc_ops *sync_ops;
+ struct rproc_sync_flags sync_flags;
struct device dev;
atomic_t power;
unsigned int state;
@@ -515,6 +538,7 @@ struct rproc {
size_t table_sz;
bool has_iommu;
bool auto_boot;
+ bool sync_with_rproc;
struct list_head dump_segments;
int nb_vdev;
u8 elf_class;
--
2.20.1

2020-04-28 16:23:32

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 01/14] remoteproc: Make core operations optional

Hi Mathieu,

On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> When synchronizing with a remote processor, it is entirely possible that
> the remoteproc core is not the life cycle manager. In such a case core
> operations don't exist and should not be called.

What about the ops used in remoteproc_core.c?
With the series applied, it seems that at least rproc->ops->panic and
rproc->ops->da_to_va can still be called with an undefined ops structure.
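
A rough sketch of the kind of guard this would need, mirroring the checks added
in remoteproc_internal.h. The structures are simplified user-space stand-ins and
the wrapper names (rproc_da_to_va_safe, rproc_panic_safe) are hypothetical, not
functions from the series:

```c
#include <stddef.h>
#include <stdint.h>

struct rproc;

/* Simplified stand-in for struct rproc_ops: only the two handlers at issue. */
struct rproc_ops {
	void *(*da_to_va)(struct rproc *rproc, uint64_t da, size_t len);
	unsigned long (*panic)(struct rproc *rproc);
};

/* Simplified stand-in for struct rproc. */
struct rproc {
	const struct rproc_ops *ops;	/* may be NULL when only synchronising */
};

/* Hypothetical wrapper: translate a device address only if a handler exists. */
static inline void *rproc_da_to_va_safe(struct rproc *rproc, uint64_t da,
					size_t len)
{
	if (rproc->ops && rproc->ops->da_to_va)
		return rproc->ops->da_to_va(rproc, da, len);

	return NULL;
}

/* Hypothetical wrapper: call the panic handler only if one is defined. */
static inline unsigned long rproc_panic_safe(struct rproc *rproc)
{
	if (rproc->ops && rproc->ops->panic)
		return rproc->ops->panic(rproc);

	return 0;
}
```

Whether returning NULL and 0 is the right fallback for these two callers is an
assumption; the point is only that the ops pointer is tested before use.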

Regards,

Arnaud

>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_internal.h | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index b389dc79da81..59fc871743c7 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -67,7 +67,7 @@ rproc_find_carveout_by_name(struct rproc *rproc, const char *name, ...);
> static inline
> int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> {
> - if (rproc->ops->sanity_check)
> + if (rproc->ops && rproc->ops->sanity_check)
> return rproc->ops->sanity_check(rproc, fw);
>
> return 0;
> @@ -76,7 +76,7 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> static inline
> u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> {
> - if (rproc->ops->get_boot_addr)
> + if (rproc->ops && rproc->ops->get_boot_addr)
> return rproc->ops->get_boot_addr(rproc, fw);
>
> return 0;
> @@ -85,7 +85,7 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> static inline
> int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> {
> - if (rproc->ops->load)
> + if (rproc->ops && rproc->ops->load)
> return rproc->ops->load(rproc, fw);
>
> return -EINVAL;
> @@ -93,7 +93,7 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
>
> static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
> {
> - if (rproc->ops->parse_fw)
> + if (rproc->ops && rproc->ops->parse_fw)
> return rproc->ops->parse_fw(rproc, fw);
>
> return 0;
> @@ -103,7 +103,7 @@ static inline
> int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
> int avail)
> {
> - if (rproc->ops->handle_rsc)
> + if (rproc->ops && rproc->ops->handle_rsc)
> return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
> avail);
>
> @@ -114,7 +114,7 @@ static inline
> struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> const struct firmware *fw)
> {
> - if (rproc->ops->find_loaded_rsc_table)
> + if (rproc->ops && rproc->ops->find_loaded_rsc_table)
> return rproc->ops->find_loaded_rsc_table(rproc, fw);
>
> return NULL;
>

2020-04-28 16:40:48

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation



On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> Add a new sync_ops to support use cases where the remoteproc
> core is synchronising with the remote processor. Exactly when to use
> the synchronisation operations is directed by the flags in structure
> rproc_sync_flags.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> include/linux/remoteproc.h | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index ac4082f12e8b..ceb3b2bba824 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -353,6 +353,23 @@ enum rsc_handling_status {
> RSC_IGNORED = 1,
> };
>
> +/**
> + * struct rproc_sync_flags - platform specific flags indicating which
> + * rproc_ops to use at specific times during
> + * the rproc lifecycle.
> + * @on_init: true if synchronising with the remote processor at
> + * initialisation time
> + * @after_stop: true if synchronising with the remote processor after it was
> + * stopped from the cmmand line
Typo: command
> + * @after_crash: true if synchronising with the remote processor after
> + * it has crashed
> + */
> +struct rproc_sync_flags {
> + bool on_init;
> + bool after_stop;
> + bool after_crash;
> +};
> +
How about a bit field instead (just a suggestion)?
The platform driver would set the sync flags, and rproc_set_sync_flag() could
apply a simple mask instead of a switch statement.
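
A minimal sketch of the bit field idea, assuming one bit per lifecycle event.
The RPROC_SYNC_* mask names and the reduced struct rproc below are hypothetical
and only illustrate the masking; they are not part of the series:

```c
#include <stdbool.h>

/* Hypothetical bit definitions replacing the three booleans. */
#define RPROC_SYNC_ON_INIT	(1U << 0)
#define RPROC_SYNC_AFTER_STOP	(1U << 1)
#define RPROC_SYNC_AFTER_CRASH	(1U << 2)

/* Reduced stand-in for struct rproc: only the synchronisation fields. */
struct rproc {
	unsigned int sync_flags;	/* set once by the platform driver */
	bool sync_with_rproc;		/* current mode used by the core */
};

/* The switch statement collapses into a single mask test. */
static inline void rproc_set_sync_flag(struct rproc *rproc, unsigned int state)
{
	rproc->sync_with_rproc = !!(rproc->sync_flags & state);
}
```

With this layout a platform driver would set, for example,
RPROC_SYNC_ON_INIT | RPROC_SYNC_AFTER_CRASH, and adding a new state would only
require a new bit.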

Would it be possible to split this patch differently? It is difficult to
understand because rproc_sync_flags does not seem to be used before
[PATCH v3 09/14] remoteproc: Deal with synchronisation when crashing

Thanks
Arnaud

> /**
> * struct rproc_ops - platform-specific device handlers
> * @start: power on the device and boot it
> @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> * @firmware: name of firmware file to be loaded
> * @priv: private data which belongs to the platform-specific rproc module
> * @ops: platform-specific start/stop rproc handlers
> + * @sync_ops: platform-specific start/stop rproc handlers when
> + * synchronising with a remote processor.
> + * @sync_flags: Determine the rproc_ops to choose in specific states.
> * @dev: virtual device for refcounting and common remoteproc behavior
> * @power: refcount of users who need this rproc powered up
> * @state: state of the device
> @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> * @table_sz: size of @cached_table
> * @has_iommu: flag to indicate if remote processor is behind an MMU
> * @auto_boot: flag to indicate if remote processor should be auto-started
> + * @sync_with_rproc: true if currently synchronising with the rproc
> * @dump_segments: list of segments in the firmware
> * @nb_vdev: number of vdev currently handled by rproc
> */
> @@ -492,6 +513,8 @@ struct rproc {
> const char *firmware;
> void *priv;
> struct rproc_ops *ops;
> + struct rproc_ops *sync_ops;
> + struct rproc_sync_flags sync_flags;
> struct device dev;
> atomic_t power;
> unsigned int state;
> @@ -515,6 +538,7 @@ struct rproc {
> size_t table_sz;
> bool has_iommu;
> bool auto_boot;
> + bool sync_with_rproc;
> struct list_head dump_segments;
> int nb_vdev;
> u8 elf_class;
>

2020-04-28 17:03:16

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 06/14] remoteproc: Refactor function rproc_trigger_auto_boot()



On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> Refactor function rproc_trigger_auto_boot() so that it can deal with
> scenarios where the remote processor is already running. As such give
> it a new name to better represent the capabilities and add a call to
> rproc_boot() if instructed by the platform code to synchronise with the
> remote processor rather than boot it from scratch.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index e90a21de9de1..9de0e2b7ca2b 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1457,10 +1457,17 @@ static void rproc_auto_boot_callback(const struct firmware *fw, void *context)
> release_firmware(fw);
> }
>
> -static int rproc_trigger_auto_boot(struct rproc *rproc)
> +static int rproc_trigger_auto_initiate(struct rproc *rproc)
> {
> int ret;
>
> + /*
> + * If the remote processor is already booted, all we need to do is
> + * synchronise it it. No point in dealing with a firmware image.
Typo: remove the duplicated "it" and the double space.

> + */
> + if (rproc_needs_syncing(rproc))
> + return rproc_boot(rproc);
> +
> /*
> * We're initiating an asynchronous firmware loading, so we can
> * be built-in kernel code, without hanging the boot process.
> @@ -1971,9 +1978,12 @@ int rproc_add(struct rproc *rproc)
> /* create debugfs entries */
> rproc_create_debug_dir(rproc);
>
> - /* if rproc is marked always-on, request it to boot */
> + /*
> + * If the auto boot flag is set, request to boot the remote
> + * processor or synchronise with it.
> + */
> if (rproc->auto_boot) {
> - ret = rproc_trigger_auto_boot(rproc);
> + ret = rproc_trigger_auto_initiate(rproc);
> if (ret < 0)
> return ret;
> }
>

2020-04-28 17:31:58

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 08/14] remoteproc: Call core functions based on synchronisation flag



On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> Call the right core function based on whether we should synchronise
> with a remote processor or boot it from scratch.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_internal.h | 50 ++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index dda7044c4b3e..3985c084b184 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -72,6 +72,12 @@ static inline bool rproc_needs_syncing(struct rproc *rproc)
> static inline
> int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> {
> + if (rproc_needs_syncing(rproc)) {
> + if (rproc->sync_ops && rproc->sync_ops->sanity_check)
> + return rproc->sync_ops->sanity_check(rproc, fw);
> + return 0;
> + }
> +
> if (rproc->ops && rproc->ops->sanity_check)
> return rproc->ops->sanity_check(rproc, fw);

Regarding this patch, I'm trying to determine whether it makes sense to have ops or
sync_ops set to NULL. The commit message of your [v3 01/14] patch explains that ops can be
NULL when synchronising, but that seems obsolete now that sync_ops is introduced...

And if sync_ops is NULL, is it still necessary to define a remoteproc device?

Regards,
Arnaud

>
> @@ -81,6 +87,12 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> static inline
> u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> {
> + if (rproc_needs_syncing(rproc)) {
> + if (rproc->sync_ops && rproc->sync_ops->get_boot_addr)
> + return rproc->sync_ops->get_boot_addr(rproc, fw);
> + return 0;
> + }
> +
> if (rproc->ops && rproc->ops->get_boot_addr)
> return rproc->ops->get_boot_addr(rproc, fw);
>
> @@ -90,6 +102,12 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> static inline
> int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> {
> + if (rproc_needs_syncing(rproc)) {
> + if (rproc->sync_ops && rproc->sync_ops->load)
> + return rproc->sync_ops->load(rproc, fw);
> + return 0;
> + }
> +
> if (rproc->ops && rproc->ops->load)
> return rproc->ops->load(rproc, fw);
>
> @@ -98,6 +116,12 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
>
> static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
> {
> + if (rproc_needs_syncing(rproc)) {
> + if (rproc->sync_ops && rproc->sync_ops->parse_fw)
> + return rproc->sync_ops->parse_fw(rproc, fw);
> + return 0;
> + }
> +
> if (rproc->ops && rproc->ops->parse_fw)
> return rproc->ops->parse_fw(rproc, fw);
>
> @@ -108,6 +132,13 @@ static inline
> int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
> int avail)
> {
> + if (rproc_needs_syncing(rproc)) {
> + if (rproc->sync_ops && rproc->sync_ops->handle_rsc)
> + return rproc->sync_ops->handle_rsc(rproc, rsc_type,
> + rsc, offset, avail);
> + return 0;
> + }
> +
> if (rproc->ops && rproc->ops->handle_rsc)
> return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
> avail);
> @@ -119,6 +150,13 @@ static inline
> struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> const struct firmware *fw)
> {
> + if (rproc_needs_syncing(rproc)) {
> + if (rproc->sync_ops && rproc->sync_ops->find_loaded_rsc_table)
> + return rproc->sync_ops->find_loaded_rsc_table(rproc,
> + fw);
> + return NULL;
> + }
> +
> if (rproc->ops && rproc->ops->find_loaded_rsc_table)
> return rproc->ops->find_loaded_rsc_table(rproc, fw);
>
> @@ -127,6 +165,12 @@ struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
>
> static inline int rproc_start_device(struct rproc *rproc)
> {
> + if (rproc_needs_syncing(rproc)) {
> + if (rproc->sync_ops && rproc->sync_ops->start)
> + return rproc->sync_ops->start(rproc);
> + return 0;
> + }
> +
> if (rproc->ops && rproc->ops->start)
> return rproc->ops->start(rproc);
>
> @@ -135,6 +179,12 @@ static inline int rproc_start_device(struct rproc *rproc)
>
> static inline int rproc_stop_device(struct rproc *rproc)
> {
> + if (rproc_needs_syncing(rproc)) {
> + if (rproc->sync_ops && rproc->sync_ops->stop)
> + return rproc->sync_ops->stop(rproc);
> + return 0;
> + }
> +
> if (rproc->ops && rproc->ops->stop)
> return rproc->ops->stop(rproc);
>
>

2020-04-29 07:46:38

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 09/14] remoteproc: Deal with synchronisation when crashing

Hi Mathieu,

On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> Refactor function rproc_trigger_recovery() in order to avoid
> reloading the firmware image when synchronising with a remote
> processor rather than booting it. Also part of the process,
> properly set the synchronisation flag in order to properly
> recover the system.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------
> drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++
> 2 files changed, 43 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index ef88d3e84bfb..3a84a38ba37b 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc)
> */
> int rproc_trigger_recovery(struct rproc *rproc)
> {
> - const struct firmware *firmware_p;
> + const struct firmware *firmware_p = NULL;
> struct device *dev = &rproc->dev;
> int ret;
>
> @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> /* generate coredump */
> rproc_coredump(rproc);
>
> - /* load firmware */
> - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> - if (ret < 0) {
> - dev_err(dev, "request_firmware failed: %d\n", ret);
> - goto unlock_mutex;
> + /* load firmware if need be */
> + if (!rproc_needs_syncing(rproc)) {
> + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> + if (ret < 0) {
> + dev_err(dev, "request_firmware failed: %d\n", ret);
> + goto unlock_mutex;
> + }

If we started in syncing mode then rproc->firmware is NULL, and
rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED) can make
rproc_needs_syncing(rproc) return false.
In that case the recovery fails here and the remote processor is left in the
RPROC_STOP state.
As you proposed in Loic's RFC [1], what about adding a more explicit message to
report that the recovery failed?

[1]https://lkml.org/lkml/2020/3/11/402
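
A rough sketch of such a check, written as user-space C for illustration. The
helper name rproc_recovery_check, the reduced struct rproc, the -ENOENT return
value and the message text are all assumptions; in the kernel the message would
go through dev_err() rather than fprintf():

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Reduced stand-in for struct rproc: only the fields needed here. */
struct rproc {
	const char *name;
	const char *firmware;	/* NULL when an external entity loaded the image */
	bool sync_with_rproc;	/* value after rproc_set_sync_flag() has run */
};

/* Hypothetical helper: report explicitly why the recovery cannot proceed. */
static int rproc_recovery_check(const struct rproc *rproc)
{
	/* Nothing to reload when synchronising with the remote processor. */
	if (rproc->sync_with_rproc)
		return 0;

	if (!rproc->firmware) {
		fprintf(stderr,
			"%s: recovery failed: no firmware image to reload\n",
			rproc->name);
		return -ENOENT;
	}

	return 0;
}
```

The intent is that the failure is reported before request_firmware() is called
with a NULL name, so the log states the actual cause.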

Regards,
Arnaud
> }
>
> - /* boot the remote processor up again */
> + /* boot up or synchronise with the remote processor again */
> ret = rproc_start(rproc, firmware_p);
>
> release_firmware(firmware_p);
> @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work)
> dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
> rproc->name);
>
> + /*
> + * The remote processor has crashed - tell the core what operation
> + * to use from here on, i.e. whether an external entity will reboot
> + * the MCU or it is now the remoteproc core's responsibility.
> + */
> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED);
> +
> mutex_unlock(&rproc->lock);
>
> if (!rproc->recovery_disabled)
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index 3985c084b184..61500981155c 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -24,6 +24,33 @@ struct rproc_debug_trace {
> struct rproc_mem_entry trace_mem;
> };
>
> +/*
> + * enum rproc_sync_states - remote processor sync states
> + *
> + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> + * has crashed but has not been recovered by
> + * the remoteproc core yet.
> + *
> + * Keeping these separate from the enum rproc_state in order to avoid
> + * introducing coupling between the state of the MCU and the synchronisation
> + * operation to use.
> + */
> +enum rproc_sync_states {
> + RPROC_SYNC_STATE_CRASHED,
> +};
> +
> +static inline void rproc_set_sync_flag(struct rproc *rproc,
> + enum rproc_sync_states state)
> +{
> + switch (state) {
> + case RPROC_SYNC_STATE_CRASHED:
> + rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> + break;
> + default:
> + break;
> + }
> +}
> +
> /* from remoteproc_core.c */
> void rproc_release(struct kref *kref);
> irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id);
>

2020-04-29 08:22:04

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 10/14] remoteproc: Deal with synchronisation when shutting down



On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> The remoteproc core must not allow function rproc_shutdown() to
> proceed if currently synchronising with a remote processor and
> the synchronisation operations of that remote processor does not
> support it. Also part of the process is to set the synchronisation
> flag so that the remoteproc core can make the right decisions when
> restarting the system.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 32 ++++++++++++++++++++++++
> drivers/remoteproc/remoteproc_internal.h | 7 ++++++
> 2 files changed, 39 insertions(+)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 3a84a38ba37b..48afa1f80a8f 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1849,6 +1849,27 @@ int rproc_boot(struct rproc *rproc)
> }
> EXPORT_SYMBOL(rproc_boot);
>
> +static bool rproc_can_shutdown(struct rproc *rproc)
> +{
> + /*
> + * The remoteproc core is the lifecycle manager, no problem
> + * calling for a shutdown.
> + */
> + if (!rproc_needs_syncing(rproc))
> + return true;
> +
> + /*
> + * The remoteproc has been loaded by another entity (as per above
> + * condition) and the platform code has given us the capability
> + * of stopping it.
> + */
> + if (rproc->sync_ops->stop)
> + return true;

Does this mean that rproc_stop_subdevices() will not be called if
rproc->sync_ops->stop is NULL? That does not seem symmetric with the start sequence.
Testing it here is probably unnecessary, as the condition is already handled in
rproc_stop_device()...

Regards
Arnaud
> +
> + /* Any other condition should not be allowed */
> + return false;
> +}
> +
> /**
> * rproc_shutdown() - power off the remote processor
> * @rproc: the remote processor
> @@ -1879,6 +1900,9 @@ void rproc_shutdown(struct rproc *rproc)
> return;
> }
>
> + if (!rproc_can_shutdown(rproc))
> + goto out;
> +
> /* if the remote proc is still needed, bail out */
> if (!atomic_dec_and_test(&rproc->power))
> goto out;
> @@ -1898,6 +1922,14 @@ void rproc_shutdown(struct rproc *rproc)
> kfree(rproc->cached_table);
> rproc->cached_table = NULL;
> rproc->table_ptr = NULL;
> +
> + /*
> + * The remote processor has been switched off - tell the core what
> + * operation to use from here on, i.e. whether an external entity will
> + * reboot the remote processor or it is now the remoteproc core's
> + * responsibility.
> + */
> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_SHUTDOWN);
> out:
> mutex_unlock(&rproc->lock);
> }
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index 61500981155c..7dcc0a26892b 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -27,6 +27,9 @@ struct rproc_debug_trace {
> /*
> * enum rproc_sync_states - remote processor sync states
> *
> + * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
> + * has shutdown (rproc_shutdown()) the
> + * remote processor.
> * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> * has crashed but has not been recovered by
> * the remoteproc core yet.
> @@ -36,6 +39,7 @@ struct rproc_debug_trace {
> * operation to use.
> */
> enum rproc_sync_states {
> + RPROC_SYNC_STATE_SHUTDOWN,
> RPROC_SYNC_STATE_CRASHED,
> };
>
> @@ -43,6 +47,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
> enum rproc_sync_states state)
> {
> switch (state) {
> + case RPROC_SYNC_STATE_SHUTDOWN:
> + rproc->sync_with_rproc = rproc->sync_flags.after_stop;
> + break;
> case RPROC_SYNC_STATE_CRASHED:
> rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> break;
>

2020-04-29 08:55:51

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 11/14] remoteproc: Deal with synchronisation when changing FW image



On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> This patch prevents the firmware image from being displayed or changed
> when the remoteproc core is synchronising with a remote processor. This
> is needed since there is no guarantee about the nature of the firmware
> image that is loaded by the external entity.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_sysfs.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> index 7f8536b73295..cdd322a6ecfa 100644
> --- a/drivers/remoteproc/remoteproc_sysfs.c
> +++ b/drivers/remoteproc/remoteproc_sysfs.c
> @@ -13,9 +13,20 @@
> static ssize_t firmware_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> + ssize_t ret;
> struct rproc *rproc = to_rproc(dev);
>
> - return sprintf(buf, "%s\n", rproc->firmware);
> + /*
> + * In most instances there is no guarantee about the firmware
> + * that was loaded by the external entity. As such simply don't
> + * print anything.
> + */
> + if (rproc_needs_syncing(rproc))
> + ret = sprintf(buf, "\n");

A default name is provided in sysfs if no firmware is started/synchronised on boot.

IMO providing an empty name here could be confusing.
Perhaps a refactoring of this sysfs entry would be nice:
- Normal boot (no firmware loaded): empty name instead of a default name
- auto_boot: name provided by the platform driver, or the default name (current implementation)
- synchronization: a predefined name such as Default, unknown, External, None, ...
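
The three cases above could be sketched as follows, in user-space C for
illustration. The function name firmware_show_sketch, the reduced struct rproc
and the choice of "unknown" as the predefined name are assumptions, not part of
the patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Reduced stand-in for struct rproc: only the fields needed here. */
struct rproc {
	const char *firmware;	/* NULL when no firmware name is set */
	bool sync_with_rproc;
};

/* Hypothetical variant of firmware_show() covering the three cases. */
static int firmware_show_sketch(const struct rproc *rproc, char *buf,
				size_t len)
{
	const char *name = rproc->firmware;

	if (rproc->sync_with_rproc)
		name = "unknown";	/* image loaded by an external entity */
	else if (!name)
		name = "";		/* normal boot, nothing loaded yet */

	return snprintf(buf, len, "%s\n", name);
}
```

A fixed placeholder lets user space tell "synchronising with an externally
loaded image" apart from "no firmware configured", which an empty line cannot do.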

> + else
> + ret = sprintf(buf, "%s\n", rproc->firmware);
> +
> + return ret;
> }
>
> /* Change firmware name via sysfs */
> @@ -39,6 +50,17 @@ static ssize_t firmware_store(struct device *dev,
> goto out;
> }
>
> + /*
> + * There is no point in trying to change the firmware if loading the
> + * image of the remote processor is done by another entity.
> + */
> + if (rproc_needs_syncing(rproc)) {
> + dev_err(dev,
> + "can't change firmware while synchronising with MCU\n");

I don't know whether you have decided to keep "MCU" or not. If not,
there are also some other instances in your patch 9/14.

Regards
Arnaud

> + err = -EBUSY;
> + goto out;
> + }
> +
> len = strcspn(buf, "\n");
> if (!len) {
> dev_err(dev, "can't provide a NULL firmware\n");
>

2020-04-29 09:25:06

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 12/14] remoteproc: Introducing function rproc_set_state_machine()



On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> Introducing function rproc_set_state_machine() to add
> operations and a set of flags to use when synchronising with
> a remote processor.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 54 ++++++++++++++++++++++++
> drivers/remoteproc/remoteproc_internal.h | 6 +++
> include/linux/remoteproc.h | 3 ++
> 3 files changed, 63 insertions(+)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 48afa1f80a8f..5c48714e8702 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -2065,6 +2065,59 @@ int devm_rproc_add(struct device *dev, struct rproc *rproc)
> }
> EXPORT_SYMBOL(devm_rproc_add);
>
> +/**
> + * rproc_set_state_machine() - Set a synchronisation ops and set of flags
> + * to use with a remote processor
> + * @rproc: The remote processor to work with
> + * @sync_ops: The operations to use when synchronising with a remote
> + * processor
> + * @sync_flags: The flags to use when deciding if the remoteproc core
> + * should be synchronising with a remote processor
> + *
> + * Returns 0 on success, an error code otherwise.
> + */
> +int rproc_set_state_machine(struct rproc *rproc,
> + const struct rproc_ops *sync_ops,
> + struct rproc_sync_flags sync_flags)

So this API should be called by the platform driver only when synchronisation is
supported, right?
In that case I would rename it, as there is also a state machine in the "normal" boot case.
Proposal: rproc_set_sync_machine or rproc_set_sync_state_machine

> +{
> + if (!rproc || !sync_ops)
> + return -EINVAL;
> +
> + /*
> + * No point in going further if we never have to synchronise with
> + * the remote processor.
> + */
> + if (!sync_flags.on_init &&
> + !sync_flags.after_stop && !sync_flags.after_crash)
> + return 0;
> +
> + /*
> + * Refuse to go further if remoteproc operations have been allocated
> + * but they will never be used.
> + */
> + if (rproc->ops && sync_flags.on_init &&
> + sync_flags.after_stop && sync_flags.after_crash)
> + return -EINVAL;
> +
> + /*
> + * Don't allow users to set this more than once to avoid situations
> + * where the remote processor can't be recovered.
> + */
> + if (rproc->sync_ops)
> + return -EINVAL;
> +
> + rproc->sync_ops = kmemdup(sync_ops, sizeof(*sync_ops), GFP_KERNEL);
> + if (!rproc->sync_ops)
> + return -ENOMEM;
> +
> + rproc->sync_flags = sync_flags;
> + /* Tell the core what to do when initialising */
> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT);

Is there a use case where sync_flags.on_init is false and other flags are true?

It looks like on_init is useless and should not be exposed to the platform driver.
Otherwise, comments are missing to explain its usage versus the other flags.

Regards,
Arnaud

> +
> + return 0;
> +}
> +EXPORT_SYMBOL(rproc_set_state_machine);
> +
> /**
> * rproc_type_release() - release a remote processor instance
> * @dev: the rproc's device
> @@ -2088,6 +2141,7 @@ static void rproc_type_release(struct device *dev)
> kfree_const(rproc->firmware);
> kfree_const(rproc->name);
> kfree(rproc->ops);
> + kfree(rproc->sync_ops);
> kfree(rproc);
> }
>
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index 7dcc0a26892b..c1a293a37c78 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -27,6 +27,8 @@ struct rproc_debug_trace {
> /*
> * enum rproc_sync_states - remote processsor sync states
> *
> + * @RPROC_SYNC_STATE_INIT state to use when the remoteproc core
> + * is initialising.
> * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
> * has shutdown (rproc_shutdown()) the
> * remote processor.
> @@ -39,6 +41,7 @@ struct rproc_debug_trace {
> * operation to use.
> */
> enum rproc_sync_states {
> + RPROC_SYNC_STATE_INIT,
> RPROC_SYNC_STATE_SHUTDOWN,
> RPROC_SYNC_STATE_CRASHED,
> };
> @@ -47,6 +50,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
> enum rproc_sync_states state)
> {
> switch (state) {
> + case RPROC_SYNC_STATE_INIT:
> + rproc->sync_with_rproc = rproc->sync_flags.on_init;
> + break;
> case RPROC_SYNC_STATE_SHUTDOWN:
> rproc->sync_with_rproc = rproc->sync_flags.after_stop;
> break;
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index ceb3b2bba824..a75ed92b3de6 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -619,6 +619,9 @@ struct rproc *rproc_get_by_child(struct device *dev);
> struct rproc *rproc_alloc(struct device *dev, const char *name,
> const struct rproc_ops *ops,
> const char *firmware, int len);
> +int rproc_set_state_machine(struct rproc *rproc,
> + const struct rproc_ops *sync_ops,
> + struct rproc_sync_flags sync_flags);
> void rproc_put(struct rproc *rproc);
> int rproc_add(struct rproc *rproc);
> int rproc_del(struct rproc *rproc);
>

2020-04-29 14:41:01

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 12/14] remoteproc: Introducing function rproc_set_state_machine()



On 4/29/20 11:22 AM, Arnaud POULIQUEN wrote:
>
>
> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
>> Introducting function rproc_set_state_machine() to add
>> operations and a set of flags to use when synchronising with
>> a remote processor.
>>
>> Signed-off-by: Mathieu Poirier <[email protected]>
>> ---
>> drivers/remoteproc/remoteproc_core.c | 54 ++++++++++++++++++++++++
>> drivers/remoteproc/remoteproc_internal.h | 6 +++
>> include/linux/remoteproc.h | 3 ++
>> 3 files changed, 63 insertions(+)
>>
>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> index 48afa1f80a8f..5c48714e8702 100644
>> --- a/drivers/remoteproc/remoteproc_core.c
>> +++ b/drivers/remoteproc/remoteproc_core.c
>> @@ -2065,6 +2065,59 @@ int devm_rproc_add(struct device *dev, struct rproc *rproc)
>> }
>> EXPORT_SYMBOL(devm_rproc_add);
>>
>> +/**
>> + * rproc_set_state_machine() - Set a synchronisation ops and set of flags
>> + * to use with a remote processor
>> + * @rproc: The remote processor to work with
>> + * @sync_ops: The operations to use when synchronising with a remote
>> + * processor
>> + * @sync_flags: The flags to use when deciding if the remoteproc core
>> + * should be synchronising with a remote processor
>> + *
>> + * Returns 0 on success, an error code otherwise.
>> + */
>> +int rproc_set_state_machine(struct rproc *rproc,
>> + const struct rproc_ops *sync_ops,
>> + struct rproc_sync_flags sync_flags)
>
> So this API should be called by platform driver only in case of synchronization
> support, right?
> In this case i would rename it as there is also a state machine in "normal" boot
> proposal: rproc_set_sync_machine or rproc_set_sync_state_machine
>

Reviewing the stm32 series, I wonder if sync_flags should be a pointer to a const
structure, as the platform driver should not update it during the rproc life cycle.
Also, IMO, using a pointer to the structure instead of the structure itself seems
more in line with the rest of the remoteproc API.
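
A minimal userspace sketch of this suggestion, assuming simplified stand-ins
for struct rproc and struct rproc_sync_flags (the kernel structures carry many
more fields). The function name is only one of the names proposed above and
example_sync_flags is hypothetical. Passing a pointer to a const structure lets
the platform driver keep its flags in read-only data while the core takes its
own copy:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures, for illustration only. */
struct rproc_sync_flags {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

struct rproc {
	struct rproc_sync_flags sync_flags;
	bool sync_with_rproc;
};

/* Hypothetical variant that takes the flags by const pointer. */
int rproc_set_sync_state_machine(struct rproc *rproc,
				 const struct rproc_sync_flags *sync_flags)
{
	if (!rproc || !sync_flags)
		return -EINVAL;

	/* The core keeps a private copy; the caller's data is never modified. */
	rproc->sync_flags = *sync_flags;
	/* Same effect as rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT). */
	rproc->sync_with_rproc = sync_flags->on_init;

	return 0;
}

/* A platform driver could then declare its flags once, as constant data. */
static const struct rproc_sync_flags example_sync_flags = {
	.on_init = true,
	.after_stop = false,
	.after_crash = true,
};
```

A driver would call rproc_set_sync_state_machine(rproc, &example_sync_flags)
from its probe function; sync_ops handling is left out of the sketch.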

>> +{
>> + if (!rproc || !sync_ops)
>> + return -EINVAL;
>> +
>> + /*
>> + * No point in going further if we never have to synchronise with
>> + * the remote processor.
>> + */
>> + if (!sync_flags.on_init &&
>> + !sync_flags.after_stop && !sync_flags.after_crash)
>> + return 0;
>> +
>> + /*
>> + * Refuse to go further if remoteproc operations have been allocated
>> + * but they will never be used.
>> + */
>> + if (rproc->ops && sync_flags.on_init &&
>> + sync_flags.after_stop && sync_flags.after_crash)
>> + return -EINVAL;
>> +
>> + /*
>> + * Don't allow users to set this more than once to avoid situations
>> + * where the remote processor can't be recovered.
>> + */
>> + if (rproc->sync_ops)
>> + return -EINVAL;
>> +
>> + rproc->sync_ops = kmemdup(sync_ops, sizeof(*sync_ops), GFP_KERNEL);
>> + if (!rproc->sync_ops)
>> + return -ENOMEM;
>> +
>> + rproc->sync_flags = sync_flags;
>> + /* Tell the core what to do when initialising */
>> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT);
>
> Is there a use case where sync_flags.on_init is false and other flags are true?
>
> Look like on_init is useless and should not be exposed to the platform driver.
> Or comments are missing to explain the usage of it vs the other flags.
>
> Regards,
> Arnaud
>
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL(rproc_set_state_machine);
>> +
>> /**
>> * rproc_type_release() - release a remote processor instance
>> * @dev: the rproc's device
>> @@ -2088,6 +2141,7 @@ static void rproc_type_release(struct device *dev)
>> kfree_const(rproc->firmware);
>> kfree_const(rproc->name);
>> kfree(rproc->ops);
>> + kfree(rproc->sync_ops);
>> kfree(rproc);
>> }
>>
>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
>> index 7dcc0a26892b..c1a293a37c78 100644
>> --- a/drivers/remoteproc/remoteproc_internal.h
>> +++ b/drivers/remoteproc/remoteproc_internal.h
>> @@ -27,6 +27,8 @@ struct rproc_debug_trace {
>> /*
>> * enum rproc_sync_states - remote processsor sync states
>> *
>> + * @RPROC_SYNC_STATE_INIT state to use when the remoteproc core
>> + * is initialising.
>> * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
>> * has shutdown (rproc_shutdown()) the
>> * remote processor.
>> @@ -39,6 +41,7 @@ struct rproc_debug_trace {
>> * operation to use.
>> */
>> enum rproc_sync_states {
>> + RPROC_SYNC_STATE_INIT,
>> RPROC_SYNC_STATE_SHUTDOWN,
>> RPROC_SYNC_STATE_CRASHED,
>> };
>> @@ -47,6 +50,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
>> enum rproc_sync_states state)
>> {
>> switch (state) {
>> + case RPROC_SYNC_STATE_INIT:
>> + rproc->sync_with_rproc = rproc->sync_flags.on_init;
>> + break;
>> case RPROC_SYNC_STATE_SHUTDOWN:
>> rproc->sync_with_rproc = rproc->sync_flags.after_stop;
>> break;
>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>> index ceb3b2bba824..a75ed92b3de6 100644
>> --- a/include/linux/remoteproc.h
>> +++ b/include/linux/remoteproc.h
>> @@ -619,6 +619,9 @@ struct rproc *rproc_get_by_child(struct device *dev);
>> struct rproc *rproc_alloc(struct device *dev, const char *name,
>> const struct rproc_ops *ops,
>> const char *firmware, int len);
>> +int rproc_set_state_machine(struct rproc *rproc,
>> + const struct rproc_ops *sync_ops,
>> + struct rproc_sync_flags sync_flags);
>> void rproc_put(struct rproc *rproc);
>> int rproc_add(struct rproc *rproc);
>> int rproc_del(struct rproc *rproc);
>>

2020-04-30 19:41:09

by Mathieu Poirier

Subject: Re: [PATCH v3 01/14] remoteproc: Make core operations optional

On Tue, Apr 28, 2020 at 06:18:59PM +0200, Arnaud POULIQUEN wrote:
> Hi Mathieu,
>
> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> > When synchronizing with a remote processor, it is entirely possible that
> > the remoteproc core is not the life cycle manager. In such a case core
> > operations don't exist and should not be called.
>
> What about ops in remote_core.c?
> Applying the series, seems that at least rproc->ops->panic rproc->ops->da_to_va
> can be called tested with undefined ops structure.

Very true - good catch!

>
> Regards,
>
> Arnaud
>
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_internal.h | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index b389dc79da81..59fc871743c7 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -67,7 +67,7 @@ rproc_find_carveout_by_name(struct rproc *rproc, const char *name, ...);
> > static inline
> > int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> > {
> > - if (rproc->ops->sanity_check)
> > + if (rproc->ops && rproc->ops->sanity_check)
> > return rproc->ops->sanity_check(rproc, fw);
> >
> > return 0;
> > @@ -76,7 +76,7 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> > static inline
> > u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> > {
> > - if (rproc->ops->get_boot_addr)
> > + if (rproc->ops && rproc->ops->get_boot_addr)
> > return rproc->ops->get_boot_addr(rproc, fw);
> >
> > return 0;
> > @@ -85,7 +85,7 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> > static inline
> > int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> > {
> > - if (rproc->ops->load)
> > + if (rproc->ops && rproc->ops->load)
> > return rproc->ops->load(rproc, fw);
> >
> > return -EINVAL;
> > @@ -93,7 +93,7 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> >
> > static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
> > {
> > - if (rproc->ops->parse_fw)
> > + if (rproc->ops && rproc->ops->parse_fw)
> > return rproc->ops->parse_fw(rproc, fw);
> >
> > return 0;
> > @@ -103,7 +103,7 @@ static inline
> > int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
> > int avail)
> > {
> > - if (rproc->ops->handle_rsc)
> > + if (rproc->ops && rproc->ops->handle_rsc)
> > return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
> > avail);
> >
> > @@ -114,7 +114,7 @@ static inline
> > struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> > const struct firmware *fw)
> > {
> > - if (rproc->ops->find_loaded_rsc_table)
> > + if (rproc->ops && rproc->ops->find_loaded_rsc_table)
> > return rproc->ops->find_loaded_rsc_table(rproc, fw);
> >
> > return NULL;
> >

2020-04-30 19:51:24

by Mathieu Poirier

Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

On Tue, Apr 28, 2020 at 06:38:41PM +0200, Arnaud POULIQUEN wrote:
>
>
> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> > Add a new sync_ops to support use cases where the remoteproc
> > core is synchronising with the remote processor. Exactly when to use
> > the synchronisation operations is directed by the flags in structure
> > rproc_sync_flags.
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > include/linux/remoteproc.h | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> > index ac4082f12e8b..ceb3b2bba824 100644
> > --- a/include/linux/remoteproc.h
> > +++ b/include/linux/remoteproc.h
> > @@ -353,6 +353,23 @@ enum rsc_handling_status {
> > RSC_IGNORED = 1,
> > };
> >
> > +/**
> > + * struct rproc_sync_flags - platform specific flags indicating which
> > + * rproc_ops to use at specific times during
> > + * the rproc lifecycle.
> > + * @on_init: true if synchronising with the remote processor at
> > + * initialisation time
> > + * @after_stop: true if synchronising with the remote processor after it was
> > + * stopped from the cmmand line
> typo command
> > + * @after_crash: true if synchronising with the remote processor after
> > + * it has crashed
> > + */
> > +struct rproc_sync_flags {
> > + bool on_init;
> > + bool after_stop;
> > + bool after_crash;
> > +};
> > +
> how about a bit field instead (just a proposition)?
> Platform driver would set the sync flag and rproc_set_sync_flag could be a
> simple mask instead of a switch case.

I opted for a structure over bit fields because I thought it would be easier to
read/understand. Both approaches are valid and I have no particular preference
other than, in my own view, a structure is easier to understand.

I'll wait a little to see what other people think. If nobody objects the next
revision will have bit fields.
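
For comparison, a rough sketch of what the mask-based alternative could look
like. The RPROC_SYNC_* bit names are hypothetical and struct rproc is reduced
to the two members involved; nothing here is taken from the series itself:

```c
#include <stdbool.h>

/* Hypothetical bit definitions; these names are not part of the series. */
#define RPROC_SYNC_ON_INIT	(1U << 0)
#define RPROC_SYNC_AFTER_STOP	(1U << 1)
#define RPROC_SYNC_AFTER_CRASH	(1U << 2)

/* Simplified stand-in for struct rproc. */
struct rproc {
	unsigned int sync_flags;	/* mask set by the platform driver */
	bool sync_with_rproc;
};

/* With a mask, the switch statement collapses into a single test. */
static inline void rproc_set_sync_flag(struct rproc *rproc, unsigned int state)
{
	rproc->sync_with_rproc = !!(rproc->sync_flags & state);
}
```

The trade-off is the one discussed above: the mask makes the helper shorter,
while the structure spells out each case by name.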

>
> Is it possible to split this patch in a different ways because difficult to understand as
> rproc_sync_flags seems not used before
> [PATCH v3 09/14] remoteproc: Deal with synchronisation when crashing

Certainly

>
> Thanks
> Arnaud
>
> > /**
> > * struct rproc_ops - platform-specific device handlers
> > * @start: power on the device and boot it
> > @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> > * @firmware: name of firmware file to be loaded
> > * @priv: private data which belongs to the platform-specific rproc module
> > * @ops: platform-specific start/stop rproc handlers
> > + * @sync_ops: platform-specific start/stop rproc handlers when
> > + * synchronising with a remote processor.
> > + * @sync_flags: Determine the rproc_ops to choose in specific states.
> > * @dev: virtual device for refcounting and common remoteproc behavior
> > * @power: refcount of users who need this rproc powered up
> > * @state: state of the device
> > @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> > * @table_sz: size of @cached_table
> > * @has_iommu: flag to indicate if remote processor is behind an MMU
> > * @auto_boot: flag to indicate if remote processor should be auto-started
> > + * @sync_with_rproc: true if currently synchronising with the rproc
> > * @dump_segments: list of segments in the firmware
> > * @nb_vdev: number of vdev currently handled by rproc
> > */
> > @@ -492,6 +513,8 @@ struct rproc {
> > const char *firmware;
> > void *priv;
> > struct rproc_ops *ops;
> > + struct rproc_ops *sync_ops;
> > + struct rproc_sync_flags sync_flags;
> > struct device dev;
> > atomic_t power;
> > unsigned int state;
> > @@ -515,6 +538,7 @@ struct rproc {
> > size_t table_sz;
> > bool has_iommu;
> > bool auto_boot;
> > + bool sync_with_rproc;
> > struct list_head dump_segments;
> > int nb_vdev;
> > u8 elf_class;
> >

2020-04-30 19:59:41

by Mathieu Poirier

Subject: Re: [PATCH v3 08/14] remoteproc: Call core functions based on synchronisation flag

On Tue, Apr 28, 2020 at 07:27:27PM +0200, Arnaud POULIQUEN wrote:
>
>
> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> > Call the right core function based on whether we should synchronise
> > with a remote processor or boot it from scratch.
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_internal.h | 50 ++++++++++++++++++++++++
> > 1 file changed, 50 insertions(+)
> >
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index dda7044c4b3e..3985c084b184 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -72,6 +72,12 @@ static inline bool rproc_needs_syncing(struct rproc *rproc)
> > static inline
> > int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> > {
> > + if (rproc_needs_syncing(rproc)) {
> > + if (rproc->sync_ops && rproc->sync_ops->sanity_check)
> > + return rproc->sync_ops->sanity_check(rproc, fw);
> > + return 0;
> > + }
> > +
> > if (rproc->ops && rproc->ops->sanity_check)
> > return rproc->ops->sanity_check(rproc, fw);
>
> Regarding this patch I'm trying to determine whether it makes sense to have ops or
> sync_ops set to null. Your[v3 01/14] patch commit explains that ops can be null in case of
> synchronisation.
> But it seems deprecated with the sync_ops introduction...

Your comment made me go over the logic again... If rproc_needs_syncing() is
true then we necessarily have a sync_ops. If rproc_needs_syncing() is false,
then we necessarily have an ops. As such, and as you point out, checking
for rproc->sync_ops and rproc->ops is probably useless.
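
A possible shape for the simplified helpers, sketched in userspace with
cut-down structures. It assumes that registration has already guaranteed the
selected table is non-NULL, and that a missing callback returns 0 on both
paths; the two example_* callbacks are hypothetical and only mark which table
was chosen:

```c
#include <stdbool.h>
#include <stddef.h>

struct rproc;

/* Simplified stand-ins with only the members needed here. */
struct rproc_ops {
	int (*start)(struct rproc *rproc);
};

struct rproc {
	const struct rproc_ops *ops;
	const struct rproc_ops *sync_ops;
	bool sync_with_rproc;
};

static inline bool rproc_needs_syncing(struct rproc *rproc)
{
	return rproc->sync_with_rproc;
}

/*
 * The table pointer is assumed valid for the current mode, so only the
 * individual callback remains optional.
 */
static inline int rproc_start_device(struct rproc *rproc)
{
	const struct rproc_ops *ops;

	ops = rproc_needs_syncing(rproc) ? rproc->sync_ops : rproc->ops;
	if (ops->start)
		return ops->start(rproc);

	return 0;
}

/* Hypothetical callbacks; the return values are arbitrary markers. */
static int example_boot_start(struct rproc *rproc) { (void)rproc; return 1; }
static int example_sync_start(struct rproc *rproc) { (void)rproc; return 2; }
```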

>
> And if sync_ops is null, is it still necessary to define a remoteproc device?

Not sure I understand your point here but with the reasoning from above it
is probably moot anyway.

>
> Regards
> Arnad
>
> >
> > @@ -81,6 +87,12 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> > static inline
> > u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> > {
> > + if (rproc_needs_syncing(rproc)) {
> > + if (rproc->sync_ops && rproc->sync_ops->get_boot_addr)
> > + return rproc->sync_ops->get_boot_addr(rproc, fw);
> > + return 0;
> > + }
> > +
> > if (rproc->ops && rproc->ops->get_boot_addr)
> > return rproc->ops->get_boot_addr(rproc, fw);
> >
> > @@ -90,6 +102,12 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> > static inline
> > int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> > {
> > + if (rproc_needs_syncing(rproc)) {
> > + if (rproc->sync_ops && rproc->sync_ops->load)
> > + return rproc->sync_ops->load(rproc, fw);
> > + return 0;
> > + }
> > +
> > if (rproc->ops && rproc->ops->load)
> > return rproc->ops->load(rproc, fw);
> >
> > @@ -98,6 +116,12 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> >
> > static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
> > {
> > + if (rproc_needs_syncing(rproc)) {
> > + if (rproc->sync_ops && rproc->sync_ops->parse_fw)
> > + return rproc->sync_ops->parse_fw(rproc, fw);
> > + return 0;
> > + }
> > +
> > if (rproc->ops && rproc->ops->parse_fw)
> > return rproc->ops->parse_fw(rproc, fw);
> >
> > @@ -108,6 +132,13 @@ static inline
> > int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
> > int avail)
> > {
> > + if (rproc_needs_syncing(rproc)) {
> > + if (rproc->sync_ops && rproc->sync_ops->handle_rsc)
> > + return rproc->sync_ops->handle_rsc(rproc, rsc_type,
> > + rsc, offset, avail);
> > + return 0;
> > + }
> > +
> > if (rproc->ops && rproc->ops->handle_rsc)
> > return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
> > avail);
> > @@ -119,6 +150,13 @@ static inline
> > struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> > const struct firmware *fw)
> > {
> > + if (rproc_needs_syncing(rproc)) {
> > + if (rproc->sync_ops && rproc->sync_ops->find_loaded_rsc_table)
> > + return rproc->sync_ops->find_loaded_rsc_table(rproc,
> > + fw);
> > + return NULL;
> > + }
> > +
> > if (rproc->ops && rproc->ops->find_loaded_rsc_table)
> > return rproc->ops->find_loaded_rsc_table(rproc, fw);
> >
> > @@ -127,6 +165,12 @@ struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> >
> > static inline int rproc_start_device(struct rproc *rproc)
> > {
> > + if (rproc_needs_syncing(rproc)) {
> > + if (rproc->sync_ops && rproc->sync_ops->start)
> > + return rproc->sync_ops->start(rproc);
> > + return 0;
> > + }
> > +
> > if (rproc->ops && rproc->ops->start)
> > return rproc->ops->start(rproc);
> >
> > @@ -135,6 +179,12 @@ static inline int rproc_start_device(struct rproc *rproc)
> >
> > static inline int rproc_stop_device(struct rproc *rproc)
> > {
> > + if (rproc_needs_syncing(rproc)) {
> > + if (rproc->sync_ops && rproc->sync_ops->stop)
> > + return rproc->sync_ops->stop(rproc);
> > + return 0;
> > + }
> > +
> > if (rproc->ops && rproc->ops->stop)
> > return rproc->ops->stop(rproc);
> >
> >

2020-04-30 20:13:15

by Mathieu Poirier

Subject: Re: [PATCH v3 09/14] remoteproc: Deal with synchronisation when crashing

On Wed, Apr 29, 2020 at 09:44:02AM +0200, Arnaud POULIQUEN wrote:
> Hi Mathieu,
>
> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> > Refactor function rproc_trigger_recovery() in order to avoid
> > reloading the firmware image when synchronising with a remote
> > processor rather than booting it. Also part of the process,
> > properly set the synchronisation flag in order to properly
> > recover the system.
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------
> > drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++
> > 2 files changed, 43 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index ef88d3e84bfb..3a84a38ba37b 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc)
> > */
> > int rproc_trigger_recovery(struct rproc *rproc)
> > {
> > - const struct firmware *firmware_p;
> > + const struct firmware *firmware_p = NULL;
> > struct device *dev = &rproc->dev;
> > int ret;
> >
> > @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> > /* generate coredump */
> > rproc_coredump(rproc);
> >
> > - /* load firmware */
> > - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > - if (ret < 0) {
> > - dev_err(dev, "request_firmware failed: %d\n", ret);
> > - goto unlock_mutex;
> > + /* load firmware if need be */
> > + if (!rproc_needs_syncing(rproc)) {
> > + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > + if (ret < 0) {
> > + dev_err(dev, "request_firmware failed: %d\n", ret);
> > + goto unlock_mutex;
> > + }
>
> If we started in syncing mode then rpoc->firmware is null
> rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED) can make rproc_needs_syncing(rproc)
> false.

You are correct, I will add an additional check in rproc_set_state_machine() to
prevent a situation where rproc_alloc() has been called without an ops and any
of the synchronisation flags are set to false.

It is also possible that someone would call rproc_alloc() without an ops and
not call rproc_set_state_machine(), in which case both ops and sync_ops
would be NULL. Adding a check in rproc_add() is probably the best location to
catch such a condition.
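
The two checks could look roughly like the sketch below. The helper names
rproc_check_sync_flags() and rproc_check_ops() are hypothetical, the structures
are cut down to the members involved, and -EINVAL is an assumed return value:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the callbacks are omitted. */
struct rproc_ops {
	int unused;
};

struct rproc_sync_flags {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

struct rproc {
	const struct rproc_ops *ops;
	const struct rproc_ops *sync_ops;
};

/*
 * Hypothetical check for rproc_set_state_machine(): without regular ops,
 * every flag must be true, otherwise the core could be asked to boot the
 * remote processor with no operations to do so.
 */
static int rproc_check_sync_flags(const struct rproc *rproc,
				  const struct rproc_sync_flags *flags)
{
	if (!rproc->ops &&
	    !(flags->on_init && flags->after_stop && flags->after_crash))
		return -EINVAL;

	return 0;
}

/* Hypothetical check for rproc_add(): at least one ops table must exist. */
static int rproc_check_ops(const struct rproc *rproc)
{
	if (!rproc->ops && !rproc->sync_ops)
		return -EINVAL;

	return 0;
}
```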


> In this case here we fail the recovery an leave in RPROC_STOP state.
> As you proposed in Loic RFC[1], what about adding a more explicit message to inform that the recovery
> failed.

Right, that's a different problem.

>
> [1]https://lkml.org/lkml/2020/3/11/402
>
> Regards,
> Arnaud
> > }
> >
> > - /* boot the remote processor up again */
> > + /* boot up or synchronise with the remote processor again */
> > ret = rproc_start(rproc, firmware_p);
> >
> > release_firmware(firmware_p);
> > @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work)
> > dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
> > rproc->name);
> >
> > + /*
> > + * The remote processor has crashed - tell the core what operation
> > + * to use from hereon, i.e whether an external entity will reboot
> > + * the MCU or it is now the remoteproc core's responsability.
> > + */
> > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED);
> > +
> > mutex_unlock(&rproc->lock);
> >
> > if (!rproc->recovery_disabled)
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index 3985c084b184..61500981155c 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -24,6 +24,33 @@ struct rproc_debug_trace {
> > struct rproc_mem_entry trace_mem;
> > };
> >
> > +/*
> > + * enum rproc_sync_states - remote processsor sync states
> > + *
> > + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> > + * has crashed but has not been recovered by
> > + * the remoteproc core yet.
> > + *
> > + * Keeping these separate from the enum rproc_state in order to avoid
> > + * introducing coupling between the state of the MCU and the synchronisation
> > + * operation to use.
> > + */
> > +enum rproc_sync_states {
> > + RPROC_SYNC_STATE_CRASHED,
> > +};
> > +
> > +static inline void rproc_set_sync_flag(struct rproc *rproc,
> > + enum rproc_sync_states state)
> > +{
> > + switch (state) {
> > + case RPROC_SYNC_STATE_CRASHED:
> > + rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> > + break;
> > + default:
> > + break;
> > + }
> > +}
> > +
> > /* from remoteproc_core.c */
> > void rproc_release(struct kref *kref);
> > irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id);
> >

2020-04-30 20:25:57

by Mathieu Poirier

Subject: Re: [PATCH v3 10/14] remoteproc: Deal with synchronisation when shutting down

On Wed, Apr 29, 2020 at 10:19:49AM +0200, Arnaud POULIQUEN wrote:
>
>
> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> > The remoteproc core must not allow function rproc_shutdown() to
> > proceed if currently synchronising with a remote processor and
> > the synchronisation operations of that remote processor does not
> > support it. Also part of the process is to set the synchronisation
> > flag so that the remoteproc core can make the right decisions when
> > restarting the system.
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_core.c | 32 ++++++++++++++++++++++++
> > drivers/remoteproc/remoteproc_internal.h | 7 ++++++
> > 2 files changed, 39 insertions(+)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index 3a84a38ba37b..48afa1f80a8f 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1849,6 +1849,27 @@ int rproc_boot(struct rproc *rproc)
> > }
> > EXPORT_SYMBOL(rproc_boot);
> >
> > +static bool rproc_can_shutdown(struct rproc *rproc)
> > +{
> > + /*
> > + * The remoteproc core is the lifecycle manager, no problem
> > + * calling for a shutdown.
> > + */
> > + if (!rproc_needs_syncing(rproc))
> > + return true;
> > +
> > + /*
> > + * The remoteproc has been loaded by another entity (as per above
> > + * condition) and the platform code has given us the capability
> > + * of stopping it.
> > + */
> > + if (rproc->sync_ops->stop)
> > + return true;
>
> This means that if rproc->sync_ops->stop is null rproc_stop_subdevices will not
> be called? seems not symmetric with the start sequence.

If rproc->sync_ops->stop is not provided then the remoteproc core can't stop the
remote processor at all after it has synchronised with it. If a use case
requires some kind of soft reset then a stop() function that uses a mailbox
notification or some other mechanism can be provided to tell the remote
processor to put itself back in startup mode again.

Is this fine with you or is there still something I don't get?
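
To illustrate the point, here is a userspace sketch with cut-down structures.
example_sync_stop() is a hypothetical platform callback, and the
reset_requested member merely stands in for whatever mailbox message a real
driver would send; no actual mailbox API is used:

```c
#include <stdbool.h>
#include <stddef.h>

struct rproc;

/* Simplified stand-ins with only the members needed here. */
struct rproc_ops {
	int (*stop)(struct rproc *rproc);
};

struct rproc {
	const struct rproc_ops *sync_ops;
	bool sync_with_rproc;
	bool reset_requested;	/* stands in for a mailbox notification */
};

/* Hypothetical soft-reset stop: ask the remote side to return to startup. */
static int example_sync_stop(struct rproc *rproc)
{
	rproc->reset_requested = true;
	return 0;
}

/* Same decision as in the patch, reduced to the simplified structures. */
static bool rproc_can_shutdown(struct rproc *rproc)
{
	/* The core is the lifecycle manager: shutting down is always fine. */
	if (!rproc->sync_with_rproc)
		return true;

	/* Synchronising: only allowed if the platform gave us a stop(). */
	return rproc->sync_ops->stop != NULL;
}
```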

> Probably not useful to test it here as condition is already handled in rproc_stop_device...
>
> Regards
> Arnaud
> > +
> > + /* Any other condition should not be allowed */
> > + return false;
> > +}
> > +
> > /**
> > * rproc_shutdown() - power off the remote processor
> > * @rproc: the remote processor
> > @@ -1879,6 +1900,9 @@ void rproc_shutdown(struct rproc *rproc)
> > return;
> > }
> >
> > + if (!rproc_can_shutdown(rproc))
> > + goto out;
> > +
> > /* if the remote proc is still needed, bail out */
> > if (!atomic_dec_and_test(&rproc->power))
> > goto out;
> > @@ -1898,6 +1922,14 @@ void rproc_shutdown(struct rproc *rproc)
> > kfree(rproc->cached_table);
> > rproc->cached_table = NULL;
> > rproc->table_ptr = NULL;
> > +
> > + /*
> > + * The remote processor has been switched off - tell the core what
> > + * operation to use from hereon, i.e whether an external entity will
> > + * reboot the remote processor or it is now the remoteproc core's
> > + * responsability.
> > + */
> > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_SHUTDOWN);
> > out:
> > mutex_unlock(&rproc->lock);
> > }
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index 61500981155c..7dcc0a26892b 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -27,6 +27,9 @@ struct rproc_debug_trace {
> > /*
> > * enum rproc_sync_states - remote processsor sync states
> > *
> > + * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
> > + * has shutdown (rproc_shutdown()) the
> > + * remote processor.
> > * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> > * has crashed but has not been recovered by
> > * the remoteproc core yet.
> > @@ -36,6 +39,7 @@ struct rproc_debug_trace {
> > * operation to use.
> > */
> > enum rproc_sync_states {
> > + RPROC_SYNC_STATE_SHUTDOWN,
> > RPROC_SYNC_STATE_CRASHED,
> > };
> >
> > @@ -43,6 +47,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
> > enum rproc_sync_states state)
> > {
> > switch (state) {
> > + case RPROC_SYNC_STATE_SHUTDOWN:
> > + rproc->sync_with_rproc = rproc->sync_flags.after_stop;
> > + break;
> > case RPROC_SYNC_STATE_CRASHED:
> > rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> > break;
> >

2020-04-30 20:39:04

by Mathieu Poirier

Subject: Re: [PATCH v3 11/14] remoteproc: Deal with synchronisation when changing FW image

On Wed, Apr 29, 2020 at 10:52:48AM +0200, Arnaud POULIQUEN wrote:
>
>
> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> > This patch prevents the firmware image from being displayed or changed
> > when the remoteproc core is synchronising with a remote processor. This
> > is needed since there is no guarantee about the nature of the firmware
> > image that is loaded by the external entity.
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_sysfs.c | 24 +++++++++++++++++++++++-
> > 1 file changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> > index 7f8536b73295..cdd322a6ecfa 100644
> > --- a/drivers/remoteproc/remoteproc_sysfs.c
> > +++ b/drivers/remoteproc/remoteproc_sysfs.c
> > @@ -13,9 +13,20 @@
> > static ssize_t firmware_show(struct device *dev, struct device_attribute *attr,
> > char *buf)
> > {
> > + ssize_t ret;
> > struct rproc *rproc = to_rproc(dev);
> >
> > - return sprintf(buf, "%s\n", rproc->firmware);
> > + /*
> > + * In most instances there is no guarantee about the firmware
> > + * that was loaded by the external entity. As such simply don't
> > + * print anything.
> > + */
> > + if (rproc_needs_syncing(rproc))
> > + ret = sprintf(buf, "\n");
>
> A default name is provided in sysfs if no firmware is started/synchronised on boot.
>
> IMO providing an empty name here could be confusing.
> Perhaps a refactoring of this sysfs entry would be nice:
> - Normal boot (no firmware loaded) : empty name instead of a default name

That is guaranteed to break user space so we can't proceed this way.

> - auto_boot: name provided by the platform driver or default name ( current implementation)
> - synchronization: a predefined name such as Default, unknown, External, None,...

Loic had the same comment. Usually it is best to provide sysfs output that
doesn't need parsing, i.e. 0/1 or nothing at all, but in the remoteproc subsystem
we already have "state", "name" and "firmware" that need parsing. As such my
next revision will have "unknown", which I think is the best way to describe the
situation.
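
For illustration only, a minimal userspace sketch of what reporting "unknown"
could look like. The names rproc_model, model_needs_syncing() and
model_firmware_show() are stand-ins invented for this sketch, not kernel
symbols, and the real next revision may well differ:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct rproc; only the fields used here. */
struct rproc_model {
	const char *firmware;
	bool sync_with_rproc;
};

/* Assumed to behave like rproc_needs_syncing(): true while synchronising. */
static bool model_needs_syncing(const struct rproc_model *rproc)
{
	return rproc->sync_with_rproc;
}

/* Report "unknown" rather than an empty line when an external entity loaded the firmware. */
static int model_firmware_show(const struct rproc_model *rproc, char *buf,
			       size_t len)
{
	if (model_needs_syncing(rproc))
		return snprintf(buf, len, "unknown\n");

	return snprintf(buf, len, "%s\n", rproc->firmware);
}
```

With this shape user space always reads a non-empty string, and the existing
firmware name is still shown when the remoteproc core owns the boot.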

>
> > + else
> > + ret = sprintf(buf, "%s\n", rproc->firmware);
> > +
> > + return ret;
> > }
> >
> > /* Change firmware name via sysfs */
> > @@ -39,6 +50,17 @@ static ssize_t firmware_store(struct device *dev,
> > goto out;
> > }
> >
> > + /*
> > + * There is no point in trying to change the firmware if loading the
> > + * image of the remote processor is done by another entity.
> > + */
> > + if (rproc_needs_syncing(rproc)) {
> > + dev_err(dev,
> > + "can't change firmware while synchronising with MCU\n");
>
> I don't know if you decide to keep "MCU" or not. If not the case
> you have also some other instances in your patch 9/14.

MCU should be long gone. I thought I had spotted them all but was obviously
wrong.

>
> Regards
> Arnaud
>
> > + err = -EBUSY;
> > + goto out;
> > + }
> > +
> > len = strcspn(buf, "\n");
> > if (!len) {
> > dev_err(dev, "can't provide a NULL firmware\n");
> >

2020-04-30 20:47:03

by Mathieu Poirier

Subject: Re: [PATCH v3 12/14] remoteproc: Introducing function rproc_set_state_machine()

On Wed, Apr 29, 2020 at 11:22:28AM +0200, Arnaud POULIQUEN wrote:
>
>
> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> > Introducting function rproc_set_state_machine() to add
> > operations and a set of flags to use when synchronising with
> > a remote processor.
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_core.c | 54 ++++++++++++++++++++++++
> > drivers/remoteproc/remoteproc_internal.h | 6 +++
> > include/linux/remoteproc.h | 3 ++
> > 3 files changed, 63 insertions(+)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index 48afa1f80a8f..5c48714e8702 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -2065,6 +2065,59 @@ int devm_rproc_add(struct device *dev, struct rproc *rproc)
> > }
> > EXPORT_SYMBOL(devm_rproc_add);
> >
> > +/**
> > + * rproc_set_state_machine() - Set a synchronisation ops and set of flags
> > + * to use with a remote processor
> > + * @rproc: The remote processor to work with
> > + * @sync_ops: The operations to use when synchronising with a remote
> > + * processor
> > + * @sync_flags: The flags to use when deciding if the remoteproc core
> > + * should be synchronising with a remote processor
> > + *
> > + * Returns 0 on success, an error code otherwise.
> > + */
> > +int rproc_set_state_machine(struct rproc *rproc,
> > + const struct rproc_ops *sync_ops,
> > + struct rproc_sync_flags sync_flags)
>
> So this API should be called by platform driver only in case of synchronization
> support, right?

Correct

> In this case i would rename it as there is also a state machine in "normal" boot
> proposal: rproc_set_sync_machine or rproc_set_sync_state_machine

That is a valid observation - rproc_set_sync_state_machine() sounds descriptive
enough for me.

>
> > +{
> > + if (!rproc || !sync_ops)
> > + return -EINVAL;
> > +
> > + /*
> > + * No point in going further if we never have to synchronise with
> > + * the remote processor.
> > + */
> > + if (!sync_flags.on_init &&
> > + !sync_flags.after_stop && !sync_flags.after_crash)
> > + return 0;
> > +
> > + /*
> > + * Refuse to go further if remoteproc operations have been allocated
> > + * but they will never be used.
> > + */
> > + if (rproc->ops && sync_flags.on_init &&
> > + sync_flags.after_stop && sync_flags.after_crash)
> > + return -EINVAL;
> > +
> > + /*
> > + * Don't allow users to set this more than once to avoid situations
> > + * where the remote processor can't be recovered.
> > + */
> > + if (rproc->sync_ops)
> > + return -EINVAL;
> > +
> > + rproc->sync_ops = kmemdup(sync_ops, sizeof(*sync_ops), GFP_KERNEL);
> > + if (!rproc->sync_ops)
> > + return -ENOMEM;
> > +
> > + rproc->sync_flags = sync_flags;
> > + /* Tell the core what to do when initialising */
> > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT);
>
> Is there a use case where sync_flags.on_init is false and other flags are true?

I haven't seen one yet, which doesn't mean it doesn't exist or won't in the
future. I wanted to make this as flexible as possible. I started with the idea
of making synchronisation at initialisation time implicit if
rproc_set_state_machine() is called but I know it is only a matter of time
before people come up with some exotic use case where .on_init is false.

>
> Look like on_init is useless and should not be exposed to the platform driver.
> Or comments are missing to explain the usage of it vs the other flags.

Comments added in remoteproc_internal.h and the new section in
Documentation/remoteproc.txt aren't sufficient? Can you give me a hint as to
what you think is missing?

>
> Regards,
> Arnaud
>
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(rproc_set_state_machine);
> > +
> > /**
> > * rproc_type_release() - release a remote processor instance
> > * @dev: the rproc's device
> > @@ -2088,6 +2141,7 @@ static void rproc_type_release(struct device *dev)
> > kfree_const(rproc->firmware);
> > kfree_const(rproc->name);
> > kfree(rproc->ops);
> > + kfree(rproc->sync_ops);
> > kfree(rproc);
> > }
> >
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index 7dcc0a26892b..c1a293a37c78 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -27,6 +27,8 @@ struct rproc_debug_trace {
> > /*
> > * enum rproc_sync_states - remote processsor sync states
> > *
> > + * @RPROC_SYNC_STATE_INIT state to use when the remoteproc core
> > + * is initialising.
> > * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
> > * has shutdown (rproc_shutdown()) the
> > * remote processor.
> > @@ -39,6 +41,7 @@ struct rproc_debug_trace {
> > * operation to use.
> > */
> > enum rproc_sync_states {
> > + RPROC_SYNC_STATE_INIT,
> > RPROC_SYNC_STATE_SHUTDOWN,
> > RPROC_SYNC_STATE_CRASHED,
> > };
> > @@ -47,6 +50,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
> > enum rproc_sync_states state)
> > {
> > switch (state) {
> > + case RPROC_SYNC_STATE_INIT:
> > + rproc->sync_with_rproc = rproc->sync_flags.on_init;
> > + break;
> > case RPROC_SYNC_STATE_SHUTDOWN:
> > rproc->sync_with_rproc = rproc->sync_flags.after_stop;
> > break;
> > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> > index ceb3b2bba824..a75ed92b3de6 100644
> > --- a/include/linux/remoteproc.h
> > +++ b/include/linux/remoteproc.h
> > @@ -619,6 +619,9 @@ struct rproc *rproc_get_by_child(struct device *dev);
> > struct rproc *rproc_alloc(struct device *dev, const char *name,
> > const struct rproc_ops *ops,
> > const char *firmware, int len);
> > +int rproc_set_state_machine(struct rproc *rproc,
> > + const struct rproc_ops *sync_ops,
> > + struct rproc_sync_flags sync_flags);
> > void rproc_put(struct rproc *rproc);
> > int rproc_add(struct rproc *rproc);
> > int rproc_del(struct rproc *rproc);
> >

2020-04-30 20:56:48

by Mathieu Poirier

Subject: Re: [PATCH v3 12/14] remoteproc: Introducing function rproc_set_state_machine()

On Wed, Apr 29, 2020 at 04:38:54PM +0200, Arnaud POULIQUEN wrote:
>
>
> On 4/29/20 11:22 AM, Arnaud POULIQUEN wrote:
> >
> >
> > On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> >> Introducting function rproc_set_state_machine() to add
> >> operations and a set of flags to use when synchronising with
> >> a remote processor.
> >>
> >> Signed-off-by: Mathieu Poirier <[email protected]>
> >> ---
> >> drivers/remoteproc/remoteproc_core.c | 54 ++++++++++++++++++++++++
> >> drivers/remoteproc/remoteproc_internal.h | 6 +++
> >> include/linux/remoteproc.h | 3 ++
> >> 3 files changed, 63 insertions(+)
> >>
> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> >> index 48afa1f80a8f..5c48714e8702 100644
> >> --- a/drivers/remoteproc/remoteproc_core.c
> >> +++ b/drivers/remoteproc/remoteproc_core.c
> >> @@ -2065,6 +2065,59 @@ int devm_rproc_add(struct device *dev, struct rproc *rproc)
> >> }
> >> EXPORT_SYMBOL(devm_rproc_add);
> >>
> >> +/**
> >> + * rproc_set_state_machine() - Set a synchronisation ops and set of flags
> >> + * to use with a remote processor
> >> + * @rproc: The remote processor to work with
> >> + * @sync_ops: The operations to use when synchronising with a remote
> >> + * processor
> >> + * @sync_flags: The flags to use when deciding if the remoteproc core
> >> + * should be synchronising with a remote processor
> >> + *
> >> + * Returns 0 on success, an error code otherwise.
> >> + */
> >> +int rproc_set_state_machine(struct rproc *rproc,
> >> + const struct rproc_ops *sync_ops,
> >> + struct rproc_sync_flags sync_flags)
> >
> > So this API should be called by platform driver only in case of synchronization
> > support, right?
> > In this case i would rename it as there is also a state machine in "normal" boot
> > proposal: rproc_set_sync_machine or rproc_set_sync_state_machine
> >
>
> Reviewing the stm32 series, i wonder if sync_flags should be a pointer to a const structure
> as the platform driver should not update it during the rproc live cycle.
> Then IMO, using a pointer to the structure instead of the structure seems more
> in line with the rest of the remoteproc API.

Humm... If we do make sync_flags constant then the platform drivers can't modify
the values dynamically, as I did in the stm32 series. This is something Loic
had asked for.

Moreover function rproc_set_state_machine() can't be called twice so updating
the sync_flags can't happen.
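
As a point of reference for the discussion, here is a small userspace sketch
of the const-pointer variant Arnaud describes. Everything in it
(sync_flags_model, rproc_model, model_set_sync_flags()) is hypothetical and
only models the argument passing; whether this satisfies what Loic asked for
is left to the thread:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of struct rproc_sync_flags. */
struct sync_flags_model {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

/* Hypothetical stand-in for the part of struct rproc that holds the flags. */
struct rproc_model {
	struct sync_flags_model sync_flags;
	bool configured;
};

/*
 * Takes the flags through a pointer to const and copies them, so the
 * caller can still compute the values at runtime before the call.
 * A second call is refused, as in the patch under review.
 */
static int model_set_sync_flags(struct rproc_model *rproc,
				const struct sync_flags_model *flags)
{
	if (!rproc || !flags)
		return -EINVAL;

	if (rproc->configured)
		return -EINVAL;

	rproc->sync_flags = *flags;
	rproc->configured = true;

	return 0;
}
```

Because the core keeps its own copy, later changes to the caller's structure
have no effect on the stored flags in this sketch.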

>
> >> +{
> >> + if (!rproc || !sync_ops)
> >> + return -EINVAL;
> >> +
> >> + /*
> >> + * No point in going further if we never have to synchronise with
> >> + * the remote processor.
> >> + */
> >> + if (!sync_flags.on_init &&
> >> + !sync_flags.after_stop && !sync_flags.after_crash)
> >> + return 0;
> >> +
> >> + /*
> >> + * Refuse to go further if remoteproc operations have been allocated
> >> + * but they will never be used.
> >> + */
> >> + if (rproc->ops && sync_flags.on_init &&
> >> + sync_flags.after_stop && sync_flags.after_crash)
> >> + return -EINVAL;
> >> +
> >> + /*
> >> + * Don't allow users to set this more than once to avoid situations
> >> + * where the remote processor can't be recovered.
> >> + */
> >> + if (rproc->sync_ops)
> >> + return -EINVAL;
> >> +
> >> + rproc->sync_ops = kmemdup(sync_ops, sizeof(*sync_ops), GFP_KERNEL);
> >> + if (!rproc->sync_ops)
> >> + return -ENOMEM;
> >> +
> >> + rproc->sync_flags = sync_flags;
> >> + /* Tell the core what to do when initialising */
> >> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT);
> >
> > Is there a use case where sync_flags.on_init is false and other flags are true?
> >
> > Look like on_init is useless and should not be exposed to the platform driver.
> > Or comments are missing to explain the usage of it vs the other flags.
> >
> > Regards,
> > Arnaud
> >
> >> +
> >> + return 0;
> >> +}
> >> +EXPORT_SYMBOL(rproc_set_state_machine);
> >> +
> >> /**
> >> * rproc_type_release() - release a remote processor instance
> >> * @dev: the rproc's device
> >> @@ -2088,6 +2141,7 @@ static void rproc_type_release(struct device *dev)
> >> kfree_const(rproc->firmware);
> >> kfree_const(rproc->name);
> >> kfree(rproc->ops);
> >> + kfree(rproc->sync_ops);
> >> kfree(rproc);
> >> }
> >>
> >> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> >> index 7dcc0a26892b..c1a293a37c78 100644
> >> --- a/drivers/remoteproc/remoteproc_internal.h
> >> +++ b/drivers/remoteproc/remoteproc_internal.h
> >> @@ -27,6 +27,8 @@ struct rproc_debug_trace {
> >> /*
> >> * enum rproc_sync_states - remote processsor sync states
> >> *
> >> + * @RPROC_SYNC_STATE_INIT state to use when the remoteproc core
> >> + * is initialising.
> >> * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
> >> * has shutdown (rproc_shutdown()) the
> >> * remote processor.
> >> @@ -39,6 +41,7 @@ struct rproc_debug_trace {
> >> * operation to use.
> >> */
> >> enum rproc_sync_states {
> >> + RPROC_SYNC_STATE_INIT,
> >> RPROC_SYNC_STATE_SHUTDOWN,
> >> RPROC_SYNC_STATE_CRASHED,
> >> };
> >> @@ -47,6 +50,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
> >> enum rproc_sync_states state)
> >> {
> >> switch (state) {
> >> + case RPROC_SYNC_STATE_INIT:
> >> + rproc->sync_with_rproc = rproc->sync_flags.on_init;
> >> + break;
> >> case RPROC_SYNC_STATE_SHUTDOWN:
> >> rproc->sync_with_rproc = rproc->sync_flags.after_stop;
> >> break;
> >> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> >> index ceb3b2bba824..a75ed92b3de6 100644
> >> --- a/include/linux/remoteproc.h
> >> +++ b/include/linux/remoteproc.h
> >> @@ -619,6 +619,9 @@ struct rproc *rproc_get_by_child(struct device *dev);
> >> struct rproc *rproc_alloc(struct device *dev, const char *name,
> >> const struct rproc_ops *ops,
> >> const char *firmware, int len);
> >> +int rproc_set_state_machine(struct rproc *rproc,
> >> + const struct rproc_ops *sync_ops,
> >> + struct rproc_sync_flags sync_flags);
> >> void rproc_put(struct rproc *rproc);
> >> int rproc_add(struct rproc *rproc);
> >> int rproc_del(struct rproc *rproc);
> >>

2020-05-04 11:19:41

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 08/14] remoteproc: Call core functions based on synchronisation flag

Hi Mathieu,

On 4/30/20 9:57 PM, Mathieu Poirier wrote:
> On Tue, Apr 28, 2020 at 07:27:27PM +0200, Arnaud POULIQUEN wrote:
>>
>>
>> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
>>> Call the right core function based on whether we should synchronise
>>> with a remote processor or boot it from scratch.
>>>
>>> Signed-off-by: Mathieu Poirier <[email protected]>
>>> ---
>>> drivers/remoteproc/remoteproc_internal.h | 50 ++++++++++++++++++++++++
>>> 1 file changed, 50 insertions(+)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
>>> index dda7044c4b3e..3985c084b184 100644
>>> --- a/drivers/remoteproc/remoteproc_internal.h
>>> +++ b/drivers/remoteproc/remoteproc_internal.h
>>> @@ -72,6 +72,12 @@ static inline bool rproc_needs_syncing(struct rproc *rproc)
>>> static inline
>>> int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
>>> {
>>> + if (rproc_needs_syncing(rproc)) {
>>> + if (rproc->sync_ops && rproc->sync_ops->sanity_check)
>>> + return rproc->sync_ops->sanity_check(rproc, fw);
>>> + return 0;
>>> + }
>>> +
>>> if (rproc->ops && rproc->ops->sanity_check)
>>> return rproc->ops->sanity_check(rproc, fw);
>>
>> Regarding this patch I'm trying to determine whether it makes sense to have ops or
>> sync_ops set to null. Your[v3 01/14] patch commit explains that ops can be null in case of
>> synchronisation.
>> But it seems deprecated with the sync_ops introduction...
>
> Your comment made me go over the logic again... If rproc_needs_syncing() is
> true then we necessarily have a sync_ops. If rproc_needs_syncing() is false,
> there too we automatically have an ops. As such and as you point out, checking
> for rproc->sync_ops and rproc->ops is probably useless.
An additional test in rproc_set_state_machine() should be sufficient, something like this:
/* rproc->ops struct is mandatory if at least one sync flag is false */
if (!rproc->ops && !(sync_flags.on_init &&
sync_flags.after_stop && sync_flags.after_crash))
return -EINVAL;
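
To see how this check would combine with the one already in the patch, here
is a userspace sketch of the validation alone. The names sync_flags_model and
model_check_state_machine() are made up for the sketch, and placing the new
check right after the NULL test is an assumption, not something settled in
this thread:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of struct rproc_sync_flags from the series. */
struct sync_flags_model {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

/*
 * Validation only: returns 0 if the ops/sync_ops/flags combination is
 * usable, -EINVAL otherwise. The pointers are opaque here, only their
 * NULL-ness matters.
 */
static int model_check_state_machine(const void *ops, const void *sync_ops,
				     struct sync_flags_model flags)
{
	bool always_sync = flags.on_init && flags.after_stop &&
			   flags.after_crash;

	if (!sync_ops)
		return -EINVAL;

	/* Suggested check: ops is mandatory if at least one flag is false. */
	if (!ops && !always_sync)
		return -EINVAL;

	/* Check from the patch: ops allocated but never used. */
	if (ops && always_sync)
		return -EINVAL;

	return 0;
}
```

With both checks in place, ops may be NULL only when all three flags are
true, which is what would let the callers drop their rproc->ops and
rproc->sync_ops tests.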

>
>>
>> And if sync_ops is null, is it still necessary to define a remoteproc device?
>
> Not sure I understand your point here but with the reasoning from above it
> is probably moot anyway.
Just to mention that a platform device with both ops and sync_ops null seems like nonsense.

Regards,
Arnaud
>
>>
>> Regards
>> Arnad
>>
>>>
>>> @@ -81,6 +87,12 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
>>> static inline
>>> u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
>>> {
>>> + if (rproc_needs_syncing(rproc)) {
>>> + if (rproc->sync_ops && rproc->sync_ops->get_boot_addr)
>>> + return rproc->sync_ops->get_boot_addr(rproc, fw);
>>> + return 0;
>>> + }
>>> +
>>> if (rproc->ops && rproc->ops->get_boot_addr)
>>> return rproc->ops->get_boot_addr(rproc, fw);
>>>
>>> @@ -90,6 +102,12 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
>>> static inline
>>> int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
>>> {
>>> + if (rproc_needs_syncing(rproc)) {
>>> + if (rproc->sync_ops && rproc->sync_ops->load)
>>> + return rproc->sync_ops->load(rproc, fw);
>>> + return 0;
>>> + }
>>> +
>>> if (rproc->ops && rproc->ops->load)
>>> return rproc->ops->load(rproc, fw);
>>>
>>> @@ -98,6 +116,12 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
>>>
>>> static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
>>> {
>>> + if (rproc_needs_syncing(rproc)) {
>>> + if (rproc->sync_ops && rproc->sync_ops->parse_fw)
>>> + return rproc->sync_ops->parse_fw(rproc, fw);
>>> + return 0;
>>> + }
>>> +
>>> if (rproc->ops && rproc->ops->parse_fw)
>>> return rproc->ops->parse_fw(rproc, fw);
>>>
>>> @@ -108,6 +132,13 @@ static inline
>>> int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
>>> int avail)
>>> {
>>> + if (rproc_needs_syncing(rproc)) {
>>> + if (rproc->sync_ops && rproc->sync_ops->handle_rsc)
>>> + return rproc->sync_ops->handle_rsc(rproc, rsc_type,
>>> + rsc, offset, avail);
>>> + return 0;
>>> + }
>>> +
>>> if (rproc->ops && rproc->ops->handle_rsc)
>>> return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
>>> avail);
>>> @@ -119,6 +150,13 @@ static inline
>>> struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
>>> const struct firmware *fw)
>>> {
>>> + if (rproc_needs_syncing(rproc)) {
>>> + if (rproc->sync_ops && rproc->sync_ops->find_loaded_rsc_table)
>>> + return rproc->sync_ops->find_loaded_rsc_table(rproc,
>>> + fw);
>>> + return NULL;
>>> + }
>>> +
>>> if (rproc->ops && rproc->ops->find_loaded_rsc_table)
>>> return rproc->ops->find_loaded_rsc_table(rproc, fw);
>>>
>>> @@ -127,6 +165,12 @@ struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
>>>
>>> static inline int rproc_start_device(struct rproc *rproc)
>>> {
>>> + if (rproc_needs_syncing(rproc)) {
>>> + if (rproc->sync_ops && rproc->sync_ops->start)
>>> + return rproc->sync_ops->start(rproc);
>>> + return 0;
>>> + }
>>> +
>>> if (rproc->ops && rproc->ops->start)
>>> return rproc->ops->start(rproc);
>>>
>>> @@ -135,6 +179,12 @@ static inline int rproc_start_device(struct rproc *rproc)
>>>
>>> static inline int rproc_stop_device(struct rproc *rproc)
>>> {
>>> + if (rproc_needs_syncing(rproc)) {
>>> + if (rproc->sync_ops && rproc->sync_ops->stop)
>>> + return rproc->sync_ops->stop(rproc);
>>> + return 0;
>>> + }
>>> +
>>> if (rproc->ops && rproc->ops->stop)
>>> return rproc->ops->stop(rproc);
>>>
>>>

2020-05-04 11:38:04

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 10/14] remoteproc: Deal with synchronisation when shutting down



On 4/30/20 10:23 PM, Mathieu Poirier wrote:
> On Wed, Apr 29, 2020 at 10:19:49AM +0200, Arnaud POULIQUEN wrote:
>>
>>
>> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
>>> The remoteproc core must not allow function rproc_shutdown() to
>>> proceed if currently synchronising with a remote processor and
>>> the synchronisation operations of that remote processor does not
>>> support it. Also part of the process is to set the synchronisation
>>> flag so that the remoteproc core can make the right decisions when
>>> restarting the system.
>>>
>>> Signed-off-by: Mathieu Poirier <[email protected]>
>>> ---
>>> drivers/remoteproc/remoteproc_core.c | 32 ++++++++++++++++++++++++
>>> drivers/remoteproc/remoteproc_internal.h | 7 ++++++
>>> 2 files changed, 39 insertions(+)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>> index 3a84a38ba37b..48afa1f80a8f 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -1849,6 +1849,27 @@ int rproc_boot(struct rproc *rproc)
>>> }
>>> EXPORT_SYMBOL(rproc_boot);
>>>
>>> +static bool rproc_can_shutdown(struct rproc *rproc)
>>> +{
>>> + /*
>>> + * The remoteproc core is the lifecycle manager, no problem
>>> + * calling for a shutdown.
>>> + */
>>> + if (!rproc_needs_syncing(rproc))
>>> + return true;
>>> +
>>> + /*
>>> + * The remoteproc has been loaded by another entity (as per above
>>> + * condition) and the platform code has given us the capability
>>> + * of stopping it.
>>> + */
>>> + if (rproc->sync_ops->stop)
>>> + return true;
>>
>> This means that if rproc->sync_ops->stop is null rproc_stop_subdevices will not
>> be called? seems not symmetric with the start sequence.
>
> If rproc->sync_ops->stop is not provided then the remoteproc core can't stop the
> remote processor at all after it has synchronised with it. If a usecase
> requires some kind of soft reset then a stop() function that uses a mailbox
> notification or some other mechanism can be provided to tell the remote
> processor to put itself back in startup mode again.
>
> Is this fine with you or there is still something I don't get?

My point here is more about the subdevices. But perhaps I missed something...

In rproc_start(), rproc_start_subdevices() is called even if sync_ops->start is null.
But in rproc_shutdown(), rproc_stop() is not called if sync_ops->stop is null,
so rproc_stop_subdevices() is not called in this case.
Then if sync_flags.after_stop is false, it looks like something will go wrong
at the next start.
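
The asymmetry can be reproduced in a small userspace model. It is a sketch
under assumptions: ops_model, rproc_model and the model_*() helpers are
invented names, the normal (non-sync) boot path is left out, and a single
boolean stands in for the subdevice list:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of struct rproc_ops. */
struct ops_model {
	int (*start)(void);
	int (*stop)(void);
};

/* Hypothetical stand-in for struct rproc. */
struct rproc_model {
	const struct ops_model *sync_ops;
	bool sync_with_rproc;
	bool after_stop;
	bool subdevs_started;
};

/* Example platform stop callback that does nothing. */
static int model_noop_stop(void)
{
	return 0;
}

/* Start path: subdevices are started whether or not a start op exists. */
static void model_start(struct rproc_model *rproc)
{
	if (rproc->sync_with_rproc && rproc->sync_ops->start)
		rproc->sync_ops->start();

	rproc->subdevs_started = true;
}

/* Same decision as rproc_can_shutdown() in the patch. */
static bool model_can_shutdown(const struct rproc_model *rproc)
{
	if (!rproc->sync_with_rproc)
		return true;

	return rproc->sync_ops->stop != NULL;
}

/* Shutdown path: bails out early, before the subdevices are stopped. */
static bool model_shutdown(struct rproc_model *rproc)
{
	if (!model_can_shutdown(rproc))
		return false;

	rproc->subdevs_started = false;
	if (rproc->sync_with_rproc && rproc->sync_ops->stop)
		rproc->sync_ops->stop();

	rproc->sync_with_rproc = rproc->after_stop;

	return true;
}
```

In this model a remote processor synchronised without a stop op keeps its
subdevices started after model_shutdown() returns false, while providing even
an empty stop callback lets the subdevices be stopped.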

>
>> Probably not useful to test it here as condition is already handled in rproc_stop_device...
>>
>> Regards
>> Arnaud
>>> +
>>> + /* Any other condition should not be allowed */
>>> + return false;
>>> +}
>>> +
>>> /**
>>> * rproc_shutdown() - power off the remote processor
>>> * @rproc: the remote processor
>>> @@ -1879,6 +1900,9 @@ void rproc_shutdown(struct rproc *rproc)
>>> return;
>>> }
>>>
>>> + if (!rproc_can_shutdown(rproc))
>>> + goto out;
>>> +
>>> /* if the remote proc is still needed, bail out */
>>> if (!atomic_dec_and_test(&rproc->power))
>>> goto out;
>>> @@ -1898,6 +1922,14 @@ void rproc_shutdown(struct rproc *rproc)
>>> kfree(rproc->cached_table);
>>> rproc->cached_table = NULL;
>>> rproc->table_ptr = NULL;
>>> +
>>> + /*
>>> + * The remote processor has been switched off - tell the core what
>>> + * operation to use from hereon, i.e whether an external entity will
>>> + * reboot the remote processor or it is now the remoteproc core's
>>> + * responsability.
>>> + */
>>> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_SHUTDOWN);
>>> out:
>>> mutex_unlock(&rproc->lock);
>>> }
>>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
>>> index 61500981155c..7dcc0a26892b 100644
>>> --- a/drivers/remoteproc/remoteproc_internal.h
>>> +++ b/drivers/remoteproc/remoteproc_internal.h
>>> @@ -27,6 +27,9 @@ struct rproc_debug_trace {
>>> /*
>>> * enum rproc_sync_states - remote processsor sync states
>>> *
>>> + * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
>>> + * has shutdown (rproc_shutdown()) the
>>> + * remote processor.
>>> * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
>>> * has crashed but has not been recovered by
>>> * the remoteproc core yet.
>>> @@ -36,6 +39,7 @@ struct rproc_debug_trace {
>>> * operation to use.
>>> */
>>> enum rproc_sync_states {
>>> + RPROC_SYNC_STATE_SHUTDOWN,
>>> RPROC_SYNC_STATE_CRASHED,
>>> };
>>>
>>> @@ -43,6 +47,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
>>> enum rproc_sync_states state)
>>> {
>>> switch (state) {
>>> + case RPROC_SYNC_STATE_SHUTDOWN:
>>> + rproc->sync_with_rproc = rproc->sync_flags.after_stop;
>>> + break;
>>> case RPROC_SYNC_STATE_CRASHED:
>>> rproc->sync_with_rproc = rproc->sync_flags.after_crash;
>>> break;
>>>

2020-05-04 12:00:44

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 12/14] remoteproc: Introducing function rproc_set_state_machine()



On 4/30/20 10:42 PM, Mathieu Poirier wrote:
> On Wed, Apr 29, 2020 at 11:22:28AM +0200, Arnaud POULIQUEN wrote:
>>
>>
>> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
>>> Introducting function rproc_set_state_machine() to add
>>> operations and a set of flags to use when synchronising with
>>> a remote processor.
>>>
>>> Signed-off-by: Mathieu Poirier <[email protected]>
>>> ---
>>> drivers/remoteproc/remoteproc_core.c | 54 ++++++++++++++++++++++++
>>> drivers/remoteproc/remoteproc_internal.h | 6 +++
>>> include/linux/remoteproc.h | 3 ++
>>> 3 files changed, 63 insertions(+)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>> index 48afa1f80a8f..5c48714e8702 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -2065,6 +2065,59 @@ int devm_rproc_add(struct device *dev, struct rproc *rproc)
>>> }
>>> EXPORT_SYMBOL(devm_rproc_add);
>>>
>>> +/**
>>> + * rproc_set_state_machine() - Set a synchronisation ops and set of flags
>>> + * to use with a remote processor
>>> + * @rproc: The remote processor to work with
>>> + * @sync_ops: The operations to use when synchronising with a remote
>>> + * processor
>>> + * @sync_flags: The flags to use when deciding if the remoteproc core
>>> + * should be synchronising with a remote processor
>>> + *
>>> + * Returns 0 on success, an error code otherwise.
>>> + */
>>> +int rproc_set_state_machine(struct rproc *rproc,
>>> + const struct rproc_ops *sync_ops,
>>> + struct rproc_sync_flags sync_flags)
>>
>> So this API should be called by platform driver only in case of synchronization
>> support, right?
>
> Correct
>
>> In this case i would rename it as there is also a state machine in "normal" boot
>> proposal: rproc_set_sync_machine or rproc_set_sync_state_machine
>
> That is a valid observation - rproc_set_sync_state_machine() sounds descriptive
> enough for me.
>
>>
>>> +{
>>> + if (!rproc || !sync_ops)
>>> + return -EINVAL;
>>> +
>>> + /*
>>> + * No point in going further if we never have to synchronise with
>>> + * the remote processor.
>>> + */
>>> + if (!sync_flags.on_init &&
>>> + !sync_flags.after_stop && !sync_flags.after_crash)
>>> + return 0;
>>> +
>>> + /*
>>> + * Refuse to go further if remoteproc operations have been allocated
>>> + * but they will never be used.
>>> + */
>>> + if (rproc->ops && sync_flags.on_init &&
>>> + sync_flags.after_stop && sync_flags.after_crash)
>>> + return -EINVAL;
>>> +
>>> + /*
>>> + * Don't allow users to set this more than once to avoid situations
>>> + * where the remote processor can't be recovered.
>>> + */
>>> + if (rproc->sync_ops)
>>> + return -EINVAL;
>>> +
>>> + rproc->sync_ops = kmemdup(sync_ops, sizeof(*sync_ops), GFP_KERNEL);
>>> + if (!rproc->sync_ops)
>>> + return -ENOMEM;
>>> +
>>> + rproc->sync_flags = sync_flags;
>>> + /* Tell the core what to do when initialising */
>>> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT);
>>
>> Is there a use case where sync_flags.on_init is false and other flags are true?
>
> I haven't seen one yet, which doesn't mean it doesn't exist or won't in the
> future. I wanted to make this as flexible as possible. I started with the idea
> of making synchronisation at initialisation time implicit if
> rproc_set_state_machine() is called but I know it is only a matter of time
> before people come up with some exotic use case where .on_init is false.

So having on_init false but after_crash && after_stop true means loading the
firmware on first start, and then synchronizing with it, right?
Yes, that could probably be a valid, if exotic, use case. :)
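
That combination can be traced through a userspace copy of the flag
selection logic. This is only a sketch: the SYNC_STATE_* values,
sync_flags_model, rproc_model and model_set_sync_flag() are renamed stand-ins
for the definitions quoted in this thread:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for enum rproc_sync_states. */
enum sync_state_model {
	SYNC_STATE_INIT,
	SYNC_STATE_SHUTDOWN,
	SYNC_STATE_CRASHED,
};

/* Stand-in for struct rproc_sync_flags. */
struct sync_flags_model {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

/* Stand-in for the relevant fields of struct rproc. */
struct rproc_model {
	struct sync_flags_model sync_flags;
	bool sync_with_rproc;
};

/* Same selection as rproc_set_sync_flag(): pick the flag for the new state. */
static void model_set_sync_flag(struct rproc_model *rproc,
				enum sync_state_model state)
{
	switch (state) {
	case SYNC_STATE_INIT:
		rproc->sync_with_rproc = rproc->sync_flags.on_init;
		break;
	case SYNC_STATE_SHUTDOWN:
		rproc->sync_with_rproc = rproc->sync_flags.after_stop;
		break;
	case SYNC_STATE_CRASHED:
		rproc->sync_with_rproc = rproc->sync_flags.after_crash;
		break;
	}
}
```

With flags { false, true, true } the core would boot the remote processor
itself at init, then synchronise with it after a shutdown or a crash.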

>
>>
>> Look like on_init is useless and should not be exposed to the platform driver.
>> Or comments are missing to explain the usage of it vs the other flags.
>
> Comments added in remoteproc_internal.h and the new section in
> Documentation/remoteproc.txt aren't sufficient? Can you give me a hint as to
> what you think is missing?

IMO something is quite confusing...
On one side on_init can be set to false.
But on the other side the flag is set by calling rproc_set_state_machine().
In Documentation/remoteproc.txt the rproc_set_state_machine() description is:

"This function should be called for cases where the remote processor has
been started by another entity, be it a boot loader or trusted environment,
and the remoteproc core is to synchronise with the remote processor rather
then boot it."

So how could on_init be false if "the remote processor has
been started by another entity"?

Regards,
Arnaud

>
>>
>> Regards,
>> Arnaud
>>
>>> +
>>> + return 0;
>>> +}
>>> +EXPORT_SYMBOL(rproc_set_state_machine);
>>> +
>>> /**
>>> * rproc_type_release() - release a remote processor instance
>>> * @dev: the rproc's device
>>> @@ -2088,6 +2141,7 @@ static void rproc_type_release(struct device *dev)
>>> kfree_const(rproc->firmware);
>>> kfree_const(rproc->name);
>>> kfree(rproc->ops);
>>> + kfree(rproc->sync_ops);
>>> kfree(rproc);
>>> }
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
>>> index 7dcc0a26892b..c1a293a37c78 100644
>>> --- a/drivers/remoteproc/remoteproc_internal.h
>>> +++ b/drivers/remoteproc/remoteproc_internal.h
>>> @@ -27,6 +27,8 @@ struct rproc_debug_trace {
>>> /*
>>> * enum rproc_sync_states - remote processsor sync states
>>> *
>>> + * @RPROC_SYNC_STATE_INIT state to use when the remoteproc core
>>> + * is initialising.
>>> * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
>>> * has shutdown (rproc_shutdown()) the
>>> * remote processor.
>>> @@ -39,6 +41,7 @@ struct rproc_debug_trace {
>>> * operation to use.
>>> */
>>> enum rproc_sync_states {
>>> + RPROC_SYNC_STATE_INIT,
>>> RPROC_SYNC_STATE_SHUTDOWN,
>>> RPROC_SYNC_STATE_CRASHED,
>>> };
>>> @@ -47,6 +50,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
>>> enum rproc_sync_states state)
>>> {
>>> switch (state) {
>>> + case RPROC_SYNC_STATE_INIT:
>>> + rproc->sync_with_rproc = rproc->sync_flags.on_init;
>>> + break;
>>> case RPROC_SYNC_STATE_SHUTDOWN:
>>> rproc->sync_with_rproc = rproc->sync_flags.after_stop;
>>> break;
>>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>>> index ceb3b2bba824..a75ed92b3de6 100644
>>> --- a/include/linux/remoteproc.h
>>> +++ b/include/linux/remoteproc.h
>>> @@ -619,6 +619,9 @@ struct rproc *rproc_get_by_child(struct device *dev);
>>> struct rproc *rproc_alloc(struct device *dev, const char *name,
>>> const struct rproc_ops *ops,
>>> const char *firmware, int len);
>>> +int rproc_set_state_machine(struct rproc *rproc,
>>> + const struct rproc_ops *sync_ops,
>>> + struct rproc_sync_flags sync_flags);
>>> void rproc_put(struct rproc *rproc);
>>> int rproc_add(struct rproc *rproc);
>>> int rproc_del(struct rproc *rproc);
>>>

2020-05-04 12:03:48

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 12/14] remoteproc: Introducing function rproc_set_state_machine()



On 4/30/20 10:51 PM, Mathieu Poirier wrote:
> On Wed, Apr 29, 2020 at 04:38:54PM +0200, Arnaud POULIQUEN wrote:
>>
>>
>> On 4/29/20 11:22 AM, Arnaud POULIQUEN wrote:
>>>
>>>
>>> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
>>>> Introducing function rproc_set_state_machine() to add
>>>> operations and a set of flags to use when synchronising with
>>>> a remote processor.
>>>>
>>>> Signed-off-by: Mathieu Poirier <[email protected]>
>>>> ---
>>>> drivers/remoteproc/remoteproc_core.c | 54 ++++++++++++++++++++++++
>>>> drivers/remoteproc/remoteproc_internal.h | 6 +++
>>>> include/linux/remoteproc.h | 3 ++
>>>> 3 files changed, 63 insertions(+)
>>>>
>>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>>> index 48afa1f80a8f..5c48714e8702 100644
>>>> --- a/drivers/remoteproc/remoteproc_core.c
>>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>>> @@ -2065,6 +2065,59 @@ int devm_rproc_add(struct device *dev, struct rproc *rproc)
>>>> }
>>>> EXPORT_SYMBOL(devm_rproc_add);
>>>>
>>>> +/**
>>>> + * rproc_set_state_machine() - Set a synchronisation ops and set of flags
>>>> + * to use with a remote processor
>>>> + * @rproc: The remote processor to work with
>>>> + * @sync_ops: The operations to use when synchronising with a remote
>>>> + * processor
>>>> + * @sync_flags: The flags to use when deciding if the remoteproc core
>>>> + * should be synchronising with a remote processor
>>>> + *
>>>> + * Returns 0 on success, an error code otherwise.
>>>> + */
>>>> +int rproc_set_state_machine(struct rproc *rproc,
>>>> + const struct rproc_ops *sync_ops,
>>>> + struct rproc_sync_flags sync_flags)
>>>
>>> So this API should be called by platform driver only in case of synchronization
>>> support, right?
>>> In this case i would rename it as there is also a state machine in "normal" boot
>>> proposal: rproc_set_sync_machine or rproc_set_sync_state_machine
>>>
>>
>> Reviewing the stm32 series, I wonder if sync_flags should be a pointer to a const structure,
>> as the platform driver should not update it during the rproc life cycle.
>> Also, IMO, using a pointer to the structure instead of the structure itself seems more
>> in line with the rest of the remoteproc API.
>
> Humm... If we do make sync_flags constant then the platform drivers can't modify
> the values dynamically, as I did in the stm32 series. This is something Loic
> had asked for.
>
> Moreover function rproc_set_state_machine() can't be called twice so updating
> the sync_flags can't happen.

You are right, making it constant is not a good idea.

Regards,
Arnaud
>
>>
>>>> +{
>>>> + if (!rproc || !sync_ops)
>>>> + return -EINVAL;
>>>> +
>>>> + /*
>>>> + * No point in going further if we never have to synchronise with
>>>> + * the remote processor.
>>>> + */
>>>> + if (!sync_flags.on_init &&
>>>> + !sync_flags.after_stop && !sync_flags.after_crash)
>>>> + return 0;
>>>> +
>>>> + /*
>>>> + * Refuse to go further if remoteproc operations have been allocated
>>>> + * but they will never be used.
>>>> + */
>>>> + if (rproc->ops && sync_flags.on_init &&
>>>> + sync_flags.after_stop && sync_flags.after_crash)
>>>> + return -EINVAL;
>>>> +
>>>> + /*
>>>> + * Don't allow users to set this more than once to avoid situations
>>>> + * where the remote processor can't be recovered.
>>>> + */
>>>> + if (rproc->sync_ops)
>>>> + return -EINVAL;
>>>> +
>>>> + rproc->sync_ops = kmemdup(sync_ops, sizeof(*sync_ops), GFP_KERNEL);
>>>> + if (!rproc->sync_ops)
>>>> + return -ENOMEM;
>>>> +
>>>> + rproc->sync_flags = sync_flags;
>>>> + /* Tell the core what to do when initialising */
>>>> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT);
>>>
>>> Is there a use case where sync_flags.on_init is false and other flags are true?
>>>
>>> Look like on_init is useless and should not be exposed to the platform driver.
>>> Or comments are missing to explain the usage of it vs the other flags.
>>>
>>> Regards,
>>> Arnaud
>>>
>>>> +
>>>> + return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(rproc_set_state_machine);
>>>> +
>>>> /**
>>>> * rproc_type_release() - release a remote processor instance
>>>> * @dev: the rproc's device
>>>> @@ -2088,6 +2141,7 @@ static void rproc_type_release(struct device *dev)
>>>> kfree_const(rproc->firmware);
>>>> kfree_const(rproc->name);
>>>> kfree(rproc->ops);
>>>> + kfree(rproc->sync_ops);
>>>> kfree(rproc);
>>>> }
>>>>
>>>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
>>>> index 7dcc0a26892b..c1a293a37c78 100644
>>>> --- a/drivers/remoteproc/remoteproc_internal.h
>>>> +++ b/drivers/remoteproc/remoteproc_internal.h
>>>> @@ -27,6 +27,8 @@ struct rproc_debug_trace {
>>>> /*
>>>> * enum rproc_sync_states - remote processsor sync states
>>>> *
>>>> + * @RPROC_SYNC_STATE_INIT state to use when the remoteproc core
>>>> + * is initialising.
>>>> * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
>>>> * has shutdown (rproc_shutdown()) the
>>>> * remote processor.
>>>> @@ -39,6 +41,7 @@ struct rproc_debug_trace {
>>>> * operation to use.
>>>> */
>>>> enum rproc_sync_states {
>>>> + RPROC_SYNC_STATE_INIT,
>>>> RPROC_SYNC_STATE_SHUTDOWN,
>>>> RPROC_SYNC_STATE_CRASHED,
>>>> };
>>>> @@ -47,6 +50,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
>>>> enum rproc_sync_states state)
>>>> {
>>>> switch (state) {
>>>> + case RPROC_SYNC_STATE_INIT:
>>>> + rproc->sync_with_rproc = rproc->sync_flags.on_init;
>>>> + break;
>>>> case RPROC_SYNC_STATE_SHUTDOWN:
>>>> rproc->sync_with_rproc = rproc->sync_flags.after_stop;
>>>> break;
>>>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
>>>> index ceb3b2bba824..a75ed92b3de6 100644
>>>> --- a/include/linux/remoteproc.h
>>>> +++ b/include/linux/remoteproc.h
>>>> @@ -619,6 +619,9 @@ struct rproc *rproc_get_by_child(struct device *dev);
>>>> struct rproc *rproc_alloc(struct device *dev, const char *name,
>>>> const struct rproc_ops *ops,
>>>> const char *firmware, int len);
>>>> +int rproc_set_state_machine(struct rproc *rproc,
>>>> + const struct rproc_ops *sync_ops,
>>>> + struct rproc_sync_flags sync_flags);
>>>> void rproc_put(struct rproc *rproc);
>>>> int rproc_add(struct rproc *rproc);
>>>> int rproc_del(struct rproc *rproc);
>>>>

2020-05-05 21:45:14

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 12/14] remoteproc: Introducing function rproc_set_state_machine()

On Mon, May 04, 2020 at 01:57:59PM +0200, Arnaud POULIQUEN wrote:
>
>
> On 4/30/20 10:42 PM, Mathieu Poirier wrote:
> > On Wed, Apr 29, 2020 at 11:22:28AM +0200, Arnaud POULIQUEN wrote:
> >>
> >>
> >> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> >>> Introducing function rproc_set_state_machine() to add
> >>> operations and a set of flags to use when synchronising with
> >>> a remote processor.
> >>>
> >>> Signed-off-by: Mathieu Poirier <[email protected]>
> >>> ---
> >>> drivers/remoteproc/remoteproc_core.c | 54 ++++++++++++++++++++++++
> >>> drivers/remoteproc/remoteproc_internal.h | 6 +++
> >>> include/linux/remoteproc.h | 3 ++
> >>> 3 files changed, 63 insertions(+)
> >>>
> >>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> >>> index 48afa1f80a8f..5c48714e8702 100644
> >>> --- a/drivers/remoteproc/remoteproc_core.c
> >>> +++ b/drivers/remoteproc/remoteproc_core.c
> >>> @@ -2065,6 +2065,59 @@ int devm_rproc_add(struct device *dev, struct rproc *rproc)
> >>> }
> >>> EXPORT_SYMBOL(devm_rproc_add);
> >>>
> >>> +/**
> >>> + * rproc_set_state_machine() - Set a synchronisation ops and set of flags
> >>> + * to use with a remote processor
> >>> + * @rproc: The remote processor to work with
> >>> + * @sync_ops: The operations to use when synchronising with a remote
> >>> + * processor
> >>> + * @sync_flags: The flags to use when deciding if the remoteproc core
> >>> + * should be synchronising with a remote processor
> >>> + *
> >>> + * Returns 0 on success, an error code otherwise.
> >>> + */
> >>> +int rproc_set_state_machine(struct rproc *rproc,
> >>> + const struct rproc_ops *sync_ops,
> >>> + struct rproc_sync_flags sync_flags)
> >>
> >> So this API should be called by platform driver only in case of synchronization
> >> support, right?
> >
> > Correct
> >
> >> In this case i would rename it as there is also a state machine in "normal" boot
> >> proposal: rproc_set_sync_machine or rproc_set_sync_state_machine
> >
> > That is a valid observation - rproc_set_sync_state_machine() sounds descriptive
> > enough for me.
> >
> >>
> >>> +{
> >>> + if (!rproc || !sync_ops)
> >>> + return -EINVAL;
> >>> +
> >>> + /*
> >>> + * No point in going further if we never have to synchronise with
> >>> + * the remote processor.
> >>> + */
> >>> + if (!sync_flags.on_init &&
> >>> + !sync_flags.after_stop && !sync_flags.after_crash)
> >>> + return 0;
> >>> +
> >>> + /*
> >>> + * Refuse to go further if remoteproc operations have been allocated
> >>> + * but they will never be used.
> >>> + */
> >>> + if (rproc->ops && sync_flags.on_init &&
> >>> + sync_flags.after_stop && sync_flags.after_crash)
> >>> + return -EINVAL;
> >>> +
> >>> + /*
> >>> + * Don't allow users to set this more than once to avoid situations
> >>> + * where the remote processor can't be recovered.
> >>> + */
> >>> + if (rproc->sync_ops)
> >>> + return -EINVAL;
> >>> +
> >>> + rproc->sync_ops = kmemdup(sync_ops, sizeof(*sync_ops), GFP_KERNEL);
> >>> + if (!rproc->sync_ops)
> >>> + return -ENOMEM;
> >>> +
> >>> + rproc->sync_flags = sync_flags;
> >>> + /* Tell the core what to do when initialising */
> >>> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_INIT);
> >>
> >> Is there a use case where sync_flags.on_init is false and other flags are true?
> >
> > I haven't seen one yet, which doesn't mean it doesn't exist or won't in the
> > future. I wanted to make this as flexible as possible. I started with the idea
> > of making synchronisation at initialisation time implicit if
> > rproc_set_state_machine() is called but I know it is only a matter of time
> > before people come up with some exotic use case where .on_init is false.
>
> So having on_init false but after_crash && after_stop true means loading the
> firmware on first start, and then synchronising with it, right?
> Yes, that could probably be an exotic but valid use case. :)
>
> >
> >>
> >> Look like on_init is useless and should not be exposed to the platform driver.
> >> Or comments are missing to explain the usage of it vs the other flags.
> >
> > Comments added in remoteproc_internal.h and the new section in
> > Documentation/remoteproc.txt aren't sufficient? Can you give me a hint as to
> > what you think is missing?
>
> IMO something is quite confusing...
> On one side on_init can be set to false.
> But on the other side the flag is set by call rproc_set_state_machine.
> In Documentation/remoteproc.txt rproc_set_state_machine description is:
>
> "This function should be called for cases where the remote processor has
> been started by another entity, be it a boot loader or trusted environment,
> and the remoteproc core is to synchronise with the remote processor rather
> then boot it."
>
> So how on_init could be false if "the remote processor has
> been started by another entity"?

I see your point and I think it is a question of documentation. I will rephrase
this to be more accurate.
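
To make the flag combinations concrete, here is a minimal userspace sketch of how the three flags drive sync_with_rproc. It is not kernel code: struct rproc_model and model_set_sync_flag() are hypothetical stand-ins that only mirror the switch in rproc_set_sync_flag() from this patch.

```c
#include <stdbool.h>

/* Simplified, userspace stand-ins for the kernel structures in this patch. */
struct rproc_sync_flags {
	bool on_init;		/* synchronise when the core initialises */
	bool after_stop;	/* synchronise after rproc_shutdown() */
	bool after_crash;	/* synchronise after a crash */
};

enum rproc_sync_states {
	RPROC_SYNC_STATE_INIT,
	RPROC_SYNC_STATE_SHUTDOWN,
	RPROC_SYNC_STATE_CRASHED,
};

/* Hypothetical model of struct rproc, reduced to the fields used here. */
struct rproc_model {
	struct rproc_sync_flags sync_flags;
	bool sync_with_rproc;
};

/* Mirrors the switch in rproc_set_sync_flag() from the patch. */
static void model_set_sync_flag(struct rproc_model *rproc,
				enum rproc_sync_states state)
{
	switch (state) {
	case RPROC_SYNC_STATE_INIT:
		rproc->sync_with_rproc = rproc->sync_flags.on_init;
		break;
	case RPROC_SYNC_STATE_SHUTDOWN:
		rproc->sync_with_rproc = rproc->sync_flags.after_stop;
		break;
	case RPROC_SYNC_STATE_CRASHED:
		rproc->sync_with_rproc = rproc->sync_flags.after_crash;
		break;
	}
}
```

With on_init false and the other two flags true, the model lets the core boot the remote processor the first time and then synchronises after a stop or a crash, which is the exotic case discussed above.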

>
> Regards,
> Arnaud
>
> >
> >>
> >> Regards,
> >> Arnaud
> >>
> >>> +
> >>> + return 0;
> >>> +}
> >>> +EXPORT_SYMBOL(rproc_set_state_machine);
> >>> +
> >>> /**
> >>> * rproc_type_release() - release a remote processor instance
> >>> * @dev: the rproc's device
> >>> @@ -2088,6 +2141,7 @@ static void rproc_type_release(struct device *dev)
> >>> kfree_const(rproc->firmware);
> >>> kfree_const(rproc->name);
> >>> kfree(rproc->ops);
> >>> + kfree(rproc->sync_ops);
> >>> kfree(rproc);
> >>> }
> >>>
> >>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> >>> index 7dcc0a26892b..c1a293a37c78 100644
> >>> --- a/drivers/remoteproc/remoteproc_internal.h
> >>> +++ b/drivers/remoteproc/remoteproc_internal.h
> >>> @@ -27,6 +27,8 @@ struct rproc_debug_trace {
> >>> /*
> >>> * enum rproc_sync_states - remote processsor sync states
> >>> *
> >>> + * @RPROC_SYNC_STATE_INIT state to use when the remoteproc core
> >>> + * is initialising.
> >>> * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
> >>> * has shutdown (rproc_shutdown()) the
> >>> * remote processor.
> >>> @@ -39,6 +41,7 @@ struct rproc_debug_trace {
> >>> * operation to use.
> >>> */
> >>> enum rproc_sync_states {
> >>> + RPROC_SYNC_STATE_INIT,
> >>> RPROC_SYNC_STATE_SHUTDOWN,
> >>> RPROC_SYNC_STATE_CRASHED,
> >>> };
> >>> @@ -47,6 +50,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
> >>> enum rproc_sync_states state)
> >>> {
> >>> switch (state) {
> >>> + case RPROC_SYNC_STATE_INIT:
> >>> + rproc->sync_with_rproc = rproc->sync_flags.on_init;
> >>> + break;
> >>> case RPROC_SYNC_STATE_SHUTDOWN:
> >>> rproc->sync_with_rproc = rproc->sync_flags.after_stop;
> >>> break;
> >>> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> >>> index ceb3b2bba824..a75ed92b3de6 100644
> >>> --- a/include/linux/remoteproc.h
> >>> +++ b/include/linux/remoteproc.h
> >>> @@ -619,6 +619,9 @@ struct rproc *rproc_get_by_child(struct device *dev);
> >>> struct rproc *rproc_alloc(struct device *dev, const char *name,
> >>> const struct rproc_ops *ops,
> >>> const char *firmware, int len);
> >>> +int rproc_set_state_machine(struct rproc *rproc,
> >>> + const struct rproc_ops *sync_ops,
> >>> + struct rproc_sync_flags sync_flags);
> >>> void rproc_put(struct rproc *rproc);
> >>> int rproc_add(struct rproc *rproc);
> >>> int rproc_del(struct rproc *rproc);
> >>>

2020-05-05 22:06:01

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 10/14] remoteproc: Deal with synchronisation when shutting down

On Mon, May 04, 2020 at 01:34:43PM +0200, Arnaud POULIQUEN wrote:
>
>
> On 4/30/20 10:23 PM, Mathieu Poirier wrote:
> > On Wed, Apr 29, 2020 at 10:19:49AM +0200, Arnaud POULIQUEN wrote:
> >>
> >>
> >> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> >>> The remoteproc core must not allow function rproc_shutdown() to
> >>> proceed if currently synchronising with a remote processor and
> >>> the synchronisation operations of that remote processor does not
> >>> support it. Also part of the process is to set the synchronisation
> >>> flag so that the remoteproc core can make the right decisions when
> >>> restarting the system.
> >>>
> >>> Signed-off-by: Mathieu Poirier <[email protected]>
> >>> ---
> >>> drivers/remoteproc/remoteproc_core.c | 32 ++++++++++++++++++++++++
> >>> drivers/remoteproc/remoteproc_internal.h | 7 ++++++
> >>> 2 files changed, 39 insertions(+)
> >>>
> >>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> >>> index 3a84a38ba37b..48afa1f80a8f 100644
> >>> --- a/drivers/remoteproc/remoteproc_core.c
> >>> +++ b/drivers/remoteproc/remoteproc_core.c
> >>> @@ -1849,6 +1849,27 @@ int rproc_boot(struct rproc *rproc)
> >>> }
> >>> EXPORT_SYMBOL(rproc_boot);
> >>>
> >>> +static bool rproc_can_shutdown(struct rproc *rproc)
> >>> +{
> >>> + /*
> >>> + * The remoteproc core is the lifecycle manager, no problem
> >>> + * calling for a shutdown.
> >>> + */
> >>> + if (!rproc_needs_syncing(rproc))
> >>> + return true;
> >>> +
> >>> + /*
> >>> + * The remoteproc has been loaded by another entity (as per above
> >>> + * condition) and the platform code has given us the capability
> >>> + * of stopping it.
> >>> + */
> >>> + if (rproc->sync_ops->stop)
> >>> + return true;
> >>
> >> This means that if rproc->sync_ops->stop is null rproc_stop_subdevices will not
> >> be called? seems not symmetric with the start sequence.
> >
> > If rproc->sync_ops->stop is not provided then the remoteproc core can't stop the
> > remote processor at all after it has synchronised with it. If a usecase
> > requires some kind of soft reset then a stop() function that uses a mailbox
> > notification or some other mechanism can be provided to tell the remote
> > processor to put itself back in startup mode again.
> >
> > Is this fine with you or there is still something I don't get?
>
> My point here is more around the subdevices. But perhaps i missed something...
>
> In rproc_start rproc_start_subdevices is called, even if sync_start is null.

Here I'll take it that you mean sync_ops::start().

> But in rproc_shutdown rproc_stop is not called, if sync_ops->stop is null.
> So rproc_stop_subdevices is not called in this case.

Correct. I am pretty sure some people don't want the remoteproc core to be able
to do anything other than synchronise with a remote processor, be it at boot
time or when the remote processor has crashed.

I can also see scenarios where people want to be able to start and stop
subdevices from the remoteproc core, but _not_ power cycle the remote processor.
In such cases the sync_ops::stop() should be some kind of notification telling
the remote processor to put itself back in initialisation mode and
sync_flags.after_stop should be set to true.

> Then if sync_flags.after_stop is false, it looks like that something will go wrong
> at next start.

If sync_ops::stop is NULL then the value of sync_flags.after_stop becomes
irrelevant because that state can't be reached. Let me know if you find a
condition where this isn't the case and I will correct it.
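
A small userspace sketch of that reasoning, under stated assumptions: rproc_needs_syncing() is modelled as a plain sync_with_rproc boolean, and struct rproc_model, struct rproc_ops_model, model_can_shutdown() and model_notify_stop() are hypothetical names. Unlike the patch, the model also guards against a NULL sync_ops.

```c
#include <stdbool.h>
#include <stddef.h>

struct rproc_model;

/* Hypothetical reduction of struct rproc_ops to the stop() callback. */
struct rproc_ops_model {
	int (*stop)(struct rproc_model *rproc);
};

struct rproc_model {
	const struct rproc_ops_model *sync_ops;
	bool sync_with_rproc;	/* stands in for rproc_needs_syncing() */
};

/* Same decision order as rproc_can_shutdown() in the patch. */
static bool model_can_shutdown(const struct rproc_model *rproc)
{
	/* The core is the lifecycle manager: shutdown is always allowed. */
	if (!rproc->sync_with_rproc)
		return true;

	/* Synchronising, but the platform code gave us a way to stop. */
	if (rproc->sync_ops && rproc->sync_ops->stop)
		return true;

	/* Synchronising with no stop(): the shutdown state is unreachable. */
	return false;
}

/* Example stop() that would only notify the remote side, e.g. via mailbox. */
static int model_notify_stop(struct rproc_model *rproc)
{
	(void)rproc;
	return 0;
}
```

Because the last branch returns false, rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_SHUTDOWN) is never reached in that configuration, so after_stop has no effect there.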

>
> >
> >> Probably not useful to test it here as condition is already handled in rproc_stop_device...
> >>
> >> Regards
> >> Arnaud
> >>> +
> >>> + /* Any other condition should not be allowed */
> >>> + return false;
> >>> +}
> >>> +
> >>> /**
> >>> * rproc_shutdown() - power off the remote processor
> >>> * @rproc: the remote processor
> >>> @@ -1879,6 +1900,9 @@ void rproc_shutdown(struct rproc *rproc)
> >>> return;
> >>> }
> >>>
> >>> + if (!rproc_can_shutdown(rproc))
> >>> + goto out;
> >>> +
> >>> /* if the remote proc is still needed, bail out */
> >>> if (!atomic_dec_and_test(&rproc->power))
> >>> goto out;
> >>> @@ -1898,6 +1922,14 @@ void rproc_shutdown(struct rproc *rproc)
> >>> kfree(rproc->cached_table);
> >>> rproc->cached_table = NULL;
> >>> rproc->table_ptr = NULL;
> >>> +
> >>> + /*
> >>> + * The remote processor has been switched off - tell the core what
> >>> + * operation to use from hereon, i.e whether an external entity will
> >>> + * reboot the remote processor or it is now the remoteproc core's
> >>> + * responsibility.
> >>> + */
> >>> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_SHUTDOWN);
> >>> out:
> >>> mutex_unlock(&rproc->lock);
> >>> }
> >>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> >>> index 61500981155c..7dcc0a26892b 100644
> >>> --- a/drivers/remoteproc/remoteproc_internal.h
> >>> +++ b/drivers/remoteproc/remoteproc_internal.h
> >>> @@ -27,6 +27,9 @@ struct rproc_debug_trace {
> >>> /*
> >>> * enum rproc_sync_states - remote processsor sync states
> >>> *
> >>> + * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
> >>> + * has shutdown (rproc_shutdown()) the
> >>> + * remote processor.
> >>> * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> >>> * has crashed but has not been recovered by
> >>> * the remoteproc core yet.
> >>> @@ -36,6 +39,7 @@ struct rproc_debug_trace {
> >>> * operation to use.
> >>> */
> >>> enum rproc_sync_states {
> >>> + RPROC_SYNC_STATE_SHUTDOWN,
> >>> RPROC_SYNC_STATE_CRASHED,
> >>> };
> >>>
> >>> @@ -43,6 +47,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
> >>> enum rproc_sync_states state)
> >>> {
> >>> switch (state) {
> >>> + case RPROC_SYNC_STATE_SHUTDOWN:
> >>> + rproc->sync_with_rproc = rproc->sync_flags.after_stop;
> >>> + break;
> >>> case RPROC_SYNC_STATE_CRASHED:
> >>> rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> >>> break;
> >>>

2020-05-05 22:12:55

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 08/14] remoteproc: Call core functions based on synchronisation flag

On Mon, May 04, 2020 at 01:14:59PM +0200, Arnaud POULIQUEN wrote:
> hi Mathieu,
>
> On 4/30/20 9:57 PM, Mathieu Poirier wrote:
> > On Tue, Apr 28, 2020 at 07:27:27PM +0200, Arnaud POULIQUEN wrote:
> >>
> >>
> >> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
> >>> Call the right core function based on whether we should synchronise
> >>> with a remote processor or boot it from scratch.
> >>>
> >>> Signed-off-by: Mathieu Poirier <[email protected]>
> >>> ---
> >>> drivers/remoteproc/remoteproc_internal.h | 50 ++++++++++++++++++++++++
> >>> 1 file changed, 50 insertions(+)
> >>>
> >>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> >>> index dda7044c4b3e..3985c084b184 100644
> >>> --- a/drivers/remoteproc/remoteproc_internal.h
> >>> +++ b/drivers/remoteproc/remoteproc_internal.h
> >>> @@ -72,6 +72,12 @@ static inline bool rproc_needs_syncing(struct rproc *rproc)
> >>> static inline
> >>> int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> >>> {
> >>> + if (rproc_needs_syncing(rproc)) {
> >>> + if (rproc->sync_ops && rproc->sync_ops->sanity_check)
> >>> + return rproc->sync_ops->sanity_check(rproc, fw);
> >>> + return 0;
> >>> + }
> >>> +
> >>> if (rproc->ops && rproc->ops->sanity_check)
> >>> return rproc->ops->sanity_check(rproc, fw);
> >>
> >> Regarding this patch I'm trying to determine whether it makes sense to have ops or
> >> sync_ops set to null. Your[v3 01/14] patch commit explains that ops can be null in case of
> >> synchronisation.
> >> But it seems deprecated with the sync_ops introduction...
> >
> > Your comment made me go over the logic again... If rproc_needs_syncing() is
> > true then we necessarily have a sync_ops. If rproc_needs_syncing() is false,
> > there too we automatically have an ops. As such and as you point out, checking
> > for rproc->sync_ops and rproc-ops is probably useless.
> An additional test in rproc_set_state_machine() should be sufficient, something like this:
> /* rproc->ops struct is mandatory if at least one sync flag is false */
> if (!rproc->ops && !(sync_flags.on_init &&
> sync_flags.after_stop && sync_flags.after_crash))
> return -EINVAL;

Right, something like that.
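
For illustration only, the check proposed above can be combined with the existing checks from patch 12 in a userspace sketch. model_check_sync_config() is a hypothetical helper, the pointers are opaque, and the ordering of the tests is an assumption rather than what a future revision will necessarily do.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct rproc_sync_flags {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

/*
 * Hypothetical validation helper: ops and sync_ops are only tested for
 * NULL, so opaque pointers are enough for the model.
 */
static int model_check_sync_config(const void *ops, const void *sync_ops,
				   struct rproc_sync_flags flags)
{
	bool all_sync = flags.on_init && flags.after_stop &&
			flags.after_crash;
	bool never_sync = !flags.on_init && !flags.after_stop &&
			  !flags.after_crash;

	if (!sync_ops)
		return -EINVAL;

	/* Proposed: ops is mandatory if at least one sync flag is false. */
	if (!ops && !all_sync)
		return -EINVAL;

	/* From the patch: nothing to do if we never synchronise. */
	if (never_sync)
		return 0;

	/* From the patch: ops were allocated but would never be used. */
	if (ops && all_sync)
		return -EINVAL;

	return 0;
}
```

With these tests in place, whichever of ops or sync_ops is selected by the current state is guaranteed to be non-NULL, which is what would allow the extra rproc->ops and rproc->sync_ops checks in the inline helpers to be dropped.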

>
> >
> >>
> >> And if sync_ops is null, is it still necessary to define a remoteproc device?
> >
> > Not sure I understand your point here but with the reasoning from above it
> > is probably moot anyway.
> Just to mention that a platform device with both ops and sync_ops null seems like nonsense.

We agree.

>
> Regards,
> Arnaud
> >
> >>
> >> Regards
> >> Arnaud
> >>
> >>>
> >>> @@ -81,6 +87,12 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> >>> static inline
> >>> u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> >>> {
> >>> + if (rproc_needs_syncing(rproc)) {
> >>> + if (rproc->sync_ops && rproc->sync_ops->get_boot_addr)
> >>> + return rproc->sync_ops->get_boot_addr(rproc, fw);
> >>> + return 0;
> >>> + }
> >>> +
> >>> if (rproc->ops && rproc->ops->get_boot_addr)
> >>> return rproc->ops->get_boot_addr(rproc, fw);
> >>>
> >>> @@ -90,6 +102,12 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> >>> static inline
> >>> int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> >>> {
> >>> + if (rproc_needs_syncing(rproc)) {
> >>> + if (rproc->sync_ops && rproc->sync_ops->load)
> >>> + return rproc->sync_ops->load(rproc, fw);
> >>> + return 0;
> >>> + }
> >>> +
> >>> if (rproc->ops && rproc->ops->load)
> >>> return rproc->ops->load(rproc, fw);
> >>>
> >>> @@ -98,6 +116,12 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> >>>
> >>> static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
> >>> {
> >>> + if (rproc_needs_syncing(rproc)) {
> >>> + if (rproc->sync_ops && rproc->sync_ops->parse_fw)
> >>> + return rproc->sync_ops->parse_fw(rproc, fw);
> >>> + return 0;
> >>> + }
> >>> +
> >>> if (rproc->ops && rproc->ops->parse_fw)
> >>> return rproc->ops->parse_fw(rproc, fw);
> >>>
> >>> @@ -108,6 +132,13 @@ static inline
> >>> int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
> >>> int avail)
> >>> {
> >>> + if (rproc_needs_syncing(rproc)) {
> >>> + if (rproc->sync_ops && rproc->sync_ops->handle_rsc)
> >>> + return rproc->sync_ops->handle_rsc(rproc, rsc_type,
> >>> + rsc, offset, avail);
> >>> + return 0;
> >>> + }
> >>> +
> >>> if (rproc->ops && rproc->ops->handle_rsc)
> >>> return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
> >>> avail);
> >>> @@ -119,6 +150,13 @@ static inline
> >>> struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> >>> const struct firmware *fw)
> >>> {
> >>> + if (rproc_needs_syncing(rproc)) {
> >>> + if (rproc->sync_ops && rproc->sync_ops->find_loaded_rsc_table)
> >>> + return rproc->sync_ops->find_loaded_rsc_table(rproc,
> >>> + fw);
> >>> + return NULL;
> >>> + }
> >>> +
> >>> if (rproc->ops && rproc->ops->find_loaded_rsc_table)
> >>> return rproc->ops->find_loaded_rsc_table(rproc, fw);
> >>>
> >>> @@ -127,6 +165,12 @@ struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> >>>
> >>> static inline int rproc_start_device(struct rproc *rproc)
> >>> {
> >>> + if (rproc_needs_syncing(rproc)) {
> >>> + if (rproc->sync_ops && rproc->sync_ops->start)
> >>> + return rproc->sync_ops->start(rproc);
> >>> + return 0;
> >>> + }
> >>> +
> >>> if (rproc->ops && rproc->ops->start)
> >>> return rproc->ops->start(rproc);
> >>>
> >>> @@ -135,6 +179,12 @@ static inline int rproc_start_device(struct rproc *rproc)
> >>>
> >>> static inline int rproc_stop_device(struct rproc *rproc)
> >>> {
> >>> + if (rproc_needs_syncing(rproc)) {
> >>> + if (rproc->sync_ops && rproc->sync_ops->stop)
> >>> + return rproc->sync_ops->stop(rproc);
> >>> + return 0;
> >>> + }
> >>> +
> >>> if (rproc->ops && rproc->ops->stop)
> >>> return rproc->ops->stop(rproc);
> >>>
> >>>

2020-05-05 22:19:57

by Bjorn Andersson

[permalink] [raw]
Subject: Re: [PATCH v3 01/14] remoteproc: Make core operations optional

On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:

> When synchronizing with a remote processor, it is entirely possible that
> the remoteproc core is not the life cycle manager. In such a case core
> operations don't exist and should not be called.
>

Why would the core call these functions if it knows the remote is in a
state where it doesn't need these?

Regards,
Bjorn

> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_internal.h | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index b389dc79da81..59fc871743c7 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -67,7 +67,7 @@ rproc_find_carveout_by_name(struct rproc *rproc, const char *name, ...);
> static inline
> int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> {
> - if (rproc->ops->sanity_check)
> + if (rproc->ops && rproc->ops->sanity_check)
> return rproc->ops->sanity_check(rproc, fw);
>
> return 0;
> @@ -76,7 +76,7 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> static inline
> u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> {
> - if (rproc->ops->get_boot_addr)
> + if (rproc->ops && rproc->ops->get_boot_addr)
> return rproc->ops->get_boot_addr(rproc, fw);
>
> return 0;
> @@ -85,7 +85,7 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> static inline
> int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> {
> - if (rproc->ops->load)
> + if (rproc->ops && rproc->ops->load)
> return rproc->ops->load(rproc, fw);
>
> return -EINVAL;
> @@ -93,7 +93,7 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
>
> static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
> {
> - if (rproc->ops->parse_fw)
> + if (rproc->ops && rproc->ops->parse_fw)
> return rproc->ops->parse_fw(rproc, fw);
>
> return 0;
> @@ -103,7 +103,7 @@ static inline
> int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
> int avail)
> {
> - if (rproc->ops->handle_rsc)
> + if (rproc->ops && rproc->ops->handle_rsc)
> return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
> avail);
>
> @@ -114,7 +114,7 @@ static inline
> struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> const struct firmware *fw)
> {
> - if (rproc->ops->find_loaded_rsc_table)
> + if (rproc->ops && rproc->ops->find_loaded_rsc_table)
> return rproc->ops->find_loaded_rsc_table(rproc, fw);
>
> return NULL;
> --
> 2.20.1
>

2020-05-05 22:33:38

by Bjorn Andersson

[permalink] [raw]
Subject: Re: [PATCH v3 02/14] remoteproc: Introduce function rproc_alloc_internals()

On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:

> In scenarios where the remote processor's lifecycle is entirely
> managed by another entity there is no point in allocating memory for
> a firmware name since it will never be used. The same goes for a core
> set of operations.
>
> As such introduce function rproc_alloc_internals() to decide if the
> allocation of a firmware name and the core operations need to be done.
> That way rproc_alloc() can be kept as clean as possible.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 31 +++++++++++++++++++++++-----
> 1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 448262470fc7..1b4756909584 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -2076,6 +2076,30 @@ static int rproc_alloc_ops(struct rproc *rproc, const struct rproc_ops *ops)
> return 0;
> }
>
> +static int rproc_alloc_internals(struct rproc *rproc,
> + const struct rproc_ops *ops,
> + const char *name, const char *firmware)
> +{
> + int ret;
> +
> + /*
> + * In scenarios where the remote processor's lifecycle is entirely
> + * managed by another entity there is no point in carrying a set
> + * of operations that will never be used.
> + *
> + * And since no firmware will ever be loaded, there is no point in
> + * allocating memory for it either.

While this is true, I would expect that there are cases where the
remoteproc has ops but no firmware.

How about splitting this decision already now, i.e. moving the if (!ops)
check to rproc_alloc_ops() and perhaps only allocating firmware if ops->load
is specified?
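
As a rough userspace sketch of that split (not kernel code; struct rproc_ops_model and the model_* helpers are hypothetical names), the two decisions could be made independently:

```c
#include <stdbool.h>
#include <stddef.h>

struct fw_model;

/* Hypothetical reduction of struct rproc_ops to two callbacks. */
struct rproc_ops_model {
	int (*load)(const struct fw_model *fw);
	int (*start)(void);
};

/* Decision that would live in rproc_alloc_ops(): copy ops only if given. */
static bool model_needs_ops_copy(const struct rproc_ops_model *ops)
{
	return ops != NULL;
}

/* A firmware name is only useful when something can load the firmware. */
static bool model_needs_firmware_name(const struct rproc_ops_model *ops)
{
	return ops != NULL && ops->load != NULL;
}

/* Placeholder callbacks so the model can be exercised. */
static int model_load(const struct fw_model *fw)
{
	(void)fw;
	return 0;
}

static int model_start(void)
{
	return 0;
}
```

A driver that provides ops without load() would then get its operations copied but no firmware name allocated.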

Regards,
Bjorn

> + */
> + if (!ops)
> + return 0;
> +
> + ret = rproc_alloc_firmware(rproc, name, firmware);
> + if (ret)
> + return ret;
> +
> + return rproc_alloc_ops(rproc, ops);
> +}
> +
> /**
> * rproc_alloc() - allocate a remote processor handle
> * @dev: the underlying device
> @@ -2105,7 +2129,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name,
> {
> struct rproc *rproc;
>
> - if (!dev || !name || !ops)
> + if (!dev || !name)
> return NULL;
>
> rproc = kzalloc(sizeof(struct rproc) + len, GFP_KERNEL);
> @@ -2128,10 +2152,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name,
> if (!rproc->name)
> goto put_device;
>
> - if (rproc_alloc_firmware(rproc, name, firmware))
> - goto put_device;
> -
> - if (rproc_alloc_ops(rproc, ops))
> + if (rproc_alloc_internals(rproc, ops, name, firmware))
> goto put_device;
>
> /* Assign a unique device index and name */
> --
> 2.20.1
>

2020-05-06 00:37:17

by Bjorn Andersson

[permalink] [raw]
Subject: Re: [PATCH v3 05/14] remoteproc: Refactor function rproc_fw_boot()

On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:

> Refactor function rproc_fw_boot() in order to better reflect the work
> that is done when supporting scenarios where the remoteproc core is
> synchronising with a remote processor.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index a02593b75bec..e90a21de9de1 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1370,9 +1370,9 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
> }
>
> /*
> - * take a firmware and boot a remote processor with it.
> + * boot or synchronise with a remote processor.
> */
> -static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> +static int rproc_actuate_device(struct rproc *rproc, const struct firmware *fw)

Per patch 4, this function will be called with fw == NULL when
rproc_needs_syncing() is true. It's not obvious to me that the various
operations on "fw" in this function are still valid.

> {
> struct device *dev = &rproc->dev;
> const char *name = rproc->firmware;
> @@ -1382,7 +1382,9 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> if (ret)
> return ret;
>
> - dev_info(dev, "Booting fw image %s, size %zd\n", name, fw->size);
> + if (!rproc_needs_syncing(rproc))

Can't we make this check on fw, to make the relationship explicit: "if we
were passed a firmware object, we're going to load and boot that firmware"?

Regards,
Bjorn

> + dev_info(dev, "Booting fw image %s, size %zd\n",
> + name, fw->size);
>
> /*
> * if enabling an IOMMU isn't relevant for this rproc, this is
> @@ -1818,7 +1820,7 @@ int rproc_boot(struct rproc *rproc)
> }
> }
>
> - ret = rproc_fw_boot(rproc, firmware_p);
> + ret = rproc_actuate_device(rproc, firmware_p);
>
> release_firmware(firmware_p);
>
> --
> 2.20.1
>
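
The suggestion to key the decision on fw itself can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
struct firmware below is a simplified stand-in for the kernel type, and
boot_banner() is a hypothetical helper that does not exist in the series.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct firmware. */
struct firmware {
	size_t size;
};

/*
 * Hypothetical helper: format the boot banner only when a firmware
 * object was handed in. Returns the number of characters written,
 * or 0 when there is no image to load (fw == NULL).
 */
static int boot_banner(char *buf, size_t len, const char *name,
		       const struct firmware *fw)
{
	if (!fw)
		return 0;	/* synchronising: nothing to load or print */

	return snprintf(buf, len, "Booting fw image %s, size %zu\n",
			name, fw->size);
}
```

With this shape the caller never dereferences fw on the synchronisation
path, and the presence of a firmware object is the only thing that selects
the "load and boot" behaviour.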

2020-05-06 00:42:54

by Bjorn Andersson

Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:

> Add a new sync_ops to support use cases where the remoteproc
> core is synchronising with the remote processor. Exactly when to use
> the synchronisation operations is directed by the flags in structure
> rproc_sync_flags.
>

I'm sorry, but no matter how many times I read these patches I have to
translate "synchronising" to "remote controlled", and the number of
comments clarifying this makes me feel that we could perhaps come up
with a better name?

> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> include/linux/remoteproc.h | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> index ac4082f12e8b..ceb3b2bba824 100644
> --- a/include/linux/remoteproc.h
> +++ b/include/linux/remoteproc.h
> @@ -353,6 +353,23 @@ enum rsc_handling_status {
> RSC_IGNORED = 1,
> };
>
> +/**
> + * struct rproc_sync_flags - platform specific flags indicating which
> + * rproc_ops to use at specific times during
> + * the rproc lifecycle.
> + * @on_init: true if synchronising with the remote processor at
> + * initialisation time
> + * @after_stop: true if synchronising with the remote processor after it was
> + * stopped from the cmmand line
> + * @after_crash: true if synchronising with the remote processor after
> + * it has crashed
> + */
> +struct rproc_sync_flags {
> + bool on_init;

This indirectly splits the RPROC_OFFLINE state in an "offline" and
"already-booted" state. Wouldn't it be clearer to represent this with a
new RPROC_ALREADY_BOOTED state?

> + bool after_stop;

What does it mean when this is true? That Linux can shut the remote core
down, but someone else will start it?

> + bool after_crash;

Similarly, what are the expected steps to be taken by the core when this
is true? Should rproc_report_crash() simply stop/start the subdevices
and, through one of the ops, somehow tell the remote controller that it
can proceed with the recovery?

> +};
> +
> /**
> * struct rproc_ops - platform-specific device handlers
> * @start: power on the device and boot it
> @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> * @firmware: name of firmware file to be loaded
> * @priv: private data which belongs to the platform-specific rproc module
> * @ops: platform-specific start/stop rproc handlers
> + * @sync_ops: platform-specific start/stop rproc handlers when
> + * synchronising with a remote processor.
> + * @sync_flags: Determine the rproc_ops to choose in specific states.
> * @dev: virtual device for refcounting and common remoteproc behavior
> * @power: refcount of users who need this rproc powered up
> * @state: state of the device
> @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> * @table_sz: size of @cached_table
> * @has_iommu: flag to indicate if remote processor is behind an MMU
> * @auto_boot: flag to indicate if remote processor should be auto-started
> + * @sync_with_rproc: true if currently synchronising with the rproc
> * @dump_segments: list of segments in the firmware
> * @nb_vdev: number of vdev currently handled by rproc
> */
> @@ -492,6 +513,8 @@ struct rproc {
> const char *firmware;
> void *priv;
> struct rproc_ops *ops;
> + struct rproc_ops *sync_ops;

Do we really need two rproc_ops, given that both are coming from the
platform driver and the sync_flags will define which one to look at?

Can't the platform driver just provide an ops table that works with the
flags it passes?

Regards,
Bjorn

> + struct rproc_sync_flags sync_flags;
> struct device dev;
> atomic_t power;
> unsigned int state;
> @@ -515,6 +538,7 @@ struct rproc {
> size_t table_sz;
> bool has_iommu;
> bool auto_boot;
> + bool sync_with_rproc;
> struct list_head dump_segments;
> int nb_vdev;
> u8 elf_class;
> --
> 2.20.1
>
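
The idea of a dedicated "already-booted" state, instead of an on_init flag
that splits RPROC_OFFLINE in two, might look roughly like the userspace
model below. Everything here is an assumption made for illustration: the
mock_ names are invented, MOCK_RPROC_ALREADY_BOOTED mirrors the
RPROC_ALREADY_BOOTED state proposed in the review, and none of it is code
from the series.

```c
#include <stdbool.h>

/* Mock of the rproc state enum; ALREADY_BOOTED is the proposed addition. */
enum mock_rproc_state {
	MOCK_RPROC_OFFLINE,
	MOCK_RPROC_ALREADY_BOOTED,	/* started by another entity */
	MOCK_RPROC_RUNNING,
	MOCK_RPROC_CRASHED,
};

struct mock_boot_result {
	enum mock_rproc_state state;	/* state after the boot attempt */
	bool loaded_firmware;		/* did the core load an image? */
};

/* Boot from OFFLINE loads firmware; boot from ALREADY_BOOTED only attaches. */
static struct mock_boot_result mock_boot(enum mock_rproc_state state)
{
	struct mock_boot_result res = { state, false };

	switch (state) {
	case MOCK_RPROC_OFFLINE:
		res.loaded_firmware = true;
		res.state = MOCK_RPROC_RUNNING;
		break;
	case MOCK_RPROC_ALREADY_BOOTED:
		res.state = MOCK_RPROC_RUNNING;
		break;
	default:
		break;	/* running or crashed: out of scope for this sketch */
	}

	return res;
}
```

The point of the model is that the starting state alone decides whether
firmware is loaded, so no separate flag has to be consulted at boot time.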

2020-05-06 00:46:38

by Bjorn Andersson

Subject: Re: [PATCH v3 07/14] remoteproc: Introducting new start and stop functions

On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:

> Add new functions to replace direct calling of rproc->ops->start() and
> rproc->ops->stop(). That way different behaviour can be played out
> when booting a remote processor or synchronising with it.
>

Reviewed-by: Bjorn Andersson <[email protected]>

PS. I do wonder whether we should just inline struct rproc_ops in
struct rproc, rather than allocate a separate object for it. After
adding all your accessors, changing this would be quite succinct.

Regards,
Bjorn

> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 6 +++---
> drivers/remoteproc/remoteproc_internal.h | 16 ++++++++++++++++
> 2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 9de0e2b7ca2b..ef88d3e84bfb 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1339,7 +1339,7 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
> }
>
> /* power up the remote processor */
> - ret = rproc->ops->start(rproc);
> + ret = rproc_start_device(rproc);
> if (ret) {
> dev_err(dev, "can't start rproc %s: %d\n", rproc->name, ret);
> goto unprepare_subdevices;
> @@ -1360,7 +1360,7 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
> return 0;
>
> stop_rproc:
> - rproc->ops->stop(rproc);
> + rproc_stop_device(rproc);
> unprepare_subdevices:
> rproc_unprepare_subdevices(rproc);
> reset_table_ptr:
> @@ -1493,7 +1493,7 @@ static int rproc_stop(struct rproc *rproc, bool crashed)
> rproc->table_ptr = rproc->cached_table;
>
> /* power off the remote processor */
> - ret = rproc->ops->stop(rproc);
> + ret = rproc_stop_device(rproc);
> if (ret) {
> dev_err(dev, "can't stop rproc: %d\n", ret);
> return ret;
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index 47b500e40dd9..dda7044c4b3e 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -125,6 +125,22 @@ struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> return NULL;
> }
>
> +static inline int rproc_start_device(struct rproc *rproc)
> +{
> + if (rproc->ops && rproc->ops->start)
> + return rproc->ops->start(rproc);
> +
> + return 0;
> +}
> +
> +static inline int rproc_stop_device(struct rproc *rproc)
> +{
> + if (rproc->ops && rproc->ops->stop)
> + return rproc->ops->stop(rproc);
> +
> + return 0;
> +}
> +
> static inline
> bool rproc_u64_fit_in_size_t(u64 val)
> {
> --
> 2.20.1
>
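
The PS about inlining struct rproc_ops can be illustrated with a small
userspace model. The mock_ and demo_ names are hypothetical; the sketch
only assumes that, with the ops table embedded in the rproc object, the
accessors need to test the function pointer and nothing else.

```c
#include <stddef.h>

struct mock_rproc;

struct mock_rproc_ops {
	int (*start)(struct mock_rproc *rproc);
	int (*stop)(struct mock_rproc *rproc);
};

struct mock_rproc {
	struct mock_rproc_ops ops;	/* embedded, so it can never be NULL */
	int running;
};

/* Accessor: a missing handler is treated as "nothing to do". */
static int mock_start_device(struct mock_rproc *rproc)
{
	if (rproc->ops.start)
		return rproc->ops.start(rproc);

	return 0;
}

static int mock_stop_device(struct mock_rproc *rproc)
{
	if (rproc->ops.stop)
		return rproc->ops.stop(rproc);

	return 0;
}

/* Example platform handlers used to exercise the accessors. */
static int demo_start(struct mock_rproc *rproc)
{
	rproc->running = 1;
	return 0;
}

static int demo_stop(struct mock_rproc *rproc)
{
	rproc->running = 0;
	return 0;
}
```

Compared with the accessors in the patch, the "rproc->ops &&" half of each
condition disappears, which is why the change would stay small once all
callers go through accessors.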

2020-05-06 01:06:01

by Bjorn Andersson

Subject: Re: [PATCH v3 09/14] remoteproc: Deal with synchronisation when crashing

On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:

> Refactor function rproc_trigger_recovery() in order to avoid
> reloading the firmware image when synchronising with a remote
> processor rather than booting it. Also part of the process,
> properly set the synchronisation flag in order to properly
> recover the system.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------
> drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++
> 2 files changed, 43 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index ef88d3e84bfb..3a84a38ba37b 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc)
> */
> int rproc_trigger_recovery(struct rproc *rproc)
> {
> - const struct firmware *firmware_p;
> + const struct firmware *firmware_p = NULL;
> struct device *dev = &rproc->dev;
> int ret;
>
> @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> /* generate coredump */
> rproc_coredump(rproc);
>
> - /* load firmware */
> - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> - if (ret < 0) {
> - dev_err(dev, "request_firmware failed: %d\n", ret);
> - goto unlock_mutex;
> + /* load firmware if need be */
> + if (!rproc_needs_syncing(rproc)) {
> + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> + if (ret < 0) {
> + dev_err(dev, "request_firmware failed: %d\n", ret);
> + goto unlock_mutex;
> + }
> }
>
> - /* boot the remote processor up again */
> + /* boot up or synchronise with the remote processor again */
> ret = rproc_start(rproc, firmware_p);
>
> release_firmware(firmware_p);
> @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work)
> dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
> rproc->name);
>
> + /*
> + * The remote processor has crashed - tell the core what operation
> + * to use from hereon, i.e whether an external entity will reboot
> + * the MCU or it is now the remoteproc core's responsability.
> + */
> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED);

If I follow the logic correctly, you're essentially using
rproc->sync_with_rproc to pass an additional parameter down through
rproc_trigger_recovery() to tell everyone below to "load firmware and
boot the core or not".

And given that the comment alludes to some unknown logic determining the
continuation I think it would be much preferable to essentially just
pass rproc->sync_flags.after_crash down through these functions.


And per my comment on a previous patch, is there any synchronization
with the remote controller when this happens?

Regards,
Bjorn

> +
> mutex_unlock(&rproc->lock);
>
> if (!rproc->recovery_disabled)
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index 3985c084b184..61500981155c 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -24,6 +24,33 @@ struct rproc_debug_trace {
> struct rproc_mem_entry trace_mem;
> };
>
> +/*
> + * enum rproc_sync_states - remote processsor sync states
> + *
> + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> + * has crashed but has not been recovered by
> + * the remoteproc core yet.
> + *
> + * Keeping these separate from the enum rproc_state in order to avoid
> + * introducing coupling between the state of the MCU and the synchronisation
> + * operation to use.
> + */
> +enum rproc_sync_states {
> + RPROC_SYNC_STATE_CRASHED,
> +};
> +
> +static inline void rproc_set_sync_flag(struct rproc *rproc,
> + enum rproc_sync_states state)
> +{
> + switch (state) {
> + case RPROC_SYNC_STATE_CRASHED:
> + rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> + break;
> + default:
> + break;
> + }
> +}
> +
> /* from remoteproc_core.c */
> void rproc_release(struct kref *kref);
> irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id);
> --
> 2.20.1
>
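
Passing the after_crash decision down as an explicit argument, rather than
storing it in rproc->sync_with_rproc first, could be modelled as follows.
This is a userspace sketch with invented mock_ names; the fw_requests
counter merely stands in for calls to request_firmware().

```c
#include <stdbool.h>

struct mock_rproc {
	bool after_crash;	/* mirrors sync_flags.after_crash */
	int fw_requests;	/* counts simulated request_firmware() calls */
};

/* Hypothetical recovery step: the caller states whether to synchronise. */
static int mock_trigger_recovery(struct mock_rproc *rproc, bool sync)
{
	if (!sync)
		rproc->fw_requests++;	/* load firmware only when booting */

	/* boot up or synchronise with the remote processor here */
	return 0;
}

/* The crash handler forwards the platform's choice without mutating state. */
static int mock_crash_handler(struct mock_rproc *rproc)
{
	return mock_trigger_recovery(rproc, rproc->after_crash);
}
```

In this form the decision is visible in the function signature instead of
travelling through a field that other paths also read.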

2020-05-06 01:12:09

by Bjorn Andersson

Subject: Re: [PATCH v3 10/14] remoteproc: Deal with synchronisation when shutting down

On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:

> The remoteproc core must not allow function rproc_shutdown() to
> proceed if currently synchronising with a remote processor and
> the synchronisation operations of that remote processor does not
> support it. Also part of the process is to set the synchronisation
> flag so that the remoteproc core can make the right decisions when
> restarting the system.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_core.c | 32 ++++++++++++++++++++++++
> drivers/remoteproc/remoteproc_internal.h | 7 ++++++
> 2 files changed, 39 insertions(+)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 3a84a38ba37b..48afa1f80a8f 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1849,6 +1849,27 @@ int rproc_boot(struct rproc *rproc)
> }
> EXPORT_SYMBOL(rproc_boot);
>
> +static bool rproc_can_shutdown(struct rproc *rproc)
> +{
> + /*
> + * The remoteproc core is the lifecycle manager, no problem
> + * calling for a shutdown.
> + */
> + if (!rproc_needs_syncing(rproc))
> + return true;
> +
> + /*
> + * The remoteproc has been loaded by another entity (as per above
> + * condition) and the platform code has given us the capability
> + * of stopping it.
> + */
> + if (rproc->sync_ops->stop)
> + return true;
> +
> + /* Any other condition should not be allowed */
> + return false;
> +}
> +
> /**
> * rproc_shutdown() - power off the remote processor
> * @rproc: the remote processor
> @@ -1879,6 +1900,9 @@ void rproc_shutdown(struct rproc *rproc)
> return;
> }
>
> + if (!rproc_can_shutdown(rproc))
> + goto out;

A request has been mentioned to make it possible to shut down Linux
while the remote processor keeps running.

By skipping the rest of shutdown we will not stop or unprepare
subdevices, so presumably the remote processor won't know that
virtio/rpmsg is down. Is that ok?

> +
> /* if the remote proc is still needed, bail out */
> if (!atomic_dec_and_test(&rproc->power))
> goto out;
> @@ -1898,6 +1922,14 @@ void rproc_shutdown(struct rproc *rproc)
> kfree(rproc->cached_table);
> rproc->cached_table = NULL;
> rproc->table_ptr = NULL;
> +
> + /*
> + * The remote processor has been switched off - tell the core what
> + * operation to use from hereon, i.e whether an external entity will
> + * reboot the remote processor or it is now the remoteproc core's
> + * responsability.
> + */
> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_SHUTDOWN);

As asked on a previous patch, what would it mean if after_stop is true?

It seems like this state would be similar to the "already-booted" state
that we might encounter at probe time.

Regards,
Bjorn

> out:
> mutex_unlock(&rproc->lock);
> }
> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> index 61500981155c..7dcc0a26892b 100644
> --- a/drivers/remoteproc/remoteproc_internal.h
> +++ b/drivers/remoteproc/remoteproc_internal.h
> @@ -27,6 +27,9 @@ struct rproc_debug_trace {
> /*
> * enum rproc_sync_states - remote processsor sync states
> *
> + * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
> + * has shutdown (rproc_shutdown()) the
> + * remote processor.
> * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> * has crashed but has not been recovered by
> * the remoteproc core yet.
> @@ -36,6 +39,7 @@ struct rproc_debug_trace {
> * operation to use.
> */
> enum rproc_sync_states {
> + RPROC_SYNC_STATE_SHUTDOWN,
> RPROC_SYNC_STATE_CRASHED,
> };
>
> @@ -43,6 +47,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
> enum rproc_sync_states state)
> {
> switch (state) {
> + case RPROC_SYNC_STATE_SHUTDOWN:
> + rproc->sync_with_rproc = rproc->sync_flags.after_stop;
> + break;
> case RPROC_SYNC_STATE_CRASHED:
> rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> break;
> --
> 2.20.1
>
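
The subdevice concern raised above can be reproduced with a small userspace
model of the early exit. The mock_ and demo_ names are invented, the
sync_with_rproc field is assumed to be what rproc_needs_syncing() reports,
and the subdevs_running flag stands in for the subdevice stop/unprepare
calls.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_rproc {
	bool sync_with_rproc;		/* assumed to back rproc_needs_syncing() */
	int (*sync_stop)(struct mock_rproc *rproc);	/* sync_ops->stop */
	bool subdevs_running;
};

/* Mirrors the logic of rproc_can_shutdown() in the patch. */
static bool mock_can_shutdown(const struct mock_rproc *rproc)
{
	if (!rproc->sync_with_rproc)
		return true;

	return rproc->sync_stop != NULL;
}

static void mock_shutdown(struct mock_rproc *rproc)
{
	if (!mock_can_shutdown(rproc))
		return;		/* early exit: subdevices are left untouched */

	rproc->subdevs_running = false;	/* stands in for stopping subdevices */
	if (rproc->sync_with_rproc)
		rproc->sync_stop(rproc);
	/* the non-syncing path would call the core stop handler here */
}

/* Example stop handler, e.g. a notification to the remote side. */
static int demo_sync_stop(struct mock_rproc *rproc)
{
	(void)rproc;
	return 0;
}
```

With sync_stop left NULL the model returns before touching the subdevices,
which is exactly the situation where the remote side would not learn that
virtio/rpmsg went away.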

2020-05-06 01:29:29

by Bjorn Andersson

Subject: Re: [PATCH v3 11/14] remoteproc: Deal with synchronisation when changing FW image

On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:

> This patch prevents the firmware image from being displayed or changed
> when the remoteproc core is synchronising with a remote processor. This
> is needed since there is no guarantee about the nature of the firmware
> image that is loaded by the external entity.
>
> Signed-off-by: Mathieu Poirier <[email protected]>
> ---
> drivers/remoteproc/remoteproc_sysfs.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> index 7f8536b73295..cdd322a6ecfa 100644
> --- a/drivers/remoteproc/remoteproc_sysfs.c
> +++ b/drivers/remoteproc/remoteproc_sysfs.c
> @@ -13,9 +13,20 @@
> static ssize_t firmware_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> + ssize_t ret;
> struct rproc *rproc = to_rproc(dev);
>
> - return sprintf(buf, "%s\n", rproc->firmware);
> + /*
> + * In most instances there is no guarantee about the firmware
> + * that was loaded by the external entity. As such simply don't
> + * print anything.

Not only "in most instances": we have no idea what firmware is running,
so this can be shortened.

However, this does imply that with on_init = true and after_crash = false
this will read blank, yet a future rproc_report_crash() will indeed load
and boot rproc->firmware.

> + */
> + if (rproc_needs_syncing(rproc))
> + ret = sprintf(buf, "\n");
> + else
> + ret = sprintf(buf, "%s\n", rproc->firmware);
> +
> + return ret;
> }
>
> /* Change firmware name via sysfs */
> @@ -39,6 +50,17 @@ static ssize_t firmware_store(struct device *dev,
> goto out;
> }
>
> + /*
> + * There is no point in trying to change the firmware if loading the
> + * image of the remote processor is done by another entity.
> + */
> + if (rproc_needs_syncing(rproc)) {
> + dev_err(dev,
> + "can't change firmware while synchronising with MCU\n");

The conditional checks for a future event, but the error message
indicates an ongoing event. How about "can't change firmware on remote
controlled remote processor"? "externally controlled"?

Regards,
Bjorn

> + err = -EBUSY;
> + goto out;
> + }
> +
> len = strcspn(buf, "\n");
> if (!len) {
> dev_err(dev, "can't provide a NULL firmware\n");
> --
> 2.20.1
>
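
The on_init = true, after_crash = false case can be traced with a userspace
model of firmware_show(). The mock_ names are invented for the sketch, and
it assumes that rproc_needs_syncing() simply reports sync_with_rproc, which
this part of the thread does not show.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct mock_rproc {
	const char *firmware;
	bool on_init;		/* mirrors sync_flags.on_init */
	bool after_crash;	/* mirrors sync_flags.after_crash */
	bool sync_with_rproc;	/* assumed to back rproc_needs_syncing() */
};

/* Same branching as firmware_show() in the patch. */
static int mock_firmware_show(const struct mock_rproc *rproc,
			      char *buf, size_t len)
{
	if (rproc->sync_with_rproc)
		return snprintf(buf, len, "\n");

	return snprintf(buf, len, "%s\n", rproc->firmware);
}

/* Initial registration: the flag follows on_init. */
static void mock_init(struct mock_rproc *rproc)
{
	rproc->sync_with_rproc = rproc->on_init;
}

/* Crash handling: the flag follows after_crash, as in patch 09. */
static void mock_crash(struct mock_rproc *rproc)
{
	rproc->sync_with_rproc = rproc->after_crash;
}
```

In the model the attribute reads blank right after mock_init(), while the
firmware name stored in the object is the one that would be loaded after a
crash, so the blank output hides information that is still relevant.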

2020-05-06 07:55:31

by Arnaud POULIQUEN

Subject: Re: [PATCH v3 10/14] remoteproc: Deal with synchronisation when shutting down



On 5/6/20 12:03 AM, Mathieu Poirier wrote:
> On Mon, May 04, 2020 at 01:34:43PM +0200, Arnaud POULIQUEN wrote:
>>
>>
>> On 4/30/20 10:23 PM, Mathieu Poirier wrote:
>>> On Wed, Apr 29, 2020 at 10:19:49AM +0200, Arnaud POULIQUEN wrote:
>>>>
>>>>
>>>> On 4/24/20 10:01 PM, Mathieu Poirier wrote:
>>>>> The remoteproc core must not allow function rproc_shutdown() to
>>>>> proceed if currently synchronising with a remote processor and
>>>>> the synchronisation operations of that remote processor does not
>>>>> support it. Also part of the process is to set the synchronisation
>>>>> flag so that the remoteproc core can make the right decisions when
>>>>> restarting the system.
>>>>>
>>>>> Signed-off-by: Mathieu Poirier <[email protected]>
>>>>> ---
>>>>> drivers/remoteproc/remoteproc_core.c | 32 ++++++++++++++++++++++++
>>>>> drivers/remoteproc/remoteproc_internal.h | 7 ++++++
>>>>> 2 files changed, 39 insertions(+)
>>>>>
>>>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>>>> index 3a84a38ba37b..48afa1f80a8f 100644
>>>>> --- a/drivers/remoteproc/remoteproc_core.c
>>>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>>>> @@ -1849,6 +1849,27 @@ int rproc_boot(struct rproc *rproc)
>>>>> }
>>>>> EXPORT_SYMBOL(rproc_boot);
>>>>>
>>>>> +static bool rproc_can_shutdown(struct rproc *rproc)
>>>>> +{
>>>>> + /*
>>>>> + * The remoteproc core is the lifecycle manager, no problem
>>>>> + * calling for a shutdown.
>>>>> + */
>>>>> + if (!rproc_needs_syncing(rproc))
>>>>> + return true;
>>>>> +
>>>>> + /*
>>>>> + * The remoteproc has been loaded by another entity (as per above
>>>>> + * condition) and the platform code has given us the capability
>>>>> + * of stopping it.
>>>>> + */
>>>>> + if (rproc->sync_ops->stop)
>>>>> + return true;
>>>>
>>>> This means that if rproc->sync_ops->stop is null rproc_stop_subdevices will not
>>>> be called? seems not symmetric with the start sequence.
>>>
>>> If rproc->sync_ops->stop is not provided then the remoteproc core can't stop the
>>> remote processor at all after it has synchronised with it. If a usecase
>>> requires some kind of soft reset then a stop() function that uses a mailbox
>>> notification or some other mechanism can be provided to tell the remote
>>> processor to put itself back in startup mode again.
>>>
>>> Is this fine with you or there is still something I don't get?
>>
>> My point here is more around the subdevices. But perhaps i missed something...
>>
>> In rproc_start rproc_start_subdevices is called, even if sync_start is null.
>
> Here I'll take that you mean sync_ops::start()
>
>> But in rproc_shutdown rproc_stop is not called, if sync_ops->stop is null.
>> So rproc_stop_subdevices is not called in this case.
>
> Correct. I am pretty sure some people don't want the remoteproc core to be able
> to do anything other than synchronise with a remote processor, be it at boot
> time or when the remote processor has crashed.
>
> I can also see scenarios where people want to be able to start and stop
> subdevices from the remoteproc core, but _not_ power cycle the remote processor.
> In such cases the sync_ops::stop() should be some kind of notification telling
> the remote processor to put itself back in initialisation mode and
> sync_flags.after_stop should be set to true.
>
>> Then if sync_flags.after_stop is false, it looks like that something will go wrong
>> at next start.
>
> If sync_ops::stop is NULL then the value of sync_flags.after_stop becomes
> irrelevant because that state can't be reached. Let me know if you found a
> condition where this isn't the case and I will correct it.

The only case I have in mind is a platform driver that does not implement
sync_ops::stop() simply because there is nothing to do. I don't know whether
that is a realistic use case, though, and a dummy stop function looks
acceptable to me in that particular situation.

This brings me to another comment :)
The rproc_ops struct description is relevant for the "normal" ops but does not
fit sync_ops. For instance, start and stop are mandatory for ops but optional
for sync_ops. As this description is the reference (at least for me) for
determining which ops are optional and which are mandatory, it would be useful
to update it.

Regards,
Arnaud
>
>>
>>>
>>>> Probably not useful to test it here as condition is already handled in rproc_stop_device...
>>>>
>>>> Regards
>>>> Arnaud
>>>>> +
>>>>> + /* Any other condition should not be allowed */
>>>>> + return false;
>>>>> +}
>>>>> +
>>>>> /**
>>>>> * rproc_shutdown() - power off the remote processor
>>>>> * @rproc: the remote processor
>>>>> @@ -1879,6 +1900,9 @@ void rproc_shutdown(struct rproc *rproc)
>>>>> return;
>>>>> }
>>>>>
>>>>> + if (!rproc_can_shutdown(rproc))
>>>>> + goto out;
>>>>> +
>>>>> /* if the remote proc is still needed, bail out */
>>>>> if (!atomic_dec_and_test(&rproc->power))
>>>>> goto out;
>>>>> @@ -1898,6 +1922,14 @@ void rproc_shutdown(struct rproc *rproc)
>>>>> kfree(rproc->cached_table);
>>>>> rproc->cached_table = NULL;
>>>>> rproc->table_ptr = NULL;
>>>>> +
>>>>> + /*
>>>>> + * The remote processor has been switched off - tell the core what
>>>>> + * operation to use from hereon, i.e whether an external entity will
>>>>> + * reboot the remote processor or it is now the remoteproc core's
>>>>> + * responsability.
>>>>> + */
>>>>> + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_SHUTDOWN);
>>>>> out:
>>>>> mutex_unlock(&rproc->lock);
>>>>> }
>>>>> diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
>>>>> index 61500981155c..7dcc0a26892b 100644
>>>>> --- a/drivers/remoteproc/remoteproc_internal.h
>>>>> +++ b/drivers/remoteproc/remoteproc_internal.h
>>>>> @@ -27,6 +27,9 @@ struct rproc_debug_trace {
>>>>> /*
>>>>> * enum rproc_sync_states - remote processsor sync states
>>>>> *
>>>>> + * @RPROC_SYNC_STATE_SHUTDOWN state to use after the remoteproc core
>>>>> + * has shutdown (rproc_shutdown()) the
>>>>> + * remote processor.
>>>>> * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
>>>>> * has crashed but has not been recovered by
>>>>> * the remoteproc core yet.
>>>>> @@ -36,6 +39,7 @@ struct rproc_debug_trace {
>>>>> * operation to use.
>>>>> */
>>>>> enum rproc_sync_states {
>>>>> + RPROC_SYNC_STATE_SHUTDOWN,
>>>>> RPROC_SYNC_STATE_CRASHED,
>>>>> };
>>>>>
>>>>> @@ -43,6 +47,9 @@ static inline void rproc_set_sync_flag(struct rproc *rproc,
>>>>> enum rproc_sync_states state)
>>>>> {
>>>>> switch (state) {
>>>>> + case RPROC_SYNC_STATE_SHUTDOWN:
>>>>> + rproc->sync_with_rproc = rproc->sync_flags.after_stop;
>>>>> + break;
>>>>> case RPROC_SYNC_STATE_CRASHED:
>>>>> rproc->sync_with_rproc = rproc->sync_flags.after_crash;
>>>>> break;
>>>>>
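
The mandatory/optional distinction described in this reply could also be
written down as a check, which makes the documentation gap concrete. The
sketch below is hypothetical and not part of the series: the mock_ names
and mock_validate_ops() are invented, and the rule it encodes (start and
stop required for ops, optional for sync_ops) is taken from the comment
above rather than from the kernel sources.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_rproc;

struct mock_rproc_ops {
	int (*start)(struct mock_rproc *rproc);
	int (*stop)(struct mock_rproc *rproc);
};

/*
 * Hypothetical check: start() and stop() must be present in the core
 * ops table but may be left NULL in the synchronisation table.
 */
static int mock_validate_ops(const struct mock_rproc_ops *ops, bool is_sync)
{
	if (!ops)
		return 0;	/* a missing table is allowed since patch 02 */

	if (!is_sync && (!ops->start || !ops->stop))
		return -EINVAL;

	return 0;
}

/* Example handlers, including the kind of dummy stop mentioned above. */
static int demo_start(struct mock_rproc *rproc)
{
	(void)rproc;
	return 0;
}

static int demo_stop(struct mock_rproc *rproc)
{
	(void)rproc;
	return 0;	/* nothing to do */
}
```

Whether such a check belongs in the core is a separate question; the
sketch only shows that the two tables follow different rules, which is
what the struct description would need to say.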

2020-05-08 19:13:21

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 01/14] remoteproc: Make core operations optional

Hi Bjorn,

On Tue, May 05, 2020 at 03:16:08PM -0700, Bjorn Andersson wrote:
> On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
>
> > When synchronizing with a remote processor, it is entirely possible that
> > the remoteproc core is not the life cycle manager. In such a case core
> > operations don't exist and should not be called.
> >
>
> Why would the core call these functions if it knows the remote is in a
> state where it doesn't need these?

This is the reasoning that came out of a conversation Arnaud and I had. We are
all on the same page.

>
> Regards,
> Bjorn
>
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_internal.h | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index b389dc79da81..59fc871743c7 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -67,7 +67,7 @@ rproc_find_carveout_by_name(struct rproc *rproc, const char *name, ...);
> > static inline
> > int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> > {
> > - if (rproc->ops->sanity_check)
> > + if (rproc->ops && rproc->ops->sanity_check)
> > return rproc->ops->sanity_check(rproc, fw);
> >
> > return 0;
> > @@ -76,7 +76,7 @@ int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
> > static inline
> > u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> > {
> > - if (rproc->ops->get_boot_addr)
> > + if (rproc->ops && rproc->ops->get_boot_addr)
> > return rproc->ops->get_boot_addr(rproc, fw);
> >
> > return 0;
> > @@ -85,7 +85,7 @@ u64 rproc_get_boot_addr(struct rproc *rproc, const struct firmware *fw)
> > static inline
> > int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> > {
> > - if (rproc->ops->load)
> > + if (rproc->ops && rproc->ops->load)
> > return rproc->ops->load(rproc, fw);
> >
> > return -EINVAL;
> > @@ -93,7 +93,7 @@ int rproc_load_segments(struct rproc *rproc, const struct firmware *fw)
> >
> > static inline int rproc_parse_fw(struct rproc *rproc, const struct firmware *fw)
> > {
> > - if (rproc->ops->parse_fw)
> > + if (rproc->ops && rproc->ops->parse_fw)
> > return rproc->ops->parse_fw(rproc, fw);
> >
> > return 0;
> > @@ -103,7 +103,7 @@ static inline
> > int rproc_handle_rsc(struct rproc *rproc, u32 rsc_type, void *rsc, int offset,
> > int avail)
> > {
> > - if (rproc->ops->handle_rsc)
> > + if (rproc->ops && rproc->ops->handle_rsc)
> > return rproc->ops->handle_rsc(rproc, rsc_type, rsc, offset,
> > avail);
> >
> > @@ -114,7 +114,7 @@ static inline
> > struct resource_table *rproc_find_loaded_rsc_table(struct rproc *rproc,
> > const struct firmware *fw)
> > {
> > - if (rproc->ops->find_loaded_rsc_table)
> > + if (rproc->ops && rproc->ops->find_loaded_rsc_table)
> > return rproc->ops->find_loaded_rsc_table(rproc, fw);
> >
> > return NULL;
> > --
> > 2.20.1
> >

2020-05-08 19:39:28

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 02/14] remoteproc: Introduce function rproc_alloc_internals()

On Tue, May 05, 2020 at 03:31:58PM -0700, Bjorn Andersson wrote:
> On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
>
> > In scenarios where the remote processor's lifecycle is entirely
> > managed by another entity there is no point in allocating memory for
> > a firmware name since it will never be used. The same goes for a core
> > set of operations.
> >
> > As such introduce function rproc_alloc_internals() to decide if the
> > allocation of a firmware name and the core operations need to be done.
> > That way rproc_alloc() can be kept as clean as possible.
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_core.c | 31 +++++++++++++++++++++++-----
> > 1 file changed, 26 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index 448262470fc7..1b4756909584 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -2076,6 +2076,30 @@ static int rproc_alloc_ops(struct rproc *rproc, const struct rproc_ops *ops)
> > return 0;
> > }
> >
> > +static int rproc_alloc_internals(struct rproc *rproc,
> > + const struct rproc_ops *ops,
> > + const char *name, const char *firmware)
> > +{
> > + int ret;
> > +
> > + /*
> > + * In scenarios where the remote processor's lifecycle is entirely
> > + * managed by another entity there is no point in carrying a set
> > + * of operations that will never be used.
> > + *
> > + * And since no firmware will ever be loaded, there is no point in
> > + * allocating memory for it either.
>
> While this is true, I would expect that there are cases where the
> remoteproc has ops but no firmware.
>

That is a scenario I did not envision, but I agree: the remote processor could
be fetching from a private ROM and still require handling from the
remoteproc core.

> How about splitting this decision already now; i.e. moving the if(!ops)
> to rproc_alloc_ops() and perhaps only allocate firmware if ops->load is
> specified?
>

Or just add "if (ops->load)" before calling rproc_alloc_firmware()... Otherwise
we need to change the calling order of rproc_alloc_firmware() and
rproc_alloc_ops() in order to make sure 'ops' is valid when calling the former.
Either way I'll add a comment with the rationale you have detailed above.


> Regards,
> Bjorn
>
> > + */
> > + if (!ops)
> > + return 0;
> > +
> > + ret = rproc_alloc_firmware(rproc, name, firmware);
> > + if (ret)
> > + return ret;
> > +
> > + return rproc_alloc_ops(rproc, ops);
> > +}
> > +
> > /**
> > * rproc_alloc() - allocate a remote processor handle
> > * @dev: the underlying device
> > @@ -2105,7 +2129,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name,
> > {
> > struct rproc *rproc;
> >
> > - if (!dev || !name || !ops)
> > + if (!dev || !name)
> > return NULL;
> >
> > rproc = kzalloc(sizeof(struct rproc) + len, GFP_KERNEL);
> > @@ -2128,10 +2152,7 @@ struct rproc *rproc_alloc(struct device *dev, const char *name,
> > if (!rproc->name)
> > goto put_device;
> >
> > - if (rproc_alloc_firmware(rproc, name, firmware))
> > - goto put_device;
> > -
> > - if (rproc_alloc_ops(rproc, ops))
> > + if (rproc_alloc_internals(rproc, ops, name, firmware))
> > goto put_device;
> >
> > /* Assign a unique device index and name */
> > --
> > 2.20.1
> >

2020-05-08 21:04:18

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

On Tue, May 05, 2020 at 05:22:53PM -0700, Bjorn Andersson wrote:
> On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
>
> > Add a new sync_ops to support use cases where the remoteproc
> > core is synchronising with the remote processor. Exactly when to use
> > the synchronisation operations is directed by the flags in structure
> > rproc_sync_flags.
> >
>
> I'm sorry, but no matter how many times I read these patches I have to
> translate "synchronising" to "remote controlled", and given the number
> of comments clarifying this makes me feel that we could perhaps come up
> with a better name?

"remote controlled" as in "someone else is managing the remote processor" ? It
could also mean the remoteproc core is "remote controlling" the remote
processor, exactly what it currently does today...

How about "autonomous", as in the remote processor doesn't need us to boot or
switch it off. I'm open to any other suggestions.

>
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > include/linux/remoteproc.h | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> > index ac4082f12e8b..ceb3b2bba824 100644
> > --- a/include/linux/remoteproc.h
> > +++ b/include/linux/remoteproc.h
> > @@ -353,6 +353,23 @@ enum rsc_handling_status {
> > RSC_IGNORED = 1,
> > };
> >
> > +/**
> > + * struct rproc_sync_flags - platform specific flags indicating which
> > + * rproc_ops to use at specific times during
> > + * the rproc lifecycle.
> > + * @on_init: true if synchronising with the remote processor at
> > + * initialisation time
> > + * @after_stop: true if synchronising with the remote processor after it was
> > + * stopped from the command line
> > + * @after_crash: true if synchronising with the remote processor after
> > + * it has crashed
> > + */
> > +struct rproc_sync_flags {
> > + bool on_init;
>
> This indirectly splits the RPROC_OFFLINE state in an "offline" and
> "already-booted" state. Wouldn't it be clearer to represent this with a
> new RPROC_ALREADY_BOOTED state?
>

I suggested that at some point in the past but it was in a different context. I
will revisit to see how doing so could apply here.

> > + bool after_stop;
>
> What does it mean when this is true? That Linux can shut the remote core
> down, but someone else will start it?

It tells the remoteproc core how to interact with the remote processor after the
latter has been switched off. For example, we could want to boot the remote
processor from the boot loader so that minimal functionality can be provided
while the kernel boots. Once the kernel and user space are in place, the remote
processor is explicitly stopped and booted once again, but this time with a
firmware image that offers full functionality.

It could also be that the remoteproc core can stop the remote processor, but the
remote processor will automatically reboot itself. In that case the remoteproc
core will simply synchronise with the remote processor, as it does when .on_init
== true.

>
> > + bool after_crash;
>
> Similarly what is the expected steps to be taken by the core when this
> is true? Should rproc_report_crash() simply stop/start the subdevices
> and upon one of the ops somehow tell the remote controller that it can
> proceed with the recovery?

The exact same sequence of steps will be carried out as they are today, except
that if after_crash == true, the remoteproc core won't be switching the remote
processor on, exactly as it would do when on_init == true.

These flags are there to indicate how to set rproc::sync_with_rproc after
different events, that is when the remoteproc core boots, when the remoteproc
has been stopped or when it has crashed.
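
As a rough illustration of that mapping, here is a self-contained sketch of a
helper that copies the relevant flag into sync_with_rproc for each event. Only
RPROC_SYNC_STATE_CRASHED appears in the patches quoted in this thread; the
other two enumerator names are assumptions made for the example, and the
structures are cut down to the fields involved.

```c
#include <stdbool.h>

struct rproc_sync_flags {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

/* Reduced to the two fields this sketch touches. */
struct rproc {
	struct rproc_sync_flags sync_flags;
	bool sync_with_rproc;
};

enum rproc_sync_states {
	RPROC_SYNC_STATE_INIT,		/* assumed name */
	RPROC_SYNC_STATE_SHUTDOWN,	/* assumed name */
	RPROC_SYNC_STATE_CRASHED,
};

/* Pick the flag that matches the event the core has just gone through. */
static inline void rproc_set_sync_flag(struct rproc *rproc,
				       enum rproc_sync_states state)
{
	switch (state) {
	case RPROC_SYNC_STATE_INIT:
		rproc->sync_with_rproc = rproc->sync_flags.on_init;
		break;
	case RPROC_SYNC_STATE_SHUTDOWN:
		rproc->sync_with_rproc = rproc->sync_flags.after_stop;
		break;
	case RPROC_SYNC_STATE_CRASHED:
		rproc->sync_with_rproc = rproc->sync_flags.after_crash;
		break;
	default:
		break;
	}
}
```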

>
> > +};
> > +
> > /**
> > * struct rproc_ops - platform-specific device handlers
> > * @start: power on the device and boot it
> > @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> > * @firmware: name of firmware file to be loaded
> > * @priv: private data which belongs to the platform-specific rproc module
> > * @ops: platform-specific start/stop rproc handlers
> > + * @sync_ops: platform-specific start/stop rproc handlers when
> > + * synchronising with a remote processor.
> > + * @sync_flags: Determine the rproc_ops to choose in specific states.
> > * @dev: virtual device for refcounting and common remoteproc behavior
> > * @power: refcount of users who need this rproc powered up
> > * @state: state of the device
> > @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> > * @table_sz: size of @cached_table
> > * @has_iommu: flag to indicate if remote processor is behind an MMU
> > * @auto_boot: flag to indicate if remote processor should be auto-started
> > + * @sync_with_rproc: true if currently synchronising with the rproc
> > * @dump_segments: list of segments in the firmware
> > * @nb_vdev: number of vdev currently handled by rproc
> > */
> > @@ -492,6 +513,8 @@ struct rproc {
> > const char *firmware;
> > void *priv;
> > struct rproc_ops *ops;
> > + struct rproc_ops *sync_ops;
>
> Do we really need two rproc_ops, given that both are coming from the
> platform driver and the sync_flags will define which one to look at?
>
> Can't the platform driver just provide an ops table that works with the
> flags it passes?

That is the approach Loic took in a previous patchset [1] and that was rejected.
It also led to all of the platform drivers testing rproc->flag before carrying
out different actions, something you indicated could be done in the core. This
patch does exactly that, i.e. it moves the testing of rproc->flag to the core and
calls the right function based on that.

The end result is the same and I'm happy with one or the other; I just need to
know which one.

The advantage with the approach I'm proposing is that everything is controlled
in the core, i.e. which ops is called and when to set rproc->flag based on
the different states the remote processor transitions through.
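
A minimal sketch of what "controlled in the core" could mean in practice,
assuming simplified structures: the core picks the ops table from the flag and
the platform callbacks never test it themselves. sketch_active_ops(),
sketch_start_device() and the two demo callbacks are hypothetical names, not
functions from the series.

```c
#include <stdbool.h>
#include <stddef.h>

struct rproc;

struct rproc_ops {
	int (*start)(struct rproc *rproc);
};

/* Cut-down rproc; 'booted' and 'attached' only exist to observe the demo. */
struct rproc {
	const struct rproc_ops *ops;
	const struct rproc_ops *sync_ops;
	bool sync_with_rproc;
	int booted;
	int attached;
};

/* The core, not the platform driver, decides which table applies. */
static const struct rproc_ops *sketch_active_ops(const struct rproc *rproc)
{
	return rproc->sync_with_rproc ? rproc->sync_ops : rproc->ops;
}

static int sketch_start_device(struct rproc *rproc)
{
	const struct rproc_ops *ops = sketch_active_ops(rproc);

	/* Operations are treated as optional in this sketch. */
	if (!ops || !ops->start)
		return 0;

	return ops->start(rproc);
}

/* Hypothetical platform callbacks: neither one looks at the flag. */
static int demo_boot_start(struct rproc *rproc)
{
	rproc->booted++;
	return 0;
}

static int demo_sync_start(struct rproc *rproc)
{
	rproc->attached++;
	return 0;
}
```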

Thanks,
Mathieu


[1]. https://patchwork.kernel.org/patch/11265869/

>
> Regards,
> Bjorn
>
> > + struct rproc_sync_flags sync_flags;
> > struct device dev;
> > atomic_t power;
> > unsigned int state;
> > @@ -515,6 +538,7 @@ struct rproc {
> > size_t table_sz;
> > bool has_iommu;
> > bool auto_boot;
> > + bool sync_with_rproc;
> > struct list_head dump_segments;
> > int nb_vdev;
> > u8 elf_class;
> > --
> > 2.20.1
> >

2020-05-08 21:32:08

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 05/14] remoteproc: Refactor function rproc_fw_boot()

On Tue, May 05, 2020 at 05:33:41PM -0700, Bjorn Andersson wrote:
> On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
>
> > Refactor function rproc_fw_boot() in order to better reflect the work
> > that is done when supporting scenarios where the remoteproc core is
> > synchronising with a remote processor.
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_core.c | 10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index a02593b75bec..e90a21de9de1 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1370,9 +1370,9 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
> > }
> >
> > /*
> > - * take a firmware and boot a remote processor with it.
> > + * boot or synchronise with a remote processor.
> > */
> > -static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> > +static int rproc_actuate_device(struct rproc *rproc, const struct firmware *fw)
>
> Per patch 4 this function will if rproc_needs_syncing() be called with
> fw == NULL, it's not obvious to me that the various operations on "fw"
> in this function are valid anymore.

That is right. All firmware-related operations in this function go through
wrappers found in remoteproc_internal.h, where the value of
rproc->sync_with_rproc is checked before moving forward. That allows us to
avoid introducing a new function similar to rproc_fw_boot() but without
firmware operations, or peppering the code with if statements.
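
To show why a NULL firmware pointer is harmless under that scheme, here is a
small user-space sketch of one such wrapper. The structures are simplified,
and sketch_load_segments() and demo_load() are hypothetical names; the point
is only that the flag is checked before 'fw' can be dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (assumption), not the kernel definitions. */
struct firmware {
	size_t size;
};

struct rproc;

struct rproc_ops {
	int (*load)(struct rproc *rproc, const struct firmware *fw);
};

struct rproc {
	const struct rproc_ops *ops;
	bool sync_with_rproc;
	size_t loaded_bytes;	/* only here to observe the demo callback */
};

static inline int sketch_load_segments(struct rproc *rproc,
				       const struct firmware *fw)
{
	/* When synchronising, fw is NULL and must never be touched. */
	if (rproc->sync_with_rproc)
		return 0;

	if (rproc->ops && rproc->ops->load)
		return rproc->ops->load(rproc, fw);

	return 0;
}

/* Hypothetical platform loader that does dereference fw. */
static int demo_load(struct rproc *rproc, const struct firmware *fw)
{
	rproc->loaded_bytes = fw->size;
	return 0;
}
```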

>
> > {
> > struct device *dev = &rproc->dev;
> > const char *name = rproc->firmware;
> > @@ -1382,7 +1382,9 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> > if (ret)
> > return ret;
> >
> > - dev_info(dev, "Booting fw image %s, size %zd\n", name, fw->size);
> > + if (!rproc_needs_syncing(rproc))
>
> Can't we make this check on fw, to make the relationship "if we were
> passed a firmware object, we're going to load and boot that firmware"?

It can but I specifically decided to use rproc_needs_syncing() to be consistent
with the rest of the patchset. That way all we need to do is grep for
rproc_needs_syncing to get all the places where a decision about synchronising
with a remote processor is made.

>
> Regards,
> Bjorn
>
> > + dev_info(dev, "Booting fw image %s, size %zd\n",
> > + name, fw->size);
> >
> > /*
> > * if enabling an IOMMU isn't relevant for this rproc, this is
> > @@ -1818,7 +1820,7 @@ int rproc_boot(struct rproc *rproc)
> > }
> > }
> >
> > - ret = rproc_fw_boot(rproc, firmware_p);
> > + ret = rproc_actuate_device(rproc, firmware_p);
> >
> > release_firmware(firmware_p);
> >
> > --
> > 2.20.1
> >

2020-05-08 21:52:20

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 09/14] remoteproc: Deal with synchronisation when crashing

On Tue, May 05, 2020 at 06:01:56PM -0700, Bjorn Andersson wrote:
> On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
>
> > Refactor function rproc_trigger_recovery() in order to avoid
> > reloading the firmware image when synchronising with a remote
> > processor rather than booting it. Also part of the process,
> > properly set the synchronisation flag in order to properly
> > recover the system.
> >
> > Signed-off-by: Mathieu Poirier <[email protected]>
> > ---
> > drivers/remoteproc/remoteproc_core.c | 23 ++++++++++++++------
> > drivers/remoteproc/remoteproc_internal.h | 27 ++++++++++++++++++++++++
> > 2 files changed, 43 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index ef88d3e84bfb..3a84a38ba37b 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1697,7 +1697,7 @@ static void rproc_coredump(struct rproc *rproc)
> > */
> > int rproc_trigger_recovery(struct rproc *rproc)
> > {
> > - const struct firmware *firmware_p;
> > + const struct firmware *firmware_p = NULL;
> > struct device *dev = &rproc->dev;
> > int ret;
> >
> > @@ -1718,14 +1718,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> > /* generate coredump */
> > rproc_coredump(rproc);
> >
> > - /* load firmware */
> > - ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > - if (ret < 0) {
> > - dev_err(dev, "request_firmware failed: %d\n", ret);
> > - goto unlock_mutex;
> > + /* load firmware if need be */
> > + if (!rproc_needs_syncing(rproc)) {
> > + ret = request_firmware(&firmware_p, rproc->firmware, dev);
> > + if (ret < 0) {
> > + dev_err(dev, "request_firmware failed: %d\n", ret);
> > + goto unlock_mutex;
> > + }
> > }
> >
> > - /* boot the remote processor up again */
> > + /* boot up or synchronise with the remote processor again */
> > ret = rproc_start(rproc, firmware_p);
> >
> > release_firmware(firmware_p);
> > @@ -1761,6 +1763,13 @@ static void rproc_crash_handler_work(struct work_struct *work)
> > dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt,
> > rproc->name);
> >
> > + /*
> > + * The remote processor has crashed - tell the core what operation
> > + * to use from here on, i.e. whether an external entity will reboot
> > + * the MCU or it is now the remoteproc core's responsibility.
> > + */
> > + rproc_set_sync_flag(rproc, RPROC_SYNC_STATE_CRASHED);
>
> If I follow the logic correctly, you're essentially using
> rproc->sync_with_rproc to pass an additional parameter down through
> rproc_trigger_recovery() to tell everyone below to "load firmware and
> boot the core or not".

I am using the value of rproc::sync_flags::after_crash to set
rproc->sync_with_rproc. That way the core can know whether it should boot the
remote processor or synchronise with it.

>
> And given that the comment alludes to some unknown logic determining the
> continuation I think it would be much preferable to essentially just
> pass rproc->sync_flags.after_crash down through these functions.
>

The only thing we need to do is set the value of rproc->sync_with_rproc
properly, which rproc_set_sync_flag() does. I have decided to use a wrapper
function to allow us to change how the rproc->sync_with_rproc is handled without
touching anything else in the code.

>
> And per my comment on a previous patch, is there any synchronization
> with the remote controller when this happens?

I can't seem to find that comment - can you indicate which patch that was? As
it is today the core doesn't provide synchronisation; it is up to the platform
driver to probe the remote processor to make sure it is up. I suppose
sync_ops::start() would be a perfect candidate for that.

>
> Regards,
> Bjorn
>
> > +
> > mutex_unlock(&rproc->lock);
> >
> > if (!rproc->recovery_disabled)
> > diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
> > index 3985c084b184..61500981155c 100644
> > --- a/drivers/remoteproc/remoteproc_internal.h
> > +++ b/drivers/remoteproc/remoteproc_internal.h
> > @@ -24,6 +24,33 @@ struct rproc_debug_trace {
> > struct rproc_mem_entry trace_mem;
> > };
> >
> > +/*
> > + * enum rproc_sync_states - remote processor sync states
> > + *
> > + * @RPROC_SYNC_STATE_CRASHED state to use after the remote processor
> > + * has crashed but has not been recovered by
> > + * the remoteproc core yet.
> > + *
> > + * Keeping these separate from the enum rproc_state in order to avoid
> > + * introducing coupling between the state of the MCU and the synchronisation
> > + * operation to use.
> > + */
> > +enum rproc_sync_states {
> > + RPROC_SYNC_STATE_CRASHED,
> > +};
> > +
> > +static inline void rproc_set_sync_flag(struct rproc *rproc,
> > + enum rproc_sync_states state)
> > +{
> > + switch (state) {
> > + case RPROC_SYNC_STATE_CRASHED:
> > + rproc->sync_with_rproc = rproc->sync_flags.after_crash;
> > + break;
> > + default:
> > + break;
> > + }
> > +}
> > +
> > /* from remoteproc_core.c */
> > void rproc_release(struct kref *kref);
> > irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int vq_id);
> > --
> > 2.20.1
> >

2020-05-14 01:36:44

by Bjorn Andersson

[permalink] [raw]
Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

On Fri 08 May 14:01 PDT 2020, Mathieu Poirier wrote:

> On Tue, May 05, 2020 at 05:22:53PM -0700, Bjorn Andersson wrote:
> > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
> >
> > > Add a new sync_ops to support use cases where the remoteproc
> > > core is synchronising with the remote processor. Exactly when to use
> > > the synchronisation operations is directed by the flags in structure
> > > rproc_sync_flags.
> > >
> >
> > I'm sorry, but no matter how many times I read these patches I have to
> > translate "synchronising" to "remote controlled", and given the number
> > of comments clarifying this makes me feel that we could perhaps come up
> > with a better name?
>
> "remote controlled" as in "someone else is managing the remote processor" ?
> It could also mean the remoteproc core is "remote controlling" the
> remote processor, exactly what it currently does today...
>

You're right and this would certainly not help the confusion.

> How about "autonomous", as in the remote processor doesn't need us to boot or
> switch it off. I'm open to any other suggestions.
>
> >
> > > Signed-off-by: Mathieu Poirier <[email protected]>
> > > ---
> > > include/linux/remoteproc.h | 24 ++++++++++++++++++++++++
> > > 1 file changed, 24 insertions(+)
> > >
> > > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> > > index ac4082f12e8b..ceb3b2bba824 100644
> > > --- a/include/linux/remoteproc.h
> > > +++ b/include/linux/remoteproc.h
> > > @@ -353,6 +353,23 @@ enum rsc_handling_status {
> > > RSC_IGNORED = 1,
> > > };
> > >
> > > +/**
> > > + * struct rproc_sync_flags - platform specific flags indicating which
> > > + * rproc_ops to use at specific times during
> > > + * the rproc lifecycle.
> > > + * @on_init: true if synchronising with the remote processor at
> > > + * initialisation time
> > > + * @after_stop: true if synchronising with the remote processor after it was
> > > + * stopped from the command line
> > > + * @after_crash: true if synchronising with the remote processor after
> > > + * it has crashed
> > > + */
> > > +struct rproc_sync_flags {
> > > + bool on_init;
> >
> > This indirectly splits the RPROC_OFFLINE state in an "offline" and
> > "already-booted" state. Wouldn't it be clearer to represent this with a
> > new RPROC_ALREADY_BOOTED state?
> >
>
> I suggested that at some point in the past but it was in a different context. I
> will revisit to see how doing so could apply here.
>

How about we introduce a new state named DETACHED and make the platform
drivers specify that the remote processor is in either OFFLINE (as
today) or DETACHED during initialization.

Then on_init = true would be the action of going from DETACHED to
RUNNING, which would involve the following actions:

1) find resource table
2) prepare device (?)
3) handle resources
4) allocate carveouts (?)
5) prepare subdevices
6) "attach"
7) start subdevices

on_init = false would represent the transition from OFFLINE to RUNNING,
which today involves the following actions:

1) request firmware
2) prepare device
3) parse fw
4) handle resources
5) allocate carveouts
6) load segments
7) find resource table
8) prepare subdevices
9) "boot"
10) start subdevices
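
The two sequences above can be written down as data, which makes it easy to
see that the starting state alone selects the list. This is only a sketch: the
DETACHED state is the proposal made in this mail, and the enum, the tables and
sketch_steps_to_running() are hypothetical names that do not exist in the
remoteproc core.

```c
#include <stddef.h>

/* DETACHED is the state proposed in this thread (assumption). */
enum sketch_state {
	SKETCH_OFFLINE,
	SKETCH_DETACHED,
	SKETCH_RUNNING,
};

/* DETACHED -> RUNNING: no firmware request, no segment loading. */
static const char *const sketch_attach_steps[] = {
	"find resource table", "prepare device", "handle resources",
	"allocate carveouts", "prepare subdevices", "attach",
	"start subdevices",
};

/* OFFLINE -> RUNNING: the sequence the core performs today. */
static const char *const sketch_boot_steps[] = {
	"request firmware", "prepare device", "parse fw",
	"handle resources", "allocate carveouts", "load segments",
	"find resource table", "prepare subdevices", "boot",
	"start subdevices",
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Return the step list for reaching RUNNING from 'from', or NULL if none. */
static const char *const *sketch_steps_to_running(enum sketch_state from,
						  size_t *count)
{
	switch (from) {
	case SKETCH_DETACHED:
		*count = SKETCH_ARRAY_SIZE(sketch_attach_steps);
		return sketch_attach_steps;
	case SKETCH_OFFLINE:
		*count = SKETCH_ARRAY_SIZE(sketch_boot_steps);
		return sketch_boot_steps;
	default:
		*count = 0;
		return NULL;
	}
}
```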

> > > + bool after_stop;
> >
> > What does it mean when this is true? That Linux can shut the remote core
> > down, but someone else will start it?
>
> It tells the remoteproc core how to interact with the remote processor after the
> latter has been switched off.

Understood.

> For example, we could want to boot the remote
> processor from the boot loader so that minimal functionality can be provided
> while the kernel boots. Once the kernel and user space are in place, the remote
> processor is explicitly stopped and booted once again, but this time with a
> firmware image that offers full functionality.
>

This would be the { on_init = true, after_stop = false } use case, which with
the new state would correspond to the journey DETACHED -> RUNNING ->
OFFLINE.

As such the next boot would be the OFFLINE -> RUNNING case above,
which we already support today.

> It could also be that the remoteproc core can stop the remote processor, but the
> remote processor will automatically reboot itself. In that case the remoteproc
> core will simply synchronise with the remote processor, as it does when .on_init
> == true.
>

I've not been able to come up with a reasonable use case for the {
on_init = true, after_stop = true } scenario.

But Wendy previously talked about the need to "detach" Linux from a
running remote processor, by somehow just letting it know that the
communication is down - to allow Linux to be rebooted while the remote
was running. So if we support a transition from RUNNING to DETACHED
using a sequence of something like:

1) stop subdevices
2) "detach"
3) unprepare subdevices
4) release carveouts (?)
5) unprepare device (?)

Then perhaps the after_stop could naturally be the transition from
DETACHED to RUNNING, either with or without a reboot of the system
in between?

> >
> > > + bool after_crash;
> >
> > Similarly what is the expected steps to be taken by the core when this
> > is true? Should rproc_report_crash() simply stop/start the subdevices
> > and upon one of the ops somehow tell the remote controller that it can
> > proceed with the recovery?
>
> The exact same sequence of steps will be carried out as they are today, except
> that if after_crash == true, the remoteproc core won't be switching the remote
> processor on, exactly as it would do when on_init == true.
>

Just to make sure we're on the same page:

after_crash = false is what we have today, and would mean:

1) stop subdevices
2) power off
3) unprepare subdevices
4) generate coredump
5) request firmware
6) load segments
7) find resource table
8) prepare subdevices
9) "boot"
10) start subdevices

after_crash = true would mean:

1) stop subdevices
2) "detach"
3) unprepare subdevices
4) prepare subdevices
5) "attach"
6) start subdevices

State diagram wise both of these would represent the transition RUNNING
-> CRASHED -> RUNNING, but somehow the platform driver needs to be able
to specify which of these sequences to perform. Per your naming
suggestion above, this does sound like an "autonomous_recovery" boolean
to me.
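
As a toy model of that choice, the sketch below only counts which steps run
for each value of the boolean. The "autonomous_recovery" field name comes from
the suggestion above; everything else (struct sketch_rproc, the counters,
sketch_recover()) is hypothetical and stands in for the real recovery path.

```c
#include <stdbool.h>

/* Counters stand in for the real work so the two paths can be compared. */
struct sketch_rproc {
	bool autonomous_recovery;	/* name floated in this thread */
	int power_offs;
	int fw_requests;
	int boots;
	int detaches;
	int attaches;
};

/* RUNNING -> CRASHED -> RUNNING, with the sequence chosen by the boolean. */
static void sketch_recover(struct sketch_rproc *r)
{
	/* Common to both paths: stop subdevices. */

	if (r->autonomous_recovery) {
		/* Someone else restarts the remote processor. */
		r->detaches++;
		/* unprepare subdevices, then prepare them again */
		r->attaches++;
	} else {
		/* The remoteproc core owns the whole recovery. */
		r->power_offs++;
		/* unprepare subdevices, generate coredump */
		r->fw_requests++;
		/* load segments, find resource table, prepare subdevices */
		r->boots++;
	}

	/* Common to both paths: start subdevices. */
}
```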

> These flags are there to indicate how to set rproc::sync_with_rproc after
> different events, that is when the remoteproc core boots, when the remoteproc
> has been stopped or when it has crashed.
>

Right, that was clear from your patches. Sorry that my reply didn't
convey the information that I had understood this.

> >
> > > +};
> > > +
> > > /**
> > > * struct rproc_ops - platform-specific device handlers
> > > * @start: power on the device and boot it
> > > @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> > > * @firmware: name of firmware file to be loaded
> > > * @priv: private data which belongs to the platform-specific rproc module
> > > * @ops: platform-specific start/stop rproc handlers
> > > + * @sync_ops: platform-specific start/stop rproc handlers when
> > > + * synchronising with a remote processor.
> > > + * @sync_flags: Determine the rproc_ops to choose in specific states.
> > > * @dev: virtual device for refcounting and common remoteproc behavior
> > > * @power: refcount of users who need this rproc powered up
> > > * @state: state of the device
> > > @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> > > * @table_sz: size of @cached_table
> > > * @has_iommu: flag to indicate if remote processor is behind an MMU
> > > * @auto_boot: flag to indicate if remote processor should be auto-started
> > > + * @sync_with_rproc: true if currently synchronising with the rproc
> > > * @dump_segments: list of segments in the firmware
> > > * @nb_vdev: number of vdev currently handled by rproc
> > > */
> > > @@ -492,6 +513,8 @@ struct rproc {
> > > const char *firmware;
> > > void *priv;
> > > struct rproc_ops *ops;
> > > + struct rproc_ops *sync_ops;
> >
> > Do we really need two rproc_ops, given that both are coming from the
> > platform driver and the sync_flags will define which one to look at?
> >
> > Can't the platform driver just provide an ops table that works with the
> > flags it passes?
>
> That is the approach Loic took in a previous patchset [1] and that was rejected.
> It also led to all of the platform drivers testing rproc->flag before carrying
> out different actions, something you indicated could be done in the core. This
> patch does exactly that, i.e. it moves the testing of rproc->flag to the core and
> calls the right function based on that.
>

I think I see what you mean: as we use "start" for both syncing and
starting the core, a { on_init = true, after_stop = false } setup either
needs two tables or forces conditionals on the platform driver.

> The end result is the same and I'm happy with one or the other, I will need to
> know which one.
>

How about adding a new ops named "attach" to rproc_ops, which the
platform driver can specify if it supports attaching an already running
processor?
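
Something along these lines, as a user-space sketch only: a single ops table
with an optional "attach" callback, and the core choosing between "start" and
"attach" from the current state. The sketch_* names and the DETACHED state are
assumptions taken from this discussion, not existing kernel symbols.

```c
#include <errno.h>
#include <stddef.h>

enum sketch_state {
	SKETCH_OFFLINE,
	SKETCH_DETACHED,	/* state proposed in this thread */
	SKETCH_RUNNING,
};

struct sketch_rproc;

struct sketch_ops {
	int (*start)(struct sketch_rproc *r);
	int (*attach)(struct sketch_rproc *r);	/* optional */
};

struct sketch_rproc {
	const struct sketch_ops *ops;
	enum sketch_state state;
};

/* The core picks the callback from the state; the driver never checks it. */
static int sketch_power_up(struct sketch_rproc *r)
{
	int (*op)(struct sketch_rproc *r) = NULL;
	int ret;

	switch (r->state) {
	case SKETCH_OFFLINE:
		op = r->ops->start;
		break;
	case SKETCH_DETACHED:
		op = r->ops->attach;
		break;
	default:
		return -EBUSY;	/* already running */
	}

	/* The driver does not support this transition. */
	if (!op)
		return -EINVAL;

	ret = op(r);
	if (!ret)
		r->state = SKETCH_RUNNING;

	return ret;
}

/* Hypothetical callback that always succeeds. */
static int demo_ok(struct sketch_rproc *r)
{
	(void)r;
	return 0;
}
```

A driver that cannot attach simply leaves the callback NULL, and the core
rejects the DETACHED -> RUNNING transition instead of calling into it.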

> The advantage with the approach I'm proposing is that everything is controlled
> in the core, i.e what ops is called and when to set rproc->flag based on
> different states the remote processor transitions through.
>

I still think keeping things in the core is the right thing to do.


Please let me know what you think!

PS. If we agree on this the three transitions become somewhat
independent, so I think it makes sense to first land support for the
DETACHED -> RUNNING transition (and the stm32 series), then follow up
with RUNNING -> DETACHED and autonomous recovery separately.

Regards,
Bjorn

> Thanks,
> Mathieu
>
>
> [1]. https://patchwork.kernel.org/patch/11265869/
>
> >
> > Regards,
> > Bjorn
> >
> > > + struct rproc_sync_flags sync_flags;
> > > struct device dev;
> > > atomic_t power;
> > > unsigned int state;
> > > @@ -515,6 +538,7 @@ struct rproc {
> > > size_t table_sz;
> > > bool has_iommu;
> > > bool auto_boot;
> > > + bool sync_with_rproc;
> > > struct list_head dump_segments;
> > > int nb_vdev;
> > > u8 elf_class;
> > > --
> > > 2.20.1
> > >

2020-05-14 02:16:17

by Bjorn Andersson

[permalink] [raw]
Subject: Re: [PATCH v3 05/14] remoteproc: Refactor function rproc_fw_boot()

On Fri 08 May 14:27 PDT 2020, Mathieu Poirier wrote:

> On Tue, May 05, 2020 at 05:33:41PM -0700, Bjorn Andersson wrote:
> > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
> >
> > > Refactor function rproc_fw_boot() in order to better reflect the work
> > > that is done when supporting scenarios where the remoteproc core is
> > > synchronising with a remote processor.
> > >
> > > Signed-off-by: Mathieu Poirier <[email protected]>
> > > ---
> > > drivers/remoteproc/remoteproc_core.c | 10 ++++++----
> > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > > index a02593b75bec..e90a21de9de1 100644
> > > --- a/drivers/remoteproc/remoteproc_core.c
> > > +++ b/drivers/remoteproc/remoteproc_core.c
> > > @@ -1370,9 +1370,9 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
> > > }
> > >
> > > /*
> > > - * take a firmware and boot a remote processor with it.
> > > + * boot or synchronise with a remote processor.
> > > */
> > > -static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> > > +static int rproc_actuate_device(struct rproc *rproc, const struct firmware *fw)
> >
> > Per patch 4 this function will if rproc_needs_syncing() be called with
> > fw == NULL, it's not obvious to me that the various operations on "fw"
> > in this function are valid anymore.
>
> That is right. All firmware-related operations in this function go through
> wrappers found in remoteproc_internal.h, where the value of
> rproc->sync_with_rproc is checked before moving forward. That allows us to
> avoid introducing a new function similar to rproc_fw_boot() but without
> firmware operations, or peppering the code with if statements.
>

As I wrote in my other reply, the two mechanisms seem to consist of the
following steps:

boot the core:
1) request firmware
2) prepare device
3) parse fw
4) handle resources
5) allocate carveouts
6) load segments
7) find resource table
8) prepare subdevices
9) power on
10) start subdevices

sync:
1) prepare device (?)
2) handle resources
3) allocate carveouts (?)
4) prepare subdevices
5) attach
6) start subdevices

Rather than relying on the state flag and missing ops to turn the
first list into the second list, I conceptually prefer having two
separate functions that are easy to reason about.

But I haven't done any refactoring or implemented this, so in practice
the two might just be a lot of duplication(?)

> >
> > > {
> > > struct device *dev = &rproc->dev;
> > > const char *name = rproc->firmware;
> > > @@ -1382,7 +1382,9 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> > > if (ret)
> > > return ret;
> > >
> > > - dev_info(dev, "Booting fw image %s, size %zd\n", name, fw->size);
> > > + if (!rproc_needs_syncing(rproc))
> >
> > > Can't we make this check on fw, to make the relationship "if we were
> > > passed a firmware object, we're going to load and boot that firmware"?
>
> It can but I specifically decided to use rproc_needs_syncing() to be consistent
> with the rest of the patchset. That way all we need to do is grep for
> rproc_needs_syncing to get all the places where a decision about synchronising
> with a remote processor is made.
>

Conceptually we have a single "to sync or not to sync", but I think
we're invoking rproc_needs_syncing() 8 times during rproc_fw_boot() and
each of those operations may or may not do anything.

There are certain operations that I can see a driver choosing either to
implement or not, but I think that e.g. for a rproc in OFFLINE
state we should just require ops->start to be specified, because it
doesn't make sense to enter rproc_start() if ops->start is a nop.

Regards,
Bjorn

> >
> > Regards,
> > Bjorn
> >
> > > + dev_info(dev, "Booting fw image %s, size %zd\n",
> > > + name, fw->size);
> > >
> > > /*
> > > * if enabling an IOMMU isn't relevant for this rproc, this is
> > > @@ -1818,7 +1820,7 @@ int rproc_boot(struct rproc *rproc)
> > > }
> > > }
> > >
> > > - ret = rproc_fw_boot(rproc, firmware_p);
> > > + ret = rproc_actuate_device(rproc, firmware_p);
> > >
> > > release_firmware(firmware_p);
> > >
> > > --
> > > 2.20.1
> > >

2020-05-15 19:29:08

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

Good day Bjorn,

On Wed, May 13, 2020 at 06:32:24PM -0700, Bjorn Andersson wrote:
> On Fri 08 May 14:01 PDT 2020, Mathieu Poirier wrote:
>
> > On Tue, May 05, 2020 at 05:22:53PM -0700, Bjorn Andersson wrote:
> > > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
> > >
> > > > Add a new sync_ops to support use cases where the remoteproc
> > > > core is synchronising with the remote processor. Exactly when to use
> > > > the synchronisation operations is directed by the flags in structure
> > > > rproc_sync_flags.
> > > >
> > >
> > > I'm sorry, but no matter how many times I read these patches I have to
> > > translate "synchronising" to "remote controlled", and given the number
> > > of comments clarifying this makes me feel that we could perhaps come up
> > > with a better name?
> >
> > "remote controlled" as in "someone else is managing the remote processor" ?
> > It could also mean the remoteproc core is "remote controlling" the
> > remote processor, exactly what it currently does today...
> >
>
> You're right and this would certainly not help the confusion.
>
> > How about "autonomous", as in the remote processor doesn't need us to boot or
> > switch it off. I'm open to any other suggestions.
> >
> > >
> > > > Signed-off-by: Mathieu Poirier <[email protected]>
> > > > ---
> > > > include/linux/remoteproc.h | 24 ++++++++++++++++++++++++
> > > > 1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> > > > index ac4082f12e8b..ceb3b2bba824 100644
> > > > --- a/include/linux/remoteproc.h
> > > > +++ b/include/linux/remoteproc.h
> > > > @@ -353,6 +353,23 @@ enum rsc_handling_status {
> > > > RSC_IGNORED = 1,
> > > > };
> > > >
> > > > +/**
> > > > + * struct rproc_sync_flags - platform specific flags indicating which
> > > > + * rproc_ops to use at specific times during
> > > > + * the rproc lifecycle.
> > > > + * @on_init: true if synchronising with the remote processor at
> > > > + * initialisation time
> > > > + * @after_stop: true if synchronising with the remote processor after it was
> > > > + * stopped from the cmmand line
> > > > + * @after_crash: true if synchronising with the remote processor after
> > > > + * it has crashed
> > > > + */
> > > > +struct rproc_sync_flags {
> > > > + bool on_init;
> > >
> > > This indirectly splits the RPROC_OFFLINE state in an "offline" and
> > > "already-booted" state. Wouldn't it be clearer to represent this with a
> > > new RPROC_ALREADY_BOOTED state?
> > >
> >
> > I suggested that at some point in the past but it was in a different context. I
> > will revisit to see how doing so could apply here.
> >
>
> How about we introduce a new state named DETACHED and make the platform
> drivers specify that the remote processor is in either OFFLINE (as
> today) or DETACHED during initialization.

That is certainly an idea that is growing on me. Up to now I used the on_init
flag to express duality in the OFFLINE state. But based on the comments that came
back from yourself, Arnaud and Suman it is clear that my approach is anything
but clear. As such I am eager to try something else.

>
> Then on_init = true would be the action of going from DETACHED to
> RUNNING, which would involve the following actions:
>
> 1) find resource table
> 2) prepare device (?)
> 3) handle resources
> 4) allocate carveouts (?)
> 5) prepare subdevices
> 6) "attach"
> 7) start subdevices
>
> on_init = false would represent the transition from OFFLINE to RUNNING,
> which today involve the following actions:
>
> 1) request firmware
> 2) prepare device
> 3) parse fw
> 4) handle resources
> 5) allocate carveouts
> 6) load segments
> 7) find resource table
> 8) prepare subdevices
> 9) "boot"
> 10) start subdevices

If we add a DETACHED state I don't see a scenario where we need the on_init
variable. When DETACHED is set by the platform we know the MCU is running and
it becomes a matter of when the core attaches to it, i.e. at initialisation time or
once the kernel has finished booting, and that is already taken care of by the
auto_boot variable.

The steps you have outlined above to describe the transitions are accurate.
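
As a rough sketch of the two paths to RUNNING listed above, the choice could
hinge on the starting state alone. All names below (sketch_state,
sketch_steps_to_running(), ...) are hypothetical and are not part of this
series; the step names are copied from the lists above.

```c
#include <stddef.h>

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical states; SKETCH_DETACHED stands for the proposed new state. */
enum sketch_state { SKETCH_OFFLINE, SKETCH_DETACHED, SKETCH_RUNNING };

/* OFFLINE -> RUNNING: the core loads a firmware image and boots the MCU. */
static const char *const sketch_boot_steps[] = {
	"request firmware", "prepare device", "parse fw", "handle resources",
	"allocate carveouts", "load segments", "find resource table",
	"prepare subdevices", "boot", "start subdevices",
};

/* DETACHED -> RUNNING: the MCU already runs, the core only attaches. */
static const char *const sketch_attach_steps[] = {
	"find resource table", "prepare device", "handle resources",
	"allocate carveouts", "prepare subdevices", "attach",
	"start subdevices",
};

/*
 * Return the step list that brings a processor in state @from to RUNNING
 * and store its length in @n. Returns NULL when no such transition exists.
 */
static const char *const *sketch_steps_to_running(enum sketch_state from,
						  size_t *n)
{
	switch (from) {
	case SKETCH_OFFLINE:
		*n = SKETCH_ARRAY_SIZE(sketch_boot_steps);
		return sketch_boot_steps;
	case SKETCH_DETACHED:
		*n = SKETCH_ARRAY_SIZE(sketch_attach_steps);
		return sketch_attach_steps;
	default:
		*n = 0;
		return NULL;
	}
}
```

With the state carrying this information, nothing in the sketch needs an
on_init flag.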

>
> > > > + bool after_stop;
> > >
> > > What does it mean when this is true? That Linux can shut the remote core
> > > down, but someone else will start it?
> >
> > It tells the remoteproc core how to interact with the remote processor after the
> > latter has been switched off.
>
> Understood.
>
> > For example, we could want to boot the remote
> > processor from the boot loader so that minimal functionality can be provided
> > while the kernel boots. Once the kernel and user space are in place, the remote
> > processor is explicitly stopped and booted once again, but this time with a
> > firmware image that offers full functionality.
> >
>
> This would be the { on_init = true, after_stop = false } use case, with
> the new state would relate to the journey of DETACHED -> RUNNING ->
> OFFLINE.

Yes

>
> As such the next boot would represent above OFFLINE -> RUNNING case,
> which we already support today.

Correct. This is the level of functionality sought by ST and TI. Xilinx seems to
have the same requirements as well.

>
> > It could also be that the remoteproc core can stop the remote processor, but the
> > remote processor will automatically reboot itself. In that case the remoteproc
> > core will simply synchronise with the remote processor, as it does when .on_init
> > == true.
> >
>
> I've not been able to come up with a reasonable use case for the {
> on_init = true, after_stop = true } scenario.

That one is a little trickier - see the next comment.

>
> But Wendy previously talked about the need to "detach" Linux from a
> running remote processor, by somehow just letting it know that the
> communication is down - to allow Linux to be rebooted while the remote
> was running. So if we support a transition from RUNNING to DETACHED
> using a sequence of something like:
>
> 1) stop subdevices
> 2) "detach"
> 3) unprepare subdevices
> 4) release carveouts (?)
> 5) unprepare device (?)
>
> Then perhaps the after_stop could naturally be the transition from
> DETACHED to RUNNING, either with or without a reboot of the system
> in between?

I see two scenarios for after_stop == true:

1) A "detach" scenario as you mentioned above. In this case the stop() function
would inform (using a mechanism that is platform specific) the MCU that the core
is shutting down. In this case the MCU would put itself back in "waiting mode",
waiting for the core to show signs of life again. On the core side this would
be a DETACHED to RUNNING transition. Whether the application processor reboots
or not should not be relevant to the MCU.

2) An "MCU reboot in autonomous mode" scenario. Here the stop() function would
switch off the MCU. From there the MCU could automatically restart itself or
be restarted by some other entity. In this scenario I would expect the start()
function to block until the MCU is ready to proceed with the rest of the
remoteproc core initialisation steps.

From a remoteproc core perspective, both are handled by a DETACHED -> RUNNING
transition. This is the functionality NXP is looking for.
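
To illustrate, here is a minimal sketch of how the outcome of stop() could map
to the state the core ends up in. The enum sketch_stop_kind and both helpers
are hypothetical names used for illustration only, they do not exist in the
patchset.

```c
#include <stdbool.h>

/* Hypothetical states; SKETCH_DETACHED stands for the proposed new state. */
enum sketch_state { SKETCH_OFFLINE, SKETCH_DETACHED, SKETCH_RUNNING };

/* What a platform's stop() does to the MCU (illustrative only). */
enum sketch_stop_kind {
	SKETCH_STOP_POWER_OFF,   /* MCU is off until the core boots it again */
	SKETCH_STOP_DETACH,      /* scenario 1: MCU waits for the core */
	SKETCH_STOP_SELF_REBOOT, /* scenario 2: MCU comes back on its own */
};

/* State of the remote processor, as seen by the core, once stop() is done. */
static enum sketch_state sketch_state_after_stop(enum sketch_stop_kind kind)
{
	switch (kind) {
	case SKETCH_STOP_DETACH:
	case SKETCH_STOP_SELF_REBOOT:
		/* Both scenarios lead to a later DETACHED -> RUNNING. */
		return SKETCH_DETACHED;
	case SKETCH_STOP_POWER_OFF:
	default:
		return SKETCH_OFFLINE;
	}
}

/* Only an OFFLINE processor needs a firmware image to get going again. */
static bool sketch_next_start_needs_firmware(enum sketch_stop_kind kind)
{
	return sketch_state_after_stop(kind) == SKETCH_OFFLINE;
}
```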

>
> > >
> > > > + bool after_crash;
> > >
> > > Similarly what is the expected steps to be taken by the core when this
> > > is true? Should rproc_report_crash() simply stop/start the subdevices
> > > and upon one of the ops somehow tell the remote controller that it can
> > > proceed with the recovery?
> >
> > The exact same sequence of steps will be carried out as they are today, except
> > that if after_crash == true, the remoteproc core won't be switching the remote
> > processor on, exactly as it would do when on_init == true.
> >
>
> Just to make sure we're on the same page:
>
> after_crash = false is what we have today, and would mean:
>
> 1) stop subdevices
> 2) power off
> 3) unprepare subdevices
> 4) generate coredump
> 5) request firmware
> 6) load segments
> 7) find resource table
> 8) prepare subdevices
> 9) "boot"
> 10) start subdevices

Exactly

>
> after_crash = true would mean:
>
> 1) stop subdevices
> 2) "detach"
> 3) unprepare subdevices
> 4) prepare subdevices
> 5) "attach"
> 6) start subdevices
>

Yes

> State diagram wise both of these would represent the transition RUNNING
> -> CRASHED -> RUNNING, but somehow the platform driver needs to be able
> to specify which of these sequences to perform. Per your naming
> suggestion above, this does sound like an "autonomous_recovery" boolean
> to me.

Right, semantically "rproc->autonomous" would apply quite well.

In function rproc_crash_handler_work(), a call to rproc_set_sync_flag() has been
strategically placed to set the value of rproc->autonomous based on
"after_crash". From there the core knows which rproc_ops to use. Here too we
have to rely on the rproc_ops provided by the platform to do the right thing
based on the scenario to enact.
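
In simplified form the selection looks something like the sketch below. The
structure layout and the name sketch_crash_select_ops() are assumptions made
for illustration; the series itself uses rproc_set_sync_flag() and
rproc::sync_with_rproc, and "autonomous" is only the renaming discussed above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical stand-in for struct rproc_ops. */
struct sketch_ops {
	int (*start)(void);
	int (*stop)(void);
};

/* Reduced, hypothetical stand-in for struct rproc. */
struct sketch_rproc {
	const struct sketch_ops *ops;      /* boot with a firmware image */
	const struct sketch_ops *sync_ops; /* attach to a running MCU */
	bool after_crash;                  /* set once by the platform */
	bool autonomous;                   /* updated by the core */
};

/*
 * Called from the crash handler: record whether the MCU recovers on its
 * own and return the ops table the recovery sequence should use.
 */
static const struct sketch_ops *
sketch_crash_select_ops(struct sketch_rproc *rproc)
{
	rproc->autonomous = rproc->after_crash;

	return rproc->autonomous ? rproc->sync_ops : rproc->ops;
}
```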

>
> > These flags are there to indicate how to set rproc::sync_with_rproc after
> > different events, that is when the remoteproc core boots, when the remoteproc
> > has been stopped or when it has crashed.
> >
>
> Right, that was clear from your patches. Sorry that my reply didn't
> convey the information that I had understood this.
>
> > >
> > > > +};
> > > > +
> > > > /**
> > > > * struct rproc_ops - platform-specific device handlers
> > > > * @start: power on the device and boot it
> > > > @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> > > > * @firmware: name of firmware file to be loaded
> > > > * @priv: private data which belongs to the platform-specific rproc module
> > > > * @ops: platform-specific start/stop rproc handlers
> > > > + * @sync_ops: platform-specific start/stop rproc handlers when
> > > > + * synchronising with a remote processor.
> > > > + * @sync_flags: Determine the rproc_ops to choose in specific states.
> > > > * @dev: virtual device for refcounting and common remoteproc behavior
> > > > * @power: refcount of users who need this rproc powered up
> > > > * @state: state of the device
> > > > @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> > > > * @table_sz: size of @cached_table
> > > > * @has_iommu: flag to indicate if remote processor is behind an MMU
> > > > * @auto_boot: flag to indicate if remote processor should be auto-started
> > > > + * @sync_with_rproc: true if currently synchronising with the rproc
> > > > * @dump_segments: list of segments in the firmware
> > > > * @nb_vdev: number of vdev currently handled by rproc
> > > > */
> > > > @@ -492,6 +513,8 @@ struct rproc {
> > > > const char *firmware;
> > > > void *priv;
> > > > struct rproc_ops *ops;
> > > > + struct rproc_ops *sync_ops;
> > >
> > > Do we really need two rproc_ops, given that both are coming from the
> > > platform driver and the sync_flags will define which one to look at?
> > >
> > > Can't the platform driver just provide an ops table that works with the
> > > flags it passes?
> >
> > That is the approach Loic took in a previous patchset [1] and that was rejected.
> > It also led to all of the platform drivers testing rproc->flag before carrying out
> > different actions, something you indicated could be done in the core. This
> > patch does exactly that, i.e move the testing of rproc->flag to the core and
> > calls the right function based on that.
> >
>
> I think I see what you mean, as we use "start" for both syncing and
> starting the core, a { on_init = true, after_stop = false } setup either
> needs two tables or force conditionals on the platform driver.
>
> > The end result is the same and I'm happy with one or the other, I will need to
> > know which one.
> >
>
> How about adding a new ops named "attach" to rproc_ops, which the
> platform driver can specify if it supports attaching an already running
> processor?

Using "attach_ops" works for me. But would "autonomous_ops", to correlate with
rproc::autonomous, add clarity? Either way works equally well for me.
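
For reference, a minimal sketch of the single-table variant, where "attach" is
one more optional callback next to "start" and the core picks one based on the
current state. Every identifier here is hypothetical and this is not the
final API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical states; SKETCH_DETACHED stands for the proposed new state. */
enum sketch_state { SKETCH_OFFLINE, SKETCH_DETACHED, SKETCH_RUNNING };

struct sketch_rproc;

/* One ops table; a platform fills in start(), attach() or both. */
struct sketch_ops {
	int (*start)(struct sketch_rproc *rproc);
	int (*attach)(struct sketch_rproc *rproc);
};

struct sketch_rproc {
	enum sketch_state state;
	const struct sketch_ops *ops;
};

/* Stub callback standing in for a platform implementation. */
static int sketch_ok(struct sketch_rproc *rproc)
{
	(void)rproc;
	return 0;
}

/* The core decides which callback applies, the platform never tests a flag. */
static int sketch_power_up(struct sketch_rproc *rproc)
{
	int (*op)(struct sketch_rproc *) = NULL;
	int ret;

	if (rproc->state == SKETCH_OFFLINE)
		op = rproc->ops->start;
	else if (rproc->state == SKETCH_DETACHED)
		op = rproc->ops->attach;

	if (!op)
		return -EINVAL; /* transition not supported by this platform */

	ret = op(rproc);
	if (!ret)
		rproc->state = SKETCH_RUNNING;

	return ret;
}
```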

>
> > The advantage with the approach I'm proposing is that everything is controlled
> > in the core, i.e what ops is called and when to set rproc->flag based on
> > different states the remote processor transitions through.
> >
>
> I still think keeping things in the core is the right thing to do.
>

Let's continue down that path then.

>
> Please let me know what you think!

From the above conversation I believe our views are pretty much aligned.

>
> PS. If we agree on this the three transitions becomes somewhat
> independent, so I think it makes sense to first land support for the
> DETACHED -> RUNNING transition (and the stm32 series), then follow up
> with RUNNING -> DETACHED and autonomous recovery separately.

We can certainly proceed that way.

Thanks for the time,
Mathieu

>
> Regards,
> Bjorn
>
> > Thanks,
> > Mathieu
> >
> >
> > [1]. https://patchwork.kernel.org/patch/11265869/
> >
> > >
> > > Regards,
> > > Bjorn
> > >
> > > > + struct rproc_sync_flags sync_flags;
> > > > struct device dev;
> > > > atomic_t power;
> > > > unsigned int state;
> > > > @@ -515,6 +538,7 @@ struct rproc {
> > > > size_t table_sz;
> > > > bool has_iommu;
> > > > bool auto_boot;
> > > > + bool sync_with_rproc;
> > > > struct list_head dump_segments;
> > > > int nb_vdev;
> > > > u8 elf_class;
> > > > --
> > > > 2.20.1
> > > >

2020-05-15 19:50:42

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 05/14] remoteproc: Refactor function rproc_fw_boot()

On Wed, May 13, 2020 at 07:10:55PM -0700, Bjorn Andersson wrote:
> On Fri 08 May 14:27 PDT 2020, Mathieu Poirier wrote:
>
> > On Tue, May 05, 2020 at 05:33:41PM -0700, Bjorn Andersson wrote:
> > > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
> > >
> > > > Refactor function rproc_fw_boot() in order to better reflect the work
> > > > that is done when supporting scenarios where the remoteproc core is
> > > > synchronising with a remote processor.
> > > >
> > > > Signed-off-by: Mathieu Poirier <[email protected]>
> > > > ---
> > > > drivers/remoteproc/remoteproc_core.c | 10 ++++++----
> > > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > > > index a02593b75bec..e90a21de9de1 100644
> > > > --- a/drivers/remoteproc/remoteproc_core.c
> > > > +++ b/drivers/remoteproc/remoteproc_core.c
> > > > @@ -1370,9 +1370,9 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
> > > > }
> > > >
> > > > /*
> > > > - * take a firmware and boot a remote processor with it.
> > > > + * boot or synchronise with a remote processor.
> > > > */
> > > > -static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> > > > +static int rproc_actuate_device(struct rproc *rproc, const struct firmware *fw)
> > >
> > > Per patch 4 this function will if rproc_needs_syncing() be called with
> > > fw == NULL, it's not obvious to me that the various operations on "fw"
> > > in this function are valid anymore.
> >
> > That is right, all firmware related operations in this function are found in
> > remoteproc_internal.h where the value of rproc->sync_with_mcu is checked before
> > moving forward. That allows us to avoid introducing a new function similar to
> > rproc_fw_boot() but without firmware operations or peppering the code with if
> > statements.
> >
>
> As I wrote in my other reply, the two mechanisms seem to consist of the
> following steps:
>
> boot the core:
> 1) request firmware
> 2) prepare device
> 3) parse fw
> 4) handle resources
> 5) allocate carveouts
> 6) load segments
> 7) find resource table
> 8) prepare subdevices
> 9) power on
> 10) start subdevices
>
> sync:
> 1) prepare device (?)
> 2) handle resources
> 3) allocate carveouts (?)
> 4) prepare subdevices
> 5) attach
> 6) start subdevices
>
> Rather than relying on the state flag and missing ops to turn the
> first list into the second list I conceptually prefer having two
> separate functions that are easy to reason about.

I reflected long and hard about doing just that...

>
> But I haven't done any refactoring or implemented this, so in practice
> the two might just be a lot of duplication(?)

Exactly - duplication and maintenance are my prime concerns. Right now some
functions in the OFFLINE -> RUNNING path are clearly not needed when dealing with
a DETACHED -> RUNNING scenario, but I am convinced people will find ways to
do something creative with the callbacks. In the end I fear the new functions
we spin off to deal with DETACHED -> RUNNING scenarios will end up looking very
similar to the current implementation.

With that in mind I simply did all the work in remoteproc_internal.h and left
the core functions intact.
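
The pattern is roughly the one sketched below: each helper picks the ops table
from the synchronisation flag and treats a missing callback as a no-op, so the
core function calling it does not change. The names and the reduced structures
are hypothetical and only approximate what the header does.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_rproc;

/* Reduced, hypothetical stand-in for struct rproc_ops. */
struct sketch_ops {
	int (*load)(struct sketch_rproc *rproc);
	int (*start)(struct sketch_rproc *rproc);
};

/* Reduced, hypothetical stand-in for struct rproc. */
struct sketch_rproc {
	const struct sketch_ops *ops;
	const struct sketch_ops *sync_ops;
	bool sync_with_rproc;
	int loads; /* bookkeeping for the example only */
};

/* Stub platform callback that records it was invoked. */
static int sketch_count_load(struct sketch_rproc *rproc)
{
	rproc->loads++;
	return 0;
}

/* Single place where "to sync or not to sync" is decided. */
static inline const struct sketch_ops *
sketch_active_ops(const struct sketch_rproc *rproc)
{
	return rproc->sync_with_rproc ? rproc->sync_ops : rproc->ops;
}

/* Wrapper used by the core; an absent callback means nothing to do. */
static inline int sketch_load_segments(struct sketch_rproc *rproc)
{
	const struct sketch_ops *ops = sketch_active_ops(rproc);

	if (ops && ops->load)
		return ops->load(rproc);

	return 0;
}
```

The cost of this approach is that a reader has to check each wrapper to know
whether a given call does anything for the current ops table.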

We can try spinning off new functions in the next revision, just to test my
theory and see how much gets duplicated.

>
> > >
> > > > {
> > > > struct device *dev = &rproc->dev;
> > > > const char *name = rproc->firmware;
> > > > @@ -1382,7 +1382,9 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> > > > if (ret)
> > > > return ret;
> > > >
> > > > - dev_info(dev, "Booting fw image %s, size %zd\n", name, fw->size);
> > > > + if (!rproc_needs_syncing(rproc))
> > >
> > > Can't we make this check on fw, to make the relationship "if we were
> > > passed a firmware object, we're going to load and boot that firmware"?
> >
> > It can but I specifically decided to use rproc_needs_syncing() to be consistent
> > with the rest of the patchset. That way all we need to do is grep for
> > rproc_needs_syncing to get all the places where a decision about synchronising
> > with a remote processor is made.
> >
>
> Conceptually we have a single "to sync or not to sync", but I think
> we're invoking rproc_needs_syncing() 8 times during rproc_fw_boot() and
> each of those operations may or may not do anything.

As I said above, I'll try spinning off new functions in the next revision. From
there we can decide how best to move forward.

>
> There are certain operations where I see it makes sense for a driver to
> either implement or not, but I think that e.g. for a rproc in OFFLINE
> state we should just require ops->start to be specified - because it
> doesn't make sense to enter rproc_start() if ops->start is a nop.

At this time ops->start() doesn't have to be specified... But as you say it
won't do much good and this is something we can easily spot when reviewing
patches.

Thanks for the review,
Mathieu

>
> Regards,
> Bjorn
>
> > >
> > > Regards,
> > > Bjorn
> > >
> > > > + dev_info(dev, "Booting fw image %s, size %zd\n",
> > > > + name, fw->size);
> > > >
> > > > /*
> > > > * if enabling an IOMMU isn't relevant for this rproc, this is
> > > > @@ -1818,7 +1820,7 @@ int rproc_boot(struct rproc *rproc)
> > > > }
> > > > }
> > > >
> > > > - ret = rproc_fw_boot(rproc, firmware_p);
> > > > + ret = rproc_actuate_device(rproc, firmware_p);
> > > >
> > > > release_firmware(firmware_p);
> > > >
> > > > --
> > > > 2.20.1
> > > >

2020-05-18 13:30:13

by Peng Fan

[permalink] [raw]
Subject: RE: [PATCH v3 00/14] remoteproc: Add support for synchronisaton with rproc

> Subject: [PATCH v3 00/14] remoteproc: Add support for synchronisaton with
> rproc

What's the status of this thread? Will this be applied, or does it require a v4?

Thanks,
Peng.

>
> This is the third revision of this series that tries to address the problem of
> synchronising with a remote processor with as much flexibility as possible.
>
> Two things to pay attention to:
>
> 1) Function rproc_actuate() has been abandoned to avoid creating another
> way to start a remote processor from a kernel driver. Arnaud expressed
> the opinion that it is semantically questionnable to synchronise with a
> remote processor when calling rproc_boot(). We could rename
> rproc_boot() to rproc_actuate() but I'll wait to see what other people
> think before doing so.
>
> 2) The allocation of the synchronisation states has been split from the
> remote processor allocation. A new function rproc_set_state_machine()
> does all the work now. Proceeding this way has made the patchset a
> lot more simple.
>
> Other than the above I have tried to address all the comments made on the
> second revision. If a comment was not addressed it simply fell through the
> cracks rather than ignored. In such a case please reiterate your point
> of view and I'll be sure to address it.
>
> Applies cleanly on rproc-next (305ac5a766b1).
>
> Best regards,
> Mathieu
>
> Mathieu Poirier (14):
> remoteproc: Make core operations optional
> remoteproc: Introduce function rproc_alloc_internals()
> remoteproc: Add new operation and flags for synchronistation
> remoteproc: Refactor function rproc_boot()
> remoteproc: Refactor function rproc_fw_boot()
> remoteproc: Refactor function rproc_trigger_auto_boot()
> remoteproc: Introducting new start and stop functions
> remoteproc: Call core functions based on synchronisation flag
> remoteproc: Deal with synchronisation when crashing
> remoteproc: Deal with synchronisation when shutting down
> remoteproc: Deal with synchronisation when changing FW image
> remoteproc: Introducing function rproc_set_state_machine()
> remoteproc: Document function rproc_set_state_machine()
> remoteproc: Expose synchronisation flags via debugfs
>
> Documentation/remoteproc.txt | 17 ++
> drivers/remoteproc/remoteproc_core.c | 197
> +++++++++++++++++++----
> drivers/remoteproc/remoteproc_debugfs.c | 21 +++
> drivers/remoteproc/remoteproc_internal.h | 123 +++++++++++++-
> drivers/remoteproc/remoteproc_sysfs.c | 24 ++-
> include/linux/remoteproc.h | 27 ++++
> 6 files changed, 372 insertions(+), 37 deletions(-)
>
> --
> 2.20.1

2020-05-18 16:31:06

by Mathieu Poirier

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] remoteproc: Add support for synchronisaton with rproc

On Mon, 18 May 2020 at 07:28, Peng Fan <[email protected]> wrote:
>
> > Subject: [PATCH v3 00/14] remoteproc: Add support for synchronisaton with
> > rproc
>
> What's the status of this thread? Will this be applied, or does it require a v4?

It will not be applied as more work needs to be done. As one of the
people this feature will benefit, it would be really nice if you could
take the time to comment on the solution that is brought forward.

>
> Thanks,
> Peng.
>
> >
> > This is the third revision of this series that tries to address the problem of
> > synchronising with a remote processor with as much flexibility as possible.
> >
> > Two things to pay attention to:
> >
> > 1) Function rproc_actuate() has been abandoned to avoid creating another
> > way to start a remote processor from a kernel driver. Arnaud expressed
> > the opinion that it is semantically questionnable to synchronise with a
> > remote processor when calling rproc_boot(). We could rename
> > rproc_boot() to rproc_actuate() but I'll wait to see what other people
> > think before doing so.
> >
> > 2) The allocation of the synchronisation states has been split from the
> > remote processor allocation. A new function rproc_set_state_machine()
> > does all the work now. Proceeding this way has made the patchset a
> > lot more simple.
> >
> > Other than the above I have tried to address all the comments made on the
> > second revision. If a comment was not addressed it simply fell through the
> > cracks rather than ignored. In such a case please reiterate your point
> > of view and I'll be sure to address it.
> >
> > Applies cleanly on rproc-next (305ac5a766b1).
> >
> > Best regards,
> > Mathieu
> >
> > Mathieu Poirier (14):
> > remoteproc: Make core operations optional
> > remoteproc: Introduce function rproc_alloc_internals()
> > remoteproc: Add new operation and flags for synchronistation
> > remoteproc: Refactor function rproc_boot()
> > remoteproc: Refactor function rproc_fw_boot()
> > remoteproc: Refactor function rproc_trigger_auto_boot()
> > remoteproc: Introducting new start and stop functions
> > remoteproc: Call core functions based on synchronisation flag
> > remoteproc: Deal with synchronisation when crashing
> > remoteproc: Deal with synchronisation when shutting down
> > remoteproc: Deal with synchronisation when changing FW image
> > remoteproc: Introducing function rproc_set_state_machine()
> > remoteproc: Document function rproc_set_state_machine()
> > remoteproc: Expose synchronisation flags via debugfs
> >
> > Documentation/remoteproc.txt | 17 ++
> > drivers/remoteproc/remoteproc_core.c | 197
> > +++++++++++++++++++----
> > drivers/remoteproc/remoteproc_debugfs.c | 21 +++
> > drivers/remoteproc/remoteproc_internal.h | 123 +++++++++++++-
> > drivers/remoteproc/remoteproc_sysfs.c | 24 ++-
> > include/linux/remoteproc.h | 27 ++++
> > 6 files changed, 372 insertions(+), 37 deletions(-)
> >
> > --
> > 2.20.1
>

2020-05-19 00:29:14

by Bjorn Andersson

[permalink] [raw]
Subject: Re: [PATCH v3 05/14] remoteproc: Refactor function rproc_fw_boot()

On Fri 15 May 12:46 PDT 2020, Mathieu Poirier wrote:

> On Wed, May 13, 2020 at 07:10:55PM -0700, Bjorn Andersson wrote:
> > On Fri 08 May 14:27 PDT 2020, Mathieu Poirier wrote:
> >
> > > On Tue, May 05, 2020 at 05:33:41PM -0700, Bjorn Andersson wrote:
> > > > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
> > > >
> > > > > Refactor function rproc_fw_boot() in order to better reflect the work
> > > > > that is done when supporting scenarios where the remoteproc core is
> > > > > synchronising with a remote processor.
> > > > >
> > > > > Signed-off-by: Mathieu Poirier <[email protected]>
> > > > > ---
> > > > > drivers/remoteproc/remoteproc_core.c | 10 ++++++----
> > > > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > > > > index a02593b75bec..e90a21de9de1 100644
> > > > > --- a/drivers/remoteproc/remoteproc_core.c
> > > > > +++ b/drivers/remoteproc/remoteproc_core.c
> > > > > @@ -1370,9 +1370,9 @@ static int rproc_start(struct rproc *rproc, const struct firmware *fw)
> > > > > }
> > > > >
> > > > > /*
> > > > > - * take a firmware and boot a remote processor with it.
> > > > > + * boot or synchronise with a remote processor.
> > > > > */
> > > > > -static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> > > > > +static int rproc_actuate_device(struct rproc *rproc, const struct firmware *fw)
> > > >
> > > > Per patch 4 this function will if rproc_needs_syncing() be called with
> > > > fw == NULL, it's not obvious to me that the various operations on "fw"
> > > > in this function are valid anymore.
> > >
> > > That is right, all firmware related operations in this function are found in
> > > remoteproc_internal.h where the value of rproc->sync_with_mcu is checked before
> > > moving forward. That allows us to avoid introducing a new function similar to
> > > rproc_fw_boot() but without firmware operations or peppering the code with if
> > > statements.
> > >
> >
> > As I wrote in my other reply, the two mechanisms seem to consist of the
> > following steps:
> >
> > boot the core:
> > 1) request firmware
> > 2) prepare device
> > 3) parse fw
> > 4) handle resources
> > 5) allocate carveouts
> > 6) load segments
> > 7) find resource table
> > 8) prepare subdevices
> > 9) power on
> > 10) start subdevices
> >
> > sync:
> > 1) prepare device (?)
> > 2) handle resources
> > 3) allocate carveouts (?)
> > 4) prepare subdevices
> > 5) attach
> > 6) start subdevices
> >
> > Rather than relying on the state flag and missing ops to turn the
> > first list into the second list I conceptually prefer having two
> > separate functions that are easy to reason about.
>
> I reflected long and hard about doing just that...
>
> >
> > But I haven't done any refactoring or implemented this, so in practice
> > the two might just be a lot of duplication(?)
>
> Exactly - duplication and maintenance are my prime concerns. Right now some
> functions in the OFFLINE -> RUNNING path are clearly not needed when dealing with
> a DETACHED -> RUNNING scenario, but I am convinced people will find ways to
> do something creative with the callbacks.

I'm sure there are problems out there that will require creative
solutions, but I would prefer that we keep things easy to reason about
and ensure that as new problems arise we can evolve the framework.

> In the end I fear the new functions
> we spin off to deal with DETACHED -> RUNNING scenarios will end up looking very
> similar to the current implementation.
>

In those scenarios I don't see a problem with the platform drivers
having functions of common code shared between ops->start and
ops->attach.

> With that in mind I simply did all the work in remoteproc_internal.h and left
> the core functions intact.
>
> We can try spinning off new functions in the next revision, just to test my
> theory and see how much gets duplicated.
>

Looking forward to it!

> >
> > > >
> > > > > {
> > > > > struct device *dev = &rproc->dev;
> > > > > const char *name = rproc->firmware;
> > > > > @@ -1382,7 +1382,9 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
> > > > > if (ret)
> > > > > return ret;
> > > > >
> > > > > - dev_info(dev, "Booting fw image %s, size %zd\n", name, fw->size);
> > > > > + if (!rproc_needs_syncing(rproc))
> > > >
> > > > Can't we make this check on fw, to make the relationship "if we were
> > > > passed a firmware object, we're going to load and boot that firmware"?
> > >
> > > It can but I specifically decided to use rproc_needs_syncing() to be consistent
> > > with the rest of the patchset. That way all we need to do is grep for
> > > rproc_needs_syncing to get all the places where a decision about synchronising
> > > with a remote processor is made.
> > >
> >
> > Conceptually we have a single "to sync or not to sync", but I think
> > we're invoking rproc_needs_syncing() 8 times during rproc_fw_boot() and
> > each of those operations may or may not do anything.
>
> As I said above, I'll try spinning off new functions in the next revision. From
> there we can decide how best to move forward.
>
> >
> > There are certain operations where I see it makes sense for a driver to
> > either implement or not, but I think that e.g. for a rproc in OFFLINE
> > state we should just require ops->start to be specified - because it
> > doesn't make sense to enter rproc_start() if ops->start is a nop.
>
> At this time ops->start() doesn't have to be specified... But as you say it
> won't do much good and this is something we can easily spot when reviewing
> patches.
>

Presumably after implementing this support we should check during
registration that there's either a start or an attach ops specified. And
if there's no start we shouldn't allow the RUNNING->OFFLINE transition.
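
Something along these lines, as an untested sketch. The names
sketch_validate_ops() and sketch_may_go_offline() are made up for illustration
and the ops structure is reduced to the three callbacks that matter here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical stand-in for struct rproc_ops. */
struct sketch_ops {
	int (*start)(void);
	int (*attach)(void);
	int (*stop)(void);
};

/* Stub callback standing in for a platform implementation. */
static int sketch_noop(void)
{
	return 0;
}

/* Registration-time check: there must be some way to reach RUNNING. */
static int sketch_validate_ops(const struct sketch_ops *ops)
{
	if (!ops || (!ops->start && !ops->attach))
		return -EINVAL;

	return 0;
}

/*
 * Without start() the core could never leave OFFLINE again, so the
 * RUNNING -> OFFLINE transition is refused for attach-only platforms.
 */
static bool sketch_may_go_offline(const struct sketch_ops *ops)
{
	return ops && ops->start;
}
```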

> Thanks for the review,

Thanks for working on this and sorry that it took me time to really digest
this.

Regards,
Bjorn

> Mathieu
>
> >
> > Regards,
> > Bjorn
> >
> > > >
> > > > Regards,
> > > > Bjorn
> > > >
> > > > > + dev_info(dev, "Booting fw image %s, size %zd\n",
> > > > > + name, fw->size);
> > > > >
> > > > > /*
> > > > > * if enabling an IOMMU isn't relevant for this rproc, this is
> > > > > @@ -1818,7 +1820,7 @@ int rproc_boot(struct rproc *rproc)
> > > > > }
> > > > > }
> > > > >
> > > > > - ret = rproc_fw_boot(rproc, firmware_p);
> > > > > + ret = rproc_actuate_device(rproc, firmware_p);
> > > > >
> > > > > release_firmware(firmware_p);
> > > > >
> > > > > --
> > > > > 2.20.1
> > > > >

2020-05-19 00:58:21

by Bjorn Andersson

[permalink] [raw]
Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

On Fri 15 May 12:24 PDT 2020, Mathieu Poirier wrote:

> Good day Bjorn,
>
> On Wed, May 13, 2020 at 06:32:24PM -0700, Bjorn Andersson wrote:
> > On Fri 08 May 14:01 PDT 2020, Mathieu Poirier wrote:
> >
> > > On Tue, May 05, 2020 at 05:22:53PM -0700, Bjorn Andersson wrote:
> > > > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
> > > >
> > > > > Add a new sync_ops to support use cases where the remoteproc
> > > > > core is synchronising with the remote processor. Exactly when to use
> > > > > the synchronisation operations is directed by the flags in structure
> > > > > rproc_sync_flags.
> > > > >
> > > >
> > > > I'm sorry, but no matter how many times I read these patches I have to
> > > > translate "synchronising" to "remote controlled", and given the number
> > > > of comments clarifying this makes me feel that we could perhaps come up
> > > > with a better name?
> > >
> > > "remote controlled" as in "someone else is managing the remote processor" ?
> > > It could also mean the remoteproc core is "remote controlling" the
> > > remote processor, exactly what it currently does today...
> > >
> >
> > You're right and this would certainly not help the confusion.
> >
> > > How about "autonomous", as in the remote processor doesn't need us to boot or
> > > switch it off. I'm open to any other suggestions.
> > >
> > > >
> > > > > Signed-off-by: Mathieu Poirier <[email protected]>
> > > > > ---
> > > > > include/linux/remoteproc.h | 24 ++++++++++++++++++++++++
> > > > > 1 file changed, 24 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> > > > > index ac4082f12e8b..ceb3b2bba824 100644
> > > > > --- a/include/linux/remoteproc.h
> > > > > +++ b/include/linux/remoteproc.h
> > > > > @@ -353,6 +353,23 @@ enum rsc_handling_status {
> > > > > RSC_IGNORED = 1,
> > > > > };
> > > > >
> > > > > +/**
> > > > > + * struct rproc_sync_flags - platform specific flags indicating which
> > > > > + * rproc_ops to use at specific times during
> > > > > + * the rproc lifecycle.
> > > > > + * @on_init: true if synchronising with the remote processor at
> > > > > + * initialisation time
> > > > > + * @after_stop: true if synchronising with the remote processor after it was
> > > > > > + * stopped from the command line
> > > > > + * @after_crash: true if synchronising with the remote processor after
> > > > > + * it has crashed
> > > > > + */
> > > > > +struct rproc_sync_flags {
> > > > > + bool on_init;
> > > >
> > > > This indirectly splits the RPROC_OFFLINE state in an "offline" and
> > > > "already-booted" state. Wouldn't it be clearer to represent this with a
> > > > new RPROC_ALREADY_BOOTED state?
> > > >
> > >
> > > I suggested that at some point in the past but it was in a different context. I
> > > will revisit to see how doing so could apply here.
> > >
> >
> > How about we introduce a new state named DETACHED and make the platform
> > drivers specify that the remote processor is in either OFFLINE (as
> > today) or DETACHED during initialization.
>
> That is certainly an idea that is growing on me. Up to now I used the on_init
> flag to express duality in the OFFLINE state. But based on the comments that came
> back from yourself, Arnaud and Suman it is clear that my approach is anything
> but clear. As such I am eager to try something else.
>
> >
> > Then on_init = true would be the action of going from DETACHED to
> > RUNNING, which would involve the following actions:
> >
> > 1) find resource table
> > 2) prepare device (?)
> > 3) handle resources
> > 4) allocate carveouts (?)
> > 5) prepare subdevices
> > 6) "attach"
> > 7) start subdevices
> >
> > on_init = false would represent the transition from OFFLINE to RUNNING,
> > which today involves the following actions:
> >
> > 1) request firmware
> > 2) prepare device
> > 3) parse fw
> > 4) handle resources
> > 5) allocate carveouts
> > 6) load segments
> > 7) find resource table
> > 8) prepare subdevices
> > 9) "boot"
> > 10) start subdevices
>
> If we add a DETACHED state I don't see a scenario where we need the on_init
> variable. When DETACHED is set by the platform we know the MCU is running and
> it becomes a matter of when the core attaches to it, i.e. at initialisation time or
> once the kernel has finished booting, and that is already taken care of by the
> auto_boot variable.
>
> The steps you have outlined above to describe the transitions are accurate.
>

Thanks for confirming.

I think it would be helpful if we had this properly documented in the
driver, to facilitate reasoning about the various transitions. I'll try
to write down my notes in a patch and send it out.
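
As a rough illustration of what such documentation could capture, the
transitions discussed in this thread can be written down as a small lookup
table. This is only a sketch: the sk_ prefixed enum, struct and function are
hypothetical names made up for the example, not the kernel's definitions, and
the DETACHED state is a proposal at this point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state names for this sketch; not the kernel's enum. */
enum sk_state {
	SK_OFFLINE,
	SK_DETACHED,
	SK_RUNNING,
	SK_CRASHED,
	SK_NUM_STATES
};

struct sk_transition {
	enum sk_state from;
	enum sk_state to;
	const char *action;	/* short label for the step sequence */
};

/* Transitions named in the discussion; anything else is not allowed. */
const struct sk_transition sk_transitions[] = {
	{ SK_OFFLINE,  SK_RUNNING,  "load firmware and boot" },
	{ SK_DETACHED, SK_RUNNING,  "attach" },
	{ SK_RUNNING,  SK_OFFLINE,  "stop" },
	{ SK_RUNNING,  SK_DETACHED, "detach" },
	{ SK_RUNNING,  SK_CRASHED,  "crash reported" },
	{ SK_CRASHED,  SK_RUNNING,  "recover (reboot or re-attach)" },
};

/* Return the action label for a transition, or NULL if it is not listed. */
const char *sk_transition_action(enum sk_state from, enum sk_state to)
{
	size_t i;

	assert(from < SK_NUM_STATES && to < SK_NUM_STATES);

	for (i = 0; i < sizeof(sk_transitions) / sizeof(sk_transitions[0]); i++) {
		if (sk_transitions[i].from == from && sk_transitions[i].to == to)
			return sk_transitions[i].action;
	}
	return NULL;
}
```

Keeping the list in one table makes it easy to see that, for instance,
OFFLINE -> DETACHED is not a transition anyone has asked for so far.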

> >
> > > > > + bool after_stop;
> > > >
> > > > What does it mean when this is true? That Linux can shut the remote core
> > > > down, but someone else will start it?
> > >
> > > It tells the remoteproc core how to interact with the remote processor after the
> > > latter has been switched off.
> >
> > Understood.
> >
> > > For example, we could want to boot the remote
> > > processor from the boot loader so that minimal functionality can be provided
> > > while the kernel boots. Once the kernel and user space are in place, the remote
> > > processor is explicitly stopped and booted once again, but this time with a
> > > firmware image that offers full functionality.
> > >
> >
> > This would be the { on_init = true, after_stop = false } use case, which with
> > the new state would correspond to the journey DETACHED -> RUNNING ->
> > OFFLINE.
>
> Yes
>
> >
> > As such the next boot would represent above OFFLINE -> RUNNING case,
> > which we already support today.
>
> Correct. This is the level of functionality sought by ST and TI. Xilinx seems to
> have the same requirements as well.
>

Good.

> >
> > > It could also be that the remoteproc core can stop the remote processor, but the
> > > remote processor will automatically reboot itself. In that case the remoteproc
> > > core will simply synchronise with the remote processor, as it does when .on_init
> > > == true.
> > >
> >
> > I've not been able to come up with a reasonable use case for the {
> > on_init = true, after_stop = true } scenario.
>
> That one is a little trickier - see the next comment.
>
> >
> > But Wendy previously talked about the need to "detach" Linux from a
> > running remote processor, by somehow just letting it know that the
> > communication is down - to allow Linux to be rebooted while the remote
> > was running. So if we support a transition from RUNNING to DETACHED
> > using a sequence of something like:
> >
> > 1) stop subdevices
> > 2) "detach"
> > 3) unprepare subdevices
> > 4) release carveouts (?)
> > 5) unprepare device (?)
> >
> > Then perhaps the after_stop could naturally be the transition from
> > DETACHED to RUNNING, either with or without a reboot of the system
> > in between?
>
> I see two scenarios for after_stop == true:
>
> 1) A "detach" scenario as you mentioned above. In this case the stop() function
> would inform (using a mechanism that is platform specific) the MCU that the core
> is shutting down. In this case the MCU would put itself back in "waiting mode",
> waiting for the core to show signs of life again. On the core side this would
> be a DETACHED to RUNNING transition. Whether the application processor reboots
> or not should not be relevant to the MCU.
>

Right and after reading the stm32 patches, for drivers with a way to
"detach" the remote, i.e. put it back in DETACHED state, a
rmmod/modprobe should conceptually fit very well.

> 2) An "MCU reboot in autonomous mode" scenario. Here the stop() function would
> switch off the MCU. From there the MCU could automatically restart itself or
> be restarted by some other entity. In this scenario I would expect the start()
> function to block until the MCU is ready to proceed with the rest of the
> remoteproc core initialisation steps.
>

Presumably though the NXP driver wouldn't have a mechanism to "start"
the core, only to "attach" to it. And that would wait for it to be up
and running again.

> From a remoteproc core perspective, both are handled by a DETACHED -> RUNNING
> transition. This is the functionality NXP is looking for.
>

Agreed.

> >
> > > >
> > > > > + bool after_crash;
> > > >
> > > > Similarly, what are the expected steps to be taken by the core when this
> > > > is true? Should rproc_report_crash() simply stop/start the subdevices
> > > > and upon one of the ops somehow tell the remote controller that it can
> > > > proceed with the recovery?
> > >
> > > The exact same sequence of steps will be carried out as they are today, except
> > > that if after_crash == true, the remoteproc core won't be switching the remote
> > > processor on, exactly as it would do when on_init == true.
> > >
> >
> > Just to make sure we're on the same page:
> >
> > after_crash = false is what we have today, and would mean:
> >
> > 1) stop subdevices
> > 2) power off
> > 3) unprepare subdevices
> > 4) generate coredump
> > 5) request firmware
> > 6) load segments
> > 7) find resource table
> > 8) prepare subdevices
> > 9) "boot"
> > 10) start subdevices
>
> Exactly
>
> >
> > after_crash = true would mean:
> >
> > 1) stop subdevices
> > 2) "detach"
> > 3) unprepare subdevices
> > 4) prepare subdevices
> > 5) "attach"
> > 6) start subdevices
> >
>
> Yes
>
> > State diagram wise both of these would represent the transition RUNNING
> > -> CRASHED -> RUNNING, but somehow the platform driver needs to be able
> > to specify which of these sequences to perform. Per your naming
> > suggestion above, this does sound like an "autonomous_recovery" boolean
> > to me.
>
> Right, semantically "rproc->autonomous" would apply quite well.
>
> In function rproc_crash_handler_work(), a call to rproc_set_sync_flag() has been
> strategically placed to set the value of rproc->autonomous based on
> "after_crash". From there the core knows which rproc_ops to use. Here too we
> have to rely on the rproc_ops provided by the platform to do the right thing
> based on the scenario to enact.
>

Do you think that autonomous_recovery would be something that changes
for a given remoteproc instance? I envisioned it as something that you
know at registration time, but perhaps I'm missing some details here.

> >
> > > These flags are there to indicate how to set rproc::sync_with_rproc after
> > > different events, that is when the remoteproc core boots, when the remoteproc
> > > has been stopped or when it has crashed.
> > >
> >
> > Right, that was clear from your patches. Sorry that my reply didn't
> > convey the information that I had understood this.
> >
> > > >
> > > > > +};
> > > > > +
> > > > > /**
> > > > > * struct rproc_ops - platform-specific device handlers
> > > > > * @start: power on the device and boot it
> > > > > @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> > > > > * @firmware: name of firmware file to be loaded
> > > > > * @priv: private data which belongs to the platform-specific rproc module
> > > > > * @ops: platform-specific start/stop rproc handlers
> > > > > + * @sync_ops: platform-specific start/stop rproc handlers when
> > > > > + * synchronising with a remote processor.
> > > > > + * @sync_flags: Determine the rproc_ops to choose in specific states.
> > > > > * @dev: virtual device for refcounting and common remoteproc behavior
> > > > > * @power: refcount of users who need this rproc powered up
> > > > > * @state: state of the device
> > > > > @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> > > > > * @table_sz: size of @cached_table
> > > > > * @has_iommu: flag to indicate if remote processor is behind an MMU
> > > > > * @auto_boot: flag to indicate if remote processor should be auto-started
> > > > > + * @sync_with_rproc: true if currently synchronising with the rproc
> > > > > * @dump_segments: list of segments in the firmware
> > > > > * @nb_vdev: number of vdev currently handled by rproc
> > > > > */
> > > > > @@ -492,6 +513,8 @@ struct rproc {
> > > > > const char *firmware;
> > > > > void *priv;
> > > > > struct rproc_ops *ops;
> > > > > + struct rproc_ops *sync_ops;
> > > >
> > > > Do we really need two rproc_ops, given that both are coming from the
> > > > platform driver and the sync_flags will define which one to look at?
> > > >
> > > > Can't the platform driver just provide an ops table that works with the
> > > > flags it passes?
> > >
> > > That is the approach Loic took in a previous patchset [1] and that was rejected.
> > > It also led to all of the platform drivers testing rproc->flag before carrying
> > > out different actions, something you indicated could be done in the core. This
> > > patch does exactly that, i.e. moves the testing of rproc->flag to the core and
> > > calls the right function based on that.
> > >
> >
> > I think I see what you mean, as we use "start" for both syncing and
> > starting the core, a { on_init = true, after_stop = false } setup either
> > needs two tables or forces conditionals on the platform driver.
> >
> > > The end result is the same and I'm happy with one or the other, I will need to
> > > know which one.
> > >
> >
> > How about adding a new ops named "attach" to rproc_ops, which the
> > platform driver can specify if it supports attaching an already running
> > processor?
>
> Using "attach_ops" works for me. But would "autonomous_ops", to correlate with
> rproc::autonomous, add clarity? Either way works equally well for me.
>

What I meant was that we add a function "attach" to the existing
rproc_ops. In the case of OFFLINE->RUNNING we continue to call
rproc->ops->start() and DETACHED->RUNNING we call this
rproc->ops->attach().
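
To make the intent concrete, here is a minimal user-space sketch of that
dispatch. It is not kernel code: every sk_ prefixed type, function and the
demo callbacks are hypothetical stand-ins, and the real core would do far
more around these calls (resource handling, subdevices and so on).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical states for the sketch; DETACHED is the proposed new one. */
enum sk_state { SK_OFFLINE, SK_DETACHED, SK_RUNNING };

struct sk_rproc;

/* A single ops table carrying both callbacks, as suggested above. */
struct sk_rproc_ops {
	int (*start)(struct sk_rproc *rproc);	/* OFFLINE -> RUNNING */
	int (*attach)(struct sk_rproc *rproc);	/* DETACHED -> RUNNING */
};

struct sk_rproc {
	enum sk_state state;
	const struct sk_rproc_ops *ops;
};

/* Pick the callback from the current state instead of from a flag. */
int sk_rproc_bring_up(struct sk_rproc *rproc)
{
	int (*op)(struct sk_rproc *rproc);
	int ret;

	assert(rproc != NULL && rproc->ops != NULL);

	switch (rproc->state) {
	case SK_OFFLINE:
		op = rproc->ops->start;
		break;
	case SK_DETACHED:
		op = rproc->ops->attach;
		break;
	default:
		return -EINVAL;	/* already running, nothing to bring up */
	}

	if (!op)
		return -EINVAL;	/* platform does not support this path */

	ret = op(rproc);
	if (!ret)
		rproc->state = SK_RUNNING;
	return ret;
}

/* Demo platform callbacks that only count how often they were called. */
int sk_demo_start_calls;
int sk_demo_attach_calls;

static int sk_demo_start(struct sk_rproc *rproc)
{
	(void)rproc;
	sk_demo_start_calls++;
	return 0;
}

static int sk_demo_attach(struct sk_rproc *rproc)
{
	(void)rproc;
	sk_demo_attach_calls++;
	return 0;
}

const struct sk_rproc_ops sk_demo_ops = {
	.start = sk_demo_start,
	.attach = sk_demo_attach,
};
```

With this shape a platform driver that cannot attach simply leaves .attach
unset, and the core refuses the DETACHED -> RUNNING path for it.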

As I thought about this I saw that the "autonomous" part would only
apply to the scenario where the remote recovers from crashes by itself
(and we just need to be in sync with that). But I've not yet fully
thought through the NXP case of a stopped remote processor restarting by
itself.

> >
> > > The advantage with the approach I'm proposing is that everything is controlled
> > > in the core, i.e what ops is called and when to set rproc->flag based on
> > > different states the remote processor transitions through.
> > >
> >
> > I still think keeping things in the core is the right thing to do.
> >
>
> Let's continue down that path then.
>
> >
> > Please let me know what you think!
>
> From the above conversation I believe our views are pretty much aligned.
>

I share this belief and am looking forward to v4.

Regards,
Bjorn

> >
> > PS. If we agree on this the three transitions become somewhat
> > independent, so I think it makes sense to first land support for the
> > DETACHED -> RUNNING transition (and the stm32 series), then follow up
> > with RUNNING -> DETACHED and autonomous recovery separately.
>
> We can certainly proceed that way.
>
> Thanks for the time,
> Mathieu
>
> >
> > Regards,
> > Bjorn
> >
> > > Thanks,
> > > Mathieu
> > >
> > >
> > > [1]. https://patchwork.kernel.org/patch/11265869/
> > >
> > > >
> > > > Regards,
> > > > Bjorn
> > > >
> > > > > + struct rproc_sync_flags sync_flags;
> > > > > struct device dev;
> > > > > atomic_t power;
> > > > > unsigned int state;
> > > > > @@ -515,6 +538,7 @@ struct rproc {
> > > > > size_t table_sz;
> > > > > bool has_iommu;
> > > > > bool auto_boot;
> > > > > + bool sync_with_rproc;
> > > > > struct list_head dump_segments;
> > > > > int nb_vdev;
> > > > > u8 elf_class;
> > > > > --
> > > > > 2.20.1
> > > > >

2020-05-20 22:11:06

by Mathieu Poirier

Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

On Mon, May 18, 2020 at 05:55:00PM -0700, Bjorn Andersson wrote:
> On Fri 15 May 12:24 PDT 2020, Mathieu Poirier wrote:
>
> > Good day Bjorn,
> >
> > On Wed, May 13, 2020 at 06:32:24PM -0700, Bjorn Andersson wrote:
> > > On Fri 08 May 14:01 PDT 2020, Mathieu Poirier wrote:
> > >
> > > > On Tue, May 05, 2020 at 05:22:53PM -0700, Bjorn Andersson wrote:
> > > > > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
> > > > >
> > > > > > Add a new sync_ops to support use cases where the remoteproc
> > > > > > core is synchronising with the remote processor. Exactly when to use
> > > > > > the synchronisation operations is directed by the flags in structure
> > > > > > rproc_sync_flags.
> > > > > >
> > > > >
> > > > > I'm sorry, but no matter how many times I read these patches I have to
> > > > > translate "synchronising" to "remote controlled", and given the number
> > > > > of comments clarifying this makes me feel that we could perhaps come up
> > > > > with a better name?
> > > >
> > > > "remote controlled" as in "someone else is managing the remote processor" ?
> > > > It could also mean the remoteproc core is "remote controlling" the
> > > > remote processor, exactly what it currently does today...
> > > >
> > >
> > > You're right and this would certainly not help the confusion.
> > >
> > > > How about "autonomous", as in the remote processor doesn't need us to boot or
> > > > switch it off. I'm open to any other suggestions.
> > > >
> > > > >
> > > > > > Signed-off-by: Mathieu Poirier <[email protected]>
> > > > > > ---
> > > > > > include/linux/remoteproc.h | 24 ++++++++++++++++++++++++
> > > > > > 1 file changed, 24 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h
> > > > > > index ac4082f12e8b..ceb3b2bba824 100644
> > > > > > --- a/include/linux/remoteproc.h
> > > > > > +++ b/include/linux/remoteproc.h
> > > > > > @@ -353,6 +353,23 @@ enum rsc_handling_status {
> > > > > > RSC_IGNORED = 1,
> > > > > > };
> > > > > >
> > > > > > +/**
> > > > > > + * struct rproc_sync_flags - platform specific flags indicating which
> > > > > > + * rproc_ops to use at specific times during
> > > > > > + * the rproc lifecycle.
> > > > > > + * @on_init: true if synchronising with the remote processor at
> > > > > > + * initialisation time
> > > > > > + * @after_stop: true if synchronising with the remote processor after it was
> > > > > > > + * stopped from the command line
> > > > > > + * @after_crash: true if synchronising with the remote processor after
> > > > > > + * it has crashed
> > > > > > + */
> > > > > > +struct rproc_sync_flags {
> > > > > > + bool on_init;
> > > > >
> > > > > This indirectly splits the RPROC_OFFLINE state in an "offline" and
> > > > > "already-booted" state. Wouldn't it be clearer to represent this with a
> > > > > new RPROC_ALREADY_BOOTED state?
> > > > >
> > > >
> > > > I suggested that at some point in the past but it was in a different context. I
> > > > will revisit to see how doing so could apply here.
> > > >
> > >
> > > How about we introduce a new state named DETACHED and make the platform
> > > drivers specify that the remote processor is in either OFFLINE (as
> > > today) or DETACHED during initialization.
> >
> > That is certainly an idea that is growing on me. Up to now I used the on_init
> > flag to express duality in the OFFLINE state. But based on the comments that came
> > back from yourself, Arnaud and Suman it is clear that my approach is anything
> > but clear. As such I am eager to try something else.
> >
> > >
> > > Then on_init = true would be the action of going from DETACHED to
> > > RUNNING, which would involve the following actions:
> > >
> > > 1) find resource table
> > > 2) prepare device (?)
> > > 3) handle resources
> > > 4) allocate carveouts (?)
> > > 5) prepare subdevices
> > > 6) "attach"
> > > 7) start subdevices
> > >
> > > on_init = false would represent the transition from OFFLINE to RUNNING,
> > > which today involves the following actions:
> > >
> > > 1) request firmware
> > > 2) prepare device
> > > 3) parse fw
> > > 4) handle resources
> > > 5) allocate carveouts
> > > 6) load segments
> > > 7) find resource table
> > > 8) prepare subdevices
> > > 9) "boot"
> > > 10) start subdevices
> >
> > If we add a DETACHED state I don't see a scenario where we need the on_init
> > variable. When DETACHED is set by the platform we know the MCU is running and
> > it becomes a matter of when the core attaches to it, i.e. at initialisation time or
> > once the kernel has finished booting, and that is already taken care of by the
> > auto_boot variable.
> >
> > The steps you have outlined above to describe the transitions are accurate.
> >
>
> Thanks for confirming.
>
> I think it would be helpful if we had this properly documented in the
> driver, to facilitate reasoning about the various transitions. I'll try
> to write down my notes in a patch and send it out.
>
> > >
> > > > > > + bool after_stop;
> > > > >
> > > > > What does it mean when this is true? That Linux can shut the remote core
> > > > > down, but someone else will start it?
> > > >
> > > > It tells the remoteproc core how to interact with the remote processor after the
> > > > latter has been switched off.
> > >
> > > Understood.
> > >
> > > > For example, we could want to boot the remote
> > > > processor from the boot loader so that minimal functionality can be provided
> > > > while the kernel boots. Once the kernel and user space are in place, the remote
> > > > processor is explicitly stopped and booted once again, but this time with a
> > > > firmware image that offers full functionality.
> > > >
> > >
> > > This would be the { on_init = true, after_stop = false } use case, which with
> > > the new state would correspond to the journey DETACHED -> RUNNING ->
> > > OFFLINE.
> >
> > Yes
> >
> > >
> > > As such the next boot would represent above OFFLINE -> RUNNING case,
> > > which we already support today.
> >
> > Correct. This is the level of functionality sought by ST and TI. Xilinx seems to
> > have the same requirements as well.
> >
>
> Good.
>
> > >
> > > > It could also be that the remoteproc core can stop the remote processor, but the
> > > > remote processor will automatically reboot itself. In that case the remoteproc
> > > > core will simply synchronise with the remote processor, as it does when .on_init
> > > > == true.
> > > >
> > >
> > > I've not been able to come up with a reasonable use case for the {
> > > > on_init = true, after_stop = true } scenario.
> >
> > That one is a little trickier - see the next comment.
> >
> > >
> > > But Wendy previously talked about the need to "detach" Linux from a
> > > running remote processor, by somehow just letting it know that the
> > > communication is down - to allow Linux to be rebooted while the remote
> > > was running. So if we support a transition from RUNNING to DETACHED
> > > using a sequence of something like:
> > >
> > > 1) stop subdevices
> > > 2) "detach"
> > > 3) unprepare subdevices
> > > 4) release carveouts (?)
> > > 5) unprepare device (?)
> > >
> > > Then perhaps the after_stop could naturally be the transition from
> > > DETACHED to RUNNING, either with or without a reboot of the system
> > > in between?
> >
> > I see two scenarios for after_stop == true:
> >
> > 1) A "detach" scenario as you mentioned above. In this case the stop() function
> > would inform (using a mechanism that is platform specific) the MCU that the core
> > is shutting down. In this case the MCU would put itself back in "waiting mode",
> > waiting for the core to show signs of life again. On the core side this would
> > > be a DETACHED to RUNNING transition. Whether the application processor reboots
> > or not should not be relevant to the MCU.
> >
>
> Right and after reading the stm32 patches, for drivers with a way to
> "detach" the remote, i.e. put it back in DETACHED state, a
> rmmod/modprobe should conceptually fit very well.
>
> > 2) An "MCU reboot in autonomous mode" scenario. Here the stop() function would
> > > switch off the MCU. From there the MCU could automatically restart itself or
> > be restarted by some other entity. In this scenario I would expect the start()
> > function to block until the MCU is ready to proceed with the rest of the
> > remoteproc core initialisation steps.
> >
>
> Presumably though the NXP driver wouldn't have a mechanism to "start"
> the core, only to "attach" to it. And that would wait for it to be up
> and running again.
>
> > From a remoteproc core perspective, both are handled by a DETACHED -> RUNNING
> > transition. This is the functionality NXP is looking for.
> >
>
> Agreed.
>
> > >
> > > > >
> > > > > > + bool after_crash;
> > > > >
> > > > > > Similarly, what are the expected steps to be taken by the core when this
> > > > > is true? Should rproc_report_crash() simply stop/start the subdevices
> > > > > and upon one of the ops somehow tell the remote controller that it can
> > > > > proceed with the recovery?
> > > >
> > > > The exact same sequence of steps will be carried out as they are today, except
> > > > that if after_crash == true, the remoteproc core won't be switching the remote
> > > > processor on, exactly as it would do when on_init == true.
> > > >
> > >
> > > Just to make sure we're on the same page:
> > >
> > > after_crash = false is what we have today, and would mean:
> > >
> > > 1) stop subdevices
> > > 2) power off
> > > 3) unprepare subdevices
> > > 4) generate coredump
> > > 5) request firmware
> > > 6) load segments
> > > 7) find resource table
> > > 8) prepare subdevices
> > > 9) "boot"
> > > 10) start subdevices
> >
> > Exactly
> >
> > >
> > > after_crash = true would mean:
> > >
> > > 1) stop subdevices
> > > 2) "detach"
> > > 3) unprepare subdevices
> > > 4) prepare subdevices
> > > 5) "attach"
> > > 6) start subdevices
> > >
> >
> > Yes
> >
> > > State diagram wise both of these would represent the transition RUNNING
> > > -> CRASHED -> RUNNING, but somehow the platform driver needs to be able
> > > to specify which of these sequences to perform. Per your naming
> > > > suggestion above, this does sound like an "autonomous_recovery" boolean
> > > to me.
> >
> > Right, semantically "rproc->autonomous" would apply quite well.
> >
> > In function rproc_crash_handler_work(), a call to rproc_set_sync_flag() has been
> > strategically placed to set the value of rproc->autonomous based on
> > "after_crash". From there the core knows which rproc_ops to use. Here too we
> > have to rely on the rproc_ops provided by the platform to do the right thing
> > based on the scenario to enact.
> >
>
> Do you think that autonomous_recovery would be something that changes
> for a given remoteproc instance? I envisioned it as something that you
> know at registration time, but perhaps I'm missing some details here.

I don't envision any of the transition flags changing once they are set by the
platform. The same applies to the new rproc_ops: it can be set only once.
Otherwise the combination of possible scenarios becomes too hard to manage, leading
to situations where the core and MCU get out of sync and can't talk to each
other.
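
A set-once rule like this is simple to enforce in code. The sketch below shows
one possible shape in plain user-space C; the sk_ prefixed names, the
flags_set marker and the choice of -EBUSY for a second attempt are all
assumptions made for illustration, not what the patchset implements.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the three flags discussed in this thread (hypothetical copy). */
struct sk_sync_flags {
	bool on_init;
	bool after_stop;
	bool after_crash;
};

struct sk_rproc {
	struct sk_sync_flags sync_flags;
	bool flags_set;		/* becomes true after the first successful set */
};

/*
 * Record the platform's flags exactly once. A second call is rejected so
 * that the core and the remote processor cannot drift into a combination
 * of scenarios nobody planned for.
 */
int sk_set_sync_flags(struct sk_rproc *rproc, struct sk_sync_flags flags)
{
	assert(rproc != NULL);

	if (rproc->flags_set)
		return -EBUSY;

	rproc->sync_flags = flags;
	rproc->flags_set = true;
	return 0;
}
```

A platform driver would call this once while registering the remote
processor; any later attempt leaves the stored flags untouched.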

>
> > >
> > > > These flags are there to indicate how to set rproc::sync_with_rproc after
> > > > different events, that is when the remoteproc core boots, when the remoteproc
> > > > has been stopped or when it has crashed.
> > > >
> > >
> > > Right, that was clear from your patches. Sorry that my reply didn't
> > > convey the information that I had understood this.
> > >
> > > > >
> > > > > > +};
> > > > > > +
> > > > > > /**
> > > > > > * struct rproc_ops - platform-specific device handlers
> > > > > > * @start: power on the device and boot it
> > > > > > @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> > > > > > * @firmware: name of firmware file to be loaded
> > > > > > * @priv: private data which belongs to the platform-specific rproc module
> > > > > > * @ops: platform-specific start/stop rproc handlers
> > > > > > + * @sync_ops: platform-specific start/stop rproc handlers when
> > > > > > + * synchronising with a remote processor.
> > > > > > + * @sync_flags: Determine the rproc_ops to choose in specific states.
> > > > > > * @dev: virtual device for refcounting and common remoteproc behavior
> > > > > > * @power: refcount of users who need this rproc powered up
> > > > > > * @state: state of the device
> > > > > > @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> > > > > > * @table_sz: size of @cached_table
> > > > > > * @has_iommu: flag to indicate if remote processor is behind an MMU
> > > > > > * @auto_boot: flag to indicate if remote processor should be auto-started
> > > > > > + * @sync_with_rproc: true if currently synchronising with the rproc
> > > > > > * @dump_segments: list of segments in the firmware
> > > > > > * @nb_vdev: number of vdev currently handled by rproc
> > > > > > */
> > > > > > @@ -492,6 +513,8 @@ struct rproc {
> > > > > > const char *firmware;
> > > > > > void *priv;
> > > > > > struct rproc_ops *ops;
> > > > > > + struct rproc_ops *sync_ops;
> > > > >
> > > > > Do we really need two rproc_ops, given that both are coming from the
> > > > > platform driver and the sync_flags will define which one to look at?
> > > > >
> > > > > Can't the platform driver just provide an ops table that works with the
> > > > > flags it passes?
> > > >
> > > > That is the approach Loic took in a previous patchset [1] and that was rejected.
> > > > > It also led to all of the platform drivers testing rproc->flag before carrying
> > > > > out different actions, something you indicated could be done in the core. This
> > > > > patch does exactly that, i.e. moves the testing of rproc->flag to the core and
> > > > calls the right function based on that.
> > > >
> > >
> > > I think I see what you mean, as we use "start" for both syncing and
> > > starting the core, a { on_init = true, after_stop = false } setup either
> > > > needs two tables or forces conditionals on the platform driver.
> > >
> > > > The end result is the same and I'm happy with one or the other, I will need to
> > > > know which one.
> > > >
> > >
> > > How about adding a new ops named "attach" to rproc_ops, which the
> > > platform driver can specify if it supports attaching an already running
> > > processor?
> >
> > Using "attach_ops" works for me. But would "autonomous_ops", to correlate with
> > > rproc::autonomous, add clarity? Either way works equally well for me.
> >
>
> What I meant was that we add a function "attach" to the existing
> rproc_ops. In the case of OFFLINE->RUNNING we continue to call
> rproc->ops->start() and DETACHED->RUNNING we call this
> rproc->ops->attach().

If I read the above properly we'd end up with:

struct rproc_ops {
	int (*start)(struct rproc *rproc);
	int (*stop)(struct rproc *rproc);
	int (*attach)(struct rproc *rproc);
	int (*detach)(struct rproc *rproc);
	...
	...
};

But we'd have to deal with other operations that are common to both scenarios
such as parse_fw() and find_loaded_rsc_table().

So far a lot of improvements have already been suggested on this revision. I
suggest spinning off a new patchset that only handles the DETACHED->RUNNING
scenario and splits common functions such as rproc_fw_boot(). From there we can
see if other refinements (such as what you suggest above) are mandated.
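
One way to picture the shared operations is a single table whose optional
callbacks are used by both paths, with only the last step differing. What
follows is a user-space sketch under stated assumptions: the sk_ prefixed
names are hypothetical, the callback signatures are simplified to return int
(the real find_loaded_rsc_table() returns a pointer), and the step order is
reduced to the few operations named in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Bits recording which steps ran; only there so the sketch is observable. */
#define SK_STEP_PARSE_FW	(1u << 0)
#define SK_STEP_FIND_TABLE	(1u << 1)
#define SK_STEP_START		(1u << 2)
#define SK_STEP_ATTACH		(1u << 3)

struct sk_rproc;

struct sk_rproc_ops {
	int (*start)(struct sk_rproc *rproc);
	int (*attach)(struct sk_rproc *rproc);
	int (*parse_fw)(struct sk_rproc *rproc);		/* optional */
	int (*find_loaded_rsc_table)(struct sk_rproc *rproc);	/* optional */
};

struct sk_rproc {
	const struct sk_rproc_ops *ops;
	unsigned int steps_done;
};

/* A missing optional callback is treated as "nothing to do". */
static int sk_call_optional(int (*op)(struct sk_rproc *rproc),
			    struct sk_rproc *rproc)
{
	return op ? op(rproc) : 0;
}

/* OFFLINE -> RUNNING: parse the firmware, find the table, then start. */
int sk_boot(struct sk_rproc *rproc)
{
	int ret;

	assert(rproc != NULL && rproc->ops != NULL);
	if (!rproc->ops->start)
		return -EINVAL;

	ret = sk_call_optional(rproc->ops->parse_fw, rproc);
	if (ret)
		return ret;
	ret = sk_call_optional(rproc->ops->find_loaded_rsc_table, rproc);
	if (ret)
		return ret;
	return rproc->ops->start(rproc);
}

/* DETACHED -> RUNNING: no firmware to parse, but the table is still needed. */
int sk_attach(struct sk_rproc *rproc)
{
	int ret;

	assert(rproc != NULL && rproc->ops != NULL);
	if (!rproc->ops->attach)
		return -EINVAL;

	ret = sk_call_optional(rproc->ops->find_loaded_rsc_table, rproc);
	if (ret)
		return ret;
	return rproc->ops->attach(rproc);
}

/* Demo callbacks: each one just records that it ran. */
static int sk_demo_parse_fw(struct sk_rproc *r) { r->steps_done |= SK_STEP_PARSE_FW; return 0; }
static int sk_demo_find_table(struct sk_rproc *r) { r->steps_done |= SK_STEP_FIND_TABLE; return 0; }
static int sk_demo_start(struct sk_rproc *r) { r->steps_done |= SK_STEP_START; return 0; }
static int sk_demo_attach(struct sk_rproc *r) { r->steps_done |= SK_STEP_ATTACH; return 0; }

const struct sk_rproc_ops sk_demo_ops = {
	.start = sk_demo_start,
	.attach = sk_demo_attach,
	.parse_fw = sk_demo_parse_fw,
	.find_loaded_rsc_table = sk_demo_find_table,
};
```

Whether parse_fw() should also run on the attach path is exactly the kind of
question a dedicated DETACHED->RUNNING patchset would have to settle; the
sketch leaves it out only to keep the two paths visibly different.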

One last thing... Upon reflecting on all this I think using "attach" is better
than "autonomous"; the latter is heavy to drag around.

Thanks,
Mathieu

>
> As I thought about this I saw that the "autonomous" part would only
> apply to the scenario where the remote recovers from crashes by itself
> (and we just need to be in sync with that). But I've not yet fully
> thought through the NXP case of a stopped remote processor restarting by
> itself.
>
> > >
> > > > The advantage with the approach I'm proposing is that everything is controlled
> > > > in the core, i.e what ops is called and when to set rproc->flag based on
> > > > different states the remote processor transitions through.
> > > >
> > >
> > > I still think keeping things in the core is the right thing to do.
> > >
> >
> > Let's continue down that path then.
> >
> > >
> > > Please let me know what you think!
> >
> > > From the above conversation I believe our views are pretty much aligned.
> >
>
> I share this belief and am looking forward to v4.
>
> Regards,
> Bjorn
>
> > >
> > > > PS. If we agree on this the three transitions become somewhat
> > > independent, so I think it makes sense to first land support for the
> > > DETACHED -> RUNNING transition (and the stm32 series), then follow up
> > > with RUNNING -> DETACHED and autonomous recovery separately.
> >
> > We can certainly proceed that way.
> >
> > Thanks for the time,
> > Mathieu
> >
> > >
> > > Regards,
> > > Bjorn
> > >
> > > > Thanks,
> > > > Mathieu
> > > >
> > > >
> > > > [1]. https://patchwork.kernel.org/patch/11265869/
> > > >
> > > > >
> > > > > Regards,
> > > > > Bjorn
> > > > >
> > > > > > + struct rproc_sync_flags sync_flags;
> > > > > > struct device dev;
> > > > > > atomic_t power;
> > > > > > unsigned int state;
> > > > > > @@ -515,6 +538,7 @@ struct rproc {
> > > > > > size_t table_sz;
> > > > > > bool has_iommu;
> > > > > > bool auto_boot;
> > > > > > + bool sync_with_rproc;
> > > > > > struct list_head dump_segments;
> > > > > > int nb_vdev;
> > > > > > u8 elf_class;
> > > > > > --
> > > > > > 2.20.1
> > > > > >

2020-05-21 05:24:01

by Bjorn Andersson

Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

On Wed 20 May 15:06 PDT 2020, Mathieu Poirier wrote:

> On Mon, May 18, 2020 at 05:55:00PM -0700, Bjorn Andersson wrote:
> > On Fri 15 May 12:24 PDT 2020, Mathieu Poirier wrote:
> >
> > > Good day Bjorn,
> > >
> > > On Wed, May 13, 2020 at 06:32:24PM -0700, Bjorn Andersson wrote:
> > > > On Fri 08 May 14:01 PDT 2020, Mathieu Poirier wrote:
> > > >
> > > > > On Tue, May 05, 2020 at 05:22:53PM -0700, Bjorn Andersson wrote:
> > > > > > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
[..]
> > > > > > > + bool after_crash;
> > > > > >
> > > > > > > Similarly, what are the expected steps to be taken by the core when this
> > > > > > is true? Should rproc_report_crash() simply stop/start the subdevices
> > > > > > and upon one of the ops somehow tell the remote controller that it can
> > > > > > proceed with the recovery?
> > > > >
> > > > > The exact same sequence of steps will be carried out as they are today, except
> > > > > that if after_crash == true, the remoteproc core won't be switching the remote
> > > > > processor on, exactly as it would do when on_init == true.
> > > > >
> > > >
> > > > Just to make sure we're on the same page:
> > > >
> > > > after_crash = false is what we have today, and would mean:
> > > >
> > > > 1) stop subdevices
> > > > 2) power off
> > > > 3) unprepare subdevices
> > > > 4) generate coredump
> > > > 5) request firmware
> > > > 6) load segments
> > > > 7) find resource table
> > > > 8) prepare subdevices
> > > > 9) "boot"
> > > > 10) start subdevices
> > >
> > > Exactly
> > >
> > > >
> > > > after_crash = true would mean:
> > > >
> > > > 1) stop subdevices
> > > > 2) "detach"
> > > > 3) unprepare subdevices
> > > > 4) prepare subdevices
> > > > 5) "attach"
> > > > 6) start subdevices
> > > >
> > >
> > > Yes
> > >
> > > > State diagram wise both of these would represent the transition RUNNING
> > > > -> CRASHED -> RUNNING, but somehow the platform driver needs to be able
> > > > to specify which of these sequences to perform. Per your naming
> > > > > suggestion above, this does sound like an "autonomous_recovery" boolean
> > > > to me.
> > >
> > > Right, semantically "rproc->autonomous" would apply quite well.
> > >
> > > In function rproc_crash_handler_work(), a call to rproc_set_sync_flag() has been
> > > strategically placed to set the value of rproc->autonomous based on
> > > "after_crash". From there the core knows which rproc_ops to use. Here too we
> > > have to rely on the rproc_ops provided by the platform to do the right thing
> > > based on the scenario to enact.
> > >
> >
> > Do you think that autonomous_recovery would be something that changes
> > for a given remoteproc instance? I envisioned it as something that you
> > know at registration time, but perhaps I'm missing some details here.
>
> I don't envision any of the transition flags to change once they are set by the
> platform. The same applies to the new rproc_ops: it can be set only once.
> Otherwise combination of possible scenarios becomes too hard to manage, leading
> to situations where the core and MCU get out of sync and can't talk to each
> other.
>

Sounds good, I share this expectation, just wanted to check with you.

> >
> > > >
> > > > > These flags are there to indicate how to set rproc::sync_with_rproc after
> > > > > different events, that is when the remoteproc core boots, when the remoteproc
> > > > > has been stopped or when it has crashed.
> > > > >
> > > >
> > > > Right, that was clear from your patches. Sorry that my reply didn't
> > > > convey the information that I had understood this.
> > > >
> > > > > >
> > > > > > > +};
> > > > > > > +
> > > > > > > /**
> > > > > > > * struct rproc_ops - platform-specific device handlers
> > > > > > > * @start: power on the device and boot it
> > > > > > > @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> > > > > > > * @firmware: name of firmware file to be loaded
> > > > > > > * @priv: private data which belongs to the platform-specific rproc module
> > > > > > > * @ops: platform-specific start/stop rproc handlers
> > > > > > > + * @sync_ops: platform-specific start/stop rproc handlers when
> > > > > > > + * synchronising with a remote processor.
> > > > > > > + * @sync_flags: Determine the rproc_ops to choose in specific states.
> > > > > > > * @dev: virtual device for refcounting and common remoteproc behavior
> > > > > > > * @power: refcount of users who need this rproc powered up
> > > > > > > * @state: state of the device
> > > > > > > @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> > > > > > > * @table_sz: size of @cached_table
> > > > > > > * @has_iommu: flag to indicate if remote processor is behind an MMU
> > > > > > > * @auto_boot: flag to indicate if remote processor should be auto-started
> > > > > > > + * @sync_with_rproc: true if currently synchronising with the rproc
> > > > > > > * @dump_segments: list of segments in the firmware
> > > > > > > * @nb_vdev: number of vdev currently handled by rproc
> > > > > > > */
> > > > > > > @@ -492,6 +513,8 @@ struct rproc {
> > > > > > > const char *firmware;
> > > > > > > void *priv;
> > > > > > > struct rproc_ops *ops;
> > > > > > > + struct rproc_ops *sync_ops;
> > > > > >
> > > > > > Do we really need two rproc_ops, given that both are coming from the
> > > > > > platform driver and the sync_flags will define which one to look at?
> > > > > >
> > > > > > Can't the platform driver just provide an ops table that works with the
> > > > > > flags it passes?
> > > > >
> > > > > That is the approach Loic took in a previous patchset [1] and that was rejected.
> > > > > > It also led to all of the platform drivers testing rproc->flag before carrying
> > > > > > out different actions, something you indicated could be done in the core. This
> > > > > > patch does exactly that, i.e. moves the testing of rproc->flag to the core and
> > > > > > calls the right function based on that.
> > > > >
> > > >
> > > > I think I see what you mean, as we use "start" for both syncing and
> > > > starting the core, a { on_init = true, after_stop = false } setup either
> > > > > needs two tables or forces conditionals on the platform driver.
> > > >
> > > > > The end result is the same and I'm happy with one or the other, I will need to
> > > > > know which one.
> > > > >
> > > >
> > > > How about adding a new ops named "attach" to rproc_ops, which the
> > > > platform driver can specify if it supports attaching an already running
> > > > processor?
> > >
> > > Using "attach_ops" works for me. But would "autonomous_ops", to correlate with
> > > > rproc::autonomous, add clarity? Either way works equally well for me.
> > >
> >
> > What I meant was that we add a function "attach" to the existing
> > rproc_ops. In the case of OFFLINE->RUNNING we continue to call
> > rproc->ops->start() and DETACHED->RUNNING we call this
> > rproc->ops->attach().
>
> If I read the above properly we'd end up with:
>
> struct rproc_ops {
> int (*start)(struct rproc *rproc);
> int (*stop)(struct rproc *rproc);
> int (*attach)(struct rproc *rproc);
> int (*detach)(struct rproc *rproc);
> ...
> ...
> };

Yes, that's what I meant.
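
To make the dispatch concrete, here is a minimal userspace sketch of how the
core could pick between start() and attach() from the current state. The
types are simplified stand-ins rather than the real kernel definitions, and
rproc_bring_up() and the demo_* ops are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types, not the real definitions. */
enum rproc_state { RPROC_OFFLINE, RPROC_RUNNING, RPROC_DETACHED };

struct rproc;

struct rproc_ops {
	int (*start)(struct rproc *rproc);
	int (*stop)(struct rproc *rproc);
	int (*attach)(struct rproc *rproc);	/* optional */
	int (*detach)(struct rproc *rproc);	/* optional */
};

struct rproc {
	enum rproc_state state;
	const struct rproc_ops *ops;
};

/* Hypothetical core helper: choose the op from the current state. */
static int rproc_bring_up(struct rproc *rproc)
{
	int (*op)(struct rproc *rproc);
	int ret;

	assert(rproc && rproc->ops);

	switch (rproc->state) {
	case RPROC_OFFLINE:
		op = rproc->ops->start;		/* OFFLINE -> RUNNING */
		break;
	case RPROC_DETACHED:
		op = rproc->ops->attach;	/* DETACHED -> RUNNING */
		break;
	default:
		return -EBUSY;			/* already running */
	}

	if (!op)
		return -EINVAL;			/* platform lacks this op */

	ret = op(rproc);
	if (!ret)
		rproc->state = RPROC_RUNNING;
	return ret;
}

/* Demo platform ops that only count how often they are called. */
static int demo_start_calls, demo_attach_calls;

static int demo_start(struct rproc *rproc)
{
	(void)rproc;
	demo_start_calls++;
	return 0;
}

static int demo_attach(struct rproc *rproc)
{
	(void)rproc;
	demo_attach_calls++;
	return 0;
}

static const struct rproc_ops demo_ops = {
	.start = demo_start,
	.attach = demo_attach,
};
```

With a single ops table the platform driver needs no conditionals of its own:
a driver that cannot attach simply leaves .attach unset and the core rejects
the DETACHED->RUNNING transition.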

>
> But we'd have to deal with other operations that are common to both scenarios
> such as parse_fw() and find_loaded_rsc_table().
>

I would prefer that we don't call parse_fw(NULL). Perhaps we can turn that
upside down and have the platform driver just provide the information to
the core as it learns about it during probe?
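
Something along these lines is what I have in mind, as a rough userspace
sketch only. The struct is a cut-down stand-in for struct rproc, and
rproc_set_loaded_rsc_table() is a hypothetical helper name, not an existing
API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Cut-down stand-in for struct rproc, only the fields used here. */
struct rproc {
	void *table_ptr;	/* resource table currently in use */
	size_t table_sz;	/* size of that table in bytes */
};

/*
 * Hypothetical helper: the platform driver calls this from probe once it
 * has located the resource table of an already running remote processor,
 * so the core never has to call parse_fw() without a firmware image.
 */
static int rproc_set_loaded_rsc_table(struct rproc *rproc, void *table,
				      size_t size)
{
	assert(rproc);

	if (!table || !size)
		return -EINVAL;

	rproc->table_ptr = table;
	rproc->table_sz = size;
	return 0;
}
```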

> So far a lot of improvements have already been suggested on this revision. I
> suggest to spin off a new patchset that only handles the DETACHED->RUNNING
> scenario and split common functions such as rproc_fw_boot(). From there we can
> see if other refinements (such as what you suggest above) are mandated.
>

As far as I can see, if we take the approach of introducing the DETACHED
state we can add the various transitions piecemeal. So I'm definitely in
favour of starting off with DETACHED->RUNNING, then figure out
"autonomous recovery" and RUNNING->DETACHED in two subsequent series.

> One last thing... Upon reflecting on all this I think using "attach" is better
> than "autonomous", the latter is heavy to drag around.
>

For the action of going from DETACHED->RUNNING I too find "attach" to
better represent what's going on. The part where I think we need
something more is to communicate if it's Linux that's in charge or not
for taking the remote processor through RUNNING->CRASHED->RUNNING. For
that the word "autonomous" makes sense to me, but let's bring that up
again after landing this first piece(s).
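
For reference when we get back to it, the two recovery sequences listed
earlier in this mail could be selected by one flag, roughly as sketched
below. This is plain userspace C under my own assumptions: the enum, the two
tables and recovery_plan() are hypothetical names, and only the step order
comes from the lists above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum recovery_step {
	STOP_SUBDEVICES,
	POWER_OFF,
	DETACH,
	UNPREPARE_SUBDEVICES,
	GENERATE_COREDUMP,
	REQUEST_FIRMWARE,
	LOAD_SEGMENTS,
	FIND_RESOURCE_TABLE,
	PREPARE_SUBDEVICES,
	BOOT,
	ATTACH,
	START_SUBDEVICES,
};

/* Linux is in charge of recovery: what the core does today. */
static const enum recovery_step linux_managed_plan[] = {
	STOP_SUBDEVICES, POWER_OFF, UNPREPARE_SUBDEVICES, GENERATE_COREDUMP,
	REQUEST_FIRMWARE, LOAD_SEGMENTS, FIND_RESOURCE_TABLE,
	PREPARE_SUBDEVICES, BOOT, START_SUBDEVICES,
};

/* The remote side recovers on its own: Linux only detaches and re-attaches. */
static const enum recovery_step autonomous_plan[] = {
	STOP_SUBDEVICES, DETACH, UNPREPARE_SUBDEVICES,
	PREPARE_SUBDEVICES, ATTACH, START_SUBDEVICES,
};

/* Return the step list for RUNNING -> CRASHED -> RUNNING and its length. */
static const enum recovery_step *recovery_plan(bool autonomous_recovery,
					       size_t *count)
{
	assert(count);

	if (autonomous_recovery) {
		*count = sizeof(autonomous_plan) / sizeof(autonomous_plan[0]);
		return autonomous_plan;
	}

	*count = sizeof(linux_managed_plan) / sizeof(linux_managed_plan[0]);
	return linux_managed_plan;
}
```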

Regards,
Bjorn

2020-05-21 21:58:27

by Mathieu Poirier

Subject: Re: [PATCH v3 03/14] remoteproc: Add new operation and flags for synchronistation

On Wed, 20 May 2020 at 23:22, Bjorn Andersson wrote:
>
> On Wed 20 May 15:06 PDT 2020, Mathieu Poirier wrote:
>
> > On Mon, May 18, 2020 at 05:55:00PM -0700, Bjorn Andersson wrote:
> > > On Fri 15 May 12:24 PDT 2020, Mathieu Poirier wrote:
> > >
> > > > Good day Bjorn,
> > > >
> > > > On Wed, May 13, 2020 at 06:32:24PM -0700, Bjorn Andersson wrote:
> > > > > On Fri 08 May 14:01 PDT 2020, Mathieu Poirier wrote:
> > > > >
> > > > > > On Tue, May 05, 2020 at 05:22:53PM -0700, Bjorn Andersson wrote:
> > > > > > > On Fri 24 Apr 13:01 PDT 2020, Mathieu Poirier wrote:
> [..]
> > > > > > > > + bool after_crash;
> > > > > > >
> > > > > > > > Similarly, what are the expected steps to be taken by the core when this
> > > > > > > is true? Should rproc_report_crash() simply stop/start the subdevices
> > > > > > > and upon one of the ops somehow tell the remote controller that it can
> > > > > > > proceed with the recovery?
> > > > > >
> > > > > > The exact same sequence of steps will be carried out as they are today, except
> > > > > > that if after_crash == true, the remoteproc core won't be switching the remote
> > > > > > processor on, exactly as it would do when on_init == true.
> > > > > >
> > > > >
> > > > > Just to make sure we're on the same page:
> > > > >
> > > > > after_crash = false is what we have today, and would mean:
> > > > >
> > > > > 1) stop subdevices
> > > > > 2) power off
> > > > > 3) unprepare subdevices
> > > > > 4) generate coredump
> > > > > 5) request firmware
> > > > > 6) load segments
> > > > > 7) find resource table
> > > > > 8) prepare subdevices
> > > > > 9) "boot"
> > > > > 10) start subdevices
> > > >
> > > > Exactly
> > > >
> > > > >
> > > > > after_crash = true would mean:
> > > > >
> > > > > 1) stop subdevices
> > > > > 2) "detach"
> > > > > 3) unprepare subdevices
> > > > > 4) prepare subdevices
> > > > > 5) "attach"
> > > > > 6) start subdevices
> > > > >
> > > >
> > > > Yes
> > > >
> > > > > State diagram wise both of these would represent the transition RUNNING
> > > > > -> CRASHED -> RUNNING, but somehow the platform driver needs to be able
> > > > > to specify which of these sequences to perform. Per your naming
> > > > > > suggestion above, this does sound like an "autonomous_recovery" boolean
> > > > > to me.
> > > >
> > > > Right, semantically "rproc->autonomous" would apply quite well.
> > > >
> > > > In function rproc_crash_handler_work(), a call to rproc_set_sync_flag() has been
> > > > strategically placed to set the value of rproc->autonomous based on
> > > > "after_crash". From there the core knows which rproc_ops to use. Here too we
> > > > have to rely on the rproc_ops provided by the platform to do the right thing
> > > > based on the scenario to enact.
> > > >
> > >
> > > Do you think that autonomous_recovery would be something that changes
> > > for a given remoteproc instance? I envisioned it as something that you
> > > know at registration time, but perhaps I'm missing some details here.
> >
> > I don't envision any of the transition flags to change once they are set by the
> > platform. The same applies to the new rproc_ops: it can be set only once.
> > Otherwise combination of possible scenarios becomes too hard to manage, leading
> > to situations where the core and MCU get out of sync and can't talk to each
> > other.
> >
>
> Sounds good, I share this expectation, just wanted to check with you.
>
> > >
> > > > >
> > > > > > These flags are there to indicate how to set rproc::sync_with_rproc after
> > > > > > different events, that is when the remoteproc core boots, when the remoteproc
> > > > > > has been stopped or when it has crashed.
> > > > > >
> > > > >
> > > > > Right, that was clear from your patches. Sorry that my reply didn't
> > > > > convey the information that I had understood this.
> > > > >
> > > > > > >
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > /**
> > > > > > > > * struct rproc_ops - platform-specific device handlers
> > > > > > > > * @start: power on the device and boot it
> > > > > > > > @@ -459,6 +476,9 @@ struct rproc_dump_segment {
> > > > > > > > * @firmware: name of firmware file to be loaded
> > > > > > > > * @priv: private data which belongs to the platform-specific rproc module
> > > > > > > > * @ops: platform-specific start/stop rproc handlers
> > > > > > > > + * @sync_ops: platform-specific start/stop rproc handlers when
> > > > > > > > + * synchronising with a remote processor.
> > > > > > > > + * @sync_flags: Determine the rproc_ops to choose in specific states.
> > > > > > > > * @dev: virtual device for refcounting and common remoteproc behavior
> > > > > > > > * @power: refcount of users who need this rproc powered up
> > > > > > > > * @state: state of the device
> > > > > > > > @@ -482,6 +502,7 @@ struct rproc_dump_segment {
> > > > > > > > * @table_sz: size of @cached_table
> > > > > > > > * @has_iommu: flag to indicate if remote processor is behind an MMU
> > > > > > > > * @auto_boot: flag to indicate if remote processor should be auto-started
> > > > > > > > + * @sync_with_rproc: true if currently synchronising with the rproc
> > > > > > > > * @dump_segments: list of segments in the firmware
> > > > > > > > * @nb_vdev: number of vdev currently handled by rproc
> > > > > > > > */
> > > > > > > > @@ -492,6 +513,8 @@ struct rproc {
> > > > > > > > const char *firmware;
> > > > > > > > void *priv;
> > > > > > > > struct rproc_ops *ops;
> > > > > > > > + struct rproc_ops *sync_ops;
> > > > > > >
> > > > > > > Do we really need two rproc_ops, given that both are coming from the
> > > > > > > platform driver and the sync_flags will define which one to look at?
> > > > > > >
> > > > > > > Can't the platform driver just provide an ops table that works with the
> > > > > > > flags it passes?
> > > > > >
> > > > > > That is the approach Loic took in a previous patchset [1] and that was rejected.
> > > > > > > It also led to all of the platform drivers testing rproc->flag before carrying
> > > > > > > out different actions, something you indicated could be done in the core. This
> > > > > > > patch does exactly that, i.e. moves the testing of rproc->flag to the core and
> > > > > > > calls the right function based on that.
> > > > > >
> > > > >
> > > > > I think I see what you mean, as we use "start" for both syncing and
> > > > > starting the core, a { on_init = true, after_stop = false } setup either
> > > > > > needs two tables or forces conditionals on the platform driver.
> > > > >
> > > > > > The end result is the same and I'm happy with one or the other, I will need to
> > > > > > know which one.
> > > > > >
> > > > >
> > > > > How about adding a new ops named "attach" to rproc_ops, which the
> > > > > platform driver can specify if it supports attaching an already running
> > > > > processor?
> > > >
> > > > Using "attach_ops" works for me. But would "autonomous_ops", to correlate with
> > > > > rproc::autonomous, add clarity? Either way works equally well for me.
> > > >
> > >
> > > What I meant was that we add a function "attach" to the existing
> > > rproc_ops. In the case of OFFLINE->RUNNING we continue to call
> > > rproc->ops->start() and DETACHED->RUNNING we call this
> > > rproc->ops->attach().
> >
> > If I read the above properly we'd end up with:
> >
> > struct rproc_ops {
> > int (*start)(struct rproc *rproc);
> > int (*stop)(struct rproc *rproc);
> > int (*attach)(struct rproc *rproc);
> > int (*detach)(struct rproc *rproc);
> > ...
> > ...
> > };
>
> Yes, that's what I meant.
>
> >
> > But we'd have to deal with other operations that are common to both scenarios
> > such as parse_fw() and find_loaded_rsc_table().
> >
>
> I would prefer that we don't call parse_fw(NULL). Perhaps we can turn that
> upside down and have the platform driver just provide the information to
> the core as it learns about it during probe?

I need to think about that... We may have to discuss this again in a
not so distant future.

>
> > So far a lot of improvements have already been suggested on this revision. I
> > suggest to spin off a new patchset that only handles the DETACHED->RUNNING
> > scenario and split common functions such as rproc_fw_boot(). From there we can
> > see if other refinements (such as what you suggest above) are mandated.
> >
>
> As far as I can see, if we take the approach of introducing the DETACHED
> state we can add the various transitions piecemeal. So I'm definitely in
> favour of starting off with DETACHED->RUNNING, then figure out
> "autonomous recovery" and RUNNING->DETACHED in two subsequent series.
>
> > One last thing... Upon reflecting on all this I think using "attach" is better
> > than "autonomous", the latter is heavy to drag around.
> >
>
> For the action of going from DETACHED->RUNNING I too find "attach" to
> better represent what's going on. The part where I think we need
> something more is to communicate if it's Linux that's in charge or not
> for taking the remote processor through RUNNING->CRASHED->RUNNING. For
> that the word "autonomous" makes sense to me, but let's bring that up
> again after landing this first piece(s).

I agree.

>
> Regards,
> Bjorn