2021-07-21 21:03:53

by Mike Tipton

Subject: [PATCH v2 0/4] interconnect: Fix sync-state issues

These patches fix a couple of sync-state bugs that cause the initial BW
floors either to be ignored entirely or never to be removed after
sync-state is called.

v2:
- Move pre_aggregate call to outside the aggregate if statement

Mike Tipton (4):
interconnect: Zero initial BW after sync-state
interconnect: Always call pre_aggregate before aggregate
interconnect: qcom: icc-rpmh: Ensure floor BW is enforced for all nodes
interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate

drivers/interconnect/core.c | 7 +++++++
drivers/interconnect/qcom/icc-rpmh.c | 20 ++++++++++----------
2 files changed, 17 insertions(+), 10 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


2021-07-21 21:03:53

by Mike Tipton

Subject: [PATCH v2 3/4] interconnect: qcom: icc-rpmh: Ensure floor BW is enforced for all nodes

We currently only enforce BW floors for a subset of nodes in a path.
All BCMs that need updating are queued in the pre_aggregate/aggregate
phase. The first set() commits all queued BCMs and subsequent set()
calls short-circuit without committing anything. Since the floor BW
isn't set in sum_avg/max_peak until set(), some BCMs are committed
before their associated nodes reflect the floor.

Set the floor as each node is being aggregated. This ensures that
all relevant floors are set before the BCMs are committed.

Fixes: 266cd33b5913 ("interconnect: qcom: Ensure that the floor bandwidth value is enforced")
Signed-off-by: Mike Tipton <[email protected]>
---
drivers/interconnect/qcom/icc-rpmh.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/interconnect/qcom/icc-rpmh.c b/drivers/interconnect/qcom/icc-rpmh.c
index bf01d09dba6c..f118f57eae37 100644
--- a/drivers/interconnect/qcom/icc-rpmh.c
+++ b/drivers/interconnect/qcom/icc-rpmh.c
@@ -57,6 +57,11 @@ int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
 			qn->sum_avg[i] += avg_bw;
 			qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
 		}
+
+		if (node->init_avg || node->init_peak) {
+			qn->sum_avg[i] = max_t(u64, qn->sum_avg[i], node->init_avg);
+			qn->max_peak[i] = max_t(u64, qn->max_peak[i], node->init_peak);
+		}
 	}

 	*agg_avg += avg_bw;
@@ -90,11 +95,6 @@ int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
 	qp = to_qcom_provider(node->provider);
 	qn = node->data;

-	qn->sum_avg[QCOM_ICC_BUCKET_AMC] = max_t(u64, qn->sum_avg[QCOM_ICC_BUCKET_AMC],
-						 node->avg_bw);
-	qn->max_peak[QCOM_ICC_BUCKET_AMC] = max_t(u64, qn->max_peak[QCOM_ICC_BUCKET_AMC],
-						  node->peak_bw);
-
 	qcom_icc_bcm_voter_commit(qp->voter);

 	return 0;
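
The effect of applying the floor inside aggregate() can be illustrated
with a small userspace model. This is only a sketch, not the driver code:
struct toy_node, NUM_BUCKETS and the toy_* helpers are hypothetical
stand-ins for the kernel structures and callbacks touched by the patch
above.

```c
#include <stdint.h>

#define NUM_BUCKETS 3 /* hypothetical bucket count for this sketch */

/* Toy stand-in for the per-node state; not the kernel's struct layout. */
struct toy_node {
	uint64_t sum_avg[NUM_BUCKETS];
	uint64_t max_peak[NUM_BUCKETS];
	uint32_t init_avg;  /* floor; zero once sync-state has run */
	uint32_t init_peak;
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* Reset the accumulators before a new aggregation pass. */
static void toy_pre_aggregate(struct toy_node *n)
{
	for (int i = 0; i < NUM_BUCKETS; i++) {
		n->sum_avg[i] = 0;
		n->max_peak[i] = 0;
	}
}

/* Fold one request into the buckets selected by tag, then apply the floor. */
static void toy_aggregate(struct toy_node *n, uint32_t tag,
			  uint32_t avg_bw, uint32_t peak_bw)
{
	for (int i = 0; i < NUM_BUCKETS; i++) {
		if (tag & (1u << i)) {
			n->sum_avg[i] += avg_bw;
			n->max_peak[i] = max_u64(n->max_peak[i], peak_bw);
		}

		/* The floor applies to every bucket while it is nonzero. */
		if (n->init_avg || n->init_peak) {
			n->sum_avg[i] = max_u64(n->sum_avg[i], n->init_avg);
			n->max_peak[i] = max_u64(n->max_peak[i], n->init_peak);
		}
	}
}
```

In this model, a node with a nonzero floor reports at least that floor in
every bucket as soon as toy_aggregate() returns, so whichever set() call
commits first already sees it. Once init_avg and init_peak are zero, only
the tagged buckets carry the request.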

2021-07-21 21:03:53

by Mike Tipton

Subject: [PATCH v2 1/4] interconnect: Zero initial BW after sync-state

The initial BW values may be used by providers to enforce floors. Zero
these values after sync-state so that providers know when to stop
enforcing them.

Fixes: b1d681d8d324 ("interconnect: Add sync state support")
Signed-off-by: Mike Tipton <[email protected]>
---
drivers/interconnect/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 8a1e70e00876..945121e18b5c 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -1106,6 +1106,8 @@ void icc_sync_state(struct device *dev)
 		dev_dbg(p->dev, "interconnect provider is in synced state\n");
 		list_for_each_entry(n, &p->nodes, node_list) {
 			if (n->init_avg || n->init_peak) {
+				n->init_avg = 0;
+				n->init_peak = 0;
 				aggregate_requests(n);
 				p->set(n, n);
 			}
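
Why the zeroing matters can be shown with a toy model in which the
provider's set() enforces the floor for as long as the init values are
nonzero. This is a sketch under that assumption; struct toy_node, its
req_* and hw_* fields and the toy_* functions are hypothetical and do not
exist in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_node {
	uint32_t init_avg;  /* initial BW floor */
	uint32_t init_peak;
	uint32_t req_avg;   /* aggregated client requests (hypothetical field) */
	uint32_t req_peak;
	uint32_t hw_avg;    /* value last "programmed" into hardware */
	uint32_t hw_peak;
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Provider-side set(): enforce the floor while init values are nonzero. */
static void toy_set(struct toy_node *n)
{
	n->hw_avg = max_u32(n->req_avg, n->init_avg);
	n->hw_peak = max_u32(n->req_peak, n->init_peak);
}

/* Mirrors the patched loop: clear the floor first, then re-apply. */
static void toy_sync_state(struct toy_node *nodes, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		struct toy_node *n = &nodes[i];

		if (n->init_avg || n->init_peak) {
			n->init_avg = 0;
			n->init_peak = 0;
			toy_set(n);
		}
	}
}
```

If toy_sync_state() called toy_set() without clearing init_avg and
init_peak first, the provider would have no way to tell that the floor is
no longer wanted and would keep programming it.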

2021-07-21 21:03:53

by Mike Tipton

Subject: [PATCH v2 2/4] interconnect: Always call pre_aggregate before aggregate

The pre_aggregate callback isn't called in all cases before calling
aggregate. Add the missing calls so providers can rely on consistent
framework behavior.

Fixes: d3703b3e255f ("interconnect: Aggregate before setting initial bandwidth")
Signed-off-by: Mike Tipton <[email protected]>
---
drivers/interconnect/core.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 945121e18b5c..1b2c564eaa99 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -973,9 +973,14 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
 	}
 	node->avg_bw = node->init_avg;
 	node->peak_bw = node->init_peak;
+
+	if (provider->pre_aggregate)
+		provider->pre_aggregate(node);
+
 	if (provider->aggregate)
 		provider->aggregate(node, 0, node->init_avg, node->init_peak,
 				    &node->avg_bw, &node->peak_bw);
+
 	provider->set(node, node);
 	node->avg_bw = 0;
 	node->peak_bw = 0;
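
The call ordering that this patch guarantees can be sketched with a toy
provider whose callbacks simply record the order in which they run. The
names below (struct toy_provider, struct toy_node, toy_node_add and the
other toy_* helpers) are made up for illustration and only mirror the
shape of the hunk above.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_node;

/* Hypothetical provider ops; only set() is mandatory in this model. */
struct toy_provider {
	void (*pre_aggregate)(struct toy_node *n);
	void (*aggregate)(struct toy_node *n, uint32_t avg, uint32_t peak);
	void (*set)(struct toy_node *n);
};

struct toy_node {
	struct toy_provider *provider;
	uint32_t init_avg, init_peak;
	char calls[4];  /* records callback order: 'p', 'a', 's' */
	size_t ncalls;
};

static void toy_record(struct toy_node *n, char c)
{
	if (n->ncalls < sizeof(n->calls) - 1)
		n->calls[n->ncalls++] = c;
}

static void toy_pre_aggregate(struct toy_node *n)
{
	toy_record(n, 'p');
}

static void toy_aggregate(struct toy_node *n, uint32_t avg, uint32_t peak)
{
	(void)avg;
	(void)peak;
	toy_record(n, 'a');
}

static void toy_set(struct toy_node *n)
{
	toy_record(n, 's');
}

/* Optional callbacks are skipped, but the relative order never changes. */
static void toy_node_add(struct toy_node *n, struct toy_provider *p)
{
	n->provider = p;

	if (p->pre_aggregate)
		p->pre_aggregate(n);

	if (p->aggregate)
		p->aggregate(n, n->init_avg, n->init_peak);

	p->set(n);
}
```

A provider that implements all three callbacks sees them in the order
pre_aggregate, aggregate, set, so state reset in pre_aggregate is never
stale when aggregate runs.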

2021-07-21 21:06:02

by Mike Tipton

Subject: [PATCH v2 4/4] interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate

We're only adding BCMs to the commit list in aggregate(), but there are
cases where pre_aggregate() is called without subsequently calling
aggregate(). In particular, this happens in icc_sync_state() when a node
with initial BW has zero requests. Since BCMs aren't added to the commit
list in these cases, we don't actually send the zero BW request to HW, so
the resources remain on unnecessarily.

Add BCMs to the commit list in pre_aggregate() instead, which is always
called even when there are no requests.

Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support")
Signed-off-by: Mike Tipton <[email protected]>
---
drivers/interconnect/qcom/icc-rpmh.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/interconnect/qcom/icc-rpmh.c b/drivers/interconnect/qcom/icc-rpmh.c
index f118f57eae37..b26fda0588e0 100644
--- a/drivers/interconnect/qcom/icc-rpmh.c
+++ b/drivers/interconnect/qcom/icc-rpmh.c
@@ -20,13 +20,18 @@ void qcom_icc_pre_aggregate(struct icc_node *node)
 {
 	size_t i;
 	struct qcom_icc_node *qn;
+	struct qcom_icc_provider *qp;

 	qn = node->data;
+	qp = to_qcom_provider(node->provider);

 	for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
 		qn->sum_avg[i] = 0;
 		qn->max_peak[i] = 0;
 	}
+
+	for (i = 0; i < qn->num_bcms; i++)
+		qcom_icc_bcm_voter_add(qp->voter, qn->bcms[i]);
 }
 EXPORT_SYMBOL_GPL(qcom_icc_pre_aggregate);

@@ -44,10 +49,8 @@ int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
 {
 	size_t i;
 	struct qcom_icc_node *qn;
-	struct qcom_icc_provider *qp;

 	qn = node->data;
-	qp = to_qcom_provider(node->provider);

 	if (!tag)
 		tag = QCOM_ICC_TAG_ALWAYS;
@@ -67,9 +70,6 @@ int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
 	*agg_avg += avg_bw;
 	*agg_peak = max_t(u32, *agg_peak, peak_bw);

-	for (i = 0; i < qn->num_bcms; i++)
-		qcom_icc_bcm_voter_add(qp->voter, qn->bcms[i]);
-
 	return 0;
 }
 EXPORT_SYMBOL_GPL(qcom_icc_aggregate);
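
The commit-list behaviour described in this patch can be modelled in a few
lines of userspace C. This is a rough sketch, not the voter
implementation: struct toy_bcm, struct toy_voter, TOY_MAX_BCMS and the
toy_* functions are hypothetical, and the real voter tracks considerably
more state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BCMS 4

struct toy_bcm {
	uint64_t vote;     /* aggregated vote waiting to be sent */
	uint64_t hw_vote;  /* last value "sent to hardware" */
	bool queued;       /* already on the commit list? */
};

struct toy_voter {
	struct toy_bcm *commit_list[TOY_MAX_BCMS];
	size_t count;
};

/* Queue a BCM once; repeated adds are ignored. */
static void toy_voter_add(struct toy_voter *v, struct toy_bcm *b)
{
	if (b->queued || v->count == TOY_MAX_BCMS)
		return;

	b->queued = true;
	v->commit_list[v->count++] = b;
}

/* Patched behaviour: reset the vote and queue the BCM up front. */
static void toy_pre_aggregate(struct toy_voter *v, struct toy_bcm *b)
{
	b->vote = 0;
	toy_voter_add(v, b);
}

/* Only accumulates; with zero requests it is never called. */
static void toy_aggregate(struct toy_bcm *b, uint64_t bw)
{
	b->vote += bw;
}

/* Send every queued vote, then empty the list. */
static void toy_commit(struct toy_voter *v)
{
	for (size_t i = 0; i < v->count; i++) {
		v->commit_list[i]->hw_vote = v->commit_list[i]->vote;
		v->commit_list[i]->queued = false;
	}
	v->count = 0;
}
```

With the add in toy_pre_aggregate(), a BCM that receives no requests is
still on the list when toy_commit() runs, so its zero vote reaches the
"hardware". If the add lived in toy_aggregate() instead, that BCM would
keep whatever hw_vote it had before.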

2021-08-10 23:33:11

by Stephen Boyd

Subject: Re: [PATCH v2 4/4] interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate

Quoting Mike Tipton (2021-07-21 10:54:32)
> We're only adding BCMs to the commit list in aggregate(), but there are
> cases where pre_aggregate() is called without subsequently calling
> aggregate(). In particular, in icc_sync_state() when a node with initial
> BW has zero requests. Since BCMs aren't added to the commit list in
> these cases, we don't actually send the zero BW request to HW. So the
> resources remain on unnecessarily.
>
> Add BCMs to the commit list in pre_aggregate() instead, which is always
> called even when there are no requests.
>
> Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support")
> Signed-off-by: Mike Tipton <[email protected]>
> ---

This patch breaks reboot for me on sc7180 Lazor.

[ 107.136454] kvm: exiting hardware virtualization
[ 107.163741] platform video-firmware.0: Removing from iommu group 13
[ 107.193412] SError Interrupt on CPU1, code 0xbe000011 -- SError
[ 107.193428] CPU: 1 PID: 4289 Comm: reboot Not tainted 5.14.0-rc1+ #12
[ 107.193432] Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
[ 107.193436] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO BTYPE=--)
[ 107.193440] pc : el1_interrupt+0x20/0x60
[ 107.193443] lr : el1h_64_irq_handler+0x18/0x24
[ 107.193445] sp : ffffffc014093a10
[ 107.193448] x29: ffffffc014093a10 x28: ffffff8088295ec0 x27: 0000000000000000
[ 107.193465] x26: ffffff8080ed4c18 x25: ffffffd0beece000 x24: ffffffd0bef45000
[ 107.193476] x23: 0000000060400009 x22: ffffffd0be0bc1a0 x21: ffffffc014093b90
[ 107.193487] x20: ffffffd0bdc100f8 x19: ffffffc014093a40 x18: 000000000007d829
[ 107.193497] x17: ffffffd067412b54 x16: ffffffd0be0bc164 x15: ffffffd067413d0c
[ 107.193507] x14: ffffffd0bdd24fa4 x13: ffffffd0bdc26180 x12: ffffffd0bdc26260
[ 107.193517] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 107.193528] x8 : 00000000000000c0 x7 : bbbbbbbbbbbbbbbb x6 : ffffffd0bde488dc
[ 107.193539] x5 : 0000000000200017 x4 : ffffff809b5c4b40 x3 : 0000000000200018
[ 107.193549] x2 : ffffff8088295ec0 x1 : ffffffd0bdc100f8 x0 : ffffffc014093a40
[ 107.193561] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 107.193564] CPU: 1 PID: 4289 Comm: reboot Not tainted 5.14.0-rc1+ #12
[ 107.193567] Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
[ 107.193570] Call trace:
[ 107.193573] dump_backtrace+0x0/0x1c8
[ 107.193577] show_stack+0x24/0x30
[ 107.193579] dump_stack_lvl+0x64/0x7c
[ 107.193582] dump_stack+0x18/0x38
[ 107.193584] panic+0x158/0x39c
[ 107.193586] nmi_panic+0x88/0xa0
[ 107.193589] arm64_serror_panic+0x80/0x8c
[ 107.193593] do_serror+0x0/0x80
[ 107.193595] do_serror+0x58/0x80
[ 107.193597] el1h_64_error_handler+0x30/0x48
[ 107.193601] el1h_64_error+0x78/0x7c
[ 107.193603] el1_interrupt+0x20/0x60
[ 107.193606] el1h_64_irq_handler+0x18/0x24
[ 107.193609] el1h_64_irq+0x78/0x7c
[ 107.193612] refcount_dec_and_mutex_lock+0x3c/0xb4
[ 107.193616] ipa_clock_put+0x34/0x74 [ipa]
[ 107.193619] ipa_deconfig+0x64/0x74 [ipa]
[ 107.193622] ipa_remove+0xbc/0x110 [ipa]
[ 107.193625] ipa_shutdown+0x24/0x50 [ipa]
[ 107.193628] platform_shutdown+0x30/0x3c
[ 107.193631] device_shutdown+0x150/0x208
[ 107.193633] kernel_restart_prepare+0x44/0x50
[ 107.193637] kernel_restart+0x24/0x70
[ 107.193640] __arm64_sys_reboot+0x188/0x230
[ 107.193643] invoke_syscall+0x4c/0x120
[ 107.193646] el0_svc_common+0x84/0xe0
[ 107.193648] do_el0_svc_compat+0x2c/0x38
[ 107.193651] el0_svc_compat+0x20/0x30
[ 107.193654] el0t_32_sync_handler+0xc0/0xf0
[ 107.193657] el0t_32_sync+0x19c/0x1a0

Presumably some sort of interconnect is getting turned off earlier than
before?

> drivers/interconnect/qcom/icc-rpmh.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/interconnect/qcom/icc-rpmh.c b/drivers/interconnect/qcom/icc-rpmh.c
> index f118f57eae37..b26fda0588e0 100644
> --- a/drivers/interconnect/qcom/icc-rpmh.c
> +++ b/drivers/interconnect/qcom/icc-rpmh.c
> @@ -20,13 +20,18 @@ void qcom_icc_pre_aggregate(struct icc_node *node)
> {
> size_t i;
> struct qcom_icc_node *qn;
> + struct qcom_icc_provider *qp;
>
> qn = node->data;
> + qp = to_qcom_provider(node->provider);
>
> for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
> qn->sum_avg[i] = 0;
> qn->max_peak[i] = 0;
> }
> +
> + for (i = 0; i < qn->num_bcms; i++)
> + qcom_icc_bcm_voter_add(qp->voter, qn->bcms[i]);
> }
> EXPORT_SYMBOL_GPL(qcom_icc_pre_aggregate);
>
> @@ -44,10 +49,8 @@ int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
> {
> size_t i;
> struct qcom_icc_node *qn;
> - struct qcom_icc_provider *qp;
>
> qn = node->data;
> - qp = to_qcom_provider(node->provider);
>
> if (!tag)
> tag = QCOM_ICC_TAG_ALWAYS;
> @@ -67,9 +70,6 @@ int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
> *agg_avg += avg_bw;
> *agg_peak = max_t(u32, *agg_peak, peak_bw);
>
> - for (i = 0; i < qn->num_bcms; i++)
> - qcom_icc_bcm_voter_add(qp->voter, qn->bcms[i]);
> -
> return 0;
> }
> EXPORT_SYMBOL_GPL(qcom_icc_aggregate);

2021-08-11 00:21:18

by Bjorn Andersson

Subject: Re: [PATCH v2 4/4] interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate

On Tue 10 Aug 18:31 CDT 2021, Stephen Boyd wrote:

> Quoting Mike Tipton (2021-07-21 10:54:32)
> > We're only adding BCMs to the commit list in aggregate(), but there are
> > cases where pre_aggregate() is called without subsequently calling
> > aggregate(). In particular, in icc_sync_state() when a node with initial
> > BW has zero requests. Since BCMs aren't added to the commit list in
> > these cases, we don't actually send the zero BW request to HW. So the
> > resources remain on unnecessarily.
> >
> > Add BCMs to the commit list in pre_aggregate() instead, which is always
> > called even when there are no requests.
> >
> > Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support")
> > Signed-off-by: Mike Tipton <[email protected]>
> > ---
>
> This patch breaks reboot for me on sc7180 Lazor
>

FWIW, it prevents at least SM8150 from booting (need to check my other
boards as well), because it's no longer okay to have the interconnect
providers defined without having all client paths specified.

Regards,
Bjorn

2021-08-11 04:25:33

by Stephen Boyd

Subject: Re: [PATCH v2 4/4] interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate

Quoting Bjorn Andersson (2021-08-10 17:18:02)
> On Tue 10 Aug 18:31 CDT 2021, Stephen Boyd wrote:
>
> > Quoting Mike Tipton (2021-07-21 10:54:32)
> > > We're only adding BCMs to the commit list in aggregate(), but there are
> > > cases where pre_aggregate() is called without subsequently calling
> > > aggregate(). In particular, in icc_sync_state() when a node with initial
> > > BW has zero requests. Since BCMs aren't added to the commit list in
> > > these cases, we don't actually send the zero BW request to HW. So the
> > > resources remain on unnecessarily.
> > >
> > > Add BCMs to the commit list in pre_aggregate() instead, which is always
> > > called even when there are no requests.
> > >
> > > Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support")
> > > Signed-off-by: Mike Tipton <[email protected]>
> > > ---
> >
> > This patch breaks reboot for me on sc7180 Lazor
> >
>
> FWIW, it prevents at least SM8150 from booting (need to check my other
> boards as well), because it's no longer okay to have the interconnect
> providers defined without having all client paths specified.

So maybe the best course of action is to revert this patch from Linus'
tree? It's not as big a deal as "can't boot", but it certainly makes
reboot annoying on sc7180.

2021-08-11 16:02:55

by Alex Elder

Subject: Re: [PATCH v2 4/4] interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate

On 8/10/21 6:31 PM, Stephen Boyd wrote:
> Quoting Mike Tipton (2021-07-21 10:54:32)
>> We're only adding BCMs to the commit list in aggregate(), but there are
>> cases where pre_aggregate() is called without subsequently calling
>> aggregate(). In particular, in icc_sync_state() when a node with initial
>> BW has zero requests. Since BCMs aren't added to the commit list in
>> these cases, we don't actually send the zero BW request to HW. So the
>> resources remain on unnecessarily.
>>
>> Add BCMs to the commit list in pre_aggregate() instead, which is always
>> called even when there are no requests.
>>
>> Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support")
>> Signed-off-by: Mike Tipton <[email protected]>
>> ---
>
> This patch breaks reboot for me on sc7180 Lazor

If I am using the interface improperly, or doing something else wrong
in the IPA driver, please let me know. I actually plan to switch
to using the bulk interfaces soon (FYI).

Thanks.

-Alex

> [ 107.136454] kvm: exiting hardware virtualization
> [ 107.163741] platform video-firmware.0: Removing from iommu group 13
> [ 107.193412] SError Interrupt on CPU1, code 0xbe000011 -- SError
> [ 107.193428] CPU: 1 PID: 4289 Comm: reboot Not tainted 5.14.0-rc1+ #12
> [ 107.193432] Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
> [ 107.193436] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO BTYPE=--)
> [ 107.193440] pc : el1_interrupt+0x20/0x60
> [ 107.193443] lr : el1h_64_irq_handler+0x18/0x24
> [ 107.193445] sp : ffffffc014093a10
> [ 107.193448] x29: ffffffc014093a10 x28: ffffff8088295ec0 x27: 0000000000000000
> [ 107.193465] x26: ffffff8080ed4c18 x25: ffffffd0beece000 x24: ffffffd0bef45000
> [ 107.193476] x23: 0000000060400009 x22: ffffffd0be0bc1a0 x21: ffffffc014093b90
> [ 107.193487] x20: ffffffd0bdc100f8 x19: ffffffc014093a40 x18: 000000000007d829
> [ 107.193497] x17: ffffffd067412b54 x16: ffffffd0be0bc164 x15: ffffffd067413d0c
> [ 107.193507] x14: ffffffd0bdd24fa4 x13: ffffffd0bdc26180 x12: ffffffd0bdc26260
> [ 107.193517] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
> [ 107.193528] x8 : 00000000000000c0 x7 : bbbbbbbbbbbbbbbb x6 : ffffffd0bde488dc
> [ 107.193539] x5 : 0000000000200017 x4 : ffffff809b5c4b40 x3 : 0000000000200018
> [ 107.193549] x2 : ffffff8088295ec0 x1 : ffffffd0bdc100f8 x0 : ffffffc014093a40
> [ 107.193561] Kernel panic - not syncing: Asynchronous SError Interrupt
> [ 107.193564] CPU: 1 PID: 4289 Comm: reboot Not tainted 5.14.0-rc1+ #12
> [ 107.193567] Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
> [ 107.193570] Call trace:
> [ 107.193573] dump_backtrace+0x0/0x1c8
> [ 107.193577] show_stack+0x24/0x30
> [ 107.193579] dump_stack_lvl+0x64/0x7c
> [ 107.193582] dump_stack+0x18/0x38
> [ 107.193584] panic+0x158/0x39c
> [ 107.193586] nmi_panic+0x88/0xa0
> [ 107.193589] arm64_serror_panic+0x80/0x8c
> [ 107.193593] do_serror+0x0/0x80
> [ 107.193595] do_serror+0x58/0x80
> [ 107.193597] el1h_64_error_handler+0x30/0x48
> [ 107.193601] el1h_64_error+0x78/0x7c
> [ 107.193603] el1_interrupt+0x20/0x60
> [ 107.193606] el1h_64_irq_handler+0x18/0x24
> [ 107.193609] el1h_64_irq+0x78/0x7c
> [ 107.193612] refcount_dec_and_mutex_lock+0x3c/0xb4
> [ 107.193616] ipa_clock_put+0x34/0x74 [ipa]
> [ 107.193619] ipa_deconfig+0x64/0x74 [ipa]
> [ 107.193622] ipa_remove+0xbc/0x110 [ipa]
> [ 107.193625] ipa_shutdown+0x24/0x50 [ipa]
> [ 107.193628] platform_shutdown+0x30/0x3c
> [ 107.193631] device_shutdown+0x150/0x208
> [ 107.193633] kernel_restart_prepare+0x44/0x50
> [ 107.193637] kernel_restart+0x24/0x70
> [ 107.193640] __arm64_sys_reboot+0x188/0x230
> [ 107.193643] invoke_syscall+0x4c/0x120
> [ 107.193646] el0_svc_common+0x84/0xe0
> [ 107.193648] do_el0_svc_compat+0x2c/0x38
> [ 107.193651] el0_svc_compat+0x20/0x30
> [ 107.193654] el0t_32_sync_handler+0xc0/0xf0
> [ 107.193657] el0t_32_sync+0x19c/0x1a0
>
> Presumably some sort of interconnect is getting turned off earlier than
> before?
>
>> drivers/interconnect/qcom/icc-rpmh.c | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/interconnect/qcom/icc-rpmh.c b/drivers/interconnect/qcom/icc-rpmh.c
>> index f118f57eae37..b26fda0588e0 100644
>> --- a/drivers/interconnect/qcom/icc-rpmh.c
>> +++ b/drivers/interconnect/qcom/icc-rpmh.c
>> @@ -20,13 +20,18 @@ void qcom_icc_pre_aggregate(struct icc_node *node)
>> {
>> size_t i;
>> struct qcom_icc_node *qn;
>> + struct qcom_icc_provider *qp;
>>
>> qn = node->data;
>> + qp = to_qcom_provider(node->provider);
>>
>> for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
>> qn->sum_avg[i] = 0;
>> qn->max_peak[i] = 0;
>> }
>> +
>> + for (i = 0; i < qn->num_bcms; i++)
>> + qcom_icc_bcm_voter_add(qp->voter, qn->bcms[i]);
>> }
>> EXPORT_SYMBOL_GPL(qcom_icc_pre_aggregate);
>>
>> @@ -44,10 +49,8 @@ int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
>> {
>> size_t i;
>> struct qcom_icc_node *qn;
>> - struct qcom_icc_provider *qp;
>>
>> qn = node->data;
>> - qp = to_qcom_provider(node->provider);
>>
>> if (!tag)
>> tag = QCOM_ICC_TAG_ALWAYS;
>> @@ -67,9 +70,6 @@ int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
>> *agg_avg += avg_bw;
>> *agg_peak = max_t(u32, *agg_peak, peak_bw);
>>
>> - for (i = 0; i < qn->num_bcms; i++)
>> - qcom_icc_bcm_voter_add(qp->voter, qn->bcms[i]);
>> -
>> return 0;
>> }
>> EXPORT_SYMBOL_GPL(qcom_icc_aggregate);

2021-08-11 18:16:03

by Stephen Boyd

Subject: Re: [PATCH v2 4/4] interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate

Quoting Alex Elder (2021-08-11 09:01:27)
> On 8/10/21 6:31 PM, Stephen Boyd wrote:
> > Quoting Mike Tipton (2021-07-21 10:54:32)
> >> We're only adding BCMs to the commit list in aggregate(), but there are
> >> cases where pre_aggregate() is called without subsequently calling
> >> aggregate(). In particular, in icc_sync_state() when a node with initial
> >> BW has zero requests. Since BCMs aren't added to the commit list in
> >> these cases, we don't actually send the zero BW request to HW. So the
> >> resources remain on unnecessarily.
> >>
> >> Add BCMs to the commit list in pre_aggregate() instead, which is always
> >> called even when there are no requests.
> >>
> >> Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support")
> >> Signed-off-by: Mike Tipton <[email protected]>
> >> ---
> >
> > This patch breaks reboot for me on sc7180 Lazor
>
> If I am using the interface improperly or something in the
> IPA driver, please let me know. I actually plan to switch
> to using the bulk interfaces soon (FYI).
>

I suspect I'm seeing a shutdown ordering issue, where we start dropping
interconnect requests in driver shutdown callbacks and then some bus
turns off and the CPU can't access a device. One fix for this problem
(if reverting isn't an option) would be to add a shutdown hook to
rpmh-icc that effectively "props up" the bandwidth requests during
shutdown so that we don't have to think about finding the place that the
interconnect is turned off. We're shutting down/restarting anyway, so
there isn't much point in trying to be power efficient for the last few
moments of runtime.

2021-08-18 04:47:01

by Mike Tipton

Subject: Re: [PATCH v2 4/4] interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate

On 8/11/2021 11:13 AM, Stephen Boyd wrote:
> Quoting Alex Elder (2021-08-11 09:01:27)
>> On 8/10/21 6:31 PM, Stephen Boyd wrote:
>>> Quoting Mike Tipton (2021-07-21 10:54:32)
>>>> We're only adding BCMs to the commit list in aggregate(), but there are
>>>> cases where pre_aggregate() is called without subsequently calling
>>>> aggregate(). In particular, in icc_sync_state() when a node with initial
>>>> BW has zero requests. Since BCMs aren't added to the commit list in
>>>> these cases, we don't actually send the zero BW request to HW. So the
>>>> resources remain on unnecessarily.
>>>>
>>>> Add BCMs to the commit list in pre_aggregate() instead, which is always
>>>> called even when there are no requests.
>>>>
>>>> Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support")
>>>> Signed-off-by: Mike Tipton <[email protected]>
>>>> ---
>>>
>>> This patch breaks reboot for me on sc7180 Lazor
>>
>> If I am using the interface improperly or something in the
>> IPA driver, please let me know. I actually plan to switch
>> to using the bulk interfaces soon (FYI).
>>
>
> I suspect I'm seeing a shutdown ordering issue, where we start dropping
> interconnect requests in driver shutdown callbacks and then some bus
> turns off and the CPU can't access a device. Maybe to fix this problem
> (if reverting isn't an option) would be to add a shutdown hook to
> rpmh-icc that effectively "props up" the bandwidth requests during
> shutdown so that we don't have to think about finding the place that the
> interconnect is turned off. We're shutting down/restarting anyway, so
> there isn't much point in trying to be power efficient for the last few
> moments of runtime.
>

I wouldn't have expected this change to impact reboot, since this change
should only impact places where pre_aggregate() is called without
subsequently calling aggregate(). I don't think there are currently any
places that can happen other than icc_sync_state().

I suppose what could be happening is we're now disabling certain paths
in icc_sync_state() and their associated drivers just aren't used, or
don't attempt any accesses, until they're torn down during reboot. That
doesn't seem particularly likely, but nothing else immediately comes to
mind.

We already mark paths critical for the CPU as "keepalive" such that
they'll never turn off. This includes the CPU path to DDR and top-level
CSRs. Basically just paths that can't actually be turned off while SW is
running. That logic is unchanged in this patch. So we generally
shouldn't need any shutdown-specific callbacks to place BW votes during
this window. Client drivers should still ensure they're sequencing their
shutdown logic such that any bus accesses happen before they remove
their BW requests.
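
To make the keepalive idea concrete, here is a small model of the
behaviour described above. It is a sketch under one stated assumption:
that a keepalive BCM whose aggregated vote is zero gets clamped to a
minimal nonzero vote. struct toy_bcm and the toy_* helpers are
hypothetical names, not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_bcm {
	bool keepalive;   /* path must stay on while SW is running */
	uint64_t vote_x;  /* aggregated average-BW vote */
	uint64_t vote_y;  /* aggregated peak-BW vote */
};

/* Assumed behaviour: clamp an all-zero vote to 1 for keepalive BCMs. */
static void toy_apply_keepalive(struct toy_bcm *b)
{
	if (b->keepalive && b->vote_x == 0 && b->vote_y == 0) {
		b->vote_x = 1;
		b->vote_y = 1;
	}
}

/* A BCM counts as "on" in this model if either vote is nonzero. */
static bool toy_bcm_is_on(const struct toy_bcm *b)
{
	return b->vote_x || b->vote_y;
}
```

In this model a keepalive BCM stays on even when every client has dropped
its request, while a non-keepalive BCM with no requests goes to zero and
relies on its clients to finish their bus accesses first.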

2021-08-18 04:47:59

by Mike Tipton

Subject: Re: [PATCH v2 4/4] interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate

On 8/10/2021 5:18 PM, Bjorn Andersson wrote:
> On Tue 10 Aug 18:31 CDT 2021, Stephen Boyd wrote:
>
>> Quoting Mike Tipton (2021-07-21 10:54:32)
>>> We're only adding BCMs to the commit list in aggregate(), but there are
>>> cases where pre_aggregate() is called without subsequently calling
>>> aggregate(). In particular, in icc_sync_state() when a node with initial
>>> BW has zero requests. Since BCMs aren't added to the commit list in
>>> these cases, we don't actually send the zero BW request to HW. So the
>>> resources remain on unnecessarily.
>>>
>>> Add BCMs to the commit list in pre_aggregate() instead, which is always
>>> called even when there are no requests.
>>>
>>> Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support")
>>> Signed-off-by: Mike Tipton <[email protected]>
>>> ---
>>
>> This patch breaks reboot for me on sc7180 Lazor
>>
>
> FWIW, it prevents at least SM8150 from booting (need to check my other
> boards as well), because it's no longer okay to have the interconnect
> providers defined without having all client paths specified.

My testing was limited to sdm845, which didn't show any boot issues. But
it's not terribly surprising for this to cause problems on some targets.
Previously every node was enabled by default and left on permanently if
nobody explicitly voted for them. This would happen even if these nodes
weren't enabled in bootloaders, since most of the qcom providers aren't
defining a get_bw() callback and thus the framework defaults
init_avg/init_peak to INT_MAX. So any drivers relying on this default-on
behavior would break.
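
The default-on behaviour can be sketched as follows. This is a simplified
model of how the initial values are chosen, assuming an optional get_bw
callback as described above; struct toy_node, toy_get_bw_fn, toy_init_bw
and toy_get_bw_off are hypothetical names used only for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

struct toy_node {
	uint32_t init_avg;
	uint32_t init_peak;
};

/* Optional callback reporting the bandwidth the bootloader left configured. */
typedef void (*toy_get_bw_fn)(struct toy_node *n, uint32_t *avg, uint32_t *peak);

/* Without a callback, fall back to the maximum so the node stays on. */
static void toy_init_bw(struct toy_node *n, toy_get_bw_fn get_bw)
{
	if (get_bw) {
		get_bw(n, &n->init_avg, &n->init_peak);
	} else {
		n->init_avg = INT_MAX;
		n->init_peak = INT_MAX;
	}
}

/* Example callback for a node that the bootloader left disabled. */
static void toy_get_bw_off(struct toy_node *n, uint32_t *avg, uint32_t *peak)
{
	(void)n;
	*avg = 0;
	*peak = 0;
}
```

A provider without a callback ends up with INT_MAX floors on every node,
which is why, before this series, such nodes stayed on until a client
voted for them explicitly.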

We can try to get dumps of the NOC error registers at the time of
failure to pinpoint the problematic access. Or we could try to narrow it
down by marking more BCMs as keepalive. If they're marked as keepalive
then we won't let them turn off even with this patch.

>
> Regards,
> Bjorn
>