2022-11-02 11:45:16

by James Clark

[permalink] [raw]
Subject: [PATCH 5.10] coresight: cti: Fix hang in cti_disable_hw()

commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.

cti_enable_hw() and cti_disable_hw() are called from an atomic context
so shouldn't use runtime PM because it can result in a sleep when
communicating with firmware.

This can cause a hang when running the Perf Coresight tests or running
this command:

perf record -e cs_etm//u -- ls

With lock and scheduler debugging enabled the following is output:

coresight cti_sys0: cti_enable_hw -- dev:cti_sys0 parent: 20020000.cti
BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1151
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 330, name: perf-exec
preempt_count: 2, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last enabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 3 PID: 330 Comm: perf-exec Not tainted 6.0.0-00053-g042116d99298 #7
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 13 2022
Call trace:
dump_backtrace+0x134/0x140
show_stack+0x20/0x58
dump_stack_lvl+0x8c/0xb8
dump_stack+0x18/0x34
__might_resched+0x180/0x228
__might_sleep+0x50/0x88
__pm_runtime_resume+0xac/0xb0
cti_enable+0x44/0x120
coresight_control_assoc_ectdev+0xc0/0x150
coresight_enable_path+0xb4/0x288
etm_event_start+0x138/0x170
etm_event_add+0x48/0x70
event_sched_in.isra.122+0xb4/0x280
merge_sched_in+0x1fc/0x3d0
visit_groups_merge.constprop.137+0x16c/0x4b0
ctx_sched_in+0x114/0x1f0
perf_event_sched_in+0x60/0x90
ctx_resched+0x68/0xb0
perf_event_exec+0x138/0x508
begin_new_exec+0x52c/0xd40
load_elf_binary+0x6b8/0x17d0
bprm_execve+0x360/0x7f8
do_execveat_common.isra.47+0x218/0x238
__arm64_sys_execve+0x48/0x60
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.4+0xfc/0x120
do_el0_svc+0x34/0xc0
el0_svc+0x40/0x98
el0t_64_sync_handler+0x98/0xc0
el0t_64_sync+0x170/0x174

Fix the issue by removing the runtime PM calls completely. They are not
needed here because it must have already been done when building the
path for a trace.

Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
Cc: stable <[email protected]> # 5.10.x
Reported-by: Aishwarya TCV <[email protected]>
Reported-by: Cristian Marussi <[email protected]>
Suggested-by: Suzuki K Poulose <[email protected]>
Reviewed-by: Mike Leach <[email protected]>
Tested-by: Mike Leach <[email protected]>
[ Fix build warnings ]
Signed-off-by: Suzuki K Poulose <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: James Clark <[email protected]>
---
drivers/hwtracing/coresight/coresight-cti-core.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
index 0276700c246d..90270696206c 100644
--- a/drivers/hwtracing/coresight/coresight-cti-core.c
+++ b/drivers/hwtracing/coresight/coresight-cti-core.c
@@ -90,11 +90,9 @@ void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
static int cti_enable_hw(struct cti_drvdata *drvdata)
{
struct cti_config *config = &drvdata->config;
- struct device *dev = &drvdata->csdev->dev;
unsigned long flags;
int rc = 0;

- pm_runtime_get_sync(dev->parent);
spin_lock_irqsave(&drvdata->spinlock, flags);

/* no need to do anything if enabled or unpowered*/
@@ -119,7 +117,6 @@ static int cti_enable_hw(struct cti_drvdata *drvdata)
/* cannot enable due to error */
cti_err_not_enabled:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
- pm_runtime_put(dev->parent);
return rc;
}

@@ -153,7 +150,6 @@ static void cti_cpuhp_enable_hw(struct cti_drvdata *drvdata)
static int cti_disable_hw(struct cti_drvdata *drvdata)
{
struct cti_config *config = &drvdata->config;
- struct device *dev = &drvdata->csdev->dev;

spin_lock(&drvdata->spinlock);

@@ -174,7 +170,6 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
coresight_disclaim_device_unlocked(drvdata->base);
CS_LOCK(drvdata->base);
spin_unlock(&drvdata->spinlock);
- pm_runtime_put(dev->parent);
return 0;

/* not disabled this call */
--
2.25.1



2022-11-07 09:34:54

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH 5.10] coresight: cti: Fix hang in cti_disable_hw()

On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
> On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
>> commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
>
> Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?

This was reverted in commit d76308f03ee1 and pushed in later
with fixups as 6746eae4bbadd.

Cheers
Suzuki

>
> confused,
>
> greg k-h


2022-11-07 09:56:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5.10] coresight: cti: Fix hang in cti_disable_hw()

On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
> commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.

Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?

confused,

greg k-h

2022-11-07 10:36:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5.10] coresight: cti: Fix hang in cti_disable_hw()

On Mon, Nov 07, 2022 at 09:59:24AM +0000, Suzuki K Poulose wrote:
> On 07/11/2022 09:52, Greg Kroah-Hartman wrote:
> > On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
> > > On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
> > > > On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
> > > > > commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
> > > >
> > > > Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
> > >
> > > This was reverted in commit d76308f03ee1 and pushed in later
> > > with fixups as 6746eae4bbadd.
> >
> > So which should be applied?
>
> Sorry, this particular post is a backport for v5.10 stable branch.
>
> >
> > confused,
>
> Now I am too. What is expected here ? My understanding is, we
> should stick the "upstream" commit that is getting backported
> at the beginning of the commit description. In that sense,
> the commit id in this patch looks correct to me. Please let
> me know if this is not the case.
>
> As such, this is only for 5.10.x branch. The rest are taken
> care of.
>
> Please let us know if we are something missing.

We already have commit 665c157e0204176023860b51a46528ba0ba62c33 queued
up in the 5.10 stable queue. Is that not correct? It has the same
subject line as this one.

Still confused.

thanks,

greg k-h

2022-11-07 10:57:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5.10] coresight: cti: Fix hang in cti_disable_hw()

On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
> On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
> > On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
> > > commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
> >
> > Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
>
> This was reverted in commit d76308f03ee1 and pushed in later
> with fixups as 6746eae4bbadd.

So which should be applied?

confused,

greg k-h

2022-11-07 11:20:53

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH 5.10] coresight: cti: Fix hang in cti_disable_hw()

On 07/11/2022 09:52, Greg Kroah-Hartman wrote:
> On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
>> On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
>>> On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
>>>> commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
>>>
>>> Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
>>
>> This was reverted in commit d76308f03ee1 and pushed in later
>> with fixups as 6746eae4bbadd.
>
> So which should be applied?

Sorry, this particular post is a backport for v5.10 stable branch.

>
> confused,

Now I am too. What is expected here ? My understanding is, we
should stick the "upstream" commit that is getting backported
at the beginning of the commit description. In that sense,
the commit id in this patch looks correct to me. Please let
me know if this is not the case.

As such, this is only for 5.10.x branch. The rest are taken
care of.

Please let us know if we are something missing.

Suzuki


>
> greg k-h


2022-11-07 11:27:53

by Suzuki K Poulose

[permalink] [raw]
Subject: Re: [PATCH 5.10] coresight: cti: Fix hang in cti_disable_hw()

Hi Greg,

On 07/11/2022 10:24, Greg Kroah-Hartman wrote:
> On Mon, Nov 07, 2022 at 09:59:24AM +0000, Suzuki K Poulose wrote:
>> On 07/11/2022 09:52, Greg Kroah-Hartman wrote:
>>> On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
>>>> On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
>>>>> On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
>>>>>> commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
>>>>>
>>>>> Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
>>>>
>>>> This was reverted in commit d76308f03ee1 and pushed in later
>>>> with fixups as 6746eae4bbadd.
>>>
>>> So which should be applied?
>>
>> Sorry, this particular post is a backport for v5.10 stable branch.
>>
>>>
>>> confused,
>>
>> Now I am too. What is expected here ? My understanding is, we
>> should stick the "upstream" commit that is getting backported
>> at the beginning of the commit description. In that sense,
>> the commit id in this patch looks correct to me. Please let
>> me know if this is not the case.
>>
>> As such, this is only for 5.10.x branch. The rest are taken
>> care of.
>>
>> Please let us know if we are something missing.
>
> We already have commit 665c157e0204176023860b51a46528ba0ba62c33 queued
> up in the 5.10 stable queue. Is that not correct? It has the same

We pushed the fix as part of the coresight fixes for 6.1 [0], as

commit: 6746eae4bbad "coresight: cti: Fix hang in cti_disable_hw()"


But, the version in there, produced a build warning with "unused
variable" (with CONFIG_WERROR) [1] and thus was reverted in [2],

commit: d76308f03ee1: Revert "coresight: cti: Fix hang in cti_disable_hw()"


Later, we sent you the corrected fix separately [3], which was queued as
commit "6746eae4bbadd".

6746eae4bbad coresight: cti: Fix hang in cti_disable_hw()

So, in effect, here is what we have in the tree :

$ git log --oneline | grep "cti: Fix"
6746eae4bbad coresight: cti: Fix hang in cti_disable_hw()
d76308f03ee1 Revert "coresight: cti: Fix hang in cti_disable_hw()"
665c157e0204 coresight: cti: Fix hang in cti_disable_hw()

> subject line as this one.

I understand the "same" subject line has caused this confusion. Will
modify it in the future avoid this confusion.

So, kindly drop "665c157e0204" from the queue for 5.10, it would fail to
apply there anyway so does the correct "6746eae4bbad". This backport
is appropriate for 5.10 branch, with the correct version.

I have double checked the versions pulled into 6.0.x [4] and 5.15.x [5]
branches and they all have the correct one (6746eae4bbad) applied.

[0] https://lkml.kernel.org/r/[email protected]
[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]
[3] https://lkml.kernel.org/r/[email protected]
[4] https://lkml.kernel.org/r/[email protected]
[5] https://lkml.kernel.org/r/[email protected]


Hope this helps.

Suzuki

>
> Still confused.
>
> thanks,
>
> greg k-h


2022-11-07 12:49:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5.10] coresight: cti: Fix hang in cti_disable_hw()

On Mon, Nov 07, 2022 at 11:15:35AM +0000, Suzuki K Poulose wrote:
> Hi Greg,
>
> On 07/11/2022 10:24, Greg Kroah-Hartman wrote:
> > On Mon, Nov 07, 2022 at 09:59:24AM +0000, Suzuki K Poulose wrote:
> > > On 07/11/2022 09:52, Greg Kroah-Hartman wrote:
> > > > On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
> > > > > On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
> > > > > > On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
> > > > > > > commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
> > > > > >
> > > > > > Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
> > > > >
> > > > > This was reverted in commit d76308f03ee1 and pushed in later
> > > > > with fixups as 6746eae4bbadd.
> > > >
> > > > So which should be applied?
> > >
> > > Sorry, this particular post is a backport for v5.10 stable branch.
> > >
> > > >
> > > > confused,
> > >
> > > Now I am too. What is expected here ? My understanding is, we
> > > should stick the "upstream" commit that is getting backported
> > > at the beginning of the commit description. In that sense,
> > > the commit id in this patch looks correct to me. Please let
> > > me know if this is not the case.
> > >
> > > As such, this is only for 5.10.x branch. The rest are taken
> > > care of.
> > >
> > > Please let us know if we are something missing.
> >
> > We already have commit 665c157e0204176023860b51a46528ba0ba62c33 queued
> > up in the 5.10 stable queue. Is that not correct? It has the same
>
> We pushed the fix as part of the coresight fixes for 6.1 [0], as
>
> commit: 6746eae4bbad "coresight: cti: Fix hang in cti_disable_hw()"
>
>
> But, the version in there, produced a build warning with "unused
> variable" (with CONFIG_WERROR) [1] and thus was reverted in [2],
>
> commit: d76308f03ee1: Revert "coresight: cti: Fix hang in cti_disable_hw()"
>
>
> Later, we sent you the corrected fix separately [3], which was queued as
> commit "6746eae4bbadd".
>
> 6746eae4bbad coresight: cti: Fix hang in cti_disable_hw()
>
> So, in effect, here is what we have in the tree :
>
> $ git log --oneline | grep "cti: Fix"
> 6746eae4bbad coresight: cti: Fix hang in cti_disable_hw()
> d76308f03ee1 Revert "coresight: cti: Fix hang in cti_disable_hw()"
> 665c157e0204 coresight: cti: Fix hang in cti_disable_hw()
>
> > subject line as this one.
>
> I understand the "same" subject line has caused this confusion. Will
> modify it in the future avoid this confusion.
>
> So, kindly drop "665c157e0204" from the queue for 5.10, it would fail to
> apply there anyway so does the correct "6746eae4bbad". This backport
> is appropriate for 5.10 branch, with the correct version.

Ok, I dropped 665c157e0204 from the queue, but note that it _was_
backported there properly. But only there, which is odd, Sasha's
scripts might not have caught up.

I'll queue this one up now instead.

thanks,

greg k-h