2024-02-21 14:39:01

by Hans Verkuil

Subject: Re: [PATCH] media/cec/core: fix task hung in cec_claim_log_addrs

On 21/02/2024 15:20, Edward Adam Davis wrote:
> After unlocking adap->lock in cec_claim_log_addrs(), cec_claim_log_addrs() may
> re-enter, causing this issue to occur.

But if it is called again, then it should hit this at the start of the function:

	if (WARN_ON(adap->is_configuring || adap->is_configured))
		return;

I'm still not sure what causes the KASAN hung task since I cannot seem to reproduce
it, and because it is hard for me to find enough time to dig into this.

Regards,

Hans

>
> In the thread function cec_config_thread_func() adap->lock is also used, so there
> is no need to unlock adap->lock in cec_claim_log_addrs(), and then use adap->lock
> in cec_config_thread_func() to protect.
>
> Reported-and-tested-by: [email protected]
> Signed-off-by: Edward Adam Davis <[email protected]>
> ---
> drivers/media/cec/core/cec-adap.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
> index 5741adf09a2e..21b3ff504524 100644
> --- a/drivers/media/cec/core/cec-adap.c
> +++ b/drivers/media/cec/core/cec-adap.c
> @@ -1436,7 +1436,6 @@ static int cec_config_thread_func(void *arg)
> int err;
> int i, j;
>
> - mutex_lock(&adap->lock);
> dprintk(1, "physical address: %x.%x.%x.%x, claim %d logical addresses\n",
> cec_phys_addr_exp(adap->phys_addr), las->num_log_addrs);
> las->log_addr_mask = 0;
> @@ -1565,7 +1564,6 @@ static int cec_config_thread_func(void *arg)
> }
> adap->kthread_config = NULL;
> complete(&adap->config_completion);
> - mutex_unlock(&adap->lock);
> call_void_op(adap, configured);
> return 0;
>
> @@ -1577,7 +1575,6 @@ static int cec_config_thread_func(void *arg)
> adap->must_reconfigure = false;
> adap->kthread_config = NULL;
> complete(&adap->config_completion);
> - mutex_unlock(&adap->lock);
> return 0;
> }
>
> @@ -1602,9 +1599,7 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
> adap->kthread_config = NULL;
> adap->is_configuring = false;
> } else if (block) {
> - mutex_unlock(&adap->lock);
> wait_for_completion(&adap->config_completion);
> - mutex_lock(&adap->lock);
> }
> }
>



2024-02-22 10:44:22

by Hillf Danton

Subject: Re: [PATCH] media/cec/core: fix task hung in cec_claim_log_addrs

On Wed, 21 Feb 2024 15:38:47 +0100 Hans Verkuil <[email protected]>
> On 21/02/2024 15:20, Edward Adam Davis wrote:
> > After unlocking adap->lock in cec_claim_log_addrs(), cec_claim_log_addrs() may
> > re-enter, causing this issue to occur.
>
> But if it is called again, then it should hit this at the start of the function:
>
> if (WARN_ON(adap->is_configuring || adap->is_configured))
> return;
>
> I'm still not sure what causes the KASAN hung task since I cannot seem to reproduce
> it, and because it is hard for me to find enough time to dig into this.

Likely because there is a window in which the completion can be initialized more than once [1].

[1] https://lore.kernel.org/lkml/[email protected]/

2024-02-22 11:10:44

by Edward Adam Davis

Subject: Re: [PATCH] media/cec/core: fix task hung in cec_claim_log_addrs

On Wed, 21 Feb 2024 15:38:47 +0100, Hans Verkuil wrote:
> > After unlocking adap->lock in cec_claim_log_addrs(), cec_claim_log_addrs() may
> > re-enter, causing this issue to occur.
>
> But if it is called again, then it should hit this at the start of the function:
>
> if (WARN_ON(adap->is_configuring || adap->is_configured))
> return;
>
> I'm still not sure what causes the KASAN hung task since I cannot seem to reproduce
> it, and because it is hard for me to find enough time to dig into this.

Please pay attention to the following section of code in cec_config_thread_func():
unconfigure:
	for (i = 0; i < las->num_log_addrs; i++)
		las->log_addr[i] = CEC_LOG_ADDR_INVALID;
	cec_adap_unconfigure(adap);		// [1], is_configured = false;
	adap->is_configuring = false;		// [2], is_configuring = false;
	adap->must_reconfigure = false;
	adap->kthread_config = NULL;
	complete(&adap->config_completion);
	mutex_unlock(&adap->lock);		// [3], afterwards

And the following code is in cec_claim_log_addrs():
	} else if (block) {
		mutex_unlock(&adap->lock);
		wait_for_completion(&adap->config_completion);
		mutex_lock(&adap->lock);	// [4]

[4]: During the period before re-acquiring adap->lock, how did cec_claim_log_addrs() re-enter?
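
One possible reading of points [1] to [4], combined with the "completion initialized more than once" window mentioned earlier in the thread, can be sketched as a small userspace model. This is not kernel code and not a confirmed diagnosis: the `model_*` names are hypothetical stand-ins, the completion is reduced to a bare counter, and the sketch assumes that cec_claim_log_addrs() initializes the completion on every call that passes the guard. It only shows that once both flags are cleared, a second call is no longer rejected, and that a re-initialization at that point would discard a completion the first waiter has not yet consumed.

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel objects; NOT the real kernel types. */
struct model_completion {
	unsigned int done;	/* complete() calls not yet consumed by a waiter */
};

struct model_adap {
	bool is_configuring;
	bool is_configured;
	struct model_completion config_completion;
};

static void model_init_completion(struct model_completion *c)
{
	c->done = 0;
}

static void model_complete(struct model_completion *c)
{
	c->done++;
}

/* Returns true if a waiter could proceed, consuming one completion. */
static bool model_try_wait(struct model_completion *c)
{
	if (!c->done)
		return false;
	c->done--;
	return true;
}

/*
 * Entry guard plus setup. Returns false when the guard rejects the call.
 * Assumption: the completion is (re)initialized on every accepted call.
 */
static bool model_claim_enter(struct model_adap *adap)
{
	if (adap->is_configuring || adap->is_configured)
		return false;
	model_init_completion(&adap->config_completion);
	adap->is_configuring = true;
	return true;
}

/* Models the "unconfigure:" exit path: steps [1], [2], then complete(). */
static void model_thread_unconfigure(struct model_adap *adap)
{
	adap->is_configured = false;	/* [1] */
	adap->is_configuring = false;	/* [2] */
	model_complete(&adap->config_completion);
}

/* Returns true if, in this interleaving, the first waiter stays blocked. */
static bool model_first_waiter_hangs(void)
{
	struct model_adap adap = { 0 };

	if (!model_claim_enter(&adap))		/* caller A starts configuring */
		return false;
	model_thread_unconfigure(&adap);	/* A's config thread bails out */

	/* A has been signalled but has not run yet; caller B enters. */
	if (!model_claim_enter(&adap))		/* guard passes: flags are clear */
		return false;

	/* A finally checks the completion: the signal was wiped by B. */
	return !model_try_wait(&adap.config_completion);
}
```

Calling `model_first_waiter_hangs()` from a small `main()` exercises the interleaving; a second `model_claim_enter()` on an adapter that is still configuring is rejected, which matches the WARN_ON guard quoted above.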

BR,
edward


2024-02-22 12:17:05

by Hans Verkuil

Subject: Re: [PATCH] media/cec/core: fix task hung in cec_claim_log_addrs

On 22/02/2024 11:43, Hillf Danton wrote:
> On Wed, 21 Feb 2024 15:38:47 +0100 Hans Verkuil <[email protected]>
>> On 21/02/2024 15:20, Edward Adam Davis wrote:
>>> After unlocking adap->lock in cec_claim_log_addrs(), cec_claim_log_addrs() may
>>> re-enter, causing this issue to occur.
>>
>> But if it is called again, then it should hit this at the start of the function:
>>
>> if (WARN_ON(adap->is_configuring || adap->is_configured))
>> return;
>>
>> I'm still not sure what causes the KASAN hung task since I cannot seem to reproduce
>> it, and because it is hard for me to find enough time to dig into this.
>
> Likely because of the window for initializing completion more than once [1].
>
> [1] https://lore.kernel.org/lkml/[email protected]/

I have been able to reproduce this by adding msleeps in several places.

When I have some more time I will start digging into this.

Regards,

Hans