2022-04-06 17:32:10

by Guenter Roeck

Subject: [RFC PATCH] iwlwifi: iwl-dbg: Use del_timer_sync() before freeing

In Chrome OS, a large number of crashes is observed due to corrupted timer
lists. Steven Rostedt pointed out that this usually happens when a timer
is freed while still active, and that the problem is often triggered
by code calling del_timer() instead of del_timer_sync() just before
freeing.

Steven also identified the iwlwifi driver as one of the possible culprits
since it does exactly that.

Reported-by: Steven Rostedt <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Shahar S Matityahu <[email protected]>
Cc: Johannes Berg <[email protected]>
Fixes: 60e8abd9d3e91 ("iwlwifi: dbg_ini: add periodic trigger new API support")
Signed-off-by: Guenter Roeck <[email protected]>
---
RFC:
Maybe there was a reason to use del_timer() instead of del_timer_sync().
Also, I am not sure if the change is sufficient since I don't see any
obvious locking that would prevent timers from being added and then
modified in iwl_dbg_tlv_set_periodic_trigs() while being removed in
iwl_dbg_tlv_del_timers().

drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c b/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
index 866a33f49915..3237d4b528b5 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
@@ -371,7 +371,7 @@ void iwl_dbg_tlv_del_timers(struct iwl_trans *trans)
 	struct iwl_dbg_tlv_timer_node *node, *tmp;

 	list_for_each_entry_safe(node, tmp, timer_list, list) {
-		del_timer(&node->timer);
+		del_timer_sync(&node->timer);
 		list_del(&node->list);
 		kfree(node);
 	}
--
2.35.1


2022-04-07 21:29:53

by Guenter Roeck

Subject: Re: [RFC PATCH] iwlwifi: iwl-dbg: Use del_timer_sync() before freeing

Hi,

On 4/6/22 08:34, Guenter Roeck wrote:
> In Chrome OS, a large number of crashes is observed due to corrupted timer
> lists. Steven Rostedt pointed out that this usually happens when a timer
> is freed while still active, and that the problem is often triggered
> by code calling del_timer() instead of del_timer_sync() just before
> freeing.
>
> Steven also identified the iwlwifi driver as one of the possible culprits
> since it does exactly that.
>
> Reported-by: Steven Rostedt <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Shahar S Matityahu <[email protected]>
> Cc: Johannes Berg <[email protected]>
> Fixes: 60e8abd9d3e91 ("iwlwifi: dbg_ini: add periodic trigger new API support")
> Signed-off-by: Guenter Roeck <[email protected]>
> ---
> RFC:
> Maybe there was a reason to use del_timer() instead of del_timer_sync().
> Also, I am not sure if the change is sufficient since I don't see any
> obvious locking that would prevent timers from being added and then
> modified in iwl_dbg_tlv_set_periodic_trigs() while being removed in
> iwl_dbg_tlv_del_timers().
>

I prepared a new version of this patch, introducing a mutex to protect changes
to periodic_trig_list. I'd like to get some feedback before sending it out,
though, so I'll wait until next week before sending it.

If you have any feedback/thoughts/comments, please let me know.

Thanks,
Guenter

> drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c b/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
> index 866a33f49915..3237d4b528b5 100644
> --- a/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
> +++ b/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
> @@ -371,7 +371,7 @@ void iwl_dbg_tlv_del_timers(struct iwl_trans *trans)
>  	struct iwl_dbg_tlv_timer_node *node, *tmp;
>
>  	list_for_each_entry_safe(node, tmp, timer_list, list) {
> -		del_timer(&node->timer);
> +		del_timer_sync(&node->timer);
>  		list_del(&node->list);
>  		kfree(node);
>  	}

2022-04-08 05:52:33

by Luciano Coelho

Subject: Re: [RFC PATCH] iwlwifi: iwl-dbg: Use del_timer_sync() before freeing

On Thu, 2022-04-07 at 12:50 -0700, Guenter Roeck wrote:
> Hi,
>
> On 4/6/22 08:34, Guenter Roeck wrote:
> > In Chrome OS, a large number of crashes is observed due to corrupted timer
> > lists. Steven Rostedt pointed out that this usually happens when a timer
> > is freed while still active, and that the problem is often triggered
> > by code calling del_timer() instead of del_timer_sync() just before
> > freeing.
> >
> > Steven also identified the iwlwifi driver as one of the possible culprits
> > since it does exactly that.
> >
> > Reported-by: Steven Rostedt <[email protected]>
> > Cc: Steven Rostedt <[email protected]>
> > Cc: Shahar S Matityahu <[email protected]>
> > Cc: Johannes Berg <[email protected]>
> > Fixes: 60e8abd9d3e91 ("iwlwifi: dbg_ini: add periodic trigger new API support")
> > Signed-off-by: Guenter Roeck <[email protected]>
> > ---
> > RFC:
> > Maybe there was a reason to use del_timer() instead of del_timer_sync().
> > Also, I am not sure if the change is sufficient since I don't see any
> > obvious locking that would prevent timers from being added and then
> > modified in iwl_dbg_tlv_set_periodic_trigs() while being removed in
> > iwl_dbg_tlv_del_timers().
> >
>
> I prepared a new version of this patch, introducing a mutex to protect changes
> to periodic_trig_list. I'd like to get some feedback before sending it out,
> though, so I'll wait until next week before sending it.
>
> If you have any feedback/thoughts/comments, please let me know.

Hi Guenter,

Thanks for your proposal!

I recently moved from the Intel WiFi team to the Graphics team, so I'm
adding Gregory, who has taken over my duties, to the discussion.

I don't recall any specific reasons for using del_timer() instead of
del_timer_sync() here. So your patch does look correct to me.

--
Cheers,
Luca.

2022-04-09 05:33:34

by Guenter Roeck

Subject: Re: [RFC PATCH] iwlwifi: iwl-dbg: Use del_timer_sync() before freeing

Hi Luca,

On 4/7/22 22:20, Coelho, Luciano wrote:
> On Thu, 2022-04-07 at 12:50 -0700, Guenter Roeck wrote:
>> Hi,
>>
>> On 4/6/22 08:34, Guenter Roeck wrote:
>>> In Chrome OS, a large number of crashes is observed due to corrupted timer
>>> lists. Steven Rostedt pointed out that this usually happens when a timer
>>> is freed while still active, and that the problem is often triggered
>>> by code calling del_timer() instead of del_timer_sync() just before
>>> freeing.
>>>
>>> Steven also identified the iwlwifi driver as one of the possible culprits
>>> since it does exactly that.
>>>
>>> Reported-by: Steven Rostedt <[email protected]>
>>> Cc: Steven Rostedt <[email protected]>
>>> Cc: Shahar S Matityahu <[email protected]>
>>> Cc: Johannes Berg <[email protected]>
>>> Fixes: 60e8abd9d3e91 ("iwlwifi: dbg_ini: add periodic trigger new API support")
>>> Signed-off-by: Guenter Roeck <[email protected]>
>>> ---
>>> RFC:
>>> Maybe there was a reason to use del_timer() instead of del_timer_sync().
>>> Also, I am not sure if the change is sufficient since I don't see any
>>> obvious locking that would prevent timers from being added and then
>>> modified in iwl_dbg_tlv_set_periodic_trigs() while being removed in
>>> iwl_dbg_tlv_del_timers().
>>>
>>
>> I prepared a new version of this patch, introducing a mutex to protect changes
>> to periodic_trig_list. I'd like to get some feedback before sending it out,
>> though, so I'll wait until next week before sending it.
>>
>> If you have any feedback/thoughts/comments, please let me know.
>
> Hi Guenter,
>
> Thanks for your proposal!
>
> I recently moved from the Intel WiFi team to the Graphics team, so I'm
> adding Gregory, who has taken over my duties, to the discussion.
>
> I don't recall any specific reasons for using del_timer() instead of
> del_timer_sync() here. So your patch does look correct to me.
>

Thanks a lot for the feedback. I spent some time trying to determine
whether a mutex to protect the periodic timer list is needed, but
concluded that it is not, because the code adding timers to the list
and the code removing them are never executed in parallel. Of course,
I may be missing something, so I'd be happy to be corrected.

Thanks,
Guenter
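
The race that motivates the patch at the top of this thread can be
illustrated outside the kernel. What follows is only a userspace sketch
using pthreads, not kernel code: struct sim_node, sim_expire(),
sim_del_timer(), sim_del_timer_sync() and callback_running_after_delete()
are hypothetical names invented for this illustration, and the "release"
gate exists only to make the outcome deterministic. It shows that a
non-waiting delete can return while the callback is still executing (so
freeing the node at that point would be a use-after-free), whereas the
waiting variant returns only after the callback has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a list node with an embedded timer. */
struct sim_node {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool pending;	/* armed, callback has not started yet */
	bool running;	/* callback is executing right now */
	bool release;	/* demo-only gate that lets the callback finish */
	int payload;	/* data the callback writes to */
};

/* Plays the role of the timer code invoking the callback on expiry. */
static void *sim_expire(void *arg)
{
	struct sim_node *n = arg;

	pthread_mutex_lock(&n->lock);
	n->pending = false;
	n->running = true;
	pthread_cond_broadcast(&n->cond);
	while (!n->release)		/* callback is "in progress" */
		pthread_cond_wait(&n->cond, &n->lock);
	n->payload++;			/* use-after-free if n were freed */
	n->running = false;
	pthread_cond_broadcast(&n->cond);
	pthread_mutex_unlock(&n->lock);
	return NULL;
}

/* Analogous to del_timer(): deactivate, do not wait for the callback. */
static bool sim_del_timer(struct sim_node *n)
{
	bool was_pending;

	pthread_mutex_lock(&n->lock);
	was_pending = n->pending;
	n->pending = false;
	pthread_mutex_unlock(&n->lock);
	return was_pending;
}

/* Analogous to del_timer_sync(): also wait for a running callback. */
static bool sim_del_timer_sync(struct sim_node *n)
{
	bool was_pending;

	pthread_mutex_lock(&n->lock);
	was_pending = n->pending;
	n->pending = false;
	while (n->running)
		pthread_cond_wait(&n->cond, &n->lock);
	pthread_mutex_unlock(&n->lock);
	return was_pending;
}

/* Returns true if the callback was still running when the delete returned. */
static bool callback_running_after_delete(bool sync)
{
	struct sim_node *n = calloc(1, sizeof(*n));
	pthread_t thr;
	bool still_running;

	assert(n);
	pthread_mutex_init(&n->lock, NULL);
	pthread_cond_init(&n->cond, NULL);
	n->pending = true;
	pthread_create(&thr, NULL, sim_expire, n);

	/* Wait until the callback has started executing. */
	pthread_mutex_lock(&n->lock);
	while (!n->running)
		pthread_cond_wait(&n->cond, &n->lock);
	if (sync) {
		/* Let the callback finish so the waiting delete can return. */
		n->release = true;
		pthread_cond_broadcast(&n->cond);
	}
	pthread_mutex_unlock(&n->lock);

	if (sync)
		sim_del_timer_sync(n);
	else
		sim_del_timer(n);

	/* Freeing n here is only safe if the callback is no longer running. */
	pthread_mutex_lock(&n->lock);
	still_running = n->running;
	n->release = true;
	pthread_cond_broadcast(&n->cond);
	pthread_mutex_unlock(&n->lock);

	/* The demo joins before freeing so that it never actually crashes. */
	pthread_join(thr, NULL);
	pthread_mutex_destroy(&n->lock);
	pthread_cond_destroy(&n->cond);
	free(n);
	return still_running;
}
```

Built with something like "cc -pthread" and called from a small main(),
callback_running_after_delete(false) reports that the callback was still
running when the delete returned, while callback_running_after_delete(true)
reports that it had completed. The second case is the guarantee the
one-line change relies on before kfree(node).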