2018-06-08 15:17:15

by Ben Greear

Subject: Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

On 06/07/2018 04:59 PM, Cong Wang wrote:
> On Thu, Jun 7, 2018 at 4:48 PM, <[email protected]> wrote:
>> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
>> index be7c0fa..cb911f0 100644
>> --- a/include/net/fq_impl.h
>> +++ b/include/net/fq_impl.h
>> @@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>> return NULL;
>> }
>>
>> - flow = list_first_entry(head, struct fq_flow, flowchain);
>> + flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
>> +
>> + if (WARN_ON_ONCE(!flow))
>> + return NULL;
>
> This does not make sense either. list_first_entry_or_null()
> returns NULL only when the list is empty, but we already check
> list_empty() right before this code, and it is protected by fq->lock.
>

Hello Michal,

git blame shows you as the author of the fq_impl.h code.

I saw a crash when debugging funky ath10k firmware in a 4.16 + hacks kernel. There was an apparent
mostly-null deref in the fq_tin_dequeue method. According to gdb, it was within
1 line of the dereference of 'flow'.

My hack above is probably not that useful. Cong thinks maybe the locking is bad.

If you get a chance, please review this thread and see if you have any ideas for
a better fix (or better debugging code).

As always, if you would like me to generate you a buggy firmware that will crash
in the tx path and cause all sorts of mayhem in the ath10k driver and wifi stack,
I will be happy to do so.

https://www.mail-archive.com/[email protected]/msg239738.html

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2018-06-08 21:40:39

by Arend Van Spriel

Subject: Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

On 6/8/2018 5:17 PM, Ben Greear wrote:

I recalled an email from Michał leaving tieto so adding his alternate
email he provided back then.

Gr. AvS

> On 06/07/2018 04:59 PM, Cong Wang wrote:
>> On Thu, Jun 7, 2018 at 4:48 PM, <[email protected]> wrote:
>>> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
>>> index be7c0fa..cb911f0 100644
>>> --- a/include/net/fq_impl.h
>>> +++ b/include/net/fq_impl.h
>>> @@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>> return NULL;
>>> }
>>>
>>> - flow = list_first_entry(head, struct fq_flow, flowchain);
>>> + flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
>>> +
>>> + if (WARN_ON_ONCE(!flow))
>>> + return NULL;
>>
>> This does not make sense either. list_first_entry_or_null()
>> returns NULL only when the list is empty, but we already check
>> list_empty() right before this code, and it is protected by fq->lock.
>>
>
> Hello Michal,
>
> git blame shows you as the author of the fq_impl.h code.
>
> I saw a crash when debugging funky ath10k firmware in a 4.16 + hacks kernel.
> There was an apparent mostly-null deref in the fq_tin_dequeue method.
> According to gdb, it was within 1 line of the dereference of 'flow'.
>
> My hack above is probably not that useful. Cong thinks maybe the locking
> is bad.
>
> If you get a chance, please review this thread and see if you have any
> ideas for a better fix (or better debugging code).
>
> As always, if you would like me to generate you a buggy firmware that will
> crash in the tx path and cause all sorts of mayhem in the ath10k driver and
> wifi stack, I will be happy to do so.
>
> https://www.mail-archive.com/[email protected]/msg239738.html
>
> Thanks,
> Ben
>

2018-06-10 17:10:34

by Michal Kazior

Subject: Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

Ben,

The patch is symptomatic. fq_tin_dequeue() already checks if the list
is empty before it tries to access first entry. I see no point in
using the _or_null() + WARN_ON.
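
To make the point above concrete, here is a minimal userspace sketch, not the
kernel code. It assumes a cut-down, hypothetical struct demo_flow and
hand-written stand-ins for the kernel list helpers (list_init,
list_add_tail_demo, list_is_empty and first_flow_or_null are names made up for
this example). It only shows that an "or null" lookup returns NULL when the
list is empty and at no other time, so a preceding empty check already covers
that case.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, cut-down flow: only the list linkage matters here. */
struct demo_flow {
    int id;
    struct list_head flowchain;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail_demo(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

static int list_is_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Same contract as an "or null" lookup: NULL only for an empty list. */
static struct demo_flow *first_flow_or_null(struct list_head *head)
{
    if (list_is_empty(head))
        return NULL;
    /* Step back from the embedded list node to the containing struct. */
    return (struct demo_flow *)((char *)head->next -
                                offsetof(struct demo_flow, flowchain));
}

/* Walks both cases: an empty list, then a list holding one flow. */
int demo_first_flow(void)
{
    struct list_head head;
    struct demo_flow f = { .id = 7 };

    list_init(&head);
    assert(list_is_empty(&head));
    assert(first_flow_or_null(&head) == NULL);

    list_add_tail_demo(&f.flowchain, &head);
    assert(first_flow_or_null(&head) == &f);
    return first_flow_or_null(&head)->id;
}
```

Calling demo_first_flow() from a small main() exercises both cases. Under
these assumptions, a NULL first entry after a successful non-empty check could
only come from the list changing in between, which is a locking or lifetime
question rather than something the extra check can fix.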

The 0x3c deref is likely an offset off of NULL base pointer. Did you
check gdb/addr2line of the ieee80211_tx_dequeue+0xfb? Where did it
point to?
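
As an illustration of what "an offset off of NULL base pointer" means, the
sketch below uses a hypothetical struct demo_obj whose layout is invented so
that one member lands at offset 0x3c. It is not a claim about which kernel
structure or field was involved in the crash. The helper names are
assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x3c (60) bytes of leading fields, then the member. */
struct demo_obj {
    unsigned char pad[0x3c];
    uint32_t field;             /* lands at offset 0x3c */
};

/* Address the CPU would touch when reading a member through 'base'. */
static uintptr_t fault_address(uintptr_t base, size_t member_offset)
{
    return base + member_offset;
}

/* With a NULL base, the faulting address equals the member offset. */
uintptr_t demo_null_member_address(void)
{
    return fault_address(0, offsetof(struct demo_obj, field));
}
```

So a fault at a small address such as 0x3c usually points at the base pointer
being NULL, and the offset hints at which member was being read. Matching that
offset against the real structure layouts (for example with gdb or pahole) is
what narrows down the pointer in question.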

I suspect there's not enough synchronization between quiescing the
device/ath10k after fw crashes and performing mac80211's reconfig
procedure.


Michał

On 8 June 2018 at 23:40, Arend van Spriel <[email protected]> wrote:
> On 6/8/2018 5:17 PM, Ben Greear wrote:
>
> I recalled an email from Michał leaving tieto so adding his alternate email
> he provided back then.
>
> Gr. AvS
>
>
>> On 06/07/2018 04:59 PM, Cong Wang wrote:
>>>
>>> On Thu, Jun 7, 2018 at 4:48 PM, <[email protected]> wrote:
>>>>
>>>> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
>>>> index be7c0fa..cb911f0 100644
>>>> --- a/include/net/fq_impl.h
>>>> +++ b/include/net/fq_impl.h
>>>> @@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>>> return NULL;
>>>> }
>>>>
>>>> - flow = list_first_entry(head, struct fq_flow, flowchain);
>>>> + flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
>>>> +
>>>> + if (WARN_ON_ONCE(!flow))
>>>> + return NULL;
>>>
>>>
>>> This does not make sense either. list_first_entry_or_null()
>>> returns NULL only when the list is empty, but we already check
>>> list_empty() right before this code, and it is protected by fq->lock.
>>>
>>
>> Hello Michal,
>>
>> git blame shows you as the author of the fq_impl.h code.
>>
>> I saw a crash when debugging funky ath10k firmware in a 4.16 + hacks kernel.
>> There was an apparent mostly-null deref in the fq_tin_dequeue method.
>> According to gdb, it was within 1 line of the dereference of 'flow'.
>>
>> My hack above is probably not that useful. Cong thinks maybe the locking
>> is bad.
>>
>> If you get a chance, please review this thread and see if you have any
>> ideas for a better fix (or better debugging code).
>>
>> As always, if you would like me to generate you a buggy firmware that will
>> crash in the tx path and cause all sorts of mayhem in the ath10k driver and
>> wifi stack, I will be happy to do so.
>>
>> https://www.mail-archive.com/[email protected]/msg239738.html
>>
>> Thanks,
>> Ben
>>
>

2018-06-11 13:18:51

by Ben Greear

Subject: Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

On 06/10/2018 10:10 AM, Michał Kazior wrote:
> Ben,
>
> The patch is symptomatic. fq_tin_dequeue() already checks if the list
> is empty before it tries to access first entry. I see no point in
> using the _or_null() + WARN_ON.
>
> The 0x3c deref is likely an offset off of NULL base pointer. Did you
> check gdb/addr2line of the ieee80211_tx_dequeue+0xfb? Where did it
> point to?

gdb pointed to one line above the flow dereference, which is why I was
going to put some debugging in there.

>
> I suspect there's not enough synchronization between quiescing the
> device/ath10k after fw crashes and performing mac80211's reconfig
> procedure.

I am already running this patch which helps with some of that. That
patch never made it upstream, but it fixed problems for me earlier.

https://patchwork.kernel.org/patch/9457639/

Could easily be there are some more issues in that logic.

Someone else posted a patch to disable mac80211 tx when the FW crashes,
I think. I have not tried to backport that.

https://patchwork.kernel.org/patch/10411967/

Thanks,
Ben


>
>
> Michał
>
> On 8 June 2018 at 23:40, Arend van Spriel <[email protected]> wrote:
>> On 6/8/2018 5:17 PM, Ben Greear wrote:
>>
>> I recalled an email from Michał leaving tieto so adding his alternate email
>> he provided back then.
>>
>> Gr. AvS
>>
>>
>>> On 06/07/2018 04:59 PM, Cong Wang wrote:
>>>>
>>>> On Thu, Jun 7, 2018 at 4:48 PM, <[email protected]> wrote:
>>>>>
>>>>> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
>>>>> index be7c0fa..cb911f0 100644
>>>>> --- a/include/net/fq_impl.h
>>>>> +++ b/include/net/fq_impl.h
>>>>> @@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>>>> return NULL;
>>>>> }
>>>>>
>>>>> - flow = list_first_entry(head, struct fq_flow, flowchain);
>>>>> + flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
>>>>> +
>>>>> + if (WARN_ON_ONCE(!flow))
>>>>> + return NULL;
>>>>
>>>>
>>>> This does not make sense either. list_first_entry_or_null()
>>>> returns NULL only when the list is empty, but we already check
>>>> list_empty() right before this code, and it is protected by fq->lock.
>>>>
>>>
>>> Hello Michal,
>>>
>>> git blame shows you as the author of the fq_impl.h code.
>>>
>>> I saw a crash when debugging funky ath10k firmware in a 4.16 + hacks kernel.
>>> There was an apparent mostly-null deref in the fq_tin_dequeue method.
>>> According to gdb, it was within 1 line of the dereference of 'flow'.
>>>
>>> My hack above is probably not that useful. Cong thinks maybe the locking
>>> is bad.
>>>
>>> If you get a chance, please review this thread and see if you have any
>>> ideas for a better fix (or better debugging code).
>>>
>>> As always, if you would like me to generate you a buggy firmware that will
>>> crash in the tx path and cause all sorts of mayhem in the ath10k driver and
>>> wifi stack, I will be happy to do so.
>>>
>>> https://www.mail-archive.com/[email protected]/msg239738.html
>>>
>>> Thanks,
>>> Ben
>>>
>>
>

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com