2022-05-06 11:17:43

by Manuel Ullmann

[permalink] [raw]
Subject: [PATCH v3] net: atlantic: always deep reset on pm op, fixing null deref regression

From d24052938345d456946be0e9ccc337e24d771c79 Mon Sep 17 00:00:00 2001
Date: Wed, 4 May 2022 21:30:44 +0200

The impact of this regression is the same for resume that I saw on
thaw: the kernel hangs and nothing except SysRq rebooting can be done.

The null deref occurs at the same position as on thaw.
BUG: kernel NULL pointer dereference
RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]

Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep
par in pm functions, preventing null derefs"), where I disabled deep
pm resets in suspend and resume, trying to make sense of the
atl_resume_common deep parameter in the first place.

It turns out, that atlantic always has to deep reset on pm operations
and the parameter is useless. Even though I expected that and tested
resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
missing the breakage.

This fixup obsoletes the deep parameter of atl_resume_common, but I
leave the cleanup for the maintainers to post to mainline.

PS: I'm very sorry for this regression.

Changes in v2:
Patch formatting fixes
- Fix Fixes tag
– Simplify stable Cc tag
– Fix Signed-off-by tag

Changes in v3:
– Prefixed commit reference with "commit" aka I managed to use
checkpatch.pl.
- Added Tested-by tags for the testing reporters.
– People start to get annoyed by my patch revision spamming. Should be
the last one.

Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs")
Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/
Reported-by: Jordan Leppert <[email protected]>
Reported-by: Holger Hoffstätte <[email protected]>
Tested-by: Jordan Leppert <[email protected]>
Tested-by: Holger Hoffstätte <[email protected]>
Cc: <[email protected]> # 5.10+
Signed-off-by: Manuel Ullmann <[email protected]>
---
drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
index 3a529ee8c834..831833911a52 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
@@ -449,7 +449,7 @@ static int aq_pm_freeze(struct device *dev)

static int aq_pm_suspend_poweroff(struct device *dev)
{
- return aq_suspend_common(dev, false);
+ return aq_suspend_common(dev, true);
}

static int aq_pm_thaw(struct device *dev)
@@ -459,7 +459,7 @@ static int aq_pm_thaw(struct device *dev)

static int aq_pm_resume_restore(struct device *dev)
{
- return atl_resume_common(dev, false);
+ return atl_resume_common(dev, true);
}

static const struct dev_pm_ops aq_pm_ops = {

base-commit: 672c0c5173427e6b3e2a9bbb7be51ceeec78093a
--
2.35.1


2022-05-08 23:15:34

by Manuel Ullmann

[permalink] [raw]
Subject: Re: [PATCH v3] net: atlantic: always deep reset on pm op, fixing null deref regression

Thorsten Leemhuis <[email protected]> writes:

> On 06.05.22 00:09, Manuel Ullmann wrote:
>>>From d24052938345d456946be0e9ccc337e24d771c79 Mon Sep 17 00:00:00 2001
>> Date: Wed, 4 May 2022 21:30:44 +0200
>>
>> The impact of this regression is the same for resume that I saw on
>> thaw: the kernel hangs and nothing except SysRq rebooting can be done.
>>
>> The null deref occurs at the same position as on thaw.
>> BUG: kernel NULL pointer dereference
>> RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]
>>
>> Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep
>> par in pm functions, preventing null derefs"), where I disabled deep
>> pm resets in suspend and resume, trying to make sense of the
>> atl_resume_common deep parameter in the first place.
>>
>> It turns out, that atlantic always has to deep reset on pm operations
>> and the parameter is useless. Even though I expected that and tested
>> resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
>> missing the breakage.
>>
>> This fixup obsoletes the deep parameter of atl_resume_common, but I
>> leave the cleanup for the maintainers to post to mainline.
>
> FWIW, this section starting here and...
>
>> PS: I'm very sorry for this regression.
>>
>> Changes in v2:
>> Patch formatting fixes
>> - Fix Fixes tag
>> – Simplify stable Cc tag
>> – Fix Signed-off-by tag
>>
>> Changes in v3:
>> – Prefixed commit reference with "commit" aka I managed to use
>> checkpatch.pl.
>> - Added Tested-by tags for the testing reporters.
>> – People start to get annoyed by my patch revision spamming. Should be
>> the last one.
>
> ...ending here needs should be below the "---" line you already have
> below. For details see:
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html
>
> BTW, same goes for any "#regzbot" commands (like you had in
> cbe6c3a8f8f4), as things otherwise get confused when a patch for example
> is posted as part of a stable/longterm -rc review.
Good to know. Maybe I could patch the handling-regressions documentation
to include this. submitting-patches could also link the subsystem
specific documentation, e.g. the netdev FAQ, since they handle patches
with their more bot tests. Would have helped me a bit. Might be a nice
exercise for properly formatted patching ;)

>
> But don't worry, no big deal, I handled that :-D Many thx for actually
> directly getting regzbot involved and taking care of this regression!

Thank you for the final cleanup and you’re welcome. :) Where did you
handle this? I can’t seem to find the fixup anywhere, i.e. net-next,
net, linux-next or lkml.

I actually took the time and read that and all related documentation
(stable, regressions, coding style) during my vacation a few weeks ago,
but my memory was partially overwritten by less useful (work related)
data. Instead of regression reports induced panic mode, I should have
reread the submitting-patches, especially the last section.

>> Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs")
>> Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/
>> Reported-by: Jordan Leppert <[email protected]>
>> Reported-by: Holger Hoffstätte <[email protected]>
>> Tested-by: Jordan Leppert <[email protected]>
>> Tested-by: Holger Hoffstätte <[email protected]>
>> Cc: <[email protected]> # 5.10+
>> Signed-off-by: Manuel Ullmann <[email protected]>
>> ---
>> drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Ciao, Thorsten
Regards, Manuel

2022-05-09 02:01:48

by Manuel Ullmann

[permalink] [raw]
Subject: Re: [PATCH v3] net: atlantic: always deep reset on pm op, fixing null deref regression

Thorsten Leemhuis <[email protected]> writes:

> On 07.05.22 15:10, Manuel Ullmann wrote:
>> Thorsten Leemhuis <[email protected]> writes:
>>
>>> On 06.05.22 00:09, Manuel Ullmann wrote:
>>>> >From d24052938345d456946be0e9ccc337e24d771c79 Mon Sep 17 00:00:00 2001
>>>> Date: Wed, 4 May 2022 21:30:44 +0200
>>>>
>>>> The impact of this regression is the same for resume that I saw on
>>>> thaw: the kernel hangs and nothing except SysRq rebooting can be done.
>>>>
>>>> The null deref occurs at the same position as on thaw.
>>>> BUG: kernel NULL pointer dereference
>>>> RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]
>>>>
>>>> Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep
>>>> par in pm functions, preventing null derefs"), where I disabled deep
>>>> pm resets in suspend and resume, trying to make sense of the
>>>> atl_resume_common deep parameter in the first place.
>>>>
>>>> It turns out, that atlantic always has to deep reset on pm operations
>>>> and the parameter is useless. Even though I expected that and tested
>>>> resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
>>>> missing the breakage.
>>>>
>>>> This fixup obsoletes the deep parameter of atl_resume_common, but I
>>>> leave the cleanup for the maintainers to post to mainline.
>>>
>>> FWIW, this section starting here and...
>>>
>>>> PS: I'm very sorry for this regression.
>>>>
>>>> Changes in v2:
>>>> Patch formatting fixes
>>>> - Fix Fixes tag
>>>> – Simplify stable Cc tag
>>>> – Fix Signed-off-by tag
>>>>
>>>> Changes in v3:
>>>> – Prefixed commit reference with "commit" aka I managed to use
>>>> checkpatch.pl.
>>>> - Added Tested-by tags for the testing reporters.
>>>> – People start to get annoyed by my patch revision spamming. Should be
>>>> the last one.
>>>
>>> ...ending here needs should be below the "---" line you already have
>>> below. For details see:
>>> https://www.kernel.org/doc/html/latest/process/submitting-patches.html
>
> Sorry, I caused a misunderstanding. I didn't handle the above, I'm not
> one of the net subsystem developers. Either you submit a v4 fixing this
> or hope the net maintainer take care of that when they look at it -- but
> I guess they would highly prefer it if you'd address this.

Never mind. Then I’ll post a v4. Thanks for handling the regzbot
tracking. I indeed just assumed this already to be correctly regzbot
tracked. Never post in a hurry.

>>> BTW, same goes for any "#regzbot" commands (like you had in
>>> cbe6c3a8f8f4), as things otherwise get confused when a patch for example
>>> is posted as part of a stable/longterm -rc review.
>> Good to know. Maybe I could patch the handling-regressions documentation
>> to include this.
>
> Yeah, I have already thought about it, but didn't get down to it yet.

Well, I could try it eventually.

> Only so much hours in a day.

I know that issue. ;)
>> submitting-patches could also link the subsystem
>> specific documentation, e.g. the netdev FAQ, since they handle patches
>> with their more bot tests. Would have helped me a bit. Might be a nice
>> exercise for properly formatted patching ;)
>
> I agree that the docs for submitting patches could need a few
> improvements and that is likely one of them.

Then I’ll try fixing this, too. After all most devs have scarce time for
documentation.

>>> But don't worry, no big deal, I handled that :-D Many thx for actually
>>> directly getting regzbot involved and taking care of this regression!
>> Thank you for the final cleanup and you’re welcome. :) Where did you
>> handle this? I can’t seem to find the fixup anywhere, i.e. net-next,
>> net, linux-next or lkml.
>
> See above, I only handled the regzbot issue, not the issue with this
> patch. Sorry for not being clear enough in my wording.

Thanks for clearing this up.

Regards, Manuel

2022-05-09 05:13:33

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [PATCH v3] net: atlantic: always deep reset on pm op, fixing null deref regression

On 07.05.22 15:10, Manuel Ullmann wrote:
> Thorsten Leemhuis <[email protected]> writes:
>
>> On 06.05.22 00:09, Manuel Ullmann wrote:
>>> >From d24052938345d456946be0e9ccc337e24d771c79 Mon Sep 17 00:00:00 2001
>>> Date: Wed, 4 May 2022 21:30:44 +0200
>>>
>>> The impact of this regression is the same for resume that I saw on
>>> thaw: the kernel hangs and nothing except SysRq rebooting can be done.
>>>
>>> The null deref occurs at the same position as on thaw.
>>> BUG: kernel NULL pointer dereference
>>> RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]
>>>
>>> Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep
>>> par in pm functions, preventing null derefs"), where I disabled deep
>>> pm resets in suspend and resume, trying to make sense of the
>>> atl_resume_common deep parameter in the first place.
>>>
>>> It turns out, that atlantic always has to deep reset on pm operations
>>> and the parameter is useless. Even though I expected that and tested
>>> resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
>>> missing the breakage.
>>>
>>> This fixup obsoletes the deep parameter of atl_resume_common, but I
>>> leave the cleanup for the maintainers to post to mainline.
>>
>> FWIW, this section starting here and...
>>
>>> PS: I'm very sorry for this regression.
>>>
>>> Changes in v2:
>>> Patch formatting fixes
>>> - Fix Fixes tag
>>> – Simplify stable Cc tag
>>> – Fix Signed-off-by tag
>>>
>>> Changes in v3:
>>> – Prefixed commit reference with "commit" aka I managed to use
>>> checkpatch.pl.
>>> - Added Tested-by tags for the testing reporters.
>>> – People start to get annoyed by my patch revision spamming. Should be
>>> the last one.
>>
>> ...ending here needs should be below the "---" line you already have
>> below. For details see:
>> https://www.kernel.org/doc/html/latest/process/submitting-patches.html

Sorry, I caused a misunderstanding. I didn't handle the above, I'm not
one of the net subsystem developers. Either you submit a v4 fixing this
or hope the net maintainer take care of that when they look at it -- but
I guess they would highly prefer it if you'd address this.

>> BTW, same goes for any "#regzbot" commands (like you had in
>> cbe6c3a8f8f4), as things otherwise get confused when a patch for example
>> is posted as part of a stable/longterm -rc review.
> Good to know. Maybe I could patch the handling-regressions documentation
> to include this.

Yeah, I have already thought about it, but didn't get down to it yet.
Only so much hours in a day.

> submitting-patches could also link the subsystem
> specific documentation, e.g. the netdev FAQ, since they handle patches
> with their more bot tests. Would have helped me a bit. Might be a nice
> exercise for properly formatted patching ;)

I agree that the docs for submitting patches could need a few
improvements and that is likely one of them.

>> But don't worry, no big deal, I handled that :-D Many thx for actually
>> directly getting regzbot involved and taking care of this regression!
> Thank you for the final cleanup and you’re welcome. :) Where did you
> handle this? I can’t seem to find the fixup anywhere, i.e. net-next,
> net, linux-next or lkml.

See above, I only handled the regzbot issue, not the issue with this
patch. Sorry for not being clear enough in my wording.

> [...]

Ciao, Thorsten

2022-05-09 05:44:07

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [PATCH v3] net: atlantic: always deep reset on pm op, fixing null deref regression

On 06.05.22 00:09, Manuel Ullmann wrote:
>>From d24052938345d456946be0e9ccc337e24d771c79 Mon Sep 17 00:00:00 2001
> Date: Wed, 4 May 2022 21:30:44 +0200
>
> The impact of this regression is the same for resume that I saw on
> thaw: the kernel hangs and nothing except SysRq rebooting can be done.
>
> The null deref occurs at the same position as on thaw.
> BUG: kernel NULL pointer dereference
> RIP: aq_ring_rx_fill+0xcf/0x210 [atlantic]
>
> Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep
> par in pm functions, preventing null derefs"), where I disabled deep
> pm resets in suspend and resume, trying to make sense of the
> atl_resume_common deep parameter in the first place.
>
> It turns out, that atlantic always has to deep reset on pm operations
> and the parameter is useless. Even though I expected that and tested
> resume, I screwed up by kexec-rebooting into an unpatched kernel, thus
> missing the breakage.
>
> This fixup obsoletes the deep parameter of atl_resume_common, but I
> leave the cleanup for the maintainers to post to mainline.

FWIW, this section starting here and...

> PS: I'm very sorry for this regression.
>
> Changes in v2:
> Patch formatting fixes
> - Fix Fixes tag
> – Simplify stable Cc tag
> – Fix Signed-off-by tag
>
> Changes in v3:
> – Prefixed commit reference with "commit" aka I managed to use
> checkpatch.pl.
> - Added Tested-by tags for the testing reporters.
> – People start to get annoyed by my patch revision spamming. Should be
> the last one.

...ending here needs should be below the "---" line you already have
below. For details see:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html

BTW, same goes for any "#regzbot" commands (like you had in
cbe6c3a8f8f4), as things otherwise get confused when a patch for example
is posted as part of a stable/longterm -rc review.

But don't worry, no big deal, I handled that :-D Many thx for actually
directly getting regzbot involved and taking care of this regression!

> Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs")
> Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/
> Reported-by: Jordan Leppert <[email protected]>
> Reported-by: Holger Hoffstätte <[email protected]>
> Tested-by: Jordan Leppert <[email protected]>
> Tested-by: Holger Hoffstätte <[email protected]>
> Cc: <[email protected]> # 5.10+
> Signed-off-by: Manuel Ullmann <[email protected]>
> ---
> drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

Ciao, Thorsten