2017-06-19 02:46:25

by Jia-Ju Bai

Subject: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

The driver may sleep under a spin lock, and the function call path is:
netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
  ioremap --> may sleep

To fix it, the lock is released before "ioremap", and the lock is
acquired again after this function.

Signed-off-by: Jia-Ju Bai <[email protected]>
---
drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
index a996801..5ea553e 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
@@ -1419,7 +1419,9 @@ static u32 netxen_nic_io_read_2M(struct netxen_adapter *adapter,

 		mem_base = pci_resource_start(adapter->pdev, 0) +
 					(start & PAGE_MASK);
+		spin_unlock(&adapter->ahw.mem_lock);
 		mem_ptr = ioremap(mem_base, PAGE_SIZE);
+		spin_lock(&adapter->ahw.mem_lock);
 		if (mem_ptr == NULL) {
 			ret = -EIO;
 			goto unlock;
--
1.7.9.5



2017-06-20 17:35:33

by David Miller

Subject: Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

From: Jia-Ju Bai <[email protected]>
Date: Mon, 19 Jun 2017 10:48:53 +0800

> The driver may sleep under a spin lock, and the function call path is:
> netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
> ioremap --> may sleep
>
> To fix it, the lock is released before "ioremap", and the lock is
> acquired again after this function.
>
> Signed-off-by: Jia-Ju Bai <[email protected]>

This style of change you are making is really starting to be a
problem.

You can't just drop locks like this, especially without explaining
why it's ok, and why the mutual exclusion this code was trying to
achieve is still going to be OK afterwards.

In fact, I see zero analysis of the locking situation here, why
it was needed in the first place, and why your change is OK in
that context.

Any locking change is delicate, and you must put the greatest of
care and consideration into it.

Just putting "unlock/lock" around the sleeping operation shows a
very low level of consideration for the implications of the change
you are making.

This isn't like making whitespace fixes, sorry...
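
The concern above can be made concrete with a small userspace model. This is only a sketch under stated assumptions, not the netxen code: `struct model_dev`, its `window` field, and the `intruder` hook are hypothetical stand-ins for whatever state `mem_lock` protects and for another CPU that gets in while the lock is free. It shows that state set up before the unlock may be stale after the relock, while doing the sleeping step before taking the lock keeps the critical section intact.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a device with one shared access window. */
struct model_dev {
    int lock_held;                        /* 1 while the modelled lock is held */
    unsigned long window;                 /* shared state the lock protects */
    void (*intruder)(struct model_dev *); /* "other CPU", runs once when the lock is free */
};

static void model_lock(struct model_dev *d)
{
    assert(!d->lock_held);
    d->lock_held = 1;
}

static void model_unlock(struct model_dev *d)
{
    void (*hook)(struct model_dev *) = d->intruder;

    assert(d->lock_held);
    d->lock_held = 0;
    d->intruder = NULL;                   /* one-shot: avoid recursion */
    if (hook)
        hook(d);                          /* someone else takes the lock now */
}

/* Another context that legitimately reprograms the window under the lock. */
static void other_cpu(struct model_dev *d)
{
    model_lock(d);
    d->window = 0x9000;
    model_unlock(d);
}

/* Unlock/relock around the sleeping step: the window may change in between. */
static unsigned long access_dropping_lock(struct model_dev *d, unsigned long page)
{
    unsigned long seen;

    model_lock(d);
    d->window = page;
    model_unlock(d);                      /* dropped around the sleeping call */
    /* ... the sleeping call would run here ... */
    model_lock(d);
    seen = d->window;                     /* no longer guaranteed to be 'page' */
    model_unlock(d);
    return seen;
}

/* Sleeping step done first, then one uninterrupted critical section. */
static unsigned long access_holding_lock(struct model_dev *d, unsigned long page)
{
    unsigned long seen;

    /* ... the sleeping call would run here, before the lock is taken ... */
    model_lock(d);
    d->window = page;
    seen = d->window;
    model_unlock(d);
    return seen;
}
```

Whether moving the sleeping call ahead of the lock is acceptable in a real driver still depends on what the lock is meant to protect, which is exactly the analysis asked for above.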

2017-06-21 06:11:50

by Kalle Valo

Subject: Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

David Miller <[email protected]> writes:

> From: Jia-Ju Bai <[email protected]>
> Date: Mon, 19 Jun 2017 10:48:53 +0800
>
>> The driver may sleep under a spin lock, and the function call path is:
>> netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
>> ioremap --> may sleep
>>
>> To fix it, the lock is released before "ioremap", and the lock is
>> acquired again after this function.
>>
>> Signed-off-by: Jia-Ju Bai <[email protected]>
>
> This style of change you are making is really starting to be a
> problem.
>
> You can't just drop locks like this, especially without explaining
> why it's ok, and why the mutual exclusion this code was trying to
> achieve is still going to be OK afterwards.
>
> In fact, I see zero analysis of the locking situation here, why
> it was needed in the first place, and why your change is OK in
> that context.
>
> Any locking change is delicate, and you must put the greatest of
> care and consideration into it.
>
> Just putting "unlock/lock" around the sleeping operation shows a
> very low level of consideration for the implications of the change
> you are making.
>
> This isn't like making whitespace fixes, sorry...

We already tried to explain this to Jia-Ju during review of a wireless
patch:

https://patchwork.kernel.org/patch/9756585/

Jia-Ju, you should listen to feedback. If you continue submitting random
patches like this, it makes it hard for maintainers to trust your patches
anymore.

--
Kalle Valo

2017-06-21 06:30:11

by Jia-Ju Bai

Subject: Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

On 06/21/2017 02:11 PM, Kalle Valo wrote:
> David Miller<[email protected]> writes:
>
>> From: Jia-Ju Bai<[email protected]>
>> Date: Mon, 19 Jun 2017 10:48:53 +0800
>>
>>> The driver may sleep under a spin lock, and the function call path is:
>>> netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
>>> ioremap --> may sleep
>>>
>>> To fix it, the lock is released before "ioremap", and the lock is
>>> acquired again after this function.
>>>
>>> Signed-off-by: Jia-Ju Bai<[email protected]>
>> This style of change you are making is really starting to be a
>> problem.
>>
>> You can't just drop locks like this, especially without explaining
>> why it's ok, and why the mutual exclusion this code was trying to
>> achieve is still going to be OK afterwards.
>>
>> In fact, I see zero analysis of the locking situation here, why
>> it was needed in the first place, and why your change is OK in
>> that context.
>>
>> Any locking change is delicate, and you must put the greatest of
>> care and consideration into it.
>>
>> Just putting "unlock/lock" around the sleeping operation shows a
>> very low level of consideration for the implications of the change
>> you are making.
>>
>> This isn't like making whitespace fixes, sorry...
> We already tried to explain this to Jia-Ju during review of a wireless
> patch:
>
> https://patchwork.kernel.org/patch/9756585/
>
> Jia-Ju, you should listen to feedback. If you continue submitting random
> patches like this, it makes it hard for maintainers to trust your patches
> anymore.
>
Hi,

I am quite sorry for my incorrect patches, and I will listen carefully
to your advice.
In fact, I had not received any feedback on some of the bugs and patches
I reported earlier, so I resent them a few days ago, including this
patch.
Sorry again for my mistake.

Thanks,
Jia-Ju Bai

2017-06-21 09:44:24

by Bo YU

Subject: Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

Hi,
On Wed, Jun 21, 2017 at 02:33:03PM +0800, Jia-Ju Bai wrote:
>On 06/21/2017 02:11 PM, Kalle Valo wrote:
>>David Miller<[email protected]> writes:
>>
>>>From: Jia-Ju Bai<[email protected]>
>>>Date: Mon, 19 Jun 2017 10:48:53 +0800
>>>
>>>>The driver may sleep under a spin lock, and the function call path is:
>>>>netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
>>>> ioremap --> may sleep
>>>>
>>>>To fix it, the lock is released before "ioremap", and the lock is
>>>>acquired again after this function.
>>>>
>>>>Signed-off-by: Jia-Ju Bai<[email protected]>
>>>This style of change you are making is really starting to be a
>>>problem.
>>>
>>>You can't just drop locks like this, especially without explaining
>>>why it's ok, and why the mutual exclusion this code was trying to
>>>achieve is still going to be OK afterwards.
>>>
>>>In fact, I see zero analysis of the locking situation here, why
>>>it was needed in the first place, and why your change is OK in
>>>that context.
>>>
>>>Any locking change is delicate, and you must put the greatest of
>>>care and consideration into it.
>>>
>>>Just putting "unlock/lock" around the sleeping operation shows a
>>>very low level of consideration for the implications of the change
>>>you are making.
>>>
>>>This isn't like making whitespace fixes, sorry...
>>We already tried to explain this to Jia-Ju during review of a wireless
>>patch:
>>
>>https://patchwork.kernel.org/patch/9756585/
>>
>>Jia-Ju, you should listen to feedback. If you continue submitting random
>>patches like this, it makes it hard for maintainers to trust your patches
>>anymore.
>>
>Hi,
>
>I am quite sorry for my incorrect patches, and I will listen carefully
>to your advice.
>In fact, for some bugs and patches which I have reported before, I
>have not received the feedback of them, so I resent them a few days
>ago, including this patch.
>Sorry for my mistake again.

Once your patch is accepted, the maintainer will reply to you, either
with an automated mail or personally. But for your patch(es), I think
most of them will be dropped silently, because (un)lock-related
operations are very critical, especially in kernel code. Maintainers
will not accept unsafe (un)lock code.

Best Regards
>
>Thanks,
>Jia-Ju Bai
>

2017-06-21 13:41:10

by Kalle Valo

Subject: Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

Jia-Ju Bai <[email protected]> writes:

> On 06/21/2017 02:11 PM, Kalle Valo wrote:
>> David Miller<[email protected]> writes:
>>
>>> From: Jia-Ju Bai<[email protected]>
>>> Date: Mon, 19 Jun 2017 10:48:53 +0800
>>>
>>>> The driver may sleep under a spin lock, and the function call path is:
>>>> netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
>>>> ioremap --> may sleep
>>>>
>>>> To fix it, the lock is released before "ioremap", and the lock is
>>>> acquired again after this function.
>>>>
>>>> Signed-off-by: Jia-Ju Bai<[email protected]>
>>> This style of change you are making is really starting to be a
>>> problem.
>>>
>>> You can't just drop locks like this, especially without explaining
>>> why it's ok, and why the mutual exclusion this code was trying to
>>> achieve is still going to be OK afterwards.
>>>
>>> In fact, I see zero analysis of the locking situation here, why
>>> it was needed in the first place, and why your change is OK in
>>> that context.
>>>
>>> Any locking change is delicate, and you must put the greatest of
>>> care and consideration into it.
>>>
>>> Just putting "unlock/lock" around the sleeping operation shows a
>>> very low level of consideration for the implications of the change
>>> you are making.
>>>
>>> This isn't like making whitespace fixes, sorry...
>> We already tried to explain this to Jia-Ju during review of a wireless
>> patch:
>>
>> https://patchwork.kernel.org/patch/9756585/
>>
>> Jia-Ju, you should listen to feedback. If you continue submitting random
>> patches like this, it makes it hard for maintainers to trust your patches
>> anymore.
>>
> Hi,
>
> I am quite sorry for my incorrect patches, and I will listen carefully
> to your advice. In fact, for some bugs and patches which I have
> reported before, I have not received the feedback of them, so I resent
> them a few days ago, including this patch.

Yeah, it is likely that some of your reports will not get any response.
For that I only suggest being persistent and providing more information
about the issue and suggestions how it might be possible to fix it. Also
Dan Carpenter (Cced) might have some suggestions.

But trying to "fix" it by just silencing the warning without proper
analysis is totally the wrong approach, you do more harm than good.

What tool do you use to find these issues? Is it publicly available?

--
Kalle Valo

2017-06-21 14:32:50

by Jia-Ju Bai

Subject: Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

On 2017/6/21 21:40, Kalle Valo wrote:

> Jia-Ju Bai <[email protected]> writes:
>
>> On 06/21/2017 02:11 PM, Kalle Valo wrote:
>>> David Miller<[email protected]> writes:
>>>
>>>> From: Jia-Ju Bai<[email protected]>
>>>> Date: Mon, 19 Jun 2017 10:48:53 +0800
>>>>
>>>>> The driver may sleep under a spin lock, and the function call path is:
>>>>> netxen_nic_pci_mem_access_direct (acquire the lock by spin_lock)
>>>>> ioremap --> may sleep
>>>>>
>>>>> To fix it, the lock is released before "ioremap", and the lock is
>>>>> acquired again after this function.
>>>>>
>>>>> Signed-off-by: Jia-Ju Bai<[email protected]>
>>>> This style of change you are making is really starting to be a
>>>> problem.
>>>>
>>>> You can't just drop locks like this, especially without explaining
>>>> why it's ok, and why the mutual exclusion this code was trying to
>>>> achieve is still going to be OK afterwards.
>>>>
>>>> In fact, I see zero analysis of the locking situation here, why
>>>> it was needed in the first place, and why your change is OK in
>>>> that context.
>>>>
>>>> Any locking change is delicate, and you must put the greatest of
>>>> care and consideration into it.
>>>>
>>>> Just putting "unlock/lock" around the sleeping operation shows a
>>>> very low level of consideration for the implications of the change
>>>> you are making.
>>>>
>>>> This isn't like making whitespace fixes, sorry...
>>> We already tried to explain this to Jia-Ju during review of a wireless
>>> patch:
>>>
>>> https://patchwork.kernel.org/patch/9756585/
>>>
>>> Jia-Ju, you should listen to feedback. If you continue submitting random
>>> patches like this, it makes it hard for maintainers to trust your patches
>>> anymore.
>>>
>> Hi,
>>
>> I am quite sorry for my incorrect patches, and I will listen carefully
>> to your advice. In fact, for some bugs and patches which I have
>> reported before, I have not received the feedback of them, so I resent
>> them a few days ago, including this patch.
> Yeah, it is likely that some of your reports will not get any response.
> For that I only suggest being persistent and providing more information
> about the issue and suggestions how it might be possible to fix it. Also
> Dan Carpenter (Cced) might have some suggestions.
>
> But trying to "fix" it by just silencing the warning without proper
> analysis is totally the wrong approach, you do more harm than good.
>
> What tool do you use to find these issues? Is it publicly available?
>

Hi,

Thanks a lot for your advice. And I am very glad to see that you may be
interested in my work :)
I wrote this static analysis tool myself, rather than using or improving
existing tools. One reason I wrote it is that I have encountered some
sleep-in-atomic bugs in my own driver development :( .
However, because the implementation is preliminary, the tool still has
limitations that can produce false positives or false negatives, and it
may not be very easy to use. Thus, I am still improving the tool,
checking more code and collecting results. By the way, I apologize
again for my incorrect patches that tried to "fix" the detected bugs.
I would be very glad to make this tool available so that more system
code can be checked effectively and conveniently. After I finish the
improvements and perform more evaluation, I will make it publicly
available.
If you have any suggestion or comment on my work, please feel free to
contact me :)

Thanks,
Jia-Ju Bai




2017-06-22 06:08:42

by Dan Carpenter

Subject: Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

We should probably add a might_sleep() to ioremap() to prevent these
bugs in the future.

This bug is eight years old. You can report it, but it's going to be hard
to get anyone to fix it. I sometimes ignore ancient bugs. On the other
hand, netxen is fairly well supported so it doesn't hurt to try.

I try to report bugs as soon as they are introduced. I report it to
the author and CC the relevant list. If people don't respond to my
email after a month then I complain again.

regards,
dan carpenter
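
The might_sleep() suggestion above can be illustrated with a userspace model. In the kernel, might_sleep() is an annotation that emits a warning when it runs in atomic context on a kernel built with CONFIG_DEBUG_ATOMIC_SLEEP. The sketch below only imitates that idea: every `model_*` name is hypothetical, and the "atomic context" is just a counter bumped by the modelled spinlock.

```c
#include <assert.h>

static int atomic_depth;      /* > 0 while a modelled spinlock is held */
static int sleep_violations;  /* sleeping calls seen in atomic context */

static void model_spin_lock(void)
{
    atomic_depth++;
}

static void model_spin_unlock(void)
{
    assert(atomic_depth > 0);
    atomic_depth--;
}

/* Userspace stand-in for the kernel's might_sleep() annotation. */
static void model_might_sleep(void)
{
    if (atomic_depth > 0)
        sleep_violations++;   /* the kernel would print a warning here */
}

/* Hypothetical mapping routine carrying the suggested annotation. */
static void *model_ioremap(void *base)
{
    model_might_sleep();
    return base;
}

/* Maps while holding the lock; returns how many violations were recorded. */
static int violations_for_locked_map(void)
{
    static char page[16];
    int before = sleep_violations;

    model_spin_lock();
    (void)model_ioremap(page);
    model_spin_unlock();
    return sleep_violations - before;
}

/* Maps before taking the lock; returns how many violations were recorded. */
static int violations_for_unlocked_map(void)
{
    static char page[16];
    int before = sleep_violations;

    (void)model_ioremap(page);
    model_spin_lock();
    model_spin_unlock();
    return sleep_violations - before;
}
```

With such an annotation in place, the bug would be flagged the first time the path runs on a debug build, instead of eight years later.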

2017-06-22 10:53:12

by Jia-Ju Bai

Subject: Re: [PATCH] netxen: Fix a sleep-in-atomic bug in netxen_nic_pci_mem_access_direct

On 2017/6/22 14:08, Dan Carpenter wrote:
> We should probably add a might_sleep() to ioremap() to prevent these
> bugs in the future.
I think it is right to do this.
It would also be very useful to compile a list of common kernel
interface functions that may sleep. When writing a new driver, the
developer could refer to this list to reduce or avoid sleep-in-atomic bugs.

>
> This bug is eight years old. You can report it, but it's going to be hard
> to get anyone to fix it. I sometimes ignore ancient bugs. On the other
> hand, netxen is fairly well supported so it doesn't hurt to try.
>
> I try to report bugs as soon as they are introduced. I report it to
> the author and CC the relevant list. If people don't respond to my
> email after a month then I complain again.
>
> regards,
> dan carpenter
>

Thanks for your helpful advice.

Thanks,
Jia-Ju Bai
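
A list of functions that may sleep, as proposed above, could be used roughly as follows. This is a minimal sketch and not the static tool mentioned earlier in the thread: the table entries are illustrative only (ioremap comes from this thread; msleep and mutex_lock are included as other well-known sleeping calls), and `may_sleep_fn` and `first_sleep_in_atomic` are hypothetical helpers that scan a flat list of call names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative, deliberately incomplete list of calls that may sleep. */
static const char *const may_sleep[] = { "ioremap", "msleep", "mutex_lock" };

/* Returns 1 if 'name' is in the list above, 0 otherwise. */
static int may_sleep_fn(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(may_sleep) / sizeof(may_sleep[0]); i++)
        if (strcmp(name, may_sleep[i]) == 0)
            return 1;
    return 0;
}

/*
 * Walks a call trace in order and tracks spinlock nesting.
 * Returns the index of the first listed call made while a lock is held,
 * or -1 if the trace is clean.
 */
static int first_sleep_in_atomic(const char *const *trace, size_t n)
{
    int depth = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(trace[i], "spin_lock") == 0)
            depth++;
        else if (strcmp(trace[i], "spin_unlock") == 0)
            depth = depth > 0 ? depth - 1 : 0;
        else if (depth > 0 && may_sleep_fn(trace[i]))
            return (int)i;
    }
    return -1;
}
```

A check like this only reports where a sleeping call sits inside a locked region; deciding how to restructure the locking remains a manual job.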