2013-03-02 00:11:04

by Marcin Jurkowski

[permalink] [raw]
Subject: Re: Re: reproducible w1 oops on recent kernels (at least since 3.2.x)

On Wednesday, January 16, 2013 3:16:38 PM UTC+1, Evgeniy Polyakov wrote:
> Can you confirm that bug still persists and that it doesn't exist in
>
> 3.1? Do you have a possibility to bisect w1 bits down to broken
> commit?

Hi

I can confirm that this bug persists in recent kernel. Onewire netlink
interface to W1_SEARCH command must have been broken for a while.

Good news is that it seems to be easy to fix. I'll post an explanation
and a patch tomorrow.


Regards


2013-03-02 10:13:34

by Sven Geggus

[permalink] [raw]
Subject: Re: reproducible w1 oops on recent kernels (at least since 3.2.x)

Marcin Jurkowski schrieb am Samstag, den 02. März um 01:11 Uhr:

> I can confirm that this bug persists in recent kernel. Onewire netlink
> interface to W1_SEARCH command must have been broken for a while.
>
> Good news is that it seems to be easy to fix. I'll post an explanation
> and a patch tomorrow.

I did not send this to the kernel Mailinglist but to Evgeniy
only. This is the bad commit I found doing git bisect:

04f482faf50535229a5a5c8d629cf963899f857c is the first bad commit
commit 04f482faf50535229a5a5c8d629cf963899f857c
Author: Patrick McHardy <[email protected]>
Date: Mon Mar 28 08:39:36 2011 +0000

connector: convert to synchronous netlink message processing

Commits 01a16b21 (netlink: kill eff_cap from structnetlink_skb_parms)
and c53fa1ed (netlink: kill loginuid/sessionid/sid members fromstruct
netlink_skb_parms) removed some members from structnetlink_skb_parms
that depend on the current context, all netlink users are nowrequired
to do synchronous message processing.

connector however queues received messages and processes them ina work
queue, which is not valid anymore. This patch converts connectorto do
synchronous message processing by invoking the registeredcallback
handler directly from the netlink receive function.

In order to avoid invoking the callback with connector locksheld, a
reference count is added to struct cn_callback_entry, thereference
is taken when finding a matching callback entry on the device'squeue_list
and released after the callback handler has been invoked.

Signed-off-by: Patrick McHardy <[email protected]>
Acked-by: Evgeniy Polyakov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Sven

--
"C Is Quirky, Flawed, And An Enormous Success."
(Dennis M. Ritchie)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web

2013-03-02 13:50:20

by Marcin Jurkowski

[permalink] [raw]
Subject: [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector

On Sat, Mar 02, 2013 at 10:45:10AM +0100, Sven Geggus wrote:
> This is the bad commit I found doing git bisect:
> 04f482faf50535229a5a5c8d629cf963899f857c is the first bad commit
> commit 04f482faf50535229a5a5c8d629cf963899f857c
> Author: Patrick McHardy <[email protected]>
> Date: Mon Mar 28 08:39:36 2011 +0000

Good job. I was too lazy to bisect for bad commit;)

Reading the code I found problematic kthread_should_stop call from netlink
connector which causes the oops. After applying a patch, I've been testing
owfs+w1 setup for nearly two days and it seems to work very reliable (no
hangs, no memleaks etc).
More detailed description and possible fix is given below:

Function w1_search can be called from either kthread or netlink callback.
While the former works fine, the latter causes oops due to kthread_should_stop
invocation.

This patch adds a check if w1_search is serving netlink command, skipping
kthread_should_stop invocation if so.

Signed-off-by: Marcin Jurkowski <[email protected]>
---
drivers/w1/w1.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index 7994d933..7e2220d 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -924,7 +924,8 @@ void w1_search(struct w1_master *dev, u8 search_type, w1_slave_found_callback cb
tmp64 = (triplet_ret >> 2);
rn |= (tmp64 << i);

- if (kthread_should_stop()) {
+ /* ensure we're called from kthread and not by netlink callback */
+ if (!dev->priv && kthread_should_stop()) {
mutex_unlock(&dev->bus_mutex);
dev_dbg(&dev->dev, "Abort w1_search\n");
return;
--
1.7.12.4

2013-03-03 15:36:35

by Sven Geggus

[permalink] [raw]
Subject: Re: [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector

Marcin Jurkowski schrieb am Samstag, den 02. März um 14:50 Uhr:

> This patch adds a check if w1_search is serving netlink command, skipping
> kthread_should_stop invocation if so.

Works fine on my Raspberry Pi!

Any chance to get this fix into mainline?

Regards

Sven

--
# Turn on/off security. Off is currently the default
(found in MongoDB default configfile)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web

2013-03-03 20:54:59

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector

Hi

Marcin, thanks a lot for the fix, I have to sorry I'm not on this bug yet :(
Sven confirmed patch fixes it, Greg please pull it into your tree.

I believe this is stable material.
Thanks everyone.

Acked-by: Evgeniy Polyakov <[email protected]>

On Sat, Mar 02, 2013 at 02:50:15PM +0100, Marcin Jurkowski ([email protected]) wrote:
> On Sat, Mar 02, 2013 at 10:45:10AM +0100, Sven Geggus wrote:
> > This is the bad commit I found doing git bisect:
> > 04f482faf50535229a5a5c8d629cf963899f857c is the first bad commit
> > commit 04f482faf50535229a5a5c8d629cf963899f857c
> > Author: Patrick McHardy <[email protected]>
> > Date: Mon Mar 28 08:39:36 2011 +0000
>
> Good job. I was too lazy to bisect for bad commit;)
>
> Reading the code I found problematic kthread_should_stop call from netlink
> connector which causes the oops. After applying a patch, I've been testing
> owfs+w1 setup for nearly two days and it seems to work very reliable (no
> hangs, no memleaks etc).
> More detailed description and possible fix is given below:
>
> Function w1_search can be called from either kthread or netlink callback.
> While the former works fine, the latter causes oops due to kthread_should_stop
> invocation.
>
> This patch adds a check if w1_search is serving netlink command, skipping
> kthread_should_stop invocation if so.
>
> Signed-off-by: Marcin Jurkowski <[email protected]>
> ---
> drivers/w1/w1.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
> index 7994d933..7e2220d 100644
> --- a/drivers/w1/w1.c
> +++ b/drivers/w1/w1.c
> @@ -924,7 +924,8 @@ void w1_search(struct w1_master *dev, u8 search_type, w1_slave_found_callback cb
> tmp64 = (triplet_ret >> 2);
> rn |= (tmp64 << i);
>
> - if (kthread_should_stop()) {
> + /* ensure we're called from kthread and not by netlink callback */
> + if (!dev->priv && kthread_should_stop()) {
> mutex_unlock(&dev->bus_mutex);
> dev_dbg(&dev->dev, "Abort w1_search\n");
> return;
> --
> 1.7.12.4

--
Evgeniy Polyakov

2013-03-03 22:41:35

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector

On Mon, Mar 04, 2013 at 12:54:52AM +0400, Evgeniy Polyakov wrote:
> Hi
>
> Marcin, thanks a lot for the fix, I have to sorry I'm not on this bug yet :(
> Sven confirmed patch fixes it, Greg please pull it into your tree.

Ok, will do once 3.9-rc1 is out.

thanks,

greg k-h

2013-03-11 13:18:57

by Josh Boyer

[permalink] [raw]
Subject: Re: [PATCH 1/1] w1: fix oops when w1_search is called from netlink connector

On Sun, Mar 3, 2013 at 5:41 PM, GregKH <[email protected]> wrote:
> On Mon, Mar 04, 2013 at 12:54:52AM +0400, Evgeniy Polyakov wrote:
>> Hi
>>
>> Marcin, thanks a lot for the fix, I have to sorry I'm not on this bug yet :(
>> Sven confirmed patch fixes it, Greg please pull it into your tree.
>
> Ok, will do once 3.9-rc1 is out.

3.9-rc2 is out now. I don't see this in any of your trees. Just a reminder.

josh