2015-02-27 08:43:29

by Thorsten.Bschorr

[permalink] [raw]
Subject: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

w1_slave_show unlocks the bus while waiting for the sensor
to convert the temperature. If the sensor gets disconnected
during that time, sl->family_data could get NULL.
Note: the w1_slave pointer inside w1_slave_show is protected by
sysfs-mutex
---
drivers/w1/slaves/w1_therm.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..3a14cb1 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -236,6 +236,12 @@ static ssize_t w1_slave_show(struct device *device,
i = mutex_lock_interruptible(&dev->bus_mutex);
if (i != 0)
return i;
+
+ /* Ensure that device is still there -
+ * it could have gone while we unlocked the bus.
+ * Note: sl is still protected by sysfs-mutex */
+ if(!sl->family_data)
+ return -ENODEV;
} else if (!w1_strong_pullup) {
sleep_rem = msleep_interruptible(tm);
if (sleep_rem != 0) {
--
1.8.4.5


---
Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
http://www.avast.com


2015-02-28 20:17:53

by David Fries

[permalink] [raw]
Subject: Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

Thanks for preparing the patch, it looks like it will go another
round, but that happens to everyone.

To simplify to show a problem,

mutex_lock_interruptible(&dev->bus_mutex);

if (!sl->family_data)
return -ENODEV;

If family_data is NULL, then dev->bus_mutex is left locked. Your
earlier e-mails indicated that you weren't seeing any problems after
applying this patch, but I would have thought once this early return
was encountered, that bus would not be usable any longer because
bus_mutex would be locked. I don't know if the bus controller went
away, so there is effectively a new bus, because I assume you are
still able to read the temperature once this condition triggers?

It looks like you need to add mutex_unlock(&dev->bus_mutex); before
return -ENODEV; and that's how it is handled elsewhere in the
function, but you might want to put a printk before the return,
trigger it and verify it is getting hung up on the bus_mutex, it would
also be telling if you can remove the wire and the other w1 modules,
they should not let you remove them while a mutex is held.


The patch needs to be run through the checkpatch script, and it's
listing two issues, adding a space after the if (I also prefer not
having it, but this is for consistency), and missing Signed-off-by:

That would look like this, only with your information, it's a
tracing of where the changes came from and agreeing to the copyright,
you can read more in Documentation/SubmittingPatches if you haven't
already.
Signed-off-by: David Fries <[email protected]>

scripts/checkpatch.pl /tmp/w1_null_patch.patch

Thanks, keep up improving things.

On Fri, Feb 27, 2015 at 09:43:14AM +0100, Thorsten.Bschorr wrote:
> w1_slave_show unlocks the bus while waiting for the sensor
> to convert the temperature. If the sensor gets disconnected
> during that time, sl->family_data could get NULL.
> Note: the w1_slave pointer inside w1_slave_show is protected by
> sysfs-mutex
> ---
> drivers/w1/slaves/w1_therm.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
> index 1f11a20..3a14cb1 100644
> --- a/drivers/w1/slaves/w1_therm.c
> +++ b/drivers/w1/slaves/w1_therm.c
> @@ -236,6 +236,12 @@ static ssize_t w1_slave_show(struct device *device,
> i = mutex_lock_interruptible(&dev->bus_mutex);
> if (i != 0)
> return i;
> +
> + /* Ensure that device is still there -
> + * it could have gone while we unlocked the bus.
> + * Note: sl is still protected by sysfs-mutex */
> + if(!sl->family_data)
> + return -ENODEV;
> } else if (!w1_strong_pullup) {
> sleep_rem = msleep_interruptible(tm);
> if (sleep_rem != 0) {
> --
> 1.8.4.5
>
>
> ---
> Diese E-Mail wurde von Avast Antivirus-Software auf Viren gepr?ft.
> http://www.avast.com
>

--
David Fries <[email protected]> PGP pub CB1EE8F0
http://fries.net/~david/

2015-03-01 02:17:57

by David Fries

[permalink] [raw]
Subject: Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

On Sun, Mar 01, 2015 at 04:48:22AM +0300, ??????? ??????? wrote:
> Hi everyone
>
> 28.02.2015, 23:18, "David Fries" <[email protected]>:
> > Thanks for preparing the patch, it looks like it will go another
> > round, but that happens to everyone.
>
> Patch itself doesn't solve the problem, it only adds a workaround.
> There is a reference counter issue - reading data via sysfs only holds
> a reference to device, w1 slave structure will be deleted by w1 search core.
>
> I believe it only accidentally doesn't crash because of kernel memory allocator
> hasn't yet reclaimed memory.

Good point, it does look like that's the case.

w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
which calls sl->family->fops->remove_slave(sl);
which kfree and sl->family_data = NULL;

In that case what do you think about in w1_slave_show after getting
bus_mutex,
atomic_inc(&sl->refcnt);

then add
w1_unref_slave(sl);
in each return path, or do a goto and consolidate them.

or just increment it while sleeping, which is when it's needed, which
also looks simpler.

if (external_power) {
+ int refcnt;
mutex_unlock(&dev->bus_mutex);

+ /* prevent the slave from going away */
+ atomic_inc(&sl->refcnt);
sleep_rem = msleep_interruptible(tm);
+ refcnt = w1_unref_slave(sl);
- if (sleep_rem != 0)
+ if (sleep_rem != 0 || !refcnt)
return -EINTR;

i = mutex_lock_interruptible(&dev->bus_mutex);
if (i != 0)
return i;
} else if (!w1_strong_pullup) {


--
David Fries <[email protected]> PGP pub CB1EE8F0
http://fries.net/~david/

2015-04-16 03:52:21

by David Fries

[permalink] [raw]
Subject: Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

It has not been solved. Evgeniy would like to make use of the sysfs
device management instead of the current reference counting, however I
haven't heard any volunteers to do that work. I posted a quick fix
patch, it was very easy to crash without this patch, it doesn't
completely solve the race conditions, and I don't think it can be
solved in just a slave driver change.

Are you up for the challenge?

On Wed, Apr 15, 2015 at 09:52:27AM +0200, Jonathan ALIBERT wrote:
> Hi,
>
> Do you know if the problem has been solved ?
>
> Cheers,
>
> *Jonathan ALIBERT*
> *06 32 26 59 12*
> *265, route de Saint Haon*
> *42 370 RENAISON*
>
>
> 2015-03-19 1:09 GMT+01:00 David Fries <[email protected]>:
>
> > On Wed, Mar 18, 2015 at 06:18:53PM +0300, Evgeniy Polyakov wrote:
> > > Hi
> > >
> > > 18.03.2015, 07:20, "David Fries" <[email protected]>:
> > > > static void w1_therm_remove_slave(struct w1_slave *sl)
> > > > {
> > > > + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data));
> > > > + while(refcnt) {
> > > > + msleep(1000);
> > > > + refcnt = atomic_read(THERM_REFCNT(sl->family_data));
> > > > + }
> > > > kfree(sl->family_data);
> > > > sl->family_data = NULL;
> > > > }
> > >
> > > Can we replace this whole atomic manipulations with kref_t and free
> > family data in the place
> > > which actually drops reference counter to zero?
> > >
> > > I.e. we return from remove_slave() function potentially leaving family
> > data floating around, it will be freed
> > > when the last user drops the reference. There is still a race between
> > increasing reference when starting
> > > reading and removing slave device, i.e. one starts reading, while
> > attached slave device is being removed,
> > > but that's a different problem.
> >
> > With the two while loops I posted, I see with two clients reading
> > w1_slave, the other command to remove a slave gets permanently stuck
> > in w1_therm_remove_slave, which keeps the slave around while the
> > clients continue to read. I wouldn't predict things going better by
> > keeping family_data around longer, the slave data would still go away
> > with readers around.
> >
> > --
> > David Fries <[email protected]> PGP pub CB1EE8F0
> > http://fries.net/~david/
> >

--
David Fries <[email protected]> PGP pub CB1EE8F0
http://fries.net/~david/

2015-04-16 12:02:37

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

Hi David

16.04.2015, 06:52, "David Fries" <[email protected]>:
> It has not been solved. ?Evgeniy would like to make use of the sysfs
> device management instead of the current reference counting, however I
> haven't heard any volunteers to do that work. ?I posted a quick fix
> patch, it was very easy to crash without this patch, it doesn't
> completely solve the race conditions, and I don't think it can be
> solved in just a slave driver change.

Let's push this patch upstream as a temporal fix until we are ready with the new solution.
Thorsten, does it fix your crash?

2015-04-17 13:03:09

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

Hi

16.04.2015, 15:00, "Thorsten Bschorr" <[email protected]>:
>> Let's push this patch upstream as a temporal fix until we are ready with the new solution.
>> Thorsten, does it fix your crash?
>
> I'm running David's refcounting-patch for about 3 weeks now on my 3.18.y kernel, and did not observe any problems with it, especially no crash or dead-lock.

David, please send a formal patch to linux-kernel@ and Greg Kroah-Hartman
with signed-of and acked-by lines