2006-12-19 19:28:20

by John

[permalink] [raw]
Subject: Possible race condition in usb-serial.c

Hi,
I read usb-serial.c code (in 2.6.19) and I cannot
figure out how it is
supposed to prevent race condition and premature
deletion of usb_serial
structure. I see that the code attempts to protect
usb_serial by ref
counting, but it does not appear to be correct. I am
not 100% sure in my
findings, so I will appreciate if somebody will double
check.

Suppose:
A:->usb_serial_disconnect
A: -> usb_serial_put (serial);
A: -> kref_put
A: if ((atomic_read(&kref->refcount) == 1)
Suppose refcount is 1
A: -> release -> destroy_serial

B: -> serial_open
B: -> usb_serial_get_by_index
B: serial = serial_table[index]
B: -> kref_get(&serial->kref);

A: -> return_serial(serial);
A: serial_table[serial->minor + i] = NULL;
A: -> kfree (serial);

B: continue to use serial, which was already freed.

So, I am missing something or the USB serial driver is
broken?

As I understand it, the correct use of ref counted
pointers it to increment
ref count of an object for each outstanding pointer to
this object. But
usb-serial.c keeps one or more pointers to usb_serial
in serial_table, and
does not increments the counter for any of them!

Thank you
John




__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


2006-12-19 20:15:48

by Oliver Neukum

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

Am Dienstag, 19. Dezember 2006 20:21 schrieb J:
> Hi,
> I read usb-serial.c code (in 2.6.19) and I cannot
> figure out how it is
> supposed to prevent race condition and premature
> deletion of usb_serial
> structure. I see that the code attempts to protect
> usb_serial by ref
> counting, but it does not appear to be correct. I am
> not 100% sure in my
> findings, so I will appreciate if somebody will double
> check.

This code depends on protection from BKL.

Regards
Oliver

2006-12-19 22:33:11

by John

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

Thank you for the response.

> This code depends on protection from BKL.

Really? I cannot find many lock_kernel calls in
USB directory and those, which I can find,
don't appear to protect usb_serial_disconnect
and serial_close from being called at the same time.

May be the protection is at a higher level?
Personally I don't beleive it.
If you know how this thing is supposed to work,
please, tell me.

Thank you
John

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

2006-12-20 09:45:39

by Oliver Neukum

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

Am Dienstag, 19. Dezember 2006 23:33 schrieb J:
> Thank you for the response.
>
> > This code depends on protection from BKL.
>
> Really? I cannot find many lock_kernel calls in
> USB directory and those, which I can find,
> don't appear to protect usb_serial_disconnect
> and serial_close from being called at the same time.

serial_close is safe because serial_disconnect lowers the refcount
by one. usb_serial_probe() and usb_serial_open() both increment
the refcount; the former implicitly.

> May be the protection is at a higher level?
> Personally I don't beleive it.
> If you know how this thing is supposed to work,
> please, tell me.

The data structure to protect is serial_table. Everything else is
protected by refcounts. Therefore the interesting race is between
open and disconnect. Open is called with BKL (fs/char_dev.c::chrdev_open)

Now, regarding disconnect. It used to be called with BKL held. I haven't been
able to verify that this is still the case. If not, then there's a race.

In addition usb_serial_probe() uses get_free_serial() early in the process
before the device is ready. Without BKL, this too, races with open.

People, do we take BKL in khubd?

Regards
Oliver

2006-12-20 15:17:09

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-usb-devel] Possible race condition in usb-serial.c

On Wed, 20 Dec 2006, Oliver Neukum wrote:

> The data structure to protect is serial_table. Everything else is
> protected by refcounts. Therefore the interesting race is between
> open and disconnect. Open is called with BKL (fs/char_dev.c::chrdev_open)
>
> Now, regarding disconnect. It used to be called with BKL held. I haven't been
> able to verify that this is still the case. If not, then there's a race.
>
> In addition usb_serial_probe() uses get_free_serial() early in the process
> before the device is ready. Without BKL, this too, races with open.
>
> People, do we take BKL in khubd?

Nope.

Alan Stern

2006-12-20 19:32:33

by John

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

Thank you for the explanation.

> serial_close is safe because serial_disconnect
> lowers the refcount

Sorry, I meant serial_open, as in my original example.

I am currently trying to fix a legacy 2.4 based USB
driver and I am having various races,
serial_open/usb_serial_disconnect is the most lively.
I am not asking your help in fixing this old 2.4 junk
(in fact I already fixed it using a global semaphore
to protect serial_table).

But I still want to understand how the latest and
greatest 2.6 driver is supposed to work so I can
adopt some of the changes. At first I thought that
the ref-counting will help, but then found that
it does not fix much! The race is as lively
as ever.

Also I found that BKL/lock_kernel is compiled out in
my configuration because it is not an SMP build.

I see that in 2.6 BKL/lock_kernel are also optional
for non-SMP builds. Is it true?
If yes, then again, how this is supposed to work
and avoid races?


Thank you
John


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

2006-12-20 20:42:13

by Oliver Neukum

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

Am Mittwoch, 20. Dezember 2006 20:32 schrieb J:

> I am currently trying to fix a legacy 2.4 based USB
> driver and I am having various races,
> serial_open/usb_serial_disconnect is the most lively.
> I am not asking your help in fixing this old 2.4 junk
> (in fact I already fixed it using a global semaphore
> to protect serial_table).

Please send in a patch for 2.4. It's very important to have a
very reliable ultraconservative tested kernel available.

> But I still want to understand how the latest and
> greatest 2.6 driver is supposed to work so I can
> adopt some of the changes. At first I thought that
> the ref-counting will help, but then found that
> it does not fix much! The race is as lively
> as ever.

I suppose the last time I looked at that code, khubd still took
BKL. Or I overlooked the race. I have no serial devices since
the cell phone broke.

> Also I found that BKL/lock_kernel is compiled out in
> my configuration because it is not an SMP build.
>
> I see that in 2.6 BKL/lock_kernel are also optional
> for non-SMP builds. Is it true?
> If yes, then again, how this is supposed to work
> and avoid races?

Look closer. If you build with preemption, taking BKL disables preemption.
BKL is effective only until you schedule. On UP, without preemption
ordinary kernel code will not reenter. You need no lock that doesn't guard
against interrupts unless you schedule under these narrow conditions.

Could you test the attached patch against 2.6?

Regards
Oliver


Attachments:
(No filename) (1.46 kB)
usbserialtablerace.diff (2.50 kB)
Download all attachments

2006-12-20 20:44:27

by Greg KH

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

On Wed, Dec 20, 2006 at 11:32:31AM -0800, J wrote:
> Thank you for the explanation.
>
> > serial_close is safe because serial_disconnect
> > lowers the refcount
>
> Sorry, I meant serial_open, as in my original example.
>
> I am currently trying to fix a legacy 2.4 based USB
> driver and I am having various races,
> serial_open/usb_serial_disconnect is the most lively.
> I am not asking your help in fixing this old 2.4 junk
> (in fact I already fixed it using a global semaphore
> to protect serial_table).

Which usb-serial driver are you having problems with? What is the oops
trace? What version of the 2.4 kernel are you using?

And why are you taking the linux-usb-devel list out of the cc:? :)

thanks,

greg k-h

2006-12-20 21:00:47

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-usb-devel] Possible race condition in usb-serial.c

Am Mittwoch, 20. Dezember 2006 16:10 schrieb Alan Stern:
> On Wed, 20 Dec 2006, Oliver Neukum wrote:

> > People, do we take BKL in khubd?
>
> Nope.

OK. I've taken measures to correct the problem.

Regards
Oliver

2006-12-20 22:24:28

by John

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

> Please send in a patch for 2.4. It's very important
> to have a
> very reliable ultraconservative tested kernel
> available.

I will try later. I am new to Linux driver development
and never submitted any patches before.
Also I am not yet 100% sure about the correct way
to solve this issue. I will have to study more.

> I suppose the last time I looked at that code, khubd
> still took BKL. Or I overlooked the race.

Well, as I already mentioned, BKL/lock_kernel appears
to be completely compiled out of my build, so it is
no help for me.
Also, in my opinion, if usb-serial.c depends on
other modules for synchronization, it should have
a big warning in large font about this, otherwise
bugs are doomed to happen. Or, better yet,
it should have it's own proper synchronization
(as we are trying to come up with now).


> Look closer. If you build with preemption, taking
> BKL disables preemption.
> BKL is effective only until you schedule. On UP,
> without preemption
> ordinary kernel code will not reenter. You need no
> lock that doesn't guard
> against interrupts unless you schedule under these
> narrow conditions.

In my old 2.4 build CONFIG_SMP is not defined and,
therefore, lock_kernel is compiled out.
I am not sure about the definition of
CONFIG_LOCK_KERNEL in 2.6.
I tried googling for it, but it brings tons of junk.

> Could you test the attached patch against 2.6?

Alas, I only have an old 2.4 right now.
May be I will install 2.6 later (in few weeks).
Currently I am just trying to read 2.6 code with my
eyes.

I studied the patch, which you sent.
I don't see obvious errors, but, in my opinion, it is
not completely perfect.

This attempt to mix ref counting and mutex just does
not look right. For example, in few places you
wrap call to usb_serial_put (which decrements ref
count) with the mutex:

mutex_lock(&table_lock);
usb_serial_put(port->serial);
mutex_unlock(&table_lock);

But in other cases it is called without any mutex.

Also there is a theoretical possibility of a deadlock
because some external functions are called when
the mutex is held
(such as serial->type->shutdown, device_unregister,
usb_put_dev, tty_hangup, etc).

What if these functions will call some TTY routines,
which will somehow synchronize with serial_open?
So, for example, usb_serial_disconnect will wait for
tty_hangup, which will wait for serial_open,
which waits for usb_serial_disconnect.
Now, I see that in the current 2.6 tty_hangup simply
calls schedule_work, but it may change in the future.
And I don't have time to examine what various shutdown
routines in all the driver do.


So, I don't see a quick solution for this.
I guess, a cleaner way would be to modify the code,
which creates and deletes usb_serial structure.
In particular, currently usb_serial_probe is written
as:
create_serial
...
get_free_serial (which inserts serial to serial_table)
...
initializes serial

As the result a partially initialized serial is
inserted into the table. As the result you had to
lock almost the whole body of usb_serial_probe with
table_lock, which may lead to deadlocks.

It would be cleaner to allocate and fully initialize
usb_serial before inserting it into the table in
one quick call under mutex (or using atomics).

However, I am not very familiar yet with this code,
so I may be wrong.

John


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

2006-12-20 22:39:41

by John

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

Thank you for your reply.

> Which usb-serial driver are you having problems
> with? What is the oops trace?
> What version of the 2.4 kernel are you using?

I was told to fix an old embedded device, which my
company bought from somebody many years ago.
It appears to have kernel 2.4.9 and a patched version
of visor.c.

However, I would't bother you with my antiquated
setup. I posted this message only when I started
to think that 2.6 may have race conditions as well.

And I don't have any oops. All I have is a monitor
with 30 lines worse of printk messages. This is
enough,
however, to discover races if printk are in the right
places.

> And why are you taking the linux-usb-devel list out
> of the cc:? :)

This is because it is my first post on linux.kernel.
I spent time reading FAQs on http://www.tux.org/lkml/.
It wastes many pages explaining netiquette, but does
not really explain how to post other then
"send to [email protected]". That's what I
did.

Thank you
John

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

2006-12-20 22:53:22

by Greg KH

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

On Wed, Dec 20, 2006 at 02:39:39PM -0800, J wrote:
> Thank you for your reply.
>
> > Which usb-serial driver are you having problems
> > with? What is the oops trace?
> > What version of the 2.4 kernel are you using?
>
> I was told to fix an old embedded device, which my
> company bought from somebody many years ago.
> It appears to have kernel 2.4.9 and a patched version
> of visor.c.

Ugh, old versions of the visor driver had lots of races on the
disconnect path. Good luck trying to fix all of them, it took until the
2.6 kernel to get all of the reported oopses fixed.

greg k-h

2006-12-22 18:12:54

by Oliver Neukum

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

Am Mittwoch, 20. Dezember 2006 23:24 schrieb J:
> > Could you test the attached patch against 2.6?
>
> Alas, I only have an old 2.4 right now.
> May be I will install 2.6 later (in few weeks).
> Currently I am just trying to read 2.6 code with my
> eyes.
>
> I studied the patch, which you sent.
> I don't see obvious errors, but, in my opinion, it is
> not completely perfect.

Please disregard the patch. I overlooked serial_read_proc().
This problem will need some deeper surgery probably involving
removal of the refcounting.

But not during Christmas.

Regards
Oliver

2006-12-22 19:08:03

by John

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

> This problem will need some deeper surgery probably
> involving
> removal of the refcounting.

Refcounting may be OK if used consistently.
It is not OK when some pointers are ref-counted,
but other (in serial_table) are not (like it is
in the current version).

As for the deeper surgery, what do you think about my
earlier suggestion to start by rewriting
usb_serial_probe
to fully initialize usb_serial before it is added to
serial_table?

So, instead of the current:
1. create_serial
...
2. mutex_lock(&table_lock);
3. get_free_serial (which inserts serial to
serial_table)
...
4. initializes serial
5. mutex_unlock(&table_lock);

we will get:

1. create_serial
2. initializes serial

3. add_serial_toserial_table (with internal mutex
lock if needed)

Similar approach should be used in other places to
minimize the code executed under the mutex.

John

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

2006-12-22 19:58:07

by Oliver Neukum

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

Am Freitag, 22. Dezember 2006 20:08 schrieb J:
> > This problem will need some deeper surgery probably
> > involving
> > removal of the refcounting.
>
> Refcounting may be OK if used consistently.
> It is not OK when some pointers are ref-counted,
> but other (in serial_table) are not (like it is
> in the current version).

No, this is a fundamental problem. You don't refcount
a pointer, you refcount a data structure. But this is
insufficient. We need to make sure the pointer points to valid
memory.
The problem with the current scheme is that serial_table
needs a lock. It needs to be taken in four places
- disconnect()
- open()
- probe()
- read_proc()

Refcounting solves only the race between disconnect() and close()
There's little use in a second locking mechanism if you use it
only in a minority of occasions.
Refcounting is a great idea if the number of references follows
a clear up -> maximum -> down -> free scheme, like for
skbs, etc..

>
> As for the deeper surgery, what do you think about my
> earlier suggestion to start by rewriting
> usb_serial_probe
> to fully initialize usb_serial before it is added to
> serial_table?

Good suggestion. However, if done right, we'd go for a spin lock.

Regards
Oliver

2006-12-22 20:51:58

by John

[permalink] [raw]
Subject: Re: Possible race condition in usb-serial.c

> No, this is a fundamental problem. You don't
> refcount
> a pointer, you refcount a data structure.
> But this is insufficient. We need to make
> sure the pointer points to valid memory.

I understand. But a typical definition of ref-count
requires the count in the data structure to be
equal to the number of outstanding pointers to this
data structure.
Every time we create a new pointer, the ref count
should be incremented. When pointer is erased, count
is decremented.
This is what I meant as "ref counting a pointer".
If we follow this rule, then each pointer will
always point to a valid memory.

So, if we apply ref counting rules consistently,
then each pointer in serial_table should be
ref counted. This will completely break the current
code, which erases serial_table from destroy_serial,
which is called only when the ref count goes to 0,
which will never happen if serial_table is ref
counted.
However, this can be fixed if usb_serial_disconnect
will erase pointers in serial_table before
calling usb_serial_put.

Now, I am not yet 100% convinced that ref counting
will, indeed, work. Atomics are known to have
problems on SMP CPUs, which can reorder operations.
But I would not discard atomics yet.
Global mutex is go ugly.

John


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com