2002-08-06 20:31:07

by Alexey Kuznetsov

[permalink] [raw]
Subject: Re: "new style" netdevice allocation patch for TUN driver

Hello!

> I completely agree. However sleeping and holding a lock that you
> don't have to hold

Nope. We have to hold it. Lock are taken to be held. :-)

Anyway, if you found real problem, it would be better if
you explained what is this, instead of proposing some random hacks.

Alexey


2002-08-06 21:05:52

by Max Krasnyansky

[permalink] [raw]
Subject: Re: "new style" netdevice allocation patch for TUN driver


> > I completely agree. However sleeping and holding a lock that you
> > don't have to hold
>
>Nope. We have to hold it. Lock are taken to be held. :-)
Device is already unlinked and is not visible to the rest of the stack.
Please explain to me why do we have to hold rtnl lock while sleeping in
unregister_netdevice ?

>Anyway, if you found real problem, it would be better if
>you explained what is this, instead of proposing some random hacks.
Come on it's not a random hack, there are two problems.

1 - Something is not releasing device during deregistration
I have no idea which subsystem is doing that. I'm not the one
who brought that up. This needs to be fixed somehow. I personally
can't reproduce it.

2 - We're sleeping with the rntl_lock held
This is somewhat unrelated to #1. I don't see the reason why we shouldn't
fix it. If one thing is buggy it doesn't mean that we should halt the
whole stack. And that's exactly what we're do right now. Devices, routes,
etc cannot be added/removed while unregister_netdevice is waiting.

Max

2002-08-06 22:25:22

by Alexey Kuznetsov

[permalink] [raw]
Subject: Re: "new style" netdevice allocation patch for TUN driver

Hello!

> Please explain to me why do we have to hold rtnl lock while sleeping in
> unregister_netdevice ?

... not "while sleeping", but just "while".

If the lock would be for the device, it would be local to the device,
rather than global. The semaphore blocks all the networking configuration
until caller will complete stuff related to unregistering.

Forward references from the unregistered device under cannot be invalidated
before the device is unregistered. All such work is done after unregistry
for "old style" or in dev->uninit/destructor hooks, which are used
by "new style" devices.


> 1 - Something is not releasing device

Let's find this something! It witnesses about really serious bug.

I cannot reproduce this too. Understand, please, we _can_ follow
your proposal, all 100% of old style devices do not need uninit in real
life. But then we never will find the bug. You cannot reproduce it,
I cannot too.


> 2 - We're sleeping with the rntl_lock held

Semaphores are used exaclty when we have to sleep while holding lock.


> fix it. If one thing is buggy it doesn't mean

It means exactly this. Kernel panic would be even better, if it was
possible to detect deadlock reliably.

Alexey

2002-08-07 19:40:31

by Max Krasnyansky

[permalink] [raw]
Subject: Re: "new style" netdevice allocation patch for TUN driver

Alexey,

> > Please explain to me why do we have to hold rtnl lock while sleeping in
> > unregister_netdevice ?
>
>... not "while sleeping", but just "while".
>
>If the lock would be for the device, it would be local to the device,
>rather than global. The semaphore blocks all the networking configuration
>until caller will complete stuff related to unregistering.
>
>Forward references from the unregistered device under cannot be invalidated
>before the device is unregistered. All such work is done after unregistry
>for "old style" or in dev->uninit/destructor hooks, which are used
>by "new style" devices.
All I'm saying is that. Unregistration is effectively complete at the time
we call schedule(). At this point only, so to speak, buggy subsystems are
holding references. Adding new devices and routes won't hurt. Even if
it will only those buggy guys might have problems with that.

> > 1 - Something is not releasing device
>
>Let's find this something! It witnesses about really serious bug.
>
>I cannot reproduce this too. Understand, please, we _can_ follow
>your proposal, all 100% of old style devices do not need uninit in real
>life. But then we never will find the bug. You cannot reproduce it,
>I cannot too.
I'm not following you. How dropping rtnl_lock while calling schedule()
would affect
our bug hunting :) ?
We still emit "waiting for ..." warning message and we still don't give
control
to the driver unless refcount is 0.

> > fix it. If one thing is buggy it doesn't mean
>It means exactly this. Kernel panic would be even better, if it was
>possible to detect deadlock reliably.
There is no deadlock here.

Max