2003-07-07 18:52:53

by Andrey Borzenkov

[permalink] [raw]
Subject: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch

On Monday 07 July 2003 04:54, you wrote:
> Actually, don't bother. This idea can be made to work, but
> we already have enough tricky stuff in the wait/wakeup area.
>
> Let's run with your original patch.
>

I finally hit a painfully trivial way to reproduce another long standing devfs
problem - deadlock between devfs_lookup and devfs_d_revalidate_wait. When
devfs_lookup releases directory i_sem devfs_d_revalidate_wait grabs it (it
happens not for every path) and goes to wait to be waked up. Unfortunately,
devfs_lookup attempts to acquire directory i_sem before ever waking it up ...

To reproduce (2.5.74 UP or SMP - does not matter, single CPU system)

ls /dev/foo & rm -f /dev/foo &

or possibly in a loop but then it easily fills up process table. In my case it
hangs 100% reliably - on 2.5 OR 2.4.

The current fix is to move re-acquire of i_sem after all
devfs_d_revalidate_wait waiters have been waked up. Much better fix would be
to ensure that ->d_revalidate either is always called under i_sem or always
without. But that means the very heart of VFS and I do not dare to touch it.

The fix has been tested on 2.4 (and is part of unofficial Mandrake Club
kernel); I expected the same bug is in 2.5; I just was stupid not seeing the
way to reproduce it before.

Attached is combined patch and fix for deadlock only (to show it alone).
Andrew, I slightly polished original stack corruption version to look more
consistent with the rest of devfs; also removed NULL pointer checks - let it
just BUG in this case if it happens.

I have already sent the patch for 2.4 two times - please, could somebody
finally either apply it or explain what is wrong with it. Richard is out of
reach apparently and the bug is real and seen by many people.

regards

-andrey


Attachments:
(No filename) (1.74 kB)
2.5.74-devfs_combined.patch (4.11 kB)
2.5.74-devfs_lookup_deadlock.patch (829.00 B)
Download all attachments

2003-07-07 20:51:47

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch

Andrey Borzenkov <[email protected]> wrote:
>
> I finally hit a painfully trivial way to reproduce another long standing devfs
> problem - deadlock between devfs_lookup and devfs_d_revalidate_wait.

uh.

> The current fix is to move re-acquire of i_sem after all
> devfs_d_revalidate_wait waiters have been waked up.

Directory modifications appear to be under write_lock(&dir->u.dir.lock); so
that obvious problem is covered. Your patch might introduce a race around
_devfs_get_vfs_inode() - two CPUs running that against the same inode at
the same time?

> Andrew, I slightly polished original stack corruption version to look more
> consistent with the rest of devfs;

I think it's Lindent time.


2003-07-07 21:27:10

by Pavel Roskin

[permalink] [raw]
Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch

On Mon, 7 Jul 2003, Andrey Borzenkov wrote:

> To reproduce (2.5.74 UP or SMP - does not matter, single CPU system)
>
> ls /dev/foo & rm -f /dev/foo &
>
> or possibly in a loop but then it easily fills up process table. In my case it
> hangs 100% reliably - on 2.5 OR 2.4.

I've done some testing too. I was using this script:

while :; do ls /dev/foo & rm -f /dev/foo & done

On Linux 2.5.74-bk5 without patches, running this script on a local
virtual console makes the system very slow. I could not even login on
another virtual console. However, I could interrupt the script by Ctrl-C.
After that I had a large number of processes in the D state. Here's an
excerpt from the output of "ps ax":

31011 vc/1 D 0:00 ls /dev/foo
31012 vc/1 D 0:00 rm -f /dev/foo
31013 vc/1 D 0:00 ls /dev/foo
31014 vc/1 D 0:00 rm -f /dev/foo

I couldn't reboot the system cleanly. My guess is that no new processes
could be run.

Linux 2.5.74-bk5 with the "combined" patch is OK. Running the test
script doesn't prevent new logins. There are no hanging processes after
interrupting the script.

> I have already sent the patch for 2.4 two times - please, could somebody
> finally either apply it or explain what is wrong with it. Richard is out of
> reach apparently and the bug is real and seen by many people.

I confirm seeing the bug on two systems with recent 2.4.x kernels with
probability over 50%. Upgrading both systems to 2.5.x cured the problem
(of course, it's just a race condition that stopped happening).

Yes, it's an important problem to fix, and maybe we could remove the
"experimental" mark from CONFIG_DEVFS once it's done. Maybe it's better
to have the patch accepted in the 2.5 series first just for "methodical"
reasons, but in practical terms, it's 2.4 kernels that need the deadlock
fix very badly.

--
Regards,
Pavel Roskin

2003-07-08 17:44:54

by Andrey Borzenkov

[permalink] [raw]
Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch

On Tuesday 08 July 2003 01:00, Andrew Morton wrote:
> Andrey Borzenkov <[email protected]> wrote:
> > I finally hit a painfully trivial way to reproduce another long standing
> > devfs problem - deadlock between devfs_lookup and
> > devfs_d_revalidate_wait.
>
> uh.
>
> > The current fix is to move re-acquire of i_sem after all
> > devfs_d_revalidate_wait waiters have been waked up.
>
> Directory modifications appear to be under write_lock(&dir->u.dir.lock); so
> that obvious problem is covered. Your patch might introduce a race around
> _devfs_get_vfs_inode() - two CPUs running that against the same inode at
> the same time?
>

Actually it just makes it marginally more probable.

Normal open without O_CREATE runs ->d_revalidate outside of i_sem i.e. neither
devfs_lookup vs. devfs_d_revalidate_wait nor devfs_d_revalidate_wait vs.
itself are protected by i_sem and this is (should be) the most common case
for /dev access.

This race happens under non-trivial conditions. devfsd descendant (i.e. some
action) should be involved; and action triggered by devfs_lookup does not
race with it by definition because devfs_lookup waits for action to finish.
I.e. it needs another devfsd action that would access /dev entry after it
just has been created or two concurrent lookups in LOOKUP action itself.
Quite unlikely in real life and race window is very small.

I do not want to sound like it has to be ignored - but devfs code is so messy
that no trivial fix exists that would not make code even more messy. So I
would still apply original fixes and let this problem be solved later - it is
not so important as to delay two other.

-andrey

2003-07-09 01:05:33

by Herbert Poetzl

[permalink] [raw]
Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch

On Tue, Jul 08, 2003 at 09:49:17PM +0400, Andrey Borzenkov wrote:
>
> I do not want to sound like it has to be ignored - but devfs code is so messy
> that no trivial fix exists that would not make code even more messy. So I

sorry to interrupt, but wasn't there an ongoing
efford to replace the devfs with smalldevfs or
something even better? *hint*

please ignore me, if I'm talking nonsense ...

best,
Herbert

> would still apply original fixes and let this problem be solved later - it is
> not so important as to delay two other.
>
> -andrey

2003-07-09 01:12:11

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch

Herbert Poetzl <[email protected]> wrote:
>
> On Tue, Jul 08, 2003 at 09:49:17PM +0400, Andrey Borzenkov wrote:
> >
> > I do not want to sound like it has to be ignored - but devfs code is so messy
> > that no trivial fix exists that would not make code even more messy. So I
>
> sorry to interrupt, but wasn't there an ongoing
> efford to replace the devfs with smalldevfs or
> something even better? *hint*
>

Yes, but

a) It didn't have a compatible solution for the legacy device names
(/dev/hda, etc). Could have been fixed up in userspace but the work was
not done.

b) Certain parties youknowwhoyouare seem to have been stricken by smalldevfs
amnesia.

I'm hoping that smalldevfs comes back. The current thing is a running
sore.

2003-07-09 01:55:07

by Herbert Poetzl

[permalink] [raw]
Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch

On Tue, Jul 08, 2003 at 06:26:20PM -0700, Andrew Morton wrote:
> Herbert Poetzl <[email protected]> wrote:
> >
> > On Tue, Jul 08, 2003 at 09:49:17PM +0400, Andrey Borzenkov wrote:
> > >
> > > I do not want to sound like it has to be ignored -
> > > but devfs code is so messy that no trivial fix exists
> > > that would not make code even more messy.
> >
> > sorry to interrupt, but wasn't there an ongoing
> > efford to replace the devfs with smalldevfs or
> > something even better? *hint*
> >
>
> Yes, but
>
> a) It didn't have a compatible solution for the legacy device
> names (/dev/hda, etc). Could have been fixed up in userspace
> but the work was not done.

I might be totally wrong, as I can only speak
for 2.4 (which has no ongoing/forgotten smalldevfs
efford ;), but devfs has definitely divided the
users into two groups (think religion/war) ...

the group using devfs, usually doesn't care about
the 'compatibility' issue, the other doesn't care
at all ... so this isn't an issue at all ...

> b) Certain parties youknowwhoyouare seem to have been stricken
> by smalldevfs amnesia.

maybe this helps youknowwhoyoumean to remember ...

> I'm hoping that smalldevfs comes back.
> The current thing is a running sore.

I'm hoping too, and I would like to see it on
2.6 as well as 2.4 ...

using 2.4 I'm currently bound to devfs, as I'm
one of the pro-devfs guys, and I think Richard
Gooch did a great work with it ... (maybe a
little too much work actually ;)

best,
Herbert

>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2003-07-09 10:22:51

by Thierry Vignaud

[permalink] [raw]
Subject: Re: [PATCH][2.5.74] devfs lookup deadlock/stack corruption combined patch

Herbert Poetzl <[email protected]> writes:

> the group using devfs, usually doesn't care about
> the 'compatibility' issue,

i disagree: distro vendors wants that compatibility.
we want both the features of devfsd+hotplug+dynamic and the
compatibility devices.

> > I'm hoping that smalldevfs comes back.
> > The current thing is a running sore.
>
> I'm hoping too, and I would like to see it on 2.6 as well as 2.4 ...

there's been quite some cleanups on the 2.5.x side.
maybe can we complete this work before thinking about a new system ?
(remember the kbuild2 destiny ...)

if we do not want to create a compatibility layer such as devfsd for
smalldevfs, we can still alter drivers to create /dev/hda3 instead of
/dev/ide/host0/bus0/target0/lun0/part3 and the like

withouth any compatibility, smalldevfs will be a pain in th *ss for
distro vendors.