2002-01-17 17:13:04

by Lawrence Walton

Subject: DEVFS broken?

I am not sure how to debug this, but it appears that
in 2.5.3-pre1 and in 2.5.2-dj1 DEVFS is not working.
It started with terminals hanging and the system being unable to
shut down.
I went to /dev/ and ran ls; it completely hangs that
terminal and I cannot kill ls.
I have devfsd version 1.3.21 from Debian.

--
*--* Mail: [email protected]
*--* Voice: 425.739.4247
*--* Fax: 425.827.9577
*--* HTTP://www.otak-k.com/~lawrence/
--------------------------------------
- - - - - - O t a k i n c . - - - - -



2002-01-17 19:05:23

by Dan Chen

Subject: Re: DEVFS broken?

Also using Debian sid here (devfsd 1.3.21-1). Over the past two days
I've seen random nasties with devfs-v199.6 and v199.7 (I backed out
v199.7 in my local tree because my machine refuses to finish booting
otherwise). Machine: VIA VT82C693A/694x, PIII/1GHz SMP, 1GB HIGHMEM
enabled.

What follows is a series of crashes culled from /var/log/kern.log
(apologies regarding the format). Before each I'll explain what I did to
possibly invoke it. All are with devfs-v199.6 and devfsd-1.3.21 running
2.4.18-pre4 + ext3-2.4-0.9.17-2418p3 + ide.2.4.16.12102001:

# modprobe aic7xxx //I tried to start an Eterm in X 4.1.0.1 afterward
-- snip --
Jan 16 22:54:50 opeth kernel: invalid operand: 0000
Jan 16 22:54:50 opeth kernel: CPU: 0
Jan 16 22:54:50 opeth kernel: EIP: 0010:[d_instantiate+17/68] Tainted: P
Jan 16 22:54:50 opeth kernel: EFLAGS: 00010202
Jan 16 22:54:50 opeth kernel: eax: 5a5a5a00 ebx: f6b5f560 ecx: f7aa1b80 edx: f6b5f590
Jan 16 22:54:50 opeth kernel: esi: f7aa1b80 edi: f6b5f560 ebp: f7a7f840 esp: f7a41f18
Jan 16 22:54:50 opeth kernel: ds: 0018 es: 0018 ss: 0018
Jan 16 22:54:50 opeth kernel: Process devfsd (pid: 26, stackpage=f7a41000)
Jan 16 22:54:50 opeth kernel: Stack: f77d9460 c0179b13 f6b5f560 f7aa1b80 f6b5f560 00000000 f7a41fa4 f7a83620
Jan 16 22:54:50 opeth kernel: c01409be f6b5f560 00000000 f7a41f74 c0141181 f7a83620 f7a41f74 00000000
Jan 16 22:54:50 opeth kernel: f60d4000 00000000 f7a41fa4 00000009 00000009 f60d4005 00000000 f60d4004
Jan 16 22:54:50 opeth kernel: Call Trace: [devfs_d_revalidate_wait+231/276] [cached_lookup+46/84] [link_path_walk+1409/2016] [path_walk+26/28] [__user_walk+53/80]
Jan 16 22:54:50 opeth kernel: [sys_stat64+25/112] [sys_read+188/196] [system_call+51/56]
Jan 16 22:54:50 opeth kernel:
Jan 16 22:54:50 opeth kernel: Code: 0f 0b f0 fe 0d a0 66 2d c0 0f 88 43 09 00 00 85 c9 74 12 8b
Jan 16 23:09:25 opeth kernel: <7>VFS: Disk change detected on device ide1(22,0)
Jan 16 23:55:23 opeth kernel: scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.4
Jan 16 23:55:23 opeth kernel: <Adaptec aic7860 Ultra SCSI adapter>
Jan 16 23:55:23 opeth kernel: aic7860: Ultra Single Channel A, SCSI Id=7, 3/253 SCBs
Jan 16 23:55:23 opeth kernel:
Jan 16 23:55:39 opeth kernel: Vendor: PLEXTOR Model: CD-ROM PX-40TS Rev: 1.04
Jan 16 23:55:39 opeth kernel: Type: CD-ROM ANSI SCSI revision: 02
Jan 16 23:55:39 opeth kernel: Attached scsi CD-ROM sr0 at scsi0, channel 0, id 3, lun 0
Jan 16 23:55:39 opeth kernel: (scsi0:A:3): 20.000MB/s transfers (20.000MHz, offset 15)
Jan 16 23:55:39 opeth kernel: sr0: scsi-1 drive
Jan 16 23:59:45 opeth kernel: invalid operand: 0000
Jan 16 23:59:45 opeth kernel: CPU: 1
Jan 16 23:59:45 opeth kernel: EIP: 0010:[d_instantiate+17/68] Tainted: P
Jan 16 23:59:45 opeth kernel: EFLAGS: 00010202
Jan 16 23:59:45 opeth kernel: eax: 5a5a5a00 ebx: e7304660 ecx: e8d92e00 edx: e7304690
Jan 16 23:59:45 opeth kernel: esi: f742f520 edi: f7ea9140 ebp: e7304660 esp: ea08deb4
Jan 16 23:59:45 opeth kernel: ds: 0018 es: 0018 ss: 0018
Jan 16 23:59:45 opeth kernel: Process Eterm (pid: 4924, stackpage=ea08d000)
Jan 16 23:59:45 opeth kernel: Stack: e8d92e00 c0179d4e e7304660 e8d92e00 fffffff4 f7a7f840 ea08c000 e7304660
Jan 16 23:59:45 opeth kernel: f7ea914c c0287c80 f3b1b600 c0126938 f6fddb60 f3b1b600 4038dc00 00000000
Jan 16 23:59:45 opeth kernel: 00000246 00000000 f7a7f840 f7a7f8a8 f7a83620 00000246 c0149613 c200f288
Jan 16 23:59:45 opeth kernel: Call Trace: [devfs_lookup+526/600] [handle_mm_fault+92/184] [d_alloc+27/376] [real_lookup+122/264] [link_path_walk+1431/2016]
Jan 16 23:59:45 opeth kernel: [path_walk+26/28] [__user_walk+53/80] [sys_stat64+25/112] [sys_ioctl+490/497] [system_call+51/56]
Jan 16 23:59:45 opeth kernel:
Jan 16 23:59:45 opeth kernel: Code: 0f 0b f0 fe 0d a0 66 2d c0 0f 88 43 09 00 00 85 c9 74 12 8b
-- snip --

//this one occurred during boot
-- snip --
Jan 17 00:07:47 opeth kernel: invalid operand: 0000
Jan 17 00:07:47 opeth kernel: CPU: 0
Jan 17 00:07:47 opeth kernel: EIP: 0010:[d_instantiate+17/68] Tainted: P
Jan 17 00:07:47 opeth kernel: EFLAGS: 00010287
Jan 17 00:07:47 opeth kernel: eax: 5a5a5a00 ebx: f7228540 ecx: f7215060 edx: f7228570
Jan 17 00:07:47 opeth kernel: esi: f7215060 edi: f7228540 ebp: f7a810c0 esp: f7a41f18
Jan 17 00:07:47 opeth kernel: ds: 0018 es: 0018 ss: 0018
Jan 17 00:07:47 opeth kernel: Process devfsd (pid: 26, stackpage=f7a41000)
Jan 17 00:07:47 opeth kernel: Stack: f7252940 c0179b13 f7228540 f7215060 f7228540 00000000 f7a41fa4 f7a82a20
Jan 17 00:07:47 opeth kernel: c01409be f7228540 00000000 f7a41f74 c0141181 f7a82a20 f7a41f74 00000000
Jan 17 00:07:47 opeth kernel: f7854000 00000000 f7a41fa4 00000009 00000009 f7854005 00000000 f7854004
Jan 17 00:07:47 opeth kernel: Call Trace: [devfs_d_revalidate_wait+231/276] [cached_lookup+46/84] [link_path_walk+1409/2016] [path_walk+26/28] [__user_walk+53/80]
Jan 17 00:07:47 opeth kernel: [sys_stat64+25/112] [sys_read+188/196] [system_call+51/56]
Jan 17 00:07:47 opeth kernel:
Jan 17 00:07:47 opeth kernel: Code: 0f 0b f0 fe 0d a0 66 2d c0 0f 88 43 09 00 00 85 c9 74 12 8b
-- snip --

Interestingly enough, I applied rml's preempt-kernel-rml-2.4.18-pre4-1,
recompiled, rebooted, and have yet to see an oops, though I believe it's
just dumb luck thus far. Any ideas?

On Thu, Jan 17, 2002 at 09:12:29AM -0800, Lawrence Walton wrote:
> I am not sure how to debug this, but it appears that
> in 2.5.3-pre1 and in 2.5.2-dj1 DEVFS is not working.
> It started with terminals hanging and the system being unable to
> shut down.
> I went to /dev/ and ran ls; it completely hangs that
> terminal and I cannot kill ls.
> I have devfsd version 1.3.21 from Debian.

--
Dan Chen [email protected]
GPG key: http://www.unc.edu/~crimsun/pubkey.gpg.asc



2002-01-18 10:13:27

by Helge Hafting

Subject: Re: DEVFS broken?

Lawrence Walton wrote:
>
> I am not sure how to debug this, but it appears that
> in 2.5.3-pre1 and in 2.5.2-dj1 DEVFS is not working.
> It started with terminals hanging and the system being unable to
> shut down.
> I went to /dev/ and ran ls; it completely hangs that
> terminal and I cannot kill ls.
> I have devfsd version 1.3.21 from Debian.

I occasionally run into that inability to shut down.
There is an easy fix though:

kill -SIGUSR1 1
This is the documented way of dealing with a
remounted /dev. shutdown, init, and telinit
work normally after that, unless there
are other errors, of course.

Helge Hafting

2002-01-20 20:04:56

by Richard Gooch

Subject: Re: DEVFS broken?

Dan Chen writes:
> Also using Debian sid here (devfsd 1.3.21-1). Over the past two days
> I've seen random nasties with devfs-v199.6 and v199.7 (I backed out
> v199.7 in my local tree because my machine refuses to finish booting
> otherwise). Machine: VIA VT82C693A/694x, PIII/1GHz SMP, 1GB HIGHMEM
> enabled.
>
> What follows is a series of crashes culled from /var/log/kern.log
> (apologies regarding the format). Before each I'll explain what I did to
> possibly invoke it. All are with devfs-v199.6 and devfsd-1.3.21 running
> 2.4.18-pre4 + ext3-2.4-0.9.17-2418p3 + ide.2.4.16.12102001:
>
> # modprobe aic7xxx //I tried to start an Eterm in X 4.1.0.1 afterward
> -- snip --
> Jan 16 22:54:50 opeth kernel: invalid operand: 0000

In future, please just send dmesg output, rather than
/var/log/kern.log output. The former doesn't have all the
date+hostname+" kernel: " crap that syslog puts in.
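
For a log that has already passed through syslog, the prefix can be stripped after the fact. The following is a minimal Python sketch, assuming the classic "Mon DD HH:MM:SS host kernel: " prefix seen in the kern.log lines quoted in this thread; the name strip_syslog_prefix is hypothetical and not part of any tool mentioned here.

```python
import re

# Assumed prefix format: "Jan 16 22:54:50 opeth kernel: " (month, day, time,
# hostname, then the literal "kernel:" tag that syslog adds).
SYSLOG_KERNEL_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s?"
)


def strip_syslog_prefix(line: str) -> str:
    """Return the line without its syslog prefix; other lines pass through unchanged."""
    return SYSLOG_KERNEL_PREFIX.sub("", line, count=1)


if __name__ == "__main__":
    import sys

    # Filter stdin to stdout, e.g. with kern.log piped in.
    for raw in sys.stdin:
        sys.stdout.write(strip_syslog_prefix(raw))
```

This only removes the prefix; it does not rejoin lines that a mail client has wrapped, and it does not decode symbols.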

devfs-patch-v199.6 has a race that causes the Oops. devfs-patch-v199.7
fixes this race, but unfortunately had a silly oversight which could
cause deadlocks under some circumstances (and of course several days
of testing on my box didn't show it:-().

Please apply this patch on top of devfs-patch-v199.7 or on top of
plain 2.5.2, and let me know the result.

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

diff -urN linux-2.5.3-pre2/fs/devfs/base.c linux/fs/devfs/base.c
--- linux-2.5.3-pre2/fs/devfs/base.c Mon Jan 14 10:40:29 2002
+++ linux/fs/devfs/base.c Sun Jan 20 12:09:55 2002
@@ -1,6 +1,6 @@
/* devfs (Device FileSystem) driver.

- Copyright (C) 1998-2001 Richard Gooch
+ Copyright (C) 1998-2002 Richard Gooch

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
@@ -604,6 +604,9 @@
20020113 Richard Gooch <[email protected]>
Fixed (rare, old) race in <devfs_lookup>.
v1.9
+ 20020120 Richard Gooch <[email protected]>
+ Fixed deadlock bug in <devfs_d_revalidate_wait>.
+ v1.10
*/
#include <linux/types.h>
#include <linux/errno.h>
@@ -636,7 +639,7 @@
#include <asm/bitops.h>
#include <asm/atomic.h>

-#define DEVFS_VERSION "1.9 (20020113)"
+#define DEVFS_VERSION "1.10 (20020120)"

#define DEVFS_NAME "devfs"

@@ -2878,13 +2881,16 @@
struct devfs_lookup_struct *lookup_info = dentry->d_fsdata;
DECLARE_WAITQUEUE (wait, current);

- if ( !dentry->d_inode && is_devfsd_or_child (fs_info) )
+ if ( is_devfsd_or_child (fs_info) )
{
devfs_handle_t de = lookup_info->de;
struct inode *inode;

- DPRINTK (DEBUG_I_LOOKUP, "(%s): dentry: %p de: %p by: \"%s\"\n",
- dentry->d_name.name, dentry, de, current->comm);
+ DPRINTK (DEBUG_I_LOOKUP,
+ "(%s): dentry: %p inode: %p de: %p by: \"%s\"\n",
+ dentry->d_name.name, dentry, dentry->d_inode, de,
+ current->comm);
+ if (dentry->d_inode) return 1;
if (de == NULL)
{
read_lock (&parent->u.dir.lock);

2002-01-21 00:00:58

by Dan Chen

Subject: Re: DEVFS broken?

Grrr, I realized that I sent undecoded output instead of the ksymoops I
had here. I'll apply this patch and get back to you.

On Sun, Jan 20, 2002 at 01:04:36PM -0700, Richard Gooch wrote:
> In future, please just send dmesg output, rather than
> /var/log/kern.log output. The former doesn't have all the
> date+hostname+" kernel: " crap that syslog puts in.

--
Dan Chen [email protected]
GPG key: http://www.unc.edu/~crimsun/pubkey.gpg.asc



2002-01-21 00:46:31

by Dan Chen

Subject: Re: DEVFS broken?

Working well so far: booted and no oopses yet. Thanks!

On Sun, Jan 20, 2002 at 01:04:36PM -0700, Richard Gooch wrote:
> Please apply this patch on top of devfs-patch-v199.7 or on top of
> plain 2.5.2, and let me know the result.

--
Dan Chen [email protected]
GPG key: http://www.unc.edu/~crimsun/pubkey.gpg.asc



2002-01-21 17:38:44

by Dan Chen

Subject: Re: DEVFS broken?

Correct.

On Sun, Jan 20, 2002 at 05:50:26PM -0700, Richard Gooch wrote:
> I assume no deadlock either?

--
Dan Chen [email protected]
GPG key: http://www.unc.edu/~crimsun/pubkey.gpg.asc

