2004-09-28 08:48:26

by Frank Steiner

Subject: Hanging udev process on nfs-mounted /dev

Hi,

I'm cross-posting this to the usb and the nfs lists because I'm not
sure whether it is a udev or an NFS issue. I hope this is OK.

I got two logs of the hanging udev process, which consumes 90%
of the CPU. It seems to be independent of the card reader, in
contrast to what I wrote in the other mail on the usb list, so
I started a new thread about this topic.

The issue:
==========
From time to time some udev process goes mad and consumes almost all
the CPU power, making the whole system terribly slow.

The software:
=============
SuSE 9.1, kernel 2.6.8.1, udev 032, klibc 0.179, sysfsutils 1.1.0,
hotplug 0.44 (the SuSE hotplug package; udev, klibc, sysfsutils and the
kernel have been replaced by newer versions).

My guess:
=========
Maybe something related to NFS? The udev process hangs at or after an
fcntl64() call with F_SETLKW; see the traces below.

We have diskless clients which mount their own /dev over NFS from a
server (not shared with other clients; each has its own /dev). Since
the clients don't have local hard disks, I guess this cannot be done
without NFS, because we maintain some permanent links like
/dev/cdrecorder -> /dev/hdd and so need some persistent filesystem.
I thought about using a ramdisk for /dev on the clients and making udev
set up such links, but I'm not sure whether a ramdisk for /dev is better
than an NFS mount. Would that be worth a try?
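
A minimal sketch of the ramdisk idea, assuming tmpfs serves as the ramdisk and
the permanent links are recreated by a small boot-time helper rather than by
udev itself. The function name, the DEV_LINKS table and the mount options are
hypothetical; only the cdrecorder -> hdd link comes from the setup described
above. Mounting a fresh tmpfs over /dev hides the existing device nodes, so
something like udevstart would still have to repopulate it afterwards.

```shell
#!/bin/bash
# Sketch only: recreate "permanent" /dev links on a freshly mounted tmpfs.
# Assumptions (not from the original setup): the function name, the
# DEV_LINKS table and the tmpfs mount options shown at the bottom.

# Each entry is "link_name:target", both relative to the device directory.
DEV_LINKS="cdrecorder:hdd"

recreate_dev_links() {
    local devdir="$1" entry
    for entry in $DEV_LINKS; do
        # ln -sfn replaces a stale link instead of failing on it.
        ln -sfn "${entry#*:}" "$devdir/${entry%%:*}"
    done
}

# On a real client this would run early in boot, as root, roughly like:
#   mount -t tmpfs -o mode=0755 tmpfs /dev
#   recreate_dev_links /dev
```

Keeping the link table in one variable means the set of links lives in the
boot script (or a file it reads) instead of in the filesystem, which is what
removes the need for a persistent /dev.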

/dev is mounted in /etc/init.d/boot, before any other start script
runs. Thus, it is mounted with "nolock" because at this time no lockd
etc. is running.
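
To confirm how /dev is actually mounted on a client, the mount table can be
inspected. This is a small sketch, assuming the usual /proc/mounts column
layout (device, mount point, filesystem type, options); the function name is
made up, and whether "nolock" is listed among the options may depend on the
kernel version.

```shell
#!/bin/bash
# Sketch: print "fstype options" for the mount that is currently on /dev.
# The optional argument (a mounts file) exists only to make this testable;
# by default /proc/mounts is read.
dev_mount_info() {
    # Later entries override earlier ones, so the last /dev line wins.
    awk '$2 == "/dev" { info = $3 " " $4 }
         END { if (info != "") print info }' "${1:-/proc/mounts}"
}

# Example: dev_mount_info | grep -q nolock && echo "/dev is mounted nolock"
```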

How I got the logs:
===================

I moved /sbin/udev[start] to /sbin/utest/ and
created a script /sbin/udev with

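#!/bin/bash
# Wrapper: run the real binary (moved to /sbin/utest/) under strace, one log per host and PID.
strace -o "/var/log/udev.log.$(uname -n).$$" -f "/sbin/utest/$(basename "$0")" "$@"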

The first two logs are from the /sbin/udevstart call which is initiated
from /etc/init.d/boot.udev. The .hangs log is from the hanging udev process.
It was not killed manually, because we did not have a shell at this point,
so after some time we hard-rebooted the computer.
The second log (.succeeds) is from the same computer, 5 boots later. So the
error does not occur every time.

The other two logs are from a call "/sbin/udev block" initiated by
/etc/hotplug/block.agent when calling "pktsetup mycd /dev/cdrecorder"
manually. Here we killed the process after 15 minutes as you can see.

The logs are almost identical; there are just some differences w.r.t. locking.
E.g., the hanging pktsetup process issues some
fcntl64(0, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=4, len=1}) = -1 EAGAIN (Resource temporarily unavailable).

The main difference is at the end: the hanging process hangs after an F_SETLKW call:

udevstart.hangs:
================
1073 unlink("/dev/mapper/control") = 0
1073 symlink("../device-mapper", "/dev/mapper/control") = 0
1073 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=332, len=1}) = 0
1073 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
<hangs forever>

udevstart.succeeds:
===================
1073 unlink("/dev/mapper/control") = 0
1073 symlink("../device-mapper", "/dev/mapper/control") = 0
1073 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=332, len=1}) = 0
1073 fcntl64(0, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=332, len=1}) = 0
1073 open("/etc/dev.d/device-mapper", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
1073 open("/etc/dev.d/misc", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
1073 open("/etc/dev.d/default", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
1073 munmap(0x4001a000, 16384) = 0
1073 close(0) = 0
1073 exit_group(0) = ?

udev.pktsetup.hangs:
====================
lstat64("/sys/block/pktcdvd0/device", 0xbffff37c) = -1 ENOENT (No such file or directory)
time(NULL) = 1095859917
fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=424, len=1}) = 0
fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) --- < we killed it here >
munmap(0x4001a000, 114688) = 0
close(0) = 0
exit_group(35) = ?


udev.pktsetup.succeeds:
=======================
3161 chmod("/dev/pktcdvd0", 060600) = 0
3161 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=424, len=1}) = 0
3161 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
3161 fcntl64(0, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
3161 fcntl64(0, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=424, len=1}) = 0
3161 open("/etc/dev.d/pktcdvd0", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
3161 open("/etc/dev.d/block", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
3161 open("/etc/dev.d/default", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
3161 munmap(0x4001a000, 32768) = 0
3161 close(0) = 0
3161 exit_group(0) = ?
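
The comparison above can be automated for the full logs. The following is a
sketch only, assuming the fcntl64 line format shown in the excerpts; the
function name is made up. It prints the offsets of byte-range locks that a
log shows as successfully taken but never released.

```shell
#!/bin/bash
# Sketch: list lock offsets that an strace log shows as acquired (return
# value 0) but never unlocked. Calls that did not return 0 (EAGAIN,
# ERESTARTSYS, ...) are ignored because no lock was obtained.
unreleased_locks() {
    awk '
        /fcntl64\(.*F_SETLKW?,/ && / = 0$/ {
            if (match($0, /start=[0-9]+/)) {
                # "start=" is 6 characters long; keep only the number.
                off = substr($0, RSTART + 6, RLENGTH - 6)
                if ($0 ~ /type=F_WRLCK/ || $0 ~ /type=F_RDLCK/)
                    held[off] = 1
                else if ($0 ~ /type=F_UNLCK/)
                    delete held[off]
            }
        }
        END { for (off in held) print off }
    ' "$1" | sort -n
}

# Usage: unreleased_locks /var/log/udev.log.<host>.<pid>
```

Run on the excerpts above, this reports 164 and 332 for udevstart.hangs, 424
for udev.pktsetup.hangs, and nothing for the two successful runs.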

Could that be a problem related to /dev mounted with "nolock" via NFS,
or just some bug in NFS?

I didn't send the full logs because they are quite long. They are here if
someone needs to take a look:
http://www.bio.ifi.lmu.de/~steiner/udevstart.hangs
http://www.bio.ifi.lmu.de/~steiner/udevstart.succeeds
http://www.bio.ifi.lmu.de/~steiner/udev.pktsetup.hangs
http://www.bio.ifi.lmu.de/~steiner/udev.pktsetup.succeeds

I greatly appreciate any hints, because this problem hits our hosts
quite often and users cannot kill this udev process, so they have
to find some admin to kill it :-(

cu,
Frank


--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


-------------------------------------------------------
This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
Project Admins to receive an Apple iPod Mini FREE for your judgement on
who ports your project to Linux PPC the best. Sponsored by IBM.
Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-09-28 09:15:29

by Frank Steiner

Subject: Re: Hanging udev process on nfs-mounted /dev

Sorry, I made a small mistake: the udevstart.* logs are not the
strace output from /sbin/udevstart, but seem to be initiated from
SuSE's /etc/init.d/boot.device-mapper script, which e.g. fscks the
filesystems etc.
I mention it just so that you don't wonder why the logs look so different
from udevstart output :-)

_______________________________________________
[email protected]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-users

2004-09-28 11:59:16

by Bernd Schubert

Subject: Re: Hanging udev process on nfs-mounted /dev

Hello Frank,

stupid question, udev always uses tmpfs, doesn't it? So that shouldn't be nfs
related at all?

Cheers,
Bernd




2004-09-28 14:36:17

by Frank Steiner

Subject: Re: Hanging udev process on nfs-mounted /dev

Bernd Schubert wrote

> Hello Frank,
>
> stupid question, udev always uses tmpfs, doesn't it?

Hmm, I don't know. On the normal hosts (non diskless), I have tmpfs
mounted at /dev/shm, but /dev itself is on the normal filesystem.
And when I call e.g. "pktsetup mycd /dev/cdrecorder", udev creates
e.g. /dev/pktcdvd0 which did not exist before.
So I guess this is a normal device in the /dev directory and not related
to tmpfs?

However, I now wonder if I couldn't use a tmpfs for /dev for the clients...
hmm...


2004-09-28 15:56:58

by Bernd Schubert

Subject: Re: Hanging udev process on nfs-mounted /dev

On Tuesday 28 September 2004 16:36, Frank Steiner wrote:
> Bernd Schubert wrote
>
> > Hello Frank,
> >
> > stupid question, udev always uses tmpfs, doesn't it?
>
> Hmm, I don't know. On the normal hosts (non diskless), I have tmpfs
> mounted at /dev/shm, but /dev itself is on the normal filesystem.
> And when I call e.g. "pktsetup mycd /dev/cdrecorder", udev creates
> e.g. /dev/pktcdvd0 which did not exist before.
> So I guess this is a normal device in the /dev directory and not related
> to tmpfs?
>
> However, I now wonder if I couldn't use a tmpfs for /dev for the clients...
> hmm...

On debian the udev init script always mounts /dev as tmpfs and as far as I
know this is the intended behaviour from the udev author.


Cheers,
Bernd

-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [email protected]



2004-09-29 06:00:11

by Frank Steiner

Subject: Re: Hanging udev process on nfs-mounted /dev

Bernd Schubert wrote

> On debian the udev init script always mounts /dev as tmpfs and as far as I
> know this is the intended behaviour from the udev author.

Looks like SuSE is not doing this. But actually, it could be a solution
for my problem! Thanks for the hint!
Anyway, it would be interesting to know whether this is an NFS bug or not,
just to avoid stumbling over it later :-)
