2001-03-27 01:59:20

by David E. Weekly

Subject: "mount -o loop" lockup issue

On Linux 2.4.2, running a "mount -o loop" on a file properly created with
"dd if=/dev/zero of=/path/to/my/file.img count=1024" seems to decide to
freeze up my shell (not my system). An strace showed the lockup happening at
the actual system "mount()" call, which never returns.
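
For anyone reproducing the setup, here is a minimal sketch of the image-creation step only. It assumes dd's default 512-byte block size (the report gives no bs=), so count=1024 should yield a 512 KiB file. The temporary path stands in for the elided /path/to/my/file.img, and the mount step is left out because it needs root.

```shell
# Create a zero-filled image like the one in the report.
# Assumption: a temp directory replaces the unspecified /path/to/my/file.img.
img="$(mktemp -d)/file.img"

# No bs= is given, so dd falls back to its default 512-byte blocks.
dd if=/dev/zero of="$img" count=1024 2>/dev/null

# 1024 blocks * 512 bytes = 524288 bytes
size=$(wc -c < "$img")
echo "image size in bytes: $size"
```

An image made this way holds only zeros and no filesystem, so a filesystem would normally be created on it before mounting.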

Since mount() is in glibc, it might be relevant to note that I'm running
Mandrake's glibc 2.1.3-16mdk. I compiled the kernel with a gcc of 2.95.3
[1991030] (although oddly enough this binary seems to have come with the
gcc-2.95.2 RPM and installed itself as /usr/bin/gcc-2.95.2) and binutils
2.10.0.24-4mdk.

I'm very sorry to post to this list, but several people independently told
me that there was a loopback mountpoint deadlocking issue with 2.4.2 and
that I should check here. Of course, this could be a completely retarded
system configuration issue, in which case please shut me up and I'll go away
quietly. But if it is an issue with a known resolution I'd love to hear it -
I wasn't able to find resolution on the web or with several rather
knowledgeable people.

-david weekly [[email protected]]



2001-03-27 03:31:24

by Jason Madden

Subject: Re: "mount -o loop" lockup issue

On Mon, 26 Mar 2001, David E. Weekly wrote:

> On Linux 2.4.2, running a "mount -o loop" on a file properly created with
> "dd if=/dev/zero of=/path/to/my/file.img count=1024" seems to decide to
> freeze up my shell (not my system). An strace showed the lockup happening at
> the actual system "mount()" call, which never returns.
>
> Since mount() is in glibc, it might be relevant to note that I'm running
> Mandrake's glibc 2.1.3-16mdk. I compiled the kernel with a gcc of 2.95.3
> [1991030] (although oddly enough this binary seems to have come with the
> gcc-2.95.2 RPM and installed itself as /usr/bin/gcc-2.95.2) and binutils
> 2.10.0.24-4mdk.

I also experience this problem (using a floppy disk image created by
"dd if=/dev/fd0 of=floppy.img bs=1024" and then "mount -o loop
floppy.img /mnt/floppy") with a different version of glibc (RedHat's
2.1.92-5 rpm) and binutils (binutils-2.10.0.18-1). Loop is compiled
into the kernel.

Once the mount command was executed, my load average shot up to a steady
1.0 on an idle system, and remained there until I rebooted. top
et al. showed no CPU utilization by the frozen mount.
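
A steady load of 1.0 with no CPU use is consistent with a task stuck in uninterruptible sleep (state "D"), which Linux counts toward the load average. Below is a minimal sketch for spotting such tasks. It assumes a procps-style ps that accepts -eo pid,stat,comm, and the helper name list_d_state is made up for illustration.

```shell
# Hypothetical helper: keep only rows whose STAT column starts with "D"
# (uninterruptible sleep). Expects "PID STAT COMMAND" rows on stdin.
list_d_state() {
    awk '$2 ~ /^D/'
}

# Usage (assumes a procps-style ps):
#   ps -eo pid,stat,comm | list_d_state
```

A hung mount would be expected to show up in that list, though the thread does not include the actual ps output.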


2001-03-27 03:51:31

by David Konerding

Subject: Re: "mount -o loop" lockup issue

It's a bug in Linux 2.4.2, fixed in later versions. Regression/quality
control testing would have caught this, but the developers usually just
break things and wait for people to complain as their "Regression"
testers.

Jason Madden wrote:

> On Mon, 26 Mar 2001, David E. Weekly wrote:
>
> > On Linux 2.4.2, running a "mount -o loop" on a file properly created with
> > "dd if=/dev/zero of=/path/to/my/file.img count=1024" seems to decide to
> > freeze up my shell (not my system). An strace showed the lockup happening at
> > the actual system "mount()" call, which never returns.
> >
> > Since mount() is in glibc, it might be relevant to note that I'm running
> > Mandrake's glibc 2.1.3-16mdk. I compiled the kernel with a gcc of 2.95.3
> > [1991030] (although oddly enough this binary seems to have come with the
> > gcc-2.95.2 RPM and installed itself as /usr/bin/gcc-2.95.2) and binutils
> > 2.10.0.24-4mdk.
> I also experience this problem (using a floppy disk image created by
> dd if=/dev/fd0 of=floppy.img bs=1024, and then mount -o loop
> floppy.img /mnt/floppy ) with a different version
> of glibc (RedHat's 2.1.92-5 rpm) and binutils (binutils-2.10.0.18-1). Loop
> is compiled into the kernel.
>
> Once the mount command was executed, my load average shot up to a steady
> 1.0 on an idle system, and remained there until I rebooted. top
> et. al. showed no cpu utilization by the frozen mount.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2001-03-27 03:52:01

by Mohammad A. Haque

Subject: Re: "mount -o loop" lockup issue

Jason Madden wrote:
>
> On Mon, 26 Mar 2001, David E. Weekly wrote:
>
> > On Linux 2.4.2, running a "mount -o loop" on a file properly created with
> > "dd if=/dev/zero of=/path/to/my/file.img count=1024" seems to decide to
> > freeze up my shell (not my system). An strace showed the lockup happening at
> > the actual system "mount()" call, which never returns.
....
> I also experience this problem (using a floppy disk image created by
> dd if=/dev/fd0 of=floppy.img bs=1024, and then mount -o loop
> floppy.img /mnt/floppy ) with a different version
> of glibc (RedHat's 2.1.92-5 rpm) and binutils (binutils-2.10.0.18-1). Loop
> is compiled into the kernel.

Follow this thread -->
<http://marc.theaimsgroup.com/?l=linux-kernel&m=98289750805700&w=2>

Latest loop patch is available at
<ftp://ftp.kernel.org/pub/linux/kernel/people/axboe/patches/2.4.3-pre1/>

--

=====================================================================
Mohammad A. Haque                              http://www.haque.net/
                                               [email protected]

"Alcohol and calculus don't mix.               Project Lead
 Don't drink and derive." --Unknown            http://wm.themes.org/
                                               [email protected]
=====================================================================

2001-03-27 04:00:52

by William Stearns

Subject: Re: "mount -o loop" lockup issue

Good day, all,

On Mon, 26 Mar 2001, Jason Madden wrote:

> On Mon, 26 Mar 2001, David E. Weekly wrote:
>
> > On Linux 2.4.2, running a "mount -o loop" on a file properly created with
> > "dd if=/dev/zero of=/path/to/my/file.img count=1024" seems to decide to
> > freeze up my shell (not my system). An strace showed the lockup happening at
> > the actual system "mount()" call, which never returns.
> >
> > Since mount() is in glibc, it might be relevant to note that I'm running
> > Mandrake's glibc 2.1.3-16mdk. I compiled the kernel with a gcc of 2.95.3
> > [1991030] (although oddly enough this binary seems to have come with the
> > gcc-2.95.2 RPM and installed itself as /usr/bin/gcc-2.95.2) and binutils
> > 2.10.0.24-4mdk.
> I also experience this problem (using a floppy disk image created by
> dd if=/dev/fd0 of=floppy.img bs=1024, and then mount -o loop
> floppy.img /mnt/floppy ) with a different version
> of glibc (RedHat's 2.1.92-5 rpm) and binutils (binutils-2.10.0.18-1). Loop
> is compiled into the kernel.
>
> Once the mount command was executed, my load average shot up to a steady
> 1.0 on an idle system, and remained there until I rebooted. top
> et. al. showed no cpu utilization by the frozen mount.

Jens Axboe, along with a number of other people, has put in a lot
of time coming up with a fix for the loop mount lockups. You can either
get his patch directly from
ftp://ftp.kernel.org/pub/linux/kernel/people/axboe/patches/
or simply use the most recent 2.4.2-ac patch (from
ftp://ftp.kernel.org/pub/linux/kernel/people/alan/
) to get this updated loop device code. I'm certain Jens would
like to hear from you if you find any problems with the updated code.
Cheers,
- Bill

---------------------------------------------------------------------------
The day Microsoft makes something that doesn't suck is
probably the day they start making vacuum cleaners.
-- Ernst Jan Plugge
(Courtesy of Christian Vogel <[email protected]>)
--------------------------------------------------------------------------
William Stearns ([email protected]). Mason, Buildkernel, named2hosts,
and ipfwadm2ipchains are at: http://www.pobox.com/~wstearns
LinuxMonth; articles for Linux Enthusiasts! http://www.linuxmonth.com
--------------------------------------------------------------------------

2001-03-27 04:18:24

by Alan

Subject: Re: "mount -o loop" lockup issue

> It's a bug in Linux 2.4.2, fixed in later versions. Regression/quality
> control testing would have caught this, but the developers usually just
> break things and wait for people to complain as their "Regression"
> testers.

Hardly. We knew it was broken since well before 2.4.0. It just got a little
interesting to fix.

2001-03-27 05:32:52

by Rik van Riel

Subject: Re: "mount -o loop" lockup issue

On Mon, 26 Mar 2001, David Konerding wrote:

> It's a bug in Linux 2.4.2, fixed in later versions.
> Regression/quality control testing would have caught this, but the
> developers usually just break things and wait for people to complain
> as their "Regression" testers.

As said before, we're interested in people willing to do regression
tests on the kernel. Unfortunately, not all that many testers have
stepped forward and not all that many artificial tests are being run.

Good thing we still have the beta-testers to catch these things,
while running the kernel in real-world scenarios... ;)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-27 08:15:34

by David Konerding

Subject: Re: "mount -o loop" lockup issue



Rik van Riel wrote:

> On Mon, 26 Mar 2001, David Konerding wrote:
>
> > It's a bug in Linux 2.4.2, fixed in later versions.
> > Regression/quality control testing would have caught this, but the
> > developers usually just break things and wait for people to complain
> > as their "Regression" testers.
>
> As said before, we're interested in people willing to do regression
> tests on the kernel. Unfortunately, not all that many testers have
> stepped forward and not all that many artificial tests are being run.

No, the point is that the linux developers should regression test their
code BEFORE releasing it to the public as a version like "2.4.2". When I
see a version like "2.4.2", I have an expectation that all the stupid
little problems (like mounting loopback filesystem) have already been
found.

It's even worse that these are obvious, simple bugs (like the "NFS
doesn't work over reiserfs because somebody changed the VFS layer and
didn't fix any filesystems but ext2" that I reported a while ago) which
would have been caught by a little testing.

Now, don't even get me started on how the developers are fixing every
legitimate bug found by CHECKER when they refused to put a debugger into
the kernel "because a good programmer finds their bug by studying the
code"-- well, obviously, you didn't find a lot of bugs by studying the
code.

I've been using Linux for something like 6-7 years now, quite faithfully.
I've been very impressed with many of its facilities, and the
improvements to the kernel (which I've compiled since 0.99) have been
astounding. But the attitude that "many eyes make all bugs shallow" and
"let the users test the code for us" just don't hold up. For the former,
clearly, many eyes didn't find a lot of basically obvious bugs, for the
latter, it's just impolite.

2001-03-27 08:33:35

by David Konerding

Subject: Re: "mount -o loop" lockup issue

Alan Cox wrote:

> > It's a bug in Linux 2.4.2, fixed in later versions. Regression/quality
> > control testing would have caught this, but the developers usually just
> > break things and wait for people to complain as their "Regression"
> > testers.
>
> Hardly. We knew it was broken since well before 2.4.0. It just got a little
> interesting to fix.

And this is described in what release notes? It worked just fine on Red Hat 7.0's 2.4
kernel.... oh wait, I see that they fixed it before they released it.

2001-03-27 12:33:09

by Rik van Riel

Subject: Re: "mount -o loop" lockup issue

On Tue, 27 Mar 2001, David Konerding wrote:

> No, the point is that the linux developers should regression test
> their code BEFORE releasing it to the public as a version like
> "2.4.2".

I take it you're volunteering ?

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-27 13:26:01

by Mohammad A. Haque

Subject: Re: "mount -o loop" lockup issue

David Konerding wrote:

> And this is described in what release notes? It worked just fine on Red Hat 7.0's 2.4
> kernel.... oh wait, I see that they fixed it before they released it.

And hmm..gee .. did they bother contributing back the code?

--

=====================================================================
Mohammad A. Haque                              http://www.haque.net/
                                               [email protected]

"Alcohol and calculus don't mix.               Project Lead
 Don't drink and derive." --Unknown            http://wm.themes.org/
                                               [email protected]
=====================================================================

2001-03-27 13:50:53

by James Lewis Nance

Subject: Kernel QA

On Tue, Mar 27, 2001 at 12:13:32AM -0800, David Konerding wrote:

> No, the point is that the linux developers should regression test their
> code BEFORE releasing it to the public as a version like "2.4.2". When I
> see a version like "2.4.2", I have an expectation that all the stupid
> little problems (like mounting loopback filesystem) have already been
> found.

You bring up a good point. We call the even branches the stable branches
and we do other things that promote the idea that people should be able to
download a 2.even.X kernel, install it on their machine, and expect it to
work. I think we need to back away from this idea. It seems to me that
the real (perhaps not the intended) function of kernel releases is keeping
kernel developers in sync. Promoting the idea that they are thought to be
suitable for production use just gets us in trouble.

Instead I think we need to encourage people who want to use Linux,
rather than develop it, to use kernels from a distribution. After all,
the distributors put a lot of effort into doing QA and putting together a
compatible system, we should leverage that. We need to ensure that people
know that when they install the latest kernel from Linus, they are the QA.

Please note that I am not trying to say that we should not try and
make the kernels we release as good as possible. It certainly makes
things a lot better for everyone if bugs don't get introduced by new
kernel versions. I do think we need to be more explicit about exactly
what people should and should not be able to expect from a "Linus kernel".

Thanks,

Jim

2001-03-27 16:24:53

by Alan

Subject: Re: "mount -o loop" lockup issue

> It's even worse that these are obvious, simple bugs (like the "NFS
> doesn't work over reiserfs because somebody changed the VFS layer and
> didn't fix any filesystems but ext2" that I reported a while ago) which
> would have been caught by a little testing.

Again, people knew about this. It was a deliberate decision that 2.4.x
shouldn't support NFS over reiserfs. If you want an extensively QA'd,
signed-off kernel tree then wait for vendors to release one.

Alan

2001-03-27 18:03:18

by Shawn Starr

Subject: Re: Kernel QA


I disagree, 2.4.x is "stable" and as such we need as many people to use
the kernels to see what's wrong with them. 2.4 *DOES* work, I've had very
small problems (ok, the thread hanging issue was a big one) but other than
that it's been solid.

It depends on the hardware.

Shawn.

On Tue, 27 Mar 2001, James Lewis Nance wrote:

> On Tue, Mar 27, 2001 at 12:13:32AM -0800, David Konerding wrote:
>
> > No, the point is that the linux developers should regression test their
> > code BEFORE releasing it to the public as a version like "2.4.2". When I
> > see a version like "2.4.2", I have an expectation that all the stupid
> > little problems (like mounting loopback filesystem) have already been
> > found.
>
> You bring up a good point. We call the even branches the stable branches
> and we do other things that promote the idea that people should be able to
> download a 2.even.X kernel, install it on their machine, and expect it to
> work. I think we need to back away from this idea. It seems to me that
> the real (perhaps not the intended) function of kernel releases is keeping
> kernel developers in sync. Promoting the idea that they are thought to be
> suitable for production use just gets us in trouble.
>
> Instead I think we need to encourage people who want to use Linux,
> rather than develop it, to use kernels from a distribution. After all,
> the distributors put a lot of effort into doing QA and putting together a
> compatible system, we should leverage that. We need to ensure that people
> know that when they install the latest kernel from Linus, they are the QA.
>
> Please note that I am not trying to say that we should not try and
> make the kernels we release as good as possible. It certainly makes
> things a lot better for everyone if bugs don't get introduced by new
> kernel versions. I do think we need to be more explicit about exactly
> what people should and should not be able to expect from a "Linus kernel".
>
> Thanks,
>
> Jim

2001-03-27 18:52:59

by J Sloan

Subject: Re: "mount -o loop" lockup issue

"Mohammad A. Haque" wrote:

> David Konerding wrote:
>
> > And this is described in what release notes? It worked just fine on Red Hat 7.0's 2.4
> > kernel.... oh wait, I see that they fixed it before they released it.
>
> And hmm..gee .. did they bother contributing back the code?

Based on their track record that's a silly question.

jjs

2001-03-27 19:40:10

by Joerg

Subject: Re: "mount -o loop" lockup issue

David Konerding <[email protected]> wrote:

> But the attitude that "many eyes make all bugs shallow" and "let the
> users test the code for us" just don't hold up. For the former,
> clearly, many eyes didn't find a lot of basically obvious bugs, for the
> latter, it's just impolite.

You mentioned the CHECKER case as proof for your point that "many eyes
make all bugs shallow" does not work. One might argue the other way
around: The CHECKER people actually found the bugs, so it works.

Regards
Joerg



2001-03-27 19:34:20

by Alexander Viro

Subject: Re: "mount -o loop" lockup issue



On Tue, 27 Mar 2001, J Sloan wrote:

> "Mohammad A. Haque" wrote:
>
> > David Konerding wrote:
> >
> > > And this is described in what release notes? It worked just fine on Red Hat 7.0's 2.4
> > > kernel.... oh wait, I see that they fixed it before they released it.
> >
> > And hmm..gee .. did they bother contributing back the code?
>
> Based on their track record that's a silly question.

Especially since the patches in question had been written by Jens Axboe (who
has nothing to do with RH) and announced (many times) on l-k.

I've fixed several races in Jens' patch and fed them back to him. His patch
+ these fixes were the only loop-related patches in RH tree[1]. Until fixes got
merged into Jens' loop-6 which, in turn, was merged into -ac and into
the main tree, that is.

I don't give a flying fsck through the rolling doughnut for "their" track
record (whatever "their" means), but I'm somewhat partial to mine. Care to
grep through l-k archives, check your facts and STFU?
Al

[1] there's also changeloop patch - adds an ioctl for switching the underlying
file under opened /dev/loop; API is ugly and thing has so limited use that
IMO it should die. Completely unrelated to the problems in question, anyway.

2001-03-27 22:17:27

by Alex Valys

Subject: Re: Kernel QA

On Tuesday 27 March 2001 08:51, James Lewis Nance wrote:
> Instead I think we need to encourage people who want to use Linux,
> rather than develop it, to use kernels from a distribution.

I hope that's not the opinion of all the kernel developers - where does that
leave distributions like slackware, debian, and the rest that don't have the
time or resources to modify the kernel themselves? Every kernel release that
is meant to keep developers "in sync", as you say, should be a 2.4.x-prex
release, and the stable releases should actually be stable. If this means
slowing the release schedule, so be it. You are proposing to release
unfinished, buggy and unstable code and let the distributions pick up your
slack. It sounds like something Microsoft would do.