2005-12-07 12:03:52

by Shlomi Fish

[permalink] [raw]
Subject: XFS Mount Hangs the Partition (on latest kernel + many old 2.6.x ones)

Hi all!

(Please CC me on replies)

I encountered a problem with the Linux kernel handling of XFS, in which
attempting to mount a certain XFS partition (but not a different one on the
same hard-disk) caused the mount process to hang, and all other XFS-aware
apps (like "xfs_check" or "xfs_repair") to hang too. However, running
xfs_check
or xfs_repair before the first mount (after a reboot) worked, and eventually
resolved this problem.

I blogged about it (relatively incoherently) here:

http://www.livejournal.com/~shlomif/7182.html?mode=reply
http://www.livejournal.com/~shlomif/7547.html?mode=reply

It all happened after I detected some problems on my Mandriva 2006 system
(that was using kernel 2.4.15-rc2 from Linus), and then rebooted twice,
thinking something went wrong. Then a loadlin-booted kernel was unable to
load the kernel.

Knoppix ran fine, but it also hang up on attempting to mount the XFS
partition. It used a much older kernel. I then tried to boot Kubuntu (which
was on another XFS partition on the same disk) and it booted fine. Still, it
was unable to mount the partition. (It too had an older kernel).

After I compiled a 2.6.14.3 kernel, and booted Kubuntu with it, it again
could not mount the XFS partition, and after doing that xfs_check and
xfs_repair both hanged up as well. After a reboot, I tried running xfs_check
right away on that partition and it worked. So I ran xfs_repair, and after it
finished, tried to mount the partition it worked. Then Mandriva booted fine.

I did not had any problems since then (I have an uptime of 11 days now using
kernel 2.6.14.3), and so it doesn't seem like a hard disk problem. Something
using kernel 2.6.15-rc2 caused the XFS partition to become defected, and
worse - something in all the kernels starting from that of the first official
Kubuntu release, (or the Knoppix I had), caused an attempt to mount the
Mandriva partition to hang the process, and subsequent accesses to the
partition by xfs_check and xfs_repair to fail as well.

I can no longer reproduce the problem, but it might be worth going over the
code. If it helps I can privately send a dump of the first 131,072,000 bytes
of the XFS partition to someone trustworthy.

Regards,

Shlomi Fish

---------------------------------------------------------------------
Shlomi Fish [email protected]
Homepage: http://www.shlomifish.org/

95% of the programmers consider 95% of the code they did not write, in the
bottom 5%.


2005-12-07 20:55:24

by Nathan Scott

[permalink] [raw]
Subject: Re: XFS Mount Hangs the Partition (on latest kernel + many old 2.6.x ones)

On Wed, Dec 07, 2005 at 01:57:38PM +0200, Shlomi Fish wrote:
> Hi all!

Hi there,

> (Please CC me on replies)
>
> I encountered a problem with the Linux kernel handling of XFS, in which
> attempting to mount a certain XFS partition (but not a different one on the
> same hard-disk) caused the mount process to hang, and all other XFS-aware
> apps (like "xfs_check" or "xfs_repair") to hang too. However, running
> xfs_check
> or xfs_repair before the first mount (after a reboot) worked, and eventually
> resolved this problem.
>
> I blogged about it (relatively incoherently) here:
>
> http://www.livejournal.com/~shlomif/7182.html?mode=reply
> http://www.livejournal.com/~shlomif/7547.html?mode=reply

Unfortunately there's not much information here in your mail or
there that would help us to analyse this further. If you see this
behaviour again could you:
- get sysrq-t information for all hung processes, esp. mount;
- send xfs_info output for the filesystem in question;
- dump the log (xfs_logprint -C) and send it to us.

> It all happened after I detected some problems on my Mandriva 2006 system
> (that was using kernel 2.4.15-rc2 from Linus), and then rebooted twice,
> thinking something went wrong. Then a loadlin-booted kernel was unable to
> load the kernel.
>
> Knoppix ran fine, but it also hang up on attempting to mount the XFS
> partition. It used a much older kernel. I then tried to boot Kubuntu (which
> was on another XFS partition on the same disk) and it booted fine. Still, it
> was unable to mount the partition. (It too had an older kernel).
>
> After I compiled a 2.6.14.3 kernel, and booted Kubuntu with it, it again
> could not mount the XFS partition, and after doing that xfs_check and
> xfs_repair both hanged up as well. After a reboot, I tried running xfs_check

This was probably caused by the block device being held open
exclusively by the stuck mount process.

> right away on that partition and it worked. So I ran xfs_repair, and after it
> finished, tried to mount the partition it worked. Then Mandriva booted fine.
>
> I did not had any problems since then (I have an uptime of 11 days now using
> kernel 2.6.14.3), and so it doesn't seem like a hard disk problem. Something
> using kernel 2.6.15-rc2 caused the XFS partition to become defected, and
> worse - something in all the kernels starting from that of the first official
> Kubuntu release, (or the Knoppix I had), caused an attempt to mount the
> Mandriva partition to hang the process, and subsequent accesses to the
> partition by xfs_check and xfs_repair to fail as well.
>
> I can no longer reproduce the problem, but it might be worth going over the
> code. If it helps I can privately send a dump of the first 131,072,000 bytes
> of the XFS partition to someone trustworthy.

Thats unlikely to help now - repair will have wiped your log clean,
so all evidence of the problem will be gone.

cheers.

--
Nathan

2005-12-08 16:02:12

by Shlomi Fish

[permalink] [raw]
Subject: Re: XFS Mount Hangs the Partition (on latest kernel + many old 2.6.x ones)

On Wednesday 07 December 2005 22:55, Nathan Scott wrote:
> On Wed, Dec 07, 2005 at 01:57:38PM +0200, Shlomi Fish wrote:
> > Hi all!
>
> Hi there,

Hi Mr. Scott (and all)!

>
> > (Please CC me on replies)
> >
> > I encountered a problem with the Linux kernel handling of XFS, in which
> > attempting to mount a certain XFS partition (but not a different one on
> > the same hard-disk) caused the mount process to hang, and all other
> > XFS-aware apps (like "xfs_check" or "xfs_repair") to hang too. However,
> > running xfs_check
> > or xfs_repair before the first mount (after a reboot) worked, and
> > eventually resolved this problem.
> >
> > I blogged about it (relatively incoherently) here:
> >
> > http://www.livejournal.com/~shlomif/7182.html?mode=reply
> > http://www.livejournal.com/~shlomif/7547.html?mode=reply
>
> Unfortunately there's not much information here in your mail or
> there that would help us to analyse this further. If you see this
> behaviour again could you:
> - get sysrq-t information for all hung processes, esp. mount;
> - send xfs_info output for the filesystem in question;
> - dump the log (xfs_logprint -C) and send it to us.

Sure. But in what order should I do all that?

>
> > It all happened after I detected some problems on my Mandriva 2006 system
> > (that was using kernel 2.4.15-rc2 from Linus), and then rebooted twice,
> > thinking something went wrong. Then a loadlin-booted kernel was unable to
> > load the kernel.
> >
> > Knoppix ran fine, but it also hang up on attempting to mount the XFS
> > partition. It used a much older kernel. I then tried to boot Kubuntu
> > (which was on another XFS partition on the same disk) and it booted fine.
> > Still, it was unable to mount the partition. (It too had an older
> > kernel).
> >
> > After I compiled a 2.6.14.3 kernel, and booted Kubuntu with it, it again
> > could not mount the XFS partition, and after doing that xfs_check and
> > xfs_repair both hanged up as well. After a reboot, I tried running
> > xfs_check
>
> This was probably caused by the block device being held open
> exclusively by the stuck mount process.

I see.

>
> > right away on that partition and it worked. So I ran xfs_repair, and
> > after it finished, tried to mount the partition it worked. Then Mandriva
> > booted fine.
> >
> > I did not had any problems since then (I have an uptime of 11 days now
> > using kernel 2.6.14.3), and so it doesn't seem like a hard disk problem.
> > Something using kernel 2.6.15-rc2 caused the XFS partition to become
> > defected, and worse - something in all the kernels starting from that of
> > the first official Kubuntu release, (or the Knoppix I had), caused an
> > attempt to mount the Mandriva partition to hang the process, and
> > subsequent accesses to the partition by xfs_check and xfs_repair to fail
> > as well.
> >
> > I can no longer reproduce the problem, but it might be worth going over
> > the code. If it helps I can privately send a dump of the first
> > 131,072,000 bytes of the XFS partition to someone trustworthy.
>
> Thats unlikely to help now - repair will have wiped your log clean,
> so all evidence of the problem will be gone.

Yes. I thought one can try looking in the XFS mounting code for possible bugs.

Regards,

Shlomi Fish

---------------------------------------------------------------------
Shlomi Fish [email protected]
Homepage: http://www.shlomifish.org/

95% of the programmers consider 95% of the code they did not write, in the
bottom 5%.

2005-12-15 17:59:51

by Shlomi Fish

[permalink] [raw]
Subject: Re: XFS Mount Hangs the Partition (on latest kernel + many old 2.6.x ones)

Replying to myself, I'd like to ask for a response to my previous message,
especially the order in which the operations Mr. Scott mentioned need to be
performed.

Regards,

Shlomi Fish

On Thursday 08 December 2005 17:55, Shlomi Fish wrote:
> On Wednesday 07 December 2005 22:55, Nathan Scott wrote:
> > On Wed, Dec 07, 2005 at 01:57:38PM +0200, Shlomi Fish wrote:
> > > Hi all!
> >
> > Hi there,
>
> Hi Mr. Scott (and all)!
>
> > > (Please CC me on replies)
> > >
> > > I encountered a problem with the Linux kernel handling of XFS, in which
> > > attempting to mount a certain XFS partition (but not a different one on
> > > the same hard-disk) caused the mount process to hang, and all other
> > > XFS-aware apps (like "xfs_check" or "xfs_repair") to hang too. However,
> > > running xfs_check
> > > or xfs_repair before the first mount (after a reboot) worked, and
> > > eventually resolved this problem.
> > >
> > > I blogged about it (relatively incoherently) here:
> > >
> > > http://www.livejournal.com/~shlomif/7182.html?mode=reply
> > > http://www.livejournal.com/~shlomif/7547.html?mode=reply
> >
> > Unfortunately there's not much information here in your mail or
> > there that would help us to analyse this further. If you see this
> > behaviour again could you:
> > - get sysrq-t information for all hung processes, esp. mount;
> > - send xfs_info output for the filesystem in question;
> > - dump the log (xfs_logprint -C) and send it to us.
>
> Sure. But in what order should I do all that?

[SNIPPED]

---------------------------------------------------------------------
Shlomi Fish [email protected]
Homepage: http://www.shlomifish.org/

95% of the programmers consider 95% of the code they did not write, in the
bottom 5%.

2005-12-15 18:30:36

by Eric Sandeen

[permalink] [raw]
Subject: Re: XFS Mount Hangs the Partition (on latest kernel + many old 2.6.x ones)

Shlomi Fish wrote:
> Replying to myself, I'd like to ask for a response to my previous message,
> especially the order in which the operations Mr. Scott mentioned need to be
> performed.

>>>- get sysrq-t information for all hung processes, esp. mount;
>>>- send xfs_info output for the filesystem in question;
>>>- dump the log (xfs_logprint -C) and send it to us.


xfs_logprint -C /dev/foo against the unmounted device.

sysrq-t when it's hung, after you try to mount

xfs_info /mnt/point when it's mounted.

Well, you can't mount, because recovery hangs, so,

mount -o ro,norecovery /dev/foo /mnt/point; xfs_info /mnt/point

-Eric

2005-12-15 18:57:41

by Shlomi Fish

[permalink] [raw]
Subject: Re: XFS Mount Hangs the Partition (on latest kernel + many old 2.6.x ones)

On Thursday 15 December 2005 20:30, Eric Sandeen wrote:
> Shlomi Fish wrote:
> > Replying to myself, I'd like to ask for a response to my previous
> > message, especially the order in which the operations Mr. Scott mentioned
> > need to be performed.
> >
> >>>- get sysrq-t information for all hung processes, esp. mount;
> >>>- send xfs_info output for the filesystem in question;
> >>>- dump the log (xfs_logprint -C) and send it to us.
>
> xfs_logprint -C /dev/foo against the unmounted device.
>
> sysrq-t when it's hung, after you try to mount
>
> xfs_info /mnt/point when it's mounted.
>
> Well, you can't mount, because recovery hangs, so,
>
> mount -o ro,norecovery /dev/foo /mnt/point; xfs_info /mnt/point
>
> -Eric

Thanks!

I'll save this message to a file, put it on different partitions and print it
in case this problem repeats itself.

Regards,

Shlomi Fish

---------------------------------------------------------------------
Shlomi Fish [email protected]
Homepage: http://www.shlomifish.org/

95% of the programmers consider 95% of the code they did not write, in the
bottom 5%.