2004-10-24 06:57:50

by Robin Rosenberg

Subject: XFS strangeness, xfs_db out of memory

Hi,

I was testing a tiny script on top of xfs_fsr to show fragmentation and the
results of defragmentation. While fine-tuning the output I ran the script
repeatedly and suddenly got an error from find ("unknown error 999", if memory
serves; it scrolled off the screen).

The logs show this.
Oct 24 08:06:50 xine kernel: hda: dma_timer_expiry: dma status == 0x21
Oct 24 08:07:00 xine kernel: hda: DMA timeout error
Oct 24 08:07:00 xine kernel: hda: dma timeout error: status=0xd0 { Busy }
Oct 24 08:07:00 xine kernel:
Oct 24 08:07:00 xine kernel: hda: DMA disabled
Oct 24 08:07:00 xine kernel: ide0: reset: success

How bad is that for XFS?... The error isn't permanent it seems.

After that xfs_db -r /dev-with-home -c "frag -v" gives me an out-of-memory
error after a while, consistently.

xfs_db: out of memory

The script essentially does this:

xfs_info $dev
xfs_db -r $dev -c "frag -v"
find $mountp
xfs_fsr -v
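
The four steps above can be wrapped in a small shell function, roughly as
follows. This is only a sketch: the names frag_report, run and DRY_RUN are
made up for illustration, and dev and mountp stand for the device and its
mount point as in the listing above.

```shell
#!/bin/sh
# Sketch only: frag_report, run and DRY_RUN are hypothetical names.

# run CMD...: print the command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# frag_report DEV MOUNTP: the same four steps, in the same order.
frag_report() {
    dev=$1
    mountp=$2
    run xfs_info "$dev"                 # filesystem geometry
    run xfs_db -r "$dev" -c "frag -v"   # read-only fragmentation report
    run find "$mountp"                  # walk the mounted tree
    run xfs_fsr -v                      # defragment, verbose
}
```

With DRY_RUN=1 set, frag_report only prints the commands it would run, which
is handy while tuning the output format.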

Program versions
kernel is 2.6.8.1-12mdk (Mandrake 10.1 Community edition)
xfsdump-2.2.21-1mdk
xfsprogs-2.6.13-1mdk

xfs_repair printed a lot of messages after this, but ran to completion.

-- robin


2004-10-24 11:54:13

by Jan Engelhardt

Subject: Re: XFS strangeness, xfs_db out of memory

>I was testing a tiny script on top of xfs_fsr to show fragmentation and the
>results of defragmentation. As a result of fine tuning the output I ran the
>script repeatedly and suddenly got error from find (unknown error 999 if my
>memory serves me. It scrolled off the screen).
>
>The logs show this.
>Oct 24 08:06:50 xine kernel: hda: dma_timer_expiry: dma status == 0x21
>Oct 24 08:07:00 xine kernel: hda: DMA timeout error
>Oct 24 08:07:00 xine kernel: hda: dma timeout error: status=0xd0 { Busy }
>Oct 24 08:07:00 xine kernel:
>Oct 24 08:07:00 xine kernel: hda: DMA disabled
>Oct 24 08:07:00 xine kernel: ide0: reset: success

Hi,

That looks to me like your HD is going to die sometime in the future...

>How bad is that for XFS?... The error isn't permanent it seems.

Usually not bad at all. Expect <any fs> to struggle when such IO/DMA errors
happen.

>After that xfs_db -r /dev-with-home -c "frag -v" gives me an out-of-memory
>error after a while, consistently.

XFS has probably picked up a bogus value because of the disk error and tries
to allocate that much memory, probably more than you have.

>I ran the script repeatedly and suddenly got error from find (unknown error
>999 if my

If you reboot and restart this repeated test, does it always error out at the
same time and spot (and with the same error 0x21/0xd0), e.g. the 100th
instance of xfs_db?

Please also try running badblocks -vv /dev/hdXY (or whatever is appropriate)
repeatedly. If it finds something after a lot of runs (at least as many as you
needed to find out the fragmentation), there's definitely something wrong with
the HD, not XFS.
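
Repeating the scan can be scripted along these lines. This is a minimal
sketch, not a tested procedure: repeat_scan and the BADBLOCKS override are
hypothetical names, and /dev/hdXY remains a placeholder for the real
partition. By default badblocks only reads, so the scan is non-destructive.

```shell
#!/bin/sh
# Sketch only: repeat_scan and BADBLOCKS are hypothetical names.

# repeat_scan DEV N: run a read-only badblocks scan of DEV N times in a row.
repeat_scan() {
    dev=$1
    runs=$2
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "pass $i of $runs"
        # BADBLOCKS lets another command be substituted for a trial run.
        ${BADBLOCKS:-badblocks} -vv "$dev"
        i=$((i + 1))
    done
}
```

For example, repeat_scan /dev/hdXY 20 would run twenty passes; any bad blocks
reported in the output of any pass point at the disk rather than at XFS.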




Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-10-24 21:05:46

by Robin Rosenberg

Subject: Re: XFS strangeness, xfs_db out of memory

On Sunday 24 October 2004 13.53, Jan Engelhardt wrote:
> >I was testing a tiny script on top of xfs_fsr to show fragmentation and
> > the results of defragmentation. As a result of fine tuning the output I
> > ran the script repeatedly and suddenly got error from find (unknown error
> > 999 if my memory serves me. It scrolled off the screen).
> >
> >The logs show this.
> >Oct 24 08:06:50 xine kernel: hda: dma_timer_expiry: dma status == 0x21
> >Oct 24 08:07:00 xine kernel: hda: DMA timeout error
> >Oct 24 08:07:00 xine kernel: hda: dma timeout error: status=0xd0 { Busy }
> >Oct 24 08:07:00 xine kernel:
> >Oct 24 08:07:00 xine kernel: hda: DMA disabled
> >Oct 24 08:07:00 xine kernel: ide0: reset: success
>
> Hi,
>
> That looks to me like your HD is going to die sometime in the future...
That's for certain. The question is whether it's in the near future. It's only
a couple of months old.

> >How bad is that for XFS?... The error isn't permanent it seems.
>
> Usually nothing. Expect <any fs> to struggle when such IO/DMA errors
> happen.
What I'm wondering is whether XFS ever saw the problem, or whether the kernel
retried the operation. I'm really curious as to what happened.

> >After that xfs_db -r /dev-with-home -c "frag -v" gives me an out-of-memory
> >error after a while, consistently.
>
> XFS has probably picked up a malicious value due to the disk error, and as
> such allocates that much. Probably more than you got.
Or these errors come from previous unclean poweroffs (i.e. a hung system).

> >I ran the script repeatedly and suddenly got error from find (unknown
> > error 999 if my
>
> If you reboot, and restart this repeated test, does it always error out at
> the same time and spot (and with the same error 0x21/0xd0), e.g. the 100th
> instance of xfs_db?
>
> Please also try a badblocks -vv /dev/hdXY (or appropriate) repeatedly. If
> it finds something there after a lot of runs (at least as much as you
> needed to find out the fragmentation), there's definitely something wrong
> with the HD, not XFS.

I've tried it a few times, nothing so far. Thinking about it again, I have
actually seen this (or a similar error) before. The logs only contain this
instance of the error, so it must be at least a month since it last happened.

-- robin

2004-10-29 07:39:31

by Nathan Scott

Subject: Re: XFS strangeness, xfs_db out of memory

On Sun, Oct 24, 2004 at 08:57:26AM +0200, Robin Rosenberg wrote:
> Hi,
>
> I was testing a tiny script on top of xfs_fsr to show fragmentation and the
> results of defragmentation. As a result of fine tuning the output I ran the
> script repeatedly and suddenly got error from find (unknown error 999 if my
> memory serves me. It scrolled off the screen).
> ...
> xfs_info $dev
> xfs_db -r $dev -c "frag -v"

This is accessing the device while the filesystem is mounted;
in older kernels (like the one you have) that would cause the
above corruption error in XFS. That's resolved now.

As to the IDE error you saw, I'm not sure how fatal that is.

cheers.

--
Nathan

2004-10-31 16:58:14

by Robin Rosenberg

Subject: Re: XFS strangeness, xfs_db out of memory

On Friday 29 October 2004 09.37, Nathan Scott wrote:
> On Sun, Oct 24, 2004 at 08:57:26AM +0200, Robin Rosenberg wrote:
> > Hi,
> >
> > I was testing a tiny script on top of xfs_fsr to show fragmentation and
> > the results of defragmentation. As a result of fine tuning the output I
> > ran the script repeatedly and suddenly got error from find (unknown error
> > 999 if my memory serves me. It scrolled off the screen).
> > ...
> > xfs_info $dev
> > xfs_db -r $dev -c "frag -v"
>
> This is accessing the device while the filesystem is mounted,
> in older kernels (like the one you have) that would cause the
> above corruption error in XFS - that's resolved now.

Do you happen to know when or where (which patch) this was fixed? I usually
use Mandrake stock kernels, so I'm looking for something to attach to a
bug report. I looked around without luck.

-- robin

2004-10-31 22:51:58

by Nathan Scott

Subject: Re: XFS strangeness, xfs_db out of memory

On Sun, Oct 31, 2004 at 05:58:05PM +0100, Robin Rosenberg wrote:
>
> You don't happen to know when or where (patch) this was fixed? I'm usually
> using Mandrake stock kernels, so I'm looking for something to attach to a
> bug report. I was looking around without luck.
>

It was bk changeset 1.1803.135.5 -- I'll send you a patch off-list.

cheers.

--
Nathan