2005-04-28 15:59:51

by Davy Durham

Subject: ext3 issue..

Greetings,
I'm having an issue with ext3. For about 3 months the /home
partition has had low-to-medium use/activity.. adding files, nightly log
rotations, some mysql dbs coming and going at a slow pace.. Well,
yesterday after I had migrated everything off of it (no files in /home
anymore) the df output looked like this:

# uptime
10:35:54 up 96 days, 14:22, 1 user, load average: 0.00, 0.00, 0.00
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part1
2.0G 483M 1.4G 26% /
/dev/ide/host0/bus0/target0/lun0/part6
33G -64Z 31G 101% /home

I did notice that if I created a file (cat /dev/zero >/home/foo) of
significant size that I could make it look normal again.. So I figure
it's an underflow in some count.

Crazy huh? Well, I unmounted /home and did an fsck -f on the partition
and remounted it. Then everything looked okay.

---

Well today on a different server (that I have not cleaned off yet) that
has been up and running for 6 months is saying the same thing:

# uptime
10:39:16 up 181 days, 2:42, 2 users, load average: 0.00, 0.00, 0.00
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part1
2.0G 483M 1.4G 26% /
/dev/ide/host0/bus0/target0/lun0/part6
33G -64Z 31G 101% /home

Now, this server is still in production. I could bring it down for a
brief time to fsck or reboot it, but I'd be afraid to. du -h /home
shows that really only 268M is used.

If I create a large file (176M) in /home, df no longer underflows,
but the figure is still incorrect.


Is this a known issue with ext3? Or ext2? Anything I should or should
not do about it?


Thanks,
Davy





BTW, df -k looks like:
# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part1
2054064 493660 1454380 26% /
/dev/ide/host0/bus0/target0/lun0/part6
33690964 -73786976294838186940 31971456 101% /home
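
The two df figures agree with each other, which fits the underflow guess: the magnitude printed by df -k sits just below 2^66 KiB, and 2^66 KiB is exactly 64 ZiB, the value df -h rounds to. A minimal sketch of that check follows. The 4 KiB block size is an assumption (the thread does not state it), and the arithmetic does not show which counter wrapped.

```shell
#!/bin/sh
# "Used" magnitude printed by df -k (1 KiB units), sign dropped.
observed=73786976294838186940
# 2^66 as a decimal string; 2^66 KiB == 2^76 bytes == 64 ZiB.
two_pow_66=73786976294838206464

# Shell arithmetic is 64-bit, so compare the numbers in two halves:
# everything except the last 10 digits, and the last 10 digits.
obs_hi=${observed%??????????};   obs_lo=${observed#"$obs_hi"}
pow_hi=${two_pow_66%??????????}; pow_lo=${two_pow_66#"$pow_hi"}

if [ "$obs_hi" = "$pow_hi" ]; then
    short_kib=$((pow_lo - obs_lo))
    echo "short of 2^66 KiB by: $short_kib KiB"
    # Assumption: 4 KiB filesystem blocks.
    echo "in 4 KiB blocks:      $((short_kib / 4))"
fi
```

Under the 4 KiB assumption, the printed value looks like a 64-bit block count that went a few thousand blocks below zero.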


2005-04-28 20:09:20

by Theodore Ts'o

Subject: Re: ext3 issue..

On Thu, Apr 28, 2005 at 09:59:39AM -0500, Davy Durham wrote:
> Crazy huh? Well, I unmounted /home and did an fsck -f on the partition
> and remounted it. Then everything looked okay.

What messages were displayed by e2fsck? What version of the kernel
are you running?

No, I haven't heard of any such problems with ext2/3 filesystems.
This is the first time that someone has reported a specific problem
with the # of blocks used accounting. There is the standard "file
held open so the number of blocks used is greater than blocks reported
by du", but that won't cause df to display negative numbers.

- Ted
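
The "file held open" case mentioned above can be reproduced on most Linux systems. A rough sketch, assuming /proc is mounted and using a throwaway file from mktemp: the blocks stay allocated (and counted by df) after the name is removed, while du can no longer find the file.

```shell
#!/bin/sh
# Create a scratch file and keep it open on file descriptor 3.
scratch=$(mktemp)
exec 3>"$scratch"
dd if=/dev/zero bs=1024k count=8 >&3 2>/dev/null

# Remove the name; the data stays allocated while fd 3 is open,
# so df still counts the blocks but du no longer sees the file.
rm -f "$scratch"
held=$(readlink /proc/self/fd/3)
echo "fd 3 still refers to: $held"

# Closing the descriptor is what actually frees the blocks.
exec 3>&-
```

On a live system, "lsof +L1" should list such unlinked-but-open files. This effect only makes df report more than du; it does not produce a negative value.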

2005-04-28 20:55:50

by Lennart Sorensen

Subject: Re: ext3 issue..

On Thu, Apr 28, 2005 at 04:09:08PM -0400, Theodore Ts'o wrote:
> What messages were displayed by e2fsck? What version of the kernel
> are you running?
>
> No, I haven't heard of any such problems with ext2/3 filesystems.
> This is the first time that someone has reported a specific problem
> with the # of blocks used accounting. There is the standard "file
> held open so the number of blocks used is greater than blocks reported
> by du", but that won't cause df to display negative numbers.

I think I have seen this once or twice in the past. A reboot always
made it go away and it didn't seem to come back. fsck never showed
anything so I assumed it was just the kernel having lost its mind about
the state of the FS.

I think I was using 2.4.18 or so at the time I last saw it. It is quite
a while ago but it was ext3 as well as far as I recall.

I originally thought my df and company was messed up (since I think I
have seen a case on sparc where the libc/df were out of sync causing
weird output).

I never thought much about it since it didn't seem to be reproducible;
it never repeated itself.

Len Sorensen

2005-04-29 02:35:32

by Davy Durham

Subject: Re: ext3 issue..

Theodore Ts'o wrote:

>On Thu, Apr 28, 2005 at 09:59:39AM -0500, Davy Durham wrote:
>
>
>>Crazy huh? Well, I unmounted /home and did an fsck -f on the partition
>>and remounted it. Then everything looked okay.
>>
>>
>
>What messages were displayed by e2fsck? What version of the kernel
>are you running?
>
>No, I haven't heard of any such problems with ext2/3 filesystems.
>This is the first time that someone has reported a specific problem
>with the # of blocks used accounting. There is the standard "file
>held open so the number of blocks used is greater than blocks reported
>by du", but that won't cause df to display negative numbers.
>
> - Ted
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
Well, I don't have the fsck output from the machine I ran it on.
And I can't do it on the production machine right now. Perhaps I
can tomorrow if I can talk to the guys about cleaning it off. I'm a
bit afraid to do it, reason being that if it's off on this accounting
information, what files/data might I lose if I did an fsck on it
(remember, the one I already did it on was empty).

Also, I did an "e2fsck /home" on the running machine just expecting to
get the "warning, don't do this on a mounted file system" prompt, but
instead I got:

# e2fsck /home
e2fsck 1.34 (25-Jul-2003)
e2fsck: Is a directory while trying to open /home

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>

Perhaps it's because it is mounted? (and -b 8193 didn't change
anything). Do I need to do it on the device after unmounting instead?
(Probably so)

I do get the expected warning with fsck itself. Which should I be
using? e2fsck or fsck?

This is kernel-2.6.3-15mdk BTW (mdk 10.0official)

I'll try to report back tomorrow if I'm able to do anything..

Thanks,
Davy

2005-04-29 11:55:25

by folkert

Subject: Re: ext3 issue..

> > What messages were displayed by e2fsck? What version of the kernel
> > are you running?
> > No, I haven't heard of any such problems with ext2/3 filesystems.
> > This is the first time that someone has reported a specific problem
> > with the # of blocks used accounting. There is the standard "file
> > held open so the number of blocks used is greater than blocks reported
> > by du", but that won't cause df to display negative numbers.
> I think I have seen this once or twice in the past. A reboot always
> made it go away and it didn't seem to come back. fsck never showed
> anything so I assumed it was just the kernel having lost its mind about
> the state of the FS.

"me too"
kernel 2.6.11
a reboot fixed it. fsck did not bother to check the filesystem. ext3


Folkert van Heusden




2005-04-29 13:20:12

by Lennart Sorensen

Subject: Re: ext3 issue..

On Thu, Apr 28, 2005 at 09:35:09PM -0500, Davy Durham wrote:
> Well, I don't have the fsck output from the machine I ran it on.
> And I can't do it on the production machine right now. Perhaps I
> can tomorrow if I can talk to the guys about cleaning it off. I'm a
> bit afraid to do it, reason being that if it's off on this accounting
> information, what files/data might I lose if I did an fsck on it
> (remember, the one I already did it on was empty).
>
> Also, I did an "e2fsck /home" on the running machine just expecting to
> get the "warning, don't do this on a mounted file system" prompt, but
> instead I got:
>
> # e2fsck /home
> e2fsck 1.34 (25-Jul-2003)
> e2fsck: Is a directory while trying to open /home

/home IS a directory. You run fsck on the device, NOT the mount point.
Remember, the filesystem should be unmounted (or at most mounted
read-only) when fsck is run.
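
Put as a sketch: look up which device backs the mount point, then point e2fsck at that device once it is unmounted. The helper name device_for is made up for illustration. It only reads /proc/mounts (or a mount table passed as the second argument), and the script prints the commands rather than running them.

```shell
#!/bin/sh
# device_for MOUNTPOINT [TABLE]: print the device mounted at MOUNTPOINT.
# TABLE defaults to /proc/mounts; the last matching entry wins.
device_for() {
    awk -v mp="$1" '$2 == mp { dev = $1 } END { if (dev != "") print dev }' \
        "${2:-/proc/mounts}"
}

dev=$(device_for /home)
if [ -n "$dev" ]; then
    echo "unmount first, then check the device:"
    echo "  umount /home && e2fsck -f $dev"
else
    echo "/home is not a separate mount on this machine"
fi
```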

> The superblock could not be read or does not describe a correct ext2
> filesystem. If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
> e2fsck -b 8193 <device>
>
> Perhaps it's because it is mounted? (and -b 8193 didn't change
> anything). Do I need to do it on the device after unmounting instead?
> (Probably so)
>
> I do get the expected warning with fsck itself. Which should I be
> using? e2fsck or fsck?

Shouldn't matter as far as I know. If ext3, perhaps fsck.ext3 is right.
fsck should just figure it out I believe.

> This is kernel-2.6.3-15mdk BTW (mdk 10.0official)
>
> I'll try to report back tomorrow if I'm able to do anything..

Len Sorensen

2005-05-02 01:27:39

by Davy Durham

Subject: Re: ext3 issue..

OK, I caught another machine doing it. Here's the output of df,
umount, fsck.ext3, mount, and df:

I don't know if it shows anything that would indicate the cause of the
issue though.


# df
Filesystem Size Used Avail Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part1
2.0G 465M 1.5G 25% /
/dev/ide/host0/bus0/target0/lun0/part6
33G -64Z 31G 101% /home



# umount /home


# fsck.ext3 -f -v /dev/ide/host0/bus0/target0/lun0/part6
e2fsck 1.34 (25-Jul-2003)
Pass 1: Checking inodes, blocks, and sizes
Inode 8, i_blocks is 65616, should be 65608. Fix<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create<y>? yes

Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #2 (47844, counted=30900).
Fix<y>? yes

Free blocks count wrong for group #4 (36926, counted=31744).
Fix<y>? yes

Free blocks count wrong (8435011, counted=8412885).
Fix<y>? yes


/dev/ide/host0/bus0/target0/lun0/part6: ***** FILE SYSTEM WAS MODIFIED *****

13 inodes used (0%)
2 non-contiguous inodes (15.4%)
# of inodes with ind/dind/tind blocks: 2/0/0
282288 blocks used (3%)
0 bad blocks
0 large files

2 regular files
1 directory
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
3 files


# mount /home



# df
Filesystem Size Used Avail Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part1
2.0G 439M 1.5G 24% /
/dev/ide/host0/bus0/target0/lun0/part6
33G 39M 31G 1% /home


# uname -r
2.6.3-15mdk


# uptime
20:25:26 up 100 days, 12 min, 1 user, load average: 0.01, 0.08, 0.04
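
The Pass 5 messages are internally consistent: the excess in the two block groups adds up exactly to the excess in the superblock total. The sketch below redoes that arithmetic with the numbers from the e2fsck output above. The 4 KiB block size (and therefore 32768 blocks per group) is an assumption; under it, both recorded per-group counts are larger than a group can hold.

```shell
#!/bin/sh
# Values copied from the e2fsck Pass 5 messages (recorded vs. counted).
g2_recorded=47844; g2_counted=30900
g4_recorded=36926; g4_counted=31744
total_recorded=8435011; total_counted=8412885

g2_excess=$((g2_recorded - g2_counted))
g4_excess=$((g4_recorded - g4_counted))
total_excess=$((total_recorded - total_counted))
echo "group #2 excess:   $g2_excess"
echo "group #4 excess:   $g4_excess"
echo "sum of groups:     $((g2_excess + g4_excess))"
echo "superblock excess: $total_excess"

# Assumption: 4 KiB blocks, so one block bitmap covers 8 * 4096 blocks.
blocks_per_group=$((8 * 4096))
for recorded in "$g2_recorded" "$g4_recorded"; do
    if [ "$recorded" -gt "$blocks_per_group" ]; then
        echo "$recorded free blocks recorded, group holds at most $blocks_per_group"
    fi
done
```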






2005-05-03 12:38:48

by Davy Durham

Subject: Re: ext3 issue..

I was thinking about delving into this problem a bit. I don't have any
unpartitioned free space on my physical drive. I was going to ask if
it's possible to create a virtual device in RAM or in a file that I
could then create an ext3 file system on for testing.. I'm at least
trying to recreate the situation of the negative diskspace usage.. then
maybe try to debug ext3 a bit. At first I thought "oh RAMFS!", then "no
wait, I couldn't create an EXT3 file system that way".. I need a
non-physical (block? or character?) device.

Thanks,
Davy





2005-05-03 12:53:54

by Christopher Chan

Subject: Re: ext3 issue..

Hi Davy.

On 5/3/05, Davy Durham <[email protected]> wrote:
> I was thinking about delving into this problem a bit. I don't have any
> unpartitioned free space on my physical drive. I was going to ask if
> it's possible to create a virtual device in RAM or in a file that I
> could then create an ext3 file system on for testing.. I'm at least
> trying to recreate the situation of the negative diskspace usage.. then
> maybe try to debug ext3 a bit. At first I thought "oh RAMFS!", then "no
> wait, I couldn't create an EXT3 file system that way".. I need a
> non-physical (block? or character?) device.

It is called a ram disk. You might want to pass options to the kernel
to increase the size of the ramdisk.

Look under Documentation for ramdisk.txt in the linux source tree.

2005-05-03 20:39:07

by Theodore Ts'o

Subject: Re: ext3 issue..

On Thu, Apr 28, 2005 at 04:55:36PM -0400, Lennart Sorensen wrote:
>
> I think I have seen this once or twice in the past. A reboot always
> made it go away and it didn't seem to come back. fsck never showed
> anything so I assumed it was just the kernel having lost its mind about
> the state of the FS.
>
> I think I was using 2.4.18 or so at the time I last saw it. It is quite
> a while ago but it was ext3 as well as far as I recall.

That's a different problem; in this case apparently the corruption is
extending to the on-disk superblock accounting information (so fsck is
detecting evidence of it). Fortunately this sort of corruption won't
cause data loss, but we should figure out what the heck is going on.

- Ted

2005-05-03 22:17:41

by Darren Williams

Subject: Re: ext3 issue..

Hi Davy

On Tue, 03 May 2005, Davy Durham wrote:

> I was thinking about delving into this problem a bit. I don't have any
> unpartitioned free space on my physical drive. I was going to ask if
> it's possible to create a virtual device in RAM or in a file that I
> could then create an ext3 file system on for testing.. I'm at least
> trying to recreate the situation of the negative diskspace usage.. then
> maybe try to debug ext3 a bit. At first I thought "oh RAMFS!", then "no
> wait, I couldn't create an EXT3 file system that way".. I need a
> non-physical (block? or character?) device.
>
Make sure your kernel has ramdisk support, then

dd if=/dev/zero of=/dev/ram bs=1024k count=512
/sbin/mkfs -t ext3 -b 1024 /dev/ram 524288
mount -t ext3 /dev/ram <wherever you want to mount the fs>

Note* you will need to be root
Note** the block size of the fs has to be equal to the bs of the ram disk
Note*** if you get "permission denied" when writing to /dev/ram, try /dev/ramX (X=[0-9])
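
If no ramdisk is available, the "in a file" variant asked about above can be sketched with a loop mount. The image path, mount point and size below are placeholders. The script only prints the commands unless RUN is set to an empty string, since the real commands need root.

```shell
#!/bin/sh
# Placeholders: adjust the image path, mount point and size to taste.
IMG=${IMG:-/tmp/ext3-test.img}
MNT=${MNT:-/mnt/ext3-test}
SIZE_MB=${SIZE_MB:-64}

# RUN=echo (the default) prints each command instead of running it.
# Invoke with RUN= (empty) as root to execute them for real.
RUN=${RUN-echo}

make_test_fs() {
    $RUN dd if=/dev/zero of="$IMG" bs=1024k count="$SIZE_MB"
    $RUN mkfs.ext3 -F -q "$IMG"     # -F: the target is a regular file
    $RUN mkdir -p "$MNT"
    $RUN mount -t ext3 -o loop "$IMG" "$MNT"
    $RUN df -k "$MNT"
}

make_test_fs
```

A loop-mounted image lives on disk rather than in RAM, so it is not limited by the ramdisk size and survives a reboot, which may help when trying to reproduce something that took months to appear.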

--------------------------------------------------
Darren Williams <dsw AT gelato.unsw.edu.au>
Gelato@UNSW <http://www.gelato.unsw.edu.au>
--------------------------------------------------