2001-02-02 19:45:36

by Richard B. Johnson

Subject: Version 2.4.1 has ext2 problems.

Files generated by e2fsck in lost+found cannot be removed.

Script started on Fri Feb 2 14:29:55 2001
# df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/sdc1 6356624 2473924 3559796 41% /
/dev/sdc3 2253284 1373532 765292 64% /home/users
/dev/sda1 1048272 279504 768768 27% /dos/drive_C
/dev/sda5 1046224 181200 865024 17% /dos/drive_D
/dev/sdb1 2020332 1743937 171975 91% /alt
# e2fsck -f /dev/sdd1
e2fsck 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdd1: 99/262144 files (0.0% non-contiguous), 8238/524112 blocks
# mount /dev/sdd1 /mnt
# df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/sdc1 6356624 2473924 3559796 41% /
/dev/sdc3 2253284 1373532 765292 64% /home/users
/dev/sda1 1048272 279504 768768 27% /dos/drive_C
/dev/sda5 1046224 181200 865024 17% /dos/drive_D
/dev/sdb1 2020332 1743937 171975 91% /alt
/dev/sdd1 2063504 8 1958676 0% /mnt
# cd /mnt
# cd lost+found
# ls
#1006 #1329 #1563 #1830 #2051 #2228 #2364 #2602 #362 #587 #73
#1057 #134 #1579 #1856 #2096 #2242 #2373 #2610 #365 #588 #735
#1140 #1344 #1613 #1875 #2114 #2260 #2392 #2612 #433 #591 #76
#1149 #1363 #1634 #1979 #2120 #2263 #24 #2623 #442 #626 #787
#1219 #137 #1654 #1995 #2121 #2264 #2460 #2651 #554 #640 #796
#1241 #1451 #1696 #1997 #2125 #2322 #2496 #30 #556 #667 #816
#1320 #1516 #1733 #200 #2160 #2342 #2497 #301 #57 #715 #818
#1327 #1535 #1758 #2012 #2173 #2353 #2498 #304 #574 #724 #819
# rm *
rm: cannot remove `#1006': Value too large for defined data type
rm: cannot remove `#1057': Value too large for defined data type
rm: cannot remove `#1140': Value too large for defined data type
rm: cannot remove `#1149': Value too large for defined data type
rm: cannot remove `#1219': Value too large for defined data type
rm: cannot remove `#1241': Value too large for defined data type
rm: cannot remove `#1320': Value too large for defined data type
rm: cannot remove `#1327': Value too large for defined data type
rm: cannot remove `#1329': Value too large for defined data type
rm: cannot remove `#134': Value too large for defined data type
rm: cannot remove `#1344': Value too large for defined data type
rm: cannot remove `#1363': Value too large for defined data type
rm: cannot remove `#137': Value too large for defined data type
rm: cannot remove `#1451': Value too large for defined data type
[SNIPPED...]

# ls -la
ls: #24: Value too large for defined data type
ls: #30: Value too large for defined data type
ls: #57: Value too large for defined data type
ls: #73: Value too large for defined data type
ls: #76: Value too large for defined data type
ls: #134: Value too large for defined data type
ls: #137: Value too large for defined data type
ls: #200: Value too large for defined data type
ls: #301: Value too large for defined data type
ls: #304: Value too large for defined data type
ls: #362: Value too large for defined data type
ls: #365: Value too large for defined data type
ls: #433: Value too large for defined data type
ls: #442: Value too large for defined data type
ls: #554: Value too large for defined data type
ls: #556: Value too large for defined data type
ls: #574: Value too large for defined data type
ls: #587: Value too large for defined data type
ls: #588: Value too large for defined data type
[SNIPPED...]


total 8
drwxr-xr-x 2 root root 4096 Feb 2 13:40 .
drwxr-xr-x 3 root root 4096 Feb 2 13:40 ..
# strace rm *
execve("/bin/rm", ["rm", "#1006", "#1057", "#1140", "#1149", "#1219", "#1241", "#1320", "#1327", "#1329", "#134", "#1344", "#1363", "#137", "#1451", "#1516", "#1535", "#1563", "#1579", "#1613", "#1634", "#1654", "#1696", "#1733", "#1758", "#1830", "#1856",
"#1875", "#1979", "#1995", "#1997", "#200", "#2012", "#2051", "#2096", "#2114", "#2120", "#2121", "#2125", "#2160", "#2173", "#2228", "#2242", "#2260", "#2263", "#2264", "#2322", "#2342", "#2353", "#2364", "#2373", "#2392", "#24", "#2460", "#2496", "#2497",
"#2498", "#2602", "#2610", "#2612", "#2623", "#2651", "#30", "#301", "#304", "#362", "#365", "#433", "#442", "#554", "#556", "#57", "#574", "#587", "#588", "#591", "#626", "#640", "#667", "#715", "#724", "#73", "#735", "#76", "#787", "#796", "#816",
"#818", "#819"], [/* 32 vars */]) = 0
brk(0) = 0x8050318
[SNIPPED extra stuff...]

lstat("#1057", 0xbffff2c0) = -1 EOVERFLOW (Value too large for defined data type)
write(2, "rm: ", 4rm: ) = 4
write(2, "cannot remove `#1057\'", 21cannot remove `#1057') = 21
write(2, ": Value too large for defined da"..., 39: Value too large for defined data type) = 39
write(2, "\n", 1
) = 1
lstat("#1140", 0xbffff2c0) = -1 EOVERFLOW (Value too large for defined data type)
write(2, "rm: ", 4rm: ) = 4
write(2, "cannot remove `#1140\'", 21cannot remove `#1140') = 21
write(2, ": Value too large for defined da"..., 39: Value too large for defined data type) = 39
write(2, "\n", 1
) = 1
lstat("#1149", 0xbffff2c0) = -1 EOVERFLOW (Value too large for defined data type)
write(2, "rm: ", 4rm: ) = 4
write(2, "cannot remove `#1149\'", 21cannot remove `#1149') = 21
write(2, ": Value too large for defined da"..., 39: Value too large for defined data type) = 39
write(2, "\n", 1
) = 1
lstat("#1219", 0xbffff2c0) = -1 EOVERFLOW (Value too large for defined data type)
write(2, "rm: ", 4rm: ) = 4
write(2, "cannot remove `#1219\'", 21cannot remove `#1219') = 21
write(2, ": Value too large for defined da"..., 39: Value too large for defined data type) = 39
write(2, "\n", 1
) = 1
[Snipped...]

# exit
exit

Script done on Fri Feb 2 14:34:46 2001


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.



2001-02-02 19:54:47

by Petr Vandrovec

Subject: Re: Version 2.4.1 has ext2 problems.

On 2 Feb 01 at 14:44, Richard B. Johnson wrote:

> # rm *
> rm: cannot remove `#1006': Value too large for defined data type
> rm: cannot remove `#1057': Value too large for defined data type
> rm: cannot remove `#1140': Value too large for defined data type
> ls: #588: Value too large for defined data type
> [SNIPPED...]
>
> lstat("#1057", 0xbffff2c0) = -1 EOVERFLOW (Value too large for defined data type)

Your fileutils are too old, and maybe glibc too. They do not handle >2GB files.
And 'rm', for some strange reason, calls lstat() on a file before removing
it. As a workaround, do:

cd lost+found
for a in *; do echo > "$a"; done
rm *

BTW, who created those files? Maybe there is some way to get past the
2GB limit check without specifying O_LARGEFILE? But more probably it is stupid
software using O_LARGEFILE without knowing the consequences...
Petr Vandrovec
[email protected]
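
For readers following the EOVERFLOW diagnosis above, here is a minimal sketch in Python of the size check it implies. It assumes a binary built without large-file support, where the file size is reported through a signed 32-bit off_t. The names OFF_T_MAX_32 and fits_in_32bit_off_t are illustrative only and are not part of fileutils or glibc.

```python
# Largest size a signed 32-bit off_t can represent: 2**31 - 1 bytes.
OFF_T_MAX_32 = 2**31 - 1


def fits_in_32bit_off_t(size_in_bytes: int) -> bool:
    """Return True if the size can be reported through a 32-bit off_t."""
    return 0 <= size_in_bytes <= OFF_T_MAX_32


# Any size past the limit is what lstat() cannot report, hence EOVERFLOW.
for size in (4096, OFF_T_MAX_32, OFF_T_MAX_32 + 1):
    verdict = "ok" if fits_in_32bit_off_t(size) else "EOVERFLOW"
    print(f"{size:>12} bytes: {verdict}")
```

A corrupted inode with a garbage size field would trip the same check even though no real 2GB file was ever written.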


2001-02-02 20:01:28

by Richard B. Johnson

Subject: Re: Version 2.4.1 has ext2 problems.

On Fri, 2 Feb 2001, Petr Vandrovec wrote:

> On 2 Feb 01 at 14:44, Richard B. Johnson wrote:
>
> > # rm *
> > rm: cannot remove `#1006': Value too large for defined data type
> > rm: cannot remove `#1057': Value too large for defined data type
> > rm: cannot remove `#1140': Value too large for defined data type
> > ls: #588: Value too large for defined data type
> > [SNIPPED...]
> >
> > lstat("#1057", 0xbffff2c0) = -1 EOVERFLOW (Value too large for defined data type)
>
> Your fileutils are too old, and maybe glibc too. They do not handle >2GB files.
> And 'rm', for some strange reason, calls lstat() on a file before removing
> it. As a workaround, do:
>
> cd lost+found
> for a in *; do echo > "$a"; done
> rm *
>
> BTW, who created those files? Maybe there is some way to get past the
> 2GB limit check without specifying O_LARGEFILE? But more probably it is stupid
> software using O_LARGEFILE without knowing the consequences...
> Petr Vandrovec
> [email protected]

Thanks. The work-around was to make another file-system. Unfortunately
truncating a file with `>filename` also fails. The files were created
by e2fsck after a crash with version 2.4.1. The entire file-system
was reduced to junk.
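
The truncation failure is consistent with open() also refusing an over-sized file when O_LARGEFILE is not given. As a hedged sketch only, the Python below removes directory entries with a plain unlink call, which needs neither lstat() nor open(). The helper name unlink_all is hypothetical, and whether the kernel would accept the unlink on a filesystem this badly corrupted is untested.

```python
import os
import sys


def unlink_all(directory: str) -> list:
    """Try to unlink every entry in `directory`; return the names that failed."""
    failed = []
    # os.listdir only reads directory entries; it does not stat them.
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            # Plain unlink: the file is never opened and its size is never read.
            os.unlink(path)
        except OSError as err:
            print(f"{path}: {err.strerror}", file=sys.stderr)
            failed.append(name)
    return failed
```

Calling unlink_all("/mnt/lost+found") would attempt every entry and return the names that could not be removed. Subdirectories are reported as failures rather than descended into.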


Cheers,
Dick Johnson



2001-02-03 10:29:56

by Russell King

Subject: Re: Version 2.4.1 has ext2 problems.

Richard B. Johnson writes:
> Files generated by e2fsck in lost+found cannot be removed.
> # rm *
> rm: cannot remove `#1006': Value too large for defined data type

Well, I can say that this isn't an isolated incident. I was hitting 2.4.1
hard last night on ARM, and ended up losing my /usr and /var mountpoints
and a few other files to this exact corruption.

I resorted to using debugfs to remove these entries, and re-running e2fsck.
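
The exact debugfs commands used are not given in this message. As a sketch under that caveat, the Python below builds one debugfs invocation per entry using its "rm" request. The -w flag opens the filesystem read-write and -R runs a single request. The helper names are hypothetical, the device must be unmounted, and nothing here has been run against a damaged filesystem.

```python
import subprocess


def debugfs_rm_command(device: str, path: str) -> list:
    """Build a debugfs invocation that unlinks `path` on `device`."""
    # -w: open the filesystem read-write; -R: run one request and exit.
    return ["debugfs", "-w", "-R", f"rm {path}", device]


def remove_with_debugfs(device: str, paths: list) -> None:
    """Run one debugfs 'rm' request per path on an unmounted filesystem."""
    for path in paths:
        # check=True raises if debugfs exits with a non-zero status.
        subprocess.run(debugfs_rm_command(device, path), check=True)
```

For example, debugfs_rm_command("/dev/sdd1", "/lost+found/#1006") yields the argument list for a single entry. Re-running e2fsck afterwards, as described above, lets it repair the reference counts and bitmaps.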

Oh, the other interesting thing about it was that they had random modes
(eg, 1066440) - e2fsck also complained about a large number of errors on
the affected inodes (eg, various fields of the inode structure which should
be zero, d_time stuff, etc). Sorry, don't have the e2fsck logs, and I'm
reluctant to try to reproduce it.

I've been wondering if the ARMv3 implementation of insw/outsw is broken
(yes, it's running in PIO only), hence I haven't reported it until now,
but it seemed to check out last night.

Maybe this problem and my random process SEGV problem are connected in
some way. Basically, I was trying to track down a problem with processes
getting SEGV'd when swap partitions were enabled. I ended up with init
in a loop panicking about SEGVs. It turns out that the wrong page had
been paged back into the binary, and therefore glibc's __environ
pointer was corrupted. Specifically, the page that was placed there was
the immediately preceding page.

I know that other people have been seeing weird effects on 2.4.1 with
corrupted zero pages, but I don't think this is my problem.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-02-05 16:44:35

by Richard B. Johnson

Subject: Re: Version 2.4.1 has ext2 problems.

On Sat, 3 Feb 2001, Russell King wrote:

> Richard B. Johnson writes:
> > Files generated by e2fsck in lost+found cannot be removed.
> > # rm *
> > rm: cannot remove `#1006': Value too large for defined data type
>
> Well, I can say that this isn't an isolated incident. I was hitting 2.4.1
> hard last night on ARM, and ended up losing my /usr and /var mountpoints
> and a few other files to this exact corruption.
>
> I resorted to using debugfs to remove these entries, and re-running e2fsck.
>
> Oh, the other interesting thing about it was that they had random modes
> (eg, 1066440) - e2fsck also complained about a large number of errors on
> the affected inodes (eg, various fields of the inode structure which should
> be zero, d_time stuff, etc). Sorry, don't have the e2fsck logs, and I'm
> reluctant to try to reproduce it.
>
[Snipped...]

Methinks that there are a few problems (races?) that remain with this version.
The problem is that the incidents occur at random and you wouldn't
want to deliberately produce them... The result is no file-system.

Hopefully, the on-going work will kill these bugs.


Cheers,
Dick Johnson
