i'm trying to run e2fsck after a system hang
after 1 hour of running (at 70%, memory usage was about 128M)
i get these errors in dmesg:
Out of Memory: Killed process 732 (fsck.ext2).
Out of Memory: Killed process 732 (fsck.ext2).
Out of Memory: Killed process 732 (fsck.ext2).
Out of Memory: Killed process 732 (fsck.ext2).
and this goes on for some pages
top gives me this info for fsck.ext2:
732 root 9 0 592M 465M 2068 S 64.7 92.6 6:31 fsck.ext2
Mem: 514360K av, 512176K used, 2184K free, 0K shrd, 564K buff
Swap: 136544K av, 136544K used, 0K free, 3120K cache
the system has 512MB of memory and 128MB of swap (it's only a fileserver,
it never needed more swap)
I really wonder if there is something wrong with e2fsck?
does it really need that much memory?
(fsck on 2.2TB /dev/md0)
it was printing a lot of info on the screen (for some minutes):
Duplicate/bad block in inode ... / ... ... ... ... ...
(and scrolling past really fast)
e2fsprogs version 1.27 with kernel 2.4.20 (+lbd patch)
i tried upgrading e2fsprogs to 1.32 (the latest version), but this doesn't
help
any hints? (maybe a way to disable the enormous output from
'Duplicate/bad block in inode ..'?)
(also, why does dmesg say the process was killed while it keeps running?
otherwise it couldn't have been killed multiple times...)
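(worst case i can probably just redirect the output to a file so it doesn't
hit the console - something like this (untested, the log path is just an
example):
fsck.ext2 -y /dev/md0 > /root/fsck-md0.log 2>&1
the -y is needed since you can't answer the prompts when the output goes to
a file - though i realise this only hides the output, it won't reduce the
memory usage)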
On Fri, 7 Feb 2003, Wim Vinckier wrote:
> I've got an equivalent problem with my server. After a long search, it
> seemed to be a heating problem. The ventilation wasn't good enough to
the disks are not warm at all
there are 3 6000rpm fans blowing air over them
On Fri, 7 Feb 2003 [email protected] wrote:
> On Fri, 7 Feb 2003, Wim Vinckier wrote:
>
> > I've got an equivalent problem with my server. After a long search, it
> > seemed to be a heating problem. The ventilation wasn't good enough to
>
> the disks are not warm at all
> there are 3 6000rpm fans blowing air over them
>
I'm just wondering why you are using ext2 instead of ext3 or reiserfs...
I would still try booting the system without mounting the raid, so
you can wait until the raid is synchronized. Once that is done,
you can check your raid. BTW, I had two fans blowing air over my
hard disks but I still got a crash because I used a normal flat IDE cable...
I suppose you really checked the heat of the disks?
Wim.
------------------------------------------------------------------------
Wim VINCKIER
[email protected] ICQ 100545109
------------------------------------------------------------------------
'Windows 98 or better required' said the box... so I installed linux
On Fri, 7 Feb 2003, Wim Vinckier wrote:
> I'm just wondering why you are using ext2 instead of ext3 or reiserfs...
i'm running ext3, but the crash was bad enough to wipe the journal info :(
> I would still try booting the system without mounting the raid, so
> you can wait until the raid is synchronized. Once that is done,
i can mount the filesystem, but i get errors when accessing some files,
so i prefer to run fsck on it first (and then restore the journal info)
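(as far as i understand it, once fsck leaves me with a clean fs i can
re-add the ext3 journal with:
tune2fs -j /dev/md0
i haven't actually tried that on this array yet though)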
> you can check your raid. BTW, I had two fans blowing air over my
> hard disks but I still got a crash because I used a normal flat IDE cable...
> I suppose you really checked the heat of the disks?
yes i did
On Fri, 7 Feb 2003, Stephan van Hienen wrote:
> On Fri, 7 Feb 2003, Wim Vinckier wrote:
>
> > I'm just wondering why you are using ext2 instead of ext3 or reiserfs...
> i'm running ext3, but the crash was bad enough to wipe the journal info :(
>
I would really use fsck.ext3... I guess it will give a lot fewer errors...
> > I would still try booting the system without mounting the raid, so
> > you can wait until the raid is synchronized. Once that is done,
> i can mount the filesystem, but i get errors when accessing some files,
> so i prefer to run fsck on it first (and then restore the journal info)
>
> > you can check your raid. BTW, I had two fans blowing air over my
> > hard disks but I still got a crash because I used a normal flat IDE cable...
> > I suppose you really checked the heat of the disks?
> yes i did
>
------------------------------------------------------------------------
Wim VINCKIER
[email protected] ICQ 100545109
------------------------------------------------------------------------
'Windows 98 or better required' said the box... so I installed linux
ok, i added some swap space (4 gigabytes)
usage was about 2.5GB
until it aborted:
d0: 64450554/dev/md0: 64450555/dev/md0: 64450556/dev/md0:
64450557/dev/md0: 64450558/dev/md0: 64450559/dev/md0: 64450560/dev/md0:
64450561/dev/md0: 64450562/dev/md0: 64450563/dev/md0: 64450564/dev/md0:
64450565/dev/md0: 64450566/dev/md0: 64450567/dev/md0: 64450568/dev/md0:
64450569/dev/md0: 64450570/dev/md0: 64450571/dev/md0: 64450572/dev/md0:
64450573/dev/md0: 64450574/dev/md0: 64450575/dev/md0: 64450576/dev/md0:
64450577/dev/md0: 64450578/dev/md0: 64450579/dev/md0: 64450580/dev/md0:
64450581/dev/md0: 64450582/dev/md0: 64450583/dev/md0: 64450584/dev/md0:
64450585/dev/md0: 64450586/dev/md0: 64450587/dev/md0: 64450588/dev/md0:
64450589/dev/md0: 64450590e2fsck: Can't allocate block element
e2fsck: aborted
/dev/md0: 153834/76922880 files (9.3% non-contiguous), 181680730/615381536
blocks
any hints ?
(i really would like to get back a clean fs (with ext3 journal))
On Fri, 7 Feb 2003, Wim Vinckier wrote:
> I would really use fsck.ext3... I guess it will give a lot fewer errors...
fsck.ext3 = fsck.ext2
]# fsck.ext3 /dev/md0
e2fsck 1.32 (09-Nov-2002)
On Feb 07, 2003 18:07 +0100, Stephan van Hienen wrote:
> ok, i added some swap space (4 gigabytes)
>
> usage was about 2.5GB
>
> until it aborted:
>
> d0: 64450554/dev/md0: 64450555/dev/md0: 64450556/dev/md0:
> 64450557/dev/md0: 64450558/dev/md0: 64450559/dev/md0: 64450560/dev/md0:
> 64450561/dev/md0: 64450562/dev/md0: 64450563/dev/md0: 64450564/dev/md0:
> 64450565/dev/md0: 64450566/dev/md0: 64450567/dev/md0: 64450568/dev/md0:
> 64450569/dev/md0: 64450570/dev/md0: 64450571/dev/md0: 64450572/dev/md0:
> 64450573/dev/md0: 64450574/dev/md0: 64450575/dev/md0: 64450576/dev/md0:
> 64450577/dev/md0: 64450578/dev/md0: 64450579/dev/md0: 64450580/dev/md0:
> 64450581/dev/md0: 64450582/dev/md0: 64450583/dev/md0: 64450584/dev/md0:
> 64450585/dev/md0: 64450586/dev/md0: 64450587/dev/md0: 64450588/dev/md0:
> 64450589/dev/md0: 64450590e2fsck: Can't allocate block element
>
> e2fsck: aborted
> /dev/md0: 153834/76922880 files (9.3% non-contiguous), 181680730/615381536
> blocks
>
> any hints ?
> (i really would like to get back a clean fs (with ext3 journal))
Hmm, I don't think that will be easy... By default e2fsck will load all
of the inode blocks into memory (pretty sure at least), and if you have
76922880 inodes that is 9.6GB of memory, which you can't allocate from a
single process on i386 no matter how much swap you have. 2.5GB sounds
about right for the maximum amount of memory one can allocate.
Ted, any suggestions?
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
On Feb 07, 2003 16:17 +0100, [email protected] wrote:
> i'm trying to run e2fsck after a system hang
> after 1 hour of running (at 70%, memory usage was about 128M)
> i get these errors in dmesg:
>
> Out of Memory: Killed process 732 (fsck.ext2).
> Out of Memory: Killed process 732 (fsck.ext2).
> Out of Memory: Killed process 732 (fsck.ext2).
> Out of Memory: Killed process 732 (fsck.ext2).
>
> I really wonder if there is something wrong with e2fsck?
> does it really need that much memory?
> (fsck on 2.2TB /dev/md0)
I don't think many people have run e2fsck on such a large filesystem
before when there are lots of problems. It is entirely possible that
you need so much memory for such a large filesystem. I would suggest
creating a larger swap file temporarily (on some other partition) so
that e2fsck can complete.
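Something like the following should do it (just a rough sketch - /mnt/spare
is a placeholder for whatever partition has free space, and note that on
i386 a single swap area is limited to roughly 2GB, so you may need several
files):

dd if=/dev/zero of=/mnt/spare/swapfile bs=1M count=2048
mkswap /mnt/spare/swapfile
swapon /mnt/spare/swapfile

and swapoff + delete the file again once e2fsck has finished.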
It _may_ be that e2fsck could reduce memory consumption somewhere (or
enable a "use less memory but run slowly" heuristic), but that isn't very
likely, and even if such a mode existed it would be very slow.
Regarding the "use fsck.ext3" response - ignore it, it is incorrect.
There is no difference at all between fsck.ext2, fsck.ext3, and e2fsck.
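(They are normally just links to the same binary - you can verify that with
something like:

ls -li /sbin/e2fsck /sbin/fsck.ext2 /sbin/fsck.ext3

all three should show the same inode number.)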
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
On Fri, 7 Feb 2003, Andreas Dilger wrote:
> Hmm, I don't think that will be easy... By default e2fsck will load all
> of the inode blocks into memory (pretty sure at least), and if you have
> 76922880 inodes that is 9.6GB of memory, which you can't allocate from a
> single process on i386 no matter how much swap you have. 2.5GB sounds
> about right for the maximum amount of memory one can allocate.
hmm, the data is not critical yet (i was just testing this server)
i really wonder why the crash happened in the first place
this is what i found in /var/log/messages:
Feb 7 04:18:15 storage kernel: EXT3-fs error (device md(9,0)): ext3_new_block: Allocating block in system zone - block = 536875638
Feb 7 04:18:15 storage kernel: EXT3-fs error (device md(9,0)): ext3_new_block: Allocating block in system zone - block = 536875639
doesn't look ok to me (and explains the crash?)
makes me wonder if this can have to do with the lbd patch (to allow 2TB+
devices)? or is this something else?
(if it can be related to the lbd patch, i will remove 2 hd's from the
array, but i'd rather not)
also, to avoid getting into this situation again (where i can't fsck my
filesystem), what settings should i use for creating a large filesystem on
/dev/md0? (what is the maximum workable number of inodes?)
i did this :
mke2fs -j -m 0 -b 4096 -i 4096 -R stride=16
and yesterday i had another crash (i was using /dev/md0 mounted without
having run fsck; it ran ok for about 24h, only 1 dir was not accessible
(it was created at the time of the previous crash))
this time there is a bit more info in /var/log/messages:
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871063
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871065
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871071
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871079
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871081
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871083
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871085
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871087
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871095
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871103
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871108
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871114
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871119
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871121
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871123
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871127
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871129
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871135
Feb 8 20:11:27 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 536871143
..
..(a few minutes of roughly the same messages, only with different blocks)
..
Feb 8 20:19:12 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 540606715
Feb 8 20:19:12 storage kernel: EXT2-fs error (device md(9,0)): ext2_new_block: Allocating block in system zone - block = 540606717
Feb 8 20:19:36 storage kernel: raid5: multiple 1 requests for sector 2064432
Feb 8 20:22:17 storage kernel: raid5: multiple 0 requests for sector 14094488
Feb 8 20:29:12 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:12 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:13 storage last message repeated 4 times
Feb 8 20:29:13 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:13 storage last message repeated 2 times
Feb 8 20:29:13 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:13 storage last message repeated 6 times
Feb 8 20:29:13 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:13 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:13 storage kernel: raid5: multiple 0 requests for sector 6496
Feb 8 20:29:13 storage last message repeated 5 times
Feb 8 20:29:13 storage kernel: raid5: multiple 0 requests for sector 25792
Feb 8 20:29:13 storage last message repeated 5 times
Feb 8 20:29:14 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:14 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:14 storage last message repeated 2 times
Feb 8 20:29:14 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:14 storage last message repeated 4 times
Feb 8 20:29:15 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:15 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:15 storage last message repeated 2 times
Feb 8 20:29:16 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:16 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:16 storage last message repeated 2 times
Feb 8 20:29:16 storage kernel: raid5: multiple 0 requests for sector 6496
Feb 8 20:29:16 storage last message repeated 2 times
Feb 8 20:29:16 storage kernel: raid5: multiple 0 requests for sector 25792
Feb 8 20:29:16 storage last message repeated 2 times
Feb 8 20:29:16 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:16 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:16 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:16 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:16 storage last message repeated 6 times
Feb 8 20:29:17 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:17 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:18 storage last message repeated 4 times
Feb 8 20:29:18 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:18 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:18 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:18 storage last message repeated 2 times
Feb 8 20:29:19 storage kernel: raid5: multiple 0 requests for sector 306783232
Feb 8 20:29:19 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:19 storage last message repeated 2 times
Feb 8 20:29:19 storage kernel: raid5: multiple 0 requests for sector 9587064
Feb 8 20:29:19 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:19 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:19 storage last message repeated 8 times
Feb 8 20:29:19 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:19 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:19 storage last message repeated 2 times
Feb 8 20:29:21 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:21 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:21 storage last message repeated 4 times
Feb 8 20:29:21 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:31 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:31 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:32 storage kernel: raid5: multiple 0 requests for sector 3670152
Feb 8 20:29:32 storage kernel: raid5: multiple 0 requests for sector 306783480
Feb 8 20:29:32 storage kernel: raid5: multiple 0 requests for sector 6496
Feb 8 20:29:32 storage last message repeated 5 times
Feb 8 20:29:32 storage kernel: raid5: multiple 0 requests for sector 25792
(at that point i did a power-down; a reboot was not possible)
>>>>> "Stephan" == Stephan van Hienen <[email protected]> writes:
Stephan> makes me wonder if this can have to do with the lbd patch (to
Stephan> allow 2TB+ devices)? or is this something else? (if it can
Stephan> be related to the lbd patch, i will remove 2 hd's from the
Stephan> array, but i'd rather not)
I haven't tested ext[23] with that large a system on IA32 (I stopped
at 2.4TB, and that was on Linux 2.5). The 2.4 LBD patch was basically
backported from the 2.5.9 version (the last tested version before Al
Viro's rewrite of the block device and partitioning code). Differences in
ext[32] between 2.4.20 and 2.5.9 may not have been allowed for
properly.
I'll have a look when I'm in at work today.
Is there any reason why you're sticking with the 2.4 kernel and ext3?
XFS has been used (on SGI systems) for much longer with large disk
arrays, and I'd expect (linux-specific bugs aside) it to be a more
mature product for this application.
Peter C
On Feb 09, 2003 11:08 +0100, Stephan van Hienen wrote:
> makes me wonder if this can have to do with the lbd patch (to allow 2TB+
> devices)? or is this something else?
> (if it can be related to the lbd patch, i will remove 2 hd's from the
> array, but i'd rather not)
Now that you mention this, I believe that there were some fixes to the ext2/3
code to avoid overflowing some calculations, but I don't recall the specifics.
It sure seems unusual to have such easy-to-reproduce errors.
> mke2fs -j -m 0 -b 4096 -i 4096 -R stride=16
Do you expect to have so many small files in this huge filesystem?
Basically, the "-i" parameter is telling mke2fs what you think the
average file size will be, so 4kB is very small.
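If your files really average a megabyte or more, raising -i cuts the inode
count (and hence e2fsck's memory needs) by orders of magnitude. Just as a
sketch - tune the number to your expected average file size:

mke2fs -j -m 0 -b 4096 -i 1048576 -R stride=16 /dev/md0

i.e. one inode per 1MB of filesystem space instead of one per 4kB.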
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
On Mon, 10 Feb 2003, Peter Chubb wrote:
> Is there any reason why you're sticking with the 2.4 kernel and ext3?
> XFS has been used (on SGI systems) for much longer with large disk
> arrays, and I'd expect (linux-specific bugs aside) it to be a more
> mature product for this application.
i've used ext2/3 on all my servers
never checked out xfs or reiserfs, so i don't really want to try them out on
an important server, but if it's better to switch to something else..... ?
On Sun, 9 Feb 2003, Andreas Dilger wrote:
> > mke2fs -j -m 0 -b 4096 -i 4096 -R stride=16
>
> Do you expect to have so many small files in this huge filesystem?
> Basically, the "-i" parameter is telling mke2fs what you think the
> average file size will be, so 4kB is very small.
not really, i thought -b was setting that?
i think the average filesize should be somewhere from 1-5 megabytes
(zipfiles of a few megabytes / videofiles (which can be a few gigabytes) /
installation files for programs)
i also wonder what chunk-size i should use
i use 64k now, but i wonder if 256k (or something bigger?) would be better
(does chunk size make a performance difference between a 4-disk raid5 and a
15-disk raid5?)
Hi,
On Sun, 2003-02-09 at 10:08, Stephan van Hienen wrote:
> Feb 7 04:18:15 storage kernel: EXT3-fs error (device md(9,0)): ext3_new_block: Allocating block in system zone - block = 536875638
That looks like it could be a block wrap, amongst other possible causes.
> makes me wonder if this can have to do with the lbd patch (to allow 2TB+
> devices)? or is this something else?
Well, that's the most likely candidate, because it's the least tested
component. Are you using Ben LaHaise's LBD fixes for the md devices?
Without those, md and lvm are not LBD-safe.
Cheers,
Stephen
On Mon, 10 Feb 2003, Stephen C. Tweedie wrote:
> On Sun, 2003-02-09 at 10:08, Stephan van Hienen wrote:
>
> > Feb 7 04:18:15 storage kernel: EXT3-fs error (device md(9,0)): ext3_new_block: Allocating block in system zone - block = 536875638
>
> That looks like it could be a block wrap, amongst other possible causes.
hmm, and what does this mean?
>
> > makes me wonder if this can have to do with the lbd patch (to allow 2TB+
> > devices)? or is this something else?
>
> Well, that's the most likely candidate, because it's the least tested
> component. Are you using Ben LaHaise's LBD fixes for the md devices?
> Without those, md and lvm are not LBD-safe.
where can i find these lbd fixes for md?
Hi,
On Tue, 2003-02-11 at 13:11, Stephan van Hienen wrote:
> > On Sun, 2003-02-09 at 10:08, Stephan van Hienen wrote:
> >
> > > Feb 7 04:18:15 storage kernel: EXT3-fs error (device md(9,0)): ext3_new_block: Allocating block in system zone - block = 536875638
> >
> > That looks like it could be a block wrap, amongst other possible causes.
> hmm, and what does this mean?
One possible cause here is that some component of the system has wrapped
the block number round at 2TB, rather than correctly going beyond 2TB,
resulting in the wrong block being picked up as a bitmap block.
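As a quick back-of-the-envelope check (my own arithmetic, not from the logs
themselves): with 4096-byte blocks, the 2TiB point falls at block 2^29, and
the failing blocks sit just past it:

2^29 = 536870912 (first 4k block past 2TiB)
536875638 - 536870912 = 4726 (the Feb 7 errors)
536871063 - 536870912 = 151 (the Feb 8 errors)

so every reported block is within a few thousand blocks of the 32-bit
512-byte-sector wrap point, which would fit that theory.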
> > Well, that's the most likely candidate, because it's the least tested
> > component. Are you using Ben LaHaise's LBD fixes for the md devices?
> > Without those, md and lvm are not LBD-safe.
> where can i find these lbd fixes for md?
I've no idea. Ben has some lb patches up at
http://people.redhat.com/bcrl/lb/
but there's nothing broken out against the latest lbd diffs.
Cheers,
Stephen