Hi,
I got the zero-filled files after reboot. I've tried
to compile two kernels (one with make -j2 and
the other one just with make) simultaneously having
3 running 'find . -type f -print0 | xargs -0 cat >/dev/null'.
After reboot i've got .config of the one of the kernels
filled with zeroes, also .bash_history and some others
(all of them reside on a reiserfs volume, which is also my home, btw).
The copies of the bzImage's and modules are ok (they were
to ext2 volumes).
I suppose the files were open for writing at some point
of that session. I'm sure they were closed by the moment
of system shutdown (i have a killall5 -TERM ... sequence in
the shutdown scripts).
There were no crashes or suspicious messages on the console.
Nothing special in logs, and sorry, reiserfs self-debugging
wasn't enabled.
-alex
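For anyone who wants to check a volume for the same symptom, here is a minimal sketch of a scanner that lists non-empty files consisting only of zero bytes. The function names and the chunk size are hypothetical, chosen for illustration; nothing here is reiserfs-specific.

```python
import os

CHUNK = 64 * 1024  # read size; an arbitrary choice for this sketch


def is_zero_filled(path):
    """Return True if the file is non-empty and contains only NUL bytes."""
    if os.path.getsize(path) == 0:
        return False  # an empty file is not the symptom described here
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return True  # reached EOF without seeing a non-zero byte
            if chunk.count(0) != len(chunk):
                return False


def find_zero_filled(root):
    """Yield paths of regular files under root that are entirely zeroes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                if is_zero_filled(path):
                    yield path
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
```

Something like `for p in find_zero_filled(os.path.expanduser("~")): print(p)` would list candidates; files that are legitimately all zeroes will show up too, so the output still needs a look by hand.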
Hello!
On Thu, Feb 07, 2002 at 08:23:48AM +0100, Alex Riesen wrote:
> There were no crashes or suspicious messages on the console.
> Nothing special in logs, and sorry, reiserfs self-debugging
> wasn't enabled.
Can you try the patch attached? It may not fix the thing, but
we want to be sure (and we'll try to reproduce locally at the same time).
Also try to run reiserfsck --check on your reiserfs partitions.
Bye,
Oleg
Hi,
tried the patch. The problem looks gone, although i've placed
the system under even some more load than before (8.9, maybe not
impressive but first time for this one).
The reiserfsck showed up some nasty looking errors:
shrink_id_map: objectid map shrinked: used 4096, 5 blocks
grow_id_map: objectid map expanded: used 5120, 5 blocks
grow_id_map: objectid map expanded: used 10240, 10 blocks
bad_leaf: block 211482 has wrong order of items
...more of that...
free block count 1326452 mismatches with a correct one 1326458.
on-disk bitmap does not match to the correct one. 1 bytes differ
"reiserfsck --rebuild-tree" cured them without visible damages for now.
There were some messages about deleted blocks, expanded objectid map,
shrinked map and one "dir 1 2 has wrong sd_size 120, has to be 152".
I can send you logs, if needed.
Does 2.5.4-pre2 contain this patch?
-alex
On Thu, Feb 07, 2002 at 10:44:20AM +0300, Oleg Drokin wrote:
> Hello!
>
> On Thu, Feb 07, 2002 at 08:23:48AM +0100, Alex Riesen wrote:
>
>> There were no crashes or suspicious messages on the console.
>> Nothing special in logs, and sorry, reiserfs self-debugging
>> wasn't enabled.
> Can you try the patch attached? It may not fix the thing, but
> we want to be sure (and we'll try to reproduce locally at the same time).
> Also try to run reiserfsck --check on your reiserfs partitions.
>
> Bye,
> Oleg
> --- linux-2.5.4-pre1/fs/reiserfs/inode.c.orig Wed Feb 6 11:18:35 2002
> +++ linux-2.5.4-pre1/fs/reiserfs/inode.c Wed Feb 6 11:12:08 2002
...
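As a small aid for reading the reiserfsck output quoted above, the sketch below pulls the two numbers out of the "free block count" line and reports how many blocks the on-disk count is off by. The function name and the regular expression are assumptions written against the one line shown in this thread; other reiserfsck versions may word the message differently.

```python
import re

# Pattern written against the single line quoted in this thread.
FREE_COUNT_RE = re.compile(
    r"free block count (\d+) mismatches with a correct one (\d+)"
)


def free_block_delta(line):
    """Return (correct - on_disk) free blocks, or None if the line does not match."""
    m = FREE_COUNT_RE.search(line)
    if m is None:
        return None
    on_disk, correct = int(m.group(1)), int(m.group(2))
    return correct - on_disk
```

For the line in the report, the on-disk count is 6 blocks lower than the recomputed one.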
Hello!
On Thu, Feb 07, 2002 at 11:02:35PM +0100, Alex Riesen wrote:
> The reiserfsck showed up some nasty looking errors:
> shrink_id_map: objectid map shrinked: used 4096, 5 blocks
> grow_id_map: objectid map expanded: used 5120, 5 blocks
> grow_id_map: objectid map expanded: used 10240, 10 blocks
> bad_leaf: block 211482 has wrong order of items
> ...more of that...
> free block count 1326452 mismatches with a correct one 1326458.
> on-disk bitmap does not match to the correct one. 1 bytes differ
Have you mkreiserfs'ed your partition before testing the patch I've sent you?
Or have you at least run reiserfsck before a test run to ensure
these corruptions are not from the previous kernels (particularly the
"bad_leaf: block 211482 has wrong order of items" record worries me)?
> "reiserfsck --rebuild-tree" cured them without visible damages for now.
> There were some messages about deleted blocks, expanded objectid map,
> shrinked map and one "dir 1 2 has wrong sd_size 120, has to be 152".
> I can send you logs, if needed.
Sure, please do.
> Does the 2.5.4-pre2 contains this patch ?
Yes.
Thank you.
Bye,
Oleg
Hello!
On Fri, Feb 08, 2002 at 08:51:55AM +0300, Oleg Drokin wrote:
> these corruptions are not from the previous kernels (particularly
> bad_leaf: block 211482 has wrong order of items record worries me)
Also I hope this is not on the same box, where you are getting
Machine Check Exceptions.
Bye,
Oleg
Hi,
hmm.. You're demanding too much(mkreiserfs) - it's my home partition :)
And really sorry, i even forgot to reiserfsck it before the patch.
Maybe the corruptions are from previous kernels, but the zero-files
are observed for the first time, particularly in the .bash_history.
And yes, that's the same box (i have no other spare box to
experiment with). The memtest didn't find anything (though
i had it run only 1 pass, for about 1 hour). The processor
is not overclocked, the cooler is native (sold together with
the processor). But the zero-files were seen the day before
the machine check exceptions occurred.
Sorry for such a dirty test environment, i was really not prepared.
Logs attached.
-alex
On Fri, Feb 08, 2002 at 08:51:55AM +0300, Oleg Drokin wrote:
> > these corruptions are not from the previous kernels (particularly
> > bad_leaf: block 211482 has wrong order of items record worries me)
> Also I hope this is not on the same box, where you are getting
> Machine Check Exceptions.
On Fri, Feb 08, 2002 at 08:51:56AM +0300, Oleg Drokin wrote:
> Hello!
>
> On Thu, Feb 07, 2002 at 11:02:35PM +0100, Alex Riesen wrote:
>
> > The reiserfsck showed up some nasty looking errors:
> > shrink_id_map: objectid map shrinked: used 4096, 5 blocks
> > grow_id_map: objectid map expanded: used 5120, 5 blocks
> > grow_id_map: objectid map expanded: used 10240, 10 blocks
> > bad_leaf: block 211482 has wrong order of items
> > ...more of that...
> > free block count 1326452 mismatches with a correct one 1326458.
> > on-disk bitmap does not match to the correct one. 1 bytes differ
>
> Have you mkreiserfs'ed your partition before testing the patch I've sent you?
> Or have you at least made a reiserfsck before a test run to ensure,
> these corruptions are not from the previous kernels (particularly
> bad_leaf: block 211482 has wrong order of items record worries me)
>
> > "reiserfsck --rebuild-tree" cured them without visible damages for now.
> > There were some messages about deleted blocks, expanded objectid map,
> > shrinked map and one "dir 1 2 has wrong sd_size 120, has to be 152".
> > I can send you logs, if needed.
> Sure, please do.
>
> > Does the 2.5.4-pre2 contains this patch ?
> Yes.
>
> Thank you.
>
> Bye,
> Oleg
Hello!
On Fri, Feb 08, 2002 at 11:07:13PM +0100, Alex Riesen wrote:
> hmm.. You're demanding too much(mkreiserfs) - it's my home partition :)
At least reiserfsck before any tests is almost mandatory ;)
> Maybe the corruptions are from previous kernels, but the zero-files
> are observed for the first time, particularly in the .bash_history.
Yes, but you said with the patch you cannot reproduce zero files anymore.
> Sorry for such a dirty test environment, i was really not prepared.
> Logs attached.
I am sorry, but there are so many variables, these logs are barely useful as
of now.
If you can reproduce on a clean filesystem with non-faulty hardware, that would be interesting, though.
Thank you.
Bye,
Oleg
On Mon, Feb 11, 2002 at 12:52:27PM +0100, Luigi Genoni wrote:
> I got the same with 2.5.4-pre1 on a ATA66 disk,
> chipset i810, PentiumIII with 256 MBRAM,
> and then on Athlon 1300 Mhz, scsi disk, adaptec
> 2940UW, 512MB RAM.
>
> I saw then just after a reboot.
> Those file has been opened three or four days before the reboot expect of
> .history.
> I got no messages, and, that is the most interesting thing, this
> corruption was just for text file. I also edited some binary file with
> kexedit and them have not been corrupted after the reboot.
was the edited file all the time on reiserfs? I mean, maybe kexedit
uses temporary file on some other fs?
>
> reiserfsck does not show any corruption, and the HW is good.
> I know it is just a "me too", but i can do every test you need on the
> PentiumIII
Oleg, i may have to give you another set of apologies :) The fs problems
the reiserfsck has found could well be from the old kernels (although
the box crashes very rarely, just because the longest uptime is about 3
hours).
>
> Luigi Genoni
>
> On Mon, 11 Feb 2002, Oleg Drokin wrote:
>
> > Hello!
> >
> > On Fri, Feb 08, 2002 at 11:07:13PM +0100, Alex Riesen wrote:
> >
> > > hmm.. You're demanding too much(mkreiserfs) - it's my home partition :)
> > At least reiserfsck before any tests is almost mandatory ;)
> >
> > > Maybe the corruptions are from previous kernels, but the zero-files
> > > are observed for the first time, particularly in the .bash_history.
> > Yes, but you said with the patch you cannot reproduce zero files anymore.
> >
> > > Sorry for such a dirty test environment, i was really not prepared.
> > > Logs attached.
> > I am sorry, but there are so many variables, these logs are barely useful as
> > of now.
> > If you can reproduce on a clean filesystem with not faulty hardware, that
> > would be interesting, though.
...
Hello!
On Mon, Feb 11, 2002 at 01:17:13PM +0100, Alex Riesen wrote:
> > I got the same with 2.5.4-pre1 on a ATA66 disk,
> > chipset i810, PentiumIII with 256 MBRAM,
> > and then on Athlon 1300 Mhz, scsi disk, adaptec
> > 2940UW, 512MB RAM.
> >
> > I saw then just after a reboot.
> > Those file has been opened three or four days before the reboot expect of
> > .history.
> > I got no messages, and, that is the most interesting thing, this
> > corruption was just for text file. I also edited some binary file with
> > kexedit and them have not been corrupted after the reboot.
Hm. Strange. This message has not appeared in my mailbox for some
reason.
.history may be corrupted if your partition was not unmounted properly
before reboot.
Bye,
Oleg
On Mon, Feb 11, 2002 at 04:27:43PM +0300, Oleg Drokin wrote:
> Hello!
>
> On Mon, Feb 11, 2002 at 02:14:22PM +0100, Alex Riesen wrote:
>
> > > .history may be corrupted if your partition was not unmounted properly
> > > before reboot.
> > but in that strange way? the sizes of the files are kept, just the content,
> > as were it's an empty page. Sadly that i haven't kept any of the files (unless
> > some i haven't found yet) to check it is page-aligned.
> This is nothing strange.
> You open file for writing, write some stuff, metadata gets journaled,
> but file content is not. Then you reboot, metadata is ok,
> but file content is lost.
> (and at least bash totally rewrites its .history file)
yes, that clears up a lot.
I cannot remember any problems while rebooting, but i'm not sure.
-alex
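The mechanism described above (metadata journaled, file content not yet on disk) is why applications that rewrite a file in place can end up with a correctly sized but empty-looking file. A common application-side defence is to write a temporary file, fsync it, and rename it over the original. The sketch below illustrates that general technique only; it is not the kernel patch discussed in this thread, it is not what bash does, and the function name is hypothetical.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace path with data, forcing the data to disk before the rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push file content to disk, not only metadata
        os.replace(tmp, path)     # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)        # do not leave a stray temporary file behind
        raise
    # fsync the directory so the rename itself reaches the disk (works on Linux)
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this pattern a reader after a reboot should see either the complete old content or the complete new content, assuming the filesystem and the disk honour fsync.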
On Mon, 11 Feb 2002, Alex Riesen wrote:
> On Mon, Feb 11, 2002 at 12:52:27PM +0100, Luigi Genoni wrote:
> > I got the same with 2.5.4-pre1 on a ATA66 disk,
> > chipset i810, PentiumIII with 256 MBRAM,
> > and then on Athlon 1300 Mhz, scsi disk, adaptec
> > 2940UW, 512MB RAM.
> >
> > I saw then just after a reboot.
> > Those file has been opened three or four days before the reboot expect of
> > .history.
> > I got no messages, and, that is the most interesting thing, this
> > corruption was just for text file. I also edited some binary file with
> > kexedit and them have not been corrupted after the reboot.
> was the edited file all the time on reiserfs? I mean, maybe kexedit
> uses temporary file on some other fs?
NO, the backup file is in the same directory as the edited file.
But the binary files are very big, while the text files are small,
and maybe so small that they are stored in the leaf node and not in
a stat data
>
>
> >
> > reiserfsck does not show any corruption, and the HW is good.
> > I know it is just a "me too", but i can do every test you need on the
> > PentiumIII
> Oleg, i may have to give you another set of apologies :) The fs problems
> the reiserfsck have found could well be from the old kernels (although
> the box crashes very rarely, just because the longest uptime is about 3
> hours).
>
>
> >
> > Luigi Genoni
> >
> > On Mon, 11 Feb 2002, Oleg Drokin wrote:
> >
> > > Hello!
> > >
> > > On Fri, Feb 08, 2002 at 11:07:13PM +0100, Alex Riesen wrote:
> > >
> > > > hmm.. You're demanding too much(mkreiserfs) - it's my home partition :)
> > > > At least reiserfsck before any tests is almost mandatory ;)
> > >
> > > > Maybe the corruptions are from previous kernels, but the zero-files
> > > > are observed for the first time, particularly in the .bash_history.
> > > Yes, but you said with the patch you cannot reproduce zero files anymore.
> > >
> > > > Sorry for such a dirty test environment, i was really not prepared.
> > > > Logs attached.
> > > I am sorry, but there are so many variables, these logs are barely useful as
> > > of now.
> > > If you can reproduce on a clean filesystem with not faulty hardware, that
> > > would be interesting, though.
> ...
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
On Mon, 11 Feb 2002, Oleg Drokin wrote:
> Hello!
>
> On Mon, Feb 11, 2002 at 01:17:13PM +0100, Alex Riesen wrote:
> > > I got the same with 2.5.4-pre1 on a ATA66 disk,
> > > chipset i810, PentiumIII with 256 MBRAM,
> > > and then on Athlon 1300 Mhz, scsi disk, adaptec
> > > 2940UW, 512MB RAM.
> > >
> > > I saw then just after a reboot.
> > > Those file has been opened three or four days before the reboot expect of
> > > .history.
> > > I got no messages, and, that is the most interesting thing, this
> > > corruption was just for text file. I also edited some binary file with
> > > kexedit and them have not been corrupted after the reboot.
> Hm. Strange. This message have not appeared in my mailbox for some
> reason.
> .history may be corrupted if your partition was not unmounted properly
> before reboot.
other files corrupted were
/etc/rc.d/rc.local /etc/rc.d/rc.inet2
/etc/lilo.conf on the PIII
/scratch/root/<some .c source file> on the Athlon
The / partition is not the same as /home.
>
> Bye,
> Oleg
>
Hello!
On Mon, Feb 11, 2002 at 03:23:51PM +0100, Luigi Genoni wrote:
> > .history may be corrupted if your partition was not unmounted properly
> > before reboot.
> other files corrupted were
> /etc/rc.d/rc.local /etc/rc.d/rc.inet2
> /etc/lilo.conf on the PIII
> /scratch/root/<some .c source file> on the Athlon
> / partition is not the same of /home.
All of this on 2.5.4-pre1 only?
Or were you able to reproduce it on later kernels too?
Bye,
Oleg
I had corruption also with 2.5.3.
I was trying to boot 2.5.4, but with the preemption patch enabled I got an
oops immediately with swapper ;(.
Then I spent some time cleaning up i810_audio.c so it can compile, and
tomorrow I will perform some tests with 2.5.4 without preemption.
I should add that I was finally able to corrupt a big binary file on the
Pentium III (the Athlon is more important to me).
I corrupted the flash plugin by loading mozilla and closing
X11 without closing mozilla first.
I did not notice corruption with 2.5.3-pre1/2.
I could also test on a dual Pi 1260 Mhz with a cpqarray controller, if
needed...
Luigi
On Mon, 11 Feb 2002, Oleg Drokin wrote:
> Hello!
>
> On Mon, Feb 11, 2002 at 03:23:51PM +0100, Luigi Genoni wrote:
>
> > > .history may be corrupted if your partition was not unmounted properly
> > > before reboot.
> > other files corrupted were
> > /etc/rc.d/rc.local /etc/rc.d/rc.inet2
> > /etc/lilo.conf on the PIII
> > /scratch/root/<some .c source file> on the Athlon
> > / partition is not the same of /home.
> All of this on 2.5.4-pre1 only?
> Or were you able to reproduce it on later kernels too?
>
> Bye,
> Oleg
Sorry but I got a corrupted file also with 2.5.4. I could see it after the
reboot to 2.4.17. It was /etc/exports, and it was OK after I edited it
running 2.5.4, and it was readable by exportfs, so it got corrupted at reboot.
The reboot was clean, of course. Maybe wrong umount?
Luigi
On Mon, 11 Feb 2002, Oleg Drokin wrote:
> Hello!
>
> On Mon, Feb 11, 2002 at 03:23:51PM +0100, Luigi Genoni wrote:
>
> > > .history may be corrupted if your partition was not unmounted properly
> > > before reboot.
> > other files corrupted were
> > /etc/rc.d/rc.local /etc/rc.d/rc.inet2
> > /etc/lilo.conf on the PIII
> > /scratch/root/<some .c source file> on the Athlon
> > / partition is not the same of /home.
> All of this on 2.5.4-pre1 only?
> Or were you able to reproduce it on later kernels too?
>
> Bye,
> Oleg
Hello!
What kind of corruption? Can we look at corrupted file if there is something
unusual?
What Linux Distribution do you run?
You can check cleanness by looking into kernel messages.
If there is "replaying journal" message - umount was not clean.
Bye,
Oleg
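The check suggested above can be scripted. The sketch below scans kernel log text for the "replaying journal" wording used in this message; the function name is hypothetical and the marker string is taken from the message, so the exact text printed by a given kernel version may differ and the marker may need adjusting.

```python
def journal_replay_lines(log_text, marker="replaying journal"):
    """Return the log lines that mention a journal replay (case-insensitive)."""
    needle = marker.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]
```

Feeding it the saved output of `dmesg` from the boot after the suspect reboot would be one way to use it: an empty result suggests the unmount was clean, under the assumption that the marker matches what the kernel actually prints.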
On Tue, Feb 12, 2002 at 05:55:54PM +0100, Luigi Genoni wrote:
> Sorry but I got a corrupted file also with 2.5.4. I could see it after the
> reboot to 2.4.17. It was /etc/exports and it was OK since i edited it
> running 2.5.4, and It was readable by exportfs, so it corrupted at reboot.
>
> The reboot was clean, of course. Maybe wrong umount?
I run slackware 8.0.49, and there was no log replaying.
The corruption is the one we have been talking about for some days:
files are full of 0s instead of their supposed content.
Please note that before reboot I had no problems accessing the files, and
they turned out corrupted after reboot.
I usually restore corrupted files, so I should keep one for you, I think.
Luigi
On Tue, 12 Feb 2002, Oleg Drokin wrote:
> Hello!
>
> What kind of corruption? Can we look at corrupted file if there is something
> unusual?
> What Linux Distribution do you run?
>
> You can check cleanness by looking into kernel messages.
> If there is "replaying journal" message - umount was not clean.
>
> Bye,
> Oleg
> On Tue, Feb 12, 2002 at 05:55:54PM +0100, Luigi Genoni wrote:
> > Sorry but I got a corrupted file also with 2.5.4. I could see it after the
> > reboot to 2.4.17. It was /etc/exports and it was OK since i edited it
> > running 2.5.4, and It was readable by exportfs, so it corrupted at reboot.
> >
> > The reboot was clean, of course. Maybe wrong umount?
On Tue, Feb 12, 2002 at 08:01:24PM +0300, Oleg Drokin wrote:
> Hello!
>
> What kind of corruption? Can we look at corrupted file if there is something
> unusual?
> What Linux Distribution do you run?
i have my own system (but with sysVinit), and am somewhat sure about unmounts.
> You can check cleanness by looking into kernel messages.
> If there is "replaying journal" message - umount was not clean.
I've had the "replaying journal" after "machine check exception".
But after this crash the filesystem was perfect. The zero-files were before...
>
> Bye,
> Oleg
> On Tue, Feb 12, 2002 at 05:55:54PM +0100, Luigi Genoni wrote:
> > Sorry but I got a corrupted file also with 2.5.4. I could see it after the
> > reboot to 2.4.17. It was /etc/exports and it was OK since i edited it
> > running 2.5.4, and It was readable by exportfs, so it corrupted at reboot.
> >
> > The reboot was clean, of course. Maybe wrong umount?
-alex
Hello!
On Tue, Feb 12, 2002 at 06:13:18PM +0100, Luigi Genoni wrote:
> I run slackware 8.0.49, and there was no log replaying.
Ok.
> The corruption is the one we are talking about since some days,
> file are fille of 0s instead of their supposed content.
Hm. Was that a plain reboot?
Did you try to run reiserfsck --rebuild-tree between reboots before
finding files with zeroes?
(if you did, that may somewhat explain what you've seen)
> I usually restore corrupted file, so I should keep one fopr you, I think.
Ok, if it became all zeroes, then I do not need it.
Bye,
Oleg
On Wed, 13 Feb 2002, Oleg Drokin wrote:
> Hello!
>
> On Tue, Feb 12, 2002 at 06:13:18PM +0100, Luigi Genoni wrote:
>
> > I run slackware 8.0.49, and there was no log replaying.
> Ok.
>
> > The corruption is the one we are talking about since some days,
> > file are fille of 0s instead of their supposed content.
> Hm. Was that a plain reboot?
> Did you tried to run reiserfsck --rebuild-tree between reboots before
> finding files with zeroes.
> (if you did, that may somewhat explain what you've seen)
NO, NO.
I boot with 2.5, and I do some work, I edit some text files
with jed and so on,
then I do a normal reboot in 2.4.17, without any fsck,
there is log reply, it is a normal reboot.
Well, some files get corrupted.
Please notice, I was even lucky enough to check one of them immediately
before the reboot (with cat): it was safe, and afterwards it was corrupted.
Please note, only files that I wrote in 2.5.3/4 are corrupted.
I saw I am not the only one with this kind of corruption, I remember at
least one related mail.
I have no problem to check any patch.
Luigi
>
> > I usually restore corrupted file, so I should keep one fopr you, I think.
> Ok, if it became all zeroes, then I do not need it.
>
> Bye,
> Oleg
Hello!
On Wed, Feb 13, 2002 at 12:11:14PM +0100, Luigi Genoni wrote:
> > > I run slackware 8.0.49, and there was no log replaying.
> then I do a normal reboot in 2.4.17, without any fsck,
> there is log reply, it is a normal reboot.
Some confusion is going on.
So do you have log replay or you do not have log replay?
> Well, some files get corrupted.
Ok. That's definitely bad. You said you see corruptions on two boxes, right?
Is it as simple as boot into 2.5.4, reiserfsck (and see no errors),
mount an fs, do something, type "reboot" and reboot into 2.5.4 again,
and voilà - here are zeroed files. Right?
> I saw I am not the only one with this kind of corruption, I remember at
> less one related mail.
There was flaky hardware on the other report. And I think Alex Riesen
cannot reproduce zero files anymore.
Bye,
Oleg
On Wed, 13 Feb 2002, Oleg Drokin wrote:
> Hello!
>
> On Wed, Feb 13, 2002 at 12:11:14PM +0100, Luigi Genoni wrote:
>
> > > > I run slackware 8.0.49, and there was no log replaying.
> > then I do a normal reboot in 2.4.17, without any fsck,
> > there is log reply, it is a normal reboot.
My fault, I meant to write there is NO log reply, and I wrote it
without the NO.
> Some confusion is going on.
> So do you have log replay or you do not have log replay?
>
> > Well, some files get corrupted.
> Ok. That's definitely bad. You said you see corruptions on two boxes, right?
> Is it as simple as boot into 2.5.4, reiserfsck (and see no errors),
> mount an fs, do something, type "reboot" and reboot into 2.5.4 again,
> and voilà - here are zeroed files. Right?
It happened when I did reboot from 2.5.4-pre1 to 2.5.4, and my
/etc/rc.d/rc.local was full of 0s.
And when I did reboot from 2.5.3 to 2.5.3 on the other box, and some C
source I was editing three hours before was full of 0s.
>
> > I saw I am not the only one with this kind of corruption, I remember at
> > less one related mail.
> There was flaky hardware on the other report. And I think Alex Riesen
> cannot reproduce zero files anymore.
>
Those two boxes have run for more than 1 year with no HW problems before..
Luigi
On Wed, Feb 13, 2002 at 04:08:51PM +0300, Oleg Drokin wrote:
> Hello!
>
> On Wed, Feb 13, 2002 at 12:11:14PM +0100, Luigi Genoni wrote:
>
> > > > I run slackware 8.0.49, and there was no log replaying.
> > then I do a normal reboot in 2.4.17, without any fsck,
> > there is log reply, it is a normal reboot.
> Some confusion is going on.
> So do you have log replay or you do not have log replay?
>
> > Well, some files get corrupted.
> Ok. That's definitely bad. You said you see corruptions on two boxes, right?
> Is it as simple as boot into 2.5.4, reiserfsck (and see no errors),
> mount an fs, do something, type "reboot" and reboot into 2.5.4 again,
> and voilà - here are zeroed files. Right?
>
> > I saw I am not the only one with this kind of corruption, I remember at
> > less one related mail.
> There was flaky hardware on the other report. And I think Alex Riesen
> cannot reproduce zero files anymore.
Correct. After applying your patch, indeed.
I'm really sorry, i had not much time to experiment and try
again without the patch. Should i try, btw?
-alex
Hello!
On Wed, Feb 13, 2002 at 09:01:56PM +0100, Alex Riesen wrote:
> > There was flaky hardware on the other report. And I think Alex Riesen
> > cannot reproduce zero files anymore.
> Correct. After applying your patch, indeed.
> I'm really sorry, i hado no much time to experiment and try
> again without the patch. Should i try, btw?
Why do you think I will ask you to risk your data and run without
a necessary fix?
Of course it is not recommended to run a kernel without fixes for known bugs
applied.
Bye,
Oleg
Hello!
On Wed, Feb 13, 2002 at 06:15:24PM +0100, Luigi Genoni wrote:
> It happened when I did reboot from 2.5.4-pre1 to 2.5.4, and my
> /etc/rc.c/rc.local was full of 0s.
> And when I did reboot from 2.5.3 to 2.5.3 on the other box and some c
> source I was editing three ours before were full of 0s.
There was a bug in kernels up to 2.5.4-pre1 (2.5.4-pre2 had a fix),
which had similar symptoms.
> > > I saw I am not the only one with this kind of corruption, I remember at
> > > less one related mail.
> > There was flaky hardware on the other report. And I think Alex Riesen
> > cannot reproduce zero files anymore.
> Those two boxes runned from more than 1 year and no HW problems before..
Great.
Thing is, if you are able to reproduce on 2.5.4-pre2+ without rebooting into earlier
2.5 kernels in-between tries, then it will mean something is not right even
in the latest kernels.
Bye,
Oleg
Well, with 2.5.5-pre1 i get this oops:
PAP-14030: direct2indirect: pasted or inserted byte exists in the tree
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0168dd9>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000049 ebx: c01f7a00 ecx: ffffffb7 edx: cfa01180
esi: c154a800 edi: ceeede80 ebp: ceeedda4 esp: ceeedd2c
ds: 0018 es: 0018 ss: 0018
Stack: c01f60da c02640c0 c01f7a00 ceeedd50 ceeedd8c cbe5c198 c0171df4 c154a800
c01f7a00 ceeede3c 00000000 cbe76b00 c154a800 00000000 00001000 00000000
cbe5c198 ceeedec0 00000713 00000714 00000001 fffffffe 0040ffff 00000cf4
Call Trace: [<c0171df4>] [<c016055f>] [<c016dba3>] [<c016e4bf>] [<c017283d>]
[<c0135cb1>] [<c013638e>] [<c015fb00>] [<c01624e8>] [<c015fb00>] [<c012953d>]
[<c013377b>] [<c010886f>]
Code: 0f 0b 68 c0 40 26 c0 b8 e0 60 1f c0 8d 96 cc 00 00 00 85 f6
>>EIP; c0168dd8 <reiserfs_panic+28/4c> <=====
Trace; c0171df4 <direct2indirect+d4/2d8>
Trace; c016055e <reiserfs_get_block+a5e/df4>
Trace; c016dba2 <is_tree_node+36/4c>
Trace; c016e4be <search_by_key+906/df0>
Trace; c017283c <get_cnode+10/78>
Trace; c0135cb0 <__block_prepare_write+8c/1f8>
Trace; c013638e <block_prepare_write+22/3c>
Trace; c015fb00 <reiserfs_get_block+0/df4>
Trace; c01624e8 <reiserfs_prepare_write+5c/64>
Trace; c015fb00 <reiserfs_get_block+0/df4>
Trace; c012953c <generic_file_write+45c/660>
Trace; c013377a <sys_write+8e/c4>
Trace; c010886e <syscall_call+6/a>
Code; c0168dd8 <reiserfs_panic+28/4c>
00000000 <_EIP>:
Code; c0168dd8 <reiserfs_panic+28/4c> <=====
0: 0f 0b ud2a <=====
Code; c0168dda <reiserfs_panic+2a/4c>
2: 68 c0 40 26 c0 push $0xc02640c0
Code; c0168dde <reiserfs_panic+2e/4c>
7: b8 e0 60 1f c0 mov $0xc01f60e0,%eax
Code; c0168de4 <reiserfs_panic+34/4c>
c: 8d 96 cc 00 00 00 lea 0xcc(%esi),%edx
Code; c0168dea <reiserfs_panic+3a/4c>
12: 85 f6 test %esi,%esi
On Thu, 14 Feb 2002, Oleg Drokin wrote:
> Hello!
>
> On Wed, Feb 13, 2002 at 06:15:24PM +0100, Luigi Genoni wrote:
> > It happened when I did reboot from 2.5.4-pre1 to 2.5.4, and my
> > /etc/rc.c/rc.local was full of 0s.
> > And when I did reboot from 2.5.3 to 2.5.3 on the other box and some c
> > source I was editing three ours before were full of 0s.
> There was a bug in kernels up to 2.5.4-pre1 (2.5.4-pre2 had a fix),
> which had similar symtoms.
>
> > > > I saw I am not the only one with this kind of corruption, I remember at
> > > > less one related mail.
> > > There was flaky hardware on the other report. And I think Alex Riesen
> > > cannot reproduce zero files anymore.
> > Those two boxes runned from more than 1 year and no HW problems before..
> Great.
> Thing is if you able to reproduce on 2.5.4-pre2+ without rebooting into earlier
> 2.5 kernels in-between tries, then it will mean something is not right even
> in latest kernels.
>
> Bye,
> Oleg
>
Hello!
On Thu, Feb 14, 2002 at 10:57:13AM +0100, Luigi Genoni wrote:
> Well, with 2.5.5-pre1 i get this oops:
>
> PAP-14030: direct2indirect: pasted or inserted byte exists in the
> treeinvalid operand: 0000
It means the 2.5.2-dj3 or 2.5.3 kernel you ran some time ago
has damaged your reiserfs filesystem.
Now you have to run reiserfsck --rebuild-tree on that partition.
Also you need the attached patch to be able to use reiserfs on 2.5.5-pre1
at all.
Bye,
Oleg