2001-02-17 16:22:02

by Frank de Lange

Subject: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

Hi'all,

Well, subject says it all... When I try to compile mozilla (CVS version) with
the '--enable-elf-dynstr-gc' option, the compile fails with a segfault:

../../dist/bin/elf-dynstr-gc ../../dist/lib/components/libsample.so
make[2]: *** [install] Segmentation fault (core dumped)

Compiling the same codebase on an ext2 filesystem does not produce this
segfault. When I compare the produced library (libsample.so), there is a
consistent difference between the one compiled on the reiserfs and the ext2
filesystem. Running objdump on the reiserfs-compiled library also produces
errors (some assertion failures, a lot of 'invalid string offset' errors, and
finally a 'Memory exhausted' error), while objdump happily disassembles the
ext2-produced binary.

These problems occur on:

2.4.1
2.4.2-pre4
2.4.2-pre4 with Chris Mason's 'reiserfs fix for null bytes in small files'

So, there's something quite wrong here.

If anyone wants me to try something, do tell...

Cheers//Frank

--
WWWWW _______________________
## o o\ / Frank de Lange \
}# \| / \
##---# _/ <Hacker for Hire> \
#### \ +31-320-252965 /
\ [email protected] /
-------------------------
[ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est." ]


2001-02-18 00:17:38

by Chris Mason

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile



On Saturday, February 17, 2001 05:21:18 PM +0100 Frank de Lange
<[email protected]> wrote:

> Hi'all,
>
> Well, subject says it all... When I try to compile mozilla (CVS version)
> with the '--enable-elf-dynstr-gc' option, the compile fails with a
> segfault:
>
> ../../dist/bin/elf-dynstr-gc ../../dist/lib/components/libsample.so
> make[2]: *** [install] Segmentation fault (core dumped)
>

That's not good. Which compiler did you use to compile the kernel? This
sounds lame, but reiserfs exercises the cpu/mem more than ext2, so we hit
bad ram more often. If we run out of other things to try, please run a
memory tester.

> Compiling the same codebase on an ext2 filesystem does not produce this
> segfault. When I compare the produced library (libsample.so), there is a
> consistent difference between the one compiled on the reiserfs and the ext2
> filesystem. Running objdump on the reiserfs-compiled library also produces
> errors (some assertion failures, a lot of 'invalid string offset' errors,
> and finally a 'Memory exhausted' error), while objdump happily
> disassembles the ext2-produced binary.
>

Where in the libsample.so file are the differences (what byte offset?).
Are they restricted to a given range, or do they vary randomly?
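
A minimal sketch of one way to answer the offset question, assuming both
copies of the library are small enough to read into memory. The names
diff_ranges and report are illustrative only and do not belong to any tool
mentioned in this thread.

```python
def diff_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) ranges where a and b differ.

    Only the common prefix length is compared; a size mismatch is
    reported separately by report().
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    return ranges


def report(path_a: str, path_b: str) -> None:
    """Print every differing range of two files as hex offsets."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    for start, end in diff_ranges(a, b):
        print(f"0x{start:07x}-0x{end:07x} ({end - start} bytes)")
    if len(a) != len(b):
        print(f"sizes differ: {len(a)} vs {len(b)} bytes")
```

Calling report() with the reiserfs-built and ext2-built libsample.so would
show whether the differences cluster in one range or are scattered.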

> These problems occur on:
>
> 2.4.1
> 2.4.2-pre4
> 2.4.2-pre4 with Chris Mason's 'reiserfs fix for null bytes in small
> files'
>
At least the patch didn't make it worse. Would anyone care to comment on
how the elf-dynstr-gc option changes the file access patterns for the
compile?

thanks,
Chris

2001-02-18 00:57:52

by Frank de Lange

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

> That's not good. Which compiler did you use to compile the kernel? This
> sounds lame, but reiserfs exercises the cpu/mem more than ext2, so we hit
> bad ram more often. If we run out of other things to try, please run a
> memory tester.

I use 'good old' gcc 2.95.2:

gcc -v: gcc version 2.95.2 19991024 (release)

I just tried 2.4.1-ac18, which also gave me the same segfault. When I compare
the corrupted binary (the one compiled on reiserfs) to the working one (compiled
on ext2), I notice that at position 0x1000 in the file, a block of data from
position 0x0e60 is duplicated. It seems to be inserted into the data stream, as
it is followed by data which (in the working version of libsample.so) starts at
0x1000:

(bsdiff (binary sdiff) between both files)

(actually the differences between both files start much earlier, but that seems
to be just all kinds of changed relocation information as a result of the error)

(hope my careful ASCII-formatting makes it through the list and the archives)

THE BAD                          THE GOOD

<deletia, a lot of uninteresting data...>

0000e60 c4 20 83 c4 f4 8b 06     0000e60 c4 20 83 c4 f4 8b 06
0000e68 8b 40 10 ff d0 eb 06     0000e68 8b 40 10 ff d0 eb 06
0000e70 bf 0e 00 07 80 89 f8     0000e70 bf 0e 00 07 80 89 f8
0000e78 65 e8 5b 5e 5f 89 ec     0000e78 65 e8 5b 5e 5f 89 ec
0000e80 c3 8d 76 00 55 89 e5     0000e80 c3 8d 76 00 55 89 e5
0000e88 c0 89 ec 5d c3 8d 76     0000e88 c0 89 ec 5d c3 8d 76
0000e90 55 89 e5 31 c0 89 ec     0000e90 55 89 e5 31 c0 89 ec

<deletia, a lot of uninteresting data...>

0000fd8 00 00 00 00 c0 00 00     0000fd8 00 00 00 00 c0 00 00
0000fe0 00 00 00 46 80 a0 c0     0000fe0 00 00 00 46 80 a0 c0
0000fe8 68 08 d3 11 91 5f d9     0000fe8 68 08 d3 11 91 5f d9
0000ff0 89 d4 8e 3c 40 92 89     0000ff0 89 d4 8e 3c 40 92 89
0000ff8 d2 f9 d2 11 bd d6 00     0000ff8 d2 f9 d2 11 bd d6 00

LOOK HERE: IDENTICAL TO THE      AND THIS IS WHAT IT SHOULD
DATA AT 0000e60                  LOOK LIKE...

0001000 c4 20 83 c4 f4 8b 06  |  0001000 64 65 73 74 86 52 38
0001008 8b 40 10 ff d0 eb 06  |  0001008 c4 cb d2 11 8c ca 00
0001010 bf 0e 00 07 80 89 f8  |  0001010 b0 fc 14 a3 a0 58 f1
0001018 65 e8 5b 5e 5f 89 ec  |  0001018 dd ca d2 11 8c ca 00

<deletia, a lot of uninteresting data...>

0001190 89 d4 8e 3c 40 92 89  <
0001198 d2 f9 d2 11 bd d6 00  <

AND HERE THE 'GOOD' DATA STARTS
AGAIN, THIS BLOCK IS IDENTICAL TO
THE ONE AT 0x1000 IN THE 'GOOD' FILE

00011a0 64 65 73 74 86 52 38  <
00011a8 c4 cb d2 11 8c ca 00  <
00011b0 b0 fc 14 a3 a0 58 f1  <
00011b8 dd ca d2 11 8c ca 00  <
00011c0 b0 fc 14 a3 40 a7 58  <
00011c8 dc d5 d2 11 92 fb 00  <

<deletia, a lot of uninteresting data...>

So, it seems a wrong block of data was inserted into the stream at position
0x1000, wreaking havoc on the file structure. Now 0x1000 is kind of a magic
number, isn't it? Almost too good to be true...
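
Going only by the excerpt above, the offsets line up neatly: the good file's
data at 0x1000 reappears in the bad file at 0x11a0, so 0x1a0 (416) extra bytes
sit in between, and 0x1000 - 0x1a0 = 0xe60, which is where the duplicated
block comes from. That would mean the last 0x1a0 bytes before the 4 KB
boundary show up twice. The sketch below checks that reading against a pair of
files. It is an illustration under that assumption, not a diagnosis, and the
names find_resync and inserted_run are made up for this example.

```python
PAGE = 0x1000  # 4096, the offset at which the bad file goes wrong


def find_resync(bad: bytes, good: bytes, at: int, probe: int = 16) -> int:
    """Offset in `bad`, at or after `at`, where good[at:at+probe] reappears.

    Returns -1 if the probe bytes are not found.
    """
    return bad.find(good[at:at + probe], at)


def inserted_run(bad: bytes, good: bytes, at: int = PAGE):
    """Describe a run of bytes inserted into `bad` at offset `at`.

    Returns (length, source, is_duplicate) where `length` is the number of
    inserted bytes, `source` is at - length, and `is_duplicate` tells whether
    the inserted bytes equal the `length` bytes immediately before `at`.
    Returns None if the good data never resumes.
    """
    resync = find_resync(bad, good, at)
    if resync < 0:
        return None
    length = resync - at
    source = at - length
    is_duplicate = (
        length > 0 and source >= 0 and bad[at:resync] == bad[source:at]
    )
    return length, source, is_duplicate
```

With the offsets quoted in this message the expected result would be a length
of 0x1a0 and a source of 0xe60. Whether the full files agree can only be
confirmed by running it on them.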

I will retry this with 'all warnings and bells and whistles' turned on in
reiserfs (on 2.4.1-ac18), and see if anything out of the ordinary is logged. I
somehow doubt it, since repeated forced reiserfsck's have turned up nothing at
all...

Oh, and both my own and my computer's memory is OK, so this is not a hardware
fault... :-)

By the way, /tmp (where most action is taking place when compiling) is hosted
on a good ext2 filesystem. Just in case you wondered...

And, also of interest, I'm using an SMP box (BP6, two non-overclocked Celeron
466s).

Cheers//Frank


2001-02-18 01:11:28

by Frank de Lange

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

> At least the patch didn't make it worse. Would anyone care to comment on
> how the elf-dynstr-gc option changes the file access patterns for the
> compile?

It does not change the file access patterns, it adds an extra step. A separate
binary (dist/bin/elf-dynstr-gc, a convoluted version of strip) is run over the
final (linked) library/executable to remove some symbol info. The elf-dynstr-gc
program is compiled as part of the mozilla build. There's nothing wrong with
elf-dynstr-gc on the reiserfs filesystem, it is identical to the one on the
ext2 partition. Running the 'reiserfs' version on the ext2 tree works as it
should, running the ext2 version on the reiserfs tree crashes (seems the
program is not very robust, as it does not detect garbled input files). As
said, running objdump on the corrupted (reiserfs compiled) library also
produces errors.

Cheers//Frank


2001-02-18 01:15:39

by Frank de Lange

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

On Sun, Feb 18, 2001 at 01:57:15AM +0100, Frank de Lange wrote:
> I will retry this with 'all warnings and bells and whistles' turned on in
> reiserfs (on 2.4.1-ac18), and see if anything out of the ordinary is logged. I
> somehow doubt it, since repeated forced reiserfsck's have turned up nothing at
> all...

I just ran the compile again on the described build, same results, no warnings
of any kind, nothing in the debug log facility, nothing on the console...

Reiserfs seems to believe it did the right thing. I'm here to tell you that it
didn't...

Cheers//Frank

2001-02-18 01:49:37

by David Ford

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

I can say "me too" for this. I thought it was perhaps glibc or binutils
tho. I only have reiserfs systems now so I don't have a basis for
comparison.

However I -can- say that I didn't experience this until I put glibc
2.2.1 on my systems. I do use an "approved" gcc, stock 2.95.2.

I wouldn't be so quick to pin it on reiserfs.

-d





2001-02-18 02:08:19

by Frank de Lange

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

On Sat, Feb 17, 2001 at 05:47:49PM -0800, David wrote:
> I can say "me too" for this. I thought it was perhaps glibc or binutils
> tho. I only have reiserfs systems now so I don't have a basis for
> comparison.
>
> However I -can- say that I didn't experience this until I put glibc
> 2.2.1 on my systems. I do use an "approved" gcc, stock 2.95.2.
>
> I wouldn't be so quick to pin it on reiserfs.

Well, I run glibc-2.2.1 as well, so that might be one of the factors
contributing to this. Then again, glibc-2.2.1 with ext2 does not cause any
problems whatsoever with mozilla. So it could be that reiserfs + glibc-2.2.1 is
a bad combination, question remains which of these two is the culprit (if not
both). Since glibc-2.2.2 is out, I will give that a try as well. Not tonight
though...

And no, I'm not running RedHat 7.x for those who might think so (and
automatically blame everything on it).

When did you switch to glibc-2.2.1? Were you running reiserfs before that?

Cheers//Frank


2001-02-18 02:20:11

by David Ford

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

> Well, I run glibc-2.2.1 as well, so that might be one of the factors
> contributing to this. Then again, glibc-2.2.1 with ext2 does not cause any
> problems whatsoever with mozilla. So it could be that reiserfs + glibc-2.2.1 is
> a bad combination, question remains which of these two is the culprit (if not
> both). Since glibc-2.2.2 is out, I will give that a try as well. Not tonight
> though...
>
> And no, I'm not running RedHat 7.x for those who might think so (and
> automatically blame everything on it).
>
> When did you switch to glibc-2.2.1? Were you running reiserfs before that?
>
> Cheers//Frank


Yes I was running reiserfs before 2.2.1 and I switched to 2.2.1 a couple
months ago. Since then I've been dealing with issues. I've had to
recompile half a dozen things similar to sendmail, apache etc. They
segfaulted. It wasn't as purely backward compatible as expected.

I typically compile everything on one machine and distribute it. Thus
far everything has been ok save a few issues I haven't been able to pin
down. One of these issues is the inability to compile mozilla. Also
related, I can't recompile gcc 2.95.2.

All of these things I was able to do just fine before the changeover.
To note, I used to cvs up mozilla and recompile it every few days. I
suppose I'll build an ext2 system and try things out.

Oh btw, I don't run That distribution either.

-d

2001-02-18 16:43:41

by Chris Mason

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile



On Sunday, February 18, 2001 02:10:50 AM +0100 Frank de Lange
<[email protected]> wrote:

>> At least the patch didn't make it worse. Would anyone care to comment on
>> how the elf-dynstr-gc option changes the file access patterns for the
>> compile?
>
> It does not change the file access patterns, it adds an extra step. A
> separate binary (dist/bin/elf-dynstr-gc, a convoluted version of strip)
> is run over the final (linked) library/executable to remove some symbol
> info. The elf-dynstr-gc program is compiled as part of the mozilla build.
> There's nothing wrong with elf-dynstr-gc on the reiserfs filesystem, it
> is identical to the one on the ext2 partition. Running the 'reiserfs'
> version on the ext2 tree works as it should, running the ext2 version on
> the reiserfs tree crashes (seems the program is not very robust, as it
> does not detect garbled input files). As said, running objdump on the
> corrupted (reiserfs compiled) library also produces errors.

Great, that will help narrow things down. Please run the elf-dynstr-gc
program under strace, on top of both the ext2 and reiserfs trees, and send
the results (privately, they'll probably be large) to me.
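
For reference, a small sketch of how the two traces could be collected. It
assumes strace is installed and uses its -f (follow child processes) and -o
(write the trace to a file) options. The helper names and the log file names
are placeholders, not something prescribed in this thread.

```python
import subprocess


def strace_command(program: str, target: str, log_path: str) -> list:
    """Build an strace invocation that traces `program target`.

    -f follows any child processes; -o sends the trace to log_path
    instead of stderr.
    """
    return ["strace", "-f", "-o", log_path, program, target]


def trace(program: str, target: str, log_path: str, cwd: str) -> int:
    """Run the traced program from directory `cwd` and return its exit status.

    A negative value means the program was killed by a signal, which is
    what a segfault on the corrupted library would look like.
    """
    cmd = strace_command(program, target, log_path)
    return subprocess.run(cmd, cwd=cwd).returncode
```

Running trace() once from the matching directory in each tree, with the
relative paths from the failing make step
(../../dist/bin/elf-dynstr-gc and ../../dist/lib/components/libsample.so) and
an absolute path for each log, would give two logs that can be compared side
by side.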

-chris



2001-02-18 17:11:00

by Chris Mason

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

On Sunday, February 18, 2001 03:07:27 AM +0100 Frank de Lange
<[email protected]> wrote:
>
> And no, I'm not running RedHat 7.x for those who might think so (and
> automatically blame everything on it).
>

Minor nit, but I'd rather clear it up now. Which distribution you run
doesn't matter for debugging. What does matter is that we've got known
problems with a given compiler, and that compiler goes by a few different
flavors with the same version number. Since there are known problems, if
you don't provide the compiler version, I'll ask. If your bug is *really*
odd, I might ask a few different ways, just to make sure you give the same
answer every time ;-)

-chris

2001-02-18 17:47:43

by Frank de Lange

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

> Minor nit, but I'd rather clear it up now. Which distribution you run
> doesn't matter for debugging. What does matter is that we've got known
> problems with a given compiler, and that compiler goes by a few different
> flavors with the same version number. Since there are known problems, if
> you don't provide the compiler version, I'll ask. If your bug is *really*
> odd, I might ask a few different ways, just to make sure you give the same
> answer every time ;-)

Well, a nit to a nit... In my experience it surely matters which distribution
somebody runs, since that tells a lot about the basic system (libc, probable
compiler, binutils, etc). RH7 is broken in many respects. Since it uses
glibc-2.2 as well, I usually add the notice that I do NOT run RH7 to messages
like these where I mention I use glibc-2.2.x, if only to ward off the usual
'are you running RH7 if yes please upgrade so and so' cycle. Bits and electrons
are much too precious to waste on useless banter like that...

Cheers//Frank

2001-02-19 17:41:44

by Frank de Lange

Subject: Re: reiserfs on 2.4.1,2.4.2-pre (with null bytes patch) breaks mozilla compile

On Sat, Feb 17, 2001 at 06:18:46PM -0800, David wrote:
> > Well, I run glibc-2.2.1 as well, so that might be one of the factors
> > contributing to this. Then again, glibc-2.2.1 with ext2 does not cause any
> > problems whatsoever with mozilla. So it could be that reiserfs + glibc-2.2.1 is
> > a bad combination, question remains which of these two is the culprit (if not
> > both). Since glibc-2.2.2 is out, I will give that a try as well. Not tonight
> > though...

FYI

I'm running glibc-2.2.2 now, and alas... Mozilla still refuses to be compiled,
no change...

Cheers//Frank