2013-12-10 22:21:22

by Srivatsan Canchivaram

Subject: Segmentation fault in mke2fs

Hi,
I have built e2fsprogs-1.42.8 for a MIPS64-based board. The tools were
generated on an Intel machine with a MIPS64 cross compiler.

MIPS64 board info:
Linux kernel: 3.4.27
Cavium Octeon Plus MIPS64 dual core processor
256 MB RAM

The MIPS-based board will have a 2 TB NTFS (or VFAT) formatted
external USB hard drive connected to it. The goal is to reformat the
2 TB drive from NTFS/VFAT to ext4 using the 'mke2fs' program.

'mke2fs' fails with a segmentation fault when formatting to ext4:

mke2fs -t ext4 /dev/sda1
mke2fs 1.42.8 (20-Jun-2013)
Segmentation fault

As a parallel test, I tried formatting to ext3 and this worked
correctly. The issue only seems to occur for ext4.

On further debugging, I found that the segmentation fault occurs at
the end of the should_do_undo() function in mke2fs.c, in the following
call:
io_channel_close(channel);

I tried tracing through the code in the should_do_undo() function.
The manager->open() call succeeds.
An issue of note occurs in the following line:
retval = io_channel_read_blk64(channel, 1, -SUPERBLOCK_SIZE, &super);

The following shows the contents of the 'channel' data structure
before and after the above function call:

Before call to io_channel_read_blk64()
-----------------------------------------------------
should_do_undo: Channel structure address = 0x1005fd70
should_do_undo: magic = 2133571333, name: /dev/sda1, block_size = 1024
should_do_undo: refcount = 1, flags = 4, align = 0

After call to io_channel_read_blk64()
---------------------------------------------------
After Read blk64: Channel structure address = 0x668b1e1a
Segmentation fault

So, the io_channel_read_blk64() function somehow modifies the
"channel" structure pointer. Trying to read the structure after the
call results in a seg fault.

In the io_channel_read_blk64() function, the code takes the following route:
    if (channel->manager->read_blk64)
        return (channel->manager->read_blk64)(channel, block,
                                              count, data);
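
A local pointer that changes across a call which fills a nearby local
buffer is what one would see if the callee wrote past the end of that
buffer, although nothing in this thread confirms that this is the
cause. The following is a minimal sketch, in plain C, of one way to
test for such an overrun by surrounding the buffer with guard bytes.
The names read_fn, guarded_read, good_read and overrun_read are
hypothetical and are not part of e2fsprogs; the callback stands in for
whatever read routine is under suspicion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SB_SIZE   1024  /* bytes the caller expects the read to fill */
#define GUARD     64    /* guard bytes placed on each side of the buffer */
#define GUARD_PAT 0xA5  /* pattern written into the guard bytes */

/* Hypothetical read callback: fills buf and returns 0 on success. */
typedef int (*read_fn)(void *buf);

/*
 * Run fn against a buffer that sits between two guard regions.
 * Returns 0 if both guards are intact, 1 if bytes before the buffer
 * changed, 2 if bytes after the buffer changed, -1 if the read failed.
 * On success the SB_SIZE bytes that were read are copied to out.
 */
static int guarded_read(read_fn fn, unsigned char *out)
{
    /* static, so that a small overrun cannot damage the caller's stack */
    static unsigned char arena[GUARD + SB_SIZE + GUARD];
    size_t i;

    assert(fn != NULL && out != NULL);

    memset(arena, GUARD_PAT, sizeof(arena));
    if (fn(arena + GUARD) != 0)
        return -1;

    for (i = 0; i < GUARD; i++)
        if (arena[i] != GUARD_PAT)
            return 1;
    for (i = 0; i < GUARD; i++)
        if (arena[GUARD + SB_SIZE + i] != GUARD_PAT)
            return 2;

    memcpy(out, arena + GUARD, SB_SIZE);
    return 0;
}

/* Stand-in readers, for demonstration only. */
static int good_read(void *buf)
{
    memset(buf, 0, SB_SIZE);      /* writes exactly SB_SIZE bytes */
    return 0;
}

static int overrun_read(void *buf)
{
    memset(buf, 0, SB_SIZE + 8);  /* writes 8 bytes too many */
    return 0;
}
```

With these stand-ins, guarded_read(good_read, out) leaves the guards
intact, while guarded_read(overrun_read, out) reports damage after the
buffer. An overrun larger than GUARD bytes would escape the arena, so
the guard size is a trade-off chosen for the sketch.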

If you have any thoughts on this issue or need additional details,
please let me know.

Thanks,

Sri


2013-12-13 23:33:23

by Srivatsan Canchivaram

Subject: Re: Segmentation fault in mke2fs

Hello,

I found that the segmentation fault occurs in optimized code (-O2). It
does not happen when optimization is turned off. I am not sure what
exactly happened but mke2fs is now able to get past that point.

The command now fails at a different point:

ext2fs_mkdir: EXT2 directory corrupted while creating /lost+found

Tracing from the ext2fs_mkdir() function, I found that the code
returns an error here:
ext2fs_read_dir_block3(): returns EXT2_ET_DIR_CORRUPTED

Any thoughts or ideas on this issue will be very helpful.

Thanks,
Sri


2013-12-14 01:50:16

by Eric Sandeen

Subject: Re: Segmentation fault in mke2fs

On 12/13/13, 5:33 PM, Srivatsan Canchivaram wrote:
> Hello,
>
> I found that the segmentation fault occurs in optimized code (-O2). It
> does not happen when optimization is turned off. I am not sure what
> exactly happened but mke2fs is now able to get past that point.
>
> The command now fails at a different point:
>
> ext2fs_mkdir: EXT2 directory corrupted while creating /lost+found
>
> Tracing from the ext2fs_mkdir() function, I found that the code
> returns an error here:
> ext2fs_read_dir_block3(): returns EXT2_ET_DIR_CORRUPTED
>
> Any thoughts or ideas on this issue will be very helpful.

I think we need a testcase to be able to help.

Does this only happen on MIPS64?

Can you gather a core for gdb analysis?

-Eric



2013-12-14 06:59:53

by Theodore Ts'o

Subject: Re: Segmentation fault in mke2fs

On Fri, Dec 13, 2013 at 06:33:22PM -0500, Srivatsan Canchivaram wrote:
> Hello,
>
> I found that the segmentation fault occurs in optimized code (-O2). It
> does not happen when optimization is turned off. I am not sure what
> exactly happened but mke2fs is now able to get past that point.

This is really starting to smell like a compiler bug. Are you sure
you are using a stable version of gcc?

> The command now fails at a different point:
>
> ext2fs_mkdir: EXT2 directory corrupted while creating /lost+found
>
> Tracing from the ext2fs_mkdir() function, I found that the code
> returns an error here:
> ext2fs_read_dir_block3(): returns EXT2_ET_DIR_CORRUPTED

The mke2fs program has just created the root directory and is trying
to link the newly created lost+found directory into it. When it reads
the just-created root directory back in and tries to byte-swap the
directory block, the values found in the root directory are insane.
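
To make "insane" concrete, here is a simplified sketch, in plain C, of
the kind of sanity check a directory block can be put through. It is
not the libext2fs implementation. It assumes the classic ext2 entry
layout with a file-type byte, reads the on-disk little-endian fields
byte by byte so that it behaves the same on big- and little-endian
hosts, and ignores the special rec_len encoding used with 64 KiB
blocks. The names le16_at and dir_block_is_sane are made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value regardless of host byte order. */
static uint16_t le16_at(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk the directory entries in one block and sanity-check each one.
 * Entry layout assumed here:
 *   bytes 0-3 inode, 4-5 rec_len, 6 name_len, 7 file_type, 8.. name
 * Returns 0 if every entry is plausible, -1 otherwise.
 */
static int dir_block_is_sane(const unsigned char *blk, size_t blksize)
{
    size_t off = 0;

    assert(blk != NULL);

    while (off < blksize) {
        uint16_t rec_len;
        uint8_t name_len;

        if (blksize - off < 8)          /* entry header must fit */
            return -1;

        rec_len = le16_at(blk + off + 4);
        name_len = blk[off + 6];

        if (rec_len < 8 || (rec_len % 4) != 0)
            return -1;                  /* too short or misaligned */
        if (rec_len > blksize - off)
            return -1;                  /* entry runs past the block */
        if ((size_t)name_len + 8 > rec_len)
            return -1;                  /* name does not fit in entry */

        off += rec_len;
    }
    return 0;
}
```

A freshly written root directory block with "." and ".." entries whose
rec_len values add up to the block size passes this check. If the two
bytes of a rec_len are stored or read in the wrong order, 12 becomes
3072, which runs past a 1024-byte block and is rejected.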

Combined with the fact that the other failure was someplace completely
different, I'm at this point deeply suspicious about your compiler
toolchain and/or the hardware on which you are conducting your tests.

- Ted

2013-12-16 16:17:09

by Srivatsan Canchivaram

Subject: Re: Segmentation fault in mke2fs

Hi Ted,

The hardware is a stable product that has been in use for a while.
We have experienced a number of issues with the toolchain that we
received from the vendor.
They are about to release a new, official version this week.

So, I will try this test again with the new toolchain at some point soon.

Thanks to you and Eric for the replies.

Best,
Sri
