2009-02-26 14:19:50

by Theodore Ts'o

Subject: Re: Crash in 2.6.28.7 - ext4 related

On Thu, Feb 26, 2009 at 10:18:29AM +0100, Fabio Comolli wrote:
> OK, I see that my message reached neither linux-kernel nor
> linux-fsdevel, maybe because of the image size (484K).
>
> If anyone is interested in details, please let me know.

Yes, we're interested. I've changed the cc list to be the linux-ext4
list, though. That's a better list for tracking such things than the
more general linux-fsdevel list.

Can you post the image (of the kernel oops message, I assume)
somewhere? If you cc'ed me directly on your first e-mail, it
apparently never arrived in my inbox, either.

- Ted



2009-02-26 14:38:27

by Fabio Comolli

Subject: Re: Crash in 2.6.28.7 - ext4 related

Can you try http://www.megaupload.com/?d=3UVSWP54 ?

It's the first time I'm using such a service so please be patient :-)

And yes, I cc'ed you in the first message.

Regards,
Fabio



On Thu, Feb 26, 2009 at 3:19 PM, Theodore Tso wrote:
> On Thu, Feb 26, 2009 at 10:18:29AM +0100, Fabio Comolli wrote:
>> OK, I see that my message reached neither linux-kernel nor
>> linux-fsdevel, maybe because of the image size (484K).
>>
>> If anyone is interested in details, please let me know.
>
> Yes, we're interested.  I've changed the cc list to be the linux-ext4
> list, though.  That's a better list for tracking such things than the
> more general linux-fsdevel list.
>
> Can you post the image (of the kernel oops message, I assume)
> somewhere?  If you cc'ed me directly on your first e-mail, it
> apparently never arrived in my inbox, either.
>
>                                               - Ted
>
>

2009-02-26 16:08:33

by Theodore Ts'o

Subject: Re: Crash in 2.6.28.7 - ext4 related

I got the crash image. Unfortunately the beginning of the Oops
message, which might have been valuable, was cut off. There are some
patches newer than what has been backported to 2.6.28.7;
some of them are in the for-stable branch of the ext4 git tree, and
have been queued for 2.6.28.8. The call stack doesn't look like one
of the known bugs, though.

Hmm... can you send us the output of dumpe2fs on the filesystem? And
something that would be very useful would be a raw e2image of the
filesystem, created thusly:

e2image -r /dev/sdXXX - | bzip2 > sdXXX.e2i.bz2

This image will contain the basic filesystem metadata, including the
directory blocks (and directory names), but none of the data blocks.
It is therefore much smaller, and if you are willing to let me see
your directory file names, I can take that image and try to replicate
the problem on one of my systems.

If you have a spare 20 GB of disk space, you can also unpack the raw
image dump and try reproducing the problem on it. If you can trigger
it easily with an rm -rf operation, it should be just as reproducible
on the raw image dump.
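
A minimal sketch of that unpack-and-mount step, assuming a POSIX shell
and root privileges. The function name unpack_and_mount, its two
arguments, and the exit code 64 are illustrative choices, not anything
prescribed by e2fsprogs:

```shell
#!/bin/sh
# Sketch: unpack a compressed raw e2image and loop-mount it for testing.
# Both paths are placeholders supplied by the caller.

unpack_and_mount() {
    archive=$1      # e.g. sdXXX.e2i.bz2, as produced by the pipeline above
    mountpoint=$2   # an existing empty directory to mount the image on

    # 64 is an arbitrary "bad arguments" code chosen for this sketch.
    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 64; }
    [ -d "$mountpoint" ] || { echo "no such directory: $mountpoint" >&2; return 64; }

    # Strip the .bz2 suffix to name the unpacked image.
    image=${archive%.bz2}

    # -c writes the decompressed data to stdout and leaves the archive alone.
    bunzip2 -c "$archive" > "$image" || return 1

    # Needs root. The image carries metadata only, not file contents.
    mount -t ext4 -o loop "$image" "$mountpoint"
}
```

Once the image is mounted, the same rm -rf can be tried under the mount
point. A crash there still takes down the machine it runs on, so a
scratch system is the safer place for the experiment.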

Regards,

- Ted

2009-02-26 20:42:15

by Fabio Comolli

Subject: Re: Crash in 2.6.28.7 - ext4 related

Hi.

On Thu, Feb 26, 2009 at 5:08 PM, Theodore Tso wrote:
> I got the crash image.  Unfortunately the beginning of the Oops
> message, which might have been valuable, was cut off.  There are some
> patches newer than what has been backported to 2.6.28.7;
> some of them are in the for-stable branch of the ext4 git tree, and
> have been queued for 2.6.28.8.  The call stack doesn't look like one
> of the known bugs, though.
>
> Hmm... can you send us the output of dumpe2fs on the filesystem?

Attached.

> And something that would be very useful would be a raw e2image of the
> filesystem, created thusly:
>
>            e2image -r /dev/sdXXX - | bzip2 > sdXXX.e2i.bz2
>
> This image will contain the basic filesystem metadata, including the
> directory blocks (and directory names), but none of the data blocks.
> It is therefore much smaller, and if you are willing to let me see
> your directory file names, I can take that image and try to replicate
> the problem on one of my systems.

It's my home directory and so I prefer not to share, sorry. Anyway, it
seems that after the removal of that (possibly corrupted) directory, I
can't reproduce the problem anymore. I tried to create / modify /
delete some big directories, even two or three at a time with no luck.

I will post some new pictures if the problem reappears.

> If you have a spare 20 GB of disk space, you can also unpack the raw
> image dump and try reproducing the problem on it.  If you can trigger
> it easily with an rm -rf operation, it should be just as reproducible
> on the raw image dump.
>
> Regards,
>
>                                                - Ted
>

Thanks,
Fabio


Attachments:
dumpe2fs.log.gz (139.91 kB)

2009-02-26 20:52:50

by Theodore Ts'o

Subject: Re: Crash in 2.6.28.7 - ext4 related

On Thu, Feb 26, 2009 at 09:42:15PM +0100, Fabio Comolli wrote:
> It's my home directory and so I prefer not to share, sorry.

No problem, I understand.

> Anyway, it seems that after the removal of that (possibly corrupted)
> directory, I can't reproduce the problem anymore. I tried to create
> / modify / delete some big directories, even two or three at a time
> with no luck.

Did you ever try running e2fsck on the filesystem while you could
reproduce it? Did it report any errors? A good thing to do in
general, if you can report these sorts of problems, is to run e2fsck
with the -n option, while the filesystem is unmounted, and see if any
errors are reported. That would tell us if there were any filesystem
corruption problems (and the -n avoids making any changes to the
filesystem).
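
A minimal sketch of that read-only check, assuming a POSIX shell. The
function name check_readonly and the exit code 64 are illustrative
choices, and the /proc/mounts test is only a rough heuristic (it will
miss a device that is mounted under a different name):

```shell
#!/bin/sh
# Sketch: run a forced, read-only e2fsck on an unmounted filesystem.

check_readonly() {
    dev=$1    # e.g. /dev/sdXXX

    # 64 is an arbitrary "refused to run" code chosen for this sketch.
    [ -b "$dev" ] || { echo "$dev is not a block device" >&2; return 64; }

    # Heuristic: refuse if the device name appears in the mount table.
    if grep -q "^$dev " /proc/mounts 2>/dev/null; then
        echo "$dev is mounted; unmount it first" >&2
        return 64
    fi

    # -f forces a check even if the filesystem is marked clean.
    # -n opens the filesystem read-only and answers "no" to every question,
    # so nothing on disk is changed.
    e2fsck -f -n "$dev"
}
```

The function passes e2fsck's own exit status through, so a non-zero
result other than 64 comes from e2fsck itself.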

Also, even if you don't feel willing to share the e2image file, if you
can reproduce it, please consider making a raw e2image dump. That way
if the problem goes away again, maybe you'll be able to consistently
reproduce it on the e2image dump file.

The other thing that you can do which will sometimes work is to add
the -s option to the e2image command. The -s option scrambles the
names of the directory entries and zeros out any unused portions of
directory blocks to prevent privacy problems. The downside is that it
can prevent certain bugs from being repeatable and you have to either
turn off the dir_index feature or run e2fsck to fix up the htree since
the filename hashes will be screwed up after the directory entries are
scrambled. So it's not ideal, but in cases where there are privacy
issues, that can be helpful.
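
A minimal sketch of the scrambled variant, assuming a POSIX shell. The
function names and the exit code 64 are illustrative. Which e2fsck
options are meant for fixing up the htree is not stated, so -f -y is an
assumption here:

```shell
#!/bin/sh
# Sketch: create a scrambled raw image, then repair the htree in a copy.

make_scrambled_image() {
    dev=$1    # filesystem device, e.g. /dev/sdXXX
    out=$2    # output file, e.g. sdXXX.e2i.bz2

    # 64 is an arbitrary "bad arguments" code chosen for this sketch.
    [ -b "$dev" ] || { echo "$dev is not a block device" >&2; return 64; }

    # -r writes a raw image; -s scrambles directory entry names and zeros
    # unused parts of directory blocks; "-" sends the image to stdout.
    # Note: the pipeline's exit status is bzip2's, not e2image's.
    e2image -r -s "$dev" - | bzip2 > "$out"
}

repair_htree() {
    image=$1  # the unpacked raw image file, not the original device

    [ -f "$image" ] || { echo "no such image: $image" >&2; return 64; }

    # Assumption: force a full check (-f) and accept every fix (-y) so the
    # directory indexes are rebuilt for the scrambled names. The other
    # route mentioned above would be: tune2fs -O ^dir_index "$image"
    e2fsck -f -y "$image"
}
```

Running the repair on the unpacked image rather than on the device
keeps the original filesystem untouched.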

Regards,

- Ted

2009-02-26 22:01:17

by Fabio Comolli

Subject: Re: Crash in 2.6.28.7 - ext4 related

Hi

On Thu, Feb 26, 2009 at 9:52 PM, Theodore Tso wrote:
> On Thu, Feb 26, 2009 at 09:42:15PM +0100, Fabio Comolli wrote:
>> It's my home directory and so I prefer not to share, sorry.
>
> No problem, I understand.

Thanks.

>
>> Anyway, it seems that after the removal of that (possibly corrupted)
>> directory, I can't reproduce the problem anymore. I tried to create
>> / modify / delete some big directories, even two or three at a time
>> with no luck.
>
> Did you ever try running e2fsck on the filesystem while you could
> reproduce it?  Did it report any errors?  A good thing to do in
> general, if you can report these sorts of problems, is to run e2fsck
> with the -n option, while the filesystem is unmounted, and see if any
> errors are reported.  That would tell us if there were any filesystem
> corruption problems (and the -n avoids making any changes to the
> filesystem).

OK, maybe I did not make myself clear in my previous post. After the
last crash (the one from which the picture was taken) I booted
single-user and then forced a full fsck with the filesystem
unmounted. It reported no errors. After that I removed the problematic
directory and all has been fine since then.

Maybe it's worth mentioning that I did the very same actions after
another crash that happened before: also in that case a full fsck
reported no errors but trying to remove the directory after that
crashed the machine.

>
> Also, even if you don't feel willing to share the e2image file, if you
> can reproduce it, please consider making a raw e2image dump.  That way
> if the problem goes away again, maybe you'll be able to consistently
> reproduce it on the e2image dump file.

Yup, will do if the problem shows up again.

>
> The other thing that you can do which will sometimes work is to add
> the -s option to the e2image command.  The -s option scrambles the
> names of the directory entries and zeros out any unused portions of
> directory blocks to prevent privacy problems.  The downside is that it
> can prevent certain bugs from being repeatable and you have to either
> turn off the dir_index feature or run e2fsck to fix up the htree since
> the filename hashes will be screwed up after the directory entries are
> scrambled.  So it's not ideal, but in cases where there are privacy
> issues, that can be helpful.

Will do.

>
> Regards,
>
>                                        - Ted
>

Regards,
Fabio