2008-07-09 22:00:05

by Gary Hawco

Subject: Segfaults--they're back!

Segfaults have returned with snapshots compiled after 070908--0010hrs GMT.

That one worked fine.

The next one I tried (070908/0025hrs GMT) caused segfaults in both Gentoo &
Slackware (a first for Slackware) when trying to untar the
linux-2.6.26-rc9.tar.bz2 tarball.

I then rolled back two snapshots (to 070908/0012hrs GMT) and it segfaulted
in both operating systems doing the same untarring operation.

So, apparently, since the 0010 snapshot
(ext4-patch-queue-bfb23cf4cd345552c774142cb10ac1225caf35f5.tar.gz) works fine
and the 0012 snapshot
(ext4-patch-queue-be66b0c5c3f4293176301c0ddcb8db95b0576cb4.tar.gz)
segfaults, the commit "Add ext4-fix-mb_find_next_bit-return.patch" must be
the culprit.

Thanks,
Gary

P.S. The latest snapshot from today @ 0303hrs GMT segfaults as well.



2008-07-10 00:16:19

by Mingming Cao

Subject: Re: Segfaults--they're back!


On Wed, 2008-07-09 at 15:00 +0000, Gary Hawco wrote:
> Segfaults have returned with snapshots compiled after 070908--0010hrs GMT.
>
> That one worked fine.
>
> The next one I tried (070908/0025hrs GMT) caused segfaults in both Gentoo &
> Slackware (a first for Slackware) when trying to untar
> linux-2.6.26-rc9.tar.bz2 tarball)
>
> I then rolled back two snapshots to 070908/0012hrs GMT) and it segfaulted
> in both operating systems doing same untarring function.
>
> So, apparently, since 0010snapshot
> (ext4-patch-queue-bfb23cf4cd345552c774142cb10ac1225caf35f5.tar.gz) works fine
>
> and 0012snapshot
> (ext4-patch-queue-be66b0c5c3f4293176301c0ddcb8db95b0576cb4.tar.gz)
> segfaults, the
> Add ext4-fix-mb_find_next_bit-return.patch must be the culprit.
>

Thanks for reporting this.
Commit be66b0c5c3f4293176301c0ddcb8db95b0576cb4 added
ext4-fix-mb_find_next_bit-return.patch.

I have dropped this patch from the patch queue. Could you please check
whether the segmentation fault goes away?


Mingming
> Thanks,
> Gary
>
> P.S. The latest snapshot from today @ 0303hrs GMT segfaults as well.
>

2008-07-10 05:22:04

by Aneesh Kumar K.V

Subject: Re: Segfaults--they're back!

On Wed, Jul 09, 2008 at 03:00:04PM +0000, Gary Hawco wrote:
> Segfaults have returned with snapshots compiled after 070908--0010hrs GMT.
>
> That one worked fine.
>
> The next one I tried (070908/0025hrs GMT) caused segfaults in both Gentoo &
> Slackware (a first for Slackware) when trying to untar
> linux-2.6.26-rc9.tar.bz2 tarball)
>
> I then rolled back two snapshots to 070908/0012hrs GMT) and it segfaulted
> in both operating systems doing same untarring function.
>
> So, apparently, since 0010snapshot
> (ext4-patch-queue-bfb23cf4cd345552c774142cb10ac1225caf35f5.tar.gz) works fine
>
> and 0012snapshot
> (ext4-patch-queue-be66b0c5c3f4293176301c0ddcb8db95b0576cb4.tar.gz)
> segfaults, the
> Add ext4-fix-mb_find_next_bit-return.patch must be the culprit.

Is your file system full when this happens? Which user space call causes
the segfault? An strace should help you find that. We have actually
modified ext4 to deliver SIGBUS when we hit ENOSPC during an mmap write.
For example:

root:/ext4# /root/mmaptest ./test4 0 100
mmaping 0 to 100
Bus error (core dumped)
root:/ext4#
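
The source of mmaptest is not shown in this thread, so the following is only
a minimal sketch of what such a test might look like, not the actual tool.
The function name mmap_write_range() and its return codes are hypothetical.
It extends a file sparsely, maps it MAP_SHARED, and writes into the mapping,
which forces block allocation at fault time. A SIGBUS handler is installed so
that the condition is reported as a return value instead of a core dump.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf sigbus_env;

/* Jump back into mmap_write_range() when the kernel delivers SIGBUS. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(sigbus_env, 1);
}

/*
 * Hypothetical helper: write bytes [start, end) of `path` through a
 * shared mapping.
 * Returns 0 on success, 1 if SIGBUS was raised during the write,
 * and -1 on any other error.
 */
int mmap_write_range(const char *path, size_t start, size_t end)
{
    struct sigaction sa, old_sa;
    struct stat st;
    char *map;
    int fd, rc;

    if (start >= end)
        return -1;

    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Grow the file sparsely if needed, so the write must allocate blocks. */
    if (fstat(fd, &st) < 0 ||
        (st.st_size < (off_t)end && ftruncate(fd, (off_t)end) < 0)) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, end, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_sa);

    if (sigsetjmp(sigbus_env, 1) == 0) {
        /* Page faults taken here are where block allocation can fail. */
        memset(map + start, 'x', end - start);
        rc = (msync(map, end, MS_SYNC) == 0) ? 0 : -1;
    } else {
        rc = 1; /* SIGBUS arrived while touching the mapping */
    }

    sigaction(SIGBUS, &old_sa, NULL);
    munmap(map, end);
    close(fd);
    return rc;
}
```

Calling mmap_write_range("./test4", 0, 100) would correspond roughly to the
invocation above. On a file system with free space the call is expected to
return 0; a return value of 1 would indicate the SIGBUS case.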

-aneesh

2008-07-10 11:03:06

by Theodore Ts'o

Subject: Re: Segfaults--they're back!

On Thu, Jul 10, 2008 at 10:51:50AM +0530, Aneesh Kumar K.V wrote:
> On Wed, Jul 09, 2008 at 03:00:04PM +0000, Gary Hawco wrote:
> > Segfaults have returned with snapshots compiled after 070908--0010hrs GMT.
> >
> > That one worked fine.

Is this a kernel or userspace segfault? That is, is there a kernel oops
message? If so, can you send us the oops message? It's really, really
useful to have it. Also, can you send us the dumpe2fs output for the
filesystem which you were writing to? Some of the bugs which we've been
finding are specific to a 1k block filesystem, for example, and sometimes
that can help narrow down the bug.

> Is your file system full when this happens ? Which user space call cause
> the segfault ? An strace should be able to help you find that. We
> actually have modified ext4 to give SIGBUS when we hit ENOSPC during
> mmap write. For ex:
>
> root:/ext4# /root/mmaptest ./test4 0 100
> mmaping 0 to 100
> Bus error (core dumped)
> root:/ext4#

What Aneesh is talking about here is a userspace core dump, which is
normal if the filesystem is full. I'm not sure that's what you're
referring to, assuming that "they're back" means a kernel oops.

Regards,

- Ted