2013-03-11 17:18:30

by Markus Trippelsdorf

Subject: torrent hash failures since 3.9.0-rc1

I get hash failures on "completed" torrents since 3.9.0-rc1 (Linux 3.8
seems to be fine). What happens is that the torrents apparently complete
successfully. After reboot however the hash check fails and there are
missing (or corrupted) chunks. I've tested this with two different
clients (rtorrent and aria2c) and both are affected. So I think this
might be a filesystem issue.

/dev/sda ext4 1.4T 666G 640G 51% /var
/dev/sda on /var type ext4 (rw,noatime,data=ordered)

I use ECC memory (and there is nothing in the logs).

--
Markus


2013-03-11 19:17:58

by Markus Trippelsdorf

Subject: Re: torrent hash failures since 3.9.0-rc1

On 2013.03.11 at 18:18 +0100, Markus Trippelsdorf wrote:
> I get hash failures on "completed" torrents since 3.9.0-rc1 (Linux 3.8
> seems to be fine). What happens is that the torrents apparently complete
> successfully. After reboot however the hash check fails and there are
> missing (or corrupted) chunks. I've tested this with two different
> clients (rtorrent and aria2c) and both are affected. So I think this
> might be a filesystem issue.
>
> /dev/sda ext4 1.4T 666G 640G 51% /var
> /dev/sda on /var type ext4 (rw,noatime,data=ordered)
>
> I use ECC memory (and there is nothing in the logs).

To reproduce this issue just do the following:

% wget http://torrents.linuxmint.com/torrents/linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
% rtorrent linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
(Wait until the torrent finishes)
% echo 3 | sudo tee /proc/sys/vm/drop_caches
(Rehash the torrent (Ctrl-R))
The torrent doesn't rehash successfully: a few chunks are
missing/corrupted and need to be downloaded again.
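
For readers unfamiliar with what a rehash does, here is a minimal Python
sketch of the idea, not rtorrent's actual code. It assumes SHA-1 digests
per fixed-size piece (as in BitTorrent v1 metadata); PIECE_LENGTH, the
function names and the demo file are all hypothetical. The demo zeroes
one piece on disk to mimic a lost write and shows that only that piece
fails the check.

```python
import hashlib
import os
import tempfile

PIECE_LENGTH = 256 * 1024  # assumed piece size; real torrents declare their own


def piece_digests(data, piece_length=PIECE_LENGTH):
    """Split data into pieces and return the SHA-1 digest of each piece."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def failed_pieces(path, expected, piece_length=PIECE_LENGTH):
    """Re-read the file and list the pieces whose digest no longer matches."""
    bad = []
    with open(path, "rb") as f:
        for index, digest in enumerate(expected):
            if hashlib.sha1(f.read(piece_length)).digest() != digest:
                bad.append(index)
    return bad


def demo():
    """Write four pieces, zero out piece 2 on disk, then rehash."""
    payload = os.urandom(4 * PIECE_LENGTH)
    expected = piece_digests(payload)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    try:
        # A freshly written file should verify cleanly.
        assert failed_pieces(path, expected) == []
        with open(path, "r+b") as f:
            f.seek(2 * PIECE_LENGTH)
            f.write(b"\0" * PIECE_LENGTH)  # simulate data that never reached disk
        return failed_pieces(path, expected)
    finally:
        os.unlink(path)
```

Note that a check like this only sees on-disk damage if the data is
really read back from disk, which is why the page cache is dropped
before rehashing in the steps above.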

--
Markus

2013-03-11 19:42:10

by Dave Jones

Subject: Re: torrent hash failures since 3.9.0-rc1

On Mon, Mar 11, 2013 at 08:17:53PM +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 18:18 +0100, Markus Trippelsdorf wrote:
> > I get hash failures on "completed" torrents since 3.9.0-rc1 (Linux 3.8
> > seems to be fine). What happens is that the torrents apparently complete
> > successfully. After reboot however the hash check fails and there are
> > missing (or corrupted) chunks. I've tested this with two different
> > clients (rtorrent and aria2c) and both are affected. So I think this
> > might be a filesystem issue.
> >
> > /dev/sda ext4 1.4T 666G 640G 51% /var
> > /dev/sda on /var type ext4 (rw,noatime,data=ordered)
> >
> > I use ECC memory (and there is nothing in the logs).
>
> To reproduce this issue just do the following:
>
> % wget http://torrents.linuxmint.com/torrents/linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
> % rtorrent linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
> (Wait until the torrent finishes)
> % echo 3 | sudo tee /proc/sys/vm/drop_caches
> (Rehash the torrent (Ctrl-R))
> The torrent doesn't rehash successfully: a few chunks are
> missing/corrupted and need to be downloaded again.

Worked fine for me on two separate machines. Could it be a network problem
perhaps ? If something is mangling the packet before it hits the disk,
that would explain it. What NIC do you use ?

Or maybe you could isolate it to a filesystem problem using something
like fsx ?

Dave

2013-03-11 20:13:38

by Markus Trippelsdorf

Subject: Re: torrent hash failures since 3.9.0-rc1

On 2013.03.11 at 15:41 -0400, Dave Jones wrote:
> On Mon, Mar 11, 2013 at 08:17:53PM +0100, Markus Trippelsdorf wrote:
> > On 2013.03.11 at 18:18 +0100, Markus Trippelsdorf wrote:
> > > I get hash failures on "completed" torrents since 3.9.0-rc1 (Linux 3.8
> > > seems to be fine). What happens is that the torrents apparently complete
> > > successfully. After reboot however the hash check fails and there are
> > > missing (or corrupted) chunks. I've tested this with two different
> > > clients (rtorrent and aria2c) and both are affected. So I think this
> > > might be a filesystem issue.
> > >
> > > /dev/sda ext4 1.4T 666G 640G 51% /var
> > > /dev/sda on /var type ext4 (rw,noatime,data=ordered)
> > >
> > > I use ECC memory (and there is nothing in the logs).
> >
> > To reproduce this issue just do the following:
> >
> > % wget http://torrents.linuxmint.com/torrents/linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
> > % rtorrent linuxmint-12-gnome-cd-nocodecs-64bit.iso.torrent
> > (Wait until the torrent finishes)
> > % echo 3 | sudo tee /proc/sys/vm/drop_caches
> > (Rehash the torrent (Ctrl-R))
> > The torrent doesn't rehash successfully: a few chunks are
> > missing/corrupted and need to be downloaded again.
>
> Worked fine for me on two separate machines. Could it be a network problem
> perhaps ? If something is mangling the packet before it hits the disk,
> that would explain it. What NIC do you use ?

I normally use ATL1E, but I've dusted off my E100 and the issue is also
reproducible on the Intel card.

> Or maybe you could isolate it to a filesystem problem using something
> like fsx ?

I've found fsx on your homepage, but I've no idea on how to use this
tool. Any pointers?

--
Markus

2013-03-11 20:37:49

by Theodore Ts'o

Subject: Re: torrent hash failures since 3.9.0-rc1

On Mon, Mar 11, 2013 at 09:13:34PM +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 15:41 -0400, Dave Jones wrote:
> > Worked fine for me on two separate machines. Could it be a network problem
> > perhaps ? If something is mangling the packet before it hits the disk,
> > that would explain it. What NIC do you use ?

I'm not a torrent expert, but I thought it did enough checksumming
such that if the packet got mangled, it would get noticed by the
torrent client before it writes the chunks to disk?

> > Or maybe you could isolate it to a filesystem problem using something
> > like fsx ?
>
> I've found fsx on your homepage, but I've no idea on how to use this
> tool. Any pointers?

We actually run fsx in a number of different configurations as part of
our regression testing before we send Linus a pull request, and
haven't found any issues. So unless it's a hardware problem, it seems
unlikely to me that your running fsx would turn up anything.

Can you send a dumpe2fs -h of the file system in question, and what
mount options (if any) you are using? Thanks!!

BTW, I'm currently running 3.9-rc2 with some additional fixes from the
ext4 dev branch, and I'm not able to reproduce the problem using
rtorrent on my laptop. How reliably is it reproducing for you? Are
you seeing the problem every time you try this?

- Ted


2013-03-11 20:44:32

by Dave Jones

Subject: Re: torrent hash failures since 3.9.0-rc1

On Mon, Mar 11, 2013 at 09:13:34PM +0100, Markus Trippelsdorf wrote:

> > Worked fine for me on two separate machines. Could it be a network problem
> > perhaps ? If something is mangling the packet before it hits the disk,
> > that would explain it. What NIC do you use ?
>
> I normally use ATL1E, but I've dusted off my E100 and the issue is also
> reproducible on the Intel card.

ok, good to rule that out at least.

> > Or maybe you could isolate it to a filesystem problem using something
> > like fsx ?
>
> I've found fsx on your homepage, but I've no idea on how to use this
> tool. Any pointers?

cd to the mount point you want to test, and then 'fsx test' will create
a couple files there, and stress them.

Dave

2013-03-11 20:46:31

by Markus Trippelsdorf

Subject: Re: torrent hash failures since 3.9.0-rc1

On 2013.03.11 at 16:37 -0400, Theodore Ts'o wrote:
> On Mon, Mar 11, 2013 at 09:13:34PM +0100, Markus Trippelsdorf wrote:
> > On 2013.03.11 at 15:41 -0400, Dave Jones wrote:
> > > Worked fine for me on two separate machines. Could it be a network problem
> > > perhaps ? If something is mangling the packet before it hits the disk,
> > > that would explain it. What NIC do you use ?
>
> I'm not a torrent expert, but I thought it did enough checksumming
> such that if the packet got mangled, it would get noticed by the
> torrent client before it writes the chunks to disk?

Yes, I think that's the idea.

> > > Or maybe you could isolate it to a filesystem problem using something
> > > like fsx ?
> >
> > I've found fsx on your homepage, but I've no idea on how to use this
> > tool. Any pointers?
>
> We actually run fsx in a number of different configurations as part of
> our regression testing before we send Linus a pull request, and
> haven't found any issues. So unless it's a hardware problem, it seems
> unlikely to me that your running fsx would turn up anything.

Yes, I let it run for a while anyway and it didn't report any failure.

> Can you send a dumpe2fs -h of the file system in question, and what
> mount options (if any) you are using? Thanks!!

# dumpe2fs -h /dev/sda
dumpe2fs 1.42.7 (21-Jan-2013)
Filesystem volume name: <none>
Last mounted on: /var
Filesystem UUID: 202f2c93-c6c5-4d70-a63f-d770161138bd
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery extent flex_bg sparse_super large_file huge_file
uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 91578368
Block count: 366284646
Reserved block count: 18314232
Free blocks: 185850075
Free inodes: 90003798
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 936
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Mon Nov 19 16:02:46 2012
Last mount time: Mon Mar 11 21:16:23 2013
Last write time: Mon Mar 11 21:16:23 2013
Mount count: 20
Maximum mount count: -1
Last checked: Mon Mar 4 13:32:55 2013
Check interval: 0 (<none>)
Lifetime writes: 2891 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 60164803
Default directory hash: half_md4
Directory Hash Seed: e86f34a0-390a-49b6-87a9-3336d861ab81
Journal backup: inode blocks
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length: 32768
Journal sequence: 0x00079bef
Journal start: 1

noatime is the only mount option.

> BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> ext4 dev branch, and I'm not able to reproduce the problem using
> rtorrent on my laptop. How reliably is it reproducing for you? Are
> you seeing the problem every time you try this?

Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
vanishes.

--
Markus

2013-03-11 21:18:42

by Theodore Ts'o

Subject: Re: torrent hash failures since 3.9.0-rc1

On Mon, Mar 11, 2013 at 09:46:25PM +0100, Markus Trippelsdorf wrote:
> > BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> > ext4 dev branch, and I'm not able to reproduce the problem using
> > rtorrent on my laptop. How reliably is it reproducing for you? Are
> > you seeing the problem every time you try this?
>
> Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
> vanishes.

Would you be willing to try an experiment?

Try pulling down the master branch from the ext4 git tree here:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git

This contains all of the ext4 changes which are in 3.9-rc1, based on
top of 3.8-rc3. See if it reproduces there. If it does, then it
would tend to confirm the hypothesis that the issue was introduced by
one of the ext4 patches that we merged during the 3.9-rc1 merge
window... and then, since you can reproduce the problem, if you
could do a git bisect to find the guilty commit, that would be really
greatly appreciated.

If you can't reproduce it from the ext4.git tree, then the problem is
probably caused by some other change that was introduced between 3.8
and 3.9-rc1.

Thanks in advance,

- Ted

2013-03-11 21:38:41

by Markus Trippelsdorf

Subject: Re: torrent hash failures since 3.9.0-rc1

On 2013.03.11 at 17:18 -0400, Theodore Ts'o wrote:
> On Mon, Mar 11, 2013 at 09:46:25PM +0100, Markus Trippelsdorf wrote:
> > > BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> > > ext4 dev branch, and I'm not able to reproduce the problem using
> > > rtorrent on my laptop. How reliably is it reproducing for you? Are
> > > you seeing the problem every time you try this?
> >
> > Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
> > vanishes.
>
> Would you be willing to try an experiment?
>
> Try pulling down the master branch from the ext4 git tree here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
>
> This contains all of the ext4 changes which are in 3.9-rc1, based on
> top of 3.8-rc3. See if it reproduces there. If it does, then it
> would tend to confirm the hypothesis that the issue was introduced by
> one of the ext4 patches that we merged during the 3.9-rc1 merge
> window... and then, since you can reproduce the problem, if you
> could do a git bisect to find the guilty commit, that would be really
> greatly appreciated.
>
> If you can't reproduce it from the ext4.git tree, then the problem is
> probably caused by some other change that was introduced between 3.8
> and 3.9-rc1.

I've started a full bisection from v3.8 to today's git tree. It will take
~13 steps. However it's already late here in Germany. I will continue
the bisection tomorrow and report back.
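
To show where an estimate like "~13 steps" comes from, here is a small
Python sketch of bisection over a linear list of commits. It is a
simplification: real git bisect works on a commit graph, and the count
of 8192 candidate commits used in the usage note is purely illustrative,
not the actual number of commits in the range. The names first_bad and
expected_steps are hypothetical.

```python
import math


def first_bad(commits, is_bad):
    """Binary-search an ordered list whose first entry is good and last is bad.

    Returns (first_bad_commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1  # invariant: commits[good] is good, commits[bad] is bad
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1  # one build-and-test cycle per step
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], steps


def expected_steps(n_candidates):
    """Rough number of tests needed to bisect n_candidates commits."""
    return math.ceil(math.log2(n_candidates))
```

With a known-good commit followed by 8192 candidates, the search always
finishes in 13 tests, which matches expected_steps(8192). Each halving
costs one kernel build, reboot and torrent download.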

--
Markus

2013-03-11 23:12:31

by Markus Trippelsdorf

Subject: Re: torrent hash failures since 3.9.0-rc1

On 2013.03.11 at 22:38 +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 17:18 -0400, Theodore Ts'o wrote:
> > On Mon, Mar 11, 2013 at 09:46:25PM +0100, Markus Trippelsdorf wrote:
> > > > BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> > > > ext4 dev branch, and I'm not able to reproduce the problem using
> > > > rtorrent on my laptop. How reliably is it reproducing for you? Are
> > > > you seeing the problem every time you try this?
> > >
> > > Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
> > > vanishes.
> >
> > Would you be willing to try an experiment?
> >
> > Try pulling down the master branch from the ext4 git tree here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
> >
> > This contains all of the ext4 changes which are in 3.9-rc1, based on
> > top of 3.8-rc3. See if it reproduces there. If it does, then it
> > would tend to confirm the hypothesis that the issue was introduced by
> > one of the ext4 patches that we merged during the 3.9-rc1 merge
> > window... and then, since you can reproduce the problem, if you
> > could do a git bisect to find the guilty commit, that would be really
> > greatly appreciated.
> >
> > If you can't reproduce it from the ext4.git tree, then the problem is
> > probably caused by some other change that was introduced between 3.8
> > and 3.9-rc1.
>
> I've started a full bisection from v3.8 to today's git tree. It will take
> ~13 steps. However it's already late here in Germany. I will continue
> the bisection tomorrow and report back.

The issue started with:

74cd15cd02708c7188581f279f33a98b2ae8d322 is the first bad commit
commit 74cd15cd02708c7188581f279f33a98b2ae8d322
Author: Zheng Liu <[email protected]>
Date: Mon Feb 18 00:32:55 2013 -0500

ext4: reclaim extents from extent status tree

Please note that my local rtorrent version was configured with
"--with-posix-fallocate". I'm not sure if distributions also enable this
flag, but it could explain why Ted and Dave weren't able to reproduce
the problem so far.
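
To illustrate the write pattern that "--with-posix-fallocate"
presumably enables, here is a minimal Python sketch (Unix only). It
preallocates the whole file first and then fills it chunk by chunk in
arbitrary order, the way a torrent client receives pieces. This is an
assumption about the pattern, not rtorrent's code; CHUNK and the
function names are hypothetical.

```python
import os
import random
import tempfile

CHUNK = 16 * 1024  # assumed chunk size, for illustration only


def preallocate_and_fill(path, chunks, seed=0):
    """Preallocate the file, then write equally sized chunks in shuffled order."""
    size = len(chunks) * CHUNK
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Reserve all blocks up front, before any data is written.
            os.posix_fallocate(fd, 0, size)
        except OSError:
            # Fallback if preallocation is unsupported: a sparse file instead.
            os.ftruncate(fd, size)
        order = list(range(len(chunks)))
        random.Random(seed).shuffle(order)  # pieces arrive out of order
        for i in order:
            os.pwrite(fd, chunks[i], i * CHUNK)
        os.fsync(fd)
    finally:
        os.close(fd)


def read_back(path):
    """Return the complete file contents."""
    with open(path, "rb") as f:
        return f.read()
```

On its own this reads back through the page cache, so it is not a
reproducer; the data would have to be re-read from disk (for example
after dropping caches) to show the corruption.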

--
Markus

2013-03-11 23:26:28

by Dave Jones

Subject: Re: torrent hash failures since 3.9.0-rc1

On Tue, Mar 12, 2013 at 12:12:27AM +0100, Markus Trippelsdorf wrote:
> > I've started a full bisection from v3.8 to today's git tree. It will take
> > ~13 steps. However it's already late here in Germany. I will continue
> > the bisection tomorrow and report back.
>
> The issue started with:
>
> 74cd15cd02708c7188581f279f33a98b2ae8d322 is the first bad commit
> commit 74cd15cd02708c7188581f279f33a98b2ae8d322
> Author: Zheng Liu <[email protected]>
> Date: Mon Feb 18 00:32:55 2013 -0500
>
> ext4: reclaim extents from extent status tree
>
> Please note that my local rtorrent version was configured with
> "--with-posix-fallocate". I'm not sure if distributions also enable this
> flag, but it could explain why Ted and Dave weren't able to reproduce
> the problem so far.

Looks like Fedora doesn't, so indeed that could explain it.

Dave

2013-03-12 02:45:44

by Zheng Liu

Subject: Re: torrent hash failures since 3.9.0-rc1

On Tue, Mar 12, 2013 at 12:12:27AM +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 22:38 +0100, Markus Trippelsdorf wrote:
> > On 2013.03.11 at 17:18 -0400, Theodore Ts'o wrote:
> > > On Mon, Mar 11, 2013 at 09:46:25PM +0100, Markus Trippelsdorf wrote:
> > > > > BTW, I'm currently running 3.9-rc2 with some additional fixes from the
> > > > > ext4 dev branch, and I'm not able to reproduce the problem using
> > > > > rtorrent on my laptop. How reliably is it reproducing for you? Are
> > > > > you seeing the problem every time you try this?
> > > >
> > > > Yes, it's 100% reproducible for me. If I boot a 3.8 kernel the issue
> > > > vanishes.
> > >
> > > Would you be willing to try an experiment?
> > >
> > > Try pulling down the master branch from the ext4 git tree here:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
> > >
> > > This contains all of the ext4 changes which are in 3.9-rc1, based on
> > > top of 3.8-rc3. See if it reproduces there. If it does, then it
> > > would tend to confirm the hypothesis that the issue was introduced by
> > > one of the ext4 patches that we merged during the 3.9-rc1 merge
> > > window... and then, since you can reproduce the problem, if you
> > > could do a git bisect to find the guilty commit, that would be really
> > > greatly appreciated.
> > >
> > > If you can't reproduce it from the ext4.git tree, then the problem is
> > > probably caused by some other change that was introduced between 3.8
> > > and 3.9-rc1.
> >
> > I've started a full bisection from v3.8 to today's git tree. It will take
> > ~13 steps. However it's already late here in Germany. I will continue
> > the bisection tomorrow and report back.
>
> The issue started with:
>
> 74cd15cd02708c7188581f279f33a98b2ae8d322 is the first bad commit
> commit 74cd15cd02708c7188581f279f33a98b2ae8d322
> Author: Zheng Liu <[email protected]>
> Date: Mon Feb 18 00:32:55 2013 -0500
>
> ext4: reclaim extents from extent status tree
>
> Please note that my local rtorrent version was configured with
> "--with-posix-fallocate". I'm not sure if distributions also enable this
> flag, but it could explain why Ted and Dave weren't able to reproduce
> the problem so far.

Hi Markus,

Thanks for reporting this problem. My deepest apologies.

As Ted suggested, could you please try the ext4 git tree? I want to
find out whether this bug has been fixed by my latest patch series or
not.

Thanks in advance,
- Zheng

2013-03-12 03:31:05

by Theodore Ts'o

Subject: Re: torrent hash failures since 3.9.0-rc1

On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
>
> Thanks for reporting this problem. My deepest apologies.
>
> As Ted suggested, could you please try the ext4 git tree? I want to
> find out whether this bug has been fixed by my latest patch series or
> not.

It's definitely worth a try to compile the master branch of the ext4
tree and see if it reproduces or not.

However, I suspect the problem will still be there. Based on the
commit which Markus has identified, I'm guessing it's a race between
the extents_status shrinker and writing into an uninitialized region of
the file (since apparently compiling rtorrent with
--with-posix-fallocate is required).

Markus, how much memory do you have in your system? That may be the
other reason why I haven't been able to reproduce it to date; I had a
lot of free memory when I tried to reproduce the problem, so the slab
shrinker didn't engage.

Regards,

- Ted

2013-03-12 03:44:30

by Theodore Ts'o

Subject: Re: torrent hash failures since 3.9.0-rc1

On Mon, Mar 11, 2013 at 11:30:54PM -0400, Theodore Ts'o wrote:
> > As Ted suggested, could you please try the ext4 git tree? I want to
> > find out whether this bug has been fixed by my latest patch series or
> > not.
>
> It's definitely worth a try to compile the master branch of the ext4
> tree and see if it reproduces or not.

Sorry, what you should try is the dev branch of the ext4 tree. That
has the new patches that we are currently QA'ing for 3.9-rc3
(hopefully).

- Ted

2013-03-12 06:16:29

by Markus Trippelsdorf

Subject: Re: torrent hash failures since 3.9.0-rc1

On 2013.03.11 at 23:30 -0400, Theodore Ts'o wrote:
> On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
> >
> > Thanks for reporting this problem. My deepest apologies.
> >
> > As Ted suggested, could you please try the ext4 git tree? I want to
> > find out whether this bug has been fixed by my latest patch series or
> > not.
>
> It's definitely worth a try to compile the master branch of the ext4
> tree and see if it reproduces or not.

I cannot reproduce the issue on top of "ext4.git dev", so fortunately
the problem seems to be already fixed there.
Thanks.

Do you guys have a hunch which commit is the actual fix?
(Maybe I will "bisect" it later today.)

--
Markus

2013-03-12 06:29:35

by Zheng Liu

Subject: Re: torrent hash failures since 3.9.0-rc1

On Tue, Mar 12, 2013 at 07:16:24AM +0100, Markus Trippelsdorf wrote:
> On 2013.03.11 at 23:30 -0400, Theodore Ts'o wrote:
> > On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
> > >
> > > Thanks for reporting this problem. My deepest apologies.
> > >
> > > As Ted suggested, could you please try the ext4 git tree? I want to
> > > find out whether this bug has been fixed by my latest patch series or
> > > not.
> >
> > It's definitely worth a try to compile the master branch of the ext4
> > tree and see if it reproduces or not.
>
> I cannot reproduce the issue on top of "ext4.git dev", so fortunately
> the problem seems to be already fixed there.
> Thanks.

Great! Thanks for the confirmation.

>
> Do you guys have a hunch which commit is the actual fix?
> (Maybe I will "bisect" it later today.)

I think one of these two commits may fix it, but I am not sure which
one is the actual fix (I guess it is the former, ;-) ). Please check
if you do bisect it. Thanks in advance.

* 079d7667af20876a59a1d9b0d4d1e15dcf17fa34
ext4: fix wrong the number of the allocated blocks in
ext4_split_extent()

* cdee78433c138c2f2018a6884673739af2634787
ext4: fix wrong m_len value after unwritten extent conversion

Regards,
- Zheng

2013-03-12 06:48:15

by Markus Trippelsdorf

Subject: Re: torrent hash failures since 3.9.0-rc1

On 2013.03.12 at 14:44 +0800, Zheng Liu wrote:
> On Tue, Mar 12, 2013 at 07:16:24AM +0100, Markus Trippelsdorf wrote:
> > On 2013.03.11 at 23:30 -0400, Theodore Ts'o wrote:
> > > On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
> > > >
> > > > Thanks for reporting this problem. My deepest apologies.
> > > >
> > > > As Ted suggested, could you please try the ext4 git tree? I want to
> > > > find out whether this bug has been fixed by my latest patch series or
> > > > not.
> > >
> > > It's definitely worth a try to compile the master branch of the ext4
> > > tree and see if it reproduces or not.
> >
> > I cannot reproduce the issue on top of "ext4.git dev", so fortunately
> > the problem seems to be already fixed there.
> > Thanks.
>
> Great! Thanks for the confirmation.
>
> >
> > Do you guys have a hunch which commit is the actual fix?
> > (Maybe I will "bisect" it later today.)
>
> I think one of these two commits may fix it, but I am not sure which
> one is the actual fix (I guess it is the former, ;-) ). Please check
> if you do bisect it. Thanks in advance.
>
> * 079d7667af20876a59a1d9b0d4d1e15dcf17fa34
> ext4: fix wrong the number of the allocated blocks in
> ext4_split_extent()

Your guess was right. The commit above is the actual fix.

--
Markus

2013-03-12 07:00:51

by Zheng Liu

Subject: Re: torrent hash failures since 3.9.0-rc1

On Tue, Mar 12, 2013 at 07:48:10AM +0100, Markus Trippelsdorf wrote:
> On 2013.03.12 at 14:44 +0800, Zheng Liu wrote:
> > On Tue, Mar 12, 2013 at 07:16:24AM +0100, Markus Trippelsdorf wrote:
> > > On 2013.03.11 at 23:30 -0400, Theodore Ts'o wrote:
> > > > On Tue, Mar 12, 2013 at 11:00:58AM +0800, Zheng Liu wrote:
> > > > >
> > > > > Thanks for reporting this problem. My deepest apologies.
> > > > >
> > > > > As Ted suggested, could you please try the ext4 git tree? I want to
> > > > > find out whether this bug has been fixed by my latest patch series or
> > > > > not.
> > > >
> > > > It's definitely worth a try to compile the master branch of the ext4
> > > > tree and see if it reproduces or not.
> > >
> > > I cannot reproduce the issue on top of "ext4.git dev", so fortunately
> > > the problem seems to be already fixed there.
> > > Thanks.
> >
> > Great! Thanks for the confirmation.
> >
> > >
> > > Do you guys have a hunch which commit is the actual fix?
> > > (Maybe I will "bisect" it later today.)
> >
> > I think one of these two commits may fix it, but I am not sure which
> > one is the actual fix (I guess it is the former, ;-) ). Please check
> > if you do bisect it. Thanks in advance.
> >
> > * 079d7667af20876a59a1d9b0d4d1e15dcf17fa34
> > ext4: fix wrong the number of the allocated blocks in
> > ext4_split_extent()
>
> Your guess was right. The commit above is the actual fix.

Thank you so much for verifying it. :-)

Ted, I am wondering if we need to Cc this patch to the stable kernel.
We haven't received any reports complaining about it, but I think it
is worth backporting.

Regards,
- Zheng

2013-03-12 08:39:12

by Sander

Subject: Re: torrent hash failures since 3.9.0-rc1

Markus Trippelsdorf wrote (ao):
> On 2013.03.11 at 16:37 -0400, Theodore Ts'o wrote:
> > We actually run fsx in a number of different configurations as part of
> > our regression testing before we send Linus a pull request, and
> > haven't found any issues. So unless it's a hardware problem, it seems
> > unlikely to me that your running fsx would turn up anything.
>
> Yes, I let it run for a while anyway and it didn't report any failure.

> Please note that my local rtorrent version was configured with
> "--with-posix-fallocate".

Would it be possible to enhance fsx to detect such an issue?

Sander

2013-03-12 13:28:20

by Theodore Ts'o

Subject: Re: torrent hash failures since 3.9.0-rc1

On Tue, Mar 12, 2013 at 03:16:06PM +0800, Zheng Liu wrote:
>
> Ted, I am wondering if we need to Cc this patch to the stable kernel.
> We haven't received any reports complaining about it, but I think it
> is worth backporting.

I'll check, but I suspect it will require an explicit backport; it's
not going to apply cleanly automatically, will it?

(i.e., if we include cc: stable@vger.kernel.org, are there some
prerequisite patches that will also have to be backported, and/or will
we need to manually fix up patch conflicts?)

- Ted

2013-03-12 22:04:58

by Dave Chinner

Subject: Re: torrent hash failures since 3.9.0-rc1

On Tue, Mar 12, 2013 at 09:28:54AM +0100, Sander wrote:
> Markus Trippelsdorf wrote (ao):
> > On 2013.03.11 at 16:37 -0400, Theodore Ts'o wrote:
> > > We actually run fsx in a number of different configurations as part of
> > > our regression testing before we send Linus a pull request, and
> > > haven't found any issues. So unless it's a hardware problem, it seems
> > > unlikely to me that your running fsx would turn up anything.
> >
> > Yes, I let it run for a while anyway and it didn't report any failure.
>
> > Please note that my local rtorrent version was configured with
> > "--with-posix-fallocate".
>
> Would it be possible to enhance fsx to detect such an issue?

fsx in xfstests already uses fallocate() for preallocation and hole
punching, so such problems related to these operations can be found
using fsx. The issue here, however, involves memory reclaim
interactions and so is not something fsx can reproduce in isolation. :/


Cheers,

Dave.
--
Dave Chinner
[email protected]

2013-03-13 10:00:12

by Zheng Liu

Subject: Re: torrent hash failures since 3.9.0-rc1

On Tue, Mar 12, 2013 at 09:28:11AM -0400, Theodore Ts'o wrote:
> On Tue, Mar 12, 2013 at 03:16:06PM +0800, Zheng Liu wrote:
> >
> > Ted, I am wondering if we need to Cc this patch to the stable kernel.
> > We haven't received any reports complaining about it, but I think it
> > is worth backporting.
>
> I'll check, but I suspect it will require an explicit backport; it's
> not going to apply cleanly automatically, will it?

I checked the linux-stable tree and I think it can be applied cleanly
from 3.0.y onward, because ext4_split_extent() was introduced in the 3.0
kernel. So maybe we can cc stable@vger.kernel.org. It would be great if
you could double check it.

Thanks,
- Zheng