2016-07-11 12:59:56

by Alexey Dobriyan

Subject: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)

We have a customer who was able to reliably reproduce the following hang:
(the hang itself is rare, but there are many machines, so in aggregate it is not)

INFO: task ascp:66692 blocked for more than 120 seconds.

> bt 66692
PID: 66692 TASK: ffff883f124ba280 CPU: 5 COMMAND: "ascp"

__schedule
schedule
schedule_timeout
io_schedule_timeout
io_schedule
nfs_wait_bit_uninterruptible
__wait_on_bit
out_of_line_wait_on_bit
nfs_wait_on_request
nfs_try_to_update_request
nfs_setup_write_request
nfs_writepage_setup
nfs_updatepage
nfs_write_end
generic_file_buffered_write
__generic_file_aio_write
generic_file_aio_write
nfs_file_write
do_sync_write
vfs_write
sys_write
system_call_fastpath

ascp is the Aspera secure copy program. It is multithreaded.
When the hang happens, 5 threads are in S-state and 1 in D-state.

The workload is copying ~100+ GB files over NFSv3 on 10Gb Ethernet
with the following mount options:
rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=X.X.X.X,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=X.X.X.X

We got a coredump (two, actually); the observations are below:

1) one of the inode's pagecache pages still has page->private set:
struct nfs_page {
  wb_list = {
    next = 0xffff885623ac4f80,
    prev = 0xffff885623ac4f80
  },
  wb_page = 0xffffea01218c2600,
  wb_context = 0xffff887f2265de00,
  wb_lock_context = 0xffff887f2265de00,
  wb_index = 2649328,
  wb_offset = 0,
  wb_pgbase = 0,
  wb_bytes = 0,
  wb_kref = {
    refcount = {
      counter = 3
    }
  },
  wb_flags = 19,
  wb_verf = {
    data = "\000\000\000\000\000\000\000"
  },
  wb_this_page = 0xffff885623ac4f80,
  wb_head = 0xffff885623ac4f80
}

->wb_list is always empty,
->wb_bytes is always 0 (!),
->wb_kref is always 3,
->wb_flags = 19 = PG_BUSY|PG_MAPPED|PG_INODE_REF.

page->flags = 0x2869 =
PG_writeback|PG_private|PG_active|PG_lru|PG_uptodate|PG_locked

The zero-length request is created at nfs_create_request(), where an added
WARN_ON_ONCE(req->wb_bytes == 0) fires with this stack:
nfs_create_request
nfs_setup_write_request
nfs_updatepage
nfs_write_end
generic_perform_write
generic_file_buffered_write
__generic_file_aio_write
generic_file_aio_write
nfs_file_write
do_sync_write
vfs_write
SyS_write

with count=0 coming from generic_perform_write (where else?).
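
The instrumentation behind that stack is nothing more than a one-line debug
patch, roughly (placement approximate; nfs_create_request() is in
fs/nfs/pagelist.c):

--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ nfs_create_request()
 	req->wb_bytes   = count;
+	/* debug: catch zero-length requests at creation time */
+	WARN_ON_ONCE(req->wb_bytes == 0);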

The customer says that RHEL6 was totally OK, and there are several reports
of other people hitting the same bug on RHEL7:
https://gist.github.com/Millnert/ecc10d8cc79c81b55d7f
https://bugs.centos.org/view.php?id=9284

Alexey


2016-07-11 13:28:24

by Trond Myklebust

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)


> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>
> We have a customer who was able to reliably reproduce the following hang:
> (hang itself is rare but there are many machines, so it is not rare)
>
> INFO: task ascp:66692 blocked for more than 120 seconds.
>
>> bt 66692
> PID: 66692 TASK: ffff883f124ba280 CPU: 5 COMMAND: "ascp"
>
> __schedule
> schedule
> schedule_timeout
> io_schedule_timeout
> io_schedule
> nfs_wait_bit_uninterruptible
> __wait_on_bit
> out_of_line_wait_on_bit
> nfs_wait_on_request
> nfs_try_to_update_request
> nfs_setup_write_request
> nfs_writepage_setup
> nfs_updatepage
> nfs_write_end
> generic_file_buffered_write
> __generic_file_aio_write
> generic_file_aio_write
> nfs_file_write
> do_sync_write
> vfs_write
> sys_write
> system_call_fastpath
>
> ascp is Aspera secure copy program. It is multithreaded.
> When the hang happens, 5 threads are in S-state and 1 on D-state.
>
> The workload is copying ~100+ GB files over NFS3 with 10GiB ethernet
> with the following mount options:
> rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=X.X.X.X,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=X.X.X.X
>
> We got coredump (2 actually) and observations are below:
>
> 1) one of the inode's pagecache pages has page->private available:
> struct nfs_page {
> wb_list = {
> next = 0xffff885623ac4f80,
> prev = 0xffff885623ac4f80
> },
> wb_page = 0xffffea01218c2600,
> wb_context = 0xffff887f2265de00,
> wb_lock_context = 0xffff887f2265de00,
> wb_index = 2649328,
> wb_offset = 0,
> wb_pgbase = 0,
> wb_bytes = 0,
> wb_kref = {
> refcount = {
> counter = 3
> }
> },
> wb_flags = 19,
> wb_verf = {
> data = "\000\000\000\000\000\000\000"
> },
> wb_this_page = 0xffff885623ac4f80,
> wb_head = 0xffff885623ac4f80
> }
>
> ->wb_list is always empty,
> ->wb_bytes is always 0 (!).
> ->wb_kref is always 3.
> ->wb_flags = PG_BUSY|PG_MAPPED|PG_INODE_REF,
>
> page->flags = 0x2869 =
> PG_writeback|PG_private|PG_active|PF_lru|PG_uptodate|PG_locked
>
> The zero-length request is created at nfs_create_request():
> WARN_ON_ONCE(req->wb_bytes == 0)
> nfs_create_request
> nfs_setup_write_request
> nfs_updatepage
> nfs_write_end
> generic_perform_write
> generic_file_buffered_write
> __generic_file_aio_write
> generic_file_aio_write
> nfs_file_write
> do_sync_write
> vfs_write
> SyS_write
>
> with count=0 coming from generic_perform_write (where else?).
>
> Customer is saying that RHEL6 was totally OK, and there are several reports
> of other people hitting same bug with RHEL7:
> https://gist.github.com/Millnert/ecc10d8cc79c81b55d7f
> https://bugs.centos.org/view.php?id=9284
>

Why is this being reported here and not to Red Hat? Is the bug reproducible on the upstream kernel?

Cheers
Trond


2016-07-11 13:47:43

by Alexey Dobriyan

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)

On Mon, Jul 11, 2016 at 4:28 PM, Trond Myklebust
<[email protected]> wrote:
>
>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>>
>> We have a customer who was able to reliably reproduce the following hang:
>> (hang itself is rare but there are many machines, so it is not rare)
>>
>> INFO: task ascp:66692 blocked for more than 120 seconds.
>
> Why is this being reported here and not to Red Hat? Is the bug reproducible on the upstream kernel?

It is not a report per se, more of a heads-up for other folks like CentOS
and other rebuilders.
I checked every NFS commit since 3.10; there seems to be nothing fixing it.
As for testing mainline, I don't know, maybe we can arrange that.
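
(That is, eyeballing the output of something like

    git log --oneline v3.10.. -- fs/nfs

commit by commit.)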

Alexey

2016-07-11 13:48:33

by Weston Andros Adamson

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)


> On Jul 11, 2016, at 9:28 AM, Trond Myklebust <[email protected]> wrote:
>
>>
>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>>
>> We have a customer who was able to reliably reproduce the following hang:
>> (hang itself is rare but there are many machines, so it is not rare)
>>
>> INFO: task ascp:66692 blocked for more than 120 seconds.
>>
>>> bt 66692
>> PID: 66692 TASK: ffff883f124ba280 CPU: 5 COMMAND: "ascp"
>>
>> __schedule
>> schedule
>> schedule_timeout
>> io_schedule_timeout
>> io_schedule
>> nfs_wait_bit_uninterruptible
>> __wait_on_bit
>> out_of_line_wait_on_bit
>> nfs_wait_on_request
>> nfs_try_to_update_request
>> nfs_setup_write_request
>> nfs_writepage_setup
>> nfs_updatepage
>> nfs_write_end
>> generic_file_buffered_write
>> __generic_file_aio_write
>> generic_file_aio_write
>> nfs_file_write
>> do_sync_write
>> vfs_write
>> sys_write
>> system_call_fastpath
>>
>> ascp is Aspera secure copy program. It is multithreaded.
>> When the hang happens, 5 threads are in S-state and 1 on D-state.
>>
>> The workload is copying ~100+ GB files over NFS3 with 10GiB ethernet
>> with the following mount options:
>> rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=X.X.X.X,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=X.X.X.X
>>
>> We got coredump (2 actually) and observations are below:
>>
>> 1) one of the inode's pagecache pages has page->private available:
>> struct nfs_page {
>> wb_list = {
>> next = 0xffff885623ac4f80,
>> prev = 0xffff885623ac4f80
>> },
>> wb_page = 0xffffea01218c2600,
>> wb_context = 0xffff887f2265de00,
>> wb_lock_context = 0xffff887f2265de00,
>> wb_index = 2649328,
>> wb_offset = 0,
>> wb_pgbase = 0,
>> wb_bytes = 0,
>> wb_kref = {
>> refcount = {
>> counter = 3
>> }
>> },
>> wb_flags = 19,
>> wb_verf = {
>> data = "\000\000\000\000\000\000\000"
>> },
>> wb_this_page = 0xffff885623ac4f80,
>> wb_head = 0xffff885623ac4f80
>> }
>>
>> ->wb_list is always empty,
>> ->wb_bytes is always 0 (!).
>> ->wb_kref is always 3.
>> ->wb_flags = PG_BUSY|PG_MAPPED|PG_INODE_REF,
>>
>> page->flags = 0x2869 =
>> PG_writeback|PG_private|PG_active|PF_lru|PG_uptodate|PG_locked
>>
>> The zero-length request is created at nfs_create_request():
>> WARN_ON_ONCE(req->wb_bytes == 0)
>> nfs_create_request
>> nfs_setup_write_request
>> nfs_updatepage
>> nfs_write_end
>> generic_perform_write
>> generic_file_buffered_write
>> __generic_file_aio_write
>> generic_file_aio_write
>> nfs_file_write
>> do_sync_write
>> vfs_write
>> SyS_write
>>
>> with count=0 coming from generic_perform_write (where else?).
>>
>> Customer is saying that RHEL6 was totally OK, and there are several reports
>> of other people hitting same bug with RHEL7:
>> https://gist.github.com/Millnert/ecc10d8cc79c81b55d7f
>> https://bugs.centos.org/view.php?id=9284
>>
>
> Why is this being reported here and not to Red Hat? Is the bug reproducible on the upstream kernel?
>
> Cheers
> Trond

Also, you say RHEL7, but which kernel?

-dros


2016-07-11 13:52:52

by Weston Andros Adamson

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)


> On Jul 11, 2016, at 9:47 AM, Alexey Dobriyan <[email protected]> wrote:
>
> On Mon, Jul 11, 2016 at 4:28 PM, Trond Myklebust
> <[email protected]> wrote:
>>
>>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>>>
>>> We have a customer who was able to reliably reproduce the following hang:
>>> (hang itself is rare but there are many machines, so it is not rare)
>>>
>>> INFO: task ascp:66692 blocked for more than 120 seconds.
>>
>> Why is this being reported here and not to Red Hat? Is the bug reproducible on the upstream kernel?
>
> It is not a report per se, more heads up for other folks like CentOS
> and other rebuilders.
> I checked every NFS commit since 3.10, there seems to be nothing fixing it.
> As for testing mainline, I don't know, maybe we can arrange that.
>
> Alexey
>

How have you checked every commit since 3.10 and not tested "mainline"?

I don't get how both can be true. Do you mean you manually inspected each
commit? I'm not sure the fix would be so obvious...

-dros


2016-07-11 13:57:37

by Alexey Dobriyan

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)

On Mon, Jul 11, 2016 at 4:48 PM, Weston Andros Adamson <[email protected]> wrote:
>
>> On Jul 11, 2016, at 9:28 AM, Trond Myklebust <[email protected]> wrote:
>>
>>>
>>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>>>
>>> We have a customer who was able to reliably reproduce the following hang:
>>> (hang itself is rare but there are many machines, so it is not rare)
>>>
>>> INFO: task ascp:66692 blocked for more than 120 seconds.

> Also, you say RHEL7, but which kernel?

3.10.0-229.14.1.el7.x86_64
3.10.0-327.18.2.el7.x86_64

2016-07-11 14:15:56

by Alexey Dobriyan

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)

On Mon, Jul 11, 2016 at 4:52 PM, Weston Andros Adamson <[email protected]> wrote:
>
>> On Jul 11, 2016, at 9:47 AM, Alexey Dobriyan <[email protected]> wrote:
>>
>> On Mon, Jul 11, 2016 at 4:28 PM, Trond Myklebust
>> <[email protected]> wrote:
>>>
>>>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>>>>
>>>> We have a customer who was able to reliably reproduce the following hang:
>>>> (hang itself is rare but there are many machines, so it is not rare)
>>>>
>>>> INFO: task ascp:66692 blocked for more than 120 seconds.
>>>
>>> Why is this being reported here and not to Red Hat? Is the bug reproducible on the upstream kernel?
>>
>> It is not a report per se, more heads up for other folks like CentOS
>> and other rebuilders.
>> I checked every NFS commit since 3.10, there seems to be nothing fixing it.
>> As for testing mainline, I don't know, maybe we can arrange that.
>>
>> Alexey
>>
>
> How have you checked every commit since 3.10 and not tested "mainline"?
>
> I don't get how both can be true. Do you mean you manually inspected each
> commit? I'm not sure the fix would be so obvious...

I've looked at the commits. If the bug was fixed accidentally, then sorry
for the noise.

2016-07-11 15:44:30

by Weston Andros Adamson

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)


> On Jul 11, 2016, at 10:15 AM, Alexey Dobriyan <[email protected]> wrote:
>
> On Mon, Jul 11, 2016 at 4:52 PM, Weston Andros Adamson <[email protected]> wrote:
>>
>>> On Jul 11, 2016, at 9:47 AM, Alexey Dobriyan <[email protected]> wrote:
>>>
>>> On Mon, Jul 11, 2016 at 4:28 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>>>
>>>>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>>>>>
>>>>> We have a customer who was able to reliably reproduce the following hang:
>>>>> (hang itself is rare but there are many machines, so it is not rare)
>>>>>
>>>>> INFO: task ascp:66692 blocked for more than 120 seconds.
>>>>
>>>> Why is this being reported here and not to Red Hat? Is the bug reproducible on the upstream kernel?
>>>
>>> It is not a report per se, more heads up for other folks like CentOS
>>> and other rebuilders.
>>> I checked every NFS commit since 3.10, there seems to be nothing fixing it.
>>> As for testing mainline, I don't know, maybe we can arrange that.
>>>
>>> Alexey
>>>
>>
>> How have you checked every commit since 3.10 and not tested "mainline"?
>>
>> I don't get how both can be true. Do you mean you manually inspected each
>> commit? I'm not sure the fix would be so obvious...
>
> I've looked at commits. If the bug was fixed accidentally, then sorry
> for the noise.

I'm not saying it was fixed accidentally. The NFS client is a complex system, and
bugs like this may manifest in many different ways. I don't see how you
can just browse the commits to see if something like this was fixed. The commit message
could be something as simple as "handle error better" or "fix off-by-one calculation" --
how would you know whether or not that's the fix for your issue?

A few data points that would be helpful:
- is this reproducible on upstream kernels?
- more info on the workload: is it buffered I/O? Direct I/O? What sizes are the writes?
- is this reproducible with TCP transport instead of UDP? I ask this because the
vast majority of NFS client testing is done over TCP, as there is really no reason
to use UDP... Probably not the issue, but may be worth checking out.

-dros


2016-07-18 11:41:29

by Benjamin Coddington

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)

On 11 Jul 2016, at 9:28, Trond Myklebust wrote:

>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]>
>> wrote:
>>
>> We have a customer who was able to reliably reproduce the following
>> hang:
>> (hang itself is rare but there are many machines, so it is not rare)
>>
>> INFO: task ascp:66692 blocked for more than 120 seconds.
>>
>>> bt 66692
>> PID: 66692 TASK: ffff883f124ba280 CPU: 5 COMMAND: "ascp"
>>
>> __schedule
>> schedule
>> schedule_timeout
>> io_schedule_timeout
>> io_schedule
>> nfs_wait_bit_uninterruptible
>> __wait_on_bit
>> out_of_line_wait_on_bit
>> nfs_wait_on_request
>> nfs_try_to_update_request
>> nfs_setup_write_request
>> nfs_writepage_setup
>> nfs_updatepage
>> nfs_write_end
>> generic_file_buffered_write
>> __generic_file_aio_write
>> generic_file_aio_write
>> nfs_file_write
>> do_sync_write
>> vfs_write
>> sys_write
>> system_call_fastpath
>>
>> ascp is Aspera secure copy program. It is multithreaded.
>> When the hang happens, 5 threads are in S-state and 1 on D-state.
>>
>> The workload is copying ~100+ GB files over NFS3 with 10GiB ethernet
>> with the following mount options:
>> rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=X.X.X.X,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=X.X.X.X
>>
>> We got coredump (2 actually) and observations are below:
>>
>> 1) one of the inode's pagecache pages has page->private available:
>> struct nfs_page {
>> wb_list = {
>> next = 0xffff885623ac4f80,
>> prev = 0xffff885623ac4f80
>> },
>> wb_page = 0xffffea01218c2600,
>> wb_context = 0xffff887f2265de00,
>> wb_lock_context = 0xffff887f2265de00,
>> wb_index = 2649328,
>> wb_offset = 0,
>> wb_pgbase = 0,
>> wb_bytes = 0,
>> wb_kref = {
>> refcount = {
>> counter = 3
>> }
>> },
>> wb_flags = 19,
>> wb_verf = {
>> data = "\000\000\000\000\000\000\000"
>> },
>> wb_this_page = 0xffff885623ac4f80,
>> wb_head = 0xffff885623ac4f80
>> }
>>
>> ->wb_list is always empty,
>> ->wb_bytes is always 0 (!).
>> ->wb_kref is always 3.
>> ->wb_flags = PG_BUSY|PG_MAPPED|PG_INODE_REF,
>>
>> page->flags = 0x2869 =
>> PG_writeback|PG_private|PG_active|PF_lru|PG_uptodate|PG_locked
>>
>> The zero-length request is created at nfs_create_request():
>> WARN_ON_ONCE(req->wb_bytes == 0)
>> nfs_create_request
>> nfs_setup_write_request
>> nfs_updatepage
>> nfs_write_end
>> generic_perform_write
>> generic_file_buffered_write
>> __generic_file_aio_write
>> generic_file_aio_write
>> nfs_file_write
>> do_sync_write
>> vfs_write
>> SyS_write
>>
>> with count=0 coming from generic_perform_write (where else?).
>>
>> Customer is saying that RHEL6 was totally OK, and there are several
>> reports
>> of other people hitting same bug with RHEL7:
>> https://gist.github.com/Millnert/ecc10d8cc79c81b55d7f
>> https://bugs.centos.org/view.php?id=9284
>>
>
> Why is this being reported here and not to Red Hat? Is the bug
> reproducible on the upstream kernel?

There's a RHEL BZ open for it:
https://bugzilla.redhat.com/show_bug.cgi?id=132463

Upstream has the problem, too. The pgio layer doesn't expect zero-length
requests, so an nfs_page can get locked but never added to a pageio
descriptor.

To create such a problem, writeback has to happen just after nfs_write_end()
has created an nfs_page with wb_bytes = 0. This can happen if
iov_iter_copy_from_user_atomic() fails to copy pages in
generic_perform_write(). Normally, we'd just go around the loop and try the
copy again, but if writeback sneaks in then the request gets locked but is
never added to a pgio descriptor, due to the way we use conditions on zero
bytes to move requests around.
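
For context, the loop in question looks roughly like this (abridged from the
3.x-era generic_perform_write() in mm/filemap.c; error handling and most
details elided):

	do {
		offset = pos & (PAGE_CACHE_SIZE - 1);
		bytes = min(PAGE_CACHE_SIZE - offset, iov_iter_count(i));
again:
		status = a_ops->write_begin(file, mapping, pos, bytes, flags,
					    &page, &fsdata);
		pagefault_disable();
		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
		pagefault_enable();

		/* ->write_end() runs even when copied == 0; for NFS that is
		 * nfs_write_end() -> nfs_updatepage(), which sets up an
		 * nfs_page with wb_bytes == 0 */
		status = a_ops->write_end(file, mapping, pos, bytes, copied,
					  page, fsdata);
		iov_iter_advance(i, copied);
		if (unlikely(copied == 0))
			goto again;	/* retry the copy -- the race window */
		pos += copied;
	} while (iov_iter_count(i));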

I see two potential ways to fix it:
- Just skip creation of the request if the length is zero. We're
bound to come around and create it again anyway.

- Allow zero-length requests, but fix up __nfs_pageio_add_request,
nfs_generic_pg_test, and the other places that use conditions on zero.

I think the first option, handling the case early, is the simpler fix and the
best approach, since: what would we do with a zero-length request? Not
creating them helps us be conservative in what we send out to the
nfs_pageio_ops.

I don't understand what conditions can cause iov_iter_copy_from_user_atomic()
to fail, since pagefaults are disabled. Can anyone enlighten me?

To reproduce this upstream, I've simulated iov_iter_copy_from_user_atomic()
returning zero once, then inserted a small delay after nfs_write_end() to
allow me to sneak in a sync.
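
Concretely, the simulation is a debug hack along these lines (a sketch only;
fail_copy_once is a hypothetical one-shot knob, e.g. a module parameter):

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ generic_perform_write()
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
+		if (fail_copy_once) {			/* hypothetical knob */
+			fail_copy_once = false;
+			copied = 0;	/* pretend the atomic copy failed */
+		}
 		pagefault_enable();
@@
 		status = a_ops->write_end(file, mapping, pos, bytes, copied,
 					  page, fsdata);
+		if (unlikely(copied == 0))
+			msleep(100);	/* widen the window so a concurrent
+					 * sync can lock the 0-byte request */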

Any advice on how to proceed? Otherwise, I'll send a patch implementing the
first approach. If anyone has trouble with that RHEL BZ, send me your
bugzilla email and I'll give you access.

Ben

2016-07-18 12:35:46

by Trond Myklebust

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)


> On Jul 18, 2016, at 07:43, Benjamin Coddington <[email protected]> wrote:
>
> On 11 Jul 2016, at 9:28, Trond Myklebust wrote:
>
>>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>>>
>>> We have a customer who was able to reliably reproduce the following hang:
>>> (hang itself is rare but there are many machines, so it is not rare)
>>>
>>> INFO: task ascp:66692 blocked for more than 120 seconds.
>>>
>>>> bt 66692
>>> PID: 66692 TASK: ffff883f124ba280 CPU: 5 COMMAND: "ascp"
>>>
>>> __schedule
>>> schedule
>>> schedule_timeout
>>> io_schedule_timeout
>>> io_schedule
>>> nfs_wait_bit_uninterruptible
>>> __wait_on_bit
>>> out_of_line_wait_on_bit
>>> nfs_wait_on_request
>>> nfs_try_to_update_request
>>> nfs_setup_write_request
>>> nfs_writepage_setup
>>> nfs_updatepage
>>> nfs_write_end
>>> generic_file_buffered_write
>>> __generic_file_aio_write
>>> generic_file_aio_write
>>> nfs_file_write
>>> do_sync_write
>>> vfs_write
>>> sys_write
>>> system_call_fastpath
>>>
>>> ascp is Aspera secure copy program. It is multithreaded.
>>> When the hang happens, 5 threads are in S-state and 1 on D-state.
>>>
>>> The workload is copying ~100+ GB files over NFS3 with 10GiB ethernet
>>> with the following mount options:
>>> rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=X.X.X.X,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=X.X.X.X
>>>
>>> We got coredump (2 actually) and observations are below:
>>>
>>> 1) one of the inode's pagecache pages has page->private available:
>>> struct nfs_page {
>>> wb_list = {
>>> next = 0xffff885623ac4f80,
>>> prev = 0xffff885623ac4f80
>>> },
>>> wb_page = 0xffffea01218c2600,
>>> wb_context = 0xffff887f2265de00,
>>> wb_lock_context = 0xffff887f2265de00,
>>> wb_index = 2649328,
>>> wb_offset = 0,
>>> wb_pgbase = 0,
>>> wb_bytes = 0,
>>> wb_kref = {
>>> refcount = {
>>> counter = 3
>>> }
>>> },
>>> wb_flags = 19,
>>> wb_verf = {
>>> data = "\000\000\000\000\000\000\000"
>>> },
>>> wb_this_page = 0xffff885623ac4f80,
>>> wb_head = 0xffff885623ac4f80
>>> }
>>>
>>> ->wb_list is always empty,
>>> ->wb_bytes is always 0 (!).
>>> ->wb_kref is always 3.
>>> ->wb_flags = PG_BUSY|PG_MAPPED|PG_INODE_REF,
>>>
>>> page->flags = 0x2869 =
>>> PG_writeback|PG_private|PG_active|PF_lru|PG_uptodate|PG_locked
>>>
>>> The zero-length request is created at nfs_create_request():
>>> WARN_ON_ONCE(req->wb_bytes == 0)
>>> nfs_create_request
>>> nfs_setup_write_request
>>> nfs_updatepage
>>> nfs_write_end
>>> generic_perform_write
>>> generic_file_buffered_write
>>> __generic_file_aio_write
>>> generic_file_aio_write
>>> nfs_file_write
>>> do_sync_write
>>> vfs_write
>>> SyS_write
>>>
>>> with count=0 coming from generic_perform_write (where else?).
>>>
>>> Customer is saying that RHEL6 was totally OK, and there are several reports
>>> of other people hitting same bug with RHEL7:
>>> https://gist.github.com/Millnert/ecc10d8cc79c81b55d7f
>>> https://bugs.centos.org/view.php?id=9284
>>>
>>
>> Why is this being reported here and not to Red Hat? Is the bug reproducible on the upstream kernel?
>
> There's a RHEL BZ open for it:
> https://bugzilla.redhat.com/show_bug.cgi?id=132463
>
> Upstream has the problem, too. The pgio layer doesn't expect zero-length
> requests, so an nfs_page can get locked but never added to a pageio
> descriptor.
>
> To create such a problem, writeback has to happen just after nfs_write_end()
> has created a nfs_page with wb_bytes = 0. This can happen if
> iov_iter_copy_from_user_atomic() fails to copy pages in
> generic_perform_write(). Normally, we'd just go around the loop and try the
> copy again, but if writeback sneaks in then the request gets locked but
> never added to a pgio descriptor due to the way we use conditions on zero
> bytes to move requests around.
>
> I see two potential ways to fix:
> - Just skip the creation of the request if the length is zero. We're
> bound to just come around and created it again.
>
> - Allow zero-length requests, but fix up __nfs_pageio_add_request,
> nfs_generic_pg_test, and others that use conditions on zero.
>
> I think the first option to handle the case early is the simpler fix, and
> best approach, since: what do we do with a zero-length request? Not
> creating them helps us be conservative in what we send out to the
> nfs_pageio_ops.
>
> I don't understand what conditions can cause iov_iter_copy_from_user_atomic
> to fail since pagefaults are disabled. Can anyone enlighten me?
>
> To reproduce this upstream, I've simulated iov_iter_copy_from_user_atomic()
> returning zero once, then inserted a small delay after nfs_write_end to
> allow me to sneak in a sync.
>
> Any advice on how to proceed? I'll send a patch to fix the first way
> otherwise. If anyone has trouble with that RHEL BZ, send me your bugzilla
> email and I'll give you access.

I think adding a check to nfs_updatepage() to bail out if ‘count’ is zero would seem to be the right thing to do.
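
I.e. something like this minimal sketch (assuming the check sits near the top
of nfs_updatepage() in fs/nfs/write.c; the final patch may differ):

--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ nfs_updatepage()
+	/* no point creating (and then deadlocking on) a 0-byte request */
+	if (!count)
+		goto out;
+
 	if (nfs_can_extend_write(file, page, inode)) {
 		count = max(count + offset, nfs_page_length(page));
 		offset = 0;
 	}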

Thanks
Trond

2016-07-18 14:32:54

by Alexey Dobriyan

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)

On Mon, Jul 18, 2016 at 3:35 PM, Trond Myklebust
<[email protected]> wrote:
>
>> On Jul 18, 2016, at 07:43, Benjamin Coddington <[email protected]> wrote:
>>
>> On 11 Jul 2016, at 9:28, Trond Myklebust wrote:
>>
>>>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>>>>
>>>> We have a customer who was able to reliably reproduce the following hang:
>>>> (hang itself is rare but there are many machines, so it is not rare)
>>>>
>>>> INFO: task ascp:66692 blocked for more than 120 seconds.
>>>>
>>>>> bt 66692
>>>> PID: 66692 TASK: ffff883f124ba280 CPU: 5 COMMAND: "ascp"
>>>>
>>>> __schedule
>>>> schedule
>>>> schedule_timeout
>>>> io_schedule_timeout
>>>> io_schedule
>>>> nfs_wait_bit_uninterruptible
>>>> __wait_on_bit
>>>> out_of_line_wait_on_bit
>>>> nfs_wait_on_request
>>>> nfs_try_to_update_request
>>>> nfs_setup_write_request
>>>> nfs_writepage_setup
>>>> nfs_updatepage
>>>> nfs_write_end
>>>> generic_file_buffered_write
>>>> __generic_file_aio_write
>>>> generic_file_aio_write
>>>> nfs_file_write
>>>> do_sync_write
>>>> vfs_write
>>>> sys_write
>>>> system_call_fastpath
>>>>
>>>> ascp is Aspera secure copy program. It is multithreaded.
>>>> When the hang happens, 5 threads are in S-state and 1 on D-state.
>>>>
>>>> The workload is copying ~100+ GB files over NFS3 with 10GiB ethernet
>>>> with the following mount options:
>>>> rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=X.X.X.X,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=X.X.X.X
>>>>
>>>> We got coredump (2 actually) and observations are below:
>>>>
>>>> 1) one of the inode's pagecache pages has page->private available:
>>>> struct nfs_page {
>>>> wb_list = {
>>>> next = 0xffff885623ac4f80,
>>>> prev = 0xffff885623ac4f80
>>>> },
>>>> wb_page = 0xffffea01218c2600,
>>>> wb_context = 0xffff887f2265de00,
>>>> wb_lock_context = 0xffff887f2265de00,
>>>> wb_index = 2649328,
>>>> wb_offset = 0,
>>>> wb_pgbase = 0,
>>>> wb_bytes = 0,
>>>> wb_kref = {
>>>> refcount = {
>>>> counter = 3
>>>> }
>>>> },
>>>> wb_flags = 19,
>>>> wb_verf = {
>>>> data = "\000\000\000\000\000\000\000"
>>>> },
>>>> wb_this_page = 0xffff885623ac4f80,
>>>> wb_head = 0xffff885623ac4f80
>>>> }
>>>>
>>>> ->wb_list is always empty,
>>>> ->wb_bytes is always 0 (!).
>>>> ->wb_kref is always 3.
>>>> ->wb_flags = PG_BUSY|PG_MAPPED|PG_INODE_REF,
>>>>
>>>> page->flags = 0x2869 =
>>>> PG_writeback|PG_private|PG_active|PF_lru|PG_uptodate|PG_locked
>>>>
>>>> The zero-length request is created at nfs_create_request():
>>>> WARN_ON_ONCE(req->wb_bytes == 0)
>>>> nfs_create_request
>>>> nfs_setup_write_request
>>>> nfs_updatepage
>>>> nfs_write_end
>>>> generic_perform_write
>>>> generic_file_buffered_write
>>>> __generic_file_aio_write
>>>> generic_file_aio_write
>>>> nfs_file_write
>>>> do_sync_write
>>>> vfs_write
>>>> SyS_write
>>>>
>>>> with count=0 coming from generic_perform_write (where else?).
>>>>
>>>> Customer is saying that RHEL6 was totally OK, and there are several reports
>>>> of other people hitting same bug with RHEL7:
>>>> https://gist.github.com/Millnert/ecc10d8cc79c81b55d7f
>>>> https://bugs.centos.org/view.php?id=9284
>>>>
>>>
>>> Why is this being reported here and not to Red Hat? Is the bug reproducible on the upstream kernel?
>>
>> There's a RHEL BZ open for it:
>> https://bugzilla.redhat.com/show_bug.cgi?id=132463
>>
>> Upstream has the problem, too. The pgio layer doesn't expect zero-length
>> requests, so an nfs_page can get locked but never added to a pageio
>> descriptor.
>>
>> To create such a problem, writeback has to happen just after nfs_write_end()
>> has created a nfs_page with wb_bytes = 0. This can happen if
>> iov_iter_copy_from_user_atomic() fails to copy pages in
>> generic_perform_write(). Normally, we'd just go around the loop and try the
>> copy again, but if writeback sneaks in then the request gets locked but
>> never added to a pgio descriptor due to the way we use conditions on zero
>> bytes to move requests around.
>>
>> I see two potential ways to fix:
>> - Just skip the creation of the request if the length is zero. We're
>> bound to just come around and created it again.
>>
>> - Allow zero-length requests, but fix up __nfs_pageio_add_request,
>> nfs_generic_pg_test, and others that use conditions on zero.
>>
>> I think the first option to handle the case early is the simpler fix, and
>> best approach, since: what do we do with a zero-length request? Not
>> creating them helps us be conservative in what we send out to the
>> nfs_pageio_ops.
>>
>> I don't understand what conditions can cause iov_iter_copy_from_user_atomic
>> to fail since pagefaults are disabled. Can anyone enlighten me?
>>
>> To reproduce this upstream, I've simulated iov_iter_copy_from_user_atomic()
>> returning zero once, then inserted a small delay after nfs_write_end to
>> allow me to sneak in a sync.
>>
>> Any advice on how to proceed? I'll send a patch to fix the first way
>> otherwise. If anyone has trouble with that RHEL BZ, send me your bugzilla
>> email and I'll give you access.
>
> I think adding a check to nfs_updatepage() to bail out if ‘count’ is zero would seem to be the right thing to do.

This is exactly what we did as a workaround for the customer,
so I can already confirm it helps. :^)

2016-07-18 14:36:13

by Weston Andros Adamson

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)


> On Jul 18, 2016, at 10:32 AM, Alexey Dobriyan <[email protected]> wrote:
>
> On Mon, Jul 18, 2016 at 3:35 PM, Trond Myklebust
> <[email protected]> wrote:
>>
>>> On Jul 18, 2016, at 07:43, Benjamin Coddington <[email protected]> wrote:
>>>
>>> On 11 Jul 2016, at 9:28, Trond Myklebust wrote:
>>>
>>>>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]> wrote:
>>>>>
>>>>> We have a customer who was able to reliably reproduce the following hang:
>>>>> (hang itself is rare but there are many machines, so it is not rare)
>>>>>
>>>>> INFO: task ascp:66692 blocked for more than 120 seconds.
>>>>>
>>>>>> bt 66692
>>>>> PID: 66692 TASK: ffff883f124ba280 CPU: 5 COMMAND: "ascp"
>>>>>
>>>>> __schedule
>>>>> schedule
>>>>> schedule_timeout
>>>>> io_schedule_timeout
>>>>> io_schedule
>>>>> nfs_wait_bit_uninterruptible
>>>>> __wait_on_bit
>>>>> out_of_line_wait_on_bit
>>>>> nfs_wait_on_request
>>>>> nfs_try_to_update_request
>>>>> nfs_setup_write_request
>>>>> nfs_writepage_setup
>>>>> nfs_updatepage
>>>>> nfs_write_end
>>>>> generic_file_buffered_write
>>>>> __generic_file_aio_write
>>>>> generic_file_aio_write
>>>>> nfs_file_write
>>>>> do_sync_write
>>>>> vfs_write
>>>>> sys_write
>>>>> system_call_fastpath
>>>>>
>>>>> ascp is Aspera secure copy program. It is multithreaded.
>>>>> When the hang happens, 5 threads are in S-state and 1 on D-state.
>>>>>
>>>>> The workload is copying ~100+ GB files over NFS3 with 10GiB ethernet
>>>>> with the following mount options:
>>>>> rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=X.X.X.X,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=X.X.X.X
>>>>>
>>>>> We got coredump (2 actually) and observations are below:
>>>>>
>>>>> 1) one of the inode's pagecache pages has page->private available:
>>>>> struct nfs_page {
>>>>> wb_list = {
>>>>> next = 0xffff885623ac4f80,
>>>>> prev = 0xffff885623ac4f80
>>>>> },
>>>>> wb_page = 0xffffea01218c2600,
>>>>> wb_context = 0xffff887f2265de00,
>>>>> wb_lock_context = 0xffff887f2265de00,
>>>>> wb_index = 2649328,
>>>>> wb_offset = 0,
>>>>> wb_pgbase = 0,
>>>>> wb_bytes = 0,
>>>>> wb_kref = {
>>>>> refcount = {
>>>>> counter = 3
>>>>> }
>>>>> },
>>>>> wb_flags = 19,
>>>>> wb_verf = {
>>>>> data = "\000\000\000\000\000\000\000"
>>>>> },
>>>>> wb_this_page = 0xffff885623ac4f80,
>>>>> wb_head = 0xffff885623ac4f80
>>>>> }
>>>>>
>>>>> ->wb_list is always empty,
>>>>> ->wb_bytes is always 0 (!).
>>>>> ->wb_kref is always 3.
>>>>> ->wb_flags = PG_BUSY|PG_MAPPED|PG_INODE_REF,
>>>>>
>>>>> page->flags = 0x2869 =
>>>>> PG_writeback|PG_private|PG_active|PF_lru|PG_uptodate|PG_locked
>>>>>
>>>>> The zero-length request is created at nfs_create_request():
>>>>> WARN_ON_ONCE(req->wb_bytes == 0)
>>>>> nfs_create_request
>>>>> nfs_setup_write_request
>>>>> nfs_updatepage
>>>>> nfs_write_end
>>>>> generic_perform_write
>>>>> generic_file_buffered_write
>>>>> __generic_file_aio_write
>>>>> generic_file_aio_write
>>>>> nfs_file_write
>>>>> do_sync_write
>>>>> vfs_write
>>>>> SyS_write
>>>>>
>>>>> with count=0 coming from generic_perform_write (where else?).
>>>>>
>>>>> Customer is saying that RHEL6 was totally OK, and there are several reports
>>>>> of other people hitting same bug with RHEL7:
>>>>> https://gist.github.com/Millnert/ecc10d8cc79c81b55d7f
>>>>> https://bugs.centos.org/view.php?id=9284
>>>>>
>>>>
>>>> Why is this being reported here and not to Red Hat? Is the bug reproducible on the upstream kernel?
>>>
>>> There's a RHEL BZ open for it:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=132463
>>>
>>> Upstream has the problem, too. The pgio layer doesn't expect zero-length
>>> requests, so an nfs_page can get locked but never added to a pageio
>>> descriptor.
>>>
>>> To create such a problem, writeback has to happen just after nfs_write_end()
>>> has created a nfs_page with wb_bytes = 0. This can happen if
>>> iov_iter_copy_from_user_atomic() fails to copy pages in
>>> generic_perform_write(). Normally, we'd just go around the loop and try the
>>> copy again, but if writeback sneaks in then the request gets locked but
>>> never added to a pgio descriptor due to the way we use conditions on zero
>>> bytes to move requests around.
>>>
>>> I see two potential ways to fix:
>>> - Just skip the creation of the request if the length is zero. We're
>>> bound to just come around and created it again.
>>>
>>> - Allow zero-length requests, but fix up __nfs_pageio_add_request,
>>> nfs_generic_pg_test, and others that use conditions on zero.
>>>
>>> I think the first option to handle the case early is the simpler fix, and
>>> best approach, since: what do we do with a zero-length request? Not
>>> creating them helps us be conservative in what we send out to the
>>> nfs_pageio_ops.
>>>
>>> I don't understand what conditions can cause iov_iter_copy_from_user_atomic
>>> to fail since pagefaults are disabled. Can anyone enlighten me?
>>>
>>> To reproduce this upstream, I've simulated iov_iter_copy_from_user_atomic()
>>> returning zero once, then inserted a small delay after nfs_write_end to
>>> allow me to sneak in a sync.
>>>
>>> Any advice on how to proceed? I'll send a patch to fix the first way
>>> otherwise. If anyone has trouble with that RHEL BZ, send me your bugzilla
>>> email and I'll give you access.
>>
>> I think adding a check to nfs_updatepage() to bail out if ‘count’ is zero would seem to be the right thing to do.
>
> This is exactly what we did as a workaround for customer,
> so I can already confirm it helps. :^)

Makes sense to me too!

Who wants to take the lead and post a patch? Ben?

Thanks!

-dros

2016-07-18 14:38:30

by Benjamin Coddington

Subject: Re: RHEL7 hang (and probably mainline) while copying big files because of subrequests merging(?)

On 18 Jul 2016, at 10:36, Weston Andros Adamson wrote:

>> On Jul 18, 2016, at 10:32 AM, Alexey Dobriyan <[email protected]>
>> wrote:
>>
>> On Mon, Jul 18, 2016 at 3:35 PM, Trond Myklebust
>> <[email protected]> wrote:
>>>
>>>> On Jul 18, 2016, at 07:43, Benjamin Coddington
>>>> <[email protected]> wrote:
>>>>
>>>> On 11 Jul 2016, at 9:28, Trond Myklebust wrote:
>>>>
>>>>>> On Jul 11, 2016, at 08:59, Alexey Dobriyan <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> We have a customer who was able to reliably reproduce the
>>>>>> following hang:
>>>>>> (hang itself is rare but there are many machines, so it is not
>>>>>> rare)
>>>>>>
>>>>>> INFO: task ascp:66692 blocked for more than 120 seconds.
>>>>>>
>>>>>>> bt 66692
>>>>>> PID: 66692 TASK: ffff883f124ba280 CPU: 5 COMMAND: "ascp"
>>>>>>
>>>>>> __schedule
>>>>>> schedule
>>>>>> schedule_timeout
>>>>>> io_schedule_timeout
>>>>>> io_schedule
>>>>>> nfs_wait_bit_uninterruptible
>>>>>> __wait_on_bit
>>>>>> out_of_line_wait_on_bit
>>>>>> nfs_wait_on_request
>>>>>> nfs_try_to_update_request
>>>>>> nfs_setup_write_request
>>>>>> nfs_writepage_setup
>>>>>> nfs_updatepage
>>>>>> nfs_write_end
>>>>>> generic_file_buffered_write
>>>>>> __generic_file_aio_write
>>>>>> generic_file_aio_write
>>>>>> nfs_file_write
>>>>>> do_sync_write
>>>>>> vfs_write
>>>>>> sys_write
>>>>>> system_call_fastpath
>>>>>>
>>>>>> ascp is Aspera secure copy program. It is multithreaded.
>>>>>> When the hang happens, 5 threads are in S-state and 1 on D-state.
>>>>>>
>>>>>> The workload is copying ~100+ GB files over NFS3 with 10GiB
>>>>>> ethernet
>>>>>> with the following mount options:
>>>>>> rw,relatime,vers=3,rsize=131072,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=X.X.X.X,mountvers=3,mountport=300,mountproto=udp,local_lock=none,addr=X.X.X.X
>>>>>>
>>>>>> We got coredump (2 actually) and observations are below:
>>>>>>
>>>>>> 1) one of the inode's pagecache pages has page->private
>>>>>> available:
>>>>>> struct nfs_page {
>>>>>> wb_list = {
>>>>>> next = 0xffff885623ac4f80,
>>>>>> prev = 0xffff885623ac4f80
>>>>>> },
>>>>>> wb_page = 0xffffea01218c2600,
>>>>>> wb_context = 0xffff887f2265de00,
>>>>>> wb_lock_context = 0xffff887f2265de00,
>>>>>> wb_index = 2649328,
>>>>>> wb_offset = 0,
>>>>>> wb_pgbase = 0,
>>>>>> wb_bytes = 0,
>>>>>> wb_kref = {
>>>>>> refcount = {
>>>>>> counter = 3
>>>>>> }
>>>>>> },
>>>>>> wb_flags = 19,
>>>>>> wb_verf = {
>>>>>> data = "\000\000\000\000\000\000\000"
>>>>>> },
>>>>>> wb_this_page = 0xffff885623ac4f80,
>>>>>> wb_head = 0xffff885623ac4f80
>>>>>> }
>>>>>>
>>>>>> ->wb_list is always empty,
>>>>>> ->wb_bytes is always 0 (!).
>>>>>> ->wb_kref is always 3.
>>>>>> ->wb_flags = PG_BUSY|PG_MAPPED|PG_INODE_REF,
>>>>>>
>>>>>> page->flags = 0x2869 =
>>>>>> PG_writeback|PG_private|PG_active|PF_lru|PG_uptodate|PG_locked
>>>>>>
>>>>>> The zero-length request is created at nfs_create_request():
>>>>>> WARN_ON_ONCE(req->wb_bytes == 0)
>>>>>> nfs_create_request
>>>>>> nfs_setup_write_request
>>>>>> nfs_updatepage
>>>>>> nfs_write_end
>>>>>> generic_perform_write
>>>>>> generic_file_buffered_write
>>>>>> __generic_file_aio_write
>>>>>> generic_file_aio_write
>>>>>> nfs_file_write
>>>>>> do_sync_write
>>>>>> vfs_write
>>>>>> SyS_write
>>>>>>
>>>>>> with count=0 coming from generic_perform_write (where else?).
>>>>>>
>>>>>> Customer is saying that RHEL6 was totally OK, and there are
>>>>>> several reports
>>>>>> of other people hitting same bug with RHEL7:
>>>>>> https://gist.github.com/Millnert/ecc10d8cc79c81b55d7f
>>>>>> https://bugs.centos.org/view.php?id=9284
>>>>>>
>>>>>
>>>>> Why is this being reported here and not to Red Hat? Is the bug
>>>>> reproducible on the upstream kernel?
>>>>
>>>> There's a RHEL BZ open for it:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=132463
>>>>
>>>> Upstream has the problem, too. The pgio layer doesn't expect
>>>> zero-length
>>>> requests, so an nfs_page can get locked but never added to a pageio
>>>> descriptor.
>>>>
>>>> To create such a problem, writeback has to happen just after
>>>> nfs_write_end()
>>>> has created a nfs_page with wb_bytes = 0. This can happen if
>>>> iov_iter_copy_from_user_atomic() fails to copy pages in
>>>> generic_perform_write(). Normally, we'd just go around the loop
>>>> and try the
>>>> copy again, but if writeback sneaks in then the request gets locked
>>>> but
>>>> never added to a pgio descriptor due to the way we use conditions
>>>> on zero
>>>> bytes to move requests around.
>>>>
>>>> I see two potential ways to fix:
>>>> - Just skip the creation of the request if the length is zero.
>>>> We're
>>>> bound to just come around and created it again.
>>>>
>>>> - Allow zero-length requests, but fix up
>>>> __nfs_pageio_add_request,
>>>> nfs_generic_pg_test, and others that use conditions on zero.
>>>>
>>>> I think the first option to handle the case early is the simpler
>>>> fix, and
>>>> best approach, since: what do we do with a zero-length request?
>>>> Not
>>>> creating them helps us be conservative in what we send out to the
>>>> nfs_pageio_ops.
>>>>
>>>> I don't understand what conditions can cause
>>>> iov_iter_copy_from_user_atomic
>>>> to fail since pagefaults are disabled. Can anyone enlighten me?
>>>>
>>>> To reproduce this upstream, I've simulated
>>>> iov_iter_copy_from_user_atomic()
>>>> returning zero once, then inserted a small delay after
>>>> nfs_write_end to
>>>> allow me to sneak in a sync.
>>>>
>>>> Any advice on how to proceed? I'll send a patch to fix the first
>>>> way
>>>> otherwise. If anyone has trouble with that RHEL BZ, send me your
>>>> bugzilla
>>>> email and I'll give you access.
>>>
>>> I think adding a check to nfs_updatepage() to bail out if
>>> ‘count’ is zero would seem to be the right thing to do.
>>
>> This is exactly what we did as a workaround for customer,
>> so I can already confirm it helps. :^)
>
> Makes sense to me too!
>
> Who wants to take the lead and post a patch? Ben?

Sure, I'll send it. I was hunting around trying to figure out an
appropriate version for a stable tag. I think this became a problem in
v3.15 or so -- but there's a lot that moved around, so I am not sure.

Ben