2015-03-01 19:14:57

by Trond Myklebust

Subject: Weird TCP hang when doing loopback NFS (wireshark traces attached)

Hi,

When testing NFSv3 loopback mounts (client and server are on
the same IP address), I'm seeing a very reproducible hang in which the
client stops receiving data from the server. The TCP connection is still
marked as established, and the server appears to continue to receive and
send data; however, the client does not.

So far, I've reproduced it on both v4.0-rc1 and the Fedora v3.18.7 kernel.

The reproducer is simply to loopback mount using NFSv3, and then run the
'fsx' filesystem exerciser. I'm usually able to trigger the hang with
"fsx -N 100000 foobar".
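For anyone without fsx at hand, the mix of reads and truncates can be approximated in userspace. The sketch below is NOT fsx: it is a minimal, hypothetical exerciser (`exercise()` and its shadow-buffer check are illustrative names) that randomly interleaves pwrite(), ftruncate() and pread() on one file and verifies every read against an in-memory copy. Run on a file inside the loopback NFSv3 mount, reads and size changes should become READ and truncating SETATTR calls, although the client's page cache may satisfy many reads locally, so treat it as a rough approximation rather than a confirmed reproducer.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_MAXLEN 65536

/* In-memory copy of what the file is expected to contain. */
static unsigned char shadow[SKETCH_MAXLEN];
static off_t shadow_len;

/* Zero the expected contents between the old EOF and off (a hole reads as zeros). */
static void extend_shadow(off_t off)
{
    if (off > shadow_len)
        memset(shadow + shadow_len, 0, (size_t)(off - shadow_len));
}

/*
 * Run nops random writes, truncates and verified reads on fd.
 * Returns 0 if every read matched, -1 on a mismatch or I/O error.
 */
int exercise(int fd, unsigned nops, unsigned seed)
{
    static unsigned char buf[SKETCH_MAXLEN];

    srand(seed);
    shadow_len = 0;
    if (ftruncate(fd, 0) != 0)
        return -1;

    for (unsigned i = 0; i < nops; i++) {
        off_t off = rand() % SKETCH_MAXLEN;
        size_t len = 1 + (size_t)(rand() % (SKETCH_MAXLEN - off));

        switch (rand() % 3) {
        case 0: /* write: the file grows if the range passes EOF */
            for (size_t j = 0; j < len; j++)
                buf[j] = (unsigned char)rand();
            if (pwrite(fd, buf, len, off) != (ssize_t)len)
                return -1;
            extend_shadow(off);
            memcpy(shadow + off, buf, len);
            if (off + (off_t)len > shadow_len)
                shadow_len = off + (off_t)len;
            break;
        case 1: /* truncate: may shrink or extend the file */
            if (ftruncate(fd, off) != 0)
                return -1;
            extend_shadow(off);
            shadow_len = off;
            break;
        default: { /* read and compare with the shadow copy */
            size_t want = off >= shadow_len ? 0 : (size_t)(shadow_len - off);
            if (want > len)
                want = len;
            if (pread(fd, buf, len, off) != (ssize_t)want)
                return -1;
            if (memcmp(buf, shadow + off, want) != 0)
                return -1;
            break;
        }
        }
    }
    return 0;
}
```

Usage: open a file on the NFS mount and call, for example, exercise(fd, 100000, 1); a non-zero return means a read did not match what was written.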

I've attached a couple of wireshark traces of a few frames just before
and during the hang in case it jogs any memories.

Cheers
Trond


Attachments:
dump_lastframes.out.pcapng.gz (1.65 kB)
dump_3.18.7_lastframes.pcapng.gz (48.47 kB)

2015-03-02 00:52:29

by Trond Myklebust

Subject: Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)

Hi Bruce,

On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
<[email protected]> wrote:
> Hi,
>
> When testing NFSv3 loopback mounts (client and server are on
> the same IP address), I'm seeing a very reproducible hang in which the
> client stops receiving data from the server. The TCP connection is still
> marked as established, and the server appears to continue to receive and
> send data; however, the client does not.
>
> So far, I've reproduced it on both v4.0-rc1 and the Fedora v3.18.7 kernel.
>
> The reproducer is simply to loopback mount using NFSv3, and then run the
> 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
> "fsx -N 100000 foobar".
>
> I've attached a couple of wireshark traces of a few frames just before
> and during the hang in case it jogs any memories.

This bug appears to go away when I disable the splice()-based reads by
clearing the RQ_SPLICE_OK flag.
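As a rough model of what clearing the flag changes, assuming (as the code reads in this era) that nfsd builds a READ reply either by splicing references to the file's page-cache pages into the reply (flag set) or by reading the data into the request's own buffers (flag cleared): the userspace sketch below captures only that split. SKETCH_RQ_SPLICE_OK, struct sketch_rqst and sketch_read_reply() are hypothetical stand-ins, not kernel definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit, standing in for RQ_SPLICE_OK in this model only. */
#define SKETCH_RQ_SPLICE_OK (1UL << 0)
#define SKETCH_MAX_REPLY 64

struct sketch_rqst {
    unsigned long flags;
    const unsigned char *reply;          /* bytes the transport will send later */
    unsigned char buf[SKETCH_MAX_REPLY]; /* request-private buffer for the copy path */
    size_t len;
};

/*
 * Build a READ reply from a cached page.
 * Splice path: the reply points straight at the shared page (no copy).
 * Copy path: the data is copied into the request's own buffer first.
 */
const unsigned char *sketch_read_reply(struct sketch_rqst *rq,
                                       const unsigned char *page, size_t len)
{
    if (len > SKETCH_MAX_REPLY)
        len = SKETCH_MAX_REPLY;
    rq->len = len;
    if (rq->flags & SKETCH_RQ_SPLICE_OK) {
        rq->reply = page;
    } else {
        memcpy(rq->buf, page, len);
        rq->reply = rq->buf;
    }
    return rq->reply;
}
```

With the flag set the reply aliases the cached page; with it cleared the reply is a private snapshot, which is why the copy path is immune to later changes to that page.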

I noticed that it always involved a combination of a READ and a
truncating SETATTR call. Are you sure that it is safe to share
pagecache pages directly with sendpage() in this way? As far as I can
tell, there is no locking to prevent them from being modified while in
the TCP send queue.
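The concern in the question above can be illustrated with a small model, assuming (as a simplification) that a zero-copy send stores only a pointer to the page and reads it at transmit time, and that truncating to the middle of a page zeroes the tail of that page. It shows only how the bytes on the wire can differ from the bytes at queueing time; it does not model the TCP hang itself. All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE 16

/* One queued send: either a reference to a shared page or a private copy. */
struct sketch_send {
    const unsigned char *data;       /* what will be read at transmit time */
    unsigned char copy[SKETCH_PAGE]; /* storage for the copying path */
};

/* Queue a page for sending, by reference (zero_copy != 0) or by copy. */
void sketch_queue(struct sketch_send *s, const unsigned char *page, int zero_copy)
{
    if (zero_copy) {
        s->data = page;
    } else {
        memcpy(s->copy, page, SKETCH_PAGE);
        s->data = s->copy;
    }
}

/* Simulated truncate: zero the part of the page past new_size. */
void sketch_truncate_page(unsigned char *page, size_t new_size)
{
    if (new_size < SKETCH_PAGE)
        memset(page + new_size, 0, SKETCH_PAGE - new_size);
}

/* Transmit: copy out whatever the queued entry points at right now. */
void sketch_transmit(const struct sketch_send *s, unsigned char *wire)
{
    memcpy(wire, s->data, SKETCH_PAGE);
}
```

If a truncate lands between sketch_queue() and sketch_transmit(), the zero-copy entry sends the zeroed tail while the copied entry still sends the original bytes.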

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2015-03-02 01:06:38

by J. Bruce Fields

Subject: Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)

On Sun, Mar 01, 2015 at 07:52:28PM -0500, Trond Myklebust wrote:
> Hi Bruce,
>
> On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
> <[email protected]> wrote:
> > Hi,
> >
> > When testing NFSv3 loopback mounts (client and server are on
> > the same IP address), I'm seeing a very reproducible hang in which the
> > client stops receiving data from the server. The TCP connection is still
> > marked as established, and the server appears to continue to receive and
> > send data; however, the client does not.
> >
> > So far, I've reproduced it on both v4.0-rc1 and the Fedora v3.18.7 kernel.
> >
> > The reproducer is simply to loopback mount using NFSv3, and then run the
> > 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
> > "fsx -N 100000 foobar".
> >
> > I've attached a couple of wireshark traces of a few frames just before
> > and during the hang in case it jogs any memories.
>
> This bug appears to go away when I disable the splice()-based reads by
> clearing the RQ_SPLICE_OK flag.
>
> I noticed that it always involved a combination of a READ and a
> truncating SETATTR call. Are you sure that it is safe to share
> pagecache pages directly with sendpage() in this way? As far as I can
> tell, there is no locking to prevent them from being modified while in
> the TCP send queue.

This is the stable-pages problem that we've had forever, isn't it? Or
is this a different problem?

--b.

2015-03-02 01:20:43

by Trond Myklebust

Subject: Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)

On Sun, Mar 1, 2015 at 8:06 PM, Bruce James Fields <[email protected]> wrote:
> On Sun, Mar 01, 2015 at 07:52:28PM -0500, Trond Myklebust wrote:
>> Hi Bruce,
>>
>> On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
>> <[email protected]> wrote:
>> > Hi,
>> >
>> > When testing NFSv3 loopback mounts (client and server are on
>> > the same IP address), I'm seeing a very reproducible hang in which the
>> > client stops receiving data from the server. The TCP connection is still
>> > marked as established, and the server appears to continue to receive and
>> > send data; however, the client does not.
>> >
>> > So far, I've reproduced it on both v4.0-rc1 and the Fedora v3.18.7 kernel.
>> >
>> > The reproducer is simply to loopback mount using NFSv3, and then run the
>> > 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
>> > "fsx -N 100000 foobar".
>> >
>> > I've attached a couple of wireshark traces of a few frames just before
>> > and during the hang in case it jogs any memories.
>>
>> This bug appears to go away when I disable the splice()-based reads by
>> clearing the RQ_SPLICE_OK flag.
>>
>> I noticed that it always involved a combination of a READ and a
>> truncating SETATTR call. Are you sure that it is safe to share
>> pagecache pages directly with sendpage() in this way? As far as I can
>> tell, there is no locking to prevent them from being modified while in
>> the TCP send queue.
>
> This is the stable-pages problem that we've had forever, isn't it? Or
> is this a different problem?

It is causing the TCP socket to hang, so it goes beyond the usual
stable pages issue.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2015-03-02 04:31:32

by Trond Myklebust

Subject: Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)

On Sun, Mar 1, 2015 at 8:20 PM, Trond Myklebust
<[email protected]> wrote:
> On Sun, Mar 1, 2015 at 8:06 PM, Bruce James Fields <[email protected]> wrote:
>> On Sun, Mar 01, 2015 at 07:52:28PM -0500, Trond Myklebust wrote:
>>> Hi Bruce,
>>>
>>> On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > When testing NFSv3 loopback mounts (client and server are on
>>> > the same IP address), I'm seeing a very reproducible hang in which the
>>> > client stops receiving data from the server. The TCP connection is still
>>> > marked as established, and the server appears to continue to receive and
>>> > send data; however, the client does not.
>>> >
>>> > So far, I've reproduced it on both v4.0-rc1 and the Fedora v3.18.7 kernel.
>>> >
>>> > The reproducer is simply to loopback mount using NFSv3, and then run the
>>> > 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
>>> > "fsx -N 100000 foobar".
>>> >
>>> > I've attached a couple of wireshark traces of a few frames just before
>>> > and during the hang in case it jogs any memories.
>>>
>>> This bug appears to go away when I disable the splice()-based reads by
>>> clearing the RQ_SPLICE_OK flag.
>>>
>>> I noticed that it always involved a combination of a READ and a
>>> truncating SETATTR call. Are you sure that it is safe to share
>>> pagecache pages directly with sendpage() in this way? As far as I can
>>> tell, there is no locking to prevent them from being modified while in
>>> the TCP send queue.
>>
>> This is the stable-pages problem that we've had forever, isn't it? Or
>> is this a different problem?
>
> It is causing the TCP socket to hang, so it goes beyond the usual
> stable pages issue.
>

Confirming that clearing RQ_SPLICE_OK fixes the issue on all kernels
that I've tested so far.
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2015-03-02 19:58:03

by J. Bruce Fields

Subject: Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)

On Sun, Mar 01, 2015 at 11:31:31PM -0500, Trond Myklebust wrote:
> On Sun, Mar 1, 2015 at 8:20 PM, Trond Myklebust
> <[email protected]> wrote:
> > On Sun, Mar 1, 2015 at 8:06 PM, Bruce James Fields <[email protected]> wrote:
> >> On Sun, Mar 01, 2015 at 07:52:28PM -0500, Trond Myklebust wrote:
> >>> Hi Bruce,
> >>>
> >>> On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
> >>> <[email protected]> wrote:
> >>> > Hi,
> >>> >
> >>> > When testing NFSv3 loopback mounts (client and server are on
> >>> > the same IP address), I'm seeing a very reproducible hang in which the
> >>> > client stops receiving data from the server. The TCP connection is still
> >>> > marked as established, and the server appears to continue to receive and
> >>> > send data; however, the client does not.
> >>> >
> >>> > So far, I've reproduced it on both v4.0-rc1 and the Fedora v3.18.7 kernel.
> >>> >
> >>> > The reproducer is simply to loopback mount using NFSv3, and then run the
> >>> > 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
> >>> > "fsx -N 100000 foobar".
> >>> >
> >>> > I've attached a couple of wireshark traces of a few frames just before
> >>> > and during the hang in case it jogs any memories.
> >>>
> >>> This bug appears to go away when I disable the splice()-based reads by
> >>> clearing the RQ_SPLICE_OK flag.
> >>>
> >>> I noticed that it always involved a combination of a READ and a
> >>> truncating SETATTR call. Are you sure that it is safe to share
> >>> pagecache pages directly with sendpage() in this way? As far as I can
> >>> tell, there is no locking to prevent them from being modified while in
> >>> the TCP send queue.
> >>
> >> This is the stable-pages problem that we've had forever, isn't it? Or
> >> is this a different problem?
> >
> > It is causing the TCP socket to hang, so it goes beyond the usual
> > stable pages issue.
> >
>
> Confirming that clearing RQ_SPLICE_OK fixes the issue on all kernels
> that I've tested so far.

Well, if the problem is a race with truncate then I guess it may have
something to do with sending pages that are no longer part of the page
cache?
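One way to picture pages that are no longer part of the page cache: truncation removes the page from the file's mapping, so lookups stop finding it, but anything already holding a pointer to it (such as a reply queued for sending) still sees the same page object. The model below uses a hypothetical four-slot mapping; sketch_find() and sketch_truncate_from() are illustrative names, not the kernel's page-cache API.

```c
#include <stddef.h>

#define SKETCH_NSLOTS 4

struct sketch_mapping;

/* A cached page: mapping is NULL once the page has been removed from the cache. */
struct sketch_page {
    struct sketch_mapping *mapping;
    unsigned long index;
    unsigned char data[16];
};

/* A tiny stand-in for a file's page cache: one slot per page index. */
struct sketch_mapping {
    struct sketch_page *slots[SKETCH_NSLOTS];
};

/* Look up the page at index, or NULL if it is not cached. */
struct sketch_page *sketch_find(struct sketch_mapping *m, unsigned long index)
{
    return index < SKETCH_NSLOTS ? m->slots[index] : NULL;
}

/*
 * Remove every page at or beyond first_index from the cache, as a truncate
 * would. The page structures themselves are left alone: holders of an
 * existing pointer keep seeing the same, now detached, page.
 */
void sketch_truncate_from(struct sketch_mapping *m, unsigned long first_index)
{
    for (unsigned long i = first_index; i < SKETCH_NSLOTS; i++) {
        if (m->slots[i] != NULL) {
            m->slots[i]->mapping = NULL;
            m->slots[i] = NULL;
        }
    }
}
```

Whether the detached page's memory stays valid for the in-flight send is then purely a question of who still holds a reference to it.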

I'd think that the get_page() in nfsd_splice_actor would prevent the
page from being put to any other use until the network layer was done
with it, so that at worst the client would see garbage. But I don't begin
to understand how truncation actually works....
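The reference-counting argument can be modelled the same way. Assuming the page cache holds one reference and the splice path takes an extra one (the get_page() in nfsd_splice_actor mentioned above), the page is released only after both are dropped. The extra reference governs the page's lifetime, not its contents: nothing in this model stops truncation from changing the data in between, which matches the "at worst the client would see garbage" expectation. All names are hypothetical, loosely mirroring get_page()/put_page().

```c
#include <assert.h>

/* Hypothetical refcounted page. */
struct sketch_page {
    int refcount;
    int freed; /* set instead of really freeing, so the lifetime is observable */
};

static void sketch_free(struct sketch_page *p)
{
    p->freed = 1;
}

/* Take an extra reference; taking one on a freed page would be a bug. */
void sketch_get_page(struct sketch_page *p)
{
    assert(!p->freed);
    p->refcount++;
}

/* Drop a reference; the last one releases the page. */
void sketch_put_page(struct sketch_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        sketch_free(p);
}
```

Sequence: the cache owns one reference, the send path takes a second, truncation drops the cache's reference (page survives), and the network layer's final put frees it.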

The zero-copy v3 code has been there since 2002, if I'm reading the
history right, so if it's really a fundamental problem with the approach
then I wonder how it's survived so long.

I haven't tried to reproduce yet.

--b.