2017-05-30 18:58:14

by Chuck Lever III

Subject: GSS sequence number window

Hey Bruce!

While testing with sec=krb5 and sec=krb5i, I noticed a lot of
spurious connection loss, especially when I wanted to run a
CPU-intensive workload on my NFS server at the same time I
was testing.

I added a pr_err() in gss_check_seq_num, and ran a fio job
on a vers=3,sec=sys,proto=tcp mount (server is exporting a
tmpfs). On the server, I rebuilt a kernel source tree cscope
database at the same time.
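
The check where I added the pr_err() boils down to a sliding-window
test along these lines (simplified from memory; the real
gss_check_seq_num() in net/sunrpc/auth_gss/svcauth_gss.c also keeps a
replay bitmap within the window and takes different arguments):

    #define GSS_SEQ_WIN 128

    static int gss_check_seq_num(struct gss_svc_seq_data *sd, u32 seq_num)
    {
            int ret = 1;

            spin_lock(&sd->sd_lock);
            if (seq_num > sd->sd_max) {
                    /* window slides forward to the new high-water mark */
                    sd->sd_max = seq_num;
            } else if (seq_num + GSS_SEQ_WIN <= sd->sd_max) {
                    /* arrived too far behind the window: drop it */
                    pr_err("gss_check_seq_num: seq_num = %u, sd_max = %u, GSS_SEQ_WIN = %u\n",
                           seq_num, sd->sd_max, GSS_SEQ_WIN);
                    ret = 0;
            }
            spin_unlock(&sd->sd_lock);
            return ret;
    }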

May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128
May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128
May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128
May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128
May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128
May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128
May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128

RFC 2203 suggests there's no risk to using a large window.
My first thought was to make the sequence window larger
(say 2048) but I've seen stragglers outside even that large
a window.

Any thoughts about why there are these sequence number
outliers?


--
Chuck Lever





2017-05-30 19:34:20

by J. Bruce Fields

Subject: Re: GSS sequence number window

On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote:
> Hey Bruce!
>
> While testing with sec=krb5 and sec=krb5i, I noticed a lot of
> spurious connection loss, especially when I wanted to run a
> CPU-intensive workload on my NFS server at the same time I
> was testing.
>
> I added a pr_err() in gss_check_seq_num, and ran a fio job
> on a vers=3,sec=sys,proto=tcp mount (server is exporting a
> tmpfs). On the server, I rebuilt a kernel source tree cscope
> database at the same time.
>
> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128
> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128
> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128
> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128
> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128
> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128
> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128
>
> RFC 2203 suggests there's no risk to using a large window.
> My first thought was to make the sequence window larger
> (say 2048) but I've seen stragglers outside even that large
> a window.
>
> Any thoughts about why there are these sequence number
> outliers?

No, alas.

The server's where it's easier to catch, but it'd be interesting to know
what's going on on the client side when this happens.

Might be interesting to know just what the operation (or compound ops)
are in the bad cases, in case there's some pattern.

Might also be interesting to check where exactly in the client code the
sequence number is assigned and where in the server code it's checked,
and think about where they might get queued up or reordered in between.
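
If it helps as a starting point, from memory the two ends are roughly
as follows (worth double-checking against the actual code):

    /*
     * Client, net/sunrpc/auth_gss/auth_gss.c, gss_marshal(): the sequence
     * number comes from the per-context counter at encode time, which can
     * be well before the request actually goes out on the wire:
     */
    spin_lock(&ctx->gc_seq_lock);
    req->rq_seqno = ctx->gc_seq++;
    spin_unlock(&ctx->gc_seq_lock);

    /*
     * Server, net/sunrpc/auth_gss/svcauth_gss.c: the window check runs only
     * once an nfsd thread has picked up and parsed the request:
     *
     *   svcauth_gss_accept() -> gss_verify_header() -> gss_check_seq_num()
     */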

Is it possible there are retries involved?

Does increasing the number of server threads help?

I don't think it'd be a problem to increase the sequence window.

--b.

2017-05-30 20:11:22

by Benjamin Coddington

Subject: Re: GSS sequence number window

On 30 May 2017, at 15:34, J. Bruce Fields wrote:

> On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote:
>> Hey Bruce!
>>
>> While testing with sec=krb5 and sec=krb5i, I noticed a lot of
>> spurious connection loss, especially when I wanted to run a
>> CPU-intensive workload on my NFS server at the same time I
>> was testing.
>>
>> I added a pr_err() in gss_check_seq_num, and ran a fio job
>> on a vers=3,sec=sys,proto=tcp mount (server is exporting a
>> tmpfs). On the server, I rebuilt a kernel source tree cscope
>> database at the same time.
>>
>> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128
>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128
>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128
>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128
>> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128
>> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128
>> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
>> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
>> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
>> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128
>>
>> RFC 2203 suggests there's no risk to using a large window.
>> My first thought was to make the sequence window larger
>> (say 2048) but I've seen stragglers outside even that large
>> a window.
>>
>> Any thoughts about why there are these sequence number
>> outliers?
>
> No, alas.

I noticed some slow allocations on the server with krb5 last year - but
never got around to doing anything about it:
http://marc.info/?t=146032122900006&r=1&w=2

Could be the same thing?

Ben

2017-05-30 21:03:27

by Chuck Lever III

Subject: Re: GSS sequence number window


> On May 30, 2017, at 3:34 PM, J. Bruce Fields <[email protected]> wrote:
>
> On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote:
>> Hey Bruce!
>>
>> While testing with sec=krb5 and sec=krb5i, I noticed a lot of
>> spurious connection loss, especially when I wanted to run a
>> CPU-intensive workload on my NFS server at the same time I
>> was testing.
>>
>> I added a pr_err() in gss_check_seq_num, and ran a fio job
>> on a vers=3,sec=sys,proto=tcp mount (server is exporting a
>> tmpfs). On the server, I rebuilt a kernel source tree cscope
>> database at the same time.
>>
>> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128
>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128
>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128
>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128
>> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128
>> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128
>> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
>> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
>> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
>> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128
>>
>> RFC 2203 suggests there's no risk to using a large window.
>> My first thought was to make the sequence window larger
>> (say 2048) but I've seen stragglers outside even that large
>> a window.
>>
>> Any thoughts about why there are these sequence number
>> outliers?
>
> No, alas.
>
> The server's where it's easier to catch, but it'd be interesting to know
> what's going on on the client side when this happens.
>
> Might be interesting to know just what the operation (or compound ops)
> are in the bad cases, in case there's some pattern.

It appears to happen more frequently with a 100% read
workload that accesses multiple files.


> Might also be interesting to check where exactly in the client code the
> sequence number is assigned and where in the server code it's checked,
> and think about where they might get queued up or reordered in between.
>
> Is it possible there are retries involved?

It's hard to tell, since a disconnect means there will be
a recorded retransmission either way.


> Does increasing the number of server threads help?

Yes, but it doesn't fully address the issue.


> I don't think it'd be a problem to increase the sequence window.

Increasing the window to 1024 does not seem to help.


--
Chuck Lever




2017-05-31 19:22:31

by J. Bruce Fields

Subject: Re: GSS sequence number window

On Tue, May 30, 2017 at 04:11:20PM -0400, Benjamin Coddington wrote:
> On 30 May 2017, at 15:34, J. Bruce Fields wrote:
>
> >On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote:
> >>Hey Bruce!
> >>
> >>While testing with sec=krb5 and sec=krb5i, I noticed a lot of
> >>spurious connection loss, especially when I wanted to run a
> >>CPU-intensive workload on my NFS server at the same time I
> >>was testing.
> >>
> >>I added a pr_err() in gss_check_seq_num, and ran a fio job
> >>on a vers=3,sec=sys,proto=tcp mount (server is exporting a
> >>tmpfs). On the server, I rebuilt a kernel source tree cscope
> >>database at the same time.
> >>
> >>May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128
> >>May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128
> >>May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128
> >>May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128
> >>May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128
> >>May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128
> >>May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
> >>May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
> >>May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
> >>May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128
> >>
> >>RFC 2203 suggests there's no risk to using a large window.
> >>My first thought was to make the sequence window larger
> >>(say 2048) but I've seen stragglers outside even that large
> >>a window.
> >>
> >>Any thoughts about why there are these sequence number
> >>outliers?
> >
> >No, alas.
>
> I noticed some slow allocations on the server with krb5 last year - but
> never got around to doing anything about it:
> http://marc.info/?t=146032122900006&r=1&w=2
>
> Could be the same thing?

I don't think it would be too hard to eliminate the need for allocations
there. Or maybe there's even a quick hack that would let Chuck test
whether that's the problem (different GFP flags on those allocations?)
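
Purely as an illustration of the kind of hack I mean (the real sites
are the ones Ben's link points at, and the gfp flags there may differ):

    /*
     * Make a possibly-sleeping allocation fail fast instead of stalling,
     * so allocation latency shows up as a loud warning rather than a
     * delayed (and possibly out-of-window) reply:
     */
    buf = kmalloc(len, GFP_ATOMIC);	/* was a sleeping allocation */
    if (WARN_ONCE(!buf, "krb5 alloc failed under GFP_ATOMIC\n"))
            return -ENOMEM;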

--b.

2017-05-31 20:27:23

by Chuck Lever III

Subject: Re: GSS sequence number window


> On May 31, 2017, at 3:22 PM, J. Bruce Fields <[email protected]> wrote:
>
> On Tue, May 30, 2017 at 04:11:20PM -0400, Benjamin Coddington wrote:
>> On 30 May 2017, at 15:34, J. Bruce Fields wrote:
>>
>>> On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote:
>>>> Hey Bruce!
>>>>
>>>> While testing with sec=krb5 and sec=krb5i, I noticed a lot of
>>>> spurious connection loss, especially when I wanted to run a
>>>> CPU-intensive workload on my NFS server at the same time I
>>>> was testing.
>>>>
>>>> I added a pr_err() in gss_check_seq_num, and ran a fio job
>>>> on a vers=3,sec=sys,proto=tcp mount (server is exporting a
>>>> tmpfs). On the server, I rebuilt a kernel source tree cscope
>>>> database at the same time.
>>>>
>>>> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128
>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128
>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128
>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128
>>>> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128
>>>> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128
>>>> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
>>>> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
>>>> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
>>>> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128
>>>>
>>>> RFC 2203 suggests there's no risk to using a large window.
>>>> My first thought was to make the sequence window larger
>>>> (say 2048) but I've seen stragglers outside even that large
>>>> a window.
>>>>
>>>> Any thoughts about why there are these sequence number
>>>> outliers?
>>>
>>> No, alas.
>>
>> I noticed some slow allocations on the server with krb5 last year - but
>> never got around to doing anything about it:
>> http://marc.info/?t=146032122900006&r=1&w=2
>>
>> Could be the same thing?
>
> I don't think it would be too hard to eliminate the need for allocations
> there. Or maybe there's even a quick hack that would let Chuck test
> whether that's the problem (different GFP flags on those allocations?)

This doesn't feel like a memory allocator issue. My server has
loads of free memory, and there's just one NUMA node.

Since increasing nfsd thread count seems to improve the situation,
my guess is that having more threads available makes it less likely
that an RPC request will be reordered.


--
Chuck Lever

2017-06-06 19:35:23

by Chuck Lever III

Subject: Re: GSS sequence number window


> On May 31, 2017, at 4:27 PM, Chuck Lever <[email protected]> wrote:
>
>>
>> On May 31, 2017, at 3:22 PM, J. Bruce Fields <[email protected]> wrote:
>>
>> On Tue, May 30, 2017 at 04:11:20PM -0400, Benjamin Coddington wrote:
>>> On 30 May 2017, at 15:34, J. Bruce Fields wrote:
>>>
>>>> On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote:
>>>>> Hey Bruce!
>>>>>
>>>>> While testing with sec=krb5 and sec=krb5i, I noticed a lot of
>>>>> spurious connection loss, especially when I wanted to run a
>>>>> CPU-intensive workload on my NFS server at the same time I
>>>>> was testing.
>>>>>
>>>>> I added a pr_err() in gss_check_seq_num, and ran a fio job
>>>>> on a vers=3,sec=sys,proto=tcp mount (server is exporting a
>>>>> tmpfs). On the server, I rebuilt a kernel source tree cscope
>>>>> database at the same time.
>>>>>
>>>>> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128
>>>>> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
>>>>> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
>>>>> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
>>>>> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128
>>>>>
>>>>> RFC 2203 suggests there's no risk to using a large window.
>>>>> My first thought was to make the sequence window larger
>>>>> (say 2048) but I've seen stragglers outside even that large
>>>>> a window.
>>>>>
>>>>> Any thoughts about why there are these sequence number
>>>>> outliers?
>>>>
>>>> No, alas.
>>>
>>> I noticed some slow allocations on the server with krb5 last year - but
>>> never got around to doing anything about it:
>>> http://marc.info/?t=146032122900006&r=1&w=2
>>>
>>> Could be the same thing?
>>
>> I don't think it would be too hard to eliminate the need for allocations
>> there. Or maybe there's even a quick hack that would let Chuck test
>> whether that's the problem (different GFP flags on those allocations?)
>
> This doesn't feel like a memory allocator issue. My server has
> loads of free memory, and there's just one NUMA node.
>
> Since increasing nfsd thread count seems to improve the situation,
> my guess is that having more threads available makes it less likely
> that an RPC request will be reordered.

I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306

To check memory allocation latency, I could always construct
a framework around kmalloc and alloc_page.
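
Something along these lines, for instance (just a sketch; the wrapper
name and the 1ms threshold are arbitrary):

    #include <linux/ktime.h>
    #include <linux/printk.h>
    #include <linux/slab.h>

    static inline void *timed_kmalloc(size_t len, gfp_t flags)
    {
            ktime_t start = ktime_get();
            void *p = kmalloc(len, flags);
            s64 us = ktime_us_delta(ktime_get(), start);

            /* flag any allocation that takes longer than 1ms */
            if (us > 1000)
                    pr_err("timed_kmalloc: %zu bytes took %lld us\n", len, us);
            return p;
    }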


I've also found some bad behavior around proto=rdma,sec=krb5i.
When I run a heavy I/O workload (fio, for example), every so
often a read operation fails with EIO. I dug into it a little
and MIC verification fails for these replies on the client.

I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=307
to document this second issue.

I'm not sure what a next step would be. My suspicion is that
either the server or the client is mishandling the RPC reply
buffer, which causes the checksums to be different. Not sure
why this would be so intermittent, though.


--
Chuck Lever

2017-06-06 19:41:59

by J. Bruce Fields

Subject: Re: GSS sequence number window

On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote:
> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306
>
> To check memory allocation latency, I could always construct
> a framework around kmalloc and alloc_page.
>
>
> I've also found some bad behavior around proto=rdma,sec=krb5i.
> When I run a heavy I/O workload (fio, for example), every so
> often a read operation fails with EIO. I dug into it a little
> and MIC verification fails for these replies on the client.

Do we still have the problem that the read data can change between the
time we calculate the MIC and the time we transmit the data to the
client?

--b.

>
> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=307
> to document this second issue.
>
> I'm not sure what a next step would be. My suspicion is that
> either the server or the client is mishandling the RPC reply
> buffer, which causes the checksums to be different. Not sure
> why this would be so intermittent, though.
>
>
> --
> Chuck Lever
>
>

2017-06-06 19:45:59

by Chuck Lever III

Subject: Re: GSS sequence number window


> On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <[email protected]> wrote:
>
> On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote:
>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306
>>
>> To check memory allocation latency, I could always construct
>> a framework around kmalloc and alloc_page.
>>
>>
>> I've also found some bad behavior around proto=rdma,sec=krb5i.
>> When I run a heavy I/O workload (fio, for example), every so
>> often a read operation fails with EIO. I dug into it a little
>> and MIC verification fails for these replies on the client.
>
> Do we still have the problem that the read data can change between the
> time we calculate the MIC and the time we transmit the data to the
> client?

I don't see a problem with krb5p, which, if IIUC, would also
fall victim to this situation, unless there is much stricter
request serialization going on with krb5p.

Even so, how would I detect if this issue was present?


> --b.
>
>>
>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=307
>> to document this second issue.
>>
>> I'm not sure what a next step would be. My suspicion is that
>> either the server or the client is mishandling the RPC reply
>> buffer, which causes the checksums to be different. Not sure
>> why this would be so intermittent, though.

--
Chuck Lever

2017-06-06 20:15:04

by J. Bruce Fields

Subject: Re: GSS sequence number window

On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote:
>
> > On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <[email protected]> wrote:
> >
> > On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote:
> >> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306
> >>
> >> To check memory allocation latency, I could always construct
> >> a framework around kmalloc and alloc_page.
> >>
> >>
> >> I've also found some bad behavior around proto=rdma,sec=krb5i.
> >> When I run a heavy I/O workload (fio, for example), every so
> >> often a read operation fails with EIO. I dug into it a little
> >> and MIC verification fails for these replies on the client.
> >
> > Do we still have the problem that the read data can change between the
> > time we calculate the MIC and the time we transmit the data to the
> > client?
>
> I don't see a problem with krb5p, which, if IIUC, would also
> fall victim to this situation, unless there is much stricter
> request serialization going on with krb5p.

We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case.
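
Quoting from memory, that's essentially this in unwrap_priv_data() in
net/sunrpc/auth_gss/svcauth_gss.c:

    /*
     * Privacy means the reply data gets encrypted in place, so the
     * server can't splice page cache pages straight into the reply:
     */
    clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);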

> Even so, how would I detect if this issue was present?

Good question. If you knew the data and mic in the bad case, and had
some way to guess what the previous data might have been based on what
you knew about the test, then you could try mic's of likely older
versions of the data and see if you get a match.... That sounds hard.

--b.

2017-06-06 20:16:53

by Chuck Lever III

Subject: Re: GSS sequence number window


> On Jun 6, 2017, at 4:15 PM, J. Bruce Fields <[email protected]> wrote:
>
> On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote:
>>
>>> On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <[email protected]> wrote:
>>>
>>> On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote:
>>>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306
>>>>
>>>> To check memory allocation latency, I could always construct
>>>> a framework around kmalloc and alloc_page.
>>>>
>>>>
>>>> I've also found some bad behavior around proto=rdma,sec=krb5i.
>>>> When I run a heavy I/O workload (fio, for example), every so
>>>> often a read operation fails with EIO. I dug into it a little
>>>> and MIC verification fails for these replies on the client.
>>>
>>> Do we still have the problem that the read data can change between the
>>> time we calculate the MIC and the time we transmit the data to the
>>> client?
>>
>> I don't see a problem with krb5p, which, if IIUC, would also
>> fall victim to this situation, unless there is much stricter
>> request serialization going on with krb5p.
>
> We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case.

Seems like this is the right answer for krb5i too. Shall I try that?


>> Even so, how would I detect if this issue was present?
>
> Good question. If you knew the data and mic in the bad case, and had
> some way to guess what the previous data might have been based on what
> you knew about the test, then you could try mic's of likely older
> versions of the data and see if you get a match.... That sounds hard.
>
> --b.

--
Chuck Lever

2017-06-06 20:23:07

by J. Bruce Fields

Subject: Re: GSS sequence number window

On Tue, Jun 06, 2017 at 04:16:53PM -0400, Chuck Lever wrote:
>
> > On Jun 6, 2017, at 4:15 PM, J. Bruce Fields <[email protected]> wrote:
> >
> > On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote:
> >>
> >>> On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <[email protected]> wrote:
> >>>
> >>> On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote:
> >>>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306
> >>>>
> >>>> To check memory allocation latency, I could always construct
> >>>> a framework around kmalloc and alloc_page.
> >>>>
> >>>>
> >>>> I've also found some bad behavior around proto=rdma,sec=krb5i.
> >>>> When I run a heavy I/O workload (fio, for example), every so
> >>>> often a read operation fails with EIO. I dug into it a little
> >>>> and MIC verification fails for these replies on the client.
> >>>
> >>> Do we still have the problem that the read data can change between the
> >>> time we calculate the MIC and the time we transmit the data to the
> >>> client?
> >>
> >> I don't see a problem with krb5p, which, if IIUC, would also
> >> fall victim to this situation, unless there is much stricter
> >> request serialization going on with krb5p.
> >
> > We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case.
>
> Seems like this is the right answer for krb5i too. Shall I try that?

Sure! Just grep around for RQ_SPLICE_OK, I think it should be easy to
figure out.

--b.
>
>
> >> Even so, how would I detect if this issue was present?
> >
> > Good question. If you knew the data and mic in the bad case, and had
> > some way to guess what the previous data might have been based on what
> > you knew about the test, then you could try mic's of likely older
> > versions of the data and see if you get a match.... That sounds hard.
> >
> > --b.
>
> --
> Chuck Lever
>
>

2017-06-06 20:54:51

by Chuck Lever III

Subject: Re: GSS sequence number window


> On Jun 6, 2017, at 4:23 PM, J. Bruce Fields <[email protected]> wrote:
>
> On Tue, Jun 06, 2017 at 04:16:53PM -0400, Chuck Lever wrote:
>>
>>> On Jun 6, 2017, at 4:15 PM, J. Bruce Fields <[email protected]> wrote:
>>>
>>> On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote:
>>>>
>>>>> On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <[email protected]> wrote:
>>>>>
>>>>> On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote:
>>>>>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306
>>>>>>
>>>>>> To check memory allocation latency, I could always construct
>>>>>> a framework around kmalloc and alloc_page.
>>>>>>
>>>>>>
>>>>>> I've also found some bad behavior around proto=rdma,sec=krb5i.
>>>>>> When I run a heavy I/O workload (fio, for example), every so
>>>>>> often a read operation fails with EIO. I dug into it a little
>>>>>> and MIC verification fails for these replies on the client.
>>>>>
>>>>> Do we still have the problem that the read data can change between the
>>>>> time we calculate the MIC and the time we transmit the data to the
>>>>> client?
>>>>
>>>> I don't see a problem with krb5p, which, if IIUC, would also
>>>> fall victim to this situation, unless there is much stricter
>>>> request serialization going on with krb5p.
>>>
>>> We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case.
>>
>> Seems like this is the right answer for krb5i too. Shall I try that?
>
> Sure! Just grep around for RQ_SPLICE_OK, I think it should be easy to
> figure out.

I added

clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);

at the top of unwrap_integ_data() in net/sunrpc/auth_gss/svcauth_gss.c.

I haven't seen a failure yet, which is a good sign.


> --b.
>>
>>
>>>> Even so, how would I detect if this issue was present?
>>>
>>> Good question. If you knew the data and mic in the bad case, and had
>>> some way to guess what the previous data might have been based on what
>>> you knew about the test, then you could try mic's of likely older
>>> versions of the data and see if you get a match.... That sounds hard.
>>>
>>> --b.
>>
>> --
>> Chuck Lever

--
Chuck Lever

2017-06-06 20:56:46

by J. Bruce Fields

Subject: Re: GSS sequence number window

On Tue, Jun 06, 2017 at 04:54:51PM -0400, Chuck Lever wrote:
>
> > On Jun 6, 2017, at 4:23 PM, J. Bruce Fields <[email protected]> wrote:
> >
> > On Tue, Jun 06, 2017 at 04:16:53PM -0400, Chuck Lever wrote:
> >>
> >>> On Jun 6, 2017, at 4:15 PM, J. Bruce Fields <[email protected]> wrote:
> >>>
> >>> On Tue, Jun 06, 2017 at 03:45:59PM -0400, Chuck Lever wrote:
> >>>>
> >>>>> On Jun 6, 2017, at 3:41 PM, J. Bruce Fields <[email protected]> wrote:
> >>>>>
> >>>>> On Tue, Jun 06, 2017 at 03:35:23PM -0400, Chuck Lever wrote:
> >>>>>> I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306
> >>>>>>
> >>>>>> To check memory allocation latency, I could always construct
> >>>>>> a framework around kmalloc and alloc_page.
> >>>>>>
> >>>>>>
> >>>>>> I've also found some bad behavior around proto=rdma,sec=krb5i.
> >>>>>> When I run a heavy I/O workload (fio, for example), every so
> >>>>>> often a read operation fails with EIO. I dug into it a little
> >>>>>> and MIC verification fails for these replies on the client.
> >>>>>
> >>>>> Do we still have the problem that the read data can change between the
> >>>>> time we calculate the MIC and the time we transmit the data to the
> >>>>> client?
> >>>>
> >>>> I don't see a problem with krb5p, which, if IIUC, would also
> >>>> fall victim to this situation, unless there is much stricter
> >>>> request serialization going on with krb5p.
> >>>
> >>> We turn off zero-copy by clearing RQ_SPLICE_OK in the krb5p case.
> >>
> >> Seems like this is the right answer for krb5i too. Shall I try that?
> >
> > Sure! Just grep around for RQ_SPLICE_OK, I think it should be easy to
> > figure out.
>
> I added
>
> clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);
>
> at the top of unwrap_integ_data() in net/sunrpc/auth_gss/svcauth_gss.c.
>
> I haven't seen a failure yet, which is a good sign.

Well, unfortunately it means the only easy fix we have right now may
cause a performance regression. Anyway, maybe that's what we should do.

--b.