Return-Path:
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: GSS sequence number window
From: Chuck Lever
In-Reply-To:
Date: Tue, 6 Jun 2017 15:35:23 -0400
Cc: Benjamin Coddington, Linux NFS Mailing List
Message-Id: <28665890-C74A-4319-B42E-475393821EC7@oracle.com>
References: <63736845-2BD3-4EE1-AC12-0BD21A9ABEF2@oracle.com> <20170530193419.GA9371@fieldses.org> <20170531192231.GA23526@fieldses.org>
To: "J. Bruce Fields"
List-ID:

> On May 31, 2017, at 4:27 PM, Chuck Lever wrote:
>
>>
>> On May 31, 2017, at 3:22 PM, J. Bruce Fields wrote:
>>
>> On Tue, May 30, 2017 at 04:11:20PM -0400, Benjamin Coddington wrote:
>>> On 30 May 2017, at 15:34, J. Bruce Fields wrote:
>>>
>>>> On Tue, May 30, 2017 at 02:58:00PM -0400, Chuck Lever wrote:
>>>>> Hey Bruce!
>>>>>
>>>>> While testing with sec=krb5 and sec=krb5i, I noticed a lot of
>>>>> spurious connection loss, especially when I wanted to run a
>>>>> CPU-intensive workload on my NFS server at the same time I
>>>>> was testing.
>>>>>
>>>>> I added a pr_err() in gss_check_seq_num, and ran a fio job
>>>>> on a vers=3,sec=sys,proto=tcp mount (server is exporting a
>>>>> tmpfs). On the server, I rebuilt a kernel source tree cscope
>>>>> database at the same time.
>>>>>
>>>>> May 29 17:53:13 klimt kernel: gss_check_seq_num: seq_num = 250098, sd_max = 250291, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 937816, sd_max = 938171, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938544, sd_max = 938727, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:33 klimt kernel: gss_check_seq_num: seq_num = 938543, sd_max = 938727, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:34 klimt kernel: gss_check_seq_num: seq_num = 939344, sd_max = 939549, GSS_SEQ_WIN = 128
>>>>> May 29 17:53:35 klimt kernel: gss_check_seq_num: seq_num = 965007, sd_max = 965176, GSS_SEQ_WIN = 128
>>>>> May 29 17:54:01 klimt kernel: gss_check_seq_num: seq_num = 1799710, sd_max = 1799982, GSS_SEQ_WIN = 128
>>>>> May 29 17:54:02 klimt kernel: gss_check_seq_num: seq_num = 1831165, sd_max = 1831353, GSS_SEQ_WIN = 128
>>>>> May 29 17:54:04 klimt kernel: gss_check_seq_num: seq_num = 1883583, sd_max = 1883761, GSS_SEQ_WIN = 128
>>>>> May 29 17:54:07 klimt kernel: gss_check_seq_num: seq_num = 1959316, sd_max = 1959447, GSS_SEQ_WIN = 128
>>>>>
>>>>> RFC 2203 suggests there's no risk to using a large window.
>>>>> My first thought was to make the sequence window larger
>>>>> (say 2048), but I've seen stragglers outside even that large
>>>>> a window.
>>>>>
>>>>> Any thoughts about why there are these sequence number
>>>>> outliers?
>>>>
>>>> No, alas.
>>>
>>> I noticed some slow allocations on the server with krb5 last year, but
>>> never got around to doing anything about it:
>>> http://marc.info/?t=146032122900006&r=1&w=2
>>>
>>> Could be the same thing?
>>
>> I don't think it would be too hard to eliminate the need for allocations
>> there. Or maybe there's even a quick hack that would let Chuck test
>> whether that's the problem (different GFP flags on those allocations?)
>
> This doesn't feel like a memory allocator issue. My server has
> loads of free memory, and there's just one NUMA node.
>
> Since increasing the nfsd thread count seems to improve the situation,
> my guess is that having more threads available makes it less likely
> that an RPC request will be reordered.
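To make the failure mode concrete: what the server applies here is the
RFC 2203 sliding sequence window. Below is a simplified sketch of that
style of check; the structure and function names are invented for the
illustration, and the real gss_check_seq_num() differs in detail.

#include <linux/bitmap.h>
#include <linux/types.h>

#define GSS_SEQ_WIN	128	/* matches the traces above */

struct seq_window {
	DECLARE_BITMAP(sw_seen, GSS_SEQ_WIN);
	u32 sw_max;		/* highest sequence number seen so far */
};

/* Returns true if the request should be accepted. */
static bool seq_num_in_window(struct seq_window *sw, u32 seq_num)
{
	u32 i;

	if (seq_num > sw->sw_max) {
		/* Slide the window forward, clearing bits that now
		 * correspond to not-yet-seen sequence numbers. */
		for (i = sw->sw_max + 1;
		     i <= seq_num && i <= sw->sw_max + GSS_SEQ_WIN; i++)
			clear_bit(i % GSS_SEQ_WIN, sw->sw_seen);
		sw->sw_max = seq_num;
		set_bit(seq_num % GSS_SEQ_WIN, sw->sw_seen);
		return true;
	}
	if (seq_num + GSS_SEQ_WIN <= sw->sw_max)
		return false;	/* fell below the window */
	/* Inside the window: accept unless it is a replay. */
	return !test_and_set_bit(seq_num % GSS_SEQ_WIN, sw->sw_seen);
}

A request whose sequence number trails sw_max (sd_max in the traces) by
more than the window size is not accepted. That is consistent with the
traces above: each flagged seq_num trails sd_max by roughly 130 to 360,
well outside a 128-entry window.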
I filed https://bugzilla.linux-nfs.org/show_bug.cgi?id=306 to track the
sequence window issue.

To check memory allocation latency, I could always construct a framework
around kmalloc and alloc_page (something along the lines of the sketch at
the end of this note).

I've also found some bad behavior around proto=rdma,sec=krb5i. When I run
a heavy I/O workload (fio, for example), every so often a read operation
fails with EIO. I dug into it a little, and MIC verification fails for
these replies on the client. I filed
https://bugzilla.linux-nfs.org/show_bug.cgi?id=307 to document this
second issue.

I'm not sure what a next step would be. My suspicion is that either the
server or the client is mishandling the RPC reply buffer, which causes
the checksums to differ. Not sure why this would be so intermittent,
though.

--
Chuck Lever
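The allocation-latency sketch, for concreteness. This is untested and
the wrapper name is invented; the existing kmem tracepoints or an
ftrace/bpf probe would work just as well. The idea is only to flag call
sites where an allocation stalls.

#include <linux/ktime.h>
#include <linux/printk.h>
#include <linux/slab.h>

/* Timed wrapper that selected call sites could use temporarily. */
static inline void *timed_kmalloc(size_t size, gfp_t flags)
{
	ktime_t start = ktime_get();
	void *p = kmalloc(size, flags);
	s64 usecs = ktime_to_us(ktime_sub(ktime_get(), start));

	if (usecs > 1000)	/* arbitrary 1 ms threshold */
		pr_warn("kmalloc(%zu, %#x) took %lld usec\n",
			size, (unsigned int)flags, usecs);
	return p;
}

A matching wrapper around alloc_page, plus switching the interesting
GSS call sites over to these, would show whether allocation stalls line
up with the dropped requests.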