Return-Path:
Received: from mail-qk0-f177.google.com ([209.85.220.177]:34368 "EHLO
        mail-qk0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S938511AbcJXNbM (ORCPT ); Mon, 24 Oct 2016 09:31:12 -0400
Received: by mail-qk0-f177.google.com with SMTP id x11so29763149qka.1
        for ; Mon, 24 Oct 2016 06:31:12 -0700 (PDT)
Message-ID: <1477315868.2625.37.camel@redhat.com>
Subject: Re: upstream server crash
From: Jeff Layton
To: Eryu Guan, "J. Bruce Fields"
Cc: Chuck Lever, linux-nfs@vger.kernel.org
Date: Mon, 24 Oct 2016 09:31:08 -0400
In-Reply-To: <20161024031519.GN2462@eguan.usersys.redhat.com>
References: <20161023182115.GA14481@fieldses.org>
        <20161024031519.GN2462@eguan.usersys.redhat.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Mon, 2016-10-24 at 11:15 +0800, Eryu Guan wrote:
> On Sun, Oct 23, 2016 at 02:21:15PM -0400, J. Bruce Fields wrote:
> >
> > I'm getting an intermittent crash in the nfs server as of
> > 68778945e46f143ed7974b427a8065f69a4ce944 "SUNRPC: Separate buffer
> > pointers for RPC Call and Reply messages".
> >
> > I haven't tried to understand that commit or why it would be a problem yet, I
> > don't see an obvious connection--I can take a closer look Monday.
> >
> > Could even be that I just landed on this commit by chance, the problem is a
> > little hard to reproduce so I don't completely trust my testing.
>
> I've hit the same crash on 4.9-rc1 kernel, and it's reproduced for me
> reliably by running xfstests generic/013 case, on a loopback mounted
> NFSv4.1 (or NFSv4.2), XFS is the underlying exported fs. More details
> please see
>
> http://marc.info/?l=linux-nfs&m=147714320129362&w=2
>

Looks like you landed at the same commit as Bruce, so that's probably
legit. That commit is very small though.
The only real change that doesn't affect the new field is this:

@@ -1766,7 +1766,7 @@ rpc_xdr_encode(struct rpc_task *task)
                      req->rq_buffer,
                      req->rq_callsize);
         xdr_buf_init(&req->rq_rcv_buf,
-                     (char *)req->rq_buffer + req->rq_callsize,
+                     req->rq_rbuffer,
                      req->rq_rcvsize);

So I'm guessing this is breaking the callback channel somehow?

-- 
Jeff Layton
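
As a rough illustration of why that one-line hunk could matter, here is a
userspace sketch. Everything in it is hypothetical: struct toy_rqst is a
cut-down stand-in for the request structure with only the four fields named in
the hunk, and the two toy allocators are invented for the example. It assumes
there could be a buffer allocation path that was never taught to set
rq_rbuffer; whether such a path actually exists on the callback channel is not
established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for the request structure. Only the
 * four fields named in the hunk are modelled; not the kernel struct. */
struct toy_rqst {
	void   *rq_buffer;   /* Call (send) buffer */
	void   *rq_rbuffer;  /* Reply (receive) buffer, the new field */
	size_t  rq_callsize;
	size_t  rq_rcvsize;
};

/* Before the hunk: the receive base is derived from rq_buffer. */
char *toy_rcv_base_old(const struct toy_rqst *req)
{
	assert(req->rq_buffer != NULL);
	return (char *)req->rq_buffer + req->rq_callsize;
}

/* After the hunk: the receive base is whatever is stored in rq_rbuffer. */
char *toy_rcv_base_new(const struct toy_rqst *req)
{
	return req->rq_rbuffer;
}

/* Hypothetical allocator that fills in both pointers. */
int toy_alloc_updated(struct toy_rqst *req)
{
	req->rq_buffer = calloc(1, req->rq_callsize + req->rq_rcvsize);
	if (req->rq_buffer == NULL)
		return -1;
	req->rq_rbuffer = (char *)req->rq_buffer + req->rq_callsize;
	return 0;
}

/* Hypothetical allocator that predates the new field and leaves
 * rq_rbuffer untouched (NULL in a zero-initialized request). */
int toy_alloc_stale(struct toy_rqst *req)
{
	req->rq_buffer = calloc(1, req->rq_callsize + req->rq_rcvsize);
	return req->rq_buffer == NULL ? -1 : 0;
}

/* Release the single allocation and clear both pointers. */
void toy_free(struct toy_rqst *req)
{
	free(req->rq_buffer);
	req->rq_buffer = NULL;
	req->rq_rbuffer = NULL;
}
```

With the updated allocator the old and new expressions give the same address.
With the stale one, the old expression still lands inside the allocation while
the new one yields NULL, which is the kind of difference that would only show
up on a path that allocates its buffers separately.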