From: Jason L Tibbitts III
To: "J. Bruce Fields"
Cc: Wolfgang Walter, linux-nfs@vger.kernel.org, km@cm4all.com,
    linux-kernel@vger.kernel.org
Subject: Re: Regression in 5.1.20: Reading long directory fails
References: <4418877.15LTP4gqqJ@stwm.de>
    <4198657.JbNDGbLXiX@h2o.as.studentenwerk.mhn.de>
    <20190906144837.GD17204@fieldses.org>
Date: Fri, 06 Sep 2019 15:47:24 -0500
In-Reply-To: <20190906144837.GD17204@fieldses.org> (J. Bruce Fields's
    message of "Fri, 6 Sep 2019 10:48:37 -0400")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
X-Mailing-List: linux-nfs@vger.kernel.org

>>>>> "JBF" == J Bruce Fields writes:

JBF> Those readdir changes were client-side, right?
JBF> Based on that I'd been assuming a client bug, but maybe it'd be
JBF> worth getting a full packet capture of the readdir reply to make
JBF> sure it's legit.

I have been working with bcodding on IRC for the past couple of days on
this.  Fortunately I was able to come up with a way to fill up a
directory such that it will fail with certainty and, as a bonus, doesn't
include any user data, so I can feel OK about sharing packet captures.
I have a capture alongside a kernel trace of the problematic operation
in https://www.math.uh.edu/~tibbs/nfs/.  Not that I can particularly
tell anything useful from that, but bcodding says that it seems to point
to some issue in sunrpc.

And because I can easily reproduce this, I was able to do a bisect:

2c94b8eca1a26cd46010d6e73a23da5f2e93a19d is the first bad commit
commit 2c94b8eca1a26cd46010d6e73a23da5f2e93a19d
Author: Chuck Lever
Date:   Mon Feb 11 11:25:41 2019 -0500

    SUNRPC: Use au_rslack when computing reply buffer size

    au_rslack is significantly smaller than (au_cslack << 2). Using
    that value results in smaller receive buffers. In some cases this
    eliminates an extra segment in Reply chunks (RPC/RDMA).

    Signed-off-by: Chuck Lever
    Signed-off-by: Anna Schumaker

:040000 040000 d4d1ce2fbe0035c5bd9df976b8c448df85dcb505 7011a792dfe72ff9cd70d66e45d353f3d7817e3e M net

But of course, I can't say whether this is the actual bad commit or
whether it just introduced a behavior change which alters the conditions
under which the problem appears.

And just to make sure that the blame doesn't lie with the old RHEL7
kernel, I rsynced the problematic directory over to a machine running
something slightly more modern (5.1.11, which I know I need to update,
but it's already set up to do kerberised NFS) and the same problem
exists, though the directory listing does fail at a different place.

 - J<