Subject: Re: User process NFS write hang in wait_on_commit with kworker
To: Alan Post, linux-nfs
From: Tom Talpey
Date: Fri, 5 Jul 2019 19:53:56 -0400
X-Mailing-List: linux-nfs@vger.kernel.org

On 7/3/2019 5:32 PM, Alan Post wrote:
> On Tue, Jul 02, 2019 at 05:55:10AM -0400, Benjamin Coddington wrote:
>>> As far as I understand it, for a particular xid, there should be a
>>> call and a reply.  The approach I took then was to pull out these
>>> fields from my capture and ignore RPC calls where both are present
>>> in my capture.  It seems this is simplistic, as the number of RPC
>>> calls I have without an attendant reply isn't lining up with my
>>> incident window.
>>
>> Does your capture report dropped packets?  If so, maybe you need to increase
>> the capture buffer.
>>
>
> I'm not certain, but I do have a capture on both the NFS server and
> the NFS client--comparing them would show me if I was under most
> circumstances.  Good catch.
>
>>> In one example, I have a series of READ calls which cease
>>> generating RPC reply messages as the offset for the file continues
>>> to increase.  After a couple/few dozen messages, the RPC replies
>>> continue as they were.  Is there a normal or routine explanation
>>> for this?
>>>
>>> RFC 5531 and the NetworkTracing page on wiki.linux-nfs.org have
>>> been quite helpful bringing me up to speed.  If any of you have
>>> advice or guidance or can clarify my understanding of how the
>>> call/reply RPC mechanism works I appreciate it.
>>
>> Seems like you understand it.  Do you have specific questions?
>>
>
> Is it true that for each RPC call there is an RPC reply with the
> same xid?  Is it a priori an error if an otherwise correct RPC
> call is not eventually paired with an RPC reply?

Absolutely yes. Not replying would be like a local procedure
never returning.

But remember XIDs are not globally unique. They are only unique
within some limited span of time for the connection they were
issued on. XID reuse is typically only a problem on very high IOPS
workloads, or over long spans of time.

Tom.
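
A minimal sketch of pairing calls with replies along these lines,
assuming the capture has already been reduced to one record per RPC
message. The RpcMsg record, the unanswered_calls helper, and the
60-second max_wait are hypothetical choices for illustration; they
do not come from this thread or from any capture tool:

```python
from collections import namedtuple

# Hypothetical record: one entry per RPC message extracted from a capture.
# conn identifies the connection (e.g. the address/port 4-tuple).
RpcMsg = namedtuple("RpcMsg", "time conn xid is_call")

def unanswered_calls(msgs, max_wait=60.0):
    """Return calls with no reply on the same connection within max_wait seconds."""
    pending = {}       # (conn, xid) -> earliest outstanding call
    unanswered = []
    for m in sorted(msgs, key=lambda m: m.time):
        key = (m.conn, m.xid)   # xid alone is not unique across connections
        if m.is_call:
            old = pending.get(key)
            if old is None:
                pending[key] = m
            elif m.time - old.time > max_wait:
                # Same xid much later: treat it as xid reuse, so the
                # older call is reported as never answered.
                unanswered.append(old)
                pending[key] = m
            # Otherwise assume a retransmission and keep the original call.
        else:
            call = pending.get(key)
            if call is not None and m.time - call.time <= max_wait:
                del pending[key]
            # Replies with no pending call, or arriving too late, are ignored.
    unanswered.extend(pending.values())
    return sorted(unanswered, key=lambda m: m.time)
```

Keying on (conn, xid) rather than xid alone, and bounding the match
by time, is what keeps a reused xid from being paired with the wrong
call. The right max_wait depends on the workload and is a guess here.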