Subject: Re: Spurious instability with NFSoRDMA under moderate load
To: Chuck Lever III
Cc: Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo
From: Timo Rothenpieler
Date: Thu, 12 Aug 2021 20:13:39 +0200
Message-ID: <4caff277-8e53-3c75-70c1-8938b2a26933@rothenpieler.org>
In-Reply-To: <3AF4F6CA-8B17-4AE9-82E2-21A2B9AA0774@oracle.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On 11.08.2021 19:30, Chuck Lever III wrote:
>
>> On Aug 11, 2021, at 12:20 PM, Timo Rothenpieler wrote:
>>
>> resulting dmesg and trace logs of both client and server are attached.
>>
>> Test procedure:
>>
>> - start tracing on client and server
>> - mount NFS on client
>> - immediately run 'xfs_io -fc "copy_range testfile" testfile.copy' (which succeeds)
>> - wait 10~15 minutes for the backchannel to time out (still running 5.12.19 with the fix for that reverted)
>> - run xfs_io command again, getting stuck now
>> - let it sit there stuck for a minute, then cancel it
>> - run the command again
>> - while it's still stuck, finished recording the logs and traces
>
> The server tries to send CB_OFFLOAD when the offloaded copy
> completes, but finds the backchannel transport is not connected.
>
> The server can't report the problem until the client sends a
> SEQUENCE operation, but there's really no other traffic going
> on, so it just waits.
>
> The client eventually sends a singleton SEQUENCE to renew its
> lease. The server replies with the SEQ4_STATUS_BACKCHANNEL_FAULT
> flag set at that point. Client's recovery is to destroy that
> session and create a new one. That appears to be successful.

If it re-created the session and the backchannel, shouldn't that mean
that after I cancel the first stuck xfs_io command and try it again
immediately (before the backchannel has had a chance to time out again),
it should work?
Because that's explicitly not the case: once the backchannel initially
times out, all subsequent commands get stuck, even if the system is
seeing other work on the NFS mount being done in parallel, and no matter
how often I retry, how long I wait in between, or how long I leave it
stuck.

> But the server doesn't send another CB_OFFLOAD to let the client
> know the copy is complete, so the client hangs.
>
> This seems to be peculiar to COPY_OFFLOAD, but I wonder if the
> other CB operations suffer from the same "failed to retransmit
> after the CB path is restored" issue. It might not matter for
> some of them, but for others like CB_RECALL, that could be
> important.
>
> --
> Chuck Lever
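
In case it helps anyone repeat the "run it, let it sit stuck, cancel, run it again" steps from the test procedure without watching the terminal, here is a minimal sketch of a wrapper that does so. The xfs_io command line is the one from the procedure. The helper name run_with_timeout, the 60 second limit and the 5 second grace period are my own assumptions, and the script has not been run against the affected setup.

```python
import shlex
import subprocess

# Command taken verbatim from the test procedure above.
COPY_CMD = 'xfs_io -fc "copy_range testfile" testfile.copy'


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome as "completed", "failed" or "stuck".

    "stuck" means the command did not exit within timeout_s seconds
    (hypothetical threshold) and was cancelled, as done by hand in the
    test procedure.
    """
    proc = subprocess.Popen(shlex.split(cmd))
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()  # equivalent to cancelling the command manually
        try:
            proc.wait(timeout=5)  # assumed grace period before SIGKILL
        except subprocess.TimeoutExpired:
            # Not waited on afterwards: a process blocked in the kernel
            # might never exit, and this helper should not hang too.
            proc.kill()
        return "stuck"
    return "completed" if rc == 0 else "failed"
```

Calling run_with_timeout(COPY_CMD, 60) from a directory on the NFS mount once right after mounting, and again after the backchannel has timed out, would be expected to give "completed" and then "stuck" if the behaviour described above reproduces.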