From: Timo Rothenpieler
To: Chuck Lever III
Cc: Linux NFS Mailing List, Olga Kornievskaia, Dai Ngo
Date: Fri, 29 Oct 2021 20:17:46 +0200
Subject: Re: Spurious instability with NFSoRDMA under moderate load
Message-ID: <774c4839-0165-e660-bbc4-9a8814192f26@rothenpieler.org>
In-Reply-To: <2445C26E-7D96-4E77-8079-98B865CC4C57@oracle.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On 29.10.2021 17:14, Chuck Lever III wrote:
> Hi Timo-
>
>> On Oct 29, 2021, at 7:47 AM, Timo Rothenpieler wrote:
>>
>> On 20/08/2021 17:12, Chuck Lever III wrote:
>>> OK, I think the issue with this reproducer was resolved
>>> completely with 6820bf77864d.
>>>
>>> I went back and reviewed the traces from when the client got
>>> stuck after a long uptime. This looks very different from
>>> what we're seeing with 6820bf77864d. It involves CB_PATH_DOWN
>>> and BIND_CONN_TO_SESSION, which is a different scenario. Long
>>> story short, I don't think we're getting any more value by
>>> leaving 6820bf77864d reverted.
>>>
>>> Can you re-apply that commit on your server, and then when
>>> the client hangs again, please capture with:
>>>
>>> # trace-cmd record -e nfsd -e sunrpc -e rpcrdma
>>>
>>> I'd like to see why the client's BIND_CONN_TO_SESSION fails
>>> to repair the backchannel session.
>>
>> Happened again today, after a long time of no issues.
>> Still on 5.12.19, since the system did not have a chance for a bigger maintenance window yet.
>>
>> Attached are traces from both client and server, while the client is trying to do the usual xfs_io copy_range.
>> The system also has a bunch of other users and nodes working on it at this time, so there's a good chance for unrelated noise in the traces.
>>
>> The affected client is 10.110.10.251.
>> Other clients are working just fine, it's only this one client that's affected.
>>
>> There was also quite a bit of heavy IO work going on on the Cluster, which I think coincides with the last couple times this happened as well.
>
> Thanks for the report. We believe this issue has been addressed in v5.15-rc:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=02579b2ff8b0becfb51d85a975908ac4ab15fba8

5.15 is a little too bleeding edge for my comfort to roll out on a production system.
But the patch applies cleanly on top of 5.12.19.
So I pulled it and am now running the resulting kernel on all clients and the server(s).

Hopefully won't see this happen again from now on, thanks!

Timo
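
For readers who want to try the same approach, here is a minimal sketch of how a single upstream fix could be applied on top of an older release. It is not taken from the thread: it assumes a local kernel git tree in which both the v5.12.19 tag and the linked mainline commit are already fetched (for example, a clone with both the stable and mainline remotes), and the branch name `backport-fix` is hypothetical. The script only prints the commands, so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch only: print the git commands for backporting one upstream fix.
# Assumption: the current directory is a kernel tree that already contains
# both the v5.12.19 tag and the mainline commit referenced in the thread.
set -eu

FIX_COMMIT=02579b2ff8b0becfb51d85a975908ac4ab15fba8  # commit linked in the thread
BASE_TAG=v5.12.19                                     # kernel version named in the thread
BRANCH=backport-fix                                   # hypothetical branch name

backport_plan() {
    # Start a new branch at the release tag the systems are running.
    echo "git checkout -b $BRANCH $BASE_TAG"
    # Apply the fix; -x records the original commit id in the new commit message.
    echo "git cherry-pick -x $FIX_COMMIT"
}

backport_plan
```

If the cherry-pick stops with conflicts, the patch does not apply cleanly to that base and would need manual adjustment before building.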