Subject: Re: NFS over RDMA issues on Linux 5.4
From: Timo Rothenpieler
To: Leon Romanovsky, Chuck Lever
Cc: Linux NFS Mailing List, linux-rdma
Date: Tue, 4 Aug 2020 12:52:27 +0200
Message-ID: <92a5a932-b843-eed3-555e-7557ccc1f308@rothenpieler.org>
In-Reply-To: <20200804093635.GA4432@unreal>
X-Mailing-List: linux-nfs@vger.kernel.org

On 04.08.2020 11:36, Leon Romanovsky wrote:
> On Mon, Aug 03, 2020 at 12:24:21PM -0400, Chuck Lever wrote:
>> Hi Timo-
>>
>>> On Aug 3, 2020, at 11:05 AM, Timo Rothenpieler wrote:
>>>
>>> Hello,
>>>
>>> I have just deployed a new system with Mellanox ConnectX-4 VPI EDR IB cards and wanted to setup NFS over RDMA on it.
>>>
>>> However, while mounting the FS over RDMA works fine, actually using it results in the following messages absolutely hammering dmesg on both client and server:
>>>
>>>> https://gist.github.com/BtbN/9582e597b6581f552fa15982b0285b80#file-server-log
>>>
>>> The spam only stops once I forcibly reboot the client. The filesystem gets nowhere during all this. The retrans counter in nfsstat just keeps going up, nothing actually gets done.
>>>
>>> This is on Linux 5.4.54, using nfs-utils 2.4.3.
>>> The mlx5 driver had enhanced-mode disabled in order to enable IPoIB connected mode with an MTU of 65520.
>>>
>>> Normal NFS 4.2 over tcp works perfectly fine on this setup, it's only when I mount via rdma that things go wrong.
>>>
>>> Is this an issue on my end, or did I run into a bug somewhere here?
>>> Any pointers, patches and solutions to test are welcome.
>>
>> I haven't seen that failure mode here, so best I can recommend is
>> keep investigating. I've copied linux-rdma in case they have any
>> advice.
>
> The mentioning of IPoIB is a slightly confusing in the context of NFS-over-RDMA.
> Are you running NFS over IPoIB?

As far as I'm aware, NFS over RDMA still needs an IP address and port to target, so IPoIB is mandatory? At least the admin guide in the kernel says so.

Right now I actually am running NFS over IPoIB (without RDMA) because of the issue at hand, and I would like to turn on RDMA for better performance.

> From brief look on CQE error syndrome (local length error), the client sends wrong WQE.

Does that point to an issue in the kernel code, or to something I did wrong?

The fstab entries for these mounts look like this:

10.110.10.200:/home /home nfs4 rw,rdma,port=20049,noatime,async,vers=4.2,_netdev 0 0

Is there anything more I can investigate?
I tried turning connected mode off and lowering the MTU in turn, but neither had any effect.
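
For anyone trying to reproduce this, the fstab entry quoted above can be sanity-checked with a small script. This is only a sketch under assumptions: the names `FstabEntry`, `parse_fstab_line` and `check_nfs_rdma` are made up for illustration, and port 20049 is treated as the expected value simply because that is what the entry above uses. It checks the mount options only and says nothing about the CQE errors themselves.

```python
from typing import Dict, List, NamedTuple


class FstabEntry(NamedTuple):
    """The first four fields of one fstab line (hypothetical helper type)."""
    device: str
    mountpoint: str
    fstype: str
    options: Dict[str, str]


def parse_fstab_line(line: str) -> FstabEntry:
    """Split a single fstab line into its fields and an option dictionary."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 whitespace-separated fields")
    device, mountpoint, fstype, raw_opts = fields[:4]
    options: Dict[str, str] = {}
    for opt in raw_opts.split(","):
        # "port=20049" -> ("port", "20049"); a bare flag like "rdma" -> ("rdma", "")
        key, _, value = opt.partition("=")
        options[key] = value
    return FstabEntry(device, mountpoint, fstype, options)


def check_nfs_rdma(entry: FstabEntry, expected_port: str = "20049") -> List[str]:
    """Return a list of problems; an empty list means the options look consistent."""
    problems: List[str] = []
    if not entry.fstype.startswith("nfs"):
        problems.append(f"filesystem type is {entry.fstype!r}, not an NFS type")
    # Accept either the bare "rdma" option or the "proto=rdma" spelling.
    if "rdma" not in entry.options and entry.options.get("proto") != "rdma":
        problems.append("neither 'rdma' nor 'proto=rdma' is set")
    if entry.options.get("port") != expected_port:
        problems.append(f"port is not {expected_port}")
    return problems


if __name__ == "__main__":
    line = ("10.110.10.200:/home /home nfs4 "
            "rw,rdma,port=20049,noatime,async,vers=4.2,_netdev 0 0")
    problems = check_nfs_rdma(parse_fstab_line(line))
    print(problems or "mount options look consistent for NFS over RDMA")
```

The entry above passes this check, which matches the observation that the mount itself succeeds and only the actual I/O goes wrong.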