Subject: Re: [PATCH] mlx4_ib: Increase the timeout for CM cache
From: Håkon Bugge
Date: Fri, 1 Feb 2019 16:18:34 +0100
To: Doug Ledford, Sean Hefty, Hal Rosenstock
Cc: OFED mailing list, linux-kernel@vger.kernel.org
Message-Id: <9BDCB055-BBB7-497F-BA28-F092051E4B22@oracle.com>
In-Reply-To: <20190131170951.178676-1-haakon.bugge@oracle.com>
References: <20190131170951.178676-1-haakon.bugge@oracle.com>

Sorry, I posted this as if it were a net patch. It isn't, hence this resend
with the correct recipients.

Thanks,
Håkon

> On 31 Jan 2019, at 18:09, Håkon Bugge wrote:
>
> Using CX-3 virtual functions, either from a bare-metal machine or
> pass-through from a VM, MAD packets are proxied through the PF driver.
>
> Since the VMs have separate name spaces for MAD Transaction Ids
> (TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping
> in a cache.
>
> Following the RDMA CM protocol, it is clear when an entry has to be
> evicted from the cache. But life is not perfect; remote peers may die
> or be rebooted. Hence, there is a timeout to wipe out a cache entry
> when the PF driver assumes the remote peer has gone.
>
> We have experienced an excessive number of DREQ retries during
> fail-over testing, when running with eight VMs per database server.
>
> The problem has been reproduced in a bare-metal system using one VM
> per physical node. In this environment, running 256 processes in each
> VM, each process uses RDMA CM to create an RC QP between itself and
> all (256) remote processes. All in all 16K QPs.
>
> When tearing down these 16K QPs, excessive DREQ retries (and
> duplicates) are observed. With some cat/paste/awk wizardry on the
> infiniband_cm sysfs, we observe:
>
>       dreq:  5007
>   cm_rx_msgs:
>       drep:  3838
>       dreq: 13018
>        rep:  8128
>        req:  8256
>        rtu:  8256
>   cm_tx_msgs:
>       drep:  8011
>       dreq: 68856
>        rep:  8256
>        req:  8128
>        rtu:  8128
>   cm_tx_retries:
>       dreq: 60483
>
> Note that the active/passive side is distributed.
>
> Enabling pr_debug in cm.c gives tons of:
>
> [171778.814239] mlx4_ib_multiplex_cm_handler: id{slave:
> 1,sl_cm_id: 0xd393089f} is NULL!
>
> By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
> tear-down phase of the application is reduced from 113 to 67
> seconds. Retries/duplicates are also significantly reduced:
>
>   cm_rx_duplicates:
>       dreq:  7726
>   []
>   cm_tx_retries:
>       drep:     1
>       dreq:  7779
>
> Increasing the timeout further didn't help, as these duplicates and
> retries stem from a CMA timeout that is too short, which was 20
> (~4 seconds) on the systems. By increasing the CMA timeout to 22
> (~17 seconds), the numbers fell to about one hundred for both of them.
>
> Adjustment of the CMA timeout is _not_ part of this commit.
>
> Signed-off-by: Håkon Bugge
> ---
>  drivers/infiniband/hw/mlx4/cm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
> index fedaf8260105..8c79a480f2b7 100644
> --- a/drivers/infiniband/hw/mlx4/cm.c
> +++ b/drivers/infiniband/hw/mlx4/cm.c
> @@ -39,7 +39,7 @@
>
>  #include "mlx4_ib.h"
>
> -#define CM_CLEANUP_CACHE_TIMEOUT  (5 * HZ)
> +#define CM_CLEANUP_CACHE_TIMEOUT  (30 * HZ)
>
>  struct id_map_entry {
>  	struct rb_node node;
> --
> 2.20.1
>
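
A note on the CMA timeout figures quoted above: the values 20 and 22 look like exponents rather than seconds. The sketch below assumes the usual InfiniBand timeout encoding of 4.096 us * 2^value, which is not stated in the patch itself but matches the "~4 seconds" and "~17 seconds" given. The helper name cm_timeout_seconds is hypothetical and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: convert a timeout exponent
 * to seconds, ASSUMING the encoding 4.096 us * 2^exponent.
 */
static double cm_timeout_seconds(unsigned int exponent)
{
	/* Keep the shift well defined; this sketch only handles small exponents. */
	assert(exponent < 32);
	return 4.096e-6 * (double)(UINT64_C(1) << exponent);
}
```

Under that assumption, cm_timeout_seconds(20) comes to roughly 4.29 s and cm_timeout_seconds(22) to roughly 17.18 s. Each step of the exponent doubles the timeout, so going from 20 to 22 quadruples it.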