Subject: Re: [PATCH] RDMA/cma: Make CM response timeout and # CM retries configurable
To: Jason Gunthorpe, Håkon Bugge
Cc: Doug Ledford, Leon Romanovsky, Parav Pandit, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20190217170909.1178575-1-haakon.bugge@oracle.com> <20190222163637.GA9819@ziepe.ca>
From: Steve Wise
Date: Fri, 22 Feb 2019 11:14:44 -0600
In-Reply-To: <20190222163637.GA9819@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2/22/2019 10:36 AM, Jason Gunthorpe wrote:
> On Sun, Feb 17, 2019 at 06:09:09PM +0100, Håkon Bugge wrote:
>> During certain workloads, the default CM response timeout is too
>> short, leading to excessive retries. Hence, make it configurable
>> through sysctl. While at it, also make number of CM retries
>> configurable.
>>
>> The defaults are not changed.
>>
>> Signed-off-by: Håkon Bugge
>>  drivers/infiniband/core/cma.c | 51 ++++++++++++++++++++++++++++++-----
>>  1 file changed, 44 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>> index c43512752b8a..ce99e1cd1029 100644
>> +++ b/drivers/infiniband/core/cma.c
>> @@ -43,6 +43,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>
>>  #include
>> @@ -68,13 +69,46 @@ MODULE_AUTHOR("Sean Hefty");
>>  MODULE_DESCRIPTION("Generic RDMA CM Agent");
>>  MODULE_LICENSE("Dual BSD/GPL");
>>
>> -#define CMA_CM_RESPONSE_TIMEOUT 20
>>  #define CMA_QUERY_CLASSPORT_INFO_TIMEOUT 3000
>> -#define CMA_MAX_CM_RETRIES 15
>>  #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24)
>>  #define CMA_IBOE_PACKET_LIFETIME 18
>>  #define CMA_PREFERRED_ROCE_GID_TYPE IB_GID_TYPE_ROCE_UDP_ENCAP
>>
>> +#define CMA_DFLT_CM_RESPONSE_TIMEOUT 20
>> +static int cma_cm_response_timeout = CMA_DFLT_CM_RESPONSE_TIMEOUT;
>> +static int cma_cm_response_timeout_min = 8;
>> +static int cma_cm_response_timeout_max = 31;
>> +#undef CMA_DFLT_CM_RESPONSE_TIMEOUT
>> +
>> +#define CMA_DFLT_MAX_CM_RETRIES 15
>> +static int cma_max_cm_retries = CMA_DFLT_MAX_CM_RETRIES;
>> +static int cma_max_cm_retries_min = 1;
>> +static int cma_max_cm_retries_max = 100;
>> +#undef CMA_DFLT_MAX_CM_RETRIES
>> +
>> +static struct ctl_table_header *cma_ctl_table_hdr;
>> +static struct ctl_table cma_ctl_table[] = {
>> +	{
>> +		.procname	= "cma_cm_response_timeout",
>> +		.data		= &cma_cm_response_timeout,
>> +		.maxlen		= sizeof(cma_cm_response_timeout),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec_minmax,
>> +		.extra1		= &cma_cm_response_timeout_min,
>> +		.extra2		= &cma_cm_response_timeout_max,
>> +	},
>> +	{
>> +		.procname	= "cma_max_cm_retries",
>> +		.data		= &cma_max_cm_retries,
>> +		.maxlen		= sizeof(cma_max_cm_retries),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec_minmax,
>> +		.extra1		= &cma_max_cm_retries_min,
>> +		.extra2		= &cma_max_cm_retries_max,
>> +	},
>> +	{ }
>> +};
> Is sysctl the right approach here? Should it be rdma tool instead?
>
> Jason

There are other rdma sysctls currently: net.rdma_ucm.max_backlog and
net.iw_cm.default_backlog. The core network stack seems to use sysctl
and not ip tool to set basically globals.

To use rdma tool, we'd have to have some concept of a "module" object,
I guess. IE there's dev, link, and resource rdma tool objects
currently. But these cma timeout settings are really not per dev,
link, nor a resource. Maybe we have just a "core" object: rdma core
set cma_max_cm_retries min 8 max 30.
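
For anyone following the min/max part of the quoted table, here is a minimal userspace sketch (not kernel code) of the bounds check that the patch hands off to proc_dointvec_minmax through .extra1 and .extra2. The names bounded_int and bounded_int_set are hypothetical and exist only for illustration. The sketch assumes an out-of-range write is rejected with -EINVAL and the previous value is left in place.

```c
#include <errno.h>

/*
 * Hypothetical userspace model of one bounded integer tunable.
 * It mirrors the .data/.extra1/.extra2 trio in the quoted ctl_table
 * entries; it is not the kernel's own data structure.
 */
struct bounded_int {
	const char *name;	/* e.g. "cma_cm_response_timeout" */
	int value;		/* current setting */
	int min;		/* lowest accepted value (role of .extra1) */
	int max;		/* highest accepted value (role of .extra2) */
};

/*
 * Store new_value if it lies within [min, max].
 * Returns 0 on success. Returns -EINVAL and leaves the current
 * value untouched when new_value is out of range.
 */
int bounded_int_set(struct bounded_int *t, int new_value)
{
	if (new_value < t->min || new_value > t->max)
		return -EINVAL;

	t->value = new_value;
	return 0;
}
```

With the limits from the quoted patch (default 20, min 8, max 31 for cma_cm_response_timeout), a call such as bounded_int_set(&t, 25) would succeed, while bounded_int_set(&t, 32) would fail and keep the earlier value.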