From: Qing Huang
Subject: Re: [PATCH v3] mlx4_core: allocate ICM memory in page size chunks
To: Tariq Toukan, Eric Dumazet, davem@davemloft.net, haakon.bugge@oracle.com, yanjun.zhu@oracle.com
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, gi-oh.kim@profitbricks.com
References: <20180517205343.8401-1-qing.huang@oracle.com> <19b7818e-16f6-2349-dc34-245c2f215f6f@oracle.com> <35ba0f14-7b24-96ff-6b2d-610a4b2980c2@mellanox.com>
Message-ID: <4b7a4f67-2c08-a60d-81cd-f12db42622ec@oracle.com>
In-Reply-To: <35ba0f14-7b24-96ff-6b2d-610a4b2980c2@mellanox.com>
Date: Tue, 22 May 2018 18:41:39 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/22/2018 8:33 AM, Tariq Toukan wrote:
>
> On 18/05/2018 12:45 AM,
Qing Huang wrote:
>>
>> On 5/17/2018 2:14 PM, Eric Dumazet wrote:
>>> On 05/17/2018 01:53 PM, Qing Huang wrote:
>>>> When a system is under memory pressure (high usage with fragments),
>>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>>> memory management to enter slow path doing memory compact/migration
>>>> ops in order to complete high order memory allocations.
>>>>
>>>> When that happens, user processes calling uverb APIs may get stuck
>>>> for more than 120s easily even though there are a lot of free pages
>>>> in smaller chunks available in the system.
>>>>
>>>> Syslog:
>>>> ...
>>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>>> ...
>>>>
>>> NACK on this patch.
>>>
>>> You have been asked repeatedly to use kvmalloc()
>>>
>>> This is not a minor suggestion.
>>>
>>> Take a look
>>> at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d8c13f2271ec5178c52fbde072ec7b562651ed9d
>>>
>>
>> Would you please take a look at how table->icm is being used in the
>> mlx4 driver? It's a meta data used for individual pointer variable
>> referencing,
>> not as data frag or in/out buffer. It has no need for contiguous phy.
>> memory.
>>
>> Thanks.
>>
>
> NACK.
>
> This would cause a degradation when iterating the entries of table->icm.
> For example, in mlx4_table_get_range.

E.g.

int mlx4_table_get_range(struct mlx4_dev *dev, struct mlx4_icm_table *table,
                         u32 start, u32 end)
{
        int inc = MLX4_TABLE_CHUNK_SIZE / table->obj_size;
        int err;
        u32 i;

        for (i = start; i <= end; i += inc) {
                err = mlx4_table_get(dev, table, i);
                if (err)
                        goto fail;
        }

        return 0;
...
}

E.g. mtt obj is 8 bytes, so a 4KB ICM block would have 512 mtt objects.
So you will have to allocate 512 more mtt objects before the table->icm
pointer increments by 1 to fetch the next pointer value. A 4KB page of
table->icm holds 512 pointers (8 bytes each on a 64-bit system), so
512 * 512 = 256K mtt objects are needed in order to traverse the
table->icm pointer across a page boundary in the call stacks.

Considering mlx4_table_get_range() is only used in the control path, there
is no significant gain from using kvzalloc vs. vzalloc for table->icm.

Anyway, if a user makes sure the mlx4 driver is loaded very early and
doesn't remove and reload it afterwards, we should have enough contiguous
phy mem for the table->icm allocation, without wasting any.

I will use kvzalloc to replace vzalloc and send a V4 patch.

Thanks,
Qing

>
> Thanks,
> Tariq
>
>>> And you'll understand some people care about this.
>>>
>>> Strongly.
>>>
>>> Thanks.
>>>
>>
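
For reference, the arithmetic in my reply can be checked with a small userspace sketch. This is not driver code: the helper names are hypothetical, the 4KB chunk/page size and 8-byte mtt object come from the discussion above, and the 8-byte pointer size is an assumption (64-bit kernel).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; nothing here exists in
 * the mlx4 driver. Only the sizes are taken from the thread.
 */

/* Objects covered by one ICM chunk, mirroring
 * "inc = MLX4_TABLE_CHUNK_SIZE / table->obj_size" in the quoted loop. */
uint64_t objs_per_chunk(uint64_t chunk_size, uint64_t obj_size)
{
	assert(obj_size != 0 && chunk_size >= obj_size);
	return chunk_size / obj_size;
}

/* Number of table->icm pointer slots that fit in one page. */
uint64_t ptrs_per_page(uint64_t page_size, uint64_t ptr_size)
{
	assert(ptr_size != 0 && page_size >= ptr_size);
	return page_size / ptr_size;
}

/* Objects that must be walked before the index into table->icm moves
 * from one page of the pointer array to the next. */
uint64_t objs_per_icm_page(uint64_t page_size, uint64_t ptr_size,
			   uint64_t chunk_size, uint64_t obj_size)
{
	return ptrs_per_page(page_size, ptr_size) *
	       objs_per_chunk(chunk_size, obj_size);
}
```

With these numbers, objs_per_chunk(4096, 8) is 512 and objs_per_icm_page(4096, 8, 4096, 8) is 262144, i.e. the 256K mtt objects mentioned above.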