Subject: Re: [PATCH] mlx4_core: allocate 4KB ICM chunks
To: Yanjun Zhu, tariqt@mellanox.com, davem@davemloft.net
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20180510233143.7236-1-qing.huang@oracle.com> <6454d5eb-2700-4fe2-fd91-d366112c4674@oracle.com> <4a23dac7-fafe-3b1c-7284-75f3a38f420c@oracle.com> <6768e075-70f5-4de3-a98a-fdffa53e0a2f@oracle.com>
From: Qing Huang
Date: Thu, 10 May 2018 18:36:58 -0700
In-Reply-To: <6768e075-70f5-4de3-a98a-fdffa53e0a2f@oracle.com>

Thank you for reviewing it!

On 5/10/2018 6:23 PM, Yanjun Zhu wrote:
>
>
>
> On 2018/5/11 9:15, Qing Huang wrote:
>>
>>
>>
>> On 5/10/2018 5:13 PM, Yanjun Zhu wrote:
>>>
>>>
>>> On 2018/5/11 7:31, Qing Huang wrote:
>>>> When a system is under memory pressure (high usage with fragments),
>>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>>> memory management to enter slow path doing memory compact/migration
>>>> ops in order to complete high order memory allocations.
>>>>
>>>> When that happens, user processes calling uverb APIs may get stuck
>>>> for more than 120s easily even though there are a lot of free pages
>>>> in smaller chunks available in the system.
>>>>
>>>> Syslog:
>>>> ...
>>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>>> ...
>>>>
>>>> With 4KB ICM chunk size, the above issue is fixed.
>>>>
>>>> However in order to support 4KB ICM chunk size, we need to fix another
>>>> issue in large size kcalloc allocations.
>>>>
>>>> E.g.
>>>> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
>>>> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
>>>> entry). So we need a 16MB allocation for a table->icm pointer array to
>>>> hold 2M pointers which can easily cause kcalloc to fail.
>>>>
>>>> The solution is to use vzalloc to replace kcalloc. There is no need
>>>> for contiguous memory pages for a driver meta data structure (no need
>>> Hi,
>>>
>>> Replace continuous memory pages with virtual memory, is there any
>>> performance loss?
>>
>> Not really. "table->icm" will be accessed as individual pointer
>> variables randomly. Kcalloc
>
> Sure. Thanks. If "table->icm" will be accessed as individual pointer
> variables randomly, the performance loss
> caused by discontinuous memory will be very trivial.
>
> Reviewed-by: Zhu Yanjun
>
>> also returns a virtual address except its mapped pages are guaranteed
>> to be contiguous
>> which will provide little advantage over vzalloc for individual
>> pointer variable access.
>>
>> Qing
>>
>>>
>>> Zhu Yanjun
>>>> of DMA ops).
>>>>
>>>> Signed-off-by: Qing Huang
>>>> Acked-by: Daniel Jurgens
>>>> ---
>>>>  drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
>>>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>> index a822f7a..2b17a4b 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
>>>> @@ -43,12 +43,12 @@
>>>>  #include "fw.h"
>>>>
>>>>  /*
>>>> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
>>>> - * per chunk.
>>>> + * We allocate in 4KB page size chunks to avoid high order memory
>>>> + * allocations in fragmented/high usage memory situation.
>>>>   */
>>>>  enum {
>>>> -    MLX4_ICM_ALLOC_SIZE    = 1 << 18,
>>>> -    MLX4_TABLE_CHUNK_SIZE    = 1 << 18
>>>> +    MLX4_ICM_ALLOC_SIZE    = 1 << 12,
>>>> +    MLX4_TABLE_CHUNK_SIZE    = 1 << 12
>>>>  };
>>>>
>>>>  static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
>>>> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>>>>      obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
>>>>      num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>>>>
>>>> -    table->icm      = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
>>>> +    table->icm      = vzalloc(num_icm * sizeof(*table->icm));
>>>>      if (!table->icm)
>>>>          return -ENOMEM;
>>>>      table->virt     = virt;
>>>> @@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>>>>              mlx4_free_icm(dev, table->icm[i], use_coherent);
>>>>          }
>>>>
>>>> -    kfree(table->icm);
>>>> +    vfree(table->icm);
>>>>
>>>>      return -ENOMEM;
>>>>  }
>>>> @@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table)
>>>>              mlx4_free_icm(dev, table->icm[i], table->coherent);
>>>>          }
>>>>
>>>> -    kfree(table->icm);
>>>> +    vfree(table->icm);
>>>>  }
>>>
>>
>