Subject: Re: [PATCH V2] mlx4_core: allocate ICM memory in page size chunks
To: Tariq Toukan, tariqt@mellanox.com, davem@davemloft.net, haakon.bugge@oracle.com, yanjun.zhu@oracle.com
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20180511192318.22342-1-qing.huang@oracle.com> <2797ac27-022c-0818-388c-e4a6131ad1ca@gmail.com>
From: Qing Huang
Date: Mon, 14 May 2018 09:41:49 -0700
In-Reply-To: <2797ac27-022c-0818-388c-e4a6131ad1ca@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/13/2018 2:00 AM, Tariq Toukan wrote:
>
>
> On 11/05/2018 10:23 PM, Qing Huang wrote:
>> When a system is under memory pressure (high usage with fragments),
>> the original 256KB ICM chunk allocations will
>> likely trigger kernel
>> memory management to enter slow path doing memory compact/migration
>> ops in order to complete high order memory allocations.
>>
>> When that happens, user processes calling uverb APIs may get stuck
>> for more than 120s easily even though there are a lot of free pages
>> in smaller chunks available in the system.
>>
>> Syslog:
>> ...
>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>> oracle_205573_e:205573 blocked for more than 120 seconds.
>> ...
>>
>> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
>>
>> However in order to support smaller ICM chunk size, we need to fix
>> another issue in large size kcalloc allocations.
>>
>> E.g.
>> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
>> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
>> entry). So we need a 16MB allocation for a table->icm pointer array to
>> hold 2M pointers which can easily cause kcalloc to fail.
>>
>> The solution is to use vzalloc to replace kcalloc. There is no need
>> for contiguous memory pages for a driver meta data structure (no need
>> of DMA ops).
>>
>> Signed-off-by: Qing Huang
>> Acked-by: Daniel Jurgens
>> Reviewed-by: Zhu Yanjun
>> ---
>> v2 -> v1: adjusted chunk size to reflect different architectures.
>>
>>   drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
>>   1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c
>> b/drivers/net/ethernet/mellanox/mlx4/icm.c
>> index a822f7a..ccb62b8 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
>> @@ -43,12 +43,12 @@
>>   #include "fw.h"
>>
>>   /*
>> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
>> - * per chunk.
>> + * We allocate in page size (default 4KB on many archs) chunks to
>> avoid high
>> + * order memory allocations in fragmented/high usage memory situation.
>>    */
>>   enum {
>> -    MLX4_ICM_ALLOC_SIZE    = 1 << 18,
>> -    MLX4_TABLE_CHUNK_SIZE    = 1 << 18
>> +    MLX4_ICM_ALLOC_SIZE    = 1 << PAGE_SHIFT,
>> +    MLX4_TABLE_CHUNK_SIZE    = 1 << PAGE_SHIFT
>
> Which is actually PAGE_SIZE.

Yes, we wanted to avoid high order memory allocations.

> Also, please add a comma at the end of the last entry.

Hmm, I followed the existing code style, and checkpatch.pl didn't complain about the comma.

>
>>   };
>>
>>   static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct
>> mlx4_icm_chunk *chunk)
>> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev,
>> struct mlx4_icm_table *table,
>>       obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
>>       num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>>
>> -    table->icm      = kcalloc(num_icm, sizeof(*table->icm),
>> GFP_KERNEL);
>> +    table->icm      = vzalloc(num_icm * sizeof(*table->icm));
>
> Why not kvzalloc ?

I think the table->icm pointer array doesn't really need physically contiguous memory. Sometimes high order memory allocations by kmalloc variants may trigger the slow path and cause tasks to be blocked.

Thanks,
Qing

>
>>       if (!table->icm)
>>           return -ENOMEM;
>>       table->virt     = virt;
>> @@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev,
>> struct mlx4_icm_table *table,
>>               mlx4_free_icm(dev, table->icm[i], use_coherent);
>>           }
>>
>> -    kfree(table->icm);
>> +    vfree(table->icm);
>>
>>       return -ENOMEM;
>>   }
>> @@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev,
>> struct mlx4_icm_table *table)
>>               mlx4_free_icm(dev, table->icm[i], table->coherent);
>>           }
>>
>> -    kfree(table->icm);
>> +    vfree(table->icm);
>>   }
>>
>
> Thanks for your patch.
>
> I need to verify there is no dramatic performance degradation here.
> You can prepare and send a v3 in the meanwhile.
>
> Thanks,
> Tariq
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
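
For reference, the sizing arithmetic from the commit message (log_num_mtt=30, 8 bytes per mtt entry) can be checked with a small userspace sketch. This is not driver code: the helper names icm_num_chunks(), icm_ptr_array_bytes() and icm_example_check() are made up for illustration, and 8-byte pointers (a 64-bit build) are assumed. The rounding mirrors the num_icm computation in the quoted mlx4_init_icm_table() hunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of mlx4: number of ICM chunks needed to
 * hold nobj objects of obj_size bytes with chunks of chunk_size bytes.
 * Same round-up as num_icm in the quoted hunk.
 */
static uint64_t icm_num_chunks(uint64_t nobj, uint64_t obj_size,
			       uint64_t chunk_size)
{
	uint64_t obj_per_chunk = chunk_size / obj_size;

	return (nobj + obj_per_chunk - 1) / obj_per_chunk;
}

/*
 * Hypothetical helper: bytes needed for the table->icm pointer array,
 * assuming one 8-byte pointer per chunk.
 */
static uint64_t icm_ptr_array_bytes(uint64_t num_chunks)
{
	return num_chunks * 8;
}

/* Checks the figures quoted in the commit message. */
static void icm_example_check(void)
{
	uint64_t nobj = 1ULL << 30;	/* log_num_mtt=30 -> 1G mtt entries */
	uint64_t chunks_4k = icm_num_chunks(nobj, 8, 1ULL << 12);
	uint64_t chunks_256k = icm_num_chunks(nobj, 8, 1ULL << 18);

	/* 4KB chunks: 512 entries each, 2M pointers, 16MB array. */
	assert(chunks_4k == (2ULL << 20));
	assert(icm_ptr_array_bytes(chunks_4k) == (16ULL << 20));

	/* 256KB chunks: 32768 entries each, 32K pointers, 256KB array. */
	assert(chunks_256k == (32ULL << 10));
	assert(icm_ptr_array_bytes(chunks_256k) == (256ULL << 10));
}
```

With the old 256KB chunk size the pointer array is only 256KB, which is why the large kcalloc only becomes a problem once the chunk size drops to 4KB.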