Subject: Re: [PATCH V2] mlx4_core: allocate ICM memory in page size chunks
To: Qing Huang, tariqt@mellanox.com, davem@davemloft.net, haakon.bugge@oracle.com, yanjun.zhu@oracle.com
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20180511192318.22342-1-qing.huang@oracle.com>
From: Tariq Toukan
Message-ID: <2797ac27-022c-0818-388c-e4a6131ad1ca@gmail.com>
Date: Sun, 13 May 2018 12:00:35 +0300
In-Reply-To: <20180511192318.22342-1-qing.huang@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/05/2018 10:23 PM, Qing Huang wrote:
> When a system is under memory pressure (high usage with fragments),
> the original 256KB ICM chunk allocations will likely trigger kernel
> memory management to enter slow path doing memory compact/migration
> ops in order to complete high order memory allocations.
>
> When that happens, user processes calling uverb APIs may get stuck
> for more than 120s easily even though there are a lot of free pages
> in smaller chunks available in the system.
>
> Syslog:
> ...
> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
> oracle_205573_e:205573 blocked for more than 120 seconds.
> ...
>
> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
>
> However in order to support smaller ICM chunk size, we need to fix
> another issue in large size kcalloc allocations.
>
> E.g.
> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
> entry). So we need a 16MB allocation for a table->icm pointer array to
> hold 2M pointers which can easily cause kcalloc to fail.
>
> The solution is to use vzalloc to replace kcalloc. There is no need
> for contiguous memory pages for a driver meta data structure (no need
> of DMA ops).
>
> Signed-off-by: Qing Huang
> Acked-by: Daniel Jurgens
> Reviewed-by: Zhu Yanjun
> ---
> v1 -> v2: adjusted chunk size to reflect different architectures.
>
>  drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index a822f7a..ccb62b8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,12 @@
>  #include "fw.h"
>
>  /*
> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
> - * per chunk.
> + * We allocate in page size (default 4KB on many archs) chunks to avoid high
> + * order memory allocations in fragmented/high usage memory situation.
>   */
>  enum {
> -	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> -	MLX4_TABLE_CHUNK_SIZE	= 1 << 18
> +	MLX4_ICM_ALLOC_SIZE	= 1 << PAGE_SHIFT,
> +	MLX4_TABLE_CHUNK_SIZE	= 1 << PAGE_SHIFT

Which is actually PAGE_SIZE.
Also, please add a comma at the end of the last entry.

>  };
>
>  static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>  	obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
>  	num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>
> -	table->icm = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
> +	table->icm = vzalloc(num_icm * sizeof(*table->icm));

Why not kvzalloc?

>  	if (!table->icm)
>  		return -ENOMEM;
>  	table->virt = virt;
> @@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>  			mlx4_free_icm(dev, table->icm[i], use_coherent);
>  		}
>
> -	kfree(table->icm);
> +	vfree(table->icm);
>
>  	return -ENOMEM;
>  }
> @@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table)
>  			mlx4_free_icm(dev, table->icm[i], table->coherent);
>  		}
>
> -	kfree(table->icm);
> +	vfree(table->icm);
>  }
>

Thanks for your patch.

I need to verify there is no dramatic performance degradation here.
You can prepare and send a v3 in the meantime.

Thanks,
Tariq
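
For reference, the sizing arithmetic in the quoted commit message can be checked with a small userspace sketch. It is not driver code: the helper names icm_num_chunks() and icm_ptr_array_bytes() are hypothetical, and the 8-byte pointer size is an assumption for a 64-bit build. The rounding mirrors the num_icm computation in the quoted mlx4_init_icm_table() hunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the mlx4 driver): number of ICM chunks
 * needed to hold nobj objects of obj_size bytes each, rounding up the same
 * way the quoted hunk computes num_icm.
 */
uint64_t icm_num_chunks(uint64_t nobj, uint64_t obj_size, uint64_t chunk_size)
{
	assert(obj_size > 0 && chunk_size >= obj_size);

	uint64_t obj_per_chunk = chunk_size / obj_size;

	return (nobj + obj_per_chunk - 1) / obj_per_chunk;
}

/*
 * Hypothetical helper: bytes needed for the table->icm pointer array,
 * assuming one pointer of ptr_size bytes per chunk.
 */
uint64_t icm_ptr_array_bytes(uint64_t nobj, uint64_t obj_size,
			     uint64_t chunk_size, uint64_t ptr_size)
{
	return icm_num_chunks(nobj, obj_size, chunk_size) * ptr_size;
}
```

With nobj = 1 << 30 (log_num_mtt=30), obj_size = 8 and chunk_size = 4096, this gives 2M chunks and a 16MB pointer array, matching the commit message. With the old 256KB chunk size the same table needs 32768 chunks and a 256KB array, which is why the kcalloc only became a problem once the chunk size shrank.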