From: Gi-Oh Kim
Date: Thu, 24 May 2018 09:23:52 +0200
Subject: Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
To: Qing Huang
Cc: Tariq Toukan, davem@davemloft.net, haakon.bugge@oracle.com, yanjun.zhu@oracle.com, netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20180523232246.20445-1-qing.huang@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 24, 2018 at 1:22 AM, Qing Huang wrote:
> When a system is under memory pressure (high usage with fragments),
> the original 256KB ICM chunk allocations will likely trigger kernel
> memory management to enter the slow path doing memory compaction/migration
> ops in order to complete high order memory allocations.
>
> When that happens, user processes calling uverb APIs may easily get stuck
> for more than 120s even though there are a lot of free pages
> in smaller chunks available in the system.
>
> Syslog:
> ...
> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
> oracle_205573_e:205573 blocked for more than 120 seconds.
> ...
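For a sense of scale, "high order" refers to the buddy allocator's order: an order-n request needs 2^n physically contiguous pages. The following is a minimal userspace sketch, not mlx4 or kernel code; alloc_order() and contiguous_pages() are hypothetical helper names (the kernel's own helper for this computation is get_order()), and the page sizes are passed in explicitly as assumptions.

```c
#include <stddef.h>

/*
 * Hypothetical helper modelled on the idea behind the kernel's get_order():
 * the smallest order n such that (page_size << n) >= size.
 * page_size must be non-zero.
 */
static unsigned int alloc_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}

/* Number of physically contiguous pages an order-n allocation needs. */
static size_t contiguous_pages(unsigned int order)
{
	return (size_t)1 << order;
}
```

Under these assumptions, a 256KB chunk with 4KB pages is an order-6 request (64 contiguous pages), while a PAGE_SIZE chunk is order 0 and can be satisfied by any single free page. With 64KB pages, the old 256KB chunk would be order 2.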
>
> With a 4KB ICM chunk size on x86_64, the above issue is fixed.
>
> However, in order to support a smaller ICM chunk size, we need to fix
> another issue with large kcalloc allocations.
>
> E.g.
> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
> entry). So we need a 16MB allocation for a table->icm pointer array to
> hold 2M pointers, which can easily cause kcalloc to fail.
>
> The solution is to replace kcalloc with kvzalloc, which automatically
> falls back to vmalloc if kmalloc fails.

Hi,

Could you please explain why it first tries to allocate contiguous pages?
I think a comment is needed to explain why it uses kvzalloc instead of vzalloc.

>
> Signed-off-by: Qing Huang
> Acked-by: Daniel Jurgens
> Reviewed-by: Zhu Yanjun

+Reviewed-by: Gioh Kim

> ---
> v4: use kvzalloc instead of vzalloc
>     add one err condition check
>     don't include vmalloc.h any more
>
> v3: use PAGE_SIZE instead of PAGE_SHIFT
>     add comma to the end of enum variables
>     include vmalloc.h header file to avoid build issues on Sparc
>
> v2: adjusted chunk size to reflect different architectures
>
>  drivers/net/ethernet/mellanox/mlx4/icm.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index a822f7a..685337d 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,12 @@
>  #include "fw.h"
>
>  /*
> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
> - * per chunk.
> + * We allocate in page size (default 4KB on many archs) chunks to avoid high
> + * order memory allocations in fragmented/high usage memory situation.
>   */
>  enum {
> -	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> -	MLX4_TABLE_CHUNK_SIZE	= 1 << 18
> +	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
> +	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
>  };
>
>  static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -398,9 +398,11 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>  	u64 size;
>
>  	obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
> +	if (WARN_ON(!obj_per_chunk))
> +		return -EINVAL;
>  	num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>
> -	table->icm = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
> +	table->icm = kvzalloc(num_icm * sizeof(*table->icm), GFP_KERNEL);
>  	if (!table->icm)
>  		return -ENOMEM;
>  	table->virt = virt;
> @@ -446,7 +448,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>  			mlx4_free_icm(dev, table->icm[i], use_coherent);
>  		}
>
> -	kfree(table->icm);
> +	kvfree(table->icm);
>
>  	return -ENOMEM;
>  }
> @@ -462,5 +464,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table)
>  			mlx4_free_icm(dev, table->icm[i], table->coherent);
>  		}
>
> -	kfree(table->icm);
> +	kvfree(table->icm);
>  }
> --
> 2.9.3
>

-- 
GIOH KIM
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:   +49 176 2697 8962
Fax:   +49 30 577 008 299
Email: gi-oh.kim@profitbricks.com
URL:   https://www.profitbricks.de

Registered office: Berlin
Commercial register: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens
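The log_num_mtt=30 arithmetic in the quoted commit message can be reproduced with a small userspace model of the sizing step in mlx4_init_icm_table(). This is a sketch only: struct icm_sizing and compute_icm_sizing() are hypothetical names, it assumes 8-byte MTT entries as the commit message states, and it measures the pointer array with sizeof(void *) (8 bytes on 64-bit builds). It mirrors the patch's new WARN_ON(!obj_per_chunk) check by returning -EINVAL when a single object does not fit in a chunk.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of the sizing step; not an mlx4 structure. */
struct icm_sizing {
	uint64_t obj_per_chunk;	/* objects that fit in one ICM chunk */
	uint64_t num_icm;	/* chunks needed == entries in table->icm */
	uint64_t array_bytes;	/* bytes needed for the table->icm pointer array */
};

/*
 * Sketch of the arithmetic in mlx4_init_icm_table(): split nobj objects of
 * obj_size bytes into chunk_size-byte chunks and size the pointer array.
 * Returns 0, or -EINVAL when an object does not fit in a chunk.
 */
static int compute_icm_sizing(uint64_t nobj, uint64_t obj_size,
			      uint64_t chunk_size, struct icm_sizing *out)
{
	if (obj_size == 0)	/* sketch-only guard against division by zero */
		return -EINVAL;

	out->obj_per_chunk = chunk_size / obj_size;
	/* Same condition the patch catches with WARN_ON(!obj_per_chunk). */
	if (out->obj_per_chunk == 0)
		return -EINVAL;

	/* Round up so a partially used last chunk is still counted. */
	out->num_icm = (nobj + out->obj_per_chunk - 1) / out->obj_per_chunk;
	out->array_bytes = out->num_icm * sizeof(void *);
	return 0;
}
```

Under these assumptions, 4KB chunks give 512 objects per chunk and 2^21 (2M) chunk pointers, i.e. a 16MB array on a 64-bit build, whereas the old 256KB chunks needed only 32768 pointers (256KB). That difference is why shrinking the chunk size turned the table->icm allocation into a large one.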