Subject: Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: Qing Huang
To: Michal Hocko
Cc: Eric Dumazet, David Miller, tariqt@mellanox.com,
 haakon.bugge@oracle.com, yanjun.zhu@oracle.com, netdev@vger.kernel.org,
 linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
 gi-oh.kim@profitbricks.com, santosh.shilimkar@oracle.com
Date: Fri, 1 Jun 2018 15:05:26 -0700
In-Reply-To: <20180601073137.GV15278@dhcp22.suse.cz>
References: <20180523232246.20445-1-qing.huang@oracle.com>
 <20180525.102321.858995452200286788.davem@davemloft.net>
 <7a353b65-6b7f-1aee-1c48-e83c8e02f693@gmail.com>
 <0e11e0fc-6ccf-aa93-9c4f-b9eae1b90643@gmail.com>
 <20180531065405.GH15278@dhcp22.suse.cz>
 <20180531085532.GK15278@dhcp22.suse.cz>
 <20180531091022.GL15278@dhcp22.suse.cz>
 <7d8f52e1-aa16-d20c-a9a8-35ad88c0b1ab@oracle.com>
 <20180601073137.GV15278@dhcp22.suse.cz>

On 6/1/2018 12:31 AM, Michal Hocko wrote:
> On Thu 31-05-18 19:04:46, Qing Huang wrote:
>>
>> On 5/31/2018 2:10 AM, Michal Hocko wrote:
>>> On Thu 31-05-18 10:55:32, Michal Hocko wrote:
>>>> On Thu 31-05-18 04:35:31, Eric Dumazet wrote:
>>> [...]
>>>>> I merely copied/pasted from alloc_skb_with_frags() :/
>>>> I will have a look at it. Thanks!
>>> OK, so this is an example of an incremental development ;).
>>>
>>> __GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
>>> high order allocations") to prevent from OOM killer. Yet this was
>>> not enough because fb05e7a89f50 ("net: don't wait for order-3 page
>>> allocation") didn't want an excessive reclaim for non-costly orders
>>> so it made it completely NOWAIT while it preserved __GFP_NORETRY in
>>> place which is now redundant. Should I send a patch?
>>>
>> Just curious, how about GFP_ATOMIC flag? Would it work in a similar fashion?
>> We experimented with it a bit in the past but it seemed to cause other
>> issue in our tests. :-)
> GFP_ATOMIC is a non-sleeping (aka no reclaim) context with an access to
> memory reserves. So the risk is that you deplete those reserves and
> cause issues to other subsystems which need them as well.
>
>> By the way, we didn't encounter any OOM killer events. It seemed that the
>> mlx4_alloc_icm() triggered slowpath.
>> We still had about 2GB free memory while it was highly fragmented.
> The compaction was able to make a reasonable forward progress for you.
> But considering mlx4_alloc_icm is called with GFP_KERNEL resp. GFP_HIGHUSER
> then the OOM killer is clearly possible as long as the order is lower
> than 4.

The allocation was 256KB, so the order was well above 4. Compaction
seemed to be the root cause of our problem: with memory heavily
fragmented, it took too long to finish its work and kept mlx4_alloc_icm
asleep in the meantime.

Will the __GFP_NORETRY flag skip the compaction work and fail the 256KB
allocation immediately, so that mlx4_alloc_icm can quickly enter the
adjustable lower-order allocation code path?

Thanks.

>
>>  #0 [ffff8801f308b380] remove_migration_pte at ffffffff811f0e0b
>>  #1 [ffff8801f308b3e0] rmap_walk_file at ffffffff811cb890
>>  #2 [ffff8801f308b440] rmap_walk at ffffffff811cbaf2
>>  #3 [ffff8801f308b450] remove_migration_ptes at ffffffff811f0db0
>>  #4 [ffff8801f308b490] __unmap_and_move at ffffffff811f2ea6
>>  #5 [ffff8801f308b4e0] unmap_and_move at ffffffff811f2fc5
>>  #6 [ffff8801f308b540] migrate_pages at ffffffff811f3219
>>  #7 [ffff8801f308b5c0] compact_zone at ffffffff811b707e
>>  #8 [ffff8801f308b650] compact_zone_order at ffffffff811b735d
>>  #9 [ffff8801f308b6e0] try_to_compact_pages at ffffffff811b7485
>> #10 [ffff8801f308b770] __alloc_pages_direct_compact at ffffffff81195f96
>> #11 [ffff8801f308b7b0] __alloc_pages_slowpath at ffffffff811978a1
>> #12 [ffff8801f308b890] __alloc_pages_nodemask at ffffffff81197ec1
>> #13 [ffff8801f308b970] alloc_pages_current at ffffffff811e261f
>> #14 [ffff8801f308b9e0] mlx4_alloc_icm at ffffffffa01f39b2 [mlx4_core]
>>
>> Thanks!
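
For reference, the order arithmetic discussed in this thread can be
sketched in userspace C. This is only an illustration under stated
assumptions, not mlx4 or kernel code: it assumes 4 KB pages, and
order_for_size(), is_costly_order(), alloc_with_fallback() and
fragmented_try_alloc() are hypothetical names made up for the sketch.
The threshold of 3 mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER. The
sketch shows why a 256 KB request counts as a costly order and what a
"try high order, then step down" loop looks like; it does not model
compaction or the actual effect of __GFP_NORETRY.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration: 4 KB pages (as on x86-64) and a costly
 * order threshold of 3, mirroring the kernel's PAGE_ALLOC_COSTLY_ORDER. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_COSTLY_ORDER 3

/* Smallest order such that (page_size << order) >= size. A userspace
 * analogue of the kernel's get_order(); the name is hypothetical. */
unsigned int order_for_size(unsigned long size, unsigned long page_size)
{
    unsigned int order = 0;

    assert(size > 0 && page_size > 0);
    while ((page_size << order) < size)
        order++;
    return order;
}

/* Nonzero if the order is above the costly threshold. */
int is_costly_order(unsigned int order)
{
    return order > SKETCH_COSTLY_ORDER;
}

/* Hypothetical allocator callback: returns nonzero if a block of the
 * given order could be obtained on the first attempt. */
typedef int (*try_alloc_fn)(unsigned int order, void *ctx);

/* Start at max_order and step down one order at a time until an
 * attempt succeeds. Returns the order that succeeded, or -1 if even
 * order 0 failed. */
int alloc_with_fallback(unsigned int max_order, try_alloc_fn try_alloc,
                        void *ctx)
{
    int order;

    for (order = (int)max_order; order >= 0; order--) {
        if (try_alloc((unsigned int)order, ctx))
            return order;
    }
    return -1;
}

/* Example callback: pretend fragmentation leaves only free blocks up
 * to the order stored in *ctx. */
int fragmented_try_alloc(unsigned int order, void *ctx)
{
    unsigned int largest_free = *(const unsigned int *)ctx;

    return order <= largest_free;
}
```

With these assumptions, order_for_size(256 * 1024, SKETCH_PAGE_SIZE)
gives 6, which is_costly_order() reports as costly. If
fragmented_try_alloc() is told that only order-2 blocks are free,
alloc_with_fallback(6, ...) settles on order 2 after failing at orders
6 down to 3. How quickly each failed attempt returns is the open
question in the reply above.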