Subject: Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
To: Vlastimil Babka, Michal Hocko
Cc: Eric Dumazet, David Miller, tariqt@mellanox.com, haakon.bugge@oracle.com, yanjun.zhu@oracle.com, netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, gi-oh.kim@profitbricks.com, santosh.shilimkar@oracle.com, rama nichanamatlu
From: Qing Huang
Date: Tue, 5 Jun 2018 11:51:18 -0700

On 6/4/2018 5:40 AM, Vlastimil Babka wrote:
> On 06/04/2018 08:27 AM, Michal Hocko wrote:
>> On Fri 01-06-18 15:05:26, Qing Huang wrote:
>>>
>>> On 6/1/2018 12:31 AM, Michal Hocko wrote:
>>>> On Thu 31-05-18 19:04:46, Qing Huang wrote:
>>>>> On 5/31/2018 2:10 AM, Michal Hocko wrote:
>>>>>> On Thu 31-05-18 10:55:32, Michal Hocko wrote:
>>>>>>> On Thu 31-05-18 04:35:31, Eric Dumazet wrote:
>>>>>> [...]
>>>>>>>> I merely copied/pasted from alloc_skb_with_frags() :/
>>>>>>> I will have a look at it. Thanks!
>>>>>> OK, so this is an example of an incremental development ;).
>>>>>>
>>>>>> __GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
>>>>>> high order allocations") to prevent from OOM killer. Yet this was
>>>>>> not enough because fb05e7a89f50 ("net: don't wait for order-3 page
>>>>>> allocation") didn't want an excessive reclaim for non-costly orders
>>>>>> so it made it completely NOWAIT while it preserved __GFP_NORETRY in
>>>>>> place which is now redundant. Should I send a patch?
>>>>>>
>>>>> Just curious, how about GFP_ATOMIC flag? Would it work in a similar fashion?
>>>>> We experimented with it a bit in the past but it seemed to cause other
>>>>> issue in our tests. :-)
>>>> GFP_ATOMIC is a non-sleeping (aka no reclaim) context with an access to
>>>> memory reserves. So the risk is that you deplete those reserves and
>>>> cause issues to other subsystems which need them as well.
>>>>
>>>>> By the way, we didn't encounter any OOM killer events. It seemed that the
>>>>> mlx4_alloc_icm() triggered slowpath.
>>>>> We still had about 2GB free memory while it was highly fragmented.
>>>> The compaction was able to make a reasonable forward progress for you.
>>>> But considering mlx4_alloc_icm is called with GFP_KERNEL resp. GFP_HIGHUSER
>>>> then the OOM killer is clearly possible as long as the order is lower
>>>> than 4.
>>> The allocation was 256KB so the order was much higher than 4. The compaction
>>> seemed to be the root cause for our problem. It took too long to finish its
>>> work while putting mlx4_alloc_icm to sleep in a heavily fragmented memory
>>> situation. Will NORETRY flag avoid the compaction ops and fail the 256KB
>>> allocation immediately so mlx4_alloc_icm can enter adjustable lower order
>>> allocation code path quickly?
>> Costly orders should only perform a light compaction attempt unless
>> __GFP_RETRY_MAYFAIL is used IIRC. CCing Vlastimil. So __GFP_NORETRY
>> shouldn't make any difference.
> It's a bit more complicated. Costly allocations will try the light
> compaction attempt first, even before reclaim. This is followed by
> reclaim and a more costly compaction attempt. With __GFP_NORETRY, the
> second compaction attempt is also only the light one, so the flag does
> make a difference here.

Thanks for the clarification!

Looks like our production kernel is kinda old: neither __GFP_DIRECT_RECLAIM nor
__GFP_NORETRY is used in __alloc_pages_slowpath() in our kernel.
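
For reference, the order arithmetic behind "the order was much higher than 4"
can be sketched in plain userspace C. This is only an illustration, not kernel
code: it assumes 4 KiB pages, uses the kernel's PAGE_ALLOC_COSTLY_ORDER value
of 3 as the "costly" threshold, and the sketch_* names are hypothetical helpers
made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this sketch: 4 KiB pages; orders above 3 are "costly"
 * (mirrors PAGE_ALLOC_COSTLY_ORDER in the kernel). */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_COSTLY_ORDER 3U

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size.
 * Intended for small sizes only; no overflow handling. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Returns 1 when the order is above the costly threshold, 0 otherwise. */
static int sketch_is_costly(unsigned int order)
{
	return order > SKETCH_COSTLY_ORDER;
}
```

Under these assumptions sketch_get_order(256 * 1024) gives 6, which
sketch_is_costly() reports as costly, while a 32KB request is order 3 and
still non-costly.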