Date: Mon, 4 Jun 2018 08:27:37 +0200
From: Michal Hocko
To: Qing Huang
Cc: Eric Dumazet, David Miller, tariqt@mellanox.com,
    haakon.bugge@oracle.com, yanjun.zhu@oracle.com,
    netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
    linux-kernel@vger.kernel.org, gi-oh.kim@profitbricks.com,
    santosh.shilimkar@oracle.com, Vlastimil Babka
Subject: Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
Message-ID: <20180604062737.GA19202@dhcp22.suse.cz>

On Fri 01-06-18 15:05:26, Qing Huang wrote:
>
> On 6/1/2018 12:31 AM, Michal Hocko wrote:
> > On Thu 31-05-18 19:04:46, Qing Huang wrote:
> > >
> > > On 5/31/2018 2:10 AM, Michal Hocko wrote:
> > > > On Thu 31-05-18 10:55:32, Michal Hocko wrote:
> > > > > On Thu 31-05-18 04:35:31, Eric Dumazet wrote:
> > > > [...]
> > > > > > I merely copied/pasted from alloc_skb_with_frags() :/
> > > > > I will have a look at it. Thanks!
> > > > OK, so this is an example of an incremental development ;).
> > > >
> > > > __GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
> > > > high order allocations") to prevent from OOM killer. Yet this was
> > > > not enough because fb05e7a89f50 ("net: don't wait for order-3 page
> > > > allocation") didn't want an excessive reclaim for non-costly orders
> > > > so it made it completely NOWAIT while it preserved __GFP_NORETRY in
> > > > place which is now redundant. Should I send a patch?
> > > >
> > > Just curious, how about GFP_ATOMIC flag? Would it work in a similar
> > > fashion? We experimented with it a bit in the past but it seemed to
> > > cause other issue in our tests. :-)
> > GFP_ATOMIC is a non-sleeping (aka no reclaim) context with an access to
> > memory reserves. So the risk is that you deplete those reserves and
> > cause issues to other subsystems which need them as well.
> >
> > > By the way, we didn't encounter any OOM killer events. It seemed that
> > > the mlx4_alloc_icm() triggered slowpath.
> > > We still had about 2GB free memory while it was highly fragmented.
> > The compaction was able to make a reasonable forward progress for you.
> > But considering mlx4_alloc_icm is called with GFP_KERNEL resp. GFP_HIGHUSER
> > then the OOM killer is clearly possible as long as the order is lower
> > than 4.
>
> The allocation was 256KB so the order was much higher than 4. The
> compaction seemed to be the root cause for our problem. It took too
> long to finish its work while putting mlx4_alloc_icm to sleep in a
> heavily fragmented memory situation. Will NORETRY flag avoid the
> compaction ops and fail the 256KB allocation immediately so
> mlx4_alloc_icm can enter adjustable lower order allocation code path
> quickly?

Costly orders should only perform a light compaction attempt unless
__GFP_RETRY_MAYFAIL is used IIRC. CCing Vlastimil. So __GFP_NORETRY
shouldn't make any difference.
-- 
Michal Hocko
SUSE Labs
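
For reference, the order arithmetic behind "256KB" and "lower than 4" can be sketched in userspace C. This is only an illustration under stated assumptions: 4KB pages (as on x86), and a threshold of 3 mirroring the kernel's PAGE_ALLOC_COSTLY_ORDER. The names order_for_size() and is_costly_order() are made up for this sketch and are not kernel functions; in-kernel code would normally use get_order().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER, which is 3. */
#define COSTLY_ORDER 3

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= size, i.e. the number of times a single
 * page has to be doubled to cover the request.
 */
unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size > 0);
	while ((page_size << order) < size)
		order++;
	return order;
}

/* Hypothetical helper: orders above the threshold count as "costly". */
bool is_costly_order(unsigned int order)
{
	return order > COSTLY_ORDER;
}
```

With 4KB pages, order_for_size(256 * 1024, 4096) gives 6 (64 contiguous pages), which is_costly_order() reports as costly, while orders 0 through 3 are not. That matches the "order lower than 4" boundary mentioned in the quoted text.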