Date: Wed, 28 Aug 2019 09:57:08 +0200
From: Michal Hocko
To: Yang Shi
Shutemov" , Vlastimil Babka , kirill.shutemov@linux.intel.com, hannes@cmpxchg.org, rientjes@google.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [v2 PATCH -mm] mm: account deferred split THPs into MemAvailable Message-ID: <20190828075708.GF7386@dhcp22.suse.cz> References: <20190822152934.w6ztolutdix6kbvc@box> <20190826074035.GD7538@dhcp22.suse.cz> <20190826131538.64twqx3yexmhp6nf@box> <20190827060139.GM7538@dhcp22.suse.cz> <20190827110210.lpe36umisqvvesoa@box> <20190827120923.GB7538@dhcp22.suse.cz> <20190827121739.bzbxjloq7bhmroeq@box> <20190827125911.boya23eowxhqmopa@box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 27-08-19 10:06:20, Yang Shi wrote: > > > On 8/27/19 5:59 AM, Kirill A. Shutemov wrote: > > On Tue, Aug 27, 2019 at 03:17:39PM +0300, Kirill A. Shutemov wrote: > > > On Tue, Aug 27, 2019 at 02:09:23PM +0200, Michal Hocko wrote: > > > > On Tue 27-08-19 14:01:56, Vlastimil Babka wrote: > > > > > On 8/27/19 1:02 PM, Kirill A. Shutemov wrote: > > > > > > On Tue, Aug 27, 2019 at 08:01:39AM +0200, Michal Hocko wrote: > > > > > > > On Mon 26-08-19 16:15:38, Kirill A. Shutemov wrote: > > > > > > > > Unmapped completely pages will be freed with current code. Deferred split > > > > > > > > only applies to partly mapped THPs: at least on 4k of the THP is still > > > > > > > > mapped somewhere. > > > > > > > Hmm, I am probably misreading the code but at least current Linus' tree > > > > > > > reads page_remove_rmap -> [page_remove_anon_compound_rmap ->\ deferred_split_huge_page even > > > > > > > for fully mapped THP. > > > > > > Well, you read correctly, but it was not intended. I screwed it up at some > > > > > > point. > > > > > > > > > > > > See the patch below. It should make it work as intened. > > > > > > > > > > > > It's not bug as such, but inefficientcy. We add page to the queue where > > > > > > it's not needed. > > > > > But that adding to queue doesn't affect whether the page will be freed > > > > > immediately if there are no more partial mappings, right? I don't see > > > > > deferred_split_huge_page() pinning the page. > > > > > So your patch wouldn't make THPs freed immediately in cases where they > > > > > haven't been freed before immediately, it just fixes a minor > > > > > inefficiency with queue manipulation? > > > > Ohh, right. I can see that in free_transhuge_page now. So fully mapped > > > > THPs really do not matter and what I have considered an odd case is > > > > really happening more often. > > > > > > > > That being said this will not help at all for what Yang Shi is seeing > > > > and we need a more proactive deferred splitting as I've mentioned > > > > earlier. > > > It was not intended to fix the issue. It's fix for current logic. I'm > > > playing with the work approach now. > > Below is what I've come up with. It appears to be functional. > > > > Any comments? > > Thanks, Kirill and Michal. Doing split more proactive is definitely a choice > to eliminate huge accumulated deferred split THPs, I did think about this > approach before I came up with memcg aware approach. But, I thought this > approach has some problems: > > First of all, we can't prove if this is a universal win for the most > workloads or not. 
> For some workloads (as I mentioned about our use case), we do see a lot of
> THPs accumulated for a while, but they are very short-lived for other
> workloads, e.g. a kernel build.
> 
> Secondly, it may not be fair for some workloads which don't generate too
> many deferred split THPs or whose THPs are short-lived. Actually, the cpu
> time is abused by the excessive deferred split THP generators, isn't it?

Yes, this is indeed true. Do we have any idea of how much time that
actually is?

> With memcg awareness, the deferred split THPs actually are isolated and
> capped by memcg. The long-lived deferred split THPs can't accumulate too
> much due to the limit of the memcg. And the cpu time spent splitting them
> would just be accounted to the memcgs which generated that many deferred
> split THPs: those who generate them pay for it. This sounds more fair, and
> we could achieve much better isolation.

On the other hand, deferring the split, and thus the freeing, of a
non-trivial amount of memory is a problem I consider quite serious, because
it affects not only the memcg workload which has to do the reclaim but also
other consumers of memory, because those large memory blocks could be used
for higher order allocations.

> And I think the discussion is diverted and misled by the number of
> excessive deferred split THPs. To be clear, I didn't mean the excessive
> deferred split THPs are a problem for us (I agree it may waste memory to
> have that many deferred split THPs not usable); the problem is the OOM,
> since they couldn't be split by memcg limit reclaim because the shrinker
> was not memcg aware.

Well, I would like to see how much of a problem the memcg OOM really is
after deferred splitting is more time constrained. Maybe we will find
that there is no special memcg aware solution really needed.
-- 
Michal Hocko
SUSE Labs
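
As a reading aid for the exchange above, here is a minimal sketch in
userspace C. It is not kernel code and not the patch Kirill refers to (which
is elided from the quote); it only models the two points being made: a THP is
queued for deferred split only while it is partially mapped, and, as
Vlastimil notes, being on the queue does not pin the page, so freeing it
simply unlinks it. All identifiers (toy_thp, toy_remove_compound_rmap,
toy_free_transhuge_page) are invented for illustration.

/*
 * Toy userspace model of the deferred-split rule discussed above; not
 * kernel code. Identifiers are invented for this sketch.
 */
#include <stdbool.h>
#include <stdio.h>

#define HPAGE_NR 512                    /* 4k subpages per 2M THP */

struct toy_thp {
        int mapcount[HPAGE_NR];         /* per-subpage map counts */
        bool on_deferred_list;          /* queued for deferred split? */
};

static int mapped_subpages(const struct toy_thp *thp)
{
        int i, nr = 0;

        for (i = 0; i < HPAGE_NR; i++)
                if (thp->mapcount[i] > 0)
                        nr++;
        return nr;
}

/* Model of "a PMD mapping of the THP just went away". */
static void toy_remove_compound_rmap(struct toy_thp *thp)
{
        int nr = mapped_subpages(thp);

        /*
         * Queue only partially mapped THPs: a fully unmapped THP can be
         * freed right away, and a fully mapped one has nothing to split.
         */
        if (nr > 0 && nr < HPAGE_NR)
                thp->on_deferred_list = true;
}

/* Model of "the last reference to the THP is dropped". */
static void toy_free_transhuge_page(struct toy_thp *thp)
{
        /* The queue does not pin the page; freeing simply unlinks it. */
        if (thp->on_deferred_list) {
                thp->on_deferred_list = false;
                printf("unlinked from deferred-split queue on free\n");
        }
        printf("THP freed\n");
}

int main(void)
{
        struct toy_thp thp = { .on_deferred_list = false };
        int i;

        /* Half of the subpages are still mapped somewhere via PTEs. */
        for (i = 0; i < HPAGE_NR / 2; i++)
                thp.mapcount[i] = 1;

        toy_remove_compound_rmap(&thp);
        printf("partially mapped -> queued: %s\n",
               thp.on_deferred_list ? "yes" : "no");

        /* The remaining mappings go away and the page gets freed. */
        for (i = 0; i < HPAGE_NR / 2; i++)
                thp.mapcount[i] = 0;
        toy_free_transhuge_page(&thp);
        return 0;
}

Building and running this prints that the partially mapped THP is queued and
later freed with its queue entry dropped, which matches the
free_transhuge_page behaviour referenced in the thread.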
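
In the same spirit, a toy model of the isolation property Yang Shi argues
for: with per-memcg deferred-split queues, limit reclaim in one memcg only
scans, and pays the splitting cost for, the THPs that this memcg itself
deferred. Again a userspace sketch with invented names (toy_memcg,
toy_shrink_deferred), not the actual memcg-aware shrinker.

/*
 * Toy userspace model of per-memcg deferred-split queues; not kernel code.
 * Pressure in one memcg only scans that memcg's own queue, so a THP-heavy
 * cgroup pays for its own splitting and other workloads are left alone.
 */
#include <stdio.h>

struct toy_memcg {
        const char *name;
        int deferred_thps;      /* length of this memcg's split queue */
};

/* Memcg-aware "shrinker": only touches the queue of the memcg under pressure. */
static int toy_shrink_deferred(struct toy_memcg *memcg, int nr_to_scan)
{
        int split = memcg->deferred_thps < nr_to_scan ?
                    memcg->deferred_thps : nr_to_scan;

        memcg->deferred_thps -= split;
        printf("%s: split %d deferred THPs, %d left on its queue\n",
               memcg->name, split, memcg->deferred_thps);
        return split;
}

int main(void)
{
        struct toy_memcg thp_heavy = { "thp-heavy", 1000 };
        struct toy_memcg build     = { "kernel-build", 3 };

        /* Only the THP-heavy memcg hits its limit; the other is untouched. */
        toy_shrink_deferred(&thp_heavy, 128);
        printf("%s queue untouched: %d\n", build.name, build.deferred_thps);
        return 0;
}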