Date: Wed, 30 May 2018 10:15:33 -0400
From: Johannes Weiner
To: Josef Bacik
Cc: axboe@kernel.dk, kernel-team@fb.com, linux-block@vger.kernel.org,
        akpm@linux-foundation.org, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, tj@kernel.org,
        linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 07/13] memcontrol: schedule throttling if we are congested
Message-ID: <20180530141533.GC4035@cmpxchg.org>
References: <20180529211724.4531-1-josef@toxicpanda.com>
 <20180529211724.4531-8-josef@toxicpanda.com>
In-Reply-To: <20180529211724.4531-8-josef@toxicpanda.com>

On Tue, May 29, 2018 at 05:17:18PM -0400, Josef Bacik wrote:
> @@ -5458,6 +5458,30 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
>  	return ret;
>  }
>  
> +int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
> +				gfp_t gfp_mask, struct mem_cgroup **memcgp,
> +				bool compound)
> +{
> +	struct mem_cgroup *memcg;
> +	struct block_device *bdev;
> +	int ret;
> +
> +	ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp, compound);
> +	memcg = *memcgp;
> +
> +	if (!(gfp_mask & __GFP_IO) || !memcg)
> +		return ret;
> +#if defined(CONFIG_BLOCK) && defined(CONFIG_SWAP)
> +	if (atomic_read(&memcg->css.cgroup->congestion_count) &&
> +	    has_usable_swap()) {
> +		map_swap_page(page, &bdev);

This doesn't work, unfortunately - or it only works by accident. It
goes through page_private(), which is only valid for pages in the
swapcache. The newly allocated pages you call it against aren't in
the swapcache, but their page_private() is 0, which is incorrectly
interpreted as "first swap slot on the first swap device" - which
happens to make sense if you have only one swap device.
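For reference, map_swap_page() is essentially the below (paraphrased
from mm/swapfile.c, so treat it as a sketch rather than a verbatim
quote):

sector_t map_swap_page(struct page *page, struct block_device **bdev)
{
	swp_entry_t entry;

	/* 0 for a freshly allocated anon page that was never swapped */
	entry.val = page_private(page);

	/* val 0 decodes as type 0, offset 0: the first slot on the
	 * first swap device */
	return map_swap_entry(entry, bdev);
}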
> +		blkcg_schedule_throttle(bdev_get_queue(bdev), true);

By the time we allocate, we simply cannot know which swap device the
page will end up on. However, we know what's likely: swap_avail_heads
is sorted by the order in which we try to allocate swap slots; the
first device on there is where the swap IO will go.

If we walk this list and throttle on the first device that has
built-up delay debt, we'll throttle against the device that probably
gets the current bulk of the swap writes.

Also, if we have two swap devices with the same priority, swap
allocation will re-order the list for us automatically to do
round-robin loading of the devices - see get_swap_pages(). That
should work out nicely for throttling as well.

You can use page_to_nid() on the newly allocated page to index into
swap_avail_heads[].

On an unrelated note, mem_cgroup_try_charge_delay() isn't the most
descriptive name. Since the throttling isn't really page specific, we
might want to move it out of the charge function and into a
stand-alone function similar to balance_dirty_pages().

mem_cgroup_balance_anon_pages()? mem_cgroup_throttle_swaprate()?
mem_cgroup_anon_throttle()? mem_cgroup_anon_allocwait()? Something
like that. I personally like balance_anon_pages the best - not
because it's the best name by itself, but because in MM it already
carries the notion of throttling the creation of IO liabilities to
the write rate, which is what we're doing here as well.
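To make that more concrete, here is a rough, untested sketch of what
such a stand-alone helper could look like. It uses the
mem_cgroup_throttle_swaprate name from the list above; the
congestion_count test from your patch stands in for whatever
"has delay debt" check the blkcg side ends up providing (ideally that
would be per-device), and the swap_avail_heads/avail_lists details
are from current mm/swapfile.c, so double-check against your tree:

/* mm/swapfile.c, next to get_swap_pages(), since swap_avail_lock
 * and swap_avail_heads are private to that file */
void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
				  gfp_t gfp_mask)
{
	struct swap_info_struct *si, *next;

	if (!(gfp_mask & __GFP_IO) || !memcg)
		return;

	/* stand-in for a proper per-device delay debt check */
	if (!atomic_read(&memcg->css.cgroup->congestion_count))
		return;

	spin_lock(&swap_avail_lock);
	plist_for_each_entry_safe(si, next, &swap_avail_heads[node],
				  avail_lists[node]) {
		/*
		 * The head of the avail list is where the next swap
		 * slots get allocated from, i.e. the device that most
		 * likely receives the swap writes we're about to
		 * create.
		 */
		if (si->bdev) {
			blkcg_schedule_throttle(bdev_get_queue(si->bdev),
						true);
			break;
		}
	}
	spin_unlock(&swap_avail_lock);
}

The callsites would then keep using plain mem_cgroup_try_charge() and
follow it with something like
mem_cgroup_throttle_swaprate(*memcgp, page_to_nid(page), gfp_mask),
instead of having a separate _delay variant of the charge path.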