Date: Wed, 23 Oct 2019 16:18:57 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Andrew Morton, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 5/8] mm: vmscan: replace shrink_node() loop with a retry jump
Message-ID: <20191023141857.GF17610@dhcp22.suse.cz>
References: <20191022144803.302233-1-hannes@cmpxchg.org> <20191022144803.302233-6-hannes@cmpxchg.org>
In-Reply-To: <20191022144803.302233-6-hannes@cmpxchg.org>

On Tue 22-10-19 10:48:00, Johannes Weiner wrote:
> Most of the function body is inside a loop, which imposes an
> additional indentation and scoping level that makes the code a bit
> hard to follow and modify.

I do agree!

> The looping only happens in case of reclaim-compaction, which isn't
> the common case. So rather than adding yet another function level to
> the reclaim path and have every reclaim invocation go through a level
> that only exists for one specific cornercase, use a retry goto.

I would just keep the core logic in its own function and do the loop
around it rather than a goto retry. This is certainly a matter of taste
but I like a loop with an explicit condition much more than an if with
goto.

> Signed-off-by: Johannes Weiner
> ---
>  mm/vmscan.c | 231 ++++++++++++++++++++++++++--------------------------
>  1 file changed, 115 insertions(+), 116 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 302dad112f75..235d1fc72311 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2729,144 +2729,143 @@ static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
>  static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>  {
>  	struct reclaim_state *reclaim_state = current->reclaim_state;
> +	struct mem_cgroup *root = sc->target_mem_cgroup;
>  	unsigned long nr_reclaimed, nr_scanned;
>  	bool reclaimable = false;
> +	struct mem_cgroup *memcg;
> +again:
> +	memset(&sc->nr, 0, sizeof(sc->nr));
>
> -	do {
> -		struct mem_cgroup *root = sc->target_mem_cgroup;
> -		struct mem_cgroup *memcg;
> -
> -		memset(&sc->nr, 0, sizeof(sc->nr));
> -
> -		nr_reclaimed = sc->nr_reclaimed;
> -		nr_scanned = sc->nr_scanned;
> +	nr_reclaimed = sc->nr_reclaimed;
> +	nr_scanned = sc->nr_scanned;
>
> -		memcg = mem_cgroup_iter(root, NULL, NULL);
> -		do {
> -			unsigned long reclaimed;
> -			unsigned long scanned;
> +	memcg = mem_cgroup_iter(root, NULL, NULL);
> +	do {
> +		unsigned long reclaimed;
> +		unsigned long scanned;
>
> -			switch (mem_cgroup_protected(root, memcg)) {
> -			case MEMCG_PROT_MIN:
> -				/*
> -				 * Hard protection.
> -				 * If there is no reclaimable memory, OOM.
> -				 */
> +		switch (mem_cgroup_protected(root, memcg)) {
> +		case MEMCG_PROT_MIN:
> +			/*
> +			 * Hard protection.
> +			 * If there is no reclaimable memory, OOM.
> +			 */
> +			continue;
> +		case MEMCG_PROT_LOW:
> +			/*
> +			 * Soft protection.
> +			 * Respect the protection only as long as
> +			 * there is an unprotected supply
> +			 * of reclaimable memory from other cgroups.
> +			 */
> +			if (!sc->memcg_low_reclaim) {
> +				sc->memcg_low_skipped = 1;
>  				continue;
> -			case MEMCG_PROT_LOW:
> -				/*
> -				 * Soft protection.
> -				 * Respect the protection only as long as
> -				 * there is an unprotected supply
> -				 * of reclaimable memory from other cgroups.
> -				 */
> -				if (!sc->memcg_low_reclaim) {
> -					sc->memcg_low_skipped = 1;
> -					continue;
> -				}
> -				memcg_memory_event(memcg, MEMCG_LOW);
> -				break;
> -			case MEMCG_PROT_NONE:
> -				/*
> -				 * All protection thresholds breached. We may
> -				 * still choose to vary the scan pressure
> -				 * applied based on by how much the cgroup in
> -				 * question has exceeded its protection
> -				 * thresholds (see get_scan_count).
> -				 */
> -				break;
>  			}
> +			memcg_memory_event(memcg, MEMCG_LOW);
> +			break;
> +		case MEMCG_PROT_NONE:
> +			/*
> +			 * All protection thresholds breached. We may
> +			 * still choose to vary the scan pressure
> +			 * applied based on by how much the cgroup in
> +			 * question has exceeded its protection
> +			 * thresholds (see get_scan_count).
> +			 */
> +			break;
> +		}
>
> -			reclaimed = sc->nr_reclaimed;
> -			scanned = sc->nr_scanned;
> -			shrink_node_memcg(pgdat, memcg, sc);
> -
> -			shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> -				    sc->priority);
> -
> -			/* Record the group's reclaim efficiency */
> -			vmpressure(sc->gfp_mask, memcg, false,
> -				   sc->nr_scanned - scanned,
> -				   sc->nr_reclaimed - reclaimed);
> -
> -		} while ((memcg = mem_cgroup_iter(root, memcg, NULL)));
> +		reclaimed = sc->nr_reclaimed;
> +		scanned = sc->nr_scanned;
> +		shrink_node_memcg(pgdat, memcg, sc);
>
> -		if (reclaim_state) {
> -			sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> -			reclaim_state->reclaimed_slab = 0;
> -		}
> +		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
> +			    sc->priority);
>
> -		/* Record the subtree's reclaim efficiency */
> -		vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
> -			   sc->nr_scanned - nr_scanned,
> -			   sc->nr_reclaimed - nr_reclaimed);
> +		/* Record the group's reclaim efficiency */
> +		vmpressure(sc->gfp_mask, memcg, false,
> +			   sc->nr_scanned - scanned,
> +			   sc->nr_reclaimed - reclaimed);
>
> -		if (sc->nr_reclaimed - nr_reclaimed)
> -			reclaimable = true;
> +	} while ((memcg = mem_cgroup_iter(root, memcg, NULL)));
>
> -		if (current_is_kswapd()) {
> -			/*
> -			 * If reclaim is isolating dirty pages under writeback,
> -			 * it implies that the long-lived page allocation rate
> -			 * is exceeding the page laundering rate. Either the
> -			 * global limits are not being effective at throttling
> -			 * processes due to the page distribution throughout
> -			 * zones or there is heavy usage of a slow backing
> -			 * device. The only option is to throttle from reclaim
> -			 * context which is not ideal as there is no guarantee
> -			 * the dirtying process is throttled in the same way
> -			 * balance_dirty_pages() manages.
> -			 *
> -			 * Once a node is flagged PGDAT_WRITEBACK, kswapd will
> -			 * count the number of pages under pages flagged for
> -			 * immediate reclaim and stall if any are encountered
> -			 * in the nr_immediate check below.
> -			 */
> -			if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
> -				set_bit(PGDAT_WRITEBACK, &pgdat->flags);
> +	if (reclaim_state) {
> +		sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> +		reclaim_state->reclaimed_slab = 0;
> +	}
>
> -			/*
> -			 * Tag a node as congested if all the dirty pages
> -			 * scanned were backed by a congested BDI and
> -			 * wait_iff_congested will stall.
> -			 */
> -			if (sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> -				set_bit(PGDAT_CONGESTED, &pgdat->flags);
> +	/* Record the subtree's reclaim efficiency */
> +	vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
> +		   sc->nr_scanned - nr_scanned,
> +		   sc->nr_reclaimed - nr_reclaimed);
>
> -			/* Allow kswapd to start writing pages during reclaim.*/
> -			if (sc->nr.unqueued_dirty == sc->nr.file_taken)
> -				set_bit(PGDAT_DIRTY, &pgdat->flags);
> +	if (sc->nr_reclaimed - nr_reclaimed)
> +		reclaimable = true;
>
> -			/*
> -			 * If kswapd scans pages marked marked for immediate
> -			 * reclaim and under writeback (nr_immediate), it
> -			 * implies that pages are cycling through the LRU
> -			 * faster than they are written so also forcibly stall.
> -			 */
> -			if (sc->nr.immediate)
> -				congestion_wait(BLK_RW_ASYNC, HZ/10);
> -		}
> +	if (current_is_kswapd()) {
> +		/*
> +		 * If reclaim is isolating dirty pages under writeback,
> +		 * it implies that the long-lived page allocation rate
> +		 * is exceeding the page laundering rate. Either the
> +		 * global limits are not being effective at throttling
> +		 * processes due to the page distribution throughout
> +		 * zones or there is heavy usage of a slow backing
> +		 * device. The only option is to throttle from reclaim
> +		 * context which is not ideal as there is no guarantee
> +		 * the dirtying process is throttled in the same way
> +		 * balance_dirty_pages() manages.
> +		 *
> +		 * Once a node is flagged PGDAT_WRITEBACK, kswapd will
> +		 * count the number of pages under pages flagged for
> +		 * immediate reclaim and stall if any are encountered
> +		 * in the nr_immediate check below.
> +		 */
> +		if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
> +			set_bit(PGDAT_WRITEBACK, &pgdat->flags);
>
>  		/*
> -		 * Legacy memcg will stall in page writeback so avoid forcibly
> -		 * stalling in wait_iff_congested().
> +		 * Tag a node as congested if all the dirty pages
> +		 * scanned were backed by a congested BDI and
> +		 * wait_iff_congested will stall.
>  		 */
> -		if (cgroup_reclaim(sc) && writeback_throttling_sane(sc) &&
> -		    sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> -			set_memcg_congestion(pgdat, root, true);
> +		if (sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> +			set_bit(PGDAT_CONGESTED, &pgdat->flags);
> +
> +		/* Allow kswapd to start writing pages during reclaim.*/
> +		if (sc->nr.unqueued_dirty == sc->nr.file_taken)
> +			set_bit(PGDAT_DIRTY, &pgdat->flags);
>
>  		/*
> -		 * Stall direct reclaim for IO completions if underlying BDIs
> -		 * and node is congested. Allow kswapd to continue until it
> -		 * starts encountering unqueued dirty pages or cycling through
> -		 * the LRU too quickly.
> +		 * If kswapd scans pages marked marked for immediate
> +		 * reclaim and under writeback (nr_immediate), it
> +		 * implies that pages are cycling through the LRU
> +		 * faster than they are written so also forcibly stall.
>  		 */
> -		if (!sc->hibernation_mode && !current_is_kswapd() &&
> -		    current_may_throttle() && pgdat_memcg_congested(pgdat, root))
> -			wait_iff_congested(BLK_RW_ASYNC, HZ/10);
> +		if (sc->nr.immediate)
> +			congestion_wait(BLK_RW_ASYNC, HZ/10);
> +	}
> +
> +	/*
> +	 * Legacy memcg will stall in page writeback so avoid forcibly
> +	 * stalling in wait_iff_congested().
> +	 */
> +	if (cgroup_reclaim(sc) && writeback_throttling_sane(sc) &&
> +	    sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> +		set_memcg_congestion(pgdat, root, true);
> +
> +	/*
> +	 * Stall direct reclaim for IO completions if underlying BDIs
> +	 * and node is congested. Allow kswapd to continue until it
> +	 * starts encountering unqueued dirty pages or cycling through
> +	 * the LRU too quickly.
> +	 */
> +	if (!sc->hibernation_mode && !current_is_kswapd() &&
> +	    current_may_throttle() && pgdat_memcg_congested(pgdat, root))
> +		wait_iff_congested(BLK_RW_ASYNC, HZ/10);
>
> -	} while (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
> -					 sc));
> +	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
> +				    sc))
> +		goto again;
>
>  	/*
>  	 * Kswapd gives up on balancing particular nodes after too
> --
> 2.23.0

-- 
Michal Hocko
SUSE Labs
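
For readers comparing the two shapes discussed in this thread (the retry goto from the patch versus core logic in its own function with a loop around it), here is a minimal standalone sketch. It is not kernel code: struct toy_scan, toy_reclaim_pass(), toy_should_continue() and both toy_shrink_*() functions are hypothetical names invented for illustration, and the "8 pages per pass" figure is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for scan_control; NOT the kernel structure. */
struct toy_scan {
	unsigned long nr_reclaimed;	/* pages reclaimed so far */
	unsigned long nr_to_reclaim;	/* target for this invocation */
	unsigned int passes;		/* number of passes executed */
};

/* One pass of the "core logic"; assume each pass reclaims 8 pages. */
void toy_reclaim_pass(struct toy_scan *sc)
{
	assert(sc != NULL);
	sc->nr_reclaimed += 8;
	sc->passes++;
}

/* Toy retry condition: keep going until the target is met. */
bool toy_should_continue(const struct toy_scan *sc)
{
	assert(sc != NULL);
	return sc->nr_reclaimed < sc->nr_to_reclaim;
}

/* Shape used by the patch: flat body with a retry jump at the end. */
void toy_shrink_goto(struct toy_scan *sc)
{
again:
	toy_reclaim_pass(sc);
	if (toy_should_continue(sc))
		goto again;
}

/* Shape suggested in the reply: helper function, explicit loop around it. */
void toy_shrink_loop(struct toy_scan *sc)
{
	do {
		toy_reclaim_pass(sc);
	} while (toy_should_continue(sc));
}
```

Both variants run the pass at least once and retry under the same condition, so with a target of 20 pages each performs three passes. The difference is only where the indentation level and the loop condition live, which is the matter of taste raised above.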