Date: Wed, 1 Mar 2017 16:40:27 +0100
From: Michal Hocko
To: Johannes Weiner
Cc: Andrew Morton, Jia He, Mel Gorman, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 6/9] mm: don't avoid high-priority reclaim on memcg limit reclaim
Message-ID: <20170301154027.GF11730@dhcp22.suse.cz>
References: <20170228214007.5621-1-hannes@cmpxchg.org>
 <20170228214007.5621-7-hannes@cmpxchg.org>
In-Reply-To: <20170228214007.5621-7-hannes@cmpxchg.org>

On Tue 28-02-17 16:40:04, Johannes Weiner wrote:
> 246e87a93934 ("memcg: fix get_scan_count() for small targets") sought
> to avoid high reclaim priorities for memcg by forcing it to scan a
> minimum amount of pages when lru_pages >> priority yielded nothing.
> This was done at a time when reclaim decisions like dirty throttling
> were tied to the priority level.
>
> Nowadays, the only meaningful thing still tied to priority dropping
> below DEF_PRIORITY - 2 is gating whether laptop_mode=1 is generally
> allowed to write. But that is from an era where direct reclaim was
> still allowed to call ->writepage, and kswapd nowadays avoids writes
> until it's scanned every clean page in the system. Potential changes
> to how quick sc->may_writepage could trigger are of little concern.
>
> Remove the force_scan stuff, as well as the ugly multi-pass target
> calculation that it necessitated.

I _really_ like this, I hated the multi-pass part. One thing I am
worried about, and which the changelog doesn't mention, is what we are
going to do about small (<16MB) memcgs. On one hand they were already
ignored in the global reclaim, so this is nothing really new, but maybe
we want to preserve the behavior at least for the memcg reclaim. That
would reduce the side effect of this patch, which is a great cleanup
otherwise. Or at least be explicit about this in the changelog.

Btw. why can't we simply force scan at least SWAP_CLUSTER_MAX
unconditionally?

> +		/*
> +		 * If the cgroup's already been deleted, make sure to
> +		 * scrape out the remaining cache.

	Also make sure that small memcgs will not go unnoticed during
	the memcg reclaim

> +		 */
> +		if (!scan && !mem_cgroup_online(memcg))

	if (!scan && (!mem_cgroup_online(memcg) || !global_reclaim(sc)))

> +			scan = min(size, SWAP_CLUSTER_MAX);

-- 
Michal Hocko
SUSE Labs
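
A contextual sketch (in C) of how the get_scan_count() hunk would read
with the condition suggested above; the lruvec_lru_size()/sc->priority
lines are assumed from the patch context rather than quoted from this
thread, so treat it as an illustration, not the final code:

	size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
	scan = size >> sc->priority;

	/*
	 * If the cgroup's already been deleted, make sure to scrape out
	 * the remaining cache. With the suggested tweak the minimal scan
	 * target also applies to memcg limit reclaim, so small memcgs
	 * (where size >> sc->priority evaluates to 0) are not skipped.
	 */
	if (!scan && (!mem_cgroup_online(memcg) || !global_reclaim(sc)))
		scan = min(size, SWAP_CLUSTER_MAX);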