Date: Thu, 5 Apr 2018 15:18:28 -0700
From: Andrew Morton
To: Andrey Ryabinin
Cc: Mel Gorman, Tejun Heo, Johannes Weiner, Michal Hocko, Shakeel Butt, Steven Rostedt, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH v2 4/4] mm/vmscan: Don't mess with pgdat->flags in memcg reclaim.
Message-Id: <20180405151828.98b5bfb143a7a8b7dec4b153@linux-foundation.org>
In-Reply-To: <20180323152029.11084-5-aryabinin@virtuozzo.com>
References: <20180323152029.11084-1-aryabinin@virtuozzo.com> <20180323152029.11084-5-aryabinin@virtuozzo.com>

On Fri, 23 Mar 2018 18:20:29 +0300 Andrey Ryabinin wrote:

> memcg reclaim may alter pgdat->flags based on the state of LRU lists
> in cgroup and its children. PGDAT_WRITEBACK may force kswapd to sleep
> in congested_wait(), PGDAT_DIRTY may force kswapd to writeback filesystem
> pages. But the worst here is PGDAT_CONGESTED, since it may force all
> direct reclaims to stall in wait_iff_congested(). Note that only kswapd
> has the power to clear any of these bits. This might just never happen if
> cgroup limits are configured that way. So all direct reclaims will stall
> as long as we have some congested bdi in the system.
>
> Leave all pgdat->flags manipulations to kswapd. kswapd scans the whole
> pgdat, only kswapd can clear pgdat->flags once the node is balanced, thus
> it's reasonable to leave all decisions about node state to kswapd.
>
> Moving pgdat->flags manipulation to kswapd means that cgroup2 reclaim
> now loses its congestion throttling mechanism. Add per-cgroup congestion
> state and throttle cgroup2 reclaimers if memcg is in congestion state.
>
> Currently there is no need for per-cgroup PGDAT_WRITEBACK and PGDAT_DIRTY
> bits since they alter only kswapd behavior.
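
To make the difference concrete, here is a toy Python model of the behaviour the changelog describes. It is a sketch under stated assumptions, not kernel code: `Node`, `Cgroup`, `memcg_reclaim()`, `direct_reclaim_stalls()` and `victim_stalls()` are hypothetical names, and the only thing modelled is who records congestion and who is throttled by it. The scenario assumes kswapd never runs, so nothing clears the node-wide flag.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Stand-in for the PGDAT_CONGESTED bit in pgdat->flags.
    congested: bool = False


@dataclass
class Cgroup:
    name: str
    # Hypothetical per-cgroup congestion state (the field name is an assumption).
    congested: bool = False


def memcg_reclaim(node: Node, cgroup: Cgroup, per_cgroup_state: bool) -> None:
    """Record congestion seen while reclaiming on behalf of `cgroup`."""
    if per_cgroup_state:
        cgroup.congested = True  # patched: node state is left to kswapd
    else:
        node.congested = True    # unpatched: the flag affects the whole node


def direct_reclaim_stalls(node: Node, cgroup: Cgroup) -> bool:
    """A reclaimer is throttled if the node or its own cgroup is congested."""
    return node.congested or cgroup.congested


def victim_stalls(per_cgroup_state: bool) -> bool:
    """Congest one cgroup, then ask whether an unrelated cgroup is throttled."""
    node = Node()
    congester, victim = Cgroup("congester"), Cgroup("victim")
    memcg_reclaim(node, congester, per_cgroup_state)
    # kswapd never runs in this scenario, so nothing clears node.congested.
    return direct_reclaim_stalls(node, victim)


print("unpatched:", victim_stalls(per_cgroup_state=False))
print("patched:  ", victim_stalls(per_cgroup_state=True))
```

In this model the unrelated cgroup is throttled only when congestion is recorded on the node, which is the case the patch is meant to eliminate.
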
>
> The problem could be easily demonstrated by creating heavy congestion
> in one cgroup:
>
>     echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
>     mkdir -p /sys/fs/cgroup/congester
>     echo 512M > /sys/fs/cgroup/congester/memory.max
>     echo $$ > /sys/fs/cgroup/congester/cgroup.procs
>     /* generate a lot of dirty data on slow HDD */
>     while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done &
>     ....
>     while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done &
>
> and some job in another cgroup:
>
>     mkdir /sys/fs/cgroup/victim
>     echo 128M > /sys/fs/cgroup/victim/memory.max
>
>     # time cat /dev/sda > /dev/null
>     real    10m15.054s
>     user    0m0.487s
>     sys     1m8.505s
>
> According to the tracepoint in wait_iff_congested(), the 'cat' spent 50%
> of the time sleeping there.
>
> With the patch, cat doesn't waste time anymore:
>
>     # time cat /dev/sda > /dev/null
>     real    5m32.911s
>     user    0m0.411s
>     sys     0m56.664s
>

Reviewers, please?
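
As a quick sanity check on the quoted timings, the two `real` values can be converted to seconds and compared. This is a minimal sketch; `to_seconds()` is a helper introduced here for illustration and only handles the `NmS.SSSs` form that `time` printed above.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '10m15.054s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


before = to_seconds("10m15.054s")  # wall-clock time without the patch
after = to_seconds("5m32.911s")    # wall-clock time with the patch
saved_fraction = (before - after) / before

print(f"before={before:.3f}s after={after:.3f}s saved={saved_fraction:.1%}")
# prints: before=615.054s after=332.911s saved=45.9%
```

A drop of roughly 46% in wall-clock time is broadly in line with the changelog's statement that 'cat' spent about 50% of its time sleeping in wait_iff_congested().
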