From: Li RongQing <lirongqing@baidu.com>
To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Alexander Viro, Johannes Weiner, Michal Hocko, Vladimir Davydov
Subject: [PATCH 2/2] fs/writeback: do memory cgroup related writeback first
Date: Wed, 1 Aug 2018 18:48:36 +0800
Message-Id: <1533120516-18279-2-git-send-email-lirongqing@baidu.com>
X-Mailer: git-send-email 1.7.10.1
In-Reply-To: <1533120516-18279-1-git-send-email-lirongqing@baidu.com>
References: <1533120516-18279-1-git-send-email-lirongqing@baidu.com>

When a machine has hundreds of memory cgroups, each generating more or
fewer dirty pages, and one of them is under heavy memory pressure and
keeps trying to reclaim dirty pages, that cgroup triggers writeback on
all cgroups. This is inefficient:

1. If the memory used by a cgroup has reached its limit, writing back
   the other cgroups is useless.
2. The other cgroups could wait longer and merge more write requests.

So replace the full flush with writeback of only the memory cgroups
whose tasks are trying to reclaim memory and have triggered writeback;
if nothing gets written back that way, fall back to a full flush.
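For context, the reclaim path that passes WB_REASON_VMSCAN lives in
mm/vmscan.c; the excerpt below is a simplified sketch of roughly how it
looks in current trees, and is not part of this patch:

	/*
	 * shrink_inactive_list(), simplified: when everything reclaim
	 * pulled off the LRU was dirty and not yet queued for I/O, it
	 * kicks the flusher threads. This is the wakeup that the patch
	 * below narrows to the reclaiming cgroup's writeback.
	 */
	if (stat.nr_unqueued_dirty == nr_taken)
		wakeup_flusher_threads(WB_REASON_VMSCAN);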
After this patch, write performance improves by about 5% in the setup
below:

$mount -t cgroup none -o memory /cgroups/memory/
$mkdir /cgroups/memory/x1
$echo $$ > /cgroups/memory/x1/tasks
$echo 100M > /cgroups/memory/x1/memory.limit_in_bytes
$cd /cgroups/memory/
$seq 10000 | xargs mkdir
$fio -filename=/home/test1 -direct=0 -iodepth 1 -thread -rw=write \
     -ioengine=libaio -bs=16k -size=20G

Before:
 WRITE: io=20480MB, aggrb=779031KB/s, minb=779031KB/s, maxb=779031KB/s,
 mint=26920msec, maxt=26920msec
After:
 WRITE: io=20480MB, aggrb=831708KB/s, minb=831708KB/s, maxb=831708KB/s,
 mint=25215msec, maxt=25215msec

This patch can also reduce I/O utilization in a setup like the
following: a machine has two disks, one storing assorted logs (and
therefore expected to see little I/O pressure) and the other storing
Hadoop data, which writes a lot to disk. In practice both disks show
high I/O utilization, because whenever Hadoop reclaims memory it wakes
up writeback on every memory cgroup.

Signed-off-by: Zhang Yu
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 fs/fs-writeback.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 471d863958bc..475cada5d1cf 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -35,6 +35,11 @@
  */
 #define MIN_WRITEBACK_PAGES	(4096UL >> (PAGE_SHIFT - 10))
 
+/*
+ * If cgroup writeback dirty pages exceed this, do not start a full flush
+ */
+#define MIN_WB_DIRTY_PAGES	64
+
 struct wb_completion {
 	atomic_t		cnt;
 };
@@ -2005,6 +2010,32 @@ void wakeup_flusher_threads(enum wb_reason reason)
 	if (blk_needs_flush_plug(current))
 		blk_schedule_flush_plug(current);
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+	if (reason == WB_REASON_VMSCAN) {
+		unsigned long tmp, pdirty = 0;
+
+		rcu_read_lock();
+		list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
+			struct bdi_writeback *wb = wb_find_current(bdi);
+
+			if (wb) {
+				tmp = mem_cgroup_wb_dirty_stats(wb);
+				if (tmp) {
+					pdirty += tmp;
+					wb_start_writeback(wb, reason);
+
+					if (wb == &bdi->wb)
+						pdirty += MIN_WB_DIRTY_PAGES;
+				}
+			}
+		}
+		rcu_read_unlock();
+
+		if (pdirty > MIN_WB_DIRTY_PAGES)
+			return;
+	}
+#endif
+
 	rcu_read_lock();
 	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
 		__wakeup_flusher_threads_bdi(bdi, reason);
-- 
2.16.2
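A note for anyone reading 2/2 on its own: mem_cgroup_wb_dirty_stats()
is introduced by patch 1/2 of this series and is not quoted above. As a
rough illustration only (an assumption, not the actual 1/2
implementation), such a helper could be built on the existing
mem_cgroup_wb_stats() API along these lines:

	#ifdef CONFIG_CGROUP_WRITEBACK
	/*
	 * Hypothetical sketch; the real helper comes from patch 1/2.
	 * Assumed behaviour: return the number of dirty pages accounted
	 * to the memory cgroup owning this bdi_writeback, so the caller
	 * can tell whether a per-cgroup flush has any work to do.
	 */
	static unsigned long mem_cgroup_wb_dirty_stats(struct bdi_writeback *wb)
	{
		unsigned long filepages, headroom, dirty, writeback;

		mem_cgroup_wb_stats(wb, &filepages, &headroom, &dirty,
				    &writeback);
		return dirty;
	}
	#endif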