Subject: Re: [PATCH] [RFC] vmscan.c: add a sysctl entry for controlling memory reclaim IO congestion_wait length
From: Lin Feng
To: Matthew Wilcox
Cc: corbet@lwn.net, mcgrof@kernel.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, keescook@chromium.org, mchehab+samsung@kernel.org, mgorman@techsingularity.net, vbabka@suse.cz, mhocko@suse.com, ktkhai@virtuozzo.com, hannes@cmpxchg.org
Date: Thu, 19 Sep 2019 10:20:28 +0800
References: <20190917115824.16990-1-linf@wangsu.com> <20190917120646.GT29434@bombadil.infradead.org> <3fbb428e-9466-b56b-0be8-c0f510e3aa99@wangsu.com> <20190918113859.GA9880@bombadil.infradead.org>
In-Reply-To: <20190918113859.GA9880@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 9/18/19 19:38, Matthew Wilcox wrote:
> On Wed, Sep 18, 2019 at 11:21:04AM +0800, Lin Feng wrote:
>>> Adding a new tunable is not the right solution.  The right way is
>>> to make Linux auto-tune itself to avoid the problem.  For example,
>>> bdi_writeback contains an estimated write bandwidth (calculated by the
>>> memory management layer).  Given that, we should be able to make an
>>> estimate for how long to wait for the queues to drain.
>>>
>>
>> Yes, I had ever considered that, auto-tuning is definitely the senior AI way.
>> While considering all kinds of production environments hybrid storage solution
>> is also common today, servers' dirty pages' bdi drivers can span from high end
>> ssds to low end sata disk, so we have to think of a *formula(AI core)* by using
>> the factors of dirty pages' amount and bdis' write bandwidth, and this AI-core
>> will depend on if the estimated write bandwidth is sane and moreover the to be
>> written back dirty pages is sequential or random if the bdi is rotational disk,
>> it's likely to give a not-sane number and hurt guys who don't want that, while
>> if only consider ssd is relatively simple.
>>
>> So IMHO it's not sane to brute force add a guessing logic into memory writeback
>> codes and pray on inventing a formula that caters everyone's need.
>> Add a sysctl entry may be a right choice that give people who need it and
>> doesn't hurt people who don't want it.
>
> You're making this sound far harder than it is.  All the writeback code
> needs to know is "How long should I sleep for in order for the queues
> to drain a substantial amount".  Since you know the bandwidth and how
> many pages you've queued up, it's a simple calculation.
>

Ah, I should have read more of the writeback code ;-)

Based on Michal's comments:
> the underlying problem. Both congestion_wait and wait_iff_congested
> should wake up early if the congestion is handled. Is this not the case?

If the process is woken up as soon as the bdi congestion clears, the length
of this timeout seems less important. I need to trace further to see whether
I can reproduce the issue without online network traffic. The odd thing,
though, is that once I set the people-disliked tunable, iowait drops
instantly, which seems to contradict that.

Anyway, thanks a lot for your suggestions!

linfeng
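
As a rough illustration of the "simple calculation" quoted above, here is a
minimal userspace C sketch. It is not kernel code and not a proposed patch.
The helper name drain_wait_ms, its parameters, the choice of expressing
"a substantial amount" as a percentage, and the assumption that the
bandwidth estimate is given in pages per second are all hypothetical and
used only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not kernel code):
 * estimate how many milliseconds to sleep so that `fraction_pct`
 * percent of the queued pages can be written back, given an
 * estimated write bandwidth in pages per second.
 *
 * The result is capped at `max_ms`, which plays the role of the
 * fixed timeout that the sleep would otherwise use.
 */
static unsigned int drain_wait_ms(uint64_t queued_pages,
                                  uint64_t bw_pages_per_sec,
                                  unsigned int fraction_pct,
                                  unsigned int max_ms)
{
    uint64_t target_pages;
    uint64_t ms;

    /* No usable bandwidth estimate: fall back to the fixed timeout. */
    if (bw_pages_per_sec == 0)
        return max_ms;

    /* Number of pages we want drained before waking up. */
    target_pages = queued_pages * fraction_pct / 100;

    /* time = pages / (pages per second), converted to milliseconds. */
    ms = target_pages * 1000 / bw_pages_per_sec;

    return ms > max_ms ? max_ms : (unsigned int)ms;
}
```

With a fast device the estimate stays well below the cap, while a slow
rotational disk with a large backlog simply hits the cap, so the sketch
never waits longer than the fixed timeout would. Whether the bandwidth
estimate is trustworthy for mixed sequential and random writeback is exactly
the concern raised earlier in the thread and is not addressed here.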