Date: Thu, 19 Sep 2019 10:22:08 +0200
From: Michal Hocko
To: Lin Feng
Cc: Matthew Wilcox, corbet@lwn.net, mcgrof@kernel.org,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, keescook@chromium.org,
 mchehab+samsung@kernel.org, mgorman@techsingularity.net,
 vbabka@suse.cz, ktkhai@virtuozzo.com, hannes@cmpxchg.org,
 Jens Axboe, Omar Sandoval, Ming Lei
Subject: Re: [PATCH] [RFC] vmscan.c: add a sysctl entry for controlling
 memory reclaim IO congestion_wait length
Message-ID: <20190919082208.GB15782@dhcp22.suse.cz>
References: <20190917115824.16990-1-linf@wangsu.com>
 <20190917120646.GT29434@bombadil.infradead.org>
 <20190918123342.GF12770@dhcp22.suse.cz>
 <6ae57d3e-a3f4-a3db-5654-4ec6001941a9@wangsu.com>
 <20190919034949.GF9880@bombadil.infradead.org>
 <33090db5-c7d4-8d7d-0082-ee7643d15775@wangsu.com>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=us-ascii
Content-Disposition: inline
In-Reply-To: <33090db5-c7d4-8d7d-0082-ee7643d15775@wangsu.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 19-09-19 15:46:11, Lin Feng wrote:
>
>
> On 9/19/19 11:49, Matthew Wilcox wrote:
> > On Thu, Sep 19, 2019 at 10:33:10AM +0800, Lin Feng wrote:
> > > On 9/18/19 20:33, Michal Hocko wrote:
> > > > I absolutely agree here. From you changelog it is also not clear what is
> > > > the underlying problem. Both congestion_wait and wait_iff_congested
> > > > should wake up early if the congestion is handled. Is this not the case?
> > >
> > > For now I don't know why, codes seem should work as you said, maybe I need to
> > > trace more of the internals.
> > > But weird thing is that once I set the people-disliked-tunable iowait
> > > drop down instantly, this is contradictory to the code design.
> >
> > Yes, this is quite strange. If setting a smaller timeout makes a
> > difference, that indicates we're not waking up soon enough. I see
> > two possibilities; one is that a wakeup is missing somewhere -- ie the
> > conditions under which we call clear_wb_congested() are wrong. Or we
> > need to wake up sooner.
> >
> > Umm. We have clear_wb_congested() called from exactly one spot --
> > clear_bdi_congested(). That is only called from:
> >
> > drivers/block/pktcdvd.c
> > fs/ceph/addr.c
> > fs/fuse/control.c
> > fs/fuse/dev.c
> > fs/nfs/write.c
> >
> > Jens, is something supposed to be calling clear_bdi_congested() in the
> > block layer? blk_clear_congested() used to exist until October 29th
> > last year. Or is something else supposed to be waking up tasks that
> > are sleeping on congestion?
> >
>
> IIUC it looks like after commit a1ce35fa49852db60fc6e268038530be533c5b15,

This is something for Jens to comment on. Not waking up on congestion
indeed sounds like a bug.

> besides those *.c places as you mentioned above, vmscan codes will always
> wait as long as 100ms and nobody wakes them up.

Yes, this is true, but you should realize that this path is triggered
only under heavy memory reclaim, where there is nothing to reclaim
because too many pages are already isolated and we are waiting for
reclaimers to make some progress on them. It is also possible that there
are simply no reclaimable pages at all and we are heading for an OOM
situation. In both cases waiting a bit shouldn't be critical because
this is really a cold path. It would be much better to have a mechanism
to wake up earlier, but this is likely to be non-trivial and I am not
sure it is worth the effort considering how rare this should be.

-- 
Michal Hocko
SUSE Labs