Date: Wed, 14 Aug 2019 20:43:10 +1000
From: Dave Chinner <david@fromorbit.com>
To: Mikulas Patocka
Cc: Alexander Viro, "Darrick J. Wong", Mike Snitzer, junxiao.bi@oracle.com,
 dm-devel@redhat.com, Alasdair Kergon, honglei.wang@oracle.com,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-xfs@vger.kernel.org
Subject: Re: [PATCH] direct-io: use GFP_NOIO to avoid deadlock
Message-ID: <20190814104310.GN6129@dread.disaster.area>
References: <20190809013403.GY7777@dread.disaster.area> <20190809215733.GZ7777@dread.disaster.area>

On Tue, Aug 13, 2019 at 12:35:49PM -0400, Mikulas Patocka wrote:
>
> On Sat, 10 Aug 2019, Dave Chinner wrote:
>
> > No, you misunderstand. I'm talking about blocking kswapd being
> > wrong. i.e. blocking kswapd in shrinkers causes problems
> > because the memory reclaim code does not expect kswapd to be
> > arbitrarily delayed by waiting on IO. We've had this problem with
> > the XFS inode cache shrinker for years, and there are many reports
> > of extremely long reclaim latencies for both direct and kswapd
> > reclaim that result from kswapd not making progress while waiting
> > in shrinkers for IO to complete.
> >
> > The work I'm currently doing to fix this XFS problem can be found
> > here:
> >
> > https://lore.kernel.org/linux-fsdevel/20190801021752.4986-1-david@fromorbit.com/
> >
> > i.e. the point I'm making is that waiting for IO in kswapd reclaim
> > context is considered harmful - kswapd context shrinker reclaim
> > should be as non-blocking as possible, and any back-off to wait for
> > IO to complete should be done by the high level reclaim core once
> > it's completed an entire reclaim scan cycle of everything....
> >
> > What follows from that, and is pertinent in this situation, is
> > that if you don't block kswapd, then other reclaim contexts are not
> > going to get stuck waiting for it regardless of the reclaim context
> > they use.
> >
> > Cheers,
> >
> > Dave.
>
> So, what do you think the dm-bufio shrinker should do?

I'm not familiar with the constraints the code operates under, so I
can't guarantee that I have an answer for you... :/

> Currently it tries to free buffers on the clean list and if there are not
> enough buffers on the clean list, it goes into the dirty list - it writes
> the buffers back and then frees them.
>
> What should it do? Should it just start writeback of the dirty list
> without waiting for it? What should it do if all the buffers are under
> writeback?

For kswapd, it should do what it can without blocking. e.g. kicking
an async writer thread rather than submitting the IO itself. That's
what I changed XFS to do.

And if you look at the patchset in the above link, it also
introduced a mechanism for shrinkers to communicate back to the high
level reclaim code that kswapd needs to back off
(reclaim_state->need_backoff).

With this mechanism, the shrinker can start IO without blocking
kswapd on congested request queues and tell memory reclaim to wait
before calling this shrinker again. This allows kswapd to aggregate
all the waits that shrinkers and page reclaim require for progress
to be made into a single backoff event. That means kswapd does other
scanning work while background writeback goes on, and once
everything is scanned it does a single wait for everything that
needs time to make progress...
I think that should also work for the dm-bufio shrinker, and the
direct reclaim backoff parts of the patchset should work for
non-blocking direct reclaim scanning as well, like it now does for
XFS.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com