Subject: Re: [PATCH 0/12 v3] Writeback improvements
To: John Stoffel
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org, jack@suse.cz, torvalds@linux-foundation.org
References: <1506543239-31470-1-git-send-email-axboe@kernel.dk> <20170928131949.GA4384@quad.stoffel.home>
From: Jens Axboe
Message-ID: <0ad8b55b-1712-51f4-ac89-a838bfe3b81e@kernel.dk>
Date: Thu, 28 Sep 2017 15:39:49 +0200
In-Reply-To: <20170928131949.GA4384@quad.stoffel.home>

On 09/28/2017 03:19 PM, John Stoffel wrote:
> On Wed, Sep 27, 2017 at 02:13:47PM -0600, Jens Axboe wrote:
>> We've had some issues with writeback in presence of memory reclaim
>> at Facebook, and this patch set attempts to fix it up. The real
>> functional change for that issue is patch 10. The rest are cleanups,
>> as well as the removal of doing non-range cyclic writeback. The users
>> of that was sync_inodes_sb() and wakeup_flusher_threads(), both of
>> which writeback all of the dirty pages.
>
> So does this patch set make things faster? Less bursty? Does it make
> writeout take longer, but with less spikes? What is the performance
> impact of this change?
> I hate to be a pain, but this just smacks of
> arm waving and I'm sure FB doesn't make changes without data... :-)

See patch 10, this isn't arm waving at all. The whole point is that you
can have millions of writeback work items, which don't do anything. See
not only are we wasting a full core of doing nothing, it's bad enough
that we can trigger softlockups since it's just sitting there in a loop
doing that.

It's all explained in that patch...

>> The basic idea is that we have callers that call
>> wakeup_flusher_threads() with nr_pages == 0. This means 'writeback
>> everything'. For memory reclaim situations, we can end up queuing
>> a TON of these kinds of writeback units. This can cause softlockups
>> and further memory issues, since we allocate huge amounts of
>> struct wb_writeback_work to handle this writeback. Handle this
>> situation more gracefully.
>
> Do you push back on the callers or slow them down? Why do we even
> allow callers to flush everything?

Ehm, because we have to? There are cases where flushing everything makes
sense. Laptop mode is one of them, the problematic case here is memory
reclaim. To clean dirty pages, you have to kick the flusher threads.

-- 
Jens Axboe
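
To illustrate the problem discussed above: if every caller asking for
'writeback everything' queues its own work item, the queue can grow
without bound under memory reclaim, even though one such item would do
the job. One way to handle this more gracefully is to allow only a
single "flush everything" request to be pending at a time. What follows
is a minimal userspace sketch of that idea in C11, not the code from
patch 10. struct flusher, flusher_init(), request_full_flush() and
complete_full_flush() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-device flusher; not a kernel structure. */
struct flusher {
    /* true while a "write back everything" request is queued or running */
    atomic_bool full_flush_pending;
    /* number of work items actually queued (for illustration only) */
    atomic_size_t queued;
};

static void flusher_init(struct flusher *f)
{
    atomic_init(&f->full_flush_pending, false);
    atomic_init(&f->queued, 0);
}

/*
 * Request a full flush. Returns true if a new work item was queued,
 * false if one was already pending and this request was coalesced.
 */
static bool request_full_flush(struct flusher *f)
{
    /*
     * atomic_exchange() returns the previous value. If the flag was
     * already set, a full flush is pending and will cover this request
     * too, so nothing is allocated or queued.
     */
    if (atomic_exchange(&f->full_flush_pending, true))
        return false;

    atomic_fetch_add(&f->queued, 1);
    return true;
}

/* Called by the worker once the pending full flush has been processed. */
static void complete_full_flush(struct flusher *f)
{
    /* Completing a flush only makes sense if one was pending. */
    assert(atomic_load(&f->full_flush_pending));

    atomic_fetch_sub(&f->queued, 1);
    /* Clear the flag last so that later requests can queue again. */
    atomic_store(&f->full_flush_pending, false);
}
```

With this shape, a million back-to-back calls to request_full_flush()
leave exactly one item queued instead of a million allocations. Callers
are neither blocked nor slowed down; redundant requests are just dropped.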