Date: Mon, 25 Jun 2018 11:09:57 +0200
From: Michal Hocko
To: Mikulas Patocka
Cc: jing xia, Mike Snitzer, agk@redhat.com, dm-devel@redhat.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: dm bufio: Reduce dm_bufio_lock contention
Message-ID: <20180625090957.GF28965@dhcp22.suse.cz>
References: <20180615130925.GI24039@dhcp22.suse.cz>
    <20180619104312.GD13685@dhcp22.suse.cz>
    <20180622090151.GS10465@dhcp22.suse.cz>
    <20180622090935.GT10465@dhcp22.suse.cz>
    <20180622130524.GZ10465@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 22-06-18 14:57:10, Mikulas Patocka wrote:
> On Fri, 22 Jun 2018, Michal Hocko wrote:
>
> > On Fri 22-06-18 08:52:09, Mikulas Patocka wrote:
> > > On Fri, 22 Jun 2018, Michal Hocko wrote:
> > >
> > > > On Fri 22-06-18 11:01:51, Michal Hocko wrote:
> > > > > On Thu 21-06-18 21:17:24, Mikulas Patocka wrote:
> > > > [...]
> > > > > > What about this patch? If __GFP_NORETRY and __GFP_FS is not set (i.e. the
> > > > > > request comes from a block device driver or a filesystem), we should not
> > > > > > sleep.
> > > > >
> > > > > Why? How are you going to audit all the callers that the behavior makes
> > > > > sense and moreover how are you going to ensure that future usage will
> > > > > still make sense. The more subtle side effects gfp flags have the harder
> > > > > they are to maintain.
> > > >
> > > > So just as an exercise. Try to explain the above semantic to users. We
> > > > currently have the following.
> > > >
> > > >  * __GFP_NORETRY: The VM implementation will try only very lightweight
> > > >  *   memory direct reclaim to get some memory under memory pressure (thus
> > > >  *   it can sleep). It will avoid disruptive actions like OOM killer. The
> > > >  *   caller must handle the failure which is quite likely to happen under
> > > >  *   heavy memory pressure. The flag is suitable when failure can easily be
> > > >  *   handled at small cost, such as reduced throughput
> > > >
> > > >  * __GFP_FS can call down to the low-level FS. Clearing the flag avoids the
> > > >  *   allocator recursing into the filesystem which might already be holding
> > > >  *   locks.
> > > >
> > > > So how are you going to explain gfp & (__GFP_NORETRY | ~__GFP_FS)? What
> > > > is the actual semantic without explaining the whole reclaim or force
> > > > users to look into the code to understand that? What about GFP_NOIO |
> > > > __GFP_NORETRY? What does it mean to that "should not sleep". Do all
> > > > shrinkers have to follow that as well?
> > >
> > > My reasoning was that there is broken code that uses __GFP_NORETRY and
> > > assumes that it can't fail - so conditioning the change on !__GFP_FS would
> > > minimize the disruption to the broken code.
> > >
> > > Anyway - if you want to test only on __GFP_NORETRY (and fix those 16
> > > broken cases that assume that __GFP_NORETRY can't fail), I'm OK with that.
> >
> > As I've already said, this is a subtle change which is really hard to
> > reason about. Throttling on congestion has its meaning and reason. Look
> > at why we are doing that in the first place. You cannot simply say this
>
> So - explain why is throttling needed. You support throttling, I don't, so
> you have to explain it :)
>
> > is ok based on your specific usecase. We do have means to achieve that.
> > It is explicit and thus it will be applied only where it makes sense.
> > You keep repeating that implicit behavior change for everybody is
> > better.
>
> I don't want to change it for everybody. I want to change it for block
> device drivers. I don't care what you do with non-block drivers.

Well, it is usually the onus of the patch submitter to justify any
change. But let me be nice to you, for once. This throttling is
triggered only if all the pages we have encountered during the reclaim
attempt are dirty, which means that we are rushing through the LRU list
quicker than the flushers are able to clean it. If we didn't throttle we
could hit stronger reclaim priorities (aka scan more to reclaim memory)
and reclaim more pages as a result.

> > I guess we will not agree on that part. I consider it a hack
> > rather than a systematic solution. I can easily imagine that we just
> > find out other call sites that would cause over reclaim or similar
>
> If a __GFP_NORETRY allocation does overreclaim - it could be fixed by
> returning NULL instead of doing overreclaim. The specification says that
> callers must handle failure of __GFP_NORETRY allocations.
>
> So yes - if you think that just skipping throttling on __GFP_NORETRY could
> cause excessive CPU consumption trying to reclaim unreclaimable pages or
> something like that - then you can add more points where the __GFP_NORETRY
> is failed with NULL to avoid the excessive CPU consumption.

Which is exactly what I do not want to do: spread __GFP_NORETRY all
over the reclaim code. Especially for something we already have means
for...
-- 
Michal Hocko
SUSE Labs
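
For reference, the flag combination debated in this thread can be written
out as plain bit tests. What follows is a minimal userspace C sketch, not
kernel code: the DEMO_* bit values and the demo_* function names are
placeholders invented for illustration and do not match the kernel's real
definitions. It assumes the proposal means "__GFP_NORETRY set and __GFP_FS
clear", and contrasts that test with the shorthand
gfp & (__GFP_NORETRY | ~__GFP_FS) read literally as C, which is nonzero
whenever any bit other than the FS bit is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bits for illustration only; NOT the kernel's real values. */
#define DEMO_GFP_IO      0x1u
#define DEMO_GFP_FS      0x2u
#define DEMO_GFP_NORETRY 0x4u

/* Assumed reading of the proposal: NORETRY is set and FS is clear. */
static inline bool demo_noretry_without_fs(unsigned int gfp)
{
	return (gfp & DEMO_GFP_NORETRY) && !(gfp & DEMO_GFP_FS);
}

/* The shorthand from the mail, evaluated literally as a C expression. */
static inline bool demo_literal_shorthand(unsigned int gfp)
{
	return (gfp & (DEMO_GFP_NORETRY | ~DEMO_GFP_FS)) != 0;
}

/* Small self-check showing where the two tests agree and differ. */
static inline void demo_self_check(void)
{
	/* NORETRY set, FS clear: both tests are true. */
	assert(demo_noretry_without_fs(DEMO_GFP_NORETRY));
	assert(demo_literal_shorthand(DEMO_GFP_NORETRY));

	/* NORETRY and FS both set: only the literal shorthand is true. */
	assert(!demo_noretry_without_fs(DEMO_GFP_NORETRY | DEMO_GFP_FS));
	assert(demo_literal_shorthand(DEMO_GFP_NORETRY | DEMO_GFP_FS));

	/* Only IO set, no NORETRY: again only the literal shorthand is true. */
	assert(!demo_noretry_without_fs(DEMO_GFP_IO));
	assert(demo_literal_shorthand(DEMO_GFP_IO));
}
```

Calling demo_self_check() from a small main() exercises the three cases.
The sketch only shows how the two conditions differ as bit tests; it says
nothing about whether skipping the throttle is the right behavior, which
is the actual point of disagreement above.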