Date: Thu, 15 Aug 2019 21:05:25 +0200
From: Michal Hocko
To: Jason Gunthorpe
Cc: LKML, linux-mm@kvack.org, DRI Development, Intel Graphics Development, Peter Zijlstra, Ingo Molnar, Andrew Morton, David Rientjes, Christian König, Jérôme Glisse, Masahiro Yamada, Wei Wang, Andy Shevchenko, Thomas Gleixner, Jann Horn, Feng Tang, Kees Cook, Randy Dunlap, Daniel Vetter
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
Message-ID: <20190815190525.GS9477@dhcp22.suse.cz>

On Thu 15-08-19 15:24:48, Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 07:42:07PM +0200, Michal Hocko wrote:
> > On Thu 15-08-19 13:56:31, Jason Gunthorpe wrote:
> > > On Thu, Aug 15, 2019 at 06:00:41PM +0200, Michal Hocko wrote:
> > >
> > > > > AFAIK 'GFP_NOWAIT' is characterized by the lack of __GFP_FS and
> > > > > __GFP_DIRECT_RECLAIM..
> > > > >
> > > > > This matches the existing test in __need_fs_reclaim() - so if you are
> > > > > OK with GFP_NOFS, aka __GFP_IO which triggers try_to_compact_pages(),
> > > > > allocations during OOM, then I think fs_reclaim already matches what
> > > > > you described?
> > > >
> > > > No, GFP_NOFS is equally bad. Please read my other email explaining what
> > > > the oom_reaper actually requires. In short, no blocking on a direct or
> > > > indirect dependency on a memory allocation that might sleep.
> > >
> > > It is much easier to follow with some hints on code, so the true
> > > requirement is that the OOM reaper not block on GFP_FS and GFP_IO
> > > allocations, great, that constraint is now clear.
> >
> > I still do not get why you put FS/IO into the picture. This is really
> > about __GFP_DIRECT_RECLAIM.
>
> Like I said this is complicated, translating "no blocking on direct or
> indirect dependency on memory allocation that might sleep" into GFP
> flags is hard for us outside the mm community.
>
> So the constraint here is no __GFP_DIRECT_RECLAIM?

OK, I am obviously failing to explain that. Sorry about that. You are
right that this is not simple. Let me try again.

The context we are calling !blockable notifiers from has to finish in a
_finite_ amount of time (and swift completion is hugely appreciated by
users of an otherwise non-responsive system that is under OOM). We are
out of memory, so we cannot be blocked waiting for memory, either
directly or indirectly (via a lock, or more generally by waiting for an
event that needs memory to finish). So you need to track dependencies
over more complicated contexts than the direct call path (think of a
workqueue, for example).

> I bring up FS/IO because that is what Tejun mentioned when I asked him
> about reclaim restrictions, and is what fs_reclaim_acquire() is
> already sensitive to. It is pretty confusing if we have places using
> the word 'reclaim' with different restrictions. :(

fs_reclaim was invented to catch potential deadlocks when a
GFP_NO{FS/IO} allocation runs into fs/io reclaim. This is a subset of
the reclaim that we have. The oom context is more restricted and it
cannot depend on _any_ memory reclaim, because by the time we have got
to this context all the reclaim has already failed and waiting some
more will simply not help.

> > >       CPU0                                 CPU1
> > >                                         mutex_lock()
> > >                                         kmalloc(GFP_KERNEL)
> >
> > no I mean __GFP_DIRECT_RECLAIM here.
> >
> > >                                         mutex_unlock()
> > >   fs_reclaim_acquire()
> > >   mutex_lock() <- illegal: lock dep assertion
> >
> > I cannot really comment on how that is achievable by lockdep.
>
> ??? I am trying to explain this is already done and working today. The
> above example will already fault with lockdep enabled.
>
> This is existing debugging we can use and improve upon rather than
> invent new debugging.

This is what you claim, and I am saying that fs_reclaim is about a
restricted reclaim context and it is an ugly hack. It has proven to
report false positives. Maybe it can be extended to a generic reclaim.
I haven't tried that, and I do not aim to try it.
-- 
Michal Hocko
SUSE Labs
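
For readers trying to map the argument above onto GFP flags, here is a
small userspace sketch. It is only a model: the bit values are made up
for illustration and do not match include/linux/gfp.h, and
fs_reclaim_tracked() is a simplified, hypothetical stand-in for the
check done by __need_fs_reclaim(), not the kernel code itself. What it
tries to show is that GFP_NOFS and GFP_NOIO still carry
__GFP_DIRECT_RECLAIM, so they can still block on reclaim even though an
fs_reclaim-style check would not flag them.

```c
/*
 * Userspace model only. Bit positions are illustrative assumptions,
 * not the kernel's real values; the flag compositions follow the
 * usual GFP_* definitions.
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int gfp_t;

#define __GFP_IO             0x01u
#define __GFP_FS             0x02u
#define __GFP_DIRECT_RECLAIM 0x04u
#define __GFP_KSWAPD_RECLAIM 0x08u
#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)
#define GFP_NOIO   (__GFP_RECLAIM)
#define GFP_NOFS   (__GFP_RECLAIM | __GFP_IO)
#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

/* True if an allocation with this mask may sleep in direct reclaim. */
static bool may_direct_reclaim(gfp_t gfp)
{
	return (gfp & __GFP_DIRECT_RECLAIM) != 0;
}

/*
 * Simplified model of what an fs_reclaim-style annotation covers:
 * only allocations that may both enter direct reclaim and recurse
 * into the filesystem.
 */
static bool fs_reclaim_tracked(gfp_t gfp)
{
	return may_direct_reclaim(gfp) && (gfp & __GFP_FS) != 0;
}

/* Print how the model classifies one allocation mask. */
static void describe(const char *name, gfp_t gfp)
{
	printf("%-10s direct_reclaim=%d fs_reclaim_tracked=%d\n",
	       name, may_direct_reclaim(gfp), fs_reclaim_tracked(gfp));
}
```

Calling describe() for GFP_NOWAIT, GFP_NOIO, GFP_NOFS and GFP_KERNEL
from a main() shows the gap under this model: of the four, only
GFP_NOWAIT avoids direct reclaim, while only GFP_KERNEL is covered by
the fs_reclaim-style check.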