Date: Thu, 15 Aug 2019 19:42:07 +0200
From: Michal Hocko
To: Jason Gunthorpe
Cc: LKML, linux-mm@kvack.org, DRI Development, Intel Graphics Development,
 Peter Zijlstra, Ingo Molnar, Andrew Morton, David Rientjes,
 Christian König, Jérôme Glisse, Masahiro Yamada, Wei Wang,
 Andy Shevchenko, Thomas Gleixner, Jann Horn, Feng Tang, Kees Cook,
 Randy Dunlap, Daniel Vetter
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
Message-ID: <20190815174207.GR9477@dhcp22.suse.cz>
References: <20190814202027.18735-1-daniel.vetter@ffwll.ch>
 <20190814202027.18735-3-daniel.vetter@ffwll.ch>
 <20190814235805.GB11200@ziepe.ca>
 <20190815065829.GA7444@phenom.ffwll.local>
 <20190815122344.GA21596@ziepe.ca>
 <20190815132127.GI9477@dhcp22.suse.cz>
 <20190815141219.GF21596@ziepe.ca>
 <20190815155950.GN9477@dhcp22.suse.cz>
 <20190815165631.GK21596@ziepe.ca>
In-Reply-To: <20190815165631.GK21596@ziepe.ca>

On Thu 15-08-19 13:56:31, Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 06:00:41PM +0200, Michal Hocko wrote:
>
> > > AFAIK 'GFP_NOWAIT' is characterized by the lack of __GFP_FS and
> > > __GFP_DIRECT_RECLAIM..
> > >
> > > This matches the existing test in __need_fs_reclaim() - so if you are
> > > OK with GFP_NOFS, aka __GFP_IO which triggers try_to_compact_pages(),
> > > allocations during OOM, then I think fs_reclaim already matches what
> > > you described?
> >
> > No, GFP_NOFS is equally bad. Please read my other email explaining what
> > the oom_reaper actually requires. In short, no blocking on a direct or
> > indirect dependency on memory allocation that might sleep.
>
> It is much easier to follow with some hints on code, so the true
> requirement is that the OOM reaper not block on GFP_FS and GFP_IO
> allocations, great, that constraint is now clear.

I still do not get why you put FS/IO into the picture. This is really
about __GFP_DIRECT_RECLAIM.

> > If you can express that in the existing lockdep machinery. All
> > fine. But then consider deployments where lockdep is no-no because
> > of the overhead.
>
> This is all for driver debugging. The point of lockdep is to find all
> these paths without having to hit them as actual races, using debug
> kernels.
>
> I don't think we need this kind of debugging on production kernels?

Again, the primary motivation was a simple debugging aid that could be
used without worrying about overhead. So lockdep is very often out of
the question.

> > > The best we got was drivers tested the VA range and returned success
> > > if they had no interest.
> > > Which is a big win to be sure, but it looks
> > > like getting any more is not really possible.
> >
> > And that is already a great win! Because many notifiers only care
> > about particular mappings. Please note that backing off unconditionally
> > will simply cause the oom reaper to back off without doing any
> > tear down at all.
>
> Well, I'm working to propose that we do the VA range test under core
> mmu notifier code that cannot block and then we simply remove the idea
> of blockable from drivers using this new 'range notifier'.
>
> I think this pretty much solves the concern?

Well, my idea was that a range check and early bail out was a first step
and then each specific notifier would be able to do a more specific
check. I was not able to do the second step because that requires a deep
understanding of the respective subsystem.

Really, all I care about is reclaiming as much memory from the
oom_reaper context as possible. And that cannot really be an unbounded
process. Quite the contrary, it should be as swift as possible. From my
cursory look, some notifiers are able to achieve their task without
blocking or depending on memory allocation just fine. So bailing out
unconditionally on the range of interest would just put us back.

> > > However, we could (probably even should) make the drivers fs_reclaim
> > > safe.
> > >
> > > If that is enough to guarantee progress of OOM, then let's consider
> > > something like using current_gfp_context() to force PF_MEMALLOC_NOFS
> > > allocation behavior on the driver callback and lockdep to try and keep
> > > pushing on the debugging, and dropping !blocking.
> >
> > How are you going to enforce indirect dependency? E.g. a lock that is
> > also used in other contexts which depend on sleepable memory allocation
> > to move forward.
>
> You mean like this:
>
>       CPU0                                 CPU1
>                                         mutex_lock()
>                                         kmalloc(GFP_KERNEL)

No, I mean __GFP_DIRECT_RECLAIM here.
>                                         mutex_unlock()
>   fs_reclaim_acquire()
>   mutex_lock() <- illegal: lock dep assertion

I cannot really comment on how that is achievable by lockdep. I managed
to forget the details of FS/IO reclaim recursion tracking already and I
do not have time to learn them again. It was quite a hack. Anyway, let me
repeat that the primary motivation was a simple aid, not something as
powerful as lockdep.
-- 
Michal Hocko
SUSE Labs
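
The point made in this reply, that blocking is decided by
__GFP_DIRECT_RECLAIM and not by the FS/IO bits, can be sketched in plain
userspace C. This is only an illustration: the SK_ names and the bit
values are made up, the mask composition follows the kernel's gfp.h
definitions as I understand them for this kernel era, and sk_may_block()
is a stand-in for the kind of test gfpflags_allow_blocking() performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only; not the kernel's. */
#define SK_GFP_IO             0x01u
#define SK_GFP_FS             0x02u
#define SK_GFP_DIRECT_RECLAIM 0x04u
#define SK_GFP_KSWAPD_RECLAIM 0x08u
#define SK_GFP_RECLAIM (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)

/* Composite masks, composed the same way as the kernel's composites. */
#define SK_GFP_NOWAIT (SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_NOFS   (SK_GFP_RECLAIM | SK_GFP_IO)
#define SK_GFP_KERNEL (SK_GFP_RECLAIM | SK_GFP_IO | SK_GFP_FS)

/* An allocation may sleep exactly when direct reclaim is allowed. */
bool sk_may_block(unsigned int gfp)
{
	return (gfp & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* Small self-check showing which of the three masks may sleep. */
void sk_demo(void)
{
	assert(!sk_may_block(SK_GFP_NOWAIT)); /* only wakes kswapd */
	assert(sk_may_block(SK_GFP_NOFS));    /* no FS, but still sleeps */
	assert(sk_may_block(SK_GFP_KERNEL));
}
```

In this model GFP_NOFS drops only the FS bit, so it still enters direct
reclaim and may sleep; of the three masks, only GFP_NOWAIT cannot.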