Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp2371319ybl; Thu, 15 Aug 2019 10:43:05 -0700 (PDT) X-Google-Smtp-Source: APXvYqxXX6PmhUWKoavalWg3H8mKeg4PHh6rOgBYn2K0QXJLmiRhu1s0GbqNNc8Jd04+B3HNpNw9 X-Received: by 2002:a17:902:b604:: with SMTP id b4mr5227997pls.94.1565890985774; Thu, 15 Aug 2019 10:43:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565890985; cv=none; d=google.com; s=arc-20160816; b=xM2bxKhzGJSz4iIbGVZwYyAeb4nrGMX1uFJ3ZOQOhDig7LPaFQJagbcJVIGGODUJ2B JY0gdcMF03Ce6AoJXxZkfk5sGatZSUsuX8te/okivOPsy3J0695qBJhKANKDnCBiyNtr MK4ILs6qWgYO5aK9gEKhyqlwv3aIqp+khTsFpMwoMx7EHcM2ZrEqMMNyKI+C41KqyySD 4HqM6g6XREx49Jq8brq/3bM/qza0UKMOZmL/JKMZmYnQgty1QhHsXm8ZdBiNc9qROJPW 6+iWWWaoJzbhoqZWYh2gJ14JBBE/tjTrdFCK7MNDy6UsIjfi0q7i1a17NGFm17k0/eyq OSrA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=8Q3BUPafjT//7zqZyDAGCCoPTiMdeG8OI4h/hxD+trk=; b=BPmCE3omeBTdTYB5xnn0bH06kgAuD7R3aj7vggQktRcTN7Q3+xJgNgzSqjevknvqSH 0SCq9+CCpEYnRY+rIHLx0cAnGIwjPZmyzaetnxqanfBV5s/CyOvctV1o5sni3s7c4NNW aPQwemsYU0+i0UgRxaSP2QwpEelJEGSnX1sV87Bxq9SkmuihaTNmSEhMEWwAH0YxNSMt B5FepUSBcjikL/L1C0veoPoJHmYh8NB0s1vSj1qoULxzZexG7BDDMe0BQ8sX3C23J0Tt JuWntc7J8PHOnO3bgEbN3AcdAE0uKRMdM16ZxTX/0pwq9l/jlgxvwwZsoEY9/38RmA1P tBYQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ziepe.ca header.s=google header.b="l/2uXRd4"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n123si2174497pgn.151.2019.08.15.10.42.48; Thu, 15 Aug 2019 10:43:05 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@ziepe.ca header.s=google header.b="l/2uXRd4"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730323AbfHOQ4d (ORCPT + 99 others); Thu, 15 Aug 2019 12:56:33 -0400 Received: from mail-qt1-f194.google.com ([209.85.160.194]:39187 "EHLO mail-qt1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726865AbfHOQ4d (ORCPT ); Thu, 15 Aug 2019 12:56:33 -0400 Received: by mail-qt1-f194.google.com with SMTP id l9so3033855qtu.6 for ; Thu, 15 Aug 2019 09:56:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=8Q3BUPafjT//7zqZyDAGCCoPTiMdeG8OI4h/hxD+trk=; b=l/2uXRd4T7HQOUDBG/JM+iUspGtfzLrxEqhgp3H9MuxzleYOE7XUTyFzkXSdKaCTcb q9MfXp3hYV+sssTkJDkOOSAm8bJQBLEjDvpI3VvG5Bah4pKJ0GwqOvPqD1hTMQ2juM4F wG3FcmtRNOq7XeZkSWigoJioXNZBC8G26nVgROu929tLZSwPk789miZ0cuWwTTyj8PQY kGTmipP1NMr8zg4r5RP6wvNh8HqOFhJI316r2CopVOgXT2m8TBtFcD5juD/oONpbcfrC 7ZtrySmSL1jVyB5LFHnhVSicKT/IZ1SgzrKNZaX2H1Okq65iudeI58fGM7w480kbI10U mk4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=8Q3BUPafjT//7zqZyDAGCCoPTiMdeG8OI4h/hxD+trk=; b=V0LnBNnU6bDtrlZykxMSNyiLpjd62ZCMyFxfmUAl5d4ZwTaQTnMBZccuz5Jc/hpYmZ wSQoyKtD4lyIsvilXAGTgukmXihxGb9KhDFyq33RP1cw+u/JBv+/zOQHxewDc11LSJG8 tPeddOHjAJJe8d6L0yRIvQ0Yy8Xe+53BIYqWfrR5/nm7nsXC0eYXvAQTjuHbHNbjjA3D tV7hgl5BvE8hABz5cis28W4pzj+YH8Zmm1ePX76TwuOdCpLE05dDPLh9iVbA9WGP7m9W tfHCzCP6XVJdSMrYdaVQWLjdNxpVu7ZNw4Fmb7pbLlKhnfdtEikXi/v0jNJYc2DwujxG yuyw== X-Gm-Message-State: APjAAAVochBBi4nOcMXAbahIF7nVt5QxmZNvUDYGRN0vDQnIiMmc+NSY eeOPH+gLxbm0ejZbIhZy22rmWQ== X-Received: by 2002:ac8:4247:: with SMTP id r7mr4846720qtm.219.1565888192037; Thu, 15 Aug 2019 09:56:32 -0700 (PDT) Received: from ziepe.ca (hlfxns017vw-156-34-55-100.dhcp-dynamic.fibreop.ns.bellaliant.net. [156.34.55.100]) by smtp.gmail.com with ESMTPSA id f133sm1620929qke.62.2019.08.15.09.56.31 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Aug 2019 09:56:31 -0700 (PDT) Received: from jgg by mlx.ziepe.ca with local (Exim 4.90_1) (envelope-from ) id 1hyJ39-0006Zn-5c; Thu, 15 Aug 2019 13:56:31 -0300 Date: Thu, 15 Aug 2019 13:56:31 -0300 From: Jason Gunthorpe To: Michal Hocko Cc: LKML , linux-mm@kvack.org, DRI Development , Intel Graphics Development , Peter Zijlstra , Ingo Molnar , Andrew Morton , David Rientjes , Christian =?utf-8?B?S8O2bmln?= , =?utf-8?B?SsOpcsO0bWU=?= Glisse , Masahiro Yamada , Wei Wang , Andy Shevchenko , Thomas Gleixner , Jann Horn , Feng Tang , Kees Cook , Randy Dunlap , Daniel Vetter Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end() Message-ID: <20190815165631.GK21596@ziepe.ca> References: <20190814202027.18735-1-daniel.vetter@ffwll.ch> <20190814202027.18735-3-daniel.vetter@ffwll.ch> <20190814235805.GB11200@ziepe.ca> <20190815065829.GA7444@phenom.ffwll.local> <20190815122344.GA21596@ziepe.ca> <20190815132127.GI9477@dhcp22.suse.cz> <20190815141219.GF21596@ziepe.ca> <20190815155950.GN9477@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190815155950.GN9477@dhcp22.suse.cz> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 15, 2019 at 06:00:41PM +0200, Michal Hocko wrote: > > AFAIK 'GFP_NOWAIT' is characterized by the lack of __GFP_FS and > > __GFP_DIRECT_RECLAIM.. > > > > This matches the existing test in __need_fs_reclaim() - so if you are > > OK with GFP_NOFS, aka __GFP_IO which triggers try_to_compact_pages(), > > allocations during OOM, then I think fs_reclaim already matches what > > you described? > > No GFP_NOFS is equally bad. Please read my other email explaining what > the oom_reaper actually requires. In short no blocking on direct or > indirect dependecy on memory allocation that might sleep. It is much easier to follow with some hints on code, so the true requirement is that the OOM repear not block on GFP_FS and GFP_IO allocations, great, that constraint is now clear. > If you can express that in the existing lockdep machinery. All > fine. But then consider deployments where lockdep is no-no because > of the overhead. This is all for driver debugging. The point of lockdep is to find all these paths without have to hit them as actual races, using debug kernels. I don't think we need this kind of debugging on production kernels? > > The best we got was drivers tested the VA range and returned success > > if they had no interest. Which is a big win to be sure, but it looks > > like getting any more is not really posssible. > > And that is already a great win! Because many notifiers only do care > about particular mappings. Please note that backing off unconditioanlly > will simply cause that the oom reaper will have to back off not doing > any tear down anything. Well, I'm working to propose that we do the VA range test under core mmu notifier code that cannot block and then we simply remove the idea of blockable from drivers using this new 'range notifier'. I think this pretty much solves the concern? > > However, we could (probably even should) make the drivers fs_reclaim > > safe. > > > > If that is enough to guarantee progress of OOM, then lets consider > > something like using current_gfp_context() to force PF_MEMALLOC_NOFS > > allocation behavior on the driver callback and lockdep to try and keep > > pushing on the the debugging, and dropping !blocking. > > How are you going to enforce indirect dependency? E.g. a lock that is > also used in other context which depend on sleepable memory allocation > to move forward. You mean like this: CPU0 CPU1 mutex_lock() kmalloc(GFP_KERNEL) mutex_unlock() fs_reclaim_acquire() mutex_lock() <- illegal: lock dep assertion ? lockdep handles this - that is what it does, it builds a graph of all lock dependencies and requires the graph to be acyclic. So mutex_lock depends on fs_reclaim_lock on CPU1 while on CPU0, fs_reclaim_lock depends on mutex_lock. This is an ABBA locking cycle and lockdep will reliably trigger. So, if we wanted to do this, I'd probably suggest we retool fs_reclaim's interface be more like prevent_gfp_flags(__GFP_IO | __GFP_FS); [..] restore_gfp_flags(__GFP_IO | __GFP_FS); Which is lockdep based and follows the indirect lock dependencies you talked about. Then OOM and reclaim can specify the different flags they want blocked. We could probably use the same API with WQ_MEM_RECLAIM as I was chatting with Tejun about: https://www.spinics.net/lists/linux-rdma/msg77362.html IMHO this stuff is *super complicated* for those of us outside the mm subsystem, so having some really clear annotations like the above would go a long way to help understand these special constraints. I'm pretty sure there are lots of driver bugs related to using the wrong GFP flags in the kernel. > Really, this was aimed to give a very simple debugging aid. If it is > considered to be so controversial, even though I really do not see why, > then let's just drop it on the floor. My concern is that the requirement was very unclear. I'm trying to understand all the bits of how these notifiers work and the exact semantic of this OOM path have been vauge if it is really some GFP flag restriction or truely related to not sleeping. Jason