Date: Thu, 15 Aug 2019 10:44:29 +0200
From: Michal Hocko
To: Andrew Morton
Cc: Daniel Vetter, LKML, linux-mm@kvack.org, DRI Development,
    Intel Graphics Development, Jason Gunthorpe, Peter Zijlstra,
    Ingo Molnar, David Rientjes, Christian König, Jérôme Glisse,
    Masahiro Yamada, Wei Wang, Andy Shevchenko, Thomas Gleixner,
    Jann Horn, Feng Tang, Kees Cook, Randy Dunlap, Daniel Vetter
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
Message-ID: <20190815084429.GE9477@dhcp22.suse.cz>
References: <20190814202027.18735-1-daniel.vetter@ffwll.ch>
    <20190814202027.18735-3-daniel.vetter@ffwll.ch>
    <20190814134558.fe659b1a9a169c0150c3e57c@linux-foundation.org>
In-Reply-To: <20190814134558.fe659b1a9a169c0150c3e57c@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 14-08-19 13:45:58, Andrew Morton wrote:
> On Wed, 14 Aug 2019 22:20:24 +0200 Daniel Vetter wrote:
>
> > In some special cases we must not block, but there's not a
> > spinlock, preempt-off, irqs-off or similar critical section already
> > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > pair to annotate these.
> >
> > This will be used in the oom paths of mmu-notifiers, where blocking is
> > not allowed to make sure there's forward progress. Quoting Michal:
> >
> > "The notifier is called from quite a restricted context - oom_reaper -
> > which shouldn't depend on any locks or sleepable conditionals. The code
> > should be swift as well but we mostly do care about it to make a forward
> > progress. Checking for sleepable context is the best thing we could come
> > up with that would describe these demands at least partially."
> >
> > Peter also asked whether we want to catch spinlocks on top, but Michal
> > said those are less of a problem because spinlocks can't have an
> > indirect dependency upon the page allocator and hence close the loop
> > with the oom reaper.
>
> I continue to struggle with this. It introduces a new kernel state
> "running preemptibly but must not call schedule()". How does this make
> any sense?
>
> Perhaps a much, much more detailed description of the oom_reaper
> situation would help out.

The primary point here is that there is a demand for non-blockable mmu
notifiers to be called when the oom reaper tears down the address space.
As the oom reaper is the primary guarantee of forward progress in oom
handling, it cannot be blocked on anything that might depend on
blockable memory allocations. These are not really easy to track because
they might be indirect - e.g. a notifier blocks on a lock which another
context holds while allocating memory or waiting for a flusher that
needs memory to perform its work. If such a blocking state happens then
we can end up in a silent hang with an unusable machine.

Now we hope for reasonable implementations of mmu notifiers (strong
words I know ;) and this should be a relatively simple and effective
catch-all tool to detect that something suspicious is going on.

Does that make the situation more clear?
-- 
Michal Hocko
SUSE Labs
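
To make the idea above concrete, here is a rough userspace model of the annotation, not the code from the patch. Every name in it (the model_* functions, the two notifier callbacks, the global counters) is made up for illustration, and it assumes the check amounts to "a might-sleep point reached inside an annotated section gets reported", with a plain global counter standing in for what would be per-task state in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; in the kernel this state would be per task. */
static int non_block_count; /* nesting depth of annotated non-blocking sections */
static int violations;      /* might-sleep points hit inside such a section */

/* Enter a section in which blocking is not allowed. */
static void model_non_block_start(void)
{
    non_block_count++;
}

/* Leave the section; start/end calls must be balanced. */
static void model_non_block_end(void)
{
    assert(non_block_count > 0);
    non_block_count--;
}

/*
 * Stand-in for a might_sleep()-style debug check: it never blocks itself,
 * it only reports when a potentially sleeping path is reached while a
 * non-blocking section is active. Returns true if sleeping would be fine.
 */
static bool model_might_sleep(const char *where)
{
    if (non_block_count == 0)
        return true;
    violations++;
    fprintf(stderr, "BUG: %s may sleep inside a non-blocking section\n", where);
    return false;
}

/* Hypothetical notifier that only does non-sleeping work. */
static void well_behaved_notifier(void)
{
}

/* Hypothetical notifier that is about to take a sleeping lock. */
static void sleeping_notifier(void)
{
    model_might_sleep("sleeping_notifier");
}

/*
 * Model of the oom reaper calling a notifier under the annotation.
 * Returns the number of violations the callback triggered.
 */
static int model_oom_reap(void (*notifier)(void))
{
    int before = violations;

    model_non_block_start();
    notifier();
    model_non_block_end();
    return violations - before;
}
```

Calling model_oom_reap(sleeping_notifier) reports one violation while model_oom_reap(well_behaved_notifier) reports none. The point of the sketch is only that the annotation arms an existing kind of debug check in a context that is otherwise preemptible; it does not by itself prevent the hang, it makes the suspicious callback visible.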