Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp4528077ybl; Tue, 20 Aug 2019 13:26:20 -0700 (PDT) X-Google-Smtp-Source: APXvYqwS0Y9o6OK6fPirKOJrlwEJkW0TKzwD+Yg9Au7am4mguhxorJ9oMIr8fqBNnylYbZjOvjqw X-Received: by 2002:aa7:914e:: with SMTP id 14mr31851710pfi.136.1566332780647; Tue, 20 Aug 2019 13:26:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1566332780; cv=none; d=google.com; s=arc-20160816; b=KBDOtvj6wXAgFawjz3RKprMhuyd4ae3o7eWGlYIKjL5yklw1tmXGuLq5KRtxwiuJcY ivdyZWpJgYwrQC95Rd4OaTqS5x7Rs078aYMbXKGxZtTAeo3E+9/0NeQZA2u+cKCxCdXM eJfU7WCb1Iae3BTKO58O2nrV2A56TIS6m2b+U5FL7n92XwJP+jb24ZIDy4+QeVg6UFyf Skb7lPIJzziDZzEc27BQh+xFjsnDs39TkER8DqzZWSQNf/Vaa64f83P1d37dPnZnXlKn Kh2R5xjxIw097wmSO8tLYgl6GBZ/h/ywLO9mD1FrnEdmj5lgamK0yPVzvQfzY+QginDW NivA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:mail-followup-to:message-id:subject:cc:to:from:date :dkim-signature; bh=d25H8YaXES8D8CXayg5AzS13dfnA0p+MfNTE8L9U9Pc=; b=CY+5Qj2aKBZI7rrhyTIJF7d+Qop6v1g68/ZMQWGfoCP+CpuyMMWJhzXg7T7MDbUOLO zLTDdp6Zg40htfOtIOFKZr+2qLElEQt8wFihztD2p6xCK0YG7Rs1s4tMN9s8RvKA5cw8 G1R8+1/rv4vc9gobkhODKpt7Sk0o0OYpa4JjmRop/Xr8XIBi80eR2jfi+v7OdOoArAJt 9bBddDIfZB2Di00cyERuzjFgA1XD2/ruicOuxn5Zw8ax6Eghi1vcE6pYpFOQR4m07iqY ZH/o7XD11EM6yzWzODnNREliMKC6tla6rvSPwMvCPhtvAlOkUQh+82NWTkAyLcd7QJJh 5S9w== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@ffwll.ch header.s=google header.b=YzDGv4XO; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f35si6456313plb.418.2019.08.20.13.26.05; Tue, 20 Aug 2019 13:26:20 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@ffwll.ch header.s=google header.b=YzDGv4XO; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731052AbfHTUYt (ORCPT + 99 others); Tue, 20 Aug 2019 16:24:49 -0400 Received: from mail-ed1-f66.google.com ([209.85.208.66]:47066 "EHLO mail-ed1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731053AbfHTUYq (ORCPT ); Tue, 20 Aug 2019 16:24:46 -0400 Received: by mail-ed1-f66.google.com with SMTP id z51so150033edz.13 for ; Tue, 20 Aug 2019 13:24:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=sender:date:from:to:cc:subject:message-id:mail-followup-to :references:mime-version:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=d25H8YaXES8D8CXayg5AzS13dfnA0p+MfNTE8L9U9Pc=; b=YzDGv4XO+LwpyyKcJI0OU4N7t8BLXha4wn/3Sv43YldWD0xf7aP4v1aM765ctgROdP 5UA0BFANzBJ3/MHYWKBlOTznKQv2McTtf3hst+NBYha3BQ90MgzDB3fWcZM3+t+/3EGL BFNdbUFgnpW9sVM7H/mvhP33Ibzw3ijkNfE0g= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :mail-followup-to:references:mime-version:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=d25H8YaXES8D8CXayg5AzS13dfnA0p+MfNTE8L9U9Pc=; b=P5NmTRTGRr4nfhYvHWGmr62QtdrVwyIDCDZeFEIXZb9Mte/HwVOmpFm8zgI/ZazlHZ TI3ByFda7P0MKrUDcq+kSZPYZxxiv4c0SFYyTiGHMIdq+PUCcCXDb6tsad8iZn3XOkcw FcmlsRsGgPZhTtTVlg5/Pcr5FN5kO8yjtBfJXbdluRYoLR1c1Ym99hCmq8Xw/qlPkmKi 4N5lpr95xQ2wv6wIshMuAB7xCGqkDW/dY3m2g9SKseG+Nrs2HtW1fTLzVqbEfKK0zWqq m9or5dMsfo+fyfOpob9rfYESZm5po9rj5OjxByGp3VWKXY2HbF7a+KUR1HT9GZWGna+4 v8ew== X-Gm-Message-State: APjAAAX82RsknfU8EoPWFkbRHtgldZtnapIqe6LW9KcpBpZgj1k61UgP WwuFW5V4k4eLP8JWoAy1P5Ew4cW1dbiupw== X-Received: by 2002:a17:907:390:: with SMTP id ss16mr27164544ejb.46.1566332683690; Tue, 20 Aug 2019 13:24:43 -0700 (PDT) Received: from phenom.ffwll.local (212-51-149-96.fiber7.init7.net. [212.51.149.96]) by smtp.gmail.com with ESMTPSA id gw11sm2770562ejb.29.2019.08.20.13.24.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Aug 2019 13:24:42 -0700 (PDT) Date: Tue, 20 Aug 2019 22:24:40 +0200 From: Daniel Vetter To: LKML Cc: Linux MM , DRI Development , Intel Graphics Development , Daniel Vetter , Jason Gunthorpe , Peter Zijlstra , Ingo Molnar , Andrew Morton , Michal Hocko , David Rientjes , Christian =?iso-8859-1?Q?K=F6nig?= , =?iso-8859-1?B?Suly9G1l?= Glisse , Masahiro Yamada , Wei Wang , Andy Shevchenko , Thomas Gleixner , Jann Horn , Feng Tang , Kees Cook , Randy Dunlap , Daniel Vetter Subject: Re: [PATCH 3/4] kernel.h: Add non_block_start/end() Message-ID: <20190820202440.GH11147@phenom.ffwll.local> Mail-Followup-To: LKML , Linux MM , DRI Development , Intel Graphics Development , Jason Gunthorpe , Peter Zijlstra , Ingo Molnar , Andrew Morton , Michal Hocko , David Rientjes , Christian =?iso-8859-1?Q?K=F6nig?= , =?iso-8859-1?B?Suly9G1l?= Glisse , Masahiro Yamada , Wei Wang , Andy Shevchenko , Thomas Gleixner , Jann Horn , Feng Tang , Kees Cook , Randy Dunlap , Daniel Vetter References: <20190820081902.24815-1-daniel.vetter@ffwll.ch> <20190820081902.24815-4-daniel.vetter@ffwll.ch> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20190820081902.24815-4-daniel.vetter@ffwll.ch> X-Operating-System: Linux phenom 5.2.0-2-amd64 User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 20, 2019 at 10:19:01AM +0200, Daniel Vetter wrote: > In some special cases we must not block, but there's not a > spinlock, preempt-off, irqs-off or similar critical section already > that arms the might_sleep() debug checks. Add a non_block_start/end() > pair to annotate these. > > This will be used in the oom paths of mmu-notifiers, where blocking is > not allowed to make sure there's forward progress. Quoting Michal: > > "The notifier is called from quite a restricted context - oom_reaper - > which shouldn't depend on any locks or sleepable conditionals. The code > should be swift as well but we mostly do care about it to make a forward > progress. Checking for sleepable context is the best thing we could come > up with that would describe these demands at least partially." > > Peter also asked whether we want to catch spinlocks on top, but Michal > said those are less of a problem because spinlocks can't have an > indirect dependency upon the page allocator and hence close the loop > with the oom reaper. > > Suggested by Michal Hocko. > > v2: > - Improve commit message (Michal) > - Also check in schedule, not just might_sleep (Peter) > > v3: It works better when I actually squash in the fixup I had lying > around :-/ > > v4: Pick the suggestion from Andrew Morton to give non_block_start/end > some good kerneldoc comments. I added that other blocking calls like > wait_event pose similar issues, since that's the other example we > discussed. > > Cc: Jason Gunthorpe > Cc: Peter Zijlstra > Cc: Ingo Molnar > Cc: Andrew Morton > Cc: Michal Hocko > Cc: David Rientjes > Cc: "Christian K?nig" > Cc: Daniel Vetter > Cc: "J?r?me Glisse" > Cc: linux-mm@kvack.org > Cc: Masahiro Yamada > Cc: Wei Wang > Cc: Andy Shevchenko > Cc: Thomas Gleixner > Cc: Jann Horn > Cc: Feng Tang > Cc: Kees Cook > Cc: Randy Dunlap > Cc: linux-kernel@vger.kernel.org > Acked-by: Christian K?nig (v1) > Signed-off-by: Daniel Vetter Hi Peter, Iirc you've been involved at least somewhat in discussing this. -mm folks are a bit undecided whether these new non_block semantics are a good idea. Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe are less enthusiastic. Jason said he's ok with merging the hmm side of this if scheduler folks ack. If not, then I'll respin with the preempt_disable/enable instead like in v1. So ack/nack for this from the scheduler side? Thanks, Daniel > --- > include/linux/kernel.h | 25 ++++++++++++++++++++++++- > include/linux/sched.h | 4 ++++ > kernel/sched/core.c | 19 ++++++++++++++----- > 3 files changed, 42 insertions(+), 6 deletions(-) > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > index 4fa360a13c1e..82f84cfe372f 100644 > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -217,7 +217,9 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset); > * might_sleep - annotation for functions that can sleep > * > * this macro will print a stack trace if it is executed in an atomic > - * context (spinlock, irq-handler, ...). > + * context (spinlock, irq-handler, ...). Additional sections where blocking is > + * not allowed can be annotated with non_block_start() and non_block_end() > + * pairs. > * > * This is a useful debugging help to be able to catch problems early and not > * be bitten later when the calling function happens to sleep when it is not > @@ -233,6 +235,25 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset); > # define cant_sleep() \ > do { __cant_sleep(__FILE__, __LINE__, 0); } while (0) > # define sched_annotate_sleep() (current->task_state_change = 0) > +/** > + * non_block_start - annotate the start of section where sleeping is prohibited > + * > + * This is on behalf of the oom reaper, specifically when it is calling the mmu > + * notifiers. The problem is that if the notifier were to block on, for example, > + * mutex_lock() and if the process which holds that mutex were to perform a > + * sleeping memory allocation, the oom reaper is now blocked on completion of > + * that memory allocation. Other blocking calls like wait_event() pose similar > + * issues. > + */ > +# define non_block_start() \ > + do { current->non_block_count++; } while (0) > +/** > + * non_block_end - annotate the end of section where sleeping is prohibited > + * > + * Closes a section opened by non_block_start(). > + */ > +# define non_block_end() \ > + do { WARN_ON(current->non_block_count-- == 0); } while (0) > #else > static inline void ___might_sleep(const char *file, int line, > int preempt_offset) { } > @@ -241,6 +262,8 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset); > # define might_sleep() do { might_resched(); } while (0) > # define cant_sleep() do { } while (0) > # define sched_annotate_sleep() do { } while (0) > +# define non_block_start() do { } while (0) > +# define non_block_end() do { } while (0) > #endif > > #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0) > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 9f51932bd543..c5630f3dca1f 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -974,6 +974,10 @@ struct task_struct { > struct mutex_waiter *blocked_on; > #endif > > +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP > + int non_block_count; > +#endif > + > #ifdef CONFIG_TRACE_IRQFLAGS > unsigned int irq_events; > unsigned long hardirq_enable_ip; > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 2b037f195473..57245770d6cc 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3700,13 +3700,22 @@ static noinline void __schedule_bug(struct task_struct *prev) > /* > * Various schedule()-time debugging checks and statistics: > */ > -static inline void schedule_debug(struct task_struct *prev) > +static inline void schedule_debug(struct task_struct *prev, bool preempt) > { > #ifdef CONFIG_SCHED_STACK_END_CHECK > if (task_stack_end_corrupted(prev)) > panic("corrupted stack end detected inside scheduler\n"); > #endif > > +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP > + if (!preempt && prev->state && prev->non_block_count) { > + printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n", > + prev->comm, prev->pid, prev->non_block_count); > + dump_stack(); > + add_taint(TAINT_WARN, LOCKDEP_STILL_OK); > + } > +#endif > + > if (unlikely(in_atomic_preempt_off())) { > __schedule_bug(prev); > preempt_count_set(PREEMPT_DISABLED); > @@ -3813,7 +3822,7 @@ static void __sched notrace __schedule(bool preempt) > rq = cpu_rq(cpu); > prev = rq->curr; > > - schedule_debug(prev); > + schedule_debug(prev, preempt); > > if (sched_feat(HRTICK)) > hrtick_clear(rq); > @@ -6570,7 +6579,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset) > rcu_sleep_check(); > > if ((preempt_count_equals(preempt_offset) && !irqs_disabled() && > - !is_idle_task(current)) || > + !is_idle_task(current) && !current->non_block_count) || > system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING || > oops_in_progress) > return; > @@ -6586,8 +6595,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset) > "BUG: sleeping function called from invalid context at %s:%d\n", > file, line); > printk(KERN_ERR > - "in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n", > - in_atomic(), irqs_disabled(), > + "in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n", > + in_atomic(), irqs_disabled(), current->non_block_count, > current->pid, current->comm); > > if (task_stack_end_corrupted(current)) > -- > 2.23.0.rc1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch