Date: Mon, 16 Jul 2018 16:12:49 -0700
From: Andrew Morton
To: Michal Hocko
Cc: LKML, Michal Hocko, "David (ChunMing) Zhou", Paolo Bonzini,
    Radim Krčmář, Alex Deucher, David Airlie, Jani Nikula,
    Joonas Lahtinen, Rodrigo Vivi, Doug Ledford, Jason Gunthorpe,
    Mike Marciniszyn, Dennis Dalessandro, Sudeep Dutt, Ashutosh Dixit,
    Dimitri Sivanich, Boris Ostrovsky, Juergen Gross, Jérôme Glisse,
    Andrea Arcangeli, Felix Kuehling, kvm@vger.kernel.org,
    amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
    intel-gfx@lists.freedesktop.org, linux-rdma@vger.kernel.org,
    xen-devel@lists.xenproject.org, Christian König, David Rientjes,
    Leon Romanovsky
Subject: Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
Message-Id: <20180716161249.c76240cd487c070fb271d529@linux-foundation.org>
In-Reply-To: <20180716115058.5559-1-mhocko@kernel.org>
References: <20180716115058.5559-1-mhocko@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 16 Jul 2018 13:50:58 +0200 Michal Hocko wrote:

> From: Michal Hocko
>
> There are several blockable mmu notifiers which might sleep in
> mmu_notifier_invalidate_range_start and that is a problem for the
> oom_reaper because it needs to guarantee a forward progress so it cannot
> depend on any sleepable locks.
>
> Currently we simply back off and mark an oom victim with blockable mmu
> notifiers as done after a short sleep. That can result in selecting a
> new oom victim prematurely because the previous one still hasn't torn
> its memory down yet.
>
> We can do much better though. Even if mmu notifiers use sleepable locks
> there is no reason to automatically assume those locks are held.
> Moreover majority of notifiers only care about a portion of the address
> space and there is absolutely zero reason to fail when we are unmapping an
> unrelated range. Many notifiers do really block and wait for HW which is
> harder to handle and we have to bail out though.
>
> This patch handles the low hanging fruit.
> __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
> are not allowed to sleep if the flag is set to false. This is achieved by
> using trylock instead of the sleepable lock for most callbacks and
> continue as long as we do not block down the call chain.

I assume device driver developers are wondering "what does this mean for
me".  As I understand it, the only time they will see blockable==false
is when their driver is being called in response to an out-of-memory
condition, yes?  So it is a very rare thing.

Any suggestions regarding how the driver developers can test this code
path?
I don't think we presently have a way to fake an oom-killing event?
Perhaps we should add such a thing, given the problems we're having with
that feature.

> I think we can improve that even further because there is a common
> pattern to do a range lookup first and then do something about that.
> The first part can be done without a sleeping lock in most cases AFAICS.
>
> The oom_reaper end then simply retries if there is at least one notifier
> which couldn't make any progress in !blockable mode. A retry loop is
> already implemented to wait for the mmap_sem and this is basically the
> same thing.
>
> ...
>
> +static inline int mmu_notifier_invalidate_range_start_nonblock(struct mm_struct *mm,
> +				unsigned long start, unsigned long end)
> +{
> +	int ret = 0;
> +	if (mm_has_notifiers(mm))
> +		ret = __mmu_notifier_invalidate_range_start(mm, start, end, false);
> +
> +	return ret;
>  }

nit,

{
	if (mm_has_notifiers(mm))
		return __mmu_notifier_invalidate_range_start(mm, start, end, false);
	return 0;
}

would suffice.

>
> ...
>
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3074,7 +3074,7 @@ void exit_mmap(struct mm_struct *mm)
>  		 * reliably test it.
>  		 */
>  		mutex_lock(&oom_lock);
> -		__oom_reap_task_mm(mm);
> +		(void)__oom_reap_task_mm(mm);
>  		mutex_unlock(&oom_lock);

What does this do?

>  		set_bit(MMF_OOM_SKIP, &mm->flags);
>
> ...
>
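
For driver developers trying to picture the pattern the changelog
describes, here is a rough userspace sketch, not kernel code.  A pthread
mutex stands in for a driver's sleepable lock, and every name in it
(demo_notifier, demo_invalidate_range_start, demo_reap) is hypothetical
and made up for illustration.  It assumes the tracked range never changes,
so the overlap check can be done without taking the lock.  It shows the
three behaviours mentioned above: an unrelated range succeeds without
locking, a contended lock makes the non-blocking call fail with -EAGAIN
instead of sleeping, and the caller retries a bounded number of times.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver's notifier state (not a kernel type). */
struct demo_notifier {
    pthread_mutex_t lock;        /* stands in for a sleepable driver lock */
    unsigned long start, end;    /* tracked range [start, end), assumed fixed */
    unsigned long invalidations; /* number of completed invalidations */
};

/* Returns 0 on success, -EAGAIN if progress would have required blocking. */
static int demo_invalidate_range_start(struct demo_notifier *n,
                                       unsigned long start, unsigned long end,
                                       bool blockable)
{
    assert(start < end);

    /* Range lookup first: an unrelated range never needs the lock. */
    if (end <= n->start || start >= n->end)
        return 0;

    if (blockable) {
        pthread_mutex_lock(&n->lock);          /* may sleep */
    } else if (pthread_mutex_trylock(&n->lock) != 0) {
        return -EAGAIN;                        /* must not sleep: bail out */
    }

    n->invalidations++;                        /* the actual work goes here */
    pthread_mutex_unlock(&n->lock);
    return 0;
}

/* Caller side: retry in non-blocking mode, giving up after max_attempts. */
static bool demo_reap(struct demo_notifier *n, unsigned long start,
                      unsigned long end, int max_attempts)
{
    for (int i = 0; i < max_attempts; i++) {
        if (demo_invalidate_range_start(n, start, end, false) == 0)
            return true;
        /* a real caller would sleep briefly here before retrying */
    }
    return false;
}
```

Holding n->lock in the calling thread and then calling
demo_invalidate_range_start() with blockable set to false on an
overlapping range is one way to exercise the -EAGAIN path in this sketch.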