Date: Wed, 14 Aug 2019 20:58:05 -0300
From: Jason Gunthorpe
To: Daniel Vetter
Cc: LKML, linux-mm@kvack.org, DRI Development, Intel Graphics Development,
    Peter Zijlstra, Ingo Molnar, Andrew Morton, Michal Hocko, David Rientjes,
    Christian König, Jérôme Glisse, Masahiro Yamada, Wei Wang,
    Andy Shevchenko, Thomas Gleixner, Jann Horn, Feng Tang, Kees Cook,
    Randy Dunlap, Daniel Vetter
Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
Message-ID: <20190814235805.GB11200@ziepe.ca>
In-Reply-To: <20190814202027.18735-3-daniel.vetter@ffwll.ch>
References: <20190814202027.18735-1-daniel.vetter@ffwll.ch>
    <20190814202027.18735-3-daniel.vetter@ffwll.ch>

On Wed, Aug 14, 2019 at 10:20:24PM +0200, Daniel Vetter wrote:
> In some special cases we must not block, but there's not a
> spinlock, preempt-off, irqs-off or similar critical section already
> that arms the might_sleep() debug checks. Add a non_block_start/end()
> pair to annotate these.
>
> This will be used in the oom paths of mmu-notifiers, where blocking is
> not allowed to make sure there's forward progress. Quoting Michal:
>
> "The notifier is called from quite a restricted context - oom_reaper -
> which shouldn't depend on any locks or sleepable conditionals. The code
> should be swift as well but we mostly do care about it to make a forward
> progress. Checking for sleepable context is the best thing we could come
> up with that would describe these demands at least partially."

But this describes fs_reclaim_acquire() - is there some reason we are
conflating fs_reclaim with non-sleeping?

ie is there some fundamental difference between the block stack
sleeping during reclaim while it waits for a driver to write out a
page and a GPU driver sleeping during OOM while it waits for its HW
to fence DMA on a page?

Fundamentally we have invalidate_range_start() vs invalidate_range()
as the start() version is able to sleep. If drivers can do their work
without sleeping then they should be using invalidate_range() instead.
Thus, it doesn't seem to make any sense to ask a driver that requires a
sleeping API not to sleep.

AFAICT what is really going on here is that drivers care about only a
subset of the VA space, and we want to query the driver if it cares
about the range proposed to be OOM'd, so we can OOM ranges that do
not have SPTEs.

ie if you look pretty much all drivers do exactly as
userptr_mn_invalidate_range_start() does, and bail once they detect
the VA range is of interest.

So, I'm working on a patch to lift the interval tree into the notifier
core and then do the VA test OOM needs without bothering the
driver. Drivers can retain the blocking API they require and OOM can
work on VA's that don't have SPTEs.

This approach also solves the critical bug in this path:
  https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/

And solves a bunch of other bugs in the drivers.

> Peter also asked whether we want to catch spinlocks on top, but Michal
> said those are less of a problem because spinlocks can't have an
> indirect dependency upon the page allocator and hence close the loop
> with the oom reaper.

Again, this entirely sounds like fs_reclaim - isn't that exactly what
it is for?

I have had on my list a second and very related possible bug. I ran
into commit 35cfa2b0b491 ("mm/mmu_notifier: allocate mmu_notifier in
advance") which says that mapping->i_mmap_mutex is under fs_reclaim().

We do hold i_mmap_rwsem while calling invalidate_range_start():

 unmap_mapping_pages
  i_mmap_lock_write(mapping); // ie i_mmap_rwsem
  unmap_mapping_range_tree
   unmap_mapping_range_vma
    zap_page_range_single
     mmu_notifier_invalidate_range_start

So, if it is still true that i_mmap_rwsem is under fs_reclaim then
invalidate_range_start is *always* under fs_reclaim anyhow! (this I do
not know)

Thus we should use lockdep to force this and fix all the drivers.

.. and if we force fs_reclaim always, do we care about blockable
anymore??

Jason