Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp1010424ybl; Fri, 16 Aug 2019 07:32:39 -0700 (PDT) X-Google-Smtp-Source: APXvYqx+g/lZhOyX2OuH6yMSu4LV8EwSS1gIRdTUNBMO5C9cGM6lyrRMvkfsfXmnRjOMLTpelqpI X-Received: by 2002:aa7:91d3:: with SMTP id z19mr11032984pfa.135.1565965958986; Fri, 16 Aug 2019 07:32:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565965958; cv=none; d=google.com; s=arc-20160816; b=uV2ODEQDiHN6rH3BVcQxXLCzzaIoTggEvpsS5GYKTh9M6giXrZj4tuXmN3G06WGxpt Kryljd6OXDo5gehLlAHOPVwqP8v4BKEj9Rf5NFakjGZo53JCZRu5nJai72M9DtYzxpDU qyw4/Now1DQU9/1FSKCrsucmTaKxoAQrOYO/4JwEAhhN+ZE8DAS8axoyUOL6rn0K0d89 PQ6fn+sWc+IXLxGwS1oCsDSDqkMhjwqaAohq7vYlgVajt35z/jU2lDoXsw5Fd+YLbcQh TuGlqK5g3mUZE1MmYNxkXEChLnipzySo35p7NLm34+zki0bqFTxC1ienGHvPv/kd7XZ2 aTFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=BZuTpFWe2nt7y+/QS811q/nmqipRVURrHatiPCF7ae4=; b=qIkj3lRqGJF6feGqDL/DRHYhiVv1/35ltENZjhwwvbWZ5q+JCTvdeFwAoqf/7efOZT 76QOiSwnDPuzEPpxJqMLqTSIRBYx8ZnbMBHFe7jOBkkeOMeeGNoM1RlHl7ANu+FLD9Wf sQhjOHxYC5QeXspZ9pcmgH1+ifOz2lxQ6f8a9ZtFm7DkSicFQDg6/ntDXgt+nwrNV3RH AAVzL2X81hW3btMfd4UlqghTJV0uhTqwZckEXhxcgIeyPXRsDKuDP7vcIQz2oNNiL/Dj PqeHWoNPvt628f0llLSvfBZ9AU64lnZHM1A6SCglO5v0OXBsTVExLHHnKm8FBOo9Q06L Agsw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ziepe.ca header.s=google header.b=BByaPnWh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x5si4541598pfq.86.2019.08.16.07.32.23; Fri, 16 Aug 2019 07:32:38 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@ziepe.ca header.s=google header.b=BByaPnWh; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727409AbfHPObs (ORCPT + 99 others); Fri, 16 Aug 2019 10:31:48 -0400 Received: from mail-qk1-f193.google.com ([209.85.222.193]:37459 "EHLO mail-qk1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727245AbfHPObs (ORCPT ); Fri, 16 Aug 2019 10:31:48 -0400 Received: by mail-qk1-f193.google.com with SMTP id s14so4920822qkm.4 for ; Fri, 16 Aug 2019 07:31:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=BZuTpFWe2nt7y+/QS811q/nmqipRVURrHatiPCF7ae4=; b=BByaPnWhPiJu1NxST08KD8o3VHHoiXCwSeRF5WRkI/A6iSsJdYieYRZABO+ij0Wygo Lo/VJnPn76360uFCIIy+stRki3L1XZCSg1jPNkH82SaVmeEYzi7frVKaZzL2sstBoAE5 YuTpZ9a+sHGLMn7wSAkZ4xaZH5NHgu5+lQiwjltkkU4Xchb+3WFO1/OT/SsMyB5YTCKq vGn4XXff29JRy16xTaGRBisMX9dVxL0S14uarJH8M7adfNiA41rbssBJ5gJ3x/l6WkfM cVG0i4r+p0/g8qVkqC3D4MPpQ0JFBX01PwrhMyxSAIovASAQQbfTTtwsyQ5rzvsif2+g mo5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=BZuTpFWe2nt7y+/QS811q/nmqipRVURrHatiPCF7ae4=; b=ZuGEzUvtk4R3StZjfHJMnKZW9JKC8y881NcpSYg9mJ0ci1R3i5WyD91Yf6HZdoVOYy Ww0EzFZX5V6E4wFUJZAqPagaBJVjCkiJ3pb3bthkVPcJzwTtvY9pwkniaRyiuYUCBCMU KK2u2YwZCXFiI3qacy8jRA4iQ0sXylpETB7I51ksNOr+XT4de4ffj42MnAHvOF2lif6B GGz2v0XzG3gpw38U67RBa5J9l176xprNf5GP/LroBmUiO8qefHm+SLd8hOp/irplmDeH oJcSL9RlLIyqZ0QDD9aDlJiQxOoSFvdFueWAtQhUg2EJOyjy0lf9ALYOQLzYCNGStqZF 0jdw== X-Gm-Message-State: APjAAAWNYK6f53XtE2ONbA7O2LdTvtRXEXSQJdOitdmd+HhLp+xRctKD J4iUQdIocPO32+TNSmmuyd6xdQ== X-Received: by 2002:a37:8ac7:: with SMTP id m190mr8450373qkd.385.1565965906561; Fri, 16 Aug 2019 07:31:46 -0700 (PDT) Received: from ziepe.ca (hlfxns017vw-156-34-55-100.dhcp-dynamic.fibreop.ns.bellaliant.net. [156.34.55.100]) by smtp.gmail.com with ESMTPSA id 6sm3412555qtu.15.2019.08.16.07.31.45 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Aug 2019 07:31:45 -0700 (PDT) Received: from jgg by mlx.ziepe.ca with local (Exim 4.90_1) (envelope-from ) id 1hydGb-0003DR-B1; Fri, 16 Aug 2019 11:31:45 -0300 Date: Fri, 16 Aug 2019 11:31:45 -0300 From: Jason Gunthorpe To: Michal Hocko Cc: LKML , "linux-mm@kvack.org DRI Development" , Intel Graphics Development , Peter Zijlstra , Ingo Molnar , Andrew Morton , David Rientjes , Christian =?utf-8?B?S8O2bmln?= , =?utf-8?B?SsOpcsO0bWU=?= Glisse , Masahiro Yamada , Wei Wang , Andy Shevchenko , Thomas Gleixner , Jann Horn , Feng Tang , Kees Cook , Randy Dunlap , Daniel Vetter Subject: Re: [PATCH 2/5] kernel.h: Add non_block_start/end() Message-ID: <20190816143145.GD5398@ziepe.ca> References: <20190815165631.GK21596@ziepe.ca> <20190815174207.GR9477@dhcp22.suse.cz> <20190815182448.GP21596@ziepe.ca> <20190815190525.GS9477@dhcp22.suse.cz> <20190815191810.GR21596@ziepe.ca> <20190815193526.GT9477@dhcp22.suse.cz> <20190815201323.GU21596@ziepe.ca> <20190816081029.GA27790@dhcp22.suse.cz> <20190816121906.GC5398@ziepe.ca> <20190816122625.GA10499@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190816122625.GA10499@dhcp22.suse.cz> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Aug 16, 2019 at 02:26:25PM +0200, Michal Hocko wrote: > On Fri 16-08-19 09:19:06, Jason Gunthorpe wrote: > > On Fri, Aug 16, 2019 at 10:10:29AM +0200, Michal Hocko wrote: > > > On Thu 15-08-19 17:13:23, Jason Gunthorpe wrote: > > > > On Thu, Aug 15, 2019 at 09:35:26PM +0200, Michal Hocko wrote: > > > > > > > > > > The last detail is I'm still unclear what a GFP flags a blockable > > > > > > invalidate_range_start() should use. Is GFP_KERNEL OK? > > > > > > > > > > I hope I will not make this muddy again ;) > > > > > invalidate_range_start in the blockable mode can use/depend on any sleepable > > > > > allocation allowed in the context it is called from. > > > > > > > > 'in the context is is called from' is the magic phrase, as > > > > invalidate_range_start is called while holding several different mm > > > > related locks. I know at least write mmap_sem and i_mmap_rwsem > > > > (write?) > > > > > > > > Can GFP_KERNEL be called while holding those locks? > > > > > > i_mmap_rwsem would be problematic because it is taken during the > > > reclaim. > > > > Okay.. So the fs_reclaim debugging does catch errors. > > I do not think fs_reclaim is the udnerlying mechanism to catch this > deadlock. Indeed lockdep would catch it directly as an AA deadlock, but only if we happen to take the reclaim path under the kmalloc in the notifier and lucked into it also choosing to lock the same lock we are holding. The trouble is this is very difficult to hit. lockdep allows making this less difficult. For instance with a 'might_reclaim()' annotation in the allocator we could check that the various reclaim related locks are obtainable. Testing doesn't need to get lucky and go down the very unlikely path. It turns out this is already happening, it is actually a side effect of the way fs_reclaim works now. > > Do you have any > > reference for what a false positive looks like? > > I believe I have given some examples when introducing __GFP_NOLOCKDEP. Okay, I think that is 7e7844226f10 ("lockdep: allow to disable reclaim lockup detection") Hmm, sadly the lkml link in the commit is broken. Hum. There are no users of __GFP_NOLOCKDEP today?? Could all the false positives have been fixed?? Keep in mind that this fs_reclaim was reworked away from the hacky interrupt context flag to a fairly elegant real lock map. Based on my read of the commit message, my first reaction would be to suggest lockdep_set_class() to solve the problem described, ie different classes for 'inside transaction' and 'outside transaction' on xfs_refcountbt_init_cursor() I understood that generally that is the way to handle lockdep false positives. Anyhow, if you are willing to consider that lockdep isn't broken, I have some ideas on how to make this clearer and increase coverage. Would you be willing to look at patches on this topic? (not soon, I have to finish my mmu notifier stuff) > > I would like to inject it into the notifier path as this is very > > difficult for driver authors to discover and know about, but I'm > > worried about your false positive remark. > > > > I think I understand we can use only GFP_ATOMIC in the notifiers, but > > we need a strategy to handle OOM to guarentee forward progress. > > Your example is from the notifier registration IIUC. Yes, that is where this commit hit it.. Triggering this under an actual notifier to get a lockdep report is hard. > Can you pre-allocate before taking locks? Could you point me to some > examples when the allocation is necessary in the range notifier > callback? Hmm. I took a careful look, I only found mlx5 as obviously allocating memory: mlx5_ib_invalidate_range() mlx5_ib_update_xlt() __get_free_pages(gfp, get_order(size)); However, I think this could be changed to fall back to some small buffer if allocation fails. The existing scheme looks sketchy nouveau does: nouveau_svmm_invalidate nvif_object_mthd kmalloc(GFP_KERNEL) But I think it reliably uses a stack buffer here i915 I think Daniel said he audited. amd_mn.. The actual invalidate_range_start does not allocate memory, but it is entangled with so many locks it would need careful analysis to be sure. The others look generally OK, which is good, better than I hoped :) Jason