Date: Wed, 10 Jan 2024 16:00:59 +0200
From: Ido Schimmel
To: Robin Murphy
Cc: joro@8bytes.org, will@kernel.org, iommu@lists.linux.dev,
    linux-kernel@vger.kernel.org, zhangzekun11@huawei.com,
    john.g.garry@oracle.com, dheerajkumar.srivastava@amd.com,
    jsnitsel@redhat.com, Catalin Marinas
Subject: Re: [PATCH v3 0/2] iommu/iova: Make the rcache depot properly flexible
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 10, 2024 at 12:48:06PM +0000, Robin Murphy wrote:
> On 2024-01-09 5:21 pm, Ido Schimmel wrote:
> > Hi Robin,
> >
> > Thanks for the reply.
> >
> > On Mon, Jan 08, 2024 at 05:35:26PM +0000, Robin Murphy wrote:
> > > Hmm, we've got what looks to be a set of magazines forming a plausible depot
> > > list (or at least the tail end of one):
> > >
> > > ffff8881411f9000 -> ffff8881261c1000
> > >
> > > ffff8881261c1000 -> ffff88812be26400
> > >
> > > ffff88812be26400 -> ffff8188392ec000
> > >
> > > ffff8188392ec000 -> ffff8881a5301000
> > >
> > > ffff8881a5301000 -> NULL
> > >
> > > which I guess has somehow become detached from its rcache->depot without
> > > being freed properly? However I'm struggling to see any conceivable way that
> > > could happen which wouldn't already be more severely broken in other ways as
> > > well (i.e. either general memory corruption or someone somehow still trying
> > > to use the IOVA domain while it's being torn down).
> >
> > The machine is running a debug kernel that among other things has KASAN
> > enabled, but there are no traces in the kernel log so there is no memory
> > corruption that I'm aware of.
> >
> > > Out of curiosity, does reverting just patch #2 alone make a difference?
> >
> > Will try and let you know.

I can confirm that the issue reproduces when only patch #2 is reverted.
IOW, patch #1 seems to be the problem:

unreferenced object 0xffff8881a1ff3400 (size 1024):
  comm "softirq", pid 0, jiffies 4296362635 (age 3540.420s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 67 b7 05 00 00 00 00 00  ........g.......
    3f a6 05 00 00 00 00 00 93 99 05 00 00 00 00 00  ?...............
  backtrace:
    [] __kmem_cache_alloc_node+0x1e8/0x320
    [] kmalloc_trace+0x2a/0x60
    [] free_iova_fast+0x293/0x460
    [] fq_ring_free_locked+0x1b0/0x310
    [] fq_flush_timeout+0x19d/0x2e0
    [] call_timer_fn+0x19a/0x5c0
    [] __run_timers+0x78b/0xb80
    [] run_timer_softirq+0x5d/0xd0
    [] __do_softirq+0x205/0x8b5
unreferenced object 0xffff888165b9a800 (size 1024):
  comm "softirq", pid 0, jiffies 4299383627 (age 519.460s)
  hex dump (first 32 bytes):
    00 34 ff a1 81 88 ff ff bd 9d 05 00 00 00 00 00  .4..............
    f3 ab 05 00 00 00 00 00 37 b5 05 00 00 00 00 00  ........7.......
  backtrace:
    [] __kmem_cache_alloc_node+0x1e8/0x320
    [] kmalloc_trace+0x2a/0x60
    [] free_iova_fast+0x293/0x460
    [] fq_ring_free_locked+0x1b0/0x310
    [] fq_flush_timeout+0x19d/0x2e0
    [] call_timer_fn+0x19a/0x5c0
    [] __run_timers+0x78b/0xb80
    [] run_timer_softirq+0x5d/0xd0
    [] __do_softirq+0x205/0x8b5

> >
> > > And is your workload doing anything "interesting" in relation to IOVA
> > > domain lifetimes, like creating and destroying SR-IOV virtual
> > > functions, changing IOMMU domain types via sysfs, or using that
> > > horrible vdpa thing, or are you seeing this purely from regular driver
> > > DMA API usage?
> >
> > The machine is running networking related tests, but it is not using
> > SR-IOV, VMs or VDPA so there shouldn't be anything "interesting" as far
> > as IOMMU is concerned.
> >
> > The two networking drivers on the machine are "igb" for the management
> > port and "mlxsw" for the data ports (the machine is a physical switch).
> > I believe the DMA API usage in the latter is quite basic and I don't
> > recall any DMA related problems with this driver since it was first
> > accepted upstream in 2015.
>
> Thanks for the clarifications, that seems to rule out all the most
> confusingly impossible scenarios, at least.
>
> The best explanation I've managed to come up with is a false-positive race
> dependent on the order in which kmemleak scans the relevant objects.
> Say we have the list as depot -> A -> B -> C; the rcache object is scanned
> and sees the pointer to magazine A, but then A is popped *before* kmemleak
> scans it, such that when it is then scanned, its "next" pointer has already
> been wiped, thus kmemleak never observes any reference to B, so it appears
> that B and (transitively) C are "leaked". If that is the case, then I'd
> expect it should be reproducible with patch #1 alone (although patch #2
> might make it slightly more likely if the work ever does result in
> additional pops happening), but I'd expect the leaked objects to be
> transient and not persist forever through repeated scans (what I don't know
> is whether kmemleak automatically un-leaks an object if it subsequently
> finds a new reference, or if it needs manually clearing in between scans).
> I'm not sure if there's a nice way to make that any better... unless maybe
> it might make sense to call kmemleak_not_leak(mag->next) in
> iova_depot_pop() before that reference disappears?

I'm not familiar with the code so I can't comment on whether that's the
best solution, but I will say that we've been running kmemleak as part
of our regression for years and every time we got a report it was an
actual memory leak. Therefore, in order to keep the tool reliable, I
think it's better to annotate the code to suppress false positives
rather than ignore them.

Please let me know if you want me to test a fix.

Thanks for looking into this!
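
For illustration, the scan-order race hypothesised above (depot -> A -> B -> C,
with A popped between the scan of the rcache and the scan of A) can be modelled
in userspace roughly as follows. This is only a toy sketch, not kernel code:
every toy_* name is hypothetical, the "scanner" is a deliberately simplified
stand-in for kmemleak, and toy_not_leak() merely assumes that an annotation
like kmemleak_not_leak(mag->next) causes the object to be treated as
referenced and scanned in the same pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an IOVA magazine: only the depot link matters here. */
struct toy_mag {
	struct toy_mag *next;
	bool referenced;	/* set once the toy scanner saw a pointer to it */
};

/* Toy stand-in for the rcache: just the head of the depot list. */
struct toy_rcache {
	struct toy_mag *depot;
};

/* Hypothetical analogue of kmemleak_not_leak(): treat object as referenced. */
static void toy_not_leak(struct toy_mag *mag)
{
	if (mag)
		mag->referenced = true;
}

/* Pop the head magazine and wipe its link, optionally annotating first. */
static struct toy_mag *toy_depot_pop(struct toy_rcache *rc, bool annotate)
{
	struct toy_mag *mag = rc->depot;

	assert(mag);
	rc->depot = mag->next;
	if (annotate)
		toy_not_leak(mag->next);
	mag->next = NULL;
	return mag;
}

/* Follow ->next from every referenced object until nothing new is found. */
static void toy_scan_objects(struct toy_mag *objs, size_t n)
{
	bool progress = true;

	while (progress) {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			struct toy_mag *next = objs[i].next;

			if (objs[i].referenced && next && !next->referenced) {
				next->referenced = true;
				progress = true;
			}
		}
	}
}

/*
 * Replay the interleaving and return how many magazines that are still in
 * the depot (B and C) end up looking unreferenced to the toy scanner.
 */
static int toy_false_positives(bool annotate)
{
	struct toy_mag m[3] = { { NULL, false }, { NULL, false }, { NULL, false } };
	struct toy_rcache rc = { .depot = &m[0] };
	int leaked = 0;

	m[0].next = &m[1];	/* A -> B */
	m[1].next = &m[2];	/* B -> C */

	/* Step 1: the scanner reads the rcache and sees the pointer to A. */
	rc.depot->referenced = true;

	/* Step 2: A is popped before the scanner gets to A itself. */
	toy_depot_pop(&rc, annotate);

	/* Step 3: the scanner walks the objects; A's link is already NULL. */
	toy_scan_objects(m, 3);

	for (size_t i = 1; i < 3; i++)
		if (!m[i].referenced)
			leaked++;
	return leaked;
}
```

In this model toy_false_positives(false) returns 2 (B and C look leaked even
though the depot still points at them), while toy_false_positives(true)
returns 0. Whether the real kmemleak behaves the same way with the proposed
annotation is exactly what would need testing.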