Date: Wed, 10 Jan 2024 17:58:15 +0000
From: Catalin Marinas
To: Robin Murphy
Cc: Ido Schimmel, joro@8bytes.org, will@kernel.org, iommu@lists.linux.dev,
 linux-kernel@vger.kernel.org, zhangzekun11@huawei.com,
 john.g.garry@oracle.com, dheerajkumar.srivastava@amd.com,
 jsnitsel@redhat.com
Subject: Re: [PATCH v3 0/2] iommu/iova: Make the
 rcache depot properly flexible
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 10, 2024 at 12:48:06PM +0000, Robin Murphy wrote:
> On 2024-01-09 5:21 pm, Ido Schimmel wrote:
> > On Mon, Jan 08, 2024 at 05:35:26PM +0000, Robin Murphy wrote:
> > > Hmm, we've got what looks to be a set of magazines forming a plausible depot
> > > list (or at least the tail end of one):
> > >
> > > ffff8881411f9000 -> ffff8881261c1000
> > >
> > > ffff8881261c1000 -> ffff88812be26400
> > >
> > > ffff88812be26400 -> ffff8188392ec000
> > >
> > > ffff8188392ec000 -> ffff8881a5301000
> > >
> > > ffff8881a5301000 -> NULL
> > >
> > > which I guess has somehow become detached from its rcache->depot without
> > > being freed properly? However I'm struggling to see any conceivable way that
> > > could happen which wouldn't already be more severely broken in other ways as
> > > well (i.e. either general memory corruption or someone somehow still trying
> > > to use the IOVA domain while it's being torn down).
> >
> > The machine is running a debug kernel that among other things has KASAN
> > enabled, but there are no traces in the kernel log so there is no memory
> > corruption that I'm aware of.
> >
> > > Out of curiosity, does reverting just patch #2 alone make a difference?
> >
> > Will try and let you know.
> >
> > > And is your workload doing anything "interesting" in relation to IOVA
> > > domain lifetimes, like creating and destroying SR-IOV virtual
> > > functions, changing IOMMU domain types via sysfs, or using that
> > > horrible vdpa thing, or are you seeing this purely from regular driver
> > > DMA API usage?
> >
> > The machine is running networking related tests, but it is not using
> > SR-IOV, VMs or VDPA so there shouldn't be anything "interesting" as far
> > as IOMMU is concerned.
> >
> > The two networking drivers on the machine are "igb" for the management
> > port and "mlxsw" for the data ports (the machine is a physical switch).
> > I believe the DMA API usage in the latter is quite basic and I don't
> > recall any DMA related problems with this driver since it was first
> > accepted upstream in 2015.
>
> Thanks for the clarifications, that seems to rule out all the most
> confusingly impossible scenarios, at least.
>
> The best explanation I've managed to come up with is a false-positive race
> dependent on the order in which kmemleak scans the relevant objects. Say we
> have the list as depot -> A -> B -> C; the rcache object is scanned and sees
> the pointer to magazine A, but then A is popped *before* kmemleak scans it,
> such that when it is then scanned, its "next" pointer has already been
> wiped, thus kmemleak never observes any reference to B, so it appears that B
> and (transitively) C are "leaked".

Transient false positives are possible, especially as the code doesn't
use a double-linked list (for the latter, kmemleak does checksumming and
detects the prev/next change, defers the reporting until the object
becomes stable).

That said, if a new scan is forced
(echo scan > /sys/kernel/debug/kmemleak), are the same objects still
listed as leaks? If yes, they may not be transient.

If it is indeed transient, I think a better fix than kmemleak_not_leak()
is to add a new API, something like kmemleak_mark_transient() which
resets the checksum, skips the object reporting for one scan.

-- 
Catalin
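
The scan-order race Robin describes above (depot -> A -> B -> C, with A
popped between the scan of the rcache and the scan of A) can be sketched
as a small user-space model. This is only an illustration in Python, not
kernel code: the names Obj, build(), pop() and scan() are hypothetical,
each object holds a single pointer, and the model ignores the
checksumming mentioned above and every other detail of the real scanner.

```python
# User-space model of the scan-order race; not kernel code.
# Obj, build, pop and scan are hypothetical names for illustration only.

class Obj:
    """A tracked allocation holding one outgoing pointer."""
    def __init__(self, name):
        self.name = name
        self.next = None          # stands in for the magazine "next" link


def build():
    """Create rcache -> A -> B -> C and return (rcache, [A, B, C])."""
    rcache, a, b, c = (Obj(n) for n in ("rcache", "A", "B", "C"))
    rcache.next, a.next, b.next = a, b, c
    return rcache, [a, b, c]


def pop(rcache):
    """Detach the head magazine and wipe its next pointer."""
    mag = rcache.next
    rcache.next = mag.next
    mag.next = None
    return mag


def scan(root, tracked, after_root=None):
    """Follow pointers from root; return names that were never referenced.

    after_root, if given, runs once right after the root has been scanned,
    modelling a concurrent list update landing at that moment.
    """
    referenced = set()
    pending = [root]              # objects still waiting to be scanned
    root_done = False
    while pending:
        obj = pending.pop(0)
        target = obj.next         # read the pointer stored in this object
        if target is not None and target not in referenced:
            referenced.add(target)
            pending.append(target)
        if not root_done:
            root_done = True
            if after_root is not None:
                after_root()
    return [o.name for o in tracked if o not in referenced]


# Quiet list: every magazine is reached.
rcache, mags = build()
print(scan(rcache, mags))                                  # []

# A is popped after the rcache was scanned but before A is scanned.
rcache, mags = build()
print(scan(rcache, mags, after_root=lambda: pop(rcache)))  # ['B', 'C']

# A forced rescan of the now-stable list reaches B and C again.
# A has a new owner that this model does not track, so it is left out.
print(scan(rcache, mags[1:]))                              # []
```

In this model the second scan clears the report, which is what one would
expect to see from a forced rescan if the reported objects are indeed
transient.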