Date: Wed, 30 Jun 2021 08:56:51 -0700
From: Nathan Chancellor
To: Will Deacon
Cc: Claire Chang, Rob Herring, mpe@ellerman.id.au, Joerg Roedel, Frank Rowand, Konrad Rzeszutek Wilk, boris.ostrovsky@oracle.com, jgross@suse.com, Christoph Hellwig, Marek Szyprowski, benh@kernel.crashing.org, paulus@samba.org, "list@263.net:IOMMU DRIVERS", Stefano Stabellini, Robin Murphy, grant.likely@arm.com,
    xypron.glpk@gmx.de, Thierry Reding, mingo@kernel.org, bauerman@linux.ibm.com, peterz@infradead.org, Greg KH, Saravana Kannan, "Rafael J . Wysocki", heikki.krogerus@linux.intel.com, Andy Shevchenko, Randy Dunlap, Dan Williams, Bartosz Golaszewski, linux-devicetree, lkml, linuxppc-dev@lists.ozlabs.org, xen-devel@lists.xenproject.org, Nicolas Boichat, Jim Quinlan, Tomasz Figa, bskeggs@redhat.com, Bjorn Helgaas, chris@chris-wilson.co.uk, Daniel Vetter, airlied@linux.ie, dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, jani.nikula@linux.intel.com, Jianxiong Gao, joonas.lahtinen@linux.intel.com, linux-pci@vger.kernel.org, maarten.lankhorst@linux.intel.com, matthew.auld@intel.com, rodrigo.vivi@intel.com, thomas.hellstrom@linux.intel.com, Tom Lendacky, Qian Cai
Subject: Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
Message-ID:
References: <20210624155526.2775863-1-tientzu@chromium.org> <20210624155526.2775863-7-tientzu@chromium.org> <20210630114348.GA8383@willie-the-truck>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20210630114348.GA8383@willie-the-truck>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Will and Claire,

On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote:
> On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote:
> > On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor wrote:
> > >
> > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote:
> > > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> > > > use it to determine whether to bounce the data or not. This will be
> > > > useful later to allow for different pools.
> > > >
> > > > Signed-off-by: Claire Chang
> > > > Reviewed-by: Christoph Hellwig
> > > > Tested-by: Stefano Stabellini
> > > > Tested-by: Will Deacon
> > > > Acked-by: Stefano Stabellini
> > >
> > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce
> > > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to
> > > get to an X session consistently (although not every single time),
> > > presumably due to a crash in the AMDGPU driver that I see in dmesg.
> > >
> > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy
> > > to provide any further information, debug, or test patches as necessary.
> >
> > Are you using swiotlb=force? or the swiotlb_map is called because of
> > !dma_capable? (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93)
>
> The command line is in the dmesg:
>
> | Kernel command line: initrd=\amd-ucode.img initrd=\initramfs-linux-next-llvm.img root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp irqpoll
>
> but I worry that this looks _very_ similar to the issue reported by Qian
> Cai which we thought we had fixed. Nathan -- is the failure deterministic?

Yes, for the most part. It does not happen every single boot so when I
was bisecting, I did a series of seven boots and only considered the
revision good when all seven of them made it to LightDM's greeter. My
results that I notated show most bad revisions failed anywhere from four
to six times.

> > `BUG: unable to handle page fault for address: 00000000003a8290` and
> > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory
> > (maybe dev->dma_io_tlb_mem) was corrupted?
> > The dev->dma_io_tlb_mem should be set here
> > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528)
> > through device_initialize.
>
> I'm less sure about this. 'dma_io_tlb_mem' should be pointing at
> 'io_tlb_default_mem', which is a page-aligned allocation from memblock.
> The spinlock is at offset 0x24 in that structure, and looking at the
> register dump from the crash:
>
> Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:ffffadb4013db9e8 EFLAGS: 00010006
> Jun 29 18:28:42 hp-4300G kernel: RAX: 00000000003a8290 RBX: 0000000000000000 RCX: ffff8900572ad580
> Jun 29 18:28:42 hp-4300G kernel: RDX: ffff89005653f024 RSI: 00000000000c0000 RDI: 0000000000001d17
> Jun 29 18:28:42 hp-4300G kernel: RBP: 000000000a20d000 R08: 00000000000c0000 R09: 0000000000000000
> Jun 29 18:28:42 hp-4300G kernel: R10: 000000000a20d000 R11: ffff89005653f000 R12: 0000000000000212
> Jun 29 18:28:42 hp-4300G kernel: R13: 0000000000001000 R14: 0000000000000002 R15: 0000000000200000
> Jun 29 18:28:42 hp-4300G kernel: FS: 00007f1f8898ea40(0000) GS:ffff890057280000(0000) knlGS:0000000000000000
> Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jun 29 18:28:42 hp-4300G kernel: CR2: 00000000003a8290 CR3: 00000001020d0000 CR4: 0000000000350ee0
> Jun 29 18:28:42 hp-4300G kernel: Call Trace:
> Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50
> Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0
>
> Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and
> RDX pointing at the spinlock. Yet RAX is holding junk :/
>
> I agree that enabling KASAN would be a good idea, but I also think we
> probably need to get some more information out of swiotlb_tbl_map_single()
> to see what exactly is going wrong in there.

I can certainly enable KASAN and if there is any debug print I can add
or dump anything, let me know!

Cheers,
Nathan
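
The register arithmetic in Will's analysis can be checked with a short
Python sketch. The register values are copied from the dump quoted above
and the 0x24 spinlock offset is the one Will states. The 4 KiB page size,
the start of the kernel half of the address space, and the helper names
are assumptions made for illustration, not taken from the kernel source.

```python
# Values copied from the register dump quoted in this thread.
R11 = 0xFFFF89005653F000  # thought to hold the 'dma_io_tlb_mem' pointer
RDX = 0xFFFF89005653F024  # thought to point at the spinlock
RAX = 0x00000000003A8290
CR2 = 0x00000000003A8290  # faulting address recorded by the CPU

LOCK_OFFSET = 0x24        # spinlock offset as stated in the thread
PAGE_SIZE = 4096          # assumption: 4 KiB pages
KERNEL_HALF_START = 0xFFFF800000000000  # assumption: x86-64, 4-level paging


def is_page_aligned(addr: int, page_size: int = PAGE_SIZE) -> bool:
    """True if addr sits on a page boundary."""
    return addr % page_size == 0


def is_kernel_address(addr: int) -> bool:
    """True if addr lies in the upper (kernel) half of the address space."""
    return addr >= KERNEL_HALF_START


def summarize() -> dict:
    """Collect the checks that the analysis above relies on."""
    return {
        "R11 is page-aligned": is_page_aligned(R11),
        "RDX == R11 + 0x24": RDX - R11 == LOCK_OFFSET,
        "RAX == CR2 (faulting address)": RAX == CR2,
        "RAX is a kernel address": is_kernel_address(RAX),
    }


if __name__ == "__main__":
    for check, result in summarize().items():
        print(f"{check}: {result}")
```

The first two checks are consistent with R11 being a page-aligned base
and RDX being that base plus the lock offset. The last two show that RAX
matches the faulting address in CR2 and is not in the kernel half of the
address space, which fits the "RAX is holding junk" observation.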