From: Hans de Goede
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller
To: Christoph Hellwig, Tom Yan
Cc: Mathias Nyman, Greg Kroah-Hartman, linux-usb, Linux Kernel Mailing List, linux-pci@vger.kernel.org
References: <20201124102715.GA16983@lst.de>
Message-ID: <8a52e868-0ca1-55b7-5ad2-ddb0cbb5e45d@redhat.com>
Date: Fri, 27 Nov 2020 13:32:16 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 11/27/20 12:41 PM, Hans de Goede wrote:
> Hi,
>
> On 11/24/20 11:27 AM, Christoph Hellwig wrote:
>> On Mon, Nov 23, 2020 at 03:49:09PM +0100, Hans de Goede wrote:
>>> Hi,
>>>
>>> +Cc Christoph Hellwig
>>>
>>> Christoph, this is still an issue, so I've been looking around a bit and think this
>>> might have something to do with the dma-mapping-5.10 changes.
>>>
>>> Do you have any suggestions to debug this, or is it time to do a git bisect
>>> on this before 5.10 ships with a regression?
>>
>> Given that DMAR prefix this seems to be about using intel-iommu + bounce
>> buffering for external devices. I can't really think of anything specific
>> in 5.10 related to that, so maybe you'll need to bisect.
>>
>> I doubt this means we are actually leaking swiotlb buffers, so while
>> I'm pretty sure we broke something in lower layers this also means
>> xhci doesn't handle swiotlb operation very gracefully in general.
>
> I've done a git bisect, and the result is somewhat surprising. The git-bisect
> points to:
>
> commit 558033c2828f ("uas: fix sdev->host->dma_dev")
>
>     Use scsi_add_host_with_dma() instead of scsi_add_host().
>
>     When the scsi request queue is initialized/allocated, hw_max_sectors is clamped
>     to the dma max mapping size. Therefore, the correct device that should be used
>     for the clamping needs to be set.
>
>     The same clamping is still needed in uas as hw_max_sectors could be changed
>     there. The original clamping would be invalidated in such cases.
>
> I do have a UAS drive connected to the thunderbolt-dock, so I guess that this
> change is causing the UAS driver to gobble all available swiotlb space.

I ran some more tests and can confirm that reverting:

5df7ef7d32fe "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
558033c2828f "uas: fix sdev->host->dma_dev"

makes the problem go away while running a 5.10 kernel.

I also tried doubling the swiotlb size by adding:

swiotlb=65536

to the kernel command line, but that does not help.

Some more observations:

1. The usb-storage driver does not cause this issue, even though it has a
   very similar change.

2. The problem does not happen until I plug a UAS device into the dock.

3. The problem continues to happen even after I unplug the UAS device and
   rmmod the uas module.

Observation 3 made me take a closer look at the troublesome commit. It passes
udev->bus->sysdev, which I assume is the XHCI controller itself, as the device
to scsi_add_host_with_dma(), which in turn seems to cause permanent changes to
the DMA settings of the XHCI controller.

I'm not all that familiar with the DMA APIs, but I'm getting the feeling that
passing the actual XHCI controller's device as dma-device to
scsi_add_host_with_dma() is simply the wrong thing to do, and that the intended
effects (honor the XHCI DMA limits, but do not cause any changes to the XHCI
DMA settings) should be achieved differently.

Note that if this is indeed wrong, the matching usb-storage change should
likely also be dropped.

Regards,

Hans
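
For reference, the sizes mentioned above can be put into numbers with a small back-of-the-envelope sketch. This is only an illustration, not kernel code: the constants (2 KiB swiotlb slabs, at most 128 contiguous slabs per mapping, 512-byte sectors) are assumptions recalled from include/linux/swiotlb.h around v5.10 and should be checked against the tree in use, and the helper names are hypothetical. Whether the intel-iommu bounce path actually reports the swiotlb limit as its DMA max mapping size is not something this sketch establishes.

```python
# Rough swiotlb arithmetic. All constants are ASSUMPTIONS (recalled from
# include/linux/swiotlb.h around v5.10); verify them against your tree.
IO_TLB_SHIFT = 11       # assumed: one slab is 1 << 11 = 2048 bytes
IO_TLB_SEGSIZE = 128    # assumed: max contiguous slabs for one mapping
SECTOR_SIZE = 512       # assumed: bytes per block-layer sector

SLAB_BYTES = 1 << IO_TLB_SHIFT


def swiotlb_pool_bytes(nslabs: int) -> int:
    """Bounce-buffer pool size for swiotlb=<nslabs> (hypothetical helper)."""
    return nslabs * SLAB_BYTES


def swiotlb_max_mapping_bytes() -> int:
    """Largest single mapping swiotlb could serve under the assumptions."""
    return IO_TLB_SEGSIZE * SLAB_BYTES


def clamp_hw_max_sectors(hw_max_sectors: int, max_mapping_bytes: int) -> int:
    """Clamp hw_max_sectors to a DMA max mapping size given in bytes."""
    return min(hw_max_sectors, max_mapping_bytes // SECTOR_SIZE)


if __name__ == "__main__":
    # swiotlb=65536 from the mail, expressed in MiB.
    print(swiotlb_pool_bytes(65536) // (1 << 20), "MiB pool")
    # Largest single bounce mapping, in KiB.
    print(swiotlb_max_mapping_bytes() // 1024, "KiB max mapping")
    # The 2048-block limit from 5df7ef7d32fe after clamping.
    print(clamp_hw_max_sectors(2048, swiotlb_max_mapping_bytes()), "sectors")
```

Under these assumptions an unclamped 2048-sector request is 1 MiB, which is larger than what a single swiotlb mapping could cover, so it matters which device's limit the clamp is computed against.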