Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp709812ybv; Wed, 5 Feb 2020 13:05:41 -0800 (PST) X-Google-Smtp-Source: APXvYqySYcMbnHX0KitFVqwM8i6+l4r+t4HN0d1ylRKi6EGJx5tyirTN7PsATvJh0Zd+lS9tEKZU X-Received: by 2002:a05:6830:1608:: with SMTP id g8mr26129690otr.169.1580936740984; Wed, 05 Feb 2020 13:05:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1580936740; cv=none; d=google.com; s=arc-20160816; b=FPd+5xEEjleHffHxPdT0DTanGPzJAH/OwuaTDXSmItqZ4lsquICwBLLzzTcn97G8Zk /q6j1Gx8u0C5XsLR78d9V/sMwu13blm440BkQ6nR4mi5vJrHf/3C6pvUX8Ek2rhcWKdv vHZ0G8tYgY6R0S5D0iaGrKfwMOwlXUGOuksiEZZnjpVCYxTmHDqvPCsd78hOHhyWwGsW JHjJPvxVmYY4ie6gCFUnqDy99LugROQSS4EUnSA7C0ulv2bB7prS39ocnjckIYJjqZqt gHG5v4Js1sZy6kH4wI5EDB0L33nMDLgfHnGi7uB3rwpD8OlkGjeb/HuQ3jyEyVz/ZEaC MpAQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=jhhrbRrU672kDfFvJ/EE5w3I5wNtcQx8aaM6dlL4Ong=; b=zKGJtANO1qQq9reaGVIHH/y1e5DPpPetU33TVqHXQt0UF2khSfGyO46yhGjC+JnAZU sEOFf6U7yQcFNZZtX9LkKNhi1xJvUXLABwDOxNZE/wSTl7J7qIWja601B2obfs405eV2 6oG7QHb/jU47shKyPKH+hxoGRvdroPx0X0u86LGca23ooNi2SE7UA8RzcnsY6UQDk8u/ Vj0dNYfyqtMQ54ii7HrrJf2o5fPsUSwOl08BcV6lmOKPWmgpP7UDbqaNKufPhXdaMmW0 OgT+mak9QEoRoqIGtmiivD/SJfDa1CCQirG8QtbUinYdM4KslbRPBrsLtehFiWkTQS2Y X/6w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=YszJYyBc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v18si384523otq.209.2020.02.05.13.05.28; Wed, 05 Feb 2020 13:05:40 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=YszJYyBc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727234AbgBEVEE (ORCPT + 99 others); Wed, 5 Feb 2020 16:04:04 -0500 Received: from mail-ot1-f68.google.com ([209.85.210.68]:40107 "EHLO mail-ot1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727116AbgBEVED (ORCPT ); Wed, 5 Feb 2020 16:04:03 -0500 Received: by mail-ot1-f68.google.com with SMTP id i6so3355117otr.7 for ; Wed, 05 Feb 2020 13:04:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=jhhrbRrU672kDfFvJ/EE5w3I5wNtcQx8aaM6dlL4Ong=; b=YszJYyBcS4DOYwixJ/e1RscrCZo6Yl869OtkWsZPh/TX7OxgmffJyik12Zbfsk9T4t Si/FvvYgIj+xeVHtf8JfJ6t0G0TptFr0d7IE3nfLEdtoL4nKP2PYNQeP/eN2NWXiqoYY ttgAEXvDYCfedt3To6+LFyxnEAsHbfam/9SvSP1Zi+Xrb0YtBNQKFJKosabsEMgy3ZDT AeJEsLiEXE6hOinKHlfBxXtXhg22JBKGJGycc3HbrQkpr1rLXoZjVFI9ecy64YvI6yEE DMTHYH78o6h4YKj7mmrx3GgA0YfjmIgJyUnyYzETcJTVkGOUiaGTkQtl8FIJoa6c2URd GK+g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=jhhrbRrU672kDfFvJ/EE5w3I5wNtcQx8aaM6dlL4Ong=; b=n6dH52hX7dRnwq/lwamnGt7yvbTXQjp5tEPS30og3q9oWukcIpVKdZ1ImK4mWz2k3+ gembLe5wZ4nLrNNegc1FxsHOjNot49AGvbF17jAv+Fb+IFDt+UFBZXsJ/oL8iR1imHdZ Tf2fFJDsUOGrDbsjtD6co3NGujPc76S6BQCO2rwjUhxJ+3My/0fzcfXLMeHpcqkNO9Np 84BQVScBAt7F83JL+PxwtiFTLDv9mKhs2yhw38bdOBk10sb2QRLrFBupcCrY4RjiNVdU WI6x7NOMlargwfAuodgiQdETEOV6mJKCk1uKZx5itFDErhe4FfYNKsoV5evaIpri5jmP SO9w== X-Gm-Message-State: APjAAAU4///vgDp5EzjrAGWoappKgWzNsvBaXcAwWrbgljz6q7n1UU2Y yCbQHLAIwt9Is/xf3dmm6Jmvof7n34x1g+4HsVFjrw== X-Received: by 2002:a9d:7493:: with SMTP id t19mr14936423otk.332.1580936642315; Wed, 05 Feb 2020 13:04:02 -0800 (PST) MIME-Version: 1.0 References: <20200122222645.38805-1-john.stultz@linaro.org> <02E7334B1630744CBDC55DA8586225837F9EE280@ORSMSX102.amr.corp.intel.com> <87o8uu3wqd.fsf@kernel.org> <02E7334B1630744CBDC55DA8586225837F9EE335@ORSMSX102.amr.corp.intel.com> <87lfpy3w1g.fsf@kernel.org> In-Reply-To: <87lfpy3w1g.fsf@kernel.org> From: John Stultz Date: Wed, 5 Feb 2020 13:03:51 -0800 Message-ID: Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using adb over f_fs To: Felipe Balbi Cc: "Yang, Fei" , Andrzej Pietrasiewicz , lkml , Felipe Balbi , Thinh Nguyen , Tejas Joglekar , Jack Pham , Todd Kjos , Greg KH , Linux USB List , stable , Jens Axboe , Christoph Hellwig Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 23, 2020 at 9:46 AM Felipe Balbi wrote: > >>>>> > >>>>> Since ~4.20, when the functionfs gadget enabled scatter-gather > >>>>> support, we have seen problems with adb connections stalling and > >>>>> stopping to function on hardware with dwc3 usb controllers. > >>>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices. ... > >> > >> I'm pretty sure this should be solved at the DMA API level, just want to confirm. > > > > I have sent you the tracepoints long time ago. Also my analysis of the > > problem (BTW, I don't think the tracepoints helped much). It's > > basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg(). > > AFAICT, this is caused by DMA API merging pages together when map an > sglist for DMA. While doing that, it does *not* move the SG_END flag > which sg_is_last() checks. > > I consider that an overlook on the DMA API, wouldn't you? Why should DMA > API users care if pages were merged or not while mapping the sglist? We > have for_each_sg() and sg_is_last() for a reason. > From an initial look, I agree this is pretty confusing. dma_map_sg() can coalesce entries in the sg list, modifying the sg entires themselves, however, in doing so it doesn't modify the number of entries in the sglist (nor the end state bit). That's pretty subtle! My initial naive attempt to fix the dma-iommu path to set the end bit at the tail of __finalize_sg() which does the sg entry modifications, only caused trouble elsewhere, as there's plenty of logic that expects the number of entries to not change, so having sg_next() return NULL before that point results in lots of null pointer traversals. Further, looking at the history, while apparently quirky, this has been the documented behavior in DMA-API.txt for over almost 14 years (at least). It clearly states that that dma_map_api can return fewer mapped entries then sg entries, and one should loop only that many times (for_each_sg() having a max number of entries to iterate over seems specifically for this purpose). Additionally, it says one must preserve the original number of entries (not # mapped entries) for dma_unmap_sg(). So I'm not sure that sg_is_last() is really valid for iterating on mapped sg lists. Should it be? Probably (at least with my unfamiliar eyes), but sg_is_last() has been around for almost as long coexisting with this behavioral quirk, so I'm also not sure this is the best hill for the dwc3 driver to die on. :) The fix here: https://lore.kernel.org/lkml/20200122222645.38805-3-john.stultz@linaro.org/ Or maybe the slightly cleaner varient here: https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/db845c-mainline-WIP&id=61a5816aa71ec719e77df9f2260dbbf6bcec7c99 seems like it would correctly address things following the documentation and avoid the failures we're seeing. As to if dma_map_sg() should fixup the state bits or properly shrink the sg list when it coalesces entries, that seems like it would be a much more intrusive change across quite a bit of the kernel that was written to follow the documented method. So my confidence that such a change would make it upstream in a reasonable amount of time isn't very high, and it seems like a bad idea to block the driver from working properly in the meantime. Pulling in Christoph and Jens as I suspect they have more context on this, and maybe can explain thats its not so quirky with the right perspective? Thoughts? Maybe there is an easier way to make it less quirky then what I imagine? thanks -john