Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755453AbbDTNDg (ORCPT ); Mon, 20 Apr 2015 09:03:36 -0400
Received: from aserp1040.oracle.com ([141.146.126.69]:45859 "EHLO
        aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932285AbbDTNDb (ORCPT );
        Mon, 20 Apr 2015 09:03:31 -0400
Date: Mon, 20 Apr 2015 09:03:03 -0400
From: Konrad Rzeszutek Wilk
To: Dorian Gray
Cc: Alexander Duyck, Alan Stern, Suman Tripathi,
        iommu@lists.linux-foundation.org, USB list,
        Kernel development list
Subject: Re: Error: DMA: Out of SW-IOMMU space [was: External USB drives
        become unresponsive after few hours.]
Message-ID: <20150420130303.GB9002@l.oracle.com>
References: <552FCD25.9060807@gmail.com>
        <20150416184252.GE7388@x230.dumpdata.com>
        <20150417200623.GA14442@l.oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5439
Lines: 170

On Sun, Apr 19, 2015 at 05:43:18PM +0200, Dorian Gray wrote:
> I think the case is closed.
> Now that I know it's not USB, but wireless driver, I looked through
> the new k3.19.5's changelog and saw this:
>
>
> commit b943e69d33fac1e5f6db57868e061096b0aae67a
> Author: Larry Finger
> Date: Sat Mar 21 15:16:05 2015 -0500
>
> rtlwifi: Fix IOMMU mapping leak in AP mode
>
> commit be0b5e635883678bfbc695889772fed545f3427d upstream.
>
> Transmission of an AP beacon does not call the TX interrupt service routine,
> which usually does the cleanup. Instead, cleanup is handled in a tasklet
> completion routine. Unfortunately, this routine has a serious bug in that
> it does not release the DMA mapping before it frees the skb, thus one
> IOMMU mapping is leaked for each beacon.
> The test system failed with no free IOMMU mapping slots
> approximately one hour after hostapd was used to start an AP.
>
> This issue was reported and tested at
> https://github.com/lwfinger/rtlwifi_new/issues/30.
>
> Reported-and-tested-by: Kevin Mullican
> Cc: Kevin Mullican
> Signed-off-by: Shao Fu
> Signed-off-by: Larry Finger
> Signed-off-by: Kalle Valo
> Signed-off-by: Greg Kroah-Hartman
>
>
> Looks very related, especially because my wireless card is also always
> in AP mode, however I haven't been actually using it lately, so
> probably that's why I didn't notice anything related to it (and kept
> focused on USB), until I used dump_dma.
>
> Well, due to my minimal knowledge regarding kernel's internals I can't
> be 100% sure that this was it, but so far 3.19.5 is working stable
> (uptime 6hrs and counting).

Sweet!
>
> Thank you Konrad (and everyone else involved) for helping me out to
> pinpoint the actual culprit.

Sure thing. Happy to have been able to help!

> Jake
>
>
> On 18 April 2015 at 21:59, Dorian Gray wrote:
> > On 18 April 2015 at 12:10, Dorian Gray wrote:
> >> On 17 April 2015 at 22:06, Konrad Rzeszutek Wilk wrote:
> >>> On Fri, Apr 17, 2015 at 05:14:20PM +0200, Dorian Gray wrote:
> >>>> On 16 April 2015 at 20:42, Konrad Rzeszutek Wilk wrote:
> >>>> > An easier way is to compile the kernel with CONFIG_DMA_API_DEBUG
> >>>> > and then load the attached module.
> >>>> >
> >>>> > That should tell you who and what else is holding on the buffers.
> >>>>
> >>>> Ok, I have compiled 3.19.4 w/ CONFIG_DMA_API_DEBUG=y + the module you sent me.
> >>>> Now, I'm not sure if I've done it right - I waited until the error
> >>>> occurred and then modprobe'd dump_dma.
> >>>> I have attached the kernel log, but it tells me not much, if anything...
> >>>
> >>> The network driver is quite hungry for DMA. Did it do the same thing
> >>> in the earlier kernels?
> >>>
> >>> Thanks.
> >>>>
> >>>> Thanks again.
> >>>> Jake
> >>>
> >>>
> >>
> >> Yeah, you're right:
> >>
> >> # grep rtl8192se dump_dma_k3.19.4.log | wc -l
> >> 6789
> >> #
> >> # grep rtl8192se dump_dma_k3.17.8.log | wc -l
> >> 162
> >> #
> >>
> >> So, wlan driver would be the real culprit then..?
> >> I would have never thought...
> >>
> >> I guess I'm gonna test 3.19.4 once more (just to be sure) with
> >> rtl8192se removed and see what happens.
> >>
> >> Thanks!
> >> Jake
> >
> >
> > [update]
> >
> > Ok, 6 hours of uptime (3.19.4 + blacklisted rtl8192se) and everything
> > was fine...
> > However, I was checking periodically and noticed that 'radeon' also
> > tends to grow continuously over time, whereas ethernet driver sticks
> > to, more or less, the same range:
> >
> > # uname -r
> > 3.19.4
> > #
> > # grep -Eo 'radeon|r8169' L1.log | sort | uniq -c
> >   62 r8169
> > 4183 radeon
> > #
> > # grep -Eo 'radeon|r8169' L2.log | sort | uniq -c
> >   33 r8169
> > 5582 radeon
> > #
> > # grep -Eo 'radeon|r8169' L3.log | sort | uniq -c
> >   54 r8169
> > 7007 radeon
> > #
> > # grep -Eo 'radeon|r8169' L4.log | sort | uniq -c
> >   49 r8169
> > 7429 radeon
> > #
> > # grep -Eo 'radeon|r8169' L5.log | sort | uniq -c
> >   34 r8169
> > 9360 radeon
> > #
> >
> > It doesn't grow that much in 3.17.8:
> >
> > # uname -r
> > 3.17.8
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L1.log | sort | uniq -c
> >  265 r8169
> > 1229 radeon
> >  142 rtl8192se
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L2.log | sort | uniq -c
> >  187 r8169
> > 3159 radeon
> >  124 rtl8192se
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L3.log | sort | uniq -c
> >   41 r8169
> > 1894 radeon
> >   39 rtl8192se
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L4.log | sort | uniq -c
> >   64 r8169
> > 3370 radeon
> >   77 rtl8192se
> > #
> > # grep -Eo 'radeon|r8169|rtl8192se' L5.log | sort | uniq -c
> >   52 r8169
> > 2597 radeon
> >   49 rtl8192se
> > #
> >
> >
> > Btw, at some point (3.19.4) I encountered this:
> > [21631.181909] DMA-API: debugging out of memory - disabling
> >
> > Jake
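
For anyone repeating the comparison quoted above, here is a minimal sketch of the same check in Python. It mirrors the `grep -Eo '...' file | sort | uniq -c` pipeline from the thread and then tests whether a driver's entry count rises in every snapshot. The function names `count_drivers` and `grows_steadily` are made up for illustration, and the sample numbers are the ones quoted in the thread. A count that rises in every snapshot is only a hint of a leaked mapping, not proof of one.

```python
import re
from collections import Counter


def count_drivers(log_text, drivers):
    """Count occurrences of each driver name in a dump_dma log.

    Roughly equivalent to: grep -Eo 'a|b|c' file | sort | uniq -c
    Longer names are tried first so that a name which is a prefix of
    another cannot shadow it.
    """
    ordered = sorted(drivers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return Counter(pattern.findall(log_text))


def grows_steadily(counts):
    """Return True if each snapshot count is strictly larger than the last."""
    return all(later > earlier for earlier, later in zip(counts, counts[1:]))


# Per-snapshot counts quoted in the thread (L1.log .. L5.log).
radeon_3_19_4 = [4183, 5582, 7007, 7429, 9360]
r8169_3_19_4 = [62, 33, 54, 49, 34]
radeon_3_17_8 = [1229, 3159, 1894, 3370, 2597]

print(grows_steadily(radeon_3_19_4))  # True: rises in every snapshot
print(grows_steadily(r8169_3_19_4))   # False: fluctuates in a narrow range
print(grows_steadily(radeon_3_17_8))  # False: goes up and down
```

To run it against real logs, read each snapshot with `open(path).read()` and pass the text to `count_drivers` together with the driver names of interest.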