Date: Tue, 25 Nov 2014 12:23:02 +0000
From: Catalin Marinas
To: Russell King - ARM Linux
Cc: Arnd Bergmann, Will Deacon, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Ding Tianhong
Subject: Re: For the problem when using swiotlb
Message-ID: <20141125122302.GC18858@e104818-lin.cambridge.arm.com>
In-Reply-To: <20141125112905.GC3836@n2100.arm.linux.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 25, 2014 at 11:29:05AM +0000, Russell King - ARM Linux wrote:
> On Tue, Nov 25, 2014 at 10:58:15AM +0000, Catalin Marinas wrote:
> > Since we don't have a coherent_dma_supported() function, we defer the
> > validity check of coherent_dma_mask to dma_alloc_coherent() (and here we
> > can remove bouncing since swiotlb has relatively small buffers).
>
> Bouncing of coherent DMA buffers is insane; if you have to bounce them,
> they're by definition not coherent.

"Bouncing" is not the right term here. What I meant is not using the
swiotlb bounce buffers for dma_alloc_coherent(). swiotlb_alloc_coherent()
allows this, but it does not do any copying/bouncing in that scenario (it
would make no sense, as you said below).
> Think about one of the common uses of coherent DMA buffers: ring buffers,
> where both the CPU and the DMA agent write to the ring:
>
> - CPU writes to ring, loading address and length, then writing to the
>   status word for the ring entry.
> - DMA reads the ring status word, sees it owns the entry, processes it,
>   DMA writes to the ring status word to give it back.
>
> What this means is that if you are bouncing the buffer, you are copying
> it whole-sale between the CPU visible version and the DMA visible
> version, which means that you can miss DMA updates to it. So, bouncing
> a coherent DMA buffer is simply not an acceptable implementation for
> dma_alloc_coherent().

I agree, I am not arguing against this.

> As for validity of masks, it is defined in the DMA API documentation that
> if a DMA mask is suitable for the streaming APIs, then it is also suitable
> for the coherent APIs. The reverse is left open, and so may not
> necessarily be true.
>
> In other words:
>
>	err = dma_set_mask(dev, mask);
>	if (err == 0)
>		assert(dma_set_coherent_mask(dev, mask) == 0);
>
> must always succeed, but reversing the two calls has no guarantee.

That would currently succeed on arm64. The problem is that the swiotlb
bounce buffer allows a smaller-than-32-bit dma_mask for streaming DMA,
since such transfers can be bounced. With coherent allocations we could
allow the same smaller-than-32-bit coherent_dma_mask, but allocations may
fail because they fall back to the swiotlb reserved buffer, which is
relatively small.

What I want to avoid is threads like this one, where people ask for
bigger swiotlb buffers for coherent allocations, most likely because they
do the wrong thing with their DMA masks (or the dma-ranges property). If
the arm64 dma_alloc_coherent() code avoids calling
swiotlb_alloc_coherent() and limits itself to ZONE_DMA, we make it clear
that the swiotlb buffers are only used for streaming DMA (bouncing).
Once we avoid swiotlb buffers for coherent DMA, the problem is that,
even though your assertion above would pass, dma_alloc_coherent() may
fail if coherent_dma_mask is smaller than the end of ZONE_DMA (with pfn
offsets taken into account), breaking the assumption you mentioned above
about coherent_dma_mask suitability.

I think the best we can get is for dma_supported() to ignore
swiotlb_dma_supported() and simply check against ZONE_DMA (similar to
the arm32 one). We already guarantee that the swiotlb bounce buffers are
within ZONE_DMA, so there is no need to call swiotlb_dma_supported(). If
we ever have devices with a smaller-than-32-bit mask on arm64, we will
have to shrink ZONE_DMA.

> Note that there seems to be only one driver which has different coherent
> and streaming DMA masks today:
>
> drivers/media/pci/sta2x11/sta2x11_vip.c:
>	if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(26))) {
>		err = dma_set_coherent_mask(&vip->pdev->dev, DMA_BIT_MASK(29));

This would work if we had a 29-bit ZONE_DMA with the swiotlb bounce
buffer within 26 bits. If we go with the ZONE_DMA-only check above, we
would need a 26-bit ZONE_DMA.

-- 
Catalin