Date: Fri, 5 Jun 2009 07:43:01 +0900
From: FUJITA Tomonori
To: just.for.lkml@googlemail.com
Cc: jens.axboe@oracle.com, fujita.tomonori@lab.ntt.co.jp, bharrosh@panasas.com,
    hancockrwd@gmail.com, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: sata_sil24 0000:04:00.0: DMA-API: device driver frees DMA sg list
    with different entry count [map count=13] [unmap count=10]
In-Reply-To: <64bb37e0906041107i18faee5etd6dbb05838740bed@mail.gmail.com>
References: <20090604164418F.fujita.tomonori@lab.ntt.co.jp>
    <20090604075309.GU11363@kernel.dk>
    <64bb37e0906041107i18faee5etd6dbb05838740bed@mail.gmail.com>
Message-Id: <20090605074246R.fujita.tomonori@lab.ntt.co.jp>

On Thu, 4 Jun 2009 20:07:36 +0200, Torsten Kaiser wrote:
> On Thu, Jun 4, 2009 at 9:53 AM, Jens Axboe wrote:
> > On Thu, Jun 04 2009, FUJITA Tomonori wrote:
> >> On Thu, 04 Jun 2009 10:15:14 +0300, Boaz Harrosh wrote:
> >> > On 06/04/2009 09:33 AM, FUJITA Tomonori wrote:
> >> > > On Thu, 4 Jun 2009 08:12:34 +0200, Torsten Kaiser wrote:
> >> > >> On Thu, Jun 4, 2009 at 2:02 AM, FUJITA Tomonori wrote:
> >> > >>> On Wed, 3 Jun 2009 21:30:32 +0200, Torsten Kaiser wrote:
> >> > >>>> Still happens with 2.6.30-rc8 (see trace at the end of the email).
> >> > >>>>
> >> > >>>> As orig_n_elem is only used two times in libata-core.c, I suspected a
> >> > >>>> corruption of qc->sg, but adding checks for this did not trigger.
> >> > >>>> So I looked into lib/dma-debug.c.
> >> > >>>> It seems add_dma_entry() does not protect against adding the same
> >> > >>>> entry twice.
> >> > >>>
> >> > >>> You mean that add_dma_entry() doesn't protect against adding a new
> >> > >>> entry identical to an existing entry, right?
> >> > >>
> >> > >> Yes. As I read the hash bucket code in lib/dma-debug.c, a second entry
> >> > >> from the same device and the same address will just be added to the
> >> > >> list, and on unmap the lookup will always return the first entry.
> >> > >
> >> > > It means that two different DMA operations will be performed against
> >> > > the same dma address on the same device at the same time. It doesn't
> >> > > happen unless there is a bug in a driver, an IOMMU or somewhere, as I
> >> > > wrote in the previous mail.
> >> >
> >> > What about the drain buffers used by libata? Aren't they the same buffer
> >> > for all devices and all requests?
> >>
> >> I'm not sure the drain buffer is used like that. But are there easier
> >> ways to see the same buffer, e.g. sending the same buffer twice with DIO?
> >
> > I'm pretty sure we discussed this some months ago, the intel iommu
> > driver had a similar bug iirc. Let's say you want to write the same 4kb
> > block to two spots on the disk.
> > You prepare and submit that with O_DIRECT and using aio. On a device
> > with NCQ, that could easily map the same page twice. Or, perhaps more
> > likely, doing 512b writes and not getting all of them merged.
>
> I have an even better theory: RAID1.
> There are two disks on this sil24 controller that are used as a RAID1
> to form my root partition.
>
> That also fits the pattern of the very large number of duplicate dma
> mappings (as each data block needs to be written twice), and the fact
> that the DMA-API debug check only triggers under heavier load: most of
> the time both drives are in sync, so the write requests should be
> identical and it does not matter which entry gets returned from the
> hash bucket.
> But when I run 'updatedb' to trigger this error, the read requests
> disturb the pattern and the write requests also become asymmetric.
>
> >> As I wrote, I assume that he uses GART IOMMU;
>
> [ 0.010000] Checking aperture...
> [ 0.010000] No AGP bridge found
> [ 0.010000] Node 0: aperture @ a7f0000000 size 32 MB
> [ 0.010000] Aperture beyond 4GB. Ignoring.
> [ 0.010000] Your BIOS doesn't leave a aperture memory hole
> [ 0.010000] Please enable the IOMMU option in the BIOS setup
> (sadly my BIOS does not have such an option...)
> [ 0.010000] This costs you 64 MB of RAM
> [ 0.010000] Mapping aperture over 65536 KB of RAM @ 20000000
> [ 0.010000] Memory: 4057512k/4718592k available (4674k kernel code,
> 524868k absent, 136212k reserved, 2520k data, 1172k init)
> [snip]
> [ 1.304386] DMA-API: preallocated 32768 debug entries
> [ 1.309439] DMA-API: debugging enabled by kernel config
> [ 1.310123] PCI-DMA: Disabling AGP.
> [ 1.313711] PCI-DMA: aperture base @ 20000000 size 65536 KB
> [ 1.320002] PCI-DMA: using GART IOMMU.
> [ 1.323763] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
> [ 1.330640] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
> [ 1.340007] hpet0: 3 comparators, 32-bit 25.000000 MHz counter

You use GART IOMMU. So I thought that you shouldn't hit this problem,
because an IOMMU gives a unique dma address per dma mapping... but I
forgot one really important thing about GART: it's not real IOMMU
hardware. It does address remapping only when necessary (i.e. only when
an address cannot be reached by the device directly). So it is possible
that you see multiple DMA transfers performed against the same dma
address on one device at the same time.

> >> it allocates a unique dma address per dma mapping operation.
> >>
> >> However, dma-debug is broken wrt this, I guess.
> >
> > Seems so.
>
> Yes, as the md code for RAID1 has a very good reason to send the same
> memory page twice to this device.

Yeah, now it's clear to me why you hit this bug. I'm not sure there is
any simple way to fix dma-debug wrt this. I think that it's better to
just disable it, since 2.6.30 will be released shortly.
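For reference, here is a minimal userspace sketch of the ambiguity Torsten
describes. It is not the real lib/dma-debug.c: it models a single hash bucket
with a made-up entry struct, and it only assumes that the bucket lookup
returns the first entry matching the (device, dma address) pair.

/*
 * Toy model of the dma-debug bookkeeping, NOT the kernel code. With two
 * live mappings of the same page on the same device, the unmap of the
 * second mapping finds the first mapping's entry, so the recorded sg
 * entry count no longer matches what the driver passes to unmap.
 */
#include <stdio.h>
#include <stddef.h>

struct dma_debug_entry {
	void *dev;			/* device that owns the mapping */
	unsigned long dev_addr;		/* bus/dma address */
	int sg_call_ents;		/* nents passed to dma_map_sg() */
	struct dma_debug_entry *next;
};

static struct dma_debug_entry *bucket;	/* single hash bucket, for brevity */

static void add_dma_entry(struct dma_debug_entry *e)
{
	struct dma_debug_entry **p = &bucket;

	while (*p)			/* append; no check for a duplicate */
		p = &(*p)->next;
	e->next = NULL;
	*p = e;
}

/* returns the FIRST entry matching (dev, addr), like the bucket lookup */
static struct dma_debug_entry *find_entry(void *dev, unsigned long addr)
{
	struct dma_debug_entry *e;

	for (e = bucket; e; e = e->next)
		if (e->dev == dev && e->dev_addr == addr)
			return e;
	return NULL;
}

int main(void)
{
	int dev;	/* stands in for the sata_sil24 PCI device */
	struct dma_debug_entry a = { &dev, 0x1000, 13, NULL }; /* 1st dma_map_sg() */
	struct dma_debug_entry b = { &dev, 0x1000, 10, NULL }; /* 2nd dma_map_sg() */
	struct dma_debug_entry *found;

	add_dma_entry(&a);
	add_dma_entry(&b);	/* same device, same dma address: both coexist */

	/* the driver unmaps the second mapping (10 entries), but the
	 * lookup hands back the older 13-entry mapping */
	found = find_entry(&dev, 0x1000);
	if (found && found->sg_call_ents != 10)
		printf("DMA-API: device driver frees DMA sg list with different "
		       "entry count [map count=%d] [unmap count=10]\n",
		       found->sg_call_ents);
	return 0;
}

A real IOMMU would hand each mapping its own dma address, so the two entries
could not collide; GART passing the physical address straight through is what
makes the collision possible here.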
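And a sketch of the scenario Jens describes: queueing the same 4kb user
buffer for two different offsets in a single io_submit() with O_DIRECT, so
that both writes (and hence both DMA mappings of the same page) can be in
flight at once. The file name is only a placeholder, not anything from the
report, and this needs libaio (build with -laio).

/* Same aligned 4kb buffer submitted twice with O_DIRECT and aio. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <libaio.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb[2], *list[2] = { &cb[0], &cb[1] };
	struct io_event ev[2];
	void *buf;
	int fd;

	fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0 || io_queue_init(8, &ctx) < 0)
		return 1;

	if (posix_memalign(&buf, 4096, 4096))	/* O_DIRECT wants alignment */
		return 1;
	memset(buf, 0xaa, 4096);

	/* same page, two different offsets on the disk */
	io_prep_pwrite(&cb[0], fd, buf, 4096, 0);
	io_prep_pwrite(&cb[1], fd, buf, 4096, 4096);

	if (io_submit(ctx, 2, list) != 2)	/* both can be in flight at once */
		return 1;
	io_getevents(ctx, 2, 2, ev, NULL);

	io_queue_release(ctx);
	close(fd);
	return 0;
}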