We've got a bit of a problem with the sata_nv driver that I'm trying to
figure out a decent solution to (hence all the lists CCed). This is the
situation:
The nForce4 ADMA hardware has 2 modes: legacy mode, where it acts like a
normal ATA controller with 32-bit DMA limits, and ADMA mode where it can
access all of 64-bit memory. Each PCI device has 2 SATA ports, and the
legacy/ADMA mode can be controlled independently on both of them.
The trick is that if an ATAPI device is connected, we (as far as I'm
aware) can't use ADMA mode, so we have to switch that port into legacy
mode. This means it's only capable of 32-bit DMA. However the other port
on the controller may be connected to a hard drive and therefore still
capable of 64-bit DMA. (To make things more complicated, devices can be
hotplugged and so this can change dynamically.) Since the device that
libata is doing DMA mapping against is attached to the PCI device and
not the port, it creates a problem here. If we change the mask on one it
affects the other one as well.
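For illustration, the shared-mask constraint described above can be modelled in a few lines of userspace C. This is only a sketch, not sata_nv code: `port_model`, `port_dma_mask` and `shared_device_mask` are hypothetical names, and the model assumes each port's limit depends only on whether an ATAPI device is attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK_32BIT 0xffffffffULL
#define MASK_64BIT 0xffffffffffffffffULL

/* Hypothetical per-port state; not the sata_nv data structures. */
struct port_model {
	bool atapi_attached;	/* true if an ATAPI device sits on this port */
};

/* A port with an ATAPI device must run in legacy mode (32-bit DMA). */
static uint64_t port_dma_mask(const struct port_model *port)
{
	return port->atapi_attached ? MASK_32BIT : MASK_64BIT;
}

/*
 * Both ports map DMA through the one PCI device, so the only mask that is
 * safe for the device is the most restrictive one of its ports. ANDing the
 * masks keeps only the address bits that every port can drive.
 */
static uint64_t shared_device_mask(const struct port_model *ports, int nports)
{
	uint64_t mask = MASK_64BIT;
	int i;

	assert(nports > 0);
	for (i = 0; i < nports; i++)
		mask &= port_dma_mask(&ports[i]);
	return mask;
}
```

With one ATAPI port and one hard-drive port, `shared_device_mask()` yields the 32-bit mask even though `port_dma_mask()` for the hard-drive port alone would be 64-bit, which is the needless restriction being discussed.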
The original solution used by the driver was to leave the DMA mask at
64-bit and use blk_queue_bounce_limit to try to force the block layer
not to send any requests with DMA addresses over 4GB into the driver.
However it seems on x86_64 this doesn't work, since it pushes high
addresses through anyway and expects the IOMMU to take care of it (which
it doesn't because of the 64-bit mask).
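A rough userspace model of that failure, under the assumption that there are two independent stages (a block-layer bounce and a DMA-mapping remap) and that the first one is effectively skipped on x86_64 as reported above. All function names here are hypothetical and none of this is kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define LIMIT_32BIT 0xffffffffULL

/* True if any byte of [addr, addr + len) lies above limit. Overflow-safe. */
static bool exceeds_limit(uint64_t addr, uint64_t len, uint64_t limit)
{
	if (len == 0)
		return false;
	if (addr > limit)
		return true;
	return len - 1 > limit - addr;
}

/*
 * Returns true if a buffer above 4GB would reach a 32-bit-only port.
 * bounce_applied models whether the block layer honours the bounce limit;
 * device_mask models the DMA mask stored in the shared PCI device.
 */
static bool high_address_reaches_port(uint64_t addr, uint64_t len,
				      bool bounce_applied,
				      uint64_t bounce_limit,
				      uint64_t device_mask)
{
	/* Stage 1: the block layer copies the data below the bounce limit. */
	if (bounce_applied && exceeds_limit(addr, len, bounce_limit))
		return false;
	/* Stage 2: the mapping layer remaps what the device mask cannot reach. */
	if (exceeds_limit(addr, len, device_mask))
		return false;
	/* Neither stage intervened, so the port sees the original address. */
	return exceeds_limit(addr, len, LIMIT_32BIT);
}
```

In this model a buffer at 5GB with the bounce stage skipped and a 64-bit device mask reaches the port unchanged; with a 32-bit device mask the second stage catches it.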
The last solution I tried was to set the DMA mask on both ports to
32-bit on slave_configure when an ATAPI device is connected. However,
this runs into complications as well. That callback runs during
initialization, and when it tries to set the other port to 32-bit DMA,
that port may not be initialized yet. Plus, it forces the port with a
hard drive on it into
32-bit DMA needlessly.
The ideal solution would be to do mapping against a different struct
device for each port, so that we could maintain the proper DMA mask for
each of them at all times. However I'm not sure if that's possible. The
thought of using the SCSI struct device for DMA mapping was brought up
at one point. Any thoughts on that?
On Jan 29, 2008 11:08 AM, Robert Hancock <[email protected]> wrote:
...
> The last solution I tried was to set the DMA mask on both ports to
> 32-bit on slave_configure when an ATAPI device is connected. However,
> this runs into complications as well. This is run on initialization and
> when trying to set the other port into 32-bit DMA, it may not be
> initialized yet. Plus, it forces the port with a hard drive on it into
> 32-bit DMA needlessly.
Have you measured the impact of setting the PCI dma mask to 32-bit?
Last time Alex Williamson (HP) measured this on IA64, we deliberately
forced pci_map_sg() to use the IOMMU even for devices that were 64-bit
capable. We got 3-5% better throughput since the device had fewer
entries to retrieve and the devices (at the time) weren't that good at
processing SG lists.
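As a simplified illustration of why forcing the IOMMU can shrink the scatter-gather list: without remapping, only physically adjacent pages can share an entry, while an IOMMU can present scattered pages as one contiguous bus range. The sketch below is a userspace model with hypothetical names; `max_seg_size` is an assumed per-entry device limit, and real IOMMU merging has further constraints (segment boundaries, alignment) that are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the scatter-gather entries needed when pages are handed to the
 * device at their physical addresses: only physically adjacent pages can
 * share an entry.
 */
static size_t sg_entries_direct(const uint64_t *page_addrs, size_t npages,
				uint64_t page_size)
{
	size_t entries = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		if (i == 0 || page_addrs[i] != page_addrs[i - 1] + page_size)
			entries++;
	}
	return entries;
}

/*
 * Count the entries when an IOMMU remaps every page into one contiguous
 * bus range, split only by the device's (assumed) maximum segment size.
 */
static size_t sg_entries_iommu(size_t npages, uint64_t page_size,
			       uint64_t max_seg_size)
{
	uint64_t total = (uint64_t)npages * page_size;

	assert(max_seg_size > 0);
	return (size_t)((total + max_seg_size - 1) / max_seg_size);
}
```

For four 4KB pages of which only the first two are adjacent, the direct count is three entries, while the remapped count is one if the device accepts 64KB segments.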
>
> The ideal solution would be to do mapping against a different struct
> device for each port, so that we could maintain the proper DMA mask for
> each of them at all times. However I'm not sure if that's possible. The
> thought of using the SCSI struct device for DMA mapping was brought up
> at one point.. any thoughts on that?
I'm pretty sure that's not possible (using two PCI dev structs). I'm
skeptical it's worth converting DMA services to use SCSI devs since
that's an extremely invasive change for a marginal benefit.
hth,
grant
On Mon, Jan 28, 2008 at 06:08:44PM -0600, Robert Hancock wrote:
> The
> thought of using the SCSI struct device for DMA mapping was brought up
> at one point.. any thoughts on that?
I believe this will work on some architectures and not others.
Anything that uses include/asm-generic/dma-mapping.h will break, for
example. It would be nice for those architectures to get fixed ...
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
> The ideal solution would be to do mapping against a different struct
> device for each port, so that we could maintain the proper DMA mask for
> each of them at all times. However I'm not sure if that's possible.
I cannot imagine why it should be that difficult. The PCI subsystem
could offer a pci_clone_device() or similar function. For all complicated
purposes (sysfs etc.) the original device could be used, so it would
hopefully not be that difficult.
The alternative would be to add a new family of PCI mapping
functions that take an explicit mask. The disadvantage would be changing
all architectures, but on the other hand the interface could be phased
in one by one (and nF4 primarily only works on x86 anyway).
I suspect the latter would be a little cleaner, although the two don't
differ much.
-Andi
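To make the second alternative concrete, here is a userspace sketch of what a mapping call with an explicit mask might look like. Nothing like this exists in the kernel API as far as this thread says: `map_single_with_mask`, `mapping_model` and the `low_window` parameter are all hypothetical, and `low_window` merely stands in for whatever address the platform would hand out under the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PORT_MASK_32BIT 0xffffffffULL
#define PORT_MASK_64BIT 0xffffffffffffffffULL

/* Result of one modelled mapping. */
struct mapping_model {
	uint64_t bus_addr;	/* address the device would be given */
	bool remapped;		/* true if the buffer could not be used in place */
};

/*
 * Hypothetical "explicit mask" mapping call: the caller passes the mask for
 * this one transfer instead of relying on a mask stored in the device.
 */
static struct mapping_model map_single_with_mask(uint64_t phys, uint64_t len,
						 uint64_t mask,
						 uint64_t low_window)
{
	struct mapping_model m;

	assert(len > 0);
	/* The fallback window itself must be reachable under the mask. */
	assert(low_window <= mask && len - 1 <= mask - low_window);

	if (phys <= mask && len - 1 <= mask - phys) {
		m.bus_addr = phys;	/* whole buffer fits under the mask */
		m.remapped = false;
	} else {
		m.bus_addr = low_window;	/* bounce or IOMMU remap */
		m.remapped = true;
	}
	return m;
}
```

A driver in this model would pass `PORT_MASK_32BIT` for transfers on a legacy-mode port and `PORT_MASK_64BIT` for an ADMA port, so the two ports no longer constrain each other.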
On Tue, 2008-01-29 at 05:28 +0100, Andi Kleen wrote:
> > The ideal solution would be to do mapping against a different struct
> > device for each port, so that we could maintain the proper DMA mask for
> > each of them at all times. However I'm not sure if that's possible.
>
> I cannot imagine why it should be that difficult. The PCI subsystem
> could offer a pci_clone_device() or similar function. For all complicated
> purposes (sysfs etc) the original device could be used, so it would
> be hopefully not that difficult.
I know it works for parisc ... all we care about for DMA mapping is the
mask in the actual device and the location of the iommu. For the
latter, we just go up device->parent until we find it, so as long as
manufactured devices are properly parented we have no problems with
mapping them.
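The parent walk described above can be sketched in userspace C as follows. This is a model under stated assumptions, not the parisc implementation: `dev_model`, `iommu_model` and `find_iommu` are hypothetical names, and the only properties modelled are a per-device mask and a parent pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an IOMMU instance. */
struct iommu_model {
	int id;
};

/* Hypothetical stand-in for struct device: a mask and a parent link. */
struct dev_model {
	struct dev_model *parent;
	struct iommu_model *iommu;	/* non-NULL only on the device that owns one */
	uint64_t dma_mask;
};

/*
 * Walk up the parent chain until a device with an IOMMU is found.
 * Returns NULL if no ancestor (or the device itself) has one.
 */
static struct iommu_model *find_iommu(const struct dev_model *dev)
{
	for (; dev != NULL; dev = dev->parent) {
		if (dev->iommu != NULL)
			return dev->iommu;
	}
	return NULL;
}
```

Two manufactured per-port devices parented to the same PCI device would resolve to the same IOMMU through `find_iommu()` while each keeps its own `dma_mask`.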
The concern Matthew has is this code in asm-generic/dma-mapping.h:
static inline void *
dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
		   gfp_t flag)
{
	BUG_ON(dev->bus != &pci_bus_type);

	return pci_alloc_consistent(to_pci_dev(dev), size, dma_handle);
}
The manufactured devices wouldn't be PCI devices (otherwise they'd show
up in PCI and cause all sorts of confusion), so any architectures which
haven't converted to using the dma_ functions internally will BUG here.
However, a quick audit shows that to be just m68k, v850 and sparc (not
sparc64), so probably none of them are platforms the driver cares about.
> The alternative would be to add a new family of PCI mapping
> functions that take an explicit mask. Disadvantage would be changing
> all architectures, but on the other hand the interface could be phased
> in one by one (and nF4 primarily only works on x86 anyways)
I suppose it would allow us to clean dma_mask and dma_coherent_mask out
of the device structures ... on the other hand, the mask isn't simply
what the device wants, it's also what the platform allows you to set, so
it would have to be stored somewhere anyway.
> I suspect the latter would be a little cleaner, although they don't
> make much difference.
James
--- On Mon, 1/28/08, Robert Hancock <[email protected]> wrote:
> The trick is that if an ATAPI device is connected, we (as
> far as I'm
> aware) can't use ADMA mode, so we have to switch that
> port into legacy
> mode.
Can you double check this with the HW architect of the
HW DMA engine of the ASIC?
> This means it's only capable of 32-bit DMA.
> However the other port
> on the controller may be connected to a hard drive and
> therefore still
> capable of 64-bit DMA.
If this is indeed the case as you've presented it here,
it sounds like a HW shortcoming. I cannot see how the device
type (or protocol) dictates how the DMA engine operates.
They live in two different domains.
> The ideal solution would be to do mapping against a
> different struct
> device for each port, so that we could maintain the proper
> DMA mask for
> each of them at all times. However I'm not sure if
> that's possible. The
> thought of using the SCSI struct device for DMA mapping was
> brought up
> at one point.. any thoughts on that?
The object that a struct scsi_dev represents has nothing to do with
HW DMA engines.
It looks like your current solution is correct and
x86_64's blk_queue_bounce_limit needs work.
Luben
--- On Mon, 1/28/08, Andi Kleen <[email protected]> wrote:
> > The ideal solution would be to do mapping against a
> different struct
> > device for each port, so that we could maintain the
> proper DMA mask for
> > each of them at all times. However I'm not sure if
> that's possible.
>
> I cannot imagine why it should be that difficult. The PCI
> subsystem
> could offer a pci_clone_device() or similar function. For
> all complicated
> purposes (sysfs etc) the original device could be used, so
> it would
> be hopefully not that difficult.
>
> The alternative would be to add a new family of PCI mapping
> functions that take an explicit mask. Disadvantage would be
> changing
> all architectures, but on the other hand the interface
> could be phased
> in one by one (and nF4 primarily only works on x86 anyways)
>
> I suspect the latter would be a little cleaner, although
> they don't
> make much difference.
Yes, I guess, that's certainly doable.
The current PCI abstraction is clean: HW DMA engine(s) implementation
is a property of the PCI function.
Making the behaviour of the ASIC's HW DMA engine depend on the SCSI end
device, at the level of the PCI device abstraction, doesn't sound good.
(An extreme design is a single DMA engine servicing the ASIC.)
That said, the effect that Rob wants could be cleanly implemented
at a higher level (pci_map_sg() and such), or by fixing
blk_queue_bounce_limit() on x86_64 to that effect.
Luben
Luben Tuikov wrote:
> --- On Mon, 1/28/08, Robert Hancock <[email protected]> wrote:
>> The trick is that if an ATAPI device is connected, we (as
>> far as I'm
>> aware) can't use ADMA mode, so we have to switch that
>> port into legacy
>> mode.
>
> Can you double check this with the HW architect of the
> HW DMA engine of the ASIC?
Will do so. However, previous statements from NVIDIA fairly clearly
indicate that this is the case.
>
>> This means it's only capable of 32-bit DMA.
>> However the other port
>> on the controller may be connected to a hard drive and
>> therefore still
>> capable of 64-bit DMA.
>
> If this is indeed the case as you've presented it here,
> it sounds like a HW shortcoming. I cannot see how the device
> type (or protocol) dictate how the DMA engine operates.
> They live in two different domains.
Well, there is an indirect link. The ADMA interface (which supports
64-bit DMA) cannot be used to issue ATAPI commands, so if an ATAPI
device is connected we have to go to legacy mode, which supports only
32-bit DMA.
I'm not sure why ADMA mode doesn't support ATAPI. The only reason I can
think of is that there are issues, since ATAPI commands can potentially be
of unpredictable transfer size. The "real" ADMA spec that the NVIDIA
implementation is loosely based on does have some special "ignore
excess" controls that don't seem to be in the NVIDIA version (or at
least not to the knowledge I have on this hardware).
And yes, it is a rather unfortunate hardware shortcoming (presuming that
it is entirely true).
>
>> The ideal solution would be to do mapping against a
>> different struct
>> device for each port, so that we could maintain the proper
>> DMA mask for
>> each of them at all times. However I'm not sure if
>> that's possible. The
>> thought of using the SCSI struct device for DMA mapping was
>> brought up
>> at one point.. any thoughts on that?
>
> The reason for this is that the object that a struct scsi_dev
> represents has nothing to do with HW DMA engines.
>
> It looks like your current solution is correct and
> x86_64's blk_queue_bounce_limit needs work.
>
> Luben
>
>
Robert Hancock wrote:
> Luben Tuikov wrote:
>> --- On Mon, 1/28/08, Robert Hancock <[email protected]> wrote:
>>> The trick is that if an ATAPI device is connected, we (as
>>> far as I'm aware) can't use ADMA mode, so we have to switch that
>>> port into legacy mode.
>>
>> Can you double check this with the HW architect of the
>> HW DMA engine of the ASIC?
>
> Will do so. However, previous statements from NVIDIA fairly clearly
> indicate that this is the case.
>
>>
>>> This means it's only capable of 32-bit DMA.
>>> However the other port on the controller may be connected to a hard
>>> drive and therefore still capable of 64-bit DMA.
>>
>> If this is indeed the case as you've presented it here,
>> it sounds like a HW shortcoming. I cannot see how the device
>> type (or protocol) dictate how the DMA engine operates.
>> They live in two different domains.
>
> Well, there is an indirect link. The ADMA interface (which supports
> 64-bit DMA) cannot be used to issue ATAPI commands, so if an ATAPI
> device is connected we have to go to legacy mode, which supports only
> 32-bit DMA.
>
> I'm not sure why ADMA mode doesn't support ATAPI. The only reason I can
> think of is that there's issues since ATAPI commands can potentially be
> of unpredictable transfer size. The "real" ADMA spec that the NVIDIA
> implementation is loosely based on does have some special "ignore
> excess" controls that don't seem to be in the NVIDIA version (or at
> least not to the knowledge I have on this hardware).
..
The original Pacific Digital ADMA cores *do* support most ATAPI commands
in ADMA mode, including READ_CD, READ_10, etc., with the caveat that if
DSC completion state is required, the driver has to drop out of ADMA
and poll for it after the ADMA command completes.
Commands which were not ADMA compatible (eg. MODE_SENSE, TEST_UNIT_READY, ..)
were simply handled with PIO (in the driver) rather than any form of DMA,
which is okay because those commands are relatively infrequent.
Note that Pacific Digital "officially" said "no ATAPI" for the ADMA design,
but I implemented it regardless (for Linux) and it worked rather well.
We could burn DVDs and back-up to tape simultaneously, with the burner
and the tape unit sharing a single IDE cable/channel.
Cheers
Mark Lord wrote:
> ..
> Commands which were not ADMA compatible (eg. MODE_SENSE,
> TEST_UNIT_READY, ..)
> were simply handled with PIO (in the driver) rather than any form of DMA,
> which is okay because those commands are relatively infrequent.
..
A slight correction there: TEST_UNIT_READY was fine in ADMA mode as well.
Cheers
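The per-command split described in the two messages above could be sketched like this. It is a userspace illustration only, not the Pacific Digital driver: `choose_path` and the `xfer_path` enum are hypothetical, only the commands named in the thread are classified, and treating every other opcode as PIO is an assumption made here for safety rather than something stated in the thread. The opcode values are the standard SCSI ones.

```c
#include <stdint.h>

/* Standard SCSI opcode values for the commands named in the thread. */
#define OP_TEST_UNIT_READY	0x00
#define OP_MODE_SENSE		0x1a
#define OP_READ_10		0x28
#define OP_READ_CD		0xbe

/* Hypothetical transfer paths. */
enum xfer_path {
	PATH_ADMA,	/* issue through the ADMA engine */
	PATH_PIO	/* driver handles the command with PIO */
};

/* Pick a path for one ATAPI command, based on its first CDB byte. */
static enum xfer_path choose_path(uint8_t opcode)
{
	switch (opcode) {
	case OP_READ_CD:
	case OP_READ_10:
	case OP_TEST_UNIT_READY:	/* fine in ADMA mode per the correction */
		return PATH_ADMA;
	case OP_MODE_SENSE:		/* reported as not ADMA compatible */
		return PATH_PIO;
	default:
		/* Assumption: fall back to PIO for anything not listed. */
		return PATH_PIO;
	}
}
```

The DSC polling caveat mentioned above is not modelled; a real driver would also need to drop out of ADMA after completion for commands that require it.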
Mark Lord wrote:
> Robert Hancock wrote:
>> Luben Tuikov wrote:
>>> --- On Mon, 1/28/08, Robert Hancock <[email protected]> wrote:
>>>> The trick is that if an ATAPI device is connected, we (as
>>>> far as I'm aware) can't use ADMA mode, so we have to switch that
>>>> port into legacy mode.
>>>
>>> Can you double check this with the HW architect of the
>>> HW DMA engine of the ASIC?
>>
>> Will do so. However, previous statements from NVIDIA fairly clearly
>> indicate that this is the case.
>>
>>>
>>>> This means it's only capable of 32-bit DMA.
>>>> However the other port on the controller may be connected to a hard
>>>> drive and therefore still capable of 64-bit DMA.
>>>
>>> If this is indeed the case as you've presented it here,
>>> it sounds like a HW shortcoming. I cannot see how the device
>>> type (or protocol) dictate how the DMA engine operates.
>>> They live in two different domains.
>>
>> Well, there is an indirect link. The ADMA interface (which supports
>> 64-bit DMA) cannot be used to issue ATAPI commands, so if an ATAPI
>> device is connected we have to go to legacy mode, which supports only
>> 32-bit DMA.
>>
>> I'm not sure why ADMA mode doesn't support ATAPI. The only reason I
>> can think of is that there's issues since ATAPI commands can
>> potentially be of unpredictable transfer size. The "real" ADMA spec
>> that the NVIDIA implementation is loosely based on does have some
>> special "ignore excess" controls that don't seem to be in the NVIDIA
>> version (or at least not to the knowledge I have on this hardware).
> ..
>
> The original Pacific Digital ADMA cores *do* support most ATAPI commands
> in ADMA mode, including READ_CD, READ_10, etc.. With the caveat that if
> DSC completion state is required, the driver has to drop out of ADMA
> and poll for it after the ADMA command completes.
>
> Commands which were not ADMA compatible (eg. MODE_SENSE,
> TEST_UNIT_READY, ..)
> were simply handled with PIO (in the driver) rather than any form of DMA,
> which is okay because those commands are relatively infrequent.
>
> Note that Pacific Digital "officially" said "no ATAPI" for the ADMA design,
> but I implemented it regardless (for Linux) and it worked rather well.
> We could burn DVDs and back-up to tape simultaneously, with the burner
> and the tape unit sharing a single IDE cable/channel.
I'm told that the ADMA hardware does have some support for issuing ATAPI
commands, however according to Allen Martin of NVIDIA, "The ATAPI
support in the ADMA hardware had some serious problem that forced us to
turn it off in the Windows driver." So it looks like ATAPI in ADMA mode
is likely a non-starter.
On Tue, Jan 29, 2008 at 02:09:52PM -0800, Luben Tuikov wrote:
> > The ideal solution would be to do mapping against a
> > different struct
> > device for each port, so that we could maintain the proper
> > DMA mask for
> > each of them at all times. However I'm not sure if
> > that's possible. The
> > thought of using the SCSI struct device for DMA mapping was
> > brought up
> > at one point.. any thoughts on that?
>
> The reason for this is that the object that a struct scsi_dev
> represents has nothing to do with HW DMA engines.
It really would work, once the few remaining architectures move away
from asserting that the 'struct device' passed in is a pci device.
It seems like the best way forward to me.