Quadlet reads to memory above 4GB are painfully slow when serviced
by the AR DMA context. In addition, the CPU(s) may be locked up,
preventing any transfer at all.
Write the PhyUpperBound register with the end-of-memory value. If
end-of-memory is beyond the OHCI limit of 0x0000ffff00000000,
clamp to that value.
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/firewire/ohci.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 044ace3..b4135a5 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -2249,6 +2249,7 @@ static int ohci_enable(struct fw_card *card,
 	struct pci_dev *dev = to_pci_dev(card->device);
 	u32 lps, version, irqs;
 	int i, ret;
+	u32 phys_upper;

 	if (software_reset(ohci)) {
 		dev_err(card->device, "failed to reset ohci card\n");
@@ -2323,7 +2324,10 @@ static int ohci_enable(struct fw_card *card,
 	reg_write(ohci, OHCI1394_FairnessControl, 0);
 	card->priority_budget_implemented = ohci->pri_req_max != 0;

-	reg_write(ohci, OHCI1394_PhyUpperBound, 0x00010000);
+	phys_upper = min(0xffff0000ULL,
+			 (dma_get_required_mask(card->device) >> 16) + 1);
+	reg_write(ohci, OHCI1394_PhyUpperBound, max(phys_upper, 0x00010000U));
+
 	reg_write(ohci, OHCI1394_IntEventClear, ~0);
 	reg_write(ohci, OHCI1394_IntMaskClear, ~0);

--
1.8.1.2
Peter Hurley wrote:
> Quadlet reads to memory above 4GB are painfully slow when serviced
> by the AR DMA context. In addition, the CPU(s) may be locked up,
> preventing any transfer at all.
Using physical DMA prevents the use of that address space for software
address handlers, so you have to adjust the low_memory_region start in
core-transaction.c.
> Write the PhyUpperBound register with the end-of-memory value. If
> end-of-memory is beyond the OHCI limit of 0x0000ffff00000000,
> clamp to that value.
You will have to lower this limit; there are protocols that assume that
addresses like 0xecc000000000 are available for software.
Regards,
Clemens
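To make the concern above concrete, here is a minimal user-space sketch of how an incoming request offset compares against the programmed bound. It assumes only what the thread states: node offsets are 48 bits wide and PhyUpperBound holds the upper 32 of them (so the patch's old value 0x00010000 means 4 GB). The helper name claimed_by_physical_dma() is made up for illustration, and the sketch ignores the physical request filters, which also have to admit the requesting node.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1394 node offsets are 48 bits wide. */
#define OFFSET_MASK 0xffffffffffffULL

/*
 * Hypothetical helper: returns true if a request to `offset` lies below
 * the boundary encoded by a PhyUpperBound value, i.e. would be answered
 * by the physical response unit instead of reaching software through
 * the AR context.
 */
static bool claimed_by_physical_dma(uint64_t offset, uint32_t phy_upper_bound)
{
	/* The register stores bits 47..16 of the boundary. */
	uint64_t limit = (uint64_t)phy_upper_bound << 16;

	return (offset & OFFSET_MASK) < limit;
}
```

With the proposed maximum of 0xffff0000, an offset such as 0xecc000000000 falls below the boundary and never reaches a software address handler; with the old value 0x00010000 it does.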
On Tue, 2013-03-26 at 17:12 +0100, Clemens Ladisch wrote:
> Peter Hurley wrote:
> > Quadlet reads to memory above 4GB are painfully slow when serviced
> > by the AR DMA context. In addition, the CPU(s) may be locked up,
> > preventing any transfer at all.
>
> Using physical DMA prevents the use of that address space for software
> address handlers, so you have to adjust the low_memory_region start in
> core-transaction.c.
Right, thanks for pointing that out.
> > Write the PhyUpperBound register with the end-of-memory value. If
> > end-of-memory is beyond the OHCI limit of 0x0000ffff00000000,
> > clamp to that value.
>
> You will have to lower this limit; there are protocols that assume that
> addresses like 0xecc000000000 are available for software.
Maybe 0x0000e80000000000 is a better maximum upper bound?
Regards,
Peter Hurley
Peter Hurley wrote:
> On Tue, 2013-03-26 at 17:12 +0100, Clemens Ladisch wrote:
>> Peter Hurley wrote:
>>> Write the PhyUpperBound register with the end-of-memory value. If
>>> end-of-memory is beyond the OHCI limit of 0x0000ffff00000000,
>>> clamp to that value.
>>
>> You will have to lower this limit; there are protocols that assume that
>> addresses like 0xecc000000000 are available for software.
>
> Maybe 0x0000e80000000000 is a better maximum upper bound?
Why the space of 0x04c000000000? ;-)
There's also 0xc007dedadada (although this address isn't used on the PC
side). Maybe we should just round down to 0x800000000000 and revisit
the question when there are 256 TB memory machines. (Directly accessing
the full 256 TB will not be possible in any case.)
Regards,
Clemens
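The clamping discussed above could look roughly like the following stand-alone sketch. It is not the driver code: phy_upper_bound_for() and the two macros are hypothetical names, the floor of 4 GB mirrors the register value 0x00010000 from the original patch, and the ceiling of 0x800000000000 is the rounded-down limit suggested in this message.

```c
#include <stdint.h>

#define PHYS_BOUND_MIN 0x000100000000ULL	/* 4 GB, the previous fixed boundary */
#define PHYS_BOUND_MAX 0x800000000000ULL	/* ceiling suggested in the thread */

/*
 * Hypothetical helper: turn an end-of-memory byte address into a
 * PhyUpperBound register value (bits 47..16 of the boundary).
 */
static uint32_t phy_upper_bound_for(uint64_t end_of_memory)
{
	uint64_t bound = end_of_memory;

	/* Clamp first so the round-up below cannot overflow. */
	if (bound < PHYS_BOUND_MIN)
		bound = PHYS_BOUND_MIN;
	if (bound > PHYS_BOUND_MAX)
		bound = PHYS_BOUND_MAX;

	/* The register has 64 KiB granularity; round up to cover all memory. */
	bound = (bound + 0xffffULL) & ~0xffffULL;

	return (uint32_t)(bound >> 16);
}
```

For a machine with 10 GB of memory this yields 0x00028000; anything at or beyond the ceiling yields 0x80000000, which keeps offsets like 0xc007dedadada and 0xecc000000000 available to software.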
On Mar 26 Peter Hurley wrote:
> Quadlet reads to memory above 4GB are painfully slow when serviced
> by the AR DMA context. In addition, the CPU(s) may be locked up,
> preventing any transfer at all.
>
> Write the PhyUpperBound register with the end-of-memory value. If
> end-of-memory is beyond the OHCI limit of 0x0000ffff00000000,
> clamp to that value.
>
> Signed-off-by: Peter Hurley <[email protected]>
> ---
> drivers/firewire/ohci.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
> index 044ace3..b4135a5 100644
> --- a/drivers/firewire/ohci.c
> +++ b/drivers/firewire/ohci.c
> @@ -2249,6 +2249,7 @@ static int ohci_enable(struct fw_card *card,
> struct pci_dev *dev = to_pci_dev(card->device);
> u32 lps, version, irqs;
> int i, ret;
> + u32 phys_upper;
>
> if (software_reset(ohci)) {
> dev_err(card->device, "failed to reset ohci card\n");
> @@ -2323,7 +2324,10 @@ static int ohci_enable(struct fw_card *card,
> reg_write(ohci, OHCI1394_FairnessControl, 0);
> card->priority_budget_implemented = ohci->pri_req_max != 0;
>
> - reg_write(ohci, OHCI1394_PhyUpperBound, 0x00010000);
> + phys_upper = min(0xffff0000ULL,
> + (dma_get_required_mask(card->device) >> 16) + 1);
> + reg_write(ohci, OHCI1394_PhyUpperBound, max(phys_upper, 0x00010000U));
> +
> reg_write(ohci, OHCI1394_IntEventClear, ~0);
> reg_write(ohci, OHCI1394_IntMaskClear, ~0);
>
What Clemens said.
Also: By far most OHCI-1394 chips do not implement PhyUpperBound,
i.e. ignore any writes to PhyUpperBound, return 0 when PhyUpperBound is
read, and keep the boundary between physical response and AR response at
4 GB, as described in the spec.
It has been a long time though since I last checked whether PhyUpperBound
is implemented; maybe it has become more widespread than it was back then.
Or maybe it hasn't: All OHCI-1394 chips that ever came to market are 32
bit chips anyway. So the few rare ones that do support PhyUpperBound
larger than 4 GB cannot in fact use it.
Or am I severely behind the times about this?
--
Stefan Richter
-=====-===-= --== ==-=-
http://arcgraph.de/sr/
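The readback behaviour described above suggests a simple probe: write the wanted bound, read it back, and fall back to 4 GB if the register reads as 0. The sketch below models that against a fake register so it can run in user space; struct fake_ohci and all function names are invented for illustration and are not part of firewire-ohci.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a controller's PhyUpperBound register. */
struct fake_ohci {
	bool implements_phy_upper_bound;
	uint32_t phy_upper_bound;
};

static void write_phy_upper_bound(struct fake_ohci *c, uint32_t val)
{
	/* Chips without the register silently ignore the write. */
	if (c->implements_phy_upper_bound)
		c->phy_upper_bound = val;
}

static uint32_t read_phy_upper_bound(const struct fake_ohci *c)
{
	/* Chips without the register always read back 0. */
	return c->implements_phy_upper_bound ? c->phy_upper_bound : 0;
}

/*
 * Try to program `wanted` and return the physical-range limit, in bytes,
 * that is actually in effect afterwards.
 */
static uint64_t effective_phys_limit(struct fake_ohci *c, uint32_t wanted)
{
	uint32_t got;

	write_phy_upper_bound(c, wanted);
	got = read_phy_upper_bound(c);

	if (got == 0)
		return 0x100000000ULL;	/* unimplemented: boundary stays at 4 GB */

	return (uint64_t)got << 16;
}
```

A driver that also reserves address space for software handlers would presumably want to base that reservation on the effective limit, not on the value it tried to write.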
On Tue, 2013-03-26 at 19:56 +0100, Stefan Richter wrote:
> On Mar 26 Peter Hurley wrote:
> > Quadlet reads to memory above 4GB are painfully slow when serviced
> > by the AR DMA context. In addition, the CPU(s) may be locked up,
> > preventing any transfer at all.
> >
> > Write the PhyUpperBound register with the end-of-memory value. If
> > end-of-memory is beyond the OHCI limit of 0x0000ffff00000000,
> > clamp to that value.
> >
> > Signed-off-by: Peter Hurley <[email protected]>
> > ---
> > drivers/firewire/ohci.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
> > index 044ace3..b4135a5 100644
> > --- a/drivers/firewire/ohci.c
> > +++ b/drivers/firewire/ohci.c
> > @@ -2249,6 +2249,7 @@ static int ohci_enable(struct fw_card *card,
> > struct pci_dev *dev = to_pci_dev(card->device);
> > u32 lps, version, irqs;
> > int i, ret;
> > + u32 phys_upper;
> >
> > if (software_reset(ohci)) {
> > dev_err(card->device, "failed to reset ohci card\n");
> > @@ -2323,7 +2324,10 @@ static int ohci_enable(struct fw_card *card,
> > reg_write(ohci, OHCI1394_FairnessControl, 0);
> > card->priority_budget_implemented = ohci->pri_req_max != 0;
> >
> > - reg_write(ohci, OHCI1394_PhyUpperBound, 0x00010000);
> > + phys_upper = min(0xffff0000ULL,
> > + (dma_get_required_mask(card->device) >> 16) + 1);
> > + reg_write(ohci, OHCI1394_PhyUpperBound, max(phys_upper, 0x00010000U));
> > +
> > reg_write(ohci, OHCI1394_IntEventClear, ~0);
> > reg_write(ohci, OHCI1394_IntMaskClear, ~0);
> >
>
> What Clemens said.
That's what I'm testing right now for v2 -- a fixed phys-AR boundary of
0x800000000000ULL.
> Also: By far most OHCI-1394 chips do not implement PhyUpperBound,
> i.e. ignore any writes to PhyUpperBound, return 0 when PhyUpperBound is
> read, and keep the boundary between physical response and AR response at
> 4 GB, as described in the spec.
>
> It has been a long time though since I last checked whether PhyUpperBound
> is implemented; maybe it has become more widespread than it was back then.
>
> Or maybe it hasn't: All OHCI-1394 chips that ever came to market are 32
> bit chips anyway. So the few rare ones that do support PhyUpperBound
> larger than 4 GB cannot in fact use it.
>
> Or am I severely behind the times about this?
The FW643e-2 is natively PCIe (not behind a bridge) and supports phys
DMA past 4GB (the datasheet says all 48 bits but I can only test it out
to 10GB).
I thought the FW643e was as well? You'll have to test that out :)
Regards,
Peter Hurley
On Mar 26 Peter Hurley wrote:
> On Tue, 2013-03-26 at 19:56 +0100, Stefan Richter wrote:
> > It has been a long time though since I last checked whether PhyUpperBound
> > is implemented; maybe it has become more widespread than it was back then.
> >
> > Or maybe it hasn't: All OHCI-1394 chips that ever came to market are 32
> > bit chips anyway. So the few rare ones that do support PhyUpperBound
> > larger than 4 GB cannot in fact use it.
> >
> > Or am I severely behind the times about this?
>
> The FW643e-2 is natively PCIe (not behind a bridge) and supports phys
> DMA past 4GB (the datasheet says all 48 bits but I can only test it out
> to 10GB).
>
> I thought the FW643e was as well? You'll have to test that out :)
OK, will do.
--
Stefan Richter
-=====-===-= --== ==-=-
http://arcgraph.de/sr/
On Mar 26 Stefan Richter wrote:
> On Mar 26 Peter Hurley wrote:
> > On Tue, 2013-03-26 at 19:56 +0100, Stefan Richter wrote:
> > > It has been a long time though since I last checked whether PhyUpperBound
> > > is implemented; maybe it has become more widespread than it was back then.
> > >
> > > Or maybe it hasn't: All OHCI-1394 chips that ever came to market are 32
> > > bit chips anyway. So the few rare ones that do support PhyUpperBound
> > > larger than 4 GB cannot in fact use it.
> > >
> > > Or am I severely behind the times about this?
> >
> > The FW643e-2 is natively PCIe (not behind a bridge) and supports phys
> > DMA past 4GB (the datasheet says all 48 bits but I can only test it out
> > to 10GB).
> >
> > I thought the FW643e was as well? You'll have to test that out :)
>
> OK, will do.
Does lspci or something similar show which PCI devices are capable of 64 bit
wide addressing?
--
Stefan Richter
-=====-===-= --== ===-=
http://arcgraph.de/sr/
On Fri, 2013-03-29 at 11:44 +0100, Stefan Richter wrote:
> On Mar 26 Stefan Richter wrote:
> > On Mar 26 Peter Hurley wrote:
> > > On Tue, 2013-03-26 at 19:56 +0100, Stefan Richter wrote:
> > > > It has been a long time though since I last checked whether PhyUpperBound
> > > > is implemented; maybe it has become more widespread than it was back then.
> > > >
> > > > Or maybe it hasn't: All OHCI-1394 chips that ever came to market are 32
> > > > bit chips anyway. So the few rare ones that do support PhyUpperBound
> > > > larger than 4 GB cannot in fact use it.
> > > >
> > > > Or am I severely behind the times about this?
> > >
> > > The FW643e-2 is natively PCIe (not behind a bridge) and supports phys
> > > DMA past 4GB (the datasheet says all 48 bits but I can only test it out
> > > to 10GB).
> > >
> > > I thought the FW643e was as well? You'll have to test that out :)
> >
> > OK, will do.
>
> Does lspci or something similar show which PCI devices are capable of 64 bit
> wide addressing?
Not definitively.
Usually (but not always), if the host registers are 64-bit addressable,
then the device supports DAC. But that's not something that can be
relied on programmatically.
For example, lspci on FW643e-2:
06:00.0 FireWire (IEEE 1394): LSI Corporation FW643 [TrueFire] PCIe 1394b Controller (rev 08) (prog-if 10 [OHCI])
Subsystem: LSI Corporation FW643 [TrueFire] PCIe 1394b Controller
Flags: bus master, fast devsel, latency 0, IRQ 89
==> Memory at dbeff000 (64-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>
Kernel driver in use: firewire_ohci
Kernel modules: firewire-ohci
By contrast, lspci on FW323 on same machine:
07:06.0 FireWire (IEEE 1394): LSI Corporation FW322/323 [TrueFire] 1394a Controller (rev 70) (prog-if 10 [OHCI])
Subsystem: Dell Device 5811
Flags: bus master, medium devsel, latency 64, IRQ 30
==> Memory at dbbff000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>
Kernel driver in use: firewire_ohci
Kernel modules: firewire-ohci
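For reference, the "(64-bit, non-prefetchable)" and "(32-bit, non-prefetchable)" annotations marked above come from the type bits in the low dword of the memory BAR. A minimal sketch of that decoding follows; the helper names are made up, and the raw BAR values in the usage note are reconstructed from the lspci output above, not read from hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a BAR is 0 for memory space, 1 for I/O space. */
static bool bar_is_memory(uint32_t bar_lo)
{
	return (bar_lo & 0x1) == 0;
}

/* Bits 2:1 of a memory BAR give its type; 10b means 64-bit decoding. */
static bool bar_is_64bit(uint32_t bar_lo)
{
	return bar_is_memory(bar_lo) && ((bar_lo >> 1) & 0x3) == 0x2;
}

/* Bit 3 of a memory BAR marks the region as prefetchable. */
static bool bar_is_prefetchable(uint32_t bar_lo)
{
	return bar_is_memory(bar_lo) && (bar_lo & 0x8) != 0;
}
```

Under that assumption the FW643e-2 BAR would read 0xdbeff004 (64-bit) and the FW323 BAR 0xdbbff000 (32-bit). As noted above, this only describes how the device's own registers are decoded and is not a reliable indicator of whether the device can generate 64-bit DMA addresses.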
Peter Hurley wrote:
> On Fri, 2013-03-29 at 11:44 +0100, Stefan Richter wrote:
>>> On Mar 26 Peter Hurley wrote:
>>>> The FW643e-2 is natively PCIe (not behind a bridge) and supports phys
>>>> DMA past 4GB (the datasheet says all 48 bits but I can only test it out
>>>> to 10GB).
>>>>
>>>> I thought the FW643e was as well? You'll have to test that out :)
>>
>> Does lspci or something similar show which PCI devices are capable of 64 bit
>> wide addressing?
>
> Not definitively.
>
> Usually (but not always), if the host registers are 64-bit addressable,
> then the device supports DAC.
DACs are a feature of conventional PCI.
All PCI Express devices are 64-bit addressable. In the lspci output, PCIe
devices have the "Express" capability.
However, whether a device can *generate* 64-bit DMA requests is completely
device-specific.
Regards,
Clemens
On Fri, 2013-03-29 at 12:19 +0100, Clemens Ladisch wrote:
> Peter Hurley wrote:
> > On Fri, 2013-03-29 at 11:44 +0100, Stefan Richter wrote:
> >>> On Mar 26 Peter Hurley wrote:
> >>>> The FW643e-2 is natively PCIe (not behind a bridge) and supports phys
> >>>> DMA past 4GB (the datasheet says all 48 bits but I can only test it out
> >>>> to 10GB).
> >>>>
> >>>> I thought the FW643e was as well? You'll have to test that out :)
> >>
> >> Does lspci or something similar show which PCI devices are capable of 64 bit
> >> wide addressing?
> >
> > Not definitively.
> >
> > Usually (but not always), if the host registers are 64-bit addressable,
> > then the device supports DAC.
>
> DACs are a feature of conventional PCI.
>
> All PCI Express devices are 64-bit addressable. In the lspci output, PCIe
> devices have the "Express" capability.
The "device" could be sitting behind a PCIe-PCI bridge that _only_
supports the host window for 64-bit. All other 64-bit decodes could
return garbage.
> However, whether a device can *generate* 64-bit DMA requests is completely
> device-specific.