2007-11-12 23:14:48

by Robert Hancock

Subject: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
on systems with memory located above 4GB. We need to make sure that the legacy
PRD table and padding buffer are appropriately allocated according to the
DMA mask requirements of the current operating mode (ADMA or legacy).

Also, we should run any DMA command with result taskfile requested in ADMA mode
when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine
in ADMA mode which is not allowed.

Fixes Red Hat Bugzilla #351451: https://bugzilla.redhat.com/show_bug.cgi?id=351451

Signed-off-by: Robert Hancock <[email protected]>

--- linux-2.6.24-rc1-git10/drivers/ata/sata_nv.c 2007-11-01 20:01:32.000000000 -0600
+++ linux-2.6.24-rc1-git10edit/drivers/ata/sata_nv.c 2007-11-10 19:57:47.000000000 -0600
@@ -247,6 +247,7 @@
void __iomem *ctl_block;
void __iomem *gen_block;
void __iomem *notifier_clear_block;
+ u64 adma_dma_mask;
u8 flags;
int last_issue_ncq;
};
@@ -747,11 +748,29 @@
on the port. */
adma_enable = 0;
nv_adma_register_mode(ap);
+ if (!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
+ /* Transitioning to legacy mode. Free the pad buffer. */
+ ata_pad_free(ap, ap->host->dev);
+ ap->pad = NULL;
+ ap->pad_dma = 0;
+ }
} else {
- bounce_limit = *ap->dev->dma_mask;
+ bounce_limit = pp->adma_dma_mask;
segment_boundary = NV_ADMA_DMA_BOUNDARY;
sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
adma_enable = 1;
+
+ if (pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) {
+ /* Transitioning to ADMA mode. Free legacy PRD table
+ and the pad buffer. */
+ ata_pad_free(ap, ap->host->dev);
+ ap->pad = NULL;
+ ap->pad_dma = 0;
+ dmam_free_coherent(ap->host->dev, ATA_PRD_TBL_SZ,
+ ap->prd, ap->prd_dma);
+ ap->prd = NULL;
+ ap->prd_dma = 0;
+ }
}

pci_read_config_dword(pdev, NV_MCP_SATA_CFG_20, &current_reg);
@@ -763,23 +782,45 @@
config_mask = NV_MCP_SATA_CFG_20_PORT0_EN |
NV_MCP_SATA_CFG_20_PORT0_PWB_EN;

+ /* Set appropriate DMA mask. */
+ pci_set_dma_mask(pdev, bounce_limit);
+ pci_set_consistent_dma_mask(pdev, bounce_limit);
+
+ blk_queue_bounce_limit(sdev->request_queue, bounce_limit);
+ blk_queue_segment_boundary(sdev->request_queue, segment_boundary);
+ blk_queue_max_hw_segments(sdev->request_queue, sg_tablesize);
+ ata_port_printk(ap, KERN_INFO,
+ "bounce limit 0x%llX, segment boundary 0x%lX, hw segs %hu\n",
+ (unsigned long long)bounce_limit, segment_boundary,
+ sg_tablesize);
+
if (adma_enable) {
new_reg = current_reg | config_mask;
- pp->flags &= ~NV_ADMA_ATAPI_SETUP_COMPLETE;
+ if (pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) {
+ /* Transition to ADMA mode.
+ Reallocate the pad buffer. */
+ rc = ata_pad_alloc(ap, ap->host->dev);
+ pp->flags &= ~NV_ADMA_ATAPI_SETUP_COMPLETE;
+ }
} else {
new_reg = current_reg & ~config_mask;
- pp->flags |= NV_ADMA_ATAPI_SETUP_COMPLETE;
+ if (!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
+ /* Transition to legacy mode.
+ Reallocate the legacy PRD and pad buffer. */
+ ap->prd = dmam_alloc_coherent(ap->host->dev,
+ ATA_PRD_TBL_SZ, &ap->prd_dma, GFP_KERNEL);
+ if (!ap->prd)
+ rc = -ENOMEM;
+ else
+ rc = ata_pad_alloc(ap, ap->host->dev);
+
+ pp->flags |= NV_ADMA_ATAPI_SETUP_COMPLETE;
+ }
}

if (current_reg != new_reg)
pci_write_config_dword(pdev, NV_MCP_SATA_CFG_20, new_reg);

- blk_queue_bounce_limit(sdev->request_queue, bounce_limit);
- blk_queue_segment_boundary(sdev->request_queue, segment_boundary);
- blk_queue_max_hw_segments(sdev->request_queue, sg_tablesize);
- ata_port_printk(ap, KERN_INFO,
- "bounce limit 0x%llX, segment boundary 0x%lX, hw segs %hu\n",
- (unsigned long long)bounce_limit, segment_boundary, sg_tablesize);
return rc;
}

@@ -791,11 +832,13 @@

static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
{
- /* Since commands where a result TF is requested are not
- executed in ADMA mode, the only time this function will be called
- in ADMA mode will be if a command fails. In this case we
- don't care about going into register mode with ADMA commands
- pending, as the commands will all shortly be aborted anyway. */
+ /* Other than when internal or pass-through commands are executed,
+ the only time this function will be called in ADMA mode will be
+ if a command fails. In the failure case we don't care about going
+ into register mode with ADMA commands pending, as the commands will
+ all shortly be aborted anyway. We assume that NCQ commands are not
+ issued via passthrough and so this will not abort any commands in
+ that case. */
nv_adma_register_mode(ap);

ata_tf_read(ap, tf);
@@ -1136,7 +1179,9 @@

VPRINTK("ENTER\n");

- rc = ata_port_start(ap);
+ /* Do not allocate standard ATA PRD buffer here. Only do this if
+ an ATAPI device is connected (in slave_config). */
+ rc = ata_pad_alloc(ap, dev);
if (rc)
return rc;

@@ -1150,6 +1195,7 @@
pp->gen_block = ap->host->iomap[NV_MMIO_BAR] + NV_ADMA_GEN;
pp->notifier_clear_block = pp->gen_block +
NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
+ pp->adma_dma_mask = *dev->dma_mask & dev->coherent_dma_mask;

mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
&mem_dma, GFP_KERNEL);
@@ -1359,11 +1405,9 @@
struct nv_adma_port_priv *pp = qc->ap->private_data;

/* ADMA engine can only be used for non-ATAPI DMA commands,
- or interrupt-driven no-data commands, where a result taskfile
- is not required. */
+ or interrupt-driven no-data commands. */
if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
- (qc->tf.flags & ATA_TFLAG_POLLING) ||
- (qc->flags & ATA_QCFLAG_RESULT_TF))
+ (qc->tf.flags & ATA_TFLAG_POLLING))
return 1;

if ((qc->flags & ATA_QCFLAG_DMAMAP) ||


2007-11-13 02:26:19

by Tejun Heo

Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

Hello, Robert.

Robert Hancock wrote:
> @@ -747,11 +748,29 @@
> on the port. */
> adma_enable = 0;
> nv_adma_register_mode(ap);
> + if (!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
> + /* Transitioning to legacy mode. Free the pad buffer. */
> + ata_pad_free(ap, ap->host->dev);
> + ap->pad = NULL;
> + ap->pad_dma = 0;
> + }
> } else {
> - bounce_limit = *ap->dev->dma_mask;
> + bounce_limit = pp->adma_dma_mask;
> segment_boundary = NV_ADMA_DMA_BOUNDARY;
> sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
> adma_enable = 1;
> +
> + if (pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) {
> + /* Transitioning to ADMA mode. Free legacy PRD table
> + and the pad buffer. */
> + ata_pad_free(ap, ap->host->dev);
> + ap->pad = NULL;
> + ap->pad_dma = 0;
> + dmam_free_coherent(ap->host->dev, ATA_PRD_TBL_SZ,
> + ap->prd, ap->prd_dma);
> + ap->prd = NULL;
> + ap->prd_dma = 0;
> + }

How about always initializing the DMA mask to ATA_DMA_MASK regardless of
ADMA mode, so that the PRD and pad buffers are always accessible in
register mode, and just raising the PCI DMA mask and queue bounce limit
if ADMA mode is active?

> + /* Set appropriate DMA mask. */
> + pci_set_dma_mask(pdev, bounce_limit);
> + pci_set_consistent_dma_mask(pdev, bounce_limit);

These can fail.
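
As a rough illustration of that point, here is a minimal userspace sketch
of checking both return values and undoing the first change if the second
one fails. Everything named fake_* and set_both_masks() is hypothetical and
only stands in for pci_set_dma_mask() / pci_set_consistent_dma_mask(); the
"supported mask" test is a toy model, not how the kernel decides.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI device; not a kernel structure. */
struct fake_dev {
	uint64_t supported_mask;	/* widest mask the toy platform accepts */
	uint64_t dma_mask;		/* streaming DMA mask in effect */
	uint64_t coherent_mask;		/* coherent DMA mask in effect */
};

/* Toy model of pci_set_dma_mask(): 0 on success, -EIO if unsupported. */
static int fake_set_dma_mask(struct fake_dev *d, uint64_t mask)
{
	if (mask & ~d->supported_mask)
		return -EIO;
	d->dma_mask = mask;
	return 0;
}

/* Toy model of pci_set_consistent_dma_mask(). */
static int fake_set_coherent_mask(struct fake_dev *d, uint64_t mask)
{
	if (mask & ~d->supported_mask)
		return -EIO;
	d->coherent_mask = mask;
	return 0;
}

/* Set both masks, propagating the first error instead of ignoring it. */
static int set_both_masks(struct fake_dev *d, uint64_t mask)
{
	uint64_t old = d->dma_mask;
	int rc;

	rc = fake_set_dma_mask(d, mask);
	if (rc)
		return rc;

	rc = fake_set_coherent_mask(d, mask);
	if (rc) {
		/* Roll back so the two masks never disagree. */
		d->dma_mask = old;
		return rc;
	}

	assert(d->dma_mask == d->coherent_mask);
	return 0;
}
```

A caller along the lines of slave_config would then return rc to its
caller rather than carrying on with a mask that was never applied.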

Also, please separate out the result TF handling to a separate patch. I
know it's a small change, but as both introduce important behavior
changes, I think it would be nice to have a bisection point in between.

Thanks.

--
tejun

2007-11-13 04:27:29

by Robert Hancock

Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

Tejun Heo wrote:
> How about always initialize DMA mask to ATA_DMA_MASK regardless of ADMA
> mode such that PRD and PAD buffers are always accessible by register
> mode and just raising PCI dma mask and queue bounce limit if ADMA mode
> is active?

Could be done.. but, I don't want to constrain the ADMA APRD/CPB area in
that way (there are some dual-socket Opteron boxes with this controller,
forcing an allocation below 4GB for this could force a non-optimal node
allocation I think..) To do this I'd have to raise the mask for the APRD
allocation, drop it again, then raise it again in ADMA mode, which is
kind of ugly.

Also, I'd rather not allocate the legacy PRD at all if we're in ADMA
mode. That way, if some bug causes us to try and do legacy DMA in ADMA
mode, we'll crash from null pointer dereference instead of potentially
transferring incorrect data (as we had in this case) and corrupting things.

>
>> + /* Set appropriate DMA mask. */
>> + pci_set_dma_mask(pdev, bounce_limit);
>> + pci_set_consistent_dma_mask(pdev, bounce_limit);
>
> These can fail.

Yes, it should likely do something with these return values. Though
theoretically it shouldn't fail, since the DMA mask is either 32-bit,
which shouldn't fail, or one that was successfully set before. Also I
don't think the SCSI layer actually checks the slave_config return
value.. sigh.

>
> Also, please separate out the result TF handling to a separate patch. I
> know it's a small change but as both introduces important behavior
> changes, I think it would be nice to have a bisection point inbetween.

Could do. That change would have to come first though, as the change to
not allocate the PRD except when necessary would cause some cases that
might have worked before to blow up.

>
> Thanks.
>

2007-11-13 04:42:59

by Tejun Heo

Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

Robert Hancock wrote:
> Tejun Heo wrote:
>> How about always initialize DMA mask to ATA_DMA_MASK regardless of ADMA
>> mode such that PRD and PAD buffers are always accessible by register
>> mode and just raising PCI dma mask and queue bounce limit if ADMA mode
>> is active?
>
> Could be done.. but, I don't want to constrain the ADMA APRD/CPB area in
> that way (there are some dual-socket Opteron boxes with this controller,
> forcing an allocation below 4GB for this could force a non-optimal node
> allocation I think..) To do this I'd have to raise the mask for the APRD
> allocation, drop it again, then raise it again in ADMA mode, which is
> kind of ugly.

I don't think it really matters. The table isn't too big and it's not
like access to the table has any processor locality. Maybe it's better
to allocate to the same node as the irq but raising DMA mask doesn't
help at all.

I think the performance impact is nil either way, but even in the highly
unlikely case that it has any impact, allocating PRDs under 4G should be
better as it avoids DAC cycles on the bus. But again, this is just
irrelevant.

I'd say just allocate everything under 4G.

> Also, I'd rather not allocate the legacy PRD at all if we're in ADMA
> mode. That way, if some bug causes us to try and do legacy DMA in ADMA
> mode, we'll crash from null pointer dereference instead of potentially
> transferring incorrect data (as we had in this case) and corrupting things.

Yeap, I can agree with this. But can you add BUG_ON()/WARN_ON() at
places instead? I know blanking pointers feel safer but I think it's
best to keep resource allocation / release in ->port_start/stop().

>>> + /* Set appropriate DMA mask. */
>>> + pci_set_dma_mask(pdev, bounce_limit);
>>> + pci_set_consistent_dma_mask(pdev, bounce_limit);
>>
>> These can fail.
>
> Yes, it should likely do something with these return values. Though
> theoretically it shouldn't fail, since the DMA mask is either 32-bit,
> which shouldn't fail, or one that was successfully set before. Also I
> don't think the SCSI layer actually checks the slave_config return
> value.. sigh.

Then please at least add WARN_ON() && another reason why allocating /
deallocating resources from ->slave_config isn't such a good idea.

>> Also, please separate out the result TF handling to a separate patch. I
>> know it's a small change but as both introduces important behavior
>> changes, I think it would be nice to have a bisection point inbetween.
>
> Could do. That change would have to come first though, as the change to
> not allocate the PRD except when necessary would cause some cases there
> to blow up when before they might have worked in some cases.

Yes, please.

Thanks.

--
tejun

2007-11-13 14:13:08

by Mark Lord

Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

Tejun Heo wrote:
> Robert Hancock wrote:
>> Tejun Heo wrote:
..
>> Yes, it should likely do something with these return values. Though
>> theoretically it shouldn't fail, since the DMA mask is either 32-bit,
>> which shouldn't fail, or one that was successfully set before. Also I
>> don't think the SCSI layer actually checks the slave_config return
>> value.. sigh.
>
> Then please at least add WARN_ON() && another reason why allocating /
> deallocating resources from ->slave_config isn't such a good idea.
..

The entire point of "slave_configure" is to provide a point for the LLD
to do per-device data structure allocation/init.

And yes, SCSI does check the return code. Whether the code around that check
is buggy or not is another question, but it's always worked for me.

> if (sdev->host->hostt->slave_configure) {
> int ret = sdev->host->hostt->slave_configure(sdev);
> if (ret) {
> /*
> * if LLDD reports slave not present, don't clutter
> * console with alloc failure messages
> */
> if (ret != -ENXIO) {
> sdev_printk(KERN_ERR, sdev,
> "failed to configure device\n");
> }
> return SCSI_SCAN_NO_RESPONSE;
> }
> }


Cheers

2007-11-14 01:58:39

by Tejun Heo

Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

Mark Lord wrote:
> Tejun Heo wrote:
>> Robert Hancock wrote:
>>> Tejun Heo wrote:
> ..
>>> Yes, it should likely do something with these return values. Though
>>> theoretically it shouldn't fail, since the DMA mask is either 32-bit,
>>> which shouldn't fail, or one that was successfully set before. Also I
>>> don't think the SCSI layer actually checks the slave_config return
>>> value.. sigh.
>>
>> Then please at least add WARN_ON() && another reason why allocating /
>> deallocating resources from ->slave_config isn't such a good idea.
> ..
>
> The entire point of "slave_configure" is to provide a point for the LLD
> to do per-device data structure allocation/init.
>
> And yes, SCSI does check the return code. Whether the code around that
> check
> is buggy or not is another question, but it's always worked for me.

I see but I still prefer having PRD, pad buf allocation/release in
->port_start/stop() primarily for consistency. Robert, what do you think?

--
tejun

2007-11-14 04:12:32

by Robert Hancock

Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

Tejun Heo wrote:
>> Could be done.. but, I don't want to constrain the ADMA APRD/CPB area in
>> that way (there are some dual-socket Opteron boxes with this controller,
>> forcing an allocation below 4GB for this could force a non-optimal node
>> allocation I think..) To do this I'd have to raise the mask for the APRD
>> allocation, drop it again, then raise it again in ADMA mode, which is
>> kind of ugly.
>
> I don't think it really matters. The table isn't too big and it's not
> like access to the table has any processor locality. Maybe it's better
> to allocate to the same node as the irq but raising DMA mask doesn't
> help at all.

It's quite possible that restricting the DMA mask will also restrict
which node the table can get allocated on. I'm not so much thinking of
the CPU access to the table but of the controller banging on the thing
several times for each command..

>
> I think performance impact is nil either way but even in highly unlikely
> case it has any impact, allocating PRDs under 4G should be better as it
> avoids DAC cycles on the bus. But again, this is just irrelevant.
>
> I'd say just allocate everything under 4G.

The DAC issue shouldn't matter as these controllers are integrated into
the chipset so it will be using all HT bus transactions, not PCI.

We can do it without all that mess in slave_config though, just by
delaying raising the DMA mask until after the PRD/pad buffers are allocated.
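
The ordering could look roughly like the userspace model below: allocate
the legacy PRD and pad buffers while the mask is still 32-bit, then raise
the mask before allocating the ADMA area. The allocator, the struct names
and MASK_WIDE are all made up for illustration (the toy allocator simply
prefers memory above 4GB whenever the mask allows it); this is not the
driver code.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_32BIT	0xffffffffULL
#define MASK_WIDE	0xffffffffffffffffULL	/* hypothetical ADMA mask */

/* Hypothetical device with a toy two-zone allocator. */
struct fake_dev {
	uint64_t coherent_mask;	/* mask in effect for coherent allocations */
	uint64_t next_low;	/* next free bus address below 4GB */
	uint64_t next_high;	/* next free bus address above 4GB */
};

/* Hands out high memory whenever the mask allows it, else low memory. */
static uint64_t fake_alloc_coherent(struct fake_dev *d, uint64_t size)
{
	uint64_t addr;

	if (d->coherent_mask > MASK_32BIT) {
		addr = d->next_high;
		d->next_high += size;
	} else {
		addr = d->next_low;
		d->next_low += size;
	}
	/* An allocation must be addressable under the current mask. */
	assert(addr <= d->coherent_mask);
	return addr;
}

struct fake_port {
	uint64_t prd_dma;	/* legacy PRD table, must stay below 4GB */
	uint64_t pad_dma;	/* pad buffer, must stay below 4GB */
	uint64_t adma_dma;	/* ADMA CPB/APRD area, may be anywhere */
};

static void fake_port_start(struct fake_dev *d, struct fake_port *p,
			    uint64_t adma_mask)
{
	/* Step 1: legacy buffers are allocated under the 32-bit mask. */
	d->coherent_mask = MASK_32BIT;
	p->prd_dma = fake_alloc_coherent(d, 4096);
	p->pad_dma = fake_alloc_coherent(d, 4096);

	/* Step 2: only now raise the mask for the ADMA-only area. */
	d->coherent_mask = adma_mask;
	p->adma_dma = fake_alloc_coherent(d, 4096);
}
```

With that ordering the legacy engine can always reach the PRD and pad
buffers, and the ADMA area is not forced below 4GB.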

>
>> Also, I'd rather not allocate the legacy PRD at all if we're in ADMA
>> mode. That way, if some bug causes us to try and do legacy DMA in ADMA
>> mode, we'll crash from null pointer dereference instead of potentially
>> transferring incorrect data (as we had in this case) and corrupting things.
>
> Yeap, I can agree with this. But can you add BUG_ON()/WARN_ON() at
> places instead? I know blanking pointers feel safer but I think it's
> best to keep resource allocation / release in ->port_start/stop().

Yeah, I've got rid of that stuff now and added some BUG_ONs for this.
Will submit the patches shortly.
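
For illustration only, the kind of check being discussed could be modeled
in userspace like this, with assert() standing in for BUG_ON(). The struct,
the flag value and legacy_prd() are hypothetical; the one thing taken from
the patch is that NV_ADMA_ATAPI_SETUP_COMPLETE being set means the port is
in legacy (register) mode.

```c
#include <assert.h>
#include <stddef.h>

/* Models NV_ADMA_ATAPI_SETUP_COMPLETE; the value here is arbitrary. */
#define FLAG_ATAPI_SETUP_COMPLETE	0x1u

/* Hypothetical per-port state; not the driver's real structures. */
struct fake_port {
	unsigned int flags;
	void *prd;		/* legacy PRD table */
};

/* Flag set means ADMA is disabled and the legacy engine is in use. */
static int port_in_legacy_mode(const struct fake_port *p)
{
	return (p->flags & FLAG_ATAPI_SETUP_COMPLETE) != 0;
}

/*
 * Accessor for the legacy PRD table. Using it while the port is in ADMA
 * mode is a driver bug, so fail loudly instead of silently moving data
 * with the wrong DMA engine.
 */
static void *legacy_prd(const struct fake_port *p)
{
	assert(port_in_legacy_mode(p));	/* stand-in for BUG_ON() */
	return p->prd;
}
```

This keeps allocation and release in port_start/stop while still catching
a legacy DMA attempt made in ADMA mode.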