2010-08-15 00:01:10

by Grant Grundler

[permalink] [raw]
Subject: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)

Attached patch enables my x86 machine to recognize and talk to a
"Native 4K" SATA device.

When I started working on this, I didn't know Matthew Wilcox had
posted a similar patch 2 years ago:
http://git.kernel.org/?p=linux/kernel/git/willy/ata.git;a=shortlog;h=refs/heads/ata-large-sectors

Gwendal Grignou pointed me at the the above code and small portions of
this patch include Matthew's work. That's why Mathew is first on the
"Signed-off-by:". I've NOT included his use of a bitmap to determine
512 vs Native for ATA command block size - just used a simple table.
And bugs are almost certainly mine.

Lastly, the patch has been tested with a native 4K 'Engineering
Sample' drive provided by Hitachi GST.

Signed-off-by: Matthew Wilcox <[email protected]>
Signed-off-by: Grant Grundler <[email protected]>
Reviewed-by: Gwendal Grignou <[email protected]>

-----

Gwendal's review of this patch had him concerned about access to
"dev->sdev->sector_size" may have risks (e.g. sdev goes away before
dev does). I didn't like adding additional fields to ata_dev that
duplicate what scsi is doing and this seems to be working so far. But
since I don't know how dev->sdev is protected/syncronized (despite how
libata and scsi are joined at the hip), I hope someone else can
comment on if my usage of dev->sdev->sector_size is a problem and why.

I believe use of ata_id_logical_per_physical_sectors() was wrong. I've
replaced it with ata_id_log2_per_physical_sector() and cleaned up how
block size word and alignment are referenced. It's alot easier to read
where those functions are used now.


Here is the kernel output from hotplugging the SATA drive on my
machine (using Marvell 7042 controller):

[ 2468.301218] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4010000
action 0xe frozen
[ 2468.301225] ata4: edma_err_cause=00000010 pp_flags=00000000, dev connect
[ 2468.301240] ata4: SError: { PHYRdyChg DevExch }
[ 2468.301257] ata4: hard resetting link
[ 2471.701117] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2471.740390] ata4.00: HPA detected: current 9770670, native 78165360
[ 2471.740399] ata4.00: ATA-7: Hitachi HTSxxxxxxxxxxxx, yyyyyyyy, max UDMA/100
[ 2471.740402] ata4.00: 9770670 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 2471.764465] ata4.00: configured for UDMA/100
[ 2471.764501] ata4: EH complete
[ 2471.764988] scsi 4:0:0:0: Direct-Access ATA Hitachi
HTSxxxxxxxxxxxx PQ: 0 ANSI: 5
[ 2471.782914] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.787093] sd 4:0:0:0: [sdd] Write Protect is off
[ 2471.787120] sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 2471.788768] sd 4:0:0:0: [sdd] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 2471.792293] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.792834] sd 4:0:0:0: Attached scsi generic sg3 type 0
[ 2471.799370] sdd: unknown partition table
[ 2471.845582] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.853763] sd 4:0:0:0: [sdd] Attached SCSI disk

I was able to dd to/from the device and pull SMART data (not shown and
probably not able to share.)

If someone has "Advanced Format Developers Kit" SATA drives from
idema.org, I'd appreciate any test reports.

thanks
grant

[ patch NOT inlined because gmail will word-wrap and make a mess of
it. Please use attached filed.]


Attachments:
2.6.35-libata_4k_block_support-01 (8.38 kB)

2010-08-15 04:37:20

by Matthew Wilcox

[permalink] [raw]
Subject: RE: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)

If you will insist on sending to my Outlook address, you'll get replies in standard broken Outlook format. You've been warned.

---

+/* BUG: Big endian systems need accesses to "id" wrapped with le16_to_cpu().
+ */

No, it's already been converted. See ata_dev_read_id().

---

I'm not sure about your use of a switch to set the sector size. Have you checked the code that GCC generates for this?

---

All the places you dereference dev->sdev are within a callchain from ->queuecommand; sdev can't possibly go away.
It'd be clearer that this is the case if you used scmd->device->sector_size instead of dev->sdev->sector_size.

---

@@ -516,7 +515,7 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
memset(scsi_cmd, 0, sizeof(scsi_cmd));

if (args[3]) {
- argsize = SECTOR_SIZE * args[3];
+ argsize = ATA_SECT_SIZE * args[3];
argbuf = kmalloc(argsize, GFP_KERNEL);
if (argbuf == NULL) {
rc = -ENOMEM;

I think this is wrong. The ioctl does PIO Data-in; as such, it should use the native sector size, not 512.
That said, there's a possibility of data corruption for programs which use this ioctl, assuming a 512-byte sector size when it's natively 4k.
This one's tricky and needs serious thought. I might error it if sector_size isn't 512 bytes :-)
It's a legacy ioctl anyway, right?

---

I also remember Alan pointing out that some ATA controllers don't support non-512-byte commands, and we should have a 'safe for large sectors' flag set by the controller.

-----Original Message-----
From: Grant Grundler [mailto:[email protected]]
Sent: Saturday, August 14, 2010 5:01 PM
To: Jeff Garzik; Tejun Heo
Cc: Linux IDE; Wilcox, Matthew R; Gwendal Grignou; LKML
Subject: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)

Attached patch enables my x86 machine to recognize and talk to a "Native 4K" SATA device.

When I started working on this, I didn't know Matthew Wilcox had posted a similar patch 2 years ago:
http://git.kernel.org/?p=linux/kernel/git/willy/ata.git;a=shortlog;h=refs/heads/ata-large-sectors

Gwendal Grignou pointed me at the the above code and small portions of this patch include Matthew's work. That's why Mathew is first on the "Signed-off-by:". I've NOT included his use of a bitmap to determine
512 vs Native for ATA command block size - just used a simple table.
And bugs are almost certainly mine.

Lastly, the patch has been tested with a native 4K 'Engineering Sample' drive provided by Hitachi GST.

Signed-off-by: Matthew Wilcox <[email protected]>
Signed-off-by: Grant Grundler <[email protected]>
Reviewed-by: Gwendal Grignou <[email protected]>

-----

Gwendal's review of this patch had him concerned about access to "dev->sdev->sector_size" may have risks (e.g. sdev goes away before dev does). I didn't like adding additional fields to ata_dev that duplicate what scsi is doing and this seems to be working so far. But since I don't know how dev->sdev is protected/syncronized (despite how libata and scsi are joined at the hip), I hope someone else can comment on if my usage of dev->sdev->sector_size is a problem and why.

I believe use of ata_id_logical_per_physical_sectors() was wrong. I've replaced it with ata_id_log2_per_physical_sector() and cleaned up how block size word and alignment are referenced. It's alot easier to read where those functions are used now.


Here is the kernel output from hotplugging the SATA drive on my machine (using Marvell 7042 controller):

[ 2468.301218] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen [ 2468.301225] ata4: edma_err_cause=00000010 pp_flags=00000000, dev connect [ 2468.301240] ata4: SError: { PHYRdyChg DevExch } [ 2468.301257] ata4: hard resetting link [ 2471.701117] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 2471.740390] ata4.00: HPA detected: current 9770670, native 78165360 [ 2471.740399] ata4.00: ATA-7: Hitachi HTSxxxxxxxxxxxx, yyyyyyyy, max UDMA/100 [ 2471.740402] ata4.00: 9770670 sectors, multi 0: LBA48 NCQ (depth 31/32) [ 2471.764465] ata4.00: configured for UDMA/100 [ 2471.764501] ata4: EH complete
[ 2471.764988] scsi 4:0:0:0: Direct-Access ATA Hitachi
HTSxxxxxxxxxxxx PQ: 0 ANSI: 5
[ 2471.782914] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.787093] sd 4:0:0:0: [sdd] Write Protect is off [ 2471.787120] sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00 [ 2471.788768] sd 4:0:0:0: [sdd] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 2471.792293] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.792834] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 2471.799370] sdd: unknown partition table [ 2471.845582] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.853763] sd 4:0:0:0: [sdd] Attached SCSI disk

I was able to dd to/from the device and pull SMART data (not shown and probably not able to share.)

If someone has "Advanced Format Developers Kit" SATA drives from idema.org, I'd appreciate any test reports.

thanks
grant

[ patch NOT inlined because gmail will word-wrap and make a mess of it. Please use attached filed.]
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2010-08-16 19:17:34

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)

On Sat, Aug 14, 2010 at 9:37 PM, Wilcox, Matthew R
<[email protected]> wrote:
> If you will insist on sending to my Outlook address, you'll get replies in standard broken Outlook format.  You've been warned.

You trumped my Gmail warning. I fold. :)

>
> ---
>
> +/* BUG: Big endian systems need accesses to "id" wrapped with le16_to_cpu().
> + */
>
> No, it's already been converted.  See ata_dev_read_id().

Ah - good. I'll remove the comment.

>
> I'm not sure about your use of a switch to set the sector size.  Have you checked the code that GCC generates for this?

The switch probably sucks unless we could weight the order of the
tests. E.g. common cases first. But it's just an implementation detail
that is relatively easy to replace with the bitmap you had implemented
before.

>
> All the places you dereference dev->sdev are within a callchain from ->queuecommand; sdev can't possibly go away.
> It'd be clearer that this is the case if you used scmd->device->sector_size instead of dev->sdev->sector_size.

Thank you - that looks much better to me too.

> --
>
> @@ -516,7 +515,7 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
>        memset(scsi_cmd, 0, sizeof(scsi_cmd));
>
>        if (args[3]) {
> -               argsize = SECTOR_SIZE * args[3];
> +               argsize = ATA_SECT_SIZE * args[3];
>                argbuf = kmalloc(argsize, GFP_KERNEL);
>                if (argbuf == NULL) {
>                        rc = -ENOMEM;
>
> I think this is wrong.  The ioctl does PIO Data-in; as such, it should use the native sector size, not 512.
> That said, there's a possibility of data corruption for programs which use this ioctl, assuming a 512-byte sector size when it's natively 4k.
> This one's tricky and needs serious thought.  I might error it if sector_size isn't 512 bytes :-)
> It's a legacy ioctl anyway, right?

I have no idea. If it's tricky, I probably have it wrong.
Anyone else have guidance here?


> ---
>
> I also remember Alan pointing out that some ATA controllers don't support non-512-byte commands, and we should have a 'safe for large sectors' flag set by the controller.

This could also be added as a separate patch. I didn't run into this
problem so I can't say which controllers would have problems.

thanks for the helpful feedback!
I'll post a V2 later today.

thanks,
grant

>
> -----Original Message-----
> From: Grant Grundler [mailto:[email protected]]
> Sent: Saturday, August 14, 2010 5:01 PM
> To: Jeff Garzik; Tejun Heo
> Cc: Linux IDE; Wilcox, Matthew R; Gwendal Grignou; LKML
> Subject: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)
>
> Attached patch enables my x86 machine to recognize and talk to a "Native 4K" SATA device.
>
> When I started working on this, I didn't know Matthew Wilcox had posted a similar patch 2 years ago:
>   http://git.kernel.org/?p=linux/kernel/git/willy/ata.git;a=shortlog;h=refs/heads/ata-large-sectors
>
> Gwendal Grignou pointed me at the the above code and small portions of this patch include Matthew's work. That's why Mathew is first on the "Signed-off-by:". I've NOT included his use of a bitmap to determine
> 512 vs Native for ATA command block size - just used a simple table.
> And bugs are almost certainly mine.
>
> Lastly, the patch has been tested with a native 4K 'Engineering Sample' drive provided by Hitachi GST.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Grant Grundler <[email protected]>
> Reviewed-by: Gwendal Grignou <[email protected]>
>
> -----
>
> Gwendal's review of this patch had him concerned about access to "dev->sdev->sector_size" may have risks (e.g. sdev goes away before dev does).  I didn't like adding additional fields to ata_dev that duplicate what scsi is doing and this seems to be working so far. But since I don't know how dev->sdev is protected/syncronized (despite how libata and scsi are joined at the hip), I hope someone else can comment on if my usage of dev->sdev->sector_size is a problem and why.
>
> I believe use of ata_id_logical_per_physical_sectors() was wrong. I've replaced it with ata_id_log2_per_physical_sector() and cleaned up how block size word and alignment are referenced. It's alot easier to read where those functions are used now.
>
>
> Here is the kernel output from hotplugging the SATA drive on my machine (using Marvell 7042 controller):
>
> [ 2468.301218] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen [ 2468.301225] ata4: edma_err_cause=00000010 pp_flags=00000000, dev connect [ 2468.301240] ata4: SError: { PHYRdyChg DevExch } [ 2468.301257] ata4: hard resetting link [ 2471.701117] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 2471.740390] ata4.00: HPA detected: current 9770670, native 78165360 [ 2471.740399] ata4.00: ATA-7: Hitachi HTSxxxxxxxxxxxx, yyyyyyyy, max UDMA/100 [ 2471.740402] ata4.00: 9770670 sectors, multi 0: LBA48 NCQ (depth 31/32) [ 2471.764465] ata4.00: configured for UDMA/100 [ 2471.764501] ata4: EH complete
> [ 2471.764988] scsi 4:0:0:0: Direct-Access     ATA      Hitachi
> HTSxxxxxxxxxxxx PQ: 0 ANSI: 5
> [ 2471.782914] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
> (40.0 GB/37.2 GiB)
> [ 2471.787093] sd 4:0:0:0: [sdd] Write Protect is off [ 2471.787120] sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00 [ 2471.788768] sd 4:0:0:0: [sdd] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [ 2471.792293] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
> (40.0 GB/37.2 GiB)
> [ 2471.792834] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 2471.799370]  sdd: unknown partition table [ 2471.845582] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
> (40.0 GB/37.2 GiB)
> [ 2471.853763] sd 4:0:0:0: [sdd] Attached SCSI disk
>
> I was able to dd to/from the device and pull SMART data (not shown and probably not able to share.)
>
> If someone has "Advanced Format Developers Kit" SATA drives from idema.org, I'd appreciate any test reports.
>
> thanks
> grant
>
> [ patch NOT inlined because gmail will word-wrap and make a mess of it. Please use attached filed.]
>

2010-08-16 19:41:16

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)

On 08/16/2010 03:17 PM, Grant Grundler wrote:
> On Sat, Aug 14, 2010 at 9:37 PM, Wilcox, Matthew R
> <[email protected]> wrote:
>> If you will insist on sending to my Outlook address, you'll get replies in standard broken Outlook format. You've been warned.
>
> You trumped my Gmail warning. I fold. :)
>
>>
>> ---
>>
>> +/* BUG: Big endian systems need accesses to "id" wrapped with le16_to_cpu().
>> + */
>>
>> No, it's already been converted. See ata_dev_read_id().
>
> Ah - good. I'll remove the comment.
>
>>
>> I'm not sure about your use of a switch to set the sector size. Have you checked the code that GCC generates for this?
>
> The switch probably sucks unless we could weight the order of the
> tests. E.g. common cases first. But it's just an implementation detail
> that is relatively easy to replace with the bitmap you had implemented
> before.
>
>>
>> All the places you dereference dev->sdev are within a callchain from ->queuecommand; sdev can't possibly go away.
>> It'd be clearer that this is the case if you used scmd->device->sector_size instead of dev->sdev->sector_size.
>
> Thank you - that looks much better to me too.
>
>> --
>>
>> @@ -516,7 +515,7 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
>> memset(scsi_cmd, 0, sizeof(scsi_cmd));
>>
>> if (args[3]) {
>> - argsize = SECTOR_SIZE * args[3];
>> + argsize = ATA_SECT_SIZE * args[3];
>> argbuf = kmalloc(argsize, GFP_KERNEL);
>> if (argbuf == NULL) {
>> rc = -ENOMEM;
>>
>> I think this is wrong. The ioctl does PIO Data-in; as such, it should use the native sector size, not 512.
>> That said, there's a possibility of data corruption for programs which use this ioctl, assuming a 512-byte sector size when it's natively 4k.
>> This one's tricky and needs serious thought. I might error it if sector_size isn't 512 bytes :-)
>> It's a legacy ioctl anyway, right?
>
> I have no idea. If it's tricky, I probably have it wrong.
> Anyone else have guidance here?

The main question is whether the size of a DRQ block changes, when LBA
logical size changes? I need to review the ATA8 specs in this area, but
I would think some interfaces that return 512-byte pages for things like
SMART info would be unchanged. How do the drives behave for
PIO-Data-{In,Out} commands that are not reading/writing user data, but
rather drive metadata?

Jeff


2010-08-16 20:24:22

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)

On Mon, Aug 16, 2010 at 12:41 PM, Jeff Garzik <[email protected]> wrote:
...
> The main question is whether the size of a DRQ block changes, when LBA
> logical size changes?

Don't we already know that depends on the ATA command?

> I need to review the ATA8 specs in this area, but I
> would think some interfaces that return 512-byte pages for things like SMART
> info would be unchanged.  How do the drives behave for PIO-Data-{In,Out}
> commands that are not reading/writing user data, but rather drive metadata?

SMART Info/Logs are meta data and as you noted unchanged.

Two other meta-data PIO commands I can think of:
o FW update: I'm not going to risk bricking this drive to find that out.
o Vendor specific selftest/diagnostic commands (I don't know if any
exist - just a guess)

If someone from Hitachi cares to respond publicly regarding the
behavior of this "Engineering Sample", that would be appreciated and
helpful. But feedback from any other vendor would be welcome/useful
also.

I suspect legacy tools might be needed to update firmware and those
might continue to function if DRQ block size is still 512 byte. I'd
like to leave this hardcoded to ATA_SECT_SIZE until someone can
demonstrate a need to change it. Ok?

thanks,
grant

2010-08-17 15:26:15

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)

Hello,

On 08/16/2010 10:24 PM, Grant Grundler wrote:
> I suspect legacy tools might be needed to update firmware and those
> might continue to function if DRQ block size is still 512 byte. I'd
> like to leave this hardcoded to ATA_SECT_SIZE until someone can
> demonstrate a need to change it. Ok?

I agree it'll be best to leave it 512bytes. We'll just cause more
confusion by overriding the sector size there. Another possible
approach would be simply failing the ioctl if logical sector size
isn't 512 byte. The ioctl should probably be deprecated one day
anyway.

Thanks.

--
tejun