2008-07-29 18:26:25

by Matthew Wilcox

Subject: Re: tools support for non-512 byte sector sizes

On Tue, Jul 29, 2008 at 01:24:31PM -0400, Ric Wheeler wrote:
> Jim pinged me about the use case for having our tool chain (parted
> specifically) support devices with non-512 bytes sectors.

Matt Domsch spoke with me about this at OLS. I took that opportunity,
and I'll take this one, to pimp my ata-ram driver which allows you to
alter the sector size to whatever you want:

http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=ata-ram

I'll admit to having not tested it with anything other than 512, but it
ought to support 4096 byte sectors just fine. I haven't looked at what
would be required to support 520-byte sectors.

Jeff, any interest in merging ata-ram soon? I've got some users inside
Intel, and Zab persuaded me to add the multiple port support last night,
so it's not just useful for me. I think it's also a nice template to
have around to show how to write a minimal libata driver.

> Off the top of my head, the following is the list of existing or soon to
> appear devices that could use this (taken from our thread last year, at
> http://kerneltrap.org/mailarchive/linux-fsdevel/2007/3/11/317183):
>
> (1) R/W optical drives
> (2) S390 dasd devices have a 4096 byte sector
> (3) new 4096 byte disks (which intend to export a virtual 512 byte
> sector)
>
> Anything else pop into mind? Any idea if SSD or FLASH devices have
> thoughts to use a non-512 byte sector size? What tools are most critical
> to support?
>
> Thanks!
>
> Ric

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."


2008-07-29 18:38:13

by James Bottomley

Subject: Re: tools support for non-512 byte sector sizes

On Tue, 2008-07-29 at 12:26 -0600, Matthew Wilcox wrote:
> On Tue, Jul 29, 2008 at 01:24:31PM -0400, Ric Wheeler wrote:
> > Jim pinged me about the use case for having our tool chain (parted
> > specifically) support devices with non-512 bytes sectors.
>
> Matt Domsch spoke with me about this at OLS. I took that opportunity,
> and I'll take this one, to pimp my ata-ram driver which allows you to
> alter the sector size to whatever you want:
>
> http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=ata-ram
>
> I'll admit to having not tested it with anything other than 512, but it
> ought to support 4096 byte sectors just fine. I haven't looked at what
> would be required to support 520-byte sectors.

scsi_debug does exactly the same thing, so it reports anything you tell
it (Martin Petersen actually added this so he could test with 4k
sectors).

The problem, which ata_ram also suffers, is that the tools we most need
to test are the ones for manipulating non volatile characteristics (like
partition tables). We'd really like the disk contents to survive reboot
for this ...

James

2008-07-29 18:42:57

by Matthew Wilcox

Subject: Re: tools support for non-512 byte sector sizes

On Tue, Jul 29, 2008 at 01:37:25PM -0500, James Bottomley wrote:
> scsi_debug does exactly the same thing, so it reports anything you tell
> it (Martin Petersen actually added this so he could test with 4k
> sectors).
>
> The problem, which ata_ram also suffers, is that the tools we most need
> to test are the ones for manipulating non volatile characteristics (like
> partition tables). We'd really like the disk contents to survive reboot
> for this ...

Ummm... _reboot_, or _module unload/reload_? I could certainly include
an option to populate the ramdisc from a file. Is the ioctl to re-read
the partition table not enough?

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-29 18:43:30

by Martin K. Petersen

Subject: Re: tools support for non-512 byte sector sizes

>>>>> "Matthew" == Matthew Wilcox <[email protected]> writes:

Matthew> I'll admit to having not tested it with anything other than
Matthew> 512, but it ought to support 4096 byte sectors just fine. I
Matthew> haven't looked at what would be required to support 520-byte
Matthew> sectors.

I recently added support for multiple sector sizes to scsi_debug. On a
recent kernel you can modprobe scsi_debug sector_size=4096.

I have only tested 4KB but it also supports 1 and 2KB.

--
Martin K. Petersen Oracle Linux Engineering

2008-07-29 18:45:24

by James Bottomley

Subject: Re: tools support for non-512 byte sector sizes

On Tue, 2008-07-29 at 12:42 -0600, Matthew Wilcox wrote:
> On Tue, Jul 29, 2008 at 01:37:25PM -0500, James Bottomley wrote:
> > scsi_debug does exactly the same thing, so it reports anything you tell
> > it (Martin Petersen actually added this so he could test with 4k
> > sectors).
> >
> > The problem, which ata_ram also suffers, is that the tools we most need
> > to test are the ones for manipulating non volatile characteristics (like
> > partition tables). We'd really like the disk contents to survive reboot
> > for this ...
>
> Ummm... _reboot_, or _module unload/reload_? I could certainly include
> an option to populate the ramdisc from a file. Is the ioctl to re-read
> the partition table not enough?

reboot ... we'd like to take the tools through shutdown restart testing
to make sure they're all working ... of course, then there's the
bios ...

James

2008-07-29 18:50:30

by Matthew Wilcox

Subject: Re: tools support for non-512 byte sector sizes

On Tue, Jul 29, 2008 at 01:44:38PM -0500, James Bottomley wrote:
> On Tue, 2008-07-29 at 12:42 -0600, Matthew Wilcox wrote:
> > On Tue, Jul 29, 2008 at 01:37:25PM -0500, James Bottomley wrote:
> > > scsi_debug does exactly the same thing, so it reports anything you tell
> > > it (Martin Petersen actually added this so he could test with 4k
> > > sectors).
> > >
> > > The problem, which ata_ram also suffers, is that the tools we most need
> > > to test are the ones for manipulating non volatile characteristics (like
> > > partition tables). We'd really like the disk contents to survive reboot
> > > for this ...
> >
> > Ummm... _reboot_, or _module unload/reload_? I could certainly include
> > an option to populate the ramdisc from a file. Is the ioctl to re-read
> > the partition table not enough?
>
> reboot ... we'd like to take the tools through shutdown restart testing
> to make sure they're all working ... of course, then there's the
> bios ...

It's not up to us to fix the BIOS.

Since the vast majority of users use a distro, and the vast majority of
distros use a fully modular kernel, wouldn't initialising the contents
of ata-ram from the initrd/initramfs solve the problem?

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-07-29 18:52:31

by Martin K. Petersen

Subject: Re: tools support for non-512 byte sector sizes

>>>>> "James" == James Bottomley <[email protected]> writes:

James> The problem, which ata_ram also suffers, is that the tools we
James> most need to test are the ones for manipulating non volatile
James> characteristics (like partition tables). We'd really like the
James> disk contents to survive reboot for this ...

Yeah, I should add that I wanted persistence too. I went through a
whole stack (well, 5-6 or so) of fibre channel drives from various
vendors and attempted to low-level format them to 4KB sectors. Most
of them laughed in my face. One of them tried to comply and
irreparably confused its firmware in the process.

Just yesterday I received a couple of prototype drives in the mail.
I'll ask the vendor whether they support 4KB and if so I'll give them
a whirl.

--
Martin K. Petersen Oracle Linux Engineering

2008-07-29 18:55:04

by Ric Wheeler

Subject: Re: tools support for non-512 byte sector sizes

Martin K. Petersen wrote:
>>>>>> "James" == James Bottomley <[email protected]> writes:
>>>>>>
>
> James> The problem, which ata_ram also suffers, is that the tools we
> James> most need to test are the ones for manipulating non volatile
> James> characteristics (like partition tables). We'd really like the
> James> disk contents to survive reboot for this ...
>
> Yeah, I should add that I wanted persistence too. I went through a
> whole stack (well, 5-6 or so) fibre channel drives from various
> vendors and attempted to low-level format them to 4KB sectors. Most
> of them laughed in my face. One of them tried to comply and
> irreparably confused its firmware in the process.
>
> Just yesterday I received a couple of prototype drives in the mail.
> I'll ask the vendor whether they support 4KB and if so I'll give them
> a whirl.
>
Isn't this a great use case for a SCSI target device where our target
can be a software disk on a remote host? What is missing for us to put
something like that together?

ric

2008-07-29 18:56:55

by James Bottomley

Subject: Re: tools support for non-512 byte sector sizes

On Tue, 2008-07-29 at 14:54 -0400, Ric Wheeler wrote:
> Martin K. Petersen wrote:
> >>>>>> "James" == James Bottomley <[email protected]> writes:
> >>>>>>
> >
> > James> The problem, which ata_ram also suffers, is that the tools we
> > James> most need to test are the ones for manipulating non volatile
> > James> characteristics (like partition tables). We'd really like the
> > James> disk contents to survive reboot for this ...
> >
> > Yeah, I should add that I wanted persistence too. I went through a
> > whole stack (well, 5-6 or so) fibre channel drives from various
> > vendors and attempted to low-level format them to 4KB sectors. Most
> > of them laughed in my face. One of them tried to comply and
> > irreparably confused its firmware in the process.
> >
> > Just yesterday I received a couple of prototype drives in the mail.
> > I'll ask the vendor whether they support 4KB and if so I'll give them
> > a whirl.
> >
> Isn't this a great use case for a SCSI target device where our target
> can be a software disk on a remote host? What is missing for us to put
> something like that together?

Technically nothing. Tomo should already have one for the STGT test
infrastructure (I've cc'd him).

James

2008-07-29 19:01:04

by James Bottomley

Subject: Re: tools support for non-512 byte sector sizes

On Tue, 2008-07-29 at 12:50 -0600, Matthew Wilcox wrote:
> On Tue, Jul 29, 2008 at 01:44:38PM -0500, James Bottomley wrote:
> > On Tue, 2008-07-29 at 12:42 -0600, Matthew Wilcox wrote:
> > > On Tue, Jul 29, 2008 at 01:37:25PM -0500, James Bottomley wrote:
> > > > scsi_debug does exactly the same thing, so it reports anything you tell
> > > > it (Martin Petersen actually added this so he could test with 4k
> > > > sectors).
> > > >
> > > > The problem, which ata_ram also suffers, is that the tools we most need
> > > > to test are the ones for manipulating non volatile characteristics (like
> > > > partition tables). We'd really like the disk contents to survive reboot
> > > > for this ...
> > >
> > > Ummm... _reboot_, or _module unload/reload_? I could certainly include
> > > an option to populate the ramdisc from a file. Is the ioctl to re-read
> > > the partition table not enough?
> >
> > reboot ... we'd like to take the tools through shutdown restart testing
> > to make sure they're all working ... of course, then there's the
> > bios ...
>
> It's not up to us to fix the BIOS.
>
> Since the vast majority of users use a distro, and the vast majority of
> distros use a fully modular kernel, wouldn't initialising the contents
> of ata-ram from the initrd/initramfs solve the problem?

Well ... we'd really like it file backed to truly verify ... sort of
like scsi_debug on a loopback. But saving on shutdown and
reinitialising from the saved image on boot would likely be perfect.

James

2008-07-29 21:54:26

by Jeff Garzik

Subject: Re: tools support for non-512 byte sector sizes

Matthew Wilcox wrote:
> On Tue, Jul 29, 2008 at 01:24:31PM -0400, Ric Wheeler wrote:
>> Jim pinged me about the use case for having our tool chain (parted
>> specifically) support devices with non-512 bytes sectors.
>
> Matt Domsch spoke with me about this at OLS. I took that opportunity,
> and I'll take this one, to pimp my ata-ram driver which allows you to
> alter the sector size to whatever you want:
>
> http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=ata-ram
>
> I'll admit to having not tested it with anything other than 512, but it
> ought to support 4096 byte sectors just fine. I haven't looked at what
> would be required to support 520-byte sectors.
>
> Jeff, any interest in merging ata-ram soon? I've got some users inside
> Intel, and Zab persuaded me to add the multiple port support last night,
> so it's not just useful for me. I think it's also a nice template to
> have around to show how to write a minimal libata driver.

I'm happy to include ata_ram whenever it is working and you think it's
ready for inclusion.

Jeff


2008-07-29 23:45:07

by FUJITA Tomonori

Subject: Re: tools support for non-512 byte sector sizes

On Tue, 29 Jul 2008 13:56:14 -0500
James Bottomley <[email protected]> wrote:

> On Tue, 2008-07-29 at 14:54 -0400, Ric Wheeler wrote:
> > Martin K. Petersen wrote:
> > >>>>>> "James" == James Bottomley <[email protected]> writes:
> > >>>>>>
> > >
> > > James> The problem, which ata_ram also suffers, is that the tools we
> > > James> most need to test are the ones for manipulating non volatile
> > > James> characteristics (like partition tables). We'd really like the
> > > James> disk contents to survive reboot for this ...
> > >
> > > Yeah, I should add that I wanted persistence too. I went through a
> > > whole stack (well, 5-6 or so) fibre channel drives from various
> > > vendors and attempted to low-level format them to 4KB sectors. Most
> > > of them laughed in my face. One of them tried to comply and
> > > irreparably confused its firmware in the process.
> > >
> > > Just yesterday I received a couple of prototype drives in the mail.
> > > I'll ask the vendor whether they support 4KB and if so I'll give them
> > > a whirl.
> > >
> > Isn't this a great use case for a SCSI target device where our target
> > can be a software disk on a remote host? What is missing for us to put
> > something like that together?
>
> Technically nothing. Tomo should already have one for the STGT test
> infrastructure (I've cc'd him).

Yeah, stgt also enables you to use a software media changer and a
software DVD drive (and we are working on VTL).

http://stgt.berlios.de/

You can connect to a remote host with iSCSI. FCoE might work since
Mike Christie has used stgt to work on the FCoE initiator driver.

stgt doesn't support non-512 byte sector sizes now but I'll add the
support shortly. I want to try DIF with iSCSI and FCoE.

2008-07-30 05:51:25

by Vladislav Bolkhovitin

Subject: Re: tools support for non-512 byte sector sizes

James Bottomley wrote:
> On Tue, 2008-07-29 at 12:26 -0600, Matthew Wilcox wrote:
>> On Tue, Jul 29, 2008 at 01:24:31PM -0400, Ric Wheeler wrote:
>>> Jim pinged me about the use case for having our tool chain (parted
>>> specifically) support devices with non-512 bytes sectors.
>> Matt Domsch spoke with me about this at OLS. I took that opportunity,
>> and I'll take this one, to pimp my ata-ram driver which allows you to
>> alter the sector size to whatever you want:
>>
>> http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=ata-ram
>>
>> I'll admit to having not tested it with anything other than 512, but it
>> ought to support 4096 byte sectors just fine. I haven't looked at what
>> would be required to support 520-byte sectors.
>
> scsi_debug does exactly the same thing, so it reports anything you tell
> it (Martin Petersen actually added this so he could test with 4k
> sectors).
>
> The problem, which ata_ram also suffers, is that the tools we most need
> to test are the ones for manipulating non volatile characteristics (like
> partition tables). We'd really like the disk contents to survive reboot
> for this ...

SCST (http://scst.sf.net) fully supports non-512-byte sectors up to
4096. Available target drivers for transports: software iSCSI, FC,
InfiniBand SRP, parallel SCSI, SAS (not much tested, because of lack of
hardware). With the VDISK dev handler you can use files as backing storage.

I have personally been working with 4K sectors for a long time, because
it's better for performance, and so far I have found only one tool that
doesn't support them: disktest.

> James

2008-07-30 13:52:00

by Matt Domsch

Subject: Re: tools support for non-512 byte sector sizes

On Tue, Jul 29, 2008 at 02:48:31PM -0400, Martin K. Petersen wrote:
> >>>>> "James" == James Bottomley <[email protected]> writes:
>
> James> The problem, which ata_ram also suffers, is that the tools we
> James> most need to test are the ones for manipulating non volatile
> James> characteristics (like partition tables). We'd really like the
> James> disk contents to survive reboot for this ...
>
> Yeah, I should add that I wanted persistence too. I went through a
> whole stack (well, 5-6 or so) fibre channel drives from various
> vendors and attempted to low-level format them to 4KB sectors. Most
> of them laughed in my face. One of them tried to comply and
> irreparably confused its firmware in the process.
>
> Just yesterday I received a couple of prototype drives in the mail.
> I'll ask the vendor whether they support 4KB and if so I'll give them
> a whirl.

I have access to disks with native 4KB sectors now too. Would
interested parties be willing to share test plans, so we could be sure
we have coverage wrt correctness (kernel internals, userspace tools like parted,
fdisk, kpartx, apps using O_DIRECT)? Benchmarking winds up being an
NDA activity this early in the game so I don't want the focus of any
joint work to be benchmarks yet.

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux

2008-07-30 17:17:06

by Jim Meyering

Subject: Re: tools support for non-512 byte sector sizes

Matt Domsch <[email protected]> wrote:
> On Tue, Jul 29, 2008 at 02:48:31PM -0400, Martin K. Petersen wrote:
...
>> Just yesterday I received a couple of prototype drives in the mail.
>> I'll ask the vendor whether they support 4KB and if so I'll give them
>> a whirl.
>
> I have access to disks with native 4KB sectors now too. Would

Do they expose that sector size?
I.e., does ioctl(fd,BLKSSZGET,&ss) set ss to 4096?

I'm interested because I'm preparing GNU Parted's partition table
manipulation code (not its FS code) for just that.
In particular, now I've heard two stories:

- disk makers will eventually sell drives with >512-byte sectors

- some disk makers have sort of agreed not to do that, and
expect forever to hide the larger underlying sector size
behind a virtual 512 (of course, this imposes alignment
restrictions, but that's a smaller problem)

Even if the latter is the case, we still have to deal with
optical and flash, both of which can already have larger sectors.

> interested parties be willing to share test plans, so we could be sure
> we have coverage wrt correctness (kernel internals, userspace tools like parted,
> fdisk, kpartx, apps using O_DIRECT)? Benchmarking winds up being an
> NDA activity this early in the game so I don't want the focus of any
> joint work to be benchmarks yet.

Speaking of O_DIRECT, both dd and shred (both in coreutils) use
O_DIRECT, so you could get _some_ coverage just by running shred
and experimenting with dd's oflag=direct and iflag=direct options.

2008-07-30 17:29:36

by Matt Domsch

Subject: Re: tools support for non-512 byte sector sizes

On Wed, Jul 30, 2008 at 07:16:49PM +0200, Jim Meyering wrote:
> Matt Domsch <[email protected]> wrote:
> > On Tue, Jul 29, 2008 at 02:48:31PM -0400, Martin K. Petersen wrote:
> ...
> >> Just yesterday I received a couple of prototype drives in the mail.
> >> I'll ask the vendor whether they support 4KB and if so I'll give them
> >> a whirl.
> >
> > I have access to disks with native 4KB sectors now too. Would
>
> Do they expose that sector size?
> I.e., does ioctl(fd,BLKSSZGET,&ss) set ss to 4096?

yes.

> I'm interested because I'm preparing GNU Parted's partition table
> manipulation code (not its FS code) for just that.
> In particular, now I've heard two stories:
>
> - disk makers will eventually sell drives with >512-byte sectors

yes

> - some disk makers have sort of agreed not to do that, and
> expect forever to hide the larger underlying sector size
> behind a virtual 512 (of course, this imposes alignment
> restrictions, but that's a smaller problem)

yes, this is happening also.

There will be 3 types of disks eventually:
1) those that report a 512-byte sector size, and are really a 512-byte
size. This is nearly all disks today.

2) those that report a 512-byte sector size, but are really a
4096-byte size, and the drive does the conversions and
read/modify/write. T10 and T13 are looking to add commands to
expose this different underlying physical sector size so the OS
could be aware of it. This is primarily being driven to mitigate
any problems that may happen with "legacy" OSs that are not aware
of the difference.

3) those that report a 4096-byte sector size, and are really a
4096-byte size. This seems ideal for aware OSs.

Which of 2) or 3) hits the market en masse remains to be seen. I want
Linux to be able to handle either painlessly.


--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux

2008-07-30 17:44:32

by Alan

Subject: Re: tools support for non-512 byte sector sizes

> read/modify/write. T10 and T13 are looking to add commands to
> expose this different underlying physical sector size so the OS
> could be aware of it. This is primarily being driven to mitigate

The identify bits are already there for reporting both size and offset.

> 3) those that report a 4096-byte sector size, and are really a
> 4096-byte size. This seems ideal for aware OSs.
>
> Which of 2) or 3) hits the market en masse remains to be seen. I want
> Linux to be able to handle either painlessly.

I am expecting 3 to turn up some _minor_ problem cases. Many older ATA
controllers magically know the sector size of the media, and the internal
state machines and FIFOs they use for performance are potentially going to
go gaga in this case when we do a PIO transfer.

Alan

2008-07-30 18:16:30

by Theodore Ts'o

Subject: Re: tools support for non-512 byte sector sizes

On Wed, Jul 30, 2008 at 12:29:22PM -0500, Matt Domsch wrote:
> 2) those that report a 512-byte sector size, but are really a
> 4096-byte size, and the drive does the conversions and
> read/modify/write. T10 and T13 are looking to add commands to
> expose this different underlying physical sector size so the OS
> could be aware of it. This is primarily being driven to mitigate
> any problems that may happen with "legacy" OSs that are not aware
> of the difference.

As usual, the biggest problem will be "legacy" userspace. For
example, most partition tools are still generating legacy partition
tables that look like this:

Disk /dev/sda: 255 heads, 63 sectors, 38913 cylinders

Nr AF Hd Sec Cyl Hd Sec Cyl Start Size ID
1 80 1 1 0 254 63 121 63 1959867 83
2 00 0 1 122 254 63 619 1959930 8000370 82
3 00 0 1 620 254 63 1023 9960300 615177045 05
4 00 0 0 0 0 0 0 0 0 00
5 00 1 1 620 254 63 1023 63 615176982 8e

Note the starting sector# for the first partition.....

- Ted

2008-07-30 18:30:48

by Ric Wheeler

Subject: Re: tools support for non-512 byte sector sizes

Theodore Tso wrote:
> On Wed, Jul 30, 2008 at 12:29:22PM -0500, Matt Domsch wrote:
>
>> 2) those that report a 512-byte sector size, but are really a
>> 4096-byte size, and the drive does the conversions and
>> read/modify/write. T10 and T13 are looking to add commands to
>> expose this different underlying physical sector size so the OS
>> could be aware of it. This is primarily being driven to mitigate
>> any problems that may happen with "legacy" OSs that are not aware
>> of the difference.
>>
>
> As usual, the biggest problem will be "legacy" userspace. For
> example, most partition tools are still generating legacy partition
> tables that look like this:
>
> Disk /dev/sda: 255 heads, 63 sectors, 38913 cylinders
>
> Nr AF Hd Sec Cyl Hd Sec Cyl Start Size ID
> 1 80 1 1 0 254 63 121 63 1959867 83
> 2 00 0 1 122 254 63 619 1959930 8000370 82
> 3 00 0 1 620 254 63 1023 9960300 615177045 05
> 4 00 0 0 0 0 0 0 0 0 00
> 5 00 1 1 620 254 63 1023 63 615176982 8e
>
> Note the starting sector# for the first partition.....
>
> - Ted
>
If I remember correctly, the MS Vista new alignment for data partitions
is on a 0 offset, 1MB aligned boundary. The support for 4096 byte
sectors is only for data partitions (not boot).

Array vendors, who consume a fair amount of drives, are most likely more
friendly to native 4k drives. The big fear from disk vendors is getting
a wave of returns from Best Buy, etc when people go and plug in a new,
native 4k drive into an old box....

ric

2008-07-30 18:48:04

by Theodore Ts'o

Subject: Re: tools support for non-512 byte sector sizes

On Wed, Jul 30, 2008 at 02:28:12PM -0400, Ric Wheeler wrote:
> If I remember correctly, the MS Vista new alignment for data partitions
> is on a 0 offset, 1MB aligned boundary. The support for 4096 byte
> sectors is only for data partitions (not boot).
>
> Array vendors, who consume a fair amount of drives, are most likely more
> friendly to native 4k drives. The big fear from disk vendors is getting
> a wave of returns from Best Buy, etc when people go and plug in a new,
> native 4k drive into an old box....

Or a new box running XP, either via the Dell "upgrade to XP" program,
or from a corporate I/T load[1]. :-)

[1] http://www.theinquirer.net/gb/inquirer/news/2008/06/23/intel-dumps-vista

More to the point for Linux, are *our* partition table programs (i.e.,
fdisk, cfdisk, et al.) fixed with better defaults in upstream, and
what are the upcoming enterprise distributions going to ship with?
Since that's what a large number of Linux customers will end up using
for the next 3-5 years....

- Ted

2008-08-01 16:12:21

by Matthew Wilcox

Subject: Re: tools support for non-512 byte sector sizes

On Wed, Jul 30, 2008 at 08:51:47AM -0500, Matt Domsch wrote:
> I have access to disks with native 4KB sectors now too. Would
> interested parties be willing to share test plans, so we could be sure
> we have coverage wrt correctness (kernel internals, userspace tools like parted,
> fdisk, kpartx, apps using O_DIRECT)? Benchmarking winds up being an
> NDA activity this early in the game so I don't want the focus of any
> joint work to be benchmarks yet.

Are they SCSI? I just got round to trying 4k sector sizes in ata_ram
(after adding file backed capability) and found that libata currently
silently ignores the identify bits that report sector size. I'll work
on fixing that this afternoon if nobody beats me to it.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-08-05 16:54:42

by Matthew Wilcox

Subject: Re: tools support for non-512 byte sector sizes

On Fri, Aug 01, 2008 at 10:11:49AM -0600, Matthew Wilcox wrote:
> Are they SCSI? I just got round to trying 4k sector sizes in ata_ram
> (after adding file backed capability) and found that libata currently
> silently ignores the identify bits that report sector size. I'll work
> on fixing that this afternoon if nobody beats me to it.

OK, I have patches. I'll send them to linux-ide. If anyone wants to
try them, I pushed out two trees; one for libata:

http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=ata-large-sectors

and one for ata_ram supporting:
- large sectors
- file backing
http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=ata-ram

I hope that will help some more people do testing.

Here's the dmesg from running:

$ sudo modprobe ata_ram sector_size=4096 capacity=262144 nr_ports=2

(note that you'll need at least 2.5GB of ram in your machine to try this,
or Linux gets really unhappy. You can, of course, reduce the capacity.
Would there be interest in a lazily allocated option for ata_ram?)

[ 1134.017240] scsi7 : ata_ram
[ 1134.017420] scsi8 : ata_ram
[ 1134.017489] ata8: SATA max UDMA/133 ata_ram_0
[ 1134.017495] ata9: SATA max UDMA/133 ata_ram_1
[ 1134.017557] ata8.00: ATA-8: Linux RAM Drive, 0.01, max UDMA7
[ 1134.017563] ata8.00: 262144 sectors, multi 0: LBA
[ 1134.017602] ata8.00: configured for UDMA/133
[ 1134.017631] ata9.00: ATA-8: Linux RAM Drive, 0.01, max UDMA7
[ 1134.017636] ata9.00: 262144 sectors, multi 0: LBA
[ 1134.017668] ata9.00: configured for UDMA/133
[ 1134.035741] scsi 7:0:0:0: Direct-Access ATA Linux RAM Drive 0.01 PQ: 0 ANSI: 5
[ 1134.035904] sd 7:0:0:0: [sdb] 262144 4096-byte hardware sectors (1074 MB)
[ 1134.035926] sd 7:0:0:0: [sdb] Write Protect is off
[ 1134.035932] sd 7:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 1134.035961] sd 7:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 1134.036039] sd 7:0:0:0: [sdb] 262144 4096-byte hardware sectors (1074 MB)
[ 1134.036061] sd 7:0:0:0: [sdb] Write Protect is off
[ 1134.036066] sd 7:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 1134.036095] sd 7:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 1134.036119] sdb: unknown partition table
[ 1134.036276] sd 7:0:0:0: [sdb] Attached SCSI disk
[ 1134.036463] sd 7:0:0:0: Attached scsi generic sg2 type 0
[ 1134.036607] scsi 8:0:0:0: Direct-Access ATA Linux RAM Drive 0.01 PQ: 0 ANSI: 5
[ 1134.036749] sd 8:0:0:0: [sdc] 262144 4096-byte hardware sectors (1074 MB)
[ 1134.036768] sd 8:0:0:0: [sdc] Write Protect is off
[ 1134.036774] sd 8:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 1134.036803] sd 8:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 1134.036869] sd 8:0:0:0: [sdc] 262144 4096-byte hardware sectors (1074 MB)
[ 1134.036888] sd 8:0:0:0: [sdc] Write Protect is off
[ 1134.036895] sd 8:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 1134.036924] sd 8:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 1134.036944] sdc: unknown partition table
[ 1134.037082] sd 8:0:0:0: [sdc] Attached SCSI disk
[ 1134.037182] sd 8:0:0:0: Attached scsi generic sg3 type 0

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-08-05 16:58:00

by Matt Domsch

Subject: Re: tools support for non-512 byte sector sizes

On Fri, Aug 01, 2008 at 10:11:49AM -0600, Matthew Wilcox wrote:
> On Wed, Jul 30, 2008 at 08:51:47AM -0500, Matt Domsch wrote:
> > I have access to disks with native 4KB sectors now too. Would
> > interested parties be willing to share test plans, so we could be
> > sure we have coverage wrt correctness (kernel internals, userspace
> > tools like parted, fdisk, kpartx, apps using O_DIRECT)?
> > Benchmarking winds up being an NDA activity this early in the game
> > so I don't want the focus of any joint work to be benchmarks yet.
>
> Are they SCSI?

yes (SAS).

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux

2008-08-05 16:58:24

by Matthew Wilcox

Subject: Re: tools support for non-512 byte sector sizes

On Tue, Aug 05, 2008 at 10:54:27AM -0600, Matthew Wilcox wrote:
> On Fri, Aug 01, 2008 at 10:11:49AM -0600, Matthew Wilcox wrote:
> > Are they SCSI? I just got round to trying 4k sector sizes in ata_ram
> > (after adding file backed capability) and found that libata currently
> > silently ignores the identify bits that report sector size. I'll work
> > on fixing that this afternoon if nobody beats me to it.
>
> OK, I have patches. I'll send them to linux-ide. If anyone wants to
> try them, I pushed out two trees; one for libata:
>
> http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=ata-large-sectors
>
> and one for ata_ram supporting:
> - large sectors
> - file backing
> http://git.kernel.org/?p=linux/kernel/git/willy/misc.git;a=shortlog;h=ata-ram

I forgot to mention ... I didn't add support for 520-byte (or 4104-byte
or 4160-byte) sectors. Martin helpfully pointed me to
http://www.t13.org/Documents/UploadedDocuments/docs2008/e07162r2-External_Path_Protection.pdf
but it seems like T13 haven't allocated some words for this yet. If
anyone wants to work on this, please contact me; I have some thoughts.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-08-10 20:03:29

by Pavel Machek

Subject: Re: tools support for non-512 byte sector sizes

Hi!

> > - some disk makers have sort of agreed not to do that, and
> > expect forever to hide the larger underlying sector size
> > behind a virtual 512 (of course, this imposes alignment
> > restrictions, but that's a smaller problem)
>
> yes, this is happening also.
>
> There will be 3 types of disks eventually:
> 1) those that report a 512-byte sector size, and are really a 512-byte
> size. This is nearly all disks today.
>
> 2) those that report a 512-byte sector size, but are really a
> 4096-byte size, and the drive does the conversions and
> read/modify/write. T10 and T13 are looking to add commands to
> expose this different underlying physical sector size so the OS
> could be aware of it. This is primarily being driven to mitigate
> any problems that may happen with "legacy" OSs that are not aware
> of the difference.

How is this going to work with journaling? This has the nasty property
that if you are writing to sector n during a power failure, the disk may
also kill sectors n-3, n-2 and n-1... and that's bad, right?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html