2005-03-14 16:44:52

by Berkley Shands

Subject: Devices/Partitions over 2TB

With a Broadcom BC4852 and suitable SATA drives, it is easy to create
functional devices with well in excess of 2TB of raw space. This presents a
severe problem to partitioning tools such as fdisk and cfdisk, because the
kernel partition structure has a 32-bit integer maximum for sector counts. Since
the read_int() function combined with cround() overflows, having such a large
device makes life difficult. While mkfs.ext3 doesn't care, there is no
way to slice that space up other than to use the raw device (/dev/sda).
Ever thought about backing up 4TB to tape? :-)
Even with a severely hacked-up fdisk, the 32-bit field is just too
hard-coded to make the effort worthwhile. A newer revision of the partitioning
system needs to be designed. The kernel must be told how to deal with large
devices (the filesystems already do this), using at least 64 bits here.
Using a large SAN server, I've seen 200-petabyte arrays. While that is excessive
for a local workstation, 4-16 TB is clearly available and being deployed.
I have not found any documentation of efforts to overcome the 2TB partition limit,
though someone should be thinking about this. If there is an effort, could someone
point me to it? If not, is there any interest in starting such a project?
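
The overflow described above can be sketched in a few lines of C. This is only
an illustration, not code taken from fdisk or the kernel: the helper names
(bytes_to_sectors, sectors_as_u32, fits_in_u32) are made up for this sketch,
and a 512-byte sector size is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sector size for this sketch. */
#define SECTOR_SIZE 512ULL

/* Hypothetical helper: convert a device size in bytes to a sector count.
 * Only whole-sector sizes are accepted. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
    assert(bytes % SECTOR_SIZE == 0);
    return bytes / SECTOR_SIZE;
}

/* Hypothetical helper: the value a 32-bit sector field would end up holding.
 * The cast keeps only the low 32 bits of the count. */
static uint32_t sectors_as_u32(uint64_t sectors)
{
    return (uint32_t)sectors;
}

/* Hypothetical helper: nonzero if the count survives a 32-bit field intact. */
static int fits_in_u32(uint64_t sectors)
{
    return sectors <= UINT32_MAX;
}
```

For a 4 TiB device (2^42 bytes) the sector count is 2^33, which does not fit
in 32 bits; sectors_as_u32() would return 0 for it, so a tool that stores the
count in a 32-bit integer silently sees the wrong size.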

Thanks for the bandwidth.

[email protected]


2005-03-14 21:25:54

by Miquel van Smoorenburg

Subject: Re: Devices/Partitions over 2TB

In article <[email protected]>,
Berkley Shands <[email protected]> wrote:
>I have not found any documentation of efforts to overcome the 2TB
>partition limit,

config LBD
	bool "Support for Large Block Devices"
	depends on X86 || MIPS32 || PPC32 || ARCH_S390_31 || SUPERH
	help
	  Say Y here if you want to attach large (bigger than 2TB) discs to
	  your machine, or if you want to have a raid or loopback device
	  bigger than 2TB. Otherwise say N.

Mike.

2005-03-14 21:36:54

by Randy.Dunlap

Subject: Re: Devices/Partitions over 2TB

Miquel van Smoorenburg wrote:
> In article <[email protected]>,
> Berkley Shands <[email protected]> wrote:
>
>>I have not found any documentation of efforts to overcome the 2TB
>>partition limit,
>
>
> config LBD
> bool "Support for Large Block Devices"
> depends on X86 || MIPS32 || PPC32 || ARCH_S390_31 || SUPERH
> help
> Say Y here if you want to attach large (bigger than 2TB) discs to
> your machine, or if you want to have a raid or loopback device
> bigger than 2TB. Otherwise say N.
>
> Mike.


ISTR some mention, plan, or idea of using the EFI GUID partition table
format, or something else that already existed, worked, and supported
larger partition sizes.

Maybe Peter Anvin or Andries would recall this info?

--
~Randy

2005-03-15 00:13:32

by jmerkey

Subject: Re: Devices/Partitions over 2TB


You have to ignore the partition table contents for the ending cylinder and
use the code below instead. You also have to write your own FS or modify the
partition code in Linux, or you won't be able to use the storage. The config
option listed in the previous post only enables 64-bit LBA addressing; it does
not fix the busted fdisk program or the problems you will see with the
partition tables.

For example:

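/* Take the sizes from the block device, not from the partition table. */
SystemDisk[j]->BytesPerSector = bdev_hardsect_size(bdev);
SystemDisk[j]->driveSectors = (LONGLONG)bdev->bd_disk->capacity;
/* Cast before multiplying so the product is computed as LONGLONG. */
SystemDisk[j]->driveSize = (LONGLONG)
        ((LONGLONG)bdev->bd_disk->capacity *
         SystemDisk[j]->BytesPerSector);
SystemDisk[j]->max_sg_elements = bio_get_nr_vecs(bdev);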

The bd_disk->capacity field reports the actual drive size, but fdisk ignores it.

Good Luck.

Jeff


Miquel van Smoorenburg wrote:

>In article <[email protected]>,
>Berkley Shands <[email protected]> wrote:
>
>
>>I have not found any documentation of efforts to overcome the 2TB
>>partition limit,
>>
>>
>
>config LBD
> bool "Support for Large Block Devices"
> depends on X86 || MIPS32 || PPC32 || ARCH_S390_31 || SUPERH
> help
> Say Y here if you want to attach large (bigger than 2TB) discs to
> your machine, or if you want to have a raid or loopback device
> bigger than 2TB. Otherwise say N.
>
>Mike.
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>

2005-03-15 04:47:58

by Bernd Eckenfels

Subject: Re: Devices/Partitions over 2TB

In article <[email protected]> you wrote:
> You have to ignore the partition table contents for ending cylinder.

Why use MSDOS partition tables at all? What about LVM or GUID Partitions?

Gruss
Bernd

2005-03-15 05:06:48

by jmerkey

Subject: Re: Devices/Partitions over 2TB

Bernd Eckenfels wrote:

>In article <[email protected]> you wrote:
>
>
>>You have to ignore the partition table contents for ending cylinder.
>>
>>
Good Question. Where are the standard tools in FC2 and FC3 for these types?

Jeff

>Gruss
>Bernd

2005-03-16 12:18:38

by Stephen C. Tweedie

Subject: Re: Devices/Partitions over 2TB

Hi,

On Tue, 2005-03-15 at 04:54, jmerkey wrote:

> Good Question. Where are the standard tools in FC2 and FC3 for these types?

For LVM, the lvm2 package contains all the necessary tools. I know
Alasdair did some kernel fixes for lvm2 striping on >2TB partitions
recently, though, so older kernels might not work perfectly if you're
using stripes.

To use genuine partitions > 2TB, though, you need alternative
partitioning; the GPT disk label supports that, and "parted" can create
and partition such disk labels. (Note that most x86 BIOSes can't boot
off them, though, so don't do this on your boot disk!)
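
As a rough illustration of why the GPT label does not hit the 2TB wall, here is
a sketch of a single GPT partition entry as laid out in the UEFI specification.
The struct and function names (gpt_entry_sketch, gpt_entry_sectors) are invented
for this example; real code would also have to handle the little-endian on-disk
byte order, which is ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* One on-disk GPT partition entry (128 bytes in the standard layout).
 * Start and end are 64-bit LBAs, unlike the 32-bit fields of a DOS entry. */
struct gpt_entry_sketch {
    uint8_t  type_guid[16];    /* partition type GUID */
    uint8_t  unique_guid[16];  /* per-partition GUID */
    uint64_t first_lba;        /* first sector of the partition */
    uint64_t last_lba;         /* last sector, inclusive */
    uint64_t attributes;       /* attribute flags */
    uint16_t name[36];         /* partition name, UTF-16LE code units */
};

static_assert(sizeof(struct gpt_entry_sketch) == 128,
              "GPT entry sketch should be 128 bytes");

/* Hypothetical helper: partition length in sectors (last_lba is inclusive). */
static uint64_t gpt_entry_sectors(const struct gpt_entry_sketch *e)
{
    return e->last_lba - e->first_lba + 1;
}
```

With 64-bit LBAs, a partition of 2^33 sectors (4 TiB at 512 bytes per sector)
is represented without any truncation.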

--Stephen


2005-03-16 14:06:24

by Lennart Sorensen

Subject: Re: Devices/Partitions over 2TB

On Wed, Mar 16, 2005 at 12:16:57PM +0000, Stephen C. Tweedie wrote:
> For LVM, the lvm2 package contains all the necessary tools. I know
> Alasdair did some kernel fixes for lvm2 striping on >2TB partitions
> recently, though, so older kernels might not work perfectly if you're
> using stripes.
>
> To use genuine partitions > 2TB, though, you need alternative
> partitioning; the GPT disk label supports that, and "parted" can create
> and partition such disk labels. (Note that most x86 BIOSes can't boot
> off them, though, so don't do this on your boot disk!)

Does the BIOS actually support partitions in general? I thought that
was a problem for the code in the MBR. As long as your bootcode in the
MBR supports whatever partition scheme you come up with, I can't see how
it should be a problem, but maybe I am missing something. So what does
GRUB/LILO support?

Len Sorensen

2005-03-17 13:08:39

by Andries Brouwer

Subject: Re: Devices/Partitions over 2TB

On Mon, Mar 14, 2005 at 10:44:31AM -0600, Berkley Shands wrote:

> With a Broadcom BC4852 and suitable SATA drives, it is easy to create
> functional devices with well in excess of 2TB raw space. This presents a severe
> problem to partitioning tools, such as fdisk/cfdisk and the like, as the
> kernel partition structure has a 32-bit integer max for sector counts. Since
> the read_int() function combined with cround() overflows, ...

You should not read fdisk source but think about the DOS-type partition table.
An entry in such a table describes partition start and end in CHS terms
using 24 bits for start and end, and describes partition start and size
in LBA terms using 32 bits for start and size. If you use sectors of size
512, that limits the use of DOS-type partition tables to disks of at most
2^41 bytes, that is, 2 TiB.
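
The arithmetic above can be checked with a small C sketch of a DOS-type
partition table entry. The names here (dos_entry_sketch, read_le32,
dos_limit_bytes) are invented for illustration and are not taken from fdisk or
the kernel; the 16-byte layout itself is the standard one.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-byte entry of a DOS-type partition table. Byte arrays are used so
 * the struct has no padding and no alignment requirements. */
struct dos_entry_sketch {
    uint8_t boot_flag;     /* 0x80 marks the active partition */
    uint8_t chs_start[3];  /* 24-bit CHS start */
    uint8_t type;          /* partition type byte */
    uint8_t chs_end[3];    /* 24-bit CHS end */
    uint8_t lba_start[4];  /* 32-bit LBA start, little-endian */
    uint8_t lba_size[4];   /* 32-bit size in sectors, little-endian */
};

static_assert(sizeof(struct dos_entry_sketch) == 16,
              "DOS entry sketch should be 16 bytes");

/* Hypothetical helper: decode a little-endian 32-bit field. */
static uint32_t read_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: the bound quoted above, 2^32 sectors times the
 * sector size in bytes. */
static uint64_t dos_limit_bytes(uint64_t sector_size)
{
    return (1ULL << 32) * sector_size;
}
```

With 512-byte sectors, dos_limit_bytes(512) is 2^41 bytes, the 2 TiB figure
given above.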

What to do afterwards? Last year I made a hack, reserving type 88 hex for
a Linux plaintext partition table. You should be able to find the kernel patch
somewhere on Google; otherwise ask. No fdisk is required: the partition table
is just plaintext that you edit using emacs or vi.
The idea here is to use an ordinary DOS-type partition table for the start
of the disk, and let the type 88 partition describe the rest.

There is also the EFI/GPT disk descriptor that is common on IA64, but not much
used elsewhere. Maybe parted supports it.

Andries