Hi
Imagine we have a virtual disk which provides a 64-bit (sparse) address
space. Unfortunately, we cannot use it as a block device because in a lot
of places (including the buffer_head structure) we're using a long or even
an int for the block number.
Is there any standardized way of doing I/O to a block device which could
handle 64-bit addresses for the block number?
Don't you think that we will run into problems anyway, because soon there
will be RAID systems with a couple of terabytes of space to waste on
MP3s ;-)
Reto
[Reto Baettig]
> Imagine we have a virtual disk which provides a 64-bit (sparse)
> address space. Unfortunately, we cannot use it as a block device
> because in a lot of places (including the buffer_head structure) we're
> using a long or even an int for the block number.
Actually it should be 'unsigned long'. If anyone uses 'long' or 'int',
I guess it's a bug.
> Is there any standardized way of doing I/O to a block device which
> could handle 64-bit addresses for the block number?
Yeah, tell the world you explicitly don't support 32-bit architectures.
Linux supports (to some degree) at least four 64-bit architectures,
where 'unsigned long' is nice and big. And I imagine support for
POWER3 and HP-PA 2.0w is coming in the not-so-distant future.
Either that, or (since you say the address space is sparse) do your own
block mapping within the driver. If you still need more than 32 bits,
you'll have to fudge it with multiple virtual devices.
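As a rough illustration of the "multiple virtual devices" fudge, here is a
minimal sketch that splits a 64-bit block number into a device index plus a
32-bit block number within that device. Everything named here (struct
dev_block, split_block64, join_block64, MAX_VIRT_DEVS and its value of 16) is
hypothetical and not taken from any kernel interface; it only shows the
arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: each virtual device exposes 2^32 blocks. */
#define BLOCK_BITS_PER_DEV 32u
/* Assumption: an arbitrary cap on how many virtual devices exist. */
#define MAX_VIRT_DEVS 16u

/* Hypothetical pair: which virtual device, and which block on it. */
struct dev_block {
    uint32_t dev;
    uint32_t block;
};

/*
 * Split a 64-bit block number into (device, 32-bit block).
 * Returns 0 on success, -1 if the block lies beyond the last device.
 */
int split_block64(uint64_t block64, struct dev_block *out)
{
    assert(out != NULL);

    uint64_t dev = block64 >> BLOCK_BITS_PER_DEV;
    if (dev >= MAX_VIRT_DEVS)
        return -1;

    out->dev = (uint32_t)dev;
    out->block = (uint32_t)(block64 & 0xFFFFFFFFu);
    return 0;
}

/* Inverse of split_block64: rebuild the 64-bit block number. */
uint64_t join_block64(struct dev_block db)
{
    return ((uint64_t)db.dev << BLOCK_BITS_PER_DEV) | db.block;
}
```

With these assumed constants, block 0x300000005 lands on virtual device 3 at
block 5, and anything at or beyond 16 << 32 is rejected.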
> Don't you think that we will run into problems anyway, because soon
> there will be RAID systems with a couple of terabytes of space to
> waste on MP3s ;-)
A couple of terabytes is fine. That's 32 bits of blocks. *More* than
that, now, we've got a problem.
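To make the "32 bits of blocks" arithmetic concrete, here is a small sketch
that computes the largest device size addressable with a block number of a
given width. The block size is not stated above, so the 512-byte and
1024-byte figures in the comments are assumptions chosen for illustration,
and max_addressable_bytes is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest device size, in bytes, addressable with block numbers that are
 * `bits` wide and blocks of `block_size` bytes.
 * Returns 0 if block_size is 0 or the result does not fit in 64 bits.
 *
 * Assumed examples:
 *   32 bits x  512-byte blocks = 2^41 bytes (2 TiB)
 *   32 bits x 1024-byte blocks = 2^42 bytes (4 TiB)
 */
uint64_t max_addressable_bytes(unsigned bits, uint32_t block_size)
{
    assert(bits >= 1 && bits <= 63);

    if (block_size == 0)
        return 0;

    uint64_t blocks = (uint64_t)1 << bits;
    if (blocks > UINT64_MAX / block_size)
        return 0; /* product would overflow 64 bits */

    return blocks * block_size;
}
```

Under those assumptions a 32-bit unsigned block number tops out at a few
terabytes, and a signed 32-bit count of 512-byte sectors (31 usable bits)
would top out at 2^40 bytes.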
Peter
On Wed, Dec 06, 2000 at 10:07:57PM -0600, Peter Samuelson wrote:
>
> > Don't you think that we will run into problems anyway, because soon
> > there will be RAID systems with a couple of terabytes of space to
> > waste on MP3s ;-)
>
> A couple of terabytes is fine. That's 32 bits of blocks. *More* than
> that, now, we've got a problem.
>
Which is exactly what we're going to be dealing with "real soon now".
I'm going to be putting together a RAID system with between 1.7 and 5.1TB
by February. This will be seen as a single block device to clients
via a network block device (more than likely it will be 16 Ciprico
Rimfire 7000's spread across 4 nodes via a Quadrics switch). So, what
I'm seeing right now is that I won't be able to address this amount of
space with a single block device. By the summer of 2001 we could be
looking at putting together 10-150TB (depends on budget and need) of
disk space for a production cluster and it would be nice if our
parallel filesystem could span that entire space with a single image.
That being said, has anyone started making changes to accommodate large
devices like this in the block layer, at least on 64bit architectures?
I don't think we are seriously considering anything other than Alphas
at this point.
BAPper
> Is there any standardized way of doing I/O to a block device which
> could handle 64-bit addresses for the block number?
Submit patches early in 2.5 to extend the block range?
> Don't you think that we will run into problems anyway, because soon there
> will be RAID systems with a couple of terabytes of space to waste on
> MP3s ;-)
The limit is currently about 1TB per block device, so yes, it will
eventually get us.
Hi,
On Wed, Dec 06, 2000 at 06:50:15AM -0800, Reto Baettig wrote:
> Imagine we have a virtual disk which provides a 64-bit (sparse) address
> space. Unfortunately, we cannot use it as a block device because in a lot
> of places (including the buffer_head structure) we're using a long or even
> an int for the block number.
>
> Is there any standardized way of doing I/O to a block device which could
> handle 64-bit addresses for the block number?
It's on the agenda for urgent fixing in 2.5 (along with
block-dev-layer support for high memory on Intel, and merging in
better disk profiling, and a general cleanup of the data tables in
ll_rw_blk.c).
Cheers,
Stephen