2012-01-20 11:20:29

by Bluflonalgul

[permalink] [raw]
Subject: Ext4 bigalloc and sparc ext3 16k blocksize

Some (misleading?) article on kernelnewbies.org said the new Ext4
Bigalloc feature was about supporting block size up to 1MB.
I tried to use this feature to read an ext3 fs with a 16k blocksize,
made on a Debian Linux SPARC machine (a NAS).
But I couldn't read this filesystem with the Linux 3.2 kernel on an x86
PC: it fails to read the fs structure (as it did with previous
kernels).

Could someone point me to some documentation, or give me some clues:
I'd like to understand what's wrong and if I can hope to read such fs
with Linux on x86 (natively, without fusefs tricks or additional
tools).
Thanks.
B.


2012-01-20 22:57:17

by Andreas Dilger

[permalink] [raw]
Subject: Re: Ext4 bigalloc and sparc ext3 16k blocksize

On 2012-01-20, at 4:20 AM, Bluflonalgul wrote:
> Some (misleading?) article on kernelnewbies.org said the new Ext4
> Bigalloc feature was about supporting block size up to 1MB.
> I tried to use this feature to read an ext3 fs with a 16k blocksize,
> made on a Debian Linux SPARC machine (a NAS).
> But I couldn't read this filesystem with the Linux 3.2 kernel on an x86
> PC: it fails to read the fs structure (as it did with previous
> kernels).

The bigalloc feature is not intended to be disk compatible with a
large blocksize filesystem, or no "feature" would be needed at all
besides increasing the blocksize in the superblock.

What it is intended to handle is efficient block allocation for large
file IO, by increasing the size of space allocation/tracking in the
block bitmap, without breaking the kernel paradigm of keeping block
size <= PAGE_SIZE.

This gives many of the benefits of having a large blocksize without
needing to change the whole kernel.
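
As a rough illustration of the point above (a sketch, not kernel code):
each bit in a block bitmap tracks one allocation unit, so a larger unit
means one bitmap block covers more disk space while the block size itself
stays small. The function name and the example sizes below are made up
for illustration; the only layout assumption is that a bitmap occupies
exactly one filesystem block.

```python
KIB = 1024
MIB = 1024 * KIB


def bitmap_coverage_bytes(block_size, cluster_size=None):
    """Bytes of disk space tracked by a single block-bitmap block.

    Hypothetical helper for illustration only. Without bigalloc the
    allocation unit is one block; with bigalloc it is one cluster,
    which is a multiple of the block size.
    """
    if cluster_size is None:
        cluster_size = block_size  # no bigalloc: one bit per block
    if cluster_size < block_size or cluster_size % block_size != 0:
        raise ValueError("cluster size must be a multiple of the block size")
    bits_per_bitmap = block_size * 8  # the bitmap fills one block
    return bits_per_bitmap * cluster_size


if __name__ == "__main__":
    cases = [
        ("4 KiB blocks, no bigalloc", 4 * KIB, None),
        ("4 KiB blocks, 1 MiB clusters", 4 * KIB, 1 * MIB),
        ("16 KiB blocks, no bigalloc", 16 * KIB, None),
    ]
    for label, block_size, cluster_size in cases:
        covered = bitmap_coverage_bytes(block_size, cluster_size)
        print(f"{label}: {covered // MIB} MiB per bitmap block")
```

With 4 KiB blocks one bitmap block tracks 128 MiB; with 1 MiB clusters on
the same 4 KiB blocks it tracks 32 GiB, which is where the allocation
benefit comes from without the block size ever exceeding PAGE_SIZE.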

> Could someone point me to some documentation, or give me some clues:
> I'd like to understand what's wrong and if I can hope to read such fs
> with Linux on x86 (natively, without fusefs tricks or additional
> tools).

There was some work done by Robin Dong (2011-11-18) that would get us
most of the way to just handling large blocksize filesystems directly
by the kernel. This might be facilitated by denying mmap access to
such filesystems, but for media/big data filesystems (as opposed to the
root fs) this is probably not a serious limitation.

I'm still interested to see a continuation of Robin's work, taking it
to just be disk compatible with large blocksize, even if it is not
possible to use mmap IO on such filesystems (always setting MNT_NOEXEC
on systems where PAGE_SIZE < blocksize and not supplying f_op->mmap
should work).
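
To check whether a given filesystem falls into the PAGE_SIZE < blocksize
case, the block size can be read straight from the on-disk superblock.
The following is a minimal user-space sketch, not something posted in
this thread: the helper names are hypothetical, the offsets are those of
the ext2/3/4 superblock (it starts 1024 bytes into the device, with
s_log_block_size at byte 24 and s_magic at byte 56), and the 4096-byte
default page size is an assumption that holds for typical x86 kernels.

```python
import struct

SB_OFFSET = 1024    # primary superblock starts 1024 bytes into the device
SB_SIZE = 1024      # size of the superblock structure
EXT_MAGIC = 0xEF53  # s_magic shared by ext2, ext3 and ext4


def block_size_from_superblock(sb):
    """Return the block size encoded in a raw superblock buffer."""
    if len(sb) < 58:
        raise ValueError("buffer too short to hold s_magic")
    (log_block_size,) = struct.unpack_from("<I", sb, 24)  # s_log_block_size
    (magic,) = struct.unpack_from("<H", sb, 56)           # s_magic
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/ext3/ext4 superblock")
    return 1024 << log_block_size


def read_block_size(path):
    """Read the block size from a device or image file (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(SB_OFFSET)
        return block_size_from_superblock(f.read(SB_SIZE))


def fits_in_page(block_size, page_size=4096):
    """True when block size <= page size (4096 is an assumed x86 default)."""
    return block_size <= page_size
```

For example, `fits_in_page(read_block_size("/path/to/image"))` (the path
is a placeholder) would be False for a 16k-blocksize filesystem on a
4 KiB-page machine.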

The reason that this is desirable is that it allows bypassing the 16TB
file size limitations, and it also allows mounting filesystems from
SPARC, PPC, and IA64 systems that were formatted in this manner and are
getting old and need replacing.
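
The 16TB figure can be reproduced with simple arithmetic, assuming the
limit comes from 32-bit logical block numbers within a file (as in the
ext4 extent format) combined with 4 KiB blocks. This is a
back-of-the-envelope sketch with a hypothetical function name, not a
statement of every limit the kernel enforces:

```python
KIB = 1024
TIB = 1024 ** 4


def max_file_bytes(block_size, block_number_bits=32):
    """Largest file addressable with fixed-width logical block numbers.

    Assumption for illustration: the file size is bounded only by the
    number of distinct logical block numbers times the block size.
    """
    return (1 << block_number_bits) * block_size


if __name__ == "__main__":
    for block_size in (4 * KIB, 16 * KIB, 64 * KIB):
        limit = max_file_bytes(block_size)
        print(f"{block_size // KIB} KiB blocks: {limit // TIB} TiB")
```

Under that assumption, 4 KiB blocks give 16 TiB, while 16 KiB and 64 KiB
blocks give 64 TiB and 256 TiB respectively.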

Cheers, Andreas


2012-01-21 02:14:11

by Robin Dong

[permalink] [raw]
Subject: Re: Ext4 bigalloc and sparc ext3 16k blocksize

2012/1/21 Andreas Dilger <[email protected]>:
> On 2012-01-20, at 4:20 AM, Bluflonalgul wrote:
>> Some (misleading?) article on kernelnewbies.org said the new Ext4
>> Bigalloc feature was about supporting block size up to 1MB.
>> I tried to use this feature to read an ext3 fs with a 16k blocksize,
>> made on a Debian Linux SPARC machine (a NAS).
>> But I couldn't read this filesystem with the Linux 3.2 kernel on an x86
>> PC: it fails to read the fs structure (as it did with previous
>> kernels).
>
> The bigalloc feature is not intended to be disk compatible with a
> large blocksize filesystem, or no "feature" would be needed at all
> besides increasing the blocksize in the superblock.
>
> What it is intended to handle is efficient block allocation for large
> file IO, by increasing the size of space allocation/tracking in the
> block bitmap, without breaking the kernel paradigm of keeping block
> size <= PAGE_SIZE.
>
> This gives many of the benefits of having a large blocksize without
> needing to change the whole kernel.
>
>> Could someone point me to some documentation, or give me some clues:
>> I'd like to understand what's wrong and if I can hope to read such fs
>> with Linux on x86 (natively, without fusefs tricks or additional
>> tools).
>
> There was some work done by Robin Dong (2011-11-18) that would get us
> most of the way to just handling large blocksize filesystems directly
> by the kernel. This might be facilitated by denying mmap access to
> such filesystems, but for media/big data filesystems (as opposed to the
> root fs) this is probably not a serious limitation.
>
> I'm still interested to see a continuation of Robin's work, taking it
> to just be disk compatible with large blocksize, even if it is not
> possible to use mmap IO on such filesystems (always setting MNT_NOEXEC
> on systems where PAGE_SIZE < blocksize and not supplying f_op->mmap
> should work).

Ted mentioned a great idea for an extended version of ext4_extent
(2011-11-19 http://www.spinics.net/lists/linux-ext4/msg28999.html).

I am happy to go with this approach, which might address the concerns of
both TAOBAO corp and Andreas.
I will therefore send an RFC later to continue this work :)


>
> The reason that this is desirable is that it allows bypassing the 16TB
> file size limitations, and it also allows mounting filesystems from
> SPARC, PPC, and IA64 systems that were formatted in this manner and are
> getting old and need replacing.
>
> Cheers, Andreas



--
Best Regards
Robin Dong