2010-07-11 17:04:46

by Patrick J. LoPresti

Subject: [PATCH 2/2] OCFS2: Allow huge (> 16 TiB) volumes to mount

The OCFS2 developers have already done all of the hard work to allow
volumes larger than 16 TiB. But there is still a "sanity check" in
fs/ocfs2/super.c that prevents the mounting of such volumes, even when
the cluster size and journal options would allow it.

This patch replaces that sanity check with a more sophisticated one
that allows a huge volume to mount provided that (a) it is addressable
by the raw word/address size of the system (borrowing a test from
ext4); (b) the volume is using JBD2; and (c) the
JBD2_FEATURE_INCOMPAT_64BIT flag is set on the journal.
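
To make the order of the three conditions concrete, here is a minimal userspace sketch of the decision, not the kernel code itself. The function name model_check_addressable and its parameters are hypothetical; the "addressable" flag simply stands in for the ext4-style test in (a).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the mount-time decision (hypothetical names; the
 * real check reads these facts from the superblock and the journal).
 * Returns 0 if the volume may be mounted, -EFBIG otherwise.
 */
int model_check_addressable(uint64_t blocks, bool addressable,
			    bool uses_jbd2, bool journal_64bit)
{
	uint64_t max_block;

	assert(blocks > 0);	/* an empty volume has no highest block */
	max_block = blocks - 1;

	/* (a) The block layer and page cache must reach the last block. */
	if (!addressable)
		return -EFBIG;

	/* A 32-bit block number is always fine, whatever the journal. */
	if (max_block <= UINT32_MAX)
		return 0;

	/* (b) and (c): a huge volume needs JBD2 with 64-bit block numbers. */
	if (!(uses_jbd2 && journal_64bit))
		return -EFBIG;

	return 0;
}
```

In this model a small volume mounts regardless of the journal flags, while a volume with more than 2^32 blocks mounts only when both journal conditions hold.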

I factored out the sanity check into its own function. I also moved it
from ocfs2_initialize_super() down to ocfs2_check_volume(); any earlier,
and the journal will not have been initialized yet.

This patch is one of a pair, and it depends on the other ("JBD2: Allow
feature checks before journal recovery").

I have tested this patch on small volumes, huge volumes, and huge
volumes without 64-bit block support in the journal. All of them appear
to work or to fail gracefully, as appropriate.

Signed-off-by: Patrick LoPresti <[email protected]>


diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 0eaa929..b809508 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1991,6 +1991,47 @@ static int ocfs2_setup_osb_uuid(struct ocfs2_super *osb, const unsigned char *uu
 	return 0;
 }

+/* Check to make sure entire volume is addressable on this system.
+   Requires osb_clusters_at_boot to be valid and for the journal to
+   have been initialized by ocfs2_journal_init(). */
+static int ocfs2_check_addressable(struct ocfs2_super *osb)
+{
+	int status = 0;
+	u64 max_block =
+		ocfs2_clusters_to_blocks(osb->sb,
+					 osb->osb_clusters_at_boot) - 1;
+
+	/* Absolute addressability check (borrowed from ext4/super.c) */
+	if ((max_block >
+	     (sector_t)(~0LL) >> (osb->sb->s_blocksize_bits - 9)) ||
+	    (max_block > (pgoff_t)(~0LL) >> (PAGE_CACHE_SHIFT -
+					     osb->sb->s_blocksize_bits))) {
+		mlog(ML_ERROR, "Volume too large "
+		     "to mount safely on this system");
+		status = -EFBIG;
+		goto out;
+	}
+
+	/* 32-bit block number is always OK. */
+	if (max_block <= (u32)~0UL)
+		goto out;
+
+	/* Volume is "huge", so see if our journal is new enough to
+	   support it. */
+	if (!(OCFS2_HAS_COMPAT_FEATURE(osb->sb,
+				       OCFS2_FEATURE_COMPAT_JBD2_SB) &&
+	      jbd2_journal_check_used_features(osb->journal->j_journal, 0, 0,
+					       JBD2_FEATURE_INCOMPAT_64BIT))) {
+		mlog(ML_ERROR, "The journal cannot address the entire volume. "
+		     "Enable the 'block64' journal option with tunefs.ocfs2");
+		status = -EFBIG;
+		goto out;
+	}
+
+ out:
+	return status;
+}
+
 static int ocfs2_initialize_super(struct super_block *sb,
 				  struct buffer_head *bh,
 				  int sector_size,
@@ -2215,14 +2256,6 @@ static int ocfs2_initialize_super(struct super_block *sb,
 		goto bail;
 	}

-	if (ocfs2_clusters_to_blocks(osb->sb, le32_to_cpu(di->i_clusters) - 1)
-	    > (u32)~0UL) {
-		mlog(ML_ERROR, "Volume might try to write to blocks beyond "
-		     "what jbd can address in 32 bits.\n");
-		status = -EINVAL;
-		goto bail;
-	}


2010-07-13 00:21:37

by Andreas Dilger

Subject: Re: [PATCH 2/2] OCFS2: Allow huge (> 16 TiB) volumes to mount

On 2010-07-11, at 11:04, Patrick J. LoPresti wrote:
> +/* Check to make sure entire volume is addressable on this system.
> +   Requires osb_clusters_at_boot to be valid and for the journal to
> +   have been initialized by ocfs2_journal_init(). */
> +static int ocfs2_check_addressable(struct ocfs2_super *osb)
> +{
> +	/* Absolute addressability check (borrowed from ext4/super.c) */
> +	if ((max_block >
> +	     (sector_t)(~0LL) >> (osb->sb->s_blocksize_bits - 9)) ||
> +	    (max_block > (pgoff_t)(~0LL) >> (PAGE_CACHE_SHIFT -
> +					     osb->sb->s_blocksize_bits))) {
> +		mlog(ML_ERROR, "Volume too large "
> +		     "to mount safely on this system");
> +		status = -EFBIG;
> +		goto out;
> +	}

This hunk of code is actually in several filesystems. It wouldn't be a bad idea to make it a library function that can be called by the filesystem to check that the kernel page cache and block layer can handle these large filesystems.

Cheers, Andreas






2010-07-13 01:08:52

by Patrick J. LoPresti

Subject: Re: [PATCH 2/2] OCFS2: Allow huge (> 16 TiB) volumes to mount

On Mon, Jul 12, 2010 at 5:21 PM, Andreas Dilger <[email protected]> wrote:
> On 2010-07-11, at 11:04, Patrick J. LoPresti wrote:
> >
>> +	/* Absolute addressability check (borrowed from ext4/super.c) */
>> +	if ((max_block >
>> +	     (sector_t)(~0LL) >> (osb->sb->s_blocksize_bits - 9)) ||
>> +	    (max_block > (pgoff_t)(~0LL) >> (PAGE_CACHE_SHIFT

2010-07-13 01:25:11

by Dave Chinner

Subject: Re: [PATCH 2/2] OCFS2: Allow huge (> 16 TiB) volumes to mount

On Mon, Jul 12, 2010 at 06:08:51PM -0700, Patrick J. LoPresti wrote:
> On Mon, Jul 12, 2010 at 5:21 PM, Andreas Dilger <[email protected]> wrote:
> > On 2010-07-11, at 11:04, Patrick J. LoPresti wrote:
> > >
> >> +	/* Absolute addressability check (borrowed from ext4/super.c) */
> >> +	if ((max_block >
> >> +	     (sector_t)(~0LL) >> (osb->sb->s_blocksize_bits - 9)) ||
> >> +	    (max_block > (pgoff_t)(~0LL) >> (PAGE_CACHE_SHIFT -
> >> +					     osb->sb->s_blocksize_bits))) {
> >> +		mlog(ML_ERROR, "Volume too large "
> >> +		     "to mount safely on this system");
> >> +		status = -EFBIG;
> >> +		goto out;
> >> +	}
> >
> > This hunk of code is actually in several filesystems. It wouldn't be a bad idea to make it a library function that can be called by the filesystem to check that the kernel page cache and block layer can handle these large filesystems.
>
> True, but some of them do it differently (e.g. see the #if switch in
> xfs_sb_validate_fsb_count). Tracking down all variants and changing
> them is a much larger task than my simple patch.

The XFS code is different to the above because there is still a 16TB
size limit on 32 bit systems (i.e. page cache address limits). IOWs,
you can't just remove the above 16TB check unless you (i.e. OCFS2)
handle >16TB block devices on 32 bit systems correctly...
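
For concreteness, the arithmetic behind that limit can be sketched in userspace. This is only a model of the two comparisons in the quoted hunk: max_addressable_block and type_max are made-up names, and the type widths are passed in as plain numbers rather than taken from the real sector_t and pgoff_t.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned type that is `bits` wide (1..64). */
uint64_t type_max(unsigned bits)
{
	return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * Highest block number that passes both comparisons in the hunk, for a
 * hypothetical system whose sector and page-index types have the given
 * widths. Mirrors the shifts in the patch as written.
 */
uint64_t max_addressable_block(unsigned sector_bits, unsigned pgoff_bits,
			       unsigned blocksize_bits, unsigned page_shift)
{
	uint64_t by_sector, by_page;

	/* Blocks are at least one 512-byte sector and at most one page. */
	assert(blocksize_bits >= 9 && blocksize_bits <= page_shift);

	by_sector = type_max(sector_bits) >> (blocksize_bits - 9);
	by_page = type_max(pgoff_bits) >> (page_shift - blocksize_bits);

	return by_sector < by_page ? by_sector : by_page;
}
```

With 4 KiB blocks and 4 KiB pages, a 32-bit page index caps the highest block number at 2^32 - 1, which corresponds to 16 TiB, even when the sector type is 64 bits wide.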

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-07-13 01:37:04

by Patrick J. LoPresti

Subject: Re: [PATCH 2/2] OCFS2: Allow huge (> 16 TiB) volumes to mount

On Mon, Jul 12, 2010 at 6:25 PM, Dave Chinner <[email protected]> wrote:
>
> The XFS code is different to the above because there is still a 16TB
> size limit on 32 bit systems (i.e. page cache address limits). IOWs,
> you can't just remove the above 16TB check unless you (i.e. OCFS2)
> handle >16TB block devices on 32 bit systems correctly...

If you look at my patch, you will see that is precisely what it does.
As the comments indicate, it uses the exact same check as ext4, which
will correctly refuse to mount huge volumes on 32-bit systems.

The XFS test appears to be the same thing written a little
differently. Andreas is suggesting that somebody should factor out
this check into a common library routine. That sounds like a fine
idea, but it also sounds orthogonal to the (simple and useful) patch I
am attempting to submit.

- Pat

2010-07-13 04:46:54

by Andreas Dilger

Subject: Re: [PATCH 2/2] OCFS2: Allow huge (> 16 TiB) volumes to mount

On 2010-07-12, at 19:08, Patrick J. LoPresti wrote:
> On Mon, Jul 12, 2010 at 5:21 PM, Andreas Dilger <[email protected]> wrote:
>> On 2010-07-11, at 11:04, Patrick J. LoPresti wrote:
>>>
>>> +	/* Absolute addressability check (borrowed from ext4/super.c) */
>>> +	if ((max_block >
>>> +	     (sector_t)(~0LL) >> (osb->sb->s_blocksize_bits - 9)) ||
>>> +	    (max_block > (pgoff_t)(~0LL) >> (PAGE_CACHE_SHIFT -
>>> +					     osb->sb->s_blocksize_bits))) {
>>> +		mlog(ML_ERROR, "Volume too large "
>>> +		     "to mount safely on this system");
>>> +		status = -EFBIG;
>>> +		goto out;
>>> +	}
>>
>> This hunk of code is actually in several filesystems. It wouldn't be a bad idea to make it a library function that can be called by the filesystem to check that the kernel page cache and block layer can handle these large filesystems.
>
> True, but some of them do it differently (e.g. see the #if switch in
> xfs_sb_validate_fsb_count). Tracking down all variants and changing
> them is a much larger task than my simple patch.
>
> Are you suggesting I need to do this before my patch is accepted at
> all? Or is this a refactoring that can happen later?

I'm just suggesting it should be done at some point. I thought it would be better to do it first, rather than add yet another copy of this code. That said, I hate to block useful fixes because of cleanup (and I have no control over OCFS2 anyway :-). However, I've found that once the fix is in people usually forget (or become too busy) to do the cleanup and it just lingers on unseen.

Cheers, Andreas






2010-07-13 05:00:12

by Patrick J. LoPresti

Subject: Re: [PATCH 2/2] OCFS2: Allow huge (> 16 TiB) volumes to mount

On Mon, Jul 12, 2010 at 9:46 PM, Andreas Dilger <[email protected]> wrote:
> On 2010-07-12, at 19:08, Patrick J. LoPresti wrote:
>>
>> Are you suggesting I need to do this before my patch is accepted at
>> all? Or is this a refactoring that can happen later?
>
> I'm just suggesting it should be done at some point. I thought it would be better to do it first, rather than add yet another copy of this code. That said, I hate to block useful fixes because of cleanup (and I have no control over OCFS2 anyway :-). However, I've found that once the fix is in people usually forget (or become too busy) to do the cleanup and it just lingers on unseen.

I hear you.

I do not object to factoring out the basic addressability test and
using it in my patch, leaving it for others -- like yourself :-) -- to
modify other file systems to invoke it.

Does that sound like a reasonable compromise? If so, where should the
function live and what should it be called, do you think?

- Pat

2010-07-13 08:10:13

by Joel Becker

Subject: Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge (> 16 TiB) volumes to mount

On Mon, Jul 12, 2010 at 10:00:10PM -0700, Patrick J. LoPresti wrote:
> On Mon, Jul 12, 2010 at 9:46 PM, Andreas Dilger <[email protected]> wrote:
> > On 2010-07-12, at 19:08, Patrick J. LoPresti wrote:
> >>
> >> Are you suggesting I need to do this before my patch is accepted at
> >> all? Or is this a refactoring that can happen later?
> >
> > I'm just suggesting it should be done at some point. I thought it would be better to do it first, rather than add yet another copy of this code. That said, I hate to block useful fixes because of cleanup (and I have no control over OCFS2 anyway :-). However, I've found that once the fix is in people usually forget (or become too busy) to do the cleanup and it just lingers on unseen.
>
> I hear you.
>
> I do not object to factoring out the basic addressability test and
> using it in my patch, leaving it for others -- like yourself :-) -- to
> modify other file systems to invoke it.

I think you should modify ext3 and xfs, as they clearly are
partaking of this functionality. I'll happily review it for you. Put
the call in fs/libfs.c. Call it generic_check_addressable(struct
super_block *super).
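
As a rough idea of what such a helper might look like, here is a userspace sketch. It is an assumption, not existing kernel code: the model_ types are stand-ins whose widths are fixed here but depend on architecture and configuration in the kernel, and the block count is carried in a hypothetical nr_blocks field only because the sketch has to get it from somewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-ins for kernel types and constants (assumed widths, userspace only). */
typedef uint64_t model_sector_t;
typedef unsigned long model_pgoff_t;
#define MODEL_PAGE_SHIFT 12

struct model_super_block {
	unsigned char s_blocksize_bits;	/* log2 of the block size */
	uint64_t nr_blocks;		/* hypothetical field for this sketch */
};

/* Returns 0 if every block is reachable, -EFBIG if the volume is too big. */
int model_generic_check_addressable(const struct model_super_block *super)
{
	unsigned bits = super->s_blocksize_bits;
	uint64_t max_block;

	/* The shifts below are only defined for 512-byte to page-sized blocks. */
	assert(bits >= 9 && bits <= MODEL_PAGE_SHIFT);

	if (super->nr_blocks == 0)
		return 0;
	max_block = super->nr_blocks - 1;

	/* Same two comparisons as the hunk under discussion. */
	if (max_block > (model_sector_t)(~0ULL) >> (bits - 9))
		return -EFBIG;
	if (max_block > (model_pgoff_t)(~0ULL) >> (MODEL_PAGE_SHIFT - bits))
		return -EFBIG;

	return 0;
}
```

A filesystem would call something like this once at mount time and refuse the mount on a nonzero return; where the block count should actually come from is left open here.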

Joel


--

"The only way to get rid of a temptation is to yield to it."
- Oscar Wilde

Joel Becker
Consulting Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127