2004-01-27 19:16:35

by Florian Huber

Subject: md raid + jfs + jfs_fsck

Hello MLs,
Today I switched from no RAID to Linux kernel software RAID 1 on a JFS
and an ext3 partition. Both are working fine, but jfs_fsck reports an
error on the JFS md device (md2 <-- hda3+hdc3):

Superblock is corrupt and cannot be repaired
since both primary and secondary copies are corrupt.

Did I miss something? jfs_fsck runs without any error on hda3 and hdc3,
but fails on md2.

I'm using the 2.6.2-rc2 kernel with raid autodetection.

TIA
Florian


--
Florian Huber

Key ID: D9D50EA2
Fingerprint: 0241 C329 E355 9B94 8D34 F637 4EB9 1B1D D9D5 0EA2

BOFH Excuse #413:
Cow-tippers tipped a cow onto the server.



2004-01-27 19:29:52

by Dave Kleikamp

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Tue, 2004-01-27 at 13:15, Florian Huber wrote:
> Hello MLs,
> Today I switched from no RAID to Linux kernel software RAID 1 on a JFS
> and an ext3 partition. Both are working fine, but jfs_fsck reports an
> error on the JFS md device (md2 <-- hda3+hdc3):
>
> Superblock is corrupt and cannot be repaired
> since both primary and secondary copies are corrupt.
>
> Did I miss something? jfs_fsck runs without any error on hda3 and hdc3,
> but fails on md2.

I wonder if JFS is having trouble getting the partition size. Can you
run jfs_fsck with the -v flag to see what part of the superblock it
doesn't like?
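For context on what "getting the partition size" involves: a userspace tool on Linux typically asks the kernel for a block device's size with the `BLKGETSIZE64` ioctl. The sketch below is not jfsutils code; the helper name `get_device_bytes` is hypothetical, and the `fstat()` fallback for regular files (useful for testing against an image file) is an addition of this sketch.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */

/* Hypothetical helper: store the size in bytes of a block device
   (or of a regular image file) in *bytes. Returns 0 on success, -1 on error. */
int get_device_bytes(const char *path, uint64_t *bytes)
{
    struct stat st;
    int rc = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0) {
        if (S_ISBLK(st.st_mode)) {
            /* For /dev/md2 the kernel reports the md device size,
               which can be smaller than hda3 or hdc3 themselves. */
            if (ioctl(fd, BLKGETSIZE64, bytes) == 0)
                rc = 0;
        } else if (S_ISREG(st.st_mode)) {
            /* Not a block device: use the file length instead. */
            *bytes = (uint64_t)st.st_size;
            rc = 0;
        }
    }
    close(fd);
    return rc;
}
```

Run against /dev/md2 and against /dev/hda3, the two results would be expected to differ by whatever space md keeps for itself.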

> I'm using the 2.6.2-rc2 kernel with raid autodetection.
>
> TIA
> Florian

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center

2004-01-27 19:40:07

by Florian Huber

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Tue, 2004-01-27 at 20:28, Dave Kleikamp wrote:
> I wonder if JFS is having trouble getting the partition size. Can you
> run jfs_fsck with the -v flag to see what part of the superblock it
> doesn't like?

The current device is: /dev/md2
Open(...READ/WRITE EXCLUSIVE...) returned rc = 0
Incorrect jlog length detected in the superblock (P).
Incorrect jlog length detected in the superblock (S).
Superblock is corrupt and cannot be repaired
since both primary and secondary copies are corrupt.

--
Florian Huber


BOFH Excuse #147:
Party-bug in the Aloha protocol.



2004-01-27 19:53:11

by Florian Huber

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Tue, 2004-01-27 at 20:39, Florian Huber wrote:

> Open(...READ/WRITE EXCLUSIVE...) returned rc = 0

I forgot to mention that the raid device is mounted. But it makes no
difference if I run fsck from other boot media.



2004-01-27 20:44:17

by Dave Kleikamp

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Tue, 2004-01-27 at 13:39, Florian Huber wrote:
> On Tue, 2004-01-27 at 20:28, Dave Kleikamp wrote:
> > I wonder if JFS is having trouble getting the partition size. Can you
> > run jfs_fsck with the -v flag to see what part of the superblock it
> > doesn't like?
>
> The current device is: /dev/md2
> Open(...READ/WRITE EXCLUSIVE...) returned rc = 0
> Incorrect jlog length detected in the superblock (P).
> Incorrect jlog length detected in the superblock (S).
> Superblock is corrupt and cannot be repaired
> since both primary and secondary copies are corrupt.

My guess is that software raid is stealing a few blocks from the end of
the partition, and JFS doesn't like that, since its journal goes all
the way to the end. I've created a patch that will shorten the journal
if it can safely be done. It was built against the latest jfsutils cvs
tree, but applies to version 1.1.4:
http://www10.software.ibm.com/developer/opensource/jfs/project/pub/jfsutils-1.1.4.tar.gz

Please let me know if this fixes it. (lkml: Yeah, I know the code is
indented too far. It's outside the kernel, so give me a break.)

Index: jfsutils/fsck/fsckmeta.c
===================================================================
RCS file: /usr/cvs/jfs/jfsutils/fsck/fsckmeta.c,v
retrieving revision 1.18
diff -u -p -r1.18 fsckmeta.c
--- jfsutils/fsck/fsckmeta.c 17 Dec 2003 20:28:47 -0000 1.18
+++ jfsutils/fsck/fsckmeta.c 27 Jan 2004 20:27:56 -0000
@@ -2124,9 +2124,34 @@ int validate_super(int which_super)
}
agg_blks_in_aggreg += jlog_length_from_pxd;
if (agg_blks_in_aggreg > agg_blks_on_device) {
+ int64_t short_blocks;
+ uint32_t new_jlog_size;
/* log length is bad */
vs_rc = FSCK_BADSBFJLL;
- fsck_send_msg(fsck_BADSBFJLL, fsck_ref_msg(which_super));
+ /* Let's try to fix it. :^) */
+ short_blocks = agg_blks_in_aggreg -
+ agg_blks_on_device;
+ new_jlog_size = (jlog_length_from_pxd -
+ short_blocks) *
+ sb_ptr->s_bsize;
+ /* logform likes multiples of 16K */
+ new_jlog_size &= 0xffffc000;
+ /* Don't let it go below 1/2 MB */
+ if (new_jlog_size > (1 << 19)) {
+ printf("The volume seems to have shrunk by %lld blocks.\n"
+ "Will attempt to fix.\n",
+ (long long)short_blocks);
+ jlog_length_from_pxd =
+ new_jlog_size /
+ sb_ptr->s_bsize;
+ PXDlength(&(sb_ptr->s_logpxd),
+ jlog_length_from_pxd);
+ vs_rc = ujfs_put_superblk(
+ Dev_IOPort, sb_ptr, 1);
+ }
+ if (vs_rc)
+ fsck_send_msg(fsck_BADSBFJLL,
+ fsck_ref_msg(which_super));
}
}
}
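The size arithmetic in the patch can be restated as a standalone function, which makes it easier to check by hand. This is a sketch only: the name `shrunk_jlog_blocks` and its parameters are hypothetical and not part of jfsutils. Like the patch, it assumes the journal sits at the very end of the aggregate, that its length should be a multiple of 16K, and that it must stay above 1/2 MB.

```c
#include <stdint.h>

/* Hypothetical helper restating the patch's arithmetic; not jfsutils API. */
#define JLOG_ALIGN_BYTES (16 * 1024)  /* "logform likes multiples of 16K" */
#define JLOG_MIN_BYTES   (1 << 19)    /* "Don't let it go below 1/2 MB" */

/*
 * fs_blocks:   aggregate size in fs blocks, including the journal at its end
 * jlog_blocks: current journal length in fs blocks
 * dev_blocks:  fs blocks actually available on the device (e.g. /dev/md2)
 * bsize:       filesystem block size in bytes
 *
 * Returns the journal length in blocks that fits on the device, or 0 if
 * the journal would have to drop to 1/2 MB or below.
 */
uint64_t shrunk_jlog_blocks(uint64_t fs_blocks, uint64_t jlog_blocks,
                            uint64_t dev_blocks, uint32_t bsize)
{
    uint64_t short_blocks, new_bytes;

    if (fs_blocks <= dev_blocks)
        return jlog_blocks;                 /* everything fits: no change */

    short_blocks = fs_blocks - dev_blocks;
    if (short_blocks >= jlog_blocks)
        return 0;                           /* lost more than the journal */

    new_bytes = (jlog_blocks - short_blocks) * bsize;
    new_bytes &= ~(uint64_t)(JLOG_ALIGN_BYTES - 1);  /* round down to 16K */

    if (new_bytes <= JLOG_MIN_BYTES)
        return 0;
    return new_bytes / bsize;
}
```

With 4K blocks, losing 16 blocks from an 8192-block (32 MB) journal leaves 8176 blocks, which is already a 16K multiple; losing a single block instead gives 8191 blocks, which rounds down to 8188.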

--
David Kleikamp
IBM Linux Technology Center

2004-01-27 20:53:33

by Christoph Hellwig

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Tue, Jan 27, 2004 at 02:43:05PM -0600, Dave Kleikamp wrote:
> My guess is that software raid is stealing a few blocks from the end of
> the partition,

Yes, it does. But JFS should get the right size from the gendisk anyway.
Or did you create the raid with the filesystem already in place? While that
appears to work for a non-full ext2/ext3 filesystem it's not something you
should do because it makes the filesystem internal bookkeeping wrong and
you'll run into trouble with any filesystem sooner or later.
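How many blocks md takes can be worked out, assuming the md 0.90 persistent superblock format (the format that kernel raid autodetection relies on). md reserves a 64 KiB area at the end of each component, placed at the last 64 KiB-aligned boundary, so the md device comes out between 64 KiB and just under 128 KiB smaller than the partition. The constants below mirror the kernel's md 0.90 definitions; the helper name `md_lost_fs_blocks` is hypothetical.

```c
#include <stdint.h>

/* Values as in the kernel's md 0.90 superblock definitions. */
#define MD_RESERVED_BYTES   (64 * 1024)
#define MD_RESERVED_SECTORS (MD_RESERVED_BYTES / 512)

/* Usable size of a component once md places its superblock: round down
   to a 64 KiB boundary, then subtract the 64 KiB reserved area. */
static uint64_t md_new_size_sectors(uint64_t part_sectors)
{
    return (part_sectors & ~(uint64_t)(MD_RESERVED_SECTORS - 1))
           - MD_RESERVED_SECTORS;
}

/* Hypothetical helper: fs blocks of size bsize that a filesystem made on
   the bare partition loses once the partition becomes an md component.
   Assumes part_sectors >= 2 * MD_RESERVED_SECTORS. */
uint64_t md_lost_fs_blocks(uint64_t part_sectors, uint32_t bsize)
{
    uint64_t part_blocks = part_sectors * 512 / bsize;
    uint64_t md_blocks = md_new_size_sectors(part_sectors) * 512 / bsize;
    return part_blocks - md_blocks;
}
```

For a 1,000,000-sector partition with 4K blocks this gives 24 lost blocks; for a 64 KiB-aligned 1,048,576-sector partition it gives 16. If the filesystem filled the whole partition, those are the last blocks of the journal, which would be consistent with the "Incorrect jlog length" messages.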

2004-01-27 21:19:47

by Florian Huber

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Tue, 2004-01-27 at 21:53, Christoph Hellwig wrote:
> On Tue, Jan 27, 2004 at 02:43:05PM -0600, Dave Kleikamp wrote:
> Yes, it does. But JFS should get the right size from the gendisk anyway.
> Or did you create the raid with the filesystem already in place?
Yes, I did.

> While that appears to work for a non-full ext2/ext3 filesystem it's not something you
> should do because it makes the filesystem internal bookkeeping wrong and
> you'll run into trouble with any filesystem sooner or later.

So, remove the raid, create a new raid "1" with one partition, create
a JFS filesystem on top of it, copy all files, and then add the other disk
to the raid?



2004-01-27 21:22:46

by Christoph Hellwig

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Tue, Jan 27, 2004 at 10:19:45PM +0100, Florian Huber wrote:
> So, remove the raid, create a new raid "1" with one partition, create
> a JFS filesystem on top of it, copy all files, and then add the other disk
> to the raid?

You can't partition md devices (yet), but otherwise yes. I think you can
also still create md devices without the persistent superblock, but those
were always a pain to maintain.

2004-01-27 21:47:15

by Gene Heskett

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Tuesday 27 January 2004 15:53, Christoph Hellwig wrote:
>On Tue, Jan 27, 2004 at 02:43:05PM -0600, Dave Kleikamp wrote:
>> My guess is that software raid is stealing a few blocks from the
>> end of the partition,
>
>Yes, it does. But JFS should get the right size from the gendisk
> anyway. Or did you create the raid with the filesystem already
> in place? While that appears to work for a non-full ext2/ext3
> filesystem it's not something you should do because it makes the
> filesystem internal bookkeeping wrong and you'll run into trouble
> with any filesystem sooner or later.
>
I wonder whether this discussion has anything to do with what we perceive
as an excruciatingly long resync time. Should the array be
reformatted after startup with a fresh mkreiserfs, in the event that's
what we are running on a raid5?

If one exists, please point me to a good discussion, web site, or
whatever, maybe better than what comes with mdtools.


--
Cheers, Gene
"There are four boxes to be used in defense of liberty: soap,
ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.22% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

2004-01-28 02:48:10

by Theodore Ts'o

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Tue, Jan 27, 2004 at 08:53:24PM +0000, Christoph Hellwig wrote:
> Yes, it does. But JFS should get the right size from the gendisk anyway.
> Or did you create the raid with the filesystem already in place? While that
> appears to work for a non-full ext2/ext3 filesystem it's not something you
> should do because it makes the filesystem internal bookkeeping wrong and
> you'll run into trouble with any filesystem sooner or later.

The key words here are *appears* to work. No matter what the
filesystem, as Christoph says, you'll run into trouble sooner or
later....

- Ted

2004-01-28 09:24:29

by Luigi Genoni

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck



On Tue, 27 Jan 2004, Christoph Hellwig wrote:
>
> Yes, it does. But JFS should get the right size from the gendisk anyway.
> Or did you create the raid with the filesystem already in place? While that
> appears to work for a non-full ext2/ext3 filesystem it's not something you
> should do because it makes the filesystem internal bookkeeping wrong and
> you'll run into trouble with any filesystem sooner or later.
>
In most situations, creating a new FS on a RAID1 MD is not an option.
It happens that you have to mirror a partition, maybe a large one, that
already has a filesystem on top of it. Then what should you do?
Back up, mirror, and then restore? Sometimes even that is not possible.
Then you accept dealing with the possible problems...

Luigi

2004-01-28 09:39:12

by Christoph Hellwig

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Wed, Jan 28, 2004 at 10:24:14AM +0100, [email protected] wrote:
> In most situations, creating a new FS on a RAID1 MD is not an option.
> It happens that you have to mirror a partition, maybe a large one, that
> already has a filesystem on top of it. Then what should you do?
> Back up, mirror, and then restore? Sometimes even that is not possible.
> Then you accept dealing with the possible problems...

Then you need to shrink the filesystem. As long as the space isn't used
yet, it's rather trivial for most on-disk formats, but you absolutely need
to do it to be safe.
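To decide how far to shrink before turning the partition into a mirror, the filesystem has to end at or before the md device's usable size. A rough sketch, again assuming the md 0.90 superblock layout (64 KiB reserved at the last 64 KiB-aligned boundary of the partition); `md_required_shrink` is a hypothetical helper, and the shrinking itself still has to be done with the filesystem's own resize tool.

```c
#include <stdint.h>

/* md 0.90 reserves 64 KiB = 128 sectors at the end of each component. */
#define MD_RESERVED_SECTORS 128

/* Hypothetical helper: number of fs blocks the filesystem must lose
   so it fits inside the md device (0 if it already fits). */
uint64_t md_required_shrink(uint64_t fs_blocks, uint32_t bsize,
                            uint64_t part_sectors)
{
    /* Same rounding the md 0.90 format uses to place its superblock. */
    uint64_t md_sectors =
        (part_sectors & ~(uint64_t)(MD_RESERVED_SECTORS - 1))
        - MD_RESERVED_SECTORS;
    uint64_t md_blocks = md_sectors * 512 / bsize;

    return fs_blocks > md_blocks ? fs_blocks - md_blocks : 0;
}
```

A filesystem that fills a 1,000,000-sector partition with 4K blocks (125,000 blocks) would need to lose 24 blocks; one that already ends at or before block 124,976 needs no change.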

2004-01-28 10:54:47

by Jakob Oestergaard

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Wed, Jan 28, 2004 at 10:24:14AM +0100, [email protected] wrote:
>
>
> On Tue, 27 Jan 2004, Christoph Hellwig wrote:
> >
> > Yes, it does. But JFS should get the right size from the gendisk anyway.
> > Or did you create the raid with the filesystem already in place? While that
> > appears to work for a non-full ext2/ext3 filesystem it's not something you
> > should do because it makes the filesystem internal bookkeeping wrong and
> > you'll run into trouble with any filesystem sooner or later.
> >
> In most situations, creating a new FS on a RAID1 MD is not an option.
> It happens that you have to mirror a partition, maybe a large one, that
> already has a filesystem on top of it. Then what should you do?
> Back up, mirror, and then restore? Sometimes even that is not possible.
> Then you accept dealing with the possible problems...

Read The Fine Manual. :)

http://unthought.net/Software-RAID.HOWTO/Software-RAID.HOWTO-7.html#ss7.4

"Method 2" covers exactly this. It is written for a root filesystem, but
you should be able to adapt it.

/ jakob


2004-01-28 16:29:53

by Luigi Genoni

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck



On Wed, 28 Jan 2004, Christoph Hellwig wrote:

> Date: Wed, 28 Jan 2004 09:38:51 +0000
> From: Christoph Hellwig <[email protected]>
> Then you need to shrink the filesystem. As long as the space isn't used
> yet, it's rather trivial for most on-disk formats, but you absolutely need
> to do it to be safe.
>

Perfect! In fact, that is what I usually do.
Then it would be ideal to be able to shrink a FS online, and not
all Linux filesystems can do that. The real problem is that not everyone knows
about this or is aware of it, as you can see from this thread. Maybe something
about this should be added to the kernel documentation?

Luigi

2004-01-29 22:40:19

by Helge Hafting

Subject: Re: [Jfs-discussion] md raid + jfs + jfs_fsck

On Wed, Jan 28, 2004 at 10:24:14AM +0100, [email protected] wrote:
>
>
> On Tue, 27 Jan 2004, Christoph Hellwig wrote:
> >
> > Yes, it does. But JFS should get the right size from the gendisk anyway.
> > Or did you create the raid with the filesystem already in place? While that
> > appears to work for a non-full ext2/ext3 filesystem it's not something you
> > should do because it makes the filesystem internal bookkeeping wrong and
> > you'll run into trouble with any filesystem sooner or later.
> >
> In most situations, creating a new FS on a RAID1 MD is not an option.
> It happens that you have to mirror a partition, maybe a large one, that
> already has a filesystem on top of it. Then what should you do?
> Back up, mirror, and then restore? Sometimes even that is not possible.
> Then you accept dealing with the possible problems...
>
If you need to mirror it, then you have an empty mirror disk ready, right?
Create a degraded array on the mirror disk, then make a fs there. Then
copy everything over from the original partition. After this, change
the original partition to raid and add it to the array. (It will
then be updated from the copy.)


This approach works with all filesystems, including those that
cannot be resized. Data is copied twice instead of once, but
the copying step will defragment files, and you have the option
of changing filesystems or taking advantage of sparse files if
you so wish.

Helge Hafting