2001-02-20 22:41:11

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com

On Tue, Feb 20, 2001 at 10:49:07PM +0000, Heinz Mauelshagen wrote:
>
> Hi all,
>
> a tarball of the Linux Logical Volume Manager 0.9.1 Beta 5 is available now at
>
> <http://www.sistina.com/>
>
> for download (Follow the "LVM download page" link).
>
> This release fixes several bugs.
> See the CHANGELOG file contained in the tarball for further information.
>
> A change in the i/o protocoll version *forces* you to update
> the driver as well.
> Follow instructions in PATCHES/README to achieve this please.
>
>
> Please help us to stabilize for 0.9.1 ASAP and test is as much as possible!
> Feed back related information to <[email protected]>.

The bheads in the lv_t is the wrong way to go, I just wrote an alternate patch
for rawio that keeps the bh inside the kiovec, not in the lv, this also
imrproves rawio performance in general (such allocation deallocation flood was
wasteful). No a single change is required at the lvm layer, all the changes lives in
the kiobuf layer. It's tested and it works for me.

diff -urN rawio-ref/fs/buffer.c rawio/fs/buffer.c
--- rawio-ref/fs/buffer.c Tue Feb 20 23:17:10 2001
+++ rawio/fs/buffer.c Tue Feb 20 23:17:27 2001
@@ -1240,6 +1240,29 @@
wake_up(&buffer_wait);
}

+int alloc_kiobuf_bhs(struct kiobuf * kiobuf)
+{
+ int i, j;
+
+ for (i = 0; i < KIO_MAX_SECTORS; i++)
+ if (!(kiobuf->bh[i] = get_unused_buffer_head(0))) {
+ for (j = 0; j < i; j++)
+ put_unused_buffer_head(kiobuf->bh[j]);
+ wake_up(&buffer_wait);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+void free_kiobuf_bhs(struct kiobuf * kiobuf)
+{
+ int i;
+
+ for (i = 0; i < KIO_MAX_SECTORS; i++)
+ put_unused_buffer_head(kiobuf->bh[i]);
+ wake_up(&buffer_wait);
+}
+
static void end_buffer_io_async(struct buffer_head * bh, int uptodate)
{
unsigned long flags;
@@ -1333,10 +1356,8 @@
iosize = 0;
}

- put_unused_buffer_head(tmp);
iosize += size;
}
- wake_up(&buffer_wait);

dprintk ("do_kio end %d %d\n", iosize, err);

@@ -1390,7 +1411,7 @@
int i;
int bufind;
int pageind;
- int bhind;
+ int bhind, kiobuf_bh_nr;
int offset;
unsigned long blocknr;
struct kiobuf * iobuf = NULL;
@@ -1422,6 +1443,7 @@
*/
bufind = bhind = transferred = err = 0;
for (i = 0; i < nr; i++) {
+ kiobuf_bh_nr = 0;
iobuf = iovec[i];
err = setup_kiobuf_bounce_pages(iobuf, GFP_USER);
if (err)
@@ -1444,12 +1466,8 @@

while (length > 0) {
blocknr = b[bufind++];
- tmp = get_unused_buffer_head(0);
- if (!tmp) {
- err = -ENOMEM;
- goto error;
- }
-
+ tmp = iobuf->bh[kiobuf_bh_nr++];
+
tmp->b_dev = B_FREE;
tmp->b_size = size;
tmp->b_data = (char *) (page + offset);
@@ -1460,7 +1478,8 @@
if (rw == WRITE) {
set_bit(BH_Uptodate, &tmp->b_state);
set_bit(BH_Dirty, &tmp->b_state);
- }
+ } else
+ clear_bit(BH_Uptodate, &tmp->b_state);

dprintk ("buffer %d (%d) at %p\n",
bhind, tmp->b_blocknr, tmp->b_data);
@@ -1478,7 +1497,7 @@
transferred += err;
else
goto finished;
- bhind = 0;
+ kiobuf_bh_nr = bhind = 0;
}

if (offset >= PAGE_SIZE) {
@@ -1506,17 +1525,6 @@
if (transferred)
return transferred;
return err;
-
- error:
- /* We got an error allocation the bh'es. Just free the current
- buffer_heads and exit. */
- for (i = 0; i < bhind; i++)
- put_unused_buffer_head(bh[i]);
- wake_up(&buffer_wait);
-
- clear_kiobuf_bounce_pages(iobuf);
-
- goto finished;
}

/*
diff -urN rawio-ref/fs/iobuf.c rawio/fs/iobuf.c
--- rawio-ref/fs/iobuf.c Tue Feb 20 23:17:10 2001
+++ rawio/fs/iobuf.c Tue Feb 20 23:17:24 2001
@@ -41,6 +41,11 @@
iobuf->pagelist = iobuf->page_array;
iobuf->maplist = iobuf->map_array;
iobuf->bouncelist = iobuf->bounce_array;
+ if (alloc_kiobuf_bhs(iobuf)) {
+ kmem_cache_free(kiobuf_cachep, iobuf);
+ free_kiovec(i, bufp);
+ return -ENOMEM;
+ }
*bufp++ = iobuf;
}

@@ -73,6 +78,7 @@
if (iobuf->array_len > KIO_STATIC_PAGES) {
kfree (iobuf->pagelist);
}
+ free_kiobuf_bhs(iobuf);
kmem_cache_free(kiobuf_cachep, bufp[i]);
}
}
diff -urN rawio-ref/include/linux/iobuf.h rawio/include/linux/iobuf.h
--- rawio-ref/include/linux/iobuf.h Tue Feb 20 23:17:10 2001
+++ rawio/include/linux/iobuf.h Tue Feb 20 23:17:24 2001
@@ -50,6 +50,7 @@
unsigned long page_array[KIO_STATIC_PAGES];
struct page * map_array[KIO_STATIC_PAGES];
unsigned long bounce_array[KIO_STATIC_PAGES];
+ struct buffer_head *bh[KIO_MAX_SECTORS];
};


@@ -67,6 +68,8 @@
int setup_kiobuf_bounce_pages(struct kiobuf *, int gfp_mask);
void clear_kiobuf_bounce_pages(struct kiobuf *);
void kiobuf_copy_bounce(struct kiobuf *, int direction, int max);
+extern int alloc_kiobuf_bhs(struct kiobuf *);
+extern void free_kiobuf_bhs(struct kiobuf *);

/* Direction codes for kiobuf_copy_bounce: */
enum {



I didn't had much time to look into beta5 yet but I can't see why you changed
the protocol to 11. There's no breakage between beta4 and beta5 in the
datastructures shared with userspace. I don't like gratuitous API breakage.

Just as reminer the hardsectsize fix isn't yet merged in beta5.

Andrea


2001-02-21 00:32:20

by Andreas Dilger

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com

Andrea writes:
> On Tue, Feb 20, 2001 at 10:49:07PM +0000, Heinz Mauelshagen wrote:
> > A change in the i/o protocoll version *forces* you to update
> > the driver as well.
>
> I didn't had much time to look into beta5 yet but I can't see why you changed
> the protocol to 11. There's no breakage between beta4 and beta5 in the
> datastructures shared with userspace. I don't like gratuitous API breakage.

The reason why the IOP was changed was because the VG_CREATE ioctl now
depends on the vg_number in the supplied vg_t to determine which VG minor
number to use. The old interface used the minor number of the opened
device inode, but for devfs the device inodes don't exist until the VG
is created... If you run an older kernel with new tools, you can only
use the first VG.

I suggested reverting this change for the current release, and only fixing
the LVM devfs code (which is probably still broken in other ways). Heinz
decided to update the IOP instead. Note that with the new library build,
it is possible to have multiple IOP tools installed at the same time, and
the correct ones are chosen at runtime based on the kernel IOP.

Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert

2001-02-21 01:11:37

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com

On Tue, Feb 20, 2001 at 05:31:25PM -0700, Andreas Dilger wrote:
> The reason why the IOP was changed was because the VG_CREATE ioctl now
> depends on the vg_number in the supplied vg_t to determine which VG minor
> number to use. The old interface used the minor number of the opened
> device inode, but for devfs the device inodes don't exist until the VG
> is created... If you run an older kernel with new tools, you can only
> use the first VG.

Ah, I was reading the patch incidentally against 2.2 patch where devfs support
is not included, so I wasn't thinking the devfs way ;). Thanks for the
explanation.

I assume it's not possible to mknod on top of devfs. So then we could use a
temporary device in /var/tmp or whatever for that. However those workarounds
tends to be ugly.

Probably the best way to preserve the IOP that I recommend for beta6 is to add
a new ioctl to the VG chardevice. Rename VG_CREATE to VG_CREATE_OLD.
VG_CREATE_OLD is a wrapper that calculates the minor number from the inode and
then fallbacks into VG_CREATE, and the new VG_CREATE is the one that gets
the minor of the vg from userspace.

Either ways we don't break backwards compatibilty across 0.9* cycle.

If there would been a strong reason and it would be a mess to provide backwards
compatibilty I would of course agree to raise at IOP 11, but just to avoid a
few lines of code for a wrapper or a temporary mknod on /tmp for a devfs-only
fix, I think it worth to preserve IOP 10.

Andrea

2001-02-21 03:50:21

by Richard Gooch

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com

Andrea Arcangeli writes:
> On Tue, Feb 20, 2001 at 05:31:25PM -0700, Andreas Dilger wrote:
> > The reason why the IOP was changed was because the VG_CREATE ioctl now
> > depends on the vg_number in the supplied vg_t to determine which VG minor
> > number to use. The old interface used the minor number of the opened
> > device inode, but for devfs the device inodes don't exist until the VG
> > is created... If you run an older kernel with new tools, you can only
> > use the first VG.
>
> Ah, I was reading the patch incidentally against 2.2 patch where devfs support
> is not included, so I wasn't thinking the devfs way ;). Thanks for the
> explanation.
>
> I assume it's not possible to mknod on top of devfs. So then we
> could use a temporary device in /var/tmp or whatever for that.
> However those workarounds tends to be ugly.

You definately can mknod(2) on devfs.

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2001-02-21 16:59:48

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com

On Wed, Feb 21, 2001 at 02:49:17PM +1100, Richard Gooch wrote:
> You definately can mknod(2) on devfs. [..]

So then why don't we simply create the VG ourself with the right minor number
and use it as we do without devfs? We'll still have a global 256 VG limit this
way but that's not a minor issue.

Andrea

2001-02-21 17:04:08

by Richard Gooch

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com

[LVM list removed so I don't get the nastygram]
Andrea Arcangeli writes:
> On Wed, Feb 21, 2001 at 02:49:17PM +1100, Richard Gooch wrote:
> > You definately can mknod(2) on devfs. [..]
>
> So then why don't we simply create the VG ourself with the right
> minor number and use it as we do without devfs? We'll still have a
> global 256 VG limit this way but that's not a minor issue.

Erm, that's a good question. Since I don't know the background to this
thread, can someone fill me in on what the problem/issue is?

Regards,

Richard....
Permanent: [email protected]
Current: [email protected]

2001-02-21 17:19:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com

In article <[email protected]> you wrote:
> On Wed, Feb 21, 2001 at 02:49:17PM +1100, Richard Gooch wrote:
>> You definately can mknod(2) on devfs. [..]

> So then why don't we simply create the VG ourself with the right minor number
> and use it as we do without devfs? We'll still have a global 256 VG limit this
> way but that's not a minor issue.

Yes - that's how I did it in my inital LVM & devfs patches.
It would be really good to have something devfs-like just for LVM in
setups that don't use LVM, so we could avoid mounting root read/write
for device-creation.
One of the stronger points for a per-driver devfs, IHMO.

Christoph

--
Of course it doesn't work. We've performed a software upgrade.

2001-02-22 04:13:10

by Peter Samuelson

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com


[Christoph Hellwig]
> It would be really good to have something devfs-like just for LVM in
> setups that don't use LVM, so we could avoid mounting root read/write
^^^devfs?
> for device-creation.

For most people, read/write access to /dev is rarely needed -- how
often do you create new VGs or LVs? How often do you run MAKEDEV or
vgscan? Sometimes you need to change tty inodes but that's what
/dev/pts is for.

If you do need read-write access to /dev but not /, put it on a
separate filesystem. Leave just a skeletal /dev in your root
filesystem, enough to bootstrap.

Peter

2001-02-22 09:47:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com

On Wed, Feb 21, 2001 at 10:12:25PM -0600, Peter Samuelson wrote:
>
> [Christoph Hellwig]
> > It would be really good to have something devfs-like just for LVM in
> > setups that don't use LVM, so we could avoid mounting root read/write
> ^^^devfs?

Yes...

> > for device-creation.
>
> For most people, read/write access to /dev is rarely needed -- how
> often do you create new VGs or LVs? How often do you run MAKEDEV or
> vgscan?

On every bootup, _before_ doing mount -a

Christoph

--
Of course it doesn't work. We've performed a software upgrade.

2001-02-22 10:53:19

by Peter Samuelson

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com


[Peter Samuelson]
> > How often do you run MAKEDEV or vgscan?

[Christoph Hellwig]
> On every bootup, _before_ doing mount -a

A mere 'vgchange -ay' works fine for *my* boot processes. Is there a
particular reason to do 'vgscan' every time? (I'm not arguing -- just
wondering.)

Peter

2001-02-22 11:54:19

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [lvm-devel] *** ANNOUNCEMENT *** LVM 0.9.1 beta5 available at www.sistina.com

On Thu, Feb 22, 2001 at 04:52:51AM -0600, Peter Samuelson wrote:
>
> [Peter Samuelson]
> > > How often do you run MAKEDEV or vgscan?
>
> [Christoph Hellwig]
> > On every bootup, _before_ doing mount -a
>
> A mere 'vgchange -ay' works fine for *my* boot processes. Is there a
> particular reason to do 'vgscan' every time? (I'm not arguing -- just
> wondering.)

It is that what is suggested by the LVM crew. I have also tried to use
it without vgscan - it work then and whenn, but not 100% percent reliable.

Christoph

--
Of course it doesn't work. We've performed a software upgrade.
Whip me. Beat me. Make me maintain AIX.