2002-01-14 17:56:42

by Alexander Viro

[permalink] [raw]
Subject: [RFLART] kdev_t in ioctls

Linus, at least some ioctls (e.g. lvm ones) pass kdev_t from/to
userland. While the common policy with ioctls is "anything goes", this
kind of abuse is IMNSHO over the top.

Example: ioctl(fd, VG_CREATE, ptr) expects the following:
at ptr -
struct {
/* bunch of sane fields */
struct proc_dir_entry *proc; /* ignored */
pv_t *pv[ABS_MAX_PV + 1];
lv_t *lv[ABS_MAX_LV + 1];
/* bunch of stuff */
}

and pointers in the second array are to the following:
struct {
/* lots of stuff */
kdev_t lv_dev;
/* lots of other stuff */
}

They _are_ dereferenced and values of ptr->lv[i]->lv_dev are stored in
kernel data structures. And used afterwards. As kdev_t.

The same goes for the rest of LVM ioctls - pretty much all of them
pull such stunts. I'm not going to comment on harmless gross indecencies
like struct proc_dir_entry * passed from the userland (and fortunately
ignored), but kdev_t instances are _not_ harmless.

Public statement along the lines "any API that passes kdev_t values
across the kernel boundary is unacceptable" would be a nice thing...


2002-01-14 18:04:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFLART] kdev_t in ioctls


On Mon, 14 Jan 2002, Alexander Viro wrote:
>
> Linus, at least some ioctls (e.g. lvm ones) pass kdev_t from/to
> userland. While the common policy with ioctls is "anything goes", this
> kind of abuse is IMNSHO over the top.

That's completely bogus.

The good news is that the bit-for-bit representation of old kdev_t and
"dev_t" are obviously 100% the same, so we should just make the damn thing
be dev_t, and user land will never notice anything.

So we can just change all structures that are exported to user land to use
"dev_t", and add the required conversion magic. Possibly by duplicating
the structure, and having "used_lvm_struct_x" and functions to read and
write them from/to user space.

> Public statement along the lines "any API that passes kdev_t values
> across the kernel boundary is unacceptable" would be a nice thing...

Consider that done. ANYTHING that exports kdev_t to user space is
incredibly broken, and will not work in a few months when the actual bit
representation (and size) will change.

Do we have any lvm people willing to fix this? (linux-lvm cc'd, but I know
they've been very silent on the 2.5.x changes so far)

Linus

2002-01-14 18:09:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [linux-lvm] Re: [RFLART] kdev_t in ioctls

On Mon, Jan 14, 2002 at 10:01:25AM -0800, Linus Torvalds wrote:
>
> On Mon, 14 Jan 2002, Alexander Viro wrote:
> >
> > Linus, at least some ioctls (e.g. lvm ones) pass kdev_t from/to
> > userland. While the common policy with ioctls is "anything goes", this
> > kind of abuse is IMNSHO over the top.
>
> That's completely bogus.
>
> The good news is that the bit-for-bit representation of old kdev_t and
> "dev_t" are obviously 100% the same, so we should just make the damn thing
> be dev_t, and user land will never notice anything.

Glibc disagrees with you (bits/types.h):

typedef __u_quad_t __dev_t; /* Type of device numbers. */

We'd have to use __kernel_dev_t instead which again pulls kernel
headers in..

Christoph

2002-01-14 18:14:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [linux-lvm] Re: [RFLART] kdev_t in ioctls

On Mon, Jan 14, 2002 at 07:08:34PM +0100, Christoph Hellwig wrote:
> > The good news is that the bit-for-bit representation of old kdev_t and
> > "dev_t" are obviously 100% the same, so we should just make the damn thing
> > be dev_t, and user land will never notice anything.
>
> Glibc disagrees with you (bits/types.h):
>
> typedef __u_quad_t __dev_t; /* Type of device numbers. */
>
> We'd have to use __kernel_dev_t instead which again pulls kernel
> headers in..

Argg. That's also non-funny:

[hch@sb linux]$ grep __kernel_dev_t; include/asm-*/posix_types.h
include/asm-alpha/posix_types.h:typedef unsigned int __kernel_dev_t;
include/asm-arm/posix_types.h:typedef unsigned short __kernel_dev_t;
include/asm-cris/posix_types.h:typedef unsigned short __kernel_dev_t;
include/asm-i386/posix_types.h:typedef unsigned short __kernel_dev_t;
include/asm-ia64/posix_types.h:typedef unsigned int __kernel_dev_t;
include/asm-m68k/posix_types.h:typedef unsigned short __kernel_dev_t;
include/asm-mips64/posix_types.h:typedef unsigned int __kernel_dev_t;
include/asm-mips/posix_types.h:typedef unsigned int __kernel_dev_t;
include/asm-parisc/posix_types.h:typedef unsigned int __kernel_dev_t;
include/asm-ppc/posix_types.h:typedef unsigned int __kernel_dev_t;
include/asm-s390/posix_types.h:typedef unsigned short __kernel_dev_t;
include/asm-s390x/posix_types.h:typedef unsigned int __kernel_dev_t;
include/asm-sh/posix_types.h:typedef unsigned short __kernel_dev_t;
include/asm-sparc64/posix_types.h:typedef unsigned int __kernel_dev_t;
include/asm-sparc/posix_types.h:typedef unsigned short __kernel_dev_t;

2002-01-14 18:20:52

by Alexander Viro

[permalink] [raw]
Subject: Re: [RFLART] kdev_t in ioctls



On Mon, 14 Jan 2002, Linus Torvalds wrote:

> The good news is that the bit-for-bit representation of old kdev_t and
> "dev_t" are obviously 100% the same, so we should just make the damn thing
> be dev_t, and user land will never notice anything.
>
> So we can just change all structures that are exported to user land to use
> "dev_t", and add the required conversion magic. Possibly by duplicating
> the structure, and having "used_lvm_struct_x" and functions to read and
> write them from/to user space.

... that, BTW, would kill the crap like

typedef struct ...
...
#ifdef __KERNEL__
struct kiobuf *lv_iobuf;
sector_t blocks[LVM_MAX_SECTORS];
struct kiobuf *lv_COW_table_iobuf;
struct rw_semaphore lv_lock;
struct list_head *lv_snapshot_hash_table;
uint32_t lv_snapshot_hash_table_size;
uint32_t lv_snapshot_hash_mask;
wait_queue_head_t lv_snapshot_wait;
int lv_snapshot_use_rate;
struct vg_v3 *vg;

uint lv_allocated_snapshot_le;
#else
char dummy[200];
#endif
} lv_t;

Yup, structure "shared" by kernel and userland. Look through the
include/linux/lvm.h - it's choke-full of that. And quite a few
of them contain fields like
struct proc_dir_entry *foo
struct block_device *bar
struct list_head baz
outside of these #ifdef __KERNEL__...

> > Public statement along the lines "any API that passes kdev_t values
> > across the kernel boundary is unacceptable" would be a nice thing...
>
> Consider that done. ANYTHING that exports kdev_t to user space is
> incredibly broken, and will not work in a few months when the actual bit
> representation (and size) will change.
>
> Do we have any lvm people willing to fix this? (linux-lvm cc'd, but I know
> they've been very silent on the 2.5.x changes so far)

Umm... Is the version in main tree actually maintained?

2002-01-14 18:46:05

by Alan

[permalink] [raw]
Subject: Re: [linux-lvm] Re: [RFLART] kdev_t in ioctls

> > Glibc disagrees with you (bits/types.h):
> >
> > typedef __u_quad_t __dev_t; /* Type of device numbers. */
> >
> > We'd have to use __kernel_dev_t instead which again pulls kernel
> > headers in..
>
> Argg. That's also non-funny:

glibc is meant to disagree. glibc provides a virtualised dev_t to user space
so that the kernel one can be expanded in future without application
breakage.

2002-01-14 18:48:34

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [linux-lvm] Re: [RFLART] kdev_t in ioctls

On Mon, Jan 14, 2002 at 06:56:44PM +0000, Alan Cox wrote:
> > > Glibc disagrees with you (bits/types.h):
> > >
> > > typedef __u_quad_t __dev_t; /* Type of device numbers. */
> > >
> > > We'd have to use __kernel_dev_t instead which again pulls kernel
> > > headers in..
> >
> > Argg. That's also non-funny:
>
> glibc is meant to disagree. glibc provides a virtualised dev_t to user space
> so that the kernel one can be expanded in future without application
> breakage.

I know - still it makes Linus' suggestion not work on ~90% of the
systems.

2002-01-14 18:54:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: [linux-lvm] Re: [RFLART] kdev_t in ioctls


On Mon, 14 Jan 2002, Christoph Hellwig wrote:
>
> I know - still it makes Linus' suggestion not work on ~90% of the
> systems.

It doesn't matter if user-land compilation breaks. As long as old binaries
work (and they will), we're fine.

User-land was _already_ broken. By virtue of using a type that it should
NOT have used.

If you want to use __kernel_dev_t, go ahead.

Linus

2002-01-15 12:27:11

by Joe Thornber

[permalink] [raw]
Subject: Re: [linux-lvm] Re: [RFLART] kdev_t in ioctls

On Mon, Jan 14, 2002 at 10:01:25AM -0800, Linus Torvalds wrote:
> Consider that done. ANYTHING that exports kdev_t to user space is
> incredibly broken, and will not work in a few months when the actual bit
> representation (and size) will change.

The kdev_t's in the driver interface are just one of the *minor*
problems with the LVM driver.

I came to the conclusion last summer that a rewrite was in order, of
both the kernel driver and the userland tools. The new driver is
called 'device-mapper', and has been discussed briefly on this list.
It aims to support volume management in general, ie. not be LVM
specific.

The userland tools (known as LVM2), will go into beta this week.
Initially they will just replicate the functionality of LVM1, but we
do have a lot of extra features queued which will go in subsequent
releases.

Of course Sistina will continue to support the existing LVM1 driver
for the 2.4 series.

As far as the 2.5 series is concerned, I would much rather see people
embracing the new architecture (or telling me why it sucks). Rather
than trying to hack the LVM1 driver so it works. People have been
complaining for the last year about LVM, we weren't able to do much
about it since we were in a stable kernel and couldn't change any
interfaces. Now that 2.5 is finally here it is time for people to
address the real problems - kdev_t's only scratch the surface.

- Joe