2003-07-16 18:35:31

by Greg KH

Subject: [PATCH] print_dev_t for 2.6.0-test1-mm

Hi,

Here's a patch against 2.6.0-test1-mm that fixes up the different places
where we export a dev_t to userspace. This fixes all of the compiler
warnings that were previously reported with these files.

If I should put the print_dev_t() function in a different header file,
please let me know.

thanks,

greg k-h


diff -Naur -X /home/greg/linux/dontdiff linux-2.6.0-test1-mm/drivers/block/genhd.c linux-2.6.0-test1-mm-gregkh/drivers/block/genhd.c
--- linux-2.6.0-test1-mm/drivers/block/genhd.c 2003-07-13 20:34:02.000000000 -0700
+++ linux-2.6.0-test1-mm-gregkh/drivers/block/genhd.c 2003-07-16 11:34:34.755238792 -0700
@@ -336,7 +336,7 @@
 static ssize_t disk_dev_read(struct gendisk * disk, char *page)
 {
 	dev_t base = MKDEV(disk->major, disk->first_minor);
-	return sprintf(page, "%04x\n", (unsigned)base);
+	return print_dev_t(page, base);
 }
 static ssize_t disk_range_read(struct gendisk * disk, char *page)
 {
diff -Naur -X /home/greg/linux/dontdiff linux-2.6.0-test1-mm/drivers/char/tty_io.c linux-2.6.0-test1-mm-gregkh/drivers/char/tty_io.c
--- linux-2.6.0-test1-mm/drivers/char/tty_io.c 2003-07-13 20:34:49.000000000 -0700
+++ linux-2.6.0-test1-mm-gregkh/drivers/char/tty_io.c 2003-07-16 11:35:40.205288872 -0700
@@ -2106,7 +2106,7 @@
 static ssize_t show_dev(struct class_device *class_dev, char *buf)
 {
 	struct tty_dev *tty_dev = to_tty_dev(class_dev);
-	return sprintf(buf, "%04lx\n", (unsigned long)tty_dev->dev);
+	return print_dev_t(buf, tty_dev->dev);
 }
 static CLASS_DEVICE_ATTR(dev, S_IRUGO, show_dev, NULL);

diff -Naur -X /home/greg/linux/dontdiff linux-2.6.0-test1-mm/drivers/i2c/i2c-dev.c linux-2.6.0-test1-mm-gregkh/drivers/i2c/i2c-dev.c
--- linux-2.6.0-test1-mm/drivers/i2c/i2c-dev.c 2003-07-13 20:36:48.000000000 -0700
+++ linux-2.6.0-test1-mm-gregkh/drivers/i2c/i2c-dev.c 2003-07-16 11:36:23.060773848 -0700
@@ -118,7 +118,7 @@
 static ssize_t show_dev(struct class_device *class_dev, char *buf)
 {
 	struct i2c_dev *i2c_dev = to_i2c_dev(class_dev);
-	return sprintf(buf, "%04x\n", MKDEV(I2C_MAJOR, i2c_dev->minor));
+	return print_dev_t(buf, MKDEV(I2C_MAJOR, i2c_dev->minor));
 }
 static CLASS_DEVICE_ATTR(dev, S_IRUGO, show_dev, NULL);

diff -Naur -X /home/greg/linux/dontdiff linux-2.6.0-test1-mm/drivers/usb/core/file.c linux-2.6.0-test1-mm-gregkh/drivers/usb/core/file.c
--- linux-2.6.0-test1-mm/drivers/usb/core/file.c 2003-07-16 11:25:05.269813736 -0700
+++ linux-2.6.0-test1-mm-gregkh/drivers/usb/core/file.c 2003-07-16 11:33:27.193509736 -0700
@@ -93,7 +93,7 @@
 {
 	struct usb_interface *intf = class_dev_to_usb_interface(class_dev);
 	dev_t dev = MKDEV(USB_MAJOR, intf->minor);
-	return sprintf(buf, "%04lx\n", (unsigned long)dev);
+	return print_dev_t(buf, dev);
 }
 static CLASS_DEVICE_ATTR(dev, S_IRUGO, show_dev, NULL);

diff -Naur -X /home/greg/linux/dontdiff linux-2.6.0-test1-mm/include/linux/kdev_t.h linux-2.6.0-test1-mm-gregkh/include/linux/kdev_t.h
--- linux-2.6.0-test1-mm/include/linux/kdev_t.h 2003-07-16 11:25:05.868722688 -0700
+++ linux-2.6.0-test1-mm-gregkh/include/linux/kdev_t.h 2003-07-16 11:33:05.754768920 -0700
@@ -103,6 +103,11 @@
 	return mk_kdev(ma, mi);
 }

+static inline int print_dev_t(char *buffer, dev_t dev)
+{
+	return sprintf(buffer, "%04lx\n", (unsigned long)dev);
+}
+
 #else /* __KERNEL__ */

 /*
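
To see what the new helper writes into the sysfs buffer, here is a minimal userspace sketch. It assumes the classic 8:8 packing for the device number (the MKDEV() in the -mm tree may pack differently), and demo_dev_t, DEMO_MKDEV and demo_print_dev_t are made-up stand-ins, not kernel names.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's dev_t and MKDEV(); 8:8 packing is an assumption. */
typedef unsigned long demo_dev_t;
#define DEMO_MKDEV(ma, mi) ((demo_dev_t)(((ma) << 8) | (mi)))

/* Same body as print_dev_t() in the patch: hex value plus newline, returns the length written. */
static int demo_print_dev_t(char *buffer, demo_dev_t dev)
{
	return sprintf(buffer, "%04lx\n", (unsigned long)dev);
}
```

Under that assumption, major 8 and minor 1 come out as the four hex digits 0801 followed by a newline.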


2003-07-16 20:01:41

by Andrew Morton

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Greg KH <[email protected]> wrote:
>
> Here's a patch against 2.6.0-test1-mm that fixes up the different places
> where we export a dev_t to userspace. This fixes all of the compiler
> warnings that were previously reported with these files.

I added this as well:

static inline char *format_dev_t(char *buffer, dev_t dev)
{
	sprintf(buffer, "%04lx\n", (unsigned long)dev);
	return buffer;
}

to be placed directly in a printk().

We'll probably need to do something more fancy in here later, because once
a dev_t becomes 32:32, it'll need to be printed out with "%016llx", which
is daft.

So we'll need to come up with some standardised way of presenting a dev_t
to the user. Presumably that will just be

sprintf(buf, "%d:%d", major(dev), minor(dev));

But if we do this, will it break your existing stuff?

> If I should put the print_dev_t() function in a different header file,
> please let me know.

Seems OK. Every kdev_t.h includer now needs to include kernel.h too. Fair
enough.
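
The "major:minor" presentation suggested above could look roughly like the sketch below. It assumes a 32:32 device number with the major in the high word and uses "%u:%u" rather than "%d:%d" so large values stay unsigned. demo_dev64_t, DEMO_MAJOR, DEMO_MINOR and demo_print_dev_pair are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* Hypothetical 32:32 device number; the DEMO_* helpers are illustrative only. */
typedef unsigned long long demo_dev64_t;
#define DEMO_MAJOR(dev) ((unsigned int)((dev) >> 32))
#define DEMO_MINOR(dev) ((unsigned int)((dev) & 0xffffffffULL))

/* Writes "major:minor\n" into buffer and returns the number of characters written. */
static int demo_print_dev_pair(char *buffer, demo_dev64_t dev)
{
	return sprintf(buffer, "%u:%u\n", DEMO_MAJOR(dev), DEMO_MINOR(dev));
}
```

The point of this form is that userspace parses two decimal fields and never has to know how wide dev_t is.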

2003-07-16 20:54:16

by Greg KH

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Wed, Jul 16, 2003 at 01:09:15PM -0700, Andrew Morton wrote:
> > Here's a patch against 2.6.0-test1-mm that fixes up the different places
> > where we export a dev_t to userspace. This fixes all of the compiler
> > warnings that were previously reported with these files.
>
> I added this as well:
>
> static inline char *format_dev_t(char *buffer, dev_t dev)
> {
> sprintf(buffer, "%04lx\n", (unsigned long)dev);
> return buffer;
> }
>
> to be placed directly in a printk().

Nice.

> We'll probably need to do something more fancy in here later, because once
> a dev_t becomes 32:32, it'll need to be printed out with "%016llx", which
> is daft.
>
> So we'll need to come up with some standardised way of presenting a dev_t
> to the user. Presumably that will just be
>
> sprintf(buf, "%d:%d", major(dev), minor(dev));
>
> But if we do this, will it break your existing stuff?

No, I don't think there are any users of udev right now :)

I wouldn't mind the ':' being there, makes my life a bit easier, but for
some reason Al Viro didn't want to do that a long time ago...

If we put the ':' in there, it protects userspace from having to deal
with different sized dev_t, so that really makes sense.

thanks,

greg k-h

2003-07-16 21:06:26

by Andrew Morton

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Greg KH <[email protected]> wrote:
>
> > So we'll need to come up with some standardised way of presenting a dev_t
> > to the user. Presumably that will just be
> >
> > sprintf(buf, "%d:%d", major(dev), minor(dev));
> >
> > But if we do this, will it break your existing stuff?
>
> No, I don't think there are any users of udev right now :)
>
> I wouldn't mind the ':' being there, makes my life a bit easier, but for
> some reason Al Viro didn't want to do that a long time ago...
>
> If we put the ':' in there, it protects userspace from having to deal
> with different sized dev_t, so that really makes sense.

OK, I think I'll make it so and hope he doesn't notice ;)

The new dev_t encoding is a bit weird because we of course continue to
support the old 8:8 encoding. I think the rule is: "if the top 32-bits are
zero, it is 8:8, otherwise 32:32". We can express this nicely with
"%u:%u".

Now I need to go hunt down all those places where I added casts to unsigned
longs in printks. hrm.

2003-07-16 21:20:06

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Wed, Jul 16, 2003 at 02:13:20PM -0700, Andrew Morton wrote:

> The new dev_t encoding is a bit weird because we of course continue to
> support the old 8:8 encoding. I think the rule is: "if the top 32-bits are
> zero, it is 8:8, otherwise 32:32". We can express this nicely with
> "%u:%u".

16-bit only: 8:8, otherwise 32-bit only: 16:16, otherwise 32:32.

Andries

2003-07-16 21:31:28

by Andrew Morton

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Andries Brouwer <[email protected]> wrote:
>
> On Wed, Jul 16, 2003 at 02:13:20PM -0700, Andrew Morton wrote:
>
> > The new dev_t encoding is a bit weird because we of course continue to
> > support the old 8:8 encoding. I think the rule is: "if the top 32-bits are
> > zero, it is 8:8, otherwise 32:32". We can express this nicely with
> > "%u:%u".
>
> 16-bit only: 8:8, otherwise 32-bit only: 16:16, otherwise 32:32.
>

Why do we need the 16:16 option?

2003-07-16 21:41:10

by Greg KH

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Wed, Jul 16, 2003 at 02:13:20PM -0700, Andrew Morton wrote:
>
> The new dev_t encoding is a bit weird because we of course continue to
> support the old 8:8 encoding. I think the rule is: "if the top 32-bits are
> zero, it is 8:8, otherwise 32:32". We can express this nicely with
> "%u:%u".

Sounds good, much appreciated.

> Now I need to go hunt down all those places where I added casts to unsigned
> longs in printks. hrm.

Heh, I think I got a few of them with my patch. Who else prints out the
dev_t value?

thanks,

greg k-h

2003-07-16 21:52:35

by Andrew Morton

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Greg KH <[email protected]> wrote:
>
> Who else prints out the dev_t value?

There are only a few places where it happens. It is random junk like
"mounted filesystem foo on device %d"

2003-07-16 21:57:48

by H. Peter Anvin

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Followup to: <[email protected]>
By author: Andrew Morton <[email protected]>
In newsgroup: linux.dev.kernel
>
> Andries Brouwer <[email protected]> wrote:
> >
> > On Wed, Jul 16, 2003 at 02:13:20PM -0700, Andrew Morton wrote:
> >
> > > The new dev_t encoding is a bit weird because we of course continue to
> > > support the old 8:8 encoding. I think the rule is: "if the top 32-bits are
> > > zero, it is 8:8, otherwise 32:32". We can express this nicely with
> > > "%u:%u".
> >
> > 16-bit only: 8:8, otherwise 32-bit only: 16:16, otherwise 32:32.
> >
>
> Why do we need the 16:16 option?
>

We need 32-bit for NFSv2, but I thought it was going to be 12:20.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

2003-07-16 22:00:12

by Greg KH

Subject: what's left for 64 bit dev_t

On Wed, Jul 16, 2003 at 03:00:10PM -0700, Andrew Morton wrote:
> Greg KH <[email protected]> wrote:
> >
> > Who else prints out the dev_t value?
>
> There are only a few places where it happens. It is random junk like
> "mounted filesystem foo on device %d"

Ah.

Ok, to change the topic a bit, what's left to do on the 64bit dev_t
stuff? I know your tree has some support, but there were rumors that
more was really needed to finish this off right.

Any ideas? Thoughts? Patches? :)

thanks,

greg k-h

2003-07-16 22:10:16

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Wed, Jul 16, 2003 at 02:39:02PM -0700, Andrew Morton wrote:
> Andries Brouwer <[email protected]> wrote:
> >
> > On Wed, Jul 16, 2003 at 02:13:20PM -0700, Andrew Morton wrote:
> >
> > > The new dev_t encoding is a bit weird because we of course continue to
> > > support the old 8:8 encoding. I think the rule is: "if the top 32-bits are
> > > zero, it is 8:8, otherwise 32:32". We can express this nicely with
> > > "%u:%u".
> >
> > 16-bit only: 8:8, otherwise 32-bit only: 16:16, otherwise 32:32.

> Why do we need the 16:16 option?

It is not very important, but major 0 is reserved, so if userspace
(or a filesystem) hands us a 32-bit device number, we have to
split that in some way, not 0+32. Life is easiest with 16+16.
(Now the major is nonzero, otherwise we had 8+8.)
Other choices lead to slightly more complicated code.

Andries
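
The three-way rule (8:8, else 16:16, else 32:32) and the remark about major 0 can be sketched roughly as follows. This only illustrates the rule as stated in this thread. demo_decode(), demo_encode() and struct demo_devnum are made-up names, and nothing here is taken from the -mm patches.

```c
#include <stdint.h>

/* Hypothetical helper type: a decoded major/minor pair. */
struct demo_devnum {
	uint32_t major;
	uint32_t minor;
};

/* Decode: a value that fits in 16 bits is 8:8, one that fits in 32 bits is 16:16, anything larger is 32:32. */
static struct demo_devnum demo_decode(uint64_t dev)
{
	struct demo_devnum d;

	if ((dev >> 16) == 0) {
		d.major = (uint32_t)(dev >> 8);
		d.minor = (uint32_t)(dev & 0xff);
	} else if ((dev >> 32) == 0) {
		d.major = (uint32_t)(dev >> 16);
		d.minor = (uint32_t)(dev & 0xffff);
	} else {
		d.major = (uint32_t)(dev >> 32);
		d.minor = (uint32_t)(dev & 0xffffffffu);
	}
	return d;
}

/* Encode: pick the narrowest format that holds both numbers. Only unambiguous when major != 0. */
static uint64_t demo_encode(uint32_t major, uint32_t minor)
{
	if (major < 0x100 && minor < 0x100)
		return ((uint64_t)major << 8) | minor;
	if (major < 0x10000 && minor < 0x10000)
		return ((uint64_t)major << 16) | minor;
	return ((uint64_t)major << 32) | minor;
}
```

With a nonzero major the round trip is exact. With major 0 it is not: demo_encode(0, 300) yields 300, which decodes as 8:8 into major 1, minor 44. That is the aliasing a reserved major 0 sidesteps.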

2003-07-16 22:17:12

by Andrew Morton

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Andries Brouwer <[email protected]> wrote:
>
>
> > Why do we need the 16:16 option?
>
> It is not very important, but major 0 is reserved, so if userspace
> (or a filesystem) hands us a 32-bit device number, we have to
> split that in some way, not 0+32. Life is easiest with 16+16.
> (Now the major is nonzero, otherwise we had 8+8.)
> Other choices lead to slightly more complicated code.
>

Why would anyone hand the kernel a 32-bit device number? They're either 16
or 64, are they not?

2003-07-16 22:20:17

by Andrew Morton

Subject: Re: what's left for 64 bit dev_t

Greg KH <[email protected]> wrote:
>
> Ok, to change the topic a bit, what's left to do on the 64bit dev_t
> stuff?

I don't know, really. I've asked this of Andries several times and either
he doesn't know either, or I didn't understand the answer.

But there have been no problems with the code in -mm for a couple of months.

> I know your tree has some support, but there were rumors that
> more was really needed to finish this off right.
>
> Any ideas? Thoughts? Patches? :)

The fact that the code has not been tested on ppc and, presumably, several
other platforms is a problem, but I guess that'll work itself out.

The situation at present is that Linus will take the patches, but I ain't
sending them because viro has expressed oblique concerns over the approach.
I'd like to get his take on it before proceeding. But he has vanished
again. However I do expect that he'll get a chance to review and comment on the
changes soon.

The other concern is the unknown amount of followup work which is needed.
All I can say there is that the kernel will continue to work in the areas
which have been exercised by testers of the -mm kernels.

I expect we'll end up just jamming it in and seeing what happens.

2003-07-16 22:34:30

by Greg KH

Subject: Re: what's left for 64 bit dev_t

On Wed, Jul 16, 2003 at 03:19:39PM -0700, Andrew Morton wrote:
>
> I expect we'll end up just jamming it in and seeing what happens.

Sounds good to me :)

I think the big problems will start to happen when people try to _use_
the expanded namespace. Is LANANA set up to assign bigger numbers now?
Are they going to carve them up into chunks? Or are we relying on
userspace implementations like udev to handle the number management?

greg k-h

2003-07-16 23:30:53

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Wed, Jul 16, 2003 at 03:21:43PM -0700, Andrew Morton wrote:

> > > Why do we need the 16:16 option?
> >
> > It is not very important, but major 0 is reserved, so if userspace
> > (or a filesystem) hands us a 32-bit device number, we have to
> > split that in some way, not 0+32. Life is easiest with 16+16.
> > (Now the major is nonzero, otherwise we had 8+8.)
> > Other choices lead to slightly more complicated code.
> >
>
> Why would anyone hand the kernel a 32-bit device number? They're either 16
> or 64, are they not?

The kernel has no control over what userspace comes with.
And here userspace includes filesystems.
Not all filesystems know how to come with 64 bits.

Andries

2003-07-16 23:41:48

by Andrew Morton

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Andries Brouwer <[email protected]> wrote:
>
> > Why would anyone hand the kernel a 32-bit device number? They're either 16
> > or 64, are they not?
>
> The kernel has no control over what userspace comes with.
> And here userspace includes filesystems.
> Not all filesystems know how to come with 64 bits.

What does "comes with" mean?

Please describe a scenario in which a filesystem which works on current
kernels will, in a 64-bit-dev_t kernel, call init_special_inode() with a
16:16 encoded device number.

2003-07-17 02:46:27

by H. Peter Anvin

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Followup to: <[email protected]>
By author: Andries Brouwer <[email protected]>
In newsgroup: linux.dev.kernel
> > >
> > > 16-bit only: 8:8, otherwise 32-bit only: 16:16, otherwise 32:32.
>
> > Why do we need the 16:16 option?
>
> It is not very important, but major 0 is reserved, so if userspace
> (or a filesystem) hands us a 32-bit device number, we have to
> split that in some way, not 0+32. Life is easiest with 16+16.
> (Now the major is nonzero, otherwise we had 8+8.)
> Other choices lead to slightly more complicated code.
>

I would still recommend the arrangement for 64-bit dev_t that I posted
a while ago:

dev_t<63:40> := major<31:8>
dev_t<39:16> := minor<31:8>
dev_t<15:8> := major<7:0>
dev_t<7:0> := minor<7:0>

No aliasing, no forbidden bit patterns, no conditional code, no need
for magic numbers, and it's fully compatible with the current
LSB-adjusted user space dev_t format. I also posted i386 code for the
various operations to show that they really can be done with very
little code.
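
As a rough illustration of the four-field layout above (this is not the i386 code mentioned, just a portable C sketch with made-up demo_hpa_* names), the three operations could be written as:

```c
#include <stdint.h>

/* Pack major/minor according to the bit assignments listed above. */
static uint64_t demo_hpa_mkdev(uint32_t major, uint32_t minor)
{
	return ((uint64_t)(major >> 8) << 40) |    /* dev<63:40> = major<31:8> */
	       ((uint64_t)(minor >> 8) << 16) |    /* dev<39:16> = minor<31:8> */
	       ((uint64_t)(major & 0xff) << 8) |   /* dev<15:8>  = major<7:0>  */
	       (uint64_t)(minor & 0xff);           /* dev<7:0>   = minor<7:0>  */
}

/* Reassemble the major from its high 24 bits and low 8 bits. */
static uint32_t demo_hpa_major(uint64_t dev)
{
	return (uint32_t)((dev >> 40) << 8) | (uint32_t)((dev >> 8) & 0xff);
}

/* Reassemble the minor from its high 24 bits and low 8 bits. */
static uint32_t demo_hpa_minor(uint64_t dev)
{
	return (uint32_t)(((dev >> 16) & 0xffffff) << 8) | (uint32_t)(dev & 0xff);
}
```

Any pair with major and minor below 256 packs to the same 16-bit value as the old 8:8 format, and there are no branches in any of the three functions.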

If you want, you can even make it 32-bit-friendly, although it makes
it more complex; for example, this version would implement 32-bit with
a 12:20 split:

dev_t<63:44> := major<31:12>
dev_t<43:32> := minor<31:20>
dev_t<31:28> := major<11:8>
dev_t<27:16> := minor<19:8>
dev_t<15:8> := major<7:0>
dev_t<7:0> := minor<7:0>
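
The six-field variant can be sketched the same way. Again the demo_hpa1220_* names are hypothetical and this is only one reading of the bit assignments listed above:

```c
#include <stdint.h>

/* Pack major/minor so that a 12-bit major and 20-bit minor fit in the low 32 bits. */
static uint64_t demo_hpa1220_mkdev(uint32_t major, uint32_t minor)
{
	return ((uint64_t)(major >> 12) << 44) |            /* dev<63:44> = major<31:12> */
	       ((uint64_t)(minor >> 20) << 32) |            /* dev<43:32> = minor<31:20> */
	       ((uint64_t)((major >> 8) & 0xf) << 28) |     /* dev<31:28> = major<11:8>  */
	       ((uint64_t)((minor >> 8) & 0xfff) << 16) |   /* dev<27:16> = minor<19:8>  */
	       ((uint64_t)(major & 0xff) << 8) |            /* dev<15:8>  = major<7:0>   */
	       (uint64_t)(minor & 0xff);                    /* dev<7:0>   = minor<7:0>   */
}

/* Gather the three major fields back into one 32-bit number. */
static uint32_t demo_hpa1220_major(uint64_t dev)
{
	return (uint32_t)((dev >> 44) << 12) |
	       (uint32_t)(((dev >> 28) & 0xf) << 8) |
	       (uint32_t)((dev >> 8) & 0xff);
}

/* Gather the three minor fields back into one 32-bit number. */
static uint32_t demo_hpa1220_minor(uint64_t dev)
{
	return (uint32_t)(((dev >> 32) & 0xfff) << 20) |
	       (uint32_t)(((dev >> 16) & 0xfff) << 8) |
	       (uint32_t)(dev & 0xff);
}
```

A pair such as major 0x123, minor 0x45678 stays entirely within the low 32 bits, which is what makes this version 32-bit-friendly.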

-hpa
--
<[email protected]> at work, <[email protected]> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

2003-07-17 02:47:43

by H. Peter Anvin

Subject: Re: what's left for 64 bit dev_t

Followup to: <[email protected]>
By author: Greg KH <[email protected]>
In newsgroup: linux.dev.kernel
>
> On Wed, Jul 16, 2003 at 03:19:39PM -0700, Andrew Morton wrote:
> >
> > I expect we'll end up just jamming it in and seeing what happens.
>
> Sounds good to me :)
>
> I think the big problems will start to happen when people try to _use_
> the expanded namespace. Is LANANA set up to assign bigger numbers now?
> Are they going to carve them up into chunks? Or are we relying on
> userspace implementations like udev to handle the number management?
>

I suspect it will be a combination. In the short term I suspect John
Cagle (who is device@lanana) will fix the current glaring bogosities.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

2003-07-17 08:13:49

by Joel Becker

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Wed, Jul 16, 2003 at 04:49:17PM -0700, Andrew Morton wrote:
> Please describe a scenario in which a filesystem which works on current
> kernels will, in a 64-bit-dev_t kernel, call init_special_inode() with a
> 16:16 encoded device number.

Perhaps he's thinking of NFSv2. If you want to make a device
bigger than 8:8... Personally, I'm happy to ignore NFSv2 for this.
If we did support a 32bit median format, I would suggest we
either use Peter's strategy or we use 12:20. 16:16 is so limiting.

Joel


--

"War doesn't determine who's right; war determines who's left."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127

2003-07-17 08:32:38

by Roman Zippel

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Hi,

On Thu, 17 Jul 2003, Joel Becker wrote:

> On Wed, Jul 16, 2003 at 04:49:17PM -0700, Andrew Morton wrote:
> > Please describe a scenario in which a filesystem which works on current
> > kernels will, in a 64-bit-dev_t kernel, call init_special_inode() with a
> > 16:16 encoded device number.
>
> Perhaps he's thinking of NFSv2. If you want to make a device
> bigger than 8:8... Personally, I'm happy to ignore NFSv2 for this.

It's not just NFS2, with NFS3 and later it also depends on how many and
which bits the server keeps. They usually use the standard major/minor/
makedev macros, so you only get back what the platform supports.
Splitting dev_t in major/minor numbers can be lots of fun...

bye, Roman

2003-07-17 09:00:42

by Joel Becker

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Thu, Jul 17, 2003 at 10:47:08AM +0200, Roman Zippel wrote:
> It's not just NFS2, with NFS3 and later it also depends on how many and
> which bits the server keeps. They usually use the standard major/minor/
> makedev macros, so you only get back what the platform supports.
> Splitting dev_t in major/minor numbers can be lots of fun...

Well, exporting devices over NFS is always tricky, because if
the server isn't an identical OS, you can't even trust the numbers. As
you point out, you get the platform's idea of a device number, and that
doesn't map to your local OS.
It is no different than today. You have to make sure that the
server's filesystem stores device numbers valid for the client if the
client wants to use those device nodes.

Joel


--

"In the room the women come and go
Talking of Michaelangelo."

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127

2003-07-17 09:09:26

by Andrew Morton

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Joel Becker <[email protected]> wrote:
>
> Well, exporting devices over NFS is always tricky, because if
> the server isn't an identical OS, you can't even trust the numbers. As
> you point out, you get the platform's idea of a device number, and that
> doesn't map to your local OS.

And surely the task of mangling whatever comes off the wire into a dev_t for
init_special_inode() should be private to the Linux NFS client?

Still wondering why we need to support a 16:16 encoding in [k]dev_t.


2003-07-17 09:56:14

by Trond Myklebust

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

>>>>> " " == Andrew Morton <[email protected]> writes:

> And surely the task of mangling whatever comes off the wire
> into a dev_t for init_special_inode() should be private to the
> Linux NFS client?

Well... Yes, but don't forget that it's not just a client issue but a
server issue too.

The NFSv2 'rdev' field is an unspecified 32-bit integer
format.
For NFSv3, you have a 32-bit major and a 32-bit minor number. Again
the mapping is unspecified by the protocol.

It all works by assuming that the client and server have agreed to use
the same format/conventions.

So if we want to retain backward compatibility with existing 2.4.x NFS
(and particularly NFSroot) clients/servers, then we want to ensure
that all numbers that are sent over the wire stay the same.

Cheers,
Trond

2003-07-17 10:11:15

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Wed, Jul 16, 2003 at 04:49:17PM -0700, Andrew Morton wrote:
> Andries Brouwer <[email protected]> wrote:
>
> > > Why would anyone hand the kernel a 32-bit device number?
> > > They're either 16 or 64, are they not?
> >
> > The kernel has no control over what userspace comes with.
> > And here userspace includes filesystems.
> > Not all filesystems know how to come with 64 bits.
>
> What does "comes with" mean?
>
> Please describe a scenario in which a filesystem which works on current
> kernels will, in a 64-bit-dev_t kernel, call init_special_inode() with a
> 16:16 encoded device number.

:-) You change the subject.
There are many filesystems that only have room for 32 bits.
For example, NFSv2 has "unsigned int rdev".
So, the kernel must be able to handle 32-bit device numbers.

Now about the encoding - nobody knows. This NFS filesystem was mounted
from a FreeBSD system. It is encoded 16+8+8 with the middle 8 the major.
Or, no, it was Solaris or Irix. Encoded 14+18. Etc.

In the case of NFSv2 there is an unknown system on the other side.
Internally for Linux we have not yet used larger device numbers
so there are no cases of 16+16 yet. But there will be occasions
where we have to store a device number in 32 bits, and what I am
saying is that life is easiest if we use 16+16 in such cases.

Andries

2003-07-17 10:28:34

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Wed, Jul 16, 2003 at 07:54:36PM -0700, H. Peter Anvin wrote:

> > 16-bit only: 8:8, otherwise 32-bit only: 16:16, otherwise 32:32.
>
> I would still recommend the arrangement for 64-bit dev_t that I posted
> a while ago:
>
> dev_t<63:40> := major<31:8>
> dev_t<39:16> := minor<31:8>
> dev_t<15:8> := major<7:0>
> dev_t<7:0> := minor<7:0>

Yes, but we also need to handle 32-bit dev_t incarnations.

> If you want, you can even make it 32-bit-friendly, although it makes
> it more complex; for example, this version would implement 32-bit with
> a 12:20 split:
>
> dev_t<63:44> := major<31:12>
> dev_t<43:32> := minor<31:20>
> dev_t<31:28> := major<11:8>
> dev_t<27:16> := minor<19:8>
> dev_t<15:8> := major<7:0>
> dev_t<7:0> := minor<7:0>

Too messy. But you are right - no conditionals involved.

2003-07-17 10:31:45

by Miquel van Smoorenburg

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

In article <[email protected]>,
Andries Brouwer <[email protected]> wrote:
>There are many filesystems that only have room for 32 bits.
>For example, NFSv2 has "unsigned int rdev".
>So, the kernel must be able to handle 32-bit device numbers.
>
>Now about the encoding - nobody knows. This NFS filesystem was mounted
>from a FreeBSD system. It is encoded 16+8+8 with the middle 8 the major.
>Or, no, it was Solaris or Irix. Encoded 14+18. Etc.
>
>In the case of NFSv2 there is an unknown system on the other side.

So put the translation of 32 bits rdev to 32:32 in the NFS client.
Provide a mount-time option "rdev-encoding=14:18", with symbolic
names for often-used encodings: "rdev-encoding=solaris". Done.
You can do this on the NFS server side as well.. per-client,
even. If anyone still cares for NFSv2, that is.

Same goes for other filesystems, though a dynamic translation will
not be necessary. But the filesystem driver itself
must convert from native rdev to linux 32:32.

Mike.
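
The per-mount translation proposed above might be sketched like this. The "rdev-encoding" option is only a proposal in this thread, so struct demo_rdev_encoding and demo_rdev_to_dev64() are hypothetical, and the sketch covers only simple "major in the high bits, minor in the low bits" splits. A layout such as 16+8+8 with the major in the middle would need its own case.

```c
#include <stdint.h>

/* Hypothetical description of a foreign 32-bit rdev layout. */
struct demo_rdev_encoding {
	unsigned int major_bits;   /* e.g. 14 for a 14:18 split */
	unsigned int minor_bits;   /* e.g. 18 */
};

/*
 * Convert a 32-bit wire rdev into a 32:32 value (major in the high word).
 * Assumes 0 < major_bits < 32, 0 < minor_bits < 32 and
 * major_bits + minor_bits <= 32.
 */
static uint64_t demo_rdev_to_dev64(uint32_t rdev, struct demo_rdev_encoding enc)
{
	uint32_t minor_mask = (1u << enc.minor_bits) - 1;
	uint32_t major_mask = (1u << enc.major_bits) - 1;
	uint32_t minor = rdev & minor_mask;
	uint32_t major = (rdev >> enc.minor_bits) & major_mask;

	return ((uint64_t)major << 32) | minor;
}
```

The rest of the kernel would then only ever see the 32:32 form, whatever the server's split happens to be.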

2003-07-17 11:05:07

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Thu, Jul 17, 2003 at 10:46:35AM +0000, Miquel van Smoorenburg wrote:

> The filesystem driver itself must convert from native rdev to linux 32:32.

Look at the mknod utility.
The user types major,minor.
The system call uses dev_t.
This means that user space needs to be able to combine
major,minor into a dev_t.

It is not a good idea to require of mknod that it knows
about the filesystem the node is going to be created on.

Andries


[In other words: we invent something, and what we invent is
encoded in <sys/sysmacros.h>. It cannot depend on fs type.]


2003-07-17 11:32:05

by Miquel van Smoorenburg

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

In article <[email protected]>,
Andries Brouwer <[email protected]> wrote:
>On Thu, Jul 17, 2003 at 10:46:35AM +0000, Miquel van Smoorenburg wrote:
>
>> The filesystem driver itself must convert from native rdev to linux 32:32.
>
>Look at the mknod utility.
>The user types major,minor.
>The system call uses dev_t.
>This means that user space needs to be able to combine
>major,minor into a dev_t.

Ah, I see. That is a different issue - converting the 32-bit dev_t
from userspace into a 32:32 internal representation.

But, a utility like mknod currently only knows about 8:8 anyway.
It needs to be patched to know about >8:>8 ... why not add
64bit (32:32) dev_t syscalls at the same time ?

I mean, if the 64 bit dev_t is not going to be exposed to userspace,
why bother with it in the kernel ? And if it /is/ going to be
exposed to userspace, why bother with a 32 bit encoding ?

Mike.

2003-07-17 11:31:58

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Thu, Jul 17, 2003 at 02:24:44AM -0700, Andrew Morton wrote:

> And surely the task of mangling whatever comes off the wire into a dev_t for
> init_special_inode() should be private to the Linux NFS client?
>
> Still wondering why we need to support a 16:16 encoding in [k]dev_t.

I think I answered this already in earlier posts today. Again:
(i) We need support for 16/32/64-bit dev_t.
(ii) User space (glibc) has 64-bit dev_t.
(iii) The split into major/minor is hardwired in <sys/sysmacros.h>,
independent of filesystem. Thus, we must define major(),minor(),makedev().
(iv) For Linux the device number is a cookie - major and minor do not
really have a significance - we just select a driver given a *dev_t
interval. That means that there are no reasons for inventing more
complicated setups like 12:20.

And since you add "[k]": a kdev_t is internal to the kernel,
we do whatever we want. I wanted a pointer (say, to a struct gendisk or so),
but these days it seems we are heading for an arithmetic type, with 32:32.

Andries

2003-07-17 11:40:44

by Alan

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Iau, 2003-07-17 at 10:15, Joel Becker wrote:
> Well, exporting devices over NFS is always tricky, because if
> the server isn't an identical OS, you can't even trust the numbers.

NFSv3 fixed this.

2003-07-17 13:45:06

by Andries Brouwer

Subject: Re: what's left for 64 bit dev_t

On Wed, Jul 16, 2003 at 03:19:39PM -0700, Andrew Morton wrote:

> The situation at present is that Linus will take the patches,
> but I ain't sending them because viro has expressed oblique
> concerns over the approach. I'd like to get his take on it
> before proceeding. But he has vanished again.

Aha. So, recently I diagnosed a deadlock, and this gives
some more insight in the nature of the deadlock.

It would be vaguely interesting to see these oblique concerns
dated and quoted.

2003-07-17 21:48:04

by Andrew Morton

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

Andries Brouwer <[email protected]> wrote:
>
> > The filesystem driver itself must convert from native rdev to linux 32:32.
>
> Look at the mknod utility.
> The user types major,minor.
> The system call uses dev_t.
> This means that user space needs to be able to combine
> major,minor into a dev_t.

But mknod64() takes major/minor. Requiring a fileutils upgrade is OK.

2003-07-17 22:10:00

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Thu, Jul 17, 2003 at 02:55:07PM -0700, Andrew Morton wrote:
> Andries Brouwer <[email protected]> wrote:
> >
> > > The filesystem driver itself must convert from native rdev to linux 32:32.
> >
> > Look at the mknod utility.
> > The user types major,minor.
> > The system call uses dev_t.
> > This means that user space needs to be able to combine
> > major,minor into a dev_t.
>
> But mknod64() takes major/minor. Requiring a fileutils upgrade is OK.

[I think I already answered - please ask again if not.]

Premise: some filesystems or archives store 32 bits.
Conclusion: we must be able to handle that.
This is unrelated to the kernel, unrelated to system calls,
it is related to <sys/sysmacros.h>.

2003-07-17 22:31:31

by Joel Becker

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Fri, Jul 18, 2003 at 12:24:51AM +0200, Andries Brouwer wrote:
> Premise: some filesystems or archives store 32 bits.
> Conclusion: we must be able to handle that.
> This is unrelated to the kernel, unrelated to system calls,
> it is related to <sys/sysmacros.h>.

How does linux handle that today? IIRC, it ignores the high
16bits and treats that 32bit number as 8:8. That is what happens today,
for every filesystem, whether it stores 32 or 16 bits.
Why expand that? We can continue to treat 32bit numbers (eg,
from NFSv2) as 16bit numbers.

Joel


--

"One of the symptoms of an approaching nervous breakdown is the
belief that one's work is terribly important."
- Bertrand Russell

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127

2003-07-17 22:56:29

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Thu, Jul 17, 2003 at 03:43:08PM -0700, Joel Becker wrote:
> On Fri, Jul 18, 2003 at 12:24:51AM +0200, Andries Brouwer wrote:
> > Premise: some filesystems or archives store 32 bits.
> > Conclusion: we must be able to handle that.
> > This is unrelated to the kernel, unrelated to system calls,
> > it is related to <sys/sysmacros.h>.
>
> How does linux handle that today? IIRC, it ignores the high
> 16bits and treats that 32bit number as 8:8. That is what happens today,
> for every filesystem, whether it stores 32 or 16 bits.
> Why expand that? We can continue to treat 32bit numbers (eg,
> from NFSv2) as 16bit numbers.

:-) A surprising question.
Why expand that?
Because we would like to use more than 16 bits in device numbers.

2003-07-17 23:34:21

by Miquel van Smoorenburg

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

In article <[email protected]>,
Andries Brouwer <[email protected]> wrote:
>On Thu, Jul 17, 2003 at 03:43:08PM -0700, Joel Becker wrote:
>> On Fri, Jul 18, 2003 at 12:24:51AM +0200, Andries Brouwer wrote:
>> > Premise: some filesystems or archives store 32 bits.
>> > Conclusion: we must be able to handle that.
>> > This is unrelated to the kernel, unrelated to system calls,
>> > it is related to <sys/sysmacros.h>.
>>
>> How does linux handle that today? IIRC, it ignores the high
>> 16bits and treats that 32bit number as 8:8. That is what happens today,
>> for every filesystem, whether it stores 32 or 16 bits.
>> Why expand that? We can continue to treat 32bit numbers (eg,
>> from NFSv2) as 16bit numbers.
>
>:-) A surprising question.
>Why expand that?
>Because we would like to use more than 16 bits in device numbers.

But why do you need a 32bit interface to the kernel when a
32:32 interface exists? Userland can translate 32 bit major/minor
into 32:32 to the kernel, if a 64 bits syscall exists, right?
Case in point: mknod64()

Same goes for filesystems. A 32 bit on-disk rdev doesn't need to
be handled by the rest of the kernel. The filesystem driver just
needs to translate it to 32 major 32 minor for the rest of the
kernel.

Mike.
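
The filesystem-side translation described above might look roughly like
the sketch below. The thread does not say how a 32-bit on-disk rdev is
split, so the helper takes the number of minor bits as a parameter.
rdev32_to_majmin() and struct devnum are hypothetical names used only
for illustration; they are not kernel API.

```c
#include <stdint.h>

/* Hypothetical in-memory form: full 32-bit major and 32-bit minor. */
struct devnum {
	uint32_t major;
	uint32_t minor;
};

/*
 * Split a 32-bit on-disk rdev into major and minor.
 * minor_bits is the width of the minor field in the on-disk format
 * (an assumption supplied by the caller, e.g. 8 or 16).
 * Returns 0 on success, -1 if minor_bits is out of range.
 */
static int rdev32_to_majmin(uint32_t raw, unsigned minor_bits,
			    struct devnum *out)
{
	if (minor_bits == 0 || minor_bits >= 32)
		return -1;

	out->major = raw >> minor_bits;
	out->minor = raw & ((UINT32_C(1) << minor_bits) - 1);
	return 0;
}
```

With a 16-bit minor field, the on-disk value 0x00030001 comes out as
major 3, minor 1, and the rest of the code only ever sees the widened
pair.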

2003-07-17 23:50:33

by Joel Becker

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Fri, Jul 18, 2003 at 01:11:15AM +0200, Andries Brouwer wrote:
> :-) A surprising question.
> Why expand that?
> Because we would like to use more than 16 bits in device numbers.

Yes, but there is a nice simplicity in saying filesystems that
support 64bit device numbers get the expanded space, and filesystems
that cannot are limited to 16bits. Most modern systems would have an
updated set of filesystems. All pre-existing filesystems have only
16bit device numbers. All new mknod64() calls will only work on
filesystems that can store 64bits.

Joel

--

"Hey mister if you're gonna walk on water,
Could you drop a line my way?"

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127

2003-07-17 23:49:51

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Thu, Jul 17, 2003 at 11:49:15PM +0000, Miquel van Smoorenburg wrote:

> But why do you need a 32bit interface to the kernel when a
> 32:32 interface exists? Userland can translate 32 bit major/minor
> into 32:32 to the kernel, if a 64 bits syscall exists.

I? A 32bit interface to the kernel? Why do you think I want one?

The discussion has become too long, and people react to single
sentences in a reply instead of reading the thread.

[This started when I answered Andrew and wrote about a dev_t:
8+8 when 16-bit, otherwise 16+16 when 32-bit, otherwise 32+32.
Look: no kernel involved. No interface involved.
This structure is defined by <sys/sysmacros.h>.]
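
The three-stage structure described above (8+8 when 16-bit, otherwise
16+16 when 32-bit, otherwise 32+32) can be sketched in user-space C as
follows. This is only an illustration under stated assumptions: "16-bit"
is read as "the value fits in 16 bits", and decode_dev() and struct
devnum are hypothetical names, not the actual <sys/sysmacros.h> macros.

```c
#include <stdint.h>

/* Hypothetical result type: full-width major and minor. */
struct devnum {
	uint32_t major;
	uint32_t minor;
};

/*
 * Decode a device number using the three-stage rule:
 *   value fits in 16 bits -> 8:8
 *   value fits in 32 bits -> 16:16
 *   otherwise             -> 32:32
 */
static struct devnum decode_dev(uint64_t dev)
{
	struct devnum d;

	if (dev <= UINT64_C(0xffff)) {
		d.major = (uint32_t)(dev >> 8);
		d.minor = (uint32_t)(dev & 0xff);
	} else if (dev <= UINT64_C(0xffffffff)) {
		d.major = (uint32_t)(dev >> 16);
		d.minor = (uint32_t)(dev & 0xffff);
	} else {
		d.major = (uint32_t)(dev >> 32);
		d.minor = (uint32_t)(dev & UINT64_C(0xffffffff));
	}
	return d;
}
```

Under this reading, 0x0301, 0x00030001 and 0x0000000300000001 all decode
to major 3, minor 1, so the decoder needs conditionals and one device can
have more than one numeric form.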

2003-07-18 00:51:09

by Andries Brouwer

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Thu, Jul 17, 2003 at 05:04:45PM -0700, Joel Becker wrote:

> Yes, but there is a nice simplicity in saying filesystems that
> support 64bit device numbers get the expanded space, and filesystems
> that cannot are limited to 16bits. Most modern systems would have an
> updated set of filesystems. All pre-existing filesystems have only
> 16bit device numbers. All new mknod64() calls will only work on
> filesystems that can store 64bits.

You are an optimist.
My transition is much slower - I am a slow kind of person.

There is no flag day. The kernel must be updated, glibc must be
updated, user space software must be updated. A long process
that will take years. Indeed, so far we have not succeeded in
updating the kernel, and eight years went by.

Filesystems? Last I looked reiserfs handled 32 bits.

Really, we need the three stages - if the middle 32-bit stage
is absent too much software breaks. We must go forward slowly.

Andries

2003-07-18 07:52:16

by Joel Becker

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

On Fri, Jul 18, 2003 at 03:05:58AM +0200, Andries Brouwer wrote:
> There is no flag day. The kernel must be updated, glibc must be
> updated, user space software must be updated. A long process
> that will take years. Indeed, so far we have not succeeded in
> updating the kernel, and eight years went by.

Yes, software must be updated. Why on earth would you update it
twice when once will do?

> Filesystems? Last I looked reiserfs handled 32 bits.

And you treat it as having 16bits until reiser4 or reiser5
handles 64bits.

Joel

--

"Under capitalism, man exploits man. Under Communism, it's just
the opposite."
- John Kenneth Galbraith

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: [email protected]
Phone: (650) 506-8127

2003-07-20 14:18:54

by H. Peter Anvin

Subject: Re: [PATCH] print_dev_t for 2.6.0-test1-mm

> On Wed, Jul 16, 2003 at 07:54:36PM -0700, H. Peter Anvin wrote:
>
>> > 16-bit only: 8:8, otherwise 32-bit only: 16:16, otherwise 32:32.
>>
>> I would still recommend the arrangement for 64-bit dev_t that I posted
>> a while ago:
>>
>> dev_t<63:40> := major<31:8>
>> dev_t<39:16> := minor<31:8>
>> dev_t<15:8> := major<7:0>
>> dev_t<7:0> := minor<7:0>
>
> Yes, but we also need to handle 32-bit dev_t incarnations.
>
>> If you want, you can even make it 32-bit-friendly, although it makes
>> it more complex; for example, this version would implement 32-bit with
>> a 12:20 split:
>>
>> dev_t<63:44> := major<31:12>
>> dev_t<43:32> := minor<31:20>
>> dev_t<31:28> := major<11:8>
>> dev_t<27:16> := minor<19:8>
>> dev_t<15:8> := major<7:0>
>> dev_t<7:0> := minor<7:0>
>
> Too messy. But you are right - no conditionals involved.

It's only necessary, though, if we require that dev32_t is a pure
truncation of dev_t, which is nice but not necessary.

No aliasing is a *very* good thing.
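
The two bit layouts quoted above translate fairly directly into shifts
and masks. The sketch below is a user-space illustration only; the
function names (hpa64_*, hpa1220_*) and struct devnum are hypothetical
and do not come from any posted patch.

```c
#include <stdint.h>

struct devnum {
	uint32_t major;
	uint32_t minor;
};

/* First layout: low 16 bits are the classic 8:8 number. */
static uint64_t hpa64_encode(uint32_t major, uint32_t minor)
{
	return ((uint64_t)(major >> 8) << 40) |   /* dev<63:40> = major<31:8> */
	       ((uint64_t)(minor >> 8) << 16) |   /* dev<39:16> = minor<31:8> */
	       ((uint64_t)(major & 0xff) << 8) |  /* dev<15:8>  = major<7:0>  */
	       (uint64_t)(minor & 0xff);          /* dev<7:0>   = minor<7:0>  */
}

static struct devnum hpa64_decode(uint64_t dev)
{
	struct devnum d;

	d.major = (uint32_t)(((dev >> 40) << 8) | ((dev >> 8) & 0xff));
	d.minor = (uint32_t)((((dev >> 16) & 0xffffff) << 8) | (dev & 0xff));
	return d;
}

/* Second layout: low 32 bits form a 12:20 number, low 16 bits an 8:8. */
static uint64_t hpa1220_encode(uint32_t major, uint32_t minor)
{
	return ((uint64_t)(major >> 12) << 44) |           /* major<31:12> */
	       ((uint64_t)(minor >> 20) << 32) |           /* minor<31:20> */
	       ((uint64_t)((major >> 8) & 0xf) << 28) |    /* major<11:8>  */
	       ((uint64_t)((minor >> 8) & 0xfff) << 16) |  /* minor<19:8>  */
	       ((uint64_t)(major & 0xff) << 8) |           /* major<7:0>   */
	       (uint64_t)(minor & 0xff);                   /* minor<7:0>   */
}

static struct devnum hpa1220_decode(uint64_t dev)
{
	struct devnum d;

	d.major = (uint32_t)(((dev >> 44) << 12) |
			     (((dev >> 28) & 0xf) << 8) |
			     ((dev >> 8) & 0xff));
	d.minor = (uint32_t)((((dev >> 32) & 0xfff) << 20) |
			     (((dev >> 16) & 0xfff) << 8) |
			     (dev & 0xff));
	return d;
}
```

In both layouts every (major, minor) pair maps to exactly one 64-bit
value and decoding needs no conditionals. In the second layout a device
with major below 4096 and minor below 2^20 fits entirely in the low
32 bits, so truncating to 32 bits loses nothing for such devices.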