2002-11-17 19:45:28

by Doug Ledford

[permalink] [raw]
Subject: Why /dev/sdc1 doesn't show up...

Because module loading of any incestious, cross-locking modules is horked
:-P

NOTE: I suspect the same bug applies to IDE devices as well, but you
wouldn't see it unless you compile your IDE drivers as modules and use
initrd or equivelant to load the modules.

Longer answer:

Device scans happen almost exclusively at either host module init time or
device module init time. At that point in time, either the host the
device is on or the high level driver accessing the device will still be
in it's init_module() routine. That, of course, implies that either
host->hostt->module->live is 0 or that *_template->module->live is 0 (and
consequently so is fops->owner->live == 0). As a result, when you
register sdc with the driverfs code, it then tries to fops->open(sdc) to
read the partition table and then automatically register subdevices.
This fails if you are currently loading sd_mod because fops->owner->live
== 0. This fails if you are loading your low level driver because the
replacement for __MOD_INC_USE_COUNT(sdev->host->hostt->module); in
sd_open() is to instead use try_module_get(sdev->host->hostt->module) and
with the low level driver still in it's init_module() routine, this fails.

For hot plug events when the module is already live, things work fine.
However, that doesn't help hot plug drivers at boot up because when they
register their hot plug table they immediately get called for the devices
that are already present, and in those cases they will have the exact same
problem that we have with non hot plug drivers.

Hrmph...

Working on a fix. Haven't decided how to do it yet. Something as ugly as
adding:

driver_template.module->live = 1;
scsi_register_host(&driver_template);
driver_template.module->live = 0;

in scsi_module.c works, but is too ugly to live (and totally defeats the
purpose of the new module loading code anyway). Oh, and all the high
level drivers would have to do the same thing in their module init
routines in order to make things work properly when the lldd is loaded
before the high level driver.

I could make all the scsi drivers delay bus scans (via a work queue entry
that we don't wait for completion on, but I haven't figured out how to do
that without leaking mem yet, unless I write a special SCSI worker thread
that kfree()'s the work structs on completion...) until after their init
routines have finished (and do the same for the high level scsi drivers),
but that's quite a pain in the ass and doesn't fix IDE. A generic fix
would be preferable.

Suggestions Rusty?


--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606


2002-11-17 19:54:13

by Alexander Viro

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...



On Sun, 17 Nov 2002, Doug Ledford wrote:

> in scsi_module.c works, but is too ugly to live (and totally defeats the
> purpose of the new module loading code anyway). Oh, and all the high

There is a purpose? Seriously, "no use of ones object during init" is
WRONG. Rusty, remember I've told you that block devices need to be
able to open() during init? That's what it is.

We _might_ eventually kludge around that, but IMO the ->live checks on
the init side are just plain wrong.

2002-11-17 20:05:03

by Doug Ledford

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

On Sun, Nov 17, 2002 at 03:01:06PM -0500, Alexander Viro wrote:
>
>
> On Sun, 17 Nov 2002, Doug Ledford wrote:
>
> > in scsi_module.c works, but is too ugly to live (and totally defeats the
> > purpose of the new module loading code anyway). Oh, and all the high
>
> There is a purpose? Seriously, "no use of ones object during init" is
> WRONG. Rusty, remember I've told you that block devices need to be
> able to open() during init? That's what it is.
>
> We _might_ eventually kludge around that, but IMO the ->live checks on
> the init side are just plain wrong.

I tend to agree, and I almost wrote that in my email, but then decided I
hadn't thought on the issue long enough to declare that and then be proven
wrong and look like a fool.

--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606

2002-11-17 20:09:17

by Alexander Viro

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...



On Sun, 17 Nov 2002, Alexander Viro wrote:

> There is a purpose? Seriously, "no use of ones object during init" is
> WRONG. Rusty, remember I've told you that block devices need to be
> able to open() during init? That's what it is.
>
> We _might_ eventually kludge around that, but IMO the ->live checks on
> the init side are just plain wrong.

PS: logics in case of block devices is very simple: by the time when you
can handle open and IO on a disk, you are already set up - no point
bailing out after that point; if other disks served by the same driver
do not work, it means that they should be left out, not that driver
should fail init. IOW, logics of init, regardless of modular/nonmodular
makes that a point where you are past failure exits.

And I'd rather see similar rules for other interfaces. Yes, it might
require API changes - e.g. old block API was badly b0rken in that respect
since register_blkdev() allowed open() attempts, no matter whether you are
ready or not, so block drivers used to be seriously racy in that area.

Right way is "don't export an object until you are ready to serve it" and
once your API is of that sort, you are pretty much done. Note that non-modular
case _also_ need that sort of logics, simply because we might get userland
up and running before the initcalls are done.

2002-11-17 23:13:08

by Andries Brouwer

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

On Sun, Nov 17, 2002 at 02:52:58PM -0500, Doug Ledford wrote:

> Working on a fix. Haven't decided how to do it yet.

I encountered similar phenomena with usb-storage.
I think the proper solution is to never automatically
scan for a partition table.
We perhaps need to do that at boot time, but in all other
cases user space can ask the kernel to read a partition table.
For usb-storage things work fairly well (for some kernels)
using
blockdev --rereadpt /dev/sdx

Andries


2002-11-17 23:37:08

by Doug Ledford

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

On Mon, Nov 18, 2002 at 12:20:04AM +0100, Andries Brouwer wrote:
> On Sun, Nov 17, 2002 at 02:52:58PM -0500, Doug Ledford wrote:
>
> > Working on a fix. Haven't decided how to do it yet.
>
> I encountered similar phenomena with usb-storage.
> I think the proper solution is to never automatically
> scan for a partition table.

Scanning for a partition table is just one aspect. In general, once a
module has handed over it's entry points to some other kernel area those
entry points should be ready for use, so blocking access to them isn't
needed. OTOH, when you are tearing a module down then you do want access
to stop, so that's the proper time to set live == 0.

> We perhaps need to do that at boot time, but in all other
> cases user space can ask the kernel to read a partition table.

(When it works, which it currently doesn't for me in latest bk)

> For usb-storage things work fairly well (for some kernels)
> using
> blockdev --rereadpt /dev/sdx

Yeah, that's fine for hotplug. But, you know what, for PCI hotplug scsi
this wouldn't even be an issue since a true hot plug event normally would
be after module init is complete and scanning the table wouldn't be a
problem then because module->live would be non-0 ;-)

--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606

2002-11-18 08:52:29

by Rusty Russell

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

In message <[email protected]> you write:
> Because module loading of any incestious, cross-locking modules is horked
> :-P
>
> NOTE: I suspect the same bug applies to IDE devices as well, but you
> wouldn't see it unless you compile your IDE drivers as modules and use
> initrd or equivelant to load the modules.
>
> Longer answer:
>
> Device scans happen almost exclusively at either host module init time or
> device module init time. At that point in time, either the host the
> device is on or the high level driver accessing the device will still be
> in it's init_module() routine. That, of course, implies that either
> host->hostt->module->live is 0 or that *_template->module->live is 0 (and
> consequently so is fops->owner->live == 0).

Yes, it's an interesting corner case. As I implied before, if a
module is register an interface, you do not need to try_module_get().
Why? Because *someone* already has a reference by definition (either
the module itself is calling you, or someone else with whom the module
registered).

Unfortunately, this does complicate code slightly if you other paths
which *do* need the try_module_get(). But the existence of this
interface is no accident: I know Al Viro wants to split every register
interface to "reserve" and "use", and make all modules use them
correctly, but I chose to avoid forcing such changes on all module
authors and all interface authors for good reason.

[ Mainly to Al ]:

I know what a PITA it is, because I implemented it a year ago. So
first the interface had to be changed to two-stage init, *then* the
module had to be changed to use it. If a module uses 5 interfaces,
that's a series of five patches, each one interlocked with the
corresponding interface. And implementing it showed me that authors
were unlikely to correctly use any interface more complex than the
current one 8(

And it *still* means you need two paths for your code: one for "I know
it's not actually "active" yet, but I'm doing init and I need to
access it anyway". So every interface gains significant complexity by
effectively implementing their own "live" flag...

> Suggestions Rusty?

Hope that helps,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-18 09:44:22

by Alexander Viro

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...



On Mon, 18 Nov 2002, Rusty Russell wrote:

> Unfortunately, this does complicate code slightly if you other paths
> which *do* need the try_module_get(). But the existence of this
> interface is no accident: I know Al Viro wants to split every register
> interface to "reserve" and "use", and make all modules use them
> correctly, but I chose to avoid forcing such changes on all module
> authors and all interface authors for good reason.

No. I want to get the things into "don't bloody export it until you
can handle it" land.

> [ Mainly to Al ]:
>
> I know what a PITA it is, because I implemented it a year ago. So
> first the interface had to be changed to two-stage init, *then* the
> module had to be changed to use it. If a module uses 5 interfaces,
> that's a series of five patches, each one interlocked with the
> corresponding interface. And implementing it showed me that authors
> were unlikely to correctly use any interface more complex than the
> current one 8(

> And it *still* means you need two paths for your code: one for "I know
> it's not actually "active" yet, but I'm doing init and I need to
> access it anyway". So every interface gains significant complexity by
> effectively implementing their own "live" flag...

Not really. For case in question (block devices) there is only one path
and I'd rather keep it that way, thank you very much.

Again, by the time when add_disk() got to reading partition table, device
is _there_. That's it - we had set it up completely, it's ready for IO,
whatever. At that point we want generic code to do some work with that
device. And there is no magic path for that - it's normal open/read/close.

There is no "live" flag - you had shown it, you'd better be ready to have
it used. Doesn't cause any problems.

It's simpler than old interface - there you indeed had to pull off all sorts
of tricks. Current rule is trivial - "nobody touches it until they see it;
register when you are 100% ready to deal with users".

And no, I don't buy arguments about poor interface-writers. You do some
infrastructure intended to be used by driver-writers - you are supposed
to have a clue. 'Cause having rabid monkeys on crack on *both* sides of
an interface is a recipe for disaster and on the driver side you are
guaranteed to have a bunch of them.

We need to have interfaces cleaned up. No silver bullets there. There's
maybe a dozen of interfaces that cover 99% of all drivers out there. Remaining
1% can and should fend for itself - you do something really tricky, you
are responsible for getting it right.

And yes, we need clued people for that cleanup. Forget the modules;
if use of an API requires magic, it will lead to fuckups. Lots of them.
If API doesn't make sense for users of that API - ditto.

Papering over the races in driver init is OK, as long as it doesn't hurt
drivers with clean init. Your logics with ->live on init side *does* hurt
such drivers - it forces them to do magic just to neutralize your band-aids.

2002-11-18 09:55:40

by Roman Zippel

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

Hi,

On Mon, 18 Nov 2002, Rusty Russell wrote:

> And it *still* means you need two paths for your code: one for "I know
> it's not actually "active" yet, but I'm doing init and I need to
> access it anyway". So every interface gains significant complexity by
> effectively implementing their own "live" flag...

Rusty, you know how refcounts work? Especially the "if
(atomic_dec_and_test(&obj->ref)) cleanup(obj)" part is really interesting.

bye, Roman

2002-11-18 23:45:31

by Rusty Russell

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

In message <[email protected]> you
write:
> Not really. For case in question (block devices) there is only one path
> and I'd rather keep it that way, thank you very much.

See other posting. This is a fundamental design decision, and it's
not changing. Sorry.

> Again, by the time when add_disk() got to reading partition table, device
> is _there_. That's it - we had set it up completely, it's ready for IO,
> whatever. At that point we want generic code to do some work with that
> device. And there is no magic path for that - it's normal open/read/close.
>
> There is no "live" flag - you had shown it, you'd better be ready to have
> it used. Doesn't cause any problems.

Unless the module does something else afterwards which fails and wants
to fail the init. You're saying "don't do that", which is not a good
answer 8(

You can implement a "make_module_live()" in module.h if you want
module authors to do two-stage init manually (and trust them to get it
right). Or you can run a notifier on "enlivening" a module: I'd hoped
to avoid that.

Hope that helps,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-19 00:01:56

by Doug Ledford

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

On Tue, Nov 19, 2002 at 10:49:21AM +1100, Rusty Russell wrote:
> In message <[email protected]> you
> write:
> > Not really. For case in question (block devices) there is only one path
> > and I'd rather keep it that way, thank you very much.
>
> See other posting. This is a fundamental design decision, and it's
> not changing. Sorry.

Then you'll have to back out the patch to module.c because it's already
changed.


--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606

2002-11-18 23:58:12

by Alan

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

On Mon, 2002-11-18 at 23:49, Rusty Russell wrote:
> In message <[email protected]> you
> write:
> > Not really. For case in question (block devices) there is only one path
> > and I'd rather keep it that way, thank you very much.
>
> See other posting. This is a fundamental design decision, and it's
> not changing. Sorry.

So is the block layer one, and it was fundamentally designed first 8)

2002-11-19 00:01:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...


On Tue, 19 Nov 2002, Rusty Russell wrote:
>
> See other posting. This is a fundamental design decision, and it's
> not changing. Sorry.

Rusty, you take that approach, and the module code flies out of the
kernel.

The block device code (and a lot of other code) has been there happily
forever. And not breaking drivers was part of the module loader rule. Now
you seem to say that drivers should be broken een if they are perfectly
fine and do not have any races as is.

If so, then bye bye new module loader.

Linus

2002-11-19 05:49:35

by Rusty Russell

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

> right). Or you can run a notifier on "enlivening" a module: I'd hoped
> to avoid that.

Actually, after some thought, this seems to clearly be the Right
Thing, because it solves another existing race. Consider a module
which does:

if (!register_foo(&my_foo))
goto cleanup;
if (!create_proc_entry(&my_entry))
goto cleanup_foo;

If register_foo() calls /sbin/hotplug, the module can still fail to
load and /sbin/hotplug is called for something that doesn't exist.
With the new module loader, you can also have /sbin/hotplug try to
access the module before it's gone live, which will fail to prevent
the "using before we know module won't fail init" race.

Now, if you run /sbin/hotplug out of a notifier which is fired when
the module actually goes live, this problem vanishes. It also means
we can block module unload until /sbin/hotplug is run.

The part that makes this feel like the Right Thing is that adding to
init/main.c:

/* THIS_MODULE == NULL */
notifier_call_chain(&module_notifiers, MODULE_LIVE, NULL);

means that /sbin/hotplug is called for everything which was registered
at boot. (We may not want to do this, but in general the symmetry
seems really nice).

[ Note: the logic for /sbin/hotplug applies to any similar "publicity"
function which promises that something now exists. ]

Al, thoughts?
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-19 07:05:59

by Alexander Viro

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...



On Tue, 19 Nov 2002, Rusty Russell wrote:

> means that /sbin/hotplug is called for everything which was registered
> at boot. (We may not want to do this, but in general the symmetry
> seems really nice).
>
> [ Note: the logic for /sbin/hotplug applies to any similar "publicity"
> function which promises that something now exists. ]
>
> Al, thoughts?
> Rusty.

Wrong granularity. "module had finished initialization" is _not_ a natural
event wrt stuff exported by that module. And that's one of the main problems
I have with your approach - you are trying to make module loader do the
stuff that has nothing whatsoever with modules.

Moreover, if driver is built into the kernel, you get (at the very least)
a different mechanism for triggering the same events, if not the outright
different sequence of events. Which is a Bad Thing(tm) since it leads to
a shitload of extra problems with debugging.

I went through the init code of all block device drivers and there was
a metric buttload of crap in handling failures. I doubt that more than
5% had any relation to races - much more often taking given failure path
had lead to oops. Deterministic one. With nothing else happening with
the system.

Double freeing. Forgetting to free. Dereference after freeing. Leaving
timers active. Leaving pointers to local structures in global arrays.
Plain and simple dereferencing of NULL due to wrong order. Freeing what
you had never allocated. Similar for resources other than memory. Doing
cleanup in a fashion that screws other drivers. IOW, all sorts of crap
that can accumulate in a spaghetty code that never got any serious testing.
Path-dependent crap, at that - different exits are fscked in different ways.

On that background all shite you are so concerned about is a noise. More often
than not, screwups in export order that might be papered over by your semantics
change either do not matter since the module doesn't fail in init _or_ do not
matter since it does fail and gets the system fscked to hell and back on the
failure exit path, races or not.

In other words, this band-aid doesn't do anything useful - if you go and
fix the crap in driver's init you can fix the export order while you are
at it. If you do not - pray that failure will not happen, because it it
will you are screwed without any races.

There is no fscking way for module loader to handle that. Leave it as-is,
possibly making the loader BUG() if init fails and refcount is positive.
Anything beyond that is not a responsibility of modules code. Note that
such BUG() doesn't add any new breakage - such situation had always been
a bug.

And yes, that means the need to fix the error recovery in driver init.
Tough. I don't know any other way to handle
kfree(p);
p = NULL;
kfree(p->bar);
and I don't believe that you can find one. Again, no silver bullets -
if you want to talk about handling half-way failures in module init,
be ready to deal with the above (embedded into the mess of goto and
function calls, to boot).

We are talking about bugs that had always been on these codepaths - both
in modular and built-in cases. They need to be dealt with before we can
do anything about these codepaths. Has nothing to module loader...

What we _can_ do (and that will be on per-subsystem basis) is providing
APIs that would cut down on the amount of glue needed in driver init.
And that's where switch to use of /sbin/hotplug (again, on per-subsystem
basis) might help big way. But for each subsystem that will require
looking through the drivers using it.

BTW, remember the flamef^Warguments about turning kdev_t into a pointer
and thus avoiding digging through piles and piles of cr^Hode? Didn't
fly for exactly the same reasons - there's no way to do it magically
for entire kernel without breaking tons of code and/or creating an
impossible-to-debug mess...

2002-11-19 15:58:44

by Doug Ledford

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

On Tue, Nov 19, 2002 at 04:52:25PM +1100, Rusty Russell wrote:
> > right). Or you can run a notifier on "enlivening" a module: I'd hoped
> > to avoid that.
>
> Actually, after some thought, this seems to clearly be the Right
> Thing, because it solves another existing race. Consider a module
> which does:
>
> if (!register_foo(&my_foo))
> goto cleanup;
> if (!create_proc_entry(&my_entry))
> goto cleanup_foo;

There is *NO* module that does this right now and can be considered even
close to working. The rule always has been "register yourself when you
are ready for use". You're trying to add this new "You can fail after
registering yourself" semantic for brain dead coders that can't write an
init function to save thier ass. My position is that in doing so, you
fuck all of us that do write a reasonable init sequence and handle our
error conditions. Plus, since this is a changes in semantics, you have
possibly 50 or 100 modules that rely on the old behaviour, and maybe a few
that are broken in regards to registration ordering. I think you are
trying to fix the wrong group of modules here.

So, to me, the answer is clear. The rule is hard and fast, you don't hand
out your function pointers to other modules or the core kernel until you
are ready for them to be used. Don't muck with the module loader to solve
the problem, fix the maybe 4 or 5 modules that might violate this rule.

> If register_foo() calls /sbin/hotplug, the module can still fail to
> load and /sbin/hotplug is called for something that doesn't exist.

Which is just totally fucking stupid on the part of any module author that
would do things in that order. You're trying to write a module loader
that makes it safe for a module author to be a total moron. Don't. Let
the morons go code somewhere else. For the kernel, make them follow a few
sensible rules instead of enforcing moron proof crap on everyone else.

> With the new module loader, you can also have /sbin/hotplug try to
> access the module before it's gone live, which will fail to prevent
> the "using before we know module won't fail init" race.
>
> Now, if you run /sbin/hotplug out of a notifier which is fired when
> the module actually goes live, this problem vanishes. It also means
> we can block module unload until /sbin/hotplug is run.
>
> The part that makes this feel like the Right Thing is that adding to
> init/main.c:
>
> /* THIS_MODULE == NULL */
> notifier_call_chain(&module_notifiers, MODULE_LIVE, NULL);
>
> means that /sbin/hotplug is called for everything which was registered
> at boot. (We may not want to do this, but in general the symmetry
> seems really nice).
>
> [ Note: the logic for /sbin/hotplug applies to any similar "publicity"
> function which promises that something now exists. ]
>
> Al, thoughts?

I actually like the idea of having an automatic go live function that we
can trigger, but I still don't think that justifies blocking a module from
entering itself during init.

--
Doug Ledford <[email protected]> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606

2002-11-19 17:48:23

by Jeff Garzik

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

Doug Ledford wrote:

> On Tue, Nov 19, 2002 at 04:52:25PM +1100, Rusty Russell wrote:
>
> >>right). Or you can run a notifier on "enlivening" a module: I'd hoped
> >>to avoid that.
> >
> >Actually, after some thought, this seems to clearly be the Right
> >Thing, because it solves another existing race. Consider a module
> >which does:
> >
> > if (!register_foo(&my_foo))
> > goto cleanup;
> > if (!create_proc_entry(&my_entry))
> > goto cleanup_foo;
>
>
> There is *NO* module that does this right now and can be considered even
> close to working. The rule always has been "register yourself when you
> are ready for use". You're trying to add this new "You can fail after
> registering yourself" semantic for brain dead coders that can't write an
> init function to save thier ass. My position is that in doing so, you
> fuck all of us that do write a reasonable init sequence and handle our
> error conditions. Plus, since this is a changes in semantics, you have
> possibly 50 or 100 modules that rely on the old behaviour, and maybe a
> few
> that are broken in regards to registration ordering. I think you are
> trying to fix the wrong group of modules here.
>
> So, to me, the answer is clear. The rule is hard and fast, you don't
> hand
> out your function pointers to other modules or the core kernel until you
> are ready for them to be used. Don't muck with the module loader to
> solve
> the problem, fix the maybe 4 or 5 modules that might violate this rule.



violently agreed. This has the potential for requiring an update of
almost every driver in the kernel, does it not?

Jeff



2002-11-19 22:26:55

by Andries Brouwer

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

On Tue, Nov 19, 2002 at 02:12:54AM -0500, Alexander Viro wrote:

> BTW, remember the flamef^Warguments about turning kdev_t into a pointer
> and thus avoiding digging through piles and piles of cr^Hode? Didn't
> fly for exactly the same reasons - there's no way to do it magically
> for entire kernel without breaking tons of code and/or creating an
> impossible-to-debug mess...

You never convinced me. Maybe we ever meet and can talk..

In the meantime:
Yesterday or so I built versions of 2.5.48 with 32-bit and 64-bit dev_t.

The code is full of conversions dev_t -> int -> dev_t -> int.
Ugly. I gave the mknod method in struct inode_operations type
int (*mknod) (struct inode *,struct dentry *,int,dev_t);
thus avoiding much conversion.
Do you have objections?

Andries

# mount /dev/hde6 /fbsd6 -t ufs -o ufstype=44bsd,ro
# ./ls -l /fbsd6/dev/da2s[34]
brw-r----- 1 root disk 4, 0x040012 Oct 31 21:06 /fbsd6/dev/da2s3
brw-r----- 1 root disk 4, 0x050012 Oct 31 21:06 /fbsd6/dev/da2s4
#

2002-11-20 07:21:17

by Rusty Russell

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

In message <[email protected]> you write:
> Doug Ledford wrote:
> > are ready for them to be used. Don't muck with the module loader to
> > solve
> > the problem, fix the maybe 4 or 5 modules that might violate this rule.
>
> violently agreed. This has the potential for requiring an update of
> almost every driver in the kernel, does it not?

God no! I'm crazy maybe, but not stupid. I know of two interfaces
which would be affected, and no drivers.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-20 07:19:50

by Rusty Russell

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

In message <[email protected]> you write:
> On Tue, Nov 19, 2002 at 10:49:21AM +1100, Rusty Russell wrote:
> > In message <[email protected]>
you
> > write:
> > > Not really. For case in question (block devices) there is only one path
> > > and I'd rather keep it that way, thank you very much.
> >
> > See other posting. This is a fundamental design decision, and it's
> > not changing. Sorry.
>
> Then you'll have to back out the patch to module.c because it's already
> changed.

Yeah, I just noticed. To be honest, I was wrong. And the code
shouldn't be put back until (if ever) I have a solution which solves
the races and *doesn't* break working code.

And meanwhile, there are more important things (like reducing the 400k
overhead of CONFIG_KALLSYMS adds to the kernel).

Sorry for the overzealousness,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-20 07:21:16

by Rusty Russell

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

In message <[email protected]> you
write:
> Wrong granularity. "module had finished initialization" is _not_ a natural
> event wrt stuff exported by that module.

Well, I have to disagree here. "I have successfully finished my
initialization" is the same logic you're trying to beat into module
author's heads before they commit their interfaces.

> And that's one of the main problems I have with your approach - you
> are trying to make module loader do the stuff that has nothing
> whatsoever with modules.

Or, put another way, I'm trying to ensure that correct code in the
kernel is correct code in a module, by having the module code simulate
the conditions in which the original occurred.

> Moreover, if driver is built into the kernel, you get (at the very least)
> a different mechanism for triggering the same events, if not the outright
> different sequence of events. Which is a Bad Thing(tm) since it leads to
> a shitload of extra problems with debugging.

Speaking of which, what's the long-term plan for hotplug events at
boot? Will you get a shitload of "add" events, or rely on userspace
to scan?

> Double freeing. Forgetting to free. Dereference after freeing. Leaving
> timers active. Leaving pointers to local structures in global arrays.

Hmm, my audit of copy_to/from_user uncovered similar bogosities, and
I've dropped more than one Trivial Patch Monkey patches because they
cleaned up one case in an incurable sewer of problems 8(

The "better to have a bad driver than no driver" becomes less true as
people start copying from it, too. This is where meta-maintainers
like Jeff Garzik make a definite difference as chief shit-kicker.

> On that background all shite you are so concerned about is a noise.

Perhaps. But I can't change the world: that's your job.

> What we _can_ do (and that will be on per-subsystem basis) is providing
> APIs that would cut down on the amount of glue needed in driver init.

Fervently agree: it's hard to get one line wrong, *and* almost any
scheme can be implemented on top of it.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-20 07:19:58

by Rusty Russell

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

In message <[email protected]> you write:
> On Tue, 19 Nov 2002, Rusty Russell wrote:
> >
> > See other posting. This is a fundamental design decision, and it's
> > not changing. Sorry.
>
> Rusty, you take that approach, and the module code flies out of the
> kernel.

Linus,

Sorry, you're right of course. I should at least have done
something like the "unsafe" wedge which was used for the other
interfaces. Under the circumstances, I agree with your fix.

> you seem to say that drivers should be broken een if they are perfectly
> fine and do not have any races as is.

Linus, I would like an answer: how does one register two /proc
entries? The following has a race when modular:

if (!create_proc_entry("foo"...))
goto fail;
if (!create_proc_entry("bar"...))
goto fail_free_foo;
return 0;

I sympathize with Al's "we have worse problems, just BUG()
when it happens" approach, but I don't agree with Doug's "that's
clearly broken code!" assertion. Al wants to split such code into:

if (!reserve_proc_entry("foo"...))
goto fail;
if (!reserve_proc_entry("bar"...))
goto fail_free_foo;

/* Now we're safe! */
activate_proc_entry("foo");
activate_proc_entry("bar");
return 0;

Note that this code change is *only* required because the code is
modular. The code has works perfectly fine in the kernel, and has for
years, and very few people have ever noticed any problem with it.

Now, let's imagine one simple implementation of reserve_proc_entry and
activate_proc_entry: a proc entry gets a flag (call it "live") which
it can use to tell the difference between reserved and activated
entries, and it ignores !live ones.

Now, if you substitute the "live" flag in the proc entry for a "live"
flag in the proc->owner entry, you have the previous module code,
except:

1) You don't need to change all the modules to do two-stage init.
2) You don't need to change all the interfaces to two-stage init.
3) You *do* have problems with complex interfaces which need to do
something more than just flip the flag to make it "live".

Now do you see why I like this scheme? For most interfaces, it "just
works": I shall endevour to fix the remaining cases in a way which is
no less clear than current code, and if I fail, hey, we leave your
change in and we're no worse than 2.4.

> If so, then bye bye new module loader.

Fair enough. Frankly, I just wanted to neaten the module code to
allow other improvements without breaking userspace every time. But I
felt it wrong to re-implement the same module races: let this be a
valuable lesson to me not to bite off more than I can chew 8(

Thanks for your patience,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2002-11-20 15:38:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...


On Wed, 20 Nov 2002, Rusty Russell wrote:
>
> Linus, I would like an answer: how does one register two /proc
> entries?

I think the answer is simple: you don't.

In practice this doesn't happen. And even in theory when it does happen,
trying to do it right is _worse_ than just letting it be. So what if one
of them fails? Big deal. You didn't get the file, because you are playing
with modules that do something that they shouldn't have done.

Rule #1: KISS

Rule #2: "don't overdesign" aka "perfect is the enemy of the good"

In other words, screw the cases that don't matter, we only need to verify
that we don't oops. Which means that the _right_ approach for proc entries
is something like this:

if (register_device(....))
return -ENOMEM;

if (!create_proc_entry("foo"...))
printk("Unable to create '/proc/foo'\n");
if (!create_proc_entry("bar"...))
printk("Unable to create '/proc/bar'\n");
return 0;

and simply face the fact that trying to fail the module load at a late
time is a bad idea, and if somebody already has registered your /proc
entries then the system maintainer is doing something wrong and it is
_his_ problem and he can

The kernel shouldn't oops. And if that means that we should just leave the
module loaded (not return an error), then so what? Hey, it may work fine
without the /proc entries, erroring out might be _worse_.

This is what happens with built-in drivers already. Modules are nothing
but a convenience. They're not "worthy" of complexity.

Linus

2002-11-20 23:36:01

by john slee

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

[ cc trimmed a-plenty ]

On Tue, Nov 19, 2002 at 12:55:20PM -0500, Jeff Garzik wrote:
> >There is *NO* module that does this right now and can be considered even
> >close to working. The rule always has been "register yourself when you
> >are ready for use". You're trying to add this new "You can fail after
> >registering yourself" semantic for brain dead coders that can't write an
> >init function to save thier ass. My position is that in doing so, you
> >fuck all of us that do write a reasonable init sequence and handle our
> >error conditions. Plus, since this is a changes in semantics, you have
> >possibly 50 or 100 modules that rely on the old behaviour, and maybe a
> >few
> >that are broken in regards to registration ordering. I think you are
> >trying to fix the wrong group of modules here.
> >
> >So, to me, the answer is clear. The rule is hard and fast, you don't
> >hand
> >out your function pointers to other modules or the core kernel until you
> >are ready for them to be used. Don't muck with the module loader to
> >solve
> >the problem, fix the maybe 4 or 5 modules that might violate this rule.
>
>
>
> violently agreed. This has the potential for requiring an update of
> almost every driver in the kernel, does it not?

jeff, why not put some sample code in prominent public places
(kernel.org, or perhaps more appropriately kernelnewbies.org) that
provides examples of sane module code, since that appears to be
the problem here.

i ask you since i have a vague recollection of you mentioning
macro-fying all the nic drivers with m4 a while back, and to do this you
must have had some sort of basic skeleton to flesh out.

it won't help the idiots, but it may help people who are less than
intimate with the rules and regulations of lunix module authoring
(perhaps having come from a different OS).

j.

--
toyota power: http://indigoid.net/

2002-11-24 22:25:22

by Rusty Russell

[permalink] [raw]
Subject: Re: Why /dev/sdc1 doesn't show up...

In message <[email protected]> you wr
ite:
>
> On Wed, 20 Nov 2002, Rusty Russell wrote:
> >
> > Linus, I would like an answer: how does one register two /proc
> > entries?
>
> I think the answer is simple: you don't.

OK. Two devices? A device and a sysfs attribute? A device and a
filesystem? A device and a setsockopt? So I sympathise, but I
disagree here.

> This is what happens with built-in drivers already. Modules are nothing
> but a convenience. They're not "worthy" of complexity.

Agreed. Having an invisible "MUST NOT fail beyond this point" line is
leaving a subtle trap for driver writers.

We *already have* a mechanism to isolate a module: we did it to avoid
a two-stage module destroy policy. We can use the exact same
mechanism to avoid such a two-stage module init policy.

Modulo bugs, I've erred on the side of not breaking the 1500+ modules
in the kernel, and *not* exposing the unload races to them. Please
consider carefully.

Note 1: Al mentioned he wants to fire off userspace before initcalls
are done, introducing the same races in core kernel init. Without
seeing reasons, it's hard to tell why he wants this, but this would
require everyone to do two-stage init anyway.

Note 2: If you really want two-stage init, you're best off making it
explicit: ie. two functions: one which can fail (int module_prep()),
and one which can't (void module_start()). I regarded this as too
invasive a change just for an obscure module race.

Hope that clarifies?
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.