The following is a patch which removes the BKL from quite a few
drivers' release functions. The release functions are already
serialized in the VFS code by an atomic_t which guarantees that each
function will be called only once, after all file descriptors have been
closed. In addition, in these drivers, the BKL was _only_ held in the
release function and nowhere else in the driver where it might be needed.
Many of these patches simply remove the BKL from the file. This causes
no harm because the BKL was not really protecting anything, anyway.
Other patches try to actually fix the locking. Some do this by making
use of atomic operations with the atomic_* functions, or the
(test|set)_bit functions. Most of these patches replace uses of normal
integers which were used to keep open counts in the drivers. In other
some cases, a spinlock was added when the atomic operations could not
guarantee proper serialization by themselves. And, in very few cases,
the existing locking was extended to protect more things. These cases
are very uncommon because locking is very uncommon in most of these drivers.
Special care has been taken not to introduce more locking issues into
the drivers (do no harm). They're available as one big patch which is
against 2.4.14. The big patch is about 50k, so, instead of attaching
it, here is a link: http://lse.sourceforge.net/lockhier/patches/bkl.rollup
Here is documentation describing some of the patches and other locking
issues in the drivers: http://lse.sourceforge.net/lockhier/
The patch applies against 2.4.14.
--
David C. Hansen
[email protected]
> Many of these patches simply remove the BKL from the file. This causes
> no harm because the BKL was not really protecting anything, anyway.
> Other patches try to actually fix the locking. Some do this by making
> use of atomic operations with the atomic_* functions, or the
> (test|set)_bit functions. Most of these patches replace uses of normal
> integers which were used to keep open counts in the drivers. In other
> some cases, a spinlock was added when the atomic operations could not
> guarantee proper serialization by themselves. And, in very few cases,
> the existing locking was extended to protect more things. These cases
> are very uncommon because locking is very uncommon in most of these
> drivers.
At least some of the removals in the input tree are probably wrong. You are
introducing a race with deregistering of input devices.
Regards
Oliver
In article <01112211121601.00690@argo> you wrote:
> At least some of the removals in the input tree are probably wrong. You are
> introducing a race with deregistering of input devices.
Nope, it's fine to remove it. Input is racy all over the place and the list
are modified somewhere else without any locking anyways.
Christoph
--
Of course it doesn't work. We've performed a software upgrade.
Christoph Hellwig <[email protected]> said:
> In article <01112211121601.00690@argo> you wrote:
> > At least some of the removals in the input tree are probably wrong. You
> > are introducing a race with deregistering of input devices.
> Nope, it's fine to remove it. Input is racy all over the place and the list
> are modified somewhere else without any locking anyways.
"It is broken anyway, breaking it some more makes no difference"!?
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
On Thu, Nov 22, 2001 at 09:30:06AM -0300, Horst von Brand wrote:
> > Nope, it's fine to remove it. Input is racy all over the place and the list
> > are modified somewhere else without any locking anyways.
>
> "It is broken anyway, breaking it some more makes no difference"!?
Wether you lock access to shared data at one or zero points doesn't matter,
so it's not breaking it more.
Christoph
--
Of course it doesn't work. We've performed a software upgrade.
Christoph Hellwig <[email protected]> said:
Nope, it's fine to remove it. Input is racy all over the place and the list
are modified somewhere else without any locking anyways.
Horst von Brand <[email protected]> then said:
"It is broken anyway, breaking it some more makes no difference"!?
No, it is more a matter of "it is not helping at all, so removing it
makes no difference in behavior." Removing it does, however, help
clean up the code and remove unnecessary instances of the BKL from the
kernel code.
If you check the web page at
http://lse.sourceforge.net/lockhier/patches.html, you'll find
additional information on why this patch was produced. The most common
"no-op" was that (BKL) locking was done during release but not during
open. In some cases, there truly are things to guard. In some cases,
there really isn't. In all cases, nothing was really being correctly
guarded.
Usage of the BKL exists, in many cases, as a legendary artifact. Nobody
is sure if it's really needed, so everybody is afraid to take it out,
"just in case". These patches represent real research -- we believe it
really is safe to take it out in these cases. If we could not be sure,
we didn't try to patch it.
Rick
On Fri, 23 Nov 2001, Rick Lindsley wrote:
> Christoph Hellwig <[email protected]> said:
>
> Nope, it's fine to remove it. Input is racy all over the place and the list
> are modified somewhere else without any locking anyways.
>
> Horst von Brand <[email protected]> then said:
>
> "It is broken anyway, breaking it some more makes no difference"!?
>
> No, it is more a matter of "it is not helping at all, so removing it
> makes no difference in behavior." Removing it does, however, help
> clean up the code and remove unnecessary instances of the BKL from the
> kernel code.
>
> If you check the web page at
> http://lse.sourceforge.net/lockhier/patches.html, you'll find
> additional information on why this patch was produced. The most common
> "no-op" was that (BKL) locking was done during release but not during
> open. In some cases, there truly are things to guard. In some cases,
> there really isn't. In all cases, nothing was really being correctly
> guarded.
While this is doubtlessly true, please don't do things like removing the
lock from interfaces like the call to open() in the input subsystem.
People may depend on the lock being held there. Having open() under BKL
simplifies writing USB device drivers.
Regards
Oliver
In article <[email protected]> you wrote:
> While this is doubtlessly true, please don't do things like removing the
> lock from interfaces like the call to open() in the input subsystem.
> People may depend on the lock being held there. Having open() under BKL
> simplifies writing USB device drivers.
Beeing completly single-threaded also simplifies writing unclean drivers..
BTW, I've attached a patch that fixes the largest input races (against 2.4.6),
I don't see how to change the total lack of locking for other data structures
without an API change, though.
Christoph
--
Whip me. Beat me. Make me maintain AIX.
--- linux-2.4.6/drivers/input/input.c Mon Jun 4 21:17:55 2001
+++ linux/drivers/input/input.c Sun Jul 8 22:58:10 2001
@@ -57,6 +57,7 @@
static devfs_handle_t input_devfs_handle;
static int input_number;
static long input_devices[NBITS(INPUT_DEVICES)];
+static DECLARE_MUTEX(input_lock);
void input_event(struct input_dev *dev, unsigned int type, unsigned int code, int value)
{
@@ -222,6 +223,8 @@
struct input_handler *handler = input_handler;
struct input_handle *handle;
+ down(&input_lock);
+
/*
* Initialize repeat timer to default values.
*/
@@ -257,6 +260,8 @@
input_link_handle(handle);
handler = handler->next;
}
+
+ up(&input_lock);
}
void input_unregister_device(struct input_dev *dev)
@@ -265,6 +270,8 @@
struct input_dev **devptr = &input_dev;
struct input_handle *dnext;
+ down(&input_lock);
+
/*
* Kill any pending repeat timers.
*/
@@ -294,6 +301,8 @@
if (dev->number < INPUT_DEVICES)
clear_bit(dev->number, input_devices);
+
+ up(&input_lock);
}
void input_register_handler(struct input_handler *handler)
@@ -301,6 +310,8 @@
struct input_dev *dev = input_dev;
struct input_handle *handle;
+ down(&input_lock);
+
/*
* Add minors if needed.
*/
@@ -324,6 +335,8 @@
input_link_handle(handle);
dev = dev->next;
}
+
+ up(&input_lock);
}
void input_unregister_handler(struct input_handler *handler)
@@ -332,6 +345,8 @@
struct input_handle *handle = handler->handle;
struct input_handle *hnext;
+ down(&input_lock);
+
/*
* Tell the handler to disconnect from all devices it keeps open.
*/
@@ -358,38 +373,60 @@
if (handler->fops != NULL)
input_table[handler->minor >> 5] = NULL;
+
+ up(&input_lock);
}
static int input_open_file(struct inode *inode, struct file *file)
{
- struct input_handler *handler = input_table[MINOR(inode->i_rdev) >> 5];
struct file_operations *old_fops, *new_fops = NULL;
+ struct input_handler *handler;
+ unsigned int minor = MINOR(inode->i_rdev), index;
int err;
- /* No load-on-demand here? */
- if (!handler || !(new_fops = fops_get(handler->fops)))
+ if (minor >= INPUT_DEVICES)
return -ENODEV;
- /*
- * That's _really_ odd. Usually NULL ->open means "nothing special",
- * not "no device". Oh, well...
- */
- if (!new_fops->open) {
- fops_put(new_fops);
- return -ENODEV;
+ down(&input_lock);
+ index = minor >> 5;
+ handler = input_table[index];
+
+ if (handler)
+ new_fops = fops_get(handler->fops);
+ if (!new_fops) {
+ char modname[32];
+
+ up(&input_lock);
+ sprintf(modname, "input-handler-%d", index);
+ request_module(modname);
+ down(&input_lock);
+
+ err = -ENODEV;
+ handler = input_table[index];
+ if (!handler)
+ goto end;
+ if (!(new_fops = fops_get(handler->fops)))
+ goto end;
}
+
old_fops = file->f_op;
file->f_op = new_fops;
- lock_kernel();
- err = new_fops->open(inode, file);
- unlock_kernel();
+ if (new_fops->open) {
+ lock_kernel();
+ err = new_fops->open(inode, file);
+ unlock_kernel();
+ } else
+ err = 0;
if (err) {
fops_put(file->f_op);
file->f_op = fops_get(old_fops);
}
+
fops_put(old_fops);
+end:
+ up(&input_lock);
return err;
}
Am Freitag 23 November 2001 11:47 schrieb Christoph Hellwig:
> In article <[email protected]>
you wrote:
> > While this is doubtlessly true, please don't do things like removing the
> > lock from interfaces like the call to open() in the input subsystem.
> > People may depend on the lock being held there. Having open() under BKL
> > simplifies writing USB device drivers.
>
> Beeing completly single-threaded also simplifies writing unclean drivers..
This is true. However USB drivers have to cope with devices becoming
unplugged at all times. The races this produces are not nice.
> BTW, I've attached a patch that fixes the largest input races (against
> 2.4.6), I don't see how to change the total lack of locking for other data
> structures without an API change, though.
This looks very good. Could you get this to the maintainer ?
Regards
Oliver
I wrote:
If you check the web page at
http://lse.sourceforge.net/lockhier/patches.html, you'll find
additional information on why this patch was produced. The most common
"no-op" was that (BKL) locking was done during release but not during
open. In some cases, there truly are things to guard. In some cases,
there really isn't. In all cases, nothing was really being correctly
guarded.
[email protected] replied:
While this is doubtlessly true, please don't do things like removing the
lock from interfaces like the call to open() in the input subsystem.
People may depend on the lock being held there. Having open() under BKL
simplifies writing USB device drivers.
The good news is, the patches addressed unnecessary BKL's in release(),
not open(), so I don't think the patches we submitted will cause you to
lose any sleep. The better news is, Christoph has even produced a
patch to address your concerns (which it sounds like you like.) The
best news is, the kernel is cleaner now in multiple ways.
Life is good.
Rick
Christoph Hellwig wrote:
>Beeing completly single-threaded also simplifies writing unclean drivers..
>
>BTW, I've attached a patch that fixes the largest input races (against 2.4.6),
>I don't see how to change the total lack of locking for other data structures
>without an API change, though.
>
The removal of the BKL in the input.c's open() was beyond the scope of
what we were trying to do. But, it snuck into the patch anyway. It
probably shouldn't have been there.
In the patch, in the open function, you do this:
if (handler)
new_fops = fops_get(handler->fops);
But, the fops_get() #define already cheecks to make sure handler isn't null:
#define fops_get(fops) \
(((fops) && (fops)->owner) \
? ( try_inc_mod_count((fops)->owner) ? (fops) : NULL ) \
: (fops))
It's nice to be careful, but is that extra check really necessary?
And, the BKL is still held during the open() call to the handler's
open(). Is this trying to make sure that open() isn't called twice on a
single device? I checked all of the files in drivers/input, and didn't
see any other uses of the BKL.
--
David C. Hansen
[email protected]
On Monday 26 November 2001 18:46, David C. Hansen wrote:
> In the patch, in the open function, you do this:
> if (handler)
> new_fops = fops_get(handler->fops);
>
> But, the fops_get() #define already cheecks to make sure handler isn't
> null: #define fops_get(fops) \
> (((fops) && (fops)->owner) \
> ? ( try_inc_mod_count((fops)->owner) ? (fops) : NULL ) \
> : (fops))
Look closer, it doesn't check 'handler' (it couldn't).
--
Ciao,
Flavio Stanchina
Trento - Italy
"The best defense against logic is ignorance."
http://spazioweb.inwind.it/fstanchina/
Flavio Stanchina wrote:
>Look closer, it doesn't check 'handler' (it couldn't).
>
Silly me... Thanks.