2005-09-01 18:24:26

by Brown, Len

[permalink] [raw]
Subject: RE: reboot vs poweroff


>Patch tested and works fine here. You should probably make a
>note in the bugzilla so we don't get a conflicting merge
>from the ACPI folks.

One might also consider that it would be a good idea to
send patches that break ACPI files to the ACPI maintainer
and [email protected] before sending them
to Linus...

-Len


2005-09-02 04:44:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: reboot vs poweroff

"Brown, Len" <[email protected]> writes:

>
>>Patch tested and works fine here. You should probably make a
>>note in the bugzilla so we don't get a conflicting merge
>>from the ACPI folks.
>
> One might also consider that it would be a good idea to
> send patches that break ACPI files to the ACPI maintainer
> and [email protected] before sending them
> to Linus...

My apologies, for bug fixes that are not complete and simply move
where the bug is. My apologies also for not cc'ing you, I didn't
intend to omit you but it never occurred to me. The patch was
also 2 lines and obviously correct.

For this round I knew you were on the CC list and deliberately included
you.

My goal for the reboot/halt/suspend/kexec path is to fix it so the
generic code is correct and consistent. Something it hasn't been for
years creating the affect that a correct bug fix in one place would
break something else.

Until the reboot paths are correct and consistent things will continue
to break, in weird and unpredictable ways, that will keep us all
hunting weird strange bugs for a long time. I think the S4 suspend
case is the last code path that needs to be fixed. It is certainly
the last one I am aware of.

Eric

2005-09-10 21:09:20

by Eric W. Biederman

[permalink] [raw]
Subject: Re: reboot vs poweroff

"Brown, Len" <[email protected]> writes:

>
>>Patch tested and works fine here. You should probably make a
>>note in the bugzilla so we don't get a conflicting merge
>>from the ACPI folks.
>
> One might also consider that it would be a good idea to
> send patches that break ACPI files to the ACPI maintainer
> and [email protected] before sending them
> to Linus...

OK hopefully this is my final version of this bug fix.

Eric

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -59,6 +59,10 @@ extern void machine_crash_shutdown(struc
* Architecture independent implemenations of sys_reboot commands.
*/

+extern void kernel_restart_prepare(char *cmd);
+extern void kernel_halt_prepare(void);
+extern void kernel_power_off_prepare(void);
+
extern void kernel_restart(char *cmd);
extern void kernel_halt(void);
extern void kernel_power_off(void);
diff --git a/kernel/power/disk.c b/kernel/power/disk.c
--- a/kernel/power/disk.c
+++ b/kernel/power/disk.c
@@ -17,12 +17,12 @@
#include <linux/delay.h>
#include <linux/fs.h>
#include <linux/mount.h>
+#include <linux/pm.h>

#include "power.h"


extern suspend_disk_method_t pm_disk_mode;
-extern struct pm_ops * pm_ops;

extern int swsusp_suspend(void);
extern int swsusp_write(void);
@@ -49,13 +49,11 @@ dev_t swsusp_resume_device;

static void power_down(suspend_disk_method_t mode)
{
- unsigned long flags;
int error = 0;

- local_irq_save(flags);
switch(mode) {
case PM_DISK_PLATFORM:
- device_shutdown();
+ kernel_power_off_prepare();
error = pm_ops->enter(PM_SUSPEND_DISK);
break;
case PM_DISK_SHUTDOWN:
diff --git a/kernel/sys.c b/kernel/sys.c
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -361,17 +361,35 @@ out_unlock:
return retval;
}

+/**
+ * emergency_restart - reboot the system
+ *
+ * Without shutting down any hardware or taking any locks
+ * reboot the system. This is called when we know we are in
+ * trouble so this is our best effort to reboot. This is
+ * safe to call in interrupt context.
+ */
void emergency_restart(void)
{
machine_emergency_restart();
}
EXPORT_SYMBOL_GPL(emergency_restart);

-void kernel_restart(char *cmd)
+/**
+ * kernel_restart - reboot the system
+ *
+ * Shutdown everything and perform a clean reboot.
+ * This is not safe to call in interrupt context.
+ */
+void kernel_restart_prepare(char *cmd)
{
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
system_state = SYSTEM_RESTART;
device_shutdown();
+}
+void kernel_restart(char *cmd)
+{
+ kernel_restart_prepare(cmd);
if (!cmd) {
printk(KERN_EMERG "Restarting system.\n");
} else {
@@ -382,6 +400,12 @@ void kernel_restart(char *cmd)
}
EXPORT_SYMBOL_GPL(kernel_restart);

+/**
+ * kernel_kexec - reboot the system
+ *
+ * Move into place and start executing a preloaded standalone
+ * executable. If nothing was preloaded return an error.
+ */
void kernel_kexec(void)
{
#ifdef CONFIG_KEXEC
@@ -390,9 +414,7 @@ void kernel_kexec(void)
if (!image) {
return;
}
- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
- system_state = SYSTEM_RESTART;
- device_shutdown();
+ kernel_restart_prepare(NULL);
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();
machine_kexec(image);
@@ -400,21 +422,39 @@ void kernel_kexec(void)
}
EXPORT_SYMBOL_GPL(kernel_kexec);

-void kernel_halt(void)
+/**
+ * kernel_halt - halt the system
+ *
+ * Shutdown everything and perform a clean system halt.
+ */
+void kernel_halt_prepare(void)
{
notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
system_state = SYSTEM_HALT;
device_shutdown();
+}
+void kernel_halt(void)
+{
+ kernel_halt_prepare();
printk(KERN_EMERG "System halted.\n");
machine_halt();
}
EXPORT_SYMBOL_GPL(kernel_halt);

-void kernel_power_off(void)
+/**
+ * kernel_power_off - power_off the system
+ *
+ * Shutdown everything and perform a clean system power_off.
+ */
+void kernel_power_off_prepare(void)
{
notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
system_state = SYSTEM_POWER_OFF;
device_shutdown();
+}
+void kernel_power_off(void)
+{
+ kernel_power_off_prepare();
printk(KERN_EMERG "Power down.\n");
machine_power_off();
}

2005-09-11 08:43:53

by Meelis Roos

[permalink] [raw]
Subject: Re: reboot vs poweroff

> OK hopefully this is my final version of this bug fix.

I'm not in the position to test threse patches any more because the
mainboard in question died last week when I installed some more RAM into
it. Masoud Sharbiani <[email protected]> added to CC: since he
experienced the same problem.

--
Meelis Roos ([email protected])

2005-09-11 08:55:24

by Eric W. Biederman

[permalink] [raw]
Subject: Re: reboot vs poweroff

Meelis Roos <[email protected]> writes:

>> OK hopefully this is my final version of this bug fix.
>
> I'm not in the position to test threse patches any more because the mainboard in
> question died last week when I installed some more RAM into it. Masoud Sharbiani
> <[email protected]> added to CC: since he experienced the same problem.

Thanks for looking.

At this point I am less worried about fixing the bug. Then I am
worried about the form of the bug fix. (i.e. do weird compile
options break it).

But any testing or review is appreciated.

Eric

2005-09-20 17:44:50

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 1/2] reboot: Comment and factor the main reboot functions


In the lead up to 2.6.13 I fixed a large number of reboot
problems by making the calling conventions consistent. Despite
checking and double checking my work it appears I missed an
obvious one.

This first patch simply refactors the reboot routines
so all of the preparation for various kinds of reboots
are in their own functions. Making it very hard to
get the various kinds of reboot out of sync.

Signed-off-by: Eric W. Biederman <[email protected]>


---

include/linux/reboot.h | 4 ++++
kernel/sys.c | 52 ++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 50 insertions(+), 6 deletions(-)

bc95b88c85ec3e7a0525f73f6c3ae19fa9640fbd
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -59,6 +59,10 @@ extern void machine_crash_shutdown(struc
* Architecture independent implemenations of sys_reboot commands.
*/

+extern void kernel_restart_prepare(char *cmd);
+extern void kernel_halt_prepare(void);
+extern void kernel_power_off_prepare(void);
+
extern void kernel_restart(char *cmd);
extern void kernel_halt(void);
extern void kernel_power_off(void);
diff --git a/kernel/sys.c b/kernel/sys.c
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -361,17 +361,35 @@ out_unlock:
return retval;
}

+/**
+ * emergency_restart - reboot the system
+ *
+ * Without shutting down any hardware or taking any locks
+ * reboot the system. This is called when we know we are in
+ * trouble so this is our best effort to reboot. This is
+ * safe to call in interrupt context.
+ */
void emergency_restart(void)
{
machine_emergency_restart();
}
EXPORT_SYMBOL_GPL(emergency_restart);

-void kernel_restart(char *cmd)
+/**
+ * kernel_restart - reboot the system
+ *
+ * Shutdown everything and perform a clean reboot.
+ * This is not safe to call in interrupt context.
+ */
+void kernel_restart_prepare(char *cmd)
{
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
system_state = SYSTEM_RESTART;
device_shutdown();
+}
+void kernel_restart(char *cmd)
+{
+ kernel_restart_prepare(cmd);
if (!cmd) {
printk(KERN_EMERG "Restarting system.\n");
} else {
@@ -382,6 +400,12 @@ void kernel_restart(char *cmd)
}
EXPORT_SYMBOL_GPL(kernel_restart);

+/**
+ * kernel_kexec - reboot the system
+ *
+ * Move into place and start executing a preloaded standalone
+ * executable. If nothing was preloaded return an error.
+ */
void kernel_kexec(void)
{
#ifdef CONFIG_KEXEC
@@ -390,9 +414,7 @@ void kernel_kexec(void)
if (!image) {
return;
}
- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
- system_state = SYSTEM_RESTART;
- device_shutdown();
+ kernel_restart_prepare(NULL);
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();
machine_kexec(image);
@@ -400,21 +422,39 @@ void kernel_kexec(void)
}
EXPORT_SYMBOL_GPL(kernel_kexec);

-void kernel_halt(void)
+/**
+ * kernel_halt - halt the system
+ *
+ * Shutdown everything and perform a clean system halt.
+ */
+void kernel_halt_prepare(void)
{
notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
system_state = SYSTEM_HALT;
device_shutdown();
+}
+void kernel_halt(void)
+{
+ kernel_halt_prepare();
printk(KERN_EMERG "System halted.\n");
machine_halt();
}
EXPORT_SYMBOL_GPL(kernel_halt);

-void kernel_power_off(void)
+/**
+ * kernel_power_off - power_off the system
+ *
+ * Shutdown everything and perform a clean system power_off.
+ */
+void kernel_power_off_prepare(void)
{
notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
system_state = SYSTEM_POWER_OFF;
device_shutdown();
+}
+void kernel_power_off(void)
+{
+ kernel_power_off_prepare();
printk(KERN_EMERG "Power down.\n");
machine_power_off();
}

2005-09-20 17:51:39

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 2/2] suspend: Cleanup calling of power off methods.


In the lead up to 2.6.13 I fixed a large number of reboot
problems by making the calling conventions consistent. Despite
checking and double checking my work it appears I missed an
obvious one.

The S4 suspend code for PM_DISK_PLATFORM was also calling
device_shutdown without setting system_state, and was
not calling the appropriate reboot_notifier.

This patch fixes the bug by replacing the call of device_suspend with
kernel_poweroff_prepare.

Various forms of this failure have been fixed and tracked for a while.

Thanks for tracking this down go to: Alexey Starikovskiy,
Meelis Roos <[email protected]>, Nigel Cunningham <[email protected]>,
Pierre Ossman <[email protected]>

History of this bug is at:
http://bugme.osdl.org/show_bug.cgi?id=4320

Signed-off-by: Eric W. Biederman <[email protected]>


---

kernel/power/disk.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)

2c72ba7b1126a7ccf3e8fc032f041a223e39aa97
diff --git a/kernel/power/disk.c b/kernel/power/disk.c
--- a/kernel/power/disk.c
+++ b/kernel/power/disk.c
@@ -17,12 +17,12 @@
#include <linux/delay.h>
#include <linux/fs.h>
#include <linux/mount.h>
+#include <linux/pm.h>

#include "power.h"


extern suspend_disk_method_t pm_disk_mode;
-extern struct pm_ops * pm_ops;

extern int swsusp_suspend(void);
extern int swsusp_write(void);
@@ -49,13 +49,11 @@ dev_t swsusp_resume_device;

static void power_down(suspend_disk_method_t mode)
{
- unsigned long flags;
int error = 0;

- local_irq_save(flags);
switch(mode) {
case PM_DISK_PLATFORM:
- device_shutdown();
+ kernel_power_off_prepare();
error = pm_ops->enter(PM_SUSPEND_DISK);
break;
case PM_DISK_SHUTDOWN:

2005-09-20 21:09:19

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Hi!

> In the lead up to 2.6.13 I fixed a large number of reboot
> problems by making the calling conventions consistent. Despite
> checking and double checking my work it appears I missed an
> obvious one.
>
> The S4 suspend code for PM_DISK_PLATFORM was also calling
> device_shutdown without setting system_state, and was
> not calling the appropriate reboot_notifier.

ACK on both. But should not you submit patch via -mm, so it gets at
least some testing there?
Pavel
--
if you have sharp zaurus hardware you don't need... you know my address

2005-09-21 02:10:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Pavel Machek <[email protected]> writes:

> Hi!
>
>> In the lead up to 2.6.13 I fixed a large number of reboot
>> problems by making the calling conventions consistent. Despite
>> checking and double checking my work it appears I missed an
>> obvious one.
>>
>> The S4 suspend code for PM_DISK_PLATFORM was also calling
>> device_shutdown without setting system_state, and was
>> not calling the appropriate reboot_notifier.
>
> ACK on both. But should not you submit patch via -mm, so it gets at
> least some testing there?

The code is obviously correct, and the people with the problem
have reported that this approach solves it.

If this bit of functionality is to even work we need to do
something like this.

So I don't see what benefit putting this in -mm would give. If
I was aggressive I would say that this needs to be in 2.6.13.N.
If I'm not following some procedure I don't have a problem
changing though.

This is the final fix I know of to get a consistent set of semantics
for the everything in the ``reboot path''.

>From a practical standpoint I am very tardy in getting this out.

Eric

2005-09-21 10:18:57

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Hi!

> >> The S4 suspend code for PM_DISK_PLATFORM was also calling
> >> device_shutdown without setting system_state, and was
> >> not calling the appropriate reboot_notifier.
> >
> > ACK on both. But should not you submit patch via -mm, so it gets at
> > least some testing there?
>
> The code is obviously correct, and the people with the problem
> have reported that this approach solves it.
>
> If this bit of functionality is to even work we need to do
> something like this.
>
> So I don't see what benefit putting this in -mm would give. If
> I was aggressive I would say that this needs to be in 2.6.13.N.
> If I'm not following some procedure I don't have a problem
> changing though.

I think you are not following the proper procedure. All the patches
should go through akpm.
Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-09-21 13:52:54

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Pavel Machek <[email protected]> writes:

> I think you are not following the proper procedure. All the patches
> should go through akpm.

Ok. I must have missed the most recent procedure change.

Anyway. Andrew has picked these patches up so all looks good.

Eric

2005-09-21 16:35:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.



On Wed, 21 Sep 2005, Pavel Machek wrote:
>
> I think you are not following the proper procedure. All the patches
> should go through akpm.

One issue is that I actually worry that Andrew will at some point be where
I was a couple of years ago - overworked and stressed out by just tons and
tons of patches.

Yes, he's written/modified tons of patch-tracking tools, and the git
merging hopefully avoids some of the pressures, but it still worries me.
If Andrew burns out, we'll all suffer hugely.

I'm wondering what we can do to offset those kinds of issues. I _do_ like
having -mm as a staging area and catching some problems there, so going
through andrew is wonderful in that sense, but it has downsides.

Andrew?

Linus

2005-09-21 17:30:44

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Linus Torvalds <[email protected]> writes:

> On Wed, 21 Sep 2005, Pavel Machek wrote:
>>
>> I think you are not following the proper procedure. All the patches
>> should go through akpm.

Ok. I thought it was fine to send simple and obviously correct bug
fixes to Linus.

> One issue is that I actually worry that Andrew will at some point be where
> I was a couple of years ago - overworked and stressed out by just tons and
> tons of patches.
>
> Yes, he's written/modified tons of patch-tracking tools, and the git
> merging hopefully avoids some of the pressures, but it still worries me.
> If Andrew burns out, we'll all suffer hugely.
>
> I'm wondering what we can do to offset those kinds of issues. I _do_ like
> having -mm as a staging area and catching some problems there, so going
> through andrew is wonderful in that sense, but it has downsides.

It is especially challenging for people like me who typically work on
parts of the kernel without a maintainer. So there frequently isn't
an intermediate I can submit my patches to.

Eric

2005-09-21 17:40:57

by Alexander Nyberg

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

On Wed, Sep 21, 2005 at 09:35:20AM -0700 Linus Torvalds wrote:

>
>
> On Wed, 21 Sep 2005, Pavel Machek wrote:
> >
> > I think you are not following the proper procedure. All the patches
> > should go through akpm.
>
> One issue is that I actually worry that Andrew will at some point be where
> I was a couple of years ago - overworked and stressed out by just tons and
> tons of patches.
>
> Yes, he's written/modified tons of patch-tracking tools, and the git
> merging hopefully avoids some of the pressures, but it still worries me.
> If Andrew burns out, we'll all suffer hugely.
>
> I'm wondering what we can do to offset those kinds of issues. I _do_ like
> having -mm as a staging area and catching some problems there, so going
> through andrew is wonderful in that sense, but it has downsides.
>

Morever bugme.osdl.org is severely underworked (acpi being a noteable
exception) and Andrew has stepped in alot there too. Alot of bugs
reported on the mailing list are only followed up by Andrew.

I think he really should receive much more help than he currently does.

2005-09-21 17:47:14

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Linus Torvalds <[email protected]> wrote:
>
>
>
> On Wed, 21 Sep 2005, Pavel Machek wrote:
> >
> > I think you are not following the proper procedure. All the patches
> > should go through akpm.
>
> One issue is that I actually worry that Andrew will at some point be where
> I was a couple of years ago - overworked and stressed out by just tons and
> tons of patches.

Patches are very low overhead, really. It's patches which don't work which
take lots of time - a single dud patch can take hours and can make me think
rude thoughts about its originator.

> Yes, he's written/modified tons of patch-tracking tools, and the git
> merging hopefully avoids some of the pressures, but it still worries me.
> If Andrew burns out, we'll all suffer hugely.
>
> I'm wondering what we can do to offset those kinds of issues. I _do_ like
> having -mm as a staging area and catching some problems there, so going
> through andrew is wonderful in that sense, but it has downsides.
>
> Andrew?
>

I'm doin OK.

Patch volume isn't a problem wrt the simple mechanics of handling them.
The problem we have at present is lack of patch reviewing bandwidth. I'll
be tightening things up in that area. Relatively few developers seem to
have the stomach to do a line-by-line through large patches, and it would
be nice to refocus people a bit on that. Christoph's work is hugely
appreciated, thanks.

Famous last words, but the actual patch volume _has_ to drop off one day.
In fact there doesn't seem to much happening out there wrt 2.6.15.

Bugs are a big problem - it takes 4 hours minimum to get a -mm out the door
and a single bug can cause it to slip to the next day in which case I have
to start again. A couple of times it has taken over two days just to get
together a tree which boots on four architectures and compiles on seven.

I'm spending more and more time on bugs now. We have hundreds of bugs
which people have taken the time to report, which the relevant developers
know about and NOTHING IS HAPPENING. "I can't reproduce it" is not an
adequate reason when there are nice testers out there who are available to
work through the diagnosis process. We have hundreds of machines out there
which we are known to have broken and developers just need to reapportion
some of their time to getting these things fixed.

The -mm tree does prevent a large amount of crap from hitting mainline -
I'd guess the bug leakthrough rate is ~10%, although that 90% tends to be
the easy stuff - often compile errors. I'd like to release -mm's more
often and I'd like -mm to have less of a wild-and-crappy reputation. Both
of these would happen if originators were to test ther stuff more
carefully.


2005-09-21 18:03:00

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

[email protected] (Eric W. Biederman) wrote:
>
> Linus Torvalds <[email protected]> writes:
>
> > On Wed, 21 Sep 2005, Pavel Machek wrote:
> >>
> >> I think you are not following the proper procedure. All the patches
> >> should go through akpm.
>
> Ok. I thought it was fine to send simple and obviously correct bug
> fixes to Linus.

I habitually scoop up patches and will get them into Linus (preferably
after 1-2 -mm cycles) if he ducks them.

> > One issue is that I actually worry that Andrew will at some point be where
> > I was a couple of years ago - overworked and stressed out by just tons and
> > tons of patches.
> >
> > Yes, he's written/modified tons of patch-tracking tools, and the git
> > merging hopefully avoids some of the pressures, but it still worries me.
> > If Andrew burns out, we'll all suffer hugely.
> >
> > I'm wondering what we can do to offset those kinds of issues. I _do_ like
> > having -mm as a staging area and catching some problems there, so going
> > through andrew is wonderful in that sense, but it has downsides.
>
> It is especially challenging for people like me who typically work on
> parts of the kernel without a maintainer. So there frequently isn't
> an intermediate I can submit my patches to.

Yup. And MAINTAINERS has quite a few omissions. I generally know who
should be poked and if there's nobody obvious I have 26000 patches to grep
through to find out who might know a bit about that code. Low-level x86 is
a bit of a problem really because it has many cooks and no obvious chef.

Individual maintainers have differing response times, differing
attentiveness and differing patchloss ratios.

There's also confusion once I've cc'ed a maintainer on a patch over whether
I'll be sending it to Linus or whether I want them to.

If a maintainer has a tree in -mm then I'll autodrop the patch if they
merge it, so there's no confusion there.

If the maintainer says "thanks, merged" and I don't have their tree in -mm
then I'll tend to hang onto the patch indefinitely until it finally hits
-linus. Or I'll eventually forget and merge it up anyway ;)

If the maintainer just acks the patch I'll send it in to Linus.

If the maintainer nacks the patch I'll either drop it or I'll mark it
not-for-merging and hang onto it anwyay, as a reminder that we have some
bug which needs fixing.

If the maintainer has a tree in -mm and doesn't merge the patch I'll hang
onto it and periodically resend to the maintainer until some definite
response comes back.

2005-09-21 18:10:20

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Andrew Morton <[email protected]> writes:

> I'm doin OK.

Good to hear.

> Patch volume isn't a problem wrt the simple mechanics of handling them.
> The problem we have at present is lack of patch reviewing bandwidth. I'll
> be tightening things up in that area. Relatively few developers seem to
> have the stomach to do a line-by-line through large patches, and it would
> be nice to refocus people a bit on that. Christoph's work is hugely
> appreciated, thanks.
>
> Famous last words, but the actual patch volume _has_ to drop off one day.
> In fact there doesn't seem to much happening out there wrt 2.6.15.

Due to changes coming through git or that there will simply be fewer
things that need to be patched?

As for 2.6.15 I know I have patches in the queue that I intend to send
out later this week, which probably count. I wonder if other developers
are similar.

Eric

2005-09-21 18:16:24

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Alexander Nyberg <[email protected]> wrote:
>
> On Wed, Sep 21, 2005 at 09:35:20AM -0700 Linus Torvalds wrote:
>
> >
> >
> > On Wed, 21 Sep 2005, Pavel Machek wrote:
> > >
> > > I think you are not following the proper procedure. All the patches
> > > should go through akpm.
> >
> > One issue is that I actually worry that Andrew will at some point be where
> > I was a couple of years ago - overworked and stressed out by just tons and
> > tons of patches.
> >
> > Yes, he's written/modified tons of patch-tracking tools, and the git
> > merging hopefully avoids some of the pressures, but it still worries me.
> > If Andrew burns out, we'll all suffer hugely.
> >
> > I'm wondering what we can do to offset those kinds of issues. I _do_ like
> > having -mm as a staging area and catching some problems there, so going
> > through andrew is wonderful in that sense, but it has downsides.
> >
>
> Morever bugme.osdl.org is severely underworked (acpi being a noteable
> exception) and Andrew has stepped in alot there too. Alot of bugs
> reported on the mailing list are only followed up by Andrew.
>
> I think he really should receive much more help than he currently does.

Yes, kernel bugmeister is a completely separate function from
patchmonkeying. It is something which a separate person can and should do.

My current thinking is that I'll develop the processes, find out what works
and then look to hand it off to some other sucker. I wouldn't claim that
this is going very well at present, perhaps because I'm just not putting
enough time into the bugmeistering to be able to demonstrate what works and
what does not.

I wouldn't say that bugmeister is a fulltime job, but it'll be a
lot-of-time job. It needs someone who isn't shy and who has a good
understanding of the kernel code-wise, of the processes (hah) and of the
people.

The ability to maintain an overall view of where we're at, which bugs are
serious and which aren't. The ability to succinctly communicate that
overview to everyone else. Able to tell Linus "you can't release a kernel
until bugs A, B and C are fixed". The skills and gut-feel to know when a
patch is some once-off which can be ignored unless it reoccurs, etc. It's
one of those things which can consume as much effort as one is able to put
into it.

Kernel development is more professional than we like to pretend nowadays,
and developers will react well to someone who is doing this for us. It's
pretty boring tho.

2005-09-21 18:25:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

[email protected] (Eric W. Biederman) wrote:
>
> > Famous last words, but the actual patch volume _has_ to drop off one day.
> > In fact there doesn't seem to much happening out there wrt 2.6.15.
>
> Due to changes coming through git or that there will simply be fewer
> things that need to be patched?

We're at -rc2 and I only have only maybe 100 patches tagged for 2.6.15 at
this time. The number of actual major features lined up for 2.6.15 looks
relatively small too.

As I said, famous last words. But we have to finish this thing one day ;)

> As for 2.6.15 I know I have patches in the queue that I intend to send
> out later this week, which probably count. I wonder if other developers
> are similar.

Possibly. Quite a few of the git trees are looking pretty fat. I need to
get another kernel-status thingy out soon, but that takes many hours of
bugzilla-poking.

2005-09-21 18:35:39

by Diego Calleja

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

El Wed, 21 Sep 2005 19:36:30 +0200,
Alexander Nyberg <[email protected]> escribi?:

> Morever bugme.osdl.org is severely underworked (acpi being a noteable
> exception) and Andrew has stepped in alot there too. Alot of bugs
> reported on the mailing list are only followed up by Andrew.

One of the things I'm _really_ missing from OSDL's bugzilla setup is a mailing
list (if there's one I've never heard about it) where all changes/new bugs/
random crap are posted. Something like:

http://lists.freedesktop.org/archives/xorg-bugzilla-noise/2005-April/thread.html

2005-09-21 18:51:12

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Diego Calleja <[email protected]> wrote:
>
> El Wed, 21 Sep 2005 19:36:30 +0200,
> Alexander Nyberg <[email protected]> escribi?:
>
> > Morever bugme.osdl.org is severely underworked (acpi being a noteable
> > exception) and Andrew has stepped in alot there too. Alot of bugs
> > reported on the mailing list are only followed up by Andrew.
>
> One of the things I'm _really_ missing from OSDL's bugzilla setup is a mailing
> list (if there's one I've never heard about it) where all changes/new bugs/
> random crap are posted.

There is such a list - it's a great way to depress yourself while still
half asleep.

[email protected], but I'm not sure how one subscribes. It doesn't
appear at http://lists.osdl.org/mailman/listinfo/. Martin?

2005-09-21 19:43:33

by Russell King

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

On Wed, Sep 21, 2005 at 07:36:30PM +0200, Alexander Nyberg wrote:
> Morever bugme.osdl.org is severely underworked (acpi being a noteable
> exception) and Andrew has stepped in alot there too. Alot of bugs
> reported on the mailing list are only followed up by Andrew.

That depends on your point of view. One of the biggest problems
bugme has is that not enough of the kernel developers are on it.

I originally signed up to bugme to be able to use it as a service
for those folk who want to report a bug against the new code I look
after, but (for me) it's turned into a bug reporting system for all
serial drivers and seemingly its my responsibility to fix them all
(because I can't assign them to anyone else - I don't even know
who else is signed up to bugme to be able to give them away.)

And as a direct result of this, I tend to end up rejecting bug
reports for random serial drivers that I have absolutely no idea
about just because I can't shift them. I don't like doing this
because it means as a whole we're losing valuable bug reports,
but if I don't take this drastic action, I'd end up with pages
of unprocessable bugs.

So, before trying to get the "underworked" bug system used more,
please try to get more developers signed up to it so that we have
the necessary folk behind the bug system to handle the increased
work load.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-21 19:45:51

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

On Wed, 21 Sep 2005, Andrew Morton wrote:
>
> I wouldn't say that bugmeister is a fulltime job, but it'll be a
> lot-of-time job. It needs someone who isn't shy and who has a good
> understanding of the kernel code-wise, of the processes (hah) and of the
> people.
>
> The ability to maintain an overall view of where we're at, which bugs are
> serious and which aren't. The ability to succinctly communicate that
> overview to everyone else. Able to tell Linus "you can't release a kernel
> until bugs A, B and C are fixed". The skills and gut-feel to know when a
> patch is some once-off which can be ignored unless it reoccurs, etc. It's
> one of those things which can consume as much effort as one is able to put
> into it.

I recognize this description: sorry, Andrew, can we please clone you?

Hugh

2005-09-21 19:55:05

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

>> > Morever bugme.osdl.org is severely underworked (acpi being a noteable
>> > exception) and Andrew has stepped in alot there too. Alot of bugs
>> > reported on the mailing list are only followed up by Andrew.
>>
>> One of the things I'm _really_ missing from OSDL's bugzilla setup is a mailing
>> list (if there's one I've never heard about it) where all changes/new bugs/
>> random crap are posted.
>
> There is such a list - it's a great way to depress yourself while still
> half asleep.
>
> [email protected], but I'm not sure how one subscribes. It doesn't
> appear at http://lists.osdl.org/mailman/listinfo/. Martin?

http://lists.osdl.org/mailman/listinfo/bugme-new

But it's not "all changes/new bugs/random crap", it's "new bugs".
And it's all categories, but there's handy-dandy X- header fields
to filter on.

M.

2005-09-21 20:12:04

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Russell King <[email protected]> wrote:
>
> So, before trying to get the "underworked" bug system used more,
> please try to get more developers signed up to it so that we have
> the necessary folk behind the bug system to handle the increased
> work load.

They don't actually need to be signed up to bugzilla. Just cc
[email protected] on the email trail and anything with
'[Bug NNNN]' in Subject: gets filed appropriately.

So all you need to do forward the email to the relevant culprit and cc
[email protected]. Unfotunately some of the emails which
bugzilla sends (the [bugme-new] ones) don't actually have
[email protected] on the To: or Cc: lines, so you need to
add that by hand the first time.

On problem with all this is that once the discussion has gone to email, it
kinda has to stay that way - if someone goes in and updates the bug via the
web interface, those people who were getting the info only via direct email
don't get to see the new info. Generally that works out OK.

It would be nice if

a) we could add non-bugzilla-account-holders to a bug's cc list and

b) Once bugzilla sees any person either sending or receiving emails or
web entries, it autoadds that person to the bug's cc list, so they get
email for all further activity. Could be a bit irritating, but tough
luck ;) We need to fix bugs.


2005-09-21 20:16:50

by Diego Calleja

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

El Wed, 21 Sep 2005 11:49:48 -0700,
Andrew Morton <[email protected]> escribi?:

> There is such a list - it's a great way to depress yourself while still
> half asleep.

Thanks. While we are at it, what do people thinks about this
very humble wiki page? (ie, does it have sense or I'd better remove it?):

http://wiki.kernelnewbies.org/wiki/LinuxChanges

I'm trying to do something useful, ie like:
http://wiki.dragonflybsd.org/index.php/DragonFly_Status

(yes, I also hate wikis, but if people knows of a better "colaborative
environment" crap which works even with lynx and using it is as easy
*cough* as writing text in a text form in lynx I'm all ears.
http://gcc.gnu.org/wiki/ is also a great example of why kernel could benefit
from using a wiki)

2005-09-22 07:15:18

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Russell King wrote:

>So, before trying to get the "underworked" bug system used more,
>please try to get more developers signed up to it so that we have
>the necessary folk behind the bug system to handle the increased
>work load.
>
>
>

What we probably need then is an official policy that maintainers need
to have an account in the bugzilla. Start with the subsystem maintainers
and leave it to them to get each driver maintainer in line. Having only
a handful of parts of the kernel in the bugzilla is just confusing.

Personally I think the mailing lists are a great way for general
discussion. But once we have a confirmed bug (or difficult new feature)
it is better off being tracked in bugzilla. And this is my opinion both
as a user and as a developer. Bugzilla is the de facto standard of
reporting bugs so some users might find it troublesome dealing with
mailing lists such as LKML.

Rgds
Pierre

2005-09-22 08:10:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [ACPI] Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Hi,

On Thursday, 22 of September 2005 09:15, Pierre Ossman wrote:
> Russell King wrote:
>
> >So, before trying to get the "underworked" bug system used more,
> >please try to get more developers signed up to it so that we have
> >the necessary folk behind the bug system to handle the increased
> >work load.
> >
> >
> >
>
> What we probably need then is an official policy that maintainers need
> to have an account in the bugzilla. Start with the subsystem maintainers
> and leave it to them to get each driver maintainer in line. Having only
> a handful of parts of the kernel in the bugzilla is just confusing.
>
> Personally I think the mailing lists are a great way for general
> discussion. But once we have a confirmed bug (or difficult new feature)
> it is better off being tracked in bugzilla. And this is my opinion both
> as a user and as a developer. Bugzilla is the de facto standard of
> reporting bugs so some users might find it troublesome dealing with
> mailing lists such as LKML.

Generally, I think, all bugs fall into one of two categories. Namely, there
are bugs that get fixed immediately as soon as someone with a clue sees
the report, compilation problems and the like, and there are bugs that
require much time to be handled. IMHO, it doesn't make sense to litter
bugzilla with bugs of the first kind, but all bugs of the second kind
should be tracked in it, at least for the record.

Greetings,
Rafael


--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

2005-09-22 09:32:41

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Pierre Ossman <[email protected]> writes:

> What we probably need then is an official policy that maintainers need
> to have an account in the bugzilla. Start with the subsystem maintainers
> and leave it to them to get each driver maintainer in line. Having only
> a handful of parts of the kernel in the bugzilla is just confusing.

But the definition of a maintainer is whoever takes responsibility for
part X. The are many pieces of the kernel that don't easily break
up into the taxonomy of subsystem and driver. There are many people
to reluctantly take responsibility because there is no one else,
and so aren't even mentioned in MAINTAINERS much less the rest of it.

> Personally I think the mailing lists are a great way for general
> discussion. But once we have a confirmed bug (or difficult new feature)
> it is better off being tracked in bugzilla. And this is my opinion both
> as a user and as a developer. Bugzilla is the de facto standard of
> reporting bugs so some users might find it troublesome dealing with
> mailing lists such as LKML.

One problem I have with a system like bugzilla is that frequently bug
reports are not complete, and bugzilla sets the expectation that
once you file a bug the reporters part is complete. Frequently it takes
several round trips via email to even understand the bug that is being
reported.

So either we need a two level bug tracking system where there
is a place to capture bugs that users see, and a place to track
bugs that developers understand. Or we need something that is
much more interactive than bugzilla.

I like Andrews idea of a short term mailing lists for each bug. Where
filing the bug creates the mailing list and the mailing list exists
until the bug is closed sounds like something that might even get
used, as it can be easy for everyone.

Eric

2005-09-22 09:38:31

by Russell King

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

On Thu, Sep 22, 2005 at 03:30:21AM -0600, Eric W. Biederman wrote:
> One problem I have with a system like bugzilla is that frequently bug
> reports are not complete, and bugzilla sets the expectation that
> once you file a bug the reporters part is complete. Frequently it takes
> several round trips via email to even understand the bug that is being
> reported.

What it needs is something like the Red Hat bugzilla front end,
where reporters are guided through the information they need to
submit. Maybe that would help with bugme if we had such a front
end?

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-22 10:54:59

by Pierre Ossman

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

Eric W. Biederman wrote:

>But the definition of a maintainer is whoever takes responsibility for
>part X. The are many pieces of the kernel that don't easily break
>up into the taxonomy of subsystem and driver. There are many people
>to reluctantly take responsibility because there is no one else,
>and so aren't even mentioned in MAINTAINERS much less the rest of it.
>
>
>

Setting up an account for a mailing list and linking unmaintained parts
to it would be a solution to that. Either set up a specific list for
this or use when of the current ones (like LKML). The big problem I see
at the moment is that not all parts of the kernel are represented in the
bugzilla.

>One problem I have with a system like bugzilla is that frequently bug
>reports are not complete, and bugzilla sets the expectation that
>once you file a bug the reporters part is complete. Frequently it takes
>several round trips via email to even understand the bug that is being
>reported.
>
>

I agree that most bug reports are incomplete. But I still think that a
bugzilla is the way to go. We need to educate the users in filing bug
reports no matter which forum is used. Russell's point about having a
wizard would probably help a lot. A bugzilla also gives the option of
marking bugs with NEEDINFO, INVALID and similar, clearly expressing to
the user how the maintainer sees this bug.

>So either we need a two level bug tracking system where there
>is a place to capture bugs that users see, and a place to track
>bugs that developers understand. Or we need something that is
>much more interactive than bugzilla.
>
>
>

I think that the categories NEW/ASSIGNED/CONFIRMED suffices. Although
discussions on mailing lists are more natural for the people here I
don't agree that bugzilla is less interactive than lists.

Rgds
Pierre

2005-09-26 12:09:15

by Diego Calleja

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.

El Wed, 21 Sep 2005 12:54:58 -0700,
"Martin J. Bligh" <[email protected]> escribi?:

> http://lists.osdl.org/mailman/listinfo/bugme-new
>
> But it's not "all changes/new bugs/random crap", it's "new bugs".
> And it's all categories, but there's handy-dandy X- header fields
> to filter on.

I discovered this other ml: http://lists.osdl.org/mailman/listinfo/bugme-janitors

So, http://lists.osdl.org/mailman/listinfo/bugme-new is just for new bugs
and http://lists.osdl.org/mailman/listinfo/bugme-janitors for everything else,
or bugme-janitors is non-functional? (I tried to subscribe to check it myself but
subscription requires "moderator approval")

If so, can we have the bugme-janitors and bugme-new archives opened for
everyone (so google can index it?)

2005-09-26 13:47:52

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 2/2] suspend: Cleanup calling of power off methods.



--Diego Calleja <[email protected]> wrote (on Monday, September 26, 2005 14:09:00 +0200):

> El Wed, 21 Sep 2005 12:54:58 -0700,
> "Martin J. Bligh" <[email protected]> escribi?:
>
>> http://lists.osdl.org/mailman/listinfo/bugme-new
>>
>> But it's not "all changes/new bugs/random crap", it's "new bugs".
>> And it's all categories, but there's handy-dandy X- header fields
>> to filter on.
>
> I discovered this other ml: http://lists.osdl.org/mailman/listinfo/bugme-janitors
>
> So, http://lists.osdl.org/mailman/listinfo/bugme-new is just for new bugs
> and http://lists.osdl.org/mailman/listinfo/bugme-janitors for everything else,
> or bugme-janitors is non-functional? (I tried to subscribe to check it myself but
> subscription requires "moderator approval")

Ooops. shouldn't do. will fix.

> If so, can we have the bugme-janitors and bugme-new archives opened for
> everyone (so google can index it?)

Yeah, sounds fair enough (though I think google will index bugzilla too)

M.