Hi!
Patrick fucked up power managment in 2.6.0-test4, without discussing
it with anybody. He
* changed userland interface
* changed in-kernel interface, in such way that all drivers needing
power
managment need to be changed. (And he messed it up, driver no longer
can do something with both interrupts disabled and enabled on resume).
* DID NOT BOTHER TO TEST HIS CHANGES and broke both suspend-to-disk
and suspend-to-ram. His "cleanups" are unneccessary and clearly buggy
(he changed prototype but failed to change assembly function), and he
ignores non-i386 architectures.
Unfortunately that diff is big enough and does so many things at once
that it is virtually impossible to fix. No usefull work can be done on
power managment in this state.
Please take attached patch, and apply with -R, to bring tree back to
-test3 state. It should be a non-brainer, as Patrick's patch should
not go in in the first place. [If you can do some bk magic to exclude
Patrick's power managment merge, that would be even better].
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
On Mon, Sep 01, 2003 at 01:28:12AM +0200, Pavel Machek wrote:
> Please take attached patch, and apply with -R, to bring tree back to
> -test3 state. It should be a non-brainer, as Patrick's patch should
> not go in in the first place. [If you can do some bk magic to exclude
> Patrick's power managment merge, that would be even better].
Please don't. A number of us are working _with_ Patrick to sort the
problems out and get them resolved. IMHO, there are good changes in
Pat's work, and backing the lot out would be silly at this point.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
Hi!
> > Please take attached patch, and apply with -R, to bring tree back to
> > -test3 state. It should be a non-brainer, as Patrick's patch should
> > not go in in the first place. [If you can do some bk magic to exclude
> > Patrick's power managment merge, that would be even better].
>
> Please don't. A number of us are working _with_ Patrick to sort the
> problems out and get them resolved. IMHO, there are good changes in
> Pat's work, and backing the lot out would be silly at this point.
Its the only way to have power managment working by 2.6.1. Lots of
work went into pm during 2.5 series, and Patrick invalidated all that
with one, 140KB, untested and broken patch (and he managed to break
about all rules about patch submission). It is not possible to fix
damage he done within week.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
On Mon, Sep 01, 2003 at 10:11:54AM +0200, Pavel Machek wrote:
> Its the only way to have power managment working by 2.6.1.
Rubbish. PM is now working here on ARM again - within a week of Pat's
change.
> Lots of
> work went into pm during 2.5 series, and Patrick invalidated all that
> with one, 140KB, untested and broken patch (and he managed to break
> about all rules about patch submission).
I agree that it needed public review _before_ hitting Linus' tree - a
change of that magnitude with only half the subsystems fixed up should
not go directly into Linus' tree without review.
> It is not possible to fix damage he done within week.
It is my understanding that the old PM in 2.5 was not suitable for
the PPC architecture and the new PM model is. As far as the drivers
are concerned, the interface presented is a definite improvement on
what there was before (there are a few things which I'd like to see
further improvement on, but that's not a subject for discussion in
this thread.)
I don't particularly care about kernel/power/* because its not useful
for me - whereas you obviously do. Maybe that's where your axe is
grinding. But whatever, don't throw the baby (driver model changes)
out with the bath water.
And finally, there's longer than a week to fix it. 8)
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On Mon, 1 Sep 2003, Pavel Machek wrote:
>
> Patrick fucked up power managment in 2.6.0-test4 [...]
Ok, guys, how about just talking to each other without the personal
attacks. Pavel in particular - this is more personal friction than
anything else, since others do not seem to have the same visceral dislike
of the patches, and there _has_ been discussion about them.
So please try to talk out the issues rather than just trying to screw each
other over, ok?
Linus
Hi!
> > Its the only way to have power managment working by 2.6.1.
>
> Rubbish. PM is now working here on ARM again - within a week of Pat's
> change.
As you said, you are not using kernel/power.
> > Lots of
> > work went into pm during 2.5 series, and Patrick invalidated all that
> > with one, 140KB, untested and broken patch (and he managed to break
> > about all rules about patch submission).
>
> I agree that it needed public review _before_ hitting Linus' tree - a
> change of that magnitude with only half the subsystems fixed up should
> not go directly into Linus' tree without review.
Good. [I believe it was big enough to require testing on separate tree
(-mm? -ac?) before going mainline].
> > It is not possible to fix damage he done within week.
>
> It is my understanding that the old PM in 2.5 was not suitable for
> the PPC architecture and the new PM model is. As far as the drivers
> are concerned, the interface presented is a definite improvement on
> what there was before (there are a few things which I'd like to see
> further improvement on, but that's not a subject for discussion in
> this thread.)
I only see dm getting more and more complicated :-(.
> I don't particularly care about kernel/power/* because its not useful
> for me - whereas you obviously do. Maybe that's where your axe is
> grinding. But whatever, don't throw the baby (driver model changes)
> out with the bath water.
kernel/power/* changes are worst, because that's subsystem I should be
maintainer of. Driver model changes are quite bad, too, because they
mean we should go over all drivers and fix them. And driver failures
are often pretty subtle.
> And finally, there's longer than a week to fix it. 8)
I'm not looking forward to another half-a-year before kernel gets into
state where it was in -test3.
I still want that crap killed, at least because of the way *how* it
was merged. Altrough killing kernel/power/* would be good start, and
maybe it would lead us to a state where dm changes can be debugged.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Hi!
> > Patrick fucked up power managment in 2.6.0-test4 [...]
>
> Ok, guys, how about just talking to each other without the personal
> attacks. Pavel in particular - this is more personal friction than
> anything else, since others do not seem to have the same visceral dislike
> of the patches, and there _has_ been discussion about them.
Unfortunately, that was not personal attack, that was a fact. Even
Patrick had to agree that his -test4 changes were bad idea, and I even
have "official apology" from him (on irc).
He just thinks he can fix his code, and I want that code to be
reverted, reviewed, tested, and than merged back. There's no way
current mess can be fixed in reasonable time.
I've seen no discussion about that code, and certainly have not seen
that code before it was merged, which is strange since I'm listed as
software suspend maintainer.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
On Mon, Sep 01, 2003 at 11:12:20PM +0200, Pavel Machek wrote:
> He just thinks he can fix his code, and I want that code to be
> reverted, reviewed, tested, and than merged back. There's no way
> current mess can be fixed in reasonable time.
Please don't - that means undoing all the work I've put in to make
ARM work again, and I don't have time to play silly games like this.
We have the majority of the driver model power management back into
a working state (there are still some quirks left, but they are
trivially solvable - I just haven't had the time to sort them out
with Pat yet.)
Whether you like it or not, people are using and developing against
the new driver model code. ARM uses the driver model pretty extensively,
and reverting these changes will only cause more work (which means I
get less time to look at serial, PCMCIA stuff and other stuff.)
The driver model at least should stay as is.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
Hi!
> > He just thinks he can fix his code, and I want that code to be
> > reverted, reviewed, tested, and than merged back. There's no way
> > current mess can be fixed in reasonable time.
>
> Please don't - that means undoing all the work I've put in to make
> ARM work again, and I don't have time to play silly games like this.
Okay, so Patrick broke ARM and you fixed it. But he also broke i386 and
x86-64; and it is not at all clear that his "newer" version is better
than the old one. [Really, what's the advantage? AFAICS it is more
complicated and less flexible, putting "suspend" method to bus as
oppossed to device].
I guess I could survive dm changes going in, but breaking both driver
model, sleep support and all the drivers at same time is a bit too
much...
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
On Tue, Sep 02, 2003 at 12:19:20AM +0200, Pavel Machek wrote:
> > Please don't - that means undoing all the work I've put in to make
> > ARM work again, and I don't have time to play silly games like this.
>
> Okay, so Patrick broke ARM and you fixed it. But he also broke i386 and
> x86-64; and it is not at all clear that his "newer" version is better
> than the old one. [Really, what's the advantage? AFAICS it is more
> complicated and less flexible, putting "suspend" method to bus as
> oppossed to device].
I don't think PCI device support broke - Pat seems to have fixed up
all that fairly nicely, so the driver model change should be
transparent.
The main advantage from a driver writers point of view is the disposal
of the "level" argument. (Doesn't really affect x86, PCI drivers never
had visibility of this.)
However, I'll let the PPC people justify the real reason for the driver
model change, since it was /their/ requirement that caused it, and I'm
not going to fight their battles for them. (although I seem to be doing
exactly that while wasting my time here.)
It's about time that the people in the PPC community, who were the main
guys pushing for the driver model change, spoke up and justified this.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
Hi!
> > > Please don't - that means undoing all the work I've put in to make
> > > ARM work again, and I don't have time to play silly games like this.
> >
> > Okay, so Patrick broke ARM and you fixed it. But he also broke i386 and
> > x86-64; and it is not at all clear that his "newer" version is better
> > than the old one. [Really, what's the advantage? AFAICS it is more
> > complicated and less flexible, putting "suspend" method to bus as
> > oppossed to device].
>
> I don't think PCI device support broke - Pat seems to have fixed up
> all that fairly nicely, so the driver model change should be
> transparent.
As far as I can test, yes, at least UHCI looks broken :-(. It is true
that calling convention at PCI level did not change.
> The main advantage from a driver writers point of view is the disposal
> of the "level" argument. (Doesn't really affect x86, PCI drivers never
> had visibility of this.)
Yes, "level" is gone, instead we have very ugly
-EAGAIN-means-call-me-with-interrupts-disabled hack.
> However, I'll let the PPC people justify the real reason for the driver
> model change, since it was /their/ requirement that caused it, and I'm
> not going to fight their battles for them. (although I seem to be doing
> exactly that while wasting my time here.)
I noticed something going on, but it seem to me one more "struct bus"
would have solved that...
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
> > > Patrick fucked up power managment in 2.6.0-test4 [...]
> >
> > Ok, guys, how about just talking to each other without the personal
> > attacks. Pavel in particular - this is more personal friction than
> > anything else, since others do not seem to have the same visceral dislike
> > of the patches, and there _has_ been discussion about them.
>
> Unfortunately, that was not personal attack, that was a fact. Even
> Patrick had to agree that his -test4 changes were bad idea, and I even
> have "official apology" from him (on irc).
No, I didn't say it was a bad idea. I said that I understood your
concerns, that I was sorry I broke it for you, and that I would fix it
ASAP. Several times in fact.
> He just thinks he can fix his code, and I want that code to be
> reverted, reviewed, tested, and than merged back. There's no way
> current mess can be fixed in reasonable time.
I realize that posting it before merging it would have resulted in a
better outcome, as least thus far. However, it's been 11 days since I've
merged the orgininal code, and 2 since I posted the last patch. While
you've been ranting and raving, you've squandered a lot of valuable time
that could have been spent reviewing the code.
> I've seen no discussion about that code, and certainly have not seen
> that code before it was merged, which is strange since I'm listed as
> software suspend maintainer.
BFD. swsusp is one piece of the puzzle, and I've not touched any of the
core functionality. What I have done is:
Secondly, look at what I've done to it:
- Compartmentalized portions that are useful to more general
suspend/resume sequences.
- Removed duplicate code between different PM mechanisms.
- Made it possible to use it with ACPI, instead of in lieu of it.
- Cleaned up several functions, streamlined a few, added comments, and
made the names easier to swallow than the 'magic' variety.
- Removed 5 calls to panic() and 8 instances of BUG().
It's resulted in a net reduction of lines of code, and I've made it much
easier to read. I'm sorry if this has caused some temporary instability,
especially if the last round of patches still hasn't fixed it. I will get
those resolved ASAP. (Again)
In all actuality, I don't need swsusp. I have a better suspend-to-disk
implementation that is faster, smaller, and cleaner. I've hesitated
merging it because I thought swsusp improvements would be more welcome.
Obviously they're not; or you haven't actually taken the time to read the
code.
Regardless, you can have your toy back. I'm tired of reading it, trying to
fix it, and more than anything, dealing with you. I'll document the API
that a suspend-to-disk mechanism must follow to be used by the PM core, so
that you may optionally hook into it and let users still leverage the
generic suspend improvements.
I will also restore swsusp to whatever state you like - either -test1,
-test3 or -test4 state, or keep it the way it currently is in my patches.
But note that doing so will result in a large amount of duplicated code
which you will be responsible for either merging or removing.
If you don't want to work with me, I will work around you. I'm not trying
to compete with you, rather, as I've said before, make it better for
everyone. Power management has not worked for a majority of users for a
long time. It's in dire need of someone with motivation, direction, and
the time to make it work properly, and get it on par with some of our
contemporary OSes. I'm trying to be that person, and actually fix things
in the long run, rather than encouraging people to "fix it themselves".
Also, continuing to flame me without a rebuttal is not very polite, and
indicates that you don't want to have an actual conversation. Please
remove me and any public mailing lists from the cc list if you're going to
continue to do this. I'll eventually filter you out, but please save
everyone else from the drivel..
Pat
On Tue, Sep 02, 2003 at 12:40:18AM +0200, Pavel Machek wrote:
> > The main advantage from a driver writers point of view is the disposal
> > of the "level" argument. (Doesn't really affect x86, PCI drivers never
> > had visibility of this.)
>
> Yes, "level" is gone, instead we have very ugly
> -EAGAIN-means-call-me-with-interrupts-disabled hack.
>From a driver writers point of view, that's something I won't be using.
If I'm told to suspend, I better suspend. If the driver model is calling
me out of sequence (because there are other children depending on me)
then it hasn't taken notice of my ordering requirements.
(Note - this bit isn't complete, but then the new model is no worse off
than the old model at present with respect to that issue. The new model
does provide the interfaces to allow drivers to specify these dependencies
though.)
> > However, I'll let the PPC people justify the real reason for the driver
> > model change, since it was /their/ requirement that caused it, and I'm
> > not going to fight their battles for them. (although I seem to be doing
> > exactly that while wasting my time here.)
>
> I noticed something going on, but it seem to me one more "struct bus"
> would have solved that...
I've no idea what the PPC problem exactly was. It's up to the PPC guys
to speak up *now*.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
Hi!
> > He just thinks he can fix his code, and I want that code to be
> > reverted, reviewed, tested, and than merged back. There's no way
> > current mess can be fixed in reasonable time.
>
> I realize that posting it before merging it would have resulted in a
> better outcome, as least thus far. However, it's been 11 days since I've
> merged the orgininal code, and 2 since I posted the last patch. While
> you've been ranting and raving, you've squandered a lot of valuable time
> that could have been spent reviewing the code.
I've been reviewing your code, see your mailbox. Unfortunately due to
you renaming functions and moving files around, it is very hard to review.
> > I've seen no discussion about that code, and certainly have not seen
> > that code before it was merged, which is strange since I'm listed as
> > software suspend maintainer.
>
> BFD. swsusp is one piece of the puzzle, and I've not touched any of the
> core functionality. What I have done is:
[Attached is patch "not changing core functionality". How did you
expect me to verify that? And it was you who protested on killing 3
printks.]
> Secondly, look at what I've done to it:
>
> - Compartmentalized portions that are useful to more general
> suspend/resume sequences.
>
> - Removed duplicate code between different PM mechanisms.
>
> - Made it possible to use it with ACPI, instead of in lieu of it.
>
> - Cleaned up several functions, streamlined a few, added comments, and
> made the names easier to swallow than the 'magic' variety.
>
> - Removed 5 calls to panic() and 8 instances of BUG().
And managed to call sleeping functions with interrupts disabled and
break x86-64 somewhere in the process. Hmm, and because you killed
BUG_ON(in_atomic()), you did not realize that you were breaking
that. And I do not think you actually tested those "panic" codepaths
to make sure you are not corrupting data, right?
> In all actuality, I don't need swsusp. I have a better suspend-to-disk
> implementation that is faster, smaller, and cleaner. I've hesitated
> merging it because I thought swsusp improvements would be more welcome.
> Obviously they're not; or you haven't actually taken the time to read the
> code.
Swsusp improvements *would* be welcome, but I prefer to get them via
email, not via ftp from Linus.
> I will also restore swsusp to whatever state you like - either -test1,
> -test3 or -test4 state, or keep it the way it currently is in my patches.
> But note that doing so will result in a large amount of duplicated code
> which you will be responsible for either merging or removing.
Good, please return it to -test3 state. If you can leave your split-up
patches on some public ftp site, that would be good; when dm is back
working so I can actually test it, I'll do some cherry-picking.
> Also, continuing to flame me without a rebuttal is not very polite, and
> indicates that you don't want to have an actual conversation. Please
> remove me and any public mailing lists from the cc list if you're going to
> continue to do this. I'll eventually filter you out, but please save
> everyone else from the drivel..
...while this surely contributes to actual conversation.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
> I've been reviewing your code, see your mailbox. Unfortunately due to
> you renaming functions and moving files around, it is very hard to review.
I've tried to make each changeset do one conceptual item. And, each patch
that I posted, besides obviously the cumulative one, represents one
changeset.
> [Attached is patch "not changing core functionality". How did you
> expect me to verify that? And it was you who protested on killing 3
> printks.]
What is the problem with it? Why is it not better than what was there
before?
> And managed to call sleeping functions with interrupts disabled and
> break x86-64 somewhere in the process.
The first is unintentional and not something I see here. Will investigate
further.
Have you confirmed that x86-64 is broken, or are you simply trying to
raise more accusations? If it is broken, please tell exactly what the
problem is and I will fix it.
> Hmm, and because you killed
> BUG_ON(in_atomic()), you did not realize that you were breaking
> that.
That was in software_suspend() itself, which was completely bogus. For
one, you should review how you're getting called and realize that neither
places were atomic contexts. So, it was useless.
Now, you did export software_suspend() for some unknown reason, and that
is simply bogus. Why would a module call you?
Finally, you BUG()'d when you could simply return an error. That's
completely unfriendly to the user. Just return an error, like every other
sane code path.
> And I do not think you actually tested those "panic" codepaths
> to make sure you are not corrupting data, right?
panic() is not a valid replacement for sane error handling. Every single
panic() in swsusp can be replaced by proper error handling. You should
have done that a long time ago. Calling panic() is just lazy.
> > I will also restore swsusp to whatever state you like - either -test1,
> > -test3 or -test4 state, or keep it the way it currently is in my patches.
> > But note that doing so will result in a large amount of duplicated code
> > which you will be responsible for either merging or removing.
>
> Good, please return it to -test3 state. If you can leave your split-up
> patches on some public ftp site, that would be good; when dm is back
> working so I can actually test it, I'll do some cherry-picking.
They will remain at the URL I posted the other day. As will the patches
to convert it back to the state you please. I will patch against the
current tree and continue to work from there. Note that this involves the
contents of kernel/swsusp.c only.
And, the driver model is working fine. I don't know what you're
complaining about now, but a sane bug report would be helpful. So would
some patches -- you've been maintaining swsusp for two years now, and
you've not help convert one driver to the new model (even before it
changed).
Thanks,
Pat
Hi!
> complaining about now, but a sane bug report would be helpful. So would
> some patches -- you've been maintaining swsusp for two years now, and
> you've not help convert one driver to the new model (even before it
> changed).
Why are you lying?! If you want to have constructive discussion, you'd
better stop that.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Hi!
> > [Attached is patch "not changing core functionality". How did you
> > expect me to verify that? And it was you who protested on killing 3
> > printks.]
>
> What is the problem with it? Why is it not better than what was there
> before?
Like except the fact it no longer works?
Okay, lets see:
@@ -65,9 +65,7 @@
#include "power.h"
-extern long sys_sync(void);
-
-unsigned char software_suspend_enabled = 0;
+unsigned char software_suspend_enabled = 1;
#define __ADDRESS(x) ((unsigned long) phys_to_virt(x))
...by this you enable suspend even before system is booted. That's bad
idea. It could hurt in the past (when we had sysrq-D so swsusp), and
it may hurt again in future when battery goes low during boot.
int pfn;
struct page *page;
-#ifdef CONFIG_DISCONTIGMEM
- panic("Discontingmem not supported");
-#else
BUG_ON (max_pfn != num_physpages);
-#endif
+
...you moved check down to swsusp_save but you did not test it - you
have typo in "CONFIG":
+int swsusp_save(void)
+{
+#if defined (CONFIG_HIGHMEM) || defined (COFNIG_DISCONTIGMEM)
+ printk("swsusp is not supported with high- or
discontig-mem.\n");
+ return -EPERM;
+#endif
- if (PageHighMem(page))
- panic("Swsusp not supported on highmem
- boxes. Send 1GB of RAM to <[email protected]> and try again
- ;-).");
Great, lets kill the messenger. Notice that when swsusp goes wrong it
tends to do bad data corruption, so some defensive programming is good
idea and this should be at least BUG_ON(). Gcc should optimize it
anyway.
-void do_magic_suspend_2(void)
+int do_magic_suspend_2(void)
{
int is_problem;
read_swapfiles();
is_problem = suspend_prepare_image();
spin_unlock_irq(&suspend_pagedir_lock);
...
barrier();
mb();
- spin_lock_irq(&suspend_pagedir_lock); /* Done to disable
interrupts */
mdelay(1000);
-
- free_pages((unsigned long) pagedir_nosave, pagedir_order);
- spin_unlock_irq(&suspend_pagedir_lock);
- mark_swapfiles(((swp_entry_t) {0}), MARK_SWAP_RESUME);
...
- might_sleep();
- do_software_suspend();
+ return -EFAULT;
}
You killed code to mark swap as normal swap space and replaced it with
simple return -EFAULT. This is extremely dangerous; now swap is marked
as containing valid suspend image. On next reboot, user code resume,
but disk already has changed state. There will be silent data
corruption going on, and filesystem may be marked as clean at the
end. Great way to loose all your data. Really.
else if (!memcmp("S2",cur->swh.magic.magic,2))
memcpy(cur->swh.magic.magic,"SWAPSPACE2",10);
else {
- if (noresume)
- return -EINVAL;
- panic("%sUnable to find suspended-data signature
(%.10s - misspelled?\n",
+ printk("swsusp: %s: Unable to find suspended-data
signature (%.10s - misspelled?\n",
name_resume, cur->swh.magic.magic);
- }
- if (noresume) {
- /* We don't do a sanity check here: we want to restore
the swap
- whatever version of kernel made the suspend image;
- We need to write swap, but swap is *not* enabled so
- we must write the device directly */
- printk("%s: Fixing swap signatures %s...\n",
name_resume, resume_file);
- bdev_write_page(bdev, 0, cur);
+ return -EFAULT;
}
printk( "%sSignature found, resuming\n", name_resume );
User has suspended, but he does not want to resume. Code was there to
restore swap space to "normal" state so it can be swapon-ed. This will
not kill data, only puzzled user.
-void software_resume(void)
+int __init swsusp_restore(void)
{
- if (num_online_cpus() > 1) {
- printk(KERN_WARNING "Software Suspend has
malfunctioning SMP support. Disabled :(\n");
- return;
- }
I can not easily see where you moved this check.
- /* We enable the possibility of machine suspend */
- software_suspend_enabled = 1;
- if (!resume_status)
- return;
+ return do_magic(1);
You'd better keep the code as it was, first prepare all the data, only
then enter do_magic(). That way, as little code as possible will run
after we entered the assembly, making stuff easier to debug/check.
> > And managed to call sleeping functions with interrupts disabled and
> > break x86-64 somewhere in the process.
>
> The first is unintentional and not something I see here. Will investigate
> further.
As I told you before, its CONFIG_PREEMPT, and those BUG_ON() checks.
- if (!is_problem) {
- kernel_fpu_end(); /* save_processor_state() does
- kernel_fpu_begin, and we need to revert it in order to p\ass
- in_atomic() checks */
- BUG_ON(in_atomic());
- suspend_save_image();
- suspend_power_down(); /* FIXME: if
- suspend_power_down is commented out, console is lost after few
- suspends ?!\ */
- }
-
+ if (!is_problem)
+ return suspend_save_image();
You've killed kernel_fpu_end(), that might be part of problem. Also
you killed BUG_ON(in_atomic()) which would actually catch that error.
> Have you confirmed that x86-64 is broken, or are you simply trying to
> raise more accusations? If it is broken, please tell exactly what the
> problem is and I will fix it.
How do you expect void do_magic() -> int do_magic() to work without
changing actuall do_magic implementation?
> > Hmm, and because you killed
> > BUG_ON(in_atomic()), you did not realize that you were breaking
> > that.
>
> That was in software_suspend() itself, which was completely bogus. For
> one, you should review how you're getting called and realize that neither
> places were atomic contexts. So, it was useless.
The same way BUG_ON(in_atomic()) should not be in kmalloc()? This is
to guard against bad callers, please leave it in.
> Now, you did export software_suspend() for some unknown reason, and that
> is simply bogus. Why would a module call you?
>
> Finally, you BUG()'d when you could simply return an error. That's
> completely unfriendly to the user. Just return an error, like every other
> sane code path.
When someone calls kmalloc(GPF_KERNEL) with interrupts disabled, do
you just return NULL? No. If someone does that, it needs to be fixed.
> > And I do not think you actually tested those "panic" codepaths
> > to make sure you are not corrupting data, right?
>
> panic() is not a valid replacement for sane error handling. Every single
> panic() in swsusp can be replaced by proper error handling. You should
> have done that a long time ago. Calling panic() is just lazy.
And every untested error handler means severe data corruption. I might
be lazy, but you are dangerous to the user data.
> And, the driver model is working fine. I don't know what you're
> complaining about now, but a sane bug report would be helpful. So would
> some patches -- you've been maintaining swsusp for two years now, and
> you've not help convert one driver to the new model (even before it
> changed).
You've seen the reports before:
* ide problems (some user even posted decoded backtrace)
* UHCI leads to dead usb and machine 20x slower than it should be
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
> However, I'll let the PPC people justify the real reason for the driver
> model change, since it was /their/ requirement that caused it, and I'm
> not going to fight their battles for them. (although I seem to be doing
> exactly that while wasting my time here.)
> It's about time that the people in the PPC community, who were the main
> guys pushing for the driver model change, spoke up and justified this.
Ok, actually it was me, and it was not about PPC specifics but rather
about having sane semantics that actually mean something ;)
The whole point was to get rid of the old 2 step save_state, then
suspend model which didn't make sense. A saved state is only meaningful
as long as that state doesn't get modified afterward, so saving state
and suspending are an atomic operation.
That also means that I plan in the long run to also kill the PCI
save_state. We didn't do that yet to not harm drivers more than we
already did though ;)
Now, about that late "without interrupts" thing. Well, this is not
entirely my design idea, so I'll let Patrick speak up, as I wrote
previously, I'm not too fond of the "I return -EAGAIN" thing, on
the other hand, I wanted to kill the suspend vs. powerdown
distinction we had for a very simple reason:
Once you have suspended the whole PM tree, you can't expect a random
driver to be able to talk to it's device anymore since the "parent"
have been suspended. For example, a USB or 1394 device cannot be
"powered down" at this stage as the previous "suspend" run will have
stopped all activity on the host controller. Since I wanted the overall
PM semantics to be consistent, I asked Patrick to consider removing that
"power down" step. The fact of powering down the device is part of the
same "atomic" suspend operation, deciding wether to power down or not
the device depends solely on the target state (you typically don't do it
for S4 as you expect a complete machine shutdown afterward).
Now, we still had that need, for a few drivers (and in most cases, this
is really about system devices, so it doesn't fit in the same model
anyway, but still, we have a few broken thingies on PCI too), where a
driver need to be suspended "late" (actually powered down late) when
system interrupts have been disabled because that bit of HW is
terminally broken and will assert it's interrupt line in a non
controlled way.
That's the only reason why the powerdown call to the driver model still
exist and why Patrick kept around that mecanism of returning -EAGAIN.
So such a driver is expected to do the following:
- first suspend will actually suspend the IOs as expected (block queues
for
a block driver, stop net queue for a network driver, etc....) and
basically do everything but the actual power down of the device. Then,
it returns -EAGAIN instead of 0 to request beeing called late for
power down.
- second suspend will do the last step.
(I do agree that having the same callback called twice without an
indication of "when" in those 2 steps it is called isn't the best API we
could come up with, but I prefered that to adding a paremeter to
suspend() just for this special case).
This is only to be used on such "broken" devices and only on busses
where you know that your "host" bus is still available after the round
of "suspend" calls (and even PCI may not in some cases), so this is not
something I'd like to see people use too much.
Finally, to reply about UHCI/USB "brokenness", I've been working with
David Brownell and other USB folks lately to get that working properly,
it was not just a matter of adapting to the new callbacks, but also to
do the right thing on the host controller, hopefully, David now has some
patches that will be merged soon that implement proper PM on USB as well
(at least for the host side, I suppose a bunch of device drivers will
have to be fixed, with the new stuff, they'll just get -ESHUTDOWN when
trying to submit URBs after the host is suspended).
Pavel, please keep in mind that proper PM is a difficult task, what we
had before was full of holes, we spent a significant amount of time over
the past year debating the right way to do all of this and Patrick spent
even more time actually implementing it, so rather than just crying
loud, I'd epxect you rather help us fixing what still need to be.
Ben.
> @@ -65,9 +65,7 @@
>
> #include "power.h"
>
> -extern long sys_sync(void);
> -
> -unsigned char software_suspend_enabled = 0;
> +unsigned char software_suspend_enabled = 1;
>
> #define __ADDRESS(x) ((unsigned long) phys_to_virt(x))
>
> ...by this you enable suspend even before system is booted. That's bad
> idea. It could hurt in the past (when we had sysrq-D so swsusp), and
> it may hurt again in future when battery goes low during boot.
Does it or does it not cause a problem? Look at the old software_suspend()
function - The handling of this flag was weird and non-standard. This is
cleaner.
> -#ifdef CONFIG_DISCONTIGMEM
> - panic("Discontingmem not supported");
> -#else
> BUG_ON (max_pfn != num_physpages);
> -#endif
> +
>
> ...you moved check down to swsusp_save but you did not test it - you
> have typo in "CONFIG":
Woops.
>
> +int swsusp_save(void)
> +{
> +#if defined (CONFIG_HIGHMEM) || defined (COFNIG_DISCONTIGMEM)
> + printk("swsusp is not supported with high- or
> discontig-mem.\n");
> + return -EPERM;
> +#endif
>
> - if (PageHighMem(page))
> - panic("Swsusp not supported on highmem
> - boxes. Send 1GB of RAM to <[email protected]> and try again
> - ;-).");
>
> Great, lets kill the messenger. Notice that when swsusp goes wrong it
> tends to do bad data corruption, so some defensive programming is good
> idea and this should be at least BUG_ON(). Gcc should optimize it
> anyway.
First, look at the snippet that you pasted just before this, then explain
how we could have a HIGHMEM page if we unconditionally error when HIGHMEM
is enabled.
> -void do_magic_suspend_2(void)
> +int do_magic_suspend_2(void)
> {
> int is_problem;
> read_swapfiles();
> is_problem = suspend_prepare_image();
> spin_unlock_irq(&suspend_pagedir_lock);
> ...
> barrier();
> mb();
> - spin_lock_irq(&suspend_pagedir_lock); /* Done to disable
> interrupts */
> mdelay(1000);
> -
> - free_pages((unsigned long) pagedir_nosave, pagedir_order);
> - spin_unlock_irq(&suspend_pagedir_lock);
> - mark_swapfiles(((swp_entry_t) {0}), MARK_SWAP_RESUME);
> ...
> - might_sleep();
> - do_software_suspend();
> + return -EFAULT;
> }
>
>
> You killed code to mark swap as normal swap space and replaced it with
> simple return -EFAULT. This is extremely dangerous; now swap is marked
> as containing valid suspend image. On next reboot, user code resume,
> but disk already has changed state. There will be silent data
> corruption going on, and filesystem may be marked as clean at the
> end. Great way to loose all your data. Really.
Snide comments not appreicated.
The resetting of the swapfiles happened in swsusp_free(). Please read the
entire patch. That is, until the latest round of patches, in which the
resetting happened immediately after the header was read. Again, please
read the entire set of patches.
> else if (!memcmp("S2",cur->swh.magic.magic,2))
> memcpy(cur->swh.magic.magic,"SWAPSPACE2",10);
> else {
> - if (noresume)
> - return -EINVAL;
> - panic("%sUnable to find suspended-data signature
> (%.10s - misspelled?\n",
> + printk("swsusp: %s: Unable to find suspended-data
> signature (%.10s - misspelled?\n",
> name_resume, cur->swh.magic.magic);
> - }
> - if (noresume) {
> - /* We don't do a sanity check here: we want to restore
> the swap
> - whatever version of kernel made the suspend image;
> - We need to write swap, but swap is *not* enabled so
> - we must write the device directly */
> - printk("%s: Fixing swap signatures %s...\n",
> name_resume, resume_file);
> - bdev_write_page(bdev, 0, cur);
> + return -EFAULT;
> }
>
> printk( "%sSignature found, resuming\n", name_resume );
>
> User has suspended, but he does not want to resume. Code was there to
> restore swap space to "normal" state so it can be swapon-ed. This will
> not kill data, only puzzled user.
Read your own code:
static int bdev_write_page(struct block_device *bdev, long pos, void *buf)
{
#if 0
struct buffer_head *bh;
BUG_ON (pos%PAGE_SIZE);
bh = __bread(bdev, pos/PAGE_SIZE, PAGE_SIZE);
if (!bh || (!bh->b_data)) {
return -1;
}
memcpy(bh->b_data, buf, PAGE_SIZE); /* FIXME: may need kmap()
*/
BUG_ON(!buffer_uptodate(bh));
generic_make_request(WRITE, bh);
if (!buffer_uptodate(bh))
printk(KERN_CRIT "%sWarning %s: Fixing swap signatures unsuccessful...\n", name_resume, resume_file);
wait_on_buffer(bh);
brelse(bh);
return 0;
#endif
printk(KERN_CRIT "%sWarning %s: Fixing swap signatures unimplemented...\n", name_resume, resume_file);
return 0;
}
Now explain to me why killing that part of code was unnecessary. You've
already forced them to use mkswap(1) to restore the partition. Let's juts
make it official.
> -void software_resume(void)
> +int __init swsusp_restore(void)
> {
> - if (num_online_cpus() > 1) {
> - printk(KERN_WARNING "Software Suspend has
> malfunctioning SMP support. Disabled :(\n");
> - return;
> - }
>
> I can not easily see where you moved this check.
Read the rest of the patches, and the changelogs (I do believe it's in
them). It's in kernel/power/main.c::enter_state(), so all PM handlers can
use it.
> - /* We enable the possibility of machine suspend */
> - software_suspend_enabled = 1;
> - if (!resume_status)
> - return;
> + return do_magic(1);
>
> You'd better keep the code as it was, first prepare all the data, only
> then enter do_magic(). That way, as little code as possible will run
> after we entered the assembly, making stuff easier to debug/check.
Will you read the code? Read your own code, then read the finished code.
The entire thing, not just snippets of the patch, because you're obviously
missing entire concepts.
By the time you called do_magic(), you had stopped processes, freed memory
and suspended drivers. I take care of that in kernel/power/main.c, then
call swsusp. That's in the patches, and the changelogs.
> - if (!is_problem) {
> - kernel_fpu_end(); /* save_processor_state() does
> - kernel_fpu_begin, and we need to revert it in order to p\ass
> - in_atomic() checks */
> - BUG_ON(in_atomic());
> - suspend_save_image();
> - suspend_power_down(); /* FIXME: if
> - suspend_power_down is commented out, console is lost after few
> - suspends ?!\ */
> - }
> -
> + if (!is_problem)
> + return suspend_save_image();
>
> You've killed kernel_fpu_end(), that might be part of problem. Also
> you killed BUG_ON(in_atomic()) which would actually catch that error.
Again, you've exhibiting gross unfamiliarity with your own code.
kernel_fpu_end() is called from do_fpu_end() which is called from
restore_processor_state(), which is called from do_magic() (now
swsusp_arch_suspend()).
As for the BUG_ON(in_atomic()), that's laziness, and just poor style. A
nicer check would have been
int software_suspend(void)
{
if (in_atomic() || in_interrupt())
return -EPERM;
...
}
But, we can all take the moral way out and just make sure that our callers
are calling us in the right context.
> > Have you confirmed that x86-64 is broken, or are you simply trying to
> > raise more accusations? If it is broken, please tell exactly what the
> > problem is and I will fix it.
>
> How do you expect void do_magic() -> int do_magic() to work without
> changing actuall do_magic implementation?
Return is in %eax. We propogate the error from the C functions
(swsusp_suspend() and swsusp_restore()) to the caller of
swsusp_arch_suspend() (nee do_magic()).
> The same way BUG_ON(in_atomic()) should not be in kmalloc()? This is
> to guard against bad callers, please leave it in.
Except kmalloc() has 1000's of callers, many in non-obvious contexts,
hence a real need for such a check. You have now three callers, which is
trivial to look after.
This is not a guessing game. You can impose calling rules that people must
abide by. You can gracefully handle plausible errors. The code doesn't
have to be such a damned mess.
> > panic() is not a valid replacement for sane error handling. Every single
> > panic() in swsusp can be replaced by proper error handling. You should
> > have done that a long time ago. Calling panic() is just lazy.
>
> And every untested error handler means severe data corruption. I might
> be lazy, but you are dangerous to the user data.
My point was that I'm actually trying to fix the error handling so we can
gracefully back out of where we were. I.e. take the non-lazy approach.
The fact that swsusp is riddled with so many panic()s and BUG()s has
always indicated that it is volatile and unsafe. If you cannot handle the
errors that you get during normal operation, then it is most likely not
ready for prime time.
Once again, I'm trying to make it better, against all odds. Against a
multitude of kernel developers scoffing at the notion of swsusp ever
working reliably. Against nearly everyone that has seen kernel code
expressing sorrow for having to read and understand the code. And against
you, who has no gratitude for someone trying to make marked improvements
to the structure and reliability of the code.
> You've seen the reports before:
>
> * ide problems (some user even posted decoded backtrace)
>
> * UHCI leads to dead usb and machine 20x slower than it should be
I don't get this. Someone else reports an error -- admist you flaming me
-- and then you get up and flame me about that, too? I saw the report,
have contacted the person responsible, and see that I have a patch that
should fix the problem. Unfortunately, I've wasted several hours writing
you email trying to explain things to you, and I've not had a chance to
test it.
In short, who the fuck do you think you are that you can constantly berate
me, and with problems that aren't even yours? You've wasted a lot of my
time, obviosuly not even bothered to read the patches or the comments that
go along with them, then have the nerve to attack me about driver reports
you're not even seeing, and tell me in private to revert swsusp and to "do
it soon".
I'm done with you. I rescind my offer to revert swsusp. You can submit
patches yourself. I will not touch that POS any more, for better or worse.
Pat
On Tue, Sep 02, 2003 at 12:40:18AM +0200, Pavel Machek wrote:
> > I don't think PCI device support broke - Pat seems to have fixed up
> > all that fairly nicely, so the driver model change should be
> > transparent.
>
> As far as I can test, yes, at least UHCI looks broken :-(. It is true
> that calling convention at PCI level did not change.
UHCI is broken? How?
Hey, I don't think that the USB code will work at all with suspend
without just unloading the whole USB core.
But I now have some patches in my tree that hook up the usb tree with
the power model to allow usb drivers to be notified that they should
save state in order to try to work on fixing this "problem".
greg k-h
On Maw, 2003-09-02 at 11:21, Benjamin Herrenschmidt wrote:
> The whole point was to get rid of the old 2 step save_state, then
> suspend model which didn't make sense. A saved state is only meaningful
> as long as that state doesn't get modified afterward, so saving state
> and suspending are an atomic operation.
Very old myth. In fact just about every scheduler on the planet exploits
the fact this is untrue.
save state
continue running doing scheduler stuff
restore other state losing the state in the middle we dont need
Ditto with a lot of I/O devices. My audio save state and suspend can be
seperated - I might play a little bit of a song twice but is that a bug
?
On Wed, 2003-09-03 at 15:02, Alan Cox wrote:
> On Maw, 2003-09-02 at 11:21, Benjamin Herrenschmidt wrote:
> > The whole point was to get rid of the old 2 step save_state, then
> > suspend model which didn't make sense. A saved state is only meaningful
> > as long as that state doesn't get modified afterward, so saving state
> > and suspending are an atomic operation.
>
> Very old myth. In fact just about every scheduler on the planet exploits
> the fact this is untrue.
>
> save state
> continue running doing scheduler stuff
> restore other state losing the state in the middle we dont need
>
> Ditto with a lot of I/O devices. My audio save state and suspend can be
> seperated - I might play a little bit of a song twice but is that a bug
It is in lots of case with IO. Especially if your state don't match
between different devices that rely on each other (parent/child
typically), or if some of that state information matches something
persistent on the HW (devices don't necessarily get fully powered
off during suspend).
Note that in most case, there isn't really a notion of "state" to
store or save anyway, that is "state" is just whatever is in your
net_device structure for a network driver, or whatever private
structure in your whatever-other driver, so you just have to restore
a couple of things on wakeup, but really nothing to save on suspend.
The single callback is much simpler, and will avoid lots of mistakes
imho.
Ben.
Hi!
> > > I don't think PCI device support broke - Pat seems to have fixed up
> > > all that fairly nicely, so the driver model change should be
> > > transparent.
> >
> > As far as I can test, yes, at least UHCI looks broken :-(. It is true
> > that calling convention at PCI level did not change.
>
> UHCI is broken? How?
>
> Hey, I don't think that the USB code will work at all with suspend
> without just unloading the whole USB core.
Well, it did work okay in -test3. UHCI has support (leftover from APM
times or something like that), and it was properly called from suspend
sequence. USB devices were "disconnected/reconnected", but it all
worked.
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
Hi!
> > -extern long sys_sync(void);
> > -
> > -unsigned char software_suspend_enabled = 0;
> > +unsigned char software_suspend_enabled = 1;
> >
> > #define __ADDRESS(x) ((unsigned long) phys_to_virt(x))
> >
> > ...by this you enable suspend even before system is booted. That's bad
> > idea. It could hurt in the past (when we had sysrq-D so swsusp), and
> > it may hurt again in future when battery goes low during boot.
>
> Does it or does it not cause a problem?
It does not cause a problem just now.
> > - if (PageHighMem(page))
> > - panic("Swsusp not supported on highmem
> > - boxes. Send 1GB of RAM to <[email protected]> and try again
> > - ;-).");
> >
> > Great, lets kill the messenger. Notice that when swsusp goes wrong it
> > tends to do bad data corruption, so some defensive programming is good
> > idea and this should be at least BUG_ON(). Gcc should optimize it
> > anyway.
>
> First, look at the snippet that you pasted just before this, then explain
> how we could have a HIGHMEM page if we unconditionally error when HIGHMEM
> is enabled.
Maybe it could not have happened, I'm not sure about exact
CONFIG_HIMEM semantics.
...
> > You killed code to mark swap as normal swap space and replaced it with
> > simple return -EFAULT. This is extremely dangerous; now swap is marked
> > as containing valid suspend image. On next reboot, user code resume,
> > but disk already has changed state. There will be silent data
> > corruption going on, and filesystem may be marked as clean at the
> > end. Great way to loose all your data. Really.
>
> Snide comments not appreicated.
>
> The resetting of the swapfiles happened in swsusp_free(). Please read the
> entire patch. That is, until the latest round of patches, in which the
> resetting happened immediately after the header was read. Again, please
> read the entire set of patches.
Okay, my point was that the patch was not exactly obvious. You said
that you have not changed any core functionality, and you may even be
right, but it broke somewhere in the middle of pretty non-obvious
changes.
> > User has suspended, but he does not want to resume. Code was there to
> > restore swap space to "normal" state so it can be swapon-ed. This will
> > not kill data, only puzzled user.
>
> Read your own code:
...
> Now explain to me why killing that part of code was unnecessary. You've
> already forced them to use mkswap(1) to restore the partition. Let's juts
> make it official.
...and remove any chance that someone fixes that #if 0 hunk of code.
> > - if (!is_problem) {
> > - kernel_fpu_end(); /* save_processor_state() does
> > - kernel_fpu_begin, and we need to revert it in order to p\ass
> > - in_atomic() checks */
> > - BUG_ON(in_atomic());
> > - suspend_save_image();
> > - suspend_power_down(); /* FIXME: if
> > - suspend_power_down is commented out, console is lost after few
> > - suspends ?!\ */
> > - }
> > -
> > + if (!is_problem)
> > + return suspend_save_image();
> >
> > You've killed kernel_fpu_end(), that might be part of problem. Also
> > you killed BUG_ON(in_atomic()) which would actually catch that error.
>
> Again, you've exhibiting gross unfamiliarity with your own code.
> kernel_fpu_end() is called from do_fpu_end() which is called from
> restore_processor_state(), which is called from do_magic() (now
> swsusp_arch_suspend()).
No, at this point you are wrong. restore_processor_state() is only
run at resume. But you need to end atomic section during suspend,
too, so you don't sleep in it (causing so much warnings at console
that it takes forever to suspend).
> As for the BUG_ON(in_atomic()), that's laziness, and just poor style. A
> nicer check would have been
>
> int software_suspend(void)
> {
> if (in_atomic() || in_interrupt())
> return -EPERM;
> ...
> }
>
> But, we can all take the moral way out and just make sure that our callers
> are calling us in the right context.
Callers are under your control, and I don't want to trace your code
*that* closely. [This is my choice. if () return -EPERM; is not that
much different, but I prefer this one.]
> > > Have you confirmed that x86-64 is broken, or are you simply trying to
> > > raise more accusations? If it is broken, please tell exactly what the
> > > problem is and I will fix it.
> >
> > How do you expect void do_magic() -> int do_magic() to work without
> > changing actuall do_magic implementation?
>
> Return is in %eax. We propogate the error from the C functions
> (swsusp_suspend() and swsusp_restore()) to the caller of
> swsusp_arch_suspend() (nee do_magic()).
Okay, this is nice trick and it might work. I've checked assembly on
both i386 and x86-64 and it looks okay.
> > > panic() is not a valid replacement for sane error handling. Every single
> > > panic() in swsusp can be replaced by proper error handling. You should
> > > have done that a long time ago. Calling panic() is just lazy.
> >
> > And every untested error handler means severe data corruption. I might
> > be lazy, but you are dangerous to the user data.
>
> My point was that I'm actually trying to fix the error handling so we can
> gracefully back out of where we were. I.e. take the non-lazy
> approach.
Trying to fix error handling is good and welcome, but please test it
first. It already ate 3 different filesystems here (like "so damaged
fsck can't fix it", had to mke2fs), and I do not want it to take some
user filesystem.
> Once again, I'm trying to make it better, against all odds. Against a
> multitude of kernel developers scoffing at the notion of swsusp ever
> working reliably. Against nearly everyone that has seen kernel code
> expressing sorrow for having to read and understand the code. And against
> you, who has no gratitude for someone trying to make marked improvements
> to the structure and reliability of the code.
Again, if that patch came by email and not ftp, you'd probably get
some gratitude.
> > You've seen the reports before:
> >
> > * ide problems (some user even posted decoded backtrace)
> >
> > * UHCI leads to dead usb and machine 20x slower than it should be
>
> I don't get this. Someone else reports an error -- admist you flaming me
> -- and then you get up and flame me about that, too? I saw the
I've reported you ide problem using irc at Aug27, and reported you
UHCI problem to you, too. I've not seen email with a patch. You said
driver model has no problems, these are two examples I know of. [Where
do I find that ide fix? I'll need it for testing].
> In short, who the fuck do you think you are that you can constantly berate
> me, and with problems that aren't even yours? You've wasted a lot of
> my
....
> patches yourself. I will not touch that POS any more, for better or
> worse.
Why does every email from you have to end with a flame?
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
Hi!
> > -void software_resume(void)
> > +int __init swsusp_restore(void)
> > {
> > - if (num_online_cpus() > 1) {
> > - printk(KERN_WARNING "Software Suspend has
> > malfunctioning SMP support. Disabled :(\n");
> > - return;
> > - }
> >
> > I can not easily see where you moved this check.
>
> Read the rest of the patches, and the changelogs (I do believe it's in
> them). It's in kernel/power/main.c::enter_state(), so all PM handlers can
> use it.
Notice that this is done during resume. You are free to suspend with 1
cpu, then attempt to resume with 2 cpus. Not *too* likely to happen,
but....
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
In article <[email protected]>,
Pavel Machek <[email protected]> wrote:
| Hi!
|
| > > -void software_resume(void)
| > > +int __init swsusp_restore(void)
| > > {
| > > - if (num_online_cpus() > 1) {
| > > - printk(KERN_WARNING "Software Suspend has
| > > malfunctioning SMP support. Disabled :(\n");
| > > - return;
| > > - }
| > >
| > > I can not easily see where you moved this check.
| >
| > Read the rest of the patches, and the changelogs (I do believe it's in
| > them). It's in kernel/power/main.c::enter_state(), so all PM handlers can
| > use it.
|
| Notice that this is done during resume. You are free to suspend with 1
| cpu, then attempt to resume with 2 cpus. Not *too* likely to happen,
| but....
Would it matter if you did? Unless you are running an SMP kernel? I
guess at some point the laptop CPUs will have HT, and SMP will be more
of an issue than it is now.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
On Tue, 2 Sep 2003, Patrick Mochel wrote:
>
> > @@ -65,9 +65,7 @@
> >
> > #include "power.h"
> >
> > -extern long sys_sync(void);
> > -
> > -unsigned char software_suspend_enabled = 0;
> > +unsigned char software_suspend_enabled = 1;
> >
> > #define __ADDRESS(x) ((unsigned long) phys_to_virt(x))
> >
> > ...by this you enable suspend even before system is booted. That's bad
> > idea. It could hurt in the past (when we had sysrq-D so swsusp), and
> > it may hurt again in future when battery goes low during boot.
>
> Does it or does it not cause a problem? Look at the old software_suspend()
> function - The handling of this flag was weird and non-standard. This is
> cleaner.
I don't want to get in the middle of this, but having a laptop decide
that it doesn't have enough battery to finish a boot is something which
happens now and again. If this change could cause data corruption where
the old code didn't, perhaps we could stand the overhead of moving the
enable to the end of the boot, or wherever would be safe.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
Hi!
> > The whole point was to get rid of the old 2 step save_state, then
> > suspend model which didn't make sense. A saved state is only meaningful
> > as long as that state doesn't get modified afterward, so saving state
> > and suspending are an atomic operation.
Actually swsusp does exactly that, its
suspend drivers
snapshot state
resume drivers
write snapshot to the disk
I do not think that it matters, through, because I can do save_state
and suspend anyway...
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Hi!
> > > #include "power.h"
> > >
> > > -extern long sys_sync(void);
> > > -
> > > -unsigned char software_suspend_enabled = 0;
> > > +unsigned char software_suspend_enabled = 1;
> > >
> > > #define __ADDRESS(x) ((unsigned long) phys_to_virt(x))
> > >
> > > ...by this you enable suspend even before system is booted. That's bad
> > > idea. It could hurt in the past (when we had sysrq-D so swsusp), and
> > > it may hurt again in future when battery goes low during boot.
> >
> > Does it or does it not cause a problem? Look at the old software_suspend()
> > function - The handling of this flag was weird and non-standard. This is
> > cleaner.
>
> I don't want to get in the middle of this, but having a laptop decide
> that it doesn't have enough battery to finish a boot is something which
> happens now and again. If this change could cause data corruption where
> the old code didn't, perhaps we could stand the overhead of moving the
> enable to the end of the boot, or wherever would be safe.
We do not yet do suspend-to-disk on battery low, so Patrick's code is
actually safe. Still I do not think that's good change.
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
> > > -void software_resume(void)
> > > +int __init swsusp_restore(void)
> > > {
> > > - if (num_online_cpus() > 1) {
> > > - printk(KERN_WARNING "Software Suspend has
> > > malfunctioning SMP support. Disabled :(\n");
> > > - return;
> > > - }
> > >
> > > I can not easily see where you moved this check.
> >
> > Read the rest of the patches, and the changelogs (I do believe it's in
> > them). It's in kernel/power/main.c::enter_state(), so all PM handlers can
> > use it.
>
> Notice that this is done during resume. You are free to suspend with 1
> cpu, then attempt to resume with 2 cpus. Not *too* likely to happen,
> but....
That's a silly thing to do, though I don't support the notion of letting
people find out the hard way. Why not just fail on CONFIG_SMP until it's
done right?
Pat
> > > -unsigned char software_suspend_enabled = 0;
> > > +unsigned char software_suspend_enabled = 1;
> > >
> > > #define __ADDRESS(x) ((unsigned long) phys_to_virt(x))
> > >
> > > ...by this you enable suspend even before system is booted. That's bad
> > > idea. It could hurt in the past (when we had sysrq-D so swsusp), and
> > > it may hurt again in future when battery goes low during boot.
> >
> > Does it or does it not cause a problem?
>
> It does not cause a problem just now.
But, we don't support suspend-on-low-battery.
> Okay, my point was that the patch was not exactly obvious. You said
> that you have not changed any core functionality, and you may even be
> right, but it broke somewhere in the middle of pretty non-obvious
> changes.
Your point may now be that I untintentionally broke swsusp functionality
through several large patches. However, that didn't seem the case as you
were screaming at me. If you had taken the few minutes to send this email
in the first place, we could be over this already.
> > Now explain to me why killing that part of code was unnecessary. You've
> > already forced them to use mkswap(1) to restore the partition. Let's juts
> > make it official.
>
> ...and remove any chance that someone fixes that #if 0 hunk of code.
It's been #ifdef'd out since it was merged into Linus's tree on 15 July
2002. I think if someone really wants to fix it up, they'll figure out how
to add it back.
> > kernel_fpu_end() is called from do_fpu_end() which is called from
> > restore_processor_state(), which is called from do_magic() (now
> > swsusp_arch_suspend()).
>
> No, at this point you are wrong. restore_processor_state() is only
> run at resume. But you need to end atomic section during suspend,
> too, so you don't sleep in it (causing so much warnings at console
> that it takes forever to suspend).
Hey, even I'm wrong sometimes. I just posted a patch for this.
> > As for the BUG_ON(in_atomic()), that's laziness, and just poor style. A
> > nicer check would have been
> >
> > int software_suspend(void)
> > {
> > if (in_atomic() || in_interrupt())
> > return -EPERM;
> > ...
> > }
> >
> > But, we can all take the moral way out and just make sure that our callers
> > are calling us in the right context.
>
> Callers are under your control, and I don't want to trace your code
> *that* closely. [This is my choice. if () return -EPERM; is not that
> much different, but I prefer this one.]
You don't have to debug the callers, as you wouldn't anyway if you just
BUG()d or panic()d. It is nice to provide a backtrace, but you don't have
to kill the system to do so:
if (bad_context) {
dump_stack();
return -EPERM;
}
> > Return is in %eax. We propogate the error from the C functions
> > (swsusp_suspend() and swsusp_restore()) to the caller of
> > swsusp_arch_suspend() (nee do_magic()).
>
> Okay, this is nice trick and it might work. I've checked assembly on
> both i386 and x86-64 and it looks okay.
It's not a trick, it's the calling convention.
> Trying to fix error handling is good and welcome, but please test it
> first. It already ate 3 different filesystems here (like "so damaged
> fsck can't fix it", had to mke2fs), and I do not want it to take some
> user filesystem.
What took 3 filesystems? The changes I made? Where are the bug reports?
BTW, except for one time that I mounted / as ext2, I've been using ext3
with absolulely no problems.
> I've reported you ide problem using irc at Aug27, and reported you
> UHCI problem to you, too. I've not seen email with a patch. You said
> driver model has no problems, these are two examples I know of. [Where
> do I find that ide fix? I'll need it for testing].
I gave you a work around for the bug, and told you that I knew about it.
I appreciate the fact that you took the time to report it, but continuing
to bite my head off about known bugs that are being worked on is
completely unreasonable.
The fix was posted to lkml a couple of days ago. It should also be in the
-test4-mm5 kernel.
Pat
On Thu, 4 Sep 2003, Pavel Machek wrote:
> Hi!
>
>
> > > > #include "power.h"
> > > >
> > > > -extern long sys_sync(void);
> > > > -
> > > > -unsigned char software_suspend_enabled = 0;
> > > > +unsigned char software_suspend_enabled = 1;
> > > >
> > > > #define __ADDRESS(x) ((unsigned long) phys_to_virt(x))
> > > >
> > > > ...by this you enable suspend even before system is booted. That's bad
> > > > idea. It could hurt in the past (when we had sysrq-D so swsusp), and
> > > > it may hurt again in future when battery goes low during boot.
> > >
> > > Does it or does it not cause a problem? Look at the old software_suspend()
> > > function - The handling of this flag was weird and non-standard. This is
> > > cleaner.
> >
> > I don't want to get in the middle of this, but having a laptop decide
> > that it doesn't have enough battery to finish a boot is something which
> > happens now and again. If this change could cause data corruption where
> > the old code didn't, perhaps we could stand the overhead of moving the
> > enable to the end of the boot, or wherever would be safe.
>
> We do not yet do suspend-to-disk on battery low, so Patrick's code is
> actually safe. Still I do not think that's good change.
I'm not sure trying to suspect during boot is a good thing to do, either.
Maybe that should wait and be enabled at user option at the end of boot.
In any case, I think avoiding data damage is higher priority than
efficiency, elegance, modularity, etc.
Beyond that I'll let you guys fight it out, I'm just worried that the new
pm is going to bite me even if I don't use suspend.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
> I'm not sure trying to suspect during boot is a good thing to do, either.
> Maybe that should wait and be enabled at user option at the end of boot.
> In any case, I think avoiding data damage is higher priority than
> efficiency, elegance, modularity, etc.
>
> Beyond that I'll let you guys fight it out, I'm just worried that the new
> pm is going to bite me even if I don't use suspend.
Your paranoia is understood, but completely unfounded. As Pavel said, we
do not support such a feature. When/if we do, only the mechanism for
detecting a low battery state and transitioning to a suspend sequence
would be in the kernel. The policy that stated what the watermark on the
battery charge was to do that would live in userspace, and set on boot
(long after the kernel was done booting). The default would obviously be
OFF until it was set by userspace.
Patches to implement this would be greatly appreciated.
Pat
On Monday 01 September 2003 18:55, Patrick Mochel wrote:
> In all actuality, I don't need swsusp. I have a better suspend-to-disk
> implementation that is faster, smaller, and cleaner. I've hesitated
> merging it because I thought swsusp improvements would be more welcome.
> Obviously they're not; or you haven't actually taken the time to read the
> code.
Is there somewhere we can download your code? swsusp in -test3 hung my box
immediately without touching the disk, and in -test4 there doesn't seem to be
any way to trigger it under /proc or /sys...
I've been subscribed to the swsusp list for two weeks now and 2.6 has only
been mentioned _once_, and that was a two message thread with somebody asking
about it and nigel saying he didn't have time to work on it right now. It's
a apparently a 2.4-only list, and I don't use 2.4 anymore.
APM suspend doesn't work properly on my new thinkpad (suspends but hangs with
the power LED still on and the hibernate light off, and the thing's a brick
at that point; the only thing you can do is hold the power button down for
ten seconds or pop the battery to get it to boot back up from scratch.) So I
have to shut the sucker down every time I want to move it, which is a pain...
Rob
Hi!
> > Notice that this is done during resume. You are free to suspend with 1
> > cpu, then attempt to resume with 2 cpus. Not *too* likely to happen,
> > but....
>
> That's a silly thing to do, though I don't support the notion of letting
> people find out the hard way. Why not just fail on CONFIG_SMP until it's
> done right?
I guess runtime check for number of cpus is better idea -- with more
P4's around kernels with CONFING_SMP will be very common. We should at
least work in single-processor config there.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Hi!
> > In all actuality, I don't need swsusp. I have a better suspend-to-disk
> > implementation that is faster, smaller, and cleaner. I've hesitated
> > merging it because I thought swsusp improvements would be more welcome.
> > Obviously they're not; or you haven't actually taken the time to read the
> > code.
>
> Is there somewhere we can download your code? swsusp in -test3 hung my box
> immediately without touching the disk, and in -test4 there doesn't seem to be
> any way to trigger it under /proc or /sys...
echo -n disk > /sys/power/disk, but it is broken.
> I've been subscribed to the swsusp list for two weeks now and 2.6 has only
> been mentioned _once_, and that was a two message thread with somebody asking
> about it and nigel saying he didn't have time to work on it right now. It's
> a apparently a 2.4-only list, and I don't use 2.4 anymore.
2.6 swsusp development is not really done on that list.
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
On Friday 05 September 2003 18:26, Pavel Machek wrote:
> 2.6 swsusp development is not really done on that list.
Oh Really - do I have to dig out the message in which you agreed to consolidate the effort on that list?
Reminds me of when Nigel wrote to Linus for guidance on the development direction of swsusp and you replied...
You have been so nice recently, actually, I am just beginning to wonder on who's payroll your are...
Michael
Hi!
> > 2.6 swsusp development is not really done on that list.
>
> Oh Really - do I have to dig out the message in which you agreed to
> consolidate the effort on that list?
How long ago was that?
> You have been so nice recently, actually, I am just beginning to
> wonder on who's payroll your are...
I do not think I want to be involved in two flamewars at the same
time.
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
Hi!
> > > > 2.6 swsusp development is not really done on that list.
> > >
> > > Oh Really - do I have to dig out the message in which you agreed to
> > > consolidate the effort on that list?
> >
> > How long ago was that?
>
> Wed, 9 Jul 2003 20:18:30 +0200
>
> http://lister.fornax.hu/pipermail/swsusp/2003-July/003011.html
>
> > >http://sourceforge.net/mailarchive/forum.php?thread_id=2977276&forum_id=34880
Ahha, this message. Sorry, we have not understood each other.
I used old fornax mailling list and someone told me not to do that. I
agreed to not use fornax ml any more (and changed my muttrc). When
I'll have something to say about swsusp-2.4, I'm going to use new list
address, but I did not want to imply that I'll send swsusp-2.6 stuff
to that list.
[I'm ccing this message to swsusp@sf, that should clear up any
confusion].
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
On Friday 05 September 2003 19:08, Pavel Machek wrote:
> Hi!
>
> > > 2.6 swsusp development is not really done on that list.
> >
> > Oh Really - do I have to dig out the message in which you agreed to
> > consolidate the effort on that list?
>
> How long ago was that?
Wed, 9 Jul 2003 20:18:30 +0200
http://lister.fornax.hu/pipermail/swsusp/2003-July/003011.html
http://sourceforge.net/mailarchive/forum.php?thread_id=2977276&forum_id=34880
>
> > You have been so nice recently, actually, I am just beginning to
> > wonder on who's payroll your are...
>
> I do not think I want to be involved in two flamewars at the same
> time.
OK, so lets leave it at that.
Darwin will decide ;)
Michael
> Is there somewhere we can download your code? swsusp in -test3 hung my box
> immediately without touching the disk, and in -test4 there doesn't seem to be
> any way to trigger it under /proc or /sys...
I posted a URL last Saturday to the patch, and Andrew merged it into
-test4-mm4.
If you're interested in testing, please try -test4-mm6, as it has a couple
of more fixes.
> APM suspend doesn't work properly on my new thinkpad (suspends but hangs with
> the power LED still on and the hibernate light off, and the thing's a brick
> at that point; the only thing you can do is hold the power button down for
> ten seconds or pop the battery to get it to boot back up from scratch.) So I
> have to shut the sucker down every time I want to move it, which is a pain...
What model is it? It probably doesn't support APM at all. I can't
guarantee that ACPI suspend/resume will work on it, but I'm interested to
see if it does..
Thanks,
Pat
On Fri, Sep 05, 2003 at 10:47:49AM -0700, Patrick Mochel wrote:
> > the power LED still on and the hibernate light off, and the thing's a brick
> > at that point; the only thing you can do is hold the power button down for
> > ten seconds or pop the battery to get it to boot back up from scratch.) So I
> > have to shut the sucker down every time I want to move it, which is a pain...
>
> What model is it? It probably doesn't support APM at all. I can't
> guarantee that ACPI suspend/resume will work on it, but I'm interested to
> see if it does..
Note that a lot of ThinkPads out in the field need a BIOS update
before their ACPI is working. (I know this because IBM was quite
helpful and proactive in addressing their Linux-related ACPI BIOS
issues)
Jeff
> Note that a lot of ThinkPads out in the field need a BIOS update
> before their ACPI is working. (I know this because IBM was quite
> helpful and proactive in addressing their Linux-related ACPI BIOS
> issues)
Noted.
Do you happen to know which series/models this is true for, or where to
find a list?
Thanks,
Pat
Hi!
> > APM suspend doesn't work properly on my new thinkpad (suspends but hangs with
> > the power LED still on and the hibernate light off, and the thing's a brick
> > at that point; the only thing you can do is hold the power button down for
> > ten seconds or pop the battery to get it to boot back up from scratch.) So I
> > have to shut the sucker down every time I want to move it, which is a pain...
>
> What model is it? It probably doesn't support APM at all. I can't
> guarantee that ACPI suspend/resume will work on it, but I'm interested to
> see if it does..
If it did not support APM at all, it would not crash because APM
module would not load. Try to see if 2.4 APM works. (2.6 APM is known
not to be in great shape).
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
Hi!
> > > > the power LED still on and the hibernate light off, and the thing's a
> > > > brick at that point; the only thing you can do is hold the power button
> > > > down for ten seconds or pop the battery to get it to boot back up from
> > > > scratch.) So I have to shut the sucker down every time I want to move
> > > > it, which is a pain...
> > >
> > > What model is it? It probably doesn't support APM at all. I can't
> > > guarantee that ACPI suspend/resume will work on it, but I'm interested to
> > > see if it does..
> >
> > Note that a lot of ThinkPads out in the field need a BIOS update
> > before their ACPI is working. (I know this because IBM was quite
> > helpful and proactive in addressing their Linux-related ACPI BIOS
> > issues)
>
> ACPI currently works fine in terms of IRQ routing, sensing when the lid closes
> and the power button gets pressed, telling me how much battery power is left
> and when I disconnect the AC adapter from the wall...
>
> I was hoping software suspend would avoid having to have IBM firmware involved
> in the suspend process at all (it can boot, it can shut down, I just want it
> to snapshot process state so it comes up with the same things up on the
> desktop as last time).
Yes software suspend can do that.
> P.S. I reeeeeeeeeeeeeeeeeally hate it the way the keys on the keyboard
> sometimes have an up event delayed (or miss it entirely) and decide to
> auto-repeat insanely fast. It happens about twice an hour. I've seen mouse
> clicks do it as well. Not a show-stopper, just annoying.
I guess that *is* showstopper. Unfortunately notebook keyboards tend
to be crappy :-(.
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
On Fri, 5 Sep 2003, Jeff Garzik wrote:
> Note that a lot of ThinkPads out in the field need a BIOS update
> before their ACPI is working. (I know this because IBM was quite
> helpful and proactive in addressing their Linux-related ACPI BIOS
> issues)
And even that isn't always enough :(
I have a TP30, 2366-51U with the latest BIOS:
Version: 1IET67WW (2.06 )
Release: 07/17/2003
Processor Manufacturer: GenuineIntel
Processor Version: Pentium(R) 4
BIOS32 Service Directory present.
Calling Interface Address: 0x000FD7E0
ACPI 2.0 present.
OEM ID: IBM
RSD table at 0x0FF63195.
PNP 1.0 present.
Event Notification: Polling
Event Notification Flag Address: 0x000004B4
Real Mode Code Address: F000:9D36
Real Mode Data Address: 0040:0000
Protected Mode Code Address: 0x000F9D54
Protected Mode Data Address: 0x00000400
PCI Interrupt Routing 1.0 present.
Table Size: 256 bytes
Router ID: 00:1f.0
Exclusive IRQs: None
Compatible Router: 8086:122e
Up through 2.05, ACPI crashed the kernel during boot(2.4 and 2.6) -
I posted here about that... I'm going to try this weekend with
the just flashed 2.06 - even though the changelog doesn't indicate
anything changed wrt ACPI.
The problem was, iirc, was scanning one of the tables - I can't find
the message now :(
APM works on 2.4, but not 2.6
--
Rick Nelson
"Linux poses a real challenge for those with a taste for late-night
hacking (and/or conversations with God)."
(By Matt Welsh)
On Friday 05 September 2003 14:02, Jeff Garzik wrote:
> On Fri, Sep 05, 2003 at 10:47:49AM -0700, Patrick Mochel wrote:
> > > the power LED still on and the hibernate light off, and the thing's a
> > > brick at that point; the only thing you can do is hold the power button
> > > down for ten seconds or pop the battery to get it to boot back up from
> > > scratch.) So I have to shut the sucker down every time I want to move
> > > it, which is a pain...
> >
> > What model is it? It probably doesn't support APM at all. I can't
> > guarantee that ACPI suspend/resume will work on it, but I'm interested to
> > see if it does..
>
> Note that a lot of ThinkPads out in the field need a BIOS update
> before their ACPI is working. (I know this because IBM was quite
> helpful and proactive in addressing their Linux-related ACPI BIOS
> issues)
>
> Jeff
ACPI currently works fine in terms of IRQ routing, sensing when the lid closes
and the power button gets pressed, telling me how much battery power is left
and when I disconnect the AC adapter from the wall...
I was hoping software suspend would avoid having to have IBM firmware involved
in the suspend process at all (it can boot, it can shut down, I just want it
to snapshot process state so it comes up with the same things up on the
desktop as last time).
And then if I just get alsa configured I'll have every single thing on my new
laptop working under 2.6. (Well, okay, then I'll need to tackle USB. But
every single thing I've tried to use so far, anyway. :)
P.S. I reeeeeeeeeeeeeeeeeally hate it the way the keys on the keyboard
sometimes have an up event delayed (or miss it entirely) and decide to
auto-repeat insanely fast. It happens about twice an hour. I've seen mouse
clicks do it as well. Not a show-stopper, just annoying.
Okay, once it did it during boot, and the beast was unusable. X recovers when
you hit the next key, the console apparently does not.
Rob
Hi!
> Pavel, please keep in mind that proper PM is a difficult task, what we
> had before was full of holes, we spent a significant amount of time over
> the past year debating the right way to do all of this and Patrick spent
> even more time actually implementing it, so rather than just crying
> loud, I'd epxect you rather help us fixing what still need to be.
I was screaming out loud because I did not see the patch coming. It
simply materialized in Linus' tree. Exactly because PM is hard, I'd
expect "here's rewrite of driver model, please test it on your
hardware" mail on lkml before it is merged.
Anyway, ad uhci: it seemed to work well in -test3. What's current
status of pm_send_all() interface? I think pm_send_all() was
responsible for UHCI working...
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Hi!
> > > I was hoping software suspend would avoid having to have IBM firmware
> > > involved in the suspend process at all (it can boot, it can shut down, I
> > > just want it to snapshot process state so it comes up with the same
> > > things up on the desktop as last time).
> >
> > Yes software suspend can do that.
>
> Footnote 1: If it worked, which I've never been able to get it to
> do.
Try -test3 with ext2 and minimal set of drivers.
> > > P.S. I reeeeeeeeeeeeeeeeeally hate it the way the keys on the keyboard
> > > sometimes have an up event delayed (or miss it entirely) and decide to
> > > auto-repeat insanely fast. It happens about twice an hour. I've seen
> > > mouse clicks do it as well. Not a show-stopper, just annoying.
> >
> > I guess that *is* showstopper. Unfortunately notebook keyboards tend
> > to be crappy :-(.
>
> Not on a thinkpad. I could probably bring down a wild caribou with this
> thing, it's designed like a tank. (Part of the reason I bought it. :)
>
> I tried 2.4 for a bit on it when I was first trying to get it working.
> (Largely to assess how much reconfiguration needed to be done, since I'd
> swapped in a hard drive from a toshiba and didn't want to have to reinstall.)
>
> The keyboard never had a problem under 2.4, this is a 2.6 problem.
> It also
You need to report this to vojtech, I guess.
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
On Friday 05 September 2003 15:06, Pavel Machek wrote:
> > I was hoping software suspend would avoid having to have IBM firmware
> > involved in the suspend process at all (it can boot, it can shut down, I
> > just want it to snapshot process state so it comes up with the same
> > things up on the desktop as last time).
>
> Yes software suspend can do that.
Footnote 1: If it worked, which I've never been able to get it to do.
> > P.S. I reeeeeeeeeeeeeeeeeally hate it the way the keys on the keyboard
> > sometimes have an up event delayed (or miss it entirely) and decide to
> > auto-repeat insanely fast. It happens about twice an hour. I've seen
> > mouse clicks do it as well. Not a show-stopper, just annoying.
>
> I guess that *is* showstopper. Unfortunately notebook keyboards tend
> to be crappy :-(.
Not on a thinkpad. I could probably bring down a wild caribou with this
thing, it's designed like a tank. (Part of the reason I bought it. :)
I tried 2.4 for a bit on it when I was first trying to get it working.
(Largely to assess how much reconfiguration needed to be done, since I'd
swapped in a hard drive from a toshiba and didn't want to have to reinstall.)
The keyboard never had a problem under 2.4, this is a 2.6 problem. It also
affects the mouse at times (not mouse movements, but mouse clicks). If I had
to theorize without information, I'd say that the new input event core that
all this stuff is going through either offloads something on a kernel thread,
or is getting blocked on a semaphore that another thread is using, and it
goes 1/10th of a second or so without being secheduled, so the
acknowledgement of the "up" event gets delayed, and repeat logic triggers
almost immediately, and something is believing that the key was held down for
a long time, so I get a bunch of keys.
Sometimes the I'll get say 7 characters (almost instantly) and then it'll
stop. And every once in a while, the key will just stick and keep going
until I hit something else. (If this happens on the console, hitting
something else doesn't help, but in X it does. I'm not in the console enough
to give too much data about that, I just had it hang once repeating a key,
and I couldn't figure out which one it was repeating (some sort of function
key escape code) and there was nothing I could do to make it stop. That's
actually an old bug I've seen in 2.4, sometimes during bootup it would think
a function key was being held down, and keep echoing an escape sequence to
the screen. But under 2.4 it would stop when I hit any other key, and in 2.6
it doesn't listen to the keyboard when it's in that state...)
It's intermittent enough it doesn't stop me from using it. It's no worse than
keybounce where dirty keys don't always make contact. But this is a much
different symptom...
> Pavel
Rob
| > P.S. I reeeeeeeeeeeeeeeeeally hate it the way the keys on the
| > keyboard
| > sometimes have an up event delayed (or miss it entirely) and decide to
| > auto-repeat insanely fast. It happens about twice an hour. I've seen mouse
| > clicks do it as well. Not a show-stopper, just annoying.
| I guess that *is* showstopper. Unfortunately notebook keyboards tend
| to be crappy :-(.
Just take a look at :
http://bugzilla.xfree86.org/show_bug.cgi?id=532
and
http://bugzilla.kernel.org/show_bug.cgi?id=912
runaway keyboard. And *not* a notebook one or even a crappy desktop one.
Not the top-of-the line mechanical keyboard one can buy today, but a
good almost new better-than-average one.
Regards,
--
Nicolas Mailhot
Patrick Mochel wrote:
>>Note that a lot of ThinkPads out in the field need a BIOS update
>>before their ACPI is working. (I know this because IBM was quite
>>helpful and proactive in addressing their Linux-related ACPI BIOS
>>issues)
>
>
> Noted.
>
> Do you happen to know which series/models this is true for, or where to
> find a list?
No, but I can ask. IBMer posted to (public) RH bugzilla bugs with
details, too. I'll search for those as well.
Jeff
Richard A Nelson wrote:
> On Fri, 5 Sep 2003, Jeff Garzik wrote:
>
>
>>Note that a lot of ThinkPads out in the field need a BIOS update
>>before their ACPI is working. (I know this because IBM was quite
>>helpful and proactive in addressing their Linux-related ACPI BIOS
>>issues)
>
>
> And even that isn't always enough :(
>
> I have a TP30, 2366-51U with the latest BIOS:
> Version: 1IET67WW (2.06 )
> Release: 07/17/2003
> Processor Manufacturer: GenuineIntel
> Processor Version: Pentium(R) 4
> BIOS32 Service Directory present.
> Calling Interface Address: 0x000FD7E0
> ACPI 2.0 present.
> OEM ID: IBM
> RSD table at 0x0FF63195.
> PNP 1.0 present.
> Event Notification: Polling
> Event Notification Flag Address: 0x000004B4
> Real Mode Code Address: F000:9D36
> Real Mode Data Address: 0040:0000
> Protected Mode Code Address: 0x000F9D54
> Protected Mode Data Address: 0x00000400
> PCI Interrupt Routing 1.0 present.
> Table Size: 256 bytes
> Router ID: 00:1f.0
> Exclusive IRQs: None
> Compatible Router: 8086:122e
>
> Up through 2.05, ACPI crashed the kernel during boot(2.4 and 2.6) -
> I posted here about that... I'm going to try this weekend with
> the just flashed 2.06 - even though the changelog doesn't indicate
> anything changed wrt ACPI.
>
> The problem was, iirc, was scanning one of the tables - I can't find
> the message now :(
If you are up for a little debugging and comfortable with building your
own kernels, then please enable the relaxed-aml-checking and debug
options in the ACPI kernel config. Those, and dmidecode output, will
provide useful info to the ACPI folks.
Jeff