The following problem, which I reported for kernel 2.6.18-rc1 on my
development machine
Dell OptiPlex GX110
uname -a = Linux gx110 2.6.18-rc7-noinitrd #1 PREEMPT Thu Sep 14
15:13:38 CEST 2006 i686 i686 i386 GNU/Linux
933 MHz Pentium III processor, i810 chipset, 512 MB RAM
distribution SuSE 10.0, syslog-ng 1.6.8, klogd 1.4.1, Xorg X11 6.8.2
still exists with 2.6.18-rc7:
While X is running, output from printk() appears in syslog (eg.
/var/log/messages) only after a key is pressed on the system keyboard,
even though it is visible with dmesg immediately.
Additional observations:
- The problem is *not* present with 2.6.17.* or earlier kernels.
- The problem *is* present with 2.6.18-rc*-mm* kernels.
- The problem disappears if the X server is terminated (telinit 3) and
reappears if the X server is started again (telinit 5).
- Syslog messages from userspace programs are not affected by the delay.
- No messages are lost, all appear eventually, though possibly hours
or days later, depending on how long nobody touches the keyboard.
- It doesn't matter which key is pressed; even pressing a shift key all
by its own is sufficient to make the missing messages appear.
- I couldn't find any other action that would release the messages;
neither mouse movements or clicks, nor waiting up to 24 hours, not
even logging in via ssh from another machine and compiling a Linux
kernel. ;-)
- The effect can be clearly observed by the difference between the
kernel's own timestamps and those by syslogd; an extreme example:
Sep 16 14:11:16 gx110 kernel: [18729.057746] gigaset: unblocking all
channels
Sep 16 14:11:16 gx110 kernel: [18729.057765] gigaset: searching
scheduled commands
Sep 16 14:11:16 gx110 kernel: [86033.298803] gigaset: received response
(8 bytes): ^M^JZLOG^M^J
Sep 16 14:11:16 gx110 kernel: [86033.298898] bas_gigaset: cmd_loop: End
of Command (0 Bytes)
Please let me know if I can help in any way with locating the cause of
this annoying phenomenon.
Thanks
Tilman
--
Tilman Schmidt E-Mail: [email protected]
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeoeffnet mindestens haltbar bis: (siehe Rueckseite)
On Sat, 2006-09-16 at 14:45 +0200, Tilman Schmidt wrote:
> The following problem, which I reported for kernel 2.6.18-rc1 on my
> development machine
> Dell OptiPlex GX110
> uname -a = Linux gx110 2.6.18-rc7-noinitrd #1 PREEMPT Thu Sep 14
> 15:13:38 CEST 2006 i686 i686 i386 GNU/Linux
> 933 MHz Pentium III processor, i810 chipset, 512 MB RAM
> distribution SuSE 10.0, syslog-ng 1.6.8, klogd 1.4.1, Xorg X11 6.8.2
> still exists with 2.6.18-rc7:
>
> While X is running, output from printk() appears in syslog (eg.
> /var/log/messages) only after a key is pressed on the system keyboard,
> even though it is visible with dmesg immediately.
>
> Additional observations:
> - The problem is *not* present with 2.6.17.* or earlier kernels.
> - The problem *is* present with 2.6.18-rc*-mm* kernels.
> - The problem disappears if the X server is terminated (telinit 3) and
> reappears if the X server is started again (telinit 5).
> - Syslog messages from userspace programs are not affected by the delay.
> - No messages are lost, all appear eventually, though possibly hours
> or days later, depending on how long nobody touches the keyboard.
> - It doesn't matter which key is pressed; even pressing a shift key all
> by its own is sufficient to make the missing messages appear.
> - I couldn't find any other action that would release the messages;
> neither mouse movements or clicks, nor waiting up to 24 hours, not
> even logging in via ssh from another machine and compiling a Linux
> kernel. ;-)
> - The effect can be clearly observed by the difference between the
> kernel's own timestamps and those by syslogd; an extreme example:
>
> Sep 16 14:11:16 gx110 kernel: [18729.057746] gigaset: unblocking all
> channels
> Sep 16 14:11:16 gx110 kernel: [18729.057765] gigaset: searching
> scheduled commands
> Sep 16 14:11:16 gx110 kernel: [86033.298803] gigaset: received response
> (8 bytes): ^M^JZLOG^M^J
> Sep 16 14:11:16 gx110 kernel: [86033.298898] bas_gigaset: cmd_loop: End
> of Command (0 Bytes)
>
> Please let me know if I can help in any way with locating the cause of
> this annoying phenomenon.
Unfortunately I don't know what would be the cause.
You might try git-bisect to find the offending patch.
http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt
thanks
-john
On 19.09.2006 20:52, john stultz wrote:
> Unfortunately I don't know what would be the cause.
>
> You might try git-bisect to find the offending patch.
> http://www.kernel.org/pub/software/scm/git/docs/howto/isolate-bugs-with-bisect.txt
Um, ok, I'll try. But I've never used git before, so I'll need some time
reading all the docs, installing git and finding my way around it. I'll
report back as soon as I have a result, but I wouldn't expect it to be
in time for the problem to be fixed in 2.6.18 release.
Thanks
Tilman
--
Tilman Schmidt E-Mail: [email protected]
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Unge?ffnet mindestens haltbar bis: (siehe R?ckseite)
On 19.09.2006 20:52, john stultz wrote:
> You might try git-bisect to find the offending patch.
This turned out to be easier than I expected.
The patch which introduces the problem is:
[a0f1ccfd8d37457a6d8a9e01acebeefcdfcc306e] lockdep: do not recurse in printk
Reverting that patch makes the problem disappear in 2.6.18, too.
In fact, it suffices to revert just the last chunk:
@@ -809,8 +815,15 @@ void release_console_sem(void)
console_may_schedule = 0;
up(&console_sem);
spin_unlock_irqrestore(&logbuf_lock, flags);
- if (wake_klogd && !oops_in_progress && waitqueue_active(&log_wait))
- wake_up_interruptible(&log_wait);
+ if (wake_klogd && !oops_in_progress && waitqueue_active(&log_wait)) {
+ /*
+ * If we printk from within the lock dependency code,
+ * from within the scheduler code, then do not lock
+ * up due to self-recursion:
+ */
+ if (!lockdep_internal())
+ wake_up_interruptible(&log_wait);
+ }
}
EXPORT_SYMBOL(release_console_sem);
Hope that helps.
--
Tilman Schmidt E-Mail: [email protected]
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeoeffnet mindestens haltbar bis: (siehe Rueckseite)
On Fri, 2006-09-22 at 01:28 +0200, Tilman Schmidt wrote:
> On 19.09.2006 20:52, john stultz wrote:
> > You might try git-bisect to find the offending patch.
>
> This turned out to be easier than I expected.
> The patch which introduces the problem is:
>
> [a0f1ccfd8d37457a6d8a9e01acebeefcdfcc306e] lockdep: do not recurse in printk
When you've narrow down a patch, be sure to CC the author (in this case
Ingo).
> Reverting that patch makes the problem disappear in 2.6.18, too.
> In fact, it suffices to revert just the last chunk:
>
> @@ -809,8 +815,15 @@ void release_console_sem(void)
> console_may_schedule = 0;
> up(&console_sem);
> spin_unlock_irqrestore(&logbuf_lock, flags);
> - if (wake_klogd && !oops_in_progress && waitqueue_active(&log_wait))
> - wake_up_interruptible(&log_wait);
> + if (wake_klogd && !oops_in_progress && waitqueue_active(&log_wait)) {
> + /*
> + * If we printk from within the lock dependency code,
> + * from within the scheduler code, then do not lock
> + * up due to self-recursion:
> + */
> + if (!lockdep_internal())
> + wake_up_interruptible(&log_wait);
> + }
> }
> EXPORT_SYMBOL(release_console_sem);
Ingo, your thoughts?
thanks
-john