2007-11-24 17:26:40

by Stefano Brivio

[permalink] [raw]
Subject: [BUG] jiffies counter leaps in 2.6.24-rc3

It looks like the jiffies counter sometimes jumps back and forth of some
hundreds seconds in 2.6.24-rc3. I observed that this happens when I use the
su(1) command, e.g.:

Nov 24 06:17:17 morte [190769.065301] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 232 (6730/29)
Nov 24 06:17:22 morte su[16826]: Successful su for root by st3
Nov 24 06:17:22 morte su[16826]: + pts/1 st3:root
Nov 24 06:17:22 morte su(pam_unix)[16826]: session opened for user root by (uid=1000)
Nov 24 06:17:38 morte [715682.606983] b43-phy2 ERROR: PHY transmission error
Nov 24 06:18:17 morte [715707.765415] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 194 (970/5)

or

Nov 23 20:55:40 morte [627074.320296] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 10 (550/55)
Nov 23 20:56:01 morte su[5452]: Successful su for root by st3
Nov 23 20:56:01 morte su[5452]: + pts/4 st3:root
Nov 23 20:56:01 morte su(pam_unix)[5452]: session opened for user root by (uid=1000)
Nov 23 20:56:03 morte su(pam_unix)[5452]: session closed for user root
Nov 23 20:56:40 morte [167187.102931] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 10 (40/4)

or

Nov 23 06:31:00 morte [156536.124549] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 240 (6480/27)
Nov 23 06:31:58 morte su[4350]: Successful su for root by st3
Nov 23 06:31:58 morte su[4350]: + pts/0 st3:root
Nov 23 06:31:58 morte su(pam_unix)[4350]: session opened for user root by (uid=1000)
Nov 23 06:32:09 morte [587438.574530] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 240 (4080/17)

(I checked with a clock the timestamp prepended by syslog-ng, and it's
correct.)

I'm thinking this could be somehow related to the setpriority() call made
by su(1), but I don't know how to debug this further. Any clue?

morte st3 # cat /proc/interrupts
CPU0
0: 319167512 XT-PIC-XT timer
1: 459332 XT-PIC-XT i8042
2: 0 XT-PIC-XT cascade
3: 1 XT-PIC-XT
4: 1 XT-PIC-XT
5: 1 XT-PIC-XT
7: 9 XT-PIC-XT
8: 2 XT-PIC-XT rtc
9: 9114 XT-PIC-XT acpi
10: 46272099 XT-PIC-XT uhci_hcd:usb1, i915@pci:0000:00:02.0
11: 23691749 XT-PIC-XT yenta, Intel 82801DB-ICH4, uhci_hcd:usb2, uhci_hcd:usb3, ehci_hcd:usb4, Intel 82801DB-ICH4 Modem, ohci1394, b43
12: 3287633 XT-PIC-XT i8042
14: 1076167 XT-PIC-XT ide0
15: 47 XT-PIC-XT ide1
NMI: 0 Non-maskable interrupts
LOC: 0 Local timer interrupts
TRM: 0 Thermal event interrupts
SPU: 0 Spurious interrupts
ERR: 8
MIS: 0

morte st3 # cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 1.50GHz
stepping : 6
cpu MHz : 1500.000
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe bts est tm2
bogomips : 2993.47
clflush size : 64


--
Ciao
Stefano


2007-11-24 17:57:15

by Frans Pop

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

Stefano Brivio wrote:
> It looks like the jiffies counter sometimes jumps back and forth of some
> hundreds seconds in 2.6.24-rc3. I observed that this happens when I use
> the su(1) command, e.g.:

Can you please explain what exactly the problem is here?

Are you perhaps referring to the number between square brackets for the su
log lines? In that case there is no problem as in that case the number is
not jiffies, but the process ID (PID) of the su process...

Cheers,
FJP

2007-11-24 18:09:56

by Stefano Brivio

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Sat, 24 Nov 2007 18:56:57 +0100
Frans Pop <[email protected]> wrote:

> Stefano Brivio wrote:
> > It looks like the jiffies counter sometimes jumps back and forth of some
> > hundreds seconds in 2.6.24-rc3. I observed that this happens when I use
> > the su(1) command, e.g.:
>
> Can you please explain what exactly the problem is here?
>
> Are you perhaps referring to the number between square brackets for the su
> log lines? In that case there is no problem as in that case the number is
> not jiffies, but the process ID (PID) of the su process...

Sorry guy but I'm not _that_ idiot. Please notice jiffies values before and
after that.


--
Ciao
Stefano

2007-11-24 18:31:03

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Saturday, 24 of November 2007, Stefano Brivio wrote:
> It looks like the jiffies counter sometimes jumps back and forth of some
> hundreds seconds in 2.6.24-rc3. I observed that this happens when I use the
> su(1) command, e.g.:
>
> Nov 24 06:17:17 morte [190769.065301] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 232 (6730/29)
> Nov 24 06:17:22 morte su[16826]: Successful su for root by st3
> Nov 24 06:17:22 morte su[16826]: + pts/1 st3:root
> Nov 24 06:17:22 morte su(pam_unix)[16826]: session opened for user root by (uid=1000)
> Nov 24 06:17:38 morte [715682.606983] b43-phy2 ERROR: PHY transmission error
> Nov 24 06:18:17 morte [715707.765415] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 194 (970/5)
>
> or
>
> Nov 23 20:55:40 morte [627074.320296] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 10 (550/55)
> Nov 23 20:56:01 morte su[5452]: Successful su for root by st3
> Nov 23 20:56:01 morte su[5452]: + pts/4 st3:root
> Nov 23 20:56:01 morte su(pam_unix)[5452]: session opened for user root by (uid=1000)
> Nov 23 20:56:03 morte su(pam_unix)[5452]: session closed for user root
> Nov 23 20:56:40 morte [167187.102931] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 10 (40/4)
>
> or
>
> Nov 23 06:31:00 morte [156536.124549] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 240 (6480/27)
> Nov 23 06:31:58 morte su[4350]: Successful su for root by st3
> Nov 23 06:31:58 morte su[4350]: + pts/0 st3:root
> Nov 23 06:31:58 morte su(pam_unix)[4350]: session opened for user root by (uid=1000)
> Nov 23 06:32:09 morte [587438.574530] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 240 (4080/17)
>
> (I checked with a clock the timestamp prepended by syslog-ng, and it's
> correct.)
>
> I'm thinking this could be somehow related to the setpriority() call made
> by su(1), but I don't know how to debug this further. Any clue?

NO_HZ? Highres timers?

I understand that the previous kernels behave correctly. All of them?

Rafael

2007-11-24 18:47:26

by Stefano Brivio

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Sat, 24 Nov 2007 19:48:58 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> NO_HZ? Highres timers?

CONFIG_HZ_1000=y
# CONFIG_HIGH_RES_TIMERS is not set

> I understand that the previous kernels behave correctly. All of them?

2.6.21 behaved correctly. Sorry but git-bisect would take a lot of time (I
can't reliably reproduce the jiffies jump), so I would avoid that if not
strictly needed.


--
Ciao
Stefano

2007-11-24 19:13:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Saturday, 24 of November 2007, Stefano Brivio wrote:
> On Sat, 24 Nov 2007 19:48:58 +0100
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > NO_HZ? Highres timers?
>
> CONFIG_HZ_1000=y
> # CONFIG_HIGH_RES_TIMERS is not set
>
> > I understand that the previous kernels behave correctly. All of them?
>
> 2.6.21 behaved correctly. Sorry but git-bisect would take a lot of time (I
> can't reliably reproduce the jiffies jump), so I would avoid that if not
> strictly needed.

Well, it would be good to know if 2.6.23 behaves correctly, at least.

Thanks,
Rafael

2007-11-26 18:01:55

by Stefano Brivio

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Sat, 24 Nov 2007 20:31:25 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Saturday, 24 of November 2007, Stefano Brivio wrote:
> > On Sat, 24 Nov 2007 19:48:58 +0100
> > "Rafael J. Wysocki" <[email protected]> wrote:
> >
> > > NO_HZ? Highres timers?
> >
> > CONFIG_HZ_1000=y
> > # CONFIG_HIGH_RES_TIMERS is not set
> >
> > > I understand that the previous kernels behave correctly. All of them?
> >
> > 2.6.21 behaved correctly. Sorry but git-bisect would take a lot of time
> > (I can't reliably reproduce the jiffies jump), so I would avoid that if
> > not strictly needed.
>
> Well, it would be good to know if 2.6.23 behaves correctly, at least.

Will try that ASAP, sorry for the delay.


--
Ciao
Stefano

2007-11-29 07:23:28

by Stefano Brivio

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Sat, 24 Nov 2007 20:31:25 +0100
"Rafael J. Wysocki" <[email protected]> wrote:

> On Saturday, 24 of November 2007, Stefano Brivio wrote:
> > On Sat, 24 Nov 2007 19:48:58 +0100
> > "Rafael J. Wysocki" <[email protected]> wrote:
> >
> > > NO_HZ? Highres timers?
> >
> > CONFIG_HZ_1000=y
> > # CONFIG_HIGH_RES_TIMERS is not set
> >
> > > I understand that the previous kernels behave correctly. All of them?
> >
> > 2.6.21 behaved correctly. Sorry but git-bisect would take a lot of time
> > (I can't reliably reproduce the jiffies jump), so I would avoid that if
> > not strictly needed.
>
> Well, it would be good to know if 2.6.23 behaves correctly, at least.

Weird, it looks like I can't boot with 2.6.23.9 because of some issues with
dm-crypt (my root filesystem is encrypted). I double-checked the
configuration (which I just took from my current one), well, no way. Any
other test I can do?

In the meanwhile, I noted another thing: sometimes it happens that I become
root and the jiffies counter jumps ahead. Then, when I close any root
session, the jiffies counter jumps back to the correct value.

Please remember that this isn't just an aesthetic issue, as some drivers
(e.g. b43 and b43legacy, but I guess a lot more) rely on jiffies. Do I need
to file a bug to bugzilla.kernel.org? Thank you.


--
Ciao
Stefano

2007-11-29 08:53:19

by Stefano Brivio

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

Ok, I found out how to reliably reproduce this bug. The root session issue
was a bit weird, but I noticed I usually switch to root only when I need to
change the CPU frequency. And here's what happens:

# cpuspeedy min
[after some time...]
Nov 29 09:42:31 morte [ 4408.722597] b43-phy0 ERROR: PHY transmission error

# cpuspeedy max
Nov 29 09:42:41 morte [ 1177.238599] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 540 (2160/4)

It looks like jiffies get scaled exactly by a 3.75 factor, which is
<max_frequency>/<min_frequency> here. I use acpi-cpufreq and this is my
(only) CPU:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 1.50GHz
stepping : 6
cpu MHz : 1500.000
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe bts est tm2
bogomips : 2993.49
clflush size : 64

Any hint?


--
Ciao
Stefano

2007-11-29 10:33:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3


* Stefano Brivio <[email protected]> wrote:

> Please remember that this isn't just an aesthetic issue, as some
> drivers (e.g. b43 and b43legacy, but I guess a lot more) rely on
> jiffies. Do I need to file a bug to bugzilla.kernel.org? Thank you.

yes, please do file a bugzilla report too - there seems to be no
progress or quick ideas at the moment.

Ingo

2007-11-29 10:35:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3


* Stefano Brivio <[email protected]> wrote:

> Ok, I found out how to reliably reproduce this bug. The root session issue
> was a bit weird, but I noticed I usually switch to root only when I need to
> change the CPU frequency. And here's what happens:
>
> # cpuspeedy min
> [after some time...]
> Nov 29 09:42:31 morte [ 4408.722597] b43-phy0 ERROR: PHY transmission error
>
> # cpuspeedy max
> Nov 29 09:42:41 morte [ 1177.238599] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 540 (2160/4)
>
> It looks like jiffies get scaled exactly by a 3.75 factor, which is
> <max_frequency>/<min_frequency> here. I use acpi-cpufreq and this is my
> (only) CPU:
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 13
> model name : Intel(R) Pentium(R) M processor 1.50GHz
> stepping : 6
> cpu MHz : 1500.000
> cache size : 2048 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe bts est tm2
> bogomips : 2993.49
> clflush size : 64
>
> Any hint?

ah, the cpufreq/acpi/acpi-cpufreq angle is indeed a big clue. I've
Cc:-ed various interested parties.

Ingo

2007-11-29 17:47:21

by Dave Jones

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Thu, Nov 29, 2007 at 11:35:11AM +0100, Ingo Molnar wrote:

> * Stefano Brivio <[email protected]> wrote:
>
> > Ok, I found out how to reliably reproduce this bug. The root session issue
> > was a bit weird, but I noticed I usually switch to root only when I need to
> > change the CPU frequency. And here's what happens:
> >
> > # cpuspeedy min
> > [after some time...]
> > Nov 29 09:42:31 morte [ 4408.722597] b43-phy0 ERROR: PHY transmission error
> >
> > # cpuspeedy max
> > Nov 29 09:42:41 morte [ 1177.238599] wmaster0: STA 00:14:c1:35:8d:eb Average rate: 540 (2160/4)
> >
> > It looks like jiffies get scaled exactly by a 3.75 factor, which is
> > <max_frequency>/<min_frequency> here. I use acpi-cpufreq and this is my
>
> ah, the cpufreq/acpi/acpi-cpufreq angle is indeed a big clue. I've
> Cc:-ed various interested parties.

Hmm, what's in /sys/devices/system/clocksource/clocksource0/current_clocksource ?

adjust_jiffies() always gets called on a speed transition. I'm wondering
if perhaps we shouldn't do that if we're using something other than
the tsc for timekeeping.

Dave

--
http://www.codemonkey.org.uk

2007-11-29 18:05:56

by Stefano Brivio

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Thu, 29 Nov 2007 12:45:48 -0500
Dave Jones <[email protected]> wrote:

> Hmm, what's
> in /sys/devices/system/clocksource/clocksource0/current_clocksource ?

morte st3 # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm

> adjust_jiffies() always gets called on a speed transition. I'm wondering
> if perhaps we shouldn't do that if we're using something other than
> the tsc for timekeeping.

I gave other clocksources a try:

morte st3 # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm pit jiffies tsc

Setting tsc as current clocksource:
morte st3 # echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
[13367.729060] Time: tsc clocksource has been installed.
didn't fix anything. Other weirdness here: it looks like it affects my
typematic delay and repeat rate.

Setting jiffies or pit as current clocksource didn't fix anything, but at
least it doesn't cause anything bad.

Do you want me to comment out adjust_jiffies() and see what happens?


--
Ciao
Stefano

2007-11-29 18:25:23

by Dave Jones

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Thu, Nov 29, 2007 at 06:57:43PM +0100, Stefano Brivio wrote:

> > adjust_jiffies() always gets called on a speed transition. I'm wondering
> > if perhaps we shouldn't do that if we're using something other than
> > the tsc for timekeeping.
>
> I gave other clocksources a try:
>
> morte st3 # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
> acpi_pm pit jiffies tsc
>
> Setting tsc as current clocksource:
> morte st3 # echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
> [13367.729060] Time: tsc clocksource has been installed.
> didn't fix anything. Other weirdness here: it looks like it affects my
> typematic delay and repeat rate.

*grumble*

> Setting jiffies or pit as current clocksource didn't fix anything, but at
> least it doesn't cause anything bad.
>
> Do you want me to comment out adjust_jiffies() and see what happens?

Just for laughs, why not.

Dave

--
http://www.codemonkey.org.uk

2007-11-29 19:33:51

by Stefano Brivio

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Thu, 29 Nov 2007 13:24:02 -0500
Dave Jones <[email protected]> wrote:

> On Thu, Nov 29, 2007 at 06:57:43PM +0100, Stefano Brivio wrote:

> > Do you want me to comment out adjust_jiffies() and see what happens?
>
> Just for laughs, why not.

I wanted to laugh, but the behaviour didn't change at all. How boring.


--
Ciao
Stefano

2007-11-29 19:37:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3


* Stefano Brivio <[email protected]> wrote:

> On Thu, 29 Nov 2007 13:24:02 -0500
> Dave Jones <[email protected]> wrote:
>
> > On Thu, Nov 29, 2007 at 06:57:43PM +0100, Stefano Brivio wrote:
>
> > > Do you want me to comment out adjust_jiffies() and see what happens?
> >
> > Just for laughs, why not.
>
> I wanted to laugh, but the behaviour didn't change at all. How boring.

just to make sure there is no tsc impact, does booting with "notsc"
change anything?

if not, does booting with idle=halt or idle=poll help? [your power usage
will go up so it's not very useful - but it could help exclude a few
sources of trouble.]

Ingo

2007-11-29 20:06:03

by Dave Jones

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Thu, Nov 29, 2007 at 08:25:46PM +0100, Stefano Brivio wrote:
> On Thu, 29 Nov 2007 13:24:02 -0500
> Dave Jones <[email protected]> wrote:
>
> > On Thu, Nov 29, 2007 at 06:57:43PM +0100, Stefano Brivio wrote:
>
> > > Do you want me to comment out adjust_jiffies() and see what happens?
> >
> > Just for laughs, why not.
>
> I wanted to laugh, but the behaviour didn't change at all. How boring.

Unless I'm missing something obvious, this would exonorate cpufreq
as a cause, which would seem odd based upon your earlier diagnosis.

deeply confused.

Dave

--
http://www.codemonkey.org.uk

2007-11-29 20:21:17

by Stefano Brivio

[permalink] [raw]
Subject: Re: [BUG] jiffies counter leaps in 2.6.24-rc3

On Thu, 29 Nov 2007 20:36:24 +0100
Ingo Molnar <[email protected]> wrote:

> just to make sure there is no tsc impact, does booting with "notsc"
> change anything?

No, it doesn't.

> if not, does booting with idle=halt or idle=poll help? [your power usage
> will go up so it's not very useful - but it could help exclude a few
> sources of trouble.]

With idle=poll nothing changes, except that - as soon as I switch the CPU
frequency - I get this in the logs:

[ 297.562722] Marking TSC unstable due to: cpufreq changes.
[ 297.566677] Time: acpi_pm clocksource has been installed.

With idle=halt, changing frequency just freezes the system. I can't even use
the magic Sys Rq keys then.

I observed another thing. When I don't use any of the parameters you
suggested for debugging, I get this at boot:

[ 6.530752] Marking TSC unstable due to: possible TSC halt in C2.
[ 6.530761] Time: acpi_pm clocksource has been installed.
[ 6.620700] Clocksource tsc unstable (delta = -456217276 ns)


--
Ciao
Stefano