2008-03-05 07:24:27

by Dave Young

Subject: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

Hi,
I don't know whether this has already been fixed, but with 2.6.25-rc4,
after suspend/resume, my syslog is full of rcupreempt warnings.

root@darkstar:/var/log# grep WARNING syslog|wc
85499 940859 11223920

The warnings are as follows:

[ 4134.833641] ------------[ cut here ]------------
[ 4134.833647] WARNING: at include/linux/rcupreempt.h:91 tick_nohz_stop_sched_tick+0x39d/0x3b0()
[ 4134.833650] Modules linked in: binfmt_misc rfcomm l2cap snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss thermal snd_hda_intel processor snd_pcm button sg hci_usb rtc_cmos bluetooth evdev rtc_core snd_timer snd pcspkr 3c59x i2c_i801 rtc_lib dcdbas serio_raw soundcore intel_agp agpgart snd_page_alloc
[ 4134.833690] Pid: 0, comm: swapper Not tainted 2.6.25-rc4 #1
[ 4134.833693] [<c0128700>] ? zap_locks+0x30/0x70
[ 4134.833700] [<c0127ea4>] warn_on_slowpath+0x54/0x80
[ 4134.833708] [<c0140030>] ? hrtimer_init_sleeper+0x0/0x20
[ 4134.833715] [<c014c4ea>] ? __lock_acquired+0x10a/0x140
[ 4134.833722] [<c013fa05>] ? hrtimer_get_next_event+0x75/0xe0
[ 4134.833728] [<c014bf36>] ? __lock_release+0x26/0x70
[ 4134.833734] [<c013fa05>] ? hrtimer_get_next_event+0x75/0xe0
[ 4134.833741] [<c03d5989>] ? _spin_unlock_irqrestore+0x39/0x70
[ 4134.833749] [<c013fa05>] ? hrtimer_get_next_event+0x75/0xe0
[ 4134.833757] [<c013188b>] ? cmp_next_hrtimer_event+0x1b/0xa0
[ 4134.833765] [<c01470dd>] tick_nohz_stop_sched_tick+0x39d/0x3b0
[ 4134.833772] [<c03d3059>] ? __sched_text_start+0x229/0x4c0
[ 4134.833781] [<c0103030>] ? default_idle+0x0/0xa0
[ 4134.833787] [<c0103117>] cpu_idle+0x37/0x140
[ 4134.833793] [<c03d0e7f>] start_secondary+0x9f/0xc0
[ 4134.833802] =======================
[ 4134.833805] ---[ end trace f3669ad1d62556af ]---
[ 4134.833811] ------------[ cut here ]------------
[ 4134.833814] WARNING: at include/linux/rcupreempt.h:99 tick_nohz_restart_sched_tick+0x17a/0x180()
[ 4134.833817] Modules linked in: binfmt_misc rfcomm l2cap snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss thermal snd_hda_intel processor snd_pcm button sg hci_usb rtc_cmos bluetooth evdev rtc_core snd_timer snd pcspkr 3c59x i2c_i801 rtc_lib dcdbas serio_raw soundcore intel_agp agpgart snd_page_alloc
[ 4134.833855] Pid: 0, comm: swapper Not tainted 2.6.25-rc4 #1
[ 4134.833857] [<c0128700>] ? zap_locks+0x30/0x70
[ 4134.833864] [<c0127ea4>] warn_on_slowpath+0x54/0x80
[ 4134.833873] [<c013f81d>] ? hrtimer_start+0xdd/0x150
[ 4134.833879] [<c014bf36>] ? __lock_release+0x26/0x70
[ 4134.833885] [<c013f81d>] ? hrtimer_start+0xdd/0x150
[ 4134.833892] [<c010ac18>] ? read_tsc+0x8/0x10
[ 4134.833898] [<c014212e>] ? getnstimeofday+0x3e/0x120
[ 4134.833906] [<c013ee78>] ? ktime_get_ts+0x58/0x60
[ 4134.833914] [<c013edcf>] ? ktime_get+0xf/0x30
[ 4134.833921] [<c0103030>] ? default_idle+0x0/0xa0
[ 4134.833927] [<c014729a>] tick_nohz_restart_sched_tick+0x17a/0x180
[ 4134.833934] [<c0103030>] ? default_idle+0x0/0xa0
[ 4134.833940] [<c01031c0>] cpu_idle+0xe0/0x140
[ 4134.833946] [<c03d0e7f>] start_secondary+0x9f/0xc0
[ 4134.833955] =======================


2008-03-05 07:35:34

by Jike Song

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

I have the very same problem (Dell Optiplex 745; 2.6.25-rc2 does not
show it).

Also, "echo mem > /sys/power/state" works without problems, while
"echo standby > /sys/power/state" triggers the WARNING.




2008-03-05 16:56:10

by Paul E. McKenney

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Wed, Mar 05, 2008 at 03:35:20PM +0800, Jike Song wrote:
> I have the very same problem (Dell Optiplex 745; 2.6.25-rc2 does not
> show it).
>
> Also, "echo mem > /sys/power/state" works without problems, while
> "echo standby > /sys/power/state" triggers the WARNING.

Could you please try applying Karsten Wiese's patch to see if it
fixes the WARNINGs? See http://lkml.org/lkml/2008/2/26/386 for
the patch.

Thanx, Paul


2008-03-06 02:11:57

by Jike Song

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Thu, Mar 6, 2008 at 12:55 AM, Paul E. McKenney
<[email protected]> wrote:
> Could you please try applying Karsten Wiese's patch to see if it
> fixes the WARNINGs? See http://lkml.org/lkml/2008/2/26/386 for
> the patch.
>
> Thanx, Paul

Yes, the WARNING disappears with Karsten's patch applied.
Thanks.

2008-03-06 02:34:32

by Paul E. McKenney

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Thu, Mar 06, 2008 at 10:11:26AM +0800, Jike Song wrote:
> On Thu, Mar 6, 2008 at 12:55 AM, Paul E. McKenney
> <[email protected]> wrote:
> > Could you please try applying Karsten Wiese's patch to see if it
> > fixes the WARNINGs? See http://lkml.org/lkml/2008/2/26/386 for
> > the patch.
> >
> > Thanx, Paul
>
> Yes, the WARNING disappears with Karsten's patch applied.
> Thanks.

Good to hear!!! I believe that Karsten's patch is on its way in,
so hopefully that will clear things up.

Thanx, Paul

2008-03-06 03:21:07

by Dave Young

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Thu, Mar 6, 2008 at 10:28 AM, Paul E. McKenney
<[email protected]> wrote:
>
> On Thu, Mar 06, 2008 at 10:11:26AM +0800, Jike Song wrote:
> > On Thu, Mar 6, 2008 at 12:55 AM, Paul E. McKenney
> > <[email protected]> wrote:
> > > Could you please try applying Karsten Wiese's patch to see if it
> > > fixes the WARNINGs? See http://lkml.org/lkml/2008/2/26/386 for
> > > the patch.
> > >
> > > Thanx, Paul
> >
> > Yes, the WARNING disappears with Karsten's patch applied.
> > Thanks.

Confirmed


2008-03-06 11:09:18

by Dave Young

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Thu, Mar 6, 2008 at 11:20 AM, Dave Young <[email protected]> wrote:
> On Thu, Mar 6, 2008 at 10:28 AM, Paul E. McKenney
>
> <[email protected]> wrote:
> >
> > Good to hear!!! I believe that Karsten's patch is on its way in,
> > so hopefully that will clear things up.

Hi, Paul

My syslog grew into a 2 GB file yesterday because of the warnings.
How about changing the WARN_ON to WARN_ON_ONCE?


2008-03-06 16:28:54

by Paul E. McKenney

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Thu, Mar 06, 2008 at 07:08:55PM +0800, Dave Young wrote:
> Hi, Paul
>
> My syslog grew into a 2 GB file yesterday because of the warnings.
> How about changing the WARN_ON to WARN_ON_ONCE?

Hello, Dave,

I might be convinced to make this change for 2.6.26, but the condition
that the WARN_ON() is complaining about is quite serious, so I don't
want to take a chance on it getting lost in the noise in the 2.6.25
series.

Seem reasonable?

Better yet, is there some sort of time-limited WARN_ON that kicks out
a message at most once per second or some such? Enough to definitely
be noticed, but not enough to bring the machine to its knees?

Thanx, Paul
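
For readers unfamiliar with the two macros, the trade-off discussed above
can be sketched in userspace C. This is only an illustration under stated
assumptions, not the kernel's implementation: MY_WARN_ON(),
MY_WARN_ON_ONCE(), emit_warning() and run_loop() are hypothetical names,
and the real macros also dump a stack trace like the ones in the report.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts emitted messages so the behaviour can be observed (illustrative). */
static int warn_count;

static void emit_warning(const char *file, int line)
{
    warn_count++;
    fprintf(stderr, "WARNING: at %s:%d\n", file, line);
}

/* Fires every time the condition is true. */
#define MY_WARN_ON(cond)                               \
    do {                                               \
        if (cond)                                      \
            emit_warning(__FILE__, __LINE__);          \
    } while (0)

/* Fires only the first time the condition is true at this call site. */
#define MY_WARN_ON_ONCE(cond)                          \
    do {                                               \
        static bool warned_;                           \
        if ((cond) && !warned_) {                      \
            warned_ = true;                            \
            emit_warning(__FILE__, __LINE__);          \
        }                                              \
    } while (0)

/* Hypothetical stand-in for a hot path such as the idle loop. */
static int run_loop(int iterations, bool once)
{
    warn_count = 0;
    for (int i = 0; i < iterations; i++) {
        if (once)
            MY_WARN_ON_ONCE(true);
        else
            MY_WARN_ON(true);
    }
    return warn_count;
}
```

In a path that runs on every idle entry and exit, the first form logs once
per pass, which is how a syslog can grow to gigabytes; the second logs a
single time and then stays silent, which is the "lost in the noise" risk
raised above.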

2008-03-07 03:07:59

by Dave Young

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Fri, Mar 7, 2008 at 12:27 AM, Paul E. McKenney
<[email protected]> wrote:
>
> On Thu, Mar 06, 2008 at 07:08:55PM +0800, Dave Young wrote:
> > Hi, Paul
> >
> > My syslog grew into a 2 GB file yesterday because of the warnings.
> > How about changing the WARN_ON to WARN_ON_ONCE?
>
> Hello, Dave,
>
> I might be convinced to make this change for 2.6.26, but the condition
> that the WARN_ON() is complaining about is quite serious, so I don't
> want to take a chance on it getting lost in the noise in the 2.6.25
> series.
>
> Seem reasonable?

IMHO, WARN_ON_ONCE is enough for my eyes :)

>
> Better yet, is there some sort of time-limited WARN_ON that kicks out
> a message at most once per second or some such? Enough to definitely
> be noticed, but not enough to bring the machine to its knees?

There seem to be no such functions or macros, but is it really needed?


2008-03-07 04:19:57

by Paul E. McKenney

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Fri, Mar 07, 2008 at 11:07:48AM +0800, Dave Young wrote:
> On Fri, Mar 7, 2008 at 12:27 AM, Paul E. McKenney
> <[email protected]> wrote:
> > On Thu, Mar 06, 2008 at 07:08:55PM +0800, Dave Young wrote:
> > > My syslog grew into a 2 GB file yesterday because of the warnings.
> > > How about changing the WARN_ON to WARN_ON_ONCE?
> >
> > Hello, Dave,
> >
> > I might be convinced to make this change for 2.6.26, but the condition
> > that the WARN_ON() is complaining about is quite serious, so I don't
> > want to take a chance on it getting lost in the noise in the 2.6.25
> > series.
> >
> > Seem reasonable?
>
> IMHO, WARN_ON_ONCE is enough for my eyes :)

I could believe that, but my experience has been that many others
need the condition to be obvious...

> > Better yet, is there some sort of time-limited WARN_ON that kicks out
> > a message at most once per second or some such? Enough to definitely
> > be noticed, but not enough to bring the machine to its knees?
>
> There seem to be no such functions or macros, but is it really needed?

If everyone reports errors when they see isolated WARN_ON()s in their
logfiles, then no. But...

Thanx, Paul

2008-03-07 04:35:42

by Dave Young

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Fri, Mar 7, 2008 at 12:19 PM, Paul E. McKenney
<[email protected]> wrote:
> If everyone reports errors when they see isolated WARN_ON()s in their
> logfiles, then no. But...

Ok, I agree with you.

Maybe something like WARN_ON_HZ(condition) or
WARN_ON_PERIOD(condition, period_value)?


2008-03-07 06:58:16

by Paul E. McKenney

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Fri, Mar 07, 2008 at 12:35:26PM +0800, Dave Young wrote:
> Ok, I agree with you.
>
> Maybe something like WARN_ON_HZ(condition) or
> WARN_ON_PERIOD(condition, period_value)?

Makes sense to me! The other benefit of this sort of thing is that
it lets you know whether the problem was a one-off or whether it
continued happening -- but without too much log bloat.

I was thinking in terms of once every ten seconds, but am not all
that hung up on the exact value of the period.

Thoughts?

Thanx, Paul

2008-03-07 07:31:21

by Dave Young

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Fri, Mar 7, 2008 at 2:57 PM, Paul E. McKenney
<[email protected]> wrote:
> Makes sense to me! The other benefit of this sort of thing is that
> it lets you know whether the problem was a one-off or whether it
> continued happening -- but without too much log bloat.
>
> I was thinking in terms of once every ten seconds, but am not all
> that hung up on the exact value of the period.
>
> Thoughts?

Then how about WARN_ON_SECS(condition, seconds)?


2008-03-07 07:44:12

by Dave Young

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Fri, Mar 7, 2008 at 3:31 PM, Dave Young <[email protected]> wrote:
> Then how about WARN_ON_SECS(condition, seconds)?

Sorry, seconds must be a fixed number here, so your 10 seconds may be
suitable for it.


2008-03-07 07:55:52

by Dave Young

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Fri, Mar 7, 2008 at 3:43 PM, Dave Young <[email protected]> wrote:
> Sorry, seconds must be a fixed number here, so your 10 seconds may be
> suitable for it.

Or could the number of seconds be a config option or command-line parameter?


2008-03-07 14:09:18

by Paul E. McKenney

Subject: Re: 2.6.25-rc4 rcupreempt.h WARNINGs while suspend/resume

On Fri, Mar 07, 2008 at 03:55:41PM +0800, Dave Young wrote:
> Or could the number of seconds be a config option or command-line parameter?

Any of these options could work from my viewpoint:

o  WARN_ON_SECS() would allow someone to tune a particular
   warning to log more or less often.

o  A config option or command-line parameter would mean less
   typing in the source code, and would allow global control.

Thanx, Paul
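
The two options listed above could look roughly like the following
userspace sketch. It rests on assumptions throughout: WARN_ON_SECS() is
only the name proposed in this thread, while struct warn_state,
warn_allowed(), warn_period_secs and WARN_ON_GLOBAL() are hypothetical
names made up for illustration. A kernel version would presumably use
jiffies rather than time(), and would have to consider concurrent callers,
which this sketch ignores.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Per-call-site bookkeeping for a rate-limited warning (hypothetical). */
struct warn_state {
    bool   fired;  /* true once the warning has been printed at least once */
    time_t last;   /* time at which it was last printed */
};

/*
 * Decide whether a warning may be printed at time `now`, allowing at most
 * one message every `secs` seconds. Updates the state when it says yes.
 */
static bool warn_allowed(struct warn_state *st, time_t now, int secs)
{
    if (st->fired && now - st->last < secs)
        return false;
    st->fired = true;
    st->last = now;
    return true;
}

/* Option 1: each call site picks its own period. */
#define WARN_ON_SECS(cond, secs)                                        \
    do {                                                                \
        static struct warn_state warn_st_;                              \
        if ((cond) && warn_allowed(&warn_st_, time(NULL), (secs)))      \
            fprintf(stderr, "WARNING: at %s:%d\n", __FILE__, __LINE__); \
    } while (0)

/* Option 2: one global period, standing in for a config option or
 * command-line parameter. */
int warn_period_secs = 10;
#define WARN_ON_GLOBAL(cond) WARN_ON_SECS(cond, warn_period_secs)
```

With a ten-second period, a condition that stays true keeps producing a
message every ten seconds, so the log shows whether the problem was a
one-off or ongoing without growing without bound.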