Hi
I have already sent a report to netdev, but the most interesting question for me is why the
machine did not reboot (kernel.panic was set via sysctl), and why the
watchdog did not reboot it either.
I set:
kernel.panic = 10
kernel.panic_on_oops = 10
I also run the iTCO_wdt watchdog plus the busybox watchdog daemon, and still the machine did not
come back online after the panic! It only came back after someone on site pressed the reset
button (the site is very far away in the mountains, the roads are blocked by snow now, and
there is not even a keyboard or screen to check what is happening).
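For reference: a dotted sysctl name maps to a file under /proc/sys with the dots replaced by slashes. kernel.panic is the number of seconds to wait before rebooting after a panic, and kernel.panic_on_oops is effectively a boolean (any nonzero value, including 10, turns an oops into a panic). The minimal sketch below shows only the name-to-path mapping; sysctl_path() is a hypothetical helper, it ignores the rarer case of names that themselves contain '/' or '.', and it does not touch the running system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the /proc/sys path for a dotted sysctl name,
 * e.g. "kernel.panic" -> "/proc/sys/kernel/panic".
 * Returns 0 on success, -1 if the buffer is too small. */
static int sysctl_path(const char *name, char *buf, size_t len)
{
    assert(name != NULL && buf != NULL);
    int n = snprintf(buf, len, "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= len)
        return -1;
    /* Turn every '.' after the "/proc/sys/" prefix into a directory separator. */
    for (char *p = buf + strlen("/proc/sys/"); *p; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}
```

Writing "10" to the resulting /proc/sys/kernel/panic file is equivalent to the sysctl setting above.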
After some testing I noticed that iTCO_wdt does not work on this motherboard.
In dmesg:
Feb 1 19:34:17 10.184.184.1 kernel: [ 58.112496] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.02 (26-Jul-2007)
Feb 1 19:34:17 10.184.184.1 kernel: [ 58.113114] iTCO_wdt: Found a ICH9R TCO device (Version=2, TCOBASE=0x0460)
Feb 1 19:34:17 10.184.184.1 kernel: [ 58.113654] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
1) I launch the busybox watchdog:
watchdog -t 5 /dev/watchdog
and I can see it in the process list.
2) Then I run:
killall -9 watchdog
and I can see in dmesg:
Feb 2 00:55:23 10.184.184.1 kernel: [ 6400.419418] iTCO_wdt: Unexpected close, not stopping watchdog!
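The "Unexpected close" message is what killall -9 should produce: with nowayout=0 the driver only stops the timer on close if the daemon wrote the magic character 'V' first; otherwise it leaves the hardware timer running, which should then reset the machine once the 30 s heartbeat expires. The sketch below illustrates the userspace side of that protocol. wd_ping() and wd_magic_close() are hypothetical helpers, and they only assume a writable file descriptor, so they can be exercised with a pipe instead of /dev/watchdog.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: any write to the watchdog device reloads the timer.
 * Returns 0 on success, -1 on error (errno is set by write()). */
static int wd_ping(int fd)
{
    ssize_t n;
    do {
        n = write(fd, "\0", 1);   /* a single NUL byte as the keepalive */
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Hypothetical helper: write the magic 'V' before closing, so a driver
 * loaded with nowayout=0 stops the timer instead of logging
 * "Unexpected close, not stopping watchdog!". */
static int wd_magic_close(int fd)
{
    ssize_t n;
    assert(fd >= 0);
    do {
        n = write(fd, "V", 1);
    } while (n < 0 && errno == EINTR);
    int rc = close(fd);           /* close even if the write failed */
    return (n == 1 && rc == 0) ? 0 : -1;
}
```

In a real daemon the descriptor would come from open("/dev/watchdog", O_WRONLY), and wd_ping() would be called well within the heartbeat. A daemon killed with SIGKILL never reaches wd_magic_close(), which is exactly the case tested above.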
The machine does not reboot. It does not reboot on panic either (despite the sysctl
setting). Motherboard: Intel DP35DP.
Here is the panic message, just for information.
Feb 1 09:08:50 SERVER [12380.067104] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
Feb 1 09:08:50 SERVER [12380.067140] printing eip: c01f10ed *pde = 00000000
Feb 1 09:08:50 SERVER [12380.067162] Oops: 0000 [#1] SMP
Feb 1 09:08:50 SERVER [12380.067181] Modules linked in: netconsole configfs iTCO_wdt nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre sch_esfq xt_tcpudp ipt_TTL ipt_ttl xt_NOTRACK iptable_raw iptable_mangle ifb e1000e em_nbyte cls_tcindex act_gact cls_rsvp sch_htb cls_fw act_mirred em_u32 sch_red sch_sfq sch_tbf sch_teql cls_basic act_police sch_gred act_pedit sch_hfsc cls_rsvp6 sch_ingress em_meta em_text act_ipt sch_dsmark sch_prio sch_netem act_simple cls_u32 em_cmp sch_cbq cls_route xt_TCPMSS iptable_nat nf_conntrack_ipv4 ipt_LOG ipt_MASQUERADE ipt_REDIRECT nf_nat nf_conntrack nfnetlink iptable_filter ip_tables x_tables 8021q tun tulip r8169 sky2 via_velocity via_rhine sis900 ne2k_pci 8390 skge tg3 8139too e1000 e100 usb_storage mtdblock mtd_blkdevs usbhid uhci_hcd ehci_hcd ohci_hcd usbcore
Feb 1 09:08:50 SERVER [12380.067530] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
Feb 1 09:08:50 SERVER [12380.067550] EIP: 0060:[<c01f10ed>] EFLAGS: 00010086 CPU: 0
Feb 1 09:08:50 SERVER [12380.067571] EIP is at rb_erase+0x110/0x22f
Feb 1 09:08:50 SERVER [12380.067589] EAX: f52bbea0 EBX: 00000000 ECX: 00000000 EDX: f52bbea0
Feb 1 09:08:50 SERVER [12380.067608] ESI: f717df50 EDI: c1fed000 EBP: c1fecf80 ESP: c037fda8
Feb 1 09:08:50 SERVER [12380.067628] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb 1 09:08:50 SERVER [12380.067647] Process swapper (pid: 0, ti=c037e000 task=c03533a0 task.ti=c037e000)
Feb 1 09:08:50 SERVER [12380.067668] Stack: 00000001 c1fed000 c1fecf78 00000002 00000001 c0134663 c1fed000 c1fecf78
Feb 1 09:08:50 SERVER [12380.067714]        c1fecf40 c013515b 00000000 4f3f473e 000002d0 ffffffff 7fffffff 4f3f473e
Feb 1 09:08:50 SERVER [12380.067760]        000002d0 00000000 c1fec120 c037ff84 c037fe70 f76ae880 c0113963 c1ff5f78
Feb 1 09:08:50 SERVER [12380.067806] Call Trace:
Feb 1 09:08:50 SERVER [12380.067839] [<c0134663>] __remove_hrtimer+0x5d/0x64
Feb 1 09:08:50 SERVER [12380.067861] [<c013515b>] hrtimer_interrupt+0x10c/0x19a
Feb 1 09:08:50 SERVER [12380.067883] [<c0113963>] smp_apic_timer_interrupt+0x6f/0x80
Feb 1 09:08:50 SERVER [12380.067905] [<c0105838>] apic_timer_interrupt+0x28/0x30
Feb 1 09:08:50 SERVER [12380.067928] [<c02be6d7>] _spin_lock_irqsave+0x13/0x27
Feb 1 09:08:50 SERVER [12380.067949] [<c0134bc7>] lock_hrtimer_base+0x15/0x2f
Feb 1 09:08:50 SERVER [12380.067970] [<c0134ca0>] hrtimer_start+0x16/0xf4
Feb 1 09:08:50 SERVER [12380.067991] [<c027ec43>] qdisc_watchdog_schedule+0x1e/0x21
Feb 1 09:08:50 SERVER [12380.068013] [<f89f8fe6>] htb_dequeue+0x6ef/0x6fb [sch_htb]
Feb 1 09:08:50 SERVER [12380.068036] [<c028ac4d>] ip_rcv+0x1fc/0x237
Feb 1 09:08:50 SERVER [12380.068057] [<c0135297>] hrtimer_get_next_event+0xae/0xbb
Feb 1 09:08:50 SERVER [12380.068078] [<c0135297>] hrtimer_get_next_event+0xae/0xbb
Feb 1 09:08:50 SERVER [12380.068099] [<c0136e26>] getnstimeofday+0x2b/0xb5
Feb 1 09:08:50 SERVER [12380.068118] [<c0138d70>] clockevents_program_event+0xe0/0xee
Feb 1 09:08:50 SERVER [12380.068140] [<c027da0e>] __qdisc_run+0x2a/0x163
Feb 1 09:08:50 SERVER [12380.068161] [<c02722d8>] net_tx_action+0xa8/0xcc
Feb 1 09:08:50 SERVER [12380.068180] [<c027ec65>] qdisc_watchdog+0x0/0x1b
Feb 1 09:08:50 SERVER [12380.068199] [<c027ec7d>] qdisc_watchdog+0x18/0x1b
Feb 1 09:08:50 SERVER [12380.068218] [<c0135007>] run_hrtimer_softirq+0x4e/0x96
Feb 1 09:08:50 SERVER [12380.068241] [<c0126a82>] __do_softirq+0x5d/0xc1
Feb 1 09:08:50 SERVER [12380.068260] [<c0126b18>] do_softirq+0x32/0x36
Feb 1 09:08:50 SERVER [12380.068279] [<c0126d6a>] irq_exit+0x38/0x6b
Feb 1 09:08:50 SERVER [12380.068298] [<c0113968>] smp_apic_timer_interrupt+0x74/0x80
Feb 1 09:08:50 SERVER [12380.068319] [<c0105838>] apic_timer_interrupt+0x28/0x30
Feb 1 09:08:50 SERVER [12380.068343] [<c0103243>] mwait_idle_with_hints+0x3c/0x40
Feb 1 09:08:50 SERVER [12380.068365] [<c0103247>] mwait_idle+0x0/0xa
Feb 1 09:08:50 SERVER [12380.068384] [<c010357e>] cpu_idle+0x98/0xb9
Feb 1 09:08:50 SERVER [12380.068403] [<c03848c2>] start_kernel+0x2d7/0x2df
Feb 1 09:08:50 SERVER [12380.068422] [<c03840e0>] unknown_bootoption+0x0/0x195
Feb 1 09:08:50 SERVER [12380.068444] =======================
Feb 1 09:08:50 SERVER [12380.068460] Code: 01 00 00 8b 4e 08 39 d9 0f 85 85 00 00 00 8b 4e 04 8b 01 a8 01 75 14 83 c8 01 89 ea 89 01 89 f0 83 26 fe e8 1e fd ff ff 8b 4e 04 <8b> 59 08 85 db 74 06 8b 03 a8 01 74 15 8b 41 04 85 c0 0f 84 c6
Feb 1 09:08:50 SERVER [12380.068753] EIP: [<c01f10ed>] rb_erase+0x110/0x22f SS:ESP 0068:c037fda8
Feb 1 09:08:50 SERVER [12380.068978] Kernel panic - not syncing: Fatal exception in interrupt
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
On Friday 01 February 2008 10:12, Denys Fedoryshchenko wrote:
...
> After testing I noticed that iTCO_wdt does not work on this motherboard.
...
> Feb 1 09:08:50 SERVER [12380.067806] Call Trace:
> Feb 1 09:08:50 SERVER [12380.067839] [<c0134663>] __remove_hrtimer+0x5d/0x64
> Feb 1 09:08:50 SERVER [12380.067861] [<c013515b>] hrtimer_interrupt+0x10c/0x19a
...
What do you see if you build with CONFIG_HIGH_RES_TIMERS=n?
Does it work better if you boot with "acpi=off"?
if yes, how about with just pnpacpi=off?
thanks,
-Len
On Fri, 1 Feb 2008 12:11:41 -0500, Len Brown wrote
>
> What do you see if you build with CONFIG_HIGH_RES_TIMERS=n
>
> Does it work better if you boot with "acpi=off"?
> if yes, how about with just pnpacpi=off?
>
> thanks,
> -Len
It is not very easy to test. As for the bug, it is most probably related to the third-party
ESFQ patch; I will drop it and then test more properly once I get the watchdog working.
But the more important thing I noticed is that iTCO_wdt does not work at all, and I do not
think hrtimers change anything there.
As for testing, I cannot take even a small risk right now (or for the next 3-5 days) by
changing kernel options. I have enabled every watchdog available, because there is no one
to maintain the server, and the area is unreachable because of snow and bad weather.
Do you think it is reasonable to try the acpi / pnpacpi options to make iTCO_wdt work?
Maybe the register addresses, or the way the TCO watchdog is activated, changed on
this chipset?
On Friday 01 February 2008 14:15, Denys Fedoryshchenko wrote:
...
> Do you think reasonable to try acpi / pnpacpi with iTCO_wdt to make it work?
> Maybe just registers addresses or way how TCO watchdog activated changed on
> this chipset?
yes, I'm wondering if the changes in IO resource reservations
in the PNPACPI layer are interfering with the native driver.
unfortunately, if you boot with acpi=off or pnpacpi=off, you may
run into other, unrelated, issues (or not).
one way to isolate the problem is to revert these two lines
from their 2.6.24 values to their 2.6.23 values by applying this patch:
---
diff --git a/include/linux/pnp.h b/include/linux/pnp.h
index 2a6d62c..16b46aa 100644
--- a/include/linux/pnp.h
+++ b/include/linux/pnp.h
@@ -13,8 +13,8 @@
#include <linux/errno.h>
#include <linux/mod_devicetable.h>

-#define PNP_MAX_PORT 40
-#define PNP_MAX_MEM 12
+#define PNP_MAX_PORT 8
+#define PNP_MAX_MEM 4
#define PNP_MAX_IRQ 2
#define PNP_MAX_DMA 2
#define PNP_NAME_LEN 50
I checked: the watchdog still does not work with acpi=off, nor with pnpacpi=off.
I will go through the chipset's technical documentation to find any reference
to the watchdog registers; maybe I will see something useful there.
On Fri, 1 Feb 2008 15:39:08 -0500, Len Brown wrote
...
> unfortunately, if you boot with acpi=off or pnpacpi=off, you may
> run into other, unrelated, issues (or not).
I think I found the issue, but I do not understand how to fix it.
I made a small patch to check whether the code is able to change TCO_EN (bit 13) to 0.
It cannot change it, because the TCO_LOCK bit is set.
This is the patch I used to see that:
--- /usr/src/linux-2.6.24/drivers/watchdog/iTCO_wdt.c 2008-01-25 00:58:37.000000000 +0200
+++ /WORK/globalosii/linux-embedded/drivers/watchdog/iTCO_wdt.c 2008-02-02 05:11:46.000000000 +0200
@@ -659,8 +659,13 @@
goto out;
}
val32 = inl(SMI_EN);
+ printk(KERN_INFO PFX "TCO_EN was %04lX\n", val32);
val32 &= 0xffffdfff; /* Turn off SMI clearing watchdog */
+ printk(KERN_INFO PFX "TCO_EN will try to set %04lX\n", val32);
outl(val32, SMI_EN);
+ val32 = inl(SMI_EN);
+ printk(KERN_INFO PFX "TCO_EN after set %04lX\n", val32);
+
release_region(SMI_EN, 4);
/* The TCO I/O registers reside in a 32-byte range pointed to by the
TCOBASE value */
and I got this in dmesg:
[ 589.913354] iTCO_wdt: TCO_EN was 0000203B
[ 589.913356] iTCO_wdt: TCO_EN will try to set 0000003B
[ 589.913360] iTCO_wdt: TCO_EN after set 0000203B
So this code does not work under some conditions, for example in my
situation. It is a bit dangerous because, as I understand it, this step is
supposed to prevent unexpected reboots during watchdog setup. So perhaps a check
for the TCO_LOCK bit should be added, or at least a check that the value has
really changed.
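The readback check suggested above can be sketched as pure functions on the register values, so it can be tested against the dmesg output without hardware. The helper names are hypothetical; in the driver the values would come from inl(SMI_EN) before and after the outl(). TCO_EN is bit 13 (mask 0x2000), the bit the driver's 0xffffdfff mask clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMI_EN_TCO_EN (1u << 13)   /* the bit cleared by the 0xffffdfff mask */

/* Value the driver tries to write: SMI_EN with TCO_EN cleared. */
static uint32_t smi_en_without_tco(uint32_t smi_en)
{
    return smi_en & ~SMI_EN_TCO_EN;
}

/* Hypothetical check: did clearing TCO_EN actually stick?
 * 'readback' is the value read back from SMI_EN after the write. */
static bool tco_en_cleared(uint32_t readback)
{
    return (readback & SMI_EN_TCO_EN) == 0;
}
```

With the values from the log, the driver computes 0x003B but reads back 0x203B, so tco_en_cleared() returns false. One option would be for the driver to print a warning in that case instead of silently continuing.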
I also do not understand this code.
To clear a bit in TCO1_STS, for example, you need to write 1 to it: according to the
ICH9 datasheet, almost all of these bits are write-1-to-clear rather than plain
read/write, except bit 0 of TCO1_STS, which is read-only. So outb(0, TCO1_STS)
will just not do anything.
In TCO2_STS, bit 0 is the Intruder Detect bit on ICH8 and ICH9! It is probably not a
good idea to clear this bit.
Code:
/* Clear out the (probably old) status */
outb(0, TCO1_STS);
outb(3, TCO2_STS);
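To illustrate the write-1-to-clear point, here is a tiny model of such a status register (a hypothetical helper, not driver code): every bit written as 1 is cleared, every bit written as 0 keeps its value. Under this model outb(0, TCO1_STS) is a no-op, while outb(3, TCO2_STS) clears bits 0 and 1, including the Intruder Detect bit. Read-only bits are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a write-1-to-clear status register: bits written as 1 are
 * cleared, bits written as 0 keep their current value.
 * Read-only bits (such as bit 0 of TCO1_STS) are not modeled here. */
static uint8_t rwc_write(uint8_t current, uint8_t written)
{
    return (uint8_t)(current & (uint8_t)~written);
}
```

If only one status bit should be cleared, writing a mask with just that bit set (for example 0x02 for bit 1) leaves bit 0 untouched.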
But these are all small issues, and they do not explain why the watchdog does not work. I made
a small patch so that, instead of reloading the timer, the driver reads the current value
of the timer.
The patch looks like this:
@@ -483,6 +484,7 @@
static ssize_t iTCO_wdt_write (struct file *file, const char __user *data,
size_t len, loff_t * ppos)
{
+ unsigned int val16;
/* See if we got the magic character 'V' and reload the timer */
if (len) {
if (!nowayout) {
@@ -503,7 +505,14 @@
}
/* someone wrote to us, we should reload the timer */
- iTCO_wdt_keepalive();
+ //iTCO_wdt_keepalive();
+ spin_lock(&iTCO_wdt_private.io_lock);
+ val16 = inw(TCO_RLD);
+ val16 &= 0x3ff;
+ spin_unlock(&iTCO_wdt_private.io_lock);
+
+ printk(KERN_INFO PFX "Remaining time %d\n", (val16 * 6) / 10);
+
With these patches applied, dmesg shows:
[ 2505.979453] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.02 (26-Jul-2007)
[ 2505.980073] iTCO_wdt: TCO_EN was 0000203B
[ 2505.980076] iTCO_wdt: TCO_EN will try to set 0000003B
[ 2505.980083] iTCO_wdt: TCO_EN after set 0000203B
[ 2505.980085] iTCO_wdt: Found a ICH9R TCO device (Version=2, TCOBASE=0x0460)
[ 2505.980088] iTCO_wdt: TCO1_STS was 0000
[ 2505.980090] iTCO_wdt: TCO2_STS was 0000
[ 2505.980664] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
[ 2515.908192] iTCO_wdt: Remaining time 30
[ 2516.408459] iTCO_wdt: Remaining time 29
[ 2516.908687] iTCO_wdt: Remaining time 28
[ 2517.408917] iTCO_wdt: Remaining time 28
[ 2517.909144] iTCO_wdt: Remaining time 28
[ 2518.409373] iTCO_wdt: Remaining time 27
[ 2518.909601] iTCO_wdt: Remaining time 27
[ 2519.409829] iTCO_wdt: Remaining time 26
[ 2519.910057] iTCO_wdt: Remaining time 25
[ 2520.410287] iTCO_wdt: Remaining time 25
[ 2520.910515] iTCO_wdt: Remaining time 25
[ 2521.410745] iTCO_wdt: Remaining time 24
[ 2521.910972] iTCO_wdt: Remaining time 24
[ 2522.411201] iTCO_wdt: Remaining time 23
[ 2522.911429] iTCO_wdt: Remaining time 22
[ 2523.411658] iTCO_wdt: Remaining time 22
[ 2523.911886] iTCO_wdt: Remaining time 21
[ 2524.412115] iTCO_wdt: Remaining time 21
[ 2524.912343] iTCO_wdt: Remaining time 21
[ 2525.412573] iTCO_wdt: Remaining time 20
[ 2525.912801] iTCO_wdt: Remaining time 19
[ 2526.413030] iTCO_wdt: Remaining time 19
[ 2526.913258] iTCO_wdt: Remaining time 18
[ 2527.413487] iTCO_wdt: Remaining time 18
[ 2527.913715] iTCO_wdt: Remaining time 18
[ 2528.413944] iTCO_wdt: Remaining time 17
[ 2528.914172] iTCO_wdt: Remaining time 16
[ 2529.414401] iTCO_wdt: Remaining time 16
[ 2529.914629] iTCO_wdt: Remaining time 15
[ 2530.414859] iTCO_wdt: Remaining time 15
[ 2530.915087] iTCO_wdt: Remaining time 14
[ 2531.415315] iTCO_wdt: Remaining time 14
[ 2531.915544] iTCO_wdt: Remaining time 13
[ 2532.415773] iTCO_wdt: Remaining time 13
[ 2532.916001] iTCO_wdt: Remaining time 12
[ 2533.416230] iTCO_wdt: Remaining time 12
[ 2533.916459] iTCO_wdt: Remaining time 11
[ 2534.416688] iTCO_wdt: Remaining time 10
[ 2534.916916] iTCO_wdt: Remaining time 10
[ 2535.417144] iTCO_wdt: Remaining time 10
[ 2535.917373] iTCO_wdt: Remaining time 9
[ 2536.417602] iTCO_wdt: Remaining time 9
[ 2536.917830] iTCO_wdt: Remaining time 8
[ 2537.418059] iTCO_wdt: Remaining time 7
[ 2537.918287] iTCO_wdt: Remaining time 7
[ 2538.418516] iTCO_wdt: Remaining time 7
[ 2538.918744] iTCO_wdt: Remaining time 6
[ 2539.418973] iTCO_wdt: Remaining time 6
[ 2539.919201] iTCO_wdt: Remaining time 5
[ 2540.419431] iTCO_wdt: Remaining time 4
[ 2540.919658] iTCO_wdt: Remaining time 4
[ 2541.419888] iTCO_wdt: Remaining time 4
[ 2541.920116] iTCO_wdt: Remaining time 3
[ 2542.420345] iTCO_wdt: Remaining time 3
[ 2542.920573] iTCO_wdt: Remaining time 2
[ 2543.420802] iTCO_wdt: Remaining time 1
[ 2543.921030] iTCO_wdt: Remaining time 1
[ 2544.421259] iTCO_wdt: Remaining time 0
[ 2544.921487] iTCO_wdt: Remaining time 0
[ 2545.421716] iTCO_wdt: Remaining time 2
[ 2545.921945] iTCO_wdt: Remaining time 1
[ 2546.422173] iTCO_wdt: Remaining time 1
[ 2546.922402] iTCO_wdt: Remaining time 0
[ 2547.422631] iTCO_wdt: Remaining time 2
[ 2547.922859] iTCO_wdt: Remaining time 1
[ 2548.423088] iTCO_wdt: Remaining time 1
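The patch converts the 10-bit TCO_RLD count to seconds as (val16 * 6) / 10, i.e. it assumes each tick is 0.6 s. The same arithmetic in both directions (hypothetical helper names) shows why heartbeat=30 corresponds to 50 ticks, why the readout starts at 30, and why a reloaded count of 4 ticks shows up as "Remaining time 2".

```c
#include <assert.h>

/* Assumes 0.6 s per tick, matching the (val16 * 6) / 10 conversion above. */
static unsigned int tco_ticks_to_secs(unsigned int rld)
{
    return ((rld & 0x3ff) * 6) / 10;   /* only the low 10 bits hold the count */
}

/* Inverse conversion, truncating like the integer math above. */
static unsigned int tco_secs_to_ticks(unsigned int secs)
{
    return (secs * 10) / 6;
}
```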
I also tried polling the register every 100 ms:
[ 3525.608533] iTCO_wdt: Remaining ticks 3
[ 3525.709376] iTCO_wdt: Remaining ticks 3
[ 3525.810220] iTCO_wdt: Remaining ticks 3
[ 3525.911065] iTCO_wdt: Remaining ticks 3
[ 3526.011909] iTCO_wdt: Remaining ticks 2
[ 3526.112753] iTCO_wdt: Remaining ticks 2
[ 3526.213598] iTCO_wdt: Remaining ticks 2
[ 3526.314443] iTCO_wdt: Remaining ticks 2
[ 3526.415287] iTCO_wdt: Remaining ticks 2
[ 3526.516135] iTCO_wdt: Remaining ticks 2
[ 3526.616977] iTCO_wdt: Remaining ticks 1
[ 3526.717820] iTCO_wdt: Remaining ticks 1
[ 3526.818665] iTCO_wdt: Remaining ticks 1
[ 3526.919510] iTCO_wdt: Remaining ticks 1
[ 3527.020354] iTCO_wdt: Remaining ticks 1
[ 3527.121199] iTCO_wdt: Remaining ticks 4
[ 3527.222043] iTCO_wdt: Remaining ticks 4
[ 3527.322890] iTCO_wdt: Remaining ticks 4
[ 3527.423732] iTCO_wdt: Remaining ticks 4
[ 3527.524577] iTCO_wdt: Remaining ticks 4
[ 3527.625422] iTCO_wdt: Remaining ticks 4
This means the timer reaches 0... and nothing happens! It goes back to 2 and then to 0
again. I even checked the STS registers; they are still zero! The register is just set
back to its default value, 0004h.
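One way to spot this behaviour in a longer capture is to scan the polled tick values for upward jumps. Assuming, as in the patch above, that nothing on the driver side reloads the timer, each upward jump marks a point where the timer expired and was reloaded without a reset. count_silent_reloads() is a hypothetical helper, shown here on the 100 ms samples above.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times the polled TCO_RLD tick count jumps upwards.
 * If no keepalive is being sent, every upward jump means the timer was
 * reloaded on its own after it ran out. */
static size_t count_silent_reloads(const unsigned int *ticks, size_t n)
{
    size_t reloads = 0;
    for (size_t i = 1; i < n; i++)
        if (ticks[i] > ticks[i - 1])
            reloads++;
    return reloads;
}
```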
Can someone help me with this? Or is it a hardware bug in the chipset?
I will try to read more documentation; maybe I will be able to find what is wrong there.
On Fri, 1 Feb 2008 15:39:08 -0500, Len Brown wrote
...
> yes, i'm wondering if the changes in IO resource reservations
> in the PNPACPI layer are interfering with the native driver.
...
Hi Denys,
> Probably someone can help me with this? Or it is hardware bug of chipset?
> I will try to look more docs, maybe i will be able to find whats wrong there.
I'll have a look at it next week.
Greetings,
Wim.