From: Axel Uhl
To: linux-kernel@vger.kernel.org
Date: Wed, 10 Feb 2010 12:10:18 +0100
Subject: Kernel Bug in ATA or SMART area
Message-ID: <4B72941A.7060704@gmx.de>

I get intermittent kernel exceptions, probably related to a problem in the ATA or SMART code in the kernel or its modules. I have a cron job that captures the disk temperatures every few hours, and I spin my disks down with "hdparm -y" when they are not in use. It seems that when the cron job spins the disks up to read their temperatures with smartctl, and certain other interactions happen at the same time, the kernel raises an exception.
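For reading the syslog excerpt below: when libata gives up on a command, it prints the taskfile registers as a slash- and colon-separated hex dump (`cmd b0/da:00:00:4f:c2/00:00:00:00:00/00` in the excerpt). What follows is a minimal Python sketch that splits such a dump into named fields, assuming the field order used by libata's error-handler report in `drivers/ata/libata-eh.c` (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device). The helpers `decode_taskfile` and `describe` are hypothetical and not part of smartctl or the kernel. Decoded this way, opcode 0xB0 with feature 0xDA and the 0x4F/0xC2 signature is the ATA SMART RETURN STATUS command, which is consistent with the health check that `smartctl -a` performs.

```python
"""Decode the taskfile dump that libata prints for a failed ATA command."""

# Field order of the "cmd"/"res" dumps, assumed from libata's error-handler
# report: command/feature:nsect:lbal:lbam:lbah/hob_* (5 bytes)/device.
FIELDS = (
    "command", "feature", "nsect", "lbal", "lbam", "lbah",
    "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
    "device",
)

# Only the opcode and SMART subcommands relevant to this log are listed.
ATA_COMMANDS = {0xB0: "SMART"}
SMART_FEATURES = {0xD0: "READ DATA", 0xD8: "ENABLE OPERATIONS", 0xDA: "RETURN STATUS"}


def decode_taskfile(dump: str) -> dict[str, int]:
    """Split e.g. 'b0/da:00:00:4f:c2/00:00:00:00:00/00' into named byte values."""
    values = [int(byte, 16) for group in dump.split("/") for byte in group.split(":")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} bytes, got {len(values)}: {dump!r}")
    return dict(zip(FIELDS, values))


def describe(tf: dict[str, int]) -> str:
    """Name the command; for SMART, also name the subcommand and check the signature."""
    name = ATA_COMMANDS.get(tf["command"], f"opcode 0x{tf['command']:02X}")
    if tf["command"] != 0xB0:
        return name
    sub = SMART_FEATURES.get(tf["feature"], f"feature 0x{tf['feature']:02X}")
    # SMART commands carry the signature 0x4F in LBA mid and 0xC2 in LBA high.
    signature_ok = tf["lbam"] == 0x4F and tf["lbah"] == 0xC2
    return f"{name} {sub} (signature {'ok' if signature_ok else 'missing'})"


if __name__ == "__main__":
    cmd = decode_taskfile("b0/da:00:00:4f:c2/00:00:00:00:00/00")
    print(describe(cmd))  # SMART RETURN STATUS (signature ok)
    # In the "res" dump the first two bytes hold the status and error registers.
    res = decode_taskfile("40/00:01:00:00:00/00:00:00:00:00/00")
    print(f"status=0x{res['command']:02X} error=0x{res['feature']:02X}")  # status=0x40 error=0x00
```

On the `res` line, status byte 0x40 is DRDY alone, matching the `status: { DRDY }` line in the log.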
The smartctl command I'm using is:

/usr/sbin/smartctl -a $devfull >/var/local/snmp/smart-$dev.TMP

Here is one such exception from the syslog:

Feb 10 06:57:01 homemp3 /USR/SBIN/CRON[18271]: (root) CMD (/etc/snmp/local-snmp-cronjob-spinup)
Feb 10 06:57:06 homemp3 vdr: [3293] connect from 127.0.0.1, port 59975 - accepted
Feb 10 06:57:06 homemp3 vdr: [3293] closing SVDRP connection
Feb 10 06:57:09 homemp3 kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 10 06:57:09 homemp3 kernel: ata5.00: failed command: SMART
Feb 10 06:57:09 homemp3 kernel: ata5.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
Feb 10 06:57:09 homemp3 kernel:          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 10 06:57:09 homemp3 kernel: ata5.00: status: { DRDY }
Feb 10 06:57:10 homemp3 kernel: irq 10: nobody cared (try booting with the "irqpoll" option)
Feb 10 06:57:10 homemp3 kernel: Pid: 17050, comm: hadam3p_um_6.14 Not tainted 2.6.32.7 #1
Feb 10 06:57:10 homemp3 kernel: Call Trace:
Feb 10 06:57:10 homemp3 kernel: [] ? printk+0x18/0x1a
Feb 10 06:57:10 homemp3 kernel: [] __report_bad_irq+0x27/0x85
Feb 10 06:57:10 homemp3 kernel: [] note_interrupt+0x148/0x186
Feb 10 06:57:10 homemp3 kernel: [] handle_level_irq+0x86/0xb1
Feb 10 06:57:10 homemp3 kernel: [] handle_irq+0x1a/0x28
Feb 10 06:57:10 homemp3 kernel: [] do_IRQ+0x33/0x80
Feb 10 06:57:10 homemp3 kernel: [] ? pdc_interrupt+0x150/0x3a5
Feb 10 06:57:10 homemp3 kernel: [] common_interrupt+0x29/0x30
Feb 10 06:57:10 homemp3 kernel: [] ? __do_softirq+0x33/0xec
Feb 10 06:57:10 homemp3 kernel: [] ? handle_IRQ_event+0x31/0xc1
Feb 10 06:57:10 homemp3 kernel: [] do_softirq+0x2a/0x2f
Feb 10 06:57:10 homemp3 kernel: [] irq_exit+0x2a/0x2f
Feb 10 06:57:10 homemp3 kernel: [] do_IRQ+0x3c/0x80
Feb 10 06:57:10 homemp3 kernel: [] ? do_device_not_available+0x0/0x48
Feb 10 06:57:10 homemp3 kernel: [] common_interrupt+0x29/0x30
Feb 10 06:57:10 homemp3 kernel: handlers:
Feb 10 06:57:10 homemp3 kernel: [] (pdc_interrupt+0x0/0x3a5)
Feb 10 06:57:10 homemp3 kernel: [] (ata_sff_interrupt+0x0/0xc4)
Feb 10 06:57:10 homemp3 kernel: [] (azx_interrupt+0x0/0x11b [snd_hda_intel])
Feb 10 06:57:10 homemp3 kernel: [] (usb_hcd_irq+0x0/0x5e [usbcore])
Feb 10 06:57:10 homemp3 kernel: [] (e1000_intr+0x0/0xf2 [e1000])
Feb 10 06:57:10 homemp3 kernel: Disabling IRQ #10
Feb 10 06:57:10 homemp3 kernel: ata5: soft resetting link
Feb 10 06:57:11 homemp3 kernel: ata5.00: configured for UDMA/133
Feb 10 06:57:11 homemp3 kernel: ata5: EH complete

I believe hadam3p_um_6.14 is part of my BOINC installation, which runs a climate simulation. However, the problem has also occurred when this program was not running.

Here's the output of scripts/ver_linux:

If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux homemp3 2.6.32.7 #1 Sat Feb 6 10:50:05 CET 2010 i686 GNU/Linux

Gnu C                  4.3.2
Gnu make               3.81
binutils               2.18.0.20080103
util-linux             2.13.1.1
mount                  2.13.1.1
module-init-tools      found
Linux C Library        2.7
Dynamic linker (ldd)   2.7
Procps                 3.2.7
Console-tools          0.2.3
Sh-utils               6.10
udev                   125
Modules Loaded         ipv6 bsd_comp usb_storage ppp_deflate zlib_deflate ppp_generic slhc w83627hf hwmon_vid hwmon i2c_dev saa7134_alsa snd_hda_codec_realtek tuner_simple tuner_types mt352 snd_hda_intel saa7134_dvb snd_hda_codec videobuf_dvb snd_pcm_oss snd_mixer_oss dvb_core snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event 8250_pnp snd_seq 8250 serial_core snd_timer snd_seq_device saa7134 ehci_hcd ir_common snd v4l2_common videodev v4l1_compat soundcore snd_page_alloc videobuf_dma_sg videobuf_core tveeprom usbcore via_rhine i2c_core e1000

As a result of the exception, the system becomes very unresponsive.
Every kind of I/O seems to have very poor response times, and ping times on the gigabit network suddenly rise to several tens of milliseconds. This may have something to do with IRQ 10 being affected, but I'm not sure.

/proc/version:

Linux version 2.6.32.7 (root@homemp3) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 Sat Feb 6 10:50:05 CET 2010

/proc/cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Celeron(R) CPU 2.80GHz
stepping        : 9
cpu MHz         : 2799.930
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pebs bts pni dtes64 monitor ds_cpl tm2 cid cx16 xtpr lahf_lm
bogomips        : 5623.13
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

Output of lspci -vvv:

00:00.0 Host bridge: VIA Technologies, Inc. PT880 Ultra/PT894 Host Bridge
	Subsystem: ASRock Incorporation Device 0308
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: agpgart-via

00:00.1 Host bridge: VIA Technologies, Inc. PT894 Host Bridge
	Subsystem: ASRock Incorporation Device 1308
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-
[...]
	Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:02.0 PCI bridge: VIA Technologies, Inc. PT890 PCI to PCI Bridge Controller (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
[...]
	Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
		DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
			ExtTag- RBE- FLReset-
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0 <64ns, L1 <1us
			ClockPM- Suprise- LLActRep+ BwNot-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surpise+
			Slot # 10, PowerLimit 75.000000; Interlock- NoCompl-
		SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
			Changed: MRL- PresDet- LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
	Capabilities: [68] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel

00:09.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
	Subsystem: Intel Corporation PRO/1000 GT Desktop Adapter
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-
[...]
	Kernel modules: nvidiafb

80:01.0 Audio device: VIA Technologies, Inc. VT1708/A [Azalia HDAC] (VIA High Definition Audio Controller) (rev 10)
	Subsystem: ASRock Incorporation Device 0888
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-