2001-12-26 04:39:13

by Weiping He

[permalink] [raw]
Subject: anybody know about "journal-615" and/or "journal-601" log error?

Hi,all
Merry Christmas & Happy new year first!

Then the problem:
I'm using kernel 2.4.17 now:

---------------------------8<---------------------------------------------------------------------------
root@x200:/var/log# cat /proc/version
Linux version 2.4.17 (root@x200) (gcc version 2.95.3 20010315 (release)) #5 Tue Dec 25 14:45:38 CST 2001
---------------------------8<---------------------------------------------------------------------------

but I've experienced server hang up of my box, the syslog entity is:
(this show up after I re-compile the kernel with the reiserfs's:
Have reiserfs do extra internal checking
Stats in /proc/fs/reiserfs
options set to enable)
---------------------------8<---------------------------------------------------------------------------
Dec 26 09:47:10 x200 kernel: journal-615: buffer write failed
---------------------------8<---------------------------------------------------------------------------

or (before set the two options above to enable):
---------------------------8<---------------------------------------------------------------------------
Dec 18 10:19:13 x200 kernel: journal-601, buffer write failed
---------------------------8<---------------------------------------------------------------------------

This box is a IBM Xseries X200, hardware config:
---------------------------8<---------------------------------------------------------------------------

root@x200:/var/log# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 866.676
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1723.59



root@x200:/var/log# lspci -vvv
00:00.0 Host bridge: VIA Technologies, Inc. VT82C691 [Apollo PRO] (rev c4)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 8
Region 0: Memory at d8000000 (32-bit, prefetchable) [size=64M]
Capabilities: [a0] AGP version 2.0
Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3 AGP] (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: dc000000-ddffffff
Prefetchable memory behind bridge: d0000000-d7ffffff
BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
Subsystem: VIA Technologies, Inc. VT82C686/A PCI to ISA Bridge
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0

00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10) (prog-if 8a [Master SecP PriP])
Subsystem: VIA Technologies, Inc. VT82C586 IDE [Apollo]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32
Region 4: I/O ports at d000 [size=16]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10) (prog-if 00 [UHCI])
Subsystem: Unknown device 0925:1234
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, cache line size 08
Interrupt: pin D routed to IRQ 11
Region 4: I/O ports at d400 [size=32]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.3 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10) (prog-if 00 [UHCI])
Subsystem: Unknown device 0925:1234
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, cache line size 08
Interrupt: pin D routed to IRQ 11
Region 4: I/O ports at d800 [size=32]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin ? routed to IRQ 5
Capabilities: [68] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 [Apollo Super AC97/Audio] (rev 20)
Subsystem: VIA Technologies, Inc.: Unknown device 4511
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin C routed to IRQ 9
Region 0: I/O ports at dc00 [size=256]
Region 1: I/O ports at e000 [size=4]
Region 2: I/O ports at e400 [size=4]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0b.0 SCSI storage controller: Adaptec 7892A (rev 02)
Subsystem: Adaptec: Unknown device e220
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (10000ns min, 6250ns max), cache line size 08
Interrupt: pin A routed to IRQ 11
BIST result: 00
Region 0: I/O ports at e800 [disabled] [size=256]
Region 1: Memory at df101000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0d.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
Subsystem: Intel Corporation 82559 Fast Ethernet LAN on Motherboard
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2000ns min, 14000ns max), cache line size 08
Interrupt: pin A routed to IRQ 10
Region 0: Memory at df100000 (32-bit, non-prefetchable) [size=4K]
Region 1: I/O ports at ec00 [size=64]
Region 2: Memory at df000000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at <unassigned> [disabled] [size=1M]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable+ DSel=0 DScale=2 PME-

01:00.0 VGA compatible controller: S3 Inc. Savage 4 (rev 04) (prog-if 00 [VGA])
Subsystem: Diamond Multimedia Systems: Unknown device 4906
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 248 (1000ns min, 63750ns max), cache line size 08
Interrupt: pin A routed to IRQ 5
Region 0: Memory at dd000000 (32-bit, non-prefetchable) [size=512K]
Region 1: Memory at d0000000 (32-bit, prefetchable) [size=128M]
Expansion ROM at <unassigned> [disabled] [size=64K]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] AGP version 2.0
Status: RQ=31 SBA- 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
---------------------------8<---------------------------------------------------------------------------

the distribution is Slackware 8.0 and
we got 512M memory in this box and installing Oracle 8.1.7 on it to use as
a experiment platform, all the filesystem are reiserfs:

---------------------------8<---------------------------------------------------------------------------
root@x200:/var/log# mount
/dev/sdb3 on / type reiserfs (rw)
/dev/sdb6 on /tmp type reiserfs (rw)
/dev/sdb7 on /var type reiserfs (rw)
/dev/sdb8 on /opt type reiserfs (rw)
/dev/sdb9 on /usr type reiserfs (rw)
/dev/sdb10 on /home type reiserfs (rw)
/dev/sdc6 on /ext type reiserfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
none on /proc type proc (rw)
---------------------------8<---------------------------------------------------------------------------
I have three hard disk, they are IBM scsi disk (one 72k rpm, 2 10k rpm).

I've experienced power failure before the problem happens.
and after it happens, in `top' I can see that `kupdated' mark as <debunc>
my question is what's the matter here? I do some search in kernel archive
but found almost nothing relate that 'journal-615' or 'journal-601'.
Is it a bug or my sw/hw configure problem?
Any help would appreciate.


I've recompile the kernel using egcs 1.1.2 to see if it's different.


Thanks & Regards laser



?:.?˛???m???ka??b???zwm??b????˛???m?b??????z_???^n?r???z???h?????&???z??z?ޗ?+??+zf???h???~????i?????????z_???j:+v???)ߣ?m?S?y??杶????i??????????i


2001-12-26 06:42:46

by Oleg Drokin

[permalink] [raw]
Subject: Re: anybody know about "journal-615" and/or "journal-601" log error?

Hello!

On Wed, Dec 26, 2001 at 12:36:18PM +0800, Weiping He wrote:

> but I've experienced server hang up of my box, the syslog entity is:
> (this show up after I re-compile the kernel with the reiserfs's:
> Have reiserfs do extra internal checking
> Stats in /proc/fs/reiserfs
> options set to enable)
> Dec 26 09:47:10 x200 kernel: journal-615: buffer write failed
> Dec 18 10:19:13 x200 kernel: journal-601, buffer write failed
Any other errors in the logs?
Reading the code these errors appear after we put IO request
and then watitng for it to be complete with wait_on_buffer().
But after wait_on_buffer returns, bufer is still not up to date,
which usually means IO request have failed for some reason.

> I've experienced power failure before the problem happens.
> and after it happens, in `top' I can see that `kupdated' mark as <debunc>
Yes, because there was an Oops after this message.

> my question is what's the matter here? I do some search in kernel archive
> but found almost nothing relate that 'journal-615' or 'journal-601'.

> Is it a bug or my sw/hw configure problem?
Right now it looks more like a HW problem.
Can you dig more messages from your kernel log.
Can you try to run reiserfsck on a problematic partition and see
if there is anything wrong?

Bye,
Oleg

2001-12-26 08:40:51

by Weiping He

[permalink] [raw]
Subject: Re: anybody know about "journal-615" and/or "journal-601" log error?

>
> > but I've experienced server hang up of my box, the syslog entity is:
> > (this show up after I re-compile the kernel with the reiserfs's:
> > Have reiserfs do extra internal checking
> > Stats in /proc/fs/reiserfs
> > options set to enable)
> > Dec 26 09:47:10 x200 kernel: journal-615: buffer write failed
> > Dec 18 10:19:13 x200 kernel: journal-601, buffer write failed
> Any other errors in the logs?

no, there is the only log I can find related in syslog, but I'v seen some kernel stack dump on the screen,
and I havn't got chance to copy them out. I'm observing it now, if I repeat the error, I'll sent them.

> Reading the code these errors appear after we put IO request
> and then watitng for it to be complete with wait_on_buffer().
> But after wait_on_buffer returns, bufer is still not up to date,
> which usually means IO request have failed for some reason.
>

yeah, that's what I thought too.

> Right now it looks more like a HW problem.

I guest so at present, but let me wait to see if I can repeat the problem later.

> Can you dig more messages from your kernel log.

there are some other log messages in /var/log/debug, looks like:

----------------------------------------------8<-----------------------------------------
Dec 26 01:58:55 x200 kernel: journal_read_transaction, commit offset 5581 had bad time 232573 or length 4
Dec 26 01:58:55 x200 kernel: journal-1299: Setting newest_mount_id to 28
Dec 26 02:07:06 x200 kernel: CPU: Before vendor init, caps: 0383f9ff 00000000 00000000, vendor = 0
Dec 26 02:07:06 x200 kernel: CPU: After vendor init, caps: 0383f9ff 00000000 00000000 00000000
Dec 26 02:07:06 x200 kernel: CPU: After generic, caps: 0383f9ff 00000000 00000000 00000000
Dec 26 02:07:06 x200 kernel: CPU: Common caps: 0383f9ff 00000000 00000000 00000000
Dec 26 02:07:10 x200 kernel: journal-1153: found in header: first_unflushed_offset 405, last_flushed_trans_id 119668
Dec 26 02:07:10 x200 kernel: journal-1206: Starting replay from offset 405, trans_id 119669
Dec 26 02:07:11 x200 kernel: journal-1299: Setting newest_mount_id to 34
Dec 26 02:07:12 x200 kernel: journal-1153: found in header: first_unflushed_offset 7579, last_flushed_trans_id 15604
Dec 26 02:07:12 x200 kernel: journal-1206: Starting replay from offset 7579, trans_id 15605
Dec 26 02:07:12 x200 kernel: journal-1299: Setting newest_mount_id to 34
Dec 26 02:07:12 x200 kernel: journal-1153: found in header: first_unflushed_offset 4260, last_flushed_trans_id 44372
----------------------------------------------8<-----------------------------------------

but I think their are unrelevented here, because I think their are the normal log check after power failure.

> Can you try to run reiserfsck on a problematic partition and see
> if there is anything wrong?

I've run it using command like :

#reiserfsck -x -o -l logfile /dev/sdc6

to all my partitions in linux, and the `logfile' is empty, so may be I should observing to
see if the problem could repeated and get more information.
BTW: if there another log helpful I can found, please inform me, I just checked `syslog',
`message' and `debug' in my system log directory.

Thanks and Regards Laser
?:.?˛???m???ka??b???zwm??b????˛???m?b??????z_???^n?r???z???h?????&???z??z?ޗ?+??+zf???h???~????i?????????z_???j:+v???)ߣ?m?S?y??杶????i??????????i

2002-01-10 02:15:06

by Weiping He

[permalink] [raw]
Subject: Re: anybody know about "journal-615" and/or "journal-601" log error?(may be SCSI problem?)

Hi,

I've experienced another crash this morning.
I'v recorded the message displayed on the screen:

-------------------------------------8<------------------------------------------------
journal-601, buffer write failed
invalid operand: 0000
CPU:0
EIP: 0010:[<c018a7f9>] Not tainted
EFLAGS: 00010286
eax: 00000024 ebx: c028e020 ecx: da39c000 edx: 00000001
esi: dc615400 edi: dc615400 ebp: c1835e58 esp: c1835e58
ds: 0018 es: 0018 ss: 0018
Process Kupdated (pid: 7, stackpage = c1835000)
stack: c0286660 c0329d60 c028e020 c1835e3c 00000000 e0c3b504 c019985b dc615400
c029e020 00001268 dc615400 e0c3a094 c1835f74 00000005 00000006 00000000
ca2a31e0 c019d074 dc615400 c0c3b504 00000001 c1835fa0 dc615400 c1834560

CallTrace [<c01999856>] [<c019d074>] [<c01e356f>] [<c01d7c91>] [<c01123f9>]
[<c01186f4>] [<c01991cd>] [<c019c263>] [<c01881c9>] [<c0132cb3>] [<c01325c0>]
[<c0132841>] [<c0105000>] [<c01054aa>] [<c01054b3>]

Code: 0f 0b 68 60 9d 32 c0 85 f6 74 10 0f b7 46 08 50 e8 ea 4f fa
scsi0:0:8:0: Attempting to queue an ABORT message
scsi0: Dumping Card State while idle, at SEQADDR 0x8
ACCUM = 0x0, SINDEX = 0x27, DINDEX = 0xe4, ARG_2 = 0x0
HCNT = 0x0
SCSISEQ = 0x12, SBLKCTL = 0xa,
DFCNTRL=0x0, DFSTATUS = 0x89
LASTPHASE=0x1, SCSISIGI=0x0, SXFRCTL0=0x80
SSTAT0=0x0, SSTAT1=0x8
SCSIPHASE=0x0
STACK==0x3, 0x108, 0x160, 0x0
SCB Count = 116
Kernel NEXTQSCB=36
Card NEXTQSCB=36
QINFIFO entries:
Waiting Queue entries:
Disconnected Queue entries: 18:28
QOUTFIFO entries:
Sequencer Free SCB List: 12 21 30 15 19 3 27 6 7 23 26 14 2 13 8 9 22 24 5 20 31
28 0 11 1 4 25 16 17 29 10

Pending list: 28

Kernel Free SCB list: 39 26 4 22 56 19 0 44 45 21 31 62 2 30 26 7 41 13 20 32
61 52 6 47 48 12 55 42 54 1 34 37 11 15 25 46 27 38 67 18 23 5 50 58 63 16 57 3
17 9 59 43 112 60 14 49 10 53 24 8 35 33 40 113 114 115 108 109 110 111 104 105
106 107 100 101 102 103 96 97 98 99 92 93 94 95 88 89 90 91 84 85 86 87 80 81 82
83 76 77 78 79 72 73 74 75 68 69 70 71 64 65 66

DevQ(0:4:0): 0 waiting
DevQ(0:6:0): 0 waiting
DevQ(0:8:0): 0 waiting
(scsi0:A:8:0): Queuing a recovery SCB
scsi0:0:8:0: Device is disconnected, re_queuing SCB
Recovery code sleeping
(scsi0:A:8:0): Abort Tag Message Sent
(scsi0:A:8:0): SCB 28 - Abort Tag Completed.
Recovery SCB completes
Recovery Code asake
aic7xxx_abort returns 0x2002.
-------------------------------------------------------8<----------------------------------------------

it's recorded by hand, so may be some typo there, but I think it should reflact something happended there.
It seems there are something related to my SCSI card or driver got problem, but I'm not sure whether
it's a hardware problem (my scsi card got problem) or something wrong in driver (or my config of that
driver)?

and in the syslog still just got "journal-601" report, but I think it's reasonable,
because when the kernel wants to write some log, the hardware just don't act accordingly.
Any thoughts?

thanks and regards laser

?:.?˛???m???ka??b???zwm??b????˛???m?b??????z_???^n?r???z???h?????&???z??z?ޗ?+??+zf???h???~????i?????????z_???j:+v???)ߣ?m?S?y??杶????i??????????i

2002-01-10 10:36:50

by Nikita Danilov

[permalink] [raw]
Subject: Re: anybody know about "journal-615" and/or "journal-601" log error?(may be SCSI problem?)

journal-601 means that writing of the block into the journal area failed.
journal-615 means that writing of the commit record into the journal area failed.

Reiserfs cannot work reliably if there are io errors on access to
journal area. So, if this is turns out to be hardware problem (as it
looks to be, judging by scsi error messages), you will have to resort to
changing drive. There is also not well tested support for relocating
journal to another device or different place on a device.

Weiping He writes:
> Hi,
>
> I've experienced another crash this morning.
> I'v recorded the message displayed on the screen:
>
> -------------------------------------8<------------------------------------------------
> journal-601, buffer write failed

Nikita.