Hi,
next kernel panics when running fsfuzzer test on ext4 file system.
Machine Type: Power 7 PowerVM LPAR
kernel : 4.13.0-next-20170915
config : attached
Test: fsfuzzer
dmesg:
-----
EXT4-fs (loop1): couldn't mount RDWR because of unsupported optional
features (e4000000)
JBD2: no valid journal superblock found
EXT4-fs (loop1): error loading journal
EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: (null)
EXT4-fs error (device loop1): ext4_mb_generate_buddy:756: group 0, block
bitmap and bg descriptor inconsistent: 1077 vs 1045 free clusters
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xd0000000084b0424
Oops: Kernel access of bad area, sig: 11 [#1]
BE SMP NR_CPUS=2048 NUMA pSeries
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in: cramfs(E) dlci(E) 8021q(E) garp(E) mrp(E) af_key(E)
ieee802154_socket(E) ieee802154(E) rpcrdma(E) ib_isert(E)
iscsi_target_mod(E) ib_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E)
ib_srp(E) hidp(E) ib_ipoib(E) cmtp(E) kernelcapi(E) rdma_ucm(E)
ib_ucm(E) bnep(E) ib_uverbs(E) rfcomm(E) bluetooth(E) ib_umad(E)
rdma_cm(E) ecdh_generic(E) rfkill(E) ib_cm(E) iw_cm(E) pptp(E) gre(E)
l2tp_ppp(E) l2tp_netlink(E) l2tp_core(E) ip6_udp_tunnel(E) udp_tunnel(E)
pppoe(E) pppox(E) ppp_generic(E) slhc(E) crypto_user(E) ib_core(E)
nfnetlink(E) scsi_transport_iscsi(E) atm(E) sctp(E) dccp_ipv4(E)
netlink_diag(E) dccp_diag(E) ip6table_filter(E) af_packet_diag(E)
unix_diag(E) tcp_diag(E) udp_diag(E) ebtable_filter(E) bridge(E) sg(E)
ibmveth(E) rpadlpar_io(E) loop(E) xt_CHECKSUM(E) iptable_mangle(E)
ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E)
nf_nat_ipv4(E) nf_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E)
xt_conntrack(E) nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) tun(E)
stp(E) llc(E) rpaphp(E) kvm_pr(E) kvm(E) ebtables(E) ip6_tables(E)
dccp(E) inet_diag(E) iptable_filter(E) nfsd(E) auth_rpcgss(E) nfs_acl(E)
lockd(E) grace(E) sunrpc(E) binfmt_misc(E) ip_tables(E) ext4(E)
mbcache(E) jbd2(E) sd_mod(E) ibmvscsi(E) scsi_transport_srp(E) [last
unloaded: netlink_diag]
CPU: 15 PID: 32523 Comm: jbd2/loop1-8 Tainted: G W E
4.13.0-next-20170915-autotest #1
task: c000000189664900 task.stack: c0000009f122c000
NIP: d0000000084b0424 LR: d0000000084a8040 CTR: c000000000373fe0
REGS: c0000009f122f590 TRAP: 0300 Tainted: G W E
(4.13.0-next-20170915-autotest)
MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 28002022
XER: 20000000
CFAR: c000000000008718 DAR: 0000000000000008 DSISR: 40000000 SOFTE: 1
GPR00: d0000000084a8040 c0000009f122f810 d0000000084bf010
c00000093b66a7d0
GPR04: f0000000024ed980 c00000093b66a578 c00000093b66a578
0000000000000001
GPR08: 0000000000000000 0000000000000000 0000000000000000
d0000000084b2738
GPR12: c000000000373fe0 c00000000e939d80 c000000000118350
c0000013f0bc22c0
GPR16: c0000009f65e4000 c0000009f122fb20 0000000000000000
c00000108a749824
GPR20: c0000009f791cf9c 0000000000000000 c0000009f791cf60
d0000000084b6cb0
GPR24: 0000000000000000 c00000108a749880 0000000000000000
c0000009f791d300
GPR28: c00000093b66a7d0 c0000009f122c000 0000000000000000
c0000013f8f389b8
NIP [d0000000084b0424] .jbd2_journal_put_journal_head+0x94/0x22c [jbd2]
LR [d0000000084a8040] .__jbd2_journal_remove_checkpoint+0x70/0x200
[jbd2]
Call Trace:
[c0000009f122f810] [d0000000084b04ec] .jbd2_journal_put_journal_head
+0x15c/0x22c [jbd2] (unreliable)
[c0000009f122f890] [d0000000084a8040] .__jbd2_journal_remove_checkpoint
+0x70/0x200 [jbd2]
[c0000009f122f930] [d0000000084a8b98] .__jbd2_log_wait_for_space
+0x368/0x3d0 [jbd2]
[c0000009f122f9c0]
[d0000000084a8c5c] .__jbd2_journal_clean_checkpoint_list+0x5c/0xc0
[jbd2]
[c0000009f122fa60] [d0000000084a4e2c] .jbd2_journal_commit_transaction
+0x33c/0x2f80 [jbd2]
[c0000009f122fc90] [d0000000084ad160] .jbd2_journal_init_inode
+0x390/0x5a0 [jbd2]
[c0000009f122fd70] [c0000000001184ac] .kthread+0x15c/0x1a0
[c0000009f122fe30] [c00000000000b4e4] .ret_from_kernel_thread+0x58/0x74
Instruction dump:
7c2004ac 7d40f8a8 7d4a4878 7d40f9ad 40c2fff4 38210080 e8010010 ebc1fff0
ebe1fff8 7c0803a6 4e800020 ebdf0040 <e93e000a> 79290fe0 0b090000
e93e0028
---[ end trace bd674540a2bf235c ]---
Kernel panic - not syncing: Fatal exception
--
Regard's
Abdul Haleem
IBM Linux Technology Centre
Abdul Haleem <[email protected]> writes:
> Hi,
>
> next kernel panics when running fsfuzzer test on ext4 file system.
>
> Machine Type: Power 7 PowerVM LPAR
> kernel : 4.13.0-next-20170915
> config : attached
> Test: fsfuzzer
>
> dmesg:
> -----
> EXT4-fs (loop1): couldn't mount RDWR because of unsupported optional
> features (e4000000)
> JBD2: no valid journal superblock found
> EXT4-fs (loop1): error loading journal
> EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: (null)
> EXT4-fs error (device loop1): ext4_mb_generate_buddy:756: group 0, block
> bitmap and bg descriptor inconsistent: 1077 vs 1045 free clusters
> Unable to handle kernel paging request for data at address 0x00000008
> Faulting instruction address: 0xd0000000084b0424
> Oops: Kernel access of bad area, sig: 11 [#1]
> BE SMP NR_CPUS=2048 NUMA pSeries
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in: cramfs(E) dlci(E) 8021q(E) garp(E) mrp(E) af_key(E)
> ieee802154_socket(E) ieee802154(E) rpcrdma(E) ib_isert(E)
> iscsi_target_mod(E) ib_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E)
> ib_srp(E) hidp(E) ib_ipoib(E) cmtp(E) kernelcapi(E) rdma_ucm(E)
> ib_ucm(E) bnep(E) ib_uverbs(E) rfcomm(E) bluetooth(E) ib_umad(E)
> rdma_cm(E) ecdh_generic(E) rfkill(E) ib_cm(E) iw_cm(E) pptp(E) gre(E)
> l2tp_ppp(E) l2tp_netlink(E) l2tp_core(E) ip6_udp_tunnel(E) udp_tunnel(E)
> pppoe(E) pppox(E) ppp_generic(E) slhc(E) crypto_user(E) ib_core(E)
> nfnetlink(E) scsi_transport_iscsi(E) atm(E) sctp(E) dccp_ipv4(E)
> netlink_diag(E) dccp_diag(E) ip6table_filter(E) af_packet_diag(E)
> unix_diag(E) tcp_diag(E) udp_diag(E) ebtable_filter(E) bridge(E) sg(E)
> ibmveth(E) rpadlpar_io(E) loop(E) xt_CHECKSUM(E) iptable_mangle(E)
> ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E)
> nf_nat_ipv4(E) nf_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E)
> xt_conntrack(E) nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) tun(E)
> stp(E) llc(E) rpaphp(E) kvm_pr(E) kvm(E) ebtables(E) ip6_tables(E)
> dccp(E) inet_diag(E) iptable_filter(E) nfsd(E) auth_rpcgss(E) nfs_acl(E)
> lockd(E) grace(E) sunrpc(E) binfmt_misc(E) ip_tables(E) ext4(E)
> mbcache(E) jbd2(E) sd_mod(E) ibmvscsi(E) scsi_transport_srp(E) [last
> unloaded: netlink_diag]
> CPU: 15 PID: 32523 Comm: jbd2/loop1-8 Tainted: G W E
> 4.13.0-next-20170915-autotest #1
These would be easier to read if they weren't word-wrapped.
> task: c000000189664900 task.stack: c0000009f122c000
> NIP: d0000000084b0424 LR: d0000000084a8040 CTR: c000000000373fe0
> REGS: c0000009f122f590 TRAP: 0300 Tainted: G W E > (4.13.0-next-20170915-autotest)
> MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 28002022
> XER: 20000000
> CFAR: c000000000008718 DAR: 0000000000000008 DSISR: 40000000 SOFTE: 1
> GPR00: d0000000084a8040 c0000009f122f810 d0000000084bf010 > c00000093b66a7d0
> GPR04: f0000000024ed980 c00000093b66a578 c00000093b66a578 > 0000000000000001
> GPR08: 0000000000000000 0000000000000000 0000000000000000 > d0000000084b2738
> GPR12: c000000000373fe0 c00000000e939d80 c000000000118350 > c0000013f0bc22c0
> GPR16: c0000009f65e4000 c0000009f122fb20 0000000000000000 > c00000108a749824
> GPR20: c0000009f791cf9c 0000000000000000 c0000009f791cf60 > d0000000084b6cb0
> GPR24: 0000000000000000 c00000108a749880 0000000000000000 > c0000009f791d300
> GPR28: c00000093b66a7d0 c0000009f122c000 0000000000000000 > c0000013f8f389b8
> NIP [d0000000084b0424] .jbd2_journal_put_journal_head+0x94/0x22c [jbd2]
> LR [d0000000084a8040] .__jbd2_journal_remove_checkpoint+0x70/0x200 > [jbd2]
> Call Trace:
> [c0000009f122f810] [d0000000084b04ec] .jbd2_journal_put_journal_head +0x15c/0x22c [jbd2] (unreliable)
> [c0000009f122f890] [d0000000084a8040] .__jbd2_journal_remove_checkpoint +0x70/0x200 [jbd2]
> [c0000009f122f930] [d0000000084a8b98] .__jbd2_log_wait_for_space +0x368/0x3d0 [jbd2]
> [c0000009f122f9c0] [d0000000084a8c5c] .__jbd2_journal_clean_checkpoint_list+0x5c/0xc0 > [jbd2]
> [c0000009f122fa60] [d0000000084a4e2c] .jbd2_journal_commit_transaction +0x33c/0x2f80 [jbd2]
> [c0000009f122fc90] [d0000000084ad160] .jbd2_journal_init_inode +0x390/0x5a0 [jbd2]
> [c0000009f122fd70] [c0000000001184ac] .kthread+0x15c/0x1a0
> [c0000009f122fe30] [c00000000000b4e4] .ret_from_kernel_thread+0x58/0x74
> Instruction dump:
> 7c2004ac 7d40f8a8 7d4a4878 7d40f9ad 40c2fff4 38210080 e8010010 ebc1fff0
> ebe1fff8 7c0803a6 4e800020 ebdf0040 <e93e000a> 79290fe0 0b090000 e93e0028
> ---[ end trace bd674540a2bf235c ]---
The instruction dump is:
lwsync
ldarx r10,0,r31 # probably bit_spin_unlock() in jbd_unlock_bh_journal_head()
andc r10,r10,r9
stdcx. r10,0,r31
bne- 0xfffffffffffffff4
addi r1,r1,128
ld r0,16(r1)
ld r30,-16(r1)
ld r31,-8(r1)
mtlr r0
blr # return
ld r30,64(r31)
lwa r9,8(r30) # <-- POP, r30 is NULL
rldicl r9,r9,1,63
tdnei r9,0 # Most likely a BUG_ON()
ld r9,40(r30)
Perhaps:
J_ASSERT_JH(jh, jh->b_jcount >= 0);
And given:
struct journal_head {
/*
* Points back to our buffer_head. [jbd_lock_bh_journal_head()]
*/
struct buffer_head *b_bh;
/*
* Reference count - see description in journal.c
* [jbd_lock_bh_journal_head()]
*/
int b_jcount;
ie. b_jcount is 8 bytes into the struct, that looks likely.
So we had a jh that was NULL.
Is it reproducible?
cheers
On Wed, 2017-09-20 at 16:44 +1000, Michael Ellerman wrote:
> Abdul Haleem <[email protected]> writes:
>
> > Hi,
> >
> > next kernel panics when running fsfuzzer test on ext4 file system.
> >
> > Machine Type: Power 7 PowerVM LPAR
> > kernel : 4.13.0-next-20170915
> > config : attached
> > Test: fsfuzzer
> >
> > dmesg:
> > -----
> > EXT4-fs (loop1): couldn't mount RDWR because of unsupported optional
> > features (e4000000)
> > JBD2: no valid journal superblock found
> > EXT4-fs (loop1): error loading journal
> > EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: (null)
> > EXT4-fs error (device loop1): ext4_mb_generate_buddy:756: group 0, block
> > bitmap and bg descriptor inconsistent: 1077 vs 1045 free clusters
> > Unable to handle kernel paging request for data at address 0x00000008
> > Faulting instruction address: 0xd0000000084b0424
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > BE SMP NR_CPUS=2048 NUMA pSeries
> > Dumping ftrace buffer:
> > (ftrace buffer empty)
> > Modules linked in: cramfs(E) dlci(E) 8021q(E) garp(E) mrp(E) af_key(E)
> > ieee802154_socket(E) ieee802154(E) rpcrdma(E) ib_isert(E)
> > iscsi_target_mod(E) ib_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E)
> > ib_srp(E) hidp(E) ib_ipoib(E) cmtp(E) kernelcapi(E) rdma_ucm(E)
> > ib_ucm(E) bnep(E) ib_uverbs(E) rfcomm(E) bluetooth(E) ib_umad(E)
> > rdma_cm(E) ecdh_generic(E) rfkill(E) ib_cm(E) iw_cm(E) pptp(E) gre(E)
> > l2tp_ppp(E) l2tp_netlink(E) l2tp_core(E) ip6_udp_tunnel(E) udp_tunnel(E)
> > pppoe(E) pppox(E) ppp_generic(E) slhc(E) crypto_user(E) ib_core(E)
> > nfnetlink(E) scsi_transport_iscsi(E) atm(E) sctp(E) dccp_ipv4(E)
> > netlink_diag(E) dccp_diag(E) ip6table_filter(E) af_packet_diag(E)
> > unix_diag(E) tcp_diag(E) udp_diag(E) ebtable_filter(E) bridge(E) sg(E)
> > ibmveth(E) rpadlpar_io(E) loop(E) xt_CHECKSUM(E) iptable_mangle(E)
> > ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E)
> > nf_nat_ipv4(E) nf_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E)
> > xt_conntrack(E) nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) tun(E)
> > stp(E) llc(E) rpaphp(E) kvm_pr(E) kvm(E) ebtables(E) ip6_tables(E)
> > dccp(E) inet_diag(E) iptable_filter(E) nfsd(E) auth_rpcgss(E) nfs_acl(E)
> > lockd(E) grace(E) sunrpc(E) binfmt_misc(E) ip_tables(E) ext4(E)
> > mbcache(E) jbd2(E) sd_mod(E) ibmvscsi(E) scsi_transport_srp(E) [last
> > unloaded: netlink_diag]
> > CPU: 15 PID: 32523 Comm: jbd2/loop1-8 Tainted: G W E
> > 4.13.0-next-20170915-autotest #1
>
> These would be easier to read if they weren't word-wrapped.
Yes, I will keep it pre formatted next time.
>
> > task: c000000189664900 task.stack: c0000009f122c000
> > NIP: d0000000084b0424 LR: d0000000084a8040 CTR: c000000000373fe0
> > REGS: c0000009f122f590 TRAP: 0300 Tainted: G W E > (4.13.0-next-20170915-autotest)
> > MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 28002022
> > XER: 20000000
> > CFAR: c000000000008718 DAR: 0000000000000008 DSISR: 40000000 SOFTE: 1
> > GPR00: d0000000084a8040 c0000009f122f810 d0000000084bf010 > c00000093b66a7d0
> > GPR04: f0000000024ed980 c00000093b66a578 c00000093b66a578 > 0000000000000001
> > GPR08: 0000000000000000 0000000000000000 0000000000000000 > d0000000084b2738
> > GPR12: c000000000373fe0 c00000000e939d80 c000000000118350 > c0000013f0bc22c0
> > GPR16: c0000009f65e4000 c0000009f122fb20 0000000000000000 > c00000108a749824
> > GPR20: c0000009f791cf9c 0000000000000000 c0000009f791cf60 > d0000000084b6cb0
> > GPR24: 0000000000000000 c00000108a749880 0000000000000000 > c0000009f791d300
> > GPR28: c00000093b66a7d0 c0000009f122c000 0000000000000000 > c0000013f8f389b8
> > NIP [d0000000084b0424] .jbd2_journal_put_journal_head+0x94/0x22c [jbd2]
> > LR [d0000000084a8040] .__jbd2_journal_remove_checkpoint+0x70/0x200 > [jbd2]
> > Call Trace:
> > [c0000009f122f810] [d0000000084b04ec] .jbd2_journal_put_journal_head +0x15c/0x22c [jbd2] (unreliable)
> > [c0000009f122f890] [d0000000084a8040] .__jbd2_journal_remove_checkpoint +0x70/0x200 [jbd2]
> > [c0000009f122f930] [d0000000084a8b98] .__jbd2_log_wait_for_space +0x368/0x3d0 [jbd2]
> > [c0000009f122f9c0] [d0000000084a8c5c] .__jbd2_journal_clean_checkpoint_list+0x5c/0xc0 > [jbd2]
> > [c0000009f122fa60] [d0000000084a4e2c] .jbd2_journal_commit_transaction +0x33c/0x2f80 [jbd2]
> > [c0000009f122fc90] [d0000000084ad160] .jbd2_journal_init_inode +0x390/0x5a0 [jbd2]
> > [c0000009f122fd70] [c0000000001184ac] .kthread+0x15c/0x1a0
> > [c0000009f122fe30] [c00000000000b4e4] .ret_from_kernel_thread+0x58/0x74
> > Instruction dump:
> > 7c2004ac 7d40f8a8 7d4a4878 7d40f9ad 40c2fff4 38210080 e8010010 ebc1fff0
> > ebe1fff8 7c0803a6 4e800020 ebdf0040 <e93e000a> 79290fe0 0b090000 e93e0028
> > ---[ end trace bd674540a2bf235c ]---
>
> The instruction dump is:
>
> lwsync
> ldarx r10,0,r31 # probably bit_spin_unlock() in jbd_unlock_bh_journal_head()
> andc r10,r10,r9
> stdcx. r10,0,r31
> bne- 0xfffffffffffffff4
> addi r1,r1,128
> ld r0,16(r1)
> ld r30,-16(r1)
> ld r31,-8(r1)
> mtlr r0
> blr # return
>
> ld r30,64(r31)
> lwa r9,8(r30) # <-- POP, r30 is NULL
> rldicl r9,r9,1,63
> tdnei r9,0 # Most likely a BUG_ON()
> ld r9,40(r30)
>
> Perhaps:
>
> J_ASSERT_JH(jh, jh->b_jcount >= 0);
>
> And given:
>
> struct journal_head {
> /*
> * Points back to our buffer_head. [jbd_lock_bh_journal_head()]
> */
> struct buffer_head *b_bh;
>
> /*
> * Reference count - see description in journal.c
> * [jbd_lock_bh_journal_head()]
> */
> int b_jcount;
>
>
> ie. b_jcount is 8 bytes into the struct, that looks likely.
>
> So we had a jh that was NULL.
>
>
> Is it reproducible?
No, bug is not seen once for 3 re-run from yesterday.
Hope to hit it again in the future CI runs.
--
Regard's
Abdul Haleem
IBM Linux Technology Centre
On Fri 22-09-17 15:46:21, Abdul Haleem wrote:
> On Wed, 2017-09-20 at 16:44 +1000, Michael Ellerman wrote:
> > Is it reproducible?
>
> No, bug is not seen once for 3 re-run from yesterday.
>
> Hope to hit it again in the future CI runs.
I was also looking into this but I couldn't figure out what has happened.
>From the crash it seems that jh->b_bh->b_private was NULL although it
should point back to 'jh'. But that just doesn't seem possible so maybe it
is something different. If you ever happen to hit it again, please save
the fuzzed image so that it can be used for reproduction. Thanks!
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR