2017-06-24 09:54:28

by Yisheng Xie

Subject: [RFC] memory corruption caused by efi driver?

hi all,

I hit an Oops with linux-3.10. The RIP is sysfs_open_file+0x46/0x2b0 (I will add the full
crash log at the end of this mail).

When I disassembled sysfs_open_file with crash, I found that it happens when opening the file:
/sys/firmware/efi/vars/dbDefault-8be4df61-93ca-11d2-aa0d-00e098032b8c/raw_var

I dumped the kobject and the efivar_entry; both seem to have been corrupted:
crash> struct kobject ffff880464552838
struct kobject {
name = 0x35302d30312d3031 <Address 0x35302d30312d3031 out of bounds>,
entry = {
next = 0x9060d307472632e,
prev = 0x1010df78648862a
},
parent = 0x102820300050b,
kset = 0xf7cecc30ff420835,
ktype = 0x2935586810ad0c76,
sd = 0x4112ef7c27763246,
kref = {
refcount = {
counter = 1243300391
}
},
state_initialized = 0,
state_in_sysfs = 1,
state_add_uevent_sent = 0,
state_remove_uevent_sent = 1,
uevent_suppress = 0
}
crash> p &((struct efivar_entry *)0)->kobj
$1 = (struct kobject *) 0x838
crash> struct efivar_entry -x 0xffff880464552000
struct efivar_entry {
var = {
VariableName = {0x64, 0x62, 0x44, 0x65, 0x66, 0x61, 0x75, 0x6c, 0x74, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...},
VendorGuid = {
b = "a\337\344\213\312\223\322\021\252\r\000\340\230\003+\214"
},
DataSize = 0xc47,
Data = "\241Y\300\245\344\224\247J\207\265\253\025\\+\360r@\006\000\000\000\000\000\000$\006\000\000\275\232\372wY\003\062M\275`(\364\347\217xK0\202\006\020\060\202\003\370\240\003\002\001\002\002\na\b\323\304\000\000\000\000\000\004\060\r\006\t*\206H\206\367\r\001\001\v\005\000\060\201\221\061\v0\t\006\003U\004\006\023\002US1\023\060\021\006\003U\004\b\023\nWashington1\020\060\016\006\003U\004\a\023\aRedmond1\036\060\034\006\003U\004\n\023\025Microsoft Corporation1;09\006\003U\004\003\023\062Microsoft Corporation Third Party Marketplace Root0\036\027\r110627212245Z\027\r2606272"...,
Status = 0x7265632f696b702f,
Attributes = 0x4d2f7374
},
list = {
next = 0x4d72615069685472,
prev = 0x30325f6f6f527261
},
kobj = {
name = 0x35302d30312d3031 <Address 0x35302d30312d3031 out of bounds>,
entry = {
next = 0x9060d307472632e,
prev = 0x1010df78648862a
},
parent = 0x102820300050b,
kset = 0xf7cecc30ff420835,
ktype = 0x2935586810ad0c76,
sd = 0x4112ef7c27763246,
kref = {
refcount = {
counter = 0x4a1b4227
}
},
state_initialized = 0x0,
state_in_sysfs = 0x1,
state_add_uevent_sent = 0x0,
state_remove_uevent_sent = 0x1,
uevent_suppress = 0x0
},
scanning = 0x48,
deleting = 0x59
}


Any ideas?

Any comments would be appreciated!

Thanks
Yisheng Xie

Detailed log:
------
[12476.033560] general protection fault: 0000 [#1] SMP
[12476.039247] kbox catch die event.
[12476.058628] collected_len = 154965, LOG_BUF_LEN_LOCAL = 1048576
[12476.121740] kbox: notify die begin
[12476.125632] kbox: no notify die func register. no need to notify
[12476.132414] do nothing after die!
[12476.136184] Modules linked in: loop binfmt_misc kboxdriver(O) kbox(O) kernel_log_dev(OE) signo_catch(O) bsp_cpld_lpc(OVE) vfat fat intel_powerclamp coretemp intel_rapl crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg i2c_i801 pcspkr shpchp i2c_hid video wmi acpi_pad ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic igb crct10dif_pclmul crct10dif_common i2c_algo_bit ahci i2c_core libahci dca crc32c_intel libata ptp pps_core 8250_dw intel_lpss_module mfd_core [last unloaded: gen_timer]
[12476.191525] CPU: 3 PID: 11257 Comm: cat Tainted: G WC OE ----V------- 3.10.0-327.53.58.73.x86_64 #1
[12476.202708] Hardware name: Default string Default string/SKYBAY, BIOS 5.11 05/05/2017
[12476.211528] task: ffff880315ea5080 ti: ffff88045e530000 task.ti: ffff88045e530000
[12476.219965] RIP: 0010:[<ffffffff812601a6>] [<ffffffff812601a6>] sysfs_open_file+0x46/0x2b0
[12476.229452] RSP: 0018:ffff88045e533c78 EFLAGS: 00010202
[12476.235505] RAX: 2935586810ad0c76 RBX: ffff88043e693e00 RCX: ffff88046451b694
[12476.243560] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88046451b690
[12476.251647] RBP: ffff88045e533ca0 R08: 0000000000000000 R09: 0000000000000000
[12476.259700] R10: 0b90000000000000 R11: ffff880466920780 R12: ffff88042c0094d0
[12476.267752] R13: ffff88046451b690 R14: ffff88042c0094d0 R15: ffff880464552838
[12476.275806] FS: 00007f3e56a96740(0000) GS:ffff88047e4c0000(0000) knlGS:0000000000000000
[12476.285001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12476.291532] CR2: 00007f3e5659aa80 CR3: 000000043e7e8000 CR4: 00000000003407e0
[12476.299621] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[12476.307672] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[12476.315725] Stack:
[12476.318052] ffff88043e693e00 ffff88042c0094d0 ffff880036cff0c0 0000000000000000
[12476.326565] ffff88043e693e10 ffff88045e533ce8 ffffffff811e15c7 ffff88042c0094d0
[12476.335079] ffffffff81260160 ffff88045e533f28 0000000000008000 ffff88045e533df0
[12476.343599] Call Trace:
[12476.346443] [<ffffffff811e15c7>] do_dentry_open+0x1a7/0x2e0
[12476.352887] [<ffffffff81260160>] ? sysfs_schedule_callback+0x1c0/0x1c0
[12476.360429] [<ffffffff811e17f9>] vfs_open+0x39/0x70
[12476.366105] [<ffffffff811f2c3d>] do_last+0x1ed/0x12a0
[12476.373605] [<ffffffff81300422>] ? radix_tree_lookup_slot+0x22/0x50
[12476.380851] [<ffffffff811f3db2>] path_openat+0xc2/0x490
[12476.386906] [<ffffffff811f557b>] do_filp_open+0x4b/0xb0
[12476.393769] [<ffffffff81202177>] ? __alloc_fd+0xa7/0x130
[12476.399913] [<ffffffff811e2cc3>] do_sys_open+0xf3/0x1f0
[12476.405972] [<ffffffff811e2dde>] SyS_open+0x1e/0x20
[12476.411650] [<ffffffff81650a49>] system_call_fastpath+0x16/0x1b
[12476.418472] Code: f3 4c 8b 68 78 49 8b 45 08 4c 89 ef 4c 8b 78 48 e8 20 09 00 00 48 85 c0 0f 84 47 02 00 00 49 8b 47 28 48 85 c0 0f 84 ba 01 00 00 <4c> 8b 60 08 4d 85 e4 0f 84 ad 01 00 00 8b 43 44 a8 02 74 2e 41
[12476.442610] RIP [<ffffffff812601a6>] sysfs_open_file+0x46/0x2b0
[12476.449436] RSP <ffff88045e533c78>
[12476.453750] ---[ end trace 3f2d7ee3bfcdead8 ]---
[12476.453752] Kernel panic - not syncing: Fatal exception


2017-06-24 11:12:13

by Greg Kroah-Hartman

Subject: Re: [RFC] memory corruption caused by efi driver?

On Sat, Jun 24, 2017 at 05:52:23PM +0800, Yisheng Xie wrote:
> hi all,
>
> I hit an Oops with linux-3.10. The RIP is sysfs_open_file+0x46/0x2b0 (I will add the full
> crash log at the end of this mail).

3.10 is _very_ old and obsolete, can you duplicate this on a modern
kernel, like 4.11?

thanks,

greg k-h

2017-06-25 13:07:28

by Xishi Qiu

Subject: Re: [RFC] memory corruption caused by efi driver?

On 2017/6/24 19:12, Greg KH wrote:

> On Sat, Jun 24, 2017 at 05:52:23PM +0800, Yisheng Xie wrote:
>> hi all,
>>
>> I hit an Oops with linux-3.10. The RIP is sysfs_open_file+0x46/0x2b0 (I will add the full
>> crash log at the end of this mail).
>
> 3.10 is _very_ old and obsolete, can you duplicate this on a modern
> kernel, like 4.11?
>
> thanks,
>
> greg k-h
>

Hi, if I disable CONFIG_EFI_VARS, it seems OK now.

And I can't reproduce the problem on mainline (v4.12).

Here is my test: run some stress tests, then
cat /sys/firmware/efi/efivars/*
or
cat /sys/firmware/efi/vars/*/*

1) 3.10, get warning
CONFIG_EFI_VARS=y
CONFIG_EFIVAR_FS=y

2) 3.10, get warning
CONFIG_EFI_VARS=y
CONFIG_EFIVAR_FS=n

3) 3.10, ok
CONFIG_EFI_VARS=n
CONFIG_EFIVAR_FS=y

4) mainline, ok
CONFIG_EFI_VARS=y
CONFIG_EFIVAR_FS=y

log:
[78872.389117] WARNING: at fs/sysfs/file.c:343 sysfs_open_file+0x222/0x2b0()
[78872.389118] missing sysfs attribute operations for kobject: (null)
[78872.389177] Modules linked in: gen_timer(OVE) tun zram(C) ext4 jbd2 mbcache loop regmap_i2c binfmt_misc scsi_transport_iscsi cfg80211 ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg iTCO_wdt ipmi_devintf iTCO_vendor_support vfat fat intel_powerclamp coretemp kvm_intel kvm nfsd crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ipmi_ssif aesni_intel lrw gf128mul auth_rpcgss glue_helper ablk_helper i7core_edac nfs_acl cryptd lpc_ich pcspkr
[78872.389197] ipmi_si i2c_i801 edac_core shpchp mfd_core lockd ipmi_msghandler acpi_cpufreq grace sunrpc uinput xfs libcrc32c sd_mod sr_mod crc_t10dif cdrom crct10dif_common ixgbe igb ahci mdio libahci ptp i2c_algo_bit pps_core libata i2c_core megaraid_sas dca dm_mirror dm_region_hash dm_log dm_mod [last unloaded: gen_timer]
[78872.389202] CPU: 52 PID: 28434 Comm: cat Tainted: G WC OE ----V------- 3.10.0-327.55.58.81.x86_64 #2
[78872.389204] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Tecal RH5885 V2/CH91RGPUC, BIOS RGPUC-BIOS-V058 06/23/2013
[78872.389207] ffff88200a61fc10 00000000df10e27d ffff88200a61fbc8 ffffffff8163ed14
[78872.389208] ffff88200a61fc00 ffffffff8107b300 00000000fffffff3 ffff88103f6473a0
[78872.389209] ffff8880236cb700 ffff88103f6473a0 ffff8860281d8838 ffff88200a61fc68
[78872.389210] Call Trace:
[78872.389224] [<ffffffff8163ed14>] dump_stack+0x19/0x1b
[78872.389233] [<ffffffff8107b300>] warn_slowpath_common+0x70/0xb0
[78872.389234] [<ffffffff8107b39c>] warn_slowpath_fmt+0x5c/0x80
[78872.389236] [<ffffffff8125f1d2>] sysfs_open_file+0x222/0x2b0
[78872.389242] [<ffffffff811e0167>] do_dentry_open+0x1a7/0x2e0
[78872.389244] [<ffffffff8125efb0>] ? sysfs_schedule_callback+0x1c0/0x1c0
[78872.389245] [<ffffffff811e0399>] vfs_open+0x39/0x70
[78872.389251] [<ffffffff811f183d>] do_last+0x1ed/0x12a0
[78872.389259] [<ffffffff811c4ffe>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
[78872.389261] [<ffffffff811f29b2>] path_openat+0xc2/0x490
[78872.389267] [<ffffffff8112786d>] ? call_rcu_sched+0x1d/0x20
[78872.389275] [<ffffffff8118484d>] ? shmem_destroy_inode+0x2d/0x40
[78872.389281] [<ffffffff811fe4c6>] ? evict+0x106/0x170
[78872.389283] [<ffffffff811f417b>] do_filp_open+0x4b/0xb0
[78872.389286] [<ffffffff81200d97>] ? __alloc_fd+0xa7/0x130
[78872.389290] [<ffffffff811e1863>] do_sys_open+0xf3/0x1f0
[78872.389291] [<ffffffff811e197e>] SyS_open+0x1e/0x20
[78872.389297] [<ffffffff8164f109>] system_call_fastpath+0x16/0x1b
[78872.389298] ---[ end trace cbe34632be0fdedf ]---
[78872.390067] ------------[ cut here ]------------

2017-06-25 13:32:01

by Greg Kroah-Hartman

Subject: Re: [RFC] memory corruption caused by efi driver?

On Sun, Jun 25, 2017 at 09:06:58PM +0800, Xishi Qiu wrote:
> On 2017/6/24 19:12, Greg KH wrote:
>
> > On Sat, Jun 24, 2017 at 05:52:23PM +0800, Yisheng Xie wrote:
> >> hi all,
> >>
> >> I hit an Oops with linux-3.10. The RIP is sysfs_open_file+0x46/0x2b0 (I will add the full
> >> crash log at the end of this mail).
> >
> > 3.10 is _very_ old and obsolete, can you duplicate this on a modern
> > kernel, like 4.11?
> >
> > thanks,
> >
> > greg k-h
> >
>
> Hi, if I disable CONFIG_EFI_VARS, it seems OK now.
>
> And I can't reproduce the problem on mainline (v4.12).
>
> Here is my test: run some stress tests, then
> cat /sys/firmware/efi/efivars/*
> or
> cat /sys/firmware/efi/vars/*/*
>
> 1) 3.10, get warning
> CONFIG_EFI_VARS=y
> CONFIG_EFIVAR_FS=y
>
> 2) 3.10, get warning
> CONFIG_EFI_VARS=y
> CONFIG_EFIVAR_FS=n
>
> 3) 3.10, ok
> CONFIG_EFI_VARS=n
> CONFIG_EFIVAR_FS=y
>
> 4) mainline, ok
> CONFIG_EFI_VARS=y
> CONFIG_EFIVAR_FS=y

Then use mainline :)