Hi folks,
This is the third version of the patch to fix the softlockup in EDMM iotcl()[1][2].
If we run an enclave equipped with large EPC(30G or greater on my platfrom)
on the Linux with kernel preemptions disabled(by configuring
"CONFIG_PREEMPT_NONE=y"), we will get the following softlockup warning
messages being reported in "dmesg" log:
The EDMM's ioctl()s (sgx_ioc_enclave_{ modify_types | restrict_permissions | remove_pages})
interface provided by kernel support batch changing attributes of enclave's EPC.
If userspace App requests kernel to handle too many EPC pages, kernel
may stuck for a long time(with preemption disabled).
The log is as follows:
------------[ cut here ]------------
[ 901.101294] watchdog: BUG: soft lockup - CPU#92 stuck for 23s! [occlum-run:4289]
[ 901.109617] Modules linked in: veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter br_netfilter bridge stp llc overlay nls_iso8859_1 intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit binfmt_misc ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 pmt_telemetry sha1_ssse3 pmt_class joydev intel_sdsi input_leds aesni_intel crypto_simd cryptd dax_hmem cxl_acpi cmdlinepart rapl cxl_core ast spi_nor intel_cstate drm_shmem_helper einj mtd drm_kms_helper mei_me idxd isst_if_mmio isst_if_mbox_pci isst_if_common intel_vsec idxd_bus mei acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid sch_fq_codel msr parport_pc ppdev lp parport ramoops reed_solomon pstore_blk pstore_zone efi_pstore drm ip_tables x_tables
[ 901.109670] autofs4 mlx5_ib ib_uverbs ib_core hid_generic usbhid hid ses enclosure scsi_transport_sas mlx5_core pci_hyperv_intf mlxfw igb ahci psample i2c_algo_bit i2c_i801 spi_intel_pci xhci_pci tls megaraid_sas dca spi_intel crc32_pclmul i2c_smbus i2c_ismt libahci xhci_pci_renesas wmi pinctrl_emmitsburg
[ 901.109691] CPU: 92 PID: 4289 Comm: occlum-run Not tainted 6.9.0-rc5 #3
[ 901.109693] Hardware name: Inspur NF5468-M7-A0-R0-00/NF5468-M7-A0-R0-00, BIOS 05.02.01 05/08/2023
[ 901.109695] RIP: 0010:sgx_enclave_restrict_permissions+0xba/0x1f0
[ 901.109701] Code: 48 c1 e6 05 48 89 d1 48 8d 5c 24 40 b8 0e 00 00 00 48 2b 8e 70 8e 15 8b 48 c1 e9 05 48 c1 e1 0c 48 03 8e 68 8e 15 8b 0f 01 cf <a9> 00 00 00 40 0f 85 b2 00 00 00 85 c0 0f 85 db 00 00 00 4c 89 ef
[ 901.109702] RSP: 0018:ffffad0ae5d0f8c0 EFLAGS: 00000202
[ 901.109704] RAX: 0000000000000000 RBX: ffffad0ae5d0f900 RCX: ffffad11dfc0e000
[ 901.109705] RDX: ffffad2adcff81c0 RSI: 0000000000000000 RDI: ffff9a12f5f4f000
[ 901.109706] RBP: ffffad0ae5d0f9b0 R08: 0000000000000002 R09: ffff9a1289f57520
[ 901.109707] R10: 000000000000005d R11: 0000000000000002 R12: 00000006d8ff2000
[ 901.109708] R13: ffff9a12f5f4f000 R14: ffffad0ae5d0fa18 R15: ffff9a12f5f4f020
[ 901.109709] FS: 00007fb20ad1d740(0000) GS:ffff9a317fe00000(0000) knlGS:0000000000000000
[ 901.109710] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 901.109711] CR2: 00007f8041811000 CR3: 0000000118530006 CR4: 0000000000770ef0
[ 901.109712] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 901.109713] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 901.109714] PKRU: 55555554
[ 901.109714] Call Trace:
[ 901.109716] <IRQ>
[ 901.109718] ? show_regs+0x67/0x70
[ 901.109722] ? watchdog_timer_fn+0x1f3/0x280
[ 901.109725] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 901.109727] ? __hrtimer_run_queues+0xc8/0x220
[ 901.109731] ? hrtimer_interrupt+0x10c/0x250
[ 901.109733] ? __sysvec_apic_timer_interrupt+0x53/0x130
[ 901.109736] ? sysvec_apic_timer_interrupt+0x7b/0x90
[ 901.109739] </IRQ>
[ 901.109740] <TASK>
[ 901.109740] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 901.109745] ? sgx_enclave_restrict_permissions+0xba/0x1f0
[ 901.109747] ? aa_file_perm+0x145/0x550
[ 901.109750] sgx_ioctl+0x1ab/0x900
[ 901.109751] ? xas_find+0x84/0x200
[ 901.109754] ? sgx_enclave_etrack+0xbb/0x140
[ 901.109756] ? sgx_encl_may_map+0x19a/0x240
[ 901.109758] ? common_file_perm+0x8a/0x1b0
[ 901.109760] ? obj_cgroup_charge_pages+0xa2/0x100
[ 901.109763] ? tlb_flush_mmu+0x31/0x1c0
[ 901.109766] ? tlb_finish_mmu+0x42/0x80
[ 901.109767] ? do_mprotect_pkey+0x150/0x530
[ 901.109769] ? __fget_light+0xc0/0x100
[ 901.109772] __x64_sys_ioctl+0x95/0xd0
[ 901.109775] x64_sys_call+0x1209/0x20c0
[ 901.109777] do_syscall_64+0x6d/0x110
[ 901.109779] ? syscall_exit_to_user_mode+0x86/0x1c0
[ 901.109782] ? do_syscall_64+0x79/0x110
[ 901.109783] ? syscall_exit_to_user_mode+0x86/0x1c0
[ 901.109784] ? do_syscall_64+0x79/0x110
[ 901.109785] ? free_unref_page+0x10e/0x180
[ 901.109788] ? __do_fault+0x36/0x130
[ 901.109791] ? do_pte_missing+0x2e8/0xcc0
[ 901.109792] ? __pte_offset_map+0x1c/0x190
[ 901.109795] ? __handle_mm_fault+0x7b9/0xe60
[ 901.109796] ? __count_memcg_events+0x70/0x100
[ 901.109798] ? handle_mm_fault+0x256/0x360
[ 901.109799] ? do_user_addr_fault+0x3c1/0x860
[ 901.109801] ? irqentry_exit_to_user_mode+0x67/0x190
[ 901.109803] ? irqentry_exit+0x3b/0x50
[ 901.109804] ? exc_page_fault+0x89/0x180
[ 901.109806] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 901.109807] RIP: 0033:0x7fb20b4315cb
[ 901.109810] Code: 0f 1e fa 48 8b 05 c5 78 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 78 0d 00 f7 d8 64 89 01 48
[ 901.109811] RSP: 002b:00007ffc0e7af718 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 901.109812] RAX: ffffffffffffffda RBX: 0000000780000000 RCX: 00007fb20b4315cb
[ 901.109813] RDX: 00007ffc0e7af720 RSI: 00000000c028a405 RDI: 0000000000000005
[ 901.109814] RBP: 0000000000000005 R08: 0000000000000000 R09: 00007ffc0e7af794
[ 901.109815] R10: 00007ffc0e7af7c8 R11: 0000000000000246 R12: 00000000c028a405
[ 901.109815] R13: 00007ffc0e7af720 R14: 0000000780000000 R15: 00007fb20b2ea980
[ 901.109817] </TASK>
------------[ end trace ]------------
We suggest to give up CPU in the ioctl() handler explicitly. I have attached a
patch which can fix such issue.
Appreciate for all the review on previous versions.
Thanks for your time!
Regards,
Bojun
---
v1 -> v2:
- Make the cond_resched() as a prefix op instead of a postfix op (Jarkko)
- Additionly check the pending signal at the begin of every for loop (Jarkko)
- Introduce sgx_check_signal_and_resched() to wrap the pending check
and reschedule operations (Jarkko)
- Refine the trace log in commit() message (Kai)
v2 -> v3:
- Set the default value of "ret" as "-ERESTARTSYS" to avoid redundantly check
whether the current iteration is the first one (Dave)
[1] v1: https://lore.kernel.org/linux-sgx/[email protected]/
[2] v2: https://lore.kernel.org/linux-sgx/[email protected]/
Bojun Zhu (1):
x86/sgx: Explicitly give up the CPU in EDMM's ioctl() to avoid
softlockup
arch/x86/kernel/cpu/sgx/ioctl.c | 40 +++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 12 deletions(-)
base-commit: ed30a4a51bb196781c8058073ea720133a65596f
--
2.25.1