vfree() might sleep if called not in interrupt context.
So does kvfree() too. Fix misleading kvfree()'s comment about
allowed context.
Fixes: 04b8e946075d ("mm/util.c: improve kvfree() kerneldoc")
Signed-off-by: Andrey Ryabinin <[email protected]>
---
mm/util.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index eeac38a64290..7f1f165f46af 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -442,7 +442,7 @@ EXPORT_SYMBOL(kvmalloc_node);
* It is slightly more efficient to use kfree() or vfree() if you are certain
* that you know which one to use.
*
- * Context: Any context except NMI.
+ * Context: Either preemptible task context or not-NMI interrupt.
*/
void kvfree(const void *addr)
{
--
2.16.4
Add might_sleep() calls to vfree(), kvfree() to catch potential
sleep-in-atomic bugs earlier.
Signed-off-by: Andrey Ryabinin <[email protected]>
---
mm/util.c | 2 ++
mm/vmalloc.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/mm/util.c b/mm/util.c
index 7f1f165f46af..929ed1795bc1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -446,6 +446,8 @@ EXPORT_SYMBOL(kvmalloc_node);
*/
void kvfree(const void *addr)
{
+ might_sleep_if(!in_interrupt());
+
if (is_vmalloc_addr(addr))
vfree(addr);
else
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d00d42d6bf79..97d4b25d0373 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1587,6 +1587,8 @@ void vfree(const void *addr)
kmemleak_free(addr);
+ might_sleep_if(!in_interrupt());
+
if (!addr)
return;
if (unlikely(in_interrupt()))
--
2.16.4
vfree() might sleep if called not in interrupt context. Explain
that in the comment.
Signed-off-by: Andrey Ryabinin <[email protected]>
---
mm/vmalloc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a728fc492557..d00d42d6bf79 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1577,6 +1577,8 @@ void vfree_atomic(const void *addr)
* have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, but making the calling
* conventions for vfree() arch-depenedent would be a really bad idea)
*
+ * May sleep if called *not* from interrupt context.
+ *
* NOTE: assumes that the object at @addr has a size >= sizeof(llist_node)
*/
void vfree(const void *addr)
--
2.16.4
FYI, we noticed the following commit (built with gcc-6):
commit: a79ed8bfb24e899aa55de42703ae4508ff016311 ("[PATCH 3/3] vfree, kvfree: Add debug might sleeps.")
url: https://github.com/0day-ci/linux/commits/Andrey-Ryabinin/kvfree-Fix-misleading-comment/20180915-094734
in testcase: trinity
with following parameters:
runtime: 300s
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
on test machine: qemu-system-i386 -enable-kvm -cpu Haswell,+smep,+smap -m 384M
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
+----------------------------------------------------------------+------------+------------+
| | d835757305 | a79ed8bfb2 |
+----------------------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 12 | 12 |
| INFO:trying_to_register_non-static_key | 12 | 12 |
| WARNING:at_mm/slab_common.c:#kmalloc_slab | 12 | 12 |
| EIP:kmalloc_slab | 12 | 12 |
| Mem-Info | 12 | 12 |
| WARNING:at_lib/debugobjects.c:#__debug_object_init | 12 | 12 |
| EIP:__debug_object_init | 12 | 12 |
| WARNING:at_arch/x86/mm/dump_pagetables.c:#note_page | 8 | 8 |
| EIP:note_page | 8 | 8 |
| BUG:sleeping_function_called_from_invalid_context_at_mm/util.c | 0 | 12 |
+----------------------------------------------------------------+------------+------------+
[ 3.265372] BUG: sleeping function called from invalid context at mm/util.c:449
[ 3.288552] in_atomic(): 0, irqs_disabled(): 0, pid: 142, name: rhashtable_thra
[ 3.301548] INFO: lockdep is turned off.
[ 3.302214] Preemption disabled at:
[ 3.302221] [<c163e86f>] get_random_u32+0x4f/0x100
[ 3.327556] CPU: 0 PID: 142 Comm: rhashtable_thra Tainted: G W T 4.19.0-rc3-00266-ga79ed8bf #656
[ 3.328540] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 3.328540] Call Trace:
[ 3.328540] ? dump_stack+0x55/0x7b
[ 3.328540] ? get_random_u32+0x4f/0x100
[ 3.328540] ? ___might_sleep+0x11d/0x170
[ 3.328540] ? kvfree+0x61/0x70
[ 3.328540] ? bucket_table_free+0x18/0x80
[ 3.328540] ? bucket_table_alloc+0x79/0x160
[ 3.328540] ? rhashtable_insert_slow+0x25d/0x2d0
[ 3.328540] ? insert_retry+0x1df/0x320
[ 3.328540] ? threadfunc+0xa3/0x3fe
[ 3.328540] ? kzalloc+0x14/0x14
[ 3.328540] ? _raw_spin_unlock_irqrestore+0x30/0x50
[ 3.328540] ? kthread+0xd1/0x100
[ 3.328540] ? insert_retry+0x320/0x320
[ 3.328540] ? kthread_delayed_work_timer_fn+0x80/0x80
[ 3.328540] ? ret_from_fork+0x2e/0x38
[ 5.372143] test 3125 add/delete pairs into rhlist
[ 5.403746] test 3125 random rhlist add/delete operations
[ 5.424742] Started 10 threads, 0 failed, rhltable test returns 0
[ 5.428940] test passed
[ 5.429422] test_printf: crng possibly not yet initialized. plain 'p' buffer contains "(ptrval)"
[ 5.429582] test_printf: all 240 tests passed
[ 5.431911] test_bitmap: test 13: input is '0-2047:128/256' OK, Time: 1362
[ 5.437605] test_bitmap: all 1524 tests passed
[ 5.439117] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
[ 5.439954] crc32: self tests passed, processed 225944 bytes in 182273 nsec
[ 5.441271] crc32c: CRC_LE_BITS = 64
[ 5.441916] crc32c: self tests passed, processed 225944 bytes in 91343 nsec
[ 5.461018] crc32_combine: 8373 self tests passed
[ 5.484112] crc32c_combine: 8373 self tests passed
[ 5.485066] rbtree testing
[ 5.493229] -> test 1 (latency of nnodes insert+delete): 21966 cycles
[ 5.500638] -> test 2 (latency of nnodes cached insert+delete): 15776 cycles
[ 5.502454] -> test 3 (latency of inorder traversal): 1708 cycles
[ 5.503461] -> test 4 (latency to fetch first node)
[ 5.504259] non-cached: 24 cycles
[ 5.504920] cached: 0 cycles
[ 5.556187] augmented rbtree testing
[ 5.566165] -> test 1 (latency of nnodes insert+delete): 26854 cycles
[ 5.575698] -> test 2 (latency of nnodes cached insert+delete): 21175 cycles
[ 5.635774] gpio_it87: no device
[ 5.638713] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[ 5.639917] ACPI: Power Button [PWRF]
[ 5.672299] r3964: Philips r3964 Driver $Revision: 1.10 $
[ 5.673230] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 5.706897] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 5.739763] 00:06: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
[ 5.741939] Non-volatile memory driver v1.3
[ 5.742810] platform pc8736x_gpio.0: NatSemi pc8736x GPIO Driver Initializing
[ 5.743991] platform pc8736x_gpio.0: no device found
[ 5.744831] nsc_gpio initializing
[ 5.745389] telclk_interrupt = 0xf non-mcpbl0010 hw.
[ 5.746210] smapi::smapi_init, ERROR invalid usSmapiID
[ 5.747395] mwave: tp3780i::tp3780I_InitializeBoardData: Error: SMAPI is not available on this machine
[ 5.748867] mwave: mwavedd::mwave_init: Error: Failed to initialize board data
[ 5.750018] mwave: mwavedd::mwave_init: Error: Failed to initialize
[ 5.754673] Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
[ 5.757691] dummy-irq: no IRQ given. Use irq=N
[ 5.758451] lkdtm: No crash points registered, enable through debugfs
[ 5.759598] Silicon Labs C2 port support v. 0.51.0 - (C) 2007 Rodolfo Giometti
[ 5.760838] c2port c2port0: C2 port uc added
[ 5.761530] c2port c2port0: uc flash has 30 blocks x 512 bytes (15360 bytes total)
[ 5.764561] Uniform Multi-Platform E-IDE driver
[ 5.765335] ide_generic: please use "probe_mask=0x3f" module parameter for probing all legacy ISA IDE ports
[ 5.777597] Rounding down aligned max_sectors from 4294967295 to 4294967288
[ 5.783572] db_root: cannot open: /etc/target
[ 5.784598] HSI/SSI char device loaded
[ 5.785210] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[ 5.787551] e1000: Copyright (c) 1999-2006 Intel Corporation.
[ 5.925519] PCI Interrupt Link [LNKC] enabled at IRQ 11
[ 6.275394] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
[ 6.276503] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
[ 6.277645] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 6.278591] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 6.279569] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[ 6.280684] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 6.281610] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 5.1.0-k
[ 6.282850] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[ 6.287717] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 6.289972] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 6.291592] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 6.387914] mousedev: PS/2 mouse device common for all mice
[ 6.388921] evbug: Connected device: input0 (Power Button at LNXPWRBN/button/input0)
[ 6.390631] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[ 6.392093] evbug: Connected device: input1 (AT Translated Set 2 keyboard at isa0060/serio0/input0)
[ 6.394182] rtc_cmos 00:00: RTC can wake from S4
[ 6.397608] rtc_cmos 00:00: registered as rtc0
[ 6.398375] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram, hpet irqs
[ 6.400602] rtc-test rtc-test.0: registered as rtc1
[ 6.401534] rtc-test rtc-test.1: registered as rtc2
[ 6.402471] rtc-test rtc-test.2: registered as rtc3
[ 6.403374] i2c /dev entries driver
[ 6.404051] i2c-parport-light: adapter type unspecified
[ 6.404994] Driver for 1-wire Dallas network protocol.
[ 6.405923] w1_f0d_init()
[ 6.406772] intel_powerclamp: CPU does not support MWAIT
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
Thanks,
Rong Chen
On 09/18/2018 11:52 AM, kernel test robot wrote:
>
> [ 3.265372] BUG: sleeping function called from invalid context at mm/util.c:449
> [ 3.288552] in_atomic(): 0, irqs_disabled(): 0, pid: 142, name: rhashtable_thra
> [ 3.301548] INFO: lockdep is turned off.
> [ 3.302214] Preemption disabled at:
> [ 3.302221] [<c163e86f>] get_random_u32+0x4f/0x100
> [ 3.327556] CPU: 0 PID: 142 Comm: rhashtable_thra Tainted: G W T 4.19.0-rc3-00266-ga79ed8bf #656
> [ 3.328540] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 3.328540] Call Trace:
> [ 3.328540] ? dump_stack+0x55/0x7b
> [ 3.328540] ? get_random_u32+0x4f/0x100
> [ 3.328540] ? ___might_sleep+0x11d/0x170
> [ 3.328540] ? kvfree+0x61/0x70
> [ 3.328540] ? bucket_table_free+0x18/0x80
> [ 3.328540] ? bucket_table_alloc+0x79/0x160
> [ 3.328540] ? rhashtable_insert_slow+0x25d/0x2d0
> [ 3.328540] ? insert_retry+0x1df/0x320
> [ 3.328540] ? threadfunc+0xa3/0x3fe
> [ 3.328540] ? kzalloc+0x14/0x14
> [ 3.328540] ? _raw_spin_unlock_irqrestore+0x30/0x50
> [ 3.328540] ? kthread+0xd1/0x100
> [ 3.328540] ? insert_retry+0x320/0x320
> [ 3.328540] ? kthread_delayed_work_timer_fn+0x80/0x80
> [ 3.328540] ? ret_from_fork+0x2e/0x38
Seems like we need to drop might_sleep_if() from kvfree().
rcu_read_lock()
rhashtable_insert_rehash()
new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC | __GFP_NOWARN);
->kvmalloc();
bucket_table_free(new_tbl);
->kvfree()
rcu_read_unlock()
kvmalloc(..., GFP_ATOMIC) simply always kmalloc:
if ((flags & GFP_KERNEL) != GFP_KERNEL)
return kmalloc_node(size, flags, node);
So in the above case, kvfree() always frees kmalloced memory -> and never calls vfree().
Signed-off-by: Andrey Ryabinin <[email protected]>
---
mm/util.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/util.c b/mm/util.c
index 929ed1795bc1..7f1f165f46af 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -446,8 +446,6 @@ EXPORT_SYMBOL(kvmalloc_node);
*/
void kvfree(const void *addr)
{
- might_sleep_if(!in_interrupt());
-
if (is_vmalloc_addr(addr))
vfree(addr);
else
--