2021-01-05 07:08:07

by Chris Goldsworthy

Subject: [PATCH v2] Resolve LRU page-pinning issue for file-backed pages

It is possible for file-backed pages to end up in a CMA (Contiguous Memory
Allocator) area, so that the page must be migrated using the .migratepage()
callback when its backing physical memory is selected for use in a CMA
allocation (through cma_alloc()). However, if the address space operations
(AOPs) for a file-backed page lack a .migratepage() callback,
fallback_migrate_page() is used instead. It calls try_to_release_page(),
which calls try_to_free_buffers() either directly or through the
filesystem's .releasepage() callback, and try_to_free_buffers() in turn
calls drop_buffers().
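
The fallback decision described above can be modeled in user space as
follows. This is a minimal sketch, not kernel code. struct aops_model,
enum release_path and model_migrate() are hypothetical names. The sketch
assumes that .migratepage() always succeeds and that the fallback
succeeds only when drop_buffers() finds no busy buffer_head on the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which step ended up handling the page (illustration only). */
enum release_path {
	PATH_MIGRATEPAGE,         /* the mapping's own .migratepage() */
	PATH_RELEASEPAGE,         /* fallback: the filesystem's .releasepage() */
	PATH_TRY_TO_FREE_BUFFERS, /* fallback: try_to_free_buffers() directly */
};

/* Hypothetical stand-in for the relevant address_space_operations. */
struct aops_model {
	bool has_migratepage;
	bool has_releasepage;
};

/*
 * Returns true if the page could be migrated. Assumes .migratepage()
 * succeeds, and that the fallback succeeds only if no buffer is busy.
 */
bool model_migrate(const struct aops_model *aops, bool any_bh_busy,
		   enum release_path *path)
{
	assert(aops != NULL && path != NULL);

	if (aops->has_migratepage) {
		*path = PATH_MIGRATEPAGE;
		return true;
	}
	/* fallback_migrate_page() -> try_to_release_page() */
	*path = aops->has_releasepage ? PATH_RELEASEPAGE
				      : PATH_TRY_TO_FREE_BUFFERS;
	/* Assumed: both routes end in try_to_free_buffers() -> drop_buffers(). */
	return !any_bh_busy;
}
```

For example, a mapping without .migratepage() whose page has a busy
buffer_head takes the fallback path, and migration fails.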

drop_buffers() can fail if any buffer_head attached to the page is busy.
However, a buffer_head may be busy only because it is cached on a per-CPU
buffer_head LRU list, which holds a reference to it. In that case, removing
the buffer_head from the LRU list allows the page to be released. This
patch does that.

v1: https://lore.kernel.org/lkml/[email protected]/T/#m3a44b5745054206665455625ccaf27379df8a190
Original version of the patch (updated to account for changes to
on_each_cpu_cond()).

v2: Follow Matthew Wilcox's suggestion to reduce the number of calls to
on_each_cpu_cond() by iterating over a page's busy buffer_heads inside
on_each_cpu_cond(). To quote his e-mail, we go from:

for_each_buffer
    for_each_cpu
        for_each_lru_entry

to:

for_each_cpu
    for_each_buffer
        for_each_lru_entry
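
Here is a minimal user-space sketch of the reordered loops. It assumes a
hypothetical 4-CPU system and uses a plain 2D array in place of the
per-CPU bh_lrus variable. In the patch, each CPU runs __evict_bhs_lru()
on its own LRU through on_each_cpu_cond(). Here the outer loop simulates
that, so all of a page's busy buffers are handled in one pass instead of
one cross-CPU call per buffer. The names bh_model, lrus and
evict_busy_bhs() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count */
#define LRU_SIZE 16       /* mirrors BH_LRU_SIZE in fs/buffer.c */

struct bh_model { int b_count; };

/* One LRU per CPU; each non-NULL slot holds one reference on its buffer. */
struct bh_model *lrus[NR_CPUS_MODEL][LRU_SIZE];

/*
 * Reordered loops: a single pass over the CPUs (one on_each_cpu_cond()
 * call in the patch) handles every busy buffer of the page.
 * Returns the number of LRU references dropped.
 */
int evict_busy_bhs(struct bh_model *const *busy, size_t nr_busy)
{
	int dropped = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {          /* for_each_cpu */
		for (size_t i = 0; i < nr_busy; i++) {           /* for_each_buffer */
			for (int j = 0; j < LRU_SIZE; j++) {     /* for_each_lru_entry */
				if (lrus[cpu][j] == busy[i]) {
					assert(busy[i]->b_count > 0);
					busy[i]->b_count--;      /* like brelse() */
					lrus[cpu][j] = NULL;
					dropped++;
					break;
				}
			}
		}
	}
	return dropped;
}
```

The same buffer can sit in several CPUs' LRUs at once, and each entry
holds its own reference, so each one has to be dropped.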

The busy buffer_heads are collected in an xarray, which I found to be the
cleanest data structure to use, though a pre-allocated array of
page_size(page) / bh->b_size elements might perform better.
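
The pre-allocated array alternative might look roughly like the sketch
below. It walks the page's circular b_this_page list and stores busy
buffers in a caller-supplied array. It assumes 4 KiB pages and the
512-byte minimum block size, which gives at most 4096 / 512 = 8 buffers
per page. bh_model and collect_busy() are hypothetical names. The busy
test mirrors buffer_busy(): a buffer counts as busy if it has a non-zero
reference count or is dirty or locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE  4096u  /* assumption: 4 KiB pages */
#define MIN_BLOCK_SIZE   512u   /* smallest buffer_head block size */
#define MAX_BHS_PER_PAGE (MODEL_PAGE_SIZE / MIN_BLOCK_SIZE)

struct bh_model {
	struct bh_model *b_this_page;  /* next buffer on the page (circular) */
	int b_count;                   /* reference count */
	bool dirty;
	bool locked;
};

/* Mirrors buffer_busy(): referenced, dirty or locked. */
static bool model_buffer_busy(const struct bh_model *bh)
{
	return bh->b_count != 0 || bh->dirty || bh->locked;
}

/*
 * Stores pointers to the page's busy buffers in out[0..cap) and
 * returns how many were found.
 */
size_t collect_busy(struct bh_model *head, struct bh_model **out, size_t cap)
{
	struct bh_model *bh = head;
	size_t n = 0;

	assert(head != NULL);
	do {
		if (model_buffer_busy(bh) && n < cap)
			out[n++] = bh;
		bh = bh->b_this_page;
	} while (bh != head);
	return n;
}
```

A page carries exactly page_size(page) / bh->b_size buffers, so an array
with that many slots cannot overflow. The n < cap check is only a guard.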

Laura Abbott (1):
fs/buffer.c: Revoke LRU when trying to drop buffers

fs/buffer.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
fs/internal.h | 5 ++++
2 files changed, 85 insertions(+), 5 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


2021-01-05 07:09:56

by Chris Goldsworthy

Subject: [PATCH v2] fs/buffer.c: Revoke LRU when trying to drop buffers

From: Laura Abbott <[email protected]>

When a buffer is added to the LRU list, a reference is taken that is
not dropped until the buffer is evicted from the LRU list. This is the
correct behavior; however, this LRU reference prevents the buffer from
being dropped until the buffer is selected for eviction. There is no
bound on the time a buffer spends on the LRU list, so the buffer may be
undroppable for very long periods of time. Since migration involves
dropping buffers, the associated page is then unmigratable for long
periods of time as well. CMA relies on being able to migrate a specific
range of pages, so these failures make CMA significantly less reliable,
especially under heavy filesystem usage.
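
The following user-space model shows why an LRU entry can pin a buffer
indefinitely. It is loosely modeled on bh_lru_install() in fs/buffer.c
and assumes a single CPU and a BH_LRU_SIZE of 16. A newly installed
buffer goes to the front and takes a reference. An entry's reference is
released only when the entry is pushed off the end, which takes 16
further distinct installs on that CPU. If those lookups never happen,
the reference is never dropped. The types and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LRU_SIZE 16  /* mirrors BH_LRU_SIZE in fs/buffer.c */

struct bh_model {
	int b_count;  /* reference count */
};

struct lru_model {
	struct bh_model *bhs[LRU_SIZE];  /* bhs[0] is the most recent entry */
};

/* Install bh at the front, shifting older entries towards the end. */
void lru_install(struct lru_model *b, struct bh_model *bh)
{
	struct bh_model *evictee = bh;

	for (int i = 0; i < LRU_SIZE; i++) {
		/* swap(evictee, b->bhs[i]) */
		struct bh_model *tmp = b->bhs[i];

		b->bhs[i] = evictee;
		evictee = tmp;
		if (evictee == bh)
			return;  /* already cached: moved to the front, no new ref */
	}

	bh->b_count++;  /* get_bh(): the LRU now holds a reference */
	if (evictee) {
		assert(evictee->b_count > 0);
		evictee->b_count--;  /* brelse() on the entry pushed off the end */
	}
}
```

In this model, a buffer installed once keeps b_count at 1 through 15
further distinct installs. Only the 16th drops it to zero.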

Rather than waiting for the LRU algorithm to eventually kick out
the buffer, explicitly remove the buffer from the LRU list when trying
to drop it. The buffer could still be added back to the list, but that
would indicate the buffer is still in use, in which case other 'in use'
indicators would probably prevent it from being dropped anyway.
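
Below is a simplified user-space model of the resulting drop logic. It
scans the page's buffers, drops any LRU reference held on a busy one,
and then re-checks every buffer. For brevity it uses a single LRU and
evicts during the first scan, rather than collecting busy buffers first
as the patch does. bh_model, lru and try_drop() are hypothetical names.
The re-check loop advances bh on every iteration, so every buffer on the
page is examined and not only head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LRU_SIZE 16  /* mirrors BH_LRU_SIZE in fs/buffer.c */

struct bh_model {
	struct bh_model *b_this_page;  /* circular ring of a page's buffers */
	int b_count;                   /* references, including LRU ones */
	bool dirty;
	bool locked;
};

struct bh_model *lru[LRU_SIZE];    /* single-CPU LRU, for brevity */

static bool busy(const struct bh_model *bh)
{
	return bh->b_count != 0 || bh->dirty || bh->locked;
}

/* Drop the LRU reference held on bh, if any. */
static void evict_from_lru(struct bh_model *bh)
{
	for (int i = 0; i < LRU_SIZE; i++) {
		if (lru[i] == bh) {
			assert(bh->b_count > 0);
			lru[i] = NULL;
			bh->b_count--;  /* like brelse() */
			break;
		}
	}
}

/* Returns true if all buffers on the page can be dropped. */
bool try_drop(struct bh_model *head)
{
	struct bh_model *bh = head;
	bool any_busy = false;

	do {
		if (busy(bh)) {
			any_busy = true;
			evict_from_lru(bh);
		}
		bh = bh->b_this_page;
	} while (bh != head);

	if (!any_busy)
		return true;

	/* Re-check every buffer after the LRU references are gone. */
	do {
		if (busy(bh))
			return false;
		bh = bh->b_this_page;
	} while (bh != head);
	return true;
}
```

If a buffer is busy only because of its LRU reference, the page can be
dropped after eviction. If the buffer also holds a reference from a real
user, it stays busy and the drop still fails.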

Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Chris Goldsworthy <[email protected]>
Cc: Matthew Wilcox <[email protected]>
---
fs/buffer.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
fs/internal.h | 5 ++++
2 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 96c7604..536fb5b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -48,6 +48,7 @@
 #include <linux/sched/mm.h>
 #include <trace/events/block.h>
 #include <linux/fscrypt.h>
+#include <linux/xarray.h>
 
 #include "internal.h"

@@ -1471,12 +1472,63 @@ static bool has_bh_in_lru(int cpu, void *dummy)
 	return false;
 }
 
+static void __evict_bhs_lru(void *arg)
+{
+	struct bh_lru *b = &get_cpu_var(bh_lrus);
+	struct busy_bhs_container *busy_bhs = arg;
+	struct buffer_head *bh;
+	int i;
+
+	XA_STATE(xas, &busy_bhs->xarray, 0);
+
+	xas_for_each(&xas, bh, busy_bhs->size) {
+		for (i = 0; i < BH_LRU_SIZE; i++) {
+			if (b->bhs[i] == bh) {
+				brelse(b->bhs[i]);
+				b->bhs[i] = NULL;
+				break;
+			}
+		}
+
+		bh = bh->b_this_page;
+	}
+
+	put_cpu_var(bh_lrus);
+}
+
+static bool page_has_bhs_in_lru(int cpu, void *arg)
+{
+	struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu);
+	struct busy_bhs_container *busy_bhs = arg;
+	struct buffer_head *bh;
+	int i;
+
+	XA_STATE(xas, &busy_bhs->xarray, 0);
+
+	xas_for_each(&xas, bh, busy_bhs->size) {
+		for (i = 0; i < BH_LRU_SIZE; i++) {
+			if (b->bhs[i] == bh)
+				return true;
+		}
+
+		bh = bh->b_this_page;
+	}
+
+	return false;
+
+}
 void invalidate_bh_lrus(void)
 {
 	on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
 }
 EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
 
+static void evict_bh_lrus(struct busy_bhs_container *busy_bhs)
+{
+	on_each_cpu_cond(page_has_bhs_in_lru, __evict_bhs_lru,
+			 busy_bhs, 1);
+}
+
 void set_bh_page(struct buffer_head *bh,
 		struct page *page, unsigned long offset)
 {
@@ -3242,14 +3294,36 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)
 {
 	struct buffer_head *head = page_buffers(page);
 	struct buffer_head *bh;
+	struct busy_bhs_container busy_bhs;
+	int xa_ret, ret = 0;
+
+	xa_init(&busy_bhs.xarray);
+	busy_bhs.size = 0;
 
 	bh = head;
 	do {
-		if (buffer_busy(bh))
-			goto failed;
+		if (buffer_busy(bh)) {
+			xa_ret = xa_err(xa_store(&busy_bhs.xarray, busy_bhs.size++,
+						 bh, GFP_ATOMIC));
+			if (xa_ret)
+				goto out;
+		}
 		bh = bh->b_this_page;
 	} while (bh != head);
 
+	if (busy_bhs.size) {
+		/*
+		 * Check if the busy failure was due to an outstanding
+		 * LRU reference
+		 */
+		evict_bh_lrus(&busy_bhs);
+		do {
+			if (buffer_busy(bh))
+				goto out;
+		} while (bh != head);
+	}
+
+	ret = 1;
 	do {
 		struct buffer_head *next = bh->b_this_page;
 
@@ -3259,9 +3333,10 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)
 	} while (bh != head);
 	*buffers_to_free = head;
 	detach_page_private(page);
-	return 1;
-failed:
-	return 0;
+out:
+	xa_destroy(&busy_bhs.xarray);
+
+	return ret;
 }
 
 int try_to_free_buffers(struct page *page)
diff --git a/fs/internal.h b/fs/internal.h
index 77c50be..00f17c4 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -15,6 +15,7 @@ struct mount;
 struct shrink_control;
 struct fs_context;
 struct user_namespace;
+struct xarray;
 
 /*
  * block_dev.c
@@ -49,6 +50,10 @@ static inline int emergency_thaw_bdev(struct super_block *sb)
  */
 extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
 		get_block_t *get_block, struct iomap *iomap);
+struct busy_bhs_container {
+	struct xarray xarray;
+	int size;
+};
 
 /*
  * char_dev.c

2021-01-12 14:56:58

by Oliver Sang

Subject: [fs/buffer.c] 6bb5a3cec4: WARNING:suspicious_RCU_usage


Greetings,

FYI, we noticed the following commit (built with gcc-9):

commit: 6bb5a3cec4c480f7b7d3d3cacc618bdd185ca0db ("[PATCH v2] fs/buffer.c: Revoke LRU when trying to drop buffers")
url: https://github.com/0day-ci/linux/commits/Chris-Goldsworthy/fs-buffer-c-Revoke-LRU-when-trying-to-drop-buffers/20210105-150702
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62

in testcase: kernel-selftests
version: kernel-selftests-x86_64-cb0debfe-1_20201231
with the following parameters:

group: group-02
ucode: 0xe2

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt


on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G memory

caused the following changes (please refer to the attached dmesg/kmsg for the entire log/backtrace):



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


kern :warn : [ 1910.305179] WARNING: suspicious RCU usage
user :notice: [ 1910.315809] ok 4 openat2 with normal struct argument [misalign=3] succeeds
kern :warn : [ 1910.317774] 5.11.0-rc2-g6bb5a3cec4c4 #1 Not tainted
kern :warn : [ 1910.317776] -----------------------------

user :notice: [ 1910.329127] ok 5 openat2 with normal struct argument [misalign=4] succeeds
kern :warn : [ 1910.329489] include/linux/xarray.h:1164 suspicious rcu_dereference_check() usage!
kern :warn : [ 1910.329491]
other info that might help us debug this:

kern :warn : [ 1910.329492]
rcu_scheduler_active = 2, debug_locks = 1

user :notice: [ 1910.338253] ok 6 openat2 with normal struct argument [misalign=5] succeeds
kern :warn : [ 1910.342028] 2 locks held by umount/11983:
kern :warn : [ 1910.342029] #0: ffff88881309b0e0 (&type->s_umount_key#49

user :notice: [ 1910.350849] ok 7 openat2 with normal struct argument [misalign=6] succeeds
kern :warn : [ 1910.357472] ){+.+.}-{3:3}

kern :warn : [ 1910.365467] , at: deactivate_super (kbuild/src/consumer/fs/super.c:366 kbuild/src/consumer/fs/super.c:362)
kern :warn : [ 1910.365472] #1: ffff88880af6f860 (&mapping->private_lock){+.+.}-{2:2}, at: try_to_free_buffers (kbuild/src/consumer/fs/buffer.c:3316)
user :notice: [ 1910.373661] ok 8 openat2 with normal struct argument [misalign=7] succeeds
kern :warn : [ 1910.376316]
stack backtrace:

user :notice: [ 1910.383052] ok 9 openat2 with normal struct argument [misalign=8] succeeds
kern :warn : [ 1910.383195] CPU: 2 PID: 11983 Comm: umount Not tainted 5.11.0-rc2-g6bb5a3cec4c4 #1
kern :warn : [ 1910.383198] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.8.1 12/05/2017
kern :warn : [ 1910.383199] Call Trace:
kern :warn : [ 1910.383203] dump_stack (kbuild/src/consumer/lib/dump_stack.c:122)
kern :warn : [ 1910.383208] xas_start (kbuild/src/consumer/include/linux/xarray.h:1164 kbuild/src/consumer/include/linux/xarray.h:1162 kbuild/src/consumer/lib/xarray.c:188)
kern :warn : [ 1910.383215] xas_load (kbuild/src/consumer/include/linux/xarray.h:169 kbuild/src/consumer/include/linux/xarray.h:1224 kbuild/src/consumer/lib/xarray.c:235)
kern :warn : [ 1910.383218] xas_find (kbuild/src/consumer/lib/xarray.c:1244)
kern :warn : [ 1910.383221] ? ll_rw_block (kbuild/src/consumer/fs/buffer.c:1463)
kern :warn : [ 1910.383226] page_has_bhs_in_lru (kbuild/src/consumer/fs/buffer.c:1471)
kern :warn : [ 1910.383232] ? ll_rw_block (kbuild/src/consumer/fs/buffer.c:1463)
kern :warn : [ 1910.383237] smp_call_function_many_cond (kbuild/src/consumer/kernel/smp.c:665 (discriminator 1))
kern :warn : [ 1910.383241] ? attach_nobh_buffers (kbuild/src/consumer/fs/buffer.c:1439)
kern :warn : [ 1910.383245] ? ll_rw_block (kbuild/src/consumer/fs/buffer.c:1463)
kern :warn : [ 1910.383247] ? attach_nobh_buffers (kbuild/src/consumer/fs/buffer.c:1439)
kern :warn : [ 1910.383249] on_each_cpu_cond_mask (kbuild/src/consumer/include/linux/cpumask.h:373 kbuild/src/consumer/kernel/smp.c:900)
kern :warn : [ 1910.383255] drop_buffers (kbuild/src/consumer/fs/buffer.c:3247 kbuild/src/consumer/fs/buffer.c:3279)
kern :warn : [ 1910.383267] try_to_free_buffers (kbuild/src/consumer/fs/buffer.c:3316)
kern :warn : [ 1910.383273] invalidate_inode_page (kbuild/src/consumer/mm/truncate.c:212 kbuild/src/consumer/mm/truncate.c:264)
kern :warn : [ 1910.383277] __invalidate_mapping_pages (kbuild/src/consumer/mm/truncate.c:593)
kern :warn : [ 1910.383307] invalidate_bdev (kbuild/src/consumer/fs/block_dev.c:97)
kern :warn : [ 1910.383311] ext4_put_super (kbuild/src/consumer/fs/ext4/super.c:1189 (discriminator 2)) ext4
kern :warn : [ 1910.383347] generic_shutdown_super (kbuild/src/consumer/include/linux/list.h:282 kbuild/src/consumer/fs/super.c:466)
kern :warn : [ 1910.383352] kill_block_super (kbuild/src/consumer/fs/super.c:1394)
kern :warn : [ 1910.383356] deactivate_locked_super (kbuild/src/consumer/fs/super.c:342)

kern :warn : [ 1910.392811] cleanup_mnt (kbuild/src/consumer/fs/namespace.c:117 kbuild/src/consumer/fs/namespace.c:1119)
kern :warn : [ 1910.392817] task_work_run (kbuild/src/consumer/kernel/task_work.c:142 (discriminator 1))
kern :warn : [ 1910.392821] exit_to_user_mode_prepare (kbuild/src/consumer/include/linux/tracehook.h:189 kbuild/src/consumer/kernel/entry/common.c:174 kbuild/src/consumer/kernel/entry/common.c:201)
kern :warn : [ 1910.392828] syscall_exit_to_user_mode (kbuild/src/consumer/kernel/entry/common.c:125 kbuild/src/consumer/kernel/entry/common.c:304)
kern :warn : [ 1910.392831] entry_SYSCALL_64_after_hwframe (kbuild/src/consumer/arch/x86/entry/entry_64.S:127)
kern :warn : [ 1910.392835] RIP: 0033:0x7fd218195507
kern :warn : [ 1910.392838] Code: 19 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 19 0c 00 f7 d8 64 89 01 48
All code
========
0: 19 0c 00 sbb %ecx,(%rax,%rax,1)
3: f7 d8 neg %eax
5: 64 89 01 mov %eax,%fs:(%rcx)
8: 48 83 c8 ff or $0xffffffffffffffff,%rax
c: c3 retq
d: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
13: 31 f6 xor %esi,%esi
15: e9 09 00 00 00 jmpq 0x23
1a: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
21: 00 00
23: b8 a6 00 00 00 mov $0xa6,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 59 19 0c 00 mov 0xc1959(%rip),%rcx # 0xc1993
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 59 19 0c 00 mov 0xc1959(%rip),%rcx # 0xc1969
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
kern :warn : [ 1910.392841] RSP: 002b:00007fff92d7eac8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
kern :warn : [ 1910.392844] RAX: 0000000000000000 RBX: 0000559d38a02e90 RCX: 00007fd218195507
kern :warn : [ 1910.392845] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000559d38a030a0
kern :warn : [ 1910.392847] RBP: 0000000000000000 R08: 0000559d38a030c0 R09: 00007fd218216e80
kern :warn : [ 1910.392848] R10: 0000000000000000 R11: 0000000000000246 R12: 0000559d38a030a0
kern :warn : [ 1910.392850] R13: 00007fd2182bb1c4 R14: 0000559d38a02f88 R15: 0000000000000000
user :notice: [ 1910.618740] ok 10 openat2 with normal struct argument [misalign=9] succeeds

user :notice: [ 1910.628617] ok 11 openat2 with normal struct argument [misalign=11] succeeds

user :notice: [ 1910.638701] ok 12 openat2 with normal struct argument [misalign=17] succeeds

user :notice: [ 1910.648749] ok 13 openat2 with normal struct argument [misalign=87] succeeds

user :notice: [ 1910.659303] ok 14 openat2 with bigger struct (zeroed out) argument [misalign=0] succeeds

user :notice: [ 1910.670867] ok 15 openat2 with bigger struct (zeroed out) argument [misalign=1] succeeds

user :notice: [ 1910.682337] ok 16 openat2 with bigger struct (zeroed out) argument [misalign=2] succeeds

user :notice: [ 1910.693720] ok 17 openat2 with bigger struct (zeroed out) argument [misalign=3] succeeds

user :notice: [ 1910.705011] ok 18 openat2 with bigger struct (zeroed out) argument [misalign=4] succeeds

user :notice: [ 1910.716296] ok 19 openat2 with bigger struct (zeroed out) argument [misalign=5] succeeds

user :notice: [ 1910.727643] ok 20 openat2 with bigger struct (zeroed out) argument [misalign=6] succeeds

user :notice: [ 1910.738998] ok 21 openat2 with bigger struct (zeroed out) argument [misalign=7] succeeds

user :notice: [ 1910.750428] ok 22 openat2 with bigger struct (zeroed out) argument [misalign=8] succeeds

user :notice: [ 1910.761787] ok 23 openat2 with bigger struct (zeroed out) argument [misalign=9] succeeds

user :notice: [ 1910.773213] ok 24 openat2 with bigger struct (zeroed out) argument [misalign=11] succeeds

user :notice: [ 1910.784687] ok 25 openat2 with bigger struct (zeroed out) argument [misalign=17] succeeds

user :notice: [ 1910.796172] ok 26 openat2 with bigger struct (zeroed out) argument [misalign=87] succeeds

user :notice: [ 1910.808006] ok 27 openat2 with zero-sized 'struct' argument [misalign=0] fails with -22 (Invalid argument)

user :notice: [ 1910.821279] ok 28 openat2 with zero-sized 'struct' argument [misalign=1] fails with -22 (Invalid argument)

user :notice: [ 1910.834603] ok 29 openat2 with zero-sized 'struct' argument [misalign=2] fails with -22 (Invalid argument)

user :notice: [ 1910.848015] ok 30 openat2 with zero-sized 'struct' argument [misalign=3] fails with -22 (Invalid argument)

user :notice: [ 1910.861381] ok 31 openat2 with zero-sized 'struct' argument [misalign=4] fails with -22 (Invalid argument)

user :notice: [ 1910.874736] ok 32 openat2 with zero-sized 'struct' argument [misalign=5] fails with -22 (Invalid argument)

user :notice: [ 1910.887944] ok 33 openat2 with zero-sized 'struct' argument [misalign=6] fails with -22 (Invalid argument)

user :notice: [ 1910.901316] ok 34 openat2 with zero-sized 'struct' argument [misalign=7] fails with -22 (Invalid argument)

user :notice: [ 1910.914802] ok 35 openat2 with zero-sized 'struct' argument [misalign=8] fails with -22 (Invalid argument)

user :notice: [ 1910.928188] ok 36 openat2 with zero-sized 'struct' argument [misalign=9] fails with -22 (Invalid argument)

user :notice: [ 1910.941651] ok 37 openat2 with zero-sized 'struct' argument [misalign=11] fails with -22 (Invalid argument)


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Oliver Sang


Attachments:
(No filename) (11.26 kB)
config-5.11.0-rc2-g6bb5a3cec4c4 (215.94 kB)
job-script (7.16 kB)
kmsg.xz (1.59 MB)
kernel-selftests (192.64 kB)
job.yaml (6.15 kB)
reproduce (619.00 B)