Date: 2020-03-08 14:05:32
From: Chen, Rong A
Subject: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

Greetings,

FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:


commit: 6d390e4b5d48ec03bb87e63cf0a2bff5f4e116da ("locks: fix a potential use-after-free problem when wakeup a waiter")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
with the following parameters:

nr_task: 100%
mode: process
test: lock1
cpufreq_governor: performance
ucode: 0x11

test-description: Will It Scale takes a testcase and runs it from 1 to n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based variant of each test to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
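
For reference, the lock1 case measured here is, roughly, a tight fcntl() byte-range lock/unlock loop in which each copy is expected to work on its own temporary file, so the parallel copies should scale nearly independently. The snippet below is a minimal standalone sketch of that loop, not the upstream lock1.c harness; the temp-file name, the 5-second duration and the iteration reporting are illustrative assumptions. The two F_SETLK calls are what drive the fcntl_setlk -> do_lock_file_wait -> posix_lock_inode path that dominates the perf profiles below.

/*
 * Minimal standalone sketch of a will-it-scale lock1-style loop.
 * Illustrative only: file name, run time and reporting are assumptions,
 * not the upstream harness.
 */
#define _GNU_SOURCE             /* ensure mkstemp() is declared under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
        char tmpfile[] = "/tmp/willitscale.XXXXXX";
        int fd = mkstemp(tmpfile);              /* each copy locks its own file */
        unsigned long long iterations = 0;
        struct flock lck = {
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 1,
        };
        time_t end = time(NULL) + 5;            /* run for ~5 seconds */

        if (fd < 0) {
                perror("mkstemp");
                return 1;
        }
        unlink(tmpfile);                        /* fd keeps the file alive */

        while (time(NULL) < end) {
                lck.l_type = F_WRLCK;           /* fcntl_setlk -> do_lock_file_wait */
                if (fcntl(fd, F_SETLK, &lck) < 0)
                        break;
                lck.l_type = F_UNLCK;           /* immediately drop the lock again */
                if (fcntl(fd, F_SETLK, &lck) < 0)
                        break;
                iterations += 2;
        }

        printf("%llu lock/unlock ops\n", iterations);
        close(fd);
        return 0;
}

Running one copy per CPU (288 on this machine) approximates nr_task=100%; with the regressing commit, the profiles below show nearly all of that time shifting into native_queued_spin_lock_slowpath under locks_delete_block.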

In addition, the commit also has a significant impact on the following test:

+------------------+----------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops -51.3% regression |
| test machine | 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory |
| test parameters | cpufreq_governor=performance |
| | mode=thread |
| | nr_task=100% |
| | test=lock1 |
| | ucode=0x11 |
+------------------+----------------------------------------------------------------------+


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as follows:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-knm01/lock1/will-it-scale/0x11

commit:
0a68ff5e2e ("fcntl: Distribute switch variables for initialization")
6d390e4b5d ("locks: fix a potential use-after-free problem when wakeup a waiter")

0a68ff5e2e7cf226 6d390e4b5d48ec03bb87e63cf0a
---------------- ---------------------------
         %stddev       %change        %stddev
          (old)           |            (new)
66597 ± 3% -96.6% 2260 will-it-scale.per_process_ops
313.70 -1.2% 310.06 will-it-scale.time.elapsed_time
313.70 -1.2% 310.06 will-it-scale.time.elapsed_time.max
19180216 ± 3% -96.6% 651090 will-it-scale.workload
63324 ± 9% -27.5% 45902 meminfo.Mapped
52611 ± 11% -33.5% 35010 ± 2% numa-meminfo.node0.Mapped
13299 ± 11% -33.6% 8829 numa-vmstat.node0.nr_mapped
1440 ± 3% -8.9% 1312 ± 4% slabinfo.fsnotify_mark_connector.active_objs
1440 ± 3% -8.9% 1312 ± 4% slabinfo.fsnotify_mark_connector.num_objs
0.00 ± 10% -0.0 0.00 ± 17% mpstat.cpu.all.soft%
86.17 +11.7 97.88 mpstat.cpu.all.sys%
12.62 ± 8% -11.7 0.89 ± 6% mpstat.cpu.all.usr%
85.00 +13.8% 96.75 vmstat.cpu.sy
12.00 ± 10% -100.0% 0.00 vmstat.cpu.us
2274 -2.9% 2208 vmstat.system.cs
15943 ± 9% -27.5% 11561 proc-vmstat.nr_mapped
1809 ± 75% -91.3% 157.00 ± 6% proc-vmstat.numa_hint_faults
1809 ± 75% -91.3% 157.00 ± 6% proc-vmstat.numa_hint_faults_local
914333 +3.7% 948222 proc-vmstat.numa_hit
914333 +3.7% 948222 proc-vmstat.numa_local
3736 ± 6% +18.5% 4427 ± 2% proc-vmstat.pgactivate
990333 +4.3% 1032696 proc-vmstat.pgalloc_normal
862745 +3.7% 894537 proc-vmstat.pgfault
2383 ± 7% -55.4% 1064 sched_debug.cfs_rq:/.exec_clock.stddev
1089611 ± 4% -45.1% 597726 sched_debug.cfs_rq:/.min_vruntime.stddev
1.52 ± 10% +58.6% 2.41 ± 8% sched_debug.cfs_rq:/.nr_spread_over.avg
-9694655 -44.6% -5372610 sched_debug.cfs_rq:/.spread0.min
885953 ± 2% -36.8% 560044 ± 2% sched_debug.cfs_rq:/.spread0.stddev
493.40 ± 8% +32.0% 651.30 ± 10% sched_debug.cfs_rq:/.util_avg.min
62.39 ± 8% -21.1% 49.24 ± 13% sched_debug.cfs_rq:/.util_avg.stddev
131.46 ± 4% -12.4% 115.20 ± 4% sched_debug.cfs_rq:/.util_est_enqueued.stddev
782383 ± 2% -13.4% 677845 ± 8% sched_debug.cpu.avg_idle.min
2142 ± 12% -55.3% 957.56 ± 4% sched_debug.cpu.clock.stddev
2142 ± 12% -55.3% 957.56 ± 4% sched_debug.cpu.clock_task.stddev
289492 ± 13% -27.7% 209254 ± 17% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 11% -55.5% 0.00 ± 4% sched_debug.cpu.next_balance.stddev
40490 ± 16% -41.8% 23551 ± 4% sched_debug.cpu.nr_switches.max
3333 ± 9% -28.6% 2380 sched_debug.cpu.nr_switches.stddev
1152 -9.2% 1045 sched_debug.cpu.sched_count.avg
36522 ± 17% -46.5% 19545 ± 5% sched_debug.cpu.sched_count.max
3083 ± 10% -32.8% 2072 sched_debug.cpu.sched_count.stddev
17142 ± 17% -44.3% 9552 ± 5% sched_debug.cpu.ttwu_count.max
1439 ± 8% -31.9% 981.06 sched_debug.cpu.ttwu_count.stddev
16805 ± 18% -46.5% 8998 ± 5% sched_debug.cpu.ttwu_local.max
1371 ± 10% -33.5% 912.34 sched_debug.cpu.ttwu_local.stddev
0.00 -100.0% 0.00 sched_debug.rt_rq:/.rt_nr_migratory.avg
0.20 -100.0% 0.00 sched_debug.rt_rq:/.rt_nr_migratory.max
0.01 -100.0% 0.00 sched_debug.rt_rq:/.rt_nr_migratory.stddev
0.14 ± 41% -83.3% 0.02 ± 23% sched_debug.rt_rq:/.rt_time.avg
39.32 ± 41% -99.1% 0.35 ± 22% sched_debug.rt_rq:/.rt_time.max
2.31 ± 41% -97.4% 0.06 ± 4% sched_debug.rt_rq:/.rt_time.stddev
33.60 -82.4% 5.92 perf-stat.i.MPKI
7.654e+09 ± 3% +10.5% 8.458e+09 perf-stat.i.branch-instructions
7.01 -6.2 0.84 perf-stat.i.branch-miss-rate%
5.368e+08 ± 4% -87.3% 67920479 perf-stat.i.branch-misses
5.18 +17.7 22.86 perf-stat.i.cache-miss-rate%
65182049 ± 2% -29.9% 45680888 perf-stat.i.cache-misses
1.277e+09 ± 3% -84.3% 2e+08 perf-stat.i.cache-references
2266 -4.2% 2170 perf-stat.i.context-switches
11.58 ± 3% +11.0% 12.85 perf-stat.i.cpi
4.4e+11 +1.1% 4.447e+11 perf-stat.i.cpu-cycles
242.58 ± 2% +3.9% 252.10 perf-stat.i.cpu-migrations
6729 ± 2% +44.4% 9715 perf-stat.i.cycles-between-cache-misses
2.30 -2.1 0.19 perf-stat.i.iTLB-load-miss-rate%
8.993e+08 ± 3% -92.9% 63736147 perf-stat.i.iTLB-load-misses
3.808e+10 ± 3% -9.2% 3.458e+10 perf-stat.i.iTLB-loads
3.797e+10 ± 3% -9.1% 3.452e+10 perf-stat.i.instructions
42.78 +1170.0% 543.24 perf-stat.i.instructions-per-iTLB-miss
0.09 ± 3% -10.5% 0.08 perf-stat.i.ipc
33.74 -82.8% 5.80 perf-stat.overall.MPKI
7.01 -6.2 0.80 perf-stat.overall.branch-miss-rate%
5.11 +17.7 22.85 perf-stat.overall.cache-miss-rate%
11.62 ± 3% +10.9% 12.88 perf-stat.overall.cpi
6738 ± 2% +44.4% 9728 perf-stat.overall.cycles-between-cache-misses
2.31 -2.1 0.18 perf-stat.overall.iTLB-load-miss-rate%
42.21 +1186.2% 542.88 perf-stat.overall.instructions-per-iTLB-miss
0.09 ± 3% -9.9% 0.08 perf-stat.overall.ipc
618557 +2550.0% 16391579 perf-stat.overall.path-length
7.631e+09 ± 3% +10.6% 8.44e+09 perf-stat.ps.branch-instructions
5.353e+08 ± 4% -87.4% 67223267 perf-stat.ps.branch-misses
65236586 ± 2% -30.1% 45607616 perf-stat.ps.cache-misses
1.277e+09 ± 3% -84.4% 1.997e+08 perf-stat.ps.cache-references
2189 -2.7% 2129 perf-stat.ps.context-switches
4.393e+11 +1.0% 4.436e+11 perf-stat.ps.cpu-cycles
217.99 ± 2% +10.2% 240.17 perf-stat.ps.cpu-migrations
8.968e+08 ± 3% -92.9% 63451935 perf-stat.ps.iTLB-load-misses
3.79e+10 ± 3% -9.0% 3.448e+10 perf-stat.ps.iTLB-loads
3.785e+10 ± 3% -9.0% 3.445e+10 perf-stat.ps.instructions
2653 +4.3% 2766 perf-stat.ps.minor-faults
2653 +4.3% 2766 perf-stat.ps.page-faults
1.186e+13 ± 2% -10.0% 1.067e+13 perf-stat.total.instructions
30.15 -29.6 0.52 ± 3% perf-profile.calltrace.cycles-pp.posix_lock_inode.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl
10.22 ± 7% -10.2 0.00 perf-profile.calltrace.cycles-pp.locks_alloc_lock.posix_lock_inode.do_lock_file_wait.fcntl_setlk.do_fcntl
9.82 ± 21% -9.8 0.00 perf-profile.calltrace.cycles-pp._copy_from_user.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.34 -9.3 0.00 perf-profile.calltrace.cycles-pp.locks_alloc_lock.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
8.91 ± 8% -8.9 0.00 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.locks_alloc_lock.posix_lock_inode.do_lock_file_wait.fcntl_setlk
7.83 ± 3% -7.8 0.00 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.locks_alloc_lock.fcntl_setlk.do_fcntl.__x64_sys_fcntl
7.46 ± 26% -7.5 0.00 perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.do_fcntl.__x64_sys_fcntl.do_syscall_64
7.17 ± 9% -7.2 0.00 perf-profile.calltrace.cycles-pp.security_file_lock.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl
5.86 ± 35% -5.9 0.00 perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault._copy_from_user.do_fcntl.__x64_sys_fcntl
5.71 ± 12% -5.7 0.00 perf-profile.calltrace.cycles-pp.common_file_perm.security_file_lock.do_lock_file_wait.fcntl_setlk.do_fcntl
4.75 ± 16% -4.8 0.00 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
0.92 ± 3% -0.1 0.81 perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.91 ± 3% -0.1 0.81 perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.91 ± 3% -0.1 0.81 perf-profile.calltrace.cycles-pp.task_numa_work.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.91 ± 2% -0.1 0.80 perf-profile.calltrace.cycles-pp.change_protection.change_prot_numa.task_numa_work.task_work_run.exit_to_usermode_loop
0.91 ± 2% -0.1 0.80 perf-profile.calltrace.cycles-pp.change_prot_numa.task_numa_work.task_work_run.exit_to_usermode_loop.do_syscall_64
0.90 ± 3% -0.1 0.79 perf-profile.calltrace.cycles-pp.change_p4d_range.change_protection.change_prot_numa.task_numa_work.task_work_run
0.00 +1.0 1.01 ± 25% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
0.00 +1.0 1.04 ± 25% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt
0.00 +1.1 1.14 ± 26% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
0.00 +2.1 2.09 ± 36% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.native_queued_spin_lock_slowpath
0.00 +3.7 3.74 ± 38% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.native_queued_spin_lock_slowpath._raw_spin_lock
0.00 +5.1 5.14 ± 37% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.native_queued_spin_lock_slowpath._raw_spin_lock.locks_delete_block
0.00 +5.5 5.54 ± 36% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.native_queued_spin_lock_slowpath._raw_spin_lock.locks_delete_block.do_lock_file_wait
87.63 +11.5 99.09 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
86.40 +12.7 99.06 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
80.12 +18.0 98.08 perf-profile.calltrace.cycles-pp.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
76.44 +21.5 97.98 perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
59.98 +37.9 97.85 perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
41.88 +55.6 97.44 perf-profile.calltrace.cycles-pp.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
0.52 ± 60% +96.2 96.76 perf-profile.calltrace.cycles-pp.locks_delete_block.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl
0.00 +96.3 96.31 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.locks_delete_block.do_lock_file_wait.fcntl_setlk
0.00 +96.5 96.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_delete_block.do_lock_file_wait.fcntl_setlk.do_fcntl
30.44 -29.9 0.53 ± 3% perf-profile.children.cycles-pp.posix_lock_inode
19.64 ± 4% -19.3 0.32 ± 3% perf-profile.children.cycles-pp.locks_alloc_lock
16.97 ± 5% -16.7 0.25 ± 5% perf-profile.children.cycles-pp.kmem_cache_alloc
9.99 ± 32% -9.9 0.07 ± 10% perf-profile.children.cycles-pp.___might_sleep
9.87 ± 21% -9.8 0.10 ± 5% perf-profile.children.cycles-pp._copy_from_user
7.51 ± 25% -7.5 0.00 perf-profile.children.cycles-pp.__might_fault
7.19 ± 9% -7.1 0.12 ± 11% perf-profile.children.cycles-pp.security_file_lock
5.73 ± 12% -5.6 0.10 ± 10% perf-profile.children.cycles-pp.common_file_perm
5.06 ± 15% -4.9 0.20 ± 5% perf-profile.children.cycles-pp.syscall_return_via_sysret
10.41 ± 29% -4.6 5.80 ± 35% perf-profile.children.cycles-pp.apic_timer_interrupt
9.58 ± 29% -4.2 5.35 ± 37% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
3.79 ± 7% -3.7 0.09 ± 7% perf-profile.children.cycles-pp.entry_SYSCALL_64
7.66 ± 32% -3.7 3.97 ± 38% perf-profile.children.cycles-pp.hrtimer_interrupt
3.73 -3.6 0.13 ± 5% perf-profile.children.cycles-pp.kmem_cache_free
2.77 ± 2% -2.7 0.07 ± 6% perf-profile.children.cycles-pp.memset_erms
4.70 ± 27% -2.5 2.18 ± 35% perf-profile.children.cycles-pp.__hrtimer_run_queues
2.72 ± 33% -1.5 1.18 ± 26% perf-profile.children.cycles-pp.tick_sched_timer
1.52 ± 4% -1.5 0.07 ± 7% perf-profile.children.cycles-pp.__fget_light
1.51 ± 6% -1.5 0.05 perf-profile.children.cycles-pp.locks_delete_lock_ctx
2.44 ± 32% -1.4 1.08 ± 25% perf-profile.children.cycles-pp.tick_sched_handle
2.32 ± 33% -1.3 1.05 ± 25% perf-profile.children.cycles-pp.update_process_times
1.11 ± 5% -1.1 0.06 ± 7% perf-profile.children.cycles-pp.fcntl
1.50 ± 29% -0.9 0.64 ± 16% perf-profile.children.cycles-pp.scheduler_tick
0.91 ± 28% -0.7 0.24 ± 12% perf-profile.children.cycles-pp.irq_exit
0.71 ± 7% -0.6 0.07 ± 12% perf-profile.children.cycles-pp.fpregs_assert_state_consistent
1.08 ± 29% -0.6 0.48 ± 15% perf-profile.children.cycles-pp.task_tick_fair
0.53 ± 45% -0.4 0.13 ± 8% perf-profile.children.cycles-pp.__softirqentry_text_start
0.36 ± 3% -0.3 0.03 ±100% perf-profile.children.cycles-pp.__list_del_entry_valid
0.49 ± 33% -0.3 0.21 ± 17% perf-profile.children.cycles-pp.update_curr
1.07 ± 3% -0.2 0.82 perf-profile.children.cycles-pp.exit_to_usermode_loop
1.05 ± 3% -0.2 0.82 perf-profile.children.cycles-pp.task_work_run
1.04 ± 3% -0.2 0.81 perf-profile.children.cycles-pp.change_protection
1.04 ± 3% -0.2 0.81 perf-profile.children.cycles-pp.change_prot_numa
1.04 ± 3% -0.2 0.81 perf-profile.children.cycles-pp.change_p4d_range
1.05 ± 3% -0.2 0.82 perf-profile.children.cycles-pp.task_numa_work
0.38 ± 28% -0.2 0.18 ± 14% perf-profile.children.cycles-pp.update_load_avg
0.35 ± 24% -0.2 0.16 ± 35% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.36 ± 17% -0.2 0.18 ± 46% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.22 ± 11% -0.1 0.08 ± 19% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.18 ± 28% -0.1 0.06 ± 71% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.21 ± 42% -0.1 0.09 ± 37% perf-profile.children.cycles-pp.ktime_get
0.15 ± 53% -0.1 0.04 ± 57% perf-profile.children.cycles-pp.run_timer_softirq
0.19 ± 41% -0.1 0.08 ± 40% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.14 ± 38% -0.1 0.07 ± 16% perf-profile.children.cycles-pp.cpuacct_charge
0.17 ± 28% -0.1 0.10 ± 44% perf-profile.children.cycles-pp.read_tsc
0.15 ± 3% -0.1 0.08 ± 23% perf-profile.children.cycles-pp.clockevents_program_event
0.12 ± 26% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.11 ± 28% -0.1 0.06 ± 14% perf-profile.children.cycles-pp.__update_load_avg_se
0.08 ± 8% -0.0 0.03 ±100% perf-profile.children.cycles-pp.page_fault
0.08 ± 6% -0.0 0.03 ±100% perf-profile.children.cycles-pp.do_page_fault
0.00 +0.1 0.10 ± 4% perf-profile.children.cycles-pp.__locks_wake_up_blocks
88.01 +11.4 99.42 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
86.90 +12.5 99.39 perf-profile.children.cycles-pp.do_syscall_64
80.17 +17.9 98.08 perf-profile.children.cycles-pp.__x64_sys_fcntl
76.53 +21.5 97.98 perf-profile.children.cycles-pp.do_fcntl
60.08 +37.8 97.86 perf-profile.children.cycles-pp.fcntl_setlk
41.97 +55.5 97.45 perf-profile.children.cycles-pp.do_lock_file_wait
3.80 ± 11% +92.9 96.69 perf-profile.children.cycles-pp._raw_spin_lock
0.65 ± 19% +96.1 96.76 perf-profile.children.cycles-pp.locks_delete_block
0.00 +96.5 96.52 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
8.92 ± 30% -8.9 0.07 ± 7% perf-profile.self.cycles-pp.___might_sleep
8.75 ± 9% -8.7 0.09 ± 4% perf-profile.self.cycles-pp.posix_lock_inode
5.96 ± 21% -6.0 0.00 perf-profile.self.cycles-pp.do_fcntl
5.31 ± 6% -5.2 0.11 ± 4% perf-profile.self.cycles-pp.fcntl_setlk
5.25 ± 5% -5.2 0.10 ± 5% perf-profile.self.cycles-pp.kmem_cache_alloc
5.05 ± 15% -4.9 0.20 ± 5% perf-profile.self.cycles-pp.syscall_return_via_sysret
4.01 ± 10% -3.9 0.09 ± 5% perf-profile.self.cycles-pp.do_syscall_64
3.79 ± 7% -3.7 0.09 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64
3.32 ± 4% -3.2 0.12 ± 3% perf-profile.self.cycles-pp.kmem_cache_free
3.39 ± 8% -3.2 0.21 ± 11% perf-profile.self.cycles-pp._raw_spin_lock
2.02 ± 3% -2.0 0.06 perf-profile.self.cycles-pp.memset_erms
1.75 ± 4% -1.7 0.06 ± 7% perf-profile.self.cycles-pp.common_file_perm
1.65 ± 4% -1.6 0.05 ± 8% perf-profile.self.cycles-pp.locks_alloc_lock
1.35 ± 6% -1.3 0.05 ± 9% perf-profile.self.cycles-pp.__fget_light
0.59 ± 5% -0.5 0.07 ± 12% perf-profile.self.cycles-pp.fpregs_assert_state_consistent
0.31 ± 6% -0.3 0.03 ±100% perf-profile.self.cycles-pp.__list_del_entry_valid
0.90 ± 5% -0.2 0.70 ± 2% perf-profile.self.cycles-pp.change_p4d_range
0.33 ± 28% -0.2 0.15 ± 38% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.28 ± 32% -0.2 0.11 ± 15% perf-profile.self.cycles-pp.update_curr
0.22 ± 11% -0.1 0.08 ± 19% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.18 ± 26% -0.1 0.05 ± 67% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.17 ± 49% -0.1 0.06 ± 68% perf-profile.self.cycles-pp.ktime_get
0.17 ± 24% -0.1 0.06 ± 22% perf-profile.self.cycles-pp.irq_exit
0.16 ± 28% -0.1 0.07 ± 42% perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
0.12 ± 43% -0.1 0.04 ±107% perf-profile.self.cycles-pp.rcu_sched_clock_irq
0.15 ± 30% -0.1 0.07 ± 16% perf-profile.self.cycles-pp.update_load_avg
0.14 ± 38% -0.1 0.07 ± 16% perf-profile.self.cycles-pp.cpuacct_charge
0.15 ± 29% -0.1 0.09 ± 43% perf-profile.self.cycles-pp.read_tsc
0.00 +0.1 0.09 ± 4% perf-profile.self.cycles-pp.__locks_wake_up_blocks
0.00 +90.8 90.84 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
4528 ± 34% +23.5% 5591 ± 33% interrupts.CPU1.NMI:Non-maskable_interrupts
4528 ± 34% +23.5% 5591 ± 33% interrupts.CPU1.PMI:Performance_monitoring_interrupts
3384 +61.1% 5453 ± 33% interrupts.CPU104.NMI:Non-maskable_interrupts
3384 +61.1% 5453 ± 33% interrupts.CPU104.PMI:Performance_monitoring_interrupts
3355 +35.9% 4558 ± 34% interrupts.CPU107.NMI:Non-maskable_interrupts
3355 +35.9% 4558 ± 34% interrupts.CPU107.PMI:Performance_monitoring_interrupts
210.50 ± 3% +15.4% 243.00 ± 7% interrupts.CPU11.RES:Rescheduling_interrupts
4217 ± 34% +29.9% 5479 ± 33% interrupts.CPU111.NMI:Non-maskable_interrupts
4217 ± 34% +29.9% 5479 ± 33% interrupts.CPU111.PMI:Performance_monitoring_interrupts
4238 ± 35% +51.0% 6400 ± 24% interrupts.CPU112.NMI:Non-maskable_interrupts
4238 ± 35% +51.0% 6400 ± 24% interrupts.CPU112.PMI:Performance_monitoring_interrupts
3393 +60.6% 5450 ± 33% interrupts.CPU113.NMI:Non-maskable_interrupts
3393 +60.6% 5450 ± 33% interrupts.CPU113.PMI:Performance_monitoring_interrupts
4311 ± 33% +27.0% 5475 ± 33% interrupts.CPU114.NMI:Non-maskable_interrupts
4311 ± 33% +27.0% 5475 ± 33% interrupts.CPU114.PMI:Performance_monitoring_interrupts
10.25 ± 88% +865.9% 99.00 ± 96% interrupts.CPU117.RES:Rescheduling_interrupts
4265 ± 35% +27.9% 5457 ± 33% interrupts.CPU121.NMI:Non-maskable_interrupts
4265 ± 35% +27.9% 5457 ± 33% interrupts.CPU121.PMI:Performance_monitoring_interrupts
4205 ± 33% +30.1% 5473 ± 33% interrupts.CPU127.NMI:Non-maskable_interrupts
4205 ± 33% +30.1% 5473 ± 33% interrupts.CPU127.PMI:Performance_monitoring_interrupts
4207 ± 33% +30.0% 5469 ± 33% interrupts.CPU133.NMI:Non-maskable_interrupts
4207 ± 33% +30.0% 5469 ± 33% interrupts.CPU133.PMI:Performance_monitoring_interrupts
4393 ± 32% +45.9% 6412 ± 24% interrupts.CPU135.NMI:Non-maskable_interrupts
4393 ± 32% +45.9% 6412 ± 24% interrupts.CPU135.PMI:Performance_monitoring_interrupts
4320 ± 35% +26.6% 5468 ± 33% interrupts.CPU137.NMI:Non-maskable_interrupts
4320 ± 35% +26.6% 5468 ± 33% interrupts.CPU137.PMI:Performance_monitoring_interrupts
3305 +37.8% 4554 ± 35% interrupts.CPU141.NMI:Non-maskable_interrupts
3305 +37.8% 4554 ± 35% interrupts.CPU141.PMI:Performance_monitoring_interrupts
115.50 ± 56% -94.6% 6.25 ± 17% interrupts.CPU142.RES:Rescheduling_interrupts
4205 ± 34% +29.8% 5459 ± 33% interrupts.CPU147.NMI:Non-maskable_interrupts
4205 ± 34% +29.8% 5459 ± 33% interrupts.CPU147.PMI:Performance_monitoring_interrupts
3316 +64.9% 5468 ± 33% interrupts.CPU154.NMI:Non-maskable_interrupts
3316 +64.9% 5468 ± 33% interrupts.CPU154.PMI:Performance_monitoring_interrupts
4250 ± 34% +28.7% 5468 ± 33% interrupts.CPU156.NMI:Non-maskable_interrupts
4250 ± 34% +28.7% 5468 ± 33% interrupts.CPU156.PMI:Performance_monitoring_interrupts
4315 ± 34% +48.4% 6405 ± 24% interrupts.CPU159.NMI:Non-maskable_interrupts
4315 ± 34% +48.4% 6405 ± 24% interrupts.CPU159.PMI:Performance_monitoring_interrupts
4267 ± 32% +27.8% 5453 ± 33% interrupts.CPU161.NMI:Non-maskable_interrupts
4267 ± 32% +27.8% 5453 ± 33% interrupts.CPU161.PMI:Performance_monitoring_interrupts
3447 +58.7% 5472 ± 33% interrupts.CPU162.NMI:Non-maskable_interrupts
3447 +58.7% 5472 ± 33% interrupts.CPU162.PMI:Performance_monitoring_interrupts
4334 ± 32% +26.1% 5467 ± 33% interrupts.CPU169.NMI:Non-maskable_interrupts
4334 ± 32% +26.1% 5467 ± 33% interrupts.CPU169.PMI:Performance_monitoring_interrupts
6.25 ± 6% +2788.0% 180.50 ±155% interrupts.CPU173.RES:Rescheduling_interrupts
3370 ± 2% +61.9% 5457 ± 33% interrupts.CPU175.NMI:Non-maskable_interrupts
3370 ± 2% +61.9% 5457 ± 33% interrupts.CPU175.PMI:Performance_monitoring_interrupts
4256 ± 32% +28.4% 5464 ± 33% interrupts.CPU179.NMI:Non-maskable_interrupts
4256 ± 32% +28.4% 5464 ± 33% interrupts.CPU179.PMI:Performance_monitoring_interrupts
4199 ± 34% +30.6% 5484 ± 33% interrupts.CPU180.NMI:Non-maskable_interrupts
4199 ± 34% +30.6% 5484 ± 33% interrupts.CPU180.PMI:Performance_monitoring_interrupts
4320 ± 34% +26.6% 5471 ± 33% interrupts.CPU182.NMI:Non-maskable_interrupts
4320 ± 34% +26.6% 5471 ± 33% interrupts.CPU182.PMI:Performance_monitoring_interrupts
20.25 ± 75% +840.7% 190.50 ± 92% interrupts.CPU186.RES:Rescheduling_interrupts
11.75 ± 79% +1342.6% 169.50 ±107% interrupts.CPU187.RES:Rescheduling_interrupts
4340 ± 32% +25.7% 5457 ± 33% interrupts.CPU189.NMI:Non-maskable_interrupts
4340 ± 32% +25.7% 5457 ± 33% interrupts.CPU189.PMI:Performance_monitoring_interrupts
4258 ± 36% +28.2% 5458 ± 33% interrupts.CPU191.NMI:Non-maskable_interrupts
4258 ± 36% +28.2% 5458 ± 33% interrupts.CPU191.PMI:Performance_monitoring_interrupts
6.50 ± 23% +1034.6% 73.75 ±104% interrupts.CPU199.RES:Rescheduling_interrupts
153.00 ± 57% -72.5% 42.00 ±144% interrupts.CPU200.RES:Rescheduling_interrupts
3332 +64.2% 5472 ± 33% interrupts.CPU202.NMI:Non-maskable_interrupts
3332 +64.2% 5472 ± 33% interrupts.CPU202.PMI:Performance_monitoring_interrupts
3397 ± 2% +60.9% 5464 ± 33% interrupts.CPU203.NMI:Non-maskable_interrupts
3397 ± 2% +60.9% 5464 ± 33% interrupts.CPU203.PMI:Performance_monitoring_interrupts
4299 ± 31% +48.0% 6364 ± 24% interrupts.CPU205.NMI:Non-maskable_interrupts
4299 ± 31% +48.0% 6364 ± 24% interrupts.CPU205.PMI:Performance_monitoring_interrupts
4030 ± 33% +34.3% 5413 ± 33% interrupts.CPU214.NMI:Non-maskable_interrupts
4030 ± 33% +34.3% 5413 ± 33% interrupts.CPU214.PMI:Performance_monitoring_interrupts
4060 ± 35% +54.4% 6267 ± 24% interrupts.CPU215.NMI:Non-maskable_interrupts
4060 ± 35% +54.4% 6267 ± 24% interrupts.CPU215.PMI:Performance_monitoring_interrupts
4242 ± 37% +50.4% 6382 ± 24% interrupts.CPU225.NMI:Non-maskable_interrupts
4242 ± 37% +50.4% 6382 ± 24% interrupts.CPU225.PMI:Performance_monitoring_interrupts
3351 +63.0% 5462 ± 33% interrupts.CPU226.NMI:Non-maskable_interrupts
3351 +63.0% 5462 ± 33% interrupts.CPU226.PMI:Performance_monitoring_interrupts
7.25 ± 5% +455.2% 40.25 ±117% interrupts.CPU237.RES:Rescheduling_interrupts
4169 ± 34% +30.8% 5453 ± 33% interrupts.CPU242.NMI:Non-maskable_interrupts
4169 ± 34% +30.8% 5453 ± 33% interrupts.CPU242.PMI:Performance_monitoring_interrupts
3645 ± 2% +80.2% 6568 ± 24% interrupts.CPU25.NMI:Non-maskable_interrupts
3645 ± 2% +80.2% 6568 ± 24% interrupts.CPU25.PMI:Performance_monitoring_interrupts
26.00 ± 93% +714.4% 211.75 ± 70% interrupts.CPU258.RES:Rescheduling_interrupts
8.00 ± 15% +1962.5% 165.00 ±156% interrupts.CPU260.RES:Rescheduling_interrupts
4144 ± 33% +75.5% 7274 interrupts.CPU262.NMI:Non-maskable_interrupts
4144 ± 33% +75.5% 7274 interrupts.CPU262.PMI:Performance_monitoring_interrupts
4391 ± 31% +24.6% 5470 ± 33% interrupts.CPU264.NMI:Non-maskable_interrupts
4391 ± 31% +24.6% 5470 ± 33% interrupts.CPU264.PMI:Performance_monitoring_interrupts
3475 ± 2% +83.4% 6373 ± 24% interrupts.CPU274.NMI:Non-maskable_interrupts
3475 ± 2% +83.4% 6373 ± 24% interrupts.CPU274.PMI:Performance_monitoring_interrupts
4454 ± 35% +22.7% 5465 ± 32% interrupts.CPU276.NMI:Non-maskable_interrupts
4454 ± 35% +22.7% 5465 ± 32% interrupts.CPU276.PMI:Performance_monitoring_interrupts
4407 ± 35% +44.3% 6361 ± 24% interrupts.CPU277.NMI:Non-maskable_interrupts
4407 ± 35% +44.3% 6361 ± 24% interrupts.CPU277.PMI:Performance_monitoring_interrupts
4466 ± 31% +67.9% 7497 interrupts.CPU28.NMI:Non-maskable_interrupts
4466 ± 31% +67.9% 7497 interrupts.CPU28.PMI:Performance_monitoring_interrupts
4392 ± 31% +24.5% 5467 ± 33% interrupts.CPU284.NMI:Non-maskable_interrupts
4392 ± 31% +24.5% 5467 ± 33% interrupts.CPU284.PMI:Performance_monitoring_interrupts
3476 ± 2% +57.0% 5456 ± 33% interrupts.CPU285.NMI:Non-maskable_interrupts
3476 ± 2% +57.0% 5456 ± 33% interrupts.CPU285.PMI:Performance_monitoring_interrupts
4306 ± 31% +47.0% 6329 ± 24% interrupts.CPU286.NMI:Non-maskable_interrupts
4306 ± 31% +47.0% 6329 ± 24% interrupts.CPU286.PMI:Performance_monitoring_interrupts
3471 ± 2% +55.4% 5394 ± 33% interrupts.CPU287.NMI:Non-maskable_interrupts
3471 ± 2% +55.4% 5394 ± 33% interrupts.CPU287.PMI:Performance_monitoring_interrupts
189.00 ± 88% -93.1% 13.00 ± 5% interrupts.CPU287.TLB:TLB_shootdowns
7308 -48.9% 3736 interrupts.CPU31.NMI:Non-maskable_interrupts
7308 -48.9% 3736 interrupts.CPU31.PMI:Performance_monitoring_interrupts
203.50 ±144% -96.3% 7.50 ± 42% interrupts.CPU31.RES:Rescheduling_interrupts
410.25 ±133% -92.8% 29.50 ±120% interrupts.CPU41.RES:Rescheduling_interrupts
33.25 ±100% +740.6% 279.50 ± 41% interrupts.CPU42.RES:Rescheduling_interrupts
3654 +77.2% 6476 ± 24% interrupts.CPU5.NMI:Non-maskable_interrupts
3654 +77.2% 6476 ± 24% interrupts.CPU5.PMI:Performance_monitoring_interrupts
3519 ± 2% +83.1% 6442 ± 24% interrupts.CPU53.NMI:Non-maskable_interrupts
3519 ± 2% +83.1% 6442 ± 24% interrupts.CPU53.PMI:Performance_monitoring_interrupts
4446 ± 35% +24.3% 5527 ± 33% interrupts.CPU55.NMI:Non-maskable_interrupts
4446 ± 35% +24.3% 5527 ± 33% interrupts.CPU55.PMI:Performance_monitoring_interrupts
697.50 ± 79% -94.2% 40.75 ±117% interrupts.CPU56.RES:Rescheduling_interrupts
4437 ± 36% +24.2% 5512 ± 33% interrupts.CPU59.NMI:Non-maskable_interrupts
4437 ± 36% +24.2% 5512 ± 33% interrupts.CPU59.PMI:Performance_monitoring_interrupts
204.00 ± 3% +18.3% 241.25 ± 7% interrupts.CPU6.RES:Rescheduling_interrupts
4471 ± 35% +43.9% 6433 ± 24% interrupts.CPU61.NMI:Non-maskable_interrupts
4471 ± 35% +43.9% 6433 ± 24% interrupts.CPU61.PMI:Performance_monitoring_interrupts
4440 ± 35% +45.3% 6453 ± 24% interrupts.CPU63.NMI:Non-maskable_interrupts
4440 ± 35% +45.3% 6453 ± 24% interrupts.CPU63.PMI:Performance_monitoring_interrupts
98.00 ± 68% -63.5% 35.75 ± 37% interrupts.CPU71.RES:Rescheduling_interrupts
4189 ± 34% +53.1% 6413 ± 24% interrupts.CPU76.NMI:Non-maskable_interrupts
4189 ± 34% +53.1% 6413 ± 24% interrupts.CPU76.PMI:Performance_monitoring_interrupts
4307 ± 35% +48.3% 6389 ± 24% interrupts.CPU79.NMI:Non-maskable_interrupts
4307 ± 35% +48.3% 6389 ± 24% interrupts.CPU79.PMI:Performance_monitoring_interrupts
4535 ± 33% +23.9% 5620 ± 33% interrupts.CPU8.NMI:Non-maskable_interrupts
4535 ± 33% +23.9% 5620 ± 33% interrupts.CPU8.PMI:Performance_monitoring_interrupts
4156 ± 33% +31.3% 5459 ± 33% interrupts.CPU81.NMI:Non-maskable_interrupts
4156 ± 33% +31.3% 5459 ± 33% interrupts.CPU81.PMI:Performance_monitoring_interrupts
3514 ± 2% +56.1% 5487 ± 33% interrupts.CPU85.NMI:Non-maskable_interrupts
3514 ± 2% +56.1% 5487 ± 33% interrupts.CPU85.PMI:Performance_monitoring_interrupts
157.25 ±140% -96.0% 6.25 ± 13% interrupts.CPU94.RES:Rescheduling_interrupts
3450 +58.9% 5481 ± 33% interrupts.CPU96.NMI:Non-maskable_interrupts
3450 +58.9% 5481 ± 33% interrupts.CPU96.PMI:Performance_monitoring_interrupts
6757 ± 2% -46.0% 3648 interrupts.CPU99.NMI:Non-maskable_interrupts
6757 ± 2% -46.0% 3648 interrupts.CPU99.PMI:Performance_monitoring_interrupts
32590 ± 8% -29.7% 22912 interrupts.RES:Rescheduling_interrupts
149331 -44.0% 83638 ± 18% softirqs.CPU0.RCU
154172 -65.3% 53549 ± 52% softirqs.CPU1.RCU
149705 -75.4% 36779 ± 3% softirqs.CPU10.RCU
137249 ± 6% -52.8% 64842 ± 22% softirqs.CPU100.RCU
144358 ± 3% -57.4% 61477 ± 7% softirqs.CPU101.RCU
139086 ± 2% -53.8% 64278 ± 22% softirqs.CPU102.RCU
141535 -52.7% 66922 ± 15% softirqs.CPU103.RCU
145647 ± 3% -62.2% 55051 ± 20% softirqs.CPU104.RCU
143618 -55.7% 63585 ± 21% softirqs.CPU105.RCU
142656 ± 2% -42.5% 82020 ± 12% softirqs.CPU106.RCU
142621 ± 3% -47.9% 74257 ± 4% softirqs.CPU107.RCU
143977 -73.0% 38851 ± 7% softirqs.CPU108.RCU
148186 -73.3% 39616 ± 16% softirqs.CPU109.RCU
148411 -66.0% 50432 ± 38% softirqs.CPU11.RCU
131733 ± 3% -47.7% 68907 ± 33% softirqs.CPU110.RCU
135896 ± 2% -51.8% 65445 ± 10% softirqs.CPU111.RCU
146748 ± 4% -57.3% 62708 ± 12% softirqs.CPU112.RCU
143644 ± 2% -54.0% 66035 ± 20% softirqs.CPU113.RCU
148131 -43.6% 83565 ± 6% softirqs.CPU114.RCU
142262 ± 5% -51.0% 69776 ± 14% softirqs.CPU115.RCU
137307 ± 2% -36.9% 86610 ± 17% softirqs.CPU116.RCU
141385 ± 3% -45.0% 77750 ± 5% softirqs.CPU117.RCU
145208 ± 2% -48.2% 75160 ± 2% softirqs.CPU118.RCU
142544 ± 2% -43.5% 80535 ± 12% softirqs.CPU119.RCU
150060 -72.0% 41986 ± 11% softirqs.CPU12.RCU
146692 -56.6% 63652 ± 5% softirqs.CPU120.RCU
147222 -52.3% 70283 ± 10% softirqs.CPU121.RCU
143793 -55.3% 64344 ± 10% softirqs.CPU122.RCU
142026 ± 4% -55.8% 62711 ± 7% softirqs.CPU123.RCU
143670 -43.6% 81073 ± 12% softirqs.CPU124.RCU
142780 ± 3% -46.2% 76838 ± 11% softirqs.CPU125.RCU
133321 ± 3% -40.2% 79755 ± 12% softirqs.CPU126.RCU
145916 -45.7% 79185 ± 9% softirqs.CPU127.RCU
136628 ± 3% -41.6% 79814 ± 7% softirqs.CPU128.RCU
137059 -41.0% 80805 ± 12% softirqs.CPU129.RCU
149696 -63.0% 55348 ± 10% softirqs.CPU13.RCU
147229 ± 2% -49.3% 74690 ± 5% softirqs.CPU130.RCU
147899 ± 3% -54.4% 67418 ± 4% softirqs.CPU131.RCU
132512 ± 14% -47.0% 70285 ± 28% softirqs.CPU132.RCU
133682 ± 2% -52.9% 62936 ± 9% softirqs.CPU133.RCU
146490 -71.7% 41517 ± 5% softirqs.CPU134.RCU
148518 ± 3% -72.3% 41066 ± 3% softirqs.CPU135.RCU
145103 ± 5% -51.8% 69963 ± 24% softirqs.CPU136.RCU
149436 -62.2% 56443 ± 9% softirqs.CPU137.RCU
142993 ± 2% -57.3% 60995 ± 20% softirqs.CPU138.RCU
147013 -58.0% 61805 ± 4% softirqs.CPU139.RCU
150367 -69.6% 45759 ± 21% softirqs.CPU14.RCU
148173 -47.1% 78334 ± 8% softirqs.CPU140.RCU
143437 -49.1% 73055 ± 8% softirqs.CPU141.RCU
139227 ± 4% -43.4% 78863 ± 18% softirqs.CPU142.RCU
143866 -48.7% 73768 ± 6% softirqs.CPU143.RCU
142245 -47.0% 75426 ± 13% softirqs.CPU144.RCU
139469 ± 8% -37.3% 87475 ± 15% softirqs.CPU145.RCU
143762 -68.9% 44695 ± 5% softirqs.CPU146.RCU
145731 ± 5% -65.5% 50262 ± 16% softirqs.CPU147.RCU
139469 ± 2% -59.9% 55960 ± 12% softirqs.CPU148.RCU
132710 ± 2% -56.2% 58180 ± 15% softirqs.CPU149.RCU
149868 -61.2% 58162 ± 42% softirqs.CPU15.RCU
147153 ± 2% -70.8% 43017 ± 23% softirqs.CPU150.RCU
148657 -65.8% 50882 ± 44% softirqs.CPU151.RCU
143775 ± 4% -37.5% 89910 ± 15% softirqs.CPU152.RCU
144799 ± 3% -49.8% 72653 ± 11% softirqs.CPU153.RCU
138303 ± 2% -42.7% 79258 ± 20% softirqs.CPU154.RCU
138800 -42.1% 80362 ± 14% softirqs.CPU155.RCU
147270 -73.8% 38514 ± 5% softirqs.CPU156.RCU
147722 ± 3% -72.1% 41276 ± 11% softirqs.CPU157.RCU
134079 ± 10% -57.7% 56667 ± 12% softirqs.CPU158.RCU
133037 ± 7% -50.8% 65481 ± 13% softirqs.CPU159.RCU
152164 -69.0% 47165 ± 20% softirqs.CPU16.RCU
144543 ± 2% -58.0% 60684 ± 13% softirqs.CPU160.RCU
147968 -59.9% 59261 ± 12% softirqs.CPU161.RCU
148447 ± 2% -64.2% 53086 ± 7% softirqs.CPU162.RCU
149657 -62.1% 56720 ± 5% softirqs.CPU163.RCU
137962 ± 5% -47.7% 72115 ± 4% softirqs.CPU164.RCU
142832 -50.0% 71411 ± 8% softirqs.CPU165.RCU
143128 ± 3% -48.4% 73878 ± 4% softirqs.CPU166.RCU
141600 -48.7% 72637 ± 4% softirqs.CPU167.RCU
147401 ± 3% -68.9% 45899 ± 15% softirqs.CPU168.RCU
146766 -70.6% 43172 ± 12% softirqs.CPU169.RCU
147087 ± 3% -70.1% 43940 ± 32% softirqs.CPU17.RCU
137946 ± 2% -48.7% 70789 ± 2% softirqs.CPU170.RCU
132854 ± 5% -43.1% 75539 ± 6% softirqs.CPU171.RCU
145481 -68.6% 45673 ± 8% softirqs.CPU172.RCU
149354 -67.4% 48727 ± 3% softirqs.CPU173.RCU
145270 ± 2% -59.3% 59184 ± 16% softirqs.CPU174.RCU
145388 ± 4% -59.8% 58494 ± 9% softirqs.CPU175.RCU
143228 ± 6% -54.1% 65769 ± 21% softirqs.CPU176.RCU
136993 ± 6% -59.3% 55694 ± 10% softirqs.CPU177.RCU
149681 -62.2% 56645 ± 7% softirqs.CPU178.RCU
147550 ± 2% -55.4% 65788 ± 30% softirqs.CPU179.RCU
137404 ± 4% -66.5% 46040 ± 17% softirqs.CPU18.RCU
134946 ± 5% -43.2% 76653 ± 11% softirqs.CPU180.RCU
130799 ± 9% -40.9% 77282 ± 13% softirqs.CPU181.RCU
146835 -50.3% 73011 ± 9% softirqs.CPU182.RCU
148394 -47.3% 78163 ± 5% softirqs.CPU183.RCU
148806 -63.0% 55081 ± 7% softirqs.CPU184.RCU
142254 ± 5% -60.0% 56889 ± 12% softirqs.CPU185.RCU
146921 -48.3% 75892 ± 10% softirqs.CPU186.RCU
142122 ± 4% -57.1% 61031 ± 11% softirqs.CPU187.RCU
149177 -49.4% 75525 ± 14% softirqs.CPU188.RCU
148051 -44.2% 82643 ± 14% softirqs.CPU189.RCU
149028 -75.5% 36446 ± 4% softirqs.CPU19.RCU
141030 -48.3% 72850 ± 6% softirqs.CPU190.RCU
142660 ± 3% -42.6% 81944 ± 15% softirqs.CPU191.RCU
134522 -43.3% 76270 ± 4% softirqs.CPU192.RCU
136038 ± 2% -35.5% 87779 ± 11% softirqs.CPU193.RCU
146055 ± 3% -51.9% 70309 ± 9% softirqs.CPU194.RCU
146754 ± 2% -72.7% 40050 ± 17% softirqs.CPU195.RCU
135458 ± 3% -57.1% 58154 ± 21% softirqs.CPU196.RCU
146622 -61.9% 55807 ± 9% softirqs.CPU197.RCU
148294 ± 2% -55.8% 65572 ± 32% softirqs.CPU198.RCU
150500 -59.3% 61199 ± 10% softirqs.CPU199.RCU
144541 ± 2% -73.1% 38859 ± 12% softirqs.CPU2.RCU
150631 -75.9% 36230 ± 5% softirqs.CPU20.RCU
146123 -51.9% 70303 ± 15% softirqs.CPU200.RCU
147081 -51.4% 71416 ± 14% softirqs.CPU201.RCU
141476 ± 2% -53.6% 65702 ± 14% softirqs.CPU202.RCU
136300 ± 5% -53.2% 63846 ± 7% softirqs.CPU203.RCU
146788 ± 2% -48.6% 75440 ± 4% softirqs.CPU204.RCU
148322 -47.1% 78433 ± 10% softirqs.CPU205.RCU
138706 ± 3% -45.5% 75583 ± 5% softirqs.CPU206.RCU
137361 -45.0% 75562 ± 3% softirqs.CPU207.RCU
138359 -47.0% 73342 ± 2% softirqs.CPU208.RCU
136332 -45.0% 74917 ± 3% softirqs.CPU209.RCU
150277 -73.0% 40538 ± 6% softirqs.CPU21.RCU
145124 ± 3% -47.9% 75589 ± 6% softirqs.CPU210.RCU
146271 -71.6% 41471 ± 13% softirqs.CPU211.RCU
139056 ± 2% -46.7% 74081 ± 18% softirqs.CPU212.RCU
142892 -46.3% 76733 ± 23% softirqs.CPU213.RCU
146304 ± 2% -68.4% 46184 ± 3% softirqs.CPU214.RCU
143095 -63.6% 52118 ± 15% softirqs.CPU215.RCU
147312 -56.8% 63666 ± 4% softirqs.CPU216.RCU
143855 ± 2% -74.1% 37305 ± 8% softirqs.CPU217.RCU
137220 ± 2% -42.2% 79354 ± 8% softirqs.CPU218.RCU
140877 ± 2% -44.6% 78032 ± 9% softirqs.CPU219.RCU
149192 -75.1% 37180 ± 9% softirqs.CPU22.RCU
148680 -63.8% 53862 ± 9% softirqs.CPU220.RCU
149030 -59.0% 61029 ± 24% softirqs.CPU221.RCU
129255 ± 5% -54.2% 59214 ± 8% softirqs.CPU222.RCU
141081 ± 3% -57.2% 60382 ± 4% softirqs.CPU223.RCU
141349 ± 2% -52.5% 67175 ± 16% softirqs.CPU224.RCU
140947 -51.2% 68741 ± 26% softirqs.CPU225.RCU
142097 -56.7% 61538 ± 9% softirqs.CPU226.RCU
141631 -55.9% 62393 ± 9% softirqs.CPU227.RCU
138989 ± 5% -44.5% 77152 ± 4% softirqs.CPU228.RCU
142158 -46.3% 76380 ± 2% softirqs.CPU229.RCU
149213 -74.0% 38726 ± 11% softirqs.CPU23.RCU
148070 -45.3% 80985 ± 3% softirqs.CPU230.RCU
147054 -47.0% 77910 softirqs.CPU231.RCU
132579 -51.7% 63982 ± 22% softirqs.CPU232.RCU
139684 -54.5% 63535 ± 16% softirqs.CPU233.RCU
143487 ± 5% -72.6% 39378 ± 11% softirqs.CPU234.RCU
142380 -67.3% 46619 ± 27% softirqs.CPU235.RCU
143210 ± 3% -45.7% 77828 ± 7% softirqs.CPU236.RCU
142725 ± 2% -49.5% 72074 ± 7% softirqs.CPU237.RCU
144390 ± 2% -64.4% 51394 ± 2% softirqs.CPU238.RCU
138946 ± 4% -64.9% 48756 ± 7% softirqs.CPU239.RCU
139948 ± 6% -66.7% 46574 ± 22% softirqs.CPU24.RCU
139319 ± 3% -46.0% 75238 ± 11% softirqs.CPU240.RCU
145038 ± 4% -44.2% 80861 ± 5% softirqs.CPU241.RCU
139431 ± 4% -41.8% 81101 ± 11% softirqs.CPU242.RCU
142230 ± 2% -44.9% 78421 ± 7% softirqs.CPU243.RCU
146946 -52.7% 69526 ± 26% softirqs.CPU244.RCU
143217 ± 2% -55.5% 63802 ± 20% softirqs.CPU245.RCU
142411 -56.5% 61903 ± 14% softirqs.CPU246.RCU
141483 ± 5% -59.1% 57856 ± 14% softirqs.CPU247.RCU
142446 -46.3% 76539 ± 10% softirqs.CPU248.RCU
142154 ± 2% -41.4% 83313 ± 6% softirqs.CPU249.RCU
150478 -75.0% 37572 ± 4% softirqs.CPU25.RCU
139540 ± 3% -44.7% 77146 ± 7% softirqs.CPU250.RCU
146768 -46.4% 78658 ± 6% softirqs.CPU251.RCU
144365 -59.0% 59236 ± 8% softirqs.CPU252.RCU
139798 ± 3% -45.3% 76470 ± 21% softirqs.CPU253.RCU
146669 -58.7% 60580 ± 11% softirqs.CPU254.RCU
142679 -56.8% 61582 ± 14% softirqs.CPU255.RCU
147315 ± 2% -69.7% 44583 ± 31% softirqs.CPU256.RCU
143467 -75.1% 35694 ± 3% softirqs.CPU257.RCU
133596 ± 2% -46.6% 71299 ± 8% softirqs.CPU258.RCU
130651 ± 4% -42.0% 75759 ± 23% softirqs.CPU259.RCU
150875 -69.9% 45352 ± 40% softirqs.CPU26.RCU
145742 ± 2% -50.0% 72835 ± 17% softirqs.CPU260.RCU
147088 -47.0% 77954 ± 9% softirqs.CPU261.RCU
136092 ± 2% -42.5% 78240 ± 5% softirqs.CPU262.RCU
137450 ± 2% -47.0% 72861 ± 8% softirqs.CPU263.RCU
147266 -67.4% 48024 ± 11% softirqs.CPU264.RCU
150209 ± 7% -58.9% 61693 ± 17% softirqs.CPU265.RCU
138347 ± 2% -64.4% 49208 ± 12% softirqs.CPU266.RCU
152793 ± 2% -68.8% 47730 ± 8% softirqs.CPU267.RCU
145801 -67.8% 47013 ± 20% softirqs.CPU268.RCU
138275 ± 6% -66.3% 46622 ± 25% softirqs.CPU269.RCU
148179 ± 3% -73.9% 38702 ± 2% softirqs.CPU27.RCU
144943 -70.5% 42692 ± 17% softirqs.CPU270.RCU
144114 ± 2% -65.2% 50135 ± 15% softirqs.CPU271.RCU
143648 ± 2% -59.7% 57848 ± 25% softirqs.CPU272.RCU
136667 ± 9% -60.0% 54600 ± 42% softirqs.CPU273.RCU
134889 ± 4% -66.8% 44719 ± 15% softirqs.CPU274.RCU
141352 ± 2% -66.7% 47078 ± 17% softirqs.CPU275.RCU
149548 ± 3% -72.8% 40747 ± 5% softirqs.CPU276.RCU
144559 -68.6% 45419 ± 11% softirqs.CPU277.RCU
152609 ± 3% -65.5% 52614 ± 21% softirqs.CPU278.RCU
144128 -69.2% 44404 ± 5% softirqs.CPU279.RCU
149528 ± 2% -72.2% 41632 ± 19% softirqs.CPU28.RCU
138839 ± 2% -72.9% 37575 ± 5% softirqs.CPU280.RCU
144458 -72.4% 39807 ± 6% softirqs.CPU281.RCU
142443 ± 2% -65.3% 49439 ± 18% softirqs.CPU282.RCU
144386 -63.1% 53283 ± 38% softirqs.CPU283.RCU
149196 -63.1% 55034 ± 15% softirqs.CPU284.RCU
142301 -66.0% 48379 ± 9% softirqs.CPU285.RCU
150683 -67.8% 48488 ± 17% softirqs.CPU286.RCU
130981 ± 3% -57.6% 55596 ± 14% softirqs.CPU287.RCU
152743 ± 2% -75.2% 37816 ± 8% softirqs.CPU29.RCU
146420 ± 2% -71.4% 41827 ± 7% softirqs.CPU3.RCU
149996 -61.5% 57714 ± 53% softirqs.CPU30.RCU
150934 -65.8% 51629 ± 24% softirqs.CPU31.RCU
150459 ± 2% -75.8% 36486 ± 2% softirqs.CPU32.RCU
149892 -69.8% 45277 ± 32% softirqs.CPU33.RCU
149804 ± 5% -66.8% 49681 ± 36% softirqs.CPU34.RCU
137898 ± 5% -74.2% 35608 ± 3% softirqs.CPU35.RCU
152001 ± 2% -74.9% 38088 ± 10% softirqs.CPU36.RCU
148600 ± 2% -69.7% 45067 ± 22% softirqs.CPU37.RCU
150712 ± 5% -72.6% 41356 ± 21% softirqs.CPU38.RCU
148916 -48.6% 76615 ± 32% softirqs.CPU39.RCU
151563 ± 3% -72.7% 41428 ± 5% softirqs.CPU4.RCU
147078 -68.2% 46806 ± 29% softirqs.CPU40.RCU
147330 ± 5% -62.1% 55909 ± 34% softirqs.CPU41.RCU
146539 -59.0% 60100 ± 18% softirqs.CPU42.RCU
143115 ± 2% -66.1% 48457 ± 22% softirqs.CPU43.RCU
145713 ± 2% -62.2% 55128 ± 25% softirqs.CPU44.RCU
150433 -67.1% 49527 ± 11% softirqs.CPU45.RCU
148077 -73.6% 39126 ± 4% softirqs.CPU46.RCU
158449 ± 5% -53.1% 74380 ± 9% softirqs.CPU47.RCU
152460 -65.6% 52407 ± 7% softirqs.CPU48.RCU
148122 -62.2% 56012 ± 19% softirqs.CPU49.RCU
149452 -73.2% 40044 ± 17% softirqs.CPU5.RCU
148740 ± 3% -62.4% 55890 ± 15% softirqs.CPU50.RCU
146229 -64.5% 51882 ± 23% softirqs.CPU51.RCU
146880 ± 2% -71.0% 42540 ± 6% softirqs.CPU52.RCU
152004 ± 3% -70.9% 44225 ± 10% softirqs.CPU53.RCU
137981 ± 3% -65.5% 47653 ± 13% softirqs.CPU54.RCU
144774 -58.7% 59763 ± 37% softirqs.CPU55.RCU
152584 ± 2% -69.0% 47365 ± 15% softirqs.CPU56.RCU
145585 ± 2% -58.7% 60148 ± 28% softirqs.CPU57.RCU
152466 -62.7% 56864 ± 7% softirqs.CPU58.RCU
148066 -65.8% 50694 ± 10% softirqs.CPU59.RCU
147809 ± 2% -71.2% 42539 ± 24% softirqs.CPU6.RCU
139054 ± 4% -69.7% 42144 ± 16% softirqs.CPU60.RCU
146609 -65.6% 50385 ± 13% softirqs.CPU61.RCU
140736 ± 4% -70.2% 41911 ± 4% softirqs.CPU62.RCU
150042 ± 4% -72.9% 40669 ± 10% softirqs.CPU63.RCU
149494 -69.7% 45283 ± 18% softirqs.CPU64.RCU
146783 ± 4% -67.3% 48007 ± 15% softirqs.CPU65.RCU
150611 -68.2% 47965 ± 13% softirqs.CPU66.RCU
139234 -67.4% 45407 ± 12% softirqs.CPU67.RCU
148960 ± 3% -69.7% 45081 ± 13% softirqs.CPU68.RCU
151217 -49.4% 76556 ± 8% softirqs.CPU69.RCU
149167 -70.5% 44038 ± 10% softirqs.CPU7.RCU
141711 ± 3% -66.5% 47481 ± 5% softirqs.CPU70.RCU
150495 -64.0% 54186 ± 3% softirqs.CPU71.RCU
144744 ± 3% -44.2% 80838 ± 8% softirqs.CPU72.RCU
136696 ± 2% -38.0% 84779 ± 20% softirqs.CPU73.RCU
149733 -46.8% 79645 ± 5% softirqs.CPU74.RCU
147438 ± 2% -46.8% 78397 ± 17% softirqs.CPU75.RCU
133887 ± 9% -54.9% 60372 ± 20% softirqs.CPU76.RCU
143925 ± 2% -48.1% 74722 ± 26% softirqs.CPU77.RCU
146567 -59.8% 58886 ± 16% softirqs.CPU78.RCU
147427 -57.3% 62883 ± 10% softirqs.CPU79.RCU
150664 -68.1% 48076 ± 27% softirqs.CPU8.RCU
146030 -63.8% 52917 ± 18% softirqs.CPU80.RCU
139080 ± 2% -64.0% 50007 ± 24% softirqs.CPU81.RCU
144590 ± 2% -43.9% 81176 ± 17% softirqs.CPU82.RCU
141457 ± 2% -48.2% 73232 ± 21% softirqs.CPU83.RCU
146686 ± 2% -73.7% 38572 ± 8% softirqs.CPU84.RCU
149314 -61.9% 56963 ± 32% softirqs.CPU85.RCU
136532 ± 5% -45.9% 73883 ± 8% softirqs.CPU86.RCU
136856 ± 6% -42.2% 79101 ± 6% softirqs.CPU87.RCU
147431 -53.4% 68679 ± 17% softirqs.CPU88.RCU
128555 ± 5% -50.4% 63740 ± 19% softirqs.CPU89.RCU
150064 -67.4% 48899 ± 33% softirqs.CPU9.RCU
149543 -59.9% 59930 ± 11% softirqs.CPU90.RCU
148388 ± 3% -60.4% 58758 ± 21% softirqs.CPU91.RCU
134059 ± 7% -45.0% 73673 ± 9% softirqs.CPU92.RCU
139940 -44.7% 77438 ± 6% softirqs.CPU93.RCU
145683 -72.5% 40029 ± 7% softirqs.CPU94.RCU
149777 -47.4% 78782 ± 18% softirqs.CPU95.RCU
149632 -45.1% 82115 ± 2% softirqs.CPU96.RCU
146641 ± 2% -46.2% 78878 ± 13% softirqs.CPU97.RCU
146629 -43.5% 82815 ± 13% softirqs.CPU98.RCU
138969 ± 6% -42.3% 80249 ± 5% softirqs.CPU99.RCU
41531880 -58.0% 17433667 ± 3% softirqs.RCU



will-it-scale.per_process_ops

80000 +-------------------------------------------------------------------+
| |
70000 |.+ .+.+..+. .+. .+.+..+.+.. .+.+..+.+..+.+.. |
60000 |-+. +..+ +. + + |
| |
50000 |-+ |
| |
40000 |-+ |
| |
30000 |-+ |
20000 |-+ |
| |
10000 |-+ |
| |
0 +-------------------------------------------------------------------+


will-it-scale.workload

2.2e+07 +-----------------------------------------------------------------+
2e+07 |-+ .+.+.. .+.+. .+..+. .+.. |
|.+..+ +.+..+.+.+. +..+.+ + +.+ |
1.8e+07 |-+ |
1.6e+07 |-+ |
1.4e+07 |-+ |
1.2e+07 |-+ |
| |
1e+07 |-+ |
8e+06 |-+ |
6e+06 |-+ |
4e+06 |-+ |
| |
2e+06 |-+ |
0 +-----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample

***************************************************************************************************
lkp-knm01: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-knm01/lock1/will-it-scale/0x11

commit:
0a68ff5e2e ("fcntl: Distribute switch variables for initialization")
6d390e4b5d ("locks: fix a potential use-after-free problem when wakeup a waiter")

0a68ff5e2e7cf226 6d390e4b5d48ec03bb87e63cf0a
---------------- ---------------------------
       fail:runs    %reproduction    fail:runs
         (old)                         (new)
:4 50% 2:4 dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
2:4 -50% :4 dmesg.WARNING:at_ip__fsnotify_parent/0x
         %stddev       %change        %stddev
          (old)           |            (new)
4555 -51.3% 2218 ± 3% will-it-scale.per_thread_ops
760.80 ± 6% -26.1% 561.95 ± 7% will-it-scale.time.user_time
1312253 -51.3% 639028 ± 3% will-it-scale.workload
153716 +10.6% 170015 ± 2% meminfo.Shmem
1.01 ± 5% -0.2 0.79 ± 5% mpstat.cpu.all.usr%
6359 ± 5% -11.1% 5656 ± 3% sched_debug.cpu.curr->pid.max
153760 +10.5% 169943 ± 2% numa-meminfo.node0.Shmem
11621 ± 13% +19.8% 13920 ± 9% numa-meminfo.node1.SUnreclaim
38459 +10.6% 42548 ± 2% numa-vmstat.node0.nr_shmem
2905 ± 13% +19.8% 3480 ± 9% numa-vmstat.node1.nr_slab_unreclaimable
2228 ± 2% +13.6% 2530 ± 6% slabinfo.UNIX.active_objs
2228 ± 2% +13.6% 2530 ± 6% slabinfo.UNIX.num_objs
160690 +2.4% 164524 proc-vmstat.nr_active_anon
306982 +1.3% 311087 proc-vmstat.nr_file_pages
38387 +10.7% 42492 ± 2% proc-vmstat.nr_shmem
160690 +2.4% 164524 proc-vmstat.nr_zone_active_anon
4048 ± 2% +13.8% 4607 ± 3% proc-vmstat.pgactivate
5.69 ± 2% -5.1% 5.40 ± 2% perf-stat.i.MPKI
8.616e+09 -2.7% 8.38e+09 perf-stat.i.branch-instructions
1.03 -0.2 0.81 perf-stat.i.branch-miss-rate%
86798208 -24.9% 65209294 perf-stat.i.branch-misses
20.35 +2.2 22.56 perf-stat.i.cache-miss-rate%
40221638 +1.5% 40843608 perf-stat.i.cache-misses
1.977e+08 ± 2% -8.4% 1.812e+08 ± 2% perf-stat.i.cache-references
12.59 +3.4% 13.01 perf-stat.i.cpi
11067 -1.6% 10894 perf-stat.i.cycles-between-cache-misses
0.21 -0.0 0.19 perf-stat.i.iTLB-load-miss-rate%
74059260 -13.5% 64033307 perf-stat.i.iTLB-load-misses
3.541e+10 -3.3% 3.424e+10 perf-stat.i.iTLB-loads
3.536e+10 -3.3% 3.419e+10 perf-stat.i.instructions
477.71 +11.9% 534.66 perf-stat.i.instructions-per-iTLB-miss
0.08 -3.2% 0.08 perf-stat.i.ipc
1.00 -0.2 0.77 perf-stat.overall.branch-miss-rate%
20.34 +2.2 22.52 ± 2% perf-stat.overall.cache-miss-rate%
12.61 +3.4% 13.03 perf-stat.overall.cpi
11066 -1.5% 10895 perf-stat.overall.cycles-between-cache-misses
0.21 -0.0 0.19 perf-stat.overall.iTLB-load-miss-rate%
477.71 +11.9% 534.68 perf-stat.overall.instructions-per-iTLB-miss
0.08 -3.3% 0.08 perf-stat.overall.ipc
8253689 +98.9% 16418473 ± 2% perf-stat.overall.path-length
8.596e+09 -2.7% 8.362e+09 perf-stat.ps.branch-instructions
86094437 -25.0% 64597658 perf-stat.ps.branch-misses
40178634 +1.5% 40798401 perf-stat.ps.cache-misses
1.976e+08 ± 2% -8.3% 1.812e+08 ± 2% perf-stat.ps.cache-references
73832341 -13.6% 63794241 perf-stat.ps.iTLB-load-misses
3.53e+10 -3.3% 3.413e+10 perf-stat.ps.iTLB-loads
3.527e+10 -3.3% 3.411e+10 perf-stat.ps.instructions
1.083e+13 -3.2% 1.048e+13 perf-stat.total.instructions
96.22 -96.2 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
96.11 -96.1 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.fcntl_setlk.do_fcntl.__x64_sys_fcntl
0.94 -0.6 0.39 ± 57% perf-profile.calltrace.cycles-pp.posix_lock_inode.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl
98.41 +0.4 98.78 perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe.do_fcntl
98.14 +0.5 98.67 perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +4.8 4.82 ± 29% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.native_queued_spin_lock_slowpath._raw_spin_lock.locks_delete_block
0.00 +5.1 5.06 ± 28% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.native_queued_spin_lock_slowpath._raw_spin_lock.locks_delete_block.do_lock_file_wait
1.19 +97.1 98.29 perf-profile.calltrace.cycles-pp.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
0.00 +97.3 97.27 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.locks_delete_block.do_lock_file_wait.fcntl_setlk
0.00 +97.4 97.39 perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_delete_block.do_lock_file_wait.fcntl_setlk.do_fcntl
0.00 +97.6 97.64 perf-profile.calltrace.cycles-pp.locks_delete_block.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl
0.95 -0.4 0.52 ± 6% perf-profile.children.cycles-pp.posix_lock_inode
0.60 -0.3 0.30 ± 4% perf-profile.children.cycles-pp.locks_alloc_lock
0.49 -0.3 0.23 ± 4% perf-profile.children.cycles-pp.kmem_cache_alloc
0.20 ± 2% -0.1 0.12 ± 3% perf-profile.children.cycles-pp.kmem_cache_free
0.18 ± 6% -0.1 0.10 ± 5% perf-profile.children.cycles-pp.security_file_lock
0.18 ± 4% -0.1 0.09 ± 7% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.12 ± 3% -0.1 0.04 ± 58% perf-profile.children.cycles-pp.locks_delete_lock_ctx
0.14 -0.1 0.07 ± 7% perf-profile.children.cycles-pp.memset_erms
0.15 ± 8% -0.1 0.09 ± 5% perf-profile.children.cycles-pp.common_file_perm
0.13 ± 3% -0.1 0.06 ± 13% perf-profile.children.cycles-pp.___might_sleep
0.15 ± 3% -0.1 0.09 ± 10% perf-profile.children.cycles-pp._copy_from_user
0.25 ± 4% -0.1 0.19 ± 16% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.09 ± 5% -0.0 0.04 ± 58% perf-profile.children.cycles-pp.__fget_files
0.12 ± 3% -0.0 0.08 ± 11% perf-profile.children.cycles-pp.__fget_light
0.11 -0.0 0.07 perf-profile.children.cycles-pp.__libc_fcntl
0.00 +0.1 0.10 ± 10% perf-profile.children.cycles-pp.__locks_wake_up_blocks
99.20 +0.2 99.40 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
99.16 +0.2 99.38 perf-profile.children.cycles-pp.do_syscall_64
98.22 +0.4 98.67 perf-profile.children.cycles-pp.fcntl_setlk
96.40 +1.2 97.57 perf-profile.children.cycles-pp._raw_spin_lock
96.22 +1.2 97.42 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.19 +97.1 98.29 perf-profile.children.cycles-pp.do_lock_file_wait
0.00 +97.6 97.64 perf-profile.children.cycles-pp.locks_delete_block
0.30 ± 4% -0.2 0.11 ± 10% perf-profile.self.cycles-pp.fcntl_setlk
0.18 ± 2% -0.1 0.09 ± 7% perf-profile.self.cycles-pp.kmem_cache_alloc
0.19 -0.1 0.11 ± 7% perf-profile.self.cycles-pp.kmem_cache_free
0.15 ± 4% -0.1 0.08 ± 8% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.12 ± 4% -0.1 0.06 ± 7% perf-profile.self.cycles-pp.memset_erms
0.15 -0.1 0.09 ± 5% perf-profile.self.cycles-pp.posix_lock_inode
0.12 -0.1 0.06 ± 13% perf-profile.self.cycles-pp.___might_sleep
0.25 ± 4% -0.1 0.19 ± 17% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.14 ± 3% -0.1 0.08 ± 5% perf-profile.self.cycles-pp.do_fcntl
0.08 ± 5% -0.0 0.04 ± 58% perf-profile.self.cycles-pp.__fget_files
0.11 -0.0 0.07 ± 14% perf-profile.self.cycles-pp.do_syscall_64
0.09 ± 4% -0.0 0.05 ± 9% perf-profile.self.cycles-pp.locks_alloc_lock
0.00 +0.1 0.10 ± 15% perf-profile.self.cycles-pp.__locks_wake_up_blocks
5550 ± 33% -33.8% 3672 interrupts.CPU102.NMI:Non-maskable_interrupts
5550 ± 33% -33.8% 3672 interrupts.CPU102.PMI:Performance_monitoring_interrupts
5548 ± 33% -33.7% 3677 interrupts.CPU103.NMI:Non-maskable_interrupts
5548 ± 33% -33.7% 3677 interrupts.CPU103.PMI:Performance_monitoring_interrupts
5544 ± 33% -17.3% 4586 ± 34% interrupts.CPU105.NMI:Non-maskable_interrupts
5544 ± 33% -17.3% 4586 ± 34% interrupts.CPU105.PMI:Performance_monitoring_interrupts
236.00 ± 2% +28.8% 304.00 ± 20% interrupts.CPU11.RES:Rescheduling_interrupts
115.00 ± 84% -82.4% 20.25 ±110% interrupts.CPU111.RES:Rescheduling_interrupts
5545 ± 33% -33.7% 3678 interrupts.CPU112.NMI:Non-maskable_interrupts
5545 ± 33% -33.7% 3678 interrupts.CPU112.PMI:Performance_monitoring_interrupts
6470 ± 24% -29.1% 4587 ± 34% interrupts.CPU121.NMI:Non-maskable_interrupts
6470 ± 24% -29.1% 4587 ± 34% interrupts.CPU121.PMI:Performance_monitoring_interrupts
5548 ± 33% -17.1% 4598 ± 34% interrupts.CPU122.NMI:Non-maskable_interrupts
5548 ± 33% -17.1% 4598 ± 34% interrupts.CPU122.PMI:Performance_monitoring_interrupts
6455 ± 24% -29.0% 4581 ± 34% interrupts.CPU129.NMI:Non-maskable_interrupts
6455 ± 24% -29.0% 4581 ± 34% interrupts.CPU129.PMI:Performance_monitoring_interrupts
5535 ± 33% -17.0% 4593 ± 34% interrupts.CPU131.NMI:Non-maskable_interrupts
5535 ± 33% -17.0% 4593 ± 34% interrupts.CPU131.PMI:Performance_monitoring_interrupts
5546 ± 33% -33.7% 3678 interrupts.CPU135.NMI:Non-maskable_interrupts
5546 ± 33% -33.7% 3678 interrupts.CPU135.PMI:Performance_monitoring_interrupts
6451 ± 24% -28.9% 4587 ± 34% interrupts.CPU136.NMI:Non-maskable_interrupts
6451 ± 24% -28.9% 4587 ± 34% interrupts.CPU136.PMI:Performance_monitoring_interrupts
71.50 ±129% -88.5% 8.25 ± 40% interrupts.CPU136.RES:Rescheduling_interrupts
5524 ± 33% -17.0% 4586 ± 34% interrupts.CPU137.NMI:Non-maskable_interrupts
5524 ± 33% -17.0% 4586 ± 34% interrupts.CPU137.PMI:Performance_monitoring_interrupts
108.50 ±135% -93.5% 7.00 interrupts.CPU137.RES:Rescheduling_interrupts
6464 ± 24% -28.9% 4596 ± 34% interrupts.CPU138.NMI:Non-maskable_interrupts
6464 ± 24% -28.9% 4596 ± 34% interrupts.CPU138.PMI:Performance_monitoring_interrupts
5537 ± 33% -17.2% 4584 ± 34% interrupts.CPU141.NMI:Non-maskable_interrupts
5537 ± 33% -17.2% 4584 ± 34% interrupts.CPU141.PMI:Performance_monitoring_interrupts
148.25 ±110% -93.8% 9.25 ± 29% interrupts.CPU152.RES:Rescheduling_interrupts
5534 ± 33% -33.6% 3673 interrupts.CPU156.NMI:Non-maskable_interrupts
5534 ± 33% -33.6% 3673 interrupts.CPU156.PMI:Performance_monitoring_interrupts
7360 -37.7% 4584 ± 34% interrupts.CPU169.NMI:Non-maskable_interrupts
7360 -37.7% 4584 ± 34% interrupts.CPU169.PMI:Performance_monitoring_interrupts
114.25 ±115% -74.2% 29.50 ±118% interrupts.CPU170.RES:Rescheduling_interrupts
38.00 ±115% -75.7% 9.25 ± 11% interrupts.CPU177.RES:Rescheduling_interrupts
7367 -50.3% 3663 interrupts.CPU182.NMI:Non-maskable_interrupts
7367 -50.3% 3663 interrupts.CPU182.PMI:Performance_monitoring_interrupts
4597 ± 34% +59.7% 7343 interrupts.CPU185.NMI:Non-maskable_interrupts
4597 ± 34% +59.7% 7343 interrupts.CPU185.PMI:Performance_monitoring_interrupts
110.00 ±134% -91.1% 9.75 ± 25% interrupts.CPU209.RES:Rescheduling_interrupts
5514 ± 33% -17.1% 4572 ± 34% interrupts.CPU211.NMI:Non-maskable_interrupts
5514 ± 33% -17.1% 4572 ± 34% interrupts.CPU211.PMI:Performance_monitoring_interrupts
5646 ± 33% -33.5% 3755 interrupts.CPU22.NMI:Non-maskable_interrupts
5646 ± 33% -33.5% 3755 interrupts.CPU22.PMI:Performance_monitoring_interrupts
5518 ± 33% -17.1% 4573 ± 34% interrupts.CPU228.NMI:Non-maskable_interrupts
5518 ± 33% -17.1% 4573 ± 34% interrupts.CPU228.PMI:Performance_monitoring_interrupts
5642 ± 33% -16.9% 4688 ± 34% interrupts.CPU23.NMI:Non-maskable_interrupts
5642 ± 33% -16.9% 4688 ± 34% interrupts.CPU23.PMI:Performance_monitoring_interrupts
6438 ± 24% -29.1% 4568 ± 34% interrupts.CPU236.NMI:Non-maskable_interrupts
6438 ± 24% -29.1% 4568 ± 34% interrupts.CPU236.PMI:Performance_monitoring_interrupts
5524 ± 33% -33.8% 3659 interrupts.CPU241.NMI:Non-maskable_interrupts
5524 ± 33% -33.8% 3659 interrupts.CPU241.PMI:Performance_monitoring_interrupts
69.50 ±135% -87.8% 8.50 ± 13% interrupts.CPU256.RES:Rescheduling_interrupts
5513 ± 33% -17.4% 4556 ± 33% interrupts.CPU261.NMI:Non-maskable_interrupts
5513 ± 33% -17.4% 4556 ± 33% interrupts.CPU261.PMI:Performance_monitoring_interrupts
6428 ± 24% -43.2% 3654 interrupts.CPU263.NMI:Non-maskable_interrupts
6428 ± 24% -43.2% 3654 interrupts.CPU263.PMI:Performance_monitoring_interrupts
5543 ± 33% -33.8% 3670 interrupts.CPU264.NMI:Non-maskable_interrupts
5543 ± 33% -33.8% 3670 interrupts.CPU264.PMI:Performance_monitoring_interrupts
44.50 ±121% -81.5% 8.25 ± 15% interrupts.CPU267.RES:Rescheduling_interrupts
3754 +98.2% 7440 interrupts.CPU35.NMI:Non-maskable_interrupts
3754 +98.2% 7440 interrupts.CPU35.PMI:Performance_monitoring_interrupts
5605 ± 33% -33.9% 3707 interrupts.CPU42.NMI:Non-maskable_interrupts
5605 ± 33% -33.9% 3707 interrupts.CPU42.PMI:Performance_monitoring_interrupts
5576 ± 32% -17.3% 4610 ± 34% interrupts.CPU44.NMI:Non-maskable_interrupts
5576 ± 32% -17.3% 4610 ± 34% interrupts.CPU44.PMI:Performance_monitoring_interrupts
7394 -50.1% 3693 interrupts.CPU56.NMI:Non-maskable_interrupts
7394 -50.1% 3693 interrupts.CPU56.PMI:Performance_monitoring_interrupts
7418 -37.6% 4626 ± 34% interrupts.CPU61.NMI:Non-maskable_interrupts
7418 -37.6% 4626 ± 34% interrupts.CPU61.PMI:Performance_monitoring_interrupts
7.25 ± 31% +1051.7% 83.50 ± 62% interrupts.CPU62.RES:Rescheduling_interrupts
6485 ± 24% -28.8% 4619 ± 34% interrupts.CPU65.NMI:Non-maskable_interrupts
6485 ± 24% -28.8% 4619 ± 34% interrupts.CPU65.PMI:Performance_monitoring_interrupts
134.75 ±105% -93.5% 8.75 ± 16% interrupts.CPU65.RES:Rescheduling_interrupts
6487 ± 24% -28.9% 4613 ± 34% interrupts.CPU68.NMI:Non-maskable_interrupts
6487 ± 24% -28.9% 4613 ± 34% interrupts.CPU68.PMI:Performance_monitoring_interrupts
6498 ± 24% -43.2% 3691 interrupts.CPU69.NMI:Non-maskable_interrupts
6498 ± 24% -43.2% 3691 interrupts.CPU69.PMI:Performance_monitoring_interrupts
5550 ± 33% -17.0% 4607 ± 34% interrupts.CPU80.NMI:Non-maskable_interrupts
5550 ± 33% -17.0% 4607 ± 34% interrupts.CPU80.PMI:Performance_monitoring_interrupts
5547 ± 33% -33.6% 3683 interrupts.CPU81.NMI:Non-maskable_interrupts
5547 ± 33% -33.6% 3683 interrupts.CPU81.PMI:Performance_monitoring_interrupts
5545 ± 33% -17.1% 4597 ± 34% interrupts.CPU84.NMI:Non-maskable_interrupts
5545 ± 33% -17.1% 4597 ± 34% interrupts.CPU84.PMI:Performance_monitoring_interrupts
5558 ± 33% -17.1% 4607 ± 34% interrupts.CPU90.NMI:Non-maskable_interrupts
5558 ± 33% -17.1% 4607 ± 34% interrupts.CPU90.PMI:Performance_monitoring_interrupts
5550 ± 33% -33.7% 3679 interrupts.CPU95.NMI:Non-maskable_interrupts
5550 ± 33% -33.7% 3679 interrupts.CPU95.PMI:Performance_monitoring_interrupts
54559 ± 3% -22.4% 42328 ± 7% softirqs.CPU104.RCU
65316 ± 16% -25.4% 48704 softirqs.CPU105.RCU
49154 ± 3% -26.3% 36235 ± 4% softirqs.CPU108.RCU
48474 ± 4% -20.4% 38585 ± 8% softirqs.CPU109.RCU
56182 ± 9% -24.4% 42449 ± 10% softirqs.CPU11.RCU
69518 ± 7% -17.7% 57209 ± 9% softirqs.CPU114.RCU
70411 ± 7% -16.9% 58494 ± 14% softirqs.CPU115.RCU
82704 ± 7% -11.9% 72902 ± 5% softirqs.CPU117.RCU
49949 ± 5% -24.5% 37732 ± 6% softirqs.CPU12.RCU
71866 ± 12% -21.0% 56762 ± 10% softirqs.CPU120.RCU
69964 ± 9% -16.7% 58296 ± 13% softirqs.CPU121.RCU
74258 ± 11% -19.0% 60167 ± 16% softirqs.CPU123.RCU
64097 ± 4% -22.3% 49819 ± 4% softirqs.CPU13.RCU
71335 ± 11% -18.4% 58212 ± 13% softirqs.CPU132.RCU
47661 ± 3% -19.2% 38499 ± 7% softirqs.CPU134.RCU
71835 ± 9% -17.8% 59082 ± 6% softirqs.CPU136.RCU
73184 ± 12% -20.0% 58549 ± 12% softirqs.CPU137.RCU
50549 ± 5% -27.2% 36822 ± 5% softirqs.CPU14.RCU
87211 ± 5% -11.5% 77224 ± 8% softirqs.CPU143.RCU
58973 ± 18% -37.5% 36832 ± 4% softirqs.CPU15.RCU
54512 ± 9% -23.5% 41675 ± 6% softirqs.CPU150.RCU
48970 ± 4% -14.9% 41693 ± 7% softirqs.CPU151.RCU
51441 ± 8% -25.4% 38354 ± 4% softirqs.CPU156.RCU
49719 ± 7% -21.0% 39301 ± 3% softirqs.CPU157.RCU
51413 ± 5% -27.3% 37356 ± 2% softirqs.CPU16.RCU
69966 ± 6% -14.7% 59665 ± 11% softirqs.CPU160.RCU
47327 ± 4% -25.4% 35304 ± 3% softirqs.CPU169.RCU
49442 ± 4% -28.5% 35364 ± 4% softirqs.CPU17.RCU
54983 ± 2% -13.7% 47425 ± 4% softirqs.CPU172.RCU
71486 ± 13% -18.8% 58033 ± 8% softirqs.CPU175.RCU
78719 ± 26% -24.9% 59143 ± 10% softirqs.CPU176.RCU
66339 ± 6% -15.1% 56337 ± 10% softirqs.CPU177.RCU
70598 ± 5% -12.9% 61514 ± 4% softirqs.CPU178.RCU
49000 ± 5% -25.9% 36287 ± 6% softirqs.CPU18.RCU
69522 ± 2% -17.5% 57331 ± 11% softirqs.CPU184.RCU
53616 ± 15% -29.7% 37690 ± 11% softirqs.CPU19.RCU
46530 ± 3% -23.6% 35527 ± 5% softirqs.CPU195.RCU
65960 ± 9% -13.9% 56772 ± 5% softirqs.CPU196.RCU
65948 ± 3% -13.8% 56818 ± 9% softirqs.CPU197.RCU
61753 ± 11% -18.4% 50375 ± 9% softirqs.CPU198.RCU
49063 ± 5% -26.4% 36102 ± 3% softirqs.CPU20.RCU
69682 ± 8% -17.0% 57840 ± 12% softirqs.CPU200.RCU
85131 ± 6% -16.5% 71117 ± 8% softirqs.CPU209.RCU
48180 ± 4% -24.9% 36169 ± 5% softirqs.CPU21.RCU
49795 ± 3% -24.2% 37734 ± 11% softirqs.CPU211.RCU
74862 ± 2% -20.0% 59923 ± 15% softirqs.CPU213.RCU
67497 ± 13% -22.9% 52016 ± 8% softirqs.CPU215.RCU
70217 ± 8% -25.2% 52488 ± 13% softirqs.CPU221.RCU
73106 ± 3% -20.5% 58122 ± 8% softirqs.CPU222.RCU
78826 ± 7% -26.6% 57843 ± 13% softirqs.CPU224.RCU
80792 -12.3% 70815 ± 7% softirqs.CPU230.RCU
60982 ± 7% -20.4% 48548 ± 10% softirqs.CPU239.RCU
46118 ± 4% -23.1% 35479 ± 7% softirqs.CPU24.RCU
88583 ± 4% -20.6% 70347 ± 12% softirqs.CPU242.RCU
86647 ± 3% -20.5% 68926 ± 13% softirqs.CPU248.RCU
47652 ± 4% -26.2% 35166 ± 4% softirqs.CPU25.RCU
88271 ± 2% -19.1% 71391 ± 16% softirqs.CPU250.RCU
47416 ± 4% -20.2% 37859 ± 16% softirqs.CPU256.RCU
52441 ± 14% -32.8% 35259 ± 2% softirqs.CPU26.RCU
54370 ± 5% -19.1% 43968 ± 14% softirqs.CPU265.RCU
53131 ± 7% -23.2% 40803 ± 8% softirqs.CPU266.RCU
48354 ± 4% -26.4% 35588 ± 4% softirqs.CPU268.RCU
51934 ± 14% -33.3% 34636 ± 5% softirqs.CPU269.RCU
53543 ± 13% -33.5% 35599 ± 3% softirqs.CPU27.RCU
47194 ± 6% -20.0% 37759 softirqs.CPU270.RCU
49457 ± 14% -29.0% 35136 softirqs.CPU271.RCU
45491 ± 5% -26.2% 33559 ± 2% softirqs.CPU272.RCU
45784 ± 3% -25.1% 34298 ± 4% softirqs.CPU273.RCU
52291 ± 26% -30.2% 36488 ± 6% softirqs.CPU274.RCU
44673 ± 5% -21.3% 35144 ± 2% softirqs.CPU275.RCU
49716 ± 5% -26.6% 36487 ± 5% softirqs.CPU276.RCU
52399 ± 14% -24.7% 39459 ± 7% softirqs.CPU277.RCU
48252 ± 7% -25.0% 36188 ± 7% softirqs.CPU28.RCU
50299 ± 8% -27.5% 36457 ± 9% softirqs.CPU280.RCU
50958 ± 12% -28.7% 36316 ± 8% softirqs.CPU281.RCU
48188 ± 3% -24.5% 36399 ± 7% softirqs.CPU283.RCU
53991 ± 5% -21.1% 42600 ± 9% softirqs.CPU285.RCU
63902 ± 9% -20.6% 50708 ± 3% softirqs.CPU287.RCU
48867 ± 3% -26.7% 35816 ± 4% softirqs.CPU29.RCU
64188 ± 22% -36.3% 40910 ± 15% softirqs.CPU3.RCU
49075 ± 7% -24.1% 37237 ± 10% softirqs.CPU30.RCU
55112 ± 19% -33.5% 36654 ± 5% softirqs.CPU31.RCU
49696 ± 7% -23.3% 38094 ± 10% softirqs.CPU32.RCU
47263 ± 4% -26.6% 34695 ± 3% softirqs.CPU33.RCU
49828 ± 7% -25.0% 37346 ± 13% softirqs.CPU34.RCU
48569 ± 2% -22.8% 37473 ± 14% softirqs.CPU35.RCU
53596 ± 15% -31.7% 36597 ± 5% softirqs.CPU36.RCU
65963 ± 8% -19.1% 53397 ± 2% softirqs.CPU39.RCU
48592 ± 6% -21.8% 38015 softirqs.CPU41.RCU
54962 ± 12% -23.0% 42297 ± 7% softirqs.CPU44.RCU
56343 ± 5% -17.9% 46282 ± 12% softirqs.CPU48.RCU
49455 -12.9% 43073 ± 8% softirqs.CPU49.RCU
51678 ± 11% -24.4% 39057 ± 4% softirqs.CPU5.RCU
48164 ± 3% -24.2% 36493 ± 2% softirqs.CPU52.RCU
51613 ± 5% -25.3% 38554 ± 4% softirqs.CPU53.RCU
48022 ± 7% -23.9% 36525 ± 3% softirqs.CPU54.RCU
57159 ± 7% -16.9% 47503 ± 12% softirqs.CPU56.RCU
51776 ± 6% -18.3% 42303 ± 4% softirqs.CPU59.RCU
54240 ± 9% -24.5% 40949 ± 6% softirqs.CPU6.RCU
49715 ± 6% -24.6% 37501 ± 2% softirqs.CPU60.RCU
46880 ± 4% -20.4% 37312 ± 5% softirqs.CPU61.RCU
48634 ± 6% -11.9% 42833 ± 8% softirqs.CPU62.RCU
47966 ± 6% -20.0% 38393 ± 9% softirqs.CPU63.RCU
51800 ± 9% -22.7% 40039 ± 10% softirqs.CPU64.RCU
53752 ± 15% -32.3% 36411 ± 5% softirqs.CPU65.RCU
52624 ± 13% -24.8% 39585 ± 6% softirqs.CPU68.RCU
78620 ± 4% -14.0% 67609 ± 6% softirqs.CPU69.RCU
95100 ± 7% -17.3% 78635 ± 13% softirqs.CPU73.RCU
92258 ± 5% -19.8% 74034 ± 13% softirqs.CPU75.RCU
54483 ± 13% -32.8% 36617 ± 7% softirqs.CPU8.RCU
57971 ± 10% -24.3% 43865 ± 7% softirqs.CPU80.RCU
47545 ± 6% -25.2% 35553 ± 7% softirqs.CPU84.RCU
48658 ± 4% -17.5% 40119 ± 9% softirqs.CPU85.RCU
85838 ± 4% -14.1% 73715 ± 11% softirqs.CPU86.RCU
86626 ± 4% -18.1% 70970 ± 11% softirqs.CPU92.RCU
84850 ± 2% -17.5% 69983 ± 10% softirqs.CPU97.RCU
86178 ± 2% -18.8% 69974 ± 10% softirqs.CPU99.RCU
18827858 ± 3% -14.5% 16102740 ± 4% softirqs.RCU





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
(No filename) (85.74 kB)
config-5.6.0-rc4-00002-g6d390e4b5d48e (206.88 kB)
job-script (7.42 kB)
job.yaml (5.05 kB)
reproduce (320.00 B)

2020-03-09 14:38:07

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: 6d390e4b5d48ec03bb87e63cf0a2bff5f4e116da ("locks: fix a potential use-after-free problem when wakeup a waiter")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: will-it-scale
> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
> with following parameters:
>
> nr_task: 100%
> mode: process
> test: lock1
> cpufreq_governor: performance
> ucode: 0x11
>
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+----------------------------------------------------------------------+
> > testcase: change | will-it-scale: will-it-scale.per_thread_ops -51.3% regression |
> > test machine | 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory |
> > test parameters | cpufreq_governor=performance |
> > | mode=thread |
> > | nr_task=100% |
> > | test=lock1 |
> > | ucode=0x11 |
> +------------------+----------------------------------------------------------------------+
>

This is not completely unexpected as we're banging on the global
blocked_lock_lock now for every unlock. This test just thrashes file
locks and unlocks without doing anything in between, so the workload
looks pretty artificial [1].

It would be nice to avoid the global lock in this codepath, but it
doesn't look simple to do. I'll keep thinking about it, but for now I'm
inclined to ignore this result unless we see a problem in more realistic
workloads.

[1]: https://github.com/antonblanchard/will-it-scale/blob/master/tests/lock1.c
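
For readers who don't want to chase the link, the heart of that test is
roughly the following (a paraphrased sketch of lock1.c, not the verbatim
source -- the exact signature and setup in will-it-scale differ):

        #include <fcntl.h>
        #include <stdlib.h>
        #include <unistd.h>

        /* Each worker hammers its own temporary file: take a POSIX
         * write lock, drop it again, repeat -- nothing in between. */
        void testcase(unsigned long long *iterations)
        {
                char tmpfile[] = "/tmp/willitscale.XXXXXX";
                int fd = mkstemp(tmpfile);
                struct flock lck = {
                        .l_whence = SEEK_SET,
                        .l_start  = 0,
                        .l_len    = 1,
                };

                unlink(tmpfile);

                while (1) {
                        lck.l_type = F_WRLCK;
                        fcntl(fd, F_SETLK, &lck);       /* lock */
                        lck.l_type = F_UNLCK;
                        fcntl(fd, F_SETLK, &lck);       /* unlock */
                        (*iterations)++;
                }
        }
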
--
Jeff Layton <[email protected]>

2020-03-09 15:54:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
>
> On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> >
> > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
>
> This is not completely unexpected as we're banging on the global
> blocked_lock_lock now for every unlock. This test just thrashes file
> locks and unlocks without doing anything in between, so the workload
> looks pretty artificial [1].
>
> It would be nice to avoid the global lock in this codepath, but it
> doesn't look simple to do. I'll keep thinking about it, but for now I'm
> inclined to ignore this result unless we see a problem in more realistic
> workloads.

That is a _huge_ regression, though.

What about something like the attached? Wouldn't that work? And make
the code actually match the old comment about how "fl_blocker" being
NULL is special.

The old code seemed to not know about things like memory ordering either.

Patch is entirely untested, but aims to have that "smp_store_release()
means I'm done and not going to touch it any more", making that
smp_load_acquire() test hopefully valid as per the comment.
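
The shape of that idea, condensed (an illustrative sketch only; the
attached patch.diff is not reproduced here, and a concrete version
appears in full later in the thread):

        /* Waker side (__locks_wake_up_blocks): finish every access to
         * *waiter first, then publish "done" as the very last store. */
        __locks_delete_block(waiter);
        wake_up(&waiter->fl_wait);
        smp_store_release(&waiter->fl_blocker, NULL);

        /* Waiter side (locks_delete_block): if this acquire load sees
         * NULL, every write the waker made before the release is
         * guaranteed visible, so the fast path can return without ever
         * touching the global blocked_lock_lock. */
        if (!smp_load_acquire(&waiter->fl_blocker) &&
            list_empty(&waiter->fl_blocked_requests))
                return status;
        /* otherwise fall back to the slow path under blocked_lock_lock */
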

Hmm?

Linus


Attachments:
patch.diff (1.89 kB)

2020-03-09 17:24:46

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> >
> > This is not completely unexpected as we're banging on the global
> > blocked_lock_lock now for every unlock. This test just thrashes file
> > locks and unlocks without doing anything in between, so the workload
> > looks pretty artificial [1].
> >
> > It would be nice to avoid the global lock in this codepath, but it
> > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > inclined to ignore this result unless we see a problem in more realistic
> > workloads.
>
> That is a _huge_ regression, though.
>
> What about something like the attached? Wouldn't that work? And make
> the code actually match the old comment about wow "fl_blocker" being
> NULL being special.
>
> The old code seemed to not know about things like memory ordering either.
>
> Patch is entirely untested, but aims to have that "smp_store_release()
> means I'm done and not going to touch it any more", making that
> smp_load_acquire() test hopefully be valid as per the comment..

Yeah, something along those lines maybe. I don't think we can use
fl_blocker that way though, as the wait_event_interruptible is waiting
on it to go to NULL, and the wake_up happens before fl_blocker is
cleared.
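
Spelled out (a simplified sketch of the two sides, not literal code from
either patch): the sleeper re-checks its condition after every wakeup,
but with the release store moved after the wakeup the condition can
still be false at that point, and no further wakeup will arrive.

        /* Waiter, e.g. posix_lock_inode_wait(): re-checks the condition
         * each time it is woken; if it is still false, it sleeps again. */
        error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);

        /* Waker, with the wakeup delivered before fl_blocker is
         * cleared: the waiter can wake, still see fl_blocker set, and
         * go back to sleep with no further wakeup coming. */
        wake_up(&waiter->fl_wait);
        smp_store_release(&waiter->fl_blocker, NULL);
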

Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
instead of testing for !fl_blocker to see whether we can avoid the
blocked_lock_lock?

--
Jeff Layton <[email protected]>

2020-03-09 19:11:04

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > >
> > > This is not completely unexpected as we're banging on the global
> > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > locks and unlocks without doing anything in between, so the workload
> > > looks pretty artificial [1].
> > >
> > > It would be nice to avoid the global lock in this codepath, but it
> > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > inclined to ignore this result unless we see a problem in more realistic
> > > workloads.
> >
> > That is a _huge_ regression, though.
> >
> > What about something like the attached? Wouldn't that work? And make
> > the code actually match the old comment about wow "fl_blocker" being
> > NULL being special.
> >
> > The old code seemed to not know about things like memory ordering either.
> >
> > Patch is entirely untested, but aims to have that "smp_store_release()
> > means I'm done and not going to touch it any more", making that
> > smp_load_acquire() test hopefully be valid as per the comment..
>
> Yeah, something along those lines maybe. I don't think we can use
> fl_blocker that way though, as the wait_event_interruptible is waiting
> on it to go to NULL, and the wake_up happens before fl_blocker is
> cleared.
>
> Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> instead of testing for !fl_blocker to see whether we can avoid the
> blocked_lock_lock?
>

How about something like this instead? (untested other than for
compilation)

Basically, this just switches the waiters over to wait for
fl_blocked_member to go empty. That still happens before the wakeup, so
it should be ok to wait on that.
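
Concretely, each sleeper's wait condition changes along these lines
(sketch; the attached patch makes this same substitution at every wait
site):

- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+                                  list_empty(&fl->fl_blocked_member));
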

I think we can also eliminate the lockless list_empty check in
locks_delete_block, as the fl_blocker check should be sufficient now.
--
Jeff Layton <[email protected]>


Attachments:
0001-locks-reinstate-locks_delete_lock-optimization.patch (4.80 kB)

2020-03-09 19:56:55

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, 2020-03-09 at 15:09 -0400, Jeff Layton wrote:
> On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> > On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> > > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > > >
> > > > This is not completely unexpected as we're banging on the global
> > > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > > locks and unlocks without doing anything in between, so the workload
> > > > looks pretty artificial [1].
> > > >
> > > > It would be nice to avoid the global lock in this codepath, but it
> > > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > > inclined to ignore this result unless we see a problem in more realistic
> > > > workloads.
> > >
> > > That is a _huge_ regression, though.
> > >
> > > What about something like the attached? Wouldn't that work? And make
> > > the code actually match the old comment about wow "fl_blocker" being
> > > NULL being special.
> > >
> > > The old code seemed to not know about things like memory ordering either.
> > >
> > > Patch is entirely untested, but aims to have that "smp_store_release()
> > > means I'm done and not going to touch it any more", making that
> > > smp_load_acquire() test hopefully be valid as per the comment..
> >
> > Yeah, something along those lines maybe. I don't think we can use
> > fl_blocker that way though, as the wait_event_interruptible is waiting
> > on it to go to NULL, and the wake_up happens before fl_blocker is
> > cleared.
> >
> > Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> > instead of testing for !fl_blocker to see whether we can avoid the
> > blocked_lock_lock?
> >
>
> How about something like this instead? (untested other than for
> compilation)
>
> Basically, this just switches the waiters over to wait for
> fl_blocked_member to go empty. That still happens before the wakeup, so
> it should be ok to wait on that.
>
> I think we can also eliminate the lockless list_empty check in
> locks_delete_block, as the fl_blocker check should be sufficient now.

Actually, no -- we need to keep that check in. The rest should work
though. I'll do some testing with it and see if the perf issue goes
away.

Thanks,
--
Jeff Layton <[email protected]>

2020-03-09 21:44:48

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, Mar 09 2020, Jeff Layton wrote:

> On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
>> On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
>> > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
>> > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
>> > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
>> > >
>> > > This is not completely unexpected as we're banging on the global
>> > > blocked_lock_lock now for every unlock. This test just thrashes file
>> > > locks and unlocks without doing anything in between, so the workload
>> > > looks pretty artificial [1].
>> > >
>> > > It would be nice to avoid the global lock in this codepath, but it
>> > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
>> > > inclined to ignore this result unless we see a problem in more realistic
>> > > workloads.
>> >
>> > That is a _huge_ regression, though.
>> >
>> > What about something like the attached? Wouldn't that work? And make
>> > the code actually match the old comment about wow "fl_blocker" being
>> > NULL being special.
>> >
>> > The old code seemed to not know about things like memory ordering either.
>> >
>> > Patch is entirely untested, but aims to have that "smp_store_release()
>> > means I'm done and not going to touch it any more", making that
>> > smp_load_acquire() test hopefully be valid as per the comment..
>>
>> Yeah, something along those lines maybe. I don't think we can use
>> fl_blocker that way though, as the wait_event_interruptible is waiting
>> on it to go to NULL, and the wake_up happens before fl_blocker is
>> cleared.
>>
>> Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
>> instead of testing for !fl_blocker to see whether we can avoid the
>> blocked_lock_lock?
>>
>
> How about something like this instead? (untested other than for
> compilation)
>
> Basically, this just switches the waiters over to wait for
> fl_blocked_member to go empty. That still happens before the wakeup, so
> it should be ok to wait on that.
>
> I think we can also eliminate the lockless list_empty check in
> locks_delete_block, as the fl_blocker check should be sufficient now.
> --
> Jeff Layton <[email protected]>
> From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> From: Linus Torvalds <[email protected]>
> Date: Mon, 9 Mar 2020 14:35:43 -0400
> Subject: [PATCH] locks: reinstate locks_delete_lock optimization
>
> ...by using smp_load_acquire and smp_store_release to close the race
> window.
>
> [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> the fl_blocker pointer to clear. Remove the list_empty check
> from locks_delete_lock shortcut. ]

Why do you think it is OK to remove that list_empty check? I don't
think it is. There might be locked requests that need to be woken up.

As the problem here is a use-after-free due to a race, one option would
be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
test/use.

Another option is to use a different lock. The fl_wait contains a
spinlock, and we have wake_up_locked() which is provided for exactly
these sorts of situations where the wake_up call can race with a thread
waking up.

So my compile-tested-only proposal is below.
I can probably write a proper change-log entry if you think the patch
is a good way to go.

NeilBrown


diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..8aa04d5ac8b3 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)

waiter = list_first_entry(&blocker->fl_blocked_requests,
struct file_lock, fl_blocked_member);
+ spin_lock(&waiter->fl_wait.lock);
__locks_delete_block(waiter);
if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
waiter->fl_lmops->lm_notify(waiter);
else
- wake_up(&waiter->fl_wait);
+ wake_up_locked(&waiter->fl_wait);
+ spin_unlock(&waiter->fl_wait.lock);
}
}

@@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;

+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread
+ * "owns" the lock and is the only one that might try to claim
+ * the lock. So it is safe to test fl_blocker locklessly.
+ * Also if fl_blocker is NULL, this waiter is not listed on
+ * fl_blocked_requests for some lock, so no other request can
+ * be added to the list of fl_blocked_requests for this
+ * request. So if fl_blocker is NULL, it is safe to
+ * locklessly check if fl_blocked_requests is empty. If both
+ * of these checks succeed, there is no need to take the lock.
+ * However, some other thread might have only *just* set
+ * fl_blocker to NULL and it about to send a wakeup on
+ * fl_wait, so we mustn't return too soon or we might free waiter
+ * before that wakeup can be sent. So take the fl_wait.lock
+ * to serialize with the wakeup in __locks_wake_up_blocks().
+ */
+ if (waiter->fl_blocker == NULL) {
+ spin_lock(&waiter->fl_wait.lock);
+ if (waiter->fl_blocker == NULL &&
+ list_empty(&waiter->fl_blocked_requests)) {
+ spin_unlock(&waiter->fl_wait.lock);
+ return status;
+ }
+ spin_unlock(&waiter->fl_wait.lock);
+ }
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;


Attachments:
signature.asc (847.00 B)

2020-03-09 21:59:21

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
> On Mon, Mar 09 2020, Jeff Layton wrote:
>
> > On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> > > On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > > > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> > > > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > > > >
> > > > > This is not completely unexpected as we're banging on the global
> > > > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > > > locks and unlocks without doing anything in between, so the workload
> > > > > looks pretty artificial [1].
> > > > >
> > > > > It would be nice to avoid the global lock in this codepath, but it
> > > > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > > > inclined to ignore this result unless we see a problem in more realistic
> > > > > workloads.
> > > >
> > > > That is a _huge_ regression, though.
> > > >
> > > > What about something like the attached? Wouldn't that work? And make
> > > > the code actually match the old comment about wow "fl_blocker" being
> > > > NULL being special.
> > > >
> > > > The old code seemed to not know about things like memory ordering either.
> > > >
> > > > Patch is entirely untested, but aims to have that "smp_store_release()
> > > > means I'm done and not going to touch it any more", making that
> > > > smp_load_acquire() test hopefully be valid as per the comment..
> > >
> > > Yeah, something along those lines maybe. I don't think we can use
> > > fl_blocker that way though, as the wait_event_interruptible is waiting
> > > on it to go to NULL, and the wake_up happens before fl_blocker is
> > > cleared.
> > >
> > > Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> > > instead of testing for !fl_blocker to see whether we can avoid the
> > > blocked_lock_lock?
> > >
> >
> > How about something like this instead? (untested other than for
> > compilation)
> >
> > Basically, this just switches the waiters over to wait for
> > fl_blocked_member to go empty. That still happens before the wakeup, so
> > it should be ok to wait on that.
> >
> > I think we can also eliminate the lockless list_empty check in
> > locks_delete_block, as the fl_blocker check should be sufficient now.
> > --
> > Jeff Layton <[email protected]>
> > From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> > From: Linus Torvalds <[email protected]>
> > Date: Mon, 9 Mar 2020 14:35:43 -0400
> > Subject: [PATCH] locks: reinstate locks_delete_lock optimization
> >
> > ...by using smp_load_acquire and smp_store_release to close the race
> > window.
> >
> > [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> > the fl_blocker pointer to clear. Remove the list_empty check
> > from locks_delete_lock shortcut. ]
>
> Why do you think it is OK to remove that list_empty check? I don't
> think it is. There might be locked requests that need to be woken up.
>

Temporary braino. We definitely cannot remove that check.

> As the problem here is a use-after-free due to a race, one option would
> be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
> test/use.
>

Yeah, I was considering this too, but Linus' approach seemed simpler.

> Another option is to use a different lock. The fl_wait contains a
> spinlock, and we have wake_up_locked() which is provided for exactly
> these sorts of situations where the wake_up call can race with a thread
> waking up.
>
> So my compile-tested-only proposal is below.
> I can probably a proper change-log entry if you think the patch is a
> good way to go.
>
> NeilBrown
>
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..8aa04d5ac8b3 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>
> waiter = list_first_entry(&blocker->fl_blocked_requests,
> struct file_lock, fl_blocked_member);
> + spin_lock(&waiter->fl_wait.lock);
> __locks_delete_block(waiter);
> if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> waiter->fl_lmops->lm_notify(waiter);
> else
> - wake_up(&waiter->fl_wait);
> + wake_up_locked(&waiter->fl_wait);
> + spin_unlock(&waiter->fl_wait.lock);
> }
> }
>
> @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread
> + * "owns" the lock and is the only one that might try to claim
> + * the lock. So it is safe to test fl_blocker locklessly.
> + * Also if fl_blocker is NULL, this waiter is not listed on
> + * fl_blocked_requests for some lock, so no other request can
> + * be added to the list of fl_blocked_requests for this
> + * request. So if fl_blocker is NULL, it is safe to
> + * locklessly check if fl_blocked_requests is empty. If both
> + * of these checks succeed, there is no need to take the lock.
> + * However, some other thread might have only *just* set
> + * fl_blocker to NULL and it about to send a wakeup on
> + * fl_wait, so we mustn't return too soon or we might free waiter
> + * before that wakeup can be sent. So take the fl_wait.lock
> + * to serialize with the wakeup in __locks_wake_up_blocks().
> + */
> + if (waiter->fl_blocker == NULL) {
> + spin_lock(&waiter->fl_wait.lock);
> + if (waiter->fl_blocker == NULL &&
> + list_empty(&waiter->fl_blocked_requests)) {
> + spin_unlock(&waiter->fl_wait.lock);
> + return status;
> + }
> + spin_unlock(&waiter->fl_wait.lock);
> + }
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;

Yeah, this is simpler for me to prove to myself that it's correct, and I
like that it touches less code, tbh. I'll give it a try here in a bit
and see if it also fixes up the perf regression.

FWIW, here's the variant of Linus' patch I've been testing. It seems to
fix the performance regression too.

--------------8<---------------

[PATCH] locks: reinstate locks_delete_lock optimization

There is measurable performance impact in some synthetic tests in commit
6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup
a waiter). Fix the race condition instead by clearing the fl_blocker
pointer after the wakeup and by using smp_load_acquire and
smp_store_release to handle the access.

This means that we can no longer use the clearing of fl_blocker
as the wait condition, so switch over to checking whether the
fl_blocked_member list is empty.

[ jlayton: wait on the fl_blocked_requests list to go empty instead of
the fl_blocker pointer to clear. ]

Cc: yangerkun <[email protected]>
Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
Signed-off-by: Jeff Layton <[email protected]>
---
fs/cifs/file.c | 3 ++-
fs/locks.c | 43 +++++++++++++++++++++++++++++++++++++------
2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3b942ecdd4be..8f9d849a0012 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1169,7 +1169,8 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
rc = posix_lock_file(file, flock, NULL);
up_write(&cinode->lock_sem);
if (rc == FILE_LOCK_DEFERRED) {
- rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
+ rc = wait_event_interruptible(flock->fl_wait,
+ list_empty(&flock->fl_blocked_member));
if (!rc)
goto try_again;
locks_delete_block(flock);
diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..e78d37c73df5 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -725,7 +725,6 @@ static void __locks_delete_block(struct file_lock *waiter)
{
locks_delete_global_blocked(waiter);
list_del_init(&waiter->fl_blocked_member);
- waiter->fl_blocker = NULL;
}

static void __locks_wake_up_blocks(struct file_lock *blocker)
@@ -740,6 +739,12 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
waiter->fl_lmops->lm_notify(waiter);
else
wake_up(&waiter->fl_wait);
+
+ /*
+ * Tell the world we're done with it - see comment at
+ * top of locks_delete_block().
+ */
+ smp_store_release(&waiter->fl_blocker, NULL);
}
}

@@ -753,11 +758,32 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;

+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread
+ * "owns" the lock and is the only one that might try to claim
+ * the lock. So it is safe to test fl_blocker locklessly.
+ * Also if fl_blocker is NULL, this waiter is not listed on
+ * fl_blocked_requests for some lock, so no other request can
+ * be added to the list of fl_blocked_requests for this
+ * request. So if fl_blocker is NULL, it is safe to
+ * locklessly check if fl_blocked_requests is empty. If both
+ * of these checks succeed, there is no need to take the lock.
+ */
+ if (!smp_load_acquire(&waiter->fl_blocker) &&
+ list_empty(&waiter->fl_blocked_requests))
+ return status;
+
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;
__locks_wake_up_blocks(waiter);
__locks_delete_block(waiter);
+
+ /*
+ * Tell the world we're done with it - see comment at top
+ * of this function
+ */
+ smp_store_release(&waiter->fl_blocker, NULL);
spin_unlock(&blocked_lock_lock);
return status;
}
@@ -1350,7 +1376,8 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
error = posix_lock_inode(inode, fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+ list_empty(&fl->fl_blocked_member));
if (error)
break;
}
@@ -1435,7 +1462,8 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
error = posix_lock_inode(inode, &fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
+ error = wait_event_interruptible(fl.fl_wait,
+ list_empty(&fl.fl_blocked_member));
if (!error) {
/*
* If we've been sleeping someone might have
@@ -1638,7 +1666,8 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)

locks_dispose_list(&dispose);
error = wait_event_interruptible_timeout(new_fl->fl_wait,
- !new_fl->fl_blocker, break_time);
+ list_empty(&new_fl->fl_blocked_member),
+ break_time);

percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
@@ -2122,7 +2151,8 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
error = flock_lock_inode(inode, fl);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+ list_empty(&fl->fl_blocked_member));
if (error)
break;
}
@@ -2399,7 +2429,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
error = vfs_lock_file(filp, cmd, fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+ list_empty(&fl->fl_blocked_member));
if (error)
break;
}
--
2.24.1


2020-03-09 22:12:41

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
> On Mon, Mar 09 2020, Jeff Layton wrote:
>
> > On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> > > On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > > > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> > > > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > > > >
> > > > > This is not completely unexpected as we're banging on the global
> > > > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > > > locks and unlocks without doing anything in between, so the workload
> > > > > looks pretty artificial [1].
> > > > >
> > > > > It would be nice to avoid the global lock in this codepath, but it
> > > > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > > > inclined to ignore this result unless we see a problem in more realistic
> > > > > workloads.
> > > >
> > > > That is a _huge_ regression, though.
> > > >
> > > > What about something like the attached? Wouldn't that work? And make
> > > > the code actually match the old comment about wow "fl_blocker" being
> > > > NULL being special.
> > > >
> > > > The old code seemed to not know about things like memory ordering either.
> > > >
> > > > Patch is entirely untested, but aims to have that "smp_store_release()
> > > > means I'm done and not going to touch it any more", making that
> > > > smp_load_acquire() test hopefully be valid as per the comment..
> > >
> > > Yeah, something along those lines maybe. I don't think we can use
> > > fl_blocker that way though, as the wait_event_interruptible is waiting
> > > on it to go to NULL, and the wake_up happens before fl_blocker is
> > > cleared.
> > >
> > > Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> > > instead of testing for !fl_blocker to see whether we can avoid the
> > > blocked_lock_lock?
> > >
> >
> > How about something like this instead? (untested other than for
> > compilation)
> >
> > Basically, this just switches the waiters over to wait for
> > fl_blocked_member to go empty. That still happens before the wakeup, so
> > it should be ok to wait on that.
> >
> > I think we can also eliminate the lockless list_empty check in
> > locks_delete_block, as the fl_blocker check should be sufficient now.
> > --
> > Jeff Layton <[email protected]>
> > From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> > From: Linus Torvalds <[email protected]>
> > Date: Mon, 9 Mar 2020 14:35:43 -0400
> > Subject: [PATCH] locks: reinstate locks_delete_lock optimization
> >
> > ...by using smp_load_acquire and smp_store_release to close the race
> > window.
> >
> > [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> > the fl_blocker pointer to clear. Remove the list_empty check
> > from locks_delete_lock shortcut. ]
>
> Why do you think it is OK to remove that list_empty check? I don't
> think it is. There might be locked requests that need to be woken up.
>
> As the problem here is a use-after-free due to a race, one option would
> be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
> test/use.
>
> Another option is to use a different lock. The fl_wait contains a
> spinlock, and we have wake_up_locked() which is provided for exactly
> these sorts of situations where the wake_up call can race with a thread
> waking up.
>
> So my compile-tested-only proposal is below.
> I can probably a proper change-log entry if you think the patch is a
> good way to go.
>
> NeilBrown
>
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..8aa04d5ac8b3 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>
> waiter = list_first_entry(&blocker->fl_blocked_requests,
> struct file_lock, fl_blocked_member);
> + spin_lock(&waiter->fl_wait.lock);
> __locks_delete_block(waiter);
> if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> waiter->fl_lmops->lm_notify(waiter);
> else
> - wake_up(&waiter->fl_wait);
> + wake_up_locked(&waiter->fl_wait);
> + spin_unlock(&waiter->fl_wait.lock);
> }
> }
>
> @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread
> + * "owns" the lock and is the only one that might try to claim
> + * the lock. So it is safe to test fl_blocker locklessly.
> + * Also if fl_blocker is NULL, this waiter is not listed on
> + * fl_blocked_requests for some lock, so no other request can
> + * be added to the list of fl_blocked_requests for this
> + * request. So if fl_blocker is NULL, it is safe to
> + * locklessly check if fl_blocked_requests is empty. If both
> + * of these checks succeed, there is no need to take the lock.
> + * However, some other thread might have only *just* set
> + * fl_blocker to NULL and it about to send a wakeup on
> + * fl_wait, so we mustn't return too soon or we might free waiter
> + * before that wakeup can be sent. So take the fl_wait.lock
> + * to serialize with the wakeup in __locks_wake_up_blocks().
> + */
> + if (waiter->fl_blocker == NULL) {
> + spin_lock(&waiter->fl_wait.lock);
> + if (waiter->fl_blocker == NULL &&
> + list_empty(&waiter->fl_blocked_requests)) {
> + spin_unlock(&waiter->fl_wait.lock);
> + return status;
> + }
> + spin_unlock(&waiter->fl_wait.lock);
> + }
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;
>

Looks good on a cursory check, and I'm inclined to go with this since
it's less fiddly for people to backport.

One other difference to note -- we are holding the fl_wait lock when
calling lm_notify, but I don't think it will matter to any of the
existing lm_notify functions.

If you want to clean up the changelog and resend that would be great.

Thanks,
--
Jeff Layton <[email protected]>

2020-03-10 03:25:52

by yangerkun

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On 2020/3/10 6:11, Jeff Layton wrote:
> On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
>> On Mon, Mar 09 2020, Jeff Layton wrote:
>>
>>> On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
>>>> On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
>>>>> On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
>>>>>> On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
>>>>>>> FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
>>>>>>
>>>>>> This is not completely unexpected as we're banging on the global
>>>>>> blocked_lock_lock now for every unlock. This test just thrashes file
>>>>>> locks and unlocks without doing anything in between, so the workload
>>>>>> looks pretty artificial [1].
>>>>>>
>>>>>> It would be nice to avoid the global lock in this codepath, but it
>>>>>> doesn't look simple to do. I'll keep thinking about it, but for now I'm
>>>>>> inclined to ignore this result unless we see a problem in more realistic
>>>>>> workloads.
>>>>>
>>>>> That is a _huge_ regression, though.
>>>>>
>>>>> What about something like the attached? Wouldn't that work? And make
>>>>> the code actually match the old comment about wow "fl_blocker" being
>>>>> NULL being special.
>>>>>
>>>>> The old code seemed to not know about things like memory ordering either.
>>>>>
>>>>> Patch is entirely untested, but aims to have that "smp_store_release()
>>>>> means I'm done and not going to touch it any more", making that
>>>>> smp_load_acquire() test hopefully be valid as per the comment..
>>>>
>>>> Yeah, something along those lines maybe. I don't think we can use
>>>> fl_blocker that way though, as the wait_event_interruptible is waiting
>>>> on it to go to NULL, and the wake_up happens before fl_blocker is
>>>> cleared.
>>>>
>>>> Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
>>>> instead of testing for !fl_blocker to see whether we can avoid the
>>>> blocked_lock_lock?
>>>>
>>>
>>> How about something like this instead? (untested other than for
>>> compilation)
>>>
>>> Basically, this just switches the waiters over to wait for
>>> fl_blocked_member to go empty. That still happens before the wakeup, so
>>> it should be ok to wait on that.
>>>
>>> I think we can also eliminate the lockless list_empty check in
>>> locks_delete_block, as the fl_blocker check should be sufficient now.
>>> --
>>> Jeff Layton <[email protected]>
>>> From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
>>> From: Linus Torvalds <[email protected]>
>>> Date: Mon, 9 Mar 2020 14:35:43 -0400
>>> Subject: [PATCH] locks: reinstate locks_delete_lock optimization
>>>
>>> ...by using smp_load_acquire and smp_store_release to close the race
>>> window.
>>>
>>> [ jlayton: wait on the fl_blocked_requests list to go empty instead of
>>> the fl_blocker pointer to clear. Remove the list_empty check
>>> from locks_delete_lock shortcut. ]
>>
>> Why do you think it is OK to remove that list_empty check? I don't
>> think it is. There might be locked requests that need to be woken up.
>>
>> As the problem here is a use-after-free due to a race, one option would
>> be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
>> test/use.
>>
>> Another option is to use a different lock. The fl_wait contains a
>> spinlock, and we have wake_up_locked() which is provided for exactly
>> these sorts of situations where the wake_up call can race with a thread
>> waking up.
>>
>> So my compile-tested-only proposal is below.
>> I can probably a proper change-log entry if you think the patch is a
>> good way to go.
>>
>> NeilBrown
>>
>>
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 426b55d333d5..8aa04d5ac8b3 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>>
>> waiter = list_first_entry(&blocker->fl_blocked_requests,
>> struct file_lock, fl_blocked_member);
>> + spin_lock(&waiter->fl_wait.lock);
>> __locks_delete_block(waiter);
>> if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>> waiter->fl_lmops->lm_notify(waiter);
>> else
>> - wake_up(&waiter->fl_wait);
>> + wake_up_locked(&waiter->fl_wait);
>> + spin_unlock(&waiter->fl_wait.lock);
>> }
>> }
>>
>> @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
>> {
>> int status = -ENOENT;
>>
>> + /*
>> + * If fl_blocker is NULL, it won't be set again as this thread
>> + * "owns" the lock and is the only one that might try to claim
>> + * the lock. So it is safe to test fl_blocker locklessly.
>> + * Also if fl_blocker is NULL, this waiter is not listed on
>> + * fl_blocked_requests for some lock, so no other request can
>> + * be added to the list of fl_blocked_requests for this
>> + * request. So if fl_blocker is NULL, it is safe to
>> + * locklessly check if fl_blocked_requests is empty. If both
>> + * of these checks succeed, there is no need to take the lock.
>> + * However, some other thread might have only *just* set
>> + * fl_blocker to NULL and it about to send a wakeup on
>> + * fl_wait, so we mustn't return too soon or we might free waiter
>> + * before that wakeup can be sent. So take the fl_wait.lock
>> + * to serialize with the wakeup in __locks_wake_up_blocks().
>> + */
>> + if (waiter->fl_blocker == NULL) {
>> + spin_lock(&waiter->fl_wait.lock);
>> + if (waiter->fl_blocker == NULL &&
>> + list_empty(&waiter->fl_blocked_requests)) {
>> + spin_unlock(&waiter->fl_wait.lock);
>> + return status;
>> + }
>> + spin_unlock(&waiter->fl_wait.lock);
>> + }
>> spin_lock(&blocked_lock_lock);
>> if (waiter->fl_blocker)
>> status = 0;
>>
>
> Looks good on a cursory check, and I'm inclined to go with this since
> it's less fiddly for people to backport.
>
> One other difference to note -- we are holding the fl_wait lock when
> calling lm_notify, but I don't think it will matter to any of the
> existing lm_notify functions.
>
> If you want to clean up the changelog and resend that would be great.
>
> Thanks,
>
One other thought: I think there is no need to call locks_delete_block
in every case in functions like flock_lock_inode_wait. What we should
do, as patch '16306a61d3b7 ("fs/locks: always delete_block after
waiting.")' describes, is call locks_delete_block not only when the
error equals -ERESTARTSYS (please point out if I am wrong). And this
patch may fix the regression too, since a lock that simply succeeds, or
an unlock, will not try to acquire blocked_lock_lock.



From 40a0604199e9810d0380f90c403bbd4300075cad Mon Sep 17 00:00:00 2001
From: yangerkun <[email protected]>
Date: Tue, 10 Mar 2020 10:12:57 +0800
Subject: [PATCH] fs/locks: fix the regression in flocks

'6d390e4b5d48 ("locks: fix a potential use-after-free problem when
wakeup a waiter")' introduces a regression, since we now acquire
blocked_lock_lock every time we lock or unlock. What patch
'16306a61d3b7 ("fs/locks: always delete_block after waiting.")' wanted
is that we wake up waiters not only when the error equals -ERESTARTSYS:
other error codes like -ENOMEM returned from flock_lock_inode need to
be treated the same, as the file_lock may be blocking other flocks too
(flock a -> conflicts with others and begins to wait -> flock b
conflicts with a and waits for a -> someone wakes up flock a, then
flock_lock_inode returns -ENOMEM). Fix this regression by only calling
locks_delete_block when an error occurred.

Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
Signed-off-by: yangerkun <[email protected]>
---
fs/locks.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..403ed2230dd4 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1354,7 +1354,9 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
if (error)
break;
}
- locks_delete_block(fl);
+ if (error)
+ locks_delete_block(fl);
+
return error;
}

@@ -1447,7 +1449,8 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,

break;
}
- locks_delete_block(&fl);
+ if (error)
+ locks_delete_block(&fl);

return error;
}
@@ -2126,7 +2129,9 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
if (error)
break;
}
- locks_delete_block(fl);
+ if (error)
+ locks_delete_block(fl);
+
return error;
}

@@ -2403,7 +2408,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
if (error)
break;
}
- locks_delete_block(fl);
+ if (error)
+ locks_delete_block(fl);

return error;
}
--
2.17.2


2020-03-10 07:51:32

by Chen, Rong A

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, Mar 10, 2020 at 08:42:13AM +1100, NeilBrown wrote:
> On Mon, Mar 09 2020, Jeff Layton wrote:
>
> > On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> >> On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> >> > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> >> > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> >> > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> >> > >
> >> > > This is not completely unexpected as we're banging on the global
> >> > > blocked_lock_lock now for every unlock. This test just thrashes file
> >> > > locks and unlocks without doing anything in between, so the workload
> >> > > looks pretty artificial [1].
> >> > >
> >> > > It would be nice to avoid the global lock in this codepath, but it
> >> > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> >> > > inclined to ignore this result unless we see a problem in more realistic
> >> > > workloads.
> >> >
> >> > That is a _huge_ regression, though.
> >> >
> >> > What about something like the attached? Wouldn't that work? And make
> >> > the code actually match the old comment about wow "fl_blocker" being
> >> > NULL being special.
> >> >
> >> > The old code seemed to not know about things like memory ordering either.
> >> >
> >> > Patch is entirely untested, but aims to have that "smp_store_release()
> >> > means I'm done and not going to touch it any more", making that
> >> > smp_load_acquire() test hopefully be valid as per the comment..
> >>
> >> Yeah, something along those lines maybe. I don't think we can use
> >> fl_blocker that way though, as the wait_event_interruptible is waiting
> >> on it to go to NULL, and the wake_up happens before fl_blocker is
> >> cleared.
> >>
> >> Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> >> instead of testing for !fl_blocker to see whether we can avoid the
> >> blocked_lock_lock?
> >>
> >
> > How about something like this instead? (untested other than for
> > compilation)
> >
> > Basically, this just switches the waiters over to wait for
> > fl_blocked_member to go empty. That still happens before the wakeup, so
> > it should be ok to wait on that.
> >
> > I think we can also eliminate the lockless list_empty check in
> > locks_delete_block, as the fl_blocker check should be sufficient now.
> > --
> > Jeff Layton <[email protected]>
> > From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> > From: Linus Torvalds <[email protected]>
> > Date: Mon, 9 Mar 2020 14:35:43 -0400
> > Subject: [PATCH] locks: reinstate locks_delete_lock optimization
> >
> > ...by using smp_load_acquire and smp_store_release to close the race
> > window.
> >
> > [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> > the fl_blocker pointer to clear. Remove the list_empty check
> > from locks_delete_lock shortcut. ]
>
> Why do you think it is OK to remove that list_empty check? I don't
> think it is. There might be locked requests that need to be woken up.
>
> As the problem here is a use-after-free due to a race, one option would
> be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
> test/use.
>
> Another option is to use a different lock. The fl_wait contains a
> spinlock, and we have wake_up_locked() which is provided for exactly
> these sorts of situations where the wake_up call can race with a thread
> waking up.
>
> So my compile-tested-only proposal is below.
> I can probably a proper change-log entry if you think the patch is a
> good way to go.
>
> NeilBrown
>
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..8aa04d5ac8b3 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>
> waiter = list_first_entry(&blocker->fl_blocked_requests,
> struct file_lock, fl_blocked_member);
> + spin_lock(&waiter->fl_wait.lock);
> __locks_delete_block(waiter);
> if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> waiter->fl_lmops->lm_notify(waiter);
> else
> - wake_up(&waiter->fl_wait);
> + wake_up_locked(&waiter->fl_wait);
> + spin_unlock(&waiter->fl_wait.lock);
> }
> }
>
> @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread
> + * "owns" the lock and is the only one that might try to claim
> + * the lock. So it is safe to test fl_blocker locklessly.
> + * Also if fl_blocker is NULL, this waiter is not listed on
> + * fl_blocked_requests for some lock, so no other request can
> + * be added to the list of fl_blocked_requests for this
> + * request. So if fl_blocker is NULL, it is safe to
> + * locklessly check if fl_blocked_requests is empty. If both
> + * of these checks succeed, there is no need to take the lock.
> + * However, some other thread might have only *just* set
> + * fl_blocker to NULL and it about to send a wakeup on
> + * fl_wait, so we mustn't return too soon or we might free waiter
> + * before that wakeup can be sent. So take the fl_wait.lock
> + * to serialize with the wakeup in __locks_wake_up_blocks().
> + */
> + if (waiter->fl_blocker == NULL) {
> + spin_lock(&waiter->fl_wait.lock);
> + if (waiter->fl_blocker == NULL &&
> + list_empty(&waiter->fl_blocked_requests)) {
> + spin_unlock(&waiter->fl_wait.lock);
> + return status;
> + }
> + spin_unlock(&waiter->fl_wait.lock);
> + }
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;
>

Hi,

We tested the above patch; the will-it-scale.per_process_ops result
increased to 63278.

0a68ff5e2e7cf226 6d390e4b5d48ec03bb87e63cf0 9170174bff4246028f834a5eb7 testcase/testparams/testbox
---------------- -------------------------- -------------------------- ---------------------------
%stddev change %stddev change %stddev
\ | \ | \
66597 ± 3% -97% 2260 -5% 63278 ± 3% will-it-scale/performance-process-100%-lock1-ucode=0x11/lkp-knm01
66597 -97% 2260 -5% 63278 GEO-MEAN will-it-scale.per_process_ops

Best Regards,
Rong Chen

2020-03-10 07:53:47

by Chen, Rong A

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, Mar 09, 2020 at 05:58:14PM -0400, Jeff Layton wrote:
> On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
> > On Mon, Mar 09 2020, Jeff Layton wrote:
> >
> > > On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> > > > On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > > > > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> > > > > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > > > > >
> > > > > > This is not completely unexpected as we're banging on the global
> > > > > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > > > > locks and unlocks without doing anything in between, so the workload
> > > > > > looks pretty artificial [1].
> > > > > >
> > > > > > It would be nice to avoid the global lock in this codepath, but it
> > > > > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > > > > inclined to ignore this result unless we see a problem in more realistic
> > > > > > workloads.
> > > > >
> > > > > That is a _huge_ regression, though.
> > > > >
> > > > > What about something like the attached? Wouldn't that work? And make
> > > > > the code actually match the old comment about wow "fl_blocker" being
> > > > > NULL being special.
> > > > >
> > > > > The old code seemed to not know about things like memory ordering either.
> > > > >
> > > > > Patch is entirely untested, but aims to have that "smp_store_release()
> > > > > means I'm done and not going to touch it any more", making that
> > > > > smp_load_acquire() test hopefully be valid as per the comment..
> > > >
> > > > Yeah, something along those lines maybe. I don't think we can use
> > > > fl_blocker that way though, as the wait_event_interruptible is waiting
> > > > on it to go to NULL, and the wake_up happens before fl_blocker is
> > > > cleared.
> > > >
> > > > Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> > > > instead of testing for !fl_blocker to see whether we can avoid the
> > > > blocked_lock_lock?
> > > >
> > >
> > > How about something like this instead? (untested other than for
> > > compilation)
> > >
> > > Basically, this just switches the waiters over to wait for
> > > fl_blocked_member to go empty. That still happens before the wakeup, so
> > > it should be ok to wait on that.
> > >
> > > I think we can also eliminate the lockless list_empty check in
> > > locks_delete_block, as the fl_blocker check should be sufficient now.
> > > --
> > > Jeff Layton <[email protected]>
> > > From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> > > From: Linus Torvalds <[email protected]>
> > > Date: Mon, 9 Mar 2020 14:35:43 -0400
> > > Subject: [PATCH] locks: reinstate locks_delete_lock optimization
> > >
> > > ...by using smp_load_acquire and smp_store_release to close the race
> > > window.
> > >
> > > [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> > > the fl_blocker pointer to clear. Remove the list_empty check
> > > from locks_delete_lock shortcut. ]
> >
> > Why do you think it is OK to remove that list_empty check? I don't
> > think it is. There might be locked requests that need to be woken up.
> >
>
> Temporary braino. We definitely cannot remove that check.
>
> > As the problem here is a use-after-free due to a race, one option would
> > be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
> > test/use.
> >
>
> Yeah, I was considering this too, but Linus' approach seemed simpler.
>
> > Another option is to use a different lock. The fl_wait contains a
> > spinlock, and we have wake_up_locked() which is provided for exactly
> > these sorts of situations where the wake_up call can race with a thread
> > waking up.
> >
> > So my compile-tested-only proposal is below.
> > I can probably write a proper change-log entry if you think the patch is a
> > good way to go.
> >
> > NeilBrown
> >
> >
> > diff --git a/fs/locks.c b/fs/locks.c
> > index 426b55d333d5..8aa04d5ac8b3 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> >
> > waiter = list_first_entry(&blocker->fl_blocked_requests,
> > struct file_lock, fl_blocked_member);
> > + spin_lock(&waiter->fl_wait.lock);
> > __locks_delete_block(waiter);
> > if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> > waiter->fl_lmops->lm_notify(waiter);
> > else
> > - wake_up(&waiter->fl_wait);
> > + wake_up_locked(&waiter->fl_wait);
> > + spin_unlock(&waiter->fl_wait.lock);
> > }
> > }
> >
> > @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
> > {
> > int status = -ENOENT;
> >
> > + /*
> > + * If fl_blocker is NULL, it won't be set again as this thread
> > + * "owns" the lock and is the only one that might try to claim
> > + * the lock. So it is safe to test fl_blocker locklessly.
> > + * Also if fl_blocker is NULL, this waiter is not listed on
> > + * fl_blocked_requests for some lock, so no other request can
> > + * be added to the list of fl_blocked_requests for this
> > + * request. So if fl_blocker is NULL, it is safe to
> > + * locklessly check if fl_blocked_requests is empty. If both
> > + * of these checks succeed, there is no need to take the lock.
> > + * However, some other thread might have only *just* set
> > + * fl_blocker to NULL and is about to send a wakeup on
> > + * fl_wait, so we mustn't return too soon or we might free waiter
> > + * before that wakeup can be sent. So take the fl_wait.lock
> > + * to serialize with the wakeup in __locks_wake_up_blocks().
> > + */
> > + if (waiter->fl_blocker == NULL) {
> > + spin_lock(&waiter->fl_wait.lock);
> > + if (waiter->fl_blocker == NULL &&
> > + list_empty(&waiter->fl_blocked_requests)) {
> > + spin_unlock(&waiter->fl_wait.lock);
> > + return status;
> > + }
> > + spin_unlock(&waiter->fl_wait.lock);
> > + }
> > spin_lock(&blocked_lock_lock);
> > if (waiter->fl_blocker)
> > status = 0;
>
> Yeah, this is simpler for me to prove to myself that it's correct, and I
> like that it touches less code, tbh. I'll give it a try here in a bit
> and see if it also fixes up the perf regression.
>
> FWIW, here's the variant of Linus' patch I've been testing. It seems to
> fix the performance regression too.
>
> --------------8<---------------
>
> [PATCH] locks: reinstate locks_delete_lock optimization
>
> There is measurable performance impact in some synthetic tests in commit
> 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup
> a waiter). Fix the race condition instead by clearing the fl_blocker
> pointer after the wakeup and by using smp_load_acquire and
> smp_store_release to handle the access.
>
> This means that we can no longer use the clearing of fl_blocker
> as the wait condition, so switch over to checking whether the
> fl_blocked_member list is empty.
>
> [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> the fl_blocker pointer to clear. ]
>
> Cc: yangerkun <[email protected]>
> Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/cifs/file.c | 3 ++-
> fs/locks.c | 43 +++++++++++++++++++++++++++++++++++++------
> 2 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 3b942ecdd4be..8f9d849a0012 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1169,7 +1169,8 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
> rc = posix_lock_file(file, flock, NULL);
> up_write(&cinode->lock_sem);
> if (rc == FILE_LOCK_DEFERRED) {
> - rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
> + rc = wait_event_interruptible(flock->fl_wait,
> + list_empty(&flock->fl_blocked_member));
> if (!rc)
> goto try_again;
> locks_delete_block(flock);
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..e78d37c73df5 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -725,7 +725,6 @@ static void __locks_delete_block(struct file_lock *waiter)
> {
> locks_delete_global_blocked(waiter);
> list_del_init(&waiter->fl_blocked_member);
> - waiter->fl_blocker = NULL;
> }
>
> static void __locks_wake_up_blocks(struct file_lock *blocker)
> @@ -740,6 +739,12 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> waiter->fl_lmops->lm_notify(waiter);
> else
> wake_up(&waiter->fl_wait);
> +
> + /*
> + * Tell the world we're done with it - see comment at
> + * top of locks_delete_block().
> + */
> + smp_store_release(&waiter->fl_blocker, NULL);
> }
> }
>
> @@ -753,11 +758,32 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread
> + * "owns" the lock and is the only one that might try to claim
> + * the lock. So it is safe to test fl_blocker locklessly.
> + * Also if fl_blocker is NULL, this waiter is not listed on
> + * fl_blocked_requests for some lock, so no other request can
> + * be added to the list of fl_blocked_requests for this
> + * request. So if fl_blocker is NULL, it is safe to
> + * locklessly check if fl_blocked_requests is empty. If both
> + * of these checks succeed, there is no need to take the lock.
> + */
> + if (!smp_load_acquire(&waiter->fl_blocker) &&
> + list_empty(&waiter->fl_blocked_requests))
> + return status;
> +
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;
> __locks_wake_up_blocks(waiter);
> __locks_delete_block(waiter);
> +
> + /*
> + * Tell the world we're done with it - see comment at top
> + * of this function
> + */
> + smp_store_release(&waiter->fl_blocker, NULL);
> spin_unlock(&blocked_lock_lock);
> return status;
> }
> @@ -1350,7 +1376,8 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> error = posix_lock_inode(inode, fl, NULL);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> + error = wait_event_interruptible(fl->fl_wait,
> + list_empty(&fl->fl_blocked_member));
> if (error)
> break;
> }
> @@ -1435,7 +1462,8 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
> error = posix_lock_inode(inode, &fl, NULL);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
> + error = wait_event_interruptible(fl.fl_wait,
> + list_empty(&fl.fl_blocked_member));
> if (!error) {
> /*
> * If we've been sleeping someone might have
> @@ -1638,7 +1666,8 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>
> locks_dispose_list(&dispose);
> error = wait_event_interruptible_timeout(new_fl->fl_wait,
> - !new_fl->fl_blocker, break_time);
> + list_empty(&new_fl->fl_blocked_member),
> + break_time);
>
> percpu_down_read(&file_rwsem);
> spin_lock(&ctx->flc_lock);
> @@ -2122,7 +2151,8 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> error = flock_lock_inode(inode, fl);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> + error = wait_event_interruptible(fl->fl_wait,
> + list_empty(&fl->fl_blocked_member));
> if (error)
> break;
> }
> @@ -2399,7 +2429,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
> error = vfs_lock_file(filp, cmd, fl, NULL);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> + error = wait_event_interruptible(fl->fl_wait,
> + list_empty(&fl->fl_blocked_member));
> if (error)
> break;
> }
> --
> 2.24.1
>
>

Hi,

We tested the above patch, and the will-it-scale.per_process_ops result
increased to 67207.

0a68ff5e2e7cf226 6d390e4b5d48ec03bb87e63cf0 bac15fc9e87397da379af89a33 testcase/testparams/testbox
---------------- -------------------------- -------------------------- ---------------------------
%stddev change %stddev change %stddev
\ | \ | \
66597 ± 3% -97% 2260 67207 ± 3% will-it-scale/performance-process-100%-lock1-ucode=0x11/lkp-knm01
66597 -97% 2260 67207 GEO-MEAN will-it-scale.per_process_ops

Best Regards,
Rong Chen

2020-03-10 07:56:27

by Chen, Rong A

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, Mar 10, 2020 at 11:24:50AM +0800, yangerkun wrote:
> On 2020/3/10 6:11, Jeff Layton wrote:
> > On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
> > > On Mon, Mar 09 2020, Jeff Layton wrote:
> > >
> > > > On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> > > > > On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > > > > > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> > > > > > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > > > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > > > > > >
> > > > > > > This is not completely unexpected as we're banging on the global
> > > > > > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > > > > > locks and unlocks without doing anything in between, so the workload
> > > > > > > looks pretty artificial [1].
> > > > > > >
> > > > > > > It would be nice to avoid the global lock in this codepath, but it
> > > > > > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > > > > > inclined to ignore this result unless we see a problem in more realistic
> > > > > > > workloads.
> > > > > >
> > > > > > That is a _huge_ regression, though.
> > > > > >
> > > > > > What about something like the attached? Wouldn't that work? And make
> > > > > > the code actually match the old comment about how "fl_blocker" being
> > > > > > NULL being special.
> > > > > >
> > > > > > The old code seemed to not know about things like memory ordering either.
> > > > > >
> > > > > > Patch is entirely untested, but aims to have that "smp_store_release()
> > > > > > means I'm done and not going to touch it any more", making that
> > > > > > smp_load_acquire() test hopefully be valid as per the comment..
> > > > >
> > > > > Yeah, something along those lines maybe. I don't think we can use
> > > > > fl_blocker that way though, as the wait_event_interruptible is waiting
> > > > > on it to go to NULL, and the wake_up happens before fl_blocker is
> > > > > cleared.
> > > > >
> > > > > Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> > > > > instead of testing for !fl_blocker to see whether we can avoid the
> > > > > blocked_lock_lock?
> > > >
> > > > How about something like this instead? (untested other than for
> > > > compilation)
> > > >
> > > > Basically, this just switches the waiters over to wait for
> > > > fl_blocked_member to go empty. That still happens before the wakeup, so
> > > > it should be ok to wait on that.
> > > >
> > > > I think we can also eliminate the lockless list_empty check in
> > > > locks_delete_block, as the fl_blocker check should be sufficient now.
> > > > --
> > > > Jeff Layton <[email protected]>
> > > > From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> > > > From: Linus Torvalds <[email protected]>
> > > > Date: Mon, 9 Mar 2020 14:35:43 -0400
> > > > Subject: [PATCH] locks: reinstate locks_delete_lock optimization
> > > >
> > > > ...by using smp_load_acquire and smp_store_release to close the race
> > > > window.
> > > >
> > > > [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> > > > the fl_blocker pointer to clear. Remove the list_empty check
> > > > from locks_delete_lock shortcut. ]
> > >
> > > Why do you think it is OK to remove that list_empty check? I don't
> > > think it is. There might be locked requests that need to be woken up.
> > >
> > > As the problem here is a use-after-free due to a race, one option would
> > > be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
> > > test/use.
> > >
> > > Another option is to use a different lock. The fl_wait contains a
> > > spinlock, and we have wake_up_locked() which is provided for exactly
> > > these sorts of situations where the wake_up call can race with a thread
> > > waking up.
> > >
> > > So my compile-tested-only proposal is below.
> > > I can probably write a proper change-log entry if you think the patch is a
> > > good way to go.
> > >
> > > NeilBrown
> > >
> > >
> > > diff --git a/fs/locks.c b/fs/locks.c
> > > index 426b55d333d5..8aa04d5ac8b3 100644
> > > --- a/fs/locks.c
> > > +++ b/fs/locks.c
> > > @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> > > waiter = list_first_entry(&blocker->fl_blocked_requests,
> > > struct file_lock, fl_blocked_member);
> > > + spin_lock(&waiter->fl_wait.lock);
> > > __locks_delete_block(waiter);
> > > if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> > > waiter->fl_lmops->lm_notify(waiter);
> > > else
> > > - wake_up(&waiter->fl_wait);
> > > + wake_up_locked(&waiter->fl_wait);
> > > + spin_unlock(&waiter->fl_wait.lock);
> > > }
> > > }
> > > @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
> > > {
> > > int status = -ENOENT;
> > > + /*
> > > + * If fl_blocker is NULL, it won't be set again as this thread
> > > + * "owns" the lock and is the only one that might try to claim
> > > + * the lock. So it is safe to test fl_blocker locklessly.
> > > + * Also if fl_blocker is NULL, this waiter is not listed on
> > > + * fl_blocked_requests for some lock, so no other request can
> > > + * be added to the list of fl_blocked_requests for this
> > > + * request. So if fl_blocker is NULL, it is safe to
> > > + * locklessly check if fl_blocked_requests is empty. If both
> > > + * of these checks succeed, there is no need to take the lock.
> > > + * However, some other thread might have only *just* set
> > > + * fl_blocker to NULL and is about to send a wakeup on
> > > + * fl_wait, so we mustn't return too soon or we might free waiter
> > > + * before that wakeup can be sent. So take the fl_wait.lock
> > > + * to serialize with the wakeup in __locks_wake_up_blocks().
> > > + */
> > > + if (waiter->fl_blocker == NULL) {
> > > + spin_lock(&waiter->fl_wait.lock);
> > > + if (waiter->fl_blocker == NULL &&
> > > + list_empty(&waiter->fl_blocked_requests)) {
> > > + spin_unlock(&waiter->fl_wait.lock);
> > > + return status;
> > > + }
> > > + spin_unlock(&waiter->fl_wait.lock);
> > > + }
> > > spin_lock(&blocked_lock_lock);
> > > if (waiter->fl_blocker)
> > > status = 0;
> > >
> >
> > Looks good on a cursory check, and I'm inclined to go with this since
> > it's less fiddly for people to backport.
> >
> > One other difference to note -- we are holding the fl_wait lock when
> > calling lm_notify, but I don't think it will matter to any of the
> > existing lm_notify functions.
> >
> > If you want to clean up the changelog and resend that would be great.
> >
> > Thanks,
> >
> Something else: I think there is no need to call locks_delete_block in
> every case in functions like flock_lock_inode_wait. What the patch
> '16306a61d3b7 ("fs/locks: always delete_block after waiting.")'
> describes is that we need to call locks_delete_block not only when the
> error equals -ERESTARTSYS (please point out if I am wrong). And this
> patch may fix the regression too, since a simple lock that succeeds, or
> an unlock, will not try to acquire blocked_lock_lock.
>
>
>
> From 40a0604199e9810d0380f90c403bbd4300075cad Mon Sep 17 00:00:00 2001
> From: yangerkun <[email protected]>
> Date: Tue, 10 Mar 2020 10:12:57 +0800
> Subject: [PATCH] fs/locks: fix the regression in flocks
>
> '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> wakeup a waiter")' introduce a regression since we will acquire
> blocked_lock_lock everytime we lock or unlock. Actually, what patch
> '16306a61d3b7 ("fs/locks: always delete_block after waiting.")' want to
> do is that we should wakeup waiter not only for error equals to
> -ERESTARTSYS, some other error code like -ENOMEM return from
> flock_lock_inode need be treated the same as the file_lock may block other
> flock too(flock a -> conflict with others and begin to wait -> flock b
> conflict with a and wait for a -> someone wakeup flock a then
> flock_lock_inode return -ENOMEM). Fix this regression by check error.
>
> Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> wakeup a waiter")
> Signed-off-by: yangerkun <[email protected]>
> ---
> fs/locks.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..403ed2230dd4 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1354,7 +1354,9 @@ static int posix_lock_inode_wait(struct inode *inode,
> struct file_lock *fl)
> if (error)
> break;
> }
> - locks_delete_block(fl);
> + if (error)
> + locks_delete_block(fl);
> +
> return error;
> }
>
> @@ -1447,7 +1449,8 @@ int locks_mandatory_area(struct inode *inode, struct
> file *filp, loff_t start,
>
> break;
> }
> - locks_delete_block(&fl);
> + if (error)
> + locks_delete_block(&fl);
>
> return error;
> }
> @@ -2126,7 +2129,9 @@ static int flock_lock_inode_wait(struct inode *inode,
> struct file_lock *fl)
> if (error)
> break;
> }
> - locks_delete_block(fl);
> + if (error)
> + locks_delete_block(fl);
> +
> return error;
> }
>
> @@ -2403,7 +2408,8 @@ static int do_lock_file_wait(struct file *filp,
> unsigned int cmd,
> if (error)
> break;
> }
> - locks_delete_block(fl);
> + if (error)
> + locks_delete_block(fl);
>
> return error;
> }
> --
> 2.17.2
>

Hi,

We tested the above patch, and the will-it-scale.per_process_ops result
increased to 62404.

0a68ff5e2e7cf226 6d390e4b5d48ec03bb87e63cf0 a3f09d0d818584c84780e6753e testcase/testparams/testbox
---------------- -------------------------- -------------------------- ---------------------------
%stddev change %stddev change %stddev
\ | \ | \
66597 ± 3% -97% 2260 -6% 62404 ± 6% will-it-scale/performance-process-100%-lock1-ucode=0x11/lkp-knm01
66597 -97% 2260 -6% 62404 GEO-MEAN will-it-scale.per_process_ops

Best Regards,
Rong Chen

2020-03-10 13:30:19

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
> On 2020/3/10 6:11, Jeff Layton wrote:
> > On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
> > > On Mon, Mar 09 2020, Jeff Layton wrote:
> > >
> > > > On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> > > > > On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > > > > > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> > > > > > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > > > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > > > > > >
> > > > > > > This is not completely unexpected as we're banging on the global
> > > > > > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > > > > > locks and unlocks without doing anything in between, so the workload
> > > > > > > looks pretty artificial [1].
> > > > > > >
> > > > > > > It would be nice to avoid the global lock in this codepath, but it
> > > > > > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > > > > > inclined to ignore this result unless we see a problem in more realistic
> > > > > > > workloads.
> > > > > >
> > > > > > That is a _huge_ regression, though.
> > > > > >
> > > > > > What about something like the attached? Wouldn't that work? And make
> > > > > > the code actually match the old comment about how "fl_blocker" being
> > > > > > NULL being special.
> > > > > >
> > > > > > The old code seemed to not know about things like memory ordering either.
> > > > > >
> > > > > > Patch is entirely untested, but aims to have that "smp_store_release()
> > > > > > means I'm done and not going to touch it any more", making that
> > > > > > smp_load_acquire() test hopefully be valid as per the comment..
> > > > >
> > > > > Yeah, something along those lines maybe. I don't think we can use
> > > > > fl_blocker that way though, as the wait_event_interruptible is waiting
> > > > > on it to go to NULL, and the wake_up happens before fl_blocker is
> > > > > cleared.
> > > > >
> > > > > Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> > > > > instead of testing for !fl_blocker to see whether we can avoid the
> > > > > blocked_lock_lock?
> > > > >
> > > >
> > > > How about something like this instead? (untested other than for
> > > > compilation)
> > > >
> > > > Basically, this just switches the waiters over to wait for
> > > > fl_blocked_member to go empty. That still happens before the wakeup, so
> > > > it should be ok to wait on that.
> > > >
> > > > I think we can also eliminate the lockless list_empty check in
> > > > locks_delete_block, as the fl_blocker check should be sufficient now.
> > > > --
> > > > Jeff Layton <[email protected]>
> > > > From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> > > > From: Linus Torvalds <[email protected]>
> > > > Date: Mon, 9 Mar 2020 14:35:43 -0400
> > > > Subject: [PATCH] locks: reinstate locks_delete_lock optimization
> > > >
> > > > ...by using smp_load_acquire and smp_store_release to close the race
> > > > window.
> > > >
> > > > [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> > > > the fl_blocker pointer to clear. Remove the list_empty check
> > > > from locks_delete_lock shortcut. ]
> > >
> > > Why do you think it is OK to remove that list_empty check? I don't
> > > think it is. There might be locked requests that need to be woken up.
> > >
> > > As the problem here is a use-after-free due to a race, one option would
> > > be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
> > > test/use.
> > >
> > > Another option is to use a different lock. The fl_wait contains a
> > > spinlock, and we have wake_up_locked() which is provided for exactly
> > > these sorts of situations where the wake_up call can race with a thread
> > > waking up.
> > >
> > > So my compile-tested-only proposal is below.
> > > I can probably write a proper change-log entry if you think the patch is a
> > > good way to go.
> > >
> > > NeilBrown
> > >
> > >
> > > diff --git a/fs/locks.c b/fs/locks.c
> > > index 426b55d333d5..8aa04d5ac8b3 100644
> > > --- a/fs/locks.c
> > > +++ b/fs/locks.c
> > > @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> > >
> > > waiter = list_first_entry(&blocker->fl_blocked_requests,
> > > struct file_lock, fl_blocked_member);
> > > + spin_lock(&waiter->fl_wait.lock);
> > > __locks_delete_block(waiter);
> > > if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> > > waiter->fl_lmops->lm_notify(waiter);
> > > else
> > > - wake_up(&waiter->fl_wait);
> > > + wake_up_locked(&waiter->fl_wait);
> > > + spin_unlock(&waiter->fl_wait.lock);
> > > }
> > > }
> > >
> > > @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
> > > {
> > > int status = -ENOENT;
> > >
> > > + /*
> > > + * If fl_blocker is NULL, it won't be set again as this thread
> > > + * "owns" the lock and is the only one that might try to claim
> > > + * the lock. So it is safe to test fl_blocker locklessly.
> > > + * Also if fl_blocker is NULL, this waiter is not listed on
> > > + * fl_blocked_requests for some lock, so no other request can
> > > + * be added to the list of fl_blocked_requests for this
> > > + * request. So if fl_blocker is NULL, it is safe to
> > > + * locklessly check if fl_blocked_requests is empty. If both
> > > + * of these checks succeed, there is no need to take the lock.
> > > + * However, some other thread might have only *just* set
> > > + * fl_blocker to NULL and is about to send a wakeup on
> > > + * fl_wait, so we mustn't return too soon or we might free waiter
> > > + * before that wakeup can be sent. So take the fl_wait.lock
> > > + * to serialize with the wakeup in __locks_wake_up_blocks().
> > > + */
> > > + if (waiter->fl_blocker == NULL) {
> > > + spin_lock(&waiter->fl_wait.lock);
> > > + if (waiter->fl_blocker == NULL &&
> > > + list_empty(&waiter->fl_blocked_requests)) {
> > > + spin_unlock(&waiter->fl_wait.lock);
> > > + return status;
> > > + }
> > > + spin_unlock(&waiter->fl_wait.lock);
> > > + }
> > > spin_lock(&blocked_lock_lock);
> > > if (waiter->fl_blocker)
> > > status = 0;
> > >
> >
> > Looks good on a cursory check, and I'm inclined to go with this since
> > it's less fiddly for people to backport.
> >
> > One other difference to note -- we are holding the fl_wait lock when
> > calling lm_notify, but I don't think it will matter to any of the
> > existing lm_notify functions.
> >
> > If you want to clean up the changelog and resend that would be great.
> >
> > Thanks,
> >
>> Something else: I think there is no need to call locks_delete_block
>> in every case in functions like flock_lock_inode_wait. What the patch
>> '16306a61d3b7 ("fs/locks: always delete_block after waiting.")'
>> describes is that we need to call locks_delete_block not only when
>> the error equals -ERESTARTSYS (please point out if I am wrong). And
>> this patch may fix the regression too, since a simple lock that
>> succeeds, or an unlock, will not try to acquire blocked_lock_lock.
>
>

Nice! This looks like it would work too, and it's a simpler fix.

I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
statements to make sure we never exit with one still queued. Also, I
think we can do a similar optimization in __break_lease.
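
To make the intended caller pattern concrete, here is a rough sketch of
what a waiter like posix_lock_inode_wait ends up looking like with that
change applied (just an illustration of the shape, not the actual diff):

	for (;;) {
		error = posix_lock_inode(inode, fl, NULL);
		if (error != FILE_LOCK_DEFERRED)
			break;
		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
		if (error)
			break;
	}
	/*
	 * Only fall back to locks_delete_block() when the wait was
	 * interrupted or otherwise failed; on a normal wakeup the waker has
	 * already removed us from the blocked lists, so we never touch
	 * blocked_lock_lock in the common case.
	 */
	if (error)
		locks_delete_block(fl);
	WARN_ON_ONCE(fl->fl_blocker);
	return error;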

There are some other callers of locks_delete_block:

cifs_posix_lock_set: already only calls it in these cases

nlmsvc_unlink_block: I think we need to call this in most cases, and
they're not going to be high-performance codepaths in general

nfsd4 callback handling: Several calls here, most need to always be
called. find_blocked_lock could be reworked to take the
blocked_lock_lock only once (I'll do that in a separate patch).

How about something like this?

----------------------8<---------------------

From: yangerkun <[email protected]>

[PATCH] filelock: fix regression in unlock performance

'6d390e4b5d48 ("locks: fix a potential use-after-free problem when
wakeup a waiter")' introduces a regression since we will acquire
blocked_lock_lock every time locks_delete_block is called.

In many cases we can just avoid calling locks_delete_block at all,
when we know that the wait was awoken by the condition becoming true.
Change several callers of locks_delete_block to only call it when
waking up due to signal or other error condition.

[ jlayton: add similar optimization to __break_lease, reword changelog,
add WARN_ON_ONCE calls ]

Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
Signed-off-by: yangerkun <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..b88a5b11c464 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
if (error)
break;
}
- locks_delete_block(fl);
+ if (error)
+ locks_delete_block(fl);
+ WARN_ON_ONCE(fl->fl_blocker);
+
return error;
}

@@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,

break;
}
- locks_delete_block(&fl);
+ if (error)
+ locks_delete_block(&fl);
+ WARN_ON_ONCE(fl.fl_blocker);

return error;
}
@@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)

locks_dispose_list(&dispose);
error = wait_event_interruptible_timeout(new_fl->fl_wait,
- !new_fl->fl_blocker, break_time);
+ !new_fl->fl_blocker,
+ break_time);

percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
trace_break_lease_unblock(inode, new_fl);
- locks_delete_block(new_fl);
if (error >= 0) {
/*
* Wait for the next conflicting lease that has not been
* broken yet
*/
- if (error == 0)
+ if (error == 0) {
+ locks_delete_block(new_fl);
time_out_leases(inode, &dispose);
+ }
if (any_leases_conflict(inode, new_fl))
goto restart;
error = 0;
+ } else {
+ locks_delete_block(new_fl);
}
+ WARN_ON_ONCE(new_fl->fl_blocker);
out:
spin_unlock(&ctx->flc_lock);
percpu_up_read(&file_rwsem);
@@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
if (error)
break;
}
- locks_delete_block(fl);
+ if (error)
+ locks_delete_block(fl);
+ WARN_ON_ONCE(fl->fl_blocker);
+
return error;
}

@@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
if (error)
break;
}
- locks_delete_block(fl);
+ if (error)
+ locks_delete_block(fl);
+ WARN_ON_ONCE(fl->fl_blocker);

return error;
}
--
2.24.1


2020-03-10 14:21:03

by yangerkun

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression



On 2020/3/10 20:52, Jeff Layton wrote:
> On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
>> On 2020/3/10 6:11, Jeff Layton wrote:
>>> On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
>>>> On Mon, Mar 09 2020, Jeff Layton wrote:
>>>>
>>>>> On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
>>>>>> On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
>>>>>>> On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
>>>>>>>> On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
>>>>>>>>> FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
>>>>>>>>
>>>>>>>> This is not completely unexpected as we're banging on the global
>>>>>>>> blocked_lock_lock now for every unlock. This test just thrashes file
>>>>>>>> locks and unlocks without doing anything in between, so the workload
>>>>>>>> looks pretty artificial [1].
>>>>>>>>
>>>>>>>> It would be nice to avoid the global lock in this codepath, but it
>>>>>>>> doesn't look simple to do. I'll keep thinking about it, but for now I'm
>>>>>>>> inclined to ignore this result unless we see a problem in more realistic
>>>>>>>> workloads.
>>>>>>>
>>>>>>> That is a _huge_ regression, though.
>>>>>>>
>>>>>>> What about something like the attached? Wouldn't that work? And make
>>>>>>> the code actually match the old comment about how "fl_blocker" being
>>>>>>> NULL being special.
>>>>>>>
>>>>>>> The old code seemed to not know about things like memory ordering either.
>>>>>>>
>>>>>>> Patch is entirely untested, but aims to have that "smp_store_release()
>>>>>>> means I'm done and not going to touch it any more", making that
>>>>>>> smp_load_acquire() test hopefully be valid as per the comment..
>>>>>>
>>>>>> Yeah, something along those lines maybe. I don't think we can use
>>>>>> fl_blocker that way though, as the wait_event_interruptible is waiting
>>>>>> on it to go to NULL, and the wake_up happens before fl_blocker is
>>>>>> cleared.
>>>>>>
>>>>>> Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
>>>>>> instead of testing for !fl_blocker to see whether we can avoid the
>>>>>> blocked_lock_lock?
>>>>>>
>>>>>
>>>>> How about something like this instead? (untested other than for
>>>>> compilation)
>>>>>
>>>>> Basically, this just switches the waiters over to wait for
>>>>> fl_blocked_member to go empty. That still happens before the wakeup, so
>>>>> it should be ok to wait on that.
>>>>>
>>>>> I think we can also eliminate the lockless list_empty check in
>>>>> locks_delete_block, as the fl_blocker check should be sufficient now.
>>>>> --
>>>>> Jeff Layton <[email protected]>
>>>>> From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
>>>>> From: Linus Torvalds <[email protected]>
>>>>> Date: Mon, 9 Mar 2020 14:35:43 -0400
>>>>> Subject: [PATCH] locks: reinstate locks_delete_lock optimization
>>>>>
>>>>> ...by using smp_load_acquire and smp_store_release to close the race
>>>>> window.
>>>>>
>>>>> [ jlayton: wait on the fl_blocked_requests list to go empty instead of
>>>>> the fl_blocker pointer to clear. Remove the list_empty check
>>>>> from locks_delete_lock shortcut. ]
>>>>
>>>> Why do you think it is OK to remove that list_empty check? I don't
>>>> think it is. There might be locked requests that need to be woken up.
>>>>
>>>> As the problem here is a use-after-free due to a race, one option would
>>>> be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
>>>> test/use.
>>>>
>>>> Another option is to use a different lock. The fl_wait contains a
>>>> spinlock, and we have wake_up_locked() which is provided for exactly
>>>> these sorts of situations where the wake_up call can race with a thread
>>>> waking up.
>>>>
>>>> So my compile-tested-only proposal is below.
>>>> I can probably write a proper change-log entry if you think the patch is a
>>>> good way to go.
>>>>
>>>> NeilBrown
>>>>
>>>>
>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>> index 426b55d333d5..8aa04d5ac8b3 100644
>>>> --- a/fs/locks.c
>>>> +++ b/fs/locks.c
>>>> @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>>>>
>>>> waiter = list_first_entry(&blocker->fl_blocked_requests,
>>>> struct file_lock, fl_blocked_member);
>>>> + spin_lock(&waiter->fl_wait.lock);
>>>> __locks_delete_block(waiter);
>>>> if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>>>> waiter->fl_lmops->lm_notify(waiter);
>>>> else
>>>> - wake_up(&waiter->fl_wait);
>>>> + wake_up_locked(&waiter->fl_wait);
>>>> + spin_unlock(&waiter->fl_wait.lock);
>>>> }
>>>> }
>>>>
>>>> @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
>>>> {
>>>> int status = -ENOENT;
>>>>
>>>> + /*
>>>> + * If fl_blocker is NULL, it won't be set again as this thread
>>>> + * "owns" the lock and is the only one that might try to claim
>>>> + * the lock. So it is safe to test fl_blocker locklessly.
>>>> + * Also if fl_blocker is NULL, this waiter is not listed on
>>>> + * fl_blocked_requests for some lock, so no other request can
>>>> + * be added to the list of fl_blocked_requests for this
>>>> + * request. So if fl_blocker is NULL, it is safe to
>>>> + * locklessly check if fl_blocked_requests is empty. If both
>>>> + * of these checks succeed, there is no need to take the lock.
>>>> + * However, some other thread might have only *just* set
>>>> + * fl_blocker to NULL and is about to send a wakeup on
>>>> + * fl_wait, so we mustn't return too soon or we might free waiter
>>>> + * before that wakeup can be sent. So take the fl_wait.lock
>>>> + * to serialize with the wakeup in __locks_wake_up_blocks().
>>>> + */
>>>> + if (waiter->fl_blocker == NULL) {
>>>> + spin_lock(&waiter->fl_wait.lock);
>>>> + if (waiter->fl_blocker == NULL &&
>>>> + list_empty(&waiter->fl_blocked_requests)) {
>>>> + spin_unlock(&waiter->fl_wait.lock);
>>>> + return status;
>>>> + }
>>>> + spin_unlock(&waiter->fl_wait.lock);
>>>> + }
>>>> spin_lock(&blocked_lock_lock);
>>>> if (waiter->fl_blocker)
>>>> status = 0;
>>>>
>>>
>>> Looks good on a cursory check, and I'm inclined to go with this since
>>> it's less fiddly for people to backport.
>>>
>>> One other difference to note -- we are holding the fl_wait lock when
>>> calling lm_notify, but I don't think it will matter to any of the
>>> existing lm_notify functions.
>>>
>>> If you want to clean up the changelog and resend that would be great.
>>>
>>> Thanks,
>>>
>> Something else: I think there is no need to call locks_delete_block
>> in every case in functions like flock_lock_inode_wait. What the patch
>> '16306a61d3b7 ("fs/locks: always delete_block after waiting.")'
>> describes is that we need to call locks_delete_block not only when
>> the error equals -ERESTARTSYS (please point out if I am wrong). And
>> this patch may fix the regression too, since a simple lock that
>> succeeds, or an unlock, will not try to acquire blocked_lock_lock.
>>
>>
>
> Nice! This looks like it would work too, and it's a simpler fix.
>
> I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
> statements to make sure we never exit with one still queued. Also, I
> think we can do a similar optimization in __break_lease.
>
> There are some other callers of locks_delete_block:
>
> cifs_posix_lock_set: already only calls it in these cases


Maybe cifs_posix_lock_set should be treated the same as
posix_lock_inode_wait, since cifs_posix_lock_set can call
locks_delete_block only when rc equals -ERESTARTSYS.

--------------------------------------------

[PATCH] cifs: call locks_delete_block for all error cases in
cifs_posix_lock_set

'16306a61d3b7 ("fs/locks: always delete_block after waiting.")' fixed
the problem that we should call locks_delete_block for all error cases.
However, cifs_posix_lock_set has been left alone, so the bug may still
exist there. Fix it and reorder the code to make it simpler.

Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
Signed-off-by: yangerkun <[email protected]>
---
fs/cifs/file.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3b942ecdd4be..e20fc252c0a9 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1159,21 +1159,25 @@ cifs_posix_lock_set(struct file *file, struct
file_lock *flock)
if ((flock->fl_flags & FL_POSIX) == 0)
return rc;

-try_again:
- cifs_down_write(&cinode->lock_sem);
- if (!cinode->can_cache_brlcks) {
- up_write(&cinode->lock_sem);
- return rc;
- }
+ for (;;) {
+ cifs_down_write(&cinode->lock_sem);
+ if (!cinode->can_cache_brlcks) {
+ up_write(&cinode->lock_sem);
+ return rc;
+ }

- rc = posix_lock_file(file, flock, NULL);
- up_write(&cinode->lock_sem);
- if (rc == FILE_LOCK_DEFERRED) {
+ rc = posix_lock_file(file, flock, NULL);
+ up_write(&cinode->lock_sem);
+ if (rc != FILE_LOCK_DEFERRED)
+ break;
rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
- if (!rc)
- goto try_again;
- locks_delete_block(flock);
+ if (rc)
+ break;
}
+ if (rc)
+ locks_delete_block(flock);
+ WARN_ON_ONCE(flock->fl_blocker);
+
return rc;
}

--
2.17.2



>
> nlmsvc_unlink_block: I think we need to call this in most cases, and
> they're not going to be high-performance codepaths in general
>
> nfsd4 callback handling: Several calls here, most need to always be
> called. find_blocked_lock could be reworked to take the
> blocked_lock_lock only once (I'll do that in a separate patch).
>
> How about something like this?

Thanks for this, I prefer this patch!

>
> ----------------------8<---------------------
>
> From: yangerkun <[email protected]>
>
> [PATCH] filelock: fix regression in unlock performance
>
> '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> wakeup a waiter")' introduces a regression since we will acquire
> blocked_lock_lock every time locks_delete_block is called.
>
> In many cases we can just avoid calling locks_delete_block at all,
> when we know that the wait was awoken by the condition becoming true.
> Change several callers of locks_delete_block to only call it when
> waking up due to signal or other error condition.
>
> [ jlayton: add similar optimization to __break_lease, reword changelog,
> add WARN_ON_ONCE calls ]
>
> Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> Signed-off-by: yangerkun <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/locks.c | 29 ++++++++++++++++++++++-------
> 1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..b88a5b11c464 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> if (error)
> break;
> }
> - locks_delete_block(fl);
> + if (error)
> + locks_delete_block(fl);
> + WARN_ON_ONCE(fl->fl_blocker);
> +
> return error;
> }
>
> @@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
>
> break;
> }
> - locks_delete_block(&fl);
> + if (error)
> + locks_delete_block(&fl);
> + WARN_ON_ONCE(fl.fl_blocker);
>
> return error;
> }
> @@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>
> locks_dispose_list(&dispose);
> error = wait_event_interruptible_timeout(new_fl->fl_wait,
> - !new_fl->fl_blocker, break_time);
> + !new_fl->fl_blocker,
> + break_time);
>
> percpu_down_read(&file_rwsem);
> spin_lock(&ctx->flc_lock);
> trace_break_lease_unblock(inode, new_fl);
> - locks_delete_block(new_fl);
> if (error >= 0) {
> /*
> * Wait for the next conflicting lease that has not been
> * broken yet
> */
> - if (error == 0)
> + if (error == 0) {
> + locks_delete_block(new_fl);
> time_out_leases(inode, &dispose);
> + }
> if (any_leases_conflict(inode, new_fl))
> goto restart;
> error = 0;
> + } else {
> + locks_delete_block(new_fl);
> }
> > + WARN_ON_ONCE(new_fl->fl_blocker);
> out:
> spin_unlock(&ctx->flc_lock);
> percpu_up_read(&file_rwsem);
> @@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> if (error)
> break;
> }
> - locks_delete_block(fl);
> + if (error)
> + locks_delete_block(fl);
> + WARN_ON_ONCE(fl->fl_blocker);
> +
> return error;
> }
>
> @@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
> if (error)
> break;
> }
> - locks_delete_block(fl);
> + if (error)
> + locks_delete_block(fl);
> + WARN_ON_ONCE(fl->fl_blocker);
>
> return error;
> }
>

2020-03-10 15:07:45

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, 2020-03-10 at 22:18 +0800, yangerkun wrote:
>
> On 2020/3/10 20:52, Jeff Layton wrote:
> > On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
> > > On 2020/3/10 6:11, Jeff Layton wrote:
> > > > On Tue, 2020-03-10 at 08:42 +1100, NeilBrown wrote:
> > > > > On Mon, Mar 09 2020, Jeff Layton wrote:
> > > > >
> > > > > > On Mon, 2020-03-09 at 13:22 -0400, Jeff Layton wrote:
> > > > > > > On Mon, 2020-03-09 at 08:52 -0700, Linus Torvalds wrote:
> > > > > > > > On Mon, Mar 9, 2020 at 7:36 AM Jeff Layton <[email protected]> wrote:
> > > > > > > > > On Sun, 2020-03-08 at 22:03 +0800, kernel test robot wrote:
> > > > > > > > > > FYI, we noticed a -96.6% regression of will-it-scale.per_process_ops due to commit:
> > > > > > > > >
> > > > > > > > > This is not completely unexpected as we're banging on the global
> > > > > > > > > blocked_lock_lock now for every unlock. This test just thrashes file
> > > > > > > > > locks and unlocks without doing anything in between, so the workload
> > > > > > > > > looks pretty artificial [1].
> > > > > > > > >
> > > > > > > > > It would be nice to avoid the global lock in this codepath, but it
> > > > > > > > > doesn't look simple to do. I'll keep thinking about it, but for now I'm
> > > > > > > > > inclined to ignore this result unless we see a problem in more realistic
> > > > > > > > > workloads.
> > > > > > > >
> > > > > > > > That is a _huge_ regression, though.
> > > > > > > >
> > > > > > > > What about something like the attached? Wouldn't that work? And make
> > > > > > > > the code actually match the old comment about how "fl_blocker" being
> > > > > > > > NULL being special.
> > > > > > > >
> > > > > > > > The old code seemed to not know about things like memory ordering either.
> > > > > > > >
> > > > > > > > Patch is entirely untested, but aims to have that "smp_store_release()
> > > > > > > > means I'm done and not going to touch it any more", making that
> > > > > > > > smp_load_acquire() test hopefully be valid as per the comment..
> > > > > > >
> > > > > > > Yeah, something along those lines maybe. I don't think we can use
> > > > > > > fl_blocker that way though, as the wait_event_interruptible is waiting
> > > > > > > on it to go to NULL, and the wake_up happens before fl_blocker is
> > > > > > > cleared.
> > > > > > >
> > > > > > > Maybe we need to mix in some sort of FL_BLOCK_ACTIVE flag and use that
> > > > > > > instead of testing for !fl_blocker to see whether we can avoid the
> > > > > > > blocked_lock_lock?
> > > > > > >
> > > > > >
> > > > > > How about something like this instead? (untested other than for
> > > > > > compilation)
> > > > > >
> > > > > > Basically, this just switches the waiters over to wait for
> > > > > > fl_blocked_member to go empty. That still happens before the wakeup, so
> > > > > > it should be ok to wait on that.
> > > > > >
> > > > > > I think we can also eliminate the lockless list_empty check in
> > > > > > locks_delete_block, as the fl_blocker check should be sufficient now.
> > > > > > --
> > > > > > Jeff Layton <[email protected]>
> > > > > > From c179d779c9b72838ed9996a65d686d86679d1639 Mon Sep 17 00:00:00 2001
> > > > > > From: Linus Torvalds <[email protected]>
> > > > > > Date: Mon, 9 Mar 2020 14:35:43 -0400
> > > > > > Subject: [PATCH] locks: reinstate locks_delete_lock optimization
> > > > > >
> > > > > > ...by using smp_load_acquire and smp_store_release to close the race
> > > > > > window.
> > > > > >
> > > > > > [ jlayton: wait on the fl_blocked_requests list to go empty instead of
> > > > > > the fl_blocker pointer to clear. Remove the list_empty check
> > > > > > from locks_delete_lock shortcut. ]
> > > > >
> > > > > Why do you think it is OK to remove that list_empty check? I don't
> > > > > think it is. There might be locked requests that need to be woken up.
> > > > >
> > > > > As the problem here is a use-after-free due to a race, one option would
> > > > > be to use rcu_free() on the file_lock, and hold rcu_read_lock() around
> > > > > test/use.
> > > > >
> > > > > Another option is to use a different lock. The fl_wait contains a
> > > > > spinlock, and we have wake_up_locked() which is provided for exactly
> > > > > these sorts of situations where the wake_up call can race with a thread
> > > > > waking up.
> > > > >
> > > > > So my compile-tested-only proposal is below.
> > > > > I can probably write a proper change-log entry if you think the patch is a
> > > > > good way to go.
> > > > >
> > > > > NeilBrown
> > > > >
> > > > >
> > > > > diff --git a/fs/locks.c b/fs/locks.c
> > > > > index 426b55d333d5..8aa04d5ac8b3 100644
> > > > > --- a/fs/locks.c
> > > > > +++ b/fs/locks.c
> > > > > @@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> > > > >
> > > > > waiter = list_first_entry(&blocker->fl_blocked_requests,
> > > > > struct file_lock, fl_blocked_member);
> > > > > + spin_lock(&waiter->fl_wait.lock);
> > > > > __locks_delete_block(waiter);
> > > > > if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> > > > > waiter->fl_lmops->lm_notify(waiter);
> > > > > else
> > > > > - wake_up(&waiter->fl_wait);
> > > > > + wake_up_locked(&waiter->fl_wait);
> > > > > + spin_unlock(&waiter->fl_wait.lock);
> > > > > }
> > > > > }
> > > > >
> > > > > @@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
> > > > > {
> > > > > int status = -ENOENT;
> > > > >
> > > > > + /*
> > > > > + * If fl_blocker is NULL, it won't be set again as this thread
> > > > > + * "owns" the lock and is the only one that might try to claim
> > > > > + * the lock. So it is safe to test fl_blocker locklessly.
> > > > > + * Also if fl_blocker is NULL, this waiter is not listed on
> > > > > + * fl_blocked_requests for some lock, so no other request can
> > > > > + * be added to the list of fl_blocked_requests for this
> > > > > + * request. So if fl_blocker is NULL, it is safe to
> > > > > + * locklessly check if fl_blocked_requests is empty. If both
> > > > > + * of these checks succeed, there is no need to take the lock.
> > > > > + * However, some other thread might have only *just* set
> > > > > + * fl_blocker to NULL and is about to send a wakeup on
> > > > > + * fl_wait, so we mustn't return too soon or we might free waiter
> > > > > + * before that wakeup can be sent. So take the fl_wait.lock
> > > > > + * to serialize with the wakeup in __locks_wake_up_blocks().
> > > > > + */
> > > > > + if (waiter->fl_blocker == NULL) {
> > > > > + spin_lock(&waiter->fl_wait.lock);
> > > > > + if (waiter->fl_blocker == NULL &&
> > > > > + list_empty(&waiter->fl_blocked_requests)) {
> > > > > + spin_unlock(&waiter->fl_wait.lock);
> > > > > + return status;
> > > > > + }
> > > > > + spin_unlock(&waiter->fl_wait.lock);
> > > > > + }
> > > > > spin_lock(&blocked_lock_lock);
> > > > > if (waiter->fl_blocker)
> > > > > status = 0;
> > > > >
> > > >
> > > > Looks good on a cursory check, and I'm inclined to go with this since
> > > > it's less fiddly for people to backport.
> > > >
> > > > One other difference to note -- we are holding the fl_wait lock when
> > > > calling lm_notify, but I don't think it will matter to any of the
> > > > existing lm_notify functions.
> > > >
> > > > If you want to clean up the changelog and resend that would be great.
> > > >
> > > > Thanks,
> > > >
> > > Something else: I think there is no need to call locks_delete_block
> > > in every case in functions like flock_lock_inode_wait. What the patch
> > > '16306a61d3b7 ("fs/locks: always delete_block after waiting.")'
> > > describes is that we need to call locks_delete_block not only when
> > > the error equals -ERESTARTSYS (please point out if I am wrong). And
> > > this patch may fix the regression too, since a simple lock that
> > > succeeds, or an unlock, will not try to acquire blocked_lock_lock.
> > >
> > >
> >
> > Nice! This looks like it would work too, and it's a simpler fix.
> >
> > I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
> > statements to make sure we never exit with one still queued. Also, I
> > think we can do a similar optimization in __break_lease.
> >
> > There are some other callers of locks_delete_block:
> >
> > cifs_posix_lock_set: already only calls it in these cases
>
> Maybe cifs_posix_lock_set should be treated the same as
> posix_lock_inode_wait, since cifs_posix_lock_set can call
> locks_delete_block only when rc equals -ERESTARTSYS.
>
> --------------------------------------------
>
> [PATCH] cifs: call locks_delete_block for all error cases in
> cifs_posix_lock_set
>
> '16306a61d3b7 ("fs/locks: always delete_block after waiting.")' fixed
> the problem that we should call locks_delete_block for all error cases.
>
> However, cifs_posix_lock_set has been left alone, so the bug may still
> exist there. Fix it and reorder the code to make it simpler.
>

I don't think this is a real bug. The block will not be inserted unless
posix_lock_file returns FILE_LOCK_DEFERRED, and wait_event_interruptible
only returns 0 or -ERESTARTSYS.
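
Roughly, the existing cifs_posix_lock_set flow (a sketch, with the
reasoning as comments) is:

	rc = posix_lock_file(file, flock, NULL);
	up_write(&cinode->lock_sem);
	if (rc == FILE_LOCK_DEFERRED) {
		/* only this branch ever queued a block for us */
		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
		if (!rc)
			goto try_again;	/* woken normally: block already removed */
		/* rc can only be -ERESTARTSYS here, so clean up ourselves */
		locks_delete_block(flock);
	}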

Why do you believe we need to call it after any error?

> Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
> Signed-off-by: yangerkun <[email protected]>
> ---
> fs/cifs/file.c | 28 ++++++++++++++++------------
> 1 file changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 3b942ecdd4be..e20fc252c0a9 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1159,21 +1159,25 @@ cifs_posix_lock_set(struct file *file, struct
> file_lock *flock)
> if ((flock->fl_flags & FL_POSIX) == 0)
> return rc;
>
> -try_again:
> - cifs_down_write(&cinode->lock_sem);
> - if (!cinode->can_cache_brlcks) {
> - up_write(&cinode->lock_sem);
> - return rc;
> - }
> + for (;;) {
> + cifs_down_write(&cinode->lock_sem);
> + if (!cinode->can_cache_brlcks) {
> + up_write(&cinode->lock_sem);
> + return rc;
> + }
>
> - rc = posix_lock_file(file, flock, NULL);
> - up_write(&cinode->lock_sem);
> - if (rc == FILE_LOCK_DEFERRED) {
> + rc = posix_lock_file(file, flock, NULL);
> + up_write(&cinode->lock_sem);
> + if (rc != FILE_LOCK_DEFERRED)
> + break;
> rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
> - if (!rc)
> - goto try_again;
> - locks_delete_block(flock);
> + if (rc)
> + break;
> }
> + if (rc)
> + locks_delete_block(flock);
> + WARN_ON_ONCE(flock->fl_blocker);
> +
> return rc;
> }


--
Jeff Layton <[email protected]>

2020-03-10 17:29:01

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, 2020-03-10 at 08:52 -0400, Jeff Layton wrote:

[snip]

> On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
> > >
> > Something else: I think there is no need to call locks_delete_block
> > in every case in functions like flock_lock_inode_wait. What the patch
> > '16306a61d3b7 ("fs/locks: always delete_block after waiting.")'
> > describes is that we need to call locks_delete_block not only when
> > the error equals -ERESTARTSYS (please point out if I am wrong). And
> > this patch may fix the regression too, since a simple lock that
> > succeeds, or an unlock, will not try to acquire blocked_lock_lock.
> >
> >
>
> Nice! This looks like it would work too, and it's a simpler fix.
>
> I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
> statements to make sure we never exit with one still queued. Also, I
> think we can do a similar optimization in __break_lease.
>
> There are some other callers of locks_delete_block:
>
> cifs_posix_lock_set: already only calls it in these cases
>
> nlmsvc_unlink_block: I think we need to call this in most cases, and
> they're not going to be high-performance codepaths in general
>
> nfsd4 callback handling: Several calls here, most need to always be
> called. find_blocked_lock could be reworked to take the
> blocked_lock_lock only once (I'll do that in a separate patch).
>
> How about something like this?
>
> ----------------------8<---------------------
>
> From: yangerkun <[email protected]>
>
> [PATCH] filelock: fix regression in unlock performance
>
> '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> wakeup a waiter")' introduces a regression since we will acquire
> blocked_lock_lock every time locks_delete_block is called.
>
> In many cases we can just avoid calling locks_delete_block at all,
> when we know that the wait was awoken by the condition becoming true.
> Change several callers of locks_delete_block to only call it when
> waking up due to signal or other error condition.
>
> [ jlayton: add similar optimization to __break_lease, reword changelog,
> add WARN_ON_ONCE calls ]
>
> Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> Signed-off-by: yangerkun <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/locks.c | 29 ++++++++++++++++++++++-------
> 1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..b88a5b11c464 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> if (error)
> break;
> }
> - locks_delete_block(fl);
> + if (error)
> + locks_delete_block(fl);
> + WARN_ON_ONCE(fl->fl_blocker);
> +
> return error;
> }
>
> @@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
>
> break;
> }
> - locks_delete_block(&fl);
> + if (error)
> + locks_delete_block(&fl);
> + WARN_ON_ONCE(fl.fl_blocker);
>
> return error;
> }
> @@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>
> locks_dispose_list(&dispose);
> error = wait_event_interruptible_timeout(new_fl->fl_wait,
> - !new_fl->fl_blocker, break_time);
> + !new_fl->fl_blocker,
> + break_time);
>
> percpu_down_read(&file_rwsem);
> spin_lock(&ctx->flc_lock);
> trace_break_lease_unblock(inode, new_fl);
> - locks_delete_block(new_fl);
> if (error >= 0) {
> /*
> * Wait for the next conflicting lease that has not been
> * broken yet
> */
> - if (error == 0)
> + if (error == 0) {
> + locks_delete_block(new_fl);
> time_out_leases(inode, &dispose);
> + }
> if (any_leases_conflict(inode, new_fl))
> goto restart;
> error = 0;
> + } else {
> + locks_delete_block(new_fl);
> }
> + WARN_ON_ONCE(fl->fl_blocker);
> out:
> spin_unlock(&ctx->flc_lock);
> percpu_up_read(&file_rwsem);
> @@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> if (error)
> break;
> }
> - locks_delete_block(fl);
> + if (error)
> + locks_delete_block(fl);
> + WARN_ON_ONCE(fl->fl_blocker);
> +
> return error;
> }
>
> @@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
> if (error)
> break;
> }
> - locks_delete_block(fl);
> + if (error)
> + locks_delete_block(fl);
> + WARN_ON_ONCE(fl->fl_blocker);
>
> return error;
> }

I've gone ahead and added the above patch to linux-next. Linus, Neil,
are you ok with this one? I think this is probably the simplest
approach.

Assuming so and that this tests out OK, I'll send a PR in a few days, after
it has had a bit of soak time in next.

Thanks for the effort everyone!
--
Jeff Layton <[email protected]>

2020-03-10 21:03:06

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, Mar 10 2020, Jeff Layton wrote:

> On Tue, 2020-03-10 at 08:52 -0400, Jeff Layton wrote:
>
> [snip]
>
>> On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
>> > >
>> > Something others. I think there is no need to call locks_delete_block
>> > for all case in function like flock_lock_inode_wait. What we should do
>> > as the patch '16306a61d3b7 ("fs/locks: always delete_block after
>> > waiting.")' describes is that we need call locks_delete_block not only
>> > for error equal to -ERESTARTSYS(please point out if I am wrong). And
>> > this patch may fix the regression too since simple lock that success or
>> > unlock will not try to acquire blocked_lock_lock.
>> >
>> >
>>
>> Nice! This looks like it would work too, and it's a simpler fix.
>>
>> I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
>> statements to make sure we never exit with one still queued. Also, I
>> think we can do a similar optimization in __break_lease.
>>
>> There are some other callers of locks_delete_block:
>>
>> cifs_posix_lock_set: already only calls it in these cases
>>
>> nlmsvc_unlink_block: I think we need to call this in most cases, and
>> they're not going to be high-performance codepaths in general
>>
>> nfsd4 callback handling: Several calls here, most need to always be
>> called. find_blocked_lock could be reworked to take the
>> blocked_lock_lock only once (I'll do that in a separate patch).
>>
>> How about something like this (
>>
>> ----------------------8<---------------------
>>
>> From: yangerkun <[email protected]>
>>
>> [PATCH] filelock: fix regression in unlock performance
>>
>> '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
>> wakeup a waiter")' introduces a regression since we will acquire
>> blocked_lock_lock every time locks_delete_block is called.
>>
>> In many cases we can just avoid calling locks_delete_block at all,
>> when we know that the wait was awoken by the condition becoming true.
>> Change several callers of locks_delete_block to only call it when
>> waking up due to signal or other error condition.
>>
>> [ jlayton: add similar optimization to __break_lease, reword changelog,
>> add WARN_ON_ONCE calls ]
>>
>> Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
>> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
>> Signed-off-by: yangerkun <[email protected]>
>> Signed-off-by: Jeff Layton <[email protected]>
>> ---
>> fs/locks.c | 29 ++++++++++++++++++++++-------
>> 1 file changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 426b55d333d5..b88a5b11c464 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>> if (error)
>> break;
>> }
>> - locks_delete_block(fl);
>> + if (error)
>> + locks_delete_block(fl);
>> + WARN_ON_ONCE(fl->fl_blocker);
>> +
>> return error;
>> }
>>
>> @@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
>>
>> break;
>> }
>> - locks_delete_block(&fl);
>> + if (error)
>> + locks_delete_block(&fl);
>> + WARN_ON_ONCE(fl.fl_blocker);
>>
>> return error;
>> }
>> @@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>>
>> locks_dispose_list(&dispose);
>> error = wait_event_interruptible_timeout(new_fl->fl_wait,
>> - !new_fl->fl_blocker, break_time);
>> + !new_fl->fl_blocker,
>> + break_time);
>>
>> percpu_down_read(&file_rwsem);
>> spin_lock(&ctx->flc_lock);
>> trace_break_lease_unblock(inode, new_fl);
>> - locks_delete_block(new_fl);
>> if (error >= 0) {
>> /*
>> * Wait for the next conflicting lease that has not been
>> * broken yet
>> */
>> - if (error == 0)
>> + if (error == 0) {
>> + locks_delete_block(new_fl);
>> time_out_leases(inode, &dispose);
>> + }
>> if (any_leases_conflict(inode, new_fl))
>> goto restart;
>> error = 0;
>> + } else {
>> + locks_delete_block(new_fl);
>> }
>> + WARN_ON_ONCE(fl->fl_blocker);
>> out:
>> spin_unlock(&ctx->flc_lock);
>> percpu_up_read(&file_rwsem);
>> @@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>> if (error)
>> break;
>> }
>> - locks_delete_block(fl);
>> + if (error)
>> + locks_delete_block(fl);
>> + WARN_ON_ONCE(fl->fl_blocker);
>> +
>> return error;
>> }
>>
>> @@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>> if (error)
>> break;
>> }
>> - locks_delete_block(fl);
>> + if (error)
>> + locks_delete_block(fl);
>> + WARN_ON_ONCE(fl->fl_blocker);
>>
>> return error;
>> }
>
> I've gone ahead and added the above patch to linux-next. Linus, Neil,
> are you ok with this one? I think this is probably the simplest
> approach.

I think this patch contains an assumption which is not justified. It
assumes that if a wait_event completes without error, then the wake_up()
must have happened. I don't think that is correct.

In the patch that caused the recent regression, the race described
involved a signal arriving just as __locks_wake_up_blocks() was being
called on another thread.
So the waiting process was woken by a signal *after* ->fl_blocker was set
to NULL, and *before* the wake_up(). If wait_event_interruptible()
finds that the condition is true, it will report success whether there
was a signal or not.

If you skip the locks_delete_block() after a wait, you get exactly the
same race as the optimization - which only skipped most of
locks_delete_block().
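
A minimal sketch of that interleaving, purely for illustration (using the
field and function names from the patches quoted above):

    1. The blocker, in __locks_wake_up_blocks(), sets waiter->fl_blocker = NULL.
    2. A signal is delivered to the waiter, which is sleeping in
       wait_event_interruptible().
    3. The waiter re-checks the condition, sees !fl_blocker, and returns 0 -
       no error, even though it was the signal that woke it.
    4. With the proposed patch the waiter then skips locks_delete_block() and
       returns, so its file_lock may be freed by the caller.
    5. The blocker finally calls wake_up(&waiter->fl_wait) - a use-after-free.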

I have a better solution. I did like your patch except that it changed
too much code. So I revised it to change less code. See below.

NeilBrown

From: NeilBrown <[email protected]>
Date: Wed, 11 Mar 2020 07:39:04 +1100
Subject: [PATCH] locks: restore locks_delete_lock optimization

A recent patch (see Fixes: below) removed an optimization which is
important as it avoids taking a lock in a common case.

The comment justifying the optimisation was correct as far as it went,
in that if the tests succeeded, then the values would remain stable and
the test result will remain valid even without a lock.

However after the test succeeds the lock can be freed while some other
thread might have only just set ->blocker to NULL (thus allowing the
test to succeed) but has not yet called wake_up() on the wq in the lock.
If the wake_up happens after the lock is freed, a use-after-free error
occurs.

This patch restores the optimization and reorders code to avoid the
use-after-free. Specifically we move the list_del_init on
fl_blocked_member to *after* the wake_up(), and add an extra test on
fl_blocked_member to locks_delete_block() before deciding to avoid taking
the spinlock.

As this involves breaking code out of __locks_delete_block(), we discard
the function completely and open-code it in the two places it was
called.

These lockless accesses do not require any memory barriers. The failure
mode from possible memory access reordering is that the test at the top
of locks_delete_block() will fail, and in that case we fall through into
the locked region which provides sufficient memory barriers implicitly.

Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
Signed-off-by: NeilBrown <[email protected]>
---
fs/locks.c | 42 ++++++++++++++++++++++++++++--------------
1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..dc99ab2262ea 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -716,18 +716,6 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
hash_del(&waiter->fl_link);
}

-/* Remove waiter from blocker's block list.
- * When blocker ends up pointing to itself then the list is empty.
- *
- * Must be called with blocked_lock_lock held.
- */
-static void __locks_delete_block(struct file_lock *waiter)
-{
- locks_delete_global_blocked(waiter);
- list_del_init(&waiter->fl_blocked_member);
- waiter->fl_blocker = NULL;
-}
-
static void __locks_wake_up_blocks(struct file_lock *blocker)
{
while (!list_empty(&blocker->fl_blocked_requests)) {
@@ -735,11 +723,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)

waiter = list_first_entry(&blocker->fl_blocked_requests,
struct file_lock, fl_blocked_member);
- __locks_delete_block(waiter);
+ locks_delete_global_blocked(waiter);
+ waiter->fl_blocker = NULL;
if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
waiter->fl_lmops->lm_notify(waiter);
else
wake_up(&waiter->fl_wait);
+ list_del_init(&waiter->fl_blocked_member);
}
}

@@ -753,11 +743,35 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;

+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread
+ * "owns" the lock and is the only one that might try to claim
+ * the lock. So it is safe to test fl_blocker locklessly.
+ * Also if fl_blocker is NULL, this waiter is not listed on
+ * fl_blocked_requests for some lock, so no other request can
+ * be added to the list of fl_blocked_requests for this
+ * request. So if fl_blocker is NULL, it is safe to
+ * locklessly check if fl_blocked_requests is empty. If both
+ * of these checks succeed, there is no need to take the lock.
+ * We also check fl_blocked_member is empty. This is logically
+ * redundant with the test of fl_blocker, but it ensures that
+ * __locks_wake_up_blocks() has finished the wakeup and will not
+ * access the lock again, so it is safe to return and free.
+ * There is no need for any memory barriers with these lockless
+ * tests: if the reads happen before the corresponding writes are
+ * seen, we fall through to the locked code.
+ */
+ if (waiter->fl_blocker == NULL &&
+ list_empty(&waiter->fl_blocked_member) &&
+ list_empty(&waiter->fl_blocked_requests))
+ return status;
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;
__locks_wake_up_blocks(waiter);
- __locks_delete_block(waiter);
+ locks_delete_global_blocked(waiter);
+ list_del_init(&waiter->fl_blocked_member);
+ waiter->fl_blocker = NULL;
spin_unlock(&blocked_lock_lock);
return status;
}
--
2.25.1



2020-03-10 21:15:12

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Wed, 2020-03-11 at 08:01 +1100, NeilBrown wrote:
> On Tue, Mar 10 2020, Jeff Layton wrote:
>
> > On Tue, 2020-03-10 at 08:52 -0400, Jeff Layton wrote:
> >
> > [snip]
> >
> > > On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
> > > > Something others. I think there is no need to call locks_delete_block
> > > > for all case in function like flock_lock_inode_wait. What we should do
> > > > as the patch '16306a61d3b7 ("fs/locks: always delete_block after
> > > > waiting.")' describes is that we need call locks_delete_block not only
> > > > for error equal to -ERESTARTSYS(please point out if I am wrong). And
> > > > this patch may fix the regression too since simple lock that success or
> > > > unlock will not try to acquire blocked_lock_lock.
> > > >
> > > >
> > >
> > > Nice! This looks like it would work too, and it's a simpler fix.
> > >
> > > I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
> > > statements to make sure we never exit with one still queued. Also, I
> > > think we can do a similar optimization in __break_lease.
> > >
> > > There are some other callers of locks_delete_block:
> > >
> > > cifs_posix_lock_set: already only calls it in these cases
> > >
> > > nlmsvc_unlink_block: I think we need to call this in most cases, and
> > > they're not going to be high-performance codepaths in general
> > >
> > > nfsd4 callback handling: Several calls here, most need to always be
> > > called. find_blocked_lock could be reworked to take the
> > > blocked_lock_lock only once (I'll do that in a separate patch).
> > >
> > > How about something like this (
> > >
> > > ----------------------8<---------------------
> > >
> > > From: yangerkun <[email protected]>
> > >
> > > [PATCH] filelock: fix regression in unlock performance
> > >
> > > '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> > > wakeup a waiter")' introduces a regression since we will acquire
> > > blocked_lock_lock every time locks_delete_block is called.
> > >
> > > In many cases we can just avoid calling locks_delete_block at all,
> > > when we know that the wait was awoken by the condition becoming true.
> > > Change several callers of locks_delete_block to only call it when
> > > waking up due to signal or other error condition.
> > >
> > > [ jlayton: add similar optimization to __break_lease, reword changelog,
> > > add WARN_ON_ONCE calls ]
> > >
> > > Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
> > > Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> > > Signed-off-by: yangerkun <[email protected]>
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > > fs/locks.c | 29 ++++++++++++++++++++++-------
> > > 1 file changed, 22 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/fs/locks.c b/fs/locks.c
> > > index 426b55d333d5..b88a5b11c464 100644
> > > --- a/fs/locks.c
> > > +++ b/fs/locks.c
> > > @@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> > > if (error)
> > > break;
> > > }
> > > - locks_delete_block(fl);
> > > + if (error)
> > > + locks_delete_block(fl);
> > > + WARN_ON_ONCE(fl->fl_blocker);
> > > +
> > > return error;
> > > }
> > >
> > > @@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
> > >
> > > break;
> > > }
> > > - locks_delete_block(&fl);
> > > + if (error)
> > > + locks_delete_block(&fl);
> > > + WARN_ON_ONCE(fl.fl_blocker);
> > >
> > > return error;
> > > }
> > > @@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
> > >
> > > locks_dispose_list(&dispose);
> > > error = wait_event_interruptible_timeout(new_fl->fl_wait,
> > > - !new_fl->fl_blocker, break_time);
> > > + !new_fl->fl_blocker,
> > > + break_time);
> > >
> > > percpu_down_read(&file_rwsem);
> > > spin_lock(&ctx->flc_lock);
> > > trace_break_lease_unblock(inode, new_fl);
> > > - locks_delete_block(new_fl);
> > > if (error >= 0) {
> > > /*
> > > * Wait for the next conflicting lease that has not been
> > > * broken yet
> > > */
> > > - if (error == 0)
> > > + if (error == 0) {
> > > + locks_delete_block(new_fl);
> > > time_out_leases(inode, &dispose);
> > > + }
> > > if (any_leases_conflict(inode, new_fl))
> > > goto restart;
> > > error = 0;
> > > + } else {
> > > + locks_delete_block(new_fl);
> > > }
> > > + WARN_ON_ONCE(fl->fl_blocker);
> > > out:
> > > spin_unlock(&ctx->flc_lock);
> > > percpu_up_read(&file_rwsem);
> > > @@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> > > if (error)
> > > break;
> > > }
> > > - locks_delete_block(fl);
> > > + if (error)
> > > + locks_delete_block(fl);
> > > + WARN_ON_ONCE(fl->fl_blocker);
> > > +
> > > return error;
> > > }
> > >
> > > @@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
> > > if (error)
> > > break;
> > > }
> > > - locks_delete_block(fl);
> > > + if (error)
> > > + locks_delete_block(fl);
> > > + WARN_ON_ONCE(fl->fl_blocker);
> > >
> > > return error;
> > > }
> >
> > I've gone ahead and added the above patch to linux-next. Linus, Neil,
> > are you ok with this one? I think this is probably the simplest
> > approach.
>
> I think this patch contains an assumption which is not justified. It
> assumes that if a wait_event completes without error, then the wake_up()
> must have happened. I don't think that is correct.
>
> In the patch that caused the recent regression, the race described
> involved a signal arriving just as __locks_wake_up_blocks() was being
> called on another thread.
> So the waiting process was woken by a signal *after* ->fl_blocker was set
> to NULL, and *before* the wake_up(). If wait_event_interruptible()
> finds that the condition is true, it will report success whether there
> was a signal or not.
>
> If you skip the locks_delete_block() after a wait, you get exactly the
> same race as the optimization - which only skipped most of
> locks_delete_block().
>
> I have a better solution. I did like your patch except that it changed
> too much code. So I revised it to change less code. See below.
>
> NeilBrown
>
> From: NeilBrown <[email protected]>
> Date: Wed, 11 Mar 2020 07:39:04 +1100
> Subject: [PATCH] locks: restore locks_delete_lock optimization
>
> A recent patch (see Fixes: below) removed an optimization which is
> important as it avoids taking a lock in a common case.
>
> The comment justifying the optimisation was correct as far as it went,
> in that if the tests succeeded, then the values would remain stable and
> the test result will remain valid even without a lock.
>
> However after the test succeeds the lock can be freed while some other
> thread might have only just set ->blocker to NULL (thus allowing the
> test to succeed) but has not yet called wake_up() on the wq in the lock.
> If the wake_up happens after the lock is freed, a use-after-free error
> occurs.
>
> This patch restores the optimization and reorders code to avoid the
> use-after-free. Specifically we move the list_del_init on
> fl_blocked_member to *after* the wake_up(), and add an extra test on
> fl_block_member() to locks_delete_lock() before deciding to avoid taking
> the spinlock.
>
> As this involves breaking code out of __locks_delete_block(), we discard
> the function completely and open-code it in the two places it was
> called.
>
> These lockless accesses do not require any memory barriers. The failure
> mode from possible memory access reordering is that the test at the top
> of locks_delete_lock() will fail, and in that case we fall through into
> the locked region which provides sufficient memory barriers implicitly.
>
> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> Signed-off-by: NeilBrown <[email protected]>
> ---
> fs/locks.c | 42 ++++++++++++++++++++++++++++--------------
> 1 file changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..dc99ab2262ea 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -716,18 +716,6 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
> hash_del(&waiter->fl_link);
> }
>
> -/* Remove waiter from blocker's block list.
> - * When blocker ends up pointing to itself then the list is empty.
> - *
> - * Must be called with blocked_lock_lock held.
> - */
> -static void __locks_delete_block(struct file_lock *waiter)
> -{
> - locks_delete_global_blocked(waiter);
> - list_del_init(&waiter->fl_blocked_member);
> - waiter->fl_blocker = NULL;
> -}
> -
> static void __locks_wake_up_blocks(struct file_lock *blocker)
> {
> while (!list_empty(&blocker->fl_blocked_requests)) {
> @@ -735,11 +723,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>
> waiter = list_first_entry(&blocker->fl_blocked_requests,
> struct file_lock, fl_blocked_member);
> - __locks_delete_block(waiter);
> + locks_delete_global_blocked(waiter);
> + waiter->fl_blocker = NULL;
> if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> waiter->fl_lmops->lm_notify(waiter);
> else
> wake_up(&waiter->fl_wait);
> + list_del_init(&waiter->fl_blocked_member);

Are you sure you don't need a memory barrier here? Could the
list_del_init be hoisted just above the if condition?

> }
> }
>
> @@ -753,11 +743,35 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread
> + * "owns" the lock and is the only one that might try to claim
> + * the lock. So it is safe to test fl_blocker locklessly.
> + * Also if fl_blocker is NULL, this waiter is not listed on
> + * fl_blocked_requests for some lock, so no other request can
> + * be added to the list of fl_blocked_requests for this
> + * request. So if fl_blocker is NULL, it is safe to
> + * locklessly check if fl_blocked_requests is empty. If both
> + * of these checks succeed, there is no need to take the lock.
> + * We also check fl_blocked_member is empty. This is logically
> + * redundant with the test of fl_blocker, but it ensure that
> + * __locks_wake_up_blocks() has finished the wakeup and will not
> + * access the lock again, so it is safe to return and free.
> + * There is no need for any memory barriers with these lockless
> + * tests as is the reads happen before the corresponding writes are
> + * seen, we fall through to the locked code.
> + */
> + if (waiter->fl_blocker == NULL &&
> + list_empty(&waiter->fl_blocked_member) &&
> + list_empty(&waiter->fl_blocked_requests))
> + return status;
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;
> __locks_wake_up_blocks(waiter);
> - __locks_delete_block(waiter);
> + locks_delete_global_blocked(waiter);
> + list_del_init(&waiter->fl_blocked_member);
> + waiter->fl_blocker = NULL;
> spin_unlock(&blocked_lock_lock);
> return status;
> }

--
Jeff Layton <[email protected]>

2020-03-10 21:22:37

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, Mar 10 2020, Jeff Layton wrote:

>> @@ -735,11 +723,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>>
>> waiter = list_first_entry(&blocker->fl_blocked_requests,
>> struct file_lock, fl_blocked_member);
>> - __locks_delete_block(waiter);
>> + locks_delete_global_blocked(waiter);
>> + waiter->fl_blocker = NULL;
>> if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>> waiter->fl_lmops->lm_notify(waiter);
>> else
>> wake_up(&waiter->fl_wait);
>> + list_del_init(&waiter->fl_blocked_member);
>
> Are you sure you don't need a memory barrier here? Could the
> list_del_init be hoisted just above the if condition?
>

A compiler barrier() is probably justified. Memory barriers delay reads
and expedite writes so they cannot be needed.

wake_up(&waiter->fl_wait);
+ /* The list_del_init() must not be visible before the
+ * wake_up completes, as the waiter can then be freed.
+ */
+ barrier();
+ list_del_init(&waiter->fl_blocked_member);

Thanks,
NeilBrown



2020-03-10 21:48:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, Mar 10, 2020 at 2:22 PM NeilBrown <[email protected]> wrote:
>
> A compiler barrier() is probably justified. Memory barriers delay reads
> and expedite writes so they cannot be needed.

That's not at all guaranteed. Weakly ordered memory things can
actually have odd orderings, and not just "writes delayed, reads done
early". Reads may be delayed too by cache misses, and memory barriers
can thus expedite reads as well (by forcing the missing read to happen
before later non-missing ones).

So don't assume that a memory barrier would only delay reads and
expedite writes. Quite the reverse: assume that there is no ordering
at all unless you impose one with a memory barrier (*).

Linus

(*) it's a bit more complex than that, in that we do assume that
control dependencies end up gating writes, for example, but those
kinds of implicit ordering things should *not* be what you depend on
in the code unless you're doing some seriously subtle memory ordering
work and comment on it extensively.

2020-03-10 22:09:55

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, 2020-03-10 at 14:47 -0700, Linus Torvalds wrote:
> On Tue, Mar 10, 2020 at 2:22 PM NeilBrown <[email protected]> wrote:
> > A compiler barrier() is probably justified. Memory barriers delay reads
> > and expedite writes so they cannot be needed.
>
> That's not at all guaranteed. Weakly ordered memory things can
> actually have odd orderings, and not just "writes delayed, reads done
> early". Reads may be delayed too by cache misses, and memory barriers
> can thus expedite reads as well (by forcing the missing read to happen
> before later non-missing ones).
>
> So don't assume that a memory barrier would only delay reads and
> expedite writes. Quite the reverse: assume that there is no ordering
> at all unless you impose one with a memory barrier (*).
>
> Linus
>
> (*) it's a bit more complex than that, in that we do assume that
> control dependencies end up gating writes, for example, but those
> kinds of implicit ordering things should *not* be what you depend on
> in the code unless you're doing some seriously subtle memory ordering
> work and comment on it extensively.

Good point. I too prefer code that's understandable by mere mortals.

Given that, and the fact that Neil pointed out that yangerkun's latest
patch would reintroduce the original race, I'm leaning back toward the
patch Neil sent yesterday. It relies solely on spinlocks, and so doesn't
have the subtle memory-ordering requirements of the others.

I did some cursory testing with it and it seems to fix the performance
regression. If you guys are OK with this patch, and Neil can send an
updated changelog, I'll get it into -next and we can get this sorted
out.

Thanks,

-------------------8<-------------------

[PATCH] locks: reintroduce locks_delete_block shortcut
---
fs/locks.c | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..8aa04d5ac8b3 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -735,11 +735,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)

waiter = list_first_entry(&blocker->fl_blocked_requests,
struct file_lock, fl_blocked_member);
+ spin_lock(&waiter->fl_wait.lock);
__locks_delete_block(waiter);
if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
waiter->fl_lmops->lm_notify(waiter);
else
- wake_up(&waiter->fl_wait);
+ wake_up_locked(&waiter->fl_wait);
+ spin_unlock(&waiter->fl_wait.lock);
}
}

@@ -753,6 +755,31 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;

+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread
+ * "owns" the lock and is the only one that might try to claim
+ * the lock. So it is safe to test fl_blocker locklessly.
+ * Also if fl_blocker is NULL, this waiter is not listed on
+ * fl_blocked_requests for some lock, so no other request can
+ * be added to the list of fl_blocked_requests for this
+ * request. So if fl_blocker is NULL, it is safe to
+ * locklessly check if fl_blocked_requests is empty. If both
+ * of these checks succeed, there is no need to take the lock.
+ * However, some other thread might have only *just* set
+ * fl_blocker to NULL and is about to send a wakeup on
+ * fl_wait, so we mustn't return too soon or we might free waiter
+ * before that wakeup can be sent. So take the fl_wait.lock
+ * to serialize with the wakeup in __locks_wake_up_blocks().
+ */
+ if (waiter->fl_blocker == NULL) {
+ spin_lock(&waiter->fl_wait.lock);
+ if (waiter->fl_blocker == NULL &&
+ list_empty(&waiter->fl_blocked_requests)) {
+ spin_unlock(&waiter->fl_wait.lock);
+ return status;
+ }
+ spin_unlock(&waiter->fl_wait.lock);
+ }
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;
--
2.24.1


2020-03-10 22:33:15

by Linus Torvalds

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, Mar 10, 2020 at 3:07 PM Jeff Layton <[email protected]> wrote:
>
> Given that, and the fact that Neil pointed out that yangerkun's latest
> patch would reintroduce the original race, I'm leaning back toward the
> patch Neil sent yesterday. It relies solely on spinlocks, and so doesn't
> have the subtle memory-ordering requirements of the others.

It has subtle locking changes, though.

It now calls the "->lm_notify()" callback with the wait queue spinlock held.

Is that ok? It's not obvious. Those functions take other spinlocks,
and wake up other things. See for example nlmsvc_notify_blocked()..
Yes, it was called under the blocked_lock_lock spinlock before too,
but now there's an _additional_ spinlock, and it must not call
"wake_up(&waiter->fl_wait)" in the callback, for example, because it
already holds the lock on that wait queue.

Maybe that is never done. I don't know the callbacks.

I was really hoping that the simple memory ordering of using that
smp_store_release -> smp_load_acquire using fl_blocker would be
sufficient. That's a particularly simple and efficient ordering.
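
For reference, a minimal sketch of what that store-release/load-acquire
pairing on fl_blocker could look like (illustrative only, not taken from
any posted patch, and it deliberately leaves aside the separate question
of ordering the wake_up() itself):

	/* waker side, with blocked_lock_lock held: unlink first, then publish */
	list_del_init(&waiter->fl_blocked_member);
	smp_store_release(&waiter->fl_blocker, NULL);
	wake_up(&waiter->fl_wait);

	/* waiter side, lockless fast path in locks_delete_block() */
	if (!smp_load_acquire(&waiter->fl_blocker) &&
	    list_empty(&waiter->fl_blocked_requests))
		return -ENOENT;	/* the acquire load guarantees the unlink above is visible */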

Oh well. If you want to go that spinlock way, it needs to document why
it's safe to do a callback under it.

Linus

2020-03-11 01:57:49

by yangerkun

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression



On 2020/3/11 5:01, NeilBrown wrote:
> On Tue, Mar 10 2020, Jeff Layton wrote:
>
>> On Tue, 2020-03-10 at 08:52 -0400, Jeff Layton wrote:
>>
>> [snip]
>>
>>> On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
>>>>>
>>>> Something others. I think there is no need to call locks_delete_block
>>>> for all case in function like flock_lock_inode_wait. What we should do
>>>> as the patch '16306a61d3b7 ("fs/locks: always delete_block after
>>>> waiting.")' describes is that we need call locks_delete_block not only
>>>> for error equal to -ERESTARTSYS(please point out if I am wrong). And
>>>> this patch may fix the regression too since simple lock that success or
>>>> unlock will not try to acquire blocked_lock_lock.
>>>>
>>>>
>>>
>>> Nice! This looks like it would work too, and it's a simpler fix.
>>>
>>> I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
>>> statements to make sure we never exit with one still queued. Also, I
>>> think we can do a similar optimization in __break_lease.
>>>
>>> There are some other callers of locks_delete_block:
>>>
>>> cifs_posix_lock_set: already only calls it in these cases
>>>
>>> nlmsvc_unlink_block: I think we need to call this in most cases, and
>>> they're not going to be high-performance codepaths in general
>>>
>>> nfsd4 callback handling: Several calls here, most need to always be
>>> called. find_blocked_lock could be reworked to take the
>>> blocked_lock_lock only once (I'll do that in a separate patch).
>>>
>>> How about something like this (
>>>
>>> ----------------------8<---------------------
>>>
>>> From: yangerkun <[email protected]>
>>>
>>> [PATCH] filelock: fix regression in unlock performance
>>>
>>> '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
>>> wakeup a waiter")' introduces a regression since we will acquire
>>> blocked_lock_lock every time locks_delete_block is called.
>>>
>>> In many cases we can just avoid calling locks_delete_block at all,
>>> when we know that the wait was awoken by the condition becoming true.
>>> Change several callers of locks_delete_block to only call it when
>>> waking up due to signal or other error condition.
>>>
>>> [ jlayton: add similar optimization to __break_lease, reword changelog,
>>> add WARN_ON_ONCE calls ]
>>>
>>> Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
>>> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
>>> Signed-off-by: yangerkun <[email protected]>
>>> Signed-off-by: Jeff Layton <[email protected]>
>>> ---
>>> fs/locks.c | 29 ++++++++++++++++++++++-------
>>> 1 file changed, 22 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/locks.c b/fs/locks.c
>>> index 426b55d333d5..b88a5b11c464 100644
>>> --- a/fs/locks.c
>>> +++ b/fs/locks.c
>>> @@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>> if (error)
>>> break;
>>> }
>>> - locks_delete_block(fl);
>>> + if (error)
>>> + locks_delete_block(fl);
>>> + WARN_ON_ONCE(fl->fl_blocker);
>>> +
>>> return error;
>>> }
>>>
>>> @@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
>>>
>>> break;
>>> }
>>> - locks_delete_block(&fl);
>>> + if (error)
>>> + locks_delete_block(&fl);
>>> + WARN_ON_ONCE(fl.fl_blocker);
>>>
>>> return error;
>>> }
>>> @@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>>>
>>> locks_dispose_list(&dispose);
>>> error = wait_event_interruptible_timeout(new_fl->fl_wait,
>>> - !new_fl->fl_blocker, break_time);
>>> + !new_fl->fl_blocker,
>>> + break_time);
>>>
>>> percpu_down_read(&file_rwsem);
>>> spin_lock(&ctx->flc_lock);
>>> trace_break_lease_unblock(inode, new_fl);
>>> - locks_delete_block(new_fl);
>>> if (error >= 0) {
>>> /*
>>> * Wait for the next conflicting lease that has not been
>>> * broken yet
>>> */
>>> - if (error == 0)
>>> + if (error == 0) {
>>> + locks_delete_block(new_fl);
>>> time_out_leases(inode, &dispose);
>>> + }
>>> if (any_leases_conflict(inode, new_fl))
>>> goto restart;
>>> error = 0;
>>> + } else {
>>> + locks_delete_block(new_fl);
>>> }
>>> + WARN_ON_ONCE(fl->fl_blocker);
>>> out:
>>> spin_unlock(&ctx->flc_lock);
>>> percpu_up_read(&file_rwsem);
>>> @@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>> if (error)
>>> break;
>>> }
>>> - locks_delete_block(fl);
>>> + if (error)
>>> + locks_delete_block(fl);
>>> + WARN_ON_ONCE(fl->fl_blocker);
>>> +
>>> return error;
>>> }
>>>
>>> @@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>>> if (error)
>>> break;
>>> }
>>> - locks_delete_block(fl);
>>> + if (error)
>>> + locks_delete_block(fl);
>>> + WARN_ON_ONCE(fl->fl_blocker);
>>>
>>> return error;
>>> }
>>
>> I've gone ahead and added the above patch to linux-next. Linus, Neil,
>> are you ok with this one? I think this is probably the simplest
>> approach.
>
> I think this patch contains an assumption which is not justified. It
> assumes that if a wait_event completes without error, then the wake_up()
> must have happened. I don't think that is correct.
>
> In the patch that caused the recent regression, the race described
> involved a signal arriving just as __locks_wake_up_blocks() was being
> called on another thread.
> So the waiting process was woken by a signal *after* ->fl_blocker was set
> to NULL, and *before* the wake_up(). If wait_event_interruptible()
> finds that the condition is true, it will report success whether there
> was a signal or not.
Neil and Jeff, Hi,

But after this, in a function like flock_lock_inode_wait we will loop
around and call flock_lock_inode again, and flock_lock_inode may return
-ENOMEM/-ENOENT/-EAGAIN/0.

- 0: If this was a lock request, it means we have called
locks_move_blocks, so fl->fl_blocked_requests will be empty and there is
no need to wake anyone up. If it was an unlock, no one can be waiting on
us, so no wakeup is needed either.

- -ENOENT: means we are doing an unlock, so no one is waiting on us and
no wakeup is needed.

- -ENOMEM: since the last time we went through flock_lock_inode someone
may have started waiting on us, so for this error we do need to wake
them up.

- -EAGAIN: since we have already gone through flock_lock_inode before,
this should never happen because FL_SLEEP will not be lost.

So the assumption may be ok, and for some error cases we need to wake up
anyone who may have been waiting on us (the reason for the patch "cifs:
call locks_delete_block for all error case in cifs_posix_lock_set").
Please point out if I am wrong!
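
For reference, the loop being discussed has roughly this shape (paraphrased
from the diffs quoted earlier in the thread; the annotations only map the
cases above onto the exits, as a sketch):

	for (;;) {
		/* may return 0, -ENOENT, -ENOMEM, -EAGAIN or FILE_LOCK_DEFERRED */
		error = flock_lock_inode(inode, fl);
		if (error != FILE_LOCK_DEFERRED)
			break;	/* the exits the four cases above describe */
		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
		if (error)
			break;	/* -ERESTARTSYS: we may still be queued on a blocker */
	}
	/* the point in dispute: which of these exits really needs locks_delete_block()? */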


>
> If you skip the locks_delete_block() after a wait, you get exactly the
> same race as the optimization - which only skipped most of
> locks_delete_block().
>
> I have a better solution. I did like your patch except that it changed
> too much code. So I revised it to change less code. See below.
>
> NeilBrown
>
> From: NeilBrown <[email protected]>
> Date: Wed, 11 Mar 2020 07:39:04 +1100
> Subject: [PATCH] locks: restore locks_delete_lock optimization
>
> A recent patch (see Fixes: below) removed an optimization which is
> important as it avoids taking a lock in a common case.
>
> The comment justifying the optimisation was correct as far as it went,
> in that if the tests succeeded, then the values would remain stable and
> the test result will remain valid even without a lock.
>
> However after the test succeeds the lock can be freed while some other
> thread might have only just set ->blocker to NULL (thus allowing the
> test to succeed) but has not yet called wake_up() on the wq in the lock.
> If the wake_up happens after the lock is freed, a use-after-free error
> occurs.
>
> This patch restores the optimization and reorders code to avoid the
> use-after-free. Specifically we move the list_del_init on
> fl_blocked_member to *after* the wake_up(), and add an extra test on
> fl_block_member() to locks_delete_lock() before deciding to avoid taking
> the spinlock.
>
> As this involves breaking code out of __locks_delete_block(), we discard
> the function completely and open-code it in the two places it was
> called.
>
> These lockless accesses do not require any memory barriers. The failure
> mode from possible memory access reordering is that the test at the top
> of locks_delete_lock() will fail, and in that case we fall through into
> the locked region which provides sufficient memory barriers implicitly.
>
> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> Signed-off-by: NeilBrown <[email protected]>
> ---
> fs/locks.c | 42 ++++++++++++++++++++++++++++--------------
> 1 file changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..dc99ab2262ea 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -716,18 +716,6 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
> hash_del(&waiter->fl_link);
> }
>
> -/* Remove waiter from blocker's block list.
> - * When blocker ends up pointing to itself then the list is empty.
> - *
> - * Must be called with blocked_lock_lock held.
> - */
> -static void __locks_delete_block(struct file_lock *waiter)
> -{
> - locks_delete_global_blocked(waiter);
> - list_del_init(&waiter->fl_blocked_member);
> - waiter->fl_blocker = NULL;
> -}
> -
> static void __locks_wake_up_blocks(struct file_lock *blocker)
> {
> while (!list_empty(&blocker->fl_blocked_requests)) {
> @@ -735,11 +723,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>
> waiter = list_first_entry(&blocker->fl_blocked_requests,
> struct file_lock, fl_blocked_member);
> - __locks_delete_block(waiter);
> + locks_delete_global_blocked(waiter);
> + waiter->fl_blocker = NULL;
> if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> waiter->fl_lmops->lm_notify(waiter);
> else
> wake_up(&waiter->fl_wait);
> + list_del_init(&waiter->fl_blocked_member);
> }
> }
>
> @@ -753,11 +743,35 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread
> + * "owns" the lock and is the only one that might try to claim
> + * the lock. So it is safe to test fl_blocker locklessly.
> + * Also if fl_blocker is NULL, this waiter is not listed on
> + * fl_blocked_requests for some lock, so no other request can
> + * be added to the list of fl_blocked_requests for this
> + * request. So if fl_blocker is NULL, it is safe to
> + * locklessly check if fl_blocked_requests is empty. If both
> + * of these checks succeed, there is no need to take the lock.
> + * We also check fl_blocked_member is empty. This is logically
> + * redundant with the test of fl_blocker, but it ensure that
> + * __locks_wake_up_blocks() has finished the wakeup and will not
> + * access the lock again, so it is safe to return and free.
> + * There is no need for any memory barriers with these lockless
> + * tests as is the reads happen before the corresponding writes are
> + * seen, we fall through to the locked code.
> + */
> + if (waiter->fl_blocker == NULL &&
> + list_empty(&waiter->fl_blocked_member) &&
> + list_empty(&waiter->fl_blocked_requests))
> + return status;
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;
> __locks_wake_up_blocks(waiter);
> - __locks_delete_block(waiter);
> + locks_delete_global_blocked(waiter);
> + list_del_init(&waiter->fl_blocked_member);
> + waiter->fl_blocker = NULL;
> spin_unlock(&blocked_lock_lock);
> return status;
> }
>

2020-03-11 12:53:16

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Wed, 2020-03-11 at 09:57 +0800, yangerkun wrote:

[snip]

>
> On 2020/3/11 5:01, NeilBrown wrote:
> >
> > I think this patch contains an assumption which is not justified. It
> > assumes that if a wait_event completes without error, then the wake_up()
> > must have happened. I don't think that is correct.
> >
> > In the patch that caused the recent regression, the race described
> > involved a signal arriving just as __locks_wake_up_blocks() was being
> > called on another thread.
> > So the waiting process was woken by a signal *after* ->fl_blocker was set
> > to NULL, and *before* the wake_up(). If wait_event_interruptible()
> > finds that the condition is true, it will report success whether there
> > was a signal or not.
> Neil and Jeff, Hi,
>
> But after this, like in flock_lock_inode_wait, we will go another
> flock_lock_inode. And the flock_lock_inode it may return
> -ENOMEM/-ENOENT/-EAGAIN/0.
>
> - 0: If there is a try lock, it means that we have call
> locks_move_blocks, and fl->fl_blocked_requests will be NULL, no need to
> wake up at all. If there is a unlock, no one call wait for me, no need
> to wake up too.
>
> - ENOENT: means we are doing unlock, no one will wait for me, no need to
> wake up.
>
> - ENOMEM: since last time we go through flock_lock_inode someone may
> wait for me, so for this error, we need to wake up them.
>
> - EAGAIN: since we has go through flock_lock_inode before, these may
> never happen because FL_SLEEP will not lose.
>
> So the assumption may be ok and for some error case we need to wake up
> someone may wait for me before(the reason for the patch "cifs: call
> locks_delete_block for all error case in cifs_posix_lock_set"). If I am
> wrong, please point out!
>
>

That's the basic dilemma. We need to know whether we'll need to delete
the block before taking the blocked_lock_lock.

Your most recent patch used the return code from the wait to determine
this, but that's not 100% reliable (as Neil pointed out). Could we try
to do this by doing the delete only when we get certain error codes?
Maybe, but that's a bit fragile-sounding.

Neil's most recent patch used presence on the fl_blocked_requests list
to determine whether to take the lock, but that relied on some very
subtle memory ordering. We could of course do that, but that's a bit
brittle too.

That's the main reason I'm leaning toward the patch Neil sent
originally and that uses the fl_wait.lock. The existing alternate lock
managers (nfsd and lockd) don't use fl_wait at all, so I don't think
doing that will cause any issues.

--
Jeff Layton <[email protected]>

2020-03-11 13:28:40

by yangerkun

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression



On 2020/3/11 20:52, Jeff Layton wrote:
> On Wed, 2020-03-11 at 09:57 +0800, yangerkun wrote:
>
> [snip]
>
>>
>> On 2020/3/11 5:01, NeilBrown wrote:
>>>
>>> I think this patch contains an assumption which is not justified. It
>>> assumes that if a wait_event completes without error, then the wake_up()
>>> must have happened. I don't think that is correct.
>>>
>>> In the patch that caused the recent regression, the race described
>>> involved a signal arriving just as __locks_wake_up_blocks() was being
>>> called on another thread.
>>> So the waiting process was woken by a signal *after* ->fl_blocker was set
>>> to NULL, and *before* the wake_up(). If wait_event_interruptible()
>>> finds that the condition is true, it will report success whether there
>>> was a signal or not.
>> Neil and Jeff, Hi,
>>
>> But after this, like in flock_lock_inode_wait, we will go another
>> flock_lock_inode. And the flock_lock_inode it may return
>> -ENOMEM/-ENOENT/-EAGAIN/0.
>>
>> - 0: If there is a try lock, it means that we have call
>> locks_move_blocks, and fl->fl_blocked_requests will be NULL, no need to
>> wake up at all. If there is a unlock, no one call wait for me, no need
>> to wake up too.
>>
>> - ENOENT: means we are doing unlock, no one will wait for me, no need to
>> wake up.
>>
>> - ENOMEM: since last time we go through flock_lock_inode someone may
>> wait for me, so for this error, we need to wake up them.
>>
>> - EAGAIN: since we has go through flock_lock_inode before, these may
>> never happen because FL_SLEEP will not lose.
>>
>> So the assumption may be ok and for some error case we need to wake up
>> someone may wait for me before(the reason for the patch "cifs: call
>> locks_delete_block for all error case in cifs_posix_lock_set"). If I am
>> wrong, please point out!
>>
>>
>
> That's the basic dilemma. We need to know whether we'll need to delete
> the block before taking the blocked_lock_lock.
>
> Your most recent patch used the return code from the wait to determine
> this, but that's not 100% reliable (as Neil pointed out). Could we try

I am a little confused; maybe I am wrong.

As Neil says: "If wait_event_interruptible() finds that the condition is
true, it will report success whether there was a signal or not." So
wait_event_interruptible may return 0 in this scenario? Then we will
loop and call flock_lock_inode again, and after we exit the loop with
error equal to 0 (for a lock request), the lock has already called
locks_move_blocks and left fl_blocked_requests empty?

> to do this by doing the delete only when we get certain error codes?
> Maybe, but that's a bit fragile-sounding.
>
> Neil's most recent patch used presence on the fl_blocked_requests list
> to determine whether to take the lock, but that relied on some very
> subtle memory ordering. We could of course do that, but that's a bit
> brittle too.
>
> That's the main reason I'm leaning toward the patch Neil sent
> originally and that uses the fl_wait.lock. The existing alternate lock
> managers (nfsd and lockd) don't use fl_wait at all, so I don't think
> doing that will cause any issues.
>

2020-03-11 22:16:19

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Wed, Mar 11 2020, yangerkun wrote:

> On 2020/3/11 5:01, NeilBrown wrote:
>> On Tue, Mar 10 2020, Jeff Layton wrote:
>>
>>> On Tue, 2020-03-10 at 08:52 -0400, Jeff Layton wrote:
>>>
>>> [snip]
>>>
>>>> On Tue, 2020-03-10 at 11:24 +0800, yangerkun wrote:
>>>>>>
>>>>> Something others. I think there is no need to call locks_delete_block
>>>>> for all case in function like flock_lock_inode_wait. What we should do
>>>>> as the patch '16306a61d3b7 ("fs/locks: always delete_block after
>>>>> waiting.")' describes is that we need call locks_delete_block not only
>>>>> for error equal to -ERESTARTSYS(please point out if I am wrong). And
>>>>> this patch may fix the regression too since simple lock that success or
>>>>> unlock will not try to acquire blocked_lock_lock.
>>>>>
>>>>>
>>>>
>>>> Nice! This looks like it would work too, and it's a simpler fix.
>>>>
>>>> I'd be inclined to add a WARN_ON_ONCE(fl->fl_blocker) after the if
>>>> statements to make sure we never exit with one still queued. Also, I
>>>> think we can do a similar optimization in __break_lease.
>>>>
>>>> There are some other callers of locks_delete_block:
>>>>
>>>> cifs_posix_lock_set: already only calls it in these cases
>>>>
>>>> nlmsvc_unlink_block: I think we need to call this in most cases, and
>>>> they're not going to be high-performance codepaths in general
>>>>
>>>> nfsd4 callback handling: Several calls here, most need to always be
>>>> called. find_blocked_lock could be reworked to take the
>>>> blocked_lock_lock only once (I'll do that in a separate patch).
>>>>
>>>> How about something like this (
>>>>
>>>> ----------------------8<---------------------
>>>>
>>>> From: yangerkun <[email protected]>
>>>>
>>>> [PATCH] filelock: fix regression in unlock performance
>>>>
>>>> '6d390e4b5d48 ("locks: fix a potential use-after-free problem when
>>>> wakeup a waiter")' introduces a regression since we will acquire
>>>> blocked_lock_lock every time locks_delete_block is called.
>>>>
>>>> In many cases we can just avoid calling locks_delete_block at all,
>>>> when we know that the wait was awoken by the condition becoming true.
>>>> Change several callers of locks_delete_block to only call it when
>>>> waking up due to signal or other error condition.
>>>>
>>>> [ jlayton: add similar optimization to __break_lease, reword changelog,
>>>> add WARN_ON_ONCE calls ]
>>>>
>>>> Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
>>>> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
>>>> Signed-off-by: yangerkun <[email protected]>
>>>> Signed-off-by: Jeff Layton <[email protected]>
>>>> ---
>>>> fs/locks.c | 29 ++++++++++++++++++++++-------
>>>> 1 file changed, 22 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/fs/locks.c b/fs/locks.c
>>>> index 426b55d333d5..b88a5b11c464 100644
>>>> --- a/fs/locks.c
>>>> +++ b/fs/locks.c
>>>> @@ -1354,7 +1354,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>>> if (error)
>>>> break;
>>>> }
>>>> - locks_delete_block(fl);
>>>> + if (error)
>>>> + locks_delete_block(fl);
>>>> + WARN_ON_ONCE(fl->fl_blocker);
>>>> +
>>>> return error;
>>>> }
>>>>
>>>> @@ -1447,7 +1450,9 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
>>>>
>>>> break;
>>>> }
>>>> - locks_delete_block(&fl);
>>>> + if (error)
>>>> + locks_delete_block(&fl);
>>>> + WARN_ON_ONCE(fl.fl_blocker);
>>>>
>>>> return error;
>>>> }
>>>> @@ -1638,23 +1643,28 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>>>>
>>>> locks_dispose_list(&dispose);
>>>> error = wait_event_interruptible_timeout(new_fl->fl_wait,
>>>> - !new_fl->fl_blocker, break_time);
>>>> + !new_fl->fl_blocker,
>>>> + break_time);
>>>>
>>>> percpu_down_read(&file_rwsem);
>>>> spin_lock(&ctx->flc_lock);
>>>> trace_break_lease_unblock(inode, new_fl);
>>>> - locks_delete_block(new_fl);
>>>> if (error >= 0) {
>>>> /*
>>>> * Wait for the next conflicting lease that has not been
>>>> * broken yet
>>>> */
>>>> - if (error == 0)
>>>> + if (error == 0) {
>>>> + locks_delete_block(new_fl);
>>>> time_out_leases(inode, &dispose);
>>>> + }
>>>> if (any_leases_conflict(inode, new_fl))
>>>> goto restart;
>>>> error = 0;
>>>> + } else {
>>>> + locks_delete_block(new_fl);
>>>> }
>>>> + WARN_ON_ONCE(fl->fl_blocker);
>>>> out:
>>>> spin_unlock(&ctx->flc_lock);
>>>> percpu_up_read(&file_rwsem);
>>>> @@ -2126,7 +2136,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>>> if (error)
>>>> break;
>>>> }
>>>> - locks_delete_block(fl);
>>>> + if (error)
>>>> + locks_delete_block(fl);
>>>> + WARN_ON_ONCE(fl->fl_blocker);
>>>> +
>>>> return error;
>>>> }
>>>>
>>>> @@ -2403,7 +2416,9 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>>>> if (error)
>>>> break;
>>>> }
>>>> - locks_delete_block(fl);
>>>> + if (error)
>>>> + locks_delete_block(fl);
>>>> + WARN_ON_ONCE(fl->fl_blocker);
>>>>
>>>> return error;
>>>> }
>>>
>>> I've gone ahead and added the above patch to linux-next. Linus, Neil,
>>> are you ok with this one? I think this is probably the simplest
>>> approach.
>>
>> I think this patch contains an assumption which is not justified. It
>> assumes that if a wait_event completes without error, then the wake_up()
>> must have happened. I don't think that is correct.
>>
>> In the patch that caused the recent regression, the race described
>> involved a signal arriving just as __locks_wake_up_blocks() was being
>> called on another thread.
>> So the waiting process was woken by a signal *after* ->fl_blocker was set
>> to NULL, and *before* the wake_up(). If wait_event_interruptible()
>> finds that the condition is true, it will report success whether there
>> was a signal or not.
> Neil and Jeff, Hi,
>
> But after this, like in flock_lock_inode_wait, we will go another
> flock_lock_inode. And the flock_lock_inode it may return
> -ENOMEM/-ENOENT/-EAGAIN/0.
>
> - 0: If there is a try lock, it means that we have call
> locks_move_blocks, and fl->fl_blocked_requests will be NULL, no need to
> wake up at all. If there is a unlock, no one call wait for me, no need
> to wake up too.
>
> - ENOENT: means we are doing unlock, no one will wait for me, no need to
> wake up.
>
> - ENOMEM: since last time we go through flock_lock_inode someone may
> wait for me, so for this error, we need to wake up them.
>
> - EAGAIN: since we has go through flock_lock_inode before, these may
> never happen because FL_SLEEP will not lose.
>
> So the assumption may be ok and for some error case we need to wake up
> someone may wait for me before(the reason for the patch "cifs: call
> locks_delete_block for all error case in cifs_posix_lock_set"). If I am
> wrong, please point out!
>

My original rewrite of this code did restrict the cases where
locks_delete_block() was called - but that didn't work.
See commit 16306a61d3b7 ("fs/locks: always delete_block after waiting.")

There may still be cases where we don't need to call
locks_delete_block(), but it is certainly safer - both now and after
possible future changes - to always call it.
If we can make it cheap to always call it - and I'm sure we can - then
that is the safest approach.

Thanks,
NeilBrown


>
>>
>> If you skip the locks_delete_block() after a wait, you get exactly the
>> same race as the optimization - which only skipped most of
>> locks_delete_block().
>>
>> I have a better solution. I did like your patch except that it changed
>> too much code. So I revised it to change less code. See below.
>>
>> NeilBrown
>>
>> From: NeilBrown <[email protected]>
>> Date: Wed, 11 Mar 2020 07:39:04 +1100
>> Subject: [PATCH] locks: restore locks_delete_lock optimization
>>
>> A recent patch (see Fixes: below) removed an optimization which is
>> important as it avoids taking a lock in a common case.
>>
>> The comment justifying the optimisation was correct as far as it went,
>> in that if the tests succeeded, then the values would remain stable and
>> the test result will remain valid even without a lock.
>>
>> However after the test succeeds the lock can be freed while some other
>> thread might have only just set ->blocker to NULL (thus allowing the
>> test to succeed) but has not yet called wake_up() on the wq in the lock.
>> If the wake_up happens after the lock is freed, a use-after-free error
>> occurs.
>>
>> This patch restores the optimization and reorders code to avoid the
>> use-after-free. Specifically we move the list_del_init on
>> fl_blocked_member to *after* the wake_up(), and add an extra test on
>> fl_blocked_member to locks_delete_block() before deciding to avoid taking
>> the spinlock.
>>
>> As this involves breaking code out of __locks_delete_block(), we discard
>> the function completely and open-code it in the two places it was
>> called.
>>
>> These lockless accesses do not require any memory barriers. The failure
>> mode from possible memory access reordering is that the test at the top
>> of locks_delete_block() will fail, and in that case we fall through into
>> the locked region which provides sufficient memory barriers implicitly.
>>
>> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
>> Signed-off-by: NeilBrown <[email protected]>
>> ---
>> fs/locks.c | 42 ++++++++++++++++++++++++++++--------------
>> 1 file changed, 28 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 426b55d333d5..dc99ab2262ea 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -716,18 +716,6 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
>> hash_del(&waiter->fl_link);
>> }
>>
>> -/* Remove waiter from blocker's block list.
>> - * When blocker ends up pointing to itself then the list is empty.
>> - *
>> - * Must be called with blocked_lock_lock held.
>> - */
>> -static void __locks_delete_block(struct file_lock *waiter)
>> -{
>> - locks_delete_global_blocked(waiter);
>> - list_del_init(&waiter->fl_blocked_member);
>> - waiter->fl_blocker = NULL;
>> -}
>> -
>> static void __locks_wake_up_blocks(struct file_lock *blocker)
>> {
>> while (!list_empty(&blocker->fl_blocked_requests)) {
>> @@ -735,11 +723,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>>
>> waiter = list_first_entry(&blocker->fl_blocked_requests,
>> struct file_lock, fl_blocked_member);
>> - __locks_delete_block(waiter);
>> + locks_delete_global_blocked(waiter);
>> + waiter->fl_blocker = NULL;
>> if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>> waiter->fl_lmops->lm_notify(waiter);
>> else
>> wake_up(&waiter->fl_wait);
>> + list_del_init(&waiter->fl_blocked_member);
>> }
>> }
>>
>> @@ -753,11 +743,35 @@ int locks_delete_block(struct file_lock *waiter)
>> {
>> int status = -ENOENT;
>>
>> + /*
>> + * If fl_blocker is NULL, it won't be set again as this thread
>> + * "owns" the lock and is the only one that might try to claim
>> + * the lock. So it is safe to test fl_blocker locklessly.
>> + * Also if fl_blocker is NULL, this waiter is not listed on
>> + * fl_blocked_requests for some lock, so no other request can
>> + * be added to the list of fl_blocked_requests for this
>> + * request. So if fl_blocker is NULL, it is safe to
>> + * locklessly check if fl_blocked_requests is empty. If both
>> + * of these checks succeed, there is no need to take the lock.
>> + * We also check fl_blocked_member is empty. This is logically
>> + * redundant with the test of fl_blocker, but it ensures that
>> + * __locks_wake_up_blocks() has finished the wakeup and will not
>> + * access the lock again, so it is safe to return and free.
>> + * There is no need for any memory barriers with these lockless
>> + * tests: if the reads happen before the corresponding writes are
>> + * seen, we fall through to the locked code.
>> + */
>> + if (waiter->fl_blocker == NULL &&
>> + list_empty(&waiter->fl_blocked_member) &&
>> + list_empty(&waiter->fl_blocked_requests))
>> + return status;
>> spin_lock(&blocked_lock_lock);
>> if (waiter->fl_blocker)
>> status = 0;
>> __locks_wake_up_blocks(waiter);
>> - __locks_delete_block(waiter);
>> + locks_delete_global_blocked(waiter);
>> + list_del_init(&waiter->fl_blocked_member);
>> + waiter->fl_blocker = NULL;
>> spin_unlock(&blocked_lock_lock);
>> return status;
>> }
>>



2020-03-11 22:24:21

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, Mar 10 2020, Linus Torvalds wrote:

> On Tue, Mar 10, 2020 at 3:07 PM Jeff Layton <[email protected]> wrote:
>>
>> Given that, and the fact that Neil pointed out that yangerkun's latest
>> patch would reintroduce the original race, I'm leaning back toward the
>> patch Neil sent yesterday. It relies solely on spinlocks, and so doesn't
>> have the subtle memory-ordering requirements of the others.
>
> It has subtle locking changes, though.
>
> It now calls the "->lm_notify()" callback with the wait queue spinlock held.
>
> is that ok? It's not obvious. Those functions take other spinlocks,
> and wake up other things. See for example nlmsvc_notify_blocked()..
> Yes, it was called under the blocked_lock_lock spinlock before too,
> but now there's an _additional_ spinlock, and it must not call
> "wake_up(&waiter->fl_wait))" in the callback, for example, because it
> already holds the lock on that wait queue.
>
> Maybe that is never done. I don't know the callbacks.
>
> I was really hoping that the simple memory ordering of using that
> smp_store_release -> smp_load_acquire using fl_blocker would be
> sufficient. That's a particularly simple and efficient ordering.
>
> Oh well. If you want to go that spinlock way, it needs to document why
> it's safe to do a callback under it.
>
> Linus

I've learn recently to dislike calling callbacks while holding a lock.
I don't think the current callbacks care, but the requirement imposes a
burden on future callbacks too.

We can combine the two ideas - move the list_del_init() later, and still
protect it with the wq locks. This avoids holding the lock across the
callback, but provides clear atomicity guarantees.

NeilBrown

From: NeilBrown <[email protected]>
Subject: [PATCH] locks: restore locks_delete_lock optimization

A recent patch (see Fixes: below) removed an optimization which is
important as it avoids taking a lock in a common case.

The comment justifying the optimisation was correct as far as it went,
in that if the tests succeeded, then the values would remain stable and
the test result will remain valid even without a lock.

However after the test succeeds the lock can be freed while some other
thread might have only just set ->blocker to NULL (thus allowing the
test to succeed) but has not yet called wake_up() on the wq in the lock.
If the wake_up happens after the lock is freed, a use-after-free error occurs.

This patch restores the optimization and reorders code to avoid the
use-after-free. Specifically we move the list_del_init on
fl_blocked_member to *after* the wake_up(), and add an extra test on
fl_blocked_member to locks_delete_block() before deciding to avoid taking
the spinlock.

To ensure correct ordering for the list_empty() test and the
list_del_init() call, we protect them both with the wq spinlock. This
provides required atomicity, while scaling much better than taking the
global blocked_lock_lock.

Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
Signed-off-by: NeilBrown <[email protected]>
---
fs/locks.c | 46 ++++++++++++++++++++++++++++++++++++++--------
1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..16098a209d63 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -721,11 +721,19 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
*
* Must be called with blocked_lock_lock held.
*/
-static void __locks_delete_block(struct file_lock *waiter)
+static void __locks_delete_block(struct file_lock *waiter, bool notify)
{
locks_delete_global_blocked(waiter);
- list_del_init(&waiter->fl_blocked_member);
waiter->fl_blocker = NULL;
+ if (notify) {
+ if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
+ waiter->fl_lmops->lm_notify(waiter);
+ else
+ wake_up(&waiter->fl_wait);
+ }
+ spin_lock(&waiter->fl_wait.lock);
+ list_del_init(&waiter->fl_blocked_member);
+ spin_unlock(&waiter->fl_wait.lock);
}

static void __locks_wake_up_blocks(struct file_lock *blocker)
@@ -735,11 +743,7 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)

waiter = list_first_entry(&blocker->fl_blocked_requests,
struct file_lock, fl_blocked_member);
- __locks_delete_block(waiter);
- if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
- waiter->fl_lmops->lm_notify(waiter);
- else
- wake_up(&waiter->fl_wait);
+ __locks_delete_block(waiter, true);
}
}

@@ -753,11 +757,37 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;

+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread
+ * "owns" the lock and is the only one that might try to claim
+ * the lock. So it is safe to test fl_blocker locklessly.
+ * Also if fl_blocker is NULL, this waiter is not listed on
+ * fl_blocked_requests for some lock, so no other request can
+ * be added to the list of fl_blocked_requests for this
+ * request. So if fl_blocker is NULL, it is safe to
+ * locklessly check if fl_blocked_requests is empty. If both
+ * of these checks succeed, there is no need to take the lock.
+ * We also check fl_blocked_member is empty under the fl_wait.lock.
+ * If this fails, __locks_delete_block() must still be notifying
+ * waiters, so it isn't yet safe to return and free the file_lock.
+ * Doing this under fl_wait.lock allows significantly better scaling
+ * than unconditionally taking blocked_lock_lock.
+ */
+ if (waiter->fl_blocker == NULL &&
+ list_empty(&waiter->fl_blocked_requests)) {
+ spin_lock(&waiter->fl_wait.lock);
+ if (list_empty(&waiter->fl_blocked_member)) {
+ spin_unlock(&waiter->fl_wait.lock);
+ return status;
+ }
+ /* Notification is still happening */
+ spin_unlock(&waiter->fl_wait.lock);
+ }
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;
__locks_wake_up_blocks(waiter);
- __locks_delete_block(waiter);
+ __locks_delete_block(waiter, false);
spin_unlock(&blocked_lock_lock);
return status;
}
--
2.25.1



2020-03-12 00:40:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Wed, Mar 11, 2020 at 3:22 PM NeilBrown <[email protected]> wrote:
>
> We can combine the two ideas - move the list_del_init() later, and still
> protect it with the wq locks. This avoids holding the lock across the
> callback, but provides clear atomicity guarantees.

Ugfh. Honestly, this is disgusting.

Now you re-take the same lock in immediate succession for the
non-callback case. It's just hidden.

And it's not like the list_del_init() _needs_ the lock (it's not
currently called with the lock held).

So that "hold the lock over list_del_init()" seems to be horrendously
bogus. It's only done as a serialization thing for that optimistic
case.

And that optimistic case doesn't even *want* that kind of
serialization. It really just wants a "I'm done" flag.

So no. Don't do this. It's mis-using the lock in several ways.

Linus

2020-03-12 04:43:37

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Wed, Mar 11 2020, Linus Torvalds wrote:

> On Wed, Mar 11, 2020 at 3:22 PM NeilBrown <[email protected]> wrote:
>>
>> We can combine the two ideas - move the list_del_init() later, and still
>> protect it with the wq locks. This avoids holding the lock across the
>> callback, but provides clear atomicity guarantees.
>
> Ugfh. Honestly, this is disgusting.
>
> Now you re-take the same lock in immediate succession for the
> non-callback case. It's just hidden.
>
> And it's not like the list_del_init() _needs_ the lock (it's not
> currently called with the lock held).
>
> So that "hold the lock over list_del_init()" seems to be horrendously
> bogus. It's only done as a serialization thing for that optimistic
> case.
>
> And that optimistic case doesn't even *want* that kind of
> serialization. It really just wants a "I'm done" flag.
>
> So no. Don't do this. It's mis-using the lock in several ways.
>
> Linus

It seems that test_and_set_bit_lock() is the preferred way to handle
flags when memory ordering is important, and I can't see how to use that
well with an "I'm done" flag. I can make it look OK with a "I'm
detaching" flag. Maybe this is better.

NeilBrown

From f46db25f328ddf37ca9fbd390c6eb5f50c4bd2e6 Mon Sep 17 00:00:00 2001
From: NeilBrown <[email protected]>
Date: Wed, 11 Mar 2020 07:39:04 +1100
Subject: [PATCH] locks: restore locks_delete_lock optimization

A recent patch (see Fixes: below) removed an optimization which is
important as it avoids taking a lock in a common case.

The comment justifying the optimisation was correct as far as it went,
in that if the tests succeeded, then the values would remain stable and
the test result will remain valid even without a lock.

However after the test succeeds the lock can be freed while some other
thread might have only just set ->blocker to NULL (thus allowing the
test to succeed) but has not yet called wake_up() on the wq in the lock.
If the wake_up happens after the lock is freed, a use-after-free error occurs.

This patch restores the optimization and adds a flag to ensure this
use-after-free is avoided. The use happens only when the flag is set, and
the free doesn't happen until the flag has been cleared, or we have
taken blocked_lock_lock.

Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
Signed-off-by: NeilBrown <[email protected]>
---
fs/locks.c | 44 ++++++++++++++++++++++++++++++++++++++------
include/linux/fs.h | 3 ++-
2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..334473004c6c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -283,7 +283,7 @@ locks_dump_ctx_list(struct list_head *list, char *list_type)
struct file_lock *fl;

list_for_each_entry(fl, list, fl_list) {
- pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
+ pr_warn("%s: fl_owner=%p fl_flags=0x%lx fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
}
}

@@ -314,7 +314,7 @@ locks_check_ctx_file_list(struct file *filp, struct list_head *list,
list_for_each_entry(fl, list, fl_list)
if (fl->fl_file == filp)
pr_warn("Leaked %s lock on dev=0x%x:0x%x ino=0x%lx "
- " fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n",
+ " fl_owner=%p fl_flags=0x%lx fl_type=0x%x fl_pid=%u\n",
list_type, MAJOR(inode->i_sb->s_dev),
MINOR(inode->i_sb->s_dev), inode->i_ino,
fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
@@ -736,10 +736,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
waiter = list_first_entry(&blocker->fl_blocked_requests,
struct file_lock, fl_blocked_member);
__locks_delete_block(waiter);
- if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
- waiter->fl_lmops->lm_notify(waiter);
- else
- wake_up(&waiter->fl_wait);
+ if (!test_and_set_bit_lock(FL_DELETING, &waiter->fl_flags)) {
+ if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
+ waiter->fl_lmops->lm_notify(waiter);
+ else
+ wake_up(&waiter->fl_wait);
+ clear_bit_unlock(FL_DELETING, &waiter->fl_flags);
+ }
}
}

@@ -753,11 +756,40 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;

+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread
+ * "owns" the lock and is the only one that might try to claim
+ * the lock. So it is safe to test fl_blocker locklessly.
+ * Also if fl_blocker is NULL, this waiter is not listed on
+ * fl_blocked_requests for some lock, so no other request can
+ * be added to the list of fl_blocked_requests for this
+ * request. So if fl_blocker is NULL, it is safe to
+ * locklessly check if fl_blocked_requests is empty. If both
+ * of these checks succeed, there is no need to take the lock.
+ *
+ * We perform these checks only if we can set FL_DELETING.
+ * This ensures that we don't race with __locks_wake_up_blocks()
+ * in a way which leads it to call wake_up() *after* we return
+ * and the file_lock is freed.
+ */
+ if (!test_and_set_bit_lock(FL_DELETING, &waiter->fl_flags)) {
+ if (waiter->fl_blocker == NULL &&
+ list_empty(&waiter->fl_blocked_requests)) {
+ /* Already fully unlinked */
+ clear_bit_unlock(FL_DELETING, &waiter->fl_flags);
+ return status;
+ }
+ }
+
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;
__locks_wake_up_blocks(waiter);
__locks_delete_block(waiter);
+ /* This flag might not be set and it is largely irrelevant
+ * now, but it seems cleaner to clear it.
+ */
+ clear_bit(FL_DELETING, &waiter->fl_flags);
spin_unlock(&blocked_lock_lock);
return status;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3cd4fe6b845e..4db514f29bca 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1012,6 +1012,7 @@ static inline struct file *get_file(struct file *f)
#define FL_UNLOCK_PENDING 512 /* Lease is being broken */
#define FL_OFDLCK 1024 /* lock is "owned" by struct file */
#define FL_LAYOUT 2048 /* outstanding pNFS layout */
+#define FL_DELETING 32768 /* lock is being disconnected */

#define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)

@@ -1087,7 +1088,7 @@ struct file_lock {
* ->fl_blocker->fl_blocked_requests
*/
fl_owner_t fl_owner;
- unsigned int fl_flags;
+ unsigned long fl_flags;
unsigned char fl_type;
unsigned int fl_pid;
int fl_link_cpu; /* what cpu's list is this on? */
--
2.25.1



2020-03-12 12:33:21

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Thu, 2020-03-12 at 15:42 +1100, NeilBrown wrote:
> On Wed, Mar 11 2020, Linus Torvalds wrote:
>
> > On Wed, Mar 11, 2020 at 3:22 PM NeilBrown <[email protected]> wrote:
> > > We can combine the two ideas - move the list_del_init() later, and still
> > > protect it with the wq locks. This avoids holding the lock across the
> > > callback, but provides clear atomicity guarantees.
> >
> > Ugfh. Honestly, this is disgusting.
> >
> > Now you re-take the same lock in immediate succession for the
> > non-callback case. It's just hidden.
> >
> > And it's not like the list_del_init() _needs_ the lock (it's not
> > currently called with the lock held).
> >
> > So that "hold the lock over list_del_init()" seems to be horrendously
> > bogus. It's only done as a serialization thing for that optimistic
> > case.
> >
> > And that optimistic case doesn't even *want* that kind of
> > serialization. It really just wants a "I'm done" flag.
> >
> > So no. Don't do this. It's mis-using the lock in several ways.
> >
> > Linus
>
> It seems that test_and_set_bit_lock() is the preferred way to handle
> flags when memory ordering is important, and I can't see how to use that
> well with an "I'm done" flag. I can make it look OK with a "I'm
> detaching" flag. Maybe this is better.
>
> NeilBrown
>
> From f46db25f328ddf37ca9fbd390c6eb5f50c4bd2e6 Mon Sep 17 00:00:00 2001
> From: NeilBrown <[email protected]>
> Date: Wed, 11 Mar 2020 07:39:04 +1100
> Subject: [PATCH] locks: restore locks_delete_lock optimization
>
> A recent patch (see Fixes: below) removed an optimization which is
> important as it avoids taking a lock in a common case.
>
> The comment justifying the optimisation was correct as far as it went,
> in that if the tests succeeded, then the values would remain stable and
> the test result will remain valid even without a lock.
>
> However after the test succeeds the lock can be freed while some other
> thread might have only just set ->blocker to NULL (thus allowing the
> test to succeed) but has not yet called wake_up() on the wq in the lock.
> If the wake_up happens after the lock is freed, a use-after-free error occurs.
>
> This patch restores the optimization and adds a flag to ensure this
> use-after-free is avoid. The use happens only when the flag is set, and
> the free doesn't happen until the flag has been cleared, or we have
> taken blocked_lock_lock.
>
> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> Signed-off-by: NeilBrown <[email protected]>
> ---
> fs/locks.c | 44 ++++++++++++++++++++++++++++++++++++++------
> include/linux/fs.h | 3 ++-
> 2 files changed, 40 insertions(+), 7 deletions(-)
>

Just a note that I'm traveling at the moment, and won't be able do much
other than comment on this for a few days.

> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..334473004c6c 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -283,7 +283,7 @@ locks_dump_ctx_list(struct list_head *list, char *list_type)
> struct file_lock *fl;
>
> list_for_each_entry(fl, list, fl_list) {
> - pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
> + pr_warn("%s: fl_owner=%p fl_flags=0x%lx fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
> }
> }
>
> @@ -314,7 +314,7 @@ locks_check_ctx_file_list(struct file *filp, struct list_head *list,
> list_for_each_entry(fl, list, fl_list)
> if (fl->fl_file == filp)
> pr_warn("Leaked %s lock on dev=0x%x:0x%x ino=0x%lx "
> - " fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n",
> + " fl_owner=%p fl_flags=0x%lx fl_type=0x%x fl_pid=%u\n",
> list_type, MAJOR(inode->i_sb->s_dev),
> MINOR(inode->i_sb->s_dev), inode->i_ino,
> fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
> @@ -736,10 +736,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> waiter = list_first_entry(&blocker->fl_blocked_requests,
> struct file_lock, fl_blocked_member);
> __locks_delete_block(waiter);
> - if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> - waiter->fl_lmops->lm_notify(waiter);
> - else
> - wake_up(&waiter->fl_wait);
> + if (!test_and_set_bit_lock(FL_DELETING, &waiter->fl_flags)) {
> + if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> + waiter->fl_lmops->lm_notify(waiter);
> + else
> + wake_up(&waiter->fl_wait);
> + clear_bit_unlock(FL_DELETING, &waiter->fl_flags);
> + }

I *think* this is probably safe.

AIUI, when you use atomic bitops on a flag word like this, you should
use them for all modifications to ensure that your changes don't get
clobbered by another task racing in to do a read/modify/write cycle on
the same word.

I haven't gone over all of the places where fl_flags is changed, but I
don't see any at first glance that do it on a blocked request.
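
A hypothetical illustration of the clobbering hazard described above
(the flags word and bit numbers below are made up, not fs/locks.c
code):

	unsigned long flags = 0;

	/* CPU 0: atomic, locked update of bit 0 */
	if (!test_and_set_bit_lock(0, &flags)) {
		/* ... work done while "owning" bit 0 ... */
		clear_bit_unlock(0, &flags);
	}

	/* CPU 1: plain, non-atomic read-modify-write of the same word.
	 * The load may happen before CPU 0's update becomes visible, and
	 * the store then writes back the stale value, silently clearing
	 * the bit CPU 0 just set.
	 */
	flags |= (1UL << 1);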

> }
> }
>
> @@ -753,11 +756,40 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread
> + * "owns" the lock and is the only one that might try to claim
> + * the lock. So it is safe to test fl_blocker locklessly.
> + * Also if fl_blocker is NULL, this waiter is not listed on
> + * fl_blocked_requests for some lock, so no other request can
> + * be added to the list of fl_blocked_requests for this
> + * request. So if fl_blocker is NULL, it is safe to
> + * locklessly check if fl_blocked_requests is empty. If both
> + * of these checks succeed, there is no need to take the lock.
> + *
> + * We perform these checks only if we can set FL_DELETING.
> + * This ensure that we don't race with __locks_wake_up_blocks()
> + * in a way which leads it to call wake_up() *after* we return
> + * and the file_lock is freed.
> + */
> + if (!test_and_set_bit_lock(FL_DELETING, &waiter->fl_flags)) {
> + if (waiter->fl_blocker == NULL &&
> + list_empty(&waiter->fl_blocked_requests)) {
> + /* Already fully unlinked */
> + clear_bit_unlock(FL_DELETING, &waiter->fl_flags);
> + return status;
> + }
> + }
> +
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;
> __locks_wake_up_blocks(waiter);
> __locks_delete_block(waiter);
> + /* This flag might not be set and it is largely irrelevant
> + * now, but it seem cleaner to clear it.
> + */
> + clear_bit(FL_DELETING, &waiter->fl_flags);
> spin_unlock(&blocked_lock_lock);
> return status;
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3cd4fe6b845e..4db514f29bca 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1012,6 +1012,7 @@ static inline struct file *get_file(struct file *f)
> #define FL_UNLOCK_PENDING 512 /* Lease is being broken */
> #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
> #define FL_LAYOUT 2048 /* outstanding pNFS layout */
> +#define FL_DELETING 32768 /* lock is being disconnected */

nit: Why the big gap?

>
> #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
>
> @@ -1087,7 +1088,7 @@ struct file_lock {
> * ->fl_blocker->fl_blocked_requests
> */
> fl_owner_t fl_owner;
> - unsigned int fl_flags;
> + unsigned long fl_flags;

This will break kABI, so backporting this to enterprise distro kernels
won't be trivial. Not a showstopper, but it might be nice to avoid that
if we can.

While it's not quite as efficient, we could just do the FL_DELETING
manipulation under the flc->flc_lock. That's per-inode, so it should be
safe to do it that way.
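
A hypothetical sketch of that alternative, just to make the idea
concrete. It assumes every path that modifies fl_flags on a blocked
request takes flc_lock (which is exactly what would need auditing), and
"ctx" stands for the inode's file_lock_context; since no atomic bitops
are involved, fl_flags can stay an unsigned int:

	spin_lock(&ctx->flc_lock);
	fl->fl_flags |= FL_DELETING;	/* plain RMW, serialized by flc_lock */
	spin_unlock(&ctx->flc_lock);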

> unsigned char fl_type;
> unsigned int fl_pid;
> int fl_link_cpu; /* what cpu's list is this on? */

2020-03-12 16:08:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Wed, Mar 11, 2020 at 9:42 PM NeilBrown <[email protected]> wrote:
>
> It seems that test_and_set_bit_lock() is the preferred way to handle
> flags when memory ordering is important

That looks better.

The _preferred_ way is actually the one I already posted: do a
"smp_store_release()" to store the flag (like a NULL pointer), and a
smp_load_acquire() to load it.

That's basically optimal on most architectures (all modern ones -
there are bad architectures from before people figured out that
release/acquire is better than separate memory barriers), not needing
any atomics and only minimal memory ordering.

I wonder if a special flags value (keeping it "unsigned int" to avoid
the issue Jeff pointed out) might be acceptable?

IOW, could we do just

smp_store_release(&waiter->fl_flags, FL_RELEASED);

to say that we're done with the lock? Or do people still look at and
depend on the flag values at that point?
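
A hypothetical sketch of that idea (FL_RELEASED is a made-up sentinel,
not an existing flag; the catch, as noted further down the thread, is
that nlmsvc_grant_block() still looks at fl_flags after waking):

	#define FL_RELEASED	0x10000	/* hypothetical "waiter fully detached" */

	/* waker, after the wake_up(): overwrite fl_flags with the sentinel */
	smp_store_release(&waiter->fl_flags, FL_RELEASED);

	/* waiter, fast path: if the acquire load sees the sentinel, the
	 * wakeup has completed and the file_lock may be freed safely
	 */
	if (smp_load_acquire(&waiter->fl_flags) == FL_RELEASED)
		return status;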

Linus

2020-03-12 22:20:52

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Thu, Mar 12 2020, Jeff Layton wrote:

> On Thu, 2020-03-12 at 15:42 +1100, NeilBrown wrote:
>> On Wed, Mar 11 2020, Linus Torvalds wrote:
>>
>> > On Wed, Mar 11, 2020 at 3:22 PM NeilBrown <[email protected]> wrote:
>> > > We can combine the two ideas - move the list_del_init() later, and still
>> > > protect it with the wq locks. This avoids holding the lock across the
>> > > callback, but provides clear atomicity guarantees.
>> >
>> > Ugfh. Honestly, this is disgusting.
>> >
>> > Now you re-take the same lock in immediate succession for the
>> > non-callback case. It's just hidden.
>> >
>> > And it's not like the list_del_init() _needs_ the lock (it's not
>> > currently called with the lock held).
>> >
>> > So that "hold the lock over list_del_init()" seems to be horrendously
>> > bogus. It's only done as a serialization thing for that optimistic
>> > case.
>> >
>> > And that optimistic case doesn't even *want* that kind of
>> > serialization. It really just wants a "I'm done" flag.
>> >
>> > So no. Don't do this. It's mis-using the lock in several ways.
>> >
>> > Linus
>>
>> It seems that test_and_set_bit_lock() is the preferred way to handle
>> flags when memory ordering is important, and I can't see how to use that
>> well with an "I'm done" flag. I can make it look OK with a "I'm
>> detaching" flag. Maybe this is better.
>>
>> NeilBrown
>>
>> From f46db25f328ddf37ca9fbd390c6eb5f50c4bd2e6 Mon Sep 17 00:00:00 2001
>> From: NeilBrown <[email protected]>
>> Date: Wed, 11 Mar 2020 07:39:04 +1100
>> Subject: [PATCH] locks: restore locks_delete_lock optimization
>>
>> A recent patch (see Fixes: below) removed an optimization which is
>> important as it avoids taking a lock in a common case.
>>
>> The comment justifying the optimisation was correct as far as it went,
>> in that if the tests succeeded, then the values would remain stable and
>> the test result will remain valid even without a lock.
>>
>> However after the test succeeds the lock can be freed while some other
>> thread might have only just set ->blocker to NULL (thus allowing the
>> test to succeed) but has not yet called wake_up() on the wq in the lock.
>> If the wake_up happens after the lock is freed, a use-after-free error occurs.
>>
>> This patch restores the optimization and adds a flag to ensure this
>> use-after-free is avoid. The use happens only when the flag is set, and
>> the free doesn't happen until the flag has been cleared, or we have
>> taken blocked_lock_lock.
>>
>> Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
>> Signed-off-by: NeilBrown <[email protected]>
>> ---
>> fs/locks.c | 44 ++++++++++++++++++++++++++++++++++++++------
>> include/linux/fs.h | 3 ++-
>> 2 files changed, 40 insertions(+), 7 deletions(-)
>>
>
> Just a note that I'm traveling at the moment, and won't be able do much
> other than comment on this for a few days.
>
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 426b55d333d5..334473004c6c 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -283,7 +283,7 @@ locks_dump_ctx_list(struct list_head *list, char *list_type)
>> struct file_lock *fl;
>>
>> list_for_each_entry(fl, list, fl_list) {
>> - pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
>> + pr_warn("%s: fl_owner=%p fl_flags=0x%lx fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
>> }
>> }
>>
>> @@ -314,7 +314,7 @@ locks_check_ctx_file_list(struct file *filp, struct list_head *list,
>> list_for_each_entry(fl, list, fl_list)
>> if (fl->fl_file == filp)
>> pr_warn("Leaked %s lock on dev=0x%x:0x%x ino=0x%lx "
>> - " fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n",
>> + " fl_owner=%p fl_flags=0x%lx fl_type=0x%x fl_pid=%u\n",
>> list_type, MAJOR(inode->i_sb->s_dev),
>> MINOR(inode->i_sb->s_dev), inode->i_ino,
>> fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
>> @@ -736,10 +736,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
>> waiter = list_first_entry(&blocker->fl_blocked_requests,
>> struct file_lock, fl_blocked_member);
>> __locks_delete_block(waiter);
>> - if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>> - waiter->fl_lmops->lm_notify(waiter);
>> - else
>> - wake_up(&waiter->fl_wait);
>> + if (!test_and_set_bit_lock(FL_DELETING, &waiter->fl_flags)) {
>> + if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
>> + waiter->fl_lmops->lm_notify(waiter);
>> + else
>> + wake_up(&waiter->fl_wait);
>> + clear_bit_unlock(FL_DELETING, &waiter->fl_flags);
>> + }
>
> I *think* this is probably safe.
>
> AIUI, when you use atomic bitops on a flag word like this, you should
> use them for all modifications to ensure that your changes don't get
> clobbered by another task racing in to do a read/modify/write cycle on
> the same word.
>
> I haven't gone over all of the places where fl_flags is changed, but I
> don't see any at first glance that do it on a blocked request.
>
>> }
>> }
>>
>> @@ -753,11 +756,40 @@ int locks_delete_block(struct file_lock *waiter)
>> {
>> int status = -ENOENT;
>>
>> + /*
>> + * If fl_blocker is NULL, it won't be set again as this thread
>> + * "owns" the lock and is the only one that might try to claim
>> + * the lock. So it is safe to test fl_blocker locklessly.
>> + * Also if fl_blocker is NULL, this waiter is not listed on
>> + * fl_blocked_requests for some lock, so no other request can
>> + * be added to the list of fl_blocked_requests for this
>> + * request. So if fl_blocker is NULL, it is safe to
>> + * locklessly check if fl_blocked_requests is empty. If both
>> + * of these checks succeed, there is no need to take the lock.
>> + *
>> + * We perform these checks only if we can set FL_DELETING.
>> + * This ensure that we don't race with __locks_wake_up_blocks()
>> + * in a way which leads it to call wake_up() *after* we return
>> + * and the file_lock is freed.
>> + */
>> + if (!test_and_set_bit_lock(FL_DELETING, &waiter->fl_flags)) {
>> + if (waiter->fl_blocker == NULL &&
>> + list_empty(&waiter->fl_blocked_requests)) {
>> + /* Already fully unlinked */
>> + clear_bit_unlock(FL_DELETING, &waiter->fl_flags);
>> + return status;
>> + }
>> + }
>> +
>> spin_lock(&blocked_lock_lock);
>> if (waiter->fl_blocker)
>> status = 0;
>> __locks_wake_up_blocks(waiter);
>> __locks_delete_block(waiter);
>> + /* This flag might not be set and it is largely irrelevant
>> + * now, but it seem cleaner to clear it.
>> + */
>> + clear_bit(FL_DELETING, &waiter->fl_flags);
>> spin_unlock(&blocked_lock_lock);
>> return status;
>> }
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 3cd4fe6b845e..4db514f29bca 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1012,6 +1012,7 @@ static inline struct file *get_file(struct file *f)
>> #define FL_UNLOCK_PENDING 512 /* Lease is being broken */
>> #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
>> #define FL_LAYOUT 2048 /* outstanding pNFS layout */
>> +#define FL_DELETING 32768 /* lock is being disconnected */
>
> nit: Why the big gap?

No good reason - it seems like a conceptually different sort of flag so
I vaguely felt that it would help if it were numerically separate.

>
>>
>> #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
>>
>> @@ -1087,7 +1088,7 @@ struct file_lock {
>> * ->fl_blocker->fl_blocked_requests
>> */
>> fl_owner_t fl_owner;
>> - unsigned int fl_flags;
>> + unsigned long fl_flags;
>
> This will break kABI, so backporting this to enterprise distro kernels
> won't be trivial. Not a showstopper, but it might be nice to avoid that
> if we can.
>
> While it's not quite as efficient, we could just do the FL_DELETING
> manipulation under the flc->flc_lock. That's per-inode, so it should be
> safe to do it that way.

If we are going to use a spinlock, I'd much rather not add a flag bit,
but instead use the blocked_member list_head.

I'm almost tempted to suggest adding
smp_list_del_init_release() and smp_list_empty_careful_acquire()
so that list membership can be used as a barrier. I'm not sure I'm game,
though.

NeilBrown


>
>> unsigned char fl_type;
>> unsigned int fl_pid;
>> int fl_link_cpu; /* what cpu's list is this on? */



2020-03-14 01:13:28

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Fri, 2020-03-13 at 09:19 +1100, NeilBrown wrote:
> On Thu, Mar 12 2020, Jeff Layton wrote:
>
> > On Thu, 2020-03-12 at 15:42 +1100, NeilBrown wrote:
> > > On Wed, Mar 11 2020, Linus Torvalds wrote:
> > >
> > > > On Wed, Mar 11, 2020 at 3:22 PM NeilBrown <[email protected]> wrote:
> > > > > We can combine the two ideas - move the list_del_init() later, and still
> > > > > protect it with the wq locks. This avoids holding the lock across the
> > > > > callback, but provides clear atomicity guarantees.
> > > >
> > > > Ugfh. Honestly, this is disgusting.
> > > >
> > > > Now you re-take the same lock in immediate succession for the
> > > > non-callback case. It's just hidden.
> > > >
> > > > And it's not like the list_del_init() _needs_ the lock (it's not
> > > > currently called with the lock held).
> > > >
> > > > So that "hold the lock over list_del_init()" seems to be horrendously
> > > > bogus. It's only done as a serialization thing for that optimistic
> > > > case.
> > > >
> > > > And that optimistic case doesn't even *want* that kind of
> > > > serialization. It really just wants a "I'm done" flag.
> > > >
> > > > So no. Don't do this. It's mis-using the lock in several ways.
> > > >
> > > > Linus
> > >
> > > It seems that test_and_set_bit_lock() is the preferred way to handle
> > > flags when memory ordering is important, and I can't see how to use that
> > > well with an "I'm done" flag. I can make it look OK with a "I'm
> > > detaching" flag. Maybe this is better.
> > >
> > > NeilBrown
> > >
> > > From f46db25f328ddf37ca9fbd390c6eb5f50c4bd2e6 Mon Sep 17 00:00:00 2001
> > > From: NeilBrown <[email protected]>
> > > Date: Wed, 11 Mar 2020 07:39:04 +1100
> > > Subject: [PATCH] locks: restore locks_delete_lock optimization
> > >
> > > A recent patch (see Fixes: below) removed an optimization which is
> > > important as it avoids taking a lock in a common case.
> > >
> > > The comment justifying the optimisation was correct as far as it went,
> > > in that if the tests succeeded, then the values would remain stable and
> > > the test result will remain valid even without a lock.
> > >
> > > However after the test succeeds the lock can be freed while some other
> > > thread might have only just set ->blocker to NULL (thus allowing the
> > > test to succeed) but has not yet called wake_up() on the wq in the lock.
> > > If the wake_up happens after the lock is freed, a use-after-free error occurs.
> > >
> > > This patch restores the optimization and adds a flag to ensure this
> > > use-after-free is avoid. The use happens only when the flag is set, and
> > > the free doesn't happen until the flag has been cleared, or we have
> > > taken blocked_lock_lock.
> > >
> > > Fixes: 6d390e4b5d48 ("locks: fix a potential use-after-free problem when wakeup a waiter")
> > > Signed-off-by: NeilBrown <[email protected]>
> > > ---
> > > fs/locks.c | 44 ++++++++++++++++++++++++++++++++++++++------
> > > include/linux/fs.h | 3 ++-
> > > 2 files changed, 40 insertions(+), 7 deletions(-)
> > >
> >
> > Just a note that I'm traveling at the moment, and won't be able do much
> > other than comment on this for a few days.
> >
> > > diff --git a/fs/locks.c b/fs/locks.c
> > > index 426b55d333d5..334473004c6c 100644
> > > --- a/fs/locks.c
> > > +++ b/fs/locks.c
> > > @@ -283,7 +283,7 @@ locks_dump_ctx_list(struct list_head *list, char *list_type)
> > > struct file_lock *fl;
> > >
> > > list_for_each_entry(fl, list, fl_list) {
> > > - pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
> > > + pr_warn("%s: fl_owner=%p fl_flags=0x%lx fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
> > > }
> > > }
> > >
> > > @@ -314,7 +314,7 @@ locks_check_ctx_file_list(struct file *filp, struct list_head *list,
> > > list_for_each_entry(fl, list, fl_list)
> > > if (fl->fl_file == filp)
> > > pr_warn("Leaked %s lock on dev=0x%x:0x%x ino=0x%lx "
> > > - " fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n",
> > > + " fl_owner=%p fl_flags=0x%lx fl_type=0x%x fl_pid=%u\n",
> > > list_type, MAJOR(inode->i_sb->s_dev),
> > > MINOR(inode->i_sb->s_dev), inode->i_ino,
> > > fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
> > > @@ -736,10 +736,13 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> > > waiter = list_first_entry(&blocker->fl_blocked_requests,
> > > struct file_lock, fl_blocked_member);
> > > __locks_delete_block(waiter);
> > > - if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> > > - waiter->fl_lmops->lm_notify(waiter);
> > > - else
> > > - wake_up(&waiter->fl_wait);
> > > + if (!test_and_set_bit_lock(FL_DELETING, &waiter->fl_flags)) {
> > > + if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
> > > + waiter->fl_lmops->lm_notify(waiter);
> > > + else
> > > + wake_up(&waiter->fl_wait);
> > > + clear_bit_unlock(FL_DELETING, &waiter->fl_flags);
> > > + }
> >
> > I *think* this is probably safe.
> >
> > AIUI, when you use atomic bitops on a flag word like this, you should
> > use them for all modifications to ensure that your changes don't get
> > clobbered by another task racing in to do a read/modify/write cycle on
> > the same word.
> >
> > I haven't gone over all of the places where fl_flags is changed, but I
> > don't see any at first glance that do it on a blocked request.
> >
> > > }
> > > }
> > >
> > > @@ -753,11 +756,40 @@ int locks_delete_block(struct file_lock *waiter)
> > > {
> > > int status = -ENOENT;
> > >
> > > + /*
> > > + * If fl_blocker is NULL, it won't be set again as this thread
> > > + * "owns" the lock and is the only one that might try to claim
> > > + * the lock. So it is safe to test fl_blocker locklessly.
> > > + * Also if fl_blocker is NULL, this waiter is not listed on
> > > + * fl_blocked_requests for some lock, so no other request can
> > > + * be added to the list of fl_blocked_requests for this
> > > + * request. So if fl_blocker is NULL, it is safe to
> > > + * locklessly check if fl_blocked_requests is empty. If both
> > > + * of these checks succeed, there is no need to take the lock.
> > > + *
> > > + * We perform these checks only if we can set FL_DELETING.
> > > + * This ensure that we don't race with __locks_wake_up_blocks()
> > > + * in a way which leads it to call wake_up() *after* we return
> > > + * and the file_lock is freed.
> > > + */
> > > + if (!test_and_set_bit_lock(FL_DELETING, &waiter->fl_flags)) {
> > > + if (waiter->fl_blocker == NULL &&
> > > + list_empty(&waiter->fl_blocked_requests)) {
> > > + /* Already fully unlinked */
> > > + clear_bit_unlock(FL_DELETING, &waiter->fl_flags);
> > > + return status;
> > > + }
> > > + }
> > > +
> > > spin_lock(&blocked_lock_lock);
> > > if (waiter->fl_blocker)
> > > status = 0;
> > > __locks_wake_up_blocks(waiter);
> > > __locks_delete_block(waiter);
> > > + /* This flag might not be set and it is largely irrelevant
> > > + * now, but it seem cleaner to clear it.
> > > + */
> > > + clear_bit(FL_DELETING, &waiter->fl_flags);
> > > spin_unlock(&blocked_lock_lock);
> > > return status;
> > > }
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 3cd4fe6b845e..4db514f29bca 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1012,6 +1012,7 @@ static inline struct file *get_file(struct file *f)
> > > #define FL_UNLOCK_PENDING 512 /* Lease is being broken */
> > > #define FL_OFDLCK 1024 /* lock is "owned" by struct file */
> > > #define FL_LAYOUT 2048 /* outstanding pNFS layout */
> > > +#define FL_DELETING 32768 /* lock is being disconnected */
> >
> > nit: Why the big gap?
>
> No good reason - it seems like a conceptually different sort of flag so
> I vaguely felt that it would help if it were numerically separate.
>
> > > #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
> > >
> > > @@ -1087,7 +1088,7 @@ struct file_lock {
> > > * ->fl_blocker->fl_blocked_requests
> > > */
> > > fl_owner_t fl_owner;
> > > - unsigned int fl_flags;
> > > + unsigned long fl_flags;
> >
> > This will break kABI, so backporting this to enterprise distro kernels
> > won't be trivial. Not a showstopper, but it might be nice to avoid that
> > if we can.
> >
> > While it's not quite as efficient, we could just do the FL_DELETING
> > manipulation under the flc->flc_lock. That's per-inode, so it should be
> > safe to do it that way.
>
> If we are going to use a spinlock, I'd much rather not add a flag bit,
> but instead use the blocked_member list_head.
>

If we do want to go that route though, we'll probably need to make
variants of locks_delete_block that can be called with the flc_lock
held and without. Most of the fs/locks.c callers call it with the
flc_lock held -- most of the others don't.

> I'm almost tempted to suggest adding
> smp_list_del_init_release() and smp_list_empty_careful_acquire()
> so that list membership can be used as a barrier. I'm not sure I game
> though.
>

Those do sound quite handy to have, but I'm not sure it's really
required. We could also just go back to considering the patch that
Linus sent originally, along with changing all of the
wait_event_interruptible calls to use
list_empty(&fl->fl_blocked_member) instead of !fl->fl_blocker as the
condition. (See attached)
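
Concretely, each wait site would change from waiting on !fl->fl_blocker
to waiting on the list emptying, along the lines of (sketch only; the
attached patch is the authoritative version):

	error = wait_event_interruptible(fl->fl_wait,
					 list_empty(&fl->fl_blocked_member));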

--
Jeff Layton <[email protected]>


Attachments:
0001-locks-reinstate-locks_delete_lock-optimization.patch (5.34 kB)

2020-03-14 01:32:37

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Thu, 2020-03-12 at 09:07 -0700, Linus Torvalds wrote:
> On Wed, Mar 11, 2020 at 9:42 PM NeilBrown <[email protected]> wrote:
> > It seems that test_and_set_bit_lock() is the preferred way to handle
> > flags when memory ordering is important
>
> That looks better.
>
> The _preferred_ way is actually the one I already posted: do a
> "smp_store_release()" to store the flag (like a NULL pointer), and a
> smp_load_acquire() to load it.
>
> That's basically optimal on most architectures (all modern ones -
> there are bad architectures from before people figured out that
> release/acquire is better than separate memory barriers), not needing
> any atomics and only minimal memory ordering.
>
> I wonder if a special flags value (keeping it "unsigned int" to avoid
> the issue Jeff pointed out) might be acceptable?
>
> IOW, could we do just
>
> smp_store_release(&waiter->fl_flags, FL_RELEASED);
>
> to say that we're done with the lock? Or do people still look at and
> depend on the flag values at that point?

I think nlmsvc_grant_block does. We could probably work around it
there, but we'd need to couple this change with some clear
documentation to make it clear that you can't rely on fl_flags after
locks_delete_block returns.

If avoiding new locks is preferred here (and I'm fine with that), then
maybe we should just go with the patch you sent originally (along with
changing the waiters to wait on fl_blocked_member going empty instead
of the fl_blocker going NULL)?

--
Jeff Layton <[email protected]>

2020-03-14 02:31:44

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Fri, Mar 13 2020, Jeff Layton wrote:

> On Thu, 2020-03-12 at 09:07 -0700, Linus Torvalds wrote:
>> On Wed, Mar 11, 2020 at 9:42 PM NeilBrown <[email protected]> wrote:
>> > It seems that test_and_set_bit_lock() is the preferred way to handle
>> > flags when memory ordering is important
>>
>> That looks better.
>>
>> The _preferred_ way is actually the one I already posted: do a
>> "smp_store_release()" to store the flag (like a NULL pointer), and a
>> smp_load_acquire() to load it.
>>
>> That's basically optimal on most architectures (all modern ones -
>> there are bad architectures from before people figured out that
>> release/acquire is better than separate memory barriers), not needing
>> any atomics and only minimal memory ordering.
>>
>> I wonder if a special flags value (keeping it "unsigned int" to avoid
>> the issue Jeff pointed out) might be acceptable?
>>
>> IOW, could we do just
>>
>> smp_store_release(&waiter->fl_flags, FL_RELEASED);
>>
>> to say that we're done with the lock? Or do people still look at and
>> depend on the flag values at that point?
>
> I think nlmsvc_grant_block does. We could probably work around it
> there, but we'd need to couple this change with some clear
> documentation to make it clear that you can't rely on fl_flags after
> locks_delete_block returns.
>
> If avoiding new locks is preferred here (and I'm fine with that), then
> maybe we should just go with the patch you sent originally (along with
> changing the waiters to wait on fl_blocked_member going empty instead
> of the fl_blocker going NULL)?

I agree. I've poked at this for a while and come to the conclusion that
I cannot really come up with anything that is structurally better than
your patch.
The idea of list_del_init_release() and list_empty_acquire() is growing
on me though. See below.

list_empty_acquire() might be appropriate for waitqueue_active(), which
is documented as requiring a memory barrier, but in practice seems to
often be used without one.
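
For comparison, the usage comment in wait.h documents waitqueue_active()
as safe only with an explicit barrier on the waker side, roughly:

	/* waker */
	cond = true;
	smp_mb();	/* pairs with the barrier in the waiter's set_current_state() */
	if (waitqueue_active(&wq_head))
		wake_up(&wq_head);

("cond" and "wq_head" are placeholders, not names from fs/locks.c.)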

But I'm happy for you to go with your patch that changes all the wait
calls.

NeilBrown



diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..2e5eb677c324 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -174,6 +174,20 @@

#include <linux/uaccess.h>

+/* Should go in list.h */
+static inline int list_empty_acquire(const struct list_head *head)
+{
+ return smp_load_acquire(&head->next) == head;
+}
+
+static inline void list_del_init_release(struct list_head *entry)
+{
+ __list_del_entry(entry);
+ entry->prev = entry;
+ smp_store_release(&entry->next, entry);
+}
+
+
#define IS_POSIX(fl) (fl->fl_flags & FL_POSIX)
#define IS_FLOCK(fl) (fl->fl_flags & FL_FLOCK)
#define IS_LEASE(fl) (fl->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
@@ -724,7 +738,6 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
static void __locks_delete_block(struct file_lock *waiter)
{
locks_delete_global_blocked(waiter);
- list_del_init(&waiter->fl_blocked_member);
waiter->fl_blocker = NULL;
}

@@ -740,6 +753,11 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
waiter->fl_lmops->lm_notify(waiter);
else
wake_up(&waiter->fl_wait);
+ /*
+ * Tell the world that we're done with it - see comment at
+ * top of locks_delete_block().
+ */
+ list_del_init_release(&waiter->fl_blocked_member);
}
}

@@ -753,6 +771,25 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;

+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread
+ * "owns" the lock and is the only one that might try to claim
+ * the lock. So it is safe to test fl_blocker locklessly.
+ * Also if fl_blocker is NULL, this waiter is not listed on
+ * fl_blocked_requests for some lock, so no other request can
+ * be added to the list of fl_blocked_requests for this
+ * request. So if fl_blocker is NULL, it is safe to
+ * locklessly check if fl_blocked_requests is empty. If both
+ * of these checks succeed, there is no need to take the lock.
+ * However, some other thread could still be in __locks_wake_up_blocks()
+ * and may yet access 'waiter', so we cannot return and possibly
+ * free the 'waiter' unless we check that __locks_wake_up_blocks()
+ * is done. For that we carefully test fl_blocked_member.
+ */
+ if (waiter->fl_blocker == NULL &&
+ list_empty(&waiter->fl_blocked_requests) &&
+ list_empty_acquire(&waiter->fl_blocked_member))
+ return status;
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;



2020-03-15 02:50:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Fri, Mar 13, 2020 at 7:31 PM NeilBrown <[email protected]> wrote:
>
> The idea of list_del_init_release() and list_empty_acquire() is growing
> on me though. See below.

This does look like a promising approach.

However:

> + if (waiter->fl_blocker == NULL &&
> + list_empty(&waiter->fl_blocked_requests) &&
> + list_empty_acquire(&waiter->fl_blocked_member))
> + return status;

This does not seem sensible to me.

The thing is, the whole point about "acquire" semantics is that it
should happen _first_ - because a load-with-acquire only orders things
_after_ it.

So testing some other non-locked state before testing the load-acquire
state makes little sense: it means that the other tests you do are
fundamentally unordered and nonsensical in an unlocked model.

So _if_ those other tests matter (do they?), then they should be after
the acquire test (because they test things that on the writer side are
set before the "store-release"). Otherwise you're testing random
state.

And if they don't matter, then they shouldn't exist at all.

IOW, if you depend on ordering, then the _only_ ordering that exists is:

- writer side: writes done _before_ the smp_store_release() are visible

- to the reader side done _after_ the smp_load_acquire()

and absolutely no other ordering exists or makes sense to test for.

That limited ordering guarantee is why a store-release -> load-acquire
is fundamentally cheaper than any other serialization.

So the optimistic "I don't need to do anything" case should start out with

if (list_empty_acquire(&waiter->fl_blocked_member)) {

and go from there. Does it actually need to do anything else at all?
But if it does need to check the other fields, they should be checked
after that acquire.

Also, it worries me that the comment talks about "if fl_blocker is
NULL". But it realy now is that fl_blocked_member list being empty
that is the real serialization test, adn that's the one that the
comment should primarily talk about.
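
Put as code, the ordering being described looks roughly like this,
using the list_empty_acquire()/list_del_init_release() helpers from the
sketch earlier in the thread (they are not upstream list.h API):

	/* waker side, end of __locks_wake_up_blocks(): */
	wake_up(&waiter->fl_wait);
	list_del_init_release(&waiter->fl_blocked_member);	/* store-release: all done */

	/* waiter side, top of locks_delete_block(): the acquire load comes
	 * first; any further checks are only meaningful after it, because
	 * only writes made before the store-release above are guaranteed
	 * to be visible here.
	 */
	if (list_empty_acquire(&waiter->fl_blocked_member))
		return status;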

Linus

2020-03-15 13:55:01

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Sat, 2020-03-14 at 08:58 -0700, Linus Torvalds wrote:
> On Fri, Mar 13, 2020 at 7:31 PM NeilBrown <[email protected]> wrote:
> > The idea of list_del_init_release() and list_empty_acquire() is growing
> > on me though. See below.
>
> This does look like a promising approach.
>
> However:
>
> > + if (waiter->fl_blocker == NULL &&
> > + list_empty(&waiter->fl_blocked_requests) &&
> > + list_empty_acquire(&waiter->fl_blocked_member))
> > + return status;
>
> This does not seem sensible to me.
>
> The thing is, the whole point about "acquire" semantics is that it
> should happen _first_ - because a load-with-acquire only orders things
> _after_ it.
>
> So testing some other non-locked state before testing the load-acquire
> state makes little sense: it means that the other tests you do are
> fundamentally unordered and nonsensical in an unlocked model.
>
> So _if_ those other tests matter (do they?), then they should be after
> the acquire test (because they test things that on the writer side are
> set before the "store-release"). Otherwise you're testing random
> state.
>
> And if they don't matter, then they shouldn't exist at all.
>
> IOW, if you depend on ordering, then the _only_ ordering that exists is:
>
> - writer side: writes done _before_ the smp_store_release() are visible
>
> - to the reader side done _after_ the smp_load_acquire()
>
> and absolutely no other ordering exists or makes sense to test for.
>
> That limited ordering guarantee is why a store-release -> load-acquire
> is fundamentally cheaper than any other serialization.
>
> So the optimistic "I don't need to do anything" case should start ouf with
>
> if (list_empty_acquire(&waiter->fl_blocked_member)) {
>
> and go from there. Does it actually need to do anything else at all?
> But if it does need to check the other fields, they should be checked
> after that acquire.
>
> Also, it worries me that the comment talks about "if fl_blocker is
> NULL". But it realy now is that fl_blocked_member list being empty
> that is the real serialization test, adn that's the one that the
> comment should primarily talk about.
>

Good point. The list manipulation and setting of fl_blocker are always
done in conjunction, so I don't see why we'd need to check more than one
condition there (whichever gets the explicit acquire/release semantics).

The fl_blocker pointer seems like the clearest way to indicate that to
me, but if using list_empty makes sense for other reasons, I'm fine with
that.

This is what I have so far (leaving Linus as author since he did the
original patch):

------------8<-------------

From 1493f539e09dfcd5e0862209c6f7f292a2f2d228 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <[email protected]>
Date: Mon, 9 Mar 2020 14:35:43 -0400
Subject: [PATCH] locks: reinstate locks_delete_block optimization

There is measurable performance impact in some synthetic tests due to
commit 6d390e4b5d48 (locks: fix a potential use-after-free problem when
wakeup a waiter). Fix the race condition instead by clearing the
fl_blocker pointer after the wake_up, using explicit acquire/release
semantics.

With this change, we can just check whether fl_blocker has been cleared
as an indicator that the block is already deleted, and eliminate the
list_empty check that was in the old optimization.

This does mean that we can no longer use the clearing of fl_blocker as
the wait condition, so switch the waiters over to checking whether the
fl_blocked_member list_head is empty.

Cc: yangerkun <[email protected]>
Cc: NeilBrown <[email protected]>
Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
Signed-off-by: Jeff Layton <[email protected]>
---
fs/cifs/file.c | 3 ++-
fs/locks.c | 38 ++++++++++++++++++++++++++++++++------
2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3b942ecdd4be..8f9d849a0012 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1169,7 +1169,8 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
rc = posix_lock_file(file, flock, NULL);
up_write(&cinode->lock_sem);
if (rc == FILE_LOCK_DEFERRED) {
- rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
+ rc = wait_event_interruptible(flock->fl_wait,
+ list_empty(&flock->fl_blocked_member));
if (!rc)
goto try_again;
locks_delete_block(flock);
diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..652a09ab02d7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -725,7 +725,6 @@ static void __locks_delete_block(struct file_lock *waiter)
{
locks_delete_global_blocked(waiter);
list_del_init(&waiter->fl_blocked_member);
- waiter->fl_blocker = NULL;
}

static void __locks_wake_up_blocks(struct file_lock *blocker)
@@ -740,6 +739,12 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
waiter->fl_lmops->lm_notify(waiter);
else
wake_up(&waiter->fl_wait);
+
+ /*
+ * Tell the world we're done with it - see comment at
+ * top of locks_delete_block().
+ */
+ smp_store_release(&waiter->fl_blocker, NULL);
}
}

@@ -753,11 +758,27 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;

+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread "owns"
+ * the lock and is the only one that might try to claim the lock.
+ * Because fl_blocker is explicitly set last during a delete, it's
+ * safe to locklessly test to see if it's NULL and avoid doing
+ * anything further if it is.
+ */
+ if (!smp_load_acquire(&waiter->fl_blocker))
+ return status;
+
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;
__locks_wake_up_blocks(waiter);
__locks_delete_block(waiter);
+
+ /*
+ * Tell the world we're done with it - see comment at top
+ * of this function
+ */
+ smp_store_release(&waiter->fl_blocker, NULL);
spin_unlock(&blocked_lock_lock);
return status;
}
@@ -1350,7 +1371,8 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
error = posix_lock_inode(inode, fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+ list_empty(&fl->fl_blocked_member));
if (error)
break;
}
@@ -1435,7 +1457,8 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
error = posix_lock_inode(inode, &fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
+ error = wait_event_interruptible(fl.fl_wait,
+ list_empty(&fl.fl_blocked_member));
if (!error) {
/*
* If we've been sleeping someone might have
@@ -1638,7 +1661,8 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)

locks_dispose_list(&dispose);
error = wait_event_interruptible_timeout(new_fl->fl_wait,
- !new_fl->fl_blocker, break_time);
+ list_empty(&new_fl->fl_blocked_member),
+ break_time);

percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
@@ -2122,7 +2146,8 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
error = flock_lock_inode(inode, fl);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+ list_empty(&fl->fl_blocked_member));
if (error)
break;
}
@@ -2399,7 +2424,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
error = vfs_lock_file(filp, cmd, fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+ list_empty(&fl->fl_blocked_member));
if (error)
break;
}
--
2.24.1


2020-03-16 04:35:28

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Sat, Mar 14 2020, Linus Torvalds wrote:

> On Fri, Mar 13, 2020 at 7:31 PM NeilBrown <[email protected]> wrote:
>>
>> The idea of list_del_init_release() and list_empty_acquire() is growing
>> on me though. See below.
>
> This does look like a promising approach.

Thanks.

>
> However:
>
>> + if (waiter->fl_blocker == NULL &&
>> + list_empty(&waiter->fl_blocked_requests) &&
>> + list_empty_acquire(&waiter->fl_blocked_member))
>> + return status;
>
> This does not seem sensible to me.
>
> The thing is, the whole point about "acquire" semantics is that it
> should happen _first_ - because a load-with-acquire only orders things
> _after_ it.

Agreed.

>
> So testing some other non-locked state before testing the load-acquire
> state makes little sense: it means that the other tests you do are
> fundamentally unordered and nonsensical in an unlocked model.
>
> So _if_ those other tests matter (do they?), then they should be after
> the acquire test (because they test things that on the writer side are
> set before the "store-release"). Otherwise you're testing random
> state.
>
> And if they don't matter, then they shouldn't exist at all.

The ->fl_blocker == NULL test isn't needed. It is effectively equivalent
to the list_empty(fl_blocked_member) test.

The fl_blocked_requests test *is* needed (because a tree is dismantled
from the root to the leaves, so it stops being a member while it still
holds other requests). I didn't think the ordering mattered all that
much but having pondered it again I see that it does.

>
> IOW, if you depend on ordering, then the _only_ ordering that exists is:
>
> - writer side: writes done _before_ the smp_store_release() are visible
>
> - to the reader side done _after_ the smp_load_acquire()
>
> and absolutely no other ordering exists or makes sense to test for.
>
> That limited ordering guarantee is why a store-release -> load-acquire
> is fundamentally cheaper than any other serialization.
>
> So the optimistic "I don't need to do anything" case should start out with
>
> if (list_empty_acquire(&waiter->fl_blocked_member)) {
>
> and go from there. Does it actually need to do anything else at all?
> But if it does need to check the other fields, they should be checked
> after that acquire.

So it should be
	if (list_empty_acquire(&waiter->fl_blocked_member) &&
	    list_empty_acquire(&waiter->fl_blocked_requests))
		return status;

And because that second list_empty_acquire() is on the list head, and
pairs with a list_del_init_release() on a list member, I would need to
fix the __list_del() part to be
	next->prev = prev;
	smp_store_release(&prev->next, next);
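
For concreteness, here is a minimal sketch of what those two helpers
might look like. Neither list_empty_acquire() nor
list_del_init_release() exists in mainline at this point, so the names
and exact barrier placement are just the proposal above, and the
re-initialisation of the removed entry is shown without an extra
barrier for brevity:

	/* Proposed helpers - a sketch only, not mainline code. */
	static inline int list_empty_acquire(const struct list_head *head)
	{
		/* Pairs with the smp_store_release() in list_del_init_release(). */
		return smp_load_acquire(&head->next) == head;
	}

	static inline void list_del_init_release(struct list_head *entry)
	{
		struct list_head *prev = entry->prev;
		struct list_head *next = entry->next;

		/* Unlink, publishing the removal with release semantics. */
		next->prev = prev;
		smp_store_release(&prev->next, next);

		/* Re-initialise the entry so list_empty(entry) is true again. */
		entry->prev = entry;
		entry->next = entry;
	}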

>
> Also, it worries me that the comment talks about "if fl_blocker is
> NULL". But really, it is now the fl_blocked_member list being empty
> that is the real serialization test, and that's the one that the
> comment should primarily talk about.

Yes, I see that now. Thanks.

NeilBrown


Attachments:
signature.asc (847.00 B)

2020-03-16 05:07:04

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Sun, Mar 15 2020, Jeff Layton wrote:

> On Sat, 2020-03-14 at 08:58 -0700, Linus Torvalds wrote:
>> On Fri, Mar 13, 2020 at 7:31 PM NeilBrown <[email protected]> wrote:
>> > The idea of list_del_init_release() and list_empty_acquire() is growing
>> > on me though. See below.
>>
>> This does look like a promising approach.
>>
>> However:
>>
>> > + if (waiter->fl_blocker == NULL &&
>> > + list_empty(&waiter->fl_blocked_requests) &&
>> > + list_empty_acquire(&waiter->fl_blocked_member))
>> > + return status;
>>
>> This does not seem sensible to me.
>>
>> The thing is, the whole point about "acquire" semantics is that it
>> should happen _first_ - because a load-with-acquire only orders things
>> _after_ it.
>>
>> So testing some other non-locked state before testing the load-acquire
>> state makes little sense: it means that the other tests you do are
>> fundamentally unordered and nonsensical in an unlocked model.
>>
>> So _if_ those other tests matter (do they?), then they should be after
>> the acquire test (because they test things that on the writer side are
>> set before the "store-release"). Otherwise you're testing random
>> state.
>>
>> And if they don't matter, then they shouldn't exist at all.
>>
>> IOW, if you depend on ordering, then the _only_ ordering that exists is:
>>
>> - writer side: writes done _before_ the smp_store_release() are visible
>>
>> - to the reader side done _after_ the smp_load_acquire()
>>
>> and absolutely no other ordering exists or makes sense to test for.
>>
>> That limited ordering guarantee is why a store-release -> load-acquire
>> is fundamentally cheaper than any other serialization.
>>
>> So the optimistic "I don't need to do anything" case should start out with
>>
>> if (list_empty_acquire(&waiter->fl_blocked_member)) {
>>
>> and go from there. Does it actually need to do anything else at all?
>> But if it does need to check the other fields, they should be checked
>> after that acquire.
>>
>> Also, it worries me that the comment talks about "if fl_blocker is
>> NULL". But really, it is now the fl_blocked_member list being empty
>> that is the real serialization test, and that's the one that the
>> comment should primarily talk about.
>>
>
> Good point. The list manipulation and setting of fl_blocker are always
> done in conjunction, so I don't see why we'd need to check more than one
> condition there (whichever gets the explicit acquire/release semantics).
>
> The fl_blocker pointer seems like the clearest way to indicate that to
> me, but if using list_empty makes sense for other reasons, I'm fine with
> that.
>
> This is what I have so far (leaving Linus as author since he did the
> original patch):
>
> ------------8<-------------
>
> From 1493f539e09dfcd5e0862209c6f7f292a2f2d228 Mon Sep 17 00:00:00 2001
> From: Linus Torvalds <[email protected]>
> Date: Mon, 9 Mar 2020 14:35:43 -0400
> Subject: [PATCH] locks: reinstate locks_delete_block optimization
>
> There is measurable performance impact in some synthetic tests due to
> commit 6d390e4b5d48 (locks: fix a potential use-after-free problem when
> wakeup a waiter). Fix the race condition instead by clearing the
> fl_blocker pointer after the wake_up, using explicit acquire/release
> semantics.
>
> With this change, we can just check for fl_blocker to clear as an
> indicator that the block is already deleted, and eliminate the
> list_empty check that was in the old optimization.
>
> This does mean that we can no longer use the clearing of fl_blocker as
> the wait condition, so switch the waiters over to checking whether the
> fl_blocked_member list_head is empty.
>
> Cc: yangerkun <[email protected]>
> Cc: NeilBrown <[email protected]>
> Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/cifs/file.c | 3 ++-
> fs/locks.c | 38 ++++++++++++++++++++++++++++++++------
> 2 files changed, 34 insertions(+), 7 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 3b942ecdd4be..8f9d849a0012 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1169,7 +1169,8 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
> rc = posix_lock_file(file, flock, NULL);
> up_write(&cinode->lock_sem);
> if (rc == FILE_LOCK_DEFERRED) {
> - rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
> + rc = wait_event_interruptible(flock->fl_wait,
> + list_empty(&flock->fl_blocked_member));
> if (!rc)
> goto try_again;
> locks_delete_block(flock);
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..652a09ab02d7 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -725,7 +725,6 @@ static void __locks_delete_block(struct file_lock *waiter)
> {
> locks_delete_global_blocked(waiter);
> list_del_init(&waiter->fl_blocked_member);
> - waiter->fl_blocker = NULL;
> }
>
> static void __locks_wake_up_blocks(struct file_lock *blocker)
> @@ -740,6 +739,12 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> waiter->fl_lmops->lm_notify(waiter);
> else
> wake_up(&waiter->fl_wait);
> +
> + /*
> + * Tell the world we're done with it - see comment at
> + * top of locks_delete_block().
> + */
> + smp_store_release(&waiter->fl_blocker, NULL);
> }
> }
>
> @@ -753,11 +758,27 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread "owns"
> + * the lock and is the only one that might try to claim the lock.
> + * Because fl_blocker is explicitly set last during a delete, it's
> + * safe to locklessly test to see if it's NULL and avoid doing
> + * anything further if it is.
> + */
> + if (!smp_load_acquire(&waiter->fl_blocker))
> + return status;

No, we really do need fl_blocked_requests to be empty.
After fl_blocker is cleared, the owner might check for other blockers
and might queue behind them leaving the blocked requests in place.
Or it might have to detach all those blocked requests and wake them up
so they can go and fend for themselves.

I think the worst-case scenario could go something like this.
Process A gets a lock - Al
Process B tries to get a conflicting lock and blocks Bl -> Al
Process C tries to get a conflicting lock and blocks on B:
Cl -> Bl -> Al

At much the same time that C goes to attach Cl to Bl, A
calls unlock and B gets signaled.

So A is calling locks_wake_up_blocks(Al) - which takes blocked_lock_lock.
C is calling locks_insert_block(Bl, Cl) - which also takes the lock
B is calling locks_delete_block(Bl) which might not take the lock.

Assume C gets the lock first.

Before C calls locks_insert_block, Bl->fl_blocked_requests is empty.
After A finishes in locks_wake_up_blocks, Bl->fl_blocker is NULL

If B sees that fl_blocker is NULL, we need it to see that
fl_blocked_requests is no longer empty, so that it takes the lock and
cleans up fl_blocked_requests.

If the list_empty test on fl_blocked_requests goes after the fl_blocker
test, the memory barriers we have should assure that. I had thought
that it would need an extra barrier, but as a spinlock places the change
to fl_blocked_requests *before* the change to fl_blocker, I no longer
think that is needed.
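
To spell that out with a sketch (illustrative only - the calls below
are schematic rather than lifted from any particular patch revision),
assume C wins blocked_lock_lock ahead of A:

	C:	spin_lock(&blocked_lock_lock);
	C:	list_add_tail(&Cl->fl_blocked_member, &Bl->fl_blocked_requests);
	C:	spin_unlock(&blocked_lock_lock);

	A:	spin_lock(&blocked_lock_lock);
	A:	wake_up(&Bl->fl_wait);
	A:	smp_store_release(&Bl->fl_blocker, NULL);
	A:	spin_unlock(&blocked_lock_lock);

	B:	if (!smp_load_acquire(&Bl->fl_blocker) &&	/* sees NULL */
	B:	    list_empty(&Bl->fl_blocked_requests))	/* must see Cl */
	B:		return status;				/* not taken */

C's unlock and A's later lock order the list_add_tail before the
store-release of fl_blocker, and B's load-acquire orders its
list_empty() check after the point where it observes fl_blocker ==
NULL, so B cannot see NULL and still miss Cl; it falls through and
takes blocked_lock_lock to clean up fl_blocked_requests.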

Thanks,
NeilBrown


> +
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;
> __locks_wake_up_blocks(waiter);
> __locks_delete_block(waiter);
> +
> + /*
> + * Tell the world we're done with it - see comment at top
> + * of this function
> + */
> + smp_store_release(&waiter->fl_blocker, NULL);
> spin_unlock(&blocked_lock_lock);
> return status;
> }
> @@ -1350,7 +1371,8 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> error = posix_lock_inode(inode, fl, NULL);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> + error = wait_event_interruptible(fl->fl_wait,
> + list_empty(&fl->fl_blocked_member));
> if (error)
> break;
> }
> @@ -1435,7 +1457,8 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
> error = posix_lock_inode(inode, &fl, NULL);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
> + error = wait_event_interruptible(fl.fl_wait,
> + list_empty(&fl.fl_blocked_member));
> if (!error) {
> /*
> * If we've been sleeping someone might have
> @@ -1638,7 +1661,8 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>
> locks_dispose_list(&dispose);
> error = wait_event_interruptible_timeout(new_fl->fl_wait,
> - !new_fl->fl_blocker, break_time);
> + list_empty(&new_fl->fl_blocked_member),
> + break_time);
>
> percpu_down_read(&file_rwsem);
> spin_lock(&ctx->flc_lock);
> @@ -2122,7 +2146,8 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> error = flock_lock_inode(inode, fl);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> + error = wait_event_interruptible(fl->fl_wait,
> + list_empty(&fl->fl_blocked_member));
> if (error)
> break;
> }
> @@ -2399,7 +2424,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
> error = vfs_lock_file(filp, cmd, fl, NULL);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> + error = wait_event_interruptible(fl->fl_wait,
> + list_empty(&fl->fl_blocked_member));
> if (error)
> break;
> }
> --
> 2.24.1


Attachments:
signature.asc (847.00 B)

2020-03-16 11:08:15

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, 2020-03-16 at 16:06 +1100, NeilBrown wrote:

[...]

> No, we really do need fl_blocked_requests to be empty.
> After fl_blocker is cleared, the owner might check for other blockers
> and might queue behind them leaving the blocked requests in place.
> Or it might have to detach all those blocked requests and wake them up
> so they can go and fend for themselves.
>
> I think the worst-case scenario could go something like this.
> Process A gets a lock - Al
> Process B tries to get a conflicting lock and blocks Bl -> Al
> Process C tries to get a conflicting lock and blocks on B:
> Cl -> Bl -> Al
>
> At much the same time that C goes to attach Cl to Bl, A
> calls unlock and B gets signaled.
>
> So A is calling locks_wake_up_blocks(Al) - which takes blocked_lock_lock.
> C is calling locks_insert_block(Bl, Cl) - which also takes the lock
> B is calling locks_delete_block(Bl) which might not take the lock.
>
> Assume C gets the lock first.
>
> Before C calls locks_insert_block, Bl->fl_blocked_requests is empty.
> After A finishes in locks_wake_up_blocks, Bl->fl_blocker is NULL
>
> If B sees that fl_blocker is NULL, we need it to see that
> fl_blocked_requests is no longer empty, so that it takes the lock and
> cleans up fl_blocked_requests.
>
> If the list_empty test on fl_blocked_requests goes after the fl_blocker
> test, the memory barriers we have should assure that. I had thought
> that it would need an extra barrier, but as a spinlock places the change
> to fl_blocked_requests *before* the change to fl_blocker, I no longer
> think that is needed.

Got it. I was thinking all of the waiters of a blocker would already be
awoken once fl_blocker was set to NULL, but you're correct and they
aren't. How about this?

-----------------8<------------------

From f40e865842ae84a9d465ca9edb66f0985c1587d4 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <[email protected]>
Date: Mon, 9 Mar 2020 14:35:43 -0400
Subject: [PATCH] locks: reinstate locks_delete_block optimization

There is measurable performance impact in some synthetic tests due to
commit 6d390e4b5d48 (locks: fix a potential use-after-free problem when
wakeup a waiter). Fix the race condition instead by clearing the
fl_blocker pointer after the wake_up, using explicit acquire/release
semantics.

This does mean that we can no longer use the clearing of fl_blocker as
the wait condition, so switch the waiters over to checking whether the
fl_blocked_member list_head is empty.

Cc: yangerkun <[email protected]>
Cc: NeilBrown <[email protected]>
Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
Signed-off-by: Jeff Layton <[email protected]>
---
fs/cifs/file.c | 3 ++-
fs/locks.c | 41 +++++++++++++++++++++++++++++++++++------
2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3b942ecdd4be..8f9d849a0012 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1169,7 +1169,8 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
rc = posix_lock_file(file, flock, NULL);
up_write(&cinode->lock_sem);
if (rc == FILE_LOCK_DEFERRED) {
- rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
+ rc = wait_event_interruptible(flock->fl_wait,
+ list_empty(&flock->fl_blocked_member));
if (!rc)
goto try_again;
locks_delete_block(flock);
diff --git a/fs/locks.c b/fs/locks.c
index 426b55d333d5..eaf754ecdaa8 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -725,7 +725,6 @@ static void __locks_delete_block(struct file_lock *waiter)
{
locks_delete_global_blocked(waiter);
list_del_init(&waiter->fl_blocked_member);
- waiter->fl_blocker = NULL;
}

static void __locks_wake_up_blocks(struct file_lock *blocker)
@@ -740,6 +739,12 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
waiter->fl_lmops->lm_notify(waiter);
else
wake_up(&waiter->fl_wait);
+
+ /*
+ * Tell the world we're done with it - see comment at
+ * top of locks_delete_block().
+ */
+ smp_store_release(&waiter->fl_blocker, NULL);
}
}

@@ -753,11 +758,30 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;

+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread "owns"
+ * the lock and is the only one that might try to claim the lock.
+ * Because fl_blocker is explicitly set last during a delete, it's
+ * safe to locklessly test to see if it's NULL. If it is, then we know
+ * that no new locks can be inserted into its fl_blocked_requests list,
+ * and we can therefore avoid doing anything further as long as that
+ * list is empty.
+ */
+ if (!smp_load_acquire(&waiter->fl_blocker) &&
+ list_empty(&waiter->fl_blocked_requests))
+ return status;
+
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;
__locks_wake_up_blocks(waiter);
__locks_delete_block(waiter);
+
+ /*
+ * Tell the world we're done with it - see comment at top
+ * of this function
+ */
+ smp_store_release(&waiter->fl_blocker, NULL);
spin_unlock(&blocked_lock_lock);
return status;
}
@@ -1350,7 +1374,8 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
error = posix_lock_inode(inode, fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+ list_empty(&fl->fl_blocked_member));
if (error)
break;
}
@@ -1435,7 +1460,8 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
error = posix_lock_inode(inode, &fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
+ error = wait_event_interruptible(fl.fl_wait,
+ list_empty(&fl.fl_blocked_member));
if (!error) {
/*
* If we've been sleeping someone might have
@@ -1638,7 +1664,8 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)

locks_dispose_list(&dispose);
error = wait_event_interruptible_timeout(new_fl->fl_wait,
- !new_fl->fl_blocker, break_time);
+ list_empty(&new_fl->fl_blocked_member),
+ break_time);

percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
@@ -2122,7 +2149,8 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
error = flock_lock_inode(inode, fl);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+ list_empty(&fl->fl_blocked_member));
if (error)
break;
}
@@ -2399,7 +2427,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
error = vfs_lock_file(filp, cmd, fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
+ error = wait_event_interruptible(fl->fl_wait,
+ list_empty(&fl->fl_blocked_member));
if (error)
break;
}
--
2.24.1


Attachments:
signature.asc (879.00 B)
This is a digitally signed message part

2020-03-16 17:28:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, Mar 16, 2020 at 4:07 AM Jeff Layton <[email protected]> wrote:
>
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread "owns"
> + * the lock and is the only one that might try to claim the lock.
> + * Because fl_blocker is explicitly set last during a delete, it's
> + * safe to locklessly test to see if it's NULL. If it is, then we know
> + * that no new locks can be inserted into its fl_blocked_requests list,
> + * and we can therefore avoid doing anything further as long as that
> + * list is empty.
> + */
> + if (!smp_load_acquire(&waiter->fl_blocker) &&
> + list_empty(&waiter->fl_blocked_requests))
> + return status;

Ack. This looks sane to me now.

yangerkun - how did you find the original problem?

Would you mind using whatever stress test that caused commit
6d390e4b5d48 ("locks: fix a potential use-after-free problem when
wakeup a waiter") with this patch? And if you did it analytically,
you're a champ and should look at this patch too!

Thanks,

Linus

2020-03-16 22:46:41

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, Mar 16 2020, Jeff Layton wrote:

> @@ -740,6 +739,12 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> waiter->fl_lmops->lm_notify(waiter);
> else
> wake_up(&waiter->fl_wait);
> +
> + /*
> + * Tell the world we're done with it - see comment at
> + * top of locks_delete_block().
> + */
> + smp_store_release(&waiter->fl_blocker, NULL);
> }
> }
>
> @@ -753,11 +758,30 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread "owns"
> + * the lock and is the only one that might try to claim the lock.
> + * Because fl_blocker is explicitly set last during a delete, it's
> + * safe to locklessly test to see if it's NULL. If it is, then we know
> + * that no new locks can be inserted into its fl_blocked_requests list,
> + * and we can therefore avoid doing anything further as long as that
> + * list is empty.

I think it would be worth spelling out what the 'acquire' is needed
for. We seem to have a general policy of requiring comments to explain
the presence of barriers.

The 'acquire' on fl_blocker guarantees two things.
1/ that fl_blocked_requests can be tested locklessly. If something was
recently added to that list it must have been in a locked region
*before* the locked region when fl_blocker was set to NULL.
2/ that no other thread is accessing 'waiter', so it is safe to free it.
__locks_wake_up_blocks is careful not to touch waiter after
fl_blocker is released.


> + */
> + if (!smp_load_acquire(&waiter->fl_blocker) &&
> + list_empty(&waiter->fl_blocked_requests))
> + return status;
> +
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;
> __locks_wake_up_blocks(waiter);
> __locks_delete_block(waiter);
> +
> + /*
> + * Tell the world we're done with it - see comment at top
> + * of this function

This comment might be misleading. The world doesn't care.
Only this thread cares whether ->fl_blocker is NULL. We need the release
semantics when some *other* thread sets fl_blocker to NULL, not when
this thread does.
I don't think we need to spell that out and I'm not against using
store_release here, but locks_delete_block cannot race with itself, so
referring to the comment at the top of this function is misleading.

So:
Reviewed-by: NeilBrown <[email protected]>

but I'm not totally happy with the comments.

Thanks,
NeilBrown


> + */
> + smp_store_release(&waiter->fl_blocker, NULL);
> spin_unlock(&blocked_lock_lock);
> return status;
> }


Attachments:
signature.asc (847.00 B)

2020-03-17 02:05:13

by yangerkun

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression



On 2020/3/17 1:26, Linus Torvalds wrote:
> On Mon, Mar 16, 2020 at 4:07 AM Jeff Layton <[email protected]> wrote:
>>
>>
>> + /*
>> + * If fl_blocker is NULL, it won't be set again as this thread "owns"
>> + * the lock and is the only one that might try to claim the lock.
>> + * Because fl_blocker is explicitly set last during a delete, it's
>> + * safe to locklessly test to see if it's NULL. If it is, then we know
>> + * that no new locks can be inserted into its fl_blocked_requests list,
>> + * and we can therefore avoid doing anything further as long as that
>> + * list is empty.
>> + */
>> + if (!smp_load_acquire(&waiter->fl_blocker) &&
>> + list_empty(&waiter->fl_blocked_requests))
>> + return status;
>
> Ack. This looks sane to me now.
>
> yangerkun - how did you find the original problem?

While trying to fix CVE-2019-19769, I added some logging in
__locks_wake_up_blocks, which helped me reproduce the problem quickly
and pin it down.

>
> Would you mind using whatever stress test that caused commit
> 6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> wakeup a waiter") with this patch? And if you did it analytically,
> you're a champ and should look at this patch too!

I will try to understand this patch, and if it looks good to me, I will
do the performance test!

Thanks

2020-03-17 14:07:05

by yangerkun

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression



On 2020/3/17 9:41, yangerkun wrote:
>
>
> On 2020/3/17 1:26, Linus Torvalds wrote:
>> On Mon, Mar 16, 2020 at 4:07 AM Jeff Layton <[email protected]> wrote:
>>>
>>>
>>> +       /*
>>> +        * If fl_blocker is NULL, it won't be set again as this
>>> thread "owns"
>>> +        * the lock and is the only one that might try to claim the
>>> lock.
>>> +        * Because fl_blocker is explicitly set last during a delete,
>>> it's
>>> +        * safe to locklessly test to see if it's NULL. If it is,
>>> then we know
>>> +        * that no new locks can be inserted into its
>>> fl_blocked_requests list,
>>> +        * and we can therefore avoid doing anything further as long
>>> as that
>>> +        * list is empty.
>>> +        */
>>> +       if (!smp_load_acquire(&waiter->fl_blocker) &&
>>> +           list_empty(&waiter->fl_blocked_requests))
>>> +               return status;
>>
>> Ack. This looks sane to me now.
>>
>> yangerkun - how did you find the original problem?
>
> While trying to fix CVE-2019-19769, I added some logging in
> __locks_wake_up_blocks, which helped me reproduce the problem quickly
> and pin it down.
>
>>
>> Would you mind using whatever stress test that caused commit
>> 6d390e4b5d48 ("locks: fix a potential use-after-free problem when
>> wakeup a waiter") with this patch? And if you did it analytically,
>> you're a champ and should look at this patch too!
>
> I will try to understand this patch, and if it looks good to me, I will
> do the performance test!

This patch looks good to me. With it, the bug that '6d390e4b5d48
("locks: fix a potential use-after-free problem when wakeup a waiter")'
describes won't happen again. Actually, I found that syzkaller has
reported this bug before[1], and its log can help us reproduce the
problem by adding some latency in __locks_wake_up_blocks!
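
For example, something like the following (just a sketch of the kind of
delay I mean, not my exact debugging change; it also needs
<linux/delay.h> in fs/locks.c) makes the race easy to hit in the
pre-6d390e4b5d48 code, where __locks_delete_block() still cleared
fl_blocker:

	static void __locks_wake_up_blocks(struct file_lock *blocker)
	{
		while (!list_empty(&blocker->fl_blocked_requests)) {
			struct file_lock *waiter;

			waiter = list_first_entry(&blocker->fl_blocked_requests,
						  struct file_lock, fl_blocked_member);
			__locks_delete_block(waiter);	/* clears fl_blocker here */
			/*
			 * Debug only: widen the window so a signalled waiter can
			 * pass the lockless check in locks_delete_block(), return
			 * and free its file_lock before we touch it below.
			 */
			mdelay(100);
			if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
				waiter->fl_lmops->lm_notify(waiter);
			else
				wake_up(&waiter->fl_wait);
		}
	}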

Also, some LTP testcases described in [2] pass with the patch as well!

As for the performance test, I have tried to understand will-it-scale/lkp,
but it seems a little complex to me and may need some more time. So, Rong
Chen, can you help with this? Otherwise the results may come a little
later...

Thanks,
----
[1] https://syzkaller.appspot.com/bug?extid=922689db06e57b69c240
[2] https://lkml.org/lkml/2020/3/11/578

2020-03-17 16:00:25

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, 2020-03-17 at 09:45 +1100, NeilBrown wrote:
> > +
> > + /*
> > + * Tell the world we're done with it - see comment at top
> > + * of this function
>
> This comment might be misleading. The world doesn't care.
> Only this thread cares whether ->fl_blocker is NULL. We need the release
> semantics when some *other* thread sets fl_blocker to NULL, not when
> this thread does.
> I don't think we need to spell that out and I'm not against using
> store_release here, but locks_delete_block cannot race with itself, so
> referring to the comment at the top of this function is misleading.
>
> So:
> Reviewed-by: NeilBrown <[email protected]>
>
> but I'm not totally happy with the comments.
>
>

Thanks Neil. We can clean up the comments before merge. How about this
revision to the earlier patch? I took the liberty of poaching your
proposed verbiage:

------------------8<---------------------

From c9fbfae0ab615e20de0bdf1ae7b27591d602f577 Mon Sep 17 00:00:00 2001
From: Jeff Layton <[email protected]>
Date: Mon, 16 Mar 2020 18:57:47 -0400
Subject: [PATCH] SQUASH: update with Neil's comments

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index eaf754ecdaa8..e74075b0e8ec 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -741,8 +741,9 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
wake_up(&waiter->fl_wait);

/*
- * Tell the world we're done with it - see comment at
- * top of locks_delete_block().
+ * The setting of fl_blocker to NULL marks the official "done"
+ * point in deleting a block. Paired with acquire at the top
+ * of locks_delete_block().
*/
smp_store_release(&waiter->fl_blocker, NULL);
}
@@ -761,11 +762,23 @@ int locks_delete_block(struct file_lock *waiter)
/*
* If fl_blocker is NULL, it won't be set again as this thread "owns"
* the lock and is the only one that might try to claim the lock.
- * Because fl_blocker is explicitly set last during a delete, it's
- * safe to locklessly test to see if it's NULL. If it is, then we know
- * that no new locks can be inserted into its fl_blocked_requests list,
- * and we can therefore avoid doing anything further as long as that
- * list is empty.
+ *
+ * We use acquire/release to manage fl_blocker so that we can
+ * optimize away taking the blocked_lock_lock in many cases.
+ *
+ * The smp_load_acquire guarantees two things:
+ *
+ * 1/ that fl_blocked_requests can be tested locklessly. If something
+ * was recently added to that list it must have been in a locked region
+ * *before* the locked region when fl_blocker was set to NULL.
+ *
+ * 2/ that no other thread is accessing 'waiter', so it is safe to free
+ * it. __locks_wake_up_blocks is careful not to touch waiter after
+ * fl_blocker is released.
+ *
+ * If a lockless check of fl_blocker shows it to be NULL, we know that
+ * no new locks can be inserted into its fl_blocked_requests list, and
+ * can avoid doing anything further if the list is empty.
*/
if (!smp_load_acquire(&waiter->fl_blocker) &&
list_empty(&waiter->fl_blocked_requests))
@@ -778,8 +791,8 @@ int locks_delete_block(struct file_lock *waiter)
__locks_delete_block(waiter);

/*
- * Tell the world we're done with it - see comment at top
- * of this function
+ * The setting of fl_blocker to NULL marks the official "done" point in
+ * deleting a block. Paired with acquire at the top of this function.
*/
smp_store_release(&waiter->fl_blocker, NULL);
spin_unlock(&blocked_lock_lock);
--
2.24.1

2020-03-17 16:10:15

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, 2020-03-17 at 22:05 +0800, yangerkun wrote:
>
> On 2020/3/17 9:41, yangerkun wrote:
> >
> > On 2020/3/17 1:26, Linus Torvalds wrote:
> > > On Mon, Mar 16, 2020 at 4:07 AM Jeff Layton <[email protected]> wrote:
> > > >
> > > > + /*
> > > > + * If fl_blocker is NULL, it won't be set again as this
> > > > thread "owns"
> > > > + * the lock and is the only one that might try to claim the
> > > > lock.
> > > > + * Because fl_blocker is explicitly set last during a delete,
> > > > it's
> > > > + * safe to locklessly test to see if it's NULL. If it is,
> > > > then we know
> > > > + * that no new locks can be inserted into its
> > > > fl_blocked_requests list,
> > > > + * and we can therefore avoid doing anything further as long
> > > > as that
> > > > + * list is empty.
> > > > + */
> > > > + if (!smp_load_acquire(&waiter->fl_blocker) &&
> > > > + list_empty(&waiter->fl_blocked_requests))
> > > > + return status;
> > >
> > > Ack. This looks sane to me now.
> > >
> > > yangerkun - how did you find the original problem?
> >
> > While trying to fix CVE-2019-19769, I added some logging in
> > __locks_wake_up_blocks, which helped me reproduce the problem quickly
> > and pin it down.
> >
> > > Would you mind using whatever stress test that caused commit
> > > 6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> > > wakeup a waiter") with this patch? And if you did it analytically,
> > > you're a champ and should look at this patch too!
> >
> > I will try to understand this patch, and if it looks good to me, I will
> > do the performance test!
>
> This patch looks good to me. With it, the bug that '6d390e4b5d48
> ("locks: fix a potential use-after-free problem when wakeup a waiter")'
> describes won't happen again. Actually, I found that syzkaller has
> reported this bug before[1], and its log can help us reproduce the
> problem by adding some latency in __locks_wake_up_blocks!
>
> Also, some LTP testcases described in [2] pass with the patch as well!
>
> As for the performance test, I have tried to understand will-it-scale/lkp,
> but it seems a little complex to me and may need some more time. So, Rong
> Chen, can you help with this? Otherwise the results may come a little
> later...
>
> Thanks,
> ----
> [1] https://syzkaller.appspot.com/bug?extid=922689db06e57b69c240
> [2] https://lkml.org/lkml/2020/3/11/578

Thanks yangerkun. Let me know if you want to add your Reviewed-by tag.

Cheers,
--
Jeff Layton <[email protected]>

2020-03-17 21:29:26

by NeilBrown

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Tue, Mar 17 2020, Jeff Layton wrote:

> On Tue, 2020-03-17 at 09:45 +1100, NeilBrown wrote:
>> > +
>> > + /*
>> > + * Tell the world we're done with it - see comment at top
>> > + * of this function
>>
>> This comment might be misleading. The world doesn't care.
>> Only this thread cares whether ->fl_blocker is NULL. We need the release
>> semantics when some *other* thread sets fl_blocker to NULL, not when
>> this thread does.
>> I don't think we need to spell that out and I'm not against using
>> store_release here, but locks_delete_block cannot race with itself, so
>> referring to the comment at the top of this function is misleading.
>>
>> So:
>> Reviewed-by: NeilBrown <[email protected]>
>>
>> but I'm not totally happy with the comments.
>>
>>
>
> Thanks Neil. We can clean up the comments before merge. How about this
> revision to the earlier patch? I took the liberty of poaching your
> proposed verbiage:

Thanks. I'm happy with that.

(Well.... actually I hate the use of the word "official" unless there is
a well defined office holder being blamed. But the word has come to
mean something vaguer in common usage and there is probably no point
fighting it. In this case "formal" is close but less personally
annoying, but I'm not sure the word is needed at all).

Thanks,
NeilBrown


>
> ------------------8<---------------------
>
> From c9fbfae0ab615e20de0bdf1ae7b27591d602f577 Mon Sep 17 00:00:00 2001
> From: Jeff Layton <[email protected]>
> Date: Mon, 16 Mar 2020 18:57:47 -0400
> Subject: [PATCH] SQUASH: update with Neil's comments
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/locks.c | 31 ++++++++++++++++++++++---------
> 1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index eaf754ecdaa8..e74075b0e8ec 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -741,8 +741,9 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> wake_up(&waiter->fl_wait);
>
> /*
> - * Tell the world we're done with it - see comment at
> - * top of locks_delete_block().
> + * The setting of fl_blocker to NULL marks the official "done"
> + * point in deleting a block. Paired with acquire at the top
> + * of locks_delete_block().
> */
> smp_store_release(&waiter->fl_blocker, NULL);
> }
> @@ -761,11 +762,23 @@ int locks_delete_block(struct file_lock *waiter)
> /*
> * If fl_blocker is NULL, it won't be set again as this thread "owns"
> * the lock and is the only one that might try to claim the lock.
> - * Because fl_blocker is explicitly set last during a delete, it's
> - * safe to locklessly test to see if it's NULL. If it is, then we know
> - * that no new locks can be inserted into its fl_blocked_requests list,
> - * and we can therefore avoid doing anything further as long as that
> - * list is empty.
> + *
> + * We use acquire/release to manage fl_blocker so that we can
> + * optimize away taking the blocked_lock_lock in many cases.
> + *
> + * The smp_load_acquire guarantees two things:
> + *
> + * 1/ that fl_blocked_requests can be tested locklessly. If something
> + * was recently added to that list it must have been in a locked region
> + * *before* the locked region when fl_blocker was set to NULL.
> + *
> + * 2/ that no other thread is accessing 'waiter', so it is safe to free
> + * it. __locks_wake_up_blocks is careful not to touch waiter after
> + * fl_blocker is released.
> + *
> + * If a lockless check of fl_blocker shows it to be NULL, we know that
> + * no new locks can be inserted into its fl_blocked_requests list, and
> + * can avoid doing anything further if the list is empty.
> */
> if (!smp_load_acquire(&waiter->fl_blocker) &&
> list_empty(&waiter->fl_blocked_requests))
> @@ -778,8 +791,8 @@ int locks_delete_block(struct file_lock *waiter)
> __locks_delete_block(waiter);
>
> /*
> - * Tell the world we're done with it - see comment at top
> - * of this function
> + * The setting of fl_blocker to NULL marks the official "done" point in
> + * deleting a block. Paired with acquire at the top of this function.
> */
> smp_store_release(&waiter->fl_blocker, NULL);
> spin_unlock(&blocked_lock_lock);
> --
> 2.24.1


Attachments:
signature.asc (847.00 B)

2020-03-18 01:10:12

by yangerkun

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression



On 2020/3/18 0:07, Jeff Layton wrote:
> On Tue, 2020-03-17 at 22:05 +0800, yangerkun wrote:
>>
>> On 2020/3/17 9:41, yangerkun wrote:
>>>
>>> On 2020/3/17 1:26, Linus Torvalds wrote:
>>>> On Mon, Mar 16, 2020 at 4:07 AM Jeff Layton <[email protected]> wrote:
>>>>>
>>>>> + /*
>>>>> + * If fl_blocker is NULL, it won't be set again as this
>>>>> thread "owns"
>>>>> + * the lock and is the only one that might try to claim the
>>>>> lock.
>>>>> + * Because fl_blocker is explicitly set last during a delete,
>>>>> it's
>>>>> + * safe to locklessly test to see if it's NULL. If it is,
>>>>> then we know
>>>>> + * that no new locks can be inserted into its
>>>>> fl_blocked_requests list,
>>>>> + * and we can therefore avoid doing anything further as long
>>>>> as that
>>>>> + * list is empty.
>>>>> + */
>>>>> + if (!smp_load_acquire(&waiter->fl_blocker) &&
>>>>> + list_empty(&waiter->fl_blocked_requests))
>>>>> + return status;
>>>>
>>>> Ack. This looks sane to me now.
>>>>
>>>> yangerkun - how did you find the original problem?
>>>
>>> While trying to fix CVE-2019-19769, I added some logging in
>>> __locks_wake_up_blocks, which helped me reproduce the problem quickly
>>> and pin it down.
>>>
>>>> Would you mind using whatever stress test that caused commit
>>>> 6d390e4b5d48 ("locks: fix a potential use-after-free problem when
>>>> wakeup a waiter") with this patch? And if you did it analytically,
>>>> you're a champ and should look at this patch too!
>>>
>>> I will try to understand this patch, and if it looks good to me, I will
>>> do the performance test!
>>
>> This patch looks good to me. With it, the bug that '6d390e4b5d48
>> ("locks: fix a potential use-after-free problem when wakeup a waiter")'
>> describes won't happen again. Actually, I found that syzkaller has
>> reported this bug before[1], and its log can help us reproduce the
>> problem by adding some latency in __locks_wake_up_blocks!
>>
>> Also, some LTP testcases described in [2] pass with the patch as well!
>>
>> As for the performance test, I have tried to understand will-it-scale/lkp,
>> but it seems a little complex to me and may need some more time. So, Rong
>> Chen, can you help with this? Otherwise the results may come a little
>> later...
>>
>> Thanks,
>> ----
>> [1] https://syzkaller.appspot.com/bug?extid=922689db06e57b69c240
>> [2] https://lkml.org/lkml/2020/3/11/578
>
> Thanks yangerkun. Let me know if you want to add your Reviewed-by tag.

Yeah, you can add:

Reviewed-by: yangerkun <[email protected]>

>
> Cheers,
>

2020-03-18 05:14:00

by Chen, Rong A

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, Mar 16, 2020 at 07:07:24AM -0400, Jeff Layton wrote:
> On Mon, 2020-03-16 at 16:06 +1100, NeilBrown wrote:
>
> [...]
>
> > No, we really do need fl_blocked_requests to be empty.
> > After fl_blocker is cleared, the owner might check for other blockers
> > and might queue behind them leaving the blocked requests in place.
> > Or it might have to detach all those blocked requests and wake them up
> > so they can go and fend for themselves.
> >
> > I think the worst-case scenario could go something like this.
> > Process A gets a lock - Al
> > Process B tries to get a conflicting lock and blocks Bl -> Al
> > Process C tries to get a conflicting lock and blocks on B:
> > Cl -> Bl -> Al
> >
> > At much the same time that C goes to attach Cl to Bl, A
> > calls unlock and B gets signaled.
> >
> > So A is calling locks_wake_up_blocks(Al) - which takes blocked_lock_lock.
> > C is calling locks_insert_block(Bl, Cl) - which also takes the lock
> > B is calling locks_delete_block(Bl) which might not take the lock.
> >
> > Assume C gets the lock first.
> >
> > Before C calls locks_insert_block, Bl->fl_blocked_requests is empty.
> > After A finishes in locks_wake_up_blocks, Bl->fl_blocker is NULL
> >
> > If B sees that fl_blocker is NULL, we need it to see that
> > fl_blocked_requests is no longer empty, so that it takes the lock and
> > cleans up fl_blocked_requests.
> >
> > If the list_empty test on fl_blocked_requests goes after the fl_blocker
> > test, the memory barriers we have should assure that. I had thought
> > that it would need an extra barrier, but as a spinlock places the change
> > to fl_blocked_requests *before* the change to fl_blocker, I no longer
> > think that is needed.
>
> Got it. I was thinking all of the waiters of a blocker would already be
> awoken once fl_blocker was set to NULL, but you're correct and they
> aren't. How about this?

Hi,

We tested the patch and confirmed it can fix the regression:

commit:
0a68ff5e2e ("fcntl: Distribute switch variables for initialization")
6d390e4b5d ("locks: fix a potential use-after-free problem when wakeup a waiter")
3063690b0e ("locks: reinstate locks_delete_block optimization")

0a68ff5e2e7cf226  6d390e4b5d48ec03bb87e63cf0  3063690b0ef0089115914f366a  testcase/testparams/testbox
----------------  --------------------------  --------------------------  ---------------------------
         %stddev      change         %stddev      change         %stddev
             \            |              \            |              \
     66597 ±  3%        -97%           2260                       67062    will-it-scale/performance-process-100%-lock1-ucode=0x11/lkp-knm01
     66597              -97%           2260                       67062    GEO-MEAN will-it-scale.per_process_ops

Best Regards,
Rong Chen

>
> -----------------8<------------------
>
> From f40e865842ae84a9d465ca9edb66f0985c1587d4 Mon Sep 17 00:00:00 2001
> From: Linus Torvalds <[email protected]>
> Date: Mon, 9 Mar 2020 14:35:43 -0400
> Subject: [PATCH] locks: reinstate locks_delete_block optimization
>
> There is measurable performance impact in some synthetic tests due to
> commit 6d390e4b5d48 (locks: fix a potential use-after-free problem when
> wakeup a waiter). Fix the race condition instead by clearing the
> fl_blocker pointer after the wake_up, using explicit acquire/release
> semantics.
>
> This does mean that we can no longer use the clearing of fl_blocker as
> the wait condition, so switch the waiters over to checking whether the
> fl_blocked_member list_head is empty.
>
> Cc: yangerkun <[email protected]>
> Cc: NeilBrown <[email protected]>
> Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter)
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/cifs/file.c | 3 ++-
> fs/locks.c | 41 +++++++++++++++++++++++++++++++++++------
> 2 files changed, 37 insertions(+), 7 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 3b942ecdd4be..8f9d849a0012 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -1169,7 +1169,8 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
> rc = posix_lock_file(file, flock, NULL);
> up_write(&cinode->lock_sem);
> if (rc == FILE_LOCK_DEFERRED) {
> - rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
> + rc = wait_event_interruptible(flock->fl_wait,
> + list_empty(&flock->fl_blocked_member));
> if (!rc)
> goto try_again;
> locks_delete_block(flock);
> diff --git a/fs/locks.c b/fs/locks.c
> index 426b55d333d5..eaf754ecdaa8 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -725,7 +725,6 @@ static void __locks_delete_block(struct file_lock *waiter)
> {
> locks_delete_global_blocked(waiter);
> list_del_init(&waiter->fl_blocked_member);
> - waiter->fl_blocker = NULL;
> }
>
> static void __locks_wake_up_blocks(struct file_lock *blocker)
> @@ -740,6 +739,12 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
> waiter->fl_lmops->lm_notify(waiter);
> else
> wake_up(&waiter->fl_wait);
> +
> + /*
> + * Tell the world we're done with it - see comment at
> + * top of locks_delete_block().
> + */
> + smp_store_release(&waiter->fl_blocker, NULL);
> }
> }
>
> @@ -753,11 +758,30 @@ int locks_delete_block(struct file_lock *waiter)
> {
> int status = -ENOENT;
>
> + /*
> + * If fl_blocker is NULL, it won't be set again as this thread "owns"
> + * the lock and is the only one that might try to claim the lock.
> + * Because fl_blocker is explicitly set last during a delete, it's
> + * safe to locklessly test to see if it's NULL. If it is, then we know
> + * that no new locks can be inserted into its fl_blocked_requests list,
> + * and we can therefore avoid doing anything further as long as that
> + * list is empty.
> + */
> + if (!smp_load_acquire(&waiter->fl_blocker) &&
> + list_empty(&waiter->fl_blocked_requests))
> + return status;
> +
> spin_lock(&blocked_lock_lock);
> if (waiter->fl_blocker)
> status = 0;
> __locks_wake_up_blocks(waiter);
> __locks_delete_block(waiter);
> +
> + /*
> + * Tell the world we're done with it - see comment at top
> + * of this function
> + */
> + smp_store_release(&waiter->fl_blocker, NULL);
> spin_unlock(&blocked_lock_lock);
> return status;
> }
> @@ -1350,7 +1374,8 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> error = posix_lock_inode(inode, fl, NULL);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> + error = wait_event_interruptible(fl->fl_wait,
> + list_empty(&fl->fl_blocked_member));
> if (error)
> break;
> }
> @@ -1435,7 +1460,8 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start,
> error = posix_lock_inode(inode, &fl, NULL);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker);
> + error = wait_event_interruptible(fl.fl_wait,
> + list_empty(&fl.fl_blocked_member));
> if (!error) {
> /*
> * If we've been sleeping someone might have
> @@ -1638,7 +1664,8 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>
> locks_dispose_list(&dispose);
> error = wait_event_interruptible_timeout(new_fl->fl_wait,
> - !new_fl->fl_blocker, break_time);
> + list_empty(&new_fl->fl_blocked_member),
> + break_time);
>
> percpu_down_read(&file_rwsem);
> spin_lock(&ctx->flc_lock);
> @@ -2122,7 +2149,8 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
> error = flock_lock_inode(inode, fl);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> + error = wait_event_interruptible(fl->fl_wait,
> + list_empty(&fl->fl_blocked_member));
> if (error)
> break;
> }
> @@ -2399,7 +2427,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
> error = vfs_lock_file(filp, cmd, fl, NULL);
> if (error != FILE_LOCK_DEFERRED)
> break;
> - error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
> + error = wait_event_interruptible(fl->fl_wait,
> + list_empty(&fl->fl_blocked_member));
> if (error)
> break;
> }
> --
> 2.24.1
>


2020-03-19 17:53:53

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Mon, 2020-03-16 at 10:26 -0700, Linus Torvalds wrote:
> On Mon, Mar 16, 2020 at 4:07 AM Jeff Layton <[email protected]> wrote:
> >
> > + /*
> > + * If fl_blocker is NULL, it won't be set again as this thread "owns"
> > + * the lock and is the only one that might try to claim the lock.
> > + * Because fl_blocker is explicitly set last during a delete, it's
> > + * safe to locklessly test to see if it's NULL. If it is, then we know
> > + * that no new locks can be inserted into its fl_blocked_requests list,
> > + * and we can therefore avoid doing anything further as long as that
> > + * list is empty.
> > + */
> > + if (!smp_load_acquire(&waiter->fl_blocker) &&
> > + list_empty(&waiter->fl_blocked_requests))
> > + return status;
>
> Ack. This looks sane to me now.
>
> yangerkun - how did you find the original problem?
>
> Would you mind using whatever stress test that caused commit
> 6d390e4b5d48 ("locks: fix a potential use-after-free problem when
> wakeup a waiter") with this patch? And if you did it analytically,
> you're a champ and should look at this patch too!
>

Thanks for all the help with this.

Yangerkun gave me his Reviewed-by and I sent you the most recent version
of the patch yesterday (cc'ing the relevant mailing lists). I left you
as author as the original patch was yours.

Let me know if you'd prefer I send a pull request instead.

Cheers,
--
Jeff Layton <[email protected]>

2020-03-19 19:24:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Thu, Mar 19, 2020 at 10:52 AM Jeff Layton <[email protected]> wrote:
>
> Yangerkun gave me his Reviewed-by and I sent you the most recent version
> of the patch yesterday (cc'ing the relevant mailing lists). I left you
> as author as the original patch was yours.
>
> Let me know if you'd prefer I send a pull request instead.

Is that patch the only thing you have pending?

If you have other things, send me a pull request, otherwise just let
me know and I'll apply the patch directly.

Linus

2020-03-19 19:25:36

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Thu, 2020-03-19 at 12:23 -0700, Linus Torvalds wrote:
> On Thu, Mar 19, 2020 at 10:52 AM Jeff Layton <[email protected]> wrote:
> > Yangerkun gave me his Reviewed-by and I sent you the most recent version
> > of the patch yesterday (cc'ing the relevant mailing lists). I left you
> > as author as the original patch was yours.
> >
> > Let me know if you'd prefer I send a pull request instead.
>
> Is that patch the only thing you have pending?
>
> If you have other things, send me a pull request, otherwise just let
> me know and I'll apply the patch directly.

That's it for now.

Thanks,
--
Jeff Layton <[email protected]>

2020-03-19 19:37:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Thu, Mar 19, 2020 at 12:24 PM Jeff Layton <[email protected]> wrote:
>
> >
> > If you have other things, send me a pull request, otherwise just let
> > me know and I'll apply the patch directly.
>
> That's it for now.

Lol. You confused me with your question of whether I wanted a pull
request or not.

I had already applied the patch as dcf23ac3e846 ("locks: reinstate
locks_delete_block optimization") yesterday ;)

Linus

2020-03-19 20:10:56

by Jeff Layton

[permalink] [raw]
Subject: Re: [locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression

On Thu, 2020-03-19 at 12:35 -0700, Linus Torvalds wrote:
> On Thu, Mar 19, 2020 at 12:24 PM Jeff Layton <[email protected]> wrote:
> > > If you have other things, send me a pull request, otherwise just let
> > > me know and I'll apply the patch directly.
> >
> > That's it for now.
>
> Lol. You confused me with your question of whether I wanted a pull
> request or not.
>
> I had already applied the patch as dcf23ac3e846 ("locks: reinstate
> locks_delete_block optimization") yesterday ;)
>

Sorry about that! I did a pull this morning and didn't see it, you must
have pushed afterward. Thanks again for picking it up.

--
Jeff Layton <[email protected]>