Hi Jeff,
Thanks for the great feedback on version 0. I haven't addressed all your
points yet but will do so. Just wanted to post a more readable version.
Still missing things and TODOs (FIXME!!):
- adding lockdep assertions
- more tests and benchmarks on big machines
v1:
- rebased on v3.19-8975-g3d88348
- split the series into smaller pieces
- fixed a wrong usage of __locks_insert/delete_block() and its posix version
- added seqfile helpers to avoid an ugly open-coded version
Original cover letter:
I am looking at how to get rid of lglock. The reason is that -rt is not
too happy with that lock, especially since it uses arch_spinlock_t and
is therefore not converted into a mutex on -rt. I know that a change
which only fixes something for -rt alone won't be accepted. So here is
my attempt to make things faster for mainline while also fixing -rt.
There are two users of lglock at this point: fs/locks.c and
kernel/stop_machine.c.
I presume fs/locks.c is the more interesting one with respect to
performance, so let's have a look at that one first.
The lglock version of file_lock_lock is used in combination with
blocked_lock_lock to protect file_lock's fl_link, fl_block and fl_next
fields, the blocked_hash, and the percpu file_lock_list.
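For reference, the fields in question look roughly like this (abridged
excerpt only; see the full struct file_lock in include/linux/fs.h):

	struct file_lock {
		struct file_lock *fl_next;  /* singly linked list for this inode */
		struct hlist_node fl_link;  /* node in global lists */
		struct list_head fl_block;  /* circular list of blocked processes */
		int fl_link_cpu;            /* what cpu's list is this on? */
		/* ... owner, flags, type, range, ops, etc. ... */
	};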
The plan is to reorganize the usage of the locks and what they protect
so that the use of the global blocked_lock_lock is reduced.
Whenever we insert a new lock we grab, besides the i_lock, also the
corresponding percpu file_lock_lock. The global blocked_lock_lock is
only used when the blocked_hash is involved.
file_lock_list exists only to be able to produce the content of
/proc/locks. For listing all the locks it seems a bit excessive to grab
all the percpu locks at once. We should be okay just grabbing the
corresponding lock while iterating over each percpu file_lock_list,
roughly as sketched below.
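Just a sketch of the idea; show_one_lock() is a made-up stand-in for
the /proc/locks output code, and the real iteration helpers are
introduced in the seqfile patch below:

	int cpu;
	struct file_lock *fl;

	for_each_possible_cpu(cpu) {
		/* only freeze the list of the CPU we are currently walking */
		spin_lock(per_cpu_ptr(&file_lock_lock, cpu));
		hlist_for_each_entry(fl, per_cpu_ptr(&file_lock_list, cpu), fl_link)
			show_one_lock(fl);	/* emit one /proc/locks line */
		spin_unlock(per_cpu_ptr(&file_lock_lock, cpu));
	}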
file_lock_lock now protects file_lock_list and the fl_link, fl_block
and fl_next fields alone. That means we need to define which
file_lock_lock is used for all waiters. Luckily, fl_link_cpu can be
reused for fl_block and fl_next.
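In other words, a waiter is always serialized via its blocker's per-CPU
lock. A minimal sketch of the insert path (insert_waiter() is a made-up
name; the real changes are in the last patch):

	/*
	 * Sketch only. Lock order: flc_lock -> file_lock_lock (per CPU)
	 * -> blocked_lock_lock (the latter only for the posix blocked_hash).
	 */
	static void insert_waiter(struct file_lock *blocker,
				  struct file_lock *waiter)
	{
		spin_lock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
		waiter->fl_link_cpu = blocker->fl_link_cpu; /* inherit blocker's CPU */
		waiter->fl_next = blocker;
		list_add_tail(&waiter->fl_block, &blocker->fl_block);
		spin_unlock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
	}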
I haven't found a good way around the open-coded seq_ops (locks_start,
locks_next, locks_stop). Maybe someone has a good idea how to handle
the locking there.
For performance testing I used
git://git.samba.org/jlayton/lockperf.git and for correctness
https://github.com/linux-test-project/ltp/tree/master/testcases/network/nfsv4/locks
In case you are missing the posix03 results: my machine doesn't like
that test too much. The very high load brings it to its knees.
Probably I need different parameters.
I haven't run extensive tests so far, because I am waiting to get
access to a bigger box than my small i7-4850HQ system. I hope to see
larger improvements when more cores are involved.
Anyway, here are some results based on 3.19.
Explanation:
3.19: unpatched kernel
3.19.0+: patched kernel
-with-reader.data: 'while true; do cat /proc/locks; done > /dev/null'
3.19/flock01.data
# NumSamples = 117; Min = 853.77; Max = 1425.62
# Mean = 1387.965754; Variance = 5512.265923; SD = 74.244636; Median 1408.578537
# each ∎ represents a count of 1
853.7746 - 910.9590 [ 1]: ∎
910.9590 - 968.1435 [ 0]:
968.1435 - 1025.3280 [ 1]: ∎
1025.3280 - 1082.5124 [ 0]:
1082.5124 - 1139.6969 [ 0]:
1139.6969 - 1196.8813 [ 2]: ∎∎
1196.8813 - 1254.0658 [ 1]: ∎
1254.0658 - 1311.2502 [ 1]: ∎
1311.2502 - 1368.4347 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎
1368.4347 - 1425.6192 [ 100]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.19.0+/flock01.data
# NumSamples = 124; Min = 1415.92; Max = 1459.39
# Mean = 1430.258322; Variance = 36.425769; SD = 6.035376; Median 1429.462718
# each ∎ represents a count of 1
1415.9192 - 1420.2667 [ 2]: ∎∎
1420.2667 - 1424.6142 [ 19]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1424.6142 - 1428.9617 [ 36]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1428.9617 - 1433.3092 [ 32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1433.3092 - 1437.6567 [ 22]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1437.6567 - 1442.0042 [ 10]: ∎∎∎∎∎∎∎∎∎∎
1442.0042 - 1446.3517 [ 2]: ∎∎
1446.3517 - 1450.6992 [ 0]:
1450.6992 - 1455.0467 [ 0]:
1455.0467 - 1459.3942 [ 1]: ∎
3.19/flock01-with-reader.data
# NumSamples = 97; Min = 1342.97; Max = 1423.54
# Mean = 1410.695019; Variance = 149.115584; SD = 12.211289; Median 1413.260338
# each ∎ represents a count of 1
1342.9675 - 1351.0245 [ 1]: ∎
1351.0245 - 1359.0815 [ 0]:
1359.0815 - 1367.1384 [ 1]: ∎
1367.1384 - 1375.1954 [ 1]: ∎
1375.1954 - 1383.2524 [ 1]: ∎
1383.2524 - 1391.3093 [ 4]: ∎∎∎∎
1391.3093 - 1399.3663 [ 0]:
1399.3663 - 1407.4233 [ 10]: ∎∎∎∎∎∎∎∎∎∎
1407.4233 - 1415.4803 [ 45]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1415.4803 - 1423.5372 [ 34]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.19.0+/flock01-with-reader.data
# NumSamples = 104; Min = 1331.30; Max = 1349.97
# Mean = 1340.767603; Variance = 16.434314; SD = 4.053926; Median 1340.243416
# each ∎ represents a count of 1
1331.3015 - 1333.1680 [ 2]: ∎∎
1333.1680 - 1335.0345 [ 6]: ∎∎∎∎∎∎
1335.0345 - 1336.9011 [ 9]: ∎∎∎∎∎∎∎∎∎
1336.9011 - 1338.7676 [ 17]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1338.7676 - 1340.6341 [ 21]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1340.6341 - 1342.5006 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎
1342.5006 - 1344.3671 [ 17]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1344.3671 - 1346.2336 [ 10]: ∎∎∎∎∎∎∎∎∎∎
1346.2336 - 1348.1001 [ 4]: ∎∎∎∎
1348.1001 - 1349.9666 [ 7]: ∎∎∎∎∎∎∎
3.19/flock02.data
# NumSamples = 2726; Min = 5.33; Max = 65.40
# Mean = 13.524542; Variance = 15.739906; SD = 3.967355; Median 12.857334
# each ∎ represents a count of 32
5.3260 - 11.3331 [ 167]: ∎∎∎∎∎
11.3331 - 17.3402 [ 2435]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
17.3402 - 23.3472 [ 90]: ∎∎
23.3472 - 29.3543 [ 4]:
29.3543 - 35.3614 [ 6]:
35.3614 - 41.3684 [ 6]:
41.3684 - 47.3755 [ 1]:
47.3755 - 53.3826 [ 11]:
53.3826 - 59.3897 [ 3]:
59.3897 - 65.3967 [ 3]:
3.19.0+/flock02.data
# NumSamples = 2226; Min = 9.69; Max = 45.76
# Mean = 13.140839; Variance = 2.468176; SD = 1.571043; Median 12.936904
# each ∎ represents a count of 18
9.6911 - 13.2975 [ 1366]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
13.2975 - 16.9040 [ 806]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16.9040 - 20.5104 [ 51]: ∎∎
20.5104 - 24.1168 [ 0]:
24.1168 - 27.7232 [ 0]:
27.7232 - 31.3297 [ 1]:
31.3297 - 34.9361 [ 1]:
34.9361 - 38.5425 [ 0]:
38.5425 - 42.1489 [ 0]:
42.1489 - 45.7554 [ 1]:
3.19/flock02-with-reader.data
# NumSamples = 2719; Min = 2.59; Max = 67.27
# Mean = 14.488231; Variance = 8.927716; SD = 2.987928; Median 14.103543
# each ∎ represents a count of 30
2.5949 - 9.0619 [ 2]:
9.0619 - 15.5289 [ 2251]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
15.5289 - 21.9960 [ 441]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎
21.9960 - 28.4630 [ 15]:
28.4630 - 34.9300 [ 0]:
34.9300 - 41.3970 [ 1]:
41.3970 - 47.8640 [ 4]:
47.8640 - 54.3310 [ 1]:
54.3310 - 60.7980 [ 3]:
60.7980 - 67.2650 [ 1]:
3.19.0+/flock02-with-reader.data
# NumSamples = 2539; Min = 9.95; Max = 23.41
# Mean = 13.993072; Variance = 2.729366; SD = 1.652079; Median 13.800728
# each ∎ represents a count of 13
9.9482 - 11.2940 [ 47]: ∎∎∎
11.2940 - 12.6398 [ 399]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
12.6398 - 13.9855 [ 1016]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
13.9855 - 15.3313 [ 732]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
15.3313 - 16.6771 [ 190]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16.6771 - 18.0229 [ 70]: ∎∎∎∎∎
18.0229 - 19.3687 [ 41]: ∎∎∎
19.3687 - 20.7145 [ 28]: ∎∎
20.7145 - 22.0602 [ 13]: ∎
22.0602 - 23.4060 [ 3]:
3.19/lease01.data
# NumSamples = 65; Min = 152.59; Max = 175.66
# Mean = 167.052274; Variance = 23.396889; SD = 4.837033; Median 166.765499
# each ∎ represents a count of 1
152.5900 - 154.8972 [ 1]: ∎
154.8972 - 157.2045 [ 1]: ∎
157.2045 - 159.5117 [ 4]: ∎∎∎∎
159.5117 - 161.8190 [ 4]: ∎∎∎∎
161.8190 - 164.1262 [ 5]: ∎∎∎∎∎
164.1262 - 166.4335 [ 15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
166.4335 - 168.7407 [ 10]: ∎∎∎∎∎∎∎∎∎∎
168.7407 - 171.0480 [ 12]: ∎∎∎∎∎∎∎∎∎∎∎∎
171.0480 - 173.3552 [ 6]: ∎∎∎∎∎∎
173.3552 - 175.6625 [ 7]: ∎∎∎∎∎∎∎
3.19.0+/lease01.data
# NumSamples = 602; Min = 145.21; Max = 181.15
# Mean = 167.448570; Variance = 32.250924; SD = 5.678990; Median 167.925163
# each ∎ represents a count of 2
145.2090 - 148.8032 [ 4]: ∎∎
148.8032 - 152.3974 [ 8]: ∎∎∎∎
152.3974 - 155.9916 [ 14]: ∎∎∎∎∎∎∎
155.9916 - 159.5858 [ 31]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
159.5858 - 163.1800 [ 44]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
163.1800 - 166.7742 [ 145]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
166.7742 - 170.3684 [ 173]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
170.3684 - 173.9626 [ 126]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
173.9626 - 177.5568 [ 41]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
177.5568 - 181.1510 [ 16]: ∎∎∎∎∎∎∎∎
3.19/lease01-with-reader.data
# NumSamples = 76; Min = 137.96; Max = 179.57
# Mean = 163.911898; Variance = 69.281798; SD = 8.323569; Median 164.218926
# each ∎ represents a count of 1
137.9570 - 142.1184 [ 1]: ∎
142.1184 - 146.2799 [ 0]:
146.2799 - 150.4414 [ 6]: ∎∎∎∎∎∎
150.4414 - 154.6028 [ 2]: ∎∎
154.6028 - 158.7643 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎
158.7643 - 162.9257 [ 15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
162.9257 - 167.0872 [ 10]: ∎∎∎∎∎∎∎∎∎∎
167.0872 - 171.2487 [ 19]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
171.2487 - 175.4101 [ 7]: ∎∎∎∎∎∎∎
175.4101 - 179.5716 [ 5]: ∎∎∎∎∎
3.19.0+/lease01-with-reader.data
# NumSamples = 216; Min = 144.76; Max = 176.76
# Mean = 160.680593; Variance = 44.032938; SD = 6.635732; Median 160.827124
# each ∎ represents a count of 1
144.7606 - 147.9610 [ 8]: ∎∎∎∎∎∎∎∎
147.9610 - 151.1614 [ 16]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
151.1614 - 154.3618 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎
154.3618 - 157.5622 [ 30]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
157.5622 - 160.7626 [ 42]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
160.7626 - 163.9630 [ 41]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
163.9630 - 167.1634 [ 32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
167.1634 - 170.3638 [ 24]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
170.3638 - 173.5642 [ 6]: ∎∎∎∎∎∎
173.5642 - 176.7646 [ 6]: ∎∎∎∎∎∎
3.19/lease02.data
# NumSamples = 306; Min = 43.22; Max = 64.80
# Mean = 59.318955; Variance = 9.779672; SD = 3.127247; Median 59.907252
# each ∎ represents a count of 1
43.2247 - 45.3825 [ 1]: ∎
45.3825 - 47.5403 [ 1]: ∎
47.5403 - 49.6981 [ 1]: ∎
49.6981 - 51.8560 [ 8]: ∎∎∎∎∎∎∎∎
51.8560 - 54.0138 [ 10]: ∎∎∎∎∎∎∎∎∎∎
54.0138 - 56.1716 [ 20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
56.1716 - 58.3294 [ 43]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
58.3294 - 60.4872 [ 93]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
60.4872 - 62.6450 [ 104]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
62.6450 - 64.8028 [ 25]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.19.0+/lease02.data
# NumSamples = 527; Min = 50.49; Max = 64.11
# Mean = 59.436887; Variance = 4.335719; SD = 2.082239; Median 59.632461
# each ∎ represents a count of 1
50.4855 - 51.8483 [ 2]: ∎∎
51.8483 - 53.2112 [ 1]: ∎
53.2112 - 54.5741 [ 5]: ∎∎∎∎∎
54.5741 - 55.9369 [ 26]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
55.9369 - 57.2998 [ 47]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
57.2998 - 58.6627 [ 92]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
58.6627 - 60.0255 [ 124]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
60.0255 - 61.3884 [ 127]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
61.3884 - 62.7513 [ 89]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
62.7513 - 64.1141 [ 14]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.19/lease02-with-reader.data
# NumSamples = 287; Min = 43.36; Max = 61.39
# Mean = 56.435938; Variance = 9.470188; SD = 3.077367; Median 57.268150
# each ∎ represents a count of 1
43.3599 - 45.1627 [ 1]: ∎
45.1627 - 46.9655 [ 4]: ∎∎∎∎
46.9655 - 48.7684 [ 5]: ∎∎∎∎∎
48.7684 - 50.5712 [ 8]: ∎∎∎∎∎∎∎∎
50.5712 - 52.3741 [ 10]: ∎∎∎∎∎∎∎∎∎∎
52.3741 - 54.1769 [ 25]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
54.1769 - 55.9797 [ 37]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
55.9797 - 57.7826 [ 93]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
57.7826 - 59.5854 [ 74]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
59.5854 - 61.3883 [ 30]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.19.0+/lease02-with-reader.data
# NumSamples = 919; Min = 46.59; Max = 62.16
# Mean = 56.053034; Variance = 5.565033; SD = 2.359032; Median 56.239479
# each ∎ represents a count of 3
46.5883 - 48.1459 [ 4]: ∎
48.1459 - 49.7034 [ 7]: ∎∎
49.7034 - 51.2609 [ 14]: ∎∎∎∎
51.2609 - 52.8184 [ 56]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
52.8184 - 54.3760 [ 120]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
54.3760 - 55.9335 [ 220]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
55.9335 - 57.4910 [ 260]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
57.4910 - 59.0486 [ 152]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
59.0486 - 60.6061 [ 70]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
60.6061 - 62.1636 [ 16]: ∎∎∎∎∎
3.19/posix01.data
# NumSamples = 62; Min = 1132.49; Max = 1458.17
# Mean = 1336.939168; Variance = 6234.581717; SD = 78.959368; Median 1338.021800
# each ∎ represents a count of 1
1132.4925 - 1165.0600 [ 2]: ∎∎
1165.0600 - 1197.6274 [ 2]: ∎∎
1197.6274 - 1230.1949 [ 4]: ∎∎∎∎
1230.1949 - 1262.7623 [ 4]: ∎∎∎∎
1262.7623 - 1295.3297 [ 4]: ∎∎∎∎
1295.3297 - 1327.8972 [ 7]: ∎∎∎∎∎∎∎
1327.8972 - 1360.4646 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
1360.4646 - 1393.0321 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
1393.0321 - 1425.5995 [ 3]: ∎∎∎
1425.5995 - 1458.1670 [ 10]: ∎∎∎∎∎∎∎∎∎∎
3.19.0+/posix01.data
# NumSamples = 506; Min = 1309.62; Max = 1500.85
# Mean = 1453.666440; Variance = 619.986362; SD = 24.899525; Median 1457.756685
# each ∎ represents a count of 2
1309.6159 - 1328.7394 [ 1]:
1328.7394 - 1347.8630 [ 0]:
1347.8630 - 1366.9866 [ 2]: ∎
1366.9866 - 1386.1102 [ 3]: ∎
1386.1102 - 1405.2338 [ 15]: ∎∎∎∎∎∎∎
1405.2338 - 1424.3574 [ 42]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1424.3574 - 1443.4810 [ 77]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1443.4810 - 1462.6046 [ 157]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1462.6046 - 1481.7282 [ 163]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1481.7282 - 1500.8518 [ 46]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.19/posix01-with-reader.data
# NumSamples = 45; Min = 1234.92; Max = 1450.74
# Mean = 1409.432073; Variance = 1146.435306; SD = 33.859051; Median 1416.858889
# each ∎ represents a count of 1
1234.9186 - 1256.5012 [ 1]: ∎
1256.5012 - 1278.0838 [ 0]:
1278.0838 - 1299.6664 [ 0]:
1299.6664 - 1321.2490 [ 0]:
1321.2490 - 1342.8316 [ 0]:
1342.8316 - 1364.4142 [ 1]: ∎
1364.4142 - 1385.9968 [ 4]: ∎∎∎∎
1385.9968 - 1407.5794 [ 12]: ∎∎∎∎∎∎∎∎∎∎∎∎
1407.5794 - 1429.1620 [ 18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1429.1620 - 1450.7446 [ 9]: ∎∎∎∎∎∎∎∎∎
3.19.0+/posix01-with-reader.data
# NumSamples = 117; Min = 1301.80; Max = 1411.39
# Mean = 1379.963065; Variance = 519.145334; SD = 22.784761; Median 1383.988035
# each ∎ represents a count of 1
1301.7995 - 1312.7584 [ 3]: ∎∎∎
1312.7584 - 1323.7172 [ 2]: ∎∎
1323.7172 - 1334.6761 [ 3]: ∎∎∎
1334.6761 - 1345.6350 [ 4]: ∎∎∎∎
1345.6350 - 1356.5938 [ 0]:
1356.5938 - 1367.5527 [ 10]: ∎∎∎∎∎∎∎∎∎∎
1367.5527 - 1378.5115 [ 19]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1378.5115 - 1389.4704 [ 28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1389.4704 - 1400.4292 [ 35]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1400.4292 - 1411.3881 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
3.19/posix02.data
# NumSamples = 458; Min = 18.61; Max = 75.56
# Mean = 23.772732; Variance = 18.547870; SD = 4.306724; Median 23.473403
# each ∎ represents a count of 3
18.6111 - 24.3065 [ 297]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
24.3065 - 30.0018 [ 158]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
30.0018 - 35.6972 [ 0]:
35.6972 - 41.3926 [ 0]:
41.3926 - 47.0879 [ 0]:
47.0879 - 52.7833 [ 0]:
52.7833 - 58.4787 [ 0]:
58.4787 - 64.1740 [ 0]:
64.1740 - 69.8694 [ 1]:
69.8694 - 75.5648 [ 2]:
3.19.0+/posix02.data
# NumSamples = 2340; Min = 15.62; Max = 79.66
# Mean = 22.538333; Variance = 12.248801; SD = 3.499829; Median 22.106778
# each ∎ represents a count of 15
15.6169 - 22.0211 [ 1167]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
22.0211 - 28.4252 [ 1101]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
28.4252 - 34.8294 [ 60]: ∎∎∎∎
34.8294 - 41.2335 [ 3]:
41.2335 - 47.6377 [ 1]:
47.6377 - 54.0418 [ 0]:
54.0418 - 60.4459 [ 1]:
60.4459 - 66.8501 [ 2]:
66.8501 - 73.2542 [ 3]:
73.2542 - 79.6584 [ 2]:
3.19/posix02-with-reader.data
# NumSamples = 387; Min = 19.21; Max = 29.79
# Mean = 25.041294; Variance = 3.455306; SD = 1.858845; Median 25.111446
# each ∎ represents a count of 1
19.2050 - 20.2634 [ 2]: ∎∎
20.2634 - 21.3218 [ 10]: ∎∎∎∎∎∎∎∎∎∎
21.3218 - 22.3802 [ 16]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
22.3802 - 23.4386 [ 44]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
23.4386 - 24.4970 [ 84]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
24.4970 - 25.5554 [ 74]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
25.5554 - 26.6138 [ 80]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
26.6138 - 27.6722 [ 55]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
27.6722 - 28.7306 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
28.7306 - 29.7890 [ 9]: ∎∎∎∎∎∎∎∎∎
3.19.0+/posix02-with-reader.data
# NumSamples = 1689; Min = 16.97; Max = 85.43
# Mean = 23.973230; Variance = 21.079021; SD = 4.591189; Median 23.419958
# each ∎ represents a count of 13
16.9744 - 23.8198 [ 993]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
23.8198 - 30.6652 [ 670]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
30.6652 - 37.5106 [ 9]:
37.5106 - 44.3560 [ 3]:
44.3560 - 51.2015 [ 2]:
51.2015 - 58.0469 [ 2]:
58.0469 - 64.8923 [ 3]:
64.8923 - 71.7377 [ 3]:
71.7377 - 78.5831 [ 1]:
78.5831 - 85.4285 [ 3]:
Cc: Alexander Viro <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Daniel Wagner (5):
locks: Remove unnecessary IS_POSIX test
locks: Split insert/delete block functions into flock/posix parts
seq_file: Add percpu seq_hlist helpers with locking iterators
locks: Use percpu spinlocks to protect file_lock_list
locks: Use blocked_lock_lock only to protect blocked_hash
fs/locks.c | 148 ++++++++++++++++++++++++++++++-----------------
fs/seq_file.c | 83 ++++++++++++++++++++++++++
include/linux/seq_file.h | 13 +++++
3 files changed, 190 insertions(+), 54 deletions(-)
--
2.1.0
Since the following change
commit bd61e0a9c852de2d705b6f1bb2cc54c5774db570
Author: Jeff Layton <[email protected]>
Date: Fri Jan 16 15:05:55 2015 -0500
locks: convert posix locks to file_lock_context
all posix locks are kept on their own list, so the test is
redundant.
Signed-off-by: Daniel Wagner <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Cc: Alexander Viro <[email protected]>
---
fs/locks.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 365c82e..f63aa92 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -964,8 +964,6 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
*/
if (request->fl_type != F_UNLCK) {
list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
- if (!IS_POSIX(fl))
- continue;
if (!posix_locks_conflict(request, fl))
continue;
if (conflock)
--
2.1.0
The locks_insert/delete_block() functions are used for the flock, posix
and lease types. blocked_lock_lock is used to serialize all access to
fl_link, fl_block, fl_next and blocked_hash. Here, we set the stage for
using blocked_lock_lock only to protect blocked_hash.
Signed-off-by: Daniel Wagner <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Cc: Alexander Viro <[email protected]>
---
fs/locks.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 40 insertions(+), 8 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index f63aa92..142e4fd 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -607,11 +607,20 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
*/
static void __locks_delete_block(struct file_lock *waiter)
{
- locks_delete_global_blocked(waiter);
list_del_init(&waiter->fl_block);
waiter->fl_next = NULL;
}
+/* Posix block variant of __locks_delete_block.
+ *
+ * Must be called with blocked_lock_lock held.
+ */
+static void __locks_delete_posix_block(struct file_lock *waiter)
+{
+ locks_delete_global_blocked(waiter);
+ __locks_delete_block(waiter);
+}
+
static void locks_delete_block(struct file_lock *waiter)
{
spin_lock(&blocked_lock_lock);
@@ -619,6 +628,13 @@ static void locks_delete_block(struct file_lock *waiter)
spin_unlock(&blocked_lock_lock);
}
+static void locks_delete_posix_block(struct file_lock *waiter)
+{
+ spin_lock(&blocked_lock_lock);
+ __locks_delete_posix_block(waiter);
+ spin_unlock(&blocked_lock_lock);
+}
+
/* Insert waiter into blocker's block list.
* We use a circular list so that processes can be easily woken up in
* the order they blocked. The documentation doesn't require this but
@@ -635,7 +651,17 @@ static void __locks_insert_block(struct file_lock *blocker,
BUG_ON(!list_empty(&waiter->fl_block));
waiter->fl_next = blocker;
list_add_tail(&waiter->fl_block, &blocker->fl_block);
- if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
+}
+
+/* Posix block variant of __locks_insert_block.
+ *
+ * Must be called with flc_lock and blocked_lock_lock held.
+ */
+static void __locks_insert_posix_block(struct file_lock *blocker,
+ struct file_lock *waiter)
+{
+ __locks_insert_block(blocker, waiter);
+ if (!IS_OFDLCK(blocker))
locks_insert_global_blocked(waiter);
}
@@ -671,7 +697,10 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
waiter = list_first_entry(&blocker->fl_block,
struct file_lock, fl_block);
- __locks_delete_block(waiter);
+ if (IS_POSIX(blocker))
+ __locks_delete_posix_block(waiter);
+ else
+ __locks_delete_block(waiter);
if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
waiter->fl_lmops->lm_notify(waiter);
else
@@ -979,7 +1008,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
spin_lock(&blocked_lock_lock);
if (likely(!posix_locks_deadlock(request, fl))) {
error = FILE_LOCK_DEFERRED;
- __locks_insert_block(fl, request);
+ __locks_insert_posix_block(fl, request);
}
spin_unlock(&blocked_lock_lock);
goto out;
@@ -1180,7 +1209,7 @@ int posix_lock_file_wait(struct file *filp, struct file_lock *fl)
if (!error)
continue;
- locks_delete_block(fl);
+ locks_delete_posix_block(fl);
break;
}
return error;
@@ -1277,7 +1306,7 @@ int locks_mandatory_area(int read_write, struct inode *inode,
continue;
}
- locks_delete_block(&fl);
+ locks_delete_posix_block(&fl);
break;
}
@@ -2097,7 +2126,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
if (!error)
continue;
- locks_delete_block(fl);
+ if (IS_POSIX(fl))
+ locks_delete_posix_block(fl);
+ else
+ locks_delete_block(fl);
break;
}
@@ -2461,7 +2493,7 @@ posix_unblock_lock(struct file_lock *waiter)
spin_lock(&blocked_lock_lock);
if (waiter->fl_next)
- __locks_delete_block(waiter);
+ __locks_delete_posix_block(waiter);
else
status = -ENOENT;
spin_unlock(&blocked_lock_lock);
--
2.1.0
Introduce a variant of the seq_hlist helpers for iterating over
seq_hlists which are protected by percpu spinlocks.
Signed-off-by: Daniel Wagner <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Cc: Alexander Viro <[email protected]>
---
fs/seq_file.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/seq_file.h | 13 ++++++++
2 files changed, 96 insertions(+)
diff --git a/fs/seq_file.c b/fs/seq_file.c
index 555f821..56adfdb 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -966,3 +966,86 @@ seq_hlist_next_percpu(void *v, struct hlist_head __percpu *head,
return NULL;
}
EXPORT_SYMBOL(seq_hlist_next_percpu);
+
+/**
+ * seq_hlist_start_percpu_locked - start an iteration of a percpu hlist array
+ * @head: pointer to percpu array of struct hlist_heads
+ * @lock: pointer to percpu spinlock which protects @head
+ * @cpu: pointer to cpu "cursor"
+ * @pos: start position of sequence
+ *
+ * Called at seq_file->op->start().
+ */
+struct hlist_node *
+seq_hlist_start_percpu_locked(struct hlist_head __percpu *head,
+ spinlock_t __percpu *lock, int *cpu, loff_t pos)
+{
+ struct hlist_node *node;
+
+ for_each_possible_cpu(*cpu) {
+ spin_lock(per_cpu_ptr(lock, *cpu));
+ hlist_for_each(node, per_cpu_ptr(head, *cpu)) {
+ if (pos-- == 0)
+ return node;
+ }
+ spin_unlock(per_cpu_ptr(lock, *cpu));
+ }
+ return NULL;
+}
+EXPORT_SYMBOL(seq_hlist_start_percpu_locked);
+
+/**
+ * seq_hlist_next_percpu_locked - move to the next position of the percpu hlist array
+ * @v: pointer to current hlist_node
+ * @head: pointer to percpu array of struct hlist_heads
+ * @lock: pointer to percpu spinlock which protects @head
+ * @cpu: pointer to cpu "cursor"
+ * @pos: start position of sequence
+ *
+ * Called at seq_file->op->next().
+ */
+struct hlist_node *
+seq_hlist_next_percpu_locked(void *v, struct hlist_head __percpu *head,
+ spinlock_t __percpu *lock,
+ int *cpu, loff_t *pos)
+{
+ struct hlist_node *node = v;
+
+ ++*pos;
+
+ if (node->next)
+ return node->next;
+
+ spin_unlock(per_cpu_ptr(lock, *cpu));
+
+ for (*cpu = cpumask_next(*cpu, cpu_possible_mask); *cpu < nr_cpu_ids;
+ *cpu = cpumask_next(*cpu, cpu_possible_mask)) {
+ struct hlist_head *bucket;
+
+ spin_lock(per_cpu_ptr(lock, *cpu));
+ bucket = per_cpu_ptr(head, *cpu);
+
+ if (!hlist_empty(bucket))
+ return bucket->first;
+
+ spin_unlock(per_cpu_ptr(lock, *cpu));
+ }
+ return NULL;
+}
+EXPORT_SYMBOL(seq_hlist_next_percpu_locked);
+
+/**
+ * seq_hlist_stop_percpu_locked - stop iterating over the percpu hlist array
+ * @v: pointer to current hlist_node
+ * @lock: pointer to percpu spinlock which protects @head
+ * @cpu: pointer to cpu "cursor"
+ *
+ * Called at seq_file->op->stop().
+ */
+void
+seq_hlist_stop_percpu_locked(void *v, spinlock_t __percpu *lock, int *cpu)
+{
+ if (v)
+ spin_unlock(per_cpu_ptr(lock, *cpu));
+}
+EXPORT_SYMBOL(seq_hlist_stop_percpu_locked);
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index afbb1fd..6419ac4 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -184,4 +184,17 @@ extern struct hlist_node *seq_hlist_start_percpu(struct hlist_head __percpu *hea
extern struct hlist_node *seq_hlist_next_percpu(void *v, struct hlist_head __percpu *head, int *cpu, loff_t *pos);
+extern struct hlist_node *seq_hlist_start_percpu_locked(
+ struct hlist_head __percpu *head,
+ spinlock_t __percpu *lock,
+ int *cpu, loff_t pos);
+
+extern struct hlist_node *seq_hlist_next_percpu_locked(
+ void *v, struct hlist_head __percpu *head,
+ spinlock_t __percpu *lock,
+ int *cpu, loff_t *pos);
+
+extern void seq_hlist_stop_percpu_locked(
+ void *v, spinlock_t __percpu *lock, int *cpu);
+
#endif
--
2.1.0
Replace the lglock with percpu spinlocks. That allows us to iterate in
the seqfile ops without taking all the underlying spinlocks at once via
lg_global_lock().
Signed-off-by: Daniel Wagner <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Cc: Alexander Viro <[email protected]>
---
fs/locks.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 142e4fd..20ed00a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -128,7 +128,6 @@
#include <linux/pid_namespace.h>
#include <linux/hashtable.h>
#include <linux/percpu.h>
-#include <linux/lglock.h>
#define CREATE_TRACE_POINTS
#include <trace/events/filelock.h>
@@ -160,10 +159,10 @@ int lease_break_time = 45;
/*
* The global file_lock_list is only used for displaying /proc/locks, so we
* keep a list on each CPU, with each list protected by its own spinlock via
- * the file_lock_lglock. Note that alterations to the list also require that
+ * the file_lock_lock. Note that alterations to the list also require that
* the relevant flc_lock is held.
*/
-DEFINE_STATIC_LGLOCK(file_lock_lglock);
+static DEFINE_PER_CPU(spinlock_t, file_lock_lock);
static DEFINE_PER_CPU(struct hlist_head, file_lock_list);
/*
@@ -561,10 +560,10 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
/* Must be called with the flc_lock held! */
static void locks_insert_global_locks(struct file_lock *fl)
{
- lg_local_lock(&file_lock_lglock);
+ spin_lock(this_cpu_ptr(&file_lock_lock));
fl->fl_link_cpu = smp_processor_id();
hlist_add_head(&fl->fl_link, this_cpu_ptr(&file_lock_list));
- lg_local_unlock(&file_lock_lglock);
+ spin_unlock(this_cpu_ptr(&file_lock_lock));
}
/* Must be called with the flc_lock held! */
@@ -577,9 +576,9 @@ static void locks_delete_global_locks(struct file_lock *fl)
*/
if (hlist_unhashed(&fl->fl_link))
return;
- lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
+ spin_lock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
hlist_del_init(&fl->fl_link);
- lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
}
static unsigned long
@@ -2628,9 +2627,9 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
struct locks_iterator *iter = f->private;
iter->li_pos = *pos + 1;
- lg_global_lock(&file_lock_lglock);
spin_lock(&blocked_lock_lock);
- return seq_hlist_start_percpu(&file_lock_list, &iter->li_cpu, *pos);
+ return seq_hlist_start_percpu_locked(&file_lock_list, &file_lock_lock,
+ &iter->li_cpu, *pos);
}
static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
@@ -2638,14 +2637,17 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
struct locks_iterator *iter = f->private;
++iter->li_pos;
- return seq_hlist_next_percpu(v, &file_lock_list, &iter->li_cpu, pos);
+ return seq_hlist_next_percpu_locked(v, &file_lock_list, &file_lock_lock,
+ &iter->li_cpu, pos);
}
static void locks_stop(struct seq_file *f, void *v)
__releases(&blocked_lock_lock)
{
+ struct locks_iterator *iter = f->private;
+
+ seq_hlist_stop_percpu_locked(v, &file_lock_lock, &iter->li_cpu);
spin_unlock(&blocked_lock_lock);
- lg_global_unlock(&file_lock_lglock);
}
static const struct seq_operations locks_seq_operations = {
@@ -2686,10 +2688,10 @@ static int __init filelock_init(void)
filelock_cache = kmem_cache_create("file_lock_cache",
sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
- lg_lock_init(&file_lock_lglock, "file_lock_lglock");
-
- for_each_possible_cpu(i)
+ for_each_possible_cpu(i) {
INIT_HLIST_HEAD(per_cpu_ptr(&file_lock_list, i));
+ spin_lock_init(per_cpu_ptr(&file_lock_lock, i));
+ }
return 0;
}
--
2.1.0
blocked_lock_lock and file_lock_lock are used to protect file_lock's
fl_link, fl_block, fl_next, blocked_hash and the percpu
file_lock_list.
The plan is to reorganize the usage of the locks and what they protect
so that the use of the global blocked_lock_lock is reduced.
Whenever we insert a new lock we grab, besides the flc_lock, also the
corresponding percpu file_lock_lock. The global blocked_lock_lock is
only used when the blocked_hash is involved.
file_lock_lock now protects file_lock_list and the fl_link, fl_block
and fl_next fields alone. That means we need to define which
file_lock_lock is used for all waiters. Luckily, fl_link_cpu can be
reused for fl_block and fl_next.
Signed-off-by: Daniel Wagner <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Cc: Alexander Viro <[email protected]>
---
fs/locks.c | 78 ++++++++++++++++++++++++++++++++++----------------------------
1 file changed, 43 insertions(+), 35 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 20ed00a..73b99ac 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -161,6 +161,20 @@ int lease_break_time = 45;
* keep a list on each CPU, with each list protected by its own spinlock via
* the file_lock_lock. Note that alterations to the list also require that
* the relevant flc_lock is held.
+ *
+ * In addition, it also protects the fl->fl_block list, and the fl->fl_next
+ * pointer for file_lock structures that are acting as lock requests (in
+ * contrast to those that are acting as records of acquired locks).
+ *
+ * file_lock structures acting as lock requests (waiters) use the same
+ * spinlock as the one acting as lock holder (blocker). E.g. if the
+ * blocker is initially added to the file_lock_list living on CPU 0,
+ * all waiters on that blocker are serialized via CPU 0's lock (see
+ * fl_link_cpu usage).
+ *
+ * In particular, adding an entry to the fl_block list requires that you hold
+ * both the flc_lock and the blocked_lock_lock (acquired in that order).
+ * Deleting an entry from the list however only requires the file_lock_lock.
*/
static DEFINE_PER_CPU(spinlock_t, file_lock_lock);
static DEFINE_PER_CPU(struct hlist_head, file_lock_list);
@@ -182,19 +196,6 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
/*
* This lock protects the blocked_hash. Generally, if you're accessing it, you
* want to be holding this lock.
- *
- * In addition, it also protects the fl->fl_block list, and the fl->fl_next
- * pointer for file_lock structures that are acting as lock requests (in
- * contrast to those that are acting as records of acquired locks).
- *
- * Note that when we acquire this lock in order to change the above fields,
- * we often hold the flc_lock as well. In certain cases, when reading the fields
- * protected by this lock, we can skip acquiring it iff we already hold the
- * flc_lock.
- *
- * In particular, adding an entry to the fl_block list requires that you hold
- * both the flc_lock and the blocked_lock_lock (acquired in that order).
- * Deleting an entry from the list however only requires the file_lock_lock.
*/
static DEFINE_SPINLOCK(blocked_lock_lock);
@@ -602,7 +603,7 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
/* Remove waiter from blocker's block list.
* When blocker ends up pointing to itself then the list is empty.
*
- * Must be called with blocked_lock_lock held.
+ * Must be called with file_lock_lock held.
*/
static void __locks_delete_block(struct file_lock *waiter)
{
@@ -612,7 +613,7 @@ static void __locks_delete_block(struct file_lock *waiter)
/* Posix block variant of __locks_delete_block.
*
- * Must be called with blocked_lock_lock held.
+ * Must be called with file_lock_lock held.
*/
static void __locks_delete_posix_block(struct file_lock *waiter)
{
@@ -622,16 +623,18 @@ static void __locks_delete_posix_block(struct file_lock *waiter)
static void locks_delete_block(struct file_lock *waiter)
{
- spin_lock(&blocked_lock_lock);
+ spin_lock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
__locks_delete_block(waiter);
- spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
}
static void locks_delete_posix_block(struct file_lock *waiter)
{
+ spin_lock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
spin_lock(&blocked_lock_lock);
__locks_delete_posix_block(waiter);
spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
}
/* Insert waiter into blocker's block list.
@@ -639,22 +642,23 @@ static void locks_delete_posix_block(struct file_lock *waiter)
* the order they blocked. The documentation doesn't require this but
* it seems like the reasonable thing to do.
*
- * Must be called with both the flc_lock and blocked_lock_lock held. The
- * fl_block list itself is protected by the blocked_lock_lock, but by ensuring
+ * Must be called with both the flc_lock and file_lock_lock held. The
+ * fl_block list itself is protected by the file_lock_lock, but by ensuring
* that the flc_lock is also held on insertions we can avoid taking the
- * blocked_lock_lock in some cases when we see that the fl_block list is empty.
+ * file_lock_lock in some cases when we see that the fl_block list is empty.
*/
static void __locks_insert_block(struct file_lock *blocker,
struct file_lock *waiter)
{
BUG_ON(!list_empty(&waiter->fl_block));
+ waiter->fl_link_cpu = blocker->fl_link_cpu;
waiter->fl_next = blocker;
list_add_tail(&waiter->fl_block, &blocker->fl_block);
}
/* Posix block variant of __locks_insert_block.
*
- * Must be called with flc_lock and blocked_lock_lock held.
+ * Must be called with flc_lock and file_lock_lock held.
*/
static void __locks_insert_posix_block(struct file_lock *blocker,
struct file_lock *waiter)
@@ -668,9 +672,9 @@ static void __locks_insert_posix_block(struct file_lock *blocker,
static void locks_insert_block(struct file_lock *blocker,
struct file_lock *waiter)
{
- spin_lock(&blocked_lock_lock);
+ spin_lock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
__locks_insert_block(blocker, waiter);
- spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
}
/*
@@ -681,31 +685,33 @@ static void locks_insert_block(struct file_lock *blocker,
static void locks_wake_up_blocks(struct file_lock *blocker)
{
/*
- * Avoid taking global lock if list is empty. This is safe since new
+ * Avoid taking lock if list is empty. This is safe since new
* blocked requests are only added to the list under the flc_lock, and
* the flc_lock is always held here. Note that removal from the fl_block
* list does not require the flc_lock, so we must recheck list_empty()
- * after acquiring the blocked_lock_lock.
+ * after acquiring the file_lock_lock.
*/
if (list_empty(&blocker->fl_block))
return;
- spin_lock(&blocked_lock_lock);
+ spin_lock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
while (!list_empty(&blocker->fl_block)) {
struct file_lock *waiter;
waiter = list_first_entry(&blocker->fl_block,
struct file_lock, fl_block);
- if (IS_POSIX(blocker))
+ if (IS_POSIX(blocker)) {
+ spin_lock(&blocked_lock_lock);
__locks_delete_posix_block(waiter);
- else
+ spin_unlock(&blocked_lock_lock);
+ } else
__locks_delete_block(waiter);
if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
waiter->fl_lmops->lm_notify(waiter);
else
wake_up(&waiter->fl_wait);
}
- spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
}
static void
@@ -732,9 +738,11 @@ static void
locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
{
locks_unlink_lock_ctx(fl);
- if (dispose)
+ if (dispose) {
+ spin_lock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
list_add(&fl->fl_list, dispose);
- else
+ spin_unlock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
+ } else
locks_free_lock(fl);
}
@@ -1004,12 +1012,14 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
* locks list must be done while holding the same lock!
*/
error = -EDEADLK;
+ spin_lock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
spin_lock(&blocked_lock_lock);
if (likely(!posix_locks_deadlock(request, fl))) {
error = FILE_LOCK_DEFERRED;
__locks_insert_posix_block(fl, request);
}
spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
goto out;
}
}
@@ -2490,12 +2500,14 @@ posix_unblock_lock(struct file_lock *waiter)
{
int status = 0;
+ spin_lock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
spin_lock(&blocked_lock_lock);
if (waiter->fl_next)
__locks_delete_posix_block(waiter);
else
status = -ENOENT;
spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
return status;
}
EXPORT_SYMBOL(posix_unblock_lock);
@@ -2622,12 +2634,10 @@ static int locks_show(struct seq_file *f, void *v)
}
static void *locks_start(struct seq_file *f, loff_t *pos)
- __acquires(&blocked_lock_lock)
{
struct locks_iterator *iter = f->private;
iter->li_pos = *pos + 1;
- spin_lock(&blocked_lock_lock);
return seq_hlist_start_percpu_locked(&file_lock_list, &file_lock_lock,
&iter->li_cpu, *pos);
}
@@ -2642,12 +2652,10 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
}
static void locks_stop(struct seq_file *f, void *v)
- __releases(&blocked_lock_lock)
{
struct locks_iterator *iter = f->private;
seq_hlist_stop_percpu_locked(v, &file_lock_lock, &iter->li_cpu);
- spin_unlock(&blocked_lock_lock);
}
static const struct seq_operations locks_seq_operations = {
--
2.1.0
Daniel Wagner <[email protected]> writes:
>
> I am looking at how to get rid of lglock. Reason being -rt is not too
> happy with that lock, especially that it uses arch_spinlock_t and
AFAIK it could just use normal spinlock. Have you tried that?
-Andi
--
[email protected] -- Speaking for myself only
On 02/20/2015 05:05 PM, Andi Kleen wrote:
> Daniel Wagner <[email protected]> writes:
>>
>> I am looking at how to get rid of lglock. Reason being -rt is not too
>> happy with that lock, especially that it uses arch_spinlock_t and
>
> AFAIK it could just use normal spinlock. Have you tried that?
I have tried it. At least fs/locks.c didn't blow up. The benchmark
results (lockperf) indicated that using normal spinlocks is even
slightly faster. Simply converting felt like cheating, though it might
be necessary for the other user (kernel/stop_machine.c). Currently it
looks like there is some additional benefit in getting lglock out of
fs/locks.c.
cheers,
daniel
On Tue, 24 Feb 2015 16:58:26 +0100
Daniel Wagner <[email protected]> wrote:
> On 02/20/2015 05:05 PM, Andi Kleen wrote:
> > Daniel Wagner <[email protected]> writes:
> >>
> >> I am looking at how to get rid of lglock. Reason being -rt is not too
> >> happy with that lock, especially that it uses arch_spinlock_t and
> >
> > AFAIK it could just use normal spinlock. Have you tried that?
>
> I have tried it. At least fs/locks.c didn't blow up. The benchmark
> results (lockperf) indicated that using normal spinlocks is even
> slightly faster. Simply converting felt like cheating. It might be
> necessary for the other user (kernel/stop_machine.c). Currently it looks
> like there is some additional benefit getting lglock away in fs/locks.c.
>
What would that benefit be?
lglocks are basically percpu spinlocks. Fixing some underlying
infrastructure that provides that seems like it might be a better
approach than declaring them "manually" and avoiding them altogether.
Note that you can still do basically what you're proposing here with
lglocks as well. Avoid using lg_global_* and just lock each one in
turn.
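Roughly like this (sketch only, with the existing lglock API and
today's percpu file_lock_list; the loop body stands in for the
/proc/locks output code):

	int cpu;
	struct file_lock *fl;

	for_each_possible_cpu(cpu) {
		lg_local_lock_cpu(&file_lock_lglock, cpu);
		hlist_for_each_entry(fl, per_cpu_ptr(&file_lock_list, cpu), fl_link)
			;	/* show fl */
		lg_local_unlock_cpu(&file_lock_lglock, cpu);
	}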
That said, now that I've thought about this, I'm not sure that's really
something we want to do when accessing /proc/locks. If you lock each
one in turn, then you aren't freezing the state of the file_lock_list
percpu lists. Won't that mean that you aren't necessarily getting a
consistent view of the locks on those lists when you cat /proc/locks?
I think having a consistent view there might trump any benefit to
performance. Reading /proc/locks is a *very* rare activity in the big
scheme of things.
I do however like the idea of moving more to be protected by the
lglocks, and minimizing usage of the blocked_lock_lock.
--
Jeff Layton <[email protected]>
Sorry for the late response. Got dragged away.
On 02/24/2015 10:06 PM, Jeff Layton wrote:
> On Tue, 24 Feb 2015 16:58:26 +0100
> Daniel Wagner <[email protected]> wrote:
>
>> On 02/20/2015 05:05 PM, Andi Kleen wrote:
>>> Daniel Wagner <[email protected]> writes:
>>>>
>>>> I am looking at how to get rid of lglock. Reason being -rt is not too
>>>> happy with that lock, especially that it uses arch_spinlock_t and
>>>
>>> AFAIK it could just use normal spinlock. Have you tried that?
>>
>> I have tried it. At least fs/locks.c didn't blow up. The benchmark
>> results (lockperf) indicated that using normal spinlocks is even
>> slightly faster. Simply converting felt like cheating. It might be
>> necessary for the other user (kernel/stop_machine.c). Currently it looks
>> like there is some additional benefit getting lglock away in fs/locks.c.
>>
>
> What would that benefit be?
>
> lglocks are basically percpu spinlocks. Fixing some underlying
> infrastructure that provides that seems like it might be a better
> approach than declaring them "manually" and avoiding them altogether.
>
> Note that you can still do basically what you're proposing here with
> lglocks as well. Avoid using lg_global_* and just lock each one in
> turn.
Yes, that is what I was referring to as the benefit. My main point is
that since there are only lg_local_* calls left, we could just as well
use normal spinlocks. No need to be fancy.
> That said, now that I've thought about this, I'm not sure that's really
> something we want to do when accessing /proc/locks. If you lock each
> one in turn, then you aren't freezing the state of the file_lock_list
> percpu lists. Won't that mean that you aren't necessarily getting a
> consistent view of the locks on those lists when you cat /proc/locks?
Maybe I am overlooking something here, but I don't see a consistency
problem. We list a blocker and all its waiters in one go, since only
the blocker is added to the file_lock_list and the waiters are added to
the blocker's fl_block list.
> I think having a consistent view there might trump any benefit to
> performance. Reading /proc/locks is a *very* rare activity in the big
> scheme of things.
I agree, but I hope that I got my consistency argument right and that
there shouldn't be a problem.
> I do however like the idea of moving more to be protected by the
> lglocks, and minimizing usage of the blocked_lock_lock.
Good to hear. I am trying to write a new test case (a variation of the
dining philosophers problem) which benchmarks blocked_lock_lock after
the refactoring.
cheers,
daniel
On Fri, 27 Feb 2015 16:01:30 +0100
Daniel Wagner <[email protected]> wrote:
> Sorry for the late response. Got dragged away.
>
> On 02/24/2015 10:06 PM, Jeff Layton wrote:
> > On Tue, 24 Feb 2015 16:58:26 +0100
> > Daniel Wagner <[email protected]> wrote:
> >
> >> On 02/20/2015 05:05 PM, Andi Kleen wrote:
> >>> Daniel Wagner <[email protected]> writes:
> >>>>
> >>>> I am looking at how to get rid of lglock. Reason being -rt is not too
> >>>> happy with that lock, especially that it uses arch_spinlock_t and
> >>>
> >>> AFAIK it could just use normal spinlock. Have you tried that?
> >>
> >> I have tried it. At least fs/locks.c didn't blow up. The benchmark
> >> results (lockperf) indicated that using normal spinlocks is even
> >> slightly faster. Simply converting felt like cheating. It might be
> >> necessary for the other user (kernel/stop_machine.c). Currently it looks
> >> like there is some additional benefit getting lglock away in fs/locks.c.
> >>
> >
> > What would that benefit be?
> >
> > lglocks are basically percpu spinlocks. Fixing some underlying
> > infrastructure that provides that seems like it might be a better
> > approach than declaring them "manually" and avoiding them altogether.
> >
> > Note that you can still do basically what you're proposing here with
> > lglocks as well. Avoid using lg_global_* and just lock each one in
> > turn.
>
> Yes, that was I was referring to as benefit. My main point is that there
> are only lg_local_* calls we could as well use normal spinlocks. No need
> to fancy.
>
Sure, but the lg_lock wrappers are a nice abstraction for this. I don't
think that we gain much by eliminating them. Changing the lglock code
to use normal spinlocks would also have the benefit of fixing up the
other user of that code.
> > That said, now that I've thought about this, I'm not sure that's really
> > something we want to do when accessing /proc/locks. If you lock each
> > one in turn, then you aren't freezing the state of the file_lock_list
> > percpu lists. Won't that mean that you aren't necessarily getting a
> > consistent view of the locks on those lists when you cat /proc/locks?
>
> Maybe I am overlooking something here but I don't see a consistency
> problem. We list a blocker and all its waiter in a go since only the
> blocker is added to flock_lock_list and the waiters are added blocker's
> fl_block list.
>
Other locking activity could be happening at the same time. For
instance, between when you drop one CPU's spinlock and pick up another,
the lock that you just printed could be acquired by another thread on
another CPU and then you go print it again. Now you're showing
conflicting locks in /proc/locks output.
Is that a real problem? I've no idea -- we don't have a lot of guidance
for what sort of atomicity /proc/locks needs, but that seems wrong to
me.
I also just don't see much benefit in optimizing /proc/locks access.
That's only done very rarely under most workloads. Locking all of the
spinlocks when you want to read from it sounds 100% fine to me and that
may help prevent these sorts of consistency problems. It also has the
benefit of keeping the /proc/locks seqfile code simpler.
> > I think having a consistent view there might trump any benefit to
> > performance. Reading /proc/locks is a *very* rare activity in the big
> > scheme of things.
>
> I agree, but I hope that I got it right with my consistency argument
> than there shouldn't be a problem.
>
> > I do however like the idea of moving more to be protected by the
> > lglocks, and minimizing usage of the blocked_lock_lock.
>
> Good to hear. I am trying to write a new test (a variation of the
> dinning philosophers 'problem') case which benchmarks blocked_lock_lock
> after the re-factoring.
>
Sounds good. I may go ahead and pick up the first couple of patches and
queue them for v4.1 since they seem like reasonable cleanups. I'll let
you know once I've done that.
--
Jeff Layton <[email protected]>