2015-02-19 14:34:42

by Daniel Wagner

Subject: [RFC v0 0/1] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock

Hi,

I am looking at how to get rid of lglock. The reason is that -rt is
not too happy with that lock, especially because it uses
arch_spinlock_t and is therefore not converted into a mutex on -rt. I
know that changes which only fix something for -rt alone are not
accepted, so here is my attempt to make things faster for mainline
and fix -rt at the same time.

There are two users of lglock at this point: fs/locks.c and
kernel/stop_machine.c.

I presume fs/locks.c is the more interesting one with respect to
performance, so let's have a look at that one first.

The lglock version of file_lock_lock is used in combination with
blocked_lock_lock to protect file_lock's fl_link, fl_block, fl_next,
blocked_hash and the percpu file_lock_list.

The plan is to reorganize the usage of the locks and what they protect
so that the usage of the global blocked_lock_lock is reduced.

Whenever we insert a new lock, we grab the corresponding percpu
file_lock_lock in addition to the i_lock. The global blocked_lock_lock
is only used when the blocked_hash is involved.
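
On the insert path this gives roughly the following ordering (just a
sketch of the intent; the real code is in the patch below):

	/* caller holds inode->i_lock */
	spin_lock(this_cpu_ptr(&file_lock_lock));
	fl->fl_link_cpu = smp_processor_id();
	hlist_add_head_rcu(&fl->fl_link, this_cpu_ptr(&file_lock_list));
	spin_unlock(this_cpu_ptr(&file_lock_lock));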

file_lock_list exists only to be able to produce the content of
/proc/locks. For listing all the locks it seems a bit excessive to
grab all locks at once. We should be okay just grabbing the
corresponding lock when iterating over the percpu file_lock_list.

file_lock_lock now protects file_lock_list and fl_link, fl_block and
fl_next alone. That means we need to define which file_lock_lock is
used for all waiters. Luckily, fl_link_cpu can be reused for fl_block
and fl_next.
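
In code this simply means a waiter inherits the blocker's CPU before
it is queued (a sketch mirroring __locks_insert_block() from the
patch):

	/* caller holds the i_lock and the blocker's percpu file_lock_lock */
	waiter->fl_link_cpu = blocker->fl_link_cpu;
	waiter->fl_next = blocker;
	list_add_tail(&waiter->fl_block, &blocker->fl_block);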

I haven't found a good way around the open-coded seq_ops
(locks_start, locks_next, locks_stop). Maybe someone has a good idea
how to handle the locking there.

For performance testing I used
git://git.samba.org/jlayton/lockperf.git and for correctness
https://github.com/linux-test-project/ltp/tree/master/testcases/network/nfsv4/locks
In case you are missing the posix03 results: my machine doesn't like
that test too much; the very high load brings it to its knees.
Probably I need different parameters.

I didn't run extensive tests so far, because I am waiting to get
access to a bigger box than my small i7-4850HQ system. I hope to see
larger improvements when more cores are involved.

Anyway, here are some results based on 3.19.

Explanation:
3.19: unpatched kernel
3.19.0+: patched kernel
-with-reader.data: with 'while true; do cat /proc/locks; done > /dev/null' running in parallel


3.19/flock01.data
# NumSamples = 117; Min = 853.77; Max = 1425.62
# Mean = 1387.965754; Variance = 5512.265923; SD = 74.244636; Median 1408.578537
# each ∎ represents a count of 1
853.7746 - 910.9590 [ 1]: ∎
910.9590 - 968.1435 [ 0]:
968.1435 - 1025.3280 [ 1]: ∎
1025.3280 - 1082.5124 [ 0]:
1082.5124 - 1139.6969 [ 0]:
1139.6969 - 1196.8813 [ 2]: ∎∎
1196.8813 - 1254.0658 [ 1]: ∎
1254.0658 - 1311.2502 [ 1]: ∎
1311.2502 - 1368.4347 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎
1368.4347 - 1425.6192 [ 100]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎

3.19.0+/flock01.data
# NumSamples = 124; Min = 1415.92; Max = 1459.39
# Mean = 1430.258322; Variance = 36.425769; SD = 6.035376; Median 1429.462718
# each ∎ represents a count of 1
1415.9192 - 1420.2667 [ 2]: ∎∎
1420.2667 - 1424.6142 [ 19]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1424.6142 - 1428.9617 [ 36]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1428.9617 - 1433.3092 [ 32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1433.3092 - 1437.6567 [ 22]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1437.6567 - 1442.0042 [ 10]: ∎∎∎∎∎∎∎∎∎∎
1442.0042 - 1446.3517 [ 2]: ∎∎
1446.3517 - 1450.6992 [ 0]:
1450.6992 - 1455.0467 [ 0]:
1455.0467 - 1459.3942 [ 1]: ∎

3.19/flock01-with-reader.data
# NumSamples = 97; Min = 1342.97; Max = 1423.54
# Mean = 1410.695019; Variance = 149.115584; SD = 12.211289; Median 1413.260338
# each ∎ represents a count of 1
1342.9675 - 1351.0245 [ 1]: ∎
1351.0245 - 1359.0815 [ 0]:
1359.0815 - 1367.1384 [ 1]: ∎
1367.1384 - 1375.1954 [ 1]: ∎
1375.1954 - 1383.2524 [ 1]: ∎
1383.2524 - 1391.3093 [ 4]: ∎∎∎∎
1391.3093 - 1399.3663 [ 0]:
1399.3663 - 1407.4233 [ 10]: ∎∎∎∎∎∎∎∎∎∎
1407.4233 - 1415.4803 [ 45]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1415.4803 - 1423.5372 [ 34]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎

3.19.0+/flock01-with-reader.data
# NumSamples = 104; Min = 1331.30; Max = 1349.97
# Mean = 1340.767603; Variance = 16.434314; SD = 4.053926; Median 1340.243416
# each ∎ represents a count of 1
1331.3015 - 1333.1680 [ 2]: ∎∎
1333.1680 - 1335.0345 [ 6]: ∎∎∎∎∎∎
1335.0345 - 1336.9011 [ 9]: ∎∎∎∎∎∎∎∎∎
1336.9011 - 1338.7676 [ 17]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1338.7676 - 1340.6341 [ 21]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1340.6341 - 1342.5006 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎
1342.5006 - 1344.3671 [ 17]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1344.3671 - 1346.2336 [ 10]: ∎∎∎∎∎∎∎∎∎∎
1346.2336 - 1348.1001 [ 4]: ∎∎∎∎
1348.1001 - 1349.9666 [ 7]: ∎∎∎∎∎∎∎

3.19/flock02.data
# NumSamples = 2726; Min = 5.33; Max = 65.40
# Mean = 13.524542; Variance = 15.739906; SD = 3.967355; Median 12.857334
# each ∎ represents a count of 32
5.3260 - 11.3331 [ 167]: ∎∎∎∎∎
11.3331 - 17.3402 [ 2435]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
17.3402 - 23.3472 [ 90]: ∎∎
23.3472 - 29.3543 [ 4]:
29.3543 - 35.3614 [ 6]:
35.3614 - 41.3684 [ 6]:
41.3684 - 47.3755 [ 1]:
47.3755 - 53.3826 [ 11]:
53.3826 - 59.3897 [ 3]:
59.3897 - 65.3967 [ 3]:

3.19.0+/flock02.data
# NumSamples = 2226; Min = 9.69; Max = 45.76
# Mean = 13.140839; Variance = 2.468176; SD = 1.571043; Median 12.936904
# each ∎ represents a count of 18
9.6911 - 13.2975 [ 1366]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
13.2975 - 16.9040 [ 806]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16.9040 - 20.5104 [ 51]: ∎∎
20.5104 - 24.1168 [ 0]:
24.1168 - 27.7232 [ 0]:
27.7232 - 31.3297 [ 1]:
31.3297 - 34.9361 [ 1]:
34.9361 - 38.5425 [ 0]:
38.5425 - 42.1489 [ 0]:
42.1489 - 45.7554 [ 1]:

3.19/flock02-with-reader.data
# NumSamples = 2719; Min = 2.59; Max = 67.27
# Mean = 14.488231; Variance = 8.927716; SD = 2.987928; Median 14.103543
# each ∎ represents a count of 30
2.5949 - 9.0619 [ 2]:
9.0619 - 15.5289 [ 2251]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
15.5289 - 21.9960 [ 441]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎
21.9960 - 28.4630 [ 15]:
28.4630 - 34.9300 [ 0]:
34.9300 - 41.3970 [ 1]:
41.3970 - 47.8640 [ 4]:
47.8640 - 54.3310 [ 1]:
54.3310 - 60.7980 [ 3]:
60.7980 - 67.2650 [ 1]:

3.19.0+/flock02-with-reader.data
# NumSamples = 2539; Min = 9.95; Max = 23.41
# Mean = 13.993072; Variance = 2.729366; SD = 1.652079; Median 13.800728
# each ∎ represents a count of 13
9.9482 - 11.2940 [ 47]: ∎∎∎
11.2940 - 12.6398 [ 399]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
12.6398 - 13.9855 [ 1016]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
13.9855 - 15.3313 [ 732]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
15.3313 - 16.6771 [ 190]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16.6771 - 18.0229 [ 70]: ∎∎∎∎∎
18.0229 - 19.3687 [ 41]: ∎∎∎
19.3687 - 20.7145 [ 28]: ∎∎
20.7145 - 22.0602 [ 13]: ∎
22.0602 - 23.4060 [ 3]:

3.19/lease01.data
# NumSamples = 65; Min = 152.59; Max = 175.66
# Mean = 167.052274; Variance = 23.396889; SD = 4.837033; Median 166.765499
# each ∎ represents a count of 1
152.5900 - 154.8972 [ 1]: ∎
154.8972 - 157.2045 [ 1]: ∎
157.2045 - 159.5117 [ 4]: ∎∎∎∎
159.5117 - 161.8190 [ 4]: ∎∎∎∎
161.8190 - 164.1262 [ 5]: ∎∎∎∎∎
164.1262 - 166.4335 [ 15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
166.4335 - 168.7407 [ 10]: ∎∎∎∎∎∎∎∎∎∎
168.7407 - 171.0480 [ 12]: ∎∎∎∎∎∎∎∎∎∎∎∎
171.0480 - 173.3552 [ 6]: ∎∎∎∎∎∎
173.3552 - 175.6625 [ 7]: ∎∎∎∎∎∎∎

3.19.0+/lease01.data
# NumSamples = 602; Min = 145.21; Max = 181.15
# Mean = 167.448570; Variance = 32.250924; SD = 5.678990; Median 167.925163
# each ∎ represents a count of 2
145.2090 - 148.8032 [ 4]: ∎∎
148.8032 - 152.3974 [ 8]: ∎∎∎∎
152.3974 - 155.9916 [ 14]: ∎∎∎∎∎∎∎
155.9916 - 159.5858 [ 31]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
159.5858 - 163.1800 [ 44]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
163.1800 - 166.7742 [ 145]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
166.7742 - 170.3684 [ 173]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
170.3684 - 173.9626 [ 126]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
173.9626 - 177.5568 [ 41]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
177.5568 - 181.1510 [ 16]: ∎∎∎∎∎∎∎∎

3.19/lease01-with-reader.data
# NumSamples = 76; Min = 137.96; Max = 179.57
# Mean = 163.911898; Variance = 69.281798; SD = 8.323569; Median 164.218926
# each ∎ represents a count of 1
137.9570 - 142.1184 [ 1]: ∎
142.1184 - 146.2799 [ 0]:
146.2799 - 150.4414 [ 6]: ∎∎∎∎∎∎
150.4414 - 154.6028 [ 2]: ∎∎
154.6028 - 158.7643 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎
158.7643 - 162.9257 [ 15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
162.9257 - 167.0872 [ 10]: ∎∎∎∎∎∎∎∎∎∎
167.0872 - 171.2487 [ 19]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
171.2487 - 175.4101 [ 7]: ∎∎∎∎∎∎∎
175.4101 - 179.5716 [ 5]: ∎∎∎∎∎

3.19.0+/lease01-with-reader.data
# NumSamples = 216; Min = 144.76; Max = 176.76
# Mean = 160.680593; Variance = 44.032938; SD = 6.635732; Median 160.827124
# each ∎ represents a count of 1
144.7606 - 147.9610 [ 8]: ∎∎∎∎∎∎∎∎
147.9610 - 151.1614 [ 16]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
151.1614 - 154.3618 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎
154.3618 - 157.5622 [ 30]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
157.5622 - 160.7626 [ 42]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
160.7626 - 163.9630 [ 41]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
163.9630 - 167.1634 [ 32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
167.1634 - 170.3638 [ 24]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
170.3638 - 173.5642 [ 6]: ∎∎∎∎∎∎
173.5642 - 176.7646 [ 6]: ∎∎∎∎∎∎

3.19/lease02.data
# NumSamples = 306; Min = 43.22; Max = 64.80
# Mean = 59.318955; Variance = 9.779672; SD = 3.127247; Median 59.907252
# each ∎ represents a count of 1
43.2247 - 45.3825 [ 1]: ∎
45.3825 - 47.5403 [ 1]: ∎
47.5403 - 49.6981 [ 1]: ∎
49.6981 - 51.8560 [ 8]: ∎∎∎∎∎∎∎∎
51.8560 - 54.0138 [ 10]: ∎∎∎∎∎∎∎∎∎∎
54.0138 - 56.1716 [ 20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
56.1716 - 58.3294 [ 43]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
58.3294 - 60.4872 [ 93]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
60.4872 - 62.6450 [ 104]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
62.6450 - 64.8028 [ 25]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎

3.19.0+/lease02.data
# NumSamples = 527; Min = 50.49; Max = 64.11
# Mean = 59.436887; Variance = 4.335719; SD = 2.082239; Median 59.632461
# each ∎ represents a count of 1
50.4855 - 51.8483 [ 2]: ∎∎
51.8483 - 53.2112 [ 1]: ∎
53.2112 - 54.5741 [ 5]: ∎∎∎∎∎
54.5741 - 55.9369 [ 26]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
55.9369 - 57.2998 [ 47]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
57.2998 - 58.6627 [ 92]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
58.6627 - 60.0255 [ 124]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
60.0255 - 61.3884 [ 127]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
61.3884 - 62.7513 [ 89]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
62.7513 - 64.1141 [ 14]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎

3.19/lease02-with-reader.data
# NumSamples = 287; Min = 43.36; Max = 61.39
# Mean = 56.435938; Variance = 9.470188; SD = 3.077367; Median 57.268150
# each ∎ represents a count of 1
43.3599 - 45.1627 [ 1]: ∎
45.1627 - 46.9655 [ 4]: ∎∎∎∎
46.9655 - 48.7684 [ 5]: ∎∎∎∎∎
48.7684 - 50.5712 [ 8]: ∎∎∎∎∎∎∎∎
50.5712 - 52.3741 [ 10]: ∎∎∎∎∎∎∎∎∎∎
52.3741 - 54.1769 [ 25]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
54.1769 - 55.9797 [ 37]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
55.9797 - 57.7826 [ 93]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
57.7826 - 59.5854 [ 74]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
59.5854 - 61.3883 [ 30]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎

3.19.0+/lease02-with-reader.data
# NumSamples = 919; Min = 46.59; Max = 62.16
# Mean = 56.053034; Variance = 5.565033; SD = 2.359032; Median 56.239479
# each ∎ represents a count of 3
46.5883 - 48.1459 [ 4]: ∎
48.1459 - 49.7034 [ 7]: ∎∎
49.7034 - 51.2609 [ 14]: ∎∎∎∎
51.2609 - 52.8184 [ 56]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
52.8184 - 54.3760 [ 120]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
54.3760 - 55.9335 [ 220]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
55.9335 - 57.4910 [ 260]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
57.4910 - 59.0486 [ 152]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
59.0486 - 60.6061 [ 70]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
60.6061 - 62.1636 [ 16]: ∎∎∎∎∎

3.19/posix01.data
# NumSamples = 62; Min = 1132.49; Max = 1458.17
# Mean = 1336.939168; Variance = 6234.581717; SD = 78.959368; Median 1338.021800
# each ∎ represents a count of 1
1132.4925 - 1165.0600 [ 2]: ∎∎
1165.0600 - 1197.6274 [ 2]: ∎∎
1197.6274 - 1230.1949 [ 4]: ∎∎∎∎
1230.1949 - 1262.7623 [ 4]: ∎∎∎∎
1262.7623 - 1295.3297 [ 4]: ∎∎∎∎
1295.3297 - 1327.8972 [ 7]: ∎∎∎∎∎∎∎
1327.8972 - 1360.4646 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
1360.4646 - 1393.0321 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
1393.0321 - 1425.5995 [ 3]: ∎∎∎
1425.5995 - 1458.1670 [ 10]: ∎∎∎∎∎∎∎∎∎∎

3.19.0+/posix01.data
# NumSamples = 506; Min = 1309.62; Max = 1500.85
# Mean = 1453.666440; Variance = 619.986362; SD = 24.899525; Median 1457.756685
# each ∎ represents a count of 2
1309.6159 - 1328.7394 [ 1]:
1328.7394 - 1347.8630 [ 0]:
1347.8630 - 1366.9866 [ 2]: ∎
1366.9866 - 1386.1102 [ 3]: ∎
1386.1102 - 1405.2338 [ 15]: ∎∎∎∎∎∎∎
1405.2338 - 1424.3574 [ 42]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1424.3574 - 1443.4810 [ 77]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1443.4810 - 1462.6046 [ 157]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1462.6046 - 1481.7282 [ 163]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1481.7282 - 1500.8518 [ 46]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎

3.19/posix01-with-reader.data
# NumSamples = 45; Min = 1234.92; Max = 1450.74
# Mean = 1409.432073; Variance = 1146.435306; SD = 33.859051; Median 1416.858889
# each ∎ represents a count of 1
1234.9186 - 1256.5012 [ 1]: ∎
1256.5012 - 1278.0838 [ 0]:
1278.0838 - 1299.6664 [ 0]:
1299.6664 - 1321.2490 [ 0]:
1321.2490 - 1342.8316 [ 0]:
1342.8316 - 1364.4142 [ 1]: ∎
1364.4142 - 1385.9968 [ 4]: ∎∎∎∎
1385.9968 - 1407.5794 [ 12]: ∎∎∎∎∎∎∎∎∎∎∎∎
1407.5794 - 1429.1620 [ 18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1429.1620 - 1450.7446 [ 9]: ∎∎∎∎∎∎∎∎∎

3.19.0+/posix01-with-reader.data
# NumSamples = 117; Min = 1301.80; Max = 1411.39
# Mean = 1379.963065; Variance = 519.145334; SD = 22.784761; Median 1383.988035
# each ∎ represents a count of 1
1301.7995 - 1312.7584 [ 3]: ∎∎∎
1312.7584 - 1323.7172 [ 2]: ∎∎
1323.7172 - 1334.6761 [ 3]: ∎∎∎
1334.6761 - 1345.6350 [ 4]: ∎∎∎∎
1345.6350 - 1356.5938 [ 0]:
1356.5938 - 1367.5527 [ 10]: ∎∎∎∎∎∎∎∎∎∎
1367.5527 - 1378.5115 [ 19]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1378.5115 - 1389.4704 [ 28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1389.4704 - 1400.4292 [ 35]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1400.4292 - 1411.3881 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎

3.19/posix02.data
# NumSamples = 458; Min = 18.61; Max = 75.56
# Mean = 23.772732; Variance = 18.547870; SD = 4.306724; Median 23.473403
# each ∎ represents a count of 3
18.6111 - 24.3065 [ 297]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
24.3065 - 30.0018 [ 158]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
30.0018 - 35.6972 [ 0]:
35.6972 - 41.3926 [ 0]:
41.3926 - 47.0879 [ 0]:
47.0879 - 52.7833 [ 0]:
52.7833 - 58.4787 [ 0]:
58.4787 - 64.1740 [ 0]:
64.1740 - 69.8694 [ 1]:
69.8694 - 75.5648 [ 2]:

3.19.0+/posix02.data
# NumSamples = 2340; Min = 15.62; Max = 79.66
# Mean = 22.538333; Variance = 12.248801; SD = 3.499829; Median 22.106778
# each ∎ represents a count of 15
15.6169 - 22.0211 [ 1167]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
22.0211 - 28.4252 [ 1101]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
28.4252 - 34.8294 [ 60]: ∎∎∎∎
34.8294 - 41.2335 [ 3]:
41.2335 - 47.6377 [ 1]:
47.6377 - 54.0418 [ 0]:
54.0418 - 60.4459 [ 1]:
60.4459 - 66.8501 [ 2]:
66.8501 - 73.2542 [ 3]:
73.2542 - 79.6584 [ 2]:

3.19/posix02-with-reader.data
# NumSamples = 387; Min = 19.21; Max = 29.79
# Mean = 25.041294; Variance = 3.455306; SD = 1.858845; Median 25.111446
# each ∎ represents a count of 1
19.2050 - 20.2634 [ 2]: ∎∎
20.2634 - 21.3218 [ 10]: ∎∎∎∎∎∎∎∎∎∎
21.3218 - 22.3802 [ 16]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
22.3802 - 23.4386 [ 44]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
23.4386 - 24.4970 [ 84]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
24.4970 - 25.5554 [ 74]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
25.5554 - 26.6138 [ 80]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
26.6138 - 27.6722 [ 55]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
27.6722 - 28.7306 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
28.7306 - 29.7890 [ 9]: ∎∎∎∎∎∎∎∎∎

3.19.0+/posix02-with-reader.data
# NumSamples = 1689; Min = 16.97; Max = 85.43
# Mean = 23.973230; Variance = 21.079021; SD = 4.591189; Median 23.419958
# each ∎ represents a count of 13
16.9744 - 23.8198 [ 993]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
23.8198 - 30.6652 [ 670]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
30.6652 - 37.5106 [ 9]:
37.5106 - 44.3560 [ 3]:
44.3560 - 51.2015 [ 2]:
51.2015 - 58.0469 [ 2]:
58.0469 - 64.8923 [ 3]:
64.8923 - 71.7377 [ 3]:
71.7377 - 78.5831 [ 1]:
78.5831 - 85.4285 [ 3]:


cheers,
daniel

Cc: Alexander Viro <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Cc: John Kacur <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Daniel Wagner (1):
fs/locks: Use plain percpu spinlocks instead of lglock to protect
file_lock

fs/locks.c | 164 +++++++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 112 insertions(+), 52 deletions(-)

--
2.1.0


2015-02-19 14:34:41

by Daniel Wagner

Subject: [RFC v0 1/1] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock

The lglock version of file_lock_lock is used in combination with
blocked_lock_lock to protect file_lock's fl_link, fl_block, fl_next,
blocked_hash and the percpu file_lock_list.

The plan is to reorganize the usage of the locks and what they protect
so that the usage of the global blocked_lock_lock is reduced.

Whenever we insert a new lock, we grab the corresponding percpu
file_lock_lock in addition to the i_lock. The global blocked_lock_lock
is only used when the blocked_hash is involved.

file_lock_list exists only to be able to produce the content of
/proc/locks. For listing all the locks it seems a bit excessive to
grab all locks at once. We should be okay just grabbing the
corresponding lock when iterating over the percpu file_lock_list.

file_lock_lock now protects file_lock_list and fl_link, fl_block and
fl_next alone. That means we need to define which file_lock_lock is
used for all waiters. Luckily, fl_link_cpu can be reused for fl_block
and fl_next.

Signed-off-by: Daniel Wagner <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Cc: John Kacur <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

---
fs/locks.c | 164 +++++++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 112 insertions(+), 52 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 59e2f90..1ad7cff 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -163,10 +163,24 @@ int lease_break_time = 45;
/*
* The global file_lock_list is only used for displaying /proc/locks, so we
* keep a list on each CPU, with each list protected by its own spinlock via
- * the file_lock_lglock. Note that alterations to the list also require that
+ * the file_lock_lock. Note that alterations to the list also require that
* the relevant i_lock is held.
+ *
+ * In addition, it also protects the fl->fl_block list, and the fl->fl_next
+ * pointer for file_lock structures that are acting as lock requests (in
+ * contrast to those that are acting as records of acquired locks).
+ *
+ * file_lock structures acting as lock requests (waiters) use the same
+ * spinlock as those acting as lock holders (blockers). E.g. if the
+ * blocker was initially added to the file_lock_list living on CPU 0,
+ * all waiters on that blocker are serialized via CPU 0 (see fl_link_cpu).
+ *
+ * Note that when we acquire this lock in order to change the above fields,
+ * we often hold the i_lock as well. In certain cases, when reading the fields
+ * protected by this lock, we can skip acquiring it iff we already hold the
+ * i_lock.
*/
-DEFINE_STATIC_LGLOCK(file_lock_lglock);
+static DEFINE_PER_CPU(spinlock_t, file_lock_lock);
static DEFINE_PER_CPU(struct hlist_head, file_lock_list);

/*
@@ -186,19 +200,6 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
/*
* This lock protects the blocked_hash. Generally, if you're accessing it, you
* want to be holding this lock.
- *
- * In addition, it also protects the fl->fl_block list, and the fl->fl_next
- * pointer for file_lock structures that are acting as lock requests (in
- * contrast to those that are acting as records of acquired locks).
- *
- * Note that when we acquire this lock in order to change the above fields,
- * we often hold the i_lock as well. In certain cases, when reading the fields
- * protected by this lock, we can skip acquiring it iff we already hold the
- * i_lock.
- *
- * In particular, adding an entry to the fl_block list requires that you hold
- * both the i_lock and the blocked_lock_lock (acquired in that order). Deleting
- * an entry from the list however only requires the file_lock_lock.
*/
static DEFINE_SPINLOCK(blocked_lock_lock);

@@ -516,10 +517,10 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
/* Must be called with the i_lock held! */
static void locks_insert_global_locks(struct file_lock *fl)
{
- lg_local_lock(&file_lock_lglock);
+ spin_lock(this_cpu_ptr(&file_lock_lock));
fl->fl_link_cpu = smp_processor_id();
- hlist_add_head(&fl->fl_link, this_cpu_ptr(&file_lock_list));
- lg_local_unlock(&file_lock_lglock);
+ hlist_add_head_rcu(&fl->fl_link, this_cpu_ptr(&file_lock_list));
+ spin_unlock(this_cpu_ptr(&file_lock_lock));
}

/* Must be called with the i_lock held! */
@@ -532,9 +533,9 @@ static void locks_delete_global_locks(struct file_lock *fl)
*/
if (hlist_unhashed(&fl->fl_link))
return;
- lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
- hlist_del_init(&fl->fl_link);
- lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
+ spin_lock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
+ hlist_del_init_rcu(&fl->fl_link);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
}

static unsigned long
@@ -557,11 +558,15 @@ static void locks_delete_global_blocked(struct file_lock *waiter)

/* Remove waiter from blocker's block list.
* When blocker ends up pointing to itself then the list is empty.
- *
- * Must be called with blocked_lock_lock held.
*/
static void __locks_delete_block(struct file_lock *waiter)
{
+ list_del_init(&waiter->fl_block);
+ waiter->fl_next = NULL;
+}
+
+static void __locks_delete_posix_block(struct file_lock *waiter)
+{
locks_delete_global_blocked(waiter);
list_del_init(&waiter->fl_block);
waiter->fl_next = NULL;
@@ -569,9 +574,18 @@ static void __locks_delete_block(struct file_lock *waiter)

static void locks_delete_block(struct file_lock *waiter)
{
- spin_lock(&blocked_lock_lock);
+ spin_lock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
__locks_delete_block(waiter);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
+}
+
+static void locks_delete_posix_block(struct file_lock *waiter)
+{
+ spin_lock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
+ spin_lock(&blocked_lock_lock);
+	__locks_delete_posix_block(waiter);
spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
}

/* Insert waiter into blocker's block list.
@@ -579,18 +593,22 @@ static void locks_delete_block(struct file_lock *waiter)
* the order they blocked. The documentation doesn't require this but
* it seems like the reasonable thing to do.
*
- * Must be called with both the i_lock and blocked_lock_lock held. The fl_block
- * list itself is protected by the blocked_lock_lock, but by ensuring that the
- * i_lock is also held on insertions we can avoid taking the blocked_lock_lock
- * in some cases when we see that the fl_block list is empty.
+ * Must be called with both the i_lock and file_lock_lock held.
*/
static void __locks_insert_block(struct file_lock *blocker,
struct file_lock *waiter)
{
BUG_ON(!list_empty(&waiter->fl_block));
+ waiter->fl_link_cpu = blocker->fl_link_cpu;
waiter->fl_next = blocker;
list_add_tail(&waiter->fl_block, &blocker->fl_block);
- if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
+}
+
+static void __locks_insert_posix_block(struct file_lock *blocker,
+ struct file_lock *waiter)
+{
+ __locks_insert_block(blocker, waiter);
+ if (!IS_OFDLCK(blocker))
locks_insert_global_blocked(waiter);
}

@@ -598,9 +616,9 @@ static void __locks_insert_block(struct file_lock *blocker,
static void locks_insert_block(struct file_lock *blocker,
struct file_lock *waiter)
{
- spin_lock(&blocked_lock_lock);
+ spin_lock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
__locks_insert_block(blocker, waiter);
- spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
}

/*
@@ -615,24 +633,29 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
* blocked requests are only added to the list under the i_lock, and
* the i_lock is always held here. Note that removal from the fl_block
* list does not require the i_lock, so we must recheck list_empty()
- * after acquiring the blocked_lock_lock.
+ * after acquiring the file_lock_lock.
*/
if (list_empty(&blocker->fl_block))
return;

- spin_lock(&blocked_lock_lock);
+ spin_lock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
while (!list_empty(&blocker->fl_block)) {
struct file_lock *waiter;

waiter = list_first_entry(&blocker->fl_block,
struct file_lock, fl_block);
- __locks_delete_block(waiter);
+ if (IS_POSIX(blocker)) {
+ spin_lock(&blocked_lock_lock);
+ __locks_delete_posix_block(waiter);
+ spin_unlock(&blocked_lock_lock);
+ } else
+ __locks_delete_block(waiter);
if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
waiter->fl_lmops->lm_notify(waiter);
else
wake_up(&waiter->fl_wait);
}
- spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, blocker->fl_link_cpu));
}

/* Insert file lock fl into an inode's lock list at the position indicated
@@ -690,9 +713,11 @@ static void locks_delete_lock(struct file_lock **thisfl_p,
struct file_lock *fl = *thisfl_p;

locks_unlink_lock(thisfl_p);
- if (dispose)
+ if (dispose) {
+ spin_lock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
list_add(&fl->fl_block, dispose);
- else
+ spin_unlock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
+ } else
locks_free_lock(fl);
}

@@ -971,12 +996,14 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
* locks list must be done while holding the same lock!
*/
error = -EDEADLK;
+ spin_lock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
spin_lock(&blocked_lock_lock);
if (likely(!posix_locks_deadlock(request, fl))) {
error = FILE_LOCK_DEFERRED;
- __locks_insert_block(fl, request);
+ __locks_insert_posix_block(fl, request);
}
spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, fl->fl_link_cpu));
goto out;
}
}
@@ -1183,7 +1210,7 @@ int posix_lock_file_wait(struct file *filp, struct file_lock *fl)
if (!error)
continue;

- locks_delete_block(fl);
+ locks_delete_posix_block(fl);
break;
}
return error;
@@ -1273,7 +1300,7 @@ int locks_mandatory_area(int read_write, struct inode *inode,
continue;
}

- locks_delete_block(&fl);
+ locks_delete_posix_block(&fl);
break;
}

@@ -2432,12 +2459,14 @@ posix_unblock_lock(struct file_lock *waiter)
{
int status = 0;

+ spin_lock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
spin_lock(&blocked_lock_lock);
if (waiter->fl_next)
- __locks_delete_block(waiter);
+ __locks_delete_posix_block(waiter);
else
status = -ENOENT;
spin_unlock(&blocked_lock_lock);
+ spin_unlock(per_cpu_ptr(&file_lock_lock, waiter->fl_link_cpu));
return status;
}
EXPORT_SYMBOL(posix_unblock_lock);
@@ -2563,30 +2592,61 @@ static int locks_show(struct seq_file *f, void *v)
return 0;
}

+
static void *locks_start(struct seq_file *f, loff_t *pos)
- __acquires(&blocked_lock_lock)
{
struct locks_iterator *iter = f->private;
+ struct hlist_node *node;
+ loff_t p = *pos;

iter->li_pos = *pos + 1;
- lg_global_lock(&file_lock_lglock);
- spin_lock(&blocked_lock_lock);
- return seq_hlist_start_percpu(&file_lock_list, &iter->li_cpu, *pos);
+
+ for_each_possible_cpu(iter->li_cpu) {
+ spin_lock(per_cpu_ptr(&file_lock_lock, iter->li_cpu));
+ hlist_for_each(node, per_cpu_ptr(&file_lock_list, iter->li_cpu)) {
+ if (p-- == 0)
+ return node;
+ }
+ spin_unlock(per_cpu_ptr(&file_lock_lock, iter->li_cpu));
+ }
+ return NULL;
}

static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
{
struct locks_iterator *iter = f->private;
+ struct hlist_node *node = v;

++iter->li_pos;
- return seq_hlist_next_percpu(v, &file_lock_list, &iter->li_cpu, pos);
+ ++*pos;
+
+ if (node->next)
+ return node->next;
+
+ spin_unlock(per_cpu_ptr(&file_lock_lock, iter->li_cpu));
+
+ for (iter->li_cpu = cpumask_next(iter->li_cpu, cpu_possible_mask);
+ iter->li_cpu < nr_cpu_ids;
+ iter->li_cpu = cpumask_next(iter->li_cpu, cpu_possible_mask)) {
+ struct hlist_head *bucket;
+
+ spin_lock(per_cpu_ptr(&file_lock_lock, iter->li_cpu));
+ bucket = per_cpu_ptr(&file_lock_list, iter->li_cpu);
+
+ if (!hlist_empty(bucket))
+ return bucket->first;
+
+ spin_unlock(per_cpu_ptr(&file_lock_lock, iter->li_cpu));
+ }
+ return NULL;
}

static void locks_stop(struct seq_file *f, void *v)
- __releases(&blocked_lock_lock)
{
- spin_unlock(&blocked_lock_lock);
- lg_global_unlock(&file_lock_lglock);
+ struct locks_iterator *iter = f->private;
+
+ if (v)
+ spin_unlock(per_cpu_ptr(&file_lock_lock, iter->li_cpu));
}

static const struct seq_operations locks_seq_operations = {
@@ -2624,10 +2684,10 @@ static int __init filelock_init(void)
filelock_cache = kmem_cache_create("file_lock_cache",
sizeof(struct file_lock), 0, SLAB_PANIC, NULL);

- lg_lock_init(&file_lock_lglock, "file_lock_lglock");
-
- for_each_possible_cpu(i)
+ for_each_possible_cpu(i) {
INIT_HLIST_HEAD(per_cpu_ptr(&file_lock_list, i));
+ spin_lock_init(per_cpu_ptr(&file_lock_lock, i));
+ }

return 0;
}
--
2.1.0

2015-02-19 20:49:50

by Jeff Layton

Subject: Re: [RFC v0 1/1] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock

On Thu, 19 Feb 2015 15:26:14 +0100
Daniel Wagner <[email protected]> wrote:

> The lglock version of file_lock_lock is used in combination with
> blocked_lock_lock to protect file_lock's fl_link, fl_block, fl_next,
> blocked_hash and the percpu file_lock_list.
>
> The plan is to reorganize the usage of the locks and what they protect
> so that the usage of the global blocked_lock_lock is reduced.
>
> Whenever we insert a new lock, we grab the corresponding percpu
> file_lock_lock in addition to the i_lock. The global blocked_lock_lock
> is only used when the blocked_hash is involved.
>

Ok, good. I like that part -- I hate the blocked_lock_lock, but freezing
the file locking state is the only way I've found to ensure the
correctness of deadlock detection. Bummer.
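
For reference, the deadlock detection in question -- 3.19's
posix_locks_deadlock() -- looks roughly like this (abridged), and it
can only walk the owner chain safely while the whole blocked state is
frozen:

	/* Must be called with the blocked_lock_lock held! */
	static int posix_locks_deadlock(struct file_lock *caller_fl,
					struct file_lock *block_fl)
	{
		int i = 0;

		/* follow the chain of owners that block_fl's owner waits on */
		while ((block_fl = what_owner_is_waiting_for(block_fl))) {
			if (i++ > MAX_DEADLK_ITERATIONS)
				return 0;
			if (posix_same_owner(caller_fl, block_fl))
				return 1;
		}
		return 0;
	}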

> file_lock_list exists only to be able to produce the content of
> /proc/locks. For listing all the locks it seems a bit excessive to
> grab all locks at once. We should be okay just grabbing the
> corresponding lock when iterating over the percpu file_lock_list.
>

True, but that's not a terribly common event, which is why I figured
the lg_lock was an acceptable tradeoff there. That said, if you can get
rid of it in favor of something more efficient then that sounds good to
me. If it helps the -rt kernels, then so much the better...


> file_lock_lock now protects file_lock_list and fl_link, fl_block and
> fl_next alone. That means we need to define which file_lock_lock is
> used for all waiters. Luckily, fl_link_cpu can be reused for fl_block
> and fl_next.
>

Ok, so when a lock is blocked, we'll insert the waiter onto the
fl_block list for the blocker, and use the blocker's per-cpu spinlock
to protect that list. Good idea.

> Signed-off-by: Daniel Wagner <[email protected]>
> Cc: Alexander Viro <[email protected]>
> Cc: Jeff Layton <[email protected]>
> Cc: "J. Bruce Fields" <[email protected]>
> Cc: John Kacur <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>

Thanks for the patch. Some general comments first:

- fs/locks.c has seen a fair bit of overhaul during the current merge
window, and this patch won't apply directly. I'd suggest cleaning it up
after -rc1 is cut.

- the basic idea seems sound, but this is very "fiddly" code. It would
be nice to see if you can break this up into multiple patches. For
instance, maybe convert the lglock to the percpu spinlocks first in a
separate patch, and just keep it protecting the file_lock_list. Once
that's done, start changing other pieces to be protected by the percpu
locks. Small, incremental changes like that are much easier to deal
with if something breaks, since we can at least run a bisect to narrow
down the offending change. They're also easier to review.

- the open-coded seqfile stuff is pretty nasty. When I did the original
change to use lglocks, I ended up adding seq_hlist_*_percpu macros to
support it. Maybe consider doing something like that here too, though
this is a bit more complex obviously since you want to be able to just
hold one spinlock at a time (see the first sketch below).

- it would also be good to start thinking about adding lockdep
assertions to this code. I simply didn't realize how wonderful they
were when I did the global spinlock to i_lock conversion a year or two
ago. That can really help catch bugs, and as the (spin)locking gets
more complex in this code, that'd be good to help ensure correctness
(see the second sketch below).
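
First sketch: a rough, untested idea for a hypothetical
seq_hlist_next_percpu_locked() helper (name and signature invented
here) -- essentially your locks_next() factored out so that it only
ever holds one of the percpu spinlocks at a time:

	static struct hlist_node *
	seq_hlist_next_percpu_locked(void *v, struct hlist_head __percpu *head,
				     spinlock_t __percpu *lock, int *cpu,
				     loff_t *pos)
	{
		struct hlist_node *node = v;

		++*pos;
		if (node->next)
			return node->next;

		/* done with this CPU's list, so drop its lock... */
		spin_unlock(per_cpu_ptr(lock, *cpu));

		/* ...and search the remaining CPUs, locking one at a time */
		for (*cpu = cpumask_next(*cpu, cpu_possible_mask);
		     *cpu < nr_cpu_ids;
		     *cpu = cpumask_next(*cpu, cpu_possible_mask)) {
			struct hlist_head *bucket;

			spin_lock(per_cpu_ptr(lock, *cpu));
			bucket = per_cpu_ptr(head, *cpu);
			if (!hlist_empty(bucket))
				return bucket->first;
			spin_unlock(per_cpu_ptr(lock, *cpu));
		}
		return NULL;
	}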
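
Second sketch: the kind of lockdep assertion I mean, using 3.19's
locks_insert_global_blocked() as the example (placement is
illustrative):

	static void locks_insert_global_blocked(struct file_lock *waiter)
	{
		/* blocked_hash must only be touched with the state frozen */
		lockdep_assert_held(&blocked_lock_lock);
		hash_add(blocked_hash, &waiter->fl_link,
			 posix_owner_key(waiter));
	}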



--
Jeff Layton <[email protected]>

2015-02-20 14:39:23

by Daniel Wagner

Subject: Re: [RFC v0 1/1] fs/locks: Use plain percpu spinlocks instead of lglock to protect file_lock

Hi Jeff,

On 02/19/2015 09:49 PM, Jeff Layton wrote:
> On Thu, 19 Feb 2015 15:26:14 +0100
> Daniel Wagner <[email protected]> wrote:
>> Whenever we insert a new lock, we grab the corresponding percpu
>> file_lock_lock in addition to the i_lock. The global blocked_lock_lock
>> is only used when the blocked_hash is involved.
>>
>
> Ok, good. I like that part -- I hate the blocked_lock_lock, but freezing
> the file locking state is the only way I've found to ensure the
> correctness of deadlock detection. Bummer.

Okay, I'll look into that.

>> file_lock_list exists only to be able to produce the content of
>> /proc/locks. For listing all the locks it seems a bit excessive to
>> grab all locks at once. We should be okay just grabbing the
>> corresponding lock when iterating over the percpu file_lock_list.
>>
>
> True, but that's not a terribly common event, which is why I figured
> the lg_lock was an acceptable tradeoff there. That said, if you can get
> rid of it in favor of something more efficient then that sounds good to
> me. If it helps the -rt kernels, then so much the better...

Great! I was hoping to hear that :)

>> file_lock_lock now protects file_lock_list and fl_link, fl_block and
>> fl_next alone. That means we need to define which file_lock_lock is
>> used for all waiters. Luckily, fl_link_cpu can be reused for fl_block
>> and fl_next.
>>
>
> Ok, so when a lock is blocked, we'll insert the waiter onto the
> fl_block list for the blocker, and use the blocker's per-cpu spinlock
> to protect that list. Good idea.

Let's hope it doesn't explode. So far I am still confident it works.


>> Signed-off-by: Daniel Wagner <[email protected]>
>> Cc: Alexander Viro <[email protected]>
>> Cc: Jeff Layton <[email protected]>
>> Cc: "J. Bruce Fields" <[email protected]>
>> Cc: John Kacur <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>>
>
> Thanks for the patch. Some general comments first:
>
> - fs/locks.c has seen a fair bit of overhaul during the current merge
> window, and this patch won't apply directly. I'd suggest cleaning it up
> after -rc1 is cut.

I just rebased it and split it up a bit. Couldn't wait...

> - the basic idea seems sound, but this is very "fiddly" code. It would
> be nice to see if you can break this up into multiple patches. For
> instance, maybe convert the lglock to the percpu spinlocks first in a
> separate patch, and just keep it protecting the file_lock_list. Once
> that's done, start changing other pieces to be protected by the percpu
> locks. Small, incremental changes like that are much easier to deal
> with if something breaks, since we can at least run a bisect to narrow
> down the offending change. They're also easier to review.

I completely agree. Sorry for sending such a bad initial version. I
should have known better.

> - the open-coded seqfile stuff is pretty nasty. When I did the original
> change to use lglocks, I ended up adding seq_hlist_*_percpu macros to
> support it. Maybe consider doing something like that here too, though
> this is a bit more complex obviously since you want to be able to just
> hold one spinlock at a time.

Ok.

> - it would also be good to start thinking about adding lockdep
> assertions to this code. I simply didn't realize how wonderful they
> were when I did the global spinlock to i_lock conversion a year or two
> ago. That can really help catch bugs, and as the (spin)locking gets
> more complex in this code, that'd be good to help ensure correctness.

I'll give it a try.

Many thanks for the quick review. Really appreciated!

cheers,
daniel