2018-02-05 01:31:07

by Davidlohr Bueso

Subject: [RFC PATCH 00/64] mm: towards parallel address space operations

From: Davidlohr Bueso <[email protected]>

Hi,

This patchset is a new version of both the range locking machinery as well
as a full mmap_sem conversion that makes use of it -- the worst case
scenario, as all mmap_sem calls are converted to a full-range mmap_lock
equivalent. As such, while there is no improvement in concurrency per se,
these changes aim at adding the machinery that will permit it in the future.

Direct users of mm->mmap_sem can be classified as (1) those that acquire
and release the lock within the same context, and (2) those that directly
manipulate mmap_sem further down the callchain. For example:

(1) down_read(&mm->mmap_sem);
/* do something */
/* nobody down the chain uses mmap_sem directly */
up_read(&mm->mmap_sem);

(2a) down_read(&mm->mmap_sem);
/* do something that returns with mmap_sem unlocked */
fn(mm, &locked);
if (locked)
up_read(&mm->mmap_sem);

(2b) down_read(&mm->mmap_sem);
/* do something that releases and reacquires mmap_sem in between */
fn(mm);
up_read(&mm->mmap_sem);
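
With the conversion, case (1) becomes a purely mechanical substitution. As a
minimal sketch (assuming read-side wrappers analogous to the
mm_write_lock_killable()/mm_write_unlock() calls visible in the ipc patch
below; mmrange is the caller's stack-allocated range):

(1') DEFINE_RANGE_LOCK_FULL(mmrange); /* worst case: spans the whole address space */

     mm_read_lock(mm, &mmrange);
     /* do something */
     mm_read_unlock(mm, &mmrange);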

Patches 1-2: add the range locking machinery. This is rebased on the rbtree
optimizations for interval trees such that we can quickly detect overlapping
ranges. More documentation has also been added, with an ordering example in the
source code.

Patch 3: adds new mm locking wrappers around mmap_sem.

Patches 4-15: teach the page fault paths about mmrange (specifically adding the
range in question to struct vm_fault). In addition, most of these patches
update mmap_sem callers that fall into the (2a) and (2b) examples above.

Patches 15-63: add most of the trivial conversions -- the (1) example above.
(Patches 21, 22 and 23 are hacks that avoid rwsem_is_locked(mmap_sem) so that
we don't have to teach file_operations about mmrange.)

Patch 64: finally does the actual conversion, replacing mmap_sem with the range
mmap_lock.
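
To make the roadmap concrete, here is a hedged sketch (not copied from the
patches; exact names and details may differ) of what the patch 3 wrappers are
expected to look like, and how patch 64 then switches their backend from the
rwsem to the range lock tree:

/* Patch 3 (sketch): wrappers still map onto the rwsem, the range is unused. */
static inline void mm_write_lock(struct mm_struct *mm, struct range_lock *range)
{
	down_write(&mm->mmap_sem);
}

static inline void mm_write_unlock(struct mm_struct *mm, struct range_lock *range)
{
	up_write(&mm->mmap_sem);
}

/* Patch 64 (sketch): same wrappers, now backed by the range mmap_lock. */
static inline void mm_write_lock(struct mm_struct *mm, struct range_lock *range)
{
	range_write_lock(&mm->mmap_lock, range);
}

static inline void mm_write_unlock(struct mm_struct *mm, struct range_lock *range)
{
	range_write_unlock(&mm->mmap_lock, range);
}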

I've run the series on a 40-core (ht) 2-socket IvyBridge with 16 GB of memory
on various benchmarks that stress address space concurrency.

** pft is a microbenchmark for page fault rates.

When running with increasing thread counts, range locking takes a rather small
(yet constant) hit of ~2% in the pft timings, with a maximum of 5%. The
faults/cpu numbers show a similar pattern.


pft timings
v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
Amean system-1 1.11 ( 0.00%) 1.17 ( -5.86%)
Amean system-4 1.14 ( 0.00%) 1.18 ( -3.07%)
Amean system-7 1.38 ( 0.00%) 1.36 ( 0.94%)
Amean system-12 2.28 ( 0.00%) 2.31 ( -1.18%)
Amean system-21 4.11 ( 0.00%) 4.13 ( -0.44%)
Amean system-30 5.94 ( 0.00%) 6.01 ( -1.11%)
Amean system-40 8.24 ( 0.00%) 8.33 ( -1.04%)
Amean elapsed-1 1.28 ( 0.00%) 1.33 ( -4.50%)
Amean elapsed-4 0.32 ( 0.00%) 0.34 ( -5.27%)
Amean elapsed-7 0.24 ( 0.00%) 0.24 ( -0.43%)
Amean elapsed-12 0.23 ( 0.00%) 0.23 ( -0.22%)
Amean elapsed-21 0.26 ( 0.00%) 0.25 ( 0.39%)
Amean elapsed-30 0.24 ( 0.00%) 0.24 ( -0.21%)
Amean elapsed-40 0.24 ( 0.00%) 0.24 ( 0.84%)
Stddev system-1 0.04 ( 0.00%) 0.05 ( -16.29%)
Stddev system-4 0.03 ( 0.00%) 0.03 ( 17.70%)
Stddev system-7 0.08 ( 0.00%) 0.02 ( 68.56%)
Stddev system-12 0.05 ( 0.00%) 0.06 ( -31.22%)
Stddev system-21 0.06 ( 0.00%) 0.06 ( 8.07%)
Stddev system-30 0.05 ( 0.00%) 0.09 ( -70.15%)
Stddev system-40 0.11 ( 0.00%) 0.07 ( 41.53%)
Stddev elapsed-1 0.03 ( 0.00%) 0.05 ( -72.14%)
Stddev elapsed-4 0.01 ( 0.00%) 0.01 ( -4.98%)
Stddev elapsed-7 0.01 ( 0.00%) 0.01 ( 60.65%)
Stddev elapsed-12 0.01 ( 0.00%) 0.01 ( 6.24%)
Stddev elapsed-21 0.01 ( 0.00%) 0.01 ( -1.13%)
Stddev elapsed-30 0.00 ( 0.00%) 0.00 ( -45.10%)
Stddev elapsed-40 0.01 ( 0.00%) 0.01 ( 25.97%)

pft faults
v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
Hmean faults/cpu-1 629011.4218 ( 0.00%) 601523.2875 ( -4.37%)
Hmean faults/cpu-4 630952.1771 ( 0.00%) 602105.6527 ( -4.57%)
Hmean faults/cpu-7 518412.2806 ( 0.00%) 518082.2585 ( -0.06%)
Hmean faults/cpu-12 324957.1130 ( 0.00%) 321678.8932 ( -1.01%)
Hmean faults/cpu-21 182712.2633 ( 0.00%) 182643.5347 ( -0.04%)
Hmean faults/cpu-30 126618.2558 ( 0.00%) 125698.1965 ( -0.73%)
Hmean faults/cpu-40 91266.3914 ( 0.00%) 90614.9956 ( -0.71%)
Hmean faults/sec-1 628010.9821 ( 0.00%) 600700.3641 ( -4.35%)
Hmean faults/sec-4 2475859.3012 ( 0.00%) 2351373.1960 ( -5.03%)
Hmean faults/sec-7 3372026.7978 ( 0.00%) 3408924.8028 ( 1.09%)
Hmean faults/sec-12 3517750.6290 ( 0.00%) 3488785.0815 ( -0.82%)
Hmean faults/sec-21 3151328.9188 ( 0.00%) 3156983.9401 ( 0.18%)
Hmean faults/sec-30 3324673.3141 ( 0.00%) 3318585.9949 ( -0.18%)
Hmean faults/sec-40 3362503.8992 ( 0.00%) 3410086.6644 ( 1.42%)
Stddev faults/cpu-1 14795.1817 ( 0.00%) 22870.4755 ( -54.58%)
Stddev faults/cpu-4 8759.4355 ( 0.00%) 8117.4629 ( 7.33%)
Stddev faults/cpu-7 20638.6659 ( 0.00%) 2290.0083 ( 88.90%)
Stddev faults/cpu-12 4003.9838 ( 0.00%) 5297.7747 ( -32.31%)
Stddev faults/cpu-21 2127.4059 ( 0.00%) 1186.5330 ( 44.23%)
Stddev faults/cpu-30 558.8082 ( 0.00%) 1366.5374 (-144.54%)
Stddev faults/cpu-40 1234.8354 ( 0.00%) 768.8031 ( 37.74%)
Stddev faults/sec-1 14757.0434 ( 0.00%) 22740.7172 ( -54.10%)
Stddev faults/sec-4 49934.6675 ( 0.00%) 54133.9449 ( -8.41%)
Stddev faults/sec-7 152781.8690 ( 0.00%) 16415.0736 ( 89.26%)
Stddev faults/sec-12 228697.8709 ( 0.00%) 239575.3690 ( -4.76%)
Stddev faults/sec-21 70244.4600 ( 0.00%) 75031.5776 ( -6.81%)
Stddev faults/sec-30 52147.1842 ( 0.00%) 58651.5496 ( -12.47%)
Stddev faults/sec-40 149846.3761 ( 0.00%) 113646.0640 ( 24.16%)

v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
User 47.46 48.21
System 540.43 546.03
Elapsed 61.85 64.33

** gitcheckout is probably the workload that takes the biggest hit (-35% elapsed time).
System time, as expected, increases quite a bit, coming from the overhead of blocking.

gitcheckout
v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
System mean 9.49 ( 0.00%) 9.82 ( -3.49%)
System stddev 0.20 ( 0.00%) 0.39 ( -95.73%)
Elapsed mean 22.87 ( 0.00%) 30.90 ( -35.12%)
Elapsed stddev 0.39 ( 0.00%) 6.32 (-1526.48%)
CPU mean 98.07 ( 0.00%) 76.27 ( 22.23%)
CPU stddev 0.70 ( 0.00%) 14.63 (-1978.37%)


v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
User 224.06 224.80
System 176.05 181.01
Elapsed 619.51 801.78


** freqmine is a threaded implementation of Frequent Itemset Mining (FIM), which
analyses a set of transactions looking to extract association rules -- a common
workload in retail. This configuration uses between 2 and 4*NUMCPUs threads.
The performance differences with this patchset are marginal.

freqmine-large
v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
Amean 2 216.89 ( 0.00%) 216.59 ( 0.14%)
Amean 5 91.56 ( 0.00%) 91.58 ( -0.02%)
Amean 8 59.41 ( 0.00%) 59.54 ( -0.22%)
Amean 12 44.19 ( 0.00%) 44.24 ( -0.12%)
Amean 21 33.97 ( 0.00%) 33.55 ( 1.25%)
Amean 30 33.28 ( 0.00%) 33.15 ( 0.40%)
Amean 48 34.38 ( 0.00%) 34.21 ( 0.48%)
Amean 79 33.22 ( 0.00%) 32.83 ( 1.19%)
Amean 110 36.15 ( 0.00%) 35.29 ( 2.40%)
Amean 141 35.63 ( 0.00%) 36.38 ( -2.12%)
Amean 160 36.31 ( 0.00%) 36.05 ( 0.73%)
Stddev 2 1.10 ( 0.00%) 0.19 ( 82.79%)
Stddev 5 0.23 ( 0.00%) 0.10 ( 54.31%)
Stddev 8 0.17 ( 0.00%) 0.43 (-146.19%)
Stddev 12 0.12 ( 0.00%) 0.12 ( -0.05%)
Stddev 21 0.49 ( 0.00%) 0.39 ( 21.88%)
Stddev 30 1.07 ( 0.00%) 0.93 ( 12.61%)
Stddev 48 0.76 ( 0.00%) 0.66 ( 12.07%)
Stddev 79 0.29 ( 0.00%) 0.58 ( -98.77%)
Stddev 110 1.10 ( 0.00%) 0.53 ( 51.93%)
Stddev 141 0.66 ( 0.00%) 0.79 ( -18.83%)
Stddev 160 0.27 ( 0.00%) 0.15 ( 42.71%)

v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
User 29346.21 28818.39
System 292.18 676.92
Elapsed 2622.81 2615.77


** kernbench (kernel builds). With increasing thread counts, the amount of
overhead from range locking is no more than ~5%.

kernbench
v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
Amean user-2 554.53 ( 0.00%) 555.74 ( -0.22%)
Amean user-4 566.23 ( 0.00%) 567.15 ( -0.16%)
Amean user-8 588.66 ( 0.00%) 589.68 ( -0.17%)
Amean user-16 647.97 ( 0.00%) 648.46 ( -0.08%)
Amean user-32 923.05 ( 0.00%) 925.25 ( -0.24%)
Amean user-64 1066.74 ( 0.00%) 1067.11 ( -0.03%)
Amean user-80 1082.50 ( 0.00%) 1082.11 ( 0.04%)
Amean syst-2 71.80 ( 0.00%) 74.90 ( -4.31%)
Amean syst-4 76.77 ( 0.00%) 79.91 ( -4.10%)
Amean syst-8 71.58 ( 0.00%) 74.83 ( -4.54%)
Amean syst-16 79.21 ( 0.00%) 82.95 ( -4.73%)
Amean syst-32 104.21 ( 0.00%) 108.47 ( -4.09%)
Amean syst-64 113.69 ( 0.00%) 119.39 ( -5.02%)
Amean syst-80 113.98 ( 0.00%) 120.18 ( -5.44%)
Amean elsp-2 307.65 ( 0.00%) 309.27 ( -0.53%)
Amean elsp-4 159.86 ( 0.00%) 160.94 ( -0.67%)
Amean elsp-8 84.76 ( 0.00%) 85.04 ( -0.33%)
Amean elsp-16 49.63 ( 0.00%) 49.56 ( 0.15%)
Amean elsp-32 37.52 ( 0.00%) 38.16 ( -1.68%)
Amean elsp-64 36.76 ( 0.00%) 37.03 ( -0.72%)
Amean elsp-80 37.09 ( 0.00%) 37.49 ( -1.08%)
Stddev user-2 0.97 ( 0.00%) 0.66 ( 32.20%)
Stddev user-4 0.52 ( 0.00%) 0.60 ( -17.34%)
Stddev user-8 0.64 ( 0.00%) 0.23 ( 63.28%)
Stddev user-16 1.40 ( 0.00%) 0.64 ( 54.46%)
Stddev user-32 1.32 ( 0.00%) 0.95 ( 28.47%)
Stddev user-64 0.77 ( 0.00%) 1.47 ( -91.61%)
Stddev user-80 1.12 ( 0.00%) 0.94 ( 16.00%)
Stddev syst-2 0.45 ( 0.00%) 0.45 ( 0.22%)
Stddev syst-4 0.41 ( 0.00%) 0.58 ( -41.24%)
Stddev syst-8 0.55 ( 0.00%) 0.28 ( 49.35%)
Stddev syst-16 0.22 ( 0.00%) 0.29 ( -30.98%)
Stddev syst-32 0.44 ( 0.00%) 0.56 ( -27.75%)
Stddev syst-64 0.47 ( 0.00%) 0.48 ( -1.91%)
Stddev syst-80 0.24 ( 0.00%) 0.60 (-144.20%)
Stddev elsp-2 0.46 ( 0.00%) 0.31 ( 32.97%)
Stddev elsp-4 0.14 ( 0.00%) 0.25 ( -72.38%)
Stddev elsp-8 0.36 ( 0.00%) 0.08 ( 77.92%)
Stddev elsp-16 0.74 ( 0.00%) 0.58 ( 22.00%)
Stddev elsp-32 0.31 ( 0.00%) 0.74 (-138.95%)
Stddev elsp-64 0.12 ( 0.00%) 0.12 ( 1.62%)
Stddev elsp-80 0.23 ( 0.00%) 0.15 ( 35.38%)

v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
User 28309.95 28341.20
System 3320.18 3473.73
Elapsed 3792.13 3850.21



** reaim's compute, new_dbase and shared workloads were tested, with new_dbase
taking up to a 20% hit. This is expected, as this microbenchmark context
switches a lot and benefits from the spin-on-owner feature that rwsems have
and range locks lack. Compute, on the other hand, was boosted at higher
thread counts.

reaim
v4.15-rc8 v4.15-rc8
range-mmap_lock-v1
Hmean compute-1 5652.98 ( 0.00%) 5738.64 ( 1.52%)
Hmean compute-21 81997.42 ( 0.00%) 81997.42 ( -0.00%)
Hmean compute-41 135622.27 ( 0.00%) 138959.73 ( 2.46%)
Hmean compute-61 179272.55 ( 0.00%) 174367.92 ( -2.74%)
Hmean compute-81 200187.60 ( 0.00%) 195250.60 ( -2.47%)
Hmean compute-101 207337.40 ( 0.00%) 187633.35 ( -9.50%)
Hmean compute-121 179018.55 ( 0.00%) 206087.69 ( 15.12%)
Hmean compute-141 175887.20 ( 0.00%) 195528.60 ( 11.17%)
Hmean compute-161 198063.33 ( 0.00%) 190335.54 ( -3.90%)
Hmean new_dbase-1 56.64 ( 0.00%) 60.76 ( 7.27%)
Hmean new_dbase-21 11149.48 ( 0.00%) 10082.35 ( -9.57%)
Hmean new_dbase-41 25161.87 ( 0.00%) 21626.83 ( -14.05%)
Hmean new_dbase-61 39858.32 ( 0.00%) 33956.04 ( -14.81%)
Hmean new_dbase-81 55057.19 ( 0.00%) 43879.73 ( -20.30%)
Hmean new_dbase-101 67566.57 ( 0.00%) 56323.77 ( -16.64%)
Hmean new_dbase-121 79517.22 ( 0.00%) 64877.67 ( -18.41%)
Hmean new_dbase-141 92365.91 ( 0.00%) 76571.18 ( -17.10%)
Hmean new_dbase-161 101590.77 ( 0.00%) 85332.76 ( -16.00%)
Hmean shared-1 71.26 ( 0.00%) 76.43 ( 7.26%)
Hmean shared-21 11546.39 ( 0.00%) 10521.92 ( -8.87%)
Hmean shared-41 28302.97 ( 0.00%) 22116.50 ( -21.86%)
Hmean shared-61 23814.56 ( 0.00%) 21886.13 ( -8.10%)
Hmean shared-81 11578.89 ( 0.00%) 16423.55 ( 41.84%)
Hmean shared-101 9991.41 ( 0.00%) 11378.95 ( 13.89%)
Hmean shared-121 9884.83 ( 0.00%) 10010.92 ( 1.28%)
Hmean shared-141 9911.88 ( 0.00%) 9637.14 ( -2.77%)
Hmean shared-161 8587.14 ( 0.00%) 9613.53 ( 11.95%)
Stddev compute-1 94.42 ( 0.00%) 166.37 ( -76.20%)
Stddev compute-21 1915.36 ( 0.00%) 2582.96 ( -34.85%)
Stddev compute-41 4822.88 ( 0.00%) 6057.32 ( -25.60%)
Stddev compute-61 4425.14 ( 0.00%) 3676.90 ( 16.91%)
Stddev compute-81 5549.60 ( 0.00%) 17213.90 (-210.18%)
Stddev compute-101 19395.33 ( 0.00%) 28315.96 ( -45.99%)
Stddev compute-121 16140.56 ( 0.00%) 27927.63 ( -73.03%)
Stddev compute-141 9616.27 ( 0.00%) 31273.43 (-225.21%)
Stddev compute-161 34746.00 ( 0.00%) 20706.81 ( 40.41%)
Stddev new_dbase-1 1.08 ( 0.00%) 0.80 ( 25.62%)
Stddev new_dbase-21 356.67 ( 0.00%) 297.23 ( 16.66%)
Stddev new_dbase-41 739.68 ( 0.00%) 1287.72 ( -74.09%)
Stddev new_dbase-61 896.06 ( 0.00%) 1293.55 ( -44.36%)
Stddev new_dbase-81 2003.96 ( 0.00%) 2018.08 ( -0.70%)
Stddev new_dbase-101 2101.25 ( 0.00%) 3461.91 ( -64.75%)
Stddev new_dbase-121 3294.30 ( 0.00%) 3917.20 ( -18.91%)
Stddev new_dbase-141 3488.81 ( 0.00%) 5242.36 ( -50.26%)
Stddev new_dbase-161 2744.12 ( 0.00%) 5262.36 ( -91.77%)
Stddev shared-1 1.38 ( 0.00%) 1.24 ( 9.84%)
Stddev shared-21 1930.40 ( 0.00%) 232.81 ( 87.94%)
Stddev shared-41 1939.93 ( 0.00%) 2316.09 ( -19.39%)
Stddev shared-61 15001.13 ( 0.00%) 12004.82 ( 19.97%)
Stddev shared-81 1313.02 ( 0.00%) 14583.51 (-1010.68%)
Stddev shared-101 355.44 ( 0.00%) 393.79 ( -10.79%)
Stddev shared-121 1736.68 ( 0.00%) 782.50 ( 54.94%)
Stddev shared-141 1865.93 ( 0.00%) 1140.24 ( 38.89%)
Stddev shared-161 1155.19 ( 0.00%) 2045.55 ( -77.07%)

Overall sys% always increases, which is expected, but with the exception
of git-checkout, the worst case scenario is not that excruciating.

Full test and details (including sysbench oltp mysql and specjbb) can be found here:
https://linux-scalability.org/range-mmap_lock/tweed-results/

Testing: I have set up an mmtests config file with all the workloads described:
http://linux-scalability.org/mmtests-config

Applies on top of linux-next (20180202). At least compile tested on
the following architectures:

x86_64, alpha, arm32, blackfin, cris, frv, ia64, m32r, m68k, mips, microblaze
ppc, s390, sparc, tile and xtensa.


Thanks!

Davidlohr Bueso (64):
interval-tree: build unconditionally
Introduce range reader/writer lock
mm: introduce mm locking wrappers
mm: add a range parameter to the vm_fault structure
mm,khugepaged: prepare passing of rangelock field to vm_fault
mm: teach pagefault paths about range locking
mm/hugetlb: teach hugetlb_fault() about range locking
mm: teach lock_page_or_retry() about range locking
mm/mmu_notifier: teach oom reaper about range locking
kernel/exit: teach exit_mm() about range locking
prctl: teach about range locking
fs/userfaultfd: teach userfaultfd_must_wait() about range locking
fs/proc: teach about range locking
fs/coredump: teach about range locking
ipc: use mm locking wrappers
virt: use mm locking wrappers
kernel: use mm locking wrappers
mm/ksm: teach about range locking
mm/mlock: use mm locking wrappers
mm/madvise: use mm locking wrappers
mm: teach drop/take_all_locks() about range locking
mm: avoid mmap_sem trylock in vm_insert_page()
mm: huge pagecache: do not check mmap_sem state
mm/thp: disable mmap_sem is_locked checks
mm: use mm locking wrappers
fs: use mm locking wrappers
arch/{x86,sh,ppc}: teach bad_area() about range locking
arch/x86: use mm locking wrappers
arch/alpha: use mm locking wrappers
arch/tile: use mm locking wrappers
arch/sparc: use mm locking wrappers
arch/s390: use mm locking wrappers
arch/powerpc: use mm locking wrappers
arch/parisc: use mm locking wrappers
arch/ia64: use mm locking wrappers
arch/mips: use mm locking wrappers
arch/arc: use mm locking wrappers
arch/blackfin: use mm locking wrappers
arch/m68k: use mm locking wrappers
arch/sh: use mm locking wrappers
arch/cris: use mm locking wrappers
arch/frv: use mm locking wrappers
arch/hexagon: use mm locking wrappers
arch/score: use mm locking wrappers
arch/m32r: use mm locking wrappers
arch/metag: use mm locking wrappers
arch/microblaze: use mm locking wrappers
arch/tile: use mm locking wrappers
arch/xtensa: use mm locking wrappers
arch/unicore32: use mm locking wrappers
arch/mn10300: use mm locking wrappers
arch/openrisc: use mm locking wrappers
arch/nios2: use mm locking wrappers
arch/arm: use mm locking wrappers
arch/riscv: use mm locking wrappers
drivers/android: use mm locking wrappers
drivers/gpu: use mm locking wrappers
drivers/infiniband: use mm locking wrappers
drivers/iommu: use mm locking helpers
drivers/xen: use mm locking wrappers
staging/lustre: use generic range lock
drivers: use mm locking wrappers (the rest)
mm/mmap: hack drop down_write_nest_lock()
mm: convert mmap_sem to range mmap_lock

arch/alpha/kernel/traps.c | 6 +-
arch/alpha/mm/fault.c | 13 +-
arch/arc/kernel/troubleshoot.c | 5 +-
arch/arc/mm/fault.c | 15 +-
arch/arm/kernel/process.c | 5 +-
arch/arm/kernel/swp_emulate.c | 5 +-
arch/arm/lib/uaccess_with_memcpy.c | 18 +-
arch/arm/mm/fault.c | 14 +-
arch/arm/probes/uprobes/core.c | 5 +-
arch/arm64/kernel/traps.c | 5 +-
arch/arm64/kernel/vdso.c | 12 +-
arch/arm64/mm/fault.c | 13 +-
arch/blackfin/kernel/ptrace.c | 5 +-
arch/blackfin/kernel/trace.c | 7 +-
arch/cris/mm/fault.c | 13 +-
arch/frv/mm/fault.c | 13 +-
arch/hexagon/kernel/vdso.c | 5 +-
arch/hexagon/mm/vm_fault.c | 11 +-
arch/ia64/kernel/perfmon.c | 10 +-
arch/ia64/mm/fault.c | 13 +-
arch/ia64/mm/init.c | 13 +-
arch/m32r/mm/fault.c | 15 +-
arch/m68k/kernel/sys_m68k.c | 18 +-
arch/m68k/mm/fault.c | 11 +-
arch/metag/mm/fault.c | 13 +-
arch/microblaze/mm/fault.c | 15 +-
arch/mips/kernel/traps.c | 5 +-
arch/mips/kernel/vdso.c | 7 +-
arch/mips/mm/c-octeon.c | 5 +-
arch/mips/mm/c-r4k.c | 5 +-
arch/mips/mm/fault.c | 13 +-
arch/mn10300/mm/fault.c | 13 +-
arch/nios2/mm/fault.c | 15 +-
arch/nios2/mm/init.c | 5 +-
arch/openrisc/kernel/dma.c | 6 +-
arch/openrisc/mm/fault.c | 13 +-
arch/parisc/kernel/traps.c | 7 +-
arch/parisc/mm/fault.c | 11 +-
arch/powerpc/include/asm/mmu_context.h | 3 +-
arch/powerpc/include/asm/powernv.h | 5 +-
arch/powerpc/kernel/vdso.c | 7 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 +-
arch/powerpc/kvm/book3s_64_vio.c | 5 +-
arch/powerpc/kvm/book3s_hv.c | 7 +-
arch/powerpc/kvm/e500_mmu_host.c | 5 +-
arch/powerpc/mm/copro_fault.c | 8 +-
arch/powerpc/mm/fault.c | 35 +-
arch/powerpc/mm/mmu_context_iommu.c | 5 +-
arch/powerpc/mm/subpage-prot.c | 13 +-
arch/powerpc/oprofile/cell/spu_task_sync.c | 7 +-
arch/powerpc/platforms/cell/spufs/file.c | 6 +-
arch/powerpc/platforms/powernv/npu-dma.c | 7 +-
arch/riscv/kernel/vdso.c | 5 +-
arch/riscv/mm/fault.c | 13 +-
arch/s390/include/asm/gmap.h | 14 +-
arch/s390/kernel/vdso.c | 5 +-
arch/s390/kvm/gaccess.c | 35 +-
arch/s390/kvm/kvm-s390.c | 24 +-
arch/s390/kvm/priv.c | 29 +-
arch/s390/mm/fault.c | 9 +-
arch/s390/mm/gmap.c | 125 ++--
arch/s390/pci/pci_mmio.c | 5 +-
arch/score/mm/fault.c | 13 +-
arch/sh/kernel/sys_sh.c | 7 +-
arch/sh/kernel/vsyscall/vsyscall.c | 5 +-
arch/sh/mm/fault.c | 50 +-
arch/sparc/mm/fault_32.c | 24 +-
arch/sparc/mm/fault_64.c | 15 +-
arch/sparc/vdso/vma.c | 5 +-
arch/tile/kernel/stack.c | 5 +-
arch/tile/mm/elf.c | 12 +-
arch/tile/mm/fault.c | 15 +-
arch/tile/mm/pgtable.c | 6 +-
arch/um/include/asm/mmu_context.h | 8 +-
arch/um/kernel/tlb.c | 12 +-
arch/um/kernel/trap.c | 9 +-
arch/unicore32/mm/fault.c | 14 +-
arch/x86/entry/vdso/vma.c | 14 +-
arch/x86/events/core.c | 2 +-
arch/x86/include/asm/mmu_context.h | 5 +-
arch/x86/include/asm/mpx.h | 6 +-
arch/x86/kernel/tboot.c | 2 +-
arch/x86/kernel/vm86_32.c | 5 +-
arch/x86/mm/debug_pagetables.c | 13 +-
arch/x86/mm/fault.c | 40 +-
arch/x86/mm/mpx.c | 55 +-
arch/x86/um/vdso/vma.c | 5 +-
arch/xtensa/mm/fault.c | 13 +-
drivers/android/binder_alloc.c | 12 +-
drivers/gpu/drm/Kconfig | 2 -
drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 7 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 11 +-
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 +-
drivers/gpu/drm/i915/Kconfig | 1 -
drivers/gpu/drm/i915/i915_gem.c | 5 +-
drivers/gpu/drm/i915/i915_gem_userptr.c | 13 +-
drivers/gpu/drm/radeon/radeon_cs.c | 5 +-
drivers/gpu/drm/radeon/radeon_gem.c | 7 +-
drivers/gpu/drm/radeon/radeon_mn.c | 7 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 4 +-
drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 +-
drivers/infiniband/core/umem.c | 19 +-
drivers/infiniband/core/umem_odp.c | 14 +-
drivers/infiniband/hw/hfi1/user_pages.c | 15 +-
drivers/infiniband/hw/mlx4/main.c | 5 +-
drivers/infiniband/hw/mlx5/main.c | 5 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 17 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 19 +-
drivers/iommu/amd_iommu_v2.c | 9 +-
drivers/iommu/intel-svm.c | 9 +-
drivers/media/v4l2-core/videobuf-core.c | 5 +-
drivers/media/v4l2-core/videobuf-dma-contig.c | 5 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 22 +-
drivers/misc/cxl/cxllib.c | 5 +-
drivers/misc/cxl/fault.c | 5 +-
drivers/misc/mic/scif/scif_rma.c | 17 +-
drivers/misc/sgi-gru/grufault.c | 91 +--
drivers/misc/sgi-gru/grufile.c | 5 +-
drivers/oprofile/buffer_sync.c | 12 +-
drivers/staging/lustre/lustre/llite/Makefile | 2 +-
drivers/staging/lustre/lustre/llite/file.c | 16 +-
.../staging/lustre/lustre/llite/llite_internal.h | 4 +-
drivers/staging/lustre/lustre/llite/llite_mmap.c | 4 +-
drivers/staging/lustre/lustre/llite/range_lock.c | 240 --------
drivers/staging/lustre/lustre/llite/range_lock.h | 83 ---
drivers/staging/lustre/lustre/llite/vvp_io.c | 7 +-
.../media/atomisp/pci/atomisp2/hmm/hmm_bo.c | 5 +-
drivers/tee/optee/call.c | 5 +-
drivers/vfio/vfio_iommu_spapr_tce.c | 8 +-
drivers/vfio/vfio_iommu_type1.c | 16 +-
drivers/xen/gntdev.c | 5 +-
drivers/xen/privcmd.c | 12 +-
fs/aio.c | 7 +-
fs/binfmt_elf.c | 3 +-
fs/coredump.c | 5 +-
fs/exec.c | 38 +-
fs/proc/base.c | 33 +-
fs/proc/internal.h | 3 +
fs/proc/task_mmu.c | 51 +-
fs/proc/task_nommu.c | 22 +-
fs/proc/vmcore.c | 14 +-
fs/userfaultfd.c | 64 +-
include/asm-generic/mm_hooks.h | 3 +-
include/linux/hmm.h | 4 +-
include/linux/huge_mm.h | 2 -
include/linux/hugetlb.h | 9 +-
include/linux/ksm.h | 6 +-
include/linux/lockdep.h | 33 +
include/linux/migrate.h | 4 +-
include/linux/mm.h | 159 ++++-
include/linux/mm_types.h | 4 +-
include/linux/mmu_notifier.h | 6 +-
include/linux/pagemap.h | 7 +-
include/linux/range_lock.h | 189 ++++++
include/linux/uprobes.h | 15 +-
include/linux/userfaultfd_k.h | 5 +-
ipc/shm.c | 22 +-
kernel/acct.c | 5 +-
kernel/events/core.c | 5 +-
kernel/events/uprobes.c | 66 +-
kernel/exit.c | 9 +-
kernel/fork.c | 18 +-
kernel/futex.c | 7 +-
kernel/locking/Makefile | 2 +-
kernel/locking/range_lock.c | 667 +++++++++++++++++++++
kernel/sched/fair.c | 5 +-
kernel/sys.c | 22 +-
kernel/trace/trace_output.c | 5 +-
lib/Kconfig | 14 -
lib/Kconfig.debug | 1 -
lib/Makefile | 3 +-
mm/filemap.c | 9 +-
mm/frame_vector.c | 8 +-
mm/gup.c | 79 ++-
mm/hmm.c | 37 +-
mm/hugetlb.c | 16 +-
mm/init-mm.c | 2 +-
mm/internal.h | 3 +-
mm/khugepaged.c | 57 +-
mm/ksm.c | 64 +-
mm/madvise.c | 80 ++-
mm/memcontrol.c | 21 +-
mm/memory.c | 30 +-
mm/mempolicy.c | 56 +-
mm/migrate.c | 30 +-
mm/mincore.c | 28 +-
mm/mlock.c | 49 +-
mm/mmap.c | 145 +++--
mm/mmu_notifier.c | 14 +-
mm/mprotect.c | 28 +-
mm/mremap.c | 34 +-
mm/msync.c | 9 +-
mm/nommu.c | 55 +-
mm/oom_kill.c | 11 +-
mm/pagewalk.c | 60 +-
mm/process_vm_access.c | 8 +-
mm/shmem.c | 2 +-
mm/swapfile.c | 7 +-
mm/userfaultfd.c | 24 +-
mm/util.c | 12 +-
security/tomoyo/domain.c | 3 +-
virt/kvm/arm/mmu.c | 17 +-
virt/kvm/async_pf.c | 7 +-
virt/kvm/kvm_main.c | 25 +-
205 files changed, 2817 insertions(+), 1651 deletions(-)
delete mode 100644 drivers/staging/lustre/lustre/llite/range_lock.c
delete mode 100644 drivers/staging/lustre/lustre/llite/range_lock.h
create mode 100644 include/linux/range_lock.h
create mode 100644 kernel/locking/range_lock.c

--
2.13.6



2018-02-05 01:29:01

by Davidlohr Bueso

Subject: [PATCH 04/64] mm: add a range parameter to the vm_fault structure

From: Davidlohr Bueso <[email protected]>

When handling a page fault, the mmap_sem may be released and reacquired during
the processing. As moving to a range lock requires passing the range parameter
to the lock/unlock operations, this patch adds a pointer to the range structure
used when locking the mmap_sem to the vm_fault structure.

It is currently unused, but will be used in the next patches.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/mm.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9d2ed23aa894..bcf2509d448d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -361,6 +361,10 @@ struct vm_fault {
* page table to avoid allocation from
* atomic context.
*/
+ struct range_lock *lockrange; /* Range lock interval in use for when
+ * the mm lock is manipulated throughout
+ * its lifespan.
+ */
};

/* page entry size for vm->huge_fault() */
--
2.13.6


2018-02-05 01:29:20

by Davidlohr Bueso

Subject: [PATCH 15/64] ipc: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This is straightforward, as the necessary syscalls already
know about mmrange. No change in semantics.
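
For context only (a hedged note, not part of the diff below): mmrange in these
callers is expected to be a stack-allocated full range, along the lines of:

	DEFINE_RANGE_LOCK_FULL(mmrange);	/* spans [0, RANGE_LOCK_FULL] */

	if (mm_write_lock_killable(current->mm, &mmrange))
		return -EINTR;
	/* ... attach or detach the segment ... */
	mm_write_unlock(current->mm, &mmrange);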

Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/shm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index 6c29c791c7f2..4ab752647ca9 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1398,7 +1398,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
if (err)
goto out_fput;

- if (down_write_killable(&current->mm->mmap_sem)) {
+ if (mm_write_lock_killable(current->mm, &mmrange)) {
err = -EINTR;
goto out_fput;
}
@@ -1419,7 +1419,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
if (IS_ERR_VALUE(addr))
err = (long)addr;
invalid:
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
if (populate)
mm_populate(addr, populate);

@@ -1494,7 +1494,7 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
if (addr & ~PAGE_MASK)
return retval;

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

/*
@@ -1585,7 +1585,7 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)

#endif

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return retval;
}

--
2.13.6


2018-02-05 01:29:28

by Davidlohr Bueso

Subject: [PATCH 02/64] Introduce range reader/writer lock

From: Davidlohr Bueso <[email protected]>

This implements a sleepable range rwlock, based on an interval tree, serializing
conflicting/intersecting/overlapping ranges within the tree. The largest range
is given by [0, ~0] (inclusive). Unlike traditional locks, range locking
involves dealing with both the tree itself and the range to be locked; the range
is normally stack allocated and must always be explicitly prepared/initialized by
the user as a sorted interval [a0, a1] with a0 <= a1, before actually taking the lock.
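
As a minimal usage sketch of the API introduced below (error handling omitted;
mytree and frob_region() are made-up names for illustration):

	DEFINE_RANGE_LOCK_TREE(mytree);

	static void frob_region(unsigned long start, unsigned long last)
	{
		struct range_lock range;

		range_lock_init(&range, start, last);	/* caller ensures start <= last */
		range_write_lock(&mytree, &range);
		/* exclusive access to [start, last] against overlapping ranges in mytree */
		range_write_unlock(&mytree, &range);
	}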

Interval-tree based range locking is about controlling tasks' forward
progress when adding an arbitrary interval (node) to the tree, depending
on any overlapping ranges. A task can only continue (wakeup) if there are
no intersecting ranges, thus achieving mutual exclusion. To this end, a
reference counter is kept for each intersecting range in the tree
(_before_ adding itself to it). To enable shared locking semantics,
the reader to-be-locked will not take reference if an intersecting node
is also a reader, therefore ignoring the node altogether.

Fairness and freedom from starvation are guaranteed by the lack of lock
stealing; range locks thus depend directly on interval tree semantics. This is
particularly relevant for iterations, where the rbtree key is given by the
interval's low endpoint, and duplicates are walked as in an inorder traversal
of the tree.

How much does it cost:
----------------------

The cost of lock and unlock of a range is O((1+R_int)log(R_all)), where R_all
is the total number of ranges and R_int is the number of ranges intersecting the
new range to be added.

Due to their sharable nature, full range locks can be compared with rw-semaphores;
the comparison also covers the mutex standpoint, as writer-only rwsem behavior is
pretty similar to a mutex nowadays.

The first difference is the memory footprint: tree locks are smaller than rwsems
(32 vs 40 bytes), but require an additional 72 bytes of stack for the range structure.

Secondly, because every range call is serialized by the tree->lock, any lock()
fastpath will at least incur an interval_tree_insert() plus a spinlock lock+unlock,
compared to a single atomic instruction in the case of rwsems. The same obviously
holds for the unlock() case.

The torture module was used to measure one-to-one differences in lock acquisition
with increasing core counts over a period of 10 minutes. Readers and writers are
interleaved, with a slight advantage to writers, as the writer is the first kthread
that is created. The following shows the average ops/minute with various thread
setups on boxes with small and large core counts.

** 4-core AMD Opteron **
(write-only)
rwsem-2thr: 4198.5, stddev: 7.77
range-2thr: 4199.1, stddev: 0.73

rwsem-4thr: 6036.8, stddev: 50.91
range-4thr: 6004.9, stddev: 126.57

rwsem-8thr: 6245.6, stddev: 59.39
range-8thr: 6229.3, stddev: 10.60

(read-only)
rwsem-2thr: 5930.7, stddev: 21.92
range-2thr: 5917.3, stddev: 25.45

rwsem-4thr: 9881.6, stddev: 0.70
range-4thr: 9540.2, stddev: 98.28

rwsem-8thr: 11633.2, stddev: 7.72
range-8thr: 11314.7, stddev: 62.22

For the read-only and write-only cases, there is very little difference between the
range lock and rwsems, with at most a 3% hit, which could very well be considered noise.

(read-write)
rwsem-write-1thr: 1744.8, stddev: 11.59
rwsem-read-1thr: 1043.1, stddev: 3.97
range-write-1thr: 1740.2, stddev: 5.99
range-read-1thr: 1022.5, stddev: 6.41

rwsem-write-2thr: 1662.5, stddev: 0.70
rwsem-read-2thr: 1278.0, stddev: 25.45
range-write-2thr: 1321.5, stddev: 51.61
range-read-2thr: 1243.5, stddev: 30.40

rwsem-write-4thr: 1761.0, stddev: 11.31
rwsem-read-4thr: 1426.0, stddev: 7.07
range-write-4thr: 1417.0, stddev: 29.69
range-read-4thr: 1398.0, stddev: 56.56

While a single reader and a single writer thread do not show much difference,
increasing core counts shows that in reader/writer workloads, writer threads can take
a hit in raw performance of up to ~20%, while reader throughput is quite similar
for both locks.

** 240-core (ht) IvyBridge **
(write-only)
rwsem-120thr: 6844.5, stddev: 82.73
range-120thr: 6070.5, stddev: 85.55

rwsem-240thr: 6292.5, stddev: 146.3
range-240thr: 6099.0, stddev: 15.55

rwsem-480thr: 6164.8, stddev: 33.94
range-480thr: 6062.3, stddev: 19.79

(read-only)
rwsem-120thr: 136860.4, stddev: 2539.92
range-120thr: 138052.2, stddev: 327.39

rwsem-240thr: 235297.5, stddev: 2220.50
range-240thr: 232099.1, stddev: 3614.72

rwsem-480thr: 272683.0, stddev: 3924.32
range-480thr: 256539.2, stddev: 9541.69

Similar to the small box, larger machines show that range locks take only a minor
(up to ~6% for 480 threads) hit even in completely exclusive or shared scenarios.

(read-write)
rwsem-write-60thr: 4658.1, stddev: 1303.19
rwsem-read-60thr: 1108.7, stddev: 718.42
range-write-60thr: 3203.6, stddev: 139.30
range-read-60thr: 1852.8, stddev: 147.5

rwsem-write-120thr: 3971.3, stddev: 1413.0
rwsem-read-120thr: 1038.8, stddev: 353.51
range-write-120thr: 2282.1, stddev: 207.18
range-read-120thr: 1856.5, stddev: 198.69

rwsem-write-240thr: 4112.7, stddev: 2448.1
rwsem-read-240thr: 1277.4, stddev: 430.30
range-write-240thr: 2353.1, stddev: 502.04
range-read-240thr: 1551.5, stddev: 361.33

When mixing readers and writers, writer throughput can take a hit of up to ~40%,
similar to the 4-core machine; however, reader threads can increase their number of
acquisitions by up to ~80%. In any case, the overall writer+reader throughput will
always be higher for rwsems. A huge factor in this behavior is that range locks
lack the writer spin-on-owner feature.

On both machines, when actually testing threads acquiring different ranges, the
range lock's throughput always outperforms the rwsem, due to the increased
parallelism; which is no surprise either. As such, microbenchmarks that merely
pound on a single lock will pretty much always suffer upon direct lock
conversions, but not enough to matter in the overall picture.

Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
include/linux/lockdep.h | 33 +++
include/linux/range_lock.h | 189 +++++++++++++
kernel/locking/Makefile | 2 +-
kernel/locking/range_lock.c | 667 ++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 890 insertions(+), 1 deletion(-)
create mode 100644 include/linux/range_lock.h
create mode 100644 kernel/locking/range_lock.c

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..5df01b567d16 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -490,6 +490,16 @@ do { \
lock_acquired(&(_lock)->dep_map, _RET_IP_); \
} while (0)

+#define RANGE_LOCK_CONTENDED(tree, _lock, try, lock) \
+do { \
+ if (!try(tree, _lock)) { \
+ lock_contended(&(tree)->dep_map, _RET_IP_); \
+ lock(tree, _lock); \
+ } \
+ lock_acquired(&(tree)->dep_map, _RET_IP_); \
+} while (0)
+
+
#define LOCK_CONTENDED_RETURN(_lock, try, lock) \
({ \
int ____err = 0; \
@@ -502,6 +512,18 @@ do { \
____err; \
})

+#define RANGE_LOCK_CONTENDED_RETURN(tree, _lock, try, lock) \
+({ \
+ int ____err = 0; \
+ if (!try(tree, _lock)) { \
+ lock_contended(&(tree)->dep_map, _RET_IP_); \
+ ____err = lock(tree, _lock); \
+ } \
+ if (!____err) \
+ lock_acquired(&(tree)->dep_map, _RET_IP_); \
+ ____err; \
+})
+
#else /* CONFIG_LOCK_STAT */

#define lock_contended(lockdep_map, ip) do {} while (0)
@@ -510,9 +532,15 @@ do { \
#define LOCK_CONTENDED(_lock, try, lock) \
lock(_lock)

+#define RANGE_LOCK_CONTENDED(tree, _lock, try, lock) \
+ lock(tree, _lock)
+
#define LOCK_CONTENDED_RETURN(_lock, try, lock) \
lock(_lock)

+#define RANGE_LOCK_CONTENDED_RETURN(tree, _lock, try, lock) \
+ lock(tree, _lock)
+
#endif /* CONFIG_LOCK_STAT */

#ifdef CONFIG_LOCKDEP
@@ -577,6 +605,11 @@ static inline void print_irqtrace_events(struct task_struct *curr)
#define rwsem_acquire_read(l, s, t, i) lock_acquire_shared(l, s, t, NULL, i)
#define rwsem_release(l, n, i) lock_release(l, n, i)

+#define range_lock_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
+#define range_lock_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
+#define range_lock_acquire_read(l, s, t, i) lock_acquire_shared(l, s, t, NULL, i)
+#define range_lock_release(l, n, i) lock_release(l, n, i)
+
#define lock_map_acquire(l) lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
#define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
#define lock_map_acquire_tryread(l) lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_)
diff --git a/include/linux/range_lock.h b/include/linux/range_lock.h
new file mode 100644
index 000000000000..51448addb2fa
--- /dev/null
+++ b/include/linux/range_lock.h
@@ -0,0 +1,189 @@
+/*
+ * Range/interval rw-locking
+ * -------------------------
+ *
+ * Interval-tree based range locking is about controlling tasks' forward
+ * progress when adding an arbitrary interval (node) to the tree, depending
+ * on any overlapping ranges. A task can only continue (or wakeup) if there
+ * are no intersecting ranges, thus achieving mutual exclusion. To this end,
+ * a reference counter is kept for each intersecting range in the tree
+ * (_before_ adding itself to it). To enable shared locking semantics,
+ * the reader to-be-locked will not take reference if an intersecting node
+ * is also a reader, therefore ignoring the node altogether.
+ *
+ * Given the above, range lock order and fairness has fifo semantics among
+ * contended ranges. Among uncontended ranges, order is given by the inorder
+ * tree traversal which is performed.
+ *
+ * Example: Tasks A, B, C. Tree is empty.
+ *
+ * t0: A grabs the (free) lock [a,n]; thus ref[a,n] = 0.
+ * t1: B tries to grab the lock [g,z]; thus ref[g,z] = 1.
+ * t2: C tries to grab the lock [b,m]; thus ref[b,m] = 2.
+ *
+ * t3: A releases the lock [a,n]; thus ref[g,z] = 0, ref[b,m] = 1.
+ * t4: B grabs the lock [g.z].
+ *
+ * In addition, freedom of starvation is guaranteed by the fact that there
+ * is no lock stealing going on, everything being serialized by the tree->lock.
+ *
+ * The cost of lock and unlock of a range is O((1+R_int)log(R_all)) where
+ * R_all is total number of ranges and R_int is the number of ranges
+ * intersecting the operated range.
+ */
+#ifndef _LINUX_RANGE_LOCK_H
+#define _LINUX_RANGE_LOCK_H
+
+#include <linux/rbtree.h>
+#include <linux/interval_tree.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+
+/*
+ * The largest range will span [0,RANGE_LOCK_FULL].
+ */
+#define RANGE_LOCK_FULL ~0UL
+
+struct range_lock {
+ struct interval_tree_node node;
+ struct task_struct *tsk;
+ /* Number of ranges which are blocking acquisition of the lock */
+ unsigned int blocking_ranges;
+ u64 seqnum;
+};
+
+struct range_lock_tree {
+ struct rb_root_cached root;
+ spinlock_t lock;
+ u64 seqnum; /* track order of incoming ranges, avoid overflows */
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+};
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define __RANGE_LOCK_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
+#else
+# define __RANGE_LOCK_DEP_MAP_INIT(lockname)
+#endif
+
+#define __RANGE_LOCK_TREE_INITIALIZER(name) \
+ { .root = RB_ROOT_CACHED \
+ , .seqnum = 0 \
+ , .lock = __SPIN_LOCK_UNLOCKED(name.lock) \
+ __RANGE_LOCK_DEP_MAP_INIT(name) } \
+
+#define DEFINE_RANGE_LOCK_TREE(name) \
+ struct range_lock_tree name = __RANGE_LOCK_TREE_INITIALIZER(name)
+
+#define __RANGE_LOCK_INITIALIZER(__start, __last) { \
+ .node = { \
+ .start = (__start) \
+ ,.last = (__last) \
+ } \
+ , .tsk = NULL \
+ , .blocking_ranges = 0 \
+ , .seqnum = 0 \
+ }
+
+#define DEFINE_RANGE_LOCK(name, start, last) \
+ struct range_lock name = __RANGE_LOCK_INITIALIZER((start), (last))
+
+#define DEFINE_RANGE_LOCK_FULL(name) \
+ struct range_lock name = __RANGE_LOCK_INITIALIZER(0, RANGE_LOCK_FULL)
+
+static inline void
+__range_lock_tree_init(struct range_lock_tree *tree,
+ const char *name, struct lock_class_key *key)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ /*
+ * Make sure we are not reinitializing a held lock:
+ */
+ debug_check_no_locks_freed((void *)tree, sizeof(*tree));
+ lockdep_init_map(&tree->dep_map, name, key, 0);
+#endif
+ tree->root = RB_ROOT_CACHED;
+ spin_lock_init(&tree->lock);
+ tree->seqnum = 0;
+}
+
+#define range_lock_tree_init(tree) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ __range_lock_tree_init((tree), #tree, &__key); \
+} while (0)
+
+void range_lock_init(struct range_lock *lock,
+ unsigned long start, unsigned long last);
+void range_lock_init_full(struct range_lock *lock);
+
+/*
+ * lock for reading
+ */
+void range_read_lock(struct range_lock_tree *tree, struct range_lock *lock);
+int range_read_lock_interruptible(struct range_lock_tree *tree,
+ struct range_lock *lock);
+int range_read_lock_killable(struct range_lock_tree *tree,
+ struct range_lock *lock);
+int range_read_trylock(struct range_lock_tree *tree, struct range_lock *lock);
+void range_read_unlock(struct range_lock_tree *tree, struct range_lock *lock);
+
+/*
+ * lock for writing
+ */
+void range_write_lock(struct range_lock_tree *tree, struct range_lock *lock);
+int range_write_lock_interruptible(struct range_lock_tree *tree,
+ struct range_lock *lock);
+int range_write_lock_killable(struct range_lock_tree *tree,
+ struct range_lock *lock);
+int range_write_trylock(struct range_lock_tree *tree, struct range_lock *lock);
+void range_write_unlock(struct range_lock_tree *tree, struct range_lock *lock);
+
+void range_downgrade_write(struct range_lock_tree *tree,
+ struct range_lock *lock);
+
+int range_is_locked(struct range_lock_tree *tree, struct range_lock *lock);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/*
+ * nested locking. NOTE: range locks are not allowed to recurse
+ * (which occurs if the same task tries to acquire the same
+ * lock instance multiple times), but multiple locks of the
+ * same lock class might be taken, if the order of the locks
+ * is always the same. This ordering rule can be expressed
+ * to lockdep via the _nested() APIs, but enumerating the
+ * subclasses that are used. (If the nesting relationship is
+ * static then another method for expressing nested locking is
+ * the explicit definition of lock class keys and the use of
+ * lockdep_set_class() at lock initialization time.
+ * See Documentation/locking/lockdep-design.txt for more details.)
+ */
+extern void range_read_lock_nested(struct range_lock_tree *tree,
+ struct range_lock *lock, int subclass);
+extern void range_write_lock_nested(struct range_lock_tree *tree,
+ struct range_lock *lock, int subclass);
+extern int range_write_lock_killable_nested(struct range_lock_tree *tree,
+ struct range_lock *lock, int subclass);
+extern void _range_write_lock_nest_lock(struct range_lock_tree *tree,
+ struct range_lock *lock, struct lockdep_map *nest_lock);
+
+# define range_write_lock_nest_lock(tree, lock, nest_lock) \
+do { \
+ typecheck(struct lockdep_map *, &(nest_lock)->dep_map); \
+ _range_write_lock_nest_lock(tree, lock, &(nest_lock)->dep_map); \
+} while (0);
+
+#else
+# define range_read_lock_nested(tree, lock, subclass) \
+ range_read_lock(tree, lock)
+# define range_write_lock_nest_lock(tree, lock, nest_lock) \
+ range_write_lock(tree, lock)
+# define range_write_lock_nested(tree, lock, subclass) \
+ range_write_lock(tree, lock)
+# define range_write_lock_killable_nested(tree, lock, subclass) \
+ range_write_lock_killable(tree, lock)
+#endif
+
+#endif
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 392c7f23af76..348a6f7d8c21 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -3,7 +3,7 @@
# and is generally not a function of system call inputs.
KCOV_INSTRUMENT := n

-obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
+obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o range_lock.o

ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
diff --git a/kernel/locking/range_lock.c b/kernel/locking/range_lock.c
new file mode 100644
index 000000000000..673c30c07743
--- /dev/null
+++ b/kernel/locking/range_lock.c
@@ -0,0 +1,667 @@
+/*
+ * Copyright (C) 2017 Jan Kara, Davidlohr Bueso.
+ */
+
+#include <linux/rbtree.h>
+#include <linux/spinlock.h>
+#include <linux/range_lock.h>
+#include <linux/lockdep.h>
+#include <linux/sched/signal.h>
+#include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
+#include <linux/sched.h>
+#include <linux/export.h>
+
+#define range_interval_tree_foreach(node, root, start, last) \
+ for (node = interval_tree_iter_first(root, start, last); \
+ node; node = interval_tree_iter_next(node, start, last))
+
+#define to_range_lock(ptr) container_of(ptr, struct range_lock, node)
+#define to_interval_tree_node(ptr) \
+ container_of(ptr, struct interval_tree_node, rb)
+
+static inline void
+__range_tree_insert(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ lock->seqnum = tree->seqnum++;
+ interval_tree_insert(&lock->node, &tree->root);
+}
+
+static inline void
+__range_tree_remove(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ interval_tree_remove(&lock->node, &tree->root);
+}
+
+/*
+ * lock->tsk reader tracking.
+ */
+#define RANGE_FLAG_READER 1UL
+
+static inline struct task_struct *range_lock_waiter(struct range_lock *lock)
+{
+ return (struct task_struct *)
+ ((unsigned long) lock->tsk & ~RANGE_FLAG_READER);
+}
+
+static inline void range_lock_set_reader(struct range_lock *lock)
+{
+ lock->tsk = (struct task_struct *)
+ ((unsigned long)lock->tsk | RANGE_FLAG_READER);
+}
+
+static inline void range_lock_clear_reader(struct range_lock *lock)
+{
+ lock->tsk = (struct task_struct *)
+ ((unsigned long)lock->tsk & ~RANGE_FLAG_READER);
+}
+
+static inline bool range_lock_is_reader(struct range_lock *lock)
+{
+ return (unsigned long) lock->tsk & RANGE_FLAG_READER;
+}
+
+static inline void
+__range_lock_init(struct range_lock *lock,
+ unsigned long start, unsigned long last)
+{
+ WARN_ON(start > last);
+
+ lock->node.start = start;
+ lock->node.last = last;
+ RB_CLEAR_NODE(&lock->node.rb);
+ lock->blocking_ranges = 0;
+ lock->tsk = NULL;
+ lock->seqnum = 0;
+}
+
+/**
+ * range_lock_init - Initialize a range lock
+ * @lock: the range lock to be initialized
+ * @start: start of the interval (inclusive)
+ * @last: last location in the interval (inclusive)
+ *
+ * Initialize the range's [start, last] such that it can
+ * later be locked. User is expected to enter a sorted
+ * range, such that @start <= @last.
+ *
+ * It is not allowed to initialize an already locked range.
+ */
+void range_lock_init(struct range_lock *lock,
+ unsigned long start, unsigned long last)
+{
+ __range_lock_init(lock, start, last);
+}
+EXPORT_SYMBOL_GPL(range_lock_init);
+
+/**
+ * range_lock_init_full - Initialize a full range lock
+ * @lock: the range lock to be initialized
+ *
+ * Initialize the full range.
+ *
+ * It is not allowed to initialize an already locked range.
+ */
+void range_lock_init_full(struct range_lock *lock)
+{
+ __range_lock_init(lock, 0, RANGE_LOCK_FULL);
+}
+EXPORT_SYMBOL_GPL(range_lock_init_full);
+
+static inline void
+range_lock_put(struct range_lock *lock, struct wake_q_head *wake_q)
+{
+ if (!--lock->blocking_ranges)
+ wake_q_add(wake_q, range_lock_waiter(lock));
+}
+
+static inline int wait_for_ranges(struct range_lock_tree *tree,
+ struct range_lock *lock, long state)
+{
+ int ret = 0;
+
+ while (true) {
+ set_current_state(state);
+
+ /* do we need to go to sleep? */
+ if (!lock->blocking_ranges)
+ break;
+
+ if (unlikely(signal_pending_state(state, current))) {
+ struct interval_tree_node *node;
+ unsigned long flags;
+ DEFINE_WAKE_Q(wake_q);
+
+ ret = -EINTR;
+ /*
+ * We're not taking the lock after all, cleanup
+ * after ourselves.
+ */
+ spin_lock_irqsave(&tree->lock, flags);
+
+ range_lock_clear_reader(lock);
+ __range_tree_remove(tree, lock);
+
+ range_interval_tree_foreach(node, &tree->root,
+ lock->node.start,
+ lock->node.last) {
+ struct range_lock *blked;
+ blked = to_range_lock(node);
+
+ if (range_lock_is_reader(lock) &&
+ range_lock_is_reader(blked))
+ continue;
+
+ /* unaccount for threads _we_ are blocking */
+ if (lock->seqnum < blked->seqnum)
+ range_lock_put(blked, &wake_q);
+ }
+
+ spin_unlock_irqrestore(&tree->lock, flags);
+ wake_up_q(&wake_q);
+ break;
+ }
+
+ schedule();
+ }
+
+ __set_current_state(TASK_RUNNING);
+ return ret;
+}
+
+/**
+ * range_read_trylock - Trylock for reading
+ * @tree: interval tree
+ * @lock: the range lock to be trylocked
+ *
+ * The trylock is against the range itself, not the @tree->lock.
+ *
+ * Returns 1 if successful, 0 if contention (must block to acquire).
+ */
+static inline int __range_read_trylock(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ int ret = true;
+ unsigned long flags;
+ struct interval_tree_node *node;
+
+ spin_lock_irqsave(&tree->lock, flags);
+
+ range_interval_tree_foreach(node, &tree->root,
+ lock->node.start, lock->node.last) {
+ struct range_lock *blocked_lock;
+ blocked_lock = to_range_lock(node);
+
+ if (!range_lock_is_reader(blocked_lock)) {
+ ret = false;
+ goto unlock;
+ }
+ }
+
+ range_lock_set_reader(lock);
+ __range_tree_insert(tree, lock);
+unlock:
+ spin_unlock_irqrestore(&tree->lock, flags);
+
+ return ret;
+}
+
+int range_read_trylock(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ int ret = __range_read_trylock(tree, lock);
+
+ if (ret)
+ range_lock_acquire_read(&tree->dep_map, 0, 1, _RET_IP_);
+
+ return ret;
+}
+
+EXPORT_SYMBOL_GPL(range_read_trylock);
+
+static __always_inline int __sched
+__range_read_lock_common(struct range_lock_tree *tree,
+ struct range_lock *lock, long state)
+{
+ struct interval_tree_node *node;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tree->lock, flags);
+
+ range_interval_tree_foreach(node, &tree->root,
+ lock->node.start, lock->node.last) {
+ struct range_lock *blocked_lock;
+ blocked_lock = to_range_lock(node);
+
+ if (!range_lock_is_reader(blocked_lock))
+ lock->blocking_ranges++;
+ }
+
+ __range_tree_insert(tree, lock);
+
+ lock->tsk = current;
+ range_lock_set_reader(lock);
+ spin_unlock_irqrestore(&tree->lock, flags);
+
+ return wait_for_ranges(tree, lock, state);
+}
+
+static __always_inline int
+__range_read_lock(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ return __range_read_lock_common(tree, lock, TASK_UNINTERRUPTIBLE);
+}
+
+/**
+ * range_read_lock - Lock for reading
+ * @tree: interval tree
+ * @lock: the range lock to be locked
+ *
+ * Returns when the lock has been acquired or sleeps until
+ * there are no overlapping ranges.
+ */
+void range_read_lock(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ might_sleep();
+ range_lock_acquire_read(&tree->dep_map, 0, 0, _RET_IP_);
+
+ RANGE_LOCK_CONTENDED(tree, lock,
+ __range_read_trylock, __range_read_lock);
+}
+EXPORT_SYMBOL_GPL(range_read_lock);
+
+/**
+ * range_read_lock_interruptible - Lock for reading (interruptible)
+ * @tree: interval tree
+ * @lock: the range lock to be locked
+ *
+ * Lock the range like range_read_lock(), and return 0 if the
+ * lock has been acquired or sleep until there are no
+ * overlapping ranges. If a signal arrives while waiting for the
+ * lock then this function returns -EINTR.
+ */
+int range_read_lock_interruptible(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ might_sleep();
+ return __range_read_lock_common(tree, lock, TASK_INTERRUPTIBLE);
+}
+EXPORT_SYMBOL_GPL(range_read_lock_interruptible);
+
+/**
+ * range_read_lock_killable - Lock for reading (killable)
+ * @tree: interval tree
+ * @lock: the range lock to be locked
+ *
+ * Lock the range like range_read_lock(), and return 0 if the
+ * lock has been acquired or sleep until there are no
+ * overlapping ranges. If a signal arrives while waiting for the
+ * lock then this function returns -EINTR.
+ */
+static __always_inline int
+__range_read_lock_killable(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ return __range_read_lock_common(tree, lock, TASK_KILLABLE);
+}
+
+int range_read_lock_killable(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ might_sleep();
+ range_lock_acquire_read(&tree->dep_map, 0, 0, _RET_IP_);
+
+ if (RANGE_LOCK_CONTENDED_RETURN(tree, lock, __range_read_trylock,
+ __range_read_lock_killable)) {
+ range_lock_release(&tree->dep_map, 1, _RET_IP_);
+ return -EINTR;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(range_read_lock_killable);
+
+/**
+ * range_read_unlock - Unlock for reading
+ * @tree: interval tree
+ * @lock: the range lock to be unlocked
+ *
+ * Wakes any blocked readers, when @lock is the only conflicting range.
+ *
+ * It is not allowed to unlock an unacquired read lock.
+ */
+void range_read_unlock(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ struct interval_tree_node *node;
+ unsigned long flags;
+ DEFINE_WAKE_Q(wake_q);
+
+ spin_lock_irqsave(&tree->lock, flags);
+
+ range_lock_clear_reader(lock);
+ __range_tree_remove(tree, lock);
+
+ range_lock_release(&tree->dep_map, 1, _RET_IP_);
+
+ range_interval_tree_foreach(node, &tree->root,
+ lock->node.start, lock->node.last) {
+ struct range_lock *blocked_lock;
+ blocked_lock = to_range_lock(node);
+
+ if (!range_lock_is_reader(blocked_lock))
+ range_lock_put(blocked_lock, &wake_q);
+ }
+
+ spin_unlock_irqrestore(&tree->lock, flags);
+ wake_up_q(&wake_q);
+}
+EXPORT_SYMBOL_GPL(range_read_unlock);
+
+/*
+ * Check for overlaps for fast write_trylock(), which is the same
+ * optimization that interval_tree_iter_first() does.
+ */
+static inline bool __range_overlaps_intree(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ struct interval_tree_node *root;
+ struct range_lock *left;
+
+ if (unlikely(RB_EMPTY_ROOT(&tree->root.rb_root)))
+ return false;
+
+ root = to_interval_tree_node(tree->root.rb_root.rb_node);
+ left = to_range_lock(to_interval_tree_node(rb_first_cached(&tree->root)));
+
+ return lock->node.start <= root->__subtree_last &&
+ left->node.start <= lock->node.last;
+}
+
+/**
+ * range_write_trylock - Trylock for writing
+ * @tree: interval tree
+ * @lock: the range lock to be trylocked
+ *
+ * The trylock is against the range itself, not the @tree->lock.
+ *
+ * Returns 1 if successful, 0 if contention (must block to acquire).
+ */
+static inline int __range_write_trylock(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ int overlaps;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tree->lock, flags);
+ overlaps = __range_overlaps_intree(tree, lock);
+
+ if (!overlaps) {
+ range_lock_clear_reader(lock);
+ __range_tree_insert(tree, lock);
+ }
+
+ spin_unlock_irqrestore(&tree->lock, flags);
+
+ return !overlaps;
+}
+
+int range_write_trylock(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ int ret = __range_write_trylock(tree, lock);
+
+ if (ret)
+ range_lock_acquire(&tree->dep_map, 0, 1, _RET_IP_);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(range_write_trylock);
+
+static __always_inline int __sched
+__range_write_lock_common(struct range_lock_tree *tree,
+ struct range_lock *lock, long state)
+{
+ struct interval_tree_node *node;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tree->lock, flags);
+
+ range_interval_tree_foreach(node, &tree->root,
+ lock->node.start, lock->node.last) {
+ /*
+ * As a writer, we always consider an existing node. We
+ * need to wait; either the intersecting node is another
+ * writer or we have a reader that needs to finish.
+ */
+ lock->blocking_ranges++;
+ }
+
+ __range_tree_insert(tree, lock);
+
+ lock->tsk = current;
+ spin_unlock_irqrestore(&tree->lock, flags);
+
+ return wait_for_ranges(tree, lock, state);
+}
+
+static __always_inline int
+__range_write_lock(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ return __range_write_lock_common(tree, lock, TASK_UNINTERRUPTIBLE);
+}
+
+/**
+ * range_write_lock - Lock for writing
+ * @tree: interval tree
+ * @lock: the range lock to be locked
+ *
+ * Returns when the lock has been acquired or sleeps until
+ * there are no overlapping ranges.
+ */
+void range_write_lock(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ might_sleep();
+ range_lock_acquire(&tree->dep_map, 0, 0, _RET_IP_);
+
+ RANGE_LOCK_CONTENDED(tree, lock,
+ __range_write_trylock, __range_write_lock);
+}
+EXPORT_SYMBOL_GPL(range_write_lock);
+
+/**
+ * range_write_lock_interruptible - Lock for writing (interruptible)
+ * @tree: interval tree
+ * @lock: the range lock to be locked
+ *
+ * Lock the range like range_write_lock(), and return 0 if the
+ * lock has been acquired or sleep until there are no
+ * overlapping ranges. If a signal arrives while waiting for the
+ * lock then this function returns -EINTR.
+ */
+int range_write_lock_interruptible(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ might_sleep();
+ return __range_write_lock_common(tree, lock, TASK_INTERRUPTIBLE);
+}
+EXPORT_SYMBOL_GPL(range_write_lock_interruptible);
+
+/**
+ * range_write_lock_killable - Lock for writing (killable)
+ * @tree: interval tree
+ * @lock: the range lock to be locked
+ *
+ * Lock the range like range_write_lock(), and return 0 if the
+ * lock has been acquired or sleep until there are no
+ * overlapping ranges. If a signal arrives while waiting for the
+ * lock then this function returns -EINTR.
+ */
+static __always_inline int
+__range_write_lock_killable(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ return __range_write_lock_common(tree, lock, TASK_KILLABLE);
+}
+
+int range_write_lock_killable(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ might_sleep();
+ range_lock_acquire(&tree->dep_map, 0, 0, _RET_IP_);
+
+ if (RANGE_LOCK_CONTENDED_RETURN(tree, lock, __range_write_trylock,
+ __range_write_lock_killable)) {
+ range_lock_release(&tree->dep_map, 1, _RET_IP_);
+ return -EINTR;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(range_write_lock_killable);
+
+/**
+ * range_write_unlock - Unlock for writing
+ * @tree: interval tree
+ * @lock: the range lock to be unlocked
+ *
+ * Wakes any blocked readers, when @lock is the only conflicting range.
+ *
+ * It is not allowed to unlock an unacquired write lock.
+ */
+void range_write_unlock(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ struct interval_tree_node *node;
+ unsigned long flags;
+ DEFINE_WAKE_Q(wake_q);
+
+ spin_lock_irqsave(&tree->lock, flags);
+
+ range_lock_clear_reader(lock);
+ __range_tree_remove(tree, lock);
+
+ range_lock_release(&tree->dep_map, 1, _RET_IP_);
+
+ range_interval_tree_foreach(node, &tree->root,
+ lock->node.start, lock->node.last) {
+ struct range_lock *blocked_lock;
+ blocked_lock = to_range_lock(node);
+
+ range_lock_put(blocked_lock, &wake_q);
+ }
+
+ spin_unlock_irqrestore(&tree->lock, flags);
+ wake_up_q(&wake_q);
+}
+EXPORT_SYMBOL_GPL(range_write_unlock);
+
+/**
+ * range_downgrade_write - Downgrade write range lock to read lock
+ * @tree: interval tree
+ * @lock: the range lock to be downgraded
+ *
+ * Wakes any blocked readers, when @lock is the only conflicting range.
+ *
+ * It is not allowed to downgrade an unacquired write lock.
+ */
+void range_downgrade_write(struct range_lock_tree *tree,
+ struct range_lock *lock)
+{
+ unsigned long flags;
+ struct interval_tree_node *node;
+ DEFINE_WAKE_Q(wake_q);
+
+ lock_downgrade(&tree->dep_map, _RET_IP_);
+
+ spin_lock_irqsave(&tree->lock, flags);
+
+ WARN_ON(range_lock_is_reader(lock));
+
+ range_interval_tree_foreach(node, &tree->root,
+ lock->node.start, lock->node.last) {
+ struct range_lock *blocked_lock;
+ blocked_lock = to_range_lock(node);
+
+ /*
+ * Unaccount for any blocked reader lock. Wakeup if possible.
+ */
+ if (range_lock_is_reader(blocked_lock))
+ range_lock_put(blocked_lock, &wake_q);
+ }
+
+ range_lock_set_reader(lock);
+ spin_unlock_irqrestore(&tree->lock, flags);
+ wake_up_q(&wake_q);
+}
+EXPORT_SYMBOL_GPL(range_downgrade_write);
+
+/**
+ * range_is_locked - Returns 1 if the given range is already either reader or
+ * writer owned. Otherwise 0.
+ * @tree: interval tree
+ * @lock: the range lock to be checked
+ *
+ * Similar to trylocks, this is against the range itself, not the @tree->lock.
+ */
+int range_is_locked(struct range_lock_tree *tree, struct range_lock *lock)
+{
+ int overlaps;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tree->lock, flags);
+ overlaps = __range_overlaps_intree(tree, lock);
+	spin_unlock_irqrestore(&tree->lock, flags);
+
+ return overlaps;
+}
+EXPORT_SYMBOL_GPL(range_is_locked);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+
+void range_read_lock_nested(struct range_lock_tree *tree,
+ struct range_lock *lock, int subclass)
+{
+ might_sleep();
+ range_lock_acquire_read(&tree->dep_map, subclass, 0, _RET_IP_);
+
+ RANGE_LOCK_CONTENDED(tree, lock, __range_read_trylock, __range_read_lock);
+}
+EXPORT_SYMBOL_GPL(range_read_lock_nested);
+
+void _range_write_lock_nest_lock(struct range_lock_tree *tree,
+ struct range_lock *lock,
+ struct lockdep_map *nest)
+{
+ might_sleep();
+ range_lock_acquire_nest(&tree->dep_map, 0, 0, nest, _RET_IP_);
+
+ RANGE_LOCK_CONTENDED(tree, lock,
+ __range_write_trylock, __range_write_lock);
+}
+EXPORT_SYMBOL_GPL(_range_write_lock_nest_lock);
+
+void range_write_lock_nested(struct range_lock_tree *tree,
+ struct range_lock *lock, int subclass)
+{
+ might_sleep();
+ range_lock_acquire(&tree->dep_map, subclass, 0, _RET_IP_);
+
+ RANGE_LOCK_CONTENDED(tree, lock,
+ __range_write_trylock, __range_write_lock);
+}
+EXPORT_SYMBOL_GPL(range_write_lock_nested);
+
+int range_write_lock_killable_nested(struct range_lock_tree *tree,
+ struct range_lock *lock, int subclass)
+{
+ might_sleep();
+ range_lock_acquire(&tree->dep_map, subclass, 0, _RET_IP_);
+
+ if (RANGE_LOCK_CONTENDED_RETURN(tree, lock, __range_write_trylock,
+ __range_write_lock_killable)) {
+ range_lock_release(&tree->dep_map, 1, _RET_IP_);
+ return -EINTR;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(range_write_lock_killable_nested);
+#endif
--
2.13.6
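
As a quick illustration of the write-side API added above, a minimal
sketch follows. The range lock calls, DEFINE_RANGE_LOCK_FULL() and
range_lock_tree_init() are the ones from this series; the "dummy"
structure and function names are made up for the example.

struct dummy_object {
	struct range_lock_tree rlock;
};

static void dummy_init(struct dummy_object *o)
{
	range_lock_tree_init(&o->rlock);
}

static int dummy_modify(struct dummy_object *o)
{
	/* A "full" range conservatively conflicts with every other lock. */
	DEFINE_RANGE_LOCK_FULL(range);

	if (range_write_lock_killable(&o->rlock, &range))
		return -EINTR;	/* fatal signal while blocked */

	/* ... exclusive access to the guarded object here ... */

	range_write_unlock(&o->rlock, &range);
	return 0;
}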


2018-02-05 01:29:41

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 05/64] mm,khugepaged: prepare passing of rangelock field to vm_fault

From: Davidlohr Bueso <[email protected]>

When collapsing huge pages from swapin, a vm_fault structure is built
and passed to do_swap_page(). The new lockrange field of the vm_fault
structure must be set correctly when dealing with range locks.

We teach the main workhorse, khugepaged_scan_mm_slot(), to pass on
a full range lock.
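
Condensed, the pattern looks like the sketch below. Only the fields
touched by this patch are taken from the hunk; the helper name and the
remaining vm_fault fields (.vma, .address) are assumed from the
surrounding kernel code, so treat this as a sketch rather than the
exact khugepaged code.

static int dummy_swapin(struct vm_area_struct *vma, unsigned long address,
			pmd_t *pmd, struct range_lock *mmrange)
{
	struct vm_fault vmf = {
		.vma = vma,
		.address = address,
		.flags = FAULT_FLAG_ALLOW_RETRY,
		.pmd = pmd,
		.pgoff = linear_page_index(vma, address),
		.lockrange = mmrange,	/* new: the range held by the caller */
	};

	return do_swap_page(&vmf);
}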

Signed-off-by: Davidlohr Bueso <[email protected]>
---
mm/khugepaged.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b7e2268dfc9a..0b91ce730160 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -873,7 +873,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
static bool __collapse_huge_page_swapin(struct mm_struct *mm,
struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd,
- int referenced)
+ int referenced,
+ struct range_lock *mmrange)
{
int swapped_in = 0, ret = 0;
struct vm_fault vmf = {
@@ -882,6 +883,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
.flags = FAULT_FLAG_ALLOW_RETRY,
.pmd = pmd,
.pgoff = linear_page_index(vma, address),
+ .lockrange = mmrange,
};

/* we only decide to swapin, if there is enough young ptes */
@@ -926,9 +928,10 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
}

static void collapse_huge_page(struct mm_struct *mm,
- unsigned long address,
- struct page **hpage,
- int node, int referenced)
+ unsigned long address,
+ struct page **hpage,
+ int node, int referenced,
+ struct range_lock *mmrange)
{
pmd_t *pmd, _pmd;
pte_t *pte;
@@ -986,7 +989,7 @@ static void collapse_huge_page(struct mm_struct *mm,
* If it fails, we release mmap_sem and jump out_nolock.
* Continuing to collapse causes inconsistency.
*/
- if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced)) {
+ if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced, mmrange)) {
mem_cgroup_cancel_charge(new_page, memcg, true);
up_read(&mm->mmap_sem);
goto out_nolock;
@@ -1093,7 +1096,8 @@ static void collapse_huge_page(struct mm_struct *mm,
static int khugepaged_scan_pmd(struct mm_struct *mm,
struct vm_area_struct *vma,
unsigned long address,
- struct page **hpage)
+ struct page **hpage,
+ struct range_lock *mmrange)
{
pmd_t *pmd;
pte_t *pte, *_pte;
@@ -1207,7 +1211,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
if (ret) {
node = khugepaged_find_target_node();
/* collapse_huge_page will return with the mmap_sem released */
- collapse_huge_page(mm, address, hpage, node, referenced);
+ collapse_huge_page(mm, address, hpage, node, referenced,
+ mmrange);
}
out:
trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1658,6 +1663,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
struct mm_struct *mm;
struct vm_area_struct *vma;
int progress = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

VM_BUG_ON(!pages);
VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&khugepaged_mm_lock));
@@ -1731,7 +1737,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
} else {
ret = khugepaged_scan_pmd(mm, vma,
khugepaged_scan.address,
- hpage);
+ hpage, &mmrange);
}
/* move to next address */
khugepaged_scan.address += HPAGE_PMD_SIZE;
--
2.13.6


2018-02-05 01:29:46

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 08/64] mm: teach lock_page_or_retry() about range locking

From: Davidlohr Bueso <[email protected]>

The mmap_sem locking rules for lock_page_or_retry() depend on
whether the page ends up locked upon return, and can get funky.
As such, we need to teach the function about mmrange, which is
passed on via vm_fault.
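
Roughly, the caller-side rule is sketched below; dummy_fault() is
hypothetical, but the call mirrors the filemap_fault() hunk in this
patch.

static int dummy_fault(struct vm_fault *vmf, struct page *page)
{
	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags,
				vmf->lockrange)) {
		/*
		 * Page not locked; with FAULT_FLAG_ALLOW_RETRY the
		 * mmap_sem range may already have been dropped for us.
		 */
		put_page(page);
		return VM_FAULT_RETRY;
	}

	/* Page locked and the range still held: do the real work. */
	unlock_page(page);
	return 0;
}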

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/pagemap.h | 7 ++++---
mm/filemap.c | 5 +++--
mm/memory.c | 3 ++-
3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 34ce3ebf97d5..e41a734efbe0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -464,7 +464,7 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
extern void __lock_page(struct page *page);
extern int __lock_page_killable(struct page *page);
extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
- unsigned int flags);
+ unsigned int flags, struct range_lock *mmrange);
extern void unlock_page(struct page *page);

static inline int trylock_page(struct page *page)
@@ -504,10 +504,11 @@ static inline int lock_page_killable(struct page *page)
* __lock_page_or_retry().
*/
static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
- unsigned int flags)
+ unsigned int flags,
+ struct range_lock *mmrange)
{
might_sleep();
- return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+ return trylock_page(page) || __lock_page_or_retry(page, mm, flags, mmrange);
}

/*
diff --git a/mm/filemap.c b/mm/filemap.c
index 693f62212a59..6124ede79a4d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1293,7 +1293,7 @@ EXPORT_SYMBOL_GPL(__lock_page_killable);
* with the page locked and the mmap_sem unperturbed.
*/
int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
- unsigned int flags)
+ unsigned int flags, struct range_lock *mmrange)
{
if (flags & FAULT_FLAG_ALLOW_RETRY) {
/*
@@ -2529,7 +2529,8 @@ int filemap_fault(struct vm_fault *vmf)
goto no_cached_page;
}

- if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
+ if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags,
+ vmf->lockrange)) {
put_page(page);
return ret | VM_FAULT_RETRY;
}
diff --git a/mm/memory.c b/mm/memory.c
index 2d087b0e174d..5adcdc7dee80 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2986,7 +2986,8 @@ int do_swap_page(struct vm_fault *vmf)
goto out_release;
}

- locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+ locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags,
+ vmf->lockrange);

delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
if (!locked) {
--
2.13.6


2018-02-05 01:29:55

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 09/64] mm/mmu_notifier: teach oom reaper about range locking

From: Davidlohr Bueso <[email protected]>

Also begin using the mm_is_locked() wrapper; this is sometimes
the only reason why mm_has_blockable_invalidate_notifiers() needs
to be aware of the range passed down from oom_reap_task_mm().
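
In sketch form, the callee-side assertion pattern becomes the
following (dummy_callee() is hypothetical; the wrapper call is the
one used in the diff):

static bool dummy_callee(struct mm_struct *mm, struct range_lock *mmrange)
{
	/* Caller must hold the address space, for read or write. */
	WARN_ON_ONCE(!mm_is_locked(mm, mmrange));

	/* ... inspect state that mmap_sem protects ... */
	return false;
}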

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/mmu_notifier.h | 6 ++++--
mm/mmu_notifier.c | 5 +++--
mm/oom_kill.c | 3 ++-
3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 2d07a1ed5a31..9172cb0bc15d 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -236,7 +236,8 @@ extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
bool only_end);
extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
unsigned long start, unsigned long end);
-extern bool mm_has_blockable_invalidate_notifiers(struct mm_struct *mm);
+extern bool mm_has_blockable_invalidate_notifiers(struct mm_struct *mm,
+ struct range_lock *mmrange);

static inline void mmu_notifier_release(struct mm_struct *mm)
{
@@ -476,7 +477,8 @@ static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
{
}

-static inline bool mm_has_blockable_invalidate_notifiers(struct mm_struct *mm)
+static inline bool mm_has_blockable_invalidate_notifiers(struct mm_struct *mm,
+ struct range_lock *mmrange)
{
return false;
}
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index eff6b88a993f..3e8a1a10607e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -240,13 +240,14 @@ EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range);
* Must be called while holding mm->mmap_sem for either read or write.
* The result is guaranteed to be valid until mm->mmap_sem is dropped.
*/
-bool mm_has_blockable_invalidate_notifiers(struct mm_struct *mm)
+bool mm_has_blockable_invalidate_notifiers(struct mm_struct *mm,
+ struct range_lock *mmrange)
{
struct mmu_notifier *mn;
int id;
bool ret = false;

- WARN_ON_ONCE(!rwsem_is_locked(&mm->mmap_sem));
+ WARN_ON_ONCE(!mm_is_locked(mm, mmrange));

if (!mm_has_notifiers(mm))
return ret;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8219001708e0..2288e1cb1bc9 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -490,6 +490,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
struct mmu_gather tlb;
struct vm_area_struct *vma;
bool ret = true;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* We have to make sure to not race with the victim exit path
@@ -519,7 +520,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
* TODO: we really want to get rid of this ugly hack and make sure that
* notifiers cannot block for unbounded amount of time
*/
- if (mm_has_blockable_invalidate_notifiers(mm)) {
+ if (mm_has_blockable_invalidate_notifiers(mm, &mmrange)) {
up_read(&mm->mmap_sem);
schedule_timeout_idle(HZ);
goto unlock_oom;
--
2.13.6


2018-02-05 01:30:18

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 03/64] mm: introduce mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This patch adds the necessary wrappers to encapsulate mmap_sem
locking, which allows any future changes to be far more confined
to this one place. Users will be converted incrementally in the
next patches. The mm_[read/write]_[un]lock() naming scheme is used.
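
A minimal sketch of a "case (1)" conversion using the new wrappers
(dummy_walk() is hypothetical; while the rwsem is still underneath,
the range argument is simply ignored):

static void dummy_walk(struct mm_struct *mm)
{
	DEFINE_RANGE_LOCK_FULL(mmrange);	/* full range == old mmap_sem behavior */

	mm_read_lock(mm, &mmrange);
	/* ... walk vmas; nothing down the chain touches mmap_sem directly ... */
	mm_read_unlock(mm, &mmrange);
}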

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/mm.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47c06fd20f6a..9d2ed23aa894 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -12,6 +12,7 @@
#include <linux/list.h>
#include <linux/mmzone.h>
#include <linux/rbtree.h>
+#include <linux/range_lock.h>
#include <linux/atomic.h>
#include <linux/debug_locks.h>
#include <linux/mm_types.h>
@@ -2675,5 +2676,77 @@ void __init setup_nr_node_ids(void);
static inline void setup_nr_node_ids(void) {}
#endif

+/*
+ * Address space locking wrappers.
+ */
+static inline bool mm_is_locked(struct mm_struct *mm,
+ struct range_lock *range)
+{
+ return rwsem_is_locked(&mm->mmap_sem);
+}
+
+/* Reader wrappers */
+static inline int mm_read_trylock(struct mm_struct *mm,
+ struct range_lock *range)
+{
+ return down_read_trylock(&mm->mmap_sem);
+}
+
+static inline void mm_read_lock(struct mm_struct *mm, struct range_lock *range)
+{
+ down_read(&mm->mmap_sem);
+}
+
+static inline void mm_read_lock_nested(struct mm_struct *mm,
+ struct range_lock *range, int subclass)
+{
+ down_read_nested(&mm->mmap_sem, subclass);
+}
+
+static inline void mm_read_unlock(struct mm_struct *mm,
+ struct range_lock *range)
+{
+ up_read(&mm->mmap_sem);
+}
+
+/* Writer wrappers */
+static inline int mm_write_trylock(struct mm_struct *mm,
+ struct range_lock *range)
+{
+ return down_write_trylock(&mm->mmap_sem);
+}
+
+static inline void mm_write_lock(struct mm_struct *mm, struct range_lock *range)
+{
+ down_write(&mm->mmap_sem);
+}
+
+static inline int mm_write_lock_killable(struct mm_struct *mm,
+ struct range_lock *range)
+{
+ return down_write_killable(&mm->mmap_sem);
+}
+
+static inline void mm_downgrade_write(struct mm_struct *mm,
+ struct range_lock *range)
+{
+ downgrade_write(&mm->mmap_sem);
+}
+
+static inline void mm_write_unlock(struct mm_struct *mm,
+ struct range_lock *range)
+{
+ up_write(&mm->mmap_sem);
+}
+
+static inline void mm_write_lock_nested(struct mm_struct *mm,
+ struct range_lock *range, int subclass)
+{
+ down_write_nested(&mm->mmap_sem, subclass);
+}
+
+#define mm_write_nest_lock(mm, range, nest_lock) \
+ down_write_nest_lock(&(mm)->mmap_sem, nest_lock)
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
--
2.13.6


2018-02-05 01:31:11

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 13/64] fs/proc: teach about range locking

From: Davidlohr Bueso <[email protected]>

And use mm locking wrappers -- no change in semantics.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
fs/proc/base.c | 33 ++++++++++++++++++++-------------
fs/proc/task_mmu.c | 22 +++++++++++-----------
fs/proc/task_nommu.c | 22 +++++++++++++---------
3 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9298324325ed..c94ee3e54f25 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -220,6 +220,7 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
unsigned long p;
char c;
ssize_t rv;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

BUG_ON(*pos < 0);

@@ -242,12 +243,12 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
goto out_mmput;
}

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
env_end = mm->env_end;
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

BUG_ON(arg_start > arg_end);
BUG_ON(env_start > env_end);
@@ -915,6 +916,7 @@ static ssize_t environ_read(struct file *file, char __user *buf,
unsigned long src = *ppos;
int ret = 0;
struct mm_struct *mm = file->private_data;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
unsigned long env_start, env_end;

/* Ensure the process spawned far enough to have an environment. */
@@ -929,10 +931,10 @@ static ssize_t environ_read(struct file *file, char __user *buf,
if (!mmget_not_zero(mm))
goto free;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
env_start = mm->env_start;
env_end = mm->env_end;
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

while (count > 0) {
size_t this_len, max_len;
@@ -1962,9 +1964,11 @@ static int map_files_d_revalidate(struct dentry *dentry, unsigned int flags)
goto out;

if (!dname_to_vma_addr(dentry, &vm_start, &vm_end)) {
- down_read(&mm->mmap_sem);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ mm_read_lock(mm, &mmrange);
exact_vma_exists = !!find_exact_vma(mm, vm_start, vm_end);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}

mmput(mm);
@@ -1995,6 +1999,7 @@ static int map_files_get_link(struct dentry *dentry, struct path *path)
struct task_struct *task;
struct mm_struct *mm;
int rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

rc = -ENOENT;
task = get_proc_task(d_inode(dentry));
@@ -2011,14 +2016,14 @@ static int map_files_get_link(struct dentry *dentry, struct path *path)
goto out_mmput;

rc = -ENOENT;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_exact_vma(mm, vm_start, vm_end);
if (vma && vma->vm_file) {
*path = vma->vm_file->f_path;
path_get(path);
rc = 0;
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

out_mmput:
mmput(mm);
@@ -2091,6 +2096,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
struct task_struct *task;
int result;
struct mm_struct *mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

result = -ENOENT;
task = get_proc_task(dir);
@@ -2109,7 +2115,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
if (!mm)
goto out_put_task;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_exact_vma(mm, vm_start, vm_end);
if (!vma)
goto out_no_vma;
@@ -2119,7 +2125,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
(void *)(unsigned long)vma->vm_file->f_mode);

out_no_vma:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
mmput(mm);
out_put_task:
put_task_struct(task);
@@ -2144,6 +2150,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
struct map_files_info info;
struct map_files_info *p;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

ret = -ENOENT;
task = get_proc_task(file_inode(file));
@@ -2161,7 +2168,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
mm = get_task_mm(task);
if (!mm)
goto out_put_task;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

nr_files = 0;

@@ -2188,7 +2195,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
ret = -ENOMEM;
if (fa)
flex_array_free(fa);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
mmput(mm);
goto out_put_task;
}
@@ -2206,7 +2213,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
BUG();
}
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

for (i = 0; i < nr_files; i++) {
char buf[4 * sizeof(long) + 2]; /* max: %lx-%lx\0 */
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7c0a79a937b5..feb5bd4e5c82 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -136,7 +136,7 @@ static void vma_stop(struct proc_maps_private *priv)
struct mm_struct *mm = priv->mm;

release_task_mempolicy(priv);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &priv->mmrange);
mmput(mm);
}

@@ -175,7 +175,7 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
return NULL;

range_lock_init_full(&priv->mmrange);
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &priv->mmrange);
hold_task_mempolicy(priv);
priv->tail_vma = get_gate_vma(mm);

@@ -1135,7 +1135,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
};

if (type == CLEAR_REFS_MM_HIWATER_RSS) {
- if (down_write_killable(&mm->mmap_sem)) {
+ if (mm_write_lock_killable(mm, &mmrange)) {
count = -EINTR;
goto out_mm;
}
@@ -1145,18 +1145,18 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
* resident set size to this mm's current rss value.
*/
reset_mm_hiwater_rss(mm);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
goto out_mm;
}

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
tlb_gather_mmu(&tlb, mm, 0, -1);
if (type == CLEAR_REFS_SOFT_DIRTY) {
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (!(vma->vm_flags & VM_SOFTDIRTY))
continue;
- up_read(&mm->mmap_sem);
- if (down_write_killable(&mm->mmap_sem)) {
+ mm_read_unlock(mm, &mmrange);
+ if (mm_write_lock_killable(mm, &mmrange)) {
count = -EINTR;
goto out_mm;
}
@@ -1164,7 +1164,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
vma->vm_flags &= ~VM_SOFTDIRTY;
vma_set_page_prot(vma);
}
- downgrade_write(&mm->mmap_sem);
+ mm_downgrade_write(mm, &mmrange);
break;
}
mmu_notifier_invalidate_range_start(mm, 0, -1);
@@ -1174,7 +1174,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
if (type == CLEAR_REFS_SOFT_DIRTY)
mmu_notifier_invalidate_range_end(mm, 0, -1);
tlb_finish_mmu(&tlb, 0, -1);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
out_mm:
mmput(mm);
}
@@ -1528,10 +1528,10 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
/* overflow ? */
if (end < start_vaddr || end > end_vaddr)
end = end_vaddr;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, mmrange);
ret = walk_page_range(start_vaddr, end, &pagemap_walk,
mmrange);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
start_vaddr = end;

len = min(count, PM_ENTRY_BYTES * pm.pos);
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 5b62f57bd9bc..50a21813f926 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -23,9 +23,10 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
struct vm_area_struct *vma;
struct vm_region *region;
struct rb_node *p;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
unsigned long bytes = 0, sbytes = 0, slack = 0, size;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
vma = rb_entry(p, struct vm_area_struct, vm_rb);

@@ -77,7 +78,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
"Shared:\t%8lu bytes\n",
bytes, slack, sbytes);

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}

unsigned long task_vsize(struct mm_struct *mm)
@@ -85,13 +86,14 @@ unsigned long task_vsize(struct mm_struct *mm)
struct vm_area_struct *vma;
struct rb_node *p;
unsigned long vsize = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
vma = rb_entry(p, struct vm_area_struct, vm_rb);
vsize += vma->vm_end - vma->vm_start;
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return vsize;
}

@@ -103,8 +105,9 @@ unsigned long task_statm(struct mm_struct *mm,
struct vm_region *region;
struct rb_node *p;
unsigned long size = kobjsize(mm);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
vma = rb_entry(p, struct vm_area_struct, vm_rb);
size += kobjsize(vma);
@@ -119,7 +122,7 @@ unsigned long task_statm(struct mm_struct *mm,
>> PAGE_SHIFT;
*data = (PAGE_ALIGN(mm->start_stack) - (mm->start_data & PAGE_MASK))
>> PAGE_SHIFT;
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
size >>= PAGE_SHIFT;
size += *text + *data;
*resident = size;
@@ -223,13 +226,14 @@ static void *m_start(struct seq_file *m, loff_t *pos)
if (!mm || !mmget_not_zero(mm))
return NULL;

- down_read(&mm->mmap_sem);
+ range_lock_init_full(&priv->mmrange);
+ mm_read_lock(mm, &priv->mmrange);
/* start from the Nth VMA */
for (p = rb_first(&mm->mm_rb); p; p = rb_next(p))
if (n-- == 0)
return p;

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &priv->mmrange);
mmput(mm);
return NULL;
}
@@ -239,7 +243,7 @@ static void m_stop(struct seq_file *m, void *_vml)
struct proc_maps_private *priv = m->private;

if (!IS_ERR_OR_NULL(_vml)) {
- up_read(&priv->mm->mmap_sem);
+ mm_read_unlock(priv->mm, &priv->mmrange);
mmput(priv->mm);
}
if (priv->task) {
--
2.13.6


2018-02-05 01:31:12

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 18/64] mm/ksm: teach about range locking

From: Davidlohr Bueso <[email protected]>

The conversion is straightforward, as most calls acquire and release
mmap_sem within the same function context. No change in semantics.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
mm/ksm.c | 40 +++++++++++++++++++++++-----------------
1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 66c350cd9799..c7d62c367ffc 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -526,11 +526,11 @@ static void break_cow(struct rmap_item *rmap_item)
*/
put_anon_vma(rmap_item->anon_vma);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_mergeable_vma(mm, addr);
if (vma)
break_ksm(vma, addr, &mmrange);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}

static struct page *get_mergeable_page(struct rmap_item *rmap_item)
@@ -539,8 +539,9 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
unsigned long addr = rmap_item->address;
struct vm_area_struct *vma;
struct page *page;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_mergeable_vma(mm, addr);
if (!vma)
goto out;
@@ -556,7 +557,7 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
out:
page = NULL;
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return page;
}

@@ -936,7 +937,7 @@ static int unmerge_and_remove_all_rmap_items(void)
for (mm_slot = ksm_scan.mm_slot;
mm_slot != &ksm_mm_head; mm_slot = ksm_scan.mm_slot) {
mm = mm_slot->mm;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (ksm_test_exit(mm))
break;
@@ -949,7 +950,7 @@ static int unmerge_and_remove_all_rmap_items(void)
}

remove_trailing_rmap_items(mm_slot, &mm_slot->rmap_list);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

spin_lock(&ksm_mmlist_lock);
ksm_scan.mm_slot = list_entry(mm_slot->mm_list.next,
@@ -972,7 +973,7 @@ static int unmerge_and_remove_all_rmap_items(void)
return 0;

error:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
spin_lock(&ksm_mmlist_lock);
ksm_scan.mm_slot = &ksm_mm_head;
spin_unlock(&ksm_mmlist_lock);
@@ -1251,8 +1252,9 @@ static int try_to_merge_with_ksm_page(struct rmap_item *rmap_item,
struct mm_struct *mm = rmap_item->mm;
struct vm_area_struct *vma;
int err = -EFAULT;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_mergeable_vma(mm, rmap_item->address);
if (!vma)
goto out;
@@ -1268,7 +1270,7 @@ static int try_to_merge_with_ksm_page(struct rmap_item *rmap_item,
rmap_item->anon_vma = vma->anon_vma;
get_anon_vma(vma->anon_vma);
out:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return err;
}

@@ -2071,12 +2073,13 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
*/
if (ksm_use_zero_pages && (checksum == zero_checksum)) {
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_mergeable_vma(mm, rmap_item->address);
err = try_to_merge_one_page(vma, page,
ZERO_PAGE(rmap_item->address));
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
/*
* In case of failure, the page was not really empty, so we
* need to continue. Otherwise we're done.
@@ -2154,6 +2157,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
struct vm_area_struct *vma;
struct rmap_item *rmap_item;
int nid;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (list_empty(&ksm_mm_head.mm_list))
return NULL;
@@ -2210,7 +2214,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
}

mm = slot->mm;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
if (ksm_test_exit(mm))
vma = NULL;
else
@@ -2244,7 +2248,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
ksm_scan.address += PAGE_SIZE;
} else
put_page(*page);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return rmap_item;
}
put_page(*page);
@@ -2282,10 +2286,10 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)

free_mm_slot(slot);
clear_bit(MMF_VM_MERGEABLE, &mm->flags);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
mmdrop(mm);
} else {
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
/*
* up_read(&mm->mmap_sem) first because after
* spin_unlock(&ksm_mmlist_lock) run, the "mm" may
@@ -2474,8 +2478,10 @@ void __ksm_exit(struct mm_struct *mm)
clear_bit(MMF_VM_MERGEABLE, &mm->flags);
mmdrop(mm);
} else if (mm_slot) {
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ mm_write_lock(mm, &mmrange);
+ mm_write_unlock(mm, &mmrange);
}
}

--
2.13.6


2018-02-05 01:31:18

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 58/64] drivers/infiniband: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
drivers/infiniband/core/umem.c | 16 +++++++++-------
drivers/infiniband/core/umem_odp.c | 11 ++++++-----
drivers/infiniband/hw/hfi1/user_pages.c | 15 +++++++++------
drivers/infiniband/hw/mlx4/main.c | 5 +++--
drivers/infiniband/hw/mlx5/main.c | 5 +++--
drivers/infiniband/hw/qib/qib_user_pages.c | 10 ++++++----
drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++++++++-------
7 files changed, 45 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index fd9601ed5b84..bdbb345916d0 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -164,7 +164,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,

npages = ib_umem_num_pages(umem);

- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);

locked = npages + current->mm->pinned_vm;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
@@ -237,7 +237,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
} else
current->mm->pinned_vm = locked;

- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
if (vma_list)
free_page((unsigned long) vma_list);
free_page((unsigned long) page_list);
@@ -249,10 +249,11 @@ EXPORT_SYMBOL(ib_umem_get);
static void ib_umem_account(struct work_struct *work)
{
struct ib_umem *umem = container_of(work, struct ib_umem, work);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&umem->mm->mmap_sem);
+ mm_write_lock(umem->mm, &mmrange);
umem->mm->pinned_vm -= umem->diff;
- up_write(&umem->mm->mmap_sem);
+ mm_write_unlock(umem->mm, &mmrange);
mmput(umem->mm);
kfree(umem);
}
@@ -267,6 +268,7 @@ void ib_umem_release(struct ib_umem *umem)
struct mm_struct *mm;
struct task_struct *task;
unsigned long diff;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (umem->odp_data) {
ib_umem_odp_release(umem);
@@ -295,7 +297,7 @@ void ib_umem_release(struct ib_umem *umem)
* we defer the vm_locked accounting to the system workqueue.
*/
if (context->closing) {
- if (!down_write_trylock(&mm->mmap_sem)) {
+ if (!mm_write_trylock(mm, &mmrange)) {
INIT_WORK(&umem->work, ib_umem_account);
umem->mm = mm;
umem->diff = diff;
@@ -304,10 +306,10 @@ void ib_umem_release(struct ib_umem *umem)
return;
}
} else
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

mm->pinned_vm -= diff;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mmput(mm);
out:
kfree(umem);
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 0572953260e8..3b5f6814ba41 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -334,16 +334,17 @@ int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem,
if (access & IB_ACCESS_HUGETLB) {
struct vm_area_struct *vma;
struct hstate *h;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, ib_umem_start(umem));
if (!vma || !is_vm_hugetlb_page(vma)) {
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return -EINVAL;
}
h = hstate_vma(vma);
umem->page_shift = huge_page_shift(h);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
umem->hugetlb = 1;
} else {
umem->hugetlb = 0;
@@ -674,7 +675,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
(bcnt + BIT(page_shift) - 1) >> page_shift,
PAGE_SIZE / sizeof(struct page *));

- down_read(&owning_mm->mmap_sem);
+ mm_read_lock(owning_mm, &mmrange);
/*
* Note: this might result in redundent page getting. We can
* avoid this by checking dma_list to be 0 before calling
@@ -685,7 +686,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
npages = get_user_pages_remote(owning_process, owning_mm,
user_virt, gup_num_pages,
flags, local_page_list, NULL, NULL, &mmrange);
- up_read(&owning_mm->mmap_sem);
+ mm_read_unlock(owning_mm, &mmrange);

if (npages < 0)
break;
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index e341e6dcc388..1a6103d4f367 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -76,6 +76,7 @@ bool hfi1_can_pin_pages(struct hfi1_devdata *dd, struct mm_struct *mm,
unsigned int usr_ctxts =
dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt;
bool can_lock = capable(CAP_IPC_LOCK);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* Calculate per-cache size. The calculation below uses only a quarter
@@ -91,9 +92,9 @@ bool hfi1_can_pin_pages(struct hfi1_devdata *dd, struct mm_struct *mm,
/* Convert to number of pages */
size = DIV_ROUND_UP(size, PAGE_SIZE);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
pinned = mm->pinned_vm;
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/* First, check the absolute limit against all pinned pages. */
if (pinned + npages >= ulimit && !can_lock)
@@ -106,14 +107,15 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
bool writable, struct page **pages)
{
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

ret = get_user_pages_fast(vaddr, npages, writable, pages);
if (ret < 0)
return ret;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
mm->pinned_vm += ret;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return ret;
}
@@ -122,6 +124,7 @@ void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
size_t npages, bool dirty)
{
size_t i;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

for (i = 0; i < npages; i++) {
if (dirty)
@@ -130,8 +133,8 @@ void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
}

if (mm) { /* during close after signal, mm can be NULL */
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
mm->pinned_vm -= npages;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
}
}
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8d2ee9322f2e..3124717bda45 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1188,6 +1188,7 @@ static void mlx4_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
struct mlx4_ib_ucontext *context = to_mucontext(ibcontext);
struct task_struct *owning_process = NULL;
struct mm_struct *owning_mm = NULL;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

owning_process = get_pid_task(ibcontext->tgid, PIDTYPE_PID);
if (!owning_process)
@@ -1219,7 +1220,7 @@ static void mlx4_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
/* need to protect from a race on closing the vma as part of
* mlx4_ib_vma_close().
*/
- down_write(&owning_mm->mmap_sem);
+ mm_write_lock(owning_mm, &mmrange);
for (i = 0; i < HW_BAR_COUNT; i++) {
vma = context->hw_bar_info[i].vma;
if (!vma)
@@ -1239,7 +1240,7 @@ static void mlx4_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
context->hw_bar_info[i].vma->vm_ops = NULL;
}

- up_write(&owning_mm->mmap_sem);
+ mm_write_unlock(owning_mm, &mmrange);
mmput(owning_mm);
put_task_struct(owning_process);
}
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4236c8086820..303fed2657fe 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1902,6 +1902,7 @@ static void mlx5_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
struct mlx5_ib_ucontext *context = to_mucontext(ibcontext);
struct task_struct *owning_process = NULL;
struct mm_struct *owning_mm = NULL;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

owning_process = get_pid_task(ibcontext->tgid, PIDTYPE_PID);
if (!owning_process)
@@ -1931,7 +1932,7 @@ static void mlx5_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
/* need to protect from a race on closing the vma as part of
* mlx5_ib_vma_close.
*/
- down_write(&owning_mm->mmap_sem);
+ mm_write_lock(owning_mm, &mmrange);
mutex_lock(&context->vma_private_list_mutex);
list_for_each_entry_safe(vma_private, n, &context->vma_private_list,
list) {
@@ -1948,7 +1949,7 @@ static void mlx5_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
kfree(vma_private);
}
mutex_unlock(&context->vma_private_list_mutex);
- up_write(&owning_mm->mmap_sem);
+ mm_write_unlock(owning_mm, &mmrange);
mmput(owning_mm);
put_task_struct(owning_process);
}
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 6bcb4f9f9b30..13b7f6f93560 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -136,24 +136,26 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
int ret;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);

ret = __qib_get_user_pages(start_page, num_pages, p, &mmrange);

- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);

return ret;
}

void qib_release_user_pages(struct page **p, size_t num_pages)
{
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
if (current->mm) /* during close after signal, mm can be NULL */
- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);

__qib_release_user_pages(p, num_pages, 1);

if (current->mm) {
current->mm->pinned_vm -= num_pages;
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
}
}
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 5f36c6d2e21b..7cb05133311c 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -57,10 +57,11 @@ static void usnic_uiom_reg_account(struct work_struct *work)
{
struct usnic_uiom_reg *umem = container_of(work,
struct usnic_uiom_reg, work);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&umem->mm->mmap_sem);
+ mm_write_lock(umem->mm, &mmrange);
umem->mm->locked_vm -= umem->diff;
- up_write(&umem->mm->mmap_sem);
+ mm_write_unlock(umem->mm, &mmrange);
mmput(umem->mm);
kfree(umem);
}
@@ -126,7 +127,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,

npages = PAGE_ALIGN(size + (addr & ~PAGE_MASK)) >> PAGE_SHIFT;

- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);

locked = npages + current->mm->locked_vm;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
@@ -189,7 +190,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
else
current->mm->locked_vm = locked;

- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
free_page((unsigned long) page_list);
return ret;
}
@@ -425,6 +426,7 @@ void usnic_uiom_reg_release(struct usnic_uiom_reg *uiomr, int closing)
{
struct mm_struct *mm;
unsigned long diff;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

__usnic_uiom_reg_release(uiomr->pd, uiomr, 1);

@@ -445,7 +447,7 @@ void usnic_uiom_reg_release(struct usnic_uiom_reg *uiomr, int closing)
* we defer the vm_locked accounting to the system workqueue.
*/
if (closing) {
- if (!down_write_trylock(&mm->mmap_sem)) {
+ if (!mm_write_trylock(mm, &mmrange)) {
INIT_WORK(&uiomr->work, usnic_uiom_reg_account);
uiomr->mm = mm;
uiomr->diff = diff;
@@ -454,10 +456,10 @@ void usnic_uiom_reg_release(struct usnic_uiom_reg *uiomr, int closing)
return;
}
} else
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

current->mm->locked_vm -= diff;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mmput(mm);
kfree(uiomr);
}
--
2.13.6


2018-02-05 01:31:37

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 01/64] interval-tree: build unconditionally

From: Davidlohr Bueso <[email protected]>

In preparation for range locking, this patch gets rid of the
CONFIG_INTERVAL_TREE option, as the interval tree code will now
be built unconditionally.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
drivers/gpu/drm/Kconfig | 2 --
drivers/gpu/drm/i915/Kconfig | 1 -
lib/Kconfig | 14 --------------
lib/Kconfig.debug | 1 -
lib/Makefile | 3 +--
5 files changed, 1 insertion(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index deeefa7a1773..eac89dc17199 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -168,7 +168,6 @@ config DRM_RADEON
select HWMON
select BACKLIGHT_CLASS_DEVICE
select BACKLIGHT_LCD_SUPPORT
- select INTERVAL_TREE
help
Choose this option if you have an ATI Radeon graphics card. There
are both PCI and AGP versions. You don't need to choose this to
@@ -189,7 +188,6 @@ config DRM_AMDGPU
select HWMON
select BACKLIGHT_CLASS_DEVICE
select BACKLIGHT_LCD_SUPPORT
- select INTERVAL_TREE
select CHASH
help
Choose this option if you have a recent AMD Radeon graphics card.
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index dfd95889f4b7..520a613ec69f 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -3,7 +3,6 @@ config DRM_I915
depends on DRM
depends on X86 && PCI
select INTEL_GTT
- select INTERVAL_TREE
# we need shmfs for the swappable backing store, and in particular
# the shmem_readpage() which depends upon tmpfs
select SHMEM
diff --git a/lib/Kconfig b/lib/Kconfig
index e96089499371..18b56ed167c4 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -362,20 +362,6 @@ config TEXTSEARCH_FSM
config BTREE
bool

-config INTERVAL_TREE
- bool
- help
- Simple, embeddable, interval-tree. Can find the start of an
- overlapping range in log(n) time and then iterate over all
- overlapping nodes. The algorithm is implemented as an
- augmented rbtree.
-
- See:
-
- Documentation/rbtree.txt
-
- for more information.
-
config RADIX_TREE_MULTIORDER
bool

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6088408ef26c..c888f03569e7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1716,7 +1716,6 @@ config RBTREE_TEST
config INTERVAL_TREE_TEST
tristate "Interval tree test"
depends on DEBUG_KERNEL
- select INTERVAL_TREE
help
A benchmark measuring the performance of the interval tree library

diff --git a/lib/Makefile b/lib/Makefile
index a90d4fcd748f..1c1f8e3ccaa8 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -39,7 +39,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
bsearch.o find_bit.o llist.o memweight.o kfifo.o \
percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
- once.o refcount.o usercopy.o errseq.o bucket_locks.o
+ once.o refcount.o usercopy.o errseq.o bucket_locks.o interval_tree.o
obj-$(CONFIG_STRING_SELFTEST) += test_string.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
@@ -84,7 +84,6 @@ obj-$(CONFIG_DEBUG_LOCKING_API_SELFTESTS) += locking-selftest.o
obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o

obj-$(CONFIG_BTREE) += btree.o
-obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o
obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o
obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
obj-$(CONFIG_DEBUG_LIST) += list_debug.o
--
2.13.6


2018-02-05 01:32:11

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 64/64] mm: convert mmap_sem to range mmap_lock

From: Davidlohr Bueso <[email protected]>

With mmrange now in place and everyone using the mm locking
wrappers, we can convert the rwsem to the range locking scheme.
Every single user of mmap_sem uses a full range, which means
that there is no more parallelism than what we already had.
This is the worst case scenario. Prefetching has been blindly
converted (for now).

This lays out the foundations for later mm address
space locking scalability.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/ia64/mm/fault.c | 2 +-
arch/x86/events/core.c | 2 +-
arch/x86/kernel/tboot.c | 2 +-
arch/x86/mm/fault.c | 2 +-
include/linux/mm.h | 51 +++++++++++++++++++++++++-----------------------
include/linux/mm_types.h | 4 ++--
kernel/fork.c | 2 +-
mm/init-mm.c | 2 +-
mm/memory.c | 2 +-
9 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index 9d379a9a9a5c..fd495bbb3726 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -95,7 +95,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
| (((isr >> IA64_ISR_W_BIT) & 1UL) << VM_WRITE_BIT));

/* mmap_sem is performance critical.... */
- prefetchw(&mm->mmap_sem);
+ prefetchw(&mm->mmap_lock);

/*
* If we're in an interrupt or have no user context, we must not take the fault..
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 140d33288e78..9b94559160b2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2144,7 +2144,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
* For now, this can't happen because all callers hold mmap_sem
* for write. If this changes, we'll need a different solution.
*/
- lockdep_assert_held_exclusive(&mm->mmap_sem);
+ lockdep_assert_held_exclusive(&mm->mmap_lock);

if (atomic_inc_return(&mm->context.perf_rdpmc_allowed) == 1)
on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1);
diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index a2486f444073..ec23bc6a1eb0 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -104,7 +104,7 @@ static struct mm_struct tboot_mm = {
.pgd = swapper_pg_dir,
.mm_users = ATOMIC_INIT(2),
.mm_count = ATOMIC_INIT(1),
- .mmap_sem = __RWSEM_INITIALIZER(init_mm.mmap_sem),
+ .mmap_lock = __RANGE_LOCK_TREE_INITIALIZER(init_mm.mmap_lock),
.page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
};
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 87bdcb26a907..c025dbf349a1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1258,7 +1258,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
* Detect and handle instructions that would cause a page fault for
* both a tracked kernel page and a userspace page.
*/
- prefetchw(&mm->mmap_sem);
+ prefetchw(&mm->mmap_lock);

if (unlikely(kmmio_fault(regs, address)))
return;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0b9867e8a35d..a0c2f4b17e3c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2699,73 +2699,76 @@ static inline void setup_nr_node_ids(void) {}
* Address space locking wrappers.
*/
static inline bool mm_is_locked(struct mm_struct *mm,
- struct range_lock *range)
+ struct range_lock *mmrange)
{
- return rwsem_is_locked(&mm->mmap_sem);
+ return range_is_locked(&mm->mmap_lock, mmrange);
}

/* Reader wrappers */
static inline int mm_read_trylock(struct mm_struct *mm,
- struct range_lock *range)
+ struct range_lock *mmrange)
{
- return down_read_trylock(&mm->mmap_sem);
+ return range_read_trylock(&mm->mmap_lock, mmrange);
}

-static inline void mm_read_lock(struct mm_struct *mm, struct range_lock *range)
+static inline void mm_read_lock(struct mm_struct *mm,
+ struct range_lock *mmrange)
{
- down_read(&mm->mmap_sem);
+ range_read_lock(&mm->mmap_lock, mmrange);
}

static inline void mm_read_lock_nested(struct mm_struct *mm,
- struct range_lock *range, int subclass)
+ struct range_lock *mmrange, int subclass)
{
- down_read_nested(&mm->mmap_sem, subclass);
+ range_read_lock_nested(&mm->mmap_lock, mmrange, subclass);
}

static inline void mm_read_unlock(struct mm_struct *mm,
- struct range_lock *range)
+ struct range_lock *mmrange)
{
- up_read(&mm->mmap_sem);
+ range_read_unlock(&mm->mmap_lock, mmrange);
}

/* Writer wrappers */
static inline int mm_write_trylock(struct mm_struct *mm,
- struct range_lock *range)
+ struct range_lock *mmrange)
{
- return down_write_trylock(&mm->mmap_sem);
+ return range_write_trylock(&mm->mmap_lock, mmrange);
}

-static inline void mm_write_lock(struct mm_struct *mm, struct range_lock *range)
+static inline void mm_write_lock(struct mm_struct *mm,
+ struct range_lock *mmrange)
{
- down_write(&mm->mmap_sem);
+ range_write_lock(&mm->mmap_lock, mmrange);
}

static inline int mm_write_lock_killable(struct mm_struct *mm,
- struct range_lock *range)
+ struct range_lock *mmrange)
{
- return down_write_killable(&mm->mmap_sem);
+ return range_write_lock_killable(&mm->mmap_lock, mmrange);
}

static inline void mm_downgrade_write(struct mm_struct *mm,
- struct range_lock *range)
+ struct range_lock *mmrange)
{
- downgrade_write(&mm->mmap_sem);
+ range_downgrade_write(&mm->mmap_lock, mmrange);
}

static inline void mm_write_unlock(struct mm_struct *mm,
- struct range_lock *range)
+ struct range_lock *mmrange)
{
- up_write(&mm->mmap_sem);
+ range_write_unlock(&mm->mmap_lock, mmrange);
}

static inline void mm_write_lock_nested(struct mm_struct *mm,
- struct range_lock *range, int subclass)
+ struct range_lock *mmrange,
+ int subclass)
{
- down_write_nested(&mm->mmap_sem, subclass);
+ range_write_lock_nested(&mm->mmap_lock, mmrange, subclass);
}

-#define mm_write_nest_lock(mm, range, nest_lock) \
- down_write_nest_lock(&(mm)->mmap_sem, nest_lock)
+#define mm_write_lock_nest_lock(mm, range, nest_lock) \
+ range_write_lock_nest_lock(&(mm)->mmap_lock, mmrange, nest_lock)

#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fd1af6b9591d..fd9545fe4735 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -8,7 +8,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/rbtree.h>
-#include <linux/rwsem.h>
+#include <linux/range_lock.h>
#include <linux/completion.h>
#include <linux/cpumask.h>
#include <linux/uprobes.h>
@@ -393,7 +393,7 @@ struct mm_struct {
int map_count; /* number of VMAs */

spinlock_t page_table_lock; /* Protects page tables and some counters */
- struct rw_semaphore mmap_sem;
+ struct range_lock_tree mmap_lock;

struct list_head mmlist; /* List of maybe swapped mm's. These are globally strung
* together off init_mm.mmlist, and are protected
diff --git a/kernel/fork.c b/kernel/fork.c
index 060554e33111..252a1fe18f16 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -899,7 +899,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm->vmacache_seqnum = 0;
atomic_set(&mm->mm_users, 1);
atomic_set(&mm->mm_count, 1);
- init_rwsem(&mm->mmap_sem);
+ range_lock_tree_init(&mm->mmap_lock);
INIT_LIST_HEAD(&mm->mmlist);
mm->core_state = NULL;
mm_pgtables_bytes_init(mm);
diff --git a/mm/init-mm.c b/mm/init-mm.c
index f94d5d15ebc0..c4aee632702f 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -20,7 +20,7 @@ struct mm_struct init_mm = {
.pgd = swapper_pg_dir,
.mm_users = ATOMIC_INIT(2),
.mm_count = ATOMIC_INIT(1),
- .mmap_sem = __RWSEM_INITIALIZER(init_mm.mmap_sem),
+ .mmap_lock = __RANGE_LOCK_TREE_INITIALIZER(init_mm.mmap_lock),
.page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
.user_ns = &init_user_ns,
diff --git a/mm/memory.c b/mm/memory.c
index e3bf2879f7c3..d4fc526d82a4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4568,7 +4568,7 @@ void __might_fault(const char *file, int line)
__might_sleep(file, line, 0);
#if defined(CONFIG_DEBUG_ATOMIC_SLEEP)
if (current->mm)
- might_lock_read(&current->mm->mmap_sem);
+ might_lock_read(&current->mm->mmap_lock);
#endif
}
EXPORT_SYMBOL(__might_fault);
--
2.13.6


2018-02-05 01:32:11

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 57/64] drivers/gpu: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.
Those mmap_sem users that don't know about mmrange are updated
trivially, as the sem is used in the same context as the caller.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 7 ++++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 ++++----
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 +++--
drivers/gpu/drm/i915/i915_gem.c | 5 +++--
drivers/gpu/drm/i915/i915_gem_userptr.c | 9 +++++----
drivers/gpu/drm/radeon/radeon_cs.c | 5 +++--
drivers/gpu/drm/radeon/radeon_gem.c | 7 ++++---
drivers/gpu/drm/radeon/radeon_mn.c | 7 ++++---
drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 ++--
9 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index bd67f4cb8e6c..cda7ea8503b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -257,9 +257,10 @@ struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
struct mm_struct *mm = current->mm;
struct amdgpu_mn *rmn;
int r;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

mutex_lock(&adev->mn_lock);
- if (down_write_killable(&mm->mmap_sem)) {
+ if (mm_write_lock_killable(mm, &mmrange)) {
mutex_unlock(&adev->mn_lock);
return ERR_PTR(-EINTR);
}
@@ -289,13 +290,13 @@ struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
hash_add(adev->mn_hash, &rmn->node, (unsigned long)mm);

release_locks:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mutex_unlock(&adev->mn_lock);

return rmn;

free_rmn:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mutex_unlock(&adev->mn_lock);
kfree(rmn);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index bd464a599341..95467ef0df45 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -696,7 +696,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
if (!(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY))
flags |= FOLL_WRITE;

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);

if (gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) {
/* check that we only use anonymous memory
@@ -706,7 +706,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)

vma = find_vma(gtt->usermm, gtt->userptr);
if (!vma || vma->vm_file || vma->vm_end < end) {
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return -EPERM;
}
}
@@ -735,12 +735,12 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)

} while (pinned < ttm->num_pages);

- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return 0;

release_pages:
release_pages(pages, pinned);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return r;
}

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 93aae5c1e78b..ca516482b145 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -851,6 +851,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
*/
struct kfd_process *p = kfd_lookup_process_by_pasid(pasid);
struct mm_struct *mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!p)
return; /* Presumably process exited. */
@@ -866,7 +867,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,

memset(&memory_exception_data, 0, sizeof(memory_exception_data));

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);

memory_exception_data.gpu_id = dev->id;
@@ -893,7 +894,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
mmput(mm);

mutex_lock(&p->event_mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dd89abd2263d..61d958934efd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1758,8 +1758,9 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
if (args->flags & I915_MMAP_WC) {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem)) {
+ if (mm_write_lock_killable(mm, &mmrange)) {
i915_gem_object_put(obj);
return -EINTR;
}
@@ -1769,7 +1770,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
else
addr = -ENOMEM;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

/* This may race, but that's ok, it only gets set */
WRITE_ONCE(obj->frontbuffer_ggtt_origin, ORIGIN_CPU);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 881bcc7d663a..3886b74638f7 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -205,6 +205,7 @@ i915_mmu_notifier_find(struct i915_mm_struct *mm)
{
struct i915_mmu_notifier *mn;
int err = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

mn = mm->mn;
if (mn)
@@ -214,7 +215,7 @@ i915_mmu_notifier_find(struct i915_mm_struct *mm)
if (IS_ERR(mn))
err = PTR_ERR(mn);

- down_write(&mm->mm->mmap_sem);
+ mm_write_lock(mm->mm, &mmrange);
mutex_lock(&mm->i915->mm_lock);
if (mm->mn == NULL && !err) {
/* Protected by mmap_sem (write-lock) */
@@ -231,7 +232,7 @@ i915_mmu_notifier_find(struct i915_mm_struct *mm)
err = 0;
}
mutex_unlock(&mm->i915->mm_lock);
- up_write(&mm->mm->mmap_sem);
+ mm_write_unlock(mm->mm, &mmrange);

if (mn && !IS_ERR(mn)) {
destroy_workqueue(mn->wq);
@@ -514,7 +515,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
if (mmget_not_zero(mm)) {
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
while (pinned < npages) {
ret = get_user_pages_remote
(work->task, mm,
@@ -527,7 +528,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)

pinned += ret;
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
mmput(mm);
}
}
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 1ae31dbc61c6..71a19881b04a 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -79,6 +79,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
unsigned i;
bool need_mmap_lock = false;
int r;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (p->chunk_relocs == NULL) {
return 0;
@@ -190,12 +191,12 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
&p->validated);
if (need_mmap_lock)
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);

r = radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);

if (need_mmap_lock)
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

return r;
}
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index a9962ffba720..3e169fa1750e 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -292,6 +292,7 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
struct radeon_bo *bo;
uint32_t handle;
int r;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (offset_in_page(args->addr | args->size))
return -EINVAL;
@@ -336,17 +337,17 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
}

if (args->flags & RADEON_GEM_USERPTR_VALIDATE) {
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
r = radeon_bo_reserve(bo, true);
if (r) {
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
goto release_object;
}

radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT);
r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
radeon_bo_unreserve(bo);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (r)
goto release_object;
}
diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
index abd24975c9b1..9b10cacc5b14 100644
--- a/drivers/gpu/drm/radeon/radeon_mn.c
+++ b/drivers/gpu/drm/radeon/radeon_mn.c
@@ -186,8 +186,9 @@ static struct radeon_mn *radeon_mn_get(struct radeon_device *rdev)
struct mm_struct *mm = current->mm;
struct radeon_mn *rmn;
int r;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return ERR_PTR(-EINTR);

mutex_lock(&rdev->mn_lock);
@@ -216,13 +217,13 @@ static struct radeon_mn *radeon_mn_get(struct radeon_device *rdev)

release_locks:
mutex_unlock(&rdev->mn_lock);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return rmn;

free_rmn:
mutex_unlock(&rdev->mn_lock);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
kfree(rmn);

return ERR_PTR(r);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 08a3c324242e..2b2a1668fbe3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -67,7 +67,7 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
goto out_unlock;

ttm_bo_reference(bo);
- up_read(&vmf->vma->vm_mm->mmap_sem);
+ mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange);
(void) dma_fence_wait(bo->moving, true);
ttm_bo_unreserve(bo);
ttm_bo_unref(&bo);
@@ -137,7 +137,7 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
ttm_bo_reference(bo);
- up_read(&vmf->vma->vm_mm->mmap_sem);
+ mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange);
(void) ttm_bo_wait_unreserved(bo);
ttm_bo_unref(&bo);
}
--
2.13.6


2018-02-05 01:32:29

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 63/64] mm/mmap: hack drop down_write_nest_lock()

From: Davidlohr Bueso <[email protected]>

* THIS IS A HACK *

Directly call down_write() on i_mmap_rwsem and on the anon_vma
rwsem instead of down_write_nest_lock(), which takes mmap_sem as
the nesting lock (such that we don't have to convert that
annotation to a range lock).
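
For reference, the annotation being dropped is roughly the following
(a sketch, not part of the hunks below; the lockdep rationale is my
reading of the existing code):

    /*
     * mm_take_all_locks() runs with mm_all_locks_mutex held and used
     * down_write_nest_lock() to tell lockdep that nested acquisition
     * of every i_mmap_rwsem in this mm is serialized by mmap_sem:
     */
    down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_sem);

    /*
     * Once mmap_sem becomes a range lock there is no rwsem left to
     * pass as the nesting lock, so the hack falls back to a plain,
     * un-annotated acquisition:
     */
    down_write(&mapping->i_mmap_rwsem);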

Signed-off-by: Davidlohr Bueso <[email protected]>
---
mm/mmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index e10d005f7e2f..1d3a5edd19b2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3394,7 +3394,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
* The LSB of head.next can't change from under us
* because we hold the mm_all_locks_mutex.
*/
- down_write_nest_lock(&anon_vma->root->rwsem, &mm->mmap_sem);
+ down_write(&anon_vma->root->rwsem);
/*
* We can safely modify head.next after taking the
* anon_vma->root->rwsem. If some other vma in this mm shares
@@ -3424,7 +3424,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
*/
if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
BUG();
- down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_sem);
+ down_write(&mapping->i_mmap_rwsem);
}
}

--
2.13.6


2018-02-05 01:32:32

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 55/64] arch/riscv: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/riscv/kernel/vdso.c | 5 +++--
arch/riscv/mm/fault.c | 10 +++++-----
2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
index 582cb153eb24..4bbb6e0425df 100644
--- a/arch/riscv/kernel/vdso.c
+++ b/arch/riscv/kernel/vdso.c
@@ -69,10 +69,11 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
struct mm_struct *mm = current->mm;
unsigned long vdso_base, vdso_len;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

vdso_len = (vdso_pages + 1) << PAGE_SHIFT;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
vdso_base = get_unmapped_area(NULL, 0, vdso_len, 0, 0);
if (IS_ERR_VALUE(vdso_base)) {
ret = vdso_base;
@@ -94,7 +95,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
mm->context.vdso = NULL;

end:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 75d15e73ba39..6f78080e987c 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -79,7 +79,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);

retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, addr);
if (unlikely(!vma))
goto bad_area;
@@ -170,7 +170,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -178,7 +178,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
* Fix it, but check if it's kernel or user first.
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
/* User mode accesses just cause a SIGSEGV */
if (user_mode(regs)) {
do_trap(regs, SIGSEGV, code, addr, tsk);
@@ -206,14 +206,14 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
* (which will retry the fault, or kill us if we got oom-killed).
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
/* Kernel mode? Handle exceptions or die */
if (!user_mode(regs))
goto no_context;
--
2.13.6


2018-02-05 01:32:58

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 56/64] drivers/android: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

The binder_alloc_free_page() shrinker callback can call
zap_page_range(), which needs mmap_sem. Use mm locking
wrappers, no change in semantics.
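
The shrinker path keeps its trylock (presumably to avoid blocking or
recursing on the mmap lock from reclaim); the pattern, as used in the
hunks below, is roughly:

    DEFINE_RANGE_LOCK_FULL(mmrange);  /* stack-local, covers the whole space */

    if (!mm_write_trylock(mm, &mmrange))
        goto err_down_write_mmap_sem_failed;
    /* ... zap_page_range() etc. run under the write lock ... */
    mm_write_unlock(mm, &mmrange);
    mmput(mm);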

Signed-off-by: Davidlohr Bueso <[email protected]>
---
drivers/android/binder_alloc.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 5a426c877dfb..191724983638 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -194,6 +194,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
struct vm_area_struct *vma = NULL;
struct mm_struct *mm = NULL;
bool need_mm = false;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: %s pages %pK-%pK\n", alloc->pid,
@@ -219,7 +220,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
mm = alloc->vma_vm_mm;

if (mm) {
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
vma = alloc->vma;
}

@@ -288,7 +289,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
/* vm_insert_page does not seem to increment the refcount */
}
if (mm) {
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mmput(mm);
}
return 0;
@@ -321,7 +322,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
}
err_no_vma:
if (mm) {
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mmput(mm);
}
return vma ? -ENOMEM : -ESRCH;
@@ -914,6 +915,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
uintptr_t page_addr;
size_t index;
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

alloc = page->alloc;
if (!mutex_trylock(&alloc->mutex))
@@ -929,7 +931,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
if (!mmget_not_zero(alloc->vma_vm_mm))
goto err_mmget;
mm = alloc->vma_vm_mm;
- if (!down_write_trylock(&mm->mmap_sem))
+ if (!mm_write_trylock(mm, &mmrange))
goto err_down_write_mmap_sem_failed;
}

@@ -945,7 +947,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,

trace_binder_unmap_user_end(alloc, index);

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mmput(mm);
}

--
2.13.6


2018-02-05 01:33:06

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 59/64] drivers/iommu: use mm locking helpers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
drivers/iommu/amd_iommu_v2.c | 4 ++--
drivers/iommu/intel-svm.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 15a7103fd84c..d3aee158d251 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -523,7 +523,7 @@ static void do_fault(struct work_struct *work)
flags |= FAULT_FLAG_WRITE;
flags |= FAULT_FLAG_REMOTE;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_extend_vma(mm, address, &mmrange);
if (!vma || address < vma->vm_start)
/* failed to get a vma in the right range */
@@ -535,7 +535,7 @@ static void do_fault(struct work_struct *work)

ret = handle_mm_fault(vma, address, flags, &mmrange);
out:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

if (ret & VM_FAULT_ERROR)
/* failed to service fault */
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 6a74386ee83f..c4d0d2398052 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -643,7 +643,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (!is_canonical_address(address))
goto bad_req;

- down_read(&svm->mm->mmap_sem);
+ mm_read_lock(svm->mm, &mmrange);
vma = find_extend_vma(svm->mm, address, &mmrange);
if (!vma || address < vma->vm_start)
goto invalid;
@@ -658,7 +658,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)

result = QI_RESP_SUCCESS;
invalid:
- up_read(&svm->mm->mmap_sem);
+ mm_read_unlock(svm->mm, &mmrange);
mmput(svm->mm);
bad_req:
/* Accounting for major/minor faults? */
--
2.13.6


2018-02-05 01:33:36

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 50/64] arch/unicore32: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/unicore32/mm/fault.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index dd35b6191798..f806ade79afd 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -233,12 +233,12 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
* validly references user space from well defined areas of the code,
* we can bug out early if this is from code which shouldn't.
*/
- if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!mm_read_trylock(mm, &mmrange)) {
if (!user_mode(regs)
&& !search_exception_tables(regs->UCreg_pc))
goto no_context;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
} else {
/*
* The above down_read_trylock() might have succeeded in
@@ -275,7 +275,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Handle the "normal" case first - VM_FAULT_MAJOR
--
2.13.6


2018-02-05 01:33:49

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 52/64] arch/openrisc: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/openrisc/kernel/dma.c | 6 ++++--
arch/openrisc/mm/fault.c | 10 +++++-----
2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index a945f00011b4..9fee5388f647 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -87,6 +87,7 @@ or1k_dma_alloc(struct device *dev, size_t size,
{
unsigned long va;
void *page;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
struct mm_walk walk = {
.pte_entry = page_set_nocache,
.mm = &init_mm
@@ -106,7 +107,7 @@ or1k_dma_alloc(struct device *dev, size_t size,
* We need to iterate through the pages, clearing the dcache for
* them and setting the cache-inhibit bit.
*/
- if (walk_page_range(va, va + size, &walk)) {
+ if (walk_page_range(va, va + size, &walk, &mmrange)) {
free_pages_exact(page, size);
return NULL;
}
@@ -120,6 +121,7 @@ or1k_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
{
unsigned long va = (unsigned long)vaddr;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
struct mm_walk walk = {
.pte_entry = page_clear_nocache,
.mm = &init_mm
@@ -127,7 +129,7 @@ or1k_dma_free(struct device *dev, size_t size, void *vaddr,

if ((attrs & DMA_ATTR_NON_CONSISTENT) == 0) {
/* walk_page_range shouldn't be able to fail here */
- WARN_ON(walk_page_range(va, va + size, &walk));
+ WARN_ON(walk_page_range(va, va + size, &walk, &mmrange));
}

free_pages_exact(vaddr, size);
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index 75ddb1e8e7e7..81f6d509bf64 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -109,7 +109,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
goto no_context;

retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);

if (!vma)
@@ -198,7 +198,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -207,7 +207,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
*/

bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:

@@ -270,14 +270,14 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
__asm__ __volatile__("l.nop 42");
__asm__ __volatile__("l.nop 1");

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Send a sigbus, regardless of whether we were in kernel
--
2.13.6


2018-02-05 01:33:51

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 61/64] staging/lustre: use generic range lock

From: Davidlohr Bueso <[email protected]>

This replaces lustre's in-house range lock implementation with the
generic one. It also adds the mmrange and makes use of the mm
locking wrappers.
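
With the generic lock, the write path in ll_file_io_generic() ends up
looking roughly like this (a sketch; the interval-initialization helper
and its granularity are assumptions based on the generic range lock
API and are not visible in the hunks below):

    struct range_lock range;

    range_lock_init(&range, start, last);  /* interval the write covers */

    rc = range_write_lock_interruptible(&lli->lli_write_tree, &range);
    if (rc < 0)
        goto out;
    /* ... cl_io_loop() performs the actual I/O ... */
    range_write_unlock(&lli->lli_write_tree, &range);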

Signed-off-by: Davidlohr Bueso <[email protected]>
---
drivers/staging/lustre/lustre/llite/Makefile | 2 +-
drivers/staging/lustre/lustre/llite/file.c | 16 +-
.../staging/lustre/lustre/llite/llite_internal.h | 4 +-
drivers/staging/lustre/lustre/llite/llite_mmap.c | 4 +-
drivers/staging/lustre/lustre/llite/range_lock.c | 240 ---------------------
drivers/staging/lustre/lustre/llite/range_lock.h | 83 -------
drivers/staging/lustre/lustre/llite/vvp_io.c | 7 +-
7 files changed, 17 insertions(+), 339 deletions(-)
delete mode 100644 drivers/staging/lustre/lustre/llite/range_lock.c
delete mode 100644 drivers/staging/lustre/lustre/llite/range_lock.h

diff --git a/drivers/staging/lustre/lustre/llite/Makefile b/drivers/staging/lustre/lustre/llite/Makefile
index 519fd747e3ad..0a6fb56c7e89 100644
--- a/drivers/staging/lustre/lustre/llite/Makefile
+++ b/drivers/staging/lustre/lustre/llite/Makefile
@@ -4,7 +4,7 @@ subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include

obj-$(CONFIG_LUSTRE_FS) += lustre.o
lustre-y := dcache.o dir.o file.o llite_lib.o llite_nfs.o \
- rw.o rw26.o namei.o symlink.o llite_mmap.o range_lock.o \
+ rw.o rw26.o namei.o symlink.o llite_mmap.o \
xattr.o xattr_cache.o xattr_security.o \
super25.o statahead.o glimpse.o lcommon_cl.o lcommon_misc.o \
vvp_dev.o vvp_page.o vvp_lock.o vvp_io.o vvp_object.o \
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 938b859b6650..a1064da457ae 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1085,10 +1085,10 @@ ll_file_io_generic(const struct lu_env *env, struct vvp_io_args *args,
if (((iot == CIT_WRITE) ||
(iot == CIT_READ && (file->f_flags & O_DIRECT))) &&
!(vio->vui_fd->fd_flags & LL_FILE_GROUP_LOCKED)) {
- CDEBUG(D_VFSTRACE, "Range lock [%llu, %llu]\n",
- range.rl_node.in_extent.start,
- range.rl_node.in_extent.end);
- rc = range_lock(&lli->lli_write_tree, &range);
+ CDEBUG(D_VFSTRACE, "Range lock [%lu, %lu]\n",
+ range.node.start,
+ range.node.last);
+ rc = range_write_lock_interruptible(&lli->lli_write_tree, &range);
if (rc < 0)
goto out;

@@ -1098,10 +1098,10 @@ ll_file_io_generic(const struct lu_env *env, struct vvp_io_args *args,
rc = cl_io_loop(env, io);
ll_cl_remove(file, env);
if (range_locked) {
- CDEBUG(D_VFSTRACE, "Range unlock [%llu, %llu]\n",
- range.rl_node.in_extent.start,
- range.rl_node.in_extent.end);
- range_unlock(&lli->lli_write_tree, &range);
+ CDEBUG(D_VFSTRACE, "Range unlock [%lu, %lu]\n",
+ range.node.start,
+ range.node.last);
+ range_write_unlock(&lli->lli_write_tree, &range);
}
} else {
/* cl_io_rw_init() handled IO */
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index f68c2e88f12b..7dae3d032769 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -47,10 +47,10 @@
#include <lustre_intent.h>
#include <linux/compat.h>
#include <linux/namei.h>
+#include <linux/range_lock.h>
#include <linux/xattr.h>
#include <linux/posix_acl_xattr.h>
#include "vvp_internal.h"
-#include "range_lock.h"

#ifndef FMODE_EXEC
#define FMODE_EXEC 0
@@ -919,7 +919,7 @@ int ll_file_mmap(struct file *file, struct vm_area_struct *vma);
void policy_from_vma(union ldlm_policy_data *policy, struct vm_area_struct *vma,
unsigned long addr, size_t count);
struct vm_area_struct *our_vma(struct mm_struct *mm, unsigned long addr,
- size_t count);
+ size_t count, struct range_lock *mmrange);

static inline void ll_invalidate_page(struct page *vmpage)
{
diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index c0533bd6f352..adba30973c82 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -59,12 +59,12 @@ void policy_from_vma(union ldlm_policy_data *policy,
}

struct vm_area_struct *our_vma(struct mm_struct *mm, unsigned long addr,
- size_t count)
+ size_t count, struct range_lock *mmrange)
{
struct vm_area_struct *vma, *ret = NULL;

/* mmap_sem must have been held by caller. */
- LASSERT(!down_write_trylock(&mm->mmap_sem));
+ LASSERT(!mm_write_trylock(mm, mmrange));

for (vma = find_vma(mm, addr);
vma && vma->vm_start < (addr + count); vma = vma->vm_next) {
diff --git a/drivers/staging/lustre/lustre/llite/range_lock.c b/drivers/staging/lustre/lustre/llite/range_lock.c
deleted file mode 100644
index cc9565f6bfe2..000000000000
--- a/drivers/staging/lustre/lustre/llite/range_lock.c
+++ /dev/null
@@ -1,240 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.gnu.org/licenses/gpl-2.0.html
- *
- * GPL HEADER END
- */
-/*
- * Range lock is used to allow multiple threads writing a single shared
- * file given each thread is writing to a non-overlapping portion of the
- * file.
- *
- * Refer to the possible upstream kernel version of range lock by
- * Jan Kara <[email protected]>: https://lkml.org/lkml/2013/1/31/480
- *
- * This file could later replaced by the upstream kernel version.
- */
-/*
- * Author: Prakash Surya <[email protected]>
- * Author: Bobi Jam <[email protected]>
- */
-#include "range_lock.h"
-#include <uapi/linux/lustre/lustre_idl.h>
-
-/**
- * Initialize a range lock tree
- *
- * \param tree [in] an empty range lock tree
- *
- * Pre: Caller should have allocated the range lock tree.
- * Post: The range lock tree is ready to function.
- */
-void range_lock_tree_init(struct range_lock_tree *tree)
-{
- tree->rlt_root = NULL;
- tree->rlt_sequence = 0;
- spin_lock_init(&tree->rlt_lock);
-}
-
-/**
- * Initialize a range lock node
- *
- * \param lock [in] an empty range lock node
- * \param start [in] start of the covering region
- * \param end [in] end of the covering region
- *
- * Pre: Caller should have allocated the range lock node.
- * Post: The range lock node is meant to cover [start, end] region
- */
-int range_lock_init(struct range_lock *lock, __u64 start, __u64 end)
-{
- int rc;
-
- memset(&lock->rl_node, 0, sizeof(lock->rl_node));
- if (end != LUSTRE_EOF)
- end >>= PAGE_SHIFT;
- rc = interval_set(&lock->rl_node, start >> PAGE_SHIFT, end);
- if (rc)
- return rc;
-
- INIT_LIST_HEAD(&lock->rl_next_lock);
- lock->rl_task = NULL;
- lock->rl_lock_count = 0;
- lock->rl_blocking_ranges = 0;
- lock->rl_sequence = 0;
- return rc;
-}
-
-static inline struct range_lock *next_lock(struct range_lock *lock)
-{
- return list_entry(lock->rl_next_lock.next, typeof(*lock), rl_next_lock);
-}
-
-/**
- * Helper function of range_unlock()
- *
- * \param node [in] a range lock found overlapped during interval node
- * search
- * \param arg [in] the range lock to be tested
- *
- * \retval INTERVAL_ITER_CONT indicate to continue the search for next
- * overlapping range node
- * \retval INTERVAL_ITER_STOP indicate to stop the search
- */
-static enum interval_iter range_unlock_cb(struct interval_node *node, void *arg)
-{
- struct range_lock *lock = arg;
- struct range_lock *overlap = node2rangelock(node);
- struct range_lock *iter;
-
- list_for_each_entry(iter, &overlap->rl_next_lock, rl_next_lock) {
- if (iter->rl_sequence > lock->rl_sequence) {
- --iter->rl_blocking_ranges;
- LASSERT(iter->rl_blocking_ranges > 0);
- }
- }
- if (overlap->rl_sequence > lock->rl_sequence) {
- --overlap->rl_blocking_ranges;
- if (overlap->rl_blocking_ranges == 0)
- wake_up_process(overlap->rl_task);
- }
- return INTERVAL_ITER_CONT;
-}
-
-/**
- * Unlock a range lock, wake up locks blocked by this lock.
- *
- * \param tree [in] range lock tree
- * \param lock [in] range lock to be deleted
- *
- * If this lock has been granted, relase it; if not, just delete it from
- * the tree or the same region lock list. Wake up those locks only blocked
- * by this lock through range_unlock_cb().
- */
-void range_unlock(struct range_lock_tree *tree, struct range_lock *lock)
-{
- spin_lock(&tree->rlt_lock);
- if (!list_empty(&lock->rl_next_lock)) {
- struct range_lock *next;
-
- if (interval_is_intree(&lock->rl_node)) { /* first lock */
- /* Insert the next same range lock into the tree */
- next = next_lock(lock);
- next->rl_lock_count = lock->rl_lock_count - 1;
- interval_erase(&lock->rl_node, &tree->rlt_root);
- interval_insert(&next->rl_node, &tree->rlt_root);
- } else {
- /* find the first lock in tree */
- list_for_each_entry(next, &lock->rl_next_lock,
- rl_next_lock) {
- if (!interval_is_intree(&next->rl_node))
- continue;
-
- LASSERT(next->rl_lock_count > 0);
- next->rl_lock_count--;
- break;
- }
- }
- list_del_init(&lock->rl_next_lock);
- } else {
- LASSERT(interval_is_intree(&lock->rl_node));
- interval_erase(&lock->rl_node, &tree->rlt_root);
- }
-
- interval_search(tree->rlt_root, &lock->rl_node.in_extent,
- range_unlock_cb, lock);
- spin_unlock(&tree->rlt_lock);
-}
-
-/**
- * Helper function of range_lock()
- *
- * \param node [in] a range lock found overlapped during interval node
- * search
- * \param arg [in] the range lock to be tested
- *
- * \retval INTERVAL_ITER_CONT indicate to continue the search for next
- * overlapping range node
- * \retval INTERVAL_ITER_STOP indicate to stop the search
- */
-static enum interval_iter range_lock_cb(struct interval_node *node, void *arg)
-{
- struct range_lock *lock = arg;
- struct range_lock *overlap = node2rangelock(node);
-
- lock->rl_blocking_ranges += overlap->rl_lock_count + 1;
- return INTERVAL_ITER_CONT;
-}
-
-/**
- * Lock a region
- *
- * \param tree [in] range lock tree
- * \param lock [in] range lock node containing the region span
- *
- * \retval 0 get the range lock
- * \retval <0 error code while not getting the range lock
- *
- * If there exists overlapping range lock, the new lock will wait and
- * retry, if later it find that it is not the chosen one to wake up,
- * it wait again.
- */
-int range_lock(struct range_lock_tree *tree, struct range_lock *lock)
-{
- struct interval_node *node;
- int rc = 0;
-
- spin_lock(&tree->rlt_lock);
- /*
- * We need to check for all conflicting intervals
- * already in the tree.
- */
- interval_search(tree->rlt_root, &lock->rl_node.in_extent,
- range_lock_cb, lock);
- /*
- * Insert to the tree if I am unique, otherwise I've been linked to
- * the rl_next_lock of another lock which has the same range as mine
- * in range_lock_cb().
- */
- node = interval_insert(&lock->rl_node, &tree->rlt_root);
- if (node) {
- struct range_lock *tmp = node2rangelock(node);
-
- list_add_tail(&lock->rl_next_lock, &tmp->rl_next_lock);
- tmp->rl_lock_count++;
- }
- lock->rl_sequence = ++tree->rlt_sequence;
-
- while (lock->rl_blocking_ranges > 0) {
- lock->rl_task = current;
- __set_current_state(TASK_INTERRUPTIBLE);
- spin_unlock(&tree->rlt_lock);
- schedule();
-
- if (signal_pending(current)) {
- range_unlock(tree, lock);
- rc = -EINTR;
- goto out;
- }
- spin_lock(&tree->rlt_lock);
- }
- spin_unlock(&tree->rlt_lock);
-out:
- return rc;
-}
diff --git a/drivers/staging/lustre/lustre/llite/range_lock.h b/drivers/staging/lustre/lustre/llite/range_lock.h
deleted file mode 100644
index 38b2be4e378f..000000000000
--- a/drivers/staging/lustre/lustre/llite/range_lock.h
+++ /dev/null
@@ -1,83 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.gnu.org/licenses/gpl-2.0.html
- *
- * GPL HEADER END
- */
-/*
- * Range lock is used to allow multiple threads writing a single shared
- * file given each thread is writing to a non-overlapping portion of the
- * file.
- *
- * Refer to the possible upstream kernel version of range lock by
- * Jan Kara <[email protected]>: https://lkml.org/lkml/2013/1/31/480
- *
- * This file could later replaced by the upstream kernel version.
- */
-/*
- * Author: Prakash Surya <[email protected]>
- * Author: Bobi Jam <[email protected]>
- */
-#ifndef _RANGE_LOCK_H
-#define _RANGE_LOCK_H
-
-#include <linux/libcfs/libcfs.h>
-#include <interval_tree.h>
-
-struct range_lock {
- struct interval_node rl_node;
- /**
- * Process to enqueue this lock.
- */
- struct task_struct *rl_task;
- /**
- * List of locks with the same range.
- */
- struct list_head rl_next_lock;
- /**
- * Number of locks in the list rl_next_lock
- */
- unsigned int rl_lock_count;
- /**
- * Number of ranges which are blocking acquisition of the lock
- */
- unsigned int rl_blocking_ranges;
- /**
- * Sequence number of range lock. This number is used to get to know
- * the order the locks are queued; this is required for range_cancel().
- */
- __u64 rl_sequence;
-};
-
-static inline struct range_lock *node2rangelock(const struct interval_node *n)
-{
- return container_of(n, struct range_lock, rl_node);
-}
-
-struct range_lock_tree {
- struct interval_node *rlt_root;
- spinlock_t rlt_lock; /* protect range lock tree */
- __u64 rlt_sequence;
-};
-
-void range_lock_tree_init(struct range_lock_tree *tree);
-int range_lock_init(struct range_lock *lock, __u64 start, __u64 end);
-int range_lock(struct range_lock_tree *tree, struct range_lock *lock);
-void range_unlock(struct range_lock_tree *tree, struct range_lock *lock);
-#endif
diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c
index e7a4778e02e4..1d4b19bd5f53 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_io.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_io.c
@@ -378,6 +378,7 @@ static int vvp_mmap_locks(const struct lu_env *env,
int result = 0;
struct iov_iter i;
struct iovec iov;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

LASSERT(io->ci_type == CIT_READ || io->ci_type == CIT_WRITE);

@@ -397,8 +398,8 @@ static int vvp_mmap_locks(const struct lu_env *env,
count += addr & (~PAGE_MASK);
addr &= PAGE_MASK;

- down_read(&mm->mmap_sem);
- while ((vma = our_vma(mm, addr, count)) != NULL) {
+ mm_read_lock(mm, &mmrange);
+ while ((vma = our_vma(mm, addr, count, &mmrange)) != NULL) {
struct inode *inode = file_inode(vma->vm_file);
int flags = CEF_MUST;

@@ -438,7 +439,7 @@ static int vvp_mmap_locks(const struct lu_env *env,
count -= vma->vm_end - addr;
addr = vma->vm_end;
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (result < 0)
break;
}
--
2.13.6


2018-02-05 01:34:14

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 45/64] arch/m32r: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/m32r/mm/fault.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/m32r/mm/fault.c b/arch/m32r/mm/fault.c
index 0129aea46729..2c6b58ecfc53 100644
--- a/arch/m32r/mm/fault.c
+++ b/arch/m32r/mm/fault.c
@@ -137,11 +137,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
* source. If this is invalid we can skip the address space check,
* thus avoiding the deadlock.
*/
- if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!mm_read_trylock(mm, &mmrange)) {
if ((error_code & ACE_USERMODE) == 0 &&
!search_exception_tables(regs->psw))
goto bad_area_nosemaphore;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
}

vma = find_vma(mm, address);
@@ -213,7 +213,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
else
tsk->min_flt++;
set_thread_fault_code(0);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -221,7 +221,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
@@ -274,14 +274,14 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!(error_code & ACE_USERMODE))
goto no_context;
pagefault_out_of_memory();
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/* Kernel mode? Handle exception or die */
if (!(error_code & ACE_USERMODE))
--
2.13.6


2018-02-05 01:35:22

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 35/64] arch/ia64: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/ia64/kernel/perfmon.c | 10 +++++-----
arch/ia64/mm/fault.c | 8 ++++----
arch/ia64/mm/init.c | 13 +++++++------
3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/ia64/kernel/perfmon.c b/arch/ia64/kernel/perfmon.c
index 858602494096..53cde97fe67a 100644
--- a/arch/ia64/kernel/perfmon.c
+++ b/arch/ia64/kernel/perfmon.c
@@ -2244,7 +2244,7 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
struct vm_area_struct *vma = NULL;
unsigned long size;
void *smpl_buf;
-
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* the fixed header + requested size and align to page boundary
@@ -2307,13 +2307,13 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
* now we atomically find some area in the address space and
* remap the buffer in it.
*/
- down_write(&task->mm->mmap_sem);
+ mm_write_lock(task->mm, &mmrange);

/* find some free area in address space, must have mmap sem held */
vma->vm_start = get_unmapped_area(NULL, 0, size, 0, MAP_PRIVATE|MAP_ANONYMOUS);
if (IS_ERR_VALUE(vma->vm_start)) {
DPRINT(("Cannot find unmapped area for size %ld\n", size));
- up_write(&task->mm->mmap_sem);
+ mm_write_unlock(task->mm, &mmrange);
goto error;
}
vma->vm_end = vma->vm_start + size;
@@ -2324,7 +2324,7 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
/* can only be applied to current task, need to have the mm semaphore held when called */
if (pfm_remap_buffer(vma, (unsigned long)smpl_buf, vma->vm_start, size)) {
DPRINT(("Can't remap buffer\n"));
- up_write(&task->mm->mmap_sem);
+ mm_write_unlock(task->mm, &mmrange);
goto error;
}

@@ -2335,7 +2335,7 @@ pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t
insert_vm_struct(mm, vma);

vm_stat_account(vma->vm_mm, vma->vm_flags, vma_pages(vma));
- up_write(&task->mm->mmap_sem);
+ mm_write_unlock(task->mm, &mmrange);

/*
* keep track of user level virtual address
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index 44f0ec5f77c2..9d379a9a9a5c 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -126,7 +126,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
if (mask & VM_WRITE)
flags |= FAULT_FLAG_WRITE;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

vma = find_vma_prev(mm, address, &prev_vma);
if (!vma && !prev_vma )
@@ -203,7 +203,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

check_expansion:
@@ -234,7 +234,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
goto good_area;

bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
#ifdef CONFIG_VIRTUAL_MEM_MAP
bad_area_no_up:
#endif
@@ -305,7 +305,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
return;

out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 18278b448530..a870478bbe16 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -106,6 +106,7 @@ void
ia64_init_addr_space (void)
{
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

ia64_set_rbs_bot();

@@ -122,13 +123,13 @@ ia64_init_addr_space (void)
vma->vm_end = vma->vm_start + PAGE_SIZE;
vma->vm_flags = VM_DATA_DEFAULT_FLAGS|VM_GROWSUP|VM_ACCOUNT;
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);
if (insert_vm_struct(current->mm, vma)) {
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
kmem_cache_free(vm_area_cachep, vma);
return;
}
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
}

/* map NaT-page at address zero to speed up speculative dereferencing of NULL: */
@@ -141,13 +142,13 @@ ia64_init_addr_space (void)
vma->vm_page_prot = __pgprot(pgprot_val(PAGE_READONLY) | _PAGE_MA_NAT);
vma->vm_flags = VM_READ | VM_MAYREAD | VM_IO |
VM_DONTEXPAND | VM_DONTDUMP;
- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);
if (insert_vm_struct(current->mm, vma)) {
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
kmem_cache_free(vm_area_cachep, vma);
return;
}
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
}
}
}
--
2.13.6


2018-02-05 01:35:35

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 53/64] arch/nios2: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/nios2/mm/fault.c | 12 ++++++------
arch/nios2/mm/init.c | 5 +++--
2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index 768678b685af..a59ebadd8e13 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -85,11 +85,11 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;

- if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!mm_read_trylock(mm, &mmrange)) {
if (!user_mode(regs) && !search_exception_tables(regs->ea))
goto bad_area_nosemaphore;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
}

vma = find_vma(mm, address);
@@ -174,7 +174,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -182,7 +182,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
@@ -220,14 +220,14 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/* Kernel mode? Handle exceptions or die */
if (!user_mode(regs))
diff --git a/arch/nios2/mm/init.c b/arch/nios2/mm/init.c
index c92fe4234009..58bb1c1441ce 100644
--- a/arch/nios2/mm/init.c
+++ b/arch/nios2/mm/init.c
@@ -123,15 +123,16 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
{
struct mm_struct *mm = current->mm;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

/* Map kuser helpers to user space address */
ret = install_special_mapping(mm, KUSER_BASE, KUSER_SIZE,
VM_READ | VM_EXEC | VM_MAYREAD |
VM_MAYEXEC, kuser_page);

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return ret;
}
--
2.13.6


2018-02-05 01:35:44

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 54/64] arch/arm: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.
For those mmap_sem users that need mmrange, we simply add it
to the function as the mmap_sem usage is in the same context.
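
The common shape of these conversions, with lock and unlock in the
same function, is roughly (sketch of the pattern in the hunks below):

    DEFINE_RANGE_LOCK_FULL(mmrange);  /* full range: worst case, no concurrency gain yet */

    if (mm_write_lock_killable(mm, &mmrange))
        return -EINTR;
    /* ... install the sigpage/vdso mappings as before ... */
    mm_write_unlock(mm, &mmrange);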

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/arm/kernel/process.c | 5 +++--
arch/arm/kernel/swp_emulate.c | 5 +++--
arch/arm/lib/uaccess_with_memcpy.c | 18 ++++++++++--------
arch/arm/mm/fault.c | 6 +++---
arch/arm64/kernel/traps.c | 5 +++--
arch/arm64/kernel/vdso.c | 12 +++++++-----
arch/arm64/mm/fault.c | 6 +++---
7 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 1523cb18b109..39fd5bd204d7 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -424,6 +424,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
unsigned long addr;
unsigned long hint;
int ret = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!signal_page)
signal_page = get_signal_page();
@@ -433,7 +434,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
npages = 1; /* for sigpage */
npages += vdso_total_pages;

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;
hint = sigpage_addr(mm, npages);
addr = get_unmapped_area(NULL, hint, npages << PAGE_SHIFT, 0, 0);
@@ -460,7 +461,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
arm_install_vdso(mm, addr + PAGE_SIZE);

up_fail:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}
#endif
diff --git a/arch/arm/kernel/swp_emulate.c b/arch/arm/kernel/swp_emulate.c
index 3bda08bee674..e01a469393fb 100644
--- a/arch/arm/kernel/swp_emulate.c
+++ b/arch/arm/kernel/swp_emulate.c
@@ -111,13 +111,14 @@ static const struct file_operations proc_status_fops = {
static void set_segfault(struct pt_regs *regs, unsigned long addr)
{
siginfo_t info;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
if (find_vma(current->mm, addr) == NULL)
info.si_code = SEGV_MAPERR;
else
info.si_code = SEGV_ACCERR;
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

info.si_signo = SIGSEGV;
info.si_errno = 0;
diff --git a/arch/arm/lib/uaccess_with_memcpy.c b/arch/arm/lib/uaccess_with_memcpy.c
index 9b4ed1728616..24464fa0a78a 100644
--- a/arch/arm/lib/uaccess_with_memcpy.c
+++ b/arch/arm/lib/uaccess_with_memcpy.c
@@ -89,6 +89,7 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
{
unsigned long ua_flags;
int atomic;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (uaccess_kernel()) {
memcpy((void *)to, from, n);
@@ -99,7 +100,7 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
atomic = faulthandler_disabled();

if (!atomic)
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
while (n) {
pte_t *pte;
spinlock_t *ptl;
@@ -107,11 +108,11 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)

while (!pin_page_for_write(to, &pte, &ptl)) {
if (!atomic)
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (__put_user(0, (char __user *)to))
goto out;
if (!atomic)
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
}

tocopy = (~(unsigned long)to & ~PAGE_MASK) + 1;
@@ -131,7 +132,7 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
spin_unlock(ptl);
}
if (!atomic)
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

out:
return n;
@@ -161,23 +162,24 @@ static unsigned long noinline
__clear_user_memset(void __user *addr, unsigned long n)
{
unsigned long ua_flags;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (uaccess_kernel()) {
memset((void *)addr, 0, n);
return 0;
}

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
while (n) {
pte_t *pte;
spinlock_t *ptl;
int tocopy;

while (!pin_page_for_write(addr, &pte, &ptl)) {
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (__put_user(0, (char __user *)addr))
goto out;
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
}

tocopy = (~(unsigned long)addr & ~PAGE_MASK) + 1;
@@ -195,7 +197,7 @@ __clear_user_memset(void __user *addr, unsigned long n)
else
spin_unlock(ptl);
}
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

out:
return n;
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 99ae40b5851a..6ce3e0707db5 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -291,11 +291,11 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
* validly references user space from well defined areas of the code,
* we can bug out early if this is from code which shouldn't.
*/
- if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!mm_read_trylock(mm, &mmrange)) {
if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
goto no_context;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
} else {
/*
* The above down_read_trylock() might have succeeded in
@@ -348,7 +348,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Handle the "normal" case first - VM_FAULT_MAJOR
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index bbb0fde2780e..bf185655b142 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -351,13 +351,14 @@ void force_signal_inject(int signal, int code, struct pt_regs *regs,
void arm64_notify_segfault(struct pt_regs *regs, unsigned long addr)
{
int code;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
if (find_vma(current->mm, addr) == NULL)
code = SEGV_MAPERR;
else
code = SEGV_ACCERR;
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

force_signal_inject(SIGSEGV, code, regs, addr);
}
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 2d419006ad43..1b0006fe9668 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -94,8 +94,9 @@ int aarch32_setup_vectors_page(struct linux_binprm *bprm, int uses_interp)

};
void *ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;
current->mm->context.vdso = (void *)addr;

@@ -104,7 +105,7 @@ int aarch32_setup_vectors_page(struct linux_binprm *bprm, int uses_interp)
VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYEXEC,
&spec);

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return PTR_ERR_OR_ZERO(ret);
}
@@ -178,12 +179,13 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
struct mm_struct *mm = current->mm;
unsigned long vdso_base, vdso_text_len, vdso_mapping_len;
void *ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

vdso_text_len = vdso_pages << PAGE_SHIFT;
/* Be sure to map the data page */
vdso_mapping_len = vdso_text_len + PAGE_SIZE;

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;
vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
if (IS_ERR_VALUE(vdso_base)) {
@@ -206,12 +208,12 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
goto up_fail;


- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return 0;

up_fail:
mm->context.vdso = NULL;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return PTR_ERR(ret);
}

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 1f3ad9e4f214..555d533d52ab 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -434,11 +434,11 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
* validly references user space from well defined areas of the code,
* we can bug out early if this is from code which shouldn't.
*/
- if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!mm_read_trylock(mm, &mmrange)) {
if (!user_mode(regs) && !search_exception_tables(regs->pc))
goto no_context;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
} else {
/*
* The above down_read_trylock() might have succeeded in which
@@ -477,7 +477,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
goto retry;
}
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Handle the "normal" (no error) case first.
--
2.13.6


2018-02-05 01:35:57

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 48/64] arch/um: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

* THIS IS A HACK *

Breaks arch/um/. See comment in fix_range_common().

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/um/include/asm/mmu_context.h | 5 +++--
arch/um/kernel/tlb.c | 12 +++++++++++-
arch/um/kernel/trap.c | 6 +++---
3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h
index 98cc3e36385a..7dc202c611db 100644
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -49,14 +49,15 @@ extern void force_flush_all(void);

static inline void activate_mm(struct mm_struct *old, struct mm_struct *new)
{
+ DEFINE_RANGE_LOCK_FULL(mmrange);
/*
* This is called by fs/exec.c and sys_unshare()
* when the new ->mm is used for the first time.
*/
__switch_mm(&new->context.id);
- down_write(&new->mmap_sem);
+ mm_write_lock(new, &mmrange);
uml_setup_stubs(new);
- up_write(&new->mmap_sem);
+ mm_write_unlock(new, &mmrange);
}

static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c
index 37508b190106..eeeeb048b6f4 100644
--- a/arch/um/kernel/tlb.c
+++ b/arch/um/kernel/tlb.c
@@ -297,10 +297,20 @@ void fix_range_common(struct mm_struct *mm, unsigned long start_addr,

/* This is not an else because ret is modified above */
if (ret) {
+ /*
+ * FIXME: this is _wrong_ and will break arch/um.
+ *
+ * The right thing to do is modify the flush_tlb_range()
+ * api, but that in turn would require file_operations
+ * knowing about mmrange... Compiles cleanly, but sucks
+ * otherwise.
+ */
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
printk(KERN_ERR "fix_range_common: failed, killing current "
"process: %d\n", task_tgid_vnr(current));
/* We are under mmap_sem, release it such that current can terminate */
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
force_sig(SIGKILL, current);
do_signal(&current->thread.regs);
}
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index e632a14e896e..14dcb83d00a9 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -47,7 +47,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
if (is_user)
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);
if (!vma)
goto out;
@@ -123,7 +123,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
#endif
flush_tlb_page(vma, address);
out:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
out_nosemaphore:
return err;

@@ -132,7 +132,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
* We ran out of memory, call the OOM killer, and return the userspace
* (which will retry the fault, or kill us if we got oom-killed).
*/
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!is_user)
goto out_nosemaphore;
pagefault_out_of_memory();
--
2.13.6


2018-02-05 01:36:05

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 62/64] drivers: use mm locking wrappers (the rest)

From: Davidlohr Bueso <[email protected]>

This converts the rest of the drivers' mmap_sem usage to
mm locking wrappers. This becomes quite straightforward
with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
drivers/media/v4l2-core/videobuf-core.c | 5 ++-
drivers/media/v4l2-core/videobuf-dma-contig.c | 5 ++-
drivers/media/v4l2-core/videobuf-dma-sg.c | 4 +-
drivers/misc/cxl/cxllib.c | 5 ++-
drivers/misc/cxl/fault.c | 5 ++-
drivers/misc/mic/scif/scif_rma.c | 14 +++---
drivers/misc/sgi-gru/grufault.c | 52 +++++++++++++---------
drivers/misc/sgi-gru/grufile.c | 5 ++-
drivers/oprofile/buffer_sync.c | 12 ++---
.../media/atomisp/pci/atomisp2/hmm/hmm_bo.c | 5 ++-
drivers/tee/optee/call.c | 5 ++-
drivers/vfio/vfio_iommu_spapr_tce.c | 8 ++--
drivers/vfio/vfio_iommu_type1.c | 15 ++++---
13 files changed, 80 insertions(+), 60 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf-core.c b/drivers/media/v4l2-core/videobuf-core.c
index 9a89d3ae170f..2081606e179e 100644
--- a/drivers/media/v4l2-core/videobuf-core.c
+++ b/drivers/media/v4l2-core/videobuf-core.c
@@ -533,11 +533,12 @@ int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b)
enum v4l2_field field;
unsigned long flags = 0;
int retval;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

MAGIC_CHECK(q->int_ops->magic, MAGIC_QTYPE_OPS);

if (b->memory == V4L2_MEMORY_MMAP)
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);

videobuf_queue_lock(q);
retval = -EBUSY;
@@ -624,7 +625,7 @@ int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b)
videobuf_queue_unlock(q);

if (b->memory == V4L2_MEMORY_MMAP)
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

return retval;
}
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index e02353e340dd..8b1f58807c0d 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -166,12 +166,13 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
unsigned long pages_done, user_address;
unsigned int offset;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

offset = vb->baddr & ~PAGE_MASK;
mem->size = PAGE_ALIGN(vb->size + offset);
ret = -EINVAL;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

vma = find_vma(mm, vb->baddr);
if (!vma)
@@ -203,7 +204,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
}

out_up:
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

return ret;
}
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 64a4cd62eeb3..e7ff32aca981 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -204,9 +204,9 @@ static int videobuf_dma_init_user(struct videobuf_dmabuf *dma, int direction,
int ret;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
ret = videobuf_dma_init_user_locked(dma, direction, data, size, &mmrange);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

return ret;
}
diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
index 30ccba436b3b..bf147735945c 100644
--- a/drivers/misc/cxl/cxllib.c
+++ b/drivers/misc/cxl/cxllib.c
@@ -214,11 +214,12 @@ int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags)
u64 dar;
struct vm_area_struct *vma = NULL;
unsigned long page_size;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (mm == NULL)
return -EFAULT;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

vma = find_vma(mm, addr);
if (!vma) {
@@ -250,7 +251,7 @@ int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags)
}
rc = 0;
out:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return rc;
}
EXPORT_SYMBOL_GPL(cxllib_handle_fault);
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index 70dbb6de102c..f95169703f71 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -317,6 +317,7 @@ static void cxl_prefault_vma(struct cxl_context *ctx)
struct vm_area_struct *vma;
int rc;
struct mm_struct *mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

mm = get_mem_context(ctx);
if (mm == NULL) {
@@ -325,7 +326,7 @@ static void cxl_prefault_vma(struct cxl_context *ctx)
return;
}

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
for (ea = vma->vm_start; ea < vma->vm_end;
ea = next_segment(ea, slb.vsid)) {
@@ -340,7 +341,7 @@ static void cxl_prefault_vma(struct cxl_context *ctx)
last_esid = slb.esid;
}
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

mmput(mm);
}
diff --git a/drivers/misc/mic/scif/scif_rma.c b/drivers/misc/mic/scif/scif_rma.c
index 6ecac843e5f3..4bbdf875b5da 100644
--- a/drivers/misc/mic/scif/scif_rma.c
+++ b/drivers/misc/mic/scif/scif_rma.c
@@ -274,19 +274,21 @@ static inline int
__scif_dec_pinned_vm_lock(struct mm_struct *mm,
int nr_pages, bool try_lock)
{
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
if (!mm || !nr_pages || !scif_ulimit_check)
return 0;
if (try_lock) {
- if (!down_write_trylock(&mm->mmap_sem)) {
+ if (!mm_write_trylock(mm, &mmrange)) {
dev_err(scif_info.mdev.this_device,
"%s %d err\n", __func__, __LINE__);
return -1;
}
} else {
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
}
mm->pinned_vm -= nr_pages;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return 0;
}

@@ -1386,11 +1388,11 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
prot |= SCIF_PROT_WRITE;
retry:
mm = current->mm;
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
if (ulimit) {
err = __scif_check_inc_pinned_vm(mm, nr_pages);
if (err) {
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
pinned_pages->nr_pages = 0;
goto error_unmap;
}
@@ -1402,7 +1404,7 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
(prot & SCIF_PROT_WRITE) ? FOLL_WRITE : 0,
pinned_pages->pages,
NULL, &mmrange);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
if (nr_pages != pinned_pages->nr_pages) {
if (try_upgrade) {
if (ulimit)
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index b35d60bb2197..bac8bb94ba65 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -76,20 +76,21 @@ struct vm_area_struct *gru_find_vma(unsigned long vaddr)
* - NULL if vaddr invalid OR is not a valid GSEG vaddr.
*/

-static struct gru_thread_state *gru_find_lock_gts(unsigned long vaddr)
+static struct gru_thread_state *gru_find_lock_gts(unsigned long vaddr,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
struct gru_thread_state *gts = NULL;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, mmrange);
vma = gru_find_vma(vaddr);
if (vma)
gts = gru_find_thread_state(vma, TSID(vaddr, vma));
if (gts)
mutex_lock(&gts->ts_ctxlock);
else
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
return gts;
}

@@ -98,8 +99,9 @@ static struct gru_thread_state *gru_alloc_locked_gts(unsigned long vaddr)
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
struct gru_thread_state *gts = ERR_PTR(-EINVAL);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
vma = gru_find_vma(vaddr);
if (!vma)
goto err;
@@ -108,21 +110,22 @@ static struct gru_thread_state *gru_alloc_locked_gts(unsigned long vaddr)
if (IS_ERR(gts))
goto err;
mutex_lock(&gts->ts_ctxlock);
- downgrade_write(&mm->mmap_sem);
+ mm_downgrade_write(mm, &mmrange);
return gts;

err:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return gts;
}

/*
* Unlock a GTS that was previously locked with gru_find_lock_gts().
*/
-static void gru_unlock_gts(struct gru_thread_state *gts)
+static void gru_unlock_gts(struct gru_thread_state *gts,
+ struct range_lock *mmrange)
{
mutex_unlock(&gts->ts_ctxlock);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, mmrange);
}

/*
@@ -597,9 +600,9 @@ static irqreturn_t gru_intr(int chiplet, int blade)
if (!gts->ts_force_cch_reload) {
DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_read_trylock(&gts->ts_mm->mmap_sem)) {
+ if (mm_read_trylock(gts->ts_mm, &mmrange)) {
gru_try_dropin(gru, gts, tfh, NULL, &mmrange);
- up_read(&gts->ts_mm->mmap_sem);
+ mm_read_unlock(gts->ts_mm, &mmrange);
}
} else {
tfh_user_polling_mode(tfh);
@@ -672,7 +675,7 @@ int gru_handle_user_call_os(unsigned long cb)
if ((cb & (GRU_HANDLE_STRIDE - 1)) || ucbnum >= GRU_NUM_CB)
return -EINVAL;

- gts = gru_find_lock_gts(cb);
+ gts = gru_find_lock_gts(cb, &mmrange);
if (!gts)
return -EINVAL;
gru_dbg(grudev, "address 0x%lx, gid %d, gts 0x%p\n", cb, gts->ts_gru ? gts->ts_gru->gs_gid : -1, gts);
@@ -699,7 +702,7 @@ int gru_handle_user_call_os(unsigned long cb)
ret = gru_user_dropin(gts, tfh, cbk, &mmrange);
}
exit:
- gru_unlock_gts(gts);
+ gru_unlock_gts(gts, &mmrange);
return ret;
}

@@ -713,12 +716,13 @@ int gru_get_exception_detail(unsigned long arg)
struct gru_control_block_extended *cbe;
struct gru_thread_state *gts;
int ucbnum, cbrnum, ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

STAT(user_exception);
if (copy_from_user(&excdet, (void __user *)arg, sizeof(excdet)))
return -EFAULT;

- gts = gru_find_lock_gts(excdet.cb);
+ gts = gru_find_lock_gts(excdet.cb, &mmrange);
if (!gts)
return -EINVAL;

@@ -743,7 +747,7 @@ int gru_get_exception_detail(unsigned long arg)
} else {
ret = -EAGAIN;
}
- gru_unlock_gts(gts);
+ gru_unlock_gts(gts, &mmrange);

gru_dbg(grudev,
"cb 0x%lx, op %d, exopc %d, cbrstate %d, cbrexecstatus 0x%x, ecause 0x%x, "
@@ -787,6 +791,7 @@ int gru_user_unload_context(unsigned long arg)
{
struct gru_thread_state *gts;
struct gru_unload_context_req req;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

STAT(user_unload_context);
if (copy_from_user(&req, (void __user *)arg, sizeof(req)))
@@ -797,13 +802,13 @@ int gru_user_unload_context(unsigned long arg)
if (!req.gseg)
return gru_unload_all_contexts();

- gts = gru_find_lock_gts(req.gseg);
+ gts = gru_find_lock_gts(req.gseg, &mmrange);
if (!gts)
return -EINVAL;

if (gts->ts_gru)
gru_unload_context(gts, 1);
- gru_unlock_gts(gts);
+ gru_unlock_gts(gts, &mmrange);

return 0;
}
@@ -817,6 +822,7 @@ int gru_user_flush_tlb(unsigned long arg)
struct gru_thread_state *gts;
struct gru_flush_tlb_req req;
struct gru_mm_struct *gms;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

STAT(user_flush_tlb);
if (copy_from_user(&req, (void __user *)arg, sizeof(req)))
@@ -825,12 +831,12 @@ int gru_user_flush_tlb(unsigned long arg)
gru_dbg(grudev, "gseg 0x%lx, vaddr 0x%lx, len 0x%lx\n", req.gseg,
req.vaddr, req.len);

- gts = gru_find_lock_gts(req.gseg);
+ gts = gru_find_lock_gts(req.gseg, &mmrange);
if (!gts)
return -EINVAL;

gms = gts->ts_gms;
- gru_unlock_gts(gts);
+ gru_unlock_gts(gts, &mmrange);
gru_flush_tlb_range(gms, req.vaddr, req.len);

return 0;
@@ -843,6 +849,7 @@ long gru_get_gseg_statistics(unsigned long arg)
{
struct gru_thread_state *gts;
struct gru_get_gseg_statistics_req req;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (copy_from_user(&req, (void __user *)arg, sizeof(req)))
return -EFAULT;
@@ -852,10 +859,10 @@ long gru_get_gseg_statistics(unsigned long arg)
* If no gts exists in the array, the context has never been used & all
* statistics are implicitly 0.
*/
- gts = gru_find_lock_gts(req.gseg);
+ gts = gru_find_lock_gts(req.gseg, &mmrange);
if (gts) {
memcpy(&req.stats, &gts->ustats, sizeof(gts->ustats));
- gru_unlock_gts(gts);
+ gru_unlock_gts(gts, &mmrange);
} else {
memset(&req.stats, 0, sizeof(gts->ustats));
}
@@ -875,13 +882,14 @@ int gru_set_context_option(unsigned long arg)
struct gru_thread_state *gts;
struct gru_set_context_option_req req;
int ret = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

STAT(set_context_option);
if (copy_from_user(&req, (void __user *)arg, sizeof(req)))
return -EFAULT;
gru_dbg(grudev, "op %d, gseg 0x%lx, value1 0x%lx\n", req.op, req.gseg, req.val1);

- gts = gru_find_lock_gts(req.gseg);
+ gts = gru_find_lock_gts(req.gseg, &mmrange);
if (!gts) {
gts = gru_alloc_locked_gts(req.gseg);
if (IS_ERR(gts))
@@ -912,7 +920,7 @@ int gru_set_context_option(unsigned long arg)
default:
ret = -EINVAL;
}
- gru_unlock_gts(gts);
+ gru_unlock_gts(gts, &mmrange);

return ret;
}
diff --git a/drivers/misc/sgi-gru/grufile.c b/drivers/misc/sgi-gru/grufile.c
index 104a05f6b738..1403a4f73cbd 100644
--- a/drivers/misc/sgi-gru/grufile.c
+++ b/drivers/misc/sgi-gru/grufile.c
@@ -136,6 +136,7 @@ static int gru_create_new_context(unsigned long arg)
struct vm_area_struct *vma;
struct gru_vma_data *vdata;
int ret = -EINVAL;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (copy_from_user(&req, (void __user *)arg, sizeof(req)))
return -EFAULT;
@@ -148,7 +149,7 @@ static int gru_create_new_context(unsigned long arg)
if (!(req.options & GRU_OPT_MISS_MASK))
req.options |= GRU_OPT_MISS_FMM_INTR;

- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);
vma = gru_find_vma(req.gseg);
if (vma) {
vdata = vma->vm_private_data;
@@ -159,7 +160,7 @@ static int gru_create_new_context(unsigned long arg)
vdata->vd_tlb_preload_count = req.tlb_preload_count;
ret = 0;
}
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);

return ret;
}
diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
index ac27f3d3fbb4..33a36b97f8a5 100644
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -90,12 +90,13 @@ munmap_notify(struct notifier_block *self, unsigned long val, void *data)
unsigned long addr = (unsigned long)data;
struct mm_struct *mm = current->mm;
struct vm_area_struct *mpnt;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

mpnt = find_vma(mm, addr);
if (mpnt && mpnt->vm_file && (mpnt->vm_flags & VM_EXEC)) {
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
/* To avoid latency problems, we only process the current CPU,
* hoping that most samples for the task are on this CPU
*/
@@ -103,7 +104,7 @@ munmap_notify(struct notifier_block *self, unsigned long val, void *data)
return 0;
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return 0;
}

@@ -255,8 +256,9 @@ lookup_dcookie(struct mm_struct *mm, unsigned long addr, off_t *offset)
{
unsigned long cookie = NO_COOKIE;
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {

if (addr < vma->vm_start || addr >= vma->vm_end)
@@ -276,7 +278,7 @@ lookup_dcookie(struct mm_struct *mm, unsigned long addr, off_t *offset)

if (!vma)
cookie = INVALID_COOKIE;
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

return cookie;
}
diff --git a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_bo.c b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_bo.c
index 79bd540d7882..f38303ea8470 100644
--- a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_bo.c
+++ b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm_bo.c
@@ -983,6 +983,7 @@ static int alloc_user_pages(struct hmm_buffer_object *bo,
int i;
struct vm_area_struct *vma;
struct page **pages;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

pages = kmalloc_array(bo->pgnr, sizeof(struct page *), GFP_KERNEL);
if (unlikely(!pages))
@@ -996,9 +997,9 @@ static int alloc_user_pages(struct hmm_buffer_object *bo,
}

mutex_unlock(&bo->mutex);
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, (unsigned long)userptr);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (vma == NULL) {
dev_err(atomisp_dev, "find_vma failed\n");
kfree(bo->page_obj);
diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
index a5afbe6dee68..488a08e17a93 100644
--- a/drivers/tee/optee/call.c
+++ b/drivers/tee/optee/call.c
@@ -561,11 +561,12 @@ static int check_mem_type(unsigned long start, size_t num_pages)
{
struct mm_struct *mm = current->mm;
int rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
rc = __check_mem_type(find_vma(mm, start),
start + num_pages * PAGE_SIZE);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

return rc;
}
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 759a5bdd40e1..114da7865bd2 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -44,7 +44,7 @@ static long try_increment_locked_vm(struct mm_struct *mm, long npages)
if (!npages)
return 0;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
locked = mm->locked_vm + npages;
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
if (locked > lock_limit && !capable(CAP_IPC_LOCK))
@@ -58,7 +58,7 @@ static long try_increment_locked_vm(struct mm_struct *mm, long npages)
rlimit(RLIMIT_MEMLOCK),
ret ? " - exceeded" : "");

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return ret;
}
@@ -68,7 +68,7 @@ static void decrement_locked_vm(struct mm_struct *mm, long npages)
if (!mm || !npages)
return;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
if (WARN_ON_ONCE(npages > mm->locked_vm))
npages = mm->locked_vm;
mm->locked_vm -= npages;
@@ -76,7 +76,7 @@ static void decrement_locked_vm(struct mm_struct *mm, long npages)
npages << PAGE_SHIFT,
mm->locked_vm << PAGE_SHIFT,
rlimit(RLIMIT_MEMLOCK));
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
}

/*
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1b3b103da637..80a6ec8722fb 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -251,6 +251,7 @@ static int vfio_lock_acct(struct task_struct *task, long npage, bool *lock_cap)
struct mm_struct *mm;
bool is_current;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!npage)
return 0;
@@ -261,7 +262,7 @@ static int vfio_lock_acct(struct task_struct *task, long npage, bool *lock_cap)
if (!mm)
return -ESRCH; /* process exited */

- ret = down_write_killable(&mm->mmap_sem);
+ ret = mm_write_lock_killable(mm, &mmrange);
if (!ret) {
if (npage > 0) {
if (lock_cap ? !*lock_cap :
@@ -279,7 +280,7 @@ static int vfio_lock_acct(struct task_struct *task, long npage, bool *lock_cap)
if (!ret)
mm->locked_vm += npage;

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
}

if (!is_current)
@@ -339,21 +340,21 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
struct page *page[1];
struct vm_area_struct *vma;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (mm == current->mm) {
ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
page);
} else {
unsigned int flags = 0;
- DEFINE_RANGE_LOCK_FULL(mmrange);

if (prot & IOMMU_WRITE)
flags |= FOLL_WRITE;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
NULL, NULL, &mmrange);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}

if (ret == 1) {
@@ -361,7 +362,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
return 0;
}

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

vma = find_vma_intersection(mm, vaddr, vaddr + 1);

@@ -371,7 +372,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
ret = 0;
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return ret;
}

--
2.13.6


2018-02-05 01:36:15

by Davidlohr Bueso

Subject: [PATCH 60/64] drivers/xen: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

All callers acquire and release mmap_sem within the same function
context, so there is no change in semantics.
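
As a minimal sketch of the write-side variant (the ioctl and its two
helpers are hypothetical; the wrappers and the error-path unlock
mirror the privcmd changes below):

    static long example_ioctl(struct mm_struct *mm)
    {
            long rc;
            DEFINE_RANGE_LOCK_FULL(mmrange);

            mm_write_lock(mm, &mmrange);            /* was: down_write(&mm->mmap_sem) */
            rc = example_setup_mapping(mm);         /* hypothetical helper */
            if (rc)
                    goto out_unlock;
            rc = example_populate_mapping(mm);      /* hypothetical helper */
    out_unlock:
            mm_write_unlock(mm, &mmrange);          /* was: up_write(&mm->mmap_sem) */
            return rc;
    }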

Signed-off-by: Davidlohr Bueso <[email protected]>
---
drivers/xen/gntdev.c | 5 +++--
drivers/xen/privcmd.c | 12 +++++++-----
2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index bd56653b9bbc..9181eee4e160 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -648,12 +648,13 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
struct vm_area_struct *vma;
struct grant_map *map;
int rv = -EINVAL;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (copy_from_user(&op, u, sizeof(op)) != 0)
return -EFAULT;
pr_debug("priv %p, offset for vaddr %lx\n", priv, (unsigned long)op.vaddr);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, op.vaddr);
if (!vma || vma->vm_ops != &gntdev_vmops)
goto out_unlock;
@@ -667,7 +668,7 @@ static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
rv = 0;

out_unlock:
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

if (rv == 0 && copy_to_user(u, &op, sizeof(op)) != 0)
return -EFAULT;
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 1c909183c42a..3736752556c5 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -257,6 +257,7 @@ static long privcmd_ioctl_mmap(struct file *file, void __user *udata)
int rc;
LIST_HEAD(pagelist);
struct mmap_gfn_state state;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* We only support privcmd_ioctl_mmap_batch for auto translated. */
if (xen_feature(XENFEAT_auto_translated_physmap))
@@ -276,7 +277,7 @@ static long privcmd_ioctl_mmap(struct file *file, void __user *udata)
if (rc || list_empty(&pagelist))
goto out;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

{
struct page *page = list_first_entry(&pagelist,
@@ -301,7 +302,7 @@ static long privcmd_ioctl_mmap(struct file *file, void __user *udata)


out_up:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

out:
free_page_list(&pagelist);
@@ -451,6 +452,7 @@ static long privcmd_ioctl_mmap_batch(
unsigned long nr_pages;
LIST_HEAD(pagelist);
struct mmap_batch_state state;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

switch (version) {
case 1:
@@ -497,7 +499,7 @@ static long privcmd_ioctl_mmap_batch(
}
}

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

vma = find_vma(mm, m.addr);
if (!vma ||
@@ -553,7 +555,7 @@ static long privcmd_ioctl_mmap_batch(
BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
&pagelist, mmap_batch_fn, &state));

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

if (state.global_error) {
/* Write back errors in second pass. */
@@ -574,7 +576,7 @@ static long privcmd_ioctl_mmap_batch(
return ret;

out_unlock:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
goto out;
}

--
2.13.6


2018-02-05 01:36:15

by Davidlohr Bueso

Subject: [PATCH 37/64] arch/arc: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/arc/kernel/troubleshoot.c | 5 +++--
arch/arc/mm/fault.c | 12 ++++++------
2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index 6e9a0a9a6a04..7212ba466c56 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -89,11 +89,12 @@ static void show_faulting_vma(unsigned long address, char *buf)
dev_t dev = 0;
char *nm = buf;
struct mm_struct *active_mm = current->active_mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* can't use print_vma_addr() yet as it doesn't check for
* non-inclusive vma
*/
- down_read(&active_mm->mmap_sem);
+ mm_read_lock(active_mm, &mmrange);
vma = find_vma(active_mm, address);

/* check against the find_vma( ) behaviour which returns the next VMA
@@ -115,7 +116,7 @@ static void show_faulting_vma(unsigned long address, char *buf)
} else
pr_info(" @No matching VMA found\n");

- up_read(&active_mm->mmap_sem);
+ mm_read_unlock(active_mm, &mmrange);
}

static void show_ecr_verbose(struct pt_regs *regs)
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index e423f764f159..235e89a3ed8e 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -100,7 +100,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
@@ -143,7 +143,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
/* If Pagefault was interrupted by SIGKILL, exit page fault "early" */
if (unlikely(fatal_signal_pending(current))) {
if ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY))
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (user_mode(regs))
return;
}
@@ -171,7 +171,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
}

/* Fault Handled Gracefully */
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;
}

@@ -190,7 +190,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
@@ -219,7 +219,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
die("Oops", regs, address);

out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

if (user_mode(regs)) {
pagefault_out_of_memory();
@@ -229,7 +229,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
goto no_context;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

if (!user_mode(regs))
goto no_context;
--
2.13.6


2018-02-05 01:36:46

by Davidlohr Bueso

Subject: [PATCH 51/64] arch/mn10300: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/mn10300/mm/fault.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/mn10300/mm/fault.c b/arch/mn10300/mm/fault.c
index 71c38f0c8702..cd973bd02259 100644
--- a/arch/mn10300/mm/fault.c
+++ b/arch/mn10300/mm/fault.c
@@ -175,7 +175,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
if ((fault_code & MMUFCR_xFC_ACCESS) == MMUFCR_xFC_ACCESS_USR)
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

vma = find_vma(mm, address);
if (!vma)
@@ -286,7 +286,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -294,7 +294,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/* User mode accesses just cause a SIGSEGV */
if ((fault_code & MMUFCR_xFC_ACCESS) == MMUFCR_xFC_ACCESS_USR) {
@@ -349,7 +349,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if ((fault_code & MMUFCR_xFC_ACCESS) == MMUFCR_xFC_ACCESS_USR) {
pagefault_out_of_memory();
return;
@@ -357,7 +357,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
goto no_context;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Send a sigbus, regardless of whether we were in kernel
--
2.13.6


2018-02-05 01:37:00

by Davidlohr Bueso

Subject: [PATCH 36/64] arch/mips: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/mips/kernel/traps.c | 5 +++--
arch/mips/kernel/vdso.c | 4 ++--
arch/mips/mm/c-octeon.c | 5 +++--
arch/mips/mm/c-r4k.c | 5 +++--
arch/mips/mm/fault.c | 10 +++++-----
5 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 0ae4a731cc12..a7d1d2417844 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -746,6 +746,7 @@ int process_fpemu_return(int sig, void __user *fault_addr, unsigned long fcr31)
{
struct siginfo si;
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

clear_siginfo(&si);
switch (sig) {
@@ -766,13 +767,13 @@ int process_fpemu_return(int sig, void __user *fault_addr, unsigned long fcr31)
case SIGSEGV:
si.si_addr = fault_addr;
si.si_signo = sig;
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, (unsigned long)fault_addr);
if (vma && (vma->vm_start <= (unsigned long)fault_addr))
si.si_code = SEGV_ACCERR;
else
si.si_code = SEGV_MAPERR;
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
force_sig_info(sig, &si, current);
return 1;

diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 56b7c29991db..beaf63864e70 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -104,7 +104,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
int ret;
DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

/* Map delay slot emulation page */
@@ -177,6 +177,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
ret = 0;

out:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}
diff --git a/arch/mips/mm/c-octeon.c b/arch/mips/mm/c-octeon.c
index 0e45b061e514..e4f6db4a8755 100644
--- a/arch/mips/mm/c-octeon.c
+++ b/arch/mips/mm/c-octeon.c
@@ -136,11 +136,12 @@ static void octeon_flush_icache_range(unsigned long start, unsigned long end)
static void octeon_flush_cache_sigtramp(unsigned long addr)
{
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, addr);
octeon_flush_icache_all_cores(vma);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
}


diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 6f534b209971..7f9c9c91dbc1 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -999,8 +999,9 @@ static void r4k_flush_cache_sigtramp(unsigned long addr)
{
struct flush_cache_sigtramp_args args;
int npages;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);

npages = get_user_pages_fast(addr, 1, 0, &args.page);
if (npages < 1)
@@ -1013,7 +1014,7 @@ static void r4k_flush_cache_sigtramp(unsigned long addr)

put_page(args.page);
out:
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
}

static void r4k_flush_icache_all(void)
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 1433edd01d09..510abb6b433a 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -98,7 +98,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
@@ -192,7 +192,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -200,7 +200,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
@@ -256,14 +256,14 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
* We ran out of memory, call the OOM killer, and return the userspace
* (which will retry the fault, or kill us if we got oom-killed).
*/
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/* Kernel mode? Handle exceptions or die */
if (!user_mode(regs))
--
2.13.6


2018-02-05 01:37:11

by Davidlohr Bueso

Subject: [PATCH 41/64] arch/cris: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/cris/mm/fault.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/cris/mm/fault.c b/arch/cris/mm/fault.c
index 16af16d77269..decc8f1fbc9d 100644
--- a/arch/cris/mm/fault.c
+++ b/arch/cris/mm/fault.c
@@ -122,7 +122,7 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
@@ -205,7 +205,7 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -214,7 +214,7 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
*/

bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
DPG(show_registers(regs));
@@ -286,14 +286,14 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
*/

out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Send a sigbus, regardless of whether we were in kernel
--
2.13.6


2018-02-05 01:37:17

by Davidlohr Bueso

Subject: [PATCH 32/64] arch/s390: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.
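
Worth noting, as a rough sketch only (the loop condition and body are
hypothetical): where the lock is only needed inside an inner scope,
such as the per-page loops below, the range lock is simply declared in
that scope so every iteration starts with a freshly initialized full
range:

    while (example_more_pages(vcpu)) {              /* hypothetical condition */
            DEFINE_RANGE_LOCK_FULL(mmrange);

            mm_read_lock(current->mm, &mmrange);
            /* per-iteration work on guest storage keys */
            mm_read_unlock(current->mm, &mmrange);
    }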

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/s390/kernel/vdso.c | 5 +++--
arch/s390/kvm/gaccess.c | 4 ++--
arch/s390/kvm/kvm-s390.c | 24 ++++++++++++++----------
arch/s390/kvm/priv.c | 29 +++++++++++++++++------------
arch/s390/mm/fault.c | 6 +++---
arch/s390/mm/gmap.c | 45 ++++++++++++++++++++++++---------------------
arch/s390/pci/pci_mmio.c | 5 +++--
7 files changed, 66 insertions(+), 52 deletions(-)

diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index f3a1c7c6824e..0395c6b906fd 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -213,6 +213,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
unsigned long vdso_pages;
unsigned long vdso_base;
int rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!vdso_enabled)
return 0;
@@ -239,7 +240,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
* it at vdso_base which is the "natural" base for it, but we might
* fail and end up putting it elsewhere.
*/
- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;
vdso_base = get_unmapped_area(NULL, 0, vdso_pages << PAGE_SHIFT, 0, 0);
if (IS_ERR_VALUE(vdso_base)) {
@@ -270,7 +271,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
rc = 0;

out_up:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return rc;
}

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index ff739b86df36..28c2c14319c8 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -1179,7 +1179,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
int rc;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&sg->mm->mmap_sem);
+ mm_read_lock(sg->mm, &mmrange);
/*
* We don't want any guest-2 tables to change - so the parent
* tables/pointers we read stay valid - unshadowing is however
@@ -1209,6 +1209,6 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
if (!rc)
rc = gmap_shadow_page(sg, saddr, __pte(pte.val), &mmrange);
ipte_unlock(vcpu);
- up_read(&sg->mm->mmap_sem);
+ mm_read_unlock(sg->mm, &mmrange);
return rc;
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ba4c7092335a..942aeb6cbf1c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1420,6 +1420,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
{
uint8_t *keys;
uint64_t hva;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
int srcu_idx, i, r = 0;

if (args->flags != 0)
@@ -1437,7 +1438,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
if (!keys)
return -ENOMEM;

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
srcu_idx = srcu_read_lock(&kvm->srcu);
for (i = 0; i < args->count; i++) {
hva = gfn_to_hva(kvm, args->start_gfn + i);
@@ -1451,7 +1452,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
break;
}
srcu_read_unlock(&kvm->srcu, srcu_idx);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

if (!r) {
r = copy_to_user((uint8_t __user *)args->skeydata_addr, keys,
@@ -1468,6 +1469,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
{
uint8_t *keys;
uint64_t hva;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
int srcu_idx, i, r = 0;

if (args->flags != 0)
@@ -1493,7 +1495,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
if (r)
goto out;

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
srcu_idx = srcu_read_lock(&kvm->srcu);
for (i = 0; i < args->count; i++) {
hva = gfn_to_hva(kvm, args->start_gfn + i);
@@ -1513,7 +1515,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
break;
}
srcu_read_unlock(&kvm->srcu, srcu_idx);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
out:
kvfree(keys);
return r;
@@ -1543,6 +1545,7 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,
unsigned long bufsize, hva, pgstev, i, next, cur;
int srcu_idx, peek, r = 0, rr;
u8 *res;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

cur = args->start_gfn;
i = next = pgstev = 0;
@@ -1586,7 +1589,7 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,

args->start_gfn = cur;

- down_read(&kvm->mm->mmap_sem);
+ mm_read_lock(kvm->mm, &mmrange);
srcu_idx = srcu_read_lock(&kvm->srcu);
while (i < bufsize) {
hva = gfn_to_hva(kvm, cur);
@@ -1620,7 +1623,7 @@ static int kvm_s390_get_cmma_bits(struct kvm *kvm,
cur++;
}
srcu_read_unlock(&kvm->srcu, srcu_idx);
- up_read(&kvm->mm->mmap_sem);
+ mm_read_unlock(kvm->mm, &mmrange);
args->count = i;
args->remaining = s ? atomic64_read(&s->dirty_pages) : 0;

@@ -1643,6 +1646,7 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
unsigned long hva, mask, pgstev, i;
uint8_t *bits;
int srcu_idx, r = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

mask = args->mask;

@@ -1668,7 +1672,7 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
goto out;
}

- down_read(&kvm->mm->mmap_sem);
+ mm_read_lock(kvm->mm, &mmrange);
srcu_idx = srcu_read_lock(&kvm->srcu);
for (i = 0; i < args->count; i++) {
hva = gfn_to_hva(kvm, args->start_gfn + i);
@@ -1683,12 +1687,12 @@ static int kvm_s390_set_cmma_bits(struct kvm *kvm,
set_pgste_bits(kvm->mm, hva, mask, pgstev);
}
srcu_read_unlock(&kvm->srcu, srcu_idx);
- up_read(&kvm->mm->mmap_sem);
+ mm_read_unlock(kvm->mm, &mmrange);

if (!kvm->mm->context.use_cmma) {
- down_write(&kvm->mm->mmap_sem);
+ mm_write_lock(kvm->mm, &mmrange);
kvm->mm->context.use_cmma = 1;
- up_write(&kvm->mm->mmap_sem);
+ mm_write_unlock(kvm->mm, &mmrange);
}
out:
vfree(bits);
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index c4c4e157c036..7bb37eca557e 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -246,6 +246,7 @@ static int handle_iske(struct kvm_vcpu *vcpu)
unsigned char key;
int reg1, reg2;
int rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

vcpu->stat.instruction_iske++;

@@ -265,9 +266,9 @@ static int handle_iske(struct kvm_vcpu *vcpu)
if (kvm_is_error_hva(addr))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
rc = get_guest_storage_key(current->mm, addr, &key);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (rc)
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
vcpu->run->s.regs.gprs[reg1] &= ~0xff;
@@ -280,6 +281,7 @@ static int handle_rrbe(struct kvm_vcpu *vcpu)
unsigned long addr;
int reg1, reg2;
int rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

vcpu->stat.instruction_rrbe++;

@@ -299,9 +301,9 @@ static int handle_rrbe(struct kvm_vcpu *vcpu)
if (kvm_is_error_hva(addr))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
rc = reset_guest_reference_bit(current->mm, addr);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (rc < 0)
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);

@@ -351,16 +353,17 @@ static int handle_sske(struct kvm_vcpu *vcpu)
}

while (start != end) {
+ DEFINE_RANGE_LOCK_FULL(mmrange);
unsigned long addr = gfn_to_hva(vcpu->kvm, gpa_to_gfn(start));

if (kvm_is_error_hva(addr))
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
rc = cond_set_guest_storage_key(current->mm, addr, key, &oldkey,
m3 & SSKE_NQ, m3 & SSKE_MR,
m3 & SSKE_MC);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (rc < 0)
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
start += PAGE_SIZE;
@@ -953,13 +956,14 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)

if (vcpu->run->s.regs.gprs[reg1] & PFMF_SK) {
int rc = kvm_s390_skey_check_enable(vcpu);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (rc)
return rc;
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
rc = cond_set_guest_storage_key(current->mm, useraddr,
key, NULL, nq, mr, mc);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (rc < 0)
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
}
@@ -1046,6 +1050,7 @@ static int handle_essa(struct kvm_vcpu *vcpu)
unsigned long *cbrlo;
struct gmap *gmap;
int i, orc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

VCPU_EVENT(vcpu, 4, "ESSA: release %d pages", entries);
gmap = vcpu->arch.gmap;
@@ -1073,9 +1078,9 @@ static int handle_essa(struct kvm_vcpu *vcpu)
* already correct, we do nothing and avoid the lock.
*/
if (vcpu->kvm->mm->context.use_cmma == 0) {
- down_write(&vcpu->kvm->mm->mmap_sem);
+ mm_write_lock(vcpu->kvm->mm, &mmrange);
vcpu->kvm->mm->context.use_cmma = 1;
- up_write(&vcpu->kvm->mm->mmap_sem);
+ mm_write_unlock(vcpu->kvm->mm, &mmrange);
}
/*
* If we are here, we are supposed to have CMMA enabled in
@@ -1098,10 +1103,10 @@ static int handle_essa(struct kvm_vcpu *vcpu)
}
vcpu->arch.sie_block->cbrlo &= PAGE_MASK; /* reset nceo */
cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo);
- down_read(&gmap->mm->mmap_sem);
+ mm_read_lock(gmap->mm, &mmrange);
for (i = 0; i < entries; ++i)
__gmap_zap(gmap, cbrlo[i]);
- up_read(&gmap->mm->mmap_sem);
+ mm_read_unlock(gmap->mm, &mmrange);
return 0;
}

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 17ba3c402f9d..0d6b63fa629e 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -463,7 +463,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
flags |= FAULT_FLAG_USER;
if (access == VM_WRITE || (trans_exc_code & store_indication) == 0x400)
flags |= FAULT_FLAG_WRITE;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

gmap = NULL;
if (IS_ENABLED(CONFIG_PGSTE) && type == GMAP_FAULT) {
@@ -546,7 +546,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
flags &= ~(FAULT_FLAG_ALLOW_RETRY |
FAULT_FLAG_RETRY_NOWAIT);
flags |= FAULT_FLAG_TRIED;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
goto retry;
}
}
@@ -564,7 +564,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
}
fault = 0;
out_up:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
out:
return fault;
}
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index b12a44813022..9419ae7b7f56 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -395,6 +395,7 @@ int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long len)
{
unsigned long off;
int flush;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

BUG_ON(gmap_is_shadow(gmap));
if ((to | len) & (PMD_SIZE - 1))
@@ -403,10 +404,10 @@ int gmap_unmap_segment(struct gmap *gmap, unsigned long to, unsigned long len)
return -EINVAL;

flush = 0;
- down_write(&gmap->mm->mmap_sem);
+ mm_write_lock(gmap->mm, &mmrange);
for (off = 0; off < len; off += PMD_SIZE)
flush |= __gmap_unmap_by_gaddr(gmap, to + off);
- up_write(&gmap->mm->mmap_sem);
+ mm_write_unlock(gmap->mm, &mmrange);
if (flush)
gmap_flush_tlb(gmap);
return 0;
@@ -427,6 +428,7 @@ int gmap_map_segment(struct gmap *gmap, unsigned long from,
{
unsigned long off;
int flush;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

BUG_ON(gmap_is_shadow(gmap));
if ((from | to | len) & (PMD_SIZE - 1))
@@ -436,7 +438,7 @@ int gmap_map_segment(struct gmap *gmap, unsigned long from,
return -EINVAL;

flush = 0;
- down_write(&gmap->mm->mmap_sem);
+ mm_write_lock(gmap->mm, &mmrange);
for (off = 0; off < len; off += PMD_SIZE) {
/* Remove old translation */
flush |= __gmap_unmap_by_gaddr(gmap, to + off);
@@ -446,7 +448,7 @@ int gmap_map_segment(struct gmap *gmap, unsigned long from,
(void *) from + off))
break;
}
- up_write(&gmap->mm->mmap_sem);
+ mm_write_unlock(gmap->mm, &mmrange);
if (flush)
gmap_flush_tlb(gmap);
if (off >= len)
@@ -492,10 +494,11 @@ EXPORT_SYMBOL_GPL(__gmap_translate);
unsigned long gmap_translate(struct gmap *gmap, unsigned long gaddr)
{
unsigned long rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&gmap->mm->mmap_sem);
+ mm_read_lock(gmap->mm, &mmrange);
rc = __gmap_translate(gmap, gaddr);
- up_read(&gmap->mm->mmap_sem);
+ mm_read_unlock(gmap->mm, &mmrange);
return rc;
}
EXPORT_SYMBOL_GPL(gmap_translate);
@@ -623,8 +626,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
bool unlocked;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&gmap->mm->mmap_sem);
-
+ mm_read_lock(gmap->mm, &mmrange);
retry:
unlocked = false;
vmaddr = __gmap_translate(gmap, gaddr);
@@ -646,7 +648,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,

rc = __gmap_link(gmap, gaddr, vmaddr);
out_up:
- up_read(&gmap->mm->mmap_sem);
+ mm_read_unlock(gmap->mm, &mmrange);
return rc;
}
EXPORT_SYMBOL_GPL(gmap_fault);
@@ -678,8 +680,9 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
{
unsigned long gaddr, vmaddr, size;
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&gmap->mm->mmap_sem);
+ mm_read_lock(gmap->mm, &mmrange);
for (gaddr = from; gaddr < to;
gaddr = (gaddr + PMD_SIZE) & PMD_MASK) {
/* Find the vm address for the guest address */
@@ -694,7 +697,7 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
zap_page_range(vma, vmaddr, size);
}
- up_read(&gmap->mm->mmap_sem);
+ mm_read_unlock(gmap->mm, &mmrange);
}
EXPORT_SYMBOL_GPL(gmap_discard);

@@ -942,9 +945,9 @@ int gmap_mprotect_notify(struct gmap *gmap, unsigned long gaddr,
return -EINVAL;
if (!MACHINE_HAS_ESOP && prot == PROT_READ)
return -EINVAL;
- down_read(&gmap->mm->mmap_sem);
+ mm_read_lock(gmap->mm, &mmrange);
rc = gmap_protect_range(gmap, gaddr, len, prot, PGSTE_IN_BIT, &mmrange);
- up_read(&gmap->mm->mmap_sem);
+ mm_read_unlock(gmap->mm, &mmrange);
return rc;
}
EXPORT_SYMBOL_GPL(gmap_mprotect_notify);
@@ -1536,11 +1539,11 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
}
spin_unlock(&parent->shadow_lock);
/* protect after insertion, so it will get properly invalidated */
- down_read(&parent->mm->mmap_sem);
+ mm_read_lock(parent->mm, &mmrange);
rc = gmap_protect_range(parent, asce & _ASCE_ORIGIN,
((asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE,
PROT_READ, PGSTE_VSIE_BIT, &mmrange);
- up_read(&parent->mm->mmap_sem);
+ mm_read_unlock(parent->mm, &mmrange);
spin_lock(&parent->shadow_lock);
new->initialized = true;
if (rc) {
@@ -2176,12 +2179,12 @@ int s390_enable_sie(void)
/* Fail if the page tables are 2K */
if (!mm_alloc_pgste(mm))
return -EINVAL;
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
mm->context.has_pgste = 1;
/* split thp mappings and disable thp for future mappings */
thp_split_mm(mm);
zap_zero_pages(mm, &mmrange);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return 0;
}
EXPORT_SYMBOL_GPL(s390_enable_sie);
@@ -2206,7 +2209,7 @@ int s390_enable_skey(void)
int rc = 0;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
if (mm_use_skey(mm))
goto out_up;

@@ -2225,7 +2228,7 @@ int s390_enable_skey(void)
walk_page_range(0, TASK_SIZE, &walk, &mmrange);

out_up:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return rc;
}
EXPORT_SYMBOL_GPL(s390_enable_skey);
@@ -2245,9 +2248,9 @@ void s390_reset_cmma(struct mm_struct *mm)
struct mm_walk walk = { .pte_entry = __s390_reset_cmma };
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
walk.mm = mm;
walk_page_range(0, TASK_SIZE, &walk, &mmrange);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
}
EXPORT_SYMBOL_GPL(s390_reset_cmma);
diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 7d42a8794f10..bea541d5e181 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -17,8 +17,9 @@ static long get_pfn(unsigned long user_addr, unsigned long access,
{
struct vm_area_struct *vma;
long ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
ret = -EINVAL;
vma = find_vma(current->mm, user_addr);
if (!vma)
@@ -28,7 +29,7 @@ static long get_pfn(unsigned long user_addr, unsigned long access,
goto out;
ret = follow_pfn(vma, user_addr, pfn);
out:
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return ret;
}

--
2.13.6


2018-02-05 01:37:30

by Davidlohr Bueso

Subject: [PATCH 49/64] arch/xtensa: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/xtensa/mm/fault.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 6f8e3e7cccb5..5e783e5583b6 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -75,7 +75,7 @@ void do_page_fault(struct pt_regs *regs)
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);

if (!vma)
@@ -141,7 +141,7 @@ void do_page_fault(struct pt_regs *regs)
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
if (flags & VM_FAULT_MAJOR)
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
@@ -154,7 +154,7 @@ void do_page_fault(struct pt_regs *regs)
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (user_mode(regs)) {
current->thread.bad_vaddr = address;
current->thread.error_code = is_write;
@@ -173,7 +173,7 @@ void do_page_fault(struct pt_regs *regs)
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
bad_page_fault(regs, address, SIGKILL);
else
@@ -181,7 +181,7 @@ void do_page_fault(struct pt_regs *regs)
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/* Send a sigbus, regardless of whether we were in kernel
* or user mode.
--
2.13.6


2018-02-05 01:37:53

by Davidlohr Bueso

Subject: [PATCH 46/64] arch/metag: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/metag/mm/fault.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/metag/mm/fault.c b/arch/metag/mm/fault.c
index e16ba0ea7ea1..47ab10069fde 100644
--- a/arch/metag/mm/fault.c
+++ b/arch/metag/mm/fault.c
@@ -114,7 +114,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

vma = find_vma_prev(mm, address, &prev_vma);

@@ -169,7 +169,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return 0;

check_expansion:
@@ -178,7 +178,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
goto good_area;

bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
if (user_mode(regs)) {
@@ -206,7 +206,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
goto no_context;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Send a sigbus, regardless of whether we were in kernel
@@ -230,7 +230,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (user_mode(regs)) {
pagefault_out_of_memory();
return 1;
--
2.13.6


2018-02-05 01:38:03

by Davidlohr Bueso

Subject: [PATCH 40/64] arch/sh: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/sh/kernel/sys_sh.c | 7 ++++---
arch/sh/kernel/vsyscall/vsyscall.c | 5 +++--
2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/sh/kernel/sys_sh.c b/arch/sh/kernel/sys_sh.c
index 724911c59e7d..35b91b6c8d34 100644
--- a/arch/sh/kernel/sys_sh.c
+++ b/arch/sh/kernel/sys_sh.c
@@ -58,6 +58,7 @@ asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
asmlinkage int sys_cacheflush(unsigned long addr, unsigned long len, int op)
{
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if ((op <= 0) || (op > (CACHEFLUSH_D_PURGE|CACHEFLUSH_I)))
return -EINVAL;
@@ -69,10 +70,10 @@ asmlinkage int sys_cacheflush(unsigned long addr, unsigned long len, int op)
if (addr + len < addr)
return -EFAULT;

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma (current->mm, addr);
if (vma == NULL || addr < vma->vm_start || addr + len > vma->vm_end) {
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return -EFAULT;
}

@@ -91,6 +92,6 @@ asmlinkage int sys_cacheflush(unsigned long addr, unsigned long len, int op)
if (op & CACHEFLUSH_I)
flush_icache_range(addr, addr+len);

- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return 0;
}
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index cc0cc5b4ff18..17520e6b7783 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -63,8 +63,9 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
struct mm_struct *mm = current->mm;
unsigned long addr;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
@@ -83,7 +84,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
current->mm->context.vdso = (void *)addr;

up_fail:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}

--
2.13.6


2018-02-05 01:38:13

by Davidlohr Bueso

Subject: [PATCH 33/64] arch/powerpc: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.
For those mmap_sem callers that do not already have an mmrange
available, we declare one within the same function context.
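
A rough sketch of the second case, where no mmrange is passed in and
one is declared in the caller itself (the setup function is
hypothetical; the killable wrapper matches the vdso conversion below):

    static int example_setup(struct mm_struct *mm)
    {
            DEFINE_RANGE_LOCK_FULL(mmrange);        /* declared in the same function context */

            if (mm_write_lock_killable(mm, &mmrange))       /* was: down_write_killable(&mm->mmap_sem) */
                    return -EINTR;
            /* ... set up the mapping ... */
            mm_write_unlock(mm, &mmrange);                  /* was: up_write(&mm->mmap_sem) */
            return 0;
    }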

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/powerpc/kernel/vdso.c | 7 ++++---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 ++++--
arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
arch/powerpc/kvm/book3s_64_vio.c | 5 +++--
arch/powerpc/kvm/book3s_hv.c | 7 ++++---
arch/powerpc/kvm/e500_mmu_host.c | 5 +++--
arch/powerpc/mm/copro_fault.c | 4 ++--
arch/powerpc/mm/mmu_context_iommu.c | 5 +++--
arch/powerpc/mm/subpage-prot.c | 13 +++++++------
arch/powerpc/oprofile/cell/spu_task_sync.c | 7 ++++---
arch/powerpc/platforms/cell/spufs/file.c | 6 ++++--
arch/powerpc/platforms/powernv/npu-dma.c | 2 +-
12 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 22b01a3962f0..869632b601b8 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -155,6 +155,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
unsigned long vdso_pages;
unsigned long vdso_base;
int rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!vdso_ready)
return 0;
@@ -196,7 +197,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
* and end up putting it elsewhere.
* Add enough to the size so that the result can be aligned.
*/
- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;
vdso_base = get_unmapped_area(NULL, vdso_base,
(vdso_pages << PAGE_SHIFT) +
@@ -236,11 +237,11 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
goto fail_mmapsem;
}

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return 0;

fail_mmapsem:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return rc;
}

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b73dbc9e797d..c05a99209fc1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -583,8 +583,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
hva = gfn_to_hva_memslot(memslot, gfn);
npages = get_user_pages_fast(hva, 1, writing, pages);
if (npages < 1) {
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
/* Check if it's an I/O mapping */
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, hva);
if (vma && vma->vm_start <= hva && hva + psize <= vma->vm_end &&
(vma->vm_flags & VM_PFNMAP)) {
@@ -594,7 +596,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
is_ci = pte_ci(__pte((pgprot_val(vma->vm_page_prot))));
write_ok = vma->vm_flags & VM_WRITE;
}
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (!pfn)
goto out_put;
} else {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 0c854816e653..9a4d1758b0db 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -397,8 +397,10 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
level = 0;
npages = get_user_pages_fast(hva, 1, writing, pages);
if (npages < 1) {
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
/* Check if it's an I/O mapping */
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, hva);
if (vma && vma->vm_start <= hva && hva < vma->vm_end &&
(vma->vm_flags & VM_PFNMAP)) {
@@ -406,7 +408,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
((hva - vma->vm_start) >> PAGE_SHIFT);
pgflags = pgprot_val(vma->vm_page_prot);
}
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
if (!pfn)
return -EFAULT;
} else {
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 4dffa611376d..5e6fe2820009 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -60,11 +60,12 @@ static unsigned long kvmppc_stt_pages(unsigned long tce_pages)
static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
{
long ret = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!current || !current->mm)
return ret; /* process exited */

- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);

if (inc) {
unsigned long locked, lock_limit;
@@ -89,7 +90,7 @@ static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc)
rlimit(RLIMIT_MEMLOCK),
ret ? " - exceeded" : "");

- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);

return ret;
}
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 473f6eebe34f..1bf281f37713 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3610,6 +3610,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
unsigned long lpcr = 0, senc;
unsigned long psize, porder;
int srcu_idx;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* Allocate hashed page table (if not done already) and reset it */
if (!kvm->arch.hpt.virt) {
@@ -3642,7 +3643,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)

/* Look up the VMA for the start of this memory slot */
hva = memslot->userspace_addr;
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, hva);
if (!vma || vma->vm_start > hva || (vma->vm_flags & VM_IO))
goto up_out;
@@ -3650,7 +3651,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
psize = vma_kernel_pagesize(vma);
porder = __ilog2(psize);

- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

/* We can handle 4k, 64k or 16M pages in the VRMA */
err = -EINVAL;
@@ -3680,7 +3681,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
return err;

up_out:
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
goto out_srcu;
}

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 423b21393bc9..72ce80fa9453 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -358,7 +358,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,

if (tlbsel == 1) {
struct vm_area_struct *vma;
- down_read(&current->mm->mmap_sem);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+ mm_read_lock(current->mm, &mmrange);

vma = find_vma(current->mm, hva);
if (vma && hva >= vma->vm_start &&
@@ -444,7 +445,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);
}

- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
}

if (likely(!pfnmap)) {
diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 8f5e604828a1..570ebca7e2f8 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -47,7 +47,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
if (mm->pgd == NULL)
return -EFAULT;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
ret = -EFAULT;
vma = find_vma(mm, ea);
if (!vma)
@@ -97,7 +97,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
current->min_flt++;

out_unlock:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return ret;
}
EXPORT_SYMBOL_GPL(copro_handle_mm_fault);
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 91ee2231c527..35d32a8ccb89 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -36,11 +36,12 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
unsigned long npages, bool incr)
{
long ret = 0, locked, lock_limit;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!npages)
return 0;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

if (incr) {
locked = mm->locked_vm + npages;
@@ -61,7 +62,7 @@ static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
npages << PAGE_SHIFT,
mm->locked_vm << PAGE_SHIFT,
rlimit(RLIMIT_MEMLOCK));
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return ret;
}
diff --git a/arch/powerpc/mm/subpage-prot.c b/arch/powerpc/mm/subpage-prot.c
index f14a07c2fb90..0afd636123fd 100644
--- a/arch/powerpc/mm/subpage-prot.c
+++ b/arch/powerpc/mm/subpage-prot.c
@@ -97,9 +97,10 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len)
u32 **spm, *spp;
unsigned long i;
size_t nw;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
unsigned long next, limit;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
limit = addr + len;
if (limit > spt->maxaddr)
limit = spt->maxaddr;
@@ -127,7 +128,7 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len)
/* now flush any existing HPTEs for the range */
hpte_flush_range(mm, addr, nw);
}
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
}

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -216,7 +217,7 @@ long sys_subpage_prot(unsigned long addr, unsigned long len, u32 __user *map)
if (!access_ok(VERIFY_READ, map, (len >> PAGE_SHIFT) * sizeof(u32)))
return -EFAULT;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
subpage_mark_vma_nohuge(mm, addr, len);
for (limit = addr + len; addr < limit; addr = next) {
next = pmd_addr_end(addr, limit);
@@ -251,11 +252,11 @@ long sys_subpage_prot(unsigned long addr, unsigned long len, u32 __user *map)
if (addr + (nw << PAGE_SHIFT) > next)
nw = (next - addr) >> PAGE_SHIFT;

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
if (__copy_from_user(spp, map, nw * sizeof(u32)))
return -EFAULT;
map += nw;
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

/* now flush any existing HPTEs for the range */
hpte_flush_range(mm, addr, nw);
@@ -264,6 +265,6 @@ long sys_subpage_prot(unsigned long addr, unsigned long len, u32 __user *map)
spt->maxaddr = limit;
err = 0;
out:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return err;
}
diff --git a/arch/powerpc/oprofile/cell/spu_task_sync.c b/arch/powerpc/oprofile/cell/spu_task_sync.c
index 44d67b167e0b..50ebb615fdab 100644
--- a/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ b/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -325,6 +325,7 @@ get_exec_dcookie_and_offset(struct spu *spu, unsigned int *offsetp,
struct vm_area_struct *vma;
struct file *exe_file;
struct mm_struct *mm = spu->mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!mm)
goto out;
@@ -336,7 +337,7 @@ get_exec_dcookie_and_offset(struct spu *spu, unsigned int *offsetp,
fput(exe_file);
}

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (vma->vm_start > spu_ref || vma->vm_end <= spu_ref)
continue;
@@ -353,13 +354,13 @@ get_exec_dcookie_and_offset(struct spu *spu, unsigned int *offsetp,
*spu_bin_dcookie = fast_get_dcookie(&vma->vm_file->f_path);
pr_debug("got dcookie for %pD\n", vma->vm_file);

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

out:
return app_cookie;

fail_no_image_cookie:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

printk(KERN_ERR "SPU_PROF: "
"%s, line %d: Cannot find dcookie for SPU binary\n",
diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index c1be486da899..f2017a915bd8 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -347,11 +347,13 @@ static int spufs_ps_fault(struct vm_fault *vmf,
goto refault;

if (ctx->state == SPU_STATE_SAVED) {
- up_read(&current->mm->mmap_sem);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ mm_read_unlock(current->mm, &mmrange);
spu_context_nospu_trace(spufs_ps_fault__sleep, ctx);
ret = spufs_wait(ctx->run_wq, ctx->state == SPU_STATE_RUNNABLE);
spu_context_trace(spufs_ps_fault__wake, ctx, ctx->spu);
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
} else {
area = ctx->spu->problem_phys + ps_offs;
vm_insert_pfn(vmf->vma, vmf->address, (area + offset) >> PAGE_SHIFT);
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 759e9a4c7479..8cf4be123663 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -802,7 +802,7 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
if (!firmware_has_feature(FW_FEATURE_OPAL))
return -ENODEV;

- WARN_ON(!rwsem_is_locked(&mm->mmap_sem));
+ WARN_ON(!mm_is_locked(mm, mmrange));

for (i = 0; i < count; i++) {
is_write = flags[i] & NPU2_WRITE;
--
2.13.6


2018-02-05 01:38:17

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 42/64] arch/frv: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.
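
For reference, the shape of the conversion in these per-arch patches is
roughly the following -- just a sketch against the range locking wrappers
introduced earlier in the series, with an illustrative function name rather
than anything from a real file:

	static void example_fault_path(struct mm_struct *mm, unsigned long addr)
	{
		struct vm_area_struct *vma;
		/* a full address space range, equivalent to the old mmap_sem */
		DEFINE_RANGE_LOCK_FULL(mmrange);

		mm_read_lock(mm, &mmrange);	/* was: down_read(&mm->mmap_sem) */
		vma = find_vma(mm, addr);
		/* ... fault handling, passing &mmrange to callees that need it ... */
		mm_read_unlock(mm, &mmrange);	/* was: up_read(&mm->mmap_sem) */
	}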

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/frv/mm/fault.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/frv/mm/fault.c b/arch/frv/mm/fault.c
index 494d33b628fc..a5da0586e6cc 100644
--- a/arch/frv/mm/fault.c
+++ b/arch/frv/mm/fault.c
@@ -86,7 +86,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
if (user_mode(__frame))
flags |= FAULT_FLAG_USER;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

vma = find_vma(mm, ear0);
if (!vma)
@@ -181,7 +181,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
else
current->min_flt++;

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -189,7 +189,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/* User mode accesses just cause a SIGSEGV */
if (user_mode(__frame)) {
@@ -259,14 +259,14 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(__frame))
goto no_context;
pagefault_out_of_memory();
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Send a sigbus, regardless of whether we were in kernel
--
2.13.6


2018-02-05 01:38:39

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 44/64] arch/score: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/score/mm/fault.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/score/mm/fault.c b/arch/score/mm/fault.c
index 07a8637ad142..535df3b377a5 100644
--- a/arch/score/mm/fault.c
+++ b/arch/score/mm/fault.c
@@ -81,7 +81,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
@@ -127,7 +127,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
else
tsk->min_flt++;

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -135,7 +135,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
@@ -174,14 +174,14 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
/* Kernel mode? Handle exceptions or die */
if (!user_mode(regs))
goto no_context;
--
2.13.6


2018-02-05 01:38:41

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 07/64] mm/hugetlb: teach hugetlb_fault() about range locking

From: Davidlohr Bueso <[email protected]>

Such that we can pass the mmrange along to vm_fault for
pages in a userfaultfd range (handle_userfault()), which gets
funky with mmap_sem - just look at the locking rules.
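
Concretely, the range taken by the fault path is stashed in the vm_fault
descriptor so that handle_userfault() can drop and retake exactly that
range. A simplified sketch of the hugetlb_no_page() hunk below, with the
field name as used in this series:

	struct vm_fault vmf = {
		.vma = vma,
		.address = address,
		.flags = flags,
		.lockrange = mmrange,	/* new: the caller's range */
	};

	ret = handle_userfault(&vmf, VM_UFFD_MISSING);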

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/hugetlb.h | 9 +++++----
mm/gup.c | 3 ++-
mm/hugetlb.c | 16 +++++++++++-----
mm/memory.c | 2 +-
4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 36fa6a2a82e3..df0a89a95bdc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -91,7 +91,7 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_ar
long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
struct page **, struct vm_area_struct **,
unsigned long *, unsigned long *, long, unsigned int,
- int *);
+ int *, struct range_lock *);
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *);
void __unmap_hugepage_range_final(struct mmu_gather *tlb,
@@ -106,7 +106,8 @@ int hugetlb_report_node_meminfo(int, char *);
void hugetlb_show_meminfo(void);
unsigned long hugetlb_total_pages(void);
int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long address, unsigned int flags);
+ unsigned long address, unsigned int flags,
+ struct range_lock *mmrange);
int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
@@ -170,7 +171,7 @@ static inline unsigned long hugetlb_total_pages(void)
return 0;
}

-#define follow_hugetlb_page(m,v,p,vs,a,b,i,w,n) ({ BUG(); 0; })
+#define follow_hugetlb_page(m,v,p,vs,a,b,i,w,n,r) ({ BUG(); 0; })
#define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL)
#define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; })
static inline void hugetlb_report_meminfo(struct seq_file *m)
@@ -189,7 +190,7 @@ static inline void hugetlb_show_meminfo(void)
#define pud_huge(x) 0
#define is_hugepage_only_range(mm, addr, len) 0
#define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
-#define hugetlb_fault(mm, vma, addr, flags) ({ BUG(); 0; })
+#define hugetlb_fault(mm, vma, addr, flags,mmrange) ({ BUG(); 0; })
#define hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
src_addr, pagep) ({ BUG(); 0; })
#define huge_pte_offset(mm, address, sz) 0
diff --git a/mm/gup.c b/mm/gup.c
index 01983a7b3750..3d1b6dd11616 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -684,7 +684,8 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
if (is_vm_hugetlb_page(vma)) {
i = follow_hugetlb_page(mm, vma, pages, vmas,
&start, &nr_pages, i,
- gup_flags, nonblocking);
+ gup_flags, nonblocking,
+ mmrange);
continue;
}
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7c204e3d132b..fd22459e89ef 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3675,7 +3675,8 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,

static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct address_space *mapping, pgoff_t idx,
- unsigned long address, pte_t *ptep, unsigned int flags)
+ unsigned long address, pte_t *ptep, unsigned int flags,
+ struct range_lock *mmrange)
{
struct hstate *h = hstate_vma(vma);
int ret = VM_FAULT_SIGBUS;
@@ -3716,6 +3717,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
.vma = vma,
.address = address,
.flags = flags,
+ .lockrange = mmrange,
/*
* Hard to debug if it ends up being
* used by a callee that assumes
@@ -3869,7 +3871,8 @@ u32 hugetlb_fault_mutex_hash(struct hstate *h, struct mm_struct *mm,
#endif

int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long address, unsigned int flags)
+ unsigned long address, unsigned int flags,
+ struct range_lock *mmrange)
{
pte_t *ptep, entry;
spinlock_t *ptl;
@@ -3912,7 +3915,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

entry = huge_ptep_get(ptep);
if (huge_pte_none(entry)) {
- ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags);
+ ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+ flags, mmrange);
goto out_mutex;
}

@@ -4140,7 +4144,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page **pages, struct vm_area_struct **vmas,
unsigned long *position, unsigned long *nr_pages,
- long i, unsigned int flags, int *nonblocking)
+ long i, unsigned int flags, int *nonblocking,
+ struct range_lock *mmrange)
{
unsigned long pfn_offset;
unsigned long vaddr = *position;
@@ -4221,7 +4226,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
FAULT_FLAG_ALLOW_RETRY);
fault_flags |= FAULT_FLAG_TRIED;
}
- ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
+ ret = hugetlb_fault(mm, vma, vaddr, fault_flags,
+ mmrange);
if (ret & VM_FAULT_ERROR) {
err = vm_fault_to_errno(ret, flags);
remainder = 0;
diff --git a/mm/memory.c b/mm/memory.c
index b3561a052939..2d087b0e174d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4136,7 +4136,7 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
mem_cgroup_oom_enable();

if (unlikely(is_vm_hugetlb_page(vma)))
- ret = hugetlb_fault(vma->vm_mm, vma, address, flags);
+ ret = hugetlb_fault(vma->vm_mm, vma, address, flags, mmrange);
else
ret = __handle_mm_fault(vma, address, flags, mmrange);

--
2.13.6


2018-02-05 01:38:51

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 43/64] arch/hexagon: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/hexagon/kernel/vdso.c | 5 +++--
arch/hexagon/mm/vm_fault.c | 8 ++++----
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/hexagon/kernel/vdso.c b/arch/hexagon/kernel/vdso.c
index 3ea968415539..53e3db1b54f1 100644
--- a/arch/hexagon/kernel/vdso.c
+++ b/arch/hexagon/kernel/vdso.c
@@ -64,8 +64,9 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
int ret;
unsigned long vdso_base;
struct mm_struct *mm = current->mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

/* Try to get it loaded right near ld.so/glibc. */
@@ -89,7 +90,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
mm->context.vdso = (void *)vdso_base;

up_fail:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}

diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index 7d6ada2c2230..58203949486e 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -69,7 +69,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
@@ -122,11 +122,11 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/* Handle copyin/out exception cases */
if (!user_mode(regs))
@@ -155,7 +155,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
return;

bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

if (user_mode(regs)) {
info.si_signo = SIGSEGV;
--
2.13.6


2018-02-05 01:39:29

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 24/64] mm/thp: disable mmap_sem is_locked checks

From: Davidlohr Bueso <[email protected]>

* THIS IS A HACK *

Drop the rwsem_is_locked(mmap_sem) assertions in
pud/pmd_trans_huge_lock() such that we don't have to
teach file_operations about mmrange.

No-Yet-Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/huge_mm.h | 2 --
1 file changed, 2 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a8a126259bc4..7694c11b3575 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -189,7 +189,6 @@ static inline int is_swap_pmd(pmd_t pmd)
static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
struct vm_area_struct *vma)
{
- VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma);
if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
return __pmd_trans_huge_lock(pmd, vma);
else
@@ -198,7 +197,6 @@ static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
struct vm_area_struct *vma)
{
- VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma);
if (pud_trans_huge(*pud) || pud_devmap(*pud))
return __pud_trans_huge_lock(pud, vma);
else
--
2.13.6


2018-02-05 01:39:37

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 27/64] arch/{x86,sh,ppc}: teach bad_area() about range locking

From: Davidlohr Bueso <[email protected]>

These architectures drop the mmap_sem inside __bad_area(),
which in turn calls bad_area_nosemaphore(). The rest of the
archs implement this logic within do_page_fault() itself, so
they remain unchanged as we already have the mmrange there.
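
The reason the range must be passed down is simply that the unlock has to
name the same range object the fault handler locked. Stripped of the
per-arch details in the hunks below, the pattern is:

	static void __bad_area(struct pt_regs *regs, unsigned long address,
			       int si_code, struct range_lock *mmrange)
	{
		/* release the range taken by the page fault handler... */
		mm_read_unlock(current->mm, mmrange);
		/* ...before delivering the signal without the lock held */
		__bad_area_nosemaphore(regs, address, si_code);
	}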

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/powerpc/mm/fault.c | 32 +++++++++++++++++---------------
arch/sh/mm/fault.c | 47 ++++++++++++++++++++++++++---------------------
arch/x86/mm/fault.c | 35 ++++++++++++++++++++---------------
3 files changed, 63 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index d562dc88687d..80e4cf0e4c3b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -129,7 +129,7 @@ static noinline int bad_area_nosemaphore(struct pt_regs *regs, unsigned long add
}

static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code,
- int pkey)
+ int pkey, struct range_lock *mmrange)
{
struct mm_struct *mm = current->mm;

@@ -137,14 +137,15 @@ static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code,
* Something tried to access memory that isn't in our memory map..
* Fix it, but check if it's kernel or user first..
*/
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);

return __bad_area_nosemaphore(regs, address, si_code, pkey);
}

-static noinline int bad_area(struct pt_regs *regs, unsigned long address)
+static noinline int bad_area(struct pt_regs *regs, unsigned long address,
+ struct range_lock *mmrange)
{
- return __bad_area(regs, address, SEGV_MAPERR, 0);
+ return __bad_area(regs, address, SEGV_MAPERR, 0, mmrange);
}

static int bad_key_fault_exception(struct pt_regs *regs, unsigned long address,
@@ -153,9 +154,10 @@ static int bad_key_fault_exception(struct pt_regs *regs, unsigned long address,
return __bad_area_nosemaphore(regs, address, SEGV_PKUERR, pkey);
}

-static noinline int bad_access(struct pt_regs *regs, unsigned long address)
+static noinline int bad_access(struct pt_regs *regs, unsigned long address,
+ struct range_lock *mmrange)
{
- return __bad_area(regs, address, SEGV_ACCERR, 0);
+ return __bad_area(regs, address, SEGV_ACCERR, 0, mmrange);
}

static int do_sigbus(struct pt_regs *regs, unsigned long address,
@@ -475,12 +477,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
* source. If this is invalid we can skip the address space check,
* thus avoiding the deadlock.
*/
- if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
+ if (unlikely(!mm_read_trylock(mm, &mmrange))) {
if (!is_user && !search_exception_tables(regs->nip))
return bad_area_nosemaphore(regs, address);

retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
} else {
/*
* The above down_read_trylock() might have succeeded in
@@ -492,23 +494,23 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,

vma = find_vma(mm, address);
if (unlikely(!vma))
- return bad_area(regs, address);
+ return bad_area(regs, address, &mmrange);
if (likely(vma->vm_start <= address))
goto good_area;
if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
- return bad_area(regs, address);
+ return bad_area(regs, address, &mmrange);

/* The stack is being expanded, check if it's valid */
if (unlikely(bad_stack_expansion(regs, address, vma, store_update_sp)))
- return bad_area(regs, address);
+ return bad_area(regs, address, &mmrange);

/* Try to expand it */
if (unlikely(expand_stack(vma, address)))
- return bad_area(regs, address);
+ return bad_area(regs, address, &mmrange);

good_area:
if (unlikely(access_error(is_write, is_exec, vma)))
- return bad_access(regs, address);
+ return bad_access(regs, address, &mmrange);

/*
* If for any reason at all we couldn't handle the fault,
@@ -535,7 +537,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
int pkey = vma_pkey(vma);

if (likely(pkey)) {
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return bad_key_fault_exception(regs, address, pkey);
}
}
@@ -567,7 +569,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
return is_user ? 0 : SIGBUS;
}

- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

if (unlikely(fault & VM_FAULT_ERROR))
return mm_fault_error(regs, address, fault);
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index d36106564728..a9f75dc1abb3 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -277,7 +277,8 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,

static void
__bad_area(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, int si_code)
+ unsigned long address, int si_code,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = current->mm;

@@ -285,31 +286,34 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
* Something tried to access memory that isn't in our memory map..
* Fix it, but check if it's kernel or user first..
*/
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);

__bad_area_nosemaphore(regs, error_code, address, si_code);
}

static noinline void
-bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address,
+ struct range_lock *mmrange)
{
- __bad_area(regs, error_code, address, SEGV_MAPERR);
+ __bad_area(regs, error_code, address, SEGV_MAPERR, mmrange);
}

static noinline void
bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
- unsigned long address)
+ unsigned long address,
+ struct range_lock *mmrange)
{
- __bad_area(regs, error_code, address, SEGV_ACCERR);
+ __bad_area(regs, error_code, address, SEGV_ACCERR, mmrange);
}

static void
-do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
+ struct range_lock *mmrange)
{
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);

/* Kernel mode? Handle exceptions or die: */
if (!user_mode(regs))
@@ -320,7 +324,8 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address)

static noinline int
mm_fault_error(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, unsigned int fault)
+ unsigned long address, unsigned int fault,
+ struct range_lock *mmrange)
{
/*
* Pagefault was interrupted by SIGKILL. We have no reason to
@@ -328,7 +333,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
*/
if (fatal_signal_pending(current)) {
if (!(fault & VM_FAULT_RETRY))
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, mmrange);
if (!user_mode(regs))
no_context(regs, error_code, address);
return 1;
@@ -340,11 +345,11 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
if (fault & VM_FAULT_OOM) {
/* Kernel mode? Handle exceptions or die: */
if (!user_mode(regs)) {
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, mmrange);
no_context(regs, error_code, address);
return 1;
}
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, mmrange);

/*
* We ran out of memory, call the OOM killer, and return the
@@ -354,9 +359,9 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
pagefault_out_of_memory();
} else {
if (fault & VM_FAULT_SIGBUS)
- do_sigbus(regs, error_code, address);
+ do_sigbus(regs, error_code, address, mmrange);
else if (fault & VM_FAULT_SIGSEGV)
- bad_area(regs, error_code, address);
+ bad_area(regs, error_code, address, mmrange);
else
BUG();
}
@@ -449,21 +454,21 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
}

retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

vma = find_vma(mm, address);
if (unlikely(!vma)) {
- bad_area(regs, error_code, address);
+ bad_area(regs, error_code, address, &mmrange);
return;
}
if (likely(vma->vm_start <= address))
goto good_area;
if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
- bad_area(regs, error_code, address);
+ bad_area(regs, error_code, address, &mmrange);
return;
}
if (unlikely(expand_stack(vma, address))) {
- bad_area(regs, error_code, address);
+ bad_area(regs, error_code, address, &mmrange);
return;
}

@@ -473,7 +478,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
*/
good_area:
if (unlikely(access_error(error_code, vma))) {
- bad_area_access_error(regs, error_code, address);
+ bad_area_access_error(regs, error_code, address, &mmrange);
return;
}

@@ -492,7 +497,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
fault = handle_mm_fault(vma, address, flags, &mmrange);

if (unlikely(fault & (VM_FAULT_RETRY | VM_FAULT_ERROR)))
- if (mm_fault_error(regs, error_code, address, fault))
+ if (mm_fault_error(regs, error_code, address, fault, &mmrange))
return;

if (flags & FAULT_FLAG_ALLOW_RETRY) {
@@ -518,5 +523,5 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 93f1b8d4c88e..87bdcb26a907 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -937,7 +937,8 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,

static void
__bad_area(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, struct vm_area_struct *vma, int si_code)
+ unsigned long address, struct vm_area_struct *vma, int si_code,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = current->mm;
u32 pkey;
@@ -949,16 +950,17 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
* Something tried to access memory that isn't in our memory map..
* Fix it, but check if it's kernel or user first..
*/
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);

__bad_area_nosemaphore(regs, error_code, address,
(vma) ? &pkey : NULL, si_code);
}

static noinline void
-bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address,
+ struct range_lock *mmrange)
{
- __bad_area(regs, error_code, address, NULL, SEGV_MAPERR);
+ __bad_area(regs, error_code, address, NULL, SEGV_MAPERR, mmrange);
}

static inline bool bad_area_access_from_pkeys(unsigned long error_code,
@@ -980,7 +982,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,

static noinline void
bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, struct vm_area_struct *vma)
+ unsigned long address, struct vm_area_struct *vma,
+ struct range_lock *mmrange)
{
/*
* This OSPKE check is not strictly necessary at runtime.
@@ -988,9 +991,11 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
* if pkeys are compiled out.
*/
if (bad_area_access_from_pkeys(error_code, vma))
- __bad_area(regs, error_code, address, vma, SEGV_PKUERR);
+ __bad_area(regs, error_code, address, vma, SEGV_PKUERR,
+ mmrange);
else
- __bad_area(regs, error_code, address, vma, SEGV_ACCERR);
+ __bad_area(regs, error_code, address, vma, SEGV_ACCERR,
+ mmrange);
}

static void
@@ -1353,14 +1358,14 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
* validate the source. If this is invalid we can skip the address
* space check, thus avoiding the deadlock:
*/
- if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
+ if (unlikely(!mm_read_trylock(mm, &mmrange))) {
if (!(error_code & X86_PF_USER) &&
!search_exception_tables(regs->ip)) {
bad_area_nosemaphore(regs, error_code, address, NULL);
return;
}
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
} else {
/*
* The above down_read_trylock() might have succeeded in
@@ -1372,13 +1377,13 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,

vma = find_vma(mm, address);
if (unlikely(!vma)) {
- bad_area(regs, error_code, address);
+ bad_area(regs, error_code, address, &mmrange);
return;
}
if (likely(vma->vm_start <= address))
goto good_area;
if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
- bad_area(regs, error_code, address);
+ bad_area(regs, error_code, address, &mmrange);
return;
}
if (error_code & X86_PF_USER) {
@@ -1389,12 +1394,12 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
* 32 pointers and then decrements %sp by 65535.)
*/
if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
- bad_area(regs, error_code, address);
+ bad_area(regs, error_code, address, &mmrange);
return;
}
}
if (unlikely(expand_stack(vma, address))) {
- bad_area(regs, error_code, address);
+ bad_area(regs, error_code, address, &mmrange);
return;
}

@@ -1404,7 +1409,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
*/
good_area:
if (unlikely(access_error(error_code, vma))) {
- bad_area_access_error(regs, error_code, address, vma);
+ bad_area_access_error(regs, error_code, address, vma, &mmrange);
return;
}

@@ -1450,7 +1455,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
return;
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (unlikely(fault & VM_FAULT_ERROR)) {
mm_fault_error(regs, error_code, address, &pkey, fault);
return;
--
2.13.6


2018-02-05 01:39:44

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 39/64] arch/m68k: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/m68k/kernel/sys_m68k.c | 18 +++++++++++-------
arch/m68k/mm/fault.c | 8 ++++----
2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/m68k/kernel/sys_m68k.c b/arch/m68k/kernel/sys_m68k.c
index 27e10af5153a..d151bd19385c 100644
--- a/arch/m68k/kernel/sys_m68k.c
+++ b/arch/m68k/kernel/sys_m68k.c
@@ -378,6 +378,7 @@ asmlinkage int
sys_cacheflush (unsigned long addr, int scope, int cache, unsigned long len)
{
int ret = -EINVAL;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (scope < FLUSH_SCOPE_LINE || scope > FLUSH_SCOPE_ALL ||
cache & ~FLUSH_CACHE_BOTH)
@@ -399,7 +400,7 @@ sys_cacheflush (unsigned long addr, int scope, int cache, unsigned long len)
* Verify that the specified address region actually belongs
* to this process.
*/
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, addr);
if (!vma || addr < vma->vm_start || addr + len > vma->vm_end)
goto out_unlock;
@@ -450,7 +451,7 @@ sys_cacheflush (unsigned long addr, int scope, int cache, unsigned long len)
}
}
out_unlock:
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
out:
return ret;
}
@@ -461,6 +462,8 @@ asmlinkage int
sys_atomic_cmpxchg_32(unsigned long newval, int oldval, int d3, int d4, int d5,
unsigned long __user * mem)
{
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
/* This was borrowed from ARM's implementation. */
for (;;) {
struct mm_struct *mm = current->mm;
@@ -470,7 +473,7 @@ sys_atomic_cmpxchg_32(unsigned long newval, int oldval, int d3, int d4, int d5,
spinlock_t *ptl;
unsigned long mem_value;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
pgd = pgd_offset(mm, (unsigned long)mem);
if (!pgd_present(*pgd))
goto bad_access;
@@ -493,11 +496,11 @@ sys_atomic_cmpxchg_32(unsigned long newval, int oldval, int d3, int d4, int d5,
__put_user(newval, mem);

pte_unmap_unlock(pte, ptl);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return mem_value;

bad_access:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
/* This is not necessarily a bad access, we can get here if
a memory we're trying to write to should be copied-on-write.
Make the kernel do the necessary page stuff, then re-iterate.
@@ -536,14 +539,15 @@ sys_atomic_cmpxchg_32(unsigned long newval, int oldval, int d3, int d4, int d5,
{
struct mm_struct *mm = current->mm;
unsigned long mem_value;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

mem_value = *mem;
if (mem_value == oldval)
*mem = newval;

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return mem_value;
}

diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index ec32a193726f..426d22924852 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -90,7 +90,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

vma = find_vma(mm, address);
if (!vma)
@@ -181,7 +181,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return 0;

/*
@@ -189,7 +189,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
@@ -218,6 +218,6 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
current->thread.faddr = address;

send_sig:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return send_fault_sig(regs);
}
--
2.13.6


2018-02-05 01:40:22

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 47/64] arch/microblaze: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/microblaze/mm/fault.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index fd49efbdfbf4..072e0d79aab5 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -139,12 +139,12 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
* source. If this is invalid we can skip the address space check,
* thus avoiding the deadlock.
*/
- if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
+ if (unlikely(!mm_read_trylock(mm, &mmrange))) {
if (kernel_mode(regs) && !search_exception_tables(regs->pc))
goto bad_area_nosemaphore;

retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
}

vma = find_vma(mm, address);
@@ -251,7 +251,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* keep track of tlb+htab misses that are good addrs but
@@ -262,7 +262,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
return;

bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
pte_errors++;
@@ -286,7 +286,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
bad_page_fault(regs, address, SIGKILL);
else
@@ -294,7 +294,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (user_mode(regs)) {
info.si_signo = SIGBUS;
info.si_errno = 0;
--
2.13.6


2018-02-05 01:40:29

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 34/64] arch/parisc: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/parisc/kernel/traps.c | 7 ++++---
arch/parisc/mm/fault.c | 8 ++++----
2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index c919e6c0a687..ac73697c7952 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -718,8 +718,9 @@ void notrace handle_interruption(int code, struct pt_regs *regs)

if (user_mode(regs)) {
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm,regs->iaoq[0]);
if (vma && (regs->iaoq[0] >= vma->vm_start)
&& (vma->vm_flags & VM_EXEC)) {
@@ -727,10 +728,10 @@ void notrace handle_interruption(int code, struct pt_regs *regs)
fault_address = regs->iaoq[0];
fault_space = regs->iasq[0];

- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
break; /* call do_page_fault() */
}
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
}
/* Fall Through */
case 27:
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index 79db33a0cb0c..f4877e321c28 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -282,7 +282,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
if (acc_type & VM_WRITE)
flags |= FAULT_FLAG_WRITE;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma_prev(mm, address, &prev_vma);
if (!vma || address < vma->vm_start)
goto check_expansion;
@@ -339,7 +339,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
goto retry;
}
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

check_expansion:
@@ -351,7 +351,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
* Something tried to access memory that isn't in our memory map..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

if (user_mode(regs)) {
struct siginfo si;
@@ -427,7 +427,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
parisc_terminate("Bad Address (null pointer deref?)", regs, code, address);

out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
--
2.13.6


2018-02-05 01:40:51

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 21/64] mm: teach drop/take_all_locks() about range locking

From: Davidlohr Bueso <[email protected]>

And use the mm locking helpers. No changes in semantics.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/mm.h | 6 ++++--
mm/mmap.c | 12 +++++++-----
mm/mmu_notifier.c | 9 +++++----
3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fc4e7fdc3e76..0b9867e8a35d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2198,8 +2198,10 @@ static inline int check_data_rlimit(unsigned long rlim,
return 0;
}

-extern int mm_take_all_locks(struct mm_struct *mm);
-extern void mm_drop_all_locks(struct mm_struct *mm);
+extern int mm_take_all_locks(struct mm_struct *mm,
+ struct range_lock *mmrange);
+extern void mm_drop_all_locks(struct mm_struct *mm,
+ struct range_lock *mmrange);

extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
extern struct file *get_mm_exe_file(struct mm_struct *mm);
diff --git a/mm/mmap.c b/mm/mmap.c
index f61d49cb791e..8f0eb88a5d5e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3461,12 +3461,13 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
*
* mm_take_all_locks() can fail if it's interrupted by signals.
*/
-int mm_take_all_locks(struct mm_struct *mm)
+int mm_take_all_locks(struct mm_struct *mm,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma;
struct anon_vma_chain *avc;

- BUG_ON(down_read_trylock(&mm->mmap_sem));
+ BUG_ON(mm_read_trylock(mm, mmrange));

mutex_lock(&mm_all_locks_mutex);

@@ -3497,7 +3498,7 @@ int mm_take_all_locks(struct mm_struct *mm)
return 0;

out_unlock:
- mm_drop_all_locks(mm);
+ mm_drop_all_locks(mm, mmrange);
return -EINTR;
}

@@ -3541,12 +3542,13 @@ static void vm_unlock_mapping(struct address_space *mapping)
* The mmap_sem cannot be released by the caller until
* mm_drop_all_locks() returns.
*/
-void mm_drop_all_locks(struct mm_struct *mm)
+void mm_drop_all_locks(struct mm_struct *mm,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma;
struct anon_vma_chain *avc;

- BUG_ON(down_read_trylock(&mm->mmap_sem));
+ BUG_ON(mm_read_trylock(mm, mmrange));
BUG_ON(!mutex_is_locked(&mm_all_locks_mutex));

for (vma = mm->mmap; vma; vma = vma->vm_next) {
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 3e8a1a10607e..da99c01b8149 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -274,6 +274,7 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
{
struct mmu_notifier_mm *mmu_notifier_mm;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

BUG_ON(atomic_read(&mm->mm_users) <= 0);

@@ -283,8 +284,8 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
goto out;

if (take_mmap_sem)
- down_write(&mm->mmap_sem);
- ret = mm_take_all_locks(mm);
+ mm_write_lock(mm, &mmrange);
+ ret = mm_take_all_locks(mm, &mmrange);
if (unlikely(ret))
goto out_clean;

@@ -309,10 +310,10 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
hlist_add_head(&mn->hlist, &mm->mmu_notifier_mm->list);
spin_unlock(&mm->mmu_notifier_mm->lock);

- mm_drop_all_locks(mm);
+ mm_drop_all_locks(mm, &mmrange);
out_clean:
if (take_mmap_sem)
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
kfree(mmu_notifier_mm);
out:
BUG_ON(atomic_read(&mm->mm_users) <= 0);
--
2.13.6


2018-02-05 01:41:00

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 22/64] mm: avoid mmap_sem trylock in vm_insert_page()

From: Davidlohr Bueso <[email protected]>

The rules for this function state that mmap_sem must
be acquired by the caller:

- for write if used in f_op->mmap() (by far the most common case)
- for read if used from vm_ops->fault() (with VM_MIXEDMAP)

The only exception is:
mmap_vmcore()
remap_vmalloc_range_partial()
mmap_vmcore()

But there is no concurrency here, thus mmap_sem is not held.
After auditing the kernel, the following drivers use the fault
path and correctly set VM_MIXEDMAP:

.fault = etnaviv_gem_fault
.fault = udl_gem_fault
tegra_bo_fault()

As such, drop the reader trylock BUG_ON() for the common case.
This avoids having file_operations know about mmranges, as
mmap_sem is held during mmap(), for example.
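
As a purely hypothetical illustration of the common case the assertion used
to cover (the driver name and structure below are made up): the mmap()
syscall path already holds the mm lock for write when it calls ->mmap(),
which is what the old down_read_trylock() check relied upon, and which a
range-aware check could not verify without file_operations carrying the
range:

	struct foo_dev {			/* hypothetical driver state */
		struct page *page;
	};

	static int foo_drv_mmap(struct file *file, struct vm_area_struct *vma)
	{
		struct foo_dev *fdev = file->private_data;

		/*
		 * do_mmap() holds the mm lock for write around ->mmap(), so
		 * the old BUG_ON(down_read_trylock()) in vm_insert_page()
		 * was guaranteed to pass here.
		 */
		return vm_insert_page(vma, vma->vm_start, fdev->page);
	}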

Signed-off-by: Davidlohr Bueso <[email protected]>
---
mm/memory.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 5adcdc7dee80..7c69674cd9da 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1773,7 +1773,6 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
if (!page_count(page))
return -EINVAL;
if (!(vma->vm_flags & VM_MIXEDMAP)) {
- BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
BUG_ON(vma->vm_flags & VM_PFNMAP);
vma->vm_flags |= VM_MIXEDMAP;
}
--
2.13.6


2018-02-05 01:41:02

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 29/64] arch/alpha: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/alpha/kernel/traps.c | 6 ++++--
arch/alpha/mm/fault.c | 10 +++++-----
2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/alpha/kernel/traps.c b/arch/alpha/kernel/traps.c
index 4bd99a7b1c41..2d884945bd26 100644
--- a/arch/alpha/kernel/traps.c
+++ b/arch/alpha/kernel/traps.c
@@ -986,12 +986,14 @@ do_entUnaUser(void __user * va, unsigned long opcode,
info.si_code = SEGV_ACCERR;
else {
struct mm_struct *mm = current->mm;
- down_read(&mm->mmap_sem);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ mm_read_lock(mm, &mmrange);
if (find_vma(mm, (unsigned long)va))
info.si_code = SEGV_ACCERR;
else
info.si_code = SEGV_MAPERR;
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}
info.si_addr = va;
send_sig_info(SIGSEGV, &info, current);
diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 690d86a00a20..ec0ad8e23528 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -118,7 +118,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
@@ -181,14 +181,14 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

return;

/* Something tried to access memory that isn't in our memory map.
Fix it, but check if it's kernel or user first. */
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

if (user_mode(regs))
goto do_sigsegv;
@@ -212,14 +212,14 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
/* We ran out of memory, or some other thing happened to us that
made us unable to handle the page fault gracefully. */
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
/* Send a sigbus, regardless of whether we were in kernel
or user mode. */
info.si_signo = SIGBUS;
--
2.13.6


2018-02-05 01:41:15

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 31/64] arch/sparc: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/sparc/mm/fault_32.c | 18 +++++++++---------
arch/sparc/mm/fault_64.c | 12 ++++++------
arch/sparc/vdso/vma.c | 5 +++--
3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index ebb2406dbe7c..1f63a37b6f81 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -204,7 +204,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

if (!from_user && address >= PAGE_OFFSET)
goto bad_area;
@@ -281,7 +281,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;

/*
@@ -289,7 +289,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
@@ -338,7 +338,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (from_user) {
pagefault_out_of_memory();
return;
@@ -346,7 +346,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
goto no_context;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
do_fault_siginfo(BUS_ADRERR, SIGBUS, regs, text_fault);
if (!from_user)
goto no_context;
@@ -394,7 +394,7 @@ static void force_user_fault(unsigned long address, int write)

code = SEGV_MAPERR;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
@@ -419,15 +419,15 @@ static void force_user_fault(unsigned long address, int write)
case VM_FAULT_OOM:
goto do_sigbus;
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return;
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
__do_fault_siginfo(code, SIGSEGV, tsk->thread.kregs, address);
return;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
__do_fault_siginfo(BUS_ADRERR, SIGBUS, tsk->thread.kregs, address);
}

diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index e0a3c36b0fa1..d674c2d6b51a 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -335,7 +335,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)

perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

- if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!mm_read_trylock(mm, &mmrange)) {
if ((regs->tstate & TSTATE_PRIV) &&
!search_exception_tables(regs->tpc)) {
insn = get_fault_insn(regs, insn);
@@ -343,7 +343,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
}

retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
}

if (fault_code & FAULT_CODE_BAD_RA)
@@ -476,7 +476,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
goto retry;
}
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

mm_rss = get_mm_rss(mm);
#if defined(CONFIG_TRANSPARENT_HUGEPAGE)
@@ -507,7 +507,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
*/
bad_area:
insn = get_fault_insn(regs, insn);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

handle_kernel_fault:
do_kernel_fault(regs, si_code, fault_code, insn, address);
@@ -519,7 +519,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
*/
out_of_memory:
insn = get_fault_insn(regs, insn);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!(regs->tstate & TSTATE_PRIV)) {
pagefault_out_of_memory();
goto exit_exception;
@@ -532,7 +532,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)

do_sigbus:
insn = get_fault_insn(regs, insn);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Send a sigbus, regardless of whether we were in kernel
diff --git a/arch/sparc/vdso/vma.c b/arch/sparc/vdso/vma.c
index f51595f861b8..35b888bc2f54 100644
--- a/arch/sparc/vdso/vma.c
+++ b/arch/sparc/vdso/vma.c
@@ -178,8 +178,9 @@ static int map_vdso(const struct vdso_image *image,
struct vm_area_struct *vma;
unsigned long text_start, addr = 0;
int ret = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

/*
* First, get an unmapped region: then randomize it, and make sure that
@@ -235,7 +236,7 @@ static int map_vdso(const struct vdso_image *image,
if (ret)
current->mm->context.vdso = NULL;

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}

--
2.13.6


2018-02-05 01:41:30

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 38/64] arch/blackfin: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/blackfin/kernel/ptrace.c | 5 +++--
arch/blackfin/kernel/trace.c | 7 ++++---
2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/blackfin/kernel/ptrace.c b/arch/blackfin/kernel/ptrace.c
index a6827095b99a..e6657ab61afc 100644
--- a/arch/blackfin/kernel/ptrace.c
+++ b/arch/blackfin/kernel/ptrace.c
@@ -121,15 +121,16 @@ is_user_addr_valid(struct task_struct *child, unsigned long start, unsigned long
bool valid;
struct vm_area_struct *vma;
struct sram_list_struct *sraml;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* overflow */
if (start + len < start)
return -EIO;

- down_read(&child->mm->mmap_sem);
+ mm_read_lock(child->mm, &mmrange);
vma = find_vma(child->mm, start);
valid = vma && start >= vma->vm_start && start + len <= vma->vm_end;
- up_read(&child->mm->mmap_sem);
+ mm_read_unlock(child->mm, &mmrange);
if (valid)
return 0;

diff --git a/arch/blackfin/kernel/trace.c b/arch/blackfin/kernel/trace.c
index 151f22196ab6..9bf938b14601 100644
--- a/arch/blackfin/kernel/trace.c
+++ b/arch/blackfin/kernel/trace.c
@@ -33,6 +33,7 @@ void decode_address(char *buf, unsigned long address)
struct mm_struct *mm;
unsigned long offset;
struct rb_node *n;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

#ifdef CONFIG_KALLSYMS
unsigned long symsize;
@@ -124,7 +125,7 @@ void decode_address(char *buf, unsigned long address)
continue;

mm = t->mm;
- if (!down_read_trylock(&mm->mmap_sem))
+ if (!mm_read_trylock(mm, &mmrange))
goto __continue;

for (n = rb_first(&mm->mm_rb); n; n = rb_next(n)) {
@@ -166,7 +167,7 @@ void decode_address(char *buf, unsigned long address)
sprintf(buf, "[ %s vma:0x%lx-0x%lx]",
name, vma->vm_start, vma->vm_end);

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
task_unlock(t);

if (buf[0] == '\0')
@@ -176,7 +177,7 @@ void decode_address(char *buf, unsigned long address)
}
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
__continue:
task_unlock(t);
}
--
2.13.6


2018-02-05 01:41:43

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 14/64] fs/coredump: teach about range locking

From: Davidlohr Bueso <[email protected]>

coredump_wait() needs the mmap_sem held across zap_threads()
so that the mm->core_state setup is stable. The conversion is
trivial as the mmap_sem is only used within the same function
context. No change in semantics.

In addition, exec_mmap() needs an mmrange, as the mmap_sem is
taken there for the de_thread()/coredump scenarios (checking
core_state and changing tsk->mm). No change in semantics.
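
The coredump_wait() side reduces to the pattern below (a sketch of the
hunk that follows, with the completion setup elided):

        DEFINE_RANGE_LOCK_FULL(mmrange);

        if (mm_write_lock_killable(mm, &mmrange))
                return -EINTR;     /* interrupted by a fatal signal */
        if (!mm->core_state)
                core_waiters = zap_threads(tsk, mm, core_state, exit_code);
        mm_write_unlock(mm, &mmrange);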

Signed-off-by: Davidlohr Bueso <[email protected]>
---
fs/coredump.c | 5 +++--
fs/exec.c | 18 ++++++++++--------
2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 1e2c87acac9b..ad91712498fc 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -412,17 +412,18 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
int core_waiters = -EBUSY;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

init_completion(&core_state->startup);
core_state->dumper.task = tsk;
core_state->dumper.next = NULL;

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

if (!mm->core_state)
core_waiters = zap_threads(tsk, mm, core_state, exit_code);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

if (core_waiters > 0) {
struct core_thread *ptr;
diff --git a/fs/exec.c b/fs/exec.c
index e46752874b47..a61ac9e81169 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -294,12 +294,13 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
int err;
struct vm_area_struct *vma = NULL;
struct mm_struct *mm = bprm->mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
if (!vma)
return -ENOMEM;

- if (down_write_killable(&mm->mmap_sem)) {
+ if (mm_write_lock_killable(mm, &mmrange)) {
err = -EINTR;
goto err_free;
}
@@ -324,11 +325,11 @@ static int __bprm_mm_init(struct linux_binprm *bprm)

mm->stack_vm = mm->total_vm = 1;
arch_bprm_mm_init(mm, vma);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
bprm->p = vma->vm_end - sizeof(void *);
return 0;
err:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
err_free:
bprm->vma = NULL;
kmem_cache_free(vm_area_cachep, vma);
@@ -739,7 +740,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
bprm->loader -= stack_shift;
bprm->exec -= stack_shift;

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

vm_flags = VM_STACK_FLAGS;
@@ -796,7 +797,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
ret = -EFAULT;

out_unlock:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}
EXPORT_SYMBOL(setup_arg_pages);
@@ -1011,6 +1012,7 @@ static int exec_mmap(struct mm_struct *mm)
{
struct task_struct *tsk;
struct mm_struct *old_mm, *active_mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* Notify parent that we're no longer interested in the old VM */
tsk = current;
@@ -1025,9 +1027,9 @@ static int exec_mmap(struct mm_struct *mm)
* through with the exec. We must hold mmap_sem around
* checking core_state and changing tsk->mm.
*/
- down_read(&old_mm->mmap_sem);
+ mm_read_lock(old_mm, &mmrange);
if (unlikely(old_mm->core_state)) {
- up_read(&old_mm->mmap_sem);
+ mm_read_unlock(old_mm, &mmrange);
return -EINTR;
}
}
@@ -1040,7 +1042,7 @@ static int exec_mmap(struct mm_struct *mm)
vmacache_flush(tsk);
task_unlock(tsk);
if (old_mm) {
- up_read(&old_mm->mmap_sem);
+ mm_read_unlock(old_mm, &mmrange);
BUG_ON(active_mm != old_mm);
setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm);
mm_update_next_owner(old_mm);
--
2.13.6


2018-02-05 01:41:48

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 28/64] arch/x86: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.
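
One detail worth pointing out: when the lock is only taken in a single
branch, the range lock can be declared in that scope, as in the
debug_pagetables.c hunks below (sketch):

        if (current->mm->pgd) {
                DEFINE_RANGE_LOCK_FULL(mmrange);

                mm_read_lock(current->mm, &mmrange);
                ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false);
                mm_read_unlock(current->mm, &mmrange);
        }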

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/x86/entry/vdso/vma.c | 11 ++++++-----
arch/x86/kernel/vm86_32.c | 5 +++--
arch/x86/mm/debug_pagetables.c | 13 +++++++++----
arch/x86/mm/mpx.c | 14 ++++++++------
arch/x86/um/vdso/vma.c | 5 +++--
5 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 2e0bdf6a3aaf..5993caa12cc3 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -157,7 +157,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
int ret = 0;
DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

addr = get_unmapped_area(NULL, addr,
@@ -200,7 +200,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
}

up_fail:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}

@@ -261,8 +261,9 @@ int map_vdso_once(const struct vdso_image *image, unsigned long addr)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
/*
* Check if we have already mapped vdso blob - fail to prevent
* abusing from userspace install_speciall_mapping, which may
@@ -273,11 +274,11 @@ int map_vdso_once(const struct vdso_image *image, unsigned long addr)
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (vma_is_special_mapping(vma, &vdso_mapping) ||
vma_is_special_mapping(vma, &vvar_mapping)) {
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return -EEXIST;
}
}
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return map_vdso(image, addr);
}
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 5edb27f1a2c4..524817b365f6 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -171,8 +171,9 @@ static void mark_screen_rdonly(struct mm_struct *mm)
pmd_t *pmd;
pte_t *pte;
int i;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
pgd = pgd_offset(mm, 0xA0000);
if (pgd_none_or_clear_bad(pgd))
goto out;
@@ -198,7 +199,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
}
pte_unmap_unlock(pte, ptl);
out:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
flush_tlb_mm_range(mm, 0xA0000, 0xA0000 + 32*PAGE_SIZE, 0UL);
}

diff --git a/arch/x86/mm/debug_pagetables.c b/arch/x86/mm/debug_pagetables.c
index 421f2664ffa0..b044a0680923 100644
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -1,6 +1,7 @@
#include <linux/debugfs.h>
#include <linux/module.h>
#include <linux/seq_file.h>
+#include <linux/mm.h>
#include <asm/pgtable.h>

static int ptdump_show(struct seq_file *m, void *v)
@@ -25,9 +26,11 @@ static const struct file_operations ptdump_fops = {
static int ptdump_show_curknl(struct seq_file *m, void *v)
{
if (current->mm->pgd) {
- down_read(&current->mm->mmap_sem);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ mm_read_lock(current->mm, &mmrange);
ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
}
return 0;
}
@@ -51,9 +54,11 @@ static struct dentry *pe_curusr;
static int ptdump_show_curusr(struct seq_file *m, void *v)
{
if (current->mm->pgd) {
- down_read(&current->mm->mmap_sem);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ mm_read_lock(current->mm, &mmrange);
ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
}
return 0;
}
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 51c3e1f7e6be..e9c8d75e1d68 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -53,11 +53,11 @@ static unsigned long mpx_mmap(unsigned long len)
if (len != mpx_bt_size_bytes(mm))
return -EINVAL;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate, NULL,
&mmrange);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
if (populate)
mm_populate(addr, populate);

@@ -228,6 +228,7 @@ int mpx_enable_management(void)
void __user *bd_base = MPX_INVALID_BOUNDS_DIR;
struct mm_struct *mm = current->mm;
int ret = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* runtime in the userspace will be responsible for allocation of
@@ -241,7 +242,7 @@ int mpx_enable_management(void)
* unmap path; we can just use mm->context.bd_addr instead.
*/
bd_base = mpx_get_bounds_dir();
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

/* MPX doesn't support addresses above 47 bits yet. */
if (find_vma(mm, DEFAULT_MAP_WINDOW)) {
@@ -255,20 +256,21 @@ int mpx_enable_management(void)
if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
ret = -ENXIO;
out:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}

int mpx_disable_management(void)
{
struct mm_struct *mm = current->mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!cpu_feature_enabled(X86_FEATURE_MPX))
return -ENXIO;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return 0;
}

diff --git a/arch/x86/um/vdso/vma.c b/arch/x86/um/vdso/vma.c
index 6be22f991b59..f129e97eb307 100644
--- a/arch/x86/um/vdso/vma.c
+++ b/arch/x86/um/vdso/vma.c
@@ -57,11 +57,12 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
{
int err;
struct mm_struct *mm = current->mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!vdso_enabled)
return 0;

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

err = install_special_mapping(mm, um_vdso_addr, PAGE_SIZE,
@@ -69,7 +70,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
vdsop);

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return err;
}
--
2.13.6


2018-02-05 01:42:00

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 30/64] arch/tile: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

This becomes quite straightforward with the mmrange in place.
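
The only non-trivial case here is the write trylock in pgtable.c, which
becomes (sketch, vma walk elided):

        DEFINE_RANGE_LOCK_FULL(mmrange);

        if (mm->context.priority_cached && mm_write_trylock(mm, &mmrange)) {
                /* ... recompute priority_cached from the vma list ... */
                mm_write_unlock(mm, &mmrange);
        }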

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/tile/kernel/stack.c | 5 +++--
arch/tile/mm/elf.c | 12 +++++++-----
arch/tile/mm/fault.c | 12 ++++++------
arch/tile/mm/pgtable.c | 6 ++++--
4 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/tile/kernel/stack.c b/arch/tile/kernel/stack.c
index 94ecbc6676e5..acd4a1ee8df1 100644
--- a/arch/tile/kernel/stack.c
+++ b/arch/tile/kernel/stack.c
@@ -378,6 +378,7 @@ void tile_show_stack(struct KBacktraceIterator *kbt)
{
int i;
int have_mmap_sem = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!start_backtrace())
return;
@@ -398,7 +399,7 @@ void tile_show_stack(struct KBacktraceIterator *kbt)
if (kbt->task == current && address < PAGE_OFFSET &&
!have_mmap_sem && kbt->task->mm && !in_interrupt()) {
have_mmap_sem =
- down_read_trylock(&kbt->task->mm->mmap_sem);
+ mm_read_trylock(kbt->task->mm, &mmrange);
}

describe_addr(kbt, address, have_mmap_sem,
@@ -415,7 +416,7 @@ void tile_show_stack(struct KBacktraceIterator *kbt)
if (kbt->end == KBT_LOOP)
pr_err("Stack dump stopped; next frame identical to this one\n");
if (have_mmap_sem)
- up_read(&kbt->task->mm->mmap_sem);
+ mm_read_unlock(kbt->task->mm, &mmrange);
end_backtrace();
}
EXPORT_SYMBOL(tile_show_stack);
diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
index 889901824400..9aba9813cdb8 100644
--- a/arch/tile/mm/elf.c
+++ b/arch/tile/mm/elf.c
@@ -44,6 +44,7 @@ static int notify_exec(struct mm_struct *mm)
char *buf, *path;
struct vm_area_struct *vma;
struct file *exe_file;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!sim_is_simulator())
return 1;
@@ -60,10 +61,10 @@ static int notify_exec(struct mm_struct *mm)
if (IS_ERR(path))
goto done_put;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (vma = current->mm->mmap; ; vma = vma->vm_next) {
if (vma == NULL) {
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
goto done_put;
}
if (vma->vm_file == exe_file)
@@ -91,7 +92,7 @@ static int notify_exec(struct mm_struct *mm)
}
}
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

sim_notify_exec(path);
done_put:
@@ -119,6 +120,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
{
struct mm_struct *mm = current->mm;
int retval = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* Notify the simulator that an exec just occurred.
@@ -128,7 +130,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
if (!notify_exec(mm))
sim_notify_exec(bprm->filename);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

retval = setup_vdso_pages();

@@ -149,7 +151,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
}
#endif

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return retval;
}
diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index 09f053eb146f..f4ce0806653a 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -383,7 +383,7 @@ static int handle_page_fault(struct pt_regs *regs,
* source. If this is invalid we can skip the address space check,
* thus avoiding the deadlock.
*/
- if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!mm_read_trylock(mm, &mmrange)) {
if (is_kernel_mode &&
!search_exception_tables(regs->pc)) {
vma = NULL; /* happy compiler */
@@ -391,7 +391,7 @@ static int handle_page_fault(struct pt_regs *regs,
}

retry:
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
}

vma = find_vma(mm, address);
@@ -482,7 +482,7 @@ static int handle_page_fault(struct pt_regs *regs,
}
#endif

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return 1;

/*
@@ -490,7 +490,7 @@ static int handle_page_fault(struct pt_regs *regs,
* Fix it, but check if it's kernel or user first..
*/
bad_area:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
@@ -557,14 +557,14 @@ static int handle_page_fault(struct pt_regs *regs,
* us unable to handle the page fault gracefully.
*/
out_of_memory:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (is_kernel_mode)
goto no_context;
pagefault_out_of_memory();
return 0;

do_sigbus:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/* Kernel mode? Handle exceptions or die */
if (is_kernel_mode)
diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index ec5576fd3a86..2aab41fe69cf 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -430,7 +430,9 @@ void start_mm_caching(struct mm_struct *mm)
*/
static unsigned long update_priority_cached(struct mm_struct *mm)
{
- if (mm->context.priority_cached && down_write_trylock(&mm->mmap_sem)) {
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ if (mm->context.priority_cached && mm_write_trylock(mm, &mmrange)) {
struct vm_area_struct *vm;
for (vm = mm->mmap; vm; vm = vm->vm_next) {
if (hv_pte_get_cached_priority(vm->vm_page_prot))
@@ -438,7 +440,7 @@ static unsigned long update_priority_cached(struct mm_struct *mm)
}
if (vm == NULL)
mm->context.priority_cached = 0;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
}
return mm->context.priority_cached;
}
--
2.13.6


2018-02-05 01:42:42

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 23/64] mm: huge pagecache: do not check mmap_sem state

From: Davidlohr Bueso <[email protected]>

*THIS IS A HACK*

By dropping the rwsem_is_locked checks in zap_pmd_range()
and zap_pud_range() we can avoid having to teach
file_operations about mmrange. For example, in xfs,
iomap_dio_rw() is called from the .read_iter file callbacks.
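
For contrast, the non-hack route would keep the check but express it
against the range lock, roughly like the assertions used elsewhere in
the series (a sketch only -- it assumes an mmrange plumbed down to
zap_pmd_range(), which is exactly the churn this patch avoids):

        VM_BUG_ON_VMA(vma_is_anonymous(vma) &&
                      !mm_is_locked(tlb->mm, mmrange), vma);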

No-Yet-Signed-off-by: Davidlohr Bueso <[email protected]>
---
mm/memory.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7c69674cd9da..598a8c69e3d3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1422,8 +1422,6 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
next = pmd_addr_end(addr, end);
if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
if (next - addr != HPAGE_PMD_SIZE) {
- VM_BUG_ON_VMA(vma_is_anonymous(vma) &&
- !rwsem_is_locked(&tlb->mm->mmap_sem), vma);
__split_huge_pmd(vma, pmd, addr, false, NULL);
} else if (zap_huge_pmd(tlb, vma, pmd, addr))
goto next;
@@ -1459,7 +1457,6 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
next = pud_addr_end(addr, end);
if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
if (next - addr != HPAGE_PUD_SIZE) {
- VM_BUG_ON_VMA(!rwsem_is_locked(&tlb->mm->mmap_sem), vma);
split_huge_pud(vma, pud, addr);
} else if (zap_huge_pud(tlb, vma, pud, addr))
goto next;
--
2.13.6


2018-02-05 01:42:44

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 25/64] mm: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

Most of the mmap_sem users here are already aware of mmrange,
making the conversion straightforward. Those that are not simply
use the mmap_sem within the same function context.
No change in semantics.
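
The two cases mentioned above boil down to the shapes below (a sketch;
fn_aware() and fn_local() are hypothetical stand-ins for the call
sites touched by this patch):

        /* (a) the caller already received an mmrange from up the chain */
        static void fn_aware(struct mm_struct *mm, struct range_lock *mmrange)
        {
                mm_read_lock(mm, mmrange);
                /* ... */
                mm_read_unlock(mm, mmrange);
        }

        /* (b) self-contained user: a full-range lock lives on the stack */
        static void fn_local(struct mm_struct *mm)
        {
                DEFINE_RANGE_LOCK_FULL(mmrange);

                mm_read_lock(mm, &mmrange);
                /* ... */
                mm_read_unlock(mm, &mmrange);
        }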

Signed-off-by: Davidlohr Bueso <[email protected]>
---
mm/filemap.c | 4 ++--
mm/frame_vector.c | 4 ++--
mm/gup.c | 16 ++++++++--------
mm/khugepaged.c | 35 +++++++++++++++++++----------------
mm/memcontrol.c | 10 +++++-----
mm/memory.c | 9 +++++----
mm/mempolicy.c | 21 +++++++++++----------
mm/migrate.c | 10 ++++++----
mm/mincore.c | 4 ++--
mm/mmap.c | 30 +++++++++++++++++-------------
mm/mprotect.c | 14 ++++++++------
mm/mremap.c | 4 ++--
mm/msync.c | 9 +++++----
mm/nommu.c | 23 +++++++++++++----------
mm/oom_kill.c | 8 ++++----
mm/pagewalk.c | 4 ++--
mm/process_vm_access.c | 4 ++--
mm/shmem.c | 2 +-
mm/swapfile.c | 7 ++++---
mm/userfaultfd.c | 24 ++++++++++++++----------
mm/util.c | 9 +++++----
21 files changed, 137 insertions(+), 114 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 6124ede79a4d..b56f93e14992 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1303,7 +1303,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
if (flags & FAULT_FLAG_RETRY_NOWAIT)
return 0;

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
if (flags & FAULT_FLAG_KILLABLE)
wait_on_page_locked_killable(page);
else
@@ -1315,7 +1315,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,

ret = __lock_page_killable(page);
if (ret) {
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
return 0;
}
} else
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index d3dccd80c6ee..2074f6c4d6e9 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -47,7 +47,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
nr_frames = vec->nr_allocated;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
locked = 1;
vma = find_vma_intersection(mm, start, start + 1);
if (!vma) {
@@ -102,7 +102,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
out:
if (locked)
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (!ret)
ret = -EFAULT;
if (ret > 0)
diff --git a/mm/gup.c b/mm/gup.c
index 3d1b6dd11616..08d7c17e9f06 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -827,7 +827,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
}

if (ret & VM_FAULT_RETRY) {
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, mmrange);
if (!(fault_flags & FAULT_FLAG_TRIED)) {
*unlocked = true;
fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
@@ -911,7 +911,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
*/
*locked = 1;
lock_dropped = true;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, mmrange);
ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
pages, NULL, NULL, mmrange);
if (ret != 1) {
@@ -932,7 +932,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
* We must let the caller know we temporarily dropped the lock
* and so the critical section protected by it was lost.
*/
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
*locked = 0;
}
return pages_done;
@@ -992,11 +992,11 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
long ret;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL,
&locked, gup_flags | FOLL_TOUCH, &mmrange);
if (locked)
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return ret;
}
EXPORT_SYMBOL(get_user_pages_unlocked);
@@ -1184,7 +1184,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
VM_BUG_ON(end & ~PAGE_MASK);
VM_BUG_ON_VMA(start < vma->vm_start, vma);
VM_BUG_ON_VMA(end > vma->vm_end, vma);
- VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
+ VM_BUG_ON_MM(!mm_is_locked(mm, mmrange), mm);

gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK;
if (vma->vm_flags & VM_LOCKONFAULT)
@@ -1239,7 +1239,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
*/
if (!locked) {
locked = 1;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, nstart);
} else if (nstart >= vma->vm_end)
vma = vma->vm_next;
@@ -1271,7 +1271,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
ret = 0;
}
if (locked)
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return ret; /* 0 or negative error code */
}

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 0b91ce730160..9076d26d162a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -469,6 +469,8 @@ void __khugepaged_exit(struct mm_struct *mm)
free_mm_slot(mm_slot);
mmdrop(mm);
} else if (mm_slot) {
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
/*
* This is required to serialize against
* khugepaged_test_exit() (which is guaranteed to run
@@ -477,8 +479,8 @@ void __khugepaged_exit(struct mm_struct *mm)
* khugepaged has finished working on the pagetables
* under the mmap_sem.
*/
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
+ mm_write_unlock(mm, &mmrange);
}
}

@@ -902,7 +904,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,

/* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
if (ret & VM_FAULT_RETRY) {
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, mmrange);
if (hugepage_vma_revalidate(mm, address, &vmf.vma)) {
/* vma is no longer available, don't continue to swapin */
trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
@@ -956,7 +958,7 @@ static void collapse_huge_page(struct mm_struct *mm,
* sync compaction, and we do not need to hold the mmap_sem during
* that. We will recheck the vma after taking it again in write mode.
*/
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
new_page = khugepaged_alloc_page(hpage, gfp, node);
if (!new_page) {
result = SCAN_ALLOC_HUGE_PAGE_FAIL;
@@ -968,11 +970,11 @@ static void collapse_huge_page(struct mm_struct *mm,
goto out_nolock;
}

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, mmrange);
result = hugepage_vma_revalidate(mm, address, &vma);
if (result) {
mem_cgroup_cancel_charge(new_page, memcg, true);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
goto out_nolock;
}

@@ -980,7 +982,7 @@ static void collapse_huge_page(struct mm_struct *mm,
if (!pmd) {
result = SCAN_PMD_NULL;
mem_cgroup_cancel_charge(new_page, memcg, true);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
goto out_nolock;
}

@@ -991,17 +993,17 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced, mmrange)) {
mem_cgroup_cancel_charge(new_page, memcg, true);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
goto out_nolock;
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);
/*
* Prevent all access to pagetables with the exception of
* gup_fast later handled by the ptep_clear_flush and the VM
* handled by the anon_vma lock + PG_lock.
*/
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, mmrange);
result = hugepage_vma_revalidate(mm, address, &vma);
if (result)
goto out;
@@ -1084,7 +1086,7 @@ static void collapse_huge_page(struct mm_struct *mm,
khugepaged_pages_collapsed++;
result = SCAN_SUCCEED;
out_up_write:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, mmrange);
out_nolock:
trace_mm_collapse_huge_page(mm, isolated, result);
return;
@@ -1249,6 +1251,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
struct vm_area_struct *vma;
unsigned long addr;
pmd_t *pmd, _pmd;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

i_mmap_lock_write(mapping);
vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
@@ -1269,12 +1272,12 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
* re-fault. Not ideal, but it's more important to not disturb
* the system too much.
*/
- if (down_write_trylock(&vma->vm_mm->mmap_sem)) {
+ if (mm_write_trylock(vma->vm_mm, &mmrange)) {
spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
/* assume page table is clear */
_pmd = pmdp_collapse_flush(vma, addr, pmd);
spin_unlock(ptl);
- up_write(&vma->vm_mm->mmap_sem);
+ mm_write_unlock(vma->vm_mm, &mmrange);
mm_dec_nr_ptes(vma->vm_mm);
pte_free(vma->vm_mm, pmd_pgtable(_pmd));
}
@@ -1684,7 +1687,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
* the next mm on the list.
*/
vma = NULL;
- if (unlikely(!down_read_trylock(&mm->mmap_sem)))
+ if (unlikely(!mm_read_trylock(mm, &mmrange)))
goto breakouterloop_mmap_sem;
if (likely(!khugepaged_test_exit(mm)))
vma = find_vma(mm, khugepaged_scan.address);
@@ -1729,7 +1732,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
if (!shmem_huge_enabled(vma))
goto skip;
file = get_file(vma->vm_file);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
ret = 1;
khugepaged_scan_shmem(mm, file->f_mapping,
pgoff, hpage);
@@ -1750,7 +1753,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
}
}
breakouterloop:
- up_read(&mm->mmap_sem); /* exit_mmap will destroy ptes after this */
+ mm_read_unlock(mm, &mmrange); /* exit_mmap will destroy ptes after this */
breakouterloop_mmap_sem:

spin_lock(&khugepaged_mm_lock);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a7ac5a14b22e..699d35ffee1a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4916,16 +4916,16 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
{
unsigned long precharge;
- DEFINE_RANGE_LOCK_FULL(mmrange);

struct mm_walk mem_cgroup_count_precharge_walk = {
.pmd_entry = mem_cgroup_count_precharge_pte_range,
.mm = mm,
};
- down_read(&mm->mmap_sem);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+ mm_read_lock(mm, &mmrange);
walk_page_range(0, mm->highest_vm_end,
&mem_cgroup_count_precharge_walk, &mmrange);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

precharge = mc.precharge;
mc.precharge = 0;
@@ -5211,7 +5211,7 @@ static void mem_cgroup_move_charge(void)
atomic_inc(&mc.from->moving_account);
synchronize_rcu();
retry:
- if (unlikely(!down_read_trylock(&mc.mm->mmap_sem))) {
+ if (unlikely(!mm_read_trylock(mc.mm, &mmrange))) {
/*
* Someone who are holding the mmap_sem might be waiting in
* waitq. So we cancel all extra charges, wake up all waiters,
@@ -5230,7 +5230,7 @@ static void mem_cgroup_move_charge(void)
walk_page_range(0, mc.mm->highest_vm_end, &mem_cgroup_move_charge_walk,
&mmrange);

- up_read(&mc.mm->mmap_sem);
+ mm_read_unlock(mc.mm, &mmrange);
atomic_dec(&mc.from->moving_account);
}

diff --git a/mm/memory.c b/mm/memory.c
index 598a8c69e3d3..e3bf2879f7c3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4425,7 +4425,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
int write = gup_flags & FOLL_WRITE;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
/* ignore errors, just check how much was successfully transferred */
while (len) {
int bytes, ret, offset;
@@ -4474,7 +4474,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
buf += bytes;
addr += bytes;
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

return buf - old_buf;
}
@@ -4525,11 +4525,12 @@ void print_vma_addr(char *prefix, unsigned long ip)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* we might be running from an atomic context so we cannot sleep
*/
- if (!down_read_trylock(&mm->mmap_sem))
+ if (!mm_read_trylock(mm, &mmrange))
return;

vma = find_vma(mm, ip);
@@ -4548,7 +4549,7 @@ void print_vma_addr(char *prefix, unsigned long ip)
free_page((unsigned long)buf);
}
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}

#if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 001dc176abc1..93b69c603e8d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -378,11 +378,12 @@ void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new)
void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
{
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
for (vma = mm->mmap; vma; vma = vma->vm_next)
mpol_rebind_policy(vma->vm_policy, new);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
}

static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
@@ -842,10 +843,10 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
* vma/shared policy at addr is NULL. We
* want to return MPOL_DEFAULT in this case.
*/
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma_intersection(mm, addr, addr+1);
if (!vma) {
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return -EFAULT;
}
if (vma->vm_ops && vma->vm_ops->get_policy)
@@ -895,7 +896,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
out:
mpol_cond_put(pol);
if (vma)
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return err;
}

@@ -992,7 +993,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
if (err)
return err;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

/*
* Find a 'source' bit set in 'tmp' whose corresponding 'dest'
@@ -1073,7 +1074,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
if (err < 0)
break;
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (err < 0)
return err;
return busy;
@@ -1195,12 +1196,12 @@ static long do_mbind(unsigned long start, unsigned long len,
{
NODEMASK_SCRATCH(scratch);
if (scratch) {
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
task_lock(current);
err = mpol_set_nodemask(new, nmask, scratch);
task_unlock(current);
if (err)
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
} else
err = -ENOMEM;
NODEMASK_SCRATCH_FREE(scratch);
@@ -1229,7 +1230,7 @@ static long do_mbind(unsigned long start, unsigned long len,
} else
putback_movable_pages(&pagelist);

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mpol_out:
mpol_put(new);
return err;
diff --git a/mm/migrate.c b/mm/migrate.c
index 7a6afc34dd54..e905d2aef7fa 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1486,8 +1486,9 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
struct page *page;
unsigned int follflags;
int err;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
err = -EFAULT;
vma = find_vma(mm, addr);
if (!vma || addr < vma->vm_start || !vma_migratable(vma))
@@ -1540,7 +1541,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
*/
put_page(page);
out:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return err;
}

@@ -1638,8 +1639,9 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
const void __user **pages, int *status)
{
unsigned long i;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

for (i = 0; i < nr_pages; i++) {
unsigned long addr = (unsigned long)(*pages);
@@ -1666,7 +1668,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
status++;
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}

/*
diff --git a/mm/mincore.c b/mm/mincore.c
index a6875a34aac0..1255098449b8 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -259,9 +259,9 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
* Do at most PAGE_SIZE entries per iteration, due to
* the temporary buffer size.
*/
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
retval = do_mincore(start, min(pages, PAGE_SIZE), tmp, &mmrange);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

if (retval <= 0)
break;
diff --git a/mm/mmap.c b/mm/mmap.c
index 8f0eb88a5d5e..e10d005f7e2f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -191,7 +191,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
LIST_HEAD(uf);
DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

#ifdef CONFIG_COMPAT_BRK
@@ -244,7 +244,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
set_brk:
mm->brk = brk;
populate = newbrk > oldbrk && (mm->def_flags & VM_LOCKED) != 0;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
userfaultfd_unmap_complete(mm, &uf);
if (populate)
mm_populate(oldbrk, newbrk - oldbrk);
@@ -252,7 +252,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)

out:
retval = mm->brk;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return retval;
}

@@ -2762,11 +2762,11 @@ int vm_munmap(unsigned long start, size_t len)
LIST_HEAD(uf);
DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

ret = do_munmap(mm, start, len, &uf, &mmrange);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
userfaultfd_unmap_complete(mm, &uf);
return ret;
}
@@ -2808,7 +2808,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
if (pgoff + (size >> PAGE_SHIFT) < pgoff)
return ret;

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

vma = find_vma(mm, start);
@@ -2871,7 +2871,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
prot, flags, pgoff, &populate, NULL, &mmrange);
fput(file);
out:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
if (populate)
mm_populate(ret, populate);
if (!IS_ERR_VALUE(ret))
@@ -2882,9 +2882,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
static inline void verify_mm_writelocked(struct mm_struct *mm)
{
#ifdef CONFIG_DEBUG_VM
- if (unlikely(down_read_trylock(&mm->mmap_sem))) {
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ if (unlikely(mm_read_trylock(mm, &mmrange))) {
WARN_ON(1);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}
#endif
}
@@ -2996,12 +2998,12 @@ int vm_brk_flags(unsigned long addr, unsigned long len, unsigned long flags)
LIST_HEAD(uf);
DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

ret = do_brk_flags(addr, len, flags, &uf, &mmrange);
populate = ((mm->def_flags & VM_LOCKED) != 0);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
userfaultfd_unmap_complete(mm, &uf);
if (populate && !ret)
mm_populate(addr, len);
@@ -3048,6 +3050,8 @@ void exit_mmap(struct mm_struct *mm)
unmap_vmas(&tlb, vma, 0, -1);

if (unlikely(mm_is_oom_victim(mm))) {
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
/*
* Wait for oom_reap_task() to stop working on this
* mm. Because MMF_OOM_SKIP is already set before
@@ -3061,8 +3065,8 @@ void exit_mmap(struct mm_struct *mm)
* is found not NULL while holding the task_lock.
*/
set_bit(MMF_OOM_SKIP, &mm->flags);
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
+ mm_write_unlock(mm, &mmrange);
}
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b84a70720319..2f39450ae959 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -424,7 +424,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,

reqprot = prot;

- if (down_write_killable(&current->mm->mmap_sem))
+ if (mm_write_lock_killable(current->mm, &mmrange))
return -EINTR;

/*
@@ -514,7 +514,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
prot = reqprot;
}
out:
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
return error;
}

@@ -536,6 +536,7 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
{
int pkey;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* No flags supported yet. */
if (flags)
@@ -544,7 +545,7 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
if (init_val & ~PKEY_ACCESS_MASK)
return -EINVAL;

- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);
pkey = mm_pkey_alloc(current->mm);

ret = -ENOSPC;
@@ -558,17 +559,18 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
}
ret = pkey;
out:
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
return ret;
}

SYSCALL_DEFINE1(pkey_free, int, pkey)
{
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);
ret = mm_pkey_free(current->mm, pkey);
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);

/*
* We could provie warnings or errors if any VMA still
diff --git a/mm/mremap.c b/mm/mremap.c
index 21a9e2a2baa2..cc56d13e5e67 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -557,7 +557,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
if (!new_len)
return ret;

- if (down_write_killable(&current->mm->mmap_sem))
+ if (mm_write_lock_killable(current->mm, &mmrange))
return -EINTR;

if (flags & MREMAP_FIXED) {
@@ -641,7 +641,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
vm_unacct_memory(charged);
locked = 0;
}
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
if (locked && new_len > old_len)
mm_populate(new_addr + old_len, new_len - old_len);
userfaultfd_unmap_complete(mm, &uf_unmap_early);
diff --git a/mm/msync.c b/mm/msync.c
index ef30a429623a..2524b4708e78 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -36,6 +36,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
struct vm_area_struct *vma;
int unmapped_error = 0;
int error = -EINVAL;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
goto out;
@@ -55,7 +56,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
* If the interval [start,end) covers some unmapped address ranges,
* just ignore them, but return -ENOMEM at the end.
*/
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, start);
for (;;) {
struct file *file;
@@ -86,12 +87,12 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
if ((flags & MS_SYNC) && file &&
(vma->vm_flags & VM_SHARED)) {
get_file(file);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
error = vfs_fsync_range(file, fstart, fend, 1);
fput(file);
if (error || start >= end)
goto out;
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, start);
} else {
if (start >= end) {
@@ -102,7 +103,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
}
}
out_unlock:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
out:
return error ? : unmapped_error;
}
diff --git a/mm/nommu.c b/mm/nommu.c
index 1805f0a788b3..575525e86961 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -187,10 +187,10 @@ static long __get_user_pages_unlocked(struct task_struct *tsk,
long ret;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
ret = __get_user_pages(tsk, mm, start, nr_pages, gup_flags, pages,
NULL, NULL, &mmrange);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return ret;
}

@@ -253,12 +253,13 @@ void *vmalloc_user(unsigned long size)
ret = __vmalloc(size, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
if (ret) {
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);
vma = find_vma(current->mm, (unsigned long)ret);
if (vma)
vma->vm_flags |= VM_USERMAP;
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
}

return ret;
@@ -1651,9 +1652,9 @@ int vm_munmap(unsigned long addr, size_t len)
int ret;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
ret = do_munmap(mm, addr, len, NULL, &mmrange);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return ret;
}
EXPORT_SYMBOL(vm_munmap);
@@ -1739,10 +1740,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
unsigned long, new_addr)
{
unsigned long ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_write(&current->mm->mmap_sem);
+ mm_write_lock(current->mm, &mmrange);
ret = do_mremap(addr, old_len, new_len, flags, new_addr);
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
return ret;
}

@@ -1815,8 +1817,9 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
{
struct vm_area_struct *vma;
int write = gup_flags & FOLL_WRITE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

/* the access must start within one of the target process's mappings */
vma = find_vma(mm, addr);
@@ -1838,7 +1841,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
len = 0;
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

return len;
}
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 2288e1cb1bc9..6bf9cb38bfe1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -508,7 +508,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
*/
mutex_lock(&oom_lock);

- if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!mm_read_trylock(mm, &mmrange)) {
ret = false;
trace_skip_task_reaping(tsk->pid);
goto unlock_oom;
@@ -521,7 +521,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
* notifiers cannot block for unbounded amount of time
*/
if (mm_has_blockable_invalidate_notifiers(mm, &mmrange)) {
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
schedule_timeout_idle(HZ);
goto unlock_oom;
}
@@ -533,7 +533,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
* down_write();up_write() cycle in exit_mmap().
*/
if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
trace_skip_task_reaping(tsk->pid);
goto unlock_oom;
}
@@ -578,7 +578,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
K(get_mm_counter(mm, MM_ANONPAGES)),
K(get_mm_counter(mm, MM_FILEPAGES)),
K(get_mm_counter(mm, MM_SHMEMPAGES)));
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

trace_finish_task_reaping(tsk->pid);
unlock_oom:
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 44a2507c94fd..55a4dcc519cd 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -301,7 +301,7 @@ int walk_page_range(unsigned long start, unsigned long end,
if (!walk->mm)
return -EINVAL;

- VM_BUG_ON_MM(!rwsem_is_locked(&walk->mm->mmap_sem), walk->mm);
+ VM_BUG_ON_MM(!mm_is_locked(walk->mm, mmrange), walk->mm);

vma = find_vma(walk->mm, start);
do {
@@ -345,7 +345,7 @@ int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk,
if (!walk->mm)
return -EINVAL;

- VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
+ VM_BUG_ON(!mm_is_locked(walk->mm, mmrange));
VM_BUG_ON(!vma);
walk->vma = vma;
err = walk_page_test(vma->vm_start, vma->vm_end, walk, mmrange);
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index ff6772b86195..aaccb8972f83 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -110,12 +110,12 @@ static int process_vm_rw_single_vec(unsigned long addr,
* access remotely because task/mm might not
* current/current->mm
*/
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
pages = get_user_pages_remote(task, mm, pa, pages, flags,
process_pages, NULL, &locked,
&mmrange);
if (locked)
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
if (pages <= 0)
return -EFAULT;

diff --git a/mm/shmem.c b/mm/shmem.c
index 1907688b75ee..8a99281bf502 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1961,7 +1961,7 @@ static int shmem_fault(struct vm_fault *vmf)
if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
/* It's polite to up mmap_sem if we can */
- up_read(&vma->vm_mm->mmap_sem);
+ mm_read_unlock(vma->vm_mm, vmf->lockrange);
ret = VM_FAULT_RETRY;
}

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 006047b16814..d9c6ca32b94f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1958,15 +1958,16 @@ static int unuse_mm(struct mm_struct *mm,
{
struct vm_area_struct *vma;
int ret = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!mm_read_trylock(mm, &mmrange)) {
/*
* Activate page so shrink_inactive_list is unlikely to unmap
* its ptes while lock is dropped, so swapoff can make progress.
*/
activate_page(page);
unlock_page(page);
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
lock_page(page);
}
for (vma = mm->mmap; vma; vma = vma->vm_next) {
@@ -1974,7 +1975,7 @@ static int unuse_mm(struct mm_struct *mm,
break;
cond_resched();
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
return (ret < 0)? ret: 0;
}

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 39791b81ede7..8ad13bea799d 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -155,7 +155,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
- bool zeropage)
+ bool zeropage,
+ struct range_lock *mmrange)
{
int vm_alloc_shared = dst_vma->vm_flags & VM_SHARED;
int vm_shared = dst_vma->vm_flags & VM_SHARED;
@@ -177,7 +178,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
* feature is not supported.
*/
if (zeropage) {
- up_read(&dst_mm->mmap_sem);
+ mm_read_unlock(dst_mm, mmrange);
return -EINVAL;
}

@@ -275,7 +276,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
cond_resched();

if (unlikely(err == -EFAULT)) {
- up_read(&dst_mm->mmap_sem);
+ mm_read_unlock(dst_mm, mmrange);
BUG_ON(!page);

err = copy_huge_page_from_user(page,
@@ -285,7 +286,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
err = -EFAULT;
goto out;
}
- down_read(&dst_mm->mmap_sem);
+ mm_read_lock(dst_mm, mmrange);

dst_vma = NULL;
goto retry;
@@ -305,7 +306,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
}

out_unlock:
- up_read(&dst_mm->mmap_sem);
+ mm_read_unlock(dst_mm, mmrange);
out:
if (page) {
/*
@@ -367,7 +368,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
- bool zeropage);
+ bool zeropage,
+ struct range_lock *mmrange);
#endif /* CONFIG_HUGETLB_PAGE */

static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -412,6 +414,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
unsigned long src_addr, dst_addr;
long copied;
struct page *page;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* Sanitize the command parameters:
@@ -428,7 +431,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
copied = 0;
page = NULL;
retry:
- down_read(&dst_mm->mmap_sem);
+ mm_read_lock(dst_mm, &mmrange);

/*
* Make sure the vma is not shared, that the dst range is
@@ -468,7 +471,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
*/
if (is_vm_hugetlb_page(dst_vma))
return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
- src_start, len, zeropage);
+ src_start, len, zeropage,
+ &mmrange);

if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
goto out_unlock;
@@ -523,7 +527,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
if (unlikely(err == -EFAULT)) {
void *page_kaddr;

- up_read(&dst_mm->mmap_sem);
+ mm_read_unlock(dst_mm, &mmrange);
BUG_ON(!page);

page_kaddr = kmap(page);
@@ -552,7 +556,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
}

out_unlock:
- up_read(&dst_mm->mmap_sem);
+ mm_read_unlock(dst_mm, &mmrange);
out:
if (page)
put_page(page);
diff --git a/mm/util.c b/mm/util.c
index b0ec1d88bb71..e17c6c74cc23 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -351,11 +351,11 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,

ret = security_mmap_file(file, prot, flag);
if (!ret) {
- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;
ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
&populate, &uf, &mmrange);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
userfaultfd_unmap_complete(mm, &uf);
if (populate)
mm_populate(ret, populate);
@@ -715,18 +715,19 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
int res = 0;
unsigned int len;
struct mm_struct *mm = get_task_mm(task);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
unsigned long arg_start, arg_end, env_start, env_end;
if (!mm)
goto out;
if (!mm->arg_end)
goto out_mm; /* Shh! No looking before we're done */

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
env_end = mm->env_end;
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

len = arg_end - arg_start;

--
2.13.6


2018-02-05 01:42:51

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 26/64] fs: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

Also fix up some of the previous userfaultfd changes:
userfaultfd_remove() can drop the lock, so it now takes the
caller's mmrange. No change in semantics.
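
The caller side then looks like the madvise hunk below (a sketch; only
the dontneed/free path is shown):

        if (!userfaultfd_remove(vma, start, end, mmrange)) {
                /* the lock was dropped inside userfaultfd_remove() */
                *prev = NULL;      /* prev is stale after the drop */
                mm_read_lock(current->mm, mmrange);
        }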

Signed-off-by: Davidlohr Bueso <[email protected]>
---
fs/aio.c | 4 ++--
fs/userfaultfd.c | 26 ++++++++++++++------------
include/linux/userfaultfd_k.h | 5 +++--
mm/madvise.c | 4 ++--
4 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 31774b75c372..98affcf36b97 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -512,7 +512,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
ctx->mmap_size = nr_pages * PAGE_SIZE;
pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size);

- if (down_write_killable(&mm->mmap_sem)) {
+ if (mm_write_lock_killable(mm, &mmrange)) {
ctx->mmap_size = 0;
aio_free_ring(ctx);
return -EINTR;
@@ -521,7 +521,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size,
PROT_READ | PROT_WRITE,
MAP_SHARED, 0, &unused, NULL, &mmrange);
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
if (IS_ERR((void *)ctx->mmap_base)) {
ctx->mmap_size = 0;
aio_free_ring(ctx);
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 883fbffb284e..805bdc7ecf2d 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -482,7 +482,7 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason)
vmf->address,
vmf->flags, reason,
vmf->lockrange);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, vmf->lockrange);

if (likely(must_wait && !READ_ONCE(ctx->released) &&
(return_to_userland ? !signal_pending(current) :
@@ -536,7 +536,7 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason)
* and there's no need to retake the mmap_sem
* in such case.
*/
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, vmf->lockrange);
ret = VM_FAULT_NOPAGE;
}
}
@@ -629,13 +629,14 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
if (release_new_ctx) {
struct vm_area_struct *vma;
struct mm_struct *mm = release_new_ctx->mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* the various vma->vm_userfaultfd_ctx still points to it */
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
for (vma = mm->mmap; vma; vma = vma->vm_next)
if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx)
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

userfaultfd_ctx_put(release_new_ctx);
}
@@ -765,7 +766,8 @@ void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *vm_ctx,
}

bool userfaultfd_remove(struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = vma->vm_mm;
struct userfaultfd_ctx *ctx;
@@ -776,7 +778,7 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
return true;

userfaultfd_ctx_get(ctx);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, mmrange);

msg_init(&ewq.msg);

@@ -870,7 +872,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
* it's critical that released is set to true (above), before
* taking the mmap_sem for writing.
*/
- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
prev = NULL;
for (vma = mm->mmap; vma; vma = vma->vm_next) {
cond_resched();
@@ -893,7 +895,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
vma->vm_flags = new_flags;
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
}
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mmput(mm);
wakeup:
/*
@@ -1321,7 +1323,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
if (!mmget_not_zero(mm))
goto out;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
@@ -1450,7 +1452,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
vma = vma->vm_next;
} while (vma && vma->vm_start < end);
out_unlock:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mmput(mm);
if (!ret) {
/*
@@ -1496,7 +1498,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
if (!mmget_not_zero(mm))
goto out;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
@@ -1609,7 +1611,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
vma = vma->vm_next;
} while (vma && vma->vm_start < end);
out_unlock:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
mmput(mm);
out:
return ret;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index f2f3b68ba910..35164358245f 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -64,7 +64,7 @@ extern void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *,

extern bool userfaultfd_remove(struct vm_area_struct *vma,
unsigned long start,
- unsigned long end);
+ unsigned long end, struct range_lock *mmrange);

extern int userfaultfd_unmap_prep(struct vm_area_struct *vma,
unsigned long start, unsigned long end,
@@ -120,7 +120,8 @@ static inline void mremap_userfaultfd_complete(struct vm_userfaultfd_ctx *ctx,

static inline bool userfaultfd_remove(struct vm_area_struct *vma,
unsigned long start,
- unsigned long end)
+ unsigned long end,
+ struct range_lock *mmrange)
{
return true;
}
diff --git a/mm/madvise.c b/mm/madvise.c
index de8fb035955c..9ba23187445b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -529,7 +529,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
if (!can_madv_dontneed_vma(vma))
return -EINVAL;

- if (!userfaultfd_remove(vma, start, end)) {
+ if (!userfaultfd_remove(vma, start, end, mmrange)) {
*prev = NULL; /* mmap_sem has been dropped, prev is stale */

mm_read_lock(current->mm, mmrange);
@@ -613,7 +613,7 @@ static long madvise_remove(struct vm_area_struct *vma,
* mmap_sem.
*/
get_file(f);
- if (userfaultfd_remove(vma, start, end)) {
+ if (userfaultfd_remove(vma, start, end, mmrange)) {
/* mmap_sem was not released by userfaultfd_remove() */
mm_read_unlock(current->mm, mmrange);
}
--
2.13.6


2018-02-05 01:43:35

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 16/64] virt: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

No change in semantics.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
virt/kvm/arm/mmu.c | 17 ++++++++++-------
virt/kvm/async_pf.c | 4 ++--
virt/kvm/kvm_main.c | 9 +++++----
3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ec62d1cccab7..9a866a639c2c 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -815,9 +815,10 @@ void stage2_unmap_vm(struct kvm *kvm)
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
int idx;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

idx = srcu_read_lock(&kvm->srcu);
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
spin_lock(&kvm->mmu_lock);

slots = kvm_memslots(kvm);
@@ -825,7 +826,7 @@ void stage2_unmap_vm(struct kvm *kvm)
stage2_unmap_memslot(kvm, memslot);

spin_unlock(&kvm->mmu_lock);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
srcu_read_unlock(&kvm->srcu, idx);
}

@@ -1317,6 +1318,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
pgprot_t mem_type = PAGE_S2;
bool logging_active = memslot_is_logging(memslot);
unsigned long flags = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

write_fault = kvm_is_write_fault(vcpu);
exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1328,11 +1330,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}

/* Let's check if we will get back a huge page backed by hugetlbfs */
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma_intersection(current->mm, hva, hva + 1);
if (unlikely(!vma)) {
kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return -EFAULT;
}

@@ -1353,7 +1355,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
((memslot->base_gfn << PAGE_SHIFT) & ~PMD_MASK))
force_pte = true;
}
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

/* We need minimum second+third level pages */
ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES,
@@ -1889,6 +1891,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
hva_t reg_end = hva + mem->memory_size;
bool writable = !(mem->flags & KVM_MEM_READONLY);
int ret = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
change != KVM_MR_FLAGS_ONLY)
@@ -1902,7 +1905,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
(KVM_PHYS_SIZE >> PAGE_SHIFT))
return -EFAULT;

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
/*
* A memory region could potentially cover multiple VMAs, and any holes
* between them, so iterate over all of them to find out if we can map
@@ -1970,7 +1973,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
stage2_flush_memslot(kvm, memslot);
spin_unlock(&kvm->mmu_lock);
out:
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return ret;
}

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 4cd2b93bb20c..ed559789d7cb 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -87,11 +87,11 @@ static void async_pf_execute(struct work_struct *work)
* mm and might be done in another context, so we must
* access remotely.
*/
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
&locked, &mmrange);
if (locked)
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

kvm_async_page_present_sync(vcpu, apf);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 86ec078f4c3b..92fd944e7e3a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1222,6 +1222,7 @@ EXPORT_SYMBOL_GPL(kvm_is_visible_gfn);
unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
{
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
unsigned long addr, size;

size = PAGE_SIZE;
@@ -1230,7 +1231,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
if (kvm_is_error_hva(addr))
return PAGE_SIZE;

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, addr);
if (!vma)
goto out;
@@ -1238,7 +1239,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
size = vma_kernel_pagesize(vma);

out:
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

return size;
}
@@ -1494,7 +1495,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
if (npages == 1)
return pfn;

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
if (npages == -EHWPOISON ||
(!async && check_user_page_hwpoison(addr, &mmrange))) {
pfn = KVM_PFN_ERR_HWPOISON;
@@ -1519,7 +1520,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
pfn = KVM_PFN_ERR_FAULT;
}
exit:
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
return pfn;
}

--
2.13.6


2018-02-05 01:43:39

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 20/64] mm/madvise: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

The mmap_sem users here are already aware of mmrange, so this is a
straightforward conversion. No changes in semantics.
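
The one non-trivial case is madvise_remove(), where userfaultfd_remove()
may drop mmap_sem; the same mmrange that the syscall used to acquire the
lock is reused across the drop and reacquire. Condensed from the
resulting code:

get_file(f);
if (userfaultfd_remove(vma, start, end)) {
        /* mmap_sem was not released by userfaultfd_remove() */
        mm_read_unlock(current->mm, mmrange);
}
error = vfs_fallocate(f, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      offset, end - start);
fput(f);
mm_read_lock(current->mm, mmrange);
return error;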

Signed-off-by: Davidlohr Bueso <[email protected]>
---
mm/madvise.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index eaec6bfc2b08..de8fb035955c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -532,7 +532,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
if (!userfaultfd_remove(vma, start, end)) {
*prev = NULL; /* mmap_sem has been dropped, prev is stale */

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, mmrange);
vma = find_vma(current->mm, start);
if (!vma)
return -ENOMEM;
@@ -582,7 +582,8 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
*/
static long madvise_remove(struct vm_area_struct *vma,
struct vm_area_struct **prev,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
loff_t offset;
int error;
@@ -614,13 +615,13 @@ static long madvise_remove(struct vm_area_struct *vma,
get_file(f);
if (userfaultfd_remove(vma, start, end)) {
/* mmap_sem was not released by userfaultfd_remove() */
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, mmrange);
}
error = vfs_fallocate(f,
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
offset, end - start);
fput(f);
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, mmrange);
return error;
}

@@ -690,7 +691,7 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
{
switch (behavior) {
case MADV_REMOVE:
- return madvise_remove(vma, prev, start, end);
+ return madvise_remove(vma, prev, start, end, mmrange);
case MADV_WILLNEED:
return madvise_willneed(vma, prev, start, end, mmrange);
case MADV_FREE:
@@ -809,6 +810,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
int write;
size_t len;
struct blk_plug plug;
+
DEFINE_RANGE_LOCK_FULL(mmrange);
if (!madvise_behavior_valid(behavior))
return error;
@@ -836,10 +838,10 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)

write = madvise_need_mmap_write(behavior);
if (write) {
- if (down_write_killable(&current->mm->mmap_sem))
+ if (mm_write_lock_killable(current->mm, &mmrange))
return -EINTR;
} else {
- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
}

/*
@@ -889,9 +891,9 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
out:
blk_finish_plug(&plug);
if (write)
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
else
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);

return error;
}
--
2.13.6


2018-02-05 01:43:41

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 19/64] mm/mlock: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

The conversion is straightforward; mmap_sem is acquired and released
within the same function context. No changes in semantics.
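
The only wrinkle is the killable write acquire; presumably the wrapper
is a simple passthrough to down_write_killable() at this stage of the
series (a sketch, not the actual wrapper implementation):

static inline int mm_write_lock_killable(struct mm_struct *mm,
                                         struct range_lock *range)
{
        return down_write_killable(&mm->mmap_sem);
}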

Signed-off-by: Davidlohr Bueso <[email protected]>
---
mm/mlock.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 3f6bd953e8b0..dfd175b2cf20 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -686,7 +686,7 @@ static __must_check int do_mlock(unsigned long start, size_t len,
lock_limit >>= PAGE_SHIFT;
locked = len >> PAGE_SHIFT;

- if (down_write_killable(&current->mm->mmap_sem))
+ if (mm_write_lock_killable(current->mm, &mmrange))
return -EINTR;

locked += current->mm->locked_vm;
@@ -705,7 +705,7 @@ static __must_check int do_mlock(unsigned long start, size_t len,
if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
error = apply_vma_lock_flags(start, len, flags, &mmrange);

- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
if (error)
return error;

@@ -741,10 +741,10 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
len = PAGE_ALIGN(len + (offset_in_page(start)));
start &= PAGE_MASK;

- if (down_write_killable(&current->mm->mmap_sem))
+ if (mm_write_lock_killable(current->mm, &mmrange))
return -EINTR;
ret = apply_vma_lock_flags(start, len, 0, &mmrange);
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);

return ret;
}
@@ -811,14 +811,14 @@ SYSCALL_DEFINE1(mlockall, int, flags)
lock_limit = rlimit(RLIMIT_MEMLOCK);
lock_limit >>= PAGE_SHIFT;

- if (down_write_killable(&current->mm->mmap_sem))
+ if (mm_write_lock_killable(current->mm, &mmrange))
return -EINTR;

ret = -ENOMEM;
if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
capable(CAP_IPC_LOCK))
ret = apply_mlockall_flags(flags, &mmrange);
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
if (!ret && (flags & MCL_CURRENT))
mm_populate(0, TASK_SIZE);

@@ -830,10 +830,10 @@ SYSCALL_DEFINE0(munlockall)
int ret;
DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&current->mm->mmap_sem))
+ if (mm_write_lock_killable(current->mm, &mmrange))
return -EINTR;
ret = apply_mlockall_flags(0, &mmrange);
- up_write(&current->mm->mmap_sem);
+ mm_write_unlock(current->mm, &mmrange);
return ret;
}

--
2.13.6


2018-02-05 01:43:44

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 17/64] kernel: use mm locking wrappers

From: Davidlohr Bueso <[email protected]>

Most of the users are already aware of mmrange, so the conversion
is straightforward. Those that are not all use mmap_sem within the
same function context. No change in semantics.

dup_mmap() needs two ranges, one for the old mm and one for the new.
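
Condensed, the locking in dup_mmap() becomes (as in the hunk below):

DEFINE_RANGE_LOCK_FULL(old_mmrange);
DEFINE_RANGE_LOCK_FULL(mmrange);        /* for the new mm */

if (mm_write_lock_killable(oldmm, &old_mmrange))
        return -EINTR;
/* Not linked in yet - no deadlock potential: */
mm_write_lock_nested(mm, &mmrange, SINGLE_DEPTH_NESTING);
...
mm_write_unlock(mm, &mmrange);
flush_tlb_mm(oldmm);
mm_write_unlock(oldmm, &old_mmrange);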

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/acct.c | 5 +++--
kernel/events/core.c | 5 +++--
kernel/events/uprobes.c | 17 +++++++++--------
kernel/fork.c | 16 ++++++++++------
kernel/futex.c | 4 ++--
kernel/sched/fair.c | 5 +++--
kernel/trace/trace_output.c | 5 +++--
7 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/kernel/acct.c b/kernel/acct.c
index addf7732fb56..bc8826f68002 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -538,14 +538,15 @@ void acct_collect(long exitcode, int group_dead)

if (group_dead && current->mm) {
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&current->mm->mmap_sem);
+ mm_read_lock(current->mm, &mmrange);
vma = current->mm->mmap;
while (vma) {
vsize += vma->vm_end - vma->vm_start;
vma = vma->vm_next;
}
- up_read(&current->mm->mmap_sem);
+ mm_read_unlock(current->mm, &mmrange);
}

spin_lock_irq(&current->sighand->siglock);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0549e79978b..b21d0942d225 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8264,6 +8264,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
struct mm_struct *mm = NULL;
unsigned int count = 0;
unsigned long flags;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* We may observe TASK_TOMBSTONE, which means that the event tear-down
@@ -8279,7 +8280,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
if (!mm)
goto restart;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);

raw_spin_lock_irqsave(&ifh->lock, flags);
list_for_each_entry(filter, &ifh->list, entry) {
@@ -8299,7 +8300,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
event->addr_filters_gen++;
raw_spin_unlock_irqrestore(&ifh->lock, flags);

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

mmput(mm);

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 60e12b39182c..df6da03d5dc1 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -818,7 +818,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
if (err && is_register)
goto free;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
vma = find_vma(mm, info->vaddr);
if (!vma || !valid_vma(vma, is_register) ||
file_inode(vma->vm_file) != uprobe->inode)
@@ -842,7 +842,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
}

unlock:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
free:
mmput(mm);
info = free_map_info(info);
@@ -984,7 +984,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
int err = 0;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
unsigned long vaddr;
loff_t offset;
@@ -1001,7 +1001,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
vaddr = offset_to_vaddr(vma, uprobe->offset);
err |= remove_breakpoint(uprobe, mm, vaddr, &mmrange);
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

return err;
}
@@ -1150,8 +1150,9 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
{
struct vm_area_struct *vma;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- if (down_write_killable(&mm->mmap_sem))
+ if (mm_write_lock_killable(mm, &mmrange))
return -EINTR;

if (mm->uprobes_state.xol_area) {
@@ -1181,7 +1182,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
/* pairs with get_xol_area() */
smp_store_release(&mm->uprobes_state.xol_area, area); /* ^^^ */
fail:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);

return ret;
}
@@ -1748,7 +1749,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
struct vm_area_struct *vma;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, bp_vaddr);
if (vma && vma->vm_start <= bp_vaddr) {
if (valid_vma(vma, false)) {
@@ -1766,7 +1767,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)

if (!uprobe && test_and_clear_bit(MMF_RECALC_UPROBES, &mm->flags))
mmf_recalc_uprobes(mm);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

return uprobe;
}
diff --git a/kernel/fork.c b/kernel/fork.c
index 2113e252cb9d..060554e33111 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -401,9 +401,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
int retval;
unsigned long charge;
LIST_HEAD(uf);
+ DEFINE_RANGE_LOCK_FULL(old_mmrange);
+ DEFINE_RANGE_LOCK_FULL(mmrange); /* for the new mm */

uprobe_start_dup_mmap();
- if (down_write_killable(&oldmm->mmap_sem)) {
+ if (mm_write_lock_killable(oldmm, &old_mmrange)) {
retval = -EINTR;
goto fail_uprobe_end;
}
@@ -412,7 +414,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
/*
* Not linked in yet - no deadlock potential:
*/
- down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING);
+ mm_write_lock_nested(mm, &mmrange, SINGLE_DEPTH_NESTING);

/* No ordering required: file already has been exposed. */
RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
@@ -522,9 +524,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
arch_dup_mmap(oldmm, mm);
retval = 0;
out:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
flush_tlb_mm(oldmm);
- up_write(&oldmm->mmap_sem);
+ mm_write_unlock(oldmm, &old_mmrange);
dup_userfaultfd_complete(&uf);
fail_uprobe_end:
uprobe_end_dup_mmap();
@@ -554,9 +556,11 @@ static inline void mm_free_pgd(struct mm_struct *mm)
#else
static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
{
- down_write(&oldmm->mmap_sem);
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ mm_write_lock(oldmm, &mmrange);
RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
- up_write(&oldmm->mmap_sem);
+ mm_write_unlock(oldmm, &mmrange);
return 0;
}
#define mm_alloc_pgd(mm) (0)
diff --git a/kernel/futex.c b/kernel/futex.c
index 09a0d86f80a0..6764240e87bb 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -727,10 +727,10 @@ static int fault_in_user_writeable(u32 __user *uaddr)
int ret;
DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
FAULT_FLAG_WRITE, NULL, &mmrange);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

return ret < 0 ? ret : 0;
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b6535987500..01f8c533aa21 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2470,6 +2470,7 @@ void task_numa_work(struct callback_head *work)
struct vm_area_struct *vma;
unsigned long start, end;
unsigned long nr_pte_updates = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
long pages, virtpages;

SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work));
@@ -2521,7 +2522,7 @@ void task_numa_work(struct callback_head *work)
return;


- if (!down_read_trylock(&mm->mmap_sem))
+ if (!mm_read_trylock(mm, &mmrange))
return;
vma = find_vma(mm, start);
if (!vma) {
@@ -2589,7 +2590,7 @@ void task_numa_work(struct callback_head *work)
mm->numa_scan_offset = start;
else
reset_ptenuma_scan(p);
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

/*
* Make sure tasks use at least 32x as much time to run other code
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 90db994ac900..0c3f5193de41 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -395,8 +395,9 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,

if (mm) {
const struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
vma = find_vma(mm, ip);
if (vma) {
file = vma->vm_file;
@@ -408,7 +409,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
trace_seq_printf(s, "[+0x%lx]",
ip - vmstart);
}
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
}
if (ret && ((sym_flags & TRACE_ITER_SYM_ADDR) || !file))
trace_seq_printf(s, " <" IP_FMT ">", ip);
--
2.13.6


2018-02-05 01:44:13

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 10/64] kernel/exit: teach exit_mm() about range locking

From: Davidlohr Bueso <[email protected]>

... and use mm locking wrappers -- no change in semantics.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/exit.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 42ca71a44c9a..a9540f157eb2 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -495,6 +495,7 @@ static void exit_mm(void)
{
struct mm_struct *mm = current->mm;
struct core_state *core_state;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

mm_release(current, mm);
if (!mm)
@@ -507,12 +508,12 @@ static void exit_mm(void)
* will increment ->nr_threads for each thread in the
* group with ->mm != NULL.
*/
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
core_state = mm->core_state;
if (core_state) {
struct core_thread self;

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);

self.task = current;
self.next = xchg(&core_state->dumper.next, &self);
@@ -530,14 +531,14 @@ static void exit_mm(void)
freezable_schedule();
}
__set_current_state(TASK_RUNNING);
- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
}
mmgrab(mm);
BUG_ON(mm != current->active_mm);
/* more a memory barrier than a real lock */
task_lock(current);
current->mm = NULL;
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
enter_lazy_tlb(mm, current);
task_unlock(current);
mm_update_next_owner(mm);
--
2.13.6


2018-02-05 01:44:26

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 06/64] mm: teach pagefault paths about range locking

From: Davidlohr Bueso <[email protected]>

In handle_mm_fault() we need to remember the range lock specified
when the mmap_sem was first taken, as page fault paths can drop the
lock. Although this patch may seem far too big at first, its size is
a consequence of keeping the series bisectable, and it makes the
later conversion patches quite easy to follow. Furthermore, most of
what this patch does is pass a pointer to a stack-allocated 'mmrange'
that is later used by the vm_fault structure. The new interfaces are
pretty much all in the following areas:

- vma handling (vma_merge(), vma_adjust(), split_vma(), copy_vma())
- gup family (all except get_user_pages_unlocked(), which internally
passes the mmrange).
- mm walking (walk_page_vma())
- mmap/unmap (do_mmap(), do_munmap())
- handle_mm_fault(), fixup_user_fault()

Most of the pain of the patch is updating all the callers in the
kernel for this. While tedious, it is hopefully not that hard to
review. The idea is to use a local variable (hence no concurrency on
it) whenever the mmap_sem is taken and we end up in page fault paths
that retake the lock. I.e.:

DEFINE_RANGE_LOCK_FULL(mmrange);

down_write(&mm->mmap_sem);
some_fn(a, b, c, &mmrange);
....
....
...
handle_mm_fault(vma, addr, flags, mmrange);
...
up_write(&mm->mmap_sem);

Semantically nothing changes at all, and the 'mmrange' ends up
being unused for now. Later patches will use the variable when
the mmap_sem wrappers replace straightforward down/up.
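
Concretely, the fault entry points now take the range, and struct
vm_fault carries it so the retry paths know which range was held when
the fault began (the member name below is illustrative; see the mm/
hunks of this patch for the actual layout):

struct vm_fault {
        ....
        struct range_lock *mmrange;     /* range held when the fault began */
};

int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
                    unsigned int flags, struct range_lock *mmrange);

int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
                     unsigned long address, unsigned int fault_flags,
                     bool *unlocked, struct range_lock *mmrange);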

Compile tested defconfigs on various non-x86 archs without breaking.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/alpha/mm/fault.c | 3 +-
arch/arc/mm/fault.c | 3 +-
arch/arm/mm/fault.c | 8 ++-
arch/arm/probes/uprobes/core.c | 5 +-
arch/arm64/mm/fault.c | 7 ++-
arch/cris/mm/fault.c | 3 +-
arch/frv/mm/fault.c | 3 +-
arch/hexagon/mm/vm_fault.c | 3 +-
arch/ia64/mm/fault.c | 3 +-
arch/m32r/mm/fault.c | 3 +-
arch/m68k/mm/fault.c | 3 +-
arch/metag/mm/fault.c | 3 +-
arch/microblaze/mm/fault.c | 3 +-
arch/mips/kernel/vdso.c | 3 +-
arch/mips/mm/fault.c | 3 +-
arch/mn10300/mm/fault.c | 3 +-
arch/nios2/mm/fault.c | 3 +-
arch/openrisc/mm/fault.c | 3 +-
arch/parisc/mm/fault.c | 3 +-
arch/powerpc/include/asm/mmu_context.h | 3 +-
arch/powerpc/include/asm/powernv.h | 5 +-
arch/powerpc/mm/copro_fault.c | 4 +-
arch/powerpc/mm/fault.c | 3 +-
arch/powerpc/platforms/powernv/npu-dma.c | 5 +-
arch/riscv/mm/fault.c | 3 +-
arch/s390/include/asm/gmap.h | 14 +++--
arch/s390/kvm/gaccess.c | 31 ++++++----
arch/s390/mm/fault.c | 3 +-
arch/s390/mm/gmap.c | 80 +++++++++++++++---------
arch/score/mm/fault.c | 3 +-
arch/sh/mm/fault.c | 3 +-
arch/sparc/mm/fault_32.c | 6 +-
arch/sparc/mm/fault_64.c | 3 +-
arch/tile/mm/fault.c | 3 +-
arch/um/include/asm/mmu_context.h | 3 +-
arch/um/kernel/trap.c | 3 +-
arch/unicore32/mm/fault.c | 8 ++-
arch/x86/entry/vdso/vma.c | 3 +-
arch/x86/include/asm/mmu_context.h | 5 +-
arch/x86/include/asm/mpx.h | 6 +-
arch/x86/mm/fault.c | 3 +-
arch/x86/mm/mpx.c | 41 ++++++++-----
arch/xtensa/mm/fault.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +-
drivers/gpu/drm/i915/i915_gem_userptr.c | 4 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 4 +-
drivers/infiniband/core/umem.c | 3 +-
drivers/infiniband/core/umem_odp.c | 3 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 7 ++-
drivers/infiniband/hw/usnic/usnic_uiom.c | 3 +-
drivers/iommu/amd_iommu_v2.c | 5 +-
drivers/iommu/intel-svm.c | 5 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 18 ++++--
drivers/misc/mic/scif/scif_rma.c | 3 +-
drivers/misc/sgi-gru/grufault.c | 43 ++++++++-----
drivers/vfio/vfio_iommu_type1.c | 3 +-
fs/aio.c | 3 +-
fs/binfmt_elf.c | 3 +-
fs/exec.c | 20 ++++--
fs/proc/internal.h | 3 +
fs/proc/task_mmu.c | 29 ++++++---
fs/proc/vmcore.c | 14 ++++-
fs/userfaultfd.c | 18 +++---
include/asm-generic/mm_hooks.h | 3 +-
include/linux/hmm.h | 4 +-
include/linux/ksm.h | 6 +-
include/linux/migrate.h | 4 +-
include/linux/mm.h | 73 +++++++++++++---------
include/linux/uprobes.h | 15 +++--
ipc/shm.c | 14 +++--
kernel/events/uprobes.c | 49 +++++++++------
kernel/futex.c | 3 +-
mm/frame_vector.c | 4 +-
mm/gup.c | 60 ++++++++++--------
mm/hmm.c | 37 ++++++-----
mm/internal.h | 3 +-
mm/ksm.c | 24 +++++---
mm/madvise.c | 58 ++++++++++-------
mm/memcontrol.c | 13 ++--
mm/memory.c | 10 +--
mm/mempolicy.c | 35 ++++++-----
mm/migrate.c | 20 +++---
mm/mincore.c | 24 +++++---
mm/mlock.c | 33 ++++++----
mm/mmap.c | 99 +++++++++++++++++-------------
mm/mprotect.c | 14 +++--
mm/mremap.c | 30 +++++----
mm/nommu.c | 32 ++++++----
mm/pagewalk.c | 56 +++++++++--------
mm/process_vm_access.c | 4 +-
mm/util.c | 3 +-
security/tomoyo/domain.c | 3 +-
virt/kvm/async_pf.c | 3 +-
virt/kvm/kvm_main.c | 16 +++--
94 files changed, 784 insertions(+), 474 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index cd3c572ee912..690d86a00a20 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -90,6 +90,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
int fault, si_code = SEGV_MAPERR;
siginfo_t info;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* As of EV6, a load into $31/$f31 is a prefetch, and never faults
(or is suppressed by the PALcode). Support that for older CPUs
@@ -148,7 +149,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
/* If for any reason at all we couldn't handle the fault,
make sure we exit gracefully rather than endlessly redo
the fault. */
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index a0b7bd6d030d..e423f764f159 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -69,6 +69,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
int fault, ret;
int write = regs->ecr_cause & ECR_C_PROTV_STORE; /* ST/EX */
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* We fault-in kernel-space virtual memory on-demand. The
@@ -137,7 +138,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

/* If Pagefault was interrupted by SIGKILL, exit page fault "early" */
if (unlikely(fatal_signal_pending(current))) {
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index b75eada23d0a..99ae40b5851a 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -221,7 +221,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)

static int __kprobes
__do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
- unsigned int flags, struct task_struct *tsk)
+ unsigned int flags, struct task_struct *tsk,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma;
int fault;
@@ -243,7 +244,7 @@ __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
goto out;
}

- return handle_mm_fault(vma, addr & PAGE_MASK, flags);
+ return handle_mm_fault(vma, addr & PAGE_MASK, flags, mmrange);

check_stack:
/* Don't allow expansion below FIRST_USER_ADDRESS */
@@ -261,6 +262,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
struct mm_struct *mm;
int fault, sig, code;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (notify_page_fault(regs, fsr))
return 0;
@@ -308,7 +310,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
#endif
}

- fault = __do_page_fault(mm, addr, fsr, flags, tsk);
+ fault = __do_page_fault(mm, addr, fsr, flags, tsk, &mmrange);

/* If we need to retry but a fatal signal is pending, handle the
* signal first. We do not need to release the mmap_sem because
diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index d1329f1ba4e4..e8b893eaebcf 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -30,10 +30,11 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
}

int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
- unsigned long vaddr)
+ unsigned long vaddr, struct range_lock *mmrange)
{
return uprobe_write_opcode(mm, vaddr,
- __opcode_to_mem_arm(auprobe->bpinsn));
+ __opcode_to_mem_arm(auprobe->bpinsn),
+ mmrange);
}

bool arch_uprobe_ignore(struct arch_uprobe *auprobe, struct pt_regs *regs)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index ce441d29e7f6..1f3ad9e4f214 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -342,7 +342,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re

static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
unsigned int mm_flags, unsigned long vm_flags,
- struct task_struct *tsk)
+ struct task_struct *tsk, struct range_lock *mmrange)
{
struct vm_area_struct *vma;
int fault;
@@ -368,7 +368,7 @@ static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
goto out;
}

- return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags);
+ return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags, mmrange);

check_stack:
if (vma->vm_flags & VM_GROWSDOWN && !expand_stack(vma, addr))
@@ -390,6 +390,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
int fault, sig, code, major = 0;
unsigned long vm_flags = VM_READ | VM_WRITE;
unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (notify_page_fault(regs, esr))
return 0;
@@ -450,7 +451,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
#endif
}

- fault = __do_page_fault(mm, addr, mm_flags, vm_flags, tsk);
+ fault = __do_page_fault(mm, addr, mm_flags, vm_flags, tsk, &mmrange);
major |= fault & VM_FAULT_MAJOR;

if (fault & VM_FAULT_RETRY) {
diff --git a/arch/cris/mm/fault.c b/arch/cris/mm/fault.c
index 29cc58038b98..16af16d77269 100644
--- a/arch/cris/mm/fault.c
+++ b/arch/cris/mm/fault.c
@@ -61,6 +61,7 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
siginfo_t info;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

D(printk(KERN_DEBUG
"Page fault for %lX on %X at %lX, prot %d write %d\n",
@@ -170,7 +171,7 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
* the fault.
*/

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/frv/mm/fault.c b/arch/frv/mm/fault.c
index cbe7aec863e3..494d33b628fc 100644
--- a/arch/frv/mm/fault.c
+++ b/arch/frv/mm/fault.c
@@ -41,6 +41,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
pud_t *pue;
pte_t *pte;
int fault;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

#if 0
const char *atxc[16] = {
@@ -165,7 +166,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, ear0, flags);
+ fault = handle_mm_fault(vma, ear0, flags, &mmrange);
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)
goto out_of_memory;
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index 3eec33c5cfd7..7d6ada2c2230 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -55,6 +55,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
int fault;
const struct exception_table_entry *fixup;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* If we're in an interrupt or have no user context,
@@ -102,7 +103,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
break;
}

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index dfdc152d6737..44f0ec5f77c2 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -89,6 +89,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
unsigned long mask;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

mask = ((((isr >> IA64_ISR_X_BIT) & 1UL) << VM_EXEC_BIT)
| (((isr >> IA64_ISR_W_BIT) & 1UL) << VM_WRITE_BIT));
@@ -162,7 +163,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
* sure we exit gracefully rather than endlessly redo the
* fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/m32r/mm/fault.c b/arch/m32r/mm/fault.c
index 46d9a5ca0e3a..0129aea46729 100644
--- a/arch/m32r/mm/fault.c
+++ b/arch/m32r/mm/fault.c
@@ -82,6 +82,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
unsigned long flags = 0;
int fault;
siginfo_t info;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* If BPSW IE bit enable --> set PSW IE bit
@@ -197,7 +198,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
*/
addr = (address & PAGE_MASK);
set_thread_fault_code(error_code);
- fault = handle_mm_fault(vma, addr, flags);
+ fault = handle_mm_fault(vma, addr, flags, &mmrange);
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)
goto out_of_memory;
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 03253c4f8e6a..ec32a193726f 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -75,6 +75,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
struct vm_area_struct * vma;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

pr_debug("do page fault:\nregs->sr=%#x, regs->pc=%#lx, address=%#lx, %ld, %p\n",
regs->sr, regs->pc, address, error_code, mm ? mm->pgd : NULL);
@@ -138,7 +139,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* the fault.
*/

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);
pr_debug("handle_mm_fault returns %d\n", fault);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
diff --git a/arch/metag/mm/fault.c b/arch/metag/mm/fault.c
index de54fe686080..e16ba0ea7ea1 100644
--- a/arch/metag/mm/fault.c
+++ b/arch/metag/mm/fault.c
@@ -56,6 +56,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
siginfo_t info;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

tsk = current;

@@ -135,7 +136,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return 0;
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index f91b30f8aaa8..fd49efbdfbf4 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -93,6 +93,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
int is_write = error_code & ESR_S;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

regs->ear = address;
regs->esr = error_code;
@@ -216,7 +217,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 019035d7225c..56b7c29991db 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -102,6 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
unsigned long gic_size, vvar_size, size, base, data_addr, vdso_addr, gic_pfn;
struct vm_area_struct *vma;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (down_write_killable(&mm->mmap_sem))
return -EINTR;
@@ -110,7 +111,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
VM_READ|VM_WRITE|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
- 0, NULL);
+ 0, NULL, &mmrange);
if (IS_ERR_VALUE(base)) {
ret = base;
goto out;
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 4f8f5bf46977..1433edd01d09 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -47,6 +47,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;

static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

#if 0
printk("Cpu%d[%s:%d:%0*lx:%ld:%0*lx]\n", raw_smp_processor_id(),
@@ -152,7 +153,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/mn10300/mm/fault.c b/arch/mn10300/mm/fault.c
index f0bfa1448744..71c38f0c8702 100644
--- a/arch/mn10300/mm/fault.c
+++ b/arch/mn10300/mm/fault.c
@@ -125,6 +125,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
siginfo_t info;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

#ifdef CONFIG_GDBSTUB
/* handle GDB stub causing a fault */
@@ -254,7 +255,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index b804dd06ea1c..768678b685af 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -49,6 +49,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
int code = SEGV_MAPERR;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

cause >>= 2;

@@ -132,7 +133,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
index d0021dfae20a..75ddb1e8e7e7 100644
--- a/arch/openrisc/mm/fault.c
+++ b/arch/openrisc/mm/fault.c
@@ -55,6 +55,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
siginfo_t info;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

tsk = current;

@@ -163,7 +164,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
* the fault.
*/

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index e247edbca68e..79db33a0cb0c 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -264,6 +264,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
unsigned long acc_type;
int fault = 0;
unsigned int flags;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (faulthandler_disabled())
goto no_context;
@@ -301,7 +302,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
* fault.
*/

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 051b3d63afe3..089b3cf948eb 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -176,7 +176,8 @@ extern void arch_exit_mmap(struct mm_struct *mm);

static inline void arch_unmap(struct mm_struct *mm,
struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
mm->context.vdso_base = 0;
diff --git a/arch/powerpc/include/asm/powernv.h b/arch/powerpc/include/asm/powernv.h
index dc5f6a5d4575..805ff3ba94e1 100644
--- a/arch/powerpc/include/asm/powernv.h
+++ b/arch/powerpc/include/asm/powernv.h
@@ -21,7 +21,7 @@ extern void pnv_npu2_destroy_context(struct npu_context *context,
struct pci_dev *gpdev);
extern int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
unsigned long *flags, unsigned long *status,
- int count);
+ int count, struct range_lock *mmrange);

void pnv_tm_init(void);
#else
@@ -35,7 +35,8 @@ static inline void pnv_npu2_destroy_context(struct npu_context *context,

static inline int pnv_npu2_handle_fault(struct npu_context *context,
uintptr_t *ea, unsigned long *flags,
- unsigned long *status, int count) {
+ unsigned long *status, int count,
+ struct range_lock *mmrange) {
return -ENODEV;
}

diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 697b70ad1195..8f5e604828a1 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -39,6 +39,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
struct vm_area_struct *vma;
unsigned long is_write;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (mm == NULL)
return -EFAULT;
@@ -77,7 +78,8 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
}

ret = 0;
- *flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0);
+ *flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0,
+ &mmrange);
if (unlikely(*flt & VM_FAULT_ERROR)) {
if (*flt & VM_FAULT_OOM) {
ret = -ENOMEM;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 866446cf2d9a..d562dc88687d 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -399,6 +399,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
int is_write = page_fault_is_write(error_code);
int fault, major = 0;
bool store_update_sp = false;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (notify_page_fault(regs))
return 0;
@@ -514,7 +515,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

#ifdef CONFIG_PPC_MEM_KEYS
/*
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 0a253b64ac5f..759e9a4c7479 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -789,7 +789,8 @@ EXPORT_SYMBOL(pnv_npu2_destroy_context);
* Assumes mmap_sem is held for the contexts associated mm.
*/
int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
- unsigned long *flags, unsigned long *status, int count)
+ unsigned long *flags, unsigned long *status,
+ int count, struct range_lock *mmrange)
{
u64 rc = 0, result = 0;
int i, is_write;
@@ -807,7 +808,7 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
is_write = flags[i] & NPU2_WRITE;
rc = get_user_pages_remote(NULL, mm, ea[i], 1,
is_write ? FOLL_WRITE : 0,
- page, NULL, NULL);
+ page, NULL, NULL, mmrange);

/*
* To support virtualised environments we will have to do an
diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 148c98ca9b45..75d15e73ba39 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -42,6 +42,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
unsigned long addr, cause;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
int fault, code = SEGV_MAPERR;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

cause = regs->scause;
addr = regs->sbadaddr;
@@ -119,7 +120,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, addr, flags);
+ fault = handle_mm_fault(vma, addr, flags, &mmrange);

/*
* If we need to retry but a fatal signal is pending, handle the
diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
index e07cce88dfb0..117c19a947c9 100644
--- a/arch/s390/include/asm/gmap.h
+++ b/arch/s390/include/asm/gmap.h
@@ -107,22 +107,24 @@ void gmap_discard(struct gmap *, unsigned long from, unsigned long to);
void __gmap_zap(struct gmap *, unsigned long gaddr);
void gmap_unlink(struct mm_struct *, unsigned long *table, unsigned long vmaddr);

-int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val);
+int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val,
+ struct range_lock *mmrange);

struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
int edat_level);
int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level);
int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
- int fake);
+ int fake, struct range_lock *mmrange);
int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
- int fake);
+ int fake, struct range_lock *mmrange);
int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
- int fake);
+ int fake, struct range_lock *mmrange);
int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
- int fake);
+ int fake, struct range_lock *mmrange);
int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,
unsigned long *pgt, int *dat_protection, int *fake);
-int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);
+int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte,
+ struct range_lock *mmrange);

void gmap_register_pte_notifier(struct gmap_notifier *);
void gmap_unregister_pte_notifier(struct gmap_notifier *);
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index c24bfa72baf7..ff739b86df36 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -978,10 +978,11 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
* @saddr: faulting address in the shadow gmap
* @pgt: pointer to the page table address result
* @fake: pgt references contiguous guest memory block, not a pgtable
+ * @mmrange: address space range locking
*/
static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
unsigned long *pgt, int *dat_protection,
- int *fake)
+ int *fake, struct range_lock *mmrange)
{
struct gmap *parent;
union asce asce;
@@ -1034,7 +1035,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rfte.val = ptr;
goto shadow_r2t;
}
- rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val);
+ rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val,
+ mmrange);
if (rc)
return rc;
if (rfte.i)
@@ -1047,7 +1049,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
*dat_protection |= rfte.p;
ptr = rfte.rto * PAGE_SIZE;
shadow_r2t:
- rc = gmap_shadow_r2t(sg, saddr, rfte.val, *fake);
+ rc = gmap_shadow_r2t(sg, saddr, rfte.val, *fake, mmrange);
if (rc)
return rc;
/* fallthrough */
@@ -1060,7 +1062,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rste.val = ptr;
goto shadow_r3t;
}
- rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val);
+ rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val,
+ mmrange);
if (rc)
return rc;
if (rste.i)
@@ -1074,7 +1077,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
ptr = rste.rto * PAGE_SIZE;
shadow_r3t:
rste.p |= *dat_protection;
- rc = gmap_shadow_r3t(sg, saddr, rste.val, *fake);
+ rc = gmap_shadow_r3t(sg, saddr, rste.val, *fake, mmrange);
if (rc)
return rc;
/* fallthrough */
@@ -1087,7 +1090,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rtte.val = ptr;
goto shadow_sgt;
}
- rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val);
+ rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val,
+ mmrange);
if (rc)
return rc;
if (rtte.i)
@@ -1110,7 +1114,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
ptr = rtte.fc0.sto * PAGE_SIZE;
shadow_sgt:
rtte.fc0.p |= *dat_protection;
- rc = gmap_shadow_sgt(sg, saddr, rtte.val, *fake);
+ rc = gmap_shadow_sgt(sg, saddr, rtte.val, *fake, mmrange);
if (rc)
return rc;
/* fallthrough */
@@ -1123,7 +1127,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
ste.val = ptr;
goto shadow_pgt;
}
- rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val);
+ rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val,
+ mmrange);
if (rc)
return rc;
if (ste.i)
@@ -1142,7 +1147,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
ptr = ste.fc0.pto * (PAGE_SIZE / 2);
shadow_pgt:
ste.fc0.p |= *dat_protection;
- rc = gmap_shadow_pgt(sg, saddr, ste.val, *fake);
+ rc = gmap_shadow_pgt(sg, saddr, ste.val, *fake, mmrange);
if (rc)
return rc;
}
@@ -1172,6 +1177,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
unsigned long pgt;
int dat_protection, fake;
int rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_read(&sg->mm->mmap_sem);
/*
@@ -1184,7 +1190,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
rc = gmap_shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
if (rc)
rc = kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection,
- &fake);
+ &fake, &mmrange);

vaddr.addr = saddr;
if (fake) {
@@ -1192,7 +1198,8 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
goto shadow_page;
}
if (!rc)
- rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8, &pte.val);
+ rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8,
+ &pte.val, &mmrange);
if (!rc && pte.i)
rc = PGM_PAGE_TRANSLATION;
if (!rc && pte.z)
@@ -1200,7 +1207,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
shadow_page:
pte.p |= dat_protection;
if (!rc)
- rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
+ rc = gmap_shadow_page(sg, saddr, __pte(pte.val), &mmrange);
ipte_unlock(vcpu);
up_read(&sg->mm->mmap_sem);
return rc;
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 93faeca52284..17ba3c402f9d 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -421,6 +421,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
unsigned long address;
unsigned int flags;
int fault;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

tsk = current;
/*
@@ -507,7 +508,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);
/* No reason to continue if interrupted by SIGKILL. */
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
fault = VM_FAULT_SIGNAL;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 2c55a2b9d6c6..b12a44813022 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -621,6 +621,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
unsigned long vmaddr;
int rc;
bool unlocked;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_read(&gmap->mm->mmap_sem);

@@ -632,7 +633,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
goto out_up;
}
if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags,
- &unlocked)) {
+ &unlocked, &mmrange)) {
rc = -EFAULT;
goto out_up;
}
@@ -835,13 +836,15 @@ static pte_t *gmap_pte_op_walk(struct gmap *gmap, unsigned long gaddr,
* @gaddr: virtual address in the guest address space
* @vmaddr: address in the host process address space
* @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
+ * @mmrange: address space range locking
*
* Returns 0 if the caller can retry __gmap_translate (might fail again),
* -ENOMEM if out of memory and -EFAULT if anything goes wrong while fixing
* up or connecting the gmap page table.
*/
static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,
- unsigned long vmaddr, int prot)
+ unsigned long vmaddr, int prot,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = gmap->mm;
unsigned int fault_flags;
@@ -849,7 +852,8 @@ static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,

BUG_ON(gmap_is_shadow(gmap));
fault_flags = (prot == PROT_WRITE) ? FAULT_FLAG_WRITE : 0;
- if (fixup_user_fault(current, mm, vmaddr, fault_flags, &unlocked))
+ if (fixup_user_fault(current, mm, vmaddr, fault_flags, &unlocked,
+ mmrange))
return -EFAULT;
if (unlocked)
/* lost mmap_sem, caller has to retry __gmap_translate */
@@ -874,6 +878,7 @@ static void gmap_pte_op_end(spinlock_t *ptl)
* @len: size of area
* @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
* @bits: pgste notification bits to set
+ * @mmrange: address space range locking
*
* Returns 0 if successfully protected, -ENOMEM if out of memory and
* -EFAULT if gaddr is invalid (or mapping for shadows is missing).
@@ -881,7 +886,8 @@ static void gmap_pte_op_end(spinlock_t *ptl)
* Called with sg->mm->mmap_sem in read.
*/
static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
- unsigned long len, int prot, unsigned long bits)
+ unsigned long len, int prot, unsigned long bits,
+ struct range_lock *mmrange)
{
unsigned long vmaddr;
spinlock_t *ptl;
@@ -900,7 +906,8 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
vmaddr = __gmap_translate(gmap, gaddr);
if (IS_ERR_VALUE(vmaddr))
return vmaddr;
- rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);
+ rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot,
+ mmrange);
if (rc)
return rc;
continue;
@@ -929,13 +936,14 @@ int gmap_mprotect_notify(struct gmap *gmap, unsigned long gaddr,
unsigned long len, int prot)
{
int rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if ((gaddr & ~PAGE_MASK) || (len & ~PAGE_MASK) || gmap_is_shadow(gmap))
return -EINVAL;
if (!MACHINE_HAS_ESOP && prot == PROT_READ)
return -EINVAL;
down_read(&gmap->mm->mmap_sem);
- rc = gmap_protect_range(gmap, gaddr, len, prot, PGSTE_IN_BIT);
+ rc = gmap_protect_range(gmap, gaddr, len, prot, PGSTE_IN_BIT, &mmrange);
up_read(&gmap->mm->mmap_sem);
return rc;
}
@@ -947,6 +955,7 @@ EXPORT_SYMBOL_GPL(gmap_mprotect_notify);
* @gmap: pointer to guest mapping meta data structure
* @gaddr: virtual address in the guest address space
* @val: pointer to the unsigned long value to return
+ * @mmrange: address space range locking
*
* Returns 0 if the value was read, -ENOMEM if out of memory and -EFAULT
* if reading using the virtual address failed. -EINVAL if called on a gmap
@@ -954,7 +963,8 @@ EXPORT_SYMBOL_GPL(gmap_mprotect_notify);
*
* Called with gmap->mm->mmap_sem in read.
*/
-int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val)
+int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val,
+ struct range_lock *mmrange)
{
unsigned long address, vmaddr;
spinlock_t *ptl;
@@ -986,7 +996,7 @@ int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val)
rc = vmaddr;
break;
}
- rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, PROT_READ);
+ rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, PROT_READ, mmrange);
if (rc)
break;
}
@@ -1026,12 +1036,14 @@ static inline void gmap_insert_rmap(struct gmap *sg, unsigned long vmaddr,
* @raddr: rmap address in the shadow gmap
* @paddr: address in the parent guest address space
* @len: length of the memory area to protect
+ * @mmrange: address space range locking
*
* Returns 0 if successfully protected and the rmap was created, -ENOMEM
* if out of memory and -EFAULT if paddr is invalid.
*/
static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
- unsigned long paddr, unsigned long len)
+ unsigned long paddr, unsigned long len,
+ struct range_lock *mmrange)
{
struct gmap *parent;
struct gmap_rmap *rmap;
@@ -1069,7 +1081,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
radix_tree_preload_end();
if (rc) {
kfree(rmap);
- rc = gmap_pte_op_fixup(parent, paddr, vmaddr, PROT_READ);
+ rc = gmap_pte_op_fixup(parent, paddr, vmaddr, PROT_READ, mmrange);
if (rc)
return rc;
continue;
@@ -1473,6 +1485,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
struct gmap *sg, *new;
unsigned long limit;
int rc;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

BUG_ON(gmap_is_shadow(parent));
spin_lock(&parent->shadow_lock);
@@ -1526,7 +1539,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
down_read(&parent->mm->mmap_sem);
rc = gmap_protect_range(parent, asce & _ASCE_ORIGIN,
((asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE,
- PROT_READ, PGSTE_VSIE_BIT);
+ PROT_READ, PGSTE_VSIE_BIT, &mmrange);
up_read(&parent->mm->mmap_sem);
spin_lock(&parent->shadow_lock);
new->initialized = true;
@@ -1546,6 +1559,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow);
* @saddr: faulting address in the shadow gmap
* @r2t: parent gmap address of the region 2 table to get shadowed
* @fake: r2t references contiguous guest memory block, not a r2t
+ * @mmrange: address space range locking
*
* The r2t parameter specifies the address of the source table. The
* four pages of the source table are made read-only in the parent gmap
@@ -1559,7 +1573,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow);
* Called with sg->mm->mmap_sem in read.
*/
int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
- int fake)
+ int fake, struct range_lock *mmrange)
{
unsigned long raddr, origin, offset, len;
unsigned long *s_r2t, *table;
@@ -1608,7 +1622,7 @@ int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
origin = r2t & _REGION_ENTRY_ORIGIN;
offset = ((r2t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
len = ((r2t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
- rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
+ rc = gmap_protect_rmap(sg, raddr, origin + offset, len, mmrange);
spin_lock(&sg->guest_table_lock);
if (!rc) {
table = gmap_table_walk(sg, saddr, 4);
@@ -1635,6 +1649,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_r2t);
* @saddr: faulting address in the shadow gmap
* @r3t: parent gmap address of the region 3 table to get shadowed
* @fake: r3t references contiguous guest memory block, not a r3t
+ * @mmrange: address space range locking
*
* Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
* shadow table structure is incomplete, -ENOMEM if out of memory and
@@ -1643,7 +1658,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_r2t);
* Called with sg->mm->mmap_sem in read.
*/
int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
- int fake)
+ int fake, struct range_lock *mmrange)
{
unsigned long raddr, origin, offset, len;
unsigned long *s_r3t, *table;
@@ -1691,7 +1706,7 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
origin = r3t & _REGION_ENTRY_ORIGIN;
offset = ((r3t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
len = ((r3t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
- rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
+ rc = gmap_protect_rmap(sg, raddr, origin + offset, len, mmrange);
spin_lock(&sg->guest_table_lock);
if (!rc) {
table = gmap_table_walk(sg, saddr, 3);
@@ -1718,6 +1733,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_r3t);
* @saddr: faulting address in the shadow gmap
* @sgt: parent gmap address of the segment table to get shadowed
* @fake: sgt references contiguous guest memory block, not a sgt
+ * @mmrange: address space range locking
*
* Returns: 0 if successfully shadowed or already shadowed, -EAGAIN if the
* shadow table structure is incomplete, -ENOMEM if out of memory and
@@ -1726,7 +1742,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_r3t);
* Called with sg->mm->mmap_sem in read.
*/
int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
- int fake)
+ int fake, struct range_lock *mmrange)
{
unsigned long raddr, origin, offset, len;
unsigned long *s_sgt, *table;
@@ -1775,7 +1791,7 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
origin = sgt & _REGION_ENTRY_ORIGIN;
offset = ((sgt & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
len = ((sgt & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
- rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
+ rc = gmap_protect_rmap(sg, raddr, origin + offset, len, mmrange);
spin_lock(&sg->guest_table_lock);
if (!rc) {
table = gmap_table_walk(sg, saddr, 2);
@@ -1842,6 +1858,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);
* @saddr: faulting address in the shadow gmap
* @pgt: parent gmap address of the page table to get shadowed
* @fake: pgt references contiguous guest memory block, not a pgtable
+ * @mmrange: address space range locking
*
* Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
* shadow table structure is incomplete, -ENOMEM if out of memory,
@@ -1850,7 +1867,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);
* Called with gmap->mm->mmap_sem in read
*/
int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
- int fake)
+ int fake, struct range_lock *mmrange)
{
unsigned long raddr, origin;
unsigned long *s_pgt, *table;
@@ -1894,7 +1911,7 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
/* Make pgt read-only in parent gmap page table (not the pgste) */
raddr = (saddr & _SEGMENT_MASK) | _SHADOW_RMAP_SEGMENT;
origin = pgt & _SEGMENT_ENTRY_ORIGIN & PAGE_MASK;
- rc = gmap_protect_rmap(sg, raddr, origin, PAGE_SIZE);
+ rc = gmap_protect_rmap(sg, raddr, origin, PAGE_SIZE, mmrange);
spin_lock(&sg->guest_table_lock);
if (!rc) {
table = gmap_table_walk(sg, saddr, 1);
@@ -1921,6 +1938,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_pgt);
* @sg: pointer to the shadow guest address space structure
* @saddr: faulting address in the shadow gmap
* @pte: pte in parent gmap address space to get shadowed
+ * @mmrange: address space range locking
*
* Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
* shadow table structure is incomplete, -ENOMEM if out of memory and
@@ -1928,7 +1946,8 @@ EXPORT_SYMBOL_GPL(gmap_shadow_pgt);
*
* Called with sg->mm->mmap_sem in read.
*/
-int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
+int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte,
+ struct range_lock *mmrange)
{
struct gmap *parent;
struct gmap_rmap *rmap;
@@ -1982,7 +2001,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
radix_tree_preload_end();
if (!rc)
break;
- rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot);
+ rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot, mmrange);
if (rc)
break;
}
@@ -2117,7 +2136,8 @@ static inline void thp_split_mm(struct mm_struct *mm)
* - This must be called after THP was enabled
*/
static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
- unsigned long end, struct mm_walk *walk)
+ unsigned long end, struct mm_walk *walk,
+ struct range_lock *mmrange)
{
unsigned long addr;

@@ -2133,12 +2153,13 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
return 0;
}

-static inline void zap_zero_pages(struct mm_struct *mm)
+static inline void zap_zero_pages(struct mm_struct *mm,
+ struct range_lock *mmrange)
{
struct mm_walk walk = { .pmd_entry = __zap_zero_pages };

walk.mm = mm;
- walk_page_range(0, TASK_SIZE, &walk);
+ walk_page_range(0, TASK_SIZE, &walk, mmrange);
}

/*
@@ -2147,6 +2168,7 @@ static inline void zap_zero_pages(struct mm_struct *mm)
int s390_enable_sie(void)
{
struct mm_struct *mm = current->mm;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* Do we have pgstes? if yes, we are done */
if (mm_has_pgste(mm))
@@ -2158,7 +2180,7 @@ int s390_enable_sie(void)
mm->context.has_pgste = 1;
/* split thp mappings and disable thp for future mappings */
thp_split_mm(mm);
- zap_zero_pages(mm);
+ zap_zero_pages(mm, &mmrange);
up_write(&mm->mmap_sem);
return 0;
}
@@ -2182,6 +2204,7 @@ int s390_enable_skey(void)
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
int rc = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_write(&mm->mmap_sem);
if (mm_use_skey(mm))
@@ -2190,7 +2213,7 @@ int s390_enable_skey(void)
mm->context.use_skey = 1;
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
- MADV_UNMERGEABLE, &vma->vm_flags)) {
+ MADV_UNMERGEABLE, &vma->vm_flags, &mmrange)) {
mm->context.use_skey = 0;
rc = -ENOMEM;
goto out_up;
@@ -2199,7 +2222,7 @@ int s390_enable_skey(void)
mm->def_flags &= ~VM_MERGEABLE;

walk.mm = mm;
- walk_page_range(0, TASK_SIZE, &walk);
+ walk_page_range(0, TASK_SIZE, &walk, &mmrange);

out_up:
up_write(&mm->mmap_sem);
@@ -2220,10 +2243,11 @@ static int __s390_reset_cmma(pte_t *pte, unsigned long addr,
void s390_reset_cmma(struct mm_struct *mm)
{
struct mm_walk walk = { .pte_entry = __s390_reset_cmma };
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_write(&mm->mmap_sem);
walk.mm = mm;
- walk_page_range(0, TASK_SIZE, &walk);
+ walk_page_range(0, TASK_SIZE, &walk, &mmrange);
up_write(&mm->mmap_sem);
}
EXPORT_SYMBOL_GPL(s390_reset_cmma);
diff --git a/arch/score/mm/fault.c b/arch/score/mm/fault.c
index b85fad4f0874..07a8637ad142 100644
--- a/arch/score/mm/fault.c
+++ b/arch/score/mm/fault.c
@@ -51,6 +51,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
unsigned long flags = 0;
siginfo_t info;
int fault;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

info.si_code = SEGV_MAPERR;

@@ -111,7 +112,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, mmrange);
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)
goto out_of_memory;
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index 6fd1bf7481c7..d36106564728 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -405,6 +405,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
struct vm_area_struct * vma;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

tsk = current;
mm = tsk->mm;
@@ -488,7 +489,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if (unlikely(fault & (VM_FAULT_RETRY | VM_FAULT_ERROR)))
if (mm_fault_error(regs, error_code, address, fault))
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index a8103a84b4ac..ebb2406dbe7c 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -176,6 +176,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
int from_user = !(regs->psr & PSR_PS);
int fault, code;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (text_fault)
address = regs->pc;
@@ -242,7 +243,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
@@ -389,6 +390,7 @@ static void force_user_fault(unsigned long address, int write)
struct mm_struct *mm = tsk->mm;
unsigned int flags = FAULT_FLAG_USER;
int code;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

code = SEGV_MAPERR;

@@ -412,7 +414,7 @@ static void force_user_fault(unsigned long address, int write)
if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
goto bad_area;
}
- switch (handle_mm_fault(vma, address, flags)) {
+ switch (handle_mm_fault(vma, address, flags, &mmrange)) {
case VM_FAULT_SIGBUS:
case VM_FAULT_OOM:
goto do_sigbus;
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 41363f46797b..e0a3c36b0fa1 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -287,6 +287,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
int si_code, fault_code, fault;
unsigned long address, mm_rss;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

fault_code = get_thread_fault_code();

@@ -438,7 +439,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
goto bad_area;
}

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
goto exit_exception;
diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index f58fa06a2214..09f053eb146f 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -275,6 +275,7 @@ static int handle_page_fault(struct pt_regs *regs,
int is_kernel_mode;
pgd_t *pgd;
unsigned int flags;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* on TILE, protection faults are always writes */
if (!is_page_fault)
@@ -437,7 +438,7 @@ static int handle_page_fault(struct pt_regs *regs,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return 0;
diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h
index fca34b2177e2..98cc3e36385a 100644
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -23,7 +23,8 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
extern void arch_exit_mmap(struct mm_struct *mm);
static inline void arch_unmap(struct mm_struct *mm,
struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
}
static inline void arch_bprm_mm_init(struct mm_struct *mm,
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index b2b02df9896e..e632a14e896e 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -33,6 +33,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
pte_t *pte;
int err = -EFAULT;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

*code_out = SEGV_MAPERR;

@@ -74,7 +75,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
do {
int fault;

- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
goto out_nosemaphore;
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index bbefcc46a45e..dd35b6191798 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -168,7 +168,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
}

static int __do_pf(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
- unsigned int flags, struct task_struct *tsk)
+ unsigned int flags, struct task_struct *tsk,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma;
int fault;
@@ -194,7 +195,7 @@ static int __do_pf(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
* If for any reason at all we couldn't handle the fault, make
* sure we exit gracefully rather than endlessly redo the fault.
*/
- fault = handle_mm_fault(vma, addr & PAGE_MASK, flags);
+ fault = handle_mm_fault(vma, addr & PAGE_MASK, flags, mmrange);
return fault;

check_stack:
@@ -210,6 +211,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
struct mm_struct *mm;
int fault, sig, code;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

tsk = current;
mm = tsk->mm;
@@ -251,7 +253,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
#endif
}

- fault = __do_pf(mm, addr, fsr, flags, tsk);
+ fault = __do_pf(mm, addr, fsr, flags, tsk, &mmrange);

/* If we need to retry but a fatal signal is pending, handle the
* signal first. We do not need to release the mmap_sem because
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 5b8b556dbb12..2e0bdf6a3aaf 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -155,6 +155,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
struct vm_area_struct *vma;
unsigned long text_start;
int ret = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (down_write_killable(&mm->mmap_sem))
return -EINTR;
@@ -192,7 +193,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)

if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
- do_munmap(mm, text_start, image->size, NULL);
+ do_munmap(mm, text_start, image->size, NULL, &mmrange);
} else {
current->mm->context.vdso = (void __user *)text_start;
current->mm->context.vdso_image = image;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index c931b88982a0..31fb02ed4770 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -263,7 +263,8 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
}

static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
/*
* mpx_notify_unmap() goes and reads a rarely-hot
@@ -283,7 +284,7 @@ static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
* consistently wrong.
*/
if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
- mpx_notify_unmap(mm, vma, start, end);
+ mpx_notify_unmap(mm, vma, start, end, mmrange);
}

#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 61eb4b63c5ec..c26099224a17 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -73,7 +73,8 @@ static inline void mpx_mm_init(struct mm_struct *mm)
mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
}
void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long start, unsigned long end);
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange);

unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
unsigned long flags);
@@ -95,7 +96,8 @@ static inline void mpx_mm_init(struct mm_struct *mm)
}
static inline void mpx_notify_unmap(struct mm_struct *mm,
struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
}

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 800de815519c..93f1b8d4c88e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1244,6 +1244,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
int fault, major = 0;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
u32 pkey;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

tsk = current;
mm = tsk->mm;
@@ -1423,7 +1424,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
* fault, so we read the pkey beforehand.
*/
pkey = vma_pkey(vma);
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);
major |= fault & VM_FAULT_MAJOR;

/*
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index e500949bae24..51c3e1f7e6be 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -47,6 +47,7 @@ static unsigned long mpx_mmap(unsigned long len)
{
struct mm_struct *mm = current->mm;
unsigned long addr, populate;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* Only bounds table can be allocated here */
if (len != mpx_bt_size_bytes(mm))
@@ -54,7 +55,8 @@ static unsigned long mpx_mmap(unsigned long len)

down_write(&mm->mmap_sem);
addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE,
- MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate, NULL);
+ MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate, NULL,
+ &mmrange);
up_write(&mm->mmap_sem);
if (populate)
mm_populate(addr, populate);
@@ -427,13 +429,15 @@ int mpx_handle_bd_fault(void)
* A thin wrapper around get_user_pages(). Returns 0 if the
* fault was resolved or -errno if not.
*/
-static int mpx_resolve_fault(long __user *addr, int write)
+static int mpx_resolve_fault(long __user *addr, int write,
+ struct range_lock *mmrange)
{
long gup_ret;
int nr_pages = 1;

gup_ret = get_user_pages((unsigned long)addr, nr_pages,
- write ? FOLL_WRITE : 0, NULL, NULL);
+ write ? FOLL_WRITE : 0, NULL, NULL,
+ mmrange);
/*
* get_user_pages() returns number of pages gotten.
* 0 means we failed to fault in and get anything,
@@ -500,7 +504,8 @@ static int get_user_bd_entry(struct mm_struct *mm, unsigned long *bd_entry_ret,
*/
static int get_bt_addr(struct mm_struct *mm,
long __user *bd_entry_ptr,
- unsigned long *bt_addr_result)
+ unsigned long *bt_addr_result,
+ struct range_lock *mmrange)
{
int ret;
int valid_bit;
@@ -519,7 +524,8 @@ static int get_bt_addr(struct mm_struct *mm,
if (!ret)
break;
if (ret == -EFAULT)
- ret = mpx_resolve_fault(bd_entry_ptr, need_write);
+ ret = mpx_resolve_fault(bd_entry_ptr,
+ need_write, mmrange);
/*
* If we could not resolve the fault, consider it
* userspace's fault and error out.
@@ -730,7 +736,8 @@ static unsigned long mpx_get_bd_entry_offset(struct mm_struct *mm,
}

static int unmap_entire_bt(struct mm_struct *mm,
- long __user *bd_entry, unsigned long bt_addr)
+ long __user *bd_entry, unsigned long bt_addr,
+ struct range_lock *mmrange)
{
unsigned long expected_old_val = bt_addr | MPX_BD_ENTRY_VALID_FLAG;
unsigned long uninitialized_var(actual_old_val);
@@ -747,7 +754,7 @@ static int unmap_entire_bt(struct mm_struct *mm,
if (!ret)
break;
if (ret == -EFAULT)
- ret = mpx_resolve_fault(bd_entry, need_write);
+ ret = mpx_resolve_fault(bd_entry, need_write, mmrange);
/*
* If we could not resolve the fault, consider it
* userspace's fault and error out.
@@ -780,11 +787,12 @@ static int unmap_entire_bt(struct mm_struct *mm,
* avoid recursion, do_munmap() will check whether it comes
* from one bounds table through VM_MPX flag.
*/
- return do_munmap(mm, bt_addr, mpx_bt_size_bytes(mm), NULL);
+ return do_munmap(mm, bt_addr, mpx_bt_size_bytes(mm), NULL, mmrange);
}

static int try_unmap_single_bt(struct mm_struct *mm,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
struct vm_area_struct *next;
struct vm_area_struct *prev;
@@ -835,7 +843,7 @@ static int try_unmap_single_bt(struct mm_struct *mm,
}

bde_vaddr = mm->context.bd_addr + mpx_get_bd_entry_offset(mm, start);
- ret = get_bt_addr(mm, bde_vaddr, &bt_addr);
+ ret = get_bt_addr(mm, bde_vaddr, &bt_addr, mmrange);
/*
* No bounds table there, so nothing to unmap.
*/
@@ -853,12 +861,13 @@ static int try_unmap_single_bt(struct mm_struct *mm,
*/
if ((start == bta_start_vaddr) &&
(end == bta_end_vaddr))
- return unmap_entire_bt(mm, bde_vaddr, bt_addr);
+ return unmap_entire_bt(mm, bde_vaddr, bt_addr, mmrange);
return zap_bt_entries_mapping(mm, bt_addr, start, end);
}

static int mpx_unmap_tables(struct mm_struct *mm,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
unsigned long one_unmap_start;
trace_mpx_unmap_search(start, end);
@@ -876,7 +885,8 @@ static int mpx_unmap_tables(struct mm_struct *mm,
*/
if (one_unmap_end > next_unmap_start)
one_unmap_end = next_unmap_start;
- ret = try_unmap_single_bt(mm, one_unmap_start, one_unmap_end);
+ ret = try_unmap_single_bt(mm, one_unmap_start, one_unmap_end,
+ mmrange);
if (ret)
return ret;

@@ -894,7 +904,8 @@ static int mpx_unmap_tables(struct mm_struct *mm,
* necessary, and the 'vma' is the first vma in this range (start -> end).
*/
void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
int ret;

@@ -920,7 +931,7 @@ void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
vma = vma->vm_next;
} while (vma && vma->vm_start < end);

- ret = mpx_unmap_tables(mm, start, end);
+ ret = mpx_unmap_tables(mm, start, end, mmrange);
if (ret)
force_sig(SIGSEGV, current);
}
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 8b9b6f44bb06..6f8e3e7cccb5 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -44,6 +44,7 @@ void do_page_fault(struct pt_regs *regs)
int is_write, is_exec;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

info.si_code = SEGV_MAPERR;

@@ -108,7 +109,7 @@ void do_page_fault(struct pt_regs *regs)
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
- fault = handle_mm_fault(vma, address, flags);
+ fault = handle_mm_fault(vma, address, flags, &mmrange);

if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
return;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e4bb435e614b..bd464a599341 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -691,6 +691,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
unsigned int flags = 0;
unsigned pinned = 0;
int r;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY))
flags |= FOLL_WRITE;
@@ -721,7 +722,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
list_add(&guptask.list, &gtt->guptasks);
spin_unlock(&gtt->guptasklock);

- r = get_user_pages(userptr, num_pages, flags, p, NULL);
+ r = get_user_pages(userptr, num_pages, flags, p, NULL, &mmrange);

spin_lock(&gtt->guptasklock);
list_del(&guptask.list);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 382a77a1097e..881bcc7d663a 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -512,6 +512,8 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)

ret = -EFAULT;
if (mmget_not_zero(mm)) {
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
down_read(&mm->mmap_sem);
while (pinned < npages) {
ret = get_user_pages_remote
@@ -519,7 +521,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
obj->userptr.ptr + pinned * PAGE_SIZE,
npages - pinned,
flags,
- pvec + pinned, NULL, NULL);
+ pvec + pinned, NULL, NULL, &mmrange);
if (ret < 0)
break;

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index a0a839bc39bf..9fc3a4f86945 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -545,6 +545,8 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
struct radeon_ttm_tt *gtt = (void *)ttm;
unsigned pinned = 0, nents;
int r;
+ /* XXX: this is wrong!! */
+ DEFINE_RANGE_LOCK_FULL(mmrange);

int write = !(gtt->userflags & RADEON_GEM_USERPTR_READONLY);
enum dma_data_direction direction = write ?
@@ -569,7 +571,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
struct page **pages = ttm->pages + pinned;

r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
- pages, NULL);
+ pages, NULL, &mmrange);
if (r < 0)
goto release_pages;

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 9a4e899d94b3..fd9601ed5b84 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -96,6 +96,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
struct scatterlist *sg, *sg_list_start;
int need_release = 0;
unsigned int gup_flags = FOLL_WRITE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (dmasync)
dma_attrs |= DMA_ATTR_WRITE_BARRIER;
@@ -194,7 +195,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
ret = get_user_pages_longterm(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof (struct page *)),
- gup_flags, page_list, vma_list);
+ gup_flags, page_list, vma_list, &mmrange);

if (ret < 0)
goto out;
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 2aadf5813a40..0572953260e8 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -632,6 +632,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
int j, k, ret = 0, start_idx, npages = 0, page_shift;
unsigned int flags = 0;
phys_addr_t p = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (access_mask == 0)
return -EINVAL;
@@ -683,7 +684,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
*/
npages = get_user_pages_remote(owning_process, owning_mm,
user_virt, gup_num_pages,
- flags, local_page_list, NULL, NULL);
+ flags, local_page_list, NULL, NULL, &mmrange);
up_read(&owning_mm->mmap_sem);

if (npages < 0)
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index ce83ba9a12ef..6bcb4f9f9b30 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -53,7 +53,7 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
* Call with current->mm->mmap_sem held.
*/
static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
- struct page **p)
+ struct page **p, struct range_lock *mmrange)
{
unsigned long lock_limit;
size_t got;
@@ -70,7 +70,7 @@ static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
ret = get_user_pages(start_page + got * PAGE_SIZE,
num_pages - got,
FOLL_WRITE | FOLL_FORCE,
- p + got, NULL);
+ p + got, NULL, mmrange);
if (ret < 0)
goto bail_release;
}
@@ -134,10 +134,11 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
struct page **p)
{
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_write(&current->mm->mmap_sem);

- ret = __qib_get_user_pages(start_page, num_pages, p);
+ ret = __qib_get_user_pages(start_page, num_pages, p, &mmrange);

up_write(&current->mm->mmap_sem);

diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 4381c0a9a873..5f36c6d2e21b 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -113,6 +113,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
int flags;
dma_addr_t pa;
unsigned int gup_flags;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!can_do_mlock())
return -EPERM;
@@ -146,7 +147,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
ret = get_user_pages(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof(struct page *)),
- gup_flags, page_list, NULL);
+ gup_flags, page_list, NULL, &mmrange);

if (ret < 0)
goto out;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 1d0b53a04a08..15a7103fd84c 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -512,6 +512,7 @@ static void do_fault(struct work_struct *work)
unsigned int flags = 0;
struct mm_struct *mm;
u64 address;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

mm = fault->state->mm;
address = fault->address;
@@ -523,7 +524,7 @@ static void do_fault(struct work_struct *work)
flags |= FAULT_FLAG_REMOTE;

down_read(&mm->mmap_sem);
- vma = find_extend_vma(mm, address);
+ vma = find_extend_vma(mm, address, &mmrange);
if (!vma || address < vma->vm_start)
/* failed to get a vma in the right range */
goto out;
@@ -532,7 +533,7 @@ static void do_fault(struct work_struct *work)
if (access_error(vma, fault))
goto out;

- ret = handle_mm_fault(vma, address, flags);
+ ret = handle_mm_fault(vma, address, flags, &mmrange);
out:
up_read(&mm->mmap_sem);

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 35a408d0ae4f..6a74386ee83f 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -585,6 +585,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
struct intel_iommu *iommu = d;
struct intel_svm *svm = NULL;
int head, tail, handled = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* Clear PPR bit before reading head/tail registers, to
* ensure that we get a new interrupt if needed. */
@@ -643,7 +644,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
goto bad_req;

down_read(&svm->mm->mmap_sem);
- vma = find_extend_vma(svm->mm, address);
+ vma = find_extend_vma(svm->mm, address, &mmrange);
if (!vma || address < vma->vm_start)
goto invalid;

@@ -651,7 +652,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
goto invalid;

ret = handle_mm_fault(vma, address,
- req->wr_req ? FAULT_FLAG_WRITE : 0);
+ req->wr_req ? FAULT_FLAG_WRITE : 0, &mmrange);
if (ret & VM_FAULT_ERROR)
goto invalid;

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index f412429cf5ba..64a4cd62eeb3 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -152,7 +152,8 @@ static void videobuf_dma_init(struct videobuf_dmabuf *dma)
}

static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
- int direction, unsigned long data, unsigned long size)
+ int direction, unsigned long data, unsigned long size,
+ struct range_lock *mmrange)
{
unsigned long first, last;
int err, rw = 0;
@@ -186,7 +187,7 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
data, size, dma->nr_pages);

err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
- flags, dma->pages, NULL);
+ flags, dma->pages, NULL, mmrange);

if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
@@ -201,9 +202,10 @@ static int videobuf_dma_init_user(struct videobuf_dmabuf *dma, int direction,
unsigned long data, unsigned long size)
{
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_read(&current->mm->mmap_sem);
- ret = videobuf_dma_init_user_locked(dma, direction, data, size);
+ ret = videobuf_dma_init_user_locked(dma, direction, data, size, &mmrange);
up_read(&current->mm->mmap_sem);

return ret;
@@ -539,9 +541,14 @@ static int __videobuf_iolock(struct videobuf_queue *q,
we take current->mm->mmap_sem there, to prevent
locking inversion, so don't take it here */

+ /* XXX: can we use a local mmrange here? */
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
err = videobuf_dma_init_user_locked(&mem->dma,
- DMA_FROM_DEVICE,
- vb->baddr, vb->bsize);
+ DMA_FROM_DEVICE,
+ vb->baddr,
+ vb->bsize,
+ &mmrange);
if (0 != err)
return err;
}
@@ -555,6 +562,7 @@ static int __videobuf_iolock(struct videobuf_queue *q,
* building for PAE. Compiler doesn't like direct casting
* of a 32 bit ptr to 64 bit integer.
*/
+
bus = (dma_addr_t)(unsigned long)fbuf->base + vb->boff;
pages = PAGE_ALIGN(vb->size) >> PAGE_SHIFT;
err = videobuf_dma_init_overlay(&mem->dma, DMA_FROM_DEVICE,
diff --git a/drivers/misc/mic/scif/scif_rma.c b/drivers/misc/mic/scif/scif_rma.c
index c824329f7012..6ecac843e5f3 100644
--- a/drivers/misc/mic/scif/scif_rma.c
+++ b/drivers/misc/mic/scif/scif_rma.c
@@ -1332,6 +1332,7 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
int prot = *out_prot;
int ulimit = 0;
struct mm_struct *mm = NULL;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* Unsupported flags */
if (map_flags & ~(SCIF_MAP_KERNEL | SCIF_MAP_ULIMIT))
@@ -1400,7 +1401,7 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
nr_pages,
(prot & SCIF_PROT_WRITE) ? FOLL_WRITE : 0,
pinned_pages->pages,
- NULL);
+ NULL, &mmrange);
up_write(&mm->mmap_sem);
if (nr_pages != pinned_pages->nr_pages) {
if (try_upgrade) {
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index 93be82fc338a..b35d60bb2197 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -189,7 +189,8 @@ static void get_clear_fault_map(struct gru_state *gru,
*/
static int non_atomic_pte_lookup(struct vm_area_struct *vma,
unsigned long vaddr, int write,
- unsigned long *paddr, int *pageshift)
+ unsigned long *paddr, int *pageshift,
+ struct range_lock *mmrange)
{
struct page *page;

@@ -198,7 +199,8 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
#else
*pageshift = PAGE_SHIFT;
#endif
- if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
+ if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0,
+ &page, NULL, mmrange) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
put_page(page);
@@ -263,7 +265,8 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
}

static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
- int write, int atomic, unsigned long *gpa, int *pageshift)
+ int write, int atomic, unsigned long *gpa, int *pageshift,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = gts->ts_mm;
struct vm_area_struct *vma;
@@ -283,7 +286,8 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
if (ret) {
if (atomic)
goto upm;
- if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
+ if (non_atomic_pte_lookup(vma, vaddr, write, &paddr,
+ &ps, mmrange))
goto inval;
}
if (is_gru_paddr(paddr))
@@ -324,7 +328,8 @@ static void gru_preload_tlb(struct gru_state *gru,
unsigned long fault_vaddr, int asid, int write,
unsigned char tlb_preload_count,
struct gru_tlb_fault_handle *tfh,
- struct gru_control_block_extended *cbe)
+ struct gru_control_block_extended *cbe,
+ struct range_lock *mmrange)
{
unsigned long vaddr = 0, gpa;
int ret, pageshift;
@@ -342,7 +347,7 @@ static void gru_preload_tlb(struct gru_state *gru,
vaddr = min(vaddr, fault_vaddr + tlb_preload_count * PAGE_SIZE);

while (vaddr > fault_vaddr) {
- ret = gru_vtop(gts, vaddr, write, atomic, &gpa, &pageshift);
+ ret = gru_vtop(gts, vaddr, write, atomic, &gpa, &pageshift, mmrange);
if (ret || tfh_write_only(tfh, gpa, GAA_RAM, vaddr, asid, write,
GRU_PAGESIZE(pageshift)))
return;
@@ -368,7 +373,8 @@ static void gru_preload_tlb(struct gru_state *gru,
static int gru_try_dropin(struct gru_state *gru,
struct gru_thread_state *gts,
struct gru_tlb_fault_handle *tfh,
- struct gru_instruction_bits *cbk)
+ struct gru_instruction_bits *cbk,
+ struct range_lock *mmrange)
{
struct gru_control_block_extended *cbe = NULL;
unsigned char tlb_preload_count = gts->ts_tlb_preload_count;
@@ -423,7 +429,7 @@ static int gru_try_dropin(struct gru_state *gru,
if (atomic_read(&gts->ts_gms->ms_range_active))
goto failactive;

- ret = gru_vtop(gts, vaddr, write, atomic, &gpa, &pageshift);
+ ret = gru_vtop(gts, vaddr, write, atomic, &gpa, &pageshift, mmrange);
if (ret == VTOP_INVALID)
goto failinval;
if (ret == VTOP_RETRY)
@@ -438,7 +444,8 @@ static int gru_try_dropin(struct gru_state *gru,
}

if (unlikely(cbe) && pageshift == PAGE_SHIFT) {
- gru_preload_tlb(gru, gts, atomic, vaddr, asid, write, tlb_preload_count, tfh, cbe);
+ gru_preload_tlb(gru, gts, atomic, vaddr, asid, write,
+ tlb_preload_count, tfh, cbe, mmrange);
gru_flush_cache_cbe(cbe);
}

@@ -587,10 +594,13 @@ static irqreturn_t gru_intr(int chiplet, int blade)
* If it fails, retry the fault in user context.
*/
gts->ustats.fmm_tlbmiss++;
- if (!gts->ts_force_cch_reload &&
- down_read_trylock(&gts->ts_mm->mmap_sem)) {
- gru_try_dropin(gru, gts, tfh, NULL);
- up_read(&gts->ts_mm->mmap_sem);
+ if (!gts->ts_force_cch_reload) {
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
+ if (down_read_trylock(&gts->ts_mm->mmap_sem)) {
+ gru_try_dropin(gru, gts, tfh, NULL, &mmrange);
+ up_read(&gts->ts_mm->mmap_sem);
+ }
} else {
tfh_user_polling_mode(tfh);
STAT(intr_mm_lock_failed);
@@ -625,7 +635,7 @@ irqreturn_t gru_intr_mblade(int irq, void *dev_id)

static int gru_user_dropin(struct gru_thread_state *gts,
struct gru_tlb_fault_handle *tfh,
- void *cb)
+ void *cb, struct range_lock *mmrange)
{
struct gru_mm_struct *gms = gts->ts_gms;
int ret;
@@ -635,7 +645,7 @@ static int gru_user_dropin(struct gru_thread_state *gts,
wait_event(gms->ms_wait_queue,
atomic_read(&gms->ms_range_active) == 0);
prefetchw(tfh); /* Helps on hdw, required for emulator */
- ret = gru_try_dropin(gts->ts_gru, gts, tfh, cb);
+ ret = gru_try_dropin(gts->ts_gru, gts, tfh, cb, mmrange);
if (ret <= 0)
return ret;
STAT(call_os_wait_queue);
@@ -653,6 +663,7 @@ int gru_handle_user_call_os(unsigned long cb)
struct gru_thread_state *gts;
void *cbk;
int ucbnum, cbrnum, ret = -EINVAL;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

STAT(call_os);

@@ -685,7 +696,7 @@ int gru_handle_user_call_os(unsigned long cb)
tfh = get_tfh_by_index(gts->ts_gru, cbrnum);
cbk = get_gseg_base_address_cb(gts->ts_gru->gs_gru_base_vaddr,
gts->ts_ctxnum, ucbnum);
- ret = gru_user_dropin(gts, tfh, cbk);
+ ret = gru_user_dropin(gts, tfh, cbk, &mmrange);
}
exit:
gru_unlock_gts(gts);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29ae4819..1b3b103da637 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -345,13 +345,14 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
page);
} else {
unsigned int flags = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (prot & IOMMU_WRITE)
flags |= FOLL_WRITE;

down_read(&mm->mmap_sem);
ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
- NULL, NULL);
+ NULL, NULL, &mmrange);
up_read(&mm->mmap_sem);
}

diff --git a/fs/aio.c b/fs/aio.c
index a062d75109cb..31774b75c372 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -457,6 +457,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
int nr_pages;
int i;
struct file *file;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* Compensate for the ring buffer's head/tail overlap entry */
nr_events += 2; /* 1 is required, 2 for good luck */
@@ -519,7 +520,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)

ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size,
PROT_READ | PROT_WRITE,
- MAP_SHARED, 0, &unused, NULL);
+ MAP_SHARED, 0, &unused, NULL, &mmrange);
up_write(&mm->mmap_sem);
if (IS_ERR((void *)ctx->mmap_base)) {
ctx->mmap_size = 0;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 2f492dfcabde..9aea808d55d7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -180,6 +180,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
int ei_index = 0;
const struct cred *cred = current_cred();
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* In some cases (e.g. Hyper-Threading), we want to avoid L1
@@ -300,7 +301,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
* Grow the stack manually; some architectures have a limit on how
* far ahead a user-space access may be in order to grow the stack.
*/
- vma = find_extend_vma(current->mm, bprm->p);
+ vma = find_extend_vma(current->mm, bprm->p, &mmrange);
if (!vma)
return -EFAULT;

diff --git a/fs/exec.c b/fs/exec.c
index e7b69e14649f..e46752874b47 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -197,6 +197,11 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
struct page *page;
int ret;
unsigned int gup_flags = FOLL_FORCE;
+ /*
+ * No concurrency on bprm->mm yet -- this is the exec path,
+ * but gup needs an mmrange.
+ */
+ DEFINE_RANGE_LOCK_FULL(mmrange);

#ifdef CONFIG_STACK_GROWSUP
if (write) {
@@ -214,7 +219,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
* doing the exec and bprm->mm is the new process's mm.
*/
ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
- &page, NULL, NULL);
+ &page, NULL, NULL, &mmrange);
if (ret <= 0)
return NULL;

@@ -615,7 +620,8 @@ EXPORT_SYMBOL(copy_strings_kernel);
* 4) Free up any cleared pgd range.
* 5) Shrink the vma to cover only the new range.
*/
-static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
+static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long old_start = vma->vm_start;
@@ -637,7 +643,8 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
/*
* cover the whole range: [new_start, old_end)
*/
- if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
+ if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL,
+ mmrange))
return -ENOMEM;

/*
@@ -671,7 +678,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
/*
* Shrink the vma to just the new range. Always succeeds.
*/
- vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
+ vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL, mmrange);

return 0;
}
@@ -694,6 +701,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
unsigned long stack_size;
unsigned long stack_expand;
unsigned long rlim_stack;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size */
@@ -749,14 +757,14 @@ int setup_arg_pages(struct linux_binprm *bprm,
vm_flags |= VM_STACK_INCOMPLETE_SETUP;

ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
- vm_flags);
+ vm_flags, &mmrange);
if (ret)
goto out_unlock;
BUG_ON(prev != vma);

/* Move stack pages down in memory. */
if (stack_shift) {
- ret = shift_arg_pages(vma, stack_shift);
+ ret = shift_arg_pages(vma, stack_shift, &mmrange);
if (ret)
goto out_unlock;
}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index d697c8ab0a14..791f9f93643c 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -16,6 +16,7 @@
#include <linux/binfmts.h>
#include <linux/sched/coredump.h>
#include <linux/sched/task.h>
+#include <linux/range_lock.h>

struct ctl_table_header;
struct mempolicy;
@@ -263,6 +264,8 @@ struct proc_maps_private {
#ifdef CONFIG_NUMA
struct mempolicy *task_mempolicy;
#endif
+ /* mmap_sem is held across all stages of seqfile */
+ struct range_lock mmrange;
} __randomize_layout;

struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index b66fc8de7d34..7c0a79a937b5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -174,6 +174,7 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
if (!mm || !mmget_not_zero(mm))
return NULL;

+ range_lock_init_full(&priv->mmrange);
down_read(&mm->mmap_sem);
hold_task_mempolicy(priv);
priv->tail_vma = get_gate_vma(mm);
@@ -514,7 +515,7 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,

#ifdef CONFIG_SHMEM
static int smaps_pte_hole(unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
struct mem_size_stats *mss = walk->private;

@@ -605,7 +606,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
#endif

static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
struct vm_area_struct *vma = walk->vma;
pte_t *pte;
@@ -797,7 +798,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
#endif

/* mmap_sem is held in m_start */
- walk_page_vma(vma, &smaps_walk);
+ walk_page_vma(vma, &smaps_walk, &priv->mmrange);
if (vma->vm_flags & VM_LOCKED)
mss->pss_locked += mss->pss;

@@ -1012,7 +1013,8 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
#endif

static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
- unsigned long end, struct mm_walk *walk)
+ unsigned long end, struct mm_walk *walk,
+ struct range_lock *mmrange)
{
struct clear_refs_private *cp = walk->private;
struct vm_area_struct *vma = walk->vma;
@@ -1103,6 +1105,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
struct mmu_gather tlb;
int itype;
int rv;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

memset(buffer, 0, sizeof(buffer));
if (count > sizeof(buffer) - 1)
@@ -1166,7 +1169,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
}
mmu_notifier_invalidate_range_start(mm, 0, -1);
}
- walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
+ walk_page_range(0, mm->highest_vm_end, &clear_refs_walk,
+ &mmrange);
if (type == CLEAR_REFS_SOFT_DIRTY)
mmu_notifier_invalidate_range_end(mm, 0, -1);
tlb_finish_mmu(&tlb, 0, -1);
@@ -1223,7 +1227,7 @@ static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
}

static int pagemap_pte_hole(unsigned long start, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
struct pagemapread *pm = walk->private;
unsigned long addr = start;
@@ -1301,7 +1305,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
}

static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
struct vm_area_struct *vma = walk->vma;
struct pagemapread *pm = walk->private;
@@ -1467,6 +1471,8 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
unsigned long start_vaddr;
unsigned long end_vaddr;
int ret = 0, copied = 0;
+ DEFINE_RANGE_LOCK_FULL(tmprange);
+ struct range_lock *mmrange = &tmprange;

if (!mm || !mmget_not_zero(mm))
goto out;
@@ -1523,7 +1529,8 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
if (end < start_vaddr || end > end_vaddr)
end = end_vaddr;
down_read(&mm->mmap_sem);
- ret = walk_page_range(start_vaddr, end, &pagemap_walk);
+ ret = walk_page_range(start_vaddr, end, &pagemap_walk,
+ mmrange);
up_read(&mm->mmap_sem);
start_vaddr = end;

@@ -1671,7 +1678,8 @@ static struct page *can_gather_numa_stats_pmd(pmd_t pmd,
#endif

static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
- unsigned long end, struct mm_walk *walk)
+ unsigned long end, struct mm_walk *walk,
+ struct range_lock *mmrange)
{
struct numa_maps *md = walk->private;
struct vm_area_struct *vma = walk->vma;
@@ -1740,6 +1748,7 @@ static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
*/
static int show_numa_map(struct seq_file *m, void *v, int is_pid)
{
+ struct proc_maps_private *priv = m->private;
struct numa_maps_private *numa_priv = m->private;
struct proc_maps_private *proc_priv = &numa_priv->proc_maps;
struct vm_area_struct *vma = v;
@@ -1785,7 +1794,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
seq_puts(m, " huge");

/* mmap_sem is held by m_start */
- walk_page_vma(vma, &walk);
+ walk_page_vma(vma, &walk, &priv->mmrange);

if (!md->pages)
goto out;
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a45f0af22a60..3768955c10bc 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -350,6 +350,11 @@ static int remap_oldmem_pfn_checked(struct vm_area_struct *vma,
unsigned long pos_start, pos_end, pos;
unsigned long zeropage_pfn = my_zero_pfn(0);
size_t len = 0;
+ /*
+ * No concurrency concerns here -- this is a vmcore path,
+ * but do_munmap() needs an mmrange.
+ */
+ DEFINE_RANGE_LOCK_FULL(mmrange);

pos_start = pfn;
pos_end = pfn + (size >> PAGE_SHIFT);
@@ -388,7 +393,7 @@ static int remap_oldmem_pfn_checked(struct vm_area_struct *vma,
}
return 0;
fail:
- do_munmap(vma->vm_mm, from, len, NULL);
+ do_munmap(vma->vm_mm, from, len, NULL, &mmrange);
return -EAGAIN;
}

@@ -411,6 +416,11 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
size_t size = vma->vm_end - vma->vm_start;
u64 start, end, len, tsz;
struct vmcore *m;
+ /*
+ * No concurrency concerns here -- this is a vmcore path,
+ * but do_munmap() needs an mmrange.
+ */
+ DEFINE_RANGE_LOCK_FULL(mmrange);

start = (u64)vma->vm_pgoff << PAGE_SHIFT;
end = start + size;
@@ -481,7 +491,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)

return 0;
fail:
- do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
+ do_munmap(vma->vm_mm, vma->vm_start, len, NULL, &mmrange);
return -EAGAIN;
}
#else
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 87a13a7c8270..e3089865fd52 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -851,6 +851,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
/* len == 0 means wake all */
struct userfaultfd_wake_range range = { .len = 0, };
unsigned long new_flags;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

WRITE_ONCE(ctx->released, true);

@@ -880,7 +881,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
new_flags, vma->anon_vma,
vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- NULL_VM_UFFD_CTX);
+ NULL_VM_UFFD_CTX, &mmrange);
if (prev)
vma = prev;
else
@@ -1276,6 +1277,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
bool found;
bool basic_ioctls;
unsigned long start, end, vma_end;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

user_uffdio_register = (struct uffdio_register __user *) arg;

@@ -1413,18 +1415,19 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
prev = vma_merge(mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- ((struct vm_userfaultfd_ctx){ ctx }));
+ ((struct vm_userfaultfd_ctx){ ctx }),
+ &mmrange);
if (prev) {
vma = prev;
goto next;
}
if (vma->vm_start < start) {
- ret = split_vma(mm, vma, start, 1);
+ ret = split_vma(mm, vma, start, 1, &mmrange);
if (ret)
break;
}
if (vma->vm_end > end) {
- ret = split_vma(mm, vma, end, 0);
+ ret = split_vma(mm, vma, end, 0, &mmrange);
if (ret)
break;
}
@@ -1471,6 +1474,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
bool found;
unsigned long start, end, vma_end;
const void __user *buf = (void __user *)arg;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

ret = -EFAULT;
if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
@@ -1571,18 +1575,18 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
prev = vma_merge(mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- NULL_VM_UFFD_CTX);
+ NULL_VM_UFFD_CTX, &mmrange);
if (prev) {
vma = prev;
goto next;
}
if (vma->vm_start < start) {
- ret = split_vma(mm, vma, start, 1);
+ ret = split_vma(mm, vma, start, 1, &mmrange);
if (ret)
break;
}
if (vma->vm_end > end) {
- ret = split_vma(mm, vma, end, 0);
+ ret = split_vma(mm, vma, end, 0, &mmrange);
if (ret)
break;
}
diff --git a/include/asm-generic/mm_hooks.h b/include/asm-generic/mm_hooks.h
index 8ac4e68a12f0..2115deceded1 100644
--- a/include/asm-generic/mm_hooks.h
+++ b/include/asm-generic/mm_hooks.h
@@ -19,7 +19,8 @@ static inline void arch_exit_mmap(struct mm_struct *mm)

static inline void arch_unmap(struct mm_struct *mm,
struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
}

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 325017ad9311..da004594d831 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -295,7 +295,7 @@ int hmm_vma_get_pfns(struct vm_area_struct *vma,
struct hmm_range *range,
unsigned long start,
unsigned long end,
- hmm_pfn_t *pfns);
+ hmm_pfn_t *pfns, struct range_lock *mmrange);
bool hmm_vma_range_done(struct vm_area_struct *vma, struct hmm_range *range);


@@ -323,7 +323,7 @@ int hmm_vma_fault(struct vm_area_struct *vma,
unsigned long end,
hmm_pfn_t *pfns,
bool write,
- bool block);
+ bool block, struct range_lock *mmrange);
#endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */


diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 44368b19b27e..19667b75f73c 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -20,7 +20,8 @@ struct mem_cgroup;

#ifdef CONFIG_KSM
int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, int advice, unsigned long *vm_flags);
+ unsigned long end, int advice, unsigned long *vm_flags,
+ struct range_lock *mmrange);
int __ksm_enter(struct mm_struct *mm);
void __ksm_exit(struct mm_struct *mm);

@@ -78,7 +79,8 @@ static inline void ksm_exit(struct mm_struct *mm)

#ifdef CONFIG_MMU
static inline int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, int advice, unsigned long *vm_flags)
+ unsigned long end, int advice, unsigned long *vm_flags,
+ struct range_lock *mmrange)
{
return 0;
}
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 0c6fe904bc97..fa08e348a295 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -272,7 +272,7 @@ int migrate_vma(const struct migrate_vma_ops *ops,
unsigned long end,
unsigned long *src,
unsigned long *dst,
- void *private);
+ void *private, struct range_lock *mmrange);
#else
static inline int migrate_vma(const struct migrate_vma_ops *ops,
struct vm_area_struct *vma,
@@ -280,7 +280,7 @@ static inline int migrate_vma(const struct migrate_vma_ops *ops,
unsigned long end,
unsigned long *src,
unsigned long *dst,
- void *private)
+ void *private, struct range_lock *mmrange)
{
return -EINVAL;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bcf2509d448d..fc4e7fdc3e76 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1295,11 +1295,12 @@ struct mm_walk {
int (*pud_entry)(pud_t *pud, unsigned long addr,
unsigned long next, struct mm_walk *walk);
int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
- unsigned long next, struct mm_walk *walk);
+ unsigned long next, struct mm_walk *walk,
+ struct range_lock *mmrange);
int (*pte_entry)(pte_t *pte, unsigned long addr,
unsigned long next, struct mm_walk *walk);
int (*pte_hole)(unsigned long addr, unsigned long next,
- struct mm_walk *walk);
+ struct mm_walk *walk, struct range_lock *mmrange);
int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
unsigned long addr, unsigned long next,
struct mm_walk *walk);
@@ -1311,8 +1312,9 @@ struct mm_walk {
};

int walk_page_range(unsigned long addr, unsigned long end,
- struct mm_walk *walk);
-int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk);
+ struct mm_walk *walk, struct range_lock *mmrange);
+int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk,
+ struct range_lock *mmrange);
void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor, unsigned long ceiling);
int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
@@ -1337,17 +1339,18 @@ int invalidate_inode_page(struct page *page);

#ifdef CONFIG_MMU
extern int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
- unsigned int flags);
+ unsigned int flags, struct range_lock *mmrange);
extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
- bool *unlocked);
+ bool *unlocked, struct range_lock *mmrange);
void unmap_mapping_pages(struct address_space *mapping,
pgoff_t start, pgoff_t nr, bool even_cows);
void unmap_mapping_range(struct address_space *mapping,
loff_t const holebegin, loff_t const holelen, int even_cows);
#else
static inline int handle_mm_fault(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags)
+ unsigned long address, unsigned int flags,
+ struct range_lock *mmrange)
{
/* should never happen if there's no MMU */
BUG();
@@ -1355,7 +1358,8 @@ static inline int handle_mm_fault(struct vm_area_struct *vma,
}
static inline int fixup_user_fault(struct task_struct *tsk,
struct mm_struct *mm, unsigned long address,
- unsigned int fault_flags, bool *unlocked)
+ unsigned int fault_flags, bool *unlocked,
+ struct range_lock *mmrange)
{
/* should never happen if there's no MMU */
BUG();
@@ -1383,24 +1387,28 @@ extern int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked);
+ struct vm_area_struct **vmas, int *locked,
+ struct range_lock *mmrange);
long get_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas);
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas, struct range_lock *mmrange);
long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages, int *locked);
+ unsigned int gup_flags, struct page **pages,
+ int *locked, struct range_lock *mmrange);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
#ifdef CONFIG_FS_DAX
long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas);
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas,
+ struct range_lock *mmrange);
#else
static inline long get_user_pages_longterm(unsigned long start,
unsigned long nr_pages, unsigned int gup_flags,
- struct page **pages, struct vm_area_struct **vmas)
+ struct page **pages, struct vm_area_struct **vmas,
+ struct range_lock *mmrange)
{
- return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+ return get_user_pages(start, nr_pages, gup_flags, pages, vmas, mmrange);
}
#endif /* CONFIG_FS_DAX */

@@ -1505,7 +1513,8 @@ extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long
int dirty_accountable, int prot_numa);
extern int mprotect_fixup(struct vm_area_struct *vma,
struct vm_area_struct **pprev, unsigned long start,
- unsigned long end, unsigned long newflags);
+ unsigned long end, unsigned long newflags,
+ struct range_lock *mmrange);

/*
* doesn't attempt to fault and will return short.
@@ -2149,28 +2158,30 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
extern int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
- struct vm_area_struct *expand);
+ struct vm_area_struct *expand, struct range_lock *mmrange);
static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
+ unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
+ struct range_lock *mmrange)
{
- return __vma_adjust(vma, start, end, pgoff, insert, NULL);
+ return __vma_adjust(vma, start, end, pgoff, insert, NULL, mmrange);
}
extern struct vm_area_struct *vma_merge(struct mm_struct *,
struct vm_area_struct *prev, unsigned long addr, unsigned long end,
unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
- struct mempolicy *, struct vm_userfaultfd_ctx);
+ struct mempolicy *, struct vm_userfaultfd_ctx,
+ struct range_lock *mmrange);
extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
- unsigned long addr, int new_below);
+ unsigned long addr, int new_below, struct range_lock *mmrange);
extern int split_vma(struct mm_struct *, struct vm_area_struct *,
- unsigned long addr, int new_below);
+ unsigned long addr, int new_below, struct range_lock *mmrange);
extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
extern void __vma_link_rb(struct mm_struct *, struct vm_area_struct *,
struct rb_node **, struct rb_node *);
extern void unlink_file_vma(struct vm_area_struct *);
extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
unsigned long addr, unsigned long len, pgoff_t pgoff,
- bool *need_rmap_locks);
+ bool *need_rmap_locks, struct range_lock *mmrange);
extern void exit_mmap(struct mm_struct *);

static inline int check_data_rlimit(unsigned long rlim,
@@ -2212,21 +2223,22 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo

extern unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
- struct list_head *uf);
+ struct list_head *uf, struct range_lock *mmrange);
extern unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
- struct list_head *uf);
+ struct list_head *uf, struct range_lock *mmrange);
extern int do_munmap(struct mm_struct *, unsigned long, size_t,
- struct list_head *uf);
+ struct list_head *uf, struct range_lock *mmrange);

static inline unsigned long
do_mmap_pgoff(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
unsigned long pgoff, unsigned long *populate,
- struct list_head *uf)
+ struct list_head *uf, struct range_lock *mmrange)
{
- return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate, uf);
+ return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate,
+ uf, mmrange);
}

#ifdef CONFIG_MMU
@@ -2405,7 +2417,8 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
#endif

-struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
+struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr,
+ struct range_lock *);
int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e950df8..79eb735e7c95 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -34,6 +34,7 @@ struct mm_struct;
struct inode;
struct notifier_block;
struct page;
+struct range_lock;

#define UPROBE_HANDLER_REMOVE 1
#define UPROBE_HANDLER_MASK 1
@@ -115,17 +116,20 @@ struct uprobes_state {
struct xol_area *xol_area;
};

-extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
-extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
+extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm,
+ unsigned long vaddr, struct range_lock *mmrange);
+extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm,
+ unsigned long vaddr, struct range_lock *mmrange);
extern bool is_swbp_insn(uprobe_opcode_t *insn);
extern bool is_trap_insn(uprobe_opcode_t *insn);
extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
+ uprobe_opcode_t, struct range_lock *mmrange);
extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
-extern int uprobe_mmap(struct vm_area_struct *vma);
+extern int uprobe_mmap(struct vm_area_struct *vma, struct range_lock *mmrange);
extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end);
extern void uprobe_start_dup_mmap(void);
extern void uprobe_end_dup_mmap(void);
@@ -169,7 +173,8 @@ static inline void
uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
{
}
-static inline int uprobe_mmap(struct vm_area_struct *vma)
+static inline int uprobe_mmap(struct vm_area_struct *vma,
+ struct range_lock *mmrange)
{
return 0;
}
diff --git a/ipc/shm.c b/ipc/shm.c
index 4643865e9171..6c29c791c7f2 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1293,6 +1293,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
struct path path;
fmode_t f_mode;
unsigned long populate = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

err = -EINVAL;
if (shmid < 0)
@@ -1411,7 +1412,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
goto invalid;
}

- addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate, NULL);
+ addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate, NULL,
+ &mmrange);
*raddr = addr;
err = 0;
if (IS_ERR_VALUE(addr))
@@ -1487,6 +1489,7 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
struct file *file;
struct vm_area_struct *next;
#endif
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (addr & ~PAGE_MASK)
return retval;
@@ -1537,7 +1540,8 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
*/
file = vma->vm_file;
size = i_size_read(file_inode(vma->vm_file));
- do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
+ do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start,
+ NULL, &mmrange);
/*
* We discovered the size of the shm segment, so
* break out of here and fall through to the next
@@ -1564,7 +1568,8 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
if ((vma->vm_ops == &shm_vm_ops) &&
((vma->vm_start - addr)/PAGE_SIZE == vma->vm_pgoff) &&
(vma->vm_file == file))
- do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
+ do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start,
+ NULL, &mmrange);
vma = next;
}

@@ -1573,7 +1578,8 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
* given
*/
if (vma && vma->vm_start == addr && vma->vm_ops == &shm_vm_ops) {
- do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
+ do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start,
+ NULL, &mmrange);
retval = 0;
}

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ce6848e46e94..60e12b39182c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -300,7 +300,7 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
* Return 0 (success) or a negative errno.
*/
int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
- uprobe_opcode_t opcode)
+ uprobe_opcode_t opcode, struct range_lock *mmrange)
{
struct page *old_page, *new_page;
struct vm_area_struct *vma;
@@ -309,7 +309,8 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
retry:
/* Read the page with vaddr into memory */
ret = get_user_pages_remote(NULL, mm, vaddr, 1,
- FOLL_FORCE | FOLL_SPLIT, &old_page, &vma, NULL);
+ FOLL_FORCE | FOLL_SPLIT, &old_page, &vma, NULL,
+ mmrange);
if (ret <= 0)
return ret;

@@ -349,9 +350,10 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
* For mm @mm, store the breakpoint instruction at @vaddr.
* Return 0 (success) or a negative errno.
*/
-int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
+int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
+ unsigned long vaddr, struct range_lock *mmrange)
{
- return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+ return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN, mmrange);
}

/**
@@ -364,9 +366,12 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
* Return 0 (success) or a negative errno.
*/
int __weak
-set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
+set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
+ unsigned long vaddr, struct range_lock *mmrange)
{
- return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+ return uprobe_write_opcode(mm, vaddr,
+ *(uprobe_opcode_t *)&auprobe->insn,
+ mmrange);
}

static struct uprobe *get_uprobe(struct uprobe *uprobe)
@@ -650,7 +655,8 @@ static bool filter_chain(struct uprobe *uprobe,

static int
install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
- struct vm_area_struct *vma, unsigned long vaddr)
+ struct vm_area_struct *vma, unsigned long vaddr,
+ struct range_lock *mmrange)
{
bool first_uprobe;
int ret;
@@ -667,7 +673,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
if (first_uprobe)
set_bit(MMF_HAS_UPROBES, &mm->flags);

- ret = set_swbp(&uprobe->arch, mm, vaddr);
+ ret = set_swbp(&uprobe->arch, mm, vaddr, mmrange);
if (!ret)
clear_bit(MMF_RECALC_UPROBES, &mm->flags);
else if (first_uprobe)
@@ -677,10 +683,11 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
}

static int
-remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
+remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
+ unsigned long vaddr, struct range_lock *mmrange)
{
set_bit(MMF_RECALC_UPROBES, &mm->flags);
- return set_orig_insn(&uprobe->arch, mm, vaddr);
+ return set_orig_insn(&uprobe->arch, mm, vaddr, mmrange);
}

static inline bool uprobe_is_active(struct uprobe *uprobe)
@@ -794,6 +801,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
bool is_register = !!new;
struct map_info *info;
int err = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

percpu_down_write(&dup_mmap_sem);
info = build_map_info(uprobe->inode->i_mapping,
@@ -824,11 +832,13 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
/* consult only the "caller", new consumer. */
if (consumer_filter(new,
UPROBE_FILTER_REGISTER, mm))
- err = install_breakpoint(uprobe, mm, vma, info->vaddr);
+ err = install_breakpoint(uprobe, mm, vma,
+ info->vaddr, &mmrange);
} else if (test_bit(MMF_HAS_UPROBES, &mm->flags)) {
if (!filter_chain(uprobe,
UPROBE_FILTER_UNREGISTER, mm))
- err |= remove_breakpoint(uprobe, mm, info->vaddr);
+ err |= remove_breakpoint(uprobe, mm,
+ info->vaddr, &mmrange);
}

unlock:
@@ -972,6 +982,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
{
struct vm_area_struct *vma;
int err = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_read(&mm->mmap_sem);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
@@ -988,7 +999,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
continue;

vaddr = offset_to_vaddr(vma, uprobe->offset);
- err |= remove_breakpoint(uprobe, mm, vaddr);
+ err |= remove_breakpoint(uprobe, mm, vaddr, &mmrange);
}
up_read(&mm->mmap_sem);

@@ -1063,7 +1074,7 @@ static void build_probe_list(struct inode *inode,
* Currently we ignore all errors and always return 0, the callers
* can't handle the failure anyway.
*/
-int uprobe_mmap(struct vm_area_struct *vma)
+int uprobe_mmap(struct vm_area_struct *vma, struct range_lock *mmrange)
{
struct list_head tmp_list;
struct uprobe *uprobe, *u;
@@ -1087,7 +1098,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
if (!fatal_signal_pending(current) &&
filter_chain(uprobe, UPROBE_FILTER_MMAP, vma->vm_mm)) {
unsigned long vaddr = offset_to_vaddr(vma, uprobe->offset);
- install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
+ install_breakpoint(uprobe, vma->vm_mm, vma, vaddr, mmrange);
}
put_uprobe(uprobe);
}
@@ -1698,7 +1709,8 @@ static void mmf_recalc_uprobes(struct mm_struct *mm)
clear_bit(MMF_HAS_UPROBES, &mm->flags);
}

-static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
+static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr,
+ struct range_lock *mmrange)
{
struct page *page;
uprobe_opcode_t opcode;
@@ -1718,7 +1730,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
* essentially a kernel access to the memory.
*/
result = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &page,
- NULL, NULL);
+ NULL, NULL, mmrange);
if (result < 0)
return result;

@@ -1734,6 +1746,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
struct mm_struct *mm = current->mm;
struct uprobe *uprobe = NULL;
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_read(&mm->mmap_sem);
vma = find_vma(mm, bp_vaddr);
@@ -1746,7 +1759,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
}

if (!uprobe)
- *is_swbp = is_trap_at_addr(mm, bp_vaddr);
+ *is_swbp = is_trap_at_addr(mm, bp_vaddr, &mmrange);
} else {
*is_swbp = -EFAULT;
}
diff --git a/kernel/futex.c b/kernel/futex.c
index 1f450e092c74..09a0d86f80a0 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -725,10 +725,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
{
struct mm_struct *mm = current->mm;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_read(&mm->mmap_sem);
ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
- FAULT_FLAG_WRITE, NULL);
+ FAULT_FLAG_WRITE, NULL, &mmrange);
up_read(&mm->mmap_sem);

return ret < 0 ? ret : 0;
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index c64dca6e27c2..d3dccd80c6ee 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -39,6 +39,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
int ret = 0;
int err;
int locked;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (nr_frames == 0)
return 0;
@@ -71,7 +72,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
vec->got_ref = true;
vec->is_pfns = false;
ret = get_user_pages_locked(start, nr_frames,
- gup_flags, (struct page **)(vec->ptrs), &locked);
+ gup_flags, (struct page **)(vec->ptrs), &locked,
+ &mmrange);
goto out;
}

diff --git a/mm/gup.c b/mm/gup.c
index 1b46e6e74881..01983a7b3750 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -478,7 +478,8 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
* If it is, *@nonblocking will be set to 0 and -EBUSY returned.
*/
static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
- unsigned long address, unsigned int *flags, int *nonblocking)
+ unsigned long address, unsigned int *flags, int *nonblocking,
+ struct range_lock *mmrange)
{
unsigned int fault_flags = 0;
int ret;
@@ -499,7 +500,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
fault_flags |= FAULT_FLAG_TRIED;
}

- ret = handle_mm_fault(vma, address, fault_flags);
+ ret = handle_mm_fault(vma, address, fault_flags, mmrange);
if (ret & VM_FAULT_ERROR) {
int err = vm_fault_to_errno(ret, *flags);

@@ -592,6 +593,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
* @vmas: array of pointers to vmas corresponding to each page.
* Or NULL if the caller does not require them.
* @nonblocking: whether waiting for disk IO or mmap_sem contention
+ * @mmrange: mm address space range lock
*
* Returns number of pages pinned. This may be fewer than the number
* requested. If nr_pages is 0 or negative, returns 0. If no pages
@@ -638,7 +640,8 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *nonblocking)
+ struct vm_area_struct **vmas, int *nonblocking,
+ struct range_lock *mmrange)
{
long i = 0;
unsigned int page_mask;
@@ -664,7 +667,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,

/* first iteration or cross vma bound */
if (!vma || start >= vma->vm_end) {
- vma = find_extend_vma(mm, start);
+ vma = find_extend_vma(mm, start, mmrange);
if (!vma && in_gate_area(mm, start)) {
int ret;
ret = get_gate_page(mm, start & PAGE_MASK,
@@ -697,7 +700,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
if (!page) {
int ret;
ret = faultin_page(tsk, vma, start, &foll_flags,
- nonblocking);
+ nonblocking, mmrange);
switch (ret) {
case 0:
goto retry;
@@ -796,7 +799,7 @@ static bool vma_permits_fault(struct vm_area_struct *vma,
*/
int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
- bool *unlocked)
+ bool *unlocked, struct range_lock *mmrange)
{
struct vm_area_struct *vma;
int ret, major = 0;
@@ -805,14 +808,14 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
fault_flags |= FAULT_FLAG_ALLOW_RETRY;

retry:
- vma = find_extend_vma(mm, address);
+ vma = find_extend_vma(mm, address, mmrange);
if (!vma || address < vma->vm_start)
return -EFAULT;

if (!vma_permits_fault(vma, fault_flags))
return -EFAULT;

- ret = handle_mm_fault(vma, address, fault_flags);
+ ret = handle_mm_fault(vma, address, fault_flags, mmrange);
major |= ret & VM_FAULT_MAJOR;
if (ret & VM_FAULT_ERROR) {
int err = vm_fault_to_errno(ret, 0);
@@ -849,7 +852,8 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
struct page **pages,
struct vm_area_struct **vmas,
int *locked,
- unsigned int flags)
+ unsigned int flags,
+ struct range_lock *mmrange)
{
long ret, pages_done;
bool lock_dropped;
@@ -868,7 +872,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
lock_dropped = false;
for (;;) {
ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages,
- vmas, locked);
+ vmas, locked, mmrange);
if (!locked)
/* VM_FAULT_RETRY couldn't trigger, bypass */
return ret;
@@ -908,7 +912,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
lock_dropped = true;
down_read(&mm->mmap_sem);
ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
- pages, NULL, NULL);
+ pages, NULL, NULL, mmrange);
if (ret != 1) {
BUG_ON(ret > 1);
if (!pages_done)
@@ -956,11 +960,11 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
*/
long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- int *locked)
+ int *locked, struct range_lock *mmrange)
{
return __get_user_pages_locked(current, current->mm, start, nr_pages,
pages, NULL, locked,
- gup_flags | FOLL_TOUCH);
+ gup_flags | FOLL_TOUCH, mmrange);
}
EXPORT_SYMBOL(get_user_pages_locked);

@@ -985,10 +989,11 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct mm_struct *mm = current->mm;
int locked = 1;
long ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_read(&mm->mmap_sem);
ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL,
- &locked, gup_flags | FOLL_TOUCH);
+ &locked, gup_flags | FOLL_TOUCH, &mmrange);
if (locked)
up_read(&mm->mmap_sem);
return ret;
@@ -1054,11 +1059,13 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ struct vm_area_struct **vmas, int *locked,
+ struct range_lock *mmrange)
{
return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
locked,
- gup_flags | FOLL_TOUCH | FOLL_REMOTE);
+ gup_flags | FOLL_TOUCH | FOLL_REMOTE,
+ mmrange);
}
EXPORT_SYMBOL(get_user_pages_remote);

@@ -1071,11 +1078,11 @@ EXPORT_SYMBOL(get_user_pages_remote);
*/
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas)
+ struct vm_area_struct **vmas, struct range_lock *mmrange)
{
return __get_user_pages_locked(current, current->mm, start, nr_pages,
pages, vmas, NULL,
- gup_flags | FOLL_TOUCH);
+ gup_flags | FOLL_TOUCH, mmrange);
}
EXPORT_SYMBOL(get_user_pages);

@@ -1094,7 +1101,8 @@ EXPORT_SYMBOL(get_user_pages);
*/
long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas_arg)
+ struct vm_area_struct **vmas_arg,
+ struct range_lock *mmrange)
{
struct vm_area_struct **vmas = vmas_arg;
struct vm_area_struct *vma_prev = NULL;
@@ -1110,7 +1118,7 @@ long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
return -ENOMEM;
}

- rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+ rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas, mmrange);

for (i = 0; i < rc; i++) {
struct vm_area_struct *vma = vmas[i];
@@ -1149,6 +1157,7 @@ EXPORT_SYMBOL(get_user_pages_longterm);
* @start: start address
* @end: end address
* @nonblocking:
+ * @mmrange: mm address space range lock
*
* This takes care of mlocking the pages too if VM_LOCKED is set.
*
@@ -1163,7 +1172,8 @@ EXPORT_SYMBOL(get_user_pages_longterm);
* released. If it's released, *@nonblocking will be set to 0.
*/
long populate_vma_page_range(struct vm_area_struct *vma,
- unsigned long start, unsigned long end, int *nonblocking)
+ unsigned long start, unsigned long end, int *nonblocking,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long nr_pages = (end - start) / PAGE_SIZE;
@@ -1198,7 +1208,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
* not result in a stack expansion that recurses back here.
*/
return __get_user_pages(current, mm, start, nr_pages, gup_flags,
- NULL, NULL, nonblocking);
+ NULL, NULL, nonblocking, mmrange);
}

/*
@@ -1215,6 +1225,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
struct vm_area_struct *vma = NULL;
int locked = 0;
long ret = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

VM_BUG_ON(start & ~PAGE_MASK);
VM_BUG_ON(len != PAGE_ALIGN(len));
@@ -1247,7 +1258,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
* double checks the vma flags, so that it won't mlock pages
* if the vma was already munlocked.
*/
- ret = populate_vma_page_range(vma, nstart, nend, &locked);
+ ret = populate_vma_page_range(vma, nstart, nend, &locked, &mmrange);
if (ret < 0) {
if (ignore_errors) {
ret = 0;
@@ -1282,10 +1293,11 @@ struct page *get_dump_page(unsigned long addr)
{
struct vm_area_struct *vma;
struct page *page;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (__get_user_pages(current, current->mm, addr, 1,
FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma,
- NULL) < 1)
+ NULL, &mmrange) < 1)
return NULL;
flush_cache_page(vma, addr, page_to_pfn(page));
return page;
diff --git a/mm/hmm.c b/mm/hmm.c
index 320545b98ff5..b14e6869689e 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -245,7 +245,8 @@ struct hmm_vma_walk {

static int hmm_vma_do_fault(struct mm_walk *walk,
unsigned long addr,
- hmm_pfn_t *pfn)
+ hmm_pfn_t *pfn,
+ struct range_lock *mmrange)
{
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_REMOTE;
struct hmm_vma_walk *hmm_vma_walk = walk->private;
@@ -254,7 +255,7 @@ static int hmm_vma_do_fault(struct mm_walk *walk,

flags |= hmm_vma_walk->block ? 0 : FAULT_FLAG_ALLOW_RETRY;
flags |= hmm_vma_walk->write ? FAULT_FLAG_WRITE : 0;
- r = handle_mm_fault(vma, addr, flags);
+ r = handle_mm_fault(vma, addr, flags, mmrange);
if (r & VM_FAULT_RETRY)
return -EBUSY;
if (r & VM_FAULT_ERROR) {
@@ -298,7 +299,9 @@ static void hmm_pfns_clear(hmm_pfn_t *pfns,

static int hmm_vma_walk_hole(unsigned long addr,
unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk,
+ struct range_lock *mmrange)
+
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
@@ -312,7 +315,7 @@ static int hmm_vma_walk_hole(unsigned long addr,
if (hmm_vma_walk->fault) {
int ret;

- ret = hmm_vma_do_fault(walk, addr, &pfns[i]);
+ ret = hmm_vma_do_fault(walk, addr, &pfns[i], mmrange);
if (ret != -EAGAIN)
return ret;
}
@@ -323,7 +326,8 @@ static int hmm_vma_walk_hole(unsigned long addr,

static int hmm_vma_walk_clear(unsigned long addr,
unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk,
+ struct range_lock *mmrange)
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
@@ -337,7 +341,7 @@ static int hmm_vma_walk_clear(unsigned long addr,
if (hmm_vma_walk->fault) {
int ret;

- ret = hmm_vma_do_fault(walk, addr, &pfns[i]);
+ ret = hmm_vma_do_fault(walk, addr, &pfns[i], mmrange);
if (ret != -EAGAIN)
return ret;
}
@@ -349,7 +353,8 @@ static int hmm_vma_walk_clear(unsigned long addr,
static int hmm_vma_walk_pmd(pmd_t *pmdp,
unsigned long start,
unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk,
+ struct range_lock *mmrange)
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
@@ -366,7 +371,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,

again:
if (pmd_none(*pmdp))
- return hmm_vma_walk_hole(start, end, walk);
+ return hmm_vma_walk_hole(start, end, walk, mmrange);

if (pmd_huge(*pmdp) && vma->vm_flags & VM_HUGETLB)
return hmm_pfns_bad(start, end, walk);
@@ -389,10 +394,10 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
goto again;
if (pmd_protnone(pmd))
- return hmm_vma_walk_clear(start, end, walk);
+ return hmm_vma_walk_clear(start, end, walk, mmrange);

if (write_fault && !pmd_write(pmd))
- return hmm_vma_walk_clear(start, end, walk);
+ return hmm_vma_walk_clear(start, end, walk, mmrange);

pfn = pmd_pfn(pmd) + pte_index(addr);
flag |= pmd_write(pmd) ? HMM_PFN_WRITE : 0;
@@ -464,7 +469,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
fault:
pte_unmap(ptep);
/* Fault all pages in range */
- return hmm_vma_walk_clear(start, end, walk);
+ return hmm_vma_walk_clear(start, end, walk, mmrange);
}
pte_unmap(ptep - 1);

@@ -495,7 +500,8 @@ int hmm_vma_get_pfns(struct vm_area_struct *vma,
struct hmm_range *range,
unsigned long start,
unsigned long end,
- hmm_pfn_t *pfns)
+ hmm_pfn_t *pfns,
+ struct range_lock *mmrange)
{
struct hmm_vma_walk hmm_vma_walk;
struct mm_walk mm_walk;
@@ -541,7 +547,7 @@ int hmm_vma_get_pfns(struct vm_area_struct *vma,
mm_walk.pmd_entry = hmm_vma_walk_pmd;
mm_walk.pte_hole = hmm_vma_walk_hole;

- walk_page_range(start, end, &mm_walk);
+ walk_page_range(start, end, &mm_walk, mmrange);
return 0;
}
EXPORT_SYMBOL(hmm_vma_get_pfns);
@@ -664,7 +670,8 @@ int hmm_vma_fault(struct vm_area_struct *vma,
unsigned long end,
hmm_pfn_t *pfns,
bool write,
- bool block)
+ bool block,
+ struct range_lock *mmrange)
{
struct hmm_vma_walk hmm_vma_walk;
struct mm_walk mm_walk;
@@ -717,7 +724,7 @@ int hmm_vma_fault(struct vm_area_struct *vma,
mm_walk.pte_hole = hmm_vma_walk_hole;

do {
- ret = walk_page_range(start, end, &mm_walk);
+ ret = walk_page_range(start, end, &mm_walk, mmrange);
start = hmm_vma_walk.last;
} while (ret == -EAGAIN);

diff --git a/mm/internal.h b/mm/internal.h
index 62d8c34e63d5..abf1de31e524 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -289,7 +289,8 @@ void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,

#ifdef CONFIG_MMU
extern long populate_vma_page_range(struct vm_area_struct *vma,
- unsigned long start, unsigned long end, int *nonblocking);
+ unsigned long start, unsigned long end, int *nonblocking,
+ struct range_lock *mmrange);
extern void munlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
diff --git a/mm/ksm.c b/mm/ksm.c
index 293721f5da70..66c350cd9799 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -448,7 +448,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
* of the process that owns 'vma'. We also do not want to enforce
* protection keys here anyway.
*/
-static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
+static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
+ struct range_lock *mmrange)
{
struct page *page;
int ret = 0;
@@ -461,7 +462,8 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
break;
if (PageKsm(page))
ret = handle_mm_fault(vma, addr,
- FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE);
+ FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
+ mmrange);
else
ret = VM_FAULT_WRITE;
put_page(page);
@@ -516,6 +518,7 @@ static void break_cow(struct rmap_item *rmap_item)
struct mm_struct *mm = rmap_item->mm;
unsigned long addr = rmap_item->address;
struct vm_area_struct *vma;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/*
* It is not an accident that whenever we want to break COW
@@ -526,7 +529,7 @@ static void break_cow(struct rmap_item *rmap_item)
down_read(&mm->mmap_sem);
vma = find_mergeable_vma(mm, addr);
if (vma)
- break_ksm(vma, addr);
+ break_ksm(vma, addr, &mmrange);
up_read(&mm->mmap_sem);
}

@@ -807,7 +810,8 @@ static void remove_trailing_rmap_items(struct mm_slot *mm_slot,
* in cmp_and_merge_page on one of the rmap_items we would be removing.
*/
static int unmerge_ksm_pages(struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
unsigned long addr;
int err = 0;
@@ -818,7 +822,7 @@ static int unmerge_ksm_pages(struct vm_area_struct *vma,
if (signal_pending(current))
err = -ERESTARTSYS;
else
- err = break_ksm(vma, addr);
+ err = break_ksm(vma, addr, mmrange);
}
return err;
}
@@ -922,6 +926,7 @@ static int unmerge_and_remove_all_rmap_items(void)
struct mm_struct *mm;
struct vm_area_struct *vma;
int err = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

spin_lock(&ksm_mmlist_lock);
ksm_scan.mm_slot = list_entry(ksm_mm_head.mm_list.next,
@@ -937,8 +942,8 @@ static int unmerge_and_remove_all_rmap_items(void)
break;
if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
continue;
- err = unmerge_ksm_pages(vma,
- vma->vm_start, vma->vm_end);
+ err = unmerge_ksm_pages(vma, vma->vm_start,
+ vma->vm_end, &mmrange);
if (err)
goto error;
}
@@ -2350,7 +2355,8 @@ static int ksm_scan_thread(void *nothing)
}

int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, int advice, unsigned long *vm_flags)
+ unsigned long end, int advice, unsigned long *vm_flags,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = vma->vm_mm;
int err;
@@ -2384,7 +2390,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
return 0; /* just ignore the advice */

if (vma->anon_vma) {
- err = unmerge_ksm_pages(vma, start, end);
+ err = unmerge_ksm_pages(vma, start, end, mmrange);
if (err)
return err;
}
diff --git a/mm/madvise.c b/mm/madvise.c
index 4d3c922ea1a1..eaec6bfc2b08 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -54,7 +54,8 @@ static int madvise_need_mmap_write(int behavior)
*/
static long madvise_behavior(struct vm_area_struct *vma,
struct vm_area_struct **prev,
- unsigned long start, unsigned long end, int behavior)
+ unsigned long start, unsigned long end, int behavior,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = vma->vm_mm;
int error = 0;
@@ -104,7 +105,8 @@ static long madvise_behavior(struct vm_area_struct *vma,
break;
case MADV_MERGEABLE:
case MADV_UNMERGEABLE:
- error = ksm_madvise(vma, start, end, behavior, &new_flags);
+ error = ksm_madvise(vma, start, end, behavior,
+ &new_flags, mmrange);
if (error) {
/*
* madvise() returns EAGAIN if kernel resources, such as
@@ -138,7 +140,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, mmrange);
if (*prev) {
vma = *prev;
goto success;
@@ -151,7 +153,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
error = -ENOMEM;
goto out;
}
- error = __split_vma(mm, vma, start, 1);
+ error = __split_vma(mm, vma, start, 1, mmrange);
if (error) {
/*
* madvise() returns EAGAIN if kernel resources, such as
@@ -168,7 +170,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
error = -ENOMEM;
goto out;
}
- error = __split_vma(mm, vma, end, 0);
+ error = __split_vma(mm, vma, end, 0, mmrange);
if (error) {
/*
* madvise() returns EAGAIN if kernel resources, such as
@@ -191,7 +193,8 @@ static long madvise_behavior(struct vm_area_struct *vma,

#ifdef CONFIG_SWAP
static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
- unsigned long end, struct mm_walk *walk)
+ unsigned long end, struct mm_walk *walk,
+ struct range_lock *mmrange)
{
pte_t *orig_pte;
struct vm_area_struct *vma = walk->private;
@@ -226,7 +229,8 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
}

static void force_swapin_readahead(struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
struct mm_walk walk = {
.mm = vma->vm_mm,
@@ -234,7 +238,7 @@ static void force_swapin_readahead(struct vm_area_struct *vma,
.private = vma,
};

- walk_page_range(start, end, &walk);
+ walk_page_range(start, end, &walk, mmrange);

lru_add_drain(); /* Push any new pages onto the LRU now */
}
@@ -272,14 +276,15 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
*/
static long madvise_willneed(struct vm_area_struct *vma,
struct vm_area_struct **prev,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ struct range_lock *mmrange)
{
struct file *file = vma->vm_file;

*prev = vma;
#ifdef CONFIG_SWAP
if (!file) {
- force_swapin_readahead(vma, start, end);
+ force_swapin_readahead(vma, start, end, mmrange);
return 0;
}

@@ -308,7 +313,8 @@ static long madvise_willneed(struct vm_area_struct *vma,
}

static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
- unsigned long end, struct mm_walk *walk)
+ unsigned long end, struct mm_walk *walk,
+ struct range_lock *mmrange)

{
struct mmu_gather *tlb = walk->private;
@@ -442,7 +448,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,

static void madvise_free_page_range(struct mmu_gather *tlb,
struct vm_area_struct *vma,
- unsigned long addr, unsigned long end)
+ unsigned long addr, unsigned long end,
+ struct range_lock *mmrange)
{
struct mm_walk free_walk = {
.pmd_entry = madvise_free_pte_range,
@@ -451,12 +458,14 @@ static void madvise_free_page_range(struct mmu_gather *tlb,
};

tlb_start_vma(tlb, vma);
- walk_page_range(addr, end, &free_walk);
+ walk_page_range(addr, end, &free_walk, mmrange);
tlb_end_vma(tlb, vma);
}

static int madvise_free_single_vma(struct vm_area_struct *vma,
- unsigned long start_addr, unsigned long end_addr)
+ unsigned long start_addr,
+ unsigned long end_addr,
+ struct range_lock *mmrange)
{
unsigned long start, end;
struct mm_struct *mm = vma->vm_mm;
@@ -478,7 +487,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
update_hiwater_rss(mm);

mmu_notifier_invalidate_range_start(mm, start, end);
- madvise_free_page_range(&tlb, vma, start, end);
+ madvise_free_page_range(&tlb, vma, start, end, mmrange);
mmu_notifier_invalidate_range_end(mm, start, end);
tlb_finish_mmu(&tlb, start, end);

@@ -514,7 +523,7 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
static long madvise_dontneed_free(struct vm_area_struct *vma,
struct vm_area_struct **prev,
unsigned long start, unsigned long end,
- int behavior)
+ int behavior, struct range_lock *mmrange)
{
*prev = vma;
if (!can_madv_dontneed_vma(vma))
@@ -562,7 +571,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
if (behavior == MADV_DONTNEED)
return madvise_dontneed_single_vma(vma, start, end);
else if (behavior == MADV_FREE)
- return madvise_free_single_vma(vma, start, end);
+ return madvise_free_single_vma(vma, start, end, mmrange);
else
return -EINVAL;
}
@@ -676,18 +685,21 @@ static int madvise_inject_error(int behavior,

static long
madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
- unsigned long start, unsigned long end, int behavior)
+ unsigned long start, unsigned long end, int behavior,
+ struct range_lock *mmrange)
{
switch (behavior) {
case MADV_REMOVE:
return madvise_remove(vma, prev, start, end);
case MADV_WILLNEED:
- return madvise_willneed(vma, prev, start, end);
+ return madvise_willneed(vma, prev, start, end, mmrange);
case MADV_FREE:
case MADV_DONTNEED:
- return madvise_dontneed_free(vma, prev, start, end, behavior);
+ return madvise_dontneed_free(vma, prev, start, end, behavior,
+ mmrange);
default:
- return madvise_behavior(vma, prev, start, end, behavior);
+ return madvise_behavior(vma, prev, start, end, behavior,
+ mmrange);
}
}

@@ -797,7 +809,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
int write;
size_t len;
struct blk_plug plug;
-
+ DEFINE_RANGE_LOCK_FULL(mmrange);
if (!madvise_behavior_valid(behavior))
return error;

@@ -860,7 +872,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
tmp = end;

/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
- error = madvise_vma(vma, &prev, start, tmp, behavior);
+ error = madvise_vma(vma, &prev, start, tmp, behavior, &mmrange);
if (error)
goto out;
start = tmp;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 88c1af32fd67..a7ac5a14b22e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4881,7 +4881,8 @@ static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,

static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma = walk->vma;
pte_t *pte;
@@ -4915,6 +4916,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
{
unsigned long precharge;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

struct mm_walk mem_cgroup_count_precharge_walk = {
.pmd_entry = mem_cgroup_count_precharge_pte_range,
@@ -4922,7 +4924,7 @@ static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
};
down_read(&mm->mmap_sem);
walk_page_range(0, mm->highest_vm_end,
- &mem_cgroup_count_precharge_walk);
+ &mem_cgroup_count_precharge_walk, &mmrange);
up_read(&mm->mmap_sem);

precharge = mc.precharge;
@@ -5081,7 +5083,8 @@ static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset)

static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk,
+ struct range_lock *mmrange)
{
int ret = 0;
struct vm_area_struct *vma = walk->vma;
@@ -5197,6 +5200,7 @@ static void mem_cgroup_move_charge(void)
.pmd_entry = mem_cgroup_move_charge_pte_range,
.mm = mc.mm,
};
+ DEFINE_RANGE_LOCK_FULL(mmrange);

lru_add_drain_all();
/*
@@ -5223,7 +5227,8 @@ static void mem_cgroup_move_charge(void)
* When we have consumed all precharges and failed in doing
* additional charge, the page walk just aborts.
*/
- walk_page_range(0, mc.mm->highest_vm_end, &mem_cgroup_move_charge_walk);
+ walk_page_range(0, mc.mm->highest_vm_end, &mem_cgroup_move_charge_walk,
+ &mmrange);

up_read(&mc.mm->mmap_sem);
atomic_dec(&mc.from->moving_account);
diff --git a/mm/memory.c b/mm/memory.c
index 5ec6433d6a5c..b3561a052939 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4021,7 +4021,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
* return value. See filemap_fault() and __lock_page_or_retry().
*/
static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
- unsigned int flags)
+ unsigned int flags, struct range_lock *mmrange)
{
struct vm_fault vmf = {
.vma = vma,
@@ -4029,6 +4029,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
.flags = flags,
.pgoff = linear_page_index(vma, address),
.gfp_mask = __get_fault_gfp_mask(vma),
+ .lockrange = mmrange,
};
unsigned int dirty = flags & FAULT_FLAG_WRITE;
struct mm_struct *mm = vma->vm_mm;
@@ -4110,7 +4111,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
* return value. See filemap_fault() and __lock_page_or_retry().
*/
int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
- unsigned int flags)
+ unsigned int flags, struct range_lock *mmrange)
{
int ret;

@@ -4137,7 +4138,7 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
if (unlikely(is_vm_hugetlb_page(vma)))
ret = hugetlb_fault(vma->vm_mm, vma, address, flags);
else
- ret = __handle_mm_fault(vma, address, flags);
+ ret = __handle_mm_fault(vma, address, flags, mmrange);

if (flags & FAULT_FLAG_USER) {
mem_cgroup_oom_disable();
@@ -4425,6 +4426,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
struct vm_area_struct *vma;
void *old_buf = buf;
int write = gup_flags & FOLL_WRITE;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_read(&mm->mmap_sem);
/* ignore errors, just check how much was successfully transferred */
@@ -4434,7 +4436,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
struct page *page = NULL;

ret = get_user_pages_remote(tsk, mm, addr, 1,
- gup_flags, &page, &vma, NULL);
+ gup_flags, &page, &vma, NULL, &mmrange);
if (ret <= 0) {
#ifndef CONFIG_HAVE_IOREMAP_PROT
break;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a8b7d59002e8..001dc176abc1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -467,7 +467,8 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
* and move them to the pagelist if they do.
*/
static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
- unsigned long end, struct mm_walk *walk)
+ unsigned long end, struct mm_walk *walk,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma = walk->vma;
struct page *page;
@@ -618,7 +619,7 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
static int
queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
nodemask_t *nodes, unsigned long flags,
- struct list_head *pagelist)
+ struct list_head *pagelist, struct range_lock *mmrange)
{
struct queue_pages qp = {
.pagelist = pagelist,
@@ -634,7 +635,7 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
.private = &qp,
};

- return walk_page_range(start, end, &queue_pages_walk);
+ return walk_page_range(start, end, &queue_pages_walk, mmrange);
}

/*
@@ -675,7 +676,8 @@ static int vma_replace_policy(struct vm_area_struct *vma,

/* Step 2: apply policy to a range and do splits. */
static int mbind_range(struct mm_struct *mm, unsigned long start,
- unsigned long end, struct mempolicy *new_pol)
+ unsigned long end, struct mempolicy *new_pol,
+ struct range_lock *mmrange)
{
struct vm_area_struct *next;
struct vm_area_struct *prev;
@@ -705,7 +707,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
((vmstart - vma->vm_start) >> PAGE_SHIFT);
prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx);
+ new_pol, vma->vm_userfaultfd_ctx, mmrange);
if (prev) {
vma = prev;
next = vma->vm_next;
@@ -715,12 +717,12 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
goto replace;
}
if (vma->vm_start != vmstart) {
- err = split_vma(vma->vm_mm, vma, vmstart, 1);
+ err = split_vma(vma->vm_mm, vma, vmstart, 1, mmrange);
if (err)
goto out;
}
if (vma->vm_end != vmend) {
- err = split_vma(vma->vm_mm, vma, vmend, 0);
+ err = split_vma(vma->vm_mm, vma, vmend, 0, mmrange);
if (err)
goto out;
}
@@ -797,12 +799,12 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
}
}

-static int lookup_node(unsigned long addr)
+static int lookup_node(unsigned long addr, struct range_lock *mmrange)
{
struct page *p;
int err;

- err = get_user_pages(addr & PAGE_MASK, 1, 0, &p, NULL);
+ err = get_user_pages(addr & PAGE_MASK, 1, 0, &p, NULL, mmrange);
if (err >= 0) {
err = page_to_nid(p);
put_page(p);
@@ -818,6 +820,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
struct mempolicy *pol = current->mempolicy;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (flags &
~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR|MPOL_F_MEMS_ALLOWED))
@@ -857,7 +860,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,

if (flags & MPOL_F_NODE) {
if (flags & MPOL_F_ADDR) {
- err = lookup_node(addr);
+ err = lookup_node(addr, &mmrange);
if (err < 0)
goto out;
*policy = err;
@@ -943,7 +946,7 @@ struct page *alloc_new_node_page(struct page *page, unsigned long node)
* Returns error or the number of pages not migrated.
*/
static int migrate_to_node(struct mm_struct *mm, int source, int dest,
- int flags)
+ int flags, struct range_lock *mmrange)
{
nodemask_t nmask;
LIST_HEAD(pagelist);
@@ -959,7 +962,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
*/
VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
queue_pages_range(mm, mm->mmap->vm_start, mm->task_size, &nmask,
- flags | MPOL_MF_DISCONTIG_OK, &pagelist);
+ flags | MPOL_MF_DISCONTIG_OK, &pagelist, mmrange);

if (!list_empty(&pagelist)) {
err = migrate_pages(&pagelist, alloc_new_node_page, NULL, dest,
@@ -983,6 +986,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
int busy = 0;
int err;
nodemask_t tmp;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

err = migrate_prep();
if (err)
@@ -1063,7 +1067,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
break;

node_clear(source, tmp);
- err = migrate_to_node(mm, source, dest, flags);
+ err = migrate_to_node(mm, source, dest, flags, &mmrange);
if (err > 0)
busy += err;
if (err < 0)
@@ -1143,6 +1147,7 @@ static long do_mbind(unsigned long start, unsigned long len,
unsigned long end;
int err;
LIST_HEAD(pagelist);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (flags & ~(unsigned long)MPOL_MF_VALID)
return -EINVAL;
@@ -1204,9 +1209,9 @@ static long do_mbind(unsigned long start, unsigned long len,
goto mpol_out;

err = queue_pages_range(mm, start, end, nmask,
- flags | MPOL_MF_INVERT, &pagelist);
+ flags | MPOL_MF_INVERT, &pagelist, &mmrange);
if (!err)
- err = mbind_range(mm, start, end, new);
+ err = mbind_range(mm, start, end, new, &mmrange);

if (!err) {
int nr_failed = 0;
diff --git a/mm/migrate.c b/mm/migrate.c
index 5d0dc7b85f90..7a6afc34dd54 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2105,7 +2105,8 @@ struct migrate_vma {

static int migrate_vma_collect_hole(unsigned long start,
unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk,
+ struct range_lock *mmrange)
{
struct migrate_vma *migrate = walk->private;
unsigned long addr;
@@ -2138,7 +2139,8 @@ static int migrate_vma_collect_skip(unsigned long start,
static int migrate_vma_collect_pmd(pmd_t *pmdp,
unsigned long start,
unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk,
+ struct range_lock *mmrange)
{
struct migrate_vma *migrate = walk->private;
struct vm_area_struct *vma = walk->vma;
@@ -2149,7 +2151,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

again:
if (pmd_none(*pmdp))
- return migrate_vma_collect_hole(start, end, walk);
+ return migrate_vma_collect_hole(start, end, walk, mmrange);

if (pmd_trans_huge(*pmdp)) {
struct page *page;
@@ -2183,7 +2185,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
walk);
if (pmd_none(*pmdp))
return migrate_vma_collect_hole(start, end,
- walk);
+ walk, mmrange);
}
}

@@ -2309,7 +2311,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
* valid page, it updates the src array and takes a reference on the page, in
* order to pin the page until we lock it and unmap it.
*/
-static void migrate_vma_collect(struct migrate_vma *migrate)
+static void migrate_vma_collect(struct migrate_vma *migrate,
+ struct range_lock *mmrange)
{
struct mm_walk mm_walk;

@@ -2325,7 +2328,7 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
mmu_notifier_invalidate_range_start(mm_walk.mm,
migrate->start,
migrate->end);
- walk_page_range(migrate->start, migrate->end, &mm_walk);
+ walk_page_range(migrate->start, migrate->end, &mm_walk, mmrange);
mmu_notifier_invalidate_range_end(mm_walk.mm,
migrate->start,
migrate->end);
@@ -2891,7 +2894,8 @@ int migrate_vma(const struct migrate_vma_ops *ops,
unsigned long end,
unsigned long *src,
unsigned long *dst,
- void *private)
+ void *private,
+ struct range_lock *mmrange)
{
struct migrate_vma migrate;

@@ -2917,7 +2921,7 @@ int migrate_vma(const struct migrate_vma_ops *ops,
migrate.vma = vma;

/* Collect, and try to unmap source pages */
- migrate_vma_collect(&migrate);
+ migrate_vma_collect(&migrate, mmrange);
if (!migrate.cpages)
return 0;

diff --git a/mm/mincore.c b/mm/mincore.c
index fc37afe226e6..a6875a34aac0 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -85,7 +85,9 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
}

static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
- struct vm_area_struct *vma, unsigned char *vec)
+ struct vm_area_struct *vma,
+ unsigned char *vec,
+ struct range_lock *mmrange)
{
unsigned long nr = (end - addr) >> PAGE_SHIFT;
int i;
@@ -104,15 +106,17 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
}

static int mincore_unmapped_range(unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk,
+ struct range_lock *mmrange)
{
walk->private += __mincore_unmapped_range(addr, end,
- walk->vma, walk->private);
+ walk->vma,
+ walk->private, mmrange);
return 0;
}

static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
spinlock_t *ptl;
struct vm_area_struct *vma = walk->vma;
@@ -128,7 +132,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
}

if (pmd_trans_unstable(pmd)) {
- __mincore_unmapped_range(addr, end, vma, vec);
+ __mincore_unmapped_range(addr, end, vma, vec, mmrange);
goto out;
}

@@ -138,7 +142,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,

if (pte_none(pte))
__mincore_unmapped_range(addr, addr + PAGE_SIZE,
- vma, vec);
+ vma, vec, mmrange);
else if (pte_present(pte))
*vec = 1;
else { /* pte is a swap entry */
@@ -174,7 +178,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
* all the arguments, we hold the mmap semaphore: we should
* just return the amount of info we're asked for.
*/
-static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *vec)
+static long do_mincore(unsigned long addr, unsigned long pages,
+ unsigned char *vec, struct range_lock *mmrange)
{
struct vm_area_struct *vma;
unsigned long end;
@@ -191,7 +196,7 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
return -ENOMEM;
mincore_walk.mm = vma->vm_mm;
end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
- err = walk_page_range(addr, end, &mincore_walk);
+ err = walk_page_range(addr, end, &mincore_walk, mmrange);
if (err < 0)
return err;
return (end - addr) >> PAGE_SHIFT;
@@ -227,6 +232,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
long retval;
unsigned long pages;
unsigned char *tmp;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* Check the start address: needs to be page-aligned.. */
if (start & ~PAGE_MASK)
@@ -254,7 +260,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
* the temporary buffer size.
*/
down_read(&current->mm->mmap_sem);
- retval = do_mincore(start, min(pages, PAGE_SIZE), tmp);
+ retval = do_mincore(start, min(pages, PAGE_SIZE), tmp, &mmrange);
up_read(&current->mm->mmap_sem);

if (retval <= 0)
diff --git a/mm/mlock.c b/mm/mlock.c
index 74e5a6547c3d..3f6bd953e8b0 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -517,7 +517,8 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
* For vmas that pass the filters, merge/split as appropriate.
*/
static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
- unsigned long start, unsigned long end, vm_flags_t newflags)
+ unsigned long start, unsigned long end, vm_flags_t newflags,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = vma->vm_mm;
pgoff_t pgoff;
@@ -534,20 +535,20 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, mmrange);
if (*prev) {
vma = *prev;
goto success;
}

if (start != vma->vm_start) {
- ret = split_vma(mm, vma, start, 1);
+ ret = split_vma(mm, vma, start, 1, mmrange);
if (ret)
goto out;
}

if (end != vma->vm_end) {
- ret = split_vma(mm, vma, end, 0);
+ ret = split_vma(mm, vma, end, 0, mmrange);
if (ret)
goto out;
}
@@ -580,7 +581,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
}

static int apply_vma_lock_flags(unsigned long start, size_t len,
- vm_flags_t flags)
+ vm_flags_t flags, struct range_lock *mmrange)
{
unsigned long nstart, end, tmp;
struct vm_area_struct * vma, * prev;
@@ -610,7 +611,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
- error = mlock_fixup(vma, &prev, nstart, tmp, newflags);
+ error = mlock_fixup(vma, &prev, nstart, tmp, newflags, mmrange);
if (error)
break;
nstart = tmp;
@@ -667,11 +668,13 @@ static int count_mm_mlocked_page_nr(struct mm_struct *mm,
return count >> PAGE_SHIFT;
}

-static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
+static __must_check int do_mlock(unsigned long start, size_t len,
+ vm_flags_t flags)
{
unsigned long locked;
unsigned long lock_limit;
int error = -ENOMEM;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!can_do_mlock())
return -EPERM;
@@ -700,7 +703,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla

/* check against resource limits */
if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
- error = apply_vma_lock_flags(start, len, flags);
+ error = apply_vma_lock_flags(start, len, flags, &mmrange);

up_write(&current->mm->mmap_sem);
if (error)
@@ -733,13 +736,14 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
{
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

len = PAGE_ALIGN(len + (offset_in_page(start)));
start &= PAGE_MASK;

if (down_write_killable(&current->mm->mmap_sem))
return -EINTR;
- ret = apply_vma_lock_flags(start, len, 0);
+ ret = apply_vma_lock_flags(start, len, 0, &mmrange);
up_write(&current->mm->mmap_sem);

return ret;
@@ -755,7 +759,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
* is called once including the MCL_FUTURE flag and then a second time without
* it, VM_LOCKED and VM_LOCKONFAULT will be cleared from mm->def_flags.
*/
-static int apply_mlockall_flags(int flags)
+static int apply_mlockall_flags(int flags, struct range_lock *mmrange)
{
struct vm_area_struct * vma, * prev = NULL;
vm_flags_t to_add = 0;
@@ -784,7 +788,8 @@ static int apply_mlockall_flags(int flags)
newflags |= to_add;

/* Ignore errors */
- mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
+ mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags,
+ mmrange);
cond_resched();
}
out:
@@ -795,6 +800,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
{
unsigned long lock_limit;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)))
return -EINVAL;
@@ -811,7 +817,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
ret = -ENOMEM;
if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
capable(CAP_IPC_LOCK))
- ret = apply_mlockall_flags(flags);
+ ret = apply_mlockall_flags(flags, &mmrange);
up_write(&current->mm->mmap_sem);
if (!ret && (flags & MCL_CURRENT))
mm_populate(0, TASK_SIZE);
@@ -822,10 +828,11 @@ SYSCALL_DEFINE1(mlockall, int, flags)
SYSCALL_DEFINE0(munlockall)
{
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (down_write_killable(&current->mm->mmap_sem))
return -EINTR;
- ret = apply_mlockall_flags(0);
+ ret = apply_mlockall_flags(0, &mmrange);
up_write(&current->mm->mmap_sem);
return ret;
}
diff --git a/mm/mmap.c b/mm/mmap.c
index 4bb038e7984b..f61d49cb791e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -177,7 +177,8 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
return next;
}

-static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf);
+static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf,
+ struct range_lock *mmrange);

SYSCALL_DEFINE1(brk, unsigned long, brk)
{
@@ -188,6 +189,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
unsigned long min_brk;
bool populate;
LIST_HEAD(uf);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (down_write_killable(&mm->mmap_sem))
return -EINTR;
@@ -225,7 +227,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)

/* Always allow shrinking brk. */
if (brk <= mm->brk) {
- if (!do_munmap(mm, newbrk, oldbrk-newbrk, &uf))
+ if (!do_munmap(mm, newbrk, oldbrk-newbrk, &uf, &mmrange))
goto set_brk;
goto out;
}
@@ -236,7 +238,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
goto out;

/* Ok, looks good - let it rip. */
- if (do_brk(oldbrk, newbrk-oldbrk, &uf) < 0)
+ if (do_brk(oldbrk, newbrk-oldbrk, &uf, &mmrange) < 0)
goto out;

set_brk:
@@ -680,7 +682,7 @@ static inline void __vma_unlink_prev(struct mm_struct *mm,
*/
int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
- struct vm_area_struct *expand)
+ struct vm_area_struct *expand, struct range_lock *mmrange)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *next = vma->vm_next, *orig_vma = vma;
@@ -887,10 +889,10 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
i_mmap_unlock_write(mapping);

if (root) {
- uprobe_mmap(vma);
+ uprobe_mmap(vma, mmrange);

if (adjust_next)
- uprobe_mmap(next);
+ uprobe_mmap(next, mmrange);
}

if (remove_next) {
@@ -960,7 +962,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}
}
if (insert && file)
- uprobe_mmap(insert);
+ uprobe_mmap(insert, mmrange);

validate_mm(mm);

@@ -1101,7 +1103,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
unsigned long end, unsigned long vm_flags,
struct anon_vma *anon_vma, struct file *file,
pgoff_t pgoff, struct mempolicy *policy,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ struct range_lock *mmrange)
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
struct vm_area_struct *area, *next;
@@ -1149,10 +1152,11 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
/* cases 1, 6 */
err = __vma_adjust(prev, prev->vm_start,
next->vm_end, prev->vm_pgoff, NULL,
- prev);
+ prev, mmrange);
} else /* cases 2, 5, 7 */
err = __vma_adjust(prev, prev->vm_start,
- end, prev->vm_pgoff, NULL, prev);
+ end, prev->vm_pgoff, NULL,
+ prev, mmrange);
if (err)
return NULL;
khugepaged_enter_vma_merge(prev, vm_flags);
@@ -1169,10 +1173,12 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
vm_userfaultfd_ctx)) {
if (prev && addr < prev->vm_end) /* case 4 */
err = __vma_adjust(prev, prev->vm_start,
- addr, prev->vm_pgoff, NULL, next);
+ addr, prev->vm_pgoff, NULL,
+ next, mmrange);
else { /* cases 3, 8 */
err = __vma_adjust(area, addr, next->vm_end,
- next->vm_pgoff - pglen, NULL, next);
+ next->vm_pgoff - pglen, NULL,
+ next, mmrange);
/*
* In case 3 area is already equal to next and
* this is a noop, but in case 8 "area" has
@@ -1322,7 +1328,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot,
unsigned long flags, vm_flags_t vm_flags,
unsigned long pgoff, unsigned long *populate,
- struct list_head *uf)
+ struct list_head *uf, struct range_lock *mmrange)
{
struct mm_struct *mm = current->mm;
int pkey = 0;
@@ -1491,7 +1497,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
vm_flags |= VM_NORESERVE;
}

- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
+ addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, mmrange);
if (!IS_ERR_VALUE(addr) &&
((vm_flags & VM_LOCKED) ||
(flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
@@ -1628,7 +1634,7 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)

unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
- struct list_head *uf)
+ struct list_head *uf, struct range_lock *mmrange)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
@@ -1654,7 +1660,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Clear old maps */
while (find_vma_links(mm, addr, addr + len, &prev, &rb_link,
&rb_parent)) {
- if (do_munmap(mm, addr, len, uf))
+ if (do_munmap(mm, addr, len, uf, mmrange))
return -ENOMEM;
}

@@ -1672,7 +1678,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* Can we just expand an old mapping?
*/
vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
- NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
+ NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, mmrange);
if (vma)
goto out;

@@ -1756,7 +1762,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
}

if (file)
- uprobe_mmap(vma);
+ uprobe_mmap(vma, mmrange);

/*
* New (or expanded) vma always get soft dirty status.
@@ -2435,7 +2441,8 @@ int expand_stack(struct vm_area_struct *vma, unsigned long address)
}

struct vm_area_struct *
-find_extend_vma(struct mm_struct *mm, unsigned long addr)
+find_extend_vma(struct mm_struct *mm, unsigned long addr,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma, *prev;

@@ -2446,7 +2453,8 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
if (!prev || expand_stack(prev, addr))
return NULL;
if (prev->vm_flags & VM_LOCKED)
- populate_vma_page_range(prev, addr, prev->vm_end, NULL);
+ populate_vma_page_range(prev, addr, prev->vm_end,
+ NULL, mmrange);
return prev;
}
#else
@@ -2456,7 +2464,8 @@ int expand_stack(struct vm_area_struct *vma, unsigned long address)
}

struct vm_area_struct *
-find_extend_vma(struct mm_struct *mm, unsigned long addr)
+find_extend_vma(struct mm_struct *mm, unsigned long addr,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma;
unsigned long start;
@@ -2473,7 +2482,7 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
if (expand_stack(vma, addr))
return NULL;
if (vma->vm_flags & VM_LOCKED)
- populate_vma_page_range(vma, addr, start, NULL);
+ populate_vma_page_range(vma, addr, start, NULL, mmrange);
return vma;
}
#endif
@@ -2561,7 +2570,7 @@ detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
* has already been checked or doesn't make sense to fail.
*/
int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, int new_below)
+ unsigned long addr, int new_below, struct range_lock *mmrange)
{
struct vm_area_struct *new;
int err;
@@ -2604,9 +2613,11 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,

if (new_below)
err = vma_adjust(vma, addr, vma->vm_end, vma->vm_pgoff +
- ((addr - new->vm_start) >> PAGE_SHIFT), new);
+ ((addr - new->vm_start) >> PAGE_SHIFT), new,
+ mmrange);
else
- err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new);
+ err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new,
+ mmrange);

/* Success. */
if (!err)
@@ -2630,12 +2641,12 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
* either for the first part or the tail.
*/
int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, int new_below)
+ unsigned long addr, int new_below, struct range_lock *mmrange)
{
if (mm->map_count >= sysctl_max_map_count)
return -ENOMEM;

- return __split_vma(mm, vma, addr, new_below);
+ return __split_vma(mm, vma, addr, new_below, mmrange);
}

/* Munmap is split into 2 main parts -- this part which finds
@@ -2644,7 +2655,7 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
* Jeremy Fitzhardinge <[email protected]>
*/
int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
- struct list_head *uf)
+ struct list_head *uf, struct range_lock *mmrange)
{
unsigned long end;
struct vm_area_struct *vma, *prev, *last;
@@ -2686,7 +2697,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
return -ENOMEM;

- error = __split_vma(mm, vma, start, 0);
+ error = __split_vma(mm, vma, start, 0, mmrange);
if (error)
return error;
prev = vma;
@@ -2695,7 +2706,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
/* Does it split the last one? */
last = find_vma(mm, end);
if (last && end > last->vm_start) {
- int error = __split_vma(mm, last, end, 1);
+ int error = __split_vma(mm, last, end, 1, mmrange);
if (error)
return error;
}
@@ -2736,7 +2747,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
detach_vmas_to_be_unmapped(mm, vma, prev, end);
unmap_region(mm, vma, prev, start, end);

- arch_unmap(mm, vma, start, end);
+ arch_unmap(mm, vma, start, end, mmrange);

/* Fix up all other VM information */
remove_vma_list(mm, vma);
@@ -2749,11 +2760,12 @@ int vm_munmap(unsigned long start, size_t len)
int ret;
struct mm_struct *mm = current->mm;
LIST_HEAD(uf);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (down_write_killable(&mm->mmap_sem))
return -EINTR;

- ret = do_munmap(mm, start, len, &uf);
+ ret = do_munmap(mm, start, len, &uf, &mmrange);
up_write(&mm->mmap_sem);
userfaultfd_unmap_complete(mm, &uf);
return ret;
@@ -2779,6 +2791,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
unsigned long populate = 0;
unsigned long ret = -EINVAL;
struct file *file;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

pr_warn_once("%s (%d) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.txt.\n",
current->comm, current->pid);
@@ -2855,7 +2868,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,

file = get_file(vma->vm_file);
ret = do_mmap_pgoff(vma->vm_file, start, size,
- prot, flags, pgoff, &populate, NULL);
+ prot, flags, pgoff, &populate, NULL, &mmrange);
fput(file);
out:
up_write(&mm->mmap_sem);
@@ -2881,7 +2894,9 @@ static inline void verify_mm_writelocked(struct mm_struct *mm)
* anonymous maps. eventually we may be able to do some
* brk-specific accounting here.
*/
-static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags, struct list_head *uf)
+static int do_brk_flags(unsigned long addr, unsigned long request,
+ unsigned long flags, struct list_head *uf,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
@@ -2920,7 +2935,7 @@ static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long
*/
while (find_vma_links(mm, addr, addr + len, &prev, &rb_link,
&rb_parent)) {
- if (do_munmap(mm, addr, len, uf))
+ if (do_munmap(mm, addr, len, uf, mmrange))
return -ENOMEM;
}

@@ -2936,7 +2951,7 @@ static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long

/* Can we just expand an old private anonymous mapping? */
vma = vma_merge(mm, prev, addr, addr + len, flags,
- NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
+ NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, mmrange);
if (vma)
goto out;

@@ -2967,9 +2982,10 @@ static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long
return 0;
}

-static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf)
+static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf,
+ struct range_lock *mmrange)
{
- return do_brk_flags(addr, len, 0, uf);
+ return do_brk_flags(addr, len, 0, uf, mmrange);
}

int vm_brk_flags(unsigned long addr, unsigned long len, unsigned long flags)
@@ -2978,11 +2994,12 @@ int vm_brk_flags(unsigned long addr, unsigned long len, unsigned long flags)
int ret;
bool populate;
LIST_HEAD(uf);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (down_write_killable(&mm->mmap_sem))
return -EINTR;

- ret = do_brk_flags(addr, len, flags, &uf);
+ ret = do_brk_flags(addr, len, flags, &uf, &mmrange);
populate = ((mm->def_flags & VM_LOCKED) != 0);
up_write(&mm->mmap_sem);
userfaultfd_unmap_complete(mm, &uf);
@@ -3105,7 +3122,7 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
*/
struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
unsigned long addr, unsigned long len, pgoff_t pgoff,
- bool *need_rmap_locks)
+ bool *need_rmap_locks, struct range_lock *mmrange)
{
struct vm_area_struct *vma = *vmap;
unsigned long vma_start = vma->vm_start;
@@ -3127,7 +3144,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
return NULL; /* should never get here */
new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, mmrange);
if (new_vma) {
/*
* Source vma may have been merged into new_vma
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e3309fcf586b..b84a70720319 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -299,7 +299,8 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,

int
mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
- unsigned long start, unsigned long end, unsigned long newflags)
+ unsigned long start, unsigned long end, unsigned long newflags,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long oldflags = vma->vm_flags;
@@ -340,7 +341,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*pprev = vma_merge(mm, *pprev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, mmrange);
if (*pprev) {
vma = *pprev;
VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
@@ -350,13 +351,13 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
*pprev = vma;

if (start != vma->vm_start) {
- error = split_vma(mm, vma, start, 1);
+ error = split_vma(mm, vma, start, 1, mmrange);
if (error)
goto fail;
}

if (end != vma->vm_end) {
- error = split_vma(mm, vma, end, 0);
+ error = split_vma(mm, vma, end, 0, mmrange);
if (error)
goto fail;
}
@@ -379,7 +380,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
*/
if ((oldflags & (VM_WRITE | VM_SHARED | VM_LOCKED)) == VM_LOCKED &&
(newflags & VM_WRITE)) {
- populate_vma_page_range(vma, start, end, NULL);
+ populate_vma_page_range(vma, start, end, NULL, mmrange);
}

vm_stat_account(mm, oldflags, -nrpages);
@@ -404,6 +405,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
const int grows = prot & (PROT_GROWSDOWN|PROT_GROWSUP);
const bool rier = (current->personality & READ_IMPLIES_EXEC) &&
(prot & PROT_READ);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP);
if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */
@@ -494,7 +496,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
- error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
+ error = mprotect_fixup(vma, &prev, nstart, tmp, newflags, &mmrange);
if (error)
goto out;
nstart = tmp;
diff --git a/mm/mremap.c b/mm/mremap.c
index 049470aa1e3e..21a9e2a2baa2 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -264,7 +264,8 @@ static unsigned long move_vma(struct vm_area_struct *vma,
unsigned long old_addr, unsigned long old_len,
unsigned long new_len, unsigned long new_addr,
bool *locked, struct vm_userfaultfd_ctx *uf,
- struct list_head *uf_unmap)
+ struct list_head *uf_unmap,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *new_vma;
@@ -292,13 +293,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
* so KSM can come around to merge on vma and new_vma afterwards.
*/
err = ksm_madvise(vma, old_addr, old_addr + old_len,
- MADV_UNMERGEABLE, &vm_flags);
+ MADV_UNMERGEABLE, &vm_flags, mmrange);
if (err)
return err;

new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT);
new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff,
- &need_rmap_locks);
+ &need_rmap_locks, mmrange);
if (!new_vma)
return -ENOMEM;

@@ -353,7 +354,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
if (unlikely(vma->vm_flags & VM_PFNMAP))
untrack_pfn_moved(vma);

- if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
+ if (do_munmap(mm, old_addr, old_len, uf_unmap, mmrange) < 0) {
/* OOM: unable to split vma, just get accounts right */
vm_unacct_memory(excess >> PAGE_SHIFT);
excess = 0;
@@ -444,7 +445,8 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
unsigned long new_addr, unsigned long new_len, bool *locked,
struct vm_userfaultfd_ctx *uf,
struct list_head *uf_unmap_early,
- struct list_head *uf_unmap)
+ struct list_head *uf_unmap,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
@@ -462,12 +464,13 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
if (addr + old_len > new_addr && new_addr + new_len > addr)
goto out;

- ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
+ ret = do_munmap(mm, new_addr, new_len, uf_unmap_early, mmrange);
if (ret)
goto out;

if (old_len >= new_len) {
- ret = do_munmap(mm, addr+new_len, old_len - new_len, uf_unmap);
+ ret = do_munmap(mm, addr+new_len, old_len - new_len,
+ uf_unmap, mmrange);
if (ret && old_len != new_len)
goto out;
old_len = new_len;
@@ -490,7 +493,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
goto out1;

ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
- uf_unmap);
+ uf_unmap, mmrange);
if (!(offset_in_page(ret)))
goto out;
out1:
@@ -532,6 +535,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX;
LIST_HEAD(uf_unmap_early);
LIST_HEAD(uf_unmap);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
return ret;
@@ -558,7 +562,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,

if (flags & MREMAP_FIXED) {
ret = mremap_to(addr, old_len, new_addr, new_len,
- &locked, &uf, &uf_unmap_early, &uf_unmap);
+ &locked, &uf, &uf_unmap_early,
+ &uf_unmap, &mmrange);
goto out;
}

@@ -568,7 +573,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
* do_munmap does all the needed commit accounting
*/
if (old_len >= new_len) {
- ret = do_munmap(mm, addr+new_len, old_len - new_len, &uf_unmap);
+ ret = do_munmap(mm, addr+new_len, old_len - new_len,
+ &uf_unmap, &mmrange);
if (ret && old_len != new_len)
goto out;
ret = addr;
@@ -592,7 +598,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
int pages = (new_len - old_len) >> PAGE_SHIFT;

if (vma_adjust(vma, vma->vm_start, addr + new_len,
- vma->vm_pgoff, NULL)) {
+ vma->vm_pgoff, NULL, &mmrange)) {
ret = -ENOMEM;
goto out;
}
@@ -628,7 +634,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
}

ret = move_vma(vma, addr, old_len, new_len, new_addr,
- &locked, &uf, &uf_unmap);
+ &locked, &uf, &uf_unmap, &mmrange);
}
out:
if (offset_in_page(ret)) {
diff --git a/mm/nommu.c b/mm/nommu.c
index ebb6e618dade..1805f0a788b3 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -113,7 +113,8 @@ unsigned int kobjsize(const void *objp)
static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int foll_flags, struct page **pages,
- struct vm_area_struct **vmas, int *nonblocking)
+ struct vm_area_struct **vmas, int *nonblocking,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma;
unsigned long vm_flags;
@@ -162,18 +163,19 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
*/
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas)
+ struct vm_area_struct **vmas,
+ struct range_lock *mmrange)
{
return __get_user_pages(current, current->mm, start, nr_pages,
- gup_flags, pages, vmas, NULL);
+ gup_flags, pages, vmas, NULL, mmrange);
}
EXPORT_SYMBOL(get_user_pages);

long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- int *locked)
+ int *locked, struct range_lock *mmrange)
{
- return get_user_pages(start, nr_pages, gup_flags, pages, NULL);
+ return get_user_pages(start, nr_pages, gup_flags, pages, NULL, mmrange);
}
EXPORT_SYMBOL(get_user_pages_locked);

@@ -183,9 +185,11 @@ static long __get_user_pages_unlocked(struct task_struct *tsk,
unsigned int gup_flags)
{
long ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);
+
down_read(&mm->mmap_sem);
ret = __get_user_pages(tsk, mm, start, nr_pages, gup_flags, pages,
- NULL, NULL);
+ NULL, NULL, &mmrange);
up_read(&mm->mmap_sem);
return ret;
}
@@ -836,7 +840,8 @@ EXPORT_SYMBOL(find_vma);
* find a VMA
* - we don't extend stack VMAs under NOMMU conditions
*/
-struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr)
+struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr,
+ struct range_lock *mmrange)
{
return find_vma(mm, addr);
}
@@ -1206,7 +1211,8 @@ unsigned long do_mmap(struct file *file,
vm_flags_t vm_flags,
unsigned long pgoff,
unsigned long *populate,
- struct list_head *uf)
+ struct list_head *uf,
+ struct range_lock *mmrange)
{
struct vm_area_struct *vma;
struct vm_region *region;
@@ -1476,7 +1482,7 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
* for the first part or the tail.
*/
int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, int new_below)
+ unsigned long addr, int new_below, struct range_lock *mmrange)
{
struct vm_area_struct *new;
struct vm_region *region;
@@ -1578,7 +1584,8 @@ static int shrink_vma(struct mm_struct *mm,
* - under NOMMU conditions the chunk to be unmapped must be backed by a single
* VMA, though it need not cover the whole VMA
*/
-int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf)
+int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
+ struct list_head *uf, struct range_lock *mmrange)
{
struct vm_area_struct *vma;
unsigned long end;
@@ -1624,7 +1631,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list
if (end != vma->vm_end && offset_in_page(end))
return -EINVAL;
if (start != vma->vm_start && end != vma->vm_end) {
- ret = split_vma(mm, vma, start, 1);
+ ret = split_vma(mm, vma, start, 1, mmrange);
if (ret < 0)
return ret;
}
@@ -1642,9 +1649,10 @@ int vm_munmap(unsigned long addr, size_t len)
{
struct mm_struct *mm = current->mm;
int ret;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

down_write(&mm->mmap_sem);
- ret = do_munmap(mm, addr, len, NULL);
+ ret = do_munmap(mm, addr, len, NULL, &mmrange);
up_write(&mm->mmap_sem);
return ret;
}
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 8d2da5dec1e0..44a2507c94fd 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -26,7 +26,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
}

static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
pmd_t *pmd;
unsigned long next;
@@ -38,7 +38,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
next = pmd_addr_end(addr, end);
if (pmd_none(*pmd) || !walk->vma) {
if (walk->pte_hole)
- err = walk->pte_hole(addr, next, walk);
+ err = walk->pte_hole(addr, next, walk, mmrange);
if (err)
break;
continue;
@@ -48,7 +48,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
* needs to know about pmd_trans_huge() pmds
*/
if (walk->pmd_entry)
- err = walk->pmd_entry(pmd, addr, next, walk);
+ err = walk->pmd_entry(pmd, addr, next, walk, mmrange);
if (err)
break;

@@ -71,7 +71,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
}

static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
pud_t *pud;
unsigned long next;
@@ -83,7 +83,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
next = pud_addr_end(addr, end);
if (pud_none(*pud) || !walk->vma) {
if (walk->pte_hole)
- err = walk->pte_hole(addr, next, walk);
+ err = walk->pte_hole(addr, next, walk, mmrange);
if (err)
break;
continue;
@@ -106,7 +106,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
goto again;

if (walk->pmd_entry || walk->pte_entry)
- err = walk_pmd_range(pud, addr, next, walk);
+ err = walk_pmd_range(pud, addr, next, walk, mmrange);
if (err)
break;
} while (pud++, addr = next, addr != end);
@@ -115,7 +115,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
}

static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
p4d_t *p4d;
unsigned long next;
@@ -126,13 +126,13 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
next = p4d_addr_end(addr, end);
if (p4d_none_or_clear_bad(p4d)) {
if (walk->pte_hole)
- err = walk->pte_hole(addr, next, walk);
+ err = walk->pte_hole(addr, next, walk, mmrange);
if (err)
break;
continue;
}
if (walk->pmd_entry || walk->pte_entry)
- err = walk_pud_range(p4d, addr, next, walk);
+ err = walk_pud_range(p4d, addr, next, walk, mmrange);
if (err)
break;
} while (p4d++, addr = next, addr != end);
@@ -141,7 +141,7 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
}

static int walk_pgd_range(unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
pgd_t *pgd;
unsigned long next;
@@ -152,13 +152,13 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd)) {
if (walk->pte_hole)
- err = walk->pte_hole(addr, next, walk);
+ err = walk->pte_hole(addr, next, walk, mmrange);
if (err)
break;
continue;
}
if (walk->pmd_entry || walk->pte_entry)
- err = walk_p4d_range(pgd, addr, next, walk);
+ err = walk_p4d_range(pgd, addr, next, walk, mmrange);
if (err)
break;
} while (pgd++, addr = next, addr != end);
@@ -175,7 +175,7 @@ static unsigned long hugetlb_entry_end(struct hstate *h, unsigned long addr,
}

static int walk_hugetlb_range(unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
struct vm_area_struct *vma = walk->vma;
struct hstate *h = hstate_vma(vma);
@@ -192,7 +192,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
if (pte)
err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
else if (walk->pte_hole)
- err = walk->pte_hole(addr, next, walk);
+ err = walk->pte_hole(addr, next, walk, mmrange);

if (err)
break;
@@ -203,7 +203,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,

#else /* CONFIG_HUGETLB_PAGE */
static int walk_hugetlb_range(unsigned long addr, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
return 0;
}
@@ -217,7 +217,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
* error, where we abort the current walk.
*/
static int walk_page_test(unsigned long start, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
struct vm_area_struct *vma = walk->vma;

@@ -235,23 +235,23 @@ static int walk_page_test(unsigned long start, unsigned long end,
if (vma->vm_flags & VM_PFNMAP) {
int err = 1;
if (walk->pte_hole)
- err = walk->pte_hole(start, end, walk);
+ err = walk->pte_hole(start, end, walk, mmrange);
return err ? err : 1;
}
return 0;
}

static int __walk_page_range(unsigned long start, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
int err = 0;
struct vm_area_struct *vma = walk->vma;

if (vma && is_vm_hugetlb_page(vma)) {
if (walk->hugetlb_entry)
- err = walk_hugetlb_range(start, end, walk);
+ err = walk_hugetlb_range(start, end, walk, mmrange);
} else
- err = walk_pgd_range(start, end, walk);
+ err = walk_pgd_range(start, end, walk, mmrange);

return err;
}
@@ -285,10 +285,11 @@ static int __walk_page_range(unsigned long start, unsigned long end,
* Locking:
* Callers of walk_page_range() and walk_page_vma() should hold
* @walk->mm->mmap_sem, because these function traverse vma list and/or
- * access to vma's data.
+ * access to vma's data. As such, the @mmrange will represent the
+ * address space range.
*/
int walk_page_range(unsigned long start, unsigned long end,
- struct mm_walk *walk)
+ struct mm_walk *walk, struct range_lock *mmrange)
{
int err = 0;
unsigned long next;
@@ -315,7 +316,7 @@ int walk_page_range(unsigned long start, unsigned long end,
next = min(end, vma->vm_end);
vma = vma->vm_next;

- err = walk_page_test(start, next, walk);
+ err = walk_page_test(start, next, walk, mmrange);
if (err > 0) {
/*
* positive return values are purely for
@@ -329,14 +330,15 @@ int walk_page_range(unsigned long start, unsigned long end,
break;
}
if (walk->vma || walk->pte_hole)
- err = __walk_page_range(start, next, walk);
+ err = __walk_page_range(start, next, walk, mmrange);
if (err)
break;
} while (start = next, start < end);
return err;
}

-int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk)
+int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk,
+ struct range_lock *mmrange)
{
int err;

@@ -346,10 +348,10 @@ int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk)
VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
VM_BUG_ON(!vma);
walk->vma = vma;
- err = walk_page_test(vma->vm_start, vma->vm_end, walk);
+ err = walk_page_test(vma->vm_start, vma->vm_end, walk, mmrange);
if (err > 0)
return 0;
if (err < 0)
return err;
- return __walk_page_range(vma->vm_start, vma->vm_end, walk);
+ return __walk_page_range(vma->vm_start, vma->vm_end, walk, mmrange);
}
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index a447092d4635..ff6772b86195 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -90,6 +90,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
unsigned long max_pages_per_loop = PVM_MAX_KMALLOC_PAGES
/ sizeof(struct pages *);
unsigned int flags = 0;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* Work out address and page range required */
if (len == 0)
@@ -111,7 +112,8 @@ static int process_vm_rw_single_vec(unsigned long addr,
*/
down_read(&mm->mmap_sem);
pages = get_user_pages_remote(task, mm, pa, pages, flags,
- process_pages, NULL, &locked);
+ process_pages, NULL, &locked,
+ &mmrange);
if (locked)
up_read(&mm->mmap_sem);
if (pages <= 0)
diff --git a/mm/util.c b/mm/util.c
index c1250501364f..b0ec1d88bb71 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -347,13 +347,14 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
struct mm_struct *mm = current->mm;
unsigned long populate;
LIST_HEAD(uf);
+ DEFINE_RANGE_LOCK_FULL(mmrange);

ret = security_mmap_file(file, prot, flag);
if (!ret) {
if (down_write_killable(&mm->mmap_sem))
return -EINTR;
ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
- &populate, &uf);
+ &populate, &uf, &mmrange);
up_write(&mm->mmap_sem);
userfaultfd_unmap_complete(mm, &uf);
if (populate)
diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index f6758dad981f..c1e36ea2c6fc 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -868,6 +868,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
struct tomoyo_page_dump *dump)
{
struct page *page;
+ DEFINE_RANGE_LOCK_FULL(mmrange); /* see get_arg_page() in fs/exec.c */

/* dump->data is released by tomoyo_find_next_domain(). */
if (!dump->data) {
@@ -884,7 +885,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
* the execve().
*/
if (get_user_pages_remote(current, bprm->mm, pos, 1,
- FOLL_FORCE, &page, NULL, NULL) <= 0)
+ FOLL_FORCE, &page, NULL, NULL, &mmrange) <= 0)
return false;
#else
page = bprm->page[pos / PAGE_SIZE];
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 57bcb27dcf30..4cd2b93bb20c 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -78,6 +78,7 @@ static void async_pf_execute(struct work_struct *work)
unsigned long addr = apf->addr;
gva_t gva = apf->gva;
int locked = 1;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

might_sleep();

@@ -88,7 +89,7 @@ static void async_pf_execute(struct work_struct *work)
*/
down_read(&mm->mmap_sem);
get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
- &locked);
+ &locked, &mmrange);
if (locked)
up_read(&mm->mmap_sem);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4501e658e8d6..86ec078f4c3b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1317,11 +1317,12 @@ unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *w
return gfn_to_hva_memslot_prot(slot, gfn, writable);
}

-static inline int check_user_page_hwpoison(unsigned long addr)
+static inline int check_user_page_hwpoison(unsigned long addr,
+ struct range_lock *mmrange)
{
int rc, flags = FOLL_HWPOISON | FOLL_WRITE;

- rc = get_user_pages(addr, 1, flags, NULL, NULL);
+ rc = get_user_pages(addr, 1, flags, NULL, NULL, mmrange);
return rc == -EHWPOISON;
}

@@ -1411,7 +1412,8 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
static int hva_to_pfn_remapped(struct vm_area_struct *vma,
unsigned long addr, bool *async,
bool write_fault, bool *writable,
- kvm_pfn_t *p_pfn)
+ kvm_pfn_t *p_pfn,
+ struct range_lock *mmrange)
{
unsigned long pfn;
int r;
@@ -1425,7 +1427,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
bool unlocked = false;
r = fixup_user_fault(current, current->mm, addr,
(write_fault ? FAULT_FLAG_WRITE : 0),
- &unlocked);
+ &unlocked, mmrange);
if (unlocked)
return -EAGAIN;
if (r)
@@ -1477,6 +1479,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
struct vm_area_struct *vma;
kvm_pfn_t pfn = 0;
int npages, r;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

/* we can do it either atomically or asynchronously, not both */
BUG_ON(atomic && async);
@@ -1493,7 +1496,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,

down_read(&current->mm->mmap_sem);
if (npages == -EHWPOISON ||
- (!async && check_user_page_hwpoison(addr))) {
+ (!async && check_user_page_hwpoison(addr, &mmrange))) {
pfn = KVM_PFN_ERR_HWPOISON;
goto exit;
}
@@ -1504,7 +1507,8 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
if (vma == NULL)
pfn = KVM_PFN_ERR_FAULT;
else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
- r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn);
+ r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable,
+ &pfn, &mmrange);
if (r == -EAGAIN)
goto retry;
if (r < 0)
--
2.13.6


2018-02-05 01:44:57

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 11/64] prctl: teach about range locking

From: Davidlohr Bueso <[email protected]>

And pass along the mmrange pointers where needed. There are no
changes in semantics from using the mm locking helpers.
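
For reference, the resulting read-side pattern is roughly the
following (a sketch only; mm_read_lock()/mm_read_unlock() and
DEFINE_RANGE_LOCK_FULL() are the wrappers this patch switches to,
and the loop body is elided):

    DEFINE_RANGE_LOCK_FULL(mmrange);    /* full address space for now */

    mm_read_lock(mm, &mmrange);         /* was: down_read(&mm->mmap_sem) */
    /* walk mm->mmap; nothing down the chain retakes the lock */
    mm_read_unlock(mm, &mmrange);       /* was: up_read(&mm->mmap_sem) */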

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/sys.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 31a2866b7abd..a9c659c42bd6 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1769,6 +1769,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
struct file *old_exe, *exe_file;
struct inode *inode;
int err;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

exe = fdget(fd);
if (!exe.file)
@@ -1797,7 +1798,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
if (exe_file) {
struct vm_area_struct *vma;

- down_read(&mm->mmap_sem);
+ mm_read_lock(mm, &mmrange);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (!vma->vm_file)
continue;
@@ -1806,7 +1807,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
goto exit_err;
}

- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
fput(exe_file);
}

@@ -1820,7 +1821,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
fdput(exe);
return err;
exit_err:
- up_read(&mm->mmap_sem);
+ mm_read_unlock(mm, &mmrange);
fput(exe_file);
goto exit;
}
@@ -1923,6 +1924,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
unsigned long user_auxv[AT_VECTOR_SIZE];
struct mm_struct *mm = current->mm;
int error;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv));
BUILD_BUG_ON(sizeof(struct prctl_mm_map) > 256);
@@ -1959,7 +1961,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
return error;
}

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);

/*
* We don't validate if these members are pointing to
@@ -1996,7 +1998,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
if (prctl_map.auxv_size)
memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));

- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return 0;
}
#endif /* CONFIG_CHECKPOINT_RESTORE */
@@ -2038,6 +2040,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
struct prctl_mm_map prctl_map;
struct vm_area_struct *vma;
int error;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV &&
opt != PR_SET_MM_MAP &&
@@ -2063,7 +2066,7 @@ static int prctl_set_mm(int opt, unsigned long addr,

error = -EINVAL;

- down_write(&mm->mmap_sem);
+ mm_write_lock(mm, &mmrange);
vma = find_vma(mm, addr);

prctl_map.start_code = mm->start_code;
@@ -2156,7 +2159,7 @@ static int prctl_set_mm(int opt, unsigned long addr,

error = 0;
out:
- up_write(&mm->mmap_sem);
+ mm_write_unlock(mm, &mmrange);
return error;
}

@@ -2196,6 +2199,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
struct task_struct *me = current;
unsigned char comm[sizeof(me->comm)];
long error;
+ DEFINE_RANGE_LOCK_FULL(mmrange);

error = security_task_prctl(option, arg2, arg3, arg4, arg5);
if (error != -ENOSYS)
@@ -2379,13 +2383,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_SET_THP_DISABLE:
if (arg3 || arg4 || arg5)
return -EINVAL;
- if (down_write_killable(&me->mm->mmap_sem))
+ if (mm_write_lock_killable(me->mm, &mmrange))
return -EINTR;
if (arg2)
set_bit(MMF_DISABLE_THP, &me->mm->flags);
else
clear_bit(MMF_DISABLE_THP, &me->mm->flags);
- up_write(&me->mm->mmap_sem);
+ mm_write_unlock(me->mm, &mmrange);
break;
case PR_MPX_ENABLE_MANAGEMENT:
if (arg2 || arg3 || arg4 || arg5)
--
2.13.6


2018-02-05 01:45:08

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 12/64] fs/userfaultfd: teach userfaultfd_must_wait() about range locking

From: Davidlohr Bueso <[email protected]>

And make use of mm_is_locked(), which is why vmf->lockrange is
passed down to the must-wait helpers.
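
A rough sketch of how the pieces fit together (the handle_mm_fault()
signature is the one patch 06 introduces; that it stashes the range
into vmf->lockrange is inferred from the cover letter rather than
shown here):

    /* fault path (patch 06): remember the range mmap_sem was taken with */
    DEFINE_RANGE_LOCK_FULL(mmrange);
    down_read(&mm->mmap_sem);
    handle_mm_fault(vma, address, flags, &mmrange); /* range ends up in vmf->lockrange */

    /* userfaultfd (this patch): assert the lock against that same range */
    VM_BUG_ON(!mm_is_locked(mm, vmf->lockrange));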

Signed-off-by: Davidlohr Bueso <[email protected]>
---
fs/userfaultfd.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e3089865fd52..883fbffb284e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -217,13 +217,14 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
struct vm_area_struct *vma,
unsigned long address,
unsigned long flags,
- unsigned long reason)
+ unsigned long reason,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = ctx->mm;
pte_t *pte;
bool ret = true;

- VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
+ VM_BUG_ON(!mm_is_locked(mm, mmrange));

pte = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
if (!pte)
@@ -247,7 +248,8 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
struct vm_area_struct *vma,
unsigned long address,
unsigned long flags,
- unsigned long reason)
+ unsigned long reason,
+ struct range_lock *mmrange)
{
return false; /* should never get here */
}
@@ -263,7 +265,8 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
unsigned long address,
unsigned long flags,
- unsigned long reason)
+ unsigned long reason,
+ struct range_lock *mmrange)
{
struct mm_struct *mm = ctx->mm;
pgd_t *pgd;
@@ -273,7 +276,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
pte_t *pte;
bool ret = true;

- VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
+ VM_BUG_ON(!mm_is_locked(mm, mmrange));

pgd = pgd_offset(mm, address);
if (!pgd_present(*pgd))
@@ -365,7 +368,7 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason)
* Coredumping runs without mmap_sem so we can only check that
* the mmap_sem is held, if PF_DUMPCORE was not set.
*/
- WARN_ON_ONCE(!rwsem_is_locked(&mm->mmap_sem));
+ WARN_ON_ONCE(!mm_is_locked(mm, vmf->lockrange));

ctx = vmf->vma->vm_userfaultfd_ctx.ctx;
if (!ctx)
@@ -473,11 +476,12 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason)

if (!is_vm_hugetlb_page(vmf->vma))
must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
- reason);
+ reason, vmf->lockrange);
else
must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma,
vmf->address,
- vmf->flags, reason);
+ vmf->flags, reason,
+ vmf->lockrange);
up_read(&mm->mmap_sem);

if (likely(must_wait && !READ_ONCE(ctx->released) &&
--
2.13.6


2018-02-05 16:11:28

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH 06/64] mm: teach pagefault paths about range locking

On 05/02/2018 02:26, Davidlohr Bueso wrote:
> From: Davidlohr Bueso <[email protected]>
>
> In handle_mm_fault() we need to remember the range lock specified
> when the mmap_sem was first taken as pf paths can drop the lock.
> Although this patch may seem far too big at first, it is so due to
> bisectability, and later conversion patches become quite easy to
> follow. Furthermore, most of what this patch does is pass a pointer
> to an 'mmrange' stack allocated parameter that is later used by the
> vm_fault structure. The new interfaces are pretty much all in the
> following areas:
>
> - vma handling (vma_merge(), vma_adjust(), split_vma(), copy_vma())
> - gup family (all except get_user_pages_unlocked(), which internally
> passes the mmrange).
> - mm walking (walk_page_vma())
> - mmap/unmap (do_mmap(), do_munmap())
> - handle_mm_fault(), fixup_user_fault()
>
> Most of the pain of the patch is updating all callers in the kernel
> for this. While tedious, it is not that hard to review, I hope.
> The idea is to use a local variable (no concurrency) whenever the
> mmap_sem is taken and we end up in pf paths that end up retaking
> the lock. Ie:
>
> DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_write(&mm->mmap_sem);
> some_fn(a, b, c, &mmrange);
> ....
> ....
> ...
> handle_mm_fault(vma, addr, flags, mmrange);
> ...
> up_write(&mm->mmap_sem);
>
> Semantically nothing changes at all, and the 'mmrange' ends up
> being unused for now. Later patches will use the variable when
> the mmap_sem wrappers replace straightforward down/up.
>
> Compile tested defconfigs on various non-x86 archs without breaking.
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> ---
> arch/alpha/mm/fault.c | 3 +-
> arch/arc/mm/fault.c | 3 +-
> arch/arm/mm/fault.c | 8 ++-
> arch/arm/probes/uprobes/core.c | 5 +-
> arch/arm64/mm/fault.c | 7 ++-
> arch/cris/mm/fault.c | 3 +-
> arch/frv/mm/fault.c | 3 +-
> arch/hexagon/mm/vm_fault.c | 3 +-
> arch/ia64/mm/fault.c | 3 +-
> arch/m32r/mm/fault.c | 3 +-
> arch/m68k/mm/fault.c | 3 +-
> arch/metag/mm/fault.c | 3 +-
> arch/microblaze/mm/fault.c | 3 +-
> arch/mips/kernel/vdso.c | 3 +-
> arch/mips/mm/fault.c | 3 +-
> arch/mn10300/mm/fault.c | 3 +-
> arch/nios2/mm/fault.c | 3 +-
> arch/openrisc/mm/fault.c | 3 +-
> arch/parisc/mm/fault.c | 3 +-
> arch/powerpc/include/asm/mmu_context.h | 3 +-
> arch/powerpc/include/asm/powernv.h | 5 +-
> arch/powerpc/mm/copro_fault.c | 4 +-
> arch/powerpc/mm/fault.c | 3 +-
> arch/powerpc/platforms/powernv/npu-dma.c | 5 +-
> arch/riscv/mm/fault.c | 3 +-
> arch/s390/include/asm/gmap.h | 14 +++--
> arch/s390/kvm/gaccess.c | 31 ++++++----
> arch/s390/mm/fault.c | 3 +-
> arch/s390/mm/gmap.c | 80 +++++++++++++++---------
> arch/score/mm/fault.c | 3 +-
> arch/sh/mm/fault.c | 3 +-
> arch/sparc/mm/fault_32.c | 6 +-
> arch/sparc/mm/fault_64.c | 3 +-
> arch/tile/mm/fault.c | 3 +-
> arch/um/include/asm/mmu_context.h | 3 +-
> arch/um/kernel/trap.c | 3 +-
> arch/unicore32/mm/fault.c | 8 ++-
> arch/x86/entry/vdso/vma.c | 3 +-
> arch/x86/include/asm/mmu_context.h | 5 +-
> arch/x86/include/asm/mpx.h | 6 +-
> arch/x86/mm/fault.c | 3 +-
> arch/x86/mm/mpx.c | 41 ++++++++-----
> arch/xtensa/mm/fault.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +-
> drivers/gpu/drm/i915/i915_gem_userptr.c | 4 +-
> drivers/gpu/drm/radeon/radeon_ttm.c | 4 +-
> drivers/infiniband/core/umem.c | 3 +-
> drivers/infiniband/core/umem_odp.c | 3 +-
> drivers/infiniband/hw/qib/qib_user_pages.c | 7 ++-
> drivers/infiniband/hw/usnic/usnic_uiom.c | 3 +-
> drivers/iommu/amd_iommu_v2.c | 5 +-
> drivers/iommu/intel-svm.c | 5 +-
> drivers/media/v4l2-core/videobuf-dma-sg.c | 18 ++++--
> drivers/misc/mic/scif/scif_rma.c | 3 +-
> drivers/misc/sgi-gru/grufault.c | 43 ++++++++-----
> drivers/vfio/vfio_iommu_type1.c | 3 +-
> fs/aio.c | 3 +-
> fs/binfmt_elf.c | 3 +-
> fs/exec.c | 20 ++++--
> fs/proc/internal.h | 3 +
> fs/proc/task_mmu.c | 29 ++++++---
> fs/proc/vmcore.c | 14 ++++-
> fs/userfaultfd.c | 18 +++---
> include/asm-generic/mm_hooks.h | 3 +-
> include/linux/hmm.h | 4 +-
> include/linux/ksm.h | 6 +-
> include/linux/migrate.h | 4 +-
> include/linux/mm.h | 73 +++++++++++++---------
> include/linux/uprobes.h | 15 +++--
> ipc/shm.c | 14 +++--
> kernel/events/uprobes.c | 49 +++++++++------
> kernel/futex.c | 3 +-
> mm/frame_vector.c | 4 +-
> mm/gup.c | 60 ++++++++++--------
> mm/hmm.c | 37 ++++++-----
> mm/internal.h | 3 +-
> mm/ksm.c | 24 +++++---
> mm/madvise.c | 58 ++++++++++-------
> mm/memcontrol.c | 13 ++--
> mm/memory.c | 10 +--
> mm/mempolicy.c | 35 ++++++-----
> mm/migrate.c | 20 +++---
> mm/mincore.c | 24 +++++---
> mm/mlock.c | 33 ++++++----
> mm/mmap.c | 99 +++++++++++++++++-------------
> mm/mprotect.c | 14 +++--
> mm/mremap.c | 30 +++++----
> mm/nommu.c | 32 ++++++----
> mm/pagewalk.c | 56 +++++++++--------
> mm/process_vm_access.c | 4 +-
> mm/util.c | 3 +-
> security/tomoyo/domain.c | 3 +-
> virt/kvm/async_pf.c | 3 +-
> virt/kvm/kvm_main.c | 16 +++--
> 94 files changed, 784 insertions(+), 474 deletions(-)
>
> diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
> index cd3c572ee912..690d86a00a20 100644
> --- a/arch/alpha/mm/fault.c
> +++ b/arch/alpha/mm/fault.c
> @@ -90,6 +90,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
> int fault, si_code = SEGV_MAPERR;
> siginfo_t info;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* As of EV6, a load into $31/$f31 is a prefetch, and never faults
> (or is suppressed by the PALcode). Support that for older CPUs
> @@ -148,7 +149,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
> /* If for any reason at all we couldn't handle the fault,
> make sure we exit gracefully rather than endlessly redo
> the fault. */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
> index a0b7bd6d030d..e423f764f159 100644
> --- a/arch/arc/mm/fault.c
> +++ b/arch/arc/mm/fault.c
> @@ -69,6 +69,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
> int fault, ret;
> int write = regs->ecr_cause & ECR_C_PROTV_STORE; /* ST/EX */
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /*
> * We fault-in kernel-space virtual memory on-demand. The
> @@ -137,7 +138,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> /* If Pagefault was interrupted by SIGKILL, exit page fault "early" */
> if (unlikely(fatal_signal_pending(current))) {
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index b75eada23d0a..99ae40b5851a 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -221,7 +221,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
>
> static int __kprobes
> __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
> - unsigned int flags, struct task_struct *tsk)
> + unsigned int flags, struct task_struct *tsk,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *vma;
> int fault;
> @@ -243,7 +244,7 @@ __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
> goto out;
> }
>
> - return handle_mm_fault(vma, addr & PAGE_MASK, flags);
> + return handle_mm_fault(vma, addr & PAGE_MASK, flags, mmrange);
>
> check_stack:
> /* Don't allow expansion below FIRST_USER_ADDRESS */
> @@ -261,6 +262,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> struct mm_struct *mm;
> int fault, sig, code;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (notify_page_fault(regs, fsr))
> return 0;
> @@ -308,7 +310,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> #endif
> }
>
> - fault = __do_page_fault(mm, addr, fsr, flags, tsk);
> + fault = __do_page_fault(mm, addr, fsr, flags, tsk, &mmrange);
>
> /* If we need to retry but a fatal signal is pending, handle the
> * signal first. We do not need to release the mmap_sem because
> diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
> index d1329f1ba4e4..e8b893eaebcf 100644
> --- a/arch/arm/probes/uprobes/core.c
> +++ b/arch/arm/probes/uprobes/core.c
> @@ -30,10 +30,11 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
> }
>
> int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
> - unsigned long vaddr)
> + unsigned long vaddr, struct range_lock *mmrange)
> {
> return uprobe_write_opcode(mm, vaddr,
> - __opcode_to_mem_arm(auprobe->bpinsn));
> + __opcode_to_mem_arm(auprobe->bpinsn),
> + mmrange);
> }
>
> bool arch_uprobe_ignore(struct arch_uprobe *auprobe, struct pt_regs *regs)
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index ce441d29e7f6..1f3ad9e4f214 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -342,7 +342,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
>
> static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
> unsigned int mm_flags, unsigned long vm_flags,
> - struct task_struct *tsk)
> + struct task_struct *tsk, struct range_lock *mmrange)
> {
> struct vm_area_struct *vma;
> int fault;
> @@ -368,7 +368,7 @@ static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
> goto out;
> }
>
> - return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags);
> + return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags, mmrange);
>
> check_stack:
> if (vma->vm_flags & VM_GROWSDOWN && !expand_stack(vma, addr))
> @@ -390,6 +390,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
> int fault, sig, code, major = 0;
> unsigned long vm_flags = VM_READ | VM_WRITE;
> unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (notify_page_fault(regs, esr))
> return 0;
> @@ -450,7 +451,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
> #endif
> }
>
> - fault = __do_page_fault(mm, addr, mm_flags, vm_flags, tsk);
> + fault = __do_page_fault(mm, addr, mm_flags, vm_flags, tsk, &mmrange);
> major |= fault & VM_FAULT_MAJOR;
>
> if (fault & VM_FAULT_RETRY) {
> diff --git a/arch/cris/mm/fault.c b/arch/cris/mm/fault.c
> index 29cc58038b98..16af16d77269 100644
> --- a/arch/cris/mm/fault.c
> +++ b/arch/cris/mm/fault.c
> @@ -61,6 +61,7 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
> siginfo_t info;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> D(printk(KERN_DEBUG
> "Page fault for %lX on %X at %lX, prot %d write %d\n",
> @@ -170,7 +171,7 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
> * the fault.
> */
>
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/frv/mm/fault.c b/arch/frv/mm/fault.c
> index cbe7aec863e3..494d33b628fc 100644
> --- a/arch/frv/mm/fault.c
> +++ b/arch/frv/mm/fault.c
> @@ -41,6 +41,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
> pud_t *pue;
> pte_t *pte;
> int fault;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> #if 0
> const char *atxc[16] = {
> @@ -165,7 +166,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, ear0, flags);
> + fault = handle_mm_fault(vma, ear0, flags, &mmrange);
> if (unlikely(fault & VM_FAULT_ERROR)) {
> if (fault & VM_FAULT_OOM)
> goto out_of_memory;
> diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
> index 3eec33c5cfd7..7d6ada2c2230 100644
> --- a/arch/hexagon/mm/vm_fault.c
> +++ b/arch/hexagon/mm/vm_fault.c
> @@ -55,6 +55,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
> int fault;
> const struct exception_table_entry *fixup;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /*
> * If we're in an interrupt or have no user context,
> @@ -102,7 +103,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
> break;
> }
>
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
> index dfdc152d6737..44f0ec5f77c2 100644
> --- a/arch/ia64/mm/fault.c
> +++ b/arch/ia64/mm/fault.c
> @@ -89,6 +89,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
> unsigned long mask;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> mask = ((((isr >> IA64_ISR_X_BIT) & 1UL) << VM_EXEC_BIT)
> | (((isr >> IA64_ISR_W_BIT) & 1UL) << VM_WRITE_BIT));
> @@ -162,7 +163,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
> * sure we exit gracefully rather than endlessly redo the
> * fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/m32r/mm/fault.c b/arch/m32r/mm/fault.c
> index 46d9a5ca0e3a..0129aea46729 100644
> --- a/arch/m32r/mm/fault.c
> +++ b/arch/m32r/mm/fault.c
> @@ -82,6 +82,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
> unsigned long flags = 0;
> int fault;
> siginfo_t info;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /*
> * If BPSW IE bit enable --> set PSW IE bit
> @@ -197,7 +198,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
> */
> addr = (address & PAGE_MASK);
> set_thread_fault_code(error_code);
> - fault = handle_mm_fault(vma, addr, flags);
> + fault = handle_mm_fault(vma, addr, flags, &mmrange);
> if (unlikely(fault & VM_FAULT_ERROR)) {
> if (fault & VM_FAULT_OOM)
> goto out_of_memory;
> diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
> index 03253c4f8e6a..ec32a193726f 100644
> --- a/arch/m68k/mm/fault.c
> +++ b/arch/m68k/mm/fault.c
> @@ -75,6 +75,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> struct vm_area_struct * vma;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> pr_debug("do page fault:\nregs->sr=%#x, regs->pc=%#lx, address=%#lx, %ld, %p\n",
> regs->sr, regs->pc, address, error_code, mm ? mm->pgd : NULL);
> @@ -138,7 +139,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> * the fault.
> */
>
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
> pr_debug("handle_mm_fault returns %d\n", fault);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> diff --git a/arch/metag/mm/fault.c b/arch/metag/mm/fault.c
> index de54fe686080..e16ba0ea7ea1 100644
> --- a/arch/metag/mm/fault.c
> +++ b/arch/metag/mm/fault.c
> @@ -56,6 +56,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> siginfo_t info;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> tsk = current;
>
> @@ -135,7 +136,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return 0;
> diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
> index f91b30f8aaa8..fd49efbdfbf4 100644
> --- a/arch/microblaze/mm/fault.c
> +++ b/arch/microblaze/mm/fault.c
> @@ -93,6 +93,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
> int is_write = error_code & ESR_S;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> regs->ear = address;
> regs->esr = error_code;
> @@ -216,7 +217,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> index 019035d7225c..56b7c29991db 100644
> --- a/arch/mips/kernel/vdso.c
> +++ b/arch/mips/kernel/vdso.c
> @@ -102,6 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> unsigned long gic_size, vvar_size, size, base, data_addr, vdso_addr, gic_pfn;
> struct vm_area_struct *vma;
> int ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (down_write_killable(&mm->mmap_sem))
> return -EINTR;
> @@ -110,7 +111,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> VM_READ|VM_WRITE|VM_EXEC|
> VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> - 0, NULL);
> + 0, NULL, &mmrange);
> if (IS_ERR_VALUE(base)) {
> ret = base;
> goto out;
> diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
> index 4f8f5bf46977..1433edd01d09 100644
> --- a/arch/mips/mm/fault.c
> +++ b/arch/mips/mm/fault.c
> @@ -47,6 +47,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
>
> static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> #if 0
> printk("Cpu%d[%s:%d:%0*lx:%ld:%0*lx]\n", raw_smp_processor_id(),
> @@ -152,7 +153,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/mn10300/mm/fault.c b/arch/mn10300/mm/fault.c
> index f0bfa1448744..71c38f0c8702 100644
> --- a/arch/mn10300/mm/fault.c
> +++ b/arch/mn10300/mm/fault.c
> @@ -125,6 +125,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
> siginfo_t info;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> #ifdef CONFIG_GDBSTUB
> /* handle GDB stub causing a fault */
> @@ -254,7 +255,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
> index b804dd06ea1c..768678b685af 100644
> --- a/arch/nios2/mm/fault.c
> +++ b/arch/nios2/mm/fault.c
> @@ -49,6 +49,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
> int code = SEGV_MAPERR;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> cause >>= 2;
>
> @@ -132,7 +133,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
> index d0021dfae20a..75ddb1e8e7e7 100644
> --- a/arch/openrisc/mm/fault.c
> +++ b/arch/openrisc/mm/fault.c
> @@ -55,6 +55,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
> siginfo_t info;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> tsk = current;
>
> @@ -163,7 +164,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
> * the fault.
> */
>
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
> index e247edbca68e..79db33a0cb0c 100644
> --- a/arch/parisc/mm/fault.c
> +++ b/arch/parisc/mm/fault.c
> @@ -264,6 +264,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
> unsigned long acc_type;
> int fault = 0;
> unsigned int flags;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (faulthandler_disabled())
> goto no_context;
> @@ -301,7 +302,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
> * fault.
> */
>
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 051b3d63afe3..089b3cf948eb 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -176,7 +176,8 @@ extern void arch_exit_mmap(struct mm_struct *mm);
>
> static inline void arch_unmap(struct mm_struct *mm,
> struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
> mm->context.vdso_base = 0;
> diff --git a/arch/powerpc/include/asm/powernv.h b/arch/powerpc/include/asm/powernv.h
> index dc5f6a5d4575..805ff3ba94e1 100644
> --- a/arch/powerpc/include/asm/powernv.h
> +++ b/arch/powerpc/include/asm/powernv.h
> @@ -21,7 +21,7 @@ extern void pnv_npu2_destroy_context(struct npu_context *context,
> struct pci_dev *gpdev);
> extern int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
> unsigned long *flags, unsigned long *status,
> - int count);
> + int count, struct range_lock *mmrange);
>
> void pnv_tm_init(void);
> #else
> @@ -35,7 +35,8 @@ static inline void pnv_npu2_destroy_context(struct npu_context *context,
>
> static inline int pnv_npu2_handle_fault(struct npu_context *context,
> uintptr_t *ea, unsigned long *flags,
> - unsigned long *status, int count) {
> + unsigned long *status, int count,
> + struct range_lock *mmrange) {
> return -ENODEV;
> }
>
> diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
> index 697b70ad1195..8f5e604828a1 100644
> --- a/arch/powerpc/mm/copro_fault.c
> +++ b/arch/powerpc/mm/copro_fault.c
> @@ -39,6 +39,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
> struct vm_area_struct *vma;
> unsigned long is_write;
> int ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (mm == NULL)
> return -EFAULT;
> @@ -77,7 +78,8 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
> }
>
> ret = 0;
> - *flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0);
> + *flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0,
> + &mmrange);
> if (unlikely(*flt & VM_FAULT_ERROR)) {
> if (*flt & VM_FAULT_OOM) {
> ret = -ENOMEM;
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 866446cf2d9a..d562dc88687d 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -399,6 +399,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
> int is_write = page_fault_is_write(error_code);
> int fault, major = 0;
> bool store_update_sp = false;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (notify_page_fault(regs))
> return 0;
> @@ -514,7 +515,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> #ifdef CONFIG_PPC_MEM_KEYS
> /*
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 0a253b64ac5f..759e9a4c7479 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -789,7 +789,8 @@ EXPORT_SYMBOL(pnv_npu2_destroy_context);
> * Assumes mmap_sem is held for the contexts associated mm.
> */
> int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
> - unsigned long *flags, unsigned long *status, int count)
> + unsigned long *flags, unsigned long *status,
> + int count, struct range_lock *mmrange)
> {
> u64 rc = 0, result = 0;
> int i, is_write;
> @@ -807,7 +808,7 @@ int pnv_npu2_handle_fault(struct npu_context *context, uintptr_t *ea,
> is_write = flags[i] & NPU2_WRITE;
> rc = get_user_pages_remote(NULL, mm, ea[i], 1,
> is_write ? FOLL_WRITE : 0,
> - page, NULL, NULL);
> + page, NULL, NULL, mmrange);
>
> /*
> * To support virtualised environments we will have to do an
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index 148c98ca9b45..75d15e73ba39 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -42,6 +42,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
> unsigned long addr, cause;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> int fault, code = SEGV_MAPERR;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> cause = regs->scause;
> addr = regs->sbadaddr;
> @@ -119,7 +120,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, addr, flags);
> + fault = handle_mm_fault(vma, addr, flags, &mmrange);
>
> /*
> * If we need to retry but a fatal signal is pending, handle the
> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
> index e07cce88dfb0..117c19a947c9 100644
> --- a/arch/s390/include/asm/gmap.h
> +++ b/arch/s390/include/asm/gmap.h
> @@ -107,22 +107,24 @@ void gmap_discard(struct gmap *, unsigned long from, unsigned long to);
> void __gmap_zap(struct gmap *, unsigned long gaddr);
> void gmap_unlink(struct mm_struct *, unsigned long *table, unsigned long vmaddr);
>
> -int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val);
> +int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val,
> + struct range_lock *mmrange);
>
> struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
> int edat_level);
> int gmap_shadow_valid(struct gmap *sg, unsigned long asce, int edat_level);
> int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
> - int fake);
> + int fake, struct range_lock *mmrange);
> int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
> - int fake);
> + int fake, struct range_lock *mmrange);
> int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
> - int fake);
> + int fake, struct range_lock *mmrange);
> int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
> - int fake);
> + int fake, struct range_lock *mmrange);
> int gmap_shadow_pgt_lookup(struct gmap *sg, unsigned long saddr,
> unsigned long *pgt, int *dat_protection, int *fake);
> -int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte);
> +int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte,
> + struct range_lock *mmrange);
>
> void gmap_register_pte_notifier(struct gmap_notifier *);
> void gmap_unregister_pte_notifier(struct gmap_notifier *);
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index c24bfa72baf7..ff739b86df36 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -978,10 +978,11 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
> * @saddr: faulting address in the shadow gmap
> * @pgt: pointer to the page table address result
> * @fake: pgt references contiguous guest memory block, not a pgtable
> + * @mmrange: address space range locking
> */
> static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> unsigned long *pgt, int *dat_protection,
> - int *fake)
> + int *fake, struct range_lock *mmrange)
> {
> struct gmap *parent;
> union asce asce;
> @@ -1034,7 +1035,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> rfte.val = ptr;
> goto shadow_r2t;
> }
> - rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val);
> + rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val,
> + mmrange);
> if (rc)
> return rc;
> if (rfte.i)
> @@ -1047,7 +1049,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> *dat_protection |= rfte.p;
> ptr = rfte.rto * PAGE_SIZE;
> shadow_r2t:
> - rc = gmap_shadow_r2t(sg, saddr, rfte.val, *fake);
> + rc = gmap_shadow_r2t(sg, saddr, rfte.val, *fake, mmrange);
> if (rc)
> return rc;
> /* fallthrough */
> @@ -1060,7 +1062,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> rste.val = ptr;
> goto shadow_r3t;
> }
> - rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val);
> + rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val,
> + mmrange);
> if (rc)
> return rc;
> if (rste.i)
> @@ -1074,7 +1077,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> ptr = rste.rto * PAGE_SIZE;
> shadow_r3t:
> rste.p |= *dat_protection;
> - rc = gmap_shadow_r3t(sg, saddr, rste.val, *fake);
> + rc = gmap_shadow_r3t(sg, saddr, rste.val, *fake, mmrange);
> if (rc)
> return rc;
> /* fallthrough */
> @@ -1087,7 +1090,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> rtte.val = ptr;
> goto shadow_sgt;
> }
> - rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val);
> + rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val,
> + mmrange);
> if (rc)
> return rc;
> if (rtte.i)
> @@ -1110,7 +1114,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> ptr = rtte.fc0.sto * PAGE_SIZE;
> shadow_sgt:
> rtte.fc0.p |= *dat_protection;
> - rc = gmap_shadow_sgt(sg, saddr, rtte.val, *fake);
> + rc = gmap_shadow_sgt(sg, saddr, rtte.val, *fake, mmrange);
> if (rc)
> return rc;
> /* fallthrough */
> @@ -1123,7 +1127,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> ste.val = ptr;
> goto shadow_pgt;
> }
> - rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val);
> + rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val,
> + mmrange);
> if (rc)
> return rc;
> if (ste.i)
> @@ -1142,7 +1147,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
> ptr = ste.fc0.pto * (PAGE_SIZE / 2);
> shadow_pgt:
> ste.fc0.p |= *dat_protection;
> - rc = gmap_shadow_pgt(sg, saddr, ste.val, *fake);
> + rc = gmap_shadow_pgt(sg, saddr, ste.val, *fake, mmrange);
> if (rc)
> return rc;
> }
> @@ -1172,6 +1177,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> unsigned long pgt;
> int dat_protection, fake;
> int rc;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_read(&sg->mm->mmap_sem);
> /*
> @@ -1184,7 +1190,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> rc = gmap_shadow_pgt_lookup(sg, saddr, &pgt, &dat_protection, &fake);
> if (rc)
> rc = kvm_s390_shadow_tables(sg, saddr, &pgt, &dat_protection,
> - &fake);
> + &fake, &mmrange);
>
> vaddr.addr = saddr;
> if (fake) {
> @@ -1192,7 +1198,8 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> goto shadow_page;
> }
> if (!rc)
> - rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8, &pte.val);
> + rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8,
> + &pte.val, &mmrange);
> if (!rc && pte.i)
> rc = PGM_PAGE_TRANSLATION;
> if (!rc && pte.z)
> @@ -1200,7 +1207,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
> shadow_page:
> pte.p |= dat_protection;
> if (!rc)
> - rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
> + rc = gmap_shadow_page(sg, saddr, __pte(pte.val), &mmrange);
> ipte_unlock(vcpu);
> up_read(&sg->mm->mmap_sem);
> return rc;
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index 93faeca52284..17ba3c402f9d 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -421,6 +421,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
> unsigned long address;
> unsigned int flags;
> int fault;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> tsk = current;
> /*
> @@ -507,7 +508,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
> /* No reason to continue if interrupted by SIGKILL. */
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
> fault = VM_FAULT_SIGNAL;
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 2c55a2b9d6c6..b12a44813022 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -621,6 +621,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
> unsigned long vmaddr;
> int rc;
> bool unlocked;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_read(&gmap->mm->mmap_sem);
>
> @@ -632,7 +633,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
> goto out_up;
> }
> if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags,
> - &unlocked)) {
> + &unlocked, &mmrange)) {
> rc = -EFAULT;
> goto out_up;
> }
> @@ -835,13 +836,15 @@ static pte_t *gmap_pte_op_walk(struct gmap *gmap, unsigned long gaddr,
> * @gaddr: virtual address in the guest address space
> * @vmaddr: address in the host process address space
> * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
> + * @mmrange: address space range locking
> *
> * Returns 0 if the caller can retry __gmap_translate (might fail again),
> * -ENOMEM if out of memory and -EFAULT if anything goes wrong while fixing
> * up or connecting the gmap page table.
> */
> static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,
> - unsigned long vmaddr, int prot)
> + unsigned long vmaddr, int prot,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = gmap->mm;
> unsigned int fault_flags;
> @@ -849,7 +852,8 @@ static int gmap_pte_op_fixup(struct gmap *gmap, unsigned long gaddr,
>
> BUG_ON(gmap_is_shadow(gmap));
> fault_flags = (prot == PROT_WRITE) ? FAULT_FLAG_WRITE : 0;
> - if (fixup_user_fault(current, mm, vmaddr, fault_flags, &unlocked))
> + if (fixup_user_fault(current, mm, vmaddr, fault_flags, &unlocked,
> + mmrange))
> return -EFAULT;
> if (unlocked)
> /* lost mmap_sem, caller has to retry __gmap_translate */
> @@ -874,6 +878,7 @@ static void gmap_pte_op_end(spinlock_t *ptl)
> * @len: size of area
> * @prot: indicates access rights: PROT_NONE, PROT_READ or PROT_WRITE
> * @bits: pgste notification bits to set
> + * @mmrange: address space range locking
> *
> * Returns 0 if successfully protected, -ENOMEM if out of memory and
> * -EFAULT if gaddr is invalid (or mapping for shadows is missing).
> @@ -881,7 +886,8 @@ static void gmap_pte_op_end(spinlock_t *ptl)
> * Called with sg->mm->mmap_sem in read.
> */
> static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
> - unsigned long len, int prot, unsigned long bits)
> + unsigned long len, int prot, unsigned long bits,
> + struct range_lock *mmrange)
> {
> unsigned long vmaddr;
> spinlock_t *ptl;
> @@ -900,7 +906,8 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
> vmaddr = __gmap_translate(gmap, gaddr);
> if (IS_ERR_VALUE(vmaddr))
> return vmaddr;
> - rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot);
> + rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, prot,
> + mmrange);
> if (rc)
> return rc;
> continue;
> @@ -929,13 +936,14 @@ int gmap_mprotect_notify(struct gmap *gmap, unsigned long gaddr,
> unsigned long len, int prot)
> {
> int rc;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if ((gaddr & ~PAGE_MASK) || (len & ~PAGE_MASK) || gmap_is_shadow(gmap))
> return -EINVAL;
> if (!MACHINE_HAS_ESOP && prot == PROT_READ)
> return -EINVAL;
> down_read(&gmap->mm->mmap_sem);
> - rc = gmap_protect_range(gmap, gaddr, len, prot, PGSTE_IN_BIT);
> + rc = gmap_protect_range(gmap, gaddr, len, prot, PGSTE_IN_BIT, &mmrange);
> up_read(&gmap->mm->mmap_sem);
> return rc;
> }
> @@ -947,6 +955,7 @@ EXPORT_SYMBOL_GPL(gmap_mprotect_notify);
> * @gmap: pointer to guest mapping meta data structure
> * @gaddr: virtual address in the guest address space
> * @val: pointer to the unsigned long value to return
> + * @mmrange: address space range locking
> *
> * Returns 0 if the value was read, -ENOMEM if out of memory and -EFAULT
> * if reading using the virtual address failed. -EINVAL if called on a gmap
> @@ -954,7 +963,8 @@ EXPORT_SYMBOL_GPL(gmap_mprotect_notify);
> *
> * Called with gmap->mm->mmap_sem in read.
> */
> -int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val)
> +int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val,
> + struct range_lock *mmrange)
> {
> unsigned long address, vmaddr;
> spinlock_t *ptl;
> @@ -986,7 +996,7 @@ int gmap_read_table(struct gmap *gmap, unsigned long gaddr, unsigned long *val)
> rc = vmaddr;
> break;
> }
> - rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, PROT_READ);
> + rc = gmap_pte_op_fixup(gmap, gaddr, vmaddr, PROT_READ, mmrange);
> if (rc)
> break;
> }
> @@ -1026,12 +1036,14 @@ static inline void gmap_insert_rmap(struct gmap *sg, unsigned long vmaddr,
> * @raddr: rmap address in the shadow gmap
> * @paddr: address in the parent guest address space
> * @len: length of the memory area to protect
> + * @mmrange: address space range locking
> *
> * Returns 0 if successfully protected and the rmap was created, -ENOMEM
> * if out of memory and -EFAULT if paddr is invalid.
> */
> static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
> - unsigned long paddr, unsigned long len)
> + unsigned long paddr, unsigned long len,
> + struct range_lock *mmrange)
> {
> struct gmap *parent;
> struct gmap_rmap *rmap;
> @@ -1069,7 +1081,7 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
> radix_tree_preload_end();
> if (rc) {
> kfree(rmap);
> - rc = gmap_pte_op_fixup(parent, paddr, vmaddr, PROT_READ);
> + rc = gmap_pte_op_fixup(parent, paddr, vmaddr, PROT_READ, mmrange);
> if (rc)
> return rc;
> continue;
> @@ -1473,6 +1485,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
> struct gmap *sg, *new;
> unsigned long limit;
> int rc;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> BUG_ON(gmap_is_shadow(parent));
> spin_lock(&parent->shadow_lock);
> @@ -1526,7 +1539,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
> down_read(&parent->mm->mmap_sem);
> rc = gmap_protect_range(parent, asce & _ASCE_ORIGIN,
> ((asce & _ASCE_TABLE_LENGTH) + 1) * PAGE_SIZE,
> - PROT_READ, PGSTE_VSIE_BIT);
> + PROT_READ, PGSTE_VSIE_BIT, &mmrange);
> up_read(&parent->mm->mmap_sem);
> spin_lock(&parent->shadow_lock);
> new->initialized = true;
> @@ -1546,6 +1559,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow);
> * @saddr: faulting address in the shadow gmap
> * @r2t: parent gmap address of the region 2 table to get shadowed
> * @fake: r2t references contiguous guest memory block, not a r2t
> + * @mmrange: address space range locking
> *
> * The r2t parameter specifies the address of the source table. The
> * four pages of the source table are made read-only in the parent gmap
> @@ -1559,7 +1573,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow);
> * Called with sg->mm->mmap_sem in read.
> */
> int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
> - int fake)
> + int fake, struct range_lock *mmrange)
> {
> unsigned long raddr, origin, offset, len;
> unsigned long *s_r2t, *table;
> @@ -1608,7 +1622,7 @@ int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
> origin = r2t & _REGION_ENTRY_ORIGIN;
> offset = ((r2t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
> len = ((r2t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
> - rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
> + rc = gmap_protect_rmap(sg, raddr, origin + offset, len, mmrange);
> spin_lock(&sg->guest_table_lock);
> if (!rc) {
> table = gmap_table_walk(sg, saddr, 4);
> @@ -1635,6 +1649,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_r2t);
> * @saddr: faulting address in the shadow gmap
> * @r3t: parent gmap address of the region 3 table to get shadowed
> * @fake: r3t references contiguous guest memory block, not a r3t
> + * @mmrange: address space range locking
> *
> * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
> * shadow table structure is incomplete, -ENOMEM if out of memory and
> @@ -1643,7 +1658,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_r2t);
> * Called with sg->mm->mmap_sem in read.
> */
> int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
> - int fake)
> + int fake, struct range_lock *mmrange)
> {
> unsigned long raddr, origin, offset, len;
> unsigned long *s_r3t, *table;
> @@ -1691,7 +1706,7 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
> origin = r3t & _REGION_ENTRY_ORIGIN;
> offset = ((r3t & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
> len = ((r3t & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
> - rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
> + rc = gmap_protect_rmap(sg, raddr, origin + offset, len, mmrange);
> spin_lock(&sg->guest_table_lock);
> if (!rc) {
> table = gmap_table_walk(sg, saddr, 3);
> @@ -1718,6 +1733,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_r3t);
> * @saddr: faulting address in the shadow gmap
> * @sgt: parent gmap address of the segment table to get shadowed
> * @fake: sgt references contiguous guest memory block, not a sgt
> + * @mmrange: address space range locking
> *
> * Returns: 0 if successfully shadowed or already shadowed, -EAGAIN if the
> * shadow table structure is incomplete, -ENOMEM if out of memory and
> @@ -1726,7 +1742,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_r3t);
> * Called with sg->mm->mmap_sem in read.
> */
> int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
> - int fake)
> + int fake, struct range_lock *mmrange)
> {
> unsigned long raddr, origin, offset, len;
> unsigned long *s_sgt, *table;
> @@ -1775,7 +1791,7 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
> origin = sgt & _REGION_ENTRY_ORIGIN;
> offset = ((sgt & _REGION_ENTRY_OFFSET) >> 6) * PAGE_SIZE;
> len = ((sgt & _REGION_ENTRY_LENGTH) + 1) * PAGE_SIZE - offset;
> - rc = gmap_protect_rmap(sg, raddr, origin + offset, len);
> + rc = gmap_protect_rmap(sg, raddr, origin + offset, len, mmrange);
> spin_lock(&sg->guest_table_lock);
> if (!rc) {
> table = gmap_table_walk(sg, saddr, 2);
> @@ -1842,6 +1858,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);
> * @saddr: faulting address in the shadow gmap
> * @pgt: parent gmap address of the page table to get shadowed
> * @fake: pgt references contiguous guest memory block, not a pgtable
> + * @mmrange: address space range locking
> *
> * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
> * shadow table structure is incomplete, -ENOMEM if out of memory,
> @@ -1850,7 +1867,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_pgt_lookup);
> * Called with gmap->mm->mmap_sem in read
> */
> int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
> - int fake)
> + int fake, struct range_lock *mmrange)
> {
> unsigned long raddr, origin;
> unsigned long *s_pgt, *table;
> @@ -1894,7 +1911,7 @@ int gmap_shadow_pgt(struct gmap *sg, unsigned long saddr, unsigned long pgt,
> /* Make pgt read-only in parent gmap page table (not the pgste) */
> raddr = (saddr & _SEGMENT_MASK) | _SHADOW_RMAP_SEGMENT;
> origin = pgt & _SEGMENT_ENTRY_ORIGIN & PAGE_MASK;
> - rc = gmap_protect_rmap(sg, raddr, origin, PAGE_SIZE);
> + rc = gmap_protect_rmap(sg, raddr, origin, PAGE_SIZE, mmrange);
> spin_lock(&sg->guest_table_lock);
> if (!rc) {
> table = gmap_table_walk(sg, saddr, 1);
> @@ -1921,6 +1938,7 @@ EXPORT_SYMBOL_GPL(gmap_shadow_pgt);
> * @sg: pointer to the shadow guest address space structure
> * @saddr: faulting address in the shadow gmap
> * @pte: pte in parent gmap address space to get shadowed
> + * @mmrange: address space range locking
> *
> * Returns 0 if successfully shadowed or already shadowed, -EAGAIN if the
> * shadow table structure is incomplete, -ENOMEM if out of memory and
> @@ -1928,7 +1946,8 @@ EXPORT_SYMBOL_GPL(gmap_shadow_pgt);
> *
> * Called with sg->mm->mmap_sem in read.
> */
> -int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
> +int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte,
> + struct range_lock *mmrange)
> {
> struct gmap *parent;
> struct gmap_rmap *rmap;
> @@ -1982,7 +2001,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
> radix_tree_preload_end();
> if (!rc)
> break;
> - rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot);
> + rc = gmap_pte_op_fixup(parent, paddr, vmaddr, prot, mmrange);
> if (rc)
> break;
> }
> @@ -2117,7 +2136,8 @@ static inline void thp_split_mm(struct mm_struct *mm)
> * - This must be called after THP was enabled
> */
> static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
> - unsigned long end, struct mm_walk *walk)
> + unsigned long end, struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> unsigned long addr;
>
> @@ -2133,12 +2153,13 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
> return 0;
> }
>
> -static inline void zap_zero_pages(struct mm_struct *mm)
> +static inline void zap_zero_pages(struct mm_struct *mm,
> + struct range_lock *mmrange)
> {
> struct mm_walk walk = { .pmd_entry = __zap_zero_pages };
>
> walk.mm = mm;
> - walk_page_range(0, TASK_SIZE, &walk);
> + walk_page_range(0, TASK_SIZE, &walk, mmrange);
> }
>
> /*
> @@ -2147,6 +2168,7 @@ static inline void zap_zero_pages(struct mm_struct *mm)
> int s390_enable_sie(void)
> {
> struct mm_struct *mm = current->mm;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* Do we have pgstes? if yes, we are done */
> if (mm_has_pgste(mm))
> @@ -2158,7 +2180,7 @@ int s390_enable_sie(void)
> mm->context.has_pgste = 1;
> /* split thp mappings and disable thp for future mappings */
> thp_split_mm(mm);
> - zap_zero_pages(mm);
> + zap_zero_pages(mm, &mmrange);
> up_write(&mm->mmap_sem);
> return 0;
> }
> @@ -2182,6 +2204,7 @@ int s390_enable_skey(void)
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma;
> int rc = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_write(&mm->mmap_sem);
> if (mm_use_skey(mm))
> @@ -2190,7 +2213,7 @@ int s390_enable_skey(void)
> mm->context.use_skey = 1;
> for (vma = mm->mmap; vma; vma = vma->vm_next) {
> if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> - MADV_UNMERGEABLE, &vma->vm_flags)) {
> + MADV_UNMERGEABLE, &vma->vm_flags, &mmrange)) {
> mm->context.use_skey = 0;
> rc = -ENOMEM;
> goto out_up;
> @@ -2199,7 +2222,7 @@ int s390_enable_skey(void)
> mm->def_flags &= ~VM_MERGEABLE;
>
> walk.mm = mm;
> - walk_page_range(0, TASK_SIZE, &walk);
> + walk_page_range(0, TASK_SIZE, &walk, &mmrange);
>
> out_up:
> up_write(&mm->mmap_sem);
> @@ -2220,10 +2243,11 @@ static int __s390_reset_cmma(pte_t *pte, unsigned long addr,
> void s390_reset_cmma(struct mm_struct *mm)
> {
> struct mm_walk walk = { .pte_entry = __s390_reset_cmma };
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_write(&mm->mmap_sem);
> walk.mm = mm;
> - walk_page_range(0, TASK_SIZE, &walk);
> + walk_page_range(0, TASK_SIZE, &walk, &mmrange);
> up_write(&mm->mmap_sem);
> }
> EXPORT_SYMBOL_GPL(s390_reset_cmma);
> diff --git a/arch/score/mm/fault.c b/arch/score/mm/fault.c
> index b85fad4f0874..07a8637ad142 100644
> --- a/arch/score/mm/fault.c
> +++ b/arch/score/mm/fault.c
> @@ -51,6 +51,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
> unsigned long flags = 0;
> siginfo_t info;
> int fault;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> info.si_code = SEGV_MAPERR;
>
> @@ -111,7 +112,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
> if (unlikely(fault & VM_FAULT_ERROR)) {
> if (fault & VM_FAULT_OOM)
> goto out_of_memory;
> diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
> index 6fd1bf7481c7..d36106564728 100644
> --- a/arch/sh/mm/fault.c
> +++ b/arch/sh/mm/fault.c
> @@ -405,6 +405,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
> struct vm_area_struct * vma;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> tsk = current;
> mm = tsk->mm;
> @@ -488,7 +489,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if (unlikely(fault & (VM_FAULT_RETRY | VM_FAULT_ERROR)))
> if (mm_fault_error(regs, error_code, address, fault))
> diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
> index a8103a84b4ac..ebb2406dbe7c 100644
> --- a/arch/sparc/mm/fault_32.c
> +++ b/arch/sparc/mm/fault_32.c
> @@ -176,6 +176,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
> int from_user = !(regs->psr & PSR_PS);
> int fault, code;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (text_fault)
> address = regs->pc;
> @@ -242,7 +243,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> @@ -389,6 +390,7 @@ static void force_user_fault(unsigned long address, int write)
> struct mm_struct *mm = tsk->mm;
> unsigned int flags = FAULT_FLAG_USER;
> int code;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> code = SEGV_MAPERR;
>
> @@ -412,7 +414,7 @@ static void force_user_fault(unsigned long address, int write)
> if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
> goto bad_area;
> }
> - switch (handle_mm_fault(vma, address, flags)) {
> + switch (handle_mm_fault(vma, address, flags, &mmrange)) {
> case VM_FAULT_SIGBUS:
> case VM_FAULT_OOM:
> goto do_sigbus;
> diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
> index 41363f46797b..e0a3c36b0fa1 100644
> --- a/arch/sparc/mm/fault_64.c
> +++ b/arch/sparc/mm/fault_64.c
> @@ -287,6 +287,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
> int si_code, fault_code, fault;
> unsigned long address, mm_rss;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> fault_code = get_thread_fault_code();
>
> @@ -438,7 +439,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
> goto bad_area;
> }
>
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> goto exit_exception;
> diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
> index f58fa06a2214..09f053eb146f 100644
> --- a/arch/tile/mm/fault.c
> +++ b/arch/tile/mm/fault.c
> @@ -275,6 +275,7 @@ static int handle_page_fault(struct pt_regs *regs,
> int is_kernel_mode;
> pgd_t *pgd;
> unsigned int flags;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* on TILE, protection faults are always writes */
> if (!is_page_fault)
> @@ -437,7 +438,7 @@ static int handle_page_fault(struct pt_regs *regs,
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return 0;
> diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h
> index fca34b2177e2..98cc3e36385a 100644
> --- a/arch/um/include/asm/mmu_context.h
> +++ b/arch/um/include/asm/mmu_context.h
> @@ -23,7 +23,8 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
> extern void arch_exit_mmap(struct mm_struct *mm);
> static inline void arch_unmap(struct mm_struct *mm,
> struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> }
> static inline void arch_bprm_mm_init(struct mm_struct *mm,
> diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
> index b2b02df9896e..e632a14e896e 100644
> --- a/arch/um/kernel/trap.c
> +++ b/arch/um/kernel/trap.c
> @@ -33,6 +33,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
> pte_t *pte;
> int err = -EFAULT;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> *code_out = SEGV_MAPERR;
>
> @@ -74,7 +75,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
> do {
> int fault;
>
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> goto out_nosemaphore;
> diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
> index bbefcc46a45e..dd35b6191798 100644
> --- a/arch/unicore32/mm/fault.c
> +++ b/arch/unicore32/mm/fault.c
> @@ -168,7 +168,8 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
> }
>
> static int __do_pf(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
> - unsigned int flags, struct task_struct *tsk)
> + unsigned int flags, struct task_struct *tsk,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *vma;
> int fault;
> @@ -194,7 +195,7 @@ static int __do_pf(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
> * If for any reason at all we couldn't handle the fault, make
> * sure we exit gracefully rather than endlessly redo the fault.
> */
> - fault = handle_mm_fault(vma, addr & PAGE_MASK, flags);
> + fault = handle_mm_fault(vma, addr & PAGE_MASK, flags, mmrange);
> return fault;
>
> check_stack:
> @@ -210,6 +211,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> struct mm_struct *mm;
> int fault, sig, code;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> tsk = current;
> mm = tsk->mm;
> @@ -251,7 +253,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> #endif
> }
>
> - fault = __do_pf(mm, addr, fsr, flags, tsk);
> + fault = __do_pf(mm, addr, fsr, flags, tsk, &mmrange);
>
> /* If we need to retry but a fatal signal is pending, handle the
> * signal first. We do not need to release the mmap_sem because
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index 5b8b556dbb12..2e0bdf6a3aaf 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -155,6 +155,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
> struct vm_area_struct *vma;
> unsigned long text_start;
> int ret = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (down_write_killable(&mm->mmap_sem))
> return -EINTR;
> @@ -192,7 +193,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
>
> if (IS_ERR(vma)) {
> ret = PTR_ERR(vma);
> - do_munmap(mm, text_start, image->size, NULL);
> + do_munmap(mm, text_start, image->size, NULL, &mmrange);
> } else {
> current->mm->context.vdso = (void __user *)text_start;
> current->mm->context.vdso_image = image;
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index c931b88982a0..31fb02ed4770 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -263,7 +263,8 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
> }
>
> static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> /*
> * mpx_notify_unmap() goes and reads a rarely-hot
> @@ -283,7 +284,7 @@ static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
> * consistently wrong.
> */
> if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
> - mpx_notify_unmap(mm, vma, start, end);
> + mpx_notify_unmap(mm, vma, start, end, mmrange);
> }
>
> #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
> index 61eb4b63c5ec..c26099224a17 100644
> --- a/arch/x86/include/asm/mpx.h
> +++ b/arch/x86/include/asm/mpx.h
> @@ -73,7 +73,8 @@ static inline void mpx_mm_init(struct mm_struct *mm)
> mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
> }
> void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long start, unsigned long end);
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange);
>
> unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
> unsigned long flags);
> @@ -95,7 +96,8 @@ static inline void mpx_mm_init(struct mm_struct *mm)
> }
> static inline void mpx_notify_unmap(struct mm_struct *mm,
> struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> }
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 800de815519c..93f1b8d4c88e 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1244,6 +1244,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
> int fault, major = 0;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> u32 pkey;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> tsk = current;
> mm = tsk->mm;
> @@ -1423,7 +1424,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
> * fault, so we read the pkey beforehand.
> */
> pkey = vma_pkey(vma);
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
> major |= fault & VM_FAULT_MAJOR;
>
> /*
> diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> index e500949bae24..51c3e1f7e6be 100644
> --- a/arch/x86/mm/mpx.c
> +++ b/arch/x86/mm/mpx.c
> @@ -47,6 +47,7 @@ static unsigned long mpx_mmap(unsigned long len)
> {
> struct mm_struct *mm = current->mm;
> unsigned long addr, populate;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* Only bounds table can be allocated here */
> if (len != mpx_bt_size_bytes(mm))
> @@ -54,7 +55,8 @@ static unsigned long mpx_mmap(unsigned long len)
>
> down_write(&mm->mmap_sem);
> addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE,
> - MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate, NULL);
> + MAP_ANONYMOUS | MAP_PRIVATE, VM_MPX, 0, &populate, NULL,
> + &mmrange);
> up_write(&mm->mmap_sem);
> if (populate)
> mm_populate(addr, populate);
> @@ -427,13 +429,15 @@ int mpx_handle_bd_fault(void)
> * A thin wrapper around get_user_pages(). Returns 0 if the
> * fault was resolved or -errno if not.
> */
> -static int mpx_resolve_fault(long __user *addr, int write)
> +static int mpx_resolve_fault(long __user *addr, int write,
> + struct range_lock *mmrange)
> {
> long gup_ret;
> int nr_pages = 1;
>
> gup_ret = get_user_pages((unsigned long)addr, nr_pages,
> - write ? FOLL_WRITE : 0, NULL, NULL);
> + write ? FOLL_WRITE : 0, NULL, NULL,
> + mmrange);
> /*
> * get_user_pages() returns number of pages gotten.
> * 0 means we failed to fault in and get anything,
> @@ -500,7 +504,8 @@ static int get_user_bd_entry(struct mm_struct *mm, unsigned long *bd_entry_ret,
> */
> static int get_bt_addr(struct mm_struct *mm,
> long __user *bd_entry_ptr,
> - unsigned long *bt_addr_result)
> + unsigned long *bt_addr_result,
> + struct range_lock *mmrange)
> {
> int ret;
> int valid_bit;
> @@ -519,7 +524,8 @@ static int get_bt_addr(struct mm_struct *mm,
> if (!ret)
> break;
> if (ret == -EFAULT)
> - ret = mpx_resolve_fault(bd_entry_ptr, need_write);
> + ret = mpx_resolve_fault(bd_entry_ptr,
> + need_write, mmrange);
> /*
> * If we could not resolve the fault, consider it
> * userspace's fault and error out.
> @@ -730,7 +736,8 @@ static unsigned long mpx_get_bd_entry_offset(struct mm_struct *mm,
> }
>
> static int unmap_entire_bt(struct mm_struct *mm,
> - long __user *bd_entry, unsigned long bt_addr)
> + long __user *bd_entry, unsigned long bt_addr,
> + struct range_lock *mmrange)
> {
> unsigned long expected_old_val = bt_addr | MPX_BD_ENTRY_VALID_FLAG;
> unsigned long uninitialized_var(actual_old_val);
> @@ -747,7 +754,7 @@ static int unmap_entire_bt(struct mm_struct *mm,
> if (!ret)
> break;
> if (ret == -EFAULT)
> - ret = mpx_resolve_fault(bd_entry, need_write);
> + ret = mpx_resolve_fault(bd_entry, need_write, mmrange);
> /*
> * If we could not resolve the fault, consider it
> * userspace's fault and error out.
> @@ -780,11 +787,12 @@ static int unmap_entire_bt(struct mm_struct *mm,
> * avoid recursion, do_munmap() will check whether it comes
> * from one bounds table through VM_MPX flag.
> */
> - return do_munmap(mm, bt_addr, mpx_bt_size_bytes(mm), NULL);
> + return do_munmap(mm, bt_addr, mpx_bt_size_bytes(mm), NULL, mmrange);
> }
>
> static int try_unmap_single_bt(struct mm_struct *mm,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *next;
> struct vm_area_struct *prev;
> @@ -835,7 +843,7 @@ static int try_unmap_single_bt(struct mm_struct *mm,
> }
>
> bde_vaddr = mm->context.bd_addr + mpx_get_bd_entry_offset(mm, start);
> - ret = get_bt_addr(mm, bde_vaddr, &bt_addr);
> + ret = get_bt_addr(mm, bde_vaddr, &bt_addr, mmrange);
> /*
> * No bounds table there, so nothing to unmap.
> */
> @@ -853,12 +861,13 @@ static int try_unmap_single_bt(struct mm_struct *mm,
> */
> if ((start == bta_start_vaddr) &&
> (end == bta_end_vaddr))
> - return unmap_entire_bt(mm, bde_vaddr, bt_addr);
> + return unmap_entire_bt(mm, bde_vaddr, bt_addr, mmrange);
> return zap_bt_entries_mapping(mm, bt_addr, start, end);
> }
>
> static int mpx_unmap_tables(struct mm_struct *mm,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> unsigned long one_unmap_start;
> trace_mpx_unmap_search(start, end);
> @@ -876,7 +885,8 @@ static int mpx_unmap_tables(struct mm_struct *mm,
> */
> if (one_unmap_end > next_unmap_start)
> one_unmap_end = next_unmap_start;
> - ret = try_unmap_single_bt(mm, one_unmap_start, one_unmap_end);
> + ret = try_unmap_single_bt(mm, one_unmap_start, one_unmap_end,
> + mmrange);
> if (ret)
> return ret;
>
> @@ -894,7 +904,8 @@ static int mpx_unmap_tables(struct mm_struct *mm,
> * necessary, and the 'vma' is the first vma in this range (start -> end).
> */
> void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> int ret;
>
> @@ -920,7 +931,7 @@ void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
> vma = vma->vm_next;
> } while (vma && vma->vm_start < end);
>
> - ret = mpx_unmap_tables(mm, start, end);
> + ret = mpx_unmap_tables(mm, start, end, mmrange);
> if (ret)
> force_sig(SIGSEGV, current);
> }
> diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
> index 8b9b6f44bb06..6f8e3e7cccb5 100644
> --- a/arch/xtensa/mm/fault.c
> +++ b/arch/xtensa/mm/fault.c
> @@ -44,6 +44,7 @@ void do_page_fault(struct pt_regs *regs)
> int is_write, is_exec;
> int fault;
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> info.si_code = SEGV_MAPERR;
>
> @@ -108,7 +109,7 @@ void do_page_fault(struct pt_regs *regs)
> * make sure we exit gracefully rather than endlessly redo
> * the fault.
> */
> - fault = handle_mm_fault(vma, address, flags);
> + fault = handle_mm_fault(vma, address, flags, &mmrange);
>
> if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> return;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e4bb435e614b..bd464a599341 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -691,6 +691,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
> unsigned int flags = 0;
> unsigned pinned = 0;
> int r;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (!(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY))
> flags |= FOLL_WRITE;
> @@ -721,7 +722,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
> list_add(&guptask.list, &gtt->guptasks);
> spin_unlock(&gtt->guptasklock);
>
> - r = get_user_pages(userptr, num_pages, flags, p, NULL);
> + r = get_user_pages(userptr, num_pages, flags, p, NULL, &mmrange);
>
> spin_lock(&gtt->guptasklock);
> list_del(&guptask.list);
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 382a77a1097e..881bcc7d663a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -512,6 +512,8 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
>
> ret = -EFAULT;
> if (mmget_not_zero(mm)) {
> + DEFINE_RANGE_LOCK_FULL(mmrange);
> +
> down_read(&mm->mmap_sem);
> while (pinned < npages) {
> ret = get_user_pages_remote
> @@ -519,7 +521,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
> obj->userptr.ptr + pinned * PAGE_SIZE,
> npages - pinned,
> flags,
> - pvec + pinned, NULL, NULL);
> + pvec + pinned, NULL, NULL, &mmrange);
> if (ret < 0)
> break;
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index a0a839bc39bf..9fc3a4f86945 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -545,6 +545,8 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
> struct radeon_ttm_tt *gtt = (void *)ttm;
> unsigned pinned = 0, nents;
> int r;
> + // XXX: this is wrong!!
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> int write = !(gtt->userflags & RADEON_GEM_USERPTR_READONLY);
> enum dma_data_direction direction = write ?
> @@ -569,7 +571,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
> struct page **pages = ttm->pages + pinned;
>
> r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
> - pages, NULL);
> + pages, NULL, &mmrange);
> if (r < 0)
> goto release_pages;
>
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 9a4e899d94b3..fd9601ed5b84 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -96,6 +96,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> struct scatterlist *sg, *sg_list_start;
> int need_release = 0;
> unsigned int gup_flags = FOLL_WRITE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (dmasync)
> dma_attrs |= DMA_ATTR_WRITE_BARRIER;
> @@ -194,7 +195,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> ret = get_user_pages_longterm(cur_base,
> min_t(unsigned long, npages,
> PAGE_SIZE / sizeof (struct page *)),
> - gup_flags, page_list, vma_list);
> + gup_flags, page_list, vma_list, &mmrange);
>
> if (ret < 0)
> goto out;
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index 2aadf5813a40..0572953260e8 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -632,6 +632,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
> int j, k, ret = 0, start_idx, npages = 0, page_shift;
> unsigned int flags = 0;
> phys_addr_t p = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (access_mask == 0)
> return -EINVAL;
> @@ -683,7 +684,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
> */
> npages = get_user_pages_remote(owning_process, owning_mm,
> user_virt, gup_num_pages,
> - flags, local_page_list, NULL, NULL);
> + flags, local_page_list, NULL, NULL, &mmrange);
> up_read(&owning_mm->mmap_sem);
>
> if (npages < 0)
> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> index ce83ba9a12ef..6bcb4f9f9b30 100644
> --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> @@ -53,7 +53,7 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
> * Call with current->mm->mmap_sem held.
> */
> static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
> - struct page **p)
> + struct page **p, struct range_lock *mmrange)
> {
> unsigned long lock_limit;
> size_t got;
> @@ -70,7 +70,7 @@ static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
> ret = get_user_pages(start_page + got * PAGE_SIZE,
> num_pages - got,
> FOLL_WRITE | FOLL_FORCE,
> - p + got, NULL);
> + p + got, NULL, mmrange);
> if (ret < 0)
> goto bail_release;
> }
> @@ -134,10 +134,11 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> struct page **p)
> {
> int ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_write(&current->mm->mmap_sem);
>
> - ret = __qib_get_user_pages(start_page, num_pages, p);
> + ret = __qib_get_user_pages(start_page, num_pages, p, &mmrange);
>
> up_write(&current->mm->mmap_sem);
>
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index 4381c0a9a873..5f36c6d2e21b 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -113,6 +113,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
> int flags;
> dma_addr_t pa;
> unsigned int gup_flags;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (!can_do_mlock())
> return -EPERM;
> @@ -146,7 +147,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
> ret = get_user_pages(cur_base,
> min_t(unsigned long, npages,
> PAGE_SIZE / sizeof(struct page *)),
> - gup_flags, page_list, NULL);
> + gup_flags, page_list, NULL, &mmrange);
>
> if (ret < 0)
> goto out;
> diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
> index 1d0b53a04a08..15a7103fd84c 100644
> --- a/drivers/iommu/amd_iommu_v2.c
> +++ b/drivers/iommu/amd_iommu_v2.c
> @@ -512,6 +512,7 @@ static void do_fault(struct work_struct *work)
> unsigned int flags = 0;
> struct mm_struct *mm;
> u64 address;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> mm = fault->state->mm;
> address = fault->address;
> @@ -523,7 +524,7 @@ static void do_fault(struct work_struct *work)
> flags |= FAULT_FLAG_REMOTE;
>
> down_read(&mm->mmap_sem);
> - vma = find_extend_vma(mm, address);
> + vma = find_extend_vma(mm, address, &mmrange);
> if (!vma || address < vma->vm_start)
> /* failed to get a vma in the right range */
> goto out;
> @@ -532,7 +533,7 @@ static void do_fault(struct work_struct *work)
> if (access_error(vma, fault))
> goto out;
>
> - ret = handle_mm_fault(vma, address, flags);
> + ret = handle_mm_fault(vma, address, flags, &mmrange);
> out:
> up_read(&mm->mmap_sem);
>
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 35a408d0ae4f..6a74386ee83f 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -585,6 +585,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> struct intel_iommu *iommu = d;
> struct intel_svm *svm = NULL;
> int head, tail, handled = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* Clear PPR bit before reading head/tail registers, to
> * ensure that we get a new interrupt if needed. */
> @@ -643,7 +644,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> goto bad_req;
>
> down_read(&svm->mm->mmap_sem);
> - vma = find_extend_vma(svm->mm, address);
> + vma = find_extend_vma(svm->mm, address, &mmrange);
> if (!vma || address < vma->vm_start)
> goto invalid;
>
> @@ -651,7 +652,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> goto invalid;
>
> ret = handle_mm_fault(vma, address,
> - req->wr_req ? FAULT_FLAG_WRITE : 0);
> + req->wr_req ? FAULT_FLAG_WRITE : 0, &mmrange);
> if (ret & VM_FAULT_ERROR)
> goto invalid;
>
> diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
> index f412429cf5ba..64a4cd62eeb3 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-sg.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
> @@ -152,7 +152,8 @@ static void videobuf_dma_init(struct videobuf_dmabuf *dma)
> }
>
> static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
> - int direction, unsigned long data, unsigned long size)
> + int direction, unsigned long data, unsigned long size,
> + struct range_lock *mmrange)
> {
> unsigned long first, last;
> int err, rw = 0;
> @@ -186,7 +187,7 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
> data, size, dma->nr_pages);
>
> err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
> - flags, dma->pages, NULL);
> + flags, dma->pages, NULL, mmrange);
>
> if (err != dma->nr_pages) {
> dma->nr_pages = (err >= 0) ? err : 0;
> @@ -201,9 +202,10 @@ static int videobuf_dma_init_user(struct videobuf_dmabuf *dma, int direction,
> unsigned long data, unsigned long size)
> {
> int ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_read(&current->mm->mmap_sem);
> - ret = videobuf_dma_init_user_locked(dma, direction, data, size);
> + ret = videobuf_dma_init_user_locked(dma, direction, data, size, &mmrange);
> up_read(&current->mm->mmap_sem);
>
> return ret;
> @@ -539,9 +541,14 @@ static int __videobuf_iolock(struct videobuf_queue *q,
> we take current->mm->mmap_sem there, to prevent
> locking inversion, so don't take it here */
>
> + /* XXX: can we use a local mmrange here? */
> + DEFINE_RANGE_LOCK_FULL(mmrange);
> +
> err = videobuf_dma_init_user_locked(&mem->dma,
> - DMA_FROM_DEVICE,
> - vb->baddr, vb->bsize);
> + DMA_FROM_DEVICE,
> + vb->baddr,
> + vb->bsize,
> + &mmrange);
> if (0 != err)
> return err;
> }
> @@ -555,6 +562,7 @@ static int __videobuf_iolock(struct videobuf_queue *q,
> * building for PAE. Compiler doesn't like direct casting
> * of a 32 bit ptr to 64 bit integer.
> */
> +
> bus = (dma_addr_t)(unsigned long)fbuf->base + vb->boff;
> pages = PAGE_ALIGN(vb->size) >> PAGE_SHIFT;
> err = videobuf_dma_init_overlay(&mem->dma, DMA_FROM_DEVICE,
> diff --git a/drivers/misc/mic/scif/scif_rma.c b/drivers/misc/mic/scif/scif_rma.c
> index c824329f7012..6ecac843e5f3 100644
> --- a/drivers/misc/mic/scif/scif_rma.c
> +++ b/drivers/misc/mic/scif/scif_rma.c
> @@ -1332,6 +1332,7 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
> int prot = *out_prot;
> int ulimit = 0;
> struct mm_struct *mm = NULL;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* Unsupported flags */
> if (map_flags & ~(SCIF_MAP_KERNEL | SCIF_MAP_ULIMIT))
> @@ -1400,7 +1401,7 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
> nr_pages,
> (prot & SCIF_PROT_WRITE) ? FOLL_WRITE : 0,
> pinned_pages->pages,
> - NULL);
> + NULL, &mmrange);
> up_write(&mm->mmap_sem);
> if (nr_pages != pinned_pages->nr_pages) {
> if (try_upgrade) {
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index 93be82fc338a..b35d60bb2197 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -189,7 +189,8 @@ static void get_clear_fault_map(struct gru_state *gru,
> */
> static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> unsigned long vaddr, int write,
> - unsigned long *paddr, int *pageshift)
> + unsigned long *paddr, int *pageshift,
> + struct range_lock *mmrange)
> {
> struct page *page;
>
> @@ -198,7 +199,8 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> #else
> *pageshift = PAGE_SHIFT;
> #endif
> - if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> + if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0,
> + &page, NULL, mmrange) <= 0)

There is no need to pass the range down here, since the underlying
__get_user_pages_locked() is told not to unlock the mmap_sem (it is
called with a NULL 'locked' argument). In general get_user_pages()
doesn't need a range parameter at all.
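
Just to illustrate (a rough sketch only, reusing DEFINE_RANGE_LOCK_FULL
and the __get_user_pages_locked() prototype from this series), the
current get_user_pages() signature could be kept by taking a local full
range inside the helper itself:

	long get_user_pages(unsigned long start, unsigned long nr_pages,
			    unsigned int gup_flags, struct page **pages,
			    struct vm_area_struct **vmas)
	{
		/*
		 * locked == NULL below, so the mmap_sem is never dropped
		 * on our behalf and the range never escapes this function.
		 */
		DEFINE_RANGE_LOCK_FULL(mmrange);

		return __get_user_pages_locked(current, current->mm, start,
					       nr_pages, pages, vmas, NULL,
					       gup_flags | FOLL_TOUCH,
					       &mmrange);
	}

That would leave all the get_user_pages() callers untouched.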

> return -EFAULT;
> *paddr = page_to_phys(page);
> put_page(page);
> @@ -263,7 +265,8 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
> }
>
> static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> - int write, int atomic, unsigned long *gpa, int *pageshift)
> + int write, int atomic, unsigned long *gpa, int *pageshift,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = gts->ts_mm;
> struct vm_area_struct *vma;
> @@ -283,7 +286,8 @@ static int gru_vtop(struct gru_thread_state *gts, unsigned long vaddr,
> if (ret) {
> if (atomic)
> goto upm;
> - if (non_atomic_pte_lookup(vma, vaddr, write, &paddr, &ps))
> + if (non_atomic_pte_lookup(vma, vaddr, write, &paddr,
> + &ps, mmrange))
> goto inval;
> }
> if (is_gru_paddr(paddr))
> @@ -324,7 +328,8 @@ static void gru_preload_tlb(struct gru_state *gru,
> unsigned long fault_vaddr, int asid, int write,
> unsigned char tlb_preload_count,
> struct gru_tlb_fault_handle *tfh,
> - struct gru_control_block_extended *cbe)
> + struct gru_control_block_extended *cbe,
> + struct range_lock *mmrange)
> {
> unsigned long vaddr = 0, gpa;
> int ret, pageshift;
> @@ -342,7 +347,7 @@ static void gru_preload_tlb(struct gru_state *gru,
> vaddr = min(vaddr, fault_vaddr + tlb_preload_count * PAGE_SIZE);
>
> while (vaddr > fault_vaddr) {
> - ret = gru_vtop(gts, vaddr, write, atomic, &gpa, &pageshift);
> + ret = gru_vtop(gts, vaddr, write, atomic, &gpa, &pageshift, mmrange);
> if (ret || tfh_write_only(tfh, gpa, GAA_RAM, vaddr, asid, write,
> GRU_PAGESIZE(pageshift)))
> return;
> @@ -368,7 +373,8 @@ static void gru_preload_tlb(struct gru_state *gru,
> static int gru_try_dropin(struct gru_state *gru,
> struct gru_thread_state *gts,
> struct gru_tlb_fault_handle *tfh,
> - struct gru_instruction_bits *cbk)
> + struct gru_instruction_bits *cbk,
> + struct range_lock *mmrange)
> {
> struct gru_control_block_extended *cbe = NULL;
> unsigned char tlb_preload_count = gts->ts_tlb_preload_count;
> @@ -423,7 +429,7 @@ static int gru_try_dropin(struct gru_state *gru,
> if (atomic_read(&gts->ts_gms->ms_range_active))
> goto failactive;
>
> - ret = gru_vtop(gts, vaddr, write, atomic, &gpa, &pageshift);
> + ret = gru_vtop(gts, vaddr, write, atomic, &gpa, &pageshift, mmrange);
> if (ret == VTOP_INVALID)
> goto failinval;
> if (ret == VTOP_RETRY)
> @@ -438,7 +444,8 @@ static int gru_try_dropin(struct gru_state *gru,
> }
>
> if (unlikely(cbe) && pageshift == PAGE_SHIFT) {
> - gru_preload_tlb(gru, gts, atomic, vaddr, asid, write, tlb_preload_count, tfh, cbe);
> + gru_preload_tlb(gru, gts, atomic, vaddr, asid, write,
> + tlb_preload_count, tfh, cbe, mmrange);
> gru_flush_cache_cbe(cbe);
> }
>
> @@ -587,10 +594,13 @@ static irqreturn_t gru_intr(int chiplet, int blade)
> * If it fails, retry the fault in user context.
> */
> gts->ustats.fmm_tlbmiss++;
> - if (!gts->ts_force_cch_reload &&
> - down_read_trylock(&gts->ts_mm->mmap_sem)) {
> - gru_try_dropin(gru, gts, tfh, NULL);
> - up_read(&gts->ts_mm->mmap_sem);
> + if (!gts->ts_force_cch_reload) {
> + DEFINE_RANGE_LOCK_FULL(mmrange);
> +
> + if (down_read_trylock(&gts->ts_mm->mmap_sem)) {
> + gru_try_dropin(gru, gts, tfh, NULL, &mmrange);
> + up_read(&gts->ts_mm->mmap_sem);
> + }
> } else {
> tfh_user_polling_mode(tfh);
> STAT(intr_mm_lock_failed);
> @@ -625,7 +635,7 @@ irqreturn_t gru_intr_mblade(int irq, void *dev_id)
>
> static int gru_user_dropin(struct gru_thread_state *gts,
> struct gru_tlb_fault_handle *tfh,
> - void *cb)
> + void *cb, struct range_lock *mmrange)
> {
> struct gru_mm_struct *gms = gts->ts_gms;
> int ret;
> @@ -635,7 +645,7 @@ static int gru_user_dropin(struct gru_thread_state *gts,
> wait_event(gms->ms_wait_queue,
> atomic_read(&gms->ms_range_active) == 0);
> prefetchw(tfh); /* Helps on hdw, required for emulator */
> - ret = gru_try_dropin(gts->ts_gru, gts, tfh, cb);
> + ret = gru_try_dropin(gts->ts_gru, gts, tfh, cb, mmrange);
> if (ret <= 0)
> return ret;
> STAT(call_os_wait_queue);
> @@ -653,6 +663,7 @@ int gru_handle_user_call_os(unsigned long cb)
> struct gru_thread_state *gts;
> void *cbk;
> int ucbnum, cbrnum, ret = -EINVAL;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> STAT(call_os);
>
> @@ -685,7 +696,7 @@ int gru_handle_user_call_os(unsigned long cb)
> tfh = get_tfh_by_index(gts->ts_gru, cbrnum);
> cbk = get_gseg_base_address_cb(gts->ts_gru->gs_gru_base_vaddr,
> gts->ts_ctxnum, ucbnum);
> - ret = gru_user_dropin(gts, tfh, cbk);
> + ret = gru_user_dropin(gts, tfh, cbk, &mmrange);
> }
> exit:
> gru_unlock_gts(gts);
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index e30e29ae4819..1b3b103da637 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -345,13 +345,14 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> page);
> } else {
> unsigned int flags = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (prot & IOMMU_WRITE)
> flags |= FOLL_WRITE;
>
> down_read(&mm->mmap_sem);
> ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> - NULL, NULL);
> + NULL, NULL, &mmrange);
> up_read(&mm->mmap_sem);
> }
>
> diff --git a/fs/aio.c b/fs/aio.c
> index a062d75109cb..31774b75c372 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -457,6 +457,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
> int nr_pages;
> int i;
> struct file *file;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* Compensate for the ring buffer's head/tail overlap entry */
> nr_events += 2; /* 1 is required, 2 for good luck */
> @@ -519,7 +520,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
>
> ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size,
> PROT_READ | PROT_WRITE,
> - MAP_SHARED, 0, &unused, NULL);
> + MAP_SHARED, 0, &unused, NULL, &mmrange);
> up_write(&mm->mmap_sem);
> if (IS_ERR((void *)ctx->mmap_base)) {
> ctx->mmap_size = 0;
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 2f492dfcabde..9aea808d55d7 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -180,6 +180,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
> int ei_index = 0;
> const struct cred *cred = current_cred();
> struct vm_area_struct *vma;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /*
> * In some cases (e.g. Hyper-Threading), we want to avoid L1
> @@ -300,7 +301,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
> * Grow the stack manually; some architectures have a limit on how
> * far ahead a user-space access may be in order to grow the stack.
> */
> - vma = find_extend_vma(current->mm, bprm->p);
> + vma = find_extend_vma(current->mm, bprm->p, &mmrange);
> if (!vma)
> return -EFAULT;
>
> diff --git a/fs/exec.c b/fs/exec.c
> index e7b69e14649f..e46752874b47 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -197,6 +197,11 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> struct page *page;
> int ret;
> unsigned int gup_flags = FOLL_FORCE;
> + /*
> + * No concurrency for the bprm->mm yet -- this is exec path;
> + * but gup needs an mmrange.
> + */
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> #ifdef CONFIG_STACK_GROWSUP
> if (write) {
> @@ -214,7 +219,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> * doing the exec and bprm->mm is the new process's mm.
> */
> ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
> - &page, NULL, NULL);
> + &page, NULL, NULL, &mmrange);
> if (ret <= 0)
> return NULL;
>
> @@ -615,7 +620,8 @@ EXPORT_SYMBOL(copy_strings_kernel);
> * 4) Free up any cleared pgd range.
> * 5) Shrink the vma to cover only the new range.
> */
> -static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> +static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = vma->vm_mm;
> unsigned long old_start = vma->vm_start;
> @@ -637,7 +643,8 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> /*
> * cover the whole range: [new_start, old_end)
> */
> - if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL))
> + if (vma_adjust(vma, new_start, old_end, vma->vm_pgoff, NULL,
> + mmrange))
> return -ENOMEM;
>
> /*
> @@ -671,7 +678,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
> /*
> * Shrink the vma to just the new range. Always succeeds.
> */
> - vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL);
> + vma_adjust(vma, new_start, new_end, vma->vm_pgoff, NULL, mmrange);
>
> return 0;
> }
> @@ -694,6 +701,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
> unsigned long stack_size;
> unsigned long stack_expand;
> unsigned long rlim_stack;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> #ifdef CONFIG_STACK_GROWSUP
> /* Limit stack size */
> @@ -749,14 +757,14 @@ int setup_arg_pages(struct linux_binprm *bprm,
> vm_flags |= VM_STACK_INCOMPLETE_SETUP;
>
> ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
> - vm_flags);
> + vm_flags, &mmrange);
> if (ret)
> goto out_unlock;
> BUG_ON(prev != vma);
>
> /* Move stack pages down in memory. */
> if (stack_shift) {
> - ret = shift_arg_pages(vma, stack_shift);
> + ret = shift_arg_pages(vma, stack_shift, &mmrange);
> if (ret)
> goto out_unlock;
> }
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index d697c8ab0a14..791f9f93643c 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -16,6 +16,7 @@
> #include <linux/binfmts.h>
> #include <linux/sched/coredump.h>
> #include <linux/sched/task.h>
> +#include <linux/range_lock.h>
>
> struct ctl_table_header;
> struct mempolicy;
> @@ -263,6 +264,8 @@ struct proc_maps_private {
> #ifdef CONFIG_NUMA
> struct mempolicy *task_mempolicy;
> #endif
> + /* mmap_sem is held across all stages of seqfile */
> + struct range_lock mmrange;
> } __randomize_layout;
>
> struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index b66fc8de7d34..7c0a79a937b5 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -174,6 +174,7 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
> if (!mm || !mmget_not_zero(mm))
> return NULL;
>
> + range_lock_init_full(&priv->mmrange);
> down_read(&mm->mmap_sem);
> hold_task_mempolicy(priv);
> priv->tail_vma = get_gate_vma(mm);
> @@ -514,7 +515,7 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
>
> #ifdef CONFIG_SHMEM
> static int smaps_pte_hole(unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> struct mem_size_stats *mss = walk->private;
>
> @@ -605,7 +606,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
> #endif
>
> static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> struct vm_area_struct *vma = walk->vma;
> pte_t *pte;
> @@ -797,7 +798,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
> #endif
>
> /* mmap_sem is held in m_start */
> - walk_page_vma(vma, &smaps_walk);
> + walk_page_vma(vma, &smaps_walk, &priv->mmrange);
> if (vma->vm_flags & VM_LOCKED)
> mss->pss_locked += mss->pss;
>
> @@ -1012,7 +1013,8 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
> #endif
>
> static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
> - unsigned long end, struct mm_walk *walk)
> + unsigned long end, struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> struct clear_refs_private *cp = walk->private;
> struct vm_area_struct *vma = walk->vma;
> @@ -1103,6 +1105,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> struct mmu_gather tlb;
> int itype;
> int rv;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> memset(buffer, 0, sizeof(buffer));
> if (count > sizeof(buffer) - 1)
> @@ -1166,7 +1169,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> }
> mmu_notifier_invalidate_range_start(mm, 0, -1);
> }
> - walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
> + walk_page_range(0, mm->highest_vm_end, &clear_refs_walk,
> + &mmrange);
> if (type == CLEAR_REFS_SOFT_DIRTY)
> mmu_notifier_invalidate_range_end(mm, 0, -1);
> tlb_finish_mmu(&tlb, 0, -1);
> @@ -1223,7 +1227,7 @@ static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
> }
>
> static int pagemap_pte_hole(unsigned long start, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> struct pagemapread *pm = walk->private;
> unsigned long addr = start;
> @@ -1301,7 +1305,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
> }
>
> static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> struct vm_area_struct *vma = walk->vma;
> struct pagemapread *pm = walk->private;
> @@ -1467,6 +1471,8 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
> unsigned long start_vaddr;
> unsigned long end_vaddr;
> int ret = 0, copied = 0;
> + DEFINE_RANGE_LOCK_FULL(tmprange);
> + struct range_lock *mmrange = &tmprange;
>
> if (!mm || !mmget_not_zero(mm))
> goto out;
> @@ -1523,7 +1529,8 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
> if (end < start_vaddr || end > end_vaddr)
> end = end_vaddr;
> down_read(&mm->mmap_sem);
> - ret = walk_page_range(start_vaddr, end, &pagemap_walk);
> + ret = walk_page_range(start_vaddr, end, &pagemap_walk,
> + mmrange);
> up_read(&mm->mmap_sem);
> start_vaddr = end;
>
> @@ -1671,7 +1678,8 @@ static struct page *can_gather_numa_stats_pmd(pmd_t pmd,
> #endif
>
> static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
> - unsigned long end, struct mm_walk *walk)
> + unsigned long end, struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> struct numa_maps *md = walk->private;
> struct vm_area_struct *vma = walk->vma;
> @@ -1740,6 +1748,7 @@ static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
> */
> static int show_numa_map(struct seq_file *m, void *v, int is_pid)
> {
> + struct proc_maps_private *priv = m->private;
> struct numa_maps_private *numa_priv = m->private;
> struct proc_maps_private *proc_priv = &numa_priv->proc_maps;
> struct vm_area_struct *vma = v;
> @@ -1785,7 +1794,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
> seq_puts(m, " huge");
>
> /* mmap_sem is held by m_start */
> - walk_page_vma(vma, &walk);
> + walk_page_vma(vma, &walk, &priv->mmrange);
>
> if (!md->pages)
> goto out;
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index a45f0af22a60..3768955c10bc 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -350,6 +350,11 @@ static int remap_oldmem_pfn_checked(struct vm_area_struct *vma,
> unsigned long pos_start, pos_end, pos;
> unsigned long zeropage_pfn = my_zero_pfn(0);
> size_t len = 0;
> + /*
> + * No concurrency for the bprm->mm yet -- this is a vmcore path,
> + * but do_munmap() needs an mmrange.
> + */
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> pos_start = pfn;
> pos_end = pfn + (size >> PAGE_SHIFT);
> @@ -388,7 +393,7 @@ static int remap_oldmem_pfn_checked(struct vm_area_struct *vma,
> }
> return 0;
> fail:
> - do_munmap(vma->vm_mm, from, len, NULL);
> + do_munmap(vma->vm_mm, from, len, NULL, &mmrange);
> return -EAGAIN;
> }
>
> @@ -411,6 +416,11 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
> size_t size = vma->vm_end - vma->vm_start;
> u64 start, end, len, tsz;
> struct vmcore *m;
> + /*
> + * No concurrency for the bprm->mm yet -- this is a vmcore path,
> + * but do_munmap() needs an mmrange.
> + */
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> start = (u64)vma->vm_pgoff << PAGE_SHIFT;
> end = start + size;
> @@ -481,7 +491,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
>
> return 0;
> fail:
> - do_munmap(vma->vm_mm, vma->vm_start, len, NULL);
> + do_munmap(vma->vm_mm, vma->vm_start, len, NULL, &mmrange);
> return -EAGAIN;
> }
> #else
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 87a13a7c8270..e3089865fd52 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -851,6 +851,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
> /* len == 0 means wake all */
> struct userfaultfd_wake_range range = { .len = 0, };
> unsigned long new_flags;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> WRITE_ONCE(ctx->released, true);
>
> @@ -880,7 +881,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
> new_flags, vma->anon_vma,
> vma->vm_file, vma->vm_pgoff,
> vma_policy(vma),
> - NULL_VM_UFFD_CTX);
> + NULL_VM_UFFD_CTX, &mmrange);
> if (prev)
> vma = prev;
> else
> @@ -1276,6 +1277,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> bool found;
> bool basic_ioctls;
> unsigned long start, end, vma_end;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> user_uffdio_register = (struct uffdio_register __user *) arg;
>
> @@ -1413,18 +1415,19 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> prev = vma_merge(mm, prev, start, vma_end, new_flags,
> vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> vma_policy(vma),
> - ((struct vm_userfaultfd_ctx){ ctx }));
> + ((struct vm_userfaultfd_ctx){ ctx }),
> + &mmrange);
> if (prev) {
> vma = prev;
> goto next;
> }
> if (vma->vm_start < start) {
> - ret = split_vma(mm, vma, start, 1);
> + ret = split_vma(mm, vma, start, 1, &mmrange);
> if (ret)
> break;
> }
> if (vma->vm_end > end) {
> - ret = split_vma(mm, vma, end, 0);
> + ret = split_vma(mm, vma, end, 0, &mmrange);
> if (ret)
> break;
> }
> @@ -1471,6 +1474,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> bool found;
> unsigned long start, end, vma_end;
> const void __user *buf = (void __user *)arg;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> ret = -EFAULT;
> if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
> @@ -1571,18 +1575,18 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> prev = vma_merge(mm, prev, start, vma_end, new_flags,
> vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> vma_policy(vma),
> - NULL_VM_UFFD_CTX);
> + NULL_VM_UFFD_CTX, &mmrange);
> if (prev) {
> vma = prev;
> goto next;
> }
> if (vma->vm_start < start) {
> - ret = split_vma(mm, vma, start, 1);
> + ret = split_vma(mm, vma, start, 1, &mmrange);
> if (ret)
> break;
> }
> if (vma->vm_end > end) {
> - ret = split_vma(mm, vma, end, 0);
> + ret = split_vma(mm, vma, end, 0, &mmrange);
> if (ret)
> break;
> }
> diff --git a/include/asm-generic/mm_hooks.h b/include/asm-generic/mm_hooks.h
> index 8ac4e68a12f0..2115deceded1 100644
> --- a/include/asm-generic/mm_hooks.h
> +++ b/include/asm-generic/mm_hooks.h
> @@ -19,7 +19,8 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
>
> static inline void arch_unmap(struct mm_struct *mm,
> struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> }
>
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 325017ad9311..da004594d831 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -295,7 +295,7 @@ int hmm_vma_get_pfns(struct vm_area_struct *vma,
> struct hmm_range *range,
> unsigned long start,
> unsigned long end,
> - hmm_pfn_t *pfns);
> + hmm_pfn_t *pfns, struct range_lock *mmrange);
> bool hmm_vma_range_done(struct vm_area_struct *vma, struct hmm_range *range);
>
>
> @@ -323,7 +323,7 @@ int hmm_vma_fault(struct vm_area_struct *vma,
> unsigned long end,
> hmm_pfn_t *pfns,
> bool write,
> - bool block);
> + bool block, struct range_lock *mmrange);
> #endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
>
>
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index 44368b19b27e..19667b75f73c 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -20,7 +20,8 @@ struct mem_cgroup;
>
> #ifdef CONFIG_KSM
> int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> - unsigned long end, int advice, unsigned long *vm_flags);
> + unsigned long end, int advice, unsigned long *vm_flags,
> + struct range_lock *mmrange);
> int __ksm_enter(struct mm_struct *mm);
> void __ksm_exit(struct mm_struct *mm);
>
> @@ -78,7 +79,8 @@ static inline void ksm_exit(struct mm_struct *mm)
>
> #ifdef CONFIG_MMU
> static inline int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> - unsigned long end, int advice, unsigned long *vm_flags)
> + unsigned long end, int advice, unsigned long *vm_flags,
> + struct range_lock *mmrange)
> {
> return 0;
> }
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 0c6fe904bc97..fa08e348a295 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -272,7 +272,7 @@ int migrate_vma(const struct migrate_vma_ops *ops,
> unsigned long end,
> unsigned long *src,
> unsigned long *dst,
> - void *private);
> + void *private, struct range_lock *mmrange);
> #else
> static inline int migrate_vma(const struct migrate_vma_ops *ops,
> struct vm_area_struct *vma,
> @@ -280,7 +280,7 @@ static inline int migrate_vma(const struct migrate_vma_ops *ops,
> unsigned long end,
> unsigned long *src,
> unsigned long *dst,
> - void *private)
> + void *private, struct range_lock *mmrange)
> {
> return -EINVAL;
> }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bcf2509d448d..fc4e7fdc3e76 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1295,11 +1295,12 @@ struct mm_walk {
> int (*pud_entry)(pud_t *pud, unsigned long addr,
> unsigned long next, struct mm_walk *walk);
> int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
> - unsigned long next, struct mm_walk *walk);
> + unsigned long next, struct mm_walk *walk,
> + struct range_lock *mmrange);
> int (*pte_entry)(pte_t *pte, unsigned long addr,
> unsigned long next, struct mm_walk *walk);
> int (*pte_hole)(unsigned long addr, unsigned long next,
> - struct mm_walk *walk);
> + struct mm_walk *walk, struct range_lock *mmrange);
> int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
> unsigned long addr, unsigned long next,
> struct mm_walk *walk);
> @@ -1311,8 +1312,9 @@ struct mm_walk {
> };
>
> int walk_page_range(unsigned long addr, unsigned long end,
> - struct mm_walk *walk);
> -int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk);
> + struct mm_walk *walk, struct range_lock *mmrange);
> +int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk,
> + struct range_lock *mmrange);
> void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
> unsigned long end, unsigned long floor, unsigned long ceiling);
> int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
> @@ -1337,17 +1339,18 @@ int invalidate_inode_page(struct page *page);
>
> #ifdef CONFIG_MMU
> extern int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> - unsigned int flags);
> + unsigned int flags, struct range_lock *mmrange);
> extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
> unsigned long address, unsigned int fault_flags,
> - bool *unlocked);
> + bool *unlocked, struct range_lock *mmrange);
> void unmap_mapping_pages(struct address_space *mapping,
> pgoff_t start, pgoff_t nr, bool even_cows);
> void unmap_mapping_range(struct address_space *mapping,
> loff_t const holebegin, loff_t const holelen, int even_cows);
> #else
> static inline int handle_mm_fault(struct vm_area_struct *vma,
> - unsigned long address, unsigned int flags)
> + unsigned long address, unsigned int flags,
> + struct range_lock *mmrange)
> {
> /* should never happen if there's no MMU */
> BUG();
> @@ -1355,7 +1358,8 @@ static inline int handle_mm_fault(struct vm_area_struct *vma,
> }
> static inline int fixup_user_fault(struct task_struct *tsk,
> struct mm_struct *mm, unsigned long address,
> - unsigned int fault_flags, bool *unlocked)
> + unsigned int fault_flags, bool *unlocked,
> + struct range_lock *mmrange)
> {
> /* should never happen if there's no MMU */
> BUG();
> @@ -1383,24 +1387,28 @@ extern int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
> long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
> unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas, int *locked);
> + struct vm_area_struct **vmas, int *locked,
> + struct range_lock *mmrange);
> long get_user_pages(unsigned long start, unsigned long nr_pages,
> - unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas);
> + unsigned int gup_flags, struct page **pages,
> + struct vm_area_struct **vmas, struct range_lock *mmrange);
> long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
> - unsigned int gup_flags, struct page **pages, int *locked);
> + unsigned int gup_flags, struct page **pages,
> + int *locked, struct range_lock *mmrange);
> long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> struct page **pages, unsigned int gup_flags);
> #ifdef CONFIG_FS_DAX
> long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> - unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas);
> + unsigned int gup_flags, struct page **pages,
> + struct vm_area_struct **vmas,
> + struct range_lock *mmrange);
> #else
> static inline long get_user_pages_longterm(unsigned long start,
> unsigned long nr_pages, unsigned int gup_flags,
> - struct page **pages, struct vm_area_struct **vmas)
> + struct page **pages, struct vm_area_struct **vmas,
> + struct range_lock *mmrange)
> {
> - return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
> + return get_user_pages(start, nr_pages, gup_flags, pages, vmas, mmrange);
> }
> #endif /* CONFIG_FS_DAX */
>
> @@ -1505,7 +1513,8 @@ extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long
> int dirty_accountable, int prot_numa);
> extern int mprotect_fixup(struct vm_area_struct *vma,
> struct vm_area_struct **pprev, unsigned long start,
> - unsigned long end, unsigned long newflags);
> + unsigned long end, unsigned long newflags,
> + struct range_lock *mmrange);
>
> /*
> * doesn't attempt to fault and will return short.
> @@ -2149,28 +2158,30 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
> extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
> extern int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
> unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
> - struct vm_area_struct *expand);
> + struct vm_area_struct *expand, struct range_lock *mmrange);
> static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
> - unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
> + unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
> + struct range_lock *mmrange)
> {
> - return __vma_adjust(vma, start, end, pgoff, insert, NULL);
> + return __vma_adjust(vma, start, end, pgoff, insert, NULL, mmrange);
> }
> extern struct vm_area_struct *vma_merge(struct mm_struct *,
> struct vm_area_struct *prev, unsigned long addr, unsigned long end,
> unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
> - struct mempolicy *, struct vm_userfaultfd_ctx);
> + struct mempolicy *, struct vm_userfaultfd_ctx,
> + struct range_lock *mmrange);
> extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
> extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
> - unsigned long addr, int new_below);
> + unsigned long addr, int new_below, struct range_lock *mmrange);
> extern int split_vma(struct mm_struct *, struct vm_area_struct *,
> - unsigned long addr, int new_below);
> + unsigned long addr, int new_below, struct range_lock *mmrange);
> extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
> extern void __vma_link_rb(struct mm_struct *, struct vm_area_struct *,
> struct rb_node **, struct rb_node *);
> extern void unlink_file_vma(struct vm_area_struct *);
> extern struct vm_area_struct *copy_vma(struct vm_area_struct **,
> unsigned long addr, unsigned long len, pgoff_t pgoff,
> - bool *need_rmap_locks);
> + bool *need_rmap_locks, struct range_lock *mmrange);
> extern void exit_mmap(struct mm_struct *);
>
> static inline int check_data_rlimit(unsigned long rlim,
> @@ -2212,21 +2223,22 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
>
> extern unsigned long mmap_region(struct file *file, unsigned long addr,
> unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> - struct list_head *uf);
> + struct list_head *uf, struct range_lock *mmrange);
> extern unsigned long do_mmap(struct file *file, unsigned long addr,
> unsigned long len, unsigned long prot, unsigned long flags,
> vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
> - struct list_head *uf);
> + struct list_head *uf, struct range_lock *mmrange);
> extern int do_munmap(struct mm_struct *, unsigned long, size_t,
> - struct list_head *uf);
> + struct list_head *uf, struct range_lock *mmrange);
>
> static inline unsigned long
> do_mmap_pgoff(struct file *file, unsigned long addr,
> unsigned long len, unsigned long prot, unsigned long flags,
> unsigned long pgoff, unsigned long *populate,
> - struct list_head *uf)
> + struct list_head *uf, struct range_lock *mmrange)
> {
> - return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate, uf);
> + return do_mmap(file, addr, len, prot, flags, 0, pgoff, populate,
> + uf, mmrange);
> }
>
> #ifdef CONFIG_MMU
> @@ -2405,7 +2417,8 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
> unsigned long start, unsigned long end);
> #endif
>
> -struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
> +struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr,
> + struct range_lock *);
> int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
> unsigned long pfn, unsigned long size, pgprot_t);
> int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index 0a294e950df8..79eb735e7c95 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -34,6 +34,7 @@ struct mm_struct;
> struct inode;
> struct notifier_block;
> struct page;
> +struct range_lock;
>
> #define UPROBE_HANDLER_REMOVE 1
> #define UPROBE_HANDLER_MASK 1
> @@ -115,17 +116,20 @@ struct uprobes_state {
> struct xol_area *xol_area;
> };
>
> -extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
> -extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
> +extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm,
> + unsigned long vaddr, struct range_lock *mmrange);
> +extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm,
> + unsigned long vaddr, struct range_lock *mmrange);
> extern bool is_swbp_insn(uprobe_opcode_t *insn);
> extern bool is_trap_insn(uprobe_opcode_t *insn);
> extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
> extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
> -extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
> +extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
> + uprobe_opcode_t, struct range_lock *mmrange);
> extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
> extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
> extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
> -extern int uprobe_mmap(struct vm_area_struct *vma);
> +extern int uprobe_mmap(struct vm_area_struct *vma, struct range_lock *mmrange);
> extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end);
> extern void uprobe_start_dup_mmap(void);
> extern void uprobe_end_dup_mmap(void);
> @@ -169,7 +173,8 @@ static inline void
> uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> {
> }
> -static inline int uprobe_mmap(struct vm_area_struct *vma)
> +static inline int uprobe_mmap(struct vm_area_struct *vma,
> + struct range_lock *mmrange)
> {
> return 0;
> }
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 4643865e9171..6c29c791c7f2 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -1293,6 +1293,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
> struct path path;
> fmode_t f_mode;
> unsigned long populate = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> err = -EINVAL;
> if (shmid < 0)
> @@ -1411,7 +1412,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
> goto invalid;
> }
>
> - addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate, NULL);
> + addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate, NULL,
> + &mmrange);
> *raddr = addr;
> err = 0;
> if (IS_ERR_VALUE(addr))
> @@ -1487,6 +1489,7 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
> struct file *file;
> struct vm_area_struct *next;
> #endif
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (addr & ~PAGE_MASK)
> return retval;
> @@ -1537,7 +1540,8 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
> */
> file = vma->vm_file;
> size = i_size_read(file_inode(vma->vm_file));
> - do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
> + do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start,
> + NULL, &mmrange);
> /*
> * We discovered the size of the shm segment, so
> * break out of here and fall through to the next
> @@ -1564,7 +1568,8 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
> if ((vma->vm_ops == &shm_vm_ops) &&
> ((vma->vm_start - addr)/PAGE_SIZE == vma->vm_pgoff) &&
> (vma->vm_file == file))
> - do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
> + do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start,
> + NULL, &mmrange);
> vma = next;
> }
>
> @@ -1573,7 +1578,8 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
> * given
> */
> if (vma && vma->vm_start == addr && vma->vm_ops == &shm_vm_ops) {
> - do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start, NULL);
> + do_munmap(mm, vma->vm_start, vma->vm_end - vma->vm_start,
> + NULL, &mmrange);
> retval = 0;
> }
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index ce6848e46e94..60e12b39182c 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -300,7 +300,7 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
> * Return 0 (success) or a negative errno.
> */
> int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
> - uprobe_opcode_t opcode)
> + uprobe_opcode_t opcode, struct range_lock *mmrange)
> {
> struct page *old_page, *new_page;
> struct vm_area_struct *vma;
> @@ -309,7 +309,8 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
> retry:
> /* Read the page with vaddr into memory */
> ret = get_user_pages_remote(NULL, mm, vaddr, 1,
> - FOLL_FORCE | FOLL_SPLIT, &old_page, &vma, NULL);
> + FOLL_FORCE | FOLL_SPLIT, &old_page, &vma, NULL,
> + mmrange);

There is no need to pass the range down here either, as
get_user_pages_remote() is told not to unlock the mmap_sem.
There are other places where passing the range parameter down is not
necessary, and it makes this series bigger than needed by adding an
extra parameter to a lot of functions that don't need it.
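
A similar locked == NULL shortcut inside get_user_pages_remote() (again
just a sketch, assuming a local DEFINE_RANGE_LOCK_FULL() there) would
let call sites like this one stay exactly as they are today:

	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
				    FOLL_FORCE | FOLL_SPLIT, &old_page,
				    &vma, NULL);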

Laurent.

> if (ret <= 0)
> return ret;
>
> @@ -349,9 +350,10 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
> * For mm @mm, store the breakpoint instruction at @vaddr.
> * Return 0 (success) or a negative errno.
> */
> -int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
> +int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
> + unsigned long vaddr, struct range_lock *mmrange)
> {
> - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
> + return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN, mmrange);
> }
>
> /**
> @@ -364,9 +366,12 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
> * Return 0 (success) or a negative errno.
> */
> int __weak
> -set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
> +set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
> + unsigned long vaddr, struct range_lock *mmrange)
> {
> - return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
> + return uprobe_write_opcode(mm, vaddr,
> + *(uprobe_opcode_t *)&auprobe->insn,
> + mmrange);
> }
>
> static struct uprobe *get_uprobe(struct uprobe *uprobe)
> @@ -650,7 +655,8 @@ static bool filter_chain(struct uprobe *uprobe,
>
> static int
> install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
> - struct vm_area_struct *vma, unsigned long vaddr)
> + struct vm_area_struct *vma, unsigned long vaddr,
> + struct range_lock *mmrange)
> {
> bool first_uprobe;
> int ret;
> @@ -667,7 +673,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
> if (first_uprobe)
> set_bit(MMF_HAS_UPROBES, &mm->flags);
>
> - ret = set_swbp(&uprobe->arch, mm, vaddr);
> + ret = set_swbp(&uprobe->arch, mm, vaddr, mmrange);
> if (!ret)
> clear_bit(MMF_RECALC_UPROBES, &mm->flags);
> else if (first_uprobe)
> @@ -677,10 +683,11 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
> }
>
> static int
> -remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
> +remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
> + unsigned long vaddr, struct range_lock *mmrange)
> {
> set_bit(MMF_RECALC_UPROBES, &mm->flags);
> - return set_orig_insn(&uprobe->arch, mm, vaddr);
> + return set_orig_insn(&uprobe->arch, mm, vaddr, mmrange);
> }
>
> static inline bool uprobe_is_active(struct uprobe *uprobe)
> @@ -794,6 +801,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> bool is_register = !!new;
> struct map_info *info;
> int err = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> percpu_down_write(&dup_mmap_sem);
> info = build_map_info(uprobe->inode->i_mapping,
> @@ -824,11 +832,13 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> /* consult only the "caller", new consumer. */
> if (consumer_filter(new,
> UPROBE_FILTER_REGISTER, mm))
> - err = install_breakpoint(uprobe, mm, vma, info->vaddr);
> + err = install_breakpoint(uprobe, mm, vma,
> + info->vaddr, &mmrange);
> } else if (test_bit(MMF_HAS_UPROBES, &mm->flags)) {
> if (!filter_chain(uprobe,
> UPROBE_FILTER_UNREGISTER, mm))
> - err |= remove_breakpoint(uprobe, mm, info->vaddr);
> + err |= remove_breakpoint(uprobe, mm,
> + info->vaddr, &mmrange);
> }
>
> unlock:
> @@ -972,6 +982,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
> {
> struct vm_area_struct *vma;
> int err = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_read(&mm->mmap_sem);
> for (vma = mm->mmap; vma; vma = vma->vm_next) {
> @@ -988,7 +999,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
> continue;
>
> vaddr = offset_to_vaddr(vma, uprobe->offset);
> - err |= remove_breakpoint(uprobe, mm, vaddr);
> + err |= remove_breakpoint(uprobe, mm, vaddr, &mmrange);
> }
> up_read(&mm->mmap_sem);
>
> @@ -1063,7 +1074,7 @@ static void build_probe_list(struct inode *inode,
> * Currently we ignore all errors and always return 0, the callers
> * can't handle the failure anyway.
> */
> -int uprobe_mmap(struct vm_area_struct *vma)
> +int uprobe_mmap(struct vm_area_struct *vma, struct range_lock *mmrange)
> {
> struct list_head tmp_list;
> struct uprobe *uprobe, *u;
> @@ -1087,7 +1098,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
> if (!fatal_signal_pending(current) &&
> filter_chain(uprobe, UPROBE_FILTER_MMAP, vma->vm_mm)) {
> unsigned long vaddr = offset_to_vaddr(vma, uprobe->offset);
> - install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
> + install_breakpoint(uprobe, vma->vm_mm, vma, vaddr, mmrange);
> }
> put_uprobe(uprobe);
> }
> @@ -1698,7 +1709,8 @@ static void mmf_recalc_uprobes(struct mm_struct *mm)
> clear_bit(MMF_HAS_UPROBES, &mm->flags);
> }
>
> -static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
> +static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr,
> + struct range_lock *mmrange)
> {
> struct page *page;
> uprobe_opcode_t opcode;
> @@ -1718,7 +1730,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
> * essentially a kernel access to the memory.
> */
> result = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &page,
> - NULL, NULL);
> + NULL, NULL, mmrange);
> if (result < 0)
> return result;
>
> @@ -1734,6 +1746,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
> struct mm_struct *mm = current->mm;
> struct uprobe *uprobe = NULL;
> struct vm_area_struct *vma;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_read(&mm->mmap_sem);
> vma = find_vma(mm, bp_vaddr);
> @@ -1746,7 +1759,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
> }
>
> if (!uprobe)
> - *is_swbp = is_trap_at_addr(mm, bp_vaddr);
> + *is_swbp = is_trap_at_addr(mm, bp_vaddr, &mmrange);
> } else {
> *is_swbp = -EFAULT;
> }
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 1f450e092c74..09a0d86f80a0 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -725,10 +725,11 @@ static int fault_in_user_writeable(u32 __user *uaddr)
> {
> struct mm_struct *mm = current->mm;
> int ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_read(&mm->mmap_sem);
> ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
> - FAULT_FLAG_WRITE, NULL);
> + FAULT_FLAG_WRITE, NULL, &mmrange);
> up_read(&mm->mmap_sem);
>
> return ret < 0 ? ret : 0;
> diff --git a/mm/frame_vector.c b/mm/frame_vector.c
> index c64dca6e27c2..d3dccd80c6ee 100644
> --- a/mm/frame_vector.c
> +++ b/mm/frame_vector.c
> @@ -39,6 +39,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> int ret = 0;
> int err;
> int locked;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (nr_frames == 0)
> return 0;
> @@ -71,7 +72,8 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> vec->got_ref = true;
> vec->is_pfns = false;
> ret = get_user_pages_locked(start, nr_frames,
> - gup_flags, (struct page **)(vec->ptrs), &locked);
> + gup_flags, (struct page **)(vec->ptrs), &locked,
> + &mmrange);
> goto out;
> }
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 1b46e6e74881..01983a7b3750 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -478,7 +478,8 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
> * If it is, *@nonblocking will be set to 0 and -EBUSY returned.
> */
> static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
> - unsigned long address, unsigned int *flags, int *nonblocking)
> + unsigned long address, unsigned int *flags, int *nonblocking,
> + struct range_lock *mmrange)
> {
> unsigned int fault_flags = 0;
> int ret;
> @@ -499,7 +500,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
> fault_flags |= FAULT_FLAG_TRIED;
> }
>
> - ret = handle_mm_fault(vma, address, fault_flags);
> + ret = handle_mm_fault(vma, address, fault_flags, mmrange);
> if (ret & VM_FAULT_ERROR) {
> int err = vm_fault_to_errno(ret, *flags);
>
> @@ -592,6 +593,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> * @vmas: array of pointers to vmas corresponding to each page.
> * Or NULL if the caller does not require them.
> * @nonblocking: whether waiting for disk IO or mmap_sem contention
> + * @mmrange: mm address space range locking
> *
> * Returns number of pages pinned. This may be fewer than the number
> * requested. If nr_pages is 0 or negative, returns 0. If no pages
> @@ -638,7 +640,8 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas, int *nonblocking)
> + struct vm_area_struct **vmas, int *nonblocking,
> + struct range_lock *mmrange)
> {
> long i = 0;
> unsigned int page_mask;
> @@ -664,7 +667,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>
> /* first iteration or cross vma bound */
> if (!vma || start >= vma->vm_end) {
> - vma = find_extend_vma(mm, start);
> + vma = find_extend_vma(mm, start, mmrange);
> if (!vma && in_gate_area(mm, start)) {
> int ret;
> ret = get_gate_page(mm, start & PAGE_MASK,
> @@ -697,7 +700,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> if (!page) {
> int ret;
> ret = faultin_page(tsk, vma, start, &foll_flags,
> - nonblocking);
> + nonblocking, mmrange);
> switch (ret) {
> case 0:
> goto retry;
> @@ -796,7 +799,7 @@ static bool vma_permits_fault(struct vm_area_struct *vma,
> */
> int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
> unsigned long address, unsigned int fault_flags,
> - bool *unlocked)
> + bool *unlocked, struct range_lock *mmrange)
> {
> struct vm_area_struct *vma;
> int ret, major = 0;
> @@ -805,14 +808,14 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
> fault_flags |= FAULT_FLAG_ALLOW_RETRY;
>
> retry:
> - vma = find_extend_vma(mm, address);
> + vma = find_extend_vma(mm, address, mmrange);
> if (!vma || address < vma->vm_start)
> return -EFAULT;
>
> if (!vma_permits_fault(vma, fault_flags))
> return -EFAULT;
>
> - ret = handle_mm_fault(vma, address, fault_flags);
> + ret = handle_mm_fault(vma, address, fault_flags, mmrange);
> major |= ret & VM_FAULT_MAJOR;
> if (ret & VM_FAULT_ERROR) {
> int err = vm_fault_to_errno(ret, 0);
> @@ -849,7 +852,8 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
> struct page **pages,
> struct vm_area_struct **vmas,
> int *locked,
> - unsigned int flags)
> + unsigned int flags,
> + struct range_lock *mmrange)
> {
> long ret, pages_done;
> bool lock_dropped;
> @@ -868,7 +872,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
> lock_dropped = false;
> for (;;) {
> ret = __get_user_pages(tsk, mm, start, nr_pages, flags, pages,
> - vmas, locked);
> + vmas, locked, mmrange);
> if (!locked)
> /* VM_FAULT_RETRY couldn't trigger, bypass */
> return ret;
> @@ -908,7 +912,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
> lock_dropped = true;
> down_read(&mm->mmap_sem);
> ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
> - pages, NULL, NULL);
> + pages, NULL, NULL, mmrange);
> if (ret != 1) {
> BUG_ON(ret > 1);
> if (!pages_done)
> @@ -956,11 +960,11 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
> */
> long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> - int *locked)
> + int *locked, struct range_lock *mmrange)
> {
> return __get_user_pages_locked(current, current->mm, start, nr_pages,
> pages, NULL, locked,
> - gup_flags | FOLL_TOUCH);
> + gup_flags | FOLL_TOUCH, mmrange);
> }
> EXPORT_SYMBOL(get_user_pages_locked);
>
> @@ -985,10 +989,11 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> struct mm_struct *mm = current->mm;
> int locked = 1;
> long ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_read(&mm->mmap_sem);
> ret = __get_user_pages_locked(current, mm, start, nr_pages, pages, NULL,
> - &locked, gup_flags | FOLL_TOUCH);
> + &locked, gup_flags | FOLL_TOUCH, &mmrange);
> if (locked)
> up_read(&mm->mmap_sem);
> return ret;
> @@ -1054,11 +1059,13 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
> long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
> unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas, int *locked)
> + struct vm_area_struct **vmas, int *locked,
> + struct range_lock *mmrange)
> {
> return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
> locked,
> - gup_flags | FOLL_TOUCH | FOLL_REMOTE);
> + gup_flags | FOLL_TOUCH | FOLL_REMOTE,
> + mmrange);
> }
> EXPORT_SYMBOL(get_user_pages_remote);
>
> @@ -1071,11 +1078,11 @@ EXPORT_SYMBOL(get_user_pages_remote);
> */
> long get_user_pages(unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas)
> + struct vm_area_struct **vmas, struct range_lock *mmrange)
> {
> return __get_user_pages_locked(current, current->mm, start, nr_pages,
> pages, vmas, NULL,
> - gup_flags | FOLL_TOUCH);
> + gup_flags | FOLL_TOUCH, mmrange);
> }
> EXPORT_SYMBOL(get_user_pages);
>
> @@ -1094,7 +1101,8 @@ EXPORT_SYMBOL(get_user_pages);
> */
> long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas_arg)
> + struct vm_area_struct **vmas_arg,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct **vmas = vmas_arg;
> struct vm_area_struct *vma_prev = NULL;
> @@ -1110,7 +1118,7 @@ long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> return -ENOMEM;
> }
>
> - rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
> + rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas, mmrange);
>
> for (i = 0; i < rc; i++) {
> struct vm_area_struct *vma = vmas[i];
> @@ -1149,6 +1157,7 @@ EXPORT_SYMBOL(get_user_pages_longterm);
> * @start: start address
> * @end: end address
> * @nonblocking:
> + * @mmrange: mm address space range lock held by the caller
> *
> * This takes care of mlocking the pages too if VM_LOCKED is set.
> *
> @@ -1163,7 +1172,8 @@ EXPORT_SYMBOL(get_user_pages_longterm);
> * released. If it's released, *@nonblocking will be set to 0.
> */
> long populate_vma_page_range(struct vm_area_struct *vma,
> - unsigned long start, unsigned long end, int *nonblocking)
> + unsigned long start, unsigned long end, int *nonblocking,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = vma->vm_mm;
> unsigned long nr_pages = (end - start) / PAGE_SIZE;
> @@ -1198,7 +1208,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
> * not result in a stack expansion that recurses back here.
> */
> return __get_user_pages(current, mm, start, nr_pages, gup_flags,
> - NULL, NULL, nonblocking);
> + NULL, NULL, nonblocking, mmrange);
> }
>
> /*
> @@ -1215,6 +1225,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
> struct vm_area_struct *vma = NULL;
> int locked = 0;
> long ret = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> VM_BUG_ON(start & ~PAGE_MASK);
> VM_BUG_ON(len != PAGE_ALIGN(len));
> @@ -1247,7 +1258,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
> * double checks the vma flags, so that it won't mlock pages
> * if the vma was already munlocked.
> */
> - ret = populate_vma_page_range(vma, nstart, nend, &locked);
> + ret = populate_vma_page_range(vma, nstart, nend, &locked, &mmrange);
> if (ret < 0) {
> if (ignore_errors) {
> ret = 0;
> @@ -1282,10 +1293,11 @@ struct page *get_dump_page(unsigned long addr)
> {
> struct vm_area_struct *vma;
> struct page *page;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (__get_user_pages(current, current->mm, addr, 1,
> FOLL_FORCE | FOLL_DUMP | FOLL_GET, &page, &vma,
> - NULL) < 1)
> + NULL, &mmrange) < 1)
> return NULL;
> flush_cache_page(vma, addr, page_to_pfn(page));
> return page;
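
The gup conversion boils down to two patterns: exported entry points grow a struct range_lock * argument, and paths that cannot (yet) describe a narrower interval fall back to DEFINE_RANGE_LOCK_FULL(), as get_user_pages_unlocked() and get_dump_page() do above. A minimal sketch of what that looks like for a caller; pin_one_page() is made up for illustration and is not part of the patch:

	/*
	 * Hypothetical caller (pin_one_page() is not in this series) showing
	 * the mechanical side of the conversion: declare a full-range lock
	 * and thread it through the new get_user_pages() signature.  The
	 * page reference must still be dropped with put_page() later.
	 */
	static int pin_one_page(unsigned long addr, struct page **page)
	{
		struct mm_struct *mm = current->mm;
		long ret;
		DEFINE_RANGE_LOCK_FULL(mmrange);	/* conservative: whole mm */

		down_read(&mm->mmap_sem);
		ret = get_user_pages(addr & PAGE_MASK, 1, FOLL_WRITE, page, NULL,
				     &mmrange);
		up_read(&mm->mmap_sem);

		return ret == 1 ? 0 : -EFAULT;
	}

Since mmrange only describes the interval that mmap_sem is meant to cover, passing the full range keeps today's semantics while mmap_sem itself still does the serialization.
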
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 320545b98ff5..b14e6869689e 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -245,7 +245,8 @@ struct hmm_vma_walk {
>
> static int hmm_vma_do_fault(struct mm_walk *walk,
> unsigned long addr,
> - hmm_pfn_t *pfn)
> + hmm_pfn_t *pfn,
> + struct range_lock *mmrange)
> {
> unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_REMOTE;
> struct hmm_vma_walk *hmm_vma_walk = walk->private;
> @@ -254,7 +255,7 @@ static int hmm_vma_do_fault(struct mm_walk *walk,
>
> flags |= hmm_vma_walk->block ? 0 : FAULT_FLAG_ALLOW_RETRY;
> flags |= hmm_vma_walk->write ? FAULT_FLAG_WRITE : 0;
> - r = handle_mm_fault(vma, addr, flags);
> + r = handle_mm_fault(vma, addr, flags, mmrange);
> if (r & VM_FAULT_RETRY)
> return -EBUSY;
> if (r & VM_FAULT_ERROR) {
> @@ -298,7 +299,9 @@ static void hmm_pfns_clear(hmm_pfn_t *pfns,
>
> static int hmm_vma_walk_hole(unsigned long addr,
> unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk,
> + struct range_lock *mmrange)
> +
> {
> struct hmm_vma_walk *hmm_vma_walk = walk->private;
> struct hmm_range *range = hmm_vma_walk->range;
> @@ -312,7 +315,7 @@ static int hmm_vma_walk_hole(unsigned long addr,
> if (hmm_vma_walk->fault) {
> int ret;
>
> - ret = hmm_vma_do_fault(walk, addr, &pfns[i]);
> + ret = hmm_vma_do_fault(walk, addr, &pfns[i], mmrange);
> if (ret != -EAGAIN)
> return ret;
> }
> @@ -323,7 +326,8 @@ static int hmm_vma_walk_hole(unsigned long addr,
>
> static int hmm_vma_walk_clear(unsigned long addr,
> unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> struct hmm_vma_walk *hmm_vma_walk = walk->private;
> struct hmm_range *range = hmm_vma_walk->range;
> @@ -337,7 +341,7 @@ static int hmm_vma_walk_clear(unsigned long addr,
> if (hmm_vma_walk->fault) {
> int ret;
>
> - ret = hmm_vma_do_fault(walk, addr, &pfns[i]);
> + ret = hmm_vma_do_fault(walk, addr, &pfns[i], mmrange);
> if (ret != -EAGAIN)
> return ret;
> }
> @@ -349,7 +353,8 @@ static int hmm_vma_walk_clear(unsigned long addr,
> static int hmm_vma_walk_pmd(pmd_t *pmdp,
> unsigned long start,
> unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> struct hmm_vma_walk *hmm_vma_walk = walk->private;
> struct hmm_range *range = hmm_vma_walk->range;
> @@ -366,7 +371,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>
> again:
> if (pmd_none(*pmdp))
> - return hmm_vma_walk_hole(start, end, walk);
> + return hmm_vma_walk_hole(start, end, walk, mmrange);
>
> if (pmd_huge(*pmdp) && vma->vm_flags & VM_HUGETLB)
> return hmm_pfns_bad(start, end, walk);
> @@ -389,10 +394,10 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
> goto again;
> if (pmd_protnone(pmd))
> - return hmm_vma_walk_clear(start, end, walk);
> + return hmm_vma_walk_clear(start, end, walk, mmrange);
>
> if (write_fault && !pmd_write(pmd))
> - return hmm_vma_walk_clear(start, end, walk);
> + return hmm_vma_walk_clear(start, end, walk, mmrange);
>
> pfn = pmd_pfn(pmd) + pte_index(addr);
> flag |= pmd_write(pmd) ? HMM_PFN_WRITE : 0;
> @@ -464,7 +469,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> fault:
> pte_unmap(ptep);
> /* Fault all pages in range */
> - return hmm_vma_walk_clear(start, end, walk);
> + return hmm_vma_walk_clear(start, end, walk, mmrange);
> }
> pte_unmap(ptep - 1);
>
> @@ -495,7 +500,8 @@ int hmm_vma_get_pfns(struct vm_area_struct *vma,
> struct hmm_range *range,
> unsigned long start,
> unsigned long end,
> - hmm_pfn_t *pfns)
> + hmm_pfn_t *pfns,
> + struct range_lock *mmrange)
> {
> struct hmm_vma_walk hmm_vma_walk;
> struct mm_walk mm_walk;
> @@ -541,7 +547,7 @@ int hmm_vma_get_pfns(struct vm_area_struct *vma,
> mm_walk.pmd_entry = hmm_vma_walk_pmd;
> mm_walk.pte_hole = hmm_vma_walk_hole;
>
> - walk_page_range(start, end, &mm_walk);
> + walk_page_range(start, end, &mm_walk, mmrange);
> return 0;
> }
> EXPORT_SYMBOL(hmm_vma_get_pfns);
> @@ -664,7 +670,8 @@ int hmm_vma_fault(struct vm_area_struct *vma,
> unsigned long end,
> hmm_pfn_t *pfns,
> bool write,
> - bool block)
> + bool block,
> + struct range_lock *mmrange)
> {
> struct hmm_vma_walk hmm_vma_walk;
> struct mm_walk mm_walk;
> @@ -717,7 +724,7 @@ int hmm_vma_fault(struct vm_area_struct *vma,
> mm_walk.pte_hole = hmm_vma_walk_hole;
>
> do {
> - ret = walk_page_range(start, end, &mm_walk);
> + ret = walk_page_range(start, end, &mm_walk, mmrange);
> start = hmm_vma_walk.last;
> } while (ret == -EAGAIN);
>
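
The hmm hunks also illustrate the page-walker side of the change: walk_page_range() and every mm_walk callback now carry the range lock, even callbacks that only forward it to handle_mm_fault() or never touch it. A small sketch of a converted walker; note_hole() and count_unpopulated() are invented names, not code from this series:

	/*
	 * Illustration only: tallies how much of [start, end) has no VMA or
	 * no populated upper-level page tables.  The point is the extra
	 * mmrange argument that every mm_walk callback now takes.
	 */
	static int note_hole(unsigned long addr, unsigned long end,
			     struct mm_walk *walk, struct range_lock *mmrange)
	{
		unsigned long *bytes = walk->private;

		*bytes += end - addr;	/* mmrange unused here, but part of the ABI */
		return 0;
	}

	static unsigned long count_unpopulated(struct mm_struct *mm,
					       unsigned long start,
					       unsigned long end)
	{
		unsigned long bytes = 0;
		DEFINE_RANGE_LOCK_FULL(mmrange);	/* full-range fallback */
		struct mm_walk walk = {
			.pte_hole	= note_hole,
			.mm		= mm,
			.private	= &bytes,
		};

		down_read(&mm->mmap_sem);
		walk_page_range(start, end, &walk, &mmrange);
		up_read(&mm->mmap_sem);

		return bytes;
	}
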
> diff --git a/mm/internal.h b/mm/internal.h
> index 62d8c34e63d5..abf1de31e524 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -289,7 +289,8 @@ void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
>
> #ifdef CONFIG_MMU
> extern long populate_vma_page_range(struct vm_area_struct *vma,
> - unsigned long start, unsigned long end, int *nonblocking);
> + unsigned long start, unsigned long end, int *nonblocking,
> + struct range_lock *mmrange);
> extern void munlock_vma_pages_range(struct vm_area_struct *vma,
> unsigned long start, unsigned long end);
> static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 293721f5da70..66c350cd9799 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -448,7 +448,8 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
> * of the process that owns 'vma'. We also do not want to enforce
> * protection keys here anyway.
> */
> -static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
> +static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
> + struct range_lock *mmrange)
> {
> struct page *page;
> int ret = 0;
> @@ -461,7 +462,8 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
> break;
> if (PageKsm(page))
> ret = handle_mm_fault(vma, addr,
> - FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE);
> + FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
> + mmrange);
> else
> ret = VM_FAULT_WRITE;
> put_page(page);
> @@ -516,6 +518,7 @@ static void break_cow(struct rmap_item *rmap_item)
> struct mm_struct *mm = rmap_item->mm;
> unsigned long addr = rmap_item->address;
> struct vm_area_struct *vma;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /*
> * It is not an accident that whenever we want to break COW
> @@ -526,7 +529,7 @@ static void break_cow(struct rmap_item *rmap_item)
> down_read(&mm->mmap_sem);
> vma = find_mergeable_vma(mm, addr);
> if (vma)
> - break_ksm(vma, addr);
> + break_ksm(vma, addr, &mmrange);
> up_read(&mm->mmap_sem);
> }
>
> @@ -807,7 +810,8 @@ static void remove_trailing_rmap_items(struct mm_slot *mm_slot,
> * in cmp_and_merge_page on one of the rmap_items we would be removing.
> */
> static int unmerge_ksm_pages(struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> unsigned long addr;
> int err = 0;
> @@ -818,7 +822,7 @@ static int unmerge_ksm_pages(struct vm_area_struct *vma,
> if (signal_pending(current))
> err = -ERESTARTSYS;
> else
> - err = break_ksm(vma, addr);
> + err = break_ksm(vma, addr, mmrange);
> }
> return err;
> }
> @@ -922,6 +926,7 @@ static int unmerge_and_remove_all_rmap_items(void)
> struct mm_struct *mm;
> struct vm_area_struct *vma;
> int err = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> spin_lock(&ksm_mmlist_lock);
> ksm_scan.mm_slot = list_entry(ksm_mm_head.mm_list.next,
> @@ -937,8 +942,8 @@ static int unmerge_and_remove_all_rmap_items(void)
> break;
> if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
> continue;
> - err = unmerge_ksm_pages(vma,
> - vma->vm_start, vma->vm_end);
> + err = unmerge_ksm_pages(vma, vma->vm_start,
> + vma->vm_end, &mmrange);
> if (err)
> goto error;
> }
> @@ -2350,7 +2355,8 @@ static int ksm_scan_thread(void *nothing)
> }
>
> int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> - unsigned long end, int advice, unsigned long *vm_flags)
> + unsigned long end, int advice, unsigned long *vm_flags,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = vma->vm_mm;
> int err;
> @@ -2384,7 +2390,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> return 0; /* just ignore the advice */
>
> if (vma->anon_vma) {
> - err = unmerge_ksm_pages(vma, start, end);
> + err = unmerge_ksm_pages(vma, start, end, mmrange);
> if (err)
> return err;
> }
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d3c922ea1a1..eaec6bfc2b08 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -54,7 +54,8 @@ static int madvise_need_mmap_write(int behavior)
> */
> static long madvise_behavior(struct vm_area_struct *vma,
> struct vm_area_struct **prev,
> - unsigned long start, unsigned long end, int behavior)
> + unsigned long start, unsigned long end, int behavior,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = vma->vm_mm;
> int error = 0;
> @@ -104,7 +105,8 @@ static long madvise_behavior(struct vm_area_struct *vma,
> break;
> case MADV_MERGEABLE:
> case MADV_UNMERGEABLE:
> - error = ksm_madvise(vma, start, end, behavior, &new_flags);
> + error = ksm_madvise(vma, start, end, behavior,
> + &new_flags, mmrange);
> if (error) {
> /*
> * madvise() returns EAGAIN if kernel resources, such as
> @@ -138,7 +140,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
> pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> *prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
> vma->vm_file, pgoff, vma_policy(vma),
> - vma->vm_userfaultfd_ctx);
> + vma->vm_userfaultfd_ctx, mmrange);
> if (*prev) {
> vma = *prev;
> goto success;
> @@ -151,7 +153,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
> error = -ENOMEM;
> goto out;
> }
> - error = __split_vma(mm, vma, start, 1);
> + error = __split_vma(mm, vma, start, 1, mmrange);
> if (error) {
> /*
> * madvise() returns EAGAIN if kernel resources, such as
> @@ -168,7 +170,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
> error = -ENOMEM;
> goto out;
> }
> - error = __split_vma(mm, vma, end, 0);
> + error = __split_vma(mm, vma, end, 0, mmrange);
> if (error) {
> /*
> * madvise() returns EAGAIN if kernel resources, such as
> @@ -191,7 +193,8 @@ static long madvise_behavior(struct vm_area_struct *vma,
>
> #ifdef CONFIG_SWAP
> static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> - unsigned long end, struct mm_walk *walk)
> + unsigned long end, struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> pte_t *orig_pte;
> struct vm_area_struct *vma = walk->private;
> @@ -226,7 +229,8 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> }
>
> static void force_swapin_readahead(struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> struct mm_walk walk = {
> .mm = vma->vm_mm,
> @@ -234,7 +238,7 @@ static void force_swapin_readahead(struct vm_area_struct *vma,
> .private = vma,
> };
>
> - walk_page_range(start, end, &walk);
> + walk_page_range(start, end, &walk, mmrange);
>
> lru_add_drain(); /* Push any new pages onto the LRU now */
> }
> @@ -272,14 +276,15 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
> */
> static long madvise_willneed(struct vm_area_struct *vma,
> struct vm_area_struct **prev,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end,
> + struct range_lock *mmrange)
> {
> struct file *file = vma->vm_file;
>
> *prev = vma;
> #ifdef CONFIG_SWAP
> if (!file) {
> - force_swapin_readahead(vma, start, end);
> + force_swapin_readahead(vma, start, end, mmrange);
> return 0;
> }
>
> @@ -308,7 +313,8 @@ static long madvise_willneed(struct vm_area_struct *vma,
> }
>
> static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> - unsigned long end, struct mm_walk *walk)
> + unsigned long end, struct mm_walk *walk,
> + struct range_lock *mmrange)
>
> {
> struct mmu_gather *tlb = walk->private;
> @@ -442,7 +448,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>
> static void madvise_free_page_range(struct mmu_gather *tlb,
> struct vm_area_struct *vma,
> - unsigned long addr, unsigned long end)
> + unsigned long addr, unsigned long end,
> + struct range_lock *mmrange)
> {
> struct mm_walk free_walk = {
> .pmd_entry = madvise_free_pte_range,
> @@ -451,12 +458,14 @@ static void madvise_free_page_range(struct mmu_gather *tlb,
> };
>
> tlb_start_vma(tlb, vma);
> - walk_page_range(addr, end, &free_walk);
> + walk_page_range(addr, end, &free_walk, mmrange);
> tlb_end_vma(tlb, vma);
> }
>
> static int madvise_free_single_vma(struct vm_area_struct *vma,
> - unsigned long start_addr, unsigned long end_addr)
> + unsigned long start_addr,
> + unsigned long end_addr,
> + struct range_lock *mmrange)
> {
> unsigned long start, end;
> struct mm_struct *mm = vma->vm_mm;
> @@ -478,7 +487,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
> update_hiwater_rss(mm);
>
> mmu_notifier_invalidate_range_start(mm, start, end);
> - madvise_free_page_range(&tlb, vma, start, end);
> + madvise_free_page_range(&tlb, vma, start, end, mmrange);
> mmu_notifier_invalidate_range_end(mm, start, end);
> tlb_finish_mmu(&tlb, start, end);
>
> @@ -514,7 +523,7 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
> static long madvise_dontneed_free(struct vm_area_struct *vma,
> struct vm_area_struct **prev,
> unsigned long start, unsigned long end,
> - int behavior)
> + int behavior, struct range_lock *mmrange)
> {
> *prev = vma;
> if (!can_madv_dontneed_vma(vma))
> @@ -562,7 +571,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> if (behavior == MADV_DONTNEED)
> return madvise_dontneed_single_vma(vma, start, end);
> else if (behavior == MADV_FREE)
> - return madvise_free_single_vma(vma, start, end);
> + return madvise_free_single_vma(vma, start, end, mmrange);
> else
> return -EINVAL;
> }
> @@ -676,18 +685,21 @@ static int madvise_inject_error(int behavior,
>
> static long
> madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
> - unsigned long start, unsigned long end, int behavior)
> + unsigned long start, unsigned long end, int behavior,
> + struct range_lock *mmrange)
> {
> switch (behavior) {
> case MADV_REMOVE:
> return madvise_remove(vma, prev, start, end);
> case MADV_WILLNEED:
> - return madvise_willneed(vma, prev, start, end);
> + return madvise_willneed(vma, prev, start, end, mmrange);
> case MADV_FREE:
> case MADV_DONTNEED:
> - return madvise_dontneed_free(vma, prev, start, end, behavior);
> + return madvise_dontneed_free(vma, prev, start, end, behavior,
> + mmrange);
> default:
> - return madvise_behavior(vma, prev, start, end, behavior);
> + return madvise_behavior(vma, prev, start, end, behavior,
> + mmrange);
> }
> }
>
> @@ -797,7 +809,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> int write;
> size_t len;
> struct blk_plug plug;
> -
> + DEFINE_RANGE_LOCK_FULL(mmrange);
> if (!madvise_behavior_valid(behavior))
> return error;
>
> @@ -860,7 +872,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> tmp = end;
>
> /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
> - error = madvise_vma(vma, &prev, start, tmp, behavior);
> + error = madvise_vma(vma, &prev, start, tmp, behavior, &mmrange);
> if (error)
> goto out;
> start = tmp;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 88c1af32fd67..a7ac5a14b22e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4881,7 +4881,8 @@ static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
>
> static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
> unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *vma = walk->vma;
> pte_t *pte;
> @@ -4915,6 +4916,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
> static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
> {
> unsigned long precharge;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> struct mm_walk mem_cgroup_count_precharge_walk = {
> .pmd_entry = mem_cgroup_count_precharge_pte_range,
> @@ -4922,7 +4924,7 @@ static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
> };
> down_read(&mm->mmap_sem);
> walk_page_range(0, mm->highest_vm_end,
> - &mem_cgroup_count_precharge_walk);
> + &mem_cgroup_count_precharge_walk, &mmrange);
> up_read(&mm->mmap_sem);
>
> precharge = mc.precharge;
> @@ -5081,7 +5083,8 @@ static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset)
>
> static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
> unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> int ret = 0;
> struct vm_area_struct *vma = walk->vma;
> @@ -5197,6 +5200,7 @@ static void mem_cgroup_move_charge(void)
> .pmd_entry = mem_cgroup_move_charge_pte_range,
> .mm = mc.mm,
> };
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> lru_add_drain_all();
> /*
> @@ -5223,7 +5227,8 @@ static void mem_cgroup_move_charge(void)
> * When we have consumed all precharges and failed in doing
> * additional charge, the page walk just aborts.
> */
> - walk_page_range(0, mc.mm->highest_vm_end, &mem_cgroup_move_charge_walk);
> + walk_page_range(0, mc.mm->highest_vm_end, &mem_cgroup_move_charge_walk,
> + &mmrange);
>
> up_read(&mc.mm->mmap_sem);
> atomic_dec(&mc.from->moving_account);
> diff --git a/mm/memory.c b/mm/memory.c
> index 5ec6433d6a5c..b3561a052939 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4021,7 +4021,7 @@ static int handle_pte_fault(struct vm_fault *vmf)
> * return value. See filemap_fault() and __lock_page_or_retry().
> */
> static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> - unsigned int flags)
> + unsigned int flags, struct range_lock *mmrange)
> {
> struct vm_fault vmf = {
> .vma = vma,
> @@ -4029,6 +4029,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> .flags = flags,
> .pgoff = linear_page_index(vma, address),
> .gfp_mask = __get_fault_gfp_mask(vma),
> + .lockrange = mmrange,
> };
> unsigned int dirty = flags & FAULT_FLAG_WRITE;
> struct mm_struct *mm = vma->vm_mm;
> @@ -4110,7 +4111,7 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> * return value. See filemap_fault() and __lock_page_or_retry().
> */
> int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> - unsigned int flags)
> + unsigned int flags, struct range_lock *mmrange)
> {
> int ret;
>
> @@ -4137,7 +4138,7 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> if (unlikely(is_vm_hugetlb_page(vma)))
> ret = hugetlb_fault(vma->vm_mm, vma, address, flags);
> else
> - ret = __handle_mm_fault(vma, address, flags);
> + ret = __handle_mm_fault(vma, address, flags, mmrange);
>
> if (flags & FAULT_FLAG_USER) {
> mem_cgroup_oom_disable();
> @@ -4425,6 +4426,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
> struct vm_area_struct *vma;
> void *old_buf = buf;
> int write = gup_flags & FOLL_WRITE;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_read(&mm->mmap_sem);
> /* ignore errors, just check how much was successfully transferred */
> @@ -4434,7 +4436,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
> struct page *page = NULL;
>
> ret = get_user_pages_remote(tsk, mm, addr, 1,
> - gup_flags, &page, &vma, NULL);
> + gup_flags, &page, &vma, NULL, &mmrange);
> if (ret <= 0) {
> #ifndef CONFIG_HAVE_IOREMAP_PROT
> break;
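
With the vm_fault.lockrange plumbing above, handle_mm_fault() callers keep taking mmap_sem as they do today and additionally describe the locked interval. Roughly, a converted fault path looks like the sketch below; fault_in() is a made-up name, and retry handling (FAULT_FLAG_ALLOW_RETRY dropping mmap_sem) is left out for brevity:

	/*
	 * Sketch of a converted handle_mm_fault() caller; not real code from
	 * this series.  Returns VM_FAULT_* bits, as before.
	 */
	static int fault_in(struct mm_struct *mm, unsigned long address,
			    bool write)
	{
		struct vm_area_struct *vma;
		unsigned int flags = write ? FAULT_FLAG_WRITE : 0;
		int ret = VM_FAULT_SIGSEGV;
		DEFINE_RANGE_LOCK_FULL(mmrange);	/* full range, for now */

		down_read(&mm->mmap_sem);
		vma = find_extend_vma(mm, address, &mmrange);
		if (vma && address >= vma->vm_start)
			ret = handle_mm_fault(vma, address, flags, &mmrange);
		up_read(&mm->mmap_sem);

		return ret;
	}
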
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index a8b7d59002e8..001dc176abc1 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -467,7 +467,8 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
> * and move them to the pagelist if they do.
> */
> static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
> - unsigned long end, struct mm_walk *walk)
> + unsigned long end, struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *vma = walk->vma;
> struct page *page;
> @@ -618,7 +619,7 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
> static int
> queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
> nodemask_t *nodes, unsigned long flags,
> - struct list_head *pagelist)
> + struct list_head *pagelist, struct range_lock *mmrange)
> {
> struct queue_pages qp = {
> .pagelist = pagelist,
> @@ -634,7 +635,7 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
> .private = &qp,
> };
>
> - return walk_page_range(start, end, &queue_pages_walk);
> + return walk_page_range(start, end, &queue_pages_walk, mmrange);
> }
>
> /*
> @@ -675,7 +676,8 @@ static int vma_replace_policy(struct vm_area_struct *vma,
>
> /* Step 2: apply policy to a range and do splits. */
> static int mbind_range(struct mm_struct *mm, unsigned long start,
> - unsigned long end, struct mempolicy *new_pol)
> + unsigned long end, struct mempolicy *new_pol,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *next;
> struct vm_area_struct *prev;
> @@ -705,7 +707,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
> ((vmstart - vma->vm_start) >> PAGE_SHIFT);
> prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
> vma->anon_vma, vma->vm_file, pgoff,
> - new_pol, vma->vm_userfaultfd_ctx);
> + new_pol, vma->vm_userfaultfd_ctx, mmrange);
> if (prev) {
> vma = prev;
> next = vma->vm_next;
> @@ -715,12 +717,12 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
> goto replace;
> }
> if (vma->vm_start != vmstart) {
> - err = split_vma(vma->vm_mm, vma, vmstart, 1);
> + err = split_vma(vma->vm_mm, vma, vmstart, 1, mmrange);
> if (err)
> goto out;
> }
> if (vma->vm_end != vmend) {
> - err = split_vma(vma->vm_mm, vma, vmend, 0);
> + err = split_vma(vma->vm_mm, vma, vmend, 0, mmrange);
> if (err)
> goto out;
> }
> @@ -797,12 +799,12 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
> }
> }
>
> -static int lookup_node(unsigned long addr)
> +static int lookup_node(unsigned long addr, struct range_lock *mmrange)
> {
> struct page *p;
> int err;
>
> - err = get_user_pages(addr & PAGE_MASK, 1, 0, &p, NULL);
> + err = get_user_pages(addr & PAGE_MASK, 1, 0, &p, NULL, mmrange);
> if (err >= 0) {
> err = page_to_nid(p);
> put_page(p);
> @@ -818,6 +820,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma = NULL;
> struct mempolicy *pol = current->mempolicy;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (flags &
> ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR|MPOL_F_MEMS_ALLOWED))
> @@ -857,7 +860,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>
> if (flags & MPOL_F_NODE) {
> if (flags & MPOL_F_ADDR) {
> - err = lookup_node(addr);
> + err = lookup_node(addr, &mmrange);
> if (err < 0)
> goto out;
> *policy = err;
> @@ -943,7 +946,7 @@ struct page *alloc_new_node_page(struct page *page, unsigned long node)
> * Returns error or the number of pages not migrated.
> */
> static int migrate_to_node(struct mm_struct *mm, int source, int dest,
> - int flags)
> + int flags, struct range_lock *mmrange)
> {
> nodemask_t nmask;
> LIST_HEAD(pagelist);
> @@ -959,7 +962,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
> */
> VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
> queue_pages_range(mm, mm->mmap->vm_start, mm->task_size, &nmask,
> - flags | MPOL_MF_DISCONTIG_OK, &pagelist);
> + flags | MPOL_MF_DISCONTIG_OK, &pagelist, mmrange);
>
> if (!list_empty(&pagelist)) {
> err = migrate_pages(&pagelist, alloc_new_node_page, NULL, dest,
> @@ -983,6 +986,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
> int busy = 0;
> int err;
> nodemask_t tmp;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> err = migrate_prep();
> if (err)
> @@ -1063,7 +1067,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
> break;
>
> node_clear(source, tmp);
> - err = migrate_to_node(mm, source, dest, flags);
> + err = migrate_to_node(mm, source, dest, flags, &mmrange);
> if (err > 0)
> busy += err;
> if (err < 0)
> @@ -1143,6 +1147,7 @@ static long do_mbind(unsigned long start, unsigned long len,
> unsigned long end;
> int err;
> LIST_HEAD(pagelist);
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (flags & ~(unsigned long)MPOL_MF_VALID)
> return -EINVAL;
> @@ -1204,9 +1209,9 @@ static long do_mbind(unsigned long start, unsigned long len,
> goto mpol_out;
>
> err = queue_pages_range(mm, start, end, nmask,
> - flags | MPOL_MF_INVERT, &pagelist);
> + flags | MPOL_MF_INVERT, &pagelist, &mmrange);
> if (!err)
> - err = mbind_range(mm, start, end, new);
> + err = mbind_range(mm, start, end, new, &mmrange);
>
> if (!err) {
> int nr_failed = 0;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 5d0dc7b85f90..7a6afc34dd54 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2105,7 +2105,8 @@ struct migrate_vma {
>
> static int migrate_vma_collect_hole(unsigned long start,
> unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> struct migrate_vma *migrate = walk->private;
> unsigned long addr;
> @@ -2138,7 +2139,8 @@ static int migrate_vma_collect_skip(unsigned long start,
> static int migrate_vma_collect_pmd(pmd_t *pmdp,
> unsigned long start,
> unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> struct migrate_vma *migrate = walk->private;
> struct vm_area_struct *vma = walk->vma;
> @@ -2149,7 +2151,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>
> again:
> if (pmd_none(*pmdp))
> - return migrate_vma_collect_hole(start, end, walk);
> + return migrate_vma_collect_hole(start, end, walk, mmrange);
>
> if (pmd_trans_huge(*pmdp)) {
> struct page *page;
> @@ -2183,7 +2185,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> walk);
> if (pmd_none(*pmdp))
> return migrate_vma_collect_hole(start, end,
> - walk);
> + walk, mmrange);
> }
> }
>
> @@ -2309,7 +2311,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> * valid page, it updates the src array and takes a reference on the page, in
> * order to pin the page until we lock it and unmap it.
> */
> -static void migrate_vma_collect(struct migrate_vma *migrate)
> +static void migrate_vma_collect(struct migrate_vma *migrate,
> + struct range_lock *mmrange)
> {
> struct mm_walk mm_walk;
>
> @@ -2325,7 +2328,7 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
> mmu_notifier_invalidate_range_start(mm_walk.mm,
> migrate->start,
> migrate->end);
> - walk_page_range(migrate->start, migrate->end, &mm_walk);
> + walk_page_range(migrate->start, migrate->end, &mm_walk, mmrange);
> mmu_notifier_invalidate_range_end(mm_walk.mm,
> migrate->start,
> migrate->end);
> @@ -2891,7 +2894,8 @@ int migrate_vma(const struct migrate_vma_ops *ops,
> unsigned long end,
> unsigned long *src,
> unsigned long *dst,
> - void *private)
> + void *private,
> + struct range_lock *mmrange)
> {
> struct migrate_vma migrate;
>
> @@ -2917,7 +2921,7 @@ int migrate_vma(const struct migrate_vma_ops *ops,
> migrate.vma = vma;
>
> /* Collect, and try to unmap source pages */
> - migrate_vma_collect(&migrate);
> + migrate_vma_collect(&migrate, mmrange);
> if (!migrate.cpages)
> return 0;
>
> diff --git a/mm/mincore.c b/mm/mincore.c
> index fc37afe226e6..a6875a34aac0 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -85,7 +85,9 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
> }
>
> static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
> - struct vm_area_struct *vma, unsigned char *vec)
> + struct vm_area_struct *vma,
> + unsigned char *vec,
> + struct range_lock *mmrange)
> {
> unsigned long nr = (end - addr) >> PAGE_SHIFT;
> int i;
> @@ -104,15 +106,17 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
> }
>
> static int mincore_unmapped_range(unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> walk->private += __mincore_unmapped_range(addr, end,
> - walk->vma, walk->private);
> + walk->vma,
> + walk->private, mmrange);
> return 0;
> }
>
> static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> spinlock_t *ptl;
> struct vm_area_struct *vma = walk->vma;
> @@ -128,7 +132,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> }
>
> if (pmd_trans_unstable(pmd)) {
> - __mincore_unmapped_range(addr, end, vma, vec);
> + __mincore_unmapped_range(addr, end, vma, vec, mmrange);
> goto out;
> }
>
> @@ -138,7 +142,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
> if (pte_none(pte))
> __mincore_unmapped_range(addr, addr + PAGE_SIZE,
> - vma, vec);
> + vma, vec, mmrange);
> else if (pte_present(pte))
> *vec = 1;
> else { /* pte is a swap entry */
> @@ -174,7 +178,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> * all the arguments, we hold the mmap semaphore: we should
> * just return the amount of info we're asked for.
> */
> -static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *vec)
> +static long do_mincore(unsigned long addr, unsigned long pages,
> + unsigned char *vec, struct range_lock *mmrange)
> {
> struct vm_area_struct *vma;
> unsigned long end;
> @@ -191,7 +196,7 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
> return -ENOMEM;
> mincore_walk.mm = vma->vm_mm;
> end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
> - err = walk_page_range(addr, end, &mincore_walk);
> + err = walk_page_range(addr, end, &mincore_walk, mmrange);
> if (err < 0)
> return err;
> return (end - addr) >> PAGE_SHIFT;
> @@ -227,6 +232,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
> long retval;
> unsigned long pages;
> unsigned char *tmp;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* Check the start address: needs to be page-aligned.. */
> if (start & ~PAGE_MASK)
> @@ -254,7 +260,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
> * the temporary buffer size.
> */
> down_read(&current->mm->mmap_sem);
> - retval = do_mincore(start, min(pages, PAGE_SIZE), tmp);
> + retval = do_mincore(start, min(pages, PAGE_SIZE), tmp, &mmrange);
> up_read(&current->mm->mmap_sem);
>
> if (retval <= 0)
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 74e5a6547c3d..3f6bd953e8b0 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -517,7 +517,8 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
> * For vmas that pass the filters, merge/split as appropriate.
> */
> static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
> - unsigned long start, unsigned long end, vm_flags_t newflags)
> + unsigned long start, unsigned long end, vm_flags_t newflags,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = vma->vm_mm;
> pgoff_t pgoff;
> @@ -534,20 +535,20 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
> pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> *prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
> vma->vm_file, pgoff, vma_policy(vma),
> - vma->vm_userfaultfd_ctx);
> + vma->vm_userfaultfd_ctx, mmrange);
> if (*prev) {
> vma = *prev;
> goto success;
> }
>
> if (start != vma->vm_start) {
> - ret = split_vma(mm, vma, start, 1);
> + ret = split_vma(mm, vma, start, 1, mmrange);
> if (ret)
> goto out;
> }
>
> if (end != vma->vm_end) {
> - ret = split_vma(mm, vma, end, 0);
> + ret = split_vma(mm, vma, end, 0, mmrange);
> if (ret)
> goto out;
> }
> @@ -580,7 +581,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
> }
>
> static int apply_vma_lock_flags(unsigned long start, size_t len,
> - vm_flags_t flags)
> + vm_flags_t flags, struct range_lock *mmrange)
> {
> unsigned long nstart, end, tmp;
> struct vm_area_struct * vma, * prev;
> @@ -610,7 +611,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
> tmp = vma->vm_end;
> if (tmp > end)
> tmp = end;
> - error = mlock_fixup(vma, &prev, nstart, tmp, newflags);
> + error = mlock_fixup(vma, &prev, nstart, tmp, newflags, mmrange);
> if (error)
> break;
> nstart = tmp;
> @@ -667,11 +668,13 @@ static int count_mm_mlocked_page_nr(struct mm_struct *mm,
> return count >> PAGE_SHIFT;
> }
>
> -static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
> +static __must_check int do_mlock(unsigned long start, size_t len,
> + vm_flags_t flags)
> {
> unsigned long locked;
> unsigned long lock_limit;
> int error = -ENOMEM;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (!can_do_mlock())
> return -EPERM;
> @@ -700,7 +703,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
>
> /* check against resource limits */
> if ((locked <= lock_limit) || capable(CAP_IPC_LOCK))
> - error = apply_vma_lock_flags(start, len, flags);
> + error = apply_vma_lock_flags(start, len, flags, &mmrange);
>
> up_write(&current->mm->mmap_sem);
> if (error)
> @@ -733,13 +736,14 @@ SYSCALL_DEFINE3(mlock2, unsigned long, start, size_t, len, int, flags)
> SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> {
> int ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> len = PAGE_ALIGN(len + (offset_in_page(start)));
> start &= PAGE_MASK;
>
> if (down_write_killable(&current->mm->mmap_sem))
> return -EINTR;
> - ret = apply_vma_lock_flags(start, len, 0);
> + ret = apply_vma_lock_flags(start, len, 0, &mmrange);
> up_write(&current->mm->mmap_sem);
>
> return ret;
> @@ -755,7 +759,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
> * is called once including the MCL_FUTURE flag and then a second time without
> * it, VM_LOCKED and VM_LOCKONFAULT will be cleared from mm->def_flags.
> */
> -static int apply_mlockall_flags(int flags)
> +static int apply_mlockall_flags(int flags, struct range_lock *mmrange)
> {
> struct vm_area_struct * vma, * prev = NULL;
> vm_flags_t to_add = 0;
> @@ -784,7 +788,8 @@ static int apply_mlockall_flags(int flags)
> newflags |= to_add;
>
> /* Ignore errors */
> - mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags);
> + mlock_fixup(vma, &prev, vma->vm_start, vma->vm_end, newflags,
> + mmrange);
> cond_resched();
> }
> out:
> @@ -795,6 +800,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
> {
> unsigned long lock_limit;
> int ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (!flags || (flags & ~(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)))
> return -EINVAL;
> @@ -811,7 +817,7 @@ SYSCALL_DEFINE1(mlockall, int, flags)
> ret = -ENOMEM;
> if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
> capable(CAP_IPC_LOCK))
> - ret = apply_mlockall_flags(flags);
> + ret = apply_mlockall_flags(flags, &mmrange);
> up_write(&current->mm->mmap_sem);
> if (!ret && (flags & MCL_CURRENT))
> mm_populate(0, TASK_SIZE);
> @@ -822,10 +828,11 @@ SYSCALL_DEFINE1(mlockall, int, flags)
> SYSCALL_DEFINE0(munlockall)
> {
> int ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (down_write_killable(&current->mm->mmap_sem))
> return -EINTR;
> - ret = apply_mlockall_flags(0);
> + ret = apply_mlockall_flags(0, &mmrange);
> up_write(&current->mm->mmap_sem);
> return ret;
> }
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 4bb038e7984b..f61d49cb791e 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -177,7 +177,8 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
> return next;
> }
>
> -static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf);
> +static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf,
> + struct range_lock *mmrange);
>
> SYSCALL_DEFINE1(brk, unsigned long, brk)
> {
> @@ -188,6 +189,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
> unsigned long min_brk;
> bool populate;
> LIST_HEAD(uf);
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (down_write_killable(&mm->mmap_sem))
> return -EINTR;
> @@ -225,7 +227,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
>
> /* Always allow shrinking brk. */
> if (brk <= mm->brk) {
> - if (!do_munmap(mm, newbrk, oldbrk-newbrk, &uf))
> + if (!do_munmap(mm, newbrk, oldbrk-newbrk, &uf, &mmrange))
> goto set_brk;
> goto out;
> }
> @@ -236,7 +238,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
> goto out;
>
> /* Ok, looks good - let it rip. */
> - if (do_brk(oldbrk, newbrk-oldbrk, &uf) < 0)
> + if (do_brk(oldbrk, newbrk-oldbrk, &uf, &mmrange) < 0)
> goto out;
>
> set_brk:
> @@ -680,7 +682,7 @@ static inline void __vma_unlink_prev(struct mm_struct *mm,
> */
> int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
> unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
> - struct vm_area_struct *expand)
> + struct vm_area_struct *expand, struct range_lock *mmrange)
> {
> struct mm_struct *mm = vma->vm_mm;
> struct vm_area_struct *next = vma->vm_next, *orig_vma = vma;
> @@ -887,10 +889,10 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
> i_mmap_unlock_write(mapping);
>
> if (root) {
> - uprobe_mmap(vma);
> + uprobe_mmap(vma, mmrange);
>
> if (adjust_next)
> - uprobe_mmap(next);
> + uprobe_mmap(next, mmrange);
> }
>
> if (remove_next) {
> @@ -960,7 +962,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
> }
> }
> if (insert && file)
> - uprobe_mmap(insert);
> + uprobe_mmap(insert, mmrange);
>
> validate_mm(mm);
>
> @@ -1101,7 +1103,8 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
> unsigned long end, unsigned long vm_flags,
> struct anon_vma *anon_vma, struct file *file,
> pgoff_t pgoff, struct mempolicy *policy,
> - struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> + struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> + struct range_lock *mmrange)
> {
> pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> struct vm_area_struct *area, *next;
> @@ -1149,10 +1152,11 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
> /* cases 1, 6 */
> err = __vma_adjust(prev, prev->vm_start,
> next->vm_end, prev->vm_pgoff, NULL,
> - prev);
> + prev, mmrange);
> } else /* cases 2, 5, 7 */
> err = __vma_adjust(prev, prev->vm_start,
> - end, prev->vm_pgoff, NULL, prev);
> + end, prev->vm_pgoff, NULL,
> + prev, mmrange);
> if (err)
> return NULL;
> khugepaged_enter_vma_merge(prev, vm_flags);
> @@ -1169,10 +1173,12 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
> vm_userfaultfd_ctx)) {
> if (prev && addr < prev->vm_end) /* case 4 */
> err = __vma_adjust(prev, prev->vm_start,
> - addr, prev->vm_pgoff, NULL, next);
> + addr, prev->vm_pgoff, NULL,
> + next, mmrange);
> else { /* cases 3, 8 */
> err = __vma_adjust(area, addr, next->vm_end,
> - next->vm_pgoff - pglen, NULL, next);
> + next->vm_pgoff - pglen, NULL,
> + next, mmrange);
> /*
> * In case 3 area is already equal to next and
> * this is a noop, but in case 8 "area" has
> @@ -1322,7 +1328,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> unsigned long len, unsigned long prot,
> unsigned long flags, vm_flags_t vm_flags,
> unsigned long pgoff, unsigned long *populate,
> - struct list_head *uf)
> + struct list_head *uf, struct range_lock *mmrange)
> {
> struct mm_struct *mm = current->mm;
> int pkey = 0;
> @@ -1491,7 +1497,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> vm_flags |= VM_NORESERVE;
> }
>
> - addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
> + addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, mmrange);
> if (!IS_ERR_VALUE(addr) &&
> ((vm_flags & VM_LOCKED) ||
> (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
> @@ -1628,7 +1634,7 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
>
> unsigned long mmap_region(struct file *file, unsigned long addr,
> unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> - struct list_head *uf)
> + struct list_head *uf, struct range_lock *mmrange)
> {
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma, *prev;
> @@ -1654,7 +1660,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> /* Clear old maps */
> while (find_vma_links(mm, addr, addr + len, &prev, &rb_link,
> &rb_parent)) {
> - if (do_munmap(mm, addr, len, uf))
> + if (do_munmap(mm, addr, len, uf, mmrange))
> return -ENOMEM;
> }
>
> @@ -1672,7 +1678,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> * Can we just expand an old mapping?
> */
> vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
> - NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
> + NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, mmrange);
> if (vma)
> goto out;
>
> @@ -1756,7 +1762,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> }
>
> if (file)
> - uprobe_mmap(vma);
> + uprobe_mmap(vma, mmrange);
>
> /*
> * New (or expanded) vma always get soft dirty status.
> @@ -2435,7 +2441,8 @@ int expand_stack(struct vm_area_struct *vma, unsigned long address)
> }
>
> struct vm_area_struct *
> -find_extend_vma(struct mm_struct *mm, unsigned long addr)
> +find_extend_vma(struct mm_struct *mm, unsigned long addr,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *vma, *prev;
>
> @@ -2446,7 +2453,8 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
> if (!prev || expand_stack(prev, addr))
> return NULL;
> if (prev->vm_flags & VM_LOCKED)
> - populate_vma_page_range(prev, addr, prev->vm_end, NULL);
> + populate_vma_page_range(prev, addr, prev->vm_end,
> + NULL, mmrange);
> return prev;
> }
> #else
> @@ -2456,7 +2464,8 @@ int expand_stack(struct vm_area_struct *vma, unsigned long address)
> }
>
> struct vm_area_struct *
> -find_extend_vma(struct mm_struct *mm, unsigned long addr)
> +find_extend_vma(struct mm_struct *mm, unsigned long addr,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *vma;
> unsigned long start;
> @@ -2473,7 +2482,7 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
> if (expand_stack(vma, addr))
> return NULL;
> if (vma->vm_flags & VM_LOCKED)
> - populate_vma_page_range(vma, addr, start, NULL);
> + populate_vma_page_range(vma, addr, start, NULL, mmrange);
> return vma;
> }
> #endif
> @@ -2561,7 +2570,7 @@ detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
> * has already been checked or doesn't make sense to fail.
> */
> int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long addr, int new_below)
> + unsigned long addr, int new_below, struct range_lock *mmrange)
> {
> struct vm_area_struct *new;
> int err;
> @@ -2604,9 +2613,11 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
>
> if (new_below)
> err = vma_adjust(vma, addr, vma->vm_end, vma->vm_pgoff +
> - ((addr - new->vm_start) >> PAGE_SHIFT), new);
> + ((addr - new->vm_start) >> PAGE_SHIFT), new,
> + mmrange);
> else
> - err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new);
> + err = vma_adjust(vma, vma->vm_start, addr, vma->vm_pgoff, new,
> + mmrange);
>
> /* Success. */
> if (!err)
> @@ -2630,12 +2641,12 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
> * either for the first part or the tail.
> */
> int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long addr, int new_below)
> + unsigned long addr, int new_below, struct range_lock *mmrange)
> {
> if (mm->map_count >= sysctl_max_map_count)
> return -ENOMEM;
>
> - return __split_vma(mm, vma, addr, new_below);
> + return __split_vma(mm, vma, addr, new_below, mmrange);
> }
>
> /* Munmap is split into 2 main parts -- this part which finds
> @@ -2644,7 +2655,7 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
> * Jeremy Fitzhardinge <[email protected]>
> */
> int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
> - struct list_head *uf)
> + struct list_head *uf, struct range_lock *mmrange)
> {
> unsigned long end;
> struct vm_area_struct *vma, *prev, *last;
> @@ -2686,7 +2697,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
> if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
> return -ENOMEM;
>
> - error = __split_vma(mm, vma, start, 0);
> + error = __split_vma(mm, vma, start, 0, mmrange);
> if (error)
> return error;
> prev = vma;
> @@ -2695,7 +2706,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
> /* Does it split the last one? */
> last = find_vma(mm, end);
> if (last && end > last->vm_start) {
> - int error = __split_vma(mm, last, end, 1);
> + int error = __split_vma(mm, last, end, 1, mmrange);
> if (error)
> return error;
> }
> @@ -2736,7 +2747,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
> detach_vmas_to_be_unmapped(mm, vma, prev, end);
> unmap_region(mm, vma, prev, start, end);
>
> - arch_unmap(mm, vma, start, end);
> + arch_unmap(mm, vma, start, end, mmrange);
>
> /* Fix up all other VM information */
> remove_vma_list(mm, vma);
> @@ -2749,11 +2760,12 @@ int vm_munmap(unsigned long start, size_t len)
> int ret;
> struct mm_struct *mm = current->mm;
> LIST_HEAD(uf);
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (down_write_killable(&mm->mmap_sem))
> return -EINTR;
>
> - ret = do_munmap(mm, start, len, &uf);
> + ret = do_munmap(mm, start, len, &uf, &mmrange);
> up_write(&mm->mmap_sem);
> userfaultfd_unmap_complete(mm, &uf);
> return ret;
> @@ -2779,6 +2791,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> unsigned long populate = 0;
> unsigned long ret = -EINVAL;
> struct file *file;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> pr_warn_once("%s (%d) uses deprecated remap_file_pages() syscall. See Documentation/vm/remap_file_pages.txt.\n",
> current->comm, current->pid);
> @@ -2855,7 +2868,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>
> file = get_file(vma->vm_file);
> ret = do_mmap_pgoff(vma->vm_file, start, size,
> - prot, flags, pgoff, &populate, NULL);
> + prot, flags, pgoff, &populate, NULL, &mmrange);
> fput(file);
> out:
> up_write(&mm->mmap_sem);
> @@ -2881,7 +2894,9 @@ static inline void verify_mm_writelocked(struct mm_struct *mm)
> * anonymous maps. eventually we may be able to do some
> * brk-specific accounting here.
> */
> -static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags, struct list_head *uf)
> +static int do_brk_flags(unsigned long addr, unsigned long request,
> + unsigned long flags, struct list_head *uf,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma, *prev;
> @@ -2920,7 +2935,7 @@ static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long
> */
> while (find_vma_links(mm, addr, addr + len, &prev, &rb_link,
> &rb_parent)) {
> - if (do_munmap(mm, addr, len, uf))
> + if (do_munmap(mm, addr, len, uf, mmrange))
> return -ENOMEM;
> }
>
> @@ -2936,7 +2951,7 @@ static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long
>
> /* Can we just expand an old private anonymous mapping? */
> vma = vma_merge(mm, prev, addr, addr + len, flags,
> - NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
> + NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, mmrange);
> if (vma)
> goto out;
>
> @@ -2967,9 +2982,10 @@ static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long
> return 0;
> }
>
> -static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf)
> +static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf,
> + struct range_lock *mmrange)
> {
> - return do_brk_flags(addr, len, 0, uf);
> + return do_brk_flags(addr, len, 0, uf, mmrange);
> }
>
> int vm_brk_flags(unsigned long addr, unsigned long len, unsigned long flags)
> @@ -2978,11 +2994,12 @@ int vm_brk_flags(unsigned long addr, unsigned long len, unsigned long flags)
> int ret;
> bool populate;
> LIST_HEAD(uf);
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (down_write_killable(&mm->mmap_sem))
> return -EINTR;
>
> - ret = do_brk_flags(addr, len, flags, &uf);
> + ret = do_brk_flags(addr, len, flags, &uf, &mmrange);
> populate = ((mm->def_flags & VM_LOCKED) != 0);
> up_write(&mm->mmap_sem);
> userfaultfd_unmap_complete(mm, &uf);
> @@ -3105,7 +3122,7 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
> */
> struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> unsigned long addr, unsigned long len, pgoff_t pgoff,
> - bool *need_rmap_locks)
> + bool *need_rmap_locks, struct range_lock *mmrange)
> {
> struct vm_area_struct *vma = *vmap;
> unsigned long vma_start = vma->vm_start;
> @@ -3127,7 +3144,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> return NULL; /* should never get here */
> new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
> vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> - vma->vm_userfaultfd_ctx);
> + vma->vm_userfaultfd_ctx, mmrange);
> if (new_vma) {
> /*
> * Source vma may have been merged into new_vma
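
Since do_mmap(), do_munmap() and the vma_merge()/split_vma() helpers now all take the range, an in-kernel mapping request follows the same shape as the brk/munmap paths above. A rough sketch, with map_anon() being a hypothetical helper and error handling trimmed to the essentials:

	/*
	 * Hypothetical helper (map_anon() is not in this series) showing how
	 * an in-kernel caller of do_mmap() passes the range through after
	 * this patch.
	 */
	static unsigned long map_anon(unsigned long len)
	{
		struct mm_struct *mm = current->mm;
		unsigned long populate = 0, addr;
		LIST_HEAD(uf);
		DEFINE_RANGE_LOCK_FULL(mmrange);	/* whole address space, as before */

		if (down_write_killable(&mm->mmap_sem))
			return -EINTR;
		addr = do_mmap(NULL, 0, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, 0, 0, &populate,
			       &uf, &mmrange);
		up_write(&mm->mmap_sem);

		userfaultfd_unmap_complete(mm, &uf);
		if (!IS_ERR_VALUE(addr) && populate)
			mm_populate(addr, populate);

		return addr;
	}
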
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index e3309fcf586b..b84a70720319 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -299,7 +299,8 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
>
> int
> mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
> - unsigned long start, unsigned long end, unsigned long newflags)
> + unsigned long start, unsigned long end, unsigned long newflags,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = vma->vm_mm;
> unsigned long oldflags = vma->vm_flags;
> @@ -340,7 +341,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
> pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
> *pprev = vma_merge(mm, *pprev, start, end, newflags,
> vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> - vma->vm_userfaultfd_ctx);
> + vma->vm_userfaultfd_ctx, mmrange);
> if (*pprev) {
> vma = *pprev;
> VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
> @@ -350,13 +351,13 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
> *pprev = vma;
>
> if (start != vma->vm_start) {
> - error = split_vma(mm, vma, start, 1);
> + error = split_vma(mm, vma, start, 1, mmrange);
> if (error)
> goto fail;
> }
>
> if (end != vma->vm_end) {
> - error = split_vma(mm, vma, end, 0);
> + error = split_vma(mm, vma, end, 0, mmrange);
> if (error)
> goto fail;
> }
> @@ -379,7 +380,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
> */
> if ((oldflags & (VM_WRITE | VM_SHARED | VM_LOCKED)) == VM_LOCKED &&
> (newflags & VM_WRITE)) {
> - populate_vma_page_range(vma, start, end, NULL);
> + populate_vma_page_range(vma, start, end, NULL, mmrange);
> }
>
> vm_stat_account(mm, oldflags, -nrpages);
> @@ -404,6 +405,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> const int grows = prot & (PROT_GROWSDOWN|PROT_GROWSUP);
> const bool rier = (current->personality & READ_IMPLIES_EXEC) &&
> (prot & PROT_READ);
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP);
> if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */
> @@ -494,7 +496,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> tmp = vma->vm_end;
> if (tmp > end)
> tmp = end;
> - error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
> + error = mprotect_fixup(vma, &prev, nstart, tmp, newflags, &mmrange);
> if (error)
> goto out;
> nstart = tmp;
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 049470aa1e3e..21a9e2a2baa2 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -264,7 +264,8 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> unsigned long old_addr, unsigned long old_len,
> unsigned long new_len, unsigned long new_addr,
> bool *locked, struct vm_userfaultfd_ctx *uf,
> - struct list_head *uf_unmap)
> + struct list_head *uf_unmap,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = vma->vm_mm;
> struct vm_area_struct *new_vma;
> @@ -292,13 +293,13 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> * so KSM can come around to merge on vma and new_vma afterwards.
> */
> err = ksm_madvise(vma, old_addr, old_addr + old_len,
> - MADV_UNMERGEABLE, &vm_flags);
> + MADV_UNMERGEABLE, &vm_flags, mmrange);
> if (err)
> return err;
>
> new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT);
> new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff,
> - &need_rmap_locks);
> + &need_rmap_locks, mmrange);
> if (!new_vma)
> return -ENOMEM;
>
> @@ -353,7 +354,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> if (unlikely(vma->vm_flags & VM_PFNMAP))
> untrack_pfn_moved(vma);
>
> - if (do_munmap(mm, old_addr, old_len, uf_unmap) < 0) {
> + if (do_munmap(mm, old_addr, old_len, uf_unmap, mmrange) < 0) {
> /* OOM: unable to split vma, just get accounts right */
> vm_unacct_memory(excess >> PAGE_SHIFT);
> excess = 0;
> @@ -444,7 +445,8 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> unsigned long new_addr, unsigned long new_len, bool *locked,
> struct vm_userfaultfd_ctx *uf,
> struct list_head *uf_unmap_early,
> - struct list_head *uf_unmap)
> + struct list_head *uf_unmap,
> + struct range_lock *mmrange)
> {
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma;
> @@ -462,12 +464,13 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> if (addr + old_len > new_addr && new_addr + new_len > addr)
> goto out;
>
> - ret = do_munmap(mm, new_addr, new_len, uf_unmap_early);
> + ret = do_munmap(mm, new_addr, new_len, uf_unmap_early, mmrange);
> if (ret)
> goto out;
>
> if (old_len >= new_len) {
> - ret = do_munmap(mm, addr+new_len, old_len - new_len, uf_unmap);
> + ret = do_munmap(mm, addr+new_len, old_len - new_len,
> + uf_unmap, mmrange);
> if (ret && old_len != new_len)
> goto out;
> old_len = new_len;
> @@ -490,7 +493,7 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> goto out1;
>
> ret = move_vma(vma, addr, old_len, new_len, new_addr, locked, uf,
> - uf_unmap);
> + uf_unmap, mmrange);
> if (!(offset_in_page(ret)))
> goto out;
> out1:
> @@ -532,6 +535,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX;
> LIST_HEAD(uf_unmap_early);
> LIST_HEAD(uf_unmap);
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
> return ret;
> @@ -558,7 +562,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>
> if (flags & MREMAP_FIXED) {
> ret = mremap_to(addr, old_len, new_addr, new_len,
> - &locked, &uf, &uf_unmap_early, &uf_unmap);
> + &locked, &uf, &uf_unmap_early,
> + &uf_unmap, &mmrange);
> goto out;
> }
>
> @@ -568,7 +573,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> * do_munmap does all the needed commit accounting
> */
> if (old_len >= new_len) {
> - ret = do_munmap(mm, addr+new_len, old_len - new_len, &uf_unmap);
> + ret = do_munmap(mm, addr+new_len, old_len - new_len,
> + &uf_unmap, &mmrange);
> if (ret && old_len != new_len)
> goto out;
> ret = addr;
> @@ -592,7 +598,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> int pages = (new_len - old_len) >> PAGE_SHIFT;
>
> if (vma_adjust(vma, vma->vm_start, addr + new_len,
> - vma->vm_pgoff, NULL)) {
> + vma->vm_pgoff, NULL, &mmrange)) {
> ret = -ENOMEM;
> goto out;
> }
> @@ -628,7 +634,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> }
>
> ret = move_vma(vma, addr, old_len, new_len, new_addr,
> - &locked, &uf, &uf_unmap);
> + &locked, &uf, &uf_unmap, &mmrange);
> }
> out:
> if (offset_in_page(ret)) {
> diff --git a/mm/nommu.c b/mm/nommu.c
> index ebb6e618dade..1805f0a788b3 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -113,7 +113,8 @@ unsigned int kobjsize(const void *objp)
> static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> unsigned long start, unsigned long nr_pages,
> unsigned int foll_flags, struct page **pages,
> - struct vm_area_struct **vmas, int *nonblocking)
> + struct vm_area_struct **vmas, int *nonblocking,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *vma;
> unsigned long vm_flags;
> @@ -162,18 +163,19 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> */
> long get_user_pages(unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas)
> + struct vm_area_struct **vmas,
> + struct range_lock *mmrange)
> {
> return __get_user_pages(current, current->mm, start, nr_pages,
> - gup_flags, pages, vmas, NULL);
> + gup_flags, pages, vmas, NULL, mmrange);
> }
> EXPORT_SYMBOL(get_user_pages);
>
> long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> - int *locked)
> + int *locked, struct range_lock *mmrange)
> {
> - return get_user_pages(start, nr_pages, gup_flags, pages, NULL);
> + return get_user_pages(start, nr_pages, gup_flags, pages, NULL, mmrange);
> }
> EXPORT_SYMBOL(get_user_pages_locked);
>
> @@ -183,9 +185,11 @@ static long __get_user_pages_unlocked(struct task_struct *tsk,
> unsigned int gup_flags)
> {
> long ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
> +
> down_read(&mm->mmap_sem);
> ret = __get_user_pages(tsk, mm, start, nr_pages, gup_flags, pages,
> - NULL, NULL);
> + NULL, NULL, &mmrange);
> up_read(&mm->mmap_sem);
> return ret;
> }
> @@ -836,7 +840,8 @@ EXPORT_SYMBOL(find_vma);
> * find a VMA
> * - we don't extend stack VMAs under NOMMU conditions
> */
> -struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr)
> +struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr,
> + struct range_lock *mmrange)
> {
> return find_vma(mm, addr);
> }
> @@ -1206,7 +1211,8 @@ unsigned long do_mmap(struct file *file,
> vm_flags_t vm_flags,
> unsigned long pgoff,
> unsigned long *populate,
> - struct list_head *uf)
> + struct list_head *uf,
> + struct range_lock *mmrange)
> {
> struct vm_area_struct *vma;
> struct vm_region *region;
> @@ -1476,7 +1482,7 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg)
> * for the first part or the tail.
> */
> int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long addr, int new_below)
> + unsigned long addr, int new_below, struct range_lock *mmrange)
> {
> struct vm_area_struct *new;
> struct vm_region *region;
> @@ -1578,7 +1584,8 @@ static int shrink_vma(struct mm_struct *mm,
> * - under NOMMU conditions the chunk to be unmapped must be backed by a single
> * VMA, though it need not cover the whole VMA
> */
> -int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list_head *uf)
> +int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
> + struct list_head *uf, struct range_lock *mmrange)
> {
> struct vm_area_struct *vma;
> unsigned long end;
> @@ -1624,7 +1631,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, struct list
> if (end != vma->vm_end && offset_in_page(end))
> return -EINVAL;
> if (start != vma->vm_start && end != vma->vm_end) {
> - ret = split_vma(mm, vma, start, 1);
> + ret = split_vma(mm, vma, start, 1, mmrange);
> if (ret < 0)
> return ret;
> }
> @@ -1642,9 +1649,10 @@ int vm_munmap(unsigned long addr, size_t len)
> {
> struct mm_struct *mm = current->mm;
> int ret;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> down_write(&mm->mmap_sem);
> - ret = do_munmap(mm, addr, len, NULL);
> + ret = do_munmap(mm, addr, len, NULL, &mmrange);
> up_write(&mm->mmap_sem);
> return ret;
> }
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 8d2da5dec1e0..44a2507c94fd 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -26,7 +26,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> }
>
> static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> pmd_t *pmd;
> unsigned long next;
> @@ -38,7 +38,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> next = pmd_addr_end(addr, end);
> if (pmd_none(*pmd) || !walk->vma) {
> if (walk->pte_hole)
> - err = walk->pte_hole(addr, next, walk);
> + err = walk->pte_hole(addr, next, walk, mmrange);
> if (err)
> break;
> continue;
> @@ -48,7 +48,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> * needs to know about pmd_trans_huge() pmds
> */
> if (walk->pmd_entry)
> - err = walk->pmd_entry(pmd, addr, next, walk);
> + err = walk->pmd_entry(pmd, addr, next, walk, mmrange);
> if (err)
> break;
>
> @@ -71,7 +71,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> }
>
> static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> pud_t *pud;
> unsigned long next;
> @@ -83,7 +83,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
> next = pud_addr_end(addr, end);
> if (pud_none(*pud) || !walk->vma) {
> if (walk->pte_hole)
> - err = walk->pte_hole(addr, next, walk);
> + err = walk->pte_hole(addr, next, walk, mmrange);
> if (err)
> break;
> continue;
> @@ -106,7 +106,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
> goto again;
>
> if (walk->pmd_entry || walk->pte_entry)
> - err = walk_pmd_range(pud, addr, next, walk);
> + err = walk_pmd_range(pud, addr, next, walk, mmrange);
> if (err)
> break;
> } while (pud++, addr = next, addr != end);
> @@ -115,7 +115,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
> }
>
> static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> p4d_t *p4d;
> unsigned long next;
> @@ -126,13 +126,13 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
> next = p4d_addr_end(addr, end);
> if (p4d_none_or_clear_bad(p4d)) {
> if (walk->pte_hole)
> - err = walk->pte_hole(addr, next, walk);
> + err = walk->pte_hole(addr, next, walk, mmrange);
> if (err)
> break;
> continue;
> }
> if (walk->pmd_entry || walk->pte_entry)
> - err = walk_pud_range(p4d, addr, next, walk);
> + err = walk_pud_range(p4d, addr, next, walk, mmrange);
> if (err)
> break;
> } while (p4d++, addr = next, addr != end);
> @@ -141,7 +141,7 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
> }
>
> static int walk_pgd_range(unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> pgd_t *pgd;
> unsigned long next;
> @@ -152,13 +152,13 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
> next = pgd_addr_end(addr, end);
> if (pgd_none_or_clear_bad(pgd)) {
> if (walk->pte_hole)
> - err = walk->pte_hole(addr, next, walk);
> + err = walk->pte_hole(addr, next, walk, mmrange);
> if (err)
> break;
> continue;
> }
> if (walk->pmd_entry || walk->pte_entry)
> - err = walk_p4d_range(pgd, addr, next, walk);
> + err = walk_p4d_range(pgd, addr, next, walk, mmrange);
> if (err)
> break;
> } while (pgd++, addr = next, addr != end);
> @@ -175,7 +175,7 @@ static unsigned long hugetlb_entry_end(struct hstate *h, unsigned long addr,
> }
>
> static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> struct vm_area_struct *vma = walk->vma;
> struct hstate *h = hstate_vma(vma);
> @@ -192,7 +192,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> if (pte)
> err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
> else if (walk->pte_hole)
> - err = walk->pte_hole(addr, next, walk);
> + err = walk->pte_hole(addr, next, walk, mmrange);
>
> if (err)
> break;
> @@ -203,7 +203,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
>
> #else /* CONFIG_HUGETLB_PAGE */
> static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> return 0;
> }
> @@ -217,7 +217,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
> * error, where we abort the current walk.
> */
> static int walk_page_test(unsigned long start, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> struct vm_area_struct *vma = walk->vma;
>
> @@ -235,23 +235,23 @@ static int walk_page_test(unsigned long start, unsigned long end,
> if (vma->vm_flags & VM_PFNMAP) {
> int err = 1;
> if (walk->pte_hole)
> - err = walk->pte_hole(start, end, walk);
> + err = walk->pte_hole(start, end, walk, mmrange);
> return err ? err : 1;
> }
> return 0;
> }
>
> static int __walk_page_range(unsigned long start, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> int err = 0;
> struct vm_area_struct *vma = walk->vma;
>
> if (vma && is_vm_hugetlb_page(vma)) {
> if (walk->hugetlb_entry)
> - err = walk_hugetlb_range(start, end, walk);
> + err = walk_hugetlb_range(start, end, walk, mmrange);
> } else
> - err = walk_pgd_range(start, end, walk);
> + err = walk_pgd_range(start, end, walk, mmrange);
>
> return err;
> }
> @@ -285,10 +285,11 @@ static int __walk_page_range(unsigned long start, unsigned long end,
> * Locking:
> * Callers of walk_page_range() and walk_page_vma() should hold
> * @walk->mm->mmap_sem, because these function traverse vma list and/or
> - * access to vma's data.
> + * access to vma's data. As such, the @mmrange will represent the
> + * address space range.
> */
> int walk_page_range(unsigned long start, unsigned long end,
> - struct mm_walk *walk)
> + struct mm_walk *walk, struct range_lock *mmrange)
> {
> int err = 0;
> unsigned long next;
> @@ -315,7 +316,7 @@ int walk_page_range(unsigned long start, unsigned long end,
> next = min(end, vma->vm_end);
> vma = vma->vm_next;
>
> - err = walk_page_test(start, next, walk);
> + err = walk_page_test(start, next, walk, mmrange);
> if (err > 0) {
> /*
> * positive return values are purely for
> @@ -329,14 +330,15 @@ int walk_page_range(unsigned long start, unsigned long end,
> break;
> }
> if (walk->vma || walk->pte_hole)
> - err = __walk_page_range(start, next, walk);
> + err = __walk_page_range(start, next, walk, mmrange);
> if (err)
> break;
> } while (start = next, start < end);
> return err;
> }
>
> -int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk)
> +int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk,
> + struct range_lock *mmrange)
> {
> int err;
>
> @@ -346,10 +348,10 @@ int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk)
> VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
> VM_BUG_ON(!vma);
> walk->vma = vma;
> - err = walk_page_test(vma->vm_start, vma->vm_end, walk);
> + err = walk_page_test(vma->vm_start, vma->vm_end, walk, mmrange);
> if (err > 0)
> return 0;
> if (err < 0)
> return err;
> - return __walk_page_range(vma->vm_start, vma->vm_end, walk);
> + return __walk_page_range(vma->vm_start, vma->vm_end, walk, mmrange);
> }
> diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> index a447092d4635..ff6772b86195 100644
> --- a/mm/process_vm_access.c
> +++ b/mm/process_vm_access.c
> @@ -90,6 +90,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
> unsigned long max_pages_per_loop = PVM_MAX_KMALLOC_PAGES
> / sizeof(struct pages *);
> unsigned int flags = 0;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* Work out address and page range required */
> if (len == 0)
> @@ -111,7 +112,8 @@ static int process_vm_rw_single_vec(unsigned long addr,
> */
> down_read(&mm->mmap_sem);
> pages = get_user_pages_remote(task, mm, pa, pages, flags,
> - process_pages, NULL, &locked);
> + process_pages, NULL, &locked,
> + &mmrange);
> if (locked)
> up_read(&mm->mmap_sem);
> if (pages <= 0)
> diff --git a/mm/util.c b/mm/util.c
> index c1250501364f..b0ec1d88bb71 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -347,13 +347,14 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
> struct mm_struct *mm = current->mm;
> unsigned long populate;
> LIST_HEAD(uf);
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> ret = security_mmap_file(file, prot, flag);
> if (!ret) {
> if (down_write_killable(&mm->mmap_sem))
> return -EINTR;
> ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
> - &populate, &uf);
> + &populate, &uf, &mmrange);
> up_write(&mm->mmap_sem);
> userfaultfd_unmap_complete(mm, &uf);
> if (populate)
> diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
> index f6758dad981f..c1e36ea2c6fc 100644
> --- a/security/tomoyo/domain.c
> +++ b/security/tomoyo/domain.c
> @@ -868,6 +868,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
> struct tomoyo_page_dump *dump)
> {
> struct page *page;
> + DEFINE_RANGE_LOCK_FULL(mmrange); /* see get_arg_page() in fs/exec.c */
>
> /* dump->data is released by tomoyo_find_next_domain(). */
> if (!dump->data) {
> @@ -884,7 +885,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
> * the execve().
> */
> if (get_user_pages_remote(current, bprm->mm, pos, 1,
> - FOLL_FORCE, &page, NULL, NULL) <= 0)
> + FOLL_FORCE, &page, NULL, NULL, &mmrange) <= 0)
> return false;
> #else
> page = bprm->page[pos / PAGE_SIZE];
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index 57bcb27dcf30..4cd2b93bb20c 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -78,6 +78,7 @@ static void async_pf_execute(struct work_struct *work)
> unsigned long addr = apf->addr;
> gva_t gva = apf->gva;
> int locked = 1;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> might_sleep();
>
> @@ -88,7 +89,7 @@ static void async_pf_execute(struct work_struct *work)
> */
> down_read(&mm->mmap_sem);
> get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
> - &locked);
> + &locked, &mmrange);
> if (locked)
> up_read(&mm->mmap_sem);
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4501e658e8d6..86ec078f4c3b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1317,11 +1317,12 @@ unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *w
> return gfn_to_hva_memslot_prot(slot, gfn, writable);
> }
>
> -static inline int check_user_page_hwpoison(unsigned long addr)
> +static inline int check_user_page_hwpoison(unsigned long addr,
> + struct range_lock *mmrange)
> {
> int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
>
> - rc = get_user_pages(addr, 1, flags, NULL, NULL);
> + rc = get_user_pages(addr, 1, flags, NULL, NULL, mmrange);
> return rc == -EHWPOISON;
> }
>
> @@ -1411,7 +1412,8 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
> static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> unsigned long addr, bool *async,
> bool write_fault, bool *writable,
> - kvm_pfn_t *p_pfn)
> + kvm_pfn_t *p_pfn,
> + struct range_lock *mmrange)
> {
> unsigned long pfn;
> int r;
> @@ -1425,7 +1427,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
> bool unlocked = false;
> r = fixup_user_fault(current, current->mm, addr,
> (write_fault ? FAULT_FLAG_WRITE : 0),
> - &unlocked);
> + &unlocked, mmrange);
> if (unlocked)
> return -EAGAIN;
> if (r)
> @@ -1477,6 +1479,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
> struct vm_area_struct *vma;
> kvm_pfn_t pfn = 0;
> int npages, r;
> + DEFINE_RANGE_LOCK_FULL(mmrange);
>
> /* we can do it either atomically or asynchronously, not both */
> BUG_ON(atomic && async);
> @@ -1493,7 +1496,7 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
>
> down_read(&current->mm->mmap_sem);
> if (npages == -EHWPOISON ||
> - (!async && check_user_page_hwpoison(addr))) {
> + (!async && check_user_page_hwpoison(addr, &mmrange))) {
> pfn = KVM_PFN_ERR_HWPOISON;
> goto exit;
> }
> @@ -1504,7 +1507,8 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
> if (vma == NULL)
> pfn = KVM_PFN_ERR_FAULT;
> else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
> - r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn);
> + r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable,
> + &pfn, &mmrange);
> if (r == -EAGAIN)
> goto retry;
> if (r < 0)
>
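
To spell out the calling convention these hunks follow (a distilled sketch of
the vm_munmap() hunk above; the function name here is made up, this is not
new code in the series): the caller declares a full-range lock on its stack
and passes a pointer to it down to any helper that may need to drop and
retake the lock.

int example_unmap(struct mm_struct *mm, unsigned long addr, size_t len)
{
        int ret;
        DEFINE_RANGE_LOCK_FULL(mmrange); /* covers the whole address space */

        down_write(&mm->mmap_sem);       /* still the plain rwsem until patch 64 */
        ret = do_munmap(mm, addr, len, NULL, &mmrange);
        up_write(&mm->mmap_sem);
        return ret;
}

Once the final patch converts mmap_sem to the range mmap_lock, the
down_write()/up_write() pair presumably becomes the corresponding mm locking
wrapper taking &mmrange.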


2018-02-05 16:54:52

by Laurent Dufour

[permalink] [raw]
Subject: Re: [RFC PATCH 00/64] mm: towards parallel address space operations

On 05/02/2018 02:26, Davidlohr Bueso wrote:
> From: Davidlohr Bueso <[email protected]>
>
> Hi,
>
> This patchset is a new version of both the range locking machinery as well
> as a full mmap_sem conversion that makes use of it -- as the worst case
> scenario as all mmap_sem calls are converted to a full range mmap_lock
> equivalent. As such, while there is no improvement of concurrency perse,
> these changes aim at adding the machinery to permit this in the future.

Despite the massive rebase, what are the changes in this series compared to
the one I sent last May, which you silently based this on, by the way:
https://lkml.org/lkml/2017/5/24/409

>
> Direct users of the mm->mmap_sem can be classified as those that (1) acquire
> and release the lock within the same context, and (2) those who directly
> manipulate the mmap_sem down the callchain. For example:
>
> (1) down_read(&mm->mmap_sem);
> /* do something */
> /* nobody down the chain uses mmap_sem directly */
> up_read(&mm->mmap_sem);
>
> (2a) down_read(&mm->mmap_sem);
> /* do something that retuns mmap_sem unlocked */
> fn(mm, &locked);
> if (locked)
> up_read(&mm->mmap_sem);
>
> (2b) down_read(&mm->mmap_sem);
> /* do something that in between released and reacquired mmap_sem */
> fn(mm);
> up_read(&mm->mmap_sem);

Unfortunately, there are also indirect users which rely on the mmap_sem
locking to protect their data. For the first step, which uses a full range,
this doesn't matter, but when refining the range these would be the most
critical ones, as they would have to be reworked to take the range into
account.
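
To illustrate (with a made-up helper, not actual kernel code), an indirect
user typically looks like this:

static void frob_vma(struct vm_area_struct *vma, unsigned long flags)
{
        /* Takes no lock itself, only assumes the caller holds mmap_sem. */
        VM_BUG_ON(!rwsem_is_locked(&vma->vm_mm->mmap_sem));
        vma->vm_flags |= flags;
}

With a full-range lock such a check still holds, but once callers start
taking narrower ranges, knowing that *some* range is locked is not enough:
the helper also needs to know that the locked range covers the VMA it
touches.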

>
> Patches 1-2: add the range locking machinery. This is rebased on the rbtree
> optimizations for interval trees such that we can quickly detect overlapping
> ranges. More documentation as also been added, with an ordering example in the
> source code.
>
> Patch 3: adds new mm locking wrappers around mmap_sem.
>
> Patches 4-15: teaches page fault paths about mmrange (specifically adding the
> range in question to the struct vm_fault). In addition, most of these patches
> update mmap_sem callers that call into the 2a and 2b examples above.
>
> Patches 15-63: adds most of the trivial conversions -- the (1) example above.
> (patches 21, 22, 23 are hacks that avoid rwsem_is_locked(mmap_sem) such that
> we don't have to teach file_operations about mmrange.
>
> Patch 64: finally do the actual conversion and replace mmap_sem with the range
> mmap_lock.
>
> I've run the series on a 40-core (ht) 2-socket IvyBridge with 16 Gb of memory
> on various benchmarks that stress address space concurrency.
>
> ** pft is a microbenchmark for page fault rates.
>
> When running with increasing thread counts, range locking takes a rather small
> hit (yet constant) of ~2% for the pft timings, with a max of 5%. This translates
> similarly to faults/cpu.
>
>
> pft timings
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> Amean system-1 1.11 ( 0.00%) 1.17 ( -5.86%)
> Amean system-4 1.14 ( 0.00%) 1.18 ( -3.07%)
> Amean system-7 1.38 ( 0.00%) 1.36 ( 0.94%)
> Amean system-12 2.28 ( 0.00%) 2.31 ( -1.18%)
> Amean system-21 4.11 ( 0.00%) 4.13 ( -0.44%)
> Amean system-30 5.94 ( 0.00%) 6.01 ( -1.11%)
> Amean system-40 8.24 ( 0.00%) 8.33 ( -1.04%)
> Amean elapsed-1 1.28 ( 0.00%) 1.33 ( -4.50%)
> Amean elapsed-4 0.32 ( 0.00%) 0.34 ( -5.27%)
> Amean elapsed-7 0.24 ( 0.00%) 0.24 ( -0.43%)
> Amean elapsed-12 0.23 ( 0.00%) 0.23 ( -0.22%)
> Amean elapsed-21 0.26 ( 0.00%) 0.25 ( 0.39%)
> Amean elapsed-30 0.24 ( 0.00%) 0.24 ( -0.21%)
> Amean elapsed-40 0.24 ( 0.00%) 0.24 ( 0.84%)
> Stddev system-1 0.04 ( 0.00%) 0.05 ( -16.29%)
> Stddev system-4 0.03 ( 0.00%) 0.03 ( 17.70%)
> Stddev system-7 0.08 ( 0.00%) 0.02 ( 68.56%)
> Stddev system-12 0.05 ( 0.00%) 0.06 ( -31.22%)
> Stddev system-21 0.06 ( 0.00%) 0.06 ( 8.07%)
> Stddev system-30 0.05 ( 0.00%) 0.09 ( -70.15%)
> Stddev system-40 0.11 ( 0.00%) 0.07 ( 41.53%)
> Stddev elapsed-1 0.03 ( 0.00%) 0.05 ( -72.14%)
> Stddev elapsed-4 0.01 ( 0.00%) 0.01 ( -4.98%)
> Stddev elapsed-7 0.01 ( 0.00%) 0.01 ( 60.65%)
> Stddev elapsed-12 0.01 ( 0.00%) 0.01 ( 6.24%)
> Stddev elapsed-21 0.01 ( 0.00%) 0.01 ( -1.13%)
> Stddev elapsed-30 0.00 ( 0.00%) 0.00 ( -45.10%)
> Stddev elapsed-40 0.01 ( 0.00%) 0.01 ( 25.97%)
>
> pft faults
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> Hmean faults/cpu-1 629011.4218 ( 0.00%) 601523.2875 ( -4.37%)
> Hmean faults/cpu-4 630952.1771 ( 0.00%) 602105.6527 ( -4.57%)
> Hmean faults/cpu-7 518412.2806 ( 0.00%) 518082.2585 ( -0.06%)
> Hmean faults/cpu-12 324957.1130 ( 0.00%) 321678.8932 ( -1.01%)
> Hmean faults/cpu-21 182712.2633 ( 0.00%) 182643.5347 ( -0.04%)
> Hmean faults/cpu-30 126618.2558 ( 0.00%) 125698.1965 ( -0.73%)
> Hmean faults/cpu-40 91266.3914 ( 0.00%) 90614.9956 ( -0.71%)
> Hmean faults/sec-1 628010.9821 ( 0.00%) 600700.3641 ( -4.35%)
> Hmean faults/sec-4 2475859.3012 ( 0.00%) 2351373.1960 ( -5.03%)
> Hmean faults/sec-7 3372026.7978 ( 0.00%) 3408924.8028 ( 1.09%)
> Hmean faults/sec-12 3517750.6290 ( 0.00%) 3488785.0815 ( -0.82%)
> Hmean faults/sec-21 3151328.9188 ( 0.00%) 3156983.9401 ( 0.18%)
> Hmean faults/sec-30 3324673.3141 ( 0.00%) 3318585.9949 ( -0.18%)
> Hmean faults/sec-40 3362503.8992 ( 0.00%) 3410086.6644 ( 1.42%)
> Stddev faults/cpu-1 14795.1817 ( 0.00%) 22870.4755 ( -54.58%)
> Stddev faults/cpu-4 8759.4355 ( 0.00%) 8117.4629 ( 7.33%)
> Stddev faults/cpu-7 20638.6659 ( 0.00%) 2290.0083 ( 88.90%)
> Stddev faults/cpu-12 4003.9838 ( 0.00%) 5297.7747 ( -32.31%)
> Stddev faults/cpu-21 2127.4059 ( 0.00%) 1186.5330 ( 44.23%)
> Stddev faults/cpu-30 558.8082 ( 0.00%) 1366.5374 (-144.54%)
> Stddev faults/cpu-40 1234.8354 ( 0.00%) 768.8031 ( 37.74%)
> Stddev faults/sec-1 14757.0434 ( 0.00%) 22740.7172 ( -54.10%)
> Stddev faults/sec-4 49934.6675 ( 0.00%) 54133.9449 ( -8.41%)
> Stddev faults/sec-7 152781.8690 ( 0.00%) 16415.0736 ( 89.26%)
> Stddev faults/sec-12 228697.8709 ( 0.00%) 239575.3690 ( -4.76%)
> Stddev faults/sec-21 70244.4600 ( 0.00%) 75031.5776 ( -6.81%)
> Stddev faults/sec-30 52147.1842 ( 0.00%) 58651.5496 ( -12.47%)
> Stddev faults/sec-40 149846.3761 ( 0.00%) 113646.0640 ( 24.16%)
>
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> User 47.46 48.21
> System 540.43 546.03
> Elapsed 61.85 64.33
>
> ** gitcheckout is probably the workload that takes the biggest hit (-35%).
> Sys time, as expected, increases quite a bit, coming from overhead of blocking.
>
> gitcheckout
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> System mean 9.49 ( 0.00%) 9.82 ( -3.49%)
> System stddev 0.20 ( 0.00%) 0.39 ( -95.73%)
> Elapsed mean 22.87 ( 0.00%) 30.90 ( -35.12%)
> Elapsed stddev 0.39 ( 0.00%) 6.32 (-1526.48%)
> CPU mean 98.07 ( 0.00%) 76.27 ( 22.23%)
> CPU stddev 0.70 ( 0.00%) 14.63 (-1978.37%)
>
>
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> User 224.06 224.80
> System 176.05 181.01
> Elapsed 619.51 801.78
>
>
> ** freqmine is an implementation of Frequent Itemsets Mining (FIM) that
> analyses a set of transactions looking to extract association rules with
> threads. This is a common workload in retail. This configuration uses
> between 2 and 4*NUMCPUs. The performance differences with this patchset
> are marginal.
>
> freqmine-large
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> Amean 2 216.89 ( 0.00%) 216.59 ( 0.14%)
> Amean 5 91.56 ( 0.00%) 91.58 ( -0.02%)
> Amean 8 59.41 ( 0.00%) 59.54 ( -0.22%)
> Amean 12 44.19 ( 0.00%) 44.24 ( -0.12%)
> Amean 21 33.97 ( 0.00%) 33.55 ( 1.25%)
> Amean 30 33.28 ( 0.00%) 33.15 ( 0.40%)
> Amean 48 34.38 ( 0.00%) 34.21 ( 0.48%)
> Amean 79 33.22 ( 0.00%) 32.83 ( 1.19%)
> Amean 110 36.15 ( 0.00%) 35.29 ( 2.40%)
> Amean 141 35.63 ( 0.00%) 36.38 ( -2.12%)
> Amean 160 36.31 ( 0.00%) 36.05 ( 0.73%)
> Stddev 2 1.10 ( 0.00%) 0.19 ( 82.79%)
> Stddev 5 0.23 ( 0.00%) 0.10 ( 54.31%)
> Stddev 8 0.17 ( 0.00%) 0.43 (-146.19%)
> Stddev 12 0.12 ( 0.00%) 0.12 ( -0.05%)
> Stddev 21 0.49 ( 0.00%) 0.39 ( 21.88%)
> Stddev 30 1.07 ( 0.00%) 0.93 ( 12.61%)
> Stddev 48 0.76 ( 0.00%) 0.66 ( 12.07%)
> Stddev 79 0.29 ( 0.00%) 0.58 ( -98.77%)
> Stddev 110 1.10 ( 0.00%) 0.53 ( 51.93%)
> Stddev 141 0.66 ( 0.00%) 0.79 ( -18.83%)
> Stddev 160 0.27 ( 0.00%) 0.15 ( 42.71%)
>
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> User 29346.21 28818.39
> System 292.18 676.92
> Elapsed 2622.81 2615.77
>
>
> ** kernbench (build kernels). With increasing thread counts, the amount of
> overhead from range locking is no more than ~5%.
>
> kernbench
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> Amean user-2 554.53 ( 0.00%) 555.74 ( -0.22%)
> Amean user-4 566.23 ( 0.00%) 567.15 ( -0.16%)
> Amean user-8 588.66 ( 0.00%) 589.68 ( -0.17%)
> Amean user-16 647.97 ( 0.00%) 648.46 ( -0.08%)
> Amean user-32 923.05 ( 0.00%) 925.25 ( -0.24%)
> Amean user-64 1066.74 ( 0.00%) 1067.11 ( -0.03%)
> Amean user-80 1082.50 ( 0.00%) 1082.11 ( 0.04%)
> Amean syst-2 71.80 ( 0.00%) 74.90 ( -4.31%)
> Amean syst-4 76.77 ( 0.00%) 79.91 ( -4.10%)
> Amean syst-8 71.58 ( 0.00%) 74.83 ( -4.54%)
> Amean syst-16 79.21 ( 0.00%) 82.95 ( -4.73%)
> Amean syst-32 104.21 ( 0.00%) 108.47 ( -4.09%)
> Amean syst-64 113.69 ( 0.00%) 119.39 ( -5.02%)
> Amean syst-80 113.98 ( 0.00%) 120.18 ( -5.44%)
> Amean elsp-2 307.65 ( 0.00%) 309.27 ( -0.53%)
> Amean elsp-4 159.86 ( 0.00%) 160.94 ( -0.67%)
> Amean elsp-8 84.76 ( 0.00%) 85.04 ( -0.33%)
> Amean elsp-16 49.63 ( 0.00%) 49.56 ( 0.15%)
> Amean elsp-32 37.52 ( 0.00%) 38.16 ( -1.68%)
> Amean elsp-64 36.76 ( 0.00%) 37.03 ( -0.72%)
> Amean elsp-80 37.09 ( 0.00%) 37.49 ( -1.08%)
> Stddev user-2 0.97 ( 0.00%) 0.66 ( 32.20%)
> Stddev user-4 0.52 ( 0.00%) 0.60 ( -17.34%)
> Stddev user-8 0.64 ( 0.00%) 0.23 ( 63.28%)
> Stddev user-16 1.40 ( 0.00%) 0.64 ( 54.46%)
> Stddev user-32 1.32 ( 0.00%) 0.95 ( 28.47%)
> Stddev user-64 0.77 ( 0.00%) 1.47 ( -91.61%)
> Stddev user-80 1.12 ( 0.00%) 0.94 ( 16.00%)
> Stddev syst-2 0.45 ( 0.00%) 0.45 ( 0.22%)
> Stddev syst-4 0.41 ( 0.00%) 0.58 ( -41.24%)
> Stddev syst-8 0.55 ( 0.00%) 0.28 ( 49.35%)
> Stddev syst-16 0.22 ( 0.00%) 0.29 ( -30.98%)
> Stddev syst-32 0.44 ( 0.00%) 0.56 ( -27.75%)
> Stddev syst-64 0.47 ( 0.00%) 0.48 ( -1.91%)
> Stddev syst-80 0.24 ( 0.00%) 0.60 (-144.20%)
> Stddev elsp-2 0.46 ( 0.00%) 0.31 ( 32.97%)
> Stddev elsp-4 0.14 ( 0.00%) 0.25 ( -72.38%)
> Stddev elsp-8 0.36 ( 0.00%) 0.08 ( 77.92%)
> Stddev elsp-16 0.74 ( 0.00%) 0.58 ( 22.00%)
> Stddev elsp-32 0.31 ( 0.00%) 0.74 (-138.95%)
> Stddev elsp-64 0.12 ( 0.00%) 0.12 ( 1.62%)
> Stddev elsp-80 0.23 ( 0.00%) 0.15 ( 35.38%)
>
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> User 28309.95 28341.20
> System 3320.18 3473.73
> Elapsed 3792.13 3850.21
>
>
>
> ** reaim's compute, new_dbase and shared workloads were tested, with
> the new dbase one taking up to a 20% hit, which is expected as this
> microbenchmark context switches a lot and benefits from reducing them
> via the spin-on-owner feature that range locks lack. Compute, on the
> other hand, was boosted at higher thread counts.
>
> reaim
> v4.15-rc8 v4.15-rc8
> range-mmap_lock-v1
> Hmean compute-1 5652.98 ( 0.00%) 5738.64 ( 1.52%)
> Hmean compute-21 81997.42 ( 0.00%) 81997.42 ( -0.00%)
> Hmean compute-41 135622.27 ( 0.00%) 138959.73 ( 2.46%)
> Hmean compute-61 179272.55 ( 0.00%) 174367.92 ( -2.74%)
> Hmean compute-81 200187.60 ( 0.00%) 195250.60 ( -2.47%)
> Hmean compute-101 207337.40 ( 0.00%) 187633.35 ( -9.50%)
> Hmean compute-121 179018.55 ( 0.00%) 206087.69 ( 15.12%)
> Hmean compute-141 175887.20 ( 0.00%) 195528.60 ( 11.17%)
> Hmean compute-161 198063.33 ( 0.00%) 190335.54 ( -3.90%)
> Hmean new_dbase-1 56.64 ( 0.00%) 60.76 ( 7.27%)
> Hmean new_dbase-21 11149.48 ( 0.00%) 10082.35 ( -9.57%)
> Hmean new_dbase-41 25161.87 ( 0.00%) 21626.83 ( -14.05%)
> Hmean new_dbase-61 39858.32 ( 0.00%) 33956.04 ( -14.81%)
> Hmean new_dbase-81 55057.19 ( 0.00%) 43879.73 ( -20.30%)
> Hmean new_dbase-101 67566.57 ( 0.00%) 56323.77 ( -16.64%)
> Hmean new_dbase-121 79517.22 ( 0.00%) 64877.67 ( -18.41%)
> Hmean new_dbase-141 92365.91 ( 0.00%) 76571.18 ( -17.10%)
> Hmean new_dbase-161 101590.77 ( 0.00%) 85332.76 ( -16.00%)
> Hmean shared-1 71.26 ( 0.00%) 76.43 ( 7.26%)
> Hmean shared-21 11546.39 ( 0.00%) 10521.92 ( -8.87%)
> Hmean shared-41 28302.97 ( 0.00%) 22116.50 ( -21.86%)
> Hmean shared-61 23814.56 ( 0.00%) 21886.13 ( -8.10%)
> Hmean shared-81 11578.89 ( 0.00%) 16423.55 ( 41.84%)
> Hmean shared-101 9991.41 ( 0.00%) 11378.95 ( 13.89%)
> Hmean shared-121 9884.83 ( 0.00%) 10010.92 ( 1.28%)
> Hmean shared-141 9911.88 ( 0.00%) 9637.14 ( -2.77%)
> Hmean shared-161 8587.14 ( 0.00%) 9613.53 ( 11.95%)
> Stddev compute-1 94.42 ( 0.00%) 166.37 ( -76.20%)
> Stddev compute-21 1915.36 ( 0.00%) 2582.96 ( -34.85%)
> Stddev compute-41 4822.88 ( 0.00%) 6057.32 ( -25.60%)
> Stddev compute-61 4425.14 ( 0.00%) 3676.90 ( 16.91%)
> Stddev compute-81 5549.60 ( 0.00%) 17213.90 (-210.18%)
> Stddev compute-101 19395.33 ( 0.00%) 28315.96 ( -45.99%)
> Stddev compute-121 16140.56 ( 0.00%) 27927.63 ( -73.03%)
> Stddev compute-141 9616.27 ( 0.00%) 31273.43 (-225.21%)
> Stddev compute-161 34746.00 ( 0.00%) 20706.81 ( 40.41%)
> Stddev new_dbase-1 1.08 ( 0.00%) 0.80 ( 25.62%)
> Stddev new_dbase-21 356.67 ( 0.00%) 297.23 ( 16.66%)
> Stddev new_dbase-41 739.68 ( 0.00%) 1287.72 ( -74.09%)
> Stddev new_dbase-61 896.06 ( 0.00%) 1293.55 ( -44.36%)
> Stddev new_dbase-81 2003.96 ( 0.00%) 2018.08 ( -0.70%)
> Stddev new_dbase-101 2101.25 ( 0.00%) 3461.91 ( -64.75%)
> Stddev new_dbase-121 3294.30 ( 0.00%) 3917.20 ( -18.91%)
> Stddev new_dbase-141 3488.81 ( 0.00%) 5242.36 ( -50.26%)
> Stddev new_dbase-161 2744.12 ( 0.00%) 5262.36 ( -91.77%)
> Stddev shared-1 1.38 ( 0.00%) 1.24 ( 9.84%)
> Stddev shared-21 1930.40 ( 0.00%) 232.81 ( 87.94%)
> Stddev shared-41 1939.93 ( 0.00%) 2316.09 ( -19.39%)
> Stddev shared-61 15001.13 ( 0.00%) 12004.82 ( 19.97%)
> Stddev shared-81 1313.02 ( 0.00%) 14583.51 (-1010.68%)
> Stddev shared-101 355.44 ( 0.00%) 393.79 ( -10.79%)
> Stddev shared-121 1736.68 ( 0.00%) 782.50 ( 54.94%)
> Stddev shared-141 1865.93 ( 0.00%) 1140.24 ( 38.89%)
> Stddev shared-161 1155.19 ( 0.00%) 2045.55 ( -77.07%)
>
> Overall sys% always increases, which is expected, but with the exception
> of git-checkout, the worst case scenario is not that excruciating.
>
> Full test and details (including sysbench oltp mysql and specjbb) can be found here:
> https://linux-scalability.org/range-mmap_lock/tweed-results/
>
> Testing: I have setup an mmtests config file with all the workloads described:
> http://linux-scalability.org/mmtests-config

Is this link still valid? I can't reach it.

Thanks,
Laurent.

>
> Applies on top of linux-next (20180202). At least compile tested on
> the following architectures:
>
> x86_64, alpha, arm32, blackfin, cris, frv, ia64, m32r, m68k, mips, microblaze
> ppc, s390, sparc, tile and xtensa.
>
>
> Thanks!
>
> Davidlohr Bueso (64):
> interval-tree: build unconditionally
> Introduce range reader/writer lock
> mm: introduce mm locking wrappers
> mm: add a range parameter to the vm_fault structure
> mm,khugepaged: prepare passing of rangelock field to vm_fault
> mm: teach pagefault paths about range locking
> mm/hugetlb: teach hugetlb_fault() about range locking
> mm: teach lock_page_or_retry() about range locking
> mm/mmu_notifier: teach oom reaper about range locking
> kernel/exit: teach exit_mm() about range locking
> prctl: teach about range locking
> fs/userfaultfd: teach userfaultfd_must_wait() about range locking
> fs/proc: teach about range locking
> fs/coredump: teach about range locking
> ipc: use mm locking wrappers
> virt: use mm locking wrappers
> kernel: use mm locking wrappers
> mm/ksm: teach about range locking
> mm/mlock: use mm locking wrappers
> mm/madvise: use mm locking wrappers
> mm: teach drop/take_all_locks() about range locking
> mm: avoid mmap_sem trylock in vm_insert_page()
> mm: huge pagecache: do not check mmap_sem state
> mm/thp: disable mmap_sem is_locked checks
> mm: use mm locking wrappers
> fs: use mm locking wrappers
> arch/{x86,sh,ppc}: teach bad_area() about range locking
> arch/x86: use mm locking wrappers
> arch/alpha: use mm locking wrappers
> arch/tile: use mm locking wrappers
> arch/sparc: use mm locking wrappers
> arch/s390: use mm locking wrappers
> arch/powerpc: use mm locking wrappers
> arch/parisc: use mm locking wrappers
> arch/ia64: use mm locking wrappers
> arch/mips: use mm locking wrappers
> arch/arc: use mm locking wrappers
> arch/blackfin: use mm locking wrappers
> arch/m68k: use mm locking wrappers
> arch/sh: use mm locking wrappers
> arch/cris: use mm locking wrappers
> arch/frv: use mm locking wrappers
> arch/hexagon: use mm locking wrappers
> arch/score: use mm locking wrappers
> arch/m32r: use mm locking wrappers
> arch/metag: use mm locking wrappers
> arch/microblaze: use mm locking wrappers
> arch/tile: use mm locking wrappers
> arch/xtensa: use mm locking wrappers
> arch/unicore32: use mm locking wrappers
> arch/mn10300: use mm locking wrappers
> arch/openrisc: use mm locking wrappers
> arch/nios2: use mm locking wrappers
> arch/arm: use mm locking wrappers
> arch/riscv: use mm locking wrappers
> drivers/android: use mm locking wrappers
> drivers/gpu: use mm locking wrappers
> drivers/infiniband: use mm locking wrappers
> drivers/iommu: use mm locking helpers
> drivers/xen: use mm locking wrappers
> staging/lustre: use generic range lock
> drivers: use mm locking wrappers (the rest)
> mm/mmap: hack drop down_write_nest_lock()
> mm: convert mmap_sem to range mmap_lock
>
> arch/alpha/kernel/traps.c | 6 +-
> arch/alpha/mm/fault.c | 13 +-
> arch/arc/kernel/troubleshoot.c | 5 +-
> arch/arc/mm/fault.c | 15 +-
> arch/arm/kernel/process.c | 5 +-
> arch/arm/kernel/swp_emulate.c | 5 +-
> arch/arm/lib/uaccess_with_memcpy.c | 18 +-
> arch/arm/mm/fault.c | 14 +-
> arch/arm/probes/uprobes/core.c | 5 +-
> arch/arm64/kernel/traps.c | 5 +-
> arch/arm64/kernel/vdso.c | 12 +-
> arch/arm64/mm/fault.c | 13 +-
> arch/blackfin/kernel/ptrace.c | 5 +-
> arch/blackfin/kernel/trace.c | 7 +-
> arch/cris/mm/fault.c | 13 +-
> arch/frv/mm/fault.c | 13 +-
> arch/hexagon/kernel/vdso.c | 5 +-
> arch/hexagon/mm/vm_fault.c | 11 +-
> arch/ia64/kernel/perfmon.c | 10 +-
> arch/ia64/mm/fault.c | 13 +-
> arch/ia64/mm/init.c | 13 +-
> arch/m32r/mm/fault.c | 15 +-
> arch/m68k/kernel/sys_m68k.c | 18 +-
> arch/m68k/mm/fault.c | 11 +-
> arch/metag/mm/fault.c | 13 +-
> arch/microblaze/mm/fault.c | 15 +-
> arch/mips/kernel/traps.c | 5 +-
> arch/mips/kernel/vdso.c | 7 +-
> arch/mips/mm/c-octeon.c | 5 +-
> arch/mips/mm/c-r4k.c | 5 +-
> arch/mips/mm/fault.c | 13 +-
> arch/mn10300/mm/fault.c | 13 +-
> arch/nios2/mm/fault.c | 15 +-
> arch/nios2/mm/init.c | 5 +-
> arch/openrisc/kernel/dma.c | 6 +-
> arch/openrisc/mm/fault.c | 13 +-
> arch/parisc/kernel/traps.c | 7 +-
> arch/parisc/mm/fault.c | 11 +-
> arch/powerpc/include/asm/mmu_context.h | 3 +-
> arch/powerpc/include/asm/powernv.h | 5 +-
> arch/powerpc/kernel/vdso.c | 7 +-
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +-
> arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 +-
> arch/powerpc/kvm/book3s_64_vio.c | 5 +-
> arch/powerpc/kvm/book3s_hv.c | 7 +-
> arch/powerpc/kvm/e500_mmu_host.c | 5 +-
> arch/powerpc/mm/copro_fault.c | 8 +-
> arch/powerpc/mm/fault.c | 35 +-
> arch/powerpc/mm/mmu_context_iommu.c | 5 +-
> arch/powerpc/mm/subpage-prot.c | 13 +-
> arch/powerpc/oprofile/cell/spu_task_sync.c | 7 +-
> arch/powerpc/platforms/cell/spufs/file.c | 6 +-
> arch/powerpc/platforms/powernv/npu-dma.c | 7 +-
> arch/riscv/kernel/vdso.c | 5 +-
> arch/riscv/mm/fault.c | 13 +-
> arch/s390/include/asm/gmap.h | 14 +-
> arch/s390/kernel/vdso.c | 5 +-
> arch/s390/kvm/gaccess.c | 35 +-
> arch/s390/kvm/kvm-s390.c | 24 +-
> arch/s390/kvm/priv.c | 29 +-
> arch/s390/mm/fault.c | 9 +-
> arch/s390/mm/gmap.c | 125 ++--
> arch/s390/pci/pci_mmio.c | 5 +-
> arch/score/mm/fault.c | 13 +-
> arch/sh/kernel/sys_sh.c | 7 +-
> arch/sh/kernel/vsyscall/vsyscall.c | 5 +-
> arch/sh/mm/fault.c | 50 +-
> arch/sparc/mm/fault_32.c | 24 +-
> arch/sparc/mm/fault_64.c | 15 +-
> arch/sparc/vdso/vma.c | 5 +-
> arch/tile/kernel/stack.c | 5 +-
> arch/tile/mm/elf.c | 12 +-
> arch/tile/mm/fault.c | 15 +-
> arch/tile/mm/pgtable.c | 6 +-
> arch/um/include/asm/mmu_context.h | 8 +-
> arch/um/kernel/tlb.c | 12 +-
> arch/um/kernel/trap.c | 9 +-
> arch/unicore32/mm/fault.c | 14 +-
> arch/x86/entry/vdso/vma.c | 14 +-
> arch/x86/events/core.c | 2 +-
> arch/x86/include/asm/mmu_context.h | 5 +-
> arch/x86/include/asm/mpx.h | 6 +-
> arch/x86/kernel/tboot.c | 2 +-
> arch/x86/kernel/vm86_32.c | 5 +-
> arch/x86/mm/debug_pagetables.c | 13 +-
> arch/x86/mm/fault.c | 40 +-
> arch/x86/mm/mpx.c | 55 +-
> arch/x86/um/vdso/vma.c | 5 +-
> arch/xtensa/mm/fault.c | 13 +-
> drivers/android/binder_alloc.c | 12 +-
> drivers/gpu/drm/Kconfig | 2 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 7 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 11 +-
> drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 +-
> drivers/gpu/drm/i915/Kconfig | 1 -
> drivers/gpu/drm/i915/i915_gem.c | 5 +-
> drivers/gpu/drm/i915/i915_gem_userptr.c | 13 +-
> drivers/gpu/drm/radeon/radeon_cs.c | 5 +-
> drivers/gpu/drm/radeon/radeon_gem.c | 7 +-
> drivers/gpu/drm/radeon/radeon_mn.c | 7 +-
> drivers/gpu/drm/radeon/radeon_ttm.c | 4 +-
> drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 +-
> drivers/infiniband/core/umem.c | 19 +-
> drivers/infiniband/core/umem_odp.c | 14 +-
> drivers/infiniband/hw/hfi1/user_pages.c | 15 +-
> drivers/infiniband/hw/mlx4/main.c | 5 +-
> drivers/infiniband/hw/mlx5/main.c | 5 +-
> drivers/infiniband/hw/qib/qib_user_pages.c | 17 +-
> drivers/infiniband/hw/usnic/usnic_uiom.c | 19 +-
> drivers/iommu/amd_iommu_v2.c | 9 +-
> drivers/iommu/intel-svm.c | 9 +-
> drivers/media/v4l2-core/videobuf-core.c | 5 +-
> drivers/media/v4l2-core/videobuf-dma-contig.c | 5 +-
> drivers/media/v4l2-core/videobuf-dma-sg.c | 22 +-
> drivers/misc/cxl/cxllib.c | 5 +-
> drivers/misc/cxl/fault.c | 5 +-
> drivers/misc/mic/scif/scif_rma.c | 17 +-
> drivers/misc/sgi-gru/grufault.c | 91 +--
> drivers/misc/sgi-gru/grufile.c | 5 +-
> drivers/oprofile/buffer_sync.c | 12 +-
> drivers/staging/lustre/lustre/llite/Makefile | 2 +-
> drivers/staging/lustre/lustre/llite/file.c | 16 +-
> .../staging/lustre/lustre/llite/llite_internal.h | 4 +-
> drivers/staging/lustre/lustre/llite/llite_mmap.c | 4 +-
> drivers/staging/lustre/lustre/llite/range_lock.c | 240 --------
> drivers/staging/lustre/lustre/llite/range_lock.h | 83 ---
> drivers/staging/lustre/lustre/llite/vvp_io.c | 7 +-
> .../media/atomisp/pci/atomisp2/hmm/hmm_bo.c | 5 +-
> drivers/tee/optee/call.c | 5 +-
> drivers/vfio/vfio_iommu_spapr_tce.c | 8 +-
> drivers/vfio/vfio_iommu_type1.c | 16 +-
> drivers/xen/gntdev.c | 5 +-
> drivers/xen/privcmd.c | 12 +-
> fs/aio.c | 7 +-
> fs/binfmt_elf.c | 3 +-
> fs/coredump.c | 5 +-
> fs/exec.c | 38 +-
> fs/proc/base.c | 33 +-
> fs/proc/internal.h | 3 +
> fs/proc/task_mmu.c | 51 +-
> fs/proc/task_nommu.c | 22 +-
> fs/proc/vmcore.c | 14 +-
> fs/userfaultfd.c | 64 +-
> include/asm-generic/mm_hooks.h | 3 +-
> include/linux/hmm.h | 4 +-
> include/linux/huge_mm.h | 2 -
> include/linux/hugetlb.h | 9 +-
> include/linux/ksm.h | 6 +-
> include/linux/lockdep.h | 33 +
> include/linux/migrate.h | 4 +-
> include/linux/mm.h | 159 ++++-
> include/linux/mm_types.h | 4 +-
> include/linux/mmu_notifier.h | 6 +-
> include/linux/pagemap.h | 7 +-
> include/linux/range_lock.h | 189 ++++++
> include/linux/uprobes.h | 15 +-
> include/linux/userfaultfd_k.h | 5 +-
> ipc/shm.c | 22 +-
> kernel/acct.c | 5 +-
> kernel/events/core.c | 5 +-
> kernel/events/uprobes.c | 66 +-
> kernel/exit.c | 9 +-
> kernel/fork.c | 18 +-
> kernel/futex.c | 7 +-
> kernel/locking/Makefile | 2 +-
> kernel/locking/range_lock.c | 667 +++++++++++++++++++++
> kernel/sched/fair.c | 5 +-
> kernel/sys.c | 22 +-
> kernel/trace/trace_output.c | 5 +-
> lib/Kconfig | 14 -
> lib/Kconfig.debug | 1 -
> lib/Makefile | 3 +-
> mm/filemap.c | 9 +-
> mm/frame_vector.c | 8 +-
> mm/gup.c | 79 ++-
> mm/hmm.c | 37 +-
> mm/hugetlb.c | 16 +-
> mm/init-mm.c | 2 +-
> mm/internal.h | 3 +-
> mm/khugepaged.c | 57 +-
> mm/ksm.c | 64 +-
> mm/madvise.c | 80 ++-
> mm/memcontrol.c | 21 +-
> mm/memory.c | 30 +-
> mm/mempolicy.c | 56 +-
> mm/migrate.c | 30 +-
> mm/mincore.c | 28 +-
> mm/mlock.c | 49 +-
> mm/mmap.c | 145 +++--
> mm/mmu_notifier.c | 14 +-
> mm/mprotect.c | 28 +-
> mm/mremap.c | 34 +-
> mm/msync.c | 9 +-
> mm/nommu.c | 55 +-
> mm/oom_kill.c | 11 +-
> mm/pagewalk.c | 60 +-
> mm/process_vm_access.c | 8 +-
> mm/shmem.c | 2 +-
> mm/swapfile.c | 7 +-
> mm/userfaultfd.c | 24 +-
> mm/util.c | 12 +-
> security/tomoyo/domain.c | 3 +-
> virt/kvm/arm/mmu.c | 17 +-
> virt/kvm/async_pf.c | 7 +-
> virt/kvm/kvm_main.c | 25 +-
> 205 files changed, 2817 insertions(+), 1651 deletions(-)
> delete mode 100644 drivers/staging/lustre/lustre/llite/range_lock.c
> delete mode 100644 drivers/staging/lustre/lustre/llite/range_lock.h
> create mode 100644 include/linux/range_lock.h
> create mode 100644 kernel/locking/range_lock.c
>


2018-02-06 18:44:44

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 06/64] mm: teach pagefault paths about range locking

On Mon, 05 Feb 2018, Laurent Dufour wrote:

>> --- a/drivers/misc/sgi-gru/grufault.c
>> +++ b/drivers/misc/sgi-gru/grufault.c
>> @@ -189,7 +189,8 @@ static void get_clear_fault_map(struct gru_state *gru,
>> */
>> static int non_atomic_pte_lookup(struct vm_area_struct *vma,
>> unsigned long vaddr, int write,
>> - unsigned long *paddr, int *pageshift)
>> + unsigned long *paddr, int *pageshift,
>> + struct range_lock *mmrange)
>> {
>> struct page *page;
>>
>> @@ -198,7 +199,8 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
>> #else
>> *pageshift = PAGE_SHIFT;
>> #endif
>> - if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
>> + if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0,
>> + &page, NULL, mmrange) <= 0)
>
>There is no need to pass down the range here, since the underlying
>__get_user_pages_locked() call is told not to unlock the mmap_sem.
>In general get_user_pages() doesn't need a range parameter.

Yeah, you're right. At least it was a productive exercise for auditing.
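
For the archives, a caller-side sketch of the distinction (using the
signatures from this series; the function name and variables are
placeholders, and both calls are shown only to contrast them):

static long gup_example(unsigned long addr, struct page **page)
{
        long ret;
        int locked = 1;
        DEFINE_RANGE_LOCK_FULL(mmrange);

        down_read(&current->mm->mmap_sem);
        /* Never drops mmap_sem internally, so the range is redundant here: */
        ret = get_user_pages(addr, 1, FOLL_WRITE, page, NULL, &mmrange);
        /* May drop and retake the lock, so it must know which range to retake: */
        ret = get_user_pages_locked(addr, 1, FOLL_WRITE, page, &locked,
                                    &mmrange);
        if (locked)
                up_read(&current->mm->mmap_sem);
        return ret;
}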

Thanks,
Davidlohr

2018-02-06 19:00:08

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [RFC PATCH 00/64] mm: towards parallel address space operations

On Mon, 05 Feb 2018, Laurent Dufour wrote:

>On 05/02/2018 02:26, Davidlohr Bueso wrote:
>> From: Davidlohr Bueso <[email protected]>
>>
>> Hi,
>>
>> This patchset is a new version of both the range locking machinery as well
>> as a full mmap_sem conversion that makes use of it -- as the worst case
>> scenario as all mmap_sem calls are converted to a full range mmap_lock
>> equivalent. As such, while there is no improvement of concurrency perse,
>> these changes aim at adding the machinery to permit this in the future.
>
>Despite the massive rebase, what are the changes in this series compared to
>the one I sent in last May - you silently based on, by the way :
>https://lkml.org/lkml/2017/5/24/409

Hardly, but yes, I meant to reference that. It ended up being easier to just
do a from-scratch version. I haven't done a detailed comparison, but at first
I thought you had missed the gup users (now not so much); this patchset also
allows testing on more archs (see below), removes the trylock in
vm_insert_page(), etc.
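
On that last point, the issue is asserting lock ownership via a failed
trylock; roughly the pattern being removed looks like this (paraphrased, not
a verbatim quote of mm/memory.c):

static void assert_mmap_sem_write_held(struct mm_struct *mm)
{
        /* If a read trylock succeeds, nobody holds mmap_sem for writing. */
        BUG_ON(down_read_trylock(&mm->mmap_sem));
}

With a range lock that question only makes sense for a specific range, so
keeping such checks would mean plumbing mmrange around (e.g. into
file_operations) just for the assertion; dropping them is simpler.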

>>
>> Direct users of the mm->mmap_sem can be classified as those that (1) acquire
>> and release the lock within the same context, and (2) those who directly
>> manipulate the mmap_sem down the callchain. For example:
>>
>> (1) down_read(&mm->mmap_sem);
>> /* do something */
>> /* nobody down the chain uses mmap_sem directly */
>> up_read(&mm->mmap_sem);
>>
>> (2a) down_read(&mm->mmap_sem);
>> /* do something that retuns mmap_sem unlocked */
>> fn(mm, &locked);
>> if (locked)
>> up_read(&mm->mmap_sem);
>>
>> (2b) down_read(&mm->mmap_sem);
>> /* do something that in between released and reacquired mmap_sem */
>> fn(mm);
>> up_read(&mm->mmap_sem);
>
>Unfortunately, there are also indirect users which rely on the mmap_sem
>locking to protect their data. For the first step using a full range this
>doesn't matter, but when refining the range, these one would be the most
>critical ones as they would have to be reworked to take the range in account.

Of course. The value I see in this patchset is that we can decide whether or
not to move forward based on the worst-case-scenario numbers.

>> Testing: I have setup an mmtests config file with all the workloads described:
>> http://linux-scalability.org/mmtests-config
>
>Is this link still valid, I can't reach it ?

Sorry, that should have been:

https://linux-scalability.org/range-mmap_lock/mmtests-config

Thanks,
Davidlohr