2020-10-09 19:59:50

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM

From: Ira Weiny <[email protected]>

Should a stray write occur in the kernel, persistent memory is affected more
than regular memory. A write to the wrong area of memory could result in
latent data corruption which will persist after a reboot. PKS provides a
nice way to restrict access to persistent memory kernel mappings, while
providing fast access when needed.

Since the last RFC[1] this patch set has grown quite a bit. It now depends on
the core patches submitted separately.

https://lore.kernel.org/lkml/[email protected]/

The series is also contained in the git tree here:

https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3

However, functionally there is only one major change from the last RFC.
Specifically, kmap() is most often used within a single thread in a 'map/do
something/unmap' pattern. In fact, this is the pattern used in ~90% of the
callers of kmap(). This pattern works very well for the pmem use case and the
testing which was done. However, there are another ~20-30 kmap() users which
do not follow this pattern. Some of them seem to expect the mapping to be
'global' while others require a detailed audit to be sure.[2][3]
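
For reference, a minimal sketch of the thread local pattern (the memcpy()
here is only illustrative of the 'do something' step):

	void *addr = kmap_thread(page);		/* thread local mapping */

	memcpy(addr, src, len);			/* 'do something' */

	kunmap_thread(page);			/* unmap in the same thread */

Callers which follow this pattern convert mechanically from
kmap()/kunmap() to kmap_thread()/kunmap_thread().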

While we don't anticipate global mappings to pmem, there is a danger in
changing the semantics of kmap(). Effectively, a missed conversion would
cause an unresolved page fault with little to no information about why.

There were a number of options considered.

1) Attempt to change all the thread local kmap() calls to kmap_atomic()
2) Introduce a flags parameter to kmap() to indicate if the mapping should be
global or not
3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a
global mapping of the pages
4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to
be used within that thread of execution only

Option 1 is simply not feasible: kmap_atomic() does not have the same
semantics as kmap() within a single thread. Option 2 would require all of the
call sites of kmap() to change. Option 3 seems like a good minimal change but
there is a danger that new code may miss the semantic change of kmap() and not
get the behavior intended for future users. Therefore, option #4 was chosen.

To handle the global PKRS state in the most efficient manner possible, we
lazily override the thread specific PKRS key value only when needed because we
anticipate PKS will not be needed most of the time. And even when it is
used, ~90% of the time it is a thread local call.
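
As implemented in patch 1, this amounts to a single '&' at context switch
(a sketch; pkrs_global_cache and saved_pkrs are the names used in this
series):

	static inline void pks_sched_in(void)
	{
		/* Lazily factor the global PKRS into the thread value */
		write_pkrs(current->thread.saved_pkrs & pkrs_global_cache);
	}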


[1] https://lore.kernel.org/lkml/[email protected]/

[2] The following callers continue calling kmap() (utilizing the global
PKRS). It would be nice if more of them could be converted to kmap_thread().

drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]);
drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl));
drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page);
drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page);
drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1));
drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]);
drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page);
drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page);
drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page);
drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset;
drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg));
drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off;
drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset;
fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]);
fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page);
fs/ceph/inode.c: ctl->dentries = kmap(ctl->page);
fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr);
lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset;
net/ceph/pagelist.c: pl->mapped_tail = kmap(page);
net/ceph/pagelist.c: pl->mapped_tail = kmap(page);
virt/kvm/kvm_main.c: hva = kmap(page);

[3] The following appear to follow the same pattern as ext2 which was converted
after some code audit. So I _think_ they too could be converted to
k[un]map_thread().

fs/freevxfs/vxfs_subr.c|75| kmap(pp);
fs/jfs/jfs_metapage.c|102| kmap(page);
fs/jfs/jfs_metapage.c|156| kmap(page);
fs/minix/dir.c|72| kmap(page);
fs/nilfs2/dir.c|195| kmap(page);
fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page);
fs/ntfs/aops.h|78| kmap(page);
fs/ntfs/compress.c|574| kmap(page);
fs/qnx6/dir.c|32| kmap(page);
fs/qnx6/dir.c|58| kmap(*p = page);
fs/qnx6/inode.c|190| kmap(page);
fs/qnx6/inode.c|557| kmap(page);
fs/reiserfs/inode.c|2397| kmap(bh_result->b_page);
fs/reiserfs/xattr.c|444| kmap(page);
fs/sysv/dir.c|60| kmap(page);
fs/sysv/dir.c|262| kmap(page);
fs/ufs/dir.c|194| kmap(page);
fs/ufs/dir.c|562| kmap(page);


Ira Weiny (58):
x86/pks: Add a global pkrs option
x86/pks/test: Add testing for global option
memremap: Add zone device access protection
kmap: Add stray access protection for device pages
kmap: Introduce k[un]map_thread
kmap: Introduce k[un]map_thread debugging
drivers/drbd: Utilize new kmap_thread()
drivers/firmware_loader: Utilize new kmap_thread()
drivers/gpu: Utilize new kmap_thread()
drivers/rdma: Utilize new kmap_thread()
drivers/net: Utilize new kmap_thread()
fs/afs: Utilize new kmap_thread()
fs/btrfs: Utilize new kmap_thread()
fs/cifs: Utilize new kmap_thread()
fs/ecryptfs: Utilize new kmap_thread()
fs/gfs2: Utilize new kmap_thread()
fs/nilfs2: Utilize new kmap_thread()
fs/hfs: Utilize new kmap_thread()
fs/hfsplus: Utilize new kmap_thread()
fs/jffs2: Utilize new kmap_thread()
fs/nfs: Utilize new kmap_thread()
fs/f2fs: Utilize new kmap_thread()
fs/fuse: Utilize new kmap_thread()
fs/freevxfs: Utilize new kmap_thread()
fs/reiserfs: Utilize new kmap_thread()
fs/zonefs: Utilize new kmap_thread()
fs/ubifs: Utilize new kmap_thread()
fs/cachefiles: Utilize new kmap_thread()
fs/ntfs: Utilize new kmap_thread()
fs/romfs: Utilize new kmap_thread()
fs/vboxsf: Utilize new kmap_thread()
fs/hostfs: Utilize new kmap_thread()
fs/cramfs: Utilize new kmap_thread()
fs/erofs: Utilize new kmap_thread()
fs: Utilize new kmap_thread()
fs/ext2: Use ext2_put_page
fs/ext2: Utilize new kmap_thread()
fs/isofs: Utilize new kmap_thread()
fs/jffs2: Utilize new kmap_thread()
net: Utilize new kmap_thread()
drivers/target: Utilize new kmap_thread()
drivers/scsi: Utilize new kmap_thread()
drivers/mmc: Utilize new kmap_thread()
drivers/xen: Utilize new kmap_thread()
drivers/firmware: Utilize new kmap_thread()
drivers/staging: Utilize new kmap_thread()
drivers/mtd: Utilize new kmap_thread()
drivers/md: Utilize new kmap_thread()
drivers/misc: Utilize new kmap_thread()
drivers/android: Utilize new kmap_thread()
kernel: Utilize new kmap_thread()
mm: Utilize new kmap_thread()
lib: Utilize new kmap_thread()
powerpc: Utilize new kmap_thread()
samples: Utilize new kmap_thread()
dax: Stray access protection for dax_direct_access()
nvdimm/pmem: Stray access protection for pmem->virt_addr
[dax|pmem]: Enable stray access protection

Documentation/core-api/protection-keys.rst | 11 +-
arch/powerpc/mm/mem.c | 4 +-
arch/x86/entry/common.c | 28 +++
arch/x86/include/asm/pkeys.h | 6 +-
arch/x86/include/asm/pkeys_common.h | 8 +-
arch/x86/kernel/process.c | 74 ++++++-
arch/x86/mm/fault.c | 193 ++++++++++++++----
arch/x86/mm/pkeys.c | 88 ++++++--
drivers/android/binder_alloc.c | 4 +-
drivers/base/firmware_loader/fallback.c | 4 +-
drivers/base/firmware_loader/main.c | 4 +-
drivers/block/drbd/drbd_main.c | 4 +-
drivers/block/drbd/drbd_receiver.c | 12 +-
drivers/dax/device.c | 2 +
drivers/dax/super.c | 2 +
drivers/firmware/efi/capsule-loader.c | 6 +-
drivers/firmware/efi/capsule.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +-
drivers/gpu/drm/gma500/gma_display.c | 4 +-
drivers/gpu/drm/gma500/mmu.c | 10 +-
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +-
.../drm/i915/gem/selftests/i915_gem_context.c | 4 +-
.../drm/i915/gem/selftests/i915_gem_mman.c | 8 +-
drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +-
drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +-
drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +-
drivers/gpu/drm/i915/i915_gem.c | 8 +-
drivers/gpu/drm/i915/i915_gpu_error.c | 4 +-
drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 4 +-
drivers/infiniband/hw/hfi1/sdma.c | 4 +-
drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +-
drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +-
drivers/md/bcache/request.c | 4 +-
drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +-
drivers/mmc/host/mmc_spi.c | 4 +-
drivers/mmc/host/sdricoh_cs.c | 4 +-
drivers/mtd/mtd_blkdevs.c | 12 +-
drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +-
.../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +-
drivers/nvdimm/pmem.c | 6 +
drivers/scsi/ipr.c | 8 +-
drivers/scsi/pmcraid.c | 8 +-
drivers/staging/rts5208/rtsx_transport.c | 4 +-
drivers/target/target_core_iblock.c | 4 +-
drivers/target/target_core_rd.c | 4 +-
drivers/target/target_core_transport.c | 4 +-
drivers/xen/gntalloc.c | 4 +-
fs/afs/dir.c | 16 +-
fs/afs/dir_edit.c | 16 +-
fs/afs/mntpt.c | 4 +-
fs/afs/write.c | 4 +-
fs/aio.c | 4 +-
fs/binfmt_elf.c | 4 +-
fs/binfmt_elf_fdpic.c | 4 +-
fs/btrfs/check-integrity.c | 4 +-
fs/btrfs/compression.c | 4 +-
fs/btrfs/inode.c | 16 +-
fs/btrfs/lzo.c | 24 +--
fs/btrfs/raid56.c | 34 +--
fs/btrfs/reflink.c | 8 +-
fs/btrfs/send.c | 4 +-
fs/btrfs/zlib.c | 32 +--
fs/btrfs/zstd.c | 20 +-
fs/cachefiles/rdwr.c | 4 +-
fs/cifs/cifsencrypt.c | 6 +-
fs/cifs/file.c | 16 +-
fs/cifs/smb2ops.c | 8 +-
fs/cramfs/inode.c | 10 +-
fs/ecryptfs/crypto.c | 8 +-
fs/ecryptfs/read_write.c | 8 +-
fs/erofs/super.c | 4 +-
fs/erofs/xattr.c | 4 +-
fs/exec.c | 10 +-
fs/ext2/dir.c | 8 +-
fs/ext2/ext2.h | 8 +
fs/ext2/namei.c | 15 +-
fs/f2fs/f2fs.h | 8 +-
fs/freevxfs/vxfs_immed.c | 4 +-
fs/fuse/readdir.c | 4 +-
fs/gfs2/bmap.c | 4 +-
fs/gfs2/ops_fstype.c | 4 +-
fs/hfs/bnode.c | 14 +-
fs/hfs/btree.c | 20 +-
fs/hfsplus/bitmap.c | 20 +-
fs/hfsplus/bnode.c | 102 ++++-----
fs/hfsplus/btree.c | 18 +-
fs/hostfs/hostfs_kern.c | 12 +-
fs/io_uring.c | 4 +-
fs/isofs/compress.c | 4 +-
fs/jffs2/file.c | 8 +-
fs/jffs2/gc.c | 4 +-
fs/nfs/dir.c | 20 +-
fs/nilfs2/alloc.c | 34 +--
fs/nilfs2/cpfile.c | 4 +-
fs/ntfs/aops.c | 4 +-
fs/reiserfs/journal.c | 4 +-
fs/romfs/super.c | 4 +-
fs/splice.c | 4 +-
fs/ubifs/file.c | 16 +-
fs/vboxsf/file.c | 12 +-
fs/zonefs/super.c | 4 +-
include/linux/entry-common.h | 3 +
include/linux/highmem.h | 63 +++++-
include/linux/memremap.h | 1 +
include/linux/mm.h | 43 ++++
include/linux/pkeys.h | 6 +-
include/linux/sched.h | 8 +
include/trace/events/kmap_thread.h | 56 +++++
init/init_task.c | 6 +
kernel/fork.c | 18 ++
kernel/kexec_core.c | 8 +-
lib/Kconfig.debug | 8 +
lib/iov_iter.c | 12 +-
lib/pks/pks_test.c | 138 +++++++++++--
lib/test_bpf.c | 4 +-
lib/test_hmm.c | 8 +-
mm/Kconfig | 13 ++
mm/debug.c | 23 +++
mm/memory.c | 8 +-
mm/memremap.c | 90 ++++++++
mm/swapfile.c | 4 +-
mm/userfaultfd.c | 4 +-
net/ceph/messenger.c | 4 +-
net/core/datagram.c | 4 +-
net/core/sock.c | 8 +-
net/ipv4/ip_output.c | 4 +-
net/sunrpc/cache.c | 4 +-
net/sunrpc/xdr.c | 8 +-
net/tls/tls_device.c | 4 +-
samples/vfio-mdev/mbochs.c | 4 +-
131 files changed, 1284 insertions(+), 565 deletions(-)
create mode 100644 include/trace/events/kmap_thread.h

--
2.28.0.rc0.12.gb6a658bd00c9


2020-10-09 19:59:53

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 01/58] x86/pks: Add a global pkrs option

From: Ira Weiny <[email protected]>

Some users, such as kmap(), sometimes require PKS protections to be global.
However, updating all CPUs, and worse yet all threads, is expensive.

Introduce a global PKRS state which is checked at critical times to
allow the state to enable access when global PKS is required. To
accomplish this with minimal locking, the code is carefully designed
around the following key concepts.

1) Borrow the idea of lazy TLB invalidations from the fault handler
code. When enabling PKS access we anticipate that other threads are
not yet running. However, if they are we catch the fault and clean
up the MSR value.

2) When disabling PKS access we force the MSR values across all CPUs.
This is required to block access as soon as possible.[1] However, it
is key that we never attempt to update the per-task PKS values
directly. See the next point.

3) Per-task PKS values never get updated with global PKS values. This
is key to preventing locking requirements and the nearly intractable
problem of trying to update every task in the system. Here are a few
key points.

3a) The MSR value can be updated with the global PKS value if that
global value happened to change while the task was running.

3b) If the task was sleeping while the global PKS was updated then
the global value is added in when tasks are scheduled.

3c) If the global PKS value restricts access, the MSR is updated as
soon as possible[1] and the thread value is not updated, which ensures
the thread does not retain the elevated privileges after a context
switch.

4) Follow on patches must be careful to preserve the separation of the
thread PKRS value and the MSR value.

5) Access Disable on any individual pkey is turned into (Access Disable
| Write Disable) to facilitate faster integration of the global value
into the thread local MSR through a simple '&' operation (see the worked
example below). Doing otherwise would result in complicated individual
bit manipulation for each pkey.

[1] There is a race condition which is deliberately ignored for
performance reasons. This potentially allows access to a thread until
the end of its time slice. After the context switch the global value
will be restored.
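
A worked example of the '&' merge described in point 5 (pkey 1 occupies
bits [3:2] of the PKRS value; the specific numbers are illustrative):

	thread (no access, stored as 11b):	0b1100
	global (read/write,          00b):	0b0000
	thread & global			     =	0b0000	-> access enabled

Had "no access" been stored as 01b, merging a global read-only value (10b)
would give 01b & 10b = 00b, i.e. full access. Forcing 01b to 11b avoids
that special case: 11b & 10b = 10b (read-only), as desired.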

Signed-off-by: Ira Weiny <[email protected]>
---
Documentation/core-api/protection-keys.rst | 11 +-
arch/x86/entry/common.c | 7 +
arch/x86/include/asm/pkeys.h | 6 +-
arch/x86/include/asm/pkeys_common.h | 8 +-
arch/x86/kernel/process.c | 74 +++++++-
arch/x86/mm/fault.c | 189 ++++++++++++++++-----
arch/x86/mm/pkeys.c | 88 ++++++++--
include/linux/pkeys.h | 6 +-
lib/pks/pks_test.c | 16 +-
9 files changed, 329 insertions(+), 76 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index c60366921d60..9e8a98653e13 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -121,9 +121,9 @@ mapping adds that mapping to the protection domain.
int pks_key_alloc(const char * const pkey_user);
#define PAGE_KERNEL_PKEY(pkey)
#define _PAGE_KEY(pkey)
- void pks_mknoaccess(int pkey);
- void pks_mkread(int pkey);
- void pks_mkrdwr(int pkey);
+ void pks_mknoaccess(int pkey, bool global);
+ void pks_mkread(int pkey, bool global);
+ void pks_mkrdwr(int pkey, bool global);
void pks_key_free(int pkey);

pks_key_alloc() allocates keys dynamically to allow better use of the limited
@@ -141,7 +141,10 @@ _PAGE_KEY().
The pks_mk*() family of calls allows kernel users the ability to change the
protections for the domain identified by the pkey specified. 3 states are
available pks_mknoaccess(), pks_mkread(), and pks_mkrdwr() which set the access
-to none, read, and read/write respectively.
+to none, read, and read/write respectively. 'global' specifies that the
+protection should be set across all threads (logical CPUs), not just the
+current running thread/CPU. This increases the overhead of PKS and lessens the
+protection so it should be used sparingly.

Finally, pks_key_free() allows a user to return the key to the allocator for
use by others.
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 324a8fd5ac10..86ad32e0095e 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -261,12 +261,19 @@ noinstr void idtentry_exit_nmi(struct pt_regs *regs, irqentry_state_t *irq_state
* current running value and set the default PKRS value for the duration of the
* exception. Thus preventing exception handlers from having the elevated
* access of the interrupted task.
+ *
+ * NOTE that the thread saved PKRS must be preserved separately to ensure
+ * global overrides do not 'stick' on a thread.
*/
noinstr void irq_save_pkrs(irqentry_state_t *state)
{
if (!cpu_feature_enabled(X86_FEATURE_PKS))
return;

+ /*
+ * The thread_pkrs must be maintained separately to prevent global
+ * overrides from 'sticking' on a thread.
+ */
state->thread_pkrs = current->thread.saved_pkrs;
state->pkrs = this_cpu_read(pkrs_cache);
write_pkrs(INIT_PKRS_VALUE);
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 79952216474e..cae0153a5480 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -143,9 +143,9 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags);
int pks_key_alloc(const char *const pkey_user);
void pks_key_free(int pkey);

-void pks_mknoaccess(int pkey);
-void pks_mkread(int pkey);
-void pks_mkrdwr(int pkey);
+void pks_mknoaccess(int pkey, bool global);
+void pks_mkread(int pkey, bool global);
+void pks_mkrdwr(int pkey, bool global);

#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */

diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
index 8961e2ddd6ff..e380679ba1bb 100644
--- a/arch/x86/include/asm/pkeys_common.h
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -6,7 +6,12 @@
#define PKR_WD_BIT 0x2
#define PKR_BITS_PER_PKEY 2

-#define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
+/*
+ * We must define 11b as the default to make global overrides efficient.
+ * See arch/x86/kernel/process.c where the global pkrs is factored in during
+ * context switch.
+ */
+#define PKR_AD_KEY(pkey) ((PKR_WD_BIT | PKR_AD_BIT) << ((pkey) * PKR_BITS_PER_PKEY))

/*
* Define a default PKRS value for each task.
@@ -27,6 +32,7 @@
#define PKS_NUM_KEYS 16

#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+extern u32 pkrs_global_cache;
DECLARE_PER_CPU(u32, pkrs_cache);
noinstr void write_pkrs(u32 new_pkrs);
#else
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index eb3a95a69392..58edd162d9cb 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,7 +43,7 @@
#include <asm/io_bitmap.h>
#include <asm/proto.h>
#include <asm/frame.h>
-#include <asm/pkeys_common.h>
+#include <linux/pkeys.h>

#include "process.h"

@@ -189,15 +189,83 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
}

#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
-DECLARE_PER_CPU(u32, pkrs_cache);
static inline void pks_init_task(struct task_struct *tsk)
{
/* New tasks get the most restrictive PKRS value */
tsk->thread.saved_pkrs = INIT_PKRS_VALUE;
}
+
+extern u32 pkrs_global_cache;
+
+/**
+ * The global PKRS value can only increase access. Note that 01b and 11b both
+ * disable access. The following truth table is our desired result for each of
+ * the pkeys when we add in the global permissions.
+ *
+ * 00 R/W - Write enabled (all access)
+ * 10 Read - write disabled (Read only)
+ * 01 NO Acc - access disabled
+ * 11 NO Acc - also access disabled
+ *
+ * local global desired required
+ * result operation
+ * 00 00 00 &
+ * 00 10 00 &
+ * 00 01 00 &
+ * 00 11 00 &
+ *
+ * 10 00 00 &
+ * 10 10 10 &
+ * 10 01 10 ^ special case
+ * 10 11 10 &
+ *
+ * 01 00 00 &
+ * 01 10 10 ^ special case
+ * 01 01 01 &
+ * 01 11 01 &
+ *
+ * 11 00 00 &
+ * 11 10 10 &
+ * 11 01 01 &
+ * 11 11 11 &
+ *
+ * In order to eliminate the need to loop through each pkey and deal with the 2
+ * above special cases we force all 01b values to 11b through the API thus
+ * resulting in the simplified truth table below.
+ *
+ * 00 R/W - Write enabled (all access)
+ * 10 Read - write disabled (Read only)
+ * 01 NO Acc - access disabled
+ * (Not allowed in the API; always use 11)
+ * 11 NO Acc - access disabled
+ *
+ * local global desired effective
+ * result operation
+ * 00 00 00 &
+ * 00 10 00 &
+ * 00 11 00 &
+ * 00 11 00 &
+ *
+ * 10 00 00 &
+ * 10 10 10 &
+ * 10 11 10 &
+ * 10 11 10 &
+ *
+ * 11 00 00 &
+ * 11 10 10 &
+ * 11 11 11 &
+ * 11 11 11 &
+ *
+ * 11 00 00 &
+ * 11 10 10 &
+ * 11 11 11 &
+ * 11 11 11 &
+ *
+ * Thus we can simply 'AND' in the global pkrs value.
+ */
static inline void pks_sched_in(void)
{
- write_pkrs(current->thread.saved_pkrs);
+ write_pkrs(current->thread.saved_pkrs & pkrs_global_cache);
}
#else
static inline void pks_init_task(struct task_struct *tsk) { }
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index dd5af9399131..4b4ff9efa298 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -32,6 +32,8 @@
#include <asm/pgtable_areas.h> /* VMALLOC_START, ... */
#include <asm/kvm_para.h> /* kvm_handle_async_pf */

+#include <asm-generic/mman-common.h>
+
#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>

@@ -995,9 +997,124 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
}
}

-static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+/*
+ * Check if we have had a 'global' pkey update. If so, handle this like a lazy
+ * TLB invalidation; fix up the local MSR and return.
+ *
+ * See arch/x86/kernel/process.c for the explanation on how global is handled
+ * with a simple '&' operation.
+ *
+ * Also we don't update the current thread saved_pkrs because we don't want the
+ * global value to 'stick' with the thread. Rather we want this to be valid
+ * only for the remainder of this time slice. For subsequent time slices the
+ * global value will be factored in during schedule; see arch/x86/kernel/process.c
+ *
+ * Finally we have a trade-off between performance and forcing a restriction of
+ * permissions across all CPUs on a global update.
+ *
+ * Given the following window.
+ *
+ * Global PKRS CPU #0 CPU #1
+ * cache MSR MSR
+ *
+ * | | |
+ * Global |----------\ | |
+ * Restriction | ------------> read | <= T1
+ * (on CPU #0) | | | |
+ * ------\ | | | |
+ * ------>| | | |
+ * | | | |
+ * Update CPU #1 |--------\ | | |
+ * | --------------\ | |
+ * | | --|------------>|
+ * Global remote | | | |
+ * MSR update | | | |
+ * (CPU 2-n) | | | |
+ * |-----> CPU's | v |
+ * local | (2-N) | local --\ |
+ * update | | update ------>|(Update <= T2
+ * ----------------\ | | Incorrect)
+ * | -----------\ | |
+ * | --->|(Update OK) |
+ * Context | | |
+ * Switch |----------\ | |
+ * | ------------> read |
+ * | | | |
+ * | | | |
+ * | | v |
+ * | | local --\ |
+ * | | update ------>|(Update
+ * | | | Correct)
+ *
+ * We allow for a larger window of the global pkey being open because global
+ * updates should be rare and we don't want to burden normal faults with having
+ * to read the global state.
+ */
+static bool global_pkey_is_enabled(pte_t *pte, bool is_write,
+ irqentry_state_t *irq_state)
+{
+ u8 pkey = pte_flags_pkey(pte->pte);
+ int pkey_shift = pkey * PKR_BITS_PER_PKEY;
+ u32 mask = (((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
+ u32 global = READ_ONCE(pkrs_global_cache);
+ u32 val;
+
+ /* Return early if global access is not valid */
+ val = (global & mask) >> pkey_shift;
+ if ((val & PKR_AD_BIT) || (is_write && (val & PKR_WD_BIT)))
+ return false;
+
+ irq_state->pkrs &= global;
+
+ return true;
+}
+
+#else /* !CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */
+__always_inline bool global_pkey_is_enabled(pte_t *pte, bool is_write,
+ irqentry_state_t *irq_state)
+{
+ return false;
+}
+#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */
+
+#ifdef CONFIG_PKS_TESTING
+bool pks_test_callback(irqentry_state_t *irq_state);
+static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state)
+{
+ /*
+ * If we get a protection key exception it could be because we
+ * are running the PKS test. If so, pks_test_callback() will
+ * clear the protection mechanism and return true to indicate
+ * the fault was handled
+ */
+ return pks_test_callback(irq_state);
+}
+#else /* !CONFIG_PKS_TESTING */
+static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state)
+{
+ return false;
+}
+#endif /* CONFIG_PKS_TESTING */
+
+
+static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte,
+ irqentry_state_t *irq_state)
{
- if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
+ bool is_write = (error_code & X86_PF_WRITE);
+
+ if (IS_ENABLED(CONFIG_ARCH_HAS_SUPERVISOR_PKEYS) &&
+ error_code & X86_PF_PK) {
+ if (global_pkey_is_enabled(pte, is_write, irq_state))
+ return 1;
+
+ if (handle_pks_testing(error_code, irq_state))
+ return 1;
+
+ return 0;
+ }
+
+ if (is_write && !pte_write(*pte))
return 0;

if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
@@ -1007,7 +1124,7 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
}

/*
- * Handle a spurious fault caused by a stale TLB entry.
+ * Handle a spurious fault caused by a stale TLB entry or a lazy PKRS update.
*
* This allows us to lazily refresh the TLB when increasing the
* permissions of a kernel page (RO -> RW or NX -> X). Doing it
@@ -1022,13 +1139,19 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
* There are no security implications to leaving a stale TLB when
* increasing the permissions on a page.
*
+ * Similarly, PKRS increases in permissions are done on a thread local level.
+ * But if the caller indicates the permission should be allowed globally we can
+ * lazily update only those threads which fault and avoid a global IPI MSR
+ * update.
+ *
* Returns non-zero if a spurious fault was handled, zero otherwise.
*
* See Intel Developer's Manual Vol 3 Section 4.10.4.3, bullet 3
* (Optional Invalidation).
*/
static noinline int
-spurious_kernel_fault(unsigned long error_code, unsigned long address)
+spurious_kernel_fault(unsigned long error_code, unsigned long address,
+ irqentry_state_t *irq_state)
{
pgd_t *pgd;
p4d_t *p4d;
@@ -1038,17 +1161,19 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
int ret;

/*
- * Only writes to RO or instruction fetches from NX may cause
- * spurious faults.
+ * Only PKey faults or writes to RO or instruction fetches from NX may
+ * cause spurious faults.
*
* These could be from user or supervisor accesses but the TLB
* is only lazily flushed after a kernel mapping protection
* change, so user accesses are not expected to cause spurious
* faults.
*/
- if (error_code != (X86_PF_WRITE | X86_PF_PROT) &&
- error_code != (X86_PF_INSTR | X86_PF_PROT))
- return 0;
+ if (!(error_code & X86_PF_PK)) {
+ if (error_code != (X86_PF_WRITE | X86_PF_PROT) &&
+ error_code != (X86_PF_INSTR | X86_PF_PROT))
+ return 0;
+ }

pgd = init_mm.pgd + pgd_index(address);
if (!pgd_present(*pgd))
@@ -1059,27 +1184,31 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
return 0;

if (p4d_large(*p4d))
- return spurious_kernel_fault_check(error_code, (pte_t *) p4d);
+ return spurious_kernel_fault_check(error_code, (pte_t *) p4d,
+ irq_state);

pud = pud_offset(p4d, address);
if (!pud_present(*pud))
return 0;

if (pud_large(*pud))
- return spurious_kernel_fault_check(error_code, (pte_t *) pud);
+ return spurious_kernel_fault_check(error_code, (pte_t *) pud,
+ irq_state);

pmd = pmd_offset(pud, address);
if (!pmd_present(*pmd))
return 0;

if (pmd_large(*pmd))
- return spurious_kernel_fault_check(error_code, (pte_t *) pmd);
+ return spurious_kernel_fault_check(error_code, (pte_t *) pmd,
+ irq_state);

pte = pte_offset_kernel(pmd, address);
if (!pte_present(*pte))
return 0;

- ret = spurious_kernel_fault_check(error_code, pte);
+ ret = spurious_kernel_fault_check(error_code, pte,
+ irq_state);
if (!ret)
return 0;

@@ -1087,7 +1216,8 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
* Make sure we have permissions in PMD.
* If not, then there's a bug in the page tables:
*/
- ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd);
+ ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd,
+ irq_state);
WARN_ONCE(!ret, "PMD has incorrect permission bits\n");

return ret;
@@ -1150,25 +1280,6 @@ static int fault_in_kernel_space(unsigned long address)
return address >= TASK_SIZE_MAX;
}

-#ifdef CONFIG_PKS_TESTING
-bool pks_test_callback(irqentry_state_t *irq_state);
-static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state)
-{
- /*
- * If we get a protection key exception it could be because we
- * are running the PKS test. If so, pks_test_callback() will
- * clear the protection mechanism and return true to indicate
- * the fault was handled.
- */
- return (hw_error_code & X86_PF_PK) && pks_test_callback(irq_state);
-}
-#else
-static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state)
-{
- return false;
-}
-#endif
-
/*
* Called for all faults where 'address' is part of the kernel address
* space. Might get called for faults that originate from *code* that
@@ -1186,9 +1297,6 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
!cpu_feature_enabled(X86_FEATURE_PKS))
WARN_ON_ONCE(hw_error_code & X86_PF_PK);

- if (handle_pks_testing(hw_error_code, irq_state))
- return;
-
#ifdef CONFIG_X86_32
/*
* We can fault-in kernel-space virtual memory on-demand. The
@@ -1220,8 +1328,11 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
}
#endif

- /* Was the fault spurious, caused by lazy TLB invalidation? */
- if (spurious_kernel_fault(hw_error_code, address))
+ /*
+ * Was the fault spurious; caused by lazy TLB invalidation or PKRS
+ * update?
+ */
+ if (spurious_kernel_fault(hw_error_code, address, irq_state))
return;

/* kprobes don't want to hook the spurious faults: */
@@ -1492,7 +1603,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
*
* Fingers crossed.
*
- * The async #PF handling code takes care of idtentry handling
+ * The async #PF handling code takes care of irqentry handling
* itself.
*/
if (kvm_handle_async_pf(regs, (u32)address))
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 2431c68ef752..a45893069877 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -263,33 +263,84 @@ noinstr void write_pkrs(u32 new_pkrs)
}
EXPORT_SYMBOL_GPL(write_pkrs);

+/*
+ * NOTE: The pkrs_global_cache is _never_ stored in the per-thread PKRS cache
+ * values [thread.saved_pkrs] by design.
+ *
+ * This allows us to invalidate access on running threads immediately upon
+ * invalidation. Sleeping threads will not retain access due to the merge
+ * during pks_sched_in().
+ */
+DEFINE_SPINLOCK(pkrs_global_cache_lock);
+u32 pkrs_global_cache = INIT_PKRS_VALUE;
+EXPORT_SYMBOL_GPL(pkrs_global_cache);
+
+static inline void update_global_pkrs(int pkey, unsigned long protection)
+{
+ int pkey_shift = pkey * PKR_BITS_PER_PKEY;
+ u32 mask = (((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
+ u32 old_val;
+
+ spin_lock(&pkrs_global_cache_lock);
+ old_val = (pkrs_global_cache & mask) >> pkey_shift;
+ pkrs_global_cache &= ~mask;
+ if (protection & PKEY_DISABLE_ACCESS)
+ pkrs_global_cache |= PKR_AD_BIT << pkey_shift;
+ if (protection & PKEY_DISABLE_WRITE)
+ pkrs_global_cache |= PKR_WD_BIT << pkey_shift;
+
+ /*
+ * If we are restricting access relative to the old value, force
+ * the update on all running threads.
+ */
+ if (((old_val == 0) && protection) ||
+ ((old_val & PKR_WD_BIT) && (protection & PKEY_DISABLE_ACCESS))) {
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ u32 *ptr = per_cpu_ptr(&pkrs_cache, cpu);
+
+ *ptr = update_pkey_val(*ptr, pkey, protection);
+ wrmsrl_on_cpu(cpu, MSR_IA32_PKRS, *ptr);
+ put_cpu_ptr(ptr);
+ }
+ }
+ spin_unlock(&pkrs_global_cache_lock);
+}
+
/**
* Do not call this directly, see pks_mk*() below.
*
* @pkey: Key for the domain to change
* @protection: protection bits to be used
+ * @global: should this change be made globally or not.
*
* Protection utilizes the same protection bits specified for User pkeys
* PKEY_DISABLE_ACCESS
* PKEY_DISABLE_WRITE
*
*/
-static inline void pks_update_protection(int pkey, unsigned long protection)
+static inline void pks_update_protection(int pkey, unsigned long protection,
+ bool global)
{
- current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
- pkey, protection);
preempt_disable();
+ if (global)
+ update_global_pkrs(pkey, protection);
+
+ current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs, pkey,
+ protection);
write_pkrs(current->thread.saved_pkrs);
+
preempt_enable();
}

/**
* PKS access control functions
*
- * Change the access of the domain specified by the pkey. These are global
- * updates. They only affects the current running thread. It is undefined and
- * a bug for users to call this without having allocated a pkey and using it as
- * pkey here.
+ * Change the access of the domain specified by the pkey. These may be global
+ * updates depending on the value of global. It is undefined and a bug for
+ * users to call this without having allocated a pkey and using it as pkey
+ * here.
*
* pks_mknoaccess()
* Disable all access to the domain
@@ -299,23 +350,30 @@ static inline void pks_update_protection(int pkey, unsigned long protection)
* Make the domain Read/Write
*
* @pkey the pkey for which the access should change.
- *
+ * @global: if true the access is enabled on all threads/logical cpus
*/
-void pks_mknoaccess(int pkey)
+void pks_mknoaccess(int pkey, bool global)
{
- pks_update_protection(pkey, PKEY_DISABLE_ACCESS);
+ /*
+ * We force disable access to be 11b
+ * (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
+ * instead of 01b. See arch/x86/kernel/process.c where the global pkrs
+ * is factored in during context switch.
+ */
+ pks_update_protection(pkey, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE,
+ global);
}
EXPORT_SYMBOL_GPL(pks_mknoaccess);

-void pks_mkread(int pkey)
+void pks_mkread(int pkey, bool global)
{
- pks_update_protection(pkey, PKEY_DISABLE_WRITE);
+ pks_update_protection(pkey, PKEY_DISABLE_WRITE, global);
}
EXPORT_SYMBOL_GPL(pks_mkread);

-void pks_mkrdwr(int pkey)
+void pks_mkrdwr(int pkey, bool global)
{
- pks_update_protection(pkey, 0);
+ pks_update_protection(pkey, 0, global);
}
EXPORT_SYMBOL_GPL(pks_mkrdwr);

@@ -377,7 +435,7 @@ void pks_key_free(int pkey)
return;

/* Restore to default of no access */
- pks_mknoaccess(pkey);
+ pks_mknoaccess(pkey, true);
pks_key_users[pkey] = NULL;
__clear_bit(pkey, &pks_key_allocation_map);
}
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index f9552bd9341f..8f3bfec83949 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -57,15 +57,15 @@ static inline int pks_key_alloc(const char * const pkey_user)
static inline void pks_key_free(int pkey)
{
}
-static inline void pks_mknoaccess(int pkey)
+static inline void pks_mknoaccess(int pkey, bool global)
{
WARN_ON_ONCE(1);
}
-static inline void pks_mkread(int pkey)
+static inline void pks_mkread(int pkey, bool global)
{
WARN_ON_ONCE(1);
}
-static inline void pks_mkrdwr(int pkey)
+static inline void pks_mkrdwr(int pkey, bool global)
{
WARN_ON_ONCE(1);
}
diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c
index d7dbf92527bd..286c8b8457da 100644
--- a/lib/pks/pks_test.c
+++ b/lib/pks/pks_test.c
@@ -163,12 +163,12 @@ static void check_exception(irqentry_state_t *irq_state)
* Check we can update the value during exception without affecting the
* calling thread. The calling thread is checked after exception...
*/
- pks_mkrdwr(test_armed_key);
+ pks_mkrdwr(test_armed_key, false);
if (!check_pkrs(test_armed_key, 0)) {
pr_err(" FAIL: exception did not change register to 0\n");
test_exception_ctx->pass = false;
}
- pks_mknoaccess(test_armed_key);
+ pks_mknoaccess(test_armed_key, false);
if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)) {
pr_err(" FAIL: exception did not change register to 0x3\n");
test_exception_ctx->pass = false;
@@ -314,13 +314,13 @@ static int run_access_test(struct pks_test_ctx *ctx,
{
switch (test->mode) {
case PKS_TEST_NO_ACCESS:
- pks_mknoaccess(ctx->pkey);
+ pks_mknoaccess(ctx->pkey, false);
break;
case PKS_TEST_RDWR:
- pks_mkrdwr(ctx->pkey);
+ pks_mkrdwr(ctx->pkey, false);
break;
case PKS_TEST_RDONLY:
- pks_mkread(ctx->pkey);
+ pks_mkread(ctx->pkey, false);
break;
default:
pr_err("BUG in test invalid mode\n");
@@ -476,7 +476,7 @@ static void run_exception_test(void)
goto free_context;
}

- pks_mkread(ctx->pkey);
+ pks_mkread(ctx->pkey, false);

spin_lock(&test_lock);
WRITE_ONCE(test_exception_ctx, ctx);
@@ -556,7 +556,7 @@ static void crash_it(void)
return;
}

- pks_mknoaccess(ctx->pkey);
+ pks_mknoaccess(ctx->pkey, false);

spin_lock(&test_lock);
WRITE_ONCE(test_armed_key, 0);
@@ -618,7 +618,7 @@ static ssize_t pks_write_file(struct file *file, const char __user *user_buf,
/* start of context switch test */
if (!strcmp(buf, "1")) {
/* Ensure a known state to test context switch */
- pks_mknoaccess(ctx->pkey);
+ pks_mknoaccess(ctx->pkey, false);
}

/* After context switch msr should be restored */
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:00:02

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 06/58] kmap: Introduce k[un]map_thread debugging

From: Ira Weiny <[email protected]>

Most kmap() callers use the map within a single thread and have no need
for the protection domain to be enabled globally.

To differentiate these kmap users, new k[un]map_thread() calls were
introduced which are thread local.

To aid in debugging the new use of kmap_thread(), add a reference count,
a check on that count, and tracing to identify where mapping errors occur.
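
For example, with CONFIG_DEBUG_KMAP_THREAD enabled the new tracepoints can
be consumed at runtime via tracefs (assuming tracefs is mounted at
/sys/kernel/tracing):

	# echo 1 > /sys/kernel/tracing/events/kmap_thread/enable
	# cat /sys/kernel/tracing/trace_pipe

Each event records the PID, the current count, the caller address, and the
page, which is enough to pair a kmap_thread() with a missing
kunmap_thread().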

Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
include/linux/highmem.h | 5 +++
include/linux/sched.h | 5 +++
include/trace/events/kmap_thread.h | 56 ++++++++++++++++++++++++++++++
init/init_task.c | 3 ++
kernel/fork.c | 15 ++++++++
lib/Kconfig.debug | 8 +++++
mm/debug.c | 23 ++++++++++++
7 files changed, 115 insertions(+)
create mode 100644 include/trace/events/kmap_thread.h

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ef7813544719..22d1c000802e 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -247,6 +247,10 @@ static inline void kunmap(struct page *page)
__kunmap(page, true);
}

+#ifdef CONFIG_DEBUG_KMAP_THREAD
+void *kmap_thread(struct page *page);
+void kunmap_thread(struct page *page);
+#else
static inline void *kmap_thread(struct page *page)
{
return __kmap(page, false);
@@ -255,6 +259,7 @@ static inline void kunmap_thread(struct page *page)
{
__kunmap(page, false);
}
+#endif

/*
* Prevent people trying to call kunmap_atomic() as if it were kunmap()
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 25d97ab6c757..4627ea4a49e6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1318,6 +1318,11 @@ struct task_struct {
#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
unsigned int dev_page_access_ref;
#endif
+
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+ unsigned int kmap_thread_cnt;
+#endif
+
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
diff --git a/include/trace/events/kmap_thread.h b/include/trace/events/kmap_thread.h
new file mode 100644
index 000000000000..e7143cfe0daf
--- /dev/null
+++ b/include/trace/events/kmap_thread.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+
+/*
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ *
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kmap_thread
+
+#if !defined(_TRACE_KMAP_THREAD_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_KMAP_THREAD_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(kmap_thread_template,
+ TP_PROTO(struct task_struct *tsk, struct page *page,
+ void *caller_addr, int cnt),
+ TP_ARGS(tsk, page, caller_addr, cnt),
+
+ TP_STRUCT__entry(
+ __field(int, pid)
+ __field(struct page *, page)
+ __field(void *, caller_addr)
+ __field(int, cnt)
+ ),
+
+ TP_fast_assign(
+ __entry->pid = tsk->pid;
+ __entry->page = page;
+ __entry->caller_addr = caller_addr;
+ __entry->cnt = cnt;
+ ),
+
+ TP_printk("PID %d; (%d) %pS %p",
+ __entry->pid,
+ __entry->cnt,
+ __entry->caller_addr,
+ __entry->page
+ )
+);
+
+DEFINE_EVENT(kmap_thread_template, kmap_thread,
+ TP_PROTO(struct task_struct *tsk, struct page *page,
+ void *caller_addr, int cnt),
+ TP_ARGS(tsk, page, caller_addr, cnt));
+
+DEFINE_EVENT(kmap_thread_template, kunmap_thread,
+ TP_PROTO(struct task_struct *tsk, struct page *page,
+ void *caller_addr, int cnt),
+ TP_ARGS(tsk, page, caller_addr, cnt));
+
+
+#endif /* _TRACE_KMAP_THREAD_H */
+
+#include <trace/define_trace.h>
diff --git a/init/init_task.c b/init/init_task.c
index 9b39f25de59b..19f09965eb34 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -212,6 +212,9 @@ struct task_struct init_task
#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
.dev_page_access_ref = 0,
#endif
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+ .kmap_thread_cnt = 0,
+#endif
};
EXPORT_SYMBOL(init_task);

diff --git a/kernel/fork.c b/kernel/fork.c
index b6a3ee328a89..2c66e49b7614 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -722,6 +722,17 @@ static inline void put_signal_struct(struct signal_struct *sig)
free_signal_struct(sig);
}

+#ifdef CONFIG_DEBUG_KMAP_THREAD
+static void check_outstanding_kmap_thread(struct task_struct *tsk)
+{
+ if (tsk->kmap_thread_cnt)
+ pr_warn("WARNING: PID %d; Failed to kunmap_thread() [cnt %d]\n",
+ tsk->pid, tsk->kmap_thread_cnt);
+}
+#else
+static void check_outstanding_kmap_thread(struct task_struct *tsk) { }
+#endif
+
void __put_task_struct(struct task_struct *tsk)
{
WARN_ON(!tsk->exit_state);
@@ -734,6 +745,7 @@ void __put_task_struct(struct task_struct *tsk)
exit_creds(tsk);
delayacct_tsk_free(tsk);
put_signal_struct(tsk->signal);
+ check_outstanding_kmap_thread(tsk);

if (!profile_handoff_task(tsk))
free_task(tsk);
@@ -943,6 +955,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
#endif
#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION
tsk->dev_page_access_ref = 0;
+#endif
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+ tsk->kmap_thread_cnt = 0;
#endif
return tsk;

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f015c09ba5a1..6507b43d5b0c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -858,6 +858,14 @@ config DEBUG_HIGHMEM
This option enables additional error checking for high memory
systems. Disable for production systems.

+config DEBUG_KMAP_THREAD
+ bool "Kmap debugging"
+ depends on DEBUG_KERNEL
+ help
+ This option enables additional error checking for kernel mapping code
+ specifically the k[un]map_thread() calls. Disable for production
+ systems.
+
config HAVE_DEBUG_STACKOVERFLOW
bool

diff --git a/mm/debug.c b/mm/debug.c
index ca8d1cacdecc..68d186f3570e 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -320,3 +320,26 @@ void page_init_poison(struct page *page, size_t size)
}
EXPORT_SYMBOL_GPL(page_init_poison);
#endif /* CONFIG_DEBUG_VM */
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/kmap_thread.h>
+
+#ifdef CONFIG_DEBUG_KMAP_THREAD
+void *kmap_thread(struct page *page)
+{
+ trace_kmap_thread(current, page, __builtin_return_address(0),
+ current->kmap_thread_cnt);
+ current->kmap_thread_cnt++;
+ return __kmap(page, false);
+}
+EXPORT_SYMBOL_GPL(kmap_thread);
+
+void kunmap_thread(struct page *page)
+{
+ __kunmap(page, false);
+ current->kmap_thread_cnt--;
+ trace_kunmap_thread(current, page, __builtin_return_address(0),
+ current->kmap_thread_cnt);
+}
+EXPORT_SYMBOL_GPL(kunmap_thread);
+#endif
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:00:26

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 16/58] fs/gfs2: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Bob Peterson <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/gfs2/bmap.c | 4 ++--
fs/gfs2/ops_fstype.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 0f69fbd4af66..375af4528411 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -67,7 +67,7 @@ static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct buffer_head *dibh,
}

if (!PageUptodate(page)) {
- void *kaddr = kmap(page);
+ void *kaddr = kmap_thread(page);
u64 dsize = i_size_read(inode);

if (dsize > gfs2_max_stuffed_size(ip))
@@ -75,7 +75,7 @@ static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct buffer_head *dibh,

memcpy(kaddr, dibh->b_data + sizeof(struct gfs2_dinode), dsize);
memset(kaddr + dsize, 0, PAGE_SIZE - dsize);
- kunmap(page);
+ kunmap_thread(page);

SetPageUptodate(page);
}
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 6d18d2c91add..a5d20d9b504a 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -263,9 +263,9 @@ static int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector, int silent)
__free_page(page);
return -EIO;
}
- p = kmap(page);
+ p = kmap_thread(page);
gfs2_sb_in(sdp, p);
- kunmap(page);
+ kunmap_thread(page);
__free_page(page);
return gfs2_check_sb(sdp, silent);
}
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:00:26

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 15/58] fs/ecryptfs: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Herbert Xu <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Aditya Pakki <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/ecryptfs/crypto.c | 8 ++++----
fs/ecryptfs/read_write.c | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 0681540c48d9..e73e00994bee 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -469,10 +469,10 @@ int ecryptfs_encrypt_page(struct page *page)
}

lower_offset = lower_offset_for_page(crypt_stat, page);
- enc_extent_virt = kmap(enc_extent_page);
+ enc_extent_virt = kmap_thread(enc_extent_page);
rc = ecryptfs_write_lower(ecryptfs_inode, enc_extent_virt, lower_offset,
PAGE_SIZE);
- kunmap(enc_extent_page);
+ kunmap_thread(enc_extent_page);
if (rc < 0) {
ecryptfs_printk(KERN_ERR,
"Error attempting to write lower page; rc = [%d]\n",
@@ -518,10 +518,10 @@ int ecryptfs_decrypt_page(struct page *page)
BUG_ON(!(crypt_stat->flags & ECRYPTFS_ENCRYPTED));

lower_offset = lower_offset_for_page(crypt_stat, page);
- page_virt = kmap(page);
+ page_virt = kmap_thread(page);
rc = ecryptfs_read_lower(page_virt, lower_offset, PAGE_SIZE,
ecryptfs_inode);
- kunmap(page);
+ kunmap_thread(page);
if (rc < 0) {
ecryptfs_printk(KERN_ERR,
"Error attempting to read lower page; rc = [%d]\n",
diff --git a/fs/ecryptfs/read_write.c b/fs/ecryptfs/read_write.c
index 0438997ac9d8..5eca4330c0c0 100644
--- a/fs/ecryptfs/read_write.c
+++ b/fs/ecryptfs/read_write.c
@@ -64,11 +64,11 @@ int ecryptfs_write_lower_page_segment(struct inode *ecryptfs_inode,

offset = ((((loff_t)page_for_lower->index) << PAGE_SHIFT)
+ offset_in_page);
- virt = kmap(page_for_lower);
+ virt = kmap_thread(page_for_lower);
rc = ecryptfs_write_lower(ecryptfs_inode, virt, offset, size);
if (rc > 0)
rc = 0;
- kunmap(page_for_lower);
+ kunmap_thread(page_for_lower);
return rc;
}

@@ -251,11 +251,11 @@ int ecryptfs_read_lower_page_segment(struct page *page_for_ecryptfs,
int rc;

offset = ((((loff_t)page_index) << PAGE_SHIFT) + offset_in_page);
- virt = kmap(page_for_ecryptfs);
+ virt = kmap_thread(page_for_ecryptfs);
rc = ecryptfs_read_lower(virt, offset, size, ecryptfs_inode);
if (rc > 0)
rc = 0;
- kunmap(page_for_ecryptfs);
+ kunmap_thread(page_for_ecryptfs);
flush_dcache_page(page_for_ecryptfs);
return rc;
}
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:00:44

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 49/58] drivers/misc: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

These kmap() calls are localized to a single thread. To avoid the overhead
of global PKRS updates, use the new kmap_thread() call.

Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index 8531ae781195..f308abb8ad03 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -343,7 +343,7 @@ static int qp_memcpy_to_queue_iter(struct vmci_queue *queue,
size_t to_copy;

if (kernel_if->host)
- va = kmap(kernel_if->u.h.page[page_index]);
+ va = kmap_thread(kernel_if->u.h.page[page_index]);
else
va = kernel_if->u.g.vas[page_index + 1];
/* Skip header. */
@@ -357,12 +357,12 @@ static int qp_memcpy_to_queue_iter(struct vmci_queue *queue,
if (!copy_from_iter_full((u8 *)va + page_offset, to_copy,
from)) {
if (kernel_if->host)
- kunmap(kernel_if->u.h.page[page_index]);
+ kunmap_thread(kernel_if->u.h.page[page_index]);
return VMCI_ERROR_INVALID_ARGS;
}
bytes_copied += to_copy;
if (kernel_if->host)
- kunmap(kernel_if->u.h.page[page_index]);
+ kunmap_thread(kernel_if->u.h.page[page_index]);
}

return VMCI_SUCCESS;
@@ -391,7 +391,7 @@ static int qp_memcpy_from_queue_iter(struct iov_iter *to,
int err;

if (kernel_if->host)
- va = kmap(kernel_if->u.h.page[page_index]);
+ va = kmap_thread(kernel_if->u.h.page[page_index]);
else
va = kernel_if->u.g.vas[page_index + 1];
/* Skip header. */
@@ -405,12 +405,12 @@ static int qp_memcpy_from_queue_iter(struct iov_iter *to,
err = copy_to_iter((u8 *)va + page_offset, to_copy, to);
if (err != to_copy) {
if (kernel_if->host)
- kunmap(kernel_if->u.h.page[page_index]);
+ kunmap_thread(kernel_if->u.h.page[page_index]);
return VMCI_ERROR_INVALID_ARGS;
}
bytes_copied += to_copy;
if (kernel_if->host)
- kunmap(kernel_if->u.h.page[page_index]);
+ kunmap_thread(kernel_if->u.h.page[page_index]);
}

return VMCI_SUCCESS;
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:00:54

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 56/58] dax: Stray access protection for dax_direct_access()

From: Ira Weiny <[email protected]>

dax_direct_access() is a special case of accessing pmem via a page
offset and without a struct page.

Because the dax driver is well aware of the special protections it has
mapped memory with, call dev_access_[en|dis]able() directly instead of
incurring the unnecessary overhead of trying to get a struct page to kmap.

Similar to kmap(), we leverage the existing functions dax_read_[un]lock()
because they are already required to surround the use of the memory
returned from dax_direct_access().
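
For illustration, a caller then looks something like this (a sketch only;
error handling elided and memcpy_flushcache() merely stands in for the
actual use of the mapping):

	id = dax_read_lock();	/* now also calls dev_access_enable() */
	nr = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
	if (nr >= 0)
		memcpy_flushcache(kaddr, src, len);
	dax_read_unlock(id);	/* now also calls dev_access_disable() */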

Signed-off-by: Ira Weiny <[email protected]>
---
drivers/dax/super.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e84070b55463..0ddb3ee73e36 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -30,6 +30,7 @@ static DEFINE_SPINLOCK(dax_host_lock);

int dax_read_lock(void)
{
+ dev_access_enable(false);
return srcu_read_lock(&dax_srcu);
}
EXPORT_SYMBOL_GPL(dax_read_lock);
@@ -37,6 +38,7 @@ EXPORT_SYMBOL_GPL(dax_read_lock);
void dax_read_unlock(int id)
{
srcu_read_unlock(&dax_srcu, id);
+ dev_access_disable(false);
}
EXPORT_SYMBOL_GPL(dax_read_unlock);

--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:01:21

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 53/58] lib: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

These kmap() calls are localized to a single thread. To avoid the overhead
of global PKRS updates, use the new kmap_thread() call.

Cc: Alexander Viro <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
lib/iov_iter.c | 12 ++++++------
lib/test_bpf.c | 4 ++--
lib/test_hmm.c | 8 ++++----
3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5e40786c8f12..1d47f957cf95 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -208,7 +208,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
}
/* Too bad - revert to non-atomic kmap */

- kaddr = kmap(page);
+ kaddr = kmap_thread(page);
from = kaddr + offset;
left = copyout(buf, from, copy);
copy -= left;
@@ -225,7 +225,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
from += copy;
bytes -= copy;
}
- kunmap(page);
+ kunmap_thread(page);

done:
if (skip == iov->iov_len) {
@@ -292,7 +292,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
}
/* Too bad - revert to non-atomic kmap */

- kaddr = kmap(page);
+ kaddr = kmap_thread(page);
to = kaddr + offset;
left = copyin(to, buf, copy);
copy -= left;
@@ -309,7 +309,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t
to += copy;
bytes -= copy;
}
- kunmap(page);
+ kunmap_thread(page);

done:
if (skip == iov->iov_len) {
@@ -1742,10 +1742,10 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
return 0;

iterate_all_kinds(i, bytes, v, -EINVAL, ({
- w.iov_base = kmap(v.bv_page) + v.bv_offset;
+ w.iov_base = kmap_thread(v.bv_page) + v.bv_offset;
w.iov_len = v.bv_len;
err = f(&w, context);
- kunmap(v.bv_page);
+ kunmap_thread(v.bv_page);
err;}), ({
w = v;
err = f(&w, context);})
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index ca7d635bccd9..441f822f56ba 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -6506,11 +6506,11 @@ static void *generate_test_data(struct bpf_test *test, int sub)
if (!page)
goto err_kfree_skb;

- ptr = kmap(page);
+ ptr = kmap_thread(page);
if (!ptr)
goto err_free_page;
memcpy(ptr, test->frag_data, MAX_DATA);
- kunmap(page);
+ kunmap_thread(page);
skb_add_rx_frag(skb, 0, page, 0, MAX_DATA, MAX_DATA);
}

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index e7dc3de355b7..e40d26f97f45 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -329,9 +329,9 @@ static int dmirror_do_read(struct dmirror *dmirror, unsigned long start,
if (!page)
return -ENOENT;

- tmp = kmap(page);
+ tmp = kmap_thread(page);
memcpy(ptr, tmp, PAGE_SIZE);
- kunmap(page);
+ kunmap_thread(page);

ptr += PAGE_SIZE;
bounce->cpages++;
@@ -398,9 +398,9 @@ static int dmirror_do_write(struct dmirror *dmirror, unsigned long start,
if (!page || xa_pointer_tag(entry) != DPT_XA_TAG_WRITE)
return -ENOENT;

- tmp = kmap(page);
+ tmp = kmap_thread(page);
memcpy(tmp, ptr, PAGE_SIZE);
- kunmap(page);
+ kunmap_thread(page);

ptr += PAGE_SIZE;
bounce->cpages++;
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:01:21

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 54/58] powerpc: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

These kmap() calls are localized to a single thread. To avoid the overhead
of global PKRS updates, use the new kmap_thread() call.

Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
arch/powerpc/mm/mem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 42e25874f5a8..6ef557b8dda6 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -573,9 +573,9 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
{
unsigned long maddr;

- maddr = (unsigned long) kmap(page) + (addr & ~PAGE_MASK);
+ maddr = (unsigned long) kmap_thread(page) + (addr & ~PAGE_MASK);
flush_icache_range(maddr, maddr + len);
- kunmap(page);
+ kunmap_thread(page);
}

/*
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:01:35

by Ira Weiny

Subject: [PATCH RFC PKS/PMEM 51/58] kernel: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

This kmap() call is localized to a single thread. To avoid the overhead
of global PKRS updates, use the new kmap_thread() call.

Cc: Eric Biederman <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
kernel/kexec_core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0dad1ebe..272a9920c0d6 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -815,7 +815,7 @@ static int kimage_load_normal_segment(struct kimage *image,
if (result < 0)
goto out;

- ptr = kmap(page);
+ ptr = kmap_thread(page);
/* Start with a clear page */
clear_page(ptr);
ptr += maddr & ~PAGE_MASK;
@@ -828,7 +828,7 @@ static int kimage_load_normal_segment(struct kimage *image,
memcpy(ptr, kbuf, uchunk);
else
result = copy_from_user(ptr, buf, uchunk);
- kunmap(page);
+ kunmap_thread(page);
if (result) {
result = -EFAULT;
goto out;
@@ -879,7 +879,7 @@ static int kimage_load_crash_segment(struct kimage *image,
goto out;
}
arch_kexec_post_alloc_pages(page_address(page), 1, 0);
- ptr = kmap(page);
+ ptr = kmap_thread(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
PAGE_SIZE - (maddr & ~PAGE_MASK));
@@ -895,7 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
else
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
- kunmap(page);
+ kunmap_thread(page);
arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:01:48

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 47/58] drivers/mtd: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

These kmap() calls are localized to a single thread. To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Miquel Raynal <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/mtd/mtd_blkdevs.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 0c05f77f9b21..4b18998273fa 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -88,14 +88,14 @@ static blk_status_t do_blktrans_request(struct mtd_blktrans_ops *tr,
return BLK_STS_IOERR;
return BLK_STS_OK;
case REQ_OP_READ:
- buf = kmap(bio_page(req->bio)) + bio_offset(req->bio);
+ buf = kmap_thread(bio_page(req->bio)) + bio_offset(req->bio);
for (; nsect > 0; nsect--, block++, buf += tr->blksize) {
if (tr->readsect(dev, block, buf)) {
- kunmap(bio_page(req->bio));
+ kunmap_thread(bio_page(req->bio));
return BLK_STS_IOERR;
}
}
- kunmap(bio_page(req->bio));
+ kunmap_thread(bio_page(req->bio));
rq_flush_dcache_pages(req);
return BLK_STS_OK;
case REQ_OP_WRITE:
@@ -103,14 +103,14 @@ static blk_status_t do_blktrans_request(struct mtd_blktrans_ops *tr,
return BLK_STS_IOERR;

rq_flush_dcache_pages(req);
- buf = kmap(bio_page(req->bio)) + bio_offset(req->bio);
+ buf = kmap_thread(bio_page(req->bio)) + bio_offset(req->bio);
for (; nsect > 0; nsect--, block++, buf += tr->blksize) {
if (tr->writesect(dev, block, buf)) {
- kunmap(bio_page(req->bio));
+ kunmap_thread(bio_page(req->bio));
return BLK_STS_IOERR;
}
}
- kunmap(bio_page(req->bio));
+ kunmap_thread(bio_page(req->bio));
return BLK_STS_OK;
default:
return BLK_STS_IOERR;
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:01:53

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 46/58] drivers/staging: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

These kmap() calls are localized to a single thread. To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/staging/rts5208/rtsx_transport.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rts5208/rtsx_transport.c b/drivers/staging/rts5208/rtsx_transport.c
index 0027bcf638ad..f747cc23951b 100644
--- a/drivers/staging/rts5208/rtsx_transport.c
+++ b/drivers/staging/rts5208/rtsx_transport.c
@@ -92,13 +92,13 @@ unsigned int rtsx_stor_access_xfer_buf(unsigned char *buffer,
while (sglen > 0) {
unsigned int plen = min(sglen, (unsigned int)
PAGE_SIZE - poff);
- unsigned char *ptr = kmap(page);
+ unsigned char *ptr = kmap_thread(page);

if (dir == TO_XFER_BUF)
memcpy(ptr + poff, buffer + cnt, plen);
else
memcpy(buffer + cnt, ptr + poff, plen);
- kunmap(page);
+ kunmap_thread(page);

/* Start at the beginning of the next page */
poff = 0;
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:02:31

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 31/58] fs/vboxsf: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Hans de Goede <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/vboxsf/file.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
index c4ab5996d97a..d9c7e6b7b4cc 100644
--- a/fs/vboxsf/file.c
+++ b/fs/vboxsf/file.c
@@ -216,7 +216,7 @@ static int vboxsf_readpage(struct file *file, struct page *page)
u8 *buf;
int err;

- buf = kmap(page);
+ buf = kmap_thread(page);

err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf);
if (err == 0) {
@@ -227,7 +227,7 @@ static int vboxsf_readpage(struct file *file, struct page *page)
SetPageError(page);
}

- kunmap(page);
+ kunmap_thread(page);
unlock_page(page);
return err;
}
@@ -268,10 +268,10 @@ static int vboxsf_writepage(struct page *page, struct writeback_control *wbc)
if (!sf_handle)
return -EBADF;

- buf = kmap(page);
+ buf = kmap_thread(page);
err = vboxsf_write(sf_handle->root, sf_handle->handle,
off, &nwrite, buf);
- kunmap(page);
+ kunmap_thread(page);

kref_put(&sf_handle->refcount, vboxsf_handle_release);

@@ -302,10 +302,10 @@ static int vboxsf_write_end(struct file *file, struct address_space *mapping,
if (!PageUptodate(page) && copied < len)
zero_user(page, from + copied, len - copied);

- buf = kmap(page);
+ buf = kmap_thread(page);
err = vboxsf_write(sf_handle->root, sf_handle->handle,
pos, &nwritten, buf + from);
- kunmap(page);
+ kunmap_thread(page);

if (err) {
nwritten = 0;
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:03:01

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 35/58] fs: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

These kmap() calls are localized to a single thread. To avoid the
overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Alexander Viro <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/aio.c | 4 ++--
fs/binfmt_elf.c | 4 ++--
fs/binfmt_elf_fdpic.c | 4 ++--
fs/exec.c | 10 +++++-----
fs/io_uring.c | 4 ++--
fs/splice.c | 4 ++--
6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index d5ec30385566..27f95996d25f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1223,10 +1223,10 @@ static long aio_read_events_ring(struct kioctx *ctx,
avail = min(avail, nr - ret);
avail = min_t(long, avail, AIO_EVENTS_PER_PAGE - pos);

- ev = kmap(page);
+ ev = kmap_thread(page);
copy_ret = copy_to_user(event + ret, ev + pos,
sizeof(*ev) * avail);
- kunmap(page);
+ kunmap_thread(page);

if (unlikely(copy_ret)) {
ret = -EFAULT;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 13d053982dd7..1a332ef1ae03 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2430,9 +2430,9 @@ static int elf_core_dump(struct coredump_params *cprm)

page = get_dump_page(addr);
if (page) {
- void *kaddr = kmap(page);
+ void *kaddr = kmap_thread(page);
stop = !dump_emit(cprm, kaddr, PAGE_SIZE);
- kunmap(page);
+ kunmap_thread(page);
put_page(page);
} else
stop = !dump_skip(cprm, PAGE_SIZE);
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 50f845702b92..8fbe188e0fdd 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1542,9 +1542,9 @@ static bool elf_fdpic_dump_segments(struct coredump_params *cprm)
bool res;
struct page *page = get_dump_page(addr);
if (page) {
- void *kaddr = kmap(page);
+ void *kaddr = kmap_thread(page);
res = dump_emit(cprm, kaddr, PAGE_SIZE);
- kunmap(page);
+ kunmap_thread(page);
put_page(page);
} else {
res = dump_skip(cprm, PAGE_SIZE);
diff --git a/fs/exec.c b/fs/exec.c
index a91003e28eaa..3948b8511e3a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -575,11 +575,11 @@ static int copy_strings(int argc, struct user_arg_ptr argv,

if (kmapped_page) {
flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
+ kunmap_thread(kmapped_page);
put_arg_page(kmapped_page);
}
kmapped_page = page;
- kaddr = kmap(kmapped_page);
+ kaddr = kmap_thread(kmapped_page);
kpos = pos & PAGE_MASK;
flush_arg_page(bprm, kpos, kmapped_page);
}
@@ -593,7 +593,7 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
out:
if (kmapped_page) {
flush_kernel_dcache_page(kmapped_page);
- kunmap(kmapped_page);
+ kunmap_thread(kmapped_page);
put_arg_page(kmapped_page);
}
return ret;
@@ -871,11 +871,11 @@ int transfer_args_to_stack(struct linux_binprm *bprm,

for (index = MAX_ARG_PAGES - 1; index >= stop; index--) {
unsigned int offset = index == stop ? bprm->p & ~PAGE_MASK : 0;
- char *src = kmap(bprm->page[index]) + offset;
+ char *src = kmap_thread(bprm->page[index]) + offset;
sp -= PAGE_SIZE - offset;
if (copy_to_user((void *) sp, src, PAGE_SIZE - offset) != 0)
ret = -EFAULT;
- kunmap(bprm->page[index]);
+ kunmap_thread(bprm->page[index]);
if (ret)
goto out;
}
diff --git a/fs/io_uring.c b/fs/io_uring.c
index aae0ef2ec34d..f59bb079822d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2903,7 +2903,7 @@ static ssize_t loop_rw_iter(int rw, struct file *file, struct kiocb *kiocb,
iovec = iov_iter_iovec(iter);
} else {
/* fixed buffers import bvec */
- iovec.iov_base = kmap(iter->bvec->bv_page)
+ iovec.iov_base = kmap_thread(iter->bvec->bv_page)
+ iter->iov_offset;
iovec.iov_len = min(iter->count,
iter->bvec->bv_len - iter->iov_offset);
@@ -2918,7 +2918,7 @@ static ssize_t loop_rw_iter(int rw, struct file *file, struct kiocb *kiocb,
}

if (iov_iter_is_bvec(iter))
- kunmap(iter->bvec->bv_page);
+ kunmap_thread(iter->bvec->bv_page);

if (nr < 0) {
if (!ret)
diff --git a/fs/splice.c b/fs/splice.c
index ce75aec52274..190c4d218c30 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -815,9 +815,9 @@ static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
void *data;
loff_t tmp = sd->pos;

- data = kmap(buf->page);
+ data = kmap_thread(buf->page);
ret = __kernel_write(sd->u.file, data + buf->offset, sd->len, &tmp);
- kunmap(buf->page);
+ kunmap_thread(buf->page);

return ret;
}
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:03:38

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Nicolas Pitre <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/cramfs/inode.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 912308600d39..003c014a42ed 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
struct page *page = pages[i];

if (page) {
- memcpy(data, kmap(page), PAGE_SIZE);
- kunmap(page);
+ memcpy(data, kmap_thread(page), PAGE_SIZE);
+ kunmap_thread(page);
put_page(page);
} else
memset(data, 0, PAGE_SIZE);
@@ -826,7 +826,7 @@ static int cramfs_readpage(struct file *file, struct page *page)

maxblock = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
bytes_filled = 0;
- pgdata = kmap(page);
+ pgdata = kmap_thread(page);

if (page->index < maxblock) {
struct super_block *sb = inode->i_sb;
@@ -914,13 +914,13 @@ static int cramfs_readpage(struct file *file, struct page *page)

memset(pgdata + bytes_filled, 0, PAGE_SIZE - bytes_filled);
flush_dcache_page(page);
- kunmap(page);
+ kunmap_thread(page);
SetPageUptodate(page);
unlock_page(page);
return 0;

err:
- kunmap(page);
+ kunmap_thread(page);
ClearPageUptodate(page);
SetPageError(page);
unlock_page(page);
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:03:46

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 32/58] fs/hostfs: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Anton Ivanov <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/hostfs/hostfs_kern.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index c070c0d8e3e9..608efd0f83cb 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -409,7 +409,7 @@ static int hostfs_writepage(struct page *page, struct writeback_control *wbc)
if (page->index >= end_index)
count = inode->i_size & (PAGE_SIZE-1);

- buffer = kmap(page);
+ buffer = kmap_thread(page);

err = write_file(HOSTFS_I(inode)->fd, &base, buffer, count);
if (err != count) {
@@ -425,7 +425,7 @@ static int hostfs_writepage(struct page *page, struct writeback_control *wbc)
err = 0;

out:
- kunmap(page);
+ kunmap_thread(page);

unlock_page(page);
return err;
@@ -437,7 +437,7 @@ static int hostfs_readpage(struct file *file, struct page *page)
loff_t start = page_offset(page);
int bytes_read, ret = 0;

- buffer = kmap(page);
+ buffer = kmap_thread(page);
bytes_read = read_file(FILE_HOSTFS_I(file)->fd, &start, buffer,
PAGE_SIZE);
if (bytes_read < 0) {
@@ -454,7 +454,7 @@ static int hostfs_readpage(struct file *file, struct page *page)

out:
flush_dcache_page(page);
- kunmap(page);
+ kunmap_thread(page);
unlock_page(page);
return ret;
}
@@ -480,9 +480,9 @@ static int hostfs_write_end(struct file *file, struct address_space *mapping,
unsigned from = pos & (PAGE_SIZE - 1);
int err;

- buffer = kmap(page);
+ buffer = kmap_thread(page);
err = write_file(FILE_HOSTFS_I(file)->fd, &pos, buffer + from, copied);
- kunmap(page);
+ kunmap_thread(page);

if (!PageUptodate(page) && err == PAGE_SIZE)
SetPageUptodate(page);
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:04:47

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 27/58] fs/ubifs: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Richard Weinberger <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/ubifs/file.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index b77d1637bbbc..a3537447a885 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -111,7 +111,7 @@ static int do_readpage(struct page *page)
ubifs_assert(c, !PageChecked(page));
ubifs_assert(c, !PagePrivate(page));

- addr = kmap(page);
+ addr = kmap_thread(page);

block = page->index << UBIFS_BLOCKS_PER_PAGE_SHIFT;
beyond = (i_size + UBIFS_BLOCK_SIZE - 1) >> UBIFS_BLOCK_SHIFT;
@@ -174,7 +174,7 @@ static int do_readpage(struct page *page)
SetPageUptodate(page);
ClearPageError(page);
flush_dcache_page(page);
- kunmap(page);
+ kunmap_thread(page);
return 0;

error:
@@ -182,7 +182,7 @@ static int do_readpage(struct page *page)
ClearPageUptodate(page);
SetPageError(page);
flush_dcache_page(page);
- kunmap(page);
+ kunmap_thread(page);
return err;
}

@@ -616,7 +616,7 @@ static int populate_page(struct ubifs_info *c, struct page *page,
dbg_gen("ino %lu, pg %lu, i_size %lld, flags %#lx",
inode->i_ino, page->index, i_size, page->flags);

- addr = zaddr = kmap(page);
+ addr = zaddr = kmap_thread(page);

end_index = (i_size - 1) >> PAGE_SHIFT;
if (!i_size || page->index > end_index) {
@@ -692,7 +692,7 @@ static int populate_page(struct ubifs_info *c, struct page *page,
SetPageUptodate(page);
ClearPageError(page);
flush_dcache_page(page);
- kunmap(page);
+ kunmap_thread(page);
*n = nn;
return 0;

@@ -700,7 +700,7 @@ static int populate_page(struct ubifs_info *c, struct page *page,
ClearPageUptodate(page);
SetPageError(page);
flush_dcache_page(page);
- kunmap(page);
+ kunmap_thread(page);
ubifs_err(c, "bad data node (block %u, inode %lu)",
page_block, inode->i_ino);
return -EINVAL;
@@ -918,7 +918,7 @@ static int do_writepage(struct page *page, int len)
/* Update radix tree tags */
set_page_writeback(page);

- addr = kmap(page);
+ addr = kmap_thread(page);
block = page->index << UBIFS_BLOCKS_PER_PAGE_SHIFT;
i = 0;
while (len) {
@@ -950,7 +950,7 @@ static int do_writepage(struct page *page, int len)
ClearPagePrivate(page);
ClearPageChecked(page);

- kunmap(page);
+ kunmap_thread(page);
unlock_page(page);
end_page_writeback(page);
return err;
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:05:12

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 26/58] fs/zonefs: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Damien Le Moal <[email protected]>
Cc: Naohiro Aota <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/zonefs/super.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 8ec7c8f109d7..2fd6c86beee1 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1297,7 +1297,7 @@ static int zonefs_read_super(struct super_block *sb)
if (ret)
goto free_page;

- super = kmap(page);
+ super = kmap_thread(page);

ret = -EINVAL;
if (le32_to_cpu(super->s_magic) != ZONEFS_MAGIC)
@@ -1349,7 +1349,7 @@ static int zonefs_read_super(struct super_block *sb)
ret = 0;

unmap:
- kunmap(page);
+ kunmap_thread(page);
free_page:
__free_page(page);

--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:06:28

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 25/58] fs/reiserfs: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Jan Kara <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Alex Shi <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/reiserfs/journal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index e98f99338f8f..be8f56261e8c 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -4194,11 +4194,11 @@ static int do_journal_end(struct reiserfs_transaction_handle *th, int flags)
SB_ONDISK_JOURNAL_SIZE(sb)));
set_buffer_uptodate(tmp_bh);
page = cn->bh->b_page;
- addr = kmap(page);
+ addr = kmap_thread(page);
memcpy(tmp_bh->b_data,
addr + offset_in_page(cn->bh->b_data),
cn->bh->b_size);
- kunmap(page);
+ kunmap_thread(page);
mark_buffer_dirty(tmp_bh);
jindex++;
set_buffer_journal_dirty(cn->bh);
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:07:26

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Cc: Jaegeuk Kim <[email protected]>
Cc: Chao Yu <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
fs/f2fs/f2fs.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d9e52a7f3702..ff72a45a577e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(

static inline void f2fs_copy_page(struct page *src, struct page *dst)
{
- char *src_kaddr = kmap(src);
- char *dst_kaddr = kmap(dst);
+ char *src_kaddr = kmap_thread(src);
+ char *dst_kaddr = kmap_thread(dst);

memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
- kunmap(dst);
- kunmap(src);
+ kunmap_thread(dst);
+ kunmap_thread(src);
}

static inline void f2fs_put_page(struct page *page, int unlock)
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:07:42

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 19/58] fs/hfsplus: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this FS are localized to a single thread. To avoid
the overhead of global PKRS updates, use the new kmap_thread() call.

Signed-off-by: Ira Weiny <[email protected]>
---
fs/hfsplus/bitmap.c | 20 ++++-----
fs/hfsplus/bnode.c | 102 ++++++++++++++++++++++----------------------
fs/hfsplus/btree.c | 18 ++++----
3 files changed, 70 insertions(+), 70 deletions(-)

diff --git a/fs/hfsplus/bitmap.c b/fs/hfsplus/bitmap.c
index cebce0cfe340..9ec7c1559a0c 100644
--- a/fs/hfsplus/bitmap.c
+++ b/fs/hfsplus/bitmap.c
@@ -39,7 +39,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
start = size;
goto out;
}
- pptr = kmap(page);
+ pptr = kmap_thread(page);
curr = pptr + (offset & (PAGE_CACHE_BITS - 1)) / 32;
i = offset % 32;
offset &= ~(PAGE_CACHE_BITS - 1);
@@ -74,7 +74,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
}
curr++;
}
- kunmap(page);
+ kunmap_thread(page);
offset += PAGE_CACHE_BITS;
if (offset >= size)
break;
@@ -84,7 +84,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
start = size;
goto out;
}
- curr = pptr = kmap(page);
+ curr = pptr = kmap_thread(page);
if ((size ^ offset) / PAGE_CACHE_BITS)
end = pptr + PAGE_CACHE_BITS / 32;
else
@@ -127,7 +127,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
len -= 32;
}
set_page_dirty(page);
- kunmap(page);
+ kunmap_thread(page);
offset += PAGE_CACHE_BITS;
page = read_mapping_page(mapping, offset / PAGE_CACHE_BITS,
NULL);
@@ -135,7 +135,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
start = size;
goto out;
}
- pptr = kmap(page);
+ pptr = kmap_thread(page);
curr = pptr;
end = pptr + PAGE_CACHE_BITS / 32;
}
@@ -151,7 +151,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size,
done:
*curr = cpu_to_be32(n);
set_page_dirty(page);
- kunmap(page);
+ kunmap_thread(page);
*max = offset + (curr - pptr) * 32 + i - start;
sbi->free_blocks -= *max;
hfsplus_mark_mdb_dirty(sb);
@@ -185,7 +185,7 @@ int hfsplus_block_free(struct super_block *sb, u32 offset, u32 count)
page = read_mapping_page(mapping, pnr, NULL);
if (IS_ERR(page))
goto kaboom;
- pptr = kmap(page);
+ pptr = kmap_thread(page);
curr = pptr + (offset & (PAGE_CACHE_BITS - 1)) / 32;
end = pptr + PAGE_CACHE_BITS / 32;
len = count;
@@ -215,11 +215,11 @@ int hfsplus_block_free(struct super_block *sb, u32 offset, u32 count)
if (!count)
break;
set_page_dirty(page);
- kunmap(page);
+ kunmap_thread(page);
page = read_mapping_page(mapping, ++pnr, NULL);
if (IS_ERR(page))
goto kaboom;
- pptr = kmap(page);
+ pptr = kmap_thread(page);
curr = pptr;
end = pptr + PAGE_CACHE_BITS / 32;
}
@@ -231,7 +231,7 @@ int hfsplus_block_free(struct super_block *sb, u32 offset, u32 count)
}
out:
set_page_dirty(page);
- kunmap(page);
+ kunmap_thread(page);
sbi->free_blocks += len;
hfsplus_mark_mdb_dirty(sb);
mutex_unlock(&sbi->alloc_mutex);
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 177fae4e6581..62757d92fbbd 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -29,14 +29,14 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
off &= ~PAGE_MASK;

l = min_t(int, len, PAGE_SIZE - off);
- memcpy(buf, kmap(*pagep) + off, l);
- kunmap(*pagep);
+ memcpy(buf, kmap_thread(*pagep) + off, l);
+ kunmap_thread(*pagep);

while ((len -= l) != 0) {
buf += l;
l = min_t(int, len, PAGE_SIZE);
- memcpy(buf, kmap(*++pagep), l);
- kunmap(*pagep);
+ memcpy(buf, kmap_thread(*++pagep), l);
+ kunmap_thread(*pagep);
}
}

@@ -82,16 +82,16 @@ void hfs_bnode_write(struct hfs_bnode *node, void *buf, int off, int len)
off &= ~PAGE_MASK;

l = min_t(int, len, PAGE_SIZE - off);
- memcpy(kmap(*pagep) + off, buf, l);
+ memcpy(kmap_thread(*pagep) + off, buf, l);
set_page_dirty(*pagep);
- kunmap(*pagep);
+ kunmap_thread(*pagep);

while ((len -= l) != 0) {
buf += l;
l = min_t(int, len, PAGE_SIZE);
- memcpy(kmap(*++pagep), buf, l);
+ memcpy(kmap_thread(*++pagep), buf, l);
set_page_dirty(*pagep);
- kunmap(*pagep);
+ kunmap_thread(*pagep);
}
}

@@ -112,15 +112,15 @@ void hfs_bnode_clear(struct hfs_bnode *node, int off, int len)
off &= ~PAGE_MASK;

l = min_t(int, len, PAGE_SIZE - off);
- memset(kmap(*pagep) + off, 0, l);
+ memset(kmap_thread(*pagep) + off, 0, l);
set_page_dirty(*pagep);
- kunmap(*pagep);
+ kunmap_thread(*pagep);

while ((len -= l) != 0) {
l = min_t(int, len, PAGE_SIZE);
- memset(kmap(*++pagep), 0, l);
+ memset(kmap_thread(*++pagep), 0, l);
set_page_dirty(*pagep);
- kunmap(*pagep);
+ kunmap_thread(*pagep);
}
}

@@ -142,24 +142,24 @@ void hfs_bnode_copy(struct hfs_bnode *dst_node, int dst,

if (src == dst) {
l = min_t(int, len, PAGE_SIZE - src);
- memcpy(kmap(*dst_page) + src, kmap(*src_page) + src, l);
- kunmap(*src_page);
+ memcpy(kmap_thread(*dst_page) + src, kmap_thread(*src_page) + src, l);
+ kunmap_thread(*src_page);
set_page_dirty(*dst_page);
- kunmap(*dst_page);
+ kunmap_thread(*dst_page);

while ((len -= l) != 0) {
l = min_t(int, len, PAGE_SIZE);
- memcpy(kmap(*++dst_page), kmap(*++src_page), l);
- kunmap(*src_page);
+ memcpy(kmap_thread(*++dst_page), kmap_thread(*++src_page), l);
+ kunmap_thread(*src_page);
set_page_dirty(*dst_page);
- kunmap(*dst_page);
+ kunmap_thread(*dst_page);
}
} else {
void *src_ptr, *dst_ptr;

do {
- src_ptr = kmap(*src_page) + src;
- dst_ptr = kmap(*dst_page) + dst;
+ src_ptr = kmap_thread(*src_page) + src;
+ dst_ptr = kmap_thread(*dst_page) + dst;
if (PAGE_SIZE - src < PAGE_SIZE - dst) {
l = PAGE_SIZE - src;
src = 0;
@@ -171,9 +171,9 @@ void hfs_bnode_copy(struct hfs_bnode *dst_node, int dst,
}
l = min(len, l);
memcpy(dst_ptr, src_ptr, l);
- kunmap(*src_page);
+ kunmap_thread(*src_page);
set_page_dirty(*dst_page);
- kunmap(*dst_page);
+ kunmap_thread(*dst_page);
if (!dst)
dst_page++;
else
@@ -202,27 +202,27 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int src, int len)

if (src == dst) {
while (src < len) {
- memmove(kmap(*dst_page), kmap(*src_page), src);
- kunmap(*src_page);
+ memmove(kmap_thread(*dst_page), kmap_thread(*src_page), src);
+ kunmap_thread(*src_page);
set_page_dirty(*dst_page);
- kunmap(*dst_page);
+ kunmap_thread(*dst_page);
len -= src;
src = PAGE_SIZE;
src_page--;
dst_page--;
}
src -= len;
- memmove(kmap(*dst_page) + src,
- kmap(*src_page) + src, len);
- kunmap(*src_page);
+ memmove(kmap_thread(*dst_page) + src,
+ kmap_thread(*src_page) + src, len);
+ kunmap_thread(*src_page);
set_page_dirty(*dst_page);
- kunmap(*dst_page);
+ kunmap_thread(*dst_page);
} else {
void *src_ptr, *dst_ptr;

do {
- src_ptr = kmap(*src_page) + src;
- dst_ptr = kmap(*dst_page) + dst;
+ src_ptr = kmap_thread(*src_page) + src;
+ dst_ptr = kmap_thread(*dst_page) + dst;
if (src < dst) {
l = src;
src = PAGE_SIZE;
@@ -234,9 +234,9 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int src, int len)
}
l = min(len, l);
memmove(dst_ptr - l, src_ptr - l, l);
- kunmap(*src_page);
+ kunmap_thread(*src_page);
set_page_dirty(*dst_page);
- kunmap(*dst_page);
+ kunmap_thread(*dst_page);
if (dst == PAGE_SIZE)
dst_page--;
else
@@ -251,26 +251,26 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int src, int len)

if (src == dst) {
l = min_t(int, len, PAGE_SIZE - src);
- memmove(kmap(*dst_page) + src,
- kmap(*src_page) + src, l);
- kunmap(*src_page);
+ memmove(kmap_thread(*dst_page) + src,
+ kmap_thread(*src_page) + src, l);
+ kunmap_thread(*src_page);
set_page_dirty(*dst_page);
- kunmap(*dst_page);
+ kunmap_thread(*dst_page);

while ((len -= l) != 0) {
l = min_t(int, len, PAGE_SIZE);
- memmove(kmap(*++dst_page),
- kmap(*++src_page), l);
- kunmap(*src_page);
+ memmove(kmap_thread(*++dst_page),
+ kmap_thread(*++src_page), l);
+ kunmap_thread(*src_page);
set_page_dirty(*dst_page);
- kunmap(*dst_page);
+ kunmap_thread(*dst_page);
}
} else {
void *src_ptr, *dst_ptr;

do {
- src_ptr = kmap(*src_page) + src;
- dst_ptr = kmap(*dst_page) + dst;
+ src_ptr = kmap_thread(*src_page) + src;
+ dst_ptr = kmap_thread(*dst_page) + dst;
if (PAGE_SIZE - src <
PAGE_SIZE - dst) {
l = PAGE_SIZE - src;
@@ -283,9 +283,9 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int src, int len)
}
l = min(len, l);
memmove(dst_ptr, src_ptr, l);
- kunmap(*src_page);
+ kunmap_thread(*src_page);
set_page_dirty(*dst_page);
- kunmap(*dst_page);
+ kunmap_thread(*dst_page);
if (!dst)
dst_page++;
else
@@ -502,14 +502,14 @@ struct hfs_bnode *hfs_bnode_find(struct hfs_btree *tree, u32 num)
if (!test_bit(HFS_BNODE_NEW, &node->flags))
return node;

- desc = (struct hfs_bnode_desc *)(kmap(node->page[0]) +
+ desc = (struct hfs_bnode_desc *)(kmap_thread(node->page[0]) +
node->page_offset);
node->prev = be32_to_cpu(desc->prev);
node->next = be32_to_cpu(desc->next);
node->num_recs = be16_to_cpu(desc->num_recs);
node->type = desc->type;
node->height = desc->height;
- kunmap(node->page[0]);
+ kunmap_thread(node->page[0]);

switch (node->type) {
case HFS_NODE_HEADER:
@@ -593,14 +593,14 @@ struct hfs_bnode *hfs_bnode_create(struct hfs_btree *tree, u32 num)
}

pagep = node->page;
- memset(kmap(*pagep) + node->page_offset, 0,
+ memset(kmap_thread(*pagep) + node->page_offset, 0,
min_t(int, PAGE_SIZE, tree->node_size));
set_page_dirty(*pagep);
- kunmap(*pagep);
+ kunmap_thread(*pagep);
for (i = 1; i < tree->pages_per_bnode; i++) {
- memset(kmap(*++pagep), 0, PAGE_SIZE);
+ memset(kmap_thread(*++pagep), 0, PAGE_SIZE);
set_page_dirty(*pagep);
- kunmap(*pagep);
+ kunmap_thread(*pagep);
}
clear_bit(HFS_BNODE_NEW, &node->flags);
wake_up(&node->lock_wq);
diff --git a/fs/hfsplus/btree.c b/fs/hfsplus/btree.c
index 66774f4cb4fd..74fcef3a1628 100644
--- a/fs/hfsplus/btree.c
+++ b/fs/hfsplus/btree.c
@@ -394,7 +394,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)

off += node->page_offset;
pagep = node->page + (off >> PAGE_SHIFT);
- data = kmap(*pagep);
+ data = kmap_thread(*pagep);
off &= ~PAGE_MASK;
idx = 0;

@@ -407,7 +407,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
idx += i;
data[off] |= m;
set_page_dirty(*pagep);
- kunmap(*pagep);
+ kunmap_thread(*pagep);
tree->free_nodes--;
mark_inode_dirty(tree->inode);
hfs_bnode_put(node);
@@ -417,14 +417,14 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
}
}
if (++off >= PAGE_SIZE) {
- kunmap(*pagep);
- data = kmap(*++pagep);
+ kunmap_thread(*pagep);
+ data = kmap_thread(*++pagep);
off = 0;
}
idx += 8;
len--;
}
- kunmap(*pagep);
+ kunmap_thread(*pagep);
nidx = node->next;
if (!nidx) {
hfs_dbg(BNODE_MOD, "create new bmap node\n");
@@ -440,7 +440,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
off = off16;
off += node->page_offset;
pagep = node->page + (off >> PAGE_SHIFT);
- data = kmap(*pagep);
+ data = kmap_thread(*pagep);
off &= ~PAGE_MASK;
}
}
@@ -490,7 +490,7 @@ void hfs_bmap_free(struct hfs_bnode *node)
}
off += node->page_offset + nidx / 8;
page = node->page[off >> PAGE_SHIFT];
- data = kmap(page);
+ data = kmap_thread(page);
off &= ~PAGE_MASK;
m = 1 << (~nidx & 7);
byte = data[off];
@@ -498,13 +498,13 @@ void hfs_bmap_free(struct hfs_bnode *node)
pr_crit("trying to free free bnode "
"%u(%d)\n",
node->this, node->type);
- kunmap(page);
+ kunmap_thread(page);
hfs_bnode_put(node);
return;
}
data[off] = byte & ~m;
set_page_dirty(page);
- kunmap(page);
+ kunmap_thread(page);
hfs_bnode_put(node);
tree->free_nodes++;
mark_inode_dirty(tree->inode);
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:10:07

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 07/58] drivers/drbd: Utilize new kmap_thread()

From: Ira Weiny <[email protected]>

The kmap() calls in this driver are localized to a single thread. To
avoid the overhead of global PKRS updates, use the new kmap_thread()
call.

Cc: Jens Axboe <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/block/drbd/drbd_main.c | 4 ++--
drivers/block/drbd/drbd_receiver.c | 12 ++++++------
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 573dbf6f0c31..f0d0c6b0745e 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1532,9 +1532,9 @@ static int _drbd_no_send_page(struct drbd_peer_device *peer_device, struct page
int err;

socket = peer_device->connection->data.socket;
- addr = kmap(page) + offset;
+ addr = kmap_thread(page) + offset;
err = drbd_send_all(peer_device->connection, socket, addr, size, msg_flags);
- kunmap(page);
+ kunmap_thread(page);
if (!err)
peer_device->device->send_cnt += size >> 9;
return err;
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 422363daa618..4704bc0564e2 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1951,13 +1951,13 @@ read_in_block(struct drbd_peer_device *peer_device, u64 id, sector_t sector,
page = peer_req->pages;
page_chain_for_each(page) {
unsigned len = min_t(int, ds, PAGE_SIZE);
- data = kmap(page);
+ data = kmap_thread(page);
err = drbd_recv_all_warn(peer_device->connection, data, len);
if (drbd_insert_fault(device, DRBD_FAULT_RECEIVE)) {
drbd_err(device, "Fault injection: Corrupting data on receive\n");
data[0] = data[0] ^ (unsigned long)-1;
}
- kunmap(page);
+ kunmap_thread(page);
if (err) {
drbd_free_peer_req(device, peer_req);
return NULL;
@@ -1992,7 +1992,7 @@ static int drbd_drain_block(struct drbd_peer_device *peer_device, int data_size)

page = drbd_alloc_pages(peer_device, 1, 1);

- data = kmap(page);
+ data = kmap_thread(page);
while (data_size) {
unsigned int len = min_t(int, data_size, PAGE_SIZE);

@@ -2001,7 +2001,7 @@ static int drbd_drain_block(struct drbd_peer_device *peer_device, int data_size)
break;
data_size -= len;
}
- kunmap(page);
+ kunmap_thread(page);
drbd_free_pages(peer_device->device, page, 0);
return err;
}
@@ -2033,10 +2033,10 @@ static int recv_dless_read(struct drbd_peer_device *peer_device, struct drbd_req
D_ASSERT(peer_device->device, sector == bio->bi_iter.bi_sector);

bio_for_each_segment(bvec, bio, iter) {
- void *mapped = kmap(bvec.bv_page) + bvec.bv_offset;
+ void *mapped = kmap_thread(bvec.bv_page) + bvec.bv_offset;
expect = min_t(int, data_size, bvec.bv_len);
err = drbd_recv_all_warn(peer_device->connection, mapped, expect);
- kunmap(bvec.bv_page);
+ kunmap_thread(bvec.bv_page);
if (err)
return err;
data_size -= expect;
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:11:13

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 05/58] kmap: Introduce k[un]map_thread

From: Ira Weiny <[email protected]>

To correctly support the semantics of kmap() with kernel protection keys
(PKS), kmap() may be required to set the protections on multiple
processors (globally). Enabling PKS globally can be very expensive
depending on the requested operation. Furthermore, enabling a domain
globally reduces the protection afforded by PKS.

Most kmap() callers (approx. 209 of 229) use the mapping within a single
thread and have no need for the protection domain to be enabled globally.
However, the remaining callers do not follow this pattern and, as best I can
tell, expect the mapping to be 'global' and available to any thread that may
access the mapping.[1]

We don't anticipate global mappings to pmem; however, in general there is a
danger in changing the semantics of kmap(). Effectively, this would cause an
unresolved page fault with little to no information about why the failure
occurred.

To resolve this a number of options were considered.

1) Attempt to change all the thread local kmap() calls to kmap_atomic()[2]
2) Introduce a flags parameter to kmap() to indicate if the mapping should be
global or not
3) Change ~20 call sites to 'kmap_global()' to indicate that they require a
global enablement of the pages.
4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to
be used within that thread of execution only

Option 1 is simply not feasible. Option 2 would require all of the call sites
of kmap() to change. Option 3 seems like a good minimal change, but there is a
danger that new code may miss the semantic change of kmap() and not get the
behavior the developer intended. Therefore, #4 was chosen.

Subsequent patches will convert ~90% of the kmap() callers to this new call,
leaving about 10% of the existing callers to enable PKS globally.
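
For illustration, the intended usage keeps the familiar 'map/do
something/unmap' pattern, just thread local (a sketch only; the helpers are
defined in the hunk below):

	void *addr = kmap_thread(page);	/* enable access for this thread only */
	memcpy(addr, src, PAGE_SIZE);	/* ... use the mapping ... */
	kunmap_thread(page);		/* disable access again */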

Cc: Randy Dunlap <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
include/linux/highmem.h | 34 ++++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 2a9806e3b8d2..ef7813544719 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -60,7 +60,7 @@ static inline void kmap_flush_tlb(unsigned long addr) { }
#endif

void *kmap_high(struct page *page);
-static inline void *kmap(struct page *page)
+static inline void *__kmap(struct page *page, bool global)
{
void *addr;

@@ -74,20 +74,20 @@ static inline void *kmap(struct page *page)
* Even non-highmem pages may have additional access protections which
* need to be checked and potentially enabled.
*/
- dev_page_enable_access(page, true);
+ dev_page_enable_access(page, global);
return addr;
}

void kunmap_high(struct page *page);

-static inline void kunmap(struct page *page)
+static inline void __kunmap(struct page *page, bool global)
{
might_sleep();
/*
* Even non-highmem pages may have additional access protections which
* need to be checked and potentially disabled.
*/
- dev_page_disable_access(page, true);
+ dev_page_disable_access(page, global);
if (!PageHighMem(page))
return;
kunmap_high(page);
@@ -160,10 +160,10 @@ static inline struct page *kmap_to_page(void *addr)

static inline unsigned long totalhigh_pages(void) { return 0UL; }

-static inline void *kmap(struct page *page)
+static inline void *__kmap(struct page *page, bool global)
{
might_sleep();
- dev_page_enable_access(page, true);
+ dev_page_enable_access(page, global);
return page_address(page);
}

@@ -171,9 +171,9 @@ static inline void kunmap_high(struct page *page)
{
}

-static inline void kunmap(struct page *page)
+static inline void __kunmap(struct page *page, bool global)
{
- dev_page_disable_access(page, true);
+ dev_page_disable_access(page, global);
#ifdef ARCH_HAS_FLUSH_ON_KUNMAP
kunmap_flush_on_unmap(page_address(page));
#endif
@@ -238,6 +238,24 @@ static inline void kmap_atomic_idx_pop(void)

#endif

+static inline void *kmap(struct page *page)
+{
+ return __kmap(page, true);
+}
+static inline void kunmap(struct page *page)
+{
+ __kunmap(page, true);
+}
+
+static inline void *kmap_thread(struct page *page)
+{
+ return __kmap(page, false);
+}
+static inline void kunmap_thread(struct page *page)
+{
+ __kunmap(page, false);
+}
+
/*
* Prevent people trying to call kunmap_atomic() as if it were kunmap()
* kunmap_atomic() should get the return value of kmap_atomic, not the page.
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-09 20:11:49

by Ira Weiny

[permalink] [raw]
Subject: [PATCH RFC PKS/PMEM 04/58] kmap: Add stray access protection for device pages

From: Ira Weiny <[email protected]>

Device managed pages may have additional protections. These protections
need to be removed before kernel users can make valid use of the pages.

Check for special treatment of device managed pages in kmap() and take
action if needed. We use kmap() as the interface for generic kernel code
because, under normal circumstances, it would be a bug for general kernel
code not to use kmap() prior to accessing kernel memory. Therefore, this
should allow any valid kernel users to seamlessly use these pages
without issues.

Because of the critical nature of kmap(), it must be pointed out that the
overhead on regular DRAM is carefully implemented to be as fast as possible.
Furthermore, the underlying MSR write required on protected device pages is
cheaper than a normal MSR write.

Specifically, WRMSR(MSR_IA32_PKRS) is not serializing but still
maintains ordering properties similar to WRPKRU. The current SDM
section on PKRS needs updating but should be the same as that of WRPKRU.
So to quote from the WRPKRU text:

WRPKRU will never execute speculatively. Memory accesses
affected by PKRU register will not execute (even speculatively)
until all prior executions of WRPKRU have completed execution
and updated the PKRU register.

Still, this will make accessing pmem more expensive from the kernel, but the
overhead is minimized, and many pmem users access this memory through user
page mappings which are not affected at all.

Cc: Randy Dunlap <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
include/linux/highmem.h | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 14e6202ce47f..2a9806e3b8d2 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -8,6 +8,7 @@
#include <linux/mm.h>
#include <linux/uaccess.h>
#include <linux/hardirq.h>
+#include <linux/memremap.h>

#include <asm/cacheflush.h>

@@ -31,6 +32,20 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)

#include <asm/kmap_types.h>

+static inline void dev_page_enable_access(struct page *page, bool global)
+{
+ if (!page_is_access_protected(page))
+ return;
+ dev_access_enable(global);
+}
+
+static inline void dev_page_disable_access(struct page *page, bool global)
+{
+ if (!page_is_access_protected(page))
+ return;
+ dev_access_disable(global);
+}
+
#ifdef CONFIG_HIGHMEM
extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
extern void kunmap_atomic_high(void *kvaddr);
@@ -55,6 +70,11 @@ static inline void *kmap(struct page *page)
else
addr = kmap_high(page);
kmap_flush_tlb((unsigned long)addr);
+ /*
+ * Even non-highmem pages may have additional access protections which
+ * need to be checked and potentially enabled.
+ */
+ dev_page_enable_access(page, true);
return addr;
}

@@ -63,6 +83,11 @@ void kunmap_high(struct page *page);
static inline void kunmap(struct page *page)
{
might_sleep();
+ /*
+ * Even non-highmem pages may have additional access protections which
+ * need to be checked and potentially disabled.
+ */
+ dev_page_disable_access(page, true);
if (!PageHighMem(page))
return;
kunmap_high(page);
@@ -85,6 +110,7 @@ static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
{
preempt_disable();
pagefault_disable();
+ dev_page_enable_access(page, false);
if (!PageHighMem(page))
return page_address(page);
return kmap_atomic_high_prot(page, prot);
@@ -137,6 +163,7 @@ static inline unsigned long totalhigh_pages(void) { return 0UL; }
static inline void *kmap(struct page *page)
{
might_sleep();
+ dev_page_enable_access(page, true);
return page_address(page);
}

@@ -146,6 +173,7 @@ static inline void kunmap_high(struct page *page)

static inline void kunmap(struct page *page)
{
+ dev_page_disable_access(page, true);
#ifdef ARCH_HAS_FLUSH_ON_KUNMAP
kunmap_flush_on_unmap(page_address(page));
#endif
@@ -155,6 +183,7 @@ static inline void *kmap_atomic(struct page *page)
{
preempt_disable();
pagefault_disable();
+ dev_page_enable_access(page, false);
return page_address(page);
}
#define kmap_atomic_prot(page, prot) kmap_atomic(page)
@@ -216,7 +245,8 @@ static inline void kmap_atomic_idx_pop(void)
#define kunmap_atomic(addr) \
do { \
BUILD_BUG_ON(__same_type((addr), struct page *)); \
- kunmap_atomic_high(addr); \
+ dev_page_disable_access(kmap_to_page(addr), false); \
+ kunmap_atomic_high(addr); \
pagefault_enable(); \
preempt_enable(); \
} while (0)
--
2.28.0.rc0.12.gb6a658bd00c9

2020-10-10 04:53:31

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

On Fri, Oct 09, 2020 at 12:49:57PM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> The kmap() calls in this FS are localized to a single thread. To avoid
> the over head of global PKRS updates use the new kmap_thread() call.
>
> Cc: Jaegeuk Kim <[email protected]>
> Cc: Chao Yu <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
> ---
> fs/f2fs/f2fs.h | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index d9e52a7f3702..ff72a45a577e 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
>
> static inline void f2fs_copy_page(struct page *src, struct page *dst)
> {
> - char *src_kaddr = kmap(src);
> - char *dst_kaddr = kmap(dst);
> + char *src_kaddr = kmap_thread(src);
> + char *dst_kaddr = kmap_thread(dst);
>
> memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> - kunmap(dst);
> - kunmap(src);
> + kunmap_thread(dst);
> + kunmap_thread(src);
> }

Wouldn't it make more sense to switch cases like this to kmap_atomic()?
The pages are only mapped to do a memcpy(), then they're immediately unmapped.
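
Untested sketch of what I mean:

	static inline void f2fs_copy_page(struct page *src, struct page *dst)
	{
		char *src_kaddr = kmap_atomic(src);
		char *dst_kaddr = kmap_atomic(dst);

		memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
		kunmap_atomic(dst_kaddr);
		kunmap_atomic(src_kaddr);
	}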

- Eric

2020-10-10 07:11:24

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
> On Fri, Oct 09, 2020 at 12:49:57PM -0700, [email protected] wrote:
> > The kmap() calls in this FS are localized to a single thread. To avoid
> > the over head of global PKRS updates use the new kmap_thread() call.
> >
> > @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
> >
> > static inline void f2fs_copy_page(struct page *src, struct page *dst)
> > {
> > - char *src_kaddr = kmap(src);
> > - char *dst_kaddr = kmap(dst);
> > + char *src_kaddr = kmap_thread(src);
> > + char *dst_kaddr = kmap_thread(dst);
> >
> > memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > - kunmap(dst);
> > - kunmap(src);
> > + kunmap_thread(dst);
> > + kunmap_thread(src);
> > }
>
> Wouldn't it make more sense to switch cases like this to kmap_atomic()?
> The pages are only mapped to do a memcpy(), then they're immediately unmapped.

Maybe you missed the earlier thread from Thomas trying to do something
similar for rather different reasons ...

https://lore.kernel.org/lkml/[email protected]/

2020-10-10 07:22:51

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

On Sat, Oct 10, 2020 at 01:39:54AM +0100, Matthew Wilcox wrote:
> On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
> > On Fri, Oct 09, 2020 at 12:49:57PM -0700, [email protected] wrote:
> > > The kmap() calls in this FS are localized to a single thread. To avoid
> > > the over head of global PKRS updates use the new kmap_thread() call.
> > >
> > > @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
> > >
> > > static inline void f2fs_copy_page(struct page *src, struct page *dst)
> > > {
> > > - char *src_kaddr = kmap(src);
> > > - char *dst_kaddr = kmap(dst);
> > > + char *src_kaddr = kmap_thread(src);
> > > + char *dst_kaddr = kmap_thread(dst);
> > >
> > > memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > > - kunmap(dst);
> > > - kunmap(src);
> > > + kunmap_thread(dst);
> > > + kunmap_thread(src);
> > > }
> >
> > Wouldn't it make more sense to switch cases like this to kmap_atomic()?
> > The pages are only mapped to do a memcpy(), then they're immediately unmapped.
>
> Maybe you missed the earlier thread from Thomas trying to do something
> similar for rather different reasons ...
>
> https://lore.kernel.org/lkml/[email protected]/

I did miss it. I'm not subscribed to any of the mailing lists it was sent to.

Anyway, it shouldn't matter. Patchsets should be standalone, and not require
reading random prior threads on linux-kernel to understand.

And I still don't really understand. After this patchset, there is still code
nearly identical to the above (doing a temporary mapping just for a memcpy) that
would still be using kmap_atomic(). Is the idea that later, such code will be
converted to use kmap_thread() instead? If not, why use one over the other?

- Eric

2020-10-10 08:13:08

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

On Fri, 2020-10-09 at 14:34 -0700, Eric Biggers wrote:
> On Fri, Oct 09, 2020 at 12:49:57PM -0700, [email protected] wrote:
> > From: Ira Weiny <[email protected]>
> >
> > The kmap() calls in this FS are localized to a single thread. To
> > avoid the over head of global PKRS updates use the new
> > kmap_thread() call.
> >
> > Cc: Jaegeuk Kim <[email protected]>
> > Cc: Chao Yu <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> > ---
> > fs/f2fs/f2fs.h | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index d9e52a7f3702..ff72a45a577e 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -2410,12 +2410,12 @@ static inline struct page
> > *f2fs_pagecache_get_page(
> >
> > static inline void f2fs_copy_page(struct page *src, struct page
> > *dst)
> > {
> > - char *src_kaddr = kmap(src);
> > - char *dst_kaddr = kmap(dst);
> > + char *src_kaddr = kmap_thread(src);
> > + char *dst_kaddr = kmap_thread(dst);
> >
> > memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > - kunmap(dst);
> > - kunmap(src);
> > + kunmap_thread(dst);
> > + kunmap_thread(src);
> > }
>
> Wouldn't it make more sense to switch cases like this to
> kmap_atomic()?
> The pages are only mapped to do a memcpy(), then they're immediately
> unmapped.

On a VIPT/VIVT architecture, this is horrendously wasteful. You're
taking something that was mapped at colour c_src and mapping it to a new
address src_kaddr, which is likely a different colour and necessitates
flushing the original c_src; then you copy it to dst_kaddr, which is
also likely a different colour from c_dst, so dst_kaddr has to be
flushed on kunmap and c_dst has to be invalidated on kmap. What we
should have is an architectural primitive for doing this, something
like kmemcopy_arch(dst, src). PIPT architectures can implement it as
the above (possibly losing kmap if they don't need it), but VIPT/VIVT
architectures can set up a correctly coloured mapping so they can
simply copy from c_src to c_dst without any need to flush, and the data
arrives cache hot at c_dst.
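
To make that concrete, the PIPT default could be a hand-waved sketch
like:

	#ifndef kmemcopy_arch
	static inline void kmemcopy_arch(struct page *dst, struct page *src)
	{
		char *vfrom = kmap_atomic(src);
		char *vto = kmap_atomic(dst);

		memcpy(vto, vfrom, PAGE_SIZE);
		kunmap_atomic(vto);
		kunmap_atomic(vfrom);
	}
	#endif

with VIPT/VIVT architectures overriding it to set up the correctly
coloured mapping.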

James


2020-10-13 11:12:23

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

On Mon, Oct 12, 2020 at 09:02:54PM +0100, Matthew Wilcox wrote:
> On Mon, Oct 12, 2020 at 12:53:54PM -0700, Ira Weiny wrote:
> > On Mon, Oct 12, 2020 at 05:44:38PM +0100, Matthew Wilcox wrote:
> > > On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
> > > > kmap_atomic() is always preferred over kmap()/kmap_thread().
> > > > kmap_atomic() is _much_ more lightweight since its TLB invalidation is
> > > > always CPU-local and never broadcast.
> > > >
> > > > So, basically, unless you *must* sleep while the mapping is in place,
> > > > kmap_atomic() is preferred.
> > >
> > > But kmap_atomic() disables preemption, so the _ideal_ interface would map
> > > it only locally, then on preemption make it global. I don't even know
> > > if that _can_ be done. But this email makes it seem like kmap_atomic()
> > > has no downsides.
> >
> > And that is IIUC what Thomas was trying to solve.
> >
> > Also, Linus brought up that kmap_atomic() has quirks in nesting.[1]
> >
> > From what I can see all of these discussions support the need to have something
> > between kmap() and kmap_atomic().
> >
> > However, the reason behind converting call sites to kmap_thread() are different
> > between Thomas' patch set and mine. Both require more kmap granularity.
> > However, they do so with different reasons and underlying implementations but
> > with the _same_ resulting semantics; a thread local mapping which is
> > preemptable.[2] Therefore they each focus on changing different call sites.
> >
> > While this patch set is huge I think it serves a valuable purpose to identify a
> > large number of call sites which are candidates for this new semantic.
>
> Yes, I agree. My problem with this patch-set is that it ties it to
> some Intel feature that almost nobody cares about.

I humbly disagree. At this level the only thing this is tied to is the idea
that there are additional memory protections available which can be enabled
quickly on a per-thread basis. PKS on Intel is but one implementation of that.

Even the kmap code only knows that something special needs to be done on a
devm page.

>
> Maybe we should
> care about it, but you didn't try very hard to make anyone care about
> it in the cover letter.

Ok, my bad. We have customers who care very much about restricting access to
the PMEM pages to prevent bugs in the kernel from causing permanent damage to
their data/file systems. I'll reword the cover letter to make that clearer.

>
> For a future patch-set, I'd like to see you just introduce the new
> API. Then you can optimise the Intel implementation of it afterwards.
> Those patch-sets have entirely different reviewers.

I considered doing this. But this seemed more logical because the feature is
being driven by PMEM, which is behind the kmap interface, not by the users of
the API.

I can introduce a patch set with a kmap_thread() call which does nothing, if
that is more palatable, but it seems wrong to me to do so.
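
(Roughly, a stub along the lines of:

	static inline void *kmap_thread(struct page *page)
	{
		return kmap(page);
	}

with kunmap_thread() likewise falling back to kunmap(), until an
implementation shows up.)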

Ira

2020-10-13 19:47:07

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()

On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
> On Fri, Oct 9, 2020 at 12:52 PM <[email protected]> wrote:
> >
> > From: Ira Weiny <[email protected]>
> >
> > The kmap() calls in this FS are localized to a single thread. To avoid
> > the over head of global PKRS updates use the new kmap_thread() call.
> >
> > Cc: Nicolas Pitre <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> > ---
> > fs/cramfs/inode.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> > index 912308600d39..003c014a42ed 100644
> > --- a/fs/cramfs/inode.c
> > +++ b/fs/cramfs/inode.c
> > @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> > struct page *page = pages[i];
> >
> > if (page) {
> > - memcpy(data, kmap(page), PAGE_SIZE);
> > - kunmap(page);
> > + memcpy(data, kmap_thread(page), PAGE_SIZE);
> > + kunmap_thread(page);
>
> Why does this need a sleepable kmap? This looks like a textbook
> kmap_atomic() use case.

There's a lot of code of this form. Could we perhaps have:

static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
{
char *vto = kmap_atomic(to);

memcpy(vto, vfrom, size);
kunmap_atomic(vto);
}

in linux/highmem.h ?

2020-10-13 19:47:46

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()

On Tue, Oct 13, 2020 at 12:37 PM Matthew Wilcox <[email protected]> wrote:
>
> On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
> > On Fri, Oct 9, 2020 at 12:52 PM <[email protected]> wrote:
> > >
> > > From: Ira Weiny <[email protected]>
> > >
> > > The kmap() calls in this FS are localized to a single thread. To avoid
> > > the over head of global PKRS updates use the new kmap_thread() call.
> > >
> > > Cc: Nicolas Pitre <[email protected]>
> > > Signed-off-by: Ira Weiny <[email protected]>
> > > ---
> > > fs/cramfs/inode.c | 10 +++++-----
> > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> > > index 912308600d39..003c014a42ed 100644
> > > --- a/fs/cramfs/inode.c
> > > +++ b/fs/cramfs/inode.c
> > > @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> > > struct page *page = pages[i];
> > >
> > > if (page) {
> > > - memcpy(data, kmap(page), PAGE_SIZE);
> > > - kunmap(page);
> > > + memcpy(data, kmap_thread(page), PAGE_SIZE);
> > > + kunmap_thread(page);
> >
> > Why does this need a sleepable kmap? This looks like a textbook
> > kmap_atomic() use case.
>
> There's a lot of code of this form. Could we perhaps have:
>
> static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> {
> char *vto = kmap_atomic(to);
>
> memcpy(vto, vfrom, size);
> kunmap_atomic(vto);
> }
>
> in linux/highmem.h ?

Nice, yes, that could also replace the local ones in lib/iov_iter.c
(memcpy_{to,from}_page())

2020-10-13 20:03:26

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()

On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:

> static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> {
> char *vto = kmap_atomic(to);
>
> memcpy(vto, vfrom, size);
> kunmap_atomic(vto);
> }
>
> in linux/highmem.h ?

You mean, like
static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
{
char *from = kmap_atomic(page);
memcpy(to, from + offset, len);
kunmap_atomic(from);
}

static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
{
char *to = kmap_atomic(page);
memcpy(to + offset, from, len);
kunmap_atomic(to);
}

static void memzero_page(struct page *page, size_t offset, size_t len)
{
char *addr = kmap_atomic(page);
memset(addr + offset, 0, len);
kunmap_atomic(addr);
}

in lib/iov_iter.c? FWIW, I don't like that "highpage" in the name and
highmem.h as location - these make perfect sense regardless of highmem;
they are normal memory operations with page + offset used instead of
a pointer...
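
With those, the cramfs hunk quoted above would, for example, collapse to:

	memcpy_from_page(data, page, 0, PAGE_SIZE);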

2020-10-13 20:51:48

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()

On Tue, Oct 13, 2020 at 09:01:49PM +0100, Al Viro wrote:
> On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
>
> > static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> > {
> > char *vto = kmap_atomic(to);
> >
> > memcpy(vto, vfrom, size);
> > kunmap_atomic(vto);
> > }
> >
> > in linux/highmem.h ?
>
> You mean, like
> static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
> {
> char *from = kmap_atomic(page);
> memcpy(to, from + offset, len);
> kunmap_atomic(from);
> }
>
> static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
> {
> char *to = kmap_atomic(page);
> memcpy(to + offset, from, len);
> kunmap_atomic(to);
> }
>
> static void memzero_page(struct page *page, size_t offset, size_t len)
> {
> char *addr = kmap_atomic(page);
> memset(addr + offset, 0, len);
> kunmap_atomic(addr);
> }
>
> in lib/iov_iter.c? FWIW, I don't like that "highpage" in the name and
> highmem.h as location - these make perfect sense regardless of highmem;
> they are normal memory operations with page + offset used instead of
> a pointer...

I was thinking along those lines as well, especially because of the direction
this patch set takes kmap().

Thanks for pointing these out to me. How about I lift them to a common header?
But if not highmem.h, where?

Ira