2015-05-06 17:50:51

by David Hildenbrand

Subject: [PATCH RFC 00/15] decouple pagefault_disable() from preempt_disable()

As Peter asked me to also do the decoupling in one shot, this is
the new series.

I recently discovered that might_fault() doesn't call might_sleep()
anymore. Therefore bugs like:

spin_lock(&lock);
rc = copy_to_user(...);
spin_unlock(&lock);

would not be detected with CONFIG_DEBUG_ATOMIC_SLEEP. The code was
changed to disable false positives for code like:

pagefault_disable();
rc = copy_to_user(...);
pagefault_enable();

Here, the caller wants to deal with failures itself.

If we schedule while holding a spinlock, bad things can happen.
Preemption is disabled, but we still schedule - on s390 this can lead
to spinlocks never getting unlocked (as the unlocking code checks for
the cpu id that locked the spinlock), therefore resulting in a deadlock.

Until now, pagefault_(dis|en)able() simply modified the preempt count,
thereby telling the pagefault handler that the context is atomic and
sleeping is disallowed.

With !CONFIG_PREEMPT_COUNT, the preempt count does not exist, so
preempt_disable() as well as pagefault_disable() is a NOP.
in_atomic() checks the preempt count, so we can't detect atomic
context on systems without preemption.
That is also mentioned in the comment of in_atomic():

"WARNING: this macro cannot always detect atomic context; in particular,
it cannot know about held spinlocks in non-preemptible kernels."
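
Roughly what this boils down to (a simplified sketch, not the exact
kernel definitions):

#ifndef CONFIG_PREEMPT_COUNT
/* both calls compile down to mere compiler barriers */
#define preempt_disable()	barrier()
#define preempt_enable()	barrier()
#endif

in_atomic() inspects the preempt count and therefore cannot see such
sections.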

We automatically have !CONFIG_PREEMPT_COUNT with !CONFIG_PREEMPT and
!CONFIG_DEBUG_ATOMIC_SLEEP - so exactly on such systems, disabled
pagefaults cannot be detected via in_atomic().

All fault handlers currently rely on in_atomic() to check for disabled
pagefaults.

This series therefore does 2 things:


1. Decouple pagefault_disable() from preempt_disable()

pagefault_(dis|en)able() now maintains its own counter and doesn't touch
preemption anymore. The fault handlers now only check for
disabled pagefaults.

I checked (hopefully) every caller of pagefault_disable() for an
implicit reliance on preempt_disable() - and, where found, added
explicit preemption calls. I hope I haven't missed any cases.

I didn't check all users of preempt_disable() to see whether they rely
on it also disabling pagefaults. My assumption is that such code is
broken either way (on non-preemptible systems), as pagefaults would
only be disabled with CONFIG_PREEMPT_COUNT.

I also thought about leaving the in_atomic() check inside the
fault handlers in addition to the new check. In my opinion that
can only hide bugs that would happen with !CONFIG_PREEMPT_COUNT.

We could even add a warning at that place like
WARN_ONCE(in_atomic() && !pagefault_disabled())
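
E.g. (a sketch of where such a check could live; not part of this
series), at the top of each fault handler:

/* atomic context, but nobody disabled pagefaults explicitly */
WARN_ONCE(in_atomic() && !pagefault_disabled(),
	  "fault in atomic context without pagefault_disable()");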

But most of these cases should be found with the following part.


2. Reenable might_sleep() checks for might_fault()

As we can now decide if we really have pagefaults disabled,
we can reenable the might_sleep() check in might_fault().

So this should now work:

spin_lock(&lock); /* also works if left away */
pagefault_disable();
rc = copy_to_user(...);
pagefault_enable();
spin_unlock(&lock);

And this should report a warning again:

spin_lock(&lock);
rc = copy_to_user(...);
spin_unlock(&lock);


Cross compiled on powerpc, arm, sparc, sparc64, arm64, x86_64, i386,
mips, alpha, ia64, xtensa, m68k, microblaze.

Tested on s390x.

Any feedback very welcome!

Thanks!


David Hildenbrand (15):
uaccess: count pagefault_disable() levels in pagefault_disabled
mm, uaccess: trigger might_sleep() in might_fault() with disabled
pagefaults
uaccess: clarify that uaccess may only sleep if pagefaults are enabled
mm: explicitly disable/enable preemption in kmap_atomic_*
mips: kmap_coherent relies on disabled preemption
mm: use pagefault_disabled() to check for disabled pagefaults
drm/i915: use pagefault_disabled() to check for disabled pagefaults
futex: UP futex_atomic_op_inuser() relies on disabled preemption
futex: UP futex_atomic_cmpxchg_inatomic() relies on disabled
preemption
arm/futex: UP futex_atomic_cmpxchg_inatomic() relies on disabled
preemption
arm/futex: UP futex_atomic_op_inuser() relies on disabled preemption
futex: clarify that preemption doesn't have to be disabled
powerpc: enable_kernel_altivec() requires disabled preemption
mips: properly lock access to the fpu
uaccess: decouple preemption from the pagefault logic

arch/alpha/mm/fault.c | 5 ++--
arch/arc/include/asm/futex.h | 10 +++----
arch/arc/mm/fault.c | 2 +-
arch/arm/include/asm/futex.h | 13 +++++++--
arch/arm/mm/fault.c | 2 +-
arch/arm/mm/highmem.c | 3 ++
arch/arm64/include/asm/futex.h | 4 +--
arch/arm64/mm/fault.c | 2 +-
arch/avr32/include/asm/uaccess.h | 12 +++++---
arch/avr32/mm/fault.c | 4 +--
arch/cris/mm/fault.c | 4 +--
arch/frv/mm/fault.c | 4 +--
arch/frv/mm/highmem.c | 2 ++
arch/hexagon/include/asm/uaccess.h | 3 +-
arch/ia64/mm/fault.c | 4 +--
arch/m32r/include/asm/uaccess.h | 30 +++++++++++++-------
arch/m32r/mm/fault.c | 4 +--
arch/m68k/mm/fault.c | 4 +--
arch/metag/mm/fault.c | 2 +-
arch/metag/mm/highmem.c | 4 ++-
arch/microblaze/include/asm/uaccess.h | 6 ++--
arch/microblaze/mm/fault.c | 6 ++--
arch/microblaze/mm/highmem.c | 4 ++-
arch/mips/include/asm/uaccess.h | 45 ++++++++++++++++++++----------
arch/mips/kernel/signal-common.h | 9 ++----
arch/mips/mm/fault.c | 4 +--
arch/mips/mm/highmem.c | 5 +++-
arch/mips/mm/init.c | 2 ++
arch/mn10300/include/asm/highmem.h | 3 ++
arch/mn10300/mm/fault.c | 4 +--
arch/nios2/mm/fault.c | 2 +-
arch/parisc/include/asm/cacheflush.h | 2 ++
arch/parisc/kernel/traps.c | 4 +--
arch/parisc/mm/fault.c | 4 +--
arch/powerpc/lib/vmx-helper.c | 11 ++++----
arch/powerpc/mm/fault.c | 9 +++---
arch/powerpc/mm/highmem.c | 4 ++-
arch/s390/include/asm/uaccess.h | 15 ++++++----
arch/s390/mm/fault.c | 2 +-
arch/score/include/asm/uaccess.h | 15 ++++++----
arch/score/mm/fault.c | 3 +-
arch/sh/mm/fault.c | 3 +-
arch/sparc/mm/fault_32.c | 4 +--
arch/sparc/mm/fault_64.c | 4 +--
arch/sparc/mm/highmem.c | 4 ++-
arch/sparc/mm/init_64.c | 2 +-
arch/tile/include/asm/uaccess.h | 18 ++++++++----
arch/tile/mm/fault.c | 4 +--
arch/tile/mm/highmem.c | 3 +-
arch/um/kernel/trap.c | 4 +--
arch/unicore32/mm/fault.c | 2 +-
arch/x86/include/asm/uaccess.h | 15 ++++++----
arch/x86/include/asm/uaccess_32.h | 6 ++--
arch/x86/lib/usercopy_32.c | 6 ++--
arch/x86/mm/fault.c | 5 ++--
arch/x86/mm/highmem_32.c | 3 +-
arch/x86/mm/iomap_32.c | 2 ++
arch/xtensa/mm/fault.c | 4 +--
arch/xtensa/mm/highmem.c | 2 ++
drivers/crypto/vmx/aes.c | 8 +++++-
drivers/crypto/vmx/aes_cbc.c | 6 ++++
drivers/crypto/vmx/ghash.c | 8 ++++++
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +-
include/asm-generic/futex.h | 7 +++--
include/linux/highmem.h | 2 ++
include/linux/io-mapping.h | 2 ++
include/linux/kernel.h | 3 +-
include/linux/sched.h | 1 +
include/linux/uaccess.h | 36 +++++++++++++++---------
kernel/fork.c | 3 ++
lib/strnlen_user.c | 6 ++--
mm/memory.c | 18 ++++--------
72 files changed, 302 insertions(+), 169 deletions(-)

--
2.1.4


2015-05-06 17:55:14

by David Hildenbrand

Subject: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

Until now, pagefault_disable()/pagefault_enable() used the preempt
count to track whether we are in an environment with pagefaults
disabled (which can be queried via in_atomic()).

This patch introduces a separate counter in task_struct to count the
level of pagefault_disable() calls. We'll keep manipulating the preempt
count to retain compatibility with existing pagefault handlers.

It is now possible to verify whether we are in a pagefault_disable()
environment by calling pagefault_disabled(). In contrast to in_atomic(),
it will not be influenced by preempt_enable()/preempt_disable().
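
To illustrate the intended semantics (a sketch; uptr, buf and len are
placeholders, and the comments track the counter value assuming no
surrounding pagefault_disable()):

pagefault_disable();               /* current->pagefault_disabled == 1 */
pagefault_disable();               /* == 2, calls may nest */
rc = copy_to_user(uptr, buf, len); /* will not sleep */
pagefault_enable();                /* == 1, pagefaults still disabled */
pagefault_enable();                /* == 0, pagefaults enabled again */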

This patch is based on a patch from Ingo Molnar.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/sched.h | 1 +
include/linux/uaccess.h | 36 +++++++++++++++++++++++++++++-------
kernel/fork.c | 3 +++
3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ede26ca..75778cb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1724,6 +1724,7 @@ struct task_struct {
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
unsigned long task_state_change;
#endif
+ int pagefault_disabled;
};

/* Future-safe accessor for struct task_struct's cpus_allowed. */
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index ecd3319..23290cc 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -2,20 +2,36 @@
#define __LINUX_UACCESS_H__

#include <linux/preempt.h>
+#include <linux/sched.h>
#include <asm/uaccess.h>

+static __always_inline void pagefault_disabled_inc(void)
+{
+ current->pagefault_disabled++;
+}
+
+static __always_inline void pagefault_disabled_dec(void)
+{
+ current->pagefault_disabled--;
+ WARN_ON(current->pagefault_disabled < 0);
+}
+
/*
- * These routines enable/disable the pagefault handler in that
- * it will not take any locks and go straight to the fixup table.
+ * These routines enable/disable the pagefault handler. If disabled, it will
+ * not take any locks and go straight to the fixup table.
+ *
+ * We increase the preempt and the pagefault count, to be able to distinguish
+ * whether we run in simple atomic context or in a real pagefault_disable()
+ * context.
+ *
+ * For now, after pagefault_disabled() has been called, we run in atomic
+ * context. User access methods will not sleep.
*
- * They have great resemblance to the preempt_disable/enable calls
- * and in fact they are identical; this is because currently there is
- * no other way to make the pagefault handlers do this. So we do
- * disable preemption but we don't necessarily care about that.
*/
static inline void pagefault_disable(void)
{
preempt_count_inc();
+ pagefault_disabled_inc();
/*
* make sure to have issued the store before a pagefault
* can hit.
@@ -25,18 +41,24 @@ static inline void pagefault_disable(void)

static inline void pagefault_enable(void)
{
-#ifndef CONFIG_PREEMPT
/*
* make sure to issue those last loads/stores before enabling
* the pagefault handler again.
*/
barrier();
+ pagefault_disabled_dec();
+#ifndef CONFIG_PREEMPT
preempt_count_dec();
#else
preempt_enable();
#endif
}

+/*
+ * Is the pagefault handler disabled? If so, user access methods will not sleep.
+ */
+#define pagefault_disabled() (current->pagefault_disabled != 0)
+
#ifndef ARCH_HAS_NOCACHE_UACCESS

static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
diff --git a/kernel/fork.c b/kernel/fork.c
index 03c1eaa..c344d27 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1396,6 +1396,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->hardirq_context = 0;
p->softirq_context = 0;
#endif
+
+ p->pagefault_disabled = 0;
+
#ifdef CONFIG_LOCKDEP
p->lockdep_depth = 0; /* no locks held yet */
p->curr_chain_key = 0;
--
2.1.4

2015-05-06 17:54:49

by David Hildenbrand

Subject: [PATCH RFC 02/15] mm, uaccess: trigger might_sleep() in might_fault() with disabled pagefaults

Commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with
pagefault_disable()") removed might_sleep() checks for all user access
code (that uses might_fault()).

The reason was to disable wrong "sleep in atomic" warnings in the
following scenario:
pagefault_disable()
rc = copy_to_user(...)
pagefault_enable()

This is valid, as pagefault_disable() increments the preempt counter
and therefore disables the pagefault handler. copy_to_user() will not
sleep and will return an error code if a page is not available.
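
In other words, a caller is expected to handle the failure itself,
e.g. (a sketch; uptr, buf and len are placeholders):

pagefault_disable();
rc = copy_to_user(uptr, buf, len); /* returns #bytes not copied */
pagefault_enable();
if (rc)
	return -EFAULT; /* or fall back to a sleepable slow path */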

However, as all might_sleep() checks are removed,
CONFIG_DEBUG_ATOMIC_SLEEP would no longer detect the following scenario:
spin_lock(&lock);
rc = copy_to_user(...)
spin_unlock(&lock)

If the kernel is compiled with preemption turned on, preempt_disable()
will make in_atomic() detect disabled preemption. The fault handler would
correctly never sleep on user access.
However, with preemption turned off, preempt_disable() is usually a NOP
(with !CONFIG_PREEMPT_COUNT), therefore in_atomic() will not be able to
detect disabled preemption nor disabled pagefaults. The fault handler
could sleep.
We really want to enable CONFIG_DEBUG_ATOMIC_SLEEP checks for user access
functions again, otherwise we can end up with horrible deadlocks.

The root of all evil is that pagefault_disable() acts almost like
preempt_disable(), depending on whether preemption is turned on or off.

As we now have pagefault_disabled(), we can use it to distinguish
whether user access functions might sleep.

Convert might_fault() into a macro that calls __might_fault(), to
allow proper file + line messages in case of a might_sleep() warning.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/kernel.h | 3 ++-
mm/memory.c | 18 ++++++------------
2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3a5b48e..060dd7b 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -244,7 +244,8 @@ static inline u32 reciprocal_scale(u32 val, u32 ep_ro)

#if defined(CONFIG_MMU) && \
(defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP))
-void might_fault(void);
+#define might_fault() __might_fault(__FILE__, __LINE__)
+void __might_fault(const char *file, int line);
#else
static inline void might_fault(void) { }
#endif
diff --git a/mm/memory.c b/mm/memory.c
index d1fa0c1..2ddd80a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3737,7 +3737,7 @@ void print_vma_addr(char *prefix, unsigned long ip)
}

#if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP)
-void might_fault(void)
+void __might_fault(const char *file, int line)
{
/*
* Some code (nfs/sunrpc) uses socket ops on kernel memory while
@@ -3747,21 +3747,15 @@ void might_fault(void)
*/
if (segment_eq(get_fs(), KERNEL_DS))
return;
-
- /*
- * it would be nicer only to annotate paths which are not under
- * pagefault_disable, however that requires a larger audit and
- * providing helpers like get_user_atomic.
- */
- if (in_atomic())
+ if (pagefault_disabled())
return;
-
- __might_sleep(__FILE__, __LINE__, 0);
-
+ __might_sleep(file, line, 0);
+#if defined(CONFIG_DEBUG_ATOMIC_SLEEP)
if (current->mm)
might_lock_read(&current->mm->mmap_sem);
+#endif
}
-EXPORT_SYMBOL(might_fault);
+EXPORT_SYMBOL(__might_fault);
#endif

#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
--
2.1.4

2015-05-06 17:50:58

by David Hildenbrand

Subject: [PATCH RFC 03/15] uaccess: clarify that uaccess may only sleep if pagefaults are enabled

In general, non-atomic variants of user access functions must not sleep
if pagefaults are disabled.

Let's update all relevant comments in uaccess code. This also reflects
the might_sleep() checks in might_fault().

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/avr32/include/asm/uaccess.h | 12 ++++++----
arch/hexagon/include/asm/uaccess.h | 3 ++-
arch/m32r/include/asm/uaccess.h | 30 +++++++++++++++--------
arch/microblaze/include/asm/uaccess.h | 6 +++--
arch/mips/include/asm/uaccess.h | 45 +++++++++++++++++++++++------------
arch/s390/include/asm/uaccess.h | 15 ++++++++----
arch/score/include/asm/uaccess.h | 15 ++++++++----
arch/tile/include/asm/uaccess.h | 18 +++++++++-----
arch/x86/include/asm/uaccess.h | 15 ++++++++----
arch/x86/include/asm/uaccess_32.h | 6 +++--
arch/x86/lib/usercopy_32.c | 6 +++--
lib/strnlen_user.c | 6 +++--
12 files changed, 118 insertions(+), 59 deletions(-)

diff --git a/arch/avr32/include/asm/uaccess.h b/arch/avr32/include/asm/uaccess.h
index a46f7cf..68cf638 100644
--- a/arch/avr32/include/asm/uaccess.h
+++ b/arch/avr32/include/asm/uaccess.h
@@ -97,7 +97,8 @@ static inline __kernel_size_t __copy_from_user(void *to,
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -116,7 +117,8 @@ static inline __kernel_size_t __copy_from_user(void *to,
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -136,7 +138,8 @@ static inline __kernel_size_t __copy_from_user(void *to,
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -158,7 +161,8 @@ static inline __kernel_size_t __copy_from_user(void *to,
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
diff --git a/arch/hexagon/include/asm/uaccess.h b/arch/hexagon/include/asm/uaccess.h
index e4127e4..f000a38 100644
--- a/arch/hexagon/include/asm/uaccess.h
+++ b/arch/hexagon/include/asm/uaccess.h
@@ -36,7 +36,8 @@
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
diff --git a/arch/m32r/include/asm/uaccess.h b/arch/m32r/include/asm/uaccess.h
index 71adff2..cac7014 100644
--- a/arch/m32r/include/asm/uaccess.h
+++ b/arch/m32r/include/asm/uaccess.h
@@ -91,7 +91,8 @@ static inline void set_fs(mm_segment_t s)
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -155,7 +156,8 @@ extern int fixup_exception(struct pt_regs *regs);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -175,7 +177,8 @@ extern int fixup_exception(struct pt_regs *regs);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -194,7 +197,8 @@ extern int fixup_exception(struct pt_regs *regs);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -274,7 +278,8 @@ do { \
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -568,7 +573,8 @@ unsigned long __generic_copy_from_user(void *, const void __user *, unsigned lon
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -588,7 +594,8 @@ unsigned long __generic_copy_from_user(void *, const void __user *, unsigned lon
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from kernel space to user space.
*
@@ -606,7 +613,8 @@ unsigned long __generic_copy_from_user(void *, const void __user *, unsigned lon
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -626,7 +634,8 @@ unsigned long __generic_copy_from_user(void *, const void __user *, unsigned lon
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to kernel space.
*
@@ -677,7 +686,8 @@ unsigned long clear_user(void __user *mem, unsigned long len);
* strlen_user: - Get the size of a string in user space.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Get the size of a NUL-terminated string in user space.
*
diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index 62942fd..331b0d3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -178,7 +178,8 @@ extern long __user_bad(void);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -290,7 +291,8 @@ extern long __user_bad(void);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index bf8b324..9722357 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -103,7 +103,8 @@ extern u64 __ua_limit;
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -138,7 +139,8 @@ extern u64 __ua_limit;
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -157,7 +159,8 @@ extern u64 __ua_limit;
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -177,7 +180,8 @@ extern u64 __ua_limit;
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -199,7 +203,8 @@ extern u64 __ua_limit;
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -498,7 +503,8 @@ extern void __put_user_unknown(void);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -517,7 +523,8 @@ extern void __put_user_unknown(void);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -537,7 +544,8 @@ extern void __put_user_unknown(void);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -559,7 +567,8 @@ extern void __put_user_unknown(void);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -815,7 +824,8 @@ extern size_t __copy_user(void *__to, const void *__from, size_t __n);
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -888,7 +898,8 @@ extern size_t __copy_user_inatomic(void *__to, const void *__from, size_t __n);
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from kernel space to user space.
*
@@ -1075,7 +1086,8 @@ extern size_t __copy_in_user_eva(void *__to, const void *__from, size_t __n);
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -1107,7 +1119,8 @@ extern size_t __copy_in_user_eva(void *__to, const void *__from, size_t __n);
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to kernel space.
*
@@ -1329,7 +1342,8 @@ strncpy_from_user(char *__to, const char __user *__from, long __len)
* strlen_user: - Get the size of a string in user space.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Get the size of a NUL-terminated string in user space.
*
@@ -1398,7 +1412,8 @@ static inline long __strnlen_user(const char __user *s, long n)
* strnlen_user: - Get the size of a string in user space.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Get the size of a NUL-terminated string in user space.
*
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index d64a7a6..9dd4cc4 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -98,7 +98,8 @@ static inline unsigned long extable_fixup(const struct exception_table_entry *x)
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -118,7 +119,8 @@ unsigned long __must_check __copy_from_user(void *to, const void __user *from,
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -264,7 +266,8 @@ int __get_user_bad(void) __attribute__((noreturn));
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from kernel space to user space.
*
@@ -290,7 +293,8 @@ __compiletime_warning("copy_from_user() buffer size is not provably correct")
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to kernel space.
*
@@ -348,7 +352,8 @@ static inline unsigned long strnlen_user(const char __user *src, unsigned long n
* strlen_user: - Get the size of a string in user space.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Get the size of a NUL-terminated string in user space.
*
diff --git a/arch/score/include/asm/uaccess.h b/arch/score/include/asm/uaccess.h
index ab66ddd..20a3591 100644
--- a/arch/score/include/asm/uaccess.h
+++ b/arch/score/include/asm/uaccess.h
@@ -36,7 +36,8 @@
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -61,7 +62,8 @@
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -79,7 +81,8 @@
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -98,7 +101,8 @@
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -119,7 +123,8 @@
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
diff --git a/arch/tile/include/asm/uaccess.h b/arch/tile/include/asm/uaccess.h
index d598c11..0a9c4265 100644
--- a/arch/tile/include/asm/uaccess.h
+++ b/arch/tile/include/asm/uaccess.h
@@ -85,7 +85,8 @@ int __range_ok(unsigned long addr, unsigned long size);
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -199,7 +200,8 @@ extern int __get_user_bad(void)
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -281,7 +283,8 @@ extern int __put_user_bad(void)
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -337,7 +340,8 @@ extern int __put_user_bad(void)
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -373,7 +377,8 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -444,7 +449,8 @@ static inline unsigned long __must_check copy_from_user(void *to,
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to user space. Caller must check
* the specified blocks with access_ok() before calling this function.
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index ace9dec..a8df874 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -74,7 +74,8 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -145,7 +146,8 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -240,7 +242,8 @@ extern void __put_user_8(void);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -455,7 +458,8 @@ struct __large_struct { unsigned long buf[100]; };
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -479,7 +483,8 @@ struct __large_struct { unsigned long buf[100]; };
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 0ed5504..f5dcb52 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -74,7 +74,8 @@ __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n)
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -121,7 +122,8 @@ __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e2f5e21..91d93b9 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -647,7 +647,8 @@ EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from kernel space to user space.
*
@@ -668,7 +669,8 @@ EXPORT_SYMBOL(_copy_to_user);
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Copy data from user space to kernel space.
*
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index a28df52..36c15a2 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -84,7 +84,8 @@ static inline long do_strnlen_user(const char __user *src, unsigned long count,
* @str: The string to measure.
* @count: Maximum count (including NUL character)
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Get the size of a NUL-terminated string in user space.
*
@@ -113,7 +114,8 @@ EXPORT_SYMBOL(strnlen_user);
* strlen_user: - Get the size of a user string INCLUDING final NUL.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are
+ * enabled.
*
* Get the size of a NUL-terminated string in user space.
*
--
2.1.4

2015-05-06 17:53:24

by David Hildenbrand

Subject: [PATCH RFC 04/15] mm: explicitly disable/enable preemption in kmap_atomic_*

The existing code relies on pagefault_disable() implicitly disabling
preemption, so that no scheduling can happen between kmap_atomic() and
kunmap_atomic().

Let's make this explicit, to prepare for pagefault_disable() not
touching preemption anymore.
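
The pattern at the call sites stays unchanged (a sketch; the comments
describe the behavior after this patch, and the per-cpu rationale is
my reading of why migration must be prevented):

void *addr = kmap_atomic(page); /* disables preemption and pagefaults */
memcpy(addr, src, PAGE_SIZE);   /* per-cpu mapping slot, no migration */
kunmap_atomic(addr);            /* enables pagefaults and preemption */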

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/arm/mm/highmem.c | 3 +++
arch/frv/mm/highmem.c | 2 ++
arch/metag/mm/highmem.c | 4 +++-
arch/microblaze/mm/highmem.c | 4 +++-
arch/mips/mm/highmem.c | 5 ++++-
arch/mn10300/include/asm/highmem.h | 3 +++
arch/parisc/include/asm/cacheflush.h | 2 ++
arch/powerpc/mm/highmem.c | 4 +++-
arch/sparc/mm/highmem.c | 4 +++-
arch/tile/mm/highmem.c | 3 ++-
arch/x86/mm/highmem_32.c | 3 ++-
arch/x86/mm/iomap_32.c | 2 ++
arch/xtensa/mm/highmem.c | 2 ++
include/linux/highmem.h | 2 ++
include/linux/io-mapping.h | 2 ++
15 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/arch/arm/mm/highmem.c b/arch/arm/mm/highmem.c
index b98895d..ee8dfa7 100644
--- a/arch/arm/mm/highmem.c
+++ b/arch/arm/mm/highmem.c
@@ -59,6 +59,7 @@ void *kmap_atomic(struct page *page)
void *kmap;
int type;

+ preempt_disable();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
@@ -121,6 +122,7 @@ void __kunmap_atomic(void *kvaddr)
kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
}
pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);

@@ -130,6 +132,7 @@ void *kmap_atomic_pfn(unsigned long pfn)
int idx, type;
struct page *page = pfn_to_page(pfn);

+ preempt_disable();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
diff --git a/arch/frv/mm/highmem.c b/arch/frv/mm/highmem.c
index bed9a9b..785344b 100644
--- a/arch/frv/mm/highmem.c
+++ b/arch/frv/mm/highmem.c
@@ -42,6 +42,7 @@ void *kmap_atomic(struct page *page)
unsigned long paddr;
int type;

+ preempt_disable();
pagefault_disable();
type = kmap_atomic_idx_push();
paddr = page_to_phys(page);
@@ -85,5 +86,6 @@ void __kunmap_atomic(void *kvaddr)
}
kmap_atomic_idx_pop();
pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);
diff --git a/arch/metag/mm/highmem.c b/arch/metag/mm/highmem.c
index d71f621..807f1b1 100644
--- a/arch/metag/mm/highmem.c
+++ b/arch/metag/mm/highmem.c
@@ -43,7 +43,7 @@ void *kmap_atomic(struct page *page)
unsigned long vaddr;
int type;

- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
+ preempt_disable();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
@@ -82,6 +82,7 @@ void __kunmap_atomic(void *kvaddr)
}

pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);

@@ -95,6 +96,7 @@ void *kmap_atomic_pfn(unsigned long pfn)
unsigned long vaddr;
int type;

+ preempt_disable();
pagefault_disable();

type = kmap_atomic_idx_push();
diff --git a/arch/microblaze/mm/highmem.c b/arch/microblaze/mm/highmem.c
index 5a92576..2fcc5a5 100644
--- a/arch/microblaze/mm/highmem.c
+++ b/arch/microblaze/mm/highmem.c
@@ -37,7 +37,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
unsigned long vaddr;
int idx, type;

- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
+ preempt_disable();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
@@ -63,6 +63,7 @@ void __kunmap_atomic(void *kvaddr)

if (vaddr < __fix_to_virt(FIX_KMAP_END)) {
pagefault_enable();
+ preempt_enable();
return;
}

@@ -84,5 +85,6 @@ void __kunmap_atomic(void *kvaddr)
#endif
kmap_atomic_idx_pop();
pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);
diff --git a/arch/mips/mm/highmem.c b/arch/mips/mm/highmem.c
index da815d2..11661cb 100644
--- a/arch/mips/mm/highmem.c
+++ b/arch/mips/mm/highmem.c
@@ -47,7 +47,7 @@ void *kmap_atomic(struct page *page)
unsigned long vaddr;
int idx, type;

- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
+ preempt_disable();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
@@ -72,6 +72,7 @@ void __kunmap_atomic(void *kvaddr)

if (vaddr < FIXADDR_START) { // FIXME
pagefault_enable();
+ preempt_enable();
return;
}

@@ -92,6 +93,7 @@ void __kunmap_atomic(void *kvaddr)
#endif
kmap_atomic_idx_pop();
pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);

@@ -104,6 +106,7 @@ void *kmap_atomic_pfn(unsigned long pfn)
unsigned long vaddr;
int idx, type;

+ preempt_disable();
pagefault_disable();

type = kmap_atomic_idx_push();
diff --git a/arch/mn10300/include/asm/highmem.h b/arch/mn10300/include/asm/highmem.h
index 2fbbe4d..1ddea5a 100644
--- a/arch/mn10300/include/asm/highmem.h
+++ b/arch/mn10300/include/asm/highmem.h
@@ -75,6 +75,7 @@ static inline void *kmap_atomic(struct page *page)
unsigned long vaddr;
int idx, type;

+ preempt_disable();
pagefault_disable();
if (page < highmem_start_page)
return page_address(page);
@@ -98,6 +99,7 @@ static inline void __kunmap_atomic(unsigned long vaddr)

if (vaddr < FIXADDR_START) { /* FIXME */
pagefault_enable();
+ preempt_enable();
return;
}

@@ -122,6 +124,7 @@ static inline void __kunmap_atomic(unsigned long vaddr)

kmap_atomic_idx_pop();
pagefault_enable();
+ preempt_enable();
}
#endif /* __KERNEL__ */

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index de65f66..ec2df4b 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -142,6 +142,7 @@ static inline void kunmap(struct page *page)

static inline void *kmap_atomic(struct page *page)
{
+ preempt_disable();
pagefault_disable();
return page_address(page);
}
@@ -150,6 +151,7 @@ static inline void __kunmap_atomic(void *addr)
{
flush_kernel_dcache_page_addr(addr);
pagefault_enable();
+ preempt_enable();
}

#define kmap_atomic_prot(page, prot) kmap_atomic(page)
diff --git a/arch/powerpc/mm/highmem.c b/arch/powerpc/mm/highmem.c
index e7450bd..e292c8a 100644
--- a/arch/powerpc/mm/highmem.c
+++ b/arch/powerpc/mm/highmem.c
@@ -34,7 +34,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
unsigned long vaddr;
int idx, type;

- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
+ preempt_disable();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
@@ -59,6 +59,7 @@ void __kunmap_atomic(void *kvaddr)

if (vaddr < __fix_to_virt(FIX_KMAP_END)) {
pagefault_enable();
+ preempt_enable();
return;
}

@@ -82,5 +83,6 @@ void __kunmap_atomic(void *kvaddr)

kmap_atomic_idx_pop();
pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);
diff --git a/arch/sparc/mm/highmem.c b/arch/sparc/mm/highmem.c
index 449f864..a454ec5 100644
--- a/arch/sparc/mm/highmem.c
+++ b/arch/sparc/mm/highmem.c
@@ -53,7 +53,7 @@ void *kmap_atomic(struct page *page)
unsigned long vaddr;
long idx, type;

- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
+ preempt_disable();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
@@ -91,6 +91,7 @@ void __kunmap_atomic(void *kvaddr)

if (vaddr < FIXADDR_START) { // FIXME
pagefault_enable();
+ preempt_enable();
return;
}

@@ -126,5 +127,6 @@ void __kunmap_atomic(void *kvaddr)

kmap_atomic_idx_pop();
pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);
diff --git a/arch/tile/mm/highmem.c b/arch/tile/mm/highmem.c
index 6aa2f26..fcd5450 100644
--- a/arch/tile/mm/highmem.c
+++ b/arch/tile/mm/highmem.c
@@ -201,7 +201,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
int idx, type;
pte_t *pte;

- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
+ preempt_disable();
pagefault_disable();

/* Avoid icache flushes by disallowing atomic executable mappings. */
@@ -259,6 +259,7 @@ void __kunmap_atomic(void *kvaddr)
}

pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 4500142..eecb207a 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -35,7 +35,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
unsigned long vaddr;
int idx, type;

- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
+ preempt_disable();
pagefault_disable();

if (!PageHighMem(page))
@@ -100,6 +100,7 @@ void __kunmap_atomic(void *kvaddr)
#endif

pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);

diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 9ca35fc..2b7ece0 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -59,6 +59,7 @@ void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
unsigned long vaddr;
int idx, type;

+ preempt_disable();
pagefault_disable();

type = kmap_atomic_idx_push();
@@ -117,5 +118,6 @@ iounmap_atomic(void __iomem *kvaddr)
}

pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL_GPL(iounmap_atomic);
diff --git a/arch/xtensa/mm/highmem.c b/arch/xtensa/mm/highmem.c
index 8cfb71e..184cead 100644
--- a/arch/xtensa/mm/highmem.c
+++ b/arch/xtensa/mm/highmem.c
@@ -42,6 +42,7 @@ void *kmap_atomic(struct page *page)
enum fixed_addresses idx;
unsigned long vaddr;

+ preempt_disable();
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
@@ -79,6 +80,7 @@ void __kunmap_atomic(void *kvaddr)
}

pagefault_enable();
+ preempt_enable();
}
EXPORT_SYMBOL(__kunmap_atomic);

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 9286a46..6aefcd0 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -65,6 +65,7 @@ static inline void kunmap(struct page *page)

static inline void *kmap_atomic(struct page *page)
{
+ preempt_disable();
pagefault_disable();
return page_address(page);
}
@@ -73,6 +74,7 @@ static inline void *kmap_atomic(struct page *page)
static inline void __kunmap_atomic(void *addr)
{
pagefault_enable();
+ preempt_enable();
}

#define kmap_atomic_pfn(pfn) kmap_atomic(pfn_to_page(pfn))
diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
index 657fab4..c27dde7 100644
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -141,6 +141,7 @@ static inline void __iomem *
io_mapping_map_atomic_wc(struct io_mapping *mapping,
unsigned long offset)
{
+ preempt_disable();
pagefault_disable();
return ((char __force __iomem *) mapping) + offset;
}
@@ -149,6 +150,7 @@ static inline void
io_mapping_unmap_atomic(void __iomem *vaddr)
{
pagefault_enable();
+ preempt_enable();
}

/* Non-atomic map/unmap */
--
2.1.4

2015-05-06 17:53:31

by David Hildenbrand

Subject: [PATCH RFC 05/15] mips: kmap_coherent relies on disabled preemption

k(un)map_coherent relies on pagefault_disable() to also disable
preemption.

Let's make this explicit, to prepare for pagefault_disable() not
touching preemption anymore.

This patch is based on a patch by Yang Shi on the -rt tree:
"k{un}map_coherent are just called when cpu_has_dc_aliases == 1 with VIPT
cache. However, actually, the most modern MIPS processors have PIPT dcache
without dcache alias issue. In such case, k{un}map_atomic will be called
with preempt enabled."

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/mips/mm/init.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index faa5c98..198a314 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -90,6 +90,7 @@ static void *__kmap_pgprot(struct page *page, unsigned long addr, pgprot_t prot)

BUG_ON(Page_dcache_dirty(page));

+ preempt_disable();
pagefault_disable();
idx = (addr >> PAGE_SHIFT) & (FIX_N_COLOURS - 1);
idx += in_interrupt() ? FIX_N_COLOURS : 0;
@@ -152,6 +153,7 @@ void kunmap_coherent(void)
write_c0_entryhi(old_ctx);
local_irq_restore(flags);
pagefault_enable();
+ preempt_enable();
}

void copy_user_highpage(struct page *to, struct page *from,
--
2.1.4

2015-05-06 17:51:04

by David Hildenbrand

Subject: [PATCH RFC 06/15] mm: use pagefault_disabled() to check for disabled pagefaults

Now that the pagefault disabled counter is in place, we can replace
the in_atomic() checks with pagefault_disabled() checks.

If a fault happens while in_atomic(), it can only be a bug in the code:
either a pagefault_disable() call is missing or the *_inatomic() variants
of the uaccess functions should have been used. That's what might_fault()
checks for when calling might_sleep().
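
Code that really has to access user memory in atomic context should
therefore look roughly like this (a sketch; dst, usrc and len are
placeholders):

pagefault_disable();
rc = __copy_from_user_inatomic(dst, usrc, len);
pagefault_enable();
if (rc)
	return -EFAULT; /* page not resident, retry from sleepable context */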

pagefault_disabled() is defined in linux/uaccess.h, so let's properly
add that include to all relevant files.

This patch is based on a patch from Thomas Gleixner.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/alpha/mm/fault.c | 5 ++---
arch/arc/mm/fault.c | 2 +-
arch/arm/mm/fault.c | 2 +-
arch/arm64/mm/fault.c | 2 +-
arch/avr32/mm/fault.c | 4 ++--
arch/cris/mm/fault.c | 4 ++--
arch/frv/mm/fault.c | 4 ++--
arch/ia64/mm/fault.c | 4 ++--
arch/m32r/mm/fault.c | 4 ++--
arch/m68k/mm/fault.c | 4 ++--
arch/metag/mm/fault.c | 2 +-
arch/microblaze/mm/fault.c | 6 +++---
arch/mips/mm/fault.c | 4 ++--
arch/mn10300/mm/fault.c | 4 ++--
arch/nios2/mm/fault.c | 2 +-
arch/parisc/kernel/traps.c | 4 ++--
arch/parisc/mm/fault.c | 4 ++--
arch/powerpc/mm/fault.c | 9 +++++----
arch/s390/mm/fault.c | 2 +-
arch/score/mm/fault.c | 3 ++-
arch/sh/mm/fault.c | 3 ++-
arch/sparc/mm/fault_32.c | 4 ++--
arch/sparc/mm/fault_64.c | 4 ++--
arch/sparc/mm/init_64.c | 2 +-
arch/tile/mm/fault.c | 4 ++--
arch/um/kernel/trap.c | 4 ++--
arch/unicore32/mm/fault.c | 2 +-
arch/x86/mm/fault.c | 5 +++--
arch/xtensa/mm/fault.c | 4 ++--
29 files changed, 55 insertions(+), 52 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 9d0ac09..f702a27 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -23,8 +23,7 @@
#include <linux/smp.h>
#include <linux/interrupt.h>
#include <linux/module.h>
-
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>

extern void die_if_kernel(char *,struct pt_regs *,long, unsigned long *);

@@ -107,7 +106,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,

/* If we're in an interrupt context, or have no user context,
we must not take the fault. */
- if (!mm || in_atomic())
+ if (!mm || pagefault_disabled())
goto no_context;

#ifdef CONFIG_ALPHA_LARGE_VMALLOC
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 6a2e006..2f632a0 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -86,7 +86,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 6333d9c..841a1b6 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -276,7 +276,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 96da131..d93c129 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -211,7 +211,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
* If we're in an interrupt or have no user context, we must not take
* the fault.
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/avr32/mm/fault.c b/arch/avr32/mm/fault.c
index d223a8b..09f7ced 100644
--- a/arch/avr32/mm/fault.c
+++ b/arch/avr32/mm/fault.c
@@ -14,11 +14,11 @@
#include <linux/pagemap.h>
#include <linux/kdebug.h>
#include <linux/kprobes.h>
+#include <linux/uaccess.h>

#include <asm/mmu_context.h>
#include <asm/sysreg.h>
#include <asm/tlb.h>
-#include <asm/uaccess.h>

#ifdef CONFIG_KPROBES
static inline int notify_page_fault(struct pt_regs *regs, int trap)
@@ -81,7 +81,7 @@ asmlinkage void do_page_fault(unsigned long ecr, struct pt_regs *regs)
* If we're in an interrupt or have no user context, we must
* not take the fault...
*/
- if (in_atomic() || !mm || regs->sr & SYSREG_BIT(GM))
+ if (pagefault_disabled() || !mm || regs->sr & SYSREG_BIT(GM))
goto no_context;

local_irq_enable();
diff --git a/arch/cris/mm/fault.c b/arch/cris/mm/fault.c
index 83f12f2..41f83a7 100644
--- a/arch/cris/mm/fault.c
+++ b/arch/cris/mm/fault.c
@@ -8,7 +8,7 @@
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/wait.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
#include <arch/system.h>

extern int find_fixup_code(struct pt_regs *);
@@ -113,7 +113,7 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
* user context, we must not take the fault.
*/

- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/frv/mm/fault.c b/arch/frv/mm/fault.c
index ec4917d..d1f67d1 100644
--- a/arch/frv/mm/fault.c
+++ b/arch/frv/mm/fault.c
@@ -19,9 +19,9 @@
#include <linux/kernel.h>
#include <linux/ptrace.h>
#include <linux/hardirq.h>
+#include <linux/uaccess.h>

#include <asm/pgtable.h>
-#include <asm/uaccess.h>
#include <asm/gdb-stub.h>

/*****************************************************************************/
@@ -78,7 +78,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

if (user_mode(__frame))
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index ba5ba7a..5a636aa 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -11,10 +11,10 @@
#include <linux/kprobes.h>
#include <linux/kdebug.h>
#include <linux/prefetch.h>
+#include <linux/uaccess.h>

#include <asm/pgtable.h>
#include <asm/processor.h>
-#include <asm/uaccess.h>

extern int die(char *, struct pt_regs *, long);

@@ -96,7 +96,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
/*
* If we're in an interrupt or have no user context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

#ifdef CONFIG_VIRTUAL_MEM_MAP
diff --git a/arch/m32r/mm/fault.c b/arch/m32r/mm/fault.c
index e3d4d48901..4e1892e 100644
--- a/arch/m32r/mm/fault.c
+++ b/arch/m32r/mm/fault.c
@@ -24,9 +24,9 @@
#include <linux/vt_kern.h> /* For unblank_screen() */
#include <linux/highmem.h>
#include <linux/module.h>
+#include <linux/uaccess.h>

#include <asm/m32r.h>
-#include <asm/uaccess.h>
#include <asm/hardirq.h>
#include <asm/mmu_context.h>
#include <asm/tlbflush.h>
@@ -114,7 +114,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
* If we're in an interrupt or have no user context or are running in an
* atomic region then we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto bad_area_nosemaphore;

if (error_code & ACE_USERMODE)
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index b2f04ae..156309b0 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -10,10 +10,10 @@
#include <linux/ptrace.h>
#include <linux/interrupt.h>
#include <linux/module.h>
+#include <linux/uaccess.h>

#include <asm/setup.h>
#include <asm/traps.h>
-#include <asm/uaccess.h>
#include <asm/pgalloc.h>

extern void die_if_kernel(char *, struct pt_regs *, long);
@@ -81,7 +81,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/metag/mm/fault.c b/arch/metag/mm/fault.c
index 2de5dc6..713700d 100644
--- a/arch/metag/mm/fault.c
+++ b/arch/metag/mm/fault.c
@@ -105,7 +105,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,

mm = tsk->mm;

- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index d46a5eb..6a75a42 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -107,13 +107,13 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
if ((error_code & 0x13) == 0x13 || (error_code & 0x11) == 0x11)
is_write = 0;

- if (unlikely(in_atomic() || !mm)) {
+ if (unlikely(pagefault_disabled() || !mm)) {
if (kernel_mode(regs))
goto bad_area_nosemaphore;

- /* in_atomic() in user mode is really bad,
+ /* pagefault_disabled() in user mode is really bad,
as is current->mm == NULL. */
- pr_emerg("Page fault in user mode with in_atomic(), mm = %p\n",
+ pr_emerg("Page fault in user mode with pagefault_disabled(), mm = %p\n",
mm);
pr_emerg("r15 = %lx MSR = %lx\n",
regs->r15, regs->msr);
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 7ff8637..ccba212 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -21,10 +21,10 @@
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/perf_event.h>
+#include <linux/uaccess.h>

#include <asm/branch.h>
#include <asm/mmu_context.h>
-#include <asm/uaccess.h>
#include <asm/ptrace.h>
#include <asm/highmem.h> /* For VMALLOC_END */
#include <linux/kdebug.h>
@@ -94,7 +94,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto bad_area_nosemaphore;

if (user_mode(regs))
diff --git a/arch/mn10300/mm/fault.c b/arch/mn10300/mm/fault.c
index 0c2cc5d..1d441a8 100644
--- a/arch/mn10300/mm/fault.c
+++ b/arch/mn10300/mm/fault.c
@@ -23,8 +23,8 @@
#include <linux/interrupt.h>
#include <linux/init.h>
#include <linux/vt_kern.h> /* For unblank_screen() */
+#include <linux/uaccess.h>

-#include <asm/uaccess.h>
#include <asm/pgalloc.h>
#include <asm/hardirq.h>
#include <asm/cpu-regs.h>
@@ -168,7 +168,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

if ((fault_code & MMUFCR_xFC_ACCESS) == MMUFCR_xFC_ACCESS_USR)
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index 0c9b6af..fdae091 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -77,7 +77,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto bad_area_nosemaphore;

if (user_mode(regs))
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index 47ee620..e72483c 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -26,9 +26,9 @@
#include <linux/console.h>
#include <linux/bug.h>
#include <linux/ratelimit.h>
+#include <linux/uaccess.h>

#include <asm/assembly.h>
-#include <asm/uaccess.h>
#include <asm/io.h>
#include <asm/irq.h>
#include <asm/traps.h>
@@ -800,7 +800,7 @@ void notrace handle_interruption(int code, struct pt_regs *regs)
* unless pagefault_disable() was called before.
*/

- if (fault_space == 0 && !in_atomic())
+ if (fault_space == 0 && !pagefault_disabled())
{
pdc_chassis_send_status(PDC_CHASSIS_DIRECT_PANIC);
parisc_terminate("Kernel Fault", regs, code, fault_address);
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index e5120e6..15503ad 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -15,8 +15,8 @@
#include <linux/sched.h>
#include <linux/interrupt.h>
#include <linux/module.h>
+#include <linux/uaccess.h>

-#include <asm/uaccess.h>
#include <asm/traps.h>

/* Various important other fields */
@@ -207,7 +207,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
int fault;
unsigned int flags;

- if (in_atomic())
+ if (pagefault_disabled())
goto no_context;

tsk = current;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b396868..04658b5 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -33,13 +33,13 @@
#include <linux/ratelimit.h>
#include <linux/context_tracking.h>
#include <linux/hugetlb.h>
+#include <linux/uaccess.h>

#include <asm/firmware.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/mmu.h>
#include <asm/mmu_context.h>
-#include <asm/uaccess.h>
#include <asm/tlbflush.h>
#include <asm/siginfo.h>
#include <asm/debug.h>
@@ -272,15 +272,16 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
if (!arch_irq_disabled_regs(regs))
local_irq_enable();

- if (in_atomic() || mm == NULL) {
+ if (pagefault_disabled() || mm == NULL) {
if (!user_mode(regs)) {
rc = SIGSEGV;
goto bail;
}
- /* in_atomic() in user mode is really bad,
+ /* pagefault_disabled() in user mode is really bad,
as is current->mm == NULL. */
printk(KERN_EMERG "Page fault in user mode with "
- "in_atomic() = %d mm = %p\n", in_atomic(), mm);
+ "pagefault_disabled() = %d mm = %p\n",
+ pagefault_disabled(), mm);
printk(KERN_EMERG "NIP = %lx MSR = %lx\n",
regs->nip, regs->msr);
die("Weird page fault", regs, SIGSEGV);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 76515bc..a3dcc10 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -399,7 +399,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
* user context.
*/
fault = VM_FAULT_BADCONTEXT;
- if (unlikely(!user_space_fault(regs) || in_atomic() || !mm))
+ if (unlikely(!user_space_fault(regs) || pagefault_disabled() || !mm))
goto out;

address = trans_exc_code & __FAIL_ADDR_MASK;
diff --git a/arch/score/mm/fault.c b/arch/score/mm/fault.c
index 6860beb..37a6c2e 100644
--- a/arch/score/mm/fault.c
+++ b/arch/score/mm/fault.c
@@ -34,6 +34,7 @@
#include <linux/string.h>
#include <linux/types.h>
#include <linux/ptrace.h>
+#include <linux/uaccess.h>

/*
* This routine handles page faults. It determines the address,
@@ -73,7 +74,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto bad_area_nosemaphore;

if (user_mode(regs))
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index a58fec9..f7a83a9 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/perf_event.h>
#include <linux/kdebug.h>
+#include <linux/uaccess.h>
#include <asm/io_trapped.h>
#include <asm/mmu_context.h>
#include <asm/tlbflush.h>
@@ -440,7 +441,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
* If we're in an interrupt, have no user context or are running
* in an atomic region then we must not take the fault:
*/
- if (unlikely(in_atomic() || !mm)) {
+ if (unlikely(pagefault_disabled() || !mm)) {
bad_area_nosemaphore(regs, error_code, address);
return;
}
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index 70d8171..c399e7b 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -21,6 +21,7 @@
#include <linux/perf_event.h>
#include <linux/interrupt.h>
#include <linux/kdebug.h>
+#include <linux/uaccess.h>

#include <asm/page.h>
#include <asm/pgtable.h>
@@ -29,7 +30,6 @@
#include <asm/setup.h>
#include <asm/smp.h>
#include <asm/traps.h>
-#include <asm/uaccess.h>

#include "mm_32.h"

@@ -196,7 +196,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 4798232..0ff9e50 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -22,12 +22,12 @@
#include <linux/kdebug.h>
#include <linux/percpu.h>
#include <linux/context_tracking.h>
+#include <linux/uaccess.h>

#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/openprom.h>
#include <asm/oplib.h>
-#include <asm/uaccess.h>
#include <asm/asi.h>
#include <asm/lsu.h>
#include <asm/sections.h>
@@ -330,7 +330,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto intr_or_no_mm;

perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 4ca0d6b..795ea96 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2706,7 +2706,7 @@ void hugetlb_setup(struct pt_regs *regs)
struct mm_struct *mm = current->mm;
struct tsb_config *tp;

- if (in_atomic() || !mm) {
+ if (pagefault_disabled() || !mm) {
const struct exception_table_entry *entry;

entry = search_exception_tables(regs->tpc);
diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index e83cc99..3f4f58d 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -354,9 +354,9 @@ static int handle_page_fault(struct pt_regs *regs,

/*
* If we're in an interrupt, have no user context or are running in an
- * atomic region then we must not take the fault.
+ * region with pagefaults disabled then we must not take the fault.
*/
- if (in_atomic() || !mm) {
+ if (pagefault_disabled() || !mm) {
vma = NULL; /* happy compiler */
goto bad_area_nosemaphore;
}
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 8e4daf4..7317c87 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -35,10 +35,10 @@ int handle_page_fault(unsigned long address, unsigned long ip,
*code_out = SEGV_MAPERR;

/*
- * If the fault was during atomic operation, don't take the fault, just
+ * If the fault was with pagefaults disabled, don't take the fault, just
* fail.
*/
- if (in_atomic())
+ if (pagefault_disabled())
goto out_nosemaphore;

if (is_user)
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 0dc922d..b3cf385 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -218,7 +218,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 181c53b..be394b1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -13,6 +13,7 @@
#include <linux/hugetlb.h> /* hstate_index_to_shift */
#include <linux/prefetch.h> /* prefetchw */
#include <linux/context_tracking.h> /* exception_enter(), ... */
+#include <linux/uaccess.h> /* pagefault_disabled() */

#include <asm/traps.h> /* dotraplinkage, ... */
#include <asm/pgalloc.h> /* pgd_*(), ... */
@@ -1126,9 +1127,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,

/*
* If we're in an interrupt, have no user context or are running
- * in an atomic region then we must not take the fault:
+ * in a region with pagefaults disabled then we must not take the fault
*/
- if (unlikely(in_atomic() || !mm)) {
+ if (unlikely(pagefault_disabled() || !mm)) {
bad_area_nosemaphore(regs, error_code, address);
return;
}
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 9e3571a..99ba449 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -15,10 +15,10 @@
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/hardirq.h>
+#include <linux/uaccess.h>
#include <asm/mmu_context.h>
#include <asm/cacheflush.h>
#include <asm/hardirq.h>
-#include <asm/uaccess.h>
#include <asm/pgalloc.h>

DEFINE_PER_CPU(unsigned long, asid_cache) = ASID_USER_FIRST;
@@ -57,7 +57,7 @@ void do_page_fault(struct pt_regs *regs)
/* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm) {
+ if (pagefault_disabled() || !mm) {
bad_page_fault(regs, address, SIGSEGV);
return;
}
--
2.1.4

2015-05-06 17:53:33

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 07/15] drm/i915: use pagefault_disabled() to check for disabled pagefaults

Now that the pagefault disabled counter is in place, we can replace
the in_atomic() check with a pagefault_disabled() check.

Signed-off-by: David Hildenbrand <[email protected]>
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7ab63d9..98dc211 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -32,6 +32,7 @@
#include "i915_trace.h"
#include "intel_drv.h"
#include <linux/dma_remapping.h>
+#include <linux/uaccess.h>

#define __EXEC_OBJECT_HAS_PIN (1<<31)
#define __EXEC_OBJECT_HAS_FENCE (1<<30)
@@ -458,7 +459,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
}

/* We can't wait for rendering with pagefaults disabled */
- if (obj->active && in_atomic())
+ if (obj->active && pagefault_disabled())
return -EFAULT;

if (use_cpu_reloc(obj))
--
2.1.4

2015-05-06 17:53:14

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 08/15] futex: UP futex_atomic_op_inuser() relies on disabled preemption

Let's explicitly disable/enable preemption in the !CONFIG_SMP version
of futex_atomic_op_inuser, to prepare for pagefault_disable() not
touching preemption anymore.

Otherwise we might break mutual exclusion when relying on a get_user()/
put_user() implementation.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/asm-generic/futex.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/futex.h b/include/asm-generic/futex.h
index b59b5a5..3586017 100644
--- a/include/asm-generic/futex.h
+++ b/include/asm-generic/futex.h
@@ -8,8 +8,7 @@
#ifndef CONFIG_SMP
/*
* The following implementation only for uniprocessor machines.
- * For UP, it's relies on the fact that pagefault_disable() also disables
- * preemption to ensure mutual exclusion.
+ * It relies on preempt_disable() ensuring mutual exclusion.
*
*/

@@ -38,6 +37,7 @@ futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
oparg = 1 << oparg;

+ preempt_disable();
pagefault_disable();

ret = -EFAULT;
@@ -72,6 +72,7 @@ futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)

out_pagefault_enable:
pagefault_enable();
+ preempt_enable();

if (ret == 0) {
switch (cmp) {
--
2.1.4

2015-05-06 17:53:20

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 09/15] futex: UP futex_atomic_cmpxchg_inatomic() relies on disabled preemption

Let's explicitly disable/enable preemption in the !CONFIG_SMP version
of futex_atomic_cmpxchg_inatomic, to prepare for pagefault_disable() not
touching preemption anymore. This is needed for this function to be
callable from both atomic and non-atomic context. While at it, make sure
preemption is re-enabled on all (including the error) return paths.

Otherwise we might break mutual exclusion when relying on a get_user()/
put_user() implementation.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/asm-generic/futex.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/futex.h b/include/asm-generic/futex.h
index 3586017..e56272c 100644
--- a/include/asm-generic/futex.h
+++ b/include/asm-generic/futex.h
@@ -107,13 +107,19 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
{
u32 val;

- if (unlikely(get_user(val, uaddr) != 0))
+ preempt_disable();
+ if (unlikely(get_user(val, uaddr) != 0)) {
+ preempt_enable();
return -EFAULT;
+ }

- if (val == oldval && unlikely(put_user(newval, uaddr) != 0))
+ if (val == oldval && unlikely(put_user(newval, uaddr) != 0)) {
+ preempt_enable();
return -EFAULT;
+ }

*uval = val;
+ preempt_enable();

return 0;
}
--
2.1.4

2015-05-06 17:53:09

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 10/15] arm/futex: UP futex_atomic_cmpxchg_inatomic() relies on disabled preemption

The !CONFIG_SMP implementation of futex_atomic_cmpxchg_inatomic()
requires preemption to be disabled to guarantee mutual exclusion.
Let's make this explicit.

This patch is based on a patch by Sebastian Andrzej Siewior on the
-rt branch.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/arm/include/asm/futex.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 4e78065..255bfd1 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -93,6 +93,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;

+ preempt_disable();
__asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
"1: " TUSER(ldr) " %1, [%4]\n"
" teq %1, %2\n"
@@ -104,6 +105,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
: "cc", "memory");

*uval = val;
+ preempt_enable();
+
return ret;
}

--
2.1.4

2015-05-06 17:52:52

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 11/15] arm/futex: UP futex_atomic_op_inuser() relies on disabled preemption

The !CONFIG_SMP implementation of futex_atomic_op_inuser() seems to rely
on disabled preemption to guarantee mutual exclusion.

From commit e589ed23dd27:
"For UP it's enough to disable preemption to ensure mutual exclusion..."
From the code itself:
"!SMP, we can work around lack of atomic ops by disabling preemption"

Let's make this explicit, to prepare for pagefault_disable() not
touching preemption anymore.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/arm/include/asm/futex.h | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 255bfd1..5eed828 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -127,7 +127,10 @@ futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;

- pagefault_disable(); /* implies preempt_disable() */
+#ifndef CONFIG_SMP
+ preempt_disable();
+#endif
+ pagefault_disable();

switch (op) {
case FUTEX_OP_SET:
@@ -149,7 +152,10 @@ futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable(); /* subsumes preempt_enable() */
+ pagefault_enable();
+#ifndef CONFIG_SMP
+ preempt_enable();
+#endif

if (!ret) {
switch (cmp) {
--
2.1.4

2015-05-06 17:51:34

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 12/15] futex: clarify that preemption doesn't have to be disabled

As arm64 and arc have no special implementations for !CONFIG_SMP, mutual
exclusion doesn't seem to rely on disabled preemption there.

Let's make it clear in the comments that preemption doesn't have to be
disabled when accessing user space in the futex code, so we can remove
preempt_disable() from pagefault_disable().

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/arc/include/asm/futex.h | 10 +++++-----
arch/arm64/include/asm/futex.h | 4 ++--
2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arc/include/asm/futex.h b/arch/arc/include/asm/futex.h
index 4dc64dd..05b5aaf 100644
--- a/arch/arc/include/asm/futex.h
+++ b/arch/arc/include/asm/futex.h
@@ -53,7 +53,7 @@ static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
return -EFAULT;

- pagefault_disable(); /* implies preempt_disable() */
+ pagefault_disable();

switch (op) {
case FUTEX_OP_SET:
@@ -75,7 +75,7 @@ static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable(); /* subsumes preempt_enable() */
+ pagefault_enable();

if (!ret) {
switch (cmp) {
@@ -104,7 +104,7 @@ static inline int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
return ret;
}

-/* Compare-xchg with preemption disabled.
+/* Compare-xchg with pagefaults disabled.
* Notes:
* -Best-Effort: Exchg happens only if compare succeeds.
* If compare fails, returns; leaving retry/looping to upper layers
@@ -121,7 +121,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, u32 oldval,
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
return -EFAULT;

- pagefault_disable(); /* implies preempt_disable() */
+ pagefault_disable();

/* TBD : can use llock/scond */
__asm__ __volatile__(
@@ -142,7 +142,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, u32 oldval,
: "r"(oldval), "r"(newval), "r"(uaddr), "ir"(-EFAULT)
: "cc", "memory");

- pagefault_enable(); /* subsumes preempt_enable() */
+ pagefault_enable();

*uval = val;
return val;
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index 5f750dc..74069b3 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -58,7 +58,7 @@ futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;

- pagefault_disable(); /* implies preempt_disable() */
+ pagefault_disable();

switch (op) {
case FUTEX_OP_SET:
@@ -85,7 +85,7 @@ futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
ret = -ENOSYS;
}

- pagefault_enable(); /* subsumes preempt_enable() */
+ pagefault_enable();

if (!ret) {
switch (cmp) {
--
2.1.4

2015-05-06 17:51:38

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 13/15] powerpc: enable_kernel_altivec() requires disabled preemption

enable_kernel_altivec() has to be called with disabled preemption.
Let's make this explicit, to prepare for pagefault_disable() not
touching preemption anymore.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/powerpc/lib/vmx-helper.c | 11 ++++++-----
drivers/crypto/vmx/aes.c | 8 +++++++-
drivers/crypto/vmx/aes_cbc.c | 6 ++++++
drivers/crypto/vmx/ghash.c | 8 ++++++++
4 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/lib/vmx-helper.c b/arch/powerpc/lib/vmx-helper.c
index 3cf529c..ac93a3b 100644
--- a/arch/powerpc/lib/vmx-helper.c
+++ b/arch/powerpc/lib/vmx-helper.c
@@ -27,11 +27,11 @@ int enter_vmx_usercopy(void)
if (in_interrupt())
return 0;

- /* This acts as preempt_disable() as well and will make
- * enable_kernel_altivec(). We need to disable page faults
- * as they can call schedule and thus make us lose the VMX
- * context. So on page faults, we just fail which will cause
- * a fallback to the normal non-vmx copy.
+ preempt_disable();
+ /*
+ * We need to disable page faults as they can call schedule and
+ * thus make us lose the VMX context. So on page faults, we just
+ * fail which will cause a fallback to the normal non-vmx copy.
*/
pagefault_disable();

@@ -47,6 +47,7 @@ int enter_vmx_usercopy(void)
int exit_vmx_usercopy(void)
{
pagefault_enable();
+ preempt_enable();
return 0;
}

diff --git a/drivers/crypto/vmx/aes.c b/drivers/crypto/vmx/aes.c
index ab300ea..a9064e3 100644
--- a/drivers/crypto/vmx/aes.c
+++ b/drivers/crypto/vmx/aes.c
@@ -78,12 +78,14 @@ static int p8_aes_setkey(struct crypto_tfm *tfm, const u8 *key,
int ret;
struct p8_aes_ctx *ctx = crypto_tfm_ctx(tfm);

+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();
ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
ret += aes_p8_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
pagefault_enable();
-
+ preempt_enable();
+
ret += crypto_cipher_setkey(ctx->fallback, key, keylen);
return ret;
}
@@ -95,10 +97,12 @@ static void p8_aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
if (in_interrupt()) {
crypto_cipher_encrypt_one(ctx->fallback, dst, src);
} else {
+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();
aes_p8_encrypt(src, dst, &ctx->enc_key);
pagefault_enable();
+ preempt_enable();
}
}

@@ -109,10 +113,12 @@ static void p8_aes_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
if (in_interrupt()) {
crypto_cipher_decrypt_one(ctx->fallback, dst, src);
} else {
+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();
aes_p8_decrypt(src, dst, &ctx->dec_key);
pagefault_enable();
+ preempt_enable();
}
}

diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
index 1a559b7..477284a 100644
--- a/drivers/crypto/vmx/aes_cbc.c
+++ b/drivers/crypto/vmx/aes_cbc.c
@@ -79,11 +79,13 @@ static int p8_aes_cbc_setkey(struct crypto_tfm *tfm, const u8 *key,
int ret;
struct p8_aes_cbc_ctx *ctx = crypto_tfm_ctx(tfm);

+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();
ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
ret += aes_p8_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
pagefault_enable();
+ preempt_enable();

ret += crypto_blkcipher_setkey(ctx->fallback, key, keylen);
return ret;
@@ -106,6 +108,7 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc *desc,
if (in_interrupt()) {
ret = crypto_blkcipher_encrypt(&fallback_desc, dst, src, nbytes);
} else {
+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();

@@ -119,6 +122,7 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc *desc,
}

pagefault_enable();
+ preempt_enable();
}

return ret;
@@ -141,6 +145,7 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc *desc,
if (in_interrupt()) {
ret = crypto_blkcipher_decrypt(&fallback_desc, dst, src, nbytes);
} else {
+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();

@@ -154,6 +159,7 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc *desc,
}

pagefault_enable();
+ preempt_enable();
}

return ret;
diff --git a/drivers/crypto/vmx/ghash.c b/drivers/crypto/vmx/ghash.c
index d0ffe27..f255ec4 100644
--- a/drivers/crypto/vmx/ghash.c
+++ b/drivers/crypto/vmx/ghash.c
@@ -114,11 +114,13 @@ static int p8_ghash_setkey(struct crypto_shash *tfm, const u8 *key,
if (keylen != GHASH_KEY_LEN)
return -EINVAL;

+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();
enable_kernel_fp();
gcm_init_p8(ctx->htable, (const u64 *) key);
pagefault_enable();
+ preempt_enable();
return crypto_shash_setkey(ctx->fallback, key, keylen);
}

@@ -140,23 +142,27 @@ static int p8_ghash_update(struct shash_desc *desc,
}
memcpy(dctx->buffer + dctx->bytes, src,
GHASH_DIGEST_SIZE - dctx->bytes);
+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();
enable_kernel_fp();
gcm_ghash_p8(dctx->shash, ctx->htable, dctx->buffer,
GHASH_DIGEST_SIZE);
pagefault_enable();
+ preempt_enable();
src += GHASH_DIGEST_SIZE - dctx->bytes;
srclen -= GHASH_DIGEST_SIZE - dctx->bytes;
dctx->bytes = 0;
}
len = srclen & ~(GHASH_DIGEST_SIZE - 1);
if (len) {
+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();
enable_kernel_fp();
gcm_ghash_p8(dctx->shash, ctx->htable, src, len);
pagefault_enable();
+ preempt_enable();
src += len;
srclen -= len;
}
@@ -180,12 +186,14 @@ static int p8_ghash_final(struct shash_desc *desc, u8 *out)
if (dctx->bytes) {
for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++)
dctx->buffer[i] = 0;
+ preempt_disable();
pagefault_disable();
enable_kernel_altivec();
enable_kernel_fp();
gcm_ghash_p8(dctx->shash, ctx->htable, dctx->buffer,
GHASH_DIGEST_SIZE);
pagefault_enable();
+ preempt_enable();
dctx->bytes = 0;
}
memcpy(out, dctx->shash, GHASH_DIGEST_SIZE);
--
2.1.4

2015-05-06 17:51:27

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 14/15] mips: properly lock access to the fpu

Let's always disable preemption and pagefaults when locking the fpu,
so we can be sure that the owner won't change in between.

This is a preparation for pagefault_disable() not touching preemption
anymore.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/mips/kernel/signal-common.h | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/mips/kernel/signal-common.h b/arch/mips/kernel/signal-common.h
index 06805e0..0b85f82 100644
--- a/arch/mips/kernel/signal-common.h
+++ b/arch/mips/kernel/signal-common.h
@@ -28,12 +28,7 @@ extern void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
extern int fpcsr_pending(unsigned int __user *fpcsr);

/* Make sure we will not lose FPU ownership */
-#ifdef CONFIG_PREEMPT
-#define lock_fpu_owner() preempt_disable()
-#define unlock_fpu_owner() preempt_enable()
-#else
-#define lock_fpu_owner() pagefault_disable()
-#define unlock_fpu_owner() pagefault_enable()
-#endif
+#define lock_fpu_owner() ({ preempt_disable(); pagefault_disable(); })
+#define unlock_fpu_owner() ({ pagefault_enable(); preempt_enable(); })

#endif /* __SIGNAL_COMMON_H */
--
2.1.4

2015-05-06 17:51:29

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH RFC 15/15] uaccess: decouple preemption from the pagefault logic

As the fault handlers now all rely on the pagefault_disabled() checks,
and the previously implicit preempt_disable() calls by pagefault_disable()
have been made explicit, we can completely rely on the pagefault_disabled
counter.

So let's no longer touch the preempt count when disabling/enabling
pagefaults. After a call to pagefault_disable(), pagefault_disabled()
will return true, but in_atomic() won't.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/uaccess.h | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 23290cc..83f2a7b 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -1,7 +1,6 @@
#ifndef __LINUX_UACCESS_H__
#define __LINUX_UACCESS_H__

-#include <linux/preempt.h>
#include <linux/sched.h>
#include <asm/uaccess.h>

@@ -20,17 +19,11 @@ static __always_inline void pagefault_disabled_dec(void)
* These routines enable/disable the pagefault handler. If disabled, it will
* not take any locks and go straight to the fixup table.
*
- * We increase the preempt and the pagefault count, to be able to distinguish
- * whether we run in simple atomic context or in a real pagefault_disable()
- * context.
- *
- * For now, after pagefault_disabled() has been called, we run in atomic
- * context. User access methods will not sleep.
- *
+ * User access methods will not sleep when called from a pagefault_disabled()
+ * environment.
*/
static inline void pagefault_disable(void)
{
- preempt_count_inc();
pagefault_disabled_inc();
/*
* make sure to have issued the store before a pagefault
@@ -47,11 +40,6 @@ static inline void pagefault_enable(void)
*/
barrier();
pagefault_disabled_dec();
-#ifndef CONFIG_PREEMPT
- preempt_count_dec();
-#else
- preempt_enable();
-#endif
}

/*
--
2.1.4

2015-05-06 22:02:06

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH RFC 00/15] decouple pagefault_disable() from preempt_disable()

On Wed, 6 May 2015 19:50:24 +0200 David Hildenbrand <[email protected]> wrote:

> As Peter asked me to also do the decoupling in one shot, this is
> the new series.
>
> I recently discovered that might_fault() doesn't call might_sleep()
> anymore. Therefore bugs like:
>
> spin_lock(&lock);
> rc = copy_to_user(...);
> spin_unlock(&lock);
>
> would not be detected with CONFIG_DEBUG_ATOMIC_SLEEP. The code was
> changed to disable false positives for code like:
>
> pagefault_disable();
> rc = copy_to_user(...);
> pagefault_enable();
>
> Whereby the caller wants do deal with failures.

hm, that was a significant screwup. I wonder how many bugs we
subsequently added.

>
> ..
>

> This series therefore does 2 things:
>
>
> 1. Decouple pagefault_disable() from preempt_enable()
>
> ...
>
> 2. Reenable might_sleep() checks for might_fault()

All seems sensible to me. pagefault_disabled has to go into the
task_struct (rather than being per-cpu) because
pagefault_disabled_inc() doesn't disable preemption, yes?

2015-05-07 00:25:16

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH RFC 13/15] powerpc: enable_kernel_altivec() requires disabled preemption

On Wed, 2015-05-06 at 19:50 +0200, David Hildenbrand wrote:
> enable_kernel_altivec() has to be called with disabled preemption.
> Let's make this explicit, to prepare for pagefault_disable() not
> touching preemption anymore.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Acked-by: Benjamin Herrenschmidt <[email protected]>

2015-05-07 06:23:59

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 00/15] decouple pagefault_disable() from preempt_disable()

> > This series therefore does 2 things:
> >
> >
> > 1. Decouple pagefault_disable() from preempt_enable()
> >
> > ...
> >
> > 2. Reenable might_sleep() checks for might_fault()
>
> All seems sensible to me. pagefault_disabled has to go into the
> task_struct (rather than being per-cpu) because
> pagefault_disabled_inc() doesn't disable preemption, yes?
>

Right, we can now get scheduled while in pagefault_disable() (if preemption
hasn't been disabled manually), so we have to store the counter per
task/thread, not per cpu.
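
To illustrate, here is a minimal sketch of the approach taken in this series
(simplified from patch 01; debug checks and exact field placement omitted):

    /* include/linux/sched.h */
    struct task_struct {
        ...
        int pagefault_disabled;
    };

    /* include/linux/uaccess.h */
    static __always_inline void pagefault_disabled_inc(void)
    {
        current->pagefault_disabled++;
    }

    static __always_inline void pagefault_disabled_dec(void)
    {
        current->pagefault_disabled--;
    }

    /* is the pagefault handler disabled? */
    #define pagefault_disabled() (current->pagefault_disabled != 0)

As the counter lives in task_struct, it simply migrates with the task if we
get preempted inside a pagefault_disable()d section.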

Actually even the preempt disable counter is only per-cpu for x86 and lives in
thread_info for all other archs (which is also not 100% clean but doesn't
matter at that point).

I had that pagefault disable counter in thread_info before, but that required
messing with asm-offsets of some arch (I had a proper version but this one
feels cleaner).

Thanks for having a look!

David

2015-05-07 10:21:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH RFC 00/15] decouple pagefault_disable() from preempt_disable()


* Andrew Morton <[email protected]> wrote:

> On Wed, 6 May 2015 19:50:24 +0200 David Hildenbrand <[email protected]> wrote:
>
> > As Peter asked me to also do the decoupling in one shot, this is
> > the new series.
> >
> > I recently discovered that might_fault() doesn't call might_sleep()
> > anymore. Therefore bugs like:
> >
> > spin_lock(&lock);
> > rc = copy_to_user(...);
> > spin_unlock(&lock);
> >
> > would not be detected with CONFIG_DEBUG_ATOMIC_SLEEP. The code was
> > changed to disable false positives for code like:
> >
> > pagefault_disable();
> > rc = copy_to_user(...);
> > pagefault_enable();
> >
> > Whereby the caller wants do deal with failures.
>
> hm, that was a significant screwup. I wonder how many bugs we
> subsequently added.

So I'm wondering what the motivation was to allow things like:

pagefault_disable();
rc = copy_to_user(...);
pagefault_enable();

and to declare it a false positive?

AFAICS most uses are indeed atomic:

pagefault_disable();
ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
pagefault_enable();

so why not make it explicitly atomic again?

Thanks,

Ingo

2015-05-07 10:23:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

On Wed, May 06, 2015 at 07:50:25PM +0200, David Hildenbrand wrote:
> +/*
> + * Is the pagefault handler disabled? If so, user access methods will not sleep.
> + */
> +#define pagefault_disabled() (current->pagefault_disabled != 0)

So -RT has:

static inline bool pagefault_disabled(void)
{
return current->pagefault_disabled || in_atomic();
}

AFAICR we did this to avoid having to do both:

preempt_disable();
pagefault_disable();

in a fair number of places -- just like this patch-set does, this is
touching two cachelines where one would have been enough.

Also, removing in_atomic() from fault handlers like you did
significantly changes semantics for interrupts (soft, hard and NMI).

So while I agree with most of these patches, I'm very hesitant on the
above little detail.

2015-05-07 10:51:09

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

> On Wed, May 06, 2015 at 07:50:25PM +0200, David Hildenbrand wrote:
> > +/*
> > + * Is the pagefault handler disabled? If so, user access methods will not sleep.
> > + */
> > +#define pagefault_disabled() (current->pagefault_disabled != 0)
>
> So -RT has:
>
> static inline bool pagefault_disabled(void)
> {
> return current->pagefault_disabled || in_atomic();
> }
>
> AFAICR we did this to avoid having to do both:
>
> preempt_disable();
> pagefault_disable();
>
> in a fair number of places -- just like this patch-set does, this is
> touching two cachelines where one would have been enough.
>
> Also, removing in_atomic() from fault handlers like you did
> significantly changes semantics for interrupts (soft, hard and NMI).
>
> So while I agree with most of these patches, I'm very hesitant on the
> above little detail.
>

Just to make sure we have a common understanding (as written in my cover
letter):

Your suggestion won't work with !CONFIG_PREEMPT (!CONFIG_PREEMPT_COUNT). If
there is no preempt counter, in_atomic() won't work. So doing a
preempt_disable() instead of a pagefault_disable() is not going to work.
(not sure how -RT handles that - most probably with CONFIG_PREEMPT_COUNT being
enabled, due to atomic debug).
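
For the record, with !CONFIG_PREEMPT_COUNT we have (simplified from
include/linux/preempt.h):

    #define preempt_disable()    barrier()
    #define preempt_enable()     barrier()

so a plain preempt_disable() section leaves the preempt count untouched,
and in_atomic() still sees a count of 0 there.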

That's why I dropped that check for a reason.

So in my opinion, in_atomic() should never be used in any fault handler - it
has nothing to do with disabled pagefaults. It doesn't give us anything more
besides some false security for atomic environments.


This patchset is about decoupling both concepts (not ending up with two
mechanisms doing almost the same thing).

That's also what Thomas Gleixner suggested
https://lkml.org/lkml/2014/11/27/820 .


David

2015-05-07 10:51:37

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [PATCH RFC 00/15] decouple pagefault_disable() from preempt_disable()

On 07.05.2015 11:48, Ingo Molnar wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
>> On Wed, 6 May 2015 19:50:24 +0200 David Hildenbrand <[email protected]> wrote:
>>
>>> As Peter asked me to also do the decoupling in one shot, this is
>>> the new series.
>>>
>>> I recently discovered that might_fault() doesn't call might_sleep()
>>> anymore. Therefore bugs like:
>>>
>>> spin_lock(&lock);
>>> rc = copy_to_user(...);
>>> spin_unlock(&lock);
>>>
>>> would not be detected with CONFIG_DEBUG_ATOMIC_SLEEP. The code was
>>> changed to disable false positives for code like:
>>>
>>> pagefault_disable();
>>> rc = copy_to_user(...);
>>> pagefault_enable();
>>>
>>> Whereby the caller wants do deal with failures.
>>
>> hm, that was a significant screwup. I wonder how many bugs we
>> subsequently added.
>
> So I'm wondering what the motivation was to allow things like:
>
> pagefault_disable();
> rc = copy_to_user(...);
> pagefault_enable();
>
> and to declare it a false positive?
>
> AFAICS most uses are indeed atomic:
>
> pagefault_disable();
> ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
> pagefault_enable();
>
> so why not make it explicitly atomic again?

Hmm, I am probably misreading that, but it sounds as if you suggest going back
to David's first proposal
https://lkml.org/lkml/2014/11/25/436
which makes might_fault() also contain might_sleep(). Correct?

Christian

2015-05-07 11:08:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH RFC 00/15] decouple pagefault_disable() from preempt_disable()


* Christian Borntraeger <[email protected]> wrote:

> Am 07.05.2015 um 11:48 schrieb Ingo Molnar:
> >
> > * Andrew Morton <[email protected]> wrote:
> >
> >> On Wed, 6 May 2015 19:50:24 +0200 David Hildenbrand <[email protected]> wrote:
> >>
> >>> As Peter asked me to also do the decoupling in one shot, this is
> >>> the new series.
> >>>
> >>> I recently discovered that might_fault() doesn't call might_sleep()
> >>> anymore. Therefore bugs like:
> >>>
> >>> spin_lock(&lock);
> >>> rc = copy_to_user(...);
> >>> spin_unlock(&lock);
> >>>
> >>> would not be detected with CONFIG_DEBUG_ATOMIC_SLEEP. The code was
> >>> changed to disable false positives for code like:
> >>>
> >>> pagefault_disable();
> >>> rc = copy_to_user(...);
> >>> pagefault_enable();
> >>>
> >>> Whereby the caller wants do deal with failures.
> >>
> >> hm, that was a significant screwup. I wonder how many bugs we
> >> subsequently added.
> >
> > So I'm wondering what the motivation was to allow things like:
> >
> > pagefault_disable();
> > rc = copy_to_user(...);
> > pagefault_enable();
> >
> > and to declare it a false positive?
> >
> > AFAICS most uses are indeed atomic:
> >
> > pagefault_disable();
> > ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
> > pagefault_enable();
> >
> > so why not make it explicitly atomic again?
>
> Hmm, I am probably misreading that, but it sound as you suggest to go back
> to Davids first proposal
> https://lkml.org/lkml/2014/11/25/436
> which makes might_fault to also contain might_sleep. Correct?

Yes, but I'm wondering what I'm missing: is there any deep reason for
making pagefaults-disabled sections non-atomic?

Thanks,

Ingo

2015-05-07 11:12:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

On Thu, May 07, 2015 at 12:50:53PM +0200, David Hildenbrand wrote:
> Just to make sure we have a common understanding (as written in my cover
> letter):
>
> Your suggestion won't work with !CONFIG_PREEMPT (!CONFIG_PREEMPT_COUNT). If
> there is no preempt counter, in_atomic() won't work.

But there is, we _always_ have a preempt_count, and irq_enter() et al.
_always_ increment the relevant bits.

The thread_info::preempt_count field is never under PREEMPT_COUNT;
include/asm-generic/preempt.h provides stuff regardless of
PREEMPT_COUNT.

See how __irq_enter() -> preempt_count_add(HARDIRQ_OFFSET) ->
__preempt_count_add() _always_ just works.

It's only things like preempt_disable() / preempt_enable() that get
munged depending on PREEMPT_COUNT/PREEMPT.
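
For illustration, the hardirq entry path bumps the count unconditionally
(simplified from include/linux/hardirq.h):

    #define __irq_enter()                          \
        do {                                       \
            account_irq_enter_time(current);       \
            preempt_count_add(HARDIRQ_OFFSET);     \
            trace_hardirq_enter();                 \
        } while (0)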

2015-05-07 11:12:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled


* David Hildenbrand <[email protected]> wrote:

> > AFAICR we did this to avoid having to do both:
> >
> > preempt_disable();
> > pagefault_disable();
> >
> > in a fair number of places -- just like this patch-set does, this is
> > touching two cachelines where one would have been enough.
> >
> > Also, removing in_atomic() from fault handlers like you did
> > significantly changes semantics for interrupts (soft, hard and NMI).
> >
> > So while I agree with most of these patches, I'm very hesitant on the
> > above little detail.
>
> Just to make sure we have a common understanding (as written in my
> cover letter):
>
> Your suggestion won't work with !CONFIG_PREEMPT
> (!CONFIG_PREEMPT_COUNT). If there is no preempt counter, in_atomic()
> won't work. So doing a preempt_disable() instead of a
> pagefault_disable() is not going to work. (not sure how -RT handles
> that - most probably with CONFIG_PREEMPT_COUNT being enabled, due to
> atomic debug).
>
> That's why I dropped that check for a reason.

So, what's the point of disabling the preempt counter?

Looks like the much simpler (and faster) solution would be to
eliminate CONFIG_PREEMPT_COUNT (i.e. make it always available), and
use it for pagefault-disable.

> This patchset is about decoupling both concept. (not ending up with
> to mechanisms doing almost the same)

So that's really backwards: just because we might not have a handy
counter we introduce _another one_, and duplicate checks for it ;-)

Why not keep a single counter, if indeed what we care about most in
the pagefault_disable() case is atomicity?

Thanks,

Ingo

2015-05-07 11:23:45

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

> On Thu, May 07, 2015 at 12:50:53PM +0200, David Hildenbrand wrote:
> > Just to make sure we have a common understanding (as written in my cover
> > letter):
> >
> > Your suggestion won't work with !CONFIG_PREEMPT (!CONFIG_PREEMPT_COUNT). If
> > there is no preempt counter, in_atomic() won't work.
>
> But there is, we _always_ have a preempt_count, and irq_enter() et al.
> _always_ increment the relevant bits.
>
> The thread_info::preempt_count field is never under PREEMPT_COUNT;
> include/asm-generic/preempt.h provides stuff regardless of
> PREEMPT_COUNT.
>
> See how __irq_enter() -> preempt_count_add(HARDIRQ_OFFSET) ->
> __preempt_count_add() _always_ just works.
>
> It's only things like preempt_disable() / preempt_enable() that get
> munged depending on PREEMPT_COUNT/PREEMPT.
>

Sorry for the confusion. Sure, there is always the count.

My point is that preempt_disable() won't result in an in_atomic() == true
with !PREEMPT_COUNT, so I don't see any point in adding it to the pagefault
handlers. It is not reliable.


David

2015-05-07 11:25:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled


* David Hildenbrand <[email protected]> wrote:

> > On Thu, May 07, 2015 at 12:50:53PM +0200, David Hildenbrand wrote:
> > > Just to make sure we have a common understanding (as written in my cover
> > > letter):
> > >
> > > Your suggestion won't work with !CONFIG_PREEMPT (!CONFIG_PREEMPT_COUNT). If
> > > there is no preempt counter, in_atomic() won't work.
> >
> > But there is, we _always_ have a preempt_count, and irq_enter() et al.
> > _always_ increment the relevant bits.
> >
> > The thread_info::preempt_count field it never under PREEMPT_COUNT
> > include/asm-generic/preempt.h provides stuff regardless of
> > PREEMPT_COUNT.
> >
> > See how __irq_enter() -> preempt_count_add(HARDIRQ_OFFSET) ->
> > __preempt_count_add() _always_ just works.
> >
> > Its only things like preempt_disable() / preempt_enable() that get
> > munged depending on PREEMPT_COUNT/PREEMPT.
> >
>
> Sorry for the confusion. Sure, there is always the count.
>
> My point is that preempt_disable() won't result in an in_atomic() == true
> with !PREEMPT_COUNT, so I don't see any point in adding it to the pagefault
> handlers. It is not reliable.

That's why we have the preempt_count_inc()/dec() methods that are
always available.

So where's the problem?

Thanks,

Ingo

2015-05-07 11:30:40

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

>
> * David Hildenbrand <[email protected]> wrote:
>
> > > On Thu, May 07, 2015 at 12:50:53PM +0200, David Hildenbrand wrote:
> > > > Just to make sure we have a common understanding (as written in my cover
> > > > letter):
> > > >
> > > > Your suggestion won't work with !CONFIG_PREEMPT (!CONFIG_PREEMPT_COUNT). If
> > > > there is no preempt counter, in_atomic() won't work.
> > >
> > > But there is, we _always_ have a preempt_count, and irq_enter() et al.
> > > _always_ increment the relevant bits.
> > >
> > > The thread_info::preempt_count field is never under PREEMPT_COUNT;
> > > include/asm-generic/preempt.h provides stuff regardless of
> > > PREEMPT_COUNT.
> > >
> > > See how __irq_enter() -> preempt_count_add(HARDIRQ_OFFSET) ->
> > > __preempt_count_add() _always_ just works.
> > >
> > > It's only things like preempt_disable() / preempt_enable() that get
> > > munged depending on PREEMPT_COUNT/PREEMPT.
> > >
> >
> > Sorry for the confusion. Sure, there is always the count.
> >
> > My point is that preempt_disable() won't result in an in_atomic() == true
> > with !PREEMPT_COUNT, so I don't see any point in adding it to the pagefault
> > handlers. It is not reliable.
>
> That's why we have the preempt_count_inc()/dec() methods that are
> always available.
>
> So where's the problem?


My point:

Getting rid of PREEMPT_COUNT (and therefore always doing
preempt_count_inc()/dec()) will make preempt_disable() __never__ be a NOP.

So with !CONFIG_PREEMPT we will do preemption stuff that is simply not needed.

Two concepts that share one mechanism. I think this is broken.
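
(Patch 1 of this series therefore gives pagefaults their own per-task
counter - a minimal sketch; the exact helpers in the patch differ:)

static inline void pagefault_disable(void)
{
        current->pagefault_disabled++;
        barrier();
}

static inline void pagefault_enable(void)
{
        barrier();
        current->pagefault_disabled--;
}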

David

2015-05-07 11:40:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 00/15] decouple pagefault_disable() from preempt_disable()

On Thu, May 07, 2015 at 01:08:28PM +0200, Ingo Molnar wrote:
> Yes, but I'm wondering what I'm missing: is there any deep reason for
> making pagefaults-disabled sections non-atomic?

This all comes from -rt, where we had significant latencies due to these
things.

2015-05-07 11:40:46

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

> On Thu, May 07, 2015 at 12:50:53PM +0200, David Hildenbrand wrote:
> > Just to make sure we have a common understanding (as written in my cover
> > letter):
> >
> > Your suggestion won't work with !CONFIG_PREEMPT (!CONFIG_PREEMPT_COUNT). If
> > there is no preempt counter, in_atomic() won't work.
>
> But there is, we _always_ have a preempt_count, and irq_enter() et al.
> _always_ increment the relevant bits.
>
> The thread_info::preempt_count field is never under PREEMPT_COUNT;
> include/asm-generic/preempt.h provides stuff regardless of
> PREEMPT_COUNT.
>
> See how __irq_enter() -> preempt_count_add(HARDIRQ_OFFSET) ->
> __preempt_count_add() _always_ just works.


Okay, thinking about this further, I think I got your point. That basically
means that the in_atomic() check makes sense for irqs.

But in my opinion, it does not help to replace

preempt_disable()
pagefault_disable()

by

preempt_disable()

(as discussed, because of the PREEMPT_COUNT stuff)

So I agree that we had better add it, so we don't mess with hard/soft irq
context.
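
(Why the irq side is reliable - simplified from include/linux/hardirq.h,
illustrative only:)

#define __irq_enter()                                   \
        do {                                            \
                account_irq_enter_time(current);        \
                preempt_count_add(HARDIRQ_OFFSET);      \
                trace_hardirq_enter();                  \
        } while (0)

preempt_count_add() maps to __preempt_count_add() here regardless of
CONFIG_PREEMPT_COUNT, so in_atomic() sees irq context on every config.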

>
> It's only things like preempt_disable() / preempt_enable() that get
> munged depending on PREEMPT_COUNT/PREEMPT.
>

But anyhow, opinions seem to differ on how best to handle this whole thing.

I think a separate counter just makes sense, as we are dealing with two
different concepts and we don't want to lose the preempt_disable =^ NOP
for !CONFIG_PREEMPT.

I also think that

pagefault_disable()
rc = copy_from_user()
pagefault_enable()

is a valid use case.
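
(E.g. kernel/futex.c relies on roughly this pattern when it reads the futex
value while holding a hash bucket lock:)

static int get_futex_value_locked(u32 *dest, u32 __user *from)
{
        int ret;

        pagefault_disable();    /* we hold a spinlock, must not sleep */
        ret = __copy_from_user_inatomic(dest, from, sizeof(u32));
        pagefault_enable();

        return ret ? -EFAULT : 0;
}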

So any suggestions on how to continue?

Thanks!

David

2015-05-07 11:42:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

On Thu, May 07, 2015 at 01:23:35PM +0200, David Hildenbrand wrote:
> > On Thu, May 07, 2015 at 12:50:53PM +0200, David Hildenbrand wrote:
> > > Just to make sure we have a common understanding (as written in my cover
> > > letter):
> > >
> > > Your suggestion won't work with !CONFIG_PREEMPT (!CONFIG_PREEMPT_COUNT). If
> > > there is no preempt counter, in_atomic() won't work.
> >
> > But there is, we _always_ have a preempt_count, and irq_enter() et al.
> > _always_ increment the relevant bits.
> >
> > The thread_info::preempt_count field is never under PREEMPT_COUNT;
> > include/asm-generic/preempt.h provides stuff regardless of
> > PREEMPT_COUNT.
> >
> > See how __irq_enter() -> preempt_count_add(HARDIRQ_OFFSET) ->
> > __preempt_count_add() _always_ just works.
> >
> > It's only things like preempt_disable() / preempt_enable() that get
> > munged depending on PREEMPT_COUNT/PREEMPT.
> >
>
> Sorry for the confusion. Sure, there is always the count.
>
> My point is that preempt_disable() won't result in an in_atomic() == true
> with !PREEMPT_COUNT, so I don't see any point in adding it to the pagefault
> handlers. It is not reliable.

It _very_ reliably tells us whether we're in an interrupt! Which your
patches break.

It also very much avoids touching two cachelines in a number of places.

2015-05-07 11:48:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

On Thu, May 07, 2015 at 01:40:30PM +0200, David Hildenbrand wrote:
> I think a separate counter just makes sense, as we are dealing with two
> different concepts and we don't want to lose the preempt_disable =^ NOP
> for !CONFIG_PREEMPT.

Right, let me try and get my head on straight -- I'm so used to
PREEMPT=y being the 'hard' case :-)

2015-05-07 11:51:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

On Thu, May 07, 2015 at 01:40:30PM +0200, David Hildenbrand wrote:
> But anyhow, opinions seem to differ on how best to handle this whole thing.
>
> I think a separate counter just makes sense, as we are dealing with two
> different concepts and we don't want to lose the preempt_disable =^ NOP
> for !CONFIG_PREEMPT.
>
> I also think that
>
> pagefault_disable()
> rc = copy_from_user()
> pagefault_enable()
>
> is a valid use case.
>
> So any suggestions on how to continue?


static inline bool __pagefault_disabled(void)
{
        return current->pagefault_disabled;
}

static inline bool pagefault_disabled(void)
{
        return in_atomic() || __pagefault_disabled();
}

And leave the preempt_disable() + pagefault_disable() for now. You're
right in that that is clearest.
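
(So callers that need both just keep spelling both out - roughly what the
kmap_atomic_* patch of this series does for the generic !HIGHMEM variant;
sketch:)

static inline void *kmap_atomic(struct page *page)
{
        preempt_disable();      /* now explicit, no longer implied */
        pagefault_disable();    /* by pagefault_disable() itself */
        return page_address(page);
}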

If we ever get to the point where that really is an issue, I'll try and
be clever then :-)

2015-05-07 12:14:52

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

> On Thu, May 07, 2015 at 01:40:30PM +0200, David Hildenbrand wrote:
> > But anyhow, opinions seem to differ on how best to handle this whole thing.
> >
> > I think a separate counter just makes sense, as we are dealing with two
> > different concepts and we don't want to lose the preempt_disable =^ NOP
> > for !CONFIG_PREEMPT.
> >
> > I also think that
> >
> > pagefault_disable()
> > rc = copy_from_user()
> > pagefault_enable()
> >
> > is a valid use case.
> >
> > So any suggestions on how to continue?
>
>
> static inline bool __pagefault_disabled(void)
> {
>         return current->pagefault_disabled;
> }
>
> static inline bool pagefault_disabled(void)
> {
>         return in_atomic() || __pagefault_disabled();
> }
>
> And leave the preempt_disable() + pagefault_disable() for now. You're
> right in that that is clearest.
>
> If we ever get to the point where that really is an issue, I'll try and
> be clever then :-)
>

Thanks :), well just to make sure I got your opinion on this correctly:

1. You think that 2 counters is the way to go for now

2. You agree that we can't replace preempt_disable()+pagefault_disable() with
preempt_disable() (CONFIG_PREEMPT stuff), so we need to have them separately

3. We need in_atomic() (in the fault handlers only!) in addition to make sure we
don't mess with irq contexts (In that case I would add a good comment to that
place, describing why preempt_disable() won't help)


I think this is the right way to go because:

a) This way we don't have to modify preempt_disable() logic (including
PREEMPT_COUNT).

b) There are not that many users relying on
preempt_disable()+pagefault_disable() (compared to pure preempt_disable() or
pagefault_disable() users), so the performance overhead of two cache lines
should be small. Users only making use of one of them should see no difference
in performance.

c) We correctly decouple preemption and pagefault logic. Therefore we can now
preempt when pagefaults are disabled, which feels right.

d) We can get some of that -rt flavor into the base kernel

e) We don't require the inatomic variants in pagefault_disable() context, as
Ingo suggested (for me, this now feels wrong - I had a different opinion back
then, when working on the first revision of this patchset).
We would only have used the inatomic variants because preempt_disable()
behaves differently with PREEMPT_COUNT, mixing concepts at the user level.


@Ingo, do you have a strong feeling against this whole patchset/idea?


David

2015-05-07 12:27:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled


* David Hildenbrand <[email protected]> wrote:

> @Ingo, do you have a strong feeling against this whole
> patchset/idea?

No objections, sounds good to me now.

Thanks,

Ingo

2015-05-07 12:32:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 01/15] uaccess: count pagefault_disable() levels in pagefault_disabled

On Thu, May 07, 2015 at 02:14:39PM +0200, David Hildenbrand wrote:
> Thanks :), well just to make sure I got your opinion on this correctly:
>
> 1. You think that 2 counters is the way to go for now

ack

> 2. You agree that we can't replace preempt_disable()+pagefault_disable() with
> preempt_disable() (CONFIG_PREEMPT stuff), so we need to have them separately

ack

> 3. We need in_atomic() (in the fault handlers only!) in addition to make sure we
> don't mess with irq contexts (In that case I would add a good comment to that
> place, describing why preempt_disable() won't help)

ack

> I think this is the right way to go because:
>
> a) This way we don't have to modify preempt_disable() logic (including
> PREEMPT_COUNT).
>
> b) There are not that many users relying on
> preempt_disable()+pagefault_disable() (compared to pure preempt_disable() or
> pagefault_disable() users), so the performance overhead of two cache lines
> should be small. Users only making use of one of them should see no difference
> in performance.

indeed.

> c) We correctly decouple preemption and pagefault logic. Therefore we can now
> preempt when pagefaults are disabled, which feels right.

Right, that's always been the intent of introducing pagefault_disable().

2015-05-07 15:45:45

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH draft] mm: use pagefault_disable() to check for disabled pagefaults in the handler

---- zip ---

This will replace patch nr 6 if we have an agreement :)

So this is my suggestion to fix the irq context problem. We should not modify
pagefault_disabled() to check for in_atomic(), otherwise we can't reuse it
for might_fault()/might_sleep() checks.

This also allows us to clean it up later (if ever :) ).
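
(That way might_fault() can later do something along these lines - a sketch
based on patch 2 of this series, slightly simplified:)

void __might_fault(const char *file, int line)
{
        /* KERNEL_DS: there is no user address space to fault on */
        if (segment_eq(get_fs(), KERNEL_DS))
                return;
        /* with pagefaults disabled we are allowed to be atomic */
        if (pagefault_disabled())
                return;
        __might_sleep(file, line, 0);
        if (current->mm)
                might_lock_read(&current->mm->mmap_sem);
}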

Thanks!

---- zip ---

Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.

Please note that we keep the in_atomic() checks in place. This is needed
to detect irq context (in which case preemption is always properly disabled).

In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition via the might_fault() checks, which call
might_sleep().

Therefore, we add a comment to faulthandler_disabled(), describing why this
is needed.

faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.

This patch is based on a patch from Thomas Gleixner.

Signed-off-by: David Hildenbrand <[email protected]>
---
arch/alpha/mm/fault.c | 5 ++---
arch/arc/mm/fault.c | 2 +-
arch/arm/mm/fault.c | 2 +-
arch/arm64/mm/fault.c | 2 +-
arch/avr32/mm/fault.c | 4 ++--
arch/cris/mm/fault.c | 6 +++---
arch/frv/mm/fault.c | 4 ++--
arch/ia64/mm/fault.c | 4 ++--
arch/m32r/mm/fault.c | 8 ++++----
arch/m68k/mm/fault.c | 4 ++--
arch/metag/mm/fault.c | 2 +-
arch/microblaze/mm/fault.c | 8 ++++----
arch/mips/mm/fault.c | 4 ++--
arch/mn10300/mm/fault.c | 4 ++--
arch/nios2/mm/fault.c | 2 +-
arch/parisc/kernel/traps.c | 4 ++--
arch/parisc/mm/fault.c | 4 ++--
arch/powerpc/mm/fault.c | 9 +++++----
arch/s390/mm/fault.c | 2 +-
arch/score/mm/fault.c | 3 ++-
arch/sh/mm/fault.c | 5 +++--
arch/sparc/mm/fault_32.c | 4 ++--
arch/sparc/mm/fault_64.c | 4 ++--
arch/sparc/mm/init_64.c | 2 +-
arch/tile/mm/fault.c | 4 ++--
arch/um/kernel/trap.c | 4 ++--
arch/unicore32/mm/fault.c | 2 +-
arch/x86/mm/fault.c | 5 +++--
arch/xtensa/mm/fault.c | 4 ++--
include/linux/uaccess.h | 12 ++++++++++++
30 files changed, 72 insertions(+), 57 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 9d0ac09..4a905bd 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -23,8 +23,7 @@
#include <linux/smp.h>
#include <linux/interrupt.h>
#include <linux/module.h>
-
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>

extern void die_if_kernel(char *,struct pt_regs *,long, unsigned long *);

@@ -107,7 +106,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,

/* If we're in an interrupt context, or have no user context,
we must not take the fault. */
- if (!mm || in_atomic())
+ if (!mm || faulthandler_disabled())
goto no_context;

#ifdef CONFIG_ALPHA_LARGE_VMALLOC
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 6a2e006..d948e4e 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -86,7 +86,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 6333d9c..0d629b8 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -276,7 +276,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 96da131..0948d32 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -211,7 +211,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
* If we're in an interrupt or have no user context, we must not take
* the fault.
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/avr32/mm/fault.c b/arch/avr32/mm/fault.c
index d223a8b..c035339 100644
--- a/arch/avr32/mm/fault.c
+++ b/arch/avr32/mm/fault.c
@@ -14,11 +14,11 @@
#include <linux/pagemap.h>
#include <linux/kdebug.h>
#include <linux/kprobes.h>
+#include <linux/uaccess.h>

#include <asm/mmu_context.h>
#include <asm/sysreg.h>
#include <asm/tlb.h>
-#include <asm/uaccess.h>

#ifdef CONFIG_KPROBES
static inline int notify_page_fault(struct pt_regs *regs, int trap)
@@ -81,7 +81,7 @@ asmlinkage void do_page_fault(unsigned long ecr, struct pt_regs *regs)
* If we're in an interrupt or have no user context, we must
* not take the fault...
*/
- if (in_atomic() || !mm || regs->sr & SYSREG_BIT(GM))
+ if (faulthandler_disabled() || !mm || regs->sr & SYSREG_BIT(GM))
goto no_context;

local_irq_enable();
diff --git a/arch/cris/mm/fault.c b/arch/cris/mm/fault.c
index 83f12f2..3066d40 100644
--- a/arch/cris/mm/fault.c
+++ b/arch/cris/mm/fault.c
@@ -8,7 +8,7 @@
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/wait.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
#include <arch/system.h>

extern int find_fixup_code(struct pt_regs *);
@@ -109,11 +109,11 @@ do_page_fault(unsigned long address, struct pt_regs *regs,
info.si_code = SEGV_MAPERR;

/*
- * If we're in an interrupt or "atomic" operation or have no
+ * If we're in an interrupt, have pagefaults disabled or have no
* user context, we must not take the fault.
*/

- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/frv/mm/fault.c b/arch/frv/mm/fault.c
index ec4917d..61d9976 100644
--- a/arch/frv/mm/fault.c
+++ b/arch/frv/mm/fault.c
@@ -19,9 +19,9 @@
#include <linux/kernel.h>
#include <linux/ptrace.h>
#include <linux/hardirq.h>
+#include <linux/uaccess.h>

#include <asm/pgtable.h>
-#include <asm/uaccess.h>
#include <asm/gdb-stub.h>

/*****************************************************************************/
@@ -78,7 +78,7 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

if (user_mode(__frame))
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index ba5ba7a..70b40d1 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -11,10 +11,10 @@
#include <linux/kprobes.h>
#include <linux/kdebug.h>
#include <linux/prefetch.h>
+#include <linux/uaccess.h>

#include <asm/pgtable.h>
#include <asm/processor.h>
-#include <asm/uaccess.h>

extern int die(char *, struct pt_regs *, long);

@@ -96,7 +96,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
/*
* If we're in an interrupt or have no user context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

#ifdef CONFIG_VIRTUAL_MEM_MAP
diff --git a/arch/m32r/mm/fault.c b/arch/m32r/mm/fault.c
index e3d4d48901..8f9875b 100644
--- a/arch/m32r/mm/fault.c
+++ b/arch/m32r/mm/fault.c
@@ -24,9 +24,9 @@
#include <linux/vt_kern.h> /* For unblank_screen() */
#include <linux/highmem.h>
#include <linux/module.h>
+#include <linux/uaccess.h>

#include <asm/m32r.h>
-#include <asm/uaccess.h>
#include <asm/hardirq.h>
#include <asm/mmu_context.h>
#include <asm/tlbflush.h>
@@ -111,10 +111,10 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code,
mm = tsk->mm;

/*
- * If we're in an interrupt or have no user context or are running in an
- * atomic region then we must not take the fault..
+ * If we're in an interrupt or have no user context or have pagefaults
+ * disabled then we must not take the fault.
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto bad_area_nosemaphore;

if (error_code & ACE_USERMODE)
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index b2f04ae..6a94cdd 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -10,10 +10,10 @@
#include <linux/ptrace.h>
#include <linux/interrupt.h>
#include <linux/module.h>
+#include <linux/uaccess.h>

#include <asm/setup.h>
#include <asm/traps.h>
-#include <asm/uaccess.h>
#include <asm/pgalloc.h>

extern void die_if_kernel(char *, struct pt_regs *, long);
@@ -81,7 +81,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/metag/mm/fault.c b/arch/metag/mm/fault.c
index 2de5dc6..f57edca 100644
--- a/arch/metag/mm/fault.c
+++ b/arch/metag/mm/fault.c
@@ -105,7 +105,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,

mm = tsk->mm;

- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c
index d46a5eb..177dfc0 100644
--- a/arch/microblaze/mm/fault.c
+++ b/arch/microblaze/mm/fault.c
@@ -107,14 +107,14 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
if ((error_code & 0x13) == 0x13 || (error_code & 0x11) == 0x11)
is_write = 0;

- if (unlikely(in_atomic() || !mm)) {
+ if (unlikely(faulthandler_disabled() || !mm)) {
if (kernel_mode(regs))
goto bad_area_nosemaphore;

- /* in_atomic() in user mode is really bad,
+ /* faulthandler_disabled() in user mode is really bad,
as is current->mm == NULL. */
- pr_emerg("Page fault in user mode with in_atomic(), mm = %p\n",
- mm);
+ pr_emerg("Page fault in user mode with faulthandler_disabled(), mm = %p\n",
+ mm);
pr_emerg("r15 = %lx MSR = %lx\n",
regs->r15, regs->msr);
die("Weird page fault", regs, SIGSEGV);
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 7ff8637..36c0f26 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -21,10 +21,10 @@
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/perf_event.h>
+#include <linux/uaccess.h>

#include <asm/branch.h>
#include <asm/mmu_context.h>
-#include <asm/uaccess.h>
#include <asm/ptrace.h>
#include <asm/highmem.h> /* For VMALLOC_END */
#include <linux/kdebug.h>
@@ -94,7 +94,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto bad_area_nosemaphore;

if (user_mode(regs))
diff --git a/arch/mn10300/mm/fault.c b/arch/mn10300/mm/fault.c
index 0c2cc5d..4a1d181 100644
--- a/arch/mn10300/mm/fault.c
+++ b/arch/mn10300/mm/fault.c
@@ -23,8 +23,8 @@
#include <linux/interrupt.h>
#include <linux/init.h>
#include <linux/vt_kern.h> /* For unblank_screen() */
+#include <linux/uaccess.h>

-#include <asm/uaccess.h>
#include <asm/pgalloc.h>
#include <asm/hardirq.h>
#include <asm/cpu-regs.h>
@@ -168,7 +168,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long fault_code,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

if ((fault_code & MMUFCR_xFC_ACCESS) == MMUFCR_xFC_ACCESS_USR)
diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c
index 0c9b6af..b51878b 100644
--- a/arch/nios2/mm/fault.c
+++ b/arch/nios2/mm/fault.c
@@ -77,7 +77,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto bad_area_nosemaphore;

if (user_mode(regs))
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index 47ee620..6548fd1 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -26,9 +26,9 @@
#include <linux/console.h>
#include <linux/bug.h>
#include <linux/ratelimit.h>
+#include <linux/uaccess.h>

#include <asm/assembly.h>
-#include <asm/uaccess.h>
#include <asm/io.h>
#include <asm/irq.h>
#include <asm/traps.h>
@@ -800,7 +800,7 @@ void notrace handle_interruption(int code, struct pt_regs *regs)
* unless pagefault_disable() was called before.
*/

- if (fault_space == 0 && !in_atomic())
+ if (fault_space == 0 && !faulthandler_disabled())
{
pdc_chassis_send_status(PDC_CHASSIS_DIRECT_PANIC);
parisc_terminate("Kernel Fault", regs, code, fault_address);
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index e5120e6..15503ad 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -15,8 +15,8 @@
#include <linux/sched.h>
#include <linux/interrupt.h>
#include <linux/module.h>
+#include <linux/uaccess.h>

-#include <asm/uaccess.h>
#include <asm/traps.h>

/* Various important other fields */
@@ -207,7 +207,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
int fault;
unsigned int flags;

- if (in_atomic())
+ if (pagefault_disabled())
goto no_context;

tsk = current;
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index b396868..6d53597 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -33,13 +33,13 @@
#include <linux/ratelimit.h>
#include <linux/context_tracking.h>
#include <linux/hugetlb.h>
+#include <linux/uaccess.h>

#include <asm/firmware.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/mmu.h>
#include <asm/mmu_context.h>
-#include <asm/uaccess.h>
#include <asm/tlbflush.h>
#include <asm/siginfo.h>
#include <asm/debug.h>
@@ -272,15 +272,16 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
if (!arch_irq_disabled_regs(regs))
local_irq_enable();

- if (in_atomic() || mm == NULL) {
+ if (faulthandler_disabled() || mm == NULL) {
if (!user_mode(regs)) {
rc = SIGSEGV;
goto bail;
}
- /* in_atomic() in user mode is really bad,
+ /* faulthandler_disabled() in user mode is really bad,
as is current->mm == NULL. */
printk(KERN_EMERG "Page fault in user mode with "
- "in_atomic() = %d mm = %p\n", in_atomic(), mm);
+ "faulthandler_disabled() = %d mm = %p\n",
+ faulthandler_disabled(), mm);
printk(KERN_EMERG "NIP = %lx MSR = %lx\n",
regs->nip, regs->msr);
die("Weird page fault", regs, SIGSEGV);
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 76515bc..4c8f5d7 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -399,7 +399,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
* user context.
*/
fault = VM_FAULT_BADCONTEXT;
- if (unlikely(!user_space_fault(regs) || in_atomic() || !mm))
+ if (unlikely(!user_space_fault(regs) || faulthandler_disabled() || !mm))
goto out;

address = trans_exc_code & __FAIL_ADDR_MASK;
diff --git a/arch/score/mm/fault.c b/arch/score/mm/fault.c
index 6860beb..37a6c2e 100644
--- a/arch/score/mm/fault.c
+++ b/arch/score/mm/fault.c
@@ -34,6 +34,7 @@
#include <linux/string.h>
#include <linux/types.h>
#include <linux/ptrace.h>
+#include <linux/uaccess.h>

/*
* This routine handles page faults. It determines the address,
@@ -73,7 +74,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long write,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto bad_area_nosemaphore;

if (user_mode(regs))
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index a58fec9..79d8276 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/perf_event.h>
#include <linux/kdebug.h>
+#include <linux/uaccess.h>
#include <asm/io_trapped.h>
#include <asm/mmu_context.h>
#include <asm/tlbflush.h>
@@ -438,9 +439,9 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,

/*
* If we're in an interrupt, have no user context or are running
- * in an atomic region then we must not take the fault:
+ * with pagefaults disabled then we must not take the fault:
*/
- if (unlikely(in_atomic() || !mm)) {
+ if (unlikely(faulthandler_disabled() || !mm)) {
bad_area_nosemaphore(regs, error_code, address);
return;
}
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index 70d8171..c399e7b 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -21,6 +21,7 @@
#include <linux/perf_event.h>
#include <linux/interrupt.h>
#include <linux/kdebug.h>
+#include <linux/uaccess.h>

#include <asm/page.h>
#include <asm/pgtable.h>
@@ -29,7 +30,6 @@
#include <asm/setup.h>
#include <asm/smp.h>
#include <asm/traps.h>
-#include <asm/uaccess.h>

#include "mm_32.h"

@@ -196,7 +196,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (pagefault_disabled() || !mm)
goto no_context;

perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 4798232..e9268ea 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -22,12 +22,12 @@
#include <linux/kdebug.h>
#include <linux/percpu.h>
#include <linux/context_tracking.h>
+#include <linux/uaccess.h>

#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/openprom.h>
#include <asm/oplib.h>
-#include <asm/uaccess.h>
#include <asm/asi.h>
#include <asm/lsu.h>
#include <asm/sections.h>
@@ -330,7 +330,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto intr_or_no_mm;

perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 4ca0d6b..cee9b77 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2706,7 +2706,7 @@ void hugetlb_setup(struct pt_regs *regs)
struct mm_struct *mm = current->mm;
struct tsb_config *tp;

- if (in_atomic() || !mm) {
+ if (faulthandler_disabled() || !mm) {
const struct exception_table_entry *entry;

entry = search_exception_tables(regs->tpc);
diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c
index e83cc99..3f4f58d 100644
--- a/arch/tile/mm/fault.c
+++ b/arch/tile/mm/fault.c
@@ -354,9 +354,9 @@ static int handle_page_fault(struct pt_regs *regs,

/*
* If we're in an interrupt, have no user context or are running in an
- * atomic region then we must not take the fault.
+ * region with pagefaults disabled then we must not take the fault.
*/
- if (in_atomic() || !mm) {
+ if (pagefault_disabled() || !mm) {
vma = NULL; /* happy compiler */
goto bad_area_nosemaphore;
}
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index 8e4daf4..f9c9e5a 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -35,10 +35,10 @@ int handle_page_fault(unsigned long address, unsigned long ip,
*code_out = SEGV_MAPERR;

/*
- * If the fault was during atomic operation, don't take the fault, just
+ * If the fault was with pagefaults disabled, don't take the fault, just
* fail.
*/
- if (in_atomic())
+ if (faulthandler_disabled())
goto out_nosemaphore;

if (is_user)
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 0dc922d..afccef552 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -218,7 +218,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm)
+ if (faulthandler_disabled() || !mm)
goto no_context;

if (user_mode(regs))
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 181c53b..9dc9098 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -13,6 +13,7 @@
#include <linux/hugetlb.h> /* hstate_index_to_shift */
#include <linux/prefetch.h> /* prefetchw */
#include <linux/context_tracking.h> /* exception_enter(), ... */
+#include <linux/uaccess.h> /* faulthandler_disabled() */

#include <asm/traps.h> /* dotraplinkage, ... */
#include <asm/pgalloc.h> /* pgd_*(), ... */
@@ -1126,9 +1127,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,

/*
* If we're in an interrupt, have no user context or are running
- * in an atomic region then we must not take the fault:
+ * in a region with pagefaults disabled then we must not take the fault
*/
- if (unlikely(in_atomic() || !mm)) {
+ if (unlikely(faulthandler_disabled() || !mm)) {
bad_area_nosemaphore(regs, error_code, address);
return;
}
diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c
index 9e3571a..83a44a3 100644
--- a/arch/xtensa/mm/fault.c
+++ b/arch/xtensa/mm/fault.c
@@ -15,10 +15,10 @@
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/hardirq.h>
+#include <linux/uaccess.h>
#include <asm/mmu_context.h>
#include <asm/cacheflush.h>
#include <asm/hardirq.h>
-#include <asm/uaccess.h>
#include <asm/pgalloc.h>

DEFINE_PER_CPU(unsigned long, asid_cache) = ASID_USER_FIRST;
@@ -57,7 +57,7 @@ void do_page_fault(struct pt_regs *regs)
/* If we're in an interrupt or have no user
* context, we must not take the fault..
*/
- if (in_atomic() || !mm) {
+ if (faulthandler_disabled() || !mm) {
bad_page_fault(regs, address, SIGSEGV);
return;
}
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 23290cc..90786d2 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -59,6 +59,18 @@ static inline void pagefault_enable(void)
*/
#define pagefault_disabled() (current->pagefault_disabled != 0)

+/*
+ * The pagefault handler is in general disabled by pagefault_disable() or
+ * when in irq context (via in_atomic()).
+ *
+ * This function should only be used by the fault handlers. Other users should
+ * stick to pagefault_disabled().
+ * Please NEVER use preempt_disable() to disable the fault handler. With
+ * !CONFIG_PREEMPT_COUNT, this is like a NOP. So the handler won't be disabled.
+ * in_atomic() will report different values based on !CONFIG_PREEMPT_COUNT.
+ */
+#define faulthandler_disabled() (pagefault_disabled() || in_atomic())
+
#ifndef ARCH_HAS_NOCACHE_UACCESS

static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
--
2.1.4