v1 -> v2:
- moved pagefault_count to the end of thread_info for all archs that would have
required manually calculating asm-offsets - to keep changes minimal.
- remove unlikely() from "mm, uaccess: trigger might_sleep() in" and keep
changes minimal (in_atomic() -> pagefault_disabled())
----
I recently discovered that might_fault() doesn't call might_sleep() anymore.
Therefore bugs like:
spin_lock(&lock);
rc = copy_to_user(...);
spin_unlock(&lock);
would not be detected with CONFIG_DEBUG_ATOMIC_SLEEP. The code was changed to
disable false positives for code like:
pagefault_disable();
rc = copy_to_user(...);
pagefault_enable();
Until now, pagefault_disable() and pagefault_enable() simply modified the
preempt count, therefore telling the pagefault handler that the context is
atomic and sleeping is disallowed.
In order to reenable might_sleep() checks for the correct path, we need a way to
detect whether we run in a pagefault_disable() context.
This series therefore introduces a separate pagefault_count and uses it to count
the levels of pagefault_disable() per thread. might_sleep() checks are
reactivated for the !pagefault_disable() path.
So this should now work:
spin_lock(&lock); /* also if left away */
pagefault_disable()
rc = copy_to_user(...);
pagefault_enable();
spin_unlock(&lock);
And this should report a warning again:
spin_lock(&lock);
rc = copy_to_user(...);
spin_unlock(&lock);
Please note that this series will not completely split the handling of
pagefault_disable() and the preempt count. This will be done in another series.
Purpose of this series is to reenable might_sleep() checks for might_fault(),
avoiding to produce false positives.
Cross compiled on powerpc, arm, sparc, sparc64, arm64, x86_64, i386, mips,
alpha, ia64, xtensa, m68k, microblaze.
Tested on s390.
David Hildenbrand (5):
uaccess: add pagefault_count to thread_info
uaccess: count pagefault_disable() levels in pagefault_count
mm, uaccess: trigger might_sleep() in might_fault() when pagefaults
are disabled
uaccess: clarify that uaccess may only sleep if pagefaults are not
disabled
uaccess: CONFIG_DEBUG_PAGEFAULT_COUNT to debug pagefault_count
arch/alpha/include/asm/thread_info.h | 1 +
arch/arc/include/asm/thread_info.h | 1 +
arch/arm/include/asm/thread_info.h | 1 +
arch/arm64/include/asm/thread_info.h | 1 +
arch/avr32/include/asm/thread_info.h | 1 +
arch/avr32/include/asm/uaccess.h | 12 +++++---
arch/blackfin/include/asm/thread_info.h | 1 +
arch/c6x/include/asm/thread_info.h | 1 +
arch/cris/include/asm/thread_info.h | 1 +
arch/frv/include/asm/thread_info.h | 1 +
arch/hexagon/include/asm/thread_info.h | 1 +
arch/hexagon/include/asm/uaccess.h | 3 +-
arch/ia64/include/asm/thread_info.h | 1 +
arch/m32r/include/asm/thread_info.h | 1 +
arch/m32r/include/asm/uaccess.h | 30 ++++++++++++------
arch/m68k/include/asm/thread_info.h | 1 +
arch/metag/include/asm/thread_info.h | 1 +
arch/microblaze/include/asm/thread_info.h | 1 +
arch/microblaze/include/asm/uaccess.h | 6 ++--
arch/mips/include/asm/thread_info.h | 1 +
arch/mips/include/asm/uaccess.h | 45 ++++++++++++++++++---------
arch/mn10300/include/asm/thread_info.h | 1 +
arch/openrisc/include/asm/thread_info.h | 1 +
arch/parisc/include/asm/thread_info.h | 1 +
arch/powerpc/include/asm/thread_info.h | 1 +
arch/s390/include/asm/thread_info.h | 1 +
arch/s390/include/asm/uaccess.h | 15 ++++++---
arch/score/include/asm/thread_info.h | 1 +
arch/score/include/asm/uaccess.h | 15 ++++++---
arch/sh/include/asm/thread_info.h | 1 +
arch/sparc/include/asm/thread_info_32.h | 1 +
arch/sparc/include/asm/thread_info_64.h | 1 +
arch/tile/include/asm/thread_info.h | 1 +
arch/tile/include/asm/uaccess.h | 21 ++++++++-----
arch/um/include/asm/thread_info.h | 1 +
arch/unicore32/include/asm/thread_info.h | 1 +
arch/x86/include/asm/thread_info.h | 1 +
arch/x86/include/asm/uaccess.h | 15 ++++++---
arch/x86/include/asm/uaccess_32.h | 6 ++--
arch/x86/lib/usercopy_32.c | 6 ++--
arch/xtensa/include/asm/thread_info.h | 1 +
include/linux/kernel.h | 3 +-
include/linux/uaccess.h | 51 ++++++++++++++++++++++++++-----
lib/Kconfig.debug | 9 ++++++
lib/strnlen_user.c | 6 ++--
mm/maccess.c | 11 +++++++
mm/memory.c | 18 ++++-------
47 files changed, 222 insertions(+), 80 deletions(-)
--
1.8.5.5
This patch introduces CONFIG_DEBUG_PAGEFAULT_COUNT, to detect over-/underflows
in the pagefault_count resulting from a wrong usage of pagefault_enable() and
pagefault_disable().
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/uaccess.h | 8 +++++++-
lib/Kconfig.debug | 9 +++++++++
mm/maccess.c | 11 +++++++++++
3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 1dfc678..6ffb90b 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -20,11 +20,17 @@ static __always_inline void pagefault_count_inc(void)
(*pagefault_count_ptr())++;
}
-static __always_inline void pagefault_count_dec(void)
+static __always_inline void __pagefault_count_dec(void)
{
(*pagefault_count_ptr())--;
}
+#ifdef CONFIG_DEBUG_PAGEFAULT_COUNT
+extern void pagefault_count_dec(void);
+#else
+#define pagefault_count_dec() __pagefault_count_dec()
+#endif
+
/*
* These routines enable/disable the pagefault handler. If disabled, it will
* not take any locks and go straight to the fixup table.
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d780351..9dea6e0 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -860,6 +860,15 @@ config DEBUG_PREEMPT
if kernel code uses it in a preemption-unsafe way. Also, the kernel
will detect preemption count underflows.
+config DEBUG_PAGEFAULT_COUNT
+ bool "Debug pagefault_disable / pagefault_enable"
+ depends on DEBUG_KERNEL
+ default n
+ help
+ If you say Y here then the kernel will detect pagefault count
+ over-/underflows and therefore non-matching pagefault_enable() and
+ pagefault_disable() calls.
+
menu "Lock Debugging (spinlocks, mutexes, etc...)"
config DEBUG_RT_MUTEXES
diff --git a/mm/maccess.c b/mm/maccess.c
index d53adf9..4b72aa1 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -5,6 +5,17 @@
#include <linux/mm.h>
#include <linux/uaccess.h>
+#ifdef CONFIG_DEBUG_PAGEFAULT_COUNT
+void pagefault_count_dec(void)
+{
+ __pagefault_count_dec();
+
+ /* underflow / previous overflow ? */
+ WARN_ON(pagefault_count() < 0);
+}
+EXPORT_SYMBOL(pagefault_count_dec);
+#endif
+
/**
* probe_kernel_read(): safely attempt to read from a location
* @dst: pointer to the buffer that shall take the data
--
1.8.5.5
In general, non-atomic variants of user access functions may not sleep if
pagefaults are disabled.
Let's update all relevant comments in uaccess code. This also refelects the
might_sleep() checks in might_fault().
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/avr32/include/asm/uaccess.h | 12 ++++++----
arch/hexagon/include/asm/uaccess.h | 3 ++-
arch/m32r/include/asm/uaccess.h | 30 +++++++++++++++--------
arch/microblaze/include/asm/uaccess.h | 6 +++--
arch/mips/include/asm/uaccess.h | 45 +++++++++++++++++++++++------------
arch/s390/include/asm/uaccess.h | 15 ++++++++----
arch/score/include/asm/uaccess.h | 15 ++++++++----
arch/tile/include/asm/uaccess.h | 21 ++++++++++------
arch/x86/include/asm/uaccess.h | 15 ++++++++----
arch/x86/include/asm/uaccess_32.h | 6 +++--
arch/x86/lib/usercopy_32.c | 6 +++--
lib/strnlen_user.c | 6 +++--
12 files changed, 120 insertions(+), 60 deletions(-)
diff --git a/arch/avr32/include/asm/uaccess.h b/arch/avr32/include/asm/uaccess.h
index 245b2ee..6b6dd58 100644
--- a/arch/avr32/include/asm/uaccess.h
+++ b/arch/avr32/include/asm/uaccess.h
@@ -97,7 +97,8 @@ static inline __kernel_size_t __copy_from_user(void *to,
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -116,7 +117,8 @@ static inline __kernel_size_t __copy_from_user(void *to,
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -136,7 +138,8 @@ static inline __kernel_size_t __copy_from_user(void *to,
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -158,7 +161,8 @@ static inline __kernel_size_t __copy_from_user(void *to,
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
diff --git a/arch/hexagon/include/asm/uaccess.h b/arch/hexagon/include/asm/uaccess.h
index e4127e4..af98618 100644
--- a/arch/hexagon/include/asm/uaccess.h
+++ b/arch/hexagon/include/asm/uaccess.h
@@ -36,7 +36,8 @@
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
diff --git a/arch/m32r/include/asm/uaccess.h b/arch/m32r/include/asm/uaccess.h
index 84fe7ba..83bfa33 100644
--- a/arch/m32r/include/asm/uaccess.h
+++ b/arch/m32r/include/asm/uaccess.h
@@ -91,7 +91,8 @@ static inline void set_fs(mm_segment_t s)
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -155,7 +156,8 @@ extern int fixup_exception(struct pt_regs *regs);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -175,7 +177,8 @@ extern int fixup_exception(struct pt_regs *regs);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -194,7 +197,8 @@ extern int fixup_exception(struct pt_regs *regs);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -274,7 +278,8 @@ do { \
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -568,7 +573,8 @@ unsigned long __generic_copy_from_user(void *, const void __user *, unsigned lon
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -588,7 +594,8 @@ unsigned long __generic_copy_from_user(void *, const void __user *, unsigned lon
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from kernel space to user space.
*
@@ -606,7 +613,8 @@ unsigned long __generic_copy_from_user(void *, const void __user *, unsigned lon
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -626,7 +634,8 @@ unsigned long __generic_copy_from_user(void *, const void __user *, unsigned lon
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to kernel space.
*
@@ -677,7 +686,8 @@ unsigned long clear_user(void __user *mem, unsigned long len);
* strlen_user: - Get the size of a string in user space.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Get the size of a NUL-terminated string in user space.
*
diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index 59a89a6..53bfbb8 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -178,7 +178,8 @@ extern long __user_bad(void);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -290,7 +291,8 @@ extern long __user_bad(void);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index 22a5624..eded117 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -103,7 +103,8 @@ extern u64 __ua_limit;
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -138,7 +139,8 @@ extern u64 __ua_limit;
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -157,7 +159,8 @@ extern u64 __ua_limit;
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -177,7 +180,8 @@ extern u64 __ua_limit;
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -199,7 +203,8 @@ extern u64 __ua_limit;
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -498,7 +503,8 @@ extern void __put_user_unknown(void);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -517,7 +523,8 @@ extern void __put_user_unknown(void);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -537,7 +544,8 @@ extern void __put_user_unknown(void);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -559,7 +567,8 @@ extern void __put_user_unknown(void);
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -815,7 +824,8 @@ extern size_t __copy_user(void *__to, const void *__from, size_t __n);
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -888,7 +898,8 @@ extern size_t __copy_user_inatomic(void *__to, const void *__from, size_t __n);
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from kernel space to user space.
*
@@ -1075,7 +1086,8 @@ extern size_t __copy_in_user_eva(void *__to, const void *__from, size_t __n);
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -1107,7 +1119,8 @@ extern size_t __copy_in_user_eva(void *__to, const void *__from, size_t __n);
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to kernel space.
*
@@ -1356,7 +1369,8 @@ static inline long __strlen_user(const char __user *s)
* strlen_user: - Get the size of a string in user space.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Get the size of a NUL-terminated string in user space.
*
@@ -1425,7 +1439,8 @@ static inline long __strnlen_user(const char __user *s, long n)
* strnlen_user: - Get the size of a string in user space.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Get the size of a NUL-terminated string in user space.
*
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index cd4c68e..1291da5 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -98,7 +98,8 @@ static inline unsigned long extable_fixup(const struct exception_table_entry *x)
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -118,7 +119,8 @@ unsigned long __must_check __copy_from_user(void *to, const void __user *from,
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -264,7 +266,8 @@ int __get_user_bad(void) __attribute__((noreturn));
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from kernel space to user space.
*
@@ -290,7 +293,8 @@ __compiletime_warning("copy_from_user() buffer size is not provably correct")
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to kernel space.
*
@@ -348,7 +352,8 @@ static inline unsigned long strnlen_user(const char __user *src, unsigned long n
* strlen_user: - Get the size of a string in user space.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Get the size of a NUL-terminated string in user space.
*
diff --git a/arch/score/include/asm/uaccess.h b/arch/score/include/asm/uaccess.h
index ab66ddd..010a800 100644
--- a/arch/score/include/asm/uaccess.h
+++ b/arch/score/include/asm/uaccess.h
@@ -36,7 +36,8 @@
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -61,7 +62,8 @@
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -79,7 +81,8 @@
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -98,7 +101,8 @@
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -119,7 +123,8 @@
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
diff --git a/arch/tile/include/asm/uaccess.h b/arch/tile/include/asm/uaccess.h
index b6cde32..9efe668 100644
--- a/arch/tile/include/asm/uaccess.h
+++ b/arch/tile/include/asm/uaccess.h
@@ -78,7 +78,8 @@ int __range_ok(unsigned long addr, unsigned long size);
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -192,7 +193,8 @@ extern int __get_user_bad(void)
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -272,7 +274,8 @@ extern int __put_user_bad(void)
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -327,7 +330,8 @@ extern int __put_user_bad(void)
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -363,7 +367,8 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -434,7 +439,8 @@ static inline unsigned long __must_check copy_from_user(void *to,
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to user space. Caller must check
* the specified blocks with access_ok() before calling this function.
@@ -466,7 +472,8 @@ copy_in_user(void __user *to, const void __user *from, unsigned long n)
* strlen_user: - Get the size of a string in user space.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Get the size of a NUL-terminated string in user space.
*
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 0d592e0..014d8cb 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -74,7 +74,8 @@ static inline bool __chk_range_not_ok(unsigned long addr, unsigned long size, un
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Checks if a pointer to a block of memory in user space is valid.
*
@@ -145,7 +146,8 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -240,7 +242,8 @@ extern void __put_user_8(void);
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
@@ -455,7 +458,8 @@ struct __large_struct { unsigned long buf[100]; };
* @x: Variable to store result.
* @ptr: Source address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple variable from user space to kernel
* space. It supports simple types like char and int, but not larger
@@ -479,7 +483,8 @@ struct __large_struct { unsigned long buf[100]; };
* @x: Value to copy to user space.
* @ptr: Destination address, in user space.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* This macro copies a single simple value from kernel space to user
* space. It supports simple types like char and int, but not larger
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 3c03a5d..ae5b37f 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -70,7 +70,8 @@ __copy_to_user_inatomic(void __user *to, const void *from, unsigned long n)
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from kernel space to user space. Caller must check
* the specified block with access_ok() before calling this function.
@@ -117,7 +118,8 @@ __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to kernel space. Caller must check
* the specified block with access_ok() before calling this function.
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e2f5e21..64ba86e 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -647,7 +647,8 @@ EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);
* @from: Source address, in kernel space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from kernel space to user space.
*
@@ -668,7 +669,8 @@ EXPORT_SYMBOL(_copy_to_user);
* @from: Source address, in user space.
* @n: Number of bytes to copy.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Copy data from user space to kernel space.
*
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index a28df52..0013abd 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -84,7 +84,8 @@ static inline long do_strnlen_user(const char __user *src, unsigned long count,
* @str: The string to measure.
* @count: Maximum count (including NUL character)
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Get the size of a NUL-terminated string in user space.
*
@@ -113,7 +114,8 @@ EXPORT_SYMBOL(strnlen_user);
* strlen_user: - Get the size of a user string INCLUDING final NUL.
* @str: The string to measure.
*
- * Context: User context only. This function may sleep.
+ * Context: User context only. This function may sleep if pagefaults are not
+ * disabled.
*
* Get the size of a NUL-terminated string in user space.
*
--
1.8.5.5
Commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with
pagefault_disable()") removed might_sleep() checks for all user access code
(that uses might_fault()).
The reason was to disable wrong "sleep in atomic" warnings in the following
scenario:
pagefault_disable()
rc = copy_to_user(...)
pagefault_enable()
Which is valid, as pagefault_disable() increments the preempt counter and
therefore disables the pagefault handler. copy_to_user() will not sleep and return
an invalid return code if a page is not available.
However, as all might_sleep() checks are removed, CONFIG_DEBUG_ATOMIC_SLEEP
would no longer detect the following scenario:
spin_lock(&lock);
rc = copy_to_user(...)
spin_unlock(&lock)
If the kernel is compiled with preemption turned on, the preempt counter would
be incremented and copy_to_user() would never sleep. However, with preemption
turned off, the preempt counter will not be touched, we will therefore sleep in
atomic context. We really want to enable CONFIG_DEBUG_ATOMIC_SLEEP checks for
user access functions again, otherwise horrible deadlocks might be hard to debug.
Root of all evil is that pagefault_disable() acted almost as preempt_disable(),
depending on preemption being turned on/off.
As we now have pagefault_disabled(), we can use it to distingusih whether user
acces functions might sleep.
Convert might_fault() into a makro that calls __might_fault(), to allow proper
file + line messages in case of a might_sleep() warning. We can't move the code
directly into kernel.h for now, as that results in ugly header recursions we
can't avoid for now.
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/kernel.h | 3 ++-
mm/memory.c | 18 ++++++------------
2 files changed, 8 insertions(+), 13 deletions(-)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 446d76a..7e65a55 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -232,7 +232,8 @@ static inline u32 reciprocal_scale(u32 val, u32 ep_ro)
#if defined(CONFIG_MMU) && \
(defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP))
-void might_fault(void);
+#define might_fault() __might_fault(__FILE__, __LINE__)
+void __might_fault(const char *file, int line);
#else
static inline void might_fault(void) { }
#endif
diff --git a/mm/memory.c b/mm/memory.c
index 0b3f6c7..563720a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3686,7 +3686,7 @@ void print_vma_addr(char *prefix, unsigned long ip)
}
#if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP)
-void might_fault(void)
+void __might_fault(const char *file, int line)
{
/*
* Some code (nfs/sunrpc) uses socket ops on kernel memory while
@@ -3696,21 +3696,15 @@ void might_fault(void)
*/
if (segment_eq(get_fs(), KERNEL_DS))
return;
-
- /*
- * it would be nicer only to annotate paths which are not under
- * pagefault_disable, however that requires a larger audit and
- * providing helpers like get_user_atomic.
- */
- if (in_atomic())
+ if (pagefault_disabled())
return;
-
- __might_sleep(__FILE__, __LINE__, 0);
-
+ __might_sleep(file, line, 0);
+#if defined(CONFIG_DEBUG_ATOMIC_SLEEP)
if (current->mm)
might_lock_read(¤t->mm->mmap_sem);
+#endif
}
-EXPORT_SYMBOL(might_fault);
+EXPORT_SYMBOL(__might_fault);
#endif
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
--
1.8.5.5
This patch adds the pagefault_count to the thread_info of all
architectures. It will be used to count the pagefault_disable() levels
on a per-thread basis.
We are not reusing the preempt_count as this is per cpu on x86 and we want to
demangle pagefault_disable() from preemption in the future.
The new counter is added directly below the preempt_count, except for archs
relying on a manual calculation of asm offsets - to minimize the changes.
Signed-off-by: David Hildenbrand <[email protected]>
---
arch/alpha/include/asm/thread_info.h | 1 +
arch/arc/include/asm/thread_info.h | 1 +
arch/arm/include/asm/thread_info.h | 1 +
arch/arm64/include/asm/thread_info.h | 1 +
arch/avr32/include/asm/thread_info.h | 1 +
arch/blackfin/include/asm/thread_info.h | 1 +
arch/c6x/include/asm/thread_info.h | 1 +
arch/cris/include/asm/thread_info.h | 1 +
arch/frv/include/asm/thread_info.h | 1 +
arch/hexagon/include/asm/thread_info.h | 1 +
arch/ia64/include/asm/thread_info.h | 1 +
arch/m32r/include/asm/thread_info.h | 1 +
arch/m68k/include/asm/thread_info.h | 1 +
arch/metag/include/asm/thread_info.h | 1 +
arch/microblaze/include/asm/thread_info.h | 1 +
arch/mips/include/asm/thread_info.h | 1 +
arch/mn10300/include/asm/thread_info.h | 1 +
arch/openrisc/include/asm/thread_info.h | 1 +
arch/parisc/include/asm/thread_info.h | 1 +
arch/powerpc/include/asm/thread_info.h | 1 +
arch/s390/include/asm/thread_info.h | 1 +
arch/score/include/asm/thread_info.h | 1 +
arch/sh/include/asm/thread_info.h | 1 +
arch/sparc/include/asm/thread_info_32.h | 1 +
arch/sparc/include/asm/thread_info_64.h | 1 +
arch/tile/include/asm/thread_info.h | 1 +
arch/um/include/asm/thread_info.h | 1 +
arch/unicore32/include/asm/thread_info.h | 1 +
arch/x86/include/asm/thread_info.h | 1 +
arch/xtensa/include/asm/thread_info.h | 1 +
30 files changed, 30 insertions(+)
diff --git a/arch/alpha/include/asm/thread_info.h b/arch/alpha/include/asm/thread_info.h
index 48bbea6..1830671 100644
--- a/arch/alpha/include/asm/thread_info.h
+++ b/arch/alpha/include/asm/thread_info.h
@@ -22,6 +22,7 @@ struct thread_info {
mm_segment_t addr_limit; /* thread address space */
unsigned cpu; /* current CPU */
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count;/* pagefault_disable() levels */
unsigned int status; /* thread-synchronous flags */
int bpt_nsaved;
diff --git a/arch/arc/include/asm/thread_info.h b/arch/arc/include/asm/thread_info.h
index 02bc5ec..2fde704 100644
--- a/arch/arc/include/asm/thread_info.h
+++ b/arch/arc/include/asm/thread_info.h
@@ -41,6 +41,7 @@
struct thread_info {
unsigned long flags; /* low level flags */
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count; /* pagefault_disable() levels */
struct task_struct *task; /* main task structure */
mm_segment_t addr_limit; /* thread address space */
struct exec_domain *exec_domain;/* execution domain */
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index ce73ab6..bf47d2d 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -51,6 +51,7 @@ struct cpu_context_save {
struct thread_info {
unsigned long flags; /* low level flags */
int preempt_count; /* 0 => preemptable, <0 => bug */
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* address limit */
struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 459bf8e..2469f15 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -50,6 +50,7 @@ struct thread_info {
struct exec_domain *exec_domain; /* execution domain */
struct restart_block restart_block;
int preempt_count; /* 0 => preemptable, <0 => bug */
+ int pagefault_count;/* pagefault_disable() levels */
int cpu; /* cpu */
};
diff --git a/arch/avr32/include/asm/thread_info.h b/arch/avr32/include/asm/thread_info.h
index a978f3f..0c1d6f7 100644
--- a/arch/avr32/include/asm/thread_info.h
+++ b/arch/avr32/include/asm/thread_info.h
@@ -25,6 +25,7 @@ struct thread_info {
unsigned long flags; /* low level flags */
__u32 cpu;
__s32 preempt_count; /* 0 => preemptable, <0 => BUG */
+ __s32 pagefault_count;/* pagefault_disable() levels */
__u32 rar_saved; /* return address... */
__u32 rsr_saved; /* ...and status register
saved by debug handler
diff --git a/arch/blackfin/include/asm/thread_info.h b/arch/blackfin/include/asm/thread_info.h
index 55f473b..3ba26aa 100644
--- a/arch/blackfin/include/asm/thread_info.h
+++ b/arch/blackfin/include/asm/thread_info.h
@@ -41,6 +41,7 @@ struct thread_info {
unsigned long flags; /* low level flags */
int cpu; /* cpu we're on */
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count; /* pagefault_disable() levels */
mm_segment_t addr_limit; /* address limit */
struct restart_block restart_block;
#ifndef CONFIG_SMP
diff --git a/arch/c6x/include/asm/thread_info.h b/arch/c6x/include/asm/thread_info.h
index d4e9ef8..6b2dcac 100644
--- a/arch/c6x/include/asm/thread_info.h
+++ b/arch/c6x/include/asm/thread_info.h
@@ -44,6 +44,7 @@ struct thread_info {
unsigned long flags; /* low level flags */
int cpu; /* cpu we're on */
int preempt_count; /* 0 = preemptable, <0 = BUG */
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* thread address space */
struct restart_block restart_block;
};
diff --git a/arch/cris/include/asm/thread_info.h b/arch/cris/include/asm/thread_info.h
index 55dede1..3356902 100644
--- a/arch/cris/include/asm/thread_info.h
+++ b/arch/cris/include/asm/thread_info.h
@@ -32,6 +32,7 @@ struct thread_info {
unsigned long flags; /* low level flags */
__u32 cpu; /* current CPU */
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count;/* pagefault_disable() levels */
__u32 tls; /* TLS for this thread */
mm_segment_t addr_limit; /* thread address space:
diff --git a/arch/frv/include/asm/thread_info.h b/arch/frv/include/asm/thread_info.h
index af29e17..79a97ee 100644
--- a/arch/frv/include/asm/thread_info.h
+++ b/arch/frv/include/asm/thread_info.h
@@ -36,6 +36,7 @@ struct thread_info {
unsigned long status; /* thread-synchronous flags */
__u32 cpu; /* current CPU */
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* thread address space:
* 0-0xBFFFFFFF for user-thead
diff --git a/arch/hexagon/include/asm/thread_info.h b/arch/hexagon/include/asm/thread_info.h
index a59dad3..d54042e 100644
--- a/arch/hexagon/include/asm/thread_info.h
+++ b/arch/hexagon/include/asm/thread_info.h
@@ -51,6 +51,7 @@ struct thread_info {
unsigned long flags; /* low level flags */
__u32 cpu; /* current cpu */
int preempt_count; /* 0=>preemptible,<0=>BUG */
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* segmentation sux */
/*
* used for syscalls somehow;
diff --git a/arch/ia64/include/asm/thread_info.h b/arch/ia64/include/asm/thread_info.h
index 5b17418..14f128c 100644
--- a/arch/ia64/include/asm/thread_info.h
+++ b/arch/ia64/include/asm/thread_info.h
@@ -27,6 +27,7 @@ struct thread_info {
__u32 status; /* Thread synchronous flags */
mm_segment_t addr_limit; /* user-level address space limit */
int preempt_count; /* 0=premptable, <0=BUG; will also serve as bh-counter */
+ int pagefault_count; /* pagefault_disable() levels */
struct restart_block restart_block;
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
__u64 ac_stamp;
diff --git a/arch/m32r/include/asm/thread_info.h b/arch/m32r/include/asm/thread_info.h
index 0017170..bf14efb 100644
--- a/arch/m32r/include/asm/thread_info.h
+++ b/arch/m32r/include/asm/thread_info.h
@@ -35,6 +35,7 @@ struct thread_info {
0-0xFFFFFFFF for kernel-thread
*/
struct restart_block restart_block;
+ int pagefault_count;/* pagefault_disable() levels */
__u8 supervisor_stack[0];
};
diff --git a/arch/m68k/include/asm/thread_info.h b/arch/m68k/include/asm/thread_info.h
index 21a4784..5a6a203 100644
--- a/arch/m68k/include/asm/thread_info.h
+++ b/arch/m68k/include/asm/thread_info.h
@@ -29,6 +29,7 @@ struct thread_info {
struct exec_domain *exec_domain; /* execution domain */
mm_segment_t addr_limit; /* thread address space */
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count;/* pagefault_disable() levels */
__u32 cpu; /* should always be 0 on m68k */
unsigned long tp_value; /* thread pointer */
struct restart_block restart_block;
diff --git a/arch/metag/include/asm/thread_info.h b/arch/metag/include/asm/thread_info.h
index 4771133..91729f5 100644
--- a/arch/metag/include/asm/thread_info.h
+++ b/arch/metag/include/asm/thread_info.h
@@ -33,6 +33,7 @@ struct thread_info {
unsigned long status; /* thread-synchronous flags */
u32 cpu; /* current CPU */
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count; /* pagefault_disable() levels */
mm_segment_t addr_limit; /* thread address space */
struct restart_block restart_block;
diff --git a/arch/microblaze/include/asm/thread_info.h b/arch/microblaze/include/asm/thread_info.h
index 8c9d365..f905b02 100644
--- a/arch/microblaze/include/asm/thread_info.h
+++ b/arch/microblaze/include/asm/thread_info.h
@@ -70,6 +70,7 @@ struct thread_info {
unsigned long status; /* thread-synchronous flags */
__u32 cpu; /* current CPU */
__s32 preempt_count; /* 0 => preemptable,< 0 => BUG*/
+ __s32 pagefault_count; /* pagefault_disable() levels */
mm_segment_t addr_limit; /* thread address space */
struct restart_block restart_block;
diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
index 7de8658..f9f27ac 100644
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -28,6 +28,7 @@ struct thread_info {
unsigned long tp_value; /* thread pointer */
__u32 cpu; /* current CPU */
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /*
* thread address space limit:
diff --git a/arch/mn10300/include/asm/thread_info.h b/arch/mn10300/include/asm/thread_info.h
index bf280ea..f6c03a5 100644
--- a/arch/mn10300/include/asm/thread_info.h
+++ b/arch/mn10300/include/asm/thread_info.h
@@ -45,6 +45,7 @@ struct thread_info {
unsigned long flags; /* low level flags */
__u32 cpu; /* current CPU */
__s32 preempt_count; /* 0 => preemptable, <0 => BUG */
+ __s32 pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* thread address space:
0-0xBFFFFFFF for user-thead
diff --git a/arch/openrisc/include/asm/thread_info.h b/arch/openrisc/include/asm/thread_info.h
index d797acc..bdabd6e 100644
--- a/arch/openrisc/include/asm/thread_info.h
+++ b/arch/openrisc/include/asm/thread_info.h
@@ -52,6 +52,7 @@ struct thread_info {
unsigned long flags; /* low level flags */
__u32 cpu; /* current CPU */
__s32 preempt_count; /* 0 => preemptable, <0 => BUG */
+ __s32 pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* thread address space:
0-0x7FFFFFFF for user-thead
diff --git a/arch/parisc/include/asm/thread_info.h b/arch/parisc/include/asm/thread_info.h
index a846118..e37b76b 100644
--- a/arch/parisc/include/asm/thread_info.h
+++ b/arch/parisc/include/asm/thread_info.h
@@ -14,6 +14,7 @@ struct thread_info {
mm_segment_t addr_limit; /* user-level address space limit */
__u32 cpu; /* current CPU */
int preempt_count; /* 0=premptable, <0=BUG; will also serve as bh-counter */
+ int pagefault_count; /* pagefault_disable() levels */
struct restart_block restart_block;
};
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index b034ecd..e8585fd 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -43,6 +43,7 @@ struct thread_info {
int cpu; /* cpu we're on */
int preempt_count; /* 0 => preemptable,
<0 => BUG */
+ int pagefault_count; /* pagefault_disable() levels */
struct restart_block restart_block;
unsigned long local_flags; /* private flags for thread */
diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h
index 4d62fd5..bbf0513f 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -39,6 +39,7 @@ struct thread_info {
unsigned long sys_call_table; /* System call table address */
unsigned int cpu; /* current CPU */
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count;/* pagefault_disable() levels */
struct restart_block restart_block;
unsigned int system_call;
__u64 user_timer;
diff --git a/arch/score/include/asm/thread_info.h b/arch/score/include/asm/thread_info.h
index 656b7ad..d7f748d 100644
--- a/arch/score/include/asm/thread_info.h
+++ b/arch/score/include/asm/thread_info.h
@@ -35,6 +35,7 @@ struct thread_info {
/* 0 => preemptable, < 0 => BUG */
int preempt_count;
+ int pagefault_count;/* pagefault_disable() levels */
/*
* thread address space:
diff --git a/arch/sh/include/asm/thread_info.h b/arch/sh/include/asm/thread_info.h
index ad27ffa..682a466 100644
--- a/arch/sh/include/asm/thread_info.h
+++ b/arch/sh/include/asm/thread_info.h
@@ -32,6 +32,7 @@ struct thread_info {
__u32 status; /* thread synchronous flags */
__u32 cpu;
int preempt_count; /* 0 => preemptable, <0 => BUG */
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* thread address space */
struct restart_block restart_block;
unsigned long previous_sp; /* sp of previous stack in case
diff --git a/arch/sparc/include/asm/thread_info_32.h b/arch/sparc/include/asm/thread_info_32.h
index 025c984..ff0b112 100644
--- a/arch/sparc/include/asm/thread_info_32.h
+++ b/arch/sparc/include/asm/thread_info_32.h
@@ -49,6 +49,7 @@ struct thread_info {
unsigned long w_saved;
struct restart_block restart_block;
+ int pagefault_count;/* pagefault_disable() levels */
};
/*
diff --git a/arch/sparc/include/asm/thread_info_64.h b/arch/sparc/include/asm/thread_info_64.h
index 798f027..76a60ab 100644
--- a/arch/sparc/include/asm/thread_info_64.h
+++ b/arch/sparc/include/asm/thread_info_64.h
@@ -62,6 +62,7 @@ struct thread_info {
struct pt_regs *kern_una_regs;
unsigned int kern_una_insn;
+ int pagefault_count;/* pagefault_disable() levels */
unsigned long fpregs[(7 * 256) / sizeof(unsigned long)]
__attribute__ ((aligned(64)));
diff --git a/arch/tile/include/asm/thread_info.h b/arch/tile/include/asm/thread_info.h
index 48e4fd0..57032b6 100644
--- a/arch/tile/include/asm/thread_info.h
+++ b/arch/tile/include/asm/thread_info.h
@@ -33,6 +33,7 @@ struct thread_info {
__u32 cpu; /* current CPU */
int preempt_count; /* 0 => preemptable,
<0 => BUG */
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* thread address space
(KERNEL_DS or USER_DS) */
diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1c5b2a8..90b193c 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -19,6 +19,7 @@ struct thread_info {
__u32 cpu; /* current CPU */
int preempt_count; /* 0 => preemptable,
<0 => BUG */
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* thread address space:
0-0xBFFFFFFF for user
0-0xFFFFFFFF for kernel */
diff --git a/arch/unicore32/include/asm/thread_info.h b/arch/unicore32/include/asm/thread_info.h
index af36d8e..1d50fb3 100644
--- a/arch/unicore32/include/asm/thread_info.h
+++ b/arch/unicore32/include/asm/thread_info.h
@@ -69,6 +69,7 @@ struct thread_info {
unsigned long flags; /* low level flags */
int preempt_count; /* 0 => preemptable */
/* <0 => bug */
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit; /* address limit */
struct task_struct *task; /* main task structure */
struct exec_domain *exec_domain; /* execution domain */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 547e344..fa075ab 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -30,6 +30,7 @@ struct thread_info {
__u32 status; /* thread synchronous flags */
__u32 cpu; /* current CPU */
int saved_preempt_count;
+ int pagefault_count;/* pagefault_disable() levels */
mm_segment_t addr_limit;
struct restart_block restart_block;
void __user *sysenter_return;
diff --git a/arch/xtensa/include/asm/thread_info.h b/arch/xtensa/include/asm/thread_info.h
index 470153e..a866129 100644
--- a/arch/xtensa/include/asm/thread_info.h
+++ b/arch/xtensa/include/asm/thread_info.h
@@ -53,6 +53,7 @@ struct thread_info {
mm_segment_t addr_limit; /* thread address space */
struct restart_block restart_block;
+ __s32 pagefault_count;/* pagefault_disable() levels */
unsigned long cpenable;
/* Allocate storage for extra user states and coprocessor states. */
--
1.8.5.5
pagefault_disable() and pagefault_enable() have to increment/decrement the
pagefault_count. We keep manipulating the preempt count to retain compability
to existing pagefault handlers. This has to be demangled in separate patches.
It is now possible to verify whether in a pagefault_disable() envionment by
calling pagefault_disabled().
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/uaccess.h | 45 ++++++++++++++++++++++++++++++++++++++-------
1 file changed, 38 insertions(+), 7 deletions(-)
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index ecd3319..1dfc678 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -2,20 +2,45 @@
#define __LINUX_UACCESS_H__
#include <linux/preempt.h>
+#include <linux/thread_info.h>
#include <asm/uaccess.h>
+static __always_inline int pagefault_count(void)
+{
+ return current_thread_info()->pagefault_count;
+}
+
+static __always_inline int *pagefault_count_ptr(void)
+{
+ return ¤t_thread_info()->pagefault_count;
+}
+
+static __always_inline void pagefault_count_inc(void)
+{
+ (*pagefault_count_ptr())++;
+}
+
+static __always_inline void pagefault_count_dec(void)
+{
+ (*pagefault_count_ptr())--;
+}
+
/*
- * These routines enable/disable the pagefault handler in that
- * it will not take any locks and go straight to the fixup table.
+ * These routines enable/disable the pagefault handler. If disabled, it will
+ * not take any locks and go straight to the fixup table.
+ *
+ * We increase the preempt and the pagefault count, to be able to distinguish
+ * whether we run in simple atomic context or in a real pagefault_disable()
+ * context.
+ *
+ * For now, after pagefault_disabled() has been called, we run in atomic
+ * context. User access methods will not sleep.
*
- * They have great resemblance to the preempt_disable/enable calls
- * and in fact they are identical; this is because currently there is
- * no other way to make the pagefault handlers do this. So we do
- * disable preemption but we don't necessarily care about that.
*/
static inline void pagefault_disable(void)
{
preempt_count_inc();
+ pagefault_count_inc();
/*
* make sure to have issued the store before a pagefault
* can hit.
@@ -25,18 +50,24 @@ static inline void pagefault_disable(void)
static inline void pagefault_enable(void)
{
-#ifndef CONFIG_PREEMPT
/*
* make sure to issue those last loads/stores before enabling
* the pagefault handler again.
*/
barrier();
+ pagefault_count_dec();
+#ifndef CONFIG_PREEMPT
preempt_count_dec();
#else
preempt_enable();
#endif
}
+/*
+ * Is the pagefault handler disabled? If so, user access methods will not sleep.
+ */
+#define pagefault_disabled() (pagefault_count() != 0)
+
#ifndef ARCH_HAS_NOCACHE_UACCESS
static inline unsigned long __copy_from_user_inatomic_nocache(void *to,
--
1.8.5.5
On Wed, Dec 10, 2014 at 10:23 PM, David Hildenbrand
<[email protected]> wrote:
> This patch adds the pagefault_count to the thread_info of all
> architectures. It will be used to count the pagefault_disable() levels
> on a per-thread basis.
>
> We are not reusing the preempt_count as this is per cpu on x86 and we want to
> demangle pagefault_disable() from preemption in the future.
>
> The new counter is added directly below the preempt_count, except for archs
> relying on a manual calculation of asm offsets - to minimize the changes.
>
Hi David
Is this patchset targeted for 3.19? If yes, then we need this for
arch/nios2 as well (new arch in 3.19).
On Wed, Dec 10, 2014 at 03:23:29PM +0100, David Hildenbrand wrote:
Did you look at the -rt patches where this comes from?
https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=b389ced19ab649438196d132768fe6522d2f052b
https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=4fb7f9d416f7b34036d9d1b209e77c462ac0ee10
https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=c730a4aade9e5c9b84f65de01d6612bf70c577e3
https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=d365f5bf933e988a39874b33302f02ff6c7989b7
https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=93eb18f43dfa5d49d99e2b8ad236eea2c35dd539
https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=8947442e896921e1b645f9e1fd0f2beee103bba0
On Mon, Dec 15, 2014 at 12:23:01PM +0100, David Hildenbrand wrote:
> Peter just showed my that there is some work ongoing on that
> pagefault_disable() topic. So I am not sure if the arch-specific part of this
> series is still relevant.
Well 'on-going' would be over stating it. We did those bits in -rt many
years ago, but never really made a strong enough effort to get them
upstream.
If there now is a 'semi' sane use-case for upstreaming them we should
indeed look at that.
On Mon, Dec 15, 2014 at 12:21:27PM +0100, David Hildenbrand wrote:
> > On Wed, Dec 10, 2014 at 03:23:29PM +0100, David Hildenbrand wrote:
> >
> > Did you look at the -rt patches where this comes from?
> >
> > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=b389ced19ab649438196d132768fe6522d2f052b
> > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=4fb7f9d416f7b34036d9d1b209e77c462ac0ee10
> > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=c730a4aade9e5c9b84f65de01d6612bf70c577e3
> > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=d365f5bf933e988a39874b33302f02ff6c7989b7
> > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=93eb18f43dfa5d49d99e2b8ad236eea2c35dd539
> > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=8947442e896921e1b645f9e1fd0f2beee103bba0
> >
> >
>
> Thanks for the links - haven't seen these patches so far (somebody on the list
> just mentioned that someone tried to demangle that stuff some time ago). But
> good to know that somebody is working on that pagefault_disable() thing.
>
> Do you know what the plans for this series are? So I can base my patches
> (might_sleep() checks for might_fault()) on that queue.
As stated in that other email, there's no active work on this atm. Its
just what -rt needed the pagefault_{en,dis}able() bits for. I think we
should try and merge some of that upstream now that there is a stronger
use case.
> On Mon, Dec 15, 2014 at 12:21:27PM +0100, David Hildenbrand wrote:
> > > On Wed, Dec 10, 2014 at 03:23:29PM +0100, David Hildenbrand wrote:
> > >
> > > Did you look at the -rt patches where this comes from?
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=b389ced19ab649438196d132768fe6522d2f052b
> > > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=4fb7f9d416f7b34036d9d1b209e77c462ac0ee10
> > > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=c730a4aade9e5c9b84f65de01d6612bf70c577e3
> > > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=d365f5bf933e988a39874b33302f02ff6c7989b7
> > > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=93eb18f43dfa5d49d99e2b8ad236eea2c35dd539
> > > https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=8947442e896921e1b645f9e1fd0f2beee103bba0
> > >
> > >
> >
> > Thanks for the links - haven't seen these patches so far (somebody on the list
> > just mentioned that someone tried to demangle that stuff some time ago). But
> > good to know that somebody is working on that pagefault_disable() thing.
> >
> > Do you know what the plans for this series are? So I can base my patches
> > (might_sleep() checks for might_fault()) on that queue.
>
> As stated in that other email, there's no active work on this atm. Its
> just what -rt needed the pagefault_{en,dis}able() bits for. I think we
> should try and merge some of that upstream now that there is a stronger
> use case.
>
Ah, now I get it. So the main question is which approach is better:
a) -rt version: Store the pagefault_count in struct task_struct()
b) my version: Storing it in thread_info
IOW: My series first and the -rt part (pagefault handlers, preempt fixup) on
top or -rt version first and my work (patch 3 + 4 ) on top.
Getting rid of that whole preemption handling in pagefault_disable() / fixing up
the pagefault handlers is something I would have addressed in future patches,
but that part seems to be just fine in the -rt code.
Thanks for having a look!
David
> On Wed, Dec 10, 2014 at 03:23:29PM +0100, David Hildenbrand wrote:
>
> Did you look at the -rt patches where this comes from?
>
> https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=b389ced19ab649438196d132768fe6522d2f052b
> https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=4fb7f9d416f7b34036d9d1b209e77c462ac0ee10
> https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=c730a4aade9e5c9b84f65de01d6612bf70c577e3
> https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=d365f5bf933e988a39874b33302f02ff6c7989b7
> https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=93eb18f43dfa5d49d99e2b8ad236eea2c35dd539
> https://git.kernel.org/cgit/linux/kernel/git/clrkwllms/rt-linux.git/commit/?h=v3.14.21-rt9&id=8947442e896921e1b645f9e1fd0f2beee103bba0
>
>
Thanks for the links - haven't seen these patches so far (somebody on the list
just mentioned that someone tried to demangle that stuff some time ago). But
good to know that somebody is working on that pagefault_disable() thing.
Do you know what the plans for this series are? So I can base my patches
(might_sleep() checks for might_fault()) on that queue.
David
> On Wed, Dec 10, 2014 at 10:23 PM, David Hildenbrand
> <[email protected]> wrote:
> > This patch adds the pagefault_count to the thread_info of all
> > architectures. It will be used to count the pagefault_disable() levels
> > on a per-thread basis.
> >
> > We are not reusing the preempt_count as this is per cpu on x86 and we want to
> > demangle pagefault_disable() from preemption in the future.
> >
> > The new counter is added directly below the preempt_count, except for archs
> > relying on a manual calculation of asm offsets - to minimize the changes.
> >
> Hi David
>
> Is this patchset targeted for 3.19? If yes, then we need this for
> arch/nios2 as well (new arch in 3.19).
>
Hi,
Peter just showed my that there is some work ongoing on that
pagefault_disable() topic. So I am not sure if the arch-specific part of this
series is still relevant.
But thanks for the info!
David
On Mon, Jan 12, 2015 at 03:19:11PM +0100, David Hildenbrand wrote:
> Thomas, Peter,
>
> anything that speaks against putting the pagefault_disable counter into
> thread_info (my series) instead of task_struct (rt tree)?
>
> IOW, what would be the right place for it?
I think we put it in task_struct because lazy; ARM seems one of the few
popular archs where current still goes through thread_info.
And that I think is the only reason to maybe use thread_info, cost of
access. The down-side of using thread_info is of course that it reduces
stack size.
In any case; I think that if you want to go do this; please consider the
route -rt took and completely separate the two, don't leave the
preempt_count_{inc,dec} remnant in pagefault_{en,dis}able() at all.
> On Mon, Jan 12, 2015 at 03:19:11PM +0100, David Hildenbrand wrote:
> > Thomas, Peter,
> >
> > anything that speaks against putting the pagefault_disable counter into
> > thread_info (my series) instead of task_struct (rt tree)?
> >
> > IOW, what would be the right place for it?
>
> I think we put it in task_struct because lazy; ARM seems one of the few
> popular archs where current still goes through thread_info.
>
> And that I think is the only reason to maybe use thread_info, cost of
> access. The down-side of using thread_info is of course that it reduces
> stack size.
>
> In any case; I think that if you want to go do this; please consider the
> route -rt took and completely separate the two, don't leave the
> preempt_count_{inc,dec} remnant in pagefault_{en,dis}able() at all.
>
>
Thanks Peter,
I am currently preparing/testing a series that does the requested separation
(getting rid of preempt_count_{inc,dec} ...) while putting the pagefault disable
count into task_info.
Downside is that now that I have to touch all fault handlers, I have to go
through all archs again.
Think I'll have something to show in a couple of days.
David
On Thu, Feb 19, 2015 at 03:48:05PM +0100, David Hildenbrand wrote:
> Downside is that now that I have to touch all fault handlers, I have to go
> through all archs again.
You should be able to borrow from the -rt patches there. They have all
that.
> On Thu, Feb 19, 2015 at 03:48:05PM +0100, David Hildenbrand wrote:
> > Downside is that now that I have to touch all fault handlers, I have to go
> > through all archs again.
>
> You should be able to borrow from the -rt patches there. They have all
> that.
>
Jup, that's what I partially did.
Thanks
David