Document the circumstances under which refcount_t's saturation mechanism
works deterministically.
Signed-off-by: Jann Horn <[email protected]>
---
Notes:
v2:
- write down the math (Kees)
include/linux/refcount.h | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 0ac50cf62d062..0e3ee25eb156a 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -38,11 +38,24 @@
* atomic operations, then the count will continue to edge closer to 0. If it
* reaches a value of 1 before /any/ of the threads reset it to the saturated
* value, then a concurrent refcount_dec_and_test() may erroneously free the
- * underlying object. Given the precise timing details involved with the
- * round-robin scheduling of each thread manipulating the refcount and the need
- * to hit the race multiple times in succession, there doesn't appear to be a
- * practical avenue of attack even if using refcount_add() operations with
- * larger increments.
+ * underlying object.
+ * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is currently
+ * 0x400000 (and can't easily be raised in the future beyond FUTEX_TID_MASK).
+ * With the current PID limit, if no batched refcounting operations are used and
+ * the attacker can't repeatedly trigger kernel oopses in the middle of refcount
+ * operations, this makes it impossible for a saturated refcount to leave the
+ * saturation range, even if it is possible for multiple uses of the same
+ * refcount to nest in the context of a single task:
+ *
+ * (UINT_MAX+1-REFCOUNT_SATURATED) / PID_MAX_LIMIT =
+ * 0x40000000 / 0x400000 = 0x100 = 256
+ *
+ * If hundreds of references are added/removed with a single refcounting
+ * operation, it may potentially be possible to leave the saturation range; but
+ * given the precise timing details involved with the round-robin scheduling of
+ * each thread manipulating the refcount and the need to hit the race multiple
+ * times in succession, there doesn't appear to be a practical avenue of attack
+ * even if using refcount_add() operations with larger increments.
*
* Memory ordering
* ===============
base-commit: 98d54f81e36ba3bf92172791eba5ca5bd813989b
--
2.25.0.265.gbab2e86ba0-goog
On Tue, 3 Mar 2020 at 11:54, Jann Horn <[email protected]> wrote:
>
> Document the circumstances under which refcount_t's saturation mechanism
> works deterministically.
>
> Signed-off-by: Jann Horn <[email protected]>
I /think/ the main point of Kees's suggestion was that FUTEX_TID_MASK
is UAPI, so unlikely to change.
> ---
>
> Notes:
> v2:
> - write down the math (Kees)
>
> include/linux/refcount.h | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 0ac50cf62d062..0e3ee25eb156a 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -38,11 +38,24 @@
> * atomic operations, then the count will continue to edge closer to 0. If it
> * reaches a value of 1 before /any/ of the threads reset it to the saturated
> * value, then a concurrent refcount_dec_and_test() may erroneously free the
> - * underlying object. Given the precise timing details involved with the
> - * round-robin scheduling of each thread manipulating the refcount and the need
> - * to hit the race multiple times in succession, there doesn't appear to be a
> - * practical avenue of attack even if using refcount_add() operations with
> - * larger increments.
> + * underlying object.
> + * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is currently
> + * 0x400000 (and can't easily be raised in the future beyond FUTEX_TID_MASK).
> + * With the current PID limit, if no batched refcounting operations are used and
> + * the attacker can't repeatedly trigger kernel oopses in the middle of refcount
> + * operations, this makes it impossible for a saturated refcount to leave the
> + * saturation range, even if it is possible for multiple uses of the same
> + * refcount to nest in the context of a single task:
> + *
> + * (UINT_MAX+1-REFCOUNT_SATURATED) / PID_MAX_LIMIT =
> + * 0x40000000 / 0x400000 = 0x100 = 256
> + *
> + * If hundreds of references are added/removed with a single refcounting
> + * operation, it may potentially be possible to leave the saturation range; but
> + * given the precise timing details involved with the round-robin scheduling of
> + * each thread manipulating the refcount and the need to hit the race multiple
> + * times in succession, there doesn't appear to be a practical avenue of attack
> + * even if using refcount_add() operations with larger increments.
> *
> * Memory ordering
> * ===============
>
> base-commit: 98d54f81e36ba3bf92172791eba5ca5bd813989b
> --
> 2.25.0.265.gbab2e86ba0-goog
>
On Tue, Mar 3, 2020 at 2:07 PM Ard Biesheuvel <[email protected]> wrote:
> On Tue, 3 Mar 2020 at 11:54, Jann Horn <[email protected]> wrote:
> >
> > Document the circumstances under which refcount_t's saturation mechanism
> > works deterministically.
> >
> > Signed-off-by: Jann Horn <[email protected]>
>
> I /think/ the main point of Kees's suggestion was that FUTEX_TID_MASK
> is UAPI, so unlikely to change.
Yeah, but it has already changed three times in git history:
76b81e2b0e224 ("[PATCH] lightweight robust futexes updates 2"):
0x1fffffff -> 0x3fffffff
d0aa7a70bf03b ("futex_requeue_pi optimization"): 0x3fffffff -> 0x0fffffff
bd197234b0a6 ("Revert "futex_requeue_pi optimization""): 0x0fffffff ->
0x3fffffff
I just sent a patch to fix up a comment that still claimed the mask
was 0x1fffffff... so I didn't want to explicitly write the new value
here.
While making the value *bigger* would probably be a bit hard (and
unnecessary), making it smaller would be fairly easy here - the field
is populated by userspace, so even though the mask is 0x3fffffff,
userspace will never set the upper bits, so they're effectively
reserved bits with value 0.
On Tue, Mar 03, 2020 at 11:54:27AM +0100, Jann Horn wrote:
> Document the circumstances under which refcount_t's saturation mechanism
> works deterministically.
>
> Signed-off-by: Jann Horn <[email protected]>
Acked-by: Kees Cook <[email protected]>
Thanks!
-Kees
>
> Notes:
> v2:
> - write down the math (Kees)
>
> include/linux/refcount.h | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 0ac50cf62d062..0e3ee25eb156a 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -38,11 +38,24 @@
> * atomic operations, then the count will continue to edge closer to 0. If it
> * reaches a value of 1 before /any/ of the threads reset it to the saturated
> * value, then a concurrent refcount_dec_and_test() may erroneously free the
> - * underlying object. Given the precise timing details involved with the
> - * round-robin scheduling of each thread manipulating the refcount and the need
> - * to hit the race multiple times in succession, there doesn't appear to be a
> - * practical avenue of attack even if using refcount_add() operations with
> - * larger increments.
> + * underlying object.
> + * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is currently
> + * 0x400000 (and can't easily be raised in the future beyond FUTEX_TID_MASK).
> + * With the current PID limit, if no batched refcounting operations are used and
> + * the attacker can't repeatedly trigger kernel oopses in the middle of refcount
> + * operations, this makes it impossible for a saturated refcount to leave the
> + * saturation range, even if it is possible for multiple uses of the same
> + * refcount to nest in the context of a single task:
> + *
> + * (UINT_MAX+1-REFCOUNT_SATURATED) / PID_MAX_LIMIT =
> + * 0x40000000 / 0x400000 = 0x100 = 256
> + *
> + * If hundreds of references are added/removed with a single refcounting
> + * operation, it may potentially be possible to leave the saturation range; but
> + * given the precise timing details involved with the round-robin scheduling of
> + * each thread manipulating the refcount and the need to hit the race multiple
> + * times in succession, there doesn't appear to be a practical avenue of attack
> + * even if using refcount_add() operations with larger increments.
> *
> * Memory ordering
> * ===============
>
> base-commit: 98d54f81e36ba3bf92172791eba5ca5bd813989b
> --
> 2.25.0.265.gbab2e86ba0-goog
>
--
Kees Cook
On Tue, Mar 03, 2020 at 11:54:27AM +0100, Jann Horn wrote:
> Document the circumstances under which refcount_t's saturation mechanism
> works deterministically.
>
> Signed-off-by: Jann Horn <[email protected]>
> ---
>
> Notes:
> v2:
> - write down the math (Kees)
>
> include/linux/refcount.h | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 0ac50cf62d062..0e3ee25eb156a 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -38,11 +38,24 @@
> * atomic operations, then the count will continue to edge closer to 0. If it
> * reaches a value of 1 before /any/ of the threads reset it to the saturated
> * value, then a concurrent refcount_dec_and_test() may erroneously free the
> - * underlying object. Given the precise timing details involved with the
> - * round-robin scheduling of each thread manipulating the refcount and the need
> - * to hit the race multiple times in succession, there doesn't appear to be a
> - * practical avenue of attack even if using refcount_add() operations with
> - * larger increments.
> + * underlying object.
> + * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is currently
> + * 0x400000 (and can't easily be raised in the future beyond FUTEX_TID_MASK).
> + * With the current PID limit, if no batched refcounting operations are used and
> + * the attacker can't repeatedly trigger kernel oopses in the middle of refcount
> + * operations, this makes it impossible for a saturated refcount to leave the
> + * saturation range, even if it is possible for multiple uses of the same
> + * refcount to nest in the context of a single task:
> + *
> + * (UINT_MAX+1-REFCOUNT_SATURATED) / PID_MAX_LIMIT =
> + * 0x40000000 / 0x400000 = 0x100 = 256
> + *
> + * If hundreds of references are added/removed with a single refcounting
> + * operation, it may potentially be possible to leave the saturation range; but
> + * given the precise timing details involved with the round-robin scheduling of
> + * each thread manipulating the refcount and the need to hit the race multiple
> + * times in succession, there doesn't appear to be a practical avenue of attack
> + * even if using refcount_add() operations with larger increments.
> *
> * Memory ordering
> * ===============
>
> base-commit: 98d54f81e36ba3bf92172791eba5ca5bd813989b
Acked-by: Will Deacon <[email protected]>
Peter -- would you be able to take this through -tip, please?
Will
On Tue, Mar 17, 2020 at 10:27:18PM +0000, Will Deacon wrote:
> Acked-by: Will Deacon <[email protected]>
>
> Peter -- would you be able to take this through -tip, please?
Got it, I'll stick it in locking/core.
Thanks!
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: a13f58a0cafa7b0416a2898bc3b0defbb305d108
Gitweb: https://git.kernel.org/tip/a13f58a0cafa7b0416a2898bc3b0defbb305d108
Author: Jann Horn <[email protected]>
AuthorDate: Tue, 03 Mar 2020 11:54:27 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Wed, 08 Apr 2020 12:05:07 +02:00
locking/refcount: Document interaction with PID_MAX_LIMIT
Document the circumstances under which refcount_t's saturation mechanism
works deterministically.
Acked-by: Kees Cook <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Jann Horn <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/refcount.h | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 0ac50cf..0e3ee25 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -38,11 +38,24 @@
* atomic operations, then the count will continue to edge closer to 0. If it
* reaches a value of 1 before /any/ of the threads reset it to the saturated
* value, then a concurrent refcount_dec_and_test() may erroneously free the
- * underlying object. Given the precise timing details involved with the
- * round-robin scheduling of each thread manipulating the refcount and the need
- * to hit the race multiple times in succession, there doesn't appear to be a
- * practical avenue of attack even if using refcount_add() operations with
- * larger increments.
+ * underlying object.
+ * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is currently
+ * 0x400000 (and can't easily be raised in the future beyond FUTEX_TID_MASK).
+ * With the current PID limit, if no batched refcounting operations are used and
+ * the attacker can't repeatedly trigger kernel oopses in the middle of refcount
+ * operations, this makes it impossible for a saturated refcount to leave the
+ * saturation range, even if it is possible for multiple uses of the same
+ * refcount to nest in the context of a single task:
+ *
+ * (UINT_MAX+1-REFCOUNT_SATURATED) / PID_MAX_LIMIT =
+ * 0x40000000 / 0x400000 = 0x100 = 256
+ *
+ * If hundreds of references are added/removed with a single refcounting
+ * operation, it may potentially be possible to leave the saturation range; but
+ * given the precise timing details involved with the round-robin scheduling of
+ * each thread manipulating the refcount and the need to hit the race multiple
+ * times in succession, there doesn't appear to be a practical avenue of attack
+ * even if using refcount_add() operations with larger increments.
*
* Memory ordering
* ===============