2019-01-18 12:29:24

by Elena Reshetova

Subject: [PATCH 0/5] sched refcount_t conversions

I would really love to finally merge these old patches
(now rebased on top of linux-next/master as of last Friday),
since as far as I remember no one has raised any further concerns
about them.

refcount_t has now been used successfully in many places in the kernel
and has helped to detect bugs and mistakes in refcounter logic.

This series, for the scheduler and task struct specifically,
replaces atomic_t reference counters with the new refcount_t type
and API (see include/linux/refcount.h).
By doing this we prevent intentional or accidental
underflows or overflows that can lead to use-after-free vulnerabilities.

The patches are fully independent and can be cherry-picked separately.
More information about each conversion can be found in its respective patch.
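
To illustrate the shape of these conversions, here is a minimal sketch
(not taken from the series; struct foo, foo_get() and foo_put() are
made-up names) of how a typical atomic_t refcounting pattern maps onto
the refcount_t API:

#include <linux/refcount.h>
#include <linux/slab.h>

struct foo {
        refcount_t refs;        /* was: atomic_t refs; */
        /* ... payload ... */
};

static struct foo *foo_alloc(void)
{
        struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

        if (f)
                refcount_set(&f->refs, 1);      /* was: atomic_set() */
        return f;
}

static void foo_get(struct foo *f)
{
        /* was: atomic_inc(); refcount_inc() saturates instead of wrapping */
        refcount_inc(&f->refs);
}

static void foo_put(struct foo *f)
{
        if (refcount_dec_and_test(&f->refs))    /* was: atomic_dec_and_test() */
                kfree(f);
}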

Elena Reshetova (5):
sched: convert sighand_struct.count to refcount_t
sched: convert signal_struct.sigcnt to refcount_t
sched: convert numa_group.refcount to refcount_t
sched/task_struct: convert task_struct.usage to refcount_t
sched/task_struct: convert task_struct.stack_refcount to refcount_t

fs/exec.c | 4 ++--
fs/proc/task_nommu.c | 2 +-
include/linux/init_task.h | 1 +
include/linux/sched.h | 5 +++--
include/linux/sched/signal.h | 5 +++--
include/linux/sched/task.h | 4 ++--
include/linux/sched/task_stack.h | 2 +-
init/init_task.c | 6 +++---
kernel/fork.c | 24 ++++++++++++------------
kernel/sched/fair.c | 12 ++++++------
10 files changed, 34 insertions(+), 31 deletions(-)

--
2.7.4



2019-01-18 12:29:31

by Elena Reshetova

Subject: [PATCH 1/5] sched: convert sighand_struct.count to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable sighand_struct.count is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

**Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the sighand_struct.count it might make a difference
in the following places:
- __cleanup_sighand: decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart
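
(Purely illustrative, not part of this patch: if some user of
sighand_struct.count were found to rely on the full ordering of
atomic_dec_and_test(), an explicit barrier could restore it. A hedged
sketch of what that would look like in __cleanup_sighand(), with the
hypothetical barrier marked as such:)

void __cleanup_sighand(struct sighand_struct *sighand)
{
        if (refcount_dec_and_test(&sighand->count)) {
                /*
                 * Hypothetical: refcount_dec_and_test() gives RELEASE
                 * ordering plus a control dependency on success; a full
                 * barrier would only be needed here if a caller depended
                 * on the stronger ordering of atomic_dec_and_test(),
                 * which is not the case for the current users.
                 */
                smp_mb();
                signalfd_cleanup(sighand);
                kmem_cache_free(sighand_cachep, sighand);
        }
}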

Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
fs/exec.c | 4 ++--
fs/proc/task_nommu.c | 2 +-
include/linux/sched/signal.h | 3 ++-
kernel/fork.c | 8 ++++----
4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index fb72d36..966cd98 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1189,7 +1189,7 @@ static int de_thread(struct task_struct *tsk)
flush_itimer_signals();
#endif

- if (atomic_read(&oldsighand->count) != 1) {
+ if (refcount_read(&oldsighand->count) != 1) {
struct sighand_struct *newsighand;
/*
* This ->sighand is shared with the CLONE_SIGHAND
@@ -1199,7 +1199,7 @@ static int de_thread(struct task_struct *tsk)
if (!newsighand)
return -ENOMEM;

- atomic_set(&newsighand->count, 1);
+ refcount_set(&newsighand->count, 1);
memcpy(newsighand->action, oldsighand->action,
sizeof(newsighand->action));

diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 0b63d68..f912872 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -64,7 +64,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
else
bytes += kobjsize(current->files);

- if (current->sighand && atomic_read(&current->sighand->count) > 1)
+ if (current->sighand && refcount_read(&current->sighand->count) > 1)
sbytes += kobjsize(current->sighand);
else
bytes += kobjsize(current->sighand);
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 0c3e396..2a65721 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -8,13 +8,14 @@
#include <linux/sched/jobctl.h>
#include <linux/sched/task.h>
#include <linux/cred.h>
+#include <linux/refcount.h>

/*
* Types defining task->signal and task->sighand and APIs using them:
*/

struct sighand_struct {
- atomic_t count;
+ refcount_t count;
struct k_sigaction action[_NSIG];
spinlock_t siglock;
wait_queue_head_t signalfd_wqh;
diff --git a/kernel/fork.c b/kernel/fork.c
index c48e9e2..71b4757 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1463,7 +1463,7 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
struct sighand_struct *sig;

if (clone_flags & CLONE_SIGHAND) {
- atomic_inc(&current->sighand->count);
+ refcount_inc(&current->sighand->count);
return 0;
}
sig = kmem_cache_alloc(sighand_cachep, GFP_KERNEL);
@@ -1471,7 +1471,7 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
if (!sig)
return -ENOMEM;

- atomic_set(&sig->count, 1);
+ refcount_set(&sig->count, 1);
spin_lock_irq(&current->sighand->siglock);
memcpy(sig->action, current->sighand->action, sizeof(sig->action));
spin_unlock_irq(&current->sighand->siglock);
@@ -1480,7 +1480,7 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)

void __cleanup_sighand(struct sighand_struct *sighand)
{
- if (atomic_dec_and_test(&sighand->count)) {
+ if (refcount_dec_and_test(&sighand->count)) {
signalfd_cleanup(sighand);
/*
* sighand_cachep is SLAB_TYPESAFE_BY_RCU so we can free it
@@ -2439,7 +2439,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
return -EINVAL;
}
if (unshare_flags & (CLONE_SIGHAND | CLONE_VM)) {
- if (atomic_read(&current->sighand->count) > 1)
+ if (refcount_read(&current->sighand->count) > 1)
return -EINVAL;
}
if (unshare_flags & CLONE_VM) {
--
2.7.4


2019-01-18 12:29:44

by Elena Reshetova

Subject: [PATCH 5/5] sched/task_struct: convert task_struct.stack_refcount to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable task_struct.stack_refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

**Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the task_struct.stack_refcount it might make a difference
in the following places:
- try_get_task_stack(): increment in refcount_inc_not_zero() only
guarantees control dependency on success vs. fully ordered
atomic counterpart
- put_task_stack(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart
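
(For illustration only, not from this patch: a hedged sketch of the
caller-side pattern these two helpers support, i.e. pinning another
task's stack only if it has not already been freed. The helper name
inspect_task_stack() is made up:)

static void inspect_task_stack(struct task_struct *tsk)
{
        void *stack = try_get_task_stack(tsk); /* NULL once stack_refcount hit zero */

        if (!stack)
                return;                 /* stack already gone, nothing to look at */

        /* ... read the stack contents here ... */

        put_task_stack(tsk);            /* drop our pin; may free the stack */
}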

Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/init_task.h | 1 +
include/linux/sched.h | 2 +-
include/linux/sched/task_stack.h | 2 +-
init/init_task.c | 2 +-
kernel/fork.c | 6 +++---
5 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index a7083a4..6049baa 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -13,6 +13,7 @@
#include <linux/securebits.h>
#include <linux/seqlock.h>
#include <linux/rbtree.h>
+#include <linux/refcount.h>
#include <linux/sched/autogroup.h>
#include <net/net_namespace.h>
#include <linux/sched/rt.h>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index bead0c7..439e34a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1201,7 +1201,7 @@ struct task_struct {
#endif
#ifdef CONFIG_THREAD_INFO_IN_TASK
/* A live task holds one reference: */
- atomic_t stack_refcount;
+ refcount_t stack_refcount;
#endif
#ifdef CONFIG_LIVEPATCH
int patch_state;
diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
index 6a84192..2413427 100644
--- a/include/linux/sched/task_stack.h
+++ b/include/linux/sched/task_stack.h
@@ -61,7 +61,7 @@ static inline unsigned long *end_of_stack(struct task_struct *p)
#ifdef CONFIG_THREAD_INFO_IN_TASK
static inline void *try_get_task_stack(struct task_struct *tsk)
{
- return atomic_inc_not_zero(&tsk->stack_refcount) ?
+ return refcount_inc_not_zero(&tsk->stack_refcount) ?
task_stack_page(tsk) : NULL;
}

diff --git a/init/init_task.c b/init/init_task.c
index 0ad9772..df0257c 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -62,7 +62,7 @@ struct task_struct init_task
= {
#ifdef CONFIG_THREAD_INFO_IN_TASK
.thread_info = INIT_THREAD_INFO(init_task),
- .stack_refcount = ATOMIC_INIT(1),
+ .stack_refcount = REFCOUNT_INIT(1),
#endif
.state = 0,
.stack = init_stack,
diff --git a/kernel/fork.c b/kernel/fork.c
index a4ac220..73c2c9f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -428,7 +428,7 @@ static void release_task_stack(struct task_struct *tsk)
#ifdef CONFIG_THREAD_INFO_IN_TASK
void put_task_stack(struct task_struct *tsk)
{
- if (atomic_dec_and_test(&tsk->stack_refcount))
+ if (refcount_dec_and_test(&tsk->stack_refcount))
release_task_stack(tsk);
}
#endif
@@ -446,7 +446,7 @@ void free_task(struct task_struct *tsk)
* If the task had a separate stack allocation, it should be gone
* by now.
*/
- WARN_ON_ONCE(atomic_read(&tsk->stack_refcount) != 0);
+ WARN_ON_ONCE(refcount_read(&tsk->stack_refcount) != 0);
#endif
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
@@ -866,7 +866,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
tsk->stack_vm_area = stack_vm_area;
#endif
#ifdef CONFIG_THREAD_INFO_IN_TASK
- atomic_set(&tsk->stack_refcount, 1);
+ refcount_set(&tsk->stack_refcount, 1);
#endif

if (err)
--
2.7.4


2019-01-18 12:30:37

by Elena Reshetova

Subject: [PATCH 2/5] sched: convert signal_struct.sigcnt to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable signal_struct.sigcnt is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

**Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the signal_struct.sigcnt it might make a difference
in the following places:
- put_signal_struct(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart

Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/sched/signal.h | 2 +-
init/init_task.c | 2 +-
kernel/fork.c | 6 +++---
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 2a65721..38a0f07 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -83,7 +83,7 @@ struct multiprocess_signals {
* the locking of signal_struct.
*/
struct signal_struct {
- atomic_t sigcnt;
+ refcount_t sigcnt;
atomic_t live;
int nr_threads;
struct list_head thread_head;
diff --git a/init/init_task.c b/init/init_task.c
index 26131e7..756be60 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -45,7 +45,7 @@ static struct signal_struct init_signals = {
};

static struct sighand_struct init_sighand = {
- .count = ATOMIC_INIT(1),
+ .count = REFCOUNT_INIT(1),
.action = { { { .sa_handler = SIG_DFL, } }, },
.siglock = __SPIN_LOCK_UNLOCKED(init_sighand.siglock),
.signalfd_wqh = __WAIT_QUEUE_HEAD_INITIALIZER(init_sighand.signalfd_wqh),
diff --git a/kernel/fork.c b/kernel/fork.c
index 71b4757..504324c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -709,7 +709,7 @@ static inline void free_signal_struct(struct signal_struct *sig)

static inline void put_signal_struct(struct signal_struct *sig)
{
- if (atomic_dec_and_test(&sig->sigcnt))
+ if (refcount_dec_and_test(&sig->sigcnt))
free_signal_struct(sig);
}

@@ -1527,7 +1527,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)

sig->nr_threads = 1;
atomic_set(&sig->live, 1);
- atomic_set(&sig->sigcnt, 1);
+ refcount_set(&sig->sigcnt, 1);

/* list_add(thread_node, thread_head) without INIT_LIST_HEAD() */
sig->thread_head = (struct list_head)LIST_HEAD_INIT(tsk->thread_node);
@@ -2082,7 +2082,7 @@ static __latent_entropy struct task_struct *copy_process(
} else {
current->signal->nr_threads++;
atomic_inc(&current->signal->live);
- atomic_inc(&current->signal->sigcnt);
+ refcount_inc(&current->signal->sigcnt);
task_join_group_stop(p);
list_add_tail_rcu(&p->thread_group,
&p->group_leader->thread_group);
--
2.7.4


2019-01-18 12:30:57

by Elena Reshetova

Subject: [PATCH 4/5] sched/task_struct: convert task_struct.usage to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable task_struct.usage is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

**Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the task_struct.usage it might make a difference
in the following places:
- put_task_struct(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart

Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/sched.h | 3 ++-
include/linux/sched/task.h | 4 ++--
init/init_task.c | 2 +-
kernel/fork.c | 4 ++--
4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5e66900..bead0c7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -21,6 +21,7 @@
#include <linux/seccomp.h>
#include <linux/nodemask.h>
#include <linux/rcupdate.h>
+#include <linux/refcount.h>
#include <linux/resource.h>
#include <linux/latencytop.h>
#include <linux/sched/prio.h>
@@ -608,7 +609,7 @@ struct task_struct {
randomized_struct_fields_start

void *stack;
- atomic_t usage;
+ refcount_t usage;
/* Per task flags (PF_*), defined further below: */
unsigned int flags;
unsigned int ptrace;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 44c6f15..2e97a22 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -88,13 +88,13 @@ extern void sched_exec(void);
#define sched_exec() {}
#endif

-#define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
+#define get_task_struct(tsk) do { refcount_inc(&(tsk)->usage); } while(0)

extern void __put_task_struct(struct task_struct *t);

static inline void put_task_struct(struct task_struct *t)
{
- if (atomic_dec_and_test(&t->usage))
+ if (refcount_dec_and_test(&t->usage))
__put_task_struct(t);
}

diff --git a/init/init_task.c b/init/init_task.c
index 756be60..0ad9772 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -66,7 +66,7 @@ struct task_struct init_task
#endif
.state = 0,
.stack = init_stack,
- .usage = ATOMIC_INIT(2),
+ .usage = REFCOUNT_INIT(2),
.flags = PF_KTHREAD,
.prio = MAX_PRIO - 20,
.static_prio = MAX_PRIO - 20,
diff --git a/kernel/fork.c b/kernel/fork.c
index 504324c..a4ac220 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -716,7 +716,7 @@ static inline void put_signal_struct(struct signal_struct *sig)
void __put_task_struct(struct task_struct *tsk)
{
WARN_ON(!tsk->exit_state);
- WARN_ON(atomic_read(&tsk->usage));
+ WARN_ON(refcount_read(&tsk->usage));
WARN_ON(tsk == current);

cgroup_free(tsk);
@@ -895,7 +895,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
* One for us, one for whoever does the "release_task()" (usually
* parent)
*/
- atomic_set(&tsk->usage, 2);
+ refcount_set(&tsk->usage, 2);
#ifdef CONFIG_BLK_DEV_IO_TRACE
tsk->btrace_seq = 0;
#endif
--
2.7.4


2019-01-18 12:59:53

by Andrea Parri

Subject: Re: [PATCH 1/5] sched: convert sighand_struct.count to refcount_t

Hi Elena,

[...]

> **Important note for maintainers:
>
> Some functions from the refcount_t API defined in lib/refcount.c
> have different memory ordering guarantees than their atomic
> counterparts.
> The full comparison can be seen in
> https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
> be in a state to be merged into the documentation tree.

Just a remark to point out that that document got merged, even though
in a different location/format: c.f.,

b6e859f6cdd1 ("docs: refcount_t documentation")

Andrea

2019-01-18 13:31:39

by Elena Reshetova

Subject: [PATCH 3/5] sched: convert numa_group.refcount to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable numa_group.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

**Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.
The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.
Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the numa_group.refcount it might make a difference
in the following places:
- get_numa_group(): increment in refcount_inc_not_zero() only
guarantees control dependency on success vs. fully ordered
atomic counterpart
- put_numa_group(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart
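
(For illustration only, not the exact fair.c code: refcount_inc_not_zero()
matters here because numa_group objects are looked up under RCU and freed
with kfree_rcu(), so a reader must only take a reference while the count
is still non-zero. A hedged sketch of that lookup; the helper name
numa_group_get_live() is made up and the exact accessors in fair.c may
differ:)

static struct numa_group *numa_group_get_live(struct task_struct *p)
{
        struct numa_group *grp;

        rcu_read_lock();
        grp = READ_ONCE(p->numa_group);         /* object may be on its way to kfree_rcu() */
        if (grp && !get_numa_group(grp))        /* refcount_inc_not_zero(): fails once count hit zero */
                grp = NULL;
        rcu_read_unlock();

        return grp;     /* NULL, or pinned until a matching put_numa_group() */
}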

Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
kernel/sched/fair.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7303e0b..32a2382 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1035,7 +1035,7 @@ unsigned int sysctl_numa_balancing_scan_size = 256;
unsigned int sysctl_numa_balancing_scan_delay = 1000;

struct numa_group {
- atomic_t refcount;
+ refcount_t refcount;

spinlock_t lock; /* nr_tasks, tasks */
int nr_tasks;
@@ -1104,7 +1104,7 @@ static unsigned int task_scan_start(struct task_struct *p)
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);

- period *= atomic_read(&ng->refcount);
+ period *= refcount_read(&ng->refcount);
period *= shared + 1;
period /= private + shared + 1;
}
@@ -1127,7 +1127,7 @@ static unsigned int task_scan_max(struct task_struct *p)
unsigned long private = group_faults_priv(ng);
unsigned long period = smax;

- period *= atomic_read(&ng->refcount);
+ period *= refcount_read(&ng->refcount);
period *= shared + 1;
period /= private + shared + 1;

@@ -2203,12 +2203,12 @@ static void task_numa_placement(struct task_struct *p)

static inline int get_numa_group(struct numa_group *grp)
{
- return atomic_inc_not_zero(&grp->refcount);
+ return refcount_inc_not_zero(&grp->refcount);
}

static inline void put_numa_group(struct numa_group *grp)
{
- if (atomic_dec_and_test(&grp->refcount))
+ if (refcount_dec_and_test(&grp->refcount))
kfree_rcu(grp, rcu);
}

@@ -2229,7 +2229,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
if (!grp)
return;

- atomic_set(&grp->refcount, 1);
+ refcount_set(&grp->refcount, 1);
grp->active_nodes = 1;
grp->max_faults_cpu = 0;
spin_lock_init(&grp->lock);
--
2.7.4


2019-01-18 15:10:43

by Andrea Parri

Subject: Re: [PATCH 0/5] sched refcount_t conversions

On Fri, Jan 18, 2019 at 02:27:25PM +0200, Elena Reshetova wrote:
> I would really love to finally merge these old patches
> (now rebased on top of linux-next/master as of last Friday),
> since as far as I remember no one has raised any further concerns
> about them.
>
> refcount_t has now been used successfully in many places in the kernel
> and has helped to detect bugs and mistakes in refcounter logic.
>
> This series, for the scheduler and task struct specifically,
> replaces atomic_t reference counters with the new refcount_t type
> and API (see include/linux/refcount.h).
> By doing this we prevent intentional or accidental
> underflows or overflows that can lead to use-after-free vulnerabilities.
>
> The patches are fully independent and can be cherry-picked separately.
> More information about each conversion can be found in its respective patch.
>
> Elena Reshetova (5):
> sched: convert sighand_struct.count to refcount_t
> sched: convert signal_struct.sigcnt to refcount_t
> sched: convert numa_group.refcount to refcount_t
> sched/task_struct: convert task_struct.usage to refcount_t
> sched/task_struct: convert task_struct.stack_refcount to refcount_t

For the series, please feel free to add:

Reviewed-by: Andrea Parri <[email protected]>

(You may still want to update the references to the 'refcount-vs-atomic'
doc. in the commit messages.)

Andrea


>
> fs/exec.c | 4 ++--
> fs/proc/task_nommu.c | 2 +-
> include/linux/init_task.h | 1 +
> include/linux/sched.h | 5 +++--
> include/linux/sched/signal.h | 5 +++--
> include/linux/sched/task.h | 4 ++--
> include/linux/sched/task_stack.h | 2 +-
> init/init_task.c | 6 +++---
> kernel/fork.c | 24 ++++++++++++------------
> kernel/sched/fair.c | 12 ++++++------
> 10 files changed, 34 insertions(+), 31 deletions(-)
>
> --
> 2.7.4
>

2019-01-18 16:13:56

by Peter Zijlstra

Subject: Re: [PATCH 0/5] sched refcount_t conversions

On Fri, Jan 18, 2019 at 02:27:25PM +0200, Elena Reshetova wrote:
> Elena Reshetova (5):
> sched: convert sighand_struct.count to refcount_t
> sched: convert signal_struct.sigcnt to refcount_t

These should really be seen by Oleg (bounced) and I'll await his reply.

> sched: convert numa_group.refcount to refcount_t
> sched/task_struct: convert task_struct.usage to refcount_t
> sched/task_struct: convert task_struct.stack_refcount to refcount_t

Those are ok and I'll queue them.

2019-01-18 17:26:10

by Elena Reshetova

Subject: RE: [PATCH 1/5] sched: convert sighand_struct.count to refcount_t

> Hi Elena,

Hi!

>
> [...]
>
> > **Important note for maintainers:
> >
> > Some functions from the refcount_t API defined in lib/refcount.c
> > have different memory ordering guarantees than their atomic
> > counterparts.
> > The full comparison can be seen in
> > https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
> > be in a state to be merged into the documentation tree.
>
> Just a remark to point out that that document got merged, even though
> in a different location/format: c.f.,
>
> b6e859f6cdd1 ("docs: refcount_t documentation")
>

Oops, indeed, I forgot that we merged the docs. It has been a long time that
these patches have been sitting in my old repo.

Best Regards,
Elena.

2019-01-18 17:42:30

by Elena Reshetova

Subject: RE: [PATCH 0/5] sched refcount_t conversions


> On Fri, Jan 18, 2019 at 02:27:25PM +0200, Elena Reshetova wrote:
> > Elena Reshetova (5):
> > sched: convert sighand_struct.count to refcount_t
> > sched: convert signal_struct.sigcnt to refcount_t
>
> These should really be seen by Oleg (bounced) and I'll await his reply.
>
> > sched: convert numa_group.refcount to refcount_t
> > sched/task_struct: convert task_struct.usage to refcount_t
> > sched/task_struct: convert task_struct.stack_refcount to refcount_t
>
> Those are ok and I'll queue them.

Thank you Peter!

Best Regards,
Elena

2019-01-18 17:44:13

by Elena Reshetova

Subject: RE: [PATCH 0/5] sched refcount_t conversions

> On Fri, Jan 18, 2019 at 02:27:25PM +0200, Elena Reshetova wrote:
> > I would really love to finally merge these old patches
> > (now rebased on top of linux-next/master as of last Friday),
> > since as far as I remember no one has raised any further concerns
> > about them.
> >
> > refcount_t has now been used successfully in many places in the kernel
> > and has helped to detect bugs and mistakes in refcounter logic.
> >
> > This series, for the scheduler and task struct specifically,
> > replaces atomic_t reference counters with the new refcount_t type
> > and API (see include/linux/refcount.h).
> > By doing this we prevent intentional or accidental
> > underflows or overflows that can lead to use-after-free vulnerabilities.
> >
> > The patches are fully independent and can be cherry-picked separately.
> > More information about each conversion can be found in its respective patch.
> >
> > Elena Reshetova (5):
> > sched: convert sighand_struct.count to refcount_t
> > sched: convert signal_struct.sigcnt to refcount_t
> > sched: convert numa_group.refcount to refcount_t
> > sched/task_struct: convert task_struct.usage to refcount_t
> > sched/task_struct: convert task_struct.stack_refcount to refcount_t
>
> For the series, please feel free to add:
>
> Reviewed-by: Andrea Parri <[email protected]>

Thank you for your review!

Best Regards,
Elena

2019-01-21 17:06:53

by Oleg Nesterov

Subject: Re: [PATCH 1/5] sched: convert sighand_struct.count to refcount_t

On 01/18, Elena Reshetova wrote:
>
> --- a/fs/proc/task_nommu.c
> +++ b/fs/proc/task_nommu.c
> @@ -64,7 +64,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
> else
> bytes += kobjsize(current->files);
>
> - if (current->sighand && atomic_read(&current->sighand->count) > 1)
> + if (current->sighand && refcount_read(&current->sighand->count) > 1)
> sbytes += kobjsize(current->sighand);
> else
> bytes += kobjsize(current->sighand);

I fail to understand this code with or without the patch... I do not see
how it is possible to hit ->sighand == NULL or sighand->count == 0 in
proc_pid_status() paths.

Nevermind, this is off-topic.

Reviewed-by: Oleg Nesterov <[email protected]>


2019-01-21 17:43:53

by Oleg Nesterov

Subject: Re: [PATCH 2/5] sched: convert signal_struct.sigcnt to refcount_t

On 01/18, Elena Reshetova wrote:
>
> For the signal_struct.sigcnt it might make a difference
> in the following places:
> - put_signal_struct(): decrement in refcount_dec_and_test() only
> provides RELEASE ordering and control dependency on success
> vs. fully ordered atomic counterpart

this is fine.

Reviewed-by: Oleg Nesterov <[email protected]>


2019-01-22 09:13:06

by Elena Reshetova

Subject: RE: [PATCH 2/5] sched: convert signal_struct.sigcnt to refcount_t

> On 01/18, Elena Reshetova wrote:
> >
> > For the signal_struct.sigcnt it might make a difference
> > in the following places:
> > - put_signal_struct(): decrement in refcount_dec_and_test() only
> > provides RELEASE ordering and control dependency on success
> > vs. fully ordered atomic counterpart
>
> this is fine.
>
> Reviewed-by: Oleg Nesterov <[email protected]>

Thank you Oleg!

Will you be able to take this and the other scheduler
patch to whatever tree/path it should normally go to get eventually
integrated?

Best Regards,
Elena.

2019-01-22 09:28:33

by Peter Zijlstra

Subject: Re: [PATCH 2/5] sched: convert signal_struct.sigcnt to refcount_t

On Tue, Jan 22, 2019 at 09:11:42AM +0000, Reshetova, Elena wrote:
> Will you be able to take this and the other scheduler
> patch to whatever tree/path it should normally go to get eventually
> integrated?

I've queued them up.

2019-01-22 09:38:26

by Elena Reshetova

Subject: RE: [PATCH 2/5] sched: convert signal_struct.sigcnt to refcount_t

> On Tue, Jan 22, 2019 at 09:11:42AM +0000, Reshetova, Elena wrote:
> > Will you be able to take this and the other scheduler
> > patch to whatever tree/path it should normally go to get eventually
> > integrated?
>
> I've queued them up.

Thank you!

Best Regards,
Elena.

Subject: [tip:sched/core] sched/core: Convert sighand_struct.count to refcount_t

Commit-ID: d036bda7d0e7269c2982eb979acfef855f5d7977
Gitweb: https://git.kernel.org/tip/d036bda7d0e7269c2982eb979acfef855f5d7977
Author: Elena Reshetova <[email protected]>
AuthorDate: Fri, 18 Jan 2019 14:27:26 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 4 Feb 2019 08:53:52 +0100

sched/core: Convert sighand_struct.count to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:

- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable sighand_struct.count is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

** Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.

The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.

Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.

Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the sighand_struct.count it might make a difference
in the following places:

- __cleanup_sighand: decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart

Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
fs/exec.c | 4 ++--
fs/proc/task_nommu.c | 2 +-
include/linux/sched/signal.h | 3 ++-
kernel/fork.c | 8 ++++----
4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index fb72d36f7823..966cd98a2ce2 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1189,7 +1189,7 @@ no_thread_group:
flush_itimer_signals();
#endif

- if (atomic_read(&oldsighand->count) != 1) {
+ if (refcount_read(&oldsighand->count) != 1) {
struct sighand_struct *newsighand;
/*
* This ->sighand is shared with the CLONE_SIGHAND
@@ -1199,7 +1199,7 @@ no_thread_group:
if (!newsighand)
return -ENOMEM;

- atomic_set(&newsighand->count, 1);
+ refcount_set(&newsighand->count, 1);
memcpy(newsighand->action, oldsighand->action,
sizeof(newsighand->action));

diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 0b63d68dedb2..f912872fbf91 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -64,7 +64,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
else
bytes += kobjsize(current->files);

- if (current->sighand && atomic_read(&current->sighand->count) > 1)
+ if (current->sighand && refcount_read(&current->sighand->count) > 1)
sbytes += kobjsize(current->sighand);
else
bytes += kobjsize(current->sighand);
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 13789d10a50e..37eeb1a28eba 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -8,13 +8,14 @@
#include <linux/sched/jobctl.h>
#include <linux/sched/task.h>
#include <linux/cred.h>
+#include <linux/refcount.h>

/*
* Types defining task->signal and task->sighand and APIs using them:
*/

struct sighand_struct {
- atomic_t count;
+ refcount_t count;
struct k_sigaction action[_NSIG];
spinlock_t siglock;
wait_queue_head_t signalfd_wqh;
diff --git a/kernel/fork.c b/kernel/fork.c
index b69248e6f0e0..370856d4c0b3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1463,7 +1463,7 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
struct sighand_struct *sig;

if (clone_flags & CLONE_SIGHAND) {
- atomic_inc(&current->sighand->count);
+ refcount_inc(&current->sighand->count);
return 0;
}
sig = kmem_cache_alloc(sighand_cachep, GFP_KERNEL);
@@ -1471,7 +1471,7 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
if (!sig)
return -ENOMEM;

- atomic_set(&sig->count, 1);
+ refcount_set(&sig->count, 1);
spin_lock_irq(&current->sighand->siglock);
memcpy(sig->action, current->sighand->action, sizeof(sig->action));
spin_unlock_irq(&current->sighand->siglock);
@@ -1480,7 +1480,7 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)

void __cleanup_sighand(struct sighand_struct *sighand)
{
- if (atomic_dec_and_test(&sighand->count)) {
+ if (refcount_dec_and_test(&sighand->count)) {
signalfd_cleanup(sighand);
/*
* sighand_cachep is SLAB_TYPESAFE_BY_RCU so we can free it
@@ -2439,7 +2439,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
return -EINVAL;
}
if (unshare_flags & (CLONE_SIGHAND | CLONE_VM)) {
- if (atomic_read(&current->sighand->count) > 1)
+ if (refcount_read(&current->sighand->count) > 1)
return -EINVAL;
}
if (unshare_flags & CLONE_VM) {

Subject: [tip:sched/core] sched/core: Convert signal_struct.sigcnt to refcount_t

Commit-ID: 60d4de3ff7f775509deba94b3db3c1abe55bf7a5
Gitweb: https://git.kernel.org/tip/60d4de3ff7f775509deba94b3db3c1abe55bf7a5
Author: Elena Reshetova <[email protected]>
AuthorDate: Fri, 18 Jan 2019 14:27:27 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 4 Feb 2019 08:53:53 +0100

sched/core: Convert signal_struct.sigcnt to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:

- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable signal_struct.sigcnt is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

** Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.

The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.

Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.

Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the signal_struct.sigcnt it might make a difference
in the following places:

- put_signal_struct(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart

Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/sched/signal.h | 2 +-
init/init_task.c | 2 +-
kernel/fork.c | 6 +++---
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 37eeb1a28eba..ae5655197698 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -83,7 +83,7 @@ struct multiprocess_signals {
* the locking of signal_struct.
*/
struct signal_struct {
- atomic_t sigcnt;
+ refcount_t sigcnt;
atomic_t live;
int nr_threads;
struct list_head thread_head;
diff --git a/init/init_task.c b/init/init_task.c
index 5aebe3be4d7c..9aa3ebc74970 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -44,7 +44,7 @@ static struct signal_struct init_signals = {
};

static struct sighand_struct init_sighand = {
- .count = ATOMIC_INIT(1),
+ .count = REFCOUNT_INIT(1),
.action = { { { .sa_handler = SIG_DFL, } }, },
.siglock = __SPIN_LOCK_UNLOCKED(init_sighand.siglock),
.signalfd_wqh = __WAIT_QUEUE_HEAD_INITIALIZER(init_sighand.signalfd_wqh),
diff --git a/kernel/fork.c b/kernel/fork.c
index 370856d4c0b3..935a42d5f8ff 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -710,7 +710,7 @@ static inline void free_signal_struct(struct signal_struct *sig)

static inline void put_signal_struct(struct signal_struct *sig)
{
- if (atomic_dec_and_test(&sig->sigcnt))
+ if (refcount_dec_and_test(&sig->sigcnt))
free_signal_struct(sig);
}

@@ -1527,7 +1527,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)

sig->nr_threads = 1;
atomic_set(&sig->live, 1);
- atomic_set(&sig->sigcnt, 1);
+ refcount_set(&sig->sigcnt, 1);

/* list_add(thread_node, thread_head) without INIT_LIST_HEAD() */
sig->thread_head = (struct list_head)LIST_HEAD_INIT(tsk->thread_node);
@@ -2082,7 +2082,7 @@ static __latent_entropy struct task_struct *copy_process(
} else {
current->signal->nr_threads++;
atomic_inc(&current->signal->live);
- atomic_inc(&current->signal->sigcnt);
+ refcount_inc(&current->signal->sigcnt);
task_join_group_stop(p);
list_add_tail_rcu(&p->thread_group,
&p->group_leader->thread_group);

Subject: [tip:sched/core] sched/fair: Convert numa_group.refcount to refcount_t

Commit-ID: c45a77952427b678aa9205e1b0ee3bcf33339a2e
Gitweb: https://git.kernel.org/tip/c45a77952427b678aa9205e1b0ee3bcf33339a2e
Author: Elena Reshetova <[email protected]>
AuthorDate: Fri, 18 Jan 2019 14:27:28 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 4 Feb 2019 08:53:54 +0100

sched/fair: Convert numa_group.refcount to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:

- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable numa_group.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

** Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.

The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.

Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.

Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the numa_group.refcount it might make a difference
in the following places:

- get_numa_group(): increment in refcount_inc_not_zero() only
guarantees control dependency on success vs. fully ordered
atomic counterpart
- put_numa_group(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart

Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e2ff4b69dcf6..5b2b919c7929 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1035,7 +1035,7 @@ unsigned int sysctl_numa_balancing_scan_size = 256;
unsigned int sysctl_numa_balancing_scan_delay = 1000;

struct numa_group {
- atomic_t refcount;
+ refcount_t refcount;

spinlock_t lock; /* nr_tasks, tasks */
int nr_tasks;
@@ -1104,7 +1104,7 @@ static unsigned int task_scan_start(struct task_struct *p)
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);

- period *= atomic_read(&ng->refcount);
+ period *= refcount_read(&ng->refcount);
period *= shared + 1;
period /= private + shared + 1;
}
@@ -1127,7 +1127,7 @@ static unsigned int task_scan_max(struct task_struct *p)
unsigned long private = group_faults_priv(ng);
unsigned long period = smax;

- period *= atomic_read(&ng->refcount);
+ period *= refcount_read(&ng->refcount);
period *= shared + 1;
period /= private + shared + 1;

@@ -2203,12 +2203,12 @@ static void task_numa_placement(struct task_struct *p)

static inline int get_numa_group(struct numa_group *grp)
{
- return atomic_inc_not_zero(&grp->refcount);
+ return refcount_inc_not_zero(&grp->refcount);
}

static inline void put_numa_group(struct numa_group *grp)
{
- if (atomic_dec_and_test(&grp->refcount))
+ if (refcount_dec_and_test(&grp->refcount))
kfree_rcu(grp, rcu);
}

@@ -2229,7 +2229,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
if (!grp)
return;

- atomic_set(&grp->refcount, 1);
+ refcount_set(&grp->refcount, 1);
grp->active_nodes = 1;
grp->max_faults_cpu = 0;
spin_lock_init(&grp->lock);

Subject: [tip:sched/core] sched/core: Convert task_struct.usage to refcount_t

Commit-ID: ec1d281923cf81cc660343d0cb8ffc837ffb991d
Gitweb: https://git.kernel.org/tip/ec1d281923cf81cc660343d0cb8ffc837ffb991d
Author: Elena Reshetova <[email protected]>
AuthorDate: Fri, 18 Jan 2019 14:27:29 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 4 Feb 2019 08:53:55 +0100

sched/core: Convert task_struct.usage to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:

- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable task_struct.usage is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

** Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.

The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.

Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.

Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the task_struct.usage it might make a difference
in the following places:

- put_task_struct(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart

Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/sched.h | 3 ++-
include/linux/sched/task.h | 4 ++--
init/init_task.c | 2 +-
kernel/fork.c | 4 ++--
4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e2bba022827d..9d14d6864ca6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -21,6 +21,7 @@
#include <linux/seccomp.h>
#include <linux/nodemask.h>
#include <linux/rcupdate.h>
+#include <linux/refcount.h>
#include <linux/resource.h>
#include <linux/latencytop.h>
#include <linux/sched/prio.h>
@@ -607,7 +608,7 @@ struct task_struct {
randomized_struct_fields_start

void *stack;
- atomic_t usage;
+ refcount_t usage;
/* Per task flags (PF_*), defined further below: */
unsigned int flags;
unsigned int ptrace;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 44c6f15800ff..2e97a2227045 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -88,13 +88,13 @@ extern void sched_exec(void);
#define sched_exec() {}
#endif

-#define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
+#define get_task_struct(tsk) do { refcount_inc(&(tsk)->usage); } while(0)

extern void __put_task_struct(struct task_struct *t);

static inline void put_task_struct(struct task_struct *t)
{
- if (atomic_dec_and_test(&t->usage))
+ if (refcount_dec_and_test(&t->usage))
__put_task_struct(t);
}

diff --git a/init/init_task.c b/init/init_task.c
index 9aa3ebc74970..aca34c89529f 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -65,7 +65,7 @@ struct task_struct init_task
#endif
.state = 0,
.stack = init_stack,
- .usage = ATOMIC_INIT(2),
+ .usage = REFCOUNT_INIT(2),
.flags = PF_KTHREAD,
.prio = MAX_PRIO - 20,
.static_prio = MAX_PRIO - 20,
diff --git a/kernel/fork.c b/kernel/fork.c
index 935a42d5f8ff..3f7e192e29f2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -717,7 +717,7 @@ static inline void put_signal_struct(struct signal_struct *sig)
void __put_task_struct(struct task_struct *tsk)
{
WARN_ON(!tsk->exit_state);
- WARN_ON(atomic_read(&tsk->usage));
+ WARN_ON(refcount_read(&tsk->usage));
WARN_ON(tsk == current);

cgroup_free(tsk);
@@ -896,7 +896,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
* One for us, one for whoever does the "release_task()" (usually
* parent)
*/
- atomic_set(&tsk->usage, 2);
+ refcount_set(&tsk->usage, 2);
#ifdef CONFIG_BLK_DEV_IO_TRACE
tsk->btrace_seq = 0;
#endif

Subject: [tip:sched/core] sched/core: Convert task_struct.stack_refcount to refcount_t

Commit-ID: f0b89d3958d73cd0785ec381f0ddf8efb6f183d8
Gitweb: https://git.kernel.org/tip/f0b89d3958d73cd0785ec381f0ddf8efb6f183d8
Author: Elena Reshetova <[email protected]>
AuthorDate: Fri, 18 Jan 2019 14:27:30 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 4 Feb 2019 08:53:56 +0100

sched/core: Convert task_struct.stack_refcount to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:

- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API that prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable task_struct.stack_refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

** Important note for maintainers:

Some functions from the refcount_t API defined in lib/refcount.c
have different memory ordering guarantees than their atomic
counterparts.

The full comparison can be seen in
https://lkml.org/lkml/2017/11/15/57 and will hopefully soon
be in a state to be merged into the documentation tree.

Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.

Please double check that you don't have any undocumented
memory ordering guarantees for this variable's usage.

For the task_struct.stack_refcount it might make a difference
in the following places:

- try_get_task_stack(): increment in refcount_inc_not_zero() only
guarantees control dependency on success vs. fully ordered
atomic counterpart
- put_task_stack(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and control dependency on success
vs. fully ordered atomic counterpart

Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/init_task.h | 1 +
include/linux/sched.h | 2 +-
include/linux/sched/task_stack.h | 2 +-
init/init_task.c | 2 +-
kernel/fork.c | 6 +++---
5 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index a7083a45a26c..6049baa5b8bc 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -13,6 +13,7 @@
#include <linux/securebits.h>
#include <linux/seqlock.h>
#include <linux/rbtree.h>
+#include <linux/refcount.h>
#include <linux/sched/autogroup.h>
#include <net/net_namespace.h>
#include <linux/sched/rt.h>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9d14d6864ca6..628bf13cb5a5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1194,7 +1194,7 @@ struct task_struct {
#endif
#ifdef CONFIG_THREAD_INFO_IN_TASK
/* A live task holds one reference: */
- atomic_t stack_refcount;
+ refcount_t stack_refcount;
#endif
#ifdef CONFIG_LIVEPATCH
int patch_state;
diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
index 6a841929073f..2413427e439c 100644
--- a/include/linux/sched/task_stack.h
+++ b/include/linux/sched/task_stack.h
@@ -61,7 +61,7 @@ static inline unsigned long *end_of_stack(struct task_struct *p)
#ifdef CONFIG_THREAD_INFO_IN_TASK
static inline void *try_get_task_stack(struct task_struct *tsk)
{
- return atomic_inc_not_zero(&tsk->stack_refcount) ?
+ return refcount_inc_not_zero(&tsk->stack_refcount) ?
task_stack_page(tsk) : NULL;
}

diff --git a/init/init_task.c b/init/init_task.c
index aca34c89529f..46dbf546264d 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -61,7 +61,7 @@ struct task_struct init_task
= {
#ifdef CONFIG_THREAD_INFO_IN_TASK
.thread_info = INIT_THREAD_INFO(init_task),
- .stack_refcount = ATOMIC_INIT(1),
+ .stack_refcount = REFCOUNT_INIT(1),
#endif
.state = 0,
.stack = init_stack,
diff --git a/kernel/fork.c b/kernel/fork.c
index 3f7e192e29f2..77059b211608 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -429,7 +429,7 @@ static void release_task_stack(struct task_struct *tsk)
#ifdef CONFIG_THREAD_INFO_IN_TASK
void put_task_stack(struct task_struct *tsk)
{
- if (atomic_dec_and_test(&tsk->stack_refcount))
+ if (refcount_dec_and_test(&tsk->stack_refcount))
release_task_stack(tsk);
}
#endif
@@ -447,7 +447,7 @@ void free_task(struct task_struct *tsk)
* If the task had a separate stack allocation, it should be gone
* by now.
*/
- WARN_ON_ONCE(atomic_read(&tsk->stack_refcount) != 0);
+ WARN_ON_ONCE(refcount_read(&tsk->stack_refcount) != 0);
#endif
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
@@ -867,7 +867,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
tsk->stack_vm_area = stack_vm_area;
#endif
#ifdef CONFIG_THREAD_INFO_IN_TASK
- atomic_set(&tsk->stack_refcount, 1);
+ refcount_set(&tsk->stack_refcount, 1);
#endif

if (err)

2019-02-05 11:58:53

by Elena Reshetova

Subject: RE: [PATCH 0/5] sched refcount_t conversions

> On Fri, Jan 18, 2019 at 02:27:25PM +0200, Elena Reshetova wrote:
> > Elena Reshetova (5):
> > sched: convert sighand_struct.count to refcount_t
> > sched: convert signal_struct.sigcnt to refcount_t
>
> These should really be seen by Oleg (bounced) and I'll await his reply.
>
> > sched: convert numa_group.refcount to refcount_t
> > sched/task_struct: convert task_struct.usage to refcount_t
> > sched/task_struct: convert task_struct.stack_refcount to refcount_t
>
> Those are ok and I'll queue them.

Just to double check that nothing got lost in between:
I noticed that the above task_struct patches got merged; is the numa_group one
also coming soon?

Best Regards,
Elena.

2019-02-05 12:32:18

by Peter Zijlstra

Subject: Re: [PATCH 0/5] sched refcount_t conversions

On Tue, Feb 05, 2019 at 11:57:00AM +0000, Reshetova, Elena wrote:
> > On Fri, Jan 18, 2019 at 02:27:25PM +0200, Elena Reshetova wrote:
> > > Elena Reshetova (5):
> > > sched: convert sighand_struct.count to refcount_t
> > > sched: convert signal_struct.sigcnt to refcount_t
> >
> > These should really be seen by Oleg (bounced) and I'll await his reply.
> >
> > > sched: convert numa_group.refcount to refcount_t
> > > sched/task_struct: convert task_struct.usage to refcount_t
> > > sched/task_struct: convert task_struct.stack_refcount to refcount_t
> >
> > Those are ok and I'll queue them.
>
> Just to double check that nothing got lost in between:
> I noticed that the above task_struct patches got merged; is the numa_group one
> also coming soon?

I have tip commit mails for all 5.