Now we have at least x86 support for ARCH_HAS_REFCOUNT merged, with
arm and other architectures on their way.
Changes in v5:
* Kees caught that the following changes in
perf_event_context.refcount and futex_pi_state.refcount
are no longer correct when ARCH_HAS_REFCOUNT is enabled:
- WARN_ON(!atomic_inc_not_zero(refcount));
+ refcount_inc(refcount);
So they are now changed back to using refcount_inc_not_zero; see the sketch below.
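For reference, the affected call sites keep the explicit non-zero check,
matching these helpers from the perf and futex patches later in this series:

static void get_ctx(struct perf_event_context *ctx)
{
	WARN_ON(!refcount_inc_not_zero(&ctx->refcount));
}

static void get_pi_state(struct futex_pi_state *pi_state)
{
	WARN_ON_ONCE(!refcount_inc_not_zero(&pi_state->refcount));
}
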
Changes in v4:
* just rebase and corrections on linux-next/master
Changes in v3:
* SoB chain corrected
* minor corrections based on v2 feedback
* rebase on linux-next/master as of today
Changes in v2:
* dropped already merged patches
* rebase on top of linux-next/master
* Now by default refcount_t = atomic_t (*) and uses all atomic
standard operations unless CONFIG_REFCOUNT_FULL is enabled.
This is a compromise for performance-critical subsystems (such
as net) that cannot accept even a slight delay in the refcounter
operations.
This series, for core kernel components, replaces atomic_t reference
counters with the new refcount_t type and API (see include/linux/refcount.h).
By doing this we prevent intentional or accidental
underflows or overflows that can lead to use-after-free vulnerabilities.
The patches are fully independent and can be cherry-picked separately.
If there are no objections to the patches, please merge them via respective trees.
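For readers new to the API, here is a minimal sketch of the conversion
pattern each patch applies. The struct and helper names below are made up
for illustration; only the refcount_t calls correspond to the real API:

#include <linux/refcount.h>
#include <linux/slab.h>

struct foo {
	refcount_t refcount;			/* was: atomic_t refcount; */
	/* ... payload ... */
};

static struct foo *foo_alloc(void)
{
	struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

	if (f)
		refcount_set(&f->refcount, 1);	/* was: atomic_set(..., 1) */
	return f;
}

static struct foo *foo_get(struct foo *f)
{
	refcount_inc(&f->refcount);		/* was: atomic_inc() */
	return f;
}

static void foo_put(struct foo *f)
{
	if (refcount_dec_and_test(&f->refcount))	/* was: atomic_dec_and_test() */
		kfree(f);
}

Reads of the counter value (e.g. in WARN_ON checks) move from atomic_read()
to refcount_read() in the same way.
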
Elena Reshetova (15):
sched: convert sighand_struct.count to refcount_t
sched: convert signal_struct.sigcnt to refcount_t
sched: convert user_struct.__count to refcount_t
sched: convert numa_group.refcount to refcount_t
sched/task_struct: convert task_struct.usage to refcount_t
sched/task_struct: convert task_struct.stack_refcount to refcount_t
perf: convert perf_event_context.refcount to refcount_t
perf/ring_buffer: convert ring_buffer.refcount to refcount_t
perf/ring_buffer: convert ring_buffer.aux_refcount to refcount_t
uprobes: convert uprobe.ref to refcount_t
nsproxy: convert nsproxy.count to refcount_t
groups: convert group_info.usage to refcount_t
creds: convert cred.usage to refcount_t
futex: convert futex_pi_state.refcount to refcount_t
kcov: convert kcov.refcount to refcount_t
fs/exec.c | 4 ++--
fs/proc/task_nommu.c | 2 +-
include/linux/cred.h | 13 ++++++------
include/linux/init_task.h | 7 +++---
include/linux/nsproxy.h | 6 +++---
include/linux/perf_event.h | 3 ++-
include/linux/sched.h | 5 +++--
include/linux/sched/signal.h | 5 +++--
include/linux/sched/task.h | 4 ++--
include/linux/sched/task_stack.h | 2 +-
include/linux/sched/user.h | 5 +++--
kernel/cred.c | 46 ++++++++++++++++++++--------------------
kernel/events/core.c | 18 ++++++++--------
kernel/events/internal.h | 5 +++--
kernel/events/ring_buffer.c | 8 +++----
kernel/events/uprobes.c | 8 +++----
kernel/fork.c | 24 ++++++++++-----------
kernel/futex.c | 13 ++++++------
kernel/groups.c | 2 +-
kernel/kcov.c | 9 ++++----
kernel/nsproxy.c | 6 +++---
kernel/sched/fair.c | 12 +++++------
kernel/user.c | 8 +++----
23 files changed, 112 insertions(+), 103 deletions(-)
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable sighand_struct.count is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
fs/exec.c | 4 ++--
fs/proc/task_nommu.c | 2 +-
include/linux/init_task.h | 2 +-
include/linux/sched/signal.h | 3 ++-
kernel/fork.c | 8 ++++----
5 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index daa19d8..5c8960d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1200,7 +1200,7 @@ static int de_thread(struct task_struct *tsk)
flush_itimer_signals();
#endif
- if (atomic_read(&oldsighand->count) != 1) {
+ if (refcount_read(&oldsighand->count) != 1) {
struct sighand_struct *newsighand;
/*
* This ->sighand is shared with the CLONE_SIGHAND
@@ -1210,7 +1210,7 @@ static int de_thread(struct task_struct *tsk)
if (!newsighand)
return -ENOMEM;
- atomic_set(&newsighand->count, 1);
+ refcount_set(&newsighand->count, 1);
memcpy(newsighand->action, oldsighand->action,
sizeof(newsighand->action));
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 8652209..5038b6b 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -63,7 +63,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
else
bytes += kobjsize(current->files);
- if (current->sighand && atomic_read(&current->sighand->count) > 1)
+ if (current->sighand && refcount_read(&current->sighand->count) > 1)
sbytes += kobjsize(current->sighand);
else
bytes += kobjsize(current->sighand);
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 3c07ace..6c6f520 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -85,7 +85,7 @@ extern struct fs_struct init_fs;
extern struct nsproxy init_nsproxy;
#define INIT_SIGHAND(sighand) { \
- .count = ATOMIC_INIT(1), \
+ .count = REFCOUNT_INIT(1), \
.action = { { { .sa_handler = SIG_DFL, } }, }, \
.siglock = __SPIN_LOCK_UNLOCKED(sighand.siglock), \
.signalfd_wqh = __WAIT_QUEUE_HEAD_INITIALIZER(sighand.signalfd_wqh), \
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 2a0dd40..4d5cdf1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -7,13 +7,14 @@
#include <linux/sched/jobctl.h>
#include <linux/sched/task.h>
#include <linux/cred.h>
+#include <linux/refcount.h>
/*
* Types defining task->signal and task->sighand and APIs using them:
*/
struct sighand_struct {
- atomic_t count;
+ refcount_t count;
struct k_sigaction action[_NSIG];
spinlock_t siglock;
wait_queue_head_t signalfd_wqh;
diff --git a/kernel/fork.c b/kernel/fork.c
index 8004fc7..43bcd20 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1320,7 +1320,7 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
struct sighand_struct *sig;
if (clone_flags & CLONE_SIGHAND) {
- atomic_inc(&current->sighand->count);
+ refcount_inc(&current->sighand->count);
return 0;
}
sig = kmem_cache_alloc(sighand_cachep, GFP_KERNEL);
@@ -1328,14 +1328,14 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
if (!sig)
return -ENOMEM;
- atomic_set(&sig->count, 1);
+ refcount_set(&sig->count, 1);
memcpy(sig->action, current->sighand->action, sizeof(sig->action));
return 0;
}
void __cleanup_sighand(struct sighand_struct *sighand)
{
- if (atomic_dec_and_test(&sighand->count)) {
+ if (refcount_dec_and_test(&sighand->count)) {
signalfd_cleanup(sighand);
/*
* sighand_cachep is SLAB_TYPESAFE_BY_RCU so we can free it
@@ -2236,7 +2236,7 @@ static int check_unshare_flags(unsigned long unshare_flags)
return -EINVAL;
}
if (unshare_flags & (CLONE_SIGHAND | CLONE_VM)) {
- if (atomic_read(&current->sighand->count) > 1)
+ if (refcount_read(&current->sighand->count) > 1)
return -EINVAL;
}
if (unshare_flags & CLONE_VM) {
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable task_struct.usage is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/init_task.h | 2 +-
include/linux/sched.h | 3 ++-
include/linux/sched/task.h | 4 ++--
kernel/fork.c | 4 ++--
4 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6c6f520..d312376 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -227,7 +227,7 @@ extern struct cred init_cred;
INIT_TASK_TI(tsk) \
.state = 0, \
.stack = init_stack, \
- .usage = ATOMIC_INIT(2), \
+ .usage = REFCOUNT_INIT(2), \
.flags = PF_KTHREAD, \
.prio = MAX_PRIO-20, \
.static_prio = MAX_PRIO-20, \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6110471..1eeb300 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -20,6 +20,7 @@
#include <linux/seccomp.h>
#include <linux/nodemask.h>
#include <linux/rcupdate.h>
+#include <linux/refcount.h>
#include <linux/resource.h>
#include <linux/latencytop.h>
#include <linux/sched/prio.h>
@@ -534,7 +535,7 @@ struct task_struct {
randomized_struct_fields_start
void *stack;
- atomic_t usage;
+ refcount_t usage;
/* Per task flags (PF_*), defined further below: */
unsigned int flags;
unsigned int ptrace;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 79a2a74..2ddc9b0 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -85,13 +85,13 @@ extern void sched_exec(void);
#define sched_exec() {}
#endif
-#define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
+#define get_task_struct(tsk) do { refcount_inc(&(tsk)->usage); } while(0)
extern void __put_task_struct(struct task_struct *t);
static inline void put_task_struct(struct task_struct *t)
{
- if (atomic_dec_and_test(&t->usage))
+ if (refcount_dec_and_test(&t->usage))
__put_task_struct(t);
}
diff --git a/kernel/fork.c b/kernel/fork.c
index d2e2eb0..1e54683 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -407,7 +407,7 @@ static inline void put_signal_struct(struct signal_struct *sig)
void __put_task_struct(struct task_struct *tsk)
{
WARN_ON(!tsk->exit_state);
- WARN_ON(atomic_read(&tsk->usage));
+ WARN_ON(refcount_read(&tsk->usage));
WARN_ON(tsk == current);
cgroup_free(tsk);
@@ -564,7 +564,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
* One for us, one for whoever does the "release_task()" (usually
* parent)
*/
- atomic_set(&tsk->usage, 2);
+ refcount_set(&tsk->usage, 2);
#ifdef CONFIG_BLK_DEV_IO_TRACE
tsk->btrace_seq = 0;
#endif
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable task_struct.stack_refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/init_task.h | 3 ++-
include/linux/sched.h | 2 +-
include/linux/sched/task_stack.h | 2 +-
kernel/fork.c | 6 +++---
4 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index d312376..af2ebe6 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -12,6 +12,7 @@
#include <linux/securebits.h>
#include <linux/seqlock.h>
#include <linux/rbtree.h>
+#include <linux/refcount.h>
#include <linux/sched/autogroup.h>
#include <net/net_namespace.h>
#include <linux/sched/rt.h>
@@ -207,7 +208,7 @@ extern struct cred init_cred;
#ifdef CONFIG_THREAD_INFO_IN_TASK
# define INIT_TASK_TI(tsk) \
.thread_info = INIT_THREAD_INFO(tsk), \
- .stack_refcount = ATOMIC_INIT(1),
+ .stack_refcount = REFCOUNT_INIT(1),
#else
# define INIT_TASK_TI(tsk)
#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1eeb300..3d21012 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1089,7 +1089,7 @@ struct task_struct {
#endif
#ifdef CONFIG_THREAD_INFO_IN_TASK
/* A live task holds one reference: */
- atomic_t stack_refcount;
+ refcount_t stack_refcount;
#endif
#ifdef CONFIG_LIVEPATCH
int patch_state;
diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
index df6ea66..aab3809 100644
--- a/include/linux/sched/task_stack.h
+++ b/include/linux/sched/task_stack.h
@@ -60,7 +60,7 @@ static inline unsigned long *end_of_stack(struct task_struct *p)
#ifdef CONFIG_THREAD_INFO_IN_TASK
static inline void *try_get_task_stack(struct task_struct *tsk)
{
- return atomic_inc_not_zero(&tsk->stack_refcount) ?
+ return refcount_inc_not_zero(&tsk->stack_refcount) ?
task_stack_page(tsk) : NULL;
}
diff --git a/kernel/fork.c b/kernel/fork.c
index 1e54683..4208c79 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -355,7 +355,7 @@ static void release_task_stack(struct task_struct *tsk)
#ifdef CONFIG_THREAD_INFO_IN_TASK
void put_task_stack(struct task_struct *tsk)
{
- if (atomic_dec_and_test(&tsk->stack_refcount))
+ if (refcount_dec_and_test(&tsk->stack_refcount))
release_task_stack(tsk);
}
#endif
@@ -373,7 +373,7 @@ void free_task(struct task_struct *tsk)
* If the task had a separate stack allocation, it should be gone
* by now.
*/
- WARN_ON_ONCE(atomic_read(&tsk->stack_refcount) != 0);
+ WARN_ON_ONCE(refcount_read(&tsk->stack_refcount) != 0);
#endif
rt_mutex_debug_task_free(tsk);
ftrace_graph_exit_task(tsk);
@@ -535,7 +535,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
tsk->stack_vm_area = stack_vm_area;
#endif
#ifdef CONFIG_THREAD_INFO_IN_TASK
- atomic_set(&tsk->stack_refcount, 1);
+ refcount_set(&tsk->stack_refcount, 1);
#endif
if (err)
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable user_struct.__count is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/sched/user.h | 5 +++--
kernel/user.c | 8 ++++----
2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index 3c07e41..afcbf19 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -3,6 +3,7 @@
#include <linux/uidgid.h>
#include <linux/atomic.h>
+#include <linux/refcount.h>
struct key;
@@ -10,7 +11,7 @@ struct key;
* Some day this will be a full-fledged user tracking system..
*/
struct user_struct {
- atomic_t __count; /* reference count */
+ refcount_t __count; /* reference count */
atomic_t processes; /* How many processes does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
#ifdef CONFIG_FANOTIFY
@@ -54,7 +55,7 @@ extern struct user_struct root_user;
extern struct user_struct * alloc_uid(kuid_t);
static inline struct user_struct *get_uid(struct user_struct *u)
{
- atomic_inc(&u->__count);
+ refcount_inc(&u->__count);
return u;
}
extern void free_uid(struct user_struct *);
diff --git a/kernel/user.c b/kernel/user.c
index 00281ad..c072348 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -90,7 +90,7 @@ static DEFINE_SPINLOCK(uidhash_lock);
/* root_user.__count is 1, for init task cred */
struct user_struct root_user = {
- .__count = ATOMIC_INIT(1),
+ .__count = REFCOUNT_INIT(1),
.processes = ATOMIC_INIT(1),
.sigpending = ATOMIC_INIT(0),
.locked_shm = 0,
@@ -116,7 +116,7 @@ static struct user_struct *uid_hash_find(kuid_t uid, struct hlist_head *hashent)
hlist_for_each_entry(user, hashent, uidhash_node) {
if (uid_eq(user->uid, uid)) {
- atomic_inc(&user->__count);
+ refcount_inc(&user->__count);
return user;
}
}
@@ -163,7 +163,7 @@ void free_uid(struct user_struct *up)
return;
local_irq_save(flags);
- if (atomic_dec_and_lock(&up->__count, &uidhash_lock))
+ if (refcount_dec_and_lock(&up->__count, &uidhash_lock))
free_user(up, flags);
else
local_irq_restore(flags);
@@ -184,7 +184,7 @@ struct user_struct *alloc_uid(kuid_t uid)
goto out_unlock;
new->uid = uid;
- atomic_set(&new->__count, 1);
+ refcount_set(&new->__count, 1);
/*
* Before adding this, check whether we raced
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable ring_buffer.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
kernel/events/core.c | 4 ++--
kernel/events/internal.h | 3 ++-
kernel/events/ring_buffer.c | 2 +-
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 57ef7ce..43da16ef 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5089,7 +5089,7 @@ struct ring_buffer *ring_buffer_get(struct perf_event *event)
rcu_read_lock();
rb = rcu_dereference(event->rb);
if (rb) {
- if (!atomic_inc_not_zero(&rb->refcount))
+ if (!refcount_inc_not_zero(&rb->refcount))
rb = NULL;
}
rcu_read_unlock();
@@ -5099,7 +5099,7 @@ struct ring_buffer *ring_buffer_get(struct perf_event *event)
void ring_buffer_put(struct ring_buffer *rb)
{
- if (!atomic_dec_and_test(&rb->refcount))
+ if (!refcount_dec_and_test(&rb->refcount))
return;
WARN_ON_ONCE(!list_empty(&rb->event_list));
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 486fd78..b8e6fdf 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -3,13 +3,14 @@
#include <linux/hardirq.h>
#include <linux/uaccess.h>
+#include <linux/refcount.h>
/* Buffer handling */
#define RING_BUFFER_WRITABLE 0x01
struct ring_buffer {
- atomic_t refcount;
+ refcount_t refcount;
struct rcu_head rcu_head;
#ifdef CONFIG_PERF_USE_VMALLOC
struct work_struct work;
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index ee97196..3353572 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -284,7 +284,7 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
else
rb->overwrite = 1;
- atomic_set(&rb->refcount, 1);
+ refcount_set(&rb->refcount, 1);
INIT_LIST_HEAD(&rb->event_list);
spin_lock_init(&rb->event_lock);
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable uprobe.ref is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
kernel/events/uprobes.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 456596d..ca093957 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -66,7 +66,7 @@ static struct percpu_rw_semaphore dup_mmap_sem;
struct uprobe {
struct rb_node rb_node; /* node in the rb tree */
- atomic_t ref;
+ refcount_t ref;
struct rw_semaphore register_rwsem;
struct rw_semaphore consumer_rwsem;
struct list_head pending_list;
@@ -371,13 +371,13 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v
static struct uprobe *get_uprobe(struct uprobe *uprobe)
{
- atomic_inc(&uprobe->ref);
+ refcount_inc(&uprobe->ref);
return uprobe;
}
static void put_uprobe(struct uprobe *uprobe)
{
- if (atomic_dec_and_test(&uprobe->ref))
+ if (refcount_dec_and_test(&uprobe->ref))
kfree(uprobe);
}
@@ -459,7 +459,7 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
rb_link_node(&uprobe->rb_node, parent, p);
rb_insert_color(&uprobe->rb_node, &uprobes_tree);
/* get access + creation ref */
- atomic_set(&uprobe->ref, 2);
+ refcount_set(&uprobe->ref, 2);
return u;
}
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable nsproxy.count is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/nsproxy.h | 6 +++---
kernel/nsproxy.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index ac0d65b..f862ba8 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -28,7 +28,7 @@ struct fs_struct;
* nsproxy is copied.
*/
struct nsproxy {
- atomic_t count;
+ refcount_t count;
struct uts_namespace *uts_ns;
struct ipc_namespace *ipc_ns;
struct mnt_namespace *mnt_ns;
@@ -74,14 +74,14 @@ int __init nsproxy_cache_init(void);
static inline void put_nsproxy(struct nsproxy *ns)
{
- if (atomic_dec_and_test(&ns->count)) {
+ if (refcount_dec_and_test(&ns->count)) {
free_nsproxy(ns);
}
}
static inline void get_nsproxy(struct nsproxy *ns)
{
- atomic_inc(&ns->count);
+ refcount_inc(&ns->count);
}
#endif
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f6c5d33..5bfe691 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -31,7 +31,7 @@
static struct kmem_cache *nsproxy_cachep;
struct nsproxy init_nsproxy = {
- .count = ATOMIC_INIT(1),
+ .count = REFCOUNT_INIT(1),
.uts_ns = &init_uts_ns,
#if defined(CONFIG_POSIX_MQUEUE) || defined(CONFIG_SYSVIPC)
.ipc_ns = &init_ipc_ns,
@@ -52,7 +52,7 @@ static inline struct nsproxy *create_nsproxy(void)
nsproxy = kmem_cache_alloc(nsproxy_cachep, GFP_KERNEL);
if (nsproxy)
- atomic_set(&nsproxy->count, 1);
+ refcount_set(&nsproxy->count, 1);
return nsproxy;
}
@@ -225,7 +225,7 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
p->nsproxy = new;
task_unlock(p);
- if (ns && atomic_dec_and_test(&ns->count))
+ if (ns && refcount_dec_and_test(&ns->count))
free_nsproxy(ns);
}
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable group_info.usage is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/cred.h | 7 ++++---
kernel/cred.c | 2 +-
kernel/groups.c | 2 +-
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 099058e..00948dd 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -17,6 +17,7 @@
#include <linux/key.h>
#include <linux/selinux.h>
#include <linux/atomic.h>
+#include <linux/refcount.h>
#include <linux/uidgid.h>
#include <linux/sched.h>
#include <linux/sched/user.h>
@@ -28,7 +29,7 @@ struct inode;
* COW Supplementary groups list
*/
struct group_info {
- atomic_t usage;
+ refcount_t usage;
int ngroups;
kgid_t gid[0];
} __randomize_layout;
@@ -44,7 +45,7 @@ struct group_info {
*/
static inline struct group_info *get_group_info(struct group_info *gi)
{
- atomic_inc(&gi->usage);
+ refcount_inc(&gi->usage);
return gi;
}
@@ -54,7 +55,7 @@ static inline struct group_info *get_group_info(struct group_info *gi)
*/
#define put_group_info(group_info) \
do { \
- if (atomic_dec_and_test(&(group_info)->usage)) \
+ if (refcount_dec_and_test(&(group_info)->usage)) \
groups_free(group_info); \
} while (0)
diff --git a/kernel/cred.c b/kernel/cred.c
index ecf0365..8122d7c 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -36,7 +36,7 @@ do { \
static struct kmem_cache *cred_jar;
/* init to 2 - one for init_task, one to ensure it is never freed */
-struct group_info init_groups = { .usage = ATOMIC_INIT(2) };
+struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
/*
* The initial credentials for the initial task
diff --git a/kernel/groups.c b/kernel/groups.c
index 434f666..5fc6e21 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -23,7 +23,7 @@ struct group_info *groups_alloc(int gidsetsize)
if (!gi)
return NULL;
- atomic_set(&gi->usage, 1);
+ refcount_set(&gi->usage, 1);
gi->ngroups = gidsetsize;
return gi;
}
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable cred.usage is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/cred.h | 6 +++---
kernel/cred.c | 44 ++++++++++++++++++++++----------------------
2 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 00948dd..a9f217b 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -109,7 +109,7 @@ extern bool may_setgroups(void);
* same context as task->real_cred.
*/
struct cred {
- atomic_t usage;
+ refcount_t usage;
#ifdef CONFIG_DEBUG_CREDENTIALS
atomic_t subscribers; /* number of processes subscribed */
void *put_addr;
@@ -222,7 +222,7 @@ static inline bool cap_ambient_invariant_ok(const struct cred *cred)
*/
static inline struct cred *get_new_cred(struct cred *cred)
{
- atomic_inc(&cred->usage);
+ refcount_inc(&cred->usage);
return cred;
}
@@ -262,7 +262,7 @@ static inline void put_cred(const struct cred *_cred)
struct cred *cred = (struct cred *) _cred;
validate_creds(cred);
- if (atomic_dec_and_test(&(cred)->usage))
+ if (refcount_dec_and_test(&(cred)->usage))
__put_cred(cred);
}
diff --git a/kernel/cred.c b/kernel/cred.c
index 8122d7c..49664e5 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -42,7 +42,7 @@ struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
* The initial credentials for the initial task
*/
struct cred init_cred = {
- .usage = ATOMIC_INIT(4),
+ .usage = REFCOUNT_INIT(4),
#ifdef CONFIG_DEBUG_CREDENTIALS
.subscribers = ATOMIC_INIT(2),
.magic = CRED_MAGIC,
@@ -101,17 +101,17 @@ static void put_cred_rcu(struct rcu_head *rcu)
#ifdef CONFIG_DEBUG_CREDENTIALS
if (cred->magic != CRED_MAGIC_DEAD ||
- atomic_read(&cred->usage) != 0 ||
+ refcount_read(&cred->usage) != 0 ||
read_cred_subscribers(cred) != 0)
panic("CRED: put_cred_rcu() sees %p with"
" mag %x, put %p, usage %d, subscr %d\n",
cred, cred->magic, cred->put_addr,
- atomic_read(&cred->usage),
+ refcount_read(&cred->usage),
read_cred_subscribers(cred));
#else
- if (atomic_read(&cred->usage) != 0)
+ if (refcount_read(&cred->usage) != 0)
panic("CRED: put_cred_rcu() sees %p with usage %d\n",
- cred, atomic_read(&cred->usage));
+ cred, refcount_read(&cred->usage));
#endif
security_cred_free(cred);
@@ -135,10 +135,10 @@ static void put_cred_rcu(struct rcu_head *rcu)
void __put_cred(struct cred *cred)
{
kdebug("__put_cred(%p{%d,%d})", cred,
- atomic_read(&cred->usage),
+ refcount_read(&cred->usage),
read_cred_subscribers(cred));
- BUG_ON(atomic_read(&cred->usage) != 0);
+ BUG_ON(refcount_read(&cred->usage) != 0);
#ifdef CONFIG_DEBUG_CREDENTIALS
BUG_ON(read_cred_subscribers(cred) != 0);
cred->magic = CRED_MAGIC_DEAD;
@@ -159,7 +159,7 @@ void exit_creds(struct task_struct *tsk)
struct cred *cred;
kdebug("exit_creds(%u,%p,%p,{%d,%d})", tsk->pid, tsk->real_cred, tsk->cred,
- atomic_read(&tsk->cred->usage),
+ refcount_read(&tsk->cred->usage),
read_cred_subscribers(tsk->cred));
cred = (struct cred *) tsk->real_cred;
@@ -194,7 +194,7 @@ const struct cred *get_task_cred(struct task_struct *task)
do {
cred = __task_cred((task));
BUG_ON(!cred);
- } while (!atomic_inc_not_zero(&((struct cred *)cred)->usage));
+ } while (!refcount_inc_not_zero(&((struct cred *)cred)->usage));
rcu_read_unlock();
return cred;
@@ -212,7 +212,7 @@ struct cred *cred_alloc_blank(void)
if (!new)
return NULL;
- atomic_set(&new->usage, 1);
+ refcount_set(&new->usage, 1);
#ifdef CONFIG_DEBUG_CREDENTIALS
new->magic = CRED_MAGIC;
#endif
@@ -258,7 +258,7 @@ struct cred *prepare_creds(void)
old = task->cred;
memcpy(new, old, sizeof(struct cred));
- atomic_set(&new->usage, 1);
+ refcount_set(&new->usage, 1);
set_cred_subscribers(new, 0);
get_group_info(new->group_info);
get_uid(new->user);
@@ -335,7 +335,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
get_cred(p->cred);
alter_cred_subscribers(p->cred, 2);
kdebug("share_creds(%p{%d,%d})",
- p->cred, atomic_read(&p->cred->usage),
+ p->cred, refcount_read(&p->cred->usage),
read_cred_subscribers(p->cred));
atomic_inc(&p->cred->user->processes);
return 0;
@@ -426,7 +426,7 @@ int commit_creds(struct cred *new)
const struct cred *old = task->real_cred;
kdebug("commit_creds(%p{%d,%d})", new,
- atomic_read(&new->usage),
+ refcount_read(&new->usage),
read_cred_subscribers(new));
BUG_ON(task->cred != old);
@@ -435,7 +435,7 @@ int commit_creds(struct cred *new)
validate_creds(old);
validate_creds(new);
#endif
- BUG_ON(atomic_read(&new->usage) < 1);
+ BUG_ON(refcount_read(&new->usage) < 1);
get_cred(new); /* we will require a ref for the subj creds too */
@@ -500,13 +500,13 @@ EXPORT_SYMBOL(commit_creds);
void abort_creds(struct cred *new)
{
kdebug("abort_creds(%p{%d,%d})", new,
- atomic_read(&new->usage),
+ refcount_read(&new->usage),
read_cred_subscribers(new));
#ifdef CONFIG_DEBUG_CREDENTIALS
BUG_ON(read_cred_subscribers(new) != 0);
#endif
- BUG_ON(atomic_read(&new->usage) < 1);
+ BUG_ON(refcount_read(&new->usage) < 1);
put_cred(new);
}
EXPORT_SYMBOL(abort_creds);
@@ -523,7 +523,7 @@ const struct cred *override_creds(const struct cred *new)
const struct cred *old = current->cred;
kdebug("override_creds(%p{%d,%d})", new,
- atomic_read(&new->usage),
+ refcount_read(&new->usage),
read_cred_subscribers(new));
validate_creds(old);
@@ -534,7 +534,7 @@ const struct cred *override_creds(const struct cred *new)
alter_cred_subscribers(old, -1);
kdebug("override_creds() = %p{%d,%d}", old,
- atomic_read(&old->usage),
+ refcount_read(&old->usage),
read_cred_subscribers(old));
return old;
}
@@ -552,7 +552,7 @@ void revert_creds(const struct cred *old)
const struct cred *override = current->cred;
kdebug("revert_creds(%p{%d,%d})", old,
- atomic_read(&old->usage),
+ refcount_read(&old->usage),
read_cred_subscribers(old));
validate_creds(old);
@@ -611,7 +611,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
validate_creds(old);
*new = *old;
- atomic_set(&new->usage, 1);
+ refcount_set(&new->usage, 1);
set_cred_subscribers(new, 0);
get_uid(new->user);
get_user_ns(new->user_ns);
@@ -735,7 +735,7 @@ static void dump_invalid_creds(const struct cred *cred, const char *label,
printk(KERN_ERR "CRED: ->magic=%x, put_addr=%p\n",
cred->magic, cred->put_addr);
printk(KERN_ERR "CRED: ->usage=%d, subscr=%d\n",
- atomic_read(&cred->usage),
+ refcount_read(&cred->usage),
read_cred_subscribers(cred));
printk(KERN_ERR "CRED: ->*uid = { %d,%d,%d,%d }\n",
from_kuid_munged(&init_user_ns, cred->uid),
@@ -809,7 +809,7 @@ void validate_creds_for_do_exit(struct task_struct *tsk)
{
kdebug("validate_creds_for_do_exit(%p,%p{%d,%d})",
tsk->real_cred, tsk->cred,
- atomic_read(&tsk->cred->usage),
+ refcount_read(&tsk->cred->usage),
read_cred_subscribers(tsk->cred));
__validate_process_creds(tsk, __FILE__, __LINE__);
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable futex_pi_state.refcount is used as a pure
reference counter. Convert it to refcount_t and fix up
the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
kernel/futex.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 0939255..65460b4 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -67,6 +67,7 @@
#include <linux/freezer.h>
#include <linux/bootmem.h>
#include <linux/fault-inject.h>
+#include <linux/refcount.h>
#include <asm/futex.h>
@@ -209,7 +210,7 @@ struct futex_pi_state {
struct rt_mutex pi_mutex;
struct task_struct *owner;
- atomic_t refcount;
+ refcount_t refcount;
union futex_key key;
} __randomize_layout;
@@ -795,7 +796,7 @@ static int refill_pi_state_cache(void)
INIT_LIST_HEAD(&pi_state->list);
/* pi_mutex gets initialized later */
pi_state->owner = NULL;
- atomic_set(&pi_state->refcount, 1);
+ refcount_set(&pi_state->refcount, 1);
pi_state->key = FUTEX_KEY_INIT;
current->pi_state_cache = pi_state;
@@ -815,7 +816,7 @@ static struct futex_pi_state *alloc_pi_state(void)
static void get_pi_state(struct futex_pi_state *pi_state)
{
- WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount));
+ WARN_ON_ONCE(!refcount_inc_not_zero(&pi_state->refcount));
}
/*
@@ -829,7 +830,7 @@ static void put_pi_state(struct futex_pi_state *pi_state)
if (!pi_state)
return;
- if (!atomic_dec_and_test(&pi_state->refcount))
+ if (!refcount_dec_and_test(&pi_state->refcount))
return;
/*
@@ -853,7 +854,7 @@ static void put_pi_state(struct futex_pi_state *pi_state)
* refcount is at 0 - put it back to 1.
*/
pi_state->owner = NULL;
- atomic_set(&pi_state->refcount, 1);
+ refcount_set(&pi_state->refcount, 1);
current->pi_state_cache = pi_state;
}
}
@@ -1051,7 +1052,7 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
* and futex_wait_requeue_pi() as it cannot go to 0 and consequently
* free pi_state before we can take a reference ourselves.
*/
- WARN_ON(!atomic_read(&pi_state->refcount));
+ WARN_ON(!refcount_read(&pi_state->refcount));
/*
* Now that we have a pi_state, we can acquire wait_lock
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable kcov.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
kernel/kcov.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/kcov.c b/kernel/kcov.c
index cd77199..a6c7df3 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -19,6 +19,7 @@
#include <linux/debugfs.h>
#include <linux/uaccess.h>
#include <linux/kcov.h>
+#include <linux/refcount.h>
#include <asm/setup.h>
/*
@@ -35,7 +36,7 @@ struct kcov {
* - opened file descriptor
* - task with enabled coverage (we can't unwire it from another task)
*/
- atomic_t refcount;
+ refcount_t refcount;
/* The lock protects mode, size, area and t. */
spinlock_t lock;
enum kcov_mode mode;
@@ -94,12 +95,12 @@ EXPORT_SYMBOL(__sanitizer_cov_trace_pc);
static void kcov_get(struct kcov *kcov)
{
- atomic_inc(&kcov->refcount);
+ refcount_inc(&kcov->refcount);
}
static void kcov_put(struct kcov *kcov)
{
- if (atomic_dec_and_test(&kcov->refcount)) {
+ if (refcount_dec_and_test(&kcov->refcount)) {
vfree(kcov->area);
kfree(kcov);
}
@@ -175,7 +176,7 @@ static int kcov_open(struct inode *inode, struct file *filep)
kcov = kzalloc(sizeof(*kcov), GFP_KERNEL);
if (!kcov)
return -ENOMEM;
- atomic_set(&kcov->refcount, 1);
+ refcount_set(&kcov->refcount, 1);
spin_lock_init(&kcov->lock);
filep->private_data = kcov;
return nonseekable_open(inode, filep);
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable ring_buffer.aux_refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
kernel/events/core.c | 2 +-
kernel/events/internal.h | 2 +-
kernel/events/ring_buffer.c | 6 +++---
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 43da16ef..09c6704 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5164,7 +5164,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
/* this has to be the last one */
rb_free_aux(rb);
- WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
+ WARN_ON_ONCE(refcount_read(&rb->aux_refcount));
mutex_unlock(&event->mmap_mutex);
}
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index b8e6fdf..fb55716 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -48,7 +48,7 @@ struct ring_buffer {
atomic_t aux_mmap_count;
unsigned long aux_mmap_locked;
void (*free_aux)(void *);
- atomic_t aux_refcount;
+ refcount_t aux_refcount;
void **aux_pages;
void *aux_priv;
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 3353572..ba12fd2 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -357,7 +357,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
if (!atomic_read(&rb->aux_mmap_count))
goto err;
- if (!atomic_inc_not_zero(&rb->aux_refcount))
+ if (!refcount_inc_not_zero(&rb->aux_refcount))
goto err;
/*
@@ -648,7 +648,7 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
* we keep a refcount here to make sure either of the two can
* reference them safely.
*/
- atomic_set(&rb->aux_refcount, 1);
+ refcount_set(&rb->aux_refcount, 1);
rb->aux_overwrite = overwrite;
rb->aux_watermark = watermark;
@@ -667,7 +667,7 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
void rb_free_aux(struct ring_buffer *rb)
{
- if (atomic_dec_and_test(&rb->aux_refcount))
+ if (refcount_dec_and_test(&rb->aux_refcount))
__rb_free_aux(rb);
}
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable perf_event_context.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/perf_event.h | 3 ++-
kernel/events/core.c | 12 ++++++------
2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9bac4bf..75c11181 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -54,6 +54,7 @@ struct perf_guest_info_callbacks {
#include <linux/perf_regs.h>
#include <linux/workqueue.h>
#include <linux/cgroup.h>
+#include <linux/refcount.h>
#include <asm/local.h>
struct perf_callchain_entry {
@@ -737,7 +738,7 @@ struct perf_event_context {
int nr_stat;
int nr_freq;
int rotate_disable;
- atomic_t refcount;
+ refcount_t refcount;
struct task_struct *task;
/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 852768d..57ef7ce 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1109,7 +1109,7 @@ static void perf_event_ctx_deactivate(struct perf_event_context *ctx)
static void get_ctx(struct perf_event_context *ctx)
{
- WARN_ON(!atomic_inc_not_zero(&ctx->refcount));
+ WARN_ON(!refcount_inc_not_zero(&ctx->refcount));
}
static void free_ctx(struct rcu_head *head)
@@ -1123,7 +1123,7 @@ static void free_ctx(struct rcu_head *head)
static void put_ctx(struct perf_event_context *ctx)
{
- if (atomic_dec_and_test(&ctx->refcount)) {
+ if (refcount_dec_and_test(&ctx->refcount)) {
if (ctx->parent_ctx)
put_ctx(ctx->parent_ctx);
if (ctx->task && ctx->task != TASK_TOMBSTONE)
@@ -1201,7 +1201,7 @@ perf_event_ctx_lock_nested(struct perf_event *event, int nesting)
again:
rcu_read_lock();
ctx = ACCESS_ONCE(event->ctx);
- if (!atomic_inc_not_zero(&ctx->refcount)) {
+ if (!refcount_inc_not_zero(&ctx->refcount)) {
rcu_read_unlock();
goto again;
}
@@ -1329,7 +1329,7 @@ perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
}
if (ctx->task == TASK_TOMBSTONE ||
- !atomic_inc_not_zero(&ctx->refcount)) {
+ !refcount_inc_not_zero(&ctx->refcount)) {
raw_spin_unlock(&ctx->lock);
ctx = NULL;
} else {
@@ -3795,7 +3795,7 @@ static void __perf_event_init_context(struct perf_event_context *ctx)
INIT_LIST_HEAD(&ctx->pinned_groups);
INIT_LIST_HEAD(&ctx->flexible_groups);
INIT_LIST_HEAD(&ctx->event_list);
- atomic_set(&ctx->refcount, 1);
+ refcount_set(&ctx->refcount, 1);
}
static struct perf_event_context *
@@ -9812,7 +9812,7 @@ __perf_event_ctx_lock_double(struct perf_event *group_leader,
again:
rcu_read_lock();
gctx = READ_ONCE(group_leader->ctx);
- if (!atomic_inc_not_zero(&gctx->refcount)) {
+ if (!refcount_inc_not_zero(&gctx->refcount)) {
rcu_read_unlock();
goto again;
}
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable signal_struct.sigcnt is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
include/linux/sched/signal.h | 2 +-
kernel/fork.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 4d5cdf1..c5f1a67 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -77,7 +77,7 @@ struct thread_group_cputimer {
* the locking of signal_struct.
*/
struct signal_struct {
- atomic_t sigcnt;
+ refcount_t sigcnt;
atomic_t live;
int nr_threads;
struct list_head thread_head;
diff --git a/kernel/fork.c b/kernel/fork.c
index 43bcd20..d2e2eb0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -400,7 +400,7 @@ static inline void free_signal_struct(struct signal_struct *sig)
static inline void put_signal_struct(struct signal_struct *sig)
{
- if (atomic_dec_and_test(&sig->sigcnt))
+ if (refcount_dec_and_test(&sig->sigcnt))
free_signal_struct(sig);
}
@@ -1382,7 +1382,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->nr_threads = 1;
atomic_set(&sig->live, 1);
- atomic_set(&sig->sigcnt, 1);
+ refcount_set(&sig->sigcnt, 1);
/* list_add(thread_node, thread_head) without INIT_LIST_HEAD() */
sig->thread_head = (struct list_head)LIST_HEAD_INIT(tsk->thread_node);
@@ -1887,7 +1887,7 @@ static __latent_entropy struct task_struct *copy_process(
} else {
current->signal->nr_threads++;
atomic_inc(&current->signal->live);
- atomic_inc(&current->signal->sigcnt);
+ refcount_inc(&current->signal->sigcnt);
list_add_tail_rcu(&p->thread_group,
&p->group_leader->thread_group);
list_add_tail_rcu(&p->thread_node,
--
2.7.4
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.
The variable numa_group.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: David Windsor <[email protected]>
Reviewed-by: Hans Liljestrand <[email protected]>
Signed-off-by: Elena Reshetova <[email protected]>
---
kernel/sched/fair.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a5d83ed..fe4e38c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1059,7 +1059,7 @@ unsigned int sysctl_numa_balancing_scan_size = 256;
unsigned int sysctl_numa_balancing_scan_delay = 1000;
struct numa_group {
- atomic_t refcount;
+ refcount_t refcount;
spinlock_t lock; /* nr_tasks, tasks */
int nr_tasks;
@@ -1128,7 +1128,7 @@ static unsigned int task_scan_start(struct task_struct *p)
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
- period *= atomic_read(&ng->refcount);
+ period *= refcount_read(&ng->refcount);
period *= shared + 1;
period /= private + shared + 1;
}
@@ -1151,7 +1151,7 @@ static unsigned int task_scan_max(struct task_struct *p)
unsigned long private = group_faults_priv(ng);
unsigned long period = smax;
- period *= atomic_read(&ng->refcount);
+ period *= refcount_read(&ng->refcount);
period *= shared + 1;
period /= private + shared + 1;
@@ -2236,12 +2236,12 @@ static void task_numa_placement(struct task_struct *p)
static inline int get_numa_group(struct numa_group *grp)
{
- return atomic_inc_not_zero(&grp->refcount);
+ return refcount_inc_not_zero(&grp->refcount);
}
static inline void put_numa_group(struct numa_group *grp)
{
- if (atomic_dec_and_test(&grp->refcount))
+ if (refcount_dec_and_test(&grp->refcount))
kfree_rcu(grp, rcu);
}
@@ -2262,7 +2262,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
if (!grp)
return;
- atomic_set(&grp->refcount, 1);
+ refcount_set(&grp->refcount, 1);
grp->active_nodes = 1;
grp->max_faults_cpu = 0;
spin_lock_init(&grp->lock);
--
2.7.4
On Wed, Aug 30, 2017 at 5:22 AM, Elena Reshetova
<[email protected]> wrote:
> Now we have at least x86 support for ARCH_HAS_REFCOUNT merged, with
> arm and other architectures on their way.
>
> Changes in v5:
> * Kees caught that the following changes in
> perf_event_context.refcount and futex_pi_state.refcount
> are no longer correct when ARCH_HAS_REFCOUNT is enabled:
> - WARN_ON(!atomic_inc_not_zero(refcount));
> + refcount_inc(refcount);
> So they are now changed back to using refcount_inc_not_zero.
Thanks!
Reviewed-by: Kees Cook <[email protected]>
Andrew, are you able to carry these patches in -mm, since they span a
bunch of core kernel areas?
-Kees
>
> Changes in v4:
> * just rebase and corrections on linux-next/master
>
> Changes in v3:
> * SoB chain corrected
> * minor corrections based on v2 feedback
> * rebase on linux-next/master as of today
>
> Changes in v2:
> * dropped already merged patches
> * rebase on top of linux-next/master
> * Now by default refcount_t = atomic_t (*) and uses all atomic
> standard operations unless CONFIG_REFCOUNT_FULL is enabled.
> This is a compromise for performance-critical subsystems (such
> as net) that cannot accept even a slight delay in the refcounter
> operations.
>
> This series, for core kernel components, replaces atomic_t reference
> counters with the new refcount_t type and API (see include/linux/refcount.h).
> By doing this we prevent intentional or accidental
> underflows or overflows that can lead to use-after-free vulnerabilities.
>
> The patches are fully independent and can be cherry-picked separately.
> If there are no objections to the patches, please merge them via respective trees.
>
>
> Elena Reshetova (15):
> sched: convert sighand_struct.count to refcount_t
> sched: convert signal_struct.sigcnt to refcount_t
> sched: convert user_struct.__count to refcount_t
> sched: convert numa_group.refcount to refcount_t
> sched/task_struct: convert task_struct.usage to refcount_t
> sched/task_struct: convert task_struct.stack_refcount to refcount_t
> perf: convert perf_event_context.refcount to refcount_t
> perf/ring_buffer: convert ring_buffer.refcount to refcount_t
> perf/ring_buffer: convert ring_buffer.aux_refcount to refcount_t
> uprobes: convert uprobe.ref to refcount_t
> nsproxy: convert nsproxy.count to refcount_t
> groups: convert group_info.usage to refcount_t
> creds: convert cred.usage to refcount_t
> futex: convert futex_pi_state.refcount to refcount_t
> kcov: convert kcov.refcount to refcount_t
>
> fs/exec.c | 4 ++--
> fs/proc/task_nommu.c | 2 +-
> include/linux/cred.h | 13 ++++++------
> include/linux/init_task.h | 7 +++---
> include/linux/nsproxy.h | 6 +++---
> include/linux/perf_event.h | 3 ++-
> include/linux/sched.h | 5 +++--
> include/linux/sched/signal.h | 5 +++--
> include/linux/sched/task.h | 4 ++--
> include/linux/sched/task_stack.h | 2 +-
> include/linux/sched/user.h | 5 +++--
> kernel/cred.c | 46 ++++++++++++++++++++--------------------
> kernel/events/core.c | 18 ++++++++--------
> kernel/events/internal.h | 5 +++--
> kernel/events/ring_buffer.c | 8 +++----
> kernel/events/uprobes.c | 8 +++----
> kernel/fork.c | 24 ++++++++++-----------
> kernel/futex.c | 13 ++++++------
> kernel/groups.c | 2 +-
> kernel/kcov.c | 9 ++++----
> kernel/nsproxy.c | 6 +++---
> kernel/sched/fair.c | 12 +++++------
> kernel/user.c | 8 +++----
> 23 files changed, 112 insertions(+), 103 deletions(-)
>
> --
> 2.7.4
>
--
Kees Cook
Pixel Security
On Wed, 30 Aug 2017, Elena Reshetova wrote:
> atomic_t variables are currently used to implement reference
> counters with the following properties:
> - counter is initialized to 1 using atomic_set()
> - a resource is freed upon counter reaching zero
> - once counter reaches zero, its further
> increments aren't allowed
> - counter schema uses basic atomic operations
> (set, inc, inc_not_zero, dec_and_test, etc.)
>
> Such atomic variables should be converted to a newly provided
> refcount_t type and API that prevents accidental counter overflows
> and underflows. This is important since overflows and underflows
> can lead to a use-after-free situation and be exploitable.
>
> The variable futex_pi_state.refcount is used as a pure
> reference counter. Convert it to refcount_t and fix up
> the operations.
>
> Suggested-by: Kees Cook <[email protected]>
> Reviewed-by: David Windsor <[email protected]>
> Reviewed-by: Hans Liljestrand <[email protected]>
> Signed-off-by: Elena Reshetova <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > atomic_t variables are currently used to implement reference
> > counters with the following properties:
> > - counter is initialized to 1 using atomic_set()
> > - a resource is freed upon counter reaching zero
> > - once counter reaches zero, its further
> > increments aren't allowed
> > - counter schema uses basic atomic operations
> > (set, inc, inc_not_zero, dec_and_test, etc.)
> >
> > Such atomic variables should be converted to a newly provided
> > refcount_t type and API that prevents accidental counter overflows
> > and underflows. This is important since overflows and underflows
> > can lead to use-after-free situation and be exploitable.
> >
> > The variable futex_pi_state.refcount is used as pure
> > reference counter. Convert it to refcount_t and fix up
> > the operations.
> >
> > Suggested-by: Kees Cook <[email protected]>
> > Reviewed-by: David Windsor <[email protected]>
> > Reviewed-by: Hans Liljestrand <[email protected]>
> > Signed-off-by: Elena Reshetova <[email protected]>
>
> Reviewed-by: Thomas Gleixner <[email protected]>
So the thing to be careful with for things like futex and some of the
other core kernel code is the memory ordering.
atomic_dec_and_test() provides a full smp_mb() before and after,
refcount_dec_and_test() only provides release semantics.
This is typically sufficient, and I would argue that if we rely on more
than that, there _should_ be a comment, however reality isn't always as
nice.
That said, I think this conversion is OK, pi_state->refcount isn't
relied upon to provide additional memory ordering above and beyond what
refcounting requires.
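To make the difference concrete, here is a minimal sketch of a typical put
path (struct foo and foo_put() are made-up names, not taken from this
series); the explicit barrier is only needed if the surrounding code relied
on the full smp_mb() that atomic_dec_and_test() used to imply:

#include <linux/refcount.h>
#include <linux/slab.h>

struct foo {
	refcount_t ref;
	/* payload */
};

static void foo_put(struct foo *f)
{
	/*
	 * refcount_dec_and_test() only guarantees release ordering on the
	 * decrement.  If callers depended on the full barrier implied by
	 * atomic_dec_and_test(), that dependency should be documented and
	 * made explicit, e.g.:
	 */
	smp_mb__before_atomic();	/* only if stronger ordering is actually required */

	if (refcount_dec_and_test(&f->ref))
		kfree(f);
}

For a pure refcount the explicit barrier is not needed at all; release
ordering on the decrement is all that refcounting requires.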
On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > atomic_t variables are currently used to implement reference
> > > counters with the following properties:
> > > - counter is initialized to 1 using atomic_set()
> > > - a resource is freed upon counter reaching zero
> > > - once counter reaches zero, its further
> > > increments aren't allowed
> > > - counter schema uses basic atomic operations
> > > (set, inc, inc_not_zero, dec_and_test, etc.)
> > >
> > > Such atomic variables should be converted to a newly provided
> > > refcount_t type and API that prevents accidental counter overflows
> > > and underflows. This is important since overflows and underflows
> > > can lead to use-after-free situation and be exploitable.
> > >
> > > The variable futex_pi_state.refcount is used as pure
> > > reference counter. Convert it to refcount_t and fix up
> > > the operations.
> > >
> > > Suggested-by: Kees Cook <[email protected]>
> > > Reviewed-by: David Windsor <[email protected]>
> > > Reviewed-by: Hans Liljestrand <[email protected]>
> > > Signed-off-by: Elena Reshetova <[email protected]>
> >
> > Reviewed-by: Thomas Gleixner <[email protected]>
>
> So the thing to be careful with for things like futex and some of the
> other core kernel code is the memory ordering.
>
> atomic_dec_and_test() provides a full smp_mb() before and after,
> refcount_dec_and_test() only provides release semantics.
>
> This is typically sufficient, and I would argue that if we rely on more
> than that, there _should_ be a comment, however reality isn't always as
> nice.
>
> That said, I think this conversion is OK, pi_state->refcount isn't
> relied upon to provide additional memory ordering above and beyond what
> refcounting requires.
So the changelogs should reflect that. The current one suggests that this
is a one-to-one replacement for atomic_t, just with the extra sanity checks
added.
Thanks,
tglx
On Thu, Aug 31, 2017 at 04:48:00PM -0700, Kees Cook wrote:
> On Wed, Aug 30, 2017 at 5:22 AM, Elena Reshetova
> <[email protected]> wrote:
> > Now we have at least x86 support for ARCH_HAS_REFCOUNT merged and
> > arm and others on their way.
> >
> > Changes in v5:
> > * Kees catched that the following changes in
> > perf_event_context.refcount and futex_pi_state.refcount
> > are not correct now when ARCH_HAS_REFCOUNT is enabled:
> > - WARN_ON(!atomic_inc_not_zero(refcount));
> > + refcount_inc(refcount);
> > So they are now changed back to using refcount_inc_not_zero.
>
> Thanks!
>
> Reviewed-by: Kees Cook <[email protected]>
>
> Andrew, are you able to carry these patches in -mm, since they span a
> bunch of core kernel areas?
No.. these patches should go through the regular trees that maintain
these various parts.
> On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > > atomic_t variables are currently used to implement reference
> > > > counters with the following properties:
> > > > - counter is initialized to 1 using atomic_set()
> > > > - a resource is freed upon counter reaching zero
> > > > - once counter reaches zero, its further
> > > > increments aren't allowed
> > > > - counter schema uses basic atomic operations
> > > > (set, inc, inc_not_zero, dec_and_test, etc.)
> > > >
> > > > Such atomic variables should be converted to a newly provided
> > > > refcount_t type and API that prevents accidental counter overflows
> > > > and underflows. This is important since overflows and underflows
> > > > can lead to use-after-free situation and be exploitable.
> > > >
> > > > The variable futex_pi_state.refcount is used as pure
> > > > reference counter. Convert it to refcount_t and fix up
> > > > the operations.
> > > >
> > > > Suggested-by: Kees Cook <[email protected]>
> > > > Reviewed-by: David Windsor <[email protected]>
> > > > Reviewed-by: Hans Liljestrand <[email protected]>
> > > > Signed-off-by: Elena Reshetova <[email protected]>
> > >
> > > Reviewed-by: Thomas Gleixner <[email protected]>
> >
> > So the thing to be careful with for things like futex and some of the
> > other core kernel code is the memory ordering.
> >
> > atomic_dec_and_test() provides a full smp_mb() before and after,
> > refcount_dec_and_test() only provides release semantics.
> >
> > This is typically sufficient, and I would argue that if we rely on more
> > than that, there _should_ be a comment, however reality isn't always as
> > nice.
> >
> > That said, I think this conversion is OK, pi_state->refcount isn't
> > relied upon to provide additional memory ordering above and beyond what
> > refcounting requires.
>
> So the changelogs should reflect that. The current one suggests that this
> is a one to one replacement for atomic_t just with the extra sanity checks
> added.
I will update the commit texts accordingly and resend the whole series, since
this should then be mentioned in every commit to make sure it is not missed.
Best Regards,
Elena.
>
> Thanks,
>
> tglx
> On Fri, 1 Sep 2017, Peter Zijlstra wrote:
> > > On Fri, Sep 01, 2017 at 09:39:50AM +0200, Thomas Gleixner wrote:
> > > > On Wed, 30 Aug 2017, Elena Reshetova wrote:
> > > > > atomic_t variables are currently used to implement reference
> > > > > counters with the following properties:
> > > > > - counter is initialized to 1 using atomic_set()
> > > > > - a resource is freed upon counter reaching zero
> > > > > - once counter reaches zero, its further
> > > > > increments aren't allowed
> > > > > - counter schema uses basic atomic operations
> > > > > (set, inc, inc_not_zero, dec_and_test, etc.)
> > > > >
> > > > > Such atomic variables should be converted to a newly provided
> > > > > refcount_t type and API that prevents accidental counter overflows
> > > > > and underflows. This is important since overflows and underflows
> > > > > can lead to use-after-free situation and be exploitable.
> > > > >
> > > > > The variable futex_pi_state.refcount is used as pure
> > > > > reference counter. Convert it to refcount_t and fix up
> > > > > the operations.
> > > > >
> > > > > Suggested-by: Kees Cook <[email protected]>
> > > > > Reviewed-by: David Windsor <[email protected]>
> > > > > Reviewed-by: Hans Liljestrand <[email protected]>
> > > > > Signed-off-by: Elena Reshetova <[email protected]>
> > > >
> > > > Reviewed-by: Thomas Gleixner <[email protected]>
> > >
> > > So the thing to be careful with for things like futex and some of the
> > > other core kernel code is the memory ordering.
> > >
> > > atomic_dec_and_test() provides a full smp_mb() before and after,
> > > refcount_dec_and_test() only provides release semantics.
> > >
> > > This is typically sufficient, and I would argue that if we rely on more
> > > than that, there _should_ be a comment, however reality isn't always as
> > > nice.
> > >
> > > That said, I think this conversion is OK, pi_state->refcount isn't
> > > relied upon to provide additional memory ordering above and beyond what
> > > refcounting requires.
> >
> > So the changelogs should reflect that. The current one suggests that this
> > is a one to one replacement for atomic_t just with the extra sanity checks
> > added.
>
> I will update the commit texts accordingly and resend the whole series since
> this should be then mentioned in every commit to make sure it is not missed.
Actually, on second thought: do the above memory ordering differences
really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way
it is currently implemented for x86 is the same as for the atomic case.
>
> Best Regards,
> Elena.
>
> >
> > Thanks,
> >
> > tglx
On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> Actually on the second thought: does the above memory ordering differences
> really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way
> how it is currently implemented for x86 is the same way as it is for atomic cases.
Never look to x86 for memory ordering, it's boring.
And yes, for the ARM implementation it can certainly make a difference.
> On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > Actually on the second thought: does the above memory ordering differences
> > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > how it is currently implemented for x86 is the same way as it is for atomic cases.
>
> Never look to x86 for memory ordering, its boring.
>
> And yes, for the ARM implementation it can certainly make a difference.
So, yes, what I am trying to say is that it can really depend on whether you have ARCH_HAS_REFCOUNT
enabled or not, and also on the architecture. I believe the same is true for atomic: there
may be differences depending on whether you use the arch-dependent version of a function or not.
So, I guess if I rewrite the commits, I should only include the statement about relaxed memory ordering
for REFCOUNT_FULL and note that arch-specific implementations may vary in their properties
(as they do now).
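To make this concrete, the selection between the three variants can be
sketched roughly as follows (a condensed paraphrase of the
include/linux/refcount.h selection logic, not the verbatim header):

#ifdef CONFIG_REFCOUNT_FULL
/* checked generic implementation in lib/refcount.c, weaker (release) ordering */
extern __must_check bool refcount_dec_and_test(refcount_t *r);
#else
# ifdef CONFIG_ARCH_HAS_REFCOUNT
#  include <asm/refcount.h>	/* arch-specific fast checked version, e.g. on x86 */
# else
/* default: plain atomics, i.e. whatever ordering atomic_dec_and_test() provides */
static inline __must_check bool refcount_dec_and_test(refcount_t *r)
{
	return atomic_dec_and_test(&r->refs);
}
# endif
#endif

So the ordering a caller actually gets depends on both the configuration and
the architecture.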
On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
>
> > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > Actually on the second thought: does the above memory ordering differences
> > > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > how it is currently implemented for x86 is the same way as it is for atomic cases.
> >
> > Never look to x86 for memory ordering, its boring.
> >
> > And yes, for the ARM implementation it can certainly make a difference.
>
> So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
> enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
> might be differences when you use arch. dependent version of function or not.
So the generic one in lib/refcount.c is already weaker on ARM; they
don't need to do an arch-specific 'fast' implementation for the
difference to show up.
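The generic (CONFIG_REFCOUNT_FULL) version is built around a release-ordered
cmpxchg loop, roughly along these lines (a condensed paraphrase of
lib/refcount.c, not the verbatim source):

#include <linux/refcount.h>
#include <linux/kernel.h>
#include <linux/bug.h>

bool refcount_sub_and_test(unsigned int i, refcount_t *r)
{
	unsigned int old, new, val = atomic_read(&r->refs);

	for (;;) {
		if (unlikely(val == UINT_MAX))		/* saturated: stay saturated */
			return false;

		new = val - i;
		if (new > val) {			/* would underflow */
			WARN_ONCE(new > val, "refcount_t: underflow; use-after-free.\n");
			return false;
		}

		/* the only ordering is the release on the successful cmpxchg */
		old = atomic_cmpxchg_release(&r->refs, val, new);
		if (old == val)
			break;

		val = old;
	}

	return !new;
}

bool refcount_dec_and_test(refcount_t *r)
{
	return refcount_sub_and_test(1, r);
}

On a weakly ordered machine that release-only cmpxchg is visibly weaker than
the full barrier atomic_dec_and_test() provides, with or without an
arch-specific fast path.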
On Fri, Sep 1, 2017 at 2:48 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Aug 31, 2017 at 04:48:00PM -0700, Kees Cook wrote:
>> On Wed, Aug 30, 2017 at 5:22 AM, Elena Reshetova
>> <[email protected]> wrote:
>> > Now we have at least x86 support for ARCH_HAS_REFCOUNT merged and
>> > arm and others on their way.
>> >
>> > Changes in v5:
>> > * Kees catched that the following changes in
>> > perf_event_context.refcount and futex_pi_state.refcount
>> > are not correct now when ARCH_HAS_REFCOUNT is enabled:
>> > - WARN_ON(!atomic_inc_not_zero(refcount));
>> > + refcount_inc(refcount);
>> > So they are now changed back to using refcount_inc_not_zero.
>>
>> Thanks!
>>
>> Reviewed-by: Kees Cook <[email protected]>
>>
>> Andrew, are you able to carry these patches in -mm, since they span a
>> bunch of core kernel areas?
>
> No.. these patches should go through the regular trees that maintain
> these various parts.
Okay, sounds fine. Elena, can you split these up? (You'll probably
have to examine MAINTAINERS and/or git history for each patch...)
-Kees
--
Kees Cook
Pixel Security
> On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> >
> > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > Actually on the second thought: does the above memory ordering differences
> > > > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > > how it is currently implemented for x86 is the same way as it is for atomic cases.
> > >
> > > Never look to x86 for memory ordering, its boring.
> > >
> > > And yes, for the ARM implementation it can certainly make a difference.
> >
> > So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
> > enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
> > might be differences when you use arch. dependent version of function or not.
>
> So the generic one in lib/refcount.c is already weaker on ARM, they
> don't need to do a ARCH specific 'fast' implementation for the
> difference to show up.
But can they make a "fast" implementation on ARM that would give stronger memory guarantees?
> On Fri, Sep 1, 2017 at 2:48 AM, Peter Zijlstra <[email protected]> wrote:
> > On Thu, Aug 31, 2017 at 04:48:00PM -0700, Kees Cook wrote:
> >> On Wed, Aug 30, 2017 at 5:22 AM, Elena Reshetova
> >> <[email protected]> wrote:
> >> > Now we have at least x86 support for ARCH_HAS_REFCOUNT merged and
> >> > arm and others on their way.
> >> >
> >> > Changes in v5:
> >> > * Kees catched that the following changes in
> >> > perf_event_context.refcount and futex_pi_state.refcount
> >> > are not correct now when ARCH_HAS_REFCOUNT is enabled:
> >> > - WARN_ON(!atomic_inc_not_zero(refcount));
> >> > + refcount_inc(refcount);
> >> > So they are now changed back to using refcount_inc_not_zero.
> >>
> >> Thanks!
> >>
> >> Reviewed-by: Kees Cook <[email protected]>
> >>
> >> Andrew, are you able to carry these patches in -mm, since they span a
> >> bunch of core kernel areas?
> >
> > No.. these patches should go through the regular trees that maintain
> > these various parts.
>
> Okay, sounds fine. Elena, can you split these up? (You'll probably
> have to examine MAINTAINERS and/or git history for each patch...)
Well, I can do this, but the patches are already fully independent for cherry-picking and all
maintainers should be on the CC list, so I was hoping people could pull them into their trees
from this series.
But if people want a split, I can do a split...
Best Regards,
Elena.
>
> -Kees
>
> --
> Kees Cook
> Pixel Security
On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote:
> > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> > >
> > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > > Actually on the second thought: does the above memory ordering differences
> > > > > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > > > how it is currently implemented for x86 is the same way as it is for atomic cases.
> > > >
> > > > Never look to x86 for memory ordering, its boring.
> > > >
> > > > And yes, for the ARM implementation it can certainly make a difference.
> > >
> > > So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
> > > enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
> > > might be differences when you use arch. dependent version of function or not.
> >
> > So the generic one in lib/refcount.c is already weaker on ARM, they
> > don't need to do a ARCH specific 'fast' implementation for the
> > difference to show up.
>
> But can they make "fast" implementation on ARM that would give stronger memory guarantees?
Whatever for?
> On Friday, September 1, 2017 at 10:13 PM, Peter Zijlstra wrote
> (Re: [PATCH 14/15] futex: convert futex_pi_state.refcount to refcount_t):
>
> On Fri, Sep 01, 2017 at 05:03:55PM +0000, Reshetova, Elena wrote:
> > > On Fri, Sep 01, 2017 at 01:24:16PM +0000, Reshetova, Elena wrote:
> > > >
> > > > > On Fri, Sep 01, 2017 at 11:05:33AM +0000, Reshetova, Elena wrote:
> > > > > > Actually on the second thought: does the above memory ordering differences
> > > > > > really apply when we have ARCH_HAS_REFCOUNT? To me it looks like the way
> > > > > > how it is currently implemented for x86 is the same way as it is for atomic cases.
> > > > >
> > > > > Never look to x86 for memory ordering, its boring.
> > > > >
> > > > > And yes, for the ARM implementation it can certainly make a difference.
> > > >
> > > > So, yes, what I am trying to say is that it can really depend if you have ARCH_HAS_REFCOUNT
> > > > enabled or not and then also based on architecture. Thus I believe is also true for atomic: there
> > > > might be differences when you use arch. dependent version of function or not.
> > >
> > > So the generic one in lib/refcount.c is already weaker on ARM, they
> > > don't need to do a ARCH specific 'fast' implementation for the
> > > difference to show up.
> >
> > But can they make "fast" implementation on ARM that would give stronger memory guarantees?
>
> Whatever for?
Well, maybe just by default when an arch-specific implementation is done. But I was just
speculating in order to understand. I will resend this one with the new comment added.
I am still not sure whether I need to resend the whole series with updated commit messages
or break it up further into individual patches for separate merges.
On Mon, Sep 04, 2017 at 10:31:54AM +0000, Reshetova, Elena wrote:
> > > But can they make "fast" implementation on ARM that would give stronger
> > > memory guarantees?
> >
> > Whatever for?
>
> Well, maybe just by default when arch.-specific implementation is
> done. But I was just trying to speculate to understand. I will resend
> this one with new comment added.
So the generic lib/refcount.c already has weak ordering. It doesn't make
sense for an arch specific implementation (on a weakly ordered machine)
to provide stronger guarantees (it would make things slower).
The weaker ordering of the refcount_t primitives is sufficient if we're
talking pure refcounts. If for some reason code relies on stronger
ordering, there _SHOULD_ be a comment describing the additional
ordering requirements.
But that's a fairly big 'should'. I can well imagine the comment not
being there. In fact, see below.
> Still not sure if I need to resend the whole series with updated
> commits or break this up by individual patches further for the
> separate merges.
I've yet to look at the ones targeted at subsystems I do; I'm forever
and terminally behind on review :/
I called out the issue on futex in particular because it is fairly
tricky code.
Now Thomas would like you to mention the fact that refcount_t doesn't
provide the exact same ordering as the atomic_t usages it replaces, and
I think it would be good if you could hand-wave an argument for why the
futex code doesn't care.
Now, suppose we were to convert i_count to refcount_t (yes, I know, my
initial conversion wasn't well received); then we would need to add
futex_get_inode(), similar to futex_get_mm() (a rough sketch follows
after the patch below).
That is, smp_mb__{before,after}_atomic() works as expected and can be
used to fortify the barriers implied by refcount_t.
---
Subject: fs,inode: Add comment explaining additional ordering
Add a note to ihold() to document the ordering futex relies upon.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
fs/inode.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/inode.c b/fs/inode.c
index 50370599e371..17192ba92fef 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -395,6 +395,10 @@ void __iget(struct inode *inode)
  */
 void ihold(struct inode *inode)
 {
+	/*
+	 * Note: futex.c:get_futex_key_refs() relies on this function
+	 * implying an smp_mb().
+	 */
 	WARN_ON(atomic_inc_return(&inode->i_count) < 2);
 }
 EXPORT_SYMBOL(ihold);
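For illustration, a hypothetical futex_get_inode() following that pattern
could look roughly like this (purely a sketch, under the assumption that
inode.i_count had been converted to refcount_t; the helper does not exist in
this series):

static inline void futex_get_inode(union futex_key *key)
{
	refcount_inc(&key->shared.inode->i_count);
	/*
	 * get_futex_key() is relied upon to imply a full barrier (see the
	 * ihold() note above); refcount_inc() alone does not provide that,
	 * so fortify it explicitly.
	 */
	smp_mb__after_atomic();
}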
Sorry for the delayed reply.
> On Mon, Sep 04, 2017 at 10:31:54AM +0000, Reshetova, Elena wrote:
> > > > But can they make "fast" implementation on ARM that would give stronger
> > > > memory guarantees?
> > >
> > > Whatever for?
> >
> > Well, maybe just by default when arch.-specific implementation is
> > done. But I was just trying to speculate to understand. I will resend
> > this one with new comment added.
>
> So the generic lib/refcount.c already has weak ordering. It doesn't make
> sense for an arch specific implementation (on a weakly ordered machine)
> to provide stronger guarantees (it would make things slower).
Thank you for explaining this! Helps to understand a lot.
>
> The weaker ordering of the refcount_t primitives is sufficient if we're
> talking pure refcounts. If for some reason code relies on stronger
> ordering there _SHOULD_ be a comment with describing the additional
> ordering requirements.
>
> But that's a fairly big 'should'. I can well imagine the comment not
> being there. In fact, see below.
>
> > Still not sure if I need to resend the whole series with updated
> > commits or break this up by individual patches further for the
> > separate merges.
>
> I've yet to look at the ones targeted at subsystems I do, I'm forever
> and terminally behind on review :/
>
> I called out the issue on futex in particular because it is fairly
> tricky code that.
>
> Now Thomas would like you to mention the fact that refcount_t doesn't
> provide the exact same ordering as the atomic_t usages it replaces and
> I think it would be good if you could hand-wave an argument on why the
> futex code doesn't care.
I think I can mention the ordering differences in all yet-to-be-merged
patches to make sure maintainers are aware. The problem with concrete
cases is that I don't usually have enough knowledge of the code to tell
for sure where it would matter or not. Previously I was even under the
impression that it should not matter at all for the variables we are
converting, since they are classical refcounters, but your examples
clearly show that this is not *always* the case (though I think it is
the case for most of the patches).
So I am a bit confused about how to approach this.
Either I just put a statement in all patches and rely on maintainers,
who certainly know their code, to catch these things, or I do the
analysis myself, but then I would need some guidance on reasonable
heuristics for checking each refcounter. This goes well beyond my
current kernel knowledge, but I am happy to learn if somebody points me
to something I can read to fill in the missing points.
Best Regards,
Elena.
>
>
> Now, suppose we were to convert i_count to refcount_t (yes, I know, my
> initial conversion wasn't well received), then we need to add
> futex_get_inode() similar to futex_get_mm().
>
> That is, smp_mb__{before,after}_atomic() works as expected and can be
> used to fortify the implied barriers by refcount_t.
>
> ---
> Subject: fs,inode: Add comment explaining additional ordering
>
> Add a note to ihold() to document the ordering futex relies upon.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> fs/inode.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 50370599e371..17192ba92fef 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -395,6 +395,10 @@ void __iget(struct inode *inode)
> */
> void ihold(struct inode *inode)
> {
> + /*
> + * Note: futex.c:get_futex_key_refs() relies on this function
> + * implying an smp_mb().
> + */
> WARN_ON(atomic_inc_return(&inode->i_count) < 2);
> }
> EXPORT_SYMBOL(ihold);