Subject: [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave()

This series is a v2 of the atomic_dec_and_lock_irqsave(). Now refcount_*
is used instead of atomic_* as suggested by Peter Zijlstra.

Patch
- 1-3 converts the user from atomic_* API to refcount_* API
- 4 implements refcount_dec_and_lock_irqsave
- 5-8 converts the local_irq_save() + refcount_dec_and_lock() users to
refcount_dec_and_lock_irqsave()

The whole series sits also at
git://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git refcount_t_irqsave

Sebastian




Subject: [PATCH 4/8] locking/refcount: implement refcount_dec_and_lock_irqsave()

From: Anna-Maria Gleixner <[email protected]>

There are in-tree users of refcount_dec_and_lock() which must acquire the
spin lock with interrupts disabled. To workaround the lack of an irqsave
variant of refcount_dec_and_lock() they use local_irq_save() at the call
site. This causes extra code and creates in some places unneeded long
interrupt disabled times. These places need also extra treatment for
PREEMPT_RT due to the disconnect of the irq disabling and the lock
function.

Implement the missing irqsave variant of the function.

Signed-off-by: Anna-Maria Gleixner <[email protected]>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
include/linux/refcount.h | 4 +++-
lib/refcount.c | 28 ++++++++++++++++++++++++++++
2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 4193c41e383a..a685da2c4522 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -98,5 +98,7 @@ extern __must_check bool refcount_dec_if_one(refcount_t *r);
extern __must_check bool refcount_dec_not_one(refcount_t *r);
extern __must_check bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock);
extern __must_check bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock);
-
+extern __must_check bool refcount_dec_and_lock_irqsave(refcount_t *r,
+ spinlock_t *lock,
+ unsigned long *flags);
#endif /* _LINUX_REFCOUNT_H */
diff --git a/lib/refcount.c b/lib/refcount.c
index 0eb48353abe3..d3b81cefce91 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -350,3 +350,31 @@ bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
}
EXPORT_SYMBOL(refcount_dec_and_lock);

+/**
+ * refcount_dec_and_lock_irqsave - return holding spinlock with disabled
+ * interrupts if able to decrement refcount to 0
+ * @r: the refcount
+ * @lock: the spinlock to be locked
+ * @flags: saved IRQ-flags if the is acquired
+ *
+ * Same as refcount_dec_and_lock() above except that the spinlock is acquired
+ * with disabled interupts.
+ *
+ * Return: true and hold spinlock if able to decrement refcount to 0, false
+ * otherwise
+ */
+bool refcount_dec_and_lock_irqsave(refcount_t *r, spinlock_t *lock,
+ unsigned long *flags)
+{
+ if (refcount_dec_not_one(r))
+ return false;
+
+ spin_lock_irqsave(lock, *flags);
+ if (!refcount_dec_and_test(r)) {
+ spin_unlock_irqrestore(lock, *flags);
+ return false;
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(refcount_dec_and_lock_irqsave);
--
2.17.0


Subject: [PATCH 8/8] md: raid5: Do not disable irq on release_inactive_stripe_list() call

From: Anna-Maria Gleixner <[email protected]>

There is no need to invoke release_inactive_stripe_list() with interrupts
disabled. All call sites, except raid5_release_stripe(), unlock
->device_lock and enable interrupts before invoking the function.

Make it consistent.

Signed-off-by: Anna-Maria Gleixner <[email protected]>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
drivers/md/raid5.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 28453264c3eb..9433b2619006 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -414,9 +414,8 @@ void raid5_release_stripe(struct stripe_head *sh)
INIT_LIST_HEAD(&list);
hash = sh->hash_lock_index;
do_release_stripe(conf, sh, &list);
- spin_unlock(&conf->device_lock);
+ spin_unlock_irqrestore(&conf->device_lock, flags);
release_inactive_stripe_list(conf, &list, hash);
- local_irq_restore(flags);
}
}

--
2.17.0


Subject: [PATCH 7/8] md: raid5: Use irqsave variant of refcount_dec_and_lock()

From: Anna-Maria Gleixner <[email protected]>

The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock. With this variant the call of
local_irq_save is no longer required.

Signed-off-by: Anna-Maria Gleixner <[email protected]>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
drivers/md/raid5.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 57ea0ae8c7ff..28453264c3eb 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -409,16 +409,15 @@ void raid5_release_stripe(struct stripe_head *sh)
md_wakeup_thread(conf->mddev->thread);
return;
slow_path:
- local_irq_save(flags);
/* we are ok here if STRIPE_ON_RELEASE_LIST is set or not */
- if (refcount_dec_and_lock(&sh->count, &conf->device_lock)) {
+ if (refcount_dec_and_lock_irqsave(&sh->count, &conf->device_lock, &flags)) {
INIT_LIST_HEAD(&list);
hash = sh->hash_lock_index;
do_release_stripe(conf, sh, &list);
spin_unlock(&conf->device_lock);
release_inactive_stripe_list(conf, &list, hash);
+ local_irq_restore(flags);
}
- local_irq_restore(flags);
}

static inline void remove_hash(struct stripe_head *sh)
--
2.17.0


Subject: [PATCH 6/8] userns: Use irqsave variant of refcount_dec_and_lock()

From: Anna-Maria Gleixner <[email protected]>

The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock. With this variant the call of
local_irq_save/restore is no longer required.

Signed-off-by: Anna-Maria Gleixner <[email protected]>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/user.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 5f65ef195259..0df9b1640b2a 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -169,11 +169,8 @@ void free_uid(struct user_struct *up)
if (!up)
return;

- local_irq_save(flags);
- if (refcount_dec_and_lock(&up->__count, &uidhash_lock))
+ if (refcount_dec_and_lock_irqsave(&up->__count, &uidhash_lock, &flags))
free_user(up, flags);
- else
- local_irq_restore(flags);
}

struct user_struct *alloc_uid(kuid_t uid)
--
2.17.0


Subject: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t

refcount_t type and corresponding API should be used instead of atomic_t when
the variable is used as a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free situations.

Most changes are 1:1 replacements except for
BUG_ON(atomic_inc_return(&sh->count) != 1);

which has been turned into
refcount_inc(&sh->count);
BUG_ON(refcount_read(&sh->count) != 1);

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
drivers/md/raid5-cache.c | 8 ++---
drivers/md/raid5-ppl.c | 2 +-
drivers/md/raid5.c | 67 ++++++++++++++++++++--------------------
drivers/md/raid5.h | 4 +--
4 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 3c65f52b68f5..532fdf56c117 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1049,7 +1049,7 @@ int r5l_write_stripe(struct r5l_log *log, struct stripe_head *sh)
* don't delay.
*/
clear_bit(STRIPE_DELAYED, &sh->state);
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);

mutex_lock(&log->io_mutex);
/* meta + data */
@@ -1388,7 +1388,7 @@ static void r5c_flush_stripe(struct r5conf *conf, struct stripe_head *sh)
lockdep_assert_held(&conf->device_lock);

list_del_init(&sh->lru);
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);

set_bit(STRIPE_HANDLE, &sh->state);
atomic_inc(&conf->active_stripes);
@@ -1491,7 +1491,7 @@ static void r5c_do_reclaim(struct r5conf *conf)
*/
if (!list_empty(&sh->lru) &&
!test_bit(STRIPE_HANDLE, &sh->state) &&
- atomic_read(&sh->count) == 0) {
+ refcount_read(&sh->count) == 0) {
r5c_flush_stripe(conf, sh);
if (count++ >= R5C_RECLAIM_STRIPE_GROUP)
break;
@@ -2912,7 +2912,7 @@ int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh)
* don't delay.
*/
clear_bit(STRIPE_DELAYED, &sh->state);
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);

mutex_lock(&log->io_mutex);
/* meta + data */
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index 42890a08375b..87840cfe7a80 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -388,7 +388,7 @@ int ppl_write_stripe(struct r5conf *conf, struct stripe_head *sh)

set_bit(STRIPE_LOG_TRAPPED, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);

if (ppl_log_stripe(log, sh)) {
spin_lock_irq(&ppl_conf->no_mem_stripes_lock);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index be117d0a65a8..57ea0ae8c7ff 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -306,7 +306,7 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
static void __release_stripe(struct r5conf *conf, struct stripe_head *sh,
struct list_head *temp_inactive_list)
{
- if (atomic_dec_and_test(&sh->count))
+ if (refcount_dec_and_test(&sh->count))
do_release_stripe(conf, sh, temp_inactive_list);
}

@@ -398,7 +398,7 @@ void raid5_release_stripe(struct stripe_head *sh)

/* Avoid release_list until the last reference.
*/
- if (atomic_add_unless(&sh->count, -1, 1))
+ if (refcount_dec_not_one(&sh->count))
return;

if (unlikely(!conf->mddev->thread) ||
@@ -411,7 +411,7 @@ void raid5_release_stripe(struct stripe_head *sh)
slow_path:
local_irq_save(flags);
/* we are ok here if STRIPE_ON_RELEASE_LIST is set or not */
- if (atomic_dec_and_lock(&sh->count, &conf->device_lock)) {
+ if (refcount_dec_and_lock(&sh->count, &conf->device_lock)) {
INIT_LIST_HEAD(&list);
hash = sh->hash_lock_index;
do_release_stripe(conf, sh, &list);
@@ -501,7 +501,7 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
struct r5conf *conf = sh->raid_conf;
int i, seq;

- BUG_ON(atomic_read(&sh->count) != 0);
+ BUG_ON(refcount_read(&sh->count) != 0);
BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
BUG_ON(stripe_operations_active(sh));
BUG_ON(sh->batch_head);
@@ -678,11 +678,11 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
&conf->cache_state);
} else {
init_stripe(sh, sector, previous);
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
}
- } else if (!atomic_inc_not_zero(&sh->count)) {
+ } else if (!refcount_inc_not_zero(&sh->count)) {
spin_lock(&conf->device_lock);
- if (!atomic_read(&sh->count)) {
+ if (!refcount_read(&sh->count)) {
if (!test_bit(STRIPE_HANDLE, &sh->state))
atomic_inc(&conf->active_stripes);
BUG_ON(list_empty(&sh->lru) &&
@@ -698,7 +698,7 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
sh->group = NULL;
}
}
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
spin_unlock(&conf->device_lock);
}
} while (sh == NULL);
@@ -760,9 +760,9 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
hash = stripe_hash_locks_hash(head_sector);
spin_lock_irq(conf->hash_locks + hash);
head = __find_stripe(conf, head_sector, conf->generation);
- if (head && !atomic_inc_not_zero(&head->count)) {
+ if (head && !refcount_inc_not_zero(&head->count)) {
spin_lock(&conf->device_lock);
- if (!atomic_read(&head->count)) {
+ if (!refcount_read(&head->count)) {
if (!test_bit(STRIPE_HANDLE, &head->state))
atomic_inc(&conf->active_stripes);
BUG_ON(list_empty(&head->lru) &&
@@ -778,7 +778,7 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
head->group = NULL;
}
}
- atomic_inc(&head->count);
+ refcount_inc(&head->count);
spin_unlock(&conf->device_lock);
}
spin_unlock_irq(conf->hash_locks + hash);
@@ -847,7 +847,7 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
sh->batch_head->bm_seq = seq;
}

- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
unlock_out:
unlock_two_stripes(head, sh);
out:
@@ -1110,9 +1110,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
pr_debug("%s: for %llu schedule op %d on disc %d\n",
__func__, (unsigned long long)sh->sector,
bi->bi_opf, i);
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
if (sh != head_sh)
- atomic_inc(&head_sh->count);
+ refcount_inc(&head_sh->count);
if (use_new_offset(conf, sh))
bi->bi_iter.bi_sector = (sh->sector
+ rdev->new_data_offset);
@@ -1174,9 +1174,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
"replacement disc %d\n",
__func__, (unsigned long long)sh->sector,
rbi->bi_opf, i);
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
if (sh != head_sh)
- atomic_inc(&head_sh->count);
+ refcount_inc(&head_sh->count);
if (use_new_offset(conf, sh))
rbi->bi_iter.bi_sector = (sh->sector
+ rrdev->new_data_offset);
@@ -1354,7 +1354,7 @@ static void ops_run_biofill(struct stripe_head *sh)
}
}

- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
init_async_submit(&submit, ASYNC_TX_ACK, tx, ops_complete_biofill, sh, NULL);
async_trigger_callback(&submit);
}
@@ -1432,7 +1432,7 @@ ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
if (i != target)
xor_srcs[count++] = sh->dev[i].page;

- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);

init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_ZERO_DST, NULL,
ops_complete_compute, sh, to_addr_conv(sh, percpu, 0));
@@ -1521,7 +1521,7 @@ ops_run_compute6_1(struct stripe_head *sh, struct raid5_percpu *percpu)
BUG_ON(!test_bit(R5_Wantcompute, &tgt->flags));
dest = tgt->page;

- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);

if (target == qd_idx) {
count = set_syndrome_sources(blocks, sh, SYNDROME_SRC_ALL);
@@ -1596,7 +1596,7 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
pr_debug("%s: stripe: %llu faila: %d failb: %d\n",
__func__, (unsigned long long)sh->sector, faila, failb);

- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);

if (failb == syndrome_disks+1) {
/* Q disk is one of the missing disks */
@@ -1867,7 +1867,7 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
break;
}
if (i >= sh->disks) {
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
set_bit(R5_Discard, &sh->dev[pd_idx].flags);
ops_complete_reconstruct(sh);
return;
@@ -1908,7 +1908,7 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
flags = ASYNC_TX_ACK |
(prexor ? ASYNC_TX_XOR_DROP_DST : ASYNC_TX_XOR_ZERO_DST);

- atomic_inc(&head_sh->count);
+ refcount_inc(&head_sh->count);
init_async_submit(&submit, flags, tx, ops_complete_reconstruct, head_sh,
to_addr_conv(sh, percpu, j));
} else {
@@ -1950,7 +1950,7 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
break;
}
if (i >= sh->disks) {
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
set_bit(R5_Discard, &sh->dev[sh->pd_idx].flags);
set_bit(R5_Discard, &sh->dev[sh->qd_idx].flags);
ops_complete_reconstruct(sh);
@@ -1974,7 +1974,7 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
struct stripe_head, batch_list) == head_sh;

if (last_stripe) {
- atomic_inc(&head_sh->count);
+ refcount_inc(&head_sh->count);
init_async_submit(&submit, txflags, tx, ops_complete_reconstruct,
head_sh, to_addr_conv(sh, percpu, j));
} else
@@ -2031,7 +2031,7 @@ static void ops_run_check_p(struct stripe_head *sh, struct raid5_percpu *percpu)
tx = async_xor_val(xor_dest, xor_srcs, 0, count, STRIPE_SIZE,
&sh->ops.zero_sum_result, &submit);

- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
init_async_submit(&submit, ASYNC_TX_ACK, tx, ops_complete_check, sh, NULL);
tx = async_trigger_callback(&submit);
}
@@ -2050,7 +2050,7 @@ static void ops_run_check_pq(struct stripe_head *sh, struct raid5_percpu *percpu
if (!checkp)
srcs[count] = NULL;

- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
init_async_submit(&submit, ASYNC_TX_ACK, NULL, ops_complete_check,
sh, to_addr_conv(sh, percpu, 0));
async_syndrome_val(srcs, 0, count+2, STRIPE_SIZE,
@@ -2150,7 +2150,7 @@ static struct stripe_head *alloc_stripe(struct kmem_cache *sc, gfp_t gfp,
INIT_LIST_HEAD(&sh->lru);
INIT_LIST_HEAD(&sh->r5c);
INIT_LIST_HEAD(&sh->log_list);
- atomic_set(&sh->count, 1);
+ refcount_set(&sh->count, 1);
sh->raid_conf = conf;
sh->log_start = MaxSector;
for (i = 0; i < disks; i++) {
@@ -2451,7 +2451,7 @@ static int drop_one_stripe(struct r5conf *conf)
spin_unlock_irq(conf->hash_locks + hash);
if (!sh)
return 0;
- BUG_ON(atomic_read(&sh->count));
+ BUG_ON(refcount_read(&sh->count));
shrink_buffers(sh);
free_stripe(conf->slab_cache, sh);
atomic_dec(&conf->active_stripes);
@@ -2483,7 +2483,7 @@ static void raid5_end_read_request(struct bio * bi)
break;

pr_debug("end_read_request %llu/%d, count: %d, error %d.\n",
- (unsigned long long)sh->sector, i, atomic_read(&sh->count),
+ (unsigned long long)sh->sector, i, refcount_read(&sh->count),
bi->bi_status);
if (i == disks) {
bio_reset(bi);
@@ -2620,7 +2620,7 @@ static void raid5_end_write_request(struct bio *bi)
}
}
pr_debug("end_write_request %llu/%d, count %d, error: %d.\n",
- (unsigned long long)sh->sector, i, atomic_read(&sh->count),
+ (unsigned long long)sh->sector, i, refcount_read(&sh->count),
bi->bi_status);
if (i == disks) {
bio_reset(bi);
@@ -4687,7 +4687,7 @@ static void handle_stripe(struct stripe_head *sh)
pr_debug("handling stripe %llu, state=%#lx cnt=%d, "
"pd_idx=%d, qd_idx=%d\n, check:%d, reconstruct:%d\n",
(unsigned long long)sh->sector, sh->state,
- atomic_read(&sh->count), sh->pd_idx, sh->qd_idx,
+ refcount_read(&sh->count), sh->pd_idx, sh->qd_idx,
sh->check_state, sh->reconstruct_state);

analyse_stripe(sh, &s);
@@ -5062,7 +5062,7 @@ static void activate_bit_delay(struct r5conf *conf,
struct stripe_head *sh = list_entry(head.next, struct stripe_head, lru);
int hash;
list_del_init(&sh->lru);
- atomic_inc(&sh->count);
+ refcount_inc(&sh->count);
hash = sh->hash_lock_index;
__release_stripe(conf, sh, &temp_inactive_list[hash]);
}
@@ -5387,7 +5387,8 @@ static struct stripe_head *__get_priority_stripe(struct r5conf *conf, int group)
sh->group = NULL;
}
list_del_init(&sh->lru);
- BUG_ON(atomic_inc_return(&sh->count) != 1);
+ refcount_inc(&sh->count);
+ BUG_ON(refcount_read(&sh->count) != 1);
return sh;
}

diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 3f8da26032ac..bc2a24e99346 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -4,7 +4,7 @@

#include <linux/raid/xor.h>
#include <linux/dmaengine.h>
-
+#include <linux/refcount.h>
/*
*
* Each stripe contains one buffer per device. Each buffer can be in
@@ -208,7 +208,7 @@ struct stripe_head {
short ddf_layout;/* use DDF ordering to calculate Q */
short hash_lock_index;
unsigned long state; /* state flags */
- atomic_t count; /* nr of active thread/requests */
+ refcount_t count; /* nr of active thread/requests */
int bm_seq; /* sequence number for bitmap flushes */
int disks; /* disks in stripe */
int overwrite_disks; /* total overwrite disks in stripe,
--
2.17.0


Subject: [PATCH 5/8] bdi: Use irqsave variant of refcount_dec_and_lock()

From: Anna-Maria Gleixner <[email protected]>

The irqsave variant of refcount_dec_and_lock handles irqsave/restore when
taking/releasing the spin lock. With this variant the call of
local_irq_save/restore is no longer required.

Signed-off-by: Anna-Maria Gleixner <[email protected]>
[bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g ]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
mm/backing-dev.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7984a872073e..520aa092f7b2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -485,11 +485,8 @@ void wb_congested_put(struct bdi_writeback_congested *congested)
{
unsigned long flags;

- local_irq_save(flags);
- if (!refcount_dec_and_lock(&congested->refcnt, &cgwb_lock)) {
- local_irq_restore(flags);
+ if (!refcount_dec_and_lock_irqsave(&congested->refcnt, &cgwb_lock, &flags))
return;
- }

/* bdi might already have been destroyed leaving @congested unlinked */
if (congested->__bdi) {
--
2.17.0


Subject: [PATCH 1/8] bdi: use refcount_t for reference counting instead atomic_t

refcount_t type and corresponding API should be used instead of atomic_t when
the variable is used as a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free situations.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
include/linux/backing-dev-defs.h | 3 ++-
include/linux/backing-dev.h | 4 ++--
mm/backing-dev.c | 12 ++++++------
3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 0bd432a4d7bd..81c75934ca5b 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -12,6 +12,7 @@
#include <linux/timer.h>
#include <linux/workqueue.h>
#include <linux/kref.h>
+#include <linux/refcount.h>

struct page;
struct device;
@@ -76,7 +77,7 @@ enum wb_reason {
*/
struct bdi_writeback_congested {
unsigned long state; /* WB_[a]sync_congested flags */
- atomic_t refcnt; /* nr of attached wb's and blkg */
+ refcount_t refcnt; /* nr of attached wb's and blkg */

#ifdef CONFIG_CGROUP_WRITEBACK
struct backing_dev_info *__bdi; /* the associated bdi, set to NULL
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 72ca0f3d39f3..c28a47cbe355 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -404,13 +404,13 @@ static inline bool inode_cgwb_enabled(struct inode *inode)
static inline struct bdi_writeback_congested *
wb_congested_get_create(struct backing_dev_info *bdi, int blkcg_id, gfp_t gfp)
{
- atomic_inc(&bdi->wb_congested->refcnt);
+ refcount_inc(&bdi->wb_congested->refcnt);
return bdi->wb_congested;
}

static inline void wb_congested_put(struct bdi_writeback_congested *congested)
{
- if (atomic_dec_and_test(&congested->refcnt))
+ if (refcount_dec_and_test(&congested->refcnt))
kfree(congested);
}

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7441bd93b732..7984a872073e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -450,10 +450,10 @@ wb_congested_get_create(struct backing_dev_info *bdi, int blkcg_id, gfp_t gfp)
if (new_congested) {
/* !found and storage for new one already allocated, insert */
congested = new_congested;
- new_congested = NULL;
rb_link_node(&congested->rb_node, parent, node);
rb_insert_color(&congested->rb_node, &bdi->cgwb_congested_tree);
- goto found;
+ spin_unlock_irqrestore(&cgwb_lock, flags);
+ return congested;
}

spin_unlock_irqrestore(&cgwb_lock, flags);
@@ -463,13 +463,13 @@ wb_congested_get_create(struct backing_dev_info *bdi, int blkcg_id, gfp_t gfp)
if (!new_congested)
return NULL;

- atomic_set(&new_congested->refcnt, 0);
+ refcount_set(&new_congested->refcnt, 1);
new_congested->__bdi = bdi;
new_congested->blkcg_id = blkcg_id;
goto retry;

found:
- atomic_inc(&congested->refcnt);
+ refcount_inc(&congested->refcnt);
spin_unlock_irqrestore(&cgwb_lock, flags);
kfree(new_congested);
return congested;
@@ -486,7 +486,7 @@ void wb_congested_put(struct bdi_writeback_congested *congested)
unsigned long flags;

local_irq_save(flags);
- if (!atomic_dec_and_lock(&congested->refcnt, &cgwb_lock)) {
+ if (!refcount_dec_and_lock(&congested->refcnt, &cgwb_lock)) {
local_irq_restore(flags);
return;
}
@@ -794,7 +794,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
if (!bdi->wb_congested)
return -ENOMEM;

- atomic_set(&bdi->wb_congested->refcnt, 1);
+ refcount_set(&bdi->wb_congested->refcnt, 1);

err = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL);
if (err) {
--
2.17.0


Subject: [PATCH 2/8] userns: use refcount_t for reference counting instead atomic_t

refcount_t type and corresponding API should be used instead of atomic_t when
the variable is used as a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free situations.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
include/linux/sched/user.h | 5 +++--
kernel/user.c | 8 ++++----
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index 96fe289c4c6e..39ad98c09c58 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -4,6 +4,7 @@

#include <linux/uidgid.h>
#include <linux/atomic.h>
+#include <linux/refcount.h>
#include <linux/ratelimit.h>

struct key;
@@ -12,7 +13,7 @@ struct key;
* Some day this will be a full-fledged user tracking system..
*/
struct user_struct {
- atomic_t __count; /* reference count */
+ refcount_t __count; /* reference count */
atomic_t processes; /* How many processes does this user have? */
atomic_t sigpending; /* How many pending signals does this user have? */
#ifdef CONFIG_FANOTIFY
@@ -59,7 +60,7 @@ extern struct user_struct root_user;
extern struct user_struct * alloc_uid(kuid_t);
static inline struct user_struct *get_uid(struct user_struct *u)
{
- atomic_inc(&u->__count);
+ refcount_inc(&u->__count);
return u;
}
extern void free_uid(struct user_struct *);
diff --git a/kernel/user.c b/kernel/user.c
index 36288d840675..5f65ef195259 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -96,7 +96,7 @@ static DEFINE_SPINLOCK(uidhash_lock);

/* root_user.__count is 1, for init task cred */
struct user_struct root_user = {
- .__count = ATOMIC_INIT(1),
+ .__count = REFCOUNT_INIT(1),
.processes = ATOMIC_INIT(1),
.sigpending = ATOMIC_INIT(0),
.locked_shm = 0,
@@ -123,7 +123,7 @@ static struct user_struct *uid_hash_find(kuid_t uid, struct hlist_head *hashent)

hlist_for_each_entry(user, hashent, uidhash_node) {
if (uid_eq(user->uid, uid)) {
- atomic_inc(&user->__count);
+ refcount_inc(&user->__count);
return user;
}
}
@@ -170,7 +170,7 @@ void free_uid(struct user_struct *up)
return;

local_irq_save(flags);
- if (atomic_dec_and_lock(&up->__count, &uidhash_lock))
+ if (refcount_dec_and_lock(&up->__count, &uidhash_lock))
free_user(up, flags);
else
local_irq_restore(flags);
@@ -191,7 +191,7 @@ struct user_struct *alloc_uid(kuid_t uid)
goto out_unlock;

new->uid = uid;
- atomic_set(&new->__count, 1);
+ refcount_set(&new->__count, 1);
ratelimit_state_init(&new->ratelimit, HZ, 100);
ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);

--
2.17.0


2018-05-23 12:37:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t

> Most changes are 1:1 replacements except for
> BUG_ON(atomic_inc_return(&sh->count) != 1);

That doesn't look right, 'inc_return == 1' implies inc-from-zero, which
is not allowed by refcount.

> which has been turned into
> refcount_inc(&sh->count);
> BUG_ON(refcount_read(&sh->count) != 1);

And that is racy, you can have additional increments in between.


Subject: Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t

On 2018-05-23 14:36:15 [+0200], Peter Zijlstra wrote:
> > Most changes are 1:1 replacements except for
> > BUG_ON(atomic_inc_return(&sh->count) != 1);
>
> That doesn't look right, 'inc_return == 1' implies inc-from-zero, which
> is not allowed by refcount.
>
> > which has been turned into
> > refcount_inc(&sh->count);
> > BUG_ON(refcount_read(&sh->count) != 1);
>
> And that is racy, you can have additional increments in between.

so do we stay with the atomic* API here or do we extend refcount* API?

Sebastian

2018-05-23 12:57:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t

On Wed, May 23, 2018 at 02:50:07PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-05-23 14:36:15 [+0200], Peter Zijlstra wrote:
> > > Most changes are 1:1 replacements except for
> > > BUG_ON(atomic_inc_return(&sh->count) != 1);
> >
> > That doesn't look right, 'inc_return == 1' implies inc-from-zero, which
> > is not allowed by refcount.
> >
> > > which has been turned into
> > > refcount_inc(&sh->count);
> > > BUG_ON(refcount_read(&sh->count) != 1);
> >
> > And that is racy, you can have additional increments in between.
>
> so do we stay with the atomic* API here or do we extend refcount* API?

Stay with the atomic; I'll look at the rest of these patches, but raid5
looks like a usage-count, not a reference count.

I'll probably ack your initial set and parts of this.. but let me get to
the end of this first.

2018-05-23 13:02:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave()

On Wed, May 09, 2018 at 09:36:37PM +0200, Sebastian Andrzej Siewior wrote:
> This series is a v2 of the atomic_dec_and_lock_irqsave(). Now refcount_*
> is used instead of atomic_* as suggested by Peter Zijlstra.
>
> Patch
> - 1-3 converts the user from atomic_* API to refcount_* API
> - 4 implements refcount_dec_and_lock_irqsave
> - 5-8 converts the local_irq_save() + refcount_dec_and_lock() users to
> refcount_dec_and_lock_irqsave()
>
> The whole series sits also at
> git://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git refcount_t_irqsave
>

1-2, 4-6:

Acked-by: Peter Zijlstra (Intel) <[email protected]>

Subject: Re: [PATCH v2 0/8] Introduce refcount_dec_and_lock_irqsave()

On 2018-05-23 15:01:25 [+0200], Peter Zijlstra wrote:
> 1-2, 4-6:
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>

Thank you.

Sebastian

2018-05-23 13:23:15

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t

On Wed, May 09, 2018 at 09:36:40PM +0200, Sebastian Andrzej Siewior wrote:
> refcount_t type and corresponding API should be used instead of atomic_t when
> the variable is used as a reference counter. This allows to avoid accidental
> refcounter overflows that might lead to use-after-free situations.
>
> Most changes are 1:1 replacements except for
> BUG_ON(atomic_inc_return(&sh->count) != 1);
>
> which has been turned into
> refcount_inc(&sh->count);
> BUG_ON(refcount_read(&sh->count) != 1);

@@ -5387,7 +5387,8 @@ static struct stripe_head *__get_priority_stripe(struct
+r5conf *conf, int group)
sh->group = NULL;
}
list_del_init(&sh->lru);
- BUG_ON(atomic_inc_return(&sh->count) != 1);
+ refcount_inc(&sh->count);
+ BUG_ON(refcount_read(&sh->count) != 1);
return sh;
}


That's the only problematic usage. And I think what it's really saying is:

BUG_ON(refcount_read(&sh->count) != 0);
refcount_set(&sh->count, 1);

With that, this looks like a reasonable use of refcount_t to me.

2018-05-23 17:49:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t

On Wed, May 23, 2018 at 06:21:19AM -0700, Matthew Wilcox wrote:
> On Wed, May 09, 2018 at 09:36:40PM +0200, Sebastian Andrzej Siewior wrote:
> > refcount_t type and corresponding API should be used instead of atomic_t when
> > the variable is used as a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free situations.
> >
> > Most changes are 1:1 replacements except for
> > BUG_ON(atomic_inc_return(&sh->count) != 1);
> >
> > which has been turned into
> > refcount_inc(&sh->count);
> > BUG_ON(refcount_read(&sh->count) != 1);
>
> @@ -5387,7 +5387,8 @@ static struct stripe_head *__get_priority_stripe(struct
> +r5conf *conf, int group)
> sh->group = NULL;
> }
> list_del_init(&sh->lru);
> - BUG_ON(atomic_inc_return(&sh->count) != 1);
> + refcount_inc(&sh->count);
> + BUG_ON(refcount_read(&sh->count) != 1);
> return sh;
> }
>
>
> That's the only problematic usage. And I think what it's really saying is:
>
> BUG_ON(refcount_read(&sh->count) != 0);
> refcount_set(&sh->count, 1);
>
> With that, this looks like a reasonable use of refcount_t to me.

I'm not so sure, look at:

r5c_do_reclaim():

if (!list_empty(&sh->lru) &&
!test_bit(STRIPE_HANDLE, &sh->state) &&
atomic_read(&sh->count) == 0) {
r5c_flush_stripe(cond, sh)

Which does:

r5c_flush_stripe():

atomic_inc(&sh->count);

Which is another inc-from-zero. Also, having sh's with count==0 in a
list is counter to the concept of refcounts and smells like usage-counts
to me. For refcount 0 really means deads and gone.

If this really is supposed to be a refcount, someone more familiar with
the raid5 should do the patch and write a comprehensive changelog on it.

2018-05-23 19:23:17

by Shaohua Li

[permalink] [raw]
Subject: Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t

On Wed, May 23, 2018 at 07:49:04PM +0200, Peter Zijlstra wrote:
> On Wed, May 23, 2018 at 06:21:19AM -0700, Matthew Wilcox wrote:
> > On Wed, May 09, 2018 at 09:36:40PM +0200, Sebastian Andrzej Siewior wrote:
> > > refcount_t type and corresponding API should be used instead of atomic_t when
> > > the variable is used as a reference counter. This allows to avoid accidental
> > > refcounter overflows that might lead to use-after-free situations.
> > >
> > > Most changes are 1:1 replacements except for
> > > BUG_ON(atomic_inc_return(&sh->count) != 1);
> > >
> > > which has been turned into
> > > refcount_inc(&sh->count);
> > > BUG_ON(refcount_read(&sh->count) != 1);
> >
> > @@ -5387,7 +5387,8 @@ static struct stripe_head *__get_priority_stripe(struct
> > +r5conf *conf, int group)
> > sh->group = NULL;
> > }
> > list_del_init(&sh->lru);
> > - BUG_ON(atomic_inc_return(&sh->count) != 1);
> > + refcount_inc(&sh->count);
> > + BUG_ON(refcount_read(&sh->count) != 1);
> > return sh;
> > }
> >
> >
> > That's the only problematic usage. And I think what it's really saying is:
> >
> > BUG_ON(refcount_read(&sh->count) != 0);
> > refcount_set(&sh->count, 1);
> >
> > With that, this looks like a reasonable use of refcount_t to me.
>
> I'm not so sure, look at:
>
> r5c_do_reclaim():
>
> if (!list_empty(&sh->lru) &&
> !test_bit(STRIPE_HANDLE, &sh->state) &&
> atomic_read(&sh->count) == 0) {
> r5c_flush_stripe(cond, sh)
>
> Which does:
>
> r5c_flush_stripe():
>
> atomic_inc(&sh->count);
>
> Which is another inc-from-zero. Also, having sh's with count==0 in a
> list is counter to the concept of refcounts and smells like usage-counts
> to me. For refcount 0 really means deads and gone.
>
> If this really is supposed to be a refcount, someone more familiar with
> the raid5 should do the patch and write a comprehensive changelog on it.

I don't know what is changed in the refcount, such raid5 change has attempted
before and didn't work. 0 for the stripe count is a valid usage and we do
inc-from-zero in several places.

Thanks,
Shaohua

2018-05-24 07:15:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t

On Wed, May 23, 2018 at 12:22:39PM -0700, Shaohua Li wrote:
> I don't know what is changed in the refcount, such raid5 change has attempted
> before and didn't work. 0 for the stripe count is a valid usage and we do
> inc-from-zero in several places.

Nothing much has changed with refcount; and the above does indeed still
appear to be an issue. Thanks for confirming.