2020-11-22 15:42:12

by Pavel Begunkov

Subject: [PATCH v2 for-next 0/4] optimise sbitmap deferred clear

sbitmap takes away some cycles for my tag-deficient test; removal of
locking in sbitmap_deferred_clear() gives +~1% throughput.

[1/4] and [4/4] are simple; it'd be great if someone could double-check
the other two patches for ordering issues.

v2: add 3rd (CAS -> atomic and) and 4th patches

Pavel Begunkov (4):
sbitmap: optimise sbitmap_deferred_clear()
sbitmap: remove swap_lock
sbitmap: replace CAS with atomic and
sbitmap: simplify wrap check

include/linux/sbitmap.h | 5 -----
lib/sbitmap.c | 44 +++++++++++++++++------------------------
2 files changed, 18 insertions(+), 31 deletions(-)

--
2.24.0


2020-11-22 15:42:52

by Pavel Begunkov

Subject: [PATCH v2 1/4] sbitmap: optimise sbitmap_deferred_clear()

Because of the spinlocks and atomics, sbitmap_deferred_clear() has to
reload &sb->map[index] on each access, even though the map address won't
change. Pass in a sbitmap_word pointer instead of {sb, index}, so it's
cached in a variable. This also improves code generation of
sbitmap_find_bit_in_index().
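
To illustrate (a rough sketch, not the patch hunks themselves):
cmpxchg() acts as a compiler barrier, so in the old form the map address
has to be recomputed on every loop iteration, while the new form hoists
it into a local once:

	/* Old: &sb->map[index] is re-evaluated on each pass, because
	 * cmpxchg() is a compiler barrier. */
	do {
		val = sb->map[index].word;
	} while (cmpxchg(&sb->map[index].word, val, val & ~mask) != val);

	/* New: the word address is computed once and cached in a local. */
	struct sbitmap_word *map = &sb->map[index];

	do {
		val = map->word;
	} while (cmpxchg(&map->word, val, val & ~mask) != val);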

Signed-off-by: Pavel Begunkov <[email protected]>
---
lib/sbitmap.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 267aa7709416..c1c8a4e69325 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -12,32 +12,32 @@
/*
* See if we have deferred clears that we can batch move
*/
-static inline bool sbitmap_deferred_clear(struct sbitmap *sb, int index)
+static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
{
unsigned long mask, val;
bool ret = false;
unsigned long flags;

- spin_lock_irqsave(&sb->map[index].swap_lock, flags);
+ spin_lock_irqsave(&map->swap_lock, flags);

- if (!sb->map[index].cleared)
+ if (!map->cleared)
goto out_unlock;

/*
* First get a stable cleared mask, setting the old mask to 0.
*/
- mask = xchg(&sb->map[index].cleared, 0);
+ mask = xchg(&map->cleared, 0);

/*
* Now clear the masked bits in our free word
*/
do {
- val = sb->map[index].word;
- } while (cmpxchg(&sb->map[index].word, val, val & ~mask) != val);
+ val = map->word;
+ } while (cmpxchg(&map->word, val, val & ~mask) != val);

ret = true;
out_unlock:
- spin_unlock_irqrestore(&sb->map[index].swap_lock, flags);
+ spin_unlock_irqrestore(&map->swap_lock, flags);
return ret;
}

@@ -92,7 +92,7 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
unsigned int i;

for (i = 0; i < sb->map_nr; i++)
- sbitmap_deferred_clear(sb, i);
+ sbitmap_deferred_clear(&sb->map[i]);

sb->depth = depth;
sb->map_nr = DIV_ROUND_UP(sb->depth, bits_per_word);
@@ -139,15 +139,15 @@ static int __sbitmap_get_word(unsigned long *word, unsigned long depth,
static int sbitmap_find_bit_in_index(struct sbitmap *sb, int index,
unsigned int alloc_hint, bool round_robin)
{
+ struct sbitmap_word *map = &sb->map[index];
int nr;

do {
- nr = __sbitmap_get_word(&sb->map[index].word,
- sb->map[index].depth, alloc_hint,
+ nr = __sbitmap_get_word(&map->word, map->depth, alloc_hint,
!round_robin);
if (nr != -1)
break;
- if (!sbitmap_deferred_clear(sb, index))
+ if (!sbitmap_deferred_clear(map))
break;
} while (1);

@@ -207,7 +207,7 @@ int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint,
break;
}

- if (sbitmap_deferred_clear(sb, index))
+ if (sbitmap_deferred_clear(&sb->map[index]))
goto again;

/* Jump to next index. */
--
2.24.0

2020-11-22 15:44:38

by Pavel Begunkov

Subject: [PATCH v2 3/4] sbitmap: replace CAS with atomic and

sbitmap_deferred_clear() does a CAS loop to propagate cleared bits;
replace it with an equivalent atomic bitwise AND. That's slightly faster
and makes the operation wait-free instead of lock-free as before.

The atomic can be relaxed (i.e. barrier-less) because the following
sbitmap_get*() calls deal with synchronisation; see the comments in
sbitmap_queue_clear().

It's OK to cast to atomic_long_t; that's what bitops/lock.h does.
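
As a sketch of the change (assuming word is a plain unsigned long, as
in struct sbitmap_word):

	/* Lock-free: the CAS can fail and retry under contention. */
	do {
		val = map->word;
	} while (cmpxchg(&map->word, val, val & ~mask) != val);

	/* Wait-free: one atomic AND with ~mask, no retry loop. The cast
	 * is fine; it matches what bitops/lock.h does internally. */
	atomic_long_andnot(mask, (atomic_long_t *)&map->word);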

Signed-off-by: Pavel Begunkov <[email protected]>
---
lib/sbitmap.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 4fd877048ba8..c18b518a16ba 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -14,7 +14,7 @@
*/
static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
{
- unsigned long mask, val;
+ unsigned long mask;

if (!READ_ONCE(map->cleared))
return false;
@@ -27,10 +27,8 @@ static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
/*
* Now clear the masked bits in our free word
*/
- do {
- val = map->word;
- } while (cmpxchg(&map->word, val, val & ~mask) != val);
-
+ atomic_long_andnot(mask, (atomic_long_t *)&map->word);
+ BUILD_BUG_ON(sizeof(atomic_long_t) != sizeof(map->word));
return true;
}

--
2.24.0

2020-11-24 15:10:42

by Pavel Begunkov

Subject: Re: [PATCH v2 1/4] sbitmap: optimise sbitmap_deferred_clear()

On 24/11/2020 14:11, John Garry wrote:
> On 22/11/2020 15:35, Pavel Begunkov wrote:
>> Because of the spinlocks and atomics, sbitmap_deferred_clear() has to
>> reload &sb->map[index] on each access, even though the map address won't
>> change. Pass in a sbitmap_word pointer instead of {sb, index}, so it's
>> cached in a variable. This also improves code generation of
>> sbitmap_find_bit_in_index().
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>
> Looks ok, even though it's a bit odd not to be passing a struct sbitmap * now

IMHO, the narrower context is better, so it looks more natural to me.

> Reviewed-by: John Garry <[email protected]>

Thanks!

--
Pavel Begunkov

2020-11-25 02:07:22

by John Garry

Subject: Re: [PATCH v2 1/4] sbitmap: optimise sbitmap_deferred_clear()

On 22/11/2020 15:35, Pavel Begunkov wrote:
> Because of the spinlocks and atomics, sbitmap_deferred_clear() has to
> reload &sb->map[index] on each access, even though the map address won't
> change. Pass in a sbitmap_word pointer instead of {sb, index}, so it's
> cached in a variable. This also improves code generation of
> sbitmap_find_bit_in_index().
>
> Signed-off-by: Pavel Begunkov <[email protected]>

Looks ok, even though it's a bit odd not to be passing a struct sbitmap * now

Reviewed-by: John Garry <[email protected]>

2020-12-08 01:21:10

by Jens Axboe

Subject: Re: [PATCH v2 for-next 0/4] optimise sbitmap deferred clear

On 11/22/20 8:35 AM, Pavel Begunkov wrote:
> sbitmap takes away some cycles for my tag-deficient test; removal of
> locking in sbitmap_deferred_clear() gives +~1% throughput.
>
> [1/4] and [4/4] are simple; it'd be great if someone could double-check
> the other two patches for ordering issues.
>
> v2: add 3rd (CAS -> atomic and) and 4th patches

Applied, thanks.

--
Jens Axboe