2015-02-08 14:11:39

by Yury Norov

Subject: [PATCH v3 0/3] lib: find_*_bit reimplementation

This patchset reworks the find_bit family of functions to achieve better
performance and to decrease text size. All the rework is done in patch 1;
patches 2 and 3 only move and rename code.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

/* addr[] is filled from /dev/urandom */
start = clock();
while (ret < nbits)
ret = find_next_bit(addr, nbits, ret + 1);

end = clock();
printf("%ld\t", (unsigned long) end - start);

On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the results are as follows
(for find_next_bit, nbits is 8M; for find_first_bit, 80K):

find_next_bit:      find_first_bit:
new      current    new      current
26932    43151      14777    14925
26947    43182      14521    15423
26507    43824      15053    14705
27329    43759      14473    14777
26895    43367      14847    15023
26990    43693      15103    15163
26775    43299      15067    15232
27282    42752      14544    15121
27504    43088      14644    14858
26761    43856      14699    15193
26692    43075      14781    14681
27137    42969      14451    15061
...      ...        ...      ...

The find_next_bit performance gain is 35-40%; for find_first_bit there is
no measurable difference.

On ARM machines there is an arch-specific implementation of the find_bit
functions. To disable it and use the generic one, apply the following patch:

---
arch/arm/include/asm/bitops.h | 19 -------------------
arch/arm/kernel/armksyms.c | 11 -----------
arch/arm/lib/Makefile | 2 +-
include/asm-generic/bitops/le.h | 1 +
4 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
index 5638099..e0611d1 100644
--- a/arch/arm/include/asm/bitops.h
+++ b/arch/arm/include/asm/bitops.h
@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
#define test_and_clear_bit(nr,p) ATOMIC_BITOP(test_and_clear_bit,nr,p)
#define test_and_change_bit(nr,p) ATOMIC_BITOP(test_and_change_bit,nr,p)

-#ifndef __ARMEB__
-/*
- * These are the little endian, atomic definitions.
- */
-#define find_first_zero_bit(p,sz) _find_first_zero_bit_le(p,sz)
-#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_le(p,sz,off)
-#define find_first_bit(p,sz) _find_first_bit_le(p,sz)
-#define find_next_bit(p,sz,off) _find_next_bit_le(p,sz,off)
-
-#else
-/*
- * These are the big endian, atomic definitions.
- */
-#define find_first_zero_bit(p,sz) _find_first_zero_bit_be(p,sz)
-#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_be(p,sz,off)
-#define find_first_bit(p,sz) _find_first_bit_be(p,sz)
-#define find_next_bit(p,sz,off) _find_next_bit_be(p,sz,off)
-
-#endif

#if __LINUX_ARM_ARCH__ < 5

diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index a88671c..22e8748 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
EXPORT_SYMBOL(_test_and_clear_bit);
EXPORT_SYMBOL(_change_bit);
EXPORT_SYMBOL(_test_and_change_bit);
-EXPORT_SYMBOL(_find_first_zero_bit_le);
-EXPORT_SYMBOL(_find_next_zero_bit_le);
-EXPORT_SYMBOL(_find_first_bit_le);
-EXPORT_SYMBOL(_find_next_bit_le);
-
-#ifdef __ARMEB__
-EXPORT_SYMBOL(_find_first_zero_bit_be);
-EXPORT_SYMBOL(_find_next_zero_bit_be);
-EXPORT_SYMBOL(_find_first_bit_be);
-EXPORT_SYMBOL(_find_next_bit_be);
-#endif

#ifdef CONFIG_FUNCTION_TRACER
#ifdef CONFIG_OLD_MCOUNT
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 0573faa..de369aa 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -6,7 +6,7 @@

lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
- delay.o delay-loop.o findbit.o memchr.o memcpy.o \
+ delay.o delay-loop.o memchr.o memcpy.o \
memmove.o memset.o memzero.o setbit.o \
strchr.o strrchr.o \
testchangebit.o testclearbit.o testsetbit.o \
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 6173154..9a8798f 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -2,6 +2,7 @@
#define _ASM_GENERIC_BITOPS_LE_H_

#include <asm/types.h>
+#include <asm-generic/bitops/find.h>
#include <asm/byteorder.h>

#if defined(__LITTLE_ENDIAN)
--
1.9.1

v3:
* conditional bit inversion in _find_next_bit replaced with XORing by a
mask (patch 1);
* size/offset checks moved from the wrappers into _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* the lib/find_next_bit.c file header changed to not refer to the file
name, so that a clean diff is produced in case of a rename (patch 2);
* patch 3 generated with the '-M' option to detect the rename and drop
the file content from the diff;
* this cover letter added.

Yury Norov (3):
lib: find_*_bit reimplementation
lib: move find_last_bit to lib/find_next_bit.c
lib: rename lib/find_next_bit.c to lib/find_bit.c

lib/Makefile | 2 +-
lib/find_bit.c | 197 ++++++++++++++++++++++++++++++++++++
lib/find_last_bit.c | 49 ---------
lib/find_next_bit.c | 285 ----------------------------------------------------
4 files changed, 198 insertions(+), 335 deletions(-)
create mode 100644 lib/find_bit.c
delete mode 100644 lib/find_last_bit.c
delete mode 100644 lib/find_next_bit.c

Signed-off-by: Yury Norov <[email protected]>
Tested-by: Alexey Klimov <[email protected]> (odroid-xu3 ARMv7 SoC).
--
2.1.0


2015-02-08 14:11:47

by Yury Norov

Subject: [PATCH v3 1/3] lib: find_*_bit reimplementation

The new implementations take less space in the source file (see diffstat)
and in the object file: for me it is 453 vs. 710 bytes of text. They also
show better performance.

Signed-off-by: Yury Norov <[email protected]>
---
lib/find_last_bit.c | 30 ++----
lib/find_next_bit.c | 275 ++++++++++++++++------------------------------------
2 files changed, 92 insertions(+), 213 deletions(-)

diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..106050f 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -4,6 +4,9 @@
* Written by Rusty Russell <[email protected]>
* (Inspired by David Howell's find_next_bit implementation)
*
+ * Rewritten by Yury Norov <[email protected]> to decrease
+ * size and improve performance, 2015.
+ *
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
@@ -12,36 +15,19 @@

#include <linux/bitops.h>
#include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>

#ifndef find_last_bit

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
- unsigned long words;
- unsigned long tmp;
-
- /* Start at final word. */
- words = size / BITS_PER_LONG;
-
- /* Partial final word? */
- if (size & (BITS_PER_LONG-1)) {
- tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
- - (size & (BITS_PER_LONG-1)))));
- if (tmp)
- goto found;
- }
+ unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);

- while (words) {
- tmp = addr[--words];
- if (tmp) {
-found:
- return words * BITS_PER_LONG + __fls(tmp);
- }
+ while (idx--) {
+ if (addr[idx])
+ return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
}

- /* Not found */
return size;
}
EXPORT_SYMBOL(find_last_bit);
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 0cbfc0b..71aa497 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -3,6 +3,9 @@
* Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*
+ * Rewritten by Yury Norov <[email protected]> to decrease
+ * size and improve performance, 2015.
+ *
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
@@ -11,10 +14,48 @@

#include <linux/bitops.h>
#include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
+
+#define HIGH_BITS_MASK(nr) (ULONG_MAX << (nr))
+
+#if !defined(find_next_bit) || !defined(find_next_zero_bit)
+static unsigned long _find_next_bit(const unsigned long *addr,
+ unsigned long nbits, unsigned long start, unsigned long mask)
+{
+ unsigned long tmp;
+
+ if (start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ mask;

-#define BITOP_WORD(nr) ((nr) / BITS_PER_LONG)
+ /* Handle 1st word. */
+ if (!IS_ALIGNED(start, BITS_PER_LONG)) {
+ tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
+ start = round_down(start, BITS_PER_LONG);
+ }
+
+ while (!tmp) {
+ start += BITS_PER_LONG;
+ if (start >= nbits)
+ return nbits;
+
+ /*
+ * This is an equvalent for:
+ *
+ * tmp = find_set ? addr[start / BITS_PER_LONG]
+ * : ~addr[start / BITS_PER_LONG];
+ *
+ * but saves a branch condition.
+ *
+ * Thanks George Spelvin <[email protected]> for idea.
+ */
+ tmp = addr[start / BITS_PER_LONG] ^ mask;
+ }
+
+ return min(start + __ffs(tmp), nbits);
+}
+#endif

#ifndef find_next_bit
/*
@@ -23,86 +64,16 @@
unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
- const unsigned long *p = addr + BITOP_WORD(offset);
- unsigned long result = offset & ~(BITS_PER_LONG-1);
- unsigned long tmp;
-
- if (offset >= size)
- return size;
- size -= result;
- offset %= BITS_PER_LONG;
- if (offset) {
- tmp = *(p++);
- tmp &= (~0UL << offset);
- if (size < BITS_PER_LONG)
- goto found_first;
- if (tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
- while (size & ~(BITS_PER_LONG-1)) {
- if ((tmp = *(p++)))
- goto found_middle;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
- }
- if (!size)
- return result;
- tmp = *p;
-
-found_first:
- tmp &= (~0UL >> (BITS_PER_LONG - size));
- if (tmp == 0UL) /* Are any bits set? */
- return result + size; /* Nope. */
-found_middle:
- return result + __ffs(tmp);
+ return _find_next_bit(addr, size, offset, 0UL);
}
EXPORT_SYMBOL(find_next_bit);
#endif

#ifndef find_next_zero_bit
-/*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
- */
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
- const unsigned long *p = addr + BITOP_WORD(offset);
- unsigned long result = offset & ~(BITS_PER_LONG-1);
- unsigned long tmp;
-
- if (offset >= size)
- return size;
- size -= result;
- offset %= BITS_PER_LONG;
- if (offset) {
- tmp = *(p++);
- tmp |= ~0UL >> (BITS_PER_LONG - offset);
- if (size < BITS_PER_LONG)
- goto found_first;
- if (~tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
- while (size & ~(BITS_PER_LONG-1)) {
- if (~(tmp = *(p++)))
- goto found_middle;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
- }
- if (!size)
- return result;
- tmp = *p;
-
-found_first:
- tmp |= ~0UL << size;
- if (tmp == ~0UL) /* Are any bits zero? */
- return result + size; /* Nope. */
-found_middle:
- return result + ffz(tmp);
+ return _find_next_bit(addr, size, offset, ~0UL);
}
EXPORT_SYMBOL(find_next_zero_bit);
#endif
@@ -113,24 +84,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
*/
unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
{
- const unsigned long *p = addr;
- unsigned long result = 0;
- unsigned long tmp;
+ unsigned long idx;

- while (size & ~(BITS_PER_LONG-1)) {
- if ((tmp = *(p++)))
- goto found;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+ if (addr[idx])
+ return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
}
- if (!size)
- return result;

- tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
- if (tmp == 0UL) /* Are any bits set? */
- return result + size; /* Nope. */
-found:
- return result + __ffs(tmp);
+ return size;
}
EXPORT_SYMBOL(find_first_bit);
#endif
@@ -141,24 +102,14 @@ EXPORT_SYMBOL(find_first_bit);
*/
unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
- const unsigned long *p = addr;
- unsigned long result = 0;
- unsigned long tmp;
+ unsigned long idx;

- while (size & ~(BITS_PER_LONG-1)) {
- if (~(tmp = *(p++)))
- goto found;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+ if (addr[idx] != ULONG_MAX)
+ return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
}
- if (!size)
- return result;

- tmp = (*p) | (~0UL << size);
- if (tmp == ~0UL) /* Are any bits zero? */
- return result + size; /* Nope. */
-found:
- return result + ffz(tmp);
+ return size;
}
EXPORT_SYMBOL(find_first_zero_bit);
#endif
@@ -166,18 +117,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
#ifdef __BIG_ENDIAN

/* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swabp(const unsigned long * x)
-{
-#if BITS_PER_LONG == 64
- return (unsigned long) __swab64p((u64 *) x);
-#elif BITS_PER_LONG == 32
- return (unsigned long) __swab32p((u32 *) x);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}
-
-/* include/linux/byteorder doesn't support "unsigned long" type */
static inline unsigned long ext2_swab(const unsigned long y)
{
#if BITS_PER_LONG == 64
@@ -189,48 +128,40 @@ static inline unsigned long ext2_swab(const unsigned long y)
#endif
}

-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
- long size, unsigned long offset)
+#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
+static unsigned long _find_next_bit_le(const unsigned long *addr,
+ unsigned long nbits, unsigned long start, unsigned long mask)
{
- const unsigned long *p = addr;
- unsigned long result = offset & ~(BITS_PER_LONG - 1);
unsigned long tmp;

- if (offset >= size)
- return size;
- p += BITOP_WORD(offset);
- size -= result;
- offset &= (BITS_PER_LONG - 1UL);
- if (offset) {
- tmp = ext2_swabp(p++);
- tmp |= (~0UL >> (BITS_PER_LONG - offset));
- if (size < BITS_PER_LONG)
- goto found_first;
- if (~tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
+ if (start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ mask;
+
+ /* Handle 1st word. */
+ if (!IS_ALIGNED(start, BITS_PER_LONG)) {
+ tmp &= ext2_swab(HIGH_BITS_MASK(start % BITS_PER_LONG));
+ start = round_down(start, BITS_PER_LONG);
}

- while (size & ~(BITS_PER_LONG - 1)) {
- if (~(tmp = *(p++)))
- goto found_middle_swap;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ while (!tmp) {
+ start += BITS_PER_LONG;
+ if (start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ mask;
}
- if (!size)
- return result;
- tmp = ext2_swabp(p);
-found_first:
- tmp |= ~0UL << size;
- if (tmp == ~0UL) /* Are any bits zero? */
- return result + size; /* Nope. Skip ffz */
-found_middle:
- return result + ffz(tmp);

-found_middle_swap:
- return result + ffz(ext2_swab(tmp));
+ return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+#endif
+
+#ifndef find_next_zero_bit_le
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+ long size, unsigned long offset)
+{
+ return _find_next_bit_le(addr, size, offset, ~0UL);
}
EXPORT_SYMBOL(find_next_zero_bit_le);
#endif
@@ -239,45 +170,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
unsigned long find_next_bit_le(const void *addr, unsigned
long size, unsigned long offset)
{
- const unsigned long *p = addr;
- unsigned long result = offset & ~(BITS_PER_LONG - 1);
- unsigned long tmp;
-
- if (offset >= size)
- return size;
- p += BITOP_WORD(offset);
- size -= result;
- offset &= (BITS_PER_LONG - 1UL);
- if (offset) {
- tmp = ext2_swabp(p++);
- tmp &= (~0UL << offset);
- if (size < BITS_PER_LONG)
- goto found_first;
- if (tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
-
- while (size & ~(BITS_PER_LONG - 1)) {
- tmp = *(p++);
- if (tmp)
- goto found_middle_swap;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
- }
- if (!size)
- return result;
- tmp = ext2_swabp(p);
-found_first:
- tmp &= (~0UL >> (BITS_PER_LONG - size));
- if (tmp == 0UL) /* Are any bits set? */
- return result + size; /* Nope. */
-found_middle:
- return result + __ffs(tmp);
-
-found_middle_swap:
- return result + __ffs(ext2_swab(tmp));
+ return _find_next_bit_le(addr, size, offset, 0UL);
}
EXPORT_SYMBOL(find_next_bit_le);
#endif
--
2.1.0

2015-02-08 14:11:50

by Yury Norov

Subject: [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c

Currently the whole 'find_*_bit' family is located in lib/find_next_bit.c,
except for 'find_last_bit', which lives in lib/find_last_bit.c. There seems
to be no major benefit to keeping it separate.

Signed-off-by: Yury Norov <[email protected]>
---
lib/Makefile | 2 +-
lib/find_last_bit.c | 35 -----------------------------------
lib/find_next_bit.c | 21 ++++++++++++++++++++-
3 files changed, 21 insertions(+), 37 deletions(-)
delete mode 100644 lib/find_last_bit.c

diff --git a/lib/Makefile b/lib/Makefile
index 3c3b30b..13990aa 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y += lockref.o
obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
- bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+ bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
deleted file mode 100644
index 106050f..0000000
--- a/lib/find_last_bit.c
+++ /dev/null
@@ -1,35 +0,0 @@
-/* find_last_bit.c: fallback find next bit implementation
- *
- * Copyright (C) 2008 IBM Corporation
- * Written by Rusty Russell <[email protected]>
- * (Inspired by David Howell's find_next_bit implementation)
- *
- * Rewritten by Yury Norov <[email protected]> to decrease
- * size and improve performance, 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/bitops.h>
-#include <linux/export.h>
-#include <linux/kernel.h>
-
-#ifndef find_last_bit
-
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
- unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
-
- while (idx--) {
- if (addr[idx])
- return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
- }
-
- return size;
-}
-EXPORT_SYMBOL(find_last_bit);
-
-#endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 71aa497..5ec0ab9 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -1,8 +1,12 @@
-/* find_next_bit.c: fallback find next bit implementation
+/* bit search implementation
*
* Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*
+ * Copyright (C) 2008 IBM Corporation
+ * 'find_last_bit' is written by Rusty Russell <[email protected]>
+ * (Inspired by David Howell's find_next_bit implementation)
+ *
* Rewritten by Yury Norov <[email protected]> to decrease
* size and improve performance, 2015.
*
@@ -114,6 +118,21 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
EXPORT_SYMBOL(find_first_zero_bit);
#endif

+#ifndef find_last_bit
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+ unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
+
+ while (idx--) {
+ if (addr[idx])
+ return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
+ }
+
+ return size;
+}
+EXPORT_SYMBOL(find_last_bit);
+#endif
+
#ifdef __BIG_ENDIAN

/* include/linux/byteorder does not support "unsigned long" type */
--
2.1.0

2015-02-08 14:11:54

by Yury Norov

Subject: [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c

This file contains the implementations of all the find_*_bit{,_le}
functions, so giving it a more generic name seems reasonable.

Signed-off-by: Yury Norov <[email protected]>
---
lib/Makefile | 2 +-
lib/{find_next_bit.c => find_bit.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
rename lib/{find_next_bit.c => find_bit.c} (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 13990aa..1cc93f4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y += lockref.o
obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
- bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
+ bsearch.o find_bit.o llist.o memweight.o kfifo.o \
percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_next_bit.c b/lib/find_bit.c
similarity index 100%
rename from lib/find_next_bit.c
rename to lib/find_bit.c
--
2.1.0

2015-02-08 18:48:26

by George Spelvin

Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

This basically has my Reviewed-by: (I'll send it in a few hours when I
have time to do a real final copy-edit), but a few minor notes:

>+ /*
>+ * This is an equvalent for:
>+ *
>+ * tmp = find_set ? addr[start / BITS_PER_LONG]
>+ * : ~addr[start / BITS_PER_LONG];
>+ *
>+ * but saves a branch condition.
>+ *
>+ * Thanks George Spelvin <[email protected]> for idea.
>+ */
>+ tmp = addr[start / BITS_PER_LONG] ^ mask;

1. There's no need for detailed credit for such a trivial and obvious
thing. If you want to comment it, describe the use of the parameter
in the function header, e.g.

+/*
+ * "mask" is either 0 or ~0UL and XORed into each fetched word, to select between
+ * searching for one bits and zero bits. If this function is inlined, GCC is
+ * smart enough to propagate the constant.
+ */

2. The variable might be more meaningfully named "xor" or "invert"; "mask"
suggests something that is &-ed with a value.


> unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
> unsigned long offset)
> {
>+ return _find_next_bit(addr, size, offset, ~0UL); <---
> }

> unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
> {
>+ unsigned long idx;
>
>+ for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
>+ if (addr[idx] != ULONG_MAX) <---
>+ return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
> }
>
>+ return size;
> }

3. Using two names (ULONG_MAX and ~0UL) for the same thing is a bit odd;
you might want to be consistent.

I'll ack it either way; none of these are significant technical issues.

2015-02-09 08:32:16

by George Spelvin

Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

A few more comments on the code. Two are minor, but one seems like
a bug, so for now, it's

Nacked-by: George Spelvin <[email protected]>

Specifically, it seems like find_last_bit used to ignore trailing
garbage in the bitmap, but now will stop searching if the last word
contains some set bits not within size.


The minor one is that I don't think the first-word masking needs to
be conditional. The general code works fine if the start is aligned
(HIGH_BITS_MASK just generates an all-ones mask), is quite quick, and
saves a test & conditional branch.

A second minor one is that the comment in include/linux/bitops.h has
an obvious typo, and should probably be fixed, too.

Here's a diff on top of yours with all my suggested changes, both
these two and the ones from my previous e-mail.

diff --git a/find_last_bit.c b/find_last_bit.c
index e67e970..106050f 100644
--- a/find_last_bit.c
+++ b/find_last_bit.c
@@ -13,6 +13,7 @@
* 2 of the License, or (at your option) any later version.
*/

+#include <linux/bitops.h>
#include <linux/export.h>
#include <linux/kernel.h>

diff --git a/find_next_bit.c b/find_next_bit.c
index 71aa497..9d01d4a 100644
--- a/find_next_bit.c
+++ b/find_next_bit.c
@@ -16,41 +16,34 @@
#include <linux/export.h>
#include <linux/kernel.h>

-#define HIGH_BITS_MASK(nr) (ULONG_MAX << (nr))
+#define HIGH_BITS_MASK(nr) (~0UL << (nr))

#if !defined(find_next_bit) || !defined(find_next_zero_bit)
+
+/*
+ * This is a common helper function for find_next_bit and
+ * find_next_zero_bit. The difference is the "invert" argument, which
+ * is XORed with each fetched word before searching it for one bits.
+ */
static unsigned long _find_next_bit(const unsigned long *addr,
- unsigned long nbits, unsigned long start, unsigned long mask)
+ unsigned long nbits, unsigned long start, unsigned long invert)
{
unsigned long tmp;

if (start >= nbits)
return nbits;

- tmp = addr[start / BITS_PER_LONG] ^ mask;
+ tmp = addr[start / BITS_PER_LONG] ^ invert;

/* Handle 1st word. */
- if (!IS_ALIGNED(start, BITS_PER_LONG)) {
- tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
- start = round_down(start, BITS_PER_LONG);
- }
+ tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
+ start = round_down(start, BITS_PER_LONG);

while (!tmp) {
start += BITS_PER_LONG;
if (start >= nbits)
return nbits;
-
- /*
- * This is an equvalent for:
- *
- * tmp = find_set ? addr[start / BITS_PER_LONG]
- * : ~addr[start / BITS_PER_LONG];
- *
- * but saves a branch condition.
- *
- * Thanks George Spelvin <[email protected]> for idea.
- */
- tmp = addr[start / BITS_PER_LONG] ^ mask;
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
}

return min(start + __ffs(tmp), nbits);
@@ -105,7 +98,7 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
unsigned long idx;

for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
- if (addr[idx] != ULONG_MAX)
+ if (~addr[idx])
return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
}

@@ -130,14 +123,14 @@ static inline unsigned long ext2_swab(const unsigned long y)

#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
static unsigned long _find_next_bit_le(const unsigned long *addr,
- unsigned long nbits, unsigned long start, unsigned long mask)
+ unsigned long nbits, unsigned long start, unsigned long invert)
{
unsigned long tmp;

if (start >= nbits)
return nbits;

- tmp = addr[start / BITS_PER_LONG] ^ mask;
+ tmp = addr[start / BITS_PER_LONG] ^ invert;

/* Handle 1st word. */
if (!IS_ALIGNED(start, BITS_PER_LONG)) {
@@ -150,7 +143,7 @@ static unsigned long _find_next_bit_le(const unsigned long *addr,
if (start >= nbits)
return nbits;

- tmp = addr[start / BITS_PER_LONG] ^ mask;
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
}

return min(start + __ffs(ext2_swab(tmp)), nbits);
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5d858e02..4a5e5934 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
/**
* find_last_bit - find the last set bit in a memory region
* @addr: The address to start the search at
- * @size: The maximum size to search
+ * @size: The number of bits to search
*
- * Returns the bit number of the first set bit, or size.
+ * Returns the bit number of the last set bit, or size if no bits are set.
*/
extern unsigned long find_last_bit(const unsigned long *addr,
unsigned long size);




Now for the serious issue: I see a functional change to find_last_bit
with no justifying comment.

diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..106050f 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -19,29 +21,13 @@

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
- unsigned long words;
- unsigned long tmp;
-
- /* Start at final word. */
- words = size / BITS_PER_LONG;
-
- /* Partial final word? */
- if (size & (BITS_PER_LONG-1)) {
- tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
- - (size & (BITS_PER_LONG-1)))));
- if (tmp)
- goto found;
- }
+ unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);

- while (words) {
- tmp = addr[--words];
- if (tmp) {
-found:
- return words * BITS_PER_LONG + __fls(tmp);
- }
+ while (idx--) {
+ if (addr[idx])
+ return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
}

- /* Not found */
return size;
}
EXPORT_SYMBOL(find_last_bit);


Previously, the last word was masked, so bits beyond "size" were ignored.
With the revised code, something like find_last_bit(array, 96) will return 96
if array[1] >> 32 is non-zero, even if array[1] & 0xffffffff is zero.

Looking through the callers, I haven't found a case where this matters yet
so perhaps it's a safe optimization, but this really needs to be more
clearly documented if intentional.

If no change was desired, I'd think a good way to do this would be:

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
unsigned long tmp = addr[--idx];

tmp &= (2UL << (size % BITS_PER_LONG)) - 1; /* Mask last word */

while (!tmp) {
if (!idx)
return size;
tmp = addr[--idx];
}
return idx * BITS_PER_LONG + __fls(tmp);
}

Which is not that different from the old code, but cleaned up.

(Feel free to add my Signed-off-by if you think you need it.)

2015-02-09 11:53:44

by Rasmus Villemoes

Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

[Yury, please do remember to Cc everyone who has previously
participated]

On Mon, Feb 09 2015, "George Spelvin" <[email protected]> wrote:

> Two more comments on the code. Two minor, but one that
> seems like a bug, so for now, it's
>
> Nacked-by: George Spelvin <[email protected]>
>
> Specifically, it seems like find_last_bit used to ignore trailing
> garbage in the bitmap, but now will stop searching if the last word
> contains some set bits not within size.

True, though see below.

> The minor one is that I don't think the first-word masking needs to
> be conditional. The general code works fine if the start is aligned
> (HIGH_BITS_MASK just generates an all-ones mask), is quite quick, and
> saves a test & conditional branch.
>

I also noted that during the first review, but when I tried to compile
it gcc actually generated slightly worse code, so I decided not to
comment on it. I don't have a strong preference either way, though.

>
> Previously, the last word was masked, so bits beyond "size" were ignored.
> With the revised code, something like find_last_bit(array, 96) will return 96
> if array[1] >> 32 is non-zero, even if array[1] & 0xffffffff is zero.
>
> Looking through the callers, I haven't found a case where this matters yet
> so perhaps it's a safe optimization, but this really needs to be more
> clearly documented if intentional.
>
> If no change was desired, I'd think a good way to do this would be:
>
> unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
> {
> size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
> unsigned long tmp = addr[--idx];
>
> tmp &= (2UL << (size % BITS_PER_LONG)) - 1; /* Mask last word */
>
> while (!tmp) {
> if (!idx)
> return size;
> tmp = addr[--idx];
> }
> return idx * BITS_PER_LONG + __fls(tmp);
> }

How should that work? If size is for example 1, the mask evaluates to 3UL,
while what is needed is 1UL. If size is aligned, the mask becomes 1UL,
which is also not right.

Also, I think it is best to handle size==0 appropriately, meaning that
one cannot dereference addr in any way (and certainly not addr[-1]).

So how about

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
unsigned long mask = LAST_WORD_MASK(size);

while (idx--) {
unsigned long val = addr[idx] & mask;
if (val)
return idx * BITS_PER_LONG + __fls(val);
mask = ~0ul;
}
return size;
}

Rasmus

2015-02-09 16:45:45

by George Spelvin

Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

Sorry, I screwed up the bit-twiddling while messing with various options.
I was trying to get size == 32 to work; that should have been:

> tmp &= (2UL << ((size-1) % BITS_PER_LONG)) - 1; /* Mask last word */

And you're right that LAST_WORD_MASK is a good wrapper.

Various working solutions include:
#define LAST_WORD_MASK(bits) ((2UL << (bits-1) % BITS_PER_LONG) - 1)
#define LAST_WORD_MASK(bits) ~(~0UL << bits % BITS_PER_LONG)
#define LAST_WORD_MASK(bits) (~0UL >> -bits % BITS_PER_LONG)

I'm not sure which generates the nicest code. It's 4 instructions
each way, with the last being 1 byte smaller:

0000000000000000 <lwm1>:
0: 8d 4f ff lea -0x1(%rdi),%ecx
3: b8 02 00 00 00 mov $0x2,%eax
8: 48 d3 e0 shl %cl,%rax
b: 48 83 e8 01 sub $0x1,%rax
f: c3 retq

0000000000000010 <lwm2>:
10: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
17: 89 f9 mov %edi,%ecx
19: 48 d3 e0 shl %cl,%rax
1c: 48 f7 d0 not %rax
1f: c3 retq

0000000000000020 <lwm3>:
20: 89 f9 mov %edi,%ecx
22: f7 d9 neg %ecx
24: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
2b: 48 d3 e8 shr %cl,%rax
2e: c3 retq


> Also, I think it is best to handle size==0 appropriately, meaning that
> one cannot dereference addr in any way (and certainly not addr[-1]).

Ah, okay; I figured that was a safe case to omit. But your solution is nicer
than mine overall.

It may be that omitting the mask *is* safe, but it's a lot of wading through
callers to prove it.

2015-02-11 22:14:14

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

[for some reason google decided to put this in my spam folder, hrmpf]

On Mon, Feb 09 2015, "George Spelvin" <[email protected]> wrote:

> Sorry, I screwed up the bit-twiddling while messing with various options.
> I was trying to get size == 32 to work; that should have been:
>
>> tmp &= (2UL << ((size-1) % BITS_PER_LONG)) - 1; /* Mask last word */
>
> And you're right that LAST_WORD_MASK is a good wrapper.
>

Well, it's not my invention, I just misremembered the
name. linux/bitmap.h already exposes BITMAP_LAST_WORD_MASK.

> Various working solutions include:
> #define LAST_WORD_MASK(bits) ((2UL << (bits-1) % BITS_PER_LONG) - 1)
> #define LAST_WORD_MASK(bits) ~(~0UL << bits % BITS_PER_LONG)
> #define LAST_WORD_MASK(bits) (~0UL >> -bits % BITS_PER_LONG)

Incidentally, I had a patch lying around replacing BITMAP_LAST_WORD_MASK with
something like the last of these (it currently uses a ?:). But to allow bits to
have a signed type it is safer to spell it

#define BITMAP_LAST_WORD_MASK(bits) (~0UL >> ((-(bits)) & (BITS_PER_LONG-1)))

[also adding lots of parentheses so I don't have to worry about precedence].

> I'm not sure which generates the nicest code. It's 4 instructions
> each way, with the last being 1 byte smaller:

I think one would have to look at the effects on real code; when compiling a
function doing nothing but this, gcc has to use specific registers for input
and output.

>> Also, I think it is best to handle size==0 appropriately, meaning that
>> one cannot dereference addr in any way (and certainly not addr[-1]).
>
> Ah, okay; I figured that was a safe case to omit. But your solution is nicer
> than mine overall.
>
> It may be that omitting the mask *is* safe, but it's a lot of wading through
> callers to prove it.

I think generic library code like this should provide both safety
checks, and only if some true performance bottleneck is found can one
start looking at implementing __shortcuts which have further constraints
on the caller.

Rasmus

2015-02-11 23:05:22

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation


On 09.02.2015 14:53, Rasmus Villemoes wrote:
> [Yury, please do remember to Cc everyone who has previously
> participated]
>
> On Mon, Feb 09 2015, "George Spelvin" <[email protected]> wrote:
>
>> Two more comments on the code. Two minor, but one that
>> seems like a bug, so for now, it's
>>
>> Nacked-by: George Spelvin <[email protected]>
>>
>> Specifically, it seems like find_last_bit used to ignore trailing
>> garbage in the bitmap, but now will stop searching if the last word
>> contains some set bits not within size.
> True, though see below.
>
>> The minor one is that I don't think the first-word masking needs to
>> be conditional. The general code works fine if the start is aligned
>> (HIGH_BITS_MASK just generates an all-ones mask), is quite quick, and
>> saves a test & conditional branch.
>>
> I also noted that during the first review, but when I tried to compile
> it gcc actually generated slightly worse code, so I decided not to
> comment on it. I don't have a strong preference either way, though.
>
>> Previously, the last word was masked, so bits beyond "size" were ignored.
>> With the revised code, something like find_last_bit(array, 96) will return 96
>> if array[1] >> 32 is non-zero, even if array[1] & 0xffffffff is zero.
>>
>> Looking through the callers, I haven't found a case where this matters yet
>> so perhaps it's a safe optimization, but this really needs to be more
>> clearly documented if intentional.
>>
>> If no change was desired, I'd think a good way to do this would be:
>>
>> unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
>> {
>> size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
>> unsigned long tmp = addr[--idx];
>>
>> tmp &= (2UL << (size % BITS_PER_LONG)) - 1; /* Mask last word */
>>
>> while (!tmp) {
>> if (!idx)
>> return size;
>> tmp = addr[--idx];
>> }
>> return idx * BITS_PER_LONG + __fls(tmp);
>> }
> How should that work? If size is for example 1, the mask evaluates to 3UL,
> while what is needed is 1UL. If size is aligned, the mask becomes 1UL,
> which is also not right.
>
> Also, I think it is best to handle size==0 appropriately, meaning that
> one cannot dereference addr in any way (and certainly not addr[-1]).
>
> So how about
>
> unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
> {
> size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
> unsigned long mask = LAST_WORD_MASK(size);
>
> while (idx--) {
> unsigned long val = addr[idx] & mask;
> if (val)
> return idx * BITS_PER_LONG + __fls(val);
> mask = ~0ul;
> }
> return size;
> }
>
> Rasmus
Rasmus, your version ANDs with the mask and resets the mask at each iteration
of the main loop. I think we can avoid that. What do you think of the following?

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
size_t idx;
unsigned long tmp;

if (!size)
return 0;

idx = DIV_ROUND_UP(size, BITS_PER_LONG) - 1;
tmp = addr[idx] & LAST_WORD_MASK(size);

while (!tmp) {
if (!idx--)
return size;
tmp = addr[idx];
}
return idx * BITS_PER_LONG + __fls(tmp);
}

Yury

2015-02-12 08:15:50

by George Spelvin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

> Rasmus, your version ANDs with the mask and resets the mask at each iteration
> of the main loop. I think we can avoid that. What do you think of the following?

Yes, that's basically what I proposed (modulo checking for zero size and
my buggy LAST_WORD_MASK).

But two unconditional instructions in the loop are awfully minor; it's
loads and conditional branches that cost.

The reset of the mask can be done in parallel with other operations; it's
only the AND that actually takes a cycle.

I can definitely see the argument that, for code that's not used often
enough to stay resident in the L1 cache, any speedup has to win by at
least one L2 cache access to be worth taking another cache line.

For Ivy bridge, those numbers are 32 KB and 12 cycles.

2015-02-12 09:58:59

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

On Thu, Feb 12 2015, "George Spelvin" <[email protected]> wrote:

>> Rasmus, your version ANDs with the mask and resets the mask at each iteration
>> of the main loop. I think we can avoid that. What do you think of the following?
>
> Yes, that's basically what I proposed (modulo checking for zero size and
> my buggy LAST_WORD_MASK).
>
> But two unconditional instructions in the loop are awfully minor; it's
> loads and conditional branches that cost.
>
> The reset of the mask can be done in parallel with other operations; it's
> only the AND that actually takes a cycle.
>
> I can definitely see the argument that, for code that's not used often
> enough to stay resident in the L1 cache, any speedup has to win by at
> least one L2 cache access to be worth taking another cache line.

That, and if the compiler was smart enough, the AND should actually be
free (at least on x86), since it can replace a test instruction. [1]

In any case, my code compiles to fewer bytes (though not an entire cache
line), and I think it is somewhat clearer - I'm obviously biased on the
latter point.

Rasmus


[1] Unfortunately, the compiler doesn't seem to be smart enough :-( When I
compile my code with gcc 5.0, I get

0000000000000000 <find_last_bit_rv>:
0: 53 push %rbx
1: 89 f1 mov %esi,%ecx
3: 48 8d 5e 3f lea 0x3f(%rsi),%rbx
7: f7 d9 neg %ecx
9: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
10: 48 83 e3 c0 and $0xffffffffffffffc0,%rbx
14: 48 d3 ea shr %cl,%rdx
17: eb 1a jmp 33 <find_last_bit_rv+0x33>
19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
20: 48 89 d1 mov %rdx,%rcx
23: 48 23 0c df and (%rdi,%rbx,8),%rcx
27: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
2e: 48 85 c9 test %rcx,%rcx
31: 75 15 jne 48 <find_last_bit_rv+0x48>
33: 48 83 eb 01 sub $0x1,%rbx
37: 48 83 fb ff cmp $0xffffffffffffffff,%rbx
3b: 75 e3 jne 20 <find_last_bit_rv+0x20>
3d: 48 89 f0 mov %rsi,%rax
40: 5b pop %rbx
41: c3 retq
42: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
48: 48 89 cf mov %rcx,%rdi
4b: 48 c1 e3 06 shl $0x6,%rbx
4f: e8 00 00 00 00 callq 54 <find_last_bit_rv+0x54>
54: 48 98 cltq
56: 48 01 d8 add %rbx,%rax
59: 5b pop %rbx
5a: c3 retq
5b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

the main loop is 20--3b. The test instruction at 2e seems to be
redundant. The same at 37: the sub instruction already sets plenty of
flags that could be used, so explicitly comparing %rbx to -1 seems
redundant.

2015-02-12 23:46:09

by George Spelvin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

> That, and if the compiler was smart enough, the AND should actually be
> free (at least on x86), since it can replace a test instruction. [1]

Ah, right, x86 loads don't set the flags, so you need a TEST
instruction anyway. So the AND is free!

Of course, not all the world's an x86. But load/store architectures
generally need a test instruction as well. It's architectures like MIPS and
Alpha, which don't have condition codes but only conditional branches
on register values, that will need an extra cycle.

> In any case, my code compiles to fewer bytes (though not an entire cache
> line), and I think it is somewhat clearer - I'm obviously biased on the
> latter point.

Myself, I think the code that shows that only the first word gets masked
by control flow is clearer, but since the complexity is completely
localized to this function, I'll take smaller and faster.

> [1] Unfortunately, the compiler doesn't seem to be smart enough :-( When I
> compile my code with gcc 5.0, I get
>
> 0000000000000000 <find_last_bit_rv>:
> 0: 53 push %rbx
> 1: 89 f1 mov %esi,%ecx
> 3: 48 8d 5e 3f lea 0x3f(%rsi),%rbx
> 7: f7 d9 neg %ecx
> 9: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
> 10: 48 83 e3 c0 and $0xffffffffffffffc0,%rbx
> 14: 48 d3 ea shr %cl,%rdx
> 17: eb 1a jmp 33 <find_last_bit_rv+0x33>
> 19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
>
> 20: 48 89 d1 mov %rdx,%rcx
> 23: 48 23 0c df and (%rdi,%rbx,8),%rcx
> 27: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
> 2e: 48 85 c9 test %rcx,%rcx
> 31: 75 15 jne 48 <find_last_bit_rv+0x48>
> 33: 48 83 eb 01 sub $0x1,%rbx
> 37: 48 83 fb ff cmp $0xffffffffffffffff,%rbx
> 3b: 75 e3 jne 20 <find_last_bit_rv+0x20>
>
> 3d: 48 89 f0 mov %rsi,%rax
> 40: 5b pop %rbx
> 41: c3 retq
> 42: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
> 48: 48 89 cf mov %rcx,%rdi
> 4b: 48 c1 e3 06 shl $0x6,%rbx
> 4f: e8 00 00 00 00 callq 54 <find_last_bit_rv+0x54>
> 54: 48 98 cltq
> 56: 48 01 d8 add %rbx,%rax
> 59: 5b pop %rbx
> 5a: c3 retq
> 5b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

> the main loop is 20--3b. The test instruction at 2e seems to be
> redundant. The same at 37: the sub instruction already sets plenty of
> flags that could be used, so explicitly comparing %rbx to -1 seems
> redundant.

Er... I think you hand-edited that code; it's wrong. The loop assumes that
%rbx is in units of words, but the prologue sets it up in units of bits.

The mov to %rcx is also redundant, since it could be eliminated with some
minor rescheduling.

The code generation I *want* for that function is:

# addr in %rdi, size in %rsi
movl %esi, %ecx
leaq 0x3f(%rsi), %rax
negl %ecx
movq $-1, %rdx
shrq $6, %rax
shrq %cl, %rdx
jmp 2f
1:
movq $-1, %rdx
2:
subq $1, %rax
jc 3f
andq (%rdi,%rax,8), %rdx
jeq 1b

bsrq %rdx, %rdx
salq $6, %rax
addq %rdx, %rax
ret
3:
movq %rsi, %rax
retq

I wonder if the compiler can be convinced by a bit of source tweaking?

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
unsigned long val = LAST_WORD_MASK(size);

while (idx--) {
val &= addr[idx];
if (val)
return idx * BITS_PER_LONG + __fls(val);
val = ~0ul;
}
return size;
}

Darn, no difference...

2015-02-13 10:13:49

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

On Fri, Feb 13 2015, "George Spelvin" <[email protected]> wrote:

>> the main loop is 20--3b. The test instruction at 2e seems to be
>> redundant. The same at 37: the sub instruction already sets plenty of
>> flags that could be used, so explicitly comparing %rbx to -1 seems
>> redundant.
>
> Er... I think you hand-edited that code; it's wrong. The loop assumes that
> %rbx is in units of words, but the prologue sets it up in units of bits.

No, but I messed up the source by hand :-) My DIV_ROUND_UP macro was
bogus. Well spotted. Fixing that I still see the redundant cmp and
test, though.

> The mov to %rcx is also redundant, since it could be eliminated with
> some minor rescheduling.
>
> The code generation I *want* for that function is:
>
> # addr in %rdi, size in %rsi
> movl %esi, %ecx
> leaq 0x3f(%rsi), %rax
> negl %ecx
> movq $-1, %rdx
> shrq $6, %rax
> shrq %cl, %rdx
> jmp 2f
> 1:
> movq $-1, %rdx
> 2:
> subq $1, %rax
> jc 3f
> andq (%rdi,%rax,8), %rdx
> jeq 1b
>
> bsrq %rdx, %rdx
> salq $6, %rax
> addq %rdx, %rax
> ret
> 3:
> movq %rsi, %rax
> retq

Nice. But I don't think find_last_bit is important enough to warrant
arch-specific versions.

So, where are we with this? Have we reached some kind of consensus?

Rasmus

2015-02-13 21:09:07

by George Spelvin

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] lib: find_*_bit reimplementation

> Ohh... I used to think I knew something about optimization. I tried building
> Rasmus' 'find_last_bit' against mine on MIPS, and found that the sizes are
> 268 vs 320 bytes. I think the best version is the one suggested by George, both
> readable and effective. What about using it? If there are no objections, I'll
> gather all the fixes and upload a new patchset this weekend.

I'll happily ack whichever you prefer. Tightening the code as much as
possible is a fun exercise, but not essential. However, I finally
got GCC to generate reasonable code with the following:

size_t find_last_bit3(const unsigned long *addr, size_t size)
{
if (size) {
unsigned long val = LAST_WORD_MASK(size);
size_t idx = (size-1) / BITS_PER_LONG;

do {
val &= addr[idx];
if (val)
return idx * BITS_PER_LONG + __fls(val);
val = ~0ul;
} while (idx--);
}
return size;
}

size_t find_last_bit4(const unsigned long *addr, size_t size)
{
if (size) {
unsigned long val = LAST_WORD_MASK(size);
size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);

do {
val &= addr[idx];
if (val)
return idx * BITS_PER_LONG + __fls(val);
val = ~0ul;
} while (--idx);
}
return size;
}

Which produced, respectively, 76 and 68-byte functions:

0000000000000000 <find_last_bit3>:
0: 31 c0 xor %eax,%eax
2: 48 85 f6 test %rsi,%rsi
5: 74 44 je 4b <find_last_bit3+0x4b>
7: 48 8d 46 ff lea -0x1(%rsi),%rax
b: 89 f1 mov %esi,%ecx
d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
14: f7 d9 neg %ecx
16: 48 c1 e8 06 shr $0x6,%rax
1a: 48 d3 ea shr %cl,%rdx
1d: eb 11 jmp 30 <find_last_bit3+0x30>
1f: 90 nop
20: 48 83 e8 01 sub $0x1,%rax
24: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
2b: 48 39 d0 cmp %rdx,%rax
2e: 74 18 je 48 <find_last_bit3+0x48>
30: 48 23 14 c7 and (%rdi,%rax,8),%rdx
34: 74 ea je 20 <find_last_bit3+0x20>
36: 48 c1 e0 06 shl $0x6,%rax
3a: 48 0f bd d2 bsr %rdx,%rdx
3e: 48 01 d0 add %rdx,%rax
41: c3 retq
42: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
48: 48 89 f0 mov %rsi,%rax
4b: c3 retq
4c: 0f 1f 40 00 nopl 0x0(%rax)

0000000000000050 <find_last_bit4>:
50: 31 c0 xor %eax,%eax
52: 48 85 f6 test %rsi,%rsi
55: 74 3c je 93 <find_last_bit4+0x43>
57: 48 8d 46 3f lea 0x3f(%rsi),%rax
5b: 89 f1 mov %esi,%ecx
5d: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
64: f7 d9 neg %ecx
66: 48 c1 e8 06 shr $0x6,%rax
6a: 48 d3 ea shr %cl,%rdx
6d: eb 0d jmp 7c <find_last_bit4+0x2c>
6f: 90 nop
70: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
77: 48 01 d0 add %rdx,%rax
7a: 74 14 je 90 <find_last_bit4+0x40>
7c: 48 23 14 c7 and (%rdi,%rax,8),%rdx
80: 74 ee je 70 <find_last_bit4+0x20>
82: 48 c1 e0 06 shl $0x6,%rax
86: 48 0f bd d2 bsr %rdx,%rdx
8a: 48 01 d0 add %rdx,%rax
8d: c3 retq
8e: 66 90 xchg %ax,%ax
90: 48 89 f0 mov %rsi,%rax
93: c3 retq

The second one omits all the unwanted tests & compares, and even does
something clever with the -1 mask that I didn't think of.

2015-02-17 02:36:08

by Yury Norov

[permalink] [raw]
Subject: [PATCH v4 0/3] lib: find_*_bit reimplementation

This patchset reworks the find_bit function family to achieve better
performance and decrease the size of the text. All the rework is done in
patch 1. Patches 2 and 3 are about code moving and renaming.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

/* addr[] is filled from /dev/urandom */
start = clock();
while (ret < nbits)
ret = find_next_bit(addr, nbits, ret + 1);

end = clock();
printf("%ld\t", (unsigned long) end - start);

On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, the measurements are as follows:
(for find_next_bit, nbits is 8M, for find_first_bit - 80K)

find_next_bit: find_first_bit:
new current new current
26932 43151 14777 14925
26947 43182 14521 15423
26507 43824 15053 14705
27329 43759 14473 14777
26895 43367 14847 15023
26990 43693 15103 15163
26775 43299 15067 15232
27282 42752 14544 15121
27504 43088 14644 14858
26761 43856 14699 15193
26692 43075 14781 14681
27137 42969 14451 15061
... ...

find_next_bit performance gain is 35-40%;
find_first_bit - no measurable difference.

On ARM machines, there is an arch-specific implementation of find_bit.
To disable it and use the generic one, apply the following patch:

---
arch/arm/include/asm/bitops.h | 19 -------------------
arch/arm/kernel/armksyms.c | 11 -----------
arch/arm/lib/Makefile | 2 +-
include/asm-generic/bitops/le.h | 1 +
4 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
index 5638099..e0611d1 100644
--- a/arch/arm/include/asm/bitops.h
+++ b/arch/arm/include/asm/bitops.h
@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
#define test_and_clear_bit(nr,p) ATOMIC_BITOP(test_and_clear_bit,nr,p)
#define test_and_change_bit(nr,p) ATOMIC_BITOP(test_and_change_bit,nr,p)

-#ifndef __ARMEB__
-/*
- * These are the little endian, atomic definitions.
- */
-#define find_first_zero_bit(p,sz) _find_first_zero_bit_le(p,sz)
-#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_le(p,sz,off)
-#define find_first_bit(p,sz) _find_first_bit_le(p,sz)
-#define find_next_bit(p,sz,off) _find_next_bit_le(p,sz,off)
-
-#else
-/*
- * These are the big endian, atomic definitions.
- */
-#define find_first_zero_bit(p,sz) _find_first_zero_bit_be(p,sz)
-#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_be(p,sz,off)
-#define find_first_bit(p,sz) _find_first_bit_be(p,sz)
-#define find_next_bit(p,sz,off) _find_next_bit_be(p,sz,off)
-
-#endif

#if __LINUX_ARM_ARCH__ < 5

diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index a88671c..22e8748 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
EXPORT_SYMBOL(_test_and_clear_bit);
EXPORT_SYMBOL(_change_bit);
EXPORT_SYMBOL(_test_and_change_bit);
-EXPORT_SYMBOL(_find_first_zero_bit_le);
-EXPORT_SYMBOL(_find_next_zero_bit_le);
-EXPORT_SYMBOL(_find_first_bit_le);
-EXPORT_SYMBOL(_find_next_bit_le);
-
-#ifdef __ARMEB__
-EXPORT_SYMBOL(_find_first_zero_bit_be);
-EXPORT_SYMBOL(_find_next_zero_bit_be);
-EXPORT_SYMBOL(_find_first_bit_be);
-EXPORT_SYMBOL(_find_next_bit_be);
-#endif

#ifdef CONFIG_FUNCTION_TRACER
#ifdef CONFIG_OLD_MCOUNT
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 0573faa..de369aa 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -6,7 +6,7 @@

lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
- delay.o delay-loop.o findbit.o memchr.o memcpy.o \
+ delay.o delay-loop.o memchr.o memcpy.o \
memmove.o memset.o memzero.o setbit.o \
strchr.o strrchr.o \
testchangebit.o testclearbit.o testsetbit.o \
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 6173154..9a8798f 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -2,6 +2,7 @@
#define _ASM_GENERIC_BITOPS_LE_H_

#include <asm/types.h>
+#include <asm-generic/bitops/find.h>
#include <asm/byteorder.h>

#if defined(__LITTLE_ENDIAN)
--
1.9.1

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.

v4:
* find_last_bit was found to be buggy and is replaced with George Spelvin's
version, which correctly handles garbage at the end of a word-unaligned bitmap;
* find_last_bit description fixed in 'include/linux/bitops.h';
* _find_next_bit{,_le}: first-word handling made unconditional;

v3:
* conditional bit inverting in _find_next_bit replaced with XORing by
mask (patch 1);
* size/offset checkers moved from wrappers to _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* lib/find_next_bit description changed to not refer to the file name,
so that a rename produces a clean diff (patch 2);
* patch 3 is generated with the '-M' option to detect the rename and drop
the file content;
* this cover added.

Yury Norov (3):
lib: find_*_bit reimplementation
lib: move find_last_bit to lib/find_next_bit.c
lib: rename lib/find_next_bit.c to lib/find_bit.c

include/linux/bitops.h | 4 +-
lib/Makefile | 2 +-
lib/find_bit.c | 195 +++++++++++++++++++++++++++++++++
lib/find_last_bit.c | 49 ---------
lib/find_next_bit.c | 285 -------------------------------------------------
5 files changed, 198 insertions(+), 337 deletions(-)
create mode 100644 lib/find_bit.c
delete mode 100644 lib/find_last_bit.c
delete mode 100644 lib/find_next_bit.c

--
2.1.0

2015-02-17 02:36:38

by Yury Norov

[permalink] [raw]
Subject: [PATCH v4 1/3] lib: find_*_bit reimplementation

The new implementation takes less space in the sources
(see diffstat) and in the object code. For me it's 710 vs 453
bytes of text. It also shows better performance.

The find_last_bit description is fixed due to an obvious typo.

This patch introduces two macros, {LOW,HIGH}_BITS_MASK,
which do almost the same as BITMAP_{FIRST,LAST}_WORD_MASK
in include/linux/bitmap.h. But the 'LAST' macro is potentially less
efficient, because it issues a conditional branch that can be
omitted. If it is one day replaced by a more efficient
implementation, {LOW,HIGH}_BITS_MASK can be removed.

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/bitops.h | 4 +-
lib/find_last_bit.c | 37 +++----
lib/find_next_bit.c | 269 ++++++++++++++-----------------------------------
3 files changed, 94 insertions(+), 216 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5d858e0..297f5bd 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
/**
* find_last_bit - find the last set bit in a memory region
* @addr: The address to start the search at
- * @size: The maximum size to search
+ * @size: The number of bits to search
*
- * Returns the bit number of the first set bit, or size.
+ * Returns the bit number of the last set bit, or size.
*/
extern unsigned long find_last_bit(const unsigned long *addr,
unsigned long size);
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..edbb281 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -4,6 +4,9 @@
* Written by Rusty Russell <[email protected]>
* (Inspired by David Howell's find_next_bit implementation)
*
+ * Rewritten by Yury Norov <[email protected]> to decrease
+ * size and improve performance, 2015.
+ *
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
@@ -12,36 +15,26 @@

#include <linux/bitops.h>
#include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
+
+#define LOW_BITS_MASK(nr) (~0UL >> -(nr))

#ifndef find_last_bit

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
- unsigned long words;
- unsigned long tmp;
+ if (size) {
+ unsigned long val = LOW_BITS_MASK(size % BITS_PER_LONG);
+ unsigned long idx = (size-1) / BITS_PER_LONG;

- /* Start at final word. */
- words = size / BITS_PER_LONG;
+ do {
+ val &= addr[idx];
+ if (val)
+ return idx * BITS_PER_LONG + __fls(val);

- /* Partial final word? */
- if (size & (BITS_PER_LONG-1)) {
- tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
- - (size & (BITS_PER_LONG-1)))));
- if (tmp)
- goto found;
+ val = ~0ul;
+ } while (idx--);
}
-
- while (words) {
- tmp = addr[--words];
- if (tmp) {
-found:
- return words * BITS_PER_LONG + __fls(tmp);
- }
- }
-
- /* Not found */
return size;
}
EXPORT_SYMBOL(find_last_bit);
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 0cbfc0b..1f5f108 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -3,6 +3,9 @@
* Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*
+ * Rewritten by Yury Norov <[email protected]> to decrease
+ * size and improve performance, 2015.
+ *
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
@@ -11,98 +14,60 @@

#include <linux/bitops.h>
#include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>

-#define BITOP_WORD(nr) ((nr) / BITS_PER_LONG)
+#define HIGH_BITS_MASK(nr) (~0UL << (nr))
+
+#if !defined(find_next_bit) || !defined(find_next_zero_bit)

-#ifndef find_next_bit
/*
- * Find the next set bit in a memory region.
+ * This is a common helper function for find_next_bit and
+ * find_next_zero_bit. The difference is the "invert" argument, which
+ * is XORed with each fetched word before searching it for one bits.
*/
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
+static unsigned long _find_next_bit(const unsigned long *addr,
+ unsigned long nbits, unsigned long start, unsigned long invert)
{
- const unsigned long *p = addr + BITOP_WORD(offset);
- unsigned long result = offset & ~(BITS_PER_LONG-1);
unsigned long tmp;

- if (offset >= size)
- return size;
- size -= result;
- offset %= BITS_PER_LONG;
- if (offset) {
- tmp = *(p++);
- tmp &= (~0UL << offset);
- if (size < BITS_PER_LONG)
- goto found_first;
- if (tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
- while (size & ~(BITS_PER_LONG-1)) {
- if ((tmp = *(p++)))
- goto found_middle;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ if (!nbits || start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+ /* Handle 1st word. */
+ tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
+ start = round_down(start, BITS_PER_LONG);
+
+ while (!tmp) {
+ start += BITS_PER_LONG;
+ if (start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
}
- if (!size)
- return result;
- tmp = *p;

-found_first:
- tmp &= (~0UL >> (BITS_PER_LONG - size));
- if (tmp == 0UL) /* Are any bits set? */
- return result + size; /* Nope. */
-found_middle:
- return result + __ffs(tmp);
+ return min(start + __ffs(tmp), nbits);
}
-EXPORT_SYMBOL(find_next_bit);
#endif

-#ifndef find_next_zero_bit
+#ifndef find_next_bit
/*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
+ * Find the next set bit in a memory region.
*/
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ return _find_next_bit(addr, size, offset, 0UL);
+}
+EXPORT_SYMBOL(find_next_bit);
+#endif
+
+#ifndef find_next_zero_bit
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
- const unsigned long *p = addr + BITOP_WORD(offset);
- unsigned long result = offset & ~(BITS_PER_LONG-1);
- unsigned long tmp;
-
- if (offset >= size)
- return size;
- size -= result;
- offset %= BITS_PER_LONG;
- if (offset) {
- tmp = *(p++);
- tmp |= ~0UL >> (BITS_PER_LONG - offset);
- if (size < BITS_PER_LONG)
- goto found_first;
- if (~tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
- while (size & ~(BITS_PER_LONG-1)) {
- if (~(tmp = *(p++)))
- goto found_middle;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
- }
- if (!size)
- return result;
- tmp = *p;
-
-found_first:
- tmp |= ~0UL << size;
- if (tmp == ~0UL) /* Are any bits zero? */
- return result + size; /* Nope. */
-found_middle:
- return result + ffz(tmp);
+ return _find_next_bit(addr, size, offset, ~0UL);
}
EXPORT_SYMBOL(find_next_zero_bit);
#endif
@@ -113,24 +78,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
*/
unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
{
- const unsigned long *p = addr;
- unsigned long result = 0;
- unsigned long tmp;
+ unsigned long idx;

- while (size & ~(BITS_PER_LONG-1)) {
- if ((tmp = *(p++)))
- goto found;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+ if (addr[idx])
+ return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
}
- if (!size)
- return result;

- tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
- if (tmp == 0UL) /* Are any bits set? */
- return result + size; /* Nope. */
-found:
- return result + __ffs(tmp);
+ return size;
}
EXPORT_SYMBOL(find_first_bit);
#endif
@@ -141,24 +96,14 @@ EXPORT_SYMBOL(find_first_bit);
*/
unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
- const unsigned long *p = addr;
- unsigned long result = 0;
- unsigned long tmp;
+ unsigned long idx;

- while (size & ~(BITS_PER_LONG-1)) {
- if (~(tmp = *(p++)))
- goto found;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+ if (addr[idx] != ~0UL)
+ return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
}
- if (!size)
- return result;

- tmp = (*p) | (~0UL << size);
- if (tmp == ~0UL) /* Are any bits zero? */
- return result + size; /* Nope. */
-found:
- return result + ffz(tmp);
+ return size;
}
EXPORT_SYMBOL(find_first_zero_bit);
#endif
@@ -166,18 +111,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
#ifdef __BIG_ENDIAN

/* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swabp(const unsigned long * x)
-{
-#if BITS_PER_LONG == 64
- return (unsigned long) __swab64p((u64 *) x);
-#elif BITS_PER_LONG == 32
- return (unsigned long) __swab32p((u32 *) x);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}
-
-/* include/linux/byteorder doesn't support "unsigned long" type */
static inline unsigned long ext2_swab(const unsigned long y)
{
#if BITS_PER_LONG == 64
@@ -189,48 +122,38 @@ static inline unsigned long ext2_swab(const unsigned long y)
#endif
}

-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
- long size, unsigned long offset)
+#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
+static unsigned long _find_next_bit_le(const unsigned long *addr,
+ unsigned long nbits, unsigned long start, unsigned long invert)
{
- const unsigned long *p = addr;
- unsigned long result = offset & ~(BITS_PER_LONG - 1);
unsigned long tmp;

- if (offset >= size)
- return size;
- p += BITOP_WORD(offset);
- size -= result;
- offset &= (BITS_PER_LONG - 1UL);
- if (offset) {
- tmp = ext2_swabp(p++);
- tmp |= (~0UL >> (BITS_PER_LONG - offset));
- if (size < BITS_PER_LONG)
- goto found_first;
- if (~tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
+ if (!nbits || start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+ /* Handle 1st word. */
+ tmp &= ext2_swab(HIGH_BITS_MASK(start % BITS_PER_LONG));
+ start = round_down(start, BITS_PER_LONG);

- while (size & ~(BITS_PER_LONG - 1)) {
- if (~(tmp = *(p++)))
- goto found_middle_swap;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ while (!tmp) {
+ start += BITS_PER_LONG;
+ if (start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
}
- if (!size)
- return result;
- tmp = ext2_swabp(p);
-found_first:
- tmp |= ~0UL << size;
- if (tmp == ~0UL) /* Are any bits zero? */
- return result + size; /* Nope. Skip ffz */
-found_middle:
- return result + ffz(tmp);

-found_middle_swap:
- return result + ffz(ext2_swab(tmp));
+ return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+#endif
+
+#ifndef find_next_zero_bit_le
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+ long size, unsigned long offset)
+{
+ return _find_next_bit_le(addr, size, offset, ~0UL);
}
EXPORT_SYMBOL(find_next_zero_bit_le);
#endif
@@ -239,45 +162,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
unsigned long find_next_bit_le(const void *addr, unsigned
long size, unsigned long offset)
{
- const unsigned long *p = addr;
- unsigned long result = offset & ~(BITS_PER_LONG - 1);
- unsigned long tmp;
-
- if (offset >= size)
- return size;
- p += BITOP_WORD(offset);
- size -= result;
- offset &= (BITS_PER_LONG - 1UL);
- if (offset) {
- tmp = ext2_swabp(p++);
- tmp &= (~0UL << offset);
- if (size < BITS_PER_LONG)
- goto found_first;
- if (tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
-
- while (size & ~(BITS_PER_LONG - 1)) {
- tmp = *(p++);
- if (tmp)
- goto found_middle_swap;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
- }
- if (!size)
- return result;
- tmp = ext2_swabp(p);
-found_first:
- tmp &= (~0UL >> (BITS_PER_LONG - size));
- if (tmp == 0UL) /* Are any bits set? */
- return result + size; /* Nope. */
-found_middle:
- return result + __ffs(tmp);
-
-found_middle_swap:
- return result + __ffs(ext2_swab(tmp));
+ return _find_next_bit_le(addr, size, offset, 0UL);
}
EXPORT_SYMBOL(find_next_bit_le);
#endif
--
2.1.0

2015-02-17 02:36:36

by Yury Norov

Subject: [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c

Currently the whole 'find_*_bit' family is located in lib/find_next_bit.c,
except 'find_last_bit', which lives in lib/find_last_bit.c. There seems
to be no major benefit to keeping it separate.
---
lib/Makefile | 2 +-
lib/find_last_bit.c | 42 ------------------------------------------
lib/find_next_bit.c | 27 ++++++++++++++++++++++++++-
3 files changed, 27 insertions(+), 44 deletions(-)
delete mode 100644 lib/find_last_bit.c

diff --git a/lib/Makefile b/lib/Makefile
index 87eb3bf..9dafcd5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y += lockref.o
obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
- bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+ bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
deleted file mode 100644
index edbb281..0000000
--- a/lib/find_last_bit.c
+++ /dev/null
@@ -1,42 +0,0 @@
-/* find_last_bit.c: fallback find next bit implementation
- *
- * Copyright (C) 2008 IBM Corporation
- * Written by Rusty Russell <[email protected]>
- * (Inspired by David Howell's find_next_bit implementation)
- *
- * Rewritten by Yury Norov <[email protected]> to decrease
- * size and improve performance, 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/bitops.h>
-#include <linux/export.h>
-#include <linux/kernel.h>
-
-#define LOW_BITS_MASK(nr) (~0UL >> -(nr))
-
-#ifndef find_last_bit
-
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
- if (size) {
- unsigned long val = LOW_BITS_MASK(size % BITS_PER_LONG);
- unsigned long idx = (size-1) / BITS_PER_LONG;
-
- do {
- val &= addr[idx];
- if (val)
- return idx * BITS_PER_LONG + __fls(val);
-
- val = ~0ul;
- } while (idx--);
- }
- return size;
-}
-EXPORT_SYMBOL(find_last_bit);
-
-#endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 1f5f108..627d0ec 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -1,8 +1,12 @@
-/* find_next_bit.c: fallback find next bit implementation
+/* bit search implementation
*
* Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*
+ * Copyright (C) 2008 IBM Corporation
+ * 'find_last_bit' is written by Rusty Russell <[email protected]>
+ * (Inspired by David Howell's find_next_bit implementation)
+ *
* Rewritten by Yury Norov <[email protected]> to decrease
* size and improve performance, 2015.
*
@@ -17,6 +21,7 @@
#include <linux/kernel.h>

#define HIGH_BITS_MASK(nr) (~0UL << (nr))
+#define LOW_BITS_MASK(nr) (~0UL >> -(nr))

#if !defined(find_next_bit) || !defined(find_next_zero_bit)

@@ -108,6 +113,26 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
EXPORT_SYMBOL(find_first_zero_bit);
#endif

+#ifndef find_last_bit
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+ if (size) {
+ unsigned long val = LOW_BITS_MASK(size % BITS_PER_LONG);
+ unsigned long idx = (size-1) / BITS_PER_LONG;
+
+ do {
+ val &= addr[idx];
+ if (val)
+ return idx * BITS_PER_LONG + __fls(val);
+
+ val = ~0ul;
+ } while (idx--);
+ }
+ return size;
+}
+EXPORT_SYMBOL(find_last_bit);
+#endif
+
#ifdef __BIG_ENDIAN

/* include/linux/byteorder does not support "unsigned long" type */
--
2.1.0

2015-02-17 02:36:33

by Yury Norov

Subject: [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c

This file contains the implementation of the whole find_*_bit{,_le}
family, so giving it a more generic name seems reasonable.
---
lib/Makefile | 2 +-
lib/{find_next_bit.c => find_bit.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
rename lib/{find_next_bit.c => find_bit.c} (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 9dafcd5..d857965 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y += lockref.o
obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
- bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
+ bsearch.o find_bit.o llist.o memweight.o kfifo.o \
percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_next_bit.c b/lib/find_bit.c
similarity index 100%
rename from lib/find_next_bit.c
rename to lib/find_bit.c
--
2.1.0

2015-02-18 17:57:43

by Rasmus Villemoes

Subject: Re: [PATCH v4 1/3] lib: find_*_bit reimplementation

On Tue, Feb 17 2015, Yury Norov <[email protected]> wrote:

> The new implementation takes less space in the sources
> (see diffstat) and in the object. For me it's 710 vs 453
> bytes of text. It also shows a better performance.
>
> find_last_bit description fixed due to obvious typo.
>
> In this patch 2 macros were introduced: {LOW,HIGH}_BITS_MASK,
> that are doing almost the same as BITMAP_{FIRST,LAST}_WORD_MASK
> in include/linux/bitmap.h. But 'LAST' macro is potentially less
> effective, because it issues a conditional branch that can be
> omitted. If it is replaced one day by a more effective
> implementation, {LOW,HIGH}_BITS_MASK can be removed.
>

I think it's better to use the existing macros and then improve them
instead of duplicating the functionality. I'll submit a patch for that
later tonight (that should then make it to 3.21 [or whatever 3.19+2 will
be called] together with this series). More on this issue below.

>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> include/linux/bitops.h | 4 +-
> lib/find_last_bit.c | 37 +++----
> lib/find_next_bit.c | 269 ++++++++++++++-----------------------------------
> 3 files changed, 94 insertions(+), 216 deletions(-)
>
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 5d858e0..297f5bd 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
> /**
> * find_last_bit - find the last set bit in a memory region
> * @addr: The address to start the search at
> - * @size: The maximum size to search
> + * @size: The number of bits to search
> *
> - * Returns the bit number of the first set bit, or size.
> + * Returns the bit number of the last set bit, or size.
> */
> extern unsigned long find_last_bit(const unsigned long *addr,
> unsigned long size);
> diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
> index 91ca09f..edbb281 100644
> --- a/lib/find_last_bit.c
> +++ b/lib/find_last_bit.c
> @@ -4,6 +4,9 @@
> * Written by Rusty Russell <[email protected]>
> * (Inspired by David Howell's find_next_bit implementation)
> *
> + * Rewritten by Yury Norov <[email protected]> to decrease
> + * size and improve performance, 2015.
> + *
> * This program is free software; you can redistribute it and/or
> * modify it under the terms of the GNU General Public License
> * as published by the Free Software Foundation; either version
> @@ -12,36 +15,26 @@
>
> #include <linux/bitops.h>
> #include <linux/export.h>
> -#include <asm/types.h>
> -#include <asm/byteorder.h>
> +#include <linux/kernel.h>
> +
> +#define LOW_BITS_MASK(nr) (~0UL >> -(nr))

This is technically wrong, and may not even work on architectures that
are not as forgiving as x86: Whatever type and value nr has, -(nr) is
almost guaranteed not to be a number between 0 and BITS_PER_LONG-1. And
even on x86, gcc doesn't generate as good code as it could:

163: 49 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%r8
16a: 83 e1 3f and $0x3f,%ecx
16d: f7 d9 neg %ecx
16f: 48 c1 ea 06 shr $0x6,%rdx
173: 49 d3 e8 shr %cl,%r8

It doesn't realize that pre-masking %ecx with 0x3f is redundant when we
then negate it and use it as a shift amount.

So a better definition of the macro is

#define BITMAP_LAST_WORD_MASK(nr) (~0UL >> (-(nr) & (BITS_PER_LONG-1)))

and then callers shouldn't do the modulo. On x86, gcc knows that the &
is redundant. I use & instead of % so that nr may also have signed type
(otherwise we're again in UB land, since -(nr) % BITS_PER_LONG is then,
by the broken C standard, a negative number).


> #include <linux/bitops.h>
> #include <linux/export.h>
> -#include <asm/types.h>
> -#include <asm/byteorder.h>
> +#include <linux/kernel.h>
>
> -#define BITOP_WORD(nr) ((nr) / BITS_PER_LONG)
> +#define HIGH_BITS_MASK(nr) (~0UL << (nr))
> +
> +#if !defined(find_next_bit) || !defined(find_next_zero_bit)
>
> -#ifndef find_next_bit
> /*
> - * Find the next set bit in a memory region.
> + * This is a common helper function for find_next_bit and
> + * find_next_zero_bit. The difference is the "invert" argument, which
> + * is XORed with each fetched word before searching it for one bits.
> */
> -unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
> - unsigned long offset)
> +static unsigned long _find_next_bit(const unsigned long *addr,
> + unsigned long nbits, unsigned long start, unsigned long invert)
> {
> - const unsigned long *p = addr + BITOP_WORD(offset);
> - unsigned long result = offset & ~(BITS_PER_LONG-1);
> unsigned long tmp;
>
> - if (offset >= size)
> - return size;
> - size -= result;
> - offset %= BITS_PER_LONG;
> - if (offset) {
> - tmp = *(p++);
> - tmp &= (~0UL << offset);
> - if (size < BITS_PER_LONG)
> - goto found_first;
> - if (tmp)
> - goto found_middle;
> - size -= BITS_PER_LONG;
> - result += BITS_PER_LONG;
> - }
> - while (size & ~(BITS_PER_LONG-1)) {
> - if ((tmp = *(p++)))
> - goto found_middle;
> - result += BITS_PER_LONG;
> - size -= BITS_PER_LONG;
> + if (!nbits || start >= nbits)
> + return nbits;
> +
> + tmp = addr[start / BITS_PER_LONG] ^ invert;
> +
> + /* Handle 1st word. */
> + tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);

And of course here, I'd then suggest using BITMAP_FIRST_WORD_MASK(start)
(that even matches the comment :-)), omitting the definition of
HIGH_BITS_MASK.

> @@ -113,24 +78,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
> */
> unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
> {
> - const unsigned long *p = addr;
> - unsigned long result = 0;
> - unsigned long tmp;
> + unsigned long idx;
>
> - while (size & ~(BITS_PER_LONG-1)) {
> - if ((tmp = *(p++)))
> - goto found;
> - result += BITS_PER_LONG;
> - size -= BITS_PER_LONG;
> + for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
> + if (addr[idx])
> + return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
> }
> - if (!size)
> - return result;
>
> - tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
> - if (tmp == 0UL) /* Are any bits set? */
> - return result + size; /* Nope. */
> -found:
> - return result + __ffs(tmp);
> + return size;
> }
> EXPORT_SYMBOL(find_first_bit);
> #endif
> @@ -141,24 +96,14 @@ EXPORT_SYMBOL(find_first_bit);
> */
> unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
> {
> - const unsigned long *p = addr;
> - unsigned long result = 0;
> - unsigned long tmp;
> + unsigned long idx;
>
> - while (size & ~(BITS_PER_LONG-1)) {
> - if (~(tmp = *(p++)))
> - goto found;
> - result += BITS_PER_LONG;
> - size -= BITS_PER_LONG;
> + for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
> + if (addr[idx] != ~0UL)
> + return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
> }

Since I'm afraid the above means I have to ask you to send a v5, I might
as well also comment on this: I think it would make the code much more
obviously parallel to find_first_bit if the test was "if (~addr[idx])"
and the ffz is then replaced by __ffs(~addr[idx]). Many architectures
implement ffz(x) as __ffs(~x) anyway, so it shouldn't be any less
efficient. But it's no big deal, so if you feel this is better, just
leave it.

Rasmus

2015-02-22 17:24:37

by Yury Norov

Subject: [PATCH v5 0/3] lib: find_*_bit reimplementation

This patchset reworks the find_bit function family to achieve better
performance and to decrease the size of text. All the rework is done in
patch 1. Patches 2 and 3 cover code moving and renaming.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

/* addr[] is filled from /dev/urandom */
start = clock();
while (ret < nbits)
ret = find_next_bit(addr, nbits, ret + 1);

end = clock();
printf("%ld\t", (unsigned long) end - start);

On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are as follows
(for find_next_bit, nbits is 8M; for find_first_bit, 80K):

find_next_bit: find_first_bit:
new current new current
26932 43151 14777 14925
26947 43182 14521 15423
26507 43824 15053 14705
27329 43759 14473 14777
26895 43367 14847 15023
26990 43693 15103 15163
26775 43299 15067 15232
27282 42752 14544 15121
27504 43088 14644 14858
26761 43856 14699 15193
26692 43075 14781 14681
27137 42969 14451 15061
... ...

The find_next_bit performance gain is 35-40%;
for find_first_bit there is no measurable difference.

On ARM machines, there is an arch-specific implementation of find_bit.
To disable it and use the generic one, apply the following patch:

---
arch/arm/include/asm/bitops.h | 19 -------------------
arch/arm/kernel/armksyms.c | 11 -----------
arch/arm/lib/Makefile | 2 +-
include/asm-generic/bitops/le.h | 1 +
4 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
index 5638099..e0611d1 100644
--- a/arch/arm/include/asm/bitops.h
+++ b/arch/arm/include/asm/bitops.h
@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
#define test_and_clear_bit(nr,p) ATOMIC_BITOP(test_and_clear_bit,nr,p)
#define test_and_change_bit(nr,p) ATOMIC_BITOP(test_and_change_bit,nr,p)

-#ifndef __ARMEB__
-/*
- * These are the little endian, atomic definitions.
- */
-#define find_first_zero_bit(p,sz) _find_first_zero_bit_le(p,sz)
-#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_le(p,sz,off)
-#define find_first_bit(p,sz) _find_first_bit_le(p,sz)
-#define find_next_bit(p,sz,off) _find_next_bit_le(p,sz,off)
-
-#else
-/*
- * These are the big endian, atomic definitions.
- */
-#define find_first_zero_bit(p,sz) _find_first_zero_bit_be(p,sz)
-#define find_next_zero_bit(p,sz,off) _find_next_zero_bit_be(p,sz,off)
-#define find_first_bit(p,sz) _find_first_bit_be(p,sz)
-#define find_next_bit(p,sz,off) _find_next_bit_be(p,sz,off)
-
-#endif

#if __LINUX_ARM_ARCH__ < 5

diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index a88671c..22e8748 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
EXPORT_SYMBOL(_test_and_clear_bit);
EXPORT_SYMBOL(_change_bit);
EXPORT_SYMBOL(_test_and_change_bit);
-EXPORT_SYMBOL(_find_first_zero_bit_le);
-EXPORT_SYMBOL(_find_next_zero_bit_le);
-EXPORT_SYMBOL(_find_first_bit_le);
-EXPORT_SYMBOL(_find_next_bit_le);
-
-#ifdef __ARMEB__
-EXPORT_SYMBOL(_find_first_zero_bit_be);
-EXPORT_SYMBOL(_find_next_zero_bit_be);
-EXPORT_SYMBOL(_find_first_bit_be);
-EXPORT_SYMBOL(_find_next_bit_be);
-#endif

#ifdef CONFIG_FUNCTION_TRACER
#ifdef CONFIG_OLD_MCOUNT
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 0573faa..de369aa 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -6,7 +6,7 @@

lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
- delay.o delay-loop.o findbit.o memchr.o memcpy.o \
+ delay.o delay-loop.o memchr.o memcpy.o \
memmove.o memset.o memzero.o setbit.o \
strchr.o strrchr.o \
testchangebit.o testclearbit.o testsetbit.o \
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 6173154..9a8798f 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -2,6 +2,7 @@
#define _ASM_GENERIC_BITOPS_LE_H_

#include <asm/types.h>
+#include <asm-generic/bitops/find.h>
#include <asm/byteorder.h>

#if defined(__LITTLE_ENDIAN)
--
1.9.1

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.

v5:
* {HIGH,LOW}_BITS_MASK macros removed; BITMAP_{FIRST,LAST}_WORD_MASK
used instead.

v4:
* find_last_bit was found buggy and replaced with George Spelvin's
version, which correctly handles garbage at the end of a word-unaligned
bitmap;
* find_last_bit description fixed in 'include/linux/bitops.h';
* _find_next_bit{,_le}: first-word handling made unconditional;

v3:
* conditional bit inverting in _find_next_bit replaced with XORing by a
mask (patch 1);
* size/offset checks moved from the wrappers into _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* lib/find_next_bit file description changed to not refer to the file
name, so that a rename produces a clean diff (patch 2);
* patch 3 generated with the '-M' option to detect the rename and drop
the file content;
* this cover letter added.

Yury Norov (3):
lib: find_*_bit reimplementation
lib: move find_last_bit to lib/find_next_bit.c
lib: rename lib/find_next_bit.c to lib/find_bit.c

include/linux/bitops.h | 4 +-
lib/Makefile | 2 +-
lib/find_bit.c | 195 +++++++++++++++++++++++++++++++++
lib/find_last_bit.c | 49 ---------
lib/find_next_bit.c | 285 -------------------------------------------------
5 files changed, 198 insertions(+), 337 deletions(-)
create mode 100644 lib/find_bit.c
delete mode 100644 lib/find_last_bit.c
delete mode 100644 lib/find_next_bit.c

--
2.1.0

2015-02-22 17:24:41

by Yury Norov

Subject: [PATCH v5 1/3] lib: find_*_bit reimplementation

The new implementation takes less space in the source file (see
diffstat) and in the object file. For me it's 710 vs 453 bytes of text.
It also shows better performance.

The find_last_bit description is fixed due to an obvious typo.

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/bitops.h | 4 +-
lib/find_last_bit.c | 35 +++----
lib/find_next_bit.c | 267 ++++++++++++++-----------------------------------
3 files changed, 90 insertions(+), 216 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5d858e0..297f5bd 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
/**
* find_last_bit - find the last set bit in a memory region
* @addr: The address to start the search at
- * @size: The maximum size to search
+ * @size: The number of bits to search
*
- * Returns the bit number of the first set bit, or size.
+ * Returns the bit number of the last set bit, or size.
*/
extern unsigned long find_last_bit(const unsigned long *addr,
unsigned long size);
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..f7b594e 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -4,6 +4,9 @@
* Written by Rusty Russell <[email protected]>
* (Inspired by David Howell's find_next_bit implementation)
*
+ * Rewritten by Yury Norov <[email protected]> to decrease
+ * size and improve performance, 2015.
+ *
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
@@ -12,36 +15,24 @@

#include <linux/bitops.h>
#include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>

#ifndef find_last_bit

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
- unsigned long words;
- unsigned long tmp;
-
- /* Start at final word. */
- words = size / BITS_PER_LONG;
+ if (size) {
+ unsigned long val = BITMAP_LAST_WORD_MASK(size);
+ unsigned long idx = (size-1) / BITS_PER_LONG;

- /* Partial final word? */
- if (size & (BITS_PER_LONG-1)) {
- tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
- - (size & (BITS_PER_LONG-1)))));
- if (tmp)
- goto found;
- }
+ do {
+ val &= addr[idx];
+ if (val)
+ return idx * BITS_PER_LONG + __fls(val);

- while (words) {
- tmp = addr[--words];
- if (tmp) {
-found:
- return words * BITS_PER_LONG + __fls(tmp);
- }
+ val = ~0ul;
+ } while (idx--);
}
-
- /* Not found */
return size;
}
EXPORT_SYMBOL(find_last_bit);
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 0cbfc0b..cbea5ef 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -3,6 +3,9 @@
* Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*
+ * Rewritten by Yury Norov <[email protected]> to decrease
+ * size and improve performance, 2015.
+ *
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
@@ -11,98 +14,58 @@

#include <linux/bitops.h>
#include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>

-#define BITOP_WORD(nr) ((nr) / BITS_PER_LONG)
+#if !defined(find_next_bit) || !defined(find_next_zero_bit)

-#ifndef find_next_bit
/*
- * Find the next set bit in a memory region.
+ * This is a common helper function for find_next_bit and
+ * find_next_zero_bit. The difference is the "invert" argument, which
+ * is XORed with each fetched word before searching it for one bits.
*/
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
- unsigned long offset)
+static unsigned long _find_next_bit(const unsigned long *addr,
+ unsigned long nbits, unsigned long start, unsigned long invert)
{
- const unsigned long *p = addr + BITOP_WORD(offset);
- unsigned long result = offset & ~(BITS_PER_LONG-1);
unsigned long tmp;

- if (offset >= size)
- return size;
- size -= result;
- offset %= BITS_PER_LONG;
- if (offset) {
- tmp = *(p++);
- tmp &= (~0UL << offset);
- if (size < BITS_PER_LONG)
- goto found_first;
- if (tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
- while (size & ~(BITS_PER_LONG-1)) {
- if ((tmp = *(p++)))
- goto found_middle;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ if (!nbits || start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+ /* Handle 1st word. */
+ tmp &= BITMAP_FIRST_WORD_MASK(start);
+ start = round_down(start, BITS_PER_LONG);
+
+ while (!tmp) {
+ start += BITS_PER_LONG;
+ if (start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
}
- if (!size)
- return result;
- tmp = *p;

-found_first:
- tmp &= (~0UL >> (BITS_PER_LONG - size));
- if (tmp == 0UL) /* Are any bits set? */
- return result + size; /* Nope. */
-found_middle:
- return result + __ffs(tmp);
+ return min(start + __ffs(tmp), nbits);
}
-EXPORT_SYMBOL(find_next_bit);
#endif

-#ifndef find_next_zero_bit
+#ifndef find_next_bit
/*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
+ * Find the next set bit in a memory region.
*/
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+ unsigned long offset)
+{
+ return _find_next_bit(addr, size, offset, 0UL);
+}
+EXPORT_SYMBOL(find_next_bit);
+#endif
+
+#ifndef find_next_zero_bit
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
unsigned long offset)
{
- const unsigned long *p = addr + BITOP_WORD(offset);
- unsigned long result = offset & ~(BITS_PER_LONG-1);
- unsigned long tmp;
-
- if (offset >= size)
- return size;
- size -= result;
- offset %= BITS_PER_LONG;
- if (offset) {
- tmp = *(p++);
- tmp |= ~0UL >> (BITS_PER_LONG - offset);
- if (size < BITS_PER_LONG)
- goto found_first;
- if (~tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
- while (size & ~(BITS_PER_LONG-1)) {
- if (~(tmp = *(p++)))
- goto found_middle;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
- }
- if (!size)
- return result;
- tmp = *p;
-
-found_first:
- tmp |= ~0UL << size;
- if (tmp == ~0UL) /* Are any bits zero? */
- return result + size; /* Nope. */
-found_middle:
- return result + ffz(tmp);
+ return _find_next_bit(addr, size, offset, ~0UL);
}
EXPORT_SYMBOL(find_next_zero_bit);
#endif
@@ -113,24 +76,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
*/
unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
{
- const unsigned long *p = addr;
- unsigned long result = 0;
- unsigned long tmp;
+ unsigned long idx;

- while (size & ~(BITS_PER_LONG-1)) {
- if ((tmp = *(p++)))
- goto found;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+ if (addr[idx])
+ return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
}
- if (!size)
- return result;

- tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
- if (tmp == 0UL) /* Are any bits set? */
- return result + size; /* Nope. */
-found:
- return result + __ffs(tmp);
+ return size;
}
EXPORT_SYMBOL(find_first_bit);
#endif
@@ -141,24 +94,14 @@ EXPORT_SYMBOL(find_first_bit);
*/
unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
{
- const unsigned long *p = addr;
- unsigned long result = 0;
- unsigned long tmp;
+ unsigned long idx;

- while (size & ~(BITS_PER_LONG-1)) {
- if (~(tmp = *(p++)))
- goto found;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+ if (addr[idx] != ~0UL)
+ return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
}
- if (!size)
- return result;

- tmp = (*p) | (~0UL << size);
- if (tmp == ~0UL) /* Are any bits zero? */
- return result + size; /* Nope. */
-found:
- return result + ffz(tmp);
+ return size;
}
EXPORT_SYMBOL(find_first_zero_bit);
#endif
@@ -166,18 +109,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
#ifdef __BIG_ENDIAN

/* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swabp(const unsigned long * x)
-{
-#if BITS_PER_LONG == 64
- return (unsigned long) __swab64p((u64 *) x);
-#elif BITS_PER_LONG == 32
- return (unsigned long) __swab32p((u32 *) x);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}
-
-/* include/linux/byteorder doesn't support "unsigned long" type */
static inline unsigned long ext2_swab(const unsigned long y)
{
#if BITS_PER_LONG == 64
@@ -189,48 +120,38 @@ static inline unsigned long ext2_swab(const unsigned long y)
#endif
}

-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
- long size, unsigned long offset)
+#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
+static unsigned long _find_next_bit_le(const unsigned long *addr,
+ unsigned long nbits, unsigned long start, unsigned long invert)
{
- const unsigned long *p = addr;
- unsigned long result = offset & ~(BITS_PER_LONG - 1);
unsigned long tmp;

- if (offset >= size)
- return size;
- p += BITOP_WORD(offset);
- size -= result;
- offset &= (BITS_PER_LONG - 1UL);
- if (offset) {
- tmp = ext2_swabp(p++);
- tmp |= (~0UL >> (BITS_PER_LONG - offset));
- if (size < BITS_PER_LONG)
- goto found_first;
- if (~tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
+ if (!nbits || start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+ /* Handle 1st word. */
+ tmp &= ext2_swab(BITMAP_FIRST_WORD_MASK(start));
+ start = round_down(start, BITS_PER_LONG);

- while (size & ~(BITS_PER_LONG - 1)) {
- if (~(tmp = *(p++)))
- goto found_middle_swap;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
+ while (!tmp) {
+ start += BITS_PER_LONG;
+ if (start >= nbits)
+ return nbits;
+
+ tmp = addr[start / BITS_PER_LONG] ^ invert;
}
- if (!size)
- return result;
- tmp = ext2_swabp(p);
-found_first:
- tmp |= ~0UL << size;
- if (tmp == ~0UL) /* Are any bits zero? */
- return result + size; /* Nope. Skip ffz */
-found_middle:
- return result + ffz(tmp);

-found_middle_swap:
- return result + ffz(ext2_swab(tmp));
+ return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+#endif
+
+#ifndef find_next_zero_bit_le
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+ long size, unsigned long offset)
+{
+ return _find_next_bit_le(addr, size, offset, ~0UL);
}
EXPORT_SYMBOL(find_next_zero_bit_le);
#endif
@@ -239,45 +160,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
unsigned long find_next_bit_le(const void *addr, unsigned
long size, unsigned long offset)
{
- const unsigned long *p = addr;
- unsigned long result = offset & ~(BITS_PER_LONG - 1);
- unsigned long tmp;
-
- if (offset >= size)
- return size;
- p += BITOP_WORD(offset);
- size -= result;
- offset &= (BITS_PER_LONG - 1UL);
- if (offset) {
- tmp = ext2_swabp(p++);
- tmp &= (~0UL << offset);
- if (size < BITS_PER_LONG)
- goto found_first;
- if (tmp)
- goto found_middle;
- size -= BITS_PER_LONG;
- result += BITS_PER_LONG;
- }
-
- while (size & ~(BITS_PER_LONG - 1)) {
- tmp = *(p++);
- if (tmp)
- goto found_middle_swap;
- result += BITS_PER_LONG;
- size -= BITS_PER_LONG;
- }
- if (!size)
- return result;
- tmp = ext2_swabp(p);
-found_first:
- tmp &= (~0UL >> (BITS_PER_LONG - size));
- if (tmp == 0UL) /* Are any bits set? */
- return result + size; /* Nope. */
-found_middle:
- return result + __ffs(tmp);
-
-found_middle_swap:
- return result + __ffs(ext2_swab(tmp));
+ return _find_next_bit_le(addr, size, offset, 0UL);
}
EXPORT_SYMBOL(find_next_bit_le);
#endif
--
2.1.0

2015-02-22 17:24:46

by Yury Norov

Subject: [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c

Currently the whole 'find_*_bit' family is located in lib/find_next_bit.c,
except 'find_last_bit', which is in lib/find_last_bit.c. There seems to be
no major benefit to keeping it separate.
---
lib/Makefile | 2 +-
lib/find_last_bit.c | 40 ----------------------------------------
lib/find_next_bit.c | 26 +++++++++++++++++++++++++-
3 files changed, 26 insertions(+), 42 deletions(-)
delete mode 100644 lib/find_last_bit.c

diff --git a/lib/Makefile b/lib/Makefile
index 87eb3bf..9dafcd5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y += lockref.o
obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
- bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+ bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
deleted file mode 100644
index f7b594e..0000000
--- a/lib/find_last_bit.c
+++ /dev/null
@@ -1,40 +0,0 @@
-/* find_last_bit.c: fallback find next bit implementation
- *
- * Copyright (C) 2008 IBM Corporation
- * Written by Rusty Russell <[email protected]>
- * (Inspired by David Howell's find_next_bit implementation)
- *
- * Rewritten by Yury Norov <[email protected]> to decrease
- * size and improve performance, 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/bitops.h>
-#include <linux/export.h>
-#include <linux/kernel.h>
-
-#ifndef find_last_bit
-
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
- if (size) {
- unsigned long val = BITMAP_LAST_WORD_MASK(size);
- unsigned long idx = (size-1) / BITS_PER_LONG;
-
- do {
- val &= addr[idx];
- if (val)
- return idx * BITS_PER_LONG + __fls(val);
-
- val = ~0ul;
- } while (idx--);
- }
- return size;
-}
-EXPORT_SYMBOL(find_last_bit);
-
-#endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index cbea5ef..9ae4d40 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -1,8 +1,12 @@
-/* find_next_bit.c: fallback find next bit implementation
+/* bit search implementation
*
* Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*
+ * Copyright (C) 2008 IBM Corporation
+ * 'find_last_bit' is written by Rusty Russell <[email protected]>
+ * (Inspired by David Howell's find_next_bit implementation)
+ *
* Rewritten by Yury Norov <[email protected]> to decrease
* size and improve performance, 2015.
*
@@ -106,6 +110,26 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
EXPORT_SYMBOL(find_first_zero_bit);
#endif

+#ifndef find_last_bit
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+ if (size) {
+ unsigned long val = BITMAP_LAST_WORD_MASK(size);
+ unsigned long idx = (size-1) / BITS_PER_LONG;
+
+ do {
+ val &= addr[idx];
+ if (val)
+ return idx * BITS_PER_LONG + __fls(val);
+
+ val = ~0ul;
+ } while (idx--);
+ }
+ return size;
+}
+EXPORT_SYMBOL(find_last_bit);
+#endif
+
#ifdef __BIG_ENDIAN

/* include/linux/byteorder does not support "unsigned long" type */
--
2.1.0

2015-02-22 17:24:57

by Yury Norov

Subject: [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c

This file contains implementations for all of find_*_bit{,_le},
so giving it a more generic name looks reasonable.
---
lib/Makefile | 2 +-
lib/{find_next_bit.c => find_bit.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
rename lib/{find_next_bit.c => find_bit.c} (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 9dafcd5..d857965 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y += lockref.o
obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
- bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
+ bsearch.o find_bit.o llist.o memweight.o kfifo.o \
percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_next_bit.c b/lib/find_bit.c
similarity index 100%
rename from lib/find_next_bit.c
rename to lib/find_bit.c
--
2.1.0

2015-02-23 21:50:11

by Rasmus Villemoes

Subject: Re: [PATCH v5 1/3] lib: find_*_bit reimplementation

On Sun, Feb 22 2015, Yury Norov <[email protected]> wrote:

> The new implementation takes less space in the source file (see diffstat)
> and in the object file; for me it's 710 vs 453 bytes of text.
> It also shows better performance.
>
> find_last_bit description fixed due to obvious typo.
>

Hm, sorry, I should probably have reminded you to include linux/bitmap.h:

lib/find_last_bit.c: In function ‘find_last_bit’:
lib/find_last_bit.c:25:3: error: implicit declaration of function ‘BITMAP_LAST_WORD_MASK’ [-Werror=implicit-function-declaration]
lib/find_next_bit.c: In function ‘_find_next_bit’:
lib/find_next_bit.c:37:2: error: implicit declaration of function ‘BITMAP_FIRST_WORD_MASK’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
cc1: some warnings being treated as errors

With that fixed:

Reviewed-by: Rasmus Villemoes <[email protected]>

That also applies to 2/3 and 3/3.

2015-02-24 00:29:24

by George Spelvin

Subject: Re: [PATCH v5 1/3] lib: find_*_bit reimplementation

Also add my

Reviewed-by: George Spelvin <[email protected]>

2015-02-24 00:40:37

by Andrew Morton

Subject: Re: [PATCH v5 0/3] lib: find_*_bit reimplementation

On Sun, 22 Feb 2015 20:24:14 +0300 Yury Norov <[email protected]> wrote:

> This patchset does rework find_bit functions family to achieve better
> performance, and decrease size of text. All rework is done in patch 1.
> Patches 2 and 3 are about code moving and renaming.
>
> It was boot-tested on x86_64 and MIPS (big-endian) machines.
> Performance tests were run in userspace with code like this:
>
> /* addr[] is filled from /dev/urandom */
> start = clock();
> while (ret < nbits)
> ret = find_next_bit(addr, nbits, ret + 1);
>
> end = clock();
> printf("%ld\t", (unsigned long) end - start);
>
> On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are as follows:
> (for find_next_bit, nbits is 8M, for find_first_bit - 80K)
>
> find_next_bit: find_first_bit:
> new current new current
> 26932 43151 14777 14925
> 26947 43182 14521 15423
> 26507 43824 15053 14705
> 27329 43759 14473 14777
> 26895 43367 14847 15023
> 26990 43693 15103 15163
> 26775 43299 15067 15232
> 27282 42752 14544 15121
> 27504 43088 14644 14858
> 26761 43856 14699 15193
> 26692 43075 14781 14681
> 27137 42969 14451 15061
> ... ...
>
> The find_next_bit performance gain is 35-40%;
> find_first_bit shows no measurable difference.
>
> On ARM machines there is an arch-specific implementation of find_bit.
> To disable it and use the generic one, apply the following patch:

I avoid putting patches into changelogs because in some situations
patch(1) tries to apply it when you apply the real patch. Maybe you
can share the userspace test harness with someone who has access to an
arm machine?

Patches 2 and 3 are missing your signed-off-by:. I added it to my
copies of those patches.