2024-01-08 17:06:02

by Yury Norov

[permalink] [raw]
Subject: [GIT PULL] bitmap patches for v6.8

The following changes since commit 33cc938e65a98f1d29d0a18403dbbee050dcad9a:

Linux 6.7-rc4 (2023-12-03 18:52:56 +0900)

are available in the Git repository at:

https://github.com/norov/linux.git/ tags/bitmap-for-6.8

for you to fetch changes up to 071ad962baf5e857fd965595421cf6fb588610ed:

bitmap: Step down as a reviewer (2024-01-01 12:50:08 -0800)

----------------------------------------------------------------
Hi Linus,

Please pull bitmap patches for v6.8.

There is a couple of random cleanup patches, and the rest of the request
is a series adding atomic find_bit() operations:

https://patchwork.kernel.org/project/linux-media/cover/[email protected]/

Most of the patches are arch and drivers code where transition to the
new API is straightforward. And most of them (~20/35) are reviewed by
corresponding maintainers. For less actively supported subsystems I
didn't receive a feedback.

Many people asked to pull patches for their drivers together with the
head patch of the series. And because the transition is quite clean,
I decided to move the whole series in this request, including that
unreviewed material. Please let me know if it doesn't work for you,
and I'll resend it with the only reviewed patches.

Thanks,
Yury

----------------------------------------------------------------
Andy Shevchenko (1):
bitmap: Step down as a reviewer

Guanjun (1):
lib/find_bit: Fix the code comments about find_next_bit_wrap

Yury Norov (36):
lib/find: optimize find_*_bit_wrap
lib/find: add atomic find_bit() primitives
lib/find: add test for atomic find_bit() ops
lib/sbitmap; optimize __sbitmap_get_word() by using find_and_set_bit()
watch_queue: optimize post_one_notification() by using find_and_clear_bit()
sched: add cpumask_find_and_set() and use it in __mm_cid_get()
mips: sgi-ip30: optimize heart_alloc_int() by using find_and_set_bit()
sparc: optimize alloc_msi() by using find_and_set_bit()
perf/arm: use atomic find_bit() API
drivers/perf: optimize ali_drw_get_counter_idx() by using find_and_set_bit()
dmaengine: idxd: optimize perfmon_assign_event()
ath10k: optimize ath10k_snoc_napi_poll()
wifi: rtw88: optimize the driver by using atomic iterator
KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
PCI: hv: Optimize hv_get_dom_num() by using find_and_set_bit()
scsi: core: optimize scsi_evt_emit() by using an atomic iterator
scsi: mpi3mr: optimize the driver by using find_and_set_bit()
scsi: qedi: optimize qedi_get_task_idx() by using find_and_set_bit()
powerpc: optimize arch code by using atomic find_bit() API
iommu: optimize subsystem by using atomic find_bit() API
media: radio-shark: optimize the driver by using atomic find_bit() API
sfc: optimize the driver by using atomic find_bit() API
tty: nozomi: optimize interrupt_handler()
usb: cdc-acm: optimize acm_softint()
block: null_blk: replace get_tag() with a generic find_and_set_bit_lock()
RDMA/rtrs: optimize __rtrs_get_permit() by using find_and_set_bit_lock()
mISDN: optimize get_free_devid()
media: em28xx: cx231xx: optimize drivers by using find_and_set_bit()
ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
serial: sc12is7xx: optimize sc16is7xx_alloc_line()
bluetooth: optimize cmtp_alloc_block_id()
net: smc: optimize smc_wr_tx_get_free_slot_index()
ALSA: use atomic find_bit() functions where applicable
m68k: optimize get_mmu_context()
microblaze: optimize get_mmu_context()
sh: mach-x3proto: optimize ilsel_enable()

MAINTAINERS | 1 -
arch/m68k/include/asm/mmu_context.h | 11 +-
arch/microblaze/include/asm/mmu_context_mm.h | 11 +-
arch/mips/sgi-ip30/ip30-irq.c | 12 +-
arch/powerpc/mm/book3s32/mmu_context.c | 10 +-
arch/powerpc/platforms/pasemi/dma_lib.c | 45 +---
arch/powerpc/platforms/powernv/pci-sriov.c | 12 +-
arch/sh/boards/mach-x3proto/ilsel.c | 4 +-
arch/sparc/kernel/pci_msi.c | 9 +-
arch/x86/kvm/hyperv.c | 40 ++--
drivers/block/null_blk/main.c | 41 ++--
drivers/dma/idxd/perfmon.c | 8 +-
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 15 +-
drivers/iommu/arm/arm-smmu/arm-smmu.h | 10 +-
drivers/iommu/msm_iommu.c | 18 +-
drivers/isdn/mISDN/core.c | 9 +-
drivers/media/radio/radio-shark.c | 5 +-
drivers/media/radio/radio-shark2.c | 5 +-
drivers/media/usb/cx231xx/cx231xx-cards.c | 16 +-
drivers/media/usb/em28xx/em28xx-cards.c | 37 ++--
drivers/net/ethernet/rocker/rocker_ofdpa.c | 11 +-
drivers/net/ethernet/sfc/rx_common.c | 4 +-
drivers/net/ethernet/sfc/siena/rx_common.c | 4 +-
drivers/net/ethernet/sfc/siena/siena_sriov.c | 14 +-
drivers/net/wireless/ath/ath10k/snoc.c | 9 +-
drivers/net/wireless/realtek/rtw88/pci.c | 5 +-
drivers/net/wireless/realtek/rtw89/pci.c | 5 +-
drivers/pci/controller/pci-hyperv.c | 7 +-
drivers/perf/alibaba_uncore_drw_pmu.c | 10 +-
drivers/perf/arm-cci.c | 24 +--
drivers/perf/arm-ccn.c | 10 +-
drivers/perf/arm_dmc620_pmu.c | 9 +-
drivers/perf/arm_pmuv3.c | 8 +-
drivers/scsi/mpi3mr/mpi3mr_os.c | 21 +-
drivers/scsi/qedi/qedi_main.c | 9 +-
drivers/scsi/scsi_lib.c | 7 +-
drivers/tty/nozomi.c | 5 +-
drivers/tty/serial/sc16is7xx.c | 8 +-
drivers/usb/class/cdc-acm.c | 5 +-
include/linux/cpumask.h | 12 ++
include/linux/find.h | 301 ++++++++++++++++++++++++++-
kernel/sched/sched.h | 14 +-
kernel/watch_queue.c | 6 +-
lib/find_bit.c | 85 ++++++++
lib/sbitmap.c | 46 +---
lib/test_bitmap.c | 61 ++++++
net/bluetooth/cmtp/core.c | 10 +-
net/smc/smc_wr.c | 10 +-
sound/pci/hda/hda_codec.c | 7 +-
sound/usb/caiaq/audio.c | 13 +-
50 files changed, 635 insertions(+), 424 deletions(-)


2024-01-21 21:47:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] bitmap patches for v6.8

So I've left this to be my last pull request, because I hate how our
header files are growing, and this part:

include/linux/find.h | 301 ++++++++++++++++++++++++++++++-
1 file changed, 297 insertions(+), 4 deletions(-)

in particular.

Nobody includes <linux/find.h> directly, but indirectly pretty much
*every* single kernel C file includes it.

Looking at some basic stats of my dependency files in my tree, 4426 of
4526 object files (~98%) depend on find.h because they get it through
*some* path that ends up with bitmap.h -> find.h.

And honestly, the number of files that actually want the new functions
is basically just a tiny handful. It's also not obvious how useful
those optimizations are, considering that a lot of the loops are
*tiny*. I looked at a few cases, and the size of the bitmap it was
iterating over was often in the 2-4 range, sometimes (like
RTW89_TXCH_NUM) 13, etc.

In radio-shark, you replaced a loop like this

for (i = 0; i < 2; i++) {

with that for_each_test_and_clear_bit(), and it *really* isn't clear
that it was worth it. It sure wasn't performance-critical to begin
with.

In general, if an "optimization" doesn't have any performance numbers
attached to it, is it an optimization at all?

So I finally ended up pulling this, but after looking at the patch I
went "this is adding more lines than it removes, has no performance
numbers, and grows a core header file that is included by absolutely
everything by a third".

. and then I decided to just unpull it again.

Linus

2024-01-21 23:35:31

by Yury Norov

[permalink] [raw]
Subject: Re: [GIT PULL] bitmap patches for v6.8

On Sun, Jan 21, 2024 at 01:47:21PM -0800, Linus Torvalds wrote:
> So I've left this to be my last pull request, because I hate how our
> header files are growing, and this part:
>
> include/linux/find.h | 301 ++++++++++++++++++++++++++++++-
> 1 file changed, 297 insertions(+), 4 deletions(-)
>
> in particular.
>
> Nobody includes <linux/find.h> directly, but indirectly pretty much
> *every* single kernel C file includes it.
>
> Looking at some basic stats of my dependency files in my tree, 4426 of
> 4526 object files (~98%) depend on find.h because they get it through
> *some* path that ends up with bitmap.h -> find.h.
>
> And honestly, the number of files that actually want the new functions
> is basically just a tiny handful. It's also not obvious how useful
> those optimizations are, considering that a lot of the loops are
> *tiny*. I looked at a few cases, and the size of the bitmap it was
> iterating over was often in the 2-4 range, sometimes (like
> RTW89_TXCH_NUM) 13, etc.
>
> In radio-shark, you replaced a loop like this
>
> for (i = 0; i < 2; i++) {
>
> with that for_each_test_and_clear_bit(), and it *really* isn't clear
> that it was worth it. It sure wasn't performance-critical to begin
> with.
>
> In general, if an "optimization" doesn't have any performance numbers
> attached to it, is it an optimization at all?

No, this is not a performance optimization, and I don't claim that.
Jan Kara reported some performance improvement, but performance is not
the main goal of the series, and I didn't run performance tests for
this on myself.

The original motivation came from the fact that using non-volatile
find_bit() together with volatile test_and_set_bit() may trigger
KCSAN warning on concurrent memory access.

People wanted to make the whole find API volatile, which is a bad idea
for sure. So I had to give them a reasonable alternative. After some
thinking I decided that we just need a separate set of volatile find API.
Check this for initial discussion:

https://lore.kernel.org/lkml/[email protected]/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615

It also makes the code cleaner, at least to my taste, and some
reviewers' too. And to some degree less bug-prone. For example,
ILSEL_LEVELS is just 15, but traversing code in ilsel_enable()
is buggy, as Geert spotted. And switching to atomic find() fixes
it automatically:

https://lore.kernel.org/lkml/CAMuHMdWHxesM-EOOMtrrw3Caz+5Wux35QiKOjvwA=vwQpRe26Q@mail.gmail.com/T/#me53217b32fd5623af6c7eafa5c4ce6d0465f6c58

(His review came just a couple days ago, after I submitted the pull
request, so the tag is not there.)

> So I finally ended up pulling this, but after looking at the patch I
> went "this is adding more lines than it removes, has no performance
> numbers, and grows a core header file that is included by absolutely
> everything by a third".
>
> .. and then I decided to just unpull it again.

Yes, it's true. Bitmap.h historically includes everything related to
bitmaps, and it became too big. I started moving some chunks out of
it already. For example, aae06fc1b5a2e splits out string-related bitmap
code to a separate bitmap-str.h. That patch was more about human
readability, and it keeps the order, so that bitmap.h includes
bitmap-str.h, but we can easily turn that around.

I wanted to do it some day, but actually nobody complained before now,
and I wanted to collect more material to make it a series.

I still think that kernel needs atomic find_bit() API. If you change
your mind and pull the series, I can make patches that split-out
atomic find() declarations to a separate header, not included by
bitmap.h, and submit it for -rc2, together with bitmap-str.h rework.

Thanks,
Yury

2024-01-22 19:48:25

by Jan Kara

[permalink] [raw]
Subject: Re: [GIT PULL] bitmap patches for v6.8

On Sun 21-01-24 15:35:18, Yury Norov wrote:
> On Sun, Jan 21, 2024 at 01:47:21PM -0800, Linus Torvalds wrote:
> > So I've left this to be my last pull request, because I hate how our
> > header files are growing, and this part:
> >
> > include/linux/find.h | 301 ++++++++++++++++++++++++++++++-
> > 1 file changed, 297 insertions(+), 4 deletions(-)
> >
> > in particular.
> >
> > Nobody includes <linux/find.h> directly, but indirectly pretty much
> > *every* single kernel C file includes it.
> >
> > Looking at some basic stats of my dependency files in my tree, 4426 of
> > 4526 object files (~98%) depend on find.h because they get it through
> > *some* path that ends up with bitmap.h -> find.h.
> >
> > And honestly, the number of files that actually want the new functions
> > is basically just a tiny handful. It's also not obvious how useful
> > those optimizations are, considering that a lot of the loops are
> > *tiny*. I looked at a few cases, and the size of the bitmap it was
> > iterating over was often in the 2-4 range, sometimes (like
> > RTW89_TXCH_NUM) 13, etc.
> >
> > In radio-shark, you replaced a loop like this
> >
> > for (i = 0; i < 2; i++) {
> >
> > with that for_each_test_and_clear_bit(), and it *really* isn't clear
> > that it was worth it. It sure wasn't performance-critical to begin
> > with.
> >
> > In general, if an "optimization" doesn't have any performance numbers
> > attached to it, is it an optimization at all?
>
> No, this is not a performance optimization, and I don't claim that.
> Jan Kara reported some performance improvement, but performance is not
> the main goal of the series, and I didn't run performance tests for
> this on myself.
>
> The original motivation came from the fact that using non-volatile
> find_bit() together with volatile test_and_set_bit() may trigger
> KCSAN warning on concurrent memory access.

Maybe to save Linus some reading: It is not only about KCSAN, it is a real
(theoretical) race. In principle the problem is that find_next_bit() does
something like:

if (*addr)
return __ffs(*addr)

and if the compiler decides to refetch *addr and we race with concurrent
writer bad things can happen. Now I wanted to fix this in a less intrusive
way using READ_ONCE()
(https://lore.kernel.org/linux-fsdevel/[email protected]/)
but Yury didn't like that and came up with this series.

> People wanted to make the whole find API volatile, which is a bad idea
> for sure. So I had to give them a reasonable alternative. After some
> thinking I decided that we just need a separate set of volatile find API.
> Check this for initial discussion:
>
> https://lore.kernel.org/lkml/[email protected]/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615
>
> It also makes the code cleaner, at least to my taste, and some
> reviewers' too. And to some degree less bug-prone. For example,
> ILSEL_LEVELS is just 15, but traversing code in ilsel_enable()
> is buggy, as Geert spotted. And switching to atomic find() fixes
> it automatically:
>
> https://lore.kernel.org/lkml/CAMuHMdWHxesM-EOOMtrrw3Caz+5Wux35QiKOjvwA=vwQpRe26Q@mail.gmail.com/T/#me53217b32fd5623af6c7eafa5c4ce6d0465f6c58
>
> (His review came just a couple days ago, after I submitted the pull
> request, so the tag is not there.)

I would just note that this pull as is still does not address the original
issue in xarray when using find_next_bit() and I'm not yet sure how/if you
plan to address it.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR