2023-04-14 05:14:40

by Luis Chamberlain

Subject: [PATCH v3 0/2] modules/kmod: replace implementation with a semaphore

On this v3:

  o Tons of spelling fixes thanks to Miroslav Benes
  o Fixed a stupid bug where I was using the timeout without HZ, as
    reported by Miroslav Benes
  o Enhanced the tribal knowledge docs on the semaphore vs. mutex
    considerations folks might make, as suggested by Matthew Wilcox
  o Added tags for patches

Luis Chamberlain (1):
modules/kmod: replace implementation with a semaphore

Peter Zijlstra (1):
Change DEFINE_SEMAPHORE() to take a number argument

arch/mips/cavium-octeon/setup.c | 2 +-
arch/x86/kernel/cpu/intel.c | 2 +-
drivers/firmware/efi/runtime-wrappers.c | 2 +-
drivers/firmware/efi/vars.c | 2 +-
drivers/macintosh/adb.c | 2 +-
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
drivers/platform/x86/intel/ifs/sysfs.c | 2 +-
drivers/scsi/esas2r/esas2r_ioctl.c | 2 +-
.../interface/vchiq_arm/vchiq_arm.c | 2 +-
include/linux/semaphore.h | 11 ++++++--
kernel/module/kmod.c | 26 +++++--------------
kernel/printk/printk.c | 2 +-
net/rxrpc/call_object.c | 6 ++---
13 files changed, 28 insertions(+), 35 deletions(-)

--
2.39.2


2023-04-14 05:14:46

by Luis Chamberlain

Subject: [PATCH v3 2/2] modules/kmod: replace implementation with a semaphore

Simplify the concurrency delimiter we use for kmod by using a semaphore.
I had used the kmod strategy to try to implement a similar concurrency
delimiter for the kernel_read*() calls from the finit_module() path
so as to reduce vmalloc() memory pressure. That effort has not yet
provided conclusive results, but one thing that became clear is that we
can use the alternative solution with semaphores which Linus hinted
at instead of the atomic / wait strategy.

I've stress tested this with kmod test 0008:

time /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008

And I get only a *slight* delay: a few seconds over a full test loop
of 150 runs that takes about 30-40 seconds. The small delay is worth
the simplification IMHO.

Signed-off-by: Luis Chamberlain <[email protected]>
---
kernel/module/kmod.c | 26 +++++++-------------------
1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/kernel/module/kmod.c b/kernel/module/kmod.c
index b717134ebe17..5899083436a3 100644
--- a/kernel/module/kmod.c
+++ b/kernel/module/kmod.c
@@ -40,8 +40,7 @@
* effect. Systems like these are very unlikely if modules are enabled.
*/
#define MAX_KMOD_CONCURRENT 50
-static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
-static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
+static DEFINE_SEMAPHORE(kmod_concurrent_max, MAX_KMOD_CONCURRENT);

/*
* This is a restriction on having *all* MAX_KMOD_CONCURRENT threads
@@ -148,29 +147,18 @@ int __request_module(bool wait, const char *fmt, ...)
if (ret)
return ret;

- if (atomic_dec_if_positive(&kmod_concurrent_max) < 0) {
- pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) close to 0 (max_modprobes: %u), for module %s, throttling...",
- atomic_read(&kmod_concurrent_max),
- MAX_KMOD_CONCURRENT, module_name);
- ret = wait_event_killable_timeout(kmod_wq,
- atomic_dec_if_positive(&kmod_concurrent_max) >= 0,
- MAX_KMOD_ALL_BUSY_TIMEOUT * HZ);
- if (!ret) {
- pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
- module_name, MAX_KMOD_CONCURRENT, MAX_KMOD_ALL_BUSY_TIMEOUT);
- return -ETIME;
- } else if (ret == -ERESTARTSYS) {
- pr_warn_ratelimited("request_module: sigkill sent for modprobe %s, giving up", module_name);
- return ret;
- }
+ ret = down_timeout(&kmod_concurrent_max, MAX_KMOD_ALL_BUSY_TIMEOUT * HZ);
+ if (ret) {
+ pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
+ module_name, MAX_KMOD_CONCURRENT, MAX_KMOD_ALL_BUSY_TIMEOUT);
+ return ret;
}

trace_module_request(module_name, wait, _RET_IP_);

ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);

- atomic_inc(&kmod_concurrent_max);
- wake_up(&kmod_wq);
+ up(&kmod_concurrent_max);

return ret;
}
--
2.39.2

2023-04-14 05:20:45

by Luis Chamberlain

Subject: [PATCH v3 1/2] Change DEFINE_SEMAPHORE() to take a number argument

From: Peter Zijlstra <[email protected]>

Fundamentally semaphores are a counted primitive, but
DEFINE_SEMAPHORE() does not expose this and explicitly creates a
binary semaphore.

Change DEFINE_SEMAPHORE() to take a number argument and use that in the
few places that open-coded it using __SEMAPHORE_INITIALIZER().

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
[mcgrof: add some tribal knowledge about why some folks prefer
binary semaphores over mutexes]
Signed-off-by: Luis Chamberlain <[email protected]>
---
arch/mips/cavium-octeon/setup.c | 2 +-
arch/x86/kernel/cpu/intel.c | 2 +-
drivers/firmware/efi/runtime-wrappers.c | 2 +-
drivers/firmware/efi/vars.c | 2 +-
drivers/macintosh/adb.c | 2 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
drivers/platform/x86/intel/ifs/sysfs.c | 2 +-
drivers/scsi/esas2r/esas2r_ioctl.c | 2 +-
.../vc04_services/interface/vchiq_arm/vchiq_arm.c | 2 +-
include/linux/semaphore.h | 11 +++++++++--
kernel/printk/printk.c | 2 +-
net/rxrpc/call_object.c | 6 ++----
12 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c
index a71727f7a608..c5561016f577 100644
--- a/arch/mips/cavium-octeon/setup.c
+++ b/arch/mips/cavium-octeon/setup.c
@@ -72,7 +72,7 @@ extern void pci_console_init(const char *arg);
static unsigned long long max_memory = ULLONG_MAX;
static unsigned long long reserve_low_mem;

-DEFINE_SEMAPHORE(octeon_bootbus_sem);
+DEFINE_SEMAPHORE(octeon_bootbus_sem, 1);
EXPORT_SYMBOL(octeon_bootbus_sem);

static struct octeon_boot_descriptor *octeon_boot_desc_ptr;
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 291d4167fab8..12bad63822f0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1177,7 +1177,7 @@ static const struct {
static struct ratelimit_state bld_ratelimit;

static unsigned int sysctl_sld_mitigate = 1;
-static DEFINE_SEMAPHORE(buslock_sem);
+static DEFINE_SEMAPHORE(buslock_sem, 1);

#ifdef CONFIG_PROC_SYSCTL
static struct ctl_table sld_sysctls[] = {
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 1fba4e09cdcf..a400c4312c82 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -158,7 +158,7 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
* none of the remaining functions are actually ever called at runtime.
* So let's just use a single lock to serialize all Runtime Services calls.
*/
-static DEFINE_SEMAPHORE(efi_runtime_lock);
+static DEFINE_SEMAPHORE(efi_runtime_lock, 1);

/*
* Expose the EFI runtime lock to the UV platform
diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c
index bd75b87f5fc1..bfc5fa6aa47b 100644
--- a/drivers/firmware/efi/vars.c
+++ b/drivers/firmware/efi/vars.c
@@ -21,7 +21,7 @@
/* Private pointer to registered efivars */
static struct efivars *__efivars;

-static DEFINE_SEMAPHORE(efivars_lock);
+static DEFINE_SEMAPHORE(efivars_lock, 1);

static efi_status_t check_var_size(bool nonblocking, u32 attributes,
unsigned long size)
diff --git a/drivers/macintosh/adb.c b/drivers/macintosh/adb.c
index 23bd0c77ac1a..56599515d51a 100644
--- a/drivers/macintosh/adb.c
+++ b/drivers/macintosh/adb.c
@@ -80,7 +80,7 @@ static struct adb_driver *adb_controller;
BLOCKING_NOTIFIER_HEAD(adb_client_list);
static int adb_got_sleep;
static int adb_inited;
-static DEFINE_SEMAPHORE(adb_probe_mutex);
+static DEFINE_SEMAPHORE(adb_probe_mutex, 1);
static int sleepy_trackpad;
static int autopoll_devs;
int __adb_probe_sync;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 5d1e4fe335aa..5a105bab4387 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -298,7 +298,7 @@ const u32 dmae_reg_go_c[] = {

/* Global resources for unloading a previously loaded device */
#define BNX2X_PREV_WAIT_NEEDED 1
-static DEFINE_SEMAPHORE(bnx2x_prev_sem);
+static DEFINE_SEMAPHORE(bnx2x_prev_sem, 1);
static LIST_HEAD(bnx2x_prev_list);

/* Forward declaration */
diff --git a/drivers/platform/x86/intel/ifs/sysfs.c b/drivers/platform/x86/intel/ifs/sysfs.c
index ee636a76b083..4c3c642ee19a 100644
--- a/drivers/platform/x86/intel/ifs/sysfs.c
+++ b/drivers/platform/x86/intel/ifs/sysfs.c
@@ -13,7 +13,7 @@
* Protects against simultaneous tests on multiple cores, or
* reloading can file while a test is in progress
*/
-static DEFINE_SEMAPHORE(ifs_sem);
+static DEFINE_SEMAPHORE(ifs_sem, 1);

/*
* The sysfs interface to check additional details of last test
diff --git a/drivers/scsi/esas2r/esas2r_ioctl.c b/drivers/scsi/esas2r/esas2r_ioctl.c
index e003d923acbf..055d2e87a2c8 100644
--- a/drivers/scsi/esas2r/esas2r_ioctl.c
+++ b/drivers/scsi/esas2r/esas2r_ioctl.c
@@ -56,7 +56,7 @@ dma_addr_t esas2r_buffered_ioctl_addr;
u32 esas2r_buffered_ioctl_size;
struct pci_dev *esas2r_buffered_ioctl_pcid;

-static DEFINE_SEMAPHORE(buffered_ioctl_semaphore);
+static DEFINE_SEMAPHORE(buffered_ioctl_semaphore, 1);
typedef int (*BUFFERED_IOCTL_CALLBACK)(struct esas2r_adapter *,
struct esas2r_request *,
struct esas2r_sg_context *,
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
index cddcd3c596c9..1a656fdc9445 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -149,7 +149,7 @@ static char *g_fragments_base;
static char *g_free_fragments;
static struct semaphore g_free_fragments_sema;

-static DEFINE_SEMAPHORE(g_free_fragments_mutex);
+static DEFINE_SEMAPHORE(g_free_fragments_mutex, 1);

static int
vchiq_blocking_bulk_transfer(struct vchiq_instance *instance, unsigned int handle, void *data,
diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
index 6694d0019a68..2d6aa3fd7861 100644
--- a/include/linux/semaphore.h
+++ b/include/linux/semaphore.h
@@ -25,8 +25,15 @@ struct semaphore {
.wait_list = LIST_HEAD_INIT((name).wait_list), \
}

-#define DEFINE_SEMAPHORE(name) \
- struct semaphore name = __SEMAPHORE_INITIALIZER(name, 1)
+/*
+ * There is a big difference between a binary semaphore and a mutex.
+ * You cannot call mutex_unlock() from IRQ context because it takes an
+ * internal spinlock in a non-IRQ-safe manner. A semaphore's
+ * down_trylock() and up() can both be called from IRQ context. A
+ * mutex must also be released in the same context that locked it.
+ */
+#define DEFINE_SEMAPHORE(_name, _n) \
+ struct semaphore _name = __SEMAPHORE_INITIALIZER(_name, _n)

static inline void sema_init(struct semaphore *sem, int val)
{
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fd0c9f913940..76987aaa5a45 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -89,7 +89,7 @@ static DEFINE_MUTEX(console_mutex);
* console_sem protects updates to console->seq and console_suspended,
* and also provides serialization for console printing.
*/
-static DEFINE_SEMAPHORE(console_sem);
+static DEFINE_SEMAPHORE(console_sem, 1);
HLIST_HEAD(console_list);
EXPORT_SYMBOL_GPL(console_list);
DEFINE_STATIC_SRCU(console_srcu);
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index e9f1f49d18c2..3e5cc70884dd 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -40,10 +40,8 @@ const char *const rxrpc_call_completions[NR__RXRPC_CALL_COMPLETIONS] = {

struct kmem_cache *rxrpc_call_jar;

-static struct semaphore rxrpc_call_limiter =
- __SEMAPHORE_INITIALIZER(rxrpc_call_limiter, 1000);
-static struct semaphore rxrpc_kernel_call_limiter =
- __SEMAPHORE_INITIALIZER(rxrpc_kernel_call_limiter, 1000);
+static DEFINE_SEMAPHORE(rxrpc_call_limiter, 1000);
+static DEFINE_SEMAPHORE(rxrpc_kernel_call_limiter, 1000);

void rxrpc_poke_call(struct rxrpc_call *call, enum rxrpc_call_poke_trace what)
{
--
2.39.2
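
To illustrate the macro change in isolation, here is a toy sketch that can be compiled outside the kernel. Everything in it is hypothetical (struct toy_semaphore, DEFINE_TOY_SEMAPHORE(), toy_down_trylock(), toy_up()), and it deliberately omits the spinlock and wait list of the real struct semaphore, so it is not safe for concurrent use. It only shows how a single definition macro with a count argument covers both the binary case and the counted case that previously open-coded the initializer.

```c
/* Toy model: only the counter, no locking and no wait list. */
struct toy_semaphore {
	unsigned int count;
};

#define TOY_SEMAPHORE_INITIALIZER(_n)	{ .count = (_n) }

/* Like the patched DEFINE_SEMAPHORE(): the initial count is an argument. */
#define DEFINE_TOY_SEMAPHORE(_name, _n) \
	struct toy_semaphore _name = TOY_SEMAPHORE_INITIALIZER(_n)

/* Binary use: at most one holder at a time. */
static DEFINE_TOY_SEMAPHORE(binary_sem, 1);

/* Counted use: up to 1000 concurrent holders. */
static DEFINE_TOY_SEMAPHORE(call_limiter, 1000);

/*
 * Non-blocking acquire. Follows the down_trylock() convention:
 * returns 0 if a slot was taken, 1 if none was available.
 */
static int toy_down_trylock(struct toy_semaphore *sem)
{
	if (sem->count == 0)
		return 1;
	sem->count--;
	return 0;
}

/* Release one slot. */
static void toy_up(struct toy_semaphore *sem)
{
	sem->count++;
}
```

With a count of 1 the second toy_down_trylock() fails until toy_up() is called, while the 1000-slot limiter keeps admitting callers until its count reaches zero.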

2023-04-14 09:35:14

by Miroslav Benes

Subject: Re: [PATCH v3 2/2] modules/kmod: replace implementation with a semaphore

On Thu, 13 Apr 2023, Luis Chamberlain wrote:

> Simplify the concurrency delimiter we use for kmod by using a semaphore.
> I had used the kmod strategy to try to implement a similar concurrency
> delimiter for the kernel_read*() calls from the finit_module() path
> so as to reduce vmalloc() memory pressure. That effort has not yet
> provided conclusive results, but one thing that became clear is that we
> can use the alternative solution with semaphores which Linus hinted
> at instead of the atomic / wait strategy.
>
> I've stress tested this with kmod test 0008:
>
> time /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
>
> And I get only a *slight* delay: a few seconds over a full test loop
> of 150 runs that takes about 30-40 seconds. The small delay is worth
> the simplification IMHO.
>
> Signed-off-by: Luis Chamberlain <[email protected]>

Reviewed-by: Miroslav Benes <[email protected]>

M

2023-04-14 09:47:34

by David Hildenbrand

Subject: Re: [PATCH v3 2/2] modules/kmod: replace implementation with a semaphore

On 14.04.23 07:13, Luis Chamberlain wrote:
> Simplify the concurrency delimiter we use for kmod by using a semaphore.
> I had used the kmod strategy to try to implement a similar concurrency
> delimiter for the kernel_read*() calls from the finit_module() path
> so as to reduce vmalloc() memory pressure. That effort has not yet
> provided conclusive results, but one thing that became clear is that we
> can use the alternative solution with semaphores which Linus hinted
> at instead of the atomic / wait strategy.
>
> I've stress tested this with kmod test 0008:
>
> time /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
>
> And I get only a *slight* delay: a few seconds over a full test loop
> of 150 runs that takes about 30-40 seconds. The small delay is worth
> the simplification IMHO.
>
> Signed-off-by: Luis Chamberlain <[email protected]>
> ---
> kernel/module/kmod.c | 26 +++++++-------------------
> 1 file changed, 7 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/module/kmod.c b/kernel/module/kmod.c
> index b717134ebe17..5899083436a3 100644
> --- a/kernel/module/kmod.c
> +++ b/kernel/module/kmod.c
> @@ -40,8 +40,7 @@
> * effect. Systems like these are very unlikely if modules are enabled.
> */
> #define MAX_KMOD_CONCURRENT 50
> -static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
> -static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
> +static DEFINE_SEMAPHORE(kmod_concurrent_max, MAX_KMOD_CONCURRENT);
>
> /*
> * This is a restriction on having *all* MAX_KMOD_CONCURRENT threads
> @@ -148,29 +147,18 @@ int __request_module(bool wait, const char *fmt, ...)
> if (ret)
> return ret;
>
> - if (atomic_dec_if_positive(&kmod_concurrent_max) < 0) {
> - pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) close to 0 (max_modprobes: %u), for module %s, throttling...",
> - atomic_read(&kmod_concurrent_max),
> - MAX_KMOD_CONCURRENT, module_name);
> - ret = wait_event_killable_timeout(kmod_wq,
> - atomic_dec_if_positive(&kmod_concurrent_max) >= 0,
> - MAX_KMOD_ALL_BUSY_TIMEOUT * HZ);
> - if (!ret) {
> - pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
> - module_name, MAX_KMOD_CONCURRENT, MAX_KMOD_ALL_BUSY_TIMEOUT);
> - return -ETIME;
> - } else if (ret == -ERESTARTSYS) {
> - pr_warn_ratelimited("request_module: sigkill sent for modprobe %s, giving up", module_name);
> - return ret;
> - }
> + ret = down_timeout(&kmod_concurrent_max, MAX_KMOD_ALL_BUSY_TIMEOUT * HZ);
> + if (ret) {
> + pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
> + module_name, MAX_KMOD_CONCURRENT, MAX_KMOD_ALL_BUSY_TIMEOUT);
> + return ret;
> }
>
> trace_module_request(module_name, wait, _RET_IP_);
>
> ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
>
> - atomic_inc(&kmod_concurrent_max);
> - wake_up(&kmod_wq);
> + up(&kmod_concurrent_max);
>
> return ret;
> }

Reviewed-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb

2023-04-14 17:19:06

by Luis Chamberlain

Subject: Re: [PATCH v3 1/2] Change DEFINE_SEMAPHORE() to take a number argument

On Thu, Apr 13, 2023 at 10:13:48PM -0700, Luis Chamberlain wrote:
> From: Peter Zijlstra <[email protected]>
> [mcgrof: add some tribal knowledge about why some folks prefer
> binary semaphores over mutexes]

Jeesh, sorry, I thought I had replaced the tribal knowledge tidbit with
what Matthew had suggested before. Will do that in v4.

Luis