2023-09-22 09:52:37

by John Stultz

Subject: [PATCH 0/3] Fixes for test-ww_mutex stress test

As part of recently stabilizing the proxy-execution series,
I've seen a number of issues from the test-ww_mutex module.

This test is great for shaking out problems in the patches, but
in some cases the patch series seems to have made it easier to
uncover problems in the test-ww_mutex stress tests themselves.

So this patch series provides a few fixes that I've come up with
while testing in qemu with large CPU counts.

Feedback would be greatly appreciated!

thanks
-john

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: [email protected]

John Stultz (3):
test-ww_mutex: Use prng instead of rng to avoid hangs at bootup
test-ww_mutex: Fix potential workqueue corruption
test-ww_mutex: Make sure we bail out instead of livelock

kernel/locking/test-ww_mutex.c | 48 ++++++++++++++++++++++++----------
1 file changed, 34 insertions(+), 14 deletions(-)

--
2.42.0.515.g380fc7ccd1-goog


2023-09-25 06:20:24

by John Stultz

Subject: [PATCH 2/3] test-ww_mutex: Fix potential workqueue corruption

While running the test-ww_mutex code, I was occasionally seeing
odd behavior where flush_workqueue() seemed to return before all
of the work threads had finished.

Often this would cause strange crashes, as the mutexes would be
freed while they were still in use.

Looking at the code, there is a lifetime problem: the
controlling thread that spawns the work allocates the
"struct stress" structures that are passed to the workqueue
threads, and when the workqueue threads are finished, they
free the stress struct that was passed to them.

Unfortunately, the workqueue work_struct node is embedded in the
stress struct, which means the work_struct is freed before the
work thread returns and while flush_workqueue() is still waiting
on it.
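
To illustrate, here is a minimal sketch of that problematic
pattern (hypothetical names, not the actual test-ww_mutex code):
the work function frees the structure that embeds its own
work_struct, so a subsequent flush can end up waiting on memory
that has already been returned to the allocator.

#include <linux/slab.h>
#include <linux/workqueue.h>

struct stress_item {
	struct work_struct work;	/* embedded in the struct being freed */
	unsigned long timeout;
};

static void stress_item_fn(struct work_struct *work)
{
	struct stress_item *item =
		container_of(work, struct stress_item, work);

	/* ... exercise the ww_mutexes until item->timeout ... */

	kfree(item);	/* frees the work_struct the workqueue still tracks */
}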

It seems like a better idea to have the controlling thread both
allocate and free the stress structures, so that we can be sure
we don't corrupt the workqueue by freeing the structures
prematurely.

So this patch reworks the test to do so, and with this change I
no longer see flush_workqueue() returning early.
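
For reference, a simplified sketch of that ownership model
(hypothetical names and helper, not the actual patch below): the
controlling thread allocates the whole array up front and only
frees it after flush_workqueue() returns, so no work_struct can
be freed while the workqueue might still touch it.

#include <linux/slab.h>
#include <linux/workqueue.h>

struct stress_item {
	struct work_struct work;
	unsigned long timeout;
};

static void stress_item_fn(struct work_struct *work)
{
	/* ... exercise the ww_mutexes; no kfree() of the item here ... */
}

static int run_stress(struct workqueue_struct *wq, int nthreads)
{
	struct stress_item *items;
	int i;

	items = kmalloc_array(nthreads, sizeof(*items), GFP_KERNEL);
	if (!items)
		return -ENOMEM;

	for (i = 0; i < nthreads; i++) {
		INIT_WORK(&items[i].work, stress_item_fn);
		queue_work(wq, &items[i].work);
	}

	flush_workqueue(wq);	/* all items done; memory still valid */
	kfree(items);		/* the controller frees, after the flush */
	return 0;
}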

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: [email protected]
Signed-off-by: John Stultz <[email protected]>
---
kernel/locking/test-ww_mutex.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 9bceba65858a..358d66150426 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -479,7 +479,6 @@ static void stress_inorder_work(struct work_struct *work)
} while (!time_after(jiffies, stress->timeout));

kfree(order);
- kfree(stress);
}

struct reorder_lock {
@@ -544,7 +543,6 @@ static void stress_reorder_work(struct work_struct *work)
list_for_each_entry_safe(ll, ln, &locks, link)
kfree(ll);
kfree(order);
- kfree(stress);
}

static void stress_one_work(struct work_struct *work)
@@ -565,8 +563,6 @@ static void stress_one_work(struct work_struct *work)
break;
}
} while (!time_after(jiffies, stress->timeout));
-
- kfree(stress);
}

#define STRESS_INORDER BIT(0)
@@ -577,15 +573,24 @@ static void stress_one_work(struct work_struct *work)
static int stress(int nlocks, int nthreads, unsigned int flags)
{
struct ww_mutex *locks;
- int n;
+ struct stress *stress_array;
+ int n, count;

locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
if (!locks)
return -ENOMEM;

+ stress_array = kmalloc_array(nthreads, sizeof(*stress_array),
+ GFP_KERNEL);
+ if (!stress_array) {
+ kfree(locks);
+ return -ENOMEM;
+ }
+
for (n = 0; n < nlocks; n++)
ww_mutex_init(&locks[n], &ww_class);

+ count = 0;
for (n = 0; nthreads; n++) {
struct stress *stress;
void (*fn)(struct work_struct *work);
@@ -609,9 +614,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
if (!fn)
continue;

- stress = kmalloc(sizeof(*stress), GFP_KERNEL);
- if (!stress)
- break;
+ stress = &stress_array[count++];

INIT_WORK(&stress->work, fn);
stress->locks = locks;
@@ -626,6 +629,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)

for (n = 0; n < nlocks; n++)
ww_mutex_destroy(&locks[n]);
+ kfree(stress_array);
kfree(locks);

return 0;
--
2.42.0.515.g380fc7ccd1-goog