2013-05-26 09:09:22

by Manfred Spraul

Subject: [PATCH 0/4] ipc/sem.c: Bug fixes, regression fixes, v3

I've split my patch into 4 parts:
- 1: Fix-missing-wakeups-in-do_smart_update
- 2: separate-wait-for-zero-and-alter-tasks
- 3: Always-use-only-one-queue-for-alter-operations
- 4: Rename-try_atomic_semop-to-perform_atomic

Linus:
- Patch 1 should be merged immediately: It fixes bugs,
the current code misses wakeups.

- Patches 2 and 3 restore the behavior of linux <=3.9.
I would propose that they are merged, too: I can't rule out that
changing the priority of the wakeups breaks user space apps.

- Patch 4 is trivial: a rename and documentation updates, no
functional changes.
If 2+3 are merged, then 4 should be merged, too.

I have tested patch 1 separately and 1+2+3+4:
With patch 1 applied, there are no more missed wakeups.

With all 4 applied, linux 3.10-rc1 behaves as linux <=3.9.

With regards to the scalability, I do not expect any degradation:
Operations on separate semaphores in an array remain parallelized.

Apps that use lots of wait-for-zero semops are probably even faster,
because the wait-for-zero ops are now only scanned if a semval is 0.

--
Manfred

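For reference, the two classes of semaphore operations that the series keeps
distinguishing look like this from userspace. A minimal sketch against the
standard SysV API; the array layout and values are illustrative, not taken
from the patches:

#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
	/* private array with two semaphores; Linux initializes semvals to 0 */
	int semid = semget(IPC_PRIVATE, 2, 0600);

	/* "alter" operation: sem_op != 0 changes the semaphore value.
	 * Decrements sleep while the value is too small; increments
	 * never sleep. */
	struct sembuf alter = { .sem_num = 0, .sem_op = +1, .sem_flg = 0 };
	semop(semid, &alter, 1);

	/* "wait-for-zero" operation: sem_op == 0 does not change the
	 * value; it sleeps until the semaphore reaches 0. */
	struct sembuf zero = { .sem_num = 1, .sem_op = 0, .sem_flg = 0 };
	semop(semid, &zero, 1);    /* returns at once: semval[1] is 0 */

	/* "complex" operation: more than one sop per semop() call; the
	 * whole set must be applied atomically, possibly spanning
	 * several semaphores. */
	struct sembuf complex_op[2] = {
		{ .sem_num = 0, .sem_op = -1, .sem_flg = 0 },
		{ .sem_num = 1, .sem_op = +1, .sem_flg = 0 },
	};
	semop(semid, complex_op, 2);

	semctl(semid, 0, IPC_RMID);
	return 0;
}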

2013-05-26 09:10:19

by Manfred Spraul

Subject: [PATCH 1/4] ipc/sem.c: Fix missing wakeups in do_smart_update_queue()

do_smart_update_queue() is called when an operation
(semop, semctl(SETVAL), semctl(SETALL), ...) modified the array.
It must check which of the sleeping tasks can proceed.

do_smart_update_queue() missed a few wakeups:
- if a sleeping complex op was completed, then all per-semaphore queues
must be scanned - not only those that were modified by *sops
- if a sleeping simple op proceeded, then the global queue
must be scanned again

And:
- the test for "sops == NULL" before scanning the global
queue is not required: If the global queue is empty, then
it doesn't need to be scanned - regardless of the reason
for calling do_smart_update_queue()

The patch is not optimized, i.e. even completing a wait-for-zero
operation causes a rescan. This is done to keep the patch as simple as
possible.
Avoiding unnecessary scans is implemented in the following patches.

Signed-off-by: Manfred Spraul <[email protected]>
---
ipc/sem.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index a7e40ed..70480a3 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -752,19 +752,29 @@ static void do_smart_update(struct sem_array *sma, struct sembuf *sops, int nsop
int otime, struct list_head *pt)
{
int i;
+ int progress;

- if (sma->complex_count || sops == NULL) {
- if (update_queue(sma, -1, pt))
+ progress = 1;
+retry_global:
+ if (sma->complex_count) {
+ if (update_queue(sma, -1, pt)) {
+ progress = 1;
otime = 1;
+ sops = NULL;
+ }
}
+ if (!progress)
+ goto done;

if (!sops) {
/* No semops; something special is going on. */
for (i = 0; i < sma->sem_nsems; i++) {
- if (update_queue(sma, i, pt))
+ if (update_queue(sma, i, pt)) {
otime = 1;
+ progress = 1;
+ }
}
- goto done;
+ goto done_checkretry;
}

/* Check the semaphores that were modified. */
@@ -772,8 +782,15 @@ static void do_smart_update(struct sem_array *sma, struct sembuf *sops, int nsop
if (sops[i].sem_op > 0 ||
(sops[i].sem_op < 0 &&
sma->sem_base[sops[i].sem_num].semval == 0))
- if (update_queue(sma, sops[i].sem_num, pt))
+ if (update_queue(sma, sops[i].sem_num, pt)) {
otime = 1;
+ progress = 1;
+ }
+ }
+done_checkretry:
+ if (progress) {
+ progress = 0;
+ goto retry_global;
}
done:
if (otime)
--
1.8.1.4
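As an illustration of the first missed-wakeup case above, a userspace
reproducer could look like the sketch below (untested and timing-based; the
sleeps, key and missing error handling are simplifications, not part of the
patch). Thread A sleeps on a complex op whose completion increments
semaphore 1; thread B sleeps on a simple decrement of semaphore 1. With the
bug, B is never woken, because semaphore 1 was modified by A's completion
rather than by *sops, and the program hangs:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/ipc.h>
#include <sys/sem.h>

static int semid;

static void *complex_waiter(void *arg)
{
	/* blocks: semval[0] is 0 */
	struct sembuf ops[2] = {
		{ .sem_num = 0, .sem_op = -1 },
		{ .sem_num = 1, .sem_op = +1 },
	};
	semop(semid, ops, 2);
	return NULL;
}

static void *simple_waiter(void *arg)
{
	/* blocks: semval[1] is 0 */
	struct sembuf op = { .sem_num = 1, .sem_op = -1 };
	if (semop(semid, &op, 1) == 0)
		printf("simple waiter woken\n");  /* missing with the bug */
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	struct sembuf post = { .sem_num = 0, .sem_op = +1 };

	semid = semget(IPC_PRIVATE, 2, 0600);  /* semvals start at 0 */
	pthread_create(&a, NULL, complex_waiter, NULL);
	pthread_create(&b, NULL, simple_waiter, NULL);
	sleep(1);                      /* crude: let both waiters block */

	/* Wakes the complex op; its completion sets semval[1] to 1,
	 * which in turn must wake the simple waiter. */
	semop(semid, &post, 1);

	pthread_join(a, NULL);
	pthread_join(b, NULL);
	semctl(semid, 0, IPC_RMID);
	return 0;
}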

2013-05-26 09:10:29

by Manfred Spraul

Subject: [PATCH 3/4] ipc/sem.c: Always use only one queue for alter operations.

There are two places that can contain alter operations:
- the global queue: sma->pending_alter
- the per-semaphore queues: sma->sem_base[].pending_alter.

Since one of the queues must be processed first, this causes an odd
prioritization of the wakeups:
Right now, complex operations have priority over simple ops.

The patch restores the behavior of linux <=3.9: The longest-waiting
operation has the highest priority.

This is done by using only one queue:
- if there are complex ops, then sma->pending_alter is used.
- otherwise, the per-semaphore queues are used.

As a side effect, do_smart_update_queue() becomes much simpler:
No more goto logic.

Signed-off-by: Manfred Spraul <[email protected]>
---
ipc/sem.c | 128 ++++++++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 88 insertions(+), 40 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index ce25da3..9ed3853 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -192,6 +192,53 @@ void __init sem_init (void)
IPC_SEM_IDS, sysvipc_sem_proc_show);
}

+/**
+ * unmerge_queues - unmerge queues, if possible.
+ * @sma: semaphore array
+ *
+ * The function unmerges the wait queues if complex_count is 0.
+ * It must be called prior to dropping the global semaphore array lock.
+ */
+static void unmerge_queues(struct sem_array *sma)
+{
+ struct sem_queue *q, *tq;
+
+ /* complex operations still around? */
+ if (sma->complex_count)
+ return;
+ /*
+ * We will switch back to simple mode.
+ * Move all pending operation back into the per-semaphore
+ * queues.
+ */
+ list_for_each_entry_safe(q, tq, &sma->pending_alter, list) {
+ struct sem *curr;
+ curr = &sma->sem_base[q->sops[0].sem_num];
+
+ list_add_tail(&q->list, &curr->pending_alter);
+ }
+ INIT_LIST_HEAD(&sma->pending_alter);
+}
+
+/**
+ * merge_queues - Merge single semop queues into global queue
+ * @sma: semaphore array
+ *
+ * This function merges all per-semaphore queues into the global queue.
+ * It is necessary to achieve FIFO ordering for the pending single-sop
+ * operations when a multi-semop operation must sleep.
+ * Only the alter operations must be moved, the const operations can stay.
+ */
+static void merge_queues(struct sem_array *sma)
+{
+ int i;
+ for (i = 0; i < sma->sem_nsems; i++) {
+ struct sem *sem = sma->sem_base + i;
+
+ list_splice_init(&sem->pending_alter, &sma->pending_alter);
+ }
+}
+
/*
* If the request contains only one semaphore operation, and there are
* no complex transactions pending, lock only the semaphore involved.
@@ -262,6 +309,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
static inline void sem_unlock(struct sem_array *sma, int locknum)
{
if (locknum == -1) {
+ unmerge_queues(sma);
spin_unlock(&sma->sem_perm.lock);
} else {
struct sem *sem = sma->sem_base + locknum;
@@ -829,49 +877,38 @@ static void do_smart_update(struct sem_array *sma, struct sembuf *sops, int nsop
int otime, struct list_head *pt)
{
int i;
- int progress;

otime |= do_smart_wakeup_zero(sma, sops, nsops, pt);

- progress = 1;
-retry_global:
- if (sma->complex_count) {
- if (update_queue(sma, -1, pt)) {
- progress = 1;
- otime = 1;
- sops = NULL;
- }
- }
- if (!progress)
- goto done;
-
- if (!sops) {
- /* No semops; something special is going on. */
- for (i = 0; i < sma->sem_nsems; i++) {
- if (update_queue(sma, i, pt)) {
- otime = 1;
- progress = 1;
+ if (!list_empty(&sma->pending_alter)) {
+ /* semaphore array uses the global queue - just process it. */
+ otime |= update_queue(sma, -1, pt);
+ } else {
+ if (!sops) {
+ /*
+ * No sops, thus the modified semaphores are not
+ * known. Check all.
+ */
+ for (i = 0; i < sma->sem_nsems; i++)
+ otime |= update_queue(sma, i, pt);
+ } else {
+ /*
+ * Check the semaphores that were increased:
+ * - No complex ops, thus all sleeping ops are
+ * decrease.
+ * - if we decreased the value, then any sleeping
+ * semaphore ops wont be able to run: If the
+ * previous value was too small, then the new
+ * value will be too small, too.
+ */
+ for (i = 0; i < nsops; i++) {
+ if (sops[i].sem_op > 0) {
+ otime |= update_queue(sma,
+ sops[i].sem_num, pt);
+ }
}
}
- goto done_checkretry;
- }
-
- /* Check the semaphores that were modified. */
- for (i = 0; i < nsops; i++) {
- if (sops[i].sem_op > 0 ||
- (sops[i].sem_op < 0 &&
- sma->sem_base[sops[i].sem_num].semval == 0))
- if (update_queue(sma, sops[i].sem_num, pt)) {
- otime = 1;
- progress = 1;
- }
- }
-done_checkretry:
- if (progress) {
- progress = 0;
- goto retry_global;
}
-done:
if (otime)
sma->sem_otime = get_seconds();
}
@@ -1741,11 +1778,22 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops,
struct sem *curr;
curr = &sma->sem_base[sops->sem_num];

- if (alter)
- list_add_tail(&queue.list, &curr->pending_alter);
- else
+ if (alter) {
+ if (sma->complex_count) {
+ list_add_tail(&queue.list,
+ &sma->pending_alter);
+ } else {
+
+ list_add_tail(&queue.list,
+ &curr->pending_alter);
+ }
+ } else {
list_add_tail(&queue.list, &curr->pending_const);
+ }
} else {
+ if (!sma->complex_count)
+ merge_queues(sma);
+
if (alter)
list_add_tail(&queue.list, &sma->pending_alter);
else
--
1.8.1.4
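The ordering difference can be made visible from userspace with a sketch
like the following (untested, timing-based; the sleeps only serve to order
the blocking). A simple decrement blocks first, a complex decrement second,
and a single post then supplies one unit on each semaphore. With the FIFO
behavior of linux <=3.9, which this patch restores, the older simple op
should win; with the unpatched 3.10-rc code, the complex op would be
preferred:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/ipc.h>
#include <sys/sem.h>

static int semid;

static void *old_simple(void *arg)
{
	struct sembuf op = { .sem_num = 0, .sem_op = -1 };

	if (semop(semid, &op, 1) == 0)       /* blocks first */
		printf("older simple op completed\n");
	return NULL;
}

static void *young_complex(void *arg)
{
	struct sembuf ops[2] = {
		{ .sem_num = 0, .sem_op = -1 },
		{ .sem_num = 1, .sem_op = -1 },
	};

	if (semop(semid, ops, 2) == 0)       /* blocks second */
		printf("younger complex op completed\n");
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	struct sembuf post[2] = {
		{ .sem_num = 0, .sem_op = +1 },
		{ .sem_num = 1, .sem_op = +1 },
	};

	semid = semget(IPC_PRIVATE, 2, 0600);   /* semvals start at 0 */
	pthread_create(&t1, NULL, old_simple, NULL);
	sleep(1);
	pthread_create(&t2, NULL, young_complex, NULL);
	sleep(1);

	/* One unit on each semaphore: only one waiter can proceed,
	 * since both need semaphore 0. */
	semop(semid, post, 2);
	sleep(1);                                /* the winner prints */
	semctl(semid, 0, IPC_RMID);              /* the loser gets EIDRM */
	return 0;
}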

2013-05-26 09:10:25

by Manfred Spraul

Subject: [PATCH 2/4] ipc/sem: separate wait-for-zero and alter tasks into separate queues

Introduce separate queues for operations that do not modify the
semaphore values.
Advantages:
- Simpler logic in check_restart().
- Faster update_queue(): Right now, all wait-for-zero operations
are always tested, even if the semaphore value is not 0.
- wait-for-zero again gets priority, as in linux <=3.9

Signed-off-by: Manfred Spraul <[email protected]>
---
include/linux/sem.h | 5 +-
ipc/sem.c | 209 +++++++++++++++++++++++++++++++++++++---------------
2 files changed, 153 insertions(+), 61 deletions(-)

diff --git a/include/linux/sem.h b/include/linux/sem.h
index 53d4265..55e17f6 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -15,7 +15,10 @@ struct sem_array {
time_t sem_otime; /* last semop time */
time_t sem_ctime; /* last change time */
struct sem *sem_base; /* ptr to first semaphore in array */
- struct list_head sem_pending; /* pending operations to be processed */
+ struct list_head pending_alter; /* pending operations */
+ /* that alter the array */
+ struct list_head pending_const; /* pending complex operations */
+ /* that do not alter semvals */
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
diff --git a/ipc/sem.c b/ipc/sem.c
index 70480a3..ce25da3 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -95,7 +95,10 @@ struct sem {
int semval; /* current value */
int sempid; /* pid of last operation */
spinlock_t lock; /* spinlock for fine-grained semtimedop */
- struct list_head sem_pending; /* pending single-sop operations */
+ struct list_head pending_alter; /* pending single-sop operations */
+ /* that alter the semaphore */
+ struct list_head pending_const; /* pending single-sop operations */
+ /* that do not alter the semaphore*/
};

/* One queue for each sleeping process in the system. */
@@ -152,7 +155,7 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
/*
* linked list protection:
* sem_undo.id_next,
- * sem_array.sem_pending{,last},
+ * sem_array.pending{_alter,_const},
* sem_array.sem_undo: sem_lock() for read/write
* sem_undo.proc_next: only "current" is allowed to read/write that field.
*
@@ -337,7 +340,7 @@ static inline void sem_rmid(struct ipc_namespace *ns, struct sem_array *s)
* Without the check/retry algorithm a lockless wakeup is possible:
* - queue.status is initialized to -EINTR before blocking.
* - wakeup is performed by
- * * unlinking the queue entry from sma->sem_pending
+ * * unlinking the queue entry from the pending list
* * setting queue.status to IN_WAKEUP
* This is the notification for the blocked thread that a
* result value is imminent.
@@ -418,12 +421,14 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
sma->sem_base = (struct sem *) &sma[1];

for (i = 0; i < nsems; i++) {
- INIT_LIST_HEAD(&sma->sem_base[i].sem_pending);
+ INIT_LIST_HEAD(&sma->sem_base[i].pending_alter);
+ INIT_LIST_HEAD(&sma->sem_base[i].pending_const);
spin_lock_init(&sma->sem_base[i].lock);
}

sma->complex_count = 0;
- INIT_LIST_HEAD(&sma->sem_pending);
+ INIT_LIST_HEAD(&sma->pending_alter);
+ INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
sma->sem_nsems = nsems;
sma->sem_ctime = get_seconds();
@@ -609,60 +614,130 @@ static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
* update_queue is O(N^2) when it restarts scanning the whole queue of
* waiting operations. Therefore this function checks if the restart is
* really necessary. It is called after a previously waiting operation
- * was completed.
+ * modified the array.
+ * Note that wait-for-zero operations are handled without restart.
*/
static int check_restart(struct sem_array *sma, struct sem_queue *q)
{
- struct sem *curr;
- struct sem_queue *h;
-
- /* if the operation didn't modify the array, then no restart */
- if (q->alter == 0)
- return 0;
-
- /* pending complex operations are too difficult to analyse */
- if (sma->complex_count)
+ /* pending complex alter operations are too difficult to analyse */
+ if (!list_empty(&sma->pending_alter))
return 1;

/* we were a sleeping complex operation. Too difficult */
if (q->nsops > 1)
return 1;

- curr = sma->sem_base + q->sops[0].sem_num;
+ /* It is impossible that someone waits for the new value:
+ * - complex operations always restart.
+ * - wait-for-zero ops are handled separately.
+ * - q is a previously sleeping simple operation that
+ * altered the array. It must be a decrement, because
+ * simple increments never sleep.
+ * - If there are older (higher priority) decrements
+ * in the queue, then they have observed the original
+ * semval value and couldn't proceed. The operation
+ * decremented the value - thus they won't proceed either.
+ */
+ return 0;
+}

- /* No-one waits on this queue */
- if (list_empty(&curr->sem_pending))
- return 0;
+/**
+ * wake_const_ops(sma, semnum, pt) - Wake up non-alter tasks
+ * @sma: semaphore array.
+ * @semnum: semaphore that was modified.
+ * @pt: list head for the tasks that must be woken up.
+ *
+ * wake_const_ops must be called after a semaphore in a semaphore array
+ * was set to 0. If complex const operations are pending, wake_const_ops must
+ * be called with semnum = -1, as well as with the number of each modified
+ * semaphore.
+ * The tasks that must be woken up are added to @pt. The return code
+ * is stored in q->pid.
+ * The function returns 1 if at least one operation was completed successfully.
+ */
+static int wake_const_ops(struct sem_array *sma, int semnum,
+ struct list_head *pt)
+{
+ struct sem_queue *q;
+ struct list_head *walk;
+ struct list_head *pending_list;
+ int semop_completed = 0;
+
+ if (semnum == -1)
+ pending_list = &sma->pending_const;
+ else
+ pending_list = &sma->sem_base[semnum].pending_const;
+
+ walk = pending_list->next;
+ while (walk != pending_list) {
+ int error;
+
+ q = container_of(walk, struct sem_queue, list);
+ walk = walk->next;
+
+ error = try_atomic_semop(sma, q->sops, q->nsops,
+ q->undo, q->pid);
+
+ if (error <= 0) {
+ /* operation completed, remove from queue & wakeup */
+
+ unlink_queue(sma, q);
+
+ wake_up_sem_queue_prepare(pt, q, error);
+ if (error == 0)
+ semop_completed = 1;
+ }
+ }
+ return semop_completed;
+}

- /* the new semaphore value */
- if (curr->semval) {
- /* It is impossible that someone waits for the new value:
- * - q is a previously sleeping simple operation that
- * altered the array. It must be a decrement, because
- * simple increments never sleep.
- * - The value is not 0, thus wait-for-zero won't proceed.
- * - If there are older (higher priority) decrements
- * in the queue, then they have observed the original
- * semval value and couldn't proceed. The operation
- * decremented to value - thus they won't proceed either.
+/**
+ * do_smart_wakeup_zero(sma, sops, nsops, pt) - wakeup all wait for zero tasks
+ * @sma: semaphore array
+ * @sops: operations that were performed
+ * @nsops: number of operations
+ * @pt: list head of the tasks that must be woken up.
+ *
+ * do_smart_wakeup_zero() checks all required queues for wait-for-zero
+ * operations, based on the actual changes that were performed on the
+ * semaphore array.
+ * The function returns 1 if at least one operation was completed successfully.
+ */
+static int do_smart_wakeup_zero(struct sem_array *sma, struct sembuf *sops,
+ int nsops, struct list_head *pt)
+{
+ int i;
+ int semop_completed = 0;
+ int got_zero = 0;
+
+ /* first: the per-semaphore queues, if known */
+ if (sops) {
+ for (i = 0; i < nsops; i++) {
+ int num = sops[i].sem_num;
+
+ if (sma->sem_base[num].semval == 0) {
+ got_zero = 1;
+ semop_completed |= wake_const_ops(sma, num, pt);
+ }
+ }
+ } else {
+ /*
+ * No sops means modified semaphores not known.
+ * Assume all were changed.
*/
- BUG_ON(q->sops[0].sem_op >= 0);
- return 0;
+ for (i = 0; i < sma->sem_nsems; i++) {
+ if (sma->sem_base[i].semval == 0)
+ semop_completed |= wake_const_ops(sma, i, pt);
+ }
}
/*
- * semval is 0. Check if there are wait-for-zero semops.
- * They must be the first entries in the per-semaphore queue
+ * If one of the modified semaphores got 0,
+ * then check the global queue, too.
*/
- h = list_first_entry(&curr->sem_pending, struct sem_queue, list);
- BUG_ON(h->nsops != 1);
- BUG_ON(h->sops[0].sem_num != q->sops[0].sem_num);
+ if (got_zero)
+ semop_completed |= wake_const_ops(sma, -1, pt);

- /* Yes, there is a wait-for-zero semop. Restart */
- if (h->sops[0].sem_op == 0)
- return 1;
-
- /* Again - no-one is waiting for the new value. */
- return 0;
+ return semop_completed;
}


@@ -678,6 +753,8 @@ static int check_restart(struct sem_array *sma, struct sem_queue *q)
* semaphore.
* The tasks that must be woken up are added to @pt. The return code
* is stored in q->pid.
+ * The function internally checks if const operations can now succeed.
+ *
* The function return 1 if at least one semop was completed successfully.
*/
static int update_queue(struct sem_array *sma, int semnum, struct list_head *pt)
@@ -688,9 +765,9 @@ static int update_queue(struct sem_array *sma, int semnum, struct list_head *pt)
int semop_completed = 0;

if (semnum == -1)
- pending_list = &sma->sem_pending;
+ pending_list = &sma->pending_alter;
else
- pending_list = &sma->sem_base[semnum].sem_pending;
+ pending_list = &sma->sem_base[semnum].pending_alter;

again:
walk = pending_list->next;
@@ -702,13 +779,12 @@ again:

/* If we are scanning the single sop, per-semaphore list of
* one semaphore and that semaphore is 0, then it is not
- * necessary to scan the "alter" entries: simple increments
+ * necessary to scan further: simple increments
* that affect only one entry succeed immediately and cannot
* be in the per semaphore pending queue, and decrements
* cannot be successful if the value is already 0.
*/
- if (semnum != -1 && sma->sem_base[semnum].semval == 0 &&
- q->alter)
+ if (semnum != -1 && sma->sem_base[semnum].semval == 0)
break;

error = try_atomic_semop(sma, q->sops, q->nsops,
@@ -724,6 +800,7 @@ again:
restart = 0;
} else {
semop_completed = 1;
+ do_smart_wakeup_zero(sma, q->sops, q->nsops, pt);
restart = check_restart(sma, q);
}

@@ -742,8 +819,8 @@ again:
* @otime: force setting otime
* @pt: list head of the tasks that must be woken up.
*
- * do_smart_update() does the required called to update_queue, based on the
- * actual changes that were performed on the semaphore array.
+ * do_smart_update() does the required calls to update_queue and wakeup_zero,
+ * based on the actual changes that were performed on the semaphore array.
* Note that the function does not do the actual wake-up: the caller is
* responsible for calling wake_up_sem_queue_do(@pt).
* It is safe to perform this call after dropping all locks.
@@ -754,6 +831,8 @@ static void do_smart_update(struct sem_array *sma, struct sembuf *sops, int nsop
int i;
int progress;

+ otime |= do_smart_wakeup_zero(sma, sops, nsops, pt);
+
progress = 1;
retry_global:
if (sma->complex_count) {
@@ -813,14 +892,14 @@ static int count_semncnt (struct sem_array * sma, ushort semnum)
struct sem_queue * q;

semncnt = 0;
- list_for_each_entry(q, &sma->sem_base[semnum].sem_pending, list) {
+ list_for_each_entry(q, &sma->sem_base[semnum].pending_alter, list) {
struct sembuf * sops = q->sops;
BUG_ON(sops->sem_num != semnum);
if ((sops->sem_op < 0) && !(sops->sem_flg & IPC_NOWAIT))
semncnt++;
}

- list_for_each_entry(q, &sma->sem_pending, list) {
+ list_for_each_entry(q, &sma->pending_alter, list) {
struct sembuf * sops = q->sops;
int nsops = q->nsops;
int i;
@@ -839,14 +918,14 @@ static int count_semzcnt (struct sem_array * sma, ushort semnum)
struct sem_queue * q;

semzcnt = 0;
- list_for_each_entry(q, &sma->sem_base[semnum].sem_pending, list) {
+ list_for_each_entry(q, &sma->sem_base[semnum].pending_const, list) {
struct sembuf * sops = q->sops;
BUG_ON(sops->sem_num != semnum);
if ((sops->sem_op == 0) && !(sops->sem_flg & IPC_NOWAIT))
semzcnt++;
}

- list_for_each_entry(q, &sma->sem_pending, list) {
+ list_for_each_entry(q, &sma->pending_const, list) {
struct sembuf * sops = q->sops;
int nsops = q->nsops;
int i;
@@ -884,13 +963,22 @@ static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)

/* Wake up all pending processes and let them fail with EIDRM. */
INIT_LIST_HEAD(&tasks);
- list_for_each_entry_safe(q, tq, &sma->sem_pending, list) {
+ list_for_each_entry_safe(q, tq, &sma->pending_const, list) {
+ unlink_queue(sma, q);
+ wake_up_sem_queue_prepare(&tasks, q, -EIDRM);
+ }
+
+ list_for_each_entry_safe(q, tq, &sma->pending_alter, list) {
unlink_queue(sma, q);
wake_up_sem_queue_prepare(&tasks, q, -EIDRM);
}
for (i = 0; i < sma->sem_nsems; i++) {
struct sem *sem = sma->sem_base + i;
- list_for_each_entry_safe(q, tq, &sem->sem_pending, list) {
+ list_for_each_entry_safe(q, tq, &sem->pending_const, list) {
+ unlink_queue(sma, q);
+ wake_up_sem_queue_prepare(&tasks, q, -EIDRM);
+ }
+ list_for_each_entry_safe(q, tq, &sem->pending_alter, list) {
unlink_queue(sma, q);
wake_up_sem_queue_prepare(&tasks, q, -EIDRM);
}
@@ -1654,14 +1742,15 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops,
curr = &sma->sem_base[sops->sem_num];

if (alter)
- list_add_tail(&queue.list, &curr->sem_pending);
+ list_add_tail(&queue.list, &curr->pending_alter);
else
- list_add(&queue.list, &curr->sem_pending);
+ list_add_tail(&queue.list, &curr->pending_const);
} else {
if (alter)
- list_add_tail(&queue.list, &sma->sem_pending);
+ list_add_tail(&queue.list, &sma->pending_alter);
else
- list_add(&queue.list, &sma->sem_pending);
+ list_add_tail(&queue.list, &sma->pending_const);
+
sma->complex_count++;
}

--
1.8.1.4
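One interaction that the split queues must preserve, and that the
do_smart_wakeup_zero() call added to update_queue() handles, is that a
completing alter operation can bring a semaphore to zero and thereby release
wait-for-zero sleepers. An untested userspace sketch of that situation
(initial value and sleeps are arbitrary):

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun {            /* the caller must define this, per semctl(2) */
	int val;
	struct semid_ds *buf;
	unsigned short *array;
};

static int semid;

static void *zero_waiter(void *arg)
{
	struct sembuf op = { .sem_num = 0, .sem_op = 0 };

	if (semop(semid, &op, 1) == 0)   /* blocks: semval is 1 */
		printf("wait-for-zero completed\n");
	return NULL;
}

static void *big_dec(void *arg)
{
	struct sembuf op = { .sem_num = 0, .sem_op = -2 };

	semop(semid, &op, 1);            /* blocks: semval is 1, needs 2 */
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	struct sembuf inc = { .sem_num = 0, .sem_op = +1 };
	union semun arg = { .val = 1 };

	semid = semget(IPC_PRIVATE, 1, 0600);
	semctl(semid, 0, SETVAL, arg);           /* semval = 1 */
	pthread_create(&t1, NULL, zero_waiter, NULL);
	pthread_create(&t2, NULL, big_dec, NULL);
	sleep(1);                                /* let both block */

	/* 1 -> 2 wakes the decrement; its completion (2 -> 0) must
	 * then wake the wait-for-zero sleeper. */
	semop(semid, &inc, 1);

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	semctl(semid, 0, IPC_RMID);
	return 0;
}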

2013-05-26 09:10:51

by Manfred Spraul

Subject: [PATCH 4/4] ipc/sem.c: Rename try_atomic_semop() to perform_atomic_semop(), docu update

Final cleanup: some minor points that I noticed while writing the
previous 3 patches.

1) The name try_atomic_semop() is misleading: The function performs the
operation (if it is possible).

2) Some documentation updates.

No real code change, a rename and documentation changes.

Signed-off-by: Manfred Spraul <[email protected]>
---
ipc/sem.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 9ed3853..1dbb2fa 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -153,12 +153,15 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
#define SEMOPM_FAST 64 /* ~ 372 bytes on stack */

/*
- * linked list protection:
+ * Locking:
* sem_undo.id_next,
+ * sem_array.complex_count,
* sem_array.pending{_alter,_const},
- * sem_array.sem_undo: sem_lock() for read/write
+ * sem_array.sem_undo: global sem_lock() for read/write
* sem_undo.proc_next: only "current" is allowed to read/write that field.
*
+ * sem_array.sem_base[i].pending_{const,alter}:
+ * global or semaphore sem_lock() for read/write
*/

#define sc_semmsl sem_ctls[0]
@@ -535,12 +538,19 @@ SYSCALL_DEFINE3(semget, key_t, key, int, nsems, int, semflg)
return ipcget(ns, &sem_ids(ns), &sem_ops, &sem_params);
}

-/*
- * Determine whether a sequence of semaphore operations would succeed
- * all at once. Return 0 if yes, 1 if need to sleep, else return error code.
+/** perform_atomic_semop - Perform (if possible) a semaphore operation
+ * @sma: semaphore array
+ * @sops: array with operations that should be checked
+ * @nsops: number of sops
+ * @un: undo array
+ * @pid: pid that did the change
+ *
+ * Returns 0 if the operation was possible.
+ * Returns 1 if the operation is impossible, the caller must sleep.
+ * Negative values are error codes.
*/

-static int try_atomic_semop (struct sem_array * sma, struct sembuf * sops,
+static int perform_atomic_semop(struct sem_array *sma, struct sembuf *sops,
int nsops, struct sem_undo *un, int pid)
{
int result, sem_op;
@@ -723,8 +733,8 @@ static int wake_const_ops(struct sem_array *sma, int semnum,
q = container_of(walk, struct sem_queue, list);
walk = walk->next;

- error = try_atomic_semop(sma, q->sops, q->nsops,
- q->undo, q->pid);
+ error = perform_atomic_semop(sma, q->sops, q->nsops,
+ q->undo, q->pid);

if (error <= 0) {
/* operation completed, remove from queue & wakeup */
@@ -835,7 +845,7 @@ again:
if (semnum != -1 && sma->sem_base[semnum].semval == 0)
break;

- error = try_atomic_semop(sma, q->sops, q->nsops,
+ error = perform_atomic_semop(sma, q->sops, q->nsops,
q->undo, q->pid);

/* Does q->sleeper still need to sleep? */
@@ -1658,7 +1668,6 @@ static int get_queue_result(struct sem_queue *q)
return error;
}

-
SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops,
unsigned, nsops, const struct timespec __user *, timeout)
{
@@ -1756,7 +1765,8 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf __user *, tsops,
if (un && un->semid == -1)
goto out_unlock_free;

- error = try_atomic_semop (sma, sops, nsops, un, task_tgid_vnr(current));
+ error = perform_atomic_semop(sma, sops, nsops, un,
+ task_tgid_vnr(current));
if (error <= 0) {
if (alter && error == 0)
do_smart_update(sma, sops, nsops, 1, &tasks);
--
1.8.1.4

2013-05-26 17:37:20

by Linus Torvalds

Subject: Re: [PATCH 0/4] ipc/sem.c: Bug fixes, regression fixes, v3

On Sun, May 26, 2013 at 2:08 AM, Manfred Spraul
<[email protected]> wrote:
> I've split my patch into 4 parts:
> - 1: Fix-missing-wakeups-in-do_smart_update
> - 2: separate-wait-for-zero-and-alter-tasks
> - 3: Always-use-only-one-queue-for-alter-operations
> - 4: Rename-try_atomic_semop-to-perform_atomic

Rik, Davidlohr, acks/reviews please?

Linus

2013-05-26 20:50:38

by Davidlohr Bueso

Subject: Re: [PATCH 0/4] ipc/sem.c: Bug fixes, regression fixes, v3

On Sun, 2013-05-26 at 11:08 +0200, Manfred Spraul wrote:
> I've split my patch into 4 parts:
> - 1: Fix-missing-wakeups-in-do_smart_update
> - 2: separate-wait-for-zero-and-alter-tasks
> - 3: Always-use-only-one-queue-for-alter-operations
> - 4: Rename-try_atomic_semop-to-perform_atomic
>
> Linus:
> - Patch 1 should be merged immediately: It fixes bugs,
> the current code misses wakeups.

Nothing against this.

> - Patches 2 and 3 restore the behavior of linux <=3.9.
> I would propose that they are merged, too: I can't rule out that
> changing the priority of the wakeups breaks user space apps.
>
> - Patch 4 is trivial: a rename and documentation updates, no
> functional changes.
> If 2+3 are merged, then 4 should be merged, too.
>
> I have tested patch 1 separately and 1+2+3+4:
> With patch 1 applied, there are no more missed wakeups.
>
> With all 4 applied, linux 3.10-rc1 behaves as linux <=3.9.
>
> With regards to the scalability, I do not expect any degradation:
> Operations on separate semaphores in an array remain parallelized.

For lack of my swingbench DSS environment, I ran these
changes against the semop-multi program on my laptop. For 256 threads,
with Manfred's patchset the ops/sec drops by around 7.3%.

3.10-rc2-baseline:
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 325289276, ops/sec 10842975

- 18.14%  a.out  [kernel.kallsyms]  [k] SYSC_semtimedop
   - SYSC_semtimedop
      + 97.54% SyS_semtimedop
      + 2.46% SyS_semop
- 5.24%  a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
   - ipc_obtain_object_check
      + 92.37% SYSC_semtimedop
      + 7.63% SyS_semtimedop
- 4.67%  a.out  [kernel.kallsyms]  [k] _raw_spin_lock
   - _raw_spin_lock
      + 91.98% SYSC_semtimedop
      + 7.89% SyS_semtimedop


3.10-rc2-manfred:
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 303314830, ops/sec 10110494

- 17.10%  a.out  [kernel.kallsyms]  [k] SYSC_semtimedop
   - SYSC_semtimedop
      + 97.47% SyS_semtimedop
      + 2.53% SyS_semop
- 4.79%  a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
   - ipc_obtain_object_check
      + 91.88% SYSC_semtimedop
      + 8.12% SyS_semtimedop
- 4.50%  a.out  [kernel.kallsyms]  [k] _raw_spin_lock
   - _raw_spin_lock
      + 90.92% SYSC_semtimedop
      + 8.95% SyS_semtimedop

3.9:
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 151293714, ops/sec 5043123

- 59.73%  a.out  [kernel.kallsyms]  [k] _raw_spin_lock
   - _raw_spin_lock
      + 98.86% ipc_lock
      + 1.13% ipc_lock_check
- 6.48%  a.out  [kernel.kallsyms]  [k] sys_semtimedop
   - sys_semtimedop
      + 95.26% sys_semop
      + 4.74% system_call_fastpath


While I'm not happy about the [smallish] throughput impact, I'm not as
against this patchset as I was originally. I still think that such
changes, if applied, should go through the linux-next/3.11 phase, as much
testing is still needed. I'd also like to see how the Oracle benchmark
behaves (yes, it should be more or less faithful to how semop-multi is
impacted).

Thanks,
Davidlohr

2013-05-26 22:59:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/4] ipc/sem.c: Bug fixes, regression fixes, v3

On Sun, May 26, 2013 at 1:50 PM, Davidlohr Bueso <[email protected]> wrote:
>
> For lack of my swingbench DSS environment, I ran these
> changes against the semop-multi program on my laptop. For 256 threads,
> with Manfred's patchset the ops/sec drops by around 7.3%.

Hmm. That test program only tests simple ops, right? So the slowdown
comes from either the code simply being that much slower due to extra
complexity, or from the fair wakeups..

Anyway, I applied 1/4, we can continue to discuss the rest.

Linus

2013-05-27 02:04:41

by Greg Kroah-Hartman

Subject: Re: [PATCH 0/4] ipc/sem.c: Bug fixes, regression fixes, v3

On Sun, May 26, 2013 at 11:08:51AM +0200, Manfred Spraul wrote:
> I've split my patch into 4 parts:
> - 1: Fix-missing-wakeups-in-do_smart_update
> - 2: separate-wait-for-zero-and-alter-tasks
> - 3: Always-use-only-one-queue-for-alter-operations
> - 4: Rename-try_atomic_semop-to-perform_atomic
>
> Linus:
> - Patch 1 should be merged immediately: It fixes bugs,
> the current code misses wakeups.

Should this patch be backported to 3.9 and maybe older kernels as well?

thanks,

greg k-h

2013-05-27 15:57:16

by Manfred Spraul

Subject: Re: [PATCH 0/4] ipc/sem.c: Bug fixes, regression fixes, v3

Hi Davidlohr,

On 05/26/2013 10:50 PM, Davidlohr Bueso wrote:
>
> > For lack of my swingbench DSS environment, I ran these
> > changes against the semop-multi program on my laptop. For 256 threads,
> > with Manfred's patchset the ops/sec drops by around 7.3%.

Could you also check the performance of only patch#1?
I fear that it might be slower than all 4 together.

With regards to semop-multi:
Is this the tool?
http://marc.info/?l=linux-kernel&m=136208613626892&q=p3

I think the logic is wrong:
Locking a semaphore is a subtraction, unlocking an addition.
Thus multiple tasks can run in parallel - and the task switch code is
never triggered.
Could you double check that the number of context switches matches the
output?
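A minimal cross-check along those lines could look like the hypothetical
helper below (not one of the attached tools); it samples the kernel's global
context-switch counter before and after a benchmark run. If the benchmark
really makes tasks sleep, the delta should be of the same order of magnitude
as the reported operation count:

#include <stdio.h>
#include <stdlib.h>

static unsigned long long read_ctxt(void)
{
	char line[256];
	unsigned long long ctxt = 0;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f)) {
		/* the "ctxt" line counts context switches since boot */
		if (sscanf(line, "ctxt %llu", &ctxt) == 1)
			break;
	}
	fclose(f);
	return ctxt;
}

int main(int argc, char **argv)
{
	unsigned long long before, after;

	before = read_ctxt();
	if (argc > 1)
		system(argv[1]);          /* e.g. "./semop-multi 256 64" */
	after = read_ctxt();
	printf("context switches: %llu\n", after - before);
	return 0;
}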

I usually use this tool:
http://marc.info/?l=linux-kernel&m=125038376609750


--
Manfred

2013-05-27 17:50:48

by Rik van Riel

Subject: Re: [PATCH 0/4] ipc/sem.c: Bug fixes, regression fixes, v3

On 05/26/2013 10:04 PM, Greg KH wrote:
> On Sun, May 26, 2013 at 11:08:51AM +0200, Manfred Spraul wrote:
>> I've split my patch into 4 parts:
>> - 1: Fix-missing-wakeups-in-do_smart_update
>> - 2: separate-wait-for-zero-and-alter-tasks
>> - 3: Always-use-only-one-queue-for-alter-operations
>> - 4: Rename-try_atomic_semop-to-perform_atomic
>>
>> Linus:
>> - Patch 1 should be merged immediately: It fixes bugs,
>> the current code misses wakeups.
>
> Should this patch be backported to 3.9 and maybe older kernels as well?

No need, the new semaphore code (with this last regression)
did not get merged until the 3.10 merge window.

2013-05-27 17:51:28

by Rik van Riel

Subject: Re: [PATCH 1/4] ipc/sem.c: Fix missing wakeups in do_smart_update_queue()

On 05/26/2013 05:08 AM, Manfred Spraul wrote:
> do_smart_update_queue() is called when an operation
> (semop, semctl(SETVAL), semctl(SETALL), ...) modified the array.
> It must check which of the sleeping tasks can proceed.
>
> do_smart_update_queue() missed a few wakeups:
> - if a sleeping complex op was completed, then all per-semaphore queues
> must be scanned - not only those that were modified by *sops
> - if a sleeping simple op proceeded, then the global queue
> must be scanned again
>
> And:
> - the test for "sops == NULL" before scanning the global
> queue is not required: If the global queue is empty, then
> it doesn't need to be scanned - regardless of the reason
> for calling do_smart_update_queue()
>
> The patch is not optimized, i.e. even completing a wait-for-zero
> operation causes a rescan. This is done to keep the patch as simple as
> possible.
> Avoiding unnecessary scans is implemented in the following patches.
>
> Signed-off-by: Manfred Spraul <[email protected]>

Very much not optimized, but we need to fix the regression.

Acked-by: Rik van Riel <[email protected]>

2013-05-27 17:57:14

by Rik van Riel

Subject: Re: [PATCH 2/4] ipc/sem: separate wait-for-zero and alter tasks into separate queues

On 05/26/2013 05:08 AM, Manfred Spraul wrote:
> Introduce separate queues for operations that do not modify the
> semaphore values.
> Advantages:
> - Simpler logic in check_restart().
> - Faster update_queue(): Right now, all wait-for-zero operations
> are always tested, even if the semaphore value is not 0.
> - wait-for-zero again gets priority, as in linux <=3.9

Whether this complexity is wanted is not for
me to decide, as I am not the ipc/sem.c
maintainer. I'll leave that up to Andrew and Linus.

The code looks correct, though.

Acked-by: Rik van Riel <[email protected]>

2013-05-27 17:59:19

by Rik van Riel

Subject: Re: [PATCH 4/4] ipc/sem.c: Rename try_atomic_semop() to perform_atomic_semop(), docu update

On 05/26/2013 05:08 AM, Manfred Spraul wrote:
> Final cleanup: some minor points that I noticed while writing the
> previous 3 patches.
>
> 1) The name try_atomic_semop() is misleading: The function performs the
> operation (if it is possible).
>
> 2) Some documentation updates.
>
> No real code change, a rename and documentation changes.

I like this.

> Signed-off-by: Manfred Spraul <[email protected]>

Reviewed-by: Rik van Riel <[email protected]>

2013-05-27 18:24:03

by Rik van Riel

Subject: Re: [PATCH 3/4] ipc/sem.c: Always use only one queue for alter operations.

On 05/26/2013 05:08 AM, Manfred Spraul wrote:
> There are two places that can contain alter operations:
> - the global queue: sma->pending_alter
> - the per-semaphore queues: sma->sem_base[].pending_alter.
>
> Since one of the queues must be processed first, this causes an odd
> prioritization of the wakeups:
> Right now, complex operations have priority over simple ops.
>
> The patch restores the behavior of linux <=3.9: The longest-waiting
> operation has the highest priority.
>
> This is done by using only one queue:
> - if there are complex ops, then sma->pending_alter is used.
> - otherwise, the per-semaphore queues are used.
>
> As a side effect, do_smart_update_queue() becomes much simpler:
> No more goto logic.
>
> Signed-off-by: Manfred Spraul <[email protected]>

This is the one where I really wonder whether the
complexity is warranted.

Again, the code does look correct.

Acked-by: Rik van Riel <[email protected]>

2013-05-27 19:53:48

by Manfred Spraul

Subject: Re: [PATCH 1/4] ipc/sem.c: Fix missing wakeups in do_smart_update_queue()

Hi Rik,

On 05/27/2013 07:51 PM, Rik van Riel wrote:
>
> Very much not optimized, but we need to fix the regression.
>
> Acked-by: Rik van Riel <[email protected]>

Thanks for the review.

I was afraid that the differences would be noticeable but they aren't:
At least on my laptop, the differences between the patches are within
the noise:

# cat /proc/stat | grep ctxt; ./osim 2 2 1000000 0 0; cat /proc/stat | grep ctxt

runtime [s]:  stock      patch 1    patch 1-4
              2.0088875  2.041084   1.997367

# cat /proc/stat | grep ctxt; ./osim 2 20 1000000 0 0; cat /proc/stat | grep ctxt

              4.32541    4.295956   4.277817

(assuming 2 cpus, otherwise increase "2" and "20" accordingly)

Here is the link to the tool:
http://marc.info/?l=linux-kernel&m=125038376609750

--
Manfred

2013-05-27 19:57:20

by Greg Kroah-Hartman

Subject: Re: [PATCH 0/4] ipc/sem.c: Bug fixes, regression fixes, v3

On Mon, May 27, 2013 at 01:50:38PM -0400, Rik van Riel wrote:
> On 05/26/2013 10:04 PM, Greg KH wrote:
> >On Sun, May 26, 2013 at 11:08:51AM +0200, Manfred Spraul wrote:
> >>I've split my patch into 4 parts:
> >>- 1: Fix-missing-wakeups-in-do_smart_update
> >>- 2: separate-wait-for-zero-and-alter-tasks
> >>- 3: Always-use-only-one-queue-for-alter-operations
> >>- 4: Rename-try_atomic_semop-to-perform_atomic
> >>
> >>Linus:
> >>- Patch 1 should be merged immediately: It fixes bugs,
> >> the current code misses wakeups.
> >
> >Should this patch be backported to 3.9 and maybe older kernels as well?
>
> No need, the new semaphore code (with this last regression)
> did not get merged until the 3.10 merge window.

Thanks, I didn't realize that.

greg k-h

2013-05-28 20:37:32

by Davidlohr Bueso

Subject: Re: [PATCH 0/4] ipc/sem.c: Bug fixes, regression fixes, v3

*sigh* it seems that the email this morning wasn't sent, resending...

On Mon, 2013-05-27 at 17:57 +0200, Manfred Spraul wrote:
> Hi Davidlohr,
>
> On 05/26/2013 10:50 PM, Davidlohr Bueso wrote:
> >
> > For lack of my swingbench DSS environment, I ran these
> > changes against the semop-multi program on my laptop. For 256 threads,
> > with Manfred's patchset the ops/sec drops by around 7.3%.
>
> Could you also check the performance of only patch#1?
> I fear that it might be slower than all 4 together.

Performance-wise, patch 1 actually doesn't make any difference versus what
was already upstream.

>
> With regards to semop-multi:
> Is this the tool?
> http://marc.info/?l=linux-kernel&m=136208613626892&q=p3

Yep.

Thanks,
Davidlohr

2013-06-01 09:20:38

by Manfred Spraul

Subject: Re: [PATCH 2/4] ipc/sem: separate wait-for-zero and alter tasks into separate queues

Hi Rik,

On 05/27/2013 07:57 PM, Rik van Riel wrote:
> On 05/26/2013 05:08 AM, Manfred Spraul wrote:
>> Introduce separate queues for operations that do not modify the
>> semaphore values.
>> Advantages:
>> - Simpler logic in check_restart().
>> - Faster update_queue(): Right now, all wait-for-zero operations
>> are always tested, even if the semaphore value is not 0.
>> - wait-for-zero again gets priority, as in linux <=3.9
>
> Whether this complexity is wanted is not for
> me to decide, as I am not the ipc/sem.c
> maintainer. I'll leave that up to Andrew and Linus.
>
We can have only one: either more logic or unoptimized loops.
But I don't think that the complexity increases that much; some
parts (e.g. check_restart()) even get much simpler.

But:
Mike Galbraith ran 3.10-rc3 with and without my changes on a 4-socket
64-core system, and to me the results appear to be quite slow:
- semop-multi 256 64: around 600,000 ops/sec, both with and without my
additional patches [difference around 1%].
That is slower than my 1.4 GHz i3 with 3.9 - there I get around
1,000,000 ops/sec.
Is that expected?
My only idea would be thrashing from writing sma->sem_otime.

- osim [i.e.: with reschedules] is much slower: around 21 us per schedule.
Perhaps the scheduler didn't pair the threads optimally: intra-cpu
reschedules take around 2 us on my i3, inter-cpu reschedules around 16 us.

Thus I have attached my test apps.
- psem: tests sleeping semaphore operations (the basic ping-pong idea
is sketched below, after the attachment list).
Pairs of threads perform ping-pong operations, starting with 1
semaphore and increasing up to the given maximum.
The pairs are either bound to the same cpu ("intra-cpu") or bound to
different cpus ("inter-cpu").
Inter-cpu is hardcoded, probably always a different socket
(distance max_cpus/2).

- semscale: performs operations that never block, i.e. like your
semop-multi.c.
It does two things:
- it adds delays in user space, to figure out the maximum number of
operations possible when user space does some work as well.
- it uses interleaving, to force the threads onto different cores/sockets.

Perhaps something in 3.10-rc3 breaks the scalability?

--
Manfred


Attachments:
psem.cpp (6.96 kB)
semscale.cpp (7.58 kB)
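The ping-pong pattern described above presumably boils down to something
like the sketch below (illustrative only; the actual implementation is in
the attached psem.cpp). Each thread sleeps on its own semaphore and then
posts its partner's, so exactly one thread of a pair is runnable at a time
and every iteration forces a real sleep and wakeup:

#include <stdio.h>
#include <pthread.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#define LOOPS 100000

static int semid;

static void *pong(void *arg)
{
	int me = (int)(long)arg;          /* my semaphore: 0 or 1 */
	struct sembuf wait_op = { .sem_num = me,     .sem_op = -1 };
	struct sembuf wake_op = { .sem_num = 1 - me, .sem_op = +1 };
	int i;

	for (i = 0; i < LOOPS; i++) {
		semop(semid, &wait_op, 1);   /* sleep until partner posts */
		semop(semid, &wake_op, 1);   /* wake the partner */
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	struct sembuf start = { .sem_num = 0, .sem_op = +1 };

	semid = semget(IPC_PRIVATE, 2, 0600);  /* semvals start at 0 */
	pthread_create(&t1, NULL, pong, (void *)0L);
	pthread_create(&t2, NULL, pong, (void *)1L);
	semop(semid, &start, 1);          /* kick off the ping-pong */
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	semctl(semid, 0, IPC_RMID);
	printf("%d ping-pong iterations done\n", LOOPS);
	return 0;
}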

2013-06-01 10:03:31

by Mike Galbraith

Subject: Re: [PATCH 2/4] ipc/sem: separate wait-for-zero and alter tasks into separate queues

On Sat, 2013-06-01 at 11:20 +0200, Manfred Spraul wrote:
> Hi Rik,
>
> On 05/27/2013 07:57 PM, Rik van Riel wrote:
> > On 05/26/2013 05:08 AM, Manfred Spraul wrote:
> >> Introduce separate queues for operations that do not modify the
> >> semaphore values.
> >> Advantages:
> >> - Simpler logic in check_restart().
> >> - Faster update_queue(): Right now, all wait-for-zero operations
> >> are always tested, even if the semaphore value is not 0.
> >> - wait-for-zero again gets priority, as in linux <=3.9
> >
> > Whether this complexity is wanted is not for
> > me to decide, as I am not the ipc/sem.c
> > maintainer. I'll leave that up to Andrew and Linus.
> >
> We can have only one: Either more logic or unoptimized loops.
> But I don't think that the complexity increases that much, e.g. some
> parts (e.g. check_restart()) get much simpler.
>
> But:
> Mike Galbraith ran 3.10-rc3 with and without my changes on a 4-socket
> 64-core system, and for me the results appears to be quite slow:
> - semop-multi 256 64: around 600.000 ops/sec, both with and without my
> additional patches [difference around 1%]
> That is slower than my 1.4 GHz i3 with 3.9 - I get around 1.000.000
> ops/sec
> Is that expected?
> My only idea would be trashing from writing sma->sem_otime.
>
> - osim [i.e.: with reschedules] is much slower: around 21 us per schedule.
> Perhaps the scheduler didn't pair the threads optimally: intra-cpu
> reschedules
> take around 2 us on my i3, inter-cpu reschedules around 16 us.

I got -rt backports working, and it's faster at semop-multi, but one
hell of a lot slower at osim. The "goto again" loop in sem_lock() is a
livelock in -rt; I had to do that differently.

-rtm is with our patches added, -rt is backport without.

vogelweide:/abuild/mike/:[0]# uname -r
3.8.13-rt9-rtm
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 33553800, ops/sec 1118460
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 33344598, ops/sec 1111486
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 33655348, ops/sec 1121844
vogelweide:/abuild/mike/:[0]#
vogelweide:/abuild/mike/:[130]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 12.296215 seconds for 1000192 loops
per loop execution time: 12.293 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 11.613663 seconds for 1000192 loops
per loop execution time: 11.611 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 13.755537 seconds for 1000192 loops
per loop execution time: 13.752 usec

vogelweide:/abuild/mike/:[0]# uname -r
3.8.13-rt9-rt
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 37343656, ops/sec 1244788
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 37226496, ops/sec 1240883
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 36730816, ops/sec 1224360
vogelweide:/abuild/mike/:[0]#
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 12.676632 seconds for 1000192 loops
per loop execution time: 12.674 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 14.166756 seconds for 1000192 loops
per loop execution time: 14.164 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 15.116200 seconds for 1000192 loops
per loop execution time: 15.113 usec

2013-06-01 10:05:28

by Mike Galbraith

Subject: Re: [PATCH 2/4] ipc/sem: separate wait-for-zero and alter tasks into separate queues

On Sat, 2013-06-01 at 12:03 +0200, Mike Galbraith wrote:

> -rtm is with our patches added, -rt is backport without.

'y' key in 'your' not fully depressed.

2013-06-01 11:04:33

by Mike Galbraith

Subject: Re: [PATCH 2/4] ipc/sem: separate wait-for-zero and alter tasks into separate queues

On Sat, 2013-06-01 at 12:03 +0200, Mike Galbraith wrote:

> -rtm is with our patches added, -rt is backport without.

Now the same tree, but with PREEMPT_FULL turned off, so the kernel is
plain old PREEMPT. Again, -rt is without your patches, -rtm is with
them, followed by backing out all ipc backports to end up at the
pre-scalability-series state, to show what the set gains on this box.

vogelweide:/abuild/mike/:[0]# uname -r
3.8.13-rt9-rt
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 19102372, ops/sec 636745
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 19877668, ops/sec 662588
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 19610632, ops/sec 653687
vogelweide:/abuild/mike/:[0]#
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 1.893317 seconds for 1000192 loops
per loop execution time: 1.892 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 1.976055 seconds for 1000192 loops
per loop execution time: 1.975 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 1.651162 seconds for 1000192 loops
per loop execution time: 1.650 usec


vogelweide:/abuild/mike/:[0]# uname -r
3.8.13-rt9-rtm
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 22053968, ops/sec 735132
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 21295310, ops/sec 709843
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 21458002, ops/sec 715266
vogelweide:/abuild/mike/:[0]#
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 1.858765 seconds for 1000192 loops
per loop execution time: 1.858 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 1.806943 seconds for 1000192 loops
per loop execution time: 1.806 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 1.854004 seconds for 1000192 loops
per loop execution time: 1.853 usec

All ipc patches backed out.

vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 8885026, ops/sec 296167
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 8791676, ops/sec 293055
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 8301092, ops/sec 276703
vogelweide:/abuild/mike/:[0]#
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 10.750488 seconds for 1000192 loops
per loop execution time: 10.748 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 10.737264 seconds for 1000192 loops
per loop execution time: 10.735 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 10.756395 seconds for 1000192 loops
per loop execution time: 10.754 usec


2013-06-01 11:52:13

by Manfred Spraul

Subject: Re: [PATCH 2/4] ipc/sem: separate wait-for-zero and alter tasks into separate queues

Hi all,

On 06/01/2013 11:20 AM, Manfred Spraul wrote:
>
> - osim [i.e.: with reschedules] is much slower: around 21 us per schedule.
> Perhaps the scheduler didn't pair the threads optimally: intra-cpu
> reschedules take around 2 us on my i3, inter-cpu reschedules around 16 us.
>
I mixed up the numbers:
osim reports around 1.6 us for the 64-thread system.
It is still odd that it is only a factor of 1.5 faster than my 4-thread
i3, but at least it is not slower.

Anyway: I have attached the table of all results that I have so far.

--
Manfred


Attachments:
Eval-ipc.ods (22.11 kB)