2008-03-21 09:42:00

by Manfred Spraul

Subject: Scalability requirements for sysv ipc (was: ipc: store ipcs into IDRs)

Hi all,

I noticed that sysv ipc now uses very special locking: first a global
rw-semaphore, then within that semaphore rcu:
> linux-2.6.25-rc3:/ipc/util.c:
> struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
> {
> struct kern_ipc_perm *out;
> int lid = ipcid_to_idx(id);
>
> down_read(&ids->rw_mutex);
>
> rcu_read_lock();
> out = idr_find(&ids->ipcs_idr, lid);
ids->rw_mutex is a per-namespace (i.e. usually global) semaphore, so
ipc_lock writes into a global cacheline. Everything else is based on
per-object locking; in particular, sysv sem doesn't contain a single
global lock or statistic counter.
That can't be the Right Thing (tm): either there are cases where we need
the scalability (then using IDRs is impossible), or the scalability is
never needed (then the remaining RCU parts should be removed).
I don't have a suitable test setup; has anyone performed benchmarks
recently?
Is sysv semaphore still important, or have all apps moved to posix
semaphores/futexes?
Nadia: Do you have access to a suitable benchmark?
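
As a rough user-space illustration of the cacheline point (a sketch only -
pthread primitives are not the kernel's rwsem, and this is not the kernel
code path - but pthread_rwlock_rdlock() on a shared lock has to write shared
state on every acquisition, just as down_read() does):

/*
 * ipclock-demo.cpp: each thread repeatedly "locks" its own object.
 * Mode 1 mimics the ipc_lock() pattern quoted above (shared rwlock taken
 * for reading, then a per-object lock); mode 0 takes only the per-object
 * lock. Build: g++ -O2 -pthread ipclock-demo.cpp -o ipclock-demo
 * Run:   ./ipclock-demo <threads> <1|0>
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define LOOPS 2000000

static pthread_rwlock_t ids_rw_mutex = PTHREAD_RWLOCK_INITIALIZER;
static int use_global_rwsem;

struct object {
	pthread_t thr;
	pthread_mutex_t lock;	/* the per-object lock */
	char pad[128];		/* keep the objects on separate cachelines */
};

static void *worker(void *arg)
{
	struct object *obj = (struct object *)arg;

	for (int i = 0; i < LOOPS; i++) {
		if (use_global_rwsem)
			pthread_rwlock_rdlock(&ids_rw_mutex);	/* writes shared state */
		pthread_mutex_lock(&obj->lock);
		/* per-object critical section would be here */
		pthread_mutex_unlock(&obj->lock);
		if (use_global_rwsem)
			pthread_rwlock_unlock(&ids_rw_mutex);
	}
	return NULL;
}

int main(int argc, char **argv)
{
	int n = (argc > 1) ? atoi(argv[1]) : 4;
	use_global_rwsem = (argc > 2) ? atoi(argv[2]) : 1;

	struct object *objs = new object[n];
	for (int i = 0; i < n; i++) {
		pthread_mutex_init(&objs[i].lock, NULL);
		pthread_create(&objs[i].thr, NULL, worker, &objs[i]);
	}
	for (int i = 0; i < n; i++)
		pthread_join(objs[i].thr, NULL);
	printf("%d threads, %s\n", n,
	       use_global_rwsem ? "shared rwlock + per-object lock" : "per-object lock only");
	return 0;
}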

A microbenchmark on a single-cpu system doesn't help much (except that
2.6.25 is around a factor of 2 slower for sysv msg ping-pong between two
tasks compared to the numbers I remember from older kernels...).

--
Manfred


2008-03-21 12:46:18

by Nadia Derbey

Subject: Re: Scalability requirements for sysv ipc (was: ipc: store ipcs into IDRs)

Manfred Spraul wrote:
> Hi all,
>
> I noticed that sysv ipc now uses very special locking: first a global
> rw-semaphore, then within that semaphore rcu:
> > linux-2.6.25-rc3:/ipc/util.c:
>
>> struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
>> {
>> struct kern_ipc_perm *out;
>> int lid = ipcid_to_idx(id);
>>
>> down_read(&ids->rw_mutex);
>>
>> rcu_read_lock();
>> out = idr_find(&ids->ipcs_idr, lid);
>
> ids->rw_mutex is a per-namespace (i.e.: usually global) semaphore. Thus
> ipc_lock writes into a global cacheline. Everything else is based on
> per-object locking, especially sysv sem doesn't contain a single global
> lock/statistic counter/...
> That can't be the Right Thing (tm): Either there are cases where we need
> the scalability (then using IDRs is impossible), or the scalability is
> never needed (then the remaining parts from RCU should be removed).
> I don't have a suitable test setup, has anyone performed benchmarks
> recently?
> Is sysv semaphore still important, or have all apps moved to posix
> semaphores/futexes?
> Nadia: Do you have access to a suitable benchmark?
>
> A microbenchmark on a single-cpu system doesn't help much (except that
> 2.6.25 is around factor 2 slower for sysv msg ping-pong between two
> tasks compared to the numbers I remember from older kernels....)
>

If I remember correctly, at that time I used ctxbench and wrote some
other small scripts.
The results I had showed around a 2 or 3% slowdown, but I have to
confirm that by checking my archives.

I'll also have a look at the remaining RCU critical sections in the code.

Regards,
Nadia

2008-03-21 13:33:39

by Manfred Spraul

Subject: Re: Scalability requirements for sysv ipc

Nadia Derbey wrote:
> Manfred Spraul wrote:
>>
>> A microbenchmark on a single-cpu system doesn't help much (except
>> that 2.6.25 is around factor 2 slower for sysv msg ping-pong between
>> two tasks compared to the numbers I remember from older kernels....)
>>
>
> If I remember well, at that time I had used ctxbench and I wrote some
> other small scripts.
> And the results I had were around 2 or 3% slowdown, but I have to
> confirm that by checking in my archives.
>
Do you have access to multi-core systems? The "best case" for the rcu
code would be
- 8 or 16 cores
- one instance of ctxbench running on each core, bound to that core.
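
Concretely, assuming ctxbench builds its ./ctx binary and its sysv
semaphore mode (-s) is used, something like:

  for i in 0 1 2 3 4 5 6 7; do taskset -c $i ./ctx -s & done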

I'd expect a significant slowdown. The big question is if it matters.

--
Manfred

2008-03-21 14:13:55

by Paul E. McKenney

Subject: Re: Scalability requirements for sysv ipc

On Fri, Mar 21, 2008 at 02:33:24PM +0100, Manfred Spraul wrote:
> Nadia Derbey wrote:
> >Manfred Spraul wrote:
> >>
> >>A microbenchmark on a single-cpu system doesn't help much (except
> >>that 2.6.25 is around factor 2 slower for sysv msg ping-pong between
> >>two tasks compared to the numbers I remember from older kernels....)
> >
> >If I remember well, at that time I had used ctxbench and I wrote some
> >other small scripts.
> >And the results I had were around 2 or 3% slowdown, but I have to
> >confirm that by checking in my archives.
> >
> Do you have access to multi-core systems? The "best case" for the rcu
> code would be
> - 8 or 16 cores
> - one instance of ctxbench running on each core, bound to that core.
>
> I'd expect a significant slowdown. The big question is if it matters.

I could give it a spin -- though I would need to be pointed to the
patch and the test.

Thanx, Paul

2008-03-21 16:09:00

by Manfred Spraul

Subject: Re: Scalability requirements for sysv ipc

Paul E. McKenney wrote:
> I could give it a spin -- though I would need to be pointed to the
> patch and the test.
>
>
I'd just compare a recent kernel with something older, pre Fri Oct 19
11:53:44 2007

Then download ctxbench, run one instance on each core, bound with taskset.
http://www.tmr.com/%7Epublic/source/
(I don't use ctxbench myself; if it doesn't work then I could post my
own app. It would be i386 only, with RDTSCs inside.)
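
For reference, such an app would presumably be built around the usual
i386 TSC read - a sketch, not the actual code:

static inline unsigned long long rdtsc(void)
{
	unsigned int lo, hi;

	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((unsigned long long)hi << 32) | lo;
}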

I'll try to run it on my PentiumIII/850, right now I'm still setting
everything up.

--
Manfred

2008-03-22 05:47:56

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc


On Fri, 2008-03-21 at 17:08 +0100, Manfred Spraul wrote:
> Paul E. McKenney wrote:
> > I could give it a spin -- though I would need to be pointed to the
> > patch and the test.
> >
> >
> I'd just compare a recent kernel with something older, pre Fri Oct 19
> 11:53:44 2007
>
> Then download ctxbench, run one instance on each core, bound with taskset.
> http://www.tmr.com/%7Epublic/source/
> (I don't juse ctxbench myself, if it doesn't work then I could post my
> own app. It would be i386 only with RDTSCs inside)

(test gizmos are always welcome)

Results for Q6600 box don't look particularly wonderful.

taskset -c 3 ./ctx -s

2.6.24.3
3766962 itterations in 9.999845 seconds = 376734/sec

2.6.22.18-cfs-v24.1
4375920 itterations in 10.006199 seconds = 437330/sec

for i in 0 1 2 3; do taskset -c $i ./ctx -s& done

2.6.22.18-cfs-v24.1
4355784 itterations in 10.005670 seconds = 435361/sec
4396033 itterations in 10.005686 seconds = 439384/sec
4390027 itterations in 10.006511 seconds = 438739/sec
4383906 itterations in 10.006834 seconds = 438128/sec

2.6.24.3
1269937 itterations in 9.999757 seconds = 127006/sec
1266723 itterations in 9.999663 seconds = 126685/sec
1267293 itterations in 9.999348 seconds = 126742/sec
1265793 itterations in 9.999766 seconds = 126592/sec

-Mike

2008-03-22 10:10:28

by Manfred Spraul

Subject: Re: Scalability requirements for sysv ipc

/*
* psem.cpp, parallel sysv sem pingpong
*
* Copyright (C) 1999, 2001, 2005, 2008 by Manfred Spraul.
* All rights reserved except the rights granted by the GPL.
*
* Redistribution of this file is permitted under the terms of the GNU
* General Public License (GPL) version 2 or later.
* $Header$
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <pthread.h>

//////////////////////////////////////////////////////////////////////////////

static enum {
WAITING,
RUNNING,
STOPPED,
} volatile g_state = WAITING;

unsigned long long *g_results;
int *g_svsem_ids;
pthread_t *g_threads;

struct taskinfo {
int svsem_id;
int threadid;
int sender;
};

#define DATASIZE 8

void* worker_thread(void *arg)
{
struct taskinfo *ti = (struct taskinfo*)arg;
unsigned long long rounds;
int ret;

{
cpu_set_t cpus;
CPU_ZERO(&cpus);
CPU_SET(ti->threadid/2, &cpus); /* note: this binding is buggy - partner threads can end up on different cpus; fixed in the later version */
printf("ti: %d %lxh\n", ti->threadid/2, cpus.__bits[0]);

ret = pthread_setaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus);
if (ret < 0) {
printf("pthread_setaffinity_np failed for thread %d with errno %d.\n",
ti->threadid, errno);
}

ret = pthread_getaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus);
if (ret < 0) {
printf("pthread_getaffinity_np() failed for thread %d with errno %d.\n",
ti->threadid, errno);
fflush(stdout);
} else {
printf("thread %d: sysvsem %8d type %d bound to %lxh\n",ti->threadid,
ti->svsem_id, ti->sender, cpus.__bits[0]);
}
fflush(stdout);
}

rounds = 0;
while(g_state == WAITING) {
#ifdef __i386__
__asm__ __volatile__("pause": : :"memory");
#endif
}

if (ti->sender) {
struct sembuf sop[1];
int res;

/* 1) insert token */
sop[0].sem_num=0;
sop[0].sem_op=1;
sop[0].sem_flg=0;
res = semop(ti->svsem_id,sop,1);

if (res != 0) {
printf("Initial semop failed, errno %d.\n", errno);
exit(1);
}
}
while(g_state == RUNNING) {
struct sembuf sop[1];
int res;

/* 1) retrieve token */
sop[0].sem_num=ti->sender;
sop[0].sem_op=-1;
sop[0].sem_flg=0;
res = semop(ti->svsem_id,sop,1);
if (res != 0) {
/* EIDRM can happen */
if (errno == EIDRM)
break;
printf("main semop failed, errno %d.\n", errno);
exit(1);
}

/* 2) reinsert token */
sop[0].sem_num=1-ti->sender;
sop[0].sem_op=1;
sop[0].sem_flg=0;
res = semop(ti->svsem_id,sop,1);
if (res != 0) {
/* EIDRM can happen */
if (errno == EIDRM)
break;
printf("main semop failed, errno %d.\n", errno);
exit(1);
}


rounds++;
}
g_results[ti->threadid] = rounds;

pthread_exit(0);
return NULL;
}

void init_thread(int thread1, int thread2)
{
int ret;
struct taskinfo *ti1, *ti2;

ti1 = new (struct taskinfo);
ti2 = new (struct taskinfo);
if (!ti1 || !ti2) {
printf("Could not allocate task info\n");
exit(1);
}
g_svsem_ids[thread1] = semget(IPC_PRIVATE,2,0777|IPC_CREAT);
if(g_svsem_ids[thread1] == -1) {
printf(" message queue create failed.\n");
exit(1);
}
ti1->svsem_id = g_svsem_ids[thread1];
ti2->svsem_id = ti1->svsem_id;
ti1->threadid = thread1;
ti2->threadid = thread2;
ti1->sender = 1;
ti2->sender = 0;

ret = pthread_create(&g_threads[thread1], NULL, worker_thread, ti1);
if (ret) {
printf(" pthread_create failed with error code %d\n", ret);
exit(1);
}
ret = pthread_create(&g_threads[thread2], NULL, worker_thread, ti2);
if (ret) {
printf(" pthread_create failed with error code %d\n", ret);
exit(1);
}
}

//////////////////////////////////////////////////////////////////////////////

int main(int argc, char **argv)
{
int queues, timeout;
unsigned long long totals;
int i;

printf("psem [nr queues] [timeout]\n");
if (argc != 3) {
printf(" Invalid parameters.\n");
return 0;
}
queues = atoi(argv[1]);
timeout = atoi(argv[2]);
printf("Using %d queues (%d threads) for %d seconds.\n",
queues, 2*queues, timeout);

g_results = new unsigned long long[2*queues];
g_svsem_ids = new int[queues];
g_threads = new pthread_t[2*queues];
for (i=0;i<queues;i++) {
g_results[i] = 0;
g_results[i+queues] = 0;
init_thread(i, i+queues);
}

sleep(1);
g_state = RUNNING;
sleep(timeout);
g_state = STOPPED;
sleep(1);
for (i=0;i<queues;i++) {
int res;
res = semctl(g_svsem_ids[i],1,IPC_RMID,NULL);
if (res < 0) {
printf("semctl(IPC_RMID) failed for %d, errno%d.\n",
g_svsem_ids[i], errno);
}
}
for (i=0;i<2*queues;i++)
pthread_join(g_threads[i], NULL);

printf("Result matrix:\n");
totals = 0;
for (i=0;i<queues;i++) {
printf(" Thread %3d: %8lld %3d: %8lld\n",
i, g_results[i], i+queues, g_results[i+queues]);
totals += g_results[i] + g_results[i+queues];
}
printf("Total: %lld\n", totals);
}
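
(Presumably built with something like "g++ -O2 -pthread psem.cpp -o psem"
and "g++ -O2 -pthread pmsg.cpp -o pmsg", then run as "./psem <nr queues>
<seconds>" - see the command line quoted in the next message.)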


Attachments:
pmsg.cpp (4.56 kB)
psem.cpp (4.73 kB)

2008-03-22 11:54:07

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc


On Sat, 2008-03-22 at 11:10 +0100, Manfred Spraul wrote:

> Attached are my own testapps: one for sysv msg, one for sysv sem.
> Could you run them? Taskset is done internally, just execute
>
> $ for i in 1 2 3 4;do ./psem $i 5;./pmsg $i 5;done

2.6.22.18-cfs-v24-smp 2.6.24.3-smp
Result matrix: (psem)
Thread 0: 2394885 1: 2394885 Thread 0: 2004534 1: 2004535
Total: 4789770 Total: 4009069
Result matrix: (pmsg)
Thread 0: 2345913 1: 2345914 Thread 0: 1971000 1: 1971000
Total: 4691827 Total: 3942000


Result matrix: (psem)
Thread 0: 1613610 2: 1613611 Thread 0: 477112 2: 477111
Thread 1: 1613590 3: 1613590 Thread 1: 485607 3: 485607
Total: 6454401 Total: 1925437
Result matrix: (pmsg)
Thread 0: 1409956 2: 1409956 Thread 0: 519398 2: 519398
Thread 1: 1409776 3: 1409776 Thread 1: 519169 3: 519170
Total: 5639464 Total: 2077135


Result matrix: (psem)
Thread 0: 516309 3: 516309 Thread 0: 401157 3: 401157
Thread 1: 318546 4: 318546 Thread 1: 408252 4: 408252
Thread 2: 352940 5: 352940 Thread 2: 703600 5: 703600
Total: 2375590 Total: 3026018
Result matrix: (pmsg)
Thread 0: 478356 3: 478356 Thread 0: 344738 3: 344739
Thread 1: 241655 4: 241655 Thread 1: 343614 4: 343615
Thread 2: 252444 5: 252445 Thread 2: 589298 5: 589299
Total: 1944911 Total: 2555303


Result matrix: (psem)
Thread 0: 443392 4: 443392 Thread 0: 398491 4: 398491
Thread 1: 443338 5: 443339 Thread 1: 398473 5: 398473
Thread 2: 444069 6: 444070 Thread 2: 394647 6: 394648
Thread 3: 444078 7: 444078 Thread 3: 394784 7: 394785
Total: 3549756 Total: 3172792
Result matrix: (pmsg)
Thread 0: 354973 4: 354973 Thread 0: 331307 4: 331307
Thread 1: 354966 5: 354966 Thread 1: 331220 5: 331221
Thread 2: 358035 6: 358035 Thread 2: 322852 6: 322852
Thread 3: 357877 7: 357877 Thread 3: 322899 7: 322899
Total: 2851702 Total: 2616557

2008-03-22 14:23:00

by Manfred Spraul

Subject: Re: Scalability requirements for sysv ipc

/*
* psem.cpp, parallel sysv sem pingpong
*
* Copyright (C) 1999, 2001, 2005, 2008 by Manfred Spraul.
* All rights reserved except the rights granted by the GPL.
*
* Redistribution of this file is permitted under the terms of the GNU
* General Public License (GPL) version 2 or later.
* $Header$
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <pthread.h>

//////////////////////////////////////////////////////////////////////////////

static enum {
WAITING,
RUNNING,
STOPPED,
} volatile g_state = WAITING;

unsigned long long *g_results;
int *g_svsem_ids;
pthread_t *g_threads;

struct taskinfo {
int svsem_id;
int threadid;
int cpuid;
int sender;
};

#define DATASIZE 8

void* worker_thread(void *arg)
{
struct taskinfo *ti = (struct taskinfo*)arg;
unsigned long long rounds;
int ret;

{
cpu_set_t cpus;
CPU_ZERO(&cpus);
CPU_SET(ti->cpuid, &cpus);

ret = pthread_setaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus);
if (ret < 0) {
printf("pthread_setaffinity_np failed for thread %d with errno %d.\n",
ti->threadid, errno);
}

ret = pthread_getaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus);
if (ret < 0) {
printf("pthread_getaffinity_np() failed for thread %d with errno %d.\n",
ti->threadid, errno);
fflush(stdout);
} else {
printf("thread %d: sysvsem %8d type %d bound to %04lxh\n",ti->threadid,
ti->svsem_id, ti->sender, cpus.__bits[0]);
}
fflush(stdout);
}

rounds = 0;
while(g_state == WAITING) {
#ifdef __i386__
__asm__ __volatile__("pause": : :"memory");
#endif
}

if (ti->sender) {
struct sembuf sop[1];
int res;

/* 1) insert token */
sop[0].sem_num=0;
sop[0].sem_op=1;
sop[0].sem_flg=0;
res = semop(ti->svsem_id,sop,1);

if (res != 0) {
printf("Initial semop failed, errno %d.\n", errno);
exit(1);
}
}
while(g_state == RUNNING) {
struct sembuf sop[1];
int res;

/* 1) retrieve token */
sop[0].sem_num=ti->sender;
sop[0].sem_op=-1;
sop[0].sem_flg=0;
res = semop(ti->svsem_id,sop,1);
if (res != 0) {
/* EIDRM can happen */
if (errno == EIDRM)
break;
printf("main semop failed, errno %d.\n", errno);
exit(1);
}

/* 2) reinsert token */
sop[0].sem_num=1-ti->sender;
sop[0].sem_op=1;
sop[0].sem_flg=0;
res = semop(ti->svsem_id,sop,1);
if (res != 0) {
/* EIDRM can happen */
if (errno == EIDRM)
break;
printf("main semop failed, errno %d.\n", errno);
exit(1);
}


rounds++;
}
g_results[ti->threadid] = rounds;

pthread_exit(0);
return NULL;
}

void init_threads(int cpu, int cpus)
{
int ret;
struct taskinfo *ti1, *ti2;

ti1 = new (struct taskinfo);
ti2 = new (struct taskinfo);
if (!ti1 || !ti2) {
printf("Could not allocate task info\n");
exit(1);
}
g_svsem_ids[cpu] = semget(IPC_PRIVATE,2,0777|IPC_CREAT);
if(g_svsem_ids[cpu] == -1) {
printf("sem array create failed.\n");
exit(1);
}

g_results[cpu] = 0;
g_results[cpu+cpus] = 0;

ti1->svsem_id = g_svsem_ids[cpu];
ti1->threadid = cpu;
ti1->cpuid = cpu;
ti1->sender = 1;
ti2->svsem_id = g_svsem_ids[cpu];
ti2->threadid = cpu+cpus;
ti2->cpuid = cpu;
ti2->sender = 0;

ret = pthread_create(&g_threads[ti1->threadid], NULL, worker_thread, ti1);
if (ret) {
printf(" pthread_create failed with error code %d\n", ret);
exit(1);
}
ret = pthread_create(&g_threads[ti2->threadid], NULL, worker_thread, ti2);
if (ret) {
printf(" pthread_create failed with error code %d\n", ret);
exit(1);
}
}

//////////////////////////////////////////////////////////////////////////////

int main(int argc, char **argv)
{
int queues, timeout;
unsigned long long totals;
int i;

printf("psem [nr queues] [timeout]\n");
if (argc != 3) {
printf(" Invalid parameters.\n");
return 0;
}
queues = atoi(argv[1]);
timeout = atoi(argv[2]);
printf("Using %d queues/cpus (%d threads) for %d seconds.\n",
queues, 2*queues, timeout);

g_results = new unsigned long long[2*queues];
g_svsem_ids = new int[queues];
g_threads = new pthread_t[2*queues];
for (i=0;i<queues;i++) {
init_threads(i, queues);
}

sleep(1);
g_state = RUNNING;
sleep(timeout);
g_state = STOPPED;
sleep(1);
for (i=0;i<queues;i++) {
int res;
res = semctl(g_svsem_ids[i],1,IPC_RMID,NULL);
if (res < 0) {
printf("semctl(IPC_RMID) failed for %d, errno%d.\n",
g_svsem_ids[i], errno);
}
}
for (i=0;i<2*queues;i++)
pthread_join(g_threads[i], NULL);

printf("Result matrix:\n");
totals = 0;
for (i=0;i<queues;i++) {
printf(" Thread %3d: %8lld %3d: %8lld\n",
i, g_results[i], i+queues, g_results[i+queues]);
totals += g_results[i] + g_results[i+queues];
}
printf("Total: %lld\n", totals);
}


Attachments:
pmsg.cpp (4.54 kB)
psem.cpp (4.71 kB)

2008-03-22 19:09:19

by Manfred Spraul

Subject: Re: Scalability requirements for sysv ipc

Hi all,

I've revived my Dual-CPU Pentium III/850:
I couldn't see a scalability problem (two cpus are around 190%), but
the normal performance of 2.6.25-rc3 is abysmal, 55 to 60% slower
than 2.6.18.8:

psem                  2.6.18     2.6.25    Diff [%]
1 cpu                948,005    398,435      -57.97
2 cpus             1,768,273    734,816      -58.44
Scalability [%]       193.26     192.21

pmsg                  2.6.18     2.6.25    Diff [%]
1 cpu                821,582    356,904      -56.56
2 cpus             1,488,058    661,754      -55.53
Scalability [%]       190.56     192.71

Attached are the .config files and the individual results.
Did I accidentally enable a scheduler debug option?

--
Manfred


Attachments:
bench.tar.gz (37.21 kB)

2008-03-22 19:35:22

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc


On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote:
> Mike Galbraith wrote:
> > Total: 4691827 Total: 3942000
> >
> Thanks. Unfortunately the test was buggy, it bound the tasks to the
> wrong cpu :-(
> Could you run it again? Actually 1 cpu and 4 cpus are probably enough.

Sure. (ran as before, hopefully no transcription errors)

2.6.22.18-cfs-v24-smp 2.6.24.3-smp
Result matrix: (psem)
Thread 0: 2395778 1: 2395779 Thread 0: 2054990 1: 2054992
Total: 4791557 Total: 4009069
Result matrix: (pmsg)
Thread 0: 2317014 1: 2317015 Thread 0: 1959099 1: 1959099
Total: 4634029 Total: 3918198


Result matrix: (psem)
Thread 0: 2340716 2: 2340716 Thread 0: 1890292 2: 1890293
Thread 1: 2361052 3: 2361052 Thread 1: 1899031 3: 1899032
Total: 9403536 Total: 7578648
Result matrix: (pmsg)
Thread 0: 1429567 2: 1429567 Thread 0: 1295071 2: 1295071
Thread 1: 1429267 3: 1429268 Thread 1: 1289253 3: 1289254
Total: 5717669 Total: 5168649


Result matrix: (psem)
Thread 0: 2263039 3: 2263039 Thread 0: 1351208 3: 1351209
Thread 1: 2265120 4: 2265121 Thread 1: 1351300 4: 1351300
Thread 2: 2263642 5: 2263642 Thread 2: 1319512 5: 1319512
Total: 13583603 Total: 8044041
Result matrix: (pmsg)
Thread 0: 483934 3: 483934 Thread 0: 514766 3: 514767
Thread 1: 239714 4: 239715 Thread 1: 252764 4: 252765
Thread 2: 270216 5: 270216 Thread 2: 253216 5: 253217
Total: 1987729 Total: 2041495


Result matrix: (psem)
Thread 0: 2260038 4: 2260039 Thread 0: 642235 4: 642236
Thread 1: 2262748 5: 2262749 Thread 1: 642742 5: 642743
Thread 2: 2271236 6: 2271237 Thread 2: 640281 6: 640282
Thread 3: 2257651 7: 2257652 Thread 3: 641931 7: 641931
Total: 18103350 Total: 5134381
Result matrix: (pmsg)
Thread 0: 382811 4: 382811 Thread 0: 342297 4: 342297
Thread 1: 382801 5: 382802 Thread 1: 342309 5: 342310
Thread 2: 376620 6: 376621 Thread 2: 343857 6: 343857
Thread 3: 376559 7: 376559 Thread 3: 343836 7: 343836
Total: 3037584 Total: 2744599

2008-03-23 06:38:53

by Manfred Spraul

Subject: Re: Scalability requirements for sysv ipc

Mike Galbraith wrote:
> On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote:
>
>> Mike Galbraith wrote:
>>
>>> Total: 4691827 Total: 3942000
>>>
>>>
>> Thanks. Unfortunately the test was buggy, it bound the tasks to the
>> wrong cpu :-(
>> Could you run it again? Actually 1 cpu and 4 cpus are probably enough.
>>
>
> Sure. (ran as before, hopefully no transcription errors)
>
>
Thanks:
sysv sem:
- 2.6.22 had almost linear scaling (up to 4 cores).
- 2.6.24.3 scales to 2 cpus, then it collapses. With 4 cores, it's 75%
slower than 2.6.22.

sysv msg:
- neither 2.6.22 nor 2.6.24 scales very well. That's more or less
expected; the message queue code contains a few global statistic
counters (msg_hdrs, msg_bytes) - see the sketch below.

The cleanup of sysv is nice, but IMHO sysv sem should remain scalable -
and a global semaphore with IDR can't be as scalable as the RCU-protected
array that was used before.
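
As a rough illustration of the counter point (a user-space sketch, not the
sysv msg code; msg_hdrs and msg_bytes are the real counters, the code below
only mimics the access pattern with a gcc atomic builtin):

/*
 * counters.cpp: global vs. per-thread counters.
 * Build: g++ -O2 -pthread counters.cpp -o counters
 * Run:   ./counters <threads> <1=global|0=per-thread>
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define LOOPS 10000000

static long global_counter;
static int use_global;

struct task {
	pthread_t thr;
	long local_counter;
	char pad[128];		/* keep the per-thread counters on separate cachelines */
};

static void *worker(void *arg)
{
	struct task *t = (struct task *)arg;

	for (int i = 0; i < LOOPS; i++) {
		if (use_global)
			__sync_fetch_and_add(&global_counter, 1);	/* every cpu writes one cacheline */
		else
			t->local_counter++;				/* stays in the local cache */
	}
	return NULL;
}

int main(int argc, char **argv)
{
	int n = (argc > 1) ? atoi(argv[1]) : 4;
	use_global = (argc > 2) ? atoi(argv[2]) : 1;

	struct task *tasks = new task[n]();
	for (int i = 0; i < n; i++)
		pthread_create(&tasks[i].thr, NULL, worker, &tasks[i]);
	for (int i = 0; i < n; i++)
		pthread_join(tasks[i].thr, NULL);

	long total = global_counter;
	for (int i = 0; i < n; i++)
		total += tasks[i].local_counter;
	printf("%d threads, %s counter, total %ld\n",
	       n, use_global ? "global" : "per-thread", total);
	return 0;
}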

--
Manfred

2008-03-23 07:10:49

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc


On Sat, 2008-03-22 at 20:35 +0100, Mike Galbraith wrote:
> On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote:
> > Mike Galbraith wrote:
> > > Total: 4691827 Total: 3942000
> > >
> > Thanks. Unfortunately the test was buggy, it bound the tasks to the
> > wrong cpu :-(
> > Could you run it again? Actually 1 cpu and 4 cpus are probably enough.
>
> Sure. (ran as before, hopefully no transcription errors)

Looking at the output over morning java, I noticed that pmsg didn't get
recompiled due to a fat finger, so those numbers are bogus. Corrected
condensed version of output is below, charted data attached.

(hope evolution doesn't turn this into something other than plain text)



queues:                     1          2          3          4
2.6.22.18-cfs-v24.1 psem    4791557    9403536    13583603   18103350
2.6.22.18-cfs-v24.1 pmsg    4906249    9171440    13264752   17774106
2.6.24.3 psem               4009069    7578648    8044041    5134381
2.6.24.3 pmsg               3917588    7290206    7644794    4824967


Attachments:
xxxx.pdf (15.86 kB)

2008-03-23 07:15:27

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc


On Sun, 2008-03-23 at 07:38 +0100, Manfred Spraul wrote:
> Mike Galbraith wrote:
> > On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote:
> >
> >> Mike Galbraith wrote:
> >>
> >>> Total: 4691827 Total: 3942000
> >>>
> >>>
> >> Thanks. Unfortunately the test was buggy, it bound the tasks to the
> >> wrong cpu :-(
> >> Could you run it again? Actually 1 cpu and 4 cpus are probably enough.
> >>
> >
> > Sure. (ran as before, hopefully no transcription errors)
> >
> >
> Thanks:
> sysv sem:
> - 2.6.22 had almost linear scaling (up to 4 cores).
> - 2.6.24.3 scales to 2 cpus, then it collapses. with 4 cores, it's 75%
> slower than 2.6.22.
>
> sysv msg:
> - neither 2.6.22 nor 2.6.24 scale very good. That's more or less
> expected, the message queue code contains a few global statistic
> counters (msg_hdrs, msg_bytes).

Actually, 2.6.22 is fine and 2.6.24.3 is not, just as with sysv sem. I
just noticed that pmsg didn't get recompiled last night (fat finger),
and sent a correction.

-Mike

2008-03-23 07:21:05

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc


On Sun, 2008-03-23 at 08:08 +0100, Mike Galbraith wrote:
> On Sat, 2008-03-22 at 20:35 +0100, Mike Galbraith wrote:
> > On Sat, 2008-03-22 at 15:22 +0100, Manfred Spraul wrote:
> > > Mike Galbraith wrote:
> > > > Total: 4691827 Total: 3942000
> > > >
> > > Thanks. Unfortunately the test was buggy, it bound the tasks to the
> > > wrong cpu :-(
> > > Could you run it again? Actually 1 cpu and 4 cpus are probably enough.
> >
> > Sure. (ran as before, hopefully no transcription errors)
>
> Looking at the output over morning java, I noticed that pmsg didn't get
> recompiled due to a fat finger, so those numbers are bogus. Corrected
> condensed version of output is below, charted data attached.
>
> (hope evolution doesn't turn this into something other than plain text)

Pff, I'd rather have had the bounce. Good thing I attached the damn
chart, evolution can't screw that up.

>
>
>
> queues:                     1          2          3          4
> 2.6.22.18-cfs-v24.1 psem    4791557    9403536    13583603   18103350
> 2.6.22.18-cfs-v24.1 pmsg    4906249    9171440    13264752   17774106
> 2.6.24.3 psem               4009069    7578648    8044041    5134381
> 2.6.24.3 pmsg               3917588    7290206    7644794    4824967
>

2008-03-25 15:50:35

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc


On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote:

> just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% slower
> than 2.6.18.8:

After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535,
2.6.25.git scaled linearly, but as you noted, markedly down from earlier
kernels with this benchmark. 2.6.24.4 with same revert, but all
2.6.25.git ipc changes piled on top still performed close to 2.6.22, so
I went looking. Bisection led me to..

8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit
commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
Author: Peter Zijlstra <[email protected]>
Date: Fri Jan 25 21:08:29 2008 +0100

sched: high-res preemption tick

Use HR-timers (when available) to deliver an accurate preemption tick.

The regular scheduler tick that runs at 1/HZ can be too coarse when nice
level are used. The fairness system will still keep the cpu utilisation 'fair'
by then delaying the task that got an excessive amount of CPU time but try to
minimize this by delivering preemption points spot-on.

The average frequency of this extra interrupt is sched_latency / nr_latency.
Which need not be higher than 1/HZ, its just that the distribution within the
sched_latency period is important.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

:040000 040000 ab225228500f7a19d5ad20ca12ca3fc8ff5f5ad1 f1742e1d225a72aecea9d6961ed989b5943d31d8 M arch
:040000 040000 25d85e4ef7a71b0cc76801a2526ebeb4dce180fe ae61510186b4fad708ef0211ac169decba16d4e5 M include
:040000 040000 9247cec7dd506c648ac027c17e5a07145aa41b26 950832cc1dc4d30923f593ecec883a06b45d62e9 M kernel

..and I verified it via :-/ echo 7 > sched_features in latest. That
only bought me roughly half though, so there's a part three in there
somewhere.

-Mike


Attachments:
xxxx.pdf (17.49 kB)

2008-03-25 16:01:19

by Nadia Derbey

Subject: Re: Scalability requirements for sysv ipc

Manfred Spraul wrote:
> Nadia Derbey wrote:
>
>> Manfred Spraul wrote:
>>
>>>
>>> A microbenchmark on a single-cpu system doesn't help much (except
>>> that 2.6.25 is around factor 2 slower for sysv msg ping-pong between
>>> two tasks compared to the numbers I remember from older kernels....)
>>>
>>
>> If I remember well, at that time I had used ctxbench and I wrote some
>> other small scripts.
>> And the results I had were around 2 or 3% slowdown, but I have to
>> confirm that by checking in my archives.
>>
> Do you have access to multi-core systems? The "best case" for the rcu
> code would be
> - 8 or 16 cores
> - one instance of ctxbench running on each core, bound to that core.
>
> I'd expect a significant slowdown. The big question is if it matters.
>
> --
> Manfred
>
>

Hi,

Here is what I could find on my side:

=============================================================

lkernel@akt$ cat tst3/res_new/output
[root@akt tests]# echo 32768 > /proc/sys/kernel/msgmni
[root@akt tests]# ./msgbench_std_dev_plot -n
32768000 msgget iterations in 21.469724 seconds = 1526294/sec

32768000 msgsnd iterations in 18.891328 seconds = 1734583/sec

32768000 msgctl(ipc_stat) iterations in 15.359802 seconds = 2133472/sec

32768000 msgctl(msg_stat) iterations in 15.296114 seconds = 2142260/sec

32768000 msgctl(ipc_rmid) iterations in 32.981277 seconds = 993542/sec

AVERAGE STD_DEV MIN MAX
GET: 21469.724000 566.024657 19880 23607
SEND: 18891.328000 515.542311 18433 21962
IPC_STAT: 15359.802000 274.918673 15147 17166
MSG_STAT: 15296.114000 155.775508 15138 16790
RM: 32981.277000 675.621060 32141 35433


lkernel@akt$ cat tst3/res_ref/output
[root@akt tests]# echo 32768 > /proc/sys/kernel/msgmni
[root@akt tests]# ./msgbench_std_dev_plot -r
32768000 msgget iterations in 665.842852 seconds = 49213/sec

32768000 msgsnd iterations in 18.363853 seconds = 1784458/sec

32768000 msgctl(ipc_stat) iterations in 14.609669 seconds = 2243001/sec

32768000 msgctl(msg_stat) iterations in 14.774829 seconds = 2217950/sec

32768000 msgctl(ipc_rmid) iterations in 31.134984 seconds = 1052483/sec

AVERAGE STD_DEV MIN MAX
GET: 665842.852000 946.697555 654049 672208
SEND: 18363.853000 107.514954 18295 19563
IPC_STAT: 14609.669000 43.100272 14529 14881
MSG_STAT: 14774.829000 97.174924 14516 15436
RM: 31134.984000 444.612055 30521 33523


==================================================================

Unfortunately, I haven't kept the exact kernel release numbers, but the
testing method was:
res_ref = unpatched kernel
res_new = same kernel release with my patches applied.

What I'll try to do is to re-run your tests (pmsg and psem) with this
method (from what I saw, the patches applied to a 2.6.23-rc4-mm1), but
I can't do it before Thursday.

Regards,
Nadia

2008-03-25 16:14:17

by Peter Zijlstra

Subject: Re: Scalability requirements for sysv ipc

On Tue, 2008-03-25 at 16:50 +0100, Mike Galbraith wrote:
> On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote:
>
> > just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% slower
> > than 2.6.18.8:
>
> After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535,
> 2.6.25.git scaled linearly, but as you noted, markedly down from earlier
> kernels with this benchmark. 2.6.24.4 with same revert, but all
> 2.6.25.git ipc changes piled on top still performed close to 2.6.22, so
> I went looking. Bisection led me to..
>
> 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit
> commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Jan 25 21:08:29 2008 +0100
>
> sched: high-res preemption tick
>
> Use HR-timers (when available) to deliver an accurate preemption tick.
>
> The regular scheduler tick that runs at 1/HZ can be too coarse when nice
> level are used. The fairness system will still keep the cpu utilisation 'fair'
> by then delaying the task that got an excessive amount of CPU time but try to
> minimize this by delivering preemption points spot-on.
>
> The average frequency of this extra interrupt is sched_latency / nr_latency.
> Which need not be higher than 1/HZ, its just that the distribution within the
> sched_latency period is important.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> :040000 040000 ab225228500f7a19d5ad20ca12ca3fc8ff5f5ad1 f1742e1d225a72aecea9d6961ed989b5943d31d8 M arch
> :040000 040000 25d85e4ef7a71b0cc76801a2526ebeb4dce180fe ae61510186b4fad708ef0211ac169decba16d4e5 M include
> :040000 040000 9247cec7dd506c648ac027c17e5a07145aa41b26 950832cc1dc4d30923f593ecec883a06b45d62e9 M kernel
>
> ...and I verified it via :-/ echo 7 > sched_features in latest. That
> only bought me roughly half though, so there's a part three in there
> somewhere.

Ouch, I guess hrtimers are just way expensive on some hardware...

2008-03-25 18:31:32

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc


On Tue, 2008-03-25 at 17:13 +0100, Peter Zijlstra wrote:
> On Tue, 2008-03-25 at 16:50 +0100, Mike Galbraith wrote:

> > ...and I verified it via :-/ echo 7 > sched_features in latest. That
> > only bought me roughly half though, so there's a part three in there
> > somewhere.
>
> Ouch, I guess hrtimers are just way expensive on some hardware...

That would be about on par with my luck. I'll try to muster up the
gumption to go looking for part three, though my motivation for
searching long ago proved to be a dead end wrt sysv ipc.

-Mike

2008-03-26 06:19:10

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc


On Tue, 2008-03-25 at 17:13 +0100, Peter Zijlstra wrote:

> > ...and I verified it via :-/ echo 7 > sched_features in latest. That
> > only bought me roughly half though, so there's a part three in there
> > somewhere.
>
> Ouch, I guess hrtimers are just way expensive on some hardware...

It takes a large bite out of my P4 as well.

2008-03-27 22:29:18

by Bill Davidsen

Subject: Re: Scalability requirements for sysv ipc

Mike Galbraith wrote:
> On Fri, 2008-03-21 at 17:08 +0100, Manfred Spraul wrote:
>> Paul E. McKenney wrote:
>>> I could give it a spin -- though I would need to be pointed to the
>>> patch and the test.
>>>
>>>
>> I'd just compare a recent kernel with something older, pre Fri Oct 19
>> 11:53:44 2007
>>
>> Then download ctxbench, run one instance on each core, bound with taskset.
>> http://www.tmr.com/%7Epublic/source/
>> (I don't juse ctxbench myself, if it doesn't work then I could post my
>> own app. It would be i386 only with RDTSCs inside)
>
> (test gizmos are always welcome)
>
> Results for Q6600 box don't look particularly wonderful.
>
> taskset -c 3 ./ctx -s
>
> 2.6.24.3
> 3766962 itterations in 9.999845 seconds = 376734/sec
>
> 2.6.22.18-cfs-v24.1
> 4375920 itterations in 10.006199 seconds = 437330/sec
>
> for i in 0 1 2 3; do taskset -c $i ./ctx -s& done
>
> 2.6.22.18-cfs-v24.1
> 4355784 itterations in 10.005670 seconds = 435361/sec
> 4396033 itterations in 10.005686 seconds = 439384/sec
> 4390027 itterations in 10.006511 seconds = 438739/sec
> 4383906 itterations in 10.006834 seconds = 438128/sec
>
> 2.6.24.3
> 1269937 itterations in 9.999757 seconds = 127006/sec
> 1266723 itterations in 9.999663 seconds = 126685/sec
> 1267293 itterations in 9.999348 seconds = 126742/sec
> 1265793 itterations in 9.999766 seconds = 126592/sec
>
Glad to see that ctxbench is still useful. I think there's a more recent
version I haven't put up, which uses threads rather than processes, but
it generated similar values, so I somewhat lost interest. There was also
a "round robin" feature to pass the token through more processes; again,
I didn't find much use for the data.

I never tried binding the process to a CPU; in general the affinity code
puts one process per CPU under light load, which limits the context-switch
overhead. It looks as if you are testing only the single-CPU (or core) case.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2008-03-28 09:49:29

by Manfred Spraul

Subject: Re: Scalability requirements for sysv ipc

diff -ur ctxbench-1.9.orig/ctxbench.c ctxbench-1.9/ctxbench.c
--- ctxbench-1.9.orig/ctxbench.c 2002-12-09 22:41:59.000000000 +0100
+++ ctxbench-1.9/ctxbench.c 2008-03-28 10:30:55.000000000 +0100
@@ -1,19 +1,28 @@
+#include <sched.h>
#include <time.h>
#include <errno.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
-#include <sched.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/shm.h>
#include <sys/sem.h>
#include <sys/msg.h>
#include <sys/stat.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>

/* this should be in unistd.h!! */
/* #include <getopt.h> */

+/**************** Prototypes */
+
+void shmchild(int shm, int semid);
+void shmparent(int shm, int semid, pid_t child);
+void do_cpubind(int cpu);
+
/**************** General internal procs and flags here */
/* help/usage */
static void usage(void);
@@ -25,7 +34,6 @@
int Niter = 0;
/* Use signals rather than semiphores */
static void sig_NOP();
-static void wait_sig();
int OkayToRun = 0;
int ParentPID, ChildPID;
/* pipe vectors for -p option */
@@ -79,19 +87,20 @@

int msgqid;
int do_yield = 0;
-
-main(int argc, char *argv[])
+
+int main(int argc, char *argv[])
{
int shm;
struct shmid_ds buf;
int semid = -1;
- int child, stat;
+ int cpubind = -1;
+ int child;
int RunTime = 10;
union semun pvt_semun;

pvt_semun.val = 0;

- while ((shm = getopt(argc, argv, "sSLYmpn:t:")) != EOF) {
+ while ((shm = getopt(argc, argv, "sSLYmpn:t:c:")) != EOF) {
switch (shm) {
/* these are IPC types */
case 's': /* use semiphore */
@@ -124,11 +133,14 @@
case 't': /* give time to run */
RunTime = atoi(optarg);
break;
+ case 'c': /* bind to a specific cpu */
+ cpubind = atoi(optarg);
+ break;
default: /* typo */
usage();
}
}
-
+
signal(SIGALRM, timeout);
if (RunTime) alarm(RunTime);

@@ -164,7 +176,7 @@
}

/* identify version and method */
- printf("\n\nContext switching benchmark v1.17\n");
+ printf("\n\nContext switching benchmark v1.17-cpubind\n");
printf(" Using %s for IPC control\n", IPCname[IPCtype]);

printf(" Max iterations: %8d (zero = no limit)\n", Iterations);
@@ -174,13 +186,14 @@

ParentPID = getpid();
if ((child = fork()) == 0) {
+ do_cpubind(cpubind);
ChildPID = getpid();
shmchild(shm, semid);
} else {
+ do_cpubind(cpubind);
ChildPID = child;
shmparent(shm, semid, child);
}
-
wait(NULL);
if (shmctl(shm, IPC_RMID, &buf) != 0) {
perror("Error removing shared memory");
@@ -215,14 +228,13 @@
break;
}

- exit(0);
+ return 0;
}
-

/*******************************/
/* child using IPC method */

-int shmchild(int shm, int semid)
+void shmchild(int shm, int semid)
{
volatile char *mem;
int num = 0;
@@ -313,7 +325,7 @@
/********************************/
/* parent using shared memory */

-int shmparent(int shm, int semid, pid_t child)
+void shmparent(int shm, int semid, pid_t child)
{
volatile char *mem;
int num = 0;
@@ -328,7 +340,7 @@


if (!(mem = shmat(shm, 0, 0))) {
- perror("shmchild: Error attaching shared memory");
+ perror("shmparent: Error attaching shared memory");
exit(2);
}

@@ -439,7 +451,7 @@
exit(3);
}
}
-
+
/*****************************************************************
| usage - give the user a clue
****************************************************************/
@@ -458,6 +470,7 @@
" -p use pipes for IPC\n"
" -L spinLock in shared memory\n"
" -Y spinlock with sched_yield (for UP)\n"
+ " -cN bind to cpu N\n"
"\nRun limit options:\n"
" -nN limit loops to N (default via timeout)\n"
" -tN run for N sec, default 10\n\n"
@@ -490,3 +503,22 @@
signal(SIGUSR1, sig_NOP);
return;
}
+
+/*****************************************************************
+ | cpu_bind - bind all tasks to a given cpu
+ ****************************************************************/
+
+void do_cpubind(int cpubind)
+{
+ if (cpubind >= 0) {
+ cpu_set_t d;
+ int ret;
+
+ CPU_ZERO(&d);
+ CPU_SET(cpubind, &d);
+ ret = sched_setaffinity(0, sizeof(d), &d);
+ printf("%d: sched_setaffinity %d: %lxh\n",getpid(), ret, *((int*)&d));
+ ret = sched_getaffinity(0, sizeof(d), &d);
+ printf("%d: sched_getaffinity %d: %lxh\n",getpid(), ret, *((int*)&d));
+ }
+}
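
(With this patch applied, each instance can be pinned directly instead of
via taskset, e.g. "./ctx -s -c0 & ./ctx -s -c1 &".)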


Attachments:
patch-cpubind (4.21 kB)

2008-03-30 14:12:32

by Manfred Spraul

Subject: Re: Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO)

Mike Galbraith wrote:
> On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote:
>
>
>> just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% slower
>> than 2.6.18.8:
>>
>
> After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535,
> 2.6.25.git scaled linearly
We can't just revert that patch: with IDR, a global lock is mandatory :-(
We must either revert the whole idea of using IDR or live with the
reduced scalability.

Actually, there are further bugs: the undo structures are not
namespace-aware, thus the sequence semop with SEM_UNDO, unshare, create a
new array with the same id but more semaphores, then another semop with
SEM_UNDO will corrupt kernel memory :-(
I'll try to clean up the bugs first, then I'll look at the scalability
again.
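
A user-space sketch of that sequence (an illustration of the steps only,
not a guaranteed reproducer: unshare(CLONE_NEWIPC) needs privileges, and
whether the second semget() really reuses the old id depends on the
allocator):

/* semundo-ns.cpp - build: g++ -O2 semundo-ns.cpp -o semundo-ns (run as root) */
#include <stdio.h>
#include <errno.h>
#include <sched.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
	struct sembuf op;

	/* 1) array with one semaphore; semop with SEM_UNDO creates the undo entry */
	int id1 = semget(IPC_PRIVATE, 1, 0600 | IPC_CREAT);
	op.sem_num = 0; op.sem_op = 1; op.sem_flg = SEM_UNDO;
	if (id1 == -1 || semop(id1, &op, 1) == -1) {
		printf("setup failed, errno %d.\n", errno);
		return 1;
	}

	/* 2) new ipc namespace; the task's undo list still refers to the old array */
	if (unshare(CLONE_NEWIPC) == -1) {
		printf("unshare(CLONE_NEWIPC) failed, errno %d.\n", errno);
		return 1;
	}

	/* 3) new array, ideally with the same id, but with more semaphores */
	int id2 = semget(IPC_PRIVATE, 4, 0600 | IPC_CREAT);
	printf("old id %d, new id %d\n", id1, id2);

	/* 4) semop with SEM_UNDO on a semaphore number the old array didn't have */
	op.sem_num = 3; op.sem_op = 1; op.sem_flg = SEM_UNDO;
	if (semop(id2, &op, 1) == -1)
		printf("semop failed, errno %d.\n", errno);
	return 0;
}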

--
Manfred

2008-03-30 15:21:50

by David Newall

Subject: Re: Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO)

Manfred Spraul wrote:
> Mike Galbraith wrote:
>> On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote:
>>> just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60%
>>> slower than 2.6.18.8:
>>>
>>
>> After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535,
>> 2.6.25.git scaled linearly
> We can't just revert that patch: with IDR, a global lock is mandatory :-(
> We must either revert the whole idea of using IDR or live with the
> reduced scalability.
>
> Actually, there are further bugs: the undo structures are not
> namespace-aware, thus semop with SEM_UNDO, unshare, create new array
> with same id, but more semaphores, another semop with SEM_UNDO will
> corrupt kernel memory :-(

You should revert it all. The scalability problem isn't good, but from
what you're saying, the idea isn't ready yet. Revert it all, fix the
problems at your leisure, and submit new patches then.

2008-03-30 16:18:23

by Mike Galbraith

Subject: Re: Scalability requirements for sysv ipc (+namespaces broken with SEM_UNDO)


On Sun, 2008-03-30 at 16:12 +0200, Manfred Spraul wrote:
> Mike Galbraith wrote:
> > On Sat, 2008-03-22 at 20:08 +0100, Manfred Spraul wrote:
> >
> >
> >> just the normal performance of 2.6.25-rc3 is abyssimal, 55 to 60% slower
> >> than 2.6.18.8:
> >>
> >
> > After manually reverting 3e148c79938aa39035669c1cfa3ff60722134535,
> > 2.6.25.git scaled linearly
> We can't just revert that patch: with IDR, a global lock is mandatory :-(
> We must either revert the whole idea of using IDR or live with the
> reduced scalability.

Yeah, I looked at the problem, but didn't know what the heck to do about
it, so just grabbed my axe to verify/quantify.

> Actually, there are further bugs: the undo structures are not
> namespace-aware, thus semop with SEM_UNDO, unshare, create new array
> with same id, but more semaphores, another semop with SEM_UNDO will
> corrupt kernel memory :-(
> I'll try to clean up the bugs first, then I'll look at the scalability
> again.

Great!

-Mike