2003-05-30 14:37:19

by Jörn Engel

Subject: [PATCH RFC] 1/2 central workspace for zlib

Hi!

The following creates a central workspace per cpu for the zlib. The
original idea was to save memory for embedded, but this should also
improve performance for smp.

Currently, each user of the zlib has to provide a workspace for the
zlib to draw memory from. Each workspace amounts to 400k for deflate
or 50k for inflate. Four users exist, as of 2.4.20:
jffs2: inflate & deflate, initialized once, 450k
cramfs: inflate, initialized once, 50k
zisofs: inflate, initialized once, 50k
ppp: inflate & deflate, per channel, 450k * n

jffs2 and zisofs protect their workspace with a semaphore, cramfs
doesn't, but I'm sure it should (anyone ever tried cramfs on smp or
with preempt?). This doesn't scale too well on smp, even though few
people on smp should care about those filesystems.

This patch creates an extra workspace of 400k per cpu, that is used
for both inflate and deflate. One of the central workspaces is used
for users that don't provide their own. Semaphore protection is done
in zlib_(in|de)flateInit() and zlib_(in|de)flateEnd(), so the user has
to call those functions more often to release the semaphores before
returning to userspace.
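
A minimal sketch of the intended calling pattern (illustration only, not
part of the patch): leaving strm->workspace NULL asks Init for one of the
central workspaces, and the Init/End pair brackets each operation so the
semaphore is held as briefly as possible.

#include <linux/zlib.h>

/* sketch: decompress one chunk using a central workspace */
static int decompress_chunk(void *src, int srclen, void *dst, int dstlen)
{
	z_stream strm;
	int ret;

	strm.workspace = NULL;		/* request a central workspace */
	ret = zlib_inflateInit(&strm);	/* takes the semaphore */
	if (ret != Z_OK)
		return ret;

	strm.next_in = src;
	strm.avail_in = srclen;
	strm.next_out = dst;
	strm.avail_out = dstlen;
	ret = zlib_inflate(&strm, Z_FINISH);

	zlib_inflateEnd(&strm);		/* releases the semaphore */
	return (ret == Z_STREAM_END) ? Z_OK : ret;
}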

A second patch converts the three filesystems to use the central zlib,
and a third could convert ppp as well, but that isn't started yet.

The patches have had some basic testing on UP with jffs2, but not
more, so handle with care. Test results are welcome.

Caveats:
With only cramfs (or zisofs) compiled in, this uses 350k more than
before. Currently everyone pays the price for deflate, whether
necessary or not. It might make sense to split the workspaces between
inflate and deflate, which would cost some users an extra 50k but save
others 350k. Ideal would be to have the best of both, but I didn't dare to
tackle that problem (yet).

Jörn

--
Sometimes, asking the right question is already the answer.
-- Unknown

--- linux-2.4.20/lib/zlib_deflate/deflate.c~zlib_space 2002-11-29 00:53:15.000000000 +0100
+++ linux-2.4.20/lib/zlib_deflate/deflate.c 2003-05-28 15:55:25.000000000 +0200
@@ -208,6 +208,7 @@

if (level == Z_DEFAULT_COMPRESSION) level = 6;

+ zlib_get_workspace(strm);
mem = (deflate_workspace *) strm->workspace;

if (windowBits < 0) { /* undocumented feature: suppress zlib header */
@@ -558,6 +559,8 @@

strm->state = Z_NULL;

+ zlib_put_workspace(strm);
+
return status == BUSY_STATE ? Z_DATA_ERROR : Z_OK;
}

--- linux-2.4.20/lib/zlib_inflate/inflate.c~zlib_space 2002-11-29 00:53:15.000000000 +0100
+++ linux-2.4.20/lib/zlib_inflate/inflate.c 2003-05-28 15:55:25.000000000 +0200
@@ -8,6 +8,7 @@
#include "infblock.h"
#include "infutil.h"

+
int ZEXPORT zlib_inflate_workspacesize(void)
{
return sizeof(struct inflate_workspace);
@@ -35,6 +36,7 @@
if (z->state->blocks != Z_NULL)
zlib_inflate_blocks_free(z->state->blocks, z);
z->state = Z_NULL;
+ zlib_put_workspace(z);
return Z_OK;
}

@@ -46,12 +48,13 @@
int stream_size;
{
if (version == Z_NULL || version[0] != ZLIB_VERSION[0] ||
- stream_size != sizeof(z_stream) || z->workspace == Z_NULL)
+ stream_size != sizeof(z_stream))
return Z_VERSION_ERROR;

/* initialize state */
if (z == Z_NULL)
return Z_STREAM_ERROR;
+ zlib_get_workspace(z);
z->msg = Z_NULL;
z->state = &WS(z)->internal_state;
z->state->blocks = Z_NULL;
--- linux-2.4.20/include/linux/zlib.h~zlib_space 2003-04-11 10:37:13.000000000 +0200
+++ linux-2.4.20/include/linux/zlib.h 2003-05-30 16:16:06.000000000 +0200
@@ -78,6 +78,7 @@
struct internal_state FAR *state; /* not visible by applications */

void *workspace; /* memory allocated for this stream */
+ int ws_num; /* index in the internal workspace array */

int data_type; /* best guess about the data type: ascii or binary */
uLong adler; /* adler32 value of the uncompressed data */
@@ -171,6 +172,18 @@
This check is automatically made by deflateInit and inflateInit.
*/

+ZEXTERN void * __zlib_panic_workspace OF((void));
+/*
+ BIG FAT WARNING:
+ The only valid user of this function is a panic handler. This will
+ break for sure in any other case, as it bypasses the locking.
+
+ Returns the first internal workspace. Use this for a panic handler that
+ needs a workspace during panic, but shouldn't waste the memory for it
+ during normal operation.
+*/
+
+
ZEXTERN int ZEXPORT zlib_deflate_workspacesize OF((void));
/*
Returns the number of bytes that needs to be allocated for a per-
--- linux-2.4.20/lib/Config.in~zlib_space 2002-11-29 00:53:15.000000000 +0100
+++ linux-2.4.20/lib/Config.in 2003-05-28 15:55:25.000000000 +0200
@@ -35,4 +35,14 @@
fi
fi

+if [ "$CONFIG_ZLIB_INFLATE" = "y" -o \
+ "$CONFIG_ZLIB_DEFLATE" = "y" ]; then
+ define_tristate CONFIG_ZLIB_GENERIC y
+else
+ if [ "$CONFIG_ZLIB_INFLATE" = "m" -o \
+ "$CONFIG_ZLIB_DEFLATE" = "m" ]; then
+ define_tristate CONFIG_ZLIB_GENERIC m
+ fi
+fi
+
endmenu
--- linux-2.4.20/lib/Makefile~zlib_space 2002-11-29 00:53:15.000000000 +0100
+++ linux-2.4.20/lib/Makefile 2003-05-28 16:05:07.000000000 +0200
@@ -8,7 +8,8 @@

L_TARGET := lib.a

-export-objs := cmdline.o dec_and_lock.o rwsem-spinlock.o rwsem.o rbtree.o
+export-objs := cmdline.o dec_and_lock.o rwsem-spinlock.o rwsem.o rbtree.o \
+ zlib_generic.o

obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o \
bust_spinlocks.o rbtree.o dump_stack.o
@@ -22,6 +23,7 @@

subdir-$(CONFIG_ZLIB_INFLATE) += zlib_inflate
subdir-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate
+obj-$(CONFIG_ZLIB_GENERIC) += zlib_generic.o

# Include the subdirs, if necessary.
obj-y += $(join $(subdir-y),$(subdir-y:%=/%.o))
--- /dev/null 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.4.20/lib/zlib_generic.c 2003-05-30 16:19:52.000000000 +0200
@@ -0,0 +1,90 @@
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/zutil.h>
+#include <linux/vmalloc.h>
+
+#include "zlib_inflate/infutil.h" /* for the workspace size */
+#include "zlib_deflate/defutil.h"
+#define MAX(a,b) ((a) > (b) ? (a) : (b))
+
+static void *zlib_workspace[NR_CPUS];
+static uint32_t zlib_workspace_free;
+static struct semaphore zlib_workspace_sem;
+static spinlock_t zlib_workspace_free_lock;
+
+void zlib_exit(void)
+{
+ int i;
+
+ for (i=0; i<smp_num_cpus; i++) {
+ vfree(zlib_workspace[i]);
+ }
+}
+
+int __init zlib_init(void)
+{
+ int i;
+ size_t size;
+
+ zlib_workspace_free = 0;
+ size = MAX(sizeof(struct inflate_workspace),
+ sizeof(struct deflate_workspace));
+ for (i=0; i<smp_num_cpus; i++) {
+ zlib_workspace[i] = vmalloc(size);
+ if (!zlib_workspace[i]) {
+ zlib_exit();
+ return -ENOMEM;
+ }
+ }
+ zlib_workspace_free = 0xffffffff;
+ sema_init(&zlib_workspace_sem, smp_num_cpus);
+ spin_lock_init(&zlib_workspace_free_lock);
+ return 0;
+}
+
+void zlib_get_workspace(z_streamp z)
+{
+ if (z->workspace) /* transition period, will remove workspace later */
+ z->ws_num = -1;
+ else {
+ down_interruptible(&zlib_workspace_sem);
+
+ spin_lock(&zlib_workspace_free_lock);
+ z->ws_num = ffs(zlib_workspace_free) - 1;
+ clear_bit(z->ws_num, &zlib_workspace_free);
+ spin_unlock(&zlib_workspace_free_lock);
+
+ z->workspace = zlib_workspace[z->ws_num];
+ }
+}
+
+void zlib_put_workspace(z_streamp z)
+{
+ if (z->ws_num < 0)
+ ;
+ else {
+ z->workspace = NULL;
+
+ spin_lock(&zlib_workspace_free_lock);
+ __set_bit(z->ws_num, &zlib_workspace_free);
+ spin_unlock(&zlib_workspace_free_lock);
+
+ up(&zlib_workspace_sem);
+ }
+}
+
+/**
+ * BIG FAT WARNING:
+ * The only valid user of this function is a panic handler. This will
+ * break for sure in any other case, as it bypasses the locking.
+ */
+void *__zlib_panic_workspace(void)
+{
+ return zlib_workspace[0];
+}
+
+EXPORT_SYMBOL(zlib_get_workspace);
+EXPORT_SYMBOL(zlib_put_workspace);
+EXPORT_SYMBOL(__zlib_panic_workspace);
+MODULE_AUTHOR("Jörn Engel <[email protected]>");
+MODULE_LICENSE("GPL");
--- linux-2.4.20/include/linux/zutil.h~zlib_space 2003-05-28 15:57:11.000000000 +0200
+++ linux-2.4.20/include/linux/zutil.h 2003-05-30 16:19:05.000000000 +0200
@@ -123,4 +123,7 @@
return (s2 << 16) | s1;
}

+extern void zlib_get_workspace(z_streamp z);
+extern void zlib_put_workspace(z_streamp z);
+
#endif /* _Z_UTIL_H */


2003-05-30 15:17:02

by James Morris

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Fri, 30 May 2003, Jörn Engel wrote:

> The following creates a central workspace per cpu for the zlib. The
> original idea was to save memory for embedded, but this should also
> improve performance for smp.
>
> Currently, each user of the zlib has to provide a workspace for the
> zlib to draw memory from. Each workspace amounts to 400k for deflate
> or 50k for inflate. Four users exist, as of 2.4.20:
> jffs2: inflate & deflate, initialized once, 450k
> cramfs: inflate, initialized once, 50k
> zisofs: inflate, initialized once, 50k
> ppp: inflate & deflate, per channel, 450k * n

In 2.5 (and soon 2.4), the crypto/deflate.c module makes use of zlib as
well. This is typically used in ipsec for ipcomp. Each ipcomp security
association has a zlib context, and access to the workspace is serialized
by the security association's bh lock.

(Something needs to be done for this particular case in general: a box
with 1000 tunnels could use 450MB of atomic kernel memory just for
zlib workspaces alone. A solution I've been looking at for this is to
allow workspaces to be dynamically sized based on the compression
parameters instead of using worst-case figures. The memory savings may be
up to 90% in this case).
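
As a rough sketch, using the memory-usage formula documented in zlib's
zconf.h - the kernel workspace also carries the deflate state itself, so
treat this as a lower bound, not an actual implementation:

/* deflate needs about (1 << (windowBits+2)) + (1 << (memLevel+9))
 * bytes of window and hash memory, per zconf.h */
static size_t deflate_workspace_size(int windowBits, int memLevel)
{
	return (1 << (windowBits + 2)) + (1 << (memLevel + 9));
}

The worst case, windowBits = 15 and memLevel = 9, gives 128k + 256k =
384k, matching the ~400k figure above; a tunnel that can live with
windowBits = 9 and memLevel = 1 needs only about 3k.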

> This patch creates an extra workspace of 400k per cpu, that is used for
> both inflate and deflate. One of the central workspaces is used for
> users that don't provide their own. Semaphore protection is done in
> zlib_(in|de)flateInit() and zlib_(in|de)flateEnd(), so the user has to
> call those functions more often to release the semaphores before
> returning to userspace.

This won't work for the bh lock protected case outlined above, and will
cause contention between different users of zlib.


- James
--
James Morris
<[email protected]>






2003-05-30 17:30:37

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Sat, 31 May 2003 01:29:42 +1000, James Morris wrote:
>
> In 2.5 (and soon 2.4), the crypto/deflate.c module makes use of zlib as
> well. This is typically used in ipsec for ipcomp. Each ipcomp security
> association has a zlib context, and access to the workspace is serialized
> by the security association's bh lock.
>
> (Something needs to be done for this particular case in general: a box
> with 1000 tunnels could use 450MB of atomic kernel memory just for
> zlib workspaces alone. A solution I've been looking at for this is to
> allow workspaces to be dynamically sized based on the compression
> parameters instead of using worst-case figures. The memory savings may be
> up to 90% in this case).

Or 99.9%, if you map all those workspaces to one, similar to my patch.
The softirq context requires another set of workspaces, but in
principle the same could be done. Downside is that the Init- and End-
functions have to be called more often, which will surely cost cpu
time. I haven't done any benchmarks yet, so it remains to be shown if
this is relevant.

Your approach basically reintroduces the zmalloc() and zfree()
functions and makes behaviour under memory pressure interesting. It
is not impossible to get it right, but quite hard for sure.

> > This patch creates an extra workspace of 400k per cpu, that is used for
> > both inflate and deflate. One of the central workspaces is used for
> > users that don't provide their own. Semaphore protection is done in
> > zlib_(in|de)flateInit() and zlib_(in|de)flateEnd(), so the user has to
> > call those functions more often to release the semaphores before
> > returning to userspace.
>
> This won't work for the bh lock protected case outlined above, and will
> cause contention between different users of zlib.

Imagine a 2-cpu machine that does reads and writes to jffs2 on two
devices simultaneously. When one process is reading and one is
writing, everything is fine. When both perform the same operation,
the current design makes one wait at the semaphore, while one cpu is
idle. In my design, you have one workspace per cpu, so both can work
simultaneously.

What contention were you talking about? :)

The bh lock is another issue. Here we need another set of workspaces,
independent of the existing one (with or without my patch). But in
principle, the same should be possible.

Jörn

--
The grand essentials of happiness are: something to do, something to
love, and something to hope for.
-- Allan K. Chalmers

2003-05-31 00:02:30

by David Miller

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

From: Jörn Engel <[email protected]>
Date: Fri, 30 May 2003 19:43:19 +0200

What contention were you talking about? :)

Actually, your idea is interesting. Are you going to use
per-cpu workspaces?

I think the best scheme is 2 per-cpu workspaces, one for
normal and one for softirq context.

No locking needed whatsoever. I hope it can work :-)

2003-05-31 06:08:25

by David Miller

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

From: James Morris <[email protected]>
Date: Sat, 31 May 2003 01:29:42 +1000 (EST)

This won't work for the bh lock protected case outlined above, and
will cause contention between different users of zlib.

My understanding is that these are just scratchpads. The contents
while a decompress/compress operation is not occurring do not
matter.

So if we have 2 such scratchpads per cpu, one for normal and one for
BH context, his idea truly can work and be useful to everyone.
It would also be lockless on SMP.
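
A minimal sketch of that scheme (hypothetical names, not from the posted
patch). It is only safe if preemption is disabled for the whole
(de)compression call, so a task can neither migrate to another cpu nor
be scheduled away in mid-operation:

static void *zlib_scratch[NR_CPUS][2];	/* [cpu][0] process, [cpu][1] BH */

static void *zlib_scratchpad(void)
{
	/* caller must keep preemption off until it is done */
	return zlib_scratch[smp_processor_id()][in_softirq() ? 1 : 0];
}

No semaphore and no spinlock: the only thing that can interrupt process
context on a cpu is the BH user, and that one has its own scratchpad.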

2003-05-31 06:35:59

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Fri, 30 May 2003 17:14:10 -0700, David S. Miller wrote:
>
> Actually, your idea is interesting. Are you going to use
> per-cpu workspaces?

Yes, I already am.

> I think the best scheme is 2 per-cpu workspaces, one for
> normal and one for softirq context.

Agreed. Current patch doesn't cover softirq context yet.

> No locking needed whatsoever. I hope it can work :-)

How about preemption? zlib operations take their time, so at least on
UP it makes sense to preempt them when not in softirq context. Can
this still be done lockless?

Jörn

--
More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity.
-- W. A. Wulf

2003-05-31 06:43:25

by David Miller

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

From: Jörn Engel <[email protected]>
Date: Sat, 31 May 2003 08:48:51 +0200

> No locking needed whatsoever. I hope it can work :-)

How about preemption? zlib operations take their time, so at least on
UP it makes sense to preempt them when not in softirq context. Can
this still be done lockless?

You'll need to disable preemption.

2003-05-31 07:43:13

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

CC List pruned a little.

On Fri, 30 May 2003 23:55:05 -0700, David S. Miller wrote:
> From: Jörn Engel <[email protected]>
>
> How about preemption? zlib operations take their time, so at least on
> UP it makes sense to preempt them when not in softirq context. Can
> this still be done lockless?
>
> You'll need to disable preemption.

My gut feeling claims that this would hurt interactivity. Con, would
contest on a jffs2 (zlib compressed) filesystem be able to show
interactivity problems wrt zlib?

Jörn

--
Do not stop an army on its way home.
-- Sun Tzu

2003-05-31 07:47:25

by David Miller

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

From: Jörn Engel <[email protected]>
Date: Sat, 31 May 2003 09:56:15 +0200

On Fri, 30 May 2003 23:55:05 -0700, David S. Miller wrote:
> You'll need to disable preemption.

My gut feeling claims that this would hurt interactivity.

You can't leave the per-cpu workspace in an indeterminate
state, you'll context switch and meanwhile that workspace will
be used by another client or you'll next get scheduled on
a different cpu and use a different workspace.

2003-05-31 07:58:30

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Sat, 31 May 2003 00:59:09 -0700, David S. Miller wrote:
>
> You can't leave the per-cpu workspace in an indeterminate
> state, you'll context switch and meanwhile that workspace will
> be used by another client or you'll next get scheduled on
> a different cpu and use a different workspace.

Did you read the patch?

Across a context switch, the workspace remains associated with the
owning process ( z->workspace, z->ws_num ). Other processes trying to
grab a workspace will get put on a waitqueue ( zlib_workspace_sem ).

The workspace is not exactly per cpu, it is per process. Just the
number of workspaces happens to equal the number of cpus in the
system, but a couple more or less should work just as well.

In softirq context you would be right. Preempt is disabled anyway and
cpu affinity comes for free.

Jörn

--
A defeated army first battles and then seeks victory.
-- Sun Tzu

2003-05-31 08:01:06

by David Miller

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

From: Jörn Engel <[email protected]>
Date: Sat, 31 May 2003 10:11:29 +0200

In softirq context you would be right. Preempt is disabled anyway
and cpu affinity comes for free.

Looks ok to me then.

2003-05-31 08:07:51

by Con Kolivas

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Sat, 31 May 2003 17:56, Jörn Engel wrote:
> CC List pruned a little.
>
> On Fri, 30 May 2003 23:55:05 -0700, David S. Miller wrote:
> > From: Jörn Engel <[email protected]>
> >
> > How about preemption? zlib operations take their time, so at least on
> > UP it makes sense to preempt them when not in softirq context. Can
> > this still be done lockless?
> >
> > You'll need to disable preemption.
>
> My gut feeling claims that this would hurt interactivity. Con, would
> contest on a jffs2 (zlib compressed) filesystem be able to show
> interactivity problems wrt zlib?

The only way I could think of was perhaps using a load on another disk
(io_other, which is in contest) that uses jffs2 while the contest baseline
is running on a normal filesystem - this has normally shown very little
difference between filesystems. Otherwise, if everything in contest were
run on jffs2, it would affect every layer and it would be hard to be sure
you had a control to compare with.

Con

2003-05-31 08:56:49

by Arjan van de Ven

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Sat, 2003-05-31 at 09:56, Jörn Engel wrote:
> CC List pruned a little.
>
> On Fri, 30 May 2003 23:55:05 -0700, David S. Miller wrote:
> > From: Jörn Engel <[email protected]>
> >
> > How about preemption? zlib operations take their time, so at least on
> > UP it makes sense to preempt them when not in softirq context. Can
> > this still be done lockless?
> >
> > You'll need to disable preemption.
>
> My gut feeling claims that this would hurt interactivity.

the code is decompressing data that was originally 4096 bytes. I doubt
that will take anything close to noticeable wallclock time nowadays....



2003-05-31 10:38:21

by James Morris

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Fri, 30 May 2003, David S. Miller wrote:

> From: James Morris <[email protected]>
> Date: Sat, 31 May 2003 01:29:42 +1000 (EST)
>
> This won't work for the bh lock protected case outlined above, and
> will cause contention between different users of zlib.
>
> My understanding is that these are just scratchpads. The contents
> while a decompress/compress operation is not occurring do not
> matter.

It depends on how the zlib library is used. The filesystems and crypto
code use it so that each operation is distinct, although it is possible to
maintain compression history between operations: PPP does this via a
sliding compression window, and there are other potential users such as
ROHC.

One way of addressing this would be to allow the user to supply their own
workspace if compression history needs to be maintained.

> So if we have 2 such scratchpads per cpu, one for normal and one for
> BH context, his idea truly can work and be useful to everyone.
> It would also be lockless on SMP.

And perhaps implement a lazy allocation scheme so that these
scratchpads are only allocated if needed (i.e. a caller does not provide
its own workspace).


- James
--
James Morris
<[email protected]>

2003-05-31 10:54:53

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Sat, 31 May 2003 20:51:04 +1000, James Morris wrote:
> On Fri, 30 May 2003, David S. Miller wrote:
>
> > My understanding is that these are just scratchpads. The contents
> > while a decompress/compress operation is not occuring does not
> > matter.
>
> It depends on how the zlib library is used. The filesystems and crypto
> code use it so that each operation is distinct, although it is possible to
> maintain compression history between operations: PPP does this via a
> sliding compression window, and there are other potential users such as
> ROHC.
>
> One way of addressing this would be to allow the user to supply their own
> workspace if compression history needs to be maintained.

Agreed. How much memory is needed for the history? Most of the
workspace, or substantially less?

> > So if we have 2 such scratchpads per cpu, one for normal and one for
> > BH context, his idea truly can work and be useful to everyone.
> > It would also be lockless on SMP.
>
> And perhaps implement a lazy allocation scheme so that these
> scratchpads are only allocated if needed (i.e. a caller does not provide
> its own workspace).

Disagreed. Two scratchpads per cpu won't hurt much in the mean case,
and lazy allocation wouldn't improve the worst case either. Keep them
static and simple.

Jörn

--
With a PC, I always felt limited by the software available. On Unix,
I am limited only by my knowledge.
-- Peter J. Schoenster

2003-06-01 01:12:19

by Matt Mackall

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Sat, May 31, 2003 at 06:22:21PM +1000, Con Kolivas wrote:
>
> The only way I could think of was perhaps using a load on another disk
> (io_other, which is in contest) that uses jffs2 while the contest baseline
> is running on a normal filesystem - this has normally shown very little
> difference between filesystems. Otherwise, if everything in contest were
> run on jffs2, it would affect every layer and it would be hard to be sure
> you had a control to compare with.

Timing on jffs2 is notoriously unrepeatable anyway - it's fully log
structured rather than journalled so it behaves a little differently.

--
Matt Mackall : http://www.selenic.com : of or relating to the moon

2003-06-02 12:06:55

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Sat, 31 May 2003 20:25:16 -0500, Matt Mackall wrote:
>
> Timing on jffs2 is notoriously unrepeatable anyway - it's fully log
> structured rather than journalled so it behaves a little differently.

cp jffs2.image /dev/partition_for_jffs2
do_test_1
cp jffs2.image /dev/partition_for_jffs2
do_test_2

No problem, if you know how to deal with it. And when doing
comparison benchmarks remember to sleep a lot. Without that, you end
up measuring GC efficiency, which doesn't matter in real life too
much.

Jörn

--
To recognize individual spam features you have to try to get into the
mind of the spammer, and frankly I want to spend as little time inside
the minds of spammers as possible.
-- Paul Graham

2003-06-02 15:02:42

by matsunaga

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

> The following creates a central workspace per cpu for the zlib. The
> original idea was to save memory for embedded, but this should also
> improve performance for smp.

Hi

Thank you for the patch.
It definitely reduces resource usage and improves multi-CPU scalability.

But I would still like to focus on performance.
(Though I haven't evaluated it yet...)
So far, I think MTD is mostly used on embedded devices,
which typically have a single, not very powerful CPU.

How is the following code (it is ugly though)?

static unsigned char default_workspace[WSIZE];

<snip>

	size = MAX(sizeof(struct inflate_workspace),
		   sizeof(struct deflate_workspace));

	if (WSIZE < size)
		BUG();

	zlib_workspace[0] = default_workspace;

	for (i = 1; i < smp_num_cpus; i++) {
		zlib_workspace[i] = vmalloc(size);
		if (!zlib_workspace[i]) {
			zlib_exit();
			return -ENOMEM;
		}
	}

P.S.
There is another vmalloc in mtdblock_open()...;-)

2003-06-02 15:23:58

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Tue, 3 June 2003 00:15:56 +0900, matsunaga wrote:
>
> But I would still like to focus on performance.
> (Though I haven't evaluated it yet...)
> So far, I think MTD is mostly used on embedded devices,
> which typically have a single, not very powerful CPU.
>
> How is the following code (it is ugly though)?
>
> static unsigned char default_workspace[WSIZE];
>
> <snip>
>
> 	size = MAX(sizeof(struct inflate_workspace),
> 		   sizeof(struct deflate_workspace));
>
> 	if (WSIZE < size)
> 		BUG();
>
> 	zlib_workspace[0] = default_workspace;
>
> 	for (i = 1; i < smp_num_cpus; i++) {
> 		zlib_workspace[i] = vmalloc(size);
> 		if (!zlib_workspace[i]) {
> 			zlib_exit();
> 			return -ENOMEM;
> 		}
> 	}

Not bad. We can even do a little better. Since only one workspace is
really absolutely necessary (ignoring the softirq case), there is no
need to fail anymore. If we don't get enough memory for all
workspaces, initialize the semaphore to be a little lower and live
with fewer workspaces.
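
A sketch of the combined idea, reusing the patch's own declarations
(untested, illustration only):

static unsigned char default_workspace[WSIZE];	/* WSIZE >= both sizes */

int __init zlib_init(void)
{
	int i, got = 1;
	size_t size = MAX(sizeof(struct inflate_workspace),
			  sizeof(struct deflate_workspace));

	zlib_workspace[0] = default_workspace;	/* always have one */
	for (i = 1; i < smp_num_cpus; i++) {
		zlib_workspace[i] = vmalloc(size);
		if (!zlib_workspace[i])
			break;		/* live with fewer workspaces */
		got++;
	}
	zlib_workspace_free = ~0u >> (32 - got);	/* one bit per workspace */
	sema_init(&zlib_workspace_sem, got);
	spin_lock_init(&zlib_workspace_free_lock);
	return 0;			/* cannot fail anymore */
}

(zlib_exit() would then have to skip index 0, which was never
vmalloc()ed.)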

I like your ideas, really! :)

> There is another vmalloc in mtdblock_open()...;-)

...that is not trivial to get rid of. Imagine the case where two
processes are writing to two devices. With two buffers, we do a
read/modify/erase/write whenever switching blocks for one device. With
one buffer, we have to do it for every context switch between those two
processes, which will wear down the flash a lot faster.

Considering that mtdblock should not be performance critical in
production use anyway, this is a very hard problem. What do you
think?

Jörn

--
The only real mistake is the one from which we learn nothing.
-- John Powell

2003-06-02 15:26:47

by David Woodhouse

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Mon, 2003-06-02 at 16:36, Jörn Engel wrote:
> Considering that mtdblock should not be performance critical in
> production use anyway, this is a very hard problem. What do you
> think?

mtdblock shouldn't actually be _using_ the vmalloc'd buffer for write
caching in production at all. Anyone using mtdblock in write mode for
production wants shooting.

Perhaps we could get away with allocating it only when the device is
opened for write? Even that's suboptimal since in 2.4, JFFS2 opens the
mtdblock device for write but doesn't actually _write_ to it; just gets
the appropriate MTD device and uses that directly.

--
dwmw2

2003-06-02 15:40:49

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Mon, 2 June 2003 16:40:07 +0100, David Woodhouse wrote:
>
> mtdblock shouldn't actually be _using_ the vmalloc'd buffer for write
> caching in production at all. Anyone using mtdblock in write mode for
> production wants shooting.

Or any other form of lart.

> Perhaps we could get away with allocating it only when the device is
> opened for write? Even that's suboptimal since in 2.4, JFFS2 opens the
> mtdblock device for write but doesn't actually _write_ to it; just gets
> the appropriate MTD device and uses that directly.

Maybe lazy allocation: vmalloc() it on the first write(), which
should never happen in production use. So the extra overhead doesn't
really matter.
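
Roughly this shape - a sketch using the field names from 2.4's
drivers/mtd/mtdblock.c, not the actual patch that went to CVS:

/* defer the cache buffer until the first write actually needs it */
static int mtdblock_ensure_cache(struct mtdblk_dev *mtdblk)
{
	if (mtdblk->cache_data)
		return 0;
	mtdblk->cache_data = vmalloc(mtdblk->mtd->erasesize);
	return mtdblk->cache_data ? 0 : -ENOMEM;
}

Reads can keep going straight to the MTD device; only the write path
needs to call this first.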

Do you mind if I submit such a patch?

Jörn

--
Correctness comes second.
Features come third.
Performance comes last.
Maintainability is needed for all of them.

2003-06-02 15:46:07

by David Woodhouse

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Mon, 2003-06-02 at 16:53, Jörn Engel wrote:
> Maybe lazy allocation: vmalloc() it on the first write(), which
> should never happen in production use. So the extra overhead doesn't
> really matter.

Seems reasonable.

> Do you mind if I submit such a patch?

For the CVS code, which is what's in 2.5, please do. Don't bother with
2.4; I rewrote all the block translation layer crap so the individual
translation drivers don't have to be too involved with the horridness
of the Linux block layer.

--
dwmw2

2003-06-02 15:54:04

by matsunaga

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

> Not bad. We can even do a little better. Since only one workspace is
> really absolutely necessary (ignoring the softirq case), there is no
> need to fail anymore. If we don't get enough memory for all
> workspaces, initialize the semaphore to be a little lower and live
> with fewer workspaces.
>
> I like your ideas, really! :)

Thanks.

> ...that is not trivial to get rid of. Imagine the case where two
> processes are writing to two devices. With two buffers, we do a
> read/modify/erase/write whenever switching blocks for one device. With
> one buffer, we have to do it for every context switch between those two
> processes, which will wear down the flash a lot faster.
>
> Considering that mtdblock should not be performance critical in
> production use anyway, this is a very hard problem. What do you
> think?

I understand the difficulty. It is hard to handle multiple devices with
a single hard-coded buffer. There may be other things to modify to
improve performance.

BR.
Matsunaga

2003-06-02 16:24:01

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Mon, 2 June 2003 16:59:25 +0100, David Woodhouse wrote:
> On Mon, 2003-06-02 at 16:53, Jörn Engel wrote:
> > Maybe lazy allocation: vmalloc() it on the first write(), which
> > should never happen in production use. So the extra overhead doesn't
> > really matter.
>
> Seems reasonable.

Patch is in CVS.

Not 100% sure about the correct return code if the lazy allocation
fails. Can you check that?

Matsunaga, I guess that the extra memory you now have on your machine
has more impact on performance than static allocation would have.
Translate the saved memory into a monetary unit and you even have a
lart that works for managers.

Jörn

--
You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a
speed hack until you've proven that's where the bottleneck is.
-- Rob Pike

2003-06-02 19:22:58

by David Woodhouse

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Mon, 2003-06-02 at 17:37, Jörn Engel wrote:
> On Mon, 2 June 2003 16:59:25 +0100, David Woodhouse wrote:
> > On Mon, 2003-06-02 at 16:53, Jörn Engel wrote:
> > > Maybe lazy allocation: vmalloc() it on the first write(), which
> > > should never happen in production use. So the extra overhead doesn't
> > > really matter.
> >
> > Seems reasonable.
>
> Patch is in CVS.
>
> Not 100% sure about the correct return code if the lazy allocation
> fails. Can you check that?

Thanks. The return code doesn't matter. If you look at the caller you
see it's only used as uptodate = !!ret anyway. It can't go any further.

I was concerned briefly that the allocation could happen when we're
trying to evict a page on memory pressure -- but in fact that's
virtually impossible since you'd have to mount the file system and mmap
your file without actually writing anything to the block device.

And anyway, refer to earlier comments about anyone using the naïve
read/erase/modify/writeback mtdblock translation layer for writing in
production wanting to be shot for it :)

--
dwmw2


2003-06-02 20:41:29

by Jörn Engel

Subject: Re: [PATCH RFC] 1/2 central workspace for zlib

On Mon, 2 June 2003 20:36:13 +0100, David Woodhouse wrote:
>
> I was concerned briefly that the allocation could happen when we're
> trying to evict a page on memory pressure -- but in fact that's
> virtually impossible since you'd have to mount the file system and mmap
> your file without actually writing anything to the block device.

Right. If someone gets into memory pressure before ever writing to
the device, there should be other problems in the system anyway. No
need to fix something that is already broken elsewhere.

> And anyway, refer to earlier comments about anyone using the naïve
> read/erase/modify/writeback mtdblock translation layer for writing in
> production wanting to be shot for it :)

Right!

Jörn

--
More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity.
-- W. A. Wulf