2001-10-03 08:41:24

by Pierre PEIFFER

Subject: e2compress in kernel 2.4

Hi!

We want to port e2compress from the 2.2 kernel series to 2.4, and
we are looking for the right way to port the compression on the
write path.

For the read operation, we can adapt the original design: the 2.2
part of e2compress can easily be integrated into the 2.4 version; the
write path is a little more complicated...

As we understand it, in the 2.2 kernel the compression sits between
the page cache and the buffer cache, i.e. the data pointed to by the
pages always stays uncompressed, while the compression is done on the
buffers: the data pointed to by the buffers becomes compressed when
the system decides to compress it.
We also saw that in 2.2, in ext2_file_write, the writes go to the
buffers first; after that, the system looks for the corresponding page
and, if it is present, updates the data in that page as well.
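Roughly, the 2.2 flow as we read it looks like this (a simplified
sketch from memory, not the exact code of fs/ext2/file.c; error
handling omitted):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/ext2_fs.h>
#include <asm/uaccess.h>

/*
 * Simplified sketch of the 2.2 write flow described above: the user
 * data lands in a buffer-cache buffer first, and the page-cache copy,
 * if one exists, is updated afterwards.  This is the point where the
 * buffer can be compressed without touching the (uncompressed) page.
 */
static int write_one_block_22(struct inode *inode, long block,
                              unsigned offset, const char *buf,
                              unsigned count, unsigned long pos)
{
        int err = 0;
        struct buffer_head *bh = ext2_getblk(inode, block, 1, &err);

        if (!bh)
                return err;
        copy_from_user(bh->b_data + offset, buf, count);  /* write hits the buffer */
        update_vm_cache(inode, pos, bh->b_data + offset, count);  /* sync the page */
        mark_buffer_uptodate(bh, 1);
        mark_buffer_dirty(bh, 0);
        brelse(bh);
        return 0;
}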

But under 2.4, as we see in generic_file_write, the write operation
happens on pages, and no longer on buffers as in 2.2. The needed
buffers are created and attached to the page, i.e. the b_data field of
each buffer points into the data of that page.
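In other words, as we read it (illustrative check only, not code taken
from fs/buffer.c; check_buffers_alias_page() is a name we made up):

#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/mm.h>

/*
 * Illustrative only: in 2.4 the buffer heads attached to a page-cache
 * page have no storage of their own; each b_data points into the page
 * itself, so compressing "the buffers" would compress the page-cache
 * data as well.
 */
static void check_buffers_alias_page(struct page *page)
{
        struct buffer_head *head = page->buffers;  /* circular per-page list */
        struct buffer_head *bh = head;
        unsigned long offset = 0;

        do {
                /* each b_data is an alias into the page, not a private copy */
                if (bh->b_data != (char *) page_address(page) + offset)
                        printk(KERN_DEBUG "buffer not backed by its page?\n");
                offset += bh->b_size;
                bh = bh->b_this_page;
        } while (bh != head);
}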

So here we are a little confused, because we don't know where to
introduce the compression if we keep the same idea as the 2.2
design... On the one hand, once the buffers are compressed, the pages
will also become compressed; on the other hand, we don't want the
pages to be compressed, because the pages, once registered and linked
to the inode, are supposed to stay uncompressed...

So our idea was to introduce the notion of a "cluster of pages", like
the existing notion of a cluster of blocks, i.e. perform the write on
several pages at a time and then compress the buffers corresponding to
those pages; but here the data of the buffers would have to be split
from the data of the pages, and that is our problem... We don't know
how to do this. Is there a way to do it?
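To make the idea a bit more concrete, this is the kind of thing we
have in mind (pure sketch; every name below is invented, and
e2compr_compress() stands for whatever compression routine we end up
using):

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/string.h>

#define CLUSTER_PAGES 4  /* assumption: one cluster = 4 pages */

/* placeholder for the real compression routine */
extern int e2compr_compress(const char *in, int in_len, char *out, int out_len);

/*
 * Pure sketch of the "cluster of pages" idea.  The point is the double
 * buffering: the page-cache data is copied into a private working
 * buffer and compressed there, so the pages themselves stay
 * uncompressed.
 */
static int compress_cluster(struct page **pages, char *out, int out_len)
{
        char *work;
        int i, clen;

        work = kmalloc(CLUSTER_PAGES * PAGE_SIZE, GFP_KERNEL);
        if (!work)
                return -ENOMEM;

        /* gather the still-uncompressed page-cache data... */
        for (i = 0; i < CLUSTER_PAGES; i++)
                memcpy(work + i * PAGE_SIZE, page_address(pages[i]), PAGE_SIZE);

        /* ...and compress it into a buffer of its own, leaving the pages alone */
        clen = e2compr_compress(work, CLUSTER_PAGES * PAGE_SIZE, out, out_len);

        kfree(work);
        return clen;
}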

And, from a more general point of view, do you think our approach
has a chance of succeeding?

If you have any questions, feel free to ask for more explanations.

Thanks,

Pierre & Denis

PS: Please cc me personally on any reply; I'm not subscribed to the
list.


2001-10-03 09:05:51

by Petru Paler

Subject: Re: e2compress in kernel 2.4



--On Wednesday, October 03, 2001 10:41:14 +0200 Pierre PEIFFER
<[email protected]> wrote:

> So here we are a little confused, because we don't know where to
> introduce the compression if we keep the same idea as the 2.2
> design... On the one hand, once the buffers are compressed, the pages
> will also become compressed; on the other hand, we don't want the
> pages to be compressed, because the pages, once registered and linked
> to the inode, are supposed to stay uncompressed...

Why don't you build it on top of ext3, and do compression right before a
transaction commit?

--
Real programmers use chmod +x /dev/random and cross their fingers.

2001-10-03 12:30:43

by Padraig Brady

Subject: Re: e2compress in kernel 2.4

Would it not be better to do all the (de)compression at the page cache
level (http://linuxcompressed.sourceforge.net/)? Then you get other
advantages as well.
You would just use the compression bit in ext2 to mark blocks that
should not be decompressed before being passed down towards the disk,
and vice versa.
Note also that since ramfs uses the page cache directly, wouldn't it
get transparent compression for free?

Padraig.



2001-10-03 13:29:48

by Eric W. Biederman

Subject: Re: e2compress in kernel 2.4

Pierre PEIFFER <[email protected]> writes:

> Hi!
>
> We want to port e2compress from the 2.2 kernel series to 2.4, and
> we are looking for the right way to port the compression on the
> write path.
>
> For the read operation, we can adapt the original design: the 2.2
> part of e2compress can easily be integrated into the 2.4 version; the
> write path is a little more complicated...

I'm not certain you even want to reuse the read path as-is from 2.2.

>
> So here we are a little confused, because we don't know where to
> introduce the compression if we keep the same idea as the 2.2
> design... On the one hand, once the buffers are compressed, the pages
> will also become compressed; on the other hand, we don't want the
> pages to be compressed, because the pages, once registered and linked
> to the inode, are supposed to stay uncompressed...
>
> So our idea was to introduce the notion of a "cluster of pages", like
> the existing notion of a cluster of blocks, i.e. perform the write on
> several pages at a time and then compress the buffers corresponding to
> those pages; but here the data of the buffers would have to be split
> from the data of the pages, and that is our problem... We don't know
> how to do this. Is there a way to do it?
>

You can't reuse the page cache buffers, so some amount of double buffering
is needed. The "cluster of pages" idea is already in the e2compr on-disk
format, so it is natural. Doing the compression only at close (as is
done in the 2.0 version) may also be appropriate. In either case
what you need is an extra address_space per inode. In that extra
address_space you can keep your compressed data.
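Roughly like this (nothing of the kind exists in the stock kernel; the
names below are made up):

#include <linux/fs.h>

/*
 * Rough sketch only: a second address_space hanging off the in-core
 * inode holds the compressed data, while inode->i_mapping keeps the
 * normal uncompressed pages.  It would need its own
 * address_space_operations for reading and writing compressed
 * clusters.
 */
struct e2compr_inode_info {
        struct address_space compressed_mapping;  /* the extra mapping */
        /* per-inode cluster bookkeeping would go here */
};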

The index on the compressed data should be something like
(compressed_block * compressed_block_size) +
index_into_compressed_block.
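Spelled out as code, that is just a trivial helper (purely
illustrative, not an existing function):

/*
 * The formula above: the byte offset of a piece of data inside the
 * compressed address_space.
 */
static inline unsigned long compressed_index(unsigned long compressed_block,
                                             unsigned long compressed_block_size,
                                             unsigned long index_into_compressed_block)
{
        return compressed_block * compressed_block_size
               + index_into_compressed_block;
}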

The problems you face are similar to those faced by journaling, and
even more so by delayed disk block allocation. If you can get delayed
allocation going, there is a good chance you can reduce fragmentation
by writing the data only in compressed form, and reading and
uncompressing the data on the fly.

Note: delayed allocation is a much easier problem than journaling,
as writes may be flushed any time memory is low. Though when you
throw compression into the mix you might run into another set of problems.

2.4 should be able to handle logical disk blocks > PAGE_SIZE just
fine if your write routine can handle gathering up a couple of them.

> And, from a more general point of view, do you think our approach
> has a chance of succeeding?

I think you want to step back and understand the page cache in 2.4.
It should be much easier to work with than going through the buffer
cache was in 2.2 and earlier, but it is going to require some
noticeable algorithm changes in how reads and writes are handled.

Also please keep me in the loop. I can't commit to anything but I'm
just about interested enough to implement some of the needed changes.

Eric

2001-10-16 13:00:52

by Padraig Brady

Subject: Re: e2compress in kernel 2.4

Would it not be better to use JFFS2 and the new blkmtd driver
(which makes any block device appear as an MTD device)?
http://lists.infradead.org/pipermail/linux-mtd/2001-June/002711.html

Padraig.
