2001-11-01 17:14:44

by Roy Sigurd Karlsbakk

Subject: writing a plugin for reiserfs compression

hi

I got this idea the other day...

Novell NetWare has a feature I really like. It's a file compression
feature they've had since version 4.0 (or 4.10) of the OS.

- Once a day, a job is run to compress all files that haven't been touched
within <n> days (default 14) and that have not been flagged CAN'T COMPRESS
or DON'T COMPRESS (see below).

- After a file is compressed, the result is checked against the required
compression gain. If the gain is less than <n> per cent (default 30), the
compressed version is deleted and the file is flagged CAN'T COMPRESS.
Otherwise, the uncompressed version is deleted and the file is flagged
COMPRESSED.

- When a compressed file is accessed, it's decompressed on the fly and
flagged ACCESSED AFTER COMPRESSION. If it's accessed again within the
given <n> days (above), it's decompressed, the compressed copy is
discarded, and the COMPRESSED flag is cleared.

Files can be flagged 'DON'T COMPRESS' and 'FORCE COMPRESS' manually by the
user or admin. 'FORCE COMPRESS' is dominant over 'CAN'T COMPRESS'.
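
A minimal sketch of the flag rules described above, in Python. The constant
names, `eligible` and `compress_step` are hypothetical, zlib merely stands in
for the real compressor, and letting DON'T COMPRESS win over FORCE COMPRESS
is an assumption (the description does not say which one dominates).

```python
import zlib

# Hypothetical flag bits; the real on-disk values are not given in the text.
COMPRESSED = 0x1
CANT_COMPRESS = 0x2    # set by the system
DONT_COMPRESS = 0x4    # set by the user or admin
FORCE_COMPRESS = 0x8   # set by the user or admin


def eligible(flags, idle_days, min_idle_days=14):
    """Decide whether the nightly job should try to compress a file."""
    if flags & (COMPRESSED | DONT_COMPRESS):
        return False
    # FORCE COMPRESS dominates CAN'T COMPRESS.
    if flags & CANT_COMPRESS and not flags & FORCE_COMPRESS:
        return False
    return idle_days >= min_idle_days


def compress_step(data, flags, min_gain_pct=30):
    """Compress data; keep the result only if the gain is large enough.

    Returns (stored_bytes, new_flags).
    """
    packed = zlib.compress(data)
    gain = 100 * (len(data) - len(packed)) / len(data) if data else 0
    if gain < min_gain_pct:
        # Not worth it: discard the compressed version, remember why.
        return data, flags | CANT_COMPRESS
    return packed, (flags | COMPRESSED) & ~CANT_COMPRESS
```

The idle-time check still applies to force-compressed files here; only the
CAN'T COMPRESS veto is overridden.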

The result is that you're saving loads of space (typically 50-70% on a
netware file server) and, since the compression job is batched up
(typically by night), the performance penalty is minimal. File
decompression will happen quite rarely, as only the least-accessed files
are compressed.

TODO:
New attributes must be added somehow. 'ls', 'find', and perhaps other
tools must be modified to take advantage of this. The compression job can
be a simple script with something like

find . -type f ! --compressed ! --dont-compress / -exec fcomp {} \;

(and check can't compress and force compression).
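
The find options above do not exist yet. As a rough sketch of what such a
nightly job could look like with what is available today, the Python below
walks a tree and selects regular files by atime. The `skip` hook is a
hypothetical stand-in for the attribute checks (compressed, don't compress,
can't compress), and `fcomp` itself is not implemented here.

```python
import os
import stat
import time

DAY = 86400  # seconds


def candidates(root, min_idle_days=14, now=None, skip=lambda path: False):
    """Yield regular files under root not accessed for min_idle_days.

    `skip` is a hypothetical hook for the per-file attribute checks.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * DAY
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)          # lstat: do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_atime <= cutoff and not skip(path):
                yield path
```

Each path yielded would then be handed to the compressor, much like the
`-exec fcomp {} \;` part of the find command.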

There must be a way to access the compressed files directly to make
backups more efficient - backing up already-compressed files is a good
thing.

COMMENT:
And yes - I know a lot of people are saying this is something we don't
need, as disk space doesn't cost anything today compared to what it used
to. The first time I heard that was in '92. We're always using too much
disk space!

Please cc: to me as I'm not on the list

roy
---
Practising dyslexic.
Do not let orthographic capers overshadow
the intention behind this text.


2001-11-01 20:08:37

by Andreas Dilger

Subject: Re: writing a plugin for reiserfs compression

On Nov 01, 2001 18:14 +0100, Roy Sigurd Karlsbakk wrote:
> Novell NetWare has a feature I really like. It's a file compression
> feature they've been having since version 4.0 (or 4.10) of the OS.

Yes, there is a patch for ext2 that does this as well.

> New attributes must be added somehow. 'ls' and 'find' and perhaps other
> files must be modified to take advantage of this. The compression job can
> be a simple script with something like
>
> find . -type f ! --compressed ! --dont-compress / -exec fcomp {} \;
>
> (and check can't compress and force compression).

There already exists a patch for reiserfs which uses the same interface
to file attributes that ext2 and ext3 use.

Also, ext2 already has "compressed", "do not compress", and "dirty"
attributes. They are currently not all user-modifiable for ext2
filesystems via chattr/lsattr, but that doesn't mean they cannot be
on reiserfs.

> There must be a way to access the compressed files directly to make
> backups more efficient - backing up already compressed files's a good
> thing.

Yes, there is also such an attribute for "raw" access I think.

Making the user-space interface and tools as compatible as possible is
a good thing, IMHO, just like "ls", "cp", etc all work regardless of
the underlying filesystem.

As a note to whoever at namesys created the reiserfs patch to add the
"notail" flag (overloading the "nodump" flag): I would much rather
that a new "notail" flag be allocated for this. I will contact
Ted Ts'o to get a flag assigned. This will avoid any problems in the
future, and may also be useful at some time for ext2.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-11-01 20:17:58

by Roy Sigurd Karlsbakk

Subject: Re: writing a plugin for reiserfs compression

Andreas Dilger ([email protected]) wrote*:
>
>On Nov 01, 2001 18:14 +0100, Roy Sigurd Karlsbakk wrote:
>> Novell NetWare has a feature I really like. It's a file compression
>> feature they've been having since version 4.0 (or 4.10) of the OS.
>
>Yes, there is a patch for ext2 that does this as well.

ok...

I just thought there was a patch doing windows nt-like
compress-em-all-realtime-and-get-doomed!

>> New attributes must be added somehow. 'ls' and 'find' and perhaps other
>> files must be modified to take advantage of this. The compression job can
>> be a simple script with something like
>>
>> find . -type f ! --compressed ! --dont-compress / -exec fcomp {} \;
>>
>> (and check can't compress and force compression).

>There already exists a patch for reiserfs which uses the same interface
>to file attributes that ext2 and ext3 use.

ok? with batched compression?

>Also, ext2 already has a "compressed", "do not compress", and "dirty"
>attributes. They are currently not all user modifyable for ext2
>filesystems via chattr/lsattr, but that doesn't mean they cannot be
>on reiserfs.
>
>> There must be a way to access the compressed files directly to make
>> backups more efficient - backing up already compressed files's a good
>> thing.
>
>Yes, there is also such an attribute for "raw" access I think.
>
>Making the user-space interface and tools as compatible as possible is
>a good thing, IMHO, just like "ls", "cp", etc all work regardless of
>the underlying filesystem.

yes, but it's kinda nice for a sysadmin to have some way of checking a
file's attributes...

>As a note to whoever at namesys created the reiserfs patch to add the
>"notail" flag (overloading the "nodump" flag). I would much rather
>that a new "notail" flag be allocated for this. I will contact Ted
>Ted Ts'o to get a flag assigned. This will avoid any problems in the
>future, and may also be useful at some time for ext2.

Do you think the other flags I mentioned may be useful?

roy

2001-11-01 21:15:59

by Andreas Dilger

Subject: Re: writing a plugin for reiserfs compression

On Nov 01, 2001 20:17 +0000, Roy Sigurd Karlsbakk wrote:
> Andreas Dilger ([email protected]) wrote*:
> >Yes, there is a patch for ext2 that does this as well.
>
> I just thought there was a patch doing windows nt-like
> compress-em-all-realtime-and-get-doomed!

I don't know what the actual heuristics are for determining which files are
compressed with the ext2 patch. It is definitely NOT a compressed
block device. Files are compressed in chunks (32kB?), so that it is
possible to seek and do read-modify-write (e.g. appending to a file)
without decompressing the entire file and/or recompressing it. This also
protects against block corruption, since you would limit the amount of
data lost to the end of the chunk after the bad spot.
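
To illustrate the chunking idea only (this is not the ext2 patch's code or
on-disk format), here is a small Python sketch that compresses each 32kB
chunk independently with zlib. The function names and the list-of-chunks
layout are assumptions made for the example.

```python
import zlib

CHUNK = 32 * 1024  # chunk size guessed above


def compress_chunks(data, chunk=CHUNK):
    """Compress each chunk on its own so any one can be decoded alone."""
    return [zlib.compress(data[i:i + chunk])
            for i in range(0, len(data), chunk)]


def read_range(chunks, offset, length, chunk=CHUNK):
    """Read bytes at offset, decompressing only the chunks touched."""
    out = bytearray()
    end = offset + length
    first, last = offset // chunk, (end - 1) // chunk
    for idx in range(first, min(last, len(chunks) - 1) + 1):
        plain = zlib.decompress(chunks[idx])
        lo = max(offset - idx * chunk, 0)
        hi = min(end - idx * chunk, len(plain))
        out += plain[lo:hi]
    return bytes(out)


def append(chunks, more, chunk=CHUNK):
    """Append by rewriting only the last (possibly partial) chunk."""
    tail = zlib.decompress(chunks.pop()) if chunks else b""
    chunks.extend(compress_chunks(tail + more, chunk))
```

A seek or an append touches at most the chunks that overlap the request, and
a damaged chunk costs only the data inside that chunk.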

> >> New attributes must be added somehow. 'ls' and 'find' and perhaps other
> >> files must be modified to take advantage of this. The compression job can
> >> be a simple script with something like
> >>
> >> find . -type f ! --compressed ! --dont-compress / -exec fcomp {} \;
> >>
> >> (and check can't compress and force compression).
>
> > There already exists a patch for reiserfs which uses the same interface
> > to file attributes that ext2 and ext3 use.
>
> ok? with batched compression?

There is no compression in ext2 (without the patch), and none at
all in reiserfs (AFAIK). All the patch does is give reiserfs the ABILITY
to store per-inode attributes in a way compatible with the existing ext2
attributes. Since most people already have e2fsprogs installed (fsck,
lsattr, chattr live there) it makes sense to use the same user-space
tools to do the same thing, and even the same ioctl numbers/flag values.

> >Also, ext2 already has a "compressed", "do not compress", and "dirty"
> >attributes. They are currently not all user modifyable for ext2
> >filesystems via chattr/lsattr, but that doesn't mean they cannot be
> >on reiserfs.
>
> yes, but it's kinda nice to have some way of checking a file's attributes
> for a sysadmin...

That's what "lsattr" is for. All I was getting at (when mentioning "ls"
and "cp") is that where possible the user tools should be compatible,
even if the underlying filesystem is different. I would rather avoid the
case (currently being worked on) with ext2 and XFS ACL user tools being
different, using different ioctls, but doing 99% of the same function.
It is ugly to think of separate "reiserattr", "xfsattr", "chattr/lsattr (ext2)",
"getacl/setacl", and "xfsacl" commands instead of a single set of commands
(and even kernel API) that hides the details from the user.

> Do you think the other flags I mentioned may be useful?

Yes, definitely disabling compression for a file is good. The "accessed
in last 7 days flag" is questionable. This could be determined via the
atime on Unix and doesn't need a separate flag. Also, the difference
between "do not compress" and "can't compress" is very small. If it is
found that the file is incompressible, you could just as easily set the
"do not compress" flag.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-11-01 21:27:59

by Roy Sigurd Karlsbakk

Subject: Re: writing a plugin for reiserfs compression

> > I just thought there was a patch doing windows nt-like
> > compress-em-all-realtime-and-get-doomed!
>
> I don't know what the actual heuristics for determining which files are
> compression with the ext2 patch. It is definitely NOT a compressed
> block device. Files are compressed in chunks (32kB?), so that it is
> possible to seek and do read-modify-write (e.g. appending to a file)
> without decompressing the entire file and/or recompressing it. This also
> protects against block corruption, since you would limit the amount of
> data lost to the end of the chunk after the bad spot.

But still... Are the files compressed as they are created/modified on
the filesystem? My main point was to avoid the compression overhead and
just decompress the file at access time if it's compressed. Compression
should (IMO) be done nightly.

Perhaps a file should be decompressed when it's modified and either (a)
be scheduled for the next nightly compression or (b) stay uncompressed for
the next <n> days. I mean - as a file is being modified, the chance is large
that the file will be accessed pretty soon...

> Yes, definitely disabling compression for a file is good. The "accessed
> in last 7 days flag" is questionable. This could be determined via the
> atime on Unix and doesn't need a separate flag. Also, the difference
> between "do not compress" and "can't compress" is very small. If it is
> found that the file is incompressible, you could just as easily set the
> "do not compress" flag.

I agree on the 'accessed in last <n> days'. It'd be better to check atime.

I'd still like to separate 'do not compress' and 'can't compress', so as to
show why the flag has been set - the former is set by the admin and the
latter by the system.

roy

2001-11-01 21:39:38

by Padraig Brady

Subject: Re: writing a plugin for reiserfs compression

Roy Sigurd Karlsbakk wrote:

>>>I just thought there was a patch doing windows nt-like
>>>compress-em-all-realtime-and-get-doomed!
>>>
>>I don't know what the actual heuristics for determining which files are
>>compression with the ext2 patch. It is definitely NOT a compressed
>>block device. Files are compressed in chunks (32kB?), so that it is
>>possible to seek and do read-modify-write (e.g. appending to a file)
>>without decompressing the entire file and/or recompressing it. This also
>>protects against block corruption, since you would limit the amount of
>>data lost to the end of the chunk after the bad spot.


It's a compilation option to never compress gz files. I guess it would
be easy to add others (bz2, Z, jpg, mp3, ...) to the list? The max
chunk used for compression is 32KB and again this is configurable, as
is the compression method/level used. The algorithms are gzip (1-9),
LZO, bz2.


>>
>
> But still... Are the files are compressed as they are created/modified on
> the filesystem?


Only if the file is in a directory with the +c attribute set.
You can have full control over the compression.

Note this transparent ext2 compression patch is only available for 2.2

Padraig.

2001-11-01 21:44:18

by Roy Sigurd Karlsbakk

Subject: Re: writing a plugin for reiserfs compression

> >
> > But still... Are the files are compressed as they are created/modified on
> > the filesystem?
>
> Only if the file is in a directory with the +c attribute set.
> You can have full control over the compression.

Could this be modified to the 'wait <n> days' concept I mentioned earlier?
I mean... Don't let the kernel modify them - ever. Let some cron-job do
it...

> Note this transparent ext2 compression patch is only available for 2.2

Would it be hard to port to 2.4?

2001-11-01 22:55:46

by Hans Reiser

Subject: Re: writing a plugin for reiserfs compression

Basically a good idea, lots of people have agreed with you for years on it.
Wait until Sep. 30, 2002, and it will become especially easy to write this for
reiserfs using reiser4 plugins. Probably not very hard to write using reiser3
though. If you don't want to wait, I will review any patch you create for
reiser3, and express an opinion when I see it. Clean and simple is more
important to getting it accepted by me than optimal, at least for reiser3. Your
code will be easy to author, except for the decompression upon read, which is
probably not that hard. I think we can spare three bits for you in the stat
data. Your code will get tested in 2.5 before being sent in for 2.4 users if
you do it for v3.

Hans


Roy Sigurd Karlsbakk wrote:
>
> hi
>
> I got this idea the other day...
>
> Novell NetWare has a feature I really like. It's a file compression
> feature they've been having since version 4.0 (or 4.10) of the OS.
>
> - Once a day, a job is run to compress all files that havent been touched
> within <n> days - default 14, that have not been flagged CAN'T COMPRESS
> or DON'T COMPRESS (see below).
>
> - After the file is compressed, it's checked against the compression
> gain. If this is less than <n> per cent (default 30), the compressed
> version is being deleted and the file is flagged CAN'T COMPRESS. If the
> file is compressed, the uncompressed version is being deleted and the file
> is flagged COMPRESSED.
>
> - When a compressed file is accessed, it'll be decompressed on the fly and
> flagged ACCESSED AFTER COMPRESSION. The next time it's accessed within the
> given <n> days (above), it's decompressed and the compressed file
> discarded. The flag COMPRESSED is cleared.
>
> Files can be flagged 'DON'T COMPRESS' and 'FORCE COMPRESS' manually by the
> user or admin. 'FORCE COMPRESS' is dominant over 'CAN'T COMPRESS'.
>
> The result is that you're saving loads of space (typically 50-70% on a
> netware file server) and, since the compression job is batched up
> (typically by night), the performance penalty is minimal. File
> decompression will happen quite rarely, as only the least-accessed files
> are compressed.
>
> TODO:
> New attributes must be added somehow. 'ls' and 'find' and perhaps other
> files must be modified to take advantage of this. The compression job can
> be a simple script with something like
>
> find . -type f ! --compressed ! --dont-compress / -exec fcomp {} \;
>
> (and check can't compress and force compression).
>
> There must be a way to access the compressed files directly to make
> backups more efficient - backing up already compressed files's a good
> thing.
>
> COMMENT:
> And yes - I know a lot of people are saying this is something we don't
> need, as diskspace doesn't cost anything today compared to what it used
> to. The first time I heard that, was in '92. We're always using too much
> diskspace!
>
> Please cc: to me as I'm not on the list
>
> roy
> ---
> Praktiserende dyslektiker.
> La ikke ortografiske krumspring skygge for
> intensjonen bak denne fremstilling.

2001-11-01 23:02:07

by Hans Reiser

Subject: Re: writing a plugin for reiserfs compression

Andreas Dilger wrote:

> As a note to whoever at namesys created the reiserfs patch to add the
> "notail" flag (overloading the "nodump" flag). I would much rather
> that a new "notail" flag be allocated for this. I will contact Ted
> Ted Ts'o to get a flag assigned. This will avoid any problems in the
> future, and may also be useful at some time for ext2.

Sounds correct to do to me.

Hans

2001-11-01 23:10:27

by Hans Reiser

Subject: Re: writing a plugin for reiserfs compression

Andreas Dilger wrote:
>
> On Nov 01, 2001 18:14 +0100, Roy Sigurd Karlsbakk wrote:
> > Novell NetWare has a feature I really like. It's a file compression
> > feature they've been having since version 4.0 (or 4.10) of the OS.
>
> Yes, there is a patch for ext2 that does this as well.

We try to be aggressive about merging patches in at Namesys. If you are
generous enough to write a patch for ReiserFS, we will either tell you it is not
good, or take it, in a timely manner. Anyone for whom this does not happen should
complain to me and I will delve into who forgot it and motivate them.

Hans

2001-11-01 23:17:38

by Hans Reiser

Subject: Re: writing a plugin for reiserfs compression

Roy Sigurd Karlsbakk wrote:
>
> > > I just thought there was a patch doing windows nt-like
> > > compress-em-all-realtime-and-get-doomed!
> >
> > I don't know what the actual heuristics for determining which files are
> > compression with the ext2 patch. It is definitely NOT a compressed
> > block device. Files are compressed in chunks (32kB?), so that it is
> > possible to seek and do read-modify-write (e.g. appending to a file)
> > without decompressing the entire file and/or recompressing it. This also
> > protects against block corruption, since you would limit the amount of
> > data lost to the end of the chunk after the bad spot.
>
> But still... Are the files are compressed as they are created/modified on
> the filesystem? My main point was to avoid the compression overhead and
> just decompress the file at access time if it's compressed. Compression
> should (IMO) be done nightly.
>
> Perhaps a file should be decompressed when it's modified and either (a)
> set scheduled to next nightly compression or (b) stay uncompressed the
> next <n> days. I mean - as a file is being modified, the chance is large
> that the file will be accessed pretty soon...
>
> > Yes, definitely disabling compression for a file is good. The "accessed
> > in last 7 days flag" is questionable. This could be determined via the
> > atime on Unix and doesn't need a separate flag. Also, the difference
> > between "do not compress" and "can't compress" is very small. If it is
> > found that the file is incompressible, you could just as easily set the
> > "do not compress" flag.
>
> I agree on the 'accessed in last <n> days'. It'd be better to check atime.
>
> I'd still like to separate 'do not compress' and 'can't compress', as to
> show why the falg has been set - the former is set by the admin and the
> latter by the system.
>
> roy

I think you'll find it easiest to code it such that doing anything to the file
body, reading or writing, decompresses it. Fancier stuff can wait for later
versions.
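
A toy model of that simplest behaviour, in Python: the file stays compressed
until its body is touched, and any read or write decompresses the whole
thing first. `LazyFile` is a hypothetical in-memory stand-in, not reiserfs
code, and zlib is used only for illustration.

```python
import zlib


class LazyFile:
    """Toy in-memory file: stays compressed until its body is touched."""

    def __init__(self, data):
        self._body = zlib.compress(data)
        self.compressed = True

    def _touch(self):
        # Any access to the body decompresses the whole file first.
        if self.compressed:
            self._body = zlib.decompress(self._body)
            self.compressed = False

    def read(self, offset, length):
        self._touch()
        return self._body[offset:offset + length]

    def write(self, offset, data):
        self._touch()
        body = bytearray(self._body)
        if offset > len(body):
            # Pad the gap with zero bytes when writing past the end.
            body.extend(bytes(offset - len(body)))
        body[offset:offset + len(data)] = data
        self._body = bytes(body)
```

Recompression is left entirely to a later batch job, which keeps the access
path trivial.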

Hans

2001-11-02 09:24:47

by Robert Varga

Subject: Re: writing a plugin for reiserfs compression

On Thu, Nov 01, 2001 at 10:43:37PM +0100, Roy Sigurd Karlsbakk wrote:
> > Note this transparent ext2 compression patch is only available for 2.2
>
> Would it be hard to port to 2.4?

AFAIK kinda yes. It relies on I/O going through the block buffer cache:
the buffer cache contains the compressed data, while the page cache has it
all decompressed. This avoids excessive (de)compression (you need to
compress data only when committing a page to disk).
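
The split can be pictured with a toy Python model in which one dictionary
plays the role of the buffer cache (compressed) and another the page cache
(plain). `CompressedStore` and its fields are invented for the example and
do not correspond to kernel structures.

```python
import zlib


class CompressedStore:
    """Toy model: `disk` holds compressed chunks, `pages` holds plain data."""

    def __init__(self):
        self.disk = {}         # chunk index -> compressed bytes
        self.pages = {}        # chunk index -> decompressed bytes
        self.dirty = set()     # chunks modified since the last commit
        self.compressions = 0  # counts how often we actually compressed

    def read(self, idx):
        if idx not in self.pages:
            # Cache miss: decompress once, then serve from the page copy.
            self.pages[idx] = zlib.decompress(self.disk[idx])
        return self.pages[idx]

    def write(self, idx, data):
        # Writes only touch the plain copy; no compression happens here.
        self.pages[idx] = data
        self.dirty.add(idx)

    def commit(self):
        # Compress only when a dirty page is committed to disk.
        for idx in sorted(self.dirty):
            self.disk[idx] = zlib.compress(self.pages[idx])
            self.compressions += 1
        self.dirty.clear()
```

Three writes to the same chunk followed by one commit cost a single
compression in this model.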

With 2.4, ext2 moved (almost?) entirely out of the buffer cache, so you'll
need to create your own I/O buffers. This is the only solution I came up
with. Is there some other approach to cope with this problem?

--
Kind regards,
Robert Varga
------------------------------------------------------------------------------
[email protected] http://hq.sk/~nite/gpgkey.txt



2001-11-02 09:32:17

by Nikita Danilov

Subject: Re: writing a plugin for reiserfs compression

Andreas Dilger writes:
> On Nov 01, 2001 18:14 +0100, Roy Sigurd Karlsbakk wrote:
> > Novell NetWare has a feature I really like. It's a file compression
> > feature they've been having since version 4.0 (or 4.10) of the OS.
>
> Yes, there is a patch for ext2 that does this as well.
>
> > New attributes must be added somehow. 'ls' and 'find' and perhaps other
> > files must be modified to take advantage of this. The compression job can
> > be a simple script with something like
> >
> > find . -type f ! --compressed ! --dont-compress / -exec fcomp {} \;
> >
> > (and check can't compress and force compression).
>
> There already exists a patch for reiserfs which uses the same interface
> to file attributes that ext2 and ext3 use.
>
> Also, ext2 already has a "compressed", "do not compress", and "dirty"
> attributes. They are currently not all user modifyable for ext2
> filesystems via chattr/lsattr, but that doesn't mean they cannot be
> on reiserfs.
>
> > There must be a way to access the compressed files directly to make
> > backups more efficient - backing up already compressed files's a good
> > thing.
>
> Yes, there is also such an attribute for "raw" access I think.
>
> Making the user-space interface and tools as compatible as possible is
> a good thing, IMHO, just like "ls", "cp", etc all work regardless of
> the underlying filesystem.
>
> As a note to whoever at namesys created the reiserfs patch to add the
> "notail" flag (overloading the "nodump" flag). I would much rather

It was me. I agree completely that allocating a new flag would be better. I
just wanted "notail" to actually work and be accessible through standard
utilities, because it's really useful. "nodump" looked like the least useful
of the flags to me, because dump(8) doesn't work with reiserfs (not that it
worked with ext2 reliably either). I actually tried to contact Remy Card
and Theodore Tso, to discuss how [ls|ch]attr can be modified to support
different file-systems, but to no avail.

> that a new "notail" flag be allocated for this. I will contact Ted
> Ted Ts'o to get a flag assigned. This will avoid any problems in the
> future, and may also be useful at some time for ext2.

I would rather see lsattr/chattr become file-system
independent. This requires that all file-systems use the same ioctl cmds
to set and get bitmasks associated with inodes and somehow provide a
mapping between the symbolic name of an attribute and its bitmask. Support
for octal bitmasks (a la chmod) in chattr is also an option.
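
A sketch of such a mapping, in Python. The table, the spec syntax
('+name', '-name', '=name,name', or an octal number) and the function names
are all hypothetical; the three bit values shown are the existing ext2 ones,
and a real tool would obtain the table from the file-system somehow.

```python
# Hypothetical per-file-system table: symbolic attribute name -> bit.
ATTRS = {"compressed": 0x004, "nodump": 0x040, "nocompress": 0x400}


def apply_spec(current, spec, table=ATTRS):
    """Apply a chattr-like spec to a bitmask and return the new bitmask."""
    if spec[0].isdigit():
        return int(spec, 8)          # octal bitmask, a la chmod
    op, names = spec[0], spec[1:].split(",")
    if op not in ("+", "-", "="):
        raise ValueError("spec must start with +, -, = or a digit")
    mask = 0
    for name in names:
        mask |= table[name]          # KeyError for unknown attribute names
    if op == "+":
        return current | mask
    if op == "-":
        return current & ~mask
    return mask                      # '=' replaces the whole bitmask


def describe(mask, table=ATTRS):
    """List the symbolic names set in mask, ordered by bit value."""
    return [name for name, bit in sorted(table.items(), key=lambda kv: kv[1])
            if mask & bit]
```

With the table supplied per file-system, the parsing and printing code stays
the same for every file-system.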

>
> Cheers, Andreas

Nikita.

> --
> Andreas Dilger
> http://sourceforge.net/projects/ext2resize/
> http://www-mddsp.enel.ucalgary.ca/People/adilger/
>
> -
--
I regret that you do not very much trust my signature, on the pretext
that we might be several. // J. Derrida, The Post Card.

2001-11-09 23:28:18

by Andreas Dilger

Subject: Re: writing a plugin for reiserfs compression

Nikita <[email protected]> writes:
> Andreas Dilger writes:
> > As a note to whoever at namesys created the reiserfs patch to add the
> > "notail" flag (overloading the "nodump" flag). I would much rather
>
> It was me. Agree completely that allocating new flag would be better. I
> just wanted "notail" to actually work and be accessible through standard
> utilities, because it's really useful. "nodump" looked like least useful
> of flags for me, because dump(8) doesn't work with reiserfs (not that it
> worked with ext2 reliably either). I actually tried to contact Remy Card
> and Theodore Tso, to discuss how [ls|ch]attr can be modified to support
> different file-systems, but to no avail.
>
> > that a new "notail" flag be allocated for this. I will contact Ted
> > Ted Ts'o to get a flag assigned. This will avoid any problems in the
> > future, and may also be useful at some time for ext2.

OK, FYI Nikita, Ted has allocated an EXT2_NOTAIL_FL flag for chattr/lsattr
(value 0x00008000) which can be used for setting files/directories to be
permanently notail. It is obviously up to the reiserfs code to handle
this flag and inherit it for files created in a directory (e.g. /boot),
but starting with e2fsprogs 1.26, chattr/lsattr will be able to set/get
this flag on ext2/ext3 (and reiserfs with your attributes patch).

> I would rather like to see lsattr/chattr to become file-system
> independent. This requires that all file-systems use the same ioctl cmds
> to set and get bitmasks associated with inodes and provide somehow a
> mapping between symbolic name of an attribute and bitmask. Support for
> octal bitmask (a la chmod) in chattr is also an option.

There is nothing really ext2-specific to the chattr/lsattr programs. Yes,
they use an ioctl and flag values assigned to ext2, but as you have shown
it is also possible to use this ioctl on reiserfs without any problems.
These commands are for simple file attributes only. If reiserfs has a need
for specific attributes, then Ted can probably allocate a fs-specific range.
If you want to store the values in a different format, you can always map in
the ioctl, although I don't see a real need for that right now.

For more complex extended attributes, there are [gs]etextattr and [gs]etacl
commands (I think) which the ext2 EA/ACL code uses, but again this
is not ext2 specific. The author (Andreas Gruenbacher) is working
with the XFS folks to support a common kernel API and allow the same
user-space tools to work, even if the fs-internal and on-disk EA/ACL formats
are different. However, I know for extended attributes that Hans has
other plans (reiser4/sandbox syscall) so I don't know if this will be
useful to you. Maybe still yes, if the same user programs can interface
with the reiserfs syscall, and it may still be useful for ACL support.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/