2000-11-20 15:18:32

by Roy Sigurd Karlsbakk

Subject: ext2 compression: How about using the Netware principle?

Hi

After some years of experience with Novell NetWare, I've been wondering why
the (unused?) file system compression mechanism in ext2 is based on
doing realtime compression. For compression to be efficient, it can't be
done this simply. Let's look at the type of volume (file system)
compression introduced with Novell NetWare 4.0 around '94 (a rough
userspace sketch follows the list):

- A file is saved to disk.
- If the file isn't touched (read or written to) within <n> days
(default 14), the file is compressed.
- If the file doesn't compress by more than <n> percent (default 20), the
file is flagged "can't compress".
- All file compression is done at low-traffic times (default between
00:00 and 06:00 hours).
- The first time a file is read or written to within the <n> days
interval mentioned above, the file is addressed using realtime
[de]compression. The second time, the file is decompressed and committed
to disk (uncompressed).
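
To make that concrete, here is a rough userspace sketch of the policy.
It is only an illustration: it fakes the compression by replacing files
with ".gz" copies, keeps the "can't compress" flags in an in-memory set,
and the thresholds simply mirror the defaults above. A real
implementation would of course be transparent inside the file system.

#!/usr/bin/env python3
# Sketch of a NetWare-style offline compression pass (illustration only).
import gzip
import os
import shutil
import time

AGE_DAYS = 14      # compress files untouched for this many days
MIN_GAIN = 0.20    # skip files that shrink by less than 20 percent

def compress_pass(root, cant_compress):
    cutoff = time.time() - AGE_DAYS * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path in cant_compress or path.endswith(".gz"):
                continue
            st = os.stat(path)
            # "touched" means read or written, so check atime and mtime
            if max(st.st_atime, st.st_mtime) > cutoff or st.st_size == 0:
                continue
            tmp = path + ".gz.tmp"
            with open(path, "rb") as src, gzip.open(tmp, "wb") as dst:
                shutil.copyfileobj(src, dst)
            gain = 1.0 - os.path.getsize(tmp) / st.st_size
            if gain < MIN_GAIN:
                os.unlink(tmp)                 # not worth it: flag the file
                cant_compress.add(path)
            else:
                os.rename(tmp, path + ".gz")   # compressed copy replaces it
                os.unlink(path)

Run from cron in the 00:00-06:00 window, this is also more or less what
the cron-job question further down amounts to.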

Results:
A minimum of CPU time is wasted compressing/decompressing files.
The average server I've worked with has an effective compression of
somewhere between 30 and 100 per cent.

PS: This functionality was even scheduled for Win2k, but it got lost
somewhere along the way... I don't know where...

Questions:
I'm really not a kernel hacker, but really...
- The daily (or nightly) compression job can run as a cron job. This can
be a normal user process running as root. Am I right?
- The decompress-and-perhaps-commit-decompressed-to-disk process should
be done by a kernel process within (or beside) the file system.
- The M$ folks will have even more problems bragging about a less useful
product.

Please CC: to me, as I'm not on the list

Regards

Roy Sigurd Karlsbakk


2000-11-21 23:02:26

by Jorge Nerin

Subject: Re: ext2 compression: How about using the Netware principle?

Roy Sigurd Karlsbakk wrote:
>
> Hi
>
> After some years of experience with Novell NetWare, I've been wondering why
> the (unused?) file system compression mechanism in ext2 is based on
> doing realtime compression. For compression to be efficient, it can't be
> done this simply. Let's look at the type of volume (file system)
> compression introduced with Novell NetWare 4.0 around '94:
>
> - A file is saved to disk.
> - If the file isn't touched (read or written to) within <n> days
> (default 14), the file is compressed.
> - If the file doesn't compress by more than <n> percent (default 20), the
> file is flagged "can't compress".
> - All file compression is done at low-traffic times (default between
> 00:00 and 06:00 hours).
> - The first time a file is read or written to within the <n> days
> interval mentioned above, the file is addressed using realtime
> [de]compression. The second time, the file is decompressed and committed
> to disk (uncompressed).
>
> Results:
> A minimum of CPU time is wasted compressing/decompressing files.
> The average server I've worked with has an effective compression of
> somewhere between 30 and 100 per cent.
>
> PS: This functionality was even scheduled for Win2k, but it got lost
> somewhere along the way... I don't know where...
>
> Questions:
> I'm really not a kernel hacker, but really...
> - The daily (or nightly) compression job can run as a cron job. This can
> be a normal user process running as root. Am I right?
> - The decompress-and-perhaps-commit-decompressed-to-disk process should
> be done by a kernel process within (or beside) the file system.
> - The M$ folks will have even more problems bragging about a less useful
> product.
>
> Please CC: to me, as I'm not on the list
>
> Regards
>
> Roy Sigurd Karlsbakk
>

Well, filesystem compression has been in NT since 4.0; in fact you can
compress a file, a directory, or the whole partition, but only under NTFS.
I believe it's [un]compressed on the fly, but I'm not sure about that.

The [un]compression mechanism could be a kernel thread calling a
userspace program (gzip, bzip2, definable) to do the actual
decompression.
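
Purely as an illustration of the "definable" part (whether a kernel
thread should really exec a userspace helper for every access is another
matter), the userspace side could be as simple as shelling out to the
configured tool -- the table and names here are made up:

import subprocess

# Site-definable decompressors, as suggested above (hypothetical table).
DECOMPRESSORS = {
    ".gz":  ["gzip", "-dc"],    # -d decompress, -c write to stdout
    ".bz2": ["bzip2", "-dc"],
}

def decompress_to(path, out_path):
    # Run the configured external decompressor and capture its output.
    for suffix, cmd in DECOMPRESSORS.items():
        if path.endswith(suffix):
            with open(out_path, "wb") as out:
                subprocess.run(cmd + [path], stdout=out, check=True)
            return
    raise ValueError("no decompressor configured for " + path)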

Don't know, just thoughts.

--
Jorge Nerin
<[email protected]>

2000-11-23 10:20:04

by Pavel Machek

Subject: Re: ext2 compression: How about using the Netware principle?

Hi!

> - A file is saved to disk.
> - If the file isn't touched (read or written to) within <n> days
> (default 14), the file is compressed.
> - If the file doesn't compress by more than <n> percent (default 20), the
> file is flagged "can't compress".
> - All file compression is done at low-traffic times (default between
> 00:00 and 06:00 hours).
> - The first time a file is read or written to within the <n> days
> interval mentioned above, the file is addressed using realtime
> [de]compression. The second time, the file is decompressed and committed
> to disk (uncompressed).

Oops, that means that merely reading a file followed by a powerfail can
lead to you losing the file. Oops.

Besides: you can do this in userspace with existing e2compr. Should take
less than 2 days to implement.

> Results:
> A minimum of CPU time is wasted compressing/decompressing files.
> The average server I've worked with has an effective compression of
> somewhere between 30 and 100 per cent.

Results: a NOP on machines that are never on at that time, random corruption
after a powerfail between 0:00 and 6:00, ... Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2000-11-23 11:37:39

by Anders K. Pedersen

Subject: Re: ext2 compression: How about using the Netware principle?

Pavel Machek wrote:
> > - A file is saved to disk.
> > - If the file isn't touched (read or written to) within <n> days
> > (default 14), the file is compressed.
> > - If the file doesn't compress by more than <n> percent (default 20), the
> > file is flagged "can't compress".
> > - All file compression is done at low-traffic times (default between
> > 00:00 and 06:00 hours).
> > - The first time a file is read or written to within the <n> days
> > interval mentioned above, the file is addressed using realtime
> > [de]compression. The second time, the file is decompressed and committed
> > to disk (uncompressed).

Also, if less than <n> percent of the volume is free, files will not be
decompressed, and compressed files can only be addressed through
realtime decompression.

> Oops, that means that merely reading a file followed by a powerfail can
> lead to you losing the file. Oops.

That is of course not the case. When a file is decompressed (or
compressed), a new file is created, and once the (de)compression is
completed, a delete and rename are performed. If I remember correctly,
this is transaction based, so the file will not be lost in case a
powerfail should occur.

Regards,
Anders K. Pedersen

2000-11-23 13:19:09

by Roy Sigurd Karlsbakk

Subject: Re: ext2 compression: How about using the Netware principle?

> > - A file is saved to disk.
> > - If the file isn't touched (read or written to) within <n> days
> > (default 14), the file is compressed.
> > - If the file doesn't compress by more than <n> percent (default 20), the
> > file is flagged "can't compress".
> > - All file compression is done at low-traffic times (default between
> > 00:00 and 06:00 hours).
> > - The first time a file is read or written to within the <n> days
> > interval mentioned above, the file is addressed using realtime
> > [de]compression. The second time, the file is decompressed and committed
> > to disk (uncompressed).
>
> Oops, that means that merely reading a file followed by a powerfail can
> lead to you losing the file. Oops.

eh.. don't think so.
READ
DECOMPRESS
WRITE
SYNC
DELETE OLD COMPRESSED FILE
or something
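
Spelled out (just a sketch, assuming the compressed copy is a plain gzip
file living next to where the uncompressed one should end up), the point
is that the decompressed data is synced before the old copy goes away:

import gzip
import os
import shutil

def decompress_and_commit(gz_path):
    # READ + DECOMPRESS + WRITE to a temporary file, SYNC, then switch over.
    plain_path = gz_path[:-len(".gz")]
    tmp_path = plain_path + ".tmp"
    with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)   # READ + DECOMPRESS + WRITE
        dst.flush()
        os.fsync(dst.fileno())         # SYNC before touching the old file
    os.rename(tmp_path, plain_path)    # atomic within one filesystem
    os.unlink(gz_path)                 # DELETE OLD COMPRESSED FILE
    return plain_path

If the power fails between the rename and the unlink you end up with both
copies, but never with neither, which is Anders' point about it being
transaction based.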

> Besides: you can do this in userspace with existing e2compr. Should take
> less than 2 days to implement.

ok
never seen that...

> > Results:
> > A minimum of CPU time is wasted compressing/decompressing files.
> > The average server I've worked with has an effective compression of
> > somewhere between 30 and 100 per cent.
>
> Results: a NOP on machines that are never on at that time, random corruption
> after a powerfail between 0:00 and 6:00, ... Pavel

I'm talking about file servers. Not merely a bloody PC. On a PC, hard disk
space doesn't really cost anything and you can manually compress what you're
not using.

roy

2000-11-24 11:42:44

by Pavel Machek

Subject: Re: ext2 compression: How about using the Netware principle?

Hi!

> > Besides: you can do this in userspace with existing e2compr. Should take
> > less than 2 days to implement.
>
> ok
> never seen that...

e2compr is a suite of patches that do online compression. It is pretty
mature. You should take a look; turning online compression into
offline compression is pretty trivial.
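
Presumably the userspace half of such an offline scheme is just a nightly
walk that sets the ext2 'c' (compress) attribute on stale files with
chattr. A sketch, assuming an e2compr-patched kernel honours the
attribute (a stock kernel stores it but ignores it):

import os
import subprocess
import time

AGE_DAYS = 14   # same "untouched for n days" threshold as before

def mark_stale_files(root):
    cutoff = time.time() - AGE_DAYS * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if max(st.st_atime, st.st_mtime) < cutoff:
                # chattr +c marks the file for compression under e2compr
                subprocess.run(["chattr", "+c", path], check=False)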

Pavel
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]