2007-10-14 19:35:07

by Justin Piszcz

Subject: In response to kernel compression e-mail a few months ago.

It turns out the one I did not test was actually the best:

Used: 7z -mx=9 a linux-2.6.16.17.tar.7z linux-2.6.16.17.tar

$ du -sk * | sort -n
32392 linux-2.6.16.17.tar.7z
33520 linux-2.6.16.17.tar.lzma
33760 linux-2.6.16.17.tar.rar
38064 linux-2.6.16.17.tar.rz
39472 linux-2.6.16.17.tar.szip
39520 linux-2.6.16.17.tar.bz
39936 linux-2.6.16.17.tar.bz2
40000 linux-2.6.16.17.tar.bicom
40656 linux-2.6.16.17.tar.sit
47664 linux-2.6.16.17.tar.lha
49968 linux-2.6.16.17.tar.dzip
50000 linux-2.6.16.17.tar.gz
51344 linux-2.6.16.17.tar.arj
57552 linux-2.6.16.17.tar.lzo
57984 linux-2.6.16.17.tar.F
81136 linux-2.6.16.17.tar.Z
94544 linux-2.6.16.17.tar.zoo
101216 linux-2.6.16.17.tar.arc
228608 linux-2.6.16.17.tar

$ du -sh * | sort -n
32M linux-2.6.16.17.tar.7z
33M linux-2.6.16.17.tar.lzma
33M linux-2.6.16.17.tar.rar
37M linux-2.6.16.17.tar.rz
39M linux-2.6.16.17.tar.bicom
39M linux-2.6.16.17.tar.bz
39M linux-2.6.16.17.tar.bz2
39M linux-2.6.16.17.tar.szip
40M linux-2.6.16.17.tar.sit
47M linux-2.6.16.17.tar.lha
49M linux-2.6.16.17.tar.dzip
49M linux-2.6.16.17.tar.gz
50M linux-2.6.16.17.tar.arj
56M linux-2.6.16.17.tar.lzo
57M linux-2.6.16.17.tar.F
79M linux-2.6.16.17.tar.Z
92M linux-2.6.16.17.tar.zoo
99M linux-2.6.16.17.tar.arc
223M linux-2.6.16.17.tar




2007-10-14 19:46:25

by Jan Engelhardt

Subject: Re: In response to kernel compression e-mail a few months ago.


On Oct 14 2007 15:34, Justin Piszcz wrote:
>
> It turns out the one I did not test was actually the best:
>
> Used: 7z -mx=9 a linux-2.6.16.17.tar.7z linux-2.6.16.17.tar
>
> $ du -sk * | sort -n
> 32392 linux-2.6.16.17.tar.7z
> 33520 linux-2.6.16.17.tar.lzma
> 33760 linux-2.6.16.17.tar.rar
> 38064 linux-2.6.16.17.tar.rz
> 39472 linux-2.6.16.17.tar.szip
> 39520 linux-2.6.16.17.tar.bz
> 39936 linux-2.6.16.17.tar.bz2
> 40000 linux-2.6.16.17.tar.bicom
> 40656 linux-2.6.16.17.tar.sit
> 47664 linux-2.6.16.17.tar.lha
> 49968 linux-2.6.16.17.tar.dzip
> 50000 linux-2.6.16.17.tar.gz
> 51344 linux-2.6.16.17.tar.arj
> 57552 linux-2.6.16.17.tar.lzo
> 57984 linux-2.6.16.17.tar.F
> 81136 linux-2.6.16.17.tar.Z
> 94544 linux-2.6.16.17.tar.zoo
> 101216 linux-2.6.16.17.tar.arc
> 228608 linux-2.6.16.17.tar

What's with all these odd formats, and where is .zip? :)
Somehow... have you tried lrzip?
Furthermore, if the files in the .tar archive were actually sorted..
(Obviously we shall pick .7z)

2007-10-14 19:53:17

by Justin Piszcz

Subject: Re: In response to kernel compression e-mail a few months ago.



On Sun, 14 Oct 2007, Jan Engelhardt wrote:

>
> On Oct 14 2007 15:34, Justin Piszcz wrote:
>>
>> It turns out the one I did not test was actually the best:
>>
>> Used: 7z -mx=9 a linux-2.6.16.17.tar.7z linux-2.6.16.17.tar
>>
>> $ du -sk * | sort -n
>> 32392 linux-2.6.16.17.tar.7z
>> 33520 linux-2.6.16.17.tar.lzma
>> 33760 linux-2.6.16.17.tar.rar
>> 38064 linux-2.6.16.17.tar.rz
>> 39472 linux-2.6.16.17.tar.szip
>> 39520 linux-2.6.16.17.tar.bz
>> 39936 linux-2.6.16.17.tar.bz2
>> 40000 linux-2.6.16.17.tar.bicom
>> 40656 linux-2.6.16.17.tar.sit
>> 47664 linux-2.6.16.17.tar.lha
>> 49968 linux-2.6.16.17.tar.dzip
>> 50000 linux-2.6.16.17.tar.gz
>> 51344 linux-2.6.16.17.tar.arj
>> 57552 linux-2.6.16.17.tar.lzo
>> 57984 linux-2.6.16.17.tar.F
>> 81136 linux-2.6.16.17.tar.Z
>> 94544 linux-2.6.16.17.tar.zoo
>> 101216 linux-2.6.16.17.tar.arc
>> 228608 linux-2.6.16.17.tar
>
> What's with all these odd formats, and where is .zip? :)
> Somehow... have you tried lrzip?
$ apt-cache search lrzip
$

I tried most of the main ones in the standard testing distribution within
Debian.

> Furthermore, if the files in the .tar archive were actually sorted..
> (Obviously we shall pick .7z)
>

Ah, how did I miss zip? :)

$ du -sk * | sort -n
32392 linux-2.6.16.17.tar.7z
33520 linux-2.6.16.17.tar.lzma
33760 linux-2.6.16.17.tar.rar
38064 linux-2.6.16.17.tar.rz
39472 linux-2.6.16.17.tar.szip
39520 linux-2.6.16.17.tar.bz
39936 linux-2.6.16.17.tar.bz2
40000 linux-2.6.16.17.tar.bicom
40656 linux-2.6.16.17.tar.sit
47664 linux-2.6.16.17.tar.lha
49940 linux-2.6.16.17.tar.zip
49968 linux-2.6.16.17.tar.dzip
50000 linux-2.6.16.17.tar.gz
51344 linux-2.6.16.17.tar.arj
57552 linux-2.6.16.17.tar.lzo
57984 linux-2.6.16.17.tar.F
81136 linux-2.6.16.17.tar.Z
94544 linux-2.6.16.17.tar.zoo
101216 linux-2.6.16.17.tar.arc
228608 linux-2.6.16.17.tar

$ du -sh * | sort -n
32M linux-2.6.16.17.tar.7z
33M linux-2.6.16.17.tar.lzma
33M linux-2.6.16.17.tar.rar
37M linux-2.6.16.17.tar.rz
39M linux-2.6.16.17.tar.bicom
39M linux-2.6.16.17.tar.bz
39M linux-2.6.16.17.tar.bz2
39M linux-2.6.16.17.tar.szip
40M linux-2.6.16.17.tar.sit
47M linux-2.6.16.17.tar.lha
49M linux-2.6.16.17.tar.zip
49M linux-2.6.16.17.tar.dzip
49M linux-2.6.16.17.tar.gz
50M linux-2.6.16.17.tar.arj
56M linux-2.6.16.17.tar.lzo
57M linux-2.6.16.17.tar.F
79M linux-2.6.16.17.tar.Z
92M linux-2.6.16.17.tar.zoo
99M linux-2.6.16.17.tar.arc
223M linux-2.6.16.17.tar

Justin.

2007-10-14 20:04:44

by Jan Engelhardt

Subject: Re: In response to kernel compression e-mail a few months ago.


On Oct 14 2007 15:53, Justin Piszcz wrote:
>>
>> What's with all these odd formats, and where is .zip? :)
>> Somehow... have you tried lrzip?
> $ apt-cache search lrzip
> $
>
> I tried most of the main ones in the standard testing distribution within
> Debian.

Debian is not a solution to everything.

http://ck.kolivas.org/apps/lrzip/

2007-10-14 20:16:57

by Justin Piszcz

Subject: Re: In response to kernel compression e-mail a few months ago.



On Sun, 14 Oct 2007, Jan Engelhardt wrote:

>
> On Oct 14 2007 15:53, Justin Piszcz wrote:
>>>
>>> What's with all these odd formats, and where is .zip? :)
>>> Somehow... have you tried lrzip?
>> $ apt-cache search lrzip
>> $
>>
>> I tried most of the main ones in the standard testing distribution within
>> Debian.
>
> Debian is not a solution to everything.
>
> http://ck.kolivas.org/apps/lrzip/
>

$ lrzip -L 9 linux-2.6.16.17.tar
Failed to open streams in rzip_fd
Fatal error - exiting

$ lrzip linux-2.6.16.17.tar -o linux-2.6.16.17.tar.lrz
Failed to open streams in rzip_fd
Fatal error - exiting

$ lrzip -l -L 9 linux-2.6.16.17.tar
Bus error

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22176 abc 20 0 2197m 156m 75m R 93 4.8 0:09.17 lrzip

It must grow to 3.0GB and die (this is on an x86 host).

$ lrzip -w 1 -l -L 9 linux-2.6.16.17.tar
linux-2.6.16.17.tar - compression ratio 3.127

$ du -sh *lrz
72M linux-2.6.16.17.tar.lrz

$ lrzip -w 10 -l -L 9 linux-2.6.16.17.tar
linux-2.6.16.17.tar - compression ratio 3.380
$ du -sh *lrz
67M linux-2.6.16.17.tar.lrz

Does not seem to come close unless I am doing something wrong.
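
For comparison with the ratios lrzip prints, the same figure can be derived
for the other archives from the du -sk sizes listed earlier in the thread.
A minimal sketch, assuming awk is available; the function name "ratios" is
made up for illustration and is not part of any of the tools:

```shell
#!/bin/sh
# ratios ORIG_KB: read "SIZE_KB NAME" lines (as printed by du -sk) on stdin
# and print "RATIO NAME", where RATIO = ORIG_KB / SIZE_KB.
ratios() {
    awk -v orig="$1" '{ printf "%.2f %s\n", orig / $1, $2 }'
}

# Sizes copied from the du -sk listing; 228608 is the uncompressed tar.
printf '%s\n' \
    '32392 linux-2.6.16.17.tar.7z' \
    '33520 linux-2.6.16.17.tar.lzma' \
    '39936 linux-2.6.16.17.tar.bz2' | ratios 228608
```

On these numbers 7z comes out at roughly 7.06, against the 3.380 that lrzip
reported above.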

Also, 7z can compress/decompress on stdin and it is multi-threaded (uses
1.8-2.2 CPU/cores).

Note that lrzip cannot operate on stdin/stdout.

Justin.

2007-10-14 20:50:25

by Al Viro

Subject: Re: In response to kernel compression e-mail a few months ago.

On Sun, Oct 14, 2007 at 09:46:15PM +0200, Jan Engelhardt wrote:
> (Obviously we shall pick .7z)

The hell it is. Take a look at memory footprint of those suckers...

2007-10-14 20:59:16

by Justin Piszcz

Subject: Re: In response to kernel compression e-mail a few months ago.



On Sun, 14 Oct 2007, Al Viro wrote:

> On Sun, Oct 14, 2007 at 09:46:15PM +0200, Jan Engelhardt wrote:
>> (Obviously we shall pick .7z)
>
> The hell it is. Take a look at memory footprint of those suckers...
>

For compression with -mx=9 it does use 500-900 MiB of RAM, that is true.
For decompression, 50-70 MiB.

Each has its pros and cons, but nothing I tested compresses the kernel any
further than 7z, which also supports stdin/stdout and has a native Windows
port. I used to use bzip2 strictly for backups and such, but if I can pick
off an additional 20-30% over bzip2 for backups which I will not use often,
7zip seems to be the winner for space savings and possibly for
bandwidth/cost savings.
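
As a rough check, the saving of one archive over another can be worked out
from the du -sk sizes posted earlier in the thread. A small sketch, assuming
awk is available ("saving_pct" is a hypothetical helper name, not part of
any tool):

```shell
#!/bin/sh
# saving_pct BASELINE_KB CANDIDATE_KB: print by how many percent the
# candidate archive is smaller than the baseline archive.
saving_pct() {
    awk -v base="$1" -v cand="$2" \
        'BEGIN { printf "%.1f\n", (base - cand) * 100 / base }'
}

saving_pct 39936 32392   # .7z relative to .bz2
saving_pct 50000 32392   # .7z relative to .gz
```

For this particular tarball that comes to about 18.9% over bzip2 and about
35.2% over gzip; other data sets will give different figures.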

compress:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10544 war 20 0 700m 681m 1632 S 141 20.7 1:41.46 7z

decompress:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11927 war 20 0 71256 66m 1536 R 88 2.0 0:04.07 7z

Justin.

2007-10-14 21:49:16

by Jan Engelhardt

Subject: Re: In response to kernel compression e-mail a few months ago.


On Oct 14 2007 16:58, Justin Piszcz wrote:
>
> compress:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 10544 war 20 0 700m 681m 1632 S 141 20.7 1:41.46 7z

Just how you can utilize a CPU to 141% remains a mystery..
[ to be noted this is sqrt(2)*100 ]

2007-10-14 22:28:39

by Justin Piszcz

Subject: Re: In response to kernel compression e-mail a few months ago.



On Sun, 14 Oct 2007, Jan Engelhardt wrote:

>
> On Oct 14 2007 16:58, Justin Piszcz wrote:
>>
>> compress:
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 10544 war 20 0 700m 681m 1632 S 141 20.7 1:41.46 7z
>
> Just how you can utilize a CPU to 141% remains a mystery..
> [ to be noted this is sqrt(2)*100 ]
>

It uses 2 cores (multi-threaded/multi-core). I believe the author of 7z (I
asked him about this before) said the compression algorithm can use
1.8-2.2 CPUs.

Justin.

2007-10-16 13:19:27

by Denys Vlasenko

Subject: Re: In response to kernel compression e-mail a few months ago.

On Sunday 14 October 2007 21:58, Justin Piszcz wrote:
>
> On Sun, 14 Oct 2007, Al Viro wrote:
>
> > On Sun, Oct 14, 2007 at 09:46:15PM +0200, Jan Engelhardt wrote:
> >> (Obviously we shall pick .7z)
> >
> > The hell it is. Take a look at memory footprint of those suckers...
>
> For compression with -mx=9 it does use 500-900 MiB of RAM, that is true.
> For decompression, 50-70 MiB.

I'm with Al on this. 50 MB for decompression?
Embedded and small device folks will not love this, I'm sure.
*Maybe* we can use lzma. It seems to use 8 MB on decompression:

PID VSZ*VSZRW RSS (SHR) DIRTY (SHR) STACK COMMAND
30474 10708 8604 8760 392 8360 0 8 lzmacat pld-th-x86_64.tar.lzma

(pld-th-x86_64.tar.lzma is a random 40 MB .lzma file I found on the net)

Sizes in KB again:

32392 linux-2.6.16.17.tar.7z
33520 linux-2.6.16.17.tar.lzma

P.S. sorting files by extension in tarball generally helps, but in case
of Linux kernel, they are all C code anyway, so no measurable gain there.
--
vda

2007-10-16 13:31:39

by Andreas Schwab

Subject: Re: In response to kernel compression e-mail a few months ago.

Denys Vlasenko writes:

> I'm with Al on this. 50 MB for decompression?
> Embedded and small device folks will not love this, I'm sure.

How often do you unpack the kernel sources on an embedded device? :)

Andreas.

--
Andreas Schwab, SuSE Labs
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2007-10-16 14:22:37

by Jan Engelhardt

Subject: Re: In response to kernel compression e-mail a few months ago.


On Oct 16 2007 14:19, Denys Vlasenko wrote:
>Sizes in KB again:
>
>32392 linux-2.6.16.17.tar.7z
>33520 linux-2.6.16.17.tar.lzma
>
>P.S. sorting files by extension in tarball generally helps, but in case
>of Linux kernel, they are all C code anyway, so no measurable gain there.

Extension is not all that interesting because, as you point out,
most of it is C code, and .h files are much like .c files in that they
have structs and function prototype keywords. But sorting by
name buys:

-rw-r--r-- 1 jengelh users 45477128 Oct 12 18:47 linux-2.6.23.1.orig.tar.bz2
-rw-r--r-- 1 jengelh users 45560647 Oct 16 16:18 linux-2.6.23.1.new.tar.bz2

(actually, `find "$@" -print0 | sort -z | tar -T- --null --no-r --owner=root
--group=root -cvjf "$output";` was used)
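
The pipeline quoted above can be wrapped in a small function. This is only a
sketch, assuming GNU find, sort and tar; "sorted_tar" is an illustrative
name, LC_ALL=C is an addition that makes the sort order independent of the
locale, and --null/--no-recursion are placed ahead of -T because newer GNU
tar versions treat them as position-sensitive. It writes an uncompressed
tar so that the compressor can be chosen separately:

```shell
#!/bin/sh
# sorted_tar OUTPUT PATH...: archive PATH... with the members stored in
# byte-sorted name order, owned by root:root as in the quoted command.
sorted_tar() {
    output=$1
    shift
    # find emits every file and directory NUL-terminated; sort -z orders the
    # names; tar reads the list from stdin (-T -) and, because of
    # --no-recursion, adds each entry exactly once in that order.
    find "$@" -print0 | LC_ALL=C sort -z |
        tar --null --no-recursion -T - --owner=root --group=root -cf "$output"
}
```

For example, "sorted_tar linux.tar linux-2.6.23.1" followed by
"bzip2 -9 linux.tar" would roughly correspond to the -j run above.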

2007-10-16 20:54:40

by Denys Vlasenko

Subject: Re: In response to kernel compression e-mail a few months ago.

On Tuesday 16 October 2007 14:31, Andreas Schwab wrote:
> Denys Vlasenko writes:
>
> > I'm with Al on this. 50 MB for decompression?
> > Embedded and small device folks will not love this, I'm sure.
>
> How often do you unpack the kernel sources on an embedded device? :)

Oops. I goofed up, I somehow thought we were talking about kernel _images_
being compressed. :(
--
vda