2008-10-17 15:53:53

by Phillip Lougher

Subject: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

This is a second attempt at mainlining Squashfs. The first attempt was way
way back in early 2005 :-) Since then the filesystem layout has undergone
two major revisions, and the kernel code has been almost completely
rewritten. Both changes were made to address the criticisms raised during
the original attempt.

Summary of changes:
1. Filesystem layout is now 64-bit, so in theory filesystems and
files can be up to 2^64 bytes in size.

2. Filesystem is now fixed little-endian.

3. "." and ".." are now returned by readdir.

4. Sparse files are now supported.

5. Filesystem is now exportable (NFS etc.).

6. Data blocks up to 1 Mbyte are now supported.

Codewise all of the packed bit-fields and the swap macros have been removed in
favour of aligned structures and in-line swapping using leXX_to_cpu(). The
code has also been extensively restructured, reformatted to kernel coding
standards and commented.
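The in-line swapping described above can be illustrated outside the kernel. This is a minimal userspace sketch, assuming a raw on-disk buffer: get_le16() and get_le32() are hypothetical helpers (not kernel APIs) that assemble each field byte by byte. That yields the same value on little- and big-endian hosts, which is the effect le16_to_cpu()/le32_to_cpu() have on the kernel's __le16/__le32 fields.

```c
#include <stdint.h>

/* Hypothetical helpers: decode little-endian fields from a raw buffer.
 * Building the value from individual bytes makes the result independent
 * of host byte order, so no #ifdef'd swap macros are needed. */
static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For example, the bytes 78 56 34 12 decode to 0x12345678 on any host.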

Previously there was resistance to the inclusion of another compressed
filesystem when Linux already had cramfs, and there was pressure for a strong
case to be made for the inclusion of Squashfs. Hopefully the question of
including another compressed filesystem has been settled over the last couple
of years; however, it is worth listing the features of Squashfs over cramfs,
which is still the only read-only compressed filesystem in mainline.

Max filesystem size: cramfs 256 Mbytes, Squashfs 64-bit
Max file size: cramfs 16 Mbytes, Squashfs 64-bit
Block size: cramfs 4K, Squashfs default 128K, max 1Mbyte
Tail-end packing: cramfs no, Squashfs yes
Directory indexes: cramfs no, Squashfs yes
Compressed metadata: cramfs no, Squashfs yes
Hard link support: cramfs no, Squashfs yes
Support for "." and ".." in readdir: cramfs no, Squashfs yes
Real inode numbers: cramfs no, Squashfs yes. Cramfs gives device inodes,
fifos and empty directories the same inode number, 1!
Exportable filesystem (NFS, etc.): cramfs no, Squashfs yes
Active maintenance: cramfs no (it is listed as orphaned, and there has
probably been no active work on it for years), Squashfs yes

Sorry for the list formatting, but many email readers are very unforgiving
when displaying tabbed lists, so I avoided them.

For those that want hard performance statistics
http://tree.celinuxforum.org/CelfPubWiki/SquashFsComparisons gives
a full comparison of the performance of Squashfs against cramfs, zisofs,
cloop and ext3. I ran these tests a number of years ago using Squashfs 2.1,
but they are still valid. In fact the performance should now be better.

Cramfs is a limited filesystem. It's good for some embedded users but not for
much else now, and its layout and features haven't changed in the eight-plus
years since its release. Squashfs, despite never being in mainline, has been
actively developed for over six years, and in that time has gone through four
layout revisions, each revision improving compression and performance where
limitations were found. For an often dismissed filesystem, Squashfs has
advanced features such as metadata compression and tail-end packing for greater
compression, and directory indexes for faster dentry operations.
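To make "tail-end packing" concrete, here is a rough sketch with hypothetical helpers (it is not mksquashfs's actual algorithm): a file is split into whole blocks plus a tail shorter than one block, and tails from several files are packed sequentially into shared fragment blocks instead of each occupying a block of its own.

```c
#include <stdint.h>

/* How one file is laid out: whole blocks plus a short tail. */
struct file_layout {
	uint64_t full_blocks;   /* number of complete data blocks */
	uint32_t tail_bytes;    /* leftover bytes destined for a fragment */
};

static struct file_layout split_file(uint64_t file_size, uint32_t block_size)
{
	struct file_layout l;

	l.full_blocks = file_size / block_size;
	l.tail_bytes = (uint32_t)(file_size % block_size);
	return l;
}

/*
 * Pack tails into fragment blocks in order, starting a new fragment
 * whenever the next tail does not fit in the current one. Each tail is
 * assumed to be smaller than block_size. Returns the number of fragment
 * blocks used.
 */
static unsigned int pack_tails(const uint32_t *tails, unsigned int n,
			       uint32_t block_size)
{
	unsigned int frags = 0;
	uint32_t used = block_size;  /* forces a new fragment for the first tail */

	for (unsigned int i = 0; i < n; i++) {
		if (tails[i] == 0)
			continue;        /* file ended on a block boundary */
		if (used + tails[i] > block_size) {
			frags++;
			used = 0;
		}
		used += tails[i];
	}
	return frags;
}
```

With a 128K block size, a 300000-byte file becomes two full blocks plus a 37856-byte tail, and four tails of 100000, 20000, 30000 and 50000 bytes fit into two fragment blocks rather than four.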

Despite not being in mainline, it is widely used. It is packaged
by all major distributions (Ubuntu, Fedora, Debian, SUSE, Gentoo), it is used
on most LiveCDs, it is extensively used in embedded systems (STBs, routers,
mobile phones), and notably is used in such things as Splashtop and the
Amazon Kindle.

Anyway that's my case for inclusion. If any readers want Squashfs
mainlined it's probably now a good time to offer support!

There are 16 patches in the patch set, and the patches are against the
latest linux-next tree (linux 2.6.27-next-20081016).

Finally, I would like to acknowledge the financial support of the Consumer
Electronics Linux Forum (CELF). They've made it possible for me to spend the
last four months working full time on this mainlining attempt.

Phillip


2008-10-17 17:35:30

by Jörn Engel

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

On Fri, 17 October 2008 16:42:50 +0100, Phillip Lougher wrote:
>
> Codewise all of the packed bit-fields and the swap macros have been removed in
> favour of aligned structures and in-line swapping using leXX_to_cpu(). The
> code has also been extensively restructured, reformatted to kernel coding
> standards and commented.

Excellent! The data structures look good and I don't see a reason for
another format change. Which means the main reason against merging the
code has gone. Your style differs from other kernel code and in a
number of cases it would be nice to be more consistent with existing
conventions. It would certainly help others when reading the code. And
of course, one way to do so is to just merge and wait for some janitors
to notice squashfs and send patches. :)

I have to admit I am scared of this function:
+int squashfs_read_metadata(struct super_block *s, void *buffer,
+ long long block, unsigned int offset,
+ int length, long long *next_block,
+ unsigned int *next_offset)

It takes seven parameters, five of which look deceptively similar to me.
Almost every time I see a call to this function, my mind goes blank.

There must be some way to make this function a bit more agreeable. One
option is to fuse the "block" and "offset" parameters into a struct and
just pass two sets of this struct. Another would be to combine the two
sets of addresses into a single one. A quick look at some of the
callers seems to favor that approach.

squashfs_read_metadata(..., block, offset, ..., &block, &offset)
Could become
squashfs_read_metadata(..., &block, &offset, ...)
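A minimal sketch of the second option, assuming a simplified metadata area of uncompressed, fixed-size blocks stored back to back (real Squashfs metadata blocks are compressed and variable-length on disk). read_metadata() and struct meta_area are hypothetical names, not the patch's API: the block/offset pair is passed by pointer, read on entry and advanced past the consumed bytes on return, so one pair replaces the four position parameters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified metadata area: nblocks uncompressed blocks
 * of META_BLOCK_SIZE bytes each, stored contiguously. */
#define META_BLOCK_SIZE 8192

struct meta_area {
	const unsigned char *data;
	uint64_t nblocks;
};

/*
 * Copy length bytes starting at (*block, *offset) into buffer, then
 * leave (*block, *offset) pointing just past the last byte read, so the
 * next call continues where this one stopped.
 * Returns 0 on success, -1 if the read runs past the end of the area
 * (bytes before that point have already been copied).
 */
static int read_metadata(const struct meta_area *m, void *buffer,
			 uint64_t *block, unsigned int *offset, int length)
{
	unsigned char *out = buffer;

	assert(m != NULL && buffer != NULL && length >= 0);

	while (length > 0) {
		unsigned int avail, n;

		if (*block >= m->nblocks || *offset >= META_BLOCK_SIZE)
			return -1;

		/* Bytes left in the current block. */
		avail = META_BLOCK_SIZE - *offset;
		n = (unsigned int)length < avail ? (unsigned int)length : avail;
		memcpy(out, m->data + *block * META_BLOCK_SIZE + *offset, n);

		out += n;
		length -= (int)n;
		*offset += n;
		if (*offset == META_BLOCK_SIZE) {
			/* Continue at the start of the next block. */
			(*block)++;
			*offset = 0;
		}
	}
	return 0;
}
```

A caller reading consecutive structures simply calls it repeatedly with the same &block, &offset pair. Fusing the pair into a struct, the first option, would work the same way with one pointer instead of two.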

But again, such a change is no showstopper for mainline inclusion.

> Anyway that's my case for inclusion. If any readers want Squashfs
> mainlined it's probably now a good time to offer support!

Please no. A large amount of popular support would only bring you into
the reiser4 league. Bad arguments don't improve when repeated.

Support in the form of patches would be a different matter, though.

Jörn

--
Mac is for working,
Linux is for Networking,
Windows is for Solitaire!
-- stolen from dc

2008-10-17 18:47:32

by David P. Quigley

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

Looking through the code I see two references to xattrs: one is the
index of the xattr table in the superblock, and there seems to be a struct
member in one of the inode structures that is an index into this table.
Looking through the code I don't see either of these used at all. Do you
intend to add xattr support at some point? I saw reference to the desire
to add xattr support in an email from 2004 but you said that the code
has been rewritten since then. If you are going to add xattr support you
probably want to add it to more than just regular files. In SELinux and
other LSMs symlinks and directories are also labeled so they will need
xattr entries.

Dave

2008-10-21 01:12:24

by Phillip Lougher

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

David P. Quigley wrote:
> Looking through the code I see two references to xattrs: one is the
> index of the xattr table in the superblock, and there seems to be a struct
> member in one of the inode structures that is an index into this table.
> Looking through the code I don't see either of these used at all. Do you
> intend to add xattr support at some point? I saw reference to the desire
> to add xattr support in an email from 2004 but you said that the code
> has been rewritten since then. If you are going to add xattr support you
> probably want to add it to more than just regular files. In SELinux and
> other LSMs symlinks and directories are also labeled so they will need
> xattr entries.

Yes and yes. I am intending to add xattr support, something that's been
on my to-do list for a long time (since 2004 as you said), but it's
something I've never had the time to do. Once (if) Squashfs is
mainlined, it will be the next thing.

The xattr references in the layout are my attempt at forward planning to
avoid making an incompatible layout change when I finally get around to
implementing it. My plan is to put xattrs in a table (referenced by the
superblock), and then put indexes in "extended" inodes which index
into the table (as you noticed). The general idea in Squashfs is that
inodes get optimised for normally occurring cases, and less common cases
(that would need a bigger inode) get to use an extended inode.
Squashfs currently has an extended regular file inode, which is where
the xattr index will sit, and so this has had an xattr index added. The
other inodes don't currently have extended inodes; these will be defined
when I implement xattrs (which is why they're missing).
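A rough sketch of the planned scheme, with made-up names (NO_XATTR, struct ext_inode, inode_xattr()) that are not the Squashfs on-disk format: the superblock locates one xattr table, and only extended inodes carry an index into it, with a sentinel meaning "no xattrs". For simplicity each index here selects a single name/value pair; a real design would more likely select a list of them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: the inode has no xattrs. */
#define NO_XATTR 0xffffffffu

struct xattr_entry {
	const char *name;
	const char *value;
};

/* The xattr table, whose location would be recorded in the superblock. */
struct xattr_table {
	const struct xattr_entry *entries;
	uint32_t count;
};

/* Extended inode: the common-case fields plus an index into the table. */
struct ext_inode {
	uint32_t inode_number;
	uint32_t xattr;         /* table index, or NO_XATTR */
};

/* Return the inode's xattr entry, or NULL if it has none or the index
 * is out of range (treated as corrupt). */
static const struct xattr_entry *
inode_xattr(const struct xattr_table *t, const struct ext_inode *inode)
{
	if (inode->xattr == NO_XATTR || inode->xattr >= t->count)
		return NULL;
	return &t->entries[inode->xattr];
}
```

Inode types without an extended variant have nowhere to store the index, so in this scheme they cannot carry xattrs at all.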

Having said that, I've fscked up and forgotten to add an xattr field to
the extended directory inode which is currently defined :)

Thanks for spotting this.

Phillip

> Dave
>
>

2008-10-21 01:22:33

by Phillip Lougher

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

David P. Quigley wrote:
> In SELinux and
> other LSMs symlinks and directories are also labeled so they will need
> xattr entries.

BTW you don't mention device, fifo and socket inodes... Do they ever
get labelled? It's something I was going to look into closer to an
implementation, but it would be interesting to know.

Phillip

>
> Dave
>
>

2008-10-21 12:09:56

by Stephen Smalley

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

On Tue, 2008-10-21 at 02:12 +0100, Phillip Lougher wrote:
> David P. Quigley wrote:
> [Snip]
> > In SELinux and
> > other LSMs symlinks and directories are also labeled so they will need
> > xattr entries.
> [Snip]
> The xattr references in the layout is my attempt at forward planning to
> avoid making an incompatible layout change when I finally get around to
> implementing it. My plan is to put xattrs in a table (referenced by the
> superblock), and then put indexes in "extended" inodes which index
> into the table (as you noticed).
> [Snip]

Just to clarify: When using a labeled MAC solution like SELinux or
SMACK, every file (of every type, including device nodes, symlinks,
fifos, etc) will have a security attribute on it. In the case of ext3,
we have benefited from inlining of small attributes into the inode.

--
Stephen Smalley
National Security Agency

2008-10-21 16:29:36

by David P. Quigley

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

On Tue, 2008-10-21 at 02:12 +0100, Phillip Lougher wrote:
> [Snip]

Looking through the code I noticed that you give certain object types
the same inode number for all instances of them (devices, fifos/sockets).
How is this done internally? Do these types of objects occupy the same
position in the inode table? If so, how do you differentiate between a
device and a socket?

I have some other comments but I'll post them under the specific
patches.

I used to work on Unionfs, and we initially used CVS for our SCM. When we
started working on mainlining Unionfs we moved over to a Git-based
system and found it worked a lot better. You might want to consider
moving your patches to a publicly available Git tree so that
people can just clone, compile, and test them. I don't see anything that
stops Squashfs from being compiled and loaded as a module, so it might
not be necessary, but it makes it easier for people who want to test the
code or even contribute patches.

Dave

2008-10-21 22:29:40

by Alex Riesen

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

2008/10/17 Phillip Lougher <[email protected]>:
> There are 16 patches in the patch set, and the patches are against the
> latest linux-next tree (linux 2.6.27-next-20081016).

You'd better not base anything on linux-next. It is not stable: the tree
you mentioned may even contain something which will never end up in
mainline, and if your patches depend on it they won't apply to
something like v2.6.26.2.

Use something tagged (v2.6.27, v2.6.27-rc*) in _mainline_ instead.
Then it can be applied to anything from stable releases to vendor trees
(which are mostly based on mainline).

2008-10-21 23:15:43

by Phillip Lougher

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

Alex Riesen wrote:
> 2008/10/17 Phillip Lougher <[email protected]>:
>> There are 16 patches in the patch set, and the patches are against the
>> latest linux-next tree (linux 2.6.27-next-20081016).
>
> You better don't base anything off linux-next. These are not stable: there
> can be even something in the tree you mentioned which will never end
> up in the mainline and if your patches depend on it they wont apply to
> something like v2.6.26.2.

Definitely. There's some d_obtain_alias stuff in linux-next which has
been there since linux-2.6.27-rc4-next. I thought it would make it into
the final 2.6.27, but it didn't.

I thought linux-next *was* the tree that new patches should be based
on. However, the relationship between linux-2.6.git, linux-next.git
and the -mm patch series is a little vague to me, not to
mention where the linux-staging tree fits into all this.

Phillip

2008-10-21 23:42:35

by Phillip Lougher

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

David P. Quigley wrote:

>
> Looking through the code I noticed that you give certain object types
> the same inode number for all instances of it (devices, fifo/sockets).
> How is this done internally? Do these types of objects occupy the same
> position on the inode table? If so how do you differentiate between a
> device and a socket?
>

No, devices and fifo/sockets get their own unique inode numbers:

root@slackware:/mnt# mount -t squashfs test.sqsh /mnt2 -o loop
root@slackware:/mnt# ls -li /mnt2
total 0
2 crw-r--r-- 1 root root 1, 1 2008-10-22 00:31 device
4 prw-r--r-- 1 root root 0 2008-10-22 00:31 fifo
3 srwxr-xr-x 1 root root 0 2008-10-17 16:25 socket

struct squashfs_ipc_inode {
__le16 inode_type;
__le16 mode;
__le16 uid;
__le16 guid;
__le32 mtime;
__le32 inode_number;
__le32 nlink;
};

struct squashfs_dev_inode {
__le16 inode_type;
__le16 mode;
__le16 uid;
__le16 guid;
__le32 mtime;
__le32 inode_number;
__le32 nlink;
__le32 rdev;
};
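To answer the device-versus-socket question in code: both structures begin with inode_type, so the type code alone says how to interpret the rest. A minimal userspace sketch, assuming the field order shown above, little-endian encoding, and type codes 4 to 7 for block device, character device, fifo and socket (modelled on squashfs_fs.h, but treat them as an assumption); decode_special() and struct special_inode are hypothetical.

```c
#include <stdint.h>

/* Assumed inode type codes (illustrative). */
enum {
	SQSH_BLKDEV_TYPE = 4,
	SQSH_CHRDEV_TYPE = 5,
	SQSH_FIFO_TYPE = 6,
	SQSH_SOCKET_TYPE = 7,
};

/* Host-order view of the fields this sketch cares about. */
struct special_inode {
	uint16_t inode_type;
	uint32_t inode_number;
	uint32_t nlink;
	uint32_t rdev;          /* 0 for fifos and sockets */
};

static uint16_t rd16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Byte offsets follow the structures above: inode_type 0, mode 2,
 * uid 4, guid 6, mtime 8, inode_number 12, nlink 16, rdev 20 (device
 * inodes only). Returns 0 on success, -1 for a type not handled here.
 */
static int decode_special(const unsigned char *raw, struct special_inode *out)
{
	out->inode_type = rd16(raw);
	out->inode_number = rd32(raw + 12);
	out->nlink = rd32(raw + 16);

	switch (out->inode_type) {
	case SQSH_BLKDEV_TYPE:
	case SQSH_CHRDEV_TYPE:
		out->rdev = rd32(raw + 20);  /* only device inodes carry rdev */
		return 0;
	case SQSH_FIFO_TYPE:
	case SQSH_SOCKET_TYPE:
		out->rdev = 0;
		return 0;
	default:
		return -1;
	}
}
```

Each inode carries its own inode_number field, which is why the ls -li output shows distinct numbers for the device, fifo and socket.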


> I use to work on Unionfs and we used CVS initially for our SCM. When we
> started working on mainlining Unionfs we moved over to a GIT based
> system and we found it worked a lot better. You might want to consider
> moving your patches to a GIT tree that you make publically available so
> people can just clone, compile, and test them. I don't see anything that
> stops Squashfs from being compiled and loaded as a module so it might
> not be necessary but it makes it easier for people who want to test the
> code or even contribute patches.


Yeah, Git is much better than CVS, however, I've got nowhere to host a
public Git repository. If someone were to offer hosting I'd be only too
happy to move over to Git.

Phillip

2008-10-21 23:57:24

by David P. Quigley

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

On Wed, 2008-10-22 at 00:42 +0100, Phillip Lougher wrote:
> David P. Quigley wrote:
>
> >
> > Looking through the code I noticed that you give certain object types
> > the same inode number for all instances of it (devices, fifo/sockets).
> > How is this done internally? Do these types of objects occupy the same
> > position on the inode table? If so how do you differentiate between a
> > device and a socket?
> >
>
> No, devices and fifo/sockets get their own unique inode numbers:
>
> root@slackware:/mnt# mount -t squashfs test.sqsh /mnt2 -o loop
> root@slackware:/mnt# ls -li /mnt2
> total 0
> 2 crw-r--r-- 1 root root 1, 1 2008-10-22 00:31 device
> 4 prw-r--r-- 1 root root 0 2008-10-22 00:31 fifo
> 3 srwxr-xr-x 1 root root 0 2008-10-17 16:25 socket
[Snip]
My mistake; I misread your statement in email 0. You said that Squashfs
has real inode numbers and that cramfs didn't. Good luck with your
mainlining attempt. Once you have xattr support, this will definitely
make life better for people who want to make SELinux-enabled LiveCDs and
other small devices.

Dave

2008-10-22 07:22:05

by Peter Korsgaard

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

>>>>> "Phillip" == Phillip Lougher <[email protected]> writes:

Hi,

Phillip> Yeah, Git is much better than CVS, however, I've got nowhere
Phillip> to host a public Git repository. If someone were to offer
Phillip> hosting I'd be only too happy to move over to Git.

You could apply for a kernel.org account:

http://kernel.org/faq/#account

--
Bye, Peter Korsgaard

2008-10-22 07:23:58

by David Woodhouse

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

On Wed, 2008-10-22 at 00:42 +0100, Phillip Lougher wrote:
> Yeah, Git is much better than CVS, however, I've got nowhere to host a
> public Git repository. If someone were to offer hosting I'd be only too
> happy to move over to Git.

Mail me a SSH public key (use a passphrase on it), and I'll give you an
account on git.infradead.org.

--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

2008-10-22 16:32:34

by Tim Bird

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

Phillip Lougher wrote:
> Yeah, Git is much better than CVS, however, I've got nowhere to host a
> public Git repository. If someone were to offer hosting I'd be only too
> happy to move over to Git.

I can offer you hosting on mirror.celinuxforum.org. If you are
interested, let me know and I'll set up an account immediately.
I haven't hosted a public git tree there, so there may be
some additional setup required.
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================

2008-10-22 17:14:56

by Geert Uytterhoeven

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

Hi Phillip,

On Fri, 17 Oct 2008, Phillip Lougher wrote:
> This is a second attempt at mainlining Squashfs. The first attempt was way

This is great news!

I ran a quick test of squashfs 4.0 (the CVS version) on UML/ia32 and ppc64, and it
seems to work fine! Great job! Let's hope we'll see it in mainline soon...

BTW, one minor gripe is that the current mksquashfs doesn't want to run on big
endian yet, as there's no byteswapping support.
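As a rough illustration of what that byteswapping support involves, a filesystem builder can write every on-disk field through helpers that emit little-endian bytes explicitly, so the image comes out identical on big- and little-endian hosts. put_le16() and put_le32() are hypothetical names, not squashfs-tools functions.

```c
#include <stdint.h>

/* Hypothetical helpers: store values in little-endian byte order
 * regardless of the host's native byte order. */
static void put_le16(unsigned char *p, uint16_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)(v >> 8);
}

static void put_le32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)(v >> 24);
}
```

Because the shifts operate on values rather than on memory, no host-endianness test is needed.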

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe

2008-10-23 08:40:49

by Phillip Lougher

Subject: Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

Geert Uytterhoeven wrote:
> Hi Phillip,
>
> On Fri, 17 Oct 2008, Phillip Lougher wrote:
>> This is a second attempt at mainlining Squashfs. The first attempt was way
>
> This is great news!
>
> I ran a quick test of squashfs 4.0 (the CVS version) on UML/ia32 and ppc64, and it
> seems to work fine! Great job! Let's hope we'll see it in mainline soon...
>

Thanks! I hope it gets into mainline soon too :)

> BTW, one minor gripe is that the current mksquashfs doesn't want to run on big
> endian yet, as there's no byteswapping support.

Yeah, I know about that. There's still some work that needs to be done on
the squashfs-tools. I figured it was important to get the kernel stuff
submitted and discussed ASAP.

Phillip