2002-01-12 08:05:31

by H. Peter Anvin

Subject: initramfs buffer spec -- second draft

This is an update to the initramfs buffer format spec I posted
earlier. The changes are as follows:

a) Move the PAD() declarations around. It is now required that the
cpio header is aligned on a multiple of 4 bytes, thereby removing a
potential ambiguity in the previous specification.

b) Clearly specify that the data can be attached to any member of a
hard link set.

As always, comments appreciated...

initramfs buffer format
-----------------------

Al Viro, H. Peter Anvin
Last revision: 2002-01-11

** DRAFT ** DRAFT ** DRAFT ** DRAFT ** DRAFT ** DRAFT **

Starting with kernel 2.5.x, the old "initial ramdisk" protocol is
getting {replaced/complemented} with the new "initial ramfs"
(initramfs) protocol. The initramfs contents are passed using the same
memory buffer protocol used by the initrd protocol, but the contents
are different. The initramfs buffer contains an archive which is
expanded into a ramfs filesystem; this document details the format of
the initramfs buffer.

The initramfs buffer format is based around the "newc" CPIO format,
and can be created with the cpio(1) utility. The cpio archive can be
compressed using gzip(1). The simplest form of the initramfs buffer
is thus a single .cpio.gz file.

The full format of the initramfs buffer is defined by the following
grammar, where:
    *       is used to indicate "0 or more occurrences of"
    (|)     indicates alternatives
    +       indicates concatenation
    GZIP()  indicates the gzip(1) of the operand
    PAD(n)  means padding with null bytes to an n-byte boundary
            [QUESTION: is the padding relative to the start of the
            previous header, or is it an absolute address? Is it at all
            legal to have a header start on a non-multiple of 4?]

initramfs := ("\0" | cpio_archive | cpio_gzip_archive)*

cpio_gzip_archive := GZIP(cpio_archive)

cpio_archive := cpio_file* + (<nothing> | cpio_trailer)

cpio_file := PAD(4) + cpio_header + filename + "\0" + PAD(4) + data

cpio_trailer := PAD(4) + cpio_header + "TRAILER!!!\0" + PAD(4)
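
As an illustration of the grammar, the following Python sketch assembles a cpio_archive out of the cpio_file and cpio_trailer productions. It is a sketch under assumptions, not code from the kernel or from cpio(1): the helper names (pad4, cpio_header, cpio_file, cpio_trailer) are invented here, padding is computed relative to the start of the archive being built (which is exactly the open QUESTION above), and the uid, gid, mtime and device fields are simply set to zero.

```python
HEADER_LEN = 6 + 13 * 8  # c_magic plus thirteen 8-digit hex fields = 110 bytes


def pad4(buf: bytes) -> bytes:
    """PAD(4): append NUL bytes until len(buf) is a multiple of 4."""
    return buf + b"\0" * (-len(buf) % 4)


def cpio_header(name: bytes, filesize: int, mode: int, ino: int, nlink: int) -> bytes:
    """Build a "newc" header; fields this sketch does not need are zero."""
    fields = [
        ino, mode, 0, 0,       # c_ino, c_mode, c_uid, c_gid
        nlink, 0, filesize,    # c_nlink, c_mtime, c_filesize
        0, 0, 0, 0,            # c_maj, c_min, c_rmaj, c_rmin
        len(name) + 1, 0,      # c_namesize (counts the final \0), c_chksum
    ]
    return b"070701" + b"".join(b"%08X" % f for f in fields)


def cpio_file(archive: bytes, name: bytes, data: bytes,
              mode: int = 0o100644, ino: int = 1, nlink: int = 1) -> bytes:
    """cpio_file := PAD(4) + cpio_header + filename + "\\0" + PAD(4) + data"""
    out = pad4(archive) + cpio_header(name, len(data), mode, ino, nlink)
    out += name + b"\0"
    return pad4(out) + data


def cpio_trailer(archive: bytes) -> bytes:
    """cpio_trailer := PAD(4) + cpio_header + "TRAILER!!!\\0" + PAD(4)"""
    name = b"TRAILER!!!"
    out = pad4(archive) + cpio_header(name, 0, 0, 0, 1) + name + b"\0"
    return pad4(out)


# One regular file followed by the end-of-file marker.
archive = cpio_trailer(cpio_file(b"", b"hello", b"hi\n"))
```

Passing `archive` through Python's `gzip.compress()` would give the cpio_gzip_archive form of the same member.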


In human terms, the initramfs buffer contains a collection of
compressed and/or uncompressed cpio archives (in the "newc" format);
arbitrary amounts of zero bytes (for padding) can be added between
members.
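
Read in the other direction, the top-level rule says a consumer walks the buffer, skips zero bytes, and hands each member on either directly or after gunzipping it. A minimal Python sketch of that walk follows. The function names are hypothetical, members are recognised by their leading magic bytes (an assumption of this sketch, the grammar does not say how to tell them apart), and alignment is taken as absolute within the buffer.

```python
import zlib

HEADER_LEN = 110
MAGICS = (b"070701", b"070702")
GZIP_MAGIC = b"\x1f\x8b"


def align4(offset: int) -> int:
    return (offset + 3) & ~3


def cpio_archive_end(buf: bytes, pos: int) -> int:
    """Return the offset just past the uncompressed cpio archive at pos."""
    while buf[pos:pos + 6] in MAGICS:
        if pos + HEADER_LEN > len(buf):
            raise ValueError("truncated cpio header")
        filesize = int(buf[pos + 54:pos + 62], 16)   # c_filesize
        namesize = int(buf[pos + 94:pos + 102], 16)  # c_namesize
        name = buf[pos + HEADER_LEN:pos + HEADER_LEN + namesize]
        data_start = align4(pos + HEADER_LEN + namesize)
        pos = align4(data_start + filesize)
        if name == b"TRAILER!!!\0":
            break
    # The final PAD(4) may be cut short by the end of the buffer.
    return min(pos, len(buf))


def split_members(buf: bytes):
    """Yield each member of the buffer as an uncompressed cpio archive."""
    pos = 0
    while pos < len(buf):
        if buf[pos] == 0:
            pos += 1                      # "\0" padding between members
        elif buf[pos:pos + 2] == GZIP_MAGIC:
            inflater = zlib.decompressobj(31)  # 16 + 15: expect a gzip wrapper
            data = inflater.decompress(buf[pos:])
            if not inflater.eof:
                raise ValueError("truncated gzip member")
            yield data
            pos = len(buf) - len(inflater.unused_data)
        elif buf[pos:pos + 6] in MAGICS:
            end = cpio_archive_end(buf, pos)
            yield buf[pos:end]
            pos = end
        else:
            raise ValueError(f"unrecognised data at offset {pos}")
```

Using a decompression object rather than a one-shot call is what lets the sketch find where a gzip member ends, since whatever follows it is left in `unused_data`.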

The cpio "TRAILER!!!" entry (cpio end of file) is optional, but is not
ignored; see "handling of hard links" below.

The structure of the cpio_header is as follows (all 8-byte entries
contain 32-bit hexadecimal ASCII numbers):

Field name    Field size   Meaning
c_magic       6 bytes      The string "070701" or "070702"
c_ino         8 bytes      File inode number
c_mode        8 bytes      File mode and permissions
c_uid         8 bytes      File uid
c_gid         8 bytes      File gid
c_nlink       8 bytes      Number of links
c_mtime       8 bytes      Modification time
c_filesize    8 bytes      Size of data field
c_maj         8 bytes      Major part of file device number
c_min         8 bytes      Minor part of file device number
c_rmaj        8 bytes      Major part of device node reference
c_rmin        8 bytes      Minor part of device node reference
c_namesize    8 bytes      Length of filename, including final \0
c_chksum      8 bytes      CRC of data field if c_magic is 070702

The c_mode field matches the contents of st_mode returned by stat(2)
on Linux, and encodes the file type and file permissions.

The c_filesize should be zero for any non-regular file.

If the filename is "TRAILER!!!" this is actually an end-of-file
marker; the c_filesize for an end-of-file marker must be zero.
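
The header layout above can be decoded mechanically. A minimal Python sketch, assuming only the field order and sizes given in the table; parse_header and is_trailer are names made up for this example, and the result is a plain dict rather than any real library's type.

```python
HEADER_FIELDS = (
    "c_ino", "c_mode", "c_uid", "c_gid", "c_nlink", "c_mtime", "c_filesize",
    "c_maj", "c_min", "c_rmaj", "c_rmin", "c_namesize", "c_chksum",
)
HEADER_LEN = 6 + 8 * len(HEADER_FIELDS)  # 110 bytes


def parse_header(buf: bytes, offset: int = 0) -> dict:
    """Decode one cpio_header starting at offset."""
    raw = buf[offset:offset + HEADER_LEN]
    if len(raw) < HEADER_LEN:
        raise ValueError("truncated cpio header")
    magic = raw[:6]
    if magic not in (b"070701", b"070702"):
        raise ValueError(f"bad c_magic: {magic!r}")
    header = {"c_magic": magic.decode("ascii")}
    for index, field in enumerate(HEADER_FIELDS):
        start = 6 + 8 * index
        # Each field is 8 hexadecimal ASCII digits, most significant first.
        header[field] = int(raw[start:start + 8], 16)
    return header


def is_trailer(header: dict, filename: bytes) -> bool:
    """True for the end-of-file marker; its c_filesize must be zero."""
    if filename != b"TRAILER!!!":
        return False
    if header["c_filesize"] != 0:
        raise ValueError("TRAILER!!! entry with non-zero c_filesize")
    return True
```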


*** Handling of hard links

When a nondirectory with c_nlink > 1 is seen, the (c_maj,c_min,c_ino)
tuple is looked up in a tuple buffer. If not found, it is entered in
the tuple buffer and the entry is created as usual; if found, a hard
link rather than a second copy of the file is created. It is not
necessary (but permitted) to include a second copy of the file
contents; if the file contents are not included, the c_filesize field
should be set to zero to indicate no data section follows. If data is
present, the previous instance of the file is overwritten; this allows
the data-carrying instance of a file to occur anywhere in the sequence.
(GNU cpio is reported to attach the data to the last instance of a
file only.)

When a "TRAILER!!!" end-of-file marker is seen, the tuple buffer is
reset. This permits archives which are generated independently to be
concatenated.

To combine file data from different sources (without having to
regenerate the (c_maj,c_min,c_ino) fields), therefore, either one of
the following techniques can be used:

a) Separate the different file data sources with a "TRAILER!!!"
end-of-file marker, or

b) Make sure c_nlink == 1 for all nondirectory entries.
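
The tuple-buffer rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the kernel's unpacker: entries are plain dicts with hypothetical keys (name, c_maj, c_min, c_ino, c_nlink, data), only nondirectory entries are passed in, and a shared bytearray stands in for an inode so that two names pointing at the same object model a hard link.

```python
def unpack_links(entries):
    """Apply the hard-link rules; returns a mapping of name -> "inode"."""
    files = {}          # name -> bytearray standing in for the inode
    tuple_buffer = {}   # (c_maj, c_min, c_ino) -> bytearray
    for entry in entries:
        if entry["name"] == "TRAILER!!!":
            tuple_buffer.clear()     # end-of-file marker resets the buffer
            continue
        key = (entry["c_maj"], entry["c_min"], entry["c_ino"])
        if entry["c_nlink"] > 1 and key in tuple_buffer:
            inode = tuple_buffer[key]            # create a hard link
            if entry["data"]:
                inode[:] = entry["data"]         # data present: overwrite
        else:
            inode = bytearray(entry["data"])     # create as usual
            if entry["c_nlink"] > 1:
                tuple_buffer[key] = inode
        files[entry["name"]] = inode
    return files


entries = [
    dict(name="a", c_maj=0, c_min=1, c_ino=5, c_nlink=2, data=b""),
    dict(name="b", c_maj=0, c_min=1, c_ino=5, c_nlink=2, data=b"xyz"),
    dict(name="TRAILER!!!", c_maj=0, c_min=0, c_ino=0, c_nlink=1, data=b""),
    dict(name="c", c_maj=0, c_min=1, c_ino=5, c_nlink=2, data=b"other"),
]
fs = unpack_links(entries)
```

Here "a" and "b" end up as two names for one object carrying the data attached to the later entry, while "c" reuses the same (c_maj,c_min,c_ino) tuple after the trailer and is therefore treated as an unrelated file, which is technique (a) above.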

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>


2002-01-13 02:01:23

by Alexander Viro

Subject: Re: initramfs buffer spec -- second draft



On Sat, 12 Jan 2002, H. Peter Anvin wrote:

> Field name    Field size   Meaning
> c_magic       6 bytes      The string "070701" or "070702"
> c_ino         8 bytes      File inode number
> c_mode        8 bytes      File mode and permissions
> c_uid         8 bytes      File uid
> c_gid         8 bytes      File gid
> c_nlink       8 bytes      Number of links
> c_mtime       8 bytes      Modification time
> c_filesize    8 bytes      Size of data field
> c_maj         8 bytes      Major part of file device number
> c_min         8 bytes      Minor part of file device number
> c_rmaj        8 bytes      Major part of device node reference
> c_rmin        8 bytes      Minor part of device node reference
> c_namesize    8 bytes      Length of filename, including final \0
> c_chksum      8 bytes      CRC of data field if c_magic is 070702

+                            or "00000000" if it's 070701. Kernel
+                            is not expected to verify it in any case.

> The c_mode field matches the contents of st_mode returned by stat(2)
> on Linux, and encodes the file type and file permissions.

- The c_filesize should be zero for any non-regular file.
+ The c_filesize can be non-zero only for regular files and symlinks.
+ For symlinks data and c_filesize match the results of readlink(2).
+ Having more than one link to a symlink is illegal.
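
The proposed wording translates into a small validity check. A Python sketch, assuming c_mode uses the st_mode encoding as the spec states; check_entry and symlink_fields are hypothetical helper names, and the returned messages are just for illustration.

```python
import stat


def check_entry(c_mode: int, c_filesize: int, c_nlink: int) -> list:
    """Return a list of rule violations for one header (empty if none)."""
    problems = []
    may_carry_data = stat.S_ISREG(c_mode) or stat.S_ISLNK(c_mode)
    if c_filesize != 0 and not may_carry_data:
        problems.append("c_filesize must be zero for this file type")
    if stat.S_ISLNK(c_mode) and c_nlink > 1:
        problems.append("a symlink may not have more than one link")
    return problems


def symlink_fields(target: bytes) -> dict:
    """Header values for a symlink: the data is what readlink(2) returns."""
    return {
        "c_mode": stat.S_IFLNK | 0o777,
        "c_nlink": 1,
        "c_filesize": len(target),   # no trailing NUL, as with readlink(2)
        "data": target,
    }
```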

2002-01-13 02:17:44

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Alexander Viro wrote:

>>
> + or "00000000" if it's 070701. Kernel
> + is not expected to verify it in any case.
>


Check.


>
> - The c_filesize should be zero for any non-regular file.
> + The c_filesize can be non-zero only for regular files and symlinks.
> + For symlinks data and c_filesize match the results of readlink(2).
> + Having more than one link to a symlink is illegal.
>

Why can't you have more than one link to a symlink?

-hpa

2002-01-13 04:12:10

by Alexander Viro

Subject: Re: initramfs buffer spec -- second draft



On Sat, 12 Jan 2002, H. Peter Anvin wrote:

> > - The c_filesize should be zero for any non-regular file.
> > + The c_filesize can be non-zero only for regular files and symlinks.
> > + For symlinks data and c_filesize match the results of readlink(2).
> > + Having more than one link to a symlink is illegal.
> >
>
> Why can't you have more than one link to a symlink?

Basically, you'll have no decent way to preserve that when you unpack.

In our case we _can_ do that; in general there is no portable way to
create such links (semantics of link(2) wrt following links differs
even between Linux versions, let alone various Unices).

cpio(1) includes the symlink body with each instance and doesn't
even bother trying to link(2) them when unpacking.

2002-01-13 19:40:02

by Daniel Phillips

Subject: Re: initramfs buffer spec -- second draft

First off, the documentation is great and the approach seems fundamentally
sound.

On January 12, 2002 09:04 am, H. Peter Anvin wrote:
[...]
> PAD(n) means padding with null bytes to an n-byte boundary
> [QUESTION: is the padding relative to the start of the
> previous header, or is it an absolute address? Is it at all
> legal to have a header start on a non-multiple of 4?]

I'll vote for the always/absolute rule.

[...]
> The structure of the cpio_header is as follows (all 8-byte entries
> contain 32-bit hexadecimal ASCII numbers):

I thought there's a binary version of the cpio header. What is the
point of the ascii encoding?

[...]
> The c_mode field matches the contents of st_mode returned by stat(2)
> on Linux, and encodes the file type and file permissions.
>
> The c_filesize should be zero for any non-regular file.
>
> If the filename is "TRAILER!!!" this is actually an end-of-file
> marker; the c_filesize for an end-of-file marker must be zero.

It sure looks ugly, but I suppose the c_filesize=zero is the real
end-of-file marker. Did I mention it sure looks ugly?

--
Daniel

2002-01-13 19:58:14

by Eric W. Biederman

Subject: Re: initramfs buffer spec -- second draft

Alexander Viro <[email protected]> writes:

> On Sat, 12 Jan 2002, H. Peter Anvin wrote:

> > c_chksum 8 bytes CRC of data field if c_magic is 070702
>
> + or "00000000" if it's 070701. Kernel
> + is not expected to verify it in any case.

Why is the kernel not expected to check the data integrity? Usually
end to end data integrity is important. And a check on the data
integrity that tells us that either the bootloader or the hardware is
messed up can save hours of debugging.
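
As a point of reference for this question: although the table calls c_chksum a "CRC", the value stored by cpio's "crc" variant (c_magic 070702) is generally understood to be a plain 32-bit unsigned sum of the data bytes. The Python sketch below shows what verifying it would involve under that assumption; newc_checksum and verify are names invented for the example.

```python
def newc_checksum(data: bytes) -> int:
    """Sum of all data bytes, truncated to 32 bits (assumed definition)."""
    return sum(data) & 0xFFFFFFFF


def verify(c_magic: str, c_chksum: int, data: bytes) -> bool:
    """Check the data field of one entry against its header."""
    if c_magic != "070702":
        return True   # 070701 entries carry no checksum to verify
    return newc_checksum(data) == c_chksum
```

A byte sum is cheap to compute but weak: it does not notice reordered bytes, so it catches gross corruption rather than every error.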

Eric

2002-01-13 19:56:24

by Eric W. Biederman

Subject: Re: initramfs buffer spec -- second draft

"H. Peter Anvin" <[email protected]> writes:

> This is an update to the initramfs buffer format spec I posted
> earlier. The changes are as follows:

Comments. Endian issues are not specified, is the data little, big
or vax endian?

What is the point of alignment? If the data starts as 4 byte aligned,
the 6 byte magic string guarantees the data will be only 2 byte
aligned. This isn't good for 32 or 64 bit architectures.

I do like having a c_magic that at least allows us to change things
in the future if necessary.

Eric

2002-01-13 20:09:16

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Daniel Phillips wrote:

>
>>The structure of the cpio_header is as follows (all 8-byte entries
>>contain 32-bit hexadecimal ASCII numbers):
>
> I thought there's a binary version of the cpio header. What is the
> point of the ascii encoding?
>


Byte order independence. The binary version of cpio is ancient and
obsolete. Unfortunately the SysV people didn't have the htons() etc
macros of BSD, so they had no concept of portable binary formats.


>>The c_mode field matches the contents of st_mode returned by stat(2)
>>on Linux, and encodes the file type and file permissions.
>>
>>The c_filesize should be zero for any non-regular file.
>>
>>If the filename is "TRAILER!!!" this is actually an end-of-file
>>marker; the c_filesize for an end-of-file marker must be zero.
>>
> It sure looks ugly, but I suppose the c_filesize=zero is the real
> end-of-file marker. Did I mention it sure looks ugly?
>


c_filesize == 0 does *NOT* imply an end-of-archive marker. It is the
filename "TRAILER!!!" that does. And yes, it's ugly.

-hpa


2002-01-13 20:12:46

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Eric W. Biederman wrote:

>
> Comments. Endian issues are not specified, is the data little, big
> or vax endian?
>


Not applicable. There are no endian-specific binary structures in the
format AT ALL. ASCII-coded fields are always bigendian.


> What is the point of alignment? If the data starts as 4 byte aligned,
> the 6 byte magic string guarantees the data will be only 2 byte
> aligned. This isn't good for 32 or 64 bit architectures.


They're ASCII-coded, so it supposedly doesn't matter (yes, it's a bit
daft, but blame the SysV people.) The alignment makes sure the *data*
field is 4-byte aligned.
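
The arithmetic behind this can be written out. A Python sketch, assuming the header itself starts on a 4-byte boundary as the draft requires; the helper names are made up for illustration.

```python
HEADER_LEN = 6 + 13 * 8   # 110 bytes, so the header alone ends 2 past a boundary


def align4(offset: int) -> int:
    """Round offset up to the next multiple of 4."""
    return (offset + 3) & ~3


def data_offset(header_start: int, c_namesize: int) -> int:
    """Offset of the data field: header, then filename, then PAD(4)."""
    return align4(header_start + HEADER_LEN + c_namesize)


def next_header_offset(header_start: int, c_namesize: int, c_filesize: int) -> int:
    """Offset of the following header: data, then PAD(4)."""
    return align4(data_offset(header_start, c_namesize) + c_filesize)
```

For a file named "hello" (c_namesize 6) at offset 0, the name ends at 110 + 6 = 116, which is already a multiple of 4, so the data starts there; with 3 bytes of data the next header starts at 120. The hex fields inside the header begin at offset 6 and are therefore only 2-byte aligned, which does not matter because they are read as ASCII digits, not as machine words.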


> I do like having a c_magic that at least allows us to change things
> in the future if necessary.


It's pretty clear from a lot of the comments that a number of people
haven't understood this about the cpio encapsulation: *THIS IS A
CODIFICATION OF AN EXISTING FORMAT.*

-hpa


2002-01-13 20:40:06

by Alexander Viro

Subject: Re: initramfs buffer spec -- second draft



On 13 Jan 2002, Eric W. Biederman wrote:

> "H. Peter Anvin" <[email protected]> writes:
>
> > This is an update to the initramfs buffer format spec I posted
> > earlier. The changes are as follows:
>
> Comments. Endian issues are not specified, is the data little, big
> or vax endian?

Data is what you put into files, byte-by-byte. Headers are ASCII.

> What is the point of alignment? If the data starts as 4 byte aligned,
> the 6 byte magic string guarantees the data will be only 2 byte
> aligned. This isn't good for 32 or 64 bit architectures.

Both data and headers are aligned. And headers are ascii strings.

2002-01-13 21:01:09

by Eric W. Biederman

Subject: Re: initramfs buffer spec -- second draft

"H. Peter Anvin" <[email protected]> writes:

> Eric W. Biederman wrote:
>
> > Comments. Endian issues are not specified, is the data little, big
> > or vax endian?
> >
>
>
> Not applicable. There are no endian-specific binary structures in the format AT
> ALL. ASCII-coded fields are always bigendian.

O.k. Thanks, I missed that part. I just looked back and it is clear
that there are 32 bit values encoded in hexadecimal. And I admit the
bigendian (human readable) is strongly implied from the context.

> > What is the point of alignment? If the data starts as 4 byte aligned,
> > the 6 byte magic string guarantees the data will be only 2 byte
> > aligned. This isn't good for 32 or 64 bit architectures.
>
>
> They're ASCII-coded, so it supposedly doesn't matter (yet, it's a bit daft, but
> blame the SysV people.) The alignment makes sure the *data* field is 4-byte
> aligned.

O.k. So then we have a bit of implied padding after the filename. And
it is necessary to preserve this padding or we break with the
preexisting format definition. You don't gain much with that, as being
4-byte aligned on 64-bit architectures is not fully aligned.

> > I do like having a c_magic that at least allows us to change things
> > in the future if necessary.
>
>
> It's pretty clear from a lot of the comments that a number of people haven't
> understood that the cpio encapsulation *THIS IS A CODIFICATION OF AN EXISTING
> FORMAT.*

Which we are reusing for a different purpose. And because of that we
become trustees of our version of the format. To make it clear that
someone else defines how this format works a reference to the
appropriate specification is called for.

I admit I did a quick search earlier and I did not find this format
specified, elsewhere.

The cases where initramfs will be used are some of the most operating
system specific cases I can imagine. To handle those cases it is necessary
to support the full breadth of the capability of the operating system.
So if initramfs is going to survive today's implementation of the Linux
kernel, or possibly be portable to other operating systems, we must
have an extensible format. It appears c_magic gives us that
extensibility.

Eric

2002-01-13 21:59:45

by Alexander Viro

Subject: Re: initramfs buffer spec -- second draft



On 13 Jan 2002, Eric W. Biederman wrote:

> Which we are reusing for a different purpose. And because of that we
> become trustees of our version of the format. To make it clear that
> someone else defines how this format works a reference to the
> appropriate specification is called for.

We are using it for precisely the same purpose - to put a bunch of
files on a filesystem.

> The cases where initramfs will be used are some of the most operating
> specific cases I can imagine. To handle those cases it is necessary
> to support the full breadth of the capability of the operating system.

Huh? It's a bloody archive - collection of files and nothing else.
What "capability of the operating system"?

2002-01-13 22:38:28

by Eric W. Biederman

Subject: Re: initramfs buffer spec -- second draft

Alexander Viro <[email protected]> writes:

> On 13 Jan 2002, Eric W. Biederman wrote:
>
> > Which we are reusing for a different purpose. And because of that we
> > become trustees of our version of the format. To make it clear that
> > someone else defines how this format works a reference to the
> > appropriate specification is called for.
>
> We are using it for precisely the same purpose - to put a bunch of
> files on a filesystem.

Anytime you are specifying semantics beyond what was in the original
specification it isn't precisely the same case. Close enough not to
matter yes but not precisely the same. The original cpio format does
not specify compression or concatenation of images. It is not
mandated that the cpio format handle the needs of everyone's root
filesystem.

Additionally we now have the potential of generating cpio files from
the bootloaders. And bootloaders should be the kinds of programs that
don't need constant maintenance or upgrading (that is very
destabilizing). So totally reworking the format is not a solution
when we need to change something, even if it is OK for cpio in general.

This changing the format in incompatible ways when there is a new
requirement does seem to be the traditional cpio method.

> > The cases where initramfs will be used are some of the most operating
> > specific cases I can imagine. To handle those cases it is necessary
> > to support the full breadth of the capability of the operating system.
>
> Huh? It's a bloody archive - collection of files and nothing else.
> What "capability of the operating system"?

Exactly. But the standard unix stream of bytes does not cover everyone's
concept of files. Things like:
    Symbolic links,
    Device nodes,
    Resource forks,
    Device links,
    Persistent mount points,
    ACLs,
    Persistent capabilities

are all partial exceptions to "everything is the same kind of file".
The cpio format as is doesn't handle all of these, which is fine, but
we may need some of these later, so we need someplace to expand to
if/when these kinds of things become important.

The startup process is likely to need everything the operating system
can do, to handle some special case or the other. So if at some
future date we support odd types of special files we will probably
need to use them in the system startup code. We already require device
nodes, and find symbolic links very helpful.

Further Linux is dynamic and always changing, so not having some elbow
room for growth is just asking for trouble. All I noted is that
the c_magic field exists so if/when the need arises we can handle
really strange cases. Everyone in Linux being able to use an
initramfs as their root filesystem actually makes a change that
requires special root filesystem support much more likely, because
you only have to change one filesystem.

All I am asking is two things. If we are not assuming guardianship
for our variant of the cpio format we should reference those who do
have guardianship, in the specification. We should be aware that the
cpio format as it now exists may not handle all future needs so
having a mechanism to extend the format when those needs arise without
breaking all existing users is important.

Eric

2002-01-14 23:13:14

by kaih

Subject: Re: initramfs buffer spec -- second draft

[email protected] (Eric W. Biederman) wrote on 13.01.02 in <[email protected]>:

> I admit I did a quick search earlier and I did not find this format
> specified, elsewhere.

The latest existing formal spec is probably POSIX 2001 (look under "pax").
An older POSIX version would have it under "cpio". You'll probably also
find it there in Unix98 a.k.a. SuSv2. (POSIX 2001 (the Austin revision)
supersedes all of those.)

It's a bit long to post here - probably exceeds fair use.

MfG Kai

2002-01-15 00:26:50

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Followup to: <[email protected]>
By author: [email protected] (Kai Henningsen)
In newsgroup: linux.dev.kernel
>
> The latest existing formal spec is probably POSIX 2001 (look under "pax").
> An older POSIX version would have it under "cpio". You'll probably also
> find it there in Unix98 a.k.a. SuSv2. (POSIX 2001 (the Austin revision)
> supersedes all of those.)
>
> It's a bit long to post here - probably exceeds fair use.
>

POSIX only specifies the "old ASCII" cpio format anyway, which is so
limited as to be useless. POSIX also specifies "ustar" and
"pax", two extended tar formats, neither of which is suitable for this
application.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-01-15 17:41:19

by Daniel Phillips

Subject: Re: initramfs buffer spec -- second draft

On January 13, 2002 09:39 pm, Alexander Viro wrote:
> On 13 Jan 2002, Eric W. Biederman wrote:
> > "H. Peter Anvin" <[email protected]> writes:
> >
> > > This is an update to the initramfs buffer format spec I posted
> > > earlier. The changes are as follows:
> >
> > Comments. Endian issues are not specified, is the data little, big
> > or vax endian?
>
> Data is what you put into files, byte-by-byte. Headers are ASCII.

Encoding the numeric fields in ASCII/hex is a goofy wart on an otherwise nice
design. What is the compelling reason? Bytesex isn't it: we should just
pick one or the other and stick with it as we do in Ext2.

Why don't we fix cpio to write a consistent bytesex?

--
Daniel

2002-01-15 20:03:50

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Daniel Phillips wrote:

>
> Encoding the numeric fields in ASCII/hex is a goofy wart on an otherwise nice
> design. What is the compelling reason? Bytesex isn't it: we should just
> pick one or the other and stick with it as we do in Ext2.
>
> Why don't we fix cpio to write a consistent bytesex?
>


Because we want to use existing tools. It's a wart, but not compelling
enough of one to rewrite the tools from scratch. (I would also change
the EOA marker from "TRAILER!!!" to "" since a null filename would not
interfere with the namespace.)

I don't think this application alone is enough to add Yet Another
Version of CPIO. However, if there are more compelling reasons to do so
for CPIO backup reasons itself I guess we could write it up and add it
to GNU cpio as "linux" format...

-hpa


2002-01-15 20:14:20

by Daniel Phillips

Subject: Re: initramfs buffer spec -- second draft

On January 15, 2002 09:03 pm, H. Peter Anvin wrote:
> Daniel Phillips wrote:
>
> >
> > Encoding the numeric fields in ASCII/hex is a goofy wart on an otherwise
> > nice design. What is the compelling reason? Bytesex isn't it: we should just
> > pick one or the other and stick with it as we do in Ext2.
> >
> > Why don't we fix cpio to write a consistent bytesex?
> >
>
>
> Because we want to use existing tools.

It's a mistake not to fix this tool. I'll post the cost in terms of bytes
wasted shortly, pretty tough to argue with that, right?

> It's a wart, but not compelling
> enough of one to rewrite the tools from scratch.

Why would you rewrite from scratch?

> (I would also change
> the EOA marker from "TRAILER!!!" to "" since a null filename would not
> interfere with the namespace.)

Yes!

> I don't think this application alone is enough to add Yet Another
> Version of CPIO. However, if there are more compelling reasons to do so
> for CPIO backup reasons itself I guess we could write it up and add it
> to GNU cpio as "linux" format...

Oh, it is, really it is. It's not just any application, and GNU already
has its own version of cpio.

--
Daniel

2002-01-15 20:15:03

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Daniel Phillips wrote:

>
>>I don't think this application alone is enough to add Yet Another
>>Version of CPIO. However, if there are more compelling reasons to do so
>> for CPIO backup reasons itself I guess we could write it up and add it
>>to GNU cpio as "linux" format...
>
> Oh, it is, really it is. It's not just any application, and GNU already
> has its own version of cpio.
>


But not their own data format.

-hpa


2002-01-15 21:06:30

by Andreas Dilger

Subject: Re: initramfs buffer spec -- second draft

On Jan 15, 2002 21:16 +0100, Daniel Phillips wrote:
> On January 15, 2002 09:03 pm, H. Peter Anvin wrote:
> > Daniel Phillips wrote:
> > > Encoding the numeric fields in ASCII/hex is a goofy wart on an otherwise
> > > nice design. What is the compelling reason? Bytesex isn't it: we
> > > should just pick one or the other and stick with it as we do in Ext2.
> > >
> > > Why don't we fix cpio to write a consistent bytesex?
> >
> > Because we want to use existing tools.
>
> It's a mistake not to fix this tool. I'll post the cost in terms of bytes
> wasted shortly, pretty tough to argue with that, right?

Well, I doubt the difference will be more than a few bytes, if you compare
the cpio archive sizes after compression with gzip.

> > I don't think this application alone is enough to add Yet Another
> > Version of CPIO. However, if there are more compelling reasons to do so
> > for CPIO backup reasons itself I guess we could write it up and add it
> > to GNU cpio as "linux" format...
>
> Oh, it is, really it is. It's not just any application, and GNU already
> has its own version of cpio.

But then every person who wants to build a kernel will have to have
the patched version of cpio until such a time it is part of the standard
cpio tool (which may be "never"). I would much rather use the currently
available tools than save 20 bytes off a 900kB kernel image.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2002-01-15 23:00:37

by Daniel Phillips

Subject: Re: initramfs buffer spec -- second draft

On January 15, 2002 09:14 pm, H. Peter Anvin wrote:
> Daniel Phillips wrote:
> > You apparently wrote:
> > > I don't think this application alone is enough to add Yet Another
> > > Version of CPIO. However, if there are more compelling reasons to do so
> > > for CPIO backup reasons itself I guess we could write it up and add it
> > > to GNU cpio as "linux" format...
> >
> > Oh, it is, really it is. It's not just any application, and GNU already
> > has its own version of cpio.
>
> But not their own data format.

From the man page:

"The new ASCII format is portable between different machine
architectures and can be used on any size file system, but is not
supported by all versions of cpio; currently, it is only supported by
GNU and Unix System V R4."

--
Daniel

2002-01-15 23:06:27

by Daniel Phillips

Subject: Re: initramfs buffer spec -- second draft

On January 15, 2002 10:04 pm, Andreas Dilger wrote:
> On Jan 15, 2002 21:16 +0100, Daniel Phillips wrote:
> > On January 15, 2002 09:03 pm, H. Peter Anvin wrote:
> > > Daniel Phillips wrote:
> > > > Encoding the numeric fields in ASCII/hex is a goofy wart on an otherwise
> > > > nice design. What is the compelling reason? Bytesex isn't it: we
> > > > should just pick one or the other and stick with it as we do in Ext2.
> > > >
> > > > Why don't we fix cpio to write a consistent bytesex?
> > >
> > > Because we want to use existing tools.
> >
> > It's a mistake not to fix this tool. I'll post the cost in terms of bytes
> > wasted shortly, pretty tough to argue with that, right?
>
> Well, I doubt the difference will be more than a few bytes, if you compare
> the cpio archive sizes after compression with gzip.

Coming soon...

Side note: I have a hard time understanding the dual thinking that goes
something like: "we have to save every nanosecond of CPU but wasting disk is
ok because, um, disk is cheap, and everybody has more than they need anyway,
and reading it takes zero time and oh yes, everybody has disks, don't they?"

> > > I don't think this application alone is enough to add Yet Another
> > > Version of CPIO. However, if there are more compelling reasons to do so
> > > for CPIO backup reasons itself I guess we could write it up and add it
> > > to GNU cpio as "linux" format...
> >
> > Oh, it is, really it is. It's not just any application, and GNU already
> > has its own version of cpio.
>
> But then every person who wants to build a kernel will have to have
> the patched version of cpio until such a time it is part of the standard
> cpio tool...

If we go with little-endian then only big-endian architectures will need
the patch, and they tend to need patches for lots of things anyway. Or
if you like I'll write a little utility that goes through the file and
byteswaps all the int fields.

> (which may be "never"). I would much rather use the currently
> available tools than save 20 bytes off a 900kB kernel image.

What if it's more than 20 bytes?

--
Daniel

2002-01-15 23:48:17

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Daniel Phillips wrote:

>
> From the man page:
>
> "The new ASCII format is portable between
> different machine architectures and can be used on any size file system, but is
> not supported by all versions of cpio; currently, it is only supported by GNU and
> Unix System V R4."

>

... which, between them, is virtually all Unices these days.

-hpa

2002-01-15 23:48:47

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Daniel Phillips wrote:

>
> If we go with little-endian then only big-endian architectures will need
> the patch, and they tend to need patches for lots of things anyway. Or
> if you like I'll write a little utility that goes through the file and
> byteswaps all the int fields.
>


HUH?????????????????

-hpa

2002-01-16 00:01:37

by Andreas Dilger

Subject: Re: initramfs buffer spec -- second draft

On Jan 16, 2002 00:09 +0100, Daniel Phillips wrote:
> On January 15, 2002 10:04 pm, Andreas Dilger wrote:
> > Well, I doubt the difference will be more than a few bytes, if you compare
> > the cpio archive sizes after compression with gzip.
>
> Side note: I have a hard time understanding the dual thinking that goes
> something like: "we have to save every nanosecond of CPU but wasting disk is
> ok because, um, disk is cheap, and everybody has more than they need anyway,
> and reading it takes zero time and oh yes, everybody has disks, don't they?"

OK, I agree somewhat that we need to save disk space, just as I agree we
should reduce CPU usage. That said, would you want to save a few CPU
cycles if (for example) it meant we didn't use the ELF binary format,
and had to change? Yes, we went from a.out to ELF, but it was a major
pain even when Linux was far less widely used.

> > But then every person who wants to build a kernel will have to have
> > the patched version of cpio until such a time it is part of the standard
> > cpio tool...
>
> If we go with little-endian then only big-endian architectures will need
> the patch, and they tend to need patches for lots of things anyway. Or
> if you like I'll write a little utility that goes through the file and
> byteswaps all the int fields.

But the proposed cpio format (AFAIK) has ASCII numbers, which is what you
were originally complaining about. I see that cpio(1) says that "by
default, cpio creates binary format archives... and can read archives
created on machines with a different byte-order".

Excluding alignment issues (which can also be handled relatively easily),
is there a reason why we chose the ASCII format over binary, especially
since the binary format _appears_ to be portable (assuming endian
conversions at decoding time), despite warnings to the contrary?

> > (which may be "never"). I would much rather use the currently
> > available tools than save 20 bytes off a 900kB kernel image.
>
> What if it's more than 20 bytes?

Well, anything less than half a sector (or a network packet) isn't
really measurable.

Well, a few quick tests show (GNU cpio version 2.4.2), with raw sizes
in "blocks" as output by cpio, compressed sizes in bytes:

find <dir> | cpio -o -H <format> | gzip -9 | wc -c

dir                  bin (default)         newc (proposed)
                     raw       gzip        raw       gzip
/sbin              15121    3289678      12952    2769451
/etc                8822     689517       8996     693700
/usr/local/sbin     1895     385461       1899     385764

The binary format reports lots of "truncating inode number", but for
the purpose of initramfs, that is not an issue as we don't anticipate
more than 64k files. I don't know why the /sbin test is so heavily
in favour of the newc (ASCII) format, but I repeated it to confirm
the numbers.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2002-01-16 00:30:27

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Andreas Dilger wrote:

>
> But the proposed cpio format (AFAIK) has ASCII numbers, which is what you
> were originally complaining about. I see that cpio(1) says that "by
> default, cpio creates binary format archives... and can read archives
> created on machines with a different byte-order".
>
> Excluding alignment issues (which can also be handled relatively easily),
> is there a reason why we chose the ASCII format over binary, especially
> since the binary format _appears_ to be portable (assuming endian
> conversions at decoding time), despite warnings to the contrary?
>


The "binary" format of cpio is *ancient*. There is no binary equivalent
to the "newc" (SVR4) format.


> The binary format reports lots of "truncating inode number", but for
> the purpose of initramfs, that is not an issue as we don't anticipate
> more than 64k files. I don't know why the /sbin test is so heavily
> in favour of the newc (ASCII) format, but I repeated it to confirm
> the numbers.


There are way too many other problems with the ancient cpio formats. Not
an option.

-hpa


2002-01-16 02:44:12

by Aaron Lehmann

Subject: Re: initramfs buffer spec -- second draft

On Sun, Jan 13, 2002 at 12:53:26PM -0700, Eric W. Biederman wrote:
> Comments. Endian issues are not specified, is the data little, big
> or vax endian?

VAX is little endian.

Perhaps you mean PDP11?
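
To make the distinction concrete, here is a small sketch of how one 32-bit value is laid out in memory under the three byte orders being discussed. The PDP-11 line assumes the usual description of its 32-bit layout: two 16-bit words with the most significant word first, each word stored little-endian.

```python
import struct

value = 0x0A0B0C0D

# Little-endian (x86, VAX): least significant byte first.
little = struct.pack("<I", value)

# Big-endian: most significant byte first.
big = struct.pack(">I", value)

# PDP-11 "middle-endian": high 16-bit word first, each word little-endian.
pdp11 = struct.pack("<HH", value >> 16, value & 0xFFFF)

for label, raw in (("little", little), ("big", big), ("pdp11", pdp11)):
    print(label, raw.hex())
```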

2002-01-16 03:26:11

by Alexander Viro

Subject: Re: initramfs buffer spec -- second draft



On Tue, 15 Jan 2002, Daniel Phillips wrote:

> It's a mistake not to fix this tool. I'll post the cost in terms of bytes
> wasted shortly, pretty tough to argue with that, right?

No, it's actually very easy: squeezing 40 bytes out of file is not worth
_any_ efforts.

2002-01-16 03:34:11

by H. Peter Anvin

Subject: Re: initramfs buffer spec -- second draft

Andreas Dilger wrote:

>>
> Well, a few quick tests show (GNU cpio version 2.4.2), with raw sizes
> in "blocks" as output by cpio, compressed sizes in bytes:
>
> find <dir> | cpio -o -H <format> | gzip -9 | wc -c
>
> dir                  bin (default)         newc (proposed)
>                      raw       gzip        raw       gzip
> /sbin              15121    3289678      12952    2769451
> /etc                8822     689517       8996     693700
> /usr/local/sbin     1895     385461       1899     385764
>
> The binary format reports lots of "truncating inode number", but for
> the purpose of initramfs, that is not an issue as we don't anticipate
> more than 64k files. I don't know why the /sbin test is so heavily
> in favour of the newc (ASCII) format, but I repeated it to confirm
> the numbers.
>


Probably because it does hard links.

-hpa
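
The hard-link explanation above is offered as a guess, and the following is only a sketch of the arithmetic behind it. It assumes that a format which attaches data to a single member of a hard link set stores that data once, while a format without that rule repeats the data for every link. The function name and the sample entries are invented for illustration.

```python
def payload_bytes(entries, share_hard_links):
    """Sum the file data an archive would carry.

    entries: iterable of (name, inode, size) tuples.
    share_hard_links: if True, data for an inode is counted only once.
    """
    seen = set()
    total = 0
    for _name, inode, size in entries:
        if share_hard_links:
            if inode in seen:
                continue  # a later link to the same inode adds no data
            seen.add(inode)
        total += size
    return total


# Hypothetical directory: three names for one 100-byte file, plus one other file.
entries = [
    ("mkfs.a", 11, 100),
    ("mkfs.b", 11, 100),
    ("mkfs.c", 11, 100),
    ("fsck", 12, 50),
]
repeated = payload_bytes(entries, share_hard_links=False)
shared = payload_bytes(entries, share_hard_links=True)
```

A directory with many multiply-linked binaries would show a large gap between the two totals, while one with few hard links would show almost none.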


2002-01-16 18:41:56

by Daniel Phillips

Subject: Re: initramfs buffer spec -- second draft

On January 13, 2002 09:39 pm, Alexander Viro wrote:
> On 13 Jan 2002, Eric W. Biederman wrote:
>
> > "H. Peter Anvin" <[email protected]> writes:
> >
> > > This is an update to the initramfs buffer format spec I posted
> > > earlier. The changes are as follows:
> >
> > Comments. Endian issues are not specified, is the data little, big
> > or vax endian?
>
> Data is what you put into files, byte-by-byte. Headers are ASCII.

In a perfect world we would settle on one of big or little-endian and
byte-swap as appropriate, as we do with, e.g., Ext2 filesystems. However it
seems that cpio in its current form has no concept of byte-swapping. Cpio(1)
can neither generate nor decode a cpio file in the 'foreign' byte sex. So if
we are determined to use cpio as it stands, then we are stuck with the goofy
ASCII encoding, does that sum up the situation?

Too bad about that, otherwise cpio seems quite reasonable.

I just can't get over that ASCII encoding though, and I can't shake the
feeling that relying on never having a file named TRAILER!!! is strange.
It's gratuitous pollution of the namespace.
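
The namespace concern can be shown with a small sketch. It builds a "newc" archive in memory and walks it with a reader that, like cpio, stops at the first entry named TRAILER!!!. The helpers here are hypothetical and handle only the simplest case (regular files, no compression); they follow the padding rule from the spec, with the header and the data each starting on a 4-byte boundary.

```python
TRAILER = b"TRAILER!!!"


def pad4(n):
    """Number of null bytes needed to reach the next multiple of 4."""
    return (-n) % 4


def make_entry(name, data=b"", mode=0o100644):
    """Hypothetical helper: one newc entry, padded to a 4-byte boundary."""
    namez = name + b"\0"
    # ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(namez), 0]
    out = b"070701" + "".join("%08X" % f for f in fields).encode("ascii")
    out += namez
    out += b"\0" * pad4(len(out))
    out += data
    out += b"\0" * pad4(len(out))
    return out


def list_names(archive):
    """Return entry names, stopping at the first entry named TRAILER!!!."""
    names = []
    pos = 0
    while pos + 110 <= len(archive):
        if archive[pos:pos + 6] != b"070701":
            raise ValueError("bad magic at offset %d" % pos)
        filesize = int(archive[pos + 54:pos + 62], 16)
        namesize = int(archive[pos + 94:pos + 102], 16)
        name_start = pos + 110
        name = archive[name_start:name_start + namesize - 1]
        if name == TRAILER:
            break
        names.append(name)
        data_start = name_start + namesize
        data_start += pad4(data_start)
        pos = data_start + filesize
        pos += pad4(pos)
    return names


normal = make_entry(b"init", b"#!/bin/sh\n") + make_entry(TRAILER, mode=0)
# A real file that happens to be called TRAILER!!! hides everything after it.
clash = (make_entry(b"a", b"x") + make_entry(TRAILER, b"real data")
         + make_entry(b"b", b"y") + make_entry(TRAILER, mode=0))
```

In the second archive the reader never sees the entry named b, which is the kind of surprise the reserved name makes possible.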

What was the reason for going with cpio again - so we can use standard tools?
How hard would it be to fix cpio to get rid of the warts? What would we
break? Is the problem that we would have to, ugh, go into user space or,
eww, cooperate with non-kernel developers?

--
Daniel

2002-01-16 18:43:28

by Daniel Phillips

Subject: Re: initramfs buffer spec -- second draft

On January 13, 2002 09:39 pm, Alexander Viro wrote:
> On 13 Jan 2002, Eric W. Biederman wrote:
>
> > "H. Peter Anvin" <[email protected]> writes:
> >
> > > This is an update to the initramfs buffer format spec I posted
> > > earlier. The changes are as follows:
> >
> > Comments. Endian issues are not specified, is the data little, big
> > or vax endian?
>
> Data is what you put into files, byte-by-byte. Headers are ASCII.

Is there a problem with the available tools? Are they not capable of
generating the binary version of the headers?

--
Daniel

2002-01-16 19:16:56

by Daniel Phillips

Subject: [offtopic] duplicate mails (was: initramfs buffer spec -- second draft)

On January 15, 2002 07:34 am, Daniel Phillips wrote:
> [duplicate stuff]

Sorry about the duplicates. This is some initiative that exim, running on my
laptop, has decided on its own to undertake, and I am researching it. It has
something to do with mails that still have some CCs undelivered hanging
around in the queue, and runq resending copies that were already delivered.

For example, right now I have this hanging around in the queue, and I will
eventually do a runq:

58h 1.4K 16Q3MS-0000lx-00 <[email protected]>
[email protected]
D [email protected]
D [email protected]

But now I'm worried exim will send to andrea and lkml again. Hmm.

Is this a bug?

--
Daniel

2002-01-16 20:41:20

by Bill Davidsen

Subject: Re: initramfs buffer spec -- second draft

On Tue, 15 Jan 2002, Daniel Phillips wrote:

> In a perfect world we would settle on one of big or little-endian and
> byte-swap as appropriate, as we do with, e.g., Ext2 filesystems. However it
> seems that cpio in its current form has no concept of byte-swapping. Cpio(1)
> can neither generate nor decode a cpio file in the 'foreign' byte sex. So if
> we are determined to use cpio as it stands, then we are stuck with the goofy
> ASCII encoding, does that sum up the situation?
>
> Too bad about that, otherwise cpio seems quite reasonable.

I have to go back and look, isn't -Hcrc endian-neutral?
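
As background to the question, and as far as I know, GNU cpio's "crc" format uses the same ASCII hex header as "newc", with magic 070702 instead of 070701 and the last header field holding a simple 32-bit sum of the file's data bytes rather than a true CRC. If that is right, it is endian-neutral for the same reason "newc" is. A sketch of that one field, using a hypothetical helper name:

```python
CRC_MAGIC = b"070702"  # "newc" uses b"070701"


def crc_check_field(data):
    """Hypothetical helper: 8 hex digits holding the byte sum of the data."""
    checksum = sum(data) & 0xFFFFFFFF  # sum of unsigned bytes, truncated to 32 bits
    return ("%08X" % checksum).encode("ascii")


field = crc_check_field(b"abc")  # 97 + 98 + 99 = 294 = 0x126
```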

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.