2013-07-10 02:06:45

by Rob Landley

Subject: [RESEND] The initmpfs patches.

Attached, so you don't have to fish them out of:

http://lkml.indiana.edu/hypermail/linux/kernel/1306.3/04204.html

Do they look worth applying, or should I wash it through linux-next for
a bit? (Which I'm not sure how to do if I don't host a git tree on a
server, or I'd have done it already.)

There was a previous post with a patch demonstrating the basic concept
a while ago (https://lwn.net/Articles/545740/). This is the cleaned up,
broken up, tested in as many ways as I could think of, does not have
section mismatches, allows you to disable it at runtime, passes
checkpatch.pl version. Still applies to a git pull from 3 minutes ago
(two patches have offsets, but no fuzz).

Thanks,

Rob


Attachments:
(No filename) (713.00 B)
00.patch (1.18 kB)
01.patch (1.10 kB)
02.patch (1.75 kB)
03.patch (4.62 kB)
04.patch (1.99 kB)
05.patch (1.67 kB)

2013-07-15 21:01:37

by Andrew Morton

Subject: Re: [RESEND] The initmpfs patches.

On Tue, 09 Jul 2013 21:06:39 -0500 Rob Landley wrote:

> Attached, so you don't have to fish them out of:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1306.3/04204.html

Too hard. Especially when I want to reply to a patch. Please resend
as a patch series in the time-honoured fashion?

> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -247,7 +247,14 @@ struct dentry *ramfs_mount(struct file_system_type *fs_type,
>  static struct dentry *rootfs_mount(struct file_system_type *fs_type,
>  	int flags, const char *dev_name, void *data)
>  {
> -	return mount_nodev(fs_type, flags|MS_NOUSER, data, ramfs_fill_super);
> +	static int once;
> +
> +	if (once)
> +		return ERR_PTR(-ENODEV);
> +	else
> +		once++;
> +
> +	return mount_nodev(fs_type, flags, data, ramfs_fill_super);
>  }

The patches do this in a couple of places. The treatment of `once' is
obviously racy. Probably it is unlikely to matter in these contexts,
but it does set a poor example. And it's so trivially fixed with, for
example, test_and_set_bit() that I do think it's worth that change.
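
The race Andrew describes is that two callers can both read `once' as
zero before either increments it. A minimal user-space sketch of the
atomic alternative follows. It is not the code from the patch series:
the kernel's test_and_set_bit() is not available outside the kernel, so
C11's atomic_flag_test_and_set() is substituted here, and the names
mounted_once and claim_first_mount are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the "once" guard in rootfs_mount(). */
static atomic_flag mounted_once = ATOMIC_FLAG_INIT;

/*
 * Returns true for exactly one caller, even when several threads race.
 * atomic_flag_test_and_set() sets the flag and returns its previous
 * value in a single atomic step, so there is no window between the
 * test and the set.
 */
static bool claim_first_mount(void)
{
	return !atomic_flag_test_and_set(&mounted_once);
}
```

A caller that gets false back would take the error path, which in the
quoted patch is the ERR_PTR(-ENODEV) return.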

2013-07-16 04:07:08

by Rob Landley

Subject: Re: [RESEND] The initmpfs patches.

On 07/15/2013 04:01:35 PM, Andrew Morton wrote:
> On Tue, 09 Jul 2013 21:06:39 -0500 Rob Landley wrote:
>
> > Attached, so you don't have to fish them out of:
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/1306.3/04204.html
>
> Too hard. Especially when I want to reply to a patch. Please resend
> as a patch series in the time-honoured fashion?

Ok.

(Balsa is such an incompetent email client I wrote a python script to
do this via raw smtp, and I'm always convinced it's going to screw up
the send. But I think I've got it debugged now...)

> > --- a/fs/ramfs/inode.c
> > +++ b/fs/ramfs/inode.c
> > @@ -247,7 +247,14 @@ struct dentry *ramfs_mount(struct file_system_type *fs_type,
> >  static struct dentry *rootfs_mount(struct file_system_type *fs_type,
> >  	int flags, const char *dev_name, void *data)
> >  {
> > -	return mount_nodev(fs_type, flags|MS_NOUSER, data, ramfs_fill_super);
> > +	static int once;
> > +
> > +	if (once)
> > +		return ERR_PTR(-ENODEV);
> > +	else
> > +		once++;
> > +
> > +	return mount_nodev(fs_type, flags, data, ramfs_fill_super);
> >  }
>
> The patches do this in a couple of places. The treatment of `once' is
> obviously racy. Probably it is unlikely to matter in these contexts,
> but it does set a poor example. And it's so trivially fixed with, for
> example, test_and_set_bit() that I do think it's worth that change.

Fixing in new series. Retesting will probably delay the resend until
morning.

Thanks,

Rob-

2013-07-16 07:13:03

by Ramkumar Ramachandra

Subject: Re: [RESEND] The initmpfs patches.

Rob Landley wrote:
> (Balsa is such an incompetent email client I wrote a python script to do
> this via raw smtp, and I'm always convinced it's going to screw up the send.
> But I think I've got it debugged now...)

Use the tried-and-tested git-send-email.perl, perhaps?

2013-07-16 23:43:07

by Rob Landley

Subject: Re: [RESEND] The initmpfs patches.

On 07/16/2013 02:12:19 AM, Ramkumar Ramachandra wrote:
> Rob Landley wrote:
> > (Balsa is such an incompetent email client I wrote a python script to do
> > this via raw smtp, and I'm always convinced it's going to screw up the send.
> > But I think I've got it debugged now...)
>
> Use the tried-and-tested git-send-email.perl, perhaps?

Is this script of yours any use for patches that aren't, and never
were, in git? (Given that it's not in the kernel tree, I'm guessing
"no".)

That said... in my resend this morning I substituted the message-id of
the first message instead of the reply-to, didn't I? (Trying to make it
a reply to Andrew's most recent message. Confused the list archive, it
seems.)

I'll resend again...

Rob-

2013-07-17 07:03:00

by Ramkumar Ramachandra

Subject: Re: [RESEND] The initmpfs patches.

Rob Landley wrote:
> Is this script of yours any use for patches that aren't, and never were, in
> git? (Given that it's not in the kernel tree, I'm guessing "no".)

It's part of git.git. And yes, it works with plain mbox files
(especially those generated by `git format-patch`).

2013-07-17 23:06:48

by Andrew Morton

Subject: Re: [PATCH 0/5] initmpfs v2: use tmpfs instead of ramfs for rootfs

On Tue, 16 Jul 2013 08:31:13 -0700 (PDT) Rob Landley wrote:

> Use tmpfs for rootfs when CONFIG_TMPFS=y and there's no root=.
> Specify rootfstype=ramfs to get the old initramfs behavior.
>
> The previous initramfs code provided a fairly crappy root filesystem:
> didn't let you --bind mount directories out of it, reported zero
> size/usage so it didn't show up in "df" and couldn't run things like
> rpm that query available space before proceeding, would fill up all
> available memory and panic the system if you wrote too much to it...

The df problem and the mount --bind thing are ramfs issues, are they
not? Can we fix them? If so, that's a less intrusive change, and we
also get a fixed ramfs.

> Using tmpfs instead provides a much better root filesystem.
>
> Changes from last time: use test_and_set_bit() for "once" logic.

2013-07-18 00:15:22

by Hugh Dickins

Subject: Re: [PATCH 0/5] initmpfs v2: use tmpfs instead of ramfs for rootfs

On Wed, 17 Jul 2013, Andrew Morton wrote:
> On Tue, 16 Jul 2013 08:31:13 -0700 (PDT) Rob Landley wrote:
>
> > Use tmpfs for rootfs when CONFIG_TMPFS=y and there's no root=.
> > Specify rootfstype=ramfs to get the old initramfs behavior.
> >
> > The previous initramfs code provided a fairly crappy root filesystem:
> > didn't let you --bind mount directories out of it, reported zero
> > size/usage so it didn't show up in "df" and couldn't run things like
> > rpm that query available space before proceeding, would fill up all
> > available memory and panic the system if you wrote too much to it...
>
> The df problem and the mount --bind thing are ramfs issues, are they
> not? Can we fix them? If so, that's a less intrusive change, and we
> also get a fixed ramfs.

I'll leave others to comment on "mount --bind", but with regard to "df":
yes, we could enhance ramfs with accounting such as tmpfs has, to allow
it to support non-0 "df". We could have done so years ago, but have
always preferred to leave ramfs minimal rather than import tmpfs
features into it one by one.

I prefer Rob's approach of making tmpfs usable for rootfs.

Hugh
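
For context on the "df" point: tools like df derive a filesystem's size
from the block counts that statfs/statvfs report, so a filesystem that
does no accounting shows up as zero-sized. A minimal sketch of that
calculation, assuming the POSIX statvfs() interface; the helper name
fs_total_bytes is hypothetical and not taken from df or from the
patches.

```c
#include <sys/statvfs.h>

/*
 * Hypothetical helper: total size of the filesystem containing `path`,
 * in bytes, computed from the reported block count the way a df-like
 * tool would. Returns -1 if statvfs() fails.
 */
static long long fs_total_bytes(const char *path)
{
	struct statvfs st;

	if (statvfs(path, &st) != 0)
		return -1;

	/* f_blocks is counted in units of f_frsize. */
	return (long long)st.f_blocks * (long long)st.f_frsize;
}
```

On a filesystem that reports zero blocks this returns 0, which is the
case the thread describes for ramfs.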

2013-07-18 23:18:04

by Rob Landley

Subject: Re: [PATCH 0/5] initmpfs v2: use tmpfs instead of ramfs for rootfs

Andrew: I'll save you the time of reading this message.

tl;dr: "I agree with what Hugh said".

You're welcome. :)

On 07/17/2013 07:15:29 PM, Hugh Dickins wrote:
> On Wed, 17 Jul 2013, Andrew Morton wrote:
> > On Tue, 16 Jul 2013 08:31:13 -0700 (PDT) Rob Landley wrote:
> >
> > > Use tmpfs for rootfs when CONFIG_TMPFS=y and there's no root=.
> > > Specify rootfstype=ramfs to get the old initramfs behavior.
> > >
> > > The previous initramfs code provided a fairly crappy root filesystem:
> > > didn't let you --bind mount directories out of it, reported zero
> > > size/usage so it didn't show up in "df" and couldn't run things like
> > > rpm that query available space before proceeding, would fill up all
> > > available memory and panic the system if you wrote too much to it...
> >
> > The df problem and the mount --bind thing are ramfs issues, are they
> > not? Can we fix them? If so, that's a less intrusive change, and we
> > also get a fixed ramfs.
>
> I'll leave others to comment on "mount --bind",

It's unrelated to tmpfs but _is_ related to exposing a non-broken rootfs
to the user.

> but with regard to "df":
> yes, we could enhance ramfs with accounting such as tmpfs has, to allow
> it to support non-0 "df". We could have done so years ago, but have
> always preferred to leave ramfs minimal rather than import tmpfs
> features into it one by one.

Ramfs reporting 0 size is not a new issue; here it is 13 years ago:

http://lkml.indiana.edu/hypermail/linux/kernel/0011.2/0098.html

And people proposed adding resource limits to ramfs at the time (yes,
13 years ago):

http://lkml.indiana.edu/hypermail/linux/kernel/0011.2/0713.html

And Linus complained about complicating ramfs, which he thought was a
good educational example and could be turned into a reusable code library.
(Somewhere around
http://lkml.indiana.edu/hypermail/linux/kernel/0112.3/0257.html
or http://lkml.indiana.edu/hypermail/linux/kernel/0101.0/1167.html or...
I'd have to dig for that one. I remember reading it but my google roll
missed.)

Way back when Linus also mentioned embedded users benefitting from
rootfs, ala:

http://lkml.indiana.edu/hypermail/linux/kernel/0112.3/0307.html

Which is why I documented rootfs to be ramfs "or tmpfs, if that's
enabled" back in 2005:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/ramfs-rootfs-initramfs.txt#n57

And when I found out it still wasn't the case a year later I went
"um, hey!" on the list, but ironically I got pushback from the same
guy who objected to my perl removal patches as an "academic"
exercise because it's not how _he_ uses linux...

http://lkml.indiana.edu/hypermail/linux/kernel/0607.3/2480.html
https://lkml.org/lkml/2013/3/20/321

(And you wonder why embedded guys don't speak up more? I'm an
outright "bullhorn and placard" guy in this community. Random
example: a guy named Rich Felker has been hanging out on the
busybox and uclibc lists and IRC channels for years, and recently
wrote musl-libc.org from "git init" to "builds linux from scratch"
in 2 years. He's on the posix committee list and posts there
multiple times per week. Number of times he's posted to
linux-kernel: zero. I'm sure Sarah Sharp just facepalmed...)

I was recently reminded of initmpfs because I'm finishing up a
contract at Cray and they wanted to do this on their supercomputers
and I went "oh, that's easy", and then had to make it work.
(Embedded and supercomputing have always been closer to each other
than either is to the desktop...) This is very much Not My Area
but I've been waiting a _decade_ for other people to do this and
nada. Really, you could see this as just "fixing my documentation"
from way back when, by changing the code to match the docs. :)

> I prefer Rob's approach of making tmpfs usable for rootfs.

Me too. The resource accounting logic in tmpfs is hundreds of lines,
with shmem_default_max_blocks and shmem_default_max_inodes to specify
default size limits, mount-time option parsing to specify different
values for those limits, plus remount logic (what if you specify a
smaller size after the fact?), plus displaying the settings per-mount
in /proc/mounts... see mm/shmem.c lines 2414 through 2581 for the
largest chunk of it.

That's why we got tmpfs/shmfs as a separate filesystem in the first
place: it's a design decision. Ramfs is intentionally minimalist.

Ramfs can't say how big it is because it doesn't _know_ how big it is.
If you write unlimited data to ramfs, the OOM killer zaps everything but
init and then the system hangs in a page eviction loop. (The OOM killer
can't free pinned page cache with nowhere to evict it to.)

My patch series switching over to tmpfs is much smaller than the tmpfs
size accounting code, and we get the swap backing store for free. Plus
it hooks up years-old, existing, tested code (instead of putting new
untested logic in the boot path), without duplicating functionality.

I.E. "what Hugh said."

Rob-
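
To make the "default size limits" above concrete: the kernel's tmpfs
documentation gives the default size of a tmpfs mount as half of
physical RAM. The sketch below works out that figure from user space.
It is only an illustration, not the code in mm/shmem.c; the helper
names are hypothetical, and it assumes sysconf(_SC_PHYS_PAGES), which
is available on Linux but is not part of POSIX.

```c
#include <unistd.h>

/*
 * Hypothetical helper: page limit a default tmpfs mount would get,
 * assuming the documented default of half of physical RAM.
 */
static long default_tmpfs_max_pages(long phys_pages)
{
	return phys_pages / 2;
}

/* The same limit in bytes for the running machine, or -1 on error. */
static long long default_tmpfs_max_bytes(void)
{
	long pages = sysconf(_SC_PHYS_PAGES);	/* Linux extension */
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages < 0 || page_size < 0)
		return -1;

	return (long long)default_tmpfs_max_pages(pages) * page_size;
}
```

A limit like this is what lets tmpfs refuse further writes with an
error instead of consuming all memory, which is the failure mode
described for ramfs above.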

2013-07-19 00:00:56

by H. Peter Anvin

Subject: Re: [PATCH 0/5] initmpfs v2: use tmpfs instead of ramfs for rootfs

On 07/17/2013 04:06 PM, Andrew Morton wrote:
> On Tue, 16 Jul 2013 08:31:13 -0700 (PDT) Rob Landley wrote:
>
>> Use tmpfs for rootfs when CONFIG_TMPFS=y and there's no root=.
>> Specify rootfstype=ramfs to get the old initramfs behavior.
>>
>> The previous initramfs code provided a fairly crappy root filesystem:
>> didn't let you --bind mount directories out of it, reported zero
>> size/usage so it didn't show up in "df" and couldn't run things like
>> rpm that query available space before proceeding, would fill up all
>> available memory and panic the system if you wrote too much to it...
>
> The df problem and the mount --bind thing are ramfs issues, are they
> not? Can we fix them? If so, that's a less intrusive change, and we
> also get a fixed ramfs.
>

mount --bind might be useful to fix for ramfs in general (as ramfs
should provide minimal standard filesystem functionality, and that one
counts, I believe), but honestly... we should have had tmpfs as a root
filesystem option either as rootfs or as an automatic overmount a long
time ago.

The automatic overmount option (that is tmpfs on top of rootfs) is nice
in some ways, as it makes garbage-collecting the initmpfs trivial; this
might save some boot time in the more conventional root scenarios. On
the other hand, it doesn't exactly seem to be a big problem to just
unlink everything.

-hpa