Hi all,
[Please CC me in replies, I am currently not subscribed to this list.]
I am currently using 2.4.21-ac4 (with a few other patches, but none of
them seems to touch pivot_root in any way) on a VIA EPIA M6000 board,
which works pretty well and seems more stable than the previously used
2.4.21-rc2. However, there is one problem that I am unable to solve:
When switching the root filesystem on the fly basically with:
<prepare the new root fs, which is in fact a cramfs>
mount -t cramfs /dev/rd/0 $mntpoint
cd $mntpoint
mount -nt proc none proc/
mount -nt devfs none dev/
/sbin/pivot_root . mnt <dev/console >dev/console 2>&1
<snip, some further preparations on the new root fs>
/usr/sbin/chroot . /sbin/telinit u <dev/console >/dev/console 2>&1
<snip>
exit 0
it no longer works as expected. I should probably note that the
redirections to /dev/console were not necessary for 2.4.21-rc2; I only
added them while testing 2.4.21-ac4. While the above commands did the trick for
2.4.21-rc2, with 2.4.21-ac4 the kernel processes leave the console on
the old root fs open:
lsof -n | grep mnt
keventd 2 root 0u CHR 5,1 16 /mnt/dev/console
keventd 2 root 1u CHR 5,1 16 /mnt/dev/console
keventd 2 root 2u CHR 5,1 16 /mnt/dev/console
ksoftirqd 3 root 0u CHR 5,1 16 /mnt/dev/console
ksoftirqd 3 root 1u CHR 5,1 16 /mnt/dev/console
ksoftirqd 3 root 2u CHR 5,1 16 /mnt/dev/console
kswapd 4 root 0u CHR 5,1 16 /mnt/dev/console
kswapd 4 root 1u CHR 5,1 16 /mnt/dev/console
kswapd 4 root 2u CHR 5,1 16 /mnt/dev/console
bdflush 5 root 0u CHR 5,1 16 /mnt/dev/console
bdflush 5 root 1u CHR 5,1 16 /mnt/dev/console
bdflush 5 root 2u CHR 5,1 16 /mnt/dev/console
kupdated 6 root 0u CHR 5,1 16 /mnt/dev/console
kupdated 6 root 1u CHR 5,1 16 /mnt/dev/console
kupdated 6 root 2u CHR 5,1 16 /mnt/dev/console
kjournald 7 root 0u CHR 5,1 16 /mnt/dev/console
kjournald 7 root 1u CHR 5,1 16 /mnt/dev/console
kjournald 7 root 2u CHR 5,1 16 /mnt/dev/console
scsi_eh_0 81 root 0u CHR 5,1 16 /mnt/dev/console
scsi_eh_0 81 root 1u CHR 5,1 16 /mnt/dev/console
scsi_eh_0 81 root 2u CHR 5,1 16 /mnt/dev/console
Does anybody have an idea what might be wrong here ?
best regards,
Rene
On 21 July 2003 23:53, Rene Mayrhofer wrote:
> Hi all,
>
> [Please CC me in replies, I am currently not subscribed to this list.]
>
> I am currently using 2.4.21-ac4 (with a few other patches, but none of
> them seems to touch pivot_root in any way) on a VIA EPIA M6000 board,
> which works pretty well and seems more stable than the previously used
> 2.4.21-rc2. However, there is one problem that I am unable to solve:
>
> When switching the root filesystem on the fly basically with:
>
> <prepare the new root fs, which is in fact a cramfs>
> mount -t cramfs /dev/rd/0 $mntpoint
> cd $mntpoint
> mount -nt proc none proc/
> mount -nt devfs none dev/
> /sbin/pivot_root . mnt <dev/console >dev/console 2>&1
> <snip, some further preparations on the new root fs>
> /usr/sbin/chroot . /sbin/telinit u <dev/console >/dev/console 2>&1
btw you probably have to remove the leading / in ">/dev/console" here
> <snip>
> exit 0
Hm... I thought you are running with pid 1 and have to do
exec chroot . /sbin/init <dev/console >/dev/console 2>&1
or the like as a final command.
> it no longer works as expected. I should probably note that the
> redirections to /dev/console were not necessary for 2.4.21-rc2, I tried
> them with 2.4.21-ac4. While the above commands did the trick for
> 2.4.21-rc2, with 2.4.21-ac4 the kernel processes leave the console on
> the old root fs open:
>
> lsof -n | grep mnt
> keventd 2 root 0u CHR 5,1 16 /mnt/dev/console
> keventd 2 root 1u CHR 5,1 16 /mnt/dev/console
> keventd 2 root 2u CHR 5,1 16 /mnt/dev/console
> ksoftirqd 3 root 0u CHR 5,1 16 /mnt/dev/console
> ksoftirqd 3 root 1u CHR 5,1 16 /mnt/dev/console
> ksoftirqd 3 root 2u CHR 5,1 16 /mnt/dev/console
> kswapd 4 root 0u CHR 5,1 16 /mnt/dev/console
> kswapd 4 root 1u CHR 5,1 16 /mnt/dev/console
> kswapd 4 root 2u CHR 5,1 16 /mnt/dev/console
> bdflush 5 root 0u CHR 5,1 16 /mnt/dev/console
> bdflush 5 root 1u CHR 5,1 16 /mnt/dev/console
> bdflush 5 root 2u CHR 5,1 16 /mnt/dev/console
> kupdated 6 root 0u CHR 5,1 16 /mnt/dev/console
> kupdated 6 root 1u CHR 5,1 16 /mnt/dev/console
> kupdated 6 root 2u CHR 5,1 16 /mnt/dev/console
> kjournald 7 root 0u CHR 5,1 16 /mnt/dev/console
> kjournald 7 root 1u CHR 5,1 16 /mnt/dev/console
> kjournald 7 root 2u CHR 5,1 16 /mnt/dev/console
> scsi_eh_0 81 root 0u CHR 5,1 16 /mnt/dev/console
> scsi_eh_0 81 root 1u CHR 5,1 16 /mnt/dev/console
> scsi_eh_0 81 root 2u CHR 5,1 16 /mnt/dev/console
>
> Does anybody have an idea what might be wrong here ?
I don't know for sure but it seems like kernel daemons share fds with pid 1.
On my system:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
init 1 root cwd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
init 1 root rtd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
init 1 root txt REG 0,8 19688 77715 /app/util-linux-2.11p/sbin/init (172.16.42.75:/.rootfs/.std)
init 1 root mem REG 0,8 832636 8248 /app/glibc-2.3/lib/ld-2.3.so (172.16.42.75:/.rootfs/.std)
init 1 root mem REG 0,8 69355 8254 /app/glibc-2.3/lib/libcrypt-2.3.so (172.16.42.75:/.rootfs/.std)
init 1 root mem REG 0,8 16153080 8249 /app/glibc-2.3/lib/libc-2.3.so (172.16.42.75:/.rootfs/.std)
init 1 root 3u FIFO 0,6 405 /dev/initctl
keventd 2 root cwd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
keventd 2 root rtd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
keventd 2 root 3u FIFO 0,6 405 /dev/initctl
ksoftirqd 3 root cwd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
ksoftirqd 3 root rtd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
ksoftirqd 3 root 3u FIFO 0,6 405 /dev/initctl
kswapd 4 root cwd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
kswapd 4 root rtd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
kswapd 4 root 3u FIFO 0,6 405 /dev/initctl
bdflush 5 root cwd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
bdflush 5 root rtd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
bdflush 5 root 3u FIFO 0,6 405 /dev/initctl
kupdated 6 root cwd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
kupdated 6 root rtd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
kupdated 6 root 3u FIFO 0,6 405 /dev/initctl
rpciod 16 root cwd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
rpciod 16 root rtd DIR 0,8 1024 65282 / (172.16.42.75:/.rootfs/.std)
rpciod 16 root 3u FIFO 0,6 405 /dev/initctl
See? fd 3 is open to /dev/initctl in init AND in kernel daemons.
I conclude that the kernel daemons simply share the fd table with init.
So you might try to do the exec chroot . /sbin/init trick
and see whether that works.
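
For anyone who wants to repeat this comparison on another machine, here is a minimal sketch. It is not taken from any tool in this thread: `shared_with_init` is a hypothetical helper that reads `lsof -n` output on stdin and prints every process other than pid 1 that holds the same numeric fd on the same NAME as pid 1. It assumes NAME is the last whitespace-separated field, which is true for /dev/initctl and /dev/console but not for names containing spaces. A match is only a hint that the fd table might be shared, not proof.

```shell
# shared_with_init: read `lsof -n` output on stdin, print
# "COMMAND PID FD NAME" for every non-init process whose numeric fd and
# NAME also appear for pid 1. Hypothetical helper, for illustration only.
shared_with_init() {
    awk '
        # Keep numeric descriptors only (0u, 3u, 10r ...); this skips the
        # header line as well as cwd, rtd, txt and mem entries.
        $4 ~ /^[0-9]+[a-zA-Z]*$/ {
            key = $4 " " $NF
            if ($2 == 1) {
                init[key] = 1
            } else {
                n++
                cmd[n] = $1; pid[n] = $2; k[n] = key
            }
        }
        # Compare at the end so the order of lsof lines does not matter.
        END {
            for (i = 1; i <= n; i++)
                if (k[i] in init) print cmd[i], pid[i], k[i]
        }
    '
}
```

Typical use would be `lsof -n | shared_with_init`; on the system shown above it should list keventd, ksoftirqd and the other kernel daemons with `3u /dev/initctl`.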
--
vda
Hi Denis,
[Please CC me in replies, I am currently not subscribed to this list.]
Denis Vlasenko wrote:
> btw you probably have to remove leading / ^^^
These calls are still within the old root, so it should IMHO be ok.
>><snip>
>>exit 0
>
>
> Hm... I thought you are running with pid 1 and have to do
>
> exec chroot . /sbin/init <dev/console >/dev/console 2>&1
>
> or the like as a final command.
I am not using pivot_root for switching from an initrd to the real root
fs in this case, but I use it to switch roots on the fly during
run-time. Specifically, I have defined a maintenance runlevel with the
root fs being a harddisk partition and a normal runlevel with the rootfs
being a (read-only) cramfs (constructed from the harddisk partition).
This allows the machine (which basically acts as a router and does a few
other neat things) to switch off the harddisk during normal operation
(running completely off RAM), but also allows very simple changes in
maintenance mode. Switching between those modes only means that the
daemons need to be stopped for a few seconds, but all network
connections stay alive because the kernel keeps running.
The scripts for doing the on-the-fly root fs switch are working quite
well by now, but with 2.4.21-ac4 I have the described problem. It worked
perfectly with 2.4.21-rc2.
> See? fd 3 is open to /dev/initctl in init AND in kernel daemons.
> I conclude that daemons simply share fd table with init.
>
> So you might try to do the exec chroot . /sbin/init trick
> and see whether that works.
I tell init to re-execute itself (after pivot_root and thus from the new
root fs), which causes init to close its old fds and open new ones from
the new root fs. This is necessary because init already runs as pid
1 when I start the root fs switching. Maybe something changed with the
kernel process fds from 2.4.21-rc2 to 2.4.21-ac4 ?
best regards,
Rene
> I tell init to re-execute itself (after pivot_root and thus from the new
> root fs), which causes init to close its old fds and open new ones from
> the new root fs with. This is necessary because init already runs as pid
> 1 when I start the root fs switching. Maybe something changed with the
> kernel process fds from 2.4.21-rc2 to 2.4.21-ac4 ?
>
yes, see the addition of the unshare_files function in kernel/fork.c
-Jason
On Tue, 2003-07-22 at 18:37, Jason Baron wrote:
> > I tell init to re-execute itself (after pivot_root and thus from the new
> > root fs), which causes init to close its old fds and open new ones from
> > the new root fs with. This is necessary because init already runs as pid
> > 1 when I start the root fs switching. Maybe something changed with the
> > kernel process fds from 2.4.21-rc2 to 2.4.21-ac4 ?
> >
>
> yes, see the addition of the unshare_files function in kernel/fork.c
Shouldn't really have changed anything except for security exploits and
threaded apps doing weird stuff. In normal situations the files count is
one so we should actually be executing nothing more exciting than an
atomic_inc/atomic_dec.
I wonder what is going on here.
Hi Alan,
Alan Cox wrote:
> Shouldnt really have changed anything except for security exploits and
> threaded apps doing weird stuff. In normal situations the files count is
> one so we should actually be executing nothing more exciting that an
> atomic_inc/atomic_dec.
>
> I wonder what is going on here.
If it is not expected behaviour that the kernel processes no longer
close their open fds on pivot_root, then I'd like to debug this (is my
use of pivot_root correct or am I doing something wrong here ?). I will
try with vanilla 2.4.21 now and see how that goes (or should I rather
try 2.4.22-pre7 ?).
However, I'd like to use your tree on that machine because of the
support for VIA chipsets (it's a VIA EPIA M-6000).
Thanks for the notice that it should not have changed,
Rene
Alan Cox wrote:
>On Maw, 2003-07-22 at 18:37, Jason Baron wrote:
>
>
>>>I tell init to re-execute itself (after pivot_root and thus from the new
>>>root fs), which causes init to close its old fds and open new ones from
>>>the new root fs with. This is necessary because init already runs as pid
>>>1 when I start the root fs switching. Maybe something changed with the
>>>kernel process fds from 2.4.21-rc2 to 2.4.21-ac4 ?
>>>
>>>
>>>
>>yes, see the addition of the unshare_files function in kernel/fork.c
>>
>>
>
>Shouldnt really have changed anything except for security exploits and
>threaded apps doing weird stuff. In normal situations the files count is
>one so we should actually be executing nothing more exciting that an
>atomic_inc/atomic_dec.
>
>I wonder what is going on here.
>
But kernel threads may be incrementing init's files->count before user
space init execs, so unshare_files() after execve("/sbin/init") ends up
copying files.
--Mika
> If it is not expected behaviour that the kernel processes no longer
> close their fds open an pivot_root, then I'd like to debug this (is my
> use of pivot_root correct or am I doing something wrong here ?). I will
> try with vanilla 2.4.21 now and see how that goes (or should I rather
> try 2.4.22-pre7 ?).
2.4.22-pre7 has the unshare_files fix - it's a security fix.
It should not have changed the behaviour, so I'm very interested to know whether
that specific patch set changes the behaviour and precisely what your code
is doing.
#!/bin/sh
# Switch from harddisk to RAM mode.
#
# Rene Mayrhofer, 2003
. /etc/cramdisk.conf
mntpoint=/cramfs
if [ -e /on-cramfs ]; then
echo "Already running on CRAMFS."
exit 1
fi
if grep -q "$mntpoint" /proc/mounts; then
umount $mntpoint/mnt 2>/dev/null
umount $mntpoint/var 2>/dev/null
umount $mntpoint/data 2>/dev/null
umount $mntpoint/dev 2>/dev/null
umount $mntpoint/proc 2>/dev/null
umount $mntpoint
fi
echo "Building CRAMFS image"
#/usr/local/sbin/createramfs.sh
echo "Mounting CRAMFS image"
# this is stupid - why do we have to do it twice before it works ??
dd if=/boot/cramfs.img of=/dev/rd/0 2>/dev/null
mount -t cramfs /dev/rd/0 $mntpoint
dd if=/boot/cramfs.img of=/dev/rd/0 2>/dev/null
mount -t cramfs /dev/rd/0 $mntpoint
cd $mntpoint
echo "Mounting needed kernel filesystems"
mount -nt proc none proc/
mount -nt devfs none dev/
echo "Changing root to CRAMFS"
/sbin/pivot_root . mnt <dev/console >dev/console 2>&1
echo "Creating RAM disk for var/"
mount -nt tmpfs -o size=300M none var/
echo "Creating directories for var"
for d in $CREATE_VAR_DIRS; do
mkdir -p var/$d
done
echo "Copying directories to var/"
for d in $COPY_VAR_DIRS; do
mkdir -p var/$d
cp -dp mnt/var/$d/* var/$d/ 2> /dev/null
done
echo "Linking directories for var"
for d in $LINK_VAR_DIRS; do
ln -s /var-static/$d var/`dirname $d`
done
echo "Re-executing init"
/usr/sbin/chroot . /sbin/telinit u <dev/console >/dev/console 2>&1
echo "Killing all processes that still have stuff open on /mnt"
/usr/bin/killall getty
/usr/sbin/lsof -n | grep "/mnt" |
# only the PID column is needed; the remaining lsof columns go into $rest
while read cmd pid rest; do
# don't kill ourselves or the currently running rc script ....
if [ -d /proc/$pid ] &&
! grep -q "switch-to-cramfs" /proc/$pid/cmdline &&
! grep -q "/etc/init.d/rc" /proc/$pid/cmdline; then
/bin/kill -9 $pid
fi
done
echo "Mounting data directory read-only"
if grep -q "/mnt/data" /proc/mounts; then
umount -n /mnt/data
fi
mount -nt ext2 -o ro /dev/discs/disc0/part3 /data
echo -n "Postponing unmount of filesystems until runlevel switch has completed"
/usr/sbin/chroot . /bin/bash -c '
# the grep -v is because the processlist contains this command itself...
while ps ax | grep -v "\"/etc/init.d/rc\"" | grep -q "/etc/init.d/rc"; do
sleep 1s
done
echo "Unmounting old filesystems"
if grep -q "/mnt/dev" /proc/mounts; then
umount -n /mnt/dev
fi
if grep -q "/mnt/proc/bus/usb" /proc/mounts; then
umount -n /mnt/proc/bus/usb
fi
if grep -q "/mnt/proc" /proc/mounts; then
umount -n /mnt/proc
fi
if grep -q "/mnt/mnt/cdrom" /proc/mounts; then
umount -n /mnt/mnt/cdrom
fi
if grep -q "/mnt/mnt/usb" /proc/mounts; then
umount -n /mnt/mnt/usb
fi
if grep -q "/mnt/data" /proc/mounts; then
umount -n /mnt/data
fi
if umount /mnt; then
/sbin/hdparm -y /dev/hda
fi' < /dev/null > /dev/null 2> /dev/null &
echo "."
exit 0
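
One note on the kill loop in the script above: `grep "/mnt"` matches any lsof line that merely contains `/mnt`, for example a file under /home/mnt. A stricter filter could look like the sketch below. `pids_under` is a hypothetical helper, not part of the original script, and it assumes that NAME is the last field of each lsof line.

```shell
# pids_under PREFIX: read `lsof -n` output on stdin and print each PID once
# if its NAME (assumed to be the last field) is PREFIX itself or a path
# below PREFIX. Hypothetical helper, for illustration only.
pids_under() {
    awk -v p="$1" '
        # $2 must be a PID (this skips the lsof header line).
        $2 ~ /^[0-9]+$/ && ($NF == p || index($NF, p "/") == 1) {
            if (!seen[$2]++) print $2
        }
    '
}
```

With this, `lsof -n | pids_under /mnt` would still list the kernel threads from the report (PIDs 2 to 7 and 81), since they hold /mnt/dev/console open; kill -9 has no effect on those, which is why the old root stays busy.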
Give vanilla 2.4.22-pre7 a go. I suspect it'll break the same if it's the unshare stuff.
/sbin/init used to start up with files->count > 1 and does
close(0);close(1);close(2); -> kernel thread fds close.
Now with unshare_files() and init's files->count == 1 the kernel threads'
/dev/console fds remain open. But one could ask of course so what :)
--Mika
Alan Cox wrote:
>>If it is not expected behaviour that the kernel processes no longer
>>close their fds open an pivot_root, then I'd like to debug this (is my
>>use of pivot_root correct or am I doing something wrong here ?). I will
>>try with vanilla 2.4.21 now and see how that goes (or should I rather
>>try 2.4.22-pre7 ?).
>>
>>
>
>2.4.22pre7 has the unshare_files fix - its a security fix.
>
>It should not have changed the behaviour so I'm very interested to know if
>that specific patch set changes the behaviour and precisely what your code
>is doing
On Tue, 2003-07-22 at 23:14, Mika Penttilä wrote:
> /sbin/init used to start up with files->count > 1 and does
> close(0);close(1);close(2); -> kernel thread fds close.
>
> Now with unshare_files() and init's files->count ==1 the kernel threads
> /dev/console fds remain open. But one could ask of course so what :)
In other words the kernel side got caught out because it assumed
the bogus thread behaviour and needs some close() calls adding. That
would make sense.
Hi Alan,
Alan Cox wrote:
> Give vanilla 2.4.22-pre7 a go. I suspect it'll break the same if its
> the unshare stuff
Yes, I can confirm that now. 2.4.22-pre7 with the patch-o-matic
netfilter stuff (which I need for the machine but IMHO shouldn't change
anything wrt. this problem) also has the same problem.
best regards,
Rene
Alan Cox wrote:
> On Maw, 2003-07-22 at 23:14, Mika Penttilä wrote:
>
>>/sbin/init used to start up with files->count > 1 and does
>>close(0);close(1);close(2); -> kernel thread fds close.
>>
>>Now with unshare_files() and init's files->count ==1 the kernel threads
>>/dev/console fds remain open. But one could ask of course so what :)
The problem with this behaviour is that the old root fs can not be
unmounted in this case, which basically means that the machine will be
unable to switch off its harddisk. And that, at least in my case, is
annoying :)
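
To confirm whether the old root really is still mounted, an exact match on the mount-point column is more reliable than grepping /proc/mounts for a substring. The following is only a sketch: `is_mounted` and its optional second argument are hypothetical, the latter added so the function can be tried against a saved copy of the mount table. It does not handle mount points containing spaces, which /proc/mounts stores in escaped form.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds (exit status 0) if MOUNTPOINT appears as an exact entry in the
# second column of the mount table. MOUNTS_FILE defaults to /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

For example, `is_mounted /mnt && echo "old root still mounted"` after the switch shows whether the final `umount /mnt` went through.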
> In other words the kernel side got caught out because it assumed
> the bogus thread behaviour and needs some close() calls adding. That
> would make sense.
I have to admit that I don't really know the internals and thus don't
completely understand. What would need to be done to fix it ? Change
init's re-exec routines ?
best regards,
Rene
On Wed, 23 Jul 2003, Rene Mayrhofer wrote:
> Alan Cox wrote:
> > On Maw, 2003-07-22 at 23:14, Mika Penttilä wrote:
> >
> >>/sbin/init used to start up with files->count > 1 and does
> >>close(0);close(1);close(2); -> kernel thread fds close.
> >>
> >>Now with unshare_files() and init's files->count ==1 the kernel threads
> >>/dev/console fds remain open. But one could ask of course so what :)
> The problem with this behaviour is that the old root fs can not be
> unmounted in this case, which basically means that the machine will be
> unable to switch off its harddisk. And that, at least in my case, is
> annoying :)
>
>
> > In other words the kernel side got caught out because it assumed
> > the bogus thread behaviour and needs some close() calls adding. That
> > would make sense.
> I have to admin that I don't really know the internals and thus don't
> completely understand. What would need to be done to fix it ? Change
> init's re-exec routines ?
Right. So the semantics of how file tables are shared have changed a bit. I
would think that for at least 'init', it'd be nice to preserve the
original behavior, for situations such as you described. Something like
the following would probably work, although I haven't tried the test
script.
--- linux/kernel/fork.c.orig 2003-07-23 21:34:59.000000000 -0400
+++ linux/kernel/fork.c 2003-07-23 21:35:45.000000000 -0400
@@ -558,7 +558,7 @@ int unshare_files(void)
/* This can race but the race causes us to copy when we don't
need to and drop the copy */
- if(atomic_read(&files->count) == 1)
+ if(atomic_read(&files->count) == 1 || (current->pid == 1))
{
atomic_inc(&files->count);
return 0;
Jason Baron wrote:
> right. so the semantics of how file tables are shared has changed a bit. I
> would think that for at least 'init', it'd be nice to preserve the
> original behavior, for situations such as you described. Something like
> the following would probably work, although i havent' tried the test
> script.
>
> --- linux/kernel/fork.c.orig 2003-07-23 21:34:59.000000000 -0400
> +++ linux/kernel/fork.c 2003-07-23 21:35:45.000000000 -0400
> @@ -558,7 +558,7 @@ int unshare_files(void)
>
> /* This can race but the race causes us to copy when we don't
> need to and drop the copy */
> - if(atomic_read(&files->count) == 1)
> + if(atomic_read(&files->count) == 1 || (current->pid == 1))
> {
> atomic_inc(&files->count);
> return 0;
Thanks for the hint ! I will try that in the evening. Any chance that
this will be in 2.4.22 final ?
best regards,
Rene