2008-02-07 14:13:51

by Tomasz Chmielewski

Subject: why kexec insists on syncing with recent kernels?

According to kernel/kexec.c:

* kexec does not sync, or unmount filesystems so if you need
* that to happen you need to do that yourself.


I saw this was true with 2.6.18 kernel (i.e., it didn't sync), but kexec
syncs with recent kernels (I checked 2.6.23.14 and 2.6.24):

# kexec -e
md: stopping all md devices
sd 2:0:0:0: [sdb] Synchronizing SCSI cache


With kexec on 2.6.18 it was executing a loaded kernel immediately.


Generally, it's a good thing to sync before jumping into a new kernel,
but it breaks my setup here after upgrading from 2.6.18 to 2.6.24.

Why?

I have a couple of diskless (iSCSI-boot) machines with a buggy BIOS (old
Supermicro P4SBR/P4SBE) which randomly freeze after rebooting (the
machine shuts down just fine, but instead of booting again and showing
BIOS bootup messages etc., you just see a blank screen).

Therefore, I use kexec as a workaround for this rebooting problem.

The way kexec works now makes rebooting unreliable again:
- network interfaces are brought down,
- kernel tries to sync - it never will, as we're booted off network,
which is down

Any ideas why kexec insists on syncing?


--
Tomasz Chmielewski
http://blog.wpkg.org


2008-02-07 16:38:20

by Vivek Goyal

Subject: Re: why kexec insists on syncing with recent kernels?

On Thu, Feb 07, 2008 at 03:13:30PM +0100, Tomasz Chmielewski wrote:
> According to kernel/kexec.c:
>
> * kexec does not sync, or unmount filesystems so if you need
> * that to happen you need to do that yourself.
>
>

In the latest kexec code I do see it syncing, but it does not unmount the
filesystems. So this comment looks partially wrong.

> I saw this was true with 2.6.18 kernel (i.e., it didn't sync), but kexec
> syncs with recent kernels (I checked 2.6.23.14 and 2.6.24):
>
> # kexec -e
> md: stopping all md devices
> sd 2:0:0:0: [sdb] Synchronizing SCSI cache

Which kexec-tools are you using?

Syncing is initiated by user space, so changing the kernel will not have
any effect (as long as user space is the same). I think it is just that
the messages are printed by the kernel, so probably 2.6.18 did not print
any message and 2.6.24 does.

>
>
> With kexec on 2.6.18 it was executing a loaded kernel immediately.
>
>
> Generally, it's a good thing to sync before jumping into a new kernel, but
> it breaks my setup here after upgrading from 2.6.18 to 2.6.24.
>
> Why?
>
> I have a couple of diskless (iSCSI-boot) machines with a buggy BIOS (old
> Supermicro P4SBR/P4SBE) which randomly freeze after rebooting (the machine
> shuts down just fine, but instead of booting again, showing BIOS bootup
> messages etc. you can just see blank screen).
>
> Therefore, I use kexec as a workaround for this rebooting problem.
>
> The way kexec works now makes rebooting unreliable again:
> - network interfaces are brought down,
> - kernel tries to sync - it never will, as we're booted off network, which
> is down
>

Kexec has an option, -x (--no-ifdown), which will not bring the network
down. Try that: "kexec -e -x"


> Any ideas why kexec insists on syncing?

To me it makes sense. It just makes sure that cached changes make it to
the file system before you boot into the new kernel.

In latest kexec-tools, I do see sync() is done first and then network
interfaces are brought down.

Try latest kexec tools from:

http://www.vergenet.net/~horms/linux/kexec/kexec-tools/testing/kexec-tools-testing-20071017-rc.tar.gz

and see if it works fine for you.

Thanks
Vivek

2008-02-07 16:59:32

by Tomasz Chmielewski

Subject: Re: why kexec insists on syncing with recent kernels?

Vivek Goyal wrote:
> On Thu, Feb 07, 2008 at 03:13:30PM +0100, Tomasz Chmielewski wrote:
>> According to kernel/kexec.c:
>>
>> * kexec does not sync, or unmount filesystems so if you need
>> * that to happen you need to do that yourself.
>>
>>
>
> In latest kexec code I do see it syncing. But it does not unmount the
> filesystems. So this comment looks like partially wrong.
>
>> I saw this was true with 2.6.18 kernel (i.e., it didn't sync), but kexec
>> syncs with recent kernels (I checked 2.6.23.14 and 2.6.24):
>>
>> # kexec -e
>> md: stopping all md devices
>> sd 2:0:0:0: [sdb] Synchronizing SCSI cache
>
> Which kexec-tools you are using?

# kexec -v
kexec 1.101 released 15 February 2005


> syncing is initiated by user space so changing kernel will not have
> any effect (as long as user space is same). I think just that message
> are spitted by kernel, so probably 2.6.18 did not spit any message and
> 2.6.24 does.

Yes and no.
I just booted 2.6.24 on a diskless system (Mandriva) I normally use with
2.6.18 kernel, did kexec -e... And it executed the kernel immediately,
without any syncing.
On Debian, with the same 2.6.24 kernel, it does sync.

So what user space part does the syncing (and how to prevent it)?

(...)


>> The way kexec works now makes rebooting unreliable again:
>> - network interfaces are brought down,
>> - kernel tries to sync - it never will, as we're booted off network, which
>> is down
>>
>
> Kexec has got an option -x --no-ifdown, which will not bring the network
> down. Try that. "kexec -e -x"

It does seem to help, thanks.

Why does it have to be the last option specified?

I tried the -f option before (don't call shutdown), but it didn't help.


>> Any ideas why kexec insists on syncing?
>
> To me it makes sense. Just making sure that cache changes make to the file
> system before you boot into new kernel.
>
> In latest kexec-tools, I do see sync() is done first and then network
> interfaces are brought down.
>
> Try latest kexec tools from:
>
> http://www.vergenet.net/~horms/linux/kexec/kexec-tools/testing/kexec-tools-testing-20071017-rc.tar.gz

Good to have a newer version, I'll try that, too.


--
Tomasz Chmielewski
http://wpkg.org

2008-02-08 08:53:20

by Geert Uytterhoeven

Subject: Re: why kexec insists on syncing with recent kernels?

On Thu, 7 Feb 2008, Tomasz Chmielewski wrote:
> According to kernel/kexec.c:
>
> * kexec does not sync, or unmount filesystems so if you need
> * that to happen you need to do that yourself.
>
>
> I saw this was true with 2.6.18 kernel (i.e., it didn't sync), but kexec syncs
> with recent kernels (I checked 2.6.23.14 and 2.6.24):
>
> # kexec -e
> md: stopping all md devices
> sd 2:0:0:0: [sdb] Synchronizing SCSI cache
>
>
> With kexec on 2.6.18 it was executing a loaded kernel immediately.
>
>
> Generally, it's a good thing to sync before jumping into a new kernel, but it
> breaks my setup here after upgrading from 2.6.18 to 2.6.24.
>
> Why?
>
> I have a couple of diskless (iSCSI-boot) machines with a buggy BIOS (old
> Supermicro P4SBR/P4SBE) which randomly freeze after rebooting (the machine
> shuts down just fine, but instead of booting again, showing BIOS bootup
> messages etc. you can just see blank screen).
>
> Therefore, I use kexec as a workaround for this rebooting problem.
>
> The way kexec works now makes rebooting unreliable again:
> - network interfaces are brought down,
> - kernel tries to sync - it never will, as we're booted off network, which is
> down
>
> Any ideas why kexec insists on syncing?

kexec calls device_shutdown(), so ->shutdown() will be called for all
devices.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2008-02-08 16:04:32

by Vivek Goyal

Subject: Re: why kexec insists on syncing with recent kernels?

On Thu, Feb 07, 2008 at 05:59:14PM +0100, Tomasz Chmielewski wrote:
> Vivek Goyal wrote:
>> On Thu, Feb 07, 2008 at 03:13:30PM +0100, Tomasz Chmielewski wrote:
>>> According to kernel/kexec.c:
>>>
>>> * kexec does not sync, or unmount filesystems so if you need
>>> * that to happen you need to do that yourself.
>>>
>>>
>>
>> In latest kexec code I do see it syncing. But it does not unmount the
>> filesystems. So this comment looks like partially wrong.
>>
>>> I saw this was true with 2.6.18 kernel (i.e., it didn't sync), but kexec
>>> syncs with recent kernels (I checked 2.6.23.14 and 2.6.24):
>>>
>>> # kexec -e
>>> md: stopping all md devices
>>> sd 2:0:0:0: [sdb] Synchronizing SCSI cache
>>
>> Which kexec-tools you are using?
>
> # kexec -v
> kexec 1.101 released 15 February 2005
>
>
>> syncing is initiated by user space so changing kernel will not have
>> any effect (as long as user space is same). I think just that message
>> are spitted by kernel, so probably 2.6.18 did not spit any message and
>> 2.6.24 does.
>
> Yes and no.
> I just booted 2.6.24 on a diskless system (Mandriva) I normally use with
> 2.6.18 kernel, did kexec -e... And it executed the kernel immediately,
> without any syncing.
> On Debian, with the same 2.6.24 kernel, it does sync.
>
> So what user space part does the syncing (and how to prevent it)?

Syncing is initiated by kexec-tools. The following is the code in
kexec/kexec.c in kexec-tools-testing.tar.gz:


    if ((result == 0) && do_sync) {
        sync();
    }

I think this problem has nothing to do with syncing. There seems to be
some dependency on not shutting down the network here. You might want to
debug exactly where it gets stuck.

- Specify the earlyprintk= parameter for the second kernel and see if
  control reaches the second kernel.

- Otherwise, specify the --console-serial parameter on the "kexec -l"
  command line; it should display the message "I am in purgatory" on the
  serial console. This just means that control has reached at least
  purgatory.

Right now there does not seem to be any option to prevent syncing, and
I don't know why one would want one.

> (...)
>
>
>>> The way kexec works now makes rebooting unreliable again:
>>> - network interfaces are brought down,
>>> - kernel tries to sync - it never will, as we're booted off network,
>>> which is down
>>>
>>
>> Kexec has got an option -x --no-ifdown, which will not bring the network
>> down. Try that. "kexec -e -x"
>
> It does seem to help, thanks.
>
> Why it has to be the last option specified?
>

I have no idea. This might be a stale comment. Try putting -x before -e.

> I tried -f option before (don't call shutdown), but it didn't help.
>

Even if you used -f, it must have shut down the network. I think that
somehow in the latest kernels there is some dependency on the network,
and that's why not shutting down the network is helping you in this case.

Thanks
Vivek

2008-02-08 17:18:49

by Randy Dunlap

Subject: Re: why kexec insists on syncing with recent kernels?

On Fri, 8 Feb 2008 11:04:08 -0500 Vivek Goyal wrote:

> On Thu, Feb 07, 2008 at 05:59:14PM +0100, Tomasz Chmielewski wrote:
> > Vivek Goyal wrote:
> >> On Thu, Feb 07, 2008 at 03:13:30PM +0100, Tomasz Chmielewski wrote:
> >>> According to kernel/kexec.c:
> >>>
> >>> * kexec does not sync, or unmount filesystems so if you need
> >>> * that to happen you need to do that yourself.
> >>>
> >>>
> >>
> >> In latest kexec code I do see it syncing. But it does not unmount the
> >> filesystems. So this comment looks like partially wrong.
> >>
> >>> I saw this was true with 2.6.18 kernel (i.e., it didn't sync), but kexec
> >>> syncs with recent kernels (I checked 2.6.23.14 and 2.6.24):
> >>>
> >>> # kexec -e
> >>> md: stopping all md devices
> >>> sd 2:0:0:0: [sdb] Synchronizing SCSI cache
> >>
> >> Which kexec-tools you are using?
> >
> > # kexec -v
> > kexec 1.101 released 15 February 2005
> >
> >
> >> syncing is initiated by user space so changing kernel will not have
> >> any effect (as long as user space is same). I think just that message
> >> are spitted by kernel, so probably 2.6.18 did not spit any message and
> >> 2.6.24 does.
> >
> > Yes and no.
> > I just booted 2.6.24 on a diskless system (Mandriva) I normally use with
> > 2.6.18 kernel, did kexec -e... And it executed the kernel immediately,
> > without any syncing.
> > On Debian, with the same 2.6.24 kernel, it does sync.
> >
> > So what user space part does the syncing (and how to prevent it)?
>
> Syncing is initiated by kexec-tools. Following is the code in
> kexec/kexec.c in kexec-tools-testing.tar.gz
>
>
> if ((result == 0) && do_sync) {
> sync();
>
> I think this problem has nothing to do with syncing. There seems to be
> some dependency on not shutting down network here. You might want to
> debug, exactly where does it get stuck.
>
> - Specify earlyprintk= parameter for second kernel and see if control
> is reaching to second kernel.
>
> - Otherwise specify --console-serial parameter on "kexec -l" commandline
> and it should display a message "I am in purgatory" on serial console.
> This will just mean that control has reached at least till purgatory.
>
> Right now there does not seem to be any option to prevent syncing and
> I don't know why would one like to have one.
>
> > (...)
> >
> >
> >>> The way kexec works now makes rebooting unreliable again:
> >>> - network interfaces are brought down,
> >>> - kernel tries to sync - it never will, as we're booted off network,
> >>> which is down
> >>>
> >>
> >> Kexec has got an option -x --no-ifdown, which will not bring the network
> >> down. Try that. "kexec -e -x"
> >
> > It does seem to help, thanks.
> >
> > Why it has to be the last option specified?
> >
>
> I have no idea. This might be an stale comment. Try putting -x before -e.
>
> > I tried -f option before (don't call shutdown), but it didn't help.
> >
>
> Even if you did -f, it must have shutdown the network. I think somehow
> in latest kernels there is some dependency on network and that's why
> not shutting down network in this case is helping you.

I'm seeing NFS mounts take forever to unmount (at shutdown/reboot).
(forever => 1 hour ... or never completes)

Is this similar to the problem that the OP is asking about?


---
~Randy

2008-02-08 17:21:57

by Tomasz Chmielewski

Subject: Re: why kexec insists on syncing with recent kernels?

Randy Dunlap wrote:

(...)

>> Even if you did -f, it must have shutdown the network. I think somehow
>> in latest kernels there is some dependency on network and that's why
>> not shutting down network in this case is helping you.
>
> I'm seeing NFS mounts take forever to unmount (at shutdown/reboot).
> (forever => 1 hour ... or never completes)
>
> Is this similar to the problem that the OP is asking about?

Is it a diskless station?

Even if not, just make sure you don't shut the network down before NFS
is actually unmounted...?


--
Tomasz Chmielewski
http://wpkg.org

2008-02-08 17:28:22

by Vivek Goyal

Subject: Re: why kexec insists on syncing with recent kernels?

On Fri, Feb 08, 2008 at 06:19:48PM +0100, Tomasz Chmielewski wrote:
> Randy Dunlap wrote:
>
> (...)
>
>>> Even if you did -f, it must have shutdown the network. I think somehow
>>> in latest kernels there is some dependency on network and that's why
>>> not shutting down network in this case is helping you.
>>
>> I'm seeing NFS mounts take forever to unmount (at shutdown/reboot).
>> (forever => 1 hour ... or never completes)
>>
>> Is this similar to the problem that the OP is asking about?
>
> Is it a diskless station?
>
> Even if not, just make sure you don't shut the network down before NFS is
> actually unmounted...?

The network is shut down just before kexec -e finally enters the kernel
and starts preparing to jump to the new kernel. Any syncing or file
system unmounting operation will be done before that.

I can't understand why not disabling the network would help with the NFS
umount or the syncing operation. It has to be something else.

Thanks
Vivek

2008-02-11 12:38:34

by Geert Uytterhoeven

Subject: Re: why kexec insists on syncing with recent kernels?

On Fri, 8 Feb 2008, Tomasz Chmielewski wrote:
> Randy Dunlap wrote:
>
> (...)
>
> > > Even if you did -f, it must have shutdown the network. I think somehow
> > > in latest kernels there is some dependency on network and that's why
> > > not shutting down network in this case is helping you.
> >
> > I'm seeing NFS mounts take forever to unmount (at shutdown/reboot).
> > (forever => 1 hour ... or never completes)
> >
> > Is this similar to the problem that the OP is asking about?
>
> Is it a diskless station?
>
> Even if not, just make sure you don't shut the network down before NFS is
> actually unmounted...?

JFYI, on PS3, kexec works fine with NFS root.

With kind regards,

Geert Uytterhoeven
Software Architect
