LinuxLists.cc - [PATCH] kexec for ia64

2006-03-13 17:39:45

Subject: [PATCH] kexec for ia64

I have updated kexec patch for ia64. Attached patch fixes a couple of
bugs from previous version and incorporates code developed by Nan Hai.
This patch works on 2.6.16-rc6 kernel. Also attached is a patch for
kexec-tools which applies on top of kexec-tools-1.101 release from Eric
Biederman <http://www.xmission.com/%
7Eebiederm/files/kexec/kexec-tools-1.101.tar.gz> and adds support for
ia64. Please test and provide feedback.

I am working on integrating kdump support and will post that patch once
I have tested it.

--
Khalid

====================================================================
Khalid Aziz Open Source and Linux Organization
(970)898-9214 Hewlett-Packard
[email protected] Fort Collins, CO

"The Linux kernel is subject to relentless development"
- Alessandro Rubini

Attachments:

kexec-ia64-2.6.16-rc2.patch (20.69 kB)
kexec-ia64-tools.patch (25.83 kB)
Download all attachments

2006-03-14 06:46:36

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] kexec for ia64

Khalid Aziz <[email protected]> wrote:
>
> I have updated kexec patch for ia64. Attached patch fixes a couple of
> bugs from previous version and incorporates code developed by Nan Hai.
> This patch works on 2.6.16-rc6 kernel. Also attached is a patch for
> kexec-tools which applies on top of kexec-tools-1.101 release from Eric
> Biederman <http://www.xmission.com/%
> 7Eebiederm/files/kexec/kexec-tools-1.101.tar.gz> and adds support for
> ia64. Please test and provide feedback.
>
> I am working on integrating kdump support and will post that patch once
> I have tested it.
>

> diff -urNp linux-2.6.16-rc2/arch/ia64/kernel/crash.c linux-2.6.16-rc2-ia64kexec/arch/ia64/kernel/crash.c
> --- linux-2.6.16-rc2/arch/ia64/kernel/crash.c 1969-12-31 17:00:00.000000000 -0700
> +++ linux-2.6.16-rc2-ia64kexec/arch/ia64/kernel/crash.c 2006-02-09 15:36:23.000000000 -0700
> ...
> +extern void device_shutdown(void);
> ...

extern declarations always go in headers, please. This patch adds zillions
of them in .c files. It is risky and duplicative.

> diff -urNp linux-2.6.16-rc2/arch/ia64/kernel/machine_kexec.c linux-2.6.16-rc2-ia64kexec/arch/ia64/kernel/machine_kexec.c
> --- linux-2.6.16-rc2/arch/ia64/kernel/machine_kexec.c 1969-12-31 17:00:00.000000000 -0700
> +++ linux-2.6.16-rc2-ia64kexec/arch/ia64/kernel/machine_kexec.c 2006-03-10 14:36:51.000000000 -0700
> ...
> +#include <linux/kernel.h>
> +#include <linux/config.h>
> +#include <linux/mm.h>
> +#include <linux/kexec.h>
> +#include <linux/pci.h>
> +#ifdef CONFIG_HOTPLUG_CPU
> +#include <linux/cpu.h>
> +#endif

The ifdef shouldn't be needed.

> +
> +DECLARE_PER_CPU(u64, ia64_mca_pal_base);
> +
> +extern unsigned long ia64_iobase;
> +extern unsigned long kexec_reboot;
> +extern void kexec_stop_this_cpu(void *);
> +extern struct subsystem devices_subsys;

In header files.

> +const extern unsigned char relocate_new_kernel[];
> +const extern unsigned long kexec_fake_sal_rendez[];
> +const extern unsigned int relocate_new_kernel_size;

Even things which are defined via vmlinux.lds magic should be declared in
headers.

> +extern void ioc_iova_disable(void);

In a header.

> +volatile extern long kexec_rendez;
> +volatile const extern unsigned char kexec_rendez_reloc[];

Ditto.

> +
> +void machine_shutdown(void)
> +{
> + struct pci_dev *dev;
> + irq_desc_t *idesc;
> + cpumask_t mask = CPU_MASK_NONE;
> +
> + /* Disable all PCI devices */
> + list_for_each_entry(dev, &pci_devices, global_list) {
> + if (!(dev->is_enabled)) {
> + continue;
> + }

Unneeded braces.

> + if (!(idesc = irq_descp(dev->irq)))

This:

idesc = irq_descp(dev->irq);
if (!idesc)

is preferred.

> +#ifdef CONFIG_SMP
> +#ifdef CONFIG_HOTPLUG_CPU

#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_CPU)

is tidier. But CONFIG_HOTPLUG_CPU already depends on CONFIG_SMP.

> + int cpu;
> +
> + for_each_online_cpu(cpu) {
> + if (cpu != smp_processor_id())
> + cpu_down(cpu);
> + }
> +#else
> + smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
> +#endif
> +#endif

Why is different code needed for hotplug cpu?

> + ia64_set_itv(1<<16);
> + /* Interrupts aren't acceptable while we reboot */
> + local_irq_disable();
> +
> + /* set kr0 to the appropriate address */
> + set_io_base();
> +
> + {
> + unsigned long pta, impl_va_bits;
> +
> +# define pte_bits 3
> +# define vmlpt_bits (impl_va_bits - PAGE_SHIFT + pte_bits)

Why not simply open-code these things?

> +# define POW2(n) (1ULL << (n))

And that.

> + /* Disable VHPT */
> + impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
> + pta = POW2(61) - POW2(vmlpt_bits);
> + ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
> + }
> +
> +#ifdef CONFIG_IA64_HP_ZX1
> + ioc_iova_disable();
> +#endif

> ...

> --- linux-2.6.16-rc2/arch/ia64/kernel/smp.c 2006-01-02 20:21:10.000000000 -0700
> +++ linux-2.6.16-rc2-ia64kexec/arch/ia64/kernel/smp.c 2006-02-15 16:37:33.000000000 -0700
> @@ -30,6 +30,9 @@
> #include <linux/delay.h>
> #include <linux/efi.h>
> #include <linux/bitops.h>
> +#ifdef CONFIG_KEXEC
> +#include <linux/kexec.h>
> +#endif

The ifdefs shouldn't be needed.

> #include <asm/atomic.h>
> #include <asm/current.h>
> @@ -84,6 +87,43 @@ unlock_ipi_calllock(void)
> spin_unlock_irq(&call_lock);
> }
>
> +#ifdef CONFIG_KEXEC
> +extern void kexec_fake_sal_rendez(void *start, unsigned long wake_up,
> + unsigned long pal_base);

That should go in a header file.

> +#define pte_bits 3
> +#define vmlpt_bits (impl_va_bits - PAGE_SHIFT + pte_bits)
> +#define POW2(n) (1ULL << (n))

Duplicative.

> +DECLARE_PER_CPU(u64, ia64_mca_pal_base);

In a header file.

> /*
> + * Terminate any outstanding interrupts
> + */
> +void terminate_irqs(void)
> +{
> + struct irqaction * action;
> + irq_desc_t *idesc;
> + int i;
> +
> + for (i=0; i<NR_IRQS; i++) {
> + idesc = irq_descp(i);
> + action = idesc->action;
> + if (!action)
> + continue;
> + if (idesc->handler->end)
> + idesc->handler->end(i);
> + }
> +}

Perhaps that should be in kernel/irq/something.c?

2006-03-14 08:29:40

by Zou, Nanhai

[permalink] [raw]

Subject: Re: [PATCH] kexec for ia64

On Tue, 2006-03-14 at 01:39, Khalid Aziz wrote:
> I have updated kexec patch for ia64. Attached patch fixes a couple of
> bugs from previous version and incorporates code developed by Nan Hai.
> This patch works on 2.6.16-rc6 kernel. Also attached is a patch for
> kexec-tools which applies on top of kexec-tools-1.101 release from Eric
> Biederman <http://www.xmission.com/%
> 7Eebiederm/files/kexec/kexec-tools-1.101.tar.gz> and adds support for
> ia64. Please test and provide feedback.
>
> I am working on integrating kdump support and will post that patch once
> I have tested it.

Thanks for merging the patches,

Some issues in the patches.
1. in machine_crash_shutdown
we can't call device_shutdown, because device or device driver may fail
at this point.
call to machine_shutdown is unnecessary, because high level code will
call it.

2. I am afraid
#ifdef ...
#include ...
#endif
is not the linux including style, ifdefs are already in header file, you
just need to include the header. and there is also some unnecessary extern declares in your code.

3. Is the set ar.k0 code necessary? ar.k0 is already holding the right
value.

4. Is the VHPT disable code necessary? kernel will soon goes into
Physical mode and the new kernel will reset VHPT walker.

5. Is the PCI disable code too complex?

The overall concern is I am afraid the code is too much than
necessary.

Attach is the kdump patches which I have sent to you for merging.
I post it to community for early test on various platforms before
merging is done.

the first patch applies to 2.6.15 kernel

the second patch applies to kexec-tools-1.101 and the generic
kexec-tools-1.101-kdump.patch

to test kexec, build the kernel with CPU_HOTPLUG and kexec enabled,
reboot to this kernel.
kexec -l vmlinux.gz --initrd="...." --append="...."
kexec -e

the second kernel can be any kernel even an 2.4 based kernel...

to test kdump, build the first kernel with kdump enabled,
you may need to enable sysrq support to help you trigger a crash.

Build a second crashdumping UP kernel with pseudo proc fs vmcore
enabled.

boot to first kernel with kernel parameter crashkernel=XXXM@YYYM which
means reserve XXXM memory at physical address YYYM for crashdumping
kernel.

then kexec -p crash-kernel-vmlinux.gz --initrd="...." --append="...."

echo c > /proc/sysre-trigger
to tigger a crash.

Then the crash dumping kernel boots, log into the crash dumping kernel,
cp /proc/vmcore core

gdb first-kernel-vmlinux core

Thanks
Zou Nan hai

Attachments:

ia64-kexec-kdump-2.6.15.patch (19.26 kB)
ia64-kexec-tool-kdump.patch.gz (12.24 kB)
Download all attachments

2006-03-14 16:00:19

by Khalid Aziz

[permalink] [raw]

Subject: Re: [PATCH] kexec for ia64

On Mon, 2006-03-13 at 22:44 -0800, Andrew Morton wrote:
> Khalid Aziz <[email protected]> wrote:
> > + int cpu;
> > +
> > + for_each_online_cpu(cpu) {
> > + if (cpu != smp_processor_id())
> > + cpu_down(cpu);
> > + }
> > +#else
> > + smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
> > +#endif
> > +#endif
>
> Why is different code needed for hotplug cpu?

Hi Andrew,

It is preferable to use cpu_down() to shoot down a cpu, but this
function is available only with CONFIG_HOTPLUG_CPU. If
CONFIG_HOTPLUG_CPU is not enabled, only other way to shut down slave
CPUs in a state where they would come back up when the new kernel boots
up and sends them IPI is to hold them in a fake rendezvous which is what
the function kexec_stop_this_cpu() does. So the choices are either to
keep both ways of shutting down CPUs or go with kexec_stop_this_cpu()
for all cases.

Thanks for the comments. I will move declarations to header files as you
suggested.

--
Khalid

====================================================================
Khalid Aziz Open Source and Linux Organization
(970)898-9214 Hewlett-Packard
[email protected] Fort Collins, CO

"The Linux kernel is subject to relentless development"
- Alessandro Rubini

2006-03-14 16:08:09

by Khalid Aziz

[permalink] [raw]

Subject: Re: [PATCH] kexec for ia64

On Tue, 2006-03-14 at 14:48 +0800, Zou Nan hai wrote:
> 3. Is the set ar.k0 code necessary? ar.k0 is already holding the right
> value.

Purely defensive coding to ensure new kernel does not fall on its
face :)

>
> 4. Is the VHPT disable code necessary? kernel will soon goes into
> Physical mode and the new kernel will reset VHPT walker.

Again, playing it safe. We do not want VHPT walker waking up at this
point. Instead of assuming code will not do anything that could cause
VHPT walker to wake up, it is better to just disable it. This way, if
any code makes erroneous references to a virtual address which causes
VHPT walker to make a TLB entry, it will simply get a page fault and we
can catch that. It is much harder to debug if VHPT walker silently makes
a TLB entry for an unexpected virtual address reference and then things
go wrong further down the line.

>
> 5. Is the PCI disable code too complex?

I have simplified it as much as I can. Suggestions to simplify further
would be appreciated.
>
> The overall concern is I am afraid the code is too much than
> necessary.
>

After testing this kexec code for over 10,000 iterations of kexec'ing, I
have found not shutting devices down results in many corner cases that
have been fairly hard to debug. Adding all this code to shut down as
much of the hardware as possible has resulted in much more reliable
kexec code.

--
Khalid

====================================================================
Khalid Aziz Open Source and Linux Organization
(970)898-9214 Hewlett-Packard
[email protected] Fort Collins, CO

"The Linux kernel is subject to relentless development"
- Alessandro Rubini

2006-03-15 01:48:32

by Zou, Nanhai

[permalink] [raw]

Subject: Re: [PATCH] kexec for ia64

On Wed, 2006-03-15 at 00:08, Khalid Aziz wrote:
> On Tue, 2006-03-14 at 14:48 +0800, Zou Nan hai wrote:
> > 3. Is the set ar.k0 code necessary? ar.k0 is already holding the right
> > value.
>
> Purely defensive coding to ensure new kernel does not fall on its
> face :)
>
I am afraid purely defensive code should go through efi memory map and
find out the ar.k0. This can be done in purgatory code, In my current
pugatory implement, ar.k0 is take from the previous kernel, but I can
modify it to pick the value from efi memmap if this is really a concern.

> >
> > 4. Is the VHPT disable code necessary? kernel will soon goes into
> > Physical mode and the new kernel will reset VHPT walker.
>
> Again, playing it safe. We do not want VHPT walker waking up at this
> point. Instead of assuming code will not do anything that could cause
> VHPT walker to wake up, it is better to just disable it. This way, if
> any code makes erroneous references to a virtual address which causes
> VHPT walker to make a TLB entry, it will simply get a page fault and we
> can catch that. It is much harder to debug if VHPT walker silently makes
> a TLB entry for an unexpected virtual address reference and then things
> go wrong further down the line.
>
But we are running in physical mode, how could VHPT walker be
invoked?
> >
> > 5. Is the PCI disable code too complex?
>
> I have simplified it as much as I can. Suggestions to simplify further
> would be appreciated.
> >
I am afraid the irq and device disable code in machine_shutdown in
your patch will never be called because kernel has already iterate on
each device and called the driver->shutdown method for them, and I don't
think we need to set affinity and power state here.

Please take a look at my patch, machine_shutdown is empty except CPU
down code. There is another function device_shootdown used at the time
of crashing.
> > The overall concern is I am afraid the code is too much than
> > necessary.
> >
>
> After testing this kexec code for over 10,000 iterations of kexec'ing, I
> have found not shutting devices down results in many corner cases that
> have been fairly hard to debug. Adding all this code to shut down as
> much of the hardware as possible has resulted in much more reliable
> kexec code.
I have also tested tens of thousands round of kexec to kexec over my
patch, I have never seem device shutdown issue.

I still prefer small and clean kernel patch than put too much
redundant code in kernel.

Thanks.
Zou Nan hai