2017-06-30 04:07:56

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 00/12] Unify the interrupt delivery mode and do its setup in advance

[Background]

MP specification defines three different interrupt delivery modes as follows:

1. PIC Mode
2. Virtual Wire Mode
3. Symmetric I/O Mode

They will be setup in the different periods of booting time:
1. *PIC Mode*, the default interrupt delivery modes, will be set first.
2. *Virtual Wire Mode* will be setup during ISA IRQ initialization( step 1
in the figure.1).
3. *Symmetric I/O Mode*'s setup is related to the system
3.1 In SMP-capable system, setup during prepares CPUs(step 2)
3.2 In UP system, setup during initializes itself(step 3).


start_kernel
+---------------+
|
+--> .......
|
| setup_arch
+--> +-------+
|
| init_IRQ
+-> +--+-----+
| | init_ISA_irqs
| +------> +-+--------+
| | +----------------+
+---> +------> | 1.init_bsp_APIC|
| ....... +----------------+
+--->
| rest_init
+--->---+-----+
| | kernel_init
| +> ----+-----+
| | kernel_init_freeable
| +-> ----+-------------+
| | smp_prepare_cpus
| +---> +----+---------+
| | | +-------------------+
| | +-> |2. apic_bsp_setup |
| | +-------------------+
| |
v | smp_init
+---> +---+----+
| +-------------------+
+--> |3. apic_bsp_setup |
+-------------------+
figure.1 The flow chart of the kernel startup process

[Problem]

1. Cause kernel in an unmatched mode at the beginning of booting time.
2. Cause the dump-capture kernel hangs with 'notsc' option inherited
from 1st kernel option.
3. Cause the code hard to read and maintain.

As Ingo's and Eric's discusses[1,2], it need to be refactor.

[Solution]

1. Construct a selector to unify these switches

+------------+
|disable_apic+--------------------+
+------------+ true |
|false |
| |
+------------v------------------+ |
|!boot_cpu_has(X86_FEATURE_APIC)+-------+
+-------------------------------+ true |
|false |
| |
+-------v---------+ v
|!smp_found_config| PIC MODE
+---------------+-+
|false |true
| |
v +---v---------+
SYMMETRIC IO MODE | !acpi_lapic |
+------+------+
|
v
VIRTUAL WIRE MODE

2. Unifying these setup steps of SMP-capable and UP system

start_kernel
---------------+
|
|
|
| x86_late_time_init
+---->---+------------+
| |
| | +------------------------+
| +----> | 4. init_interrupt_mode |
| +------------------------+
v


3. Execute the function as soon as possible.

[Test]

1. In a theoretical code analysis, the patchset can wrap the original
logic.

1) The original logic of the interrupt delivery mode setup:

-Step O_1) Keep in PIC mode or virtual wire mode:

Check (smp_found_config || !boot_cpu_has(X86_FEATURE_APIC))
true: PIC mode
false: virtual wire mode

-Step O_2) Try to switch to symmetric IO mode:
O_2_1) In up system:

-Check disable_apic
ture: *O_S_1* (original situation 1)
-Check whether there is a separate or integrated chip
don't has: *O_S_2*
-Check !smp_found_config
ture: *O_S_3*
-Others:
*O_S_4*

O_2_2) In smp-capable system:

-Check !smp_found_config && !acpi_lapic
true: goto *O_2_1)*
-Check if it is LAPIC
don't has: *O_S_5*
-Check !max_cpus
true: *O_S_6*
-read_apic_id() != boot_cpu_physical_apicid
true: *O_S_7*
-Others:
*O_S_8*

2) After that patchset, the new logic:

-Step N_1) Skip step O_1 and try to switch to the final interrupt mode
-Check disable_apic
ture: *N_S_1* (New situation 1)
-Check whether there is a separate or integrated chip
ture: *N_S_2*
-Check if (!smp_found_config)
ture: *N_S_3*
-Check !setup_max_cpus
ture: *N_S_4*
-Check read_apic_id() != boot_cpu_physical_apicid
ture: *N_S_5*
-Others:
*N_S_6*

O_S_1 is covered in N_S_1
O_S_2 is covered in N_S_2
O_S_3 is covered in N_S_3
O_S_4 is covered in N_S_6
O_S_5 is covered in N_S_2
O_S_6 is covered in N_S_4
O_S_7 is covered in N_S_5
O_S_8 is covered in N_S_6

2. In the actual test, It also can work well in the situations of
my test matrix

The factors of test matrix:

X86 | SMP |LOCAL APIC|I/O APIC|UP_LATE_INIT|
----- |-----|----------|--------|------------|
32-bit| Y | Y | Y | Y |
64-bit| N | N | N | N |
xen PV |
xen HVM|

disable_apic|X86_FEATURE_APIC|smp_found_config|
------------|----------------|----------------|
0 | 0 | 0 |
1 | 1 | 1 |

acpi_lapic|acpi_ioapic|setup_max_cpus|
----------|-----------|--------------|
0 | 0 | =0 |
1 | 1 | >0 |

[Link]

[1]. https://lkml.org/lkml/2016/8/2/929
[2]. https://lkml.org/lkml/2016/8/1/506

For previous discussion, please refer to:
https://lkml.org/lkml/2017/5/10/323
https://www.spinics.net/lists/kernel/msg2491620.html
https://lkml.org/lkml/2017/3/29/481

Changes V4 --> V5:

- remove the RFC presix
- remove the 1/12 patch in V4
- merge 2 patches together for SMP-capable system
- replace the *_interrupt_* with *_intr_*
- replace the pr_info with apic_printk in apic_intr_mode_init()
- add a patch for PV xen to bypass intr_mode_init()

Changes V3 --> V4:

- Move interrupt_mode_init to x86_init_ops instead of the use of
header files
- Replace "return" with "break" in case of APIC_SYMMETRIC_IO_NO_ROUTING
- Setup upmode earlier for UP system.
- Check interrupt mode before per cpu clock event setup.

Changes V2 --> V3:

- Rebase the patches.
- Change two function name:
apic_bsp_mode_check --> apic_interrupt_mode_select
init_interrupt_mode --> apic_interrupt_mode_init
- Find a new waiting way to check whether timer IRQs work or not
- Refine the switch logic in apic_interrupt_mode_init()
- Consistently start sentences with upper case letters
- Fix some typos and comments
- Try my best to rewrite some changelog again

Changes since V1:

- Move the initialization from init_IRQ() to x86_late_time_init()
- Use a threshold to refactor the check logic in timer_irq_works()
- Rename the framework to a selector
- Split two patches
- Consistently start sentences with upper case letters
- Fix some typos
- Rewrite the changelog

Dou Liyang (12):
x86/apic: Construct a selector for the interrupt delivery mode
x86/apic: Prepare for unifying the interrupt delivery modes setup
x86/apic: Split local APIC timer setup from the APIC setup
x86/apic: Move logical APIC ID away from apic_bsp_setup()
x86/apic: Unify interrupt mode setup for SMP-capable system
x86/apic: Mark the apic_intr_mode extern for sanity check cleanup
x86/apic: Unify interrupt mode setup for UP system
x86/ioapic: Refactor the delay logic in timer_irq_works()
x86/init: add intr_mode_init to x86_init_ops
x86/xen: Bypass intr mode setup in enlighten_pv system
x86/time: Initialize interrupt mode behind timer init
x86/apic: Remove the init_bsp_APIC()

arch/x86/include/asm/apic.h | 15 +++-
arch/x86/include/asm/x86_init.h | 2 +
arch/x86/kernel/apic/apic.c | 193 ++++++++++++++++++++++------------------
arch/x86/kernel/apic/io_apic.c | 45 +++++++++-
arch/x86/kernel/irqinit.c | 3 -
arch/x86/kernel/smpboot.c | 67 ++++----------
arch/x86/kernel/time.c | 5 ++
arch/x86/kernel/x86_init.c | 1 +
arch/x86/xen/enlighten_pv.c | 1 +
9 files changed, 184 insertions(+), 148 deletions(-)

--
2.5.5




2017-06-30 04:07:55

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 02/12] x86/apic: Prepare for unifying the interrupt delivery modes setup

There are three positions for initializing the interrupt delivery
modes:

1) In IRQ initial function, may setup the through-local-APIC
virtual wire mode.

2) In an SMP-capable system, will try to switch to symmetric I/O
model when preparing the cpus in native_smp_prepare_cpus().

3) In UP system with UP_LATE_INIT=y, will set up local APIC and
I/O APIC in smp_init().

Switching to symmetric I/O mode is so late, which causes kernel
in an unmatched mode at the beginning of booting time. And it
causes the dump-capture kernel hangs with 'notsc' option inherited
from 1st kernel option.

Provide a new function to unify that three positions. Preparatory
patch to initialize an interrupt mode directly.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/include/asm/apic.h | 2 ++
arch/x86/kernel/apic/apic.c | 19 +++++++++++++++++++
2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index bdffcd9..ddc16ff 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -128,6 +128,7 @@ extern void disable_local_APIC(void);
extern void lapic_shutdown(void);
extern void sync_Arb_IDs(void);
extern void init_bsp_APIC(void);
+extern void apic_intr_mode_init(void);
extern void setup_local_APIC(void);
extern void init_apic_mappings(void);
void register_lapic_address(unsigned long address);
@@ -170,6 +171,7 @@ static inline void disable_local_APIC(void) { }
# define setup_boot_APIC_clock x86_init_noop
# define setup_secondary_APIC_clock x86_init_noop
static inline void lapic_update_tsc_freq(void) { }
+static inline void apic_intr_mode_init(void) { }
#endif /* !CONFIG_X86_LOCAL_APIC */

#ifdef CONFIG_X86_X2APIC
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 88be65e..00d97f4 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1266,6 +1266,25 @@ void __init init_bsp_APIC(void)
apic_write(APIC_LVT1, value);
}

+/* Init the interrupt delivery mode for the BSP */
+void __init apic_intr_mode_init(void)
+{
+ switch (apic_intr_mode_select()) {
+ case APIC_PIC:
+ apic_printk(APIC_VERBOSE, KERN_INFO
+ "Keep in PIC mode(8259)\n");
+ return;
+ case APIC_VIRTUAL_WIRE:
+ apic_printk(APIC_VERBOSE, KERN_INFO
+ "Switch to virtual wire mode setup\n");
+ return;
+ case APIC_SYMMETRIC_IO:
+ apic_printk(APIC_VERBOSE, KERN_INFO
+ "Switch to symmectic I/O mode setup\n");
+ return;
+ }
+}
+
static void lapic_setup_esr(void)
{
unsigned int oldvalue, value, maxlvt;
--
2.5.5



2017-06-30 04:07:52

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 01/12] x86/apic: Construct a selector for the interrupt delivery mode

Now, there are many switches in kernel which are used to determine
the final interrupt delivery mode, as shown below:

1) kconfig:
CONFIG_X86_64; CONFIG_X86_LOCAL_APIC; CONFIG_x86_IO_APIC
2) kernel option: disable_apic; skip_ioapic_setup
3) CPU Capability: boot_cpu_has(X86_FEATURE_APIC)
4) MP table: smp_found_config
5) ACPI: acpi_lapic; acpi_ioapic; nr_ioapic

These switches are disordered and scattered and there are also some
dependencies with each other. These make the code difficult to
maintain and read.

Construct a selector to unify them into a single function, then,
Use this selector to get an interrupt delivery mode directly.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/kernel/apic/apic.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 2d75faf..88be65e 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1157,6 +1157,66 @@ void __init sync_Arb_IDs(void)
APIC_INT_LEVELTRIG | APIC_DM_INIT);
}

+enum apic_intr_mode {
+ APIC_PIC,
+ APIC_VIRTUAL_WIRE,
+ APIC_SYMMETRIC_IO,
+};
+
+static int __init apic_intr_mode_select(void)
+{
+ /* Check kernel option */
+ if (disable_apic) {
+ pr_info("APIC disabled via kernel command line\n");
+ return APIC_PIC;
+ }
+
+ /* Check BIOS */
+#ifdef CONFIG_X86_64
+ /* On 64-bit, the APIC must be integrated, Check local APIC only */
+ if (!boot_cpu_has(X86_FEATURE_APIC)) {
+ disable_apic = 1;
+ pr_info("APIC disabled by BIOS\n");
+ return APIC_PIC;
+ }
+#else
+ /*
+ * On 32-bit, check whether there is a separate chip or integrated
+ * APIC
+ */
+
+ /* Has a local APIC ? */
+ if (!boot_cpu_has(X86_FEATURE_APIC) &&
+ APIC_INTEGRATED(boot_cpu_apic_version)) {
+ disable_apic = 1;
+ pr_err(FW_BUG "Local APIC %d not detected, force emulation\n",
+ boot_cpu_physical_apicid);
+ return APIC_PIC;
+ }
+
+ /* Has a separate chip ? */
+ if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config) {
+ disable_apic = 1;
+
+ return APIC_PIC;
+ }
+#endif
+
+ /* Check MP table or ACPI MADT configuration */
+ if (!smp_found_config) {
+ disable_ioapic_support();
+
+ if (!acpi_lapic)
+ pr_info("APIC: ACPI MADT or MP tables are not detected\n");
+
+ return APIC_VIRTUAL_WIRE;
+ }
+
+ /* Other checks of APIC options will be done in each setup function */
+
+ return APIC_SYMMETRIC_IO;
+}
+
/*
* An initial setup of the virtual wire mode.
*/
--
2.5.5



2017-06-30 04:08:10

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 04/12] x86/apic: Move logical APIC ID away from apic_bsp_setup()

apic_bsp_setup() sets and returns logical APIC ID for initializing
cpu0_logical_apicid in SMP-capable system.

The id has nothing to do with the initialization of local APIC and
I/O APIC. And apic_bsp_setup() should be called for interrupt mode
setup intently.

Move the id setup to native_smp_prepare_cpus() for cleanup and mark
apic_bsp_setup() void.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/include/asm/apic.h | 2 +-
arch/x86/kernel/apic/apic.c | 10 +---------
arch/x86/kernel/smpboot.c | 7 +++++--
3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index ddc16ff..c3bedbd 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -146,7 +146,7 @@ static inline int apic_force_enable(unsigned long addr)
extern int apic_force_enable(unsigned long addr);
#endif

-extern int apic_bsp_setup(bool upmode);
+extern void apic_bsp_setup(bool upmode);
extern void apic_ap_setup(void);

/*
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 63220ca..498edbe 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2341,25 +2341,17 @@ static void __init apic_bsp_up_setup(void)
* Returns:
* apic_id of BSP APIC
*/
-int __init apic_bsp_setup(bool upmode)
+void __init apic_bsp_setup(bool upmode)
{
- int id;
-
connect_bsp_APIC();
if (upmode)
apic_bsp_up_setup();
setup_local_APIC();

- if (x2apic_mode)
- id = apic_read(APIC_LDR);
- else
- id = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
-
enable_IO_APIC();
end_local_APIC_setup();
irq_remap_enable_fault_handling();
setup_IO_APIC();
- return id;
}

/*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 93f0cda..d6721f0 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1347,8 +1347,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
}

default_setup_apic_routing();
- cpu0_logical_apicid = apic_bsp_setup(false);
-
+ apic_bsp_setup(false);
+ if (x2apic_mode)
+ cpu0_logical_apicid = apic_read(APIC_LDR);
+ else
+ cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
/* Setup local timer */
x86_init.timers.setup_percpu_clockev();

--
2.5.5



2017-06-30 04:07:58

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 03/12] x86/apic: Split local APIC timer setup from the APIC setup

apic_bsp_setup() sets up the local APIC, I/O APIC and APIC timer.

The local APIC and I/O APIC setup belongs to interrupt delivery mode
setup. Setting up the local APIC timer for booting CPU is another job
and has nothing to do with interrupt delivery mode setup.

Split local APIC timer setup from the APIC setup, keep it in the
original position for SMP and UP kernel for preparation.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/kernel/apic/apic.c | 4 ++--
arch/x86/kernel/smpboot.c | 5 +++++
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 00d97f4..63220ca 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2359,8 +2359,6 @@ int __init apic_bsp_setup(bool upmode)
end_local_APIC_setup();
irq_remap_enable_fault_handling();
setup_IO_APIC();
- /* Setup local timer */
- x86_init.timers.setup_percpu_clockev();
return id;
}

@@ -2400,6 +2398,8 @@ int __init APIC_init_uniprocessor(void)

default_setup_apic_routing();
apic_bsp_setup(true);
+ /* Setup local timer */
+ x86_init.timers.setup_percpu_clockev();
return 0;
}

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f04479a..93f0cda 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1333,6 +1333,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
case SMP_FORCE_UP:
disable_smp();
apic_bsp_setup(false);
+ /* Setup local timer */
+ x86_init.timers.setup_percpu_clockev();
return;
case SMP_OK:
break;
@@ -1347,6 +1349,9 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
default_setup_apic_routing();
cpu0_logical_apicid = apic_bsp_setup(false);

+ /* Setup local timer */
+ x86_init.timers.setup_percpu_clockev();
+
pr_info("CPU0: ");
print_cpu_info(&cpu_data(0));

--
2.5.5



2017-06-30 04:08:18

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 09/12] x86/init: add intr_mode_init to x86_init_ops

Add an unconditional x86_init_ops function which defaults to the
standard function and can be overridden by the early platform code.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/include/asm/x86_init.h | 2 ++
arch/x86/kernel/apic/apic.c | 2 +-
arch/x86/kernel/smpboot.c | 2 +-
arch/x86/kernel/x86_init.c | 1 +
4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 7ba7e90..f45acdf 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -50,11 +50,13 @@ struct x86_init_resources {
* are set up.
* @intr_init: interrupt init code
* @trap_init: platform specific trap setup
+ * @intr_mode_init: interrupt delivery mode setup
*/
struct x86_init_irqs {
void (*pre_vector_init)(void);
void (*intr_init)(void);
void (*trap_init)(void);
+ void (*intr_mode_init)(void);
};

/**
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 9bf7e95..51204d4 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2387,7 +2387,7 @@ void __init apic_bsp_setup(bool upmode)
#ifdef CONFIG_UP_LATE_INIT
void __init up_late_init(void)
{
- apic_intr_mode_init();
+ x86_init.irqs.intr_mode_init();

if (apic_intr_mode == APIC_PIC)
return;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2db61dcb..80d25fe 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1288,7 +1288,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)

set_cpu_sibling_map(0);

- apic_intr_mode_init();
+ x86_init.irqs.intr_mode_init();
smp_sanity_check();

switch (apic_intr_mode) {
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index a088b2c..a7889b9 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -55,6 +55,7 @@ struct x86_init_ops x86_init __initdata = {
.pre_vector_init = init_ISA_irqs,
.intr_init = native_init_IRQ,
.trap_init = x86_init_noop,
+ .intr_mode_init = apic_intr_mode_init
},

.oem = {
--
2.5.5



2017-06-30 04:08:11

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 07/12] x86/apic: Unify interrupt mode setup for UP system

In UniProcessor kernel with UP_LATE_INIT=y, it enables and setups
interrupt delivery mode in up_late_init().

Unify it to apic_intr_mode_init(), remove APIC_init_uniprocessor().

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/include/asm/apic.h | 1 -
arch/x86/kernel/apic/apic.c | 49 +++++++++------------------------------------
2 files changed, 9 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index bfbf715..b35bbbf 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -144,7 +144,6 @@ void register_lapic_address(unsigned long address);
extern void setup_boot_APIC_clock(void);
extern void setup_secondary_APIC_clock(void);
extern void lapic_update_tsc_freq(void);
-extern int APIC_init_uniprocessor(void);

#ifdef CONFIG_X86_64
static inline int apic_force_enable(unsigned long addr)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 0601054..9bf7e95 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1198,6 +1198,10 @@ static int __init apic_intr_mode_select(int *upmode)
}
#endif

+#ifdef CONFIG_UP_LATE_INIT
+ *upmode = true;
+#endif
+
/* Check MP table or ACPI MADT configuration */
if (!smp_found_config) {
disable_ioapic_support();
@@ -2380,51 +2384,16 @@ void __init apic_bsp_setup(bool upmode)
setup_IO_APIC();
}

-/*
- * This initializes the IO-APIC and APIC hardware if this is
- * a UP kernel.
- */
-int __init APIC_init_uniprocessor(void)
+#ifdef CONFIG_UP_LATE_INIT
+void __init up_late_init(void)
{
- if (disable_apic) {
- pr_info("Apic disabled\n");
- return -1;
- }
-#ifdef CONFIG_X86_64
- if (!boot_cpu_has(X86_FEATURE_APIC)) {
- disable_apic = 1;
- pr_info("Apic disabled by BIOS\n");
- return -1;
- }
-#else
- if (!smp_found_config && !boot_cpu_has(X86_FEATURE_APIC))
- return -1;
-
- /*
- * Complain if the BIOS pretends there is one.
- */
- if (!boot_cpu_has(X86_FEATURE_APIC) &&
- APIC_INTEGRATED(boot_cpu_apic_version)) {
- pr_err("BIOS bug, local APIC 0x%x not detected!...\n",
- boot_cpu_physical_apicid);
- return -1;
- }
-#endif
+ apic_intr_mode_init();

- if (!smp_found_config)
- disable_ioapic_support();
+ if (apic_intr_mode == APIC_PIC)
+ return;

- default_setup_apic_routing();
- apic_bsp_setup(true);
/* Setup local timer */
x86_init.timers.setup_percpu_clockev();
- return 0;
-}
-
-#ifdef CONFIG_UP_LATE_INIT
-void __init up_late_init(void)
-{
- APIC_init_uniprocessor();
}
#endif

--
2.5.5



2017-06-30 04:08:20

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 08/12] x86/ioapic: Refactor the delay logic in timer_irq_works()

Kernel use timer_irq_works() to detects the timer IRQs. It calls
mdelay(10) to delay ten ticks and check whether the timer IRQ work
or not. The mdelay() depends on the loops_per_jiffy which is set up
in calibrate_delay(). Current kernel defaults the IRQ 0 is available
when it calibrates delay.

But it is wrong in the dump-capture kernel with 'notsc' option inherited
from 1st kernel option. dump-capture kernel can't make sure the timer IRQ
works well.

The correct design is making the interrupt mode setup and checking timer
IRQ works in advance of calibrate_delay(). That results in the mdelay()
being unusable in timer_irq_works().

Preparatory patch to make the setup in advance. Refactor the delay logic
by waiting for some cycles. In the system with X86_FEATURE_TSC feature,
Use rdtsc(), others will call __delay() directly.

Note: regard 4G as the max CPU frequence of current single CPU.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 45 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 347bb9f..f710077 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1607,6 +1607,43 @@ static int __init notimercheck(char *s)
}
__setup("no_timer_check", notimercheck);

+static void __init delay_with_tsc(void)
+{
+ unsigned long long start, now;
+ unsigned long ticks = jiffies;
+
+ start = rdtsc();
+
+ /*
+ * We don't know the TSC frequency yet, but waiting for
+ * 40000000000/HZ TSC cycles is safe:
+ * 4 GHz == 10 jiffies
+ * 1 GHz == 40 jiffies
+ */
+ do {
+ rep_nop();
+ now = rdtsc();
+ } while ((now - start) < 40000000000UL / HZ &&
+ time_before_eq(jiffies, ticks + 4));
+}
+
+static void __init delay_without_tsc(void)
+{
+ int band = 1;
+ unsigned long ticks = jiffies;
+
+ /*
+ * We don't know any frequency yet, but waiting for
+ * 40940000000/HZ cycles is safe:
+ * 4 GHz == 10 jiffies
+ * 1 GHz == 40 jiffies
+ * 1 << 1 + 1 << 2 +...+ 1 << 11 = 4094
+ */
+ do {
+ __delay(((1 << band++) * 10000000UL) / HZ);
+ } while (band < 12 && time_before_eq(jiffies, ticks + 4));
+}
+
/*
* There is a nasty bug in some older SMP boards, their mptable lies
* about the timer IRQ. We do the following to work around the situation:
@@ -1625,8 +1662,12 @@ static int __init timer_irq_works(void)

local_save_flags(flags);
local_irq_enable();
- /* Let ten ticks pass... */
- mdelay((10 * 1000) / HZ);
+
+ if (boot_cpu_has(X86_FEATURE_TSC))
+ delay_with_tsc();
+ else
+ delay_without_tsc();
+
local_irq_restore(flags);

/*
--
2.5.5



2017-06-30 04:08:26

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 11/12] x86/time: Initialize interrupt mode behind timer init

In start_kernel(), firstly, it works on the default interrupy mode, then
switch to the final mode. Normally, Booting with BIOS reset is OK.

But, At dump-capture kernel, it boot up without BIOS reset, default mode
may not be compatible with the actual registers, that causes the delivery
interrupt to fail.

Try to set up the final mode as soon as possible. according to the parts
which split from that initialization:

1) Set up the APIC/IOAPIC (including testing whether the timer
interrupt works)

2) Calibrate TSC

3) Set up the local APIC timer

-- From Thomas Gleixner

Initializing the mode should be earlier than calibrating TSC as soon as
possible and needs testing whether the timer interrupt works at the same
time.

call it behind timers init, which meets the above conditions.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/kernel/apic/apic.c | 2 --
arch/x86/kernel/smpboot.c | 2 --
arch/x86/kernel/time.c | 5 +++++
3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 51204d4..d4722c2 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2387,8 +2387,6 @@ void __init apic_bsp_setup(bool upmode)
#ifdef CONFIG_UP_LATE_INIT
void __init up_late_init(void)
{
- x86_init.irqs.intr_mode_init();
-
if (apic_intr_mode == APIC_PIC)
return;

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 80d25fe..d1185dd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1287,8 +1287,6 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
set_sched_topology(x86_topology);

set_cpu_sibling_map(0);
-
- x86_init.irqs.intr_mode_init();
smp_sanity_check();

switch (apic_intr_mode) {
diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index d39c091..0f04d4b 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -84,6 +84,11 @@ void __init hpet_time_init(void)
static __init void x86_late_time_init(void)
{
x86_init.timers.timer_init();
+ /*
+ * After PIT/HPET timers init, select and setup
+ * the final interrupt mode for delivering IRQs.
+ */
+ x86_init.irqs.intr_mode_init();
tsc_init();
}

--
2.5.5



2017-06-30 04:08:35

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 12/12] x86/apic: Remove the init_bsp_APIC()

The init_bsp_APIC() which works for the virtual wire mode is used
in ISA irq initialization at the booting time.

Currently, enable and setup the interrupt mode has been unified
and advanced just behind the timer IRQ setup. Kernel switches to
the final interrupt delivery mode directly. So init_bsp_APIC()
is redundant.

Remove the init_bsp_APIC() function.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/include/asm/apic.h | 1 -
arch/x86/kernel/apic/apic.c | 49 ---------------------------------------------
arch/x86/kernel/irqinit.c | 3 ---
3 files changed, 53 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index b35bbbf..3c9fc9e 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -136,7 +136,6 @@ extern void disconnect_bsp_APIC(int virt_wire_setup);
extern void disable_local_APIC(void);
extern void lapic_shutdown(void);
extern void sync_Arb_IDs(void);
-extern void init_bsp_APIC(void);
extern void apic_intr_mode_init(void);
extern void setup_local_APIC(void);
extern void init_apic_mappings(void);
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index d4722c2..47c94d5 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1235,55 +1235,6 @@ static int __init apic_intr_mode_select(int *upmode)
return APIC_SYMMETRIC_IO;
}

-/*
- * An initial setup of the virtual wire mode.
- */
-void __init init_bsp_APIC(void)
-{
- unsigned int value;
-
- /*
- * Don't do the setup now if we have a SMP BIOS as the
- * through-I/O-APIC virtual wire mode might be active.
- */
- if (smp_found_config || !boot_cpu_has(X86_FEATURE_APIC))
- return;
-
- /*
- * Do not trust the local APIC being empty at bootup.
- */
- clear_local_APIC();
-
- /*
- * Enable APIC.
- */
- value = apic_read(APIC_SPIV);
- value &= ~APIC_VECTOR_MASK;
- value |= APIC_SPIV_APIC_ENABLED;
-
-#ifdef CONFIG_X86_32
- /* This bit is reserved on P4/Xeon and should be cleared */
- if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
- (boot_cpu_data.x86 == 15))
- value &= ~APIC_SPIV_FOCUS_DISABLED;
- else
-#endif
- value |= APIC_SPIV_FOCUS_DISABLED;
- value |= SPURIOUS_APIC_VECTOR;
- apic_write(APIC_SPIV, value);
-
- /*
- * Set up the virtual wire mode.
- */
- apic_write(APIC_LVT0, APIC_DM_EXTINT);
- value = APIC_DM_NMI;
- if (!lapic_is_integrated()) /* 82489DX */
- value |= APIC_LVT_LEVEL_TRIGGER;
- if (apic_extnmi == APIC_EXTNMI_NONE)
- value |= APIC_LVT_MASKED;
- apic_write(APIC_LVT1, value);
-}
-
/* Init the interrupt delivery mode for the BSP */
void __init apic_intr_mode_init(void)
{
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 7468c69..488c9e2 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -72,9 +72,6 @@ void __init init_ISA_irqs(void)
struct irq_chip *chip = legacy_pic->chip;
int i;

-#if defined(CONFIG_X86_64) || defined(CONFIG_X86_LOCAL_APIC)
- init_bsp_APIC();
-#endif
legacy_pic->init(0);

for (i = 0; i < nr_legacy_irqs(); i++)
--
2.5.5



2017-06-30 04:08:58

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 10/12] x86/xen: Bypass intr mode setup in enlighten_pv system

xen_smp_ops overwrites smp_prepare_cpus to xen_pv_smp_prepare_cpus
which initializes interrupt itself.

Touching the intr_mode_init causes unexpected results on the system.

Bypass it in enlighten_pv system.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/xen/enlighten_pv.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index f33eef4..d3362a3 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1268,6 +1268,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
x86_platform.get_nmi_reason = xen_get_nmi_reason;

x86_init.resources.memory_setup = xen_memory_setup;
+ x86_init.irqs.intr_mode_init = x86_init_noop;
x86_init.oem.arch_setup = xen_arch_setup;
x86_init.oem.banner = xen_banner;

--
2.5.5



2017-06-30 04:09:40

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 06/12] x86/apic: Mark the apic_intr_mode extern for sanity check cleanup

Calling native_smp_prepare_cpus() to prepare for SMP bootup, does
some sanity checking, enables APIC mode and disables SMP feature.

Now, APIC mode setup has been unified to apic_intr_mode_init(),
some sanity checks are redundant and need to be cleanup.

Mark the apic_intr_mode extern to refine the switch and remove
the redundant sanity check.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/include/asm/apic.h | 9 ++++++++
arch/x86/kernel/apic/apic.c | 11 +++-------
arch/x86/kernel/smpboot.c | 50 ++++++++-------------------------------------
3 files changed, 21 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index c3bedbd..bfbf715 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -53,6 +53,15 @@ extern int local_apic_timer_c2_ok;
extern int disable_apic;
extern unsigned int lapic_timer_frequency;

+extern enum apic_intr_mode_id apic_intr_mode;
+enum apic_intr_mode_id {
+ APIC_PIC,
+ APIC_VIRTUAL_WIRE,
+ APIC_VIRTUAL_WIRE_NO_CONFIG,
+ APIC_SYMMETRIC_IO,
+ APIC_SYMMETRIC_IO_NO_ROUTING
+};
+
#ifdef CONFIG_SMP
extern void __inquire_remote_apic(int apicid);
#else /* CONFIG_SMP */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index bea8032..0601054 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1157,13 +1157,7 @@ void __init sync_Arb_IDs(void)
APIC_INT_LEVELTRIG | APIC_DM_INIT);
}

-enum apic_intr_mode {
- APIC_PIC,
- APIC_VIRTUAL_WIRE,
- APIC_VIRTUAL_WIRE_NO_CONFIG,
- APIC_SYMMETRIC_IO,
- APIC_SYMMETRIC_IO_NO_ROUTING,
-};
+enum apic_intr_mode_id apic_intr_mode;

static int __init apic_intr_mode_select(int *upmode)
{
@@ -1291,7 +1285,8 @@ void __init apic_intr_mode_init(void)
{
int upmode = false;

- switch (apic_intr_mode_select(&upmode)) {
+ apic_intr_mode = apic_intr_mode_select(&upmode);
+ switch (apic_intr_mode) {
case APIC_PIC:
apic_printk(APIC_VERBOSE, KERN_INFO
"Keep in PIC mode(8259)\n");
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b9b2a43..2db61dcb 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1193,7 +1193,7 @@ enum {
/*
* Various sanity checks.
*/
-static int __init smp_sanity_check(unsigned max_cpus)
+static void __init smp_sanity_check(void)
{
preempt_disable();

@@ -1231,16 +1231,6 @@ static int __init smp_sanity_check(unsigned max_cpus)
}

/*
- * If we couldn't find an SMP configuration at boot time,
- * get out of here now!
- */
- if (!smp_found_config && !acpi_lapic) {
- preempt_enable();
- pr_notice("SMP motherboard not detected\n");
- return SMP_NO_CONFIG;
- }
-
- /*
* Should not be necessary because the MP table should list the boot
* CPU too, but we do it for the sake of robustness anyway.
*/
@@ -1250,29 +1240,6 @@ static int __init smp_sanity_check(unsigned max_cpus)
physid_set(hard_smp_processor_id(), phys_cpu_present_map);
}
preempt_enable();
-
- /*
- * If we couldn't find a local APIC, then get out of here now!
- */
- if (APIC_INTEGRATED(boot_cpu_apic_version) &&
- !boot_cpu_has(X86_FEATURE_APIC)) {
- if (!disable_apic) {
- pr_err("BIOS bug, local APIC #%d not detected!...\n",
- boot_cpu_physical_apicid);
- pr_err("... forcing use of dummy APIC emulation (tell your hw vendor)\n");
- }
- return SMP_NO_APIC;
- }
-
- /*
- * If SMP should be disabled, then really disable it!
- */
- if (!max_cpus) {
- pr_info("SMP mode deactivated\n");
- return SMP_FORCE_UP;
- }
-
- return SMP_OK;
}

static void __init smp_cpu_index_default(void)
@@ -1322,20 +1289,20 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
set_cpu_sibling_map(0);

apic_intr_mode_init();
+ smp_sanity_check();

- switch (smp_sanity_check(max_cpus)) {
- case SMP_NO_CONFIG:
- disable_smp();
- return;
- case SMP_NO_APIC:
+ switch (apic_intr_mode) {
+ case APIC_PIC:
+ case APIC_VIRTUAL_WIRE_NO_CONFIG:
disable_smp();
return;
- case SMP_FORCE_UP:
+ case APIC_SYMMETRIC_IO_NO_ROUTING:
disable_smp();
/* Setup local timer */
x86_init.timers.setup_percpu_clockev();
return;
- case SMP_OK:
+ case APIC_VIRTUAL_WIRE:
+ case APIC_SYMMETRIC_IO:
break;
}

@@ -1343,6 +1310,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
cpu0_logical_apicid = apic_read(APIC_LDR);
else
cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
+
/* Setup local timer */
x86_init.timers.setup_percpu_clockev();

--
2.5.5



2017-06-30 04:08:09

by Dou Liyang

[permalink] [raw]
Subject: [PATCH v5 05/12] x86/apic: Unify interrupt mode setup for SMP-capable system

In the SMP-capable system, enable and setup the interrupt delivery
mode in native_smp_prepare_cpus().

This design mixs the APIC and SMP together, it has highly coupling.

Make the initialization of interrupt mode independent, Unify and
refine it to apic_intr_mode_init() for SMP-capable system.

Signed-off-by: Dou Liyang <[email protected]>
---
arch/x86/kernel/apic/apic.c | 41 ++++++++++++++++++++++++++++++++++++-----
arch/x86/kernel/smpboot.c | 13 ++-----------
2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 498edbe..bea8032 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1160,10 +1160,12 @@ void __init sync_Arb_IDs(void)
enum apic_intr_mode {
APIC_PIC,
APIC_VIRTUAL_WIRE,
+ APIC_VIRTUAL_WIRE_NO_CONFIG,
APIC_SYMMETRIC_IO,
+ APIC_SYMMETRIC_IO_NO_ROUTING,
};

-static int __init apic_intr_mode_select(void)
+static int __init apic_intr_mode_select(int *upmode)
{
/* Check kernel option */
if (disable_apic) {
@@ -1206,12 +1208,30 @@ static int __init apic_intr_mode_select(void)
if (!smp_found_config) {
disable_ioapic_support();

- if (!acpi_lapic)
+ if (!acpi_lapic) {
pr_info("APIC: ACPI MADT or MP tables are not detected\n");
+ *upmode = true;
+
+ return APIC_VIRTUAL_WIRE_NO_CONFIG;
+ }

return APIC_VIRTUAL_WIRE;
}

+#ifdef CONFIG_SMP
+ /* If SMP should be disabled, then really disable it! */
+ if (!setup_max_cpus) {
+ pr_info("APIC: SMP mode deactivated\n");
+ return APIC_SYMMETRIC_IO_NO_ROUTING;
+ }
+
+ if (read_apic_id() != boot_cpu_physical_apicid) {
+ panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
+ read_apic_id(), boot_cpu_physical_apicid);
+ /* Or can we switch back to PIC here? */
+ }
+#endif
+
/* Other checks of APIC options will be done in each setup function */

return APIC_SYMMETRIC_IO;
@@ -1269,20 +1289,31 @@ void __init init_bsp_APIC(void)
/* Init the interrupt delivery mode for the BSP */
void __init apic_intr_mode_init(void)
{
- switch (apic_intr_mode_select()) {
+ int upmode = false;
+
+ switch (apic_intr_mode_select(&upmode)) {
case APIC_PIC:
apic_printk(APIC_VERBOSE, KERN_INFO
"Keep in PIC mode(8259)\n");
return;
case APIC_VIRTUAL_WIRE:
+ case APIC_VIRTUAL_WIRE_NO_CONFIG:
apic_printk(APIC_VERBOSE, KERN_INFO
"Switch to virtual wire mode setup\n");
- return;
+ default_setup_apic_routing();
+ break;
case APIC_SYMMETRIC_IO:
apic_printk(APIC_VERBOSE, KERN_INFO
"Switch to symmectic I/O mode setup\n");
- return;
+ default_setup_apic_routing();
+ break;
+ case APIC_SYMMETRIC_IO_NO_ROUTING:
+ apic_printk(APIC_VERBOSE, KERN_INFO
+ "Switch to symmectic I/O mode setup in no SMP routine\n");
+ break;
}
+
+ apic_bsp_setup(upmode);
}

static void lapic_setup_esr(void)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d6721f0..b9b2a43 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1321,18 +1321,17 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)

set_cpu_sibling_map(0);

+ apic_intr_mode_init();
+
switch (smp_sanity_check(max_cpus)) {
case SMP_NO_CONFIG:
disable_smp();
- if (APIC_init_uniprocessor())
- pr_notice("Local APIC not detected. Using dummy APIC emulation.\n");
return;
case SMP_NO_APIC:
disable_smp();
return;
case SMP_FORCE_UP:
disable_smp();
- apic_bsp_setup(false);
/* Setup local timer */
x86_init.timers.setup_percpu_clockev();
return;
@@ -1340,14 +1339,6 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
break;
}

- if (read_apic_id() != boot_cpu_physical_apicid) {
- panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
- read_apic_id(), boot_cpu_physical_apicid);
- /* Or can we switch back to PIC here? */
- }
-
- default_setup_apic_routing();
- apic_bsp_setup(false);
if (x2apic_mode)
cpu0_logical_apicid = apic_read(APIC_LDR);
else
--
2.5.5



2017-07-02 17:37:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 01/12] x86/apic: Construct a selector for the interrupt delivery mode

On Fri, 30 Jun 2017, Dou Liyang wrote:
> +static int __init apic_intr_mode_select(void)
> +{
> + /* Check kernel option */
> + if (disable_apic) {
> + pr_info("APIC disabled via kernel command line\n");
> + return APIC_PIC;
> + }
> +
> + /* Check BIOS */
> +#ifdef CONFIG_X86_64
> + /* On 64-bit, the APIC must be integrated, Check local APIC only */
> + if (!boot_cpu_has(X86_FEATURE_APIC)) {
> + disable_apic = 1;
> + pr_info("APIC disabled by BIOS\n");
> + return APIC_PIC;
> + }
> +#else
> + /*
> + * On 32-bit, check whether there is a separate chip or integrated
> + * APIC
> + */
> +
> + /* Has a local APIC ? */
> + if (!boot_cpu_has(X86_FEATURE_APIC) &&
> + APIC_INTEGRATED(boot_cpu_apic_version)) {

This looks wrong. The existing logic is:

if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config)
return -1;

if (!boot_cpu_has(X86_FEATURE_APIC) &&
APIC_INTEGRATED(boot_cpu_apic_version)) {
pr_err(....);

I know that this is magically the same because boot_cpu_apic_version is 0
in the !boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config case, so you
don't fall into that conditional, but it's completely non obvious and does
not really make the code more understandable. Quite the contrary.

> + disable_apic = 1;
> + pr_err(FW_BUG "Local APIC %d not detected, force emulation\n",
> + boot_cpu_physical_apicid);
> + return APIC_PIC;
> + }
> +
> + /* Has a separate chip ? */
> + if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config) {
> + disable_apic = 1;
> +
> + return APIC_PIC;
> + }

So if you move exactly that check above the other then it's clear what's
going on.

> +#endif
> +
> + /* Check MP table or ACPI MADT configuration */
> + if (!smp_found_config) {
> + disable_ioapic_support();
> +
> + if (!acpi_lapic)
> + pr_info("APIC: ACPI MADT or MP tables are not detected\n");
> +
> + return APIC_VIRTUAL_WIRE;
> + }
> +
> + /* Other checks of APIC options will be done in each setup function */
> +

Please remove the extra new line. It's not helping readability.

> + return APIC_SYMMETRIC_IO;
> +}

Thanks,

tglx

2017-07-02 17:47:34

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] x86/apic: Prepare for unifying the interrupt delivery modes setup

On Fri, 30 Jun 2017, Dou Liyang wrote:
> +/* Init the interrupt delivery mode for the BSP */
> +void __init apic_intr_mode_init(void)
> +{
> + switch (apic_intr_mode_select()) {
> + case APIC_PIC:
> + apic_printk(APIC_VERBOSE, KERN_INFO
> + "Keep in PIC mode(8259)\n");

Please do not proliferate that APIC_VERBOSE, KERN_INFO mess. Clean up the
apic_printk() macro first. Either change printk() to pr_info() or make the
printk level dependent on the APIC verbosity.

Thanks,

tglx

2017-07-02 17:55:02

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 04/12] x86/apic: Move logical APIC ID away from apic_bsp_setup()

On Fri, 30 Jun 2017, Dou Liyang wrote:
> /*
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 93f0cda..d6721f0 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1347,8 +1347,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
> }
>
> default_setup_apic_routing();
> - cpu0_logical_apicid = apic_bsp_setup(false);
> -
> + apic_bsp_setup(false);
> + if (x2apic_mode)
> + cpu0_logical_apicid = apic_read(APIC_LDR);
> + else
> + cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));

Can you please move that into a seperate helper function?

> /* Setup local timer */
> x86_init.timers.setup_percpu_clockev();
>
> --
> 2.5.5
>
>
>
>

2017-07-02 18:07:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] x86/apic: Unify interrupt mode setup for SMP-capable system

On Fri, 30 Jun 2017, Dou Liyang wrote:
> -static int __init apic_intr_mode_select(void)
> +static int __init apic_intr_mode_select(int *upmode)
> {
> /* Check kernel option */
> if (disable_apic) {
> @@ -1206,12 +1208,30 @@ static int __init apic_intr_mode_select(void)
> if (!smp_found_config) {
> disable_ioapic_support();
>
> - if (!acpi_lapic)
> + if (!acpi_lapic) {
> pr_info("APIC: ACPI MADT or MP tables are not detected\n");
> + *upmode = true;

That store and extra argument is pointless.

> +
> + return APIC_VIRTUAL_WIRE_NO_CONFIG;

You added an extra return code, which you can use exactly for that purpose
at the callsite.


Aside of that, if you use int * then use numbers, if you use bool then use
true/false. But mixing that is horrible.

> + }

Thanks,

tglx

2017-07-02 18:19:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 07/12] x86/apic: Unify interrupt mode setup for UP system

On Fri, 30 Jun 2017, Dou Liyang wrote:
> static inline int apic_force_enable(unsigned long addr)
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 0601054..9bf7e95 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1198,6 +1198,10 @@ static int __init apic_intr_mode_select(int *upmode)
> }
> #endif
>
> +#ifdef CONFIG_UP_LATE_INIT
> + *upmode = true;
> +#endif

This is really wrong. The upmode decision, which is required for calling
apic_bsp_setup() should not happen here, really. As I told you in the
previous patch, use the return code and then you can make further decisions
in apic_intr_mode_init().

And you do it there w/o any ifdeffery:

static void apic_intr_mode_init(void)
{
bool upmode = IS_ENABLED(CONFIG_UP_LATE_INIT);

switch (....) {
case XXXX:
upmode = true;
....
}
apic_bsp_setup(upmode);
}

Thanks,

tglx

2017-07-02 19:15:06

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] x86/ioapic: Refactor the delay logic in timer_irq_works()

On Fri, 30 Jun 2017, Dou Liyang wrote:
> +static void __init delay_with_tsc(void)
> +{
> + unsigned long long start, now;
> + unsigned long ticks = jiffies;

Please make that

unsigned long end = jiffies + 4;

ticks really means: number of ticks. But that variable is doing something
different.

> + start = rdtsc();
> +
> + /*
> + * We don't know the TSC frequency yet, but waiting for
> + * 40000000000/HZ TSC cycles is safe:
> + * 4 GHz == 10 jiffies
> + * 1 GHz == 40 jiffies
> + */
> + do {
> + rep_nop();
> + now = rdtsc();
> + } while ((now - start) < 40000000000UL / HZ &&
> + time_before_eq(jiffies, ticks + 4));
> +}
> +
> +static void __init delay_without_tsc(void)
> +{
> + int band = 1;
> + unsigned long ticks = jiffies;

Please sort variables in reverse fir tree order

unsigned long end = jiffies + 4;
int band = 1;

> +
> + /*
> + * We don't know any frequency yet, but waiting for
> + * 40940000000/HZ cycles is safe:
> + * 4 GHz == 10 jiffies
> + * 1 GHz == 40 jiffies
> + * 1 << 1 + 1 << 2 +...+ 1 << 11 = 4094
> + */
> + do {
> + __delay(((1 << band++) * 10000000UL) / HZ);

s/1/1U/

> + } while (band < 12 && time_before_eq(jiffies, ticks + 4));
> +}

Thanks,

tglx

2017-07-02 19:16:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 09/12] x86/init: add intr_mode_init to x86_init_ops

On Fri, 30 Jun 2017, Dou Liyang wrote:
> Add an unconditional x86_init_ops function which defaults to the
> standard function and can be overridden by the early platform code.

That changelog describes WHAT the patch does, but not WHY. That's useless
as we can see WHAT the patch does from the patch itself. The WHY is the
really important information.

Thanks,

tglx

2017-07-02 19:18:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 10/12] x86/xen: Bypass intr mode setup in enlighten_pv system

On Fri, 30 Jun 2017, Dou Liyang wrote:

> xen_smp_ops overwrites smp_prepare_cpus to xen_pv_smp_prepare_cpus
> which initializes interrupt itself.
>
> Touching the intr_mode_init causes unexpected results on the system.
>
> Bypass it in enlighten_pv system.

So that's the wrong patch order then. You broke XEN at some point with your
changes. You need to prevent that breakage upfront not after the fact.

Thanks,

tglx

2017-07-03 01:44:17

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 01/12] x86/apic: Construct a selector for the interrupt delivery mode

Hi Thomas,

At 07/03/2017 01:37 AM, Thomas Gleixner wrote:
> On Fri, 30 Jun 2017, Dou Liyang wrote:
>> +static int __init apic_intr_mode_select(void)
>> +{
>> + /* Check kernel option */
>> + if (disable_apic) {
>> + pr_info("APIC disabled via kernel command line\n");
>> + return APIC_PIC;
>> + }
>> +
>> + /* Check BIOS */
>> +#ifdef CONFIG_X86_64
>> + /* On 64-bit, the APIC must be integrated, Check local APIC only */
>> + if (!boot_cpu_has(X86_FEATURE_APIC)) {
>> + disable_apic = 1;
>> + pr_info("APIC disabled by BIOS\n");
>> + return APIC_PIC;
>> + }
>> +#else
>> + /*
>> + * On 32-bit, check whether there is a separate chip or integrated
>> + * APIC
>> + */
>> +
>> + /* Has a local APIC ? */
>> + if (!boot_cpu_has(X86_FEATURE_APIC) &&
>> + APIC_INTEGRATED(boot_cpu_apic_version)) {
>
> This looks wrong. The existing logic is:
>
> if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config)
> return -1;
>
> if (!boot_cpu_has(X86_FEATURE_APIC) &&
> APIC_INTEGRATED(boot_cpu_apic_version)) {
> pr_err(....);
>
> I know that this is magically the same because boot_cpu_apic_version is 0
> in the !boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config case, so you
> don't fall into that conditional,

I see, it an unnecessary and surplus thing I did.

but it's completely non obvious and does
> not really make the code more understandable. Quite the contrary.

You are right.

>
>> + disable_apic = 1;
>> + pr_err(FW_BUG "Local APIC %d not detected, force emulation\n",
>> + boot_cpu_physical_apicid);
>> + return APIC_PIC;
>> + }
>> +
>> + /* Has a separate chip ? */
>> + if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config) {
>> + disable_apic = 1;
>> +
>> + return APIC_PIC;
>> + }
>
> So if you move exactly that check above the other then it's clear what's
> going on.

Will keep the order like the existing logic you gave above.

>
>> +#endif
>> +
>> + /* Check MP table or ACPI MADT configuration */
>> + if (!smp_found_config) {
>> + disable_ioapic_support();
>> +
>> + if (!acpi_lapic)
>> + pr_info("APIC: ACPI MADT or MP tables are not detected\n");
>> +
>> + return APIC_VIRTUAL_WIRE;
>> + }
>> +
>> + /* Other checks of APIC options will be done in each setup function */
>> +
>
> Please remove the extra new line. It's not helping readability.

Yes, remove right now.

Thanks,
dou.

>
>> + return APIC_SYMMETRIC_IO;
>> +}
>
> Thanks,
>
> tglx
>
>
>


2017-07-03 01:59:02

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] x86/apic: Prepare for unifying the interrupt delivery modes setup

Hi Thomas,

At 07/03/2017 01:47 AM, Thomas Gleixner wrote:
> On Fri, 30 Jun 2017, Dou Liyang wrote:
>> +/* Init the interrupt delivery mode for the BSP */
>> +void __init apic_intr_mode_init(void)
>> +{
>> + switch (apic_intr_mode_select()) {
>> + case APIC_PIC:
>> + apic_printk(APIC_VERBOSE, KERN_INFO
>> + "Keep in PIC mode(8259)\n");
>
> Please do not proliferate that APIC_VERBOSE, KERN_INFO mess. Clean up the
> apic_printk() macro first. Either change printk() to pr_info() or make the
> printk level dependent on the APIC verbosity.

Oops, I understood, How about the following:

pr_info("APIC: keep in PIC mode(8259)\n");


Thanks,

dou.

>
> Thanks,
>
> tglx
>
>
>


2017-07-03 02:02:55

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 04/12] x86/apic: Move logical APIC ID away from apic_bsp_setup()

Hi, Thomas

At 07/03/2017 01:54 AM, Thomas Gleixner wrote:
> On Fri, 30 Jun 2017, Dou Liyang wrote:
>> /*
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 93f0cda..d6721f0 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -1347,8 +1347,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>> }
>>
>> default_setup_apic_routing();
>> - cpu0_logical_apicid = apic_bsp_setup(false);
>> -
>> + apic_bsp_setup(false);
>> + if (x2apic_mode)
>> + cpu0_logical_apicid = apic_read(APIC_LDR);
>> + else
>> + cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
>
> Can you please move that into a seperate helper function?


Yes, it will be a separate helper function in the next version.

Thanks,

dou.

>
>> /* Setup local timer */
>> x86_init.timers.setup_percpu_clockev();
>>
>> --
>> 2.5.5
>>
>>
>>
>>
>
>
>


2017-07-03 02:34:46

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 07/12] x86/apic: Unify interrupt mode setup for UP system

Hi Thomas,

At 07/03/2017 02:19 AM, Thomas Gleixner wrote:
> On Fri, 30 Jun 2017, Dou Liyang wrote:
>> static inline int apic_force_enable(unsigned long addr)
>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>> index 0601054..9bf7e95 100644
>> --- a/arch/x86/kernel/apic/apic.c
>> +++ b/arch/x86/kernel/apic/apic.c
>> @@ -1198,6 +1198,10 @@ static int __init apic_intr_mode_select(int *upmode)
>> }
>> #endif
>>
>> +#ifdef CONFIG_UP_LATE_INIT
>> + *upmode = true;
>> +#endif
>
> This is really wrong. The upmode decision, which is required for calling
> apic_bsp_setup() should not happen here, really. As I told you in the
> previous patch, use the return code and then you can make further decisions
> in apic_intr_mode_init().

Really thanks for your detail explaining, I learned more than i read
from books about the programming skill.

>
> And you do it there w/o any ifdeffery:
>
> static void apic_intr_mode_init(void)
> {
> bool upmode = IS_ENABLED(CONFIG_UP_LATE_INIT);
>
> switch (....) {
> case XXXX:
> upmode = true;
> ....
> }
> apic_bsp_setup(upmode);
> }

This looks more beautiful than mine. I will use it in the next version.

Thanks,

dou.

>
> Thanks,
>
> tglx
>
>
>


2017-07-03 02:35:09

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] x86/apic: Unify interrupt mode setup for SMP-capable system

Hi Thomas,

At 07/03/2017 02:07 AM, Thomas Gleixner wrote:
> On Fri, 30 Jun 2017, Dou Liyang wrote:
>> -static int __init apic_intr_mode_select(void)
>> +static int __init apic_intr_mode_select(int *upmode)
>> {
>> /* Check kernel option */
>> if (disable_apic) {
>> @@ -1206,12 +1208,30 @@ static int __init apic_intr_mode_select(void)
>> if (!smp_found_config) {
>> disable_ioapic_support();
>>
>> - if (!acpi_lapic)
>> + if (!acpi_lapic) {
>> pr_info("APIC: ACPI MADT or MP tables are not detected\n");
>> + *upmode = true;
>
> That store and extra argument is pointless.
>
>> +
>> + return APIC_VIRTUAL_WIRE_NO_CONFIG;
>
> You added an extra return code, which you can use exactly for that purpose
> at the callsite.
>

Actually indeed. Great! Why didn't I think of that?

>
> Aside of that, if you use int * then use numbers, if you use bool then use
> true/false. But mixing that is horrible.
>

Yes, it is, I will remove the 'upmode' argument.


Thanks,

dou.

>> + }
>
> Thanks,
>
> tglx
>
>
>


2017-07-03 03:23:55

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] x86/ioapic: Refactor the delay logic in timer_irq_works()

Hi Thomas,

At 07/03/2017 03:15 AM, Thomas Gleixner wrote:
> On Fri, 30 Jun 2017, Dou Liyang wrote:
>> +static void __init delay_with_tsc(void)
>> +{
>> + unsigned long long start, now;
>> + unsigned long ticks = jiffies;
>
> Please make that
>
> unsigned long end = jiffies + 4;
>
> ticks really means: number of ticks. But that variable is doing something
> different.

um, I see. Will use 'end' instead.

>
>> + start = rdtsc();
>> +
>> + /*
>> + * We don't know the TSC frequency yet, but waiting for
>> + * 40000000000/HZ TSC cycles is safe:
>> + * 4 GHz == 10 jiffies
>> + * 1 GHz == 40 jiffies
>> + */
>> + do {
>> + rep_nop();
>> + now = rdtsc();
>> + } while ((now - start) < 40000000000UL / HZ &&
>> + time_before_eq(jiffies, ticks + 4));
>> +}
>> +
>> +static void __init delay_without_tsc(void)
>> +{
>> + int band = 1;
>> + unsigned long ticks = jiffies;
>
> Please sort variables in reverse fir tree order
>
> unsigned long end = jiffies + 4;
> int band = 1;
>

OK, I will.

>> +
>> + /*
>> + * We don't know any frequency yet, but waiting for
>> + * 40940000000/HZ cycles is safe:
>> + * 4 GHz == 10 jiffies
>> + * 1 GHz == 40 jiffies
>> + * 1 << 1 + 1 << 2 +...+ 1 << 11 = 4094
>> + */
>> + do {
>> + __delay(((1 << band++) * 10000000UL) / HZ);
>
> s/1/1U/
>

Got it!

Thanks,

dou.


>> + } while (band < 12 && time_before_eq(jiffies, ticks + 4));
>> +}
>
> Thanks,
>
> tglx
>
>
>


2017-07-03 03:28:29

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 09/12] x86/init: add intr_mode_init to x86_init_ops

Hi Thomas,

At 07/03/2017 03:16 AM, Thomas Gleixner wrote:
> On Fri, 30 Jun 2017, Dou Liyang wrote:
>> Add an unconditional x86_init_ops function which defaults to the
>> standard function and can be overridden by the early platform code.
>
> That changelog describes WHAT the patch does, but not WHY. That's useless
> as we can see WHAT the patch does from the patch itself. The WHY is the
> really important information.
>

Understood, I will add the WHY description in the next version.

Thank,

dou.

> Thanks,
>
> tglx
>
>
>


2017-07-03 03:53:06

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 10/12] x86/xen: Bypass intr mode setup in enlighten_pv system

Hi Thomas,

At 07/03/2017 03:18 AM, Thomas Gleixner wrote:
> On Fri, 30 Jun 2017, Dou Liyang wrote:
>
>> xen_smp_ops overwrites smp_prepare_cpus to xen_pv_smp_prepare_cpus
>> which initializes interrupt itself.
>>
>> Touching the intr_mode_init causes unexpected results on the system.
>>
>> Bypass it in enlighten_pv system.
>
> So that's the wrong patch order then. You broke XEN at some point with your
> changes. You need to prevent that breakage upfront not after the fact.

Yes, I have considered to prevent that breakage in the patchset.

Actually, Until the 11th patch, we put the intr_mode_init ahead of
time, which will break XEN.

Before the 11th patch, we just unify the code and do the preparation,
Kernel will do the intr_mode_init like before, which will have no
influence on XEN.

So we put the patch here before 11th patch.

Thanks,

dou.

>
> Thanks,
>
> tglx
>
>
>


2017-07-03 06:41:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] x86/apic: Prepare for unifying the interrupt delivery modes setup

On Mon, 3 Jul 2017, Dou Liyang wrote:
> At 07/03/2017 01:47 AM, Thomas Gleixner wrote:
> > On Fri, 30 Jun 2017, Dou Liyang wrote:
> > > +/* Init the interrupt delivery mode for the BSP */
> > > +void __init apic_intr_mode_init(void)
> > > +{
> > > + switch (apic_intr_mode_select()) {
> > > + case APIC_PIC:
> > > + apic_printk(APIC_VERBOSE, KERN_INFO
> > > + "Keep in PIC mode(8259)\n");
> >
> > Please do not proliferate that APIC_VERBOSE, KERN_INFO mess. Clean up the
> > apic_printk() macro first. Either change printk() to pr_info() or make the
> > printk level dependent on the APIC verbosity.
>
> Oops, I understood, How about the following:
>
> pr_info("APIC: keep in PIC mode(8259)\n");

As this is once per boot, it's ok to have that information unconditionally
printed.

Thanks,

tglx

2017-07-03 06:47:48

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] x86/apic: Prepare for unifying the interrupt delivery modes setup

Hi Thomas,

At 07/03/2017 02:41 PM, Thomas Gleixner wrote:
> On Mon, 3 Jul 2017, Dou Liyang wrote:
>> At 07/03/2017 01:47 AM, Thomas Gleixner wrote:
>>> On Fri, 30 Jun 2017, Dou Liyang wrote:
>>>> +/* Init the interrupt delivery mode for the BSP */
>>>> +void __init apic_intr_mode_init(void)
>>>> +{
>>>> + switch (apic_intr_mode_select()) {
>>>> + case APIC_PIC:
>>>> + apic_printk(APIC_VERBOSE, KERN_INFO
>>>> + "Keep in PIC mode(8259)\n");
>>>
>>> Please do not proliferate that APIC_VERBOSE, KERN_INFO mess. Clean up the
>>> apic_printk() macro first. Either change printk() to pr_info() or make the
>>> printk level dependent on the APIC verbosity.
>>
>> Oops, I understood, How about the following:
>>
>> pr_info("APIC: keep in PIC mode(8259)\n");
>
> As this is once per boot, it's ok to have that information unconditionally
> printed.

Yes, I see. will change others in the next version.

Thanks,

dou.
>
> Thanks,
>
> tglx
>
>
>


2017-07-03 06:56:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v5 10/12] x86/xen: Bypass intr mode setup in enlighten_pv system

On Mon, 3 Jul 2017, Dou Liyang wrote:
> At 07/03/2017 03:18 AM, Thomas Gleixner wrote:
> > On Fri, 30 Jun 2017, Dou Liyang wrote:
> >
> > > xen_smp_ops overwrites smp_prepare_cpus to xen_pv_smp_prepare_cpus
> > > which initializes interrupt itself.
> > >
> > > Touching the intr_mode_init causes unexpected results on the system.
> > >
> > > Bypass it in enlighten_pv system.
> >
> > So that's the wrong patch order then. You broke XEN at some point with your
> > changes. You need to prevent that breakage upfront not after the fact.
>
> Yes, I have considered to prevent that breakage in the patchset.
>
> Actually, Until the 11th patch, we put the intr_mode_init ahead of
> time, which will break XEN.
>
> Before the 11th patch, we just unify the code and do the preparation,
> Kernel will do the intr_mode_init like before, which will have no
> influence on XEN.
>
> So we put the patch here before 11th patch.

Ok. That's good, but please explain it in the changelog. I had the
impression that this is fixing breakage you introduced earlier, but now
with your explanation it makes sense. So the patch order is correct.

Something like this wants to be in the changelog:

XEN PV overrides smp_prepare_cpus(). xen_pv_smp_prepare_cpus()
initializes interrupts in the XEN PV specific way and does not invoke
native_smp_prepare_cpus(). As a consequence, x86_init.intr_mode_init() is
not invoked either.

The invocation of x86_init.intr_mode_init() will be moved from
native_smp_prepare_cpus() in a follow up patch to solve <INSERT
REASON/PROBLEM>.

That move would cause the invocation of x86_init.intr_mode_init() for XEN
PV platforms. To prevent that, override the default x86_init.intr_mode_init()
callback with a noop().

Thanks,

tglx

2017-07-03 07:17:59

by Dou Liyang

[permalink] [raw]
Subject: Re: [PATCH v5 10/12] x86/xen: Bypass intr mode setup in enlighten_pv system

Hi Thomas,

At 07/03/2017 02:56 PM, Thomas Gleixner wrote:
> On Mon, 3 Jul 2017, Dou Liyang wrote:
>> At 07/03/2017 03:18 AM, Thomas Gleixner wrote:
>>> On Fri, 30 Jun 2017, Dou Liyang wrote:
>>>
>>>> xen_smp_ops overwrites smp_prepare_cpus to xen_pv_smp_prepare_cpus
>>>> which initializes interrupt itself.
>>>>
>>>> Touching the intr_mode_init causes unexpected results on the system.
>>>>
>>>> Bypass it in enlighten_pv system.
>>>
>>> So that's the wrong patch order then. You broke XEN at some point with your
>>> changes. You need to prevent that breakage upfront not after the fact.
>>
>> Yes, I have considered to prevent that breakage in the patchset.
>>
>> Actually, Until the 11th patch, we put the intr_mode_init ahead of
>> time, which will break XEN.
>>
>> Before the 11th patch, we just unify the code and do the preparation,
>> Kernel will do the intr_mode_init like before, which will have no
>> influence on XEN.
>>
>> So we put the patch here before 11th patch.
>
> Ok. That's good, but please explain it in the changelog. I had the
> impression that this is fixing breakage you introduced earlier, but now
> with your explanation it makes sense. So the patch order is correct.
>
> Something like this wants to be in the changelog:
>
> XEN PV overrides smp_prepare_cpus(). xen_pv_smp_prepare_cpus()
> initializes interrupts in the XEN PV specific way and does not invoke
> native_smp_prepare_cpus(). As a consequence, x86_init.intr_mode_init() is
> not invoked either.
>
> The invocation of x86_init.intr_mode_init() will be moved from
> native_smp_prepare_cpus() in a follow up patch to solve <INSERT
> REASON/PROBLEM>.
>
> That move would cause the invocation of x86_init.intr_mode_init() for XEN
> PV platforms. To prevent that, override the default x86_init.intr_mode_init()
> callback with a noop().

Thank you very much for writing the changelog again. This is really the
true purpose which I couldn't explain clearly.

I am testing my V6 patchset, will send it out soon.

Thanks,

dou.
>
> Thanks,
>
> tglx
>
>
>


2017-07-06 14:55:23

by kernel test robot

[permalink] [raw]
Subject: [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface-#)

FYI, we noticed the following commit:

commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize interrupt mode behind timer init")
url: https://github.com/0day-ci/linux/commits/Dou-Liyang/Unify-the-interrupt-delivery-mode-and-do-its-setup-in-advance/20170702-232323
base: https://git.kernel.org/cgit/linux/kernel/git/xen/tip.git linux-next

in testcase: will-it-scale
with following parameters:

nr_task: 50%
mode: process
test: futex2
cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale


on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+----------------+------------+------------+
| | 43436935b7 | 03fa63cc96 |
+----------------+------------+------------+
| boot_successes | 0 | 4 |
+----------------+------------+------------+



kern :info : [ 0.005000] tsc: Fast TSC calibration using PIT
kern :info : [ 0.006000] tsc: Detected 2195.020 MHz processor
kern :info : [ 0.007000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
kern :info : [ 0.008001] pid_max: default: 90112 minimum: 704
kern :info : [ 0.009037] ACPI: Core revision 20170303
kern :err : [ 0.010002] ACPI Error: Table [DMAR] is not invalidated during early boot stage (20170303/tbxface-193)
kern :info : [ 0.126278] ACPI: 4 ACPI AML tables successfully acquired and loaded
kern :info : [ 0.127118] Security Framework initialized
kern :info : [ 0.128003] SELinux: Initializing.
kern :debug : [ 0.129012] SELinux: Starting in permissive mode
kern :info : [ 0.132768] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)


To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Kernel Test Robot


Attachments:
(No filename) (2.24 kB)
config-4.12.0-rc4-00029-g03fa63c (155.59 kB)
job-script (6.67 kB)
kmsg.xz (20.02 kB)
will-it-scale (158.00 B)
job.yaml (4.34 kB)
reproduce (294.00 B)
Download all attachments

2017-07-06 21:03:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface -#)

On Thu, 6 Jul 2017, kernel test robot wrote:

> commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize interrupt mode behind timer init")

> +----------------+------------+------------+
> | | 43436935b7 | 03fa63cc96 |
> +----------------+------------+------------+
> | boot_successes | 0 | 4 |
> +----------------+------------+------------+

So 03fa63cc96 makes the box boot again. I'm confused as usual by the
output of this tool.

> kern :info : [ 0.005000] tsc: Fast TSC calibration using PIT
> kern :info : [ 0.006000] tsc: Detected 2195.020 MHz processor
> kern :info : [ 0.007000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
> kern :info : [ 0.008001] pid_max: default: 90112 minimum: 704
> kern :info : [ 0.009037] ACPI: Core revision 20170303
> kern :err : [ 0.010002] ACPI Error: Table [DMAR] is not invalidated during early boot stage (20170303/tbxface-193)

Sure we have a error message here, but compared to what? Compared to
something which does not boot at all?

Thanks,

tglx

2017-07-07 01:55:17

by kernel test robot

[permalink] [raw]
Subject: Re: [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface -#)

On 07/06, Thomas Gleixner wrote:
>On Thu, 6 Jul 2017, kernel test robot wrote:
>
>> commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize interrupt mode behind timer init")
>
>> +----------------+------------+------------+
>> | | 43436935b7 | 03fa63cc96 |
>> +----------------+------------+------------+
>> | boot_successes | 0 | 4 |
>> +----------------+------------+------------+
>
>So 03fa63cc96 makes the box boot again. I'm confused as usual by the
>output of this tool.,
>
>> kern :info : [ 0.005000] tsc: Fast TSC calibration using PIT
>> kern :info : [ 0.006000] tsc: Detected 2195.020 MHz processor
>> kern :info : [ 0.007000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
>> kern :info : [ 0.008001] pid_max: default: 90112 minimum: 704
>> kern :info : [ 0.009037] ACPI: Core revision 20170303
>> kern :err : [ 0.010002] ACPI Error: Table [DMAR] is not invalidated during early boot stage (20170303/tbxface-193)
>
>Sure we have a error message here, but compared to what? Compared to
>something which does not boot at all?

Sorry for the confusion, here commit 43436935b7 boot failed due to OOM which
happened at the late stage of kernel boot while the ACPI error showed at the
early boot stage for commit 03fa63cc96 and it didn't appear in 43436935b7's
dmesg.

Thanks,
Xiaolong

>
>Thanks,
>
> tglx
>

2017-07-07 02:39:20

by Dou Liyang

[permalink] [raw]
Subject: Re: [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface -#)

Hi xiaolong,

Really thanks for your testing.

At 07/07/2017 09:54 AM, Ye Xiaolong wrote:
> On 07/06, Thomas Gleixner wrote:
>> On Thu, 6 Jul 2017, kernel test robot wrote:
>>
>>> commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize interrupt mode behind timer init")
>>
>>> +----------------+------------+------------+
>>> | | 43436935b7 | 03fa63cc96 |
>>> +----------------+------------+------------+
>>> | boot_successes | 0 | 4 |
>>> +----------------+------------+------------+
>>
>> So 03fa63cc96 makes the box boot again. I'm confused as usual by the
>> output of this tool.,
>>
>>> kern :info : [ 0.005000] tsc: Fast TSC calibration using PIT
>>> kern :info : [ 0.006000] tsc: Detected 2195.020 MHz processor
>>> kern :info : [ 0.007000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
>>> kern :info : [ 0.008001] pid_max: default: 90112 minimum: 704
>>> kern :info : [ 0.009037] ACPI: Core revision 20170303
>>> kern :err : [ 0.010002] ACPI Error: Table [DMAR] is not invalidated during early boot stage (20170303/tbxface-193)
>>
>> Sure we have a error message here, but compared to what? Compared to
>> something which does not boot at all?
>
> Sorry for the confusion, here commit 43436935b7 boot failed due to OOM which
> happened at the late stage of kernel boot while the ACPI error showed at the
> early boot stage for commit 03fa63cc96 and it didn't appear in 43436935b7's
> dmesg.
>

let's make the problem clearly firstly:

1) Commit 43436935b7 ("x86/xen: Bypass intr mode setup in enlighten_pv
system") made kernel boot failed, which caused by OOM.

2) Commit 03fa63cc96 ("x86/time: Initialize interrupt mode behind timer
init") can make the kernel boot success again, but with an ACPI error
happened.

And both *1* and *2* used the same configuration showed in the
attachment.

Does anything I missed?

Thanks,

dou.

> Thanks,
> Xiaolong
>
>>
>> Thanks,
>>
>> tglx
>>
>
>
>


2017-07-07 03:04:47

by kernel test robot

[permalink] [raw]
Subject: Re: [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface -#)

On 07/07, Dou Liyang wrote:
>Hi xiaolong,
>
>Really thanks for your testing.
>
>At 07/07/2017 09:54 AM, Ye Xiaolong wrote:
>>On 07/06, Thomas Gleixner wrote:
>>>On Thu, 6 Jul 2017, kernel test robot wrote:
>>>
>>>>commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize interrupt mode behind timer init")
>>>
>>>>+----------------+------------+------------+
>>>>| | 43436935b7 | 03fa63cc96 |
>>>>+----------------+------------+------------+
>>>>| boot_successes | 0 | 4 |
>>>>+----------------+------------+------------+
>>>
>>>So 03fa63cc96 makes the box boot again. I'm confused as usual by the
>>>output of this tool.,
>>>
>>>>kern :info : [ 0.005000] tsc: Fast TSC calibration using PIT
>>>>kern :info : [ 0.006000] tsc: Detected 2195.020 MHz processor
>>>>kern :info : [ 0.007000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
>>>>kern :info : [ 0.008001] pid_max: default: 90112 minimum: 704
>>>>kern :info : [ 0.009037] ACPI: Core revision 20170303
>>>>kern :err : [ 0.010002] ACPI Error: Table [DMAR] is not invalidated during early boot stage (20170303/tbxface-193)
>>>
>>>Sure we have a error message here, but compared to what? Compared to
>>>something which does not boot at all?
>>
>>Sorry for the confusion, here commit 43436935b7 boot failed due to OOM which
>>happened at the late stage of kernel boot while the ACPI error showed at the
>>early boot stage for commit 03fa63cc96 and it didn't appear in 43436935b7's
>>dmesg.
>>
>
>let's make the problem clearly firstly:
>
>1) Commit 43436935b7 ("x86/xen: Bypass intr mode setup in enlighten_pv
>system") made kernel boot failed, which caused by OOM.
>
>2) Commit 03fa63cc96 ("x86/time: Initialize interrupt mode behind
>timer init") can make the kernel boot success again, but with an ACPI
>error happened.
>
>And both *1* and *2* used the same configuration showed in the
>attachment.
>
>Does anything I missed?

Yes, this is exactly what I meant.

Thanks,
Xiaolong
>
>Thanks,
>
> dou.
>
>>Thanks,
>>Xiaolong
>>
>>>
>>>Thanks,
>>>
>>> tglx
>>>
>>
>>
>>
>
>

2017-07-07 04:46:47

by Dou Liyang

[permalink] [raw]
Subject: Re: [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface -#)

Hi Thomas,

At 07/07/2017 11:04 AM, Ye Xiaolong wrote:
> On 07/07, Dou Liyang wrote:
>> Hi xiaolong,
>>
>> Really thanks for your testing.
>>
>> At 07/07/2017 09:54 AM, Ye Xiaolong wrote:
>>> On 07/06, Thomas Gleixner wrote:
>>>> On Thu, 6 Jul 2017, kernel test robot wrote:
>>>>
>>>>> commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize interrupt mode behind timer init")
>>>>
>>>>> +----------------+------------+------------+
>>>>> | | 43436935b7 | 03fa63cc96 |
>>>>> +----------------+------------+------------+
>>>>> | boot_successes | 0 | 4 |
>>>>> +----------------+------------+------------+
>>>>
>>>> So 03fa63cc96 makes the box boot again. I'm confused as usual by the
>>>> output of this tool.,
>>>>
>>>>> kern :info : [ 0.005000] tsc: Fast TSC calibration using PIT
>>>>> kern :info : [ 0.006000] tsc: Detected 2195.020 MHz processor
>>>>> kern :info : [ 0.007000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
>>>>> kern :info : [ 0.008001] pid_max: default: 90112 minimum: 704
>>>>> kern :info : [ 0.009037] ACPI: Core revision 20170303
>>>>> kern :err : [ 0.010002] ACPI Error: Table [DMAR] is not invalidated during early boot stage (20170303/tbxface-193)
>>>>
>>>> Sure we have a error message here, but compared to what? Compared to
>>>> something which does not boot at all?
>>>
>>> Sorry for the confusion, here commit 43436935b7 boot failed due to OOM which
>>> happened at the late stage of kernel boot while the ACPI error showed at the
>>> early boot stage for commit 03fa63cc96 and it didn't appear in 43436935b7's
>>> dmesg.
>>>
>>
>> let's make the problem clearly firstly:
>>
>> 1) Commit 43436935b7 ("x86/xen: Bypass intr mode setup in enlighten_pv
>> system") made kernel boot failed, which caused by OOM.
>>

It is so interesting!

This patch only work for XEN in pv mode, if we boot a kernel w/o XEN,
it will not work in fact.


>> 2) Commit 03fa63cc96 ("x86/time: Initialize interrupt mode behind
>> timer init") can make the kernel boot success again, but with an ACPI
>> error happened.

Indeed, the ACPI Error appeared with this patch.

If x2apic is enabled in x86-64, initializing interrupt mode contains
irq remapping setup and the ACPI DMAR table initialization.

default_setup_apic_routing()
enable_IR_x2apic()
irq_remapping_prepare()
intel_prepare_irq_remapping()
dmar_table_init()

this patch make interrupt mode setup before acpi_early_init() which has
the check of ACPI table state in acpi_reallocate_root_table() where the
ACPI Error generated.

I have got a machine which supports x2apic. I am tracking the code,
try to find out more.

Thanks,

dou.
>>
>> And both *1* and *2* used the same configuration showed in the
>> attachment.
>>
>> Does anything I missed?
>
> Yes, this is exactly what I meant.
>
> Thanks,
> Xiaolong
>>
>> Thanks,
>>
>> dou.
>>
>>> Thanks,
>>> Xiaolong
>>>
>>>>
>>>> Thanks,
>>>>
>>>> tglx
>>>>
>>>
>>>
>>>
>>
>>
>
>
>