2007-09-21 22:32:14

by Andi Kleen

Subject: [PATCH] [0/50] x86 2.6.24 patches review II


- More cleanups and bug fixes
- Use CLFLUSH in c_p_a(). These were the patches that caused some trouble
in -mm with X11, but that should hopefully be fixed now (we'll see)
- Report all IPIs in /proc/interrupts (useful e.g. to find out why
CPU isolation fails)
- Experimental patch to disable SVM for the clinically paranoid
- Dump the LER registers on Oopses by default (note that this is not 100%
reliable -- we know how to make it reliable, but that's not implemented yet)

Please review.

-Andi


2007-09-21 22:32:38

by Andi Kleen

Subject: [PATCH] [1/50] x86_64: store core id bits in cpuinfo_x86


From: "Yinghai Lu" <[email protected]>

We need to store the core id bits in cpuinfo_x86 in early_identify_cpu(), so
we can use them to create the apicid_to_node array in k8topology.c
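For reference, the bit computation the patch moves into early_init_amd() can be sketched in plain C (core_id_bits() is an illustrative name, not the kernel function; the ecx argument stands for the raw CPUID 0x80000008 ECX value):

```c
#include <assert.h>

/* Sketch only: derive the core-id bit width the way early_init_amd()
 * does -- take the shift the CPU reports in ECX[15:12] of CPUID leaf
 * 0x80000008, or recompute it from the core count when the CPU
 * reports 0 there. */
static unsigned core_id_bits(unsigned ecx)
{
	unsigned max_cores = (ecx & 0xff) + 1;   /* ECX[7:0] + 1 */
	unsigned bits = (ecx >> 12) & 0xF;       /* ECX[15:12]   */

	/* CPU did not tell us the shift -- recompute from core count */
	if (bits == 0) {
		while ((1u << bits) < max_cores)
			bits++;
	}
	return bits;
}
```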

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Len Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/x86_64/kernel/setup.c | 72 +++++++++++++++++++++++++++--------------
include/asm-x86_64/processor.h | 1
2 files changed, 49 insertions(+), 24 deletions(-)

Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -499,18 +499,7 @@ static void __init amd_detect_cmp(struct
int node = 0;
unsigned apicid = hard_smp_processor_id();
#endif
- unsigned ecx = cpuid_ecx(0x80000008);
-
- c->x86_max_cores = (ecx & 0xff) + 1;
-
- /* CPU telling us the core id bits shift? */
- bits = (ecx >> 12) & 0xF;
-
- /* Otherwise recompute */
- if (bits == 0) {
- while ((1 << bits) < c->x86_max_cores)
- bits++;
- }
+ bits = c->x86_coreid_bits;

/* Low order bits define the core id (index of core in socket) */
c->cpu_core_id = c->phys_proc_id & ((1 << bits)-1);
@@ -546,6 +535,34 @@ static void __init amd_detect_cmp(struct
#endif
}

+static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+ unsigned bits;
+ unsigned ecx;
+
+ /* Multi core CPU? */
+ if (c->extended_cpuid_level < 0x80000008)
+ return;
+
+ ecx = cpuid_ecx(0x80000008);
+
+ c->x86_max_cores = (ecx & 0xff) + 1;
+
+ /* CPU telling us the core id bits shift? */
+ bits = (ecx >> 12) & 0xF;
+
+ /* Otherwise recompute */
+ if (bits == 0) {
+ while ((1 << bits) < c->x86_max_cores)
+ bits++;
+ }
+
+ c->x86_coreid_bits = bits;
+
+#endif
+}
+
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
{
unsigned level;
@@ -776,6 +793,7 @@ struct cpu_model_info {
void __cpuinit early_identify_cpu(struct cpuinfo_x86 *c)
{
u32 tfms;
+ u32 xlvl;

c->loops_per_jiffy = loops_per_jiffy;
c->x86_cache_size = -1;
@@ -786,6 +804,7 @@ void __cpuinit early_identify_cpu(struct
c->x86_clflush_size = 64;
c->x86_cache_alignment = c->x86_clflush_size;
c->x86_max_cores = 1;
+ c->x86_coreid_bits = 0;
c->extended_cpuid_level = 0;
memset(&c->x86_capability, 0, sizeof c->x86_capability);

@@ -822,18 +841,6 @@ void __cpuinit early_identify_cpu(struct
#ifdef CONFIG_SMP
c->phys_proc_id = (cpuid_ebx(1) >> 24) & 0xff;
#endif
-}
-
-/*
- * This does the hard work of actually picking apart the CPU stuff...
- */
-void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
-{
- int i;
- u32 xlvl;
-
- early_identify_cpu(c);
-
/* AMD-defined flags: level 0x80000001 */
xlvl = cpuid_eax(0x80000000);
c->extended_cpuid_level = xlvl;
@@ -854,6 +861,23 @@ void __cpuinit identify_cpu(struct cpuin
c->x86_capability[2] = cpuid_edx(0x80860001);
}

+ switch (c->x86_vendor) {
+ case X86_VENDOR_AMD:
+ early_init_amd(c);
+ break;
+ }
+
+}
+
+/*
+ * This does the hard work of actually picking apart the CPU stuff...
+ */
+void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
+{
+ int i;
+
+ early_identify_cpu(c);
+
init_scattered_cpuid_features(c);

c->apicid = phys_pkg_id(0);
Index: linux/include/asm-x86_64/processor.h
===================================================================
--- linux.orig/include/asm-x86_64/processor.h
+++ linux/include/asm-x86_64/processor.h
@@ -63,6 +63,7 @@ struct cpuinfo_x86 {
int x86_tlbsize; /* number of 4K pages in DTLB/ITLB combined(in pages)*/
__u8 x86_virt_bits, x86_phys_bits;
__u8 x86_max_cores; /* cpuid returned max cores value */
+ __u8 x86_coreid_bits; /* cpuid returned core id bits */
__u32 x86_power;
__u32 extended_cpuid_level; /* Max extended CPUID function supported */
unsigned long loops_per_jiffy;

2007-09-21 22:32:57

by Andi Kleen

Subject: [PATCH] [2/50] x86_64: use core id bits for apicid_to_node initialization


From: "Yinghai Lu" <[email protected]>

We should use the core id bits instead of max cores, in case AMD later
ships down-cored quad-core Opterons.
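A small sketch of why the shift matters (function names here are illustrative): on a down-cored part the populated core count can be smaller than 1 << coreid_bits, so multiplying the node id by the visible core count lands on the wrong APIC id base for higher nodes, while shifting by the core id bits does not:

```c
#include <assert.h>

/* Old scheme: APIC id base derived from the visible core count. */
static unsigned apicid_base_mul(unsigned nodeid, unsigned cores)
{
	return nodeid * cores;
}

/* New scheme: APIC id base derived from the core id bit width,
 * which matches the actual APIC id stride per node. */
static unsigned apicid_base_shift(unsigned nodeid, unsigned bits)
{
	return nodeid << bits;
}
```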

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Len Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/x86_64/mm/k8topology.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

Index: linux/arch/x86_64/mm/k8topology.c
===================================================================
--- linux.orig/arch/x86_64/mm/k8topology.c
+++ linux/arch/x86_64/mm/k8topology.c
@@ -44,12 +44,14 @@ int __init k8_scan_nodes(unsigned long s
{
unsigned long prevbase;
struct bootnode nodes[8];
- int nodeid, i, j, nb;
+ int nodeid, i, nb;
unsigned char nodeids[8];
int found = 0;
u32 reg;
unsigned numnodes;
- unsigned num_cores;
+ unsigned cores;
+ unsigned bits;
+ int j;

if (!early_pci_allowed())
return -1;
@@ -60,9 +62,6 @@ int __init k8_scan_nodes(unsigned long s

printk(KERN_INFO "Scanning NUMA topology in Northbridge %d\n", nb);

- num_cores = (cpuid_ecx(0x80000008) & 0xff) + 1;
- printk(KERN_INFO "CPU has %d num_cores\n", num_cores);
-
reg = read_pci_config(0, nb, 0, 0x60);
numnodes = ((reg >> 4) & 0xF) + 1;
if (numnodes <= 1)
@@ -168,11 +167,15 @@ int __init k8_scan_nodes(unsigned long s
}
printk(KERN_INFO "Using node hash shift of %d\n", memnode_shift);

+ /* use the coreid bits from early_identify_cpu */
+ bits = boot_cpu_data.x86_coreid_bits;
+ cores = (1<<bits);
+
for (i = 0; i < 8; i++) {
if (nodes[i].start != nodes[i].end) {
nodeid = nodeids[i];
- for (j = 0; j < num_cores; j++)
- apicid_to_node[(nodeid * num_cores) + j] = i;
+ for (j = 0; j < cores; j++)
+ apicid_to_node[(nodeid << bits) + j] = i;
setup_node_bootmem(i, nodes[i].start, nodes[i].end);
}
}

2007-09-21 22:33:26

by Andi Kleen

Subject: [PATCH] [3/50] x86_64: remove never used apic_mapped


From: "Yinghai Lu" <[email protected]>

Signed-off-by: Yinghai Lu <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Len Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/x86_64/kernel/apic.c | 5 ++---
include/asm-x86_64/apic.h | 1 -
2 files changed, 2 insertions(+), 4 deletions(-)

Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -39,7 +39,6 @@
#include <asm/hpet.h>
#include <asm/apic.h>

-int apic_mapped;
int apic_verbosity;
int apic_runs_main_timer;
int apic_calibrate_pmtmr __initdata;
@@ -697,8 +696,8 @@ void __init init_apic_mappings(void)
apic_phys = mp_lapic_addr;

set_fixmap_nocache(FIX_APIC_BASE, apic_phys);
- apic_mapped = 1;
- apic_printk(APIC_VERBOSE,"mapped APIC to %16lx (%16lx)\n", APIC_BASE, apic_phys);
+ apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
+ APIC_BASE, apic_phys);

/* Put local APIC into the resource map. */
lapic_resource.start = apic_phys;
Index: linux/include/asm-x86_64/apic.h
===================================================================
--- linux.orig/include/asm-x86_64/apic.h
+++ linux/include/asm-x86_64/apic.h
@@ -19,7 +19,6 @@
extern int apic_verbosity;
extern int apic_runs_main_timer;
extern int ioapic_force;
-extern int apic_mapped;

/*
* Define the default level of output to be very little

2007-09-21 22:33:39

by Andi Kleen

Subject: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu


From: "Oliver Pinter" <[email protected]>

add CPU core names to the Pentium 4 section help in arch/i386/Kconfig.cpu
add Pentium D to arch/i386/Kconfig.cpu
add Pentium D to arch/x86_64/Kconfig

Signed-off-by: Oliver Pinter <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Sam Ravnborg <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/i386/Kconfig.cpu | 34 +++++++++++++++++++++++++++++++---
arch/x86_64/Kconfig | 6 +++---
2 files changed, 34 insertions(+), 6 deletions(-)

Index: linux/arch/i386/Kconfig.cpu
===================================================================
--- linux.orig/arch/i386/Kconfig.cpu
+++ linux/arch/i386/Kconfig.cpu
@@ -115,11 +115,39 @@ config MPENTIUM4
bool "Pentium-4/Celeron(P4-based)/Pentium-4 M/older Xeon"
help
Select this for Intel Pentium 4 chips. This includes the
- Pentium 4, P4-based Celeron and Xeon, and Pentium-4 M
- (not Pentium M) chips. This option enables compile flags
- optimized for the chip, uses the correct cache shift, and
+ Pentium 4, Pentium D, P4-based Celeron and Xeon, and
+ Pentium-4 M (not Pentium M) chips. This option enables compile
+ flags optimized for the chip, uses the correct cache shift, and
applies any applicable Pentium III optimizations.

+ CPUIDs: F[0-6][1-A] (shown in /proc/cpuinfo as: cpu family : 15)
+
+ Select this for:
+ Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename:
+ -Willamette
+ -Northwood
+ -Mobile Pentium 4
+ -Mobile Pentium 4 M
+ -Extreme Edition (Gallatin)
+ -Prescott
+ -Prescott 2M
+ -Cedar Mill
+ -Presler
+ -Smithfield
+ Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename:
+ -Foster
+ -Prestonia
+ -Gallatin
+ -Nocona
+ -Irwindale
+ -Cranford
+ -Potomac
+ -Paxville
+ -Dempsey
+
+ more info: http://balusc.xs4all.nl/srv/har-cpu.html
+
+
config MK6
bool "K6/K6-II/K6-III"
help
Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -169,9 +169,9 @@ config MK8
config MPSC
bool "Intel P4 / older Netburst based Xeon"
help
- Optimize for Intel Pentium 4 and older Nocona/Dempsey Xeon CPUs
- with Intel Extended Memory 64 Technology(EM64T). For details see
- <http://www.intel.com/technology/64bitextensions/>.
+ Optimize for Intel Pentium 4, Pentium D and older Nocona/Dempsey
+ Xeon CPUs with Intel Extended Memory 64 Technology(EM64T). For
+ details see <http://www.intel.com/technology/64bitextensions/>.
Note that the latest Xeons (Xeon 51xx and 53xx) are not based on the
Netburst core and shouldn't use this option. You can distinguish them
using the cpu family field

2007-09-21 22:33:57

by Andi Kleen

Subject: [PATCH] [5/50] i386: change order in Kconfig.cpu


From: "Oliver Pinter" <[email protected]>

Change the ordering in the arch/i386/Kconfig.cpu file so that it is more
logical (following the production timeline).

Signed-off-by: Oliver Pinter <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Sam Ravnborg <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/i386/Kconfig.cpu | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

Index: linux/arch/i386/Kconfig.cpu
===================================================================
--- linux.orig/arch/i386/Kconfig.cpu
+++ linux/arch/i386/Kconfig.cpu
@@ -104,13 +104,6 @@ config MPENTIUMM
Select this for Intel Pentium M (not Pentium-4 M)
notebook chips.

-config MCORE2
- bool "Core 2/newer Xeon"
- help
- Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and 53xx)
- CPUs. You can distinguish newer from older Xeons by the CPU family
- in /proc/cpuinfo. Newer ones have 6.
-
config MPENTIUM4
bool "Pentium-4/Celeron(P4-based)/Pentium-4 M/older Xeon"
help
@@ -147,6 +140,12 @@ config MPENTIUM4

more info: http://balusc.xs4all.nl/srv/har-cpu.html

+config MCORE2
+ bool "Core 2/newer Xeon"
+ help
+ Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and 53xx)
+ CPUs. You can distinguish newer from older Xeons by the CPU family
+ in /proc/cpuinfo. Newer ones have 6.

config MK6
bool "K6/K6-II/K6-III"

2007-09-21 22:34:28

by Andi Kleen

Subject: [PATCH] [6/50] i386: clean up oops/bug reports


From: Pavel Emelyanov <[email protected]>

Typically the first lines of an oops look like this:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c049dfbd
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
...

Such output is produced by some ugly "if (!nl) printk("\n");" code, and
besides wasting lines this is also annoying to read. The following output
looks better (and it is how it looks on x86_64):

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip: c049dfbd *pde = 00000000
Oops: 0002 [#1] PREEMPT SMP
...
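The single-line format can be sketched in userspace C as follows (oops_header() and the buffer-based approach are illustrative only; the kernel builds the line with printk continuations):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the formatting change: emit the oops banner and the
 * config flags as one line, instead of one log line per flag. */
static void oops_header(char *buf, size_t len, const char *str,
			unsigned long err, int counter,
			int preempt, int smp, int pagealloc)
{
	size_t n = snprintf(buf, len, "%s: %04lx [#%d] ",
			    str, err & 0xffff, counter);
	if (preempt)
		n += snprintf(buf + n, len - n, "PREEMPT ");
	if (smp)
		n += snprintf(buf + n, len - n, "SMP ");
	if (pagealloc)
		n += snprintf(buf + n, len - n, "DEBUG_PAGEALLOC");
}
```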

Signed-off-by: Pavel Emelyanov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---

arch/i386/kernel/traps.c | 16 ++++------------
arch/i386/mm/fault.c | 13 +++++++------
2 files changed, 11 insertions(+), 18 deletions(-)

Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -444,31 +444,23 @@ void die(const char * str, struct pt_reg
local_save_flags(flags);

if (++die.lock_owner_depth < 3) {
- int nl = 0;
unsigned long esp;
unsigned short ss;

report_bug(regs->eip, regs);

- printk(KERN_EMERG "%s: %04lx [#%d]\n", str, err & 0xffff, ++die_counter);
+ printk(KERN_EMERG "%s: %04lx [#%d] ", str, err & 0xffff, ++die_counter);
#ifdef CONFIG_PREEMPT
- printk(KERN_EMERG "PREEMPT ");
- nl = 1;
+ printk("PREEMPT ");
#endif
#ifdef CONFIG_SMP
- if (!nl)
- printk(KERN_EMERG);
printk("SMP ");
- nl = 1;
#endif
#ifdef CONFIG_DEBUG_PAGEALLOC
- if (!nl)
- printk(KERN_EMERG);
printk("DEBUG_PAGEALLOC");
- nl = 1;
#endif
- if (nl)
- printk("\n");
+ printk("\n");
+
if (notify_die(DIE_OOPS, str, regs, err,
current->thread.trap_no, SIGSEGV) !=
NOTIFY_STOP) {
Index: linux/arch/i386/mm/fault.c
===================================================================
--- linux.orig/arch/i386/mm/fault.c
+++ linux/arch/i386/mm/fault.c
@@ -544,23 +544,22 @@ no_context:
printk(KERN_ALERT "BUG: unable to handle kernel paging"
" request");
printk(" at virtual address %08lx\n",address);
- printk(KERN_ALERT " printing eip:\n");
- printk("%08lx\n", regs->eip);
+ printk(KERN_ALERT "printing eip: %08lx ", regs->eip);

page = read_cr3();
page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
#ifdef CONFIG_X86_PAE
- printk(KERN_ALERT "*pdpt = %016Lx\n", page);
+ printk("*pdpt = %016Lx ", page);
if ((page >> PAGE_SHIFT) < max_low_pfn
&& page & _PAGE_PRESENT) {
page &= PAGE_MASK;
page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
& (PTRS_PER_PMD - 1)];
- printk(KERN_ALERT "*pde = %016Lx\n", page);
+ printk(KERN_ALERT "*pde = %016Lx ", page);
page &= ~_PAGE_NX;
}
#else
- printk(KERN_ALERT "*pde = %08lx\n", page);
+ printk("*pde = %08lx ", page);
#endif

/*
@@ -574,8 +573,10 @@ no_context:
page &= PAGE_MASK;
page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
& (PTRS_PER_PTE - 1)];
- printk(KERN_ALERT "*pte = %0*Lx\n", sizeof(page)*2, (u64)page);
+ printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
}
+
+ printk("\n");
}

tsk->thread.cr2 = address;

2007-09-21 22:34:40

by Andi Kleen

Subject: [PATCH] [8/50] x86_64: remove x86_cpu_to_log_apicid


From: Mike Travis <[email protected]>

Remove the x86_cpu_to_log_apicid array. It is set in
arch/x86_64/kernel/genapic_flat.c:flat_init_apic_ldr() and
arch/x86_64/kernel/smpboot.c:do_boot_cpu() but it is never
referenced.

Signed-off-by: Mike Travis <[email protected]>
Signed-off-by: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---

arch/x86_64/kernel/genapic.c | 2 --
arch/x86_64/kernel/genapic_flat.c | 1 -
arch/x86_64/kernel/smpboot.c | 1 -
include/asm-x86_64/smp.h | 1 -
4 files changed, 5 deletions(-)

Index: linux/arch/x86_64/kernel/genapic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/genapic.c
+++ linux/arch/x86_64/kernel/genapic.c
@@ -29,8 +29,6 @@ u8 x86_cpu_to_apicid[NR_CPUS] __read_mos
= { [0 ... NR_CPUS-1] = BAD_APICID };
EXPORT_SYMBOL(x86_cpu_to_apicid);

-u8 x86_cpu_to_log_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = BAD_APICID };
-
struct genapic __read_mostly *genapic = &apic_flat;

/*
Index: linux/arch/x86_64/kernel/genapic_flat.c
===================================================================
--- linux.orig/arch/x86_64/kernel/genapic_flat.c
+++ linux/arch/x86_64/kernel/genapic_flat.c
@@ -52,7 +52,6 @@ static void flat_init_apic_ldr(void)

num = smp_processor_id();
id = 1UL << num;
- x86_cpu_to_log_apicid[num] = id;
apic_write(APIC_DFR, APIC_DFR_FLAT);
val = apic_read(APIC_LDR) & ~APIC_LDR_MASK;
val |= SET_APIC_LOGICAL_ID(id);
Index: linux/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smpboot.c
+++ linux/arch/x86_64/kernel/smpboot.c
@@ -702,7 +702,6 @@ do_rest:
cpu_clear(cpu, cpu_present_map);
cpu_clear(cpu, cpu_possible_map);
x86_cpu_to_apicid[cpu] = BAD_APICID;
- x86_cpu_to_log_apicid[cpu] = BAD_APICID;
return -EIO;
}

Index: linux/include/asm-x86_64/smp.h
===================================================================
--- linux.orig/include/asm-x86_64/smp.h
+++ linux/include/asm-x86_64/smp.h
@@ -78,7 +78,6 @@ static inline int hard_smp_processor_id(
* the real APIC ID <-> CPU # mapping.
*/
extern u8 x86_cpu_to_apicid[NR_CPUS]; /* physical ID */
-extern u8 x86_cpu_to_log_apicid[NR_CPUS];
extern u8 bios_cpu_apicid[];

static inline int cpu_present_to_apicid(int mps_cpu)

2007-09-21 22:34:55

by Andi Kleen

Subject: [PATCH] [7/50] x86: expand /proc/interrupts to include missing vectors, v2


From: Joe Korty <[email protected]>

Add missing IRQs and IRQ descriptions to /proc/interrupts.

/proc/interrupts is most useful when it displays every IRQ vector in use by
the system, not just those somebody thought would be interesting.

This patch inserts the following vector displays to the i386 and x86_64
platforms, as appropriate:

rescheduling interrupts
TLB flush interrupts
function call interrupts
thermal event interrupts
threshold interrupts
spurious interrupts

A threshold interrupt occurs when ECC memory correction is occurring at too
high a frequency. Thresholds are used by the ECC hardware because occasional
ECC failures are part of normal operation, but long sequences of ECC
failures usually indicate a memory chip that is about to fail.

Thermal event interrupts occur when a temperature threshold has been
exceeded for some CPU chip. IIRC, a thermal interrupt is also generated
when the temperature drops back to a normal level.

A spurious interrupt is an interrupt that was raised then lowered by the
device before it could be fully processed by the APIC. Hence the APIC sees
the interrupt but does not know what device it came from. In this case
the APIC hardware will assume a vector of 0xff.

Rescheduling, call, and TLB flush interrupts are sent from one CPU to
another per the needs of the OS. Typically, their statistics would be used
to discover whether an interrupt flood of the given type has been occurring.
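The row format used for the new /proc/interrupts lines can be sketched like this (show_row() is an illustrative stand-in for the seq_printf() loops in the patch; the counts are made-up sample data):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of one /proc/interrupts row: a 3-letter mnemonic, one
 * right-aligned %10u counter column per online CPU, then a
 * human-readable description. */
static void show_row(char *buf, size_t len, const char *tag,
		     const unsigned *counts, int ncpus, const char *desc)
{
	size_t n = snprintf(buf, len, "%s: ", tag);
	for (int j = 0; j < ncpus; j++)
		n += snprintf(buf + n, len - n, "%10u ", counts[j]);
	snprintf(buf + n, len - n, " %s\n", desc);
}
```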

AK: merged v2 and v4 which had some more tweaks

Signed-off-by: Joe Korty <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Tim Hockin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

Documentation/filesystems/proc.txt | 35 ++++++++++++++++++++++++++++++++++-
arch/i386/kernel/apic.c | 1 +
arch/i386/kernel/cpu/mcheck/p4.c | 1 +
arch/i386/kernel/irq.c | 31 +++++++++++++++++++++++++++++--
arch/i386/kernel/smp.c | 3 +++
arch/i386/mach-voyager/voyager_smp.c | 1 +
arch/i386/xen/smp.c | 1 +
arch/x86_64/kernel/apic.c | 1 +
arch/x86_64/kernel/irq.c | 30 ++++++++++++++++++++++++++++--
arch/x86_64/kernel/mce_amd.c | 1 +
arch/x86_64/kernel/mce_intel.c | 1 +
arch/x86_64/kernel/smp.c | 3 +++
include/asm-i386/hardirq.h | 5 +++++
include/asm-x86_64/pda.h | 6 ++++++
14 files changed, 115 insertions(+), 5 deletions(-)

Index: linux/arch/i386/kernel/apic.c
===================================================================
--- linux.orig/arch/i386/kernel/apic.c
+++ linux/arch/i386/kernel/apic.c
@@ -1277,6 +1277,7 @@ void smp_spurious_interrupt(struct pt_re
/* see sw-dev-man vol 3, chapter 7.4.13.5 */
printk(KERN_INFO "spurious APIC interrupt on CPU#%d, "
"should never happen.\n", smp_processor_id());
+ __get_cpu_var(irq_stat).irq_spurious_counts++;
irq_exit();
}

Index: linux/arch/i386/kernel/cpu/mcheck/p4.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mcheck/p4.c
+++ linux/arch/i386/kernel/cpu/mcheck/p4.c
@@ -61,6 +61,7 @@ fastcall void smp_thermal_interrupt(stru
{
irq_enter();
vendor_thermal_interrupt(regs);
+ __get_cpu_var(irq_stat).irq_thermal_counts++;
irq_exit();
}

Index: linux/arch/i386/kernel/irq.c
===================================================================
--- linux.orig/arch/i386/kernel/irq.c
+++ linux/arch/i386/kernel/irq.c
@@ -284,14 +284,41 @@ skip:
seq_printf(p, "NMI: ");
for_each_online_cpu(j)
seq_printf(p, "%10u ", nmi_count(j));
- seq_putc(p, '\n');
+ seq_printf(p, " Non-maskable interrupts\n");
#ifdef CONFIG_X86_LOCAL_APIC
seq_printf(p, "LOC: ");
for_each_online_cpu(j)
seq_printf(p, "%10u ",
per_cpu(irq_stat,j).apic_timer_irqs);
- seq_putc(p, '\n');
+ seq_printf(p, " Local interrupts\n");
#endif
+#ifdef CONFIG_SMP
+ seq_printf(p, "RES: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ",
+ per_cpu(irq_stat,j).irq_resched_counts);
+ seq_printf(p, " Rescheduling interrupts\n");
+ seq_printf(p, "CAL: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ",
+ per_cpu(irq_stat,j).irq_call_counts);
+ seq_printf(p, " function call interrupts\n");
+ seq_printf(p, "TLB: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ",
+ per_cpu(irq_stat,j).irq_tlb_counts);
+ seq_printf(p, " TLB shootdowns\n");
+#endif
+ seq_printf(p, "TRM: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ",
+ per_cpu(irq_stat,j).irq_thermal_counts);
+ seq_printf(p, " Thermal event interrupts\n");
+ seq_printf(p, "SPU: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ",
+ per_cpu(irq_stat,j).irq_spurious_counts);
+ seq_printf(p, " Spurious interrupts\n");
seq_printf(p, "ERR: %10u\n", atomic_read(&irq_err_count));
#if defined(CONFIG_X86_IO_APIC)
seq_printf(p, "MIS: %10u\n", atomic_read(&irq_mis_count));
Index: linux/arch/i386/kernel/smp.c
===================================================================
--- linux.orig/arch/i386/kernel/smp.c
+++ linux/arch/i386/kernel/smp.c
@@ -342,6 +342,7 @@ fastcall void smp_invalidate_interrupt(s
smp_mb__after_clear_bit();
out:
put_cpu_no_resched();
+ __get_cpu_var(irq_stat).irq_tlb_counts++;
}

void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
@@ -640,6 +641,7 @@ static void native_smp_send_stop(void)
fastcall void smp_reschedule_interrupt(struct pt_regs *regs)
{
ack_APIC_irq();
+ __get_cpu_var(irq_stat).irq_resched_counts++;
}

fastcall void smp_call_function_interrupt(struct pt_regs *regs)
@@ -660,6 +662,7 @@ fastcall void smp_call_function_interrup
*/
irq_enter();
(*func)(info);
+ __get_cpu_var(irq_stat).irq_call_counts++;
irq_exit();

if (wait) {
Index: linux/arch/i386/mach-voyager/voyager_smp.c
===================================================================
--- linux.orig/arch/i386/mach-voyager/voyager_smp.c
+++ linux/arch/i386/mach-voyager/voyager_smp.c
@@ -1037,6 +1037,7 @@ smp_call_function_interrupt(void)
*/
irq_enter();
(*func)(info);
+ __get_cpu_var(irq_stat).irq_call_counts++;
irq_exit();
if (wait) {
mb();
Index: linux/arch/i386/xen/smp.c
===================================================================
--- linux.orig/arch/i386/xen/smp.c
+++ linux/arch/i386/xen/smp.c
@@ -346,6 +346,7 @@ static irqreturn_t xen_call_function_int
*/
irq_enter();
(*func)(info);
+ __get_cpu_var(irq_stat).irq_call_counts++;
irq_exit();

if (wait) {
Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -1117,6 +1117,7 @@ asmlinkage void smp_spurious_interrupt(v
if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f)))
ack_APIC_irq();

+ add_pda(irq_spurious_counts, 1);
irq_exit();
}

Index: linux/arch/x86_64/kernel/irq.c
===================================================================
--- linux.orig/arch/x86_64/kernel/irq.c
+++ linux/arch/x86_64/kernel/irq.c
@@ -88,11 +88,37 @@ skip:
seq_printf(p, "NMI: ");
for_each_online_cpu(j)
seq_printf(p, "%10u ", cpu_pda(j)->__nmi_count);
- seq_putc(p, '\n');
+ seq_printf(p, " Non-maskable interrupts\n");
seq_printf(p, "LOC: ");
for_each_online_cpu(j)
seq_printf(p, "%10u ", cpu_pda(j)->apic_timer_irqs);
- seq_putc(p, '\n');
+ seq_printf(p, " Local interrupts\n");
+#ifdef CONFIG_SMP
+ seq_printf(p, "RES: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", cpu_pda(j)->irq_resched_counts);
+ seq_printf(p, " Rescheduling interrupts\n");
+ seq_printf(p, "CAL: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", cpu_pda(j)->irq_call_counts);
+ seq_printf(p, " function call interrupts\n");
+ seq_printf(p, "TLB: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", cpu_pda(j)->irq_tlb_counts);
+ seq_printf(p, " TLB shootdowns\n");
+#endif
+ seq_printf(p, "TRM: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", cpu_pda(j)->irq_thermal_counts);
+ seq_printf(p, " Thermal event interrupts\n");
+ seq_printf(p, "THR: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", cpu_pda(j)->irq_threshold_counts);
+ seq_printf(p, " Threshold APIC interrupts\n");
+ seq_printf(p, "SPU: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", cpu_pda(j)->irq_spurious_counts);
+ seq_printf(p, " Spurious interrupts\n");
seq_printf(p, "ERR: %10u\n", atomic_read(&irq_err_count));
}
return 0;
Index: linux/arch/x86_64/kernel/mce_amd.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce_amd.c
+++ linux/arch/x86_64/kernel/mce_amd.c
@@ -237,6 +237,7 @@ asmlinkage void mce_threshold_interrupt(
}
}
out:
+ add_pda(irq_threshold_counts, 1);
irq_exit();
}

Index: linux/arch/x86_64/kernel/mce_intel.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce_intel.c
+++ linux/arch/x86_64/kernel/mce_intel.c
@@ -26,6 +26,7 @@ asmlinkage void smp_thermal_interrupt(vo
if (therm_throt_process(msr_val & 1))
mce_log_therm_throt_event(smp_processor_id(), msr_val);

+ add_pda(irq_thermal_counts, 1);
irq_exit();
}

Index: linux/arch/x86_64/kernel/smp.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smp.c
+++ linux/arch/x86_64/kernel/smp.c
@@ -163,6 +163,7 @@ asmlinkage void smp_invalidate_interrupt
out:
ack_APIC_irq();
cpu_clear(cpu, f->flush_cpumask);
+ add_pda(irq_tlb_counts, 1);
}

static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
@@ -493,6 +494,7 @@ void smp_send_stop(void)
asmlinkage void smp_reschedule_interrupt(void)
{
ack_APIC_irq();
+ add_pda(irq_resched_counts, 1);
}

asmlinkage void smp_call_function_interrupt(void)
@@ -514,6 +516,7 @@ asmlinkage void smp_call_function_interr
exit_idle();
irq_enter();
(*func)(info);
+ add_pda(irq_call_counts, 1);
irq_exit();
if (wait) {
mb();
Index: linux/include/asm-i386/hardirq.h
===================================================================
--- linux.orig/include/asm-i386/hardirq.h
+++ linux/include/asm-i386/hardirq.h
@@ -9,6 +9,11 @@ typedef struct {
unsigned long idle_timestamp;
unsigned int __nmi_count; /* arch dependent */
unsigned int apic_timer_irqs; /* arch dependent */
+ unsigned int irq_resched_counts;
+ unsigned int irq_call_counts;
+ unsigned int irq_tlb_counts;
+ unsigned int irq_thermal_counts;
+ unsigned int irq_spurious_counts;
} ____cacheline_aligned irq_cpustat_t;

DECLARE_PER_CPU(irq_cpustat_t, irq_stat);
Index: linux/include/asm-x86_64/pda.h
===================================================================
--- linux.orig/include/asm-x86_64/pda.h
+++ linux/include/asm-x86_64/pda.h
@@ -29,6 +29,12 @@ struct x8664_pda {
short isidle;
struct mm_struct *active_mm;
unsigned apic_timer_irqs;
+ unsigned irq_resched_counts;
+ unsigned irq_call_counts;
+ unsigned irq_tlb_counts;
+ unsigned irq_thermal_counts;
+ unsigned irq_threshold_counts;
+ unsigned irq_spurious_counts;
} ____cacheline_aligned_in_smp;

extern struct x8664_pda *_cpu_pda[];
Index: linux/Documentation/filesystems/proc.txt
===================================================================
--- linux.orig/Documentation/filesystems/proc.txt
+++ linux/Documentation/filesystems/proc.txt
@@ -347,7 +347,40 @@ connects the CPUs in a SMP system. This
the IO-APIC automatically retry the transmission, so it should not be a big
problem, but you should read the SMP-FAQ.

-In this context it could be interesting to note the new irq directory in 2.4.
+In 2.6.2* /proc/interrupts was expanded again. This time the goal was for
+/proc/interrupts to display every IRQ vector in use by the system, not
+just those considered 'most important'. The new vectors are:
+
+ THR -- a threshold interrupt occurs when ECC memory correction is occurring
+ at too high a frequency. Threshold interrupt machinery is often put
+ into the ECC logic, as occasional ECC memory corrections are part of
+ normal operation (due to random alpha particles), but sequences of
+ ECC corrections or outright failures over some short interval usually
+ indicate a memory chip that is about to fail. Note that not every
+ platform has ECC threshold logic, and those that do generally require
+ it to be explicitly turned on.
+
+ TRM -- a thermal event interrupt occurs when a temperature threshold
+ has been exceeded for some CPU chip. This interrupt may also be generated
+ when the temperature drops back to normal.
+
+ SPU -- a spurious interrupt is some interrupt that was raised then lowered
+ by some IO device before it could be fully processed by the APIC. Hence
+ the APIC sees the interrupt but does not know what device it came from.
+ For this case the APIC will generate the interrupt with an IRQ vector
+ of 0xff.
+
+ RES, CAL, TLB -- rescheduling, call and tlb flush interrupts are
+ sent from one CPU to another per the needs of the OS. Typically,
+ their statistics are used by kernel developers and interested users to
+ determine the occurrence of interrupt floods of the given type.
+
+The above IRQ vectors are displayed only when relevant. For example,
+the threshold vector does not exist on i386 platforms. Others are
+suppressed when the system is a uniprocessor. As of this writing, only
+i386 and x86_64 platforms support the new IRQ vector displays.
+
+Of some interest is the introduction of the /proc/irq directory to 2.4.
It could be used to set IRQ to CPU affinity, this means that you can "hook" an
IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
irq subdir is one subdir for each IRQ, and one file; prof_cpu_mask

2007-09-21 22:35:17

by Andi Kleen

Subject: [PATCH] [9/50] i386: validate against ACPI motherboard resources


From: Robert Hancock <[email protected]>

This patch adds validation of the MMCONFIG table against the ACPI reserved
motherboard resources. If the MMCONFIG table is found to be reserved in
ACPI, we don't bother checking the E820 table. The PCI Express firmware
spec apparently tells BIOS developers that reservation in ACPI is required
and E820 reservation is optional, so checking against ACPI first makes
sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though
it is perfectly functional, so the existing check needlessly disables
MMCONFIG in these cases.

In order to do this, MMCONFIG setup has been split into two phases. If PCI
configuration type 1 is not available then MMCONFIG is enabled early as
before. Otherwise, it is enabled later after the ACPI interpreter is
enabled, since we need to be able to execute control methods in order to
check the ACPI reserved resources. Presently this is just triggered off
the end of ACPI interpreter initialization.

There are a few other behavioral changes here:

- Validate all MMCONFIG configurations provided, not just the first one.

- Validate that the entire required length of each configuration, as
determined by the provided ending bus number, is reserved, not just the
minimum required allocation.

- Validate that the area is reserved even if we read it from the chipset
directly and not from the MCFG table. This catches the case where the
BIOS didn't set the location properly in the chipset and has mapped it
over other things it shouldn't have.

This also cleans up the MMCONFIG initialization functions so that they
simply do nothing if MMCONFIG is not compiled in.

Based on an original patch by Rajesh Shah from Intel.

[[email protected]: many fixes and cleanups]
Signed-off-by: Robert Hancock <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Rajesh Shah <[email protected]>
Cc: Jesse Barnes <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/i386/pci/init.c | 4 -
arch/i386/pci/mmconfig-shared.c | 151 +++++++++++++++++++++++++++++++++++-----
arch/i386/pci/pci.h | 1
drivers/acpi/bus.c | 2
include/linux/pci.h | 8 ++
5 files changed, 144 insertions(+), 22 deletions(-)

Index: linux/arch/i386/pci/init.c
===================================================================
--- linux.orig/arch/i386/pci/init.c
+++ linux/arch/i386/pci/init.c
@@ -11,9 +11,7 @@ static __init int pci_access_init(void)
#ifdef CONFIG_PCI_DIRECT
type = pci_direct_probe();
#endif
-#ifdef CONFIG_PCI_MMCONFIG
- pci_mmcfg_init(type);
-#endif
+ pci_mmcfg_early_init(type);
if (raw_pci_ops)
return 0;
#ifdef CONFIG_PCI_BIOS
Index: linux/arch/i386/pci/mmconfig-shared.c
===================================================================
--- linux.orig/arch/i386/pci/mmconfig-shared.c
+++ linux/arch/i386/pci/mmconfig-shared.c
@@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso
pci_mmcfg_resources_inserted = 1;
}

-static void __init pci_mmcfg_reject_broken(int type)
+static acpi_status __init check_mcfg_resource(struct acpi_resource *res,
+ void *data)
+{
+ struct resource *mcfg_res = data;
+ struct acpi_resource_address64 address;
+ acpi_status status;
+
+ if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) {
+ struct acpi_resource_fixed_memory32 *fixmem32 =
+ &res->data.fixed_memory32;
+ if (!fixmem32)
+ return AE_OK;
+ if ((mcfg_res->start >= fixmem32->address) &&
+ (mcfg_res->end < (fixmem32->address +
+ fixmem32->address_length))) {
+ mcfg_res->flags = 1;
+ return AE_CTRL_TERMINATE;
+ }
+ }
+ if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) &&
+ (res->type != ACPI_RESOURCE_TYPE_ADDRESS64))
+ return AE_OK;
+
+ status = acpi_resource_to_address64(res, &address);
+ if (ACPI_FAILURE(status) ||
+ (address.address_length <= 0) ||
+ (address.resource_type != ACPI_MEMORY_RANGE))
+ return AE_OK;
+
+ if ((mcfg_res->start >= address.minimum) &&
+ (mcfg_res->end < (address.minimum + address.address_length))) {
+ mcfg_res->flags = 1;
+ return AE_CTRL_TERMINATE;
+ }
+ return AE_OK;
+}
+
+static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl,
+ void *context, void **rv)
+{
+ struct resource *mcfg_res = context;
+
+ acpi_walk_resources(handle, METHOD_NAME__CRS,
+ check_mcfg_resource, context);
+
+ if (mcfg_res->flags)
+ return AE_CTRL_TERMINATE;
+
+ return AE_OK;
+}
+
+static int __init is_acpi_reserved(unsigned long start, unsigned long end)
+{
+ struct resource mcfg_res;
+
+ mcfg_res.start = start;
+ mcfg_res.end = end;
+ mcfg_res.flags = 0;
+
+ acpi_get_devices("PNP0C01", find_mboard_resource, &mcfg_res, NULL);
+
+ if (!mcfg_res.flags)
+ acpi_get_devices("PNP0C02", find_mboard_resource, &mcfg_res,
+ NULL);
+
+ return mcfg_res.flags;
+}
+
+static void __init pci_mmcfg_reject_broken(void)
{
typeof(pci_mmcfg_config[0]) *cfg;
+ int i;

if ((pci_mmcfg_config_num == 0) ||
(pci_mmcfg_config == NULL) ||
@@ -229,17 +298,37 @@ static void __init pci_mmcfg_reject_brok
goto reject;
}

- /*
- * Only do this check when type 1 works. If it doesn't work
- * assume we run on a Mac and always use MCFG
- */
- if (type == 1 && !e820_all_mapped(cfg->address,
- cfg->address + MMCONFIG_APER_MIN,
- E820_RESERVED)) {
- printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %Lx is not"
- " E820-reserved\n", cfg->address);
- goto reject;
+ for (i = 0; i < pci_mmcfg_config_num; i++) {
+ u32 size = (pci_mmcfg_config[i].end_bus_number + 1) << 20;
+ cfg = &pci_mmcfg_config[i];
+ printk(KERN_NOTICE "PCI: MCFG configuration %d: base %lu "
+ "segment %hu buses %u - %u\n",
+ i, (unsigned long)cfg->address, cfg->pci_segment,
+ (unsigned int)cfg->start_bus_number,
+ (unsigned int)cfg->end_bus_number);
+ if (is_acpi_reserved(cfg->address, cfg->address + size - 1)) {
+ printk(KERN_NOTICE "PCI: MCFG area at %Lx reserved "
+ "in ACPI motherboard resources\n",
+ cfg->address);
+ } else {
+ printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %Lx is not"
+ " reserved in ACPI motherboard resources\n",
+ cfg->address);
+ /* Don't try to do this check unless configuration
+ type 1 is available. */
+ if ((pci_probe & PCI_PROBE_CONF1) &&
+ e820_all_mapped(cfg->address,
+ cfg->address + size - 1,
+ E820_RESERVED))
+ printk(KERN_NOTICE
+ "PCI: MCFG area at %Lx reserved in "
+ "E820\n",
+ cfg->address);
+ else
+ goto reject;
+ }
}
+
return;

reject:
@@ -249,20 +338,46 @@ reject:
pci_mmcfg_config_num = 0;
}

-void __init pci_mmcfg_init(int type)
+void __init pci_mmcfg_early_init(int type)
+{
+ if ((pci_probe & PCI_PROBE_MMCONF) == 0)
+ return;
+
+ /* If type 1 access is available, no need to enable MMCONFIG yet, we can
+ defer until later when the ACPI interpreter is available to better
+ validate things. */
+ if (type == 1)
+ return;
+
+ acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
+
+ if ((pci_mmcfg_config_num == 0) ||
+ (pci_mmcfg_config == NULL) ||
+ (pci_mmcfg_config[0].address == 0))
+ return;
+
+ if (pci_mmcfg_arch_init())
+ pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
+}
+
+void __init pci_mmcfg_late_init(void)
{
int known_bridge = 0;

+ /* MMCONFIG disabled */
if ((pci_probe & PCI_PROBE_MMCONF) == 0)
return;

- if (type == 1 && pci_mmcfg_check_hostbridge())
- known_bridge = 1;
+ /* MMCONFIG already enabled */
+ if (!(pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF))
+ return;

- if (!known_bridge) {
+ if ((pci_probe & PCI_PROBE_CONF1) && pci_mmcfg_check_hostbridge())
+ known_bridge = 1;
+ else
acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
- pci_mmcfg_reject_broken(type);
- }
+
+ pci_mmcfg_reject_broken();

if ((pci_mmcfg_config_num == 0) ||
(pci_mmcfg_config == NULL) ||
@@ -270,7 +385,7 @@ void __init pci_mmcfg_init(int type)
return;

if (pci_mmcfg_arch_init()) {
- if (type == 1)
+ if (pci_probe & PCI_PROBE_CONF1)
unreachable_devices();
if (known_bridge)
pci_mmcfg_insert_resources(IORESOURCE_BUSY);
Index: linux/arch/i386/pci/pci.h
===================================================================
--- linux.orig/arch/i386/pci/pci.h
+++ linux/arch/i386/pci/pci.h
@@ -91,7 +91,6 @@ extern int pci_conf1_read(unsigned int s
extern int pci_direct_probe(void);
extern void pci_direct_init(int type);
extern void pci_pcbios_init(void);
-extern void pci_mmcfg_init(int type);
extern void pcibios_sort(void);

/* pci-mmconfig.c */
Index: linux/drivers/acpi/bus.c
===================================================================
--- linux.orig/drivers/acpi/bus.c
+++ linux/drivers/acpi/bus.c
@@ -35,6 +35,7 @@
#ifdef CONFIG_X86
#include <asm/mpspec.h>
#endif
+#include <linux/pci.h>
#include <acpi/acpi_bus.h>
#include <acpi/acpi_drivers.h>

@@ -757,6 +758,7 @@ static int __init acpi_init(void)
result = acpi_bus_init();

if (!result) {
+ pci_mmcfg_late_init();
#ifdef CONFIG_PM_LEGACY
if (!PM_IS_ACTIVE())
pm_active = 1;
Index: linux/include/linux/pci.h
===================================================================
--- linux.orig/include/linux/pci.h
+++ linux/include/linux/pci.h
@@ -893,5 +893,13 @@ extern unsigned long pci_cardbus_mem_siz

extern int pcibios_add_platform_entries(struct pci_dev *dev);

+#ifdef CONFIG_PCI_MMCONFIG
+extern void __init pci_mmcfg_early_init(int type);
+extern void __init pci_mmcfg_late_init(void);
+#else
+static inline void pci_mmcfg_early_init(int type) { }
+static inline void pci_mmcfg_late_init(void) { }
+#endif
+
#endif /* __KERNEL__ */
#endif /* LINUX_PCI_H */

2007-09-21 22:35:40

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [10/50] x86_64: install unstripped copies of compat vdso on disk


From: Roland McGrath <[email protected]>

This keeps an unstripped copy of the vDSO images built before they are
stripped and embedded in the kernel. The unstripped copies get installed
in $(MODLIB)/vdso/ by "make install" (or you can explicitly use the
subtarget "make vdso_install"). These files can be useful when they
contain source-level debugging information.

Signed-off-by: Roland McGrath <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

arch/x86_64/Makefile | 7 ++++++-
arch/x86_64/ia32/Makefile | 25 +++++++++++++++++++++----
2 files changed, 27 insertions(+), 5 deletions(-)

Index: linux/arch/x86_64/Makefile
===================================================================
--- linux.orig/arch/x86_64/Makefile
+++ linux/arch/x86_64/Makefile
@@ -113,9 +113,14 @@ bzdisk: vmlinux
fdimage fdimage144 fdimage288 isoimage: vmlinux
$(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) $@

-install:
+install: vdso_install
$(Q)$(MAKE) $(build)=$(boot) BOOTIMAGE=$(BOOTIMAGE) $@

+vdso_install:
+ifeq ($(CONFIG_IA32_EMULATION),y)
+ $(Q)$(MAKE) $(build)=arch/x86_64/ia32 $@
+endif
+
archclean:
$(Q)$(MAKE) $(clean)=$(boot)

Index: linux/arch/x86_64/ia32/Makefile
===================================================================
--- linux.orig/arch/x86_64/ia32/Makefile
+++ linux/arch/x86_64/ia32/Makefile
@@ -18,18 +18,35 @@ $(obj)/syscall32_syscall.o: \
$(foreach F,sysenter syscall,$(obj)/vsyscall-$F.so)

# Teach kbuild about targets
-targets := $(foreach F,sysenter syscall,vsyscall-$F.o vsyscall-$F.so)
+targets := $(foreach F,$(addprefix vsyscall-,sysenter syscall),\
+ $F.o $F.so $F.so.dbg)

# The DSO images are built using a special linker script
quiet_cmd_syscall = SYSCALL $@
- cmd_syscall = $(CC) -m32 -nostdlib -shared -s \
+ cmd_syscall = $(CC) -m32 -nostdlib -shared \
$(call ld-option, -Wl$(comma)--hash-style=sysv) \
-Wl,-soname=linux-gate.so.1 -o $@ \
-Wl,-T,$(filter-out FORCE,$^)

-$(obj)/vsyscall-sysenter.so $(obj)/vsyscall-syscall.so: \
-$(obj)/vsyscall-%.so: $(src)/vsyscall.lds $(obj)/vsyscall-%.o FORCE
+$(obj)/%.so: OBJCOPYFLAGS := -S
+$(obj)/%.so: $(obj)/%.so.dbg FORCE
+ $(call if_changed,objcopy)
+
+$(obj)/vsyscall-sysenter.so.dbg $(obj)/vsyscall-syscall.so.dbg: \
+$(obj)/vsyscall-%.so.dbg: $(src)/vsyscall.lds $(obj)/vsyscall-%.o FORCE
$(call if_changed,syscall)

AFLAGS_vsyscall-sysenter.o = -m32 -Wa,-32
AFLAGS_vsyscall-syscall.o = -m32 -Wa,-32
+
+vdsos := vdso32-sysenter.so vdso32-syscall.so
+
+quiet_cmd_vdso_install = INSTALL $@
+ cmd_vdso_install = cp $(@:vdso32-%.so=$(obj)/vsyscall-%.so.dbg) \
+ $(MODLIB)/vdso/$@
+
+$(vdsos):
+ @mkdir -p $(MODLIB)/vdso
+ $(call cmd,vdso_install)
+
+vdso_install: $(vdsos)

2007-09-21 22:35:55

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [11/50] x86_64: Install unstripped copy of 64bit vdso to disk


From: Roland McGrath <[email protected]>

This keeps an unstripped copy of the 64bit vDSO images built before they are
stripped and embedded in the kernel. The unstripped copies get installed
in $(MODLIB)/vdso/ by "make install" (or you can explicitly use the
subtarget "make vdso_install"). These files can be useful when they
contain source-level debugging information.


Signed-off-by: Roland McGrath <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/Makefile | 1 +
arch/x86_64/vdso/Makefile | 20 ++++++++++++++++----
2 files changed, 17 insertions(+), 4 deletions(-)

Index: linux/arch/x86_64/Makefile
===================================================================
--- linux.orig/arch/x86_64/Makefile
+++ linux/arch/x86_64/Makefile
@@ -120,6 +120,7 @@ vdso_install:
ifeq ($(CONFIG_IA32_EMULATION),y)
$(Q)$(MAKE) $(build)=arch/x86_64/ia32 $@
endif
+ $(Q)$(MAKE) $(build)=arch/x86_64/vdso $@

archclean:
$(Q)$(MAKE) $(clean)=$(boot)
Index: linux/arch/x86_64/vdso/Makefile
===================================================================
--- linux.orig/arch/x86_64/vdso/Makefile
+++ linux/arch/x86_64/vdso/Makefile
@@ -13,7 +13,7 @@ vobjs := $(foreach F,$(vobjs-y),$(obj)/$

$(obj)/vdso.o: $(obj)/vdso.so

-targets += vdso.so vdso.lds $(vobjs-y) vdso-syms.o
+targets += vdso.so vdso.so.dbg vdso.lds $(vobjs-y) vdso-syms.o

# The DSO images are built using a special linker script.
quiet_cmd_syscall = SYSCALL $@
@@ -25,14 +25,18 @@ export CPPFLAGS_vdso.lds += -P -C -U$(AR
vdso-flags = -fPIC -shared -Wl,-soname=linux-vdso.so.1 \
$(call ld-option, -Wl$(comma)--hash-style=sysv) \
-Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096
-SYSCFLAGS_vdso.so = $(vdso-flags)
+SYSCFLAGS_vdso.so.dbg = $(vdso-flags)

$(obj)/vdso.o: $(src)/vdso.S $(obj)/vdso.so

-$(obj)/vdso.so: $(src)/vdso.lds $(vobjs) FORCE
+$(obj)/vdso.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
$(call if_changed,syscall)

-CFL := $(PROFILING) -mcmodel=small -fPIC -g0 -O2 -fasynchronous-unwind-tables -m64
+$(obj)/%.so: OBJCOPYFLAGS := -S
+$(obj)/%.so: $(obj)/%.so.dbg FORCE
+ $(call if_changed,objcopy)
+
+CFL := $(PROFILING) -mcmodel=small -fPIC $(if $(CONFIG_DEBUG_INFO),-g,-g0) -O2 -fasynchronous-unwind-tables -m64

$(obj)/vclock_gettime.o: CFLAGS = $(CFL)
$(obj)/vgetcpu.o: CFLAGS = $(CFL)
@@ -47,3 +51,11 @@ $(obj)/built-in.o: ld_flags += -R $(obj)
SYSCFLAGS_vdso-syms.o = -r -d
$(obj)/vdso-syms.o: $(src)/vdso.lds $(vobjs) FORCE
$(call if_changed,syscall)
+
+quiet_cmd_vdso_install = INSTALL $@
+ cmd_vdso_install = cp $(obj)/[email protected] $(MODLIB)/vdso/$@
+vdso.so:
+ @mkdir -p $(MODLIB)/vdso
+ $(call cmd,vdso_install)
+
+vdso_install: vdso.so

2007-09-21 22:36:21

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [12/50] x86_64: Untangle __init references between IO data


An earlier patch added IO APIC setup into local APIC setup. This caused
modpost warnings. Fix them by untangling setup_local_APIC() and splitting
it into smaller functions. The IO APIC initialization is only called
for the BP init.

Also removes some outdated debugging code and includes some minor cleanups.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/kernel/apic.c | 46 ++++++++++++++++++++-----------------------
arch/x86_64/kernel/smpboot.c | 8 +++++++
include/asm-x86_64/apic.h | 1
3 files changed, 31 insertions(+), 24 deletions(-)

Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -323,7 +323,7 @@ void __init init_bsp_APIC(void)

void __cpuinit setup_local_APIC (void)
{
- unsigned int value, maxlvt;
+ unsigned int value;
int i, j;

value = apic_read(APIC_LVR);
@@ -417,33 +417,22 @@ void __cpuinit setup_local_APIC (void)
else
value = APIC_DM_NMI | APIC_LVT_MASKED;
apic_write(APIC_LVT1, value);
+}

+void __cpuinit lapic_setup_esr(void)
+{
+ unsigned maxlvt = get_maxlvt();
+ apic_write(APIC_LVTERR, ERROR_APIC_VECTOR);
/*
- * Now enable IO-APICs, actually call clear_IO_APIC
- * We need clear_IO_APIC before enabling vector on BP
+ * spec says clear errors after enabling vector.
*/
- if (!smp_processor_id())
- if (!skip_ioapic_setup && nr_ioapics)
- enable_IO_APIC();
-
- {
- unsigned oldvalue;
- maxlvt = get_maxlvt();
- oldvalue = apic_read(APIC_ESR);
- value = ERROR_APIC_VECTOR; // enables sending errors
- apic_write(APIC_LVTERR, value);
- /*
- * spec says clear errors after enabling vector.
- */
- if (maxlvt > 3)
- apic_write(APIC_ESR, 0);
- value = apic_read(APIC_ESR);
- if (value != oldvalue)
- apic_printk(APIC_VERBOSE,
- "ESR value after enabling vector: %08x, after %08x\n",
- oldvalue, value);
- }
+ if (maxlvt > 3)
+ apic_write(APIC_ESR, 0);
+}

+void __cpuinit end_local_APIC_setup(void)
+{
+ lapic_setup_esr();
nmi_watchdog_default();
setup_apic_nmi_watchdog(NULL);
apic_pm_activate();
@@ -1178,6 +1167,15 @@ int __init APIC_init_uniprocessor (void)

setup_local_APIC();

+ /*
+ * Now enable IO-APICs, actually call clear_IO_APIC
+ * We need clear_IO_APIC before enabling vector on BP
+ */
+ if (!skip_ioapic_setup && nr_ioapics)
+ enable_IO_APIC();
+
+ end_local_APIC_setup();
+
if (smp_found_config && !skip_ioapic_setup && nr_ioapics)
setup_IO_APIC();
else
Index: linux/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smpboot.c
+++ linux/arch/x86_64/kernel/smpboot.c
@@ -211,6 +211,7 @@ void __cpuinit smp_callin(void)

Dprintk("CALLIN, before setup_local_APIC().\n");
setup_local_APIC();
+ end_local_APIC_setup();

/*
* Get our bogomips.
@@ -870,6 +871,13 @@ void __init smp_prepare_cpus(unsigned in
*/
setup_local_APIC();

+ /*
+ * Enable IO APIC before setting up error vector
+ */
+ if (!skip_ioapic_setup && nr_ioapics)
+ enable_IO_APIC();
+ end_local_APIC_setup();
+
if (GET_APIC_ID(apic_read(APIC_ID)) != boot_cpu_id) {
panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
GET_APIC_ID(apic_read(APIC_ID)), boot_cpu_id);
Index: linux/include/asm-x86_64/apic.h
===================================================================
--- linux.orig/include/asm-x86_64/apic.h
+++ linux/include/asm-x86_64/apic.h
@@ -73,6 +73,7 @@ extern void cache_APIC_registers (void);
extern void sync_Arb_IDs (void);
extern void init_bsp_APIC (void);
extern void setup_local_APIC (void);
+extern void end_local_APIC_setup(void);
extern void init_apic_mappings (void);
extern void smp_local_timer_interrupt (void);
extern void setup_boot_APIC_clock (void);

2007-09-21 22:36:36

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [13/50] x86: Fix and reenable CLFLUSH support in change_page_attr()


Reenable CLFLUSH support in change_page_attr()

Mark pages that need to be cache flushed with a special bit
(PG_owner_priv_1) before putting them into the deferred list.
Then only cache flush these pages and don't free them.

Takes special care to handle cases where the page's LRU
or owner_priv_1 bit is already used, falling back to full cache
flushes then.

These cases probably do not happen right now, but this makes
the code more future-proof.

TBD port to i386

Signed-off-by: Andi Kleen <[email protected]>

---
arch/i386/mm/pageattr.c | 69 +++++++++++++++++++++++++++++++++--------
arch/x86_64/mm/pageattr.c | 72 +++++++++++++++++++++++++++++++++----------
include/asm-i386/pgtable.h | 2 +
include/asm-x86_64/pgtable.h | 1
4 files changed, 115 insertions(+), 29 deletions(-)

Index: linux/arch/x86_64/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86_64/mm/pageattr.c
+++ linux/arch/x86_64/mm/pageattr.c
@@ -13,6 +13,10 @@
#include <asm/tlbflush.h>
#include <asm/io.h>

+#define PageFlush(p) test_bit(PG_owner_priv_1, &(p)->flags)
+#define SetPageFlush(p) set_bit(PG_owner_priv_1, &(p)->flags)
+#define TestClearPageFlush(p) test_and_clear_bit(PG_owner_priv_1, &(p)->flags)
+
pte_t *lookup_address(unsigned long address)
{
pgd_t *pgd = pgd_offset_k(address);
@@ -61,6 +65,11 @@ static struct page *split_large_page(uns
return base;
}

+struct flush_arg {
+ int full_flush;
+ struct list_head l;
+};
+
static void cache_flush_page(void *adr)
{
int i;
@@ -70,17 +79,16 @@ static void cache_flush_page(void *adr)

static void flush_kernel_map(void *arg)
{
- struct list_head *l = (struct list_head *)arg;
+ struct flush_arg *a = (struct flush_arg *)arg;
struct page *pg;

- /* When clflush is available always use it because it is
+ /* When clflush is available use it because it is
much cheaper than WBINVD. */
- /* clflush is still broken. Disable for now. */
- if (1 || !cpu_has_clflush)
+ if (a->full_flush || !cpu_has_clflush)
asm volatile("wbinvd" ::: "memory");
- else list_for_each_entry(pg, l, lru) {
- void *adr = page_address(pg);
- cache_flush_page(adr);
+ else list_for_each_entry(pg, &a->l, lru) {
+ if (PageFlush(pg))
+ cache_flush_page(page_address(pg));
}
__flush_tlb_all();
}
@@ -90,11 +98,17 @@ static inline void flush_map(struct list
on_each_cpu(flush_kernel_map, l, 1, 1);
}

-static LIST_HEAD(deferred_pages); /* protected by init_mm.mmap_sem */
+/* both protected by init_mm.mmap_sem */
+static int full_flush;
+static LIST_HEAD(deferred_pages);

-static inline void save_page(struct page *fpage)
+static inline void save_page(struct page *fpage, int data)
{
- if (!test_and_set_bit(PG_arch_1, &fpage->flags))
+ if (data && cpu_has_clflush)
+ SetPageFlush(fpage);
+ if (test_and_set_bit(PG_arch_1, &fpage->flags))
+ return;
+ if (cpu_has_clflush || !data)
list_add(&fpage->lru, &deferred_pages);
}

@@ -122,6 +136,17 @@ static void revert_page(unsigned long ad
set_pte((pte_t *)pmd, large_pte);
}

+static struct page *flush_page(unsigned long address)
+{
+ struct page *p;
+ if (!(pfn_valid(__pa(address) >> PAGE_SHIFT)))
+ return NULL;
+ p = virt_to_page(address);
+ if ((PageFlush(p) || PageLRU(p)) && !test_bit(PG_arch_1, &p->flags))
+ return NULL;
+ return p;
+}
+
static int
__change_page_attr(unsigned long address, unsigned long pfn, pgprot_t prot,
pgprot_t ref_prot)
@@ -133,8 +158,19 @@ __change_page_attr(unsigned long address
kpte = lookup_address(address);
if (!kpte) return 0;
kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
- BUG_ON(PageLRU(kpte_page));
BUG_ON(PageCompound(kpte_page));
+ BUG_ON(PageLRU(kpte_page));
+
+ /* Do caching attributes change?
+ Note: this will need changes if the PAT bit is used (it isn't
+ currently) because that one varies between 2MB and 4K pages. */
+ if ((pte_val(*kpte)&_PAGE_CACHE) != (pgprot_val(prot)&_PAGE_CACHE)) {
+ struct page *p = flush_page(address);
+ if (!p)
+ full_flush = 1;
+ else
+ save_page(p, 1);
+ }
if (pgprot_val(prot) != pgprot_val(ref_prot)) {
if (!pte_huge(*kpte)) {
set_pte(kpte, pfn_pte(pfn, prot));
@@ -162,7 +198,7 @@ __change_page_attr(unsigned long address
/* on x86-64 the direct mapping set at boot is not using 4k pages */
BUG_ON(PageReserved(kpte_page));

- save_page(kpte_page);
+ save_page(kpte_page, 0);
if (page_private(kpte_page) == 0)
revert_page(address, ref_prot);
return 0;
@@ -227,17 +263,21 @@ int change_page_attr(struct page *page,
void global_flush_tlb(void)
{
struct page *pg, *next;
- struct list_head l;
+ struct flush_arg arg;

down_read(&init_mm.mmap_sem);
- list_replace_init(&deferred_pages, &l);
+ arg.full_flush = full_flush;
+ full_flush = 0;
+ list_replace_init(&deferred_pages, &arg.l);
up_read(&init_mm.mmap_sem);

- flush_map(&l);
+ flush_map(&arg);

- list_for_each_entry_safe(pg, next, &l, lru) {
+ list_for_each_entry_safe(pg, next, &arg.l, lru) {
list_del(&pg->lru);
clear_bit(PG_arch_1, &pg->flags);
+ if (TestClearPageFlush(pg))
+ continue;
if (page_private(pg) != 0)
continue;
ClearPagePrivate(pg);
Index: linux/include/asm-x86_64/pgtable.h
===================================================================
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -165,6 +165,7 @@ static inline pte_t ptep_get_and_clear_f

#define _PAGE_PROTNONE 0x080 /* If not present */
#define _PAGE_NX (_AC(1,UL)<<_PAGE_BIT_NX)
+#define _PAGE_CACHE (_PAGE_PCD|_PAGE_PWT)

#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY)
#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
Index: linux/arch/i386/mm/pageattr.c
===================================================================
--- linux.orig/arch/i386/mm/pageattr.c
+++ linux/arch/i386/mm/pageattr.c
@@ -14,7 +14,13 @@
#include <asm/pgalloc.h>
#include <asm/sections.h>

+#define PageFlush(p) test_bit(PG_owner_priv_1, &(p)->flags)
+#define SetPageFlush(p) set_bit(PG_owner_priv_1, &(p)->flags)
+#define TestClearPageFlush(p) test_and_clear_bit(PG_owner_priv_1, &(p)->flags)
+
static DEFINE_SPINLOCK(cpa_lock);
+/* Both protected by cpa_lock */
+static int full_flush;
static struct list_head df_list = LIST_HEAD_INIT(df_list);


@@ -68,6 +74,11 @@ static struct page *split_large_page(uns
return base;
}

+struct flush_arg {
+ int full_flush;
+ struct list_head l;
+};
+
static void cache_flush_page(struct page *p)
{
void *adr = page_address(p);
@@ -78,13 +89,14 @@ static void cache_flush_page(struct page

static void flush_kernel_map(void *arg)
{
- struct list_head *lh = (struct list_head *)arg;
+ struct flush_arg *a = (struct flush_arg *)arg;
struct page *p;

- /* High level code is not ready for clflush yet */
- if (0 && cpu_has_clflush) {
- list_for_each_entry (p, lh, lru)
- cache_flush_page(p);
+ if (!a->full_flush && cpu_has_clflush) {
+ list_for_each_entry (p, &a->l, lru) {
+ if (PageFlush(p))
+ cache_flush_page(p);
+ }
} else if (boot_cpu_data.x86_model >= 4)
wbinvd();

@@ -136,10 +148,25 @@ static inline void revert_page(struct pa
ref_prot));
}

-static inline void save_page(struct page *kpte_page)
+static inline void save_page(struct page *fpage, int data)
+{
+ if (data && cpu_has_clflush)
+ SetPageFlush(fpage);
+ if (test_and_set_bit(PG_arch_1, &fpage->flags))
+ return;
+ if (!data || cpu_has_clflush)
+ list_add(&fpage->lru, &df_list);
+}
+
+static struct page *flush_page(unsigned long address)
{
- if (!test_and_set_bit(PG_arch_1, &kpte_page->flags))
- list_add(&kpte_page->lru, &df_list);
+ struct page *p;
+ if (!(pfn_valid(__pa(address) >> PAGE_SHIFT)))
+ return NULL;
+ p = virt_to_page(address);
+ if ((PageFlush(p) || PageLRU(p)) && !test_bit(PG_arch_1, &p->flags))
+ return NULL;
+ return p;
}

static int
@@ -158,6 +185,18 @@ __change_page_attr(struct page *page, pg
kpte_page = virt_to_page(kpte);
BUG_ON(PageLRU(kpte_page));
BUG_ON(PageCompound(kpte_page));
+ BUG_ON(PageLRU(kpte_page));
+
+ /* Do caching attributes change?
+ Note: this will need changes if the PAT bit is used (it isn't
+ currently) because that one varies between 2MB and 4K pages. */
+ if ((pte_val(*kpte)&_PAGE_CACHE) != (pgprot_val(prot)&_PAGE_CACHE)) {
+ struct page *p = flush_page(address);
+ if (!p)
+ full_flush = 1;
+ else
+ save_page(p, 1);
+ }

if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) {
if (!pte_huge(*kpte)) {
@@ -189,7 +228,7 @@ __change_page_attr(struct page *page, pg
* replace it with a largepage.
*/

- save_page(kpte_page);
+ save_page(kpte_page, 0);
if (!PageReserved(kpte_page)) {
if (cpu_has_pse && (page_private(kpte_page) == 0)) {
paravirt_release_pt(page_to_pfn(kpte_page));
@@ -235,18 +274,22 @@ int change_page_attr(struct page *page,

void global_flush_tlb(void)
{
- struct list_head l;
+ struct flush_arg arg;
struct page *pg, *next;

BUG_ON(irqs_disabled());

spin_lock_irq(&cpa_lock);
- list_replace_init(&df_list, &l);
+ arg.full_flush = full_flush;
+ full_flush = 0;
+ list_replace_init(&df_list, &arg.l);
spin_unlock_irq(&cpa_lock);
- flush_map(&l);
- list_for_each_entry_safe(pg, next, &l, lru) {
+ flush_map(&arg);
+ list_for_each_entry_safe(pg, next, &arg.l, lru) {
list_del(&pg->lru);
clear_bit(PG_arch_1, &pg->flags);
+ if (TestClearPageFlush(pg))
+ continue;
if (PageReserved(pg) || !cpu_has_pse || page_private(pg) != 0)
continue;
ClearPagePrivate(pg);
Index: linux/include/asm-i386/pgtable.h
===================================================================
--- linux.orig/include/asm-i386/pgtable.h
+++ linux/include/asm-i386/pgtable.h
@@ -128,6 +128,8 @@ void paging_init(void);
#else
#define _PAGE_NX 0
#endif
+#define _PAGE_CACHE (_PAGE_PCD|_PAGE_PWT)
+

#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED | _PAGE_DIRTY)
#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)

2007-09-21 22:36:51

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [14/50] x86: Minor code-style cleanups to change_page_attr


Remove a one-liner function and expand it into its parent.
No functional changes.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/i386/mm/pageattr.c | 7 +------
arch/x86_64/mm/pageattr.c | 7 +------
2 files changed, 2 insertions(+), 12 deletions(-)

Index: linux/arch/x86_64/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86_64/mm/pageattr.c
+++ linux/arch/x86_64/mm/pageattr.c
@@ -93,11 +93,6 @@ static void flush_kernel_map(void *arg)
__flush_tlb_all();
}

-static inline void flush_map(struct list_head *l)
-{
- on_each_cpu(flush_kernel_map, l, 1, 1);
-}
-
/* both protected by init_mm.mmap_sem */
static int full_flush;
static LIST_HEAD(deferred_pages);
@@ -271,7 +266,7 @@ void global_flush_tlb(void)
list_replace_init(&deferred_pages, &arg.l);
up_read(&init_mm.mmap_sem);

- flush_map(&arg);
+ on_each_cpu(flush_kernel_map, &arg, 1, 1);

list_for_each_entry_safe(pg, next, &arg.l, lru) {
list_del(&pg->lru);
Index: linux/arch/i386/mm/pageattr.c
===================================================================
--- linux.orig/arch/i386/mm/pageattr.c
+++ linux/arch/i386/mm/pageattr.c
@@ -238,11 +238,6 @@ __change_page_attr(struct page *page, pg
return 0;
}

-static inline void flush_map(struct list_head *l)
-{
- on_each_cpu(flush_kernel_map, l, 1, 1);
-}
-
/*
* Change the page attributes of an page in the linear mapping.
*
@@ -284,7 +279,7 @@ void global_flush_tlb(void)
full_flush = 0;
list_replace_init(&df_list, &arg.l);
spin_unlock_irq(&cpa_lock);
- flush_map(&arg);
+ on_each_cpu(flush_kernel_map, &arg, 1, 1);
list_for_each_entry_safe(pg, next, &arg.l, lru) {
list_del(&pg->lru);
clear_bit(PG_arch_1, &pg->flags);

2007-09-21 22:37:15

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [15/50] x86_64: Return EINVAL for unknown address in change_page_attr


Matches what i386 does and makes more sense.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/mm/pageattr.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux/arch/x86_64/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86_64/mm/pageattr.c
+++ linux/arch/x86_64/mm/pageattr.c
@@ -151,7 +151,9 @@ __change_page_attr(unsigned long address
pgprot_t ref_prot2;

kpte = lookup_address(address);
- if (!kpte) return 0;
+ if (!kpte)
+ return -EINVAL;
+
kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
BUG_ON(PageCompound(kpte_page));
BUG_ON(PageLRU(kpte_page));

2007-09-21 22:37:47

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [16/50] x86: Use macros to modify the PG_arch_1 page flags in change_page_attr


Instead of open-coding the bit accesses, use standard-style
*PageDeferred* macros.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/i386/mm/pageattr.c | 10 +++++++---
arch/x86_64/mm/pageattr.c | 11 ++++++++---
2 files changed, 15 insertions(+), 6 deletions(-)

Index: linux/arch/x86_64/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86_64/mm/pageattr.c
+++ linux/arch/x86_64/mm/pageattr.c
@@ -17,6 +17,11 @@
#define SetPageFlush(p) set_bit(PG_owner_priv_1, &(p)->flags)
#define TestClearPageFlush(p) test_and_clear_bit(PG_owner_priv_1, &(p)->flags)

+#define PageDeferred(p) test_bit(PG_arch_1, &(p)->flags)
+#define SetPageDeferred(p) set_bit(PG_arch_1, &(p)->flags)
+#define ClearPageDeferred(p) clear_bit(PG_arch_1, &(p)->flags)
+#define TestSetPageDeferred(p) test_and_set_bit(PG_arch_1, &(p)->flags)
+
pte_t *lookup_address(unsigned long address)
{
pgd_t *pgd = pgd_offset_k(address);
@@ -101,7 +106,7 @@ static inline void save_page(struct page
{
if (data && cpu_has_clflush)
SetPageFlush(fpage);
- if (test_and_set_bit(PG_arch_1, &fpage->flags))
+ if (TestSetPageDeferred(fpage))
return;
if (cpu_has_clflush || !data)
list_add(&fpage->lru, &deferred_pages);
@@ -137,7 +142,7 @@ static struct page *flush_page(unsigned
if (!(pfn_valid(__pa(address) >> PAGE_SHIFT)))
return NULL;
p = virt_to_page(address);
- if ((PageFlush(p) || PageLRU(p)) && !test_bit(PG_arch_1, &p->flags))
+ if ((PageFlush(p) || PageLRU(p)) && !PageDeferred(p))
return NULL;
return p;
}
@@ -272,7 +277,7 @@ void global_flush_tlb(void)

list_for_each_entry_safe(pg, next, &arg.l, lru) {
list_del(&pg->lru);
- clear_bit(PG_arch_1, &pg->flags);
+ ClearPageDeferred(pg);
if (TestClearPageFlush(pg))
continue;
if (page_private(pg) != 0)
Index: linux/arch/i386/mm/pageattr.c
===================================================================
--- linux.orig/arch/i386/mm/pageattr.c
+++ linux/arch/i386/mm/pageattr.c
@@ -17,6 +17,10 @@
#define PageFlush(p) test_bit(PG_owner_priv_1, &(p)->flags)
#define SetPageFlush(p) set_bit(PG_owner_priv_1, &(p)->flags)
#define TestClearPageFlush(p) test_and_clear_bit(PG_owner_priv_1, &(p)->flags)
+#define PageDeferred(p) test_bit(PG_arch_1, &(p)->flags)
+#define SetPageDeferred(p) set_bit(PG_arch_1, &(p)->flags)
+#define ClearPageDeferred(p) clear_bit(PG_arch_1, &(p)->flags)
+#define TestSetPageDeferred(p) test_and_set_bit(PG_arch_1, &(p)->flags)

static DEFINE_SPINLOCK(cpa_lock);
/* Both protected by cpa_lock */
@@ -152,7 +156,7 @@ static inline void save_page(struct page
{
if (data && cpu_has_clflush)
SetPageFlush(fpage);
- if (test_and_set_bit(PG_arch_1, &fpage->flags))
+ if (TestSetPageDeferred(fpage))
return;
if (!data || cpu_has_clflush)
list_add(&fpage->lru, &df_list);
@@ -164,7 +168,7 @@ static struct page *flush_page(unsigned
if (!(pfn_valid(__pa(address) >> PAGE_SHIFT)))
return NULL;
p = virt_to_page(address);
- if ((PageFlush(p) || PageLRU(p)) && !test_bit(PG_arch_1, &p->flags))
+ if ((PageFlush(p) || PageLRU(p)) && !PageDeferred(p))
return NULL;
return p;
}
@@ -282,7 +286,7 @@ void global_flush_tlb(void)
on_each_cpu(flush_kernel_map, &arg, 1, 1);
list_for_each_entry_safe(pg, next, &arg.l, lru) {
list_del(&pg->lru);
- clear_bit(PG_arch_1, &pg->flags);
+ ClearPageDeferred(pg);
if (TestClearPageFlush(pg))
continue;
if (PageReserved(pg) || !cpu_has_pse || page_private(pg) != 0)

2007-09-21 22:38:03

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [17/50] x86_64: remove STR() macros


From: Glauber de Oliveira Costa <[email protected]>
This patch removes the __STR() and STR() macros from x86_64 header files.
They seem to be legacy and have no remaining users. Even if there were users,
they should use __stringify() instead.

In fact, there was a third place in which this macro was defined
(ia32_binfmt.c) and used just below. In that file, the usage was properly
converted to __stringify()

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/ia32/ia32_binfmt.c | 5 +----
include/asm-x86_64/hw_irq.h | 3 ---
include/asm-x86_64/system.h | 3 ---
3 files changed, 1 insertion(+), 10 deletions(-)

Index: linux/arch/x86_64/ia32/ia32_binfmt.c
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
+++ linux/arch/x86_64/ia32/ia32_binfmt.c
@@ -112,11 +112,8 @@ struct elf_prpsinfo
char pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
};

-#define __STR(x) #x
-#define STR(x) __STR(x)
-
#define _GET_SEG(x) \
- ({ __u32 seg; asm("movl %%" STR(x) ",%0" : "=r"(seg)); seg; })
+ ({ __u32 seg; asm("movl %%" __stringify(x) ",%0" : "=r"(seg)); seg; })

/* Assumes current==process to be dumped */
#define ELF_CORE_COPY_REGS(pr_reg, regs) \
Index: linux/include/asm-x86_64/hw_irq.h
===================================================================
--- linux.orig/include/asm-x86_64/hw_irq.h
+++ linux/include/asm-x86_64/hw_irq.h
@@ -149,9 +149,6 @@ extern atomic_t irq_mis_count;

#define IO_APIC_IRQ(x) (((x) >= 16) || ((1<<(x)) & io_apic_irqs))

-#define __STR(x) #x
-#define STR(x) __STR(x)
-
#include <asm/ptrace.h>

#define IRQ_NAME2(nr) nr##_interrupt(void)
Index: linux/include/asm-x86_64/system.h
===================================================================
--- linux.orig/include/asm-x86_64/system.h
+++ linux/include/asm-x86_64/system.h
@@ -7,9 +7,6 @@

#ifdef __KERNEL__

-#define __STR(x) #x
-#define STR(x) __STR(x)
-
#define __SAVE(reg,offset) "movq %%" #reg ",(14-" #offset ")*8(%%rsp)\n\t"
#define __RESTORE(reg,offset) "movq (14-" #offset ")*8(%%rsp),%%" #reg "\n\t"

2007-09-21 22:38:35

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [18/50] x86_64: Save registers in saved_context during suspend and hibernation


From: Rafael J. Wysocki <[email protected]>

During hibernation and suspend on x86_64, save the CPU registers in the
saved_context structure rather than in a handful of separate variables.

Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Looks-ok-to: Pavel Machek <[email protected]>
---
arch/x86_64/kernel/acpi/wakeup.S | 101 ++++++++++++++++++++-------------------
arch/x86_64/kernel/asm-offsets.c | 28 ++++++++++
arch/x86_64/kernel/suspend.c | 6 --
arch/x86_64/kernel/suspend_asm.S | 72 ++++++++++++++-------------
include/asm-x86_64/suspend.h | 23 ++------
5 files changed, 125 insertions(+), 105 deletions(-)

Index: linux/arch/x86_64/kernel/asm-offsets.c
===================================================================
--- linux.orig/arch/x86_64/kernel/asm-offsets.c
+++ linux/arch/x86_64/kernel/asm-offsets.c
@@ -76,6 +76,34 @@ int main(void)
DEFINE(pbe_orig_address, offsetof(struct pbe, orig_address));
DEFINE(pbe_next, offsetof(struct pbe, next));
BLANK();
+#define ENTRY(entry) DEFINE(pt_regs_ ## entry, offsetof(struct pt_regs, entry))
+ ENTRY(rbx);
+ ENTRY(rbx);
+ ENTRY(rcx);
+ ENTRY(rdx);
+ ENTRY(rsp);
+ ENTRY(rbp);
+ ENTRY(rsi);
+ ENTRY(rdi);
+ ENTRY(r8);
+ ENTRY(r9);
+ ENTRY(r10);
+ ENTRY(r11);
+ ENTRY(r12);
+ ENTRY(r13);
+ ENTRY(r14);
+ ENTRY(r15);
+ ENTRY(eflags);
+ BLANK();
+#undef ENTRY
+#define ENTRY(entry) DEFINE(saved_context_ ## entry, offsetof(struct saved_context, entry))
+ ENTRY(cr0);
+ ENTRY(cr2);
+ ENTRY(cr3);
+ ENTRY(cr4);
+ ENTRY(cr8);
+ BLANK();
+#undef ENTRY
DEFINE(TSS_ist, offsetof(struct tss_struct, ist));
BLANK();
DEFINE(crypto_tfm_ctx_offset, offsetof(struct crypto_tfm, __crt_ctx));
Index: linux/include/asm-x86_64/suspend.h
===================================================================
--- linux.orig/include/asm-x86_64/suspend.h
+++ linux/include/asm-x86_64/suspend.h
@@ -3,6 +3,9 @@
* Based on code
* Copyright 2001 Patrick Mochel <[email protected]>
*/
+#ifndef __ASM_X86_64_SUSPEND_H
+#define __ASM_X86_64_SUSPEND_H
+
#include <asm/desc.h>
#include <asm/i387.h>

@@ -12,8 +15,9 @@ arch_prepare_suspend(void)
return 0;
}

-/* Image of the saved processor state. If you touch this, fix acpi_wakeup.S. */
+/* Image of the saved processor state. If you touch this, fix acpi/wakeup.S. */
struct saved_context {
+ struct pt_regs regs;
u16 ds, es, fs, gs, ss;
unsigned long gs_base, gs_kernel_base, fs_base;
unsigned long cr0, cr2, cr3, cr4, cr8;
@@ -29,27 +33,14 @@ struct saved_context {
unsigned long tr;
unsigned long safety;
unsigned long return_address;
- unsigned long eflags;
} __attribute__((packed));

-/* We'll access these from assembly, so we'd better have them outside struct */
-extern unsigned long saved_context_eax, saved_context_ebx, saved_context_ecx, saved_context_edx;
-extern unsigned long saved_context_esp, saved_context_ebp, saved_context_esi, saved_context_edi;
-extern unsigned long saved_context_r08, saved_context_r09, saved_context_r10, saved_context_r11;
-extern unsigned long saved_context_r12, saved_context_r13, saved_context_r14, saved_context_r15;
-extern unsigned long saved_context_eflags;
-
#define loaddebug(thread,register) \
set_debugreg((thread)->debugreg##register, register)

extern void fix_processor_context(void);

-extern unsigned long saved_rip;
-extern unsigned long saved_rsp;
-extern unsigned long saved_rbp;
-extern unsigned long saved_rbx;
-extern unsigned long saved_rsi;
-extern unsigned long saved_rdi;
-
/* routines for saving/restoring kernel state */
extern int acpi_save_state_mem(void);
+
+#endif /* __ASM_X86_64_SUSPEND_H */
Index: linux/arch/x86_64/kernel/suspend.c
===================================================================
--- linux.orig/arch/x86_64/kernel/suspend.c
+++ linux/arch/x86_64/kernel/suspend.c
@@ -19,12 +19,6 @@ extern const void __nosave_begin, __nosa

struct saved_context saved_context;

-unsigned long saved_context_eax, saved_context_ebx, saved_context_ecx, saved_context_edx;
-unsigned long saved_context_esp, saved_context_ebp, saved_context_esi, saved_context_edi;
-unsigned long saved_context_r08, saved_context_r09, saved_context_r10, saved_context_r11;
-unsigned long saved_context_r12, saved_context_r13, saved_context_r14, saved_context_r15;
-unsigned long saved_context_eflags;
-
void __save_processor_state(struct saved_context *ctxt)
{
kernel_fpu_begin();
Index: linux/arch/x86_64/kernel/suspend_asm.S
===================================================================
--- linux.orig/arch/x86_64/kernel/suspend_asm.S
+++ linux/arch/x86_64/kernel/suspend_asm.S
@@ -17,24 +17,24 @@
#include <asm/asm-offsets.h>

ENTRY(swsusp_arch_suspend)
-
- movq %rsp, saved_context_esp(%rip)
- movq %rax, saved_context_eax(%rip)
- movq %rbx, saved_context_ebx(%rip)
- movq %rcx, saved_context_ecx(%rip)
- movq %rdx, saved_context_edx(%rip)
- movq %rbp, saved_context_ebp(%rip)
- movq %rsi, saved_context_esi(%rip)
- movq %rdi, saved_context_edi(%rip)
- movq %r8, saved_context_r08(%rip)
- movq %r9, saved_context_r09(%rip)
- movq %r10, saved_context_r10(%rip)
- movq %r11, saved_context_r11(%rip)
- movq %r12, saved_context_r12(%rip)
- movq %r13, saved_context_r13(%rip)
- movq %r14, saved_context_r14(%rip)
- movq %r15, saved_context_r15(%rip)
- pushfq ; popq saved_context_eflags(%rip)
+ movq $saved_context, %rax
+ movq %rsp, pt_regs_rsp(%rax)
+ movq %rbp, pt_regs_rbp(%rax)
+ movq %rsi, pt_regs_rsi(%rax)
+ movq %rdi, pt_regs_rdi(%rax)
+ movq %rbx, pt_regs_rbx(%rax)
+ movq %rcx, pt_regs_rcx(%rax)
+ movq %rdx, pt_regs_rdx(%rax)
+ movq %r8, pt_regs_r8(%rax)
+ movq %r9, pt_regs_r9(%rax)
+ movq %r10, pt_regs_r10(%rax)
+ movq %r11, pt_regs_r11(%rax)
+ movq %r12, pt_regs_r12(%rax)
+ movq %r13, pt_regs_r13(%rax)
+ movq %r14, pt_regs_r14(%rax)
+ movq %r15, pt_regs_r15(%rax)
+ pushfq
+ popq pt_regs_eflags(%rax)

call swsusp_save
ret
@@ -87,23 +87,25 @@ done:
movl $24, %eax
movl %eax, %ds

- movq saved_context_esp(%rip), %rsp
- movq saved_context_ebp(%rip), %rbp
- /* Don't restore %rax, it must be 0 anyway */
- movq saved_context_ebx(%rip), %rbx
- movq saved_context_ecx(%rip), %rcx
- movq saved_context_edx(%rip), %rdx
- movq saved_context_esi(%rip), %rsi
- movq saved_context_edi(%rip), %rdi
- movq saved_context_r08(%rip), %r8
- movq saved_context_r09(%rip), %r9
- movq saved_context_r10(%rip), %r10
- movq saved_context_r11(%rip), %r11
- movq saved_context_r12(%rip), %r12
- movq saved_context_r13(%rip), %r13
- movq saved_context_r14(%rip), %r14
- movq saved_context_r15(%rip), %r15
- pushq saved_context_eflags(%rip) ; popfq
+ /* We don't restore %rax, it must be 0 anyway */
+ movq $saved_context, %rax
+ movq pt_regs_rsp(%rax), %rsp
+ movq pt_regs_rbp(%rax), %rbp
+ movq pt_regs_rsi(%rax), %rsi
+ movq pt_regs_rdi(%rax), %rdi
+ movq pt_regs_rbx(%rax), %rbx
+ movq pt_regs_rcx(%rax), %rcx
+ movq pt_regs_rdx(%rax), %rdx
+ movq pt_regs_r8(%rax), %r8
+ movq pt_regs_r9(%rax), %r9
+ movq pt_regs_r10(%rax), %r10
+ movq pt_regs_r11(%rax), %r11
+ movq pt_regs_r12(%rax), %r12
+ movq pt_regs_r13(%rax), %r13
+ movq pt_regs_r14(%rax), %r14
+ movq pt_regs_r15(%rax), %r15
+ pushq pt_regs_eflags(%rax)
+ popfq

xorq %rax, %rax

Index: linux/arch/x86_64/kernel/acpi/wakeup.S
===================================================================
--- linux.orig/arch/x86_64/kernel/acpi/wakeup.S
+++ linux/arch/x86_64/kernel/acpi/wakeup.S
@@ -4,6 +4,7 @@
#include <asm/pgtable.h>
#include <asm/page.h>
#include <asm/msr.h>
+#include <asm/asm-offsets.h>

# Copyright 2003 Pavel Machek <[email protected]>, distribute under GPLv2
#
@@ -395,31 +396,32 @@ do_suspend_lowlevel:
xorl %eax, %eax
call save_processor_state

- movq %rsp, saved_context_esp(%rip)
- movq %rax, saved_context_eax(%rip)
- movq %rbx, saved_context_ebx(%rip)
- movq %rcx, saved_context_ecx(%rip)
- movq %rdx, saved_context_edx(%rip)
- movq %rbp, saved_context_ebp(%rip)
- movq %rsi, saved_context_esi(%rip)
- movq %rdi, saved_context_edi(%rip)
- movq %r8, saved_context_r08(%rip)
- movq %r9, saved_context_r09(%rip)
- movq %r10, saved_context_r10(%rip)
- movq %r11, saved_context_r11(%rip)
- movq %r12, saved_context_r12(%rip)
- movq %r13, saved_context_r13(%rip)
- movq %r14, saved_context_r14(%rip)
- movq %r15, saved_context_r15(%rip)
- pushfq ; popq saved_context_eflags(%rip)
+ movq $saved_context, %rax
+ movq %rsp, pt_regs_rsp(%rax)
+ movq %rbp, pt_regs_rbp(%rax)
+ movq %rsi, pt_regs_rsi(%rax)
+ movq %rdi, pt_regs_rdi(%rax)
+ movq %rbx, pt_regs_rbx(%rax)
+ movq %rcx, pt_regs_rcx(%rax)
+ movq %rdx, pt_regs_rdx(%rax)
+ movq %r8, pt_regs_r8(%rax)
+ movq %r9, pt_regs_r9(%rax)
+ movq %r10, pt_regs_r10(%rax)
+ movq %r11, pt_regs_r11(%rax)
+ movq %r12, pt_regs_r12(%rax)
+ movq %r13, pt_regs_r13(%rax)
+ movq %r14, pt_regs_r14(%rax)
+ movq %r15, pt_regs_r15(%rax)
+ pushfq
+ popq pt_regs_eflags(%rax)

movq $.L97, saved_rip(%rip)

- movq %rsp,saved_rsp
- movq %rbp,saved_rbp
- movq %rbx,saved_rbx
- movq %rdi,saved_rdi
- movq %rsi,saved_rsi
+ movq %rsp, saved_rsp
+ movq %rbp, saved_rbp
+ movq %rbx, saved_rbx
+ movq %rdi, saved_rdi
+ movq %rsi, saved_rsi

addq $8, %rsp
movl $3, %edi
@@ -430,32 +432,35 @@ do_suspend_lowlevel:
.L99:
.align 4
movl $24, %eax
- movw %ax, %ds
- movq saved_context+58(%rip), %rax
- movq %rax, %cr4
- movq saved_context+50(%rip), %rax
- movq %rax, %cr3
- movq saved_context+42(%rip), %rax
- movq %rax, %cr2
- movq saved_context+34(%rip), %rax
- movq %rax, %cr0
- pushq saved_context_eflags(%rip) ; popfq
- movq saved_context_esp(%rip), %rsp
- movq saved_context_ebp(%rip), %rbp
- movq saved_context_eax(%rip), %rax
- movq saved_context_ebx(%rip), %rbx
- movq saved_context_ecx(%rip), %rcx
- movq saved_context_edx(%rip), %rdx
- movq saved_context_esi(%rip), %rsi
- movq saved_context_edi(%rip), %rdi
- movq saved_context_r08(%rip), %r8
- movq saved_context_r09(%rip), %r9
- movq saved_context_r10(%rip), %r10
- movq saved_context_r11(%rip), %r11
- movq saved_context_r12(%rip), %r12
- movq saved_context_r13(%rip), %r13
- movq saved_context_r14(%rip), %r14
- movq saved_context_r15(%rip), %r15
+ movw %ax, %ds
+
+ /* We don't restore %rax, it must be 0 anyway */
+ movq $saved_context, %rax
+ movq saved_context_cr4(%rax), %rbx
+ movq %rbx, %cr4
+ movq saved_context_cr3(%rax), %rbx
+ movq %rbx, %cr3
+ movq saved_context_cr2(%rax), %rbx
+ movq %rbx, %cr2
+ movq saved_context_cr0(%rax), %rbx
+ movq %rbx, %cr0
+ pushq pt_regs_eflags(%rax)
+ popfq
+ movq pt_regs_rsp(%rax), %rsp
+ movq pt_regs_rbp(%rax), %rbp
+ movq pt_regs_rsi(%rax), %rsi
+ movq pt_regs_rdi(%rax), %rdi
+ movq pt_regs_rbx(%rax), %rbx
+ movq pt_regs_rcx(%rax), %rcx
+ movq pt_regs_rdx(%rax), %rdx
+ movq pt_regs_r8(%rax), %r8
+ movq pt_regs_r9(%rax), %r9
+ movq pt_regs_r10(%rax), %r10
+ movq pt_regs_r11(%rax), %r11
+ movq pt_regs_r12(%rax), %r12
+ movq pt_regs_r13(%rax), %r13
+ movq pt_regs_r14(%rax), %r14
+ movq pt_regs_r15(%rax), %r15

xorl %eax, %eax
addq $8, %rsp

2007-09-21 22:38:53

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS


Also allow setting the SVM lock.

TBD double check, documentation, i386 support

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/kernel/setup.c | 25 +++++++++++++++++++++++--
include/asm-i386/cpufeature.h | 1 +
include/asm-i386/msr-index.h | 3 +++
3 files changed, 27 insertions(+), 2 deletions(-)

Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -565,7 +565,7 @@ static void __cpuinit early_init_amd(str

static void __cpuinit init_amd(struct cpuinfo_x86 *c)
{
- unsigned level;
+ unsigned level, flags, dummy;

#ifdef CONFIG_SMP
unsigned long value;
@@ -634,7 +634,28 @@ static void __cpuinit init_amd(struct cp
/* Family 10 doesn't support C states in MWAIT so don't use it */
if (c->x86 == 0x10 && !force_mwait)
clear_bit(X86_FEATURE_MWAIT, &c->x86_capability);
+
+ if (c->x86 >= 0xf && c->x86 <= 0x11 &&
+ !rdmsr_safe(MSR_VM_CR, &flags, &dummy) &&
+ (flags & 0x18))
+ set_bit(X86_FEATURE_VIRT_DISABLED, &c->x86_capability);
+}
+
+static int enable_svm_lock(char *s)
+{
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ boot_cpu_data.x86 >= 0xf && boot_cpu_data.x86 <= 0x11) {
+ unsigned a,b;
+ if (rdmsr_safe(MSR_VM_CR, &a, &b))
+ return 0;
+ a |= (1 << 3); /* set SVM lock */
+ if (!wrmsr_safe(MSR_VM_CR, &a, &b))
+ return 1;
+ }
+ printk(KERN_ERR "CPU does not support svm_lock\n");
+ return 0;
}
+__setup("svm_lock", enable_svm_lock);

static void __cpuinit detect_ht(struct cpuinfo_x86 *c)
{
@@ -985,7 +1006,7 @@ static int show_cpuinfo(struct seq_file
NULL, NULL, NULL, NULL,
"constant_tsc", "up", NULL, "arch_perfmon",
"pebs", "bts", NULL, "sync_rdtsc",
- "rep_good", NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+ "rep_good", "virtualization_bios_disabled", NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,

/* Intel-defined (#2) */
Index: linux/include/asm-i386/cpufeature.h
===================================================================
--- linux.orig/include/asm-i386/cpufeature.h
+++ linux/include/asm-i386/cpufeature.h
@@ -82,6 +82,7 @@
/* 14 free */
#define X86_FEATURE_SYNC_RDTSC (3*32+15) /* RDTSC synchronizes the CPU */
#define X86_FEATURE_REP_GOOD (3*32+16) /* rep microcode works well on this CPU */
+#define X86_FEATURE_VIRT_DISABLED (3*32+17) /* Hardware virt. BIOS disabled */

/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_XMM3 (4*32+ 0) /* Streaming SIMD Extensions-3 */
Index: linux/include/asm-i386/msr-index.h
===================================================================
--- linux.orig/include/asm-i386/msr-index.h
+++ linux/include/asm-i386/msr-index.h
@@ -98,6 +98,9 @@
#define K8_MTRRFIXRANGE_DRAM_MODIFY 0x00080000 /* MtrrFixDramModEn bit */
#define K8_MTRR_RDMEM_WRMEM_MASK 0x18181818 /* Mask: RdMem|WrMem */

+/* SVM */
+#define MSR_VM_CR 0xc0010114
+
/* K7 MSRs */
#define MSR_K7_EVNTSEL0 0xc0010000
#define MSR_K7_PERFCTR0 0xc0010004

2007-09-21 22:39:22

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [20/50] x86_64: Fix some broken white space in arch/x86_64/mm/init.c


No functional changes
Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/mm/init.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)

Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -73,7 +73,7 @@ void show_mem(void)
printk(KERN_INFO "Free swap: %6ldkB\n", nr_swap_pages<<(PAGE_SHIFT-10));

for_each_online_pgdat(pgdat) {
- for (i = 0; i < pgdat->node_spanned_pages; ++i) {
+ for (i = 0; i < pgdat->node_spanned_pages; ++i) {
/* this loop can take a while with 256 GB and 4k pages
so update the NMI watchdog */
if (unlikely(i % MAX_ORDER_NR_PAGES == 0)) {
@@ -89,7 +89,7 @@ void show_mem(void)
cached++;
else if (page_count(page))
shared += page_count(page) - 1;
- }
+ }
}
printk(KERN_INFO "%lu pages of RAM\n", total);
printk(KERN_INFO "%lu reserved pages\n",reserved);
@@ -114,7 +114,7 @@ static __init void *spp_getpage(void)
}

static __init void set_pte_phys(unsigned long vaddr,
- unsigned long phys, pgprot_t prot)
+ unsigned long phys, pgprot_t prot)
{
pgd_t *pgd;
pud_t *pud;
@@ -324,7 +324,7 @@ static void __init find_early_table_spac
puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
tables = round_up(puds * sizeof(pud_t), PAGE_SIZE) +
- round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+ round_up(pmds * sizeof(pmd_t), PAGE_SIZE);

/* RED-PEN putting page tables only on node 0 could
cause a hotspot and fill up ZONE_DMA. The page tables
@@ -338,8 +338,8 @@ static void __init find_early_table_spac
table_end = table_start;

early_printk("kernel direct mapping tables up to %lx @ %lx-%lx\n",
- end, table_start << PAGE_SHIFT,
- (table_start << PAGE_SHIFT) + tables);
+ end, table_start << PAGE_SHIFT,
+ (table_start << PAGE_SHIFT) + tables);
}

/* Setup the direct mapping of the physical memory at PAGE_OFFSET.
@@ -428,7 +428,7 @@ void __init clear_kernel_mapping(unsigne
if (0 == (pmd_val(*pmd) & _PAGE_PSE)) {
/* Could handle this, but it should not happen currently. */
printk(KERN_ERR
- "clear_kernel_mapping: mapping has been split. will leak memory\n");
+ "clear_kernel_mapping: mapping has been split. will leak memory\n");
pmd_ERROR(*pmd);
}
set_pmd(pmd, __pmd(0));
@@ -539,7 +539,7 @@ void __init mem_init(void)
totalram_pages = free_all_bootmem();
#endif
reservedpages = end_pfn - totalram_pages -
- absent_pages_in_range(0, end_pfn);
+ absent_pages_in_range(0, end_pfn);

after_bootmem = 1;

@@ -554,15 +554,15 @@ void __init mem_init(void)
kclist_add(&kcore_kernel, &_stext, _end - _stext);
kclist_add(&kcore_modules, (void *)MODULES_VADDR, MODULES_LEN);
kclist_add(&kcore_vsyscall, (void *)VSYSCALL_START,
- VSYSCALL_END - VSYSCALL_START);
+ VSYSCALL_END - VSYSCALL_START);

printk("Memory: %luk/%luk available (%ldk kernel code, %ldk reserved, %ldk data, %ldk init)\n",
- (unsigned long) nr_free_pages() << (PAGE_SHIFT-10),
- end_pfn << (PAGE_SHIFT-10),
- codesize >> 10,
- reservedpages << (PAGE_SHIFT-10),
- datasize >> 10,
- initsize >> 10);
+ (unsigned long) nr_free_pages() << (PAGE_SHIFT-10),
+ end_pfn << (PAGE_SHIFT-10),
+ codesize >> 10,
+ reservedpages << (PAGE_SHIFT-10),
+ datasize >> 10,
+ initsize >> 10);
}

void free_init_pages(char *what, unsigned long begin, unsigned long end)
@@ -669,10 +669,10 @@ void __init reserve_bootmem_generic(unsi
int kern_addr_valid(unsigned long addr)
{
unsigned long above = ((long)addr) >> __VIRTUAL_MASK_SHIFT;
- pgd_t *pgd;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;

if (above != 0 && above != -1UL)
return 0;
@@ -737,7 +737,7 @@ int in_gate_area_no_task(unsigned long a
void * __init alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
{
return __alloc_bootmem_core(pgdat->bdata, size,
- SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
+ SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
}

const char *arch_vma_name(struct vm_area_struct *vma)

2007-09-21 22:39:43

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [21/50] i386: Misc cpuinit annotations


From: Satyam Sharma <[email protected]>

msr_class_cpu_callback() can be marked __cpuinit, being the notifier
callback for a __cpuinitdata notifier_block. So can be marked
msr_device_create() too, called only from the newly-__cpuinit
msr_class_cpu_callback() or from __init-marked msr_init().

Signed-off-by: Satyam Sharma <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---

arch/i386/kernel/msr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/i386/kernel/msr.c
===================================================================
--- linux.orig/arch/i386/kernel/msr.c
+++ linux/arch/i386/kernel/msr.c
@@ -135,7 +135,7 @@ static const struct file_operations msr_
.open = msr_open,
};

-static int msr_device_create(int i)
+static int __cpuinit msr_device_create(int i)
{
int err = 0;
struct device *dev;
@@ -146,7 +146,7 @@ static int msr_device_create(int i)
return err;
}

-static int msr_class_cpu_callback(struct notifier_block *nfb,
+static int __cpuinit msr_class_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
unsigned int cpu = (unsigned long)hcpu;

2007-09-21 22:40:14

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [22/50] i386: Misc cpuinit annotations


From: Satyam Sharma <[email protected]>

cpuid_class_cpu_callback() is the callback function of a CPU hotplug
notifier_block (that is already marked as __cpuinitdata). Therefore
it can safely be marked as __cpuinit.

cpuid_device_create() is only referenced from other functions that
are __cpuinit or __init. So it can also be safely marked __cpuinit.

Signed-off-by: Satyam Sharma <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---

arch/i386/kernel/cpuid.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux/arch/i386/kernel/cpuid.c
===================================================================
--- linux.orig/arch/i386/kernel/cpuid.c
+++ linux/arch/i386/kernel/cpuid.c
@@ -136,7 +136,7 @@ static const struct file_operations cpui
.open = cpuid_open,
};

-static int cpuid_device_create(int i)
+static int __cpuinit cpuid_device_create(int i)
{
int err = 0;
struct device *dev;
@@ -147,7 +147,9 @@ static int cpuid_device_create(int i)
return err;
}

-static int cpuid_class_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
+static int __cpuinit cpuid_class_cpu_callback(struct notifier_block *nfb,
+ unsigned long action,
+ void *hcpu)
{
unsigned int cpu = (unsigned long)hcpu;

2007-09-21 22:40:50

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [24/50] x86_64: Eliminate result signage problem in asm-x86_64/bitops.h


From: Chuck Lever <[email protected]>
The return type of __scanbit() doesn't match the return type of
find_{first,next}_bit(). Thus when you construct something like
this:

boolean ? __scanbit() : find_first_bit()

you get an unsigned long result if "boolean" is true, and a signed
long result if "boolean" is false.

In file included from /home/cel/src/linux/include/linux/mmzone.h:15,
from /home/cel/src/linux/include/linux/gfp.h:4,
from /home/cel/src/linux/include/linux/slab.h:14,
from /home/cel/src/linux/include/linux/percpu.h:5,
from
/home/cel/src/linux/include/linux/rcupdate.h:41,
from /home/cel/src/linux/include/linux/dcache.h:10,
from /home/cel/src/linux/include/linux/fs.h:275,
from /home/cel/src/linux/fs/nfs/sysctl.c:9:
/home/cel/src/linux/include/linux/nodemask.h: In function
‘__first_node’:
/home/cel/src/linux/include/linux/nodemask.h:229: warning: signed and
unsigned type in conditional expression
/home/cel/src/linux/include/linux/nodemask.h: In function
‘__next_node’:
/home/cel/src/linux/include/linux/nodemask.h:235: warning: signed and
unsigned type in conditional expression
/home/cel/src/linux/include/linux/nodemask.h: In function
‘__first_unset_node’:
/home/cel/src/linux/include/linux/nodemask.h:253: warning: signed and
unsigned type in conditional expression

Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---

include/asm-x86_64/bitops.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/include/asm-x86_64/bitops.h
===================================================================
--- linux.orig/include/asm-x86_64/bitops.h
+++ linux/include/asm-x86_64/bitops.h
@@ -260,7 +260,7 @@ extern long find_first_bit(const unsigne
extern long find_next_bit(const unsigned long * addr, long size, long offset);

/* return index of first bit set in val or max when no bit is set */
-static inline unsigned long __scanbit(unsigned long val, unsigned long max)
+static inline long __scanbit(unsigned long val, unsigned long max)
{
asm("bsfq %1,%0 ; cmovz %2,%0" : "=&r" (val) : "r" (val), "r" (max));
return val;

2007-09-21 22:41:19

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [25/50] x86_64: Add parenthesis to IRQ vector macros


From: Steven Rostedt <[email protected]>
It is not good taste to have macros with additions that do not have
parentheses around them. This patch parenthesizes the IRQ vector
macros for the x86_64 arch.

Note, this caused me a bit of heart-ache debugging lguest64.

Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---
include/asm-x86_64/hw_irq.h | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)

Index: linux/include/asm-x86_64/hw_irq.h
===================================================================
--- linux.orig/include/asm-x86_64/hw_irq.h
+++ linux/include/asm-x86_64/hw_irq.h
@@ -40,22 +40,22 @@
/*
* Vectors 0x30-0x3f are used for ISA interrupts.
*/
-#define IRQ0_VECTOR FIRST_EXTERNAL_VECTOR + 0x10
-#define IRQ1_VECTOR IRQ0_VECTOR + 1
-#define IRQ2_VECTOR IRQ0_VECTOR + 2
-#define IRQ3_VECTOR IRQ0_VECTOR + 3
-#define IRQ4_VECTOR IRQ0_VECTOR + 4
-#define IRQ5_VECTOR IRQ0_VECTOR + 5
-#define IRQ6_VECTOR IRQ0_VECTOR + 6
-#define IRQ7_VECTOR IRQ0_VECTOR + 7
-#define IRQ8_VECTOR IRQ0_VECTOR + 8
-#define IRQ9_VECTOR IRQ0_VECTOR + 9
-#define IRQ10_VECTOR IRQ0_VECTOR + 10
-#define IRQ11_VECTOR IRQ0_VECTOR + 11
-#define IRQ12_VECTOR IRQ0_VECTOR + 12
-#define IRQ13_VECTOR IRQ0_VECTOR + 13
-#define IRQ14_VECTOR IRQ0_VECTOR + 14
-#define IRQ15_VECTOR IRQ0_VECTOR + 15
+#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10)
+#define IRQ1_VECTOR (IRQ0_VECTOR + 1)
+#define IRQ2_VECTOR (IRQ0_VECTOR + 2)
+#define IRQ3_VECTOR (IRQ0_VECTOR + 3)
+#define IRQ4_VECTOR (IRQ0_VECTOR + 4)
+#define IRQ5_VECTOR (IRQ0_VECTOR + 5)
+#define IRQ6_VECTOR (IRQ0_VECTOR + 6)
+#define IRQ7_VECTOR (IRQ0_VECTOR + 7)
+#define IRQ8_VECTOR (IRQ0_VECTOR + 8)
+#define IRQ9_VECTOR (IRQ0_VECTOR + 9)
+#define IRQ10_VECTOR (IRQ0_VECTOR + 10)
+#define IRQ11_VECTOR (IRQ0_VECTOR + 11)
+#define IRQ12_VECTOR (IRQ0_VECTOR + 12)
+#define IRQ13_VECTOR (IRQ0_VECTOR + 13)
+#define IRQ14_VECTOR (IRQ0_VECTOR + 14)
+#define IRQ15_VECTOR (IRQ0_VECTOR + 15)

/*
* Special IRQ vectors used by the SMP architecture, 0xf0-0xff

2007-09-21 22:41:37

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [23/50] x86_64: Implement missing x86_64 function smp_call_function_mask()


From: Laurent Vivier <[email protected]>

This patch defines the missing function smp_call_function_mask() for x86_64;
it is more or less a cut&paste of the i386 function. It also removes some
duplicate code.

This function is needed by KVM to execute a function on some CPUs.

AK: Fixed description
AK: Moved WARN_ON(irqs_disabled) one level up to not warn in the panic case.

arch/x86_64/kernel/smp.c | 119 ++++++++++++++++++++++++-----------------------
include/asm-x86_64/smp.h | 2
2 files changed, 65 insertions(+), 56 deletions(-)

Signed-off-by: Laurent Vivier <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>


Index: linux/arch/x86_64/kernel/smp.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smp.c
+++ linux/arch/x86_64/kernel/smp.c
@@ -322,17 +322,27 @@ void unlock_ipi_call_lock(void)
}

/*
- * this function sends a 'generic call function' IPI to one other CPU
- * in the system.
- *
- * cpu is a standard Linux logical CPU number.
+ * this function sends a 'generic call function' IPI to all other CPU
+ * of the system defined in the mask.
*/
-static void
-__smp_call_function_single(int cpu, void (*func) (void *info), void *info,
- int nonatomic, int wait)
+
+static int
+__smp_call_function_mask(cpumask_t mask,
+ void (*func)(void *), void *info,
+ int wait)
{
struct call_data_struct data;
- int cpus = 1;
+ cpumask_t allbutself;
+ int cpus;
+
+ allbutself = cpu_online_map;
+ cpu_clear(smp_processor_id(), allbutself);
+
+ cpus_and(mask, mask, allbutself);
+ cpus = cpus_weight(mask);
+
+ if (!cpus)
+ return 0;

data.func = func;
data.info = info;
@@ -343,19 +353,55 @@ __smp_call_function_single(int cpu, void

call_data = &data;
wmb();
- /* Send a message to all other CPUs and wait for them to respond */
- send_IPI_mask(cpumask_of_cpu(cpu), CALL_FUNCTION_VECTOR);
+
+ /* Send a message to other CPUs */
+ if (cpus_equal(mask, allbutself))
+ send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+ else
+ send_IPI_mask(mask, CALL_FUNCTION_VECTOR);

/* Wait for response */
while (atomic_read(&data.started) != cpus)
cpu_relax();

if (!wait)
- return;
+ return 0;

while (atomic_read(&data.finished) != cpus)
cpu_relax();
+
+ return 0;
+}
+/**
+ * smp_call_function_mask(): Run a function on a set of other CPUs.
+ * @mask: The set of cpus to run on. Must not include the current cpu.
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @wait: If true, wait (atomically) until function has completed on other CPUs.
+ *
+ * Returns 0 on success, else a negative status code.
+ *
+ * If @wait is true, then returns once @func has returned; otherwise
+ * it returns just before the target cpu calls @func.
+ *
+ * You must not call this function with disabled interrupts or from a
+ * hardware interrupt handler or from a bottom half handler.
+ */
+int smp_call_function_mask(cpumask_t mask,
+ void (*func)(void *), void *info,
+ int wait)
+{
+ int ret;
+
+ /* Can deadlock when called with interrupts disabled */
+ WARN_ON(irqs_disabled());
+
+ spin_lock(&call_lock);
+ ret = __smp_call_function_mask(mask, func, info, wait);
+ spin_unlock(&call_lock);
+ return ret;
}
+EXPORT_SYMBOL(smp_call_function_mask);

/*
* smp_call_function_single - Run a function on a specific CPU
@@ -374,6 +420,7 @@ int smp_call_function_single (int cpu, v
int nonatomic, int wait)
{
/* prevent preemption and reschedule on another processor */
+ int ret;
int me = get_cpu();

/* Can deadlock when called with interrupts disabled */
@@ -387,51 +434,14 @@ int smp_call_function_single (int cpu, v
return 0;
}

- spin_lock(&call_lock);
- __smp_call_function_single(cpu, func, info, nonatomic, wait);
- spin_unlock(&call_lock);
+ ret = smp_call_function_mask(cpumask_of_cpu(cpu), func, info, wait);
+
put_cpu();
- return 0;
+ return ret;
}
EXPORT_SYMBOL(smp_call_function_single);

/*
- * this function sends a 'generic call function' IPI to all other CPUs
- * in the system.
- */
-static void __smp_call_function (void (*func) (void *info), void *info,
- int nonatomic, int wait)
-{
- struct call_data_struct data;
- int cpus = num_online_cpus()-1;
-
- if (!cpus)
- return;
-
- data.func = func;
- data.info = info;
- atomic_set(&data.started, 0);
- data.wait = wait;
- if (wait)
- atomic_set(&data.finished, 0);
-
- call_data = &data;
- wmb();
- /* Send a message to all other CPUs and wait for them to respond */
- send_IPI_allbutself(CALL_FUNCTION_VECTOR);
-
- /* Wait for response */
- while (atomic_read(&data.started) != cpus)
- cpu_relax();
-
- if (!wait)
- return;
-
- while (atomic_read(&data.finished) != cpus)
- cpu_relax();
-}
-
-/*
* smp_call_function - run a function on all other CPUs.
* @func: The function to run. This must be fast and non-blocking.
* @info: An arbitrary pointer to pass to the function.
@@ -449,10 +459,7 @@ static void __smp_call_function (void (*
int smp_call_function (void (*func) (void *info), void *info, int nonatomic,
int wait)
{
- spin_lock(&call_lock);
- __smp_call_function(func,info,nonatomic,wait);
- spin_unlock(&call_lock);
- return 0;
+ return smp_call_function_mask(cpu_online_map, func, info, wait);
}
EXPORT_SYMBOL(smp_call_function);

@@ -479,7 +486,7 @@ void smp_send_stop(void)
/* Don't deadlock on the call lock in panic */
nolock = !spin_trylock(&call_lock);
local_irq_save(flags);
- __smp_call_function(stop_this_cpu, NULL, 0, 0);
+ __smp_call_function_mask(cpu_online_map, stop_this_cpu, NULL, 0);
if (!nolock)
spin_unlock(&call_lock);
disable_local_APIC();
Index: linux/include/asm-x86_64/smp.h
===================================================================
--- linux.orig/include/asm-x86_64/smp.h
+++ linux/include/asm-x86_64/smp.h
@@ -37,6 +37,8 @@ extern void lock_ipi_call_lock(void);
extern void unlock_ipi_call_lock(void);
extern int smp_num_siblings;
extern void smp_send_reschedule(int cpu);
+extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
+ void *info, int wait);

extern cpumask_t cpu_sibling_map[NR_CPUS];
extern cpumask_t cpu_core_map[NR_CPUS];

2007-09-21 22:42:00

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [26/50] i386: export i386 smp_call_function_mask() to modules


From: Laurent Vivier <[email protected]>

This patch exports the i386 smp_call_function_mask() with EXPORT_SYMBOL().

This function is needed by KVM to call a function on a set of CPUs.

arch/i386/kernel/smp.c | 7 +++++++
include/asm-i386/smp.h | 9 +++------
2 files changed, 10 insertions(+), 6 deletions(-)

Signed-off-by: Laurent Vivier <[email protected]>


Index: linux/arch/i386/kernel/smp.c
===================================================================
--- linux.orig/arch/i386/kernel/smp.c
+++ linux/arch/i386/kernel/smp.c
@@ -708,3 +708,10 @@ struct smp_ops smp_ops = {
.smp_send_reschedule = native_smp_send_reschedule,
.smp_call_function_mask = native_smp_call_function_mask,
};
+
+int smp_call_function_mask(cpumask_t mask, void (*func) (void *info),
+ void *info, int wait)
+{
+ return smp_ops.smp_call_function_mask(mask, func, info, wait);
+}
+EXPORT_SYMBOL(smp_call_function_mask);
Index: linux/include/asm-i386/smp.h
===================================================================
--- linux.orig/include/asm-i386/smp.h
+++ linux/include/asm-i386/smp.h
@@ -92,12 +92,9 @@ static inline void smp_send_reschedule(i
{
smp_ops.smp_send_reschedule(cpu);
}
-static inline int smp_call_function_mask(cpumask_t mask,
- void (*func) (void *info), void *info,
- int wait)
-{
- return smp_ops.smp_call_function_mask(mask, func, info, wait);
-}
+extern int smp_call_function_mask(cpumask_t mask,
+ void (*func) (void *info), void *info,
+ int wait);

void native_smp_prepare_boot_cpu(void);
void native_smp_prepare_cpus(unsigned int max_cpus);

2007-09-21 22:42:32

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [27/50] x86_64: Remove duplicated vsyscall nsec update


Spotted by Chuck Ebbert

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/kernel/vsyscall.c | 1 -
1 file changed, 1 deletion(-)

Index: linux/arch/x86_64/kernel/vsyscall.c
===================================================================
--- linux.orig/arch/x86_64/kernel/vsyscall.c
+++ linux/arch/x86_64/kernel/vsyscall.c
@@ -80,7 +80,6 @@ void update_vsyscall(struct timespec *wa
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
vsyscall_gtod_data.sys_tz = sys_tz;
- vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
vsyscall_gtod_data.wall_to_monotonic = wall_to_monotonic;
write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
}

2007-09-21 22:42:52

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [28/50] i386: remove stub early_printk.c


From: "Jan Beulich" <[email protected]>
.. and handle use of the x86-64 file via makefile logic instead.

Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

arch/i386/kernel/Makefile | 1 +
arch/i386/kernel/early_printk.c | 2 --
2 files changed, 1 insertion(+), 2 deletions(-)

Index: linux/arch/i386/kernel/Makefile
===================================================================
--- linux.orig/arch/i386/kernel/Makefile
+++ linux/arch/i386/kernel/Makefile
@@ -86,6 +86,7 @@ $(obj)/vsyscall-syms.o: $(src)/vsyscall.
$(obj)/vsyscall-sysenter.o $(obj)/vsyscall-note.o FORCE
$(call if_changed,syscall)

+early_printk-y += ../../x86_64/kernel/early_printk.o
k8-y += ../../x86_64/kernel/k8.o
stacktrace-y += ../../x86_64/kernel/stacktrace.o
early-quirks-y += ../../x86_64/kernel/early-quirks.o
Index: linux/arch/i386/kernel/early_printk.c
===================================================================
--- linux.orig/arch/i386/kernel/early_printk.c
+++ /dev/null
@@ -1,2 +0,0 @@
-
-#include "../../x86_64/kernel/early_printk.c"

2007-09-21 22:43:28

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [29/50] x86: honor _PAGE_PSE bit on page walks


From: "Jan Beulich" <[email protected]>
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

arch/i386/mm/fault.c | 3 ++-
arch/x86_64/mm/fault.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)

Index: linux/arch/i386/mm/fault.c
===================================================================
--- linux.orig/arch/i386/mm/fault.c
+++ linux/arch/i386/mm/fault.c
@@ -569,7 +569,8 @@ no_context:
* it's allocated already.
*/
if ((page >> PAGE_SHIFT) < max_low_pfn
- && (page & _PAGE_PRESENT)) {
+ && (page & _PAGE_PRESENT)
+ && !(page & _PAGE_PSE)) {
page &= PAGE_MASK;
page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
& (PTRS_PER_PTE - 1)];
Index: linux/arch/x86_64/mm/fault.c
===================================================================
--- linux.orig/arch/x86_64/mm/fault.c
+++ linux/arch/x86_64/mm/fault.c
@@ -175,7 +175,7 @@ void dump_pagetable(unsigned long addres
pmd = pmd_offset(pud, address);
if (bad_address(pmd)) goto bad;
printk("PMD %lx ", pmd_val(*pmd));
- if (!pmd_present(*pmd)) goto ret;
+ if (!pmd_present(*pmd) || pmd_large(*pmd)) goto ret;

pte = pte_offset_kernel(pmd, address);
if (bad_address(pte)) goto bad;

2007-09-21 22:43:46

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [30/50] x86_64: remove some dead code


From: "Jan Beulich" <[email protected]>
Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

arch/x86_64/kernel/entry.S | 4 ----
arch/x86_64/kernel/traps.c | 3 ---
2 files changed, 7 deletions(-)

Index: linux/arch/x86_64/kernel/entry.S
===================================================================
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -1088,10 +1088,6 @@ ENTRY(coprocessor_segment_overrun)
zeroentry do_coprocessor_segment_overrun
END(coprocessor_segment_overrun)

-ENTRY(reserved)
- zeroentry do_reserved
-END(reserved)
-
/* runs on exception stack */
ENTRY(double_fault)
XCPT_FRAME
Index: linux/arch/x86_64/kernel/traps.c
===================================================================
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -70,7 +70,6 @@ asmlinkage void general_protection(void)
asmlinkage void page_fault(void);
asmlinkage void coprocessor_error(void);
asmlinkage void simd_coprocessor_error(void);
-asmlinkage void reserved(void);
asmlinkage void alignment_check(void);
asmlinkage void machine_check(void);
asmlinkage void spurious_interrupt_bug(void);
@@ -710,12 +709,10 @@ DO_ERROR_INFO( 0, SIGFPE, "divide error
DO_ERROR( 4, SIGSEGV, "overflow", overflow)
DO_ERROR( 5, SIGSEGV, "bounds", bounds)
DO_ERROR_INFO( 6, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->rip)
-DO_ERROR( 7, SIGSEGV, "device not available", device_not_available)
DO_ERROR( 9, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun)
DO_ERROR(10, SIGSEGV, "invalid TSS", invalid_TSS)
DO_ERROR(11, SIGBUS, "segment not present", segment_not_present)
DO_ERROR_INFO(17, SIGBUS, "alignment check", alignment_check, BUS_ADRALN, 0)
-DO_ERROR(18, SIGSEGV, "reserved", reserved)

/* Runs on IST stack */
asmlinkage void do_stack_segment(struct pt_regs *regs, long error_code)

2007-09-21 22:44:15

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [31/50] x86_64: honor notify_die() returning NOTIFY_STOP


From: "Jan Beulich" <[email protected]>
If a debugger or other low level code resolves a kernel exception, don't
send signals, kill the kernel, or do anything of the like.

Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

arch/x86_64/kernel/mce.c | 7 ++++---
arch/x86_64/kernel/traps.c | 23 +++++++++++++++--------
arch/x86_64/mm/fault.c | 12 ++++++------
include/asm-x86_64/kdebug.h | 4 ++--
4 files changed, 27 insertions(+), 19 deletions(-)

Index: linux/arch/x86_64/kernel/mce.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce.c
+++ linux/arch/x86_64/kernel/mce.c
@@ -196,9 +196,10 @@ void do_machine_check(struct pt_regs * r

atomic_inc(&mce_entry);

- if (regs)
- notify_die(DIE_NMI, "machine check", regs, error_code, 18, SIGKILL);
- if (!banks)
+ if ((regs
+ && notify_die(DIE_NMI, "machine check", regs, error_code,
+ 18, SIGKILL) == NOTIFY_STOP)
+ || !banks)
goto out2;

memset(&m, 0, sizeof(struct mce));
Index: linux/arch/x86_64/kernel/traps.c
===================================================================
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -557,7 +557,7 @@ unsigned __kprobes long oops_begin(void)
return flags;
}

-void __kprobes oops_end(unsigned long flags)
+void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
die_owner = -1;
bust_spinlocks(0);
@@ -568,12 +568,17 @@ void __kprobes oops_end(unsigned long fl
else
/* Nest count reaches zero, release the lock. */
spin_unlock_irqrestore(&die_lock, flags);
+ if (!regs) {
+ oops_exit();
+ return;
+ }
if (panic_on_oops)
panic("Fatal exception");
oops_exit();
+ do_exit(signr);
}

-void __kprobes __die(const char * str, struct pt_regs * regs, long err)
+int __kprobes __die(const char * str, struct pt_regs * regs, long err)
{
static int die_counter;
printk(KERN_EMERG "%s: %04lx [%u] ", str, err & 0xffff,++die_counter);
@@ -587,7 +592,8 @@ void __kprobes __die(const char * str, s
printk("DEBUG_PAGEALLOC");
#endif
printk("\n");
- notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV);
+ if (notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV) == NOTIFY_STOP)
+ return 1;
show_registers(regs);
add_taint(TAINT_DIE);
/* Executive summary in case the oops scrolled away */
@@ -596,6 +602,7 @@ void __kprobes __die(const char * str, s
printk(" RSP <%016lx>\n", regs->rsp);
if (kexec_should_crash(current))
crash_kexec(regs);
+ return 0;
}

void die(const char * str, struct pt_regs * regs, long err)
@@ -605,9 +612,9 @@ void die(const char * str, struct pt_reg
if (!user_mode(regs))
report_bug(regs->rip, regs);

- __die(str, regs, err);
- oops_end(flags);
- do_exit(SIGSEGV);
+ if (__die(str, regs, err))
+ regs = NULL;
+ oops_end(flags, regs, SIGSEGV);
}

void __kprobes die_nmi(char *str, struct pt_regs *regs, int do_panic)
@@ -624,10 +631,10 @@ void __kprobes die_nmi(char *str, struct
crash_kexec(regs);
if (do_panic || panic_on_oops)
panic("Non maskable interrupt");
- oops_end(flags);
+ oops_end(flags, NULL, SIGBUS);
nmi_exit();
local_irq_enable();
- do_exit(SIGSEGV);
+ do_exit(SIGBUS);
}

static void __kprobes do_trap(int trapnr, int signr, char *str,
Index: linux/arch/x86_64/mm/fault.c
===================================================================
--- linux.orig/arch/x86_64/mm/fault.c
+++ linux/arch/x86_64/mm/fault.c
@@ -234,9 +234,9 @@ static noinline void pgtable_bad(unsigne
tsk->thread.cr2 = address;
tsk->thread.trap_no = 14;
tsk->thread.error_code = error_code;
- __die("Bad pagetable", regs, error_code);
- oops_end(flags);
- do_exit(SIGKILL);
+ if (__die("Bad pagetable", regs, error_code))
+ regs = NULL;
+ oops_end(flags, regs, SIGKILL);
}

/*
@@ -541,11 +541,11 @@ no_context:
tsk->thread.cr2 = address;
tsk->thread.trap_no = 14;
tsk->thread.error_code = error_code;
- __die("Oops", regs, error_code);
+ if (__die("Oops", regs, error_code))
+ regs = NULL;
/* Executive summary in case the body of the oops scrolled away */
printk(KERN_EMERG "CR2: %016lx\n", address);
- oops_end(flags);
- do_exit(SIGKILL);
+ oops_end(flags, regs, SIGKILL);

/*
* We ran out of memory, or some other thing happened to us that made
Index: linux/include/asm-x86_64/kdebug.h
===================================================================
--- linux.orig/include/asm-x86_64/kdebug.h
+++ linux/include/asm-x86_64/kdebug.h
@@ -27,10 +27,10 @@ enum die_val {

extern void printk_address(unsigned long address);
extern void die(const char *,struct pt_regs *,long);
-extern void __die(const char *,struct pt_regs *,long);
+extern int __must_check __die(const char *, struct pt_regs *, long);
extern void show_registers(struct pt_regs *regs);
extern void dump_pagetable(unsigned long);
extern unsigned long oops_begin(void);
-extern void oops_end(unsigned long);
+extern void oops_end(unsigned long, struct pt_regs *, int signr);

#endif

2007-09-21 22:44:36

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [32/50] x86: Show last exception from/to register contents


From: "Jan Beulich" <[email protected]>
.. when dumping register state. This is particularly useful when gcc
managed to tail-call optimize an indirect call which happens to hit a
NULL (or otherwise invalid) pointer.

The result is unreliable because interrupts happening in between can mess
it up.

AK: added some warnings that the result can be unreliable

Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

Documentation/kernel-parameters.txt | 3 +++
arch/i386/kernel/cpu/amd.c | 4 ++++
arch/i386/kernel/cpu/common.c | 2 ++
arch/i386/kernel/cpu/intel.c | 20 ++++++++++++++------
arch/i386/kernel/traps.c | 35 +++++++++++++++++++++++++++++++++++
arch/x86_64/kernel/setup.c | 23 ++++++++++++++++++-----
arch/x86_64/kernel/traps.c | 33 +++++++++++++++++++++++++++++++++
include/asm-i386/msr-index.h | 3 +++
include/asm-i386/processor.h | 4 ++++
include/asm-x86_64/msr.h | 6 ++++++
include/asm-x86_64/processor.h | 3 +++
11 files changed, 125 insertions(+), 11 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===================================================================
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -1152,6 +1152,9 @@ and is between 256 and 4096 characters.

nolapic_timer [X86-32,APIC] Do not use the local APIC timer.

+ noler [X86-32/X86-64] Do not print last exception records
+ with kernel register dumps.
+
noltlbs [PPC] Do not use large page/tlb entries for kernel
lowmem mapping on PPC40x.

Index: linux/arch/i386/kernel/cpu/amd.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/amd.c
+++ linux/arch/i386/kernel/cpu/amd.c
@@ -238,9 +238,13 @@ static void __cpuinit init_amd(struct cp
case 0x10:
case 0x11:
set_bit(X86_FEATURE_K8, c->x86_capability);
+ if (ler_enabled)
+ __get_cpu_var(ler_msr) = MSR_IA32_LASTINTFROMIP;
break;
case 6:
set_bit(X86_FEATURE_K7, c->x86_capability);
+ if (ler_enabled)
+ __get_cpu_var(ler_msr) = MSR_IA32_LASTINTFROMIP;
break;
}
if (c->x86 >= 6)
Index: linux/arch/i386/kernel/cpu/common.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/common.c
+++ linux/arch/i386/kernel/cpu/common.c
@@ -503,6 +503,8 @@ static void __cpuinit identify_cpu(struc

/* Init Machine Check Exception if available. */
mcheck_init(c);
+
+ ler_enable();
}

void __init identify_boot_cpu(void)
Index: linux/arch/i386/kernel/cpu/intel.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/intel.c
+++ linux/arch/i386/kernel/cpu/intel.c
@@ -188,15 +188,23 @@ static void __cpuinit init_intel(struct
}
#endif

- if (c->x86 == 15) {
+ switch (c->x86) {
+ case 15:
set_bit(X86_FEATURE_P4, c->x86_capability);
set_bit(X86_FEATURE_SYNC_RDTSC, c->x86_capability);
- }
- if (c->x86 == 6)
+ if (c->x86_model >= 0x03)
+ set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
+ if (ler_enabled)
+ __get_cpu_var(ler_msr) = MSR_P4_LER_FROM_LIP;
+ break;
+ case 6:
set_bit(X86_FEATURE_P3, c->x86_capability);
- if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
- (c->x86 == 0x6 && c->x86_model >= 0x0e))
- set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
+ if (c->x86_model >= 0x0e)
+ set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
+ if (ler_enabled)
+ __get_cpu_var(ler_msr) = MSR_IA32_LASTINTFROMIP;
+ break;
+ }

if (cpu_has_ds) {
unsigned int l1;
Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -374,6 +374,20 @@ void show_registers(struct pt_regs *regs
unsigned int code_len = code_bytes;
unsigned char c;

+ if (oops_in_progress && __get_cpu_var(ler_msr)) {
+ u32 from, to, hi;
+
+ if (rdmsr_safe(__get_cpu_var(ler_msr), &from, &hi) == 0
+ && rdmsr_safe(__get_cpu_var(ler_msr) + 1, &to, &hi) == 0) {
+ printk("\n" KERN_EMERG
+ "last branch before last exception/interrupt\n");
+ printk(KERN_EMERG " from %08x", from);
+ print_symbol(" (%s)\n", from);
+ printk(KERN_EMERG " to %08x", to);
+ print_symbol(" (%s)", to);
+ } else
+ __get_cpu_var(ler_msr) = 0;
+ }
printk("\n" KERN_EMERG "Stack: ");
show_stack_log_lvl(NULL, regs, &regs->esp, KERN_EMERG);

@@ -413,6 +427,19 @@ int is_valid_bugaddr(unsigned long eip)
return ud2 == 0x0b0f;
}

+DEFINE_PER_CPU(u32, ler_msr);
+int ler_enabled = 1;
+
+void ler_enable(void) {
+ if (__get_cpu_var(ler_msr)) {
+ u32 lo, hi;
+
+ if (rdmsr_safe(MSR_IA32_DEBUGCTLMSR, &lo, &hi) < 0
+ || wrmsr_safe(MSR_IA32_DEBUGCTLMSR, lo | 1, hi) < 0)
+ __get_cpu_var(ler_msr) = 0;
+ }
+}
+
/*
* This is gone through when something in the kernel has done something bad and
* is about to be terminated.
@@ -891,6 +918,7 @@ fastcall void __kprobes do_debug(struct
struct task_struct *tsk = current;

get_debugreg(condition, 6);
+ ler_enable();

if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
SIGTRAP) == NOTIFY_STOP)
@@ -1275,6 +1303,13 @@ static int __init kstack_setup(char *s)
}
__setup("kstack=", kstack_setup);

+static int __init ler_setup(char *s)
+{
+ ler_enabled = 0;
+ return 1;
+}
+__setup("noler", ler_setup);
+
static int __init code_bytes_setup(char *s)
{
code_bytes = simple_strtoul(s, NULL, 0);
Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -639,6 +639,9 @@ static void __cpuinit init_amd(struct cp
!rdmsr_safe(MSR_VM_CR, &flags, &dummy) &&
(flags & 0x18))
set_bit(X86_FEATURE_VIRT_DISABLED, &c->x86_capability);
+
+ if (ler_enabled && c->x86 <= 17)
+ __get_cpu_var(ler_msr) = MSR_IA32_LASTINTFROMIP;
}

static int enable_svm_lock(char *s)
@@ -774,13 +777,22 @@ static void __cpuinit init_intel(struct
c->x86_phys_bits = 36;
}

- if (c->x86 == 15)
+ switch (c->x86) {
+ case 15:
c->x86_cache_alignment = c->x86_clflush_size * 2;
- if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
- (c->x86 == 0x6 && c->x86_model >= 0x0e))
- set_bit(X86_FEATURE_CONSTANT_TSC, &c->x86_capability);
- if (c->x86 == 6)
+ if (c->x86_model >= 0x03)
+ set_bit(X86_FEATURE_CONSTANT_TSC, &c->x86_capability);
+ if (ler_enabled)
+ __get_cpu_var(ler_msr) = MSR_P4_LER_FROM_LIP;
+ break;
+ case 6:
+ if (c->x86_model >= 0x0e)
+ set_bit(X86_FEATURE_CONSTANT_TSC, &c->x86_capability);
set_bit(X86_FEATURE_REP_GOOD, &c->x86_capability);
+ if (ler_enabled)
+ __get_cpu_var(ler_msr) = MSR_IA32_LASTINTFROMIP;
+ break;
+ }
if (c->x86 == 15)
set_bit(X86_FEATURE_SYNC_RDTSC, &c->x86_capability);
else
@@ -951,6 +963,7 @@ void __cpuinit identify_cpu(struct cpuin
#ifdef CONFIG_NUMA
numa_add_cpu(smp_processor_id());
#endif
+ ler_enable();
}


Index: linux/arch/x86_64/kernel/traps.c
===================================================================
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -492,6 +492,19 @@ void show_registers(struct pt_regs *regs
* time of the fault..
*/
if (in_kernel) {
+ if (oops_in_progress && __get_cpu_var(ler_msr)) {
+ u64 from, to;
+
+ if (checking_rdmsrl(__get_cpu_var(ler_msr), from) == 0
+ && checking_rdmsrl(__get_cpu_var(ler_msr) + 1, to) == 0) {
+ printk("last branch before last exception/interrupt\n");
+ printk(" from ");
+ printk_address(from);
+ printk(" to ");
+ printk_address(to);
+ } else
+ __get_cpu_var(ler_msr) = 0;
+ }
printk("Stack: ");
_show_stack(NULL, regs, (unsigned long*)rsp);

@@ -530,6 +543,19 @@ void out_of_line_bug(void)
EXPORT_SYMBOL(out_of_line_bug);
#endif

+DEFINE_PER_CPU(u32, ler_msr);
+int ler_enabled = 1;
+
+void ler_enable(void) {
+ if (__get_cpu_var(ler_msr)) {
+ u32 lo, hi;
+
+ if (rdmsr_safe(MSR_IA32_DEBUGCTLMSR, &lo, &hi) < 0
+ || wrmsr_safe(MSR_IA32_DEBUGCTLMSR, lo | 1, hi) < 0)
+ __get_cpu_var(ler_msr) = 0;
+ }
+}
+
static DEFINE_SPINLOCK(die_lock);
static int die_owner = -1;
static unsigned int die_nest_count;
@@ -920,6 +946,7 @@ asmlinkage void __kprobes do_debug(struc
siginfo_t info;

get_debugreg(condition, 6);
+ ler_enable();

if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
SIGTRAP) == NOTIFY_STOP)
@@ -1188,6 +1215,12 @@ void __init trap_init(void)
cpu_init();
}

+static int __init ler_setup(char *s)
+{
+ ler_enabled = 0;
+ return 1;
+}
+__setup("noler", ler_setup);

static int __init oops_setup(char *s)
{
Index: linux/include/asm-i386/msr-index.h
===================================================================
--- linux.orig/include/asm-i386/msr-index.h
+++ linux/include/asm-i386/msr-index.h
@@ -63,6 +63,9 @@
#define MSR_IA32_LASTINTFROMIP 0x000001dd
#define MSR_IA32_LASTINTTOIP 0x000001de

+#define MSR_P4_LER_FROM_LIP 0x000001d7
+#define MSR_P4_LER_TO_LIP 0x000001d8
+
#define MSR_IA32_MC0_CTL 0x00000400
#define MSR_IA32_MC0_STATUS 0x00000401
#define MSR_IA32_MC0_ADDR 0x00000402
Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -643,6 +643,10 @@ static inline unsigned int cpuid_edx(uns
return edx;
}

+DECLARE_PER_CPU(u32, ler_msr);
+extern int ler_enabled;
+void ler_enable(void);
+
/* generic versions from gas */
#define GENERIC_NOP1 ".byte 0x90\n"
#define GENERIC_NOP2 ".byte 0x89,0xf6\n"
Index: linux/include/asm-x86_64/msr.h
===================================================================
--- linux.orig/include/asm-x86_64/msr.h
+++ linux/include/asm-x86_64/msr.h
@@ -63,6 +63,12 @@
:"c"(msr), "i"(-EIO), "0"(0)); \
ret__; })

+#define checking_rdmsrl(msr,val) ({ \
+ u32 lo__, hi__; \
+ int rc__ = rdmsr_safe(msr, &lo__, &hi__); \
+ val = lo__ | ((u64)hi__ << 32); \
+ rc__; })
+
#define rdtsc(low,high) \
__asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))

Index: linux/include/asm-x86_64/processor.h
===================================================================
--- linux.orig/include/asm-x86_64/processor.h
+++ linux/include/asm-x86_64/processor.h
@@ -334,6 +334,9 @@ struct extended_sigtable {
struct extended_signature sigs[0];
};

+DECLARE_PER_CPU(u32, ler_msr);
+extern int ler_enabled;
+void ler_enable(void);

#define ASM_NOP1 K8_NOP1
#define ASM_NOP2 K8_NOP2

2007-09-21 22:44:58

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [33/50] x86: rename .i assembler includes to .h


.i is an extension normally used for preprocessed files.

This patch therefore renames assembler include files to .h and guards
the contents with an #ifdef __ASSEMBLY__.

Signed-off-by: Adrian Bunk <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---

arch/i386/lib/semaphore.S | 4 ++--
arch/x86_64/lib/rwlock.S | 2 +-
include/asm-i386/alternative-asm.h | 16 ++++++++++++++++
include/asm-i386/alternative-asm.i | 12 ------------
include/asm-i386/frame.h | 27 +++++++++++++++++++++++++++
include/asm-i386/frame.i | 23 -----------------------
include/asm-um/alternative-asm.h | 6 ++++++
include/asm-um/alternative-asm.i | 6 ------
include/asm-um/frame.h | 6 ++++++
include/asm-um/frame.i | 6 ------
include/asm-x86_64/alternative-asm.h | 16 ++++++++++++++++
include/asm-x86_64/alternative-asm.i | 12 ------------
12 files changed, 74 insertions(+), 62 deletions(-)

Index: linux/arch/i386/lib/semaphore.S
===================================================================
--- linux.orig/arch/i386/lib/semaphore.S
+++ linux/arch/i386/lib/semaphore.S
@@ -15,8 +15,8 @@

#include <linux/linkage.h>
#include <asm/rwlock.h>
-#include <asm/alternative-asm.i>
-#include <asm/frame.i>
+#include <asm/alternative-asm.h>
+#include <asm/frame.h>
#include <asm/dwarf2.h>

/*
Index: linux/arch/x86_64/lib/rwlock.S
===================================================================
--- linux.orig/arch/x86_64/lib/rwlock.S
+++ linux/arch/x86_64/lib/rwlock.S
@@ -2,7 +2,7 @@

#include <linux/linkage.h>
#include <asm/rwlock.h>
-#include <asm/alternative-asm.i>
+#include <asm/alternative-asm.h>
#include <asm/dwarf2.h>

/* rdi: pointer to rwlock_t */
Index: linux/include/asm-i386/alternative-asm.h
===================================================================
--- /dev/null
+++ linux/include/asm-i386/alternative-asm.h
@@ -0,0 +1,16 @@
+#ifdef __ASSEMBLY__
+
+#ifdef CONFIG_SMP
+ .macro LOCK_PREFIX
+1: lock
+ .section .smp_locks,"a"
+ .align 4
+ .long 1b
+ .previous
+ .endm
+#else
+ .macro LOCK_PREFIX
+ .endm
+#endif
+
+#endif /* __ASSEMBLY__ */
Index: linux/include/asm-i386/alternative-asm.i
===================================================================
--- linux.orig/include/asm-i386/alternative-asm.i
+++ /dev/null
@@ -1,12 +0,0 @@
-#ifdef CONFIG_SMP
- .macro LOCK_PREFIX
-1: lock
- .section .smp_locks,"a"
- .align 4
- .long 1b
- .previous
- .endm
-#else
- .macro LOCK_PREFIX
- .endm
-#endif
Index: linux/include/asm-i386/frame.h
===================================================================
--- /dev/null
+++ linux/include/asm-i386/frame.h
@@ -0,0 +1,27 @@
+#ifdef __ASSEMBLY__
+
+#include <asm/dwarf2.h>
+
+/* The annotation hides the frame from the unwinder and makes it look
+ like a ordinary ebp save/restore. This avoids some special cases for
+ frame pointer later */
+#ifdef CONFIG_FRAME_POINTER
+ .macro FRAME
+ pushl %ebp
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ebp,0
+ movl %esp,%ebp
+ .endm
+ .macro ENDFRAME
+ popl %ebp
+ CFI_ADJUST_CFA_OFFSET -4
+ CFI_RESTORE ebp
+ .endm
+#else
+ .macro FRAME
+ .endm
+ .macro ENDFRAME
+ .endm
+#endif
+
+#endif /* __ASSEMBLY__ */
Index: linux/include/asm-i386/frame.i
===================================================================
--- linux.orig/include/asm-i386/frame.i
+++ /dev/null
@@ -1,23 +0,0 @@
-#include <asm/dwarf2.h>
-
-/* The annotation hides the frame from the unwinder and makes it look
- like a ordinary ebp save/restore. This avoids some special cases for
- frame pointer later */
-#ifdef CONFIG_FRAME_POINTER
- .macro FRAME
- pushl %ebp
- CFI_ADJUST_CFA_OFFSET 4
- CFI_REL_OFFSET ebp,0
- movl %esp,%ebp
- .endm
- .macro ENDFRAME
- popl %ebp
- CFI_ADJUST_CFA_OFFSET -4
- CFI_RESTORE ebp
- .endm
-#else
- .macro FRAME
- .endm
- .macro ENDFRAME
- .endm
-#endif
Index: linux/include/asm-um/alternative-asm.h
===================================================================
--- /dev/null
+++ linux/include/asm-um/alternative-asm.h
@@ -0,0 +1,6 @@
+#ifndef __UM_ALTERNATIVE_ASM_I
+#define __UM_ALTERNATIVE_ASM_I
+
+#include "asm/arch/alternative-asm.h"
+
+#endif
Index: linux/include/asm-um/alternative-asm.i
===================================================================
--- linux.orig/include/asm-um/alternative-asm.i
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef __UM_ALTERNATIVE_ASM_I
-#define __UM_ALTERNATIVE_ASM_I
-
-#include "asm/arch/alternative-asm.i"
-
-#endif
Index: linux/include/asm-um/frame.h
===================================================================
--- /dev/null
+++ linux/include/asm-um/frame.h
@@ -0,0 +1,6 @@
+#ifndef __UM_FRAME_I
+#define __UM_FRAME_I
+
+#include "asm/arch/frame.h"
+
+#endif
Index: linux/include/asm-um/frame.i
===================================================================
--- linux.orig/include/asm-um/frame.i
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef __UM_FRAME_I
-#define __UM_FRAME_I
-
-#include "asm/arch/frame.i"
-
-#endif
Index: linux/include/asm-x86_64/alternative-asm.h
===================================================================
--- /dev/null
+++ linux/include/asm-x86_64/alternative-asm.h
@@ -0,0 +1,16 @@
+#ifdef __ASSEMBLY__
+
+#ifdef CONFIG_SMP
+ .macro LOCK_PREFIX
+1: lock
+ .section .smp_locks,"a"
+ .align 8
+ .quad 1b
+ .previous
+ .endm
+#else
+ .macro LOCK_PREFIX
+ .endm
+#endif
+
+#endif /* __ASSEMBLY__ */
Index: linux/include/asm-x86_64/alternative-asm.i
===================================================================
--- linux.orig/include/asm-x86_64/alternative-asm.i
+++ /dev/null
@@ -1,12 +0,0 @@
-#ifdef CONFIG_SMP
- .macro LOCK_PREFIX
-1: lock
- .section .smp_locks,"a"
- .align 8
- .quad 1b
- .previous
- .endm
-#else
- .macro LOCK_PREFIX
- .endm
-#endif

2007-09-21 22:45:39

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [34/50] i386: Fix argument signedness warnings


From: Satyam Sharma <[email protected]>


These build warnings:

In file included from include/asm/thread_info.h:16,
from include/linux/thread_info.h:21,
from include/linux/preempt.h:9,
from include/linux/spinlock.h:49,
from include/linux/vmalloc.h:4,
from arch/i386/boot/compressed/misc.c:14:
include/asm/processor.h: In function 'cpuid_count':
include/asm/processor.h:615: warning: pointer targets in passing argument 1 of 'native_cpuid' differ in signedness
include/asm/processor.h:615: warning: pointer targets in passing argument 2 of 'native_cpuid' differ in signedness
include/asm/processor.h:615: warning: pointer targets in passing argument 3 of 'native_cpuid' differ in signedness
include/asm/processor.h:615: warning: pointer targets in passing argument 4 of 'native_cpuid' differ in signedness

come because the arguments have been specified as pointers to (signed) int
types, not unsigned. So let's specify those as unsigned. Also do some coding
style cleanup here and there while at it.

Signed-off-by: Satyam Sharma <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---

include/asm-i386/processor.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -595,7 +595,9 @@ static inline void load_esp0(struct tss_
* clear %ecx since some cpus (Cyrix MII) do not set or clear %ecx
* resulting in stale register contents being returned.
*/
-static inline void cpuid(unsigned int op, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx)
+static inline void cpuid(unsigned int op,
+ unsigned int *eax, unsigned int *ebx,
+ unsigned int *ecx, unsigned int *edx)
{
*eax = op;
*ecx = 0;
@@ -603,8 +605,9 @@ static inline void cpuid(unsigned int op
}

/* Some CPUID calls want 'count' to be placed in ecx */
-static inline void cpuid_count(int op, int count, int *eax, int *ebx, int *ecx,
- int *edx)
+static inline void cpuid_count(unsigned int op, int count,
+ unsigned int *eax, unsigned int *ebx,
+ unsigned int *ecx, unsigned int *edx)
{
*eax = op;
*ecx = count;
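The fixed prototypes can be illustrated with a small userspace sketch. Note this is not the kernel code: `native_cpuid_fake()` is a hypothetical stand-in for the CPUID instruction, used only to show that with all four output pointers declared `unsigned int *` the call chain is signedness-consistent and compiles cleanly with `-Wpointer-sign`.

```c
#include <assert.h>

/* Stand-in for the CPUID instruction: echoes leaf and count so the
 * wrapper below is testable without real hardware access. */
static void native_cpuid_fake(unsigned int *eax, unsigned int *ebx,
                              unsigned int *ecx, unsigned int *edx)
{
    *ebx = *eax + 1;
    *edx = *ecx + 1;
}

/* Same shape as the fixed cpuid_count(): all outputs unsigned int *,
 * matching the callee, so no pointer-signedness warnings. */
static void cpuid_count_sketch(unsigned int op, int count,
                               unsigned int *eax, unsigned int *ebx,
                               unsigned int *ecx, unsigned int *edx)
{
    *eax = op;
    *ecx = count;
    native_cpuid_fake(eax, ebx, ecx, edx);
}
```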

2007-09-21 22:46:00

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [36/50] x86: Use raw locks during oopses


We don't want lockdep or other fragile machinery to run during oopses,
so use raw spinlocks directly for oops locking.
This also disables irq flag tracing there.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/i386/kernel/traps.c | 12 +++++++-----
arch/x86_64/kernel/traps.c | 17 ++++++++---------
2 files changed, 15 insertions(+), 14 deletions(-)

Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -447,11 +447,11 @@ void ler_enable(void) {
void die(const char * str, struct pt_regs * regs, long err)
{
static struct {
- spinlock_t lock;
+ raw_spinlock_t lock;
u32 lock_owner;
int lock_owner_depth;
} die = {
- .lock = __SPIN_LOCK_UNLOCKED(die.lock),
+ .lock = __RAW_SPIN_LOCK_UNLOCKED,
.lock_owner = -1,
.lock_owner_depth = 0
};
@@ -462,13 +462,14 @@ void die(const char * str, struct pt_reg

if (die.lock_owner != raw_smp_processor_id()) {
console_verbose();
- spin_lock_irqsave(&die.lock, flags);
+ __raw_spin_lock(&die.lock);
+ raw_local_save_flags(flags);
die.lock_owner = smp_processor_id();
die.lock_owner_depth = 0;
bust_spinlocks(1);
}
else
- local_save_flags(flags);
+ raw_local_save_flags(flags);

if (++die.lock_owner_depth < 3) {
unsigned long esp;
@@ -511,7 +512,8 @@ void die(const char * str, struct pt_reg
bust_spinlocks(0);
die.lock_owner = -1;
add_taint(TAINT_DIE);
- spin_unlock_irqrestore(&die.lock, flags);
+ __raw_spin_unlock(&die.lock);
+ raw_local_irq_restore(flags);

if (!regs)
return;
Index: linux/arch/x86_64/kernel/traps.c
===================================================================
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -556,7 +556,7 @@ void ler_enable(void) {
}
}

-static DEFINE_SPINLOCK(die_lock);
+static raw_spinlock_t die_lock = __RAW_SPIN_LOCK_UNLOCKED;
static int die_owner = -1;
static unsigned int die_nest_count;

@@ -568,13 +568,13 @@ unsigned __kprobes long oops_begin(void)
oops_enter();

/* racy, but better than risking deadlock. */
- local_irq_save(flags);
+ raw_local_irq_save(flags);
cpu = smp_processor_id();
- if (!spin_trylock(&die_lock)) {
+ if (!__raw_spin_trylock(&die_lock)) {
if (cpu == die_owner)
/* nested oops. should stop eventually */;
else
- spin_lock(&die_lock);
+ __raw_spin_lock(&die_lock);
}
die_nest_count++;
die_owner = cpu;
@@ -588,12 +588,11 @@ void __kprobes oops_end(unsigned long fl
die_owner = -1;
bust_spinlocks(0);
die_nest_count--;
- if (die_nest_count)
- /* We still own the lock */
- local_irq_restore(flags);
- else
+ if (!die_nest_count) {
/* Nest count reaches zero, release the lock. */
- spin_unlock_irqrestore(&die_lock, flags);
+ __raw_spin_unlock(&die_lock);
+ }
+ raw_local_irq_restore(flags);
if (!regs) {
oops_exit();
return;
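The nesting logic that oops_begin()/oops_end() preserve can be sketched in userspace C. This is an illustrative single-threaded model, not the kernel code: `atomic_flag` stands in for the raw spinlock, `cpu` for smp_processor_id(), and cross-CPU contention (where the real code spins) is assumed away.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag die_lock = ATOMIC_FLAG_INIT;
static int die_owner = -1;
static int die_nest_count;

static void oops_begin_sketch(int cpu)
{
    /* test-and-set models __raw_spin_trylock(); a nested oops on the
     * owning CPU falls through instead of deadlocking on itself */
    if (atomic_flag_test_and_set(&die_lock) && cpu != die_owner) {
        /* real code does __raw_spin_lock() here; sketch assumes no
         * contention from other CPUs */
    }
    die_nest_count++;
    die_owner = cpu;
}

static int oops_end_sketch(void)        /* returns 1 if lock released */
{
    die_owner = -1;
    if (--die_nest_count)
        return 0;                       /* still nested: keep the lock */
    atomic_flag_clear(&die_lock);       /* nest count hit zero: release */
    return 1;
}
```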

2007-09-21 22:46:29

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [37/50] x86_64: Clean up mce= argument parsing slightly


Move the = into the __setup line.
Document the option in kernel-parameters.txt by adding a pointer
to the x86-64 specific documentation.

Pointed out by Robert Day.

Signed-off-by: Andi Kleen <[email protected]>

---
Documentation/kernel-parameters.txt | 2 ++
arch/x86_64/kernel/mce.c | 4 +---
2 files changed, 3 insertions(+), 3 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===================================================================
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -970,6 +970,8 @@ and is between 256 and 4096 characters.

mce [X86-32] Machine Check Exception

+ mce=option [X86-64] See Documentation/x86-64/boot-options.txt
+
md= [HW] RAID subsystems devices and level
See Documentation/md.txt.

Index: linux/arch/x86_64/kernel/mce.c
===================================================================
--- linux.orig/arch/x86_64/kernel/mce.c
+++ linux/arch/x86_64/kernel/mce.c
@@ -699,8 +699,6 @@ static int __init mcheck_disable(char *s
mce=nobootlog Don't log MCEs from before booting. */
static int __init mcheck_enable(char *str)
{
- if (*str == '=')
- str++;
if (!strcmp(str, "off"))
mce_dont_init = 1;
else if (!strcmp(str, "bootlog") || !strcmp(str,"nobootlog"))
@@ -713,7 +711,7 @@ static int __init mcheck_enable(char *st
}

__setup("nomce", mcheck_disable);
-__setup("mce", mcheck_enable);
+__setup("mce=", mcheck_enable);

/*
* Sysfs support
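The parsing change can be sketched in userspace C. With `__setup("mce=", ...)` the kernel strips the "mce=" prefix before invoking the handler, so the handler no longer needs to skip a leading '='. The sketch below mirrors only the options visible in the hunk; the flag variables are illustrative stand-ins, not the kernel's.

```c
#include <assert.h>
#include <string.h>

static int mce_dont_init, mce_bootlog;

/* Handler receives the text after "mce=", already stripped of the '='.
 * Returns 1 if the option was recognized, 0 otherwise. */
static int mcheck_enable_sketch(const char *str)
{
    if (!strcmp(str, "off"))
        mce_dont_init = 1;
    else if (!strcmp(str, "bootlog") || !strcmp(str, "nobootlog"))
        mce_bootlog = (str[0] == 'b');
    else
        return 0;
    return 1;
}
```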

2007-09-21 22:46:55

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [38/50] x86_64: fix off-by-one in find_next_zero_string


From: Andrew Hastings <[email protected]>
Fix an off-by-one error in find_next_zero_string which prevents
allocating the last bit.

Signed-off-by: Andrew Hastings <[email protected]> on behalf of Cray Inc.
Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/lib/bitstr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86_64/lib/bitstr.c
===================================================================
--- linux.orig/arch/x86_64/lib/bitstr.c
+++ linux/arch/x86_64/lib/bitstr.c
@@ -14,7 +14,7 @@ find_next_zero_string(unsigned long *bit

/* could test bitsliced, but it's hardly worth it */
end = n+len;
- if (end >= nbits)
+ if (end > nbits)
return -1;
for (i = n+1; i < end; i++) {
if (test_bit(i, bitmap)) {
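The off-by-one is easiest to see in isolation. A run of `len` bits starting at bit `n` occupies bits `n .. n+len-1`, so `end = n + len` is the first bit *past* the run; the run fits whenever `end <= nbits`. The old test rejected `end == nbits`, i.e. any run touching the last bit. A minimal userspace sketch of just the bounds check (hypothetical helper, not the kernel function):

```c
#include <assert.h>

/* Returns 1 if a run of 'len' bits starting at 'n' fits inside a
 * bitmap of 'nbits' bits.  The fixed test is end > nbits -> reject;
 * the buggy test was end >= nbits, losing the last bit. */
static int run_fits(unsigned long n, unsigned long len, unsigned long nbits)
{
    unsigned long end = n + len;
    return end <= nbits;
}
```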

2007-09-21 22:47:28

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [39/50] i386: fix 4 bit apicid assumption of mach-default


From: "Siddha, Suresh B" <[email protected]>

Fix get_apic_id() in mach-default so that it uses 8 bits in the xAPIC case
and 4 bits in the legacy APIC case.

This fixes the i386 kernel assumption that apic id is less than 16 for xAPIC
platforms with 8 cpus or less and makes the kernel boot on such platforms.

Signed-off-by: Suresh Siddha <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

include/asm-i386/mach-default/mach_apicdef.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

Index: linux/include/asm-i386/mach-default/mach_apicdef.h
===================================================================
--- linux.orig/include/asm-i386/mach-default/mach_apicdef.h
+++ linux/include/asm-i386/mach-default/mach_apicdef.h
@@ -1,11 +1,17 @@
#ifndef __ASM_MACH_APICDEF_H
#define __ASM_MACH_APICDEF_H

+#include <asm/apic.h>
+
#define APIC_ID_MASK (0xF<<24)

static inline unsigned get_apic_id(unsigned long x)
{
- return (((x)>>24)&0xF);
+ unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
+ if (APIC_XAPIC(ver))
+ return (((x)>>24)&0xFF);
+ else
+ return (((x)>>24)&0xF);
}

#define GET_APIC_ID(x) get_apic_id(x)
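The masking change is small enough to model directly: the APIC ID lives in bits 24-31 of the APIC ID register, and an xAPIC uses all 8 of those bits while a legacy APIC uses only 4. A userspace sketch (hypothetical helper that takes the xAPIC check as a parameter instead of reading the version register):

```c
#include <assert.h>

static unsigned get_apic_id_sketch(unsigned long x, int is_xapic)
{
    return is_xapic ? ((x >> 24) & 0xFF)    /* xAPIC: full 8 bits */
                    : ((x >> 24) & 0xF);    /* legacy: only 4 bits */
}
```

With the old unconditional 4-bit mask, an xAPIC ID like 0x1A would be silently truncated to 0xA, which is exactly the boot failure this patch fixes.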

2007-09-21 22:47:45

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [35/50] i386: Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.


From: Akinobu Mita <[email protected]>

Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.

Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Akinobu Mita <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Gautham R Shenoy <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/i386/kernel/cpuid.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)

Index: linux/arch/i386/kernel/cpuid.c
===================================================================
--- linux.orig/arch/i386/kernel/cpuid.c
+++ linux/arch/i386/kernel/cpuid.c
@@ -136,15 +136,18 @@ static const struct file_operations cpui
.open = cpuid_open,
};

-static int __cpuinit cpuid_device_create(int i)
+static int cpuid_device_create(int cpu)
{
- int err = 0;
struct device *dev;

- dev = device_create(cpuid_class, NULL, MKDEV(CPUID_MAJOR, i), "cpu%d",i);
- if (IS_ERR(dev))
- err = PTR_ERR(dev);
- return err;
+ dev = device_create(cpuid_class, NULL, MKDEV(CPUID_MAJOR, cpu),
+ "cpu%d", cpu);
+ return IS_ERR(dev) ? PTR_ERR(dev) : 0;
+}
+
+static void cpuid_device_destroy(int cpu)
+{
+ device_destroy(cpuid_class, MKDEV(CPUID_MAJOR, cpu));
}

static int __cpuinit cpuid_class_cpu_callback(struct notifier_block *nfb,
@@ -152,18 +155,21 @@ static int __cpuinit cpuid_class_cpu_cal
void *hcpu)
{
unsigned int cpu = (unsigned long)hcpu;
+ int err = 0;

switch (action) {
- case CPU_ONLINE:
- case CPU_ONLINE_FROZEN:
- cpuid_device_create(cpu);
+ case CPU_UP_PREPARE:
+ case CPU_UP_PREPARE_FROZEN:
+ err = cpuid_device_create(cpu);
break;
+ case CPU_UP_CANCELED:
+ case CPU_UP_CANCELED_FROZEN:
case CPU_DEAD:
case CPU_DEAD_FROZEN:
- device_destroy(cpuid_class, MKDEV(CPUID_MAJOR, cpu));
+ cpuid_device_destroy(cpu);
break;
}
- return NOTIFY_OK;
+ return err ? NOTIFY_BAD : NOTIFY_OK;
}

static struct notifier_block __cpuinitdata cpuid_class_cpu_notifier =
@@ -200,7 +206,7 @@ static int __init cpuid_init(void)
out_class:
i = 0;
for_each_online_cpu(i) {
- device_destroy(cpuid_class, MKDEV(CPUID_MAJOR, i));
+ cpuid_device_destroy(i);
}
class_destroy(cpuid_class);
out_chrdev:
@@ -214,7 +220,7 @@ static void __exit cpuid_exit(void)
int cpu = 0;

for_each_online_cpu(cpu)
- device_destroy(cpuid_class, MKDEV(CPUID_MAJOR, cpu));
+ cpuid_device_destroy(cpu);
class_destroy(cpuid_class);
unregister_chrdev(CPUID_MAJOR, "cpu/cpuid");
unregister_hotcpu_notifier(&cpuid_class_cpu_notifier);
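The point of moving the work to CPU_UP_PREPARE is that a failure there can still veto the CPU bring-up by returning NOTIFY_BAD, whereas CPU_ONLINE runs too late to fail cleanly. A userspace sketch of the callback shape (the enum constants and the failure hook are illustrative stand-ins, not the kernel's values):

```c
#include <assert.h>

enum { CPU_UP_PREPARE, CPU_UP_CANCELED, CPU_DEAD };
enum { NOTIFY_OK, NOTIFY_BAD };

static int create_should_fail;          /* test hook */

static int cpuid_device_create_sketch(int cpu)
{
    (void)cpu;
    return create_should_fail ? -12 /* -ENOMEM */ : 0;
}

static int cpuid_callback_sketch(int action, int cpu)
{
    int err = 0;

    switch (action) {
    case CPU_UP_PREPARE:
        err = cpuid_device_create_sketch(cpu);
        break;
    case CPU_UP_CANCELED:
    case CPU_DEAD:
        /* cpuid_device_destroy(cpu) in the real code */
        break;
    }
    /* an error at UP_PREPARE time propagates and cancels the bring-up */
    return err ? NOTIFY_BAD : NOTIFY_OK;
}
```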

2007-09-21 22:48:14

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [40/50] i386: Fix section mismatch


From: Satyam Sharma <[email protected]>

Fix bugzilla #8679

WARNING: arch/i386/kernel/built-in.o(.data+0x2148): Section mismatch: reference
to .init.text: (between 'thermal_throttle_cpu_notifier' and 'mtrr_mutex')

comes because struct notifier_block thermal_throttle_cpu_notifier in
arch/i386/kernel/cpu/mcheck/therm_throt.c goes in .data section but the
notifier callback function itself has been marked __cpuinit which becomes
__init == .init.text when HOTPLUG_CPU=n. The warning is bogus because the
callback will never be called if HOTPLUG_CPU=n in the first place (as
one can see from kernel/cpu.c, the cpu_chain itself is __cpuinitdata :-)

So, let's mark thermal_throttle_cpu_notifier as __cpuinitdata to fix
the section mismatch warning.

Signed-off-by: Satyam Sharma <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---

arch/i386/kernel/cpu/mcheck/therm_throt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/i386/kernel/cpu/mcheck/therm_throt.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/mcheck/therm_throt.c
+++ linux/arch/i386/kernel/cpu/mcheck/therm_throt.c
@@ -152,7 +152,7 @@ static __cpuinit int thermal_throttle_cp
return NOTIFY_OK;
}

-static struct notifier_block thermal_throttle_cpu_notifier =
+static struct notifier_block thermal_throttle_cpu_notifier __cpuinitdata =
{
.notifier_call = thermal_throttle_cpu_callback,
};

2007-09-21 22:48:38

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [41/50] i386: fix section mismatch warning in intel.c


From: Sam Ravnborg <[email protected]>
Fix the following section mismatch warning:
WARNING: vmlinux.o(.text+0xc88c): Section mismatch: reference to .init.text:trap_init_f00f_bug (between 'init_intel' and 'cpuid4_cache_lookup')

init_intel() is __cpuinit whereas trap_init_f00f_bug() is __init.
Fixed by declaring trap_init_f00f_bug() __cpuinit.

Moved the definition of trap_init_f00f_bug() to its sole user in intel.c
so the ugly prototype in intel.c could get killed.

Frank van Maarseveen <[email protected]> supplied the .config used
to reproduce the warning.

Cc: Frank van Maarseveen <[email protected]>
Signed-off-by: Sam Ravnborg <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---
arch/i386/kernel/cpu/intel.c | 17 +++++++++++++++--
arch/i386/kernel/traps.c | 14 --------------
2 files changed, 15 insertions(+), 16 deletions(-)

Index: linux/arch/i386/kernel/cpu/intel.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/intel.c
+++ linux/arch/i386/kernel/cpu/intel.c
@@ -8,6 +8,7 @@
#include <linux/module.h>

#include <asm/processor.h>
+#include <asm/pgtable.h>
#include <asm/msr.h>
#include <asm/uaccess.h>

@@ -19,8 +20,6 @@
#include <mach_apic.h>
#endif

-extern int trap_init_f00f_bug(void);
-
#ifdef CONFIG_X86_INTEL_USERCOPY
/*
* Alignment at which movsl is preferred for bulk memory copies.
@@ -95,6 +94,20 @@ static int __cpuinit num_cpu_cores(struc
return 1;
}

+#ifdef CONFIG_X86_F00F_BUG
+static void __cpuinit trap_init_f00f_bug(void)
+{
+ __set_fixmap(FIX_F00F_IDT, __pa(&idt_table), PAGE_KERNEL_RO);
+
+ /*
+ * Update the IDT descriptor and reload the IDT so that
+ * it uses the read-only mapped virtual address.
+ */
+ idt_descr.address = fix_to_virt(FIX_F00F_IDT);
+ load_idt(&idt_descr);
+}
+#endif
+
static void __cpuinit init_intel(struct cpuinfo_x86 *c)
{
unsigned int l2 = 0;
Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -1180,20 +1180,6 @@ asmlinkage void math_emulate(long arg)

#endif /* CONFIG_MATH_EMULATION */

-#ifdef CONFIG_X86_F00F_BUG
-void __init trap_init_f00f_bug(void)
-{
- __set_fixmap(FIX_F00F_IDT, __pa(&idt_table), PAGE_KERNEL_RO);
-
- /*
- * Update the IDT descriptor and reload the IDT so that
- * it uses the read-only mapped virtual address.
- */
- idt_descr.address = fix_to_virt(FIX_F00F_IDT);
- load_idt(&idt_descr);
-}
-#endif
-
/*
* This needs to use 'idt_table' rather than 'idt', and
* thus use the _nonmapped_ version of the IDT, as the

2007-09-21 22:48:55

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [42/50] i386: constify wd_ops


From: "Jan Beulich" <[email protected]>
.. as they're, with a single exception, never written to.

Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

arch/i386/kernel/cpu/perfctr-watchdog.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

Index: linux/arch/i386/kernel/cpu/perfctr-watchdog.c
===================================================================
--- linux.orig/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ linux/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -34,7 +34,7 @@ struct wd_ops {
u64 checkbit;
};

-static struct wd_ops *wd_ops;
+static const struct wd_ops *wd_ops;

/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
* offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now)
@@ -325,7 +325,7 @@ static void single_msr_rearm(struct nmi_
write_watchdog_counter(wd->perfctr_msr, NULL, nmi_hz);
}

-static struct wd_ops k7_wd_ops = {
+static const struct wd_ops k7_wd_ops = {
.reserve = single_msr_reserve,
.unreserve = single_msr_unreserve,
.setup = setup_k7_watchdog,
@@ -388,7 +388,7 @@ static void p6_rearm(struct nmi_watchdog
write_watchdog_counter32(wd->perfctr_msr, NULL,nmi_hz);
}

-static struct wd_ops p6_wd_ops = {
+static const struct wd_ops p6_wd_ops = {
.reserve = single_msr_reserve,
.unreserve = single_msr_unreserve,
.setup = setup_p6_watchdog,
@@ -540,7 +540,7 @@ static void p4_rearm(struct nmi_watchdog
write_watchdog_counter(wd->perfctr_msr, NULL, nmi_hz);
}

-static struct wd_ops p4_wd_ops = {
+static const struct wd_ops p4_wd_ops = {
.reserve = p4_reserve,
.unreserve = p4_unreserve,
.setup = setup_p4_watchdog,
@@ -558,6 +558,8 @@ static struct wd_ops p4_wd_ops = {
#define ARCH_PERFMON_NMI_EVENT_SEL ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL
#define ARCH_PERFMON_NMI_EVENT_UMASK ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK

+static struct wd_ops intel_arch_wd_ops;
+
static int setup_intel_arch_watchdog(unsigned nmi_hz)
{
unsigned int ebx;
@@ -599,11 +601,11 @@ static int setup_intel_arch_watchdog(uns
wd->perfctr_msr = perfctr_msr;
wd->evntsel_msr = evntsel_msr;
wd->cccr_msr = 0; //unused
- wd_ops->checkbit = 1ULL << (eax.split.bit_width - 1);
+ intel_arch_wd_ops.checkbit = 1ULL << (eax.split.bit_width - 1);
return 1;
}

-static struct wd_ops intel_arch_wd_ops = {
+static struct wd_ops intel_arch_wd_ops __read_mostly = {
.reserve = single_msr_reserve,
.unreserve = single_msr_unreserve,
.setup = setup_intel_arch_watchdog,

2007-09-21 22:49:30

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [43/50] x86: multi-byte single instruction NOPs


From: "Jan Beulich" <[email protected]>
Add support for and use the multi-byte NOPs recently documented to be
available on all PentiumPro and later processors.

This patch only applies cleanly on top of the "x86: misc.
constifications" patch sent earlier.

Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

arch/i386/kernel/alternative.c | 23 ++++++++++++++++++++++-
include/asm-i386/processor.h | 22 ++++++++++++++++++++++
include/asm-x86_64/processor.h | 22 ++++++++++++++++++++++
3 files changed, 66 insertions(+), 1 deletion(-)

Index: linux/arch/i386/kernel/alternative.c
===================================================================
--- linux.orig/arch/i386/kernel/alternative.c
+++ linux/arch/i386/kernel/alternative.c
@@ -115,12 +115,31 @@ static const unsigned char *const k7_nop
};
#endif

+#ifdef P6_NOP1
+asm("\t.section .rodata, \"a\"\np6nops: "
+ P6_NOP1 P6_NOP2 P6_NOP3 P6_NOP4 P6_NOP5 P6_NOP6
+ P6_NOP7 P6_NOP8);
+extern const unsigned char p6nops[];
+static const unsigned char *const p6_nops[ASM_NOP_MAX+1] = {
+ NULL,
+ p6nops,
+ p6nops + 1,
+ p6nops + 1 + 2,
+ p6nops + 1 + 2 + 3,
+ p6nops + 1 + 2 + 3 + 4,
+ p6nops + 1 + 2 + 3 + 4 + 5,
+ p6nops + 1 + 2 + 3 + 4 + 5 + 6,
+ p6nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
+};
+#endif
+
#ifdef CONFIG_X86_64

extern char __vsyscall_0;
static inline const unsigned char*const * find_nop_table(void)
{
- return k8_nops;
+ return boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+ boot_cpu_data.x86 < 6 ? k8_nops : p6_nops;
}

#else /* CONFIG_X86_64 */
@@ -131,6 +150,8 @@ static const struct nop {
} noptypes[] = {
{ X86_FEATURE_K8, k8_nops },
{ X86_FEATURE_K7, k7_nops },
+ { X86_FEATURE_P4, p6_nops },
+ { X86_FEATURE_P3, p6_nops },
{ -1, NULL }
};

Index: linux/include/asm-i386/processor.h
===================================================================
--- linux.orig/include/asm-i386/processor.h
+++ linux/include/asm-i386/processor.h
@@ -681,6 +681,17 @@ void ler_enable(void);
#define K7_NOP7 ".byte 0x8D,0x04,0x05,0,0,0,0\n"
#define K7_NOP8 K7_NOP7 ASM_NOP1

+/* P6 nops */
+/* uses eax dependencies (Intel-recommended choice) */
+#define P6_NOP1 GENERIC_NOP1
+#define P6_NOP2 ".byte 0x66,0x90\n"
+#define P6_NOP3 ".byte 0x0f,0x1f,0x00\n"
+#define P6_NOP4 ".byte 0x0f,0x1f,0x40,0\n"
+#define P6_NOP5 ".byte 0x0f,0x1f,0x44,0x00,0\n"
+#define P6_NOP6 ".byte 0x66,0x0f,0x1f,0x44,0x00,0\n"
+#define P6_NOP7 ".byte 0x0f,0x1f,0x80,0,0,0,0\n"
+#define P6_NOP8 ".byte 0x0f,0x1f,0x84,0x00,0,0,0,0\n"
+
#ifdef CONFIG_MK8
#define ASM_NOP1 K8_NOP1
#define ASM_NOP2 K8_NOP2
@@ -699,6 +710,17 @@ void ler_enable(void);
#define ASM_NOP6 K7_NOP6
#define ASM_NOP7 K7_NOP7
#define ASM_NOP8 K7_NOP8
+#elif defined(CONFIG_M686) || defined(CONFIG_MPENTIUMII) || \
+ defined(CONFIG_MPENTIUMIII) || defined(CONFIG_MPENTIUMM) || \
+ defined(CONFIG_MCORE2) || defined(CONFIG_PENTIUM4)
+#define ASM_NOP1 P6_NOP1
+#define ASM_NOP2 P6_NOP2
+#define ASM_NOP3 P6_NOP3
+#define ASM_NOP4 P6_NOP4
+#define ASM_NOP5 P6_NOP5
+#define ASM_NOP6 P6_NOP6
+#define ASM_NOP7 P6_NOP7
+#define ASM_NOP8 P6_NOP8
#else
#define ASM_NOP1 GENERIC_NOP1
#define ASM_NOP2 GENERIC_NOP2
Index: linux/include/asm-x86_64/processor.h
===================================================================
--- linux.orig/include/asm-x86_64/processor.h
+++ linux/include/asm-x86_64/processor.h
@@ -338,6 +338,16 @@ DECLARE_PER_CPU(u32, ler_msr);
extern int ler_enabled;
void ler_enable(void);

+#if defined(CONFIG_MPSC) || defined(CONFIG_MCORE2)
+#define ASM_NOP1 P6_NOP1
+#define ASM_NOP2 P6_NOP2
+#define ASM_NOP3 P6_NOP3
+#define ASM_NOP4 P6_NOP4
+#define ASM_NOP5 P6_NOP5
+#define ASM_NOP6 P6_NOP6
+#define ASM_NOP7 P6_NOP7
+#define ASM_NOP8 P6_NOP8
+#else
#define ASM_NOP1 K8_NOP1
#define ASM_NOP2 K8_NOP2
#define ASM_NOP3 K8_NOP3
@@ -346,6 +356,7 @@ void ler_enable(void);
#define ASM_NOP6 K8_NOP6
#define ASM_NOP7 K8_NOP7
#define ASM_NOP8 K8_NOP8
+#endif

/* Opteron nops */
#define K8_NOP1 ".byte 0x90\n"
@@ -357,6 +368,17 @@ void ler_enable(void);
#define K8_NOP7 K8_NOP4 K8_NOP3
#define K8_NOP8 K8_NOP4 K8_NOP4

+/* P6 nops */
+/* uses eax dependencies (Intel-recommended choice) */
+#define P6_NOP1 ".byte 0x90\n"
+#define P6_NOP2 ".byte 0x66,0x90\n"
+#define P6_NOP3 ".byte 0x0f,0x1f,0x00\n"
+#define P6_NOP4 ".byte 0x0f,0x1f,0x40,0\n"
+#define P6_NOP5 ".byte 0x0f,0x1f,0x44,0x00,0\n"
+#define P6_NOP6 ".byte 0x66,0x0f,0x1f,0x44,0x00,0\n"
+#define P6_NOP7 ".byte 0x0f,0x1f,0x80,0,0,0,0\n"
+#define P6_NOP8 ".byte 0x0f,0x1f,0x84,0x00,0,0,0,0\n"
+
#define ASM_NOP_MAX 8

/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
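The p6_nops table construction above relies on the NOPs of length 1..8 being laid out back to back in one byte array, with entry [len] pointing at offset 1+2+...+(len-1). A userspace sketch using the same encodings as the P6_NOP* defines (the helper name is illustrative):

```c
#include <assert.h>

/* Byte encodings from the P6_NOP1..P6_NOP8 defines, concatenated. */
static const unsigned char p6nops[] = {
    0x90,                                       /* 1-byte */
    0x66,0x90,                                  /* 2-byte */
    0x0f,0x1f,0x00,                             /* 3-byte */
    0x0f,0x1f,0x40,0x00,                        /* 4-byte */
    0x0f,0x1f,0x44,0x00,0x00,                   /* 5-byte */
    0x66,0x0f,0x1f,0x44,0x00,0x00,              /* 6-byte */
    0x0f,0x1f,0x80,0x00,0x00,0x00,0x00,         /* 7-byte */
    0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00,    /* 8-byte */
};

/* Start of the len-byte NOP: offset 1+2+...+(len-1) = len*(len-1)/2. */
static const unsigned char *p6_nop(int len)
{
    return p6nops + len * (len - 1) / 2;
}
```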

2007-09-21 22:49:51

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [44/50] i386: Introduce "used_vectors" bitmap which can be used to reserve vectors.


From: Rusty Russell <[email protected]>

This simplifies the io_apic.c __assign_irq_vector() logic and removes
the explicit SYSCALL_VECTOR check, and also allows for vectors to be
reserved by other mechanisms (e.g. lguest).

Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---
arch/i386/kernel/i8259.c | 3 ++-
arch/i386/kernel/io_apic.c | 13 ++++++++-----
arch/i386/kernel/traps.c | 10 ++++++++++
include/asm-i386/irq.h | 3 +++
4 files changed, 23 insertions(+), 6 deletions(-)

Index: linux/arch/i386/kernel/i8259.c
===================================================================
--- linux.orig/arch/i386/kernel/i8259.c
+++ linux/arch/i386/kernel/i8259.c
@@ -400,7 +400,8 @@ void __init native_init_IRQ(void)
int vector = FIRST_EXTERNAL_VECTOR + i;
if (i >= NR_IRQS)
break;
- if (vector != SYSCALL_VECTOR)
+ /* SYSCALL_VECTOR was reserved in trap_init. */
+ if (!test_bit(vector, used_vectors))
set_intr_gate(vector, interrupt[i]);
}

Index: linux/arch/i386/kernel/io_apic.c
===================================================================
--- linux.orig/arch/i386/kernel/io_apic.c
+++ linux/arch/i386/kernel/io_apic.c
@@ -1198,7 +1198,7 @@ static u8 irq_vector[NR_IRQ_VECTORS] __r
static int __assign_irq_vector(int irq)
{
static int current_vector = FIRST_DEVICE_VECTOR, current_offset = 0;
- int vector, offset, i;
+ int vector, offset;

BUG_ON((unsigned)irq >= NR_IRQ_VECTORS);

@@ -1215,11 +1215,8 @@ next:
}
if (vector == current_vector)
return -ENOSPC;
- if (vector == SYSCALL_VECTOR)
+ if (test_and_set_bit(vector, used_vectors))
goto next;
- for (i = 0; i < NR_IRQ_VECTORS; i++)
- if (irq_vector[i] == vector)
- goto next;

current_vector = vector;
current_offset = offset;
@@ -2290,6 +2287,12 @@ static inline void __init check_timer(vo

void __init setup_IO_APIC(void)
{
+ int i;
+
+ /* Reserve all the system vectors. */
+ for (i = FIRST_SYSTEM_VECTOR; i < NR_VECTORS; i++)
+ set_bit(i, used_vectors);
+
enable_IO_APIC();

if (acpi_ioapic)
Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -65,6 +65,9 @@

int panic_on_unrecovered_nmi;

+DECLARE_BITMAP(used_vectors, NR_VECTORS);
+EXPORT_SYMBOL_GPL(used_vectors);
+
asmlinkage int system_call(void);

/* Do we ignore FPU interrupts ? */
@@ -1217,6 +1220,8 @@ static void __init set_task_gate(unsigne

void __init trap_init(void)
{
+ int i;
+
#ifdef CONFIG_EISA
void __iomem *p = ioremap(0x0FFFD9, 4);
if (readl(p) == 'E'+('I'<<8)+('S'<<16)+('A'<<24)) {
@@ -1276,6 +1281,11 @@ void __init trap_init(void)

set_system_gate(SYSCALL_VECTOR,&system_call);

+ /* Reserve all the builtin and the syscall vector. */
+ for (i = 0; i < FIRST_EXTERNAL_VECTOR; i++)
+ set_bit(i, used_vectors);
+ set_bit(SYSCALL_VECTOR, used_vectors);
+
/*
* Should be a barrier for any external CPU state.
*/
Index: linux/include/asm-i386/irq.h
===================================================================
--- linux.orig/include/asm-i386/irq.h
+++ linux/include/asm-i386/irq.h
@@ -45,4 +45,7 @@ unsigned int do_IRQ(struct pt_regs *regs
void init_IRQ(void);
void __init native_init_IRQ(void);

+/* Interrupt vector management */
+extern DECLARE_BITMAP(used_vectors, NR_VECTORS);
+
#endif /* _ASM_IRQ_H */
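The reservation scheme can be sketched in userspace C: system vectors are pre-set in the bitmap, and allocation uses test-and-set so any taken vector (reserved or already assigned) is skipped in one operation, replacing the old SYSCALL_VECTOR special case and the irq_vector[] scan. The constants and helpers below are illustrative stand-ins for the kernel's bitmap ops, and the simple linear scan replaces the kernel's offset-stepped search.

```c
#include <assert.h>
#include <limits.h>

#define NR_VECTORS 256
#define BITS_PER_LONG (CHAR_BIT * sizeof(long))

static unsigned long used_vectors[NR_VECTORS / BITS_PER_LONG];

/* Set bit v; return its previous value (non-atomic sketch). */
static int test_and_set_vector(int v)
{
    unsigned long mask = 1UL << (v % BITS_PER_LONG);
    unsigned long *w = &used_vectors[v / BITS_PER_LONG];
    int old = (*w & mask) != 0;
    *w |= mask;
    return old;
}

/* Find and claim a free vector at or above 'from', or -1 if none. */
static int assign_vector(int from)
{
    for (int v = from; v < NR_VECTORS; v++)
        if (!test_and_set_vector(v))
            return v;
    return -1;
}
```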

2007-09-21 22:50:21

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [45/50] x86_64: configure HPET_EMULATE_RTC automatically


From: Stefan Richter <[email protected]>
I don't know exactly what this option does...
Andi says it should be automatic rather than exposed as a prompt.

Signed-off-by: Stefan Richter <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---

---
arch/x86_64/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/arch/x86_64/Kconfig
===================================================================
--- linux.orig/arch/x86_64/Kconfig
+++ linux/arch/x86_64/Kconfig
@@ -469,8 +469,9 @@ config HPET_TIMER
<http://www.intel.com/hardwaredesign/hpetspec.htm>.

config HPET_EMULATE_RTC
- bool "Provide RTC interrupt"
+ bool
depends on HPET_TIMER && RTC=y
+ default y

# Mark as embedded because too many people got it wrong.
# The code disables itself when not needed.

2007-09-21 22:50:43

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [46/50] x86: also show non-zero IRQ counts for vectors that currently don't have a handler


From: "Jan Beulich" <[email protected]>
It doesn't seem to make sense to hide these, even if their counts
can't change at the point in time they're being displayed.

Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

arch/i386/kernel/irq.c | 18 ++++++++++++++----
arch/x86_64/kernel/irq.c | 18 ++++++++++++++----
2 files changed, 28 insertions(+), 8 deletions(-)

Index: linux/arch/i386/kernel/irq.c
===================================================================
--- linux.orig/arch/i386/kernel/irq.c
+++ linux/arch/i386/kernel/irq.c
@@ -259,9 +259,17 @@ int show_interrupts(struct seq_file *p,
}

if (i < NR_IRQS) {
+ unsigned any_count = 0;
+
spin_lock_irqsave(&irq_desc[i].lock, flags);
+#ifndef CONFIG_SMP
+ any_count = kstat_irqs(i);
+#else
+ for_each_online_cpu(j)
+ any_count |= kstat_cpu(j).irqs[i];
+#endif
action = irq_desc[i].action;
- if (!action)
+ if (!action && !any_count)
goto skip;
seq_printf(p, "%3d: ",i);
#ifndef CONFIG_SMP
@@ -272,10 +280,12 @@ int show_interrupts(struct seq_file *p,
#endif
seq_printf(p, " %8s", irq_desc[i].chip->name);
seq_printf(p, "-%-8s", irq_desc[i].name);
- seq_printf(p, " %s", action->name);

- for (action=action->next; action; action = action->next)
- seq_printf(p, ", %s", action->name);
+ if (action) {
+ seq_printf(p, " %s", action->name);
+ while ((action = action->next) != NULL)
+ seq_printf(p, ", %s", action->name);
+ }

seq_putc(p, '\n');
skip:
Index: linux/arch/x86_64/kernel/irq.c
===================================================================
--- linux.orig/arch/x86_64/kernel/irq.c
+++ linux/arch/x86_64/kernel/irq.c
@@ -64,9 +64,17 @@ int show_interrupts(struct seq_file *p,
}

if (i < NR_IRQS) {
+ unsigned any_count = 0;
+
spin_lock_irqsave(&irq_desc[i].lock, flags);
+#ifndef CONFIG_SMP
+ any_count = kstat_irqs(i);
+#else
+ for_each_online_cpu(j)
+ any_count |= kstat_cpu(j).irqs[i];
+#endif
action = irq_desc[i].action;
- if (!action)
+ if (!action && !any_count)
goto skip;
seq_printf(p, "%3d: ",i);
#ifndef CONFIG_SMP
@@ -78,9 +86,11 @@ int show_interrupts(struct seq_file *p,
seq_printf(p, " %8s", irq_desc[i].chip->name);
seq_printf(p, "-%-8s", irq_desc[i].name);

- seq_printf(p, " %s", action->name);
- for (action=action->next; action; action = action->next)
- seq_printf(p, ", %s", action->name);
+ if (action) {
+ seq_printf(p, " %s", action->name);
+ while ((action = action->next) != NULL)
+ seq_printf(p, ", %s", action->name);
+ }
seq_putc(p, '\n');
skip:
spin_unlock_irqrestore(&irq_desc[i].lock, flags);
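The any_count test above OR-s the per-CPU counts together, which is a cheap way to ask "is any of them non-zero" without summing; an IRQ line is then shown if it has a handler or a non-zero count. A userspace sketch of just that predicate (hypothetical helper, not the kernel function):

```c
#include <assert.h>

static int should_show(const unsigned counts[], int ncpus, int has_action)
{
    unsigned any_count = 0;

    for (int j = 0; j < ncpus; j++)
        any_count |= counts[j];         /* OR, not sum: only zero-ness matters */
    return has_action || any_count != 0;
}
```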

2007-09-21 22:51:00

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [47/50] i386: avoid temporarily inconsistent pte-s


From: "Jan Beulich" <[email protected]>
One more of these issues (which were considered fixed a few releases
back): Unlike x86-64, i386 allows set_fixmap() to replace
already present mappings. Consequently, on PAE, care must be taken to
not update the high half of a pte while the low half is still holding
the old value.

Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

arch/i386/mm/pgtable.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux/arch/i386/mm/pgtable.c
===================================================================
--- linux.orig/arch/i386/mm/pgtable.c
+++ linux/arch/i386/mm/pgtable.c
@@ -97,8 +97,7 @@ static void set_pte_pfn(unsigned long va
}
pte = pte_offset_kernel(pmd, vaddr);
if (pgprot_val(flags))
- /* <pfn,flags> stored as-is, to permit clearing entries */
- set_pte(pte, pfn_pte(pfn, flags));
+ set_pte_present(&init_mm, vaddr, pte, pfn_pte(pfn, flags));
else
pte_clear(&init_mm, vaddr, pte);

2007-09-21 22:51:23

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [48/50] x86_64: return correct error code from child_rip in x86_64 entry.S


From: Andrey Mirkin <[email protected]>

Right now register edi is just cleared before calling do_exit.
That is wrong because the correct return value will be ignored.
The value from rax should be copied to rdi instead of clearing edi.

AK: changed to 32bit move because it's strictly an int

Signed-off-by: Andrey Mirkin <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/kernel/entry.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86_64/kernel/entry.S
===================================================================
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -989,7 +989,7 @@ child_rip:
movq %rsi, %rdi
call *%rax
# exit
- xorl %edi, %edi
+ mov %eax, %edi
call do_exit
CFI_ENDPROC
ENDPROC(child_rip)

2007-09-21 22:51:41

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [49/50] x86_64: Initialize 64bit registers for a.out executables


Previously the data from before the exec was kept in there. Zero
them instead.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/ia32/ia32_aout.c | 2 ++
1 file changed, 2 insertions(+)

Index: linux/arch/x86_64/ia32/ia32_aout.c
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32_aout.c
+++ linux/arch/x86_64/ia32/ia32_aout.c
@@ -422,6 +422,8 @@ beyond_if:
(regs)->eflags = 0x200;
(regs)->cs = __USER32_CS;
(regs)->ss = __USER32_DS;
+ regs->r8 = regs->r9 = regs->r10 = regs->r11 =
+ regs->r12 = regs->r13 = regs->r14 = regs->r15 = 0;
set_fs(USER_DS);
if (unlikely(current->ptrace & PT_PTRACED)) {
if (current->ptrace & PT_TRACE_EXEC)

2007-09-21 22:52:06

by Andi Kleen

[permalink] [raw]
Subject: [PATCH] [50/50] x86_64: Remove fpu io port resource


Not needed on modern systems without an external FPU.

TBD: on i386 it is only needed for true 386s; it could be removed
there for >= 486.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/kernel/setup.c | 2 --
1 file changed, 2 deletions(-)

Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -121,8 +121,6 @@ struct resource standard_io_resources[]
.flags = IORESOURCE_BUSY | IORESOURCE_IO },
{ .name = "dma2", .start = 0xc0, .end = 0xdf,
.flags = IORESOURCE_BUSY | IORESOURCE_IO },
- { .name = "fpu", .start = 0xf0, .end = 0xff,
- .flags = IORESOURCE_BUSY | IORESOURCE_IO }
};

#define IORESOURCE_RAM (IORESOURCE_BUSY | IORESOURCE_MEM)

2007-09-21 22:56:11

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [PATCH] [6/50] i386: clean up oops/bug reports

On 09/21/2007 06:32 PM, Andi Kleen wrote:
> From: Pavel Emelyanov <[email protected]>
>
> Typically the oops first lines look like this:
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> c049dfbd
> *pde = 00000000
> Oops: 0002 [#1]
> PREEMPT SMP
> ...
>
> Such output is gained with some ugly if (!nl) printk("\n"); code and
> besides being a waste of lines, this is also annoying to read. The
> following output looks better (and it is how it looks on x86_64):
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip: c049dfbd *pde = 00000000
> Oops: 0002 [#1] PREEMPT SMP
> ...
>
> Signed-off-by: Pavel Emelyanov <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>

Reviewed-by: Chuck Ebbert <[email protected]>

>
> ---
>
> arch/i386/kernel/traps.c | 16 ++++------------
> arch/i386/mm/fault.c | 13 +++++++------
> 2 files changed, 11 insertions(+), 18 deletions(-)
>
> Index: linux/arch/i386/kernel/traps.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/traps.c
> +++ linux/arch/i386/kernel/traps.c
> @@ -444,31 +444,23 @@ void die(const char * str, struct pt_reg
> local_save_flags(flags);
>
> if (++die.lock_owner_depth < 3) {
> - int nl = 0;
> unsigned long esp;
> unsigned short ss;
>
> report_bug(regs->eip, regs);
>
> - printk(KERN_EMERG "%s: %04lx [#%d]\n", str, err & 0xffff, ++die_counter);
> + printk(KERN_EMERG "%s: %04lx [#%d] ", str, err & 0xffff, ++die_counter);
> #ifdef CONFIG_PREEMPT
> - printk(KERN_EMERG "PREEMPT ");
> - nl = 1;
> + printk("PREEMPT ");
> #endif
> #ifdef CONFIG_SMP
> - if (!nl)
> - printk(KERN_EMERG);
> printk("SMP ");
> - nl = 1;
> #endif
> #ifdef CONFIG_DEBUG_PAGEALLOC
> - if (!nl)
> - printk(KERN_EMERG);
> printk("DEBUG_PAGEALLOC");
> - nl = 1;
> #endif
> - if (nl)
> - printk("\n");
> + printk("\n");
> +
> if (notify_die(DIE_OOPS, str, regs, err,
> current->thread.trap_no, SIGSEGV) !=
> NOTIFY_STOP) {
> Index: linux/arch/i386/mm/fault.c
> ===================================================================
> --- linux.orig/arch/i386/mm/fault.c
> +++ linux/arch/i386/mm/fault.c
> @@ -544,23 +544,22 @@ no_context:
> printk(KERN_ALERT "BUG: unable to handle kernel paging"
> " request");
> printk(" at virtual address %08lx\n",address);
> - printk(KERN_ALERT " printing eip:\n");
> - printk("%08lx\n", regs->eip);
> + printk(KERN_ALERT "printing eip: %08lx ", regs->eip);
>
> page = read_cr3();
> page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
> #ifdef CONFIG_X86_PAE
> - printk(KERN_ALERT "*pdpt = %016Lx\n", page);
> + printk("*pdpt = %016Lx ", page);
> if ((page >> PAGE_SHIFT) < max_low_pfn
> && page & _PAGE_PRESENT) {
> page &= PAGE_MASK;
> page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
> & (PTRS_PER_PMD - 1)];
> - printk(KERN_ALERT "*pde = %016Lx\n", page);
> + printk(KERN_ALERT "*pde = %016Lx ", page);
> page &= ~_PAGE_NX;
> }
> #else
> - printk(KERN_ALERT "*pde = %08lx\n", page);
> + printk("*pde = %08lx ", page);
> #endif
>
> /*
> @@ -574,8 +573,10 @@ no_context:
> page &= PAGE_MASK;
> page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
> & (PTRS_PER_PTE - 1)];
> - printk(KERN_ALERT "*pte = %0*Lx\n", sizeof(page)*2, (u64)page);
> + printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
> }
> +
> + printk("\n");
> }
>
> tsk->thread.cr2 = address;

2007-09-21 22:57:43

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

On Sat, Sep 22, 2007 at 12:32:02AM +0200, Andi Kleen wrote:


> + Select this for:
> + Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename:
> + -Willamette
> + -Northwood
> + -Mobile Pentium 4
> + -Mobile Pentium 4 M
> + -Extreme Edition (Gallatin)
> + -Prescott
> + -Prescott 2M
> + -Cedar Mill
> + -Presler
> + -Smithfiled
> + Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename:
> + -Foster
> + -Prestonia
> + -Gallatin
> + -Nocona
> + -Irwindale
> + -Cranford
> + -Potomac
> + -Paxville
> + -Dempsey

This seems like yet another list that will need to be perpetually
kept up to date, and given 99% of users don't know the codename
of their core, just the marketing name, I question its value.

> + more info: http://balusc.xs4all.nl/srv/har-cpu.html

This URL is dead already.

> config MPSC
> bool "Intel P4 / older Netburst based Xeon"
> help

sidenote: I always wondered what 'PSC' stood for ?

Dave

--
http://www.codemonkey.org.uk

2007-09-21 23:01:20

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] [50/50] x86_64: Remove fpu io port resource

Andi Kleen wrote:
> Not needed on modern systems without external FPU
>
> TBD on i386 it is only needed for true 386s. Could remove it there
> TBD for >= 486
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> arch/x86_64/kernel/setup.c | 2 --
> 1 file changed, 2 deletions(-)
>
> Index: linux/arch/x86_64/kernel/setup.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/setup.c
> +++ linux/arch/x86_64/kernel/setup.c
> @@ -121,8 +121,6 @@ struct resource standard_io_resources[]
> .flags = IORESOURCE_BUSY | IORESOURCE_IO },
> { .name = "dma2", .start = 0xc0, .end = 0xdf,
> .flags = IORESOURCE_BUSY | IORESOURCE_IO },
> - { .name = "fpu", .start = 0xf0, .end = 0xff,
> - .flags = IORESOURCE_BUSY | IORESOURCE_IO }

Since we are merging x86 and x86-64, I think it would be nice at least
to CC Thomas on patches that increase 32/64-bit differences... because
won't this patch have to be partially un-done when we merge i386 and x86-64?

Jeff


2007-09-21 23:45:48

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

> > config MPSC
> > bool "Intel P4 / older Netburst based Xeon"
> > help
>
> sidenote: I always wondered what 'PSC' stood for ?

Produces Smoke and Cooks ?

2007-09-22 02:35:39

by Oleg Verych

[permalink] [raw]
Subject: Killing printk calls for size (Re: [PATCH] [6/50] i386: clean up oops/bug reports)

* Sat, 22 Sep 2007 00:32:04 +0200 (CEST)
[]
> arch/i386/kernel/traps.c | 16 ++++------------
> arch/i386/mm/fault.c | 13 +++++++------
> 2 files changed, 11 insertions(+), 18 deletions(-)

It seems like size can be reduced even more now:

[]
> report_bug(regs->eip, regs);
>
> - printk(KERN_EMERG "%s: %04lx [#%d]\n", str, err & 0xffff, ++die_counter);
> + printk(KERN_EMERG "%s: %04lx [#%d] ", str, err & 0xffff, ++die_counter);

+ printk(KERN_EMERG "%s: %04lx [#%d] %s", str, err &0xffff, ++die_counter,

> #ifdef CONFIG_PREEMPT
> - printk(KERN_EMERG "PREEMPT ");
> - nl = 1;
> + printk("PREEMPT ");

+ "PREEMPT "\

> #endif
> #ifdef CONFIG_SMP
> - if (!nl)
> - printk(KERN_EMERG);
> printk("SMP ");

"SMP "\

> - nl = 1;
> #endif
> #ifdef CONFIG_DEBUG_PAGEALLOC
> - if (!nl)
> - printk(KERN_EMERG);
> printk("DEBUG_PAGEALLOC");

"DEBUG_PAGEALLOC"\

> - nl = 1;
> #endif
> - if (nl)
> - printk("\n");
> + printk("\n");

+ "\n");

Just hand waving.

FWIW, with more flexible kconfig, ifdeffery can be removed also...
____

2007-09-22 03:19:46

by Oleg Verych

[permalink] [raw]
Subject: possible corrections in the docs (Re: [PATCH] [7/50] x86: expand /proc/interrupts to include missing vectors, v2)

* Sat, 22 Sep 2007 00:32:05 +0200 (CEST)

[]
> Index: linux/Documentation/filesystems/proc.txt
>===================================================================
> --- linux.orig/Documentation/filesystems/proc.txt
> +++ linux/Documentation/filesystems/proc.txt
> @@ -347,7 +347,40 @@ connects the CPUs in a SMP system. This
> the IO-APIC automatically retry the transmission, so it should not be a big
> problem, but you should read the SMP-FAQ.
>
> -In this context it could be interesting to note the new irq directory in 2.4.
> +In 2.6.2* /proc/interrupts was expanded again. This time the goal was for
> +/proc/interrupts to display every IRQ vector in use by the system, not
> +just those considered 'most important'. The new vectors are:
> +
> + THR -- a threshold interrupt occurs when ECC memory correction is occuring
> + at too high a frequency. Threshold interrupt machinery is often put
> + into the ECC logic, as occasional ECC memory corrections are part of
> + normal operation (due to random alpha particles), but sequences of
> + ECC corrections or outright failures over some short interval usually
> + indicate a memory chip that is about to fail. Note that not every
> + platform has ECC threshold logic, and those that do generally require
> + it to be explicitly turned on.

+ THR -- a threshold interrupt happens when the frequency of ECC memory
+ corrections is too high. Threshold interrupt machinery is often put
+ into the ECC hardware, and must be explicitly enabled if so. Occasional
+ ECC memory corrections are part of normal operation (ionizing radiation
+ background). Sequences of ECC corrections or outright failures over some
+ short interval usually indicate a memory chip that is about to fail
+ completely.

(that "random alpha particles" bs must be killed anyway)

> + TRM -- a thermal event interrupt occurs when a temperature threshold
> + has been exceeded for some CPU chip. This interrupt may also be generated
> + when the temperature drops back to normal.
> +
> + SPU -- a spurious interrupt is some interrupt that was raised then lowered
> + by some IO device before it could be fully processed by the APIC. Hence
> + the APIC sees the interrupt but does not know what device it came from.
> + For this case the APIC will generate the interrupt with a IRQ vector
> + of 0xff.

+ SPU -- a spurious interrupt. This is an interrupt that was raised then lowered
+ so quickly that it was not fully processed by the APIC. Hence,
+ its origin is unknown.
+ In this case, an interrupt with an IRQ vector of 0xff will be generated.

> + RES, CAL, TLB -- rescheduling, call and tlb flush interrupts are
> + sent from one CPU to another per the needs of the OS. Typically,
> + their statistics are used by kernel developers and interested users to
> + determine the occurance of interrupt floods of the given type.

+ RES, CAL, TLB -- rescheduling, call and tlb flush interrupts,
+ produced by normal OS operation. Typically,
+ this information is used by kernel developers and interested users to
+ determine the occurrence of interrupt floods of the given type.


> +The above IRQ vectors are displayed only when relevent. For example,
available?
> +the threshold vector does not exist on x86_64 platforms. Others are
> +suppressed when the system is a uniprocessor. As of this writing, only
> +i386 and x86_64 platforms support the new IRQ vector displays.
> +
> +Of some interest is the introduction of the /proc/irq directory to 2.4.
> It could be used to set IRQ to CPU affinity, this means that you can "hook" an
> IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
> irq subdir is one subdir for each IRQ, and one file; prof_cpu_mask
_____

2007-09-22 05:03:40

by Satyam Sharma

[permalink] [raw]
Subject: Re: [PATCH] [34/50] i386: Fix argument signedness warnings

Hi,


On Sat, 22 Sep 2007, Andi Kleen wrote:
>
> From: Satyam Sharma <[email protected]>
>
>
> These build warnings:
>
> In file included from include/asm/thread_info.h:16,
> from include/linux/thread_info.h:21,
> from include/linux/preempt.h:9,
> from include/linux/spinlock.h:49,
> from include/linux/vmalloc.h:4,
> from arch/i386/boot/compressed/misc.c:14:
> include/asm/processor.h: In function $B!F(Jcpuid_count$B!G(J:
^^^^^^^^^^ ^^^^^^^^^^
> include/asm/processor.h:615: warning: pointer targets in passing
> argument 1 of $B!F(Jnative_cpuid$B!G(J differ in signedness

> include/asm/processor.h:615: warning: pointer targets in passing
> argument 2 of $B!F(Jnative_cpuid$B!G(J differ in signedness

> include/asm/processor.h:615: warning: pointer targets in passing
> argument 3 of $B!F(Jnative_cpuid$B!G(J differ in signedness

> include/asm/processor.h:615: warning: pointer targets in passing
> argument 4 of $B!F(Jnative_cpuid$B!G(J differ in signedness
^^^^^^^^^^ ^^^^^^^^^^

Yikes. My bad, I had faulty (default) alpine settings (and a sad
combination of LANG=en_US.UTF-8) when I made and sent out that patch.
Please ensure that this finally gets committed in a somewhat saner and
more readable state to the tree.

Thanks,

Satyam

2007-09-22 05:32:30

by Oleg Verych

[permalink] [raw]
Subject: Re: [PATCH] [13/50] x86: Fix and reenable CLFLUSH support in change_page_attr()

* Sat, 22 Sep 2007 00:32:11 +0200 (CEST)
[]
> - flush_map(&l);
> + flush_map(&arg);

+ flush_map(&arg.l);

CC arch/x86_64/mm/pageattr.o
arch/x86_64/mm/pageattr.c: In function 'global_flush_tlb':
arch/x86_64/mm/pageattr.c:274: warning: passing argument 1 of 'flush_map' from incompatible pointer type

(for i386 seems too)

[]
> +#define PageFlush(p) test_bit(PG_owner_priv_1, &(p)->flags)
> +#define SetPageFlush(p) set_bit(PG_owner_priv_1, &(p)->flags)
> +#define TestClearPageFlush(p) test_and_clear_bit(PG_owner_priv_1, &(p)->flags)

Is it worth introducing more of that Pascal style? Yes, page stuff is
all about it, but still.

[]
> +static struct page *flush_page(unsigned long address)
> {
> - if (!test_and_set_bit(PG_arch_1, &kpte_page->flags))
> - list_add(&kpte_page->lru, &df_list);
> + struct page *p;
> + if (!(pfn_valid(__pa(address) >> PAGE_SHIFT)))
> + return NULL;
> + p = virt_to_page(address);
> + if ((PageFlush(p) || PageLRU(p)) && !test_bit(PG_arch_1, &p->flags))
> + return NULL;
> + return p;
> }

Saves 16 bytes in a non-optimized compile (if tcc will ever do this :)

static struct page *flush_page(unsigned long address)
{
struct page *p = NULL;

if (pfn_valid(__pa(address) >> PAGE_SHIFT)) {
p = virt_to_page(address);
if (PageFlush(p) || PageLRU(p))
if (!test_bit(PG_arch_1, &p->flags))
p = NULL;
}
return p;
}

> static int
> @@ -158,6 +185,18 @@ __change_page_attr(struct page *page, pg
> kpte_page = virt_to_page(kpte);
> BUG_ON(PageLRU(kpte_page));
> BUG_ON(PageCompound(kpte_page));
> + BUG_ON(PageLRU(kpte_page));
> +
> + /* Do caching attributes change?
> + Note: this will need changes if the PAT bit is used (it isn't
> + currently) because that one varies between 2MB and 4K pages. */
> + if ((pte_val(*kpte)&_PAGE_CACHE) != (pgprot_val(prot)&_PAGE_CACHE)) {
> + struct page *p = flush_page(address);
> + if (!p)
> + full_flush = 1;
> + else
> + save_page(p, 1);
> + }
>
> if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL)) {
> if (!pte_huge(*kpte)) {
> @@ -189,7 +228,7 @@ __change_page_attr(struct page *page, pg
> * replace it with a largepage.
> */
>
> - save_page(kpte_page);
> + save_page(kpte_page, 0);
> if (!PageReserved(kpte_page)) {
> if (cpu_has_pse && (page_private(kpte_page) == 0)) {
> paravirt_release_pt(page_to_pfn(kpte_page));
> @@ -235,18 +274,22 @@ int change_page_attr(struct page *page,
>
> void global_flush_tlb(void)
> {
> - struct list_head l;
> + struct flush_arg arg;
> struct page *pg, *next;
>
> BUG_ON(irqs_disabled());
>
> spin_lock_irq(&cpa_lock);
> - list_replace_init(&df_list, &l);
> + arg.full_flush = full_flush;
> + full_flush = 0;
> + list_replace_init(&df_list, &arg.l);
> spin_unlock_irq(&cpa_lock);
> - flush_map(&l);
> - list_for_each_entry_safe(pg, next, &l, lru) {
> + flush_map(&arg);

i386 case here.
____

2007-09-22 05:37:29

by Yinghai Lu

[permalink] [raw]
Subject: Re: [patches] [PATCH] [12/50] x86_64: Untangle __init references between IO data

On 9/21/07, Andi Kleen <[email protected]> wrote:
>
> Earlier patch added IO APIC setup into local APIC setup. This caused
> modpost warnings. Fix them by untangling setup_local_APIC() and splitting
> it into smaller functions. The IO APIC initialization is only called
> for the BP init.
>
> Also removed some outdated debugging code and minor cleanup.
> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> arch/x86_64/kernel/apic.c | 46 ++++++++++++++++++++-----------------------
> arch/x86_64/kernel/smpboot.c | 8 +++++++
> include/asm-x86_64/apic.h | 1
> 3 files changed, 31 insertions(+), 24 deletions(-)
>
> Index: linux/arch/x86_64/kernel/apic.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/apic.c
> +++ linux/arch/x86_64/kernel/apic.c
> @@ -323,7 +323,7 @@ void __init init_bsp_APIC(void)
>
> void __cpuinit setup_local_APIC (void)
> {
> - unsigned int value, maxlvt;
> + unsigned int value;
> int i, j;
>
> value = apic_read(APIC_LVR);
> @@ -417,33 +417,22 @@ void __cpuinit setup_local_APIC (void)
> else
> value = APIC_DM_NMI | APIC_LVT_MASKED;
> apic_write(APIC_LVT1, value);
> +}
>
> +void __cpuinit lapic_setup_esr(void)

static ?

> +{
> + unsigned maxlvt = get_maxlvt();
> + apic_write(APIC_LVTERR, ERROR_APIC_VECTOR);
> /*
> - * Now enable IO-APICs, actually call clear_IO_APIC
> - * We need clear_IO_APIC before enabling vector on BP
> + * spec says clear errors after enabling vector.
> */
> - if (!smp_processor_id())
> - if (!skip_ioapic_setup && nr_ioapics)
> - enable_IO_APIC();
> -
> - {
> - unsigned oldvalue;
> - maxlvt = get_maxlvt();
> - oldvalue = apic_read(APIC_ESR);
> - value = ERROR_APIC_VECTOR; // enables sending errors
> - apic_write(APIC_LVTERR, value);
> - /*
> - * spec says clear errors after enabling vector.
> - */
> - if (maxlvt > 3)
> - apic_write(APIC_ESR, 0);
> - value = apic_read(APIC_ESR);
> - if (value != oldvalue)
> - apic_printk(APIC_VERBOSE,
> - "ESR value after enabling vector: %08x, after %08x\n",
> - oldvalue, value);
> - }
> + if (maxlvt > 3)
> + apic_write(APIC_ESR, 0);
> +}
>
> +void __cpuinit end_local_APIC_setup(void)
> +{
> + lapic_setup_esr();
> nmi_watchdog_default();
> setup_apic_nmi_watchdog(NULL);
> apic_pm_activate();
> @@ -1178,6 +1167,15 @@ int __init APIC_init_uniprocessor (void)
>
> setup_local_APIC();
>
> + /*
> + * Now enable IO-APICs, actually call clear_IO_APIC
> + * We need clear_IO_APIC before enabling vector on BP

here it is uniprocessor...
so
+ * We need clear_IO_APIC before enabling error vector

> + */
> + if (!skip_ioapic_setup && nr_ioapics)
> + enable_IO_APIC();

could it cause a modpost warning too?

> +
> + end_local_APIC_setup();
> +
> if (smp_found_config && !skip_ioapic_setup && nr_ioapics)
> setup_IO_APIC();
> else
> Index: linux/arch/x86_64/kernel/smpboot.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/smpboot.c
> +++ linux/arch/x86_64/kernel/smpboot.c
> @@ -211,6 +211,7 @@ void __cpuinit smp_callin(void)
>
> Dprintk("CALLIN, before setup_local_APIC().\n");
> setup_local_APIC();
> + end_local_APIC_setup();
>
> /*
> * Get our bogomips.
> @@ -870,6 +871,13 @@ void __init smp_prepare_cpus(unsigned in
> */
> setup_local_APIC();
>
> + /*
> + * Enable IO APIC before setting up error vector
> + */
> + if (!skip_ioapic_setup && nr_ioapics)
> + enable_IO_APIC();
> + end_local_APIC_setup();
> +
> if (GET_APIC_ID(apic_read(APIC_ID)) != boot_cpu_id) {
> panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
> GET_APIC_ID(apic_read(APIC_ID)), boot_cpu_id);
> Index: linux/include/asm-x86_64/apic.h
> ===================================================================
> --- linux.orig/include/asm-x86_64/apic.h
> +++ linux/include/asm-x86_64/apic.h
> @@ -73,6 +73,7 @@ extern void cache_APIC_registers (void);
> extern void sync_Arb_IDs (void);

sync_Arb_IDs is still left there?

YH

2007-09-22 06:49:29

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

On 9/21/07, Andi Kleen <[email protected]> wrote:
>
> From: Robert Hancock <[email protected]>
>
> This path adds validation of the MMCONFIG table against the ACPI reserved
> motherboard resources. If the MMCONFIG table is found to be reserved in
> ACPI, we don't bother checking the E820 table. The PCI Express firmware
> spec apparently tells BIOS developers that reservation in ACPI is required
> and E820 reservation is optional, so checking against ACPI first makes
> sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though
> it is perfectly functional, the existing check needlessly disables MMCONFIG
> in these cases.
>
> In order to do this, MMCONFIG setup has been split into two phases. If PCI
> configuration type 1 is not available then MMCONFIG is enabled early as
> before. Otherwise, it is enabled later after the ACPI interpreter is
> enabled, since we need to be able to execute control methods in order to
> check the ACPI reserved resources. Presently this is just triggered off
> the end of ACPI interpreter initialization.
>
> There are a few other behavioral changes here:
>
> - Validate all MMCONFIG configurations provided, not just the first one.
>
> - Validate the entire required length of each configuration according to
> the provided ending bus number is reserved, not just the minimum required
> allocation.
>
> - Validate that the area is reserved even if we read it from the chipset
> directly and not from the MCFG table. This catches the case where the
> BIOS didn't set the location properly in the chipset and has mapped it
> over other things it shouldn't have.
>
> This also cleans up the MMCONFIG initialization functions so that they
> simply do nothing if MMCONFIG is not compiled in.
>
> Based on an original patch by Rajesh Shah from Intel.
>
> [[email protected]: many fixes and cleanups]
> Signed-off-by: Robert Hancock <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>
> Cc: Rajesh Shah <[email protected]>
> Cc: Jesse Barnes <[email protected]>
> Acked-by: Linus Torvalds <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Greg KH <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> arch/i386/pci/init.c | 4 -
> arch/i386/pci/mmconfig-shared.c | 151 +++++++++++++++++++++++++++++++++++-----
> arch/i386/pci/pci.h | 1
> drivers/acpi/bus.c | 2
> include/linux/pci.h | 8 ++
> 5 files changed, 144 insertions(+), 22 deletions(-)
>
> Index: linux/arch/i386/pci/init.c
> ===================================================================
> --- linux.orig/arch/i386/pci/init.c
> +++ linux/arch/i386/pci/init.c
> @@ -11,9 +11,7 @@ static __init int pci_access_init(void)
> #ifdef CONFIG_PCI_DIRECT
> type = pci_direct_probe();
> #endif
> -#ifdef CONFIG_PCI_MMCONFIG
> - pci_mmcfg_init(type);
> -#endif
> + pci_mmcfg_early_init(type);
> if (raw_pci_ops)
> return 0;
> #ifdef CONFIG_PCI_BIOS
> Index: linux/arch/i386/pci/mmconfig-shared.c
> ===================================================================
> --- linux.orig/arch/i386/pci/mmconfig-shared.c
> +++ linux/arch/i386/pci/mmconfig-shared.c
> @@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso
> pci_mmcfg_resources_inserted = 1;
> }
>
> -static void __init pci_mmcfg_reject_broken(int type)
> +static acpi_status __init check_mcfg_resource(struct acpi_resource *res,
> + void *data)
> +{
> + struct resource *mcfg_res = data;
> + struct acpi_resource_address64 address;
> + acpi_status status;
> +
> + if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) {
> + struct acpi_resource_fixed_memory32 *fixmem32 =
> + &res->data.fixed_memory32;
> + if (!fixmem32)
> + return AE_OK;
> + if ((mcfg_res->start >= fixmem32->address) &&
> + (mcfg_res->end < (fixmem32->address +
> + fixmem32->address_length))) {
> + mcfg_res->flags = 1;
> + return AE_CTRL_TERMINATE;
> + }
> + }
> + if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) &&
> + (res->type != ACPI_RESOURCE_TYPE_ADDRESS64))
> + return AE_OK;
> +
> + status = acpi_resource_to_address64(res, &address);
> + if (ACPI_FAILURE(status) ||
> + (address.address_length <= 0) ||
> + (address.resource_type != ACPI_MEMORY_RANGE))
> + return AE_OK;
> +
> + if ((mcfg_res->start >= address.minimum) &&
> + (mcfg_res->end < (address.minimum + address.address_length))) {
> + mcfg_res->flags = 1;
> + return AE_CTRL_TERMINATE;
> + }
> + return AE_OK;
> +}
> +
> +static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl,
> + void *context, void **rv)
> +{
> + struct resource *mcfg_res = context;
> +
> + acpi_walk_resources(handle, METHOD_NAME__CRS,
> + check_mcfg_resource, context);
> +
> + if (mcfg_res->flags)
> + return AE_CTRL_TERMINATE;
> +
> + return AE_OK;
> +}
> +
> +static int __init is_acpi_reserved(unsigned long start, unsigned long end)
> +{
> + struct resource mcfg_res;
> +
> + mcfg_res.start = start;
> + mcfg_res.end = end;
> + mcfg_res.flags = 0;
> +
> + acpi_get_devices("PNP0C01", find_mboard_resource, &mcfg_res, NULL);
> +
> + if (!mcfg_res.flags)
> + acpi_get_devices("PNP0C02", find_mboard_resource, &mcfg_res,
> + NULL);
> +
> + return mcfg_res.flags;
> +}
> +
> +static void __init pci_mmcfg_reject_broken(void)
> {
> typeof(pci_mmcfg_config[0]) *cfg;
> + int i;
>
> if ((pci_mmcfg_config_num == 0) ||
> (pci_mmcfg_config == NULL) ||
> @@ -229,17 +298,37 @@ static void __init pci_mmcfg_reject_brok
> goto reject;
> }
>
> - /*
> - * Only do this check when type 1 works. If it doesn't work
> - * assume we run on a Mac and always use MCFG
> - */
> - if (type == 1 && !e820_all_mapped(cfg->address,
> - cfg->address + MMCONFIG_APER_MIN,
> - E820_RESERVED)) {
> - printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %Lx is not"
> - " E820-reserved\n", cfg->address);
> - goto reject;
> + for (i = 0; i < pci_mmcfg_config_num; i++) {
> + u32 size = (cfg->end_bus_number + 1) << 20;
> + cfg = &pci_mmcfg_config[i];
> + printk(KERN_NOTICE "PCI: MCFG configuration %d: base %lu "
> + "segment %hu buses %u - %u\n",
> + i, (unsigned long)cfg->address, cfg->pci_segment,
> + (unsigned int)cfg->start_bus_number,
> + (unsigned int)cfg->end_bus_number);
> + if (is_acpi_reserved(cfg->address, cfg->address + size - 1)) {
> + printk(KERN_NOTICE "PCI: MCFG area at %Lx reserved "
> + "in ACPI motherboard resources\n",
> + cfg->address);
> + } else {
> + printk(KERN_ERR "PCI: BIOS Bug: MCFG area at %Lx is not"
> + " reserved in ACPI motherboard resources\n",
> + cfg->address);
> + /* Don't try to do this check unless configuration
> + type 1 is available. */
> + if ((pci_probe & PCI_PROBE_CONF1) &&
> + e820_all_mapped(cfg->address,
> + cfg->address + size - 1,
> + E820_RESERVED))
> + printk(KERN_NOTICE
> + "PCI: MCFG area at %Lx reserved in "
> + "E820\n",
> + cfg->address);
> + else
> + goto reject;
> + }
> }
> +
> return;
>
> reject:
> @@ -249,20 +338,46 @@ reject:
> pci_mmcfg_config_num = 0;
> }
>
> -void __init pci_mmcfg_init(int type)
> +void __init pci_mmcfg_early_init(int type)
> +{
> + if ((pci_probe & PCI_PROBE_MMCONF) == 0)
> + return;
> +
> + /* If type 1 access is available, no need to enable MMCONFIG yet, we can
> + defer until later when the ACPI interpreter is available to better
> + validate things. */
> + if (type == 1)
> + return;
> +
> + acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
> +
> + if ((pci_mmcfg_config_num == 0) ||
> + (pci_mmcfg_config == NULL) ||
> + (pci_mmcfg_config[0].address == 0))
> + return;
> +
> + if (pci_mmcfg_arch_init())
> + pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
> +}
> +
> +void __init pci_mmcfg_late_init(void)
> {
> int known_bridge = 0;
>
> + /* MMCONFIG disabled */
> if ((pci_probe & PCI_PROBE_MMCONF) == 0)
> return;
>
> - if (type == 1 && pci_mmcfg_check_hostbridge())
> - known_bridge = 1;
> + /* MMCONFIG already enabled */
> + if (!(pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF))
> + return;
>
> - if (!known_bridge) {
> + if ((pci_probe & PCI_PROBE_CONF1) && pci_mmcfg_check_hostbridge())
> + known_bridge = 1;
> + else
> acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
> - pci_mmcfg_reject_broken(type);
> - }
> +
> + pci_mmcfg_reject_broken();
>
> if ((pci_mmcfg_config_num == 0) ||
> (pci_mmcfg_config == NULL) ||
> @@ -270,7 +385,7 @@ void __init pci_mmcfg_init(int type)
> return;
>
> if (pci_mmcfg_arch_init()) {
> - if (type == 1)
> + if (pci_probe & PCI_PROBE_CONF1)
> unreachable_devices();
> if (known_bridge)
> pci_mmcfg_insert_resources(IORESOURCE_BUSY);
> Index: linux/arch/i386/pci/pci.h
> ===================================================================
> --- linux.orig/arch/i386/pci/pci.h
> +++ linux/arch/i386/pci/pci.h
> @@ -91,7 +91,6 @@ extern int pci_conf1_read(unsigned int s
> extern int pci_direct_probe(void);
> extern void pci_direct_init(int type);
> extern void pci_pcbios_init(void);
> -extern void pci_mmcfg_init(int type);
> extern void pcibios_sort(void);
>
> /* pci-mmconfig.c */
> Index: linux/drivers/acpi/bus.c
> ===================================================================
> --- linux.orig/drivers/acpi/bus.c
> +++ linux/drivers/acpi/bus.c
> @@ -35,6 +35,7 @@
> #ifdef CONFIG_X86
> #include <asm/mpspec.h>
> #endif
> +#include <linux/pci.h>
> #include <acpi/acpi_bus.h>
> #include <acpi/acpi_drivers.h>
>
> @@ -757,6 +758,7 @@ static int __init acpi_init(void)
> result = acpi_bus_init();
>
> if (!result) {
> + pci_mmcfg_late_init();

No!

MMCONFIG will not work with acpi=off any more.

because acpi_init==>pci_mmcfg_late_init==>pci_mmcfg_check_hostbridge...

can you move pci_mmcfg_late_init out of acpi_init and just call it
somewhere after acpi_init...
or put pci_mmcfg_check_hostbridge back to pci_mmcfg_early_init?

YH

2007-09-22 06:56:11

by Sam Ravnborg

Subject: Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

On Fri, Sep 21, 2007 at 06:45:39PM -0400, Dave Jones wrote:
> On Sat, Sep 22, 2007 at 12:32:02AM +0200, Andi Kleen wrote:
>
>
> > + Select this for:
> > + Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename:
> > + -Willamette
> > + -Northwood
> > + -Mobile Pentium 4
> > + -Mobile Pentium 4 M
> > + -Extreme Edition (Gallatin)
> > + -Prescott
> > + -Prescott 2M
> > + -Cedar Mill
> > + -Presler
> > + -Smithfield
> > + Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename:
> > + -Foster
> > + -Prestonia
> > + -Gallatin
> > + -Nocona
> > + -Irwindale
> > + -Cranford
> > + -Potomac
> > + -Paxville
> > + -Dempsey
>
> This seems like yet another list that will need to be perpetually
> kept up to date, and given 99% of users don't know the codename
> of their core, just the marketing name, I question its value.

As a bare minimum requirement, the list presented here shall use the same
names as used in /proc/cpuinfo.

On this box I read:

vendor_id : GenuineIntel
model name : Pentium III (Coppermine)

This info must be present in Kconfig text (help text) too.
I have always had trouble selecting the right CPU, so I welcome this patch
that gives me more info - and maybe a bit too much.

Sam

2007-09-22 06:56:30

by Yinghai Lu

Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

On 9/21/07, Yinghai Lu <[email protected]> wrote:
> On 9/21/07, Andi Kleen <[email protected]> wrote:
> >
> > From: Robert Hancock <[email protected]>
> >
> > This patch adds validation of the MMCONFIG table against the ACPI reserved
> > motherboard resources. If the MMCONFIG table is found to be reserved in
> > ACPI, we don't bother checking the E820 table. The PCI Express firmware
> > spec apparently tells BIOS developers that reservation in ACPI is required
> > and E820 reservation is optional, so checking against ACPI first makes
> > sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though
> > it is perfectly functional; the existing check needlessly disables MMCONFIG
> > in these cases.
> >
> > In order to do this, MMCONFIG setup has been split into two phases. If PCI
> > configuration type 1 is not available then MMCONFIG is enabled early as
> > before. Otherwise, it is enabled later after the ACPI interpreter is
> > enabled, since we need to be able to execute control methods in order to
> > check the ACPI reserved resources. Presently this is just triggered off
> > the end of ACPI interpreter initialization.
> >
> > There are a few other behavioral changes here:
> >
> > - Validate all MMCONFIG configurations provided, not just the first one.
> >
> > - Validate that the entire required length of each configuration, according
> > to the provided ending bus number, is reserved, not just the minimum
> > required allocation.
> >
> > - Validate that the area is reserved even if we read it from the chipset
> > directly and not from the MCFG table. This catches the case where the
> > BIOS didn't set the location properly in the chipset and has mapped it
> > over other things it shouldn't have.
> >
> > This also cleans up the MMCONFIG initialization functions so that they
> > simply do nothing if MMCONFIG is not compiled in.
> >
> > Based on an original patch by Rajesh Shah from Intel.
> >
> > [[email protected]: many fixes and cleanups]
> > Signed-off-by: Robert Hancock <[email protected]>
> > Signed-off-by: Andi Kleen <[email protected]>
> > Cc: Rajesh Shah <[email protected]>
> > Cc: Jesse Barnes <[email protected]>
> > Acked-by: Linus Torvalds <[email protected]>
> > Cc: Andi Kleen <[email protected]>
> > Cc: Greg KH <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>

Also the title is misleading: it should be x86 instead of i386, because it
will affect x86_64 too.

YH

2007-09-22 06:57:41

by Sam Ravnborg

Subject: Re: [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS

On Sat, Sep 22, 2007 at 12:32:18AM +0200, Andi Kleen wrote:
>
> Also allow to set svm lock.
>
> TBD double check, documentation, i386 support
>
> Signed-off-by: Andi Kleen <[email protected]>

Could we have this patch tagged with x86 instead of "Experimental" in subject.

Sam

2007-09-22 09:17:18

by Joerg Roedel

Subject: Re: [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS

I don't think we need this patch. When SVM is disabled, KVM will report it on
module load. Further, with SVM lock it will be possible to re-enable SVM
using a key even if it was disabled by the BIOS. In that case the user of
SVM has to clear the capability bit you set in this patch on all cpus.

On Sat, Sep 22, 2007 at 12:32:18AM +0200, Andi Kleen wrote:
>
> Also allow to set svm lock.
>
> TBD double check, documentation, i386 support
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> arch/x86_64/kernel/setup.c | 25 +++++++++++++++++++++++--
> include/asm-i386/cpufeature.h | 1 +
> include/asm-i386/msr-index.h | 3 +++
> 3 files changed, 27 insertions(+), 2 deletions(-)
>
> Index: linux/arch/x86_64/kernel/setup.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/setup.c
> +++ linux/arch/x86_64/kernel/setup.c
> @@ -565,7 +565,7 @@ static void __cpuinit early_init_amd(str
>
> static void __cpuinit init_amd(struct cpuinfo_x86 *c)
> {
> - unsigned level;
> + unsigned level, flags, dummy;
>
> #ifdef CONFIG_SMP
> unsigned long value;
> @@ -634,7 +634,28 @@ static void __cpuinit init_amd(struct cp
> /* Family 10 doesn't support C states in MWAIT so don't use it */
> if (c->x86 == 0x10 && !force_mwait)
> clear_bit(X86_FEATURE_MWAIT, &c->x86_capability);
> +
> + if (c->x86 >= 0xf && c->x86 <= 0x11 &&
> + !rdmsr_safe(MSR_VM_CR, &flags, &dummy) &&
> + (flags & 0x18))
> + set_bit(X86_FEATURE_VIRT_DISABLED, &c->x86_capability);
> +}
> +
> +static int enable_svm_lock(char *s)
> +{
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> + boot_cpu_data.x86 >= 0xf && boot_cpu_data.x86 <= 0x11) {
> + unsigned a,b;
> + if (rdmsr_safe(MSR_VM_CR, &a, &b))
> + return 0;
> + a |= (1 << 3); /* set SVM lock */
> + if (!wrmsr_safe(MSR_VM_CR, &a, &b))
> + return 1;
> + }
> + printk(KERN_ERR "CPU does not support svm_lock\n");
> + return 0;
> }
> +__setup("svm_lock", enable_svm_lock);
>
> static void __cpuinit detect_ht(struct cpuinfo_x86 *c)
> {
> @@ -985,7 +1006,7 @@ static int show_cpuinfo(struct seq_file
> NULL, NULL, NULL, NULL,
> "constant_tsc", "up", NULL, "arch_perfmon",
> "pebs", "bts", NULL, "sync_rdtsc",
> - "rep_good", NULL, NULL, NULL, NULL, NULL, NULL, NULL,
> + "rep_good", "virtualization_bios_disabled", NULL, NULL, NULL, NULL, NULL, NULL,
> NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
>
> /* Intel-defined (#2) */
> Index: linux/include/asm-i386/cpufeature.h
> ===================================================================
> --- linux.orig/include/asm-i386/cpufeature.h
> +++ linux/include/asm-i386/cpufeature.h
> @@ -82,6 +82,7 @@
> /* 14 free */
> #define X86_FEATURE_SYNC_RDTSC (3*32+15) /* RDTSC synchronizes the CPU */
> #define X86_FEATURE_REP_GOOD (3*32+16) /* rep microcode works well on this CPU */
> +#define X86_FEATURE_VIRT_DISABLED (3*32+17) /* Hardware virt. BIOS disabled */
>
> /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
> #define X86_FEATURE_XMM3 (4*32+ 0) /* Streaming SIMD Extensions-3 */
> Index: linux/include/asm-i386/msr-index.h
> ===================================================================
> --- linux.orig/include/asm-i386/msr-index.h
> +++ linux/include/asm-i386/msr-index.h
> @@ -98,6 +98,9 @@
> #define K8_MTRRFIXRANGE_DRAM_MODIFY 0x00080000 /* MtrrFixDramModEn bit */
> #define K8_MTRR_RDMEM_WRMEM_MASK 0x18181818 /* Mask: RdMem|WrMem */
>
> +/* SVM */
> +#define MSR_VM_CR 0xc0010114
> +
> /* K7 MSRs */
> #define MSR_K7_EVNTSEL0 0xc0010000
> #define MSR_K7_PERFCTR0 0xc0010004

2007-09-22 09:46:53

by Jan Engelhardt

Subject: Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu


On Sep 22 2007 08:57, Sam Ravnborg wrote:
>>
>> This seems like yet another list that will need to be perpetually
>> kept up to date, and given 99% of users don't know the codename
>> of their core, just the marketing name, I question its value.
>
>As a bare minimum requirement the list presented here shall use same
>names as used in /proc/cpuinfo
>
>On this box I read:
>
>vendor_id : GenuineIntel
>model name : Pentium III (Coppermine)
>
>This info must be present in Kconfig text (help text) too.
>I always have trouble selecting the right CPU before so I welcome this patch
>that give me more info - and maybe a bit too much.

model name : AMD Athlon(tm) XP 2000+

here. Seems like Intel encodes their codenames into cpuid :(

2007-09-22 09:47:52

by Jan Engelhardt

Subject: Re: [PATCH] [6/50] i386: clean up oops/bug reports


On Sep 21 2007 18:41, Chuck Ebbert wrote:
>On 09/21/2007 06:32 PM, Andi Kleen wrote:
>> From: Pavel Emelyanov <[email protected]>
>>
>> Typically the oops first lines look like this:
>>
>> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
>> printing eip:
>> c049dfbd
>> *pde = 00000000
>> Oops: 0002 [#1]
>> PREEMPT SMP
>> ...
>>
>> Such output is gained with some ugly if (!nl) printk("\n"); code and
>> besides being a waste of lines, this is also annoying to read. The
>> following output looks better (and it is how it looks on x86_64):
>>
>> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
>> printing eip: c049dfbd *pde = 00000000
>> Oops: 0002 [#1] PREEMPT SMP

In fact, the EIP can be left out, because it is printed later
as part of the register dump anyway.

2007-09-22 10:01:25

by Jan Engelhardt

Subject: Re: [PATCH] [34/50] i386: Fix argument signedness warnings


On Sep 22 2007 10:36, Satyam Sharma wrote:
>> from arch/i386/boot/compressed/misc.c:14:
>> include/asm/processor.h: In function $B!F(Jcpuid_count$B!G(J:
> ^^^^^^^^^^ ^^^^^^^^^^
>> include/asm/processor.h:615: warning: pointer targets in passing
>> argument 1 of $B!F(Jnative_cpuid$B!G(J differ in signedness
>
>> include/asm/processor.h:615: warning: pointer targets in passing
>> argument 2 of $B!F(Jnative_cpuid$B!G(J differ in signedness
>
>> include/asm/processor.h:615: warning: pointer targets in passing
>> argument 3 of $B!F(Jnative_cpuid$B!G(J differ in signedness
>
>> include/asm/processor.h:615: warning: pointer targets in passing
>> argument 4 of $B!F(Jnative_cpuid$B!G(J differ in signedness
> ^^^^^^^^^^ ^^^^^^^^^^
>
>Yikes. My bad, I had faulty (default) alpine settings (and a sad
>combination of LANG=en_US.UTF-8) when I made and sent out that patch.
>Please ensure that this finally gets committed in a somewhat saner and
>more readable state to the tree.

I am not too thrilled about gcc using non-ascii punctuation
(for Western languages)..

2007-09-22 14:23:41

by Dave Jones

Subject: Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

On Sat, Sep 22, 2007 at 08:57:24AM +0200, Sam Ravnborg wrote:

> > This seems like yet another list that will need to be perpetually
> > kept up to date, and given 99% of users don't know the codename
> > of their core, just the marketing name, I question its value.
>
> As a bare minimum requirement the list presented here shall use same
> names as used in /proc/cpuinfo
>
> On this box I read:
>
> vendor_id : GenuineIntel
> model name : Pentium III (Coppermine)

There are *dozens* of possible entries here, and always new ones.
The list will become so cumbersome that searching for the name through
the help text of each possible option will be a really boring task that
I doubt anyone will seriously do.

Dave

--
http://www.codemonkey.org.uk

2007-09-22 16:28:41

by Robert Hancock

Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

Yinghai Lu wrote:
> No!
>
> MMCONFIG will not work with acpi=off any more.

I don't think this is unreasonable. The ACPI MCFG table is how we are
supposed to learn about the area in the first place. If we can't get the
table location via an approved mechanism, and can't validate it doesn't
overlap with another memory reservation or something, I really don't
think we should be using it.

I don't think it's much of an issue anyway - the chances that somebody
will want to run without ACPI on a system with MCFG are pretty low given
that you'll end up losing a bunch of functionality (not least of which
is multi-core support).

>
> because acpi_init==>pci_mmcfg_late_init==>pci_mmcfg_check_hostbridge...
>
> can you move pci_mmcfg_late_init out of acpi_init and just call it
> somewhere after acpi_init...
> or put pci_mmcfg_check_hostbridge back to pci_mmcfg_early_init?
>
> YH
>

2007-09-22 17:42:23

by Randy Dunlap

Subject: Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

On Sat, 22 Sep 2007 10:23:25 -0400 Dave Jones wrote:

> On Sat, Sep 22, 2007 at 08:57:24AM +0200, Sam Ravnborg wrote:
>
> > > This seems like yet another list that will need to be perpetually
> > > kept up to date, and given 99% of users don't know the codename
> > > of their core, just the marketing name, I question its value.
> >
> > As a bare minimum requirement the list presented here shall use same
> > names as used in /proc/cpuinfo
> >
> > On this box I read:
> >
> > vendor_id : GenuineIntel
> > model name : Pentium III (Coppermine)
>
> There are *dozens* of possible entries here, and always new ones.
> The list will become so cumbersome that searching for the name
> through the help text of each possible option will become a
> really boring task, that I doubt anyone will seriously do.

Yep. help text: see http://www.wikipedia.org :)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-09-22 17:44:17

by Randy Dunlap

Subject: Re: [PATCH] [34/50] i386: Fix argument signedness warnings

On Sat, 22 Sep 2007 12:01:16 +0200 (CEST) Jan Engelhardt wrote:

>
> On Sep 22 2007 10:36, Satyam Sharma wrote:
> >> from arch/i386/boot/compressed/misc.c:14:
> >> include/asm/processor.h: In function $B!F(Jcpuid_count$B!G(J:
> > ^^^^^^^^^^ ^^^^^^^^^^
> >> include/asm/processor.h:615: warning: pointer targets in passing
> >> argument 1 of $B!F(Jnative_cpuid$B!G(J differ in signedness
> >
> >> include/asm/processor.h:615: warning: pointer targets in passing
> >> argument 2 of $B!F(Jnative_cpuid$B!G(J differ in signedness
> >
> >> include/asm/processor.h:615: warning: pointer targets in passing
> >> argument 3 of $B!F(Jnative_cpuid$B!G(J differ in signedness
> >
> >> include/asm/processor.h:615: warning: pointer targets in passing
> >> argument 4 of $B!F(Jnative_cpuid$B!G(J differ in signedness
> > ^^^^^^^^^^ ^^^^^^^^^^
> >
> >Yikes. My bad, I had faulty (default) alpine settings (and a sad
> >combination of LANG=en_US.UTF-8) when I made and sent out that patch.
> >Please ensure that this finally gets committed in a somewhat saner and
> >more readable state to the tree.
>
> I am not too thrilled about gcc using non-ascii punctuation
> (for Western languages)..

Ack. I usually build with "LC_ALL=C" to make those readable.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-09-22 17:50:35

by Thomas Gleixner

Subject: Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
> From: "Oliver Pinter" <[email protected]>
>
> add cpu core name for arch/i386/Kconfig.cpu:Pentium 4 sections help
> add Pentium D for arch/i386/Kconfig.cpu
> add Pentium D for arch/x86_64/Kconfig
>
> Signed-off-by: Oliver Pinter <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>
> Acked-by: Sam Ravnborg <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> arch/i386/Kconfig.cpu | 34 +++++++++++++++++++++++++++++++---
> arch/x86_64/Kconfig | 6 +++---
> 2 files changed, 34 insertions(+), 6 deletions(-)
>
> Index: linux/arch/i386/Kconfig.cpu
> ===================================================================
> --- linux.orig/arch/i386/Kconfig.cpu
> +++ linux/arch/i386/Kconfig.cpu
> @@ -115,11 +115,39 @@ config MPENTIUM4
> bool "Pentium-4/Celeron(P4-based)/Pentium-4 M/older Xeon"
> help
> Select this for Intel Pentium 4 chips. This includes the
> - Pentium 4, P4-based Celeron and Xeon, and Pentium-4 M
> - (not Pentium M) chips. This option enables compile flags
> - optimized for the chip, uses the correct cache shift, and
> + Pentium 4, Pentium D, P4-based Celeron and Xeon, and
> + Pentium-4 M (not Pentium M) chips. This option enables compile
> + flags optimized for the chip, uses the correct cache shift, and
> applies any applicable Pentium III optimizations.
>
> CPUIDs: F[0-6][1-A] (shown in /proc/cpuinfo as: cpu family : 15)
> +
> + Select this for:
> + Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename:
> + -Willamette
> + -Northwood
> + -Mobile Pentium 4
> + -Mobile Pentium 4 M
> + -Extreme Edition (Gallatin)
> + -Prescott
> + -Prescott 2M
> + -Cedar Mill
> + -Presler
> + -Smithfield
> + Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename:
> + -Foster
> + -Prestonia
> + -Gallatin
> + -Nocona
> + -Irwindale
> + -Cranford
> + -Potomac
> + -Paxville
> + -Dempsey
> +
> + more info: http://balusc.xs4all.nl/srv/har-cpu.html

This will never be up to date. Also, the URL above redirects to an
empty bye/bye page. Put this up on one of the kernel-related wikis if
you think it might be useful at all. 99% of the users do not even know
which CPU they have in their system.

tglx




2007-09-22 18:02:15

by Thomas Gleixner

Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

On Sat, 2007-09-22 at 10:28 -0600, Robert Hancock wrote:
> Yinghai Lu wrote:
> > No!
> >
> > MMCONFIG will not work with acpi=off any more.
>
> I don't think this is unreasonable. The ACPI MCFG table is how we are
> supposed to learn about the area in the first place. If we can't get the
> table location via an approved mechanism, and can't validate it doesn't
> overlap with another memory reservation or something, I really don't
> think we should be using it.

We all know how correct ACPI tables are. Specifications are nice,
reality tells a different story.

> I don't think it's much of an issue anyway - the chances that somebody
> will want to run without ACPI on a system with MCFG are pretty low given
> that you'll end up losing a bunch of functionality (not least of which
> is multi-cores).

acpi=off is an often-used debug switch and it _is_ quite useful. Taking
away debug functionality is not a good idea.

tglx


2007-09-22 18:42:38

by Robert Hancock

Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

Thomas Gleixner wrote:
> On Sat, 2007-09-22 at 10:28 -0600, Robert Hancock wrote:
>> Yinghai Lu wrote:
>>> No!
>>>
>>> MMCONFIG will not work with acpi=off any more.
>> I don't think this is unreasonable. The ACPI MCFG table is how we are
>> supposed to learn about the area in the first place. If we can't get the
>> table location via an approved mechanism, and can't validate it doesn't
>> overlap with another memory reservation or something, I really don't
>> think we should be using it.
>
> We all know how correct ACPI tables are. Specifications are nice,
> reality tells a different story.

MMCONFIG can't be used without ACPI in any case unless we know where the
table is using chipset-specific knowledge (i.e. reading the registers
directly). Doing that without being told that this area is really
intended to be used, via the ACPI table, is dangerous, i.e. we don't
necessarily know if the MMCONFIG is broken on the platform in some way
we can't detect.

>
>> I don't think it's much of an issue anyway - the chances that somebody
>> will want to run without ACPI on a system with MCFG are pretty low given
>> that you'll end up losing a bunch of functionality (not least of which
>> is multi-cores).
>
> acpi=off is an often used debug switch and it _is_ quite useful. Taking
> away debug functionality is not a good idea.

If someone has to turn ACPI off, disabling MMCONFIG is probably the
least of their worries..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/


2007-09-22 19:05:48

by Thomas Gleixner

Subject: Re: [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS

On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
> Also allow to set svm lock.

Please use two separate patches. The detection and cpuinfo display is
not related to set svm lock.

> TBD double check, documentation, i386 support

Yes, documentation would be useful. See below.

> Signed-off-by: Andi Kleen <[email protected]>
>
> ---
> arch/x86_64/kernel/setup.c | 25 +++++++++++++++++++++++--
> include/asm-i386/cpufeature.h | 1 +
> include/asm-i386/msr-index.h | 3 +++
> 3 files changed, 27 insertions(+), 2 deletions(-)
>
> Index: linux/arch/x86_64/kernel/setup.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/setup.c
> +++ linux/arch/x86_64/kernel/setup.c
> @@ -565,7 +565,7 @@ static void __cpuinit early_init_amd(str
>
> static void __cpuinit init_amd(struct cpuinfo_x86 *c)
> {
> - unsigned level;
> + unsigned level, flags, dummy;
>
> #ifdef CONFIG_SMP
> unsigned long value;
> @@ -634,7 +634,28 @@ static void __cpuinit init_amd(struct cp
> /* Family 10 doesn't support C states in MWAIT so don't use it */
> if (c->x86 == 0x10 && !force_mwait)
> clear_bit(X86_FEATURE_MWAIT, &c->x86_capability);
> +
> + if (c->x86 >= 0xf && c->x86 <= 0x11 &&
> + !rdmsr_safe(MSR_VM_CR, &flags, &dummy) &&
> + (flags & 0x18))
> + set_bit(X86_FEATURE_VIRT_DISABLED, &c->x86_capability);

Why the check for 0x18 ???? And please can we use understandable
constants for this.

bit 3 (SVM_LOCK) controls only the writeability of bit 4 (SVME_DISABLE),
which controls whether SVM is allowed to be enabled or not.

bit 3 bit 4
0 0 SVM can be enabled in EFER, SVME_DISABLE is writeable
1 0 SVM can be enabled in EFER, SVME_DISABLE is not writeable
0 1 SVM can not be enabled in EFER, SVME_DISABLE is writeable
1 1 SVM can not be enabled in EFER, SVME_DISABLE is not writeable

So SVM is disabled, when bit 4 is set.

> +}
> +
> +static int enable_svm_lock(char *s)
> +{
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> + boot_cpu_data.x86 >= 0xf && boot_cpu_data.x86 <= 0x11) {
> + unsigned a,b;
> + if (rdmsr_safe(MSR_VM_CR, &a, &b))
> + return 0;
> + a |= (1 << 3); /* set SVM lock */

SVM_LOCK is read only according to data sheet. You can set bit 4
(SVME_DISABLE) to prevent KVM or what else using that feature.

tglx




2007-09-22 19:18:14

by Thomas Gleixner

Subject: Re: [PATCH] [20/50] x86_64: Fix some broken white space in arch/x86_64/mm/init.c


On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
> No functional changes
> Signed-off-by: Andi Kleen <[email protected]>

Can we please fix _ALL_ white space and coding style issues in this file
while we are at it?

Updated patch below.

tglx

diff --git a/arch/x86_64/mm/init.c b/arch/x86_64/mm/init.c
index 458893b..346c962 100644
--- a/arch/x86_64/mm/init.c
+++ b/arch/x86_64/mm/init.c
@@ -70,10 +70,11 @@ void show_mem(void)

printk(KERN_INFO "Mem-info:\n");
show_free_areas();
- printk(KERN_INFO "Free swap: %6ldkB\n", nr_swap_pages<<(PAGE_SHIFT-10));
+ printk(KERN_INFO "Free swap: %6ldkB\n",
+ nr_swap_pages<<(PAGE_SHIFT-10));

for_each_online_pgdat(pgdat) {
- for (i = 0; i < pgdat->node_spanned_pages; ++i) {
+ for (i = 0; i < pgdat->node_spanned_pages; ++i) {
/* this loop can take a while with 256 GB and 4k pages
so update the NMI watchdog */
if (unlikely(i % MAX_ORDER_NR_PAGES == 0)) {
@@ -89,7 +90,7 @@ void show_mem(void)
cached++;
else if (page_count(page))
shared += page_count(page) - 1;
- }
+ }
}
printk(KERN_INFO "%lu pages of RAM\n", total);
printk(KERN_INFO "%lu reserved pages\n",reserved);
@@ -100,21 +101,22 @@ void show_mem(void)
int after_bootmem;

static __init void *spp_getpage(void)
-{
+{
void *ptr;
if (after_bootmem)
- ptr = (void *) get_zeroed_page(GFP_ATOMIC);
+ ptr = (void *) get_zeroed_page(GFP_ATOMIC);
else
ptr = alloc_bootmem_pages(PAGE_SIZE);
if (!ptr || ((unsigned long)ptr & ~PAGE_MASK))
- panic("set_pte_phys: cannot allocate page data %s\n", after_bootmem?"after bootmem":"");
+ panic("set_pte_phys: cannot allocate page data %s\n",
+ after_bootmem?"after bootmem":"");

Dprintk("spp_getpage %p\n", ptr);
return ptr;
-}
+}

static __init void set_pte_phys(unsigned long vaddr,
- unsigned long phys, pgprot_t prot)
+ unsigned long phys, pgprot_t prot)
{
pgd_t *pgd;
pud_t *pud;
@@ -130,10 +132,11 @@ static __init void set_pte_phys(unsigned long vaddr,
}
pud = pud_offset(pgd, vaddr);
if (pud_none(*pud)) {
- pmd = (pmd_t *) spp_getpage();
+ pmd = (pmd_t *) spp_getpage();
set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE | _PAGE_USER));
if (pmd != pmd_offset(pud, 0)) {
- printk("PAGETABLE BUG #01! %p <-> %p\n", pmd, pmd_offset(pud,0));
+ printk("PAGETABLE BUG #01! %p <-> %p\n", pmd,
+ pmd_offset(pud,0));
return;
}
}
@@ -162,7 +165,7 @@ static __init void set_pte_phys(unsigned long vaddr,
}

/* NOTE: this is meant to be run only at boot */
-void __init
+void __init
__set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
{
unsigned long address = __fix_to_virt(idx);
@@ -177,7 +180,7 @@ __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
unsigned long __meminitdata table_start, table_end;

static __meminit void *alloc_low_page(unsigned long *phys)
-{
+{
unsigned long pfn = table_end++;
void *adr;

@@ -187,8 +190,8 @@ static __meminit void *alloc_low_page(unsigned long *phys)
return adr;
}

- if (pfn >= end_pfn)
- panic("alloc_low_page: ran out of memory");
+ if (pfn >= end_pfn)
+ panic("alloc_low_page: ran out of memory");

adr = early_ioremap(pfn * PAGE_SIZE, PAGE_SIZE);
memset(adr, 0, PAGE_SIZE);
@@ -197,13 +200,13 @@ static __meminit void *alloc_low_page(unsigned long *phys)
}

static __meminit void unmap_low_page(void *adr)
-{
+{

if (after_bootmem)
return;

early_iounmap(adr, PAGE_SIZE);
-}
+}

/* Must run before zap_low_mappings */
__meminit void *early_ioremap(unsigned long addr, unsigned long size)
@@ -224,7 +227,8 @@ __meminit void *early_ioremap(unsigned long addr, unsigned long size)
vaddr += addr & ~PMD_MASK;
addr &= PMD_MASK;
for (i = 0; i < pmds; i++, addr += PMD_SIZE)
- set_pmd(pmd + i,__pmd(addr | _KERNPG_TABLE | _PAGE_PSE));
+ set_pmd(pmd + i,
+ __pmd(addr | _KERNPG_TABLE | _PAGE_PSE));
__flush_tlb();
return (void *)vaddr;
next:
@@ -284,8 +288,9 @@ phys_pmd_update(pud_t *pud, unsigned long address, unsigned long end)
__flush_tlb_all();
}

-static void __meminit phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end)
-{
+static void __meminit phys_pud_init(pud_t *pud_page, unsigned long addr,
+ unsigned long end)
+{
int i = pud_index(addr);


@@ -298,9 +303,9 @@ static void __meminit phys_pud_init(pud_t *pud_page, unsigned long addr, unsigne
break;

if (!after_bootmem && !e820_any_mapped(addr,addr+PUD_SIZE,0)) {
- set_pud(pud, __pud(0));
+ set_pud(pud, __pud(0));
continue;
- }
+ }

if (pud_val(*pud)) {
phys_pmd_update(pud, addr, end);
@@ -315,7 +320,7 @@ static void __meminit phys_pud_init(pud_t *pud_page, unsigned long addr, unsigne
unmap_low_page(pmd);
}
__flush_tlb();
-}
+}

static void __init find_early_table_space(unsigned long end)
{
@@ -324,13 +329,13 @@ static void __init find_early_table_space(unsigned long end)
puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
tables = round_up(puds * sizeof(pud_t), PAGE_SIZE) +
- round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+ round_up(pmds * sizeof(pmd_t), PAGE_SIZE);

- /* RED-PEN putting page tables only on node 0 could
- cause a hotspot and fill up ZONE_DMA. The page tables
- need roughly 0.5KB per GB. */
- start = 0x8000;
- table_start = find_e820_area(start, end, tables);
+ /* RED-PEN putting page tables only on node 0 could
+ cause a hotspot and fill up ZONE_DMA. The page tables
+ need roughly 0.5KB per GB. */
+ start = 0x8000;
+ table_start = find_e820_area(start, end, tables);
if (table_start == -1UL)
panic("Cannot find space for the kernel page tables");

@@ -338,24 +343,24 @@ static void __init find_early_table_space(unsigned long end)
table_end = table_start;

early_printk("kernel direct mapping tables up to %lx @ %lx-%lx\n",
- end, table_start << PAGE_SHIFT,
- (table_start << PAGE_SHIFT) + tables);
+ end, table_start << PAGE_SHIFT,
+ (table_start << PAGE_SHIFT) + tables);
}

/* Setup the direct mapping of the physical memory at PAGE_OFFSET.
- This runs before bootmem is initialized and gets pages directly from the
+ This runs before bootmem is initialized and gets pages directly from the
physical memory. To access them they are temporarily mapped. */
void __meminit init_memory_mapping(unsigned long start, unsigned long end)
-{
- unsigned long next;
+{
+ unsigned long next;

Dprintk("init_memory_mapping\n");

- /*
- * Find space for the kernel direct mapping tables.
- * Later we should allocate these tables in the local node of the memory
- * mapped. Unfortunately this is done currently before the nodes are
- * discovered.
+ /*
+ * Find space for the kernel direct mapping tables. Later we
+ * should allocate these tables in the local node of the
+ * memory mapped. Unfortunately this is done currently before
+ * the nodes are discovered.
*/
if (!after_bootmem)
find_early_table_space(end);
@@ -364,7 +369,7 @@ void __meminit init_memory_mapping(unsigned long start, unsigned long end)
end = (unsigned long)__va(end);

for (; start < end; start = next) {
- unsigned long pud_phys;
+ unsigned long pud_phys;
pgd_t *pgd = pgd_offset_k(start);
pud_t *pud;

@@ -374,13 +379,13 @@ void __meminit init_memory_mapping(unsigned long start, unsigned long end)
pud = alloc_low_page(&pud_phys);

next = start + PGDIR_SIZE;
- if (next > end)
- next = end;
+ if (next > end)
+ next = end;
phys_pud_init(pud, __pa(start), __pa(next));
if (!after_bootmem)
set_pgd(pgd_offset_k(start), mk_kernel_pgd(pud_phys));
unmap_low_page(pud);
- }
+ }

if (!after_bootmem)
mmu_cr4_features = read_cr4();
@@ -402,18 +407,20 @@ void __init paging_init(void)
}
#endif

-/* Unmap a kernel mapping if it exists. This is useful to avoid prefetches
- from the CPU leading to inconsistent cache lines. address and size
- must be aligned to 2MB boundaries.
- Does nothing when the mapping doesn't exist. */
-void __init clear_kernel_mapping(unsigned long address, unsigned long size)
+/*
+ * Unmap a kernel mapping if it exists. This is useful to avoid
+ * prefetches from the CPU leading to inconsistent cache
+ * lines. address and size must be aligned to 2MB boundaries. Does
+ * nothing when the mapping doesn't exist.
+ */
+void __init clear_kernel_mapping(unsigned long address, unsigned long size)
{
unsigned long end = address + size;

BUG_ON(address & ~LARGE_PAGE_MASK);
- BUG_ON(size & ~LARGE_PAGE_MASK);
-
- for (; address < end; address += LARGE_PAGE_SIZE) {
+ BUG_ON(size & ~LARGE_PAGE_MASK);
+
+ for (; address < end; address += LARGE_PAGE_SIZE) {
pgd_t *pgd = pgd_offset_k(address);
pud_t *pud;
pmd_t *pmd;
@@ -421,20 +428,23 @@ void __init clear_kernel_mapping(unsigned long address, unsigned long size)
continue;
pud = pud_offset(pgd, address);
if (pud_none(*pud))
- continue;
+ continue;
pmd = pmd_offset(pud, address);
if (!pmd || pmd_none(*pmd))
- continue;
- if (0 == (pmd_val(*pmd) & _PAGE_PSE)) {
- /* Could handle this, but it should not happen currently. */
- printk(KERN_ERR
- "clear_kernel_mapping: mapping has been split. will leak memory\n");
- pmd_ERROR(*pmd);
+ continue;
+ if (0 == (pmd_val(*pmd) & _PAGE_PSE)) {
+ /*
+ * Could handle this, but it should not happen
+ * currently.
+ */
+ printk(KERN_ERR "clear_kernel_mapping: mapping has "
+ "been split. will leak memory\n");
+ pmd_ERROR(*pmd);
}
- set_pmd(pmd, __pmd(0));
+ set_pmd(pmd, __pmd(0));
}
__flush_tlb_all();
-}
+}

/*
* Memory hotplug specific functions
@@ -492,10 +502,11 @@ EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);

#ifdef CONFIG_MEMORY_HOTPLUG_RESERVE
/*
- * Memory Hotadd without sparsemem. The mem_maps have been allocated in advance,
- * just online the pages.
+ * Memory Hotadd without sparsemem. The mem_maps have been allocated
+ * in advance, just online the pages.
*/
-int __add_pages(struct zone *z, unsigned long start_pfn, unsigned long nr_pages)
+int __add_pages(struct zone *z, unsigned long start_pfn,
+ unsigned long nr_pages)
{
int err = -EIO;
unsigned long pfn;
@@ -539,7 +550,7 @@ void __init mem_init(void)
totalram_pages = free_all_bootmem();
#endif
reservedpages = end_pfn - totalram_pages -
- absent_pages_in_range(0, end_pfn);
+ absent_pages_in_range(0, end_pfn);

after_bootmem = 1;

@@ -548,21 +559,22 @@ void __init mem_init(void)
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;

/* Register memory areas for /proc/kcore */
- kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT);
- kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
+ kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT);
+ kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
VMALLOC_END-VMALLOC_START);
kclist_add(&kcore_kernel, &_stext, _end - _stext);
kclist_add(&kcore_modules, (void *)MODULES_VADDR, MODULES_LEN);
- kclist_add(&kcore_vsyscall, (void *)VSYSCALL_START,
- VSYSCALL_END - VSYSCALL_START);
-
- printk("Memory: %luk/%luk available (%ldk kernel code, %ldk reserved, %ldk data, %ldk init)\n",
- (unsigned long) nr_free_pages() << (PAGE_SHIFT-10),
- end_pfn << (PAGE_SHIFT-10),
- codesize >> 10,
- reservedpages << (PAGE_SHIFT-10),
- datasize >> 10,
- initsize >> 10);
+ kclist_add(&kcore_vsyscall, (void *)VSYSCALL_START,
+ VSYSCALL_END - VSYSCALL_START);
+
+ printk("Memory: %luk/%luk available (%ldk kernel code, %ldk reserved, "
+ "%ldk data, %ldk init)\n",
+ (unsigned long) nr_free_pages() << (PAGE_SHIFT-10),
+ end_pfn << (PAGE_SHIFT-10),
+ codesize >> 10,
+ reservedpages << (PAGE_SHIFT-10),
+ datasize >> 10,
+ initsize >> 10);
}

void free_init_pages(char *what, unsigned long begin, unsigned long end)
@@ -609,14 +621,15 @@ void mark_rodata_ro(void)
#ifdef CONFIG_KPROBES
start = (unsigned long)__start_rodata;
#endif
-
+
end = (unsigned long)__end_rodata;
start = (start + PAGE_SIZE - 1) & PAGE_MASK;
end &= PAGE_MASK;
if (end <= start)
return;

- change_page_attr_addr(start, (end - start) >> PAGE_SHIFT, PAGE_KERNEL_RO);
+ change_page_attr_addr(start, (end - start) >> PAGE_SHIFT,
+ PAGE_KERNEL_RO);

printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
(end - start) >> 10);
@@ -638,8 +651,8 @@ void free_initrd_mem(unsigned long start, unsigned long end)
}
#endif

-void __init reserve_bootmem_generic(unsigned long phys, unsigned len)
-{
+void __init reserve_bootmem_generic(unsigned long phys, unsigned len)
+{
#ifdef CONFIG_NUMA
int nid = phys_to_nid(phys);
#endif
@@ -656,9 +669,9 @@ void __init reserve_bootmem_generic(unsigned long phys, unsigned len)

/* Should check here against the e820 map to avoid double free */
#ifdef CONFIG_NUMA
- reserve_bootmem_node(NODE_DATA(nid), phys, len);
-#else
- reserve_bootmem(phys, len);
+ reserve_bootmem_node(NODE_DATA(nid), phys, len);
+#else
+ reserve_bootmem(phys, len);
#endif
if (phys+len <= MAX_DMA_PFN*PAGE_SIZE) {
dma_reserve += len / PAGE_SIZE;
@@ -666,24 +679,24 @@ void __init reserve_bootmem_generic(unsigned long phys, unsigned len)
}
}

-int kern_addr_valid(unsigned long addr)
-{
+int kern_addr_valid(unsigned long addr)
+{
unsigned long above = ((long)addr) >> __VIRTUAL_MASK_SHIFT;
- pgd_t *pgd;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;

if (above != 0 && above != -1UL)
- return 0;
-
+ return 0;
+
pgd = pgd_offset_k(addr);
if (pgd_none(*pgd))
return 0;

pud = pud_offset(pgd, addr);
if (pud_none(*pud))
- return 0;
+ return 0;

pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd))
@@ -737,7 +750,7 @@ int in_gate_area_no_task(unsigned long addr)
void * __init alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
{
return __alloc_bootmem_core(pgdat->bdata, size,
- SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
+ SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
}

const char *arch_vma_name(struct vm_area_struct *vma)


2007-09-22 19:24:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] [31/50] x86_64: honor notify_die() returning NOTIFY_STOP

On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
> - notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV);
> + if (notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV) == NOTIFY_STOP)

80 chars please.

tglx


2007-09-22 19:33:34

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] [35/50] i386: Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.

On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
> From: Akinobu Mita <[email protected]>
>
> Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.
>
> Cc: "H. Peter Anvin" <[email protected]>
> Signed-off-by: Akinobu Mita <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>
> Cc: Gautham R Shenoy <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> arch/i386/kernel/cpuid.c | 32 +++++++++++++++++++-------------
> 1 file changed, 19 insertions(+), 13 deletions(-)
>
> Index: linux/arch/i386/kernel/cpuid.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/cpuid.c
> +++ linux/arch/i386/kernel/cpuid.c
> @@ -136,15 +136,18 @@ static const struct file_operations cpui
> .open = cpuid_open,
> };
>
> -static int __cpuinit cpuid_device_create(int i)
> +static int cpuid_device_create(int cpu)

__cpuinit please

Thanks,

tglx



2007-09-22 20:40:30

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

On 9/22/07, Robert Hancock <[email protected]> wrote:
> Thomas Gleixner wrote:
> > On Sat, 2007-09-22 at 10:28 -0600, Robert Hancock wrote:
> >> Yinghai Lu wrote:
> >>> No!
> >>>
> >>> MMCONFIG will not work with acpi=off any more.
> >> I don't think this is unreasonable. The ACPI MCFG table is how we are
> >> supposed to learn about the area in the first place. If we can't get the
> >> table location via an approved mechanism, and can't validate it doesn't
> >> overlap with another memory reservation or something, I really don't
> >> think we should be using it.
> >
> > We all know how correct ACPI tables are. Specifications are nice,
> > reality tells a different story.
>
> MMCONFIG can't be used without ACPI in any case unless we know where the
> table is using chipset-specific knowledge (i.e. reading the registers
> directly). Doing that without being told that this area is really
> intended to be used, via the ACPI table, is dangerous, i.e. we don't
> necessarily know if the MMCONFIG is broken on the platform in some way
> we can't detect.

The BIOS gets this info from the chipset too.
For the AMD Fam 10h Opteron, we can read the MSR for the MMCONFIG base.

>
> >
> >> I don't think it's much of an issue anyway - the chances that somebody
> >> will want to run without ACPI on a system with MCFG are pretty low given
> >> that you'll end up losing a bunch of functionality (not least of which
> >> is multi-cores).
> >
> > acpi=off is an often used debug switch and it _is_ quite useful. Taking
> > away debug functionality is not a good idea.
>
> If someone has to turn ACPI off, disabling MMCONFIG is probably the
> least of their worries..

MMCONFIG has nothing to do with ACPI... just because the MCFG is in ACPI,
must we use ACPI for MMCONFIG?

For the AMD Fam 10h Opteron, because using MMCONFIG needs to go via %eax,
the BIOS will still only supply an MCFG entry for the MCP55 SB so as not
to break other OSes... so you cannot access the extended config space for
the NB...

By enabling MMCONFIG in the NB, and reading the MMCONFIG base from the NB
(via pci_mmcfg_check_hostbridge), we get full MMCONFIG access for the NB...

Anyway, this patch alters that feature...

BTW, if you trust the MCFG in ACPI so much, why do you bother to
verify it against the DSDT...

YH

2007-09-22 20:47:34

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

On 9/22/07, Robert Hancock <[email protected]> wrote:
> Yinghai Lu wrote:
> > No!
> >
> > MMCONFIG will not work with acpi=off any more.
>
> I don't think this is unreasonable. The ACPI MCFG table is how we are
> supposed to learn about the area in the first place. If we can't get the
> table location via an approved mechanism, and can't validate it doesn't
> overlap with another memory reservation or something, I really don't
> think we should be using it.
>
> I don't think it's much of an issue anyway - the chances that somebody
> will want to run without ACPI on a system with MCFG are pretty low given
> that you'll end up losing a bunch of functionality (not least of which
> is multi-cores).

With acpi=off we do lose some features, including ACPI hotplug
and the power management features...
But we don't lose anything for NUMA (multi-cores...) and
bus-numa... (we get this info from the NB PCI config space for
AMD rev C, rev E, rev F, and Fam 10h Opterons)...

Finally, we lose the bugs introduced by the ACPI code...

YH

2007-09-22 20:57:30

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

Yinghai Lu wrote:
>>> We all know how correct ACPI tables are. Specifications are nice,
>>> reality tells a different story.
>> MMCONFIG can't be used without ACPI in any case unless we know where the
>> table is using chipset-specific knowledge (i.e. reading the registers
>> directly). Doing that without being told that this area is really
>> intended to be used, via the ACPI table, is dangerous, i.e. we don't
>> necessarily know if the MMCONFIG is broken on the platform in some way
>> we can't detect.
>
> the BIOS get these info from the chipset too.
> for AMD Fam 10h opteron, we can read that MSR for MMCONFIG base.

I think he's saying we don't know a safe place to park the MMCONFIG area
if we don't have this information. However, this applies to *any*
allocation of address space, which we do all the time, so although it is
a valid argument, this has already been decided many times over.

-hpa

2007-09-22 21:28:20

by Robert Hancock

[permalink] [raw]
Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

Yinghai Lu wrote:
> On 9/22/07, Robert Hancock <[email protected]> wrote:
>> Thomas Gleixner wrote:
>>> On Sat, 2007-09-22 at 10:28 -0600, Robert Hancock wrote:
>>>> Yinghai Lu wrote:
>>>>> No!
>>>>>
>>>>> MMCONFIG will not work with acpi=off any more.
>>>> I don't think this is unreasonable. The ACPI MCFG table is how we are
>>>> supposed to learn about the area in the first place. If we can't get the
>>>> table location via an approved mechanism, and can't validate it doesn't
>>>> overlap with another memory reservation or something, I really don't
>>>> think we should be using it.
>>> We all know how correct ACPI tables are. Specifications are nice,
>>> reality tells a different story.
>> MMCONFIG can't be used without ACPI in any case unless we know where the
>> table is using chipset-specific knowledge (i.e. reading the registers
>> directly). Doing that without being told that this area is really
>> intended to be used, via the ACPI table, is dangerous, i.e. we don't
>> necessarily know if the MMCONFIG is broken on the platform in some way
>> we can't detect.
>
> the BIOS get these info from the chipset too.
> for AMD Fam 10h opteron, we can read that MSR for MMCONFIG base.
>
>>>> I don't think it's much of an issue anyway - the chances that somebody
>>>> will want to run without ACPI on a system with MCFG are pretty low given
>>>> that you'll end up losing a bunch of functionality (not least of which
>>>> is multi-cores).
>>> acpi=off is an often used debug switch and it _is_ quite useful. Taking
>>> away debug functionality is not a good idea.
>> If someone has to turn ACPI off, disabling MMCONFIG is probably the
>> least of their worries..
>
> MMCONFIG has nothing to do ACPI..., just becase MCFG in the ACPI, we
> must use ACPI for MMCONFIG?

config PCI_MMCONFIG
bool
depends on PCI && ACPI && (PCI_GOMMCONFIG || PCI_GOANY)
default y

We already depend on ACPI for MMCONFIG support at compile time. This
patch does not change that.

We could conceivably skip the validation if ACPI was disabled, though
this would only make a difference in the few cases where we can detect
the MMCONFIG area without it (looks like currently only Intel E7520 and
945, at least in mainline).

>
> For AMD Fam 10h opteron, because using MMCONFIG need via %eax, so BIOS
> will stilll stay with MCFG entry for MCP55 SB to not break other
> os..., then you can not access ext config space for NB...
>
> with enabling MMCONFIG in NB, and read NNCONFIG BASE from NB,( via
> pci_mmcfg_check_hostbridge) that we get full MMCONFIG access for NB...
>
> Anyway this patch alter the feature...
>
> BTW if you trust MCFG in ACPI so much, why do you need to bother to
> verify that in DSDT...

One reason is that Windows pre-Vista does not use the MCFG table, but it
does use the ACPI reserved resources. Since the board/system
manufacturers have been testing against Windows, the resource
reservations are more likely to be correct than the MCFG table, which was
not used until Vista came along.

2007-09-23 01:20:27

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

On 9/22/07, Robert Hancock <[email protected]> wrote:
> Yinghai Lu wrote:
> > On 9/22/07, Robert Hancock <[email protected]> wrote:
> >> Thomas Gleixner wrote:
> >>> On Sat, 2007-09-22 at 10:28 -0600, Robert Hancock wrote:
> >>>> Yinghai Lu wrote:
> >>>>> No!
> >>>>>
> >>>>> MMCONFIG will not work with acpi=off any more.
> >>>> I don't think this is unreasonable. The ACPI MCFG table is how we are
> >>>> supposed to learn about the area in the first place. If we can't get the
> >>>> table location via an approved mechanism, and can't validate it doesn't
> >>>> overlap with another memory reservation or something, I really don't
> >>>> think we should be using it.
> >>> We all know how correct ACPI tables are. Specifications are nice,
> >>> reality tells a different story.
> >> MMCONFIG can't be used without ACPI in any case unless we know where the
> >> table is using chipset-specific knowledge (i.e. reading the registers
> >> directly). Doing that without being told that this area is really
> >> intended to be used, via the ACPI table, is dangerous, i.e. we don't
> >> necessarily know if the MMCONFIG is broken on the platform in some way
> >> we can't detect.
> >
> > the BIOS get these info from the chipset too.
> > for AMD Fam 10h opteron, we can read that MSR for MMCONFIG base.
> >
> >>>> I don't think it's much of an issue anyway - the chances that somebody
> >>>> will want to run without ACPI on a system with MCFG are pretty low given
> >>>> that you'll end up losing a bunch of functionality (not least of which
> >>>> is multi-cores).
> >>> acpi=off is an often used debug switch and it _is_ quite useful. Taking
> >>> away debug functionality is not a good idea.
> >> If someone has to turn ACPI off, disabling MMCONFIG is probably the
> >> least of their worries..
> >
> > MMCONFIG has nothing to do ACPI..., just becase MCFG in the ACPI, we
> > must use ACPI for MMCONFIG?
>
> config PCI_MMCONFIG
> bool
> depends on PCI && ACPI && (PCI_GOMMCONFIG || PCI_GOANY)
> default y
>
> We already depend on ACPI for MMCONFIG support at compile time. This
> patch does not change that.
>
> We could conceivably skip the validation if ACPI was disabled, though
> this would only make a difference in the few cases where we can detect
> the MMCONFIG area without it (looks like currently only Intel E7520 and
> 945, at least in mainline).

Can you make pci_mmcfg_late_init take a parameter indicating whether
ACPI is available?

So in acpi_init it would be:

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 9ba778a..a4a6a6f 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -746,6 +746,7 @@ static int __init acpi_init(void)

if (acpi_disabled) {
printk(KERN_INFO PREFIX "Interpreter disabled.\n");
+ pci_mmcfg_late_init(0);
return -ENODEV;
}

@@ -757,6 +758,7 @@ static int __init acpi_init(void)
result = acpi_bus_init();

if (!result) {
+ pci_mmcfg_late_init(1);
#ifdef CONFIG_PM_LEGACY
if (!PM_IS_ACTIVE())
pm_active = 1;
@@ -767,8 +769,10 @@ static int __init acpi_init(void)
result = -ENODEV;
}
#endif
- } else
+ } else {
+ pci_mmcfg_late_init(0);
disable_acpi();
+ }

return result;
}

YH

2007-09-23 01:31:32

by Oleg Verych

[permalink] [raw]
Subject: Re: [PATCH] [20/50] x86_64: Fix some broken white space in arch/x86_64/mm/init.c

Much, much better :)

> -static void __meminit phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end)
> -{

+static void __meminit phys_pud_init(pud_t *pud_page, ulong addr, ulong end)

If somebody has *strong* objections, please say so.

[]
> @@ -737,7 +750,7 @@ int in_gate_area_no_task(unsigned long addr)
> void * __init alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
> {
> return __alloc_bootmem_core(pgdat->bdata, size,
> - SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
> + SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
> }

Maybe just?

+ return __alloc_bootmem_core(pgdat->bdata, size, SMP_CACHE_BYTES, 1UL << 32, 0);

(87 is alright sometimes)

2007-09-23 01:34:31

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources

On 9/22/07, Robert Hancock <[email protected]> wrote:
> Yinghai Lu wrote:
> > On 9/22/07, Robert Hancock <[email protected]> wrote:
> >> Thomas Gleixner wrote:
> >>> On Sat, 2007-09-22 at 10:28 -0600, Robert Hancock wrote:
> >>>> Yinghai Lu wrote:
> >>>>> No!
> >>>>>
> >>>>> MMCONFIG will not work with acpi=off any more.
> >>>> I don't think this is unreasonable. The ACPI MCFG table is how we are
> >>>> supposed to learn about the area in the first place. If we can't get the
> >>>> table location via an approved mechanism, and can't validate it doesn't
> >>>> overlap with another memory reservation or something, I really don't
> >>>> think we should be using it.
> >>> We all know how correct ACPI tables are. Specifications are nice,
> >>> reality tells a different story.
> >> MMCONFIG can't be used without ACPI in any case unless we know where the
> >> table is using chipset-specific knowledge (i.e. reading the registers
> >> directly). Doing that without being told that this area is really
> >> intended to be used, via the ACPI table, is dangerous, i.e. we don't
> >> necessarily know if the MMCONFIG is broken on the platform in some way
> >> we can't detect.
> >
> > the BIOS get these info from the chipset too.
> > for AMD Fam 10h opteron, we can read that MSR for MMCONFIG base.
> >
> >>>> I don't think it's much of an issue anyway - the chances that somebody
> >>>> will want to run without ACPI on a system with MCFG are pretty low given
> >>>> that you'll end up losing a bunch of functionality (not least of which
> >>>> is multi-cores).
> >>> acpi=off is an often used debug switch and it _is_ quite useful. Taking
> >>> away debug functionality is not a good idea.
> >> If someone has to turn ACPI off, disabling MMCONFIG is probably the
> >> least of their worries..
> >
> > MMCONFIG has nothing to do ACPI..., just becase MCFG in the ACPI, we
> > must use ACPI for MMCONFIG?
>
> config PCI_MMCONFIG
> bool
> depends on PCI && ACPI && (PCI_GOMMCONFIG || PCI_GOANY)
> default y
>
> We already depend on ACPI for MMCONFIG support at compile time. This
> patch does not change that.
>
> We could conceivably skip the validation if ACPI was disabled, though
> this would only make a difference in the few cases where we can detect
> the MMCONFIG area without it (looks like currently only Intel E7520 and
> 945, at least in mainline).

I added support for the AMD Fam 10h NB, and that patch is in -mm.

I'd like to see pci_mmcfg_check_hostbridge called before any ACPI checks.

Also, in your pci_mmcfg_early_init you may be missing a call to
pci_mmcfg_insert_resource...

YH

2007-09-23 01:52:57

by Akinobu Mita

[permalink] [raw]
Subject: Re: [PATCH] [35/50] i386: Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.

2007/9/23, Thomas Gleixner <[email protected]>:
> On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
> > From: Akinobu Mita <[email protected]>
> >
> > Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.
> >
> > Cc: "H. Peter Anvin" <[email protected]>
> > Signed-off-by: Akinobu Mita <[email protected]>
> > Signed-off-by: Andi Kleen <[email protected]>
> > Cc: Gautham R Shenoy <[email protected]>
> > Cc: Oleg Nesterov <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
> > ---
> >
> > arch/i386/kernel/cpuid.c | 32 +++++++++++++++++++-------------
> > 1 file changed, 19 insertions(+), 13 deletions(-)
> >
> > Index: linux/arch/i386/kernel/cpuid.c
> > ===================================================================
> > --- linux.orig/arch/i386/kernel/cpuid.c
> > +++ linux/arch/i386/kernel/cpuid.c
> > @@ -136,15 +136,18 @@ static const struct file_operations cpui
> > .open = cpuid_open,
> > };
> >
> > -static int __cpuinit cpuid_device_create(int i)
> > +static int cpuid_device_create(int cpu)
>
> __cpuinit please
>

Yes. This eliminates an earlier patch in this series.
([22/50] i386: Misc cpuinit annotation)

2007-09-23 07:52:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] [35/50] i386: Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.

On Sun, 2007-09-23 at 10:52 +0900, Akinobu Mita wrote:
> > > arch/i386/kernel/cpuid.c | 32 +++++++++++++++++++-------------
> > > 1 file changed, 19 insertions(+), 13 deletions(-)
> > >
> > > Index: linux/arch/i386/kernel/cpuid.c
> > > ===================================================================
> > > --- linux.orig/arch/i386/kernel/cpuid.c
> > > +++ linux/arch/i386/kernel/cpuid.c
> > > @@ -136,15 +136,18 @@ static const struct file_operations cpui
> > > .open = cpuid_open,
> > > };
> > >
> > > -static int __cpuinit cpuid_device_create(int i)
> > > +static int cpuid_device_create(int cpu)
> >
> > __cpuinit please
> >
>
> Yes. This eliminates earlier patch in this series.
> ([22/50] i386: Misc cpuinit annotation)

No, it's even worse:

#22 is applied before #35.
#35 reverts the __cpuinit annotation of #22 with its modifications
of cpuid_device_create().

tglx


2007-09-24 08:22:39

by Jan Beulich

[permalink] [raw]
Subject: Re: [patches] [PATCH] [13/50] x86: Fix and reenable CLFLUSH support inchange_page_attr()

>@@ -162,7 +198,7 @@ __change_page_attr(unsigned long address
> /* on x86-64 the direct mapping set at boot is not using 4k pages */
> BUG_ON(PageReserved(kpte_page));
>
>- save_page(kpte_page);
>+ save_page(kpte_page, 0);
> if (page_private(kpte_page) == 0)
> revert_page(address, ref_prot);
> return 0;

What is the point of continuing to launder kpte_page here? Page table pages
never get their caching attributes changed, nor would their direct mapping
ever change. (Same for i386, obviously.)

Jan

2007-09-24 08:31:08

by Jan Beulich

[permalink] [raw]
Subject: Re: [patches] [PATCH] [15/50] x86_64: Return EINVAL for unknown addressin change_page_attr

This should be accompanied by

addr2 = __START_KERNEL_map + __pa(address);
/* Make sure the kernel mappings stay executable */
prot2 = pte_pgprot(pte_mkexec(pfn_pte(0, prot)));
- err = __change_page_attr(addr2, pfn, prot2,
+ (void)__change_page_attr(addr2, pfn, prot2,
kref_prot(addr2));
}
}

as otherwise it is non-obvious why there's no check of err (which so far really
was missing). The reason this must be tolerated here is free_init_pages()/
free_initmem() removing the translation for the affected kernel image pages
altogether.

Jan

>>> Andi Kleen <[email protected]> 22.09.07 00:32 >>>

Matches what i386 does and makes more sense.

Signed-off-by: Andi Kleen <[email protected]>

---
arch/x86_64/mm/pageattr.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux/arch/x86_64/mm/pageattr.c
===================================================================
--- linux.orig/arch/x86_64/mm/pageattr.c
+++ linux/arch/x86_64/mm/pageattr.c
@@ -151,7 +151,9 @@ __change_page_attr(unsigned long address
pgprot_t ref_prot2;

kpte = lookup_address(address);
- if (!kpte) return 0;
+ if (!kpte)
+ return -EINVAL;
+
kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK);
BUG_ON(PageCompound(kpte_page));
BUG_ON(PageLRU(kpte_page));
_______________________________________________
patches mailing list
[email protected]
https://www.x86-64.org/mailman/listinfo/patches

2007-09-30 10:09:48

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu

On Saturday 22 September 2007 00:45:39 Dave Jones wrote:
> On Sat, Sep 22, 2007 at 12:32:02AM +0200, Andi Kleen wrote:
>
>
> > + Select this for:
> > + Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename:
> > + -Willamette
> > + -Northwood
> > + -Mobile Pentium 4
> > + -Mobile Pentium 4 M
> > + -Extreme Edition (Gallatin)
> > + -Prescott
> > + -Prescott 2M
> > + -Cedar Mill
> > + -Presler
> > + -Smithfiled
> > + Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename:
> > + -Foster
> > + -Prestonia
> > + -Gallatin
> > + -Nocona
> > + -Irwindale
> > + -Cranford
> > + -Potomac
> > + -Paxville
> > + -Dempsey
>
> This seems like yet another list that will need to be perpetually
> kept up to date, and given 99% of users don't know the codename
> of their core, just the marketing name, I question its value.

The problem is that it is hard to distinguish Core2-based Xeons
from P4-based Xeons.

There won't be any new Pentium 4 cores, so that list should be static.
But yes, the C2 list is a little problematic. I will remove that
and just say "family 6".

Perhaps we should just bite the bullet and add an "optimize for the
current CPU" option.

>
> > + more info: http://balusc.xs4all.nl/srv/har-cpu.html
>
> This URL is dead already.
>
> > config MPSC
> > bool "Intel P4 / older Netburst based Xeon"
> > help
>
> sidenote: I always wondered what 'PSC' stood for ?

Prescott; the Intel codename for their first x86-64 core.
Rhymes with K8 which was AMD's codename for the same.

Admittedly MCORE2 doesn't fit the pattern; it should have
been MMEROM. But by the time the C2 support was implemented,
the marketing name was already known, which wasn't the case
with the others.

-Andi

2007-10-01 10:38:32

by Andi Kleen

[permalink] [raw]
Subject: Re: [patches] [PATCH] [13/50] x86: Fix and reenable CLFLUSH support inchange_page_attr()

On Monday 24 September 2007 10:23:29 Jan Beulich wrote:
> >@@ -162,7 +198,7 @@ __change_page_attr(unsigned long address
> > /* on x86-64 the direct mapping set at boot is not using 4k pages */
> > BUG_ON(PageReserved(kpte_page));
> >
> >- save_page(kpte_page);
> >+ save_page(kpte_page, 0);
> > if (page_private(kpte_page) == 0)
> > revert_page(address, ref_prot);
> > return 0;
>
> What is the point of continuing to launder kpte_page here? Page table pages
> never get their caching attributes changed, nor would their direct mapping
> ever change. (Same for i386, obviously.)

We can only free the page table after all the TLBs have been flushed;
otherwise other CPUs can walk already overwritten data as page tables (which can
then cause various problems). This is similar to the lazy TLB flush logic in the
standard VM.

We don't need to cache flush it though, that is why 0 is passed here.

Admittedly the function argument is a little bogus because the caller
could just do it; didn't change that yet.

-Andi

2007-10-01 10:40:48

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [50/50] x86_64: Remove fpu io port resource


> > ===================================================================
> > --- linux.orig/arch/x86_64/kernel/setup.c
> > +++ linux/arch/x86_64/kernel/setup.c
> > @@ -121,8 +121,6 @@ struct resource standard_io_resources[]
> > .flags = IORESOURCE_BUSY | IORESOURCE_IO },
> > { .name = "dma2", .start = 0xc0, .end = 0xdf,
> > .flags = IORESOURCE_BUSY | IORESOURCE_IO },
> > - { .name = "fpu", .start = 0xf0, .end = 0xff,
> > - .flags = IORESOURCE_BUSY | IORESOURCE_IO }

BTW, the patch has been dropped meanwhile -- Maciej correctly pointed
out that the south bridges likely decode it anyway.

> Since we are merging x86 and x86-64, I think it would be nice at least
> to CC Thomas on patches that increase 32/64-bit differences... because
> won't this patch have to be partial un-done when we merge i386 and x86-64?

So far I still maintain i386 and x86-64. If Thomas wants to take both
over completely he can do that; but I won't bother handling any patches i didn't
write then anymore.

-Andi

2007-10-01 10:59:34

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [13/50] x86: Fix and reenable CLFLUSH support in change_page_attr()

On Saturday 22 September 2007 07:47:59 Oleg Verych wrote:
> * Sat, 22 Sep 2007 00:32:11 +0200 (CEST)
> []
> > - flush_map(&l);
> > + flush_map(&arg);
>
> + flush_map(&arg.l);
>
> CC arch/x86_64/mm/pageattr.o
> arch/x86_64/mm/pageattr.c: In function 'global_flush_tlb':
> arch/x86_64/mm/pageattr.c:274: warning: passing argument 1 of 'flush_map' from incompatible pointer type

That was already fixed; hmm perhaps that was an old patch.

-Andi

2007-10-01 11:17:34

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu


>
> This will never be up to date.

It will. There are no new P4 cores anymore.

-Andi

2007-10-01 11:30:29

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] [50/50] x86_64: Remove fpu io port resource

Andi Kleen wrote:
> So far I still maintain i386 and x86-64. If Thomas wants to take both
> over completely he can do that; but I won't bother handling any patches i didn't
> write then anymore.


What does that mean?

Didn't we all agree that x86 and x86-64 are going to be merged?

If yes, isn't it logical to avoid patches that extend the separation?

This is just basic playing-well-with-others fundamentals.

Jeff


2007-10-01 11:48:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [50/50] x86_64: Remove fpu io port resource

On Monday 01 October 2007 13:30:12 Jeff Garzik wrote:
> Andi Kleen wrote:
> > So far I still maintain i386 and x86-64. If Thomas wants to take both
> > over completely he can do that; but I won't bother handling any patches i didn't
> > write then anymore.
>
>
> What does that mean?

What I wrote.

> Didn't we all agree that x86 and x86-64 are going to be merged?

I didn't agree, no.

-Andi

2007-10-01 13:33:47

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] [50/50] x86_64: Remove fpu io port resource

Andi Kleen wrote:
> On Monday 01 October 2007 13:30:12 Jeff Garzik wrote:
>> Andi Kleen wrote:
>>> So far I still maintain i386 and x86-64. If Thomas wants to take both
>>> over completely he can do that; but I won't bother handling any patches I didn't
>>> write then anymore.
>>
>> What does that mean?
>
> What I wrote.
>
>> Didn't we all agree that x86 and x86-64 are going to be merged?
>
> I didn't agree, no.

It's called consensus. Even Linus agreed to the merge.

So you are basically saying "fuck off" to the entire community?

Even in libata I have to listen to consensus, having just merged the
port multiplier support.

Jeff



2007-10-01 14:16:50

by Mark Lord

[permalink] [raw]
Subject: Re: [PATCH] [50/50] x86_64: Remove fpu io port resource

Jeff Garzik wrote:
>
> Even in libata I have to listen to consensus, having just merged the
> port multiplier support.
>
> Jeff

Thanks, Jeff!

2007-10-01 16:48:06

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS

On Saturday 22 September 2007 11:17:08 Joerg Roedel wrote:
> I don't think we need this patch. When SVM is disabled KVM will tell on
> module load.

The point is that people often want to know in advance (before they
even try to use KVM or Xen) whether their CPU and BIOS support this.

> Further with SVM-lock it will be possible to re-enable SVM
> even if it was disabled by BIOS using a key. In this case the user of
> SVM has to clear the capability bit you set in this patch for all cpus.

Not sure I follow you. Can you clarify? What exactly needs to be
done to get a full, non-reversible lock?

-Andi

2007-10-01 20:12:34

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS

On Mon, Oct 01, 2007 at 06:47:50PM +0200, Andi Kleen wrote:
> On Saturday 22 September 2007 11:17:08 Joerg Roedel wrote:
> > I don't think we need this patch. When SVM is disabled KVM will tell on
> > module load.
>
> The point is that people often want to know in advance (before they
> even try to use KVM or Xen) whether their CPU and BIOS support this.

If the CPU supports SVM, this is visible to the user because the SVM
feature flag does not disappear when it's disabled. But on CPUs with
the SVM-lock feature it can be re-enabled in a secure way under some
circumstances, so the information in /proc/cpuinfo will not be
reliable. Maybe we can check for it in identify_cpu() and print to the
kernel log if it's disabled? That would be visible to the user through
dmesg.

Joerg

2007-10-01 21:47:17

by Andi Kleen

[permalink] [raw]
Subject: Re: [patches] [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS



> feature flag does not disappear when it's disabled. But on CPUs with
> the SVM-lock feature it can be re-enabled in a secure way under some
> circumstances

Who would re-enable it, and in what circumstances?

-Andi

2007-10-01 22:14:11

by Joerg Roedel

[permalink] [raw]
Subject: Re: [patches] [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS

On Mon, Oct 01, 2007 at 11:45:22PM +0200, Andi Kleen wrote:
>
>
> > feature flag does not disappear when it's disabled. But on CPUs with
> > the SVM-lock feature it can be re-enabled in a secure way under some
> > circumstances
>
> Who would re-enable it, and in what circumstances?

I plan to implement the ability to re-enable it in the SVM module of KVM.
The key required for this will be passed as a module parameter.

Joerg

2007-10-02 14:33:16

by Alan

[permalink] [raw]
Subject: Re: [PATCH] [50/50] x86_64: Remove fpu io port resource

On Mon, 01 Oct 2007 07:30:12 -0400
Jeff Garzik <[email protected]> wrote:

> Andi Kleen wrote:
> > So far I still maintain i386 and x86-64. If Thomas wants to take both
> > over completely he can do that; but I won't bother handling any patches I didn't
> > write then anymore.
>
>
> What does that mean?
>
> Didn't we all agree that x86 and x86-64 are going to be merged?
>
> If yes, isn't it logical to avoid patches that extend the separation?
>
> This is just basic playing-well-with-others fundamentals.

No, Andi is right - he didn't agree. He made it clear he didn't agree,
and from what he said at the kernel summit I'd assumed this meant that
from 2.6.24 we'd simply have a different maintainer team for x86-32/64.

We all (except Andi) agreed it was going to be merged and Andi made his
position clear - Thomas and Ingo just need to update the MAINTAINERS file
and get on with it.

Alan