Hi Ingo,
Here's a series of little cleanup patches to kernel/apic/io_apic.c which
I fixed as I was reviewing the code.
There are a few classes of cleanup here:
- Unification of 32 and 64-bit variants, where they are functionally
identical.
- Removing unnecessary #ifdef CONFIG_X86_32/64 (similar to unification,
except its code which is at worst a no-op on the other platform).
This cleans up the appearance of the code, and makes compile-test
coverage broader. For the 82093AA workaround, the test should be
more fine-grained (ie, actually look at the I/O APIC version).
- Restructure loops to use more conventional forms (lots of odd loops
which can be easily represented by a plain for() loop)
- Fix some comments, delete redundant decls, etc.
Jeremy Fitzhardinge (12):
x86/acpi: acpi_parse_madt_ioapic_entries: remove redundant braces
x86/ioapic.c: ioapic_modify_irq is too large to inline
x86/ioapic.c: unify __mask_IO_APIC_irq()
x86/ioapic.c: remove #ifdef for 82093AA workaround
x86/ioapic.c: remove redundant declaration of irq_pin_list
x86/ioapic.c: move lost comment to what seems like appropriate place
x86/ioapic.c: convert io_apic_level_ack_pending loop to normal for() loop
x86/ioapic.c: simplify add_pin_to_irq_node()
x86/ioapic.c: convert replace_pin_at_irq_node to conventional for() loop
x86/ioapic.c: clean up replace_pin_at_irq_node logic and comments
x86/ioapic.c: convert __target_IO_APIC_irq to conventional for() loop
x86/ioapic.c: unify ioapic_retrigger_irq()
arch/x86/kernel/acpi/boot.c | 3 +-
arch/x86/kernel/apic/io_apic.c | 144 ++++++++++++----------------------------
2 files changed, 45 insertions(+), 102 deletions(-)
Thanks,
J
From: Jeremy Fitzhardinge <[email protected]>
We don't put braces around a single statement.
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/acpi/boot.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 6310861..2410469 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1249,9 +1249,8 @@ static int __init acpi_parse_madt_ioapic_entries(void)
* If MPS is present, it will handle them,
* otherwise the system will stay in PIC mode
*/
- if (acpi_disabled || acpi_noirq) {
+ if (acpi_disabled || acpi_noirq)
return -ENODEV;
- }
if (!cpu_has_apic)
return -ENODEV;
--
1.6.2.2
From: Robert Richter <[email protected]>
The current userland does not yet fully support all cpu types
implemented in the kernel. With the module parameter:
oprofile.cpu_type=core_2
the kernel reports a core_2 cpu to the userland on an Intel system and
thus makes oprofile usable with current distros.
Cc: Linus Torvalds <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
Documentation/kernel-parameters.txt | 3 ++-
arch/x86/oprofile/nmi_int.c | 17 ++++++++++++++---
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6ce5f48..ea7ead3 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1653,10 +1653,11 @@ and is between 256 and 4096 characters. It is defined in the file
oprofile.cpu_type= Force an oprofile cpu type
This might be useful if you have an older oprofile
userland or if you want common events.
- Format: { archperfmon }
+ Format: { archperfmon | core_2 }
archperfmon: [X86] Force use of architectural
perfmon on Intel CPUs instead of the
CPU specific event set.
+ core_2: [X86] On Intel systems: report core_2 CPU.
osst= [HW,SCSI] SCSI Tape Driver
Format: <buffer_size>,<write_threshold>
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 3b285e6..d7348a2 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -386,12 +386,20 @@ static int __init p4_init(char **cpu_type)
return 0;
}
-static int force_arch_perfmon;
+static enum {
+ NONE = 0,
+ ARCH_PERFMON,
+ CORE_2,
+} forced_cpu;
+
static int force_cpu_type(const char *str, struct kernel_param *kp)
{
if (!strcmp(str, "archperfmon")) {
- force_arch_perfmon = 1;
+ forced_cpu = ARCH_PERFMON;
printk(KERN_INFO "oprofile: forcing architectural perfmon\n");
+ } else if (!strcmp(str, "core_2")) {
+ forced_cpu = CORE_2;
+ printk(KERN_INFO "oprofile: forcing core_2\n");
}
return 0;
@@ -402,9 +410,12 @@ static int __init ppro_init(char **cpu_type)
{
__u8 cpu_model = boot_cpu_data.x86_model;
- if (force_arch_perfmon && cpu_has_arch_perfmon)
+ if (forced_cpu == ARCH_PERFMON && cpu_has_arch_perfmon)
return 0;
+ if (forced_cpu == CORE_2)
+ cpu_model = 15;
+
switch (cpu_model) {
case 0 ... 2:
*cpu_type = "i386/ppro";
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
If ioapic_modify_irq() is marked inline, it gets inlined several times.
Un-inlining it saves around 200 bytes in .text for me.
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 1946fac..01aa429 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -553,9 +553,9 @@ static void __init replace_pin_at_irq_node(struct irq_cfg *cfg, int node,
add_pin_to_irq_node(cfg, node, newapic, newpin);
}
-static inline void io_apic_modify_irq(struct irq_cfg *cfg,
- int mask_and, int mask_or,
- void (*final)(struct irq_pin_list *entry))
+static void io_apic_modify_irq(struct irq_cfg *cfg,
+ int mask_and, int mask_or,
+ void (*final)(struct irq_pin_list *entry))
{
int pin;
struct irq_pin_list *entry;
--
1.6.2.2
From: Ingo Molnar <[email protected]>
A number of places said 'events' while they should say 'samples'.
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <[email protected]>
---
Documentation/perf_counter/builtin-record.c | 22 ++++++------
Documentation/perf_counter/builtin-top.c | 46 +++++++++++++-------------
2 files changed, 34 insertions(+), 34 deletions(-)
diff --git a/Documentation/perf_counter/builtin-record.c b/Documentation/perf_counter/builtin-record.c
index e2301f3..d4ad305 100644
--- a/Documentation/perf_counter/builtin-record.c
+++ b/Documentation/perf_counter/builtin-record.c
@@ -65,7 +65,7 @@ static unsigned int mmap_read_head(struct mmap_data *md)
return head;
}
-static long events;
+static long samples;
static struct timeval last_read, this_read;
static __u64 bytes_written;
@@ -83,7 +83,7 @@ static void mmap_read(struct mmap_data *md)
/*
* If we're further behind than half the buffer, there's a chance
- * the writer will bite our tail and screw up the events under us.
+ * the writer will bite our tail and mess up the samples under us.
*
* If we somehow ended up ahead of the head, we got messed up.
*
@@ -109,7 +109,7 @@ static void mmap_read(struct mmap_data *md)
last_read = this_read;
if (old != head)
- events++;
+ samples++;
size = head - old;
@@ -257,7 +257,7 @@ out_failure:
exit(EXIT_FAILURE);
}
-static void pid_synthesize_mmap_events(pid_t pid)
+static void pid_synthesize_mmap_samples(pid_t pid)
{
char filename[PATH_MAX];
FILE *fp;
@@ -315,7 +315,7 @@ static void pid_synthesize_mmap_events(pid_t pid)
fclose(fp);
}
-static void synthesize_events(void)
+static void synthesize_samples(void)
{
DIR *proc;
struct dirent dirent, *next;
@@ -331,7 +331,7 @@ static void synthesize_events(void)
continue;
pid_synthesize_comm_event(pid, 1);
- pid_synthesize_mmap_events(pid);
+ pid_synthesize_mmap_samples(pid);
}
closedir(proc);
@@ -396,7 +396,7 @@ static void open_counters(int cpu, pid_t pid)
if (pid > 0) {
pid_synthesize_comm_event(pid, 0);
- pid_synthesize_mmap_events(pid);
+ pid_synthesize_mmap_samples(pid);
}
group_fd = -1;
@@ -469,17 +469,17 @@ static int __cmd_record(int argc, const char **argv)
}
if (system_wide)
- synthesize_events();
+ synthesize_samples();
while (!done) {
- int hits = events;
+ int hits = samples;
for (i = 0; i < nr_cpu; i++) {
for (counter = 0; counter < nr_counters; counter++)
mmap_read(&mmap_array[i][counter]);
}
- if (hits == events)
+ if (hits == samples)
ret = poll(event_array, nr_poll, 100);
}
@@ -487,7 +487,7 @@ static int __cmd_record(int argc, const char **argv)
* Approximate RIP event size: 24 bytes.
*/
fprintf(stderr,
- "[ perf record: Captured and wrote %.3f MB %s (~%lld events) ]\n",
+ "[ perf record: Captured and wrote %.3f MB %s (~%lld samples) ]\n",
(double)bytes_written / 1024.0 / 1024.0,
output_name,
bytes_written / 24);
diff --git a/Documentation/perf_counter/builtin-top.c b/Documentation/perf_counter/builtin-top.c
index 2fee595..ff7e13c 100644
--- a/Documentation/perf_counter/builtin-top.c
+++ b/Documentation/perf_counter/builtin-top.c
@@ -137,8 +137,8 @@ static double sym_weight(const struct sym_entry *sym)
return weight;
}
-static long events;
-static long userspace_events;
+static long samples;
+static long userspace_samples;
static const char CONSOLE_CLEAR[] = "[H[2J";
static void __list_insert_active_sym(struct sym_entry *syme)
@@ -177,14 +177,14 @@ static void print_sym_table(void)
{
int printed = 0, j;
int counter;
- float events_per_sec = events/delay_secs;
- float kevents_per_sec = (events-userspace_events)/delay_secs;
- float sum_kevents = 0.0;
+ float samples_per_sec = samples/delay_secs;
+ float ksamples_per_sec = (samples-userspace_samples)/delay_secs;
+ float sum_ksamples = 0.0;
struct sym_entry *syme, *n;
struct rb_root tmp = RB_ROOT;
struct rb_node *nd;
- events = userspace_events = 0;
+ samples = userspace_samples = 0;
/* Sort the active symbols */
pthread_mutex_lock(&active_symbols_lock);
@@ -196,7 +196,7 @@ static void print_sym_table(void)
if (syme->snap_count != 0) {
syme->weight = sym_weight(syme);
rb_insert_active_sym(&tmp, syme);
- sum_kevents += syme->snap_count;
+ sum_ksamples += syme->snap_count;
for (j = 0; j < nr_counters; j++)
syme->count[j] = zero ? 0 : syme->count[j] * 7 / 8;
@@ -209,8 +209,8 @@ static void print_sym_table(void)
printf(
"------------------------------------------------------------------------------\n");
printf( " PerfTop:%8.0f irqs/sec kernel:%4.1f%% [",
- events_per_sec,
- 100.0 - (100.0*((events_per_sec-kevents_per_sec)/events_per_sec)));
+ samples_per_sec,
+ 100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)));
if (nr_counters == 1) {
printf("%d", event_count[0]);
@@ -246,12 +246,12 @@ static void print_sym_table(void)
printf("------------------------------------------------------------------------------\n\n");
if (nr_counters == 1)
- printf(" events pcnt");
+ printf(" samples pcnt");
else
- printf(" weight events pcnt");
+ printf(" weight samples pcnt");
printf(" RIP kernel function\n"
- " ______ ______ _____ ________________ _______________\n\n"
+ " ______ _______ _____ ________________ _______________\n\n"
);
for (nd = rb_first(&tmp); nd; nd = rb_next(nd)) {
@@ -263,8 +263,8 @@ static void print_sym_table(void)
if (++printed > print_entries || syme->snap_count < count_filter)
continue;
- pcnt = 100.0 - (100.0 * ((sum_kevents - syme->snap_count) /
- sum_kevents));
+ pcnt = 100.0 - (100.0 * ((sum_ksamples - syme->snap_count) /
+ sum_ksamples));
/*
* We color high-overhead entries in red, low-overhead
@@ -276,9 +276,9 @@ static void print_sym_table(void)
color = PERF_COLOR_GREEN;
if (nr_counters == 1)
- printf("%19.2f - ", syme->weight);
+ printf("%20.2f - ", syme->weight);
else
- printf("%8.1f %10ld - ", syme->weight, syme->snap_count);
+ printf("%9.1f %10ld - ", syme->weight, syme->snap_count);
color_fprintf(stdout, color, "%4.1f%%", pcnt);
printf(" - %016llx : %s\n", sym->start, sym->name);
@@ -318,7 +318,7 @@ static int symbol_filter(struct dso *self, struct symbol *sym)
return 1;
syme = dso__sym_priv(self, sym);
- /* Tag events to be skipped. */
+ /* Tag samples to be skipped. */
if (!strcmp("default_idle", name) ||
!strcmp("cpu_idle", name) ||
!strcmp("enter_idle", name) ||
@@ -405,15 +405,15 @@ static void record_ip(uint64_t ip, int counter)
}
}
- events--;
+ samples--;
}
static void process_event(uint64_t ip, int counter)
{
- events++;
+ samples++;
if (ip < min_ip || ip > max_ip) {
- userspace_events++;
+ userspace_samples++;
return;
}
@@ -451,7 +451,7 @@ static void mmap_read(struct mmap_data *md)
/*
* If we're further behind than half the buffer, there's a chance
- * the writer will bite our tail and screw up the events under us.
+ * the writer will bite our tail and mess up the samples under us.
*
* If we somehow ended up ahead of the head, we got messed up.
*
@@ -608,14 +608,14 @@ static int __cmd_top(void)
}
while (1) {
- int hits = events;
+ int hits = samples;
for (i = 0; i < nr_cpus; i++) {
for (counter = 0; counter < nr_counters; counter++)
mmap_read(&mmap_array[i][counter]);
}
- if (hits == events)
+ if (hits == samples)
ret = poll(event_array, nr_poll, 100);
}
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
The structure is defined immediately below, so there's no need
to forward declare it.
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 54ec66d..ce5e888 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -116,8 +116,6 @@ static int __init parse_noapic(char *str)
}
early_param("noapic", parse_noapic);
-struct irq_pin_list;
-
/*
* This is performance-critical, we want to do it O(1)
*
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
Rather than duplicating the same alloc/init code twice, restructure
the function to look for duplicates and then add an entry
if none is found.
This function is not performance critical; all but one of its callers
are __init functions, and the non-__init caller is for PCI device setup.
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 28 ++++++++--------------------
1 files changed, 8 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 8ee2b14..350d99b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -487,34 +487,22 @@ static void ioapic_mask_entry(int apic, int pin)
*/
static void add_pin_to_irq_node(struct irq_cfg *cfg, int node, int apic, int pin)
{
- struct irq_pin_list *entry;
+ struct irq_pin_list **entryp, *entry;
- entry = cfg->irq_2_pin;
- if (!entry) {
- entry = get_one_free_irq_2_pin(node);
- if (!entry) {
- printk(KERN_ERR "can not alloc irq_2_pin to add %d - %d\n",
- apic, pin);
- return;
- }
- cfg->irq_2_pin = entry;
- entry->apic = apic;
- entry->pin = pin;
- return;
- }
-
- while (entry->next) {
+ for (entryp = &cfg->irq_2_pin;
+ *entryp != NULL;
+ entryp = &(*entryp)->next) {
+ entry = *entryp;
/* not again, please */
if (entry->apic == apic && entry->pin == pin)
return;
-
- entry = entry->next;
}
- entry->next = get_one_free_irq_2_pin(node);
- entry = entry->next;
+ entry = get_one_free_irq_2_pin(node);
entry->apic = apic;
entry->pin = pin;
+
+ *entryp = entry;
}
/*
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
Use a normal for() loop in __target_IO_APIC_irq().
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 9 +--------
1 files changed, 1 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index f466a1e..377aac3 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2234,13 +2234,9 @@ static void __target_IO_APIC_irq(unsigned int irq, unsigned int dest, struct irq
struct irq_pin_list *entry;
u8 vector = cfg->vector;
- entry = cfg->irq_2_pin;
- for (;;) {
+ for (entry = cfg->irq_2_pin; entry != NULL; entry = entry->next) {
unsigned int reg;
- if (!entry)
- break;
-
apic = entry->apic;
pin = entry->pin;
/*
@@ -2253,9 +2249,6 @@ static void __target_IO_APIC_irq(unsigned int irq, unsigned int dest, struct irq
reg &= ~IO_APIC_REDIR_VECTOR_MASK;
reg |= vector;
io_apic_modify(apic, 0x10 + pin*2, reg);
- if (!entry->next)
- break;
- entry = entry->next;
}
}
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
There's no need for a control variable in replace_pin_at_irq_node();
it can just return if it finds the old apic/pin to replace.
If the loop terminates, then it didn't find the old apic/pin, so it can
add the new ones.
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index b71d2a2..f466a1e 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -509,25 +509,22 @@ static void add_pin_to_irq_node(struct irq_cfg *cfg, int node, int apic, int pin
* Reroute an IRQ to a different pin.
*/
static void __init replace_pin_at_irq_node(struct irq_cfg *cfg, int node,
- int oldapic, int oldpin,
- int newapic, int newpin)
+ int oldapic, int oldpin,
+ int newapic, int newpin)
{
struct irq_pin_list *entry;
- int replaced = 0;
for (entry = cfg->irq_2_pin; entry != NULL; entry = entry->next) {
if (entry->apic == oldapic && entry->pin == oldpin) {
entry->apic = newapic;
entry->pin = newpin;
- replaced = 1;
/* every one is different, right? */
- break;
+ return;
}
}
- /* why? call replace before add? */
- if (!replaced)
- add_pin_to_irq_node(cfg, node, newapic, newpin);
+ /* old apic/pin didn't exist, so just add new ones */
+ add_pin_to_irq_node(cfg, node, newapic, newpin);
}
static void io_apic_modify_irq(struct irq_cfg *cfg,
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
The 32 and 64-bit versions of ioapic_retrigger_irq() are identical
except the 64-bit one takes vector_lock. vector_lock is defined and
used on 32-bit too, so just use a common ioapic_retrigger_irq().
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 9 ---------
1 files changed, 0 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 377aac3..c6acce2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2176,7 +2176,6 @@ static unsigned int startup_ioapic_irq(unsigned int irq)
return was_pending;
}
-#ifdef CONFIG_X86_64
static int ioapic_retrigger_irq(unsigned int irq)
{
@@ -2189,14 +2188,6 @@ static int ioapic_retrigger_irq(unsigned int irq)
return 1;
}
-#else
-static int ioapic_retrigger_irq(unsigned int irq)
-{
- apic->send_IPI_self(irq_cfg(irq)->vector);
-
- return 1;
-}
-#endif
/*
* Level and edge triggered IO-APIC interrupts need different handling,
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
The comment got separated from its subject, so move it to what
appears to be the right place, and update to describe the current
structure.
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 12 +++++-------
1 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ce5e888..089a4e4 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -116,13 +116,6 @@ static int __init parse_noapic(char *str)
}
early_param("noapic", parse_noapic);
-/*
- * This is performance-critical, we want to do it O(1)
- *
- * the indexing order of this array favors 1:1 mappings
- * between pins and IRQs.
- */
-
struct irq_pin_list {
int apic, pin;
struct irq_pin_list *next;
@@ -137,6 +130,11 @@ static struct irq_pin_list *get_one_free_irq_2_pin(int node)
return pin;
}
+/*
+ * This is performance-critical, we want to do it O(1)
+ *
+ * Most irqs are mapped 1:1 with pins.
+ */
struct irq_cfg {
struct irq_pin_list *irq_2_pin;
cpumask_var_t domain;
--
1.6.2.2
From: Catalin Marinas <[email protected]>
_sdata is a common symbol defined by many architectures and made
available to the kernel via asm-generic/sections.h. Kmemleak uses this
symbol when scanning the data sections.
[ Impact: add new global symbol ]
Signed-off-by: Catalin Marinas <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/vmlinux.lds.S | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4c85b2e..367e878 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -108,6 +108,8 @@ SECTIONS
/* Data */
. = ALIGN(PAGE_SIZE);
.data : AT(ADDR(.data) - LOAD_OFFSET) {
+ /* Start of data section */
+ _sdata = .;
DATA_DATA
CONSTRUCTORS
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
Use a conventional for() loop in replace_pin_at_irq_node().
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 350d99b..b71d2a2 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -512,10 +512,10 @@ static void __init replace_pin_at_irq_node(struct irq_cfg *cfg, int node,
int oldapic, int oldpin,
int newapic, int newpin)
{
- struct irq_pin_list *entry = cfg->irq_2_pin;
+ struct irq_pin_list *entry;
int replaced = 0;
- while (entry) {
+ for (entry = cfg->irq_2_pin; entry != NULL; entry = entry->next) {
if (entry->apic == oldapic && entry->pin == oldpin) {
entry->apic = newapic;
entry->pin = newpin;
@@ -523,7 +523,6 @@ static void __init replace_pin_at_irq_node(struct irq_cfg *cfg, int node,
/* every one is different, right? */
break;
}
- entry = entry->next;
}
/* why? call replace before add? */
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
Convert the unconventional loop in io_apic_level_ack_pending() to
a conventional for() loop.
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 8 +-------
1 files changed, 1 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 089a4e4..8ee2b14 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -408,13 +408,10 @@ static bool io_apic_level_ack_pending(struct irq_cfg *cfg)
unsigned long flags;
spin_lock_irqsave(&ioapic_lock, flags);
- entry = cfg->irq_2_pin;
- for (;;) {
+ for (entry = cfg->irq_2_pin; entry != NULL; entry = entry->next) {
unsigned int reg;
int pin;
- if (!entry)
- break;
pin = entry->pin;
reg = io_apic_read(entry->apic, 0x10 + pin*2);
/* Is the remote IRR bit set? */
@@ -422,9 +419,6 @@ static bool io_apic_level_ack_pending(struct irq_cfg *cfg)
spin_unlock_irqrestore(&ioapic_lock, flags);
return true;
}
- if (!entry->next)
- break;
- entry = entry->next;
}
spin_unlock_irqrestore(&ioapic_lock, flags);
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
The main difference between 32 and 64-bit __mask_IO_APIC_irq() does a
readback from the I/O APIC to synchronize it.
If there's a hardware requirement to do a readback sync after updating
an APIC register, then it will be a hardware requrement regardless of
whether the kernel is compiled 32 or 64-bit.
Unify __mask_IO_APIC_irq() using the 64-bit version which always syncs
with io_apic_sync().
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 7 +------
1 files changed, 1 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 01aa429..e767bcb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -577,7 +577,6 @@ static void __unmask_IO_APIC_irq(struct irq_cfg *cfg)
io_apic_modify_irq(cfg, ~IO_APIC_REDIR_MASKED, 0, NULL);
}
-#ifdef CONFIG_X86_64
static void io_apic_sync(struct irq_pin_list *entry)
{
/*
@@ -593,12 +592,8 @@ static void __mask_IO_APIC_irq(struct irq_cfg *cfg)
{
io_apic_modify_irq(cfg, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
}
-#else /* CONFIG_X86_32 */
-static void __mask_IO_APIC_irq(struct irq_cfg *cfg)
-{
- io_apic_modify_irq(cfg, ~0, IO_APIC_REDIR_MASKED, NULL);
-}
+#ifdef CONFIG_X86_32
static void __mask_and_edge_IO_APIC_irq(struct irq_cfg *cfg)
{
io_apic_modify_irq(cfg, ~IO_APIC_REDIR_LEVEL_TRIGGER,
--
1.6.2.2
From: Jeremy Fitzhardinge <[email protected]>
While no 64-bit hardware will have a version 0x11 I/O APIC which needs
the level/edge bug workaround, that's not a particular reason to use
CONFIG_X86_32 to #ifdef the code out. Most 32-bit machines will no
longer need the workaround either, so the test to see whether it is
necessary should be more fine-grained than "32-bit=yes, 64-bit=no".
(Also fix formatting of block comment.)
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 47 ++++++++++++++++-----------------------
1 files changed, 19 insertions(+), 28 deletions(-)
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index e767bcb..54ec66d 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -593,7 +593,6 @@ static void __mask_IO_APIC_irq(struct irq_cfg *cfg)
io_apic_modify_irq(cfg, ~0, IO_APIC_REDIR_MASKED, &io_apic_sync);
}
-#ifdef CONFIG_X86_32
static void __mask_and_edge_IO_APIC_irq(struct irq_cfg *cfg)
{
io_apic_modify_irq(cfg, ~IO_APIC_REDIR_LEVEL_TRIGGER,
@@ -605,7 +604,6 @@ static void __unmask_and_level_IO_APIC_irq(struct irq_cfg *cfg)
io_apic_modify_irq(cfg, ~IO_APIC_REDIR_MASKED,
IO_APIC_REDIR_LEVEL_TRIGGER, NULL);
}
-#endif /* CONFIG_X86_32 */
static void mask_IO_APIC_irq_desc(struct irq_desc *desc)
{
@@ -2508,11 +2506,8 @@ atomic_t irq_mis_count;
static void ack_apic_level(unsigned int irq)
{
struct irq_desc *desc = irq_to_desc(irq);
-
-#ifdef CONFIG_X86_32
unsigned long v;
int i;
-#endif
struct irq_cfg *cfg;
int do_unmask_irq = 0;
@@ -2525,31 +2520,28 @@ static void ack_apic_level(unsigned int irq)
}
#endif
-#ifdef CONFIG_X86_32
/*
- * It appears there is an erratum which affects at least version 0x11
- * of I/O APIC (that's the 82093AA and cores integrated into various
- * chipsets). Under certain conditions a level-triggered interrupt is
- * erroneously delivered as edge-triggered one but the respective IRR
- * bit gets set nevertheless. As a result the I/O unit expects an EOI
- * message but it will never arrive and further interrupts are blocked
- * from the source. The exact reason is so far unknown, but the
- * phenomenon was observed when two consecutive interrupt requests
- * from a given source get delivered to the same CPU and the source is
- * temporarily disabled in between.
- *
- * A workaround is to simulate an EOI message manually. We achieve it
- * by setting the trigger mode to edge and then to level when the edge
- * trigger mode gets detected in the TMR of a local APIC for a
- * level-triggered interrupt. We mask the source for the time of the
- * operation to prevent an edge-triggered interrupt escaping meanwhile.
- * The idea is from Manfred Spraul. --macro
- */
+ * It appears there is an erratum which affects at least version 0x11
+ * of I/O APIC (that's the 82093AA and cores integrated into various
+ * chipsets). Under certain conditions a level-triggered interrupt is
+ * erroneously delivered as edge-triggered one but the respective IRR
+ * bit gets set nevertheless. As a result the I/O unit expects an EOI
+ * message but it will never arrive and further interrupts are blocked
+ * from the source. The exact reason is so far unknown, but the
+ * phenomenon was observed when two consecutive interrupt requests
+ * from a given source get delivered to the same CPU and the source is
+ * temporarily disabled in between.
+ *
+ * A workaround is to simulate an EOI message manually. We achieve it
+ * by setting the trigger mode to edge and then to level when the edge
+ * trigger mode gets detected in the TMR of a local APIC for a
+ * level-triggered interrupt. We mask the source for the time of the
+ * operation to prevent an edge-triggered interrupt escaping meanwhile.
+ * The idea is from Manfred Spraul. --macro
+ */
cfg = desc->chip_data;
i = cfg->vector;
-
v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
-#endif
/*
* We must acknowledge the irq before we move it or the acknowledge will
@@ -2591,7 +2583,7 @@ static void ack_apic_level(unsigned int irq)
unmask_IO_APIC_irq_desc(desc);
}
-#ifdef CONFIG_X86_32
+ /* Tail end of version 0x11 I/O APIC bug workaround */
if (!(v & (1 << (i & 0x1f)))) {
atomic_inc(&irq_mis_count);
spin_lock(&ioapic_lock);
@@ -2599,7 +2591,6 @@ static void ack_apic_level(unsigned int irq)
__unmask_and_level_IO_APIC_irq(cfg);
spin_unlock(&ioapic_lock);
}
-#endif
}
#ifdef CONFIG_INTR_REMAP
--
1.6.2.2
From: Cliff Wickman <[email protected]>
The UV tlb shootdown code has a serious initialization error.
An array of structures [32*8] is initialized as if it were [32].
The array is indexed by (cpu number on the blade)*8, so the short
initialization works for up to 4 cpus on a blade.
But above that, we provide an invalid opcode to the hub's
broadcast assist unit.
This patch changes the allocation of the array to use its symbolic
dimensions for better clarity. And initializes all 32*8 entries.
Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's
recommendation.
Tested on the UV simulator.
Signed-off-by: Cliff Wickman <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/uv/uv_bau.h | 2 +-
arch/x86/kernel/tlb_uv.c | 15 +++++++++++++--
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/uv/uv_bau.h b/arch/x86/include/asm/uv/uv_bau.h
index 9b0e61b..bddd44f 100644
--- a/arch/x86/include/asm/uv/uv_bau.h
+++ b/arch/x86/include/asm/uv/uv_bau.h
@@ -37,7 +37,7 @@
#define UV_CPUS_PER_ACT_STATUS 32
#define UV_ACT_STATUS_MASK 0x3
#define UV_ACT_STATUS_SIZE 2
-#define UV_ACTIVATION_DESCRIPTOR_SIZE 32
+#define UV_ADP_SIZE 32
#define UV_DISTRIBUTION_SIZE 256
#define UV_SW_ACK_NPENDING 8
#define UV_NET_ENDPOINT_INTD 0x38
diff --git a/arch/x86/kernel/tlb_uv.c b/arch/x86/kernel/tlb_uv.c
index ed0c337..16f0fd4 100644
--- a/arch/x86/kernel/tlb_uv.c
+++ b/arch/x86/kernel/tlb_uv.c
@@ -715,7 +715,12 @@ uv_activation_descriptor_init(int node, int pnode)
struct bau_desc *adp;
struct bau_desc *ad2;
- adp = (struct bau_desc *)kmalloc_node(16384, GFP_KERNEL, node);
+ /*
+ * each bau_desc is 64 bytes; there are 8 (UV_ITEMS_PER_DESCRIPTOR)
+ * per cpu; and up to 32 (UV_ADP_SIZE) cpu's per blade
+ */
+ adp = (struct bau_desc *)kmalloc_node(sizeof(struct bau_desc)*
+ UV_ADP_SIZE*UV_ITEMS_PER_DESCRIPTOR, GFP_KERNEL, node);
BUG_ON(!adp);
pa = uv_gpa(adp); /* need the real nasid*/
@@ -729,7 +734,13 @@ uv_activation_descriptor_init(int node, int pnode)
(n << UV_DESC_BASE_PNODE_SHIFT | m));
}
- for (i = 0, ad2 = adp; i < UV_ACTIVATION_DESCRIPTOR_SIZE; i++, ad2++) {
+ /*
+ * initializing all 8 (UV_ITEMS_PER_DESCRIPTOR) descriptors for each
+ * cpu even though we only use the first one; one descriptor can
+ * describe a broadcast to 256 nodes.
+ */
+ for (i = 0, ad2 = adp; i < (UV_ADP_SIZE*UV_ITEMS_PER_DESCRIPTOR);
+ i++, ad2++) {
memset(ad2, 0, sizeof(struct bau_desc));
ad2->header.sw_ack_flag = 1;
/*
--
1.6.2.2
On 16.06.09 13:15:40, Jeremy Fitzhardinge wrote:
> From: Robert Richter <[email protected]>
>
> The current userland does not yet fully support all cpu types
> implemented in the kernel. With the module parameter:
>
> oprofile.cpu_type=core_2
>
> the kernel reports a core_2 cpu to the userland on an Intel system and
> thus makes oprofile usable with current distros.
>
> Cc: Linus Torvalds <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> Documentation/kernel-parameters.txt | 3 ++-
> arch/x86/oprofile/nmi_int.c | 17 ++++++++++++++---
> 2 files changed, 16 insertions(+), 4 deletions(-)
Jeremy,
I dropped this patch. See this thread:
http://lkml.org/lkml/2009/5/6/234
Does your patch series depends on this?
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]
On 06/17/09 03:57, Robert Richter wrote:
> On 16.06.09 13:15:40, Jeremy Fitzhardinge wrote:
>
>> From: Robert Richter<[email protected]>
>>
>> The current userland does not yet fully support all cpu types
>> implemented in the kernel. With the module parameter:
>>
>> oprofile.cpu_type=core_2
>>
>> the kernel reports a core_2 cpu to the userland on an Intel system and
>> thus makes oprofile usable with current distros.
>>
>> Cc: Linus Torvalds<[email protected]>
>> Cc: Andi Kleen<[email protected]>
>> Signed-off-by: Robert Richter<[email protected]>
>> ---
>> Documentation/kernel-parameters.txt | 3 ++-
>> arch/x86/oprofile/nmi_int.c | 17 ++++++++++++++---
>> 2 files changed, 16 insertions(+), 4 deletions(-)
>>
>
> Jeremy,
>
> I dropped this patch. See this thread:
>
> http://lkml.org/lkml/2009/5/6/234
>
> Does your patch series depends on this?
>
Confused. I haven't done anything with oprofile or nmi lately. I don't
recognize the patch you're quoting.
J
On 06/17/09 03:57, Robert Richter wrote:
> Jeremy,
>
> I dropped this patch. See this thread:
>
> http://lkml.org/lkml/2009/5/6/234
>
> Does your patch series depends on this?
>
Crud, I appear to have posted a completely bogus patch set.
J
On 06/16/09 13:15, Jeremy Fitzhardinge wrote:
> Hi Ingo,
>
> Here's a series of little cleanup patches to kernel/apic/io_apic.c which
> I fixed as I was reviewing the code.
>
Hm, patches 1-4 in this series are a mistake; I guess I managed to
specify the base version wrongly or something. 5-16 are what I intended
to send.
The git tree at
"git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
apic-cleanup" is fine.
J