2008-06-22 14:22:31

by Paul Jackson

[permalink] [raw]
Subject: [PATCH 0/5 v2] x86 boot: various E820 & EFI related fixes - what changed in v2

The following patches differ from their previous versions,
submitted a week ago, on the evening of Sun, 15 Jun 2008,
in the following ways:
1) The "x86_64 build reserve_bootmem_generic fix" patch
is dropped, because it is already fixed in linux-next.
Thanks to Yinghai Lu.
2) The "allow overlapping ebda and efi memmap memory ranges"
patch is reworked. Instead of allowing overlapping early
memory reservations on EFI boots, rather it allows
particular early memory reservations to overlap, if they
have been so marked. Only the EBDA "BIOS reserved"
reservation is so marked for now; only it can overlap.
Thanks to H. Peter Anvin and Huang Ying.
3) The patches "remap efi systab runtime from phys to virt"
and "virtualize the efi runtime function callback addresses"
are dropped. They were incorrect. I fixed my EFI firmware
to follow the spec instead. Thanks to Huang Ying.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373


2008-06-22 14:23:07

by Paul Jackson

[permalink] [raw]
Subject: [PATCH 2/5 v2] x86 boot: x86_64 efi compiler warning fix

From: Paul Jackson <[email protected]>

Fix a compiler warning. Rather than always casting a u32 to a pointer
(which generates a warning on x86_64 systems) instead separate the
x86_32 and x86_64 assignments entirely with ifdefs. Until other recent
changes to this code, it used to have x86_64 separated like this.

Signed-off-by: Paul Jackson <[email protected]>

---
arch/x86/kernel/efi.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

--- linux.orig/arch/x86/kernel/efi.c 2008-06-12 17:18:57.000000000 -0700
+++ linux/arch/x86/kernel/efi.c 2008-06-12 17:32:14.984928792 -0700
@@ -242,9 +242,11 @@ void __init efi_reserve_early(void)
{
unsigned long pmap;

+#ifdef CONFIG_X86_32
pmap = boot_params.efi_info.efi_memmap;
-#ifdef CONFIG_X86_64
- pmap += (__u64)boot_params.efi_info.efi_memmap_hi << 32;
+#else
+ pmap = (boot_params.efi_info.efi_memmap |
+ ((__u64)boot_params.efi_info.efi_memmap_hi<<32));
#endif
memmap.phys_map = (void *)pmap;
memmap.nr_map = boot_params.efi_info.efi_memmap_size /
@@ -284,10 +286,12 @@ void __init efi_init(void)
int i = 0;
void *tmp;

+#ifdef CONFIG_X86_32
efi_phys.systab = (efi_system_table_t *)boot_params.efi_info.efi_systab;
-#ifdef CONFIG_X86_64
- efi_phys.systab = (void *)efi_phys.systab +
- ((__u64)boot_params.efi_info.efi_systab_hi<<32);
+#else
+ efi_phys.systab = (efi_system_table_t *)
+ (boot_params.efi_info.efi_systab |
+ ((__u64)boot_params.efi_info.efi_systab_hi<<32));
#endif

efi.systab = early_ioremap((unsigned long)efi_phys.systab,

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2008-06-22 14:23:31

by Paul Jackson

[permalink] [raw]
Subject: [PATCH 3/5 v2] x86 boot: allow overlapping early reserve memory ranges

From: Paul Jackson <[email protected]>

Add support for overlapping early memory reservations.

In general, they still can't overlap, and will panic
with "Overlapping early reservations" if they do overlap.

But if a memory range is reserved with the new call:
reserve_early_overlap_ok()
rather than with the usual call:
reserve_early()
then subsequent early reservations are allowed to overlap.

This new reserve_early_overlap_ok() call is only used in one
place so far, which is the "BIOS reserved" reservation for the
the EBDA region, which out of Paranoia reserves more than what
the BIOS might have specified, and which thus might overlap with
another legitimate early memory reservation (such as, perhaps,
the EFI memmap.)

Signed-off-by: Paul Jackson <[email protected]>

---
arch/x86/kernel/e820.c | 140 +++++++++++++++++++++++++++++++++++++++++++++----
arch/x86/kernel/head.c | 2
include/asm-x86/e820.h | 1
3 files changed, 133 insertions(+), 10 deletions(-)

--- linux-next.orig/arch/x86/kernel/e820.c 2008-06-22 06:36:04.391767162 -0700
+++ linux-next/arch/x86/kernel/e820.c 2008-06-22 06:36:07.203937695 -0700
@@ -536,6 +536,7 @@ void __init e820_mark_nosave_regions(uns
struct early_res {
u64 start, end;
char name[16];
+ char overlap_ok;
};
static struct early_res early_res[MAX_EARLY_RES] __initdata = {
{ 0, PAGE_SIZE, "BIOS data page" }, /* BIOS data page */
@@ -572,7 +573,93 @@ static int __init find_overlapped_early(
return i;
}

-void __init reserve_early(u64 start, u64 end, char *name)
+/*
+ * Drop the i-th range from the early reservation map,
+ * by copying any higher ranges down one over it, and
+ * clearing what had been the last slot.
+ */
+static void __init drop_range(int i)
+{
+ int j;
+
+ for (j = i + 1; j < MAX_EARLY_RES && early_res[j].end; j++)
+ ;
+
+ memmove(&early_res[i], &early_res[i + 1],
+ (j - 1 - i) * sizeof(struct early_res));
+
+ early_res[j - 1].end = 0;
+}
+
+/*
+ * Split any existing ranges that:
+ * 1) are marked 'overlap_ok', and
+ * 2) overlap with the stated range [start, end)
+ * into whatever portion (if any) of the existing range is entirely
+ * below or entirely above the stated range. Drop the portion
+ * of the existing range that overlaps with the stated range,
+ * which will allow the caller of this routine to then add that
+ * stated range without conflicting with any existing range.
+ */
+static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
+{
+ int i;
+ struct early_res *r;
+ u64 lower_start, lower_end;
+ u64 upper_start, upper_end;
+ char name[16];
+
+ for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
+ r = &early_res[i];
+
+ /* Continue past non-overlapping ranges */
+ if (end <= r->start || start >= r->end)
+ continue;
+
+ /*
+ * Leave non-ok overlaps as is; let caller
+ * panic "Overlapping early reservations"
+ * when it hits this overlap.
+ */
+ if (!r->overlap_ok)
+ return;
+
+ /*
+ * We have an ok overlap. We will drop it from the early
+ * reservation map, and add back in any non-overlapping
+ * portions (lower or upper) as separate, overlap_ok,
+ * non-overlapping ranges.
+ */
+
+ /* 1. Note any non-overlapping (lower or upper) ranges. */
+ strncpy(name, r->name, sizeof(name) - 1);
+
+ lower_start = lower_end = 0;
+ upper_start = upper_end = 0;
+ if (r->start < start) {
+ lower_start = r->start;
+ lower_end = start;
+ }
+ if (r->end > end) {
+ upper_start = end;
+ upper_end = r->end;
+ }
+
+ /* 2. Drop the original ok overlapping range */
+ drop_range(i);
+
+ i--; /* resume for-loop on copied down entry */
+
+ /* 3. Add back in any non-overlapping ranges. */
+ if (lower_end)
+ reserve_early_overlap_ok(lower_start, lower_end, name);
+ if (upper_end)
+ reserve_early_overlap_ok(upper_start, upper_end, name);
+ }
+}
+
+static void __init __reserve_early(u64 start, u64 end, char *name,
+ int overlap_ok)
{
int i;
struct early_res *r;
@@ -588,14 +675,55 @@ void __init reserve_early(u64 start, u64
r->end - 1, r->name);
r->start = start;
r->end = end;
+ r->overlap_ok = overlap_ok;
if (name)
strncpy(r->name, name, sizeof(r->name) - 1);
}

+/*
+ * A few early reservtations come here.
+ *
+ * The 'overlap_ok' in the name of this routine does -not- mean it
+ * is ok for these reservations to overlap an earlier reservation.
+ * Rather it means that it is ok for subsequent reservations to
+ * overlap this one.
+ *
+ * Use this entry point to reserve early ranges when you are doing
+ * so out of "Paranoia", reserving perhaps more memory than you need,
+ * just in case, and don't mind a subsequent overlapping reservation
+ * that is known to be needed.
+ *
+ * The drop_overlaps_that_are_ok() call here isn't really needed.
+ * It would be needed if we had two colliding 'overlap_ok'
+ * reservations, so that the second such would not panic on the
+ * overlap with the first. We don't have any such as of this
+ * writing, but might as well tolerate such if it happens in
+ * the future.
+ */
+void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
+{
+ drop_overlaps_that_are_ok(start, end);
+ __reserve_early(start, end, name, 1);
+}
+
+/*
+ * Most early reservations come here.
+ *
+ * We first have drop_overlaps_that_are_ok() drop any pre-existing
+ * 'overlap_ok' ranges, so that we can then reserve this memory
+ * range without risk of panic'ing on an overlapping overlap_ok
+ * early reservation.
+ */
+void __init reserve_early(u64 start, u64 end, char *name)
+{
+ drop_overlaps_that_are_ok(start, end);
+ __reserve_early(start, end, name, 0);
+}
+
void __init free_early(u64 start, u64 end)
{
struct early_res *r;
- int i, j;
+ int i;

i = find_overlapped_early(start, end);
r = &early_res[i];
@@ -603,13 +731,7 @@ void __init free_early(u64 start, u64 en
panic("free_early on not reserved area: %llx-%llx!",
start, end - 1);

- for (j = i + 1; j < MAX_EARLY_RES && early_res[j].end; j++)
- ;
-
- memmove(&early_res[i], &early_res[i + 1],
- (j - 1 - i) * sizeof(struct early_res));
-
- early_res[j - 1].end = 0;
+ drop_range(i);
}

void __init early_res_to_bootmem(u64 start, u64 end)
--- linux-next.orig/arch/x86/kernel/head.c 2008-06-22 06:35:50.286911826 -0700
+++ linux-next/arch/x86/kernel/head.c 2008-06-22 06:36:07.215938422 -0700
@@ -51,7 +51,7 @@ void __init reserve_ebda_region(void)
lowmem = 0x9f000;

/* reserve all memory between lowmem and the 1MB mark */
- reserve_early(lowmem, 0x100000, "BIOS reserved");
+ reserve_early_overlap_ok(lowmem, 0x100000, "BIOS reserved");
}

void __init reserve_setup_data(void)
--- linux-next.orig/include/asm-x86/e820.h 2008-06-22 06:35:53.227090122 -0700
+++ linux-next/include/asm-x86/e820.h 2008-06-22 06:36:07.231939393 -0700
@@ -84,6 +84,7 @@ extern unsigned long end_user_pfn;
extern u64 find_e820_area(u64 start, u64 end, u64 size, u64 align);
extern u64 find_e820_area_size(u64 start, u64 *sizep, u64 align);
extern void reserve_early(u64 start, u64 end, char *name);
+extern void reserve_early_overlap_ok(u64 start, u64 end, char *name);
extern void free_early(u64 start, u64 end);
extern void early_res_to_bootmem(u64 start, u64 end);
extern u64 early_reserve_e820(u64 startt, u64 sizet, u64 align);

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2008-06-22 14:22:49

by Paul Jackson

[permalink] [raw]
Subject: [PATCH 1/5 v2] x86 boot: e820 code indentation fix

From: Paul Jackson <[email protected]>

Fix indentation. An earlier code merge got the
indentation of four lines of code off by a tab.

Signed-off-by: Paul Jackson <[email protected]>

---
arch/x86/kernel/e820.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- linux.orig/arch/x86/kernel/e820.c 2008-06-10 22:37:27.000000000 -0700
+++ linux/arch/x86/kernel/e820.c 2008-06-10 23:04:17.728936162 -0700
@@ -207,10 +207,10 @@ int __init sanitize_e820_map(struct e820
struct e820entry *pbios; /* pointer to original bios entry */
unsigned long long addr; /* address for this change point */
};
-static struct change_member change_point_list[2*E820_X_MAX] __initdata;
-static struct change_member *change_point[2*E820_X_MAX] __initdata;
-static struct e820entry *overlap_list[E820_X_MAX] __initdata;
-static struct e820entry new_bios[E820_X_MAX] __initdata;
+ static struct change_member change_point_list[2*E820_X_MAX] __initdata;
+ static struct change_member *change_point[2*E820_X_MAX] __initdata;
+ static struct e820entry *overlap_list[E820_X_MAX] __initdata;
+ static struct e820entry new_bios[E820_X_MAX] __initdata;
struct change_member *change_tmp;
unsigned long current_type, last_type;
unsigned long long last_addr;

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2008-06-22 14:23:46

by Paul Jackson

[permalink] [raw]
Subject: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

From: Paul Jackson <[email protected]>

Page frame numbers (the portion of physical addresses above the low
order page offsets) are displayed in several kernel debug and info
prints in decimal, not hex. Decimal addresse are unreadable. Use hex.

Signed-off-by: Paul Jackson <[email protected]>

---
arch/x86/kernel/e820.c | 2 +-
arch/x86/mm/init_64.c | 2 +-
mm/page_alloc.c | 6 +++---
3 files changed, 5 insertions(+), 5 deletions(-)

--- linux-next.orig/arch/x86/kernel/e820.c 2008-06-22 06:36:07.203937695 -0700
+++ linux-next/arch/x86/kernel/e820.c 2008-06-22 06:36:16.792519156 -0700
@@ -922,7 +922,7 @@ unsigned long __init e820_end_of_ram(voi
if (last_pfn > end_user_pfn)
last_pfn = end_user_pfn;

- printk(KERN_INFO "last_pfn = %lu max_arch_pfn = %lu\n",
+ printk(KERN_INFO "last_pfn = 0x%lx max_arch_pfn = 0x%lx\n",
last_pfn, max_arch_pfn);
return last_pfn;
}
--- linux-next.orig/mm/page_alloc.c 2008-06-22 06:35:53.835126994 -0700
+++ linux-next/mm/page_alloc.c 2008-06-22 06:36:16.800519641 -0700
@@ -3517,7 +3517,7 @@ void __init add_active_range(unsigned in
{
int i;

- printk(KERN_DEBUG "Entering add_active_range(%d, %lu, %lu) "
+ printk(KERN_DEBUG "Entering add_active_range(%d, 0x%lx, 0x%lx) "
"%d entries of %d used\n",
nid, start_pfn, end_pfn,
nr_nodemap_entries, MAX_ACTIVE_REGIONS);
@@ -3933,7 +3933,7 @@ void __init free_area_init_nodes(unsigne
for (i = 0; i < MAX_NR_ZONES; i++) {
if (i == ZONE_MOVABLE)
continue;
- printk(" %-8s %8lu -> %8lu\n",
+ printk(" %-8s 0x%8lx -> 0x%8lx\n",
zone_names[i],
arch_zone_lowest_possible_pfn[i],
arch_zone_highest_possible_pfn[i]);
@@ -3949,7 +3949,7 @@ void __init free_area_init_nodes(unsigne
/* Print out the early_node_map[] */
printk("early_node_map[%d] active PFN ranges\n", nr_nodemap_entries);
for (i = 0; i < nr_nodemap_entries; i++)
- printk(" %3d: %8lu -> %8lu\n", early_node_map[i].nid,
+ printk(" %3d: 0x%8lx -> 0x%8lx\n", early_node_map[i].nid,
early_node_map[i].start_pfn,
early_node_map[i].end_pfn);

--- linux-next.orig/arch/x86/mm/init_64.c 2008-06-22 06:35:50.326914251 -0700
+++ linux-next/arch/x86/mm/init_64.c 2008-06-22 06:36:16.812520369 -0700
@@ -830,7 +830,7 @@ int __init reserve_bootmem_generic(unsig
if (pfn < max_pfn_mapped)
return -EFAULT;

- printk(KERN_ERR "reserve_bootmem: illegal reserve %lx %u\n",
+ printk(KERN_ERR "reserve_bootmem: illegal reserve %lx %lu\n",
phys, len);
return -EFAULT;
}

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2008-06-22 14:24:06

by Paul Jackson

[permalink] [raw]
Subject: [PATCH 5/5 v2] x86 boot: more consistently use type int for node ids

From: Paul Jackson <[email protected]>

Everywhere I look, node id's are of type 'int', except in this one
case, which has 'unsigned long'. Change this one to 'int' as well.
There is nothing special about the way this variable 'nid' is used in
this routine to justify using an unusual type here.

Signed-off-by: Paul Jackson <[email protected]>

---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-next.orig/mm/page_alloc.c 2008-06-22 06:36:16.800519641 -0700
+++ linux-next/mm/page_alloc.c 2008-06-22 06:36:26.765123903 -0700
@@ -3666,7 +3666,7 @@ static void __init sort_node_map(void)
}

/* Find the lowest pfn for a node */
-unsigned long __init find_min_pfn_for_node(unsigned long nid)
+unsigned long __init find_min_pfn_for_node(int nid)
{
int i;
unsigned long min_pfn = ULONG_MAX;
@@ -3677,7 +3677,7 @@ unsigned long __init find_min_pfn_for_no

if (min_pfn == ULONG_MAX) {
printk(KERN_WARNING
- "Could not find start_pfn for node %lu\n", nid);
+ "Could not find start_pfn for node %d\n", nid);
return 0;
}


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373

2008-06-22 19:38:48

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

On Sun, Jun 22, 2008 at 7:22 AM, Paul Jackson <[email protected]> wrote:
> From: Paul Jackson <[email protected]>
>
> Page frame numbers (the portion of physical addresses above the low
> order page offsets) are displayed in several kernel debug and info
> prints in decimal, not hex. Decimal addresse are unreadable. Use hex.
>
> Signed-off-by: Paul Jackson <[email protected]>
>
> ---
> arch/x86/kernel/e820.c | 2 +-
> arch/x86/mm/init_64.c | 2 +-
> mm/page_alloc.c | 6 +++---
> 3 files changed, 5 insertions(+), 5 deletions(-)
>
> --- linux-next.orig/arch/x86/kernel/e820.c 2008-06-22 06:36:07.203937695 -0700
> +++ linux-next/arch/x86/kernel/e820.c 2008-06-22 06:36:16.792519156 -0700
> @@ -922,7 +922,7 @@ unsigned long __init e820_end_of_ram(voi
> if (last_pfn > end_user_pfn)
> last_pfn = end_user_pfn;
>
> - printk(KERN_INFO "last_pfn = %lu max_arch_pfn = %lu\n",
> + printk(KERN_INFO "last_pfn = 0x%lx max_arch_pfn = 0x%lx\n",

we should remove 0x,
and hex print out all the way...regarding mem pfn and memory address,
memory size etc. except XXXMB. XXXKB...

YH

2008-06-23 11:09:52

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Yinghai wrote:
> we should remove 0x,

I see some kernel hex prints either way, with or without, 0x.

In this case, in part because I was changing a decimal to a hex
format, and because it could be ambiguous (a hex number that
happened to use only [0-9] digits would be indistinguisable from
a decimal) I preferred using the 0x.

I see no clear answer here ... it seems to be a matter of taste.

> we should ... hex print out all the way...regarding mem pfn and
> memory address, memory size etc. except XXXMB. XXXKB...

I fixed the ones I noticed. If you see print formats that you
would like to change, I will likely be in agreement to such a
patch.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-06-24 11:19:44

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/5 v2] x86 boot: various E820 & EFI related fixes - what changed in v2


* Paul Jackson <[email protected]> wrote:

> The following patches differ from their previous versions,
> submitted a week ago, on the evening of Sun, 15 Jun 2008,
> in the following ways:
> 1) The "x86_64 build reserve_bootmem_generic fix" patch
> is dropped, because it is already fixed in linux-next.
> Thanks to Yinghai Lu.
> 2) The "allow overlapping ebda and efi memmap memory ranges"
> patch is reworked. Instead of allowing overlapping early
> memory reservations on EFI boots, rather it allows
> particular early memory reservations to overlap, if they
> have been so marked. Only the EBDA "BIOS reserved"
> reservation is so marked for now; only it can overlap.
> Thanks to H. Peter Anvin and Huang Ying.
> 3) The patches "remap efi systab runtime from phys to virt"
> and "virtualize the efi runtime function callback addresses"
> are dropped. They were incorrect. I fixed my EFI firmware
> to follow the spec instead. Thanks to Huang Ying.

applied to tip/x86/setup-memory, thanks Paul.

( the review feedback from Peter still needs to be addressed before the
memmap-from-EFI portion of your changes can be considered for
upstream. See http://lkml.org/lkml/2008/6/24/26. Should be pretty
straightforward. )

Ingo

2008-06-24 11:31:06

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 0/5 v2] x86 boot: various E820 & EFI related fixes - what changed in v2

Ingo wrote:
> applied to tip/x86/setup-memory, thanks Paul.

Thanks, Ingo.

> the review feedback from Peter still needs to be addressed

Agreed. I like Huang's follow up suggestion to that as well:
> I think it is better to add a command-line flag "auxmem"

It seems to have better backward compatibility, as Huang states.

I'm tempted to name this kernel boot option "auxmemmap", since
I like having a little more specific name for this.

I will follow up with patches for these recommendations
of Peter and Huang.

Thanks, Peter and Huang, for the excellent feedback.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-06-24 15:46:55

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 0/5 v2] x86 boot: various E820 & EFI related fixes - what changed in v2

Paul Jackson wrote:
>
> Agreed. I like Huang's follow up suggestion to that as well:
>> I think it is better to add a command-line flag "auxmem"
>
> It seems to have better backward compatibility, as Huang states.
>

I would very much agree with that. Cool!

-hpa

2008-06-24 21:30:06

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

On Mon, Jun 23, 2008 at 4:09 AM, Paul Jackson <[email protected]> wrote:
> Yinghai wrote:
>> we should remove 0x,
>
> I see some kernel hex prints either way, with or without, 0x.
>
> In this case, in part because I was changing a decimal to a hex
> format, and because it could be ambiguous (a hex number that
> happened to use only [0-9] digits would be indistinguisable from
> a decimal) I preferred using the 0x.
>
> I see no clear answer here ... it seems to be a matter of taste.
>
>> we should ... hex print out all the way...regarding mem pfn and
>> memory address, memory size etc. except XXXMB. XXXKB...
>
> I fixed the ones I noticed. If you see print formats that you
> would like to change, I will likely be in agreement to such a
> patch.

got

Zone PFN ranges:
DMA 0x 0 -> 0x 1000
DMA32 0x 1000 -> 0x 100000
Normal 0x 100000 -> 0x 428000
Movable zone start PFN for each node
early_node_map[5] active PFN ranges
0: 0x 0 -> 0x 99
0: 0x 100 -> 0x d7fa0
0: 0x d7fae -> 0x d7fb0
0: 0x 100000 -> 0x 228000
1: 0x 228000 -> 0x 428000

YH

2008-06-25 01:33:09

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Yinghai wrote:
> Zone PFN ranges:
> DMA 0x 0 -> 0x 1000

Aha - you're right!

I just earned the Ugly Patch of the Day Award.

I think changes of "%lu" to "0x%lx" are still ok,
but changes of "%8lu" to "0x%8lx" (which use 8
columns, right aligned, space filled format) are
not ok.

I will redo this patch, without the "0x" prefix
on the space filled fields. Let me know if that
is better.

Thank-you.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-06-25 01:57:53

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Paul Jackson wrote:
> Yinghai wrote:
>> Zone PFN ranges:
>> DMA 0x 0 -> 0x 1000
>
> Aha - you're right!
>
> I just earned the Ugly Patch of the Day Award.
>
> I think changes of "%lu" to "0x%lx" are still ok,
> but changes of "%8lu" to "0x%8lx" (which use 8
> columns, right aligned, space filled format) are
> not ok.
>

No, that would have to be %#10lx.

-hpa

2008-06-25 02:17:28

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

hpa wrote:
> that would have to be %#10lx.

Hmmm ... the '#' is a good idea, instead of an explicit "0x".

Do we want zero padding as well, with "0#8lx" ?

Zone PFN ranges:
DMA 0x00000000 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x00428000

or right aligned, space padded:

Zone PFN ranges:
DMA 0x0 -> 0x1000
DMA32 0x1000 -> 0x100000
Normal 0x100000 -> 0x428000

I like the zero padding here myself.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-06-25 02:24:15

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Paul Jackson wrote:
> hpa wrote:
>> that would have to be %#10lx.
>
> Hmmm ... the '#' is a good idea, instead of an explicit "0x".
>
> Do we want zero padding as well, with "0#8lx" ?
>
> Zone PFN ranges:
> DMA 0x00000000 -> 0x00001000
> DMA32 0x00001000 -> 0x00100000
> Normal 0x00100000 -> 0x00428000
>
> or right aligned, space padded:
>
> Zone PFN ranges:
> DMA 0x0 -> 0x1000
> DMA32 0x1000 -> 0x100000
> Normal 0x100000 -> 0x428000
>
> I like the zero padding here myself.
>

Yes, zero padding.

What we really should have is %p produce this format. For some odd
reason, right now %p produces numbers without the 0x prefix.

-h

2008-06-25 02:59:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks



On Tue, 24 Jun 2008, H. Peter Anvin wrote:
>
> What we really should have is %p produce this format. For some odd reason,
> right now %p produces numbers without the 0x prefix.

Actually, I'd like to make %p produce the format that you right now have
to jump through insane hoops for: symbolic addresses.

Right now you have to do something insane like

print_symbol(KERN_WARNING "address: %s\n", ptr);

to print a symbolic version, and it sucks because it's actually just able
to print symbols, you cannot mix any other types in there.

I think it would be much easier to use if "%p" just did the symbolic
version (if it's relevant) automatically. If you _just_ want the hex
representation, you can already always just use %#lx and a cast.

(Of course, then for things that we can't make into symbolic names, we'd
have to fall back to just the hex representation, and whether that one
then includes the 0x or not I don't have strong opinions on.)

And yes, to avoid messing with current users, maybe we should only do that
with %#p (which I don't think anybody uses right now, although I suspect
it actually adds the '0x' you'd like). The '#' thing is, of course, all
about 'alternate forms'. But I worry that gcc warns about undefined
formatting behavior.

Linus

2008-06-25 03:01:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks



On Tue, 24 Jun 2008, Linus Torvalds wrote:
>
> And yes, to avoid messing with current users, maybe we should only do that
> with %#p (which I don't think anybody uses right now, although I suspect
> it actually adds the '0x' you'd like). The '#' thing is, of course, all
> about 'alternate forms'. But I worry that gcc warns about undefined
> formatting behavior.

Indeed it does, I just checked. Very irritating.

Linus

2008-06-25 03:08:25

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Linus wrote:
> %#p (which I don't think anybody uses right now

I don't see any such uses, so that could work.

I agree - that the lib/vsprintf.c code seems to prefix
"%#p" numbers with "0x", as currently coded.

But I find borrowing the '#' flag for symbolic names
to be a slight stretch, as well as likely to confuse
some of us, such as myself, for whom these printf flags
are already a tad cryptic and error prone.

I'd be inclined instead to use "%P" for symbolic addrs.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-06-25 04:04:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks



On Tue, 24 Jun 2008, Paul Jackson wrote:
>
> I'd be inclined instead to use "%P" for symbolic addrs.

That doesn't work - gcc warns about it.

That turns out to be a problem with %#p too.

It's really irritating how we cannot extend on the printk strings without
either having to throw out gcc warnings altogether. gcc has no extension
mechanism to the built-in rules ;/

The format warnings are too useful to drop entirely. I guess sparse could
be taught to do them, and then we could drop the gcc support for them. But
that would still limit coverage a _lot_.

Linus

2008-06-25 05:01:52

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Linus Torvalds wrote:
>
> On Tue, 24 Jun 2008, Paul Jackson wrote:
>> I'd be inclined instead to use "%P" for symbolic addrs.
>
> That doesn't work - gcc warns about it.
>
> That turns out to be a problem with %#p too.
>
> It's really irritating how we cannot extend on the printk strings without
> either having to throw out gcc warnings altogether. gcc has no extension
> mechanism to the built-in rules ;/
>
> The format warnings are too useful to drop entirely. I guess sparse could
> be taught to do them, and then we could drop the gcc support for them. But
> that would still limit coverage a _lot_.
>

Any reason we can't just re-define %p to print the 0x prefix, just as
glibc does? It'd be easy enough to go and sed out all the 0x%p's
currently in the kernel.

-hpa

2008-06-25 05:06:33

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Linus Torvalds wrote:
>
> And yes, to avoid messing with current users, maybe we should only do that
> with %#p (which I don't think anybody uses right now, although I suspect
> it actually adds the '0x' you'd like). The '#' thing is, of course, all
> about 'alternate forms'. But I worry that gcc warns about undefined
> formatting behavior.
>

FWIW, I did a pass over the tree stripping out all the 0x%p's and
changing the behaviour about a year ago. It ended up sitting around and
not be submitted, but I'd be happy to re-do it.

-hpa

2008-06-25 08:05:24

by Paul Jackson

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Linux, replying to pj:
> > I'd be inclined instead to use "%P" for symbolic addrs.
>
> That doesn't work - gcc warns about it.
>
> That turns out to be a problem with %#p too.

Ah so.

How about the following "sym(addr, buf)" macro? This could make it
more practical to include kernel symbols within ordinary printk's.

On a silly little test with:

{
char b1[32], b2[32], b3[32];
printk(">>>>>> Testing sym(): A. %s, B. %s, C. %s\n",
sym(&pid_max, b1),
sym(0xffffffff80615750, b2), /* an addr in my System.map */
sym(0, b3));
}

the kernel printed:

>>>>>> Testing sym(): A. pid_max+0x0/0x4, B. trampoline_base+0x0/0x10, C. 0x0

===========================================================================

--- linux.orig/include/linux/kallsyms.h 2008-06-23 15:16:49.885666712 -0700
+++ linux/include/linux/kallsyms.h 2008-06-25 00:48:09.446807842 -0700
@@ -32,6 +32,12 @@ extern int sprint_symbol(char *buffer, u
/* Look up a kernel symbol and print it to the kernel messages. */
extern void __print_symbol(const char *fmt, unsigned long address);

+/* Convert address to kernel symbol; can printk result with "%s" format */
+#define sym(address, namebuf) ({ \
+ sprint_symbol((namebuf), (unsigned long)(address)); \
+ (namebuf); \
+})
+
int lookup_symbol_name(unsigned long addr, char *symname);
int lookup_symbol_attrs(unsigned long addr, unsigned long *size, unsigned long *offset, char *modname, char *name);


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-06-25 09:03:09

by Marco Cesati

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

On Tue, 24 Jun 2008, Linus Torvalds wrote:
> > I'd be inclined instead to use "%P" for symbolic addrs.
> That doesn't work - gcc warns about it.
>
> That turns out to be a problem with %#p too.

I believe that using "%-p" instead of "%#p" would work.
The canonical meaning of "%-p" should be the very same as
"%p", but gcc should not complain because it is
ready to parse a format string like "%-10p" (which is of
course different than "%10p").

A bit too tricky, perhaps?!

Tested on gcc 4.1.2.

Marco

--
Marco Cesati <[email protected]>
Universita' degli Studi di Roma Tor Vergata - DISP
via del Politecnico 1, 00133 Roma

2008-06-25 14:45:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks



On Tue, 24 Jun 2008, H. Peter Anvin wrote:
>
> Any reason we can't just re-define %p to print the 0x prefix, just as glibc
> does? It'd be easy enough to go and sed out all the 0x%p's currently in the
> kernel.

You didn't listen. I want #p to do the _symbolic_ address. The thing we
have in the backtraces etc. With nice symbol offset information etc.

The '0x<hex>' thing isn't all that interesting. You can do it by adding
the '0x' by hand, or by using a cast and using %#lx instead.

Linus

2008-06-25 14:54:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks


* Linus Torvalds <[email protected]> wrote:

> The format warnings are too useful to drop entirely. I guess sparse
> could be taught to do them, and then we could drop the gcc support for
> them. But that would still limit coverage a _lot_.

i think the main problem with Sparse isnt even technical but just a few
minor gotchas IMO that artificially work against the widespread use of
Sparse in various common workflows.

Sparse is a cool tool that extends upon C types (and more), but it's
been made too hard to use widely:

- right now it's opt-in with no ability for testers to use it
transparently while also building the kernel. Forcing a second
kernel build just for 'debugging' reduces the userbase.

- it produces way too much output for people to act upon and see any
real "improvement" in the end result. (Obviously this is partly a
side-effect of its shallow use.) Producing too much output by default
reduces the userbase further.

- delta analysis (watching what new Sparse warnings pop up) is
possible, Al Viro has the "remapper" tool:

git://git.kernel.org/pub/scm/linux/kernel/git/viro/remapper.git/

but forcing delta analysis as the only viable Sparse use poses yet
another barrier against use and does not allow state-less,
single-pass test feedback.

There are simpler tools (like for example checkpatch.pl) which are much
cruder in many aspects, but still they are used much more because they
do not have such basic usage problems. The artificial limitations on
Sparse usage keep it from being the tool it could be, IMO.

A tool that has quality assurance effects _must_ be actionable by a
broad developer base and needs to be no-hassle enabled by testers -
otherwise it will just drown in sheer entropy.

Fortunately these problems are all solvable gradually:

- a kbuild mechanism to get _parallel_ Sparse checks. I.e. both the
real .o gets produced but also the "make C=1" output is produced.
Testers would be enabled to do Sparse checks "alongside" of a normal
kernel build. (Sparse is plenty fast for this to be practical)

- kbuild mechanism with which subsystem maintainers could mark specific
files (or all of their subdirectories) of their subsystem as to be
Sparse-checked by default. [if a .config debug option is enabled]
For example we'd mark all of arch/x86/*.o and kernel/sched*.o in such
a way for example.

- a second layer would be very useful as well: failures would turn into
build failures if a debug .config switch is enabled.

this (and i'm sure some other, simple measures) would push Sparse into
the mainstream and it would allow us to integrate it into existing
practices of automated testing.

if we did all that Sparse might even turn into the native, built-in
Linux kernel compiler of the future: it already has the hardest bit
implemented as it can parse the kernel source and generates syntax trees
from it, generating a real .o from it is just one next step.

On a similar note, it would be nice if subsystem maintainers had a
kbuild mechanism to have the fails they maintain to be built via -Werr
(if an opt-in .config option is enabled).

I'd enforce it in a heartbeat on all files i'm involved with - but i
cannot enforce it on the more than 10,000 files the kernel has. Grepping
the warnings of files i'm interested is not possible reliably from make
-j builds.

So if someone wants me to try a kbuild patch to that end, i'd love to
give it a shot :-)

Ingo

2008-06-25 15:01:59

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

On Tue, 2008-06-24 at 21:04 -0700, Linus Torvalds wrote:

> It's really irritating how we cannot extend on the printk strings without
> either having to throw out gcc warnings altogether. gcc has no extension
> mechanism to the built-in rules ;/
>
> The format warnings are too useful to drop entirely. I guess sparse could
> be taught to do them, and then we could drop the gcc support for them. But
> that would still limit coverage a _lot_.

We should probably start working on getting this fixed.

In networking, we've gone through various incarnations of print_mac()
which is similar to the sym() macro Paul proposed, and it turned out to
be undesirable because of the way it interacts with static inlines that
only optionally contain code at all, the print_mac() function call is
still emitted by the compiler. People experimented with marking it
__pure but that had other problems.

It would be nice to be able to say

u8 *eaddr;

printk(... %M ..., eaddr);

and get
03:45:67:89:ab:cd

out of it, which avoids both the large string/code you get when doing it
manually (printf("%2.x:...:%.2x", eaddr[0], ...) for which there are
macros) and the "function call in hotpath even when debugging is
disabled" problem I mentioned above.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-06-25 15:21:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks



On Wed, 25 Jun 2008, Johannes Berg wrote:
>
> In networking, we've gone through various incarnations of print_mac()
> which is similar to the sym() macro Paul proposed, and it turned out to
> be undesirable because of the way it interacts with static inlines that
> only optionally contain code at all, the print_mac() function call is
> still emitted by the compiler. People experimented with marking it
> __pure but that had other problems.

You don't even have to go that esoteric.

Just printing things like "sector_t" or "u64" is painful, because the
exact type depends on config options and/or architecture.

> It would be nice to be able to say
>
> u8 *eaddr;
>
> printk(... %M ..., eaddr);

For special things, I do think we should extend the format more, and
forget about single-character names. It would be lovely to do them as
%[mac], %[u64], %[symbol] or similar. Because once you don't rely on gcc
checking the string, you can do it.

The problem is that right now we absolutely _do_ rely on gcc checking the
string, and as such we're forced to use standard patterns, and standard
patterns _only_. And that means that %M isn't an option, but also that if
we want symbolic names we'd have to use %p, and not some extension.

But once you drop the 'standard patterns' requirement, I do think you
should drop it _entirely_, and not just extend it with some pissant
single-character unreadable mess.

Linus

2008-06-25 15:35:31

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Linus Torvalds wrote:
>
> On Tue, 24 Jun 2008, H. Peter Anvin wrote:
>> Any reason we can't just re-define %p to print the 0x prefix, just as glibc
>> does? It'd be easy enough to go and sed out all the 0x%p's currently in the
>> kernel.
>
> You didn't listen. I want #p to do the _symbolic_ address. The thing we
> have in the backtraces etc. With nice symbol offset information etc.
>
> The '0x<hex>' thing isn't all that interesting. You can do it by adding
> the '0x' by hand, or by using a cast and using %#lx instead.
>

You're right, I didn't. I still think the 0x%p's and -- even worse --
0x%08lx's (what about 64 bits?!) we currently have *all over the kernel*
suck, but yes, that's a minor problem and getting the symbolic info
would be a very nice thing.

It looks like gcc will warn about just about ever modifier to %p. I
wondered whose diseased brain came up with that idea.

Overall, I think the glibc people have botched their printf extensions
horribly, failing to pick up the very useful %I extension from Windows
and instead using %I for something completely conflicting is
particularly pissy.

-hpa

2008-06-25 15:36:04

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

On Wed, 2008-06-25 at 08:19 -0700, Linus Torvalds wrote:
>
> On Wed, 25 Jun 2008, Johannes Berg wrote:
> >
> > In networking, we've gone through various incarnations of print_mac()
> > which is similar to the sym() macro Paul proposed, and it turned out to
> > be undesirable because of the way it interacts with static inlines that
> > only optionally contain code at all, the print_mac() function call is
> > still emitted by the compiler. People experimented with marking it
> > __pure but that had other problems.
>
> You don't even have to go that esoteric.
>
> Just printing things like "sector_t" or "u64" is painful, because the
> exact type depends on config options and/or architecture.

Heh, true, but I have the print_mac() disaster firmly imprinted in my
mind ;)

> > It would be nice to be able to say
> >
> > u8 *eaddr;
> >
> > printk(... %M ..., eaddr);
>
> For special things, I do think we should extend the format more, and
> forget about single-character names. It would be lovely to do them as
> %[mac], %[u64], %[symbol] or similar. Because once you don't rely on gcc
> checking the string, you can do it.

True, that does look a lot better and has less potential for confusion.

> The problem is that right now we absolutely _do_ rely on gcc checking the
> string, and as such we're forced to use standard patterns, and standard
> patterns _only_. And that means that %M isn't an option, but also that if
> we want symbolic names we'd have to use %p, and not some extension.
>
> But once you drop the 'standard patterns' requirement, I do think you
> should drop it _entirely_, and not just extend it with some pissant
> single-character unreadable mess.

Oh yes, I agree. At one point I figured it should be easy enough to
extend gcc with something that allows you to specify the format
character/type to take but alas, such a thing is not possible.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-06-26 19:13:31

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

Hi Ingo

> Sparse is a cool tool that extends upon C types (and more), but it's
> been made too hard to use widely:
>
> - right now it's opt-in with no ability for testers to use it
> transparently while also building the kernel. Forcing a second
> kernel build just for 'debugging' reduces the userbase.

This is waht make C=1 is for. If this is broken then we should
fix it.
Just trying it out:
$ make C=1 kernel/sched.o
CHECK kernel/sched.c
kernel/sched_fair.c:37:14: warning: symbol 'sysctl_sched_latency' was not declar ed. Should it be static?
kernel/sched_fair.c:43:14: warning: symbol 'sysctl_sched_min_granularity' was no t declared. Should it be static?
kernel/sched_fair.c:72:14: warning: symbol 'sysctl_sched_wakeup_granularity' was not declared. Should it be static?
...
CC kernel/sched.o


So make C=1 works as intended and let you run sparse on the files
that are built - and only those.


> Fortunately these problems are all solvable gradually:
>
> - a kbuild mechanism to get _parallel_ Sparse checks. I.e. both the
> real .o gets produced but also the "make C=1" output is produced.
> Testers would be enabled to do Sparse checks "alongside" of a normal
> kernel build. (Sparse is plenty fast for this to be practical)
Already present.

>
> - kbuild mechanism with which subsystem maintainers could mark specific
> files (or all of their subdirectories) of their subsystem as to be
> Sparse-checked by default. [if a .config debug option is enabled]
> For example we'd mark all of arch/x86/*.o and kernel/sched*.o in such
> a way for example.
That would then be on a directory basis.
You can do:
diff --git a/kernel/Makefile b/kernel/Makefile
index 1c9938a..1ba00aa 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -2,6 +2,7 @@
# Makefile for the linux kernel.
#

+KBUILD_CHECKSRC=1
obj-y = sched.o fork.o exec_domain.o panic.o printk.o profile.o \
exit.o itimer.o time.o softirq.o resource.o \
sysctl.o capability.o ptrace.o timer.o user.o \

already today.
It applies recursively so we will then run sparse on kernel/time/*.o too.

There is no easy way to cover all of arch/x86/*.o as the files are distributed
in several Makefiles and there is no common one.

>
> - a second layer would be very useful as well: failures would turn into
> build failures if a debug .config switch is enabled.
This could be a simple:
ifdef CONFIG_CHECK_IS_ERROR
CHECKFLAGS += -Werror
endif

And teach sparse about -Werror

> On a similar note, it would be nice if subsystem maintainers had a
> kbuild mechanism to have the fails they maintain to be built via -Werr
> (if an opt-in .config option is enabled).

For each Makefile (does not apply recursively):
ccflags-$(CONFIG_PROMOTE_WARNINGS_TO_ERROR) += -Werror

When I get around to it:
ccflags-sched.o-$(CONFIG_PROMOTE_WARNINGS_TO_ERROR) += -Werror

Sam

2008-06-27 12:09:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks


* Sam Ravnborg <[email protected]> wrote:

> Hi Ingo
>
> > Sparse is a cool tool that extends upon C types (and more), but it's
> > been made too hard to use widely:
> >
> > - right now it's opt-in with no ability for testers to use it
> > transparently while also building the kernel. Forcing a second
> > kernel build just for 'debugging' reduces the userbase.
>
> This is waht make C=1 is for. If this is broken then we should
> fix it.
> Just trying it out:
> $ make C=1 kernel/sched.o
> CHECK kernel/sched.c
> kernel/sched_fair.c:37:14: warning: symbol 'sysctl_sched_latency' was not declar ed. Should it be static?
> kernel/sched_fair.c:43:14: warning: symbol 'sysctl_sched_min_granularity' was no t declared. Should it be static?
> kernel/sched_fair.c:72:14: warning: symbol 'sysctl_sched_wakeup_granularity' was not declared. Should it be static?
> ...
> CC kernel/sched.o
>
>
> So make C=1 works as intended and let you run sparse on the files that
> are built - and only those.

ah, ok - i was confused about that.

> > Fortunately these problems are all solvable gradually:
> >
> > - a kbuild mechanism to get _parallel_ Sparse checks. I.e. both the
> > real .o gets produced but also the "make C=1" output is produced.
> > Testers would be enabled to do Sparse checks "alongside" of a normal
> > kernel build. (Sparse is plenty fast for this to be practical)
> Already present.
>
> >
> > - kbuild mechanism with which subsystem maintainers could mark specific
> > files (or all of their subdirectories) of their subsystem as to be
> > Sparse-checked by default. [if a .config debug option is enabled]
> > For example we'd mark all of arch/x86/*.o and kernel/sched*.o in such
> > a way for example.
> That would then be on a directory basis.
> You can do:
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 1c9938a..1ba00aa 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -2,6 +2,7 @@
> # Makefile for the linux kernel.
> #
>
> +KBUILD_CHECKSRC=1
> obj-y = sched.o fork.o exec_domain.o panic.o printk.o profile.o \
> exit.o itimer.o time.o softirq.o resource.o \
> sysctl.o capability.o ptrace.o timer.o user.o \
>
> already today.
> It applies recursively so we will then run sparse on kernel/time/*.o too.
>
> There is no easy way to cover all of arch/x86/*.o as the files are distributed
> in several Makefiles and there is no common one.
>
> >
> > - a second layer would be very useful as well: failures would turn into
> > build failures if a debug .config switch is enabled.
> This could be a simple:
> ifdef CONFIG_CHECK_IS_ERROR
> CHECKFLAGS += -Werror
> endif
>
> And teach sparse about -Werror
>
> > On a similar note, it would be nice if subsystem maintainers had a
> > kbuild mechanism to have the fails they maintain to be built via -Werr
> > (if an opt-in .config option is enabled).
>
> For each Makefile (does not apply recursively):
> ccflags-$(CONFIG_PROMOTE_WARNINGS_TO_ERROR) += -Werror
>
> When I get around to it:
> ccflags-sched.o-$(CONFIG_PROMOTE_WARNINGS_TO_ERROR) += -Werror

if there's a generic kbuild facility for it i'd love to try something
like that out, on files that i maintain. There should perhaps be a
shortcut for it? Something like:

ccflags-sched.o += -Werror

and kbuild would decide whether CONFIG_PROMOTE_WARNINGS_TO_ERROR is set
and whether to filter out -Werror or not?

Ingo

2008-06-27 20:43:57

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

On Wednesday 25 June 2008 17:19, Linus Torvalds wrote:
> The problem is that right now we absolutely _do_ rely on gcc checking the
> string, and as such we're forced to use standard patterns, and standard
> patterns _only_.

Can we have alternative printk?

asmlinkage int printk(const char * fmt, ...)
__attribute__ ((format (printf, 1, 2))) __cold;
+asmlinkage int custom_printk(const char * fmt, ...) __cold asm ("printk");

There you go. custom_printk() will not be checked by gcc.
No runtime overhead.

> And that means that %M isn't an option, but also that if
> we want symbolic names we'd have to use %p, and not some extension.
>
> But once you drop the 'standard patterns' requirement, I do think you
> should drop it _entirely_, and not just extend it with some pissant
> single-character unreadable mess.

It still makes sense to have some more common ones as single char
for size reasons.
--
vda

2008-06-27 21:25:13

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

> >
> > For each Makefile (does not apply recursively):
> > ccflags-$(CONFIG_PROMOTE_WARNINGS_TO_ERROR) += -Werror
> >
> > When I get around to it:
> > ccflags-sched.o-$(CONFIG_PROMOTE_WARNINGS_TO_ERROR) += -Werror
>
> if there's a generic kbuild facility for it i'd love to try something
> like that out, on files that i maintain. There should perhaps be a
> shortcut for it? Something like:
>
> ccflags-sched.o += -Werror
>
> and kbuild would decide whether CONFIG_PROMOTE_WARNINGS_TO_ERROR is set
> and whether to filter out -Werror or not?

I would prefer to have a simpler CONFIG name and then
use the generic version.
So something like:

ccflags-sched.o-$(CONFIG_CC_Werror) += -Werror
It is in the above line obvious that we for sched.o add the -Werror
option if the CONFIG option CC_Werror is y.

Sam

2008-06-30 07:58:48

by Alexander van Heukelum

[permalink] [raw]
Subject: Re: [PATCH 4/5 v2] x86 boot: show pfn addresses in hex not decimal in some kernel info printks

On Wed, 25 Jun 2008 08:19:13 -0700 (PDT), "Linus Torvalds"
<[email protected]> said:
> On Wed, 25 Jun 2008, Johannes Berg wrote:
> >
> > In networking, we've gone through various incarnations of print_mac()
> > which is similar to the sym() macro Paul proposed, and it turned out to
> > be undesirable because of the way it interacts with static inlines that
> > only optionally contain code at all, the print_mac() function call is
> > still emitted by the compiler. People experimented with marking it
> > __pure but that had other problems.
>
> You don't even have to go that esoteric.
>
> Just printing things like "sector_t" or "u64" is painful, because the
> exact type depends on config options and/or architecture.
>
> > It would be nice to be able to say
> >
> > u8 *eaddr;
> >
> > printk(... %M ..., eaddr);
>
> For special things, I do think we should extend the format more, and
> forget about single-character names. It would be lovely to do them as
> %[mac], %[u64], %[symbol] or similar. Because once you don't rely on gcc
> checking the string, you can do it.

That would confuse the gcc format string checking... A solution that
just crossed my mind is leaving the format string as is (i.e., "%p"),
but prepending it with a special linux-specific string which does not
confuse gcc. Like: "&mac%p"... for simplicity & can be considered always
special in printk, and && can stand for a literal &. (or pick any
other character that is not used frequently in format strings and is
not %, of course.)

> The problem is that right now we absolutely _do_ rely on gcc checking the
> string, and as such we're forced to use standard patterns, and standard
> patterns _only_. And that means that %M isn't an option, but also that if
> we want symbolic names we'd have to use %p, and not some extension.

"&%p" could then be used for a symbol-lookup.

It doesn't help u64, though, but isn't it about time to unify u64 to
"unsigned long long" everywhere, anyhow? Is there any argument against
that except that a big sweep is necessary to clean up new warnings due
to printk format strings?

Greetings,
Alexander

> But once you drop the 'standard patterns' requirement, I do think you
> should drop it _entirely_, and not just extend it with some pissant
> single-character unreadable mess.
>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
--
Alexander van Heukelum
[email protected]

--
http://www.fastmail.fm - IMAP accessible web-mail