2024-02-21 20:27:15

by Jann Horn

Subject: [PATCH 1/2] kallsyms: get rid of code for absolute kallsyms

To compile code for absolute kallsyms, you had to turn off
KALLSYMS_BASE_RELATIVE. I think that, based on the way it was declared in
Kconfig (without a prompt string after "bool"), it wasn't even possible
to deselect it.

Get rid of this config option, as preparation for some kallsyms
optimizations that would otherwise be infeasible.

Signed-off-by: Jann Horn <[email protected]>
---
init/Kconfig | 18 -------
kernel/crash_core.c | 4 --
kernel/kallsyms.c | 10 +---
kernel/kallsyms_internal.h | 1 -
scripts/kallsyms.c | 78 ++++++++++++-----------------
scripts/link-vmlinux.sh | 4 --
tools/perf/tests/vmlinux-kallsyms.c | 1 -
7 files changed, 34 insertions(+), 82 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 8426d59cc634..ef525aacfc4c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1791,24 +1791,6 @@ config KALLSYMS_ABSOLUTE_PERCPU
depends on KALLSYMS
default X86_64 && SMP

-config KALLSYMS_BASE_RELATIVE
- bool
- depends on KALLSYMS
- default y
- help
- Instead of emitting them as absolute values in the native word size,
- emit the symbol references in the kallsyms table as 32-bit entries,
- each containing a relative value in the range [base, base + U32_MAX]
- or, when KALLSYMS_ABSOLUTE_PERCPU is in effect, each containing either
- an absolute value in the range [0, S32_MAX] or a relative value in the
- range [base, base + S32_MAX], where base is the lowest relative symbol
- address encountered in the image.
-
- On 64-bit builds, this reduces the size of the address table by 50%,
- but more importantly, it results in entries whose values are build
- time constants, and no relocation pass is required at runtime to fix
- up the entries based on the runtime load address of the kernel.
-
# end of the "standard kernel features (expert users)" menu

config ARCH_HAS_MEMBARRIER_CALLBACKS
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 75cd6a736d03..1e72e65ca344 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -817,12 +817,8 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_SYMBOL(kallsyms_num_syms);
VMCOREINFO_SYMBOL(kallsyms_token_table);
VMCOREINFO_SYMBOL(kallsyms_token_index);
-#ifdef CONFIG_KALLSYMS_BASE_RELATIVE
VMCOREINFO_SYMBOL(kallsyms_offsets);
VMCOREINFO_SYMBOL(kallsyms_relative_base);
-#else
- VMCOREINFO_SYMBOL(kallsyms_addresses);
-#endif /* CONFIG_KALLSYMS_BASE_RELATIVE */
#endif /* CONFIG_KALLSYMS */

arch_crash_save_vmcoreinfo();
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 18edd57b5fe8..db9bd5ff6383 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -148,9 +148,6 @@ static unsigned int get_symbol_offset(unsigned long pos)

unsigned long kallsyms_sym_address(int idx)
{
- if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
- return kallsyms_addresses[idx];
-
/* values are unsigned offsets if --absolute-percpu is not in effect */
if (!IS_ENABLED(CONFIG_KALLSYMS_ABSOLUTE_PERCPU))
return kallsyms_relative_base + (u32)kallsyms_offsets[idx];
@@ -326,12 +323,9 @@ static unsigned long get_symbol_pos(unsigned long addr,
unsigned long i, low, high, mid;

/* This kernel should never had been booted. */
- if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
- BUG_ON(!kallsyms_addresses);
- else
- BUG_ON(!kallsyms_offsets);
+ BUG_ON(!kallsyms_offsets);

- /* Do a binary search on the sorted kallsyms_addresses array. */
+ /* Do a binary search on the sorted kallsyms_offsets array. */
low = 0;
high = kallsyms_num_syms;

diff --git a/kernel/kallsyms_internal.h b/kernel/kallsyms_internal.h
index 27fabdcc40f5..451a38b88db9 100644
--- a/kernel/kallsyms_internal.h
+++ b/kernel/kallsyms_internal.h
@@ -8,7 +8,6 @@
* These will be re-linked against their real values
* during the second link stage.
*/
-extern const unsigned long kallsyms_addresses[] __weak;
extern const int kallsyms_offsets[] __weak;
extern const u8 kallsyms_names[] __weak;

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 653b92f6d4c8..f35be95adfbe 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -6,7 +6,7 @@
* of the GNU General Public License, incorporated herein by reference.
*
* Usage: kallsyms [--all-symbols] [--absolute-percpu]
- * [--base-relative] [--lto-clang] in.map > out.S
+ * [--lto-clang] in.map > out.S
*
* Table compression uses all the unused char codes on the symbols and
* maps these to the most used substrings (tokens). For instance, it might
@@ -63,7 +63,6 @@ static struct sym_entry **table;
static unsigned int table_size, table_cnt;
static int all_symbols;
static int absolute_percpu;
-static int base_relative;
static int lto_clang;

static int token_profit[0x10000];
@@ -76,7 +75,7 @@ static unsigned char best_table_len[256];
static void usage(void)
{
fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
- "[--base-relative] [--lto-clang] in.map > out.S\n");
+ "[--lto-clang] in.map > out.S\n");
exit(1);
}

@@ -484,54 +483,43 @@ static void write_src(void)
printf("\t.short\t%d\n", best_idx[i]);
printf("\n");

- if (!base_relative)
- output_label("kallsyms_addresses");
- else
- output_label("kallsyms_offsets");
+ output_label("kallsyms_offsets");

for (i = 0; i < table_cnt; i++) {
- if (base_relative) {
- /*
- * Use the offset relative to the lowest value
- * encountered of all relative symbols, and emit
- * non-relocatable fixed offsets that will be fixed
- * up at runtime.
- */
+ /*
+ * Use the offset relative to the lowest value
+ * encountered of all relative symbols, and emit
+ * non-relocatable fixed offsets that will be fixed
+ * up at runtime.
+ */

- long long offset;
- int overflow;
-
- if (!absolute_percpu) {
- offset = table[i]->addr - relative_base;
- overflow = (offset < 0 || offset > UINT_MAX);
- } else if (symbol_absolute(table[i])) {
- offset = table[i]->addr;
- overflow = (offset < 0 || offset > INT_MAX);
- } else {
- offset = relative_base - table[i]->addr - 1;
- overflow = (offset < INT_MIN || offset >= 0);
- }
- if (overflow) {
- fprintf(stderr, "kallsyms failure: "
- "%s symbol value %#llx out of range in relative mode\n",
- symbol_absolute(table[i]) ? "absolute" : "relative",
- table[i]->addr);
- exit(EXIT_FAILURE);
- }
- printf("\t.long\t%#x /* %s */\n", (int)offset, table[i]->sym);
- } else if (!symbol_absolute(table[i])) {
- output_address(table[i]->addr);
+ long long offset;
+ int overflow;
+
+ if (!absolute_percpu) {
+ offset = table[i]->addr - relative_base;
+ overflow = (offset < 0 || offset > UINT_MAX);
+ } else if (symbol_absolute(table[i])) {
+ offset = table[i]->addr;
+ overflow = (offset < 0 || offset > INT_MAX);
} else {
- printf("\tPTR\t%#llx\n", table[i]->addr);
+ offset = relative_base - table[i]->addr - 1;
+ overflow = (offset < INT_MIN || offset >= 0);
}
+ if (overflow) {
+ fprintf(stderr, "kallsyms failure: "
+ "%s symbol value %#llx out of range in relative mode\n",
+ symbol_absolute(table[i]) ? "absolute" : "relative",
+ table[i]->addr);
+ exit(EXIT_FAILURE);
+ }
+ printf("\t.long\t%#x /* %s */\n", (int)offset, table[i]->sym);
}
printf("\n");

- if (base_relative) {
- output_label("kallsyms_relative_base");
- output_address(relative_base);
- printf("\n");
- }
+ output_label("kallsyms_relative_base");
+ output_address(relative_base);
+ printf("\n");

if (lto_clang)
for (i = 0; i < table_cnt; i++)
@@ -813,7 +801,6 @@ int main(int argc, char **argv)
static const struct option long_options[] = {
{"all-symbols", no_argument, &all_symbols, 1},
{"absolute-percpu", no_argument, &absolute_percpu, 1},
- {"base-relative", no_argument, &base_relative, 1},
{"lto-clang", no_argument, &lto_clang, 1},
{},
};
@@ -834,8 +821,7 @@ int main(int argc, char **argv)
if (absolute_percpu)
make_percpus_absolute();
sort_symbols();
- if (base_relative)
- record_relative_base();
+ record_relative_base();
optimize_token_table();
write_src();

diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 7862a8101747..5127371d3393 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -157,10 +157,6 @@ kallsyms()
kallsymopt="${kallsymopt} --absolute-percpu"
fi

- if is_enabled CONFIG_KALLSYMS_BASE_RELATIVE; then
- kallsymopt="${kallsymopt} --base-relative"
- fi
-
if is_enabled CONFIG_LTO_CLANG; then
kallsymopt="${kallsymopt} --lto-clang"
fi
diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index 822f893e67d5..1bb91f2ec5ba 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -26,7 +26,6 @@ static bool is_ignored_symbol(const char *name, char type)
* when --all-symbols is specified so exclude them to get a
* stable symbol list.
*/
- "kallsyms_addresses",
"kallsyms_offsets",
"kallsyms_relative_base",
"kallsyms_num_syms",
--
2.44.0.rc0.258.g7320e95886-goog



2024-02-21 20:27:25

by Jann Horn

Subject: [PATCH 2/2] kallsyms: build faster by using .incbin

Currently, kallsyms builds a big assembly file (~19M with a normal
kernel config), and then the assembler has to turn that big assembly
file back into binary data, which takes around a second per kallsyms
invocation. (Normally there are two kallsyms invocations per build.)

It is much faster to instead directly output binary data, which can
be imported in an assembly file using ".incbin". This is also the
approach taken by arch/x86/boot/compressed/mkpiggy.c.
So this patch switches kallsyms to that approach.

A complication with this is that the endianness of numbers between
host and target might not match (for example, when cross-compiling);
and there seems to be no kconfig symbol that tells us what endianness
the target has.
So pass the path to the intermediate vmlinux ELF file to the kallsyms
tool, and let it parse the ELF header to figure out the target's
endianness.

I have verified that running kallsyms without these changes and
kallsyms with these changes on the same input System.map results
in identical object files.

This change reduces the time for an incremental kernel rebuild
(touch fs/ioctl.c, then re-run make) from 27.7s to 24.1s (medians
over 16 runs each) on my machine - saving around 3.6 seconds.

Signed-off-by: Jann Horn <[email protected]>
---
scripts/kallsyms.c | 196 ++++++++++++++++++++++++++++++++--------
scripts/link-vmlinux.sh | 5 +-
2 files changed, 159 insertions(+), 42 deletions(-)

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index f35be95adfbe..ef03d723aded 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -27,6 +27,10 @@
#include <string.h>
#include <ctype.h>
#include <limits.h>
+#include <endian.h>
+#include <elf.h>
+#include <fcntl.h>
+#include <unistd.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))

@@ -75,7 +79,7 @@ static unsigned char best_table_len[256];
static void usage(void)
{
fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
- "[--lto-clang] in.map > out.S\n");
+ "[--lto-clang] in in.map out.S out.bin\n");
exit(1);
}

@@ -290,20 +294,57 @@ static void read_map(const char *in)
fclose(fp);
}

+static bool is_64bit, is_little_endian;
+static char *asm_path, *bin_path;
+static FILE *asm_file, *bin_file;
+static size_t bin_offset, bin_included;
+
static void output_label(const char *label)
{
- printf(".globl %s\n", label);
- printf("\tALGN\n");
- printf("%s:\n", label);
+ fprintf(asm_file, ".globl %s\n", label);
+ fprintf(asm_file, "\tALGN\n");
+ fprintf(asm_file, "%s:\n", label);
}

/* Provide proper symbols relocatability by their '_text' relativeness. */
static void output_address(unsigned long long addr)
{
if (_text <= addr)
- printf("\tPTR\t_text + %#llx\n", addr - _text);
+ fprintf(asm_file, "\tPTR\t_text + %#llx\n", addr - _text);
else
- printf("\tPTR\t_text - %#llx\n", _text - addr);
+ fprintf(asm_file, "\tPTR\t_text - %#llx\n", _text - addr);
+}
+
+/*
+ * Include all data that has been written into bin_file since the last call to
+ * this function.
+ */
+static void include_bin_data(void)
+{
+ fprintf(asm_file, ".incbin \"%s\", %zu, %zu\n", bin_path,
+ bin_included, bin_offset - bin_included);
+ bin_included = bin_offset;
+}
+
+static void output_bin_data(const void *data, size_t len)
+{
+ if (fwrite(data, 1, len, bin_file) != len) {
+ fprintf(stderr, "kallsyms: unable to write output\n");
+ exit(EXIT_FAILURE);
+ }
+ bin_offset += len;
+}
+static void output_bin_u32(uint32_t value)
+{
+ uint32_t encoded = is_little_endian ? htole32(value) : htobe32(value);
+
+ output_bin_data(&encoded, sizeof(encoded));
+}
+static void output_bin_u16(uint16_t value)
+{
+ uint16_t encoded = is_little_endian ? htole16(value) : htobe16(value);
+
+ output_bin_data(&encoded, sizeof(encoded));
}

/* uncompress a compressed symbol. When this function is called, the best table
@@ -384,25 +425,36 @@ static void sort_symbols_by_name(void)

static void write_src(void)
{
- unsigned int i, k, off;
+ unsigned int i, off;
unsigned int best_idx[256];
unsigned int *markers;
char buf[KSYM_NAME_LEN];

- printf("#include <asm/bitsperlong.h>\n");
- printf("#if BITS_PER_LONG == 64\n");
- printf("#define PTR .quad\n");
- printf("#define ALGN .balign 8\n");
- printf("#else\n");
- printf("#define PTR .long\n");
- printf("#define ALGN .balign 4\n");
- printf("#endif\n");
+ asm_file = fopen(asm_path, "w");
+ if (!asm_file) {
+ perror("unable to open asm output");
+ exit(EXIT_FAILURE);
+ }
+ bin_file = fopen(bin_path, "w");
+ if (!bin_file) {
+ perror("unable to open bin output");
+ exit(EXIT_FAILURE);
+ }
+
+ fprintf(asm_file, "#include <asm/bitsperlong.h>\n");
+ fprintf(asm_file, "#if BITS_PER_LONG == 64\n");
+ fprintf(asm_file, "#define PTR .quad\n");
+ fprintf(asm_file, "#define ALGN .balign 8\n");
+ fprintf(asm_file, "#else\n");
+ fprintf(asm_file, "#define PTR .long\n");
+ fprintf(asm_file, "#define ALGN .balign 4\n");
+ fprintf(asm_file, "#endif\n");

- printf("\t.section .rodata, \"a\"\n");
+ fprintf(asm_file, "\t.section .rodata, \"a\"\n");

output_label("kallsyms_num_syms");
- printf("\t.long\t%u\n", table_cnt);
- printf("\n");
+ fprintf(asm_file, "\t.long\t%u\n", table_cnt);
+ fprintf(asm_file, "\n");

/* table of offset markers, that give the offset in the compressed stream
* every 256 symbols */
@@ -437,20 +489,23 @@ static void write_src(void)
/* Encode length with ULEB128. */
if (table[i]->len <= 0x7F) {
/* Most symbols use a single byte for the length. */
- printf("\t.byte 0x%02x", table[i]->len);
+ unsigned char len_encoded[1] = { table[i]->len };
+
+ output_bin_data(len_encoded, sizeof(len_encoded));
off += table[i]->len + 1;
} else {
/* "Big" symbols use two bytes. */
- printf("\t.byte 0x%02x, 0x%02x",
+ unsigned char len_encoded[2] = {
(table[i]->len & 0x7F) | 0x80,
- (table[i]->len >> 7) & 0x7F);
+ (table[i]->len >> 7) & 0x7F
+ };
+
+ output_bin_data(len_encoded, sizeof(len_encoded));
off += table[i]->len + 2;
}
- for (k = 0; k < table[i]->len; k++)
- printf(", 0x%02x", table[i]->sym[k]);
- printf("\n");
+ output_bin_data(table[i]->sym, table[i]->len);
}
- printf("\n");
+ include_bin_data();

/*
* Now that we wrote out the compressed symbol names, restore the
@@ -463,8 +518,8 @@ static void write_src(void)

output_label("kallsyms_markers");
for (i = 0; i < ((table_cnt + 255) >> 8); i++)
- printf("\t.long\t%u\n", markers[i]);
- printf("\n");
+ output_bin_u32(markers[i]);
+ include_bin_data();

free(markers);

@@ -473,15 +528,15 @@ static void write_src(void)
for (i = 0; i < 256; i++) {
best_idx[i] = off;
expand_symbol(best_table[i], best_table_len[i], buf);
- printf("\t.asciz\t\"%s\"\n", buf);
+ output_bin_data(buf, strlen(buf)+1);
off += strlen(buf) + 1;
}
- printf("\n");
+ include_bin_data();

output_label("kallsyms_token_index");
for (i = 0; i < 256; i++)
- printf("\t.short\t%d\n", best_idx[i]);
- printf("\n");
+ output_bin_u16(best_idx[i]);
+ include_bin_data();

output_label("kallsyms_offsets");

@@ -513,13 +568,12 @@ static void write_src(void)
table[i]->addr);
exit(EXIT_FAILURE);
}
- printf("\t.long\t%#x /* %s */\n", (int)offset, table[i]->sym);
+ output_bin_u32((uint32_t)offset);
}
- printf("\n");
+ include_bin_data();

output_label("kallsyms_relative_base");
output_address(relative_base);
- printf("\n");

if (lto_clang)
for (i = 0; i < table_cnt; i++)
@@ -527,12 +581,24 @@ static void write_src(void)

sort_symbols_by_name();
output_label("kallsyms_seqs_of_names");
- for (i = 0; i < table_cnt; i++)
- printf("\t.byte 0x%02x, 0x%02x, 0x%02x\n",
+ for (i = 0; i < table_cnt; i++) {
+ unsigned char seq_encoded[3] = {
(unsigned char)(table[i]->seq >> 16),
(unsigned char)(table[i]->seq >> 8),
- (unsigned char)(table[i]->seq >> 0));
- printf("\n");
+ (unsigned char)(table[i]->seq >> 0)
+ };
+ output_bin_data(seq_encoded, sizeof(seq_encoded));
+ }
+ include_bin_data();
+
+ if (fclose(asm_file)) {
+ perror("unable to write to asm output");
+ exit(EXIT_FAILURE);
+ }
+ if (fclose(bin_file)) {
+ perror("unable to write to bin output");
+ exit(EXIT_FAILURE);
+ }
}


@@ -795,6 +861,52 @@ static void record_relative_base(void)
}
}

+static void get_target_data_types(const char *elf_path)
+{
+ int elf_fd = open(elf_path, O_RDONLY);
+ unsigned char elf_ident[EI_NIDENT];
+
+ if (elf_fd == -1) {
+ perror("open ELF");
+ exit(EXIT_FAILURE);
+ }
+ if (read(elf_fd, elf_ident, sizeof(elf_ident)) != sizeof(elf_ident)) {
+ perror("read ELF header");
+ exit(EXIT_FAILURE);
+ }
+ close(elf_fd);
+
+ if (elf_ident[EI_MAG0] != ELFMAG0 || elf_ident[EI_MAG1] != ELFMAG1 ||
+ elf_ident[EI_MAG2] != ELFMAG2 || elf_ident[EI_MAG3] != ELFMAG3) {
+ fprintf(stderr, "kallsyms: input ELF has invalid header\n");
+ exit(EXIT_FAILURE);
+ }
+
+ switch (elf_ident[EI_CLASS]) {
+ case ELFCLASS32:
+ is_64bit = false;
+ break;
+ case ELFCLASS64:
+ is_64bit = true;
+ break;
+ default:
+ fprintf(stderr, "kallsyms: input ELF has invalid bitness\n");
+ exit(EXIT_FAILURE);
+ }
+
+ switch (elf_ident[EI_DATA]) {
+ case ELFDATA2LSB:
+ is_little_endian = true;
+ break;
+ case ELFDATA2MSB:
+ is_little_endian = false;
+ break;
+ default:
+ fprintf(stderr, "kallsyms: input ELF has invalid endianness\n");
+ exit(EXIT_FAILURE);
+ }
+}
+
int main(int argc, char **argv)
{
while (1) {
@@ -813,10 +925,14 @@ int main(int argc, char **argv)
usage();
}

- if (optind >= argc)
+ if (optind+4 != argc)
usage();
+ asm_path = argv[optind+2];
+ bin_path = argv[optind+3];
+
+ get_target_data_types(argv[optind]);

- read_map(argv[optind]);
+ read_map(argv[optind+1]);
shrink_table();
if (absolute_percpu)
make_percpus_absolute();
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 5127371d3393..1b5ff33a2d4a 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -162,7 +162,7 @@ kallsyms()
fi

info KSYMS ${2}
- scripts/kallsyms ${kallsymopt} ${1} > ${2}
+ scripts/kallsyms ${kallsymopt} ${1} ${2} ${3} ${4}
}

# Perform one step in kallsyms generation, including temporary linking of
@@ -173,10 +173,11 @@ kallsyms_step()
kallsyms_vmlinux=.tmp_vmlinux.kallsyms${1}
kallsymso=${kallsyms_vmlinux}.o
kallsyms_S=${kallsyms_vmlinux}.S
+ kallsyms_bin=${kallsyms_vmlinux}.bin

vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o}
mksysmap ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms ${kallsymso_prev}
- kallsyms ${kallsyms_vmlinux}.syms ${kallsyms_S}
+ kallsyms ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms ${kallsyms_S} ${kallsyms_bin}

info AS ${kallsyms_S}
${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-21 20:30:51

by Jann Horn

Subject: Re: [PATCH 2/2] kallsyms: build faster by using .incbin

On Wed, Feb 21, 2024 at 9:27 PM Jann Horn <[email protected]> wrote:
> Currently, kallsyms builds a big assembly file (~19M with a normal
> kernel config), and then the assembler has to turn that big assembly
> file back into binary data, which takes around a second per kallsyms
> invocation. (Normally there are two kallsyms invocations per build.)
>
> It is much faster to instead directly output binary data, which can
> be imported in an assembly file using ".incbin". This is also the
> approach taken by arch/x86/boot/compressed/mkpiggy.c.
> So this patch switches kallsyms to that approach.
>
> A complication with this is that the endianness of numbers between
> host and target might not match (for example, when cross-compiling);
> and there seems to be no kconfig symbol that tells us what endianness
> the target has.
> So pass the path to the intermediate vmlinux ELF file to the kallsyms
> tool, and let it parse the ELF header to figure out the target's
> endianness.
>
> I have verified that running kallsyms without these changes and
> kallsyms with these changes on the same input System.map results
> in identical object files.
>
> This change reduces the time for an incremental kernel rebuild
> (touch fs/ioctl.c, then re-run make) from 27.7s to 24.1s (medians
> over 16 runs each) on my machine - saving around 3.6 seconds.

Ah, I found no maintainer for this file in MAINTAINERS, but now that
I'm looking at the git history, it looks like fixes have come in
through Masahiro Yamada's kbuild tree? So I'm not entirely sure
whether the maintainer for this is Masahiro Yamada or akpm.

2024-02-21 21:19:21

by Arnd Bergmann

Subject: Re: [PATCH 1/2] kallsyms: get rid of code for absolute kallsyms

On Wed, Feb 21, 2024, at 21:26, Jann Horn wrote:
> To compile code for absolute kallsyms, you had to turn off
> KALLSYMS_BASE_RELATIVE. I think that, based on the way it was declared in
> Kconfig (without a prompt string after "bool"), it wasn't even possible
> to deselect it.
>
> Get rid of this config option, as preparation for some kallsyms
> optimizations that would otherwise be infeasible.
>
> Signed-off-by: Jann Horn <[email protected]>

Nice catch!

The code was needed until cf8e8658100d ("arch: Remove
Itanium (IA-64) architecture"), and we missed that this
allowed the cleanup you now did.

Acked-by: Arnd Bergmann <[email protected]>

Arnd

2024-02-22 04:07:30

by Masahiro Yamada

Subject: Re: [PATCH 2/2] kallsyms: build faster by using .incbin

On Thu, Feb 22, 2024 at 5:27 AM Jann Horn <[email protected]> wrote:
>
> Currently, kallsyms builds a big assembly file (~19M with a normal
> kernel config), and then the assembler has to turn that big assembly
> file back into binary data, which takes around a second per kallsyms
> invocation. (Normally there are two kallsyms invocations per build.)
>
> It is much faster to instead directly output binary data, which can
> be imported in an assembly file using ".incbin". This is also the
> approach taken by arch/x86/boot/compressed/mkpiggy.c.


Yes, that is a sensible case because it just wraps the binary
without any modification.
> So this patch switches kallsyms to that approach.
>
> A complication with this is that the endianness of numbers between
> host and target might not match (for example, when cross-compiling);
> and there seems to be no kconfig symbol that tells us what endianness
> the target has.



CONFIG_CPU_BIG_ENDIAN is it.



You could do this:

if is_enabled CONFIG_CPU_BIG_ENDIAN; then
kallsymopt="${kallsymopt} --big-endian"
fi

if is_enabled CONFIG_64BIT; then
kallsymopt="${kallsymopt} --64bit"
fi
> So pass the path to the intermediate vmlinux ELF file to the kallsyms
> tool, and let it parse the ELF header to figure out the target's
> endianness.
>
> I have verified that running kallsyms without these changes and
> kallsyms with these changes on the same input System.map results
> in identical object files.
>
> This change reduces the time for an incremental kernel rebuild
> (touch fs/ioctl.c, then re-run make) from 27.7s to 24.1s (medians
> over 16 runs each) on my machine - saving around 3.6 seconds.




This reverts bea5b74504742f1b51b815bcaf9a70bddbc49ce3

Somebody might struggle with debugging again, but I am not sure.

Arnd?



If the effort were "I invented a way to do kallsyms in
one pass instead of three", it would be so much more attractive.


I am not sure this degree of optimization is exciting,
but I confirmed that a few seconds were saved for the defconfig.

I am neutral about this.



For debugging purposes, perhaps we can add a --debug option
to keep open the possibility of outputting the full assembly
as comments.



>
> Signed-off-by: Jann Horn <[email protected]>
> ---
> scripts/kallsyms.c | 196 ++++++++++++++++++++++++++++++++--------
> scripts/link-vmlinux.sh | 5 +-
> 2 files changed, 159 insertions(+), 42 deletions(-)
>
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index f35be95adfbe..ef03d723aded 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -27,6 +27,10 @@
> #include <string.h>
> #include <ctype.h>
> #include <limits.h>
> +#include <endian.h>
> +#include <elf.h>
> +#include <fcntl.h>
> +#include <unistd.h>
>
> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
>
> @@ -75,7 +79,7 @@ static unsigned char best_table_len[256];
> static void usage(void)
> {
> fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] "
> - "[--lto-clang] in.map > out.S\n");
> + "[--lto-clang] in in.map out.S out.bin\n");
> exit(1);
> }
>
> @@ -290,20 +294,57 @@ static void read_map(const char *in)
> fclose(fp);
> }
>
> +static bool is_64bit, is_little_endian;
> +static char *asm_path, *bin_path;
> +static FILE *asm_file, *bin_file;
> +static size_t bin_offset, bin_included;
> +
> static void output_label(const char *label)
> {
> - printf(".globl %s\n", label);
> - printf("\tALGN\n");
> - printf("%s:\n", label);
> + fprintf(asm_file, ".globl %s\n", label);
> + fprintf(asm_file, "\tALGN\n");
> + fprintf(asm_file, "%s:\n", label);
> }
>
> /* Provide proper symbols relocatability by their '_text' relativeness. */
> static void output_address(unsigned long long addr)
> {
> if (_text <= addr)
> - printf("\tPTR\t_text + %#llx\n", addr - _text);
> + fprintf(asm_file, "\tPTR\t_text + %#llx\n", addr - _text);
> else
> - printf("\tPTR\t_text - %#llx\n", _text - addr);
> + fprintf(asm_file, "\tPTR\t_text - %#llx\n", _text - addr);
> +}
> +
> +/*
> + * Include all data that has been written into bin_file since the last call to
> + * this function.
> + */
> +static void include_bin_data(void)
> +{
> + fprintf(asm_file, ".incbin \"%s\", %zu, %zu\n", bin_path,
> + bin_included, bin_offset - bin_included);
> + bin_included = bin_offset;
> +}
> +
> +static void output_bin_data(const void *data, size_t len)
> +{
> + if (fwrite(data, 1, len, bin_file) != len) {
> + fprintf(stderr, "kallsyms: unable to write output\n");
> + exit(EXIT_FAILURE);
> + }
> + bin_offset += len;
> +}
> +static void output_bin_u32(uint32_t value)
> +{
> + uint32_t encoded = is_little_endian ? htole32(value) : htobe32(value);
> +
> + output_bin_data(&encoded, sizeof(encoded));
> +}
> +static void output_bin_u16(uint16_t value)



You might want to insert a blank line
between functions.
> +{
> + uint16_t encoded = is_little_endian ? htole16(value) : htobe16(value);
> +
> + output_bin_data(&encoded, sizeof(encoded));
> }
>
> /* uncompress a compressed symbol. When this function is called, the best table
> @@ -384,25 +425,36 @@ static void sort_symbols_by_name(void)
>
> static void write_src(void)
> {
> - unsigned int i, k, off;
> + unsigned int i, off;
> unsigned int best_idx[256];
> unsigned int *markers;
> char buf[KSYM_NAME_LEN];
>
> - printf("#include <asm/bitsperlong.h>\n");
> - printf("#if BITS_PER_LONG == 64\n");
> - printf("#define PTR .quad\n");
> - printf("#define ALGN .balign 8\n");
> - printf("#else\n");
> - printf("#define PTR .long\n");
> - printf("#define ALGN .balign 4\n");
> - printf("#endif\n");
> + asm_file = fopen(asm_path, "w");
> + if (!asm_file) {
> + perror("unable to open asm output");
> + exit(EXIT_FAILURE);
> + }
> + bin_file = fopen(bin_path, "w");
> + if (!bin_file) {
> + perror("unable to open bin output");
> + exit(EXIT_FAILURE);
> + }
> +
> + fprintf(asm_file, "#include <asm/bitsperlong.h>\n");
> + fprintf(asm_file, "#if BITS_PER_LONG == 64\n");
> + fprintf(asm_file, "#define PTR .quad\n");
> + fprintf(asm_file, "#define ALGN .balign 8\n");
> + fprintf(asm_file, "#else\n");
> + fprintf(asm_file, "#define PTR .long\n");
> + fprintf(asm_file, "#define ALGN .balign 4\n");
> + fprintf(asm_file, "#endif\n");
With this patch, this tool will need to be aware of
whether the target is 64-bit or not.

There is no point in including <asm/bitsperlong.h>
to check BITS_PER_LONG.
>
> - printf("\t.section .rodata, \"a\"\n");
> + fprintf(asm_file, "\t.section .rodata, \"a\"\n");
>
> output_label("kallsyms_num_syms");
> - printf("\t.long\t%u\n", table_cnt);
> - printf("\n");
> + fprintf(asm_file, "\t.long\t%u\n", table_cnt);
> + fprintf(asm_file, "\n");
>
> /* table of offset markers, that give the offset in the compressed stream
> * every 256 symbols */
> @@ -437,20 +489,23 @@ static void write_src(void)
> /* Encode length with ULEB128. */
> if (table[i]->len <= 0x7F) {
> /* Most symbols use a single byte for the length. */
> - printf("\t.byte 0x%02x", table[i]->len);
> + unsigned char len_encoded[1] = { table[i]->len };
> +
> + output_bin_data(len_encoded, sizeof(len_encoded));
> off += table[i]->len + 1;
> } else {
> /* "Big" symbols use two bytes. */
> - printf("\t.byte 0x%02x, 0x%02x",
> + unsigned char len_encoded[2] = {
> (table[i]->len & 0x7F) | 0x80,
> - (table[i]->len >> 7) & 0x7F);
> + (table[i]->len >> 7) & 0x7F
> + };
> +
> + output_bin_data(len_encoded, sizeof(len_encoded));
> off += table[i]->len + 2;
> }
> - for (k = 0; k < table[i]->len; k++)
> - printf(", 0x%02x", table[i]->sym[k]);
> - printf("\n");
> + output_bin_data(table[i]->sym, table[i]->len);
> }
> - printf("\n");
> + include_bin_data();
>
> /*
> * Now that we wrote out the compressed symbol names, restore the
> @@ -463,8 +518,8 @@ static void write_src(void)
>
> output_label("kallsyms_markers");
> for (i = 0; i < ((table_cnt + 255) >> 8); i++)
> - printf("\t.long\t%u\n", markers[i]);
> - printf("\n");
> + output_bin_u32(markers[i]);
> + include_bin_data();
>
> free(markers);
>
> @@ -473,15 +528,15 @@ static void write_src(void)
> for (i = 0; i < 256; i++) {
> best_idx[i] = off;
> expand_symbol(best_table[i], best_table_len[i], buf);
> - printf("\t.asciz\t\"%s\"\n", buf);
> + output_bin_data(buf, strlen(buf)+1);
> off += strlen(buf) + 1;
> }
> - printf("\n");
> + include_bin_data();
>
> output_label("kallsyms_token_index");
> for (i = 0; i < 256; i++)
> - printf("\t.short\t%d\n", best_idx[i]);
> - printf("\n");
> + output_bin_u16(best_idx[i]);
> + include_bin_data();
>
> output_label("kallsyms_offsets");
>
> @@ -513,13 +568,12 @@ static void write_src(void)
> table[i]->addr);
> exit(EXIT_FAILURE);
> }
> - printf("\t.long\t%#x /* %s */\n", (int)offset, table[i]->sym);
> + output_bin_u32((uint32_t)offset);
> }
> - printf("\n");
> + include_bin_data();
>
> output_label("kallsyms_relative_base");
> output_address(relative_base);
> - printf("\n");
>
> if (lto_clang)
> for (i = 0; i < table_cnt; i++)
> @@ -527,12 +581,24 @@ static void write_src(void)
>
> sort_symbols_by_name();
> output_label("kallsyms_seqs_of_names");
> - for (i = 0; i < table_cnt; i++)
> - printf("\t.byte 0x%02x, 0x%02x, 0x%02x\n",
> + for (i = 0; i < table_cnt; i++) {
> + unsigned char seq_encoded[3] = {
> (unsigned char)(table[i]->seq >> 16),
> (unsigned char)(table[i]->seq >> 8),
> - (unsigned char)(table[i]->seq >> 0));
> - printf("\n");
> + (unsigned char)(table[i]->seq >> 0)
> + };
> + output_bin_data(seq_encoded, sizeof(seq_encoded));
> + }
> + include_bin_data();
> +
> + if (fclose(asm_file)) {
> + perror("unable to write to asm output");
> + exit(EXIT_FAILURE);
> + }
> + if (fclose(bin_file)) {
> + perror("unable to write to bin output");
> + exit(EXIT_FAILURE);
> + }
> }
>
>
> @@ -795,6 +861,52 @@ static void record_relative_base(void)
> }
> }
>
> +static void get_target_data_types(const char *elf_path)
> +{
> + int elf_fd = open(elf_path, O_RDONLY);
> + unsigned char elf_ident[EI_NIDENT];
> +
> + if (elf_fd == -1) {
> + perror("open ELF");
> + exit(EXIT_FAILURE);
> + }
> + if (read(elf_fd, elf_ident, sizeof(elf_ident)) != sizeof(elf_ident)) {
> + perror("read ELF header");
> + exit(EXIT_FAILURE);
> + }
> + close(elf_fd);
> +
> + if (elf_ident[EI_MAG0] != ELFMAG0 || elf_ident[EI_MAG1] != ELFMAG1 ||
> + elf_ident[EI_MAG2] != ELFMAG2 || elf_ident[EI_MAG3] != ELFMAG3) {
> + fprintf(stderr, "kallsyms: input ELF has invalid header\n");
> + exit(EXIT_FAILURE);
> + }
> +
> + switch (elf_ident[EI_CLASS]) {
> + case ELFCLASS32:
> + is_64bit = false;
> + break;
> + case ELFCLASS64:
> + is_64bit = true;
> + break;
> + default:
> + fprintf(stderr, "kallsyms: input ELF has invalid bitness\n");
> + exit(EXIT_FAILURE);
> + }
> +
> + switch (elf_ident[EI_DATA]) {
> + case ELFDATA2LSB:
> + is_little_endian = true;
> + break;
> + case ELFDATA2MSB:
> + is_little_endian = false;
> + break;
> + default:
> + fprintf(stderr, "kallsyms: input ELF has invalid endianness\n");
> + exit(EXIT_FAILURE);
> + }
> +}
> +
> int main(int argc, char **argv)
> {
> while (1) {
> @@ -813,10 +925,14 @@ int main(int argc, char **argv)
> usage();
> }
>
> - if (optind >= argc)
> + if (optind+4 != argc)
> usage();
> + asm_path = argv[optind+2];
> + bin_path = argv[optind+3];
> +
> + get_target_data_types(argv[optind]);
>
> - read_map(argv[optind]);
> + read_map(argv[optind+1]);
> shrink_table();
> if (absolute_percpu)
> make_percpus_absolute();
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 5127371d3393..1b5ff33a2d4a 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -162,7 +162,7 @@ kallsyms()
> fi
>
> info KSYMS ${2}
> - scripts/kallsyms ${kallsymopt} ${1} > ${2}
> + scripts/kallsyms ${kallsymopt} ${1} ${2} ${3} ${4}
> }
>
> # Perform one step in kallsyms generation, including temporary linking of
> @@ -173,10 +173,11 @@ kallsyms_step()
> kallsyms_vmlinux=.tmp_vmlinux.kallsyms${1}
> kallsymso=${kallsyms_vmlinux}.o
> kallsyms_S=${kallsyms_vmlinux}.S
> + kallsyms_bin=${kallsyms_vmlinux}.bin
>
> vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o}
> mksysmap ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms ${kallsymso_prev}
> - kallsyms ${kallsyms_vmlinux}.syms ${kallsyms_S}
> + kallsyms ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms ${kallsyms_S} ${kallsyms_bin}
>
> info AS ${kallsyms_S}
> ${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \
> --
> 2.44.0.rc0.258.g7320e95886-goog
>


--
Best Regards
Masahiro Yamada

2024-02-22 08:20:57

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] kallsyms: build faster by using .incbin

On Thu, Feb 22, 2024, at 05:06, Masahiro Yamada wrote:
> On Thu, Feb 22, 2024 at 5:27 AM Jann Horn <[email protected]> wrote:
>> This change reduces the time for an incremental kernel rebuild
>> (touch fs/ioctl.c, then re-run make) from 27.7s to 24.1s (medians
>> over 16 runs each) on my machine - saving around 3.6 seconds.

Nice!

..

> This reverts bea5b74504742f1b51b815bcaf9a70bddbc49ce3
>
> Somebody might struggle with debugging again, but I am not sure.
>
> Arnd?

So far, I have not needed it again, but it's only been a year.

> If the effort were "I invented a way to do kallsyms in
> one pass instead of three", it would be so much more attractive.
>
>
> I am not so sure if this grain of the optimization is exciting,
> but I confirmed that a few seconds were saved for the defconfig.
>
> I am neutral about this.

I think the time savings are worth it, especially since this
is going to help anyone building on large machines where
the compile stage is already optimized a lot but the link
stage is limited by single-thread performance.

Arnd

2024-02-22 11:21:19

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH 2/2] kallsyms: build faster by using .incbin

On Thu, Feb 22, 2024 at 5:07 AM Masahiro Yamada <[email protected]> wrote:
> On Thu, Feb 22, 2024 at 5:27 AM Jann Horn <[email protected]> wrote:
> >
> > Currently, kallsyms builds a big assembly file (~19M with a normal
> > kernel config), and then the assembler has to turn that big assembly
> > file back into binary data, which takes around a second per kallsyms
> > invocation. (Normally there are two kallsyms invocations per build.)
> >
> > It is much faster to instead directly output binary data, which can
> > be imported in an assembly file using ".incbin". This is also the
> > approach taken by arch/x86/boot/compressed/mkpiggy.c.
>
>
> Yes, that is a sensible case because it just wraps the binary
> without any modification.
>
>
>
>
> > So this patch switches kallsyms to that approach.
> >
> > A complication with this is that the endianness of numbers between
> > host and target might not match (for example, when cross-compiling);
> > and there seems to be no kconfig symbol that tells us what endianness
> > the target has.
>
>
>
> CONFIG_CPU_BIG_ENDIAN is it.
>
>
>
> You could do this:
>
> if is_enabled CONFIG_CPU_BIG_ENDIAN; then
> kallsymopt="${kallsymopt} --big-endian"
> fi
>
> if is_enabled CONFIG_64BIT; then
> kallsymopt="${kallsymopt} --64bit"
> fi

Aah, nice, thanks, I searched for endianness kconfig flags but somehow
missed that one.

Though actually, I think further optimizations might make it necessary
to directly operate on ELF files anyway, in which case it would
probably be easier to keep using the ELF header...

> > So pass the path to the intermediate vmlinux ELF file to the kallsyms
> > tool, and let it parse the ELF header to figure out the target's
> > endianness.
> >
> > I have verified that running kallsyms without these changes and
> > kallsyms with these changes on the same input System.map results
> > in identical object files.
> >
> > This change reduces the time for an incremental kernel rebuild
> > (touch fs/ioctl.c, then re-run make) from 27.7s to 24.1s (medians
> > over 16 runs each) on my machine - saving around 3.6 seconds.
>
>
>
>
> This reverts bea5b74504742f1b51b815bcaf9a70bddbc49ce3
>
> Somebody might struggle with debugging again, but I am not sure.
>
> Arnd?
>
>
>
> If the effort were "I invented a way to do kallsyms in
> one pass instead of three", it would be so much more attractive.

Actually, I was chatting with someone about this yesterday, and I
think I have an idea on how to get rid of two link steps... I might
try out some stuff and then come back with another version of this
series afterwards.

> I am not so sure if this grain of the optimization is exciting,
> but I confirmed that a few seconds were saved for the defconfig.
>
> I am neutral about this.
>
>
>
> For the debugging purpose, perhaps we can add --debug option
> in order to leave the possibility for
> outputting the full assembly as comments.

Hm, maybe... though that also involves a lot of duplicate code...
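[Editor's note: the definitions of the binary-output helpers the patch calls
(output_bin_data(), output_bin_u16(), include_bin_data()) are outside this
excerpt. A minimal sketch of what they might look like, assuming hypothetical
globals asm_file, bin_file, bin_path and is_little_endian matching those
referenced in the hunks above:]

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical globals mirroring the names referenced by the patch. */
static FILE *asm_file, *bin_file;
static const char *bin_path;
static bool is_little_endian;
static long bin_start;	/* file offset where the current data chunk began */

/* Append raw bytes to the binary output file. */
static void output_bin_data(const void *data, size_t len)
{
	if (fwrite(data, 1, len, bin_file) != len) {
		perror("write bin output");
		exit(EXIT_FAILURE);
	}
}

/* Append a 16-bit value in the target's byte order. */
static void output_bin_u16(uint16_t value)
{
	unsigned char bytes[2];

	if (is_little_endian) {
		bytes[0] = value & 0xff;
		bytes[1] = value >> 8;
	} else {
		bytes[0] = value >> 8;
		bytes[1] = value & 0xff;
	}
	output_bin_data(bytes, sizeof(bytes));
}

/*
 * Emit an .incbin directive into the generated .S file covering the
 * bytes written since the previous include_bin_data() call, then start
 * a new chunk. The assembler pulls the data in verbatim, so it never
 * has to parse millions of .long/.byte lines.
 */
static void include_bin_data(void)
{
	long bin_end = ftell(bin_file);

	fprintf(asm_file, "\t.incbin \"%s\", %ld, %ld\n\n",
		bin_path, bin_start, bin_end - bin_start);
	bin_start = bin_end;
}
```

This is only an illustration of the technique; the actual helper
definitions live in the parts of scripts/kallsyms.c not quoted here.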

2024-02-23 04:27:11

by Masahiro Yamada

[permalink] [raw]
Subject: Re: [PATCH 2/2] kallsyms: build faster by using .incbin

On Thu, Feb 22, 2024 at 11:21 PM Jann Horn <[email protected]> wrote:
>
> On Thu, Feb 22, 2024 at 12:20 PM Jann Horn <[email protected]> wrote:
> > On Thu, Feb 22, 2024 at 5:07 AM Masahiro Yamada <[email protected]> wrote:
> > > On Thu, Feb 22, 2024 at 5:27 AM Jann Horn <[email protected]> wrote:
> > > >
> > > > Currently, kallsyms builds a big assembly file (~19M with a normal
> > > > kernel config), and then the assembler has to turn that big assembly
> > > > file back into binary data, which takes around a second per kallsyms
> > > > invocation. (Normally there are two kallsyms invocations per build.)
> > > >
> > > > It is much faster to instead directly output binary data, which can
> > > > be imported in an assembly file using ".incbin". This is also the
> > > > approach taken by arch/x86/boot/compressed/mkpiggy.c.
> > >
> > >
> > > Yes, that is a sensible case because it just wraps the binary
> > > without any modification.
> > >
> > >
> > >
> > >
> > > > So this patch switches kallsyms to that approach.
> > > >
> > > > A complication with this is that the endianness of numbers between
> > > > host and target might not match (for example, when cross-compiling);
> > > > and there seems to be no kconfig symbol that tells us what endianness
> > > > the target has.
> > >
> > >
> > >
> > > CONFIG_CPU_BIG_ENDIAN is it.
> > >
> > >
> > >
> > > You could do this:
> > >
> > > if is_enabled CONFIG_CPU_BIG_ENDIAN; then
> > > kallsymopt="${kallsymopt} --big-endian"
> > > fi
> > >
> > > if is_enabled CONFIG_64BIT; then
> > > kallsymopt="${kallsymopt} --64bit"
> > > fi
> >
> > Aah, nice, thanks, I searched for endianness kconfig flags but somehow
> > missed that one.
> >
> > Though actually, I think further optimizations might make it necessary
> > to directly operate on ELF files anyway, in which case it would
> > probably be easier to keep using the ELF header...
> >
> > > > So pass the path to the intermediate vmlinux ELF file to the kallsyms
> > > > tool, and let it parse the ELF header to figure out the target's
> > > > endianness.
> > > >
> > > > I have verified that running kallsyms without these changes and
> > > > kallsyms with these changes on the same input System.map results
> > > > in identical object files.
> > > >
> > > > This change reduces the time for an incremental kernel rebuild
> > > > (touch fs/ioctl.c, then re-run make) from 27.7s to 24.1s (medians
> > > > over 16 runs each) on my machine - saving around 3.6 seconds.
> > >
> > >
> > >
> > >
> > > This reverts bea5b74504742f1b51b815bcaf9a70bddbc49ce3
> > >
> > > Somebody might struggle with debugging again, but I am not sure.
> > >
> > > Arnd?
> > >
> > >
> > >
> > > If the effort were "I invented a way to do kallsyms in
> > > one pass instead of three", it would be so much more attractive.
> >
> > Actually, I was chatting with someone about this yesterday, and I
> > think I have an idea on how to get rid of two link steps... I might
> > try out some stuff and then come back with another version of this
> > series afterwards.
>
> I think basically we could change kallsyms so that on the second run,
> it checks if the kallsyms layout is the same as on the first run, and
> if yes, directly overwrite the relevant part of vmlinux. (And adjust
> the relative_base.) That would save us the final link... does that
> sound like a reasonable idea?


I do not know how we can save the final link.

Inserting the kallsyms data into the .rodata section
would change the addresses of all symbols that come after.
Only the linker can sort out the address change.


>
> I don't really have any good ideas for saving more than that, given
> that we want to squeeze the kallsyms in between the data and bss
> sections, so we can't just append it at the end of vmlinux... we could
> get the symbol list from vmlinux.o instead of linking
> ".tmp_vmlinux.kallsyms1", but the comments in link-vmlinux.sh say that
> extra linker-generated symbols might appear, and I guess we probably
> don't want to miss those...


I knew it was not trivial.
If you do not have an idea, you do not need to change it.




--
Best Regards
Masahiro Yamada

2024-02-23 12:35:42

by Masahiro Yamada

[permalink] [raw]
Subject: Re: [PATCH 1/2] kallsyms: get rid of code for absolute kallsyms

On Thu, Feb 22, 2024 at 5:27 AM Jann Horn <[email protected]> wrote:
>
> To compile code for absolute kallsyms, you had to turn off
> KALLSYMS_BASE_RELATIVE. I think based on the way it was declared in
> Kconfig (without a string after "bool"), it wasn't even possible to
> select it.
>
> Get rid of this config option, as preparation for some kallsyms
> optimizations that would otherwise be infeasible.
>
> Signed-off-by: Jann Horn <[email protected]>


The code is correct.
Based on Arnd's feedback, could you rephrase the commit description?

When 2213e9a66bb87d83 was applied, IA64 and (TILE && 64BIT)
opted out of KALLSYMS_BASE_RELATIVE.
Now, both architectures are gone.





In the commit description, for instance, you can say:

Commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture")
removed the last use for the combination of KALLSYMS=y and
KALLSYMS_BASE_RELATIVE=n.






--
Best Regards
Masahiro Yamada

2024-02-22 14:21:50

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH 2/2] kallsyms: build faster by using .incbin

On Thu, Feb 22, 2024 at 12:20 PM Jann Horn <[email protected]> wrote:
> On Thu, Feb 22, 2024 at 5:07 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> > On Thu, Feb 22, 2024 at 5:27 AM Jann Horn <[email protected]> wrote:
> > >
> > > Currently, kallsyms builds a big assembly file (~19M with a normal
> > > kernel config), and then the assembler has to turn that big assembly
> > > file back into binary data, which takes around a second per kallsyms
> > > invocation. (Normally there are two kallsyms invocations per build.)
> > >
> > > It is much faster to instead directly output binary data, which can
> > > be imported in an assembly file using ".incbin". This is also the
> > > approach taken by arch/x86/boot/compressed/mkpiggy.c.
> >
> >
> > Yes, that is a sensible case because it just wraps the binary
> > without any modification.
> >
> >
> >
> >
> > > So this patch switches kallsyms to that approach.
> > >
> > > A complication with this is that the endianness of numbers between
> > > host and target might not match (for example, when cross-compiling);
> > > and there seems to be no kconfig symbol that tells us what endianness
> > > the target has.
> >
> >
> >
> > CONFIG_CPU_BIG_ENDIAN is it.
> >
> >
> >
> > You could do this:
> >
> > if is_enabled CONFIG_CPU_BIG_ENDIAN; then
> > kallsymopt="${kallsymopt} --big-endian"
> > fi
> >
> > if is_enabled CONFIG_64BIT; then
> > kallsymopt="${kallsymopt} --64bit"
> > fi
>
> Aah, nice, thanks, I searched for endianness kconfig flags but somehow
> missed that one.
>
> Though actually, I think further optimizations might make it necessary
> to directly operate on ELF files anyway, in which case it would
> probably be easier to keep using the ELF header...
>
> > > So pass the path to the intermediate vmlinux ELF file to the kallsyms
> > > tool, and let it parse the ELF header to figure out the target's
> > > endianness.
> > >
> > > I have verified that running kallsyms without these changes and
> > > kallsyms with these changes on the same input System.map results
> > > in identical object files.
> > >
> > > This change reduces the time for an incremental kernel rebuild
> > > (touch fs/ioctl.c, then re-run make) from 27.7s to 24.1s (medians
> > > over 16 runs each) on my machine - saving around 3.6 seconds.
> >
> >
> >
> >
> > This reverts bea5b74504742f1b51b815bcaf9a70bddbc49ce3
> >
> > Somebody might struggle with debugging again, but I am not sure.
> >
> > Arnd?
> >
> >
> >
> > If the effort were "I invented a way to do kallsyms in
> > one pass instead of three", it would be so much more attractive.
>
> Actually, I was chatting with someone about this yesterday, and I
> think I have an idea on how to get rid of two link steps... I might
> try out some stuff and then come back with another version of this
> series afterwards.

I think basically we could change kallsyms so that on the second run,
it checks if the kallsyms layout is the same as on the first run, and
if yes, directly overwrite the relevant part of vmlinux. (And adjust
the relative_base.) That would save us the final link... does that
sound like a reasonable idea?

I don't really have any good ideas for saving more than that, given
that we want to squeeze the kallsyms in between the data and bss
sections, so we can't just append it at the end of vmlinux... we could
get the symbol list from vmlinux.o instead of linking
".tmp_vmlinux.kallsyms1", but the comments in link-vmlinux.sh say that
extra linker-generated symbols might appear, and I guess we probably
don't want to miss those...
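[Editor's note: the "directly overwrite the relevant part of vmlinux" idea
discussed above could be sketched roughly as below. This is purely
illustrative; the function name is hypothetical, and the hard parts (finding
the kallsyms offset in the ELF, verifying the layout matches the first pass,
and adjusting kallsyms_relative_base) are glossed over.]

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Conceptual sketch: if the kallsyms data generated on the second pass
 * has the same size and layout as what was linked into vmlinux on the
 * first pass, patch the blob in with pwrite() instead of re-linking.
 * 'offset' would come from resolving the kallsyms_* symbols in the ELF.
 */
static int patch_kallsyms_in_place(const char *vmlinux, off_t offset,
				   const void *blob, size_t len)
{
	int fd = open(vmlinux, O_RDWR);

	if (fd < 0) {
		perror("open vmlinux");
		return -1;
	}
	if (pwrite(fd, blob, len, offset) != (ssize_t)len) {
		perror("pwrite kallsyms blob");
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
```

As Masahiro points out above, this only works if nothing needs to move:
the moment the blob changes size, symbol addresses shift and only the
linker can resolve that.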