2019-11-15 16:47:18

by Shile Zhang

[permalink] [raw]
Subject: [RFC PATCH v4 0/7] Speed booting by sorting ORC unwind tables at build time

Hi,

I refactored the code, followed by Peter's suggestions in v3, thank you!
Any suggestions and comments are welcome!

Thanks!

Changelog:
==========
v3->v4:
- Code refactored for Peter's review findings and suggestions.

v2->v3:
- Discard new added sortorctable tool and related Kconfig changes.
- Refactored sortextable, makes it more readable and extendable.
- Rename 'sortextable' to 'sorttable', for more kernel tables extend.
- Add ORC unwind tables sort into sorttable.
- Remove the runtime ORC tables sort.
https://lore.kernel.org/lkml/[email protected]/

v1->v2:
- Removed new added Kconfig and runtime sort code, advised by Josh Poimboeuf.
- Some minor refactoring.
https://lore.kernel.org/lkml/[email protected]/

v1:
- Added a new sortorctable tool to sort ORC unwind tables at build time,
same as sortextable.
- Add a new Kconfigure to control if ORC unwind tables sort at build
time.
https://lore.kernel.org/lkml/[email protected]/

Shile Zhang (7):
scripts/sortextable: Rewrite error/success handling
scripts/sortextable: kernel coding style formating
scripts/sortextable: Remove dead code
scripts/sortextable: refactor do_func() function
scripts/sorttable: rename sortextable to sorttable
scripts/sorttable: Add ORC unwind tables sort concurrently
x86/unwind/orc: remove run-time ORC unwind tables sort

arch/arc/Kconfig | 2 +-
arch/arm/Kconfig | 2 +-
arch/arm64/Kconfig | 2 +-
arch/microblaze/Kconfig | 2 +-
arch/mips/Kconfig | 2 +-
arch/parisc/Kconfig | 2 +-
arch/parisc/kernel/vmlinux.lds.S | 2 +-
arch/powerpc/Kconfig | 2 +-
arch/s390/Kconfig | 2 +-
arch/x86/Kconfig | 2 +-
arch/x86/kernel/unwind_orc.c | 8 +-
arch/xtensa/Kconfig | 2 +-
init/Kconfig | 2 +-
scripts/.gitignore | 2 +-
scripts/Makefile | 10 +-
scripts/link-vmlinux.sh | 10 +-
scripts/sortextable.h | 209 -------------
scripts/{sortextable.c => sorttable.c} | 299 +++++++++---------
scripts/sorttable.h | 401 +++++++++++++++++++++++++
19 files changed, 568 insertions(+), 395 deletions(-)
delete mode 100644 scripts/sortextable.h
rename scripts/{sortextable.c => sorttable.c} (67%)
create mode 100644 scripts/sorttable.h

--
2.24.0.rc2


2019-11-15 16:47:23

by Shile Zhang

[permalink] [raw]
Subject: [RFC PATCH v4 4/7] scripts/sortextable: refactor do_func() function

Refine the loop, naming and code structure, make the code more readable
and extendable, no functional changes.

Signed-off-by: Shile Zhang <[email protected]>
---
scripts/sortextable.c | 4 +-
scripts/sortextable.h | 115 ++++++++++++++++++++++--------------------
2 files changed, 61 insertions(+), 58 deletions(-)

diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index e5384e86b58c..efa2839865cd 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -320,7 +320,7 @@ static int do_file(char const *const fname, void *addr)
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
break;
}
- rc = do32(ehdr, fname, custom_sort);
+ rc = do_sort_32(ehdr, fname, custom_sort);
break;
case ELFCLASS64:
{
@@ -332,7 +332,7 @@ static int do_file(char const *const fname, void *addr)
fname);
break;
}
- rc = do64(ghdr, fname, custom_sort);
+ rc = do_sort_64(ghdr, fname, custom_sort);
}
break;
default:
diff --git a/scripts/sortextable.h b/scripts/sortextable.h
index a2e3af7bf211..6485513f7cae 100644
--- a/scripts/sortextable.h
+++ b/scripts/sortextable.h
@@ -12,7 +12,7 @@

#undef extable_ent_size
#undef compare_extable
-#undef do_func
+#undef do_sort
#undef Elf_Addr
#undef Elf_Ehdr
#undef Elf_Shdr
@@ -34,7 +34,7 @@
#ifdef SORTEXTABLE_64
# define extable_ent_size 16
# define compare_extable compare_extable_64
-# define do_func do64
+# define do_sort do_sort_64
# define Elf_Addr Elf64_Addr
# define Elf_Ehdr Elf64_Ehdr
# define Elf_Shdr Elf64_Shdr
@@ -55,7 +55,7 @@
#else
# define extable_ent_size 8
# define compare_extable compare_extable_32
-# define do_func do32
+# define do_sort do_sort_32
# define Elf_Addr Elf32_Addr
# define Elf_Ehdr Elf32_Ehdr
# define Elf_Shdr Elf32_Shdr
@@ -87,81 +87,81 @@ static int compare_extable(const void *a, const void *b)
return 0;
}

-static int do_func(Elf_Ehdr *ehdr,
+static int do_sort(Elf_Ehdr *ehdr,
char const *const fname,
table_sort_t custom_sort)
{
- Elf_Shdr *shdr;
- Elf_Shdr *shstrtab_sec;
+ Elf_Shdr *s, *shdr = (Elf_Shdr *)((char *)ehdr + _r(&ehdr->e_shoff));
Elf_Shdr *strtab_sec = NULL;
Elf_Shdr *symtab_sec = NULL;
Elf_Shdr *extab_sec = NULL;
Elf_Sym *sym;
const Elf_Sym *symtab;
- Elf32_Word *symtab_shndx_start = NULL;
- Elf_Sym *sort_needed_sym;
+ Elf32_Word *symtab_shndx = NULL;
+ Elf_Sym *sort_needed_sym = NULL;
Elf_Shdr *sort_needed_sec;
Elf_Rel *relocs = NULL;
int relocs_size = 0;
- uint32_t *sort_done_location;
- const char *secstrtab;
+ uint32_t *sort_needed_loc;
+ const char *secstrings;
const char *strtab;
char *extab_image;
int extab_index = 0;
int i;
int idx;
- unsigned int num_sections;
- unsigned int secindex_strings;
+ unsigned int shnum;
+ unsigned int shstrndx;

- shdr = (Elf_Shdr *)((char *)ehdr + _r(&ehdr->e_shoff));
+ shstrndx = r2(&ehdr->e_shstrndx);
+ if (shstrndx == SHN_XINDEX)
+ shstrndx = r(&shdr[0].sh_link);
+ secstrings = (const char *)ehdr + _r(&shdr[shstrndx].sh_offset);

- num_sections = r2(&ehdr->e_shnum);
- if (num_sections == SHN_UNDEF)
- num_sections = _r(&shdr[0].sh_size);
+ shnum = r2(&ehdr->e_shnum);
+ if (shnum == SHN_UNDEF)
+ shnum = _r(&shdr[0].sh_size);

- secindex_strings = r2(&ehdr->e_shstrndx);
- if (secindex_strings == SHN_XINDEX)
- secindex_strings = r(&shdr[0].sh_link);
-
- shstrtab_sec = shdr + secindex_strings;
- secstrtab = (const char *)ehdr + _r(&shstrtab_sec->sh_offset);
- for (i = 0; i < num_sections; i++) {
- idx = r(&shdr[i].sh_name);
- if (!strcmp(secstrtab + idx, "__ex_table")) {
- extab_sec = shdr + i;
+ for (i = 0, s = shdr; s < shdr + shnum; i++, s++) {
+ idx = r(&s->sh_name);
+ if (!strcmp(secstrings + idx, "__ex_table")) {
+ extab_sec = s;
extab_index = i;
}
- if ((r(&shdr[i].sh_type) == SHT_REL ||
- r(&shdr[i].sh_type) == SHT_RELA) &&
- r(&shdr[i].sh_info) == extab_index) {
- relocs = (void *)ehdr + _r(&shdr[i].sh_offset);
- relocs_size = _r(&shdr[i].sh_size);
+ if (!strcmp(secstrings + idx, ".symtab"))
+ symtab_sec = s;
+ if (!strcmp(secstrings + idx, ".strtab"))
+ strtab_sec = s;
+
+ if ((r(&s->sh_type) == SHT_REL ||
+ r(&s->sh_type) == SHT_RELA) &&
+ r(&s->sh_info) == extab_index) {
+ relocs = (void *)ehdr + _r(&s->sh_offset);
+ relocs_size = _r(&s->sh_size);
}
- if (!strcmp(secstrtab + idx, ".symtab"))
- symtab_sec = shdr + i;
- if (!strcmp(secstrtab + idx, ".strtab"))
- strtab_sec = shdr + i;
- if (r(&shdr[i].sh_type) == SHT_SYMTAB_SHNDX)
- symtab_shndx_start = (Elf32_Word *)(
- (const char *)ehdr + _r(&shdr[i].sh_offset));
+ if (r(&s->sh_type) == SHT_SYMTAB_SHNDX)
+ symtab_shndx = (Elf32_Word *)((const char *)ehdr +
+ _r(&s->sh_offset));
}
- if (!strtab_sec) {
- fprintf(stderr, "no .strtab in file: %s\n", fname);
+
+ if (!extab_sec) {
+ fprintf(stderr, "no __ex_table in file: %s\n", fname);
return -1;
}
+
if (!symtab_sec) {
fprintf(stderr, "no .symtab in file: %s\n", fname);
return -1;
}
- symtab = (const Elf_Sym *)((const char *)ehdr +
- _r(&symtab_sec->sh_offset));
- if (!extab_sec) {
- fprintf(stderr, "no __ex_table in file: %s\n", fname);
+
+ if (!strtab_sec) {
+ fprintf(stderr, "no .strtab in file: %s\n", fname);
return -1;
}
- strtab = (const char *)ehdr + _r(&strtab_sec->sh_offset);

extab_image = (void *)ehdr + _r(&extab_sec->sh_offset);
+ strtab = (const char *)ehdr + _r(&strtab_sec->sh_offset);
+ symtab = (const Elf_Sym *)((const char *)ehdr +
+ _r(&symtab_sec->sh_offset));

if (custom_sort) {
custom_sort(extab_image, _r(&extab_sec->sh_size));
@@ -170,38 +170,41 @@ static int do_func(Elf_Ehdr *ehdr,
qsort(extab_image, num_entries,
extable_ent_size, compare_extable);
}
+
/* If there were relocations, we no longer need them. */
if (relocs)
memset(relocs, 0, relocs_size);

- /* find main_extable_sort_needed */
- sort_needed_sym = NULL;
- for (i = 0; i < _r(&symtab_sec->sh_size) / sizeof(Elf_Sym); i++) {
- sym = (void *)ehdr + _r(&symtab_sec->sh_offset);
- sym += i;
+ /* find the flag main_extable_sort_needed */
+ for (sym = (void *)ehdr + _r(&symtab_sec->sh_offset);
+ sym < sym + _r(&symtab_sec->sh_size) / sizeof(Elf_Sym);
+ sym++) {
if (ELF_ST_TYPE(sym->st_info) != STT_OBJECT)
continue;
- idx = r(&sym->st_name);
- if (!strcmp(strtab + idx, "main_extable_sort_needed")) {
+ if (!strcmp(strtab + r(&sym->st_name),
+ "main_extable_sort_needed")) {
sort_needed_sym = sym;
break;
}
}
+
if (!sort_needed_sym) {
fprintf(stderr,
"no main_extable_sort_needed symbol in file: %s\n",
fname);
return -1;
}
+
sort_needed_sec = &shdr[get_secindex(r2(&sym->st_shndx),
sort_needed_sym - symtab,
- symtab_shndx_start)];
- sort_done_location = (void *)ehdr +
+ symtab_shndx)];
+ sort_needed_loc = (void *)ehdr +
_r(&sort_needed_sec->sh_offset) +
_r(&sort_needed_sym->st_value) -
_r(&sort_needed_sec->sh_addr);

- /* We sorted it, clear the flag. */
- w(0, sort_done_location);
+ /* extable has been sorted, clear the flag */
+ w(0, sort_needed_loc);
+
return 0;
}
--
2.24.0.rc2

2019-11-15 16:47:27

by Shile Zhang

[permalink] [raw]
Subject: [RFC PATCH v4 1/7] scripts/sortextable: Rewrite error/success handling

The sortextable token some code from recordmount, which uses
the same setjmp/longjmp to manage control flow.
Now, recordmcount has been rewritten the error handling by
commit 3f1df12019f3 ("recordmcount: Rewrite error/success handling").

So rewrite this part as well with more refactors, make it more readable
and easy for further extend, no functional changes.

Signed-off-by: Shile Zhang <[email protected]>
---
scripts/sortextable.c | 119 +++++++++++++++---------------------------
scripts/sortextable.h | 11 ++--
2 files changed, 48 insertions(+), 82 deletions(-)

diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index 55768654e3c6..cd9762ba4467 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -22,7 +22,6 @@
#include <getopt.h>
#include <elf.h>
#include <fcntl.h>
-#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@@ -51,61 +50,41 @@
#define EM_ARCV2 195
#endif

-static int fd_map; /* File descriptor for file being modified. */
-static int mmap_failed; /* Boolean flag. */
-static void *ehdr_curr; /* current ElfXX_Ehdr * for resource cleanup */
-static struct stat sb; /* Remember .st_size, etc. */
-static jmp_buf jmpenv; /* setjmp/longjmp per-file error escape */
-
-/* setjmp() return values */
-enum {
- SJ_SETJMP = 0, /* hardwired first return */
- SJ_FAIL,
- SJ_SUCCEED
-};
-
-/* Per-file resource cleanup when multiple files. */
-static void
-cleanup(void)
-{
- if (!mmap_failed)
- munmap(ehdr_curr, sb.st_size);
- close(fd_map);
-}
-
-static void __attribute__((noreturn))
-fail_file(void)
-{
- cleanup();
- longjmp(jmpenv, SJ_FAIL);
-}
-
/*
* Get the whole file as a programming convenience in order to avoid
* malloc+lseek+read+free of many pieces. If successful, then mmap
* avoids copying unused pieces; else just read the whole file.
* Open for both read and write.
*/
-static void *mmap_file(char const *fname)
+static void *mmap_file(char const *fname, size_t *size)
{
- void *addr;
+ int fd;
+ struct stat sb;
+ void *addr = NULL;

- fd_map = open(fname, O_RDWR);
- if (fd_map < 0 || fstat(fd_map, &sb) < 0) {
+ fd = open(fname, O_RDWR);
+ if (fd < 0) {
perror(fname);
- fail_file();
+ return NULL;
+ }
+ if (fstat(fd, &sb) < 0) {
+ perror(fname);
+ goto out;
}
if (!S_ISREG(sb.st_mode)) {
fprintf(stderr, "not a regular file: %s\n", fname);
- fail_file();
+ goto out;
}
- addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED,
- fd_map, 0);
+ addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
- mmap_failed = 1;
fprintf(stderr, "Could not mmap file: %s\n", fname);
- fail_file();
+ goto out;
}
+
+ *size = sb.st_size;
+
+out:
+ close(fd);
return addr;
}

@@ -264,19 +243,18 @@ static void sort_relative_table(char *extab_image, int image_size)
}
}

-static void
-do_file(char const *const fname)
+static int
+do_file(char const *const fname, void *addr)
{
- table_sort_t custom_sort;
- Elf32_Ehdr *ehdr = mmap_file(fname);
+ table_sort_t custom_sort = NULL;
+ Elf32_Ehdr *ehdr = addr;
+ int rc = -1;

- ehdr_curr = ehdr;
switch (ehdr->e_ident[EI_DATA]) {
default:
fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
ehdr->e_ident[EI_DATA], fname);
- fail_file();
- break;
+ return -1;
case ELFDATA2LSB:
r = rle;
r2 = r2le;
@@ -298,7 +276,7 @@ do_file(char const *const fname)
|| (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN)
|| ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname);
- fail_file();
+ return -1;
}

custom_sort = NULL;
@@ -306,7 +284,6 @@ do_file(char const *const fname)
default:
fprintf(stderr, "unrecognized e_machine %d %s\n",
r2(&ehdr->e_machine), fname);
- fail_file();
break;
case EM_386:
case EM_X86_64:
@@ -333,16 +310,15 @@ do_file(char const *const fname)
default:
fprintf(stderr, "unrecognized ELF class %d %s\n",
ehdr->e_ident[EI_CLASS], fname);
- fail_file();
break;
case ELFCLASS32:
if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
|| r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
- fail_file();
+ break;
}
- do32(ehdr, fname, custom_sort);
+ rc = do32(ehdr, fname, custom_sort);
break;
case ELFCLASS64: {
Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
@@ -350,21 +326,22 @@ do_file(char const *const fname)
|| r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
- fail_file();
+ break;
}
- do64(ghdr, fname, custom_sort);
+ rc = do64(ghdr, fname, custom_sort);
break;
}
} /* end switch */

- cleanup();
+ return rc;
}

int
main(int argc, char *argv[])
{
- int n_error = 0; /* gcc-4.3.0 false positive complaint */
- int i;
+ int i, n_error = 0; /* gcc-4.3.0 false positive complaint */
+ size_t size = 0;
+ void *addr = NULL;

if (argc < 2) {
fprintf(stderr, "usage: sortextable vmlinux...\n");
@@ -373,28 +350,16 @@ main(int argc, char *argv[])

/* Process each file in turn, allowing deep failure. */
for (i = 1; i < argc; i++) {
- char *file = argv[i];
- int const sjval = setjmp(jmpenv);
+ addr = mmap_file(argv[i], &size);
+ if (!addr) {
+ ++n_error;
+ continue;
+ }

- switch (sjval) {
- default:
- fprintf(stderr, "internal error: %s\n", file);
- exit(1);
- break;
- case SJ_SETJMP: /* normal sequence */
- /* Avoid problems if early cleanup() */
- fd_map = -1;
- ehdr_curr = NULL;
- mmap_failed = 1;
- do_file(file);
- break;
- case SJ_FAIL: /* error in do_file or below */
+ if (do_file(argv[i], addr))
++n_error;
- break;
- case SJ_SUCCEED: /* premature success */
- /* do nothing */
- break;
- } /* end switch */
+
+ munmap(addr, size);
}
return !!n_error;
}
diff --git a/scripts/sortextable.h b/scripts/sortextable.h
index d4b3f6c40f02..5a62e94df678 100644
--- a/scripts/sortextable.h
+++ b/scripts/sortextable.h
@@ -87,7 +87,7 @@ static int compare_extable(const void *a, const void *b)
return 0;
}

-static void
+static int
do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
{
Elf_Shdr *shdr;
@@ -146,17 +146,17 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
}
if (strtab_sec == NULL) {
fprintf(stderr, "no .strtab in file: %s\n", fname);
- fail_file();
+ return -1;
}
if (symtab_sec == NULL) {
fprintf(stderr, "no .symtab in file: %s\n", fname);
- fail_file();
+ return -1;
}
symtab = (const Elf_Sym *)((const char *)ehdr +
_r(&symtab_sec->sh_offset));
if (extab_sec == NULL) {
fprintf(stderr, "no __ex_table in file: %s\n", fname);
- fail_file();
+ return -1;
}
strtab = (const char *)ehdr + _r(&strtab_sec->sh_offset);

@@ -190,7 +190,7 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
fprintf(stderr,
"no main_extable_sort_needed symbol in file: %s\n",
fname);
- fail_file();
+ return -1;
}
sort_needed_sec = &shdr[get_secindex(r2(&sym->st_shndx),
sort_needed_sym - symtab,
@@ -206,4 +206,5 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
#endif
/* We sorted it, clear the flag. */
w(0, sort_done_location);
+ return 0;
}
--
2.24.0.rc2

2019-11-15 16:47:35

by Shile Zhang

[permalink] [raw]
Subject: [RFC PATCH v4 7/7] x86/unwind/orc: remove run-time ORC unwind tables sort

The orc_unwind and orc_unwind_ip tables are sorted in vmlinux link phase
at build time, just remove the run-time sort.

Signed-off-by: Shile Zhang <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 332ae6530fa8..280da6fa9922 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -273,9 +273,11 @@ void __init unwind_init(void)
return;
}

- /* Sort the .orc_unwind and .orc_unwind_ip tables: */
- sort(__start_orc_unwind_ip, num_entries, sizeof(int), orc_sort_cmp,
- orc_sort_swap);
+ /*
+ * Note, orc_unwind and orc_unwind_ip tables has been sorted in
+ * vmlinux link phase by sorttable tool at build time.
+ * Its ready for binary search now.
+ */

/* Initialize the fast lookup table: */
lookup_num_blocks = orc_lookup_end - orc_lookup;
--
2.24.0.rc2

2019-11-15 16:48:30

by Shile Zhang

[permalink] [raw]
Subject: [RFC PATCH v4 5/7] scripts/sorttable: rename sortextable to sorttable

Using commonly name for further more kernel table sort at build time
extend, no functional changes.

Signed-off-by: Shile Zhang <[email protected]>
---
arch/arc/Kconfig | 2 +-
arch/arm/Kconfig | 2 +-
arch/arm64/Kconfig | 2 +-
arch/microblaze/Kconfig | 2 +-
arch/mips/Kconfig | 2 +-
arch/parisc/Kconfig | 2 +-
arch/parisc/kernel/vmlinux.lds.S | 2 +-
arch/powerpc/Kconfig | 2 +-
arch/s390/Kconfig | 2 +-
arch/x86/Kconfig | 2 +-
arch/xtensa/Kconfig | 2 +-
init/Kconfig | 2 +-
scripts/.gitignore | 2 +-
scripts/Makefile | 4 ++--
scripts/link-vmlinux.sh | 10 +++++-----
scripts/{sortextable.c => sorttable.c} | 10 +++++-----
scripts/{sortextable.h => sorttable.h} | 4 ++--
17 files changed, 27 insertions(+), 27 deletions(-)
rename scripts/{sortextable.c => sorttable.c} (97%)
rename scripts/{sortextable.h => sorttable.h} (99%)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 8383155c8c82..80f1b4034ebd 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -14,7 +14,7 @@ config ARC
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_SUPPORTS_ATOMIC_RMW if ARC_HAS_LLSC
select ARCH_32BIT_OFF_T
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
select DMA_DIRECT_REMAP
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 8a50efb559f3..6a392d3b240f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -37,7 +37,7 @@ config ARM
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_IPC_PARSE_VERSION
select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
- select BUILDTIME_EXTABLE_SORT if MMU
+ select BUILDTIME_TABLE_SORT if MMU
select CLONE_BACKWARDS
select CPU_PM if SUSPEND || CPU_IDLE
select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3f047afb982c..a8ca76b327df 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -82,7 +82,7 @@ config ARM64
select ARM_GIC_V3
select ARM_GIC_V3_ITS if PCI
select ARM_PSCI_FW
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
select CPU_PM if (SUSPEND || CPU_IDLE)
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index c9c4be822456..959f0ae5cf90 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -12,7 +12,7 @@ config MICROBLAZE
select ARCH_HAS_UNCACHED_SEGMENT if !MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_WANT_IPC_PARSE_VERSION
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select TIMER_OF
select CLONE_BACKWARDS3
select COMMON_CLK
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index a0bd9bdb5f83..bec582846dbd 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -14,7 +14,7 @@ config MIPS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_IPC_PARSE_VERSION
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select CPU_NO_EFFICIENT_FFS if (TARGET_ISA_REV < 1)
select CPU_PM if CPU_IDLE
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index b16237c95ea3..e1ef610a5a2b 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -18,7 +18,7 @@ config PARISC
select RTC_DRV_GENERIC
select INIT_ALL_POSSIBLE
select BUG
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select HAVE_PCI
select HAVE_PERF_EVENTS
select HAVE_KERNEL_BZIP2
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index 99cd24f2ea01..2a9799f7b46e 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -129,7 +129,7 @@ SECTIONS

RO_DATA_SECTION(8)

- /* RO because of BUILDTIME_EXTABLE_SORT */
+ /* RO because of BUILDTIME_TABLE_SORT */
EXCEPTION_TABLE(8)
NOTES

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3e56c9c2f16e..b3f404b825a6 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -149,7 +149,7 @@ config PPC
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN
select DYNAMIC_FTRACE if FUNCTION_TRACER
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 43a81d0ad507..97a3a63c69fe 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -110,7 +110,7 @@ config S390
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
select ARCH_WANT_IPC_PARSE_VERSION
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS2
select DYNAMIC_FTRACE if FUNCTION_TRACER
select GENERIC_CLOCKEVENTS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8ef85139553f..958bfbc2e5db 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -97,7 +97,7 @@ config X86
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
select ARCH_WANT_HUGE_PMD_SHARE
select ARCH_WANTS_THP_SWAP if X86_64
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLKEVT_I8253
select CLOCKSOURCE_VALIDATE_LAST_CYCLE
select CLOCKSOURCE_WATCHDOG
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index a8e7beb6b7b5..011c53481840 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -9,7 +9,7 @@ config XTENSA
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_IPC_PARSE_VERSION
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
select DMA_REMAP if MMU
diff --git a/init/Kconfig b/init/Kconfig
index b4daad2bac23..40e2edf8b00b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -58,7 +58,7 @@ config CONSTRUCTORS
config IRQ_WORK
bool

-config BUILDTIME_EXTABLE_SORT
+config BUILDTIME_TABLE_SORT
bool

config THREAD_INFO_IN_TASK
diff --git a/scripts/.gitignore b/scripts/.gitignore
index 17f8cef88fa8..c03b7cd97111 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -7,7 +7,7 @@ kallsyms
pnmtologo
unifdef
recordmcount
-sortextable
+sorttable
asn1_compiler
extract-cert
sign-file
diff --git a/scripts/Makefile b/scripts/Makefile
index 3e86b300f5a1..658d201f7f8b 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -15,13 +15,13 @@ hostprogs-$(CONFIG_KALLSYMS) += kallsyms
hostprogs-$(CONFIG_LOGO) += pnmtologo
hostprogs-$(CONFIG_VT) += conmakehash
hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
-hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sortextable
+hostprogs-$(CONFIG_BUILDTIME_TABLE_SORT) += sorttable
hostprogs-$(CONFIG_ASN1) += asn1_compiler
hostprogs-$(CONFIG_MODULE_SIG_FORMAT) += sign-file
hostprogs-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
hostprogs-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert

-HOSTCFLAGS_sortextable.o = -I$(srctree)/tools/include
+HOSTCFLAGS_sorttable.o = -I$(srctree)/tools/include
HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
HOSTLDLIBS_sign-file = -lcrypto
HOSTLDLIBS_extract-cert = -lcrypto
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 06495379fcd8..01978d1e4c13 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -178,9 +178,9 @@ mksysmap()
${CONFIG_SHELL} "${srctree}/scripts/mksysmap" ${1} ${2}
}

-sortextable()
+sorttable()
{
- ${objtree}/scripts/sortextable ${1}
+ ${objtree}/scripts/sorttable ${1}
}

# Delete output files in case of error
@@ -298,9 +298,9 @@ fi

vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o}

-if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
- info SORTEX vmlinux
- sortextable vmlinux
+if [ -n "${CONFIG_BUILDTIME_TABLE_SORT}" ]; then
+ info SORTTAB vmlinux
+ sorttable vmlinux
fi

info SYSMAP System.map
diff --git a/scripts/sortextable.c b/scripts/sorttable.c
similarity index 97%
rename from scripts/sortextable.c
rename to scripts/sorttable.c
index efa2839865cd..ff98b7db20c6 100644
--- a/scripts/sortextable.c
+++ b/scripts/sorttable.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * sortextable.c: Sort the kernel's exception table
+ * sorttable.c: Sort the kernel's table
*
* Copyright 2011 - 2012 Cavium, Inc.
*
@@ -182,9 +182,9 @@ static inline unsigned int get_secindex(unsigned int shndx,
}

/* 32 bit and 64 bit are very similar */
-#include "sortextable.h"
-#define SORTEXTABLE_64
-#include "sortextable.h"
+#include "sorttable.h"
+#define SORTTABLE_64
+#include "sorttable.h"

static int compare_relative_table(const void *a, const void *b)
{
@@ -351,7 +351,7 @@ int main(int argc, char *argv[])
void *addr = NULL;

if (argc < 2) {
- fprintf(stderr, "usage: sortextable vmlinux...\n");
+ fprintf(stderr, "usage: sorttable vmlinux...\n");
return 0;
}

diff --git a/scripts/sortextable.h b/scripts/sorttable.h
similarity index 99%
rename from scripts/sortextable.h
rename to scripts/sorttable.h
index 6485513f7cae..82589ff90e25 100644
--- a/scripts/sortextable.h
+++ b/scripts/sorttable.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
- * sortextable.h
+ * sorttable.h
*
* Copyright 2011 - 2012 Cavium, Inc.
*
@@ -31,7 +31,7 @@
#undef _r
#undef _w

-#ifdef SORTEXTABLE_64
+#ifdef SORTTABLE_64
# define extable_ent_size 16
# define compare_extable compare_extable_64
# define do_sort do_sort_64
--
2.24.0.rc2

2019-11-15 16:49:31

by Shile Zhang

[permalink] [raw]
Subject: [RFC PATCH v4 6/7] scripts/sorttable: Add ORC unwind tables sort concurrently

ORC unwinder have two tables, .orc_unwind_ip and .orc_unwind, which
needs sorted for binary search. To sort it at build time can save more
CPU cycles help to speed up kernel booting.

Add the ORC tables sorting in a sperated thread helps to avoid more link
cost of kernel building.

https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Shile Zhang <[email protected]>
---
scripts/Makefile | 6 ++
scripts/sorttable.h | 201 ++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 202 insertions(+), 5 deletions(-)

diff --git a/scripts/Makefile b/scripts/Makefile
index 658d201f7f8b..d6670bd420f6 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -26,6 +26,12 @@ HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
HOSTLDLIBS_sign-file = -lcrypto
HOSTLDLIBS_extract-cert = -lcrypto

+ifdef CONFIG_UNWINDER_ORC
+HOSTCFLAGS_sorttable.o += -I$(srctree)/tools/arch/$(ARCH)/include
+HOSTCFLAGS_sorttable.o += -DUNWINDER_ORC_ENABLED
+HOSTLDLIBS_sorttable = -lpthread
+endif
+
always := $(hostprogs-y) $(hostprogs-m)

# The following hostprogs-y programs are only build on demand
diff --git a/scripts/sorttable.h b/scripts/sorttable.h
index 82589ff90e25..41c8dd9c75fc 100644
--- a/scripts/sorttable.h
+++ b/scripts/sorttable.h
@@ -4,6 +4,14 @@
*
* Copyright 2011 - 2012 Cavium, Inc.
*
+ * Added ORC unwind tables sort support, and other updates:
+ * Copyright (C) 1999-2019 Alibaba Group Holding Limited. by:
+ * Shile Zhang <[email protected]>
+ *
+ * Some of code was taken out of /lib/sort.c, and
+ * arch/x86/kernel/unwind_orc.c written by:
+ * Copyright (C) 2017 Josh Poimboeuf <[email protected]>
+ *
* Some of this code was taken out of recordmcount.h written by:
*
* Copyright 2009 John F. Reiser <[email protected]>. All rights reserved.
@@ -75,6 +83,131 @@
# define _w w
#endif

+#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
+/* ORC unwinder only support X86_64 */
+#include <errno.h>
+#include <pthread.h>
+#include <asm/orc_types.h>
+
+int *g_orc_ip_table;
+struct orc_entry *g_orc_table;
+
+pthread_t orc_sort_thread;
+
+/**
+ * sort - sort an array of elements
+ * @base: pointer to data to sort
+ * @num: number of elements
+ * @size: size of each element
+ * @cmp_func: pointer to comparison function
+ * @swap_func: pointer to swap function
+ *
+ * This function does a heapsort on the given array. You may provide a
+ * swap_func function optimized to your element type.
+ *
+ * Sorting time is O(n log n) both on average and worst-case. While
+ * qsort is about 20% faster on average, it suffers from exploitable
+ * O(n*n) worst-case behavior and extra memory requirements that make
+ * it less suitable for kernel use.
+ *
+ * This code token out of /lib/sort.c.
+ */
+static void sort(void *base, size_t num, size_t size,
+ int (*cmp_func)(const void *, const void *),
+ void (*swap_func)(void *, void *, int size))
+{
+ /* pre-scale counters for performance */
+ int i = (num/2 - 1) * size, n = num * size, c, r;
+
+ /* heapify */
+ for ( ; i >= 0; i -= size) {
+ for (r = i; r * 2 + size < n; r = c) {
+ c = r * 2 + size;
+ if (c < n - size &&
+ cmp_func(base + c, base + c + size) < 0)
+ c += size;
+ if (cmp_func(base + r, base + c) >= 0)
+ break;
+ swap_func(base + r, base + c, size);
+ }
+ }
+
+ /* sort */
+ for (i = n - size; i > 0; i -= size) {
+ swap_func(base, base + i, size);
+ for (r = 0; r * 2 + size < i; r = c) {
+ c = r * 2 + size;
+ if (c < i - size &&
+ cmp_func(base + c, base + c + size) < 0)
+ c += size;
+ if (cmp_func(base + r, base + c) >= 0)
+ break;
+ swap_func(base + r, base + c, size);
+ }
+ }
+}
+
+static inline unsigned long orc_ip(const int *ip)
+{
+ return (unsigned long)ip + *ip;
+}
+
+static void orc_sort_swap(void *_a, void *_b, int size)
+{
+ struct orc_entry *orc_a, *orc_b;
+ struct orc_entry orc_tmp;
+ int *a = _a, *b = _b, tmp;
+ int delta = _b - _a;
+
+ /* Swap the .orc_unwind_ip entries: */
+ tmp = *a;
+ *a = *b + delta;
+ *b = tmp - delta;
+
+ /* Swap the corresponding .orc_unwind entries: */
+ orc_a = g_orc_table + (a - g_orc_ip_table);
+ orc_b = g_orc_table + (b - g_orc_ip_table);
+ orc_tmp = *orc_a;
+ *orc_a = *orc_b;
+ *orc_b = orc_tmp;
+}
+
+static int orc_sort_cmp(const void *_a, const void *_b)
+{
+ struct orc_entry *orc_a;
+ const int *a = _a, *b = _b;
+ unsigned long a_val = orc_ip(a);
+ unsigned long b_val = orc_ip(b);
+
+ if (a_val > b_val)
+ return 1;
+ if (a_val < b_val)
+ return -1;
+
+ /*
+ * The "weak" section terminator entries need to always be on the left
+ * to ensure the lookup code skips them in favor of real entries.
+ * These terminator entries exist to handle any gaps created by
+ * whitelisted .o files which didn't get objtool generation.
+ */
+ orc_a = g_orc_table + (a - g_orc_ip_table);
+ return orc_a->sp_reg == ORC_REG_UNDEFINED && !orc_a->end ? -1 : 1;
+}
+
+static void *sort_orctable(void *arg)
+{
+ unsigned int *size = (unsigned int *)arg;
+ unsigned int num_entries;
+
+ num_entries = *size / sizeof(int);
+
+ sort(g_orc_ip_table, num_entries, sizeof(int),
+ orc_sort_cmp, orc_sort_swap);
+
+ pthread_exit(NULL);
+}
+#endif
+
static int compare_extable(const void *a, const void *b)
{
Elf_Addr av = _r(a);
@@ -91,6 +224,7 @@ static int do_sort(Elf_Ehdr *ehdr,
char const *const fname,
table_sort_t custom_sort)
{
+ int rc = -1;
Elf_Shdr *s, *shdr = (Elf_Shdr *)((char *)ehdr + _r(&ehdr->e_shoff));
Elf_Shdr *strtab_sec = NULL;
Elf_Shdr *symtab_sec = NULL;
@@ -111,6 +245,11 @@ static int do_sort(Elf_Ehdr *ehdr,
int idx;
unsigned int shnum;
unsigned int shstrndx;
+#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
+ unsigned int orc_ip_size = 0;
+ unsigned int orc_size = 0;
+ unsigned int orc_num_entries = 0;
+#endif

shstrndx = r2(&ehdr->e_shstrndx);
if (shstrndx == SHN_XINDEX)
@@ -141,21 +280,61 @@ static int do_sort(Elf_Ehdr *ehdr,
if (r(&s->sh_type) == SHT_SYMTAB_SHNDX)
symtab_shndx = (Elf32_Word *)((const char *)ehdr +
_r(&s->sh_offset));
+
+#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
+ /* locate the ORC unwind tables */
+ if (!strcmp(secstrings + idx, ".orc_unwind_ip")) {
+ orc_ip_size = s->sh_size;
+ g_orc_ip_table = (int *)((void *)ehdr +
+ s->sh_offset);
+ }
+ if (!strcmp(secstrings + idx, ".orc_unwind")) {
+ orc_size = s->sh_size;
+ g_orc_table = (struct orc_entry *)((void *)ehdr +
+ s->sh_offset);
+ }
+#endif
+ } /* for loop */
+
+#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
+ if (!g_orc_ip_table || !g_orc_table) {
+ fprintf(stderr,
+ "incomplete ORC unwind tables in file: %s\n", fname);
+ goto out;
+ }
+
+ orc_num_entries = orc_ip_size / sizeof(int);
+ if (orc_ip_size % sizeof(int) != 0 ||
+ orc_size % sizeof(struct orc_entry) != 0 ||
+ orc_num_entries != orc_size / sizeof(struct orc_entry)) {
+ fprintf(stderr,
+ "inconsistent ORC unwind table entries in file: %s\n",
+ fname);
+ goto out;
}

+ /* create thread to sort ORC unwind tables concurrently */
+ if (pthread_create(&orc_sort_thread, NULL,
+ sort_orctable, &orc_ip_size)) {
+ fprintf(stderr,
+ "pthread_create orc_sort_thread failed '%s': %s\n",
+ strerror(errno), fname);
+ goto out;
+ }
+#endif
if (!extab_sec) {
fprintf(stderr, "no __ex_table in file: %s\n", fname);
- return -1;
+ goto out;
}

if (!symtab_sec) {
fprintf(stderr, "no .symtab in file: %s\n", fname);
- return -1;
+ goto out;
}

if (!strtab_sec) {
fprintf(stderr, "no .strtab in file: %s\n", fname);
- return -1;
+ goto out;
}

extab_image = (void *)ehdr + _r(&extab_sec->sh_offset);
@@ -192,7 +371,7 @@ static int do_sort(Elf_Ehdr *ehdr,
fprintf(stderr,
"no main_extable_sort_needed symbol in file: %s\n",
fname);
- return -1;
+ goto out;
}

sort_needed_sec = &shdr[get_secindex(r2(&sym->st_shndx),
@@ -205,6 +384,18 @@ static int do_sort(Elf_Ehdr *ehdr,

/* extable has been sorted, clear the flag */
w(0, sort_needed_loc);
+ rc = 0;

- return 0;
+out:
+#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
+ if (orc_sort_thread) {
+ /* wait for ORC tables sort done */
+ rc = pthread_join(orc_sort_thread, NULL);
+ if (rc)
+ fprintf(stderr,
+ "pthread_join orc_sort_thread failed "
+ "'%s': %s\n", strerror(errno), fname);
+ }
+#endif
+ return rc;
}
--
2.24.0.rc2

2019-11-15 16:51:21

by Shile Zhang

[permalink] [raw]
Subject: [RFC PATCH v4 2/7] scripts/sortextable: kernel coding style formating

Fix the inconsistent function format and kernel code style,
referred to commit 3aec8638246f ("recordmcount: Kernel style
function signature formatting") and
commit 2e63152bc190 ("recordmcount: Kernel style formatting")

Make the code more readable and extendable, no functional changes.

Signed-off-by: Shile Zhang <[email protected]>
---
scripts/sortextable.c | 182 ++++++++++++++++++++++--------------------
scripts/sortextable.h | 31 +++----
2 files changed, 111 insertions(+), 102 deletions(-)

diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index cd9762ba4467..e5384e86b58c 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -50,6 +50,14 @@
#define EM_ARCV2 195
#endif

+static uint32_t (*r)(const uint32_t *);
+static uint16_t (*r2)(const uint16_t *);
+static uint64_t (*r8)(const uint64_t *);
+static void (*w)(uint32_t, uint32_t *);
+static void (*w2)(uint16_t, uint16_t *);
+static void (*w8)(uint64_t, uint64_t *);
+typedef void (*table_sort_t)(char *, int);
+
/*
* Get the whole file as a programming convenience in order to avoid
* malloc+lseek+read+free of many pieces. If successful, then mmap
@@ -75,6 +83,7 @@ static void *mmap_file(char const *fname, size_t *size)
fprintf(stderr, "not a regular file: %s\n", fname);
goto out;
}
+
addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
fprintf(stderr, "Could not mmap file: %s\n", fname);
@@ -88,64 +97,65 @@ static void *mmap_file(char const *fname, size_t *size)
return addr;
}

-static uint64_t r8be(const uint64_t *x)
-{
- return get_unaligned_be64(x);
-}
static uint32_t rbe(const uint32_t *x)
{
return get_unaligned_be32(x);
}
+
static uint16_t r2be(const uint16_t *x)
{
return get_unaligned_be16(x);
}
-static uint64_t r8le(const uint64_t *x)
+
+static uint64_t r8be(const uint64_t *x)
{
- return get_unaligned_le64(x);
+ return get_unaligned_be64(x);
}
+
static uint32_t rle(const uint32_t *x)
{
return get_unaligned_le32(x);
}
+
static uint16_t r2le(const uint16_t *x)
{
return get_unaligned_le16(x);
}

-static void w8be(uint64_t val, uint64_t *x)
+static uint64_t r8le(const uint64_t *x)
{
- put_unaligned_be64(val, x);
+ return get_unaligned_le64(x);
}
+
static void wbe(uint32_t val, uint32_t *x)
{
put_unaligned_be32(val, x);
}
+
static void w2be(uint16_t val, uint16_t *x)
{
put_unaligned_be16(val, x);
}
-static void w8le(uint64_t val, uint64_t *x)
+
+static void w8be(uint64_t val, uint64_t *x)
{
- put_unaligned_le64(val, x);
+ put_unaligned_be64(val, x);
}
+
static void wle(uint32_t val, uint32_t *x)
{
put_unaligned_le32(val, x);
}
+
static void w2le(uint16_t val, uint16_t *x)
{
put_unaligned_le16(val, x);
}

-static uint64_t (*r8)(const uint64_t *);
-static uint32_t (*r)(const uint32_t *);
-static uint16_t (*r2)(const uint16_t *);
-static void (*w8)(uint64_t, uint64_t *);
-static void (*w)(uint32_t, uint32_t *);
-static void (*w2)(uint16_t, uint16_t *);
-
-typedef void (*table_sort_t)(char *, int);
+static void w8le(uint64_t val, uint64_t *x)
+{
+ put_unaligned_le64(val, x);
+}

/*
* Move reserved section indices SHN_LORESERVE..SHN_HIRESERVE out of
@@ -188,108 +198,100 @@ static int compare_relative_table(const void *a, const void *b)
return 0;
}

-static void x86_sort_relative_table(char *extab_image, int image_size)
+static void sort_relative_table(char *extab_image, int image_size)
{
- int i;
+ int i = 0;

- i = 0;
+ /*
+ * Do the same thing the runtime sort does, first normalize to
+ * being relative to the start of the section.
+ */
while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i);
-
w(r(loc) + i, loc);
- w(r(loc + 1) + i + 4, loc + 1);
- w(r(loc + 2) + i + 8, loc + 2);
-
- i += sizeof(uint32_t) * 3;
+ i += 4;
}

- qsort(extab_image, image_size / 12, 12, compare_relative_table);
+ qsort(extab_image, image_size / 8, 8, compare_relative_table);

+ /* Now denormalize. */
i = 0;
while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i);
-
w(r(loc) - i, loc);
- w(r(loc + 1) - (i + 4), loc + 1);
- w(r(loc + 2) - (i + 8), loc + 2);
-
- i += sizeof(uint32_t) * 3;
+ i += 4;
}
}

-static void sort_relative_table(char *extab_image, int image_size)
+static void x86_sort_relative_table(char *extab_image, int image_size)
{
- int i;
+ int i = 0;

- /*
- * Do the same thing the runtime sort does, first normalize to
- * being relative to the start of the section.
- */
- i = 0;
while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i);
+
w(r(loc) + i, loc);
- i += 4;
+ w(r(loc + 1) + i + 4, loc + 1);
+ w(r(loc + 2) + i + 8, loc + 2);
+
+ i += sizeof(uint32_t) * 3;
}

- qsort(extab_image, image_size / 8, 8, compare_relative_table);
+ qsort(extab_image, image_size / 12, 12, compare_relative_table);

- /* Now denormalize. */
i = 0;
while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i);
+
w(r(loc) - i, loc);
- i += 4;
+ w(r(loc + 1) - (i + 4), loc + 1);
+ w(r(loc + 2) - (i + 8), loc + 2);
+
+ i += sizeof(uint32_t) * 3;
}
}

-static int
-do_file(char const *const fname, void *addr)
+static int do_file(char const *const fname, void *addr)
{
- table_sort_t custom_sort = NULL;
- Elf32_Ehdr *ehdr = addr;
int rc = -1;
+ Elf32_Ehdr *ehdr = addr;
+ table_sort_t custom_sort = NULL;

switch (ehdr->e_ident[EI_DATA]) {
- default:
- fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
- ehdr->e_ident[EI_DATA], fname);
- return -1;
case ELFDATA2LSB:
- r = rle;
- r2 = r2le;
- r8 = r8le;
- w = wle;
- w2 = w2le;
- w8 = w8le;
+ r = rle;
+ r2 = r2le;
+ r8 = r8le;
+ w = wle;
+ w2 = w2le;
+ w8 = w8le;
break;
case ELFDATA2MSB:
- r = rbe;
- r2 = r2be;
- r8 = r8be;
- w = wbe;
- w2 = w2be;
- w8 = w8be;
+ r = rbe;
+ r2 = r2be;
+ r8 = r8be;
+ w = wbe;
+ w2 = w2be;
+ w8 = w8be;
break;
- } /* end switch */
- if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0
- || (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN)
- || ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
+ default:
+ fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
+ ehdr->e_ident[EI_DATA], fname);
+ return -1;
+ }
+
+ if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
+ (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN) ||
+ ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname);
return -1;
}

- custom_sort = NULL;
switch (r2(&ehdr->e_machine)) {
- default:
- fprintf(stderr, "unrecognized e_machine %d %s\n",
- r2(&ehdr->e_machine), fname);
- break;
case EM_386:
case EM_X86_64:
custom_sort = x86_sort_relative_table;
break;
-
case EM_S390:
case EM_AARCH64:
case EM_PARISC:
@@ -304,40 +306,45 @@ do_file(char const *const fname, void *addr)
case EM_MIPS:
case EM_XTENSA:
break;
- } /* end switch */
+ default:
+ fprintf(stderr, "unrecognized e_machine %d %s\n",
+ r2(&ehdr->e_machine), fname);
+ return -1;
+ }

switch (ehdr->e_ident[EI_CLASS]) {
- default:
- fprintf(stderr, "unrecognized ELF class %d %s\n",
- ehdr->e_ident[EI_CLASS], fname);
- break;
case ELFCLASS32:
- if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
- || r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
+ if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr) ||
+ r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
break;
}
rc = do32(ehdr, fname, custom_sort);
break;
- case ELFCLASS64: {
+ case ELFCLASS64:
+ {
Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
- if (r2(&ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
- || r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
+ if (r2(&ghdr->e_ehsize) != sizeof(Elf64_Ehdr) ||
+ r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
fprintf(stderr,
- "unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
+ "unrecognized ET_EXEC/ET_DYN file: %s\n",
+ fname);
break;
}
rc = do64(ghdr, fname, custom_sort);
+ }
+ break;
+ default:
+ fprintf(stderr, "unrecognized ELF class %d %s\n",
+ ehdr->e_ident[EI_CLASS], fname);
break;
}
- } /* end switch */

return rc;
}

-int
-main(int argc, char *argv[])
+int main(int argc, char *argv[])
{
int i, n_error = 0; /* gcc-4.3.0 false positive complaint */
size_t size = 0;
@@ -361,5 +368,6 @@ main(int argc, char *argv[])

munmap(addr, size);
}
+
return !!n_error;
}
diff --git a/scripts/sortextable.h b/scripts/sortextable.h
index 5a62e94df678..b7e407e09f59 100644
--- a/scripts/sortextable.h
+++ b/scripts/sortextable.h
@@ -6,7 +6,7 @@
*
* Some of this code was taken out of recordmcount.h written by:
*
- * Copyright 2009 John F. Reiser <[email protected]>. All rights reserved.
+ * Copyright 2009 John F. Reiser <[email protected]>. All rights reserved.
* Copyright 2010 Steven Rostedt <[email protected]>, Red Hat Inc.
*/

@@ -87,8 +87,9 @@ static int compare_extable(const void *a, const void *b)
return 0;
}

-static int
-do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
+static int do_func(Elf_Ehdr *ehdr,
+ char const *const fname,
+ table_sort_t custom_sort)
{
Elf_Shdr *shdr;
Elf_Shdr *shstrtab_sec;
@@ -126,7 +127,7 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
secstrtab = (const char *)ehdr + _r(&shstrtab_sec->sh_offset);
for (i = 0; i < num_sections; i++) {
idx = r(&shdr[i].sh_name);
- if (strcmp(secstrtab + idx, "__ex_table") == 0) {
+ if (!strcmp(secstrtab + idx, "__ex_table")) {
extab_sec = shdr + i;
extab_index = i;
}
@@ -136,26 +137,26 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
relocs = (void *)ehdr + _r(&shdr[i].sh_offset);
relocs_size = _r(&shdr[i].sh_size);
}
- if (strcmp(secstrtab + idx, ".symtab") == 0)
+ if (!strcmp(secstrtab + idx, ".symtab"))
symtab_sec = shdr + i;
- if (strcmp(secstrtab + idx, ".strtab") == 0)
+ if (!strcmp(secstrtab + idx, ".strtab"))
strtab_sec = shdr + i;
if (r(&shdr[i].sh_type) == SHT_SYMTAB_SHNDX)
symtab_shndx_start = (Elf32_Word *)(
(const char *)ehdr + _r(&shdr[i].sh_offset));
}
- if (strtab_sec == NULL) {
- fprintf(stderr, "no .strtab in file: %s\n", fname);
+ if (!strtab_sec) {
+ fprintf(stderr, "no .strtab in file: %s\n", fname);
return -1;
}
- if (symtab_sec == NULL) {
- fprintf(stderr, "no .symtab in file: %s\n", fname);
+ if (!symtab_sec) {
+ fprintf(stderr, "no .symtab in file: %s\n", fname);
return -1;
}
symtab = (const Elf_Sym *)((const char *)ehdr +
_r(&symtab_sec->sh_offset));
- if (extab_sec == NULL) {
- fprintf(stderr, "no __ex_table in file: %s\n", fname);
+ if (!extab_sec) {
+ fprintf(stderr, "no __ex_table in file: %s\n", fname);
return -1;
}
strtab = (const char *)ehdr + _r(&strtab_sec->sh_offset);
@@ -181,14 +182,14 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
if (ELF_ST_TYPE(sym->st_info) != STT_OBJECT)
continue;
idx = r(&sym->st_name);
- if (strcmp(strtab + idx, "main_extable_sort_needed") == 0) {
+ if (!strcmp(strtab + idx, "main_extable_sort_needed")) {
sort_needed_sym = sym;
break;
}
}
- if (sort_needed_sym == NULL) {
+ if (!sort_needed_sym) {
fprintf(stderr,
- "no main_extable_sort_needed symbol in file: %s\n",
+ "no main_extable_sort_needed symbol in file: %s\n",
fname);
return -1;
}
--
2.24.0.rc2

2019-11-15 17:56:05

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [RFC PATCH v4 0/7] Speed booting by sorting ORC unwind tables at build time

On Sat, Nov 16, 2019 at 12:45:32AM +0800, Shile Zhang wrote:
> Hi,
>
> I refactored the code, followed by Peter's suggestions in v3, thank you!
> Any suggestions and comments are welcome!
>
> Thanks!
>
> Changelog:
> ==========
> v3->v4:
> - Code refactored for Peter's review findings and suggestions.

Hi Shile,

I haven't given it a proper review yet, but one minor suggestion. In
general it's good practice to wait at least a few business days before
posting a new version of a patch set. That way it gives people time to
review, and it also helps to avoid flooding people's inboxes.

--
Josh

2019-11-18 01:58:35

by Shile Zhang

[permalink] [raw]
Subject: Re: [RFC PATCH v4 0/7] Speed booting by sorting ORC unwind tables at build time



On 2019/11/16 01:52, Josh Poimboeuf wrote:
> On Sat, Nov 16, 2019 at 12:45:32AM +0800, Shile Zhang wrote:
>> Hi,
>>
>> I refactored the code, followed by Peter's suggestions in v3, thank you!
>> Any suggestions and comments are welcome!
>>
>> Thanks!
>>
>> Changelog:
>> ==========
>> v3->v4:
>> - Code refactored for Peter's review findings and suggestions.
> Hi Shile,
>
> I haven't given it a proper review yet, but one minor suggestion. In
> general it's good practice to wait at least a few business days before
> posting a new version of a patch set. That way it gives people time to
> review, and it also helps to avoid flooding people's inboxes.
>

Hi, Josh,

Thank you very much for your kindly reminder!
I got it and very sorry for that flooding.

Best regards!

--- Shile

2019-11-18 02:21:47

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC PATCH v4 0/7] Speed booting by sorting ORC unwind tables at build time

On November 15, 2019 8:45:32 AM PST, Shile Zhang <[email protected]> wrote:
>Hi,
>
>I refactored the code, followed by Peter's suggestions in v3, thank
>you!
>Any suggestions and comments are welcome!
>
>Thanks!
>
>Changelog:
>==========
>v3->v4:
>- Code refactored for Peter's review findings and suggestions.
>
>v2->v3:
>- Discard new added sortorctable tool and related Kconfig changes.
>- Refactored sortextable, makes it more readable and extendable.
>- Rename 'sortextable' to 'sorttable', for more kernel tables extend.
>- Add ORC unwind tables sort into sorttable.
>- Remove the runtime ORC tables sort.
>https://lore.kernel.org/lkml/[email protected]/
>
>v1->v2:
>- Removed new added Kconfig and runtime sort code, advised by Josh
>Poimboeuf.
>- Some minor refactoring.
>https://lore.kernel.org/lkml/[email protected]/
>
>v1:
>- Added a new sortorctable tool to sort ORC unwind tables at build
>time,
> same as sortextable.
>- Add a new Kconfigure to control if ORC unwind tables sort at build
> time.
>https://lore.kernel.org/lkml/[email protected]/
>
>Shile Zhang (7):
> scripts/sortextable: Rewrite error/success handling
> scripts/sortextable: kernel coding style formating
> scripts/sortextable: Remove dead code
> scripts/sortextable: refactor do_func() function
> scripts/sorttable: rename sortextable to sorttable
> scripts/sorttable: Add ORC unwind tables sort concurrently
> x86/unwind/orc: remove run-time ORC unwind tables sort
>
> arch/arc/Kconfig | 2 +-
> arch/arm/Kconfig | 2 +-
> arch/arm64/Kconfig | 2 +-
> arch/microblaze/Kconfig | 2 +-
> arch/mips/Kconfig | 2 +-
> arch/parisc/Kconfig | 2 +-
> arch/parisc/kernel/vmlinux.lds.S | 2 +-
> arch/powerpc/Kconfig | 2 +-
> arch/s390/Kconfig | 2 +-
> arch/x86/Kconfig | 2 +-
> arch/x86/kernel/unwind_orc.c | 8 +-
> arch/xtensa/Kconfig | 2 +-
> init/Kconfig | 2 +-
> scripts/.gitignore | 2 +-
> scripts/Makefile | 10 +-
> scripts/link-vmlinux.sh | 10 +-
> scripts/sortextable.h | 209 -------------
> scripts/{sortextable.c => sorttable.c} | 299 +++++++++---------
> scripts/sorttable.h | 401 +++++++++++++++++++++++++
> 19 files changed, 568 insertions(+), 395 deletions(-)
> delete mode 100644 scripts/sortextable.h
> rename scripts/{sortextable.c => sorttable.c} (67%)
> create mode 100644 scripts/sorttable.h

Any actual time measurements?
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2019-11-18 02:30:12

by Shile Zhang

[permalink] [raw]
Subject: Re: [RFC PATCH v4 0/7] Speed booting by sorting ORC unwind tables at build time



On 2019/11/18 10:18, [email protected] wrote:
> On November 15, 2019 8:45:32 AM PST, Shile Zhang <[email protected]> wrote:
>> Hi,
>>
>> I refactored the code, followed by Peter's suggestions in v3, thank
>> you!
>> Any suggestions and comments are welcome!
>>
>> Thanks!
>>
>> Changelog:
>> ==========
>> v3->v4:
>> - Code refactored for Peter's review findings and suggestions.
>>
>> v2->v3:
>> - Discard new added sortorctable tool and related Kconfig changes.
>> - Refactored sortextable, makes it more readable and extendable.
>> - Rename 'sortextable' to 'sorttable', for more kernel tables extend.
>> - Add ORC unwind tables sort into sorttable.
>> - Remove the runtime ORC tables sort.
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> v1->v2:
>> - Removed new added Kconfig and runtime sort code, advised by Josh
>> Poimboeuf.
>> - Some minor refactoring.
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> v1:
>> - Added a new sortorctable tool to sort ORC unwind tables at build
>> time,
>> same as sortextable.
>> - Add a new Kconfigure to control if ORC unwind tables sort at build
>> time.
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Shile Zhang (7):
>> scripts/sortextable: Rewrite error/success handling
>> scripts/sortextable: kernel coding style formating
>> scripts/sortextable: Remove dead code
>> scripts/sortextable: refactor do_func() function
>> scripts/sorttable: rename sortextable to sorttable
>> scripts/sorttable: Add ORC unwind tables sort concurrently
>> x86/unwind/orc: remove run-time ORC unwind tables sort
>>
>> arch/arc/Kconfig | 2 +-
>> arch/arm/Kconfig | 2 +-
>> arch/arm64/Kconfig | 2 +-
>> arch/microblaze/Kconfig | 2 +-
>> arch/mips/Kconfig | 2 +-
>> arch/parisc/Kconfig | 2 +-
>> arch/parisc/kernel/vmlinux.lds.S | 2 +-
>> arch/powerpc/Kconfig | 2 +-
>> arch/s390/Kconfig | 2 +-
>> arch/x86/Kconfig | 2 +-
>> arch/x86/kernel/unwind_orc.c | 8 +-
>> arch/xtensa/Kconfig | 2 +-
>> init/Kconfig | 2 +-
>> scripts/.gitignore | 2 +-
>> scripts/Makefile | 10 +-
>> scripts/link-vmlinux.sh | 10 +-
>> scripts/sortextable.h | 209 -------------
>> scripts/{sortextable.c => sorttable.c} | 299 +++++++++---------
>> scripts/sorttable.h | 401 +++++++++++++++++++++++++
>> 19 files changed, 568 insertions(+), 395 deletions(-)
>> delete mode 100644 scripts/sortextable.h
>> rename scripts/{sortextable.c => sorttable.c} (67%)
>> create mode 100644 scripts/sorttable.h
> Any actual time measurements?

Sorry for missed in cover letter!

The ORC unwind tables sorting cost about 100ms in boot time on my
2vcpu16GB VM (with 2.5GHz Xeon CPU).
After sort at build time, the sorttable needs about 70ms in host.

Thanks!