Hi,
I refactored the sortextable code and added ORC unwind table sorting
support, to speed up kernel boot by sorting as many kernel tables as
possible at build time.
Following Peter's suggestion, I put the ORC table sort into a separate
thread so that the tables are sorted concurrently. That helps to keep
the added kernel link time as small as possible.
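Simplified, the new flow in do_sort() looks like this (an illustrative
sketch only, error handling omitted; see patch 6 for the real code):

	/* kick off the ORC unwind tables sort in a worker thread */
	pthread_create(&orc_sort_thread, NULL, sort_orctable, &orctable);

	/* meanwhile, sort __ex_table in the main thread as before */
	if (custom_sort)
		custom_sort(extab_image, _r(&extab_sec->sh_size));
	else
		qsort(extab_image, num_entries, extable_ent_size,
		      compare_extable);

	/* wait for the ORC sort to finish before declaring success */
	pthread_join(orc_sort_thread, &retval);
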
What I did:
[1-4] : refactor sortextable.{c,h} to make it more readable and
extendable, in preparation for further rework;
[5] : rename sortextable to sorttable, along with the related Kconfig
option, so it can be extended to other kernel tables;
[6-7] : add ORC unwind table sorting, and remove the runtime sort.
Further work:
Sort more kernel tables at build time:
- __jump_table:
  The jump table sort in jump_label_init() does not add noticeable boot
  time, so sorting it at build time seems to bring little benefit. It
  could be reconsidered in the future for more boot-time sensitive cases.
- .static_call_sites:
  This table is not merged yet; its runtime sort cost needs to be checked
  in the future.
https://lore.kernel.org/lkml/[email protected]/
Thanks!
Changes from RFC v3:
- Dropped the newly added sortorctable tool and related Kconfig changes.
- Refactored sortextable to make it more readable and extendable.
- Renamed 'sortextable' to 'sorttable', so more kernel tables can be
  sorted by it later.
- Added ORC unwind table sorting to sorttable.
- Removed the runtime ORC table sort.
Changes from RFC v2:
- Removed the newly added Kconfig option and runtime sort code, as advised
  by Josh Poimboeuf.
- Some minor refactoring.
https://www.lkml.org/lkml/2019/11/8/56
v1:
- Added a new sortorctable tool to sort the ORC unwind tables at build time,
  in the same way as sortextable.
- Added a new Kconfig option to control whether the ORC unwind tables are
  sorted at build time.
https://www.lkml.org/lkml/2019/11/7/611
Shile Zhang (7):
scripts/sortextable: Rewrite error/success handling
scripts/sortextable: kernel coding style formatting
scripts/sortextable: Remove dead code
scripts/sortextable: refactor do_func() function
scripts/sorttable: rename sortextable to sorttable
scripts/sorttable: Add ORC unwind tables sort concurrently
x86/unwind/orc: remove run-time ORC unwind tables sort
arch/arc/Kconfig | 2 +-
arch/arm/Kconfig | 2 +-
arch/arm64/Kconfig | 2 +-
arch/microblaze/Kconfig | 2 +-
arch/mips/Kconfig | 2 +-
arch/parisc/Kconfig | 2 +-
arch/parisc/kernel/vmlinux.lds.S | 2 +-
arch/powerpc/Kconfig | 2 +-
arch/s390/Kconfig | 2 +-
arch/x86/Kconfig | 2 +-
arch/x86/kernel/unwind_orc.c | 8 +-
arch/xtensa/Kconfig | 2 +-
init/Kconfig | 2 +-
scripts/.gitignore | 2 +-
scripts/Makefile | 9 +-
scripts/link-vmlinux.sh | 10 +-
scripts/sortextable.h | 209 -------------
scripts/{sortextable.c => sorttable.c} | 299 ++++++++----------
scripts/sorttable.h | 412 +++++++++++++++++++++++++
19 files changed, 578 insertions(+), 395 deletions(-)
delete mode 100644 scripts/sortextable.h
rename scripts/{sortextable.c => sorttable.c} (67%)
create mode 100644 scripts/sorttable.h
--
2.24.0.rc2
Remove the commented-out dead code, no functional changes.
Signed-off-by: Shile Zhang <[email protected]>
---
scripts/sortextable.h | 4 ----
1 file changed, 4 deletions(-)
diff --git a/scripts/sortextable.h b/scripts/sortextable.h
index b7e407e09f59..a2e3af7bf211 100644
--- a/scripts/sortextable.h
+++ b/scripts/sortextable.h
@@ -201,10 +201,6 @@ static int do_func(Elf_Ehdr *ehdr,
_r(&sort_needed_sym->st_value) -
_r(&sort_needed_sec->sh_addr);
-#if 0
- printf("sort done marker at %lx\n",
- (unsigned long)((char *)sort_done_location - (char *)ehdr));
-#endif
/* We sorted it, clear the flag. */
w(0, sort_done_location);
return 0;
--
2.24.0.rc2
The orc_unwind and orc_unwind_ip tables are now sorted during the vmlinux
link phase at build time, so just remove the run-time sort.
Signed-off-by: Shile Zhang <[email protected]>
---
arch/x86/kernel/unwind_orc.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 332ae6530fa8..280da6fa9922 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -273,9 +273,11 @@ void __init unwind_init(void)
return;
}
- /* Sort the .orc_unwind and .orc_unwind_ip tables: */
- sort(__start_orc_unwind_ip, num_entries, sizeof(int), orc_sort_cmp,
- orc_sort_swap);
+ /*
+	 * Note: the orc_unwind and orc_unwind_ip tables were already sorted
+	 * at build time by the sorttable tool during the vmlinux link
+	 * phase. They are ready for binary search now.
+ */
/* Initialize the fast lookup table: */
lookup_num_blocks = orc_lookup_end - orc_lookup;
--
2.24.0.rc2
Fix the inconsistent function formatting and kernel coding style,
following commit 3aec8638246f ("recordmcount: Kernel style
function signature formatting") and
commit 2e63152bc190 ("recordmcount: Kernel style formatting").
This makes the code more readable and extendable; no functional changes.
Signed-off-by: Shile Zhang <[email protected]>
---
scripts/sortextable.c | 182 ++++++++++++++++++++++--------------------
scripts/sortextable.h | 31 +++----
2 files changed, 111 insertions(+), 102 deletions(-)
diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index cd9762ba4467..e5384e86b58c 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -50,6 +50,14 @@
#define EM_ARCV2 195
#endif
+static uint32_t (*r)(const uint32_t *);
+static uint16_t (*r2)(const uint16_t *);
+static uint64_t (*r8)(const uint64_t *);
+static void (*w)(uint32_t, uint32_t *);
+static void (*w2)(uint16_t, uint16_t *);
+static void (*w8)(uint64_t, uint64_t *);
+typedef void (*table_sort_t)(char *, int);
+
/*
* Get the whole file as a programming convenience in order to avoid
* malloc+lseek+read+free of many pieces. If successful, then mmap
@@ -75,6 +83,7 @@ static void *mmap_file(char const *fname, size_t *size)
fprintf(stderr, "not a regular file: %s\n", fname);
goto out;
}
+
addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
fprintf(stderr, "Could not mmap file: %s\n", fname);
@@ -88,64 +97,65 @@ static void *mmap_file(char const *fname, size_t *size)
return addr;
}
-static uint64_t r8be(const uint64_t *x)
-{
- return get_unaligned_be64(x);
-}
static uint32_t rbe(const uint32_t *x)
{
return get_unaligned_be32(x);
}
+
static uint16_t r2be(const uint16_t *x)
{
return get_unaligned_be16(x);
}
-static uint64_t r8le(const uint64_t *x)
+
+static uint64_t r8be(const uint64_t *x)
{
- return get_unaligned_le64(x);
+ return get_unaligned_be64(x);
}
+
static uint32_t rle(const uint32_t *x)
{
return get_unaligned_le32(x);
}
+
static uint16_t r2le(const uint16_t *x)
{
return get_unaligned_le16(x);
}
-static void w8be(uint64_t val, uint64_t *x)
+static uint64_t r8le(const uint64_t *x)
{
- put_unaligned_be64(val, x);
+ return get_unaligned_le64(x);
}
+
static void wbe(uint32_t val, uint32_t *x)
{
put_unaligned_be32(val, x);
}
+
static void w2be(uint16_t val, uint16_t *x)
{
put_unaligned_be16(val, x);
}
-static void w8le(uint64_t val, uint64_t *x)
+
+static void w8be(uint64_t val, uint64_t *x)
{
- put_unaligned_le64(val, x);
+ put_unaligned_be64(val, x);
}
+
static void wle(uint32_t val, uint32_t *x)
{
put_unaligned_le32(val, x);
}
+
static void w2le(uint16_t val, uint16_t *x)
{
put_unaligned_le16(val, x);
}
-static uint64_t (*r8)(const uint64_t *);
-static uint32_t (*r)(const uint32_t *);
-static uint16_t (*r2)(const uint16_t *);
-static void (*w8)(uint64_t, uint64_t *);
-static void (*w)(uint32_t, uint32_t *);
-static void (*w2)(uint16_t, uint16_t *);
-
-typedef void (*table_sort_t)(char *, int);
+static void w8le(uint64_t val, uint64_t *x)
+{
+ put_unaligned_le64(val, x);
+}
/*
* Move reserved section indices SHN_LORESERVE..SHN_HIRESERVE out of
@@ -188,108 +198,100 @@ static int compare_relative_table(const void *a, const void *b)
return 0;
}
-static void x86_sort_relative_table(char *extab_image, int image_size)
+static void sort_relative_table(char *extab_image, int image_size)
{
- int i;
+ int i = 0;
- i = 0;
+ /*
+ * Do the same thing the runtime sort does, first normalize to
+ * being relative to the start of the section.
+ */
while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i);
-
w(r(loc) + i, loc);
- w(r(loc + 1) + i + 4, loc + 1);
- w(r(loc + 2) + i + 8, loc + 2);
-
- i += sizeof(uint32_t) * 3;
+ i += 4;
}
- qsort(extab_image, image_size / 12, 12, compare_relative_table);
+ qsort(extab_image, image_size / 8, 8, compare_relative_table);
+ /* Now denormalize. */
i = 0;
while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i);
-
w(r(loc) - i, loc);
- w(r(loc + 1) - (i + 4), loc + 1);
- w(r(loc + 2) - (i + 8), loc + 2);
-
- i += sizeof(uint32_t) * 3;
+ i += 4;
}
}
-static void sort_relative_table(char *extab_image, int image_size)
+static void x86_sort_relative_table(char *extab_image, int image_size)
{
- int i;
+ int i = 0;
- /*
- * Do the same thing the runtime sort does, first normalize to
- * being relative to the start of the section.
- */
- i = 0;
while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i);
+
w(r(loc) + i, loc);
- i += 4;
+ w(r(loc + 1) + i + 4, loc + 1);
+ w(r(loc + 2) + i + 8, loc + 2);
+
+ i += sizeof(uint32_t) * 3;
}
- qsort(extab_image, image_size / 8, 8, compare_relative_table);
+ qsort(extab_image, image_size / 12, 12, compare_relative_table);
- /* Now denormalize. */
i = 0;
while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i);
+
w(r(loc) - i, loc);
- i += 4;
+ w(r(loc + 1) - (i + 4), loc + 1);
+ w(r(loc + 2) - (i + 8), loc + 2);
+
+ i += sizeof(uint32_t) * 3;
}
}
-static int
-do_file(char const *const fname, void *addr)
+static int do_file(char const *const fname, void *addr)
{
- table_sort_t custom_sort = NULL;
- Elf32_Ehdr *ehdr = addr;
int rc = -1;
+ Elf32_Ehdr *ehdr = addr;
+ table_sort_t custom_sort = NULL;
switch (ehdr->e_ident[EI_DATA]) {
- default:
- fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
- ehdr->e_ident[EI_DATA], fname);
- return -1;
case ELFDATA2LSB:
- r = rle;
- r2 = r2le;
- r8 = r8le;
- w = wle;
- w2 = w2le;
- w8 = w8le;
+ r = rle;
+ r2 = r2le;
+ r8 = r8le;
+ w = wle;
+ w2 = w2le;
+ w8 = w8le;
break;
case ELFDATA2MSB:
- r = rbe;
- r2 = r2be;
- r8 = r8be;
- w = wbe;
- w2 = w2be;
- w8 = w8be;
+ r = rbe;
+ r2 = r2be;
+ r8 = r8be;
+ w = wbe;
+ w2 = w2be;
+ w8 = w8be;
break;
- } /* end switch */
- if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0
- || (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN)
- || ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
+ default:
+ fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
+ ehdr->e_ident[EI_DATA], fname);
+ return -1;
+ }
+
+ if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
+ (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN) ||
+ ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname);
return -1;
}
- custom_sort = NULL;
switch (r2(&ehdr->e_machine)) {
- default:
- fprintf(stderr, "unrecognized e_machine %d %s\n",
- r2(&ehdr->e_machine), fname);
- break;
case EM_386:
case EM_X86_64:
custom_sort = x86_sort_relative_table;
break;
-
case EM_S390:
case EM_AARCH64:
case EM_PARISC:
@@ -304,40 +306,45 @@ do_file(char const *const fname, void *addr)
case EM_MIPS:
case EM_XTENSA:
break;
- } /* end switch */
+ default:
+ fprintf(stderr, "unrecognized e_machine %d %s\n",
+ r2(&ehdr->e_machine), fname);
+ return -1;
+ }
switch (ehdr->e_ident[EI_CLASS]) {
- default:
- fprintf(stderr, "unrecognized ELF class %d %s\n",
- ehdr->e_ident[EI_CLASS], fname);
- break;
case ELFCLASS32:
- if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
- || r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
+ if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr) ||
+ r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
break;
}
rc = do32(ehdr, fname, custom_sort);
break;
- case ELFCLASS64: {
+ case ELFCLASS64:
+ {
Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
- if (r2(&ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
- || r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
+ if (r2(&ghdr->e_ehsize) != sizeof(Elf64_Ehdr) ||
+ r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
fprintf(stderr,
- "unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
+ "unrecognized ET_EXEC/ET_DYN file: %s\n",
+ fname);
break;
}
rc = do64(ghdr, fname, custom_sort);
+ }
+ break;
+ default:
+ fprintf(stderr, "unrecognized ELF class %d %s\n",
+ ehdr->e_ident[EI_CLASS], fname);
break;
}
- } /* end switch */
return rc;
}
-int
-main(int argc, char *argv[])
+int main(int argc, char *argv[])
{
int i, n_error = 0; /* gcc-4.3.0 false positive complaint */
size_t size = 0;
@@ -361,5 +368,6 @@ main(int argc, char *argv[])
munmap(addr, size);
}
+
return !!n_error;
}
diff --git a/scripts/sortextable.h b/scripts/sortextable.h
index 5a62e94df678..b7e407e09f59 100644
--- a/scripts/sortextable.h
+++ b/scripts/sortextable.h
@@ -6,7 +6,7 @@
*
* Some of this code was taken out of recordmcount.h written by:
*
- * Copyright 2009 John F. Reiser <[email protected]>. All rights reserved.
+ * Copyright 2009 John F. Reiser <[email protected]>. All rights reserved.
* Copyright 2010 Steven Rostedt <[email protected]>, Red Hat Inc.
*/
@@ -87,8 +87,9 @@ static int compare_extable(const void *a, const void *b)
return 0;
}
-static int
-do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
+static int do_func(Elf_Ehdr *ehdr,
+ char const *const fname,
+ table_sort_t custom_sort)
{
Elf_Shdr *shdr;
Elf_Shdr *shstrtab_sec;
@@ -126,7 +127,7 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
secstrtab = (const char *)ehdr + _r(&shstrtab_sec->sh_offset);
for (i = 0; i < num_sections; i++) {
idx = r(&shdr[i].sh_name);
- if (strcmp(secstrtab + idx, "__ex_table") == 0) {
+ if (!strcmp(secstrtab + idx, "__ex_table")) {
extab_sec = shdr + i;
extab_index = i;
}
@@ -136,26 +137,26 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
relocs = (void *)ehdr + _r(&shdr[i].sh_offset);
relocs_size = _r(&shdr[i].sh_size);
}
- if (strcmp(secstrtab + idx, ".symtab") == 0)
+ if (!strcmp(secstrtab + idx, ".symtab"))
symtab_sec = shdr + i;
- if (strcmp(secstrtab + idx, ".strtab") == 0)
+ if (!strcmp(secstrtab + idx, ".strtab"))
strtab_sec = shdr + i;
if (r(&shdr[i].sh_type) == SHT_SYMTAB_SHNDX)
symtab_shndx_start = (Elf32_Word *)(
(const char *)ehdr + _r(&shdr[i].sh_offset));
}
- if (strtab_sec == NULL) {
- fprintf(stderr, "no .strtab in file: %s\n", fname);
+ if (!strtab_sec) {
+ fprintf(stderr, "no .strtab in file: %s\n", fname);
return -1;
}
- if (symtab_sec == NULL) {
- fprintf(stderr, "no .symtab in file: %s\n", fname);
+ if (!symtab_sec) {
+ fprintf(stderr, "no .symtab in file: %s\n", fname);
return -1;
}
symtab = (const Elf_Sym *)((const char *)ehdr +
_r(&symtab_sec->sh_offset));
- if (extab_sec == NULL) {
- fprintf(stderr, "no __ex_table in file: %s\n", fname);
+ if (!extab_sec) {
+ fprintf(stderr, "no __ex_table in file: %s\n", fname);
return -1;
}
strtab = (const char *)ehdr + _r(&strtab_sec->sh_offset);
@@ -181,14 +182,14 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
if (ELF_ST_TYPE(sym->st_info) != STT_OBJECT)
continue;
idx = r(&sym->st_name);
- if (strcmp(strtab + idx, "main_extable_sort_needed") == 0) {
+ if (!strcmp(strtab + idx, "main_extable_sort_needed")) {
sort_needed_sym = sym;
break;
}
}
- if (sort_needed_sym == NULL) {
+ if (!sort_needed_sym) {
fprintf(stderr,
- "no main_extable_sort_needed symbol in file: %s\n",
+ "no main_extable_sort_needed symbol in file: %s\n",
fname);
return -1;
}
--
2.24.0.rc2
sortextable took some code from recordmcount, which used the same
setjmp/longjmp scheme to manage control flow.
recordmcount's error handling has since been rewritten by
commit 3f1df12019f3 ("recordmcount: Rewrite error/success handling").
So rewrite this part as well, with some additional refactoring, to make
it more readable and easier to extend further; no functional changes.
Signed-off-by: Shile Zhang <[email protected]>
---
scripts/sortextable.c | 119 +++++++++++++++---------------------------
scripts/sortextable.h | 11 ++--
2 files changed, 48 insertions(+), 82 deletions(-)
diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index 55768654e3c6..cd9762ba4467 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -22,7 +22,6 @@
#include <getopt.h>
#include <elf.h>
#include <fcntl.h>
-#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@@ -51,61 +50,41 @@
#define EM_ARCV2 195
#endif
-static int fd_map; /* File descriptor for file being modified. */
-static int mmap_failed; /* Boolean flag. */
-static void *ehdr_curr; /* current ElfXX_Ehdr * for resource cleanup */
-static struct stat sb; /* Remember .st_size, etc. */
-static jmp_buf jmpenv; /* setjmp/longjmp per-file error escape */
-
-/* setjmp() return values */
-enum {
- SJ_SETJMP = 0, /* hardwired first return */
- SJ_FAIL,
- SJ_SUCCEED
-};
-
-/* Per-file resource cleanup when multiple files. */
-static void
-cleanup(void)
-{
- if (!mmap_failed)
- munmap(ehdr_curr, sb.st_size);
- close(fd_map);
-}
-
-static void __attribute__((noreturn))
-fail_file(void)
-{
- cleanup();
- longjmp(jmpenv, SJ_FAIL);
-}
-
/*
* Get the whole file as a programming convenience in order to avoid
* malloc+lseek+read+free of many pieces. If successful, then mmap
* avoids copying unused pieces; else just read the whole file.
* Open for both read and write.
*/
-static void *mmap_file(char const *fname)
+static void *mmap_file(char const *fname, size_t *size)
{
- void *addr;
+ int fd;
+ struct stat sb;
+ void *addr = NULL;
- fd_map = open(fname, O_RDWR);
- if (fd_map < 0 || fstat(fd_map, &sb) < 0) {
+ fd = open(fname, O_RDWR);
+ if (fd < 0) {
perror(fname);
- fail_file();
+ return NULL;
+ }
+ if (fstat(fd, &sb) < 0) {
+ perror(fname);
+ goto out;
}
if (!S_ISREG(sb.st_mode)) {
fprintf(stderr, "not a regular file: %s\n", fname);
- fail_file();
+ goto out;
}
- addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED,
- fd_map, 0);
+ addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
- mmap_failed = 1;
fprintf(stderr, "Could not mmap file: %s\n", fname);
- fail_file();
+ goto out;
}
+
+ *size = sb.st_size;
+
+out:
+ close(fd);
return addr;
}
@@ -264,19 +243,18 @@ static void sort_relative_table(char *extab_image, int image_size)
}
}
-static void
-do_file(char const *const fname)
+static int
+do_file(char const *const fname, void *addr)
{
- table_sort_t custom_sort;
- Elf32_Ehdr *ehdr = mmap_file(fname);
+ table_sort_t custom_sort = NULL;
+ Elf32_Ehdr *ehdr = addr;
+ int rc = -1;
- ehdr_curr = ehdr;
switch (ehdr->e_ident[EI_DATA]) {
default:
fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
ehdr->e_ident[EI_DATA], fname);
- fail_file();
- break;
+ return -1;
case ELFDATA2LSB:
r = rle;
r2 = r2le;
@@ -298,7 +276,7 @@ do_file(char const *const fname)
|| (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN)
|| ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname);
- fail_file();
+ return -1;
}
custom_sort = NULL;
@@ -306,7 +284,6 @@ do_file(char const *const fname)
default:
fprintf(stderr, "unrecognized e_machine %d %s\n",
r2(&ehdr->e_machine), fname);
- fail_file();
break;
case EM_386:
case EM_X86_64:
@@ -333,16 +310,15 @@ do_file(char const *const fname)
default:
fprintf(stderr, "unrecognized ELF class %d %s\n",
ehdr->e_ident[EI_CLASS], fname);
- fail_file();
break;
case ELFCLASS32:
if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
|| r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
- fail_file();
+ break;
}
- do32(ehdr, fname, custom_sort);
+ rc = do32(ehdr, fname, custom_sort);
break;
case ELFCLASS64: {
Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
@@ -350,21 +326,22 @@ do_file(char const *const fname)
|| r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
- fail_file();
+ break;
}
- do64(ghdr, fname, custom_sort);
+ rc = do64(ghdr, fname, custom_sort);
break;
}
} /* end switch */
- cleanup();
+ return rc;
}
int
main(int argc, char *argv[])
{
- int n_error = 0; /* gcc-4.3.0 false positive complaint */
- int i;
+ int i, n_error = 0; /* gcc-4.3.0 false positive complaint */
+ size_t size = 0;
+ void *addr = NULL;
if (argc < 2) {
fprintf(stderr, "usage: sortextable vmlinux...\n");
@@ -373,28 +350,16 @@ main(int argc, char *argv[])
/* Process each file in turn, allowing deep failure. */
for (i = 1; i < argc; i++) {
- char *file = argv[i];
- int const sjval = setjmp(jmpenv);
+ addr = mmap_file(argv[i], &size);
+ if (!addr) {
+ ++n_error;
+ continue;
+ }
- switch (sjval) {
- default:
- fprintf(stderr, "internal error: %s\n", file);
- exit(1);
- break;
- case SJ_SETJMP: /* normal sequence */
- /* Avoid problems if early cleanup() */
- fd_map = -1;
- ehdr_curr = NULL;
- mmap_failed = 1;
- do_file(file);
- break;
- case SJ_FAIL: /* error in do_file or below */
+ if (do_file(argv[i], addr))
++n_error;
- break;
- case SJ_SUCCEED: /* premature success */
- /* do nothing */
- break;
- } /* end switch */
+
+ munmap(addr, size);
}
return !!n_error;
}
diff --git a/scripts/sortextable.h b/scripts/sortextable.h
index d4b3f6c40f02..5a62e94df678 100644
--- a/scripts/sortextable.h
+++ b/scripts/sortextable.h
@@ -87,7 +87,7 @@ static int compare_extable(const void *a, const void *b)
return 0;
}
-static void
+static int
do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
{
Elf_Shdr *shdr;
@@ -146,17 +146,17 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
}
if (strtab_sec == NULL) {
fprintf(stderr, "no .strtab in file: %s\n", fname);
- fail_file();
+ return -1;
}
if (symtab_sec == NULL) {
fprintf(stderr, "no .symtab in file: %s\n", fname);
- fail_file();
+ return -1;
}
symtab = (const Elf_Sym *)((const char *)ehdr +
_r(&symtab_sec->sh_offset));
if (extab_sec == NULL) {
fprintf(stderr, "no __ex_table in file: %s\n", fname);
- fail_file();
+ return -1;
}
strtab = (const char *)ehdr + _r(&strtab_sec->sh_offset);
@@ -190,7 +190,7 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
fprintf(stderr,
"no main_extable_sort_needed symbol in file: %s\n",
fname);
- fail_file();
+ return -1;
}
sort_needed_sec = &shdr[get_secindex(r2(&sym->st_shndx),
sort_needed_sym - symtab,
@@ -206,4 +206,5 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
#endif
/* We sorted it, clear the flag. */
w(0, sort_done_location);
+ return 0;
}
--
2.24.0.rc2
The ORC unwinder has two tables, .orc_unwind_ip and .orc_unwind, which
need to be sorted for binary search. Sorting them at build time saves
CPU cycles and helps to speed up kernel booting.
Run the ORC table sort in a separate thread to avoid adding to the
link time of the kernel build.
Signed-off-by: Shile Zhang <[email protected]>
---
scripts/Makefile | 5 ++
scripts/sorttable.h | 214 ++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 213 insertions(+), 6 deletions(-)
diff --git a/scripts/Makefile b/scripts/Makefile
index 658d201f7f8b..06e9f4f5ea93 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -26,6 +26,11 @@ HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
HOSTLDLIBS_sign-file = -lcrypto
HOSTLDLIBS_extract-cert = -lcrypto
+ifdef CONFIG_UNWINDER_ORC
+HOSTCFLAGS_sorttable.o += -DUNWINDER_ORC_ENABLED
+HOSTLDLIBS_sorttable = -lpthread
+endif
+
always := $(hostprogs-y) $(hostprogs-m)
# The following hostprogs-y programs are only build on demand
diff --git a/scripts/sorttable.h b/scripts/sorttable.h
index 82589ff90e25..a75f8b4a125f 100644
--- a/scripts/sorttable.h
+++ b/scripts/sorttable.h
@@ -4,6 +4,14 @@
*
* Copyright 2011 - 2012 Cavium, Inc.
*
+ * Added ORC unwind tables sort support, and other updates:
+ * Copyright (C) 1999-2019 Alibaba Group Holding Limited. by:
+ * Shile Zhang <[email protected]>
+ *
+ * Some of this code was taken out of /lib/sort.c and
+ * arch/x86/kernel/unwind_orc.c written by:
+ * Copyright (C) 2017 Josh Poimboeuf <[email protected]>
+ *
* Some of this code was taken out of recordmcount.h written by:
*
* Copyright 2009 John F. Reiser <[email protected]>. All rights reserved.
@@ -75,6 +83,162 @@
# define _w w
#endif
+#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
+/* ORC unwinder only supports X86_64 */
+#include <errno.h>
+#include <pthread.h>
+#include <linux/types.h>
+
+#define ORC_REG_UNDEFINED 0
+#define ERRSTRING_MAXSZ 256
+
+struct orc_entry {
+ s16 sp_offset;
+ s16 bp_offset;
+ unsigned sp_reg:4;
+ unsigned bp_reg:4;
+ unsigned type:2;
+ unsigned end:1;
+} __attribute__((packed));
+
+struct orctable_info {
+ size_t orc_size;
+ size_t orc_ip_size;
+} orctable;
+
+char g_errstring[ERRSTRING_MAXSZ];
+int *g_orc_ip_table;
+struct orc_entry *g_orc_table;
+pthread_t orc_sort_thread;
+
+/**
+ * sort - sort an array of elements
+ * @base: pointer to data to sort
+ * @num: number of elements
+ * @size: size of each element
+ * @cmp_func: pointer to comparison function
+ * @swap_func: pointer to swap function
+ *
+ * This function does a heapsort on the given array. You may provide a
+ * swap_func function optimized to your element type.
+ *
+ * Sorting time is O(n log n) both on average and worst-case. While
+ * qsort is about 20% faster on average, it suffers from exploitable
+ * O(n*n) worst-case behavior and extra memory requirements that make
+ * it less suitable for kernel use.
+ *
+ * This code was taken out of /lib/sort.c.
+ */
+static void sort(void *base, size_t num, size_t size,
+ int (*cmp_func)(const void *, const void *),
+ void (*swap_func)(void *, void *, int size))
+{
+ /* pre-scale counters for performance */
+ int i = (num/2 - 1) * size, n = num * size, c, r;
+
+ /* heapify */
+ for ( ; i >= 0; i -= size) {
+ for (r = i; r * 2 + size < n; r = c) {
+ c = r * 2 + size;
+ if (c < n - size &&
+ cmp_func(base + c, base + c + size) < 0)
+ c += size;
+ if (cmp_func(base + r, base + c) >= 0)
+ break;
+ swap_func(base + r, base + c, size);
+ }
+ }
+
+ /* sort */
+ for (i = n - size; i > 0; i -= size) {
+ swap_func(base, base + i, size);
+ for (r = 0; r * 2 + size < i; r = c) {
+ c = r * 2 + size;
+ if (c < i - size &&
+ cmp_func(base + c, base + c + size) < 0)
+ c += size;
+ if (cmp_func(base + r, base + c) >= 0)
+ break;
+ swap_func(base + r, base + c, size);
+ }
+ }
+}
+
+static inline unsigned long orc_ip(const int *ip)
+{
+ return (unsigned long)ip + *ip;
+}
+
+static void orc_sort_swap(void *_a, void *_b, int size)
+{
+ struct orc_entry *orc_a, *orc_b;
+ struct orc_entry orc_tmp;
+ int *a = _a, *b = _b, tmp;
+ int delta = _b - _a;
+
+ /* Swap the .orc_unwind_ip entries: */
+ tmp = *a;
+ *a = *b + delta;
+ *b = tmp - delta;
+
+ /* Swap the corresponding .orc_unwind entries: */
+ orc_a = g_orc_table + (a - g_orc_ip_table);
+ orc_b = g_orc_table + (b - g_orc_ip_table);
+ orc_tmp = *orc_a;
+ *orc_a = *orc_b;
+ *orc_b = orc_tmp;
+}
+
+static int orc_sort_cmp(const void *_a, const void *_b)
+{
+ struct orc_entry *orc_a;
+ const int *a = _a, *b = _b;
+ unsigned long a_val = orc_ip(a);
+ unsigned long b_val = orc_ip(b);
+
+ if (a_val > b_val)
+ return 1;
+ if (a_val < b_val)
+ return -1;
+
+ /*
+ * The "weak" section terminator entries need to always be on the left
+ * to ensure the lookup code skips them in favor of real entries.
+ * These terminator entries exist to handle any gaps created by
+ * whitelisted .o files which didn't get objtool generation.
+ */
+ orc_a = g_orc_table + (a - g_orc_ip_table);
+ return orc_a->sp_reg == ORC_REG_UNDEFINED && !orc_a->end ? -1 : 1;
+}
+
+static void *sort_orctable(void *arg)
+{
+ struct orctable_info *orcptr = (struct orctable_info *)arg;
+ unsigned int num_entries;
+
+ if (!g_orc_ip_table || !g_orc_table) {
+ snprintf(g_errstring, ERRSTRING_MAXSZ,
+ "cannot find ORC unwind tables");
+ pthread_exit(g_errstring);
+ }
+
+ num_entries = orcptr->orc_ip_size / sizeof(int);
+
+ if (orcptr->orc_ip_size % sizeof(int) != 0 ||
+ orcptr->orc_size % sizeof(struct orc_entry) != 0 ||
+ num_entries != orcptr->orc_size / sizeof(struct orc_entry)) {
+ snprintf(g_errstring, ERRSTRING_MAXSZ,
+ "wrong ORC unwind table entries number");
+ pthread_exit(g_errstring);
+ }
+
+ sort(g_orc_ip_table, num_entries, sizeof(int),
+ orc_sort_cmp, orc_sort_swap);
+
+ pthread_exit(NULL);
+}
+#endif
+
static int compare_extable(const void *a, const void *b)
{
Elf_Addr av = _r(a);
@@ -91,6 +255,7 @@ static int do_sort(Elf_Ehdr *ehdr,
char const *const fname,
table_sort_t custom_sort)
{
+ int rc = -1;
Elf_Shdr *s, *shdr = (Elf_Shdr *)((char *)ehdr + _r(&ehdr->e_shoff));
Elf_Shdr *strtab_sec = NULL;
Elf_Shdr *symtab_sec = NULL;
@@ -141,21 +306,44 @@ static int do_sort(Elf_Ehdr *ehdr,
if (r(&s->sh_type) == SHT_SYMTAB_SHNDX)
symtab_shndx = (Elf32_Word *)((const char *)ehdr +
_r(&s->sh_offset));
- }
+#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
+ /* locate the ORC unwind tables */
+ if (!strcmp(secstrings + idx, ".orc_unwind_ip")) {
+ orctable.orc_ip_size = s->sh_size;
+ g_orc_ip_table = (int *)((void *)ehdr +
+ s->sh_offset);
+ }
+ if (!strcmp(secstrings + idx, ".orc_unwind")) {
+ orctable.orc_size = s->sh_size;
+ g_orc_table = (struct orc_entry *)((void *)ehdr +
+ s->sh_offset);
+ }
+#endif
+ } /* for loop */
+
+#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
+ /* create thread to sort ORC unwind tables concurrently */
+ if (pthread_create(&orc_sort_thread, NULL, sort_orctable, &orctable)) {
+ fprintf(stderr,
+ "pthread_create orc_sort_thread failed '%s': %s\n",
+ strerror(errno), fname);
+ goto out;
+ }
+#endif
if (!extab_sec) {
fprintf(stderr, "no __ex_table in file: %s\n", fname);
- return -1;
+ goto out;
}
if (!symtab_sec) {
fprintf(stderr, "no .symtab in file: %s\n", fname);
- return -1;
+ goto out;
}
if (!strtab_sec) {
fprintf(stderr, "no .strtab in file: %s\n", fname);
- return -1;
+ goto out;
}
extab_image = (void *)ehdr + _r(&extab_sec->sh_offset);
@@ -192,7 +380,7 @@ static int do_sort(Elf_Ehdr *ehdr,
fprintf(stderr,
"no main_extable_sort_needed symbol in file: %s\n",
fname);
- return -1;
+ goto out;
}
sort_needed_sec = &shdr[get_secindex(r2(&sym->st_shndx),
@@ -205,6 +393,20 @@ static int do_sort(Elf_Ehdr *ehdr,
/* extable has been sorted, clear the flag */
w(0, sort_needed_loc);
+ rc = 0;
- return 0;
+out:
+#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
+ { /* to avoid gcc warning about declaration */
+ void *retval = NULL;
+
+ /* wait for ORC tables sort done */
+ pthread_join(orc_sort_thread, &retval);
+ if (retval) {
+ fprintf(stderr, "%s in file: %s\n", (char *)retval, fname);
+ rc = -1;
+ }
+ }
+#endif
+ return rc;
}
--
2.24.0.rc2
Use a more common name so that more kernel tables can be sorted at
build time in the future; no functional changes.
Signed-off-by: Shile Zhang <[email protected]>
---
arch/arc/Kconfig | 2 +-
arch/arm/Kconfig | 2 +-
arch/arm64/Kconfig | 2 +-
arch/microblaze/Kconfig | 2 +-
arch/mips/Kconfig | 2 +-
arch/parisc/Kconfig | 2 +-
arch/parisc/kernel/vmlinux.lds.S | 2 +-
arch/powerpc/Kconfig | 2 +-
arch/s390/Kconfig | 2 +-
arch/x86/Kconfig | 2 +-
arch/xtensa/Kconfig | 2 +-
init/Kconfig | 2 +-
scripts/.gitignore | 2 +-
scripts/Makefile | 4 ++--
scripts/link-vmlinux.sh | 10 +++++-----
scripts/{sortextable.c => sorttable.c} | 10 +++++-----
scripts/{sortextable.h => sorttable.h} | 4 ++--
17 files changed, 27 insertions(+), 27 deletions(-)
rename scripts/{sortextable.c => sorttable.c} (97%)
rename scripts/{sortextable.h => sorttable.h} (99%)
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 8383155c8c82..80f1b4034ebd 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -14,7 +14,7 @@ config ARC
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_SUPPORTS_ATOMIC_RMW if ARC_HAS_LLSC
select ARCH_32BIT_OFF_T
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
select DMA_DIRECT_REMAP
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 8a50efb559f3..6a392d3b240f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -37,7 +37,7 @@ config ARM
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_IPC_PARSE_VERSION
select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
- select BUILDTIME_EXTABLE_SORT if MMU
+ select BUILDTIME_TABLE_SORT if MMU
select CLONE_BACKWARDS
select CPU_PM if SUSPEND || CPU_IDLE
select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3f047afb982c..a8ca76b327df 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -82,7 +82,7 @@ config ARM64
select ARM_GIC_V3
select ARM_GIC_V3_ITS if PCI
select ARM_PSCI_FW
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
select CPU_PM if (SUSPEND || CPU_IDLE)
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index c9c4be822456..959f0ae5cf90 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -12,7 +12,7 @@ config MICROBLAZE
select ARCH_HAS_UNCACHED_SEGMENT if !MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_WANT_IPC_PARSE_VERSION
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select TIMER_OF
select CLONE_BACKWARDS3
select COMMON_CLK
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index a0bd9bdb5f83..bec582846dbd 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -14,7 +14,7 @@ config MIPS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
select ARCH_WANT_IPC_PARSE_VERSION
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select CPU_NO_EFFICIENT_FFS if (TARGET_ISA_REV < 1)
select CPU_PM if CPU_IDLE
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index b16237c95ea3..e1ef610a5a2b 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -18,7 +18,7 @@ config PARISC
select RTC_DRV_GENERIC
select INIT_ALL_POSSIBLE
select BUG
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select HAVE_PCI
select HAVE_PERF_EVENTS
select HAVE_KERNEL_BZIP2
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index 99cd24f2ea01..2a9799f7b46e 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -129,7 +129,7 @@ SECTIONS
RO_DATA_SECTION(8)
- /* RO because of BUILDTIME_EXTABLE_SORT */
+ /* RO because of BUILDTIME_TABLE_SORT */
EXCEPTION_TABLE(8)
NOTES
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3e56c9c2f16e..b3f404b825a6 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -149,7 +149,7 @@ config PPC
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN
select DYNAMIC_FTRACE if FUNCTION_TRACER
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 43a81d0ad507..97a3a63c69fe 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -110,7 +110,7 @@ config S390
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
select ARCH_WANT_IPC_PARSE_VERSION
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS2
select DYNAMIC_FTRACE if FUNCTION_TRACER
select GENERIC_CLOCKEVENTS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8ef85139553f..958bfbc2e5db 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -97,7 +97,7 @@ config X86
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
select ARCH_WANT_HUGE_PMD_SHARE
select ARCH_WANTS_THP_SWAP if X86_64
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLKEVT_I8253
select CLOCKSOURCE_VALIDATE_LAST_CYCLE
select CLOCKSOURCE_WATCHDOG
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index a8e7beb6b7b5..011c53481840 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -9,7 +9,7 @@ config XTENSA
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_FRAME_POINTERS
select ARCH_WANT_IPC_PARSE_VERSION
- select BUILDTIME_EXTABLE_SORT
+ select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select COMMON_CLK
select DMA_REMAP if MMU
diff --git a/init/Kconfig b/init/Kconfig
index b4daad2bac23..40e2edf8b00b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -58,7 +58,7 @@ config CONSTRUCTORS
config IRQ_WORK
bool
-config BUILDTIME_EXTABLE_SORT
+config BUILDTIME_TABLE_SORT
bool
config THREAD_INFO_IN_TASK
diff --git a/scripts/.gitignore b/scripts/.gitignore
index 17f8cef88fa8..c03b7cd97111 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -7,7 +7,7 @@ kallsyms
pnmtologo
unifdef
recordmcount
-sortextable
+sorttable
asn1_compiler
extract-cert
sign-file
diff --git a/scripts/Makefile b/scripts/Makefile
index 3e86b300f5a1..658d201f7f8b 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -15,13 +15,13 @@ hostprogs-$(CONFIG_KALLSYMS) += kallsyms
hostprogs-$(CONFIG_LOGO) += pnmtologo
hostprogs-$(CONFIG_VT) += conmakehash
hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
-hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sortextable
+hostprogs-$(CONFIG_BUILDTIME_TABLE_SORT) += sorttable
hostprogs-$(CONFIG_ASN1) += asn1_compiler
hostprogs-$(CONFIG_MODULE_SIG_FORMAT) += sign-file
hostprogs-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
hostprogs-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert
-HOSTCFLAGS_sortextable.o = -I$(srctree)/tools/include
+HOSTCFLAGS_sorttable.o = -I$(srctree)/tools/include
HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
HOSTLDLIBS_sign-file = -lcrypto
HOSTLDLIBS_extract-cert = -lcrypto
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 06495379fcd8..01978d1e4c13 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -178,9 +178,9 @@ mksysmap()
${CONFIG_SHELL} "${srctree}/scripts/mksysmap" ${1} ${2}
}
-sortextable()
+sorttable()
{
- ${objtree}/scripts/sortextable ${1}
+ ${objtree}/scripts/sorttable ${1}
}
# Delete output files in case of error
@@ -298,9 +298,9 @@ fi
vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o}
-if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
- info SORTEX vmlinux
- sortextable vmlinux
+if [ -n "${CONFIG_BUILDTIME_TABLE_SORT}" ]; then
+ info SORTTAB vmlinux
+ sorttable vmlinux
fi
info SYSMAP System.map
diff --git a/scripts/sortextable.c b/scripts/sorttable.c
similarity index 97%
rename from scripts/sortextable.c
rename to scripts/sorttable.c
index efa2839865cd..ff98b7db20c6 100644
--- a/scripts/sortextable.c
+++ b/scripts/sorttable.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * sortextable.c: Sort the kernel's exception table
+ * sorttable.c: Sort the kernel's table
*
* Copyright 2011 - 2012 Cavium, Inc.
*
@@ -182,9 +182,9 @@ static inline unsigned int get_secindex(unsigned int shndx,
}
/* 32 bit and 64 bit are very similar */
-#include "sortextable.h"
-#define SORTEXTABLE_64
-#include "sortextable.h"
+#include "sorttable.h"
+#define SORTTABLE_64
+#include "sorttable.h"
static int compare_relative_table(const void *a, const void *b)
{
@@ -351,7 +351,7 @@ int main(int argc, char *argv[])
void *addr = NULL;
if (argc < 2) {
- fprintf(stderr, "usage: sortextable vmlinux...\n");
+ fprintf(stderr, "usage: sorttable vmlinux...\n");
return 0;
}
diff --git a/scripts/sortextable.h b/scripts/sorttable.h
similarity index 99%
rename from scripts/sortextable.h
rename to scripts/sorttable.h
index 6485513f7cae..82589ff90e25 100644
--- a/scripts/sortextable.h
+++ b/scripts/sorttable.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
- * sortextable.h
+ * sorttable.h
*
* Copyright 2011 - 2012 Cavium, Inc.
*
@@ -31,7 +31,7 @@
#undef _r
#undef _w
-#ifdef SORTEXTABLE_64
+#ifdef SORTTABLE_64
# define extable_ent_size 16
# define compare_extable compare_extable_64
# define do_sort do_sort_64
--
2.24.0.rc2
Refine the loops, naming and code structure to make the code more
readable and extendable; no functional changes.
Signed-off-by: Shile Zhang <[email protected]>
---
scripts/sortextable.c | 4 +-
scripts/sortextable.h | 115 ++++++++++++++++++++++--------------------
2 files changed, 61 insertions(+), 58 deletions(-)
diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index e5384e86b58c..efa2839865cd 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -320,7 +320,7 @@ static int do_file(char const *const fname, void *addr)
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
break;
}
- rc = do32(ehdr, fname, custom_sort);
+ rc = do_sort_32(ehdr, fname, custom_sort);
break;
case ELFCLASS64:
{
@@ -332,7 +332,7 @@ static int do_file(char const *const fname, void *addr)
fname);
break;
}
- rc = do64(ghdr, fname, custom_sort);
+ rc = do_sort_64(ghdr, fname, custom_sort);
}
break;
default:
diff --git a/scripts/sortextable.h b/scripts/sortextable.h
index a2e3af7bf211..6485513f7cae 100644
--- a/scripts/sortextable.h
+++ b/scripts/sortextable.h
@@ -12,7 +12,7 @@
#undef extable_ent_size
#undef compare_extable
-#undef do_func
+#undef do_sort
#undef Elf_Addr
#undef Elf_Ehdr
#undef Elf_Shdr
@@ -34,7 +34,7 @@
#ifdef SORTEXTABLE_64
# define extable_ent_size 16
# define compare_extable compare_extable_64
-# define do_func do64
+# define do_sort do_sort_64
# define Elf_Addr Elf64_Addr
# define Elf_Ehdr Elf64_Ehdr
# define Elf_Shdr Elf64_Shdr
@@ -55,7 +55,7 @@
#else
# define extable_ent_size 8
# define compare_extable compare_extable_32
-# define do_func do32
+# define do_sort do_sort_32
# define Elf_Addr Elf32_Addr
# define Elf_Ehdr Elf32_Ehdr
# define Elf_Shdr Elf32_Shdr
@@ -87,81 +87,81 @@ static int compare_extable(const void *a, const void *b)
return 0;
}
-static int do_func(Elf_Ehdr *ehdr,
+static int do_sort(Elf_Ehdr *ehdr,
char const *const fname,
table_sort_t custom_sort)
{
- Elf_Shdr *shdr;
- Elf_Shdr *shstrtab_sec;
+ Elf_Shdr *s, *shdr = (Elf_Shdr *)((char *)ehdr + _r(&ehdr->e_shoff));
Elf_Shdr *strtab_sec = NULL;
Elf_Shdr *symtab_sec = NULL;
Elf_Shdr *extab_sec = NULL;
Elf_Sym *sym;
const Elf_Sym *symtab;
- Elf32_Word *symtab_shndx_start = NULL;
- Elf_Sym *sort_needed_sym;
+ Elf32_Word *symtab_shndx = NULL;
+ Elf_Sym *sort_needed_sym = NULL;
Elf_Shdr *sort_needed_sec;
Elf_Rel *relocs = NULL;
int relocs_size = 0;
- uint32_t *sort_done_location;
- const char *secstrtab;
+ uint32_t *sort_needed_loc;
+ const char *secstrings;
const char *strtab;
char *extab_image;
int extab_index = 0;
int i;
int idx;
- unsigned int num_sections;
- unsigned int secindex_strings;
+ unsigned int shnum;
+ unsigned int shstrndx;
- shdr = (Elf_Shdr *)((char *)ehdr + _r(&ehdr->e_shoff));
+ shstrndx = r2(&ehdr->e_shstrndx);
+ if (shstrndx == SHN_XINDEX)
+ shstrndx = r(&shdr[0].sh_link);
+ secstrings = (const char *)ehdr + _r(&shdr[shstrndx].sh_offset);
- num_sections = r2(&ehdr->e_shnum);
- if (num_sections == SHN_UNDEF)
- num_sections = _r(&shdr[0].sh_size);
+ shnum = r2(&ehdr->e_shnum);
+ if (shnum == SHN_UNDEF)
+ shnum = _r(&shdr[0].sh_size);
- secindex_strings = r2(&ehdr->e_shstrndx);
- if (secindex_strings == SHN_XINDEX)
- secindex_strings = r(&shdr[0].sh_link);
-
- shstrtab_sec = shdr + secindex_strings;
- secstrtab = (const char *)ehdr + _r(&shstrtab_sec->sh_offset);
- for (i = 0; i < num_sections; i++) {
- idx = r(&shdr[i].sh_name);
- if (!strcmp(secstrtab + idx, "__ex_table")) {
- extab_sec = shdr + i;
+ for (i = 0, s = shdr; s < shdr + shnum; i++, s++) {
+ idx = r(&s->sh_name);
+ if (!strcmp(secstrings + idx, "__ex_table")) {
+ extab_sec = s;
extab_index = i;
}
- if ((r(&shdr[i].sh_type) == SHT_REL ||
- r(&shdr[i].sh_type) == SHT_RELA) &&
- r(&shdr[i].sh_info) == extab_index) {
- relocs = (void *)ehdr + _r(&shdr[i].sh_offset);
- relocs_size = _r(&shdr[i].sh_size);
+ if (!strcmp(secstrings + idx, ".symtab"))
+ symtab_sec = s;
+ if (!strcmp(secstrings + idx, ".strtab"))
+ strtab_sec = s;
+
+ if ((r(&s->sh_type) == SHT_REL ||
+ r(&s->sh_type) == SHT_RELA) &&
+ r(&s->sh_info) == extab_index) {
+ relocs = (void *)ehdr + _r(&s->sh_offset);
+ relocs_size = _r(&s->sh_size);
}
- if (!strcmp(secstrtab + idx, ".symtab"))
- symtab_sec = shdr + i;
- if (!strcmp(secstrtab + idx, ".strtab"))
- strtab_sec = shdr + i;
- if (r(&shdr[i].sh_type) == SHT_SYMTAB_SHNDX)
- symtab_shndx_start = (Elf32_Word *)(
- (const char *)ehdr + _r(&shdr[i].sh_offset));
+ if (r(&s->sh_type) == SHT_SYMTAB_SHNDX)
+ symtab_shndx = (Elf32_Word *)((const char *)ehdr +
+ _r(&s->sh_offset));
}
- if (!strtab_sec) {
- fprintf(stderr, "no .strtab in file: %s\n", fname);
+
+ if (!extab_sec) {
+ fprintf(stderr, "no __ex_table in file: %s\n", fname);
return -1;
}
+
if (!symtab_sec) {
fprintf(stderr, "no .symtab in file: %s\n", fname);
return -1;
}
- symtab = (const Elf_Sym *)((const char *)ehdr +
- _r(&symtab_sec->sh_offset));
- if (!extab_sec) {
- fprintf(stderr, "no __ex_table in file: %s\n", fname);
+
+ if (!strtab_sec) {
+ fprintf(stderr, "no .strtab in file: %s\n", fname);
return -1;
}
- strtab = (const char *)ehdr + _r(&strtab_sec->sh_offset);
extab_image = (void *)ehdr + _r(&extab_sec->sh_offset);
+ strtab = (const char *)ehdr + _r(&strtab_sec->sh_offset);
+ symtab = (const Elf_Sym *)((const char *)ehdr +
+ _r(&symtab_sec->sh_offset));
if (custom_sort) {
custom_sort(extab_image, _r(&extab_sec->sh_size));
@@ -170,38 +170,41 @@ static int do_func(Elf_Ehdr *ehdr,
qsort(extab_image, num_entries,
extable_ent_size, compare_extable);
}
+
/* If there were relocations, we no longer need them. */
if (relocs)
memset(relocs, 0, relocs_size);
- /* find main_extable_sort_needed */
- sort_needed_sym = NULL;
- for (i = 0; i < _r(&symtab_sec->sh_size) / sizeof(Elf_Sym); i++) {
- sym = (void *)ehdr + _r(&symtab_sec->sh_offset);
- sym += i;
+ /* find the flag main_extable_sort_needed */
+ for (sym = (void *)ehdr + _r(&symtab_sec->sh_offset);
+ sym < sym + _r(&symtab_sec->sh_size) / sizeof(Elf_Sym);
+ sym++) {
if (ELF_ST_TYPE(sym->st_info) != STT_OBJECT)
continue;
- idx = r(&sym->st_name);
- if (!strcmp(strtab + idx, "main_extable_sort_needed")) {
+ if (!strcmp(strtab + r(&sym->st_name),
+ "main_extable_sort_needed")) {
sort_needed_sym = sym;
break;
}
}
+
if (!sort_needed_sym) {
fprintf(stderr,
"no main_extable_sort_needed symbol in file: %s\n",
fname);
return -1;
}
+
sort_needed_sec = &shdr[get_secindex(r2(&sym->st_shndx),
sort_needed_sym - symtab,
- symtab_shndx_start)];
- sort_done_location = (void *)ehdr +
+ symtab_shndx)];
+ sort_needed_loc = (void *)ehdr +
_r(&sort_needed_sec->sh_offset) +
_r(&sort_needed_sym->st_value) -
_r(&sort_needed_sec->sh_addr);
- /* We sorted it, clear the flag. */
- w(0, sort_done_location);
+ /* extable has been sorted, clear the flag */
+ w(0, sort_needed_loc);
+
return 0;
}
--
2.24.0.rc2
* Shile Zhang <[email protected]> wrote:
> Hi,
>
> I refactored the sortextable code and added ORC unwind table sorting
> support, to speed up kernel boot by sorting as many kernel tables as
> possible at build time.
>
> Following Peter's suggestion, I put the ORC table sort into a separate
> thread so that the tables are sorted concurrently. That helps to keep
> the added kernel link time as small as possible.
Could you please also measure how much boot time this saves,
approximately, and how long it takes to do it at build time?
Thanks,
Ingo
On 2019/11/15 15:25, Ingo Molnar wrote:
> * Shile Zhang <[email protected]> wrote:
>
>> Hi,
>>
>> I refactored the sortextable code and added ORC unwind table sorting
>> support, to speed up kernel boot by sorting as many kernel tables as
>> possible at build time.
>>
>> Following Peter's suggestion, I put the ORC table sort into a separate
>> thread so that the tables are sorted concurrently. That helps to keep
>> the added kernel link time as small as possible.
> Could you please also measure how much boot time this saves,
> approximately, and how long it takes to do it at build time?
Thanks for your review!
I've tested on a 2 vCPU / 16GB VM (with a 2.5GHz Xeon CPU); it saves
nearly 90ms of boot time.
The new sorttable tool takes about 0.074s to sort the extable and ORC
tables, on a host with the same CPU.
>
> Thanks,
>
> Ingo
On Fri, Nov 15, 2019 at 02:47:49PM +0800, Shile Zhang wrote:
> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
> +/* ORC unwinder only supports X86_64 */
> +#include <errno.h>
> +#include <pthread.h>
> +#include <linux/types.h>
> +
> +#define ORC_REG_UNDEFINED 0
> +#define ERRSTRING_MAXSZ 256
> +
> +struct orc_entry {
> + s16 sp_offset;
> + s16 bp_offset;
> + unsigned sp_reg:4;
> + unsigned bp_reg:4;
> + unsigned type:2;
> + unsigned end:1;
> +} __attribute__((packed));
> +
> +struct orctable_info {
> + size_t orc_size;
> + size_t orc_ip_size;
> +} orctable;
There's ./arch/x86/include/asm/orc_types.h for this. Please don't
duplicate. objtool uses that same header.
> +/**
> + * sort - sort an array of elements
> + * @base: pointer to data to sort
> + * @num: number of elements
> + * @size: size of each element
> + * @cmp_func: pointer to comparison function
> + * @swap_func: pointer to swap function
> + *
> + * This function does a heapsort on the given array. You may provide a
> + * swap_func function optimized to your element type.
> + *
> + * Sorting time is O(n log n) both on average and worst-case. While
> + * qsort is about 20% faster on average, it suffers from exploitable
> + * O(n*n) worst-case behavior and extra memory requirements that make
> + * it less suitable for kernel use.
> + *
> + * This code was taken out of /lib/sort.c.
> + */
> +static void sort(void *base, size_t num, size_t size,
> + int (*cmp_func)(const void *, const void *),
> + void (*swap_func)(void *, void *, int size))
> +{
> + /* pre-scale counters for performance */
> + int i = (num/2 - 1) * size, n = num * size, c, r;
> +
> + /* heapify */
> + for ( ; i >= 0; i -= size) {
> + for (r = i; r * 2 + size < n; r = c) {
> + c = r * 2 + size;
> + if (c < n - size &&
> + cmp_func(base + c, base + c + size) < 0)
> + c += size;
> + if (cmp_func(base + r, base + c) >= 0)
> + break;
> + swap_func(base + r, base + c, size);
> + }
> + }
> +
> + /* sort */
> + for (i = n - size; i > 0; i -= size) {
> + swap_func(base, base + i, size);
> + for (r = 0; r * 2 + size < i; r = c) {
> + c = r * 2 + size;
> + if (c < i - size &&
> + cmp_func(base + c, base + c + size) < 0)
> + c += size;
> + if (cmp_func(base + r, base + c) >= 0)
> + break;
> + swap_func(base + r, base + c, size);
> + }
> + }
> +}
Do we really need to copy the heapsort implementation? That is, why not
use libc's qsort() ? This is userspace after all.
On Fri, Nov 15, 2019 at 02:47:49PM +0800, Shile Zhang wrote:
> @@ -141,21 +306,44 @@ static int do_sort(Elf_Ehdr *ehdr,
> if (r(&s->sh_type) == SHT_SYMTAB_SHNDX)
> symtab_shndx = (Elf32_Word *)((const char *)ehdr +
> _r(&s->sh_offset));
> - }
>
> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
> + /* locate the ORC unwind tables */
> + if (!strcmp(secstrings + idx, ".orc_unwind_ip")) {
> + orctable.orc_ip_size = s->sh_size;
> + g_orc_ip_table = (int *)((void *)ehdr +
> + s->sh_offset);
> + }
> + if (!strcmp(secstrings + idx, ".orc_unwind")) {
> + orctable.orc_size = s->sh_size;
> + g_orc_table = (struct orc_entry *)((void *)ehdr +
> + s->sh_offset);
> + }
> +#endif
> + } /* for loop */
> +
> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
Maybe something like:
if (g_orc_table && g_orc_ip_table) {
if (pthread_create(...))
...
} else if (g_orc_table || g_orc_ip_table) {
fprintf(stderr, "incomplete ORC tables...\n");
}
> + /* create thread to sort ORC unwind tables concurrently */
> + if (pthread_create(&orc_sort_thread, NULL, sort_orctable, &orctable)) {
> + fprintf(stderr,
> + "pthread_create orc_sort_thread failed '%s': %s\n",
> + strerror(errno), fname);
> + goto out;
> + }
> +#endif
> if (!extab_sec) {
> fprintf(stderr, "no __ex_table in file: %s\n", fname);
> - return -1;
> + goto out;
> }
>
> if (!symtab_sec) {
> fprintf(stderr, "no .symtab in file: %s\n", fname);
> - return -1;
> + goto out;
> }
>
> if (!strtab_sec) {
> fprintf(stderr, "no .strtab in file: %s\n", fname);
> - return -1;
> + goto out;
> }
>
> extab_image = (void *)ehdr + _r(&extab_sec->sh_offset);
> @@ -192,7 +380,7 @@ static int do_sort(Elf_Ehdr *ehdr,
> fprintf(stderr,
> "no main_extable_sort_needed symbol in file: %s\n",
> fname);
> - return -1;
> + goto out;
> }
>
> sort_needed_sec = &shdr[get_secindex(r2(&sym->st_shndx),
> @@ -205,6 +393,20 @@ static int do_sort(Elf_Ehdr *ehdr,
>
> /* extable has been sorted, clear the flag */
> w(0, sort_needed_loc);
> + rc = 0;
>
> - return 0;
> +out:
> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
> + { /* to avoid gcc warning about declaration */
> + void *retval = NULL;
and then here:
if (orc_sort_thread) {
void *retval = NULL;
pthread_join(...);
...
}
> +
> + /* wait for ORC tables sort done */
> + pthread_join(orc_sort_thread, &retval);
> + if (retval) {
> + fprintf(stderr, "%s in file: %s\n", (char *)retval, fname);
> + rc = -1;
> + }
> + }
> +#endif
> + return rc;
> }
On 2019/11/15 17:07, Peter Zijlstra wrote:
> On Fri, Nov 15, 2019 at 02:47:49PM +0800, Shile Zhang wrote:
>
>> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
>> +/* ORC unwinder only supports X86_64 */
>> +#include <errno.h>
>> +#include <pthread.h>
>> +#include <linux/types.h>
>> +
>> +#define ORC_REG_UNDEFINED 0
>> +#define ERRSTRING_MAXSZ 256
>> +
>> +struct orc_entry {
>> + s16 sp_offset;
>> + s16 bp_offset;
>> + unsigned sp_reg:4;
>> + unsigned bp_reg:4;
>> + unsigned type:2;
>> + unsigned end:1;
>> +} __attribute__((packed));
>> +
>> +struct orctable_info {
>> + size_t orc_size;
>> + size_t orc_ip_size;
>> +} orctable;
> There's ./arch/x86/include/asm/orc_types.h for this. Please don't
> duplicate. objtool uses that same header.
Good catch! Thanks for your kind reminder! I'll remove it.
>> +/**
>> + * sort - sort an array of elements
>> + * @base: pointer to data to sort
>> + * @num: number of elements
>> + * @size: size of each element
>> + * @cmp_func: pointer to comparison function
>> + * @swap_func: pointer to swap function
>> + *
>> + * This function does a heapsort on the given array. You may provide a
>> + * swap_func function optimized to your element type.
>> + *
>> + * Sorting time is O(n log n) both on average and worst-case. While
>> + * qsort is about 20% faster on average, it suffers from exploitable
>> + * O(n*n) worst-case behavior and extra memory requirements that make
>> + * it less suitable for kernel use.
>> + *
>> + * This code was taken out of /lib/sort.c.
>> + */
>> +static void sort(void *base, size_t num, size_t size,
>> + int (*cmp_func)(const void *, const void *),
>> + void (*swap_func)(void *, void *, int size))
>> +{
>> + /* pre-scale counters for performance */
>> + int i = (num/2 - 1) * size, n = num * size, c, r;
>> +
>> + /* heapify */
>> + for ( ; i >= 0; i -= size) {
>> + for (r = i; r * 2 + size < n; r = c) {
>> + c = r * 2 + size;
>> + if (c < n - size &&
>> + cmp_func(base + c, base + c + size) < 0)
>> + c += size;
>> + if (cmp_func(base + r, base + c) >= 0)
>> + break;
>> + swap_func(base + r, base + c, size);
>> + }
>> + }
>> +
>> + /* sort */
>> + for (i = n - size; i > 0; i -= size) {
>> + swap_func(base, base + i, size);
>> + for (r = 0; r * 2 + size < i; r = c) {
>> + c = r * 2 + size;
>> + if (c < i - size &&
>> + cmp_func(base + c, base + c + size) < 0)
>> + c += size;
>> + if (cmp_func(base + r, base + c) >= 0)
>> + break;
>> + swap_func(base + r, base + c, size);
>> + }
>> + }
>> +}
> Do we really need to copy the heapsort implementation? That is, why not
> use libc's qsort() ? This is userspace after all.
Yes, I think qsort would be a better choice than copy-pasting here. But
qsort does not support a customized swap function, which is needed so
that the ORC unwind sort can swap entries in the two tables together.
I think that's hard to do with qsort, so I used the same sort() as the
original ORC unwind table sort.
On 2019/11/15 17:19, Peter Zijlstra wrote:
> On Fri, Nov 15, 2019 at 02:47:49PM +0800, Shile Zhang wrote:
>
>> @@ -141,21 +306,44 @@ static int do_sort(Elf_Ehdr *ehdr,
>> if (r(&s->sh_type) == SHT_SYMTAB_SHNDX)
>> symtab_shndx = (Elf32_Word *)((const char *)ehdr +
>> _r(&s->sh_offset));
>> - }
>>
>> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
>> + /* locate the ORC unwind tables */
>> + if (!strcmp(secstrings + idx, ".orc_unwind_ip")) {
>> + orctable.orc_ip_size = s->sh_size;
>> + g_orc_ip_table = (int *)((void *)ehdr +
>> + s->sh_offset);
>> + }
>> + if (!strcmp(secstrings + idx, ".orc_unwind")) {
>> + orctable.orc_size = s->sh_size;
>> + g_orc_table = (struct orc_entry *)((void *)ehdr +
>> + s->sh_offset);
>> + }
>> +#endif
>> + } /* for loop */
>> +
>> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
> Maybe something like:
>
> if (g_orc_table && g_orc_ip_table) {
> if (pthread_create(...))
> ...
> } else if (g_orc_table || g_orc_ip_table) {
> fprintf(stderr, "incomplete ORC tables...\n");
> }
>
>> + /* create thread to sort ORC unwind tables concurrently */
>> + if (pthread_create(&orc_sort_thread, NULL, sort_orctable, &orctable)) {
>> + fprintf(stderr,
>> + "pthread_create orc_sort_thread failed '%s': %s\n",
>> + strerror(errno), fname);
>> + goto out;
>> + }
>> +#endif
>> if (!extab_sec) {
>> fprintf(stderr, "no __ex_table in file: %s\n", fname);
>> - return -1;
>> + goto out;
>> }
>>
>> if (!symtab_sec) {
>> fprintf(stderr, "no .symtab in file: %s\n", fname);
>> - return -1;
>> + goto out;
>> }
>>
>> if (!strtab_sec) {
>> fprintf(stderr, "no .strtab in file: %s\n", fname);
>> - return -1;
>> + goto out;
>> }
>>
>> extab_image = (void *)ehdr + _r(&extab_sec->sh_offset);
>> @@ -192,7 +380,7 @@ static int do_sort(Elf_Ehdr *ehdr,
>> fprintf(stderr,
>> "no main_extable_sort_needed symbol in file: %s\n",
>> fname);
>> - return -1;
>> + goto out;
>> }
>>
>> sort_needed_sec = &shdr[get_secindex(r2(&sym->st_shndx),
>> @@ -205,6 +393,20 @@ static int do_sort(Elf_Ehdr *ehdr,
>>
>> /* extable has been sorted, clear the flag */
>> w(0, sort_needed_loc);
>> + rc = 0;
>>
>> - return 0;
>> +out:
>> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
>> + { /* to avoid gcc warning about declaration */
>> + void *retval = NULL;
> and then here:
>
> if (orc_sort_thread) {
> void *retval = NULL;
> pthread_join(...);
> ...
> }
Thank you for your kind advice! I'll change it in the next version.
>> +
>> + /* wait for ORC tables sort done */
>> + pthread_join(orc_sort_thread, &retval);
>> + if (retval) {
>> + fprintf(stderr, "%s in file: %s\n", (char *)retval, fname);
>> + rc = -1;
>> + }
>> + }
>> +#endif
>> + return rc;
>> }
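For reference, a hedged standalone sketch of the structure Peter
suggests above: start the sort thread only when both ORC sections were
found, complain about incomplete tables otherwise, and join only if the
thread was actually created. Function and variable names are
illustrative, not the actual sorttable code, and testing a pthread_t
against zero assumes glibc's integer pthread_t, as Peter's snippet does:

#include <pthread.h>
#include <stdio.h>
#include <string.h>

static void *sort_orctable(void *arg)
{
        /* the real worker sorts .orc_unwind/.orc_unwind_ip here */
        return NULL;
}

static int sort_orc_sections(void *orc, void *orc_ip, const char *fname)
{
        pthread_t orc_sort_thread = 0;  /* 0 == not created (glibc) */
        int rc = -1;
        int err;

        if (orc && orc_ip) {
                /* pthread_create() returns an error code, it does not set errno */
                err = pthread_create(&orc_sort_thread, NULL, sort_orctable, orc);
                if (err) {
                        fprintf(stderr, "pthread_create failed '%s': %s\n",
                                strerror(err), fname);
                        return -1;      /* nothing to join */
                }
        } else if (orc || orc_ip) {
                fprintf(stderr, "incomplete ORC tables in file: %s\n", fname);
                return -1;
        }

        rc = 0;

        /* ... the __ex_table sort would run here, as in do_sort() ... */

        if (orc_sort_thread) {
                void *retval = NULL;

                /* wait for the ORC sort and pick up its error string, if any */
                pthread_join(orc_sort_thread, &retval);
                if (retval) {
                        fprintf(stderr, "%s in file: %s\n", (char *)retval, fname);
                        rc = -1;
                }
        }
        return rc;
}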
On Fri, Nov 15, 2019 at 05:43:49PM +0800, Shile Zhang wrote:
> On 2019/11/15 17:07, Peter Zijlstra wrote:
> > On Fri, Nov 15, 2019 at 02:47:49PM +0800, Shile Zhang wrote:
> > > +/**
> > > + * sort - sort an array of elements
> > > + * @base: pointer to data to sort
> > > + * @num: number of elements
> > > + * @size: size of each element
> > > + * @cmp_func: pointer to comparison function
> > > + * @swap_func: pointer to swap function
> > > + *
> > > + * This function does a heapsort on the given array. You may provide a
> > > + * swap_func function optimized to your element type.
> > > + *
> > > + * Sorting time is O(n log n) both on average and worst-case. While
> > > + * qsort is about 20% faster on average, it suffers from exploitable
> > > + * O(n*n) worst-case behavior and extra memory requirements that make
> > > + * it less suitable for kernel use.
> > > + *
> > > + * This code token out of /lib/sort.c.
> > > + */
> > > +static void sort(void *base, size_t num, size_t size,
> > > + int (*cmp_func)(const void *, const void *),
> > > + void (*swap_func)(void *, void *, int size))
> > > +{
> > > +}
> > Do we really need to copy the heapsort implementation? That is, why not
> > use libc's qsort() ? This is userspace after all.
>
> Yes, I think qsort is better choice than copy-paste here. But qsort does not
> support customized swap func, which is needed for ORC unwind swap two tables
> together.
> I think it's hard to do with qsort, so I used sort same with original orc
> unwind table sort.
Urgh, you're right. That's unfortunate.
From: Shile Zhang
> Sent: 15 November 2019 06:48
...
> arch/x86/kernel/unwind_orc.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 332ae6530fa8..280da6fa9922 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -273,9 +273,11 @@ void __init unwind_init(void)
> return;
> }
>
> - /* Sort the .orc_unwind and .orc_unwind_ip tables: */
> - sort(__start_orc_unwind_ip, num_entries, sizeof(int), orc_sort_cmp,
> - orc_sort_swap);
> + /*
> + * Note, orc_unwind and orc_unwind_ip tables has been sorted in
> + * vmlinux link phase by sorttable tool at build time.
> + * Its ready for binary search now.
> + */
How fast is sort() if the table is sorted?
Relying on the kernel sources and build scripts always being in sync seems dangerous.
Probably better to leave the sort in for a release or two.
David
> On Nov 15, 2019, at 1:43 AM, Shile Zhang <[email protected]> wrote:
>
>
>
>> On 2019/11/15 17:07, Peter Zijlstra wrote:
>>> On Fri, Nov 15, 2019 at 02:47:49PM +0800, Shile Zhang wrote:
>>>
>>> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
>>> +/* ORC unwinder only support X86_64 */
>>> +#include <errno.h>
>>> +#include <pthread.h>
>>> +#include <linux/types.h>
>>> +
>>> +#define ORC_REG_UNDEFINED 0
>>> +#define ERRSTRING_MAXSZ 256
>>> +
>>> +struct orc_entry {
>>> + s16 sp_offset;
>>> + s16 bp_offset;
>>> + unsigned sp_reg:4;
>>> + unsigned bp_reg:4;
>>> + unsigned type:2;
>>> + unsigned end:1;
>>> +} __attribute__((packed));
>>> +
>>> +struct orctable_info {
>>> + size_t orc_size;
>>> + size_t orc_ip_size;
>>> +} orctable;
>> There's ./arch/x86/include/asm/orc_types.h for this. Please don't
>> duplicate. objtool uses that same header.
> Good catch! Thanks for your kindly reminder! I'll remove it.
>>> +/**
>>> + * sort - sort an array of elements
>>> + * @base: pointer to data to sort
>>> + * @num: number of elements
>>> + * @size: size of each element
>>> + * @cmp_func: pointer to comparison function
>>> + * @swap_func: pointer to swap function
>>> + *
>>> + * This function does a heapsort on the given array. You may provide a
>>> + * swap_func function optimized to your element type.
>>> + *
>>> + * Sorting time is O(n log n) both on average and worst-case. While
>>> + * qsort is about 20% faster on average, it suffers from exploitable
>>> + * O(n*n) worst-case behavior and extra memory requirements that make
>>> + * it less suitable for kernel use.
>>> + *
>>> + * This code token out of /lib/sort.c.
>>> + */
>>> +static void sort(void *base, size_t num, size_t size,
>>> + int (*cmp_func)(const void *, const void *),
>>> + void (*swap_func)(void *, void *, int size))
>>> +{
>>> + /* pre-scale counters for performance */
>>> + int i = (num/2 - 1) * size, n = num * size, c, r;
>>> +
>>> + /* heapify */
>>> + for ( ; i >= 0; i -= size) {
>>> + for (r = i; r * 2 + size < n; r = c) {
>>> + c = r * 2 + size;
>>> + if (c < n - size &&
>>> + cmp_func(base + c, base + c + size) < 0)
>>> + c += size;
>>> + if (cmp_func(base + r, base + c) >= 0)
>>> + break;
>>> + swap_func(base + r, base + c, size);
>>> + }
>>> + }
>>> +
>>> + /* sort */
>>> + for (i = n - size; i > 0; i -= size) {
>>> + swap_func(base, base + i, size);
>>> + for (r = 0; r * 2 + size < i; r = c) {
>>> + c = r * 2 + size;
>>> + if (c < i - size &&
>>> + cmp_func(base + c, base + c + size) < 0)
>>> + c += size;
>>> + if (cmp_func(base + r, base + c) >= 0)
>>> + break;
>>> + swap_func(base + r, base + c, size);
>>> + }
>>> + }
>>> +}
>> Do we really need to copy the heapsort implementation? That is, why not
>> use libc's qsort() ? This is userspace after all.
>
> Yes, I think qsort is better choice than copy-paste here. But qsort does not support customized swap func, which is needed for ORC unwind swap two tables together.
> I think it's hard to do with qsort, so I used sort same with original orc unwind table sort.
One solution is to make an array of indices 0, 1, 2, etc, and sort that using a comparison function that compares i,j by actually comparing source[i], source[j]. (Or use pointers, but indices are probably faster on a 64-bit machine if you can use 32-bit indices.) Then, after sorting, permute the original array using the now-sorted indices. In the case where swapping is expensive, this is actually faster, since it does exactly n moves instead of O(n log n).
>
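One hedged way to picture this in C (illustrative names, not the final
sorttable code; it assumes the struct orc_entry and the g_orc_* table
pointers from the patch quoted above are in scope): sort a 32-bit index
array with libc qsort(), then permute both ORC tables through temporary
copies, which is exactly n element moves per table.

#include <stdlib.h>
#include <string.h>

extern int *g_orc_ip_table;             /* .orc_unwind_ip entries */
extern struct orc_entry *g_orc_table;   /* matching .orc_unwind entries */

static int orc_idx_cmp(const void *pa, const void *pb)
{
        unsigned int a = *(const unsigned int *)pa;
        unsigned int b = *(const unsigned int *)pb;

        /*
         * Compare by the ip entries the indices refer to.  Note: the
         * real comparator has to account for .orc_unwind_ip values
         * being self-relative; that detail is left out of this sketch.
         */
        if (g_orc_ip_table[a] < g_orc_ip_table[b])
                return -1;
        return g_orc_ip_table[a] > g_orc_ip_table[b];
}

static int sort_orc_by_index(size_t num_entries)
{
        unsigned int *idx = malloc(num_entries * sizeof(*idx));
        int *ip_copy = malloc(num_entries * sizeof(*ip_copy));
        struct orc_entry *orc_copy = malloc(num_entries * sizeof(*orc_copy));
        size_t i;
        int rc = -1;

        if (!idx || !ip_copy || !orc_copy)
                goto out;

        for (i = 0; i < num_entries; i++)
                idx[i] = i;

        /* libc qsort() on the small index array ... */
        qsort(idx, num_entries, sizeof(*idx), orc_idx_cmp);

        /* ... then permute both tables, exactly n moves per table */
        for (i = 0; i < num_entries; i++) {
                ip_copy[i]  = g_orc_ip_table[idx[i]];
                orc_copy[i] = g_orc_table[idx[i]];
        }
        memcpy(g_orc_ip_table, ip_copy, num_entries * sizeof(*ip_copy));
        memcpy(g_orc_table, orc_copy, num_entries * sizeof(*orc_copy));
        rc = 0;
out:
        free(idx);
        free(ip_copy);
        free(orc_copy);
        return rc;
}

The 70ms-to-20ms improvement Shile reports further down the thread comes
from switching to this kind of index-plus-qsort approach.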
On Fri, Nov 15, 2019 at 04:51:24PM +0000, David Laight wrote:
> From: Shile Zhang
> > Sent: 15 November 2019 06:48
> ...
> > arch/x86/kernel/unwind_orc.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> > index 332ae6530fa8..280da6fa9922 100644
> > --- a/arch/x86/kernel/unwind_orc.c
> > +++ b/arch/x86/kernel/unwind_orc.c
> > @@ -273,9 +273,11 @@ void __init unwind_init(void)
> > return;
> > }
> >
> > - /* Sort the .orc_unwind and .orc_unwind_ip tables: */
> > - sort(__start_orc_unwind_ip, num_entries, sizeof(int), orc_sort_cmp,
> > - orc_sort_swap);
> > + /*
> > + * Note, orc_unwind and orc_unwind_ip tables has been sorted in
> > + * vmlinux link phase by sorttable tool at build time.
> > + * Its ready for binary search now.
> > + */
>
> How fast is sort() if the table is sorted?
> Relying on the kernel sources and build scripts always being in sync seems dangerous.
> Probably better to leave the sort in for a release of two.
This patch comes after the build script changes, so they'd be in sync.
What would the concern be?
--
Josh
From: Josh Poimboeuf <[email protected]>
> Sent: 15 November 2019 17:47
> On Fri, Nov 15, 2019 at 04:51:24PM +0000, David Laight wrote:
> > From: Shile Zhang
> > > Sent: 15 November 2019 06:48
> > ...
> > > arch/x86/kernel/unwind_orc.c | 8 +++++---
> > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> > > index 332ae6530fa8..280da6fa9922 100644
> > > --- a/arch/x86/kernel/unwind_orc.c
> > > +++ b/arch/x86/kernel/unwind_orc.c
> > > @@ -273,9 +273,11 @@ void __init unwind_init(void)
> > > return;
> > > }
> > >
> > > - /* Sort the .orc_unwind and .orc_unwind_ip tables: */
> > > - sort(__start_orc_unwind_ip, num_entries, sizeof(int), orc_sort_cmp,
> > > - orc_sort_swap);
> > > + /*
> > > + * Note, orc_unwind and orc_unwind_ip tables has been sorted in
> > > + * vmlinux link phase by sorttable tool at build time.
> > > + * Its ready for binary search now.
> > > + */
> >
> > How fast is sort() if the table is sorted?
> > Relying on the kernel sources and build scripts always being in sync seems dangerous.
> > Probably better to leave the sort in for a release of two.
>
> This patch comes after the build script changes, so they'd be in sync.
> What would the concern be?
Mostly that if, for any reason, the build script changes are missing,
nothing will detect the error, but the results will be very confusing.
If the sort is fast for sorted inputs (some algorithms aren't) then leaving
it in won't take that long.
David
On 2019/11/18 18:05, David Laight wrote:
> From: Josh Poimboeuf <[email protected]>
>> Sent: 15 November 2019 17:47
>> On Fri, Nov 15, 2019 at 04:51:24PM +0000, David Laight wrote:
>>> From: Shile Zhang
>>>> Sent: 15 November 2019 06:48
>>> ...
>>>> arch/x86/kernel/unwind_orc.c | 8 +++++---
>>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
>>>> index 332ae6530fa8..280da6fa9922 100644
>>>> --- a/arch/x86/kernel/unwind_orc.c
>>>> +++ b/arch/x86/kernel/unwind_orc.c
>>>> @@ -273,9 +273,11 @@ void __init unwind_init(void)
>>>> return;
>>>> }
>>>>
>>>> - /* Sort the .orc_unwind and .orc_unwind_ip tables: */
>>>> - sort(__start_orc_unwind_ip, num_entries, sizeof(int), orc_sort_cmp,
>>>> - orc_sort_swap);
>>>> + /*
>>>> + * Note, orc_unwind and orc_unwind_ip tables has been sorted in
>>>> + * vmlinux link phase by sorttable tool at build time.
>>>> + * Its ready for binary search now.
>>>> + */
>>> How fast is sort() if the table is sorted?
>>> Relying on the kernel sources and build scripts always being in sync seems dangerous.
>>> Probably better to leave the sort in for a release of two.
>> This patch comes after the build script changes, so they'd be in sync.
>> What would the concern be?
> Mostly that if, for any reason, the build script changes are missing nothing
> will detect the error - but the results will be very confusing.
> If the sort is fast for sorted inputs (some algorithms aren't) then leaving
> it in won't take that long.
>
> David
Hi, David,
Thanks for your review!
Because the sort inside the kernel is heapsort, it costs almost the same
time for sorted inputs: heapsort always does the full heapify and
extraction passes regardless of the initial order.
I wonder if we could add error handling to the link script instead, so
that the build exits with an error if the sort fails.
Thanks!
On 2019/11/16 01:24, Andy Lutomirski wrote:
>> On Nov 15, 2019, at 1:43 AM, Shile Zhang <[email protected]> wrote:
>>
>>
>>
>>> On 2019/11/15 17:07, Peter Zijlstra wrote:
>>>> On Fri, Nov 15, 2019 at 02:47:49PM +0800, Shile Zhang wrote:
>>>>
>>>> +#if defined(SORTTABLE_64) && defined(UNWINDER_ORC_ENABLED)
>>>> +/* ORC unwinder only support X86_64 */
>>>> +#include <errno.h>
>>>> +#include <pthread.h>
>>>> +#include <linux/types.h>
>>>> +
>>>> +#define ORC_REG_UNDEFINED 0
>>>> +#define ERRSTRING_MAXSZ 256
>>>> +
>>>> +struct orc_entry {
>>>> + s16 sp_offset;
>>>> + s16 bp_offset;
>>>> + unsigned sp_reg:4;
>>>> + unsigned bp_reg:4;
>>>> + unsigned type:2;
>>>> + unsigned end:1;
>>>> +} __attribute__((packed));
>>>> +
>>>> +struct orctable_info {
>>>> + size_t orc_size;
>>>> + size_t orc_ip_size;
>>>> +} orctable;
>>> There's ./arch/x86/include/asm/orc_types.h for this. Please don't
>>> duplicate. objtool uses that same header.
>> Good catch! Thanks for your kindly reminder! I'll remove it.
>>>> +/**
>>>> + * sort - sort an array of elements
>>>> + * @base: pointer to data to sort
>>>> + * @num: number of elements
>>>> + * @size: size of each element
>>>> + * @cmp_func: pointer to comparison function
>>>> + * @swap_func: pointer to swap function
>>>> + *
>>>> + * This function does a heapsort on the given array. You may provide a
>>>> + * swap_func function optimized to your element type.
>>>> + *
>>>> + * Sorting time is O(n log n) both on average and worst-case. While
>>>> + * qsort is about 20% faster on average, it suffers from exploitable
>>>> + * O(n*n) worst-case behavior and extra memory requirements that make
>>>> + * it less suitable for kernel use.
>>>> + *
>>>> + * This code token out of /lib/sort.c.
>>>> + */
>>>> +static void sort(void *base, size_t num, size_t size,
>>>> + int (*cmp_func)(const void *, const void *),
>>>> + void (*swap_func)(void *, void *, int size))
>>>> +{
>>>> + /* pre-scale counters for performance */
>>>> + int i = (num/2 - 1) * size, n = num * size, c, r;
>>>> +
>>>> + /* heapify */
>>>> + for ( ; i >= 0; i -= size) {
>>>> + for (r = i; r * 2 + size < n; r = c) {
>>>> + c = r * 2 + size;
>>>> + if (c < n - size &&
>>>> + cmp_func(base + c, base + c + size) < 0)
>>>> + c += size;
>>>> + if (cmp_func(base + r, base + c) >= 0)
>>>> + break;
>>>> + swap_func(base + r, base + c, size);
>>>> + }
>>>> + }
>>>> +
>>>> + /* sort */
>>>> + for (i = n - size; i > 0; i -= size) {
>>>> + swap_func(base, base + i, size);
>>>> + for (r = 0; r * 2 + size < i; r = c) {
>>>> + c = r * 2 + size;
>>>> + if (c < i - size &&
>>>> + cmp_func(base + c, base + c + size) < 0)
>>>> + c += size;
>>>> + if (cmp_func(base + r, base + c) >= 0)
>>>> + break;
>>>> + swap_func(base + r, base + c, size);
>>>> + }
>>>> + }
>>>> +}
>>> Do we really need to copy the heapsort implementation? That is, why not
>>> use libc's qsort() ? This is userspace after all.
>> Yes, I think qsort is better choice than copy-paste here. But qsort does not support customized swap func, which is needed for ORC unwind swap two tables together.
>> I think it's hard to do with qsort, so I used sort same with original orc unwind table sort.
> One solution is to make an array of indices 0, 1, 2, etc, and sort that using a comparison function that compares i,j by actually comparing source[i], source[j]. (Or use pointers, but indices are probably faster on a 64-bit machine if you can use 32-bit indices.) Then, after sorting, permute the original array using the now-sorted indices. In the case where swapping is expensive, this is actually faster, since it does exactly n moves instead of O(n log n).
Hi, Andy,
Thanks for your suggestion!
It works: qsort is faster than the heapsort here, with the sort time
going down from 70ms to 20ms.
I'll update it in the next version.
Thanks again!
On Mon, Nov 18, 2019 at 10:05:02AM +0000, David Laight wrote:
> From: Josh Poimboeuf <[email protected]>
> > Sent: 15 November 2019 17:47
> > On Fri, Nov 15, 2019 at 04:51:24PM +0000, David Laight wrote:
> > > From: Shile Zhang
> > > > Sent: 15 November 2019 06:48
> > > ...
> > > > arch/x86/kernel/unwind_orc.c | 8 +++++---
> > > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> > > > index 332ae6530fa8..280da6fa9922 100644
> > > > --- a/arch/x86/kernel/unwind_orc.c
> > > > +++ b/arch/x86/kernel/unwind_orc.c
> > > > @@ -273,9 +273,11 @@ void __init unwind_init(void)
> > > > return;
> > > > }
> > > >
> > > > - /* Sort the .orc_unwind and .orc_unwind_ip tables: */
> > > > - sort(__start_orc_unwind_ip, num_entries, sizeof(int), orc_sort_cmp,
> > > > - orc_sort_swap);
> > > > + /*
> > > > + * Note, orc_unwind and orc_unwind_ip tables has been sorted in
> > > > + * vmlinux link phase by sorttable tool at build time.
> > > > + * Its ready for binary search now.
> > > > + */
> > >
> > > How fast is sort() if the table is sorted?
> > > Relying on the kernel sources and build scripts always being in sync seems dangerous.
> > > Probably better to leave the sort in for a release of two.
> >
> > This patch comes after the build script changes, so they'd be in sync.
> > What would the concern be?
>
> Mostly that if, for any reason, the build script changes are missing nothing
> will detect the error - but the results will be very confusing.
> If the sort is fast for sorted inputs (some algorithms aren't) then leaving
> it in won't take that long.
But why would the build script changes be missing...
And it should fail gracefully for oopses anyway: stack traces will just
have a bunch of question marks.
--
Josh