LinuxLists.cc - [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi

2015-11-21 01:24:09

Subject: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

This is a respin of a patch series from about a year ago[1]. I realized
that we already had most of the code in recordmcount to figure out
where we make calls to particular functions, so recording where
we make calls to the integer division functions should be easy enough
to add support for in the same codepaths. Looking back on the thread
it seems like Mans was thinking along the same lines, although it wasn't
obvious to me back then or even over the last few days when I wrote this.

This series extends recordmcount to record the locations of the calls
to the library functions on ARM builds, and puts those locations into a
table that we use to patch instructions at boot. The first two patches
are the recordmcount changes, while the last patch implements the runtime
patching for modules and kernel code. The module part also hooks into the
relocation patching code we already have.

The RFC tag is because I'm thinking of splitting the recordmcount changes
into a new program based on recordmcount so that we don't drag in a lot
of corner cases and stuff when we don't need to. I suspect it will be
cleaner that way too. Does anyone prefer one way or the other?

Comments/feedback appreciated.

[1] http://lkml.kernel.org/r/[email protected]

Cc: Nicolas Pitre <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Måns Rullgård <[email protected]>

Stephen Boyd (3):
scripts: Allow recordmcount to be used without tracing enabled
recordmcount: Record locations of __aeabi_{u}idiv() calls on ARM
ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

Makefile | 7 +
arch/arm/Kconfig | 14 ++
arch/arm/kernel/module.c | 44 ++++++
arch/arm/kernel/setup.c | 34 +++++
arch/arm/kernel/vmlinux.lds.S | 13 ++
kernel/trace/Kconfig | 4 +
scripts/Makefile.build | 15 +-
scripts/recordmcount.c | 10 +-
scripts/recordmcount.h | 337 +++++++++++++++++++++++++++++++++---------
scripts/recordmcount.pl | 11 +-
10 files changed, 406 insertions(+), 83 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2015-11-21 01:23:22

by Stephen Boyd

[permalink] [raw]

Subject: [RFC/PATCH 1/3] scripts: Allow recordmcount to be used without tracing enabled

In the next patch we're going to modify recordmcount to also
record locations of calls to __aeabi_{u}idiv(). Lay the
groundwork for this by adding a flag to recordmcount that
indicates if we're expected to find calls to mcount or not.

Cc: Nicolas Pitre <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Måns Rullgård <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
---
kernel/trace/Kconfig | 4 ++++
scripts/Makefile.build | 15 +++++----------
scripts/recordmcount.c | 10 +++++++---
scripts/recordmcount.h | 2 +-
scripts/recordmcount.pl | 11 ++++++++---
5 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8d6363f42169..578b666ed7d9 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -57,6 +57,10 @@ config HAVE_C_RECORDMCOUNT
help
C version of recordmcount available?

+config RUN_RECORDMCOUNT
+ def_bool y
+ depends on DYNAMIC_FTRACE && HAVE_FTRACE_MCOUNT_RECORD
+
config TRACER_MAX_TRACE
bool

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 01df30af4d4a..22f2eb10d434 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -210,7 +210,7 @@ cmd_modversions = \
fi;
endif

-ifdef CONFIG_FTRACE_MCOUNT_RECORD
+ifdef CONFIG_RUN_RECORDMCOUNT
ifdef BUILD_C_RECORDMCOUNT
ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
RECORDMCOUNT_FLAGS = -w
@@ -219,26 +219,21 @@ endif
# The empty.o file is created in the make process in order to determine
# the target endianness and word size. It is made before all other C
# files, including recordmcount.
-sub_cmd_record_mcount = \
+cmd_record_mcount = \
if [ $(@) != "scripts/mod/empty.o" ]; then \
- $(objtree)/scripts/recordmcount $(RECORDMCOUNT_FLAGS) "$(@)"; \
+ $(objtree)/scripts/recordmcount $(RECORDMCOUNT_FLAGS) $(if $(findstring $(CC_FLAGS_FTRACE),$(_c_flags)),-t,) "$(@)"; \
fi;
recordmcount_source := $(srctree)/scripts/recordmcount.c \
$(srctree)/scripts/recordmcount.h
else
-sub_cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \
+cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \
"$(if $(CONFIG_CPU_BIG_ENDIAN),big,little)" \
"$(if $(CONFIG_64BIT),64,32)" \
"$(OBJDUMP)" "$(OBJCOPY)" "$(CC) $(KBUILD_CFLAGS)" \
"$(LD)" "$(NM)" "$(RM)" "$(MV)" \
- "$(if $(part-of-module),1,0)" "$(@)";
+ "$(if $(part-of-module),1,0)" "$(if $(findstring $(CC_FLAGS_FTRACE),$(_c_flags)),1,0)" "$(@)";
recordmcount_source := $(srctree)/scripts/recordmcount.pl
endif
-cmd_record_mcount = \
- if [ "$(findstring $(CC_FLAGS_FTRACE),$(_c_flags))" = \
- "$(CC_FLAGS_FTRACE)" ]; then \
- $(sub_cmd_record_mcount) \
- fi;
endif

define rule_cc_o_c
diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index 698768bdc581..b6b4a5df647a 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -54,6 +54,7 @@ static struct stat sb; /* Remember .st_size, etc. */
static jmp_buf jmpenv; /* setjmp/longjmp per-file error escape */
static const char *altmcount; /* alternate mcount symbol name */
static int warn_on_notrace_sect; /* warn when section has mcount not being recorded */
+static int trace_mcount; /* Record mcount callers */

/* setjmp() return values */
enum {
@@ -453,19 +454,22 @@ main(int argc, char *argv[])
int c;
int i;

- while ((c = getopt(argc, argv, "w")) >= 0) {
+ while ((c = getopt(argc, argv, "wt")) >= 0) {
switch (c) {
case 'w':
warn_on_notrace_sect = 1;
break;
+ case 't':
+ trace_mcount = 1;
+ break;
default:
- fprintf(stderr, "usage: recordmcount [-w] file.o...\n");
+ fprintf(stderr, "usage: recordmcount [-wt] file.o...\n");
return 0;
}
}

if ((argc - optind) < 1) {
- fprintf(stderr, "usage: recordmcount [-w] file.o...\n");
+ fprintf(stderr, "usage: recordmcount [-wt] file.o...\n");
return 0;
}

diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
index b9897e2be404..6e196dba748d 100644
--- a/scripts/recordmcount.h
+++ b/scripts/recordmcount.h
@@ -323,7 +323,7 @@ static uint_t *sift_rel_mcount(uint_t *mlocp,
get_sym_str_and_relp(relhdr, ehdr, &sym0, &str0, &relp);

for (t = nrel; t; --t) {
- if (!mcountsym)
+ if (trace_mcount && !mcountsym)
mcountsym = get_mcountsym(sym0, relp, str0);

if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp)) {
diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 826470d7f000..cff3040ddbdc 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -113,20 +113,25 @@ $P =~ s@.*/@@g;

my $V = '0.1';

-if ($#ARGV != 11) {
- print "usage: $P arch endian bits objdump objcopy cc ld nm rm mv is_module inputfile\n";
+if ($#ARGV != 12) {
+ print "usage: $P arch endian bits objdump objcopy cc ld nm rm mv is_module is_traced inputfile\n";
print "version: $V\n";
exit(1);
}

my ($arch, $endian, $bits, $objdump, $objcopy, $cc,
- $ld, $nm, $rm, $mv, $is_module, $inputfile) = @ARGV;
+ $ld, $nm, $rm, $mv, $is_module, $is_traced, $inputfile) = @ARGV;

# This file refers to mcount and shouldn't be ftraced, so lets' ignore it
if ($inputfile =~ m,kernel/trace/ftrace\.o$,) {
exit(0);
}

+# We only trace mcount calls
+if ($is_traced eq "0") {
+ exit(0);
+}
+
# Acceptable sections to record.
my %text_sections = (
".text" => 1,
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2015-11-21 01:24:21

by Stephen Boyd

[permalink] [raw]

Subject: [RFC/PATCH 2/3] recordmcount: Record locations of __aeabi_{u}idiv() calls on ARM

The ARM compiler inserts calls to __aeabi_uidiv() and
__aeabi_idiv() when it needs to perform division on signed and
unsigned integers. If a processor has support for the udiv and
sdiv division instructions the calls to these support routines
can be replaced with those instructions. Therefore, record the
location of calls to these library functions into two sections
(one for udiv and one for sdiv) similar to how we trace calls to
mcount. When the kernel boots up it will check to see if the
processor supports the instructions and then patch the call sites
with the instruction.

Cc: Nicolas Pitre <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Måns Rullgård <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
---
scripts/recordmcount.h | 335 +++++++++++++++++++++++++++++++++++++++----------
1 file changed, 269 insertions(+), 66 deletions(-)

diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
index 6e196dba748d..cab91ffc82a6 100644
--- a/scripts/recordmcount.h
+++ b/scripts/recordmcount.h
@@ -18,11 +18,13 @@
*
* Licensed under the GNU General Public License, version 2 (GPLv2).
*/
+#undef append_section
#undef append_func
#undef is_fake_mcount
#undef fn_is_fake_mcount
#undef MIPS_is_fake_mcount
#undef mcount_adjust
+#undef add_relocation
#undef sift_rel_mcount
#undef nop_mcount
#undef find_secsym_ndx
@@ -30,6 +32,7 @@
#undef has_rel_mcount
#undef tot_relsize
#undef get_mcountsym
+#undef get_arm_sym
#undef get_sym_str_and_relp
#undef do_func
#undef Elf_Addr
@@ -52,7 +55,9 @@
#undef _size

#ifdef RECORD_MCOUNT_64
+# define append_section append_section64
# define append_func append64
+# define add_relocation add_relocation_64
# define sift_rel_mcount sift64_rel_mcount
# define nop_mcount nop_mcount_64
# define find_secsym_ndx find64_secsym_ndx
@@ -62,6 +67,7 @@
# define get_sym_str_and_relp get_sym_str_and_relp_64
# define do_func do64
# define get_mcountsym get_mcountsym_64
+# define get_arm_sym get_arm_sym_64
# define is_fake_mcount is_fake_mcount64
# define fn_is_fake_mcount fn_is_fake_mcount64
# define MIPS_is_fake_mcount MIPS64_is_fake_mcount
@@ -85,7 +91,9 @@
# define _align 7u
# define _size 8
#else
+# define append_section append_section32
# define append_func append32
+# define add_relocation add_relocation_32
# define sift_rel_mcount sift32_rel_mcount
# define nop_mcount nop_mcount_32
# define find_secsym_ndx find32_secsym_ndx
@@ -95,6 +103,7 @@
# define get_sym_str_and_relp get_sym_str_and_relp_32
# define do_func do32
# define get_mcountsym get_mcountsym_32
+# define get_arm_sym get_arm_sym_32
# define is_fake_mcount is_fake_mcount32
# define fn_is_fake_mcount fn_is_fake_mcount32
# define MIPS_is_fake_mcount MIPS32_is_fake_mcount
@@ -174,6 +183,62 @@ static int MIPS_is_fake_mcount(Elf_Rel const *rp)
return is_fake;
}

+static void append_section(uint_t const *const mloc0,
+ uint_t const *const mlocp,
+ Elf_Rel const *const mrel0,
+ Elf_Rel const *const mrelp,
+ char const *name,
+ unsigned int const rel_entsize,
+ unsigned int const symsec_sh_link,
+ uint_t *name_offp,
+ uint_t *tp,
+ unsigned *shnump
+ )
+{
+ Elf_Shdr mcsec;
+ uint_t name_off = *name_offp;
+ uint_t t = *tp;
+ uint_t loc_diff = (void *)mlocp - (void *)mloc0;
+ uint_t rel_diff = (void *)mrelp - (void *)mrel0;
+ unsigned shnum = *shnump;
+
+ mcsec.sh_name = w((sizeof(Elf_Rela) == rel_entsize) + strlen(".rel")
+ + name_off);
+ mcsec.sh_type = w(SHT_PROGBITS);
+ mcsec.sh_flags = _w(SHF_ALLOC);
+ mcsec.sh_addr = 0;
+ mcsec.sh_offset = _w(t);
+ mcsec.sh_size = _w(loc_diff);
+ mcsec.sh_link = 0;
+ mcsec.sh_info = 0;
+ mcsec.sh_addralign = _w(_size);
+ mcsec.sh_entsize = _w(_size);
+ uwrite(fd_map, &mcsec, sizeof(mcsec));
+ t += loc_diff;
+
+ mcsec.sh_name = w(name_off);
+ mcsec.sh_type = (sizeof(Elf_Rela) == rel_entsize)
+ ? w(SHT_RELA)
+ : w(SHT_REL);
+ mcsec.sh_flags = 0;
+ mcsec.sh_addr = 0;
+ mcsec.sh_offset = _w(t);
+ mcsec.sh_size = _w(rel_diff);
+ mcsec.sh_link = w(symsec_sh_link);
+ mcsec.sh_info = w(shnum);
+ mcsec.sh_addralign = _w(_size);
+ mcsec.sh_entsize = _w(rel_entsize);
+ uwrite(fd_map, &mcsec, sizeof(mcsec));
+ t += rel_diff;
+
+ shnum += 2;
+ name_off += strlen(name) + 1;
+
+ *tp = t;
+ *shnump = shnum;
+ *name_offp = name_off;
+}
+
/* Append the new shstrtab, Elf_Shdr[], __mcount_loc and its relocations. */
static void append_func(Elf_Ehdr *const ehdr,
Elf_Shdr *const shstr,
@@ -181,20 +246,50 @@ static void append_func(Elf_Ehdr *const ehdr,
uint_t const *const mlocp,
Elf_Rel const *const mrel0,
Elf_Rel const *const mrelp,
+ uint_t const *const mloc0_u,
+ uint_t const *const mlocp_u,
+ Elf_Rel const *const mrel0_u,
+ Elf_Rel const *const mrelp_u,
+ uint_t const *const mloc0_i,
+ uint_t const *const mlocp_i,
+ Elf_Rel const *const mrel0_i,
+ Elf_Rel const *const mrelp_i,
unsigned int const rel_entsize,
unsigned int const symsec_sh_link)
{
/* Begin constructing output file */
- Elf_Shdr mcsec;
char const *mc_name = (sizeof(Elf_Rela) == rel_entsize)
? ".rela__mcount_loc"
: ".rel__mcount_loc";
- unsigned const old_shnum = w2(ehdr->e_shnum);
+ char const *udiv_name = (sizeof(Elf_Rela) == rel_entsize)
+ ? ".rela__udiv_loc"
+ : ".rel__udiv_loc";
+ char const *idiv_name = (sizeof(Elf_Rela) == rel_entsize)
+ ? ".rela__idiv_loc"
+ : ".rel__idiv_loc";
+ unsigned old_shnum = w2(ehdr->e_shnum);
uint_t const old_shoff = _w(ehdr->e_shoff);
uint_t const old_shstr_sh_size = _w(shstr->sh_size);
uint_t const old_shstr_sh_offset = _w(shstr->sh_offset);
- uint_t t = 1 + strlen(mc_name) + _w(shstr->sh_size);
uint_t new_e_shoff;
+ uint_t t = _w(shstr->sh_size);
+ uint_t name_off = old_shstr_sh_size;
+ int mc = 0, udiv = 0, idiv = 0;
+ int num_sections;
+
+ if (mlocp != mloc0) {
+ t += 1 + strlen(mc_name);
+ mc = 1;
+ }
+ if (mlocp_u != mloc0_u) {
+ t += 1 + strlen(udiv_name);
+ udiv = 1;
+ }
+ if (mlocp_i != mloc0_i) {
+ t += 1 + strlen(idiv_name);
+ idiv = 1;
+ }
+ num_sections = (mc * 2) + (udiv * 2) + (idiv * 2);

shstr->sh_size = _w(t);
shstr->sh_offset = _w(sb.st_size);
@@ -204,8 +299,13 @@ static void append_func(Elf_Ehdr *const ehdr,

/* body for new shstrtab */
ulseek(fd_map, sb.st_size, SEEK_SET);
- uwrite(fd_map, old_shstr_sh_offset + (void *)ehdr, old_shstr_sh_size);
- uwrite(fd_map, mc_name, 1 + strlen(mc_name));
+ uwrite(fd_map, old_shstr_sh_offset + (void *)ehdr, name_off);
+ if (mc)
+ uwrite(fd_map, mc_name, 1 + strlen(mc_name));
+ if (udiv)
+ uwrite(fd_map, udiv_name, 1 + strlen(udiv_name));
+ if (idiv)
+ uwrite(fd_map, idiv_name, 1 + strlen(idiv_name));

/* old(modified) Elf_Shdr table, word-byte aligned */
ulseek(fd_map, t, SEEK_SET);
@@ -214,39 +314,38 @@ static void append_func(Elf_Ehdr *const ehdr,
sizeof(Elf_Shdr) * old_shnum);

/* new sections __mcount_loc and .rel__mcount_loc */
- t += 2*sizeof(mcsec);
- mcsec.sh_name = w((sizeof(Elf_Rela) == rel_entsize) + strlen(".rel")
- + old_shstr_sh_size);
- mcsec.sh_type = w(SHT_PROGBITS);
- mcsec.sh_flags = _w(SHF_ALLOC);
- mcsec.sh_addr = 0;
- mcsec.sh_offset = _w(t);
- mcsec.sh_size = _w((void *)mlocp - (void *)mloc0);
- mcsec.sh_link = 0;
- mcsec.sh_info = 0;
- mcsec.sh_addralign = _w(_size);
- mcsec.sh_entsize = _w(_size);
- uwrite(fd_map, &mcsec, sizeof(mcsec));
-
- mcsec.sh_name = w(old_shstr_sh_size);
- mcsec.sh_type = (sizeof(Elf_Rela) == rel_entsize)
- ? w(SHT_RELA)
- : w(SHT_REL);
- mcsec.sh_flags = 0;
- mcsec.sh_addr = 0;
- mcsec.sh_offset = _w((void *)mlocp - (void *)mloc0 + t);
- mcsec.sh_size = _w((void *)mrelp - (void *)mrel0);
- mcsec.sh_link = w(symsec_sh_link);
- mcsec.sh_info = w(old_shnum);
- mcsec.sh_addralign = _w(_size);
- mcsec.sh_entsize = _w(rel_entsize);
- uwrite(fd_map, &mcsec, sizeof(mcsec));
-
- uwrite(fd_map, mloc0, (void *)mlocp - (void *)mloc0);
- uwrite(fd_map, mrel0, (void *)mrelp - (void *)mrel0);
+ t += num_sections * sizeof(Elf_Shdr);
+ if (mc)
+ append_section(mloc0, mlocp, mrel0, mrelp, mc_name, rel_entsize,
+ symsec_sh_link, &name_off, &t, &old_shnum);
+
+ /* new sections __udiv_loc and .rel__udiv_loc */
+ if (udiv)
+ append_section(mloc0_u, mlocp_u, mrel0_u, mrelp_u, udiv_name,
+ rel_entsize, symsec_sh_link, &name_off, &t,
+ &old_shnum);
+
+ /* new sections __idiv_loc and .rel__idiv_loc */
+ if (idiv)
+ append_section(mloc0_i, mlocp_i, mrel0_i, mrelp_i, idiv_name,
+ rel_entsize, symsec_sh_link, &name_off, &t,
+ &old_shnum);
+
+ if (mc) {
+ uwrite(fd_map, mloc0, (void *)mlocp - (void *)mloc0);
+ uwrite(fd_map, mrel0, (void *)mrelp - (void *)mrel0);
+ }
+ if (udiv) {
+ uwrite(fd_map, mloc0_u, (void *)mlocp_u - (void *)mloc0_u);
+ uwrite(fd_map, mrel0_u, (void *)mrelp_u - (void *)mrel0_u);
+ }
+ if (idiv) {
+ uwrite(fd_map, mloc0_i, (void *)mlocp_i - (void *)mloc0_i);
+ uwrite(fd_map, mrel0_i, (void *)mrelp_i - (void *)mrel0_i);
+ }

ehdr->e_shoff = _w(new_e_shoff);
- ehdr->e_shnum = w2(2 + w2(ehdr->e_shnum)); /* {.rel,}__mcount_loc */
+ ehdr->e_shnum = w2(num_sections + w2(ehdr->e_shnum));
ulseek(fd_map, 0, SEEK_SET);
uwrite(fd_map, ehdr, sizeof(*ehdr));
}
@@ -273,6 +372,20 @@ static unsigned get_mcountsym(Elf_Sym const *const sym0,
return mcountsym;
}

+static unsigned get_arm_sym(Elf_Sym const *const sym0,
+ Elf_Rel const *relp,
+ char const *const str0, const char *find)
+{
+ unsigned sym = 0;
+ Elf_Sym const *const symp = &sym0[Elf_r_sym(relp)];
+ char const *symname = &str0[w(symp->st_name)];
+
+ if (strcmp(find, symname) == 0)
+ sym = Elf_r_sym(relp);
+
+ return sym;
+}
+
static void get_sym_str_and_relp(Elf_Shdr const *const relhdr,
Elf_Ehdr const *const ehdr,
Elf_Sym const **sym0,
@@ -296,28 +409,65 @@ static void get_sym_str_and_relp(Elf_Shdr const *const relhdr,
*relp = rel0;
}

+static void add_relocation(Elf_Rel const *relp, uint_t *mloc0, uint_t **mlocpp,
+ uint_t const recval, unsigned const recsym,
+ Elf_Rel **const mrelpp, unsigned offbase,
+ unsigned rel_entsize, unsigned const reltype)
+{
+ uint_t *mlocp = *mlocpp;
+ Elf_Rel *mrelp = *mrelpp;
+ uint_t const addend = _w(_w(relp->r_offset) - recval + mcount_adjust);
+ mrelp->r_offset = _w(offbase + ((void *)mlocp - (void *)mloc0));
+ Elf_r_info(mrelp, recsym, reltype);
+ if (rel_entsize == sizeof(Elf_Rela)) {
+ ((Elf_Rela *)mrelp)->r_addend = addend;
+ *mlocp++ = 0;
+ } else
+ *mlocp++ = addend;
+
+ *mlocpp = mlocp;
+ *mrelpp = (Elf_Rel *)(rel_entsize + (void *)mrelp);
+}
+
/*
* Look at the relocations in order to find the calls to mcount.
* Accumulate the section offsets that are found, and their relocation info,
* onto the end of the existing arrays.
*/
-static uint_t *sift_rel_mcount(uint_t *mlocp,
- unsigned const offbase,
+static void sift_rel_mcount(uint_t **mlocpp,
+ uint_t *mloc_base,
Elf_Rel **const mrelpp,
+ uint_t **mlocpp_u,
+ uint_t *mloc_base_u,
+ Elf_Rel **const mrelpp_u,
+ uint_t **mlocpp_i,
+ uint_t *mloc_base_i,
+ Elf_Rel **const mrelpp_i,
Elf_Shdr const *const relhdr,
Elf_Ehdr const *const ehdr,
unsigned const recsym,
uint_t const recval,
unsigned const reltype)
{
+ uint_t *mlocp = *mlocpp;
+ unsigned const offbase = (void *)mlocp - (void *)mloc_base;
uint_t *const mloc0 = mlocp;
Elf_Rel *mrelp = *mrelpp;
+ uint_t *mlocp_u = *mlocpp_u;
+ unsigned const offbase_u = (void *)mlocp_u - (void *)mloc_base_u;
+ uint_t *const mloc0_u = mlocp_u;
+ Elf_Rel *mrelp_u = *mrelpp_u;
+ uint_t *mlocp_i = *mlocpp_i;
+ unsigned const offbase_i = (void *)mlocp_i - (void *)mloc_base_i;
+ uint_t *const mloc0_i = mlocp_i;
+ Elf_Rel *mrelp_i = *mrelpp_i;
Elf_Sym const *sym0;
char const *str0;
Elf_Rel const *relp;
unsigned rel_entsize = _w(relhdr->sh_entsize);
unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
- unsigned mcountsym = 0;
+ int arm = w2(ehdr->e_machine) == EM_ARM;
+ unsigned mcountsym = 0, udiv_sym = 0, idiv_sym =0;
unsigned t;

get_sym_str_and_relp(relhdr, ehdr, &sym0, &str0, &relp);
@@ -326,24 +476,53 @@ static uint_t *sift_rel_mcount(uint_t *mlocp,
if (trace_mcount && !mcountsym)
mcountsym = get_mcountsym(sym0, relp, str0);

- if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp)) {
- uint_t const addend =
- _w(_w(relp->r_offset) - recval + mcount_adjust);
- mrelp->r_offset = _w(offbase
- + ((void *)mlocp - (void *)mloc0));
- Elf_r_info(mrelp, recsym, reltype);
- if (rel_entsize == sizeof(Elf_Rela)) {
- ((Elf_Rela *)mrelp)->r_addend = addend;
- *mlocp++ = 0;
- } else
- *mlocp++ = addend;
-
- mrelp = (Elf_Rel *)(rel_entsize + (void *)mrelp);
+ if (arm && !udiv_sym)
+ udiv_sym = get_arm_sym(sym0, relp, str0,
+ "__aeabi_uidiv");
+ if (arm && !idiv_sym)
+ idiv_sym = get_arm_sym(sym0, relp, str0,
+ "__aeabi_idiv");
+
+ if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp))
+ add_relocation(relp, mloc0, &mlocp, recval, recsym,
+ &mrelp, offbase, rel_entsize, reltype);
+
+ if (udiv_sym == Elf_r_sym(relp)) {
+ switch (relp->r_info & 0xff) {
+ case R_ARM_PC24:
+ case 28:
+ case 29:
+ add_relocation(relp, mloc0_u, &mlocp_u, recval,
+ recsym, &mrelp_u, offbase_u,
+ rel_entsize, reltype);
+ break;
+ default:
+ break;
+ }
}
+
+ if (idiv_sym == Elf_r_sym(relp)) {
+ switch (relp->r_info & 0xff) {
+ case R_ARM_PC24:
+ case 28:
+ case 29:
+ add_relocation(relp, mloc0_i, &mlocp_i, recval,
+ recsym, &mrelp_i, offbase_i,
+ rel_entsize, reltype);
+ break;
+ default:
+ break;
+ }
+ }
+
relp = (Elf_Rel const *)(rel_entsize + (void *)relp);
}
+ *mrelpp_i = mrelp_i;
+ *mrelpp_u = mrelp_u;
*mrelpp = mrelp;
- return mlocp;
+ *mlocpp_i = mlocp_i;
+ *mlocpp_u = mlocp_u;
+ *mlocpp = mlocp;
}

/*
@@ -452,14 +631,14 @@ static char const *
__has_rel_mcount(Elf_Shdr const *const relhdr, /* is SHT_REL or SHT_RELA */
Elf_Shdr const *const shdr0,
char const *const shstrtab,
- char const *const fname)
+ char const *const fname, const char *find)
{
/* .sh_info depends on .sh_type == SHT_REL[,A] */
Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
char const *const txtname = &shstrtab[w(txthdr->sh_name)];

- if (strcmp("__mcount_loc", txtname) == 0) {
- fprintf(stderr, "warning: __mcount_loc already exists: %s\n",
+ if (strcmp(find, txtname) == 0) {
+ fprintf(stderr, "warning: %s already exists: %s\n", find,
fname);
succeed_file();
}
@@ -472,25 +651,25 @@ __has_rel_mcount(Elf_Shdr const *const relhdr, /* is SHT_REL or SHT_RELA */
static char const *has_rel_mcount(Elf_Shdr const *const relhdr,
Elf_Shdr const *const shdr0,
char const *const shstrtab,
- char const *const fname)
+ char const *const fname, const char *find)
{
if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
return NULL;
- return __has_rel_mcount(relhdr, shdr0, shstrtab, fname);
+ return __has_rel_mcount(relhdr, shdr0, shstrtab, fname, find);
}

static unsigned tot_relsize(Elf_Shdr const *const shdr0,
unsigned nhdr,
const char *const shstrtab,
- const char *const fname)
+ const char *const fname, const char *find)
{
unsigned totrelsz = 0;
Elf_Shdr const *shdrp = shdr0;
char const *txtname;

for (; nhdr; --nhdr, ++shdrp) {
- txtname = has_rel_mcount(shdrp, shdr0, shstrtab, fname);
+ txtname = has_rel_mcount(shdrp, shdr0, shstrtab, fname, find);
if (txtname && is_mcounted_section_name(txtname))
totrelsz += _w(shdrp->sh_size);
}
@@ -513,7 +692,8 @@ do_func(Elf_Ehdr *const ehdr, char const *const fname, unsigned const reltype)
unsigned k;

/* Upper bound on space: assume all relevant relocs are for mcount. */
- unsigned const totrelsz = tot_relsize(shdr0, nhdr, shstrtab, fname);
+ unsigned const totrelsz = tot_relsize(shdr0, nhdr, shstrtab, fname,
+ "__mcount_loc");
Elf_Rel *const mrel0 = umalloc(totrelsz);
Elf_Rel * mrelp = mrel0;

@@ -521,12 +701,28 @@ do_func(Elf_Ehdr *const ehdr, char const *const fname, unsigned const reltype)
uint_t *const mloc0 = umalloc(totrelsz>>1);
uint_t * mlocp = mloc0;

+ /* Allocate for arm too */
+ Elf_Rel *const mrel0_u = umalloc(totrelsz);
+ Elf_Rel * mrelp_u = mrel0_u;
+
+ /* 2*sizeof(address) <= sizeof(Elf_Rel) */
+ uint_t *const mloc0_u = umalloc(totrelsz>>1);
+ uint_t * mlocp_u = mloc0_u;
+
+ /* Allocate for arm too */
+ Elf_Rel *const mrel0_i = umalloc(totrelsz);
+ Elf_Rel * mrelp_i = mrel0_i;
+
+ /* 2*sizeof(address) <= sizeof(Elf_Rel) */
+ uint_t *const mloc0_i = umalloc(totrelsz>>1);
+ uint_t * mlocp_i = mloc0_i;
+
unsigned rel_entsize = 0;
unsigned symsec_sh_link = 0;

for (relhdr = shdr0, k = nhdr; k; --k, ++relhdr) {
char const *const txtname = has_rel_mcount(relhdr, shdr0,
- shstrtab, fname);
+ shstrtab, fname, "__mcount_loc");
if (txtname && is_mcounted_section_name(txtname)) {
uint_t recval = 0;
unsigned const recsym = find_secsym_ndx(
@@ -535,9 +731,10 @@ do_func(Elf_Ehdr *const ehdr, char const *const fname, unsigned const reltype)
ehdr);

rel_entsize = _w(relhdr->sh_entsize);
- mlocp = sift_rel_mcount(mlocp,
- (void *)mlocp - (void *)mloc0, &mrelp,
- relhdr, ehdr, recsym, recval, reltype);
+ sift_rel_mcount(&mlocp, mloc0, &mrelp,
+ &mlocp_u, mloc0_u, &mrelp_u, &mlocp_i, mloc0_i,
+ &mrelp_i, relhdr, ehdr, recsym, recval,
+ reltype);
} else if (txtname && (warn_on_notrace_sect || make_nop)) {
/*
* This section is ignored by ftrace, but still
@@ -546,10 +743,16 @@ do_func(Elf_Ehdr *const ehdr, char const *const fname, unsigned const reltype)
nop_mcount(relhdr, ehdr, txtname);
}
}
- if (mloc0 != mlocp) {
+ if (mloc0 != mlocp || mloc0_u != mlocp_u || mloc0_i != mlocp_i) {
append_func(ehdr, shstr, mloc0, mlocp, mrel0, mrelp,
+ mloc0_u, mlocp_u, mrel0_u, mrelp_u,
+ mloc0_i, mlocp_i, mrel0_i, mrelp_i,
rel_entsize, symsec_sh_link);
}
free(mrel0);
free(mloc0);
+ free(mrel0_u);
+ free(mloc0_u);
+ free(mrel0_i);
+ free(mloc0_i);
}
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2015-11-21 01:24:14

by Stephen Boyd

[permalink] [raw]

On Sun, Nov 22, 2015 at 08:58:08PM +0100, Arnd Bergmann wrote:
> does it work with -mcpu=cortex-a15? I've tried crosstool as versions
> 2.23.52.20130913, 2.24.0.20141017 and 2.25.51.20150518, and they
> all seem to behave as expected, failing with -mcpu=cortex-a9 and
> marvell-pj4 but succeeding with -mcpu=cortex-a15 or marvell-pj4+idiv.

Appears not:

root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='cortex-a15+idiv' -marm
/tmp/ccSovg32.s: Assembler messages:
/tmp/ccSovg32.s:32: Error: selected processor does not support ARM mode `udiv ip,r5,r4'
root@cubox:~# gcc -O2 -o idiv idiv.c -Wa,-mcpu='cortex-a15+idiv' -mthumb
/tmp/cchbT3EE.s: Assembler messages:
/tmp/cchbT3EE.s:36: Error: selected processor does not support Thumb mode `udiv r6,r5,r4'

Same without the +idiv.

> I've also found some /proc/cpuinfo output to cross-reference SoCs
> to their core names.
>
> variant part revision name features
> mmp2: 0 0x581 5 PJ4 idivt
> dove: 0 0x581 5 PJ4 idivt

Yes, that agrees with my dove.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-11-22 20:37:49

[permalink] [raw]

Subject: Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

On 11/24, Arnd Bergmann wrote:
> On Monday 23 November 2015 15:13:52 Stephen Boyd wrote:
> > On 11/23, Arnd Bergmann wrote:
> > >
> > > - PJ4/PJ4B (not PJ4B-MP) has a different custom opcode for udiv and sdiv
> > > in ARM mode. We don't support that with true multiplatform kernels
> > > because those opcodes work nowhere else, though with your proposed
> > > series we could easily do that for dynamic patching.
> >
> > Do you have the information on these custom opcodes? I can work
> > that into the patches assuming the MIDR is different.
>
> Thomas Petazzoni said this in a private mail:
>
> | According to the datasheet, the PJ4B has integer signed and unsigned
> | divide, similar to the sdiv and udiv ARM instructions. But the way to
> | access it is by doing a MRC instruction.
> |
> | MRC<cond> p6, 1, Rd , CRn , CRm, 4
> |
> |for PJ4B is the same as:
> |
> | SDIV Rd , Rn, Rm
> |
> | on ARM cores.
> |
> |And:
> |
> | MRC<cond> p6, 1, Rd , CRn , CRm, 0
> |
> |for PJ4B is the same as:
> |
> | UDIV Rd , Rn, Rm
> |
> |on ARM cores.
> |
> |This is documented in the "Extended instructions" section of the
> |PJ4B datasheet.
>
> I assume what he meant was that this is true for both PJ4 and PJ4B
> but not for PJ4B-MP, which has the normal udiv/sdiv instructions.
>
> IOW, anything with CPU implementer 0x56 part 0x581 should use those,
> while part 0x584 can use the sdiv/udiv that it reports correctly.
>

It looks like we have some sort of function that mostly does
this, except it doesn't differentiate on that lower bit for 1 vs
4. I guess I'll write another one for that.

static inline int cpu_is_pj4(void)
{
unsigned int id;

id = read_cpuid_id();
if ((id & 0xff0fff00) == 0x560f5800)
return 1;

return 0;
}

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2015-11-25 07:22:15

by Arnd Bergmann

[permalink] [raw]

Subject: Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

On Tuesday 24 November 2015 17:51:37 Stephen Boyd wrote:
> On 11/24, Arnd Bergmann wrote:
> > On Monday 23 November 2015 15:13:52 Stephen Boyd wrote:
> > IOW, anything with CPU implementer 0x56 part 0x581 should use those,
> > while part 0x584 can use the sdiv/udiv that it reports correctly.
> >
>
> It looks like we have some sort of function that mostly does
> this, except it doesn't differentiate on that lower bit for 1 vs
> 4. I guess I'll write another one for that.
>
> static inline int cpu_is_pj4(void)
> {
> unsigned int id;
>
> id = read_cpuid_id();
> if ((id & 0xff0fff00) == 0x560f5800)
> return 1;
>
> return 0;
> }

Correct, thanks.

Arnd