2019-10-19 00:18:50

by Peter Zijlstra

Subject: [PATCH v4 15/16] module: Move where we mark modules RO,X

Now that set_all_modules_text_*() is gone, nothing depends on the
relation between ->state = COMING and the protection state anymore.
This enables moving the protection changes later, such that the COMING
notifier callbacks can more easily modify the text.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Jessica Yu <[email protected]>
---
kernel/module.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
/* This relies on module_mutex for list integrity. */
module_bug_finalize(info->hdr, info->sechdrs, mod);

- module_enable_ro(mod, false);
- module_enable_nx(mod);
- module_enable_x(mod);
-
/* Mark state as coming so strong_try_module_get() ignores us,
* but kallsyms etc. can see us. */
mod->state = MODULE_STATE_COMING;
@@ -3852,6 +3848,10 @@ static int load_module(struct load_info
if (err)
goto bug_cleanup;

+ module_enable_ro(mod, false);
+ module_enable_nx(mod);
+ module_enable_x(mod);
+
/* Module is ready to execute: parsing args may do that. */
after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
-32768, 32767, mod,



2019-10-21 13:54:47

by Josh Poimboeuf

Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote:
> Now that set_all_modules_text_*() is gone, nothing depends on the
> relation between ->state = COMING and the protection state anymore.
> This enables moving the protection changes later, such that the COMING
> notifier callbacks can more easily modify the text.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: Jessica Yu <[email protected]>
> ---
> kernel/module.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3683,10 +3683,6 @@ static int complete_formation(struct mod
> /* This relies on module_mutex for list integrity. */
> module_bug_finalize(info->hdr, info->sechdrs, mod);
>
> - module_enable_ro(mod, false);
> - module_enable_nx(mod);
> - module_enable_x(mod);
> -
> /* Mark state as coming so strong_try_module_get() ignores us,
> * but kallsyms etc. can see us. */
> mod->state = MODULE_STATE_COMING;
> @@ -3852,6 +3848,10 @@ static int load_module(struct load_info
> if (err)
> goto bug_cleanup;
>
> + module_enable_ro(mod, false);
> + module_enable_nx(mod);
> + module_enable_x(mod);
> +
> /* Module is ready to execute: parsing args may do that. */
> after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
> -32768, 32767, mod,

[ Sorry if this was already discussed, I still have a large backlog. ]

Doesn't livepatch code also need to be modified? We have:

prepare_coming_module()
klp_module_coming()
klp_init_object_loaded()
module_disable_ro()
...
module_enable_ro()

which is done right before the above patch does module_enable_ro().

We could remove the disable-RO from that case, though we'd still need it
for another case: late module patching, i.e. when a to-be-patched module
is loaded after the patch module and klp_module_coming() has to apply
relocations to the already-RO patch module.

--
Josh

2019-10-21 14:16:24

by Peter Zijlstra

Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:
> On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote:
> > [ patch trimmed ]
>
> [ Sorry if this was already discussed, I still have a large backlog. ]
>
> Doesn't livepatch code also need to be modified? We have:

Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
arm64/ftrace and klp are the only two users of that function (outside of
module.c) and Mark was already writing a patch for arm64.

Means these last two patches need to wait a little until we've fixed
those.

2019-10-21 15:35:25

by Peter Zijlstra

Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:
> > On Fri, Oct 18, 2019 at 09:35:40AM +0200, Peter Zijlstra wrote:
> > > [ patch trimmed ]
> >
> > [ Sorry if this was already discussed, I still have a large backlog. ]
> >
> > Doesn't livepatch code also need to be modified? We have:
>
> Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
> arm64/ftrace and klp are the only two users of that function (outside of
> module.c) and Mark was already writing a patch for arm64.
>
> Means these last two patches need to wait a little until we've fixed
> those.

So on IRC Josh suggested we use text_poke() for RELA. Since KLP is only
available on Power and x86, and Power does not have STRICT_MODULE_RWX,
the below should be sufficient.

Completely untested...

---
arch/x86/kernel/module.c | 40 +++++++++++++++++++++++++++++++++-------
include/linux/livepatch.h | 7 +++++++
kernel/livepatch/core.c | 14 ++++++++++----
3 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index d5c72cb877b3..76fa2c5f2d7b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -126,11 +126,12 @@ int apply_relocate(Elf32_Shdr *sechdrs,
return 0;
}
#else /*X86_64*/
-int apply_relocate_add(Elf64_Shdr *sechdrs,
+int __apply_relocate_add(Elf64_Shdr *sechdrs,
const char *strtab,
unsigned int symindex,
unsigned int relsec,
- struct module *me)
+ struct module *me,
+ void *(*write)(void *addr, const void *val, size_t size))
{
unsigned int i;
Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
@@ -162,19 +163,19 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
case R_X86_64_64:
if (*(u64 *)loc != 0)
goto invalid_relocation;
- *(u64 *)loc = val;
+ write(loc, &val, 8);
break;
case R_X86_64_32:
if (*(u32 *)loc != 0)
goto invalid_relocation;
- *(u32 *)loc = val;
+ write(loc, &val, 4);
if (val != *(u32 *)loc)
goto overflow;
break;
case R_X86_64_32S:
if (*(s32 *)loc != 0)
goto invalid_relocation;
- *(s32 *)loc = val;
+ write(loc, &val, 4);
if ((s64)val != *(s32 *)loc)
goto overflow;
break;
@@ -183,7 +184,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
if (*(u32 *)loc != 0)
goto invalid_relocation;
val -= (u64)loc;
- *(u32 *)loc = val;
+ write(loc, &val, 4);
#if 0
if ((s64)val != *(s32 *)loc)
goto overflow;
@@ -193,7 +194,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
if (*(u64 *)loc != 0)
goto invalid_relocation;
val -= (u64)loc;
- *(u64 *)loc = val;
+ write(loc, &val, 8);
break;
default:
pr_err("%s: Unknown rela relocation: %llu\n",
@@ -215,6 +216,31 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
me->name);
return -ENOEXEC;
}
+
+int apply_relocate_add(Elf64_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *me)
+{
+ return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy);
+}
+
+int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *me)
+{
+ int ret;
+
+ ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke);
+ if (!ret)
+ text_poke_sync();
+
+ return ret;
+}
+
#endif

int module_finalize(const Elf_Ehdr *hdr,
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 273400814020..5b8c10871b70 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -217,6 +217,13 @@ void *klp_shadow_get_or_alloc(void *obj, unsigned long id,
void klp_shadow_free(void *obj, unsigned long id, klp_shadow_dtor_t dtor);
void klp_shadow_free_all(unsigned long id, klp_shadow_dtor_t dtor);

+
+extern int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *me);
+
#else /* !CONFIG_LIVEPATCH */

static inline int klp_module_coming(struct module *mod) { return 0; }
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index ab4a4606d19b..e690519aba31 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -245,6 +245,15 @@ static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
return 0;
}

+int __weak klp_apply_relocate_add(Elf64_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *me)
+{
+ return apply_relocate_add(sechdrs, strtab, symindex, relsec, me);
+}
+
static int klp_write_object_relocations(struct module *pmod,
struct klp_object *obj)
{
@@ -285,7 +294,7 @@ static int klp_write_object_relocations(struct module *pmod,
if (ret)
break;

- ret = apply_relocate_add(pmod->klp_info->sechdrs,
+ ret = klp_apply_relocate_add(pmod->klp_info->sechdrs,
pmod->core_kallsyms.strtab,
pmod->klp_info->symndx, i, pmod);
if (ret)
@@ -721,16 +730,13 @@ static int klp_init_object_loaded(struct klp_patch *patch,

mutex_lock(&text_mutex);

- module_disable_ro(patch->mod);
ret = klp_write_object_relocations(patch->mod, obj);
if (ret) {
- module_enable_ro(patch->mod, true);
mutex_unlock(&text_mutex);
return ret;
}

arch_klp_init_object_loaded(patch, obj);
- module_enable_ro(patch->mod, true);

mutex_unlock(&text_mutex);

2019-10-21 15:46:39

by Peter Zijlstra

Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> So on IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> the below should be sufficient.

And... s390 also has HAVE_LIVEPATCH and STRICT_MODULE_RWX, so let me
poke at that.

2019-10-21 16:13:03

by Peter Zijlstra

Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:

> So on IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> the below should be sufficient.
>
> Completely untested...

And because s390 also has HAVE_LIVEPATCH and STRICT_MODULE_RWX, the even
less tested s390 bits are included below.

Heiko, apologies if I completely wrecked it.

The purpose is to remove module_disable_ro()/module_enable_ro() from
livepatch/core.c such that:

- nothing relies on where in the module loading path module text goes RX.
- nothing ever has writable text

> [ x86 + livepatch patch from the previous message trimmed ]

diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index ba8f19bb438b..5f3443098172 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -174,7 +174,8 @@ int module_frob_arch_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
}

static int apply_rela_bits(Elf_Addr loc, Elf_Addr val,
- int sign, int bits, int shift)
+ int sign, int bits, int shift,
+ void (*write)(void *addr, const void *data, size_t len))
{
unsigned long umax;
long min, max;
@@ -194,26 +195,29 @@ static int apply_rela_bits(Elf_Addr loc, Elf_Addr val,
return -ENOEXEC;
}

- if (bits == 8)
- *(unsigned char *) loc = val;
- else if (bits == 12)
- *(unsigned short *) loc = (val & 0xfff) |
+ if (bits == 8) {
+ unsigned char tmp = val;
+ write((void *) loc, &tmp, 1);
+ } else if (bits == 12) {
+ unsigned short tmp = (val & 0xfff) |
 (*(unsigned short *) loc & 0xf000);
- else if (bits == 16)
- *(unsigned short *) loc = val;
- else if (bits == 20)
- *(unsigned int *) loc = (val & 0xfff) << 16 |
- (val & 0xff000) >> 4 |
- (*(unsigned int *) loc & 0xf00000ff);
- else if (bits == 32)
- *(unsigned int *) loc = val;
- else if (bits == 64)
- *(unsigned long *) loc = val;
+ write((void *) loc, &tmp, 2);
+ } else if (bits == 16) {
+ unsigned short tmp = val;
+ write((void *) loc, &tmp, 2);
+ } else if (bits == 20) {
+ unsigned int tmp = (val & 0xfff) << 16 |
+ (val & 0xff000) >> 4 | (*(unsigned int *) loc & 0xf00000ff);
+ write((void *) loc, &tmp, 4);
+ } else if (bits == 32) {
+ unsigned int tmp = val;
+ write((void *) loc, &tmp, 4);
+ } else if (bits == 64) {
+ unsigned long tmp = val;
+ write((void *) loc, &tmp, 8);
+ }
return 0;
}

static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
- const char *strtab, struct module *me)
+ const char *strtab, struct module *me,
+ void (*write)(void *addr, const void *data, size_t len))
{
struct mod_arch_syminfo *info;
Elf_Addr loc, val;
@@ -241,17 +245,17 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
case R_390_64: /* Direct 64 bit. */
val += rela->r_addend;
if (r_type == R_390_8)
- rc = apply_rela_bits(loc, val, 0, 8, 0);
+ rc = apply_rela_bits(loc, val, 0, 8, 0, write);
else if (r_type == R_390_12)
- rc = apply_rela_bits(loc, val, 0, 12, 0);
+ rc = apply_rela_bits(loc, val, 0, 12, 0, write);
else if (r_type == R_390_16)
- rc = apply_rela_bits(loc, val, 0, 16, 0);
+ rc = apply_rela_bits(loc, val, 0, 16, 0, write);
else if (r_type == R_390_20)
- rc = apply_rela_bits(loc, val, 1, 20, 0);
+ rc = apply_rela_bits(loc, val, 1, 20, 0, write);
else if (r_type == R_390_32)
- rc = apply_rela_bits(loc, val, 0, 32, 0);
+ rc = apply_rela_bits(loc, val, 0, 32, 0, write);
else if (r_type == R_390_64)
- rc = apply_rela_bits(loc, val, 0, 64, 0);
+ rc = apply_rela_bits(loc, val, 0, 64, 0, write);
break;
case R_390_PC16: /* PC relative 16 bit. */
case R_390_PC16DBL: /* PC relative 16 bit shifted by 1. */
@@ -260,15 +264,15 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
case R_390_PC64: /* PC relative 64 bit. */
val += rela->r_addend - loc;
if (r_type == R_390_PC16)
- rc = apply_rela_bits(loc, val, 1, 16, 0);
+ rc = apply_rela_bits(loc, val, 1, 16, 0, write);
else if (r_type == R_390_PC16DBL)
- rc = apply_rela_bits(loc, val, 1, 16, 1);
+ rc = apply_rela_bits(loc, val, 1, 16, 1, write);
else if (r_type == R_390_PC32DBL)
- rc = apply_rela_bits(loc, val, 1, 32, 1);
+ rc = apply_rela_bits(loc, val, 1, 32, 1, write);
else if (r_type == R_390_PC32)
- rc = apply_rela_bits(loc, val, 1, 32, 0);
+ rc = apply_rela_bits(loc, val, 1, 32, 0, write);
else if (r_type == R_390_PC64)
- rc = apply_rela_bits(loc, val, 1, 64, 0);
+ rc = apply_rela_bits(loc, val, 1, 64, 0, write);
break;
case R_390_GOT12: /* 12 bit GOT offset. */
case R_390_GOT16: /* 16 bit GOT offset. */
@@ -293,23 +297,23 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
val = info->got_offset + rela->r_addend;
if (r_type == R_390_GOT12 ||
r_type == R_390_GOTPLT12)
- rc = apply_rela_bits(loc, val, 0, 12, 0);
+ rc = apply_rela_bits(loc, val, 0, 12, 0, write);
else if (r_type == R_390_GOT16 ||
r_type == R_390_GOTPLT16)
- rc = apply_rela_bits(loc, val, 0, 16, 0);
+ rc = apply_rela_bits(loc, val, 0, 16, 0, write);
else if (r_type == R_390_GOT20 ||
r_type == R_390_GOTPLT20)
- rc = apply_rela_bits(loc, val, 1, 20, 0);
+ rc = apply_rela_bits(loc, val, 1, 20, 0, write);
else if (r_type == R_390_GOT32 ||
r_type == R_390_GOTPLT32)
- rc = apply_rela_bits(loc, val, 0, 32, 0);
+ rc = apply_rela_bits(loc, val, 0, 32, 0, write);
else if (r_type == R_390_GOT64 ||
r_type == R_390_GOTPLT64)
- rc = apply_rela_bits(loc, val, 0, 64, 0);
+ rc = apply_rela_bits(loc, val, 0, 64, 0, write);
else if (r_type == R_390_GOTENT ||
r_type == R_390_GOTPLTENT) {
val += (Elf_Addr) me->core_layout.base - loc;
- rc = apply_rela_bits(loc, val, 1, 32, 1);
+ rc = apply_rela_bits(loc, val, 1, 32, 1, write);
}
break;
case R_390_PLT16DBL: /* 16 bit PC rel. PLT shifted by 1. */
@@ -357,17 +361,17 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
val += rela->r_addend - loc;
}
if (r_type == R_390_PLT16DBL)
- rc = apply_rela_bits(loc, val, 1, 16, 1);
+ rc = apply_rela_bits(loc, val, 1, 16, 1, write);
else if (r_type == R_390_PLTOFF16)
- rc = apply_rela_bits(loc, val, 0, 16, 0);
+ rc = apply_rela_bits(loc, val, 0, 16, 0, write);
else if (r_type == R_390_PLT32DBL)
- rc = apply_rela_bits(loc, val, 1, 32, 1);
+ rc = apply_rela_bits(loc, val, 1, 32, 1, write);
else if (r_type == R_390_PLT32 ||
r_type == R_390_PLTOFF32)
- rc = apply_rela_bits(loc, val, 0, 32, 0);
+ rc = apply_rela_bits(loc, val, 0, 32, 0, write);
else if (r_type == R_390_PLT64 ||
r_type == R_390_PLTOFF64)
- rc = apply_rela_bits(loc, val, 0, 64, 0);
+ rc = apply_rela_bits(loc, val, 0, 64, 0, write);
break;
case R_390_GOTOFF16: /* 16 bit offset to GOT. */
case R_390_GOTOFF32: /* 32 bit offset to GOT. */
@@ -375,20 +379,20 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
val = val + rela->r_addend -
((Elf_Addr) me->core_layout.base + me->arch.got_offset);
if (r_type == R_390_GOTOFF16)
- rc = apply_rela_bits(loc, val, 0, 16, 0);
+ rc = apply_rela_bits(loc, val, 0, 16, 0, write);
else if (r_type == R_390_GOTOFF32)
- rc = apply_rela_bits(loc, val, 0, 32, 0);
+ rc = apply_rela_bits(loc, val, 0, 32, 0, write);
else if (r_type == R_390_GOTOFF64)
- rc = apply_rela_bits(loc, val, 0, 64, 0);
+ rc = apply_rela_bits(loc, val, 0, 64, 0, write);
break;
case R_390_GOTPC: /* 32 bit PC relative offset to GOT. */
case R_390_GOTPCDBL: /* 32 bit PC rel. off. to GOT shifted by 1. */
val = (Elf_Addr) me->core_layout.base + me->arch.got_offset +
rela->r_addend - loc;
if (r_type == R_390_GOTPC)
- rc = apply_rela_bits(loc, val, 1, 32, 0);
+ rc = apply_rela_bits(loc, val, 1, 32, 0, write);
else if (r_type == R_390_GOTPCDBL)
- rc = apply_rela_bits(loc, val, 1, 32, 1);
+ rc = apply_rela_bits(loc, val, 1, 32, 1, write);
break;
case R_390_COPY:
case R_390_GLOB_DAT: /* Create GOT entry. */
@@ -412,9 +416,10 @@ static int apply_rela(Elf_Rela *rela, Elf_Addr base, Elf_Sym *symtab,
return 0;
}

-int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
+int __apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
unsigned int symindex, unsigned int relsec,
- struct module *me)
+ struct module *me,
+ void (*write)(void *addr, const void *data, size_t len))
{
Elf_Addr base;
Elf_Sym *symtab;
@@ -437,6 +442,25 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
return 0;
}

+static void memwrite(void *addr, const void *data, size_t len)
+{
+ memcpy(addr, data, len);
+}
+
+int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
+ unsigned int symindex, unsigned int relsec,
+ struct module *me)
+{
+ return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memwrite);
+}
+
+int klp_apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
+ unsigned int symindex, unsigned int relsec,
+ struct module *me)
+{
+ return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, s390_kernel_write);
+}
+
int module_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *me)

2019-10-22 02:23:35

by Steven Rostedt

Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Fri, 18 Oct 2019 09:35:40 +0200
Peter Zijlstra <[email protected]> wrote:

> Now that set_all_modules_text_*() is gone, nothing depends on the
> relation between ->state = COMING and the protection state anymore.
> This enables moving the protection changes later, such that the COMING
> notifier callbacks can more easily modify the text.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: Jessica Yu <[email protected]>
> ---

This triggered the following bug:

BUG: unable to handle page fault for address: ffffffffa01501f1
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0011) - permissions violation
PGD 2a16067 P4D 2a16067 PUD 2a17063 PMD c230c067 PTE 80000000c4d74063
Oops: 0011 [#1] PREEMPT SMP KASAN PTI
CPU: 2 PID: 638 Comm: systemd-udevd Not tainted 5.4.0-rc3-test+ #98
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20190816/utaddress-213)
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20190816/utaddress-213)
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
lpc_ich: Resource conflict(s) found affecting gpio_ich
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
RIP: 0010:trace_event_define_fields_i2c_result+0x0/0x86 [i2c_core]
Code: 27 6a 00 48 c7 c2 60 34 13 a0 45 31 c9 48 89 df 41 b8 02 00 00 00 b9 12 00 00 00 48 c7 c6 a0 33 13 a0 e8 02 ec 14 e1 5a 5b c3 <53> 48 c7 c6 20 33 13 a0 b9 08 00 00 00 41 0 6a 00 41
RSP: 0018:ffff8880cba07950 EFLAGS: 00010246
RAX: ffffffffa01501f1 RBX: ffffffffa013da40 RCX: ffffffff812a147c
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffffffa013da40
RBP: ffffffffa0142be0 R08: ffffed1017fde1ab R09: ffffed1017fde1ab
R10: ffffed1017fde1aa R11: ffff8880bfef0d57 R12: ffff8880cc22a000
R13: ffffffffa013da50 R14: ffffffffa0137aa8 R15: ffff8880cd372c60
FS: 00007f062a48f940(0000) GS:ffff8880d4680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa01501f1 CR3: 00000000cb632003 CR4: 00000000001606e0
Call Trace:
event_create_dir+0x358/0x7b0
trace_module_notify+0x20b/0x240
notifier_call_chain+0x6d/0xa0
blocking_notifier_call_chain+0x5e/0x80
load_module+0x39a5/0x3d80
? module_frob_arch_sections+0x20/0x20
? vfs_read+0xcc/0x1b0
? kernel_read+0x95/0xb0
? kernel_read_file+0x187/0x310
? find_held_lock+0xac/0xd0
? syscall_trace_enter+0x369/0x590
? __do_sys_finit_module+0x11a/0x1b0
__do_sys_finit_module+0x11a/0x1b0
? __ia32_sys_init_module+0x40/0x40
? trace_hardirqs_on+0x2e/0x120
? ktime_get_coarse_real_ts64+0x6c/0xf0
? syscall_trace_enter+0x233/0x590
? do_syscall_64+0x14/0x1a0
do_syscall_64+0x68/0x1a0
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Attached config, but it seems to be triggered with modules that have
trace events defined in them.

The trace_event_define_fields_<event>() functions are defined in
include/trace/trace_events.h; they are __init functions called by the
trace_events code's event_create_dir() via the MODULE_STATE_COMING
module notifier.
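
Roughly (a sketch of the new ordering, not the exact code), load_module()
with this patch now does:

	mod->state = MODULE_STATE_COMING;	/* complete_formation() */
	...
	/* prepare_coming_module() fires the COMING notifier chain:
	 *   trace_module_notify()
	 *     event_create_dir()
	 *       call->class->define_fields(call)  <-- module text, not
	 *                                             executable yet
	 */
	...
	module_enable_ro(mod, false);
	module_enable_nx(mod);
	module_enable_x(mod);			/* too late for the notifier */

so the notifier ends up instruction-fetching from pages that are not yet
executable, which matches the permissions-violation #PF above.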

-- Steve


Attachments:
config.gz (29.74 kB)

2019-10-22 13:25:25

by Heiko Carstens

Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Mon, Oct 21, 2019 at 06:11:35PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
>
> > So on IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> > available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> > the below should be sufficient.
> >
> > Completely untested...
>
> And because s390 also has HAVE_LIVEPATCH and STRICT_MODULE_RWX, the even
> less tested s390 bits are included below.
>
> Heiko, apologies if I completely wrecked it.
>
> The purpose is to remove module_disable_ro()/module_enable_ro() from
> livepatch/core.c such that:
>
> - nothing relies on where in the module loading path module text goes RX.
> - nothing ever has writable text

Given that Steven reported a crash, I assume I can wait until you
repost a new version of the series, which also includes s390 bits?

2019-10-22 15:34:46

by Peter Zijlstra

Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Tue, Oct 22, 2019 at 01:31:16PM +0200, Heiko Carstens wrote:
> On Mon, Oct 21, 2019 at 06:11:35PM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 21, 2019 at 05:34:25PM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> >
> > > So on IRC Josh suggested we use text_poke() for RELA. Since KLP is only
> > > available on Power and x86, and Power does not have STRICT_MODULE_RWX,
> > > the below should be sufficient.
> > >
> > > Completely untested...
> >
> > And because s390 also has HAVE_LIVEPATCH and STRICT_MODULE_RWX, the even
> > less tested s390 bits are included below.
> >
> > Heiko, apologies if I completely wrecked it.
> >
> > The purpose is to remove module_disable_ro()/module_enable_ro() from
> > livepatch/core.c such that:
> >
> > - nothing relies on where in the module loading path module text goes RX.
> > - nothing ever has writable text
>
> Given that Steven reported a crash, I assume I can wait until you
> repost a new version of the series, which also includes s390 bits?

His crash is somewhat orthogonal, but yes, I will repost once sorted.

2019-10-22 23:53:39

by Peter Zijlstra

Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Mon, Oct 21, 2019 at 10:21:10PM -0400, Steven Rostedt wrote:
> On Fri, 18 Oct 2019 09:35:40 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > Now that set_all_modules_text_*() is gone, nothing depends on the
> > relation between ->state = COMING and the protection state anymore.
> > This enables moving the protection changes later, such that the COMING
> > notifier callbacks can more easily modify the text.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Jessica Yu <[email protected]>
> > ---
>
> This triggered the following bug:
>

> The trace_event_define_fields_<event>() is defined in
> include/trace/trace_events.h and is an init function called by the
> trace_events event_create_dir() via the module notifier:
> MODULE_STATE_COMING

The below seems to cure it; and seems to generate identical
events/*/format output (for my .config, with the exception of ID).

It has just one section mismatch report that I'm too tired to look at
just now.

I'm not particularly proud of the "__function__" hack, but it works :/ I
couldn't come up with anything else for [uk]probes, which seem to have
dynamic fields, and if we're having it then syscall_enter can also make
use of it; the syscall_metadata crud was going to be ugly otherwise.

(also, win on LOC)
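
To make the shape concrete: for a hypothetical event "bar" with a single
__field(int, foo), the new macros expand to a table (hand-expanded
sketch) instead of an open-coded trace_event_define_fields_bar():

	static struct trace_event_fields trace_event_fields_bar[] = {
		{ .type = "int", .name = "foo",
		  .size = sizeof(int), .align = __alignof__(int),
		  .is_signed = is_signed_type(int), .filter_type = FILTER_OTHER },
		{}
	};

event_create_dir() then walks the table, computing each field's offset
from the running size and alignment instead of using offsetof().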

---
fs/xfs/xfs_trace.h | 4 +-
include/linux/trace_events.h | 16 ++++++-
include/trace/events/filemap.h | 2 +-
include/trace/trace_events.h | 64 ++++++++-----------------
kernel/trace/trace.h | 31 ++++++------
kernel/trace/trace_entries.h | 66 +++++++------------------
kernel/trace/trace_events.c | 20 +++++++-
kernel/trace/trace_export.c | 106 +++++++++++++++--------------------------
kernel/trace/trace_kprobe.c | 12 ++++-
kernel/trace/trace_syscalls.c | 48 +++++++------------
kernel/trace/trace_uprobe.c | 6 ++-
11 files changed, 162 insertions(+), 213 deletions(-)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index eaae275ed430..53c5485cf2a1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -218,8 +218,8 @@ DECLARE_EVENT_CLASS(xfs_bmap_class,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
- __field(void *, leaf);
- __field(int, pos);
+ __field(void *, leaf)
+ __field(int, pos)
__field(xfs_fileoff_t, startoff)
__field(xfs_fsblock_t, startblock)
__field(xfs_filblks_t, blockcount)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 30a8cdcfd4a4..30782e78b91d 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -187,6 +187,20 @@ enum trace_reg {

struct trace_event_call;

+struct trace_event_fields {
+ const char *type;
+ union {
+ struct {
+ const char *name;
+ const int size;
+ const int align;
+ const int is_signed;
+ const int filter_type;
+ };
+ int (*define_fields)(struct trace_event_call *);
+ };
+};
+
struct trace_event_class {
const char *system;
void *probe;
@@ -195,7 +209,7 @@ struct trace_event_class {
#endif
int (*reg)(struct trace_event_call *event,
enum trace_reg type, void *data);
- int (*define_fields)(struct trace_event_call *);
+ struct trace_event_fields *fields_array;
struct list_head *(*get_fields)(struct trace_event_call *);
struct list_head fields;
int (*raw_init)(struct trace_event_call *);
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index ee05db7ee8d2..796053e162d2 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -85,7 +85,7 @@ TRACE_EVENT(file_check_and_advance_wb_err,
TP_ARGS(file, old),

TP_STRUCT__entry(
- __field(struct file *, file);
+ __field(struct file *, file)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
__field(errseq_t, old)
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4ecdfe2e3580..ca1d2e745a3f 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -394,22 +394,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)

#undef __field_ext
-#define __field_ext(type, item, filter_type) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __field_ext(_type, _item, _filter_type) { \
+ .type = #_type, .name = #_item, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = _filter_type },

#undef __field_struct_ext
-#define __field_struct_ext(type, item, filter_type) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- 0, filter_type); \
- if (ret) \
- return ret;
+#define __field_struct_ext(_type, _item, _filter_type) { \
+ .type = #_type, .name = #_item, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ .is_signed = 0, .filter_type = _filter_type },

#undef __field
#define __field(type, item) __field_ext(type, item, FILTER_OTHER)
@@ -418,25 +412,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \
#define __field_struct(type, item) __field_struct_ext(type, item, FILTER_OTHER)

#undef __array
-#define __array(type, item, len) \
- do { \
- char *type_str = #type"["__stringify(len)"]"; \
- BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
- BUILD_BUG_ON(len <= 0); \
- ret = trace_define_field(event_call, type_str, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), FILTER_OTHER); \
- if (ret) \
- return ret; \
- } while (0);
+#define __array(_type, _item, _len) { \
+ .type = #_type"["__stringify(_len)"]", .name = #_item, \
+ .size = sizeof(_type[_len]), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __dynamic_array
-#define __dynamic_array(type, item, len) \
- ret = trace_define_field(event_call, "__data_loc " #type "[]", #item, \
- offsetof(typeof(field), __data_loc_##item), \
- sizeof(field.__data_loc_##item), \
- is_signed_type(type), FILTER_OTHER);
+#define __dynamic_array(_type, _item, _len) { \
+ .type = "__data_loc " #_type "[]", .name = #_item, \
+ .size = 4, .align = 4, \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __string
#define __string(item, src) __dynamic_array(char, item, -1)
@@ -446,16 +431,9 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \

#undef DECLARE_EVENT_CLASS
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, func, print) \
-static int notrace __init \
-trace_event_define_fields_##call(struct trace_event_call *event_call) \
-{ \
- struct trace_event_raw_##call field; \
- int ret; \
- \
- tstruct; \
- \
- return ret; \
-}
+static struct trace_event_fields trace_event_fields_##call[] = { \
+ tstruct \
+ {} };

#undef DEFINE_EVENT
#define DEFINE_EVENT(template, name, proto, args)
@@ -613,7 +591,7 @@ static inline notrace int trace_event_get_offsets_##call( \
*
* static struct trace_event_class __used event_class_<template> = {
* .system = "<system>",
- * .define_fields = trace_event_define_fields_<call>,
+ * .fields_array = trace_event_fields_<call>,
* .fields = LIST_HEAD_INIT(event_class_##call.fields),
* .raw_init = trace_event_raw_init,
* .probe = trace_event_raw_event_##call,
@@ -761,7 +739,7 @@ _TRACE_PERF_PROTO(call, PARAMS(proto)); \
static char print_fmt_##call[] = print; \
static struct trace_event_class __used __refdata event_class_##call = { \
.system = TRACE_SYSTEM_STRING, \
- .define_fields = trace_event_define_fields_##call, \
+ .fields_array = trace_event_fields_##call, \
.fields = LIST_HEAD_INIT(event_class_##call.fields),\
.raw_init = trace_event_raw_init, \
.probe = trace_event_raw_event_##call, \
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d685c61085c0..298a7cacf146 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -49,6 +49,9 @@ enum trace_type {
#undef __field
#define __field(type, item) type item;

+#undef __field_fn
+#define __field_fn(type, item) type item;
+
#undef __field_struct
#define __field_struct(type, item) __field(type, item)

@@ -68,26 +71,22 @@ enum trace_type {
#define F_STRUCT(args...) args

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter) \
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \
struct struct_name { \
struct trace_entry ent; \
tstruct \
}

#undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk, filter)
+#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)

#undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
- filter, regfn) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

#undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print, \
- filter) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter) __packed
+#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print)) __packed

#include "trace_entries.h"

@@ -1899,17 +1898,15 @@ extern void tracing_log_err(struct trace_array *tr,
#define internal_trace_puts(str) __trace_puts(_THIS_IP_, str, strlen(str))

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, id, tstruct, print, filter) \
+#define FTRACE_ENTRY(call, struct_name, id, tstruct, print) \
extern struct trace_event_call \
__aligned(4) event_##call;
#undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print, filter) \
- FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
#undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print, filter) \
- FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))

#include "trace_entries.h"

diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index fc8e97328e54..3e9d81608284 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -61,15 +61,13 @@ FTRACE_ENTRY_REG(function, ftrace_entry,
TRACE_FN,

F_STRUCT(
- __field( unsigned long, ip )
- __field( unsigned long, parent_ip )
+ __field_fn( unsigned long, ip )
+ __field_fn( unsigned long, parent_ip )
),

F_printk(" %ps <-- %ps",
(void *)__entry->ip, (void *)__entry->parent_ip),

- FILTER_TRACE_FN,
-
perf_ftrace_event_register
);

@@ -84,9 +82,7 @@ FTRACE_ENTRY_PACKED(funcgraph_entry, ftrace_graph_ent_entry,
__field_desc( int, graph_ent, depth )
),

- F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth),
-
- FILTER_OTHER
+ F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth)
);

/* Function return entry */
@@ -97,18 +93,16 @@ FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,
F_STRUCT(
__field_struct( struct ftrace_graph_ret, ret )
__field_desc( unsigned long, ret, func )
+ __field_desc( unsigned long, ret, overrun )
__field_desc( unsigned long long, ret, calltime)
__field_desc( unsigned long long, ret, rettime )
- __field_desc( unsigned long, ret, overrun )
__field_desc( int, ret, depth )
),

F_printk("<-- %ps (%d) (start: %llx end: %llx) over: %d",
(void *)__entry->func, __entry->depth,
__entry->calltime, __entry->rettime,
- __entry->depth),
-
- FILTER_OTHER
+ __entry->depth)
);

/*
@@ -137,9 +131,7 @@ FTRACE_ENTRY(context_switch, ctx_switch_entry,
F_printk("%u:%u:%u ==> %u:%u:%u [%03u]",
__entry->prev_pid, __entry->prev_prio, __entry->prev_state,
__entry->next_pid, __entry->next_prio, __entry->next_state,
- __entry->next_cpu),
-
- FILTER_OTHER
+ __entry->next_cpu)
);

/*
@@ -157,9 +149,7 @@ FTRACE_ENTRY_DUP(wakeup, ctx_switch_entry,
F_printk("%u:%u:%u ==+ %u:%u:%u [%03u]",
__entry->prev_pid, __entry->prev_prio, __entry->prev_state,
__entry->next_pid, __entry->next_prio, __entry->next_state,
- __entry->next_cpu),
-
- FILTER_OTHER
+ __entry->next_cpu)
);

/*
@@ -183,9 +173,7 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
(void *)__entry->caller[0], (void *)__entry->caller[1],
(void *)__entry->caller[2], (void *)__entry->caller[3],
(void *)__entry->caller[4], (void *)__entry->caller[5],
- (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
- FILTER_OTHER
+ (void *)__entry->caller[6], (void *)__entry->caller[7])
);

FTRACE_ENTRY(user_stack, userstack_entry,
@@ -203,9 +191,7 @@ FTRACE_ENTRY(user_stack, userstack_entry,
(void *)__entry->caller[0], (void *)__entry->caller[1],
(void *)__entry->caller[2], (void *)__entry->caller[3],
(void *)__entry->caller[4], (void *)__entry->caller[5],
- (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
- FILTER_OTHER
+ (void *)__entry->caller[6], (void *)__entry->caller[7])
);

/*
@@ -222,9 +208,7 @@ FTRACE_ENTRY(bprint, bprint_entry,
),

F_printk("%ps: %s",
- (void *)__entry->ip, __entry->fmt),
-
- FILTER_OTHER
+ (void *)__entry->ip, __entry->fmt)
);

FTRACE_ENTRY_REG(print, print_entry,
@@ -239,8 +223,6 @@ FTRACE_ENTRY_REG(print, print_entry,
F_printk("%ps: %s",
(void *)__entry->ip, __entry->buf),

- FILTER_OTHER,
-
ftrace_event_register
);

@@ -254,9 +236,7 @@ FTRACE_ENTRY(raw_data, raw_data_entry,
),

F_printk("id:%04x %08x",
- __entry->id, (int)__entry->buf[0]),
-
- FILTER_OTHER
+ __entry->id, (int)__entry->buf[0])
);

FTRACE_ENTRY(bputs, bputs_entry,
@@ -269,9 +249,7 @@ FTRACE_ENTRY(bputs, bputs_entry,
),

F_printk("%ps: %s",
- (void *)__entry->ip, __entry->str),
-
- FILTER_OTHER
+ (void *)__entry->ip, __entry->str)
);

FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
@@ -283,16 +261,14 @@ FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
__field_desc( resource_size_t, rw, phys )
__field_desc( unsigned long, rw, value )
__field_desc( unsigned long, rw, pc )
- __field_desc( int, rw, map_id )
+ __field_desc( int, rw, map_id )
__field_desc( unsigned char, rw, opcode )
__field_desc( unsigned char, rw, width )
),

F_printk("%lx %lx %lx %d %x %x",
(unsigned long)__entry->phys, __entry->value, __entry->pc,
- __entry->map_id, __entry->opcode, __entry->width),
-
- FILTER_OTHER
+ __entry->map_id, __entry->opcode, __entry->width)
);

FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
@@ -304,15 +280,13 @@ FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
__field_desc( resource_size_t, map, phys )
__field_desc( unsigned long, map, virt )
__field_desc( unsigned long, map, len )
- __field_desc( int, map, map_id )
+ __field_desc( int, map, map_id )
__field_desc( unsigned char, map, opcode )
),

F_printk("%lx %lx %lx %d %x",
(unsigned long)__entry->phys, __entry->virt, __entry->len,
- __entry->map_id, __entry->opcode),
-
- FILTER_OTHER
+ __entry->map_id, __entry->opcode)
);


@@ -334,9 +308,7 @@ FTRACE_ENTRY(branch, trace_branch,
F_printk("%u:%s:%s (%u)%s",
__entry->line,
__entry->func, __entry->file, __entry->correct,
- __entry->constant ? " CONSTANT" : ""),
-
- FILTER_OTHER
+ __entry->constant ? " CONSTANT" : "")
);


@@ -362,7 +334,5 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
__entry->duration,
__entry->outer_duration,
__entry->nmi_total_ts,
- __entry->nmi_count),
-
- FILTER_OTHER
+ __entry->nmi_count)
);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index fba87d10f0c1..508b5299d843 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -24,6 +24,7 @@
#include <linux/delay.h>

#include <trace/events/sched.h>
+#include <trace/syscall.h>

#include <asm/setup.h>

@@ -1990,7 +1991,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
*/
head = trace_get_fields(call);
if (list_empty(head)) {
- ret = call->class->define_fields(call);
+ struct trace_event_fields *field = call->class->fields_array;
+ unsigned int offset = sizeof(struct trace_entry);
+
+ for (; field->type; field++) {
+ if (strcmp(field->type, "__function__") == 0) {
+ ret = field->define_fields(call);
+ break;
+ }
+
+ offset = ALIGN(offset, field->align);
+ ret = trace_define_field(call, field->type, field->name,
+ offset, field->size,
+ field->is_signed, field->filter_type);
+ if (ret)
+ break;
+
+ offset += field->size;
+ }
if (ret < 0) {
pr_warn("Could not initialize trace point events/%s\n",
name);
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 45630a76ed3a..6d64c1c19fd5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -29,10 +29,8 @@ static int ftrace_event_register(struct trace_event_call *call,
* function and thus become accesible via perf.
*/
#undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
- filter, regfn) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

/* not needed for this file */
#undef __field_struct
@@ -41,6 +39,9 @@ static int ftrace_event_register(struct trace_event_call *call,
#undef __field
#define __field(type, item) type item;

+#undef __field_fn
+#define __field_fn(type, item) type item;
+
#undef __field_desc
#define __field_desc(type, container, item) type item;

@@ -60,7 +61,7 @@ static int ftrace_event_register(struct trace_event_call *call,
#define F_printk(fmt, args...) fmt, args

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter) \
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \
struct ____ftrace_##name { \
tstruct \
}; \
@@ -73,76 +74,46 @@ static void __always_unused ____ftrace_check_##name(void) \
}

#undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print, filter) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

#include "trace_entries.h"

+#undef __field_ext
+#define __field_ext(_type, _item, _filter_type) { \
+ .type = #_type, .name = #_item, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = _filter_type },
+
#undef __field
-#define __field(type, item) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __field(_type, _item) __field_ext(_type, _item, FILTER_OTHER)
+
+#undef __field_fn
+#define __field_fn(_type, _item) __field_ext(_type, _item, FILTER_TRACE_FN)

#undef __field_desc
-#define __field_desc(type, container, item) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), \
- container.item), \
- sizeof(field.container.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __field_desc(_type, _container, _item) __field_ext(_type, _item, FILTER_OTHER)

#undef __array
-#define __array(type, item, len) \
- do { \
- char *type_str = #type"["__stringify(len)"]"; \
- BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
- ret = trace_define_field(event_call, type_str, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret; \
- } while (0);
+#define __array(_type, _item, _len) { \
+ .type = #_type"["__stringify(_len)"]", .name = #_item, \
+ .size = sizeof(_type[_len]), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __array_desc
-#define __array_desc(type, container, item, len) \
- BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
- ret = trace_define_field(event_call, #type "[" #len "]", #item, \
- offsetof(typeof(field), \
- container.item), \
- sizeof(field.container.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __array_desc(_type, _container, _item, _len) __array(_type, _item, _len)

#undef __dynamic_array
-#define __dynamic_array(type, item) \
- ret = trace_define_field(event_call, #type "[]", #item, \
- offsetof(typeof(field), item), \
- 0, is_signed_type(type), filter_type);\
- if (ret) \
- return ret;
+#define __dynamic_array(_type, _item) { \
+ .type = #_type "[]", .name = #_item, \
+ .size = 0, .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter) \
-static int __init \
-ftrace_define_fields_##name(struct trace_event_call *event_call) \
-{ \
- struct struct_name field; \
- int ret; \
- int filter_type = filter; \
- \
- tstruct; \
- \
- return ret; \
-}
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \
+static struct trace_event_fields ftrace_event_fields_##name[] = { \
+ tstruct \
+ {} };

#include "trace_entries.h"

@@ -152,6 +123,9 @@ ftrace_define_fields_##name(struct trace_event_call *event_call) \
#undef __field
#define __field(type, item)

+#undef __field_fn
+#define __field_fn(type, item)
+
#undef __field_desc
#define __field_desc(type, container, item)

@@ -168,12 +142,10 @@ ftrace_define_fields_##name(struct trace_event_call *event_call) \
#define F_printk(fmt, args...) __stringify(fmt) ", " __stringify(args)

#undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, filter,\
- regfn) \
- \
+#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn) \
struct trace_event_class __refdata event_class_ftrace_##call = { \
.system = __stringify(TRACE_SYSTEM), \
- .define_fields = ftrace_define_fields_##call, \
+ .fields_array = ftrace_event_fields_##call, \
.fields = LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
.reg = regfn, \
}; \
@@ -191,9 +163,9 @@ struct trace_event_call __used \
__attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print, filter) \
+#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print) \
FTRACE_ENTRY_REG(call, struct_name, etype, \
- PARAMS(tstruct), PARAMS(print), filter, NULL)
+ PARAMS(tstruct), PARAMS(print), NULL)

bool ftrace_event_is_function(struct trace_event_call *call)
{
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1552a95c743b..618267647885 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1540,10 +1540,18 @@ static inline void init_trace_event_call(struct trace_kprobe *tk)

if (trace_kprobe_is_return(tk)) {
+ static struct trace_event_fields kretprobe_fields_array[] = {
+ { .type = "__function__",
+ .define_fields = kretprobe_event_define_fields },
+ {}
+ };
+
 call->event.funcs = &kretprobe_funcs;
- call->class->define_fields = kretprobe_event_define_fields;
+ call->class->fields_array = kretprobe_fields_array;
 } else {
+ static struct trace_event_fields kprobe_fields_array[] = {
+ { .type = "__function__",
+ .define_fields = kprobe_event_define_fields },
+ {}
+ };
+
 call->event.funcs = &kprobe_funcs;
- call->class->define_fields = kprobe_event_define_fields;
+ call->class->fields_array = kprobe_fields_array;
}

call->flags = TRACE_EVENT_FL_KPROBE;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index fa8fbff736d6..766557a99385 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -198,11 +198,10 @@ print_syscall_exit(struct trace_iterator *iter, int flags,

extern char *__bad_type_size(void);

-#define SYSCALL_FIELD(type, field, name) \
- sizeof(type) != sizeof(trace.field) ? \
- __bad_type_size() : \
- #type, #name, offsetof(typeof(trace), field), \
- sizeof(trace.field), is_signed_type(type)
+#define SYSCALL_FIELD(_type, _name) { \
+ .type = #_type, .name = #_name, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER }

static int __init
__set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
@@ -269,42 +268,22 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
{
struct syscall_trace_enter trace;
struct syscall_metadata *meta = call->data;
- int ret;
- int i;
int offset = offsetof(typeof(trace), args);
-
- ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
- FILTER_OTHER);
- if (ret)
- return ret;
+ int ret = 0, i;

for (i = 0; i < meta->nb_args; i++) {
ret = trace_define_field(call, meta->types[i],
meta->args[i], offset,
sizeof(unsigned long), 0,
FILTER_OTHER);
+ if (ret)
+ break;
offset += sizeof(unsigned long);
}

return ret;
}

-static int __init syscall_exit_define_fields(struct trace_event_call *call)
-{
- struct syscall_trace_exit trace;
- int ret;
-
- ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
- FILTER_OTHER);
- if (ret)
- return ret;
-
- ret = trace_define_field(call, SYSCALL_FIELD(long, ret, ret),
- FILTER_OTHER);
-
- return ret;
-}
-
static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
{
struct trace_array *tr = data;
@@ -513,7 +492,12 @@ struct trace_event_functions exit_syscall_print_funcs = {
struct trace_event_class __refdata event_class_syscall_enter = {
.system = "syscalls",
.reg = syscall_enter_register,
- .define_fields = syscall_enter_define_fields,
+ .fields_array = (struct trace_event_fields[]) {
+ SYSCALL_FIELD(int, __syscall_nr),
+ { .type = "__function__",
+ .define_fields = syscall_enter_define_fields },
+ {}
+ },
.get_fields = syscall_get_enter_fields,
.raw_init = init_syscall_trace,
};
@@ -521,7 +505,11 @@ struct trace_event_class __refdata event_class_syscall_enter = {
struct trace_event_class __refdata event_class_syscall_exit = {
.system = "syscalls",
.reg = syscall_exit_register,
- .define_fields = syscall_exit_define_fields,
+ .fields_array = (struct trace_event_fields[]){
+ SYSCALL_FIELD(int, __syscall_nr),
+ SYSCALL_FIELD(long, ret),
+ {}
+ },
.fields = LIST_HEAD_INIT(event_class_syscall_exit.fields),
.raw_init = init_syscall_trace,
};
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 352073d36585..9aa866c62354 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1512,7 +1512,11 @@ static inline void init_trace_event_call(struct trace_uprobe *tu)
struct trace_event_call *call = trace_probe_event_call(&tu->tp);

call->event.funcs = &uprobe_funcs;
- call->class->define_fields = uprobe_event_define_fields;
+ call->class->fields_array = (struct trace_event_fields[]) {
+ { .type = "__function__",
+ .define_fields = uprobe_event_define_fields },
+ {}
+ };

call->flags = TRACE_EVENT_FL_UPROBE | TRACE_EVENT_FL_CAP_ANY;
call->class->reg = trace_uprobe_register;

2019-10-22 23:55:15

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Tue, 22 Oct 2019 22:24:01 +0200
Peter Zijlstra <[email protected]> wrote:

> The below seems to cure it, and seems to generate identical
> events/*/format output (for my .config, with the exception of ID).
>
> It has just one section mismatch report that I'm too tired to look at
> just now.

Thanks, I'll try to take a look at it tomorrow.

>
> I'm not particularly proud of the "__function__" hack, but it works :/ I

If anything, that should be defined as a macro:

#define TRACE_EVENT_FIELD_SPECIAL "__trace_event_special__"

And use that to test?

> couldn't come up with anything else for [uk]probes which seem to have
> dynamic fields and if we're having it then syscall_enter can also make
> use of it, the syscall_metadata crud was going to be ugly otherwise.
>
> (also, win on LOC)

I'm more worried about text/data bloat. But if anything, we may be able
to deal with that later.

-- Steve

2019-10-23 09:10:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Tue, Oct 22, 2019 at 04:40:23PM -0400, Steven Rostedt wrote:
> On Tue, 22 Oct 2019 22:24:01 +0200
> Peter Zijlstra <[email protected]> wrote:

> > I'm not particularly proud of the "__function__" hack, but it works :/ I
>
> If anything, that should be defined as a macro:
>
> #define TRACE_EVENT_FIELD_SPECIAL "__trace_event_special__"
>
> And use that to test?

Possibly. Also, we should probably start with a character that C doesn't
allow in type names, like '$'.

That way we can have a much shorter string and still be certain it will
never conflict; "$func" comes to mind.
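
As a standalone sketch of that idea (names here are hypothetical; the
patch further down the thread ended up using a pointer sentinel,
TRACE_FUNCTION_TYPE, rather than a magic string):

#include <stdbool.h>
#include <stdio.h>

#define TRACE_FIELD_FUNC "$func"	/* '$' cannot start a C type name */

struct field { const char *type; };

static bool field_is_func(const struct field *f)
{
	/* a one-byte test; no strcmp() against a long magic string */
	return f->type[0] == '$';
}

int main(void)
{
	struct field normal  = { .type = "unsigned long" };
	struct field special = { .type = TRACE_FIELD_FUNC };

	printf("%d %d\n", field_is_func(&normal), field_is_func(&special));
	return 0;	/* prints: 0 1 */
}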

> > couldn't come up with anything else for [uk]probes which seem to have
> > dynamic fields and if we're having it then syscall_enter can also make
> > use of it, the syscall_metadata crud was going to be ugly otherwise.
> >
> > (also, win on LOC)
>
> I'm more worried about text/data bloat. But if anything, we may be able
> to deal with that later.

We use almost the exact same data the function would've used, except we
don't have the actual function. I just don't see how it can be more.

2019-10-23 16:22:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:

> > Doesn't livepatch code also need to be modified? We have:
>
> Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
> arm64/ftrace and klp are the only two users of that function (outside of
> module.c) and Mark was already writing a patch for arm64.
>
> Means these last two patches need to wait a little until we've fixed
> those.

So the other KLP issue:

<mbenes> peterz: ad klp, apply_relocate_add() and text_poke()... what
about apply_alternatives() and apply_paravirt()? They call
text_poke_early(), which was ok with module_disable/enable_ro(), but
now it is not, I suppose. See arch_klp_init_object_loaded() in
arch/x86/kernel/livepatch.c.

<peterz> mbenes: hurm, I don't see why we would need to do
apply_alternatives() / apply_paravirt() later, why can't we do that
the moment we load the module

<peterz> mbenes: that is, those things _never_ change after boot

<mbenes> peterz: as jpoimboe explained. See commit
d4c3e6e1b193497da3a2ce495fb1db0243e41e37 for detailed explanation.

Now sadly that commit missed all the useful information; luckily I could
find the patch in my LKML folder. Sadder still, that thread didn't
contain the actual useful information either; for that I was directed to
github:

https://github.com/dynup/kpatch/issues/580

Now, someone is owing me a beer for having to look at github for this.

That finally explained what happens: the RELA was trying to fix up the
paravirt indirect call to 'local_irq_disable', which apply_paravirt()
will have overwritten with 'CLI; NOP'. This then obviously goes *bang*.
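
To make the failure mode concrete, a userspace toy model (instruction
encoding and values illustrative only; the real site is an indirect call
through the paravirt ops table):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	/* a 7-byte paravirt call site: ff 14 25 imm32 == call *imm32 */
	unsigned char site[7] = { 0xff, 0x14, 0x25, 0x00, 0x00, 0x00, 0x00 };

	/* apply_paravirt() rewrites the whole site to cli + nop padding */
	const unsigned char cli_nops[7] =
		{ 0xfa, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90 };
	memcpy(site, cli_nops, sizeof(site));

	/* a late RELA still thinks the call's 4-byte operand lives at
	 * offset +3 and writes a symbol address over the patched bytes */
	uint32_t sym = 0xdeadbeef;
	memcpy(site + 3, &sym, sizeof(sym));

	/* fa 90 90 ef be ad de -- no longer cli+nops, not a call either */
	for (size_t i = 0; i < sizeof(site); i++)
		printf("%02x ", site[i]);
	printf("\n");
	return 0;
}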

This then raises a number of questions:

1) why is that RELA (that obviously does not depend on any module)
applied so late?

2) why can't we unconditionally skip RELA's to paravirt sites?

3) Is there ever a possible module-dependent RELA to a paravirt /
alternative site?


Now, for 1), I would propose '.klp.rela.${mod}' sections only contain
RELAs that depend on symbols in ${mod} (or modules in general). We can
fix up RELAs that depend on core kernel early without problems. Let them
be in the normal .rela sections and be fixed up on loading the
patch-module as per usual.

This should also deal with 2, paravirt should always have RELAs into the
core kernel.

Then for 3) we only have alternatives left, and I _think_ it unlikely to
be the case, but I'll have to have a hard look at that.

Hmmm ?

2019-10-24 03:30:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Wed, Oct 23, 2019 at 01:48:35PM +0200, Peter Zijlstra wrote:
> On Mon, Oct 21, 2019 at 04:14:02PM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 21, 2019 at 08:53:12AM -0500, Josh Poimboeuf wrote:
>
> > > Doesn't livepatch code also need to be modified? We have:
> >
> > Urgh bah.. I was too focussed on the other klp borkage :/ But yes,
> > arm64/ftrace and klp are the only two users of that function (outside of
> > module.c) and Mark was already writing a patch for arm64.
> >
> > Means these last two patches need to wait a little until we've fixed
> > those.
>
> So the other KLP issue:
>
> <mbenes> peterz: ad klp, apply_relocate_add() and text_poke()... what
> about apply_alternatives() and apply_paravirt()? They call
> text_poke_early(), which was ok with module_disable/enable_ro(), but
> now it is not, I suppose. See arch_klp_init_object_loaded() in
> arch/x86/kernel/livepatch.c.
>
> <peterz> mbenes: hurm, I don't see why we would need to do
> apply_alternatives() / apply_paravirt() later, why can't we do that
> the moment we load the module
>
> <peterz> mbenes: that is, those things _never_ change after boot
>
> <mbenes> peterz: as jpoimboe explained. See commit
> d4c3e6e1b193497da3a2ce495fb1db0243e41e37 for detailed explanation.
>
> Now sadly that commit missed all the useful information; luckily I could
> find the patch in my LKML folder. Sadder still, that thread didn't
> contain the actual useful information either; for that I was directed to
> github:
>
> https://github.com/dynup/kpatch/issues/580
>
> Now, someone is owing me a beer for having to look at github for this.
>
> That finally explained what happens: the RELA was trying to fix up the
> paravirt indirect call to 'local_irq_disable', which apply_paravirt()
> will have overwritten with 'CLI; NOP'. This then obviously goes *bang*.
>
> This then raises a number of questions:
>
> 1) why is that RELA (that obviously does not depend on any module)
> applied so late?
>
> 2) why can't we unconditionally skip RELA's to paravirt sites?
>
> 3) Is there ever a possible module-dependent RELA to a paravirt /
> alternative site?
>
>
> Now, for 1), I would propose '.klp.rela.${mod}' sections only contain
> RELAs that depend on symbols in ${mod} (or modules in general). We can
> fix up RELAs that depend on core kernel early without problems. Let them
> be in the normal .rela sections and be fixed up on loading the
> patch-module as per usual.
>
> This should also deal with 2, paravirt should always have RELAs into the
> core kernel.
>
> Then for 3) we only have alternatives left, and I _think_ it unlikely to
> be the case, but I'll have to have a hard look at that.
>
> Hmmm ?

Something like so on top, I suppose.

---

--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -131,7 +131,8 @@ static int __apply_relocate_add(Elf64_Sh
unsigned int symindex,
unsigned int relsec,
struct module *me,
- void *(*write)(void *addr, const void *val, size_t size))
+ void *(*write)(void *addr, const void *val, size_t size),
+ bool klp)
{
unsigned int i;
Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
@@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh

val = sym->st_value + rel[i].r_addend;

+ /*
+ * .klp.rela.* sections should only contain module
+ * related RELAs. All core-kernel RELAs should be in
+ * normal .rela.* sections and be applied when loading
+ * the patch module itself.
+ */
+ WARN_ON_ONCE(klp && core_kernel_text(val));
+
switch (ELF64_R_TYPE(rel[i].r_info)) {
case R_X86_64_NONE:
break;
@@ -223,7 +232,7 @@ int apply_relocate_add(Elf64_Shdr *sechd
unsigned int relsec,
struct module *me)
{
- return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy);
+ return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy, false);
}

int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
@@ -234,7 +243,7 @@ int klp_apply_relocate_add(Elf64_Shdr *s
{
int ret;

- ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke);
+ ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke, true);
if (!ret)
text_poke_sync();

2019-10-24 11:43:59

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Wed, Oct 23, 2019 at 01:48:35PM +0200, Peter Zijlstra wrote:
> Now sadly that commit missed all the useful information; luckily I could
> find the patch in my LKML folder. Sadder still, that thread didn't
> contain the actual useful information either; for that I was directed to
> github:
>
> https://github.com/dynup/kpatch/issues/580
>
> Now, someone is owing me a beer for having to look at github for this.

Deal. And you probably deserve a few more for fixing our crap.

The github thing is supposed to be temporary; at least in theory we'll
eventually have all the klp patch-module building code in the kernel tree.

> That finally explained what happens: the RELA was trying to fix up the
> paravirt indirect call to 'local_irq_disable', which apply_paravirt()
> will have overwritten with 'CLI; NOP'. This then obviously goes *bang*.
>
> This then raises a number of questions:
>
> 1) why is that RELA (that obviously does not depend on any module)
> applied so late?

Good question. The 'pv_ops' symbol is exported by the core kernel, so I
can't see any reason why we'd need to apply that rela late. In theory,
kpatch-build isn't supposed to convert that to a klp rela. Maybe
something went wrong in the patch creation code.

I'm also questioning why we even need to apply the parainstructions
section late. Maybe we can remove that apply_paravirt() call
altogether, along with .klp.arch.parainstruction sections.

I'll need to look into it...

> 2) why can't we unconditionally skip RELA's to paravirt sites?

We could, but I don't think it's needed if we fix #1.

> 3) Is there ever a possible module-dependent RELA to a paravirt /
> alternative site?

Good question...

> Now, for 1), I would propose '.klp.rela.${mod}' sections only contain
> RELAs that depend on symbols in ${mod} (or modules in general).

That was already the goal, but we've apparently failed at that.

> We can fix up RELAs that depend on core kernel early without problems.
> Let them be in the normal .rela sections and be fixed up on loading
> the patch-module as per usual.

If such symbols aren't exported, then they still need to be in
.klp.rela.vmlinux sections, since normal relas won't work.

> This should also deal with 2, paravirt should always have RELAs into the
> core kernel.
>
> Then for 3) we only have alternatives left, and I _think_ it unlikely to
> be the case, but I'll have to have a hard look at that.

I'm not sure about alternatives, but maybe we can enforce such
limitations with tooling and/or kernel checks.

--
Josh

2019-10-24 11:46:05

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Wed, Oct 23, 2019 at 05:16:54PM +0200, Peter Zijlstra wrote:
> @@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh
>
> val = sym->st_value + rel[i].r_addend;
>
> + /*
> + * .klp.rela.* sections should only contain module
> + * related RELAs. All core-kernel RELAs should be in
> + * normal .rela.* sections and be applied when loading
> + * the patch module itself.
> + */
> + WARN_ON_ONCE(klp && core_kernel_text(val));
> +

This isn't quite true; we also use .klp.rela sections to access
unexported vmlinux symbols.

--
Josh

2019-10-24 13:13:05

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Tue, 22 Oct 2019 22:24:01 +0200
Peter Zijlstra <[email protected]> wrote:

> On Mon, Oct 21, 2019 at 10:21:10PM -0400, Steven Rostedt wrote:
> > On Fri, 18 Oct 2019 09:35:40 +0200
> > Peter Zijlstra <[email protected]> wrote:
> >
> > > Now that set_all_modules_text_*() is gone, nothing depends on the
> > > relation between ->state = COMING and the protection state anymore.
> > > This enables moving the protection changes later, such that the COMING
> > > notifier callbacks can more easily modify the text.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > > Cc: Jessica Yu <[email protected]>
> > > ---
> >
> > This triggered the following bug:
> >
>
> > The trace_event_define_fields_<event>() is defined in
> > include/trace/trace_events.h and is an init function called by the
> > trace_events event_create_dir() via the module notifier:
> > MODULE_STATE_COMING
>
> The below seems to cure it, and seems to generate identical
> events/*/format output (for my .config, with the exception of ID).
>
> It has just one section mismatch report that I'm too tired to look at
> just now.
>
> I'm not particularly proud of the "__function__" hack, but it works :/ I
> couldn't come up with anything else for [uk]probes which seem to have
> dynamic fields and if we're having it then syscall_enter can also make
> use of it, the syscall_metadata crud was going to be ugly otherwise.
>
> (also, win on LOC)
>
>

After applying this series and this patch I triggered this:

[ 1397.281889] BUG: kernel NULL pointer dereference, address: 0000000000000001
[ 1397.288896] #PF: supervisor read access in kernel mode
[ 1397.294062] #PF: error_code(0x0000) - not-present page
[ 1397.299192] PGD 0 P4D 0
[ 1397.301728] Oops: 0000 [#1] PREEMPT SMP PTI
[ 1397.305908] CPU: 7 PID: 4252 Comm: ftracetest Not tainted 5.4.0-rc3-test+ #132
[ 1397.313114] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
[ 1397.322056] RIP: 0010:event_create_dir+0x26a/0x520
[ 1397.326841] Code: ff ff 5a 85 c0 75 37 44 03 7b 10 48 83 c3 20 4c 8b 13 4d 85 d2 0f 84 66 fe ff ff b9 0d 00 00 00 4c 89 d6 4c 89 f7 48 8b 53 08 <f3> a6 0f 97 c1 80 d9 00 84 c9 75 a5 48 89 ef e8 b2 d4 a3 00 85 c0
[ 1397.345558] RSP: 0018:ffffc90000a63d18 EFLAGS: 00010202
[ 1397.350775] RAX: 0000000000000000 RBX: ffffc90000a63d80 RCX: 000000000000000d
[ 1397.357893] RDX: ffffffff811ccfac RSI: 0000000000000001 RDI: ffffffff8207c68a
[ 1397.365006] RBP: ffff888114b1c750 R08: 0000000000000000 R09: ffff8881147561b0
[ 1397.372119] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880b1d80528
[ 1397.379243] R13: ffff88811189aeb0 R14: ffffffff8207c68a R15: 00000000811d2d22
[ 1397.386365] FS: 00007f2567213740(0000) GS:ffff88811a9c0000(0000) knlGS:0000000000000000
[ 1397.394437] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1397.400174] CR2: 0000000000000001 CR3: 00000000b1f06005 CR4: 00000000001606e0
[ 1397.407297] Call Trace:
[ 1397.409753] trace_add_event_call+0x6c/0xb0
[ 1397.413938] trace_probe_register_event_call+0x22/0x50
[ 1397.419071] trace_kprobe_create+0x65c/0xa20
[ 1397.423340] ? argv_split+0x99/0x130
[ 1397.426913] ? __kmalloc+0x1d4/0x2c0
[ 1397.430485] ? trace_kprobe_create+0xa20/0xa20
[ 1397.434922] ? trace_kprobe_create+0xa20/0xa20
[ 1397.439361] create_or_delete_trace_kprobe+0xd/0x30
[ 1397.444237] trace_run_command+0x72/0x90
[ 1397.448158] trace_parse_run_command+0xaf/0x131
[ 1397.452684] vfs_write+0xa5/0x1a0
[ 1397.455996] ksys_write+0x5c/0xd0
[ 1397.459312] do_syscall_64+0x48/0x120
[ 1397.462971] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1397.468017] RIP: 0033:0x7f2567303ff8

By running tools/testing/selftests/ftrace/ftracetest

Crashed here:

[33] Kprobe dynamic event - adding and removing [PASS]
[34] Kprobe dynamic event - busy event check [PASS]
[35] Kprobe event with comm arguments [FAIL]
[36] Kprobe event string type argument
client_loop:

-- Steve

2019-10-25 07:01:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Wed, Oct 23, 2019 at 02:52:45PM -0400, Steven Rostedt wrote:
> After applying this series and this patch I triggered this:

Bah, I hate C.

(also, for some reason I had KPROBE_EVENTS disabled; when I enabled it, it
failed on boot due to selftests)

this one seems to boot and survive your selftests thing (that takes for
bloody ever to run).

---
Subject: ftrace: Rework event_create_dir()

Rework event_create_dir() to use an array of static data instead of
function pointers where possible.

The problem is that it would call the function pointer on module load
before parse_args(), possibly even before jump_labels were initialized.

Luckily the generated functions don't use jump_labels but it still seems
fragile. It also gets in the way of changing when we make the module map
executable.

The generated functions are basically calling trace_define_field() with a
bunch of static arguments. So instead of a function, capture these
arguments in a static array, avoiding the function call.

Now there are a number of cases where the fields are dynamic (syscall
arguments, kprobes and uprobes), in which case a static array does not
work; for these we preserve the function call. Luckily none of these
cases are related to modules, so retaining the function call is safe
there.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
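
For readers skimming the diff, the shape of the transformation is
roughly this ('foo' and 'bar' are invented; trace_define_field() and the
field macros are the real ones):

/* before: each event emitted a generated __init function that did
 * little more than forward compile-time constants at runtime */
static int trace_event_define_fields_foo(struct trace_event_call *call)
{
	return trace_define_field(call, "int", "bar",
				  offsetof(struct trace_event_raw_foo, bar),
				  sizeof(int), is_signed_type(int),
				  FILTER_OTHER);
}

/* after: the same constants live in static data, and the core walks
 * the array, computing offsets from .size/.align itself */
static struct trace_event_fields trace_event_fields_foo[] = {
	{ .type = "int", .name = "bar",
	  .size = sizeof(int), .align = __alignof__(int),
	  .is_signed = is_signed_type(int), .filter_type = FILTER_OTHER },
	{}
};
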
fs/xfs/xfs_trace.h | 4 +-
include/linux/trace_events.h | 18 ++++++-
include/trace/events/filemap.h | 2 +-
include/trace/trace_events.h | 64 ++++++++-----------------
kernel/trace/trace.h | 31 ++++++------
kernel/trace/trace_entries.h | 66 +++++++------------------
kernel/trace/trace_events.c | 20 +++++++-
kernel/trace/trace_export.c | 106 +++++++++++++++--------------------------
kernel/trace/trace_kprobe.c | 16 ++++++-
kernel/trace/trace_syscalls.c | 50 ++++++++-----------
kernel/trace/trace_uprobe.c | 9 +++-
11 files changed, 172 insertions(+), 214 deletions(-)

diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index eaae275ed430..53c5485cf2a1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -218,8 +218,8 @@ DECLARE_EVENT_CLASS(xfs_bmap_class,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
- __field(void *, leaf);
- __field(int, pos);
+ __field(void *, leaf)
+ __field(int, pos)
__field(xfs_fileoff_t, startoff)
__field(xfs_fsblock_t, startblock)
__field(xfs_filblks_t, blockcount)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 30a8cdcfd4a4..a379255c14a9 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -187,6 +187,22 @@ enum trace_reg {

struct trace_event_call;

+#define TRACE_FUNCTION_TYPE ((const char *)~0UL)
+
+struct trace_event_fields {
+ const char *type;
+ union {
+ struct {
+ const char *name;
+ const int size;
+ const int align;
+ const int is_signed;
+ const int filter_type;
+ };
+ int (*define_fields)(struct trace_event_call *);
+ };
+};
+
struct trace_event_class {
const char *system;
void *probe;
@@ -195,7 +211,7 @@ struct trace_event_class {
#endif
int (*reg)(struct trace_event_call *event,
enum trace_reg type, void *data);
- int (*define_fields)(struct trace_event_call *);
+ struct trace_event_fields *fields_array;
struct list_head *(*get_fields)(struct trace_event_call *);
struct list_head fields;
int (*raw_init)(struct trace_event_call *);
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index ee05db7ee8d2..796053e162d2 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -85,7 +85,7 @@ TRACE_EVENT(file_check_and_advance_wb_err,
TP_ARGS(file, old),

TP_STRUCT__entry(
- __field(struct file *, file);
+ __field(struct file *, file)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
__field(errseq_t, old)
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4ecdfe2e3580..ca1d2e745a3f 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -394,22 +394,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)

#undef __field_ext
-#define __field_ext(type, item, filter_type) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __field_ext(_type, _item, _filter_type) { \
+ .type = #_type, .name = #_item, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = _filter_type },

#undef __field_struct_ext
-#define __field_struct_ext(type, item, filter_type) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- 0, filter_type); \
- if (ret) \
- return ret;
+#define __field_struct_ext(_type, _item, _filter_type) { \
+ .type = #_type, .name = #_item, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+	.is_signed = 0, .filter_type = _filter_type },

#undef __field
#define __field(type, item) __field_ext(type, item, FILTER_OTHER)
@@ -418,25 +412,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \
#define __field_struct(type, item) __field_struct_ext(type, item, FILTER_OTHER)

#undef __array
-#define __array(type, item, len) \
- do { \
- char *type_str = #type"["__stringify(len)"]"; \
- BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
- BUILD_BUG_ON(len <= 0); \
- ret = trace_define_field(event_call, type_str, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), FILTER_OTHER); \
- if (ret) \
- return ret; \
- } while (0);
+#define __array(_type, _item, _len) { \
+ .type = #_type"["__stringify(_len)"]", .name = #_item, \
+ .size = sizeof(_type[_len]), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __dynamic_array
-#define __dynamic_array(type, item, len) \
- ret = trace_define_field(event_call, "__data_loc " #type "[]", #item, \
- offsetof(typeof(field), __data_loc_##item), \
- sizeof(field.__data_loc_##item), \
- is_signed_type(type), FILTER_OTHER);
+#define __dynamic_array(_type, _item, _len) { \
+ .type = "__data_loc " #_type "[]", .name = #_item, \
+ .size = 4, .align = 4, \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __string
#define __string(item, src) __dynamic_array(char, item, -1)
@@ -446,16 +431,9 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \

#undef DECLARE_EVENT_CLASS
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, func, print) \
-static int notrace __init \
-trace_event_define_fields_##call(struct trace_event_call *event_call) \
-{ \
- struct trace_event_raw_##call field; \
- int ret; \
- \
- tstruct; \
- \
- return ret; \
-}
+static struct trace_event_fields trace_event_fields_##call[] = { \
+ tstruct \
+ {} };

#undef DEFINE_EVENT
#define DEFINE_EVENT(template, name, proto, args)
@@ -613,7 +591,7 @@ static inline notrace int trace_event_get_offsets_##call( \
*
* static struct trace_event_class __used event_class_<template> = {
* .system = "<system>",
- * .define_fields = trace_event_define_fields_<call>,
+ * .fields_array = trace_event_fields_<call>,
* .fields = LIST_HEAD_INIT(event_class_##call.fields),
* .raw_init = trace_event_raw_init,
* .probe = trace_event_raw_event_##call,
@@ -761,7 +739,7 @@ _TRACE_PERF_PROTO(call, PARAMS(proto)); \
static char print_fmt_##call[] = print; \
static struct trace_event_class __used __refdata event_class_##call = { \
.system = TRACE_SYSTEM_STRING, \
- .define_fields = trace_event_define_fields_##call, \
+ .fields_array = trace_event_fields_##call, \
.fields = LIST_HEAD_INIT(event_class_##call.fields),\
.raw_init = trace_event_raw_init, \
.probe = trace_event_raw_event_##call, \
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d685c61085c0..298a7cacf146 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -49,6 +49,9 @@ enum trace_type {
#undef __field
#define __field(type, item) type item;

+#undef __field_fn
+#define __field_fn(type, item) type item;
+
#undef __field_struct
#define __field_struct(type, item) __field(type, item)

@@ -68,26 +71,22 @@ enum trace_type {
#define F_STRUCT(args...) args

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter) \
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \
struct struct_name { \
struct trace_entry ent; \
tstruct \
}

#undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk, filter)
+#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)

#undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
- filter, regfn) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

#undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print, \
- filter) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter) __packed
+#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print)) __packed

#include "trace_entries.h"

@@ -1899,17 +1898,15 @@ extern void tracing_log_err(struct trace_array *tr,
#define internal_trace_puts(str) __trace_puts(_THIS_IP_, str, strlen(str))

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, id, tstruct, print, filter) \
+#define FTRACE_ENTRY(call, struct_name, id, tstruct, print) \
extern struct trace_event_call \
__aligned(4) event_##call;
#undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print, filter) \
- FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
#undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print, filter) \
- FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))

#include "trace_entries.h"

diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index fc8e97328e54..3e9d81608284 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -61,15 +61,13 @@ FTRACE_ENTRY_REG(function, ftrace_entry,
TRACE_FN,

F_STRUCT(
- __field( unsigned long, ip )
- __field( unsigned long, parent_ip )
+ __field_fn( unsigned long, ip )
+ __field_fn( unsigned long, parent_ip )
),

F_printk(" %ps <-- %ps",
(void *)__entry->ip, (void *)__entry->parent_ip),

- FILTER_TRACE_FN,
-
perf_ftrace_event_register
);

@@ -84,9 +82,7 @@ FTRACE_ENTRY_PACKED(funcgraph_entry, ftrace_graph_ent_entry,
__field_desc( int, graph_ent, depth )
),

- F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth),
-
- FILTER_OTHER
+ F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth)
);

/* Function return entry */
@@ -97,18 +93,16 @@ FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,
F_STRUCT(
__field_struct( struct ftrace_graph_ret, ret )
__field_desc( unsigned long, ret, func )
+ __field_desc( unsigned long, ret, overrun )
__field_desc( unsigned long long, ret, calltime)
__field_desc( unsigned long long, ret, rettime )
- __field_desc( unsigned long, ret, overrun )
__field_desc( int, ret, depth )
),

F_printk("<-- %ps (%d) (start: %llx end: %llx) over: %d",
(void *)__entry->func, __entry->depth,
__entry->calltime, __entry->rettime,
- __entry->depth),
-
- FILTER_OTHER
+ __entry->depth)
);

/*
@@ -137,9 +131,7 @@ FTRACE_ENTRY(context_switch, ctx_switch_entry,
F_printk("%u:%u:%u ==> %u:%u:%u [%03u]",
__entry->prev_pid, __entry->prev_prio, __entry->prev_state,
__entry->next_pid, __entry->next_prio, __entry->next_state,
- __entry->next_cpu),
-
- FILTER_OTHER
+ __entry->next_cpu)
);

/*
@@ -157,9 +149,7 @@ FTRACE_ENTRY_DUP(wakeup, ctx_switch_entry,
F_printk("%u:%u:%u ==+ %u:%u:%u [%03u]",
__entry->prev_pid, __entry->prev_prio, __entry->prev_state,
__entry->next_pid, __entry->next_prio, __entry->next_state,
- __entry->next_cpu),
-
- FILTER_OTHER
+ __entry->next_cpu)
);

/*
@@ -183,9 +173,7 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
(void *)__entry->caller[0], (void *)__entry->caller[1],
(void *)__entry->caller[2], (void *)__entry->caller[3],
(void *)__entry->caller[4], (void *)__entry->caller[5],
- (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
- FILTER_OTHER
+ (void *)__entry->caller[6], (void *)__entry->caller[7])
);

FTRACE_ENTRY(user_stack, userstack_entry,
@@ -203,9 +191,7 @@ FTRACE_ENTRY(user_stack, userstack_entry,
(void *)__entry->caller[0], (void *)__entry->caller[1],
(void *)__entry->caller[2], (void *)__entry->caller[3],
(void *)__entry->caller[4], (void *)__entry->caller[5],
- (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
- FILTER_OTHER
+ (void *)__entry->caller[6], (void *)__entry->caller[7])
);

/*
@@ -222,9 +208,7 @@ FTRACE_ENTRY(bprint, bprint_entry,
),

F_printk("%ps: %s",
- (void *)__entry->ip, __entry->fmt),
-
- FILTER_OTHER
+ (void *)__entry->ip, __entry->fmt)
);

FTRACE_ENTRY_REG(print, print_entry,
@@ -239,8 +223,6 @@ FTRACE_ENTRY_REG(print, print_entry,
F_printk("%ps: %s",
(void *)__entry->ip, __entry->buf),

- FILTER_OTHER,
-
ftrace_event_register
);

@@ -254,9 +236,7 @@ FTRACE_ENTRY(raw_data, raw_data_entry,
),

F_printk("id:%04x %08x",
- __entry->id, (int)__entry->buf[0]),
-
- FILTER_OTHER
+ __entry->id, (int)__entry->buf[0])
);

FTRACE_ENTRY(bputs, bputs_entry,
@@ -269,9 +249,7 @@ FTRACE_ENTRY(bputs, bputs_entry,
),

F_printk("%ps: %s",
- (void *)__entry->ip, __entry->str),
-
- FILTER_OTHER
+ (void *)__entry->ip, __entry->str)
);

FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
@@ -283,16 +261,14 @@ FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
__field_desc( resource_size_t, rw, phys )
__field_desc( unsigned long, rw, value )
__field_desc( unsigned long, rw, pc )
- __field_desc( int, rw, map_id )
+ __field_desc( int, rw, map_id )
__field_desc( unsigned char, rw, opcode )
__field_desc( unsigned char, rw, width )
),

F_printk("%lx %lx %lx %d %x %x",
(unsigned long)__entry->phys, __entry->value, __entry->pc,
- __entry->map_id, __entry->opcode, __entry->width),
-
- FILTER_OTHER
+ __entry->map_id, __entry->opcode, __entry->width)
);

FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
@@ -304,15 +280,13 @@ FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
__field_desc( resource_size_t, map, phys )
__field_desc( unsigned long, map, virt )
__field_desc( unsigned long, map, len )
- __field_desc( int, map, map_id )
+ __field_desc( int, map, map_id )
__field_desc( unsigned char, map, opcode )
),

F_printk("%lx %lx %lx %d %x",
(unsigned long)__entry->phys, __entry->virt, __entry->len,
- __entry->map_id, __entry->opcode),
-
- FILTER_OTHER
+ __entry->map_id, __entry->opcode)
);


@@ -334,9 +308,7 @@ FTRACE_ENTRY(branch, trace_branch,
F_printk("%u:%s:%s (%u)%s",
__entry->line,
__entry->func, __entry->file, __entry->correct,
- __entry->constant ? " CONSTANT" : ""),
-
- FILTER_OTHER
+ __entry->constant ? " CONSTANT" : "")
);


@@ -362,7 +334,5 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
__entry->duration,
__entry->outer_duration,
__entry->nmi_total_ts,
- __entry->nmi_count),
-
- FILTER_OTHER
+ __entry->nmi_count)
);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index fba87d10f0c1..5ab10c3dce78 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -24,6 +24,7 @@
#include <linux/delay.h>

#include <trace/events/sched.h>
+#include <trace/syscall.h>

#include <asm/setup.h>

@@ -1990,7 +1991,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
*/
head = trace_get_fields(call);
if (list_empty(head)) {
- ret = call->class->define_fields(call);
+ struct trace_event_fields *field = call->class->fields_array;
+ unsigned int offset = sizeof(struct trace_entry);
+
+ for (; field->type; field++) {
+ if (field->type == TRACE_FUNCTION_TYPE) {
+ ret = field->define_fields(call);
+ break;
+ }
+
+ offset = ALIGN(offset, field->align);
+ ret = trace_define_field(call, field->type, field->name,
+ offset, field->size,
+ field->is_signed, field->filter_type);
+ if (ret)
+ break;
+
+ offset += field->size;
+ }
if (ret < 0) {
pr_warn("Could not initialize trace point events/%s\n",
name);
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 45630a76ed3a..6d64c1c19fd5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -29,10 +29,8 @@ static int ftrace_event_register(struct trace_event_call *call,
* function and thus become accesible via perf.
*/
#undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
- filter, regfn) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

/* not needed for this file */
#undef __field_struct
@@ -41,6 +39,9 @@ static int ftrace_event_register(struct trace_event_call *call,
#undef __field
#define __field(type, item) type item;

+#undef __field_fn
+#define __field_fn(type, item) type item;
+
#undef __field_desc
#define __field_desc(type, container, item) type item;

@@ -60,7 +61,7 @@ static int ftrace_event_register(struct trace_event_call *call,
#define F_printk(fmt, args...) fmt, args

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter) \
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \
struct ____ftrace_##name { \
tstruct \
}; \
@@ -73,76 +74,46 @@ static void __always_unused ____ftrace_check_##name(void) \
}

#undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print, filter) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

#include "trace_entries.h"

+#undef __field_ext
+#define __field_ext(_type, _item, _filter_type) { \
+ .type = #_type, .name = #_item, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+	.is_signed = is_signed_type(_type), .filter_type = _filter_type },
+
#undef __field
-#define __field(type, item) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __field(_type, _item) __field_ext(_type, _item, FILTER_OTHER)
+
+#undef __field_fn
+#define __field_fn(_type, _item) __field_ext(_type, _item, FILTER_TRACE_FN)

#undef __field_desc
-#define __field_desc(type, container, item) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), \
- container.item), \
- sizeof(field.container.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __field_desc(_type, _container, _item) __field_ext(_type, _item, FILTER_OTHER)

#undef __array
-#define __array(type, item, len) \
- do { \
- char *type_str = #type"["__stringify(len)"]"; \
- BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
- ret = trace_define_field(event_call, type_str, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret; \
- } while (0);
+#define __array(_type, _item, _len) { \
+ .type = #_type"["__stringify(_len)"]", .name = #_item, \
+ .size = sizeof(_type[_len]), .align = __alignof__(_type), \
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __array_desc
-#define __array_desc(type, container, item, len) \
- BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
- ret = trace_define_field(event_call, #type "[" #len "]", #item, \
- offsetof(typeof(field), \
- container.item), \
- sizeof(field.container.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __array_desc(_type, _container, _item, _len) __array(_type, _item, _len)

#undef __dynamic_array
-#define __dynamic_array(type, item) \
- ret = trace_define_field(event_call, #type "[]", #item, \
- offsetof(typeof(field), item), \
- 0, is_signed_type(type), filter_type);\
- if (ret) \
- return ret;
+#define __dynamic_array(_type, _item) { \
+ .type = #_type "[]", .name = #_item, \
+ .size = 0, .align = __alignof__(_type), \
+	.is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter) \
-static int __init \
-ftrace_define_fields_##name(struct trace_event_call *event_call) \
-{ \
- struct struct_name field; \
- int ret; \
- int filter_type = filter; \
- \
- tstruct; \
- \
- return ret; \
-}
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \
+static struct trace_event_fields ftrace_event_fields_##name[] = { \
+ tstruct \
+ {} };

#include "trace_entries.h"

@@ -152,6 +123,9 @@ ftrace_define_fields_##name(struct trace_event_call *event_call) \
#undef __field
#define __field(type, item)

+#undef __field_fn
+#define __field_fn(type, item)
+
#undef __field_desc
#define __field_desc(type, container, item)

@@ -168,12 +142,10 @@ ftrace_define_fields_##name(struct trace_event_call *event_call) \
#define F_printk(fmt, args...) __stringify(fmt) ", " __stringify(args)

#undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, filter,\
- regfn) \
- \
+#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn) \
struct trace_event_class __refdata event_class_ftrace_##call = { \
.system = __stringify(TRACE_SYSTEM), \
- .define_fields = ftrace_define_fields_##call, \
+ .fields_array = ftrace_event_fields_##call, \
.fields = LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
.reg = regfn, \
}; \
@@ -191,9 +163,9 @@ struct trace_event_call __used \
__attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print, filter) \
+#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print) \
FTRACE_ENTRY_REG(call, struct_name, etype, \
- PARAMS(tstruct), PARAMS(print), filter, NULL)
+ PARAMS(tstruct), PARAMS(print), NULL)

bool ftrace_event_is_function(struct trace_event_call *call)
{
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1552a95c743b..66e0a8ff1c01 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1534,16 +1534,28 @@ static struct trace_event_functions kprobe_funcs = {
.trace = print_kprobe_event
};

+static struct trace_event_fields kretprobe_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = kretprobe_event_define_fields },
+ {}
+};
+
+static struct trace_event_fields kprobe_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = kprobe_event_define_fields },
+ {}
+};
+
static inline void init_trace_event_call(struct trace_kprobe *tk)
{
struct trace_event_call *call = trace_probe_event_call(&tk->tp);

if (trace_kprobe_is_return(tk)) {
call->event.funcs = &kretprobe_funcs;
- call->class->define_fields = kretprobe_event_define_fields;
+ call->class->fields_array = kretprobe_fields_array;
} else {
call->event.funcs = &kprobe_funcs;
- call->class->define_fields = kprobe_event_define_fields;
+ call->class->fields_array = kprobe_fields_array;
}

call->flags = TRACE_EVENT_FL_KPROBE;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index fa8fbff736d6..53935259f701 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -198,11 +198,10 @@ print_syscall_exit(struct trace_iterator *iter, int flags,

extern char *__bad_type_size(void);

-#define SYSCALL_FIELD(type, field, name) \
- sizeof(type) != sizeof(trace.field) ? \
- __bad_type_size() : \
- #type, #name, offsetof(typeof(trace), field), \
- sizeof(trace.field), is_signed_type(type)
+#define SYSCALL_FIELD(_type, _name) { \
+ .type = #_type, .name = #_name, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER }

static int __init
__set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
@@ -269,42 +268,22 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
{
struct syscall_trace_enter trace;
struct syscall_metadata *meta = call->data;
- int ret;
- int i;
int offset = offsetof(typeof(trace), args);
-
- ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
- FILTER_OTHER);
- if (ret)
- return ret;
+	int ret = 0, i;

for (i = 0; i < meta->nb_args; i++) {
ret = trace_define_field(call, meta->types[i],
meta->args[i], offset,
sizeof(unsigned long), 0,
FILTER_OTHER);
+ if (ret)
+ break;
offset += sizeof(unsigned long);
}

return ret;
}

-static int __init syscall_exit_define_fields(struct trace_event_call *call)
-{
- struct syscall_trace_exit trace;
- int ret;
-
- ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
- FILTER_OTHER);
- if (ret)
- return ret;
-
- ret = trace_define_field(call, SYSCALL_FIELD(long, ret, ret),
- FILTER_OTHER);
-
- return ret;
-}
-
static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
{
struct trace_array *tr = data;
@@ -502,6 +481,13 @@ static int __init init_syscall_trace(struct trace_event_call *call)
return id;
}

+static struct trace_event_fields __refdata syscall_enter_fields_array[] = {
+ SYSCALL_FIELD(int, __syscall_nr),
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = syscall_enter_define_fields },
+ {}
+};
+
struct trace_event_functions enter_syscall_print_funcs = {
.trace = print_syscall_enter,
};
@@ -513,7 +499,7 @@ struct trace_event_functions exit_syscall_print_funcs = {
struct trace_event_class __refdata event_class_syscall_enter = {
.system = "syscalls",
.reg = syscall_enter_register,
- .define_fields = syscall_enter_define_fields,
+ .fields_array = syscall_enter_fields_array,
.get_fields = syscall_get_enter_fields,
.raw_init = init_syscall_trace,
};
@@ -521,7 +507,11 @@ struct trace_event_class __refdata event_class_syscall_enter = {
struct trace_event_class __refdata event_class_syscall_exit = {
.system = "syscalls",
.reg = syscall_exit_register,
- .define_fields = syscall_exit_define_fields,
+ .fields_array = (struct trace_event_fields[]){
+ SYSCALL_FIELD(int, __syscall_nr),
+ SYSCALL_FIELD(long, ret),
+ {}
+ },
.fields = LIST_HEAD_INIT(event_class_syscall_exit.fields),
.raw_init = init_syscall_trace,
};
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 352073d36585..476a382f1f1b 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1507,12 +1507,17 @@ static struct trace_event_functions uprobe_funcs = {
.trace = print_uprobe_event
};

+static struct trace_event_fields uprobe_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = uprobe_event_define_fields },
+ {}
+};
+
static inline void init_trace_event_call(struct trace_uprobe *tu)
{
struct trace_event_call *call = trace_probe_event_call(&tu->tp);
-
call->event.funcs = &uprobe_funcs;
- call->class->define_fields = uprobe_event_define_fields;
+ call->class->fields_array = uprobe_fields_array;

call->flags = TRACE_EVENT_FL_UPROBE | TRACE_EVENT_FL_CAP_ANY;
call->class->reg = trace_uprobe_register;

2019-10-25 07:38:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Thu, Oct 24, 2019 at 12:16:09PM +0200, Peter Zijlstra wrote:
> +struct trace_event_fields {
> + const char *type;
> + union {
> + struct {
> + const char *name;
> + const int size;
> + const int align;
> + const int is_signed;
> + const int filter_type;

FWIW, I suspect we can do:

unsigned char size;
unsigned char align;
unsigned char is_signed;
unsigned char filter_type;

And save us some 8 bytes per entry (12 on 32bit).

> + };
> + int (*define_fields)(struct trace_event_call *);
> + };
> +};
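
A quick userspace check of the arithmetic (struct names invented, the
layouts copied from the quote above):

#include <stdio.h>

struct trace_event_call;

struct fields_int {		/* the struct as in the patch */
	const char *type;
	union {
		struct {
			const char *name;
			const int size;
			const int align;
			const int is_signed;
			const int filter_type;
		};
		int (*define_fields)(struct trace_event_call *);
	};
};

struct fields_uchar {		/* with the suggested unsigned char members */
	const char *type;
	union {
		struct {
			const char *name;
			unsigned char size;
			unsigned char align;
			unsigned char is_signed;
			unsigned char filter_type;
		};
		int (*define_fields)(struct trace_event_call *);
	};
};

int main(void)
{
	/* x86-64: 32 vs 24 bytes, i.e. the 8 bytes per entry claimed */
	printf("int: %zu, uchar: %zu\n",
	       sizeof(struct fields_int), sizeof(struct fields_uchar));
	return 0;
}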

2019-10-25 09:44:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Wed, Oct 23, 2019 at 12:15:14PM -0500, Josh Poimboeuf wrote:
> On Wed, Oct 23, 2019 at 05:16:54PM +0200, Peter Zijlstra wrote:
> > @@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh
> >
> > val = sym->st_value + rel[i].r_addend;
> >
> > + /*
> > + * .klp.rela.* sections should only contain module
> > + * related RELAs. All core-kernel RELAs should be in
> > + * normal .rela.* sections and be applied when loading
> > + * the patch module itself.
> > + */
> > + WARN_ON_ONCE(klp && core_kernel_text(val));
> > +
>
> This isn't quite true; we also use .klp.rela sections to access
> unexported vmlinux symbols.

Yes, you said in that earlier email. That all makes it really hard to
validate this. But unless we validate it, it will stay buggy :/

Hmmm.... /me ponders

The alternative would be to process the .klp.rela.* sections twice: once
at patch-module load time, applying only the core_kernel_text()
entries, and once again later, skipping over them.

How's this?

---
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -126,12 +126,20 @@ int apply_relocate(Elf32_Shdr *sechdrs,
return 0;
}
#else /*X86_64*/
+
+enum rela_filter {
+ rela_all = 0,
+ rela_core,
+ rela_mod,
+};
+
static int __apply_relocate_add(Elf64_Shdr *sechdrs,
const char *strtab,
unsigned int symindex,
unsigned int relsec,
struct module *me,
- void *(*write)(void *addr, const void *val, size_t size))
+ void *(*write)(void *addr, const void *val, size_t size),
+ enum rela_filter filter)
{
unsigned int i;
Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
@@ -157,6 +165,11 @@ static int __apply_relocate_add(Elf64_Sh

val = sym->st_value + rel[i].r_addend;

+ if (filter) {
+ if ((filter == rela_core) != !!core_kernel_text(val))
+ continue;
+ }
+
switch (ELF64_R_TYPE(rel[i].r_info)) {
case R_X86_64_NONE:
break;
@@ -223,18 +236,21 @@ int apply_relocate_add(Elf64_Shdr *sechd
unsigned int relsec,
struct module *me)
{
- return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy);
+ return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy, rela_all);
}

int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
const char *strtab,
unsigned int symindex,
unsigned int relsec,
- struct module *me)
+ struct module *me, bool late)
{
int ret;

- ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke);
+ if (!late)
+ return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy, rela_core);
+
+ ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke, rela_mod);
if (!ret)
text_poke_sync();

--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -222,7 +222,7 @@ extern int klp_apply_relocate_add(Elf64_
const char *strtab,
unsigned int symindex,
unsigned int relsec,
- struct module *me);
+ struct module *me, bool late);

#else /* !CONFIG_LIVEPATCH */

--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -245,15 +245,6 @@ static int klp_resolve_symbols(Elf_Shdr
return 0;
}

-int __weak klp_apply_relocate_add(Elf64_Shdr *sechdrs,
- const char *strtab,
- unsigned int symindex,
- unsigned int relsec,
- struct module *me)
-{
- return apply_relocate_add(sechdrs, strtab, symindex, relsec, me);
-}
-
static int klp_write_object_relocations(struct module *pmod,
struct klp_object *obj)
{
@@ -296,7 +287,7 @@ static int klp_write_object_relocations(

ret = klp_apply_relocate_add(pmod->klp_info->sechdrs,
pmod->core_kallsyms.strtab,
- pmod->klp_info->symndx, i, pmod);
+ pmod->klp_info->symndx, i, pmod, true);
if (ret)
break;
}
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2328,8 +2328,13 @@ static int apply_relocations(struct modu
continue;

/* Livepatch relocation sections are applied by livepatch */
- if (info->sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
+ if (info->sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH) {
+ if (info->sechdrs[i].sh_type == SHT_RELA) {
+ klp_apply_relocate_add(info->sechdrs, info->strtab,
+ info->index.sym, i, mod, false);
+ }
continue;
+ }

if (info->sechdrs[i].sh_type == SHT_REL)
err = apply_relocate(info->sechdrs, info->strtab,

2019-10-25 13:56:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Wed, Oct 23, 2019 at 12:00:25PM -0500, Josh Poimboeuf wrote:

> > This then raises a number of questions:
> >
> > 1) why is that RELA (that obviously does not depend on any module)
> > applied so late?
>
> Good question. The 'pv_ops' symbol is exported by the core kernel, so I
> can't see any reason why we'd need to apply that rela late. In theory,
> kpatch-build isn't supposed to convert that to a klp rela. Maybe
> something went wrong in the patch creation code.
>
> I'm also questioning why we even need to apply the parainstructions
> section late. Maybe we can remove that apply_paravirt() call
> altogether, along with .klp.arch.parainstruction sections.
>
> I'll need to look into it...

Right, that really should be able to run early. Esp. after commit

11e86dc7f274 ("x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call()")

paravirt patching is unconditional. We _never_ run with the indirect
calls except during very early boot, and modules should have them
patched well before their init sections run.

We rely on this for spectre-v2 and friends.

> > 3) Is there ever a possible module-dependent RELA to a paravirt /
> > alternative site?
>
> Good question...

> > Then for 3) we only have alternatives left, and I _think_ it unlikely to
> > be the case, but I'll have to have a hard look at that.
>
> I'm not sure about alternatives, but maybe we can enforce such
> limitations with tooling and/or kernel checks.

Right, so on IRC you implied you might have some additional details on
how alternatives were affected; did you manage to dig that up?

2019-10-25 14:12:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Thu, Oct 24, 2019 at 11:00:24AM -0400, Steven Rostedt wrote:
> On Thu, 24 Oct 2019 12:16:09 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Wed, Oct 23, 2019 at 02:52:45PM -0400, Steven Rostedt wrote:
> > > After applying this series and this patch I triggered this:
> >
> > Bah, I hate C.
> >
> > (also, for some reason I had KPROBE_EVENTS disabled; when I enabled it, it
> > failed on boot due to selftests)
> >
> > this one seems to boot and survive your selftests thing (that takes for
> > bloody ever to run).
> >
>
> Care to enable CONFIG_HIST_TRIGGERS?
>
> CC [M] drivers/gpu/drm/i915/gem/i915_gem_context.o
> /work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
> /work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
> call->class->define_fields = synth_event_define_fields;
> ^~~~~~~~~~~~~
> get_fields
> make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
> make[3]: *** Waiting for unfinished jobs....

allmodconfig clean

(omg, so much __field(); fail)

---
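
Why the stray semicolons were harmless until now, sketched (event and
field names invented; gcc's dialect tolerates a doubled ';' in both of
the old expansion sites):

/* old expansion: __field(u64, ppa); landed in a struct definition (and,
 * for define_fields, in a function body); the doubled ';' slid through */
struct trace_event_raw_foo {
	unsigned long long ppa;;	/* empty declaration, a warning at most */
};

/* new expansion: the same ';' would land inside an array initializer,
 * where it is a hard syntax error, hence the tree-wide cleanup below */
static struct { const char *type, *name; } trace_event_fields_foo[] = {
	{ .type = "u64", .name = "ppa" },	/* add a ';' here and this
						 * no longer compiles */
	{}
};
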
drivers/infiniband/hw/hfi1/trace_tid.h | 8 +--
drivers/infiniband/hw/hfi1/trace_tx.h | 2 +-
drivers/lightnvm/pblk-trace.h | 8 +--
drivers/net/fjes/fjes_trace.h | 2 +-
drivers/net/wireless/ath/ath10k/trace.h | 6 +-
fs/xfs/scrub/trace.h | 6 +-
fs/xfs/xfs_trace.h | 4 +-
include/linux/trace_events.h | 18 +++++-
include/trace/events/filemap.h | 2 +-
include/trace/events/rpcrdma.h | 2 +-
include/trace/trace_events.h | 64 +++++++------------
kernel/trace/trace.h | 31 +++++-----
kernel/trace/trace_entries.h | 66 ++++++--------------
kernel/trace/trace_events.c | 20 +++++-
kernel/trace/trace_events_hist.c | 8 ++-
kernel/trace/trace_export.c | 106 ++++++++++++--------------------
kernel/trace/trace_kprobe.c | 16 ++++-
kernel/trace/trace_syscalls.c | 50 ++++++---------
kernel/trace/trace_uprobe.c | 9 ++-
net/mac80211/trace.h | 28 ++++-----
net/wireless/trace.h | 6 +-
21 files changed, 213 insertions(+), 249 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/trace_tid.h b/drivers/infiniband/hw/hfi1/trace_tid.h
index 343fb9894a82..985ffa9cc958 100644
--- a/drivers/infiniband/hw/hfi1/trace_tid.h
+++ b/drivers/infiniband/hw/hfi1/trace_tid.h
@@ -138,10 +138,10 @@ TRACE_EVENT(/* put_tid */
TP_ARGS(dd, index, type, pa, order),
TP_STRUCT__entry(/* entry */
DD_DEV_ENTRY(dd)
- __field(unsigned long, pa);
- __field(u32, index);
- __field(u32, type);
- __field(u16, order);
+ __field(unsigned long, pa)
+ __field(u32, index)
+ __field(u32, type)
+ __field(u16, order)
),
TP_fast_assign(/* assign */
DD_DEV_ASSIGN(dd);
diff --git a/drivers/infiniband/hw/hfi1/trace_tx.h b/drivers/infiniband/hw/hfi1/trace_tx.h
index 09eb0c9ada00..769e5e4710c6 100644
--- a/drivers/infiniband/hw/hfi1/trace_tx.h
+++ b/drivers/infiniband/hw/hfi1/trace_tx.h
@@ -588,7 +588,7 @@ TRACE_EVENT(hfi1_sdma_user_reqinfo,
TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u8 subctxt, u16 *i),
TP_ARGS(dd, ctxt, subctxt, i),
TP_STRUCT__entry(
- DD_DEV_ENTRY(dd);
+ DD_DEV_ENTRY(dd)
__field(u16, ctxt)
__field(u8, subctxt)
__field(u8, ver_opcode)
diff --git a/drivers/lightnvm/pblk-trace.h b/drivers/lightnvm/pblk-trace.h
index 9534503b69d9..47b67c6bff7a 100644
--- a/drivers/lightnvm/pblk-trace.h
+++ b/drivers/lightnvm/pblk-trace.h
@@ -46,7 +46,7 @@ TRACE_EVENT(pblk_chunk_reset,
TP_STRUCT__entry(
__string(name, name)
__field(u64, ppa)
- __field(int, state);
+ __field(int, state)
),

TP_fast_assign(
@@ -72,7 +72,7 @@ TRACE_EVENT(pblk_chunk_state,
TP_STRUCT__entry(
__string(name, name)
__field(u64, ppa)
- __field(int, state);
+ __field(int, state)
),

TP_fast_assign(
@@ -98,7 +98,7 @@ TRACE_EVENT(pblk_line_state,
TP_STRUCT__entry(
__string(name, name)
__field(int, line)
- __field(int, state);
+ __field(int, state)
),

TP_fast_assign(
@@ -121,7 +121,7 @@ TRACE_EVENT(pblk_state,

TP_STRUCT__entry(
__string(name, name)
- __field(int, state);
+ __field(int, state)
),

TP_fast_assign(
diff --git a/drivers/net/fjes/fjes_trace.h b/drivers/net/fjes/fjes_trace.h
index c611b6a80b20..9237b69d8e21 100644
--- a/drivers/net/fjes/fjes_trace.h
+++ b/drivers/net/fjes/fjes_trace.h
@@ -28,7 +28,7 @@ TRACE_EVENT(fjes_hw_issue_request_command,
__field(u8, cs_busy)
__field(u8, cs_complete)
__field(int, timeout)
- __field(int, ret);
+ __field(int, ret)
),
TP_fast_assign(
__entry->cr_req = cr->bits.req_code;
diff --git a/drivers/net/wireless/ath/ath10k/trace.h b/drivers/net/wireless/ath/ath10k/trace.h
index ab916459d237..842e42ec814f 100644
--- a/drivers/net/wireless/ath/ath10k/trace.h
+++ b/drivers/net/wireless/ath/ath10k/trace.h
@@ -239,7 +239,7 @@ TRACE_EVENT(ath10k_wmi_dbglog,
TP_STRUCT__entry(
__string(device, dev_name(ar->dev))
__string(driver, dev_driver_string(ar->dev))
- __field(u8, hw_type);
+ __field(u8, hw_type)
__field(size_t, buf_len)
__dynamic_array(u8, buf, buf_len)
),
@@ -269,7 +269,7 @@ TRACE_EVENT(ath10k_htt_pktlog,
TP_STRUCT__entry(
__string(device, dev_name(ar->dev))
__string(driver, dev_driver_string(ar->dev))
- __field(u8, hw_type);
+ __field(u8, hw_type)
__field(u16, buf_len)
__dynamic_array(u8, pktlog, buf_len)
),
@@ -435,7 +435,7 @@ TRACE_EVENT(ath10k_htt_rx_desc,
TP_STRUCT__entry(
__string(device, dev_name(ar->dev))
__string(driver, dev_driver_string(ar->dev))
- __field(u8, hw_type);
+ __field(u8, hw_type)
__field(u16, len)
__dynamic_array(u8, rxdesc, len)
),
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 3362bae28b46..096203119934 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -329,7 +329,7 @@ TRACE_EVENT(xchk_btree_op_error,
__field(int, level)
__field(xfs_agnumber_t, agno)
__field(xfs_agblock_t, bno)
- __field(int, ptr);
+ __field(int, ptr)
__field(int, error)
__field(void *, ret_ip)
),
@@ -414,7 +414,7 @@ TRACE_EVENT(xchk_btree_error,
__field(int, level)
__field(xfs_agnumber_t, agno)
__field(xfs_agblock_t, bno)
- __field(int, ptr);
+ __field(int, ptr)
__field(void *, ret_ip)
),
TP_fast_assign(
@@ -452,7 +452,7 @@ TRACE_EVENT(xchk_ifork_btree_error,
__field(int, level)
__field(xfs_agnumber_t, agno)
__field(xfs_agblock_t, bno)
- __field(int, ptr);
+ __field(int, ptr)
__field(void *, ret_ip)
),
TP_fast_assign(
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index eaae275ed430..53c5485cf2a1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -218,8 +218,8 @@ DECLARE_EVENT_CLASS(xfs_bmap_class,
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_ino_t, ino)
- __field(void *, leaf);
- __field(int, pos);
+ __field(void *, leaf)
+ __field(int, pos)
__field(xfs_fileoff_t, startoff)
__field(xfs_fsblock_t, startblock)
__field(xfs_filblks_t, blockcount)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 30a8cdcfd4a4..a379255c14a9 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -187,6 +187,22 @@ enum trace_reg {

struct trace_event_call;

+#define TRACE_FUNCTION_TYPE ((const char *)~0UL)
+
+struct trace_event_fields {
+ const char *type;
+ union {
+ struct {
+ const char *name;
+ const int size;
+ const int align;
+ const int is_signed;
+ const int filter_type;
+ };
+ int (*define_fields)(struct trace_event_call *);
+ };
+};
+
struct trace_event_class {
const char *system;
void *probe;
@@ -195,7 +211,7 @@ struct trace_event_class {
#endif
int (*reg)(struct trace_event_call *event,
enum trace_reg type, void *data);
- int (*define_fields)(struct trace_event_call *);
+ struct trace_event_fields *fields_array;
struct list_head *(*get_fields)(struct trace_event_call *);
struct list_head fields;
int (*raw_init)(struct trace_event_call *);
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index ee05db7ee8d2..796053e162d2 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -85,7 +85,7 @@ TRACE_EVENT(file_check_and_advance_wb_err,
TP_ARGS(file, old),

TP_STRUCT__entry(
- __field(struct file *, file);
+ __field(struct file *, file)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
__field(errseq_t, old)
diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h
index a13830616107..477351897051 100644
--- a/include/trace/events/rpcrdma.h
+++ b/include/trace/events/rpcrdma.h
@@ -1507,7 +1507,7 @@ TRACE_EVENT(svcrdma_dma_map_page,
TP_ARGS(rdma, page),

TP_STRUCT__entry(
- __field(const void *, page);
+ __field(const void *, page)
__string(device, rdma->sc_cm_id->device->name)
__string(addr, rdma->sc_xprt.xpt_remotebuf)
),
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4ecdfe2e3580..ca1d2e745a3f 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -394,22 +394,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)

#undef __field_ext
-#define __field_ext(type, item, filter_type) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __field_ext(_type, _item, _filter_type) { \
+ .type = #_type, .name = #_item, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = _filter_type },

#undef __field_struct_ext
-#define __field_struct_ext(type, item, filter_type) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- 0, filter_type); \
- if (ret) \
- return ret;
+#define __field_struct_ext(_type, _item, _filter_type) { \
+ .type = #_type, .name = #_item, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ 0, .filter_type = _filter_type },

#undef __field
#define __field(type, item) __field_ext(type, item, FILTER_OTHER)
@@ -418,25 +412,16 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \
#define __field_struct(type, item) __field_struct_ext(type, item, FILTER_OTHER)

#undef __array
-#define __array(type, item, len) \
- do { \
- char *type_str = #type"["__stringify(len)"]"; \
- BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
- BUILD_BUG_ON(len <= 0); \
- ret = trace_define_field(event_call, type_str, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), FILTER_OTHER); \
- if (ret) \
- return ret; \
- } while (0);
+#define __array(_type, _item, _len) { \
+ .type = #_type"["__stringify(_len)"]", .name = #_item, \
+ .size = sizeof(_type[_len]), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __dynamic_array
-#define __dynamic_array(type, item, len) \
- ret = trace_define_field(event_call, "__data_loc " #type "[]", #item, \
- offsetof(typeof(field), __data_loc_##item), \
- sizeof(field.__data_loc_##item), \
- is_signed_type(type), FILTER_OTHER);
+#define __dynamic_array(_type, _item, _len) { \
+ .type = "__data_loc " #_type "[]", .name = #_item, \
+ .size = 4, .align = 4, \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __string
#define __string(item, src) __dynamic_array(char, item, -1)
@@ -446,16 +431,9 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \

#undef DECLARE_EVENT_CLASS
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, func, print) \
-static int notrace __init \
-trace_event_define_fields_##call(struct trace_event_call *event_call) \
-{ \
- struct trace_event_raw_##call field; \
- int ret; \
- \
- tstruct; \
- \
- return ret; \
-}
+static struct trace_event_fields trace_event_fields_##call[] = { \
+ tstruct \
+ {} };

#undef DEFINE_EVENT
#define DEFINE_EVENT(template, name, proto, args)
@@ -613,7 +591,7 @@ static inline notrace int trace_event_get_offsets_##call( \
*
* static struct trace_event_class __used event_class_<template> = {
* .system = "<system>",
- * .define_fields = trace_event_define_fields_<call>,
+ * .fields_array = trace_event_fields_<call>,
* .fields = LIST_HEAD_INIT(event_class_##call.fields),
* .raw_init = trace_event_raw_init,
* .probe = trace_event_raw_event_##call,
@@ -761,7 +739,7 @@ _TRACE_PERF_PROTO(call, PARAMS(proto)); \
static char print_fmt_##call[] = print; \
static struct trace_event_class __used __refdata event_class_##call = { \
.system = TRACE_SYSTEM_STRING, \
- .define_fields = trace_event_define_fields_##call, \
+ .fields_array = trace_event_fields_##call, \
.fields = LIST_HEAD_INIT(event_class_##call.fields),\
.raw_init = trace_event_raw_init, \
.probe = trace_event_raw_event_##call, \
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d685c61085c0..298a7cacf146 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -49,6 +49,9 @@ enum trace_type {
#undef __field
#define __field(type, item) type item;

+#undef __field_fn
+#define __field_fn(type, item) type item;
+
#undef __field_struct
#define __field_struct(type, item) __field(type, item)

@@ -68,26 +71,22 @@ enum trace_type {
#define F_STRUCT(args...) args

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter) \
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \
struct struct_name { \
struct trace_entry ent; \
tstruct \
}

#undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk, filter)
+#define FTRACE_ENTRY_DUP(name, name_struct, id, tstruct, printk)

#undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
- filter, regfn) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

#undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print, \
- filter) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter) __packed
+#define FTRACE_ENTRY_PACKED(name, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print)) __packed

#include "trace_entries.h"

@@ -1899,17 +1898,15 @@ extern void tracing_log_err(struct trace_array *tr,
#define internal_trace_puts(str) __trace_puts(_THIS_IP_, str, strlen(str))

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, id, tstruct, print, filter) \
+#define FTRACE_ENTRY(call, struct_name, id, tstruct, print) \
extern struct trace_event_call \
__aligned(4) event_##call;
#undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print, filter) \
- FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_DUP(call, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))
#undef FTRACE_ENTRY_PACKED
-#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print, filter) \
- FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_PACKED(call, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(call, struct_name, id, PARAMS(tstruct), PARAMS(print))

#include "trace_entries.h"

diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index fc8e97328e54..3e9d81608284 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -61,15 +61,13 @@ FTRACE_ENTRY_REG(function, ftrace_entry,
TRACE_FN,

F_STRUCT(
- __field( unsigned long, ip )
- __field( unsigned long, parent_ip )
+ __field_fn( unsigned long, ip )
+ __field_fn( unsigned long, parent_ip )
),

F_printk(" %ps <-- %ps",
(void *)__entry->ip, (void *)__entry->parent_ip),

- FILTER_TRACE_FN,
-
perf_ftrace_event_register
);

@@ -84,9 +82,7 @@ FTRACE_ENTRY_PACKED(funcgraph_entry, ftrace_graph_ent_entry,
__field_desc( int, graph_ent, depth )
),

- F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth),
-
- FILTER_OTHER
+ F_printk("--> %ps (%d)", (void *)__entry->func, __entry->depth)
);

/* Function return entry */
@@ -97,18 +93,16 @@ FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,
F_STRUCT(
__field_struct( struct ftrace_graph_ret, ret )
__field_desc( unsigned long, ret, func )
+ __field_desc( unsigned long, ret, overrun )
__field_desc( unsigned long long, ret, calltime)
__field_desc( unsigned long long, ret, rettime )
- __field_desc( unsigned long, ret, overrun )
__field_desc( int, ret, depth )
),

F_printk("<-- %ps (%d) (start: %llx end: %llx) over: %d",
(void *)__entry->func, __entry->depth,
__entry->calltime, __entry->rettime,
- __entry->depth),
-
- FILTER_OTHER
+ __entry->depth)
);

/*
@@ -137,9 +131,7 @@ FTRACE_ENTRY(context_switch, ctx_switch_entry,
F_printk("%u:%u:%u ==> %u:%u:%u [%03u]",
__entry->prev_pid, __entry->prev_prio, __entry->prev_state,
__entry->next_pid, __entry->next_prio, __entry->next_state,
- __entry->next_cpu),
-
- FILTER_OTHER
+ __entry->next_cpu)
);

/*
@@ -157,9 +149,7 @@ FTRACE_ENTRY_DUP(wakeup, ctx_switch_entry,
F_printk("%u:%u:%u ==+ %u:%u:%u [%03u]",
__entry->prev_pid, __entry->prev_prio, __entry->prev_state,
__entry->next_pid, __entry->next_prio, __entry->next_state,
- __entry->next_cpu),
-
- FILTER_OTHER
+ __entry->next_cpu)
);

/*
@@ -183,9 +173,7 @@ FTRACE_ENTRY(kernel_stack, stack_entry,
(void *)__entry->caller[0], (void *)__entry->caller[1],
(void *)__entry->caller[2], (void *)__entry->caller[3],
(void *)__entry->caller[4], (void *)__entry->caller[5],
- (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
- FILTER_OTHER
+ (void *)__entry->caller[6], (void *)__entry->caller[7])
);

FTRACE_ENTRY(user_stack, userstack_entry,
@@ -203,9 +191,7 @@ FTRACE_ENTRY(user_stack, userstack_entry,
(void *)__entry->caller[0], (void *)__entry->caller[1],
(void *)__entry->caller[2], (void *)__entry->caller[3],
(void *)__entry->caller[4], (void *)__entry->caller[5],
- (void *)__entry->caller[6], (void *)__entry->caller[7]),
-
- FILTER_OTHER
+ (void *)__entry->caller[6], (void *)__entry->caller[7])
);

/*
@@ -222,9 +208,7 @@ FTRACE_ENTRY(bprint, bprint_entry,
),

F_printk("%ps: %s",
- (void *)__entry->ip, __entry->fmt),
-
- FILTER_OTHER
+ (void *)__entry->ip, __entry->fmt)
);

FTRACE_ENTRY_REG(print, print_entry,
@@ -239,8 +223,6 @@ FTRACE_ENTRY_REG(print, print_entry,
F_printk("%ps: %s",
(void *)__entry->ip, __entry->buf),

- FILTER_OTHER,
-
ftrace_event_register
);

@@ -254,9 +236,7 @@ FTRACE_ENTRY(raw_data, raw_data_entry,
),

F_printk("id:%04x %08x",
- __entry->id, (int)__entry->buf[0]),
-
- FILTER_OTHER
+ __entry->id, (int)__entry->buf[0])
);

FTRACE_ENTRY(bputs, bputs_entry,
@@ -269,9 +249,7 @@ FTRACE_ENTRY(bputs, bputs_entry,
),

F_printk("%ps: %s",
- (void *)__entry->ip, __entry->str),
-
- FILTER_OTHER
+ (void *)__entry->ip, __entry->str)
);

FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
@@ -283,16 +261,14 @@ FTRACE_ENTRY(mmiotrace_rw, trace_mmiotrace_rw,
__field_desc( resource_size_t, rw, phys )
__field_desc( unsigned long, rw, value )
__field_desc( unsigned long, rw, pc )
- __field_desc( int, rw, map_id )
+ __field_desc( int, rw, map_id )
__field_desc( unsigned char, rw, opcode )
__field_desc( unsigned char, rw, width )
),

F_printk("%lx %lx %lx %d %x %x",
(unsigned long)__entry->phys, __entry->value, __entry->pc,
- __entry->map_id, __entry->opcode, __entry->width),
-
- FILTER_OTHER
+ __entry->map_id, __entry->opcode, __entry->width)
);

FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
@@ -304,15 +280,13 @@ FTRACE_ENTRY(mmiotrace_map, trace_mmiotrace_map,
__field_desc( resource_size_t, map, phys )
__field_desc( unsigned long, map, virt )
__field_desc( unsigned long, map, len )
- __field_desc( int, map, map_id )
+ __field_desc( int, map, map_id )
__field_desc( unsigned char, map, opcode )
),

F_printk("%lx %lx %lx %d %x",
(unsigned long)__entry->phys, __entry->virt, __entry->len,
- __entry->map_id, __entry->opcode),
-
- FILTER_OTHER
+ __entry->map_id, __entry->opcode)
);


@@ -334,9 +308,7 @@ FTRACE_ENTRY(branch, trace_branch,
F_printk("%u:%s:%s (%u)%s",
__entry->line,
__entry->func, __entry->file, __entry->correct,
- __entry->constant ? " CONSTANT" : ""),
-
- FILTER_OTHER
+ __entry->constant ? " CONSTANT" : "")
);


@@ -362,7 +334,5 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
__entry->duration,
__entry->outer_duration,
__entry->nmi_total_ts,
- __entry->nmi_count),
-
- FILTER_OTHER
+ __entry->nmi_count)
);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index fba87d10f0c1..5ab10c3dce78 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -24,6 +24,7 @@
#include <linux/delay.h>

#include <trace/events/sched.h>
+#include <trace/syscall.h>

#include <asm/setup.h>

@@ -1990,7 +1991,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
*/
head = trace_get_fields(call);
if (list_empty(head)) {
- ret = call->class->define_fields(call);
+ struct trace_event_fields *field = call->class->fields_array;
+ unsigned int offset = sizeof(struct trace_entry);
+
+ for (; field->type; field++) {
+ if (field->type == TRACE_FUNCTION_TYPE) {
+ ret = field->define_fields(call);
+ break;
+ }
+
+ offset = ALIGN(offset, field->align);
+ ret = trace_define_field(call, field->type, field->name,
+ offset, field->size,
+ field->is_signed, field->filter_type);
+ if (ret)
+ break;
+
+ offset += field->size;
+ }
if (ret < 0) {
pr_warn("Could not initialize trace point events/%s\n",
name);
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 57648c5aa679..2a4d3ceefd71 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1133,6 +1133,12 @@ static struct synth_event *find_synth_event(const char *name)
return NULL;
}

+static struct trace_event_fields synth_event_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = synth_event_define_fields },
+ {}
+};
+
static int register_synth_event(struct synth_event *event)
{
struct trace_event_call *call = &event->call;
@@ -1154,7 +1160,7 @@ static int register_synth_event(struct synth_event *event)

INIT_LIST_HEAD(&call->class->fields);
call->event.funcs = &synth_event_funcs;
- call->class->define_fields = synth_event_define_fields;
+ call->class->fields_array = synth_event_fields_array;

ret = register_trace_event(&call->event);
if (!ret) {
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 45630a76ed3a..6d64c1c19fd5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -29,10 +29,8 @@ static int ftrace_event_register(struct trace_event_call *call,
* function and thus become accesible via perf.
*/
#undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, \
- filter, regfn) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_REG(name, struct_name, id, tstruct, print, regfn) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

/* not needed for this file */
#undef __field_struct
@@ -41,6 +39,9 @@ static int ftrace_event_register(struct trace_event_call *call,
#undef __field
#define __field(type, item) type item;

+#undef __field_fn
+#define __field_fn(type, item) type item;
+
#undef __field_desc
#define __field_desc(type, container, item) type item;

@@ -60,7 +61,7 @@ static int ftrace_event_register(struct trace_event_call *call,
#define F_printk(fmt, args...) fmt, args

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter) \
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \
struct ____ftrace_##name { \
tstruct \
}; \
@@ -73,76 +74,46 @@ static void __always_unused ____ftrace_check_##name(void) \
}

#undef FTRACE_ENTRY_DUP
-#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print, filter) \
- FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print), \
- filter)
+#define FTRACE_ENTRY_DUP(name, struct_name, id, tstruct, print) \
+ FTRACE_ENTRY(name, struct_name, id, PARAMS(tstruct), PARAMS(print))

#include "trace_entries.h"

+#undef __field_ext
+#define __field_ext(_type, _item, _filter_type) { \
+ .type = #_type, .name = #_item, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ is_signed_type(_type), .filter_type = _filter_type },
+
#undef __field
-#define __field(type, item) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __field(_type, _item) __field_ext(_type, _item, FILTER_OTHER)
+
+#undef __field_fn
+#define __field_fn(_type, _item) __field_ext(_type, _item, FILTER_TRACE_FN)

#undef __field_desc
-#define __field_desc(type, container, item) \
- ret = trace_define_field(event_call, #type, #item, \
- offsetof(typeof(field), \
- container.item), \
- sizeof(field.container.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __field_desc(_type, _container, _item) __field_ext(_type, _item, FILTER_OTHER)

#undef __array
-#define __array(type, item, len) \
- do { \
- char *type_str = #type"["__stringify(len)"]"; \
- BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
- ret = trace_define_field(event_call, type_str, #item, \
- offsetof(typeof(field), item), \
- sizeof(field.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret; \
- } while (0);
+#define __array(_type, _item, _len) { \
+ .type = #_type"["__stringify(_len)"]", .name = #_item, \
+ .size = sizeof(_type[_len]), .align = __alignof__(_type), \
+ is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef __array_desc
-#define __array_desc(type, container, item, len) \
- BUILD_BUG_ON(len > MAX_FILTER_STR_VAL); \
- ret = trace_define_field(event_call, #type "[" #len "]", #item, \
- offsetof(typeof(field), \
- container.item), \
- sizeof(field.container.item), \
- is_signed_type(type), filter_type); \
- if (ret) \
- return ret;
+#define __array_desc(_type, _container, _item, _len) __array(_type, _item, _len)

#undef __dynamic_array
-#define __dynamic_array(type, item) \
- ret = trace_define_field(event_call, #type "[]", #item, \
- offsetof(typeof(field), item), \
- 0, is_signed_type(type), filter_type);\
- if (ret) \
- return ret;
+#define __dynamic_array(_type, _item) { \
+ .type = #_type "[]", .name = #_item, \
+ .size = 0, .align = __alignof__(_type), \
+ is_signed_type(_type), .filter_type = FILTER_OTHER },

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(name, struct_name, id, tstruct, print, filter) \
-static int __init \
-ftrace_define_fields_##name(struct trace_event_call *event_call) \
-{ \
- struct struct_name field; \
- int ret; \
- int filter_type = filter; \
- \
- tstruct; \
- \
- return ret; \
-}
+#define FTRACE_ENTRY(name, struct_name, id, tstruct, print) \
+static struct trace_event_fields ftrace_event_fields_##name[] = { \
+ tstruct \
+ {} };

#include "trace_entries.h"

@@ -152,6 +123,9 @@ ftrace_define_fields_##name(struct trace_event_call *event_call) \
#undef __field
#define __field(type, item)

+#undef __field_fn
+#define __field_fn(type, item)
+
#undef __field_desc
#define __field_desc(type, container, item)

@@ -168,12 +142,10 @@ ftrace_define_fields_##name(struct trace_event_call *event_call) \
#define F_printk(fmt, args...) __stringify(fmt) ", " __stringify(args)

#undef FTRACE_ENTRY_REG
-#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, filter,\
- regfn) \
- \
+#define FTRACE_ENTRY_REG(call, struct_name, etype, tstruct, print, regfn) \
struct trace_event_class __refdata event_class_ftrace_##call = { \
.system = __stringify(TRACE_SYSTEM), \
- .define_fields = ftrace_define_fields_##call, \
+ .fields_array = ftrace_event_fields_##call, \
.fields = LIST_HEAD_INIT(event_class_ftrace_##call.fields),\
.reg = regfn, \
}; \
@@ -191,9 +163,9 @@ struct trace_event_call __used \
__attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;

#undef FTRACE_ENTRY
-#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print, filter) \
+#define FTRACE_ENTRY(call, struct_name, etype, tstruct, print) \
FTRACE_ENTRY_REG(call, struct_name, etype, \
- PARAMS(tstruct), PARAMS(print), filter, NULL)
+ PARAMS(tstruct), PARAMS(print), NULL)

bool ftrace_event_is_function(struct trace_event_call *call)
{
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1552a95c743b..66e0a8ff1c01 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1534,16 +1534,28 @@ static struct trace_event_functions kprobe_funcs = {
.trace = print_kprobe_event
};

+static struct trace_event_fields kretprobe_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = kretprobe_event_define_fields },
+ {}
+};
+
+static struct trace_event_fields kprobe_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = kprobe_event_define_fields },
+ {}
+};
+
static inline void init_trace_event_call(struct trace_kprobe *tk)
{
struct trace_event_call *call = trace_probe_event_call(&tk->tp);

if (trace_kprobe_is_return(tk)) {
call->event.funcs = &kretprobe_funcs;
- call->class->define_fields = kretprobe_event_define_fields;
+ call->class->fields_array = kretprobe_fields_array;
} else {
call->event.funcs = &kprobe_funcs;
- call->class->define_fields = kprobe_event_define_fields;
+ call->class->fields_array = kprobe_fields_array;
}

call->flags = TRACE_EVENT_FL_KPROBE;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index fa8fbff736d6..53935259f701 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -198,11 +198,10 @@ print_syscall_exit(struct trace_iterator *iter, int flags,

extern char *__bad_type_size(void);

-#define SYSCALL_FIELD(type, field, name) \
- sizeof(type) != sizeof(trace.field) ? \
- __bad_type_size() : \
- #type, #name, offsetof(typeof(trace), field), \
- sizeof(trace.field), is_signed_type(type)
+#define SYSCALL_FIELD(_type, _name) { \
+ .type = #_type, .name = #_name, \
+ .size = sizeof(_type), .align = __alignof__(_type), \
+ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER }

static int __init
__set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
@@ -269,42 +268,22 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
{
struct syscall_trace_enter trace;
struct syscall_metadata *meta = call->data;
- int ret;
- int i;
int offset = offsetof(typeof(trace), args);
-
- ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
- FILTER_OTHER);
- if (ret)
- return ret;
+	int ret = 0, i;

for (i = 0; i < meta->nb_args; i++) {
ret = trace_define_field(call, meta->types[i],
meta->args[i], offset,
sizeof(unsigned long), 0,
FILTER_OTHER);
+ if (ret)
+ break;
offset += sizeof(unsigned long);
}

return ret;
}

-static int __init syscall_exit_define_fields(struct trace_event_call *call)
-{
- struct syscall_trace_exit trace;
- int ret;
-
- ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
- FILTER_OTHER);
- if (ret)
- return ret;
-
- ret = trace_define_field(call, SYSCALL_FIELD(long, ret, ret),
- FILTER_OTHER);
-
- return ret;
-}
-
static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
{
struct trace_array *tr = data;
@@ -502,6 +481,13 @@ static int __init init_syscall_trace(struct trace_event_call *call)
return id;
}

+static struct trace_event_fields __refdata syscall_enter_fields_array[] = {
+ SYSCALL_FIELD(int, __syscall_nr),
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = syscall_enter_define_fields },
+ {}
+};
+
struct trace_event_functions enter_syscall_print_funcs = {
.trace = print_syscall_enter,
};
@@ -513,7 +499,7 @@ struct trace_event_functions exit_syscall_print_funcs = {
struct trace_event_class __refdata event_class_syscall_enter = {
.system = "syscalls",
.reg = syscall_enter_register,
- .define_fields = syscall_enter_define_fields,
+ .fields_array = syscall_enter_fields_array,
.get_fields = syscall_get_enter_fields,
.raw_init = init_syscall_trace,
};
@@ -521,7 +507,11 @@ struct trace_event_class __refdata event_class_syscall_enter = {
struct trace_event_class __refdata event_class_syscall_exit = {
.system = "syscalls",
.reg = syscall_exit_register,
- .define_fields = syscall_exit_define_fields,
+ .fields_array = (struct trace_event_fields[]){
+ SYSCALL_FIELD(int, __syscall_nr),
+ SYSCALL_FIELD(long, ret),
+ {}
+ },
.fields = LIST_HEAD_INIT(event_class_syscall_exit.fields),
.raw_init = init_syscall_trace,
};
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 352073d36585..476a382f1f1b 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1507,12 +1507,17 @@ static struct trace_event_functions uprobe_funcs = {
.trace = print_uprobe_event
};

+static struct trace_event_fields uprobe_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = uprobe_event_define_fields },
+ {}
+};
+
static inline void init_trace_event_call(struct trace_uprobe *tu)
{
struct trace_event_call *call = trace_probe_event_call(&tu->tp);
-
call->event.funcs = &uprobe_funcs;
- call->class->define_fields = uprobe_event_define_fields;
+ call->class->fields_array = uprobe_fields_array;

call->flags = TRACE_EVENT_FL_UPROBE | TRACE_EVENT_FL_CAP_ANY;
call->class->reg = trace_uprobe_register;
diff --git a/net/mac80211/trace.h b/net/mac80211/trace.h
index 4768322dc202..427f51a0a994 100644
--- a/net/mac80211/trace.h
+++ b/net/mac80211/trace.h
@@ -408,20 +408,20 @@ TRACE_EVENT(drv_bss_info_changed,
__field(u32, basic_rates)
__array(int, mcast_rate, NUM_NL80211_BANDS)
__field(u16, ht_operation_mode)
- __field(s32, cqm_rssi_thold);
- __field(s32, cqm_rssi_hyst);
- __field(u32, channel_width);
- __field(u32, channel_cfreq1);
+ __field(s32, cqm_rssi_thold)
+ __field(s32, cqm_rssi_hyst)
+ __field(u32, channel_width)
+ __field(u32, channel_cfreq1)
__dynamic_array(u32, arp_addr_list,
info->arp_addr_cnt > IEEE80211_BSS_ARP_ADDR_LIST_LEN ?
IEEE80211_BSS_ARP_ADDR_LIST_LEN :
- info->arp_addr_cnt);
- __field(int, arp_addr_cnt);
- __field(bool, qos);
- __field(bool, idle);
- __field(bool, ps);
- __dynamic_array(u8, ssid, info->ssid_len);
- __field(bool, hidden_ssid);
+ info->arp_addr_cnt)
+ __field(int, arp_addr_cnt)
+ __field(bool, qos)
+ __field(bool, idle)
+ __field(bool, ps)
+ __dynamic_array(u8, ssid, info->ssid_len)
+ __field(bool, hidden_ssid)
__field(int, txpower)
__field(u8, p2p_oppps_ctwindow)
),
@@ -1672,8 +1672,8 @@ TRACE_EVENT(drv_start_ap,
VIF_ENTRY
__field(u8, dtimper)
__field(u16, bcnint)
- __dynamic_array(u8, ssid, info->ssid_len);
- __field(bool, hidden_ssid);
+ __dynamic_array(u8, ssid, info->ssid_len)
+ __field(bool, hidden_ssid)
),

TP_fast_assign(
@@ -1739,7 +1739,7 @@ TRACE_EVENT(drv_join_ibss,
VIF_ENTRY
__field(u8, dtimper)
__field(u16, bcnint)
- __dynamic_array(u8, ssid, info->ssid_len);
+ __dynamic_array(u8, ssid, info->ssid_len)
),

TP_fast_assign(
diff --git a/net/wireless/trace.h b/net/wireless/trace.h
index d98ad2b3143b..d5eaa4928baa 100644
--- a/net/wireless/trace.h
+++ b/net/wireless/trace.h
@@ -2009,7 +2009,7 @@ TRACE_EVENT(rdev_start_nan,
WIPHY_ENTRY
WDEV_ENTRY
__field(u8, master_pref)
- __field(u8, bands);
+ __field(u8, bands)
),
TP_fast_assign(
WIPHY_ASSIGN;
@@ -2031,8 +2031,8 @@ TRACE_EVENT(rdev_nan_change_conf,
WIPHY_ENTRY
WDEV_ENTRY
__field(u8, master_pref)
- __field(u8, bands);
- __field(u32, changes);
+ __field(u8, bands)
+ __field(u32, changes)
),
TP_fast_assign(
WIPHY_ASSIGN;
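
Stepping back, the mechanism this patch introduces: an event's fields
are now described by a static, {}-terminated array of struct
trace_event_fields that the core walks when creating the event's
tracefs directory, computing each field's offset from its size and
alignment; a TRACE_FUNCTION_TYPE entry is the escape hatch back to a
callback for events (synthetic events, kprobes, uprobes, syscall
entry) that cannot describe all their fields statically. A condensed
sketch of the consumer side (hypothetical example_* name; it mirrors,
but is not, the event_create_dir() hunk above):

static int example_define_fields(struct trace_event_call *call)
{
	struct trace_event_fields *field = call->class->fields_array;
	unsigned int offset = sizeof(struct trace_entry);
	int ret = 0;

	for (; field->type; field++) {
		/* Escape hatch: fall back to a callback. */
		if (field->type == TRACE_FUNCTION_TYPE)
			return field->define_fields(call);

		offset = ALIGN(offset, field->align);
		ret = trace_define_field(call, field->type, field->name,
					 offset, field->size,
					 field->is_signed,
					 field->filter_type);
		if (ret)
			break;

		offset += field->size;
	}

	return ret;
}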

2019-10-25 15:02:11

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Thu, 24 Oct 2019 18:43:20 +0200
Peter Zijlstra <[email protected]> wrote:

> >
> > CC [M] drivers/gpu/drm/i915/gem/i915_gem_context.o
> > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
> > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
> > call->class->define_fields = synth_event_define_fields;
> > ^~~~~~~~~~~~~
> > get_fields
> > make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
> > make[3]: *** Waiting for unfinished jobs....
>
> allmodconfig clean
>
> (omg, so much __field(); fail)

Well it built without warnings and passed the ftrace selftests.

I haven't run it through the full suite, but that can wait for the v5.

-- Steve

2019-10-25 17:58:02

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Thu, 24 Oct 2019 12:16:09 +0200
Peter Zijlstra <[email protected]> wrote:

> On Wed, Oct 23, 2019 at 02:52:45PM -0400, Steven Rostedt wrote:
> > After applying this series and this patch I triggered this:
>
> Bah, I hate C.
>
> (also for some reason I had KPROBE_EVENTS disabled; when I enabled it, it
> failed on boot due to selftests)
>
> this one seems to boot and survive your selftests thing (that takes for
> bloody ever to run).
>

Care to enable CONFIG_HIST_TRIGGERS?

CC [M] drivers/gpu/drm/i915/gem/i915_gem_context.o
/work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
/work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
call->class->define_fields = synth_event_define_fields;
^~~~~~~~~~~~~
get_fields
make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
make[3]: *** Waiting for unfinished jobs....

-- Steve

2019-10-25 18:52:01

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Thu, Oct 24, 2019 at 12:59:04PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 23, 2019 at 12:15:14PM -0500, Josh Poimboeuf wrote:
> > On Wed, Oct 23, 2019 at 05:16:54PM +0200, Peter Zijlstra wrote:
> > > @@ -157,6 +158,14 @@ static int __apply_relocate_add(Elf64_Sh
> > >
> > > val = sym->st_value + rel[i].r_addend;
> > >
> > > + /*
> > > + * .klp.rela.* sections should only contain module
> > > + * related RELAs. All core-kernel RELAs should be in
> > > + * normal .rela.* sections and be applied when loading
> > > + * the patch module itself.
> > > + */
> > > + WARN_ON_ONCE(klp && core_kernel_text(val));
> > > +
> >
> > This isn't quite true; we also use .klp.rela sections to access
> > unexported vmlinux symbols.
>
> Yes, you said as much in that earlier email. That all makes it really
> hard to validate this. But unless we validate it, it will stay buggy :/
>
> Hmmm.... /me ponders
>
> The alternative would be to apply the .klp.rela.* sections twice: once
> at patch-module load time, applying only the core_kernel_text()
> entries, and then once later, skipping over them.
>
> How's this?

How about something like this? Completely untested, but if you agree
with this approach I could hack up kpatch-build to test it properly.

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index ab4a4606d19b..597bf32bc591 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -239,6 +239,17 @@ static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
if (ret)
return ret;

+ /*
+ * Prevent module patches from using livepatch relas for
+ * vmlinux symbols. Presumably such symbols are exported and
+ * normal relas can instead be used at patch module loading
+ * time.
+ */
+ if (!vmlinux && core_kernel_text(addr)) {
+ pr_err("unsupported livepatch symbol\n");
+ return -EINVAL;
+ }
+
sym->st_value = addr;
}
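
For reference, the vmlinux/module distinction this check keys off comes
from the klp ELF naming conventions (see the module ELF format document
under Documentation/livepatch/); roughly:

  .klp.rela.vmlinux.<section>            relas resolved against vmlinux
  .klp.rela.<module>.<section>           relas resolved against that module
  .klp.sym.<objname>.<symname>,<sympos>  a klp symbol reference

so a module-specific rela section whose symbol resolves into
core_kernel_text() means the patch-creation tooling put it in the
wrong bucket.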


2019-10-25 19:04:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Thu, Oct 24, 2019 at 02:17:31PM -0400, Steven Rostedt wrote:
> On Thu, 24 Oct 2019 18:43:20 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > >
> > > CC [M] drivers/gpu/drm/i915/gem/i915_gem_context.o
> > > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
> > > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
> > > call->class->define_fields = synth_event_define_fields;
> > > ^~~~~~~~~~~~~
> > > get_fields
> > > make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
> > > make[3]: *** Waiting for unfinished jobs....
> >
> > allmodconfig clean
> >
> > (omg, so much __field(); fail)
>
> Well it built without warnings and passed the ftrace selftests.
>
> I haven't run it through the full suite, but that can wait for the v5.

I'll push it out to git so the 0day robot can have a go at it. For v5
I'm still staring at some KLP borkage. Then again, maybe I should delay
that last bit and make that a new series.

2019-10-25 19:04:43

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Thu, 24 Oct 2019 22:24:55 +0200
Peter Zijlstra <[email protected]> wrote:

> On Thu, Oct 24, 2019 at 02:17:31PM -0400, Steven Rostedt wrote:
> > On Thu, 24 Oct 2019 18:43:20 +0200
> > Peter Zijlstra <[email protected]> wrote:
> >
> > > >
> > > > CC [M] drivers/gpu/drm/i915/gem/i915_gem_context.o
> > > > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c: In function ‘register_synth_event’:
> > > > /work/git/linux-trace.git/kernel/trace/trace_events_hist.c:1157:15: error: ‘struct trace_event_class’ has no member named ‘define_fields’; did you mean ‘get_fields’?
> > > > call->class->define_fields = synth_event_define_fields;
> > > > ^~~~~~~~~~~~~
> > > > get_fields
> > > > make[3]: *** [/work/git/linux-trace.git/scripts/Makefile.build:265: kernel/trace/trace_events_hist.o] Error 1
> > > > make[3]: *** Waiting for unfinished jobs....
> > >
> > > allmodconfig clean
> > >
> > > (omg, so much __field(); fail)
> >
> > Well it built without warnings and passed the ftrace selftests.
> >
> > I haven't run it through the full suite, but that can wait for the v5.
>
> I'll push it out to git so the 0day robot can have a go at it. For v5
> I'm still staring at some KLP borkage. Then again, maybe I should delay
> that last bit and make that a new series.

Also note that I'm about to travel to Lyon for Open Source Summit,
thus my looking at this will pretty much come to a standstill :-/

-- Steve

2019-10-25 19:07:48

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Thu, Oct 24, 2019 at 01:31:15PM -0500, Josh Poimboeuf wrote:

> How about something like this? Completely untested, but if you agree
> with this approach I could hack up kpatch-build to test it properly.
>
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index ab4a4606d19b..597bf32bc591 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -239,6 +239,17 @@ static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
> if (ret)
> return ret;
>
> + /*
> + * Prevent module patches from using livepatch relas for
> + * vmlinux symbols. Presumably such symbols are exported and
> + * normal relas can instead be used at patch module loading
> + * time.
> + */
> + if (!vmlinux && core_kernel_text(addr)) {
> + pr_err("unsupported livepatch symbol\n");
> + return -EINVAL;
> + }
> +
> sym->st_value = addr;
> }

If that works, this is much simpler and therefore preferred.

2019-10-25 19:27:25

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Thu 2019-10-24 15:16:34, Peter Zijlstra wrote:
> On Wed, Oct 23, 2019 at 12:00:25PM -0500, Josh Poimboeuf wrote:
>
> > > This then raises a number of questions:
> > >
> > > 1) why is that RELA (that obviously does not depend on any module)
> > > applied so late?
> >
> > Good question. The 'pv_ops' symbol is exported by the core kernel, so I
> > can't see any reason why we'd need to apply that rela late. In theory,
> > kpatch-build isn't supposed to convert that to a klp rela. Maybe
> > something went wrong in the patch creation code.
> >
> > I'm also questioning why we even need to apply the parainstructions
> > section late. Maybe we can remove that apply_paravirt() call
> > altogether, along with .klp.arch.parainstruction sections.

Hmm, the original bug report against livepatching was actually about
paravirt ops, see below.


> > I'll need to look into it...
>
> Right, that really should be able to run early. Esp. after commit
>
> 11e86dc7f274 ("x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call()")
>
> paravirt patching is unconditional. We _never_ run with the indirect
> call except very early boot, but modules should have them patched way
> before their init section runs.
>
> We rely on this for spectre-v2 and friends.

Livepatching has the same requirement. The module code has to be fully
livepatched before the module gets actually used. It means before
mod->init() is called and before the module is moved into
MODULE_STATE_LIVE state.


> > > 3) Is there ever a possible module-dependent RELA to a paravirt /
> > > alternative site?
> >
> > Good question...
>
> > > Then for 3) we only have alternatives left, and I _think_ it unlikely to
> > > be the case, but I'll have to have a hard look at that.
> >
> > I'm not sure about alternatives, but maybe we can enforce such
> > limitations with tooling and/or kernel checks.
>
> Right, so on IRC you implied you might have some additional details on
> how alternatives were affected; did you manage to dig that up?

I am not sure what Josh had in mind. But the problem with livepatches,
paravirt ops, and alternatives was described in the related patchset, see
https://lkml.kernel.org/r/[email protected]

The original bug report is
https://lkml.kernel.org/r/[email protected]

Best Regards,
Petr

2019-10-25 19:30:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Fri, Oct 25, 2019 at 08:44:56AM +0200, Petr Mladek wrote:
> On Thu 2019-10-24 15:16:34, Peter Zijlstra wrote:
> > On Wed, Oct 23, 2019 at 12:00:25PM -0500, Josh Poimboeuf wrote:
> >
> > > > This then raises a number of questions:
> > > >
> > > > 1) why is that RELA (that obviously does not depend on any module)
> > > > applied so late?
> > >
> > > Good question. The 'pv_ops' symbol is exported by the core kernel, so I
> > > can't see any reason why we'd need to apply that rela late. In theory,
> > > kpatch-build isn't supposed to convert that to a klp rela. Maybe
> > > something went wrong in the patch creation code.
> > >
> > > I'm also questioning why we even need to apply the parainstructions
> > > section late. Maybe we can remove that apply_paravirt() call
> > > altogether, along with .klp.arch.parainstruction sections.
>
> Hmm, the original bug report against livepatching was actually about
> paravirt ops, see below.

Yes, I found that.

> > > I'm not sure about alternatives, but maybe we can enforce such
> > > limitations with tooling and/or kernel checks.
> >
> > Right, so on IRC you implied you might have some additional details on
> > how alternatives were affected; did you manage to dig that up?
>
> I am not sure what Josh had in mind. But the problem with livepatches,
> paravirt ops, and alternatives was described in the related patchset, see
> https://lkml.kernel.org/r/[email protected]

Yes, and my complaint there is that that thread is devoid of useful
content.

> The original bug report is
> https://lkml.kernel.org/r/[email protected]

I found the github (*groan*) link in the thread above.

From all that I could only make that the paravirt stuff is just doing it
wrong (see earlier emails, core-kernel RELAs really should be applied at
the time of patch-module load, there's no excuse for them to be delayed
to the .klp.rela. section) at which point paravirt will also magically
work.

But none of that explains why apply_alternatives() is also delayed.

So I'm very tempted to just revert that patchset for doing it all
wrong.

2019-10-25 19:35:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Fri, Oct 25, 2019 at 08:44:56AM +0200, Petr Mladek wrote:
> On Thu 2019-10-24 15:16:34, Peter Zijlstra wrote:

> > Right, that really should be able to run early. Esp. after commit
> >
> > 11e86dc7f274 ("x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call()")
> >
> > paravirt patching is unconditional. We _never_ run with the indirect
> > call except very early boot, but modules should have them patched way
> > before their init section runs.
> >
> > We rely on this for spectre-v2 and friends.
>
> Livepatching has the same requirement. The module code has to be fully
> livepatched before the module gets actually used.

Right, and that is just saying that all paravirt RELAs (pv_ops) can
basically be deleted from modules.

Which avoids the reported problem in yet another way.

> It means before mod->init() is called and before the module is moved
> into MODULE_STATE_LIVE state.

Funny thing, currently ftrace is running code before all that. It runs
code before klp_module_coming(), before jump_label patching.

My other patch in this thread fixes that.

2019-10-25 20:20:07

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Fri, Oct 25, 2019 at 12:06:12PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 25, 2019 at 10:43:00AM +0200, Peter Zijlstra wrote:
>
> > But none of that explains why apply_alternatives() is also delayed.
> >
> > So I'm very tempted to just revert that patchset for doing it all
> > wrong.
>
> And I've done just that. This includes Josh's validation patch, the
> revert and my klp_apply_relocate_add() patches with the removal of
> module_disable_ro().
>
> Josh, can you test or give me a clue on how to test? I need to run a few
> errands today, but I'll try and have a poke either tonight or tomorrow.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/rwx

Thanks. I'll work on hacking up kpatch-build to support this, and then
I'll need to run it through a lot of testing to make sure this was a
good idea.

--
Josh

2019-10-25 22:29:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Fri, Oct 25, 2019 at 10:43:00AM +0200, Peter Zijlstra wrote:

> But none of that explains why apply_alternatives() is also delayed.
>
> So I'm very tempted to just revert that patchset for doing it all
> wrong.

And I've done just that. This includes Josh's validation patch, the
revert and my klp_apply_relocate_add() patches with the removal of
module_disable_ro().

Josh, can you test or give me a clue on how to test? I need to run a few
errands today, but I'll try and have a poke either tonight or tomorrow.

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/rwx

After this it is waiting for Mark's argh64 patches to land:

https://lkml.kernel.org/r/[email protected]

And then we can go and delete module_disable_ro() entirely -- hooray!

2019-10-26 01:19:04

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Fri, Oct 25, 2019 at 12:06:12PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 25, 2019 at 10:43:00AM +0200, Peter Zijlstra wrote:
>
> > But none of that explains why apply_alternatives() is also delayed.
> >
> > So I'm very tempted to just revert that patchset for doing it all
> > wrong.
>
> And I've done just that. This includes Josh's validation patch, the
> revert and my klp_apply_relocate_add() patches with the removal of
> module_disable_ro().
>
> Josh, can you test or give me a clue on how to test? I need to run a few
> errands today, but I'll try and have a poke either tonight or tomorrow.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/rwx

I looked at this today. A few potential tweaks:

- The new klp_apply_relocate_add() interface isn't needed. Instead
apply_relocate_add() can use the module state to decide whether to
text_poke().

- For robustness I think we need to apply vmlinux-specific klp
relocations at the same time as normal relocations.

Rough untested changes below. I still need to finish changing
kpatch-build and then I'll need to do a LOT of testing.

I can take over the livepatch-specific patches if you want. Or however
you want to do it.

diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 7fc519b9b4e0..6a70213854f0 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -451,14 +451,11 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
unsigned int symindex, unsigned int relsec,
struct module *me)
{
- return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memwrite);
-}
+ int ret;
+ bool early = me->state != MODULE_STATE_LIVE;

-int klp_apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
- unsigned int symindex, unsigned int relsec,
- struct module *me)
-{
- return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, s390_kernel_write);
+ return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me,
+ early ? memwrite : s390_kernel_write);
}

int module_finalize(const Elf_Ehdr *hdr,
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 5eee618a98c5..30174798ff79 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -222,20 +222,14 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
unsigned int symindex,
unsigned int relsec,
struct module *me)
-{
- return __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, memcpy);
-}
-
-int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
- const char *strtab,
- unsigned int symindex,
- unsigned int relsec,
- struct module *me)
{
int ret;
+ bool early = me->state != MODULE_STATE_LIVE;
+
+ ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me,
+ early ? memcpy : text_poke);

- ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me, text_poke);
- if (!ret)
+ if (!ret && !early)
text_poke_sync();

return ret;
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index cc18f945bdb2..b00170696db2 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -214,12 +214,7 @@ void *klp_shadow_get_or_alloc(void *obj, unsigned long id,
void klp_shadow_free(void *obj, unsigned long id, klp_shadow_dtor_t dtor);
void klp_shadow_free_all(unsigned long id, klp_shadow_dtor_t dtor);

-
-extern int klp_apply_relocate_add(Elf64_Shdr *sechdrs,
- const char *strtab,
- unsigned int symindex,
- unsigned int relsec,
- struct module *me);
+int klp_write_relocations(struct module *mod, struct klp_object *obj);

#else /* !CONFIG_LIVEPATCH */

@@ -229,6 +224,12 @@ static inline bool klp_patch_pending(struct task_struct *task) { return false; }
static inline void klp_update_patch_state(struct task_struct *task) {}
static inline void klp_copy_process(struct task_struct *child) {}

+static inline int klp_write_relocations(struct module *mod,
+ struct klp_object *obj)
+{
+ return 0;
+}
+
#endif /* CONFIG_LIVEPATCH */

#endif /* _LINUX_LIVEPATCH_H_ */
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 30395302a273..52eb91d0ee8d 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -256,27 +256,60 @@ static int klp_resolve_symbols(Elf_Shdr *relasec, struct module *pmod)
return 0;
}

-int __weak klp_apply_relocate_add(Elf64_Shdr *sechdrs,
- const char *strtab,
- unsigned int symindex,
- unsigned int relsec,
- struct module *me)
-{
- return apply_relocate_add(sechdrs, strtab, symindex, relsec, me);
-}
-
-static int klp_write_object_relocations(struct module *pmod,
- struct klp_object *obj)
+/*
+ * This function is called for both vmlinux-specific and module-specific klp
+ * relocation sections:
+ *
+ * 1) When the klp module itself loads, the module code calls this function
+ * to write vmlinux-specific klp relocations. These relocations allow the
+ * patched code/data to reference unexported vmlinux symbols. They're
+ * written as early as possible to ensure that other module init code
+ * (e.g., jump_label_apply_nops) can access any non-exported vmlinux
+ * symbols which might be referenced by the klp module's special sections.
+ *
+ * 2) When a to-be-patched module loads (or is already loaded when the
+ * klp module loads), klp code calls this function to write klp relocations
+ * which are specific to the module. These relocations allow the patched
+ * code/data to reference module symbols, both unexported and exported.
+ * They also enable late module patching, which means the to-be-patched
+ * module may be loaded *after* the klp module.
+ *
+ * The following restrictions apply to module-specific relocation sections:
+ *
+ * a) References to vmlinux symbols are not allowed. Otherwise there might
+ * be module init ordering issues, and crashes might occur in some of the
+ * other kernel patching components like paravirt patching or jump
+ * labels. All references to vmlinux symbols should use either normal
+ * relas (for exported symbols) or vmlinux-specific klp relas (for
+ * unexported symbols). This restriction is enforced in
+ * klp_resolve_symbols().
+ *
+ * b) Relocations to special sections like __jump_table and .altinstructions
+ * aren't allowed. In other words, there should never be a
+ * .klp.rela.{module}.__jump_table section. This will definitely cause
+ * initialization ordering issues, as such special sections are processed
+ * during the loading of the klp module itself, *not* the to-be-patched
+ * module. This means that e.g., it's not currently possible to patch a
+ * module function which uses a static key jump label, if you want to
+ * have the replacement function also use the same static key. In this
+ * case, a non-static interface like static_key_enabled() can be used in
+ * the new function instead.
+ *
+ * On the other hand, a .klp.rela.vmlinux.__jump_table section is fine,
+ * as it can be resolved early enough during the load of the klp module,
+ * as described above.
+ */
+int klp_write_relocations(struct module *pmod, struct klp_object *obj)
{
int i, cnt, ret = 0;
const char *objname, *secname;
char sec_objname[MODULE_NAME_LEN];
Elf_Shdr *sec;

- if (WARN_ON(!klp_is_object_loaded(obj)))
+ if (WARN_ON(obj && !klp_is_object_loaded(obj)))
return -EINVAL;

- objname = klp_is_module(obj) ? obj->name : "vmlinux";
+ objname = obj ? obj->name : "vmlinux";

/* For each klp relocation section */
for (i = 1; i < pmod->klp_info->hdr.e_shnum; i++) {
@@ -305,7 +338,7 @@ static int klp_write_object_relocations(struct module *pmod,
if (ret)
break;

- ret = klp_apply_relocate_add(pmod->klp_info->sechdrs,
+ ret = apply_relocate_add(pmod->klp_info->sechdrs,
pmod->core_kallsyms.strtab,
pmod->klp_info->symndx, i, pmod);
if (ret)
@@ -733,16 +766,25 @@ static int klp_init_object_loaded(struct klp_patch *patch,
struct klp_func *func;
int ret;

- mutex_lock(&text_mutex);
+ if (klp_is_module(obj)) {
+
+ /*
+ * Only write module-specific relocations here.
+ * vmlinux-specific relocations were already written during the
+ * loading of the klp module.
+ */
+
+ mutex_lock(&text_mutex);
+
+ ret = klp_write_relocations(patch->mod, obj);
+ if (ret) {
+ mutex_unlock(&text_mutex);
+ return ret;
+ }

- ret = klp_write_object_relocations(patch->mod, obj);
- if (ret) {
mutex_unlock(&text_mutex);
- return ret;
}

- mutex_unlock(&text_mutex);
-
klp_for_each_func(obj, func) {
ret = klp_find_object_symbol(obj->name, func->old_name,
func->old_sympos,
diff --git a/kernel/module.c b/kernel/module.c
index fe5bd382759c..ff4347385f05 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2327,11 +2327,9 @@ static int apply_relocations(struct module *mod, const struct load_info *info)
if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC))
continue;

- /* Livepatch relocation sections are applied by livepatch */
if (info->sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
- continue;
-
- if (info->sechdrs[i].sh_type == SHT_REL)
+ err = klp_write_relocations(mod, NULL);
+ else if (info->sechdrs[i].sh_type == SHT_REL)
err = apply_relocate(info->sechdrs, info->strtab,
info->index.sym, i, mod);
else if (info->sechdrs[i].sh_type == SHT_RELA)
@@ -3812,18 +3810,24 @@ static int load_module(struct load_info *info, const char __user *uargs,
/* Set up MODINFO_ATTR fields */
setup_modinfo(mod, info);

+ if (is_livepatch_module(mod)) {
+ err = copy_module_elf(mod, info);
+ if (err < 0)
+ goto free_modinfo;
+ }
+
/* Fix up syms, so that st_value is a pointer to location. */
err = simplify_symbols(mod, info);
if (err < 0)
- goto free_modinfo;
+ goto free_elf_copy;

err = apply_relocations(mod, info);
if (err < 0)
- goto free_modinfo;
+ goto free_elf_copy;

err = post_relocation(mod, info);
if (err < 0)
- goto free_modinfo;
+ goto free_elf_copy;

flush_module_icache(mod);

@@ -3866,12 +3870,6 @@ static int load_module(struct load_info *info, const char __user *uargs,
if (err < 0)
goto coming_cleanup;

- if (is_livepatch_module(mod)) {
- err = copy_module_elf(mod, info);
- if (err < 0)
- goto sysfs_cleanup;
- }
-
/* Get rid of temporary copy. */
free_copy(info);

@@ -3880,8 +3878,6 @@ static int load_module(struct load_info *info, const char __user *uargs,

return do_init_module(mod);

- sysfs_cleanup:
- mod_sysfs_teardown(mod);
coming_cleanup:
mod->state = MODULE_STATE_GOING;
destroy_params(mod->kp, mod->num_kp);
@@ -3901,6 +3897,8 @@ static int load_module(struct load_info *info, const char __user *uargs,
kfree(mod->args);
free_arch_cleanup:
module_arch_cleanup(mod);
+ free_elf_copy:
+ free_module_elf(mod);
free_modinfo:
free_modinfo(mod);
free_unload:

2019-10-29 01:26:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Fri, Oct 25, 2019 at 08:17:41PM -0500, Josh Poimboeuf wrote:

> + * The following restrictions apply to module-specific relocation sections:
> + *
> + * a) References to vmlinux symbols are not allowed. Otherwise there might
> + * be module init ordering issues, and crashes might occur in some of the
> + * other kernel patching components like paravirt patching or jump
> + * labels. All references to vmlinux symbols should use either normal
> + * relas (for exported symbols) or vmlinux-specific klp relas (for
> + * unexported symbols). This restriction is enforced in
> + * klp_resolve_symbols().

Right.

> + * b) Relocations to special sections like __jump_table and .altinstructions
> + * aren't allowed. In other words, there should never be a
> + * .klp.rela.{module}.__jump_table section. This will definitely cause
> + * initialization ordering issues, as such special sections are processed
> + * during the loading of the klp module itself, *not* the to-be-patched
> + * module. This means that e.g., it's not currently possible to patch a
> + * module function which uses a static key jump label, if you want to
> + * have the replacement function also use the same static key. In this
> + * case, a non-static interface like static_key_enabled() can be used in
> + * the new function instead.

Idem for .static_call_sites, I suppose...

Is there any enforcement on this? I'm thinking it should be possible to
detect the presence of these sections and yell a bit.
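
A rough sketch of what such yelling could look like, at the point where
the klp core has parsed a relocation section's object name (hypothetical
helper; the section list and name matching are simplified):

/* Reject module-specific klp relas against special sections. */
static int klp_check_special_section(const char *sec_objname,
				     const char *secname)
{
	if (strcmp(sec_objname, "vmlinux") &&
	    (strstr(secname, "__jump_table") ||
	     strstr(secname, ".static_call_sites") ||
	     strstr(secname, ".altinstructions") ||
	     strstr(secname, ".parainstructions"))) {
		pr_err("unsupported klp relocation section %s\n", secname);
		return -EINVAL;
	}

	return 0;
}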

OTOH, it should be possible to actually handle this, but let's do that
later.

> + * On the other hand, a .klp.rela.vmlinux.__jump_table section is fine,
> + * as it can be resolved early enough during the load of the klp module,
> + * as described above.
> + */

> diff --git a/kernel/module.c b/kernel/module.c
> index fe5bd382759c..ff4347385f05 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2327,11 +2327,9 @@ static int apply_relocations(struct module *mod, const struct load_info *info)
> if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC))
> continue;
>
> - /* Livepatch relocation sections are applied by livepatch */
> if (info->sechdrs[i].sh_flags & SHF_RELA_LIVEPATCH)
> - continue;
> -
> - if (info->sechdrs[i].sh_type == SHT_REL)
> + err = klp_write_relocations(mod, NULL);
> + else if (info->sechdrs[i].sh_type == SHT_REL)
> err = apply_relocate(info->sechdrs, info->strtab,
> info->index.sym, i, mod);
> else if (info->sechdrs[i].sh_type == SHT_RELA)

Like here, we can yell and error out if .klp.rela.{mod}.__jump_table
sections are encountered.


Other than that, this should work I suppose.

2019-10-29 01:29:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 15/16] module: Move where we mark modules RO,X

On Fri, Oct 25, 2019 at 08:17:41PM -0500, Josh Poimboeuf wrote:

> I can take over the livepatch-specific patches if you want. Or however
> you want to do it.

Sure, feel free to take and route the livepatch patches. Then I'll wait
until those and the ARM64 patches land before I pick this up again.