2022-07-21 18:03:09

by Song Liu

[permalink] [raw]
Subject: [PATCH v3] livepatch: Clear relocation targets on a module removal

From: Miroslav Benes <[email protected]>

Josh reported a bug:

When the object to be patched is a module, and that module is
rmmod'ed and reloaded, it fails to load with:

module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

The livepatch module has a relocation which references a symbol
in the _previous_ loading of nfsd. When apply_relocate_add()
tries to replace the old relocation with a new one, it sees that
the previous one is nonzero and it errors out.

On ppc64le, we have a similar issue:

module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

He also proposed three different solutions. We could remove the error
check in apply_relocate_add() introduced by commit eda9cec4c9a1
("x86/module: Detect and skip invalid relocations"). However the check
is useful for detecting corrupted modules.

We could also deny the patched modules to be removed. If it proved to be
a major drawback for users, we could still implement a different
approach. The solution would also complicate the existing code a lot.

We thus decided to reverse the relocation patching (clear all relocation
targets on x86_64). The solution is not
universal and is too much arch-specific, but it may prove to be simpler
in the end.

Reported-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Miroslav Benes <[email protected]>
Signed-off-by: Song Liu <[email protected]>

---

Changes from v2:
1. Rewrite x86 changes to match current code style.
2. Remove powerpc changes as there is no test coverage in v3.
3. Only keep 1/3 of v2.

v2: https://lore.kernel.org/all/[email protected]/T/#u
---
arch/powerpc/kernel/module_64.c | 10 ++++
arch/s390/kernel/module.c | 8 ++++
arch/x86/kernel/module.c | 85 +++++++++++++++++++++++++++++++++
include/linux/moduleloader.h | 7 +++
kernel/livepatch/core.c | 41 +++++++++++++++-
5 files changed, 150 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7e45dc98df8a..9125f0aff5d4 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -739,6 +739,16 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
return 0;
}

+#ifdef CONFIG_LIVEPATCH
+void clear_relocate_add(Elf64_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *me)
+{
+}
+#endif
+
#ifdef CONFIG_DYNAMIC_FTRACE
int module_trampoline_target(struct module *mod, unsigned long addr,
unsigned long *target)
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 26125a9c436d..abbf45715354 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -500,6 +500,14 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct module *me,
}
#endif /* CONFIG_FUNCTION_TRACER */

+#ifdef CONFIG_LIVEPATCH
+void clear_relocate_add(Elf64_Shdr *sechdrs, const char *strtab,
+ unsigned int symindex, unsigned int relsec,
+ struct module *me)
+{
+}
+#endif
+
int module_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *me)
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 67828d973389..2e4f219ea98b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -245,6 +245,91 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
return ret;
}

+#ifdef CONFIG_LIVEPATCH
+
+static void __clear_relocate_add(Elf64_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *me,
+ void *(*write)(void *dest, const void *src, size_t len))
+{
+ unsigned int i;
+ Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
+ Elf64_Sym *sym;
+ void *loc;
+ u64 zero_u64 = 0ULL;
+ u32 zero_u32 = 0;
+
+ DEBUGP("Clearing relocate section %u to %u\n",
+ relsec, sechdrs[relsec].sh_info);
+ for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
+ /* This is where to make the change */
+ loc = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
+ + rel[i].r_offset;
+
+ /*
+ * This is the symbol it is referring to. Note that all
+ * undefined symbols have been resolved.
+ */
+ sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
+ + ELF64_R_SYM(rel[i].r_info);
+
+ DEBUGP("type %d st_value %Lx r_addend %Lx loc %Lx\n",
+ (int)ELF64_R_TYPE(rel[i].r_info),
+ sym->st_value, rel[i].r_addend, (u64)loc);
+
+ switch (ELF64_R_TYPE(rel[i].r_info)) {
+ case R_X86_64_NONE:
+ break;
+ case R_X86_64_64:
+ write(loc, &zero_u64, 8);
+ break;
+ case R_X86_64_32:
+ write(loc, &zero_u32, 4);
+ break;
+ case R_X86_64_32S:
+ write(loc, &zero_u32, 4);
+ break;
+ case R_X86_64_PC32:
+ case R_X86_64_PLT32:
+ write(loc, &zero_u32, 4);
+ break;
+ case R_X86_64_PC64:
+ write(loc, &zero_u64, 8);
+ break;
+ default:
+ pr_err("%s: Unknown rela relocation: %llu\n",
+ me->name, ELF64_R_TYPE(rel[i].r_info));
+ break;
+ }
+ }
+}
+
+void clear_relocate_add(Elf64_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *me)
+{
+ bool early = me->state == MODULE_STATE_UNFORMED;
+ void *(*write)(void *, const void *, size_t) = memcpy;
+
+ if (!early) {
+ write = text_poke;
+ mutex_lock(&text_mutex);
+ }
+
+ __clear_relocate_add(sechdrs, strtab, symindex, relsec, me,
+ write);
+
+ if (!early) {
+ text_poke_sync();
+ mutex_unlock(&text_mutex);
+ }
+}
+#endif
+
#endif

int module_finalize(const Elf_Ehdr *hdr,
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..d22b36b84b4b 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -72,6 +72,13 @@ int apply_relocate_add(Elf_Shdr *sechdrs,
unsigned int symindex,
unsigned int relsec,
struct module *mod);
+#ifdef CONFIG_LIVEPATCH
+void clear_relocate_add(Elf64_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *me);
+#endif
#else
static inline int apply_relocate_add(Elf_Shdr *sechdrs,
const char *strtab,
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index bc475e62279d..5c0d8a4eba13 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -316,6 +316,45 @@ int klp_apply_section_relocs(struct module *pmod, Elf_Shdr *sechdrs,
return apply_relocate_add(sechdrs, strtab, symndx, secndx, pmod);
}

+static void klp_clear_object_relocations(struct module *pmod,
+ struct klp_object *obj)
+{
+ int i, cnt;
+ const char *objname, *secname;
+ char sec_objname[MODULE_NAME_LEN];
+ Elf_Shdr *sec;
+
+ objname = klp_is_module(obj) ? obj->name : "vmlinux";
+
+ /* For each klp relocation section */
+ for (i = 1; i < pmod->klp_info->hdr.e_shnum; i++) {
+ sec = pmod->klp_info->sechdrs + i;
+ secname = pmod->klp_info->secstrings + sec->sh_name;
+ if (!(sec->sh_flags & SHF_RELA_LIVEPATCH))
+ continue;
+
+ /*
+ * Format: .klp.rela.sec_objname.section_name
+ * See comment in klp_resolve_symbols() for an explanation
+ * of the selected field width value.
+ */
+ secname = pmod->klp_info->secstrings + sec->sh_name;
+ cnt = sscanf(secname, ".klp.rela.%55[^.]", sec_objname);
+ if (cnt != 1) {
+ pr_err("section %s has an incorrectly formatted name\n",
+ secname);
+ continue;
+ }
+
+ if (strcmp(objname, sec_objname))
+ continue;
+
+ clear_relocate_add(pmod->klp_info->sechdrs,
+ pmod->core_kallsyms.strtab,
+ pmod->klp_info->symndx, i, pmod);
+ }
+}
+
/*
* Sysfs Interface
*
@@ -1154,7 +1193,7 @@ static void klp_cleanup_module_patches_limited(struct module *mod,
klp_unpatch_object(obj);

klp_post_unpatch_callback(obj);
-
+ klp_clear_object_relocations(patch->mod, obj);
klp_free_object_loaded(obj);
break;
}
--
2.30.2


2022-07-26 17:56:34

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

Hi folks,

Could you please share your comments on this one?

Thanks,
Song

On Thu, Jul 21, 2022 at 10:52 AM Song Liu <[email protected]> wrote:
>
> From: Miroslav Benes <[email protected]>
>
> Josh reported a bug:
>
> When the object to be patched is a module, and that module is
> rmmod'ed and reloaded, it fails to load with:
>
> module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>
> The livepatch module has a relocation which references a symbol
> in the _previous_ loading of nfsd. When apply_relocate_add()
> tries to replace the old relocation with a new one, it sees that
> the previous one is nonzero and it errors out.
>
> On ppc64le, we have a similar issue:
>
> module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>
> He also proposed three different solutions. We could remove the error
> check in apply_relocate_add() introduced by commit eda9cec4c9a1
> ("x86/module: Detect and skip invalid relocations"). However the check
> is useful for detecting corrupted modules.
>
> We could also deny the patched modules to be removed. If it proved to be
> a major drawback for users, we could still implement a different
> approach. The solution would also complicate the existing code a lot.
>
> We thus decided to reverse the relocation patching (clear all relocation
> targets on x86_64). The solution is not
> universal and is too much arch-specific, but it may prove to be simpler
> in the end.
>
> Reported-by: Josh Poimboeuf <[email protected]>
> Signed-off-by: Miroslav Benes <[email protected]>
> Signed-off-by: Song Liu <[email protected]>
>
> ---
>
> Changes from v2:
> 1. Rewrite x86 changes to match current code style.
> 2. Remove powerpc changes as there is no test coverage in v3.
> 3. Only keep 1/3 of v2.
>
> v2: https://lore.kernel.org/all/[email protected]/T/#u
> ---
[...]

2022-07-26 23:51:35

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
> From: Miroslav Benes <[email protected]>
>
> Josh reported a bug:
>
> When the object to be patched is a module, and that module is
> rmmod'ed and reloaded, it fails to load with:
>
> module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>
> The livepatch module has a relocation which references a symbol
> in the _previous_ loading of nfsd. When apply_relocate_add()
> tries to replace the old relocation with a new one, it sees that
> the previous one is nonzero and it errors out.
>
> On ppc64le, we have a similar issue:
>
> module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>
> He also proposed three different solutions. We could remove the error
> check in apply_relocate_add() introduced by commit eda9cec4c9a1
> ("x86/module: Detect and skip invalid relocations"). However the check
> is useful for detecting corrupted modules.
>
> We could also deny the patched modules to be removed. If it proved to be
> a major drawback for users, we could still implement a different
> approach. The solution would also complicate the existing code a lot.
>
> We thus decided to reverse the relocation patching (clear all relocation
> targets on x86_64). The solution is not
> universal and is too much arch-specific, but it may prove to be simpler
> in the end.
>
> Reported-by: Josh Poimboeuf <[email protected]>
> Signed-off-by: Miroslav Benes <[email protected]>
> Signed-off-by: Song Liu <[email protected]>
>
> ---
>
> Changes from v2:
> 1. Rewrite x86 changes to match current code style.
> 2. Remove powerpc changes as there is no test coverage in v3.
> 3. Only keep 1/3 of v2.

1) All the copy/paste is ugly and IMO guaranteed to eventually introduce
bugs when somebody forgets to update the copy. Wouldn't it be more
robust to reuse the existing apply_relocate_add() code by making it
more generic somehow, like with a new 'clear' bool arg which sets
'val' to zero?

2) We can't only fix x86, powerpc also needs a fix.

3) A selftest would be a good idea.

--
Josh

2022-07-27 04:04:11

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

On Tue, Jul 26, 2022 at 4:33 PM Josh Poimboeuf <[email protected]> wrote:
>
> On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
> > From: Miroslav Benes <[email protected]>
> >
> > Josh reported a bug:
> >
> > When the object to be patched is a module, and that module is
> > rmmod'ed and reloaded, it fails to load with:
> >
> > module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> >
> > The livepatch module has a relocation which references a symbol
> > in the _previous_ loading of nfsd. When apply_relocate_add()
> > tries to replace the old relocation with a new one, it sees that
> > the previous one is nonzero and it errors out.
> >
> > On ppc64le, we have a similar issue:
> >
> > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> >
> > He also proposed three different solutions. We could remove the error
> > check in apply_relocate_add() introduced by commit eda9cec4c9a1
> > ("x86/module: Detect and skip invalid relocations"). However the check
> > is useful for detecting corrupted modules.
> >
> > We could also deny the patched modules to be removed. If it proved to be
> > a major drawback for users, we could still implement a different
> > approach. The solution would also complicate the existing code a lot.
> >
> > We thus decided to reverse the relocation patching (clear all relocation
> > targets on x86_64). The solution is not
> > universal and is too much arch-specific, but it may prove to be simpler
> > in the end.
> >
> > Reported-by: Josh Poimboeuf <[email protected]>
> > Signed-off-by: Miroslav Benes <[email protected]>
> > Signed-off-by: Song Liu <[email protected]>
> >
> > ---
> >
> > Changes from v2:
> > 1. Rewrite x86 changes to match current code style.
> > 2. Remove powerpc changes as there is no test coverage in v3.
> > 3. Only keep 1/3 of v2.
>
> 1) All the copy/paste is ugly and IMO guaranteed to eventually introduce
> bugs when somebody forgets to update the copy. Wouldn't it be more
> robust to reuse the existing apply_relocate_add() code by making it
> more generic somehow, like with a new 'clear' bool arg which sets
> 'val' to zero?

Agreed. I can give it a try.

>
> 2) We can't only fix x86, powerpc also needs a fix.

I have very little experience with powerpc. Would someone be willing to
help with powerpc part of this?

> 3) A selftest would be a good idea.

I will try this.

Thanks,
Song

2022-07-30 23:21:54

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

On Tue, Jul 26, 2022 at 8:54 PM Song Liu <[email protected]> wrote:
>
> On Tue, Jul 26, 2022 at 4:33 PM Josh Poimboeuf <[email protected]> wrote:
> >
> > On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
> > > From: Miroslav Benes <[email protected]>
> > >
> > > Josh reported a bug:
> > >
> > > When the object to be patched is a module, and that module is
> > > rmmod'ed and reloaded, it fails to load with:
> > >
> > > module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > >
> > > The livepatch module has a relocation which references a symbol
> > > in the _previous_ loading of nfsd. When apply_relocate_add()
> > > tries to replace the old relocation with a new one, it sees that
> > > the previous one is nonzero and it errors out.
> > >
> > > On ppc64le, we have a similar issue:
> > >
> > > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > >
> > > He also proposed three different solutions. We could remove the error
> > > check in apply_relocate_add() introduced by commit eda9cec4c9a1
> > > ("x86/module: Detect and skip invalid relocations"). However the check
> > > is useful for detecting corrupted modules.
> > >
> > > We could also deny the patched modules to be removed. If it proved to be
> > > a major drawback for users, we could still implement a different
> > > approach. The solution would also complicate the existing code a lot.
> > >
> > > We thus decided to reverse the relocation patching (clear all relocation
> > > targets on x86_64). The solution is not
> > > universal and is too much arch-specific, but it may prove to be simpler
> > > in the end.
> > >
> > > Reported-by: Josh Poimboeuf <[email protected]>
> > > Signed-off-by: Miroslav Benes <[email protected]>
> > > Signed-off-by: Song Liu <[email protected]>
> > >
> > > ---
> > >
> > > Changes from v2:
> > > 1. Rewrite x86 changes to match current code style.
> > > 2. Remove powerpc changes as there is no test coverage in v3.
> > > 3. Only keep 1/3 of v2.
> >
> > 1) All the copy/paste is ugly and IMO guaranteed to eventually introduce
> > bugs when somebody forgets to update the copy. Wouldn't it be more
> > robust to reuse the existing apply_relocate_add() code by making it
> > more generic somehow, like with a new 'clear' bool arg which sets
> > 'val' to zero?
>
> Agreed. I can give it a try.

I finished this part, though it is not really clean (added if else for
each "case:").

>
> >
> > 2) We can't only fix x86, powerpc also needs a fix.
>
> I have very little experience with powerpc. Would someone be willing to
> help with powerpc part of this?

I guess folks are all busy. Any suggestions on how to test powerpc changes?

>
> > 3) A selftest would be a good idea.
>

I found it is pretty tricky to run the selftests inside a qemu VM. How about
we test it with modules in samples/livepatch? Specifically, we can add a
script try to reload livepatch-shadow-mod.ko.

Thanks,
Song

2022-07-31 03:50:22

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

On Sat, Jul 30, 2022 at 3:32 PM Song Liu <[email protected]> wrote:
>
> On Tue, Jul 26, 2022 at 8:54 PM Song Liu <[email protected]> wrote:
> >
> > On Tue, Jul 26, 2022 at 4:33 PM Josh Poimboeuf <[email protected]> wrote:
> > >
> > > On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
> > > > From: Miroslav Benes <[email protected]>
> > > >
> > > > Josh reported a bug:
> > > >
> > > > When the object to be patched is a module, and that module is
> > > > rmmod'ed and reloaded, it fails to load with:
> > > >
> > > > module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > >
> > > > The livepatch module has a relocation which references a symbol
> > > > in the _previous_ loading of nfsd. When apply_relocate_add()
> > > > tries to replace the old relocation with a new one, it sees that
> > > > the previous one is nonzero and it errors out.
> > > >
> > > > On ppc64le, we have a similar issue:
> > > >
> > > > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > >
> > > > He also proposed three different solutions. We could remove the error
> > > > check in apply_relocate_add() introduced by commit eda9cec4c9a1
> > > > ("x86/module: Detect and skip invalid relocations"). However the check
> > > > is useful for detecting corrupted modules.
> > > >
> > > > We could also deny the patched modules to be removed. If it proved to be
> > > > a major drawback for users, we could still implement a different
> > > > approach. The solution would also complicate the existing code a lot.
> > > >
> > > > We thus decided to reverse the relocation patching (clear all relocation
> > > > targets on x86_64). The solution is not
> > > > universal and is too much arch-specific, but it may prove to be simpler
> > > > in the end.
> > > >
> > > > Reported-by: Josh Poimboeuf <[email protected]>
> > > > Signed-off-by: Miroslav Benes <[email protected]>
> > > > Signed-off-by: Song Liu <[email protected]>
> > > >
> > > > ---
> > > >
> > > > Changes from v2:
> > > > 1. Rewrite x86 changes to match current code style.
> > > > 2. Remove powerpc changes as there is no test coverage in v3.
> > > > 3. Only keep 1/3 of v2.
> > >
> > > 1) All the copy/paste is ugly and IMO guaranteed to eventually introduce
> > > bugs when somebody forgets to update the copy. Wouldn't it be more
> > > robust to reuse the existing apply_relocate_add() code by making it
> > > more generic somehow, like with a new 'clear' bool arg which sets
> > > 'val' to zero?
> >
> > Agreed. I can give it a try.
>
> I finished this part, though it is not really clean (added if else for
> each "case:").
>
> >
> > >
> > > 2) We can't only fix x86, powerpc also needs a fix.
> >
> > I have very little experience with powerpc. Would someone be willing to
> > help with powerpc part of this?
>
> I guess folks are all busy. Any suggestions on how to test powerpc changes?
>
> >
> > > 3) A selftest would be a good idea.
> >
>
> I found it is pretty tricky to run the selftests inside a qemu VM. How about
> we test it with modules in samples/livepatch? Specifically, we can add a
> script try to reload livepatch-shadow-mod.ko.

Actually, livepatch-shadow-mod.ko doesn't have the reload problem before
the fix. Is this expected?

Thanks,
Song

2022-08-01 10:28:00

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

On Sat 2022-07-30 20:20:22, Song Liu wrote:
> On Sat, Jul 30, 2022 at 3:32 PM Song Liu <[email protected]> wrote:
> >
> > On Tue, Jul 26, 2022 at 8:54 PM Song Liu <[email protected]> wrote:
> > >
> > > On Tue, Jul 26, 2022 at 4:33 PM Josh Poimboeuf <[email protected]> wrote:
> > > >
> > > > On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
> > > > > From: Miroslav Benes <[email protected]>
> > > > >
> > > > > Josh reported a bug:
> > > > >
> > > > > When the object to be patched is a module, and that module is
> > > > > rmmod'ed and reloaded, it fails to load with:
> > > > >
> > > > > module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > > >
> > > > > The livepatch module has a relocation which references a symbol
> > > > > in the _previous_ loading of nfsd. When apply_relocate_add()
> > > > > tries to replace the old relocation with a new one, it sees that
> > > > > the previous one is nonzero and it errors out.
> > > > >
> > > > > On ppc64le, we have a similar issue:
> > > > >
> > > > > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > > >
> > > > 3) A selftest would be a good idea.
> > >
> >
> > I found it is pretty tricky to run the selftests inside a qemu VM. How about
> > we test it with modules in samples/livepatch? Specifically, we can add a
> > script try to reload livepatch-shadow-mod.ko.
>
> Actually, livepatch-shadow-mod.ko doesn't have the reload problem before
> the fix. Is this expected?

Good question. I am afraid that there is no easy way to prepare
the selftest at the moment.

There are two situations when a symbol from the livepatched module is
relocated:


1. The livepatch might access a symbol exported by the module via
EXPORT_SYMBOL(). In this case, it is "normal" external symbol
and it gets relocated by the module loader.

But EXPORT_SYMBOL() will create an explicit dependency between the
livepatch and livepatched module. As a result, the livepatch
module could be loaded only when the livepatched module is loaded.
And the livepatched module could not be removed when the livepatch
module is loaded.

In this case, the problem will not exist. Well, the developers
of the livepatch module will probably want to avoid this
dependency.


2. The livepatch module might access a non-exported symbol from another
module using the special elf section for klp relocation, see
section, see Documentation/livepatch/module-elf-format.rst

These symbols are relocated in klp_apply_section_relocs().

The problem is that upstream does not have a support to
create this elf section. There is a patchset for this, see
https://lore.kernel.org/all/[email protected]/
It requires some more review.


Resume: I think that we could not prepare the selftest without
upstreaming klp-convert tool.

Best Regards,
Petr

2022-08-01 11:10:36

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

On Sat 2022-07-30 15:32:58, Song Liu wrote:
> On Tue, Jul 26, 2022 at 8:54 PM Song Liu <[email protected]> wrote:
> >
> > On Tue, Jul 26, 2022 at 4:33 PM Josh Poimboeuf <[email protected]> wrote:
> > >
> > > On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
> > > > From: Miroslav Benes <[email protected]>
> > > >
> > > > Josh reported a bug:
> > > >
> > > > When the object to be patched is a module, and that module is
> > > > rmmod'ed and reloaded, it fails to load with:
> > > >
> > > > module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > >
> > > > The livepatch module has a relocation which references a symbol
> > > > in the _previous_ loading of nfsd. When apply_relocate_add()
> > > > tries to replace the old relocation with a new one, it sees that
> > > > the previous one is nonzero and it errors out.
> > > >
> > > > On ppc64le, we have a similar issue:
> > > >
> > > > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > >
> > > 2) We can't only fix x86, powerpc also needs a fix.
> >
> > I have very little experience with powerpc. Would someone be willing to
> > help with powerpc part of this?
>
> I guess folks are all busy. Any suggestions on how to test powerpc changes?

You might also send the patch and just mention that you were not able
to test it on powerpc. There are people with access to powerpc that
might check it.

Another question is how to actually test it. I wonder how you do
it for x86.

Finally, also s390 supports livepatching. I guess that the problem is
there as well. Is is straightforward to write the revert code, please?

Best Regards,
Petr

2022-08-01 16:36:26

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

On Mon, Aug 1, 2022 at 3:31 AM Petr Mladek <[email protected]> wrote:
>
> On Sat 2022-07-30 15:32:58, Song Liu wrote:
> > On Tue, Jul 26, 2022 at 8:54 PM Song Liu <[email protected]> wrote:
> > >
> > > On Tue, Jul 26, 2022 at 4:33 PM Josh Poimboeuf <[email protected]> wrote:
> > > >
> > > > On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
> > > > > From: Miroslav Benes <[email protected]>
> > > > >
> > > > > Josh reported a bug:
> > > > >
> > > > > When the object to be patched is a module, and that module is
> > > > > rmmod'ed and reloaded, it fails to load with:
> > > > >
> > > > > module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > > >
> > > > > The livepatch module has a relocation which references a symbol
> > > > > in the _previous_ loading of nfsd. When apply_relocate_add()
> > > > > tries to replace the old relocation with a new one, it sees that
> > > > > the previous one is nonzero and it errors out.
> > > > >
> > > > > On ppc64le, we have a similar issue:
> > > > >
> > > > > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > > >
> > > > 2) We can't only fix x86, powerpc also needs a fix.
> > >
> > > I have very little experience with powerpc. Would someone be willing to
> > > help with powerpc part of this?
> >
> > I guess folks are all busy. Any suggestions on how to test powerpc changes?
>
> You might also send the patch and just mention that you were not able
> to test it on powerpc. There are people with access to powerpc that
> might check it.

Let me give it a try.

>
> Another question is how to actually test it. I wonder how you do
> it for x86.

I tested it with a patch on an in-tree module. I was using kpatch-build and
testing it on 5.18 kernels. This is tricky at the moment for 5.19 because of
this issue:

https://github.com/dynup/kpatch/issues/1277

>
> Finally, also s390 supports livepatching. I guess that the problem is
> there as well. Is is straightforward to write the revert code, please?

v2 of the work doesn't have any real logic for s390. I wonder whether
it is needed.

Thanks,
Song

v2: https://lore.kernel.org/all/[email protected]/T/#u

2022-08-01 21:24:13

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

On Mon, Aug 1, 2022 at 3:25 AM Petr Mladek <[email protected]> wrote:
>
> On Sat 2022-07-30 20:20:22, Song Liu wrote:
> > On Sat, Jul 30, 2022 at 3:32 PM Song Liu <[email protected]> wrote:
> > >
> > > On Tue, Jul 26, 2022 at 8:54 PM Song Liu <[email protected]> wrote:
> > > >
> > > > On Tue, Jul 26, 2022 at 4:33 PM Josh Poimboeuf <[email protected]> wrote:
> > > > >
> > > > > On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
> > > > > > From: Miroslav Benes <[email protected]>
> > > > > >
> > > > > > Josh reported a bug:
> > > > > >
> > > > > > When the object to be patched is a module, and that module is
> > > > > > rmmod'ed and reloaded, it fails to load with:
> > > > > >
> > > > > > module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > > > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > > > >
> > > > > > The livepatch module has a relocation which references a symbol
> > > > > > in the _previous_ loading of nfsd. When apply_relocate_add()
> > > > > > tries to replace the old relocation with a new one, it sees that
> > > > > > the previous one is nonzero and it errors out.
> > > > > >
> > > > > > On ppc64le, we have a similar issue:
> > > > > >
> > > > > > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > > > > > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > > > > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > > > >
> > > > > 3) A selftest would be a good idea.
> > > >
> > >
> > > I found it is pretty tricky to run the selftests inside a qemu VM. How about
> > > we test it with modules in samples/livepatch? Specifically, we can add a
> > > script try to reload livepatch-shadow-mod.ko.
> >
> > Actually, livepatch-shadow-mod.ko doesn't have the reload problem before
> > the fix. Is this expected?
>
> Good question. I am afraid that there is no easy way to prepare
> the selftest at the moment.
>
> There are two situations when a symbol from the livepatched module is
> relocated:
>
>
> 1. The livepatch might access a symbol exported by the module via
> EXPORT_SYMBOL(). In this case, it is "normal" external symbol
> and it gets relocated by the module loader.
>
> But EXPORT_SYMBOL() will create an explicit dependency between the
> livepatch and livepatched module. As a result, the livepatch
> module could be loaded only when the livepatched module is loaded.
> And the livepatched module could not be removed when the livepatch
> module is loaded.
>
> In this case, the problem will not exist. Well, the developers
> of the livepatch module will probably want to avoid this
> dependency.
>
>
> 2. The livepatch module might access a non-exported symbol from another
> module using the special elf section for klp relocation, see
> section, see Documentation/livepatch/module-elf-format.rst
>
> These symbols are relocated in klp_apply_section_relocs().
>
> The problem is that upstream does not have a support to
> create this elf section. There is a patchset for this, see
> https://lore.kernel.org/all/[email protected]/
> It requires some more review.
>
>
> Resume: I think that we could not prepare the selftest without
> upstreaming klp-convert tool.

Thanks for the explanation! I suspected the same issue, but couldn't
connect all the logic.

I guess the selftests can wait until the klp-convert tool.

Thanks,
Song

2022-08-02 12:42:30

by Joe Lawrence

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

On 8/1/22 17:19, Song Liu wrote:
> On Mon, Aug 1, 2022 at 3:25 AM Petr Mladek <[email protected]> wrote:
>>
>> On Sat 2022-07-30 20:20:22, Song Liu wrote:
>>> On Sat, Jul 30, 2022 at 3:32 PM Song Liu <[email protected]> wrote:
>>>>
>>>> On Tue, Jul 26, 2022 at 8:54 PM Song Liu <[email protected]> wrote:
>>>>>
>>>>> On Tue, Jul 26, 2022 at 4:33 PM Josh Poimboeuf <[email protected]> wrote:
>>>>>>
>>>>>> On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
>>>>>>> From: Miroslav Benes <[email protected]>
>>>>>>>
>>>>>>> Josh reported a bug:
>>>>>>>
>>>>>>> When the object to be patched is a module, and that module is
>>>>>>> rmmod'ed and reloaded, it fails to load with:
>>>>>>>
>>>>>>> module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
>>>>>>> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>>>>>>> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>>>>>>>
>>>>>>> The livepatch module has a relocation which references a symbol
>>>>>>> in the _previous_ loading of nfsd. When apply_relocate_add()
>>>>>>> tries to replace the old relocation with a new one, it sees that
>>>>>>> the previous one is nonzero and it errors out.
>>>>>>>
>>>>>>> On ppc64le, we have a similar issue:
>>>>>>>
>>>>>>> module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
>>>>>>> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>>>>>>> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>>>>>>>
>>>>>> 3) A selftest would be a good idea.
>>>>>
>>>>
>>>> I found it is pretty tricky to run the selftests inside a qemu VM. How about
>>>> we test it with modules in samples/livepatch? Specifically, we can add a
>>>> script try to reload livepatch-shadow-mod.ko.
>>>
>>> Actually, livepatch-shadow-mod.ko doesn't have the reload problem before
>>> the fix. Is this expected?
>>
>> Good question. I am afraid that there is no easy way to prepare
>> the selftest at the moment.
>>
>> There are two situations when a symbol from the livepatched module is
>> relocated:
>>
>>
>> 1. The livepatch might access a symbol exported by the module via
>> EXPORT_SYMBOL(). In this case, it is "normal" external symbol
>> and it gets relocated by the module loader.
>>
>> But EXPORT_SYMBOL() will create an explicit dependency between the
>> livepatch and livepatched module. As a result, the livepatch
>> module could be loaded only when the livepatched module is loaded.
>> And the livepatched module could not be removed when the livepatch
>> module is loaded.
>>
>> In this case, the problem will not exist. Well, the developers
>> of the livepatch module will probably want to avoid this
>> dependency.
>>
>>
>> 2. The livepatch module might access a non-exported symbol from another
>> module using the special elf section for klp relocation, see
>> section, see Documentation/livepatch/module-elf-format.rst
>>
>> These symbols are relocated in klp_apply_section_relocs().
>>
>> The problem is that upstream does not have a support to
>> create this elf section. There is a patchset for this, see
>> https://lore.kernel.org/all/[email protected]/
>> It requires some more review.
>>
>>
>> Resume: I think that we could not prepare the selftest without
>> upstreaming klp-convert tool.
>
> Thanks for the explanation! I suspected the same issue, but couldn't
> connect all the logic.
>
> I guess the selftests can wait until the klp-convert tool.
>

Hi Song,

Petr is correct about selftests and these relocations. Let me know if
rebasing the klp-convert patchset would be helpful in your testing.
Otherwise kpatch-build is the only (easy?) way to create klp-relocations
as far as I know. (For limited arches anyway.)

Thanks,

--
Joe


2022-08-02 16:38:39

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3] livepatch: Clear relocation targets on a module removal

Hi Joe,

On Tue, Aug 2, 2022 at 5:31 AM Joe Lawrence <[email protected]> wrote:
>
> On 8/1/22 17:19, Song Liu wrote:
> > On Mon, Aug 1, 2022 at 3:25 AM Petr Mladek <[email protected]> wrote:
> >>
> >> On Sat 2022-07-30 20:20:22, Song Liu wrote:
> >>> On Sat, Jul 30, 2022 at 3:32 PM Song Liu <[email protected]> wrote:
> >>>>
> >>>> On Tue, Jul 26, 2022 at 8:54 PM Song Liu <[email protected]> wrote:
> >>>>>
> >>>>> On Tue, Jul 26, 2022 at 4:33 PM Josh Poimboeuf <[email protected]> wrote:
> >>>>>>
> >>>>>> On Thu, Jul 21, 2022 at 10:51:47AM -0700, Song Liu wrote:
> >>>>>>> From: Miroslav Benes <[email protected]>
> >>>>>>>
> >>>>>>> Josh reported a bug:
> >>>>>>>
> >>>>>>> When the object to be patched is a module, and that module is
> >>>>>>> rmmod'ed and reloaded, it fails to load with:
> >>>>>>>
> >>>>>>> module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> >>>>>>> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> >>>>>>> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> >>>>>>>
> >>>>>>> The livepatch module has a relocation which references a symbol
> >>>>>>> in the _previous_ loading of nfsd. When apply_relocate_add()
> >>>>>>> tries to replace the old relocation with a new one, it sees that
> >>>>>>> the previous one is nonzero and it errors out.
> >>>>>>>
> >>>>>>> On ppc64le, we have a similar issue:
> >>>>>>>
> >>>>>>> module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> >>>>>>> livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> >>>>>>> livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> >>>>>>>
> >>>>>> 3) A selftest would be a good idea.
> >>>>>
> >>>>
> >>>> I found it is pretty tricky to run the selftests inside a qemu VM. How about
> >>>> we test it with modules in samples/livepatch? Specifically, we can add a
> >>>> script try to reload livepatch-shadow-mod.ko.
> >>>
> >>> Actually, livepatch-shadow-mod.ko doesn't have the reload problem before
> >>> the fix. Is this expected?
> >>
> >> Good question. I am afraid that there is no easy way to prepare
> >> the selftest at the moment.
> >>
> >> There are two situations when a symbol from the livepatched module is
> >> relocated:
> >>
> >>
> >> 1. The livepatch might access a symbol exported by the module via
> >> EXPORT_SYMBOL(). In this case, it is "normal" external symbol
> >> and it gets relocated by the module loader.
> >>
> >> But EXPORT_SYMBOL() will create an explicit dependency between the
> >> livepatch and livepatched module. As a result, the livepatch
> >> module could be loaded only when the livepatched module is loaded.
> >> And the livepatched module could not be removed when the livepatch
> >> module is loaded.
> >>
> >> In this case, the problem will not exist. Well, the developers
> >> of the livepatch module will probably want to avoid this
> >> dependency.
> >>
> >>
> >> 2. The livepatch module might access a non-exported symbol from another
> >> module using the special elf section for klp relocation, see
> >> section, see Documentation/livepatch/module-elf-format.rst
> >>
> >> These symbols are relocated in klp_apply_section_relocs().
> >>
> >> The problem is that upstream does not have a support to
> >> create this elf section. There is a patchset for this, see
> >> https://lore.kernel.org/all/[email protected]/
> >> It requires some more review.
> >>
> >>
> >> Resume: I think that we could not prepare the selftest without
> >> upstreaming klp-convert tool.
> >
> > Thanks for the explanation! I suspected the same issue, but couldn't
> > connect all the logic.
> >
> > I guess the selftests can wait until the klp-convert tool.
> >
>
> Hi Song,
>
> Petr is correct about selftests and these relocations. Let me know if
> rebasing the klp-convert patchset would be helpful in your testing.
> Otherwise kpatch-build is the only (easy?) way to create klp-relocations
> as far as I know. (For limited arches anyway.)

I was able to test the patch on x86_64 with kpatch-build. I will look into
klp-convert later (after I fix kpatch-build for some OOT modules..)

I the meanwhile, I guess this patch doesn't need to wait for klp-convert
and the selftest?

Thanks,
Song