Hi,
These patches fix a few issues where KCOV code could trigger recursive
faults, discovered while debugging a patch enabling KCOV for arch/arm:
* On CONFIG_PREEMPT kernels, there's a small race window where
__sanitizer_cov_trace_pc() can see a bogus kcov_area.
* Lazy faulting of the vmalloc area can cause mutual recursion between
fault handling code and __sanitizer_cov_trace_pc().
* During the context switch, switching the mm can cause the kcov_area to
be transiently unmapped.
These are prerequisites for enabling KCOV on arm, but the issues
themselves are generic -- we just happen to avoid them by chance rather
than design on x86-64 and arm64.
I've tested this on arm atop v4.17-rc3, with KCOV enabled.
Thanks,
Mark.
Mark Rutland (3):
kcov: ensure irq code sees a valid area
kcov: prefault the kcov_area
sched/core / kcov: avoid kcov_area during task switch
include/linux/kcov.h | 14 ++++++++++++++
include/linux/sched.h | 2 +-
kernel/kcov.c | 17 +++++++++++++++--
kernel/sched/core.c | 4 ++++
4 files changed, 34 insertions(+), 3 deletions(-)
--
2.11.0
For kernels built with CONFIG_PREEMPT, some C code may execute before or
after the interrupt handler, while the hardirq count is zero. In these
cases, in_task() can return true.
A task can be interrupted in the middle of a KCOV_DISABLE ioctl while it
resets the task's kcov data via kcov_task_init(). Instrumented code
executed during this period will call __sanitizer_cov_trace_pc(), and as
in_task() returns true, will inspect t->kcov_mode before trying to write
to t->kcov_area.
In kcov_init_task() Since we update t->kcov_{mode,area,size} with plain
stores, which may be re-ordered, torn, etc. Thus
__sanitizer_cov_trace_pc() may see bogus values for any of these fields,
and may attempt to write to memory which is not mapped.
Let's avoid this by using WRITE_ONCE() to set t->kcov_mode, with a
barrier() to ensure this is ordered before we clear t->kcov_{area,size}.
This ensures that any code executed while kcov_init_task() is preempted
will either see valid values for t->kcov_{area,size}, or will see that
t->kcov_mode is KCOV_MODE_DISABLED, and bail out without touching
t->kcov_area.
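For reference, the reader side is roughly the following (a simplified
sketch of check_kcov_mode() in kernel/kcov.c, not the verbatim source):
the mode is read and checked before any dereference of t->kcov_area, so
publishing KCOV_MODE_DISABLED first is enough for instrumented code to
bail out safely.

static bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
{
	enum kcov_mode mode;

	/* Coverage in hard IRQ context is ignored. */
	if (!in_task())
		return false;
	/* Pairs with the WRITE_ONCE()/barrier() in kcov_task_init(). */
	mode = READ_ONCE(t->kcov_mode);
	barrier();
	/* Only on a match do callers go on to write to t->kcov_area. */
	return mode == needed_mode;
}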
Signed-off-by: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
---
kernel/kcov.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/kcov.c b/kernel/kcov.c
index 2c16f1ab5e10..5be9a60a959f 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -241,7 +241,8 @@ static void kcov_put(struct kcov *kcov)
void kcov_task_init(struct task_struct *t)
{
- t->kcov_mode = KCOV_MODE_DISABLED;
+ WRITE_ONCE(t->kcov_mode, KCOV_MODE_DISABLED);
+ barrier();
t->kcov_size = 0;
t->kcov_area = NULL;
t->kcov = NULL;
--
2.11.0
On many architectures the vmalloc area is lazily faulted in upon first
access. This is problematic for KCOV, as __sanitizer_cov_trace_pc
accesses the (vmalloc'd) kcov_area, and fault handling code may be
instrumented. If an access to kcov_area faults, this will result in
mutual recursion through the fault handling code and
__sanitizer_cov_trace_pc(), eventually leading to stack corruption
and/or overflow.
We can avoid this by faulting in the kcov_area before
__sanitizer_cov_trace_pc() is permitted to access it. Once it has been
faulted in, it will remain present in the process page tables, and will
not fault again.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
---
kernel/kcov.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/kernel/kcov.c b/kernel/kcov.c
index 5be9a60a959f..3b82f8e258da 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -324,6 +324,17 @@ static int kcov_close(struct inode *inode, struct file *filep)
return 0;
}
+static void kcov_fault_in_area(struct kcov *kcov)
+{
+ unsigned long stride = PAGE_SIZE / sizeof(unsigned long);
+ unsigned long *area = kcov->area;
+ unsigned long offset;
+
+ for (offset = 0; offset < kcov->size; offset += stride) {
+ READ_ONCE(area[offset]);
+ }
+}
+
static int kcov_ioctl_locked(struct kcov *kcov, unsigned int cmd,
unsigned long arg)
{
@@ -372,6 +383,7 @@ static int kcov_ioctl_locked(struct kcov *kcov, unsigned int cmd,
#endif
else
return -EINVAL;
+ kcov_fault_in_area(kcov);
/* Cache in task struct for performance. */
t->kcov_size = kcov->size;
t->kcov_area = kcov->area;
--
2.11.0
During a context switch, we first switch_mm() to the next task's mm,
then switch_to() that new task. This means that vmalloc'd regions which
had previously been faulted in can transiently disappear in the context
of the prev task.
Functions instrumented by KCOV may try to access a vmalloc'd kcov_area
during this window, and as the fault handling code is instrumented, this
results in a recursive fault.
We must avoid accessing any kcov_area during this window. We can do so
with a new flag in kcov_mode, set prior to switching the mm, and cleared
once the new task is live. Since task_struct::kcov_mode isn't always a
specific enum kcov_mode value, this is made an unsigned int.
The manipulation is hidden behind kcov_{prepare,finish}_switch()
helpers, which are empty for !CONFIG_KCOV kernels.
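For reference, the effect on the reader side (a simplified sketch, not
code added by this patch): check_kcov_mode() tests t->kcov_mode for
equality against a specific enum kcov_mode value, so while KCOV_IN_CTXSW
is set no comparison can succeed and instrumented code returns before
touching t->kcov_area.

	mode = READ_ONCE(t->kcov_mode);
	barrier();
	/*
	 * During the switch window mode is e.g.
	 * KCOV_MODE_TRACE_PC | KCOV_IN_CTXSW, which never equals
	 * needed_mode, so we bail out without touching t->kcov_area.
	 */
	return mode == needed_mode;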
Signed-off-by: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
include/linux/kcov.h | 14 ++++++++++++++
include/linux/sched.h | 2 +-
kernel/kcov.c | 2 +-
kernel/sched/core.c | 4 ++++
4 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/include/linux/kcov.h b/include/linux/kcov.h
index 3ecf6f5e3a5f..b76a1807028d 100644
--- a/include/linux/kcov.h
+++ b/include/linux/kcov.h
@@ -22,13 +22,27 @@ enum kcov_mode {
KCOV_MODE_TRACE_CMP = 3,
};
+#define KCOV_IN_CTXSW (1 << 30)
+
void kcov_task_init(struct task_struct *t);
void kcov_task_exit(struct task_struct *t);
+#define kcov_prepare_switch(t) \
+do { \
+ (t)->kcov_mode |= KCOV_IN_CTXSW; \
+} while (0)
+
+#define kcov_finish_switch(t) \
+do { \
+ (t)->kcov_mode &= ~KCOV_IN_CTXSW; \
+} while (0)
+
#else
static inline void kcov_task_init(struct task_struct *t) {}
static inline void kcov_task_exit(struct task_struct *t) {}
+static inline void kcov_prepare_switch(struct task_struct *t) {}
+static inline void kcov_finish_switch(struct task_struct *t) {}
#endif /* CONFIG_KCOV */
#endif /* _LINUX_KCOV_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b3d697f3b573..3d9edcf57e21 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1078,7 +1078,7 @@ struct task_struct {
#ifdef CONFIG_KCOV
/* Coverage collection mode enabled for this task (0 if disabled): */
- enum kcov_mode kcov_mode;
+ unsigned int kcov_mode;
/* Size of the kcov_area: */
unsigned int kcov_size;
diff --git a/kernel/kcov.c b/kernel/kcov.c
index 3b82f8e258da..f73ab8dc3ba7 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -58,7 +58,7 @@ struct kcov {
static bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
{
- enum kcov_mode mode;
+ unsigned int mode;
/*
* We are interested in code coverage as a function of a syscall inputs,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5e10aaeebfcc..52078516803a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7,6 +7,8 @@
*/
#include "sched.h"
+#include <linux/kcov.h>
+
#include <asm/switch_to.h>
#include <asm/tlb.h>
@@ -2632,6 +2634,7 @@ static inline void
prepare_task_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next)
{
+ kcov_prepare_switch(prev);
sched_info_switch(rq, prev, next);
perf_event_task_sched_out(prev, next);
fire_sched_out_preempt_notifiers(prev, next);
@@ -2700,6 +2703,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
finish_task(prev);
finish_lock_switch(rq);
finish_arch_post_lock_switch();
+ kcov_finish_switch(current);
fire_sched_in_preempt_notifiers(current);
/*
--
2.11.0
On 05/04/2018 04:55 PM, Mark Rutland wrote:
> +#define kcov_prepare_switch(t) \
> +do { \
> + (t)->kcov_mode |= KCOV_IN_CTXSW; \
> +} while (0)
> +
> +#define kcov_finish_switch(t) \
> +do { \
> + (t)->kcov_mode &= ~KCOV_IN_CTXSW; \
> +} while (0)
> +
Why macros?
> #else
>
On 05/04/2018 04:55 PM, Mark Rutland wrote:
>
> +static void kcov_fault_in_area(struct kcov *kcov)
> +{
> + unsigned long stride = PAGE_SIZE / sizeof(unsigned long);
> + unsigned long *area = kcov->area;
> + unsigned long offset;
> +
> + for (offset = 0; offset < kcov->size; offset += stride) {
> + READ_ONCE(area[offset]);
> + }
Usually we don't use {} for single statement blocks.
> +}
> +
On Fri, May 04, 2018 at 05:32:26PM +0300, Andrey Ryabinin wrote:
> On 05/04/2018 04:55 PM, Mark Rutland wrote:
>
> > +#define kcov_prepare_switch(t) \
> > +do { \
> > + (t)->kcov_mode |= KCOV_IN_CTXSW; \
> > +} while (0)
> > +
> > +#define kcov_finish_switch(t) \
> > +do { \
> > + (t)->kcov_mode &= ~KCOV_IN_CTXSW; \
> > +} while (0)
> > +
>
> Why macros?
I can't use static inline functions without a circular include
dependency between <linux/sched.h> and <linux/kcov.h>, since the
definition of task_struct uses things defined in <linux/kcov.h>.
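To illustrate (a sketch of the failure mode, not code from the series):
when <linux/kcov.h> is parsed, task_struct is only a forward
declaration, so a static inline there can't dereference it, whereas a
macro expands at its use site, where <linux/sched.h> has already
provided the full definition.

struct task_struct;	/* all kcov.h can see without including sched.h */

static inline void kcov_prepare_switch(struct task_struct *t)
{
	/* would not compile: task_struct is an incomplete type here */
	t->kcov_mode |= KCOV_IN_CTXSW;
}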
Thanks,
Mark.
On Fri, May 04, 2018 at 05:36:49PM +0300, Andrey Ryabinin wrote:
>
>
> On 05/04/2018 04:55 PM, Mark Rutland wrote:
>
> >
> > +static void kcov_fault_in_area(struct kcov *kcov)
> > +{
> > + unsigned long stride = PAGE_SIZE / sizeof(unsigned long);
> > + unsigned long *area = kcov->area;
> > + unsigned long offset;
> > +
> > + for (offset = 0; offset < kcov->size; offset += stride) {
> > + READ_ONCE(area[offset]);
> > + }
>
> Usually we don't use {} for single statement blocks.
>
> > +}
> > +
I'll remove the braces.
Can I consider that an ack, otherwise? :)
Thanks,
Mark.
On 05/04/2018 05:38 PM, Mark Rutland wrote:
> On Fri, May 04, 2018 at 05:36:49PM +0300, Andrey Ryabinin wrote:
>>
>>
>> On 05/04/2018 04:55 PM, Mark Rutland wrote:
>>
>>>
>>> +static void kcov_fault_in_area(struct kcov *kcov)
>>> +{
>>> + unsigned long stride = PAGE_SIZE / sizeof(unsigned long);
>>> + unsigned long *area = kcov->area;
>>> + unsigned long offset;
>>> +
>>> + for (offset = 0; offset < kcov->size; offset += stride) {
>>> + READ_ONCE(area[offset]);
>>> + }
>>
>> Usually we don't use {} for single statement blocks.
>>
>>> +}
>>> +
>
> I'll remove the braces.
>
> Can I consider that an ack, otherwise? :)
Yes, ack for all 3 patches.
>
> Thanks,
> Mark.
>
On Fri, May 04, 2018 at 02:55:33PM +0100, Mark Rutland wrote:
> In kcov_init_task() Since we update t->kcov_{mode,area,size} with plain
> stores, which may be re-ordered, torn, etc. Thus
> __sanitizer_cov_trace_pc() may see bogus values for any of these fields,
> and may attempt to write to memory which is not mapped.
Whoops; that 'Since' shouldn't be there. I've fixed this up locally.
Mark.
On Fri, 4 May 2018 14:55:34 +0100 Mark Rutland <[email protected]> wrote:
> On many architectures the vmalloc area is lazily faulted in upon first
> access. This is problematic for KCOV, as __sanitizer_cov_trace_pc
> accesses the (vmalloc'd) kcov_area, and fault handling code may be
> instrumented. If an access to kcov_area faults, this will result in
> mutual recursion through the fault handling code and
> __sanitizer_cov_trace_pc(), eventually leading to stack corruption
> and/or overflow.
>
> We can avoid this by faulting in the kcov_area before
> __sanitizer_cov_trace_pc() is permitted to access it. Once it has been
> faulted in, it will remain present in the process page tables, and will
> not fault again.
>
> ...
>
> --- a/kernel/kcov.c
> +++ b/kernel/kcov.c
> @@ -324,6 +324,17 @@ static int kcov_close(struct inode *inode, struct file *filep)
> return 0;
> }
>
> +static void kcov_fault_in_area(struct kcov *kcov)
It would be nice to have a comment here explaining why the function
exists.
umm, this?
--- a/kernel/kcov.c~kcov-prefault-the-kcov_area-fix-fix
+++ a/kernel/kcov.c
@@ -324,6 +324,10 @@ static int kcov_close(struct inode *inod
return 0;
}
+/*
+ * fault in a lazily-faulted vmalloc area, to avoid recursion issues if the
+ * vmalloc fault handler itself is instrumented.
+ */
static void kcov_fault_in_area(struct kcov *kcov)
{
unsigned long stride = PAGE_SIZE / sizeof(unsigned long);
> +{
> + unsigned long stride = PAGE_SIZE / sizeof(unsigned long);
> + unsigned long *area = kcov->area;
> + unsigned long offset;
> +
> + for (offset = 0; offset < kcov->size; offset += stride) {
> + READ_ONCE(area[offset]);
> + }
> +}
On Tue, May 08, 2018 at 03:51:58PM -0700, Andrew Morton wrote:
> On Fri, 4 May 2018 14:55:34 +0100 Mark Rutland <[email protected]> wrote:
>
> > On many architectures the vmalloc area is lazily faulted in upon first
> > access. This is problematic for KCOV, as __sanitizer_cov_trace_pc
> > accesses the (vmalloc'd) kcov_area, and fault handling code may be
> > instrumented. If an access to kcov_area faults, this will result in
> > mutual recursion through the fault handling code and
> > __sanitizer_cov_trace_pc(), eventually leading to stack corruption
> > and/or overflow.
> >
> > We can avoid this by faulting in the kcov_area before
> > __sanitizer_cov_trace_pc() is permitted to access it. Once it has been
> > faulted in, it will remain present in the process page tables, and will
> > not fault again.
> >
> > ...
> >
> > --- a/kernel/kcov.c
> > +++ b/kernel/kcov.c
> > @@ -324,6 +324,17 @@ static int kcov_close(struct inode *inode, struct file *filep)
> > return 0;
> > }
> >
> > +static void kcov_fault_in_area(struct kcov *kcov)
>
> It would be nice to have a comment here explaining why the function
> exists.
>
> umm, this?
>
> --- a/kernel/kcov.c~kcov-prefault-the-kcov_area-fix-fix
> +++ a/kernel/kcov.c
> @@ -324,6 +324,10 @@ static int kcov_close(struct inode *inod
> return 0;
> }
>
> +/*
> + * fault in a lazily-faulted vmalloc area, to avoid recursion issues if the
> + * vmalloc fault handler itself is instrumented.
> + */
Sounds good to me. Perhaps we want a little more detail, e.g.
/*
* Fault in a lazily-faulted vmalloc area before it can be used by
 * __sanitizer_cov_trace_pc(), to avoid recursion issues if any code on
* the vmalloc fault handling path is instrumented.
*/
> static void kcov_fault_in_area(struct kcov *kcov)
I also think it might make sense to rename this to kcov_prefault_area(),
so that this doesn't sound like a fault handler, but that's not a big
deal.
Thanks for handling this!
Mark.