Syzbot reports many UAF issues for workqueues and timers; see [1] and [2].
In some of these, the access or allocation happened in process_one_work(),
and the free stack shown in the KASAN report is useless: it does not help
developers fix the workqueue UAF. The same may hold for timers.

This patchset improves KASAN reports by including the workqueue queueing
stack and the timer queueing stack. This information helps developers
track down use-after-free and double-free bugs.

Generic KASAN will record the last two workqueue and timer queueing
stacks and print them in the KASAN report. This applies to generic
KASAN only.

To store these stacks, we add new members to struct kasan_alloc_meta:
- two workqueue queueing stacks, 8 bytes in total;
- two timer queueing stacks, 8 bytes in total.

The original struct kasan_alloc_meta is 16 bytes. With the new members,
it grows to 32 bytes, which is a well-aligned size, so the added
metadata should not waste extra memory on padding.

[1] https://groups.google.com/g/syzkaller-bugs/search?q=%22use-after-free%22+process_one_work
[2] https://groups.google.com/g/syzkaller-bugs/search?q=%22use-after-free%22%20expire_timers
[3] https://bugzilla.kernel.org/show_bug.cgi?id=198437

Walter Wu (5):
  timer: kasan: record and print timer stack
  workqueue: kasan: record and print workqueue stack
  lib/test_kasan.c: add timer test case
  lib/test_kasan.c: add workqueue test case
  kasan: update documentation for generic kasan

 Documentation/dev-tools/kasan.rst |  4 ++--
 include/linux/kasan.h             |  4 ++++
 kernel/time/timer.c               |  2 ++
 kernel/workqueue.c                |  3 +++
 lib/test_kasan.c                  | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/kasan/generic.c                | 42 ++++++++++++++++++++++++++++++++++++++++++
 mm/kasan/kasan.h                  |  6 +++++-
 mm/kasan/report.c                 | 22 ++++++++++++++++++++++
 8 files changed, 134 insertions(+), 3 deletions(-)
> On Aug 10, 2020, at 3:21 AM, Walter Wu <[email protected]> wrote:
> The original struct kasan_alloc_meta is 16 bytes. With the new members,
> it grows to 32 bytes, which is a well-aligned size, so the added
> metadata should not waste extra memory on padding.

Getting debugging tools complicated surely is the best way to kill them. I would argue that it only makes sense to complicate them if the extra information is useful most of the time, and I have never felt or heard that this is the case. This reminds me of your recent call_rcu() stacks, which most of the time just make parsing the report cumbersome. Thus, I urge that this exercise of over-engineering for special cases stop entirely.
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20200810072115.429-1-walter-zh.wu%40mediatek.com.
On Mon, 2020-08-10 at 07:19 -0400, Qian Cai wrote:
> Getting debugging tools complicated surely is the best way to kill them. I would argue that it only makes sense to complicate them if the extra information is useful most of the time, and I have never felt or heard that this is the case. This reminds me of your recent call_rcu() stacks, which most of the time just make parsing the report cumbersome. Thus, I urge that this exercise of over-engineering for special cases stop entirely.
>
A good debugging tool should provide complete information for solving
an issue. We should focus on whether KASAN reports should always show
this debugging information, or whether we should add an option to
control it. This feature was Dimitry's suggestion, see [1], so I think
it needs to be implemented. Maybe we can wait for his response.

[1] https://lkml.org/lkml/2020/6/23/256

Thanks.
> >
> > [1]https://groups.google.com/g/syzkaller-bugs/search?q=%22use-after-free%22+process_one_work
> > [2]https://groups.google.com/g/syzkaller-bugs/search?q=%22use-after-free%22%20expire_timers
> > [3]https://bugzilla.kernel.org/show_bug.cgi?id=198437
> >
> > Walter Wu (5):
> > timer: kasan: record and print timer stack
> > workqueue: kasan: record and print workqueue stack
> > lib/test_kasan.c: add timer test case
> > lib/test_kasan.c: add workqueue test case
> > kasan: update documentation for generic kasan
> >
> > Documentation/dev-tools/kasan.rst | 4 ++--
> > include/linux/kasan.h | 4 ++++
> > kernel/time/timer.c | 2 ++
> > kernel/workqueue.c | 3 +++
> > lib/test_kasan.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > mm/kasan/generic.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> > mm/kasan/kasan.h | 6 +++++-
> > mm/kasan/report.c | 22 ++++++++++++++++++++++
> > 8 files changed, 134 insertions(+), 3 deletions(-)
> >
> > --
> > You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20200810072115.429-1-walter-zh.wu%40mediatek.com.
On Mon, 2020-08-10 at 19:50 +0800, Walter Wu wrote:
> A good debugging tool should provide complete information for solving
> an issue. We should focus on whether KASAN reports should always show
> this debugging information, or whether we should add an option to
> control it. This feature was Dmitry's suggestion, see [1], so I think
> it needs to be implemented. Maybe we can wait for his response.
>
> [1] https://lkml.org/lkml/2020/6/23/256
>
> Thanks.
>

Fixing the typo in Dmitry's name; my apologies to him.
I am also adding a Bugzilla entry that shows why this is needed; please see [1].

[1] https://bugzilla.kernel.org/show_bug.cgi?id=198437
On Mon, Aug 10, 2020 at 07:50:57PM +0800, Walter Wu wrote:
> A good debugging tool should provide complete information for solving
> an issue. We should focus on whether KASAN reports should always show
> this debugging information, or whether we should add an option to
> control it. This feature was Dimitry's suggestion, see [1], so I think
> it needs to be implemented. Maybe we can wait for his response.
>
> [1] https://lkml.org/lkml/2020/6/23/256

I don't know whether it is Dmitry's pipe dream that every KASAN report
would enable developers to fix the bug without reproducing it. There is
always an ongoing struggle between making the kernel easier to debug and
keeping things less cumbersome.

On the other hand, Dmitry's suggestion makes sense only if the price we
are going to pay is fair. Given the current diffstat, and my recent
experience, as a heavy KASAN user myself, of call_rcu() stacks "wasting"
screen space, I can't really get excited about pushing the limit again.
On Mon, 2020-08-10 at 08:44 -0400, Qian Cai wrote:
> I don't know whether it is Dmitry's pipe dream that every KASAN report
> would enable developers to fix the bug without reproducing it. There is
> always an ongoing struggle between making the kernel easier to debug and
> keeping things less cumbersome.
>
> On the other hand, Dmitry's suggestion makes sense only if the price we
> are going to pay is fair. Given the current diffstat, and my recent
> experience, as a heavy KASAN user myself, of call_rcu() stacks "wasting"
> screen space, I can't really get excited about pushing the limit again.
>

If you are concerned that the report is too long, maybe we can add an
option that lets the user decide whether to print these stacks
(including the call_rcu() ones). Would that satisfy everyone?
On Mon, Aug 10, 2020 at 10:31:22PM +0800, Walter Wu wrote:
> If you are concerned that the report is too long, maybe we can add an
> option that lets the user decide whether to print these stacks
> (including the call_rcu() ones). Would that satisfy everyone?

Adding kernel config options is just another way to add complexity, and
it has a real cost. The only other approach I can think of right now is
to create some kind of plugin system that lets KASAN run eBPF scripts
(for example) to deal with those special cases.