2021-12-09 01:06:41

by Eric Biggers

Subject: [PATCH v3 0/5] aio: fix use-after-free and missing wakeups

This series fixes two bugs in aio poll, and one issue with POLLFREE more
broadly. This is intended to replace
"[PATCH v5] aio: Add support for the POLLFREE"
(https://lore.kernel.org/r/[email protected])
which has some bugs.

Careful review is appreciated; the aio poll code is very hard to work
with, and it doesn't appear to have many tests. I've verified that it
passes the libaio test suite, which provides some coverage of poll.

Note, it looks like io_uring has the same bugs as aio poll. I haven't
tried to fix io_uring.

This series applies to v5.16-rc4.

Changed v2 => v3:
- Fixed a few commit messages and comments.
- Mention that libaio test suite still passes.

Changed v1 => v2:
- Added wake_up_pollfree().
- Various fixes to the aio poll fixes.
- Improved some comments in aio poll.

Eric Biggers (5):
wait: add wake_up_pollfree()
binder: use wake_up_pollfree()
signalfd: use wake_up_pollfree()
aio: keep poll requests on waitqueue until completed
aio: fix use-after-free due to missing POLLFREE handling

 drivers/android/binder.c        |  21 ++--
 fs/aio.c                        | 184 ++++++++++++++++++++++++------
 fs/signalfd.c                   |  12 +--
 include/linux/wait.h            |  26 +++++
 include/uapi/asm-generic/poll.h |   2 +-
 kernel/sched/wait.c             |   7 ++
6 files changed, 195 insertions(+), 57 deletions(-)

--
2.34.1



2021-12-09 01:06:44

by Eric Biggers

Subject: [PATCH v3 4/5] aio: keep poll requests on waitqueue until completed

From: Eric Biggers <[email protected]>

Currently, aio_poll_wake() will always remove the poll request from the
waitqueue. Then, if aio_poll_complete_work() sees that none of the
polled events are ready and the request isn't cancelled, it re-adds the
request to the waitqueue. (This can easily happen when polling a file
that doesn't pass an event mask when waking up its waitqueue.)

This is fundamentally broken for two reasons:

1. If a wakeup occurs between vfs_poll() and the request being
re-added to the waitqueue, it will be missed because the request
wasn't on the waitqueue at the time. Therefore, IOCB_CMD_POLL
might never complete even if the polled file is ready.

2. When the request isn't on the waitqueue, there is no way to be
notified that the waitqueue is being freed (which happens when its
lifetime is shorter than the struct file's). This is supposed to
happen via the waitqueue entries being woken up with POLLFREE.

Therefore, leave the requests on the waitqueue until they are actually
completed (or cancelled). To keep track of when aio_poll_complete_work
needs to be scheduled, use new fields in struct poll_iocb. Remove the
'done' field which is now redundant.

Note that this is consistent with how sys_poll() and eventpoll work;
their wakeup functions do *not* remove the waitqueue entries.
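
As an illustration, here is a simplified sketch (paraphrased, not
verbatim kernel code) of the sys_poll-style wakeup callback in
fs/select.c: an uninteresting event just leaves the entry on the queue,
and even a matching event wakes the poller without dequeuing anything;
removal happens later, in poll_freewait(), when the poller is done:

	/* Sketch of fs/select.c's pollwake(); details elided. */
	static int pollwake(struct wait_queue_entry *wait, unsigned mode,
			    int sync, void *key)
	{
		struct poll_table_entry *entry =
			container_of(wait, struct poll_table_entry, wait);

		if (key && !(key_to_poll(key) & entry->key))
			return 0;	/* not interested: stay on the queue */
		/* Wake the poller; the entry stays queued. */
		return __pollwake(wait, mode, sync, key);
	}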

Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL")
Cc: <[email protected]> # v4.18+
Signed-off-by: Eric Biggers <[email protected]>
---
fs/aio.c | 83 ++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 63 insertions(+), 20 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 9c81cf611d659..2bc1352a83d8b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -181,8 +181,9 @@ struct poll_iocb {
struct file *file;
struct wait_queue_head *head;
__poll_t events;
- bool done;
bool cancelled;
+ bool work_scheduled;
+ bool work_need_resched;
struct wait_queue_entry wait;
struct work_struct work;
};
@@ -1638,14 +1639,26 @@ static void aio_poll_complete_work(struct work_struct *work)
* avoid further branches in the fast path.
*/
spin_lock_irq(&ctx->ctx_lock);
+ spin_lock(&req->head->lock);
if (!mask && !READ_ONCE(req->cancelled)) {
- add_wait_queue(req->head, &req->wait);
+ /*
+ * The request isn't actually ready to be completed yet.
+ * Reschedule completion if another wakeup came in.
+ */
+ if (req->work_need_resched) {
+ schedule_work(&req->work);
+ req->work_need_resched = false;
+ } else {
+ req->work_scheduled = false;
+ }
+ spin_unlock(&req->head->lock);
spin_unlock_irq(&ctx->ctx_lock);
return;
}
+ list_del_init(&req->wait.entry);
+ spin_unlock(&req->head->lock);
list_del_init(&iocb->ki_list);
iocb->ki_res.res = mangle_poll(mask);
- req->done = true;
spin_unlock_irq(&ctx->ctx_lock);

iocb_put(iocb);
@@ -1659,9 +1672,9 @@ static int aio_poll_cancel(struct kiocb *iocb)

spin_lock(&req->head->lock);
WRITE_ONCE(req->cancelled, true);
- if (!list_empty(&req->wait.entry)) {
- list_del_init(&req->wait.entry);
+ if (!req->work_scheduled) {
schedule_work(&aiocb->poll.work);
+ req->work_scheduled = true;
}
spin_unlock(&req->head->lock);

@@ -1680,20 +1693,26 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
if (mask && !(mask & req->events))
return 0;

- list_del_init(&req->wait.entry);
-
- if (mask && spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
+ /*
+ * Complete the request inline if possible. This requires that three
+ * conditions be met:
+ * 1. An event mask must have been passed. If a plain wakeup was done
+ * instead, then mask == 0 and we have to call vfs_poll() to get
+ * the events, so inline completion isn't possible.
+ * 2. The completion work must not have already been scheduled.
+ * 3. ctx_lock must not be busy. We have to use trylock because we
+ * already hold the waitqueue lock, so this inverts the normal
+ * locking order. Use irqsave/irqrestore because not all
+ * filesystems (e.g. fuse) call this function with IRQs disabled,
+ * yet IRQs have to be disabled before ctx_lock is obtained.
+ */
+ if (mask && !req->work_scheduled &&
+ spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
struct kioctx *ctx = iocb->ki_ctx;

- /*
- * Try to complete the iocb inline if we can. Use
- * irqsave/irqrestore because not all filesystems (e.g. fuse)
- * call this function with IRQs disabled and because IRQs
- * have to be disabled before ctx_lock is obtained.
- */
+ list_del_init(&req->wait.entry);
list_del(&iocb->ki_list);
iocb->ki_res.res = mangle_poll(mask);
- req->done = true;
if (iocb->ki_eventfd && eventfd_signal_allowed()) {
iocb = NULL;
INIT_WORK(&req->work, aio_poll_put_work);
@@ -1703,7 +1722,20 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
if (iocb)
iocb_put(iocb);
} else {
- schedule_work(&req->work);
+ /*
+ * Schedule the completion work if needed. If it was already
+ * scheduled, record that another wakeup came in.
+ *
+ * Don't remove the request from the waitqueue here, as it might
+ * not actually be complete yet (we won't know until vfs_poll()
+ * is called), and we must not miss any wakeups.
+ */
+ if (req->work_scheduled) {
+ req->work_need_resched = true;
+ } else {
+ schedule_work(&req->work);
+ req->work_scheduled = true;
+ }
}
return 1;
}
@@ -1750,8 +1782,9 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP;

req->head = NULL;
- req->done = false;
req->cancelled = false;
+ req->work_scheduled = false;
+ req->work_need_resched = false;

apt.pt._qproc = aio_poll_queue_proc;
apt.pt._key = req->events;
@@ -1766,17 +1799,27 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
spin_lock_irq(&ctx->ctx_lock);
if (likely(req->head)) {
spin_lock(&req->head->lock);
- if (unlikely(list_empty(&req->wait.entry))) {
- if (apt.error)
+ if (list_empty(&req->wait.entry) || req->work_scheduled) {
+ /*
+ * aio_poll_wake() already either scheduled the async
+ * completion work, or completed the request inline.
+ */
+ if (apt.error) /* unsupported case: multiple queues */
cancel = true;
apt.error = 0;
mask = 0;
}
if (mask || apt.error) {
+ /* Steal to complete synchronously. */
list_del_init(&req->wait.entry);
} else if (cancel) {
+ /* Cancel if possible (may be too late though). */
WRITE_ONCE(req->cancelled, true);
- } else if (!req->done) { /* actually waiting for an event */
+ } else if (!list_empty(&req->wait.entry)) {
+ /*
+ * Actually waiting for an event, so add the request to
+ * active_reqs so that it can be cancelled if needed.
+ */
list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
aiocb->ki_cancel = aio_poll_cancel;
}
--
2.34.1


2021-12-09 01:06:48

by Eric Biggers

Subject: [PATCH v3 5/5] aio: fix use-after-free due to missing POLLFREE handling

From: Eric Biggers <[email protected]>

signalfd_poll() and binder_poll() are special in that they use a
waitqueue whose lifetime is the current task, rather than the struct
file as is normally the case. This is okay for blocking polls, since a
blocking poll occurs within one task; however, non-blocking polls
require another solution. This solution is for the queue to be cleared
before it is freed, by sending a POLLFREE notification to all waiters.

Unfortunately, only eventpoll handles POLLFREE. A second type of
non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't
handle POLLFREE. This allows a use-after-free to occur if a signalfd or
binder fd is polled with aio poll, and the waitqueue gets freed.

Fix this by making aio poll handle POLLFREE.

A patch by Ramji Jiyani <[email protected]>
(https://lore.kernel.org/r/[email protected])
tried to do this by making aio_poll_wake() always complete the request
inline if POLLFREE is seen. However, that solution had two bugs.
First, it introduced a deadlock, as it unconditionally locked the aio
context while holding the waitqueue lock, which inverts the normal
locking order. Second, it didn't consider that POLLFREE notifications
are missed while the request has been temporarily de-queued.

The second problem was solved by my previous patch. This patch then
properly fixes the use-after-free by handling POLLFREE in a
deadlock-free way. It does this by taking advantage of the fact that
freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.

Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL")
Cc: <[email protected]> # v4.18+
Signed-off-by: Eric Biggers <[email protected]>
---
 fs/aio.c                        | 137 ++++++++++++++++++++++++--------
 include/uapi/asm-generic/poll.h |   2 +-
2 files changed, 107 insertions(+), 32 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 2bc1352a83d8b..c9bb0d3d85932 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1620,6 +1620,51 @@ static void aio_poll_put_work(struct work_struct *work)
iocb_put(iocb);
}

+/*
+ * Safely lock the waitqueue which the request is on, synchronizing with the
+ * case where the ->poll() provider decides to free its waitqueue early.
+ *
+ * Returns true on success, meaning that req->head->lock was locked, req->wait
+ * is on req->head, and an RCU read lock was taken. Returns false if the
+ * request was already removed from its waitqueue (which might no longer exist).
+ */
+static bool poll_iocb_lock_wq(struct poll_iocb *req)
+{
+ wait_queue_head_t *head;
+
+ /*
+ * While we hold the waitqueue lock and the waitqueue is nonempty,
+ * wake_up_pollfree() will wait for us. However, taking the waitqueue
+ * lock in the first place can race with the waitqueue being freed.
+ *
+ * We solve this as eventpoll does: by taking advantage of the fact that
+ * all users of wake_up_pollfree() will RCU-delay the actual free. If
+ * we enter rcu_read_lock() and see that the pointer to the queue is
+ * non-NULL, we can then lock it without the memory being freed out from
+ * under us, then check whether the request is still on the queue.
+ *
+ * Keep holding rcu_read_lock() as long as we hold the queue lock, in
+ * case the caller deletes the entry from the queue, leaving it empty.
+ * In that case, only RCU prevents the queue memory from being freed.
+ */
+ rcu_read_lock();
+ head = smp_load_acquire(&req->head);
+ if (head) {
+ spin_lock(&head->lock);
+ if (!list_empty(&req->wait.entry))
+ return true;
+ spin_unlock(&head->lock);
+ }
+ rcu_read_unlock();
+ return false;
+}
+
+static void poll_iocb_unlock_wq(struct poll_iocb *req)
+{
+ spin_unlock(&req->head->lock);
+ rcu_read_unlock();
+}
+
static void aio_poll_complete_work(struct work_struct *work)
{
struct poll_iocb *req = container_of(work, struct poll_iocb, work);
@@ -1639,24 +1684,25 @@ static void aio_poll_complete_work(struct work_struct *work)
* avoid further branches in the fast path.
*/
spin_lock_irq(&ctx->ctx_lock);
- spin_lock(&req->head->lock);
- if (!mask && !READ_ONCE(req->cancelled)) {
- /*
- * The request isn't actually ready to be completed yet.
- * Reschedule completion if another wakeup came in.
- */
- if (req->work_need_resched) {
- schedule_work(&req->work);
- req->work_need_resched = false;
- } else {
- req->work_scheduled = false;
+ if (poll_iocb_lock_wq(req)) {
+ if (!mask && !READ_ONCE(req->cancelled)) {
+ /*
+ * The request isn't actually ready to be completed yet.
+ * Reschedule completion if another wakeup came in.
+ */
+ if (req->work_need_resched) {
+ schedule_work(&req->work);
+ req->work_need_resched = false;
+ } else {
+ req->work_scheduled = false;
+ }
+ poll_iocb_unlock_wq(req);
+ spin_unlock_irq(&ctx->ctx_lock);
+ return;
}
- spin_unlock(&req->head->lock);
- spin_unlock_irq(&ctx->ctx_lock);
- return;
- }
- list_del_init(&req->wait.entry);
- spin_unlock(&req->head->lock);
+ list_del_init(&req->wait.entry);
+ poll_iocb_unlock_wq(req);
+ } /* else, POLLFREE has freed the waitqueue, so we must complete */
list_del_init(&iocb->ki_list);
iocb->ki_res.res = mangle_poll(mask);
spin_unlock_irq(&ctx->ctx_lock);
@@ -1670,13 +1716,14 @@ static int aio_poll_cancel(struct kiocb *iocb)
struct aio_kiocb *aiocb = container_of(iocb, struct aio_kiocb, rw);
struct poll_iocb *req = &aiocb->poll;

- spin_lock(&req->head->lock);
- WRITE_ONCE(req->cancelled, true);
- if (!req->work_scheduled) {
- schedule_work(&aiocb->poll.work);
- req->work_scheduled = true;
- }
- spin_unlock(&req->head->lock);
+ if (poll_iocb_lock_wq(req)) {
+ WRITE_ONCE(req->cancelled, true);
+ if (!req->work_scheduled) {
+ schedule_work(&aiocb->poll.work);
+ req->work_scheduled = true;
+ }
+ poll_iocb_unlock_wq(req);
+ } /* else, the request was force-cancelled by POLLFREE already */

return 0;
}
@@ -1728,7 +1775,8 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
*
* Don't remove the request from the waitqueue here, as it might
* not actually be complete yet (we won't know until vfs_poll()
- * is called), and we must not miss any wakeups.
+ * is called), and we must not miss any wakeups. POLLFREE is an
+ * exception to this; see below.
*/
if (req->work_scheduled) {
req->work_need_resched = true;
@@ -1736,6 +1784,28 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
schedule_work(&req->work);
req->work_scheduled = true;
}
+
+ /*
+ * If the waitqueue is being freed early but we can't complete
+ * the request inline, we have to tear down the request as best
+ * we can. That means immediately removing the request from its
+ * waitqueue and preventing all further accesses to the
+ * waitqueue via the request. We also need to schedule the
+ * completion work (done above). Also mark the request as
+ * cancelled, to potentially skip an unneeded call to ->poll().
+ */
+ if (mask & POLLFREE) {
+ WRITE_ONCE(req->cancelled, true);
+ list_del_init(&req->wait.entry);
+
+ /*
+ * Careful: this *must* be the last step, since as soon
+ * as req->head is NULL'ed out, the request can be
+ * completed and freed, since aio_poll_complete_work()
+ * will no longer need to take the waitqueue lock.
+ */
+ smp_store_release(&req->head, NULL);
+ }
}
return 1;
}
@@ -1743,6 +1813,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
struct aio_poll_table {
struct poll_table_struct pt;
struct aio_kiocb *iocb;
+ bool queued;
int error;
};

@@ -1753,11 +1824,12 @@ aio_poll_queue_proc(struct file *file, struct wait_queue_head *head,
struct aio_poll_table *pt = container_of(p, struct aio_poll_table, pt);

/* multiple wait queues per file are not supported */
- if (unlikely(pt->iocb->poll.head)) {
+ if (unlikely(pt->queued)) {
pt->error = -EINVAL;
return;
}

+ pt->queued = true;
pt->error = 0;
pt->iocb->poll.head = head;
add_wait_queue(head, &pt->iocb->poll.wait);
@@ -1789,6 +1861,7 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
apt.pt._qproc = aio_poll_queue_proc;
apt.pt._key = req->events;
apt.iocb = aiocb;
+ apt.queued = false;
apt.error = -EINVAL; /* same as no support for IOCB_CMD_POLL */

/* initialized the list so that we can do list_empty checks */
@@ -1797,9 +1870,10 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)

mask = vfs_poll(req->file, &apt.pt) & req->events;
spin_lock_irq(&ctx->ctx_lock);
- if (likely(req->head)) {
- spin_lock(&req->head->lock);
- if (list_empty(&req->wait.entry) || req->work_scheduled) {
+ if (likely(apt.queued)) {
+ bool on_queue = poll_iocb_lock_wq(req);
+
+ if (!on_queue || req->work_scheduled) {
/*
* aio_poll_wake() already either scheduled the async
* completion work, or completed the request inline.
@@ -1815,7 +1889,7 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
} else if (cancel) {
/* Cancel if possible (may be too late though). */
WRITE_ONCE(req->cancelled, true);
- } else if (!list_empty(&req->wait.entry)) {
+ } else if (on_queue) {
/*
* Actually waiting for an event, so add the request to
* active_reqs so that it can be cancelled if needed.
@@ -1823,7 +1897,8 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
aiocb->ki_cancel = aio_poll_cancel;
}
- spin_unlock(&req->head->lock);
+ if (on_queue)
+ poll_iocb_unlock_wq(req);
}
if (mask) { /* no async, we'd stolen it */
aiocb->ki_res.res = mangle_poll(mask);
diff --git a/include/uapi/asm-generic/poll.h b/include/uapi/asm-generic/poll.h
index 41b509f410bf9..f9c520ce4bf4e 100644
--- a/include/uapi/asm-generic/poll.h
+++ b/include/uapi/asm-generic/poll.h
@@ -29,7 +29,7 @@
#define POLLRDHUP 0x2000
#endif

-#define POLLFREE (__force __poll_t)0x4000 /* currently only for epoll */
+#define POLLFREE (__force __poll_t)0x4000

#define POLL_BUSY_LOOP (__force __poll_t)0x8000

--
2.34.1


2021-12-09 18:03:15

by Linus Torvalds

Subject: Re: [PATCH v3 0/5] aio: fix use-after-free and missing wakeups

On Wed, Dec 8, 2021 at 5:06 PM Eric Biggers <[email protected]> wrote:
>
> Careful review is appreciated; the aio poll code is very hard to work
> with, and it doesn't appear to have many tests. I've verified that it
> passes the libaio test suite, which provides some coverage of poll.
>
> Note, it looks like io_uring has the same bugs as aio poll. I haven't
> tried to fix io_uring.

I'm hoping Jens is looking at the io_uring case, but I'm also assuming
that I'll just get a pull request for this at some point.

It looks sane to me - my only internal cursing has been about epoll
and aio in general, not about these patches in particular.

Linus

2021-12-09 18:38:03

by Eric Biggers

Subject: Re: [PATCH v3 0/5] aio: fix use-after-free and missing wakeups

On Thu, Dec 09, 2021 at 10:00:50AM -0800, Linus Torvalds wrote:
> On Wed, Dec 8, 2021 at 5:06 PM Eric Biggers <[email protected]> wrote:
> >
> > Careful review is appreciated; the aio poll code is very hard to work
> > with, and it doesn't appear to have many tests. I've verified that it
> > passes the libaio test suite, which provides some coverage of poll.
> >
> > Note, it looks like io_uring has the same bugs as aio poll. I haven't
> > tried to fix io_uring.
>
> I'm hoping Jens is looking at the io_uring case, but I'm also assuming
> that I'll just get a pull request for this at some point.
>
> It looks sane to me - my only internal cursing has been about epoll
> and aio in general, not about these patches in particular.

I was hoping that Al would review and apply these, given that he's listed as the
maintainer for this file, and he's worked on this code before. I was also
hoping for review from Christoph, since he added IOCB_CMD_POLL originally. But
yes, if I don't hear anything I'll just send you a pull request. I might
include https://lore.kernel.org/r/[email protected]
too, since it's an obviously correct aio fix which has been ignored for months.

- Eric

2021-12-09 21:46:52

by Jens Axboe

Subject: Re: [PATCH v3 0/5] aio: fix use-after-free and missing wakeups

On 12/9/21 11:00 AM, Linus Torvalds wrote:
> On Wed, Dec 8, 2021 at 5:06 PM Eric Biggers <[email protected]> wrote:
>>
>> Careful review is appreciated; the aio poll code is very hard to work
>> with, and it doesn't appear to have many tests. I've verified that it
>> passes the libaio test suite, which provides some coverage of poll.
>>
>> Note, it looks like io_uring has the same bugs as aio poll. I haven't
>> tried to fix io_uring.
>
> I'm hoping Jens is looking at the io_uring case, but I'm also assuming
> that I'll just get a pull request for this at some point.

Yes, when I saw this original posting I did discuss it with Pavel as
well, and we agree that the same issue exists there. Which isn't too
surprising, as that's where the io_uring poll code came from originally.

Eric, do you have a test case for this? aio is fine, we can convert it
to io_uring as well. Would be nice both for verifying the fix, and also
to carry in the io_uring regression tests for the future.

--
Jens Axboe


2021-12-10 05:10:17

by Eric Biggers

Subject: Re: [PATCH v3 0/5] aio: fix use-after-free and missing wakeups

On Thu, Dec 09, 2021 at 02:46:45PM -0700, Jens Axboe wrote:
> On 12/9/21 11:00 AM, Linus Torvalds wrote:
> > On Wed, Dec 8, 2021 at 5:06 PM Eric Biggers <[email protected]> wrote:
> >>
> >> Careful review is appreciated; the aio poll code is very hard to work
> >> with, and it doesn't appear to have many tests. I've verified that it
> >> passes the libaio test suite, which provides some coverage of poll.
> >>
> >> Note, it looks like io_uring has the same bugs as aio poll. I haven't
> >> tried to fix io_uring.
> >
> > I'm hoping Jens is looking at the io_uring case, but I'm also assuming
> > that I'll just get a pull request for this at some point.
>
> Yes, when I saw this original posting I did discuss it with Pavel as
> well, and we agree that the same issue exists there. Which isn't too
> surprising, as that's where the io_uring poll code came from originally.
>
> Eric, do you have a test case for this? aio is fine, we can convert it
> to io_uring as well. Would be nice both for verifying the fix, and also
> to carry in the io_uring regression tests for the future.

Well, the use-after-free bug is pretty hard to test for. It only affects
polling a binder fd or signalfd, so one of those has to be used. Also, I
haven't found a way to detect it other than the use-after-free itself, so
effectively a kernel with KASAN enabled is needed. But KASAN doesn't work with
signalfd because the signalfd waitqueues are in an SLAB_TYPESAFE_BY_RCU slab, so
binder is the only way to detect it without working around SLAB_TYPESAFE_BY_RCU,
or patching the kernel to add log messages. Also, aio supports inline
completion which avoids the bug, so that needs to be worked around.

So the best I can do is provide a program that's pretty specific to aio, which
causes KASAN to report a use-after-free if the kernel has CONFIG_KASAN and
CONFIG_ANDROID_BINDER_IPC enabled. Note, "normal" Linux distros don't have
either option enabled. I'm not sure that would be useful for you.
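
To give an idea of the shape such a program would take, here's a rough
sketch (condensed and untested in this exact form, so treat it as
illustrative only): on an unfixed kernel, submit an aio poll on a binder
fd, free the binder thread's waitqueue out from under it with
BINDER_THREAD_EXIT, then force aio to touch the freed waitqueue via
cancellation:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <linux/aio_abi.h>
	#include <linux/android/binder.h>
	#include <poll.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		aio_context_t ctx = 0;
		struct iocb iocb = {};
		struct iocb *iocbs[] = { &iocb };
		int fd = open("/dev/binder", O_RDONLY);

		if (fd < 0 || syscall(__NR_io_setup, 1, &ctx) != 0)
			return 1;
		iocb.aio_fildes = fd;
		iocb.aio_lio_opcode = IOCB_CMD_POLL;
		iocb.aio_buf = POLLIN;
		/* Puts the request on the binder thread's waitqueue. */
		if (syscall(__NR_io_submit, ctx, 1, iocbs) != 1)
			return 1;
		/* Frees the binder thread, and with it the waitqueue the
		 * request is still linked on. */
		ioctl(fd, BINDER_THREAD_EXIT, 0);
		/* io_destroy() cancels the request; aio_poll_cancel() then
		 * takes req->head->lock, which is now freed memory, and
		 * KASAN reports the use-after-free. */
		syscall(__NR_io_destroy, ctx);
		return 0;
	}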

If you're also asking about the other bug (missed wakeups), i.e. the one that
patch 4 in this series fixes, in theory that would be detectable without those
dependencies. It's still a race condition that depends on kernel implementation
details, so it will be hard to test for too. But I might have a go at writing a
test for it anyway.

- Eric

2021-12-10 08:07:23

by Eric Biggers

Subject: Re: [PATCH v3 0/5] aio: fix use-after-free and missing wakeups

On Thu, Dec 09, 2021 at 09:10:09PM -0800, Eric Biggers wrote:
> On Thu, Dec 09, 2021 at 02:46:45PM -0700, Jens Axboe wrote:
> > On 12/9/21 11:00 AM, Linus Torvalds wrote:
> > > On Wed, Dec 8, 2021 at 5:06 PM Eric Biggers <[email protected]> wrote:
> > >>
> > >> Careful review is appreciated; the aio poll code is very hard to work
> > >> with, and it doesn't appear to have many tests. I've verified that it
> > >> passes the libaio test suite, which provides some coverage of poll.
> > >>
> > >> Note, it looks like io_uring has the same bugs as aio poll. I haven't
> > >> tried to fix io_uring.
> > >
> > > I'm hoping Jens is looking at the io_uring case, but I'm also assuming
> > > that I'll just get a pull request for this at some point.
> >
> > Yes, when I saw this original posting I did discuss it with Pavel as
> > well, and we agree that the same issue exists there. Which isn't too
> > surprising, as that's where the io_uring poll code came from originally.
> >
> > Eric, do you have a test case for this? aio is fine, we can convert it
> > to io_uring as well. Would be nice both for verifying the fix, and also
> > to carry in the io_uring regression tests for the future.
>
> Well, the use-after-free bug is pretty hard to test for. It only affects
> polling a binder fd or signalfd, so one of those has to be used. Also, I
> haven't found a way to detect it other than the use-after-free itself, so
> effectively a kernel with KASAN enabled is needed. But KASAN doesn't work with
> signalfd because the signalfd waitqueues are in an SLAB_TYPESAFE_BY_RCU slab, so
> binder is the only way to detect it without working around SLAB_TYPESAFE_BY_RCU,
> or patching the kernel to add log messages. Also, aio supports inline
> completion which avoids the bug, so that needs to be worked around.
>
> So the best I can do is provide a program that's pretty specific to aio, which
> causes KASAN to report a use-after-free if the kernel has CONFIG_KASAN and
> CONFIG_ANDROID_BINDER_IPC enabled. Note, "normal" Linux distros don't have
> either option enabled. I'm not sure that would be useful for you.
>
> If you're also asking about the other bug (missed wakeups), i.e. the one that
> patch 4 in this series fixes, in theory that would be detectable without those
> dependencies. It's still a race condition that depends on kernel implementation
> details, so it will be hard to test for too. But I might have a go at writing a
> test for it anyway.
>

Here's a regression test for the missed wakeups bug (patch 4). It needs some
polishing, but it should be suitable for ltp or the libaio test suite. (I'm not
sure which one of those to choose, as both have aio tests; if anyone has a
preference, please let me know.)

To be clear, this does *not* test for the use-after-free bug (patch 5). As I
mentioned, as far as I know, a test for that would depend on both KASAN and
binder, making it harder to run.


// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Copyright 2021 Google LLC
 */
/*
 * Regression test for "aio: keep poll requests on waitqueue until completed".
 * This test verifies that aio poll (IOCB_CMD_POLL) doesn't miss any events.
 *
 * This test repeatedly does the following operations in parallel:
 *
 *    Thread 1: issue an aio poll for the read end of a pipe, and wait for it
 *              to complete
 *    Thread 2: splice some data to the pipe
 *    Thread 3: read from the pipe, then splice some more data to it
 *
 * The pipe will always end up with data, so the poll should always complete.
 *
 * With the bug, that didn't always happen, as the second splice sometimes
 * didn't cause a wake-up due to the waitqueue entry being temporarily removed,
 * as per the following buggy sequence of operations:
 *
 *    - Thread 1: add the poll waitqueue entry
 *    - Thread 2: first splice.  This calls the poll wakeup function,
 *                which deletes the waitqueue entry [BUG!], then
 *                schedules async completion work.
 *    - Thread 3: read from the pipe
 *    - kworker:  see that the pipe isn't ready
 *    - Thread 3: splice some data to the pipe; waitqueue is empty.
 *    - kworker:  re-add the waitqueue entry
 *
 * The reason we use splice() rather than write() is because we need something
 * that doesn't pass an event mask when waking up pollers.  splice() to a pipe
 * happens to meet that criteria, while write() to a pipe doesn't.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static char tmpfile_name[] = "/tmp/aio.XXXXXX";
static int tmpfd;
static int pipefds[2];

static void die(const char *msg)
{
	perror(msg);
	exit(1);
}

static void *thrproc1(void *arg)
{
	off_t offset = 0;

	usleep(rand() % 50);
	if (splice(tmpfd, &offset, pipefds[1], NULL, 1, 0) != 1)
		die("splice");
	return NULL;
}

static void *thrproc2(void *arg)
{
	char buf[4096];
	off_t offset = 0;

	usleep(rand() % 100);
	if (read(pipefds[0], buf, sizeof(buf)) < 0)
		die("read");
	if (splice(tmpfd, &offset, pipefds[1], NULL, 1, 0) != 1)
		die("splice");
	return NULL;
}

static int run_test(void)
{
	aio_context_t aio_ctx = 0;
	struct iocb iocb = {};
	struct iocb *iocbs[] = { &iocb };
	struct io_event event;
	struct timespec timeout = { .tv_sec = 5 };
	pthread_t t1, t2;
	int i;
	int ret;

	if (syscall(__NR_io_setup, 1, &aio_ctx) != 0) {
		printf("aio not supported, can't run test.\n");
		return 1;
	}

	for (i = 0; i < 25000; i++) {
		if (pipe(pipefds) != 0)
			die("pipe");
		iocb.aio_fildes = pipefds[0];
		iocb.aio_lio_opcode = IOCB_CMD_POLL;
		iocb.aio_buf = POLLIN;

		if (syscall(__NR_io_submit, aio_ctx, 1, iocbs) != 1)
			die("io_submit");

		pthread_create(&t1, NULL, thrproc1, NULL);
		pthread_create(&t2, NULL, thrproc2, NULL);

		ret = syscall(__NR_io_getevents, aio_ctx, 1, 1, &event,
			      &timeout);
		if (ret < 0)
			die("io_getevents");
		if (ret == 0) {
			printf("FAIL: poll missed an event; i=%d\n", i);
			return 1;
		}
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		close(pipefds[0]);
		close(pipefds[1]);
	}
	return 0;
}

int main(void)
{
	int ncpus;
	int status;
	int overall_status = 0;
	int i;

	tmpfd = mkstemp(tmpfile_name);
	if (tmpfd < 0)
		die("mkstemp");
	if (pwrite(tmpfd, "X", 1, 0) != 1)
		die("pwrite");

	ncpus = sysconf(_SC_NPROCESSORS_CONF);
	if (ncpus < 1)
		ncpus = 1;
	for (i = 0; i < ncpus; i++) {
		if (fork() == 0)
			return run_test();
	}
	for (i = 0; i < ncpus; i++) {
		status = -1;
		wait(&status);
		overall_status |= status;
	}
	unlink(tmpfile_name);
	return overall_status;
}
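
For reference, I build and run it with something like the following
(assuming gcc and kernel headers that provide <linux/aio_abi.h>; the
file name is arbitrary):

	gcc -O2 -Wall -o aio-poll-test aio-poll-test.c -lpthread
	./aio-poll-test

On a kernel without patch 4, one of the forked children should print the
FAIL message fairly quickly; with the patch applied, the test should run
to completion and exit 0.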


2021-12-13 07:23:46

by Christoph Hellwig

Subject: Re: [PATCH v3 0/5] aio: fix use-after-free and missing wakeups

On Thu, Dec 09, 2021 at 10:37:28AM -0800, Eric Biggers wrote:
> I was hoping that Al would review and apply these, given that he's listed as the
> maintainer for this file, and he's worked on this code before. I was also
> hoping for review from Christoph, since he added IOCB_CMD_POLL originally. But

I was planning to get to it, but it seems like it got merged over this
weekend?

2021-12-13 17:25:00

by Eric Biggers

Subject: Re: [PATCH v3 0/5] aio: fix use-after-free and missing wakeups

On Mon, Dec 13, 2021 at 08:23:39AM +0100, Christoph Hellwig wrote:
> On Thu, Dec 09, 2021 at 10:37:28AM -0800, Eric Biggers wrote:
> > I was hoping that Al would review and apply these, given that he's listed as the
> > maintainer for this file, and he's worked on this code before. I was also
> > hoping for review from Christoph, since he added IOCB_CMD_POLL originally. But
>
> I was planning to get to it, but it seems like it got merged over this
> weekend?

There weren't any indications that anyone else was going to review it, and it
wasn't appropriate to wait any longer. If you'd still like to review it, please
do so; if you find any problem I'll fix it in a follow-on fix.

- Eric

2022-01-05 15:26:48

by Eric Biggers

Subject: Re: [PATCH v3 0/5] aio: fix use-after-free and missing wakeups

On Thu, Dec 09, 2021 at 02:46:45PM -0700, Jens Axboe wrote:
> On 12/9/21 11:00 AM, Linus Torvalds wrote:
> > On Wed, Dec 8, 2021 at 5:06 PM Eric Biggers <[email protected]> wrote:
> >>
> >> Careful review is appreciated; the aio poll code is very hard to work
> >> with, and it doesn't appear to have many tests. I've verified that it
> >> passes the libaio test suite, which provides some coverage of poll.
> >>
> >> Note, it looks like io_uring has the same bugs as aio poll. I haven't
> >> tried to fix io_uring.
> >
> > I'm hoping Jens is looking at the io_uring case, but I'm also assuming
> > that I'll just get a pull request for this at some point.
>
> Yes, when I saw this original posting I did discuss it with Pavel as
> well, and we agree that the same issue exists there. Which isn't too
> surprising, as that's where the io_uring poll code came from originally.
>

Jens, any update on fixing the io_uring version of the bug? Note, syzbot has
managed to use io_uring poll to hit the WARN_ON_ONCE() that I added in
__wake_up_pollfree(), which proves that it is broken.
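
(For reference, the check in question, as added by patch 1 of this
series; quoted from memory, so treat it as a sketch rather than the
exact upstream code:

	void __wake_up_pollfree(struct wait_queue_head *wq_head)
	{
		__wake_up(wq_head, TASK_NORMAL, 0,
			  poll_to_key(EPOLLHUP | POLLFREE));
		/* POLLFREE must have cleared the queue. */
		WARN_ON_ONCE(waitqueue_active(wq_head));
	}

Hitting that WARN_ON_ONCE means some waiter ignored POLLFREE and was
still on the queue as it was being freed.)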

- Eric