Hi all,
this series is supposed to fix some lifetime issues all related to the fact that
once the last ->serial() has been invoked, the padata user (i.e. pcrypt) is well
within its rights to tear down the associated padata_shell or parallel_data
instance respectively.
Only the first one, addressed by patch [2/5], has actually been observed, namely
on a (downstream) RT kernel under a very specific workload involving LTP's
pcrypt_aead01. On non-RT, I've been unable to reproduce.
The remainder of this series, 3-5/5, fixes two more, somewhat related, but
purely theoretical issues I spotted when scratching my head about possible
reasons for the original Oops.
Thanks!
Nicolai
Nicolai Stange (5):
padata: introduce internal padata_get/put_pd() helpers
padata: make padata_free_shell() respect pd's ->refcnt
padata: grab parallel_data refcnt for reorder
padata: split out dequeue operation from padata_find_next()
padata: avoid potential UAFs to the padata_shell from padata_reorder()
kernel/padata.c | 129 +++++++++++++++++++++++++++++++++++-------------
1 file changed, 96 insertions(+), 33 deletions(-)
--
2.37.3
The next commit in this series will add yet another code site decrementing
a struct parallel_data's refcount and invoking deallocation as appropriate.
With that, it's about time to provide proper helper functions for managing
parallel_datas' refcounts and to convert the existing open-coded
refcount manipulation sites.
Implement padata_put_pd() as well as a padata_put_pd_many() for batched
releases as needed in padata_serial_worker(). For symmetry reasons, also
provide a padata_get_pd(), even though its implementation is fairly
trivial.
Convert the existing open-coded parallel_data ->refcnt manipulation code
sites to these new helpers.
Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/padata.c | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/kernel/padata.c b/kernel/padata.c
index e5819bb8bd1d..3bd1e23f089b 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -45,6 +45,10 @@ struct padata_mt_job_state {
};
static void padata_free_pd(struct parallel_data *pd);
+static inline void padata_get_pd(struct parallel_data *pd);
+static void padata_put_pd_many(struct parallel_data *pd, int cnt);
+static inline void padata_put_pd(struct parallel_data *pd);
+
static void __init padata_mt_helper(struct work_struct *work);
static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index)
@@ -198,7 +202,7 @@ int padata_do_parallel(struct padata_shell *ps,
if ((pinst->flags & PADATA_RESET))
goto out;
- refcount_inc(&pd->refcnt);
+ padata_get_pd(pd);
padata->pd = pd;
padata->cb_cpu = *cb_cpu;
@@ -370,8 +374,7 @@ static void padata_serial_worker(struct work_struct *serial_work)
}
local_bh_enable();
- if (refcount_sub_and_test(cnt, &pd->refcnt))
- padata_free_pd(pd);
+ padata_put_pd_many(pd, cnt);
}
/**
@@ -608,6 +611,22 @@ static void padata_free_pd(struct parallel_data *pd)
kfree(pd);
}
+static inline void padata_get_pd(struct parallel_data *pd)
+{
+ refcount_inc(&pd->refcnt);
+}
+
+static void padata_put_pd_many(struct parallel_data *pd, int cnt)
+{
+ if (refcount_sub_and_test(cnt, &pd->refcnt))
+ padata_free_pd(pd);
+}
+
+static inline void padata_put_pd(struct parallel_data *pd)
+{
+ padata_put_pd_many(pd, 1);
+}
+
static void __padata_start(struct padata_instance *pinst)
{
pinst->flags |= PADATA_INIT;
@@ -654,8 +673,7 @@ static int padata_replace(struct padata_instance *pinst)
synchronize_rcu();
list_for_each_entry_continue_reverse(ps, &pinst->pslist, list)
- if (refcount_dec_and_test(&ps->opd->refcnt))
- padata_free_pd(ps->opd);
+ padata_put_pd(ps->opd);
pinst->flags &= ~PADATA_RESET;
--
2.37.3
On entry of padata_do_serial(), the in-flight padata_priv owns a reference
to the associated parallel_data instance.
However, as soon as the padata_priv got enqueued on the reorder list, it
can be completed from a different context, causing the reference to get
released in the course.
This would potentially cause UAFs from the subsequent padata_reorder()
operations invoked from the enqueueing padata_do_serial() or from the
reorder work.
Note that this is a purely theoretical concern, the problem has never been
actually observed -- it would require multiple pcrypt request submissions
racing against each other, ultimately a pcrypt instance destruction
(DELALG) shortly after request completions as well as unfortunate timing.
However, for the sake of correctness, it is still worth fixing.
Make padata_do_serial() grab a reference count on the parallel_data for
the subsequent reorder operation(s). As long as the padata_priv has not
been enqueued, this is safe, because as mentioned above, in-flight
padata_privs own a reference already.
Note that padata_reorder() might schedule another padata_reorder() work
and thus, care must be taken to not prematurely release that "reorder
refcount" from padata_do_serial() again in case that has happened.
Make padata_reorder() return a bool for indicating whether or not a
reorder work has been scheduled. Let padata_do_serial() drop its refcount
only if this is not the case. Accordingly, make the reorder work handler,
invoke_padata_reorder(), drop it then as appropriate.
A remark on the commit chosen for the Fixes tag reference below: before
commit bbefa1dd6a6d ("crypto: pcrypt - Avoid deadlock by using per-instance
padata queues"), the padata_parallel lifetime had been tied to the
padata_instance. The padata_free() resp. padata_stop() issued a
synchronize_rcu() before padata_free_pd() from the instance destruction
path, rendering UAFs from the padata_do_serial()=>padata_reorder()
invocations with BHs disabled impossible AFAICS. With that, the
padata_reorder() work remains to be considered. Before
commit b128a3040935 ("padata: allocate workqueue internally"), the
workqueue got destroyed (from pcrypt), hence drained, before the padata
instance destruction, but this change moved that to after the
padata_free_pd() invocation from __padata_free(). So, while the Fixes
reference from below is most likely technically correct, I would still like
to reiterate that this problem is probably hard to trigger in practice,
even more so before commit bbefa1dd6a6d ("crypto: pcrypt - Avoid deadlock
by using per-instance padata queues").
Fixes: b128a3040935 ("padata: allocate workqueue internally")
Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/padata.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/kernel/padata.c b/kernel/padata.c
index 0bf8c80dad5a..b79226727ef7 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -275,7 +275,7 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd,
return padata;
}
-static void padata_reorder(struct parallel_data *pd)
+static bool padata_reorder(struct parallel_data *pd)
{
struct padata_instance *pinst = pd->ps->pinst;
int cb_cpu;
@@ -294,7 +294,7 @@ static void padata_reorder(struct parallel_data *pd)
* care for all the objects enqueued during the holdtime of the lock.
*/
if (!spin_trylock_bh(&pd->lock))
- return;
+ return false;
while (1) {
padata = padata_find_next(pd, true);
@@ -331,17 +331,23 @@ static void padata_reorder(struct parallel_data *pd)
reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
if (!list_empty(&reorder->list) && padata_find_next(pd, false))
- queue_work(pinst->serial_wq, &pd->reorder_work);
+ return queue_work(pinst->serial_wq, &pd->reorder_work);
+
+ return false;
}
static void invoke_padata_reorder(struct work_struct *work)
{
struct parallel_data *pd;
+ bool keep_refcnt;
local_bh_disable();
pd = container_of(work, struct parallel_data, reorder_work);
- padata_reorder(pd);
+ keep_refcnt = padata_reorder(pd);
local_bh_enable();
+
+ if (!keep_refcnt)
+ padata_put_pd(pd);
}
static void padata_serial_worker(struct work_struct *serial_work)
@@ -392,6 +398,15 @@ void padata_do_serial(struct padata_priv *padata)
struct padata_list *reorder = per_cpu_ptr(pd->reorder_list, hashed_cpu);
struct padata_priv *cur;
+ /*
+ * The in-flight padata owns a reference on pd. However, as
+ * soon as it's been enqueued on the reorder list, another
+ * task can dequeue and complete it, thereby dropping the
+ * reference. Grab another reference here, it will eventually
+ * be released from a reorder work, if any, or below.
+ */
+ padata_get_pd(pd);
+
spin_lock(&reorder->lock);
/* Sort in ascending order of sequence number. */
list_for_each_entry_reverse(cur, &reorder->list, list)
@@ -407,7 +422,8 @@ void padata_do_serial(struct padata_priv *padata)
*/
smp_mb();
- padata_reorder(pd);
+ if (!padata_reorder(pd))
+ padata_put_pd(pd);
}
EXPORT_SYMBOL(padata_do_serial);
--
2.37.3
Currently, padata_find_next() takes a 'remove_object' argument for
specifying whether the caller wants the returned padata_priv, if any, to
get removed from the percpu reorder list it's been found on.
There are only two callsites, both from padata_reorder():
- one supposed to dequeue the padata_priv instances to be processed in a
loop, i.e. it has the 'remove_object' set to true, and
- another one near the end of padata_reorder() with 'remove_object' set to
false for checking whether the reorder work needs to get rescheduled.
In order to deal with lifetime issues, a future commit will need to move
this latter reorder work scheduling operation to under the reorder->lock,
where pd->ps is guaranteed to exist as long as there are any padata_privs
to process. However, this lock is currently taken within padata_find_next().
In order to be able to extend the reorder->lock beyond the call to
padata_find_next() from padata_reorder(), provide a variant for which the
caller grabs the lock on behalf of the callee.
Split padata_find_next() into two parts:
- __padata_find_next(), which expects the caller to hold the reorder->lock
and only returns the found padata_priv, if any, without removing it
from the queue.
- padata_dequeue_next(), with functionality equivalent to the former
padata_find_next(pd, remove_object=true) and implemented by means of
the factored out __padata_find_next().
Adapt the two callsites in padata_reorder() as appropriate.
There is no change in functionality.
Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/padata.c | 57 ++++++++++++++++++++++++++++++++-----------------
1 file changed, 37 insertions(+), 20 deletions(-)
diff --git a/kernel/padata.c b/kernel/padata.c
index b79226727ef7..e9eab3e94cfc 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -230,29 +230,24 @@ int padata_do_parallel(struct padata_shell *ps,
EXPORT_SYMBOL(padata_do_parallel);
/*
- * padata_find_next - Find the next object that needs serialization.
+ * __padata_find_next - Find the next object that needs serialization.
*
* Return:
* * A pointer to the control struct of the next object that needs
- * serialization, if present in one of the percpu reorder queues.
+ * serialization, if already present on the given percpu reorder queue.
* * NULL, if the next object that needs serialization will
* be parallel processed by another cpu and is not yet present in
- * the cpu's reorder queue.
+ * the reorder queue.
*/
-static struct padata_priv *padata_find_next(struct parallel_data *pd,
- bool remove_object)
+static struct padata_priv *__padata_find_next(struct parallel_data *pd,
+ struct padata_list *reorder)
{
struct padata_priv *padata;
- struct padata_list *reorder;
- int cpu = pd->cpu;
- reorder = per_cpu_ptr(pd->reorder_list, cpu);
+ lockdep_assert_held(&reorder->lock);
- spin_lock(&reorder->lock);
- if (list_empty(&reorder->list)) {
- spin_unlock(&reorder->lock);
+ if (list_empty(&reorder->list))
return NULL;
- }
padata = list_entry(reorder->list.next, struct padata_priv, list);
@@ -260,16 +255,30 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd,
* Checks the rare case where two or more parallel jobs have hashed to
* the same CPU and one of the later ones finishes first.
*/
- if (padata->seq_nr != pd->processed) {
+ if (padata->seq_nr != pd->processed)
+ return NULL;
+
+ return padata;
+}
+
+static struct padata_priv *padata_dequeue_next(struct parallel_data *pd)
+{
+ struct padata_priv *padata;
+ struct padata_list *reorder;
+ int cpu = pd->cpu;
+
+ reorder = per_cpu_ptr(pd->reorder_list, cpu);
+ spin_lock(&reorder->lock);
+
+ padata = __padata_find_next(pd, reorder);
+ if (!padata) {
spin_unlock(&reorder->lock);
return NULL;
}
- if (remove_object) {
- list_del_init(&padata->list);
- ++pd->processed;
- pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false);
- }
+ list_del_init(&padata->list);
+ ++pd->processed;
+ pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false);
spin_unlock(&reorder->lock);
return padata;
@@ -297,7 +306,7 @@ static bool padata_reorder(struct parallel_data *pd)
return false;
while (1) {
- padata = padata_find_next(pd, true);
+ padata = padata_dequeue_next(pd);
/*
* If the next object that needs serialization is parallel
@@ -330,8 +339,16 @@ static bool padata_reorder(struct parallel_data *pd)
smp_mb();
reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
- if (!list_empty(&reorder->list) && padata_find_next(pd, false))
+ if (!list_empty(&reorder->list)) {
+ spin_lock(&reorder->lock);
+ if (!__padata_find_next(pd, reorder)) {
+ spin_unlock(&reorder->lock);
+ return false;
+ }
+ spin_unlock(&reorder->lock);
+
return queue_work(pinst->serial_wq, &pd->reorder_work);
+ }
return false;
}
--
2.37.3
Even though the parallel_data "pd" instance passed to padata_reorder() is
guaranteed to exist as per the reference held by its callers, the same is
not true for the associated padata_shell, pd->ps. More specifically, once
the last padata_priv request has been completed, either at entry from
padata_reorder() or concurrently to it, the padata API users are well
within their right to free the padata_shell instance.
Note that this is a purely theoretical issue; it has not actually been
observed. Yet it's worth fixing for the sake of robustness.
Exploit the fact that as long as there are any not yet completed
padata_priv's around on any of the percpu reorder queues, pd->ps is
guaranteed to exist. Make padata_reorder() load from pd->ps only when
it's known that there is at least one in-flight padata_priv object to
reorder. Note that this involves moving pd->ps accesses to under the
reorder->lock as appropriate, so that the found padata_priv object won't
get dequeued and completed concurrently from a different context.
Fixes: bbefa1dd6a6d ("crypto: pcrypt - Avoid deadlock by using per-instance padata queues")
Signed-off-by: Nicolai Stange <[email protected]>
---
kernel/padata.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/kernel/padata.c b/kernel/padata.c
index e9eab3e94cfc..fa4818b81eca 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -286,7 +286,6 @@ static struct padata_priv *padata_dequeue_next(struct parallel_data *pd)
static bool padata_reorder(struct parallel_data *pd)
{
- struct padata_instance *pinst = pd->ps->pinst;
int cb_cpu;
struct padata_priv *padata;
struct padata_serial_queue *squeue;
@@ -323,7 +322,11 @@ static bool padata_reorder(struct parallel_data *pd)
list_add_tail(&padata->list, &squeue->serial.list);
spin_unlock(&squeue->serial.lock);
- queue_work_on(cb_cpu, pinst->serial_wq, &squeue->work);
+ /*
+ * Note: as long as there are requests in-flight,
+ * pd->ps is guaranteed to exist.
+ */
+ queue_work_on(cb_cpu, pd->ps->pinst->serial_wq, &squeue->work);
}
spin_unlock_bh(&pd->lock);
@@ -340,14 +343,23 @@ static bool padata_reorder(struct parallel_data *pd)
reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
if (!list_empty(&reorder->list)) {
+ bool reenqueued;
+
spin_lock(&reorder->lock);
if (!__padata_find_next(pd, reorder)) {
spin_unlock(&reorder->lock);
return false;
}
+
+ /*
+ * Note: as long as there are requests in-flight,
+ * pd->ps is guaranteed to exist.
+ */
+ reenqueued = queue_work(pd->ps->pinst->serial_wq,
+ &pd->reorder_work);
spin_unlock(&reorder->lock);
- return queue_work(pinst->serial_wq, &pd->reorder_work);
+ return reenqueued;
}
return false;
--
2.37.3
Hi Nicolai,
On Wed, Oct 19, 2022 at 10:37:03AM +0200, Nicolai Stange wrote:
> Hi all,
>
> this series is supposed to fix some lifetime issues all related to the fact that
> once the last ->serial() has been invoked, the padata user (i.e. pcrypt) is well
> within its rights to tear down the associated padata_shell or parallel_data
> instance respectively.
>
> Only the first one, addressed by patch [2/5], has actually been observed, namely
> on a (downstream) RT kernel under a very specific workload involving LTP's
> pcrypt_aead01. On non-RT, I've been unable to reproduce.
I haven't been able to hit the issue in 2/5 on RT on a v6.0 kernel in an
x86 vm. Were there any other things running on the system besides
pcrypt_aead01? More details about your environment and your kernel
config would be helpful.
The first two patches seem ok but I want to think about the second more
next week. I'll look over the rest of the series then too.
Hi Daniel,
Daniel Jordan <[email protected]> writes:
> On Wed, Oct 19, 2022 at 10:37:03AM +0200, Nicolai Stange wrote:
>> this series is supposed to fix some lifetime issues all related to the fact that
>> once the last ->serial() has been invoked, the padata user (i.e. pcrypt) is well
>> within its rights to tear down the associated padata_shell or parallel_data
>> instance respectively.
>>
>> Only the first one, addressed by patch [2/5], has actually been observed, namely
>> on a (downstream) RT kernel under a very specific workload involving LTP's
>> pcrypt_aead01. On non-RT, I've been unable to reproduce.
>
> I haven't been able to hit the issue in 2/5 on RT on a v6.0 kernel in an
> x86 vm. Were there any other things running on the system besides
> pcrypt_aead01? More details about your environment and your kernel
> config would be helpful.
Right, the issue is indeed hard to reproduce, unfortunately. It has
originally been reported internally by our QA Maintenance team, which --
for unknown reasons -- suddenly started to hit the issue every once in a while
in their testing environment. I did manage to reproduce it once or twice
myself, but it took me several days running pcrypt_aead01 in a loop each
time. AFAIR, I allocated only a single CPU to the VM and increased the
priority of pcrypt_aead01 a bit, with the intent to make preemption of
the ->serial() worker by DELALG more likely. But really, I cannot tell
if that did in fact contribute to the likelihood of triggering the race
or whether I've just been lucky.
Also, as mentioned in the cover letter, the RT kernel this has been
observed on is a downstream one, based on 5.3.18 (source tree at [1],
config at [2]), but with quite some additional patches on top. A
backport of this patch series here had been subject to testing in the
same environment the issue originally showed up in on a fairly regular
basis and no new crashes have been observed since.
Let me know if I could provide you with any more details.
Thanks,
Nicolai
[1] https://github.com/SUSE/kernel/tree/SLE15-SP3-RT
[2] https://github.com/SUSE/kernel-source/blob/SLE15-SP3-RT/config/x86_64/rt
--
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nürnberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
(HRB 36809, AG Nürnberg)
On Wed, Oct 19, 2022 at 10:37:04AM +0200, Nicolai Stange wrote:
> The next commit in this series will add yet another code site decrementing
> a struct parallel_data's refcount and invoking deallocation as appropriate.
>
> With that, it's about time to provide proper helper functions for managing
> parallel_datas' refcounts and to convert the existing open-coded
> refcount manipulation sites.
>
> Implement padata_put_pd() as well as a padata_put_pd_many() for batched
> releases as needed in padata_serial_worker(). For symmetry reasons, also
> provide a padata_get_pd(), even though its implementation is fairly
> trivial.
>
> Convert the existing open-coded parallel_data ->refcnt manipulation code
> sites to these new helpers.
>
> Signed-off-by: Nicolai Stange <[email protected]>
Thanks.
Acked-by: Daniel Jordan <[email protected]>
On Wed, Oct 19, 2022 at 10:37:06AM +0200, Nicolai Stange wrote:
> On entry of padata_do_serial(), the in-flight padata_priv owns a reference
> to the associated parallel_data instance.
>
> However, as soon as the padata_priv got enqueued on the reorder list, it
> can be completed from a different context, causing the reference to get
> released in the course.
>
> This would potentially cause UAFs from the subsequent padata_reorder()
> operations invoked from the enqueueing padata_do_serial() or from the
> reorder work.
>
> Note that this is a purely theoretical concern, the problem has never been
> actually observed -- it would require multiple pcrypt request submissions
> racing against each other, ultimately a pcrypt instance destruction
> (DELALG) shortly after request completions as well as unfortunate timing.
>
> However, for the sake of correctness, it is still worth fixing.
>
> Make padata_do_serial() grab a reference count on the parallel_data for
> the subsequent reorder operation(s). As long as the padata_priv has not
> been enqueued, this is safe, because as mentioned above, in-flight
> padata_privs own a reference already.
>
> Note that padata_reorder() might schedule another padata_reorder() work
> and thus, care must be taken to not prematurely release that "reorder
> refcount" from padata_do_serial() again in case that has happened.
> Make padata_reorder() return a bool for indicating whether or not a
> reorder work has been scheduled. Let padata_do_serial() drop its refcount
> only if this is not the case. Accordingly, make the reorder work handler,
> invoke_padata_reorder(), drop it then as appropriate.
>
> A remark on the commit chosen for the Fixes tag reference below: before
> commit bbefa1dd6a6d ("crypto: pcrypt - Avoid deadlock by using per-instance
> padata queues"), the padata_parallel lifetime had been tied to the
> padata_instance. The padata_free() resp. padata_stop() issued a
> synchronize_rcu() before padata_free_pd() from the instance destruction
> path, rendering UAFs from the padata_do_serial()=>padata_reorder()
> invocations with BHs disabled impossible AFAICS. With that, the
> padata_reorder() work remains to be considered. Before
> commit b128a3040935 ("padata: allocate workqueue internally"), the
> workqueue got destroyed (from pcrypt), hence drained, before the padata
> instance destruction, but this change moved that to after the
> padata_free_pd() invocation from __padata_free(). So, while the Fixes
> reference from below is most likely technically correct, I would still like
> to reiterate that this problem is probably hard to trigger in practice,
> even more so before commit bbefa1dd6a6d ("crypto: pcrypt - Avoid deadlock
> by using per-instance padata queues").
>
> Fixes: b128a3040935 ("padata: allocate workqueue internally")
> Signed-off-by: Nicolai Stange <[email protected]>
> ---
> kernel/padata.c | 26 +++++++++++++++++++++-----
> 1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/padata.c b/kernel/padata.c
> index 0bf8c80dad5a..b79226727ef7 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -275,7 +275,7 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd,
> return padata;
> }
>
> -static void padata_reorder(struct parallel_data *pd)
> +static bool padata_reorder(struct parallel_data *pd)
> {
> struct padata_instance *pinst = pd->ps->pinst;
> int cb_cpu;
> @@ -294,7 +294,7 @@ static void padata_reorder(struct parallel_data *pd)
> * care for all the objects enqueued during the holdtime of the lock.
> */
> if (!spin_trylock_bh(&pd->lock))
> - return;
> + return false;
>
> while (1) {
> padata = padata_find_next(pd, true);
> @@ -331,17 +331,23 @@ static void padata_reorder(struct parallel_data *pd)
>
> reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
> if (!list_empty(&reorder->list) && padata_find_next(pd, false))
> - queue_work(pinst->serial_wq, &pd->reorder_work);
> + return queue_work(pinst->serial_wq, &pd->reorder_work);
> +
> + return false;
> }
>
> static void invoke_padata_reorder(struct work_struct *work)
> {
> struct parallel_data *pd;
> + bool keep_refcnt;
>
> local_bh_disable();
> pd = container_of(work, struct parallel_data, reorder_work);
> - padata_reorder(pd);
> + keep_refcnt = padata_reorder(pd);
> local_bh_enable();
> +
> + if (!keep_refcnt)
> + padata_put_pd(pd);
> }
>
> static void padata_serial_worker(struct work_struct *serial_work)
> @@ -392,6 +398,15 @@ void padata_do_serial(struct padata_priv *padata)
> struct padata_list *reorder = per_cpu_ptr(pd->reorder_list, hashed_cpu);
> struct padata_priv *cur;
>
> + /*
> + * The in-flight padata owns a reference on pd. However, as
> + * soon as it's been enqueued on the reorder list, another
> + * task can dequeue and complete it, thereby dropping the
> + * reference. Grab another reference here, it will eventually
> + * be released from a reorder work, if any, or below.
> + */
> + padata_get_pd(pd);
> +
> spin_lock(&reorder->lock);
> /* Sort in ascending order of sequence number. */
> list_for_each_entry_reverse(cur, &reorder->list, list)
> @@ -407,7 +422,8 @@ void padata_do_serial(struct padata_priv *padata)
> */
> smp_mb();
>
> - padata_reorder(pd);
> + if (!padata_reorder(pd))
> + padata_put_pd(pd);
do_serial is supposed to be called with BHs disabled and will be in
every case after a fix for a separate issue that I plan to send this
cycle. Given that it (will soon...) always happen under RCU protection,
part of this issue could be addressed like this, which puts the expense
of dealing with this rare problem in the slow path:
diff --git a/kernel/padata.c b/kernel/padata.c
index 0bf8c80dad5a..cd6740ae6629 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -1110,6 +1110,12 @@ void padata_free_shell(struct padata_shell *ps)
if (!ps)
return;
+ /*
+ * Wait for all _do_serial calls to finish to avoid touching freed pd's
+ * and ps's.
+ */
+ synchronize_rcu();
+
mutex_lock(&ps->pinst->lock);
list_del(&ps->list);
padata_put_pd(rcu_dereference_protected(ps->pd, 1));
pcrypt calls padata_free_shell() when all outstanding transforms (and
thus requests, I think) have been freed/completed, so no new task can
come into padata_reorder. synchronize_rcu() then flushes out any
remaining _reorder calls.
This doesn't deal with pending reorder_work items, but we can fix it
later in the series.
What do you think?
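For context, a minimal sketch of the pairing this proposal relies on
(illustration only, not part of the proposed hunk; example_complete_request()
is a made-up name): padata_do_serial() runs with BHs disabled, and on current
mainline a BH-disabled region counts as an RCU read-side critical section, so
the synchronize_rcu() added to padata_free_shell() cannot return while such a
region is still executing and possibly dereferencing pd or pd->ps.

#include <linux/bottom_half.h>
#include <linux/padata.h>

/*
 * Illustration only: a completion path of the shape the proposed
 * synchronize_rcu() waits for.  Once synchronize_rcu() returns, no task
 * can still be inside the BH-disabled region below with a stale pd/ps.
 */
static void example_complete_request(struct padata_priv *padata)
{
	local_bh_disable();		/* starts the RCU read-side section */
	padata_do_serial(padata);	/* may call padata_reorder(pd)      */
	local_bh_enable();		/* ends it; free_shell may proceed  */
}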
On Wed, Oct 19, 2022 at 10:37:08AM +0200, Nicolai Stange wrote:
> Even though the parallel_data "pd" instance passed to padata_reorder() is
> guaranteed to exist as per the reference held by its callers, the same is
> not true for the associated padata_shell, pd->ps. More specifically, once
> the last padata_priv request has been completed, either at entry from
> padata_reorder() or concurrently to it, the padata API users are well
> within their right to free the padata_shell instance.
The synchronize_rcu change seems to make padata_reorder safe from freed
ps's with the exception of a straggler reorder_work. For that, I think
something like this hybrid of your code and mine is enough to plug the
hole. It's on top of 1-2 and my hunk from 3. It has to take an extra
ref on pd, but only in the rare case where the reorder work is used.
Thoughts?
diff --git a/kernel/padata.c b/kernel/padata.c
index cd6740ae6629..f14c256a0ee3 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -277,7 +277,7 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd,
static void padata_reorder(struct parallel_data *pd)
{
- struct padata_instance *pinst = pd->ps->pinst;
+ struct padata_instance *pinst;
int cb_cpu;
struct padata_priv *padata;
struct padata_serial_queue *squeue;
@@ -314,7 +314,7 @@ static void padata_reorder(struct parallel_data *pd)
list_add_tail(&padata->list, &squeue->serial.list);
spin_unlock(&squeue->serial.lock);
- queue_work_on(cb_cpu, pinst->serial_wq, &squeue->work);
+ queue_work_on(cb_cpu, pd->ps->pinst->serial_wq, &squeue->work);
}
spin_unlock_bh(&pd->lock);
@@ -330,8 +330,10 @@ static void padata_reorder(struct parallel_data *pd)
smp_mb();
reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
- if (!list_empty(&reorder->list) && padata_find_next(pd, false))
- queue_work(pinst->serial_wq, &pd->reorder_work);
+ if (!list_empty(&reorder->list) && padata_find_next(pd, false)) {
+ if (queue_work(pd->ps->pinst->serial_wq, &pd->reorder_work))
+ padata_get_pd(pd);
+ }
}
static void invoke_padata_reorder(struct work_struct *work)
@@ -342,6 +344,7 @@ static void invoke_padata_reorder(struct work_struct *work)
pd = container_of(work, struct parallel_data, reorder_work);
padata_reorder(pd);
local_bh_enable();
+ padata_put_pd(pd);
}
static void padata_serial_worker(struct work_struct *serial_work)
Daniel Jordan <[email protected]> writes:
> On Wed, Oct 19, 2022 at 10:37:06AM +0200, Nicolai Stange wrote:
>> On entry of padata_do_serial(), the in-flight padata_priv owns a reference
>> to the associated parallel_data instance.
>>
>> However, as soon as the padata_priv got enqueued on the reorder list, it
>> can be completed from a different context, causing the reference to get
>> released in the course.
>>
>> This would potentially cause UAFs from the subsequent padata_reorder()
>> operations invoked from the enqueueing padata_do_serial() or from the
>> reorder work.
>>
<snip>
>> ---
>> kernel/padata.c | 26 +++++++++++++++++++++-----
>> 1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/padata.c b/kernel/padata.c
>> index 0bf8c80dad5a..b79226727ef7 100644
>> --- a/kernel/padata.c
>> +++ b/kernel/padata.c
>> @@ -275,7 +275,7 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd,
>> return padata;
>> }
>>
>> -static void padata_reorder(struct parallel_data *pd)
>> +static bool padata_reorder(struct parallel_data *pd)
>> {
>> struct padata_instance *pinst = pd->ps->pinst;
>> int cb_cpu;
>> @@ -294,7 +294,7 @@ static void padata_reorder(struct parallel_data *pd)
>> * care for all the objects enqueued during the holdtime of the lock.
>> */
>> if (!spin_trylock_bh(&pd->lock))
>> - return;
>> + return false;
>>
>> while (1) {
>> padata = padata_find_next(pd, true);
>> @@ -331,17 +331,23 @@ static void padata_reorder(struct parallel_data *pd)
>>
>> reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
>> if (!list_empty(&reorder->list) && padata_find_next(pd, false))
>> - queue_work(pinst->serial_wq, &pd->reorder_work);
>> + return queue_work(pinst->serial_wq, &pd->reorder_work);
>> +
>> + return false;
>> }
>>
>> static void invoke_padata_reorder(struct work_struct *work)
>> {
>> struct parallel_data *pd;
>> + bool keep_refcnt;
>>
>> local_bh_disable();
>> pd = container_of(work, struct parallel_data, reorder_work);
>> - padata_reorder(pd);
>> + keep_refcnt = padata_reorder(pd);
>> local_bh_enable();
>> +
>> + if (!keep_refcnt)
>> + padata_put_pd(pd);
>> }
>>
>> static void padata_serial_worker(struct work_struct *serial_work)
>> @@ -392,6 +398,15 @@ void padata_do_serial(struct padata_priv *padata)
>> struct padata_list *reorder = per_cpu_ptr(pd->reorder_list, hashed_cpu);
>> struct padata_priv *cur;
>>
>> + /*
>> + * The in-flight padata owns a reference on pd. However, as
>> + * soon as it's been enqueued on the reorder list, another
>> + * task can dequeue and complete it, thereby dropping the
>> + * reference. Grab another reference here, it will eventually
>> + * be released from a reorder work, if any, or below.
>> + */
>> + padata_get_pd(pd);
>> +
>> spin_lock(&reorder->lock);
>> /* Sort in ascending order of sequence number. */
>> list_for_each_entry_reverse(cur, &reorder->list, list)
>> @@ -407,7 +422,8 @@ void padata_do_serial(struct padata_priv *padata)
>> */
>> smp_mb();
>>
>> - padata_reorder(pd);
>> + if (!padata_reorder(pd))
>> + padata_put_pd(pd);
>
> do_serial is supposed to be called with BHs disabled and will be in
> every case after a fix for a separate issue that I plan to send this
> cycle. Given that it (will soon...) always happen under RCU protection,
> part of this issue could be addressed like this, which puts the expense
> of dealing with this rare problem in the slow path:
>
> diff --git a/kernel/padata.c b/kernel/padata.c
> index 0bf8c80dad5a..cd6740ae6629 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -1110,6 +1110,12 @@ void padata_free_shell(struct padata_shell *ps)
> if (!ps)
> return;
>
> + /*
> + * Wait for all _do_serial calls to finish to avoid touching freed pd's
> + * and ps's.
> + */
> + synchronize_rcu();
> +
> mutex_lock(&ps->pinst->lock);
> list_del(&ps->list);
> padata_put_pd(rcu_dereference_protected(ps->pd, 1));
>
> pcrypt calls padata_free_shell() when all outstanding transforms (and
> thus requests, I think) have been freed/completed, so no new task can
> come into padata_reorder. synchronize_rcu() then flushes out any
> remaining _reorder calls.
>
> This doesn't deal with pending reorder_work items, but we can fix it
> later in the series.
>
> What do you think?
Yes, I think that would work. Will you handle it alongside that fix for
a separate issue you mentioned above? Or shall I once this fix has
landed?
Thanks!
Nicolai
--
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nürnberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
(HRB 36809, AG Nürnberg)
Daniel Jordan <[email protected]> writes:
> On Wed, Oct 19, 2022 at 10:37:08AM +0200, Nicolai Stange wrote:
>> Even though the parallel_data "pd" instance passed to padata_reorder() is
>> guaranteed to exist as per the reference held by its callers, the same is
>> not true for the associated padata_shell, pd->ps. More specifically, once
>> the last padata_priv request has been completed, either at entry from
>> padata_reorder() or concurrently to it, the padata API users are well
>> within their right to free the padata_shell instance.
>
> The synchronize_rcu change seems to make padata_reorder safe from freed
> ps's with the exception of a straggler reorder_work. For that, I think
> something like this hybrid of your code and mine is enough to plug the
> hole. It's on top of 1-2 and my hunk from 3. It has to take an extra
> ref on pd, but only in the rare case where the reorder work is used.
> Thoughts?
>
> diff --git a/kernel/padata.c b/kernel/padata.c
> index cd6740ae6629..f14c256a0ee3 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -277,7 +277,7 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd,
>
> static void padata_reorder(struct parallel_data *pd)
> {
> - struct padata_instance *pinst = pd->ps->pinst;
> + struct padata_instance *pinst;
> int cb_cpu;
> struct padata_priv *padata;
> struct padata_serial_queue *squeue;
> @@ -314,7 +314,7 @@ static void padata_reorder(struct parallel_data *pd)
> list_add_tail(&padata->list, &squeue->serial.list);
> spin_unlock(&squeue->serial.lock);
>
> - queue_work_on(cb_cpu, pinst->serial_wq, &squeue->work);
> + queue_work_on(cb_cpu, pd->ps->pinst->serial_wq, &squeue->work);
> }
>
> spin_unlock_bh(&pd->lock);
> @@ -330,8 +330,10 @@ static void padata_reorder(struct parallel_data *pd)
> smp_mb();
>
> reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
> - if (!list_empty(&reorder->list) && padata_find_next(pd, false))
> - queue_work(pinst->serial_wq, &pd->reorder_work);
> + if (!list_empty(&reorder->list) && padata_find_next(pd, false)) {
> + if (queue_work(pd->ps->pinst->serial_wq, &pd->reorder_work))
> + padata_get_pd(pd);
As the reorder_work can start running immediately after having been
submitted, wouldn't it be more correct to do something like
padata_get_pd(pd);
if (!queue_work(pd->ps->pinst->serial_wq, &pd->reorder_work))
padata_put_pd(pd);
?
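Spelled out as a self-contained sketch (padata_queue_reorder_work() is a
hypothetical helper name for illustration, not something the posted patches
add), the point being that the reference is already owned before the work can
possibly start running:

/*
 * Take the pd reference before queueing the reorder work, so that a work
 * which starts running immediately already owns it.  If the work was
 * already pending, queue_work() returns false and we drop the extra
 * reference again; the pending work keeps the one taken for it earlier.
 */
static bool padata_queue_reorder_work(struct parallel_data *pd)
{
	padata_get_pd(pd);	/* ref to be dropped by invoke_padata_reorder() */

	if (queue_work(pd->ps->pinst->serial_wq, &pd->reorder_work))
		return true;	/* queued; the work now owns the reference */

	padata_put_pd(pd);	/* already pending, give the extra ref back */
	return false;
}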
Otherwise the patch looks good to me.
Thanks!
Nicolai
> + }
> }
>
> static void invoke_padata_reorder(struct work_struct *work)
> @@ -342,6 +344,7 @@ static void invoke_padata_reorder(struct work_struct *work)
> pd = container_of(work, struct parallel_data, reorder_work);
> padata_reorder(pd);
> local_bh_enable();
> + padata_put_pd(pd);
> }
>
> static void padata_serial_worker(struct work_struct *serial_work)
>
--
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nürnberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
(HRB 36809, AG Nürnberg)
On Wed, Nov 09, 2022 at 02:03:15PM +0100, Nicolai Stange wrote:
> Daniel Jordan <[email protected]> writes:
>
> > On Wed, Oct 19, 2022 at 10:37:06AM +0200, Nicolai Stange wrote:
> >> On entry of padata_do_serial(), the in-flight padata_priv owns a reference
> >> to the associated parallel_data instance.
> >>
> >> However, as soon as the padata_priv got enqueued on the reorder list, it
> >> can be completed from a different context, causing the reference to get
> >> released in the course.
> >>
> >> This would potentially cause UAFs from the subsequent padata_reorder()
> >> operations invoked from the enqueueing padata_do_serial() or from the
> >> reorder work.
> >>
>
> <snip>
>
> >> ---
> >> kernel/padata.c | 26 +++++++++++++++++++++-----
> >> 1 file changed, 21 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/padata.c b/kernel/padata.c
> >> index 0bf8c80dad5a..b79226727ef7 100644
> >> --- a/kernel/padata.c
> >> +++ b/kernel/padata.c
> >> @@ -275,7 +275,7 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd,
> >> return padata;
> >> }
> >>
> >> -static void padata_reorder(struct parallel_data *pd)
> >> +static bool padata_reorder(struct parallel_data *pd)
> >> {
> >> struct padata_instance *pinst = pd->ps->pinst;
> >> int cb_cpu;
> >> @@ -294,7 +294,7 @@ static void padata_reorder(struct parallel_data *pd)
> >> * care for all the objects enqueued during the holdtime of the lock.
> >> */
> >> if (!spin_trylock_bh(&pd->lock))
> >> - return;
> >> + return false;
> >>
> >> while (1) {
> >> padata = padata_find_next(pd, true);
> >> @@ -331,17 +331,23 @@ static void padata_reorder(struct parallel_data *pd)
> >>
> >> reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
> >> if (!list_empty(&reorder->list) && padata_find_next(pd, false))
> >> - queue_work(pinst->serial_wq, &pd->reorder_work);
> >> + return queue_work(pinst->serial_wq, &pd->reorder_work);
> >> +
> >> + return false;
> >> }
> >>
> >> static void invoke_padata_reorder(struct work_struct *work)
> >> {
> >> struct parallel_data *pd;
> >> + bool keep_refcnt;
> >>
> >> local_bh_disable();
> >> pd = container_of(work, struct parallel_data, reorder_work);
> >> - padata_reorder(pd);
> >> + keep_refcnt = padata_reorder(pd);
> >> local_bh_enable();
> >> +
> >> + if (!keep_refcnt)
> >> + padata_put_pd(pd);
> >> }
> >>
> >> static void padata_serial_worker(struct work_struct *serial_work)
> >> @@ -392,6 +398,15 @@ void padata_do_serial(struct padata_priv *padata)
> >> struct padata_list *reorder = per_cpu_ptr(pd->reorder_list, hashed_cpu);
> >> struct padata_priv *cur;
> >>
> >> + /*
> >> + * The in-flight padata owns a reference on pd. However, as
> >> + * soon as it's been enqueued on the reorder list, another
> >> + * task can dequeue and complete it, thereby dropping the
> >> + * reference. Grab another reference here, it will eventually
> >> + * be released from a reorder work, if any, or below.
> >> + */
> >> + padata_get_pd(pd);
> >> +
> >> spin_lock(&reorder->lock);
> >> /* Sort in ascending order of sequence number. */
> >> list_for_each_entry_reverse(cur, &reorder->list, list)
> >> @@ -407,7 +422,8 @@ void padata_do_serial(struct padata_priv *padata)
> >> */
> >> smp_mb();
> >>
> >> - padata_reorder(pd);
> >> + if (!padata_reorder(pd))
> >> + padata_put_pd(pd);
> >
> > do_serial is supposed to be called with BHs disabled and will be in
> > every case after a fix for a separate issue that I plan to send this
> > cycle. Given that it (will soon...) always happen under RCU protection,
> > part of this issue could be addressed like this, which puts the expense
> > of dealing with this rare problem in the slow path:
> >
> > diff --git a/kernel/padata.c b/kernel/padata.c
> > index 0bf8c80dad5a..cd6740ae6629 100644
> > --- a/kernel/padata.c
> > +++ b/kernel/padata.c
> > @@ -1110,6 +1110,12 @@ void padata_free_shell(struct padata_shell *ps)
> > if (!ps)
> > return;
> >
> > + /*
> > + * Wait for all _do_serial calls to finish to avoid touching freed pd's
> > + * and ps's.
> > + */
> > + synchronize_rcu();
> > +
> > mutex_lock(&ps->pinst->lock);
> > list_del(&ps->list);
> > padata_put_pd(rcu_dereference_protected(ps->pd, 1));
> >
> > pcrypt calls padata_free_shell() when all outstanding transforms (and
> > thus requests, I think) have been freed/completed, so no new task can
> > come into padata_reorder. synchronize_rcu() then flushes out any
> > remaining _reorder calls.
> >
> > This doesn't deal with pending reorder_work items, but we can fix it
> > later in the series.
> >
> > What do you think?
>
> Yes, I think that would work. Will you handle it alongside that fix for
> a separate issue you mentioned above? Or shall I once this fix has
> landed?
Please go ahead and do it yourself. I'll send mine soon, I think it
should be ok for them to go in in the same cycle.
On Wed, Nov 09, 2022 at 02:03:29PM +0100, Nicolai Stange wrote:
> Daniel Jordan <[email protected]> writes:
>
> > On Wed, Oct 19, 2022 at 10:37:08AM +0200, Nicolai Stange wrote:
> >> Even though the parallel_data "pd" instance passed to padata_reorder() is
> >> guaranteed to exist as per the reference held by its callers, the same is
> >> not true for the associated padata_shell, pd->ps. More specifically, once
> >> the last padata_priv request has been completed, either at entry from
> >> padata_reorder() or concurrently to it, the padata API users are well
> >> within their right to free the padata_shell instance.
> >
> > The synchronize_rcu change seems to make padata_reorder safe from freed
> > ps's with the exception of a straggler reorder_work. For that, I think
> > something like this hybrid of your code and mine is enough to plug the
> > hole. It's on top of 1-2 and my hunk from 3. It has to take an extra
> > ref on pd, but only in the rare case where the reorder work is used.
> > Thoughts?
> >
> > diff --git a/kernel/padata.c b/kernel/padata.c
> > index cd6740ae6629..f14c256a0ee3 100644
> > --- a/kernel/padata.c
> > +++ b/kernel/padata.c
> > @@ -277,7 +277,7 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd,
> >
> > static void padata_reorder(struct parallel_data *pd)
> > {
> > - struct padata_instance *pinst = pd->ps->pinst;
> > + struct padata_instance *pinst;
> > int cb_cpu;
> > struct padata_priv *padata;
> > struct padata_serial_queue *squeue;
> > @@ -314,7 +314,7 @@ static void padata_reorder(struct parallel_data *pd)
> > list_add_tail(&padata->list, &squeue->serial.list);
> > spin_unlock(&squeue->serial.lock);
> >
> > - queue_work_on(cb_cpu, pinst->serial_wq, &squeue->work);
> > + queue_work_on(cb_cpu, pd->ps->pinst->serial_wq, &squeue->work);
> > }
> >
> > spin_unlock_bh(&pd->lock);
> > @@ -330,8 +330,10 @@ static void padata_reorder(struct parallel_data *pd)
> > smp_mb();
> >
> > reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
> > - if (!list_empty(&reorder->list) && padata_find_next(pd, false))
> > - queue_work(pinst->serial_wq, &pd->reorder_work);
> > + if (!list_empty(&reorder->list) && padata_find_next(pd, false)) {
> > + if (queue_work(pd->ps->pinst->serial_wq, &pd->reorder_work))
> > + padata_get_pd(pd);
>
> As the reorder_work can start running immediately after having been
> submitted, wouldn't it be more correct to do something like
>
> padata_get_pd(pd);
> if (!queue_work(pd->ps->pinst->serial_wq, &pd->reorder_work))
> padata_put_pd(pd);
>
> ?
Yes, that's better, and all the above can go in your next version too.