2017-09-08 18:57:20

by Mathias Krause

[permalink] [raw]
Subject: [PATCH 0/3] padata cpu awareness fixes

Hi Steffen, Herbert,

this series solves multiple issues of padata I ran into while trying to
make use of pcrypt for IPsec.

The first issue is that the reorder timer callback may run on a CPU that
isn't part of the parallel set, as it runs on the CPU where the timer
interrupt gets handled. As a result, padata_reorder() may run on a CPU
with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
padata_get_next() refers to an uninitialized cpu_index. However, as
per-cpu memory gets memset(0) on allocation time, it'll read 0. If the
next CPU index we're waiting for is 0 too, the compare will succeed and
we'll return with -ENODATA, making padata_reorder() think there's
nothing to do, stop any pending timer and return immediately. This is
wrong as the pending timer might have been the one to trigger the needed
reordering, but, as it was canceled, will never happen -- if no other
parallel requests arrive afterwards.

That issue is addressed with the first two patches, ensuring we're not
using a bogus cpu_index value for the compare and thereby not wrongly
cancel a pending timer. The second patch then ensures the reorder timer
callback runs on the correct CPU or, at least, on a CPU from the
parallel set to generate forward progress.

The third patch addresses a similar issues for the serialization
callback. We implicitly require padata_do_serial() to be called on the
very same CPU padata_do_parallel() was called on to ensure the correct
ordering of requests -- and correct functioning of padata, even. If the
algorithm we're parallelizing is asynchronous itself, that invariant
might not be true, as, e.g. crypto hardware may execute the callback on
the CPU its interrupt gets handled on which isn't necessarily the one
the request got issued on.

To handle that issue we track the CPU we're supposed to handle the
request on and ensure we'll call padata_reorder() on that CPU when
padata_do_serial() gets called -- either by already running on the
corect CPU or by deferring the call to a worker.

Please apply!

Mathias Krause (3):
padata: set cpu_index of unused CPUs to -1
padata: ensure the reorder timer callback runs on the correct CPU
padata: ensure padata_do_serial() runs on the correct CPU

include/linux/padata.h | 4 +++
kernel/padata.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 72 insertions(+), 3 deletions(-)

--
1.7.10.4


2017-09-08 18:57:25

by Mathias Krause

[permalink] [raw]
Subject: [PATCH 2/3] padata: ensure the reorder timer callback runs on the correct CPU

The reorder timer function runs on the CPU where the timer interrupt was
handled which is not necessarily one of the CPUs of the 'pcpu' CPU mask
set.

Ensure the padata_reorder() callback runs on the correct CPU, which is
one in the 'pcpu' CPU mask set and, preferrably, the next expected one.
Do so by comparing the current CPU with the expected target CPU. If they
match, call padata_reorder() right away. If they differ, schedule a work
item on the target CPU that does the padata_reorder() call for us.

Signed-off-by: Mathias Krause <[email protected]>
---
include/linux/padata.h | 2 ++
kernel/padata.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 2f9c1f93b1ce..5c0175bbc179 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -85,6 +85,7 @@ struct padata_serial_queue {
* @swork: work struct for serialization.
* @pd: Backpointer to the internal control structure.
* @work: work struct for parallelization.
+ * @reorder_work: work struct for reordering.
* @num_obj: Number of objects that are processed by this cpu.
* @cpu_index: Index of the cpu.
*/
@@ -93,6 +94,7 @@ struct padata_parallel_queue {
struct padata_list reorder;
struct parallel_data *pd;
struct work_struct work;
+ struct work_struct reorder_work;
atomic_t num_obj;
int cpu_index;
};
diff --git a/kernel/padata.c b/kernel/padata.c
index 1b9b4bac4a9b..b4066147bce4 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -275,11 +275,51 @@ static void padata_reorder(struct parallel_data *pd)
return;
}

+static void invoke_padata_reorder(struct work_struct *work)
+{
+ struct padata_parallel_queue *pqueue;
+ struct parallel_data *pd;
+
+ local_bh_disable();
+ pqueue = container_of(work, struct padata_parallel_queue, reorder_work);
+ pd = pqueue->pd;
+ padata_reorder(pd);
+ local_bh_enable();
+}
+
static void padata_reorder_timer(unsigned long arg)
{
struct parallel_data *pd = (struct parallel_data *)arg;
+ unsigned int weight;
+ int target_cpu, cpu;

- padata_reorder(pd);
+ cpu = get_cpu();
+
+ /* We don't lock pd here to not interfere with parallel processing
+ * padata_reorder() calls on other CPUs. We just need any CPU out of
+ * the cpumask.pcpu set. It would be nice if it's the right one but
+ * it doesn't matter if we're off to the next one by using an outdated
+ * pd->processed value.
+ */
+ weight = cpumask_weight(pd->cpumask.pcpu);
+ target_cpu = padata_index_to_cpu(pd, pd->processed % weight);
+
+ /* ensure to call the reorder callback on the correct CPU */
+ if (cpu != target_cpu) {
+ struct padata_parallel_queue *pqueue;
+ struct padata_instance *pinst;
+
+ /* The timer function is serialized wrt itself -- no locking
+ * needed.
+ */
+ pinst = pd->pinst;
+ pqueue = per_cpu_ptr(pd->pqueue, target_cpu);
+ queue_work_on(target_cpu, pinst->wq, &pqueue->reorder_work);
+ } else {
+ padata_reorder(pd);
+ }
+
+ put_cpu();
}

static void padata_serial_worker(struct work_struct *serial_work)
@@ -399,6 +439,7 @@ static void padata_init_pqueues(struct parallel_data *pd)
__padata_list_init(&pqueue->reorder);
__padata_list_init(&pqueue->parallel);
INIT_WORK(&pqueue->work, padata_parallel_worker);
+ INIT_WORK(&pqueue->reorder_work, invoke_padata_reorder);
atomic_set(&pqueue->num_obj, 0);
}
}
--
1.7.10.4

2017-09-08 18:57:25

by Mathias Krause

[permalink] [raw]
Subject: [PATCH 3/3] padata: ensure padata_do_serial() runs on the correct CPU

If the algorithm we're parallelizing is asynchronous we might change
CPUs between padata_do_parallel() and padata_do_serial(). However, we
don't expect this to happen as we need to enqueue the padata object into
the per-cpu reorder queue we took it from, i.e. the same-cpu's parallel
queue.

Ensure we're not switching CPUs for a given padata object by tracking
the CPU within the padata object. If the serial callback gets called on
the wrong CPU, defer invoking padata_reorder() via a kernel worker on
the CPU we're expected to run on.

Signed-off-by: Mathias Krause <[email protected]>
---
include/linux/padata.h | 2 ++
kernel/padata.c | 20 +++++++++++++++++++-
2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5c0175bbc179..5d13d25da2c8 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -37,6 +37,7 @@
* @list: List entry, to attach to the padata lists.
* @pd: Pointer to the internal control structure.
* @cb_cpu: Callback cpu for serializatioon.
+ * @cpu: Cpu for parallelization.
* @seq_nr: Sequence number of the parallelized data object.
* @info: Used to pass information from the parallel to the serial function.
* @parallel: Parallel execution function.
@@ -46,6 +47,7 @@ struct padata_priv {
struct list_head list;
struct parallel_data *pd;
int cb_cpu;
+ int cpu;
int info;
void (*parallel)(struct padata_priv *padata);
void (*serial)(struct padata_priv *padata);
diff --git a/kernel/padata.c b/kernel/padata.c
index b4066147bce4..f262c9a4e70a 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -131,6 +131,7 @@ int padata_do_parallel(struct padata_instance *pinst,
padata->cb_cpu = cb_cpu;

target_cpu = padata_cpu_hash(pd);
+ padata->cpu = target_cpu;
queue = per_cpu_ptr(pd->pqueue, target_cpu);

spin_lock(&queue->parallel.lock);
@@ -363,10 +364,21 @@ void padata_do_serial(struct padata_priv *padata)
int cpu;
struct padata_parallel_queue *pqueue;
struct parallel_data *pd;
+ int reorder_via_wq = 0;

pd = padata->pd;

cpu = get_cpu();
+
+ /* We need to run on the same CPU padata_do_parallel(.., padata, ..)
+ * was called on -- or, at least, enqueue the padata object into the
+ * correct per-cpu queue.
+ */
+ if (cpu != padata->cpu) {
+ reorder_via_wq = 1;
+ cpu = padata->cpu;
+ }
+
pqueue = per_cpu_ptr(pd->pqueue, cpu);

spin_lock(&pqueue->reorder.lock);
@@ -376,7 +388,13 @@ void padata_do_serial(struct padata_priv *padata)

put_cpu();

- padata_reorder(pd);
+ /* If we're running on the wrong CPU, call padata_reorder() via a
+ * kernel worker.
+ */
+ if (reorder_via_wq)
+ queue_work_on(cpu, pd->pinst->wq, &pqueue->reorder_work);
+ else
+ padata_reorder(pd);
}
EXPORT_SYMBOL(padata_do_serial);

--
1.7.10.4

2017-09-08 18:57:22

by Mathias Krause

[permalink] [raw]
Subject: [PATCH 1/3] padata: set cpu_index of unused CPUs to -1

The parallel queue per-cpu data structure gets initialized only for CPUs
in the 'pcpu' CPU mask set. This is not sufficient as the reorder timer
may run on a different CPU and might wrongly decide it's the target CPU
for the next reorder item as per-cpu memory gets memset(0) and we might
be waiting for the first CPU in cpumask.pcpu, i.e. cpu_index 0.

Make the '__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index'
compare in padata_get_next() fail in this case by initializing the
cpu_index member of all per-cpu parallel queues. Use -1 for unused ones.

Signed-off-by: Mathias Krause <[email protected]>
---
kernel/padata.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index 868f947166d7..1b9b4bac4a9b 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -384,8 +384,14 @@ static void padata_init_pqueues(struct parallel_data *pd)
struct padata_parallel_queue *pqueue;

cpu_index = 0;
- for_each_cpu(cpu, pd->cpumask.pcpu) {
+ for_each_possible_cpu(cpu) {
pqueue = per_cpu_ptr(pd->pqueue, cpu);
+
+ if (!cpumask_test_cpu(cpu, pd->cpumask.pcpu)) {
+ pqueue->cpu_index = -1;
+ continue;
+ }
+
pqueue->pd = pd;
pqueue->cpu_index = cpu_index;
cpu_index++;
--
1.7.10.4

2017-09-12 09:07:55

by Steffen Klassert

[permalink] [raw]
Subject: Re: [PATCH 0/3] padata cpu awareness fixes

On Fri, Sep 08, 2017 at 08:57:08PM +0200, Mathias Krause wrote:
> Hi Steffen, Herbert,
>
> this series solves multiple issues of padata I ran into while trying to
> make use of pcrypt for IPsec.
>
> The first issue is that the reorder timer callback may run on a CPU that
> isn't part of the parallel set, as it runs on the CPU where the timer
> interrupt gets handled. As a result, padata_reorder() may run on a CPU
> with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
> padata_get_next() refers to an uninitialized cpu_index. However, as
> per-cpu memory gets memset(0) on allocation time, it'll read 0. If the
> next CPU index we're waiting for is 0 too, the compare will succeed and
> we'll return with -ENODATA, making padata_reorder() think there's
> nothing to do, stop any pending timer and return immediately. This is
> wrong as the pending timer might have been the one to trigger the needed
> reordering, but, as it was canceled, will never happen -- if no other
> parallel requests arrive afterwards.
>
> That issue is addressed with the first two patches, ensuring we're not
> using a bogus cpu_index value for the compare and thereby not wrongly
> cancel a pending timer. The second patch then ensures the reorder timer
> callback runs on the correct CPU or, at least, on a CPU from the
> parallel set to generate forward progress.
>
> The third patch addresses a similar issues for the serialization
> callback. We implicitly require padata_do_serial() to be called on the
> very same CPU padata_do_parallel() was called on to ensure the correct
> ordering of requests -- and correct functioning of padata, even. If the
> algorithm we're parallelizing is asynchronous itself, that invariant
> might not be true, as, e.g. crypto hardware may execute the callback on
> the CPU its interrupt gets handled on which isn't necessarily the one
> the request got issued on.
>
> To handle that issue we track the CPU we're supposed to handle the
> request on and ensure we'll call padata_reorder() on that CPU when
> padata_do_serial() gets called -- either by already running on the
> corect CPU or by deferring the call to a worker.
>
> Please apply!
>
> Mathias Krause (3):
> padata: set cpu_index of unused CPUs to -1
> padata: ensure the reorder timer callback runs on the correct CPU
> padata: ensure padata_do_serial() runs on the correct CPU

Looks good, thanks!

Acked-by: Steffen Klassert <[email protected]>

2017-10-02 11:14:26

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH 0/3] padata cpu awareness fixes

On 12 September 2017 at 11:07, Steffen Klassert
<[email protected]> wrote:
> On Fri, Sep 08, 2017 at 08:57:08PM +0200, Mathias Krause wrote:
>> Hi Steffen, Herbert,
>>
>> this series solves multiple issues of padata I ran into while trying to
>> make use of pcrypt for IPsec.
>>
>> The first issue is that the reorder timer callback may run on a CPU that
>> isn't part of the parallel set, as it runs on the CPU where the timer
>> interrupt gets handled. As a result, padata_reorder() may run on a CPU
>> with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
>> padata_get_next() refers to an uninitialized cpu_index. However, as
>> per-cpu memory gets memset(0) on allocation time, it'll read 0. If the
>> next CPU index we're waiting for is 0 too, the compare will succeed and
>> we'll return with -ENODATA, making padata_reorder() think there's
>> nothing to do, stop any pending timer and return immediately. This is
>> wrong as the pending timer might have been the one to trigger the needed
>> reordering, but, as it was canceled, will never happen -- if no other
>> parallel requests arrive afterwards.
>>
>> That issue is addressed with the first two patches, ensuring we're not
>> using a bogus cpu_index value for the compare and thereby not wrongly
>> cancel a pending timer. The second patch then ensures the reorder timer
>> callback runs on the correct CPU or, at least, on a CPU from the
>> parallel set to generate forward progress.
>>
>> The third patch addresses a similar issues for the serialization
>> callback. We implicitly require padata_do_serial() to be called on the
>> very same CPU padata_do_parallel() was called on to ensure the correct
>> ordering of requests -- and correct functioning of padata, even. If the
>> algorithm we're parallelizing is asynchronous itself, that invariant
>> might not be true, as, e.g. crypto hardware may execute the callback on
>> the CPU its interrupt gets handled on which isn't necessarily the one
>> the request got issued on.
>>
>> To handle that issue we track the CPU we're supposed to handle the
>> request on and ensure we'll call padata_reorder() on that CPU when
>> padata_do_serial() gets called -- either by already running on the
>> corect CPU or by deferring the call to a worker.
>>
>> Please apply!
>>
>> Mathias Krause (3):
>> padata: set cpu_index of unused CPUs to -1
>> padata: ensure the reorder timer callback runs on the correct CPU
>> padata: ensure padata_do_serial() runs on the correct CPU
>
> Looks good, thanks!
>
> Acked-by: Steffen Klassert <[email protected]>

Ping! Herbert, will these patches go through your tree or Steffen's?

2017-10-02 11:53:55

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 0/3] padata cpu awareness fixes

On Mon, Oct 02, 2017 at 01:14:24PM +0200, Mathias Krause wrote:
>
> Ping! Herbert, will these patches go through your tree or Steffen's?

They are in my queue.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2017-10-07 04:19:23

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 0/3] padata cpu awareness fixes

On Fri, Sep 08, 2017 at 08:57:08PM +0200, Mathias Krause wrote:
> Hi Steffen, Herbert,
>
> this series solves multiple issues of padata I ran into while trying to
> make use of pcrypt for IPsec.
>
> The first issue is that the reorder timer callback may run on a CPU that
> isn't part of the parallel set, as it runs on the CPU where the timer
> interrupt gets handled. As a result, padata_reorder() may run on a CPU
> with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
> padata_get_next() refers to an uninitialized cpu_index. However, as
> per-cpu memory gets memset(0) on allocation time, it'll read 0. If the
> next CPU index we're waiting for is 0 too, the compare will succeed and
> we'll return with -ENODATA, making padata_reorder() think there's
> nothing to do, stop any pending timer and return immediately. This is
> wrong as the pending timer might have been the one to trigger the needed
> reordering, but, as it was canceled, will never happen -- if no other
> parallel requests arrive afterwards.
>
> That issue is addressed with the first two patches, ensuring we're not
> using a bogus cpu_index value for the compare and thereby not wrongly
> cancel a pending timer. The second patch then ensures the reorder timer
> callback runs on the correct CPU or, at least, on a CPU from the
> parallel set to generate forward progress.
>
> The third patch addresses a similar issues for the serialization
> callback. We implicitly require padata_do_serial() to be called on the
> very same CPU padata_do_parallel() was called on to ensure the correct
> ordering of requests -- and correct functioning of padata, even. If the
> algorithm we're parallelizing is asynchronous itself, that invariant
> might not be true, as, e.g. crypto hardware may execute the callback on
> the CPU its interrupt gets handled on which isn't necessarily the one
> the request got issued on.
>
> To handle that issue we track the CPU we're supposed to handle the
> request on and ensure we'll call padata_reorder() on that CPU when
> padata_do_serial() gets called -- either by already running on the
> corect CPU or by deferring the call to a worker.
>
> Please apply!
>
> Mathias Krause (3):
> padata: set cpu_index of unused CPUs to -1
> padata: ensure the reorder timer callback runs on the correct CPU
> padata: ensure padata_do_serial() runs on the correct CPU

All applied. Thanks.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt