2019-01-08 19:45:06

by Anatol Pomozov

Subject: seqcount usage in xt_replace_table()

Hello folks,

A bit of context on what I am doing: I am trying to port the KTSAN
(Kernel Thread Sanitizer) tool to v4.20. That tool tracks shared data
usage and makes sure it is accessed in a thread-safe manner.

seqlock is a synchronization primitive used by the Linux kernel. KTSAN
annotates read_seqbegin()/read_seqretry() and tracks what data has been
accessed in its critical section.

During the KTSAN port I found an interesting seqcount usage introduced
in commit 80055dab5de0c8677bc148c4717ddfc753a9148e.

If I read this commit correctly xt_replace_table() does not use
seqlock in a canonical way to specify a critical section. Instead the
code reads the counter and waits until it gets to a specific value.

Now I want KTSAN to play nicely with this code. I need to tell KTSAN
something like "this raw_read_seqcount() does not start a critical
section, just ignore it". So temporarily I introduced a
raw_read_seqcount_nocritical() function that is ignored by KTSAN. Is
this a good solution?

Or maybe xt_replace_table() can be enhanced? When I hear that
something waits until an event happens on all CPUs I think of the
wait_event() function. Would it be better for xt_replace_table() to
introduce an atomic counter that is decremented by the CPUs, with the
main CPU waiting until the counter reaches zero?

WDYT?


2019-01-08 23:03:08

by Florian Westphal

Subject: Re: seqcount usage in xt_replace_table()

Anatol Pomozov <[email protected]> wrote:
> Or maybe xt_replace_table() can be enhanced? When I hear that
> something waits until an event happens on all CPUs I think about
> wait_event() function. Would it be better for xt_replace_table() to
> introduce an atomic counter that is decremented by CPUs, and the main
> CPU waits until the counter gets zero?

That would mean placing an additional atomic op into the
iptables evaluation path (ipt_do_table and friends).

The only alternative I see that might work is synchronize_rcu() (the
_do_table functions are called with the rcu read lock held).

I guess the current scheme is cheaper though.

2019-01-09 00:05:43

by Andrea Parri

Subject: Re: seqcount usage in xt_replace_table()

Hi Anatol,

On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> Hello folks,
>
> A bit of context what I am doing. I am trying to port KTSAN (Kernel
> Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> and makes sure it is accessed in a thread-safe manner.

Interesting! FYI, some of the LKMM maintainers (Paul included) had,
and continue to have, some "fun" discussing topics related to "thread-
safe memory accesses": I'm sure that they'll be very interested in
this work of yours and eager to discuss your results.

Cheers,
Andrea

2019-01-09 00:38:43

by Anatol Pomozov

Subject: Re: seqcount usage in xt_replace_table()

Hello

On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
<[email protected]> wrote:
>
> Hi Anatol,
>
> On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > Hello folks,
> >
> > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > and makes sure it is accessed in a thread-safe manner.
>
> Interesting! FYI, some LKMM's maintainers (Paul included) had and
> continued to have some "fun" discussing topics related to "thread-
> safe memory accesses": I'm sure that they'll be very interested in
> such work of yours and eager to discuss your results.

Thread Sanitizer is a great tool for finding thread-safety issues in
user-space code. The tool has been developed by a team of smart people
from Google [1].

KTSAN is an attempt to bring the same ideas to the Linux kernel [2]. A
bunch of work has been done there, but the project is still at the
proof-of-concept stage.

I am not a part of Google's dynamic tools team, but I decided to
pick something to do during the New Year holidays, so I started porting
KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
fix a few crashes [3].

[1] https://github.com/google/sanitizers
[2] https://github.com/google/ktsan/wiki
[3] https://github.com/anatol/linux/tree/ktsan_4.20

2019-01-09 05:55:56

by Dmitry Vyukov

Subject: Re: seqcount usage in xt_replace_table()

On Wed, Jan 9, 2019 at 1:36 AM Anatol Pomozov <[email protected]> wrote:
>
> Hello
>
> On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> <[email protected]> wrote:
> >
> > Hi Anatol,
> >
> > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > Hello folks,
> > >
> > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > and makes sure it is accessed in a thread-safe manner.
> >
> > Interesting! FYI, some LKMM's maintainers (Paul included) had and
> > continued to have some "fun" discussing topics related to "thread-
> > safe memory accesses": I'm sure that they'll be very interested in
> > such work of yours and eager to discuss your results.
>
> Thread Sanitizer is a great tool to find thread-safety issues with
> user-space code. The tool been developed by a team of smart people
> from Google [1].
>
> KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> bunch of work been done there but the project is still at
> proof-of-concept point.
>
> I am not a part of Google's dynamic tools team. But I've decided to
> pick something to do during the New Year holidays so started porting
> KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> fix a few crashes [3].
>
> [1] https://github.com/google/sanitizers
> [2] https://github.com/google/ktsan/wiki
> [3] https://github.com/anatol/linux/tree/ktsan_4.20


For completeness: at the time we also had to add a
read_seqcount_cancel() function to dynamically mark all seqcount read
regions. It can be used here too: start a read region and immediately
end it. Less clear than raw_read_seqcount_nocritical(), but also less
API surface.

/**
* read_seqcount_cancel - cancel a seq-read critical section
* @s: pointer to seqcount_t
*
* This is a no-op except for ktsan, it needs to know scopes of seq-read
* critical sections. The sections are denoted either by begin->retry or
* by begin->cancel.
*/
static inline void read_seqcount_cancel(const seqcount_t *s)
{
	ktsan_seqcount_end(s);
}

https://github.com/google/ktsan/blob/tsan/include/linux/seqlock.h#L235-L246
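If I understand the idea, the wait loop in xt_replace_table() would
then look roughly like this (my sketch, untested; only the
read_seqcount_cancel() call is new relative to the current code):

```c
for_each_possible_cpu(cpu) {
	seqcount_t *s = &per_cpu(xt_recseq, cpu);
	u32 seq = raw_read_seqcount(s);

	if (seq & 1) {
		do {
			cond_resched();
			cpu_relax();
		} while (seq == raw_read_seqcount(s));
	}
	read_seqcount_cancel(s);	/* close the section for KTSAN */
}
```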

2019-01-09 11:51:12

by Andrea Parri

Subject: Re: seqcount usage in xt_replace_table()

On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> Hello
>
> On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> <[email protected]> wrote:
> >
> > Hi Anatol,
> >
> > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > Hello folks,
> > >
> > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > and makes sure it is accessed in a thread-safe manner.
> >
> > Interesting! FYI, some LKMM's maintainers (Paul included) had and
> > continued to have some "fun" discussing topics related to "thread-
> > safe memory accesses": I'm sure that they'll be very interested in
> > such work of yours and eager to discuss your results.
>
> Thread Sanitizer is a great tool to find thread-safety issues with
> user-space code. The tool been developed by a team of smart people
> from Google [1].
>
> KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> bunch of work been done there but the project is still at
> proof-of-concept point.

Yes, I have been aware of these tools since at least ;-)

https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ


>
> I am not a part of Google's dynamic tools team. But I've decided to
> pick something to do during the New Year holidays so started porting
> KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> fix a few crashes [3].

I guess my first reaction would remain

"it's kind of hard (to use a euphemism) to review 7,582 additions
or so for a data race detector without a clear/an accepted (by the
community) notion of data race..."

Andrea


>
> [1] https://github.com/google/sanitizers
> [2] https://github.com/google/ktsan/wiki
> [3] https://github.com/anatol/linux/tree/ktsan_4.20

2019-01-09 11:58:14

by Dmitry Vyukov

Subject: Re: seqcount usage in xt_replace_table()

On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
<[email protected]> wrote:
>
> On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > Hello
> >
> > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > <[email protected]> wrote:
> > >
> > > Hi Anatol,
> > >
> > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > Hello folks,
> > > >
> > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > and makes sure it is accessed in a thread-safe manner.
> > >
> > > Interesting! FYI, some LKMM's maintainers (Paul included) had and
> > > continued to have some "fun" discussing topics related to "thread-
> > > safe memory accesses": I'm sure that they'll be very interested in
> > > such work of yours and eager to discuss your results.
> >
> > Thread Sanitizer is a great tool to find thread-safety issues with
> > user-space code. The tool been developed by a team of smart people
> > from Google [1].
> >
> > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > bunch of work been done there but the project is still at
> > proof-of-concept point.
>
> Yes, I have been aware of these tools since at least ;-)
>
> https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
>
>
> >
> > I am not a part of Google's dynamic tools team. But I've decided to
> > pick something to do during the New Year holidays so started porting
> > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > fix a few crashes [3].
>
> I guess my first reaction would remain
>
> "it's kind of hard (to use an euphemism) to review 7,582 additions
> or so for a data race detector without a clear/an accepted (by the
> community) notion of data race..."

TSan's notion of a data race is basically the C/C++ notion:
concurrent/unsynchronized non-atomic accesses in different threads, at
least one of which is a write.
The tremendous (for such a project) benefits of automatic data race
detection are a good motivation to finally agree on and accept a
practically useful notion of a data race.

2019-01-09 12:42:48

by Andrea Parri

Subject: Re: seqcount usage in xt_replace_table()

On Wed, Jan 09, 2019 at 12:55:27PM +0100, Dmitry Vyukov wrote:
> On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
> <[email protected]> wrote:
> >
> > On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > > Hello
> > >
> > > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > > <[email protected]> wrote:
> > > >
> > > > Hi Anatol,
> > > >
> > > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > > Hello folks,
> > > > >
> > > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > > and makes sure it is accessed in a thread-safe manner.
> > > >
> > > > Interesting! FYI, some LKMM's maintainers (Paul included) had and
> > > > continued to have some "fun" discussing topics related to "thread-
> > > > safe memory accesses": I'm sure that they'll be very interested in
> > > > such work of yours and eager to discuss your results.
> > >
> > > Thread Sanitizer is a great tool to find thread-safety issues with
> > > user-space code. The tool been developed by a team of smart people
> > > from Google [1].
> > >
> > > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > > bunch of work been done there but the project is still at
> > > proof-of-concept point.
> >
> > Yes, I have been aware of these tools since at least ;-)
> >
> > https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
> >
> >
> > >
> > > I am not a part of Google's dynamic tools team. But I've decided to
> > > pick something to do during the New Year holidays so started porting
> > > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > > fix a few crashes [3].
> >
> > I guess my first reaction would remain
> >
> > "it's kind of hard (to use an euphemism) to review 7,582 additions
> > or so for a data race detector without a clear/an accepted (by the
> > community) notion of data race..."
>
> Tsan's notion of a data race is basically the C/C++'s notion:
> concurrent/unsynchronized non-atomic access in different threads at
> least one of which is a write.

Yeah, I think that this notion needs to be detailed, discussed,
documented, and discussed again. ;-)


> Tremendous (for such a project) benefits of automatic data race
> detection is a good motivation to finally agree on and accept a
> practically useful notion of a data race.

Agreed.

Andrea

2019-01-09 13:20:16

by Dmitry Vyukov

Subject: Re: seqcount usage in xt_replace_table()

On Wed, Jan 9, 2019 at 1:11 PM Andrea Parri
<[email protected]> wrote:
>
> On Wed, Jan 09, 2019 at 12:55:27PM +0100, Dmitry Vyukov wrote:
> > On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
> > <[email protected]> wrote:
> > >
> > > On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > > > Hello
> > > >
> > > > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hi Anatol,
> > > > >
> > > > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > > > Hello folks,
> > > > > >
> > > > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > > > and makes sure it is accessed in a thread-safe manner.
> > > > >
> > > > > Interesting! FYI, some LKMM's maintainers (Paul included) had and
> > > > > continued to have some "fun" discussing topics related to "thread-
> > > > > safe memory accesses": I'm sure that they'll be very interested in
> > > > > such work of yours and eager to discuss your results.
> > > >
> > > > Thread Sanitizer is a great tool to find thread-safety issues with
> > > > user-space code. The tool been developed by a team of smart people
> > > > from Google [1].
> > > >
> > > > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > > > bunch of work been done there but the project is still at
> > > > proof-of-concept point.
> > >
> > > Yes, I have been aware of these tools since at least ;-)
> > >
> > > https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
> > >
> > >
> > > >
> > > > I am not a part of Google's dynamic tools team. But I've decided to
> > > > pick something to do during the New Year holidays so started porting
> > > > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > > > fix a few crashes [3].
> > >
> > > I guess my first reaction would remain
> > >
> > > "it's kind of hard (to use an euphemism) to review 7,582 additions
> > > or so for a data race detector without a clear/an accepted (by the
> > > community) notion of data race..."
> >
> > Tsan's notion of a data race is basically the C/C++'s notion:
> > concurrent/unsynchronized non-atomic access in different threads at
> > least one of which is a write.
>
> Yeah, I think that this notion needs to be detailed, discussed,
> documented, and discussed again. ;-)
>
>
> > Tremendous (for such a project) benefits of automatic data race
> > detection is a good motivation to finally agree on and accept a
> > practically useful notion of a data race.
>
> Agreed.


While having a 100% formal definition of a data race upfront would be
useful, I don't think this is a hard requirement for deployment of
KTSAN. What I think is required is:
1. Agree that the overall direction is right.
2. Agree that we want to enable data race detection and resolve
problems as they appear in a practical manner (rather than block the
whole effort on every small thing).
We deployed TSAN in user-space code bases much larger than the kernel,
and while we had the C/C++ formal definition of a data race, practical
and legacy matters were similar to those of the kernel (lots of legacy
code, different opinions, etc). Doing both things in tandem (defining
a memory model and deploying a data race detector) can actually have
benefits, as a race detector may point to under-defined or
impractically defined areas, and will otherwise help to validate that
the model works and is useful.
KTSAN is not fixed either: we adapted it as we gathered more
knowledge and understanding of the kernel. So it's not that we have to
commit to something upfront.

2019-01-09 17:13:18

by Paul E. McKenney

Subject: Re: seqcount usage in xt_replace_table()

On Wed, Jan 09, 2019 at 01:29:02PM +0100, Dmitry Vyukov wrote:
> On Wed, Jan 9, 2019 at 1:11 PM Andrea Parri
> <[email protected]> wrote:
> >
> > On Wed, Jan 09, 2019 at 12:55:27PM +0100, Dmitry Vyukov wrote:
> > > On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
> > > <[email protected]> wrote:
> > > >
> > > > On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > > > > Hello
> > > > >
> > > > > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hi Anatol,
> > > > > >
> > > > > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > > > > Hello folks,
> > > > > > >
> > > > > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > > > > and makes sure it is accessed in a thread-safe manner.
> > > > > >
> > > > > > Interesting! FYI, some LKMM's maintainers (Paul included) had and
> > > > > > continued to have some "fun" discussing topics related to "thread-
> > > > > > safe memory accesses": I'm sure that they'll be very interested in
> > > > > > such work of yours and eager to discuss your results.
> > > > >
> > > > > Thread Sanitizer is a great tool to find thread-safety issues with
> > > > > user-space code. The tool been developed by a team of smart people
> > > > > from Google [1].
> > > > >
> > > > > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > > > > bunch of work been done there but the project is still at
> > > > > proof-of-concept point.
> > > >
> > > > Yes, I have been aware of these tools since at least ;-)
> > > >
> > > > https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
> > > >
> > > >
> > > > >
> > > > > I am not a part of Google's dynamic tools team. But I've decided to
> > > > > pick something to do during the New Year holidays so started porting
> > > > > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > > > > fix a few crashes [3].
> > > >
> > > > I guess my first reaction would remain
> > > >
> > > > "it's kind of hard (to use an euphemism) to review 7,582 additions
> > > > or so for a data race detector without a clear/an accepted (by the
> > > > community) notion of data race..."
> > >
> > > Tsan's notion of a data race is basically the C/C++'s notion:
> > > concurrent/unsynchronized non-atomic access in different threads at
> > > least one of which is a write.
> >
> > Yeah, I think that this notion needs to be detailed, discussed,
> > documented, and discussed again. ;-)
> >
> >
> > > Tremendous (for such a project) benefits of automatic data race
> > > detection is a good motivation to finally agree on and accept a
> > > practically useful notion of a data race.
> >
> > Agreed.
>
> While having a 100% formal definition of a data race upfront would be
> useful, I don't think this is a hard requirement for deployment of
> KTSAN. What I think is required is:
> 1. Agree that the overall direction is right.
> 2. Agree that we want to enable data race detection and resolve
> problems as they appear in a practical manner (rather than block whole
> effort on every small thing).
> We deployed TSAN in user-space in much larger code bases than kernel,
> and while we had the C/C++ formal definition of a data race, practical
> and legacy matters were similar to that of the kernel (lots of legacy
> code, different opinions, etc). Doing both things in tandem (defining
> a memory model and deploying a data race detector) can actually have
> benefits as a race detector may point to under-defined or
> impractically defined areas, and will otherwise help to validate that
> the model works and is useful.
> KTSAN is not fixed as well. We adopted it as we gathered more
> knowledge and understanding of the kernel. So it's not that we have to
> commit to something upfront.

In any case, there might well be some differences in approach between
KTSAN and LKMM due to input size differences: One would expect LKMM
to be able to tolerate a more computationally intensive definition as
a consequence of KTSAN's ability to process much larger code bases.

But I nevertheless believe that it would be good to have these differences
be a matter of conscious choice rather than a matter of chance. ;-)

My guess is that LKMM picks its starting point (which might take some
additional time), then KTSAN critiques it, and then we work out what
differences should result in a change to one or the other (or both)
and which differences are inherent in the different workloads that LKMM
and KTSAN are presented with.

Seem reasonable?

Thanx, Paul


2019-01-10 10:46:35

by Dmitry Vyukov

Subject: Re: seqcount usage in xt_replace_table()

On Wed, Jan 9, 2019 at 6:11 PM Paul E. McKenney <[email protected]> wrote:
>
> On Wed, Jan 09, 2019 at 01:29:02PM +0100, Dmitry Vyukov wrote:
> > On Wed, Jan 9, 2019 at 1:11 PM Andrea Parri
> > <[email protected]> wrote:
> > >
> > > On Wed, Jan 09, 2019 at 12:55:27PM +0100, Dmitry Vyukov wrote:
> > > > On Wed, Jan 9, 2019 at 12:24 PM Andrea Parri
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Tue, Jan 08, 2019 at 04:36:46PM -0800, Anatol Pomozov wrote:
> > > > > > Hello
> > > > > >
> > > > > > On Tue, Jan 8, 2019 at 4:02 PM Andrea Parri
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Anatol,
> > > > > > >
> > > > > > > On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > > > > > > > Hello folks,
> > > > > > > >
> > > > > > > > A bit of context what I am doing. I am trying to port KTSAN (Kernel
> > > > > > > > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > > > > > > > and makes sure it is accessed in a thread-safe manner.
> > > > > > >
> > > > > > > Interesting! FYI, some LKMM's maintainers (Paul included) had and
> > > > > > > continued to have some "fun" discussing topics related to "thread-
> > > > > > > safe memory accesses": I'm sure that they'll be very interested in
> > > > > > > such work of yours and eager to discuss your results.
> > > > > >
> > > > > > Thread Sanitizer is a great tool to find thread-safety issues with
> > > > > > user-space code. The tool been developed by a team of smart people
> > > > > > from Google [1].
> > > > > >
> > > > > > KTSAN is an attempt to bring the same ideas to Linux kernel [2]. A
> > > > > > bunch of work been done there but the project is still at
> > > > > > proof-of-concept point.
> > > > >
> > > > > Yes, I have been aware of these tools since at least ;-)
> > > > >
> > > > > https://groups.google.com/forum/#!msg/ktsan/bVZ1c6H2NE0/Dxrw55bfBAAJ
> > > > >
> > > > >
> > > > > >
> > > > > > I am not a part of Google's dynamic tools team. But I've decided to
> > > > > > pick something to do during the New Year holidays so started porting
> > > > > > KTSAN from v4.2 to v4.20. The work is "almost completed" but I need to
> > > > > > fix a few crashes [3].
> > > > >
> > > > > I guess my first reaction would remain
> > > > >
> > > > > "it's kind of hard (to use an euphemism) to review 7,582 additions
> > > > > or so for a data race detector without a clear/an accepted (by the
> > > > > community) notion of data race..."
> > > >
> > > > Tsan's notion of a data race is basically the C/C++'s notion:
> > > > concurrent/unsynchronized non-atomic access in different threads at
> > > > least one of which is a write.
> > >
> > > Yeah, I think that this notion needs to be detailed, discussed,
> > > documented, and discussed again. ;-)
> > >
> > >
> > > > Tremendous (for such a project) benefits of automatic data race
> > > > detection is a good motivation to finally agree on and accept a
> > > > practically useful notion of a data race.
> > >
> > > Agreed.
> >
> > While having a 100% formal definition of a data race upfront would be
> > useful, I don't think this is a hard requirement for deployment of
> > KTSAN. What I think is required is:
> > 1. Agree that the overall direction is right.
> > 2. Agree that we want to enable data race detection and resolve
> > problems as they appear in a practical manner (rather than block whole
> > effort on every small thing).
> > We deployed TSAN in user-space in much larger code bases than kernel,
> > and while we had the C/C++ formal definition of a data race, practical
> > and legacy matters were similar to that of the kernel (lots of legacy
> > code, different opinions, etc). Doing both things in tandem (defining
> > a memory model and deploying a data race detector) can actually have
> > benefits as a race detector may point to under-defined or
> > impractically defined areas, and will otherwise help to validate that
> > the model works and is useful.
> > KTSAN is not fixed as well. We adopted it as we gathered more
> > knowledge and understanding of the kernel. So it's not that we have to
> > commit to something upfront.
>
> In any case, there might well be some differences in approach between
> KTSAN and LKMM due to input size differences: One would expect LKMM
> to be able to tolerate a more computationally intensive definition as
> a consequence of KTSAN's ability to process much larger code bases.
>
> But I nevertheless believe that it would be good to have these differences
> be a matter of conscious choice rather than a matter of chance. ;-)
>
> My guess is that LKMM picks its starting point (which might take some
> additional time), then KTSAN critiques it, and then we work out what
> differences should result in a change to one or the other (or both)
> and which differences are inherent in the different workloads that LKMM
> and KTSAN are presented with.
>
> Seem reasonable?

Sounds reasonable.

For seqcounts we currently simply ignore all accesses within the read
section (thus the requirement to dynamically track read sections).
What does LKMM say about seqlocks?

2019-01-10 12:40:49

by Dmitry Vyukov

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 1:30 PM Andrea Parri
<[email protected]> wrote:
>
> > For seqcounts we currently simply ignore all accesses within the read
> > section (thus the requirement to dynamically track read sections).
> > What does LKMM say about seqlocks?
>
> LKMM does not currently model seqlocks, if that's what you're asking;
> c.f., tools/memory-model/linux-kernel.def for a list of the currently
> supported synchronization primitives.
>
> LKMM has also no notion of "data race", it insists that the code must
> contain no unmarked accesses; we have been discussing such extensions
> since at least Dec'17 (we're not quite there!, as mentioned by Paul).

What does it call cases that do contain unmarked accesses then? :)

> My opinion is that ignoring all accesses within a given read section
> _can_ lead to false negatives

Absolutely. But this is a deliberate decision.
For our tools we consider priority 1: no false positives. Period.
Priority 2: also report some true positives, in a best-effort manner.

> (in every possible definition of "data
> race" and "read sections" I can think of at the moment ;D):
>
> P0 P1
> read_seqbegin() x = 1;
> r0 = x;
> read_seqretry() // =0
>
> ought to be "racy"..., right? (I didn't audit all the callsites for
> read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
> pattern ;D ... "legacy", as you recalled).
>
> Andrea

2019-01-10 12:42:51

by Peter Zijlstra

Subject: Re: seqcount usage in xt_replace_table()

On Tue, Jan 08, 2019 at 11:37:46PM +0100, Florian Westphal wrote:
> Anatol Pomozov <[email protected]> wrote:
> > Or maybe xt_replace_table() can be enhanced? When I hear that
> > something waits until an event happens on all CPUs I think about
> > wait_event() function. Would it be better for xt_replace_table() to
> > introduce an atomic counter that is decremented by CPUs, and the main
> > CPU waits until the counter gets zero?
>
> That would mean placing an additional atomic op into the
> iptables evaluation path (ipt_do_table and friends).
>

For:

	/*
	 * Ensure contents of newinfo are visible before assigning to
	 * private.
	 */
	smp_wmb();
	table->private = newinfo;

we have:

	smp_store_release(&table->private, newinfo);

But what store does that second smp_wmb() order against? The comment:

	/* make sure all cpus see new ->private value */
	smp_wmb();

makes no sense whatsoever; no smp_*() barrier can provide such
guarantees.

> Only alternative I see that might work is synchronize_rcu (the
> _do_table functions are called with rcu read lock held).
>
> I guess current scheme is cheaper though.

Is performance a concern in this path? There is no comment justifying
this 'creative' stuff.


2019-01-10 12:46:52

by Peter Zijlstra

Subject: Re: seqcount usage in xt_replace_table()

On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> Hello folks,
>
> A bit of context what I am doing. I am trying to port KTSAN (Kernel
> Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> and makes sure it is accessed in a thread-safe manner.
>
> seqlock is a synchronization primitive used by Linux kernel. KTSAN
> annotates read_seqbegin()/read_seqretry() and tracks what data been
> accessed in its critical section.
>
> During KTSAN port I found and interesting seqcount usage introduced in
> commit 80055dab5de0c8677bc148c4717ddfc753a9148e
>
> If I read this commit correctly xt_replace_table() does not use
> seqlock in a canonical way to specify a critical section. Instead the
> code reads the counter and waits until it gets to a specific value.

(gets away from)

> Now I want KTSAN to play with this code nicely. I need to tell KTSAN
> something like "this raw_read_seqcount() does not start a critical
> section, just ignore it". So temporary I introduced
> raw_read_seqcount_nocritical() function that is ignored by KTSAN. Is
> it a good solution?

This code is special enough to just do: READ_ONCE(->sequence) and be
done with it. It doesn't need the smp_rmb() or anything else.
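In other words (my sketch, untested), the wait loop could bypass the
seqcount read API entirely:

```c
for_each_possible_cpu(cpu) {
	seqcount_t *s = &per_cpu(xt_recseq, cpu);
	u32 seq = READ_ONCE(s->sequence);   /* no read section, no smp_rmb() */

	if (seq & 1) {
		do {
			cond_resched();
			cpu_relax();
		} while (seq == READ_ONCE(s->sequence));
	}
}
```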

2019-01-10 12:49:48

by Andrea Parri

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 01:38:11PM +0100, Dmitry Vyukov wrote:
> On Thu, Jan 10, 2019 at 1:30 PM Andrea Parri
> <[email protected]> wrote:
> >
> > > For seqcounts we currently simply ignore all accesses within the read
> > > section (thus the requirement to dynamically track read sections).
> > > What does LKMM say about seqlocks?
> >
> > LKMM does not currently model seqlocks, if that's what you're asking;
> > c.f., tools/memory-model/linux-kernel.def for a list of the currently
> > supported synchronization primitives.
> >
> > LKMM has also no notion of "data race", it insists that the code must
> > contain no unmarked accesses; we have been discussing such extensions
> > since at least Dec'17 (we're not quite there!, as mentioned by Paul).
>
> How does it call cases that do contain unmarked accesses then? :)

"work-in-progress" ;), or "limitation" (see tools/memory-model/README)


>
> > My opinion is that ignoring all accesses within a given read section
> > _can_ lead to false negatives
>
> Absolutely. But this is a deliberate decision.
> For our tools we consider priority 1: no false positives. Period.
> Priority 2: also report some true positives in best effort manner.

This sounds reasonable to me. But please don't overlook the fact that
to be able to talk about "false positive" and "false negative" (for a
data race detector) we need to agree about "what a data race is".

(The hope, of course, is that the LKMM will have a say soon here ...)

Andrea


>
> > (in every possible definition of "data
> > race" and "read sections" I can think of at the moment ;D):
> >
> > P0 P1
> > read_seqbegin() x = 1;
> > r0 = x;
> > read_seqretry() // =0
> >
> > ought to be "racy"..., right? (I didn't audit all the callsites for
> > read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
> > pattern ;D ... "legacy", as you recalled).
> >
> > Andrea

2019-01-10 12:55:10

by Dmitry Vyukov

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 1:41 PM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Jan 08, 2019 at 11:37:46PM +0100, Florian Westphal wrote:
> > Anatol Pomozov <[email protected]> wrote:
> > > Or maybe xt_replace_table() can be enhanced? When I hear that
> > > something waits until an event happens on all CPUs I think about
> > > wait_event() function. Would it be better for xt_replace_table() to
> > > introduce an atomic counter that is decremented by CPUs, and the main
> > > CPU waits until the counter gets zero?
> >
> > That would mean placing an additional atomic op into the
> > iptables evaluation path (ipt_do_table and friends).
> >
>
> For:
>
> /*
> * Ensure contents of newinfo are visible before assigning to
> * private.
> */
> smp_wmb();
> table->private = newinfo;
>
> we have:
>
> smp_store_release(&table->private, newinfo);
>
> But what store does that second smp_wmb() order against? The comment:
>
> /* make sure all cpus see new ->private value */
> smp_wmb();
>
> makes no sense whatsoever, no smp_*() barrier can provide such
> guarantees.

Do we want WRITE_ONCE here then?
We want it to be compiled to an actual memory access and then it's up
to hardware to make it visible to other CPUs.
smp_wmb() should most likely have this as a side effect too, but
only somewhat indirectly.
Also race-detector-friendly.


> > Only alternative I see that might work is synchronize_rcu (the
> > _do_table functions are called with rcu read lock held).
> >
> > I guess current scheme is cheaper though.
>
> Is performance a concern in this path? There is no comment justifying
> this 'creative' stuff.
>

2019-01-10 12:57:40

by Dmitry Vyukov

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 1:44 PM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Jan 08, 2019 at 11:33:39AM -0800, Anatol Pomozov wrote:
> > Hello folks,
> >
> > A bit of context on what I am doing. I am trying to port KTSAN (Kernel
> > Thread Sanitizer) tool to v4.20. That tool tracks shared data usage
> > and makes sure it is accessed in a thread-safe manner.
> >
> > seqlock is a synchronization primitive used by the Linux kernel. KTSAN
> > annotates read_seqbegin()/read_seqretry() and tracks what data has been
> > accessed in its critical sections.
> >
> > During the KTSAN port I found an interesting seqcount usage introduced in
> > commit 80055dab5de0c8677bc148c4717ddfc753a9148e
> >
> > If I read this commit correctly xt_replace_table() does not use
> > seqlock in a canonical way to specify a critical section. Instead the
> > code reads the counter and waits until it gets to a specific value.
>
> (gets away from)
>
> > Now I want KTSAN to play with this code nicely. I need to tell KTSAN
> > something like "this raw_read_seqcount() does not start a critical
> > section, just ignore it". So temporarily I introduced a
> > raw_read_seqcount_nocritical() function that is ignored by KTSAN. Is
> > it a good solution?
>
> This code is special enough to just do: READ_ONCE(->sequence) and be
> done with it. It doesn't need the smp_rmb() or anything else.

Sounds good to me.
From the KTSAN perspective it should then work without any additional
dance; it's always good when code works as-is.

2019-01-10 13:27:04

by Dmitry Vyukov

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 1:47 PM Andrea Parri
<[email protected]> wrote:
>
> On Thu, Jan 10, 2019 at 01:38:11PM +0100, Dmitry Vyukov wrote:
> > On Thu, Jan 10, 2019 at 1:30 PM Andrea Parri
> > <[email protected]> wrote:
> > >
> > > > For seqcounts we currently simply ignore all accesses within the read
> > > > section (thus the requirement to dynamically track read sections).
> > > > What does LKMM say about seqlocks?
> > >
> > > LKMM does not currently model seqlocks, if that's what you're asking;
> > > c.f., tools/memory-model/linux-kernel.def for a list of the currently
> > > supported synchronization primitives.
> > >
> > > LKMM has also no notion of "data race", it insists that the code must
> > > contain no unmarked accesses; we have been discussing such extensions
> > > since at least Dec'17 (we're not quite there!, as mentioned by Paul).
> >
> > What does it call cases that do contain unmarked accesses, then? :)
>
> "work-in-progress" ;), or "limitation" (see tools/memory-model/README)

Let's call it /data race/ interim then :)
Which also has literally undefined behavior in LKMM. Which now
precisely matches the implementation language (C) definitions. Which
is nice.

> > > My opinion is that ignoring all accesses within a given read section
> > > _can_ lead to false negatives
> >
> > Absolutely. But this is a deliberate decision.
> > For our tools we consider priority 1: no false positives. Period.
> > Priority 2: also report some true positives in best effort manner.
>
> This sounds reasonable to me. But please don't overlook the fact that
> to be able to talk about "false positive" and "false negative" (for a
> data race detector) we need to agree about "what a data race is".

Having a formal model would be undoubtedly good.
But in practice things are much simpler. The complex cases that the
majority of LKMM deals with are <<1% of kernel concurrency. The
majority of kernel cases are "no concurrent accesses at all", "always
protected by a mutex", "passed as argument to a new thread", and "the
canonical store-release/load-acquire synchronization". For these I
hope there is no controversy across C, POSIX, gcc, clang, and the kernel.
Handling these cases in a race detector brings 99.9% of the benefit. And
for more complex cases (like seqlock) we can always approximate as "no
races there", which inevitably satisfies our priorities (if you report
nothing, you report no false positives).

But I am much more concerned about actual kernel code and behavior wrt
a memory model. We are talking about the interaction LKMM <-> KTSAN,
when a far more important question is LKMM <-> actual kernel
behavior. KTSAN is really a secondary thing in this picture. So if
anything needs a memory model, or needs to be blocked on a memory
model, it's writing kernel code ;)


> (The hope, of course, is that the LKMM will have a say soon here ...)
>
> Andrea
>
>
> >
> > > (in every possible definition of "data
> > > race" and "read sections" I can think of at the moment ;D):
> > >
> > > P0 P1
> > > read_seqbegin() x = 1;
> > > r0 = x;
> > > read_seqretry() // =0
> > >
> > > ought to be "racy"..., right? (I didn't audit all the callsites for
> > > read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
> > > pattern ;D ... "legacy", as you recalled).
> > >
> > > Andrea

2019-01-10 13:52:07

by Andrea Parri

Subject: Re: seqcount usage in xt_replace_table()

> For seqcounts we currently simply ignore all accesses within the read
> section (thus the requirement to dynamically track read sections).
> What does LKMM say about seqlocks?

LKMM does not currently model seqlocks, if that's what you're asking;
c.f., tools/memory-model/linux-kernel.def for a list of the currently
supported synchronization primitives.

LKMM has also no notion of "data race", it insists that the code must
contain no unmarked accesses; we have been discussing such extensions
since at least Dec'17 (we're not quite there!, as mentioned by Paul).

My opinion is that ignoring all accesses within a given read section
_can_ lead to false negatives (in every possible definition of "data
race" and "read sections" I can think of at the moment ;D):

P0 P1
read_seqbegin() x = 1;
r0 = x;
read_seqretry() // =0

ought to be "racy"..., right? (I didn't audit all the callsites for
read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
pattern ;D ... "legacy", as you recalled).

Andrea

2019-01-10 14:53:00

by Paul E. McKenney

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 01:38:11PM +0100, Dmitry Vyukov wrote:
> On Thu, Jan 10, 2019 at 1:30 PM Andrea Parri
> <[email protected]> wrote:
> >
> > > For seqcounts we currently simply ignore all accesses within the read
> > > section (thus the requirement to dynamically track read sections).
> > > What does LKMM say about seqlocks?
> >
> > LKMM does not currently model seqlocks, if that's what you're asking;
> > c.f., tools/memory-model/linux-kernel.def for a list of the currently
> > supported synchronization primitives.
> >
> > LKMM has also no notion of "data race", it insists that the code must
> > contain no unmarked accesses; we have been discussing such extensions
> > since at least Dec'17 (we're not quite there!, as mentioned by Paul).
>
> What does it call cases that do contain unmarked accesses, then? :)
>
> > My opinion is that ignoring all accesses within a given read section
> > _can_ lead to false negatives
>
> Absolutely. But this is a deliberate decision.
> For our tools we consider priority 1: no false positives. Period.
> Priority 2: also report some true positives in best effort manner.
>
> > (in every possible definition of "data
> > race" and "read sections" I can think of at the moment ;D):
> >
> > P0 P1
> > read_seqbegin() x = 1;
> > r0 = x;
> > read_seqretry() // =0
> >
> > ought to be "racy"..., right? (I didn't audit all the callsites for
> > read_{seqbegin,seqretry}(), but I wouldn't be surprised to find such
> > pattern ;D ... "legacy", as you recalled).

One approach would be to forgive data races in the seqlock read-side
critical section only if:

o There was a later matching read_seqretry() that returned true, and

o There were no dereferences of any data-racy load. (Yeah, this
one should be good clean fun to model!)

Do people nest read_seqbegin(), and if so, what does that mean? ;-)

Thanx, Paul


2019-01-10 14:55:44

by Paul E. McKenney

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 01:41:23PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 08, 2019 at 11:37:46PM +0100, Florian Westphal wrote:
> > Anatol Pomozov <[email protected]> wrote:
> > > Or maybe xt_replace_table() can be enhanced? When I hear that
> > > something waits until an event happens on all CPUs I think about
> > > wait_event() function. Would it be better for xt_replace_table() to
> > > introduce an atomic counter that is decremented by CPUs, and the main
> > > CPU waits until the counter gets zero?
> >
> > That would mean placing an additional atomic op into the
> > iptables evaluation path (ipt_do_table and friends).
> >
>
> For:
>
> /*
> * Ensure contents of newinfo are visible before assigning to
> * private.
> */
> smp_wmb();
> table->private = newinfo;
>
> we have:
>
> smp_store_release(&table->private, newinfo);
>
> But what store does that second smp_wmb() order against? The comment:
>
> /* make sure all cpus see new ->private value */
> smp_wmb();
>
> makes no sense whatsoever, no smp_*() barrier can provide such
> guarantees.

Agreed, this would require something like synchronize_rcu() or some
sort of IPI-based sys_membarrier() lookalike.

Thanx, Paul

> > Only alternative I see that might work is synchronize_rcu (the
> > _do_table functions are called with rcu read lock held).
> >
> > I guess current scheme is cheaper though.
>
> Is performance a concern in this path? There is no comment justifying
> this 'creative' stuff.
>


2019-01-10 16:27:51

by Florian Westphal

Subject: Re: seqcount usage in xt_replace_table()

Peter Zijlstra <[email protected]> wrote:
> /*
> * Ensure contents of newinfo are visible before assigning to
> * private.
> */
> smp_wmb();
> table->private = newinfo;
>
> we have:
>
> smp_store_release(&table->private, newinfo);
>
> But what store does that second smp_wmb() order against? The comment:
>
> /* make sure all cpus see new ->private value */
> smp_wmb();
>
> makes no sense whatsoever, no smp_*() barrier can provide such
> guarantees.

IIRC I added this at the request of a reviewer of an earlier iteration
of that patch.

IIRC the concern was that compiler/hw could re-order

smp_wmb();
table->private = newinfo;
/* wait until all cpus are done with old table */

into:

smp_wmb();
/* wait until all cpus are done with old table */
...
table->private = newinfo;

and that (obviously) makes the wait-loop useless.

> > Only alternative I see that might work is synchronize_rcu (the
> > _do_table functions are called with rcu read lock held).
> >
> > I guess current scheme is cheaper though.
>
> Is performance a concern in this path? There is no comment justifying
> this 'creative' stuff.

We have to wait until all cpus are done with the current iptables ruleset.

Before this 'creative' change, this relied on get_counters
synchronization. And that caused wait times of 30 seconds or more on
busy systems.

I have no objections swapping this with a synchronize_rcu() if that
makes it easier.

(synchronize_rcu might be confusing though, as we don't use rcu
for table->private, and one 'has to know' that all the netfilter
hooks, including the iptables eval loop, run with rcu_read_lock
held.)

2019-01-10 21:18:57

by Peter Zijlstra

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 03:48:12PM +0100, Florian Westphal wrote:
> Peter Zijlstra <[email protected]> wrote:
> > /*
> > * Ensure contents of newinfo are visible before assigning to
> > * private.
> > */
> > smp_wmb();
> > table->private = newinfo;
> >
> > we have:
> >
> > smp_store_release(&table->private, newinfo);
> >
> > But what store does that second smp_wmb() order against? The comment:
> >
> > /* make sure all cpus see new ->private value */
> > smp_wmb();
> >
> > makes no sense whatsoever, no smp_*() barrier can provide such
> > guarantees.
>
> IIRC I added this at the request of a reviewer of an earlier iteration
> of that patch.
>
> IIRC the concern was that compiler/hw could re-order
>
> smp_wmb();
> table->private = newinfo;
> /* wait until all cpus are done with old table */
>
> into:
>
> smp_wmb();
> /* wait until all cpus are done with old table */
> ...
> table->private = newinfo;
>
> and that (obviously) makes the wait-loop useless.

The thing is, the 'wait for all cpus' thing is pure loads, not stores,
so smp_wmb() is a complete NOP there.

If you want to ensure those loads happen after that store (which does
indeed seem like a sensible thing), you're going to have to use
smp_mb().
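
Concretely, the ordering fix Peter describes would land in xt_replace_table() roughly as follows. This is a sketch against the pattern introduced in commit 80055dab, not a tested patch; it also folds in the smp_store_release() and READ_ONCE() simplifications discussed earlier in this thread:

```c
/* Publish the new ruleset; the release ensures newinfo's contents
 * are visible before the pointer itself. */
smp_store_release(&table->private, newinfo);

/* Order the store above before the pure-load wait loop below;
 * smp_wmb() cannot do this, only a full barrier can. */
smp_mb();

for_each_possible_cpu(cpu) {
	seqcount_t *s = &per_cpu(xt_recseq, cpu);
	u32 seq = READ_ONCE(s->sequence);	/* no smp_rmb() needed */

	if (seq & 1) {				/* CPU is mid-traversal */
		do {
			cond_resched();
			cpu_relax();
		} while (seq == READ_ONCE(s->sequence));
	}
}
```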

2019-01-10 21:19:39

by Peter Zijlstra

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 01:53:28PM +0100, Dmitry Vyukov wrote:
> On Thu, Jan 10, 2019 at 1:41 PM Peter Zijlstra <[email protected]> wrote:
> >
> > On Tue, Jan 08, 2019 at 11:37:46PM +0100, Florian Westphal wrote:
> > > Anatol Pomozov <[email protected]> wrote:
> > > > Or maybe xt_replace_table() can be enhanced? When I hear that
> > > > something waits until an event happens on all CPUs I think about
> > > > wait_event() function. Would it be better for xt_replace_table() to
> > > > introduce an atomic counter that is decremented by CPUs, and the main
> > > > CPU waits until the counter gets zero?
> > >
> > > That would mean placing an additional atomic op into the
> > > iptables evaluation path (ipt_do_table and friends).
> > >
> >
> > For:
> >
> > /*
> > * Ensure contents of newinfo are visible before assigning to
> > * private.
> > */
> > smp_wmb();
> > table->private = newinfo;
> >
> > we have:
> >
> > smp_store_release(&table->private, newinfo);
> >
> > But what store does that second smp_wmb() order against? The comment:
> >
> > /* make sure all cpus see new ->private value */
> > smp_wmb();
> >
> > makes no sense whatsoever, no smp_*() barrier can provide such
> > guarantees.
>
> Do we want WRITE_ONCE here then?

The smp_store_release() already implies WRITE_ONCE().

2019-01-10 21:20:40

by Peter Zijlstra

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 03:48:12PM +0100, Florian Westphal wrote:
> Peter Zijlstra <[email protected]> wrote:
> > Is performance a concern in this path? There is no comment justifying
> > this 'creative' stuff.
>
> We have to wait until all cpus are done with current iptables ruleset.
>
> Before this 'creative' change, this relied on get_counters
> synchronization. And that caused wait times of 30 seconds or more on
> busy systems.
>
> I have no objections swapping this with a synchronize_rcu() if that
> makes it easier.

Would using synchronize_rcu() not also mean you can get rid of that
xt_write_recseq*() stuff entirely?

Anyway, synchronize_rcu() can also take a little while, but I don't
think anywhere near 30 seconds.

> (synchronize_rcu might be confusing though, as we don't use rcu
> for table->private, and one 'has to know' that all the netfilter
> hooks, including the iptables eval loop, run with rcu_read_lock
> held.)

A comment can help there, right?

2019-01-11 00:23:35

by Florian Westphal

Subject: Re: seqcount usage in xt_replace_table()

Peter Zijlstra <[email protected]> wrote:
> Would using synchronize_rcu() not also mean you can get rid of that
> xt_write_recseq*() stuff entirely?

No, because those are used to synchronize with cpus that read
the ruleset counters, see

net/ipv4/netfilter/ip_tables.c:get_counters().

> Anyway, synchronize_rcu() can also take a little while, but I don't
> think anywhere near 30 seconds.

Ok, I think in that case it would be best to just replace the
recseq value sampling with smp_mb + synchronize_rcu, plus a comment
that explains why it's done.
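
For comparison, the synchronize_rcu()-based variant being discussed would look something like the sketch below (not a tested patch; it assumes, as noted above, that ipt_do_table() and the other _do_table functions always run under rcu_read_lock()):

```c
private = table->private;

/* Publish the new ruleset; contents become visible before the
 * pointer. */
smp_store_release(&table->private, newinfo);

/* table->private is not RCU-managed as such, but every netfilter
 * hook, including the iptables eval loop, runs with rcu_read_lock()
 * held.  Once this returns, no CPU can still be traversing the old
 * ruleset. */
synchronize_rcu();

return private;		/* old table, now safe to read counters from */
```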

2019-01-11 09:15:52

by Peter Zijlstra

Subject: Re: seqcount usage in xt_replace_table()

On Thu, Jan 10, 2019 at 11:29:20PM +0100, Florian Westphal wrote:
> Peter Zijlstra <[email protected]> wrote:
> > Would using synchronize_rcu() not also mean you can get rid of that
> > xt_write_recseq*() stuff entirely?
>
> No, because those are used to synchronize with cpus that read
> the ruleset counters, see
>
> net/ipv4/netfilter/ip_tables.c:get_counters().

Ah, bummer :/

> > Anyway, synchronize_rcu() can also take a little while, but I don't
> > think anywhere near 30 seconds.
>
> Ok, I think in that case it would be best to just replace the
> recseq value sampling with smp_mb + synchronize_rcu plus a comment
> that explains why it's done.

synchronize_rcu() implies smp_mb() on all CPUs.

2019-01-11 17:57:12

by Paul E. McKenney

Subject: Re: seqcount usage in xt_replace_table()

On Fri, Jan 11, 2019 at 09:34:11AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 10, 2019 at 11:29:20PM +0100, Florian Westphal wrote:
> > Peter Zijlstra <[email protected]> wrote:
> > > Would using synchronize_rcu() not also mean you can get rid of that
> > > xt_write_recseq*() stuff entirely?
> >
> > No, because those are used to synchronize with cpus that read
> > the ruleset counters, see
> >
> > net/ipv4/netfilter/ip_tables.c:get_counters().
>
> Ah, bummer :/
>
> > > Anyway, synchronize_rcu() can also take a little while, but I don't
> > > think anywhere near 30 seconds.
> >
> > Ok, I think in that case it would be best to just replace the
> > recseq value sampling with smp_mb + synchronize_rcu plus a comment
> > that explains why it's done.
>
> synchronize_rcu() implies smp_mb() on all CPUs.

Yes, it does, but in the case of idle CPUs, the smp_mb() calls are only
required to follow any pre-existing RCU read-side critical section on
the one hand, and precede any RCU read-side critical section completing
after the synchronize_rcu() on the other.

To do more would mean waking up idle CPUs, which does not make the
battery-powered guys happy. ;-)

Thanx, Paul