2022-08-19 12:48:22

by Florian Westphal

Subject: Re: data-race in nf_tables_newtable / nf_tables_newtable

Abhishek Shah <[email protected]> wrote:
> Hi all,
>
> We found a race involving the table->handle variable here
> <https://elixir.bootlin.com/linux/v5.18-rc5/source/net/netfilter/nf_tables_api.c#L1221>.
> This race advances the pointer, which can cause out-of-bounds memory
> accesses in the future. Please let us know what you think.
>
> Thanks!
>
>
> *---------------------Report-----------------*
> *read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0:
> nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
> nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]

[..]

> *read-write* to 0xffffffff883a01e8 of 8 bytes by task 6541 on cpu 1:
> nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
> nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]

[..]

I don't understand. Like all batch operations, nf_tables_newtable is
supposed to run with the transaction mutex held, i.e. parallel execution
is not expected.

There is a lockdep assertion at the start of nf_tables_newtable(); I
don't see how it's possible that two threads can run this concurrently.
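The locking rule described above can be sketched as a userspace analogue. This is a hypothetical model, not kernel code: `struct owned_mutex`, `owned_mutex_held_by_me()` and `create_table()` are illustrative names, and the assertion only stands in for lockdep_assert_held(). Such a check proves that the caller holds the particular mutex it was handed, not that every caller shares that same mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mutex wrapper that remembers its owner so a
 * lockdep-style assertion can be made (not a kernel API). */
struct owned_mutex {
	pthread_mutex_t lock;
	pthread_t owner;
	bool held;
};

static void owned_mutex_lock(struct owned_mutex *m)
{
	pthread_mutex_lock(&m->lock);
	m->owner = pthread_self();
	m->held = true;
}

static void owned_mutex_unlock(struct owned_mutex *m)
{
	m->held = false;
	pthread_mutex_unlock(&m->lock);
}

/* Rough stand-in for lockdep_assert_held(): checks that the calling
 * thread owns *this* mutex, and nothing more. */
static bool owned_mutex_held_by_me(const struct owned_mutex *m)
{
	return m->held && pthread_equal(m->owner, pthread_self());
}

/* Hypothetical batch operation that must only run with @m held. */
static uint64_t create_table(struct owned_mutex *m, uint64_t *handle_counter)
{
	assert(owned_mutex_held_by_me(m));
	return ++*handle_counter;
}
```

Two callers that each pass a different `owned_mutex` would both satisfy the assertion while touching the same `handle_counter`.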


2022-08-22 21:13:34

by Gabriel Ryan

Subject: Re: data-race in nf_tables_newtable / nf_tables_newtable

Hi Florian,

I just looked at the lock event trace from our report and it looks
like two distinct commit mutexes were held when the race was
triggered. I think the race is probably on the table_handle variable
at net/netfilter/nf_tables_api.c:1221, not on the table->handle field
being written to.

Racing increments to table_handle could cause it to either overcount
or undercount. Could that be an issue?
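One possible outcome of racing increments can be replayed deterministically. The sketch below is a hypothetical userspace model, not the kernel code: it splits `handle = ++table_handle` into an explicit load and store and runs two tasks in the interleaving load A, load B, store A, store B. Two CPUs holding two different commit mutexes are not prevented from doing this, so both tasks end up with the same handle and one increment is lost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one counter shared by all namespaces. */
static uint64_t table_handle;

/* One task's view of "handle = ++table_handle", split into the
 * separate load and store a compiler may emit. */
struct task {
	uint64_t loaded;	/* value read from table_handle */
	uint64_t handle;	/* handle assigned to the new table */
};

static void task_load(struct task *t)
{
	t->loaded = table_handle;
}

static void task_store(struct task *t)
{
	t->handle = t->loaded + 1;
	table_handle = t->handle;
}

/* Replay load A, load B, store A, store B. Each task may hold its own,
 * different mutex; neither mutex excludes the other task. */
static void replay_race(struct task *a, struct task *b)
{
	task_load(a);
	task_load(b);
	task_store(a);
	task_store(b);
}
```

A real data race is not guaranteed to hit this particular interleaving; the model only shows that two distinct mutexes do not serialize access to one shared counter.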

Best,

Gabe


--
Gabriel Ryan
PhD Candidate at Columbia University

2022-08-22 21:33:37

by Florian Westphal

Subject: Re: data-race in nf_tables_newtable / nf_tables_newtable

Gabriel Ryan <[email protected]> wrote:
> Hi Florian,
>
> I just looked at the lock event trace from our report and it looks
> like two distinct commit mutexes were held when the race was
> triggered. I think the race is probably on the table_handle variable
> on net/netfilter/nf_tables_api.c:1221, and not the table->handle field
> being written to.

See

https://patchwork.ozlabs.org/project/netfilter-devel/patch/[email protected]/

which makes table_handle per netns.
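The idea of moving the counter into per-namespace state, next to the mutex that already serializes that namespace, can be sketched as follows. This is a hypothetical userspace model; `struct netns_model`, `netns_new_table()` and `create_many()` are illustrative names and not the contents of the linked patch, which is the authoritative change.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-namespace state: the handle counter lives next to
 * the commit mutex that serializes updates in that namespace. */
struct netns_model {
	pthread_mutex_t commit_mutex;
	uint64_t table_handle;
};

/* Allocate a table handle; the counter is only touched under
 * this namespace's own mutex. */
static uint64_t netns_new_table(struct netns_model *net)
{
	uint64_t handle;

	pthread_mutex_lock(&net->commit_mutex);
	handle = ++net->table_handle;
	pthread_mutex_unlock(&net->commit_mutex);
	return handle;
}

#define TABLES_PER_THREAD 100000

/* Thread body: create many tables in a single namespace. */
static void *create_many(void *arg)
{
	struct netns_model *net = arg;

	for (int i = 0; i < TABLES_PER_THREAD; i++)
		netns_new_table(net);
	return NULL;
}
```

With one thread per namespace running concurrently, each counter ends at exactly `TABLES_PER_THREAD`, because no two threads ever share a counter without sharing its mutex. Handles may then repeat across namespaces; the model assumes that is acceptable because handle lookups are scoped to one namespace.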