2023-12-05 16:42:29

by Jann Horn

[permalink] [raw]
Subject: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

Hi!

I think this code is racy, but testing that seems like a pain...

owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
non-NULL, then checks that sk->sk_socket->file is non-NULL, then
accesses the ->f_cred of that file.

I don't see anything that protects this against a concurrent
sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
the context of a TCP retransmit or something like that. So I think we
can theoretically have a NULL deref, though the compiler will probably
optimize it away by merging the two loads of sk->sk_socket:

static bool
owner_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
[...]
const struct file *filp;
struct sock *sk = skb_to_full_sk(skb);
[...]

if (!sk || !sk->sk_socket || !net_eq(net, sock_net(sk)))
return (info->match ^ info->invert) == 0;
else if (info->match & info->invert & XT_OWNER_SOCKET)
[...]

// concurrent sock_orphan() while we're here

// null deref on second access to sk->sk_socket
filp = sk->sk_socket->file;
if (filp == NULL)
[...]
[...]
}

(Sidenote: I guess this also means xt_owner will treat a sock as
having no owner as soon as it is orphaned? Which probably means that
when a TCP socket enters linger state because it is close()d with data
remaining in the output buffer, the remaining packets will be treated
differently by netfilter?)

I also think we have no guarantee here that the socket's ->file won't
go away due to a concurrent __sock_release(), which could cause us to
continue reading file credentials out of a file whose refcount has
already dropped to zero?

That was probably working sorta okay-ish in practice until now because
files had RCU lifetime, "struct cred" also normally has RCU lifetime,
and nf_hook() runs in an RCU read-side critical section; but now that
"struct file" uses SLAB_TYPESAFE_BY_RCU, I think we can theoretically
race such that the "struct file" could have been freed and reallocated
in the meantime, causing us to see an entirely unrelated "struct file"
and look at creds that are unrelated to the context from which the
packet was sent.

But again, I haven't actually tested this yet, I might be getting it wrong.


2023-12-05 17:39:04

by Jann Horn

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
>
> Hi!
>
> I think this code is racy, but testing that seems like a pain...
>
> owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> accesses the ->f_cred of that file.
>
> I don't see anything that protects this against a concurrent
> sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in

Ah, and all the other users of ->sk_socket in net/netfilter/ do it
under the sk_callback_lock... so I guess the fix would be to add the
same in owner_mt?

2023-12-05 21:40:35

by Phil Sutter

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

Hi,

On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> >
> > Hi!
> >
> > I think this code is racy, but testing that seems like a pain...
> >
> > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > accesses the ->f_cred of that file.
> >
> > I don't see anything that protects this against a concurrent
> > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
>
> Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> under the sk_callback_lock... so I guess the fix would be to add the
> same in owner_mt?

Sounds reasonable, although I wonder how likely a socket is to
orphan while netfilter is processing a packet it just sent.

How about the attached patch? Not sure what hash to put into a Fixes:
tag given this is a day 1 bug and ipt_owner/ip6t_owner predate git.

Thanks, Phil


Attachments:
(No filename) (1.06 kB)
0001-netfilter-xt_owner-Fix-for-unsafe-access-of-sk-sk_so.patch (1.99 kB)
Download all attachments

2023-12-06 13:58:49

by Christian Brauner

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> >
> > Hi!
> >
> > I think this code is racy, but testing that seems like a pain...
> >
> > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > accesses the ->f_cred of that file.
> >
> > I don't see anything that protects this against a concurrent
> > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
>
> Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> under the sk_callback_lock... so I guess the fix would be to add the
> same in owner_mt?

In your other mail you wrote:

> I also think we have no guarantee here that the socket's ->file won't
> go away due to a concurrent __sock_release(), which could cause us to
> continue reading file credentials out of a file whose refcount has
> already dropped to zero?

Is this an independent worry or can the concurrent __sock_release()
issue only happen due to a sock_orphan() having happened first? I think
that it requires a sock_orphan() having happend, presumably because the
socket gets marked SOCK_DEAD and can thus be released via
__sock_release() asynchronously?

If so then taking sk_callback_lock() in owner_mt() should fix this.
(Otherwise we might need an additional get_active_file() on
sk->sk_socker->file in owner_mt() in addition to the other fix.)

2023-12-06 14:39:47

by Jann Horn

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Wed, Dec 6, 2023 at 2:58 PM Christian Brauner <[email protected]> wrote:
>
> On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> > >
> > > Hi!
> > >
> > > I think this code is racy, but testing that seems like a pain...
> > >
> > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > accesses the ->f_cred of that file.
> > >
> > > I don't see anything that protects this against a concurrent
> > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> >
> > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > under the sk_callback_lock... so I guess the fix would be to add the
> > same in owner_mt?
>
> In your other mail you wrote:
>
> > I also think we have no guarantee here that the socket's ->file won't
> > go away due to a concurrent __sock_release(), which could cause us to
> > continue reading file credentials out of a file whose refcount has
> > already dropped to zero?
>
> Is this an independent worry or can the concurrent __sock_release()
> issue only happen due to a sock_orphan() having happened first? I think
> that it requires a sock_orphan() having happend, presumably because the
> socket gets marked SOCK_DEAD and can thus be released via
> __sock_release() asynchronously?
>
> If so then taking sk_callback_lock() in owner_mt() should fix this.
> (Otherwise we might need an additional get_active_file() on
> sk->sk_socker->file in owner_mt() in addition to the other fix.)

My understanding is that it could only happen due to a sock_orphan()
having happened first, and so just sk_callback_lock() should probably
be a sufficient fix. (I'm not an expert on net subsystem locking rules
though.)

2023-12-06 16:29:38

by Jann Horn

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Tue, Dec 5, 2023 at 10:40 PM Phil Sutter <[email protected]> wrote:
> On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> > >
> > > Hi!
> > >
> > > I think this code is racy, but testing that seems like a pain...
> > >
> > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > accesses the ->f_cred of that file.
> > >
> > > I don't see anything that protects this against a concurrent
> > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> >
> > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > under the sk_callback_lock... so I guess the fix would be to add the
> > same in owner_mt?
>
> Sounds reasonable, although I wonder how likely a socket is to
> orphan while netfilter is processing a packet it just sent.
>
> How about the attached patch? Not sure what hash to put into a Fixes:
> tag given this is a day 1 bug and ipt_owner/ip6t_owner predate git.

Looks mostly reasonable to me; though I guess it's a bit weird to have
two separate bailout paths for checking whether sk->sk_socket is NULL,
where the first check can race, and the second check uses different
logic for determining the return value; I don't know whether that
actually matters semantically. But I'm not sure how to make it look
nicer either.
I guess you could add a READ_ONCE() around the first read to signal
that that's a potentially racy read, but I don't feel strongly about
that.

2023-12-06 16:43:19

by Pablo Neira Ayuso

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Tue, Dec 05, 2023 at 10:40:15PM +0100, Phil Sutter wrote:
> Hi,
>
> On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> > >
> > > Hi!
> > >
> > > I think this code is racy, but testing that seems like a pain...
> > >
> > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > accesses the ->f_cred of that file.
> > >
> > > I don't see anything that protects this against a concurrent
> > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> >
> > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > under the sk_callback_lock... so I guess the fix would be to add the
> > same in owner_mt?
>
> Sounds reasonable, although I wonder how likely a socket is to
> orphan while netfilter is processing a packet it just sent.
>
> How about the attached patch? Not sure what hash to put into a Fixes:
> tag given this is a day 1 bug and ipt_owner/ip6t_owner predate git.
>
> Thanks, Phil

> From 3e28490e43b04d49e6e7f145a70fff7dd42c8cc5 Mon Sep 17 00:00:00 2001
> From: Phil Sutter <[email protected]>
> Date: Tue, 5 Dec 2023 21:58:12 +0100
> Subject: [nf PATCH] netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
>
> A concurrently running sock_orphan() may NULL the sk_socket pointer in
> between check and deref. Follow other users (like nft_meta.c for
> instance) and acquire sk_callback_lock before dereferencing sk_socket.

For the record, I have placed this patch in nf.git

Thanks.

2023-12-06 16:48:49

by Pablo Neira Ayuso

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Wed, Dec 06, 2023 at 05:28:44PM +0100, Jann Horn wrote:
> On Tue, Dec 5, 2023 at 10:40 PM Phil Sutter <[email protected]> wrote:
> > On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> > > >
> > > > Hi!
> > > >
> > > > I think this code is racy, but testing that seems like a pain...
> > > >
> > > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > > accesses the ->f_cred of that file.
> > > >
> > > > I don't see anything that protects this against a concurrent
> > > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> > >
> > > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > > under the sk_callback_lock... so I guess the fix would be to add the
> > > same in owner_mt?
> >
> > Sounds reasonable, although I wonder how likely a socket is to
> > orphan while netfilter is processing a packet it just sent.
> >
> > How about the attached patch? Not sure what hash to put into a Fixes:
> > tag given this is a day 1 bug and ipt_owner/ip6t_owner predate git.
>
> Looks mostly reasonable to me; though I guess it's a bit weird to have
> two separate bailout paths for checking whether sk->sk_socket is NULL,
> where the first check can race, and the second check uses different
> logic for determining the return value; I don't know whether that
> actually matters semantically. But I'm not sure how to make it look
> nicer either.
> I guess you could add a READ_ONCE() around the first read to signal
> that that's a potentially racy read, but I don't feel strongly about
> that.

The lack of READ_ONCE() on sk->sk_socket also got me thinking here,
given this sk_set_socket(sk, NULL) in sock_orphan is done under the
lock.

I am taking Phil's patch, I think it is leaving things in better shape.

2023-12-06 16:50:05

by Christian Brauner

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Wed, Dec 06, 2023 at 05:28:44PM +0100, Jann Horn wrote:
> On Tue, Dec 5, 2023 at 10:40 PM Phil Sutter <[email protected]> wrote:
> > On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> > > >
> > > > Hi!
> > > >
> > > > I think this code is racy, but testing that seems like a pain...
> > > >
> > > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > > accesses the ->f_cred of that file.
> > > >
> > > > I don't see anything that protects this against a concurrent
> > > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> > >
> > > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > > under the sk_callback_lock... so I guess the fix would be to add the
> > > same in owner_mt?
> >
> > Sounds reasonable, although I wonder how likely a socket is to
> > orphan while netfilter is processing a packet it just sent.
> >
> > How about the attached patch? Not sure what hash to put into a Fixes:
> > tag given this is a day 1 bug and ipt_owner/ip6t_owner predate git.
>
> Looks mostly reasonable to me; though I guess it's a bit weird to have
> two separate bailout paths for checking whether sk->sk_socket is NULL,
> where the first check can race, and the second check uses different
> logic for determining the return value; I don't know whether that
> actually matters semantically. But I'm not sure how to make it look
> nicer either.
> I guess you could add a READ_ONCE() around the first read to signal
> that that's a potentially racy read, but I don't feel strongly about
> that.

It should be possible to split it into two static inlin helpers:

owner_mt_fast()
owner_mt_slow()

And then abstract the lockless and locked fetches into the two helpers.

2023-12-06 16:51:21

by Christian Brauner

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Wed, Dec 06, 2023 at 03:38:50PM +0100, Jann Horn wrote:
> On Wed, Dec 6, 2023 at 2:58 PM Christian Brauner <[email protected]> wrote:
> >
> > On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> > > >
> > > > Hi!
> > > >
> > > > I think this code is racy, but testing that seems like a pain...
> > > >
> > > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > > accesses the ->f_cred of that file.
> > > >
> > > > I don't see anything that protects this against a concurrent
> > > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> > >
> > > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > > under the sk_callback_lock... so I guess the fix would be to add the
> > > same in owner_mt?
> >
> > In your other mail you wrote:
> >
> > > I also think we have no guarantee here that the socket's ->file won't
> > > go away due to a concurrent __sock_release(), which could cause us to
> > > continue reading file credentials out of a file whose refcount has
> > > already dropped to zero?
> >
> > Is this an independent worry or can the concurrent __sock_release()
> > issue only happen due to a sock_orphan() having happened first? I think
> > that it requires a sock_orphan() having happend, presumably because the
> > socket gets marked SOCK_DEAD and can thus be released via
> > __sock_release() asynchronously?
> >
> > If so then taking sk_callback_lock() in owner_mt() should fix this.
> > (Otherwise we might need an additional get_active_file() on
> > sk->sk_socker->file in owner_mt() in addition to the other fix.)
>
> My understanding is that it could only happen due to a sock_orphan()
> having happened first, and so just sk_callback_lock() should probably
> be a sufficient fix. (I'm not an expert on net subsystem locking rules
> though.)

Ok, so as I suspected. That's good.

2023-12-06 20:42:53

by Phil Sutter

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Wed, Dec 06, 2023 at 05:28:44PM +0100, Jann Horn wrote:
> On Tue, Dec 5, 2023 at 10:40 PM Phil Sutter <[email protected]> wrote:
> > On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> > > >
> > > > Hi!
> > > >
> > > > I think this code is racy, but testing that seems like a pain...
> > > >
> > > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > > accesses the ->f_cred of that file.
> > > >
> > > > I don't see anything that protects this against a concurrent
> > > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> > >
> > > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > > under the sk_callback_lock... so I guess the fix would be to add the
> > > same in owner_mt?
> >
> > Sounds reasonable, although I wonder how likely a socket is to
> > orphan while netfilter is processing a packet it just sent.
> >
> > How about the attached patch? Not sure what hash to put into a Fixes:
> > tag given this is a day 1 bug and ipt_owner/ip6t_owner predate git.
>
> Looks mostly reasonable to me; though I guess it's a bit weird to have
> two separate bailout paths for checking whether sk->sk_socket is NULL,
> where the first check can race, and the second check uses different
> logic for determining the return value; I don't know whether that
> actually matters semantically. But I'm not sure how to make it look
> nicer either.

I find the code pretty confusing since it combines three matches (socket
UID, socket GID and socket existence) via binary ops. The second bail
disregards socket existence bits, I assumed it was deliberate and thus
decided to leave the first part as-is.

> I guess you could add a READ_ONCE() around the first read to signal
> that that's a potentially racy read, but I don't feel strongly about
> that.

Is this just annotation or do you see a practical effect of using
READ_ONCE() there?

Either way, thanks for the review!

Phil

2023-12-06 21:03:11

by Jann Horn

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Wed, Dec 6, 2023 at 9:42 PM Phil Sutter <[email protected]> wrote:
>
> On Wed, Dec 06, 2023 at 05:28:44PM +0100, Jann Horn wrote:
> > On Tue, Dec 5, 2023 at 10:40 PM Phil Sutter <[email protected]> wrote:
> > > On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > > > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> > > > >
> > > > > Hi!
> > > > >
> > > > > I think this code is racy, but testing that seems like a pain...
> > > > >
> > > > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > > > accesses the ->f_cred of that file.
> > > > >
> > > > > I don't see anything that protects this against a concurrent
> > > > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> > > >
> > > > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > > > under the sk_callback_lock... so I guess the fix would be to add the
> > > > same in owner_mt?
> > >
> > > Sounds reasonable, although I wonder how likely a socket is to
> > > orphan while netfilter is processing a packet it just sent.
> > >
> > > How about the attached patch? Not sure what hash to put into a Fixes:
> > > tag given this is a day 1 bug and ipt_owner/ip6t_owner predate git.
> >
> > Looks mostly reasonable to me; though I guess it's a bit weird to have
> > two separate bailout paths for checking whether sk->sk_socket is NULL,
> > where the first check can race, and the second check uses different
> > logic for determining the return value; I don't know whether that
> > actually matters semantically. But I'm not sure how to make it look
> > nicer either.
>
> I find the code pretty confusing since it combines three matches (socket
> UID, socket GID and socket existence) via binary ops. The second bail
> disregards socket existence bits, I assumed it was deliberate and thus
> decided to leave the first part as-is.
>
> > I guess you could add a READ_ONCE() around the first read to signal
> > that that's a potentially racy read, but I don't feel strongly about
> > that.
>
> Is this just annotation or do you see a practical effect of using
> READ_ONCE() there?

I mostly just meant that as an annotation. My understanding is that in
theory, racy reads can cause the compiler to do some terrible things
to your code (https://lore.kernel.org/all/CAG48ez2nFks+yN1Kp4TZisso+rjvv_4UW0FTo8iFUd4Qyq1qDw@mail.gmail.com/),
but that's almost certainly not going to happen here.

(Well, I guess doing a READ_ONCE() at one side without doing
WRITE_ONCE() on the other side is also unclean...)

2023-12-07 18:10:33

by Phil Sutter

[permalink] [raw]
Subject: Re: Is xt_owner's owner_mt() racy with sock_orphan()? [worse with new TYPESAFE_BY_RCU file lifetime?]

On Wed, Dec 06, 2023 at 10:02:04PM +0100, Jann Horn wrote:
> On Wed, Dec 6, 2023 at 9:42 PM Phil Sutter <[email protected]> wrote:
> >
> > On Wed, Dec 06, 2023 at 05:28:44PM +0100, Jann Horn wrote:
> > > On Tue, Dec 5, 2023 at 10:40 PM Phil Sutter <[email protected]> wrote:
> > > > On Tue, Dec 05, 2023 at 06:08:29PM +0100, Jann Horn wrote:
> > > > > On Tue, Dec 5, 2023 at 5:40 PM Jann Horn <[email protected]> wrote:
> > > > > >
> > > > > > Hi!
> > > > > >
> > > > > > I think this code is racy, but testing that seems like a pain...
> > > > > >
> > > > > > owner_mt() in xt_owner runs in context of a NF_INET_LOCAL_OUT or
> > > > > > NF_INET_POST_ROUTING hook. It first checks that sk->sk_socket is
> > > > > > non-NULL, then checks that sk->sk_socket->file is non-NULL, then
> > > > > > accesses the ->f_cred of that file.
> > > > > >
> > > > > > I don't see anything that protects this against a concurrent
> > > > > > sock_orphan(), which NULLs out the sk->sk_socket pointer, if we're in
> > > > >
> > > > > Ah, and all the other users of ->sk_socket in net/netfilter/ do it
> > > > > under the sk_callback_lock... so I guess the fix would be to add the
> > > > > same in owner_mt?
> > > >
> > > > Sounds reasonable, although I wonder how likely a socket is to
> > > > orphan while netfilter is processing a packet it just sent.
> > > >
> > > > How about the attached patch? Not sure what hash to put into a Fixes:
> > > > tag given this is a day 1 bug and ipt_owner/ip6t_owner predate git.
> > >
> > > Looks mostly reasonable to me; though I guess it's a bit weird to have
> > > two separate bailout paths for checking whether sk->sk_socket is NULL,
> > > where the first check can race, and the second check uses different
> > > logic for determining the return value; I don't know whether that
> > > actually matters semantically. But I'm not sure how to make it look
> > > nicer either.
> >
> > I find the code pretty confusing since it combines three matches (socket
> > UID, socket GID and socket existence) via binary ops. The second bail
> > disregards socket existence bits, I assumed it was deliberate and thus
> > decided to leave the first part as-is.
> >
> > > I guess you could add a READ_ONCE() around the first read to signal
> > > that that's a potentially racy read, but I don't feel strongly about
> > > that.
> >
> > Is this just annotation or do you see a practical effect of using
> > READ_ONCE() there?
>
> I mostly just meant that as an annotation. My understanding is that in
> theory, racy reads can cause the compiler to do some terrible things
> to your code (https://lore.kernel.org/all/CAG48ez2nFks+yN1Kp4TZisso+rjvv_4UW0FTo8iFUd4Qyq1qDw@mail.gmail.com/),

Thanks for the pointer, this was an educational read!

> but that's almost certainly not going to happen here.

At least it's not a switch on a value in user-controlled memory. ;)

> (Well, I guess doing a READ_ONCE() at one side without doing
> WRITE_ONCE() on the other side is also unclean...)

For the annotation aspect it won't matter. Though since it will merely
improve reliability of that check in the given corner-case which is an
unreliable situation in the first place, I'd just leave it alone and
hope for the code to be replaced by the one in nft_meta.c eventually.

Thanks, Phil