2019-07-08 15:16:19

by Dag Moxnes

[permalink] [raw]
Subject: [PATCH v3] RDMA/core: Fix race when resolving IP address

Use neighbour lock when copying MAC address from neighbour data struct
in dst_fetch_ha.

When not using the lock, it is possible for the function to race with
neigh_update, causing it to copy an invalid MAC address.

It is possible to provoke this error by calling rdma_resolve_addr in a
tight loop, while deleting the corresponding ARP entry in another tight
loop.

This will cause the race shown it the following sample trace:

rdma_resolve_addr()
rdma_resolve_ip()
addr_resolve()
addr_resolve_neigh()
fetch_ha()
dst_fetch_ha()
n->nud_state == NUD_VALID

and

net_ioctl()
arp_ioctl()
arp_rec_delete()
arp_invalidate()
neigh_update()
__neigh_update()
neigh->nud_state = new;

Signed-off-by: Dag Moxnes <[email protected]>
Signed-off-by: Håkon Bugge <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
---
v1 -> v2:
* Modified implementation to improve readability

v2 -> v3:
* Added sample trace as suggested by Parav Pandit
* Added Reviewed-by


---
drivers/infiniband/core/addr.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 2f7d141598..51323ffbc5 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -333,11 +333,14 @@ static int dst_fetch_ha(const struct dst_entry *dst,
if (!n)
return -ENODATA;

- if (!(n->nud_state & NUD_VALID)) {
+ read_lock_bh(&n->lock);
+ if (n->nud_state & NUD_VALID) {
+ memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
+ read_unlock_bh(&n->lock);
+ } else {
+ read_unlock_bh(&n->lock);
neigh_event_send(n, NULL);
ret = -ENODATA;
- } else {
- memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
}

neigh_release(n);
--
2.20.1


2019-07-08 22:45:33

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3] RDMA/core: Fix race when resolving IP address

On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
> Use neighbour lock when copying MAC address from neighbour data struct
> in dst_fetch_ha.
>
> When not using the lock, it is possible for the function to race with
> neigh_update, causing it to copy an invalid MAC address.
>
> It is possible to provoke this error by calling rdma_resolve_addr in a
> tight loop, while deleting the corresponding ARP entry in another tight
> loop.
>
> This will cause the race shown it the following sample trace:
>
> rdma_resolve_addr()
> rdma_resolve_ip()
> addr_resolve()
> addr_resolve_neigh()
> fetch_ha()
> dst_fetch_ha()
> n->nud_state == NUD_VALID

It isn't nud_state that is the problem here, it is the parallel
memcpy's onto ha. I fixed the commit message

This could also have been solved by using the ha_lock, but I don't
think we have a reason to particularly over-optimize this.

> drivers/infiniband/core/addr.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)

Applied to for-next, thanks

Jason

2019-07-08 22:47:47

by Dag Moxnes

[permalink] [raw]
Subject: Re: [PATCH v3] RDMA/core: Fix race when resolving IP address

Thanks Jason,

Regards,
Dag

Den 08.07.2019 19:50, skrev Jason Gunthorpe:
> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
>> Use neighbour lock when copying MAC address from neighbour data struct
>> in dst_fetch_ha.
>>
>> When not using the lock, it is possible for the function to race with
>> neigh_update, causing it to copy an invalid MAC address.
>>
>> It is possible to provoke this error by calling rdma_resolve_addr in a
>> tight loop, while deleting the corresponding ARP entry in another tight
>> loop.
>>
>> This will cause the race shown it the following sample trace:
>>
>> rdma_resolve_addr()
>> rdma_resolve_ip()
>> addr_resolve()
>> addr_resolve_neigh()
>> fetch_ha()
>> dst_fetch_ha()
>> n->nud_state == NUD_VALID
> It isn't nud_state that is the problem here, it is the parallel
> memcpy's onto ha. I fixed the commit message
>
> This could also have been solved by using the ha_lock, but I don't
> think we have a reason to particularly over-optimize this.
>
>> drivers/infiniband/core/addr.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
> Applied to for-next, thanks
>
> Jason

2019-07-08 22:48:33

by Mark Bloch

[permalink] [raw]
Subject: Re: [PATCH v3] RDMA/core: Fix race when resolving IP address



On 7/8/19 11:47 AM, Dag Moxnes wrote:
> Thanks Jason,
>
> Regards,
> Dag
>
> Den 08.07.2019 19:50, skrev Jason Gunthorpe:
>> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
>>> Use neighbour lock when copying MAC address from neighbour data struct
>>> in dst_fetch_ha.
>>>
>>> When not using the lock, it is possible for the function to race with
>>> neigh_update, causing it to copy an invalid MAC address.
>>>
>>> It is possible to provoke this error by calling rdma_resolve_addr in a
>>> tight loop, while deleting the corresponding ARP entry in another tight
>>> loop.
>>>
>>> This will cause the race shown it the following sample trace:
>>>
>>> rdma_resolve_addr()
>>>    rdma_resolve_ip()
>>>      addr_resolve()
>>>        addr_resolve_neigh()
>>>          fetch_ha()
>>>            dst_fetch_ha()
>>>              n->nud_state == NUD_VALID
>> It isn't nud_state that is the problem here, it is the parallel
>> memcpy's onto ha. I fixed the commit message
>>
>> This could also have been solved by using the ha_lock, but I don't
>> think we have a reason to particularly over-optimize this.

Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?

>>
>>>   drivers/infiniband/core/addr.c | 9 ++++++---
>>>   1 file changed, 6 insertions(+), 3 deletions(-)
>> Applied to for-next, thanks
>>
>> Jason
>

Mark

2019-07-08 22:49:40

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3] RDMA/core: Fix race when resolving IP address

On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote:
>
>
> On 7/8/19 11:47 AM, Dag Moxnes wrote:
> > Thanks Jason,
> >
> > Regards,
> > Dag
> >
> > Den 08.07.2019 19:50, skrev Jason Gunthorpe:
> >> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
> >>> Use neighbour lock when copying MAC address from neighbour data struct
> >>> in dst_fetch_ha.
> >>>
> >>> When not using the lock, it is possible for the function to race with
> >>> neigh_update, causing it to copy an invalid MAC address.
> >>>
> >>> It is possible to provoke this error by calling rdma_resolve_addr in a
> >>> tight loop, while deleting the corresponding ARP entry in another tight
> >>> loop.
> >>>
> >>> This will cause the race shown it the following sample trace:
> >>>
> >>> rdma_resolve_addr()
> >>>    rdma_resolve_ip()
> >>>      addr_resolve()
> >>>        addr_resolve_neigh()
> >>>          fetch_ha()
> >>>            dst_fetch_ha()
> >>>              n->nud_state == NUD_VALID
> >> It isn't nud_state that is the problem here, it is the parallel
> >> memcpy's onto ha. I fixed the commit message
> >>
> >> This could also have been solved by using the ha_lock, but I don't
> >> think we have a reason to particularly over-optimize this.
>
> Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?

Yes, that is much better, please respin this

Thanks,
Jason

2019-07-08 22:50:31

by Dag Moxnes

[permalink] [raw]
Subject: Re: [PATCH v3] RDMA/core: Fix race when resolving IP address



Den 08.07.2019 21:38, skrev Jason Gunthorpe:
> On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote:
>>
>> On 7/8/19 11:47 AM, Dag Moxnes wrote:
>>> Thanks Jason,
>>>
>>> Regards,
>>> Dag
>>>
>>> Den 08.07.2019 19:50, skrev Jason Gunthorpe:
>>>> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
>>>>> Use neighbour lock when copying MAC address from neighbour data struct
>>>>> in dst_fetch_ha.
>>>>>
>>>>> When not using the lock, it is possible for the function to race with
>>>>> neigh_update, causing it to copy an invalid MAC address.
>>>>>
>>>>> It is possible to provoke this error by calling rdma_resolve_addr in a
>>>>> tight loop, while deleting the corresponding ARP entry in another tight
>>>>> loop.
>>>>>
>>>>> This will cause the race shown it the following sample trace:
>>>>>
>>>>> rdma_resolve_addr()
>>>>>    rdma_resolve_ip()
>>>>>      addr_resolve()
>>>>>        addr_resolve_neigh()
>>>>>          fetch_ha()
>>>>>            dst_fetch_ha()
>>>>>              n->nud_state == NUD_VALID
>>>> It isn't nud_state that is the problem here, it is the parallel
>>>> memcpy's onto ha. I fixed the commit message
>>>>
>>>> This could also have been solved by using the ha_lock, but I don't
>>>> think we have a reason to particularly over-optimize this.
>> Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?
> Yes, that is much better, please respin this
OK, will do!
Can I still post it as a v4? Or should I do it differently as you
already applied it?

Regards,
-Dag
>
> Thanks,
> Jason

2019-07-08 23:45:42

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3] RDMA/core: Fix race when resolving IP address

On Mon, Jul 08, 2019 at 10:11:29PM +0200, Dag Moxnes wrote:
>
>
> Den 08.07.2019 21:38, skrev Jason Gunthorpe:
> > On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote:
> > >
> > > On 7/8/19 11:47 AM, Dag Moxnes wrote:
> > > > Thanks Jason,
> > > >
> > > > Regards,
> > > > Dag
> > > >
> > > > Den 08.07.2019 19:50, skrev Jason Gunthorpe:
> > > > > On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
> > > > > > Use neighbour lock when copying MAC address from neighbour data struct
> > > > > > in dst_fetch_ha.
> > > > > >
> > > > > > When not using the lock, it is possible for the function to race with
> > > > > > neigh_update, causing it to copy an invalid MAC address.
> > > > > >
> > > > > > It is possible to provoke this error by calling rdma_resolve_addr in a
> > > > > > tight loop, while deleting the corresponding ARP entry in another tight
> > > > > > loop.
> > > > > >
> > > > > > This will cause the race shown it the following sample trace:
> > > > > >
> > > > > > rdma_resolve_addr()
> > > > > >    rdma_resolve_ip()
> > > > > >      addr_resolve()
> > > > > >        addr_resolve_neigh()
> > > > > >          fetch_ha()
> > > > > >            dst_fetch_ha()
> > > > > >              n->nud_state == NUD_VALID
> > > > > It isn't nud_state that is the problem here, it is the parallel
> > > > > memcpy's onto ha. I fixed the commit message
> > > > >
> > > > > This could also have been solved by using the ha_lock, but I don't
> > > > > think we have a reason to particularly over-optimize this.
> > > Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?
> > Yes, that is much better, please respin this
> OK, will do!
> Can I still post it as a v4? Or should I do it differently as you already
> applied it?

post a v4

Jason