2015-04-02 10:05:23

by Yann Droneaud

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,

Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> Hi,
>
> It was found that the Linux kernel's InfiniBand/RDMA subsystem did not
> properly sanitize input parameters while registering memory regions
> from user space via the (u)verbs API. A local user with access to
> a /dev/infiniband/uverbsX device could use this flaw to crash the
> system or, potentially, escalate their privileges on the system.
>
> The issue has been assigned CVE-2014-8159.
>
> The issue exists in the InfiniBand/RDMA/iWARP drivers since Linux
> Kernel version 2.6.13.
>
> Mellanox OFED 2.4-1.0.4 fixes the issue. Available from:
> http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
>
> RedHat errata: https://access.redhat.com/security/cve/CVE-2014-8159
> Canonical errata: http://people.canonical.com/~ubuntu-security/cve/2014/CVE-2014-8159.html
> Novell (Suse) bug tracking: https://bugzilla.novell.com/show_bug.cgi?id=914742
>
>
> The following patch fixes the issue:
>
> --------------- 8< ------------------------------
>
> From d4d68430d4a12c569e28b4f4468284ea22111186 Mon Sep 17 00:00:00 2001
> From: Shachar Raindel <[email protected]>
> Date: Sun, 04 Jan 2015 18:30:32 +0200
> Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get address arithmetic
>
> Properly verify that the resulting page aligned end address is larger
> than both the start address and the length of the memory area
> requested.
>
> Both the start and length arguments for ib_umem_get are controlled by
> the user. A misbehaving user can provide values which will cause an
> integer overflow when calculating the page aligned end address.
>
> This overflow can cause also miscalculation of the number of pages
> mapped, and additional logic issues.
>
> Signed-off-by: Shachar Raindel <[email protected]>
> Signed-off-by: Jack Morgenstein <[email protected]>
> Signed-off-by: Or Gerlitz <[email protected]>
> ---
>
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index aec7a6a..8c014b5 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -99,6 +99,14 @@
> if (dmasync)
> dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
>
> + /*
> + * If the combination of the addr and size requested for this memory
> + * region causes an integer overflow, return error.
> + */
> + if ((PAGE_ALIGN(addr + size) <= size) ||
> + (PAGE_ALIGN(addr + size) <= addr))
> + return ERR_PTR(-EINVAL);
> +

Can access_ok() be used here ?

if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
addr, size))
return ERR_PTR(-EINVAL);


> if (!can_do_mlock())
> return ERR_PTR(-EPERM);
>

Regards.

--
Yann Droneaud
OPTEYA


2015-04-02 11:08:04

by Shachar Raindel

[permalink] [raw]
Subject: RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,

> -----Original Message-----
> From: Yann Droneaud [mailto:[email protected]]
> Sent: Thursday, April 02, 2015 1:05 PM
> To: Shachar Raindel
> Cc: [email protected]; <[email protected]>
> ([email protected]); [email protected];
> [email protected]
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
>
> Hi,
>
> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > Hi,
> >

<snipped long e-mail>

> > + /*
> > + * If the combination of the addr and size requested for this
> memory
> > + * region causes an integer overflow, return error.
> > + */
> > + if ((PAGE_ALIGN(addr + size) <= size) ||
> > + (PAGE_ALIGN(addr + size) <= addr))
> > + return ERR_PTR(-EINVAL);
> > +
>
> Can access_ok() be used here ?
>
> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> addr, size))
> return ERR_PTR(-EINVAL);
>

No, this will break the current ODP semantics.

ODP allows the user to register memory that is not accessible yet.
This is a critical design feature, as it allows avoiding holding
a registration cache. Adding this check will break the behavior,
forcing memory to be all accessible when registering an ODP MR.

Thanks,
--Shachar
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-04-02 13:30:49

by Yann Droneaud

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,

Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:[email protected]]
> > Sent: Thursday, April 02, 2015 1:05 PM
> > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :

> > > + /*
> > > + * If the combination of the addr and size requested for this
> > memory
> > > + * region causes an integer overflow, return error.
> > > + */
> > > + if ((PAGE_ALIGN(addr + size) <= size) ||
> > > + (PAGE_ALIGN(addr + size) <= addr))
> > > + return ERR_PTR(-EINVAL);
> > > +
> >
> > Can access_ok() be used here ?
> >
> > if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > addr, size))
> > return ERR_PTR(-EINVAL);
> >
>
> No, this will break the current ODP semantics.
>
> ODP allows the user to register memory that is not accessible yet.
> This is a critical design feature, as it allows avoiding holding
> a registration cache. Adding this check will break the behavior,
> forcing memory to be all accessible when registering an ODP MR.
>

Where's the check for the range being in userspace memory space,
especially for the ODP case ?

For non ODP case (eg. plain old behavior), does get_user_pages()
ensure the requested pages fit in userspace region on all
architectures ? I think so.

In ODP case, I'm not sure such check is ever done ?
(Aside, does it take special mesure to protect shared mapping from
being read and/or *written* ?)

Regards.

--
Yann Droneaud
OPTEYA

2015-04-02 15:16:11

by Yann Droneaud

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,
Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:[email protected]]
> > Sent: Thursday, April 02, 2015 1:05 PM
> > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
...
> > > + /*
> > > + * If the combination of the addr and size requested for this
> > memory
> > > + * region causes an integer overflow, return error.
> > > + */
> > > + if ((PAGE_ALIGN(addr + size) <= size) ||
> > > + (PAGE_ALIGN(addr + size) <= addr))
> > > + return ERR_PTR(-EINVAL);
> > > +
> >
> > Can access_ok() be used here ?
> >
> > if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > addr, size))
> > return ERR_PTR(-EINVAL);
> >
>
> No, this will break the current ODP semantics.
>
> ODP allows the user to register memory that is not accessible yet.
> This is a critical design feature, as it allows avoiding holding
> a registration cache. Adding this check will break the behavior,
> forcing memory to be all accessible when registering an ODP MR.
>

Failed to notice previously, but since this would break ODP, and ODP is
only available starting v3.19-rc1, my proposed fix might be applicable
for older kernel (if not better).

>From 2a3beaeb4b35d968f127cb59cfda2f12dbd87e4b Mon Sep 17 00:00:00 2001
From: Yann Droneaud <[email protected]>
Date: Thu, 2 Apr 2015 17:01:05 +0200
Subject: [RFC PATCH] IB/core: reject invalid userspace memory range registration

Signed-off-by: Yann Droneaud <[email protected]>
---
drivers/infiniband/core/umem.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index df0c4f605a21..6758b4ac56eb 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -90,6 +90,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
DEFINE_DMA_ATTRS(attrs);
struct scatterlist *sg, *sg_list_start;
int need_release = 0;
+ bool writable;

if (dmasync)
dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
@@ -97,6 +98,19 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
if (!can_do_mlock())
return ERR_PTR(-EPERM);

+ /*
+ * We ask for writable memory if any access flags other than
+ * "remote read" are set. "Local write" and "remote write"
+ * obviously require write access. "Remote atomic" can do
+ * things like fetch and add, which will modify memory, and
+ * "MW bind" can change permissions by binding a window.
+ */
+ writable = !!(access & ~IB_ACCESS_REMOTE_READ);
+
+ if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
+ (void __user *)addr, size))
+ return ERR_PTR(-EFAULT);
+
umem = kzalloc(sizeof *umem, GFP_KERNEL);
if (!umem)
return ERR_PTR(-ENOMEM);
@@ -106,14 +120,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
umem->offset = addr & ~PAGE_MASK;
umem->page_size = PAGE_SIZE;
umem->pid = get_task_pid(current, PIDTYPE_PID);
- /*
- * We ask for writable memory if any access flags other than
- * "remote read" are set. "Local write" and "remote write"
- * obviously require write access. "Remote atomic" can do
- * things like fetch and add, which will modify memory, and
- * "MW bind" can change permissions by binding a window.
- */
- umem->writable = !!(access & ~IB_ACCESS_REMOTE_READ);
+ umem->writable = writable;

/* We assume the memory is from hugetlb until proved otherwise */
umem->hugetlb = 1;
--
2.1.0

Regards.

--
Yann Droneaud
OPTEYA

2015-04-02 15:33:39

by Haggai Eran

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

On 02/04/2015 16:30, Yann Droneaud wrote:
> Hi,
>
> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
>>> -----Original Message-----
>>> From: Yann Droneaud [mailto:[email protected]]
>>> Sent: Thursday, April 02, 2015 1:05 PM
>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
>
>>>> + /*
>>>> + * If the combination of the addr and size requested for this
>>> memory
>>>> + * region causes an integer overflow, return error.
>>>> + */
>>>> + if ((PAGE_ALIGN(addr + size) <= size) ||
>>>> + (PAGE_ALIGN(addr + size) <= addr))
>>>> + return ERR_PTR(-EINVAL);
>>>> +
>>>
>>> Can access_ok() be used here ?
>>>
>>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>>> addr, size))
>>> return ERR_PTR(-EINVAL);
>>>
>>
>> No, this will break the current ODP semantics.
>>
>> ODP allows the user to register memory that is not accessible yet.
>> This is a critical design feature, as it allows avoiding holding
>> a registration cache. Adding this check will break the behavior,
>> forcing memory to be all accessible when registering an ODP MR.
>>
>
> Where's the check for the range being in userspace memory space,
> especially for the ODP case ?
>
> For non ODP case (eg. plain old behavior), does get_user_pages()
> ensure the requested pages fit in userspace region on all
> architectures ? I think so.

Yes, get_user_pages will return a smaller amount of pages than requested
if it encounters an unmapped region (or a region without write
permissions for write requests). If this happens, the loop in
ib_umem_get calls get_user_pages again with the next set of pages, and
this time if it the first page still cannot be mapped an error is returned.

>
> In ODP case, I'm not sure such check is ever done ?

In ODP, we also call get_user_pages, but only when a page fault occurs
(see ib_umem_odp_map_dma_pages()). This allows the user to pre-register
a memory region that contains unmapped virtual space, and then mmap
different files into that area without needing to re-register.

> (Aside, does it take special mesure to protect shared mapping from
> being read and/or *written* ?)

I'm not sure I understand the question. Shared mappings that the process
is allowed to read or write are also allowed for the HCA (specifically,
to local and remote operations the same process performs using the HCA),
provided the application has registered their virtual address space as a
memory region.

Regards,
Haggai

2015-04-02 16:49:29

by Shachar Raindel

[permalink] [raw]
Subject: RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,

> -----Original Message-----
> From: Yann Droneaud [mailto:[email protected]]
> Sent: Thursday, April 02, 2015 6:16 PM
> To: Shachar Raindel
> Cc: [email protected]; <[email protected]>
> ([email protected]); [email protected];
> [email protected]
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
>
> Hi,
> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > -----Original Message-----
> > > From: Yann Droneaud [mailto:[email protected]]
> > > Sent: Thursday, April 02, 2015 1:05 PM
> > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> ...
> > > > + /*
> > > > + * If the combination of the addr and size requested for this
> > > memory
> > > > + * region causes an integer overflow, return error.
> > > > + */
> > > > + if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > + (PAGE_ALIGN(addr + size) <= addr))
> > > > + return ERR_PTR(-EINVAL);
> > > > +
> > >
> > > Can access_ok() be used here ?
> > >
> > > if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > > addr, size))
> > > return ERR_PTR(-EINVAL);
> > >
> >
> > No, this will break the current ODP semantics.
> >
> > ODP allows the user to register memory that is not accessible yet.
> > This is a critical design feature, as it allows avoiding holding
> > a registration cache. Adding this check will break the behavior,
> > forcing memory to be all accessible when registering an ODP MR.
> >
>
> Failed to notice previously, but since this would break ODP, and ODP is
> only available starting v3.19-rc1, my proposed fix might be applicable
> for older kernel (if not better).
>

Can you explain how this proposed fix is better than the existing patch?
Why do we want to push to the stable tree a patch that is not in the
upstream? There is an existing, tested, patch that is going to the tip
of the development. It even applies cleanly on every kernel version around.

Thanks,
--Shachar
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-04-02 16:35:32

by Yann Droneaud

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi Haggai,

Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
> On 02/04/2015 16:30, Yann Droneaud wrote:
> > Hi,
> >
> > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> >>> -----Original Message-----
> >>> From: Yann Droneaud [mailto:[email protected]]
> >>> Sent: Thursday, April 02, 2015 1:05 PM
> >>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> >
> >>>> + /*
> >>>> + * If the combination of the addr and size requested for this
> >>> memory
> >>>> + * region causes an integer overflow, return error.
> >>>> + */
> >>>> + if ((PAGE_ALIGN(addr + size) <= size) ||
> >>>> + (PAGE_ALIGN(addr + size) <= addr))
> >>>> + return ERR_PTR(-EINVAL);
> >>>> +
> >>>
> >>> Can access_ok() be used here ?
> >>>
> >>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >>> addr, size))
> >>> return ERR_PTR(-EINVAL);
> >>>
> >>
> >> No, this will break the current ODP semantics.
> >>
> >> ODP allows the user to register memory that is not accessible yet.
> >> This is a critical design feature, as it allows avoiding holding
> >> a registration cache. Adding this check will break the behavior,
> >> forcing memory to be all accessible when registering an ODP MR.
> >>
> >
> > Where's the check for the range being in userspace memory space,
> > especially for the ODP case ?
> >
> > For non ODP case (eg. plain old behavior), does get_user_pages()
> > ensure the requested pages fit in userspace region on all
> > architectures ? I think so.
>
> Yes, get_user_pages will return a smaller amount of pages than requested
> if it encounters an unmapped region (or a region without write
> permissions for write requests). If this happens, the loop in
> ib_umem_get calls get_user_pages again with the next set of pages, and
> this time if it the first page still cannot be mapped an error is returned.
>
> >
> > In ODP case, I'm not sure such check is ever done ?
>
> In ODP, we also call get_user_pages, but only when a page fault occurs
> (see ib_umem_odp_map_dma_pages()). This allows the user to pre-register
> a memory region that contains unmapped virtual space, and then mmap
> different files into that area without needing to re-register.
>

OK, thanks for the description.

> > (Aside, does it take special mesure to protect shared mapping from
> > being read and/or *written* ?)
>
> I'm not sure I understand the question. Shared mappings that the process
> is allowed to read or write are also allowed for the HCA (specifically,
> to local and remote operations the same process performs using the HCA),
> provided the application has registered their virtual address space as a
> memory region.
>

I was refering to description of get_user_pages():

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/mm/gup.c?id=v4.0-rc6#n765

* @force: whether to force access even when user mapping is currently
* protected (but never forces write access to shared mapping).

But since ib_umem_odp_map_dma_pages() use get_user_pages() with force
argument set to 0, it's OK.

Another related question: as the large memory range could be registered
by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
what's prevent the kernel to map a file as the result of mmap(0, ...)
in this region, making it available remotely through IBV_WR_RDMA_READ /
IBV_WR_RDMA_WRITE ?

Again, thanks for the information I was missing to understand how ODP is
checking the memory ranges.

Regards.

--
Yann Droneaud
OPTEYA

2015-04-02 16:44:40

by Shachar Raindel

[permalink] [raw]
Subject: RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access



> -----Original Message-----
> From: Yann Droneaud [mailto:[email protected]]
> Sent: Thursday, April 02, 2015 7:35 PM
> To: Haggai Eran
> Cc: Shachar Raindel; Sagi Grimberg; [email protected];
> <[email protected]> ([email protected]); linux-
> [email protected]; [email protected]
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
>
> Hi Haggai,
>
> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
> > On 02/04/2015 16:30, Yann Droneaud wrote:
> > > Hi,
> > >
> > > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > >>> -----Original Message-----
> > >>> From: Yann Droneaud [mailto:[email protected]]
> > >>> Sent: Thursday, April 02, 2015 1:05 PM
> > >>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > >
> > >>>> + /*
> > >>>> + * If the combination of the addr and size requested for this
> > >>> memory
> > >>>> + * region causes an integer overflow, return error.
> > >>>> + */
> > >>>> + if ((PAGE_ALIGN(addr + size) <= size) ||
> > >>>> + (PAGE_ALIGN(addr + size) <= addr))
> > >>>> + return ERR_PTR(-EINVAL);
> > >>>> +
> > >>>
> > >>> Can access_ok() be used here ?
> > >>>
> > >>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > >>> addr, size))
> > >>> return ERR_PTR(-EINVAL);
> > >>>
> > >>
> > >> No, this will break the current ODP semantics.
> > >>
> > >> ODP allows the user to register memory that is not accessible yet.
> > >> This is a critical design feature, as it allows avoiding holding
> > >> a registration cache. Adding this check will break the behavior,
> > >> forcing memory to be all accessible when registering an ODP MR.
> > >>
> > >
> > > Where's the check for the range being in userspace memory space,
> > > especially for the ODP case ?
> > >
> > > For non ODP case (eg. plain old behavior), does get_user_pages()
> > > ensure the requested pages fit in userspace region on all
> > > architectures ? I think so.
> >
> > Yes, get_user_pages will return a smaller amount of pages than
> requested
> > if it encounters an unmapped region (or a region without write
> > permissions for write requests). If this happens, the loop in
> > ib_umem_get calls get_user_pages again with the next set of pages, and
> > this time if it the first page still cannot be mapped an error is
> returned.
> >
> > >
> > > In ODP case, I'm not sure such check is ever done ?
> >
> > In ODP, we also call get_user_pages, but only when a page fault occurs
> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
> register
> > a memory region that contains unmapped virtual space, and then mmap
> > different files into that area without needing to re-register.
> >
>
> OK, thanks for the description.
>
> > > (Aside, does it take special mesure to protect shared mapping from
> > > being read and/or *written* ?)
> >
> > I'm not sure I understand the question. Shared mappings that the
> process
> > is allowed to read or write are also allowed for the HCA
> (specifically,
> > to local and remote operations the same process performs using the
> HCA),
> > provided the application has registered their virtual address space as
> a
> > memory region.
> >
>
> I was refering to description of get_user_pages():
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/mm/g
> up.c?id=v4.0-rc6#n765
>
> * @force: whether to force access even when user mapping is currently
> * protected (but never forces write access to shared mapping).
>
> But since ib_umem_odp_map_dma_pages() use get_user_pages() with force
> argument set to 0, it's OK.
>
> Another related question: as the large memory range could be registered
> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> what's prevent the kernel to map a file as the result of mmap(0, ...)
> in this region, making it available remotely through IBV_WR_RDMA_READ /
> IBV_WR_RDMA_WRITE ?
>

This is not a bug. This is a feature.

Exposing a file through RDMA, using ODP, can be done exactly like this.
Given that the application explicitly requested this behavior, I don't
see why it is a problem. Actually, some of our tests use such flows.
The mmu notifiers mechanism allow us to do this safely. When the page is
written back to disk, it is removed from the ODP mapping. When it is
accessed by the HCA, it is brought back to RAM.


Thanks,
--Shachar
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-04-02 18:13:07

by Haggai Eran

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access


On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote:
>> -----Original Message-----
>> From: Yann Droneaud [mailto:[email protected]]
>> Sent: Thursday, April 02, 2015 7:35 PM
>> To: Haggai Eran
>> Cc: Shachar Raindel; Sagi Grimberg; [email protected];
>> <[email protected]> ([email protected]); linux-
>> [email protected]; [email protected]
>> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
>> physical memory access
>>
>> Hi Haggai,
>>
>> Le jeudi 02 avril 2015 ? 18:18 +0300, Haggai Eran a ?crit :
>> > On 02/04/2015 16:30, Yann Droneaud wrote:
>> >> Hi,
>> >>
>> >> Le jeudi 02 avril 2015 ? 10:52 +0000, Shachar Raindel a ?crit :
>> >>>> -----Original Message-----
>> >>>> From: Yann Droneaud [mailto:[email protected]]
>> >>>> Sent: Thursday, April 02, 2015 1:05 PM
>> >>>> Le mercredi 18 mars 2015 ? 17:39 +0000, Shachar Raindel a ?crit :
>> >>
>> >>>>> + /*
>> >>>>> + * If the combination of the addr and size requested for this
>> >>>> memory
>> >>>>> + * region causes an integer overflow, return error.
>> >>>>> + */
>> >>>>> + if ((PAGE_ALIGN(addr + size) <= size) ||
>> >>>>> + (PAGE_ALIGN(addr + size) <= addr))
>> >>>>> + return ERR_PTR(-EINVAL);
>> >>>>> +
>> >>>>
>> >>>> Can access_ok() be used here ?
>> >>>>
>> >>>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>> >>>> addr, size))
>> >>>> return ERR_PTR(-EINVAL);
>> >>>>
>> >>>
>> >>> No, this will break the current ODP semantics.
>> >>>
>> >>> ODP allows the user to register memory that is not accessible yet.
>> >>> This is a critical design feature, as it allows avoiding holding
>> >>> a registration cache. Adding this check will break the behavior,
>> >>> forcing memory to be all accessible when registering an ODP MR.
>> >>>
>> >>
>> >> Where's the check for the range being in userspace memory space,
>> >> especially for the ODP case ?
>> >>
>> >> For non ODP case (eg. plain old behavior), does get_user_pages()
>> >> ensure the requested pages fit in userspace region on all
>> >> architectures ? I think so.
>> >
>> > Yes, get_user_pages will return a smaller amount of pages than
>> requested
>> > if it encounters an unmapped region (or a region without write
>> > permissions for write requests). If this happens, the loop in
>> > ib_umem_get calls get_user_pages again with the next set of pages, and
>> > this time if it the first page still cannot be mapped an error is
>> returned.
>> >
>> >>
>> >> In ODP case, I'm not sure such check is ever done ?
>> >
>> > In ODP, we also call get_user_pages, but only when a page fault occurs
>> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
>> register
>> > a memory region that contains unmapped virtual space, and then mmap
>> > different files into that area without needing to re-register.
>> >
>>
>> OK, thanks for the description.
>>
>> ...
>>
>> Another related question: as the large memory range could be registered
>> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
>> what's prevent the kernel to map a file as the result of mmap(0, ...)
>> in this region, making it available remotely through IBV_WR_RDMA_READ /
>> IBV_WR_RDMA_WRITE ?
>>
>
> This is not a bug. This is a feature.
>
> Exposing a file through RDMA, using ODP, can be done exactly like this.
> Given that the application explicitly requested this behavior, I don't
> see why it is a problem. Actually, some of our tests use such flows.
> The mmu notifiers mechanism allow us to do this safely. When the page is
> written back to disk, it is removed from the ODP mapping. When it is
> accessed by the HCA, it is brought back to RAM.
>

I want to add that we would like to see users registering a very large memory region (perhaps the entire process address space) for local access, and then enabling remote access only to specific regions using memory windows. However, this isn't supported yet by our driver. Still, there are valid cases where you would still want the results of an mmap(0,...) call to be remotely accessible, in cases where there is enough trust between the local process and the remote process. It may help a middleware communication library register a large portion of the address space in advance, and still work with random pointers given to it by another application module.

Regards,
Haggai-

2015-04-02 20:40:17

by Yann Droneaud

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,

Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:[email protected]]
> > Sent: Thursday, April 02, 2015 7:35 PM

> > Another related question: as the large memory range could be registered
> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> > what's prevent the kernel to map a file as the result of mmap(0, ...)
> > in this region, making it available remotely through IBV_WR_RDMA_READ /
> > IBV_WR_RDMA_WRITE ?
> >
>
> This is not a bug. This is a feature.
>
> Exposing a file through RDMA, using ODP, can be done exactly like this.
> Given that the application explicitly requested this behavior, I don't
> see why it is a problem.

If the application cannot choose what will end up in the region it has
registered, it's an issue !

What might happen if one library in a program call mmap(0, size, ...) to
load a file storing a secret (a private key), and that file ends up
being mapped in an registered but otherwise free region (afaict, the
kernel is allowed to do it) ?
What might happen if one library in a program call call mmap(0,
size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
write in this location a secret (a passphrase), and that area ends up
in the memory region registered for on demand paging ?

The application haven't choose to disclose these confidential piece of
information, but they are available for reading/writing by remote
through RDMA given it knows the rkey of the memory region (which is a
32bits value).

I hope I'm missing something, because I'm not feeling confident such
behavor is a feature.


> Actually, some of our tests use such flows.
> The mmu notifiers mechanism allow us to do this safely. When the page is
> written back to disk, it is removed from the ODP mapping. When it is
> accessed by the HCA, it is brought back to RAM.
>

I'm not discussing about the benefit of On Demand Paging and why it's a
very good feature to expose files through RDMA.

I'm trying to understand how the application can choose what is exposed
through RDMA if it registers a very large memory region for later use
(but do not actually explicitly map something there yet): what's the
consequences ?

void *start = sbrk(0);
size_t size = ULONG_MAX - (unsigned long)start;

ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)


Regards.

--
Yann Droneaud
OPTEYA

2015-04-03 08:39:53

by Haggai Eran

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

On Thursday, April 2, 2015 11:40 PM, Yann Droneaud <[email protected]> wrote:
> Le jeudi 02 avril 2015 ? 16:44 +0000, Shachar Raindel a ?crit :
>> > -----Original Message-----
>> > From: Yann Droneaud [mailto:[email protected]]
>> > Sent: Thursday, April 02, 2015 7:35 PM
>
>> > Another related question: as the large memory range could be registered
>> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
>> > what's prevent the kernel to map a file as the result of mmap(0, ...)
>> > in this region, making it available remotely through IBV_WR_RDMA_READ /
>> > IBV_WR_RDMA_WRITE ?
>> >
>>
>> This is not a bug. This is a feature.
>>
>> Exposing a file through RDMA, using ODP, can be done exactly like this.
>> Given that the application explicitly requested this behavior, I don't
>> see why it is a problem.
>
> If the application cannot choose what will end up in the region it has
> registered, it's an issue !
>
> What might happen if one library in a program call mmap(0, size, ...) to
> load a file storing a secret (a private key), and that file ends up
> being mapped in an registered but otherwise free region (afaict, the
> kernel is allowed to do it) ?
> What might happen if one library in a program call call mmap(0,
> size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
> write in this location a secret (a passphrase), and that area ends up
> in the memory region registered for on demand paging ?
>
> The application haven't choose to disclose these confidential piece of
> information, but they are available for reading/writing by remote
> through RDMA given it knows the rkey of the memory region (which is a
> 32bits value).
>
> I hope I'm missing something, because I'm not feeling confident such
> behavor is a feature.

What we are aiming for is the possibility to register the entire process' address
space for RDMA operations (if the process chooses to use this feature).
This is similar to multiple threads accessing the same address space. I'm sure
you wouldn't be complaining about the ability of one thread to access the secret
passphrase mmapped by another thread in your example.

> I'm trying to understand how the application can choose what is exposed
> through RDMA if it registers a very large memory region for later use
> (but do not actually explicitly map something there yet): what's the
> consequences ?
>
> void *start = sbrk(0);
> size_t size = ULONG_MAX - (unsigned long)start;
>
> ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)

The consequences are exactly as you wrote. Just as giving a non-ODP rkey
to a remote node allows the node to access the registered memory behind that
rkey, giving an ODP rkey to a remote node allows that node to access the
virtual address space behind that rkey.

Regards,
Haggai-

2015-04-03 11:50:35

by Yann Droneaud

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,

Le vendredi 03 avril 2015 à 08:39 +0000, Haggai Eran a écrit :
> On Thursday, April 2, 2015 11:40 PM, Yann Droneaud <[email protected]> wrote:
> > Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit :
> >> > -----Original Message-----
> >> > From: Yann Droneaud [mailto:[email protected]]
> >> > Sent: Thursday, April 02, 2015 7:35 PM
> >
> >> > Another related question: as the large memory range could be registered
> >> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> >> > what's prevent the kernel to map a file as the result of mmap(0, ...)
> >> > in this region, making it available remotely through IBV_WR_RDMA_READ /
> >> > IBV_WR_RDMA_WRITE ?
> >> >
> >>
> >> This is not a bug. This is a feature.
> >>
> >> Exposing a file through RDMA, using ODP, can be done exactly like this.
> >> Given that the application explicitly requested this behavior, I don't
> >> see why it is a problem.
> >
> > If the application cannot choose what will end up in the region it has
> > registered, it's an issue !
> >
> > What might happen if one library in a program call mmap(0, size, ...) to
> > load a file storing a secret (a private key), and that file ends up
> > being mapped in an registered but otherwise free region (afaict, the
> > kernel is allowed to do it) ?
> > What might happen if one library in a program call call mmap(0,
> > size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
> > write in this location a secret (a passphrase), and that area ends up
> > in the memory region registered for on demand paging ?
> >
> > The application haven't choose to disclose these confidential piece of
> > information, but they are available for reading/writing by remote
> > through RDMA given it knows the rkey of the memory region (which is a
> > 32bits value).
> >
> > I hope I'm missing something, because I'm not feeling confident such
> > behavor is a feature.
>
> What we are aiming for is the possibility to register the entire process' address
> space for RDMA operations (if the process chooses to use this feature).
> This is similar to multiple threads accessing the same address space. I'm sure
> you wouldn't be complaining about the ability of one thread to access the secret
> passphrase mmapped by another thread in your example.
>
> > I'm trying to understand how the application can choose what is exposed
> > through RDMA if it registers a very large memory region for later use
> > (but do not actually explicitly map something there yet): what's the
> > consequences ?
> >
> > void *start = sbrk(0);
> > size_t size = ULONG_MAX - (unsigned long)start;
> >
> > ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)
>
> The consequences are exactly as you wrote. Just as giving a non-ODP rkey
> to a remote node allows the node to access the registered memory behind that
> rkey, giving an ODP rkey to a remote node allows that node to access the
> virtual address space behind that rkey.
>

There's a difference: it's impossible to give a valid non-ODP rkey that
point to a memory region not already mapped (backed by a file for
example), so the application *choose* the content of the memory to be
made accessible remotely before making it accessible.

As I understand the last explanation regarding ODP, at creation time,
an ODP rkey can point to a free, unused, unallocated memory portion.
At this point the kernel can happily map anything the application
(and its libraries) want to map at a (almost) *random* address that
could be in (or partially in) the ODP memory region.

And I have a problem with such random behavior. Allowing this is seems
dangerous and should be done with care.

I believe the application must kept the control of what's end up in its
ODP registered memory region.

Especially for multi thread program: imagine one thread creating a large
memory region for its future purposes, then send the rkey to a remote
peer and wait for some work to be done.
In the mean time another call mmap(0, ...) to map a file at a kernel
chosen address, and that address happen to be in the memory region
registered by the other thread:

1) the first thread is amputated from a portion of memory it was
willing to use;
2) the data used by the second thread is accessible to the remote
peer(s) while not expected.

Speculatively registering memory seems dangerous for any use case I
could think of.

Regards.

--
Yann Droneaud
OPTEYA

2015-04-08 12:19:43

by Yann Droneaud

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,

Le jeudi 02 avril 2015 à 16:34 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:[email protected]]
> > Sent: Thursday, April 02, 2015 6:16 PM
> > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > > -----Original Message-----
> > > > From: Yann Droneaud [mailto:[email protected]]
> > > > Sent: Thursday, April 02, 2015 1:05 PM
> > > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > ...
> > > > > + /*
> > > > > + * If the combination of the addr and size requested for this
> > > > memory
> > > > > + * region causes an integer overflow, return error.
> > > > > + */
> > > > > + if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > > + (PAGE_ALIGN(addr + size) <= addr))
> > > > > + return ERR_PTR(-EINVAL);
> > > > > +
> > > >
> > > > Can access_ok() be used here ?
> > > >
> > > > if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > > > addr, size))
> > > > return ERR_PTR(-EINVAL);
> > > >
> > >
> > > No, this will break the current ODP semantics.
> > >
> > > ODP allows the user to register memory that is not accessible yet.
> > > This is a critical design feature, as it allows avoiding holding
> > > a registration cache. Adding this check will break the behavior,
> > > forcing memory to be all accessible when registering an ODP MR.
> > >
> >
> > Failed to notice previously, but since this would break ODP, and ODP is
> > only available starting v3.19-rc1, my proposed fix might be applicable
> > for older kernel (if not better).
> >
>
> Can you explain how this proposed fix is better than the existing patch?
> Why do we want to push to the stable tree a patch that is not in the
> upstream? There is an existing, tested, patch that is going to the tip
> of the development. It even applies cleanly on every kernel version around.
>

access_ok() check for overflow *and* that the region is the memory range
for the current process. The later check is not done in your proposed
fix (but it should not be needed as get_user_pages() will be called
to validate the whole region for non-ODP memory registration).

Anyway, AFAIK access_ok() won't check for address being not NULL and
size not being 0, and I've noticed your proposed fix also ensure address
is not equal to NULL and, more important, that size is not equal to 0:
before v3.15-rc1 and commit eeb8461e36c9 ("IB: Refactor umem to use
linear SG table"), calling ib_umem_get() with size equal to 0 would
succeed with any arbitrary address ... who knows what might happen in
the lowlevel drivers (aka. providers) if they got an umem for a 0-sized
memory region.
This part of the changes was not detailled in your commit message: it's
an issue not related to overflow which is addressed by your patch.

So I agree my proposed patch is no better than yours: I've missed the
0-sized memory region issue and didn't take care of NULL address.

Regards.

--
Yann Droneaud
OPTEYA

2015-04-08 12:44:18

by Yann Droneaud

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,

Le mercredi 08 avril 2015 à 14:19 +0200, Yann Droneaud a écrit :
> Le jeudi 02 avril 2015 à 16:34 +0000, Shachar Raindel a écrit :
> > > -----Original Message-----
> > > From: Yann Droneaud [mailto:[email protected]]
> > > Sent: Thursday, April 02, 2015 6:16 PM
> > > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > > > -----Original Message-----
> > > > > From: Yann Droneaud [mailto:[email protected]]
> > > > > Sent: Thursday, April 02, 2015 1:05 PM
> > > > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > > ...
> > > > > > + /*
> > > > > > + * If the combination of the addr and size requested for this
> > > > > memory
> > > > > > + * region causes an integer overflow, return error.
> > > > > > + */
> > > > > > + if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > > > + (PAGE_ALIGN(addr + size) <= addr))
> > > > > > + return ERR_PTR(-EINVAL);
> > > > > > +
> > > > >
> > > > > Can access_ok() be used here ?
> > > > >
> > > > > if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > > > > addr, size))
> > > > > return ERR_PTR(-EINVAL);
> > > > >
> > > >
> > > > No, this will break the current ODP semantics.
> > > >
> > > > ODP allows the user to register memory that is not accessible yet.
> > > > This is a critical design feature, as it allows avoiding holding
> > > > a registration cache. Adding this check will break the behavior,
> > > > forcing memory to be all accessible when registering an ODP MR.
> > > >
> > >
> > > Failed to notice previously, but since this would break ODP, and ODP is
> > > only available starting v3.19-rc1, my proposed fix might be applicable
> > > for older kernel (if not better).
> > >
> >
> > Can you explain how this proposed fix is better than the existing patch?
> > Why do we want to push to the stable tree a patch that is not in the
> > upstream? There is an existing, tested, patch that is going to the tip
> > of the development. It even applies cleanly on every kernel version around.
> >
>
> access_ok() check for overflow *and* that the region is the memory range
> for the current process. The later check is not done in your proposed
> fix (but it should not be needed as get_user_pages() will be called
> to validate the whole region for non-ODP memory registration).
>
> Anyway, AFAIK access_ok() won't check for address being not NULL and
> size not being 0, and I've noticed your proposed fix also ensure address
> is not equal to NULL and, more important, that size is not equal to 0

It only check address not being 0 if size is already PAGE_SIZE aligned,
and it only check size not being 0 if address is already PAGE_SIZE
aligned.

> before v3.15-rc1 and commit eeb8461e36c9 ("IB: Refactor umem to use
> linear SG table"), calling ib_umem_get() with size equal to 0 would
> succeed with any arbitrary address ... who knows what might happen in
> the lowlevel drivers (aka. providers) if they got an umem for a 0-sized
> memory region.
> This part of the changes was not detailled in your commit message: it's
> an issue not related to overflow which is addressed by your patch.
>
> So I agree my proposed patch is no better than yours: I've missed the
> 0-sized memory region issue and didn't take care of NULL address.
>

Regards.

--
Yann Droneaud
OPTEYA

2015-04-13 13:29:54

by Yann Droneaud

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

Hi,

Le jeudi 02 avril 2015 à 18:12 +0000, Haggai Eran a écrit :
> On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote:
> >> -----Original Message-----
> >> From: Yann Droneaud [mailto:[email protected]]
> >> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
> >> > On 02/04/2015 16:30, Yann Droneaud wrote:
> >> >> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> >> >>>> -----Original Message-----
> >> >>>> From: Yann Droneaud [mailto:[email protected]]
> >> >>>> Sent: Thursday, April 02, 2015 1:05 PM
> >> >>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> >> >>
> >> >>>>> + /*
> >> >>>>> + * If the combination of the addr and size requested for this
> >> >>>> memory
> >> >>>>> + * region causes an integer overflow, return error.
> >> >>>>> + */
> >> >>>>> + if ((PAGE_ALIGN(addr + size) <= size) ||
> >> >>>>> + (PAGE_ALIGN(addr + size) <= addr))
> >> >>>>> + return ERR_PTR(-EINVAL);
> >> >>>>> +
> >> >>>>
> >> >>>> Can access_ok() be used here ?
> >> >>>>
> >> >>>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >> >>>> addr, size))
> >> >>>> return ERR_PTR(-EINVAL);
> >> >>>>
> >> >>>
> >> >>> No, this will break the current ODP semantics.
> >> >>>
> >> >>> ODP allows the user to register memory that is not accessible yet.
> >> >>> This is a critical design feature, as it allows avoiding holding
> >> >>> a registration cache. Adding this check will break the behavior,
> >> >>> forcing memory to be all accessible when registering an ODP MR.
> >> >>>
> >> >>
> >> >> Where's the check for the range being in userspace memory space,
> >> >> especially for the ODP case ?
> >> >>
> >> >> For non ODP case (eg. plain old behavior), does get_user_pages()
> >> >> ensure the requested pages fit in userspace region on all
> >> >> architectures ? I think so.
> >> >
> >> > Yes, get_user_pages will return a smaller amount of pages than
> >> requested
> >> > if it encounters an unmapped region (or a region without write
> >> > permissions for write requests). If this happens, the loop in
> >> > ib_umem_get calls get_user_pages again with the next set of pages, and
> >> > this time if it the first page still cannot be mapped an error is
> >> returned.
> >> >
> >> >>
> >> >> In ODP case, I'm not sure such check is ever done ?
> >> >
> >> > In ODP, we also call get_user_pages, but only when a page fault occurs
> >> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
> >> register
> >> > a memory region that contains unmapped virtual space, and then mmap
> >> > different files into that area without needing to re-register.
> >> >
> >>
> >> OK, thanks for the description.
> >>
> >> ...
> >>
> >> Another related question: as the large memory range could be registered
> >> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> >> what's prevent the kernel to map a file as the result of mmap(0, ...)
> >> in this region, making it available remotely through IBV_WR_RDMA_READ /
> >> IBV_WR_RDMA_WRITE ?
> >>
> >
> > This is not a bug. This is a feature.
> >
> > Exposing a file through RDMA, using ODP, can be done exactly like this.
> > Given that the application explicitly requested this behavior, I don't
> > see why it is a problem. Actually, some of our tests use such flows.
> > The mmu notifiers mechanism allow us to do this safely. When the page is
> > written back to disk, it is removed from the ODP mapping. When it is
> > accessed by the HCA, it is brought back to RAM.
> >
>
> I want to add that we would like to see users registering a very large
> memory region (perhaps the entire process address space) for local
> access, and then enabling remote access only to specific regions using
> memory windows. However, this isn't supported yet by our driver.

In such scheme, the registration must still be handled "manually":
one has to create a memory window to get a rkey to be exchanged with
a peer, so why one would want to register such a large memory region
(the whole process space) ?

I guess creating the memory window is faster than registering memory
region. I'd rather say this is not an excuse to register a larger
memory region (up to the whole process space, current and future) as it
sounds like a surprising optimisation: let the HCA known too many
pages just to be sure it already knows some when the process want to
use them. It seems it would become difficult to handle if there's too
many processes.

I would prefer creating memory region becoming costless (through ODP :).

> Still, there are valid cases where you would still want the results
> of an mmap(0,...) call to be remotely accessible, in cases where there
> is enough trust between the local process and the remote process.

mmap(0, ...., fd) let the kernel choose where to put the file in
process virtual memory space, so it may, may not, partially, end up in
an ODP pre registered memory region for a range unallocated/unused yet.

I don't think one want such to happen.

> It may help a middleware communication library register a large
> portion of the address space in advance, and still work with random
> pointers given to it by another application module.
>

But as said in the beginnig of your message, the middleware would have
bind a memory window before posting work request / exposing rkey for
the "random pointers".

So I fail to understand how could be used ODP when it comes to
registering a memory region not yet backed by something.

Regards.

--
Yann Droneaud
OPTEYA

2015-04-14 08:12:12

by Haggai Eran

[permalink] [raw]
Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access

On 13/04/2015 16:29, Yann Droneaud wrote:
> Le jeudi 02 avril 2015 à 18:12 +0000, Haggai Eran a écrit :
...
>>
>> I want to add that we would like to see users registering a very large
>> memory region (perhaps the entire process address space) for local
>> access, and then enabling remote access only to specific regions using
>> memory windows. However, this isn't supported yet by our driver.
>
> In such scheme, the registration must still be handled "manually":
> one has to create a memory window to get a rkey to be exchanged with
> a peer, so why one would want to register such a large memory region
> (the whole process space) ?
>
> I guess creating the memory window is faster than registering memory
> region.
Right.

It takes time to create and fill the hardware's page tables. Using
memory windows allows you to reuse the work done previously, while still
having a more granular control over the RDMA permissions. The larger MR
can be created with only local permissions, and the memory window can
add specific remote permissions to a smaller range. The memory window
utilizes the page tables created for the memory region.

> I'd rather say this is not an excuse to register a larger
> memory region (up to the whole process space, current and future) as it
> sounds like a surprising optimisation: let the HCA known too many
> pages just to be sure it already knows some when the process want to
> use them. It seems it would become difficult to handle if there's too
> many processes.
Are you worried about pinning too many pages? That is an issue we want
to solve with ODP :)

>
> I would prefer creating memory region becoming costless (through ODP :).
I agree :)

>
>> Still, there are valid cases where you would still want the results
>> of an mmap(0,...) call to be remotely accessible, in cases where there
>> is enough trust between the local process and the remote process.
>
> mmap(0, ...., fd) let the kernel choose where to put the file in
> process virtual memory space, so it may, may not, partially, end up in
> an ODP pre registered memory region for a range unallocated/unused yet.
>
> I don't think one want such to happen.
I think that in some cases the benefit of allowing this outweigh the
risks. This is why it is an opt-in feature.

>
>> It may help a middleware communication library register a large
>> portion of the address space in advance, and still work with random
>> pointers given to it by another application module.
>>
>
> But as said in the beginnig of your message, the middleware would have
> bind a memory window before posting work request / exposing rkey for
> the "random pointers".
>
> So I fail to understand how could be used ODP when it comes to
> registering a memory region not yet backed by something.

In this scenario, the middleware would first register the full address
space as an ODP memory region with local permissions only. When it wants
to provide remote access to some buffer, it would bind a memory window
over the ODP MR. This is possible with multiple processes because it
uses the virtual memory system without pinning. It won't cause random
mmap regions to be mapped for RDMA without the specific intent of the
application.

However, we currently don't have support for memory windows over ODP
MRs. Even if we did, there is some performance penalty due to binding
and invalidating memory windows. Some applications will still need full
process address space access for RDMA.

Regards,
Haggai