2013-04-18 15:24:20

by Mathieu Desnoyers

[permalink] [raw]
Subject: [commit review] vm: add vm_iomap_memory() helper function

Hi Linus,

Poking my nose into this new helper,

> commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e
> Author: Linus Torvalds <[email protected]>
> Date: Tue Apr 16 13:45:37 2013 -0700
>
> vm: add vm_iomap_memory() helper function
[...]
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2393,6 +2393,53 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> }
> EXPORT_SYMBOL(remap_pfn_range);
>
> +/**
> + * vm_iomap_memory - remap memory to userspace
> + * @vma: user vma to map to
> + * @start: start of area
> + * @len: size of area
> + *
> + * This is a simplified io_remap_pfn_range() for common driver use. The
> + * driver just needs to give us the physical memory range to be mapped,
> + * we'll figure out the rest from the vma information.
> + *
> + * NOTE! Some drivers might want to tweak vma->vm_page_prot first to get
> + * whatever write-combining details or similar.
> + */
> +int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned long len)
> +{
> + unsigned long vm_len, pfn, pages;
> +
> + /* Check that the physical memory area passed in looks valid */
> + if (start + len < start)

is len == 0 accepted ?

> + return -EINVAL;
> + /*
> + * You *really* shouldn't map things that aren't page-aligned,
> + * but we've historically allowed it because IO memory might
> + * just have smaller alignment.
> + */
> + len += start & ~PAGE_MASK;
> + pfn = start >> PAGE_SHIFT;
> + pages = (len + ~PAGE_MASK) >> PAGE_SHIFT;
> + if (pfn + pages < pfn)
> + return -EINVAL;
> +
> + /* We start the mapping 'vm_pgoff' pages into the area */
> + if (vma->vm_pgoff > pages)

This seems to accept vma->vm_pgoff == pages, which would leave exactly 0
backing pages. Is this expected ?

> + return -EINVAL;
> + pfn += vma->vm_pgoff;
> + pages -= vma->vm_pgoff;
> +
> + /* Can we fit all of the mapping? */
> + vm_len = vma->vm_end - vma->vm_start;
> + if (vm_len >> PAGE_SHIFT > pages)

Is it possible that we get a situation where vm_end and vm_start are not
aligned on page addresses ? If it's allowed, then this test may succeed
even though the range requires more than "pages" backing pages. This
might be an issue since the first thing remap_pfn_range() does with its
"size" argument is to PAGE_ALIGN() it.

Thoughts ?

Thanks,

Mathieu

> + return -EINVAL;
> +
> + /* Ok, let it rip */
> + return io_remap_pfn_range(vma, vma->vm_start, pfn, vm_len, vma->vm_page_prot);
> +}
> +EXPORT_SYMBOL(vm_iomap_memory);
> +
> static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
> unsigned long addr, unsigned long end,
> pte_fn_t fn, void *data)


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


2013-04-18 15:51:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [commit review] vm: add vm_iomap_memory() helper function

On Thu, Apr 18, 2013 at 8:24 AM, Mathieu Desnoyers
<[email protected]> wrote:
> Hi Linus,
>> +
>> + /* Check that the physical memory area passed in looks valid */
>> + if (start + len < start)
>
> is len == 0 accepted ?

Here, yes. We only check for *overflow*, and the canonical way to do
that is "a+b < a" (where the expressions have to be unsigned, of
course). We do this pretty commonly in the kernel, it's a pattern.

And I do know that gcc is smart enough to notice the overflow pattern
and turn this into an "add" followed by a "jump if below" without
actually having the compare instruction at all. I'm not sure it
notices the "overflow or zero" pattern, and even if it did, I think
it's not as clear on a source code level (I'd rather have an explicit
test for len being zero, unless there would be some really fundamental
performance reason why it needs to be one instruction and gcc does it
for one case but not the other).

>> + /*
>> + * You *really* shouldn't map things that aren't page-aligned,
>> + * but we've historically allowed it because IO memory might
>> + * just have smaller alignment.
>> + */
>> + len += start & ~PAGE_MASK;
>> + pfn = start >> PAGE_SHIFT;
>> + pages = (len + ~PAGE_MASK) >> PAGE_SHIFT;

Side note, there can be overflow in these additions too, so "pages"
may overflow from something really big to something really small, and
we do not test it, because we don't care. We're basically only adding
partial single pages, so even if an overflow were to happen (len
starting out as some huge number), it would overflow to 0 or 1, and
that's fine. Making the length we allow *smaller* isn't a problem.

Also, the "len" we get passed in is supposedly controlled by the
kernel (length of the kernel-allocated data mapping), so it should be
a valid value. My only reason for even considering it is that I don't
trust drivers, and I could imagine some driver using a signed "len",
and setting it to something -1 rather than 0 if an allocation failed.
I don't think that happens, but it's the kind of thing we *could* try
to explicitly check against too, and just say "you can't mmap more
than 2GB using this helper".

On the other hand I *could* also imagine some odd accelerator FPGA
thing having 4GB memory areas on some specialized 64-bit
supercomputer, so adding too many sanity checks on things like this
could be detrimental too.

>> + if (pfn + pages < pfn)
>> + return -EINVAL;

Standard overflow check again, although it's not really a big issue.
Neither pfn nor pages is user-controlled.

>> + /* We start the mapping 'vm_pgoff' pages into the area */
>> + if (vma->vm_pgoff > pages)
>
> This seems to accept vma->vm_pgoff == pages, which would leave exactly 0
> backing pages. Is this expected ?

Yes. That's fine. It's still a valid number of pages

Technically, if we were to allow zero-sized mappings, the zero length
of the source buffer would be ok too, so regardless, I don't think we
want an explicit test for zero. We just want to check that we have
enough buffer space to be mapped. "zero" *could* be enough, if we have
nothing we want to map ;)

So I don't think "len==0" should be a special case at this level.

But see below:

>> + return -EINVAL;
>> + pfn += vma->vm_pgoff;
>> + pages -= vma->vm_pgoff;
>> +
>> + /* Can we fit all of the mapping? */
>> + vm_len = vma->vm_end - vma->vm_start;
>> + if (vm_len >> PAGE_SHIFT > pages)
>
> Is it possible that we get a situation where vm_end and vm_start are not
> aligned on page addresses ?

Not unless there is some major core VM bug, in which case we shouldn't
check it here.

Also, we don't allow empty vma's, so "vm_len" is not only
page-aligned, it is known to be non-zero. So this is where a
zero-sized physical memory area (either because it was zero to begin
with, or because it became zero after the vma->vm_pgoff fixup) would
hit the wall.

> If it's allowed, then this test may succeed
> even though the range requires more than "pages" backing pages. This
> might be an issue since the first thing remap_pfn_range() does with its
> "size" argument is to PAGE_ALIGN() it.

For remap_pfn_range(), the size comes from a possibly confused caller
(most drivers are a bit confused ;), since the generic pfn remapping
can be used to remap just _parts_ of a vma.

So the generic [io_]remap_pfn_range() functions really are complicated
for a reason: they allow a much more involved and complicated use
case. It's just that for 99.9% of all drivers, they don't want that
complicated case, and they just spend a lot of time doing these checks
and updating vm_pgoff etc crap, that they really shouldn't do, and
sometimes don't do correctly.

In contrast, in this helper function, we take the size from the vma
itself, that the core VM has set up. And a non-page-aligned vma would
be horribly horribly buggy. If it happens, we'd have much bigger
problems than this small function.

But thanks for reading through it. I don't think you caught anything,
but there might be something else I missed.

Linus