by Jason A. Donenfeld

Subject: Re: [PATCH v6 1/3] random: add vgetrandom_alloc() syscall

On Thu, Nov 24, 2022 at 01:24:42PM +0100, Jason A. Donenfeld wrote:
> Hi Florian,
>
> On Thu, Nov 24, 2022 at 01:15:24PM +0100, Florian Weimer wrote:
> > * Jason A. Donenfeld:
> >
> > > Hi Florian,
> > >
> > > On Thu, Nov 24, 2022 at 06:25:39AM +0100, Florian Weimer wrote:
> > >> * Jason A. Donenfeld:
> > >>
> > >> > Hi Florian,
> > >> >
> > >> > On Wed, Nov 23, 2022 at 11:46:58AM +0100, Florian Weimer wrote:
> > >> >> * Jason A. Donenfeld:
> > >> >>
> > >> >> > + * The vgetrandom() function in userspace requires an opaque state, which this
> > >> >> > + * function provides to userspace, by mapping a certain number of special pages
> > >> >> > + * into the calling process. It takes a hint as to the number of opaque states
> > >> >> > + * desired, and returns the number of opaque states actually allocated, the
> > >> >> > + * size of each one in bytes, and the address of the first state.
> > >> >> > + */
> > >> >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num,
> > >> >> > + unsigned long __user *, size_per_each, unsigned int, flags)
> > >> >>
> > >> >> I think you should make this __u64, so that you get a consistent
> > >> >> userspace interface on all architectures, without the need for compat
> > >> >> system calls.
> > >> >
> > >> > That would be quite unconventional. Most syscalls that take lengths do
> > >> > so with the native register size (`unsigned long`, `size_t`), rather
> > >> > than u64. If you can point to a recent trend away from this by
> > >> > indicating some commits that added new syscalls with u64, I'd be happy
> > >> > to be shown otherwise. But AFAIK, that's not the way it's done.
> > >>
> > >> See clone3 and struct clone_args.
> > >
> > > The struct is one thing. But actually, clone3 takes a `size_t`:
> > >
> > > SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size)
> > >
> > > I take from this that I too should use `size_t` rather than `unsigned
> > > long.` And it doesn't seem like there's any compat clone3.
> >
> > But vgetrandom_alloc takes unsigned long *, not unsigned long.
> > You need to look at the contents for struct clone_args for comparison.
>
> The other direction would be making this a u32

I think `unsigned int` is actually a sensible size for what these values
should be. That eliminates the problem and potential bikeshed too. So
I'll go with that for v+1.
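Florian's compat concern comes down to the fact that the kernel writes its results *through* the pointers, so the pointee's width must match between caller and kernel. The sketch below is only a userspace simulation (the struct and both store helpers are hypothetical, not kernel code). It models an ILP32 caller whose `unsigned long` out-parameters are 4 bytes each, and shows that a native 8-byte store clobbers the neighbouring variable, while a 4-byte store, which is what `unsigned int` gives on every Linux ABI, does not.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of two adjacent out-parameters in a 32-bit (ILP32)
 * caller, where unsigned long is 4 bytes wide. */
struct ilp32_caller_vars {
	uint32_t num;           /* caller's 4-byte "unsigned long" */
	uint32_t size_per_each; /* next variable in memory */
};

/* Two uint32_t members: no padding, so an 8-byte store stays in bounds. */
_Static_assert(sizeof(struct ilp32_caller_vars) == 8, "unexpected padding");

/* Simulates a 64-bit kernel storing an 8-byte unsigned long through the
 * caller's pointer without any compat translation. */
void store_native_u64(void *uptr, uint64_t value)
{
	memcpy(uptr, &value, sizeof(value));
}

/* Simulates storing a 4-byte unsigned int, which has the same width in
 * 32-bit and 64-bit processes on Linux. */
void store_u32(void *uptr, uint32_t value)
{
	memcpy(uptr, &value, sizeof(value));
}
```

With `struct ilp32_caller_vars v = { 0, 1234 };`, calling `store_native_u64(&v, 16)` overwrites `v.size_per_each` (with 0 on little-endian, 16 on big-endian), whereas `store_u32(&v, 16)` sets `v.num` to 16 and leaves `v.size_per_each` at 1234.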

Jason

2022-11-25 08:05:12

by Rasmus Villemoes

Subject: Re: [PATCH v6 2/3] random: introduce generic vDSO getrandom() implementation

On 24/11/2022 02.18, Jason A. Donenfeld wrote:
> Hi Rasmus,
>
> On Wed, Nov 23, 2022 at 09:51:04AM +0100, Rasmus Villemoes wrote:
>> On 21/11/2022 16.29, Jason A. Donenfeld wrote:
>>
>> Cc += linux-api
>>
>>>
>>> 	if (!new_block)
>>> 		goto out;
>>> 	new_cap = grnd_allocator.cap + num;
>>> 	new_states = reallocarray(grnd_allocator.states, new_cap, sizeof(*grnd_allocator.states));
>>> 	if (!new_states) {
>>> 		munmap(new_block, num * size_per_each);
>>
>> Hm. This does leak an implementation detail of vgetrandom_alloc(),
>> namely that it is based on mmap() of that size rounded up to page size.
>> Do we want to commit to this being the proper way of disposing of a
>> successful vgetrandom_alloc(), or should there also be a
>> vgetrandom_free(void *states, long num, long size_per_each)?
>
> Yes, this is intentional, and this is exactly what I wanted to do. There
> are various wrappers of vm_mmap() throughout, mmap being one of them,
> and they typically then resort to munmap to unmap it. This is how
> userspace handles memory - maps, always maps. So I think doing that is
> fine and consistent.

OK. Perhaps, for the benefit of future libc implementors, drop a comment
somewhere about how to deallocate the blob.
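For such a comment, the allocate-and-dispose pattern from the quoted example could be sketched as below. Because the syscall in this thread is still a proposal under review, `fake_vgetrandom_alloc()` is a hypothetical stand-in that simply mmap()s the requested size; `struct grnd_allocator` and `grnd_allocator_grow()` are likewise illustrative, loosely following the quoted snippet. The part worth noticing is the error path, which releases the blob with munmap() using the unrounded `num * size_per_each` length.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the proposed vgetrandom_alloc(): maps at least
 * num * size_per_each bytes; the kernel rounds the length up to whole pages. */
static char *fake_vgetrandom_alloc(unsigned int num, unsigned int size_per_each)
{
	void *p = mmap(NULL, (size_t)num * size_per_each, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Illustrative free list of opaque states, modeled on the quoted snippet. */
struct grnd_allocator {
	void **states; /* pointers to unused states */
	size_t len;    /* number of entries in states[] */
	size_t cap;    /* allocated capacity of states[] */
};

/* Adds num fresh states to the free list; returns 0 on success, -1 on failure. */
int grnd_allocator_grow(struct grnd_allocator *a, unsigned int num,
			unsigned int size_per_each)
{
	char *new_block = fake_vgetrandom_alloc(num, size_per_each);
	if (!new_block)
		return -1;

	size_t new_cap = a->cap + num;
	void **new_states = reallocarray(a->states, new_cap, sizeof(*a->states));
	if (!new_states) {
		/* Undo the mapping; munmap() covers every page touched by the range. */
		munmap(new_block, (size_t)num * size_per_each);
		return -1;
	}
	a->states = new_states;
	a->cap = new_cap;

	/* Carve the block into num consecutive states. */
	for (unsigned int i = 0; i < num; i++)
		a->states[a->len++] = new_block + (size_t)i * size_per_each;
	return 0;
}
```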

> However, your point about it relying on it being a rounded up size isn't
> correct. `munmap` will unmap the whole page if the size you pass lies
> within a page. So `num * size_per_each` will always do the right thing,
> without needing to have userspace code round anything up. (From the man
> page: "The address addr must be a multiple of the page size (but length
> need not be).

I know, and I never said userspace needed to round anything up.

> All pages containing a part of the indicated range are
> unmapped.") And as you can see in my example code, nothing is rounded
> up. So I don't know why you made that comment.

I made that comment because it's clear from what this does that you get
something back that is _at least_ num*size_per_each in size, but what is
not clear is that the actual allocation is exactly and will always be
that size rounded up to a page size (and no more), so that
munmap(num*size_per_each), with its well-known and documented semantics,
will DTRT.
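For a plain anonymous mmap() the property Rasmus wants documented can be checked directly. The sketch below uses mmap() as a stand-in for vgetrandom_alloc(), under the assumption that the syscall's mapping is exactly the page-rounded length (which is the very thing being asked to be guaranteed). It uses mincore(), which fails with ENOMEM when the queried range contains unmapped memory, to confirm that an unrounded munmap() length releases every page. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Rounds len up to a multiple of the system page size. */
size_t round_up_to_page(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	return (len + page - 1) / page * page;
}

/* Returns 1 if every page in [addr, addr + round_up_to_page(len)) is mapped,
 * 0 if any of it is unmapped, and -1 on any other error. */
int range_is_mapped(void *addr, size_t len)
{
	unsigned char vec[256]; /* one byte per page; enough for this sketch */
	size_t rounded = round_up_to_page(len);

	if (rounded / (size_t)sysconf(_SC_PAGESIZE) > sizeof(vec))
		return -1;
	if (mincore(addr, rounded, vec) == 0)
		return 1;
	return errno == ENOMEM ? 0 : -1;
}

/* Maps num * size_per_each bytes, frees it with the same unrounded length,
 * and returns 0 if the whole page-rounded range is gone afterwards. */
int unrounded_munmap_releases_all(unsigned int num, unsigned int size_per_each)
{
	size_t len = (size_t)num * size_per_each;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	if (range_is_mapped(p, len) != 1) {
		munmap(p, len);
		return -1;
	}
	if (munmap(p, len) != 0)
		return -1;
	return range_is_mapped(p, len) == 0 ? 0 : -1;
}
```

For example, 5 states of 1000 bytes span two pages on a 4 KiB-page system, and `munmap(p, 5000)` removes both.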

> I think adding more control is exactly what this is trying to avoid.
> It's very intentionally *not* a general allocator function, but
> something specific for vDSO getrandom(). However, it does already, in
> this very patchset here, take a (currently unused) flags argument, in
> case we have the need for later extension.

OK.
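A reserved flags argument only keeps that door open if the kernel rejects every bit it does not yet understand; otherwise existing callers could pass garbage that later acquires a meaning. Whether v6 already does this is not visible in this excerpt. The usual pattern looks roughly like the following, written as plain C with a hypothetical mask name (in the syscall itself the check would sit at the top of the SYSCALL_DEFINE3 body).

```c
#include <errno.h>

/* Hypothetical mask of defined flags: none exist yet, so it is empty. */
#define VGRND_ALLOC_VALID_FLAGS 0u

/* Returns 0 if flags only contains known bits, -EINVAL otherwise. */
int vgrnd_check_flags(unsigned int flags)
{
	if (flags & ~VGRND_ALLOC_VALID_FLAGS)
		return -EINVAL;
	return 0;
}
```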

Perhaps you can spend a few more words on why this allocation _needs_ to
be MAP_LOCKED? That seems somewhat of a policy thing imposed by the
kernel, something that would be better left to the libc or distro or
whatnot to request via a flag. I could imagine applications that
currently run at the mlock limit start failing after a libc upgrade -
which could of course be considered a libc problem, and perhaps it's too
unlikely to worry about.
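To gauge that risk, a libc could compare the page-rounded size of a request against the caller's RLIMIT_MEMLOCK soft limit, which is what MAP_LOCKED mappings are charged against for processes without CAP_IPC_LOCK. The sketch assumes the proposed syscall is accounted the same way as a userspace mmap(MAP_LOCKED). Both helper names are hypothetical, and reading the amount already locked (for example VmLck in /proc/self/status) is left out, so `already_locked` is a caller-supplied parameter.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/resource.h>
#include <unistd.h>

/* Returns 1 if len bytes, rounded up to whole pages of size page, fit in a
 * locked-memory limit of `limit` bytes of which `already_locked` are in use. */
int locked_alloc_fits(size_t len, size_t page, rlim_t limit, size_t already_locked)
{
	size_t rounded = (len + page - 1) / page * page;

	if (limit == RLIM_INFINITY)
		return 1;
	return already_locked <= limit && rounded <= limit - already_locked;
}

/* Checks a request of num states of size_per_each bytes against the current
 * soft RLIMIT_MEMLOCK, assuming nothing else is locked yet.
 * Returns 1 or 0 as above, or -1 if getrlimit() fails. */
int memlock_limit_allows(unsigned int num, unsigned int size_per_each)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	return locked_alloc_fits((size_t)num * size_per_each,
				 (size_t)sysconf(_SC_PAGESIZE), rl.rlim_cur, 0);
}
```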

Rasmus