From: Pavel Emelyanov
Date: Thu, 23 Apr 2015 09:29:07 +0300
To: Andrea Arcangeli
CC: Linux Kernel Mailing List, Linux MM, Linux API, Sanidhya Kashyap
Subject: Re: [PATCH 2/3] uffd: Introduce the v2 API
Message-ID: <55389133.8070701@parallels.com>
In-Reply-To: <20150421121817.GD4481@redhat.com>

On 04/21/2015 03:18 PM, Andrea Arcangeli wrote:
> On Wed, Mar 18, 2015 at 10:35:17PM +0300, Pavel Emelyanov wrote:
>> +		if (!(ctx->features & UFFD_FEATURE_LONGMSG)) {
>
> If we are to use different protocols, it'd be nicer to have two
> different methods to assign to userfaultfd_fops.read that calls an
> __always_inline function, so that the above check can be optimized
> away at build time when the inline is expanded. So the branch is
> converted to calling a different pointer to function which is zero
> additional cost.

OK :)

>> +			/* careful to always initialize addr if ret == 0 */
>> +			__u64 uninitialized_var(addr);
>> +			__u64 uninitialized_var(mtype);
>> +			if (count < sizeof(addr))
>> +				return ret ? ret : -EINVAL;
>> +			_ret = userfaultfd_ctx_read(ctx, no_wait, &mtype, &addr);
>> +			if (_ret < 0)
>> +				return ret ? ret : _ret;
>> +			BUG_ON(mtype != UFFD_PAGEFAULT);
>> +			if (put_user(addr, (__u64 __user *) buf))
>> +				return ret ? ret : -EFAULT;
>> +			_ret = sizeof(addr);
>> +		} else {
>> +			struct uffd_v2_msg msg;
>> +			if (count < sizeof(msg))
>> +				return ret ? ret : -EINVAL;
>> +			_ret = userfaultfd_ctx_read(ctx, no_wait, &msg.type, &msg.arg);
>> +			if (_ret < 0)
>> +				return ret ? ret : _ret;
>> +			if (copy_to_user(buf, &msg, sizeof(msg)))
>> +				return ret ? ret : -EINVAL;
>> +			_ret = sizeof(msg);
>
> Reading 16bytes instead of 8bytes for each fault, probably wouldn't
> move the needle much in terms of userfaultfd_read performance. Perhaps
> we could consider using the uffd_v2_msg unconditionally and then have
> a single protocol differentiated by the feature bits.

So your proposal is to always report 16 bytes per PF from read() and
let userspace decide for itself how to handle the result?

> The only reason to have two different protocols would be to be able to
> read 8 bytes per userfault, in the cooperative usage (i.e. qemu
> postcopy). But if we do that we want to use the __always_inline trick
> to avoid branches and additional runtime costs (otherwise we may as
> well forget all microoptimizations and read 16bytes always).
>
>> @@ -992,6 +1013,12 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
>>  	/* careful not to leak info, we only read the first 8 bytes */
>>  	uffdio_api.bits = UFFD_API_BITS;
>>  	uffdio_api.ioctls = UFFD_API_IOCTLS;
>> +
>> +	if (uffdio_api.api == UFFD_API_V2) {
>> +		ctx->features |= UFFD_FEATURE_LONGMSG;
>> +		uffdio_api.bits |= UFFD_API_V2_BITS;
>> +	}
>> +
>>  	ret = -EFAULT;
>>  	if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
>>  		goto out;
>
> The original meaning of the bits is:
>
> If UFFD_BIT_WRITE was set in api.bits, it means the
> !!(address&UFFD_BIT_WRITE) tells if it was a write fault (missing or
> WP).
>
> If UFFD_BIT_WP was set in api.bits, it means the
> !!(address&UFFD_BIT_WP) tells if it was a WP fault (if not set it
> means it was a missing fault).
>
> Currently api.bits sets only UFFD_BIT_WRITE, and UFFD_BIT_WP will be
> set later, after the WP tracking mode will be implemented.
>
> I'm uncertain how bits translated to features and if they should be
> unified or only have features.
>
>> +struct uffd_v2_msg {
>> +	__u64 type;
>> +	__u64 arg;
>> +};
>> +
>> +#define UFFD_PAGEFAULT	0x1
>> +
>> +#define UFFD_PAGEFAULT_BIT	(1 << (UFFD_PAGEFAULT - 1))
>> +#define __UFFD_API_V2_BITS	(UFFD_PAGEFAULT_BIT)
>> +
>> +/*
>> + * Lower PAGE_SHIFT bits are used to report those supported
>> + * by the pagefault message itself. Other bits are used to
>> + * report the message types v2 API supports
>> + */
>> +#define UFFD_API_V2_BITS	(__UFFD_API_V2_BITS << 12)
>> +
>
> And why exactly is this 12 hardcoded?

Ah, it should have been PAGE_SHIFT, but I was unsure whether it would
be OK to have different shifts on different arches. But taking into
account your comment that the bits field is a bad place for these
values, if we introduce a new .features field in the api message,
then this 12 will just go away.

> And which field should be masked
> with the bits? In the V1 protocol it was the "arg" (userfault address)
> not the "type". So this is a bit confusing and probably requires
> simplification.

I see. Actually I decided that since the bits above the 12th (for x86)
are always 0 in the api message (no bits are allowed there, since the
pfn sits in this place), it would be OK to put the non-PF bits there.

Would it be better to introduce another .features field in the uffd
API message?

-- Pavel
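
For reference, the __always_inline suggestion quoted above could look roughly like the following userspace sketch. It is an illustration under stated assumptions, not the kernel code: struct demo_ctx, demo_read_common(), demo_read_v1(), demo_read_v2() and demo_select_read() are hypothetical names, the event queue is reduced to a single pending message, and memcpy() stands in for put_user()/copy_to_user(). Only struct uffd_v2_msg and UFFD_PAGEFAULT are taken from the patch. The always_inline attribute is the GCC/Clang spelling.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Message layout quoted from the patch under review. */
struct uffd_v2_msg {
	uint64_t type;
	uint64_t arg;
};

#define UFFD_PAGEFAULT	0x1

/* Hypothetical stand-in for the context: one pending event. */
struct demo_ctx {
	uint64_t type;
	uint64_t arg;
};

/*
 * Shared body. 'longmsg' is a compile-time constant at both call
 * sites below, so once this is inlined the compiler can drop the
 * branch that does not apply to that caller.
 */
static inline __attribute__((always_inline)) long
demo_read_common(const struct demo_ctx *ctx, void *buf, size_t count,
		 const int longmsg)
{
	if (!longmsg) {
		/* Short format: 8 bytes, the fault address only. */
		if (count < sizeof(ctx->arg))
			return -EINVAL;
		memcpy(buf, &ctx->arg, sizeof(ctx->arg));
		return (long)sizeof(ctx->arg);
	} else {
		/* Long format: 16 bytes, type followed by arg. */
		struct uffd_v2_msg msg = { .type = ctx->type, .arg = ctx->arg };

		if (count < sizeof(msg))
			return -EINVAL;
		memcpy(buf, &msg, sizeof(msg));
		return (long)sizeof(msg);
	}
}

/* Two thin entry points, one per protocol. */
static long demo_read_v1(const struct demo_ctx *ctx, void *buf, size_t count)
{
	return demo_read_common(ctx, buf, count, 0);
}

static long demo_read_v2(const struct demo_ctx *ctx, void *buf, size_t count)
{
	return demo_read_common(ctx, buf, count, 1);
}

typedef long (*demo_read_fn)(const struct demo_ctx *, void *, size_t);

/* Chosen once (e.g. at API negotiation), not tested on every read. */
static demo_read_fn demo_select_read(int api_is_v2)
{
	return api_is_v2 ? demo_read_v2 : demo_read_v1;
}
```

The per-read feature check becomes a one-time choice of function pointer. Whether that is worth keeping two protocols, rather than always reading 16 bytes, is the open question in the thread.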