by Steven Price

Subject: Re: [PATCH v9 5/6] KVM: arm64: ioctl to fetch/store tags in a guest

On 09/03/2021 17:57, Marc Zyngier wrote:
> On Mon, 01 Mar 2021 14:23:14 +0000,
> Steven Price <[email protected]> wrote:
>>
>> The VMM may not wish to have its own mapping of guest memory mapped
>> with PROT_MTE because this causes problems if the VMM has tag checking
>> enabled (the guest controls the tags in physical RAM and it's unlikely
>> the tags are correct for the VMM).
>>
>> Instead add a new ioctl which allows the VMM to easily read/write the
>> tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE
>> while the VMM can still read/write the tags for the purpose of
>> migration.
>>
>> Signed-off-by: Steven Price <[email protected]>
>> ---
>> arch/arm64/include/uapi/asm/kvm.h | 13 +++++++
>> arch/arm64/kvm/arm.c | 57 +++++++++++++++++++++++++++++++
>> include/uapi/linux/kvm.h | 1 +
>> 3 files changed, 71 insertions(+)
>>
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>> index 24223adae150..5fc2534ac5df 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -184,6 +184,19 @@ struct kvm_vcpu_events {
>> __u32 reserved[12];
>> };
>>
>> +struct kvm_arm_copy_mte_tags {
>> +	__u64 guest_ipa;
>> +	__u64 length;
>> +	union {
>> +		void __user *addr;
>> +		__u64 padding;
>> +	};
>> +	__u64 flags;
>
> I'd be keen on a couple of reserved __u64s. Just in case...

Fair enough, I'll add a __u64 reserved[2];

>> +};
>> +
>> +#define KVM_ARM_TAGS_TO_GUEST 0
>> +#define KVM_ARM_TAGS_FROM_GUEST 1
>> +
>> /* If you need to interpret the index values, here is the key: */
>> #define KVM_REG_ARM_COPROC_MASK 0x000000000FFF0000
>> #define KVM_REG_ARM_COPROC_SHIFT 16
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 46bf319f6cb7..01d404833e24 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -1297,6 +1297,53 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>> }
>> }
>>
>> +static int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>> +				      struct kvm_arm_copy_mte_tags *copy_tags)
>> +{
>> +	gpa_t guest_ipa = copy_tags->guest_ipa;
>> +	size_t length = copy_tags->length;
>> +	void __user *tags = copy_tags->addr;
>> +	gpa_t gfn;
>> +	bool write = !(copy_tags->flags & KVM_ARM_TAGS_FROM_GUEST);
>> +
>> +	if (copy_tags->flags & ~KVM_ARM_TAGS_FROM_GUEST)
>> +		return -EINVAL;
>> +
>> +	if (length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK)
>> +		return -EINVAL;
>
> It is a bit odd to require userspace to provide a page-aligned
> addr/size, as it now has to find out about the kernel's page
> size. MTE_GRANULE_SIZE-aligned values would make more sense. Is there
> an underlying reason for this?

No fundamental reason. My thoughts were:

* It's likely user space is naturally going to be using page-aligned
quantities during migration, so it already has to care about this.

* It makes the loop below easier.

* It's easy to relax the restriction in the future if it becomes a
problem, much harder to tighten it without breaking anything.

But I can switch to MTE_GRANULE_SIZE if you'd prefer, let me know.
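
To make the difference between the two options concrete, here is a minimal
userspace sketch of the alignment check under either rule. It is only an
illustration: range_is_aligned() is a hypothetical helper, not something in
the patch, and MTE_GRANULE_SIZE is redefined locally as 16 to mirror the
kernel's value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's MTE_GRANULE_SIZE (one tag per 16 bytes). */
#define MTE_GRANULE_SIZE 16u

/*
 * Hypothetical helper: true if both the start address and the length are
 * multiples of 'align'. This is the same test as the patch's
 * "length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK", with the alignment
 * passed in rather than fixed at PAGE_SIZE.
 */
static bool range_is_aligned(uint64_t ipa, uint64_t length, uint64_t align)
{
	/* The mask trick only works for a non-zero power of two. */
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((ipa | length) & (align - 1)) == 0;
}
```

With the page-aligned rule the VMM would pass the kernel's page size (for
example from sysconf(_SC_PAGESIZE)); with the relaxed rule it would pass
MTE_GRANULE_SIZE and never need to know the page size.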

>> +
>> +	gfn = gpa_to_gfn(guest_ipa);
>> +
>> +	while (length > 0) {
>> +		kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
>> +		void *maddr;
>> +		unsigned long num_tags = PAGE_SIZE / MTE_GRANULE_SIZE;
>> +
>> +		if (is_error_noslot_pfn(pfn))
>> +			return -ENOENT;
>> +
>> +		maddr = page_address(pfn_to_page(pfn));
>> +
>> +		if (!write) {
>> +			num_tags = mte_copy_tags_to_user(tags, maddr, num_tags);
>> +			kvm_release_pfn_clean(pfn);
>> +		} else {
>> +			num_tags = mte_copy_tags_from_user(maddr, tags,
>> +							   num_tags);
>> +			kvm_release_pfn_dirty(pfn);
>> +		}
>> +
>
> Is it actually safe to do this without holding any lock, without
> checking anything against the mmu_notifier_seq? What if the pages are
> being swapped out? Or the memslot removed from under your feet?
>
> It looks... dangerous. Do you even want to allow this while vcpus are
> actually running?

Umm... yeah, I'm not sure how I managed to forget the locks. This should
be holding kvm->slots_lock to prevent the slot going away under our feet. I
was surprised that lockdep didn't catch that, until I noticed I'd
disabled it (and rediscovered why: the model makes it incredibly slow).
However, I've now done a run with it enabled, and with
kvm->slots_lock taken it's happy.

gfn_to_pfn_prot() internally calls a variant of get_user_pages() - so
swapping out shouldn't be a problem.

In terms of running with the vcpus running - given this is going to be
used for migration I think that's pretty much a requirement. We want to
be able to dump the tags while executing to enable early transfer of the
memory.

Steve

>> +		if (num_tags != PAGE_SIZE / MTE_GRANULE_SIZE)
>> +			return -EFAULT;
>> +
>> +		gfn++;
>> +		tags += num_tags;
>> +		length -= PAGE_SIZE;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> long kvm_arch_vm_ioctl(struct file *filp,
>> unsigned int ioctl, unsigned long arg)
>> {
>> @@ -1333,6 +1380,16 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>
>> return 0;
>> }
>> +	case KVM_ARM_MTE_COPY_TAGS: {
>> +		struct kvm_arm_copy_mte_tags copy_tags;
>> +
>> +		if (!kvm_has_mte(kvm))
>> +			return -EINVAL;
>> +
>> +		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
>> +			return -EFAULT;
>> +		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
>> +	}
>> default:
>> return -EINVAL;
>> }
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 05618a4abf7e..b75af0f9ba55 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1423,6 +1423,7 @@ struct kvm_s390_ucas_mapping {
>> /* Available with KVM_CAP_PMU_EVENT_FILTER */
>> #define KVM_SET_PMU_EVENT_FILTER _IOW(KVMIO, 0xb2, struct kvm_pmu_event_filter)
>> #define KVM_PPC_SVM_OFF _IO(KVMIO, 0xb3)
>> +#define KVM_ARM_MTE_COPY_TAGS _IOR(KVMIO, 0xb4, struct kvm_arm_copy_mte_tags)
>>
>> /* ioctl for vm fd */
>> #define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device)
>> --
>> 2.20.1
>>
>>
>
> Thanks,
>
> M.
>

2021-03-11 12:38:25

by Steven Price

Subject: Re: [PATCH v9 6/6] KVM: arm64: Document MTE capability and ioctl

On 09/03/2021 11:01, Peter Maydell wrote:
> On Mon, 1 Mar 2021 at 14:23, Steven Price <[email protected]> wrote:
>>
>> A new capability (KVM_CAP_ARM_MTE) identifies that the kernel supports
>> granting a guest access to the tags, and provides a mechanism for the
>> VMM to enable it.
>>
>> A new ioctl (KVM_ARM_MTE_COPY_TAGS) provides a simple way for a VMM to
>> access the tags of a guest without having to maintain a PROT_MTE mapping
>> in userspace. The above capability gates access to the ioctl.
>>
>> Signed-off-by: Steven Price <[email protected]>
>> ---
>> Documentation/virt/kvm/api.rst | 37 ++++++++++++++++++++++++++++++++++
>> 1 file changed, 37 insertions(+)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index aed52b0fc16e..1406ea138127 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -4939,6 +4939,23 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
>> Allows Xen vCPU attributes to be read. For the structure and types,
>> see KVM_XEN_VCPU_SET_ATTR above.
>>
>> +4.131 KVM_ARM_MTE_COPY_TAGS
>> +---------------------------
>> +
>> +:Capability: KVM_CAP_ARM_MTE
>> +:Architectures: arm64
>> +:Type: vm ioctl
>> +:Parameters: struct kvm_arm_copy_mte_tags
>> +:Returns: 0 on success, < 0 on error
>> +
>> +Copies Memory Tagging Extension (MTE) tags to/from guest tag memory.
>
> Mostly virt/kvm/api.rst seems to include documentation of the
> associated structs, something like:
>
> ::
>
>   struct kvm_arm_copy_mte_tags {
>         __u64 guest_ipa;
>         __u64 length;
>         union {
>                 void __user *addr;
>                 __u64 padding;
>         };
>         __u64 flags;
>   };
>
>
> which saves the reader having to cross-reference against the header file.

Good point - I'll add that.

> It also means you can more naturally use the actual field names in the doc,
> eg:
>
>> +The
>> +starting address and length of guest memory must be ``PAGE_SIZE`` aligned.
>
> you could say "The guest_ipa and length fields" here.
>
> Also "The addr field must point to a buffer which the tags will
> be copied to or from." I assume.

Indeed - I'll add the clarification.

>> +The size of the buffer to store the tags is ``(length / MTE_GRANULE_SIZE)``
>> +bytes (i.e. 1/16th of the corresponding size).
>
>> + Each byte contains a single tag
>> +value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
>> +``PTRACE_POKEMTETAGS``.
>
> What are the valid values for 'flags' ? It looks like they specify which
> direction the copy is, which we definitely need to document here.

Yes, either KVM_ARM_TAGS_TO_GUEST or KVM_ARM_TAGS_FROM_GUEST - again I'll
clarify that.
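
As a rough illustration of how the buffer sizing and the flags fit together,
here is a userspace sketch of preparing a request that reads tags out of the
guest. The struct is a local mirror of the one in patch 5/6 with the
reserved[2] field proposed in review, so it is an assumption about the final
layout rather than a released UAPI. tag_buffer_size() and prepare_tag_read()
are hypothetical helper names, and the ioctl call itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumptions: values mirror the patch under review, not a released header. */
#define MTE_GRANULE_SIZE	16u
#define KVM_ARM_TAGS_TO_GUEST	0
#define KVM_ARM_TAGS_FROM_GUEST	1

/* Local mirror of struct kvm_arm_copy_mte_tags (layout is an assumption). */
struct copy_mte_tags_args {
	uint64_t guest_ipa;
	uint64_t length;
	uint64_t addr;		/* user buffer; 64-bit like the addr/padding union */
	uint64_t flags;
	uint64_t reserved[2];	/* proposed in review, must be zero */
};

/* One tag byte per 16-byte granule, i.e. 1/16th of the guest range. */
static size_t tag_buffer_size(uint64_t length)
{
	assert(length % MTE_GRANULE_SIZE == 0);
	return (size_t)(length / MTE_GRANULE_SIZE);
}

/* Fill in a request that copies tags from the guest into 'buf'. */
static void prepare_tag_read(struct copy_mte_tags_args *args,
			     uint64_t ipa, uint64_t length, void *buf)
{
	memset(args, 0, sizeof(*args));	/* also clears the reserved fields */
	args->guest_ipa = ipa;
	args->length = length;
	args->addr = (uint64_t)(uintptr_t)buf;
	args->flags = KVM_ARM_TAGS_FROM_GUEST;
}
```

For a single 4K page the buffer is 256 bytes. Restoring tags on the
destination side of a migration would use the same layout with flags set to
KVM_ARM_TAGS_TO_GUEST.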

> What happens if the caller requests a tag copy for an area of guest
> address space which doesn't have tags (eg it has nothing mapped),
> or for an area of guest address space which has tags in some parts
> but not in others ?

Guest memory either exists (and has tags) or doesn't exist (assuming MTE
is enabled for the guest). So the cases this can fail are:

* The region isn't completely covered with memslots
* The region isn't completely writable (and KVM_ARM_TAGS_TO_GUEST is
specified).
* User space doesn't have access to the memory (i.e. the memory would
SIGSEGV or similar if the VMM accessed it).

Currently all the above produce the error -ENOENT which, now that I come to
enumerate the cases, doesn't seem like a great error code (it's really
only appropriate for the first)! Perhaps -EFAULT would be better.

>> +
>> 5. The kvm_run structure
>> ========================
>>
>> @@ -6227,6 +6244,25 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them.
>> This capability can be used to check / enable 2nd DAWR feature provided
>> by POWER10 processor.
>>
>> +7.23 KVM_CAP_ARM_MTE
>> +--------------------
>> +
>> +:Architectures: arm64
>> +:Parameters: none
>> +
>> +This capability indicates that KVM (and the hardware) supports exposing the
>> +Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
>> +VMM before the guest will be granted access.
>> +
>> +When enabled the guest is able to access tags associated with any memory given
>> +to the guest. KVM will ensure that the pages are flagged ``PG_mte_tagged`` so
>> +that the tags are maintained during swap or hibernation of the host, however
>
> s/,/;/

Yep

>> +the VMM needs to manually save/restore the tags as appropriate if the VM is
>> +migrated.
>> +
>> +When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
>> +perform a bulk copy of tags to/from the guest
>
> "guest."

Good spot.

>> +
>> 8. Other capabilities.
>> ======================
>>
>> @@ -6716,3 +6752,4 @@ KVM_XEN_HVM_SET_ATTR, KVM_XEN_HVM_GET_ATTR, KVM_XEN_VCPU_SET_ATTR and
>> KVM_XEN_VCPU_GET_ATTR ioctls, as well as the delivery of exception vectors
>> for event channel upcalls when the evtchn_upcall_pending field of a vcpu's
>> vcpu_info is set.
>> +
>> --
>> 2.20.1
>
>
> Stray whitespace change ?

Not sure how that got there - but will remove.

Thanks,

Steve