Received: by 2002:a05:6358:700f:b0:131:369:b2a3 with SMTP id 15csp2391110rwo; Thu, 3 Aug 2023 08:43:33 -0700 (PDT) X-Google-Smtp-Source: APBJJlE72t/Bq765mQDm387jwMpNdpEYVGfZAAEUrB10m3/EsSMTTXRorDX7yv4UxHCLFQ5QU1T/ X-Received: by 2002:a2e:a306:0:b0:2b9:f0b4:eab7 with SMTP id l6-20020a2ea306000000b002b9f0b4eab7mr8811048lje.18.1691077412666; Thu, 03 Aug 2023 08:43:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691077412; cv=none; d=google.com; s=arc-20160816; b=fARwLkq2JxSiTD28eNRdqbYw71aHPKnhYr+FcKvjMuvjgWvYplBawK2v+YqZBY6nLA B2nRkvoKIP11uul7YoKt3XVd75YP5ocZu3uS7FhtY9z8L+npXRKKq7EC45IDCd/z3fvC +AM3Dyx2c/eEoT8T3AtFGVhZ28rEqLP7mPkL4FNuMa9ovFYUSj9A8KKPi6hmQjLY1QIg Vhst9tA1xpU+AZcXfmh428cbjRMq6buNYF1Bk9tRsW2oJC89ndgSNVPOEPefigrr/9YP /ZSC2r23u+/a04I4F7p7LVSIbS9+WDCXRx5SZFiE6sOD4/sHkzfJItOOJlTJxJKGiiwN LFOA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=sdjOUaagf8wzj+9dXABDn81ONTEy4ukFajiNJ6J9Xz0=; fh=EBBRXFbqwdqcg0ufdoesRglY86PJQnHc9a/D68rHzPg=; b=IVWzBgxgjyiBtKqFDUPJ45wLOKLMaoxrP2EupikmqUft92kVNzSr2A5rvPH1mqmuGQ AToPhy9+OiKHEmHzQfInESCLvFSrQhzZEHcY3Q8YXJY5VTOjMguB3WaKZYlHZPfMmjOP eBQqjXzFpanQHITfMiI3V3mO7wHx8XKxIxGnaVFjo8+fKcywDwNNI7YDf/DsXSOpzJ5w qYf1xwkjKqFz7r/0bh7x4LGahU/iC8FoK8bkRH+d37h8COBqG5MmNxyXjhgJLj9f0wyF FDBhEoZsacwfNEKPjTUB3sMvCqIunUyWI+KGeB+dr4D53heQT4YLX5eS+gQ7qDjx0B8B qUcw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=frjeGjmo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e6-20020a170906374600b0099319803901si11898449ejc.844.2023.08.03.08.43.07; Thu, 03 Aug 2023 08:43:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=frjeGjmo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234941AbjHCOnq (ORCPT + 99 others); Thu, 3 Aug 2023 10:43:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50126 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234812AbjHCOmt (ORCPT ); Thu, 3 Aug 2023 10:42:49 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3779E49D8 for ; Thu, 3 Aug 2023 07:41:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1691073674; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=sdjOUaagf8wzj+9dXABDn81ONTEy4ukFajiNJ6J9Xz0=; b=frjeGjmon6iwVcqsA0TpqEPau/S/vixfr2Z1plCSTa6ltF6HHeQNVbjUqpQ3w08liESQlX bcRccjoVs9wEe+TvvBBk8C4az2lQkRDNG26sytUx6/MxmpjfIu9e+HuVxZ0UMPLewZEDms y04J3+s3Fh0HESh++nN6oUcjWZndqOk= Received: from mimecast-mx02.redhat.com (66.187.233.73 [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-397-az8IbsBMNxi6IQHCDoCwAw-1; Thu, 03 Aug 2023 10:41:11 -0400 X-MC-Unique: az8IbsBMNxi6IQHCDoCwAw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 101783C025B3; Thu, 3 Aug 2023 14:41:11 +0000 (UTC) Received: from localhost (unknown [10.39.194.215]) by smtp.corp.redhat.com (Postfix) with ESMTP id 86AA71401C2E; Thu, 3 Aug 2023 14:41:10 +0000 (UTC) From: Stefan Hajnoczi To: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Alex Williamson , Stefan Hajnoczi Subject: [PATCH] vfio: align capability structures Date: Thu, 3 Aug 2023 10:41:09 -0400 Message-ID: <20230803144109.2331944-1-stefanha@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The VFIO_DEVICE_GET_INFO, VFIO_DEVICE_GET_REGION_INFO, and VFIO_IOMMU_GET_INFO ioctls fill in an info struct followed by capability structs: +------+---------+---------+-----+ | info | caps[0] | caps[1] | ... | +------+---------+---------+-----+ Both the info and capability struct sizes are not always multiples of sizeof(u64), leaving u64 fields in later capability structs misaligned. Userspace applications currently need to handle misalignment manually in order to support CPU architectures and programming languages with strict alignment requirements. Make life easier for userspace by ensuring alignment in the kernel. The new layout is as follows: +------+---+---------+---------+---+-----+ | info | 0 | caps[0] | caps[1] | 0 | ... | +------+---+---------+---------+---+-----+ In this example info and caps[1] have sizes that are not multiples of sizeof(u64), so zero padding is added to align the subsequent structure. Adding zero padding between structs does not break the uapi. The memory layout is specified by the info.cap_offset and caps[i].next fields filled in by the kernel. Applications use these field values to locate structs and are therefore unaffected by the addition of zero padding. Signed-off-by: Stefan Hajnoczi --- include/linux/vfio.h | 2 +- drivers/gpu/drm/i915/gvt/kvmgt.c | 7 +++-- drivers/s390/cio/vfio_ccw_ops.c | 7 +++-- drivers/vfio/pci/vfio_pci_core.c | 14 ++++++--- drivers/vfio/vfio_iommu_type1.c | 7 +++-- drivers/vfio/vfio_main.c | 53 +++++++++++++++++++++++++++----- 6 files changed, 71 insertions(+), 19 deletions(-) diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 2c137ea94a3e..ff0864e73cc3 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -272,7 +272,7 @@ struct vfio_info_cap { struct vfio_info_cap_header *vfio_info_cap_add(struct vfio_info_cap *caps, size_t size, u16 id, u16 version); -void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset); +ssize_t vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset); int vfio_info_add_capability(struct vfio_info_cap *caps, struct vfio_info_cap_header *cap, size_t size); diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index de675d799c7d..9060e9c6ac7c 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1297,7 +1297,10 @@ static long intel_vgpu_ioctl(struct vfio_device *vfio_dev, unsigned int cmd, info.argsz = sizeof(info) + caps.size; info.cap_offset = 0; } else { - vfio_info_cap_shift(&caps, sizeof(info)); + ssize_t cap_offset = vfio_info_cap_shift(&caps, sizeof(info)); + if (cap_offset < 0) + return cap_offset; + if (copy_to_user((void __user *)arg + sizeof(info), caps.buf, caps.size)) { @@ -1305,7 +1308,7 @@ static long intel_vgpu_ioctl(struct vfio_device *vfio_dev, unsigned int cmd, kfree(sparse); return -EFAULT; } - info.cap_offset = sizeof(info); + info.cap_offset = cap_offset; } kfree(caps.buf); diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c index 5b53b94f13c7..63d5163376a5 100644 --- a/drivers/s390/cio/vfio_ccw_ops.c +++ b/drivers/s390/cio/vfio_ccw_ops.c @@ -361,13 +361,16 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_ccw_private *private, info->argsz = sizeof(*info) + caps.size; info->cap_offset = 0; } else { - vfio_info_cap_shift(&caps, sizeof(*info)); + ssize_t cap_offset = vfio_info_cap_shift(&caps, sizeof(*info)); + if (cap_offset < 0) + return cap_offset; + if (copy_to_user((void __user *)arg + sizeof(*info), caps.buf, caps.size)) { kfree(caps.buf); return -EFAULT; } - info->cap_offset = sizeof(*info); + info->cap_offset = cap_offset; } kfree(caps.buf); diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 20d7b69ea6ff..92c093b99187 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -966,12 +966,15 @@ static int vfio_pci_ioctl_get_info(struct vfio_pci_core_device *vdev, if (info.argsz < sizeof(info) + caps.size) { info.argsz = sizeof(info) + caps.size; } else { - vfio_info_cap_shift(&caps, sizeof(info)); + ssize_t cap_offset = vfio_info_cap_shift(&caps, sizeof(info)); + if (cap_offset < 0) + return cap_offset; + if (copy_to_user(arg + 1, caps.buf, caps.size)) { kfree(caps.buf); return -EFAULT; } - info.cap_offset = sizeof(*arg); + info.cap_offset = cap_offset; } kfree(caps.buf); @@ -1107,12 +1110,15 @@ static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev, info.argsz = sizeof(info) + caps.size; info.cap_offset = 0; } else { - vfio_info_cap_shift(&caps, sizeof(info)); + ssize_t cap_offset = vfio_info_cap_shift(&caps, sizeof(info)); + if (cap_offset < 0) + return cap_offset; + if (copy_to_user(arg + 1, caps.buf, caps.size)) { kfree(caps.buf); return -EFAULT; } - info.cap_offset = sizeof(*arg); + info.cap_offset = cap_offset; } kfree(caps.buf); diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index ebe0ad31d0b0..ab64b9e3ed7c 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2808,14 +2808,17 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu, if (info.argsz < sizeof(info) + caps.size) { info.argsz = sizeof(info) + caps.size; } else { - vfio_info_cap_shift(&caps, sizeof(info)); + ssize_t cap_offset = vfio_info_cap_shift(&caps, sizeof(info)); + if (cap_offset < 0) + return cap_offset; + if (copy_to_user((void __user *)arg + sizeof(info), caps.buf, caps.size)) { kfree(caps.buf); return -EFAULT; } - info.cap_offset = sizeof(info); + info.cap_offset = cap_offset; } kfree(caps.buf); diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index f0ca33b2e1df..4fc8698577a7 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1171,8 +1171,18 @@ struct vfio_info_cap_header *vfio_info_cap_add(struct vfio_info_cap *caps, { void *buf; struct vfio_info_cap_header *header, *tmp; + size_t header_offset; + size_t new_size; - buf = krealloc(caps->buf, caps->size + size, GFP_KERNEL); + /* + * Reserve extra space when the previous capability was not a multiple of + * the largest field size. This ensures that capabilities are properly + * aligned. + */ + header_offset = ALIGN(caps->size, sizeof(u64)); + new_size = header_offset + size; + + buf = krealloc(caps->buf, new_size, GFP_KERNEL); if (!buf) { kfree(caps->buf); caps->buf = NULL; @@ -1181,10 +1191,10 @@ struct vfio_info_cap_header *vfio_info_cap_add(struct vfio_info_cap *caps, } caps->buf = buf; - header = buf + caps->size; + header = buf + header_offset; /* Eventually copied to user buffer, zero */ - memset(header, 0, size); + memset(buf + caps->size, 0, new_size - caps->size); header->id = id; header->version = version; @@ -1193,20 +1203,47 @@ struct vfio_info_cap_header *vfio_info_cap_add(struct vfio_info_cap *caps, for (tmp = buf; tmp->next; tmp = buf + tmp->next) ; /* nothing */ - tmp->next = caps->size; - caps->size += size; + tmp->next = header_offset; + caps->size = new_size; return header; } EXPORT_SYMBOL_GPL(vfio_info_cap_add); -void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset) +/* + * Adjust the capability next fields to account for the given offset at which + * capability structures start and any padding added for alignment. Returns the + * cap_offset or -errno. + */ +ssize_t vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset) { struct vfio_info_cap_header *tmp; + struct vfio_info_cap_header *next_tmp; void *buf = (void *)caps->buf; + size_t pad = ALIGN(offset, sizeof(u64)) - offset; + size_t cap_offset = offset + pad; - for (tmp = buf; tmp->next; tmp = buf + tmp->next - offset) - tmp->next += offset; + /* Shift the next fields to account for offset and pad */ + for (tmp = buf; tmp->next; tmp = next_tmp) { + next_tmp = buf + tmp->next; + tmp->next += cap_offset; + } + + /* Pad with zeroes so capabilities start with proper alignment */ + buf = krealloc(caps->buf, caps->size + pad, GFP_KERNEL); + if (!buf) { + kfree(caps->buf); + caps->buf = NULL; + caps->size = 0; + return -ENOMEM; + } + + memmove(buf + pad, buf, caps->size); + memset(buf, 0, pad); + + caps->buf = buf; + caps->size += pad; + return cap_offset; } EXPORT_SYMBOL(vfio_info_cap_shift); -- 2.41.0