Dear kernel maintainers,
This submission adds a kernel driver supporting the Intel(R) Gaussian & Neural Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor available on multiple Intel platforms. AI developers and users can offload continuous inference workloads to an Intel(R) GNA device to free processor resources and save power. Noise reduction and speech recognition are examples of the workloads Intel(R) GNA handles, although its usage is not limited to these two.
For a list of processors equipped with an Intel(R) GNA device, please refer to:
https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_supported_plugins_GNA.html
We believe contributing this driver to the upstream kernel is the best way for developers and users to get the latest Intel(R) GNA support, flowing from mainline into any Linux distribution installed on their systems. Upstreaming also enables developers around the world to contribute to the driver once it is merged.
The driver works together with Intel(R) user-space libraries and exposes a few IOCTL interfaces for their use. The libraries are open source and available at:
https://github.com/intel/gna
Prior to this submission, the following items were tested or examined against the GNA driver patch series applied on top of the v5.11-rc3 tag of the mainline kernel:
Linux Kernel patch submission checklist
https://www.kernel.org/doc/html/latest/process/submit-checklist.html?highlight=submit%20checklist
1. If you use a facility then #include the file that defines/declares that facility. Don’t depend on other header files pulling in ones that you use.
(Checked)
2. Builds cleanly:
with applicable or modified CONFIG options =y, =m, and =n. No gcc warnings/errors, no linker warnings/errors.
Passes allnoconfig, allmodconfig
Builds successfully when using O=builddir
(Tested by building the kernel with the Intel(R) GNA driver config set to ‘m’, ‘y’, and ‘n’; also with allmodconfig, allnoconfig and O=builddir)
3. Builds on multiple CPU architectures by using local cross-compile tools or some other build farm.
(Tested on x86_64 - the architecture where GNA is present and validated; please refer to drivers/misc/gna/Kconfig)
4. ppc64 is a good architecture for cross-compilation checking because it tends to use unsigned long for 64-bit quantities.
(Tested on x86_64 - the architecture where GNA is present and validated; please refer to drivers/misc/gna/Kconfig)
5. Check your patch for general style as detailed in Documentation/process/coding-style.rst. Check for trivial violations with the patch style checker prior to submission (scripts/checkpatch.pl). You should be able to justify all violations that remain in your patch.
(Checked. checkpatch.pl reported some warnings; we reviewed them and believe they can be safely ignored.)
6. Any new or modified CONFIG options do not muck up the config menu and default to off unless they meet the exception criteria documented in Documentation/kbuild/kconfig-language.rst Menu attributes: default value.
(No explicit default value is provided because the Kconfig system defaults the option to off.)
7. All new Kconfig options have help text.
(Checked)
8. Has been carefully reviewed with respect to relevant Kconfig combinations. This is very hard to get right with testing – brainpower pays off here.
(Checked)
10. Use make checkstack and fix any problems that it finds.
Note
checkstack does not point out problems explicitly, but any one function that uses more than 512 bytes on the stack is a candidate for change.
(Checked)
11. Include kernel-doc to document global kernel APIs. (Not required for static functions, but OK there also.) Use make htmldocs or make pdfdocs to check the kernel-doc and fix any issues.
(Addressed by adding new gna.rst in Documentation; tested with output from ‘make htmldocs’)
12. Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_ATOMIC_SLEEP, CONFIG_PROVE_RCU and CONFIG_DEBUG_OBJECTS_RCU_HEAD all simultaneously enabled.
(Checked)
13. Has been build- and runtime tested with and without CONFIG_SMP and CONFIG_PREEMPT.
(Checked)
15. All new /proc entries are documented under Documentation/.
(The driver doesn’t introduce any new /proc entries)
16. All new kernel boot parameters are documented in Documentation/admin-guide/kernel-parameters.rst.
(The driver doesn’t add boot parameters)
17. All new module parameters are documented with MODULE_PARM_DESC().
(Checked)
21. Newly-added code has been compiled with gcc -W (use make EXTRA_CFLAGS=-W). This will generate lots of noise, but is good for finding bugs like “warning: comparison between signed and unsigned”.
(Checked)
24. If any ioctl’s are added by the patch, then also update Documentation/userspace-api/ioctl/ioctl-number.rst.
(Updated)
The above results reflect only our understanding of the tests and of the code referred to. Please let us know of any issues or differing observations from further testing.
Thanks
Series-reviewed-by: Tony Luck <[email protected]>
Tomasz Jankowski (12):
gna: add driver module
gna: add component of hardware operation
gna: read hardware info in the driver
gna: add memory handling
gna: initialize mmu
gna: add hardware ids
gna: add request component
gna: implement scoring
gna: add a work queue to process scoring requests
gna: add interrupt handler
gna: add ioctl handler
gna: add a char device
Documentation/misc-devices/gna.rst | 48 ++
Documentation/misc-devices/index.rst | 1 +
.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 7 +
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/gna/Kbuild | 5 +
drivers/misc/gna/Kconfig | 13 +
drivers/misc/gna/gna_device.c | 451 +++++++++++++++++
drivers/misc/gna/gna_device.h | 87 ++++
drivers/misc/gna/gna_driver.c | 90 ++++
drivers/misc/gna/gna_driver.h | 41 ++
drivers/misc/gna/gna_hw.c | 136 +++++
drivers/misc/gna/gna_hw.h | 85 ++++
drivers/misc/gna/gna_ioctl.c | 249 +++++++++
drivers/misc/gna/gna_ioctl.h | 11 +
drivers/misc/gna/gna_mem.c | 472 ++++++++++++++++++
drivers/misc/gna/gna_mem.h | 107 ++++
drivers/misc/gna/gna_request.c | 466 +++++++++++++++++
drivers/misc/gna/gna_request.h | 62 +++
drivers/misc/gna/gna_score.c | 299 +++++++++++
drivers/misc/gna/gna_score.h | 20 +
include/uapi/misc/gna.h | 155 ++++++
23 files changed, 2808 insertions(+)
create mode 100644 Documentation/misc-devices/gna.rst
create mode 100644 drivers/misc/gna/Kbuild
create mode 100644 drivers/misc/gna/Kconfig
create mode 100644 drivers/misc/gna/gna_device.c
create mode 100644 drivers/misc/gna/gna_device.h
create mode 100644 drivers/misc/gna/gna_driver.c
create mode 100644 drivers/misc/gna/gna_driver.h
create mode 100644 drivers/misc/gna/gna_hw.c
create mode 100644 drivers/misc/gna/gna_hw.h
create mode 100644 drivers/misc/gna/gna_ioctl.c
create mode 100644 drivers/misc/gna/gna_ioctl.h
create mode 100644 drivers/misc/gna/gna_mem.c
create mode 100644 drivers/misc/gna/gna_mem.h
create mode 100644 drivers/misc/gna/gna_request.c
create mode 100644 drivers/misc/gna/gna_request.h
create mode 100644 drivers/misc/gna/gna_score.c
create mode 100644 drivers/misc/gna/gna_score.h
create mode 100644 include/uapi/misc/gna.h
--
2.28.0
From: Tomasz Jankowski <[email protected]>
The new char device is the node through which user-space applications
interact with the driver.
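A minimal user-space sketch of how an application might open this node
(illustrative only; it assumes the class devnode callback leaves the
default /dev/gna0 path, and the actual ioctl calls defined in
include/uapi/misc/gna.h are omitted):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/dev/gna0", O_RDWR);     /* node created as "gna%d" */

        if (fd < 0) {
                perror("open /dev/gna0");
                return 1;
        }

        /* ioctls from include/uapi/misc/gna.h would be issued here */

        close(fd);
        return 0;
}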
Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/gna_device.c | 157 ++++++++++++++++++++++++++++++++++
drivers/misc/gna/gna_driver.c | 23 ++++-
2 files changed, 178 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index 9faf0456fee2..7b2d3b4f863b 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -1,12 +1,14 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright(c) 2017-2021 Intel Corporation
+#include <linux/cdev.h>
#include <linux/module.h>
#include <linux/pci.h>
#include "gna_device.h"
#include "gna_driver.h"
#include "gna_hw.h"
+#include "gna_ioctl.h"
#include "gna_request.h"
#define GNA_DEV_HWID_CNL 0x5A11
@@ -85,6 +87,150 @@ const struct pci_device_id gna_pci_ids[] = {
MODULE_DEVICE_TABLE(pci, gna_pci_ids);
+static inline struct gna_private *inode_to_gna(struct inode *inode)
+{
+ return container_of(inode->i_cdev, struct gna_private, cdev);
+}
+
+static int gna_open(struct inode *inode, struct file *f)
+{
+ struct gna_file_private *file_priv;
+ struct gna_private *gna_priv;
+
+ gna_priv = inode_to_gna(inode);
+ if (!gna_priv)
+ return -ENODEV;
+
+ file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
+ if (!file_priv)
+ return -ENOMEM;
+
+ file_priv->fd = f;
+ file_priv->gna_priv = gna_priv;
+
+ mutex_init(&file_priv->memlist_lock);
+ INIT_LIST_HEAD(&file_priv->memory_list);
+
+ mutex_lock(&gna_priv->filelist_lock);
+ list_add_tail(&file_priv->flist, &gna_priv->file_list);
+ mutex_unlock(&gna_priv->filelist_lock);
+
+ f->private_data = file_priv;
+
+ return 0;
+}
+
+static int gna_release(struct inode *inode, struct file *f)
+{
+ struct gna_file_private *iter_file, *temp_file;
+ struct gna_memory_object *iter_mo, *temp_mo;
+ struct gna_file_private *file_priv;
+ struct gna_private *gna_priv;
+
+ gna_priv = inode_to_gna(inode);
+ if (!gna_priv)
+ return -ENODEV;
+
+ /* free all memory objects created by that file */
+ file_priv = (struct gna_file_private *)f->private_data;
+ mutex_lock(&file_priv->memlist_lock);
+ list_for_each_entry_safe(iter_mo, temp_mo, &file_priv->memory_list, file_mem_list) {
+ queue_work(gna_priv->request_wq, &iter_mo->work);
+ wait_event(iter_mo->waitq, true);
+ gna_memory_free(gna_priv, iter_mo);
+ }
+ mutex_unlock(&file_priv->memlist_lock);
+
+ gna_delete_file_requests(f, gna_priv);
+
+ /* delete itself from device's file list */
+ mutex_lock(&gna_priv->filelist_lock);
+ list_for_each_entry_safe(iter_file, temp_file, &gna_priv->file_list, flist) {
+ if (iter_file->fd == f) {
+ list_del(&iter_file->flist);
+ f->private_data = NULL;
+ kfree(iter_file);
+ break;
+ }
+ }
+ mutex_unlock(&gna_priv->filelist_lock);
+
+ return 0;
+}
+
+static const struct file_operations gna_file_ops = {
+ .owner = THIS_MODULE,
+ .open = gna_open,
+ .release = gna_release,
+ .unlocked_ioctl = gna_ioctl,
+};
+
+/* Reverse gna_dev_create() */
+static void gna_dev_release(struct gna_private *gna_priv)
+{
+ cdev_device_del(&gna_priv->cdev, &gna_priv->dev);
+
+ mutex_lock(&gna_drv_priv.lock);
+ __clear_bit(MINOR(gna_priv->dev.devt), gna_drv_priv.dev_map);
+ mutex_unlock(&gna_drv_priv.lock);
+
+ dev_set_drvdata(&gna_priv->dev, NULL);
+}
+
+static int gna_dev_create(struct gna_private *gna_priv)
+{
+ struct pci_dev *pcidev;
+ struct device *dev;
+ dev_t gna_devt;
+ int dev_num;
+ int major;
+ int minor;
+ int ret;
+
+ pcidev = gna_priv->pdev;
+
+ mutex_lock(&gna_drv_priv.lock);
+
+ dev_num = find_first_zero_bit(gna_drv_priv.dev_map, GNA_MAX_DEVICES);
+ if (dev_num == GNA_MAX_DEVICES) {
+ mutex_unlock(&gna_drv_priv.lock);
+ dev_err(&pcidev->dev, "number of gna devices reached maximum\n");
+ return -ENODEV;
+ }
+
+ set_bit(dev_num, gna_drv_priv.dev_map);
+ major = MAJOR(gna_drv_priv.devt);
+ minor = gna_drv_priv.minor++;
+
+ mutex_unlock(&gna_drv_priv.lock);
+
+ gna_devt = MKDEV(major, minor);
+ dev = &gna_priv->dev;
+ device_initialize(dev);
+ dev->devt = gna_devt;
+ dev->class = gna_class;
+ dev->parent = gna_priv->parent;
+ dev->groups = NULL;
+ dev_set_drvdata(dev, gna_priv);
+ dev_set_name(dev, "gna%d", dev_num);
+
+ cdev_init(&gna_priv->cdev, &gna_file_ops);
+ gna_priv->cdev.owner = THIS_MODULE;
+
+ ret = cdev_device_add(&gna_priv->cdev, &gna_priv->dev);
+ if (ret) {
+ mutex_lock(&gna_drv_priv.lock);
+ __clear_bit(minor, gna_drv_priv.dev_map);
+ mutex_unlock(&gna_drv_priv.lock);
+ dev_err(&gna_priv->dev, "could not add gna%d char device\n", dev_num);
+ } else {
+ dev_info(&gna_priv->dev, "registered gna%d device: major %d, minor %d\n",
+ dev_num, major, minor);
+ }
+
+ return ret;
+}
+
static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
const struct pci_device_id *pci_id)
{
@@ -133,8 +279,17 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
mutex_init(&gna_priv->reqlist_lock);
INIT_LIST_HEAD(&gna_priv->request_list);
+ ret = gna_dev_create(gna_priv);
+ if (ret) {
+ dev_err(&pcidev->dev, "could not create gna device\n");
+ goto err_del_wq;
+ }
+
return 0;
+err_del_wq:
+ destroy_workqueue(gna_priv->request_wq);
+
err_pci_drvdata_unset:
pci_set_drvdata(pcidev, NULL);
@@ -244,6 +399,8 @@ void gna_remove(struct pci_dev *pcidev)
gna_priv = pci_get_drvdata(pcidev);
+ gna_dev_release(gna_priv);
+
free_irq(gna_priv->irq, gna_priv);
gna_dev_deinit(gna_priv);
diff --git a/drivers/misc/gna/gna_driver.c b/drivers/misc/gna/gna_driver.c
index cb638dfa81ac..5219a5408bc1 100644
--- a/drivers/misc/gna/gna_driver.c
+++ b/drivers/misc/gna/gna_driver.c
@@ -46,19 +46,38 @@ static int __init gna_drv_init(void)
}
gna_class->devnode = gna_devnode;
+ ret = alloc_chrdev_region(&gna_drv_priv.devt, 0, GNA_MAX_DEVICES, "gna");
+ if (ret) {
+ pr_err("could not get major number\n");
+ goto err_destroy_class;
+ }
+
+ pr_debug("major %d\n", MAJOR(gna_drv_priv.devt));
+ pr_debug("minor %d\n", MINOR(gna_drv_priv.devt));
+
+ gna_drv_priv.minor = MINOR(gna_drv_priv.devt);
+
ret = pci_register_driver(&gna_driver);
if (ret) {
pr_err("pci register driver failed\n");
- class_destroy(gna_class);
- return ret;
+ goto err_unreg_chdev;
}
return 0;
+
+err_unreg_chdev:
+ unregister_chrdev_region(gna_drv_priv.devt, GNA_MAX_DEVICES);
+
+err_destroy_class:
+ class_destroy(gna_class);
+
+ return ret;
}
static void __exit gna_drv_exit(void)
{
pci_unregister_driver(&gna_driver);
+ unregister_chrdev_region(gna_drv_priv.devt, GNA_MAX_DEVICES);
class_destroy(gna_class);
}
--
2.28.0
From: Tomasz Jankowski <[email protected]>
Read the hardware information from the MMIO_IBUFFS register.
Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/gna_device.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index a6ef7e790e9e..869507594f9e 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -6,10 +6,13 @@
#include "gna_device.h"
#include "gna_driver.h"
+#include "gna_hw.h"
static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
const struct pci_device_id *pci_id)
{
+ u32 bld_reg;
+
pci_set_drvdata(pcidev, gna_priv);
gna_priv->parent = &pcidev->dev;
@@ -17,6 +20,9 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
gna_priv->info = *(struct gna_drv_info *)pci_id->driver_data;
gna_priv->drv_priv = &gna_drv_priv;
+ bld_reg = gna_reg_read(gna_priv->bar0.mem_addr, GNA_MMIO_IBUFFS);
+ gna_priv->hw_info.in_buf_s = bld_reg & GENMASK(7, 0);
+
return 0;
}
--
2.28.0
From: Tomasz Jankowski <[email protected]>
Set up the MMU in the driver with a new memory component.
Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/gna_device.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index f962c7e1e5e9..d6a6d3cab570 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -23,6 +23,16 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
bld_reg = gna_reg_read(gna_priv->bar0.mem_addr, GNA_MMIO_IBUFFS);
gna_priv->hw_info.in_buf_s = bld_reg & GENMASK(7, 0);
+ if (gna_mmu_alloc(gna_priv)) {
+ dev_err(&gna_priv->dev, "gna mmu allocation failed\n");
+ return -EFAULT;
+ }
+ dev_dbg(&pcidev->dev, "maximum memory size %llu num pd %d\n",
+ gna_priv->info.max_hw_mem, gna_priv->info.num_pagetables);
+ dev_dbg(&pcidev->dev, "desc rsvd size %d mmu vamax size %d\n",
+ gna_priv->info.desc_info.rsvd_size,
+ gna_priv->info.desc_info.mmu_info.vamax_size);
+
mutex_init(&gna_priv->mmu_lock);
idr_init(&gna_priv->memory_idr);
@@ -35,6 +45,7 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
static void gna_dev_deinit(struct gna_private *gna_priv)
{
idr_destroy(&gna_priv->memory_idr);
+ gna_mmu_free(gna_priv);
pci_set_drvdata(gna_priv->pdev, NULL);
}
--
2.28.0
From: Tomasz Jankowski <[email protected]>
The patch adds memory handling: mapping, DMA and pinning.
The GNA driver maps and unmaps the physical pages of a 64-byte-aligned
buffer allocated by a user-space program. The pages of mapped memory
are locked only during actual computation.
The patch also adds configuration of the DMA scatter-gather list for
the physical pages, and generation of the page table and page directory
that are programmed into the GNA HW when scoring is initiated.
The GNA MMU is configured based on each request's memory usage.
As the MMU can address up to 256 MB, a single scoring request is
limited to that amount of memory.
The GNA library can allocate any number of memory regions for GNA
usage; their number and total capacity are limited only by the OS's
resources.
Due to GNA MMU restrictions, even when multiple memory regions are
used, the sum of all memory regions referenced by a single inference
request must be less than 256 MB.
At least one GNA memory region must be allocated (and it can be shared
by multiple models). At the other extreme, each GNA tensor (e.g.,
weights/biases/inputs/outputs) could use its own separate GNA memory
region.
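For illustration only, a minimal sketch of the arithmetic behind that
limit. It reuses gna_buffer_get_size() added by this patch;
GNA_MMU_VA_LIMIT and gna_request_fits_mmu() are hypothetical names used
only for this example and are not part of the driver.

#include <linux/kernel.h>

#include "gna_mem.h"    /* gna_buffer_get_size(), added by this patch */

#define GNA_MMU_VA_LIMIT (256u * 1024 * 1024)   /* hypothetical: 256 MB MMU limit */

/* Sum the page-rounded footprint of each (offset, size) pair and check
 * that the whole request stays within what the MMU can address. */
static bool gna_request_fits_mmu(const u64 *offsets, const u64 *sizes,
                                 int count)
{
        u64 total = 0;
        int i;

        for (i = 0; i < count; i++)
                total += gna_buffer_get_size(offsets[i], sizes[i]);

        return total <= GNA_MMU_VA_LIMIT;
}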
Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/Kbuild | 2 +-
drivers/misc/gna/gna_device.c | 6 +
drivers/misc/gna/gna_device.h | 11 +-
drivers/misc/gna/gna_mem.c | 469 ++++++++++++++++++++++++++++++++++
drivers/misc/gna/gna_mem.h | 107 ++++++++
5 files changed, 593 insertions(+), 2 deletions(-)
create mode 100644 drivers/misc/gna/gna_mem.c
create mode 100644 drivers/misc/gna/gna_mem.h
diff --git a/drivers/misc/gna/Kbuild b/drivers/misc/gna/Kbuild
index 8620d88588e5..860b14c0e8d0 100644
--- a/drivers/misc/gna/Kbuild
+++ b/drivers/misc/gna/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-gna-y := gna_device.o gna_driver.o gna_hw.o
+gna-y := gna_device.o gna_driver.o gna_mem.o gna_hw.o
obj-$(CONFIG_INTEL_GNA) += gna.o
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index 869507594f9e..f962c7e1e5e9 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -23,12 +23,18 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
bld_reg = gna_reg_read(gna_priv->bar0.mem_addr, GNA_MMIO_IBUFFS);
gna_priv->hw_info.in_buf_s = bld_reg & GENMASK(7, 0);
+ mutex_init(&gna_priv->mmu_lock);
+
+ idr_init(&gna_priv->memory_idr);
+ mutex_init(&gna_priv->memidr_lock);
+
return 0;
}
/* Reverse gna_dev_init() */
static void gna_dev_deinit(struct gna_private *gna_priv)
{
+ idr_destroy(&gna_priv->memory_idr);
pci_set_drvdata(gna_priv->pdev, NULL);
}
diff --git a/drivers/misc/gna/gna_device.h b/drivers/misc/gna/gna_device.h
index add8088ffa28..ee234e474d43 100644
--- a/drivers/misc/gna/gna_device.h
+++ b/drivers/misc/gna/gna_device.h
@@ -10,7 +10,7 @@
#include <uapi/misc/gna.h>
-#include "gna_hw.h"
+#include "gna_mem.h"
struct gna_drv_info {
u32 hwid;
@@ -45,6 +45,15 @@ struct gna_private {
struct gna_pci_bar bar0;
struct gna_drv_info info;
struct gna_hw_info hw_info;
+
+ struct gna_mmu_object mmu;
+ /* lock protecting mmu structure */
+ struct mutex mmu_lock;
+
+ /* memory objects */
+ struct idr memory_idr;
+ /* lock protecting memory_idr */
+ struct mutex memidr_lock;
};
int gna_probe(struct pci_dev *dev, const struct pci_device_id *id);
diff --git a/drivers/misc/gna/gna_mem.c b/drivers/misc/gna/gna_mem.c
new file mode 100644
index 000000000000..a2ac3c31b4fb
--- /dev/null
+++ b/drivers/misc/gna/gna_mem.c
@@ -0,0 +1,469 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#define pr_fmt(fmt) KBUILD_MODNAME " " fmt
+
+#include <linux/device.h>
+#include <linux/mm.h>
+#include <linux/mmap_lock.h>
+#include <linux/pagemap.h>
+#include <linux/pci.h>
+#include <linux/sched.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/task.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+
+#include <uapi/misc/gna.h>
+
+#include "gna_device.h"
+#include "gna_driver.h"
+#include "gna_mem.h"
+
+static void gna_mmu_init(struct gna_private *gna_priv)
+{
+ struct gna_mmu_object *mmu;
+ dma_addr_t pagetable_dma;
+ u32 *pgdirn;
+ int i;
+
+ mmu = &gna_priv->mmu;
+
+ pgdirn = mmu->hwdesc->mmu.pagedir_n;
+
+ for (i = 0; i < mmu->num_pagetables; i++) {
+ pagetable_dma = mmu->pagetables_dma[i];
+ pgdirn[i] = pagetable_dma >> PAGE_SHIFT;
+ }
+
+ for (; i < GNA_PGDIRN_LEN; i++)
+ pgdirn[i] = GNA_PGDIR_INVALID;
+}
+
+/* descriptor and page tables allocation */
+int gna_mmu_alloc(struct gna_private *gna_priv)
+{
+ struct gna_mmu_object *mmu;
+ int desc_size;
+ int i;
+
+ if (gna_priv->info.num_pagetables > GNA_PGDIRN_LEN) {
+ dev_err(&gna_priv->dev, "too large number of pagetables requested\n");
+ return -EINVAL;
+ }
+
+ mmu = &gna_priv->mmu;
+
+ desc_size = round_up(gna_priv->info.desc_info.desc_size, PAGE_SIZE);
+
+ mmu->hwdesc = dma_alloc_coherent(&gna_priv->pdev->dev, desc_size, &mmu->hwdesc_dma,
+ GFP_KERNEL);
+ if (!mmu->hwdesc)
+ goto end;
+
+ mmu->num_pagetables = gna_priv->info.num_pagetables;
+
+ mmu->pagetables_dma = kmalloc_array(mmu->num_pagetables, sizeof(*mmu->pagetables_dma),
+ GFP_KERNEL);
+ if (!mmu->pagetables_dma)
+ goto err_free_descriptor;
+
+ mmu->pagetables = kmalloc_array(mmu->num_pagetables, sizeof(*mmu->pagetables), GFP_KERNEL);
+
+ if (!mmu->pagetables)
+ goto err_free_pagetables_dma;
+
+ for (i = 0; i < mmu->num_pagetables; i++) {
+ mmu->pagetables[i] = dma_alloc_coherent(&gna_priv->pdev->dev, PAGE_SIZE,
+ &mmu->pagetables_dma[i], GFP_KERNEL);
+ if (!mmu->pagetables[i])
+ goto err_free_mmu;
+ }
+
+ gna_mmu_init(gna_priv);
+
+ return 0;
+
+err_free_mmu:
+ while (i--) {
+ pci_free_consistent(gna_priv->pdev, PAGE_SIZE, mmu->pagetables[i],
+ mmu->pagetables_dma[i]);
+ mmu->pagetables[i] = NULL;
+ mmu->pagetables_dma[i] = 0;
+ }
+
+ kfree(mmu->pagetables);
+ mmu->pagetables = NULL;
+ mmu->num_pagetables = 0;
+
+err_free_pagetables_dma:
+ kfree(mmu->pagetables_dma);
+ mmu->pagetables_dma = NULL;
+
+err_free_descriptor:
+ pci_free_consistent(gna_priv->pdev, desc_size, mmu->hwdesc, mmu->hwdesc_dma);
+ mmu->hwdesc = NULL;
+ mmu->hwdesc_dma = 0;
+
+end:
+ return -ENOMEM;
+}
+
+void gna_mmu_free(struct gna_private *gna_priv)
+{
+ struct gna_mmu_object *mmu;
+ int desc_size;
+ int i;
+
+ mmu = &gna_priv->mmu;
+ mutex_lock(&gna_priv->mmu_lock);
+
+ for (i = 0; i < mmu->num_pagetables; i++) {
+ pci_free_consistent(gna_priv->pdev, PAGE_SIZE, mmu->pagetables[i],
+ mmu->pagetables_dma[i]);
+ mmu->pagetables[i] = NULL;
+ mmu->pagetables_dma[i] = 0;
+ }
+
+ kfree(mmu->pagetables);
+ mmu->pagetables = NULL;
+
+ kfree(mmu->pagetables_dma);
+ mmu->pagetables_dma = NULL;
+
+ desc_size = round_up(gna_priv->info.desc_info.desc_size, PAGE_SIZE);
+ pci_free_consistent(gna_priv->pdev, desc_size, mmu->hwdesc, mmu->hwdesc_dma);
+ mmu->hwdesc = NULL;
+ mmu->hwdesc_dma = 0;
+
+ mutex_unlock(&gna_priv->mmu_lock);
+}
+
+void gna_mmu_add(struct gna_private *gna_priv, struct gna_memory_object *mo)
+{
+ struct gna_mmu_object *mmu;
+ struct scatterlist *sgl;
+ dma_addr_t sg_page;
+ int sg_page_len;
+ u32 *pagetable;
+ u32 mmu_page;
+ int sg_pages;
+ int i;
+ int j;
+
+ mmu = &gna_priv->mmu;
+ mutex_lock(&gna_priv->mmu_lock);
+
+ j = mmu->filled_pages;
+ sgl = mo->sgt->sgl;
+ if (!sgl) {
+ dev_warn(&gna_priv->dev, "empty scatter list in memory object\n");
+ goto warn_empty_sgl;
+ }
+ sg_page = sg_dma_address(sgl);
+ sg_page_len = round_up(sg_dma_len(sgl), PAGE_SIZE) >> PAGE_SHIFT;
+ sg_pages = 0;
+
+ for (i = mmu->filled_pts; i < mmu->num_pagetables; i++) {
+ if (!sgl)
+ break;
+
+ pagetable = mmu->pagetables[i];
+
+ for (j = mmu->filled_pages; j < GNA_PT_LENGTH; j++) {
+ mmu_page = sg_page >> PAGE_SHIFT;
+ pagetable[j] = mmu_page;
+
+ mmu->filled_pages++;
+ sg_page += PAGE_SIZE;
+ sg_pages++;
+ if (sg_pages == sg_page_len) {
+ sgl = sg_next(sgl);
+ if (!sgl)
+ break;
+
+ sg_page = sg_dma_address(sgl);
+ sg_page_len =
+ round_up(sg_dma_len(sgl), PAGE_SIZE)
+ >> PAGE_SHIFT;
+ sg_pages = 0;
+ }
+ }
+
+ if (j == GNA_PT_LENGTH) {
+ mmu->filled_pages = 0;
+ mmu->filled_pts++;
+ }
+ }
+
+ mmu->hwdesc->mmu.vamaxaddr =
+ (mmu->filled_pts * PAGE_SIZE * GNA_PGDIR_ENTRIES) +
+ (mmu->filled_pages * PAGE_SIZE) - 1;
+ dev_dbg(&gna_priv->dev, "vamaxaddr set to %u\n", mmu->hwdesc->mmu.vamaxaddr);
+
+warn_empty_sgl:
+ mutex_unlock(&gna_priv->mmu_lock);
+}
+
+void gna_mmu_clear(struct gna_private *gna_priv)
+{
+ struct gna_mmu_object *mmu;
+ int i;
+
+ mmu = &gna_priv->mmu;
+ mutex_lock(&gna_priv->mmu_lock);
+
+ for (i = 0; i < mmu->filled_pts; i++)
+ memset(mmu->pagetables[i], 0, PAGE_SIZE);
+
+ if (mmu->filled_pages > 0)
+ memset(mmu->pagetables[mmu->filled_pts], 0, mmu->filled_pages * GNA_PT_ENTRY_SIZE);
+
+ mmu->filled_pts = 0;
+ mmu->filled_pages = 0;
+ mmu->hwdesc->mmu.vamaxaddr = 0;
+
+ mutex_unlock(&gna_priv->mmu_lock);
+}
+
+int gna_buffer_get_size(u64 offset, u64 size)
+{
+ u64 page_offset;
+
+ page_offset = offset & ~PAGE_MASK;
+ return round_up(page_offset + size, PAGE_SIZE);
+}
+
+/* must be called with gna_memory_object page_lock held */
+static int gna_get_pages(struct gna_memory_object *mo, u64 offset, u64 size)
+{
+ struct gna_private *gna_priv;
+ u64 effective_address;
+ struct mm_struct *mm;
+ struct sg_table *sgt;
+ struct page **pages;
+ int effective_size;
+ int num_pinned;
+ int num_pages;
+ int skip_size;
+ int ents;
+ int ret;
+
+ ret = 0;
+ gna_priv = mo->gna_priv;
+
+ if (mo->pages) {
+ dev_warn(&gna_priv->dev, "pages are already pinned\n");
+ return -EFAULT;
+ }
+
+ /* using vmalloc because num_pages can be large */
+ skip_size = round_down(offset, PAGE_SIZE);
+ effective_address = mo->user_address + skip_size;
+ dev_dbg(&gna_priv->dev, "user address %llx\n", mo->user_address);
+ dev_dbg(&gna_priv->dev, "effective user address %llx\n", effective_address);
+
+ effective_size = gna_buffer_get_size(offset, size);
+
+ num_pages = effective_size >> PAGE_SHIFT;
+ dev_dbg(&gna_priv->dev, "allocating %d pages\n", num_pages);
+ pages = kvmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
+ if (!pages) {
+ ret = -ENOMEM;
+ goto err_exit;
+ }
+
+ get_task_struct(mo->task);
+ mm = get_task_mm(mo->task);
+ if (!mm) {
+ ret = -ENOENT;
+ goto err_put_task;
+ }
+ mmap_read_lock(mm);
+ num_pinned = get_user_pages_remote(mm, effective_address, num_pages,
+ FOLL_WRITE, pages, NULL, NULL);
+ mmap_read_unlock(mm);
+ mmput(mm);
+
+ if (num_pinned <= 0) {
+ ret = num_pinned;
+ dev_err(&gna_priv->dev, "function get_user_pages_remote() failed\n");
+ goto err_free_pages;
+ }
+ if (num_pinned < num_pages) {
+ ret = -EFAULT;
+ dev_err(&gna_priv->dev,
+ "get_user_pages_remote() pinned fewer pages number than requested\n");
+ goto err_free_pages;
+ }
+
+ sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+ if (!sgt) {
+ ret = -ENOMEM;
+ goto err_put_pages;
+ }
+
+ ret = sg_alloc_table_from_pages(sgt, pages, num_pinned, 0, mo->memory_size, GFP_KERNEL);
+ if (ret) {
+ dev_err(&gna_priv->dev, "could not alloc scatter list\n");
+ goto err_free_sgt;
+ }
+
+ if (IS_ERR(sgt->sgl)) {
+ dev_err(&gna_priv->dev, "sgl allocation failed\n");
+ ret = PTR_ERR(sgt->sgl);
+ goto err_free_sgt;
+ }
+
+ ents = pci_map_sg(gna_priv->pdev, sgt->sgl, sgt->nents, PCI_DMA_BIDIRECTIONAL);
+ if (ents <= 0) {
+ dev_err(&gna_priv->dev, "could not map scatter gather list\n");
+ ret = -EIO;
+ goto err_free_sgl;
+ }
+
+ mo->sgt = sgt;
+ mo->pages = pages;
+ mo->num_pinned = num_pinned;
+
+ return 0;
+
+err_free_sgl:
+ sg_free_table(sgt);
+
+err_free_sgt:
+ kfree(sgt);
+
+err_put_pages:
+ release_pages(pages, num_pinned);
+
+err_free_pages:
+ kvfree(pages);
+
+err_put_task:
+ put_task_struct(mo->task);
+
+err_exit:
+ return ret;
+}
+
+/* must be called with gna_memory_object page_lock held */
+static void gna_put_pages(struct gna_memory_object *mo)
+{
+ struct gna_private *gna_priv;
+ struct sg_table *sgt;
+
+ gna_priv = mo->gna_priv;
+
+ if (!mo->pages) {
+ dev_warn(&gna_priv->dev, "memory object has no pages %llu\n", mo->memory_id);
+ return;
+ }
+
+ sgt = mo->sgt;
+
+ pci_unmap_sg(gna_priv->pdev, sgt->sgl, sgt->nents, PCI_DMA_BIDIRECTIONAL);
+ sg_free_table(sgt);
+ kfree(sgt);
+ mo->sgt = NULL;
+
+ release_pages(mo->pages, mo->num_pinned);
+ kvfree(mo->pages);
+ mo->pages = NULL;
+ mo->num_pinned = 0;
+
+ put_task_struct(mo->task);
+}
+
+void gna_memory_free(struct gna_private *gna_priv, struct gna_memory_object *mo)
+{
+ mutex_lock(&gna_priv->memidr_lock);
+ idr_remove(&gna_priv->memory_idr, mo->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+
+ cancel_work_sync(&mo->work);
+ kfree(mo);
+}
+
+static void gna_memory_release(struct work_struct *work)
+{
+ struct gna_memory_object *mo;
+
+ mo = container_of(work, struct gna_memory_object, work);
+
+ mo->user_ptr = NULL;
+
+ wake_up_interruptible(&mo->waitq);
+}
+
+static const struct gna_memory_operations memory_ops = {
+ .get_pages = gna_get_pages,
+ .put_pages = gna_put_pages,
+};
+
+int gna_priv_userptr(struct gna_file_private *file_priv, union gna_memory_map *gna_mem)
+{
+ struct gna_memory_object *mo;
+ struct gna_private *gna_priv;
+ int memory_id;
+ int ret;
+
+ ret = 0;
+
+ gna_priv = file_priv->gna_priv;
+
+ if (gna_mem->in.address & ~PAGE_MASK) {
+ dev_err(&gna_priv->dev, "user pointer not page aligned\n");
+ return -EINVAL;
+ }
+
+ if (!gna_mem->in.size) {
+ dev_err(&gna_priv->dev, "invalid user memory size\n");
+ return -EINVAL;
+ }
+
+ if (!access_ok(u64_to_user_ptr(gna_mem->in.address), gna_mem->in.size)) {
+ dev_err(&gna_priv->dev, "invalid user pointer\n");
+ return -EINVAL;
+ }
+
+ mo = kzalloc(sizeof(*mo), GFP_KERNEL);
+ if (!mo)
+ return -ENOMEM;
+
+ mo->fd = file_priv->fd;
+ mo->gna_priv = gna_priv;
+ mo->ops = &memory_ops;
+ mo->user_address = gna_mem->in.address;
+ mo->memory_size = gna_mem->in.size;
+ mo->user_ptr = u64_to_user_ptr(gna_mem->in.address);
+ mo->num_pages = round_up(gna_mem->in.size, PAGE_SIZE) >> PAGE_SHIFT;
+ mo->task = current;
+ INIT_WORK(&mo->work, gna_memory_release);
+ init_waitqueue_head(&mo->waitq);
+ mutex_init(&mo->page_lock);
+
+ mutex_lock(&gna_priv->memidr_lock);
+ memory_id = idr_alloc(&gna_priv->memory_idr, mo, 1, 0, GFP_KERNEL);
+ mutex_unlock(&gna_priv->memidr_lock);
+
+ if (memory_id < 0) {
+ dev_err(&gna_priv->dev, "idr allocation for memory failed\n");
+ ret = -EFAULT;
+ goto err_free_mo;
+ }
+
+ mo->memory_id = (u64)memory_id;
+
+ mutex_lock(&file_priv->memlist_lock);
+ list_add_tail(&mo->file_mem_list, &file_priv->memory_list);
+ mutex_unlock(&file_priv->memlist_lock);
+
+ gna_mem->out.memory_id = mo->memory_id;
+
+ return 0;
+
+err_free_mo:
+ kfree(mo);
+ return ret;
+}
diff --git a/drivers/misc/gna/gna_mem.h b/drivers/misc/gna/gna_mem.h
new file mode 100644
index 000000000000..218f8c0f51a3
--- /dev/null
+++ b/drivers/misc/gna/gna_mem.h
@@ -0,0 +1,107 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_MEM_H__
+#define __GNA_MEM_H__
+
+#include <linux/mmu_notifier.h>
+
+#include "gna_hw.h"
+
+union gna_memory_map;
+
+struct gna_file_private;
+
+struct gna_xnn_descriptor {
+ u32 labase;
+ u16 lacount;
+ u16 _rsvd;
+};
+
+struct gna_mmu {
+ u32 vamaxaddr;
+ u8 __res_204[12];
+ u32 pagedir_n[GNA_PGDIRN_LEN];
+};
+
+struct gna_hw_descriptor {
+ u8 __res_0000[256];
+ struct gna_xnn_descriptor xnn_config;
+ u8 __unused[248];
+ struct gna_mmu mmu;
+};
+
+struct gna_mmu_object {
+ struct gna_hw_descriptor *hwdesc;
+
+ dma_addr_t hwdesc_dma;
+
+ u32 **pagetables;
+ dma_addr_t *pagetables_dma;
+
+ u32 num_pagetables;
+
+ u32 filled_pts;
+ u32 filled_pages;
+};
+
+struct gna_mmu_notifier {
+ struct gna_file_private *file_priv;
+ struct gna_private *gna_priv;
+ struct gna_memory_object *mo;
+ struct mmu_notifier mn;
+ struct mm_struct *mm;
+};
+
+struct gna_memory_object {
+ u64 memory_id;
+
+ const struct gna_memory_operations *ops;
+
+ struct gna_private *gna_priv;
+ struct file *fd;
+
+ void __user *user_ptr;
+ u64 user_address;
+ u64 memory_size;
+
+ struct page **pages;
+ struct sg_table *sgt;
+ int num_pages;
+ int num_pinned;
+ struct mutex page_lock; /* protects get/put pages operations */
+
+ struct task_struct *task;
+
+ struct list_head mem_list;
+
+ struct list_head file_mem_list;
+
+ struct work_struct work;
+
+ struct wait_queue_head waitq;
+};
+
+struct gna_memory_operations {
+ /* pins pages */
+ int (*get_pages)(struct gna_memory_object *mo, u64 offset, u64 size);
+
+ /* puts previously pinned pages */
+ void (*put_pages)(struct gna_memory_object *mo);
+};
+
+int gna_buffer_get_size(u64 offset, u64 size);
+
+int gna_priv_userptr(struct gna_file_private *file_priv, union gna_memory_map *gna_mem);
+
+int gna_mmu_alloc(struct gna_private *gna_priv);
+
+void gna_mmu_free(struct gna_private *gna_priv);
+
+void gna_mmu_add(struct gna_private *gna_priv, struct gna_memory_object *object);
+
+void gna_mmu_clear(struct gna_private *gna_priv);
+
+void gna_memory_free(struct gna_private *gna_priv, struct gna_memory_object *mo);
+
+#endif // __GNA_MEM_H__
--
2.28.0
From: Tomasz Jankowski <[email protected]>
Add a new component for the scoring logic, which configures and kicks
off the hardware.
Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/Kbuild | 2 +-
drivers/misc/gna/gna_device.c | 3 +
drivers/misc/gna/gna_device.h | 5 +
drivers/misc/gna/gna_hw.h | 2 -
drivers/misc/gna/gna_score.c | 299 ++++++++++++++++++++++++++++++++++
drivers/misc/gna/gna_score.h | 20 +++
6 files changed, 328 insertions(+), 3 deletions(-)
create mode 100644 drivers/misc/gna/gna_score.c
create mode 100644 drivers/misc/gna/gna_score.h
diff --git a/drivers/misc/gna/Kbuild b/drivers/misc/gna/Kbuild
index f47775759a6e..049e142894aa 100644
--- a/drivers/misc/gna/Kbuild
+++ b/drivers/misc/gna/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-gna-y := gna_device.o gna_driver.o gna_mem.o gna_request.o gna_hw.o
+gna-y := gna_device.o gna_driver.o gna_mem.o gna_request.o gna_score.o gna_hw.o
obj-$(CONFIG_INTEL_GNA) += gna.o
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index cd8ee86bbc58..7a1888824ea1 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -112,6 +112,9 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
mutex_init(&gna_priv->mmu_lock);
+ mutex_init(&gna_priv->filelist_lock);
+ INIT_LIST_HEAD(&gna_priv->file_list);
+
atomic_set(&gna_priv->request_count, 0);
idr_init(&gna_priv->memory_idr);
diff --git a/drivers/misc/gna/gna_device.h b/drivers/misc/gna/gna_device.h
index 445c81c698fb..59b074704d1a 100644
--- a/drivers/misc/gna/gna_device.h
+++ b/drivers/misc/gna/gna_device.h
@@ -35,6 +35,11 @@ struct gna_hw_info {
struct gna_private {
struct gna_driver_private *drv_priv;
+ /* list of opened files */
+ struct list_head file_list;
+ /* lock protecting file_list */
+ struct mutex filelist_lock;
+
/* device objects */
struct pci_dev *pdev;
struct device *parent; /* pdev->dev */
diff --git a/drivers/misc/gna/gna_hw.h b/drivers/misc/gna/gna_hw.h
index e09e562aae50..941f54b0f29a 100644
--- a/drivers/misc/gna/gna_hw.h
+++ b/drivers/misc/gna/gna_hw.h
@@ -33,8 +33,6 @@
/* minimum size of XNN layer descriptors in bytes (at least 1 layer) */
#define XNN_LYR_DSC_SIZE (128)
-#define GMM_CFG_SIZE (128)
-
#define GNA_VAMAXADDR_OFFSET 0x200
#define GNA_PGDIRN_OFFSET 0x210
diff --git a/drivers/misc/gna/gna_score.c b/drivers/misc/gna/gna_score.c
new file mode 100644
index 000000000000..e708e449e1ee
--- /dev/null
+++ b/drivers/misc/gna/gna_score.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/sched/mm.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+
+#include <uapi/misc/gna.h>
+
+#include "gna_device.h"
+#include "gna_driver.h"
+#include "gna_request.h"
+#include "gna_score.h"
+
+int gna_validate_score_config(struct gna_compute_cfg *compute_cfg,
+ struct gna_file_private *file_priv)
+{
+ struct gna_private *gna_priv;
+ size_t buffers_size;
+
+ gna_priv = file_priv->gna_priv;
+
+ if (compute_cfg->gna_mode > GNA_MODE_XNN) {
+ dev_err(&gna_priv->dev, "gna mode invalid\n");
+ return -EINVAL;
+ }
+
+ if (compute_cfg->layer_count > gna_priv->info.max_layer_count) {
+ dev_err(&gna_priv->dev, "max layer count exceeded\n");
+ return -EINVAL;
+ }
+
+ if (compute_cfg->buffer_count == 0) {
+ dev_err(&gna_priv->dev, "no buffers\n");
+ return -EINVAL;
+ }
+
+ buffers_size = sizeof(struct gna_buffer) * compute_cfg->buffer_count;
+ if (!access_ok(u64_to_user_ptr(compute_cfg->buffers_ptr), buffers_size)) {
+ dev_err(&gna_priv->dev, "invalid buffers pointer\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int gna_do_patch_memory(struct gna_private *gna_priv, struct gna_memory_object *mo,
+ struct gna_memory_patch *patch, void *vaddr)
+{
+ size_t size;
+ void *dest;
+ u64 value;
+
+ value = patch->value;
+ size = patch->size;
+ dest = (u8 *)vaddr + patch->offset;
+ dev_dbg(&gna_priv->dev, "patch offset: %llu, size: %zu, value: %llu\n",
+ patch->offset, size, value);
+
+ switch (size) {
+ case 0:
+ return -EFAULT;
+ case sizeof(u8):
+ *((u8 *)dest) = (u8)value;
+ break;
+ case sizeof(u16):
+ *((u16 *)dest) = (u16)value;
+ break;
+ case sizeof(u32):
+ *((u32 *)dest) = (u32)value;
+ break;
+ case sizeof(u64):
+ *((u64 *)dest) = (u64)value;
+ break;
+ default:
+ // should never happen
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int gna_mem_patch_memory(struct gna_private *gna_priv, struct gna_buffer *buffer)
+{
+ struct gna_memory_patch *patch;
+ struct gna_memory_object *mo;
+ void *vaddr;
+ int ret = 0;
+ u32 i;
+
+ dev_dbg(&gna_priv->dev, "memory_id: %llu, patch_count, %llu\n",
+ buffer->memory_id, buffer->patch_count);
+
+ /* get kernel space memory pointer */
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, buffer->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+ if (!mo)
+ return -EINVAL;
+
+ mutex_lock(&mo->page_lock);
+ ret = mo->ops->get_pages(mo, buffer->offset, buffer->size);
+ mutex_unlock(&mo->page_lock);
+ if (ret)
+ return ret;
+
+ if (buffer->patch_count) {
+ vaddr = vm_map_ram(mo->pages, mo->num_pinned, 0);
+ if (!vaddr)
+ return -ENOMEM;
+
+ patch = (struct gna_memory_patch *)(uintptr_t)buffer->patches_ptr;
+ for (i = 0; i < buffer->patch_count; i++, patch++) {
+ ret = gna_do_patch_memory(gna_priv, mo, patch, vaddr + buffer->offset);
+ if (ret)
+ break;
+ }
+
+ kvfree((void *)(uintptr_t)buffer->patches_ptr);
+ buffer->patches_ptr = 0;
+ vm_unmap_ram(vaddr, mo->num_pages);
+
+ if (ret)
+ return ret;
+ }
+
+ gna_mmu_add(gna_priv, mo);
+
+ return ret;
+}
+
+static struct gna_buffer *gna_find_buffer(struct gna_buffer *buffer_list, u32 buffer_count,
+ u32 mmu_offset, u32 *memory_offset)
+{
+ struct gna_buffer *buffer;
+ u32 page_offset;
+ u32 memory_size;
+ u32 offset;
+ u32 i;
+
+ offset = 0;
+ for (i = 0; i < buffer_count; i++) {
+ buffer = buffer_list + i;
+ page_offset = buffer->offset & ~PAGE_MASK;
+ memory_size = round_up(page_offset + buffer->size, PAGE_SIZE);
+ if (mmu_offset < offset + memory_size) {
+ *memory_offset = offset;
+ return buffer;
+ }
+ offset += memory_size;
+ }
+
+ return NULL;
+}
+
+static int gna_copy_gmm_config(struct gna_private *gna_priv,
+ struct gna_buffer *buffer_list,
+ u32 buffer_count, u32 mmu_offset)
+{
+ struct gna_hw_descriptor *hwdesc;
+ struct gna_memory_object *mo;
+ struct gna_mmu_object *mmu;
+ struct gna_buffer *buffer;
+ u32 memory_offset;
+ u32 skip_offset;
+ u8 *gmm_desc;
+ void *vaddr;
+
+ mmu = &gna_priv->mmu;
+ hwdesc = mmu->hwdesc;
+
+ buffer = gna_find_buffer(buffer_list, buffer_count, mmu_offset, &memory_offset);
+ if (!buffer) {
+ dev_dbg(&gna_priv->dev, "buffer not found\n");
+ return -EINVAL;
+ }
+
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, buffer->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+ if (!mo) {
+ dev_dbg(&gna_priv->dev, "memory object not found\n");
+ return -EFAULT;
+ }
+
+ vaddr = vm_map_ram(mo->pages, mo->num_pinned, 0);
+ if (!vaddr) {
+ dev_dbg(&gna_priv->dev, "mapping failed\n");
+ return -EFAULT;
+ }
+
+ skip_offset = round_down(buffer->offset, PAGE_SIZE);
+ gmm_desc = (u8 *)vaddr + skip_offset + (mmu_offset - memory_offset);
+ memcpy(&hwdesc->xnn_config, gmm_desc, sizeof(struct gna_xnn_descriptor));
+ vm_unmap_ram(vaddr, mo->num_pages);
+
+ return 0;
+}
+
+int gna_priv_score(struct gna_request *score_request)
+{
+ struct gna_xnn_descriptor *xnn_config;
+ struct gna_compute_cfg *compute_cfg;
+ struct gna_private *gna_priv;
+ struct gna_memory_object *mo;
+ struct gna_mmu_object *mmu;
+ struct gna_buffer *buffer;
+ bool mo_valid = true;
+ void __iomem *addr;
+ u64 buffer_count;
+ u32 desc_base;
+ int ret;
+ u64 i;
+
+ ret = 0;
+
+ gna_priv = score_request->gna_priv;
+
+ mmu = &gna_priv->mmu;
+ xnn_config = &mmu->hwdesc->xnn_config;
+ compute_cfg = &score_request->compute_cfg;
+
+ buffer = score_request->buffer_list;
+ buffer_count = score_request->buffer_count;
+ dev_dbg(&gna_priv->dev, "buffer count: %llu\n", buffer_count);
+ for (i = 0; i < buffer_count; i++, buffer++) {
+ dev_dbg(&gna_priv->dev, "patch count: %llu\n", buffer->patch_count);
+ ret = gna_mem_patch_memory(gna_priv, buffer);
+ if (ret)
+ goto err_put_pages;
+ }
+
+ switch (compute_cfg->gna_mode) {
+ case GNA_MODE_XNN:
+ dev_dbg(&gna_priv->dev, "xNN mode, labase: %d, lacount: %d\n",
+ compute_cfg->layer_base, compute_cfg->layer_count);
+ xnn_config->labase = compute_cfg->layer_base;
+ xnn_config->lacount = compute_cfg->layer_count;
+ break;
+ case GNA_MODE_GMM:
+ dev_dbg(&gna_priv->dev, "GMM mode, offset: %d\n", compute_cfg->layer_base);
+ ret = gna_copy_gmm_config(gna_priv, score_request->buffer_list,
+ buffer_count, compute_cfg->layer_base);
+ if (ret)
+ goto err_put_pages_decr;
+ break;
+ default:
+ ret = -EINVAL;
+ goto err_put_pages_decr;
+ }
+
+ addr = gna_priv->bar0.mem_addr;
+ desc_base = (u32)(mmu->hwdesc_dma >> PAGE_SHIFT);
+ gna_reg_write(addr, GNA_MMIO_DESBASE, desc_base);
+
+ gna_start_scoring(gna_priv, addr, compute_cfg);
+
+ return 0;
+
+err_put_pages_decr:
+ i--;
+ buffer--;
+err_put_pages:
+ do {
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, buffer->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+ if (mo) {
+ mutex_lock(&mo->page_lock);
+ mo->ops->put_pages(mo);
+ mutex_unlock(&mo->page_lock);
+ } else {
+ mo_valid = false;
+ dev_warn(&gna_priv->dev, "memory object not found %llu\n",
+ buffer->memory_id);
+ }
+ buffer--;
+ } while (i--);
+
+ if (mo_valid) {
+ i = score_request->buffer_count;
+ while (i--)
+ kvfree((void *)(uintptr_t)score_request->buffer_list[i].patches_ptr);
+ kvfree(score_request->buffer_list);
+ }
+ score_request->buffer_list = NULL;
+ score_request->buffer_count = 0;
+
+ return ret;
+}
diff --git a/drivers/misc/gna/gna_score.h b/drivers/misc/gna/gna_score.h
new file mode 100644
index 000000000000..7582cf6f8493
--- /dev/null
+++ b/drivers/misc/gna/gna_score.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_SCORE_H__
+#define __GNA_SCORE_H__
+
+#include <uapi/misc/gna.h>
+
+struct gna_private;
+struct gna_file_private;
+struct gna_request;
+
+/* validate user request */
+int gna_validate_score_config(struct gna_compute_cfg *compute_cfg,
+ struct gna_file_private *file_priv);
+
+/* scoring helper functions */
+int gna_priv_score(struct gna_request *score_request);
+
+#endif // __GNA_SCORE_H__
--
2.28.0
From: Tomasz Jankowski <[email protected]>
The new workqueue processes the list of requests in FIFO order. For
each request it kicks off the hardware and then waits for completion,
sleeping until it is woken up by an interrupt; the interrupt handler
is added in a following change.
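A condensed, generic sketch of that handshake (the real code is in
gna_request_process() below and in gna_interrupt() added by the next
patch; struct busy_ctx and the function names here are illustrative
only):

#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/wait.h>

/* Stand-in for the busy flag and busy_waitq members of struct gna_private. */
struct busy_ctx {
        bool busy;
        struct wait_queue_head waitq;   /* init_waitqueue_head() at setup */
};

/* Worker side: set the flag, kick the hardware, then sleep until the
 * ISR clears the flag or the timeout expires. */
static int wait_for_hw(struct busy_ctx *ctx, unsigned int timeout_ms)
{
        ctx->busy = true;
        /* ... start the hardware here ... */
        if (!wait_event_timeout(ctx->waitq, !ctx->busy,
                                msecs_to_jiffies(timeout_ms)))
                return -ETIMEDOUT;
        return 0;
}

/* Interrupt side: clear the flag and wake the sleeping worker. */
static void hw_done_isr(struct busy_ctx *ctx)
{
        ctx->busy = false;
        wake_up(&ctx->waitq);
}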
Signed-off-by: Tomasz Jankowski <[email protected]>
Co-developed-by: Anisha Dattatraya Kulkarni <[email protected]>
Signed-off-by: Anisha Dattatraya Kulkarni <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/gna_device.c | 21 +++++-
drivers/misc/gna/gna_device.h | 8 +++
drivers/misc/gna/gna_driver.c | 5 ++
drivers/misc/gna/gna_hw.c | 8 +++
drivers/misc/gna/gna_hw.h | 1 +
drivers/misc/gna/gna_request.c | 119 +++++++++++++++++++++++++++++++++
drivers/misc/gna/gna_request.h | 1 +
7 files changed, 162 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index 7a1888824ea1..a229f51fb17b 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -89,6 +89,7 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
const struct pci_device_id *pci_id)
{
u32 bld_reg;
+ int ret;
pci_set_drvdata(pcidev, gna_priv);
@@ -102,7 +103,9 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
if (gna_mmu_alloc(gna_priv)) {
dev_err(&gna_priv->dev, "gna mmu allocation failed\n");
- return -EFAULT;
+ ret = -EFAULT;
+ goto err_pci_drvdata_unset;
+
}
dev_dbg(&pcidev->dev, "maximum memory size %llu num pd %d\n",
gna_priv->info.max_hw_mem, gna_priv->info.num_pagetables);
@@ -111,6 +114,7 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
gna_priv->info.desc_info.mmu_info.vamax_size);
mutex_init(&gna_priv->mmu_lock);
+ init_waitqueue_head(&gna_priv->busy_waitq);
mutex_init(&gna_priv->filelist_lock);
INIT_LIST_HEAD(&gna_priv->file_list);
@@ -119,15 +123,30 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
idr_init(&gna_priv->memory_idr);
mutex_init(&gna_priv->memidr_lock);
+
+ gna_priv->request_wq = create_singlethread_workqueue("gna_request_wq");
+ if (!gna_priv->request_wq) {
+ dev_err(&pcidev->dev, "could not create wq for gna device\n");
+ ret = -EFAULT;
+ goto err_pci_drvdata_unset;
+ }
mutex_init(&gna_priv->reqlist_lock);
INIT_LIST_HEAD(&gna_priv->request_list);
return 0;
+
+err_pci_drvdata_unset:
+ pci_set_drvdata(pcidev, NULL);
+
+ return ret;
}
/* Reverse gna_dev_init() */
static void gna_dev_deinit(struct gna_private *gna_priv)
{
+ flush_workqueue(gna_priv->request_wq);
+ destroy_workqueue(gna_priv->request_wq);
+
idr_destroy(&gna_priv->memory_idr);
gna_mmu_free(gna_priv);
pci_set_drvdata(gna_priv->pdev, NULL);
diff --git a/drivers/misc/gna/gna_device.h b/drivers/misc/gna/gna_device.h
index 59b074704d1a..0855972cd085 100644
--- a/drivers/misc/gna/gna_device.h
+++ b/drivers/misc/gna/gna_device.h
@@ -46,6 +46,8 @@ struct gna_private {
struct device dev;
struct cdev cdev;
+ u32 hw_status;
+
/* device related resources */
struct gna_pci_bar bar0;
struct gna_drv_info info;
@@ -55,9 +57,15 @@ struct gna_private {
/* lock protecting mmu structure */
struct mutex mmu_lock;
+ /* device busy indicator */
+ bool busy;
+
+ struct wait_queue_head busy_waitq;
+
/* requests */
struct list_head request_list;
struct mutex reqlist_lock; /* protects request_list */
+ struct workqueue_struct *request_wq;
atomic_t request_count;
/* memory objects */
diff --git a/drivers/misc/gna/gna_driver.c b/drivers/misc/gna/gna_driver.c
index 80981c448f3a..cb638dfa81ac 100644
--- a/drivers/misc/gna/gna_driver.c
+++ b/drivers/misc/gna/gna_driver.c
@@ -9,6 +9,11 @@
#include "gna_device.h"
#include "gna_driver.h"
+/* recovery timeout in seconds */
+int recovery_timeout = 60;
+module_param(recovery_timeout, int, 0644);
+MODULE_PARM_DESC(recovery_timeout, "Recovery timeout");
+
struct gna_driver_private gna_drv_priv;
struct class *gna_class;
diff --git a/drivers/misc/gna/gna_hw.c b/drivers/misc/gna/gna_hw.c
index 3b85c4b75fd8..48e09e5f3ca8 100644
--- a/drivers/misc/gna/gna_hw.c
+++ b/drivers/misc/gna/gna_hw.c
@@ -49,6 +49,14 @@ void gna_print_error_status(struct gna_private *gna_priv, u32 hw_status)
dev_dbg(&gna_priv->dev, "GNA error: Saturation Reached !\n");
}
+bool gna_hw_perf_enabled(struct gna_private *gna_priv)
+{
+ void __iomem *addr = gna_priv->bar0.mem_addr;
+ u32 ctrl = gna_reg_read(addr, GNA_MMIO_CTRL);
+
+ return FIELD_GET(GNA_CTRL_COMP_STATS_EN, ctrl) ? true : false;
+}
+
void gna_start_scoring(struct gna_private *gna_priv, void __iomem *addr,
struct gna_compute_cfg *compute_cfg)
{
diff --git a/drivers/misc/gna/gna_hw.h b/drivers/misc/gna/gna_hw.h
index 941f54b0f29a..4da29870c4dc 100644
--- a/drivers/misc/gna/gna_hw.h
+++ b/drivers/misc/gna/gna_hw.h
@@ -71,6 +71,7 @@ struct gna_private;
struct gna_compute_cfg;
void gna_abort_hw(struct gna_private *gna_priv);
+bool gna_hw_perf_enabled(struct gna_private *gna_priv);
int gna_parse_hw_status(struct gna_private *gna_priv, u32 hw_status);
void gna_print_error_status(struct gna_private *gna_priv, u32 hw_status);
void gna_start_scoring(struct gna_private *gna_priv, void __iomem *addr,
diff --git a/drivers/misc/gna/gna_request.c b/drivers/misc/gna/gna_request.c
index a7a55fe68eed..3957d45223d4 100644
--- a/drivers/misc/gna/gna_request.c
+++ b/drivers/misc/gna/gna_request.c
@@ -8,7 +8,121 @@
#include "gna_device.h"
#include "gna_driver.h"
+#include "gna_hw.h"
#include "gna_request.h"
+#include "gna_score.h"
+
+static void gna_request_update_status(struct gna_request *score_request)
+{
+ struct gna_private *gna_priv = score_request->gna_priv;
+ void __iomem *addr = gna_priv->bar0.mem_addr;
+ /* The gna_priv's hw_status should be updated first */
+ u32 hw_status = gna_priv->hw_status;
+ u32 stall_cycles;
+ u32 total_cycles;
+
+ /* Technically, the time stamp can be a bit later than
+ * when the hw actually completed scoring. Here we just
+ * do our best in a deferred work, unless we want to
+ * tax isr for a more accurate record.
+ */
+ score_request->drv_perf.hw_completed = ktime_get_ns();
+
+ score_request->hw_status = hw_status;
+
+ score_request->status = gna_parse_hw_status(gna_priv, hw_status);
+
+ if (gna_hw_perf_enabled(gna_priv)) {
+ if (hw_status & GNA_STS_STATISTICS_VALID) {
+ total_cycles = gna_reg_read(addr, GNA_MMIO_PTC);
+ stall_cycles = gna_reg_read(addr, GNA_MMIO_PSC);
+ score_request->hw_perf.total = total_cycles;
+ score_request->hw_perf.stall = stall_cycles;
+ } else
+ dev_warn(&gna_priv->dev, "GNA statistics missing\n");
+ }
+ if (unlikely(hw_status & GNA_ERROR))
+ gna_print_error_status(gna_priv, hw_status);
+}
+
+static void gna_request_process(struct work_struct *work)
+{
+ struct gna_request *score_request;
+ struct gna_memory_object *mo;
+ struct gna_private *gna_priv;
+ struct gna_buffer *buffer;
+ unsigned long hw_timeout;
+ int ret;
+ u64 i;
+
+ score_request = container_of(work, struct gna_request, work);
+ gna_priv = score_request->gna_priv;
+ dev_dbg(&gna_priv->dev, "processing request %llu\n", score_request->request_id);
+
+ score_request->state = ACTIVE;
+
+ score_request->drv_perf.pre_processing = ktime_get_ns();
+
+ /* Set busy flag before kicking off HW. The isr will clear it and wake up us. There is
+ * no difference if isr is missed in a timeout situation of the last request. We just
+ * always set it busy and let the wait_event_timeout check the reset.
+ * wq: X -> true
+ * isr: X -> false
+ */
+ gna_priv->busy = true;
+
+ ret = gna_priv_score(score_request);
+ if (ret) {
+ score_request->status = ret;
+ goto end;
+ }
+
+ score_request->drv_perf.processing = ktime_get_ns();
+
+ hw_timeout = msecs_to_jiffies(recovery_timeout * 1000);
+
+ /* Wait for HW to finish the current request. */
+ hw_timeout = wait_event_timeout(gna_priv->busy_waitq,
+ !gna_priv->busy, hw_timeout);
+
+ if (!hw_timeout)
+ dev_warn(&gna_priv->dev, "hardware timeout occurred\n");
+
+ /* Update HW status */
+ gna_priv->hw_status = gna_reg_read(gna_priv->bar0.mem_addr, GNA_MMIO_STS);
+
+ gna_request_update_status(score_request);
+ gna_abort_hw(gna_priv);
+
+ /* request post-processing */
+ buffer = score_request->buffer_list;
+ for (i = 0; i < score_request->buffer_count; i++, buffer++) {
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, buffer->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+ if (mo) {
+ mutex_lock(&mo->page_lock);
+ mo->ops->put_pages(mo);
+ mutex_unlock(&mo->page_lock);
+ } else {
+ dev_warn(&gna_priv->dev, "mo not found %llu\n", buffer->memory_id);
+ }
+ }
+
+ /* patches_ptr's are already freed by ops->score() function */
+ kvfree(score_request->buffer_list);
+ score_request->buffer_list = NULL;
+ score_request->buffer_count = 0;
+
+ gna_mmu_clear(gna_priv);
+
+end:
+ score_request->drv_perf.completion = ktime_get_ns();
+ dev_dbg(&gna_priv->dev, "request %llu done, waking processes\n",
+ score_request->request_id);
+ score_request->state = DONE;
+ wake_up_interruptible_all(&score_request->waitq);
+}
static struct gna_request *gna_request_create(struct gna_file_private *file_priv,
struct gna_compute_cfg *compute_cfg)
@@ -34,6 +148,7 @@ static struct gna_request *gna_request_create(struct gna_file_private *file_priv
score_request->gna_priv = gna_priv;
score_request->state = NEW;
init_waitqueue_head(&score_request->waitq);
+ INIT_WORK(&score_request->work, gna_request_process);
return score_request;
}
@@ -242,6 +357,7 @@ int gna_enqueue_request(struct gna_compute_cfg *compute_cfg,
list_add_tail(&score_request->node, &gna_priv->request_list);
mutex_unlock(&gna_priv->reqlist_lock);
+ queue_work(gna_priv->request_wq, &score_request->work);
kref_put(&score_request->refcount, gna_request_release);
*request_id = score_request->request_id;
@@ -292,6 +408,7 @@ void gna_delete_request_by_id(u64 req_id, struct gna_private *gna_priv)
list_for_each_entry_safe(req, temp_req, reqs_list, node) {
if (req->request_id == req_id) {
list_del(&req->node);
+ cancel_work_sync(&req->work);
kref_put(&req->refcount, gna_request_release);
break;
}
@@ -313,6 +430,7 @@ void gna_delete_file_requests(struct file *fd, struct gna_private *gna_priv)
list_for_each_entry_safe(req, temp_req, reqs_list, node) {
if (req->fd == fd) {
list_del(&req->node);
+ cancel_work_sync(&req->work);
kref_put(&req->refcount, gna_request_release);
break;
}
@@ -336,6 +454,7 @@ void gna_delete_memory_requests(u64 memory_id, struct gna_private *gna_priv)
for (i = 0; i < req->buffer_count; ++i) {
if (req->buffer_list[i].memory_id == memory_id) {
list_del(&req->node);
+ cancel_work_sync(&req->work);
kref_put(&req->refcount, gna_request_release);
break;
}
diff --git a/drivers/misc/gna/gna_request.h b/drivers/misc/gna/gna_request.h
index c15a2c1803da..cefeea524cf1 100644
--- a/drivers/misc/gna/gna_request.h
+++ b/drivers/misc/gna/gna_request.h
@@ -43,6 +43,7 @@ struct gna_request {
u64 buffer_count;
struct wait_queue_head waitq;
+ struct work_struct work;
};
int gna_enqueue_request(struct gna_compute_cfg *compute_cfg,
--
2.28.0
From: Tomasz Jankowski <[email protected]>
The hardware generates an interrupt when a scoring job is done. The
interrupt handler wakes up the work queue so that processing of the
current request can resume.
Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/gna_device.c | 32 ++++++++++++++++++++++++++++++--
drivers/misc/gna/gna_device.h | 2 ++
drivers/misc/gna/gna_hw.c | 10 ++++++++++
drivers/misc/gna/gna_hw.h | 2 ++
4 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index a229f51fb17b..5198326e8af4 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -182,6 +182,27 @@ int gna_probe(struct pci_dev *pcidev, const struct pci_device_id *pci_id)
goto err_clear_master;
}
+ ret = pci_alloc_irq_vectors(pcidev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0)
+ goto err_clear_master;
+
+ gna_priv->irq = pci_irq_vector(pcidev, 0);
+ if (unlikely(gna_priv->irq < 0)) {
+ dev_err(&pcidev->dev, "could not obtain irq number\n");
+ ret = -EIO;
+ goto err_free_irq_vector;
+ }
+
+ ret = request_irq(gna_priv->irq, gna_interrupt,
+ IRQF_SHARED, GNA_DRV_NAME, gna_priv);
+
+ if (ret) {
+ dev_err(&pcidev->dev, "could not register for interrupt\n");
+ goto err_free_irq_vector;
+ }
+
+ dev_dbg(&pcidev->dev, "irq num %d\n", gna_priv->irq);
+
/* Map BAR0 */
gna_priv->bar0.iostart = pci_resource_start(pcidev, 0);
gna_priv->bar0.iosize = pci_resource_len(pcidev, 0);
@@ -189,7 +210,7 @@ int gna_probe(struct pci_dev *pcidev, const struct pci_device_id *pci_id)
if (!gna_priv->bar0.mem_addr) {
dev_err(&pcidev->dev, "could not map BAR 0\n");
ret = -EINVAL;
- goto err_clear_master;
+ goto err_free_irq;
}
dev_dbg(&pcidev->dev, "bar0 io start: 0x%llx\n", (unsigned long long)gna_priv->bar0.iostart);
@@ -199,11 +220,15 @@ int gna_probe(struct pci_dev *pcidev, const struct pci_device_id *pci_id)
ret = gna_dev_init(gna_priv, pcidev, pci_id);
if (ret) {
dev_err(&pcidev->dev, "could not initialize gna private structure\n");
- goto err_clear_master;
+ goto err_free_irq;
}
return 0;
+err_free_irq:
+ free_irq(gna_priv->irq, gna_priv);
+err_free_irq_vector:
+ pci_free_irq_vectors(pcidev);
err_clear_master:
pci_clear_master(pcidev);
err_release_regions:
@@ -219,7 +244,10 @@ void gna_remove(struct pci_dev *pcidev)
gna_priv = pci_get_drvdata(pcidev);
+ free_irq(gna_priv->irq, gna_priv);
+
gna_dev_deinit(gna_priv);
+ pci_free_irq_vectors(pcidev);
pci_clear_master(pcidev);
pci_release_regions(pcidev);
diff --git a/drivers/misc/gna/gna_device.h b/drivers/misc/gna/gna_device.h
index 0855972cd085..77cd1a458367 100644
--- a/drivers/misc/gna/gna_device.h
+++ b/drivers/misc/gna/gna_device.h
@@ -46,12 +46,14 @@ struct gna_private {
struct device dev;
struct cdev cdev;
+ /* hardware status set by interrupt handler */
u32 hw_status;
/* device related resources */
struct gna_pci_bar bar0;
struct gna_drv_info info;
struct gna_hw_info hw_info;
+ int irq;
struct gna_mmu_object mmu;
/* lock protecting mmu structure */
diff --git a/drivers/misc/gna/gna_hw.c b/drivers/misc/gna/gna_hw.c
index 48e09e5f3ca8..6146cbd43004 100644
--- a/drivers/misc/gna/gna_hw.c
+++ b/drivers/misc/gna/gna_hw.c
@@ -124,3 +124,13 @@ void gna_abort_hw(struct gna_private *gna_priv)
if (i == 0)
dev_err(&gna_priv->dev, "abort did not complete\n");
}
+
+irqreturn_t gna_interrupt(int irq, void *priv)
+{
+ struct gna_private *gna_priv;
+
+ gna_priv = (struct gna_private *)priv;
+ gna_priv->busy = false;
+ wake_up(&gna_priv->busy_waitq);
+ return IRQ_HANDLED;
+}
diff --git a/drivers/misc/gna/gna_hw.h b/drivers/misc/gna/gna_hw.h
index 4da29870c4dc..4dfa05937943 100644
--- a/drivers/misc/gna/gna_hw.h
+++ b/drivers/misc/gna/gna_hw.h
@@ -6,6 +6,7 @@
#include <linux/bits.h>
#include <linux/bitfield.h>
+#include <linux/interrupt.h>
#include <linux/io.h>
/* GNA MMIO registers */
@@ -70,6 +71,7 @@ struct gna_desc_info {
struct gna_private;
struct gna_compute_cfg;
+irqreturn_t gna_interrupt(int irq, void *ctx);
void gna_abort_hw(struct gna_private *gna_priv);
bool gna_hw_perf_enabled(struct gna_private *gna_priv);
int gna_parse_hw_status(struct gna_private *gna_priv, u32 hw_status);
--
2.28.0
From: Tomasz Jankowski <[email protected]>
Add the PCI IDs of the Intel(R) Gaussian & Neural Accelerator on
supported platforms.
Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/gna_device.c | 76 +++++++++++++++++++++++++++++++++++
drivers/misc/gna/gna_device.h | 2 +
drivers/misc/gna/gna_driver.c | 1 +
3 files changed, 79 insertions(+)
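A note on how the id table below is consumed: the driver_data value stored by INTEL_GNA_DEVICE() is handed back to the driver through the matching pci_device_id in probe. The following is only a sketch of that mechanism, assuming struct gna_private keeps its gna_drv_info copy in the info field as declared in gna_device.h; the helper name is hypothetical and the real gna_probe()/gna_dev_init() are not shown in this hunk.

/* Sketch only: recover the per-SKU gna_drv_info selected by the
 * id_table match; hypothetical helper, not code from the series.
 */
#include <linux/pci.h>

#include "gna_device.h"		/* struct gna_private, struct gna_drv_info */

static void gna_copy_drv_info(struct gna_private *gna_priv,
			      const struct pci_device_id *pci_id)
{
	const struct gna_drv_info *info;

	info = (const struct gna_drv_info *)pci_id->driver_data;
	gna_priv->info = *info;	/* hwid, max_layer_count, desc_info, ... */
}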
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index d6a6d3cab570..7031a5d7c16c 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -8,6 +8,82 @@
#include "gna_driver.h"
#include "gna_hw.h"
+#define GNA_DEV_HWID_CNL 0x5A11
+#define GNA_DEV_HWID_EHL 0x4511
+#define GNA_DEV_HWID_GLK 0x3190
+#define GNA_DEV_HWID_ICL 0x8A11
+#define GNA_DEV_HWID_JSL 0x4E11
+#define GNA_DEV_HWID_TGL 0x9A11
+
+#define GNA_FEATURES \
+ .max_hw_mem = 256 * 1024 * 1024, \
+ .num_pagetables = 64, \
+ .num_page_entries = PAGE_SIZE / sizeof(u32), \
+ /* desc_info all in bytes */ \
+ .desc_info = { \
+ .rsvd_size = 256, \
+ .cfg_size = 256, \
+ .desc_size = 784, \
+ .mmu_info = { \
+ .vamax_size = 4, \
+ .rsvd_size = 12, \
+ .pd_size = 4 * 64, \
+ }, \
+ }
+
+#define GNA_GEN1_FEATURES \
+ GNA_FEATURES, \
+ .max_layer_count = 1024
+
+#define GNA_GEN2_FEATURES \
+ GNA_FEATURES, \
+ .max_layer_count = 4096
+
+static const struct gna_drv_info cnl_drv_info = {
+ .hwid = GNA_DEV_HWID_CNL,
+ GNA_GEN1_FEATURES
+};
+
+static const struct gna_drv_info glk_drv_info = {
+ .hwid = GNA_DEV_HWID_GLK,
+ GNA_GEN1_FEATURES
+};
+
+static const struct gna_drv_info ehl_drv_info = {
+ .hwid = GNA_DEV_HWID_EHL,
+ GNA_GEN1_FEATURES
+};
+
+static const struct gna_drv_info icl_drv_info = {
+ .hwid = GNA_DEV_HWID_ICL,
+ GNA_GEN1_FEATURES
+};
+
+static const struct gna_drv_info jsl_drv_info = {
+ .hwid = GNA_DEV_HWID_JSL,
+ GNA_GEN2_FEATURES
+};
+
+static const struct gna_drv_info tgl_drv_info = {
+ .hwid = GNA_DEV_HWID_TGL,
+ GNA_GEN2_FEATURES
+};
+
+#define INTEL_GNA_DEVICE(hwid, info) \
+ { PCI_VDEVICE(INTEL, hwid), (kernel_ulong_t)(info) }
+
+const struct pci_device_id gna_pci_ids[] = {
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_CNL, &cnl_drv_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_EHL, &ehl_drv_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_GLK, &glk_drv_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_ICL, &icl_drv_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_JSL, &jsl_drv_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_TGL, &tgl_drv_info),
+ { }
+};
+
+MODULE_DEVICE_TABLE(pci, gna_pci_ids);
+
static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
const struct pci_device_id *pci_id)
{
diff --git a/drivers/misc/gna/gna_device.h b/drivers/misc/gna/gna_device.h
index ee234e474d43..22c47f2c03ec 100644
--- a/drivers/misc/gna/gna_device.h
+++ b/drivers/misc/gna/gna_device.h
@@ -56,6 +56,8 @@ struct gna_private {
struct mutex memidr_lock;
};
+extern const struct pci_device_id gna_pci_ids[];
+
int gna_probe(struct pci_dev *dev, const struct pci_device_id *id);
void gna_remove(struct pci_dev *dev);
diff --git a/drivers/misc/gna/gna_driver.c b/drivers/misc/gna/gna_driver.c
index 81f0f003f377..80981c448f3a 100644
--- a/drivers/misc/gna/gna_driver.c
+++ b/drivers/misc/gna/gna_driver.c
@@ -15,6 +15,7 @@ struct class *gna_class;
static struct pci_driver gna_driver = {
.name = GNA_DRV_NAME,
+ .id_table = gna_pci_ids,
.probe = gna_probe,
.remove = gna_remove,
};
--
2.28.0
From: Tomasz Jankowski <[email protected]>
The scoring work submitted to the GNA driver is implemented as a
list of requests that will be processed by the hardware.
Signed-off-by: Tomasz Jankowski <[email protected]>
Co-developed-by: Anisha Dattatraya Kulkarni <[email protected]>
Signed-off-by: Anisha Dattatraya Kulkarni <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/Kbuild | 2 +-
drivers/misc/gna/gna_device.c | 5 +
drivers/misc/gna/gna_device.h | 5 +
drivers/misc/gna/gna_mem.c | 3 +
drivers/misc/gna/gna_request.c | 347 +++++++++++++++++++++++++++++++++
drivers/misc/gna/gna_request.h | 61 ++++++
6 files changed, 422 insertions(+), 1 deletion(-)
create mode 100644 drivers/misc/gna/gna_request.c
create mode 100644 drivers/misc/gna/gna_request.h
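To make the request lifecycle concrete before the diff, the sketch below strings together the helpers this patch introduces, roughly the way the later ioctl patch ends up using them. It is a condensed illustration rather than code from the series: the function name and the simplified error handling are mine, and the actual wait on req->waitq is elided.

/* Condensed lifecycle sketch built from the helpers added in this
 * patch; compute_cfg is assumed to be a validated kernel-space copy.
 */
#include <linux/errno.h>
#include <linux/kref.h>

#include "gna_device.h"
#include "gna_driver.h"		/* struct gna_file_private */
#include "gna_request.h"

static int gna_request_roundtrip(struct gna_file_private *file_priv,
				 struct gna_compute_cfg *compute_cfg)
{
	struct gna_private *gna_priv = file_priv->gna_priv;
	struct gna_request *req;
	u64 request_id;
	int ret;

	ret = gna_enqueue_request(compute_cfg, file_priv, &request_id);
	if (ret)
		return ret;

	/* later, e.g. from a wait ioctl: look the request up again */
	req = gna_find_request_by_id(request_id, gna_priv);
	if (!req)
		return -EINVAL;

	/* ... sleep on req->waitq until req->state == DONE ... */

	kref_put(&req->refcount, gna_request_release);
	gna_delete_request_by_id(request_id, gna_priv);

	return 0;
}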
diff --git a/drivers/misc/gna/Kbuild b/drivers/misc/gna/Kbuild
index 860b14c0e8d0..f47775759a6e 100644
--- a/drivers/misc/gna/Kbuild
+++ b/drivers/misc/gna/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-gna-y := gna_device.o gna_driver.o gna_mem.o gna_hw.o
+gna-y := gna_device.o gna_driver.o gna_mem.o gna_request.o gna_hw.o
obj-$(CONFIG_INTEL_GNA) += gna.o
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index 7031a5d7c16c..cd8ee86bbc58 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -7,6 +7,7 @@
#include "gna_device.h"
#include "gna_driver.h"
#include "gna_hw.h"
+#include "gna_request.h"
#define GNA_DEV_HWID_CNL 0x5A11
#define GNA_DEV_HWID_EHL 0x4511
@@ -111,8 +112,12 @@ static int gna_dev_init(struct gna_private *gna_priv, struct pci_dev *pcidev,
mutex_init(&gna_priv->mmu_lock);
+ atomic_set(&gna_priv->request_count, 0);
+
idr_init(&gna_priv->memory_idr);
mutex_init(&gna_priv->memidr_lock);
+ mutex_init(&gna_priv->reqlist_lock);
+ INIT_LIST_HEAD(&gna_priv->request_list);
return 0;
}
diff --git a/drivers/misc/gna/gna_device.h b/drivers/misc/gna/gna_device.h
index 22c47f2c03ec..445c81c698fb 100644
--- a/drivers/misc/gna/gna_device.h
+++ b/drivers/misc/gna/gna_device.h
@@ -50,6 +50,11 @@ struct gna_private {
/* lock protecting mmu structure */
struct mutex mmu_lock;
+ /* requests */
+ struct list_head request_list;
+ struct mutex reqlist_lock; /* protects request_list */
+ atomic_t request_count;
+
/* memory objects */
struct idr memory_idr;
/* lock protecting memory_idr */
diff --git a/drivers/misc/gna/gna_mem.c b/drivers/misc/gna/gna_mem.c
index a2ac3c31b4fb..09c4f401c3fa 100644
--- a/drivers/misc/gna/gna_mem.c
+++ b/drivers/misc/gna/gna_mem.c
@@ -19,6 +19,7 @@
#include "gna_device.h"
#include "gna_driver.h"
#include "gna_mem.h"
+#include "gna_request.h"
static void gna_mmu_init(struct gna_private *gna_priv)
{
@@ -391,6 +392,8 @@ static void gna_memory_release(struct work_struct *work)
mo = container_of(work, struct gna_memory_object, work);
+ gna_delete_memory_requests(mo->memory_id, mo->gna_priv);
+
mo->user_ptr = NULL;
wake_up_interruptible(&mo->waitq);
diff --git a/drivers/misc/gna/gna_request.c b/drivers/misc/gna/gna_request.c
new file mode 100644
index 000000000000..a7a55fe68eed
--- /dev/null
+++ b/drivers/misc/gna/gna_request.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/device.h>
+#include <linux/kref.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#include "gna_device.h"
+#include "gna_driver.h"
+#include "gna_request.h"
+
+static struct gna_request *gna_request_create(struct gna_file_private *file_priv,
+ struct gna_compute_cfg *compute_cfg)
+{
+ struct gna_request *score_request;
+ struct gna_private *gna_priv;
+
+ gna_priv = file_priv->gna_priv;
+ if (IS_ERR(gna_priv))
+ return NULL;
+
+ score_request = kzalloc(sizeof(*score_request), GFP_KERNEL);
+ if (!score_request)
+ return NULL;
+ kref_init(&score_request->refcount);
+
+ dev_dbg(&gna_priv->dev, "layer_base %d layer_count %d\n",
+ compute_cfg->layer_base, compute_cfg->layer_count);
+
+ score_request->request_id = atomic_inc_return(&gna_priv->request_count);
+ score_request->compute_cfg = *compute_cfg;
+ score_request->fd = file_priv->fd;
+ score_request->gna_priv = gna_priv;
+ score_request->state = NEW;
+ init_waitqueue_head(&score_request->waitq);
+
+ return score_request;
+}
+
+/*
+ * Returns true if [inner_offset, inner_offset + inner_size) is contained within [0, outer_size); false otherwise.
+ */
+static bool gna_validate_ranges(u64 outer_size, u64 inner_offset, u64 inner_size)
+{
+ return inner_offset < outer_size &&
+ inner_size <= (outer_size - inner_offset);
+}
+
+static int gna_validate_patches(struct gna_private *gna_priv, __u64 buffer_size,
+ struct gna_memory_patch *patches, u64 count)
+{
+ u64 idx;
+
+ for (idx = 0; idx < count; ++idx) {
+ if (patches[idx].size > 8) {
+ dev_err(&gna_priv->dev, "invalid patch size: %llu\n", patches[idx].size);
+ return -EINVAL;
+ }
+
+ if (!gna_validate_ranges(buffer_size, patches[idx].offset, patches[idx].size)) {
+ dev_err(&gna_priv->dev,
+ "patch out of bounds. buffer size: %llu, patch offset/size:%llu/%llu\n",
+ buffer_size, patches[idx].offset, patches[idx].size);
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+static int gna_buffer_fill_patches(struct gna_buffer *buffer, struct gna_private *gna_priv)
+{
+ __u64 patches_user = buffer->patches_ptr;
+ struct gna_memory_patch *patches;
+ /* At this point, the buffer points to a memory region in kernel space where the copied
+ * patches_ptr also lives, but the value of it is still an address from user space. This
+ * function will set patches_ptr to either an address in kernel space or null before it
+ * exits.
+ */
+ u64 patch_count;
+ int ret;
+
+ buffer->patches_ptr = 0;
+ patch_count = buffer->patch_count;
+ if (!patch_count)
+ return 0;
+
+ patches = kvmalloc_array(patch_count, sizeof(struct gna_memory_patch), GFP_KERNEL);
+ if (!patches)
+ return -ENOMEM;
+
+ if (copy_from_user(patches, u64_to_user_ptr(patches_user),
+ sizeof(struct gna_memory_patch) * patch_count)) {
+ dev_err(&gna_priv->dev, "copy %llu patches from user failed\n", patch_count);
+ ret = -EFAULT;
+ goto err_fill_patches;
+ }
+
+ ret = gna_validate_patches(gna_priv, buffer->size, patches, patch_count);
+ if (ret) {
+ dev_err(&gna_priv->dev, "patches failed validation\n");
+ goto err_fill_patches;
+ }
+
+ buffer->patches_ptr = (uintptr_t)patches;
+
+ return 0;
+
+err_fill_patches:
+ kvfree(patches);
+ return ret;
+}
+
+static int gna_request_fill_buffers(struct gna_request *score_request,
+ struct gna_compute_cfg *compute_cfg)
+{
+ struct gna_buffer *buffer_list;
+ struct gna_memory_object *mo;
+ struct gna_private *gna_priv;
+ u64 buffers_total_size = 0;
+ struct gna_buffer *buffer;
+ u64 buffer_count;
+ u64 memory_id;
+ u64 i, j;
+ int ret;
+
+ gna_priv = score_request->gna_priv;
+
+ /* get memory buffer list */
+ buffer_count = compute_cfg->buffer_count;
+ buffer_list = kvmalloc_array(buffer_count, sizeof(struct gna_buffer), GFP_KERNEL);
+ if (!buffer_list)
+ return -ENOMEM;
+
+ if (copy_from_user(buffer_list, u64_to_user_ptr(compute_cfg->buffers_ptr),
+ sizeof(*buffer_list) * buffer_count)) {
+ dev_err(&gna_priv->dev, "copying %llu buffers failed\n", buffer_count);
+ ret = -EFAULT;
+ goto err_free_buffers;
+ }
+
+ for (i = 0; i < buffer_count; i++) {
+ buffer = &buffer_list[i];
+ memory_id = buffer->memory_id;
+
+ for (j = 0; j < i; j++) {
+ if (buffer_list[j].memory_id == memory_id) {
+ dev_err(&gna_priv->dev, "doubled memory id in score config. id:%llu\n", memory_id);
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+ }
+
+ buffers_total_size +=
+ gna_buffer_get_size(buffer->offset, buffer->size);
+ if (buffers_total_size > gna_priv->info.max_hw_mem) {
+ dev_err(&gna_priv->dev, "buffers' total size too big\n");
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, memory_id);
+ if (!mo) {
+ mutex_unlock(&gna_priv->memidr_lock);
+ dev_err(&gna_priv->dev, "memory object %llu not found\n", memory_id);
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+ mutex_unlock(&gna_priv->memidr_lock);
+
+ if (mo->fd != score_request->fd) {
+ dev_err(&gna_priv->dev,
+ "memory object from another file. %p != %p\n",
+ mo->fd, score_request->fd);
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+
+ if (!gna_validate_ranges(mo->memory_size, buffer->offset, buffer->size)) {
+ dev_err(&gna_priv->dev,
+ "buffer out of bounds. mo size: %llu, buffer offset/size:%llu/%llu\n",
+ mo->memory_size, buffer->offset, buffer->size);
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+
+ ret = gna_buffer_fill_patches(buffer, gna_priv);
+ if (ret)
+ goto err_free_patches;
+ }
+
+ score_request->buffer_list = buffer_list;
+ score_request->buffer_count = buffer_count;
+
+ return 0;
+
+err_zero_patch_ptr:
+ /* patches_ptr may still hold an address in userspace.
+ * Don't pass it to kvfree().
+ */
+ buffer->patches_ptr = 0;
+
+err_free_patches:
+ /* patches_ptr of each processed buffer should be either
+ * null or pointing to an allocated memory block in the
+ * kernel at this point.
+ */
+ for (j = 0; j <= i; j++)
+ kvfree((void *)(uintptr_t)buffer_list[j].patches_ptr);
+
+err_free_buffers:
+ kvfree(buffer_list);
+ return ret;
+}
+
+int gna_enqueue_request(struct gna_compute_cfg *compute_cfg,
+ struct gna_file_private *file_priv, u64 *request_id)
+{
+ struct gna_request *score_request;
+ struct gna_private *gna_priv;
+ int ret;
+
+ if (!file_priv)
+ return -EINVAL;
+
+ gna_priv = file_priv->gna_priv;
+
+ score_request = gna_request_create(file_priv, compute_cfg);
+ if (!score_request)
+ return -ENOMEM;
+
+ ret = gna_request_fill_buffers(score_request, compute_cfg);
+ if (ret) {
+ kref_put(&score_request->refcount, gna_request_release);
+ return ret;
+ }
+
+ kref_get(&score_request->refcount);
+ mutex_lock(&gna_priv->reqlist_lock);
+ list_add_tail(&score_request->node, &gna_priv->request_list);
+ mutex_unlock(&gna_priv->reqlist_lock);
+
+ kref_put(&score_request->refcount, gna_request_release);
+
+ *request_id = score_request->request_id;
+
+ return 0;
+}
+
+void gna_request_release(struct kref *ref)
+{
+ struct gna_request *score_request =
+ container_of(ref, struct gna_request, refcount);
+ kfree(score_request);
+}
+
+struct gna_request *gna_find_request_by_id(u64 req_id, struct gna_private *gna_priv)
+{
+ struct gna_request *req, *found_req;
+ struct list_head *reqs_list;
+
+ mutex_lock(&gna_priv->reqlist_lock);
+
+ reqs_list = &gna_priv->request_list;
+ found_req = NULL;
+ if (!list_empty(reqs_list)) {
+ list_for_each_entry(req, reqs_list, node) {
+ if (req_id == req->request_id) {
+ found_req = req;
+ kref_get(&found_req->refcount);
+ break;
+ }
+ }
+ }
+
+ mutex_unlock(&gna_priv->reqlist_lock);
+
+ return found_req;
+}
+
+void gna_delete_request_by_id(u64 req_id, struct gna_private *gna_priv)
+{
+ struct gna_request *req, *temp_req;
+ struct list_head *reqs_list;
+
+ mutex_lock(&gna_priv->reqlist_lock);
+
+ reqs_list = &gna_priv->request_list;
+ if (!list_empty(reqs_list)) {
+ list_for_each_entry_safe(req, temp_req, reqs_list, node) {
+ if (req->request_id == req_id) {
+ list_del(&req->node);
+ kref_put(&req->refcount, gna_request_release);
+ break;
+ }
+ }
+ }
+
+ mutex_unlock(&gna_priv->reqlist_lock);
+}
+
+void gna_delete_file_requests(struct file *fd, struct gna_private *gna_priv)
+{
+ struct gna_request *req, *temp_req;
+ struct list_head *reqs_list;
+
+ mutex_lock(&gna_priv->reqlist_lock);
+
+ reqs_list = &gna_priv->request_list;
+ if (!list_empty(reqs_list)) {
+ list_for_each_entry_safe(req, temp_req, reqs_list, node) {
+ if (req->fd == fd) {
+ list_del(&req->node);
+ kref_put(&req->refcount, gna_request_release);
+ break;
+ }
+ }
+ }
+
+ mutex_unlock(&gna_priv->reqlist_lock);
+}
+
+void gna_delete_memory_requests(u64 memory_id, struct gna_private *gna_priv)
+{
+ struct gna_request *req, *temp_req;
+ struct list_head *reqs_list;
+ int i;
+
+ mutex_lock(&gna_priv->reqlist_lock);
+
+ reqs_list = &gna_priv->request_list;
+ if (!list_empty(reqs_list)) {
+ list_for_each_entry_safe(req, temp_req, reqs_list, node) {
+ for (i = 0; i < req->buffer_count; ++i) {
+ if (req->buffer_list[i].memory_id == memory_id) {
+ list_del(&req->node);
+ kref_put(&req->refcount, gna_request_release);
+ break;
+ }
+ }
+ }
+ }
+
+ mutex_unlock(&gna_priv->reqlist_lock);
+}
diff --git a/drivers/misc/gna/gna_request.h b/drivers/misc/gna/gna_request.h
new file mode 100644
index 000000000000..c15a2c1803da
--- /dev/null
+++ b/drivers/misc/gna/gna_request.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_REQUEST_H__
+#define __GNA_REQUEST_H__
+
+#include <linux/kref.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#include <uapi/misc/gna.h>
+
+enum gna_request_state {
+ NEW,
+ ACTIVE,
+ DONE,
+};
+
+struct gna_file_private;
+
+struct gna_request {
+ u64 request_id;
+
+ struct kref refcount;
+
+ struct gna_private *gna_priv;
+ struct file *fd;
+
+ u32 hw_status;
+
+ enum gna_request_state state;
+
+ int status;
+
+ struct gna_hw_perf hw_perf;
+ struct gna_drv_perf drv_perf;
+
+ struct list_head node;
+
+ struct gna_compute_cfg compute_cfg;
+
+ struct gna_buffer *buffer_list;
+ u64 buffer_count;
+
+ struct wait_queue_head waitq;
+};
+
+int gna_enqueue_request(struct gna_compute_cfg *compute_cfg,
+ struct gna_file_private *file_priv, u64 *request_id);
+
+void gna_request_release(struct kref *ref);
+
+struct gna_request *gna_find_request_by_id(u64 req_id, struct gna_private *gna_priv);
+
+void gna_delete_request_by_id(u64 req_id, struct gna_private *gna_priv);
+
+void gna_delete_file_requests(struct file *fd, struct gna_private *gna_priv);
+
+void gna_delete_memory_requests(u64 memory_id, struct gna_private *gna_priv);
+
+#endif // __GNA_REQUEST_H__
--
2.28.0
From: Tomasz Jankowski <[email protected]>
Add an ioctl handler to the GNA driver. (A rough userspace usage
sketch follows the diffstat below.)
The ioctl interface provides the ability to do the following:
- Map and unmap memory buffers for GNA computation requests.
- Retrieve capabilities of the underlying GNA IP.
- Submit GNA computation requests.
- Request notification of scoring completion.
Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/gna/Kbuild | 2 +-
drivers/misc/gna/gna_device.c | 40 ++++++
drivers/misc/gna/gna_device.h | 2 +
drivers/misc/gna/gna_ioctl.c | 249 ++++++++++++++++++++++++++++++++++
drivers/misc/gna/gna_ioctl.h | 11 ++
5 files changed, 303 insertions(+), 1 deletion(-)
create mode 100644 drivers/misc/gna/gna_ioctl.c
create mode 100644 drivers/misc/gna/gna_ioctl.h
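Before the implementation, a rough userspace view of this interface may help. The snippet is a sketch under explicit assumptions: the device node name and the installed header path are guesses, and the union layouts (in.id, out.value) are taken on faith from the uapi header this series references (uapi/misc/gna.h). Only GNA_GET_PARAMETER is exercised; the remaining ioctls follow the same copy-in/copy-out pattern.

/* Userspace sketch only; node name and header path are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <misc/gna.h>		/* assumed install path of uapi/misc/gna.h */

int main(void)
{
	union gna_parameter param = { .in.id = GNA_PARAM_DEVICE_TYPE };
	int fd;

	fd = open("/dev/gna0", O_RDWR);	/* device node name is a guess */
	if (fd < 0)
		return 1;

	if (ioctl(fd, GNA_GET_PARAMETER, &param) == 0)
		printf("GNA device type: %llu\n",
		       (unsigned long long)param.out.value);

	/* GNA_MAP_MEMORY, GNA_COMPUTE, GNA_WAIT and GNA_UNMAP_MEMORY would
	 * follow, mirroring the handlers in gna_ioctl() below.
	 */
	close(fd);

	return 0;
}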
diff --git a/drivers/misc/gna/Kbuild b/drivers/misc/gna/Kbuild
index 049e142894aa..ea91603e59f2 100644
--- a/drivers/misc/gna/Kbuild
+++ b/drivers/misc/gna/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-gna-y := gna_device.o gna_driver.o gna_mem.o gna_request.o gna_score.o gna_hw.o
+gna-y := gna_device.o gna_driver.o gna_mem.o gna_ioctl.o gna_request.o gna_score.o gna_hw.o
obj-$(CONFIG_INTEL_GNA) += gna.o
diff --git a/drivers/misc/gna/gna_device.c b/drivers/misc/gna/gna_device.c
index 5198326e8af4..9faf0456fee2 100644
--- a/drivers/misc/gna/gna_device.c
+++ b/drivers/misc/gna/gna_device.c
@@ -252,3 +252,43 @@ void gna_remove(struct pci_dev *pcidev)
pci_clear_master(pcidev);
pci_release_regions(pcidev);
}
+
+static u32 gna_device_type_by_hwid(u32 hwid)
+{
+ switch (hwid) {
+ case GNA_DEV_HWID_CNL:
+ return GNA_DEV_TYPE_0_9;
+ case GNA_DEV_HWID_GLK:
+ case GNA_DEV_HWID_EHL:
+ case GNA_DEV_HWID_ICL:
+ return GNA_DEV_TYPE_1_0;
+ case GNA_DEV_HWID_JSL:
+ case GNA_DEV_HWID_TGL:
+ return GNA_DEV_TYPE_2_0;
+ default:
+ return 0;
+ }
+}
+
+int gna_getparam(struct gna_private *gna_priv, union gna_parameter *param)
+{
+ switch (param->in.id) {
+ case GNA_PARAM_DEVICE_ID:
+ param->out.value = gna_priv->info.hwid;
+ break;
+ case GNA_PARAM_RECOVERY_TIMEOUT:
+ param->out.value = recovery_timeout;
+ break;
+ case GNA_PARAM_INPUT_BUFFER_S:
+ param->out.value = gna_priv->hw_info.in_buf_s;
+ break;
+ case GNA_PARAM_DEVICE_TYPE:
+ param->out.value = gna_device_type_by_hwid(gna_priv->info.hwid);
+ break;
+ default:
+ dev_err(&gna_priv->dev, "unknown parameter id %llu\n", param->in.id);
+ return -EINVAL;
+ }
+
+ return 0;
+}
diff --git a/drivers/misc/gna/gna_device.h b/drivers/misc/gna/gna_device.h
index 77cd1a458367..531e2b904d6a 100644
--- a/drivers/misc/gna/gna_device.h
+++ b/drivers/misc/gna/gna_device.h
@@ -82,4 +82,6 @@ int gna_probe(struct pci_dev *dev, const struct pci_device_id *id);
void gna_remove(struct pci_dev *dev);
+int gna_getparam(struct gna_private *gna_priv, union gna_parameter *param);
+
#endif /* __GNA_DEVICE_H__ */
diff --git a/drivers/misc/gna/gna_ioctl.c b/drivers/misc/gna/gna_ioctl.c
new file mode 100644
index 000000000000..a408336d4b6c
--- /dev/null
+++ b/drivers/misc/gna/gna_ioctl.c
@@ -0,0 +1,249 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/uaccess.h>
+
+#include <uapi/misc/gna.h>
+
+#include "gna_driver.h"
+#include "gna_device.h"
+#include "gna_ioctl.h"
+#include "gna_mem.h"
+#include "gna_request.h"
+#include "gna_score.h"
+
+static int gna_ioctl_score(struct gna_file_private *file_priv, void __user *argptr)
+{
+ union gna_compute score_args;
+ struct gna_private *gna_priv;
+ u64 request_id;
+ int ret;
+
+ gna_priv = file_priv->gna_priv;
+
+ if (copy_from_user(&score_args, argptr, sizeof(score_args))) {
+ dev_err(&gna_priv->dev, "could not copy score ioctl config from user\n");
+ return -EFAULT;
+ }
+
+ ret = gna_validate_score_config(&score_args.in.config, file_priv);
+ if (ret) {
+ dev_err(&gna_priv->dev, "request not valid\n");
+ return ret;
+ }
+
+ ret = gna_enqueue_request(&score_args.in.config, file_priv, &request_id);
+ if (ret) {
+ dev_err(&gna_priv->dev, "could not enqueue score request %d\n", ret);
+ return ret;
+ }
+
+ score_args.out.request_id = request_id;
+ if (copy_to_user(argptr, &score_args, sizeof(score_args))) {
+ dev_err(&gna_priv->dev, "could not copy score ioctl status to user\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+static int gna_ioctl_wait(struct file *f, void __user *argptr)
+{
+ struct gna_file_private *file_priv;
+ struct gna_request *score_request;
+ struct gna_private *gna_priv;
+ union gna_wait wait_data;
+ u64 request_id;
+ u32 timeout;
+ int ret;
+
+ file_priv = (struct gna_file_private *)f->private_data;
+ gna_priv = file_priv->gna_priv;
+
+ ret = 0;
+
+ if (copy_from_user(&wait_data, argptr, sizeof(wait_data))) {
+ dev_err(&gna_priv->dev, "could not copy wait ioctl data from user\n");
+ return -EFAULT;
+ }
+
+ request_id = wait_data.in.request_id;
+ timeout = wait_data.in.timeout;
+
+ score_request = gna_find_request_by_id(request_id, gna_priv);
+
+ if (!score_request) {
+ dev_err(&gna_priv->dev, "could not find request with id: %llu\n", request_id);
+ return -EINVAL;
+ }
+
+ if (score_request->fd != f) {
+ kref_put(&score_request->refcount, gna_request_release);
+ return -EINVAL;
+ }
+
+ dev_dbg(&gna_priv->dev, "waiting for request %llu for timeout %u\n", request_id, timeout);
+
+ ret = wait_event_interruptible_timeout(score_request->waitq, score_request->state == DONE,
+ msecs_to_jiffies(timeout));
+ if (ret == 0 || ret == -ERESTARTSYS) {
+ dev_err(&gna_priv->dev, "request timed out, id: %llu\n", request_id);
+ kref_put(&score_request->refcount, gna_request_release);
+ return -EBUSY;
+ }
+
+ dev_dbg(&gna_priv->dev, "request wait completed with %d req id %llu\n", ret, request_id);
+
+ wait_data.out.hw_perf = score_request->hw_perf;
+ wait_data.out.drv_perf = score_request->drv_perf;
+ wait_data.out.hw_status = score_request->hw_status;
+
+ ret = score_request->status;
+
+ dev_dbg(&gna_priv->dev, "request status %d, hw status: %#x\n",
+ score_request->status, score_request->hw_status);
+ kref_put(&score_request->refcount, gna_request_release);
+
+ gna_delete_request_by_id(request_id, gna_priv);
+
+ if (copy_to_user(argptr, &wait_data, sizeof(wait_data))) {
+ dev_err(&gna_priv->dev, "could not copy wait ioctl status to user\n");
+ ret = -EFAULT;
+ }
+
+ return ret;
+}
+
+static int gna_ioctl_userptr(struct gna_file_private *file_priv, void __user *argptr)
+{
+ struct gna_private *gna_priv;
+ union gna_memory_map gna_mem;
+ int ret;
+
+ gna_priv = file_priv->gna_priv;
+
+ if (copy_from_user(&gna_mem, argptr, sizeof(gna_mem))) {
+ dev_err(&gna_priv->dev, "could not copy userptr ioctl data from user\n");
+ return -EFAULT;
+ }
+
+ ret = gna_priv_userptr(file_priv, &gna_mem);
+ if (ret)
+ return ret;
+
+ if (copy_to_user(argptr, &gna_mem, sizeof(gna_mem))) {
+ dev_err(&gna_priv->dev, "could not copy userptr ioctl status to user\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+static int gna_ioctl_free(struct gna_file_private *file_priv, unsigned long arg)
+{
+ struct gna_memory_object *iter_mo, *temp_mo;
+ struct gna_memory_object *mo;
+ struct gna_private *gna_priv;
+
+ u64 memory_id = arg;
+
+ gna_priv = file_priv->gna_priv;
+
+ /* get kernel space memory pointer */
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+
+ if (!mo) {
+ dev_warn(&gna_priv->dev, "memory object not found\n");
+ return -EINVAL;
+ }
+
+ queue_work(gna_priv->request_wq, &mo->work);
+ if (wait_event_interruptible(mo->waitq, true)) {
+ dev_dbg(&gna_priv->dev, "wait interrupted\n");
+ return -ETIME;
+ }
+
+ mutex_lock(&file_priv->memlist_lock);
+ list_for_each_entry_safe(iter_mo, temp_mo, &file_priv->memory_list, file_mem_list) {
+ if (iter_mo->memory_id == memory_id) {
+ list_del(&iter_mo->file_mem_list);
+ break;
+ }
+ }
+ mutex_unlock(&file_priv->memlist_lock);
+
+ gna_memory_free(gna_priv, mo);
+
+ return 0;
+}
+
+static int gna_ioctl_getparam(struct gna_private *gna_priv, void __user *argptr)
+{
+ union gna_parameter param;
+ int ret;
+
+ if (copy_from_user(&param, argptr, sizeof(param))) {
+ dev_err(&gna_priv->dev, "could not copy getparam ioctl data from user\n");
+ return -EFAULT;
+ }
+
+ ret = gna_getparam(gna_priv, &param);
+ if (ret)
+ return ret;
+
+ if (copy_to_user(argptr, &param, sizeof(param))) {
+ dev_err(&gna_priv->dev, "could not copy getparam ioctl status to user\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+long gna_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+{
+ struct gna_file_private *file_priv;
+ struct gna_private *gna_priv;
+ void __user *argptr;
+ int ret = 0;
+
+ argptr = (void __user *)arg;
+
+ file_priv = (struct gna_file_private *)f->private_data;
+ if (!file_priv)
+ return -ENODEV;
+
+ gna_priv = file_priv->gna_priv;
+ if (!gna_priv)
+ return -ENODEV;
+
+ switch (cmd) {
+ case GNA_GET_PARAMETER:
+ ret = gna_ioctl_getparam(gna_priv, argptr);
+ break;
+
+ case GNA_MAP_MEMORY:
+ ret = gna_ioctl_userptr(file_priv, argptr);
+ break;
+
+ case GNA_UNMAP_MEMORY:
+ ret = gna_ioctl_free(file_priv, arg);
+ break;
+
+ case GNA_COMPUTE:
+ ret = gna_ioctl_score(file_priv, argptr);
+ break;
+
+ case GNA_WAIT:
+ ret = gna_ioctl_wait(f, argptr);
+ break;
+
+ default:
+ dev_warn(&gna_priv->dev, "wrong ioctl %#x\n", cmd);
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
diff --git a/drivers/misc/gna/gna_ioctl.h b/drivers/misc/gna/gna_ioctl.h
new file mode 100644
index 000000000000..562f7f835f5f
--- /dev/null
+++ b/drivers/misc/gna/gna_ioctl.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_IOCTL_H__
+#define __GNA_IOCTL_H__
+
+#include <linux/fs.h>
+
+long gna_ioctl(struct file *f, unsigned int cmd, unsigned long arg);
+
+#endif // __GNA_IOCTL_H__
--
2.28.0
On Tue, Feb 16, 2021 at 05:05:13PM +0100, Maciej Kwapulinski wrote:
> [...]
>
> 11. Include kernel-doc to document global kernel APIs. (Not required for static functions, but OK there also.) Use make htmldocs or make pdfdocs to check the kernel-doc and fix any issues.
> (Addressed by adding new gna.rst in Documentation; tested with output from ‘make htmldocs’)
>
> 12. Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_ATOMIC_SLEEP, CONFIG_PROVE_RCU and CONFIG_DEBUG_OBJECTS_RCU_HEAD all simultaneously enabled.
> (Checked)
>
> 13. Has been build- and runtime tested with and without CONFIG_SMP and CONFIG_PREEMPT.
> (Checked)
>
> 15. All new /proc entries are documented under Documentation/.
> (The driver doesn’t introduce any new procs)
>
> 16. All new kernel boot parameters are documented in Documentation/admin-guide/kernel-parameters.rst.
> (The driver doesn’t add boot parameters)
>
> 17. All new module parameters are documented with MODULE_PARM_DESC().
> (Checked)
>
> 21. Newly-added code has been compiled with gcc -W (use make EXTRA_CFLAGS=-W). This will generate lots of noise, but is good for finding bugs like “warning: comparison between signed and unsigned”.
> (Checked)
>
> 24. If any ioctl’s are added by the patch, then also update Documentation/userspace-api/ioctl/ioctl-number.rst.
> (Updated)
What is all of this? Do you see this on any other patch submission's
00/XX patch description?
And please wrap your columns properly.
> The above results only reflect our understanding of the tests and the code referred to. Please kindly let us know any issues or different observations from any further tests.
>
> Thanks
>
> Series-reviewed-by: Tony Luck <[email protected]>
That's not a proper sign-off-by or reviewed-by, why isn't this on the
individual patches?
You all keep coming up with new and unique ways to do patch submissions
different than anyone else, why go through all of that effort?
I'll look at this after 5.12-rc1 is out, but odds are, I'm not going to
be happy with it if this is the start...
greg k-h
On Tue, Feb 16, 2021 at 05:05:25PM +0100, Maciej Kwapulinski wrote:
> +static inline struct gna_private *inode_to_gna(struct inode *inode)
> +{
> + return container_of(inode->i_cdev, struct gna_private, cdev);
> +}
> +
> +static int gna_open(struct inode *inode, struct file *f)
> +{
> + struct gna_file_private *file_priv;
> + struct gna_private *gna_priv;
> +
> + gna_priv = inode_to_gna(inode);
> + if (!gna_priv)
> + return -ENODEV;
Why are you testing for things that are impossible to ever happen?
Please go read your own function for proof...
{sigh}
greg k-h
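For context on this last point, a minimal sketch of why the check is dead code, assuming the cdev embedded in struct gna_private as in this series: container_of() only performs pointer arithmetic on inode->i_cdev, so inode_to_gna() cannot return NULL for an open() that reaches these fops.

/* Sketch: container_of() is offset arithmetic, so the result of
 * inode_to_gna() is never NULL here and the check can be dropped.
 */
#include <linux/fs.h>

#include "gna_device.h"		/* struct gna_private with embedded cdev */

static int gna_open_sketch(struct inode *inode, struct file *f)
{
	struct gna_private *gna_priv =
		container_of(inode->i_cdev, struct gna_private, cdev);

	/* no "if (!gna_priv)" needed; i_cdev is valid for any open()
	 * dispatched through this character device
	 */
	f->private_data = gna_priv;	/* placeholder; the real driver
					 * allocates a gna_file_private */
	return 0;
}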