2021-05-13 11:49:05

by Maciej Kwapulinski

Subject: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

Dear kernel maintainers,

This submission is a kernel driver to support Intel(R) Gaussian & Neural
Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
available on multiple Intel platforms. AI developers and users can offload
continuous inference workloads to an Intel(R) GNA device in order to free
processor resources and save power. Noise reduction and speech recognition
are examples of the workloads Intel(R) GNA handles, though its usage is not
limited to these two.

For a list of processors equipped with Intel(R) GNA device, please refer to
this link:
https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_supported_plugins_GNA.html

We think contributing this driver to the upstream kernel project is the
best way for developers and users to get the latest Intel(R) GNA support in
a Linux kernel, flowing from the mainline into any Linux distribution
installed on their systems. Upstreaming also opens the driver to
contributions from developers around the world once it is merged.

The driver works with Intel(R) libraries in user space, exposing a few
IOCTL interfaces for their use. The libraries are open source and are
available at:
https://github.com/intel/gna

---

Changelogs:

v1->v2:
- driver's new layout:
- driver name: gna -> intel_gna
- module name: gna -> intel_gna
- device file name: /dev/gnaN -> /dev/intel_gnaN
- driver's source directory: drivers/misc/gna/ -> drivers/misc/intel/gna/
- UAPI: include/uapi/misc/gna.h -> include/uapi/misc/intel/gna.h
- DOC: Documentation/misc-devices/gna.rst ->
Documentation/misc-devices/intel/gna.rst
- 'MISC' device framework used
- fixes throughout GNA device's PCI management
- header files' includes and forward declarations cleanup
- ISR made static
- unused comments cleanup
- "_priv_" segment removed from function names
- tested: v5.11-rc3 -> v5.11
- number of other/minor fixes

v2->v3:
- PCI glue driver part split.
- GNA probe fail path made fully implicit.
- 'recovery_timeout' module parameter present under 'CONFIG_DEBUG_INTEL_GNA' flag only.
- build for X86_32 enabled.
- module initialization through 'module_pci_driver()'.
- gna_priv->file_list cleanup.
- 'gna_' prefix removed from source files' names.
- power management handling added.
- tests performed on kernel v5.12
- number of other/minor fixes

---

Maciej Kwapulinski (4):
intel_gna: add driver module
intel_gna: add interrupt handler
intel_gna: add a 'misc' device
intel_gna: add power management

Tomasz Jankowski (10):
intel_gna: add component of hardware operation
intel_gna: read hardware info in the driver
intel_gna: add memory handling
intel_gna: initialize mmu
intel_gna: add hardware ids
intel_gna: add request component
intel_gna: implement scoring
intel_gna: add a work queue to process scoring requests
intel_gna: add ioctl handler
intel_gna: add file operations to a 'misc' device

Documentation/misc-devices/index.rst | 1 +
Documentation/misc-devices/intel/gna.rst | 48 ++
.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 7 +
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/intel/gna/Kbuild | 5 +
drivers/misc/intel/gna/Kconfig | 13 +
drivers/misc/intel/gna/device.c | 350 +++++++++++++
drivers/misc/intel/gna/device.h | 106 ++++
drivers/misc/intel/gna/hw.c | 118 +++++
drivers/misc/intel/gna/hw.h | 106 ++++
drivers/misc/intel/gna/ioctl.c | 257 ++++++++++
drivers/misc/intel/gna/ioctl.h | 11 +
drivers/misc/intel/gna/mem.c | 421 +++++++++++++++
drivers/misc/intel/gna/mem.h | 115 +++++
drivers/misc/intel/gna/pci.c | 149 ++++++
drivers/misc/intel/gna/pci.h | 12 +
drivers/misc/intel/gna/request.c | 480 ++++++++++++++++++
drivers/misc/intel/gna/request.h | 67 +++
drivers/misc/intel/gna/score.c | 291 +++++++++++
drivers/misc/intel/gna/score.h | 17 +
include/uapi/misc/intel/gna.h | 153 ++++++
23 files changed, 2730 insertions(+)
create mode 100644 Documentation/misc-devices/intel/gna.rst
create mode 100644 drivers/misc/intel/gna/Kbuild
create mode 100644 drivers/misc/intel/gna/Kconfig
create mode 100644 drivers/misc/intel/gna/device.c
create mode 100644 drivers/misc/intel/gna/device.h
create mode 100644 drivers/misc/intel/gna/hw.c
create mode 100644 drivers/misc/intel/gna/hw.h
create mode 100644 drivers/misc/intel/gna/ioctl.c
create mode 100644 drivers/misc/intel/gna/ioctl.h
create mode 100644 drivers/misc/intel/gna/mem.c
create mode 100644 drivers/misc/intel/gna/mem.h
create mode 100644 drivers/misc/intel/gna/pci.c
create mode 100644 drivers/misc/intel/gna/pci.h
create mode 100644 drivers/misc/intel/gna/request.c
create mode 100644 drivers/misc/intel/gna/request.h
create mode 100644 drivers/misc/intel/gna/score.c
create mode 100644 drivers/misc/intel/gna/score.h
create mode 100644 include/uapi/misc/intel/gna.h

--
2.28.0



2021-05-13 11:50:21

by Maciej Kwapulinski

Subject: [PATCH v3 04/14] intel_gna: add memory handling

From: Tomasz Jankowski <[email protected]>

Add memory handling: mapping, DMA configuration and page pinning.
The GNA driver maps and unmaps the physical pages backing a 64-byte
aligned buffer allocated by a user space program. The pages of mapped
memory are pinned only during actual computation.

Add configuration of the DMA scatter-gather list for the physical pages,
and generation of the page table and page directory that are programmed
into the GNA hardware when scoring is initiated.

The GNA MMU is configured based on the memory usage of a specific
request. As the MMU can address up to 256 MB, a single scoring request
is limited to that amount of memory.

The GNA library can allocate any number of memory regions for GNA usage.
Their number and total capacity are limited by the OS's resources. Due
to the GNA MMU restriction, even when multiple memory regions are used,
the sum of all memory regions used within a single inference request
must be less than 256 MB.

At least one GNA memory region must be allocated (it can be shared by
multiple models). At the other extreme, each GNA tensor (e.g.,
weights/biases/inputs/outputs) could use its own, separate GNA memory
region.

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/Kbuild | 2 +-
drivers/misc/intel/gna/device.c | 18 ++
drivers/misc/intel/gna/device.h | 28 +++
drivers/misc/intel/gna/mem.c | 418 ++++++++++++++++++++++++++++++++
drivers/misc/intel/gna/mem.h | 115 +++++++++
include/uapi/misc/intel/gna.h | 12 +
6 files changed, 592 insertions(+), 1 deletion(-)
create mode 100644 drivers/misc/intel/gna/mem.c
create mode 100644 drivers/misc/intel/gna/mem.h

diff --git a/drivers/misc/intel/gna/Kbuild b/drivers/misc/intel/gna/Kbuild
index 69e20c8c22bd..64e8f10fd891 100644
--- a/drivers/misc/intel/gna/Kbuild
+++ b/drivers/misc/intel/gna/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only

-intel_gna-y := device.o hw.o pci.o
+intel_gna-y := device.o hw.o mem.o pci.o

obj-$(CONFIG_INTEL_GNA) += intel_gna.o
diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index 45b4b6df64ef..ed7d3c0223df 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -16,6 +16,13 @@ module_param(recovery_timeout, int, 0644);
MODULE_PARM_DESC(recovery_timeout, "Recovery timeout in seconds");
#endif

+static void gna_devm_idr_destroy(void *data)
+{
+ struct idr *idr = data;
+
+ idr_destroy(idr);
+}
+
int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase)
{
static atomic_t dev_last_idx = ATOMIC_INIT(-1);
@@ -52,6 +59,17 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
bld_reg = gna_reg_read(gna_priv, GNA_MMIO_IBUFFS);
gna_priv->hw_info.in_buf_s = bld_reg & GENMASK(7, 0);

+ mutex_init(&gna_priv->mmu_lock);
+
+ idr_init(&gna_priv->memory_idr);
+ ret = devm_add_action(parent, gna_devm_idr_destroy, &gna_priv->memory_idr);
+ if (ret) {
+ dev_err(parent, "could not add devm action for idr\n");
+ return ret;
+ }
+
+ mutex_init(&gna_priv->memidr_lock);
+
return 0;
}

diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index 057dec0e1983..f74c773867aa 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -4,14 +4,29 @@
#ifndef __GNA_DEVICE_H__
#define __GNA_DEVICE_H__

+#include <linux/idr.h>
#include <linux/io.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
#include <linux/types.h>

#include "hw.h"
+#include "mem.h"

#define GNA_DV_NAME "intel_gna"

struct device;
+struct file;
+
+struct gna_file_private {
+ struct file *fd;
+ struct gna_private *gna_priv;
+
+ struct list_head memory_list;
+ struct mutex memlist_lock;
+
+ struct list_head flist;
+};

struct gna_private {
int index;
@@ -25,6 +40,14 @@ struct gna_private {
void __iomem *iobase;
struct gna_dev_info info;
struct gna_hw_info hw_info;
+
+ struct gna_mmu_object mmu;
+ struct mutex mmu_lock;
+
+ /* memory objects' store */
+ struct idr memory_idr;
+ /* lock protecting memory_idr */
+ struct mutex memidr_lock;
};

int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase);
@@ -39,6 +62,11 @@ static inline void gna_reg_write(struct gna_private *gna_priv, u32 reg, u32 val)
writel(val, gna_priv->iobase + reg);
}

+static inline struct device *gna_parent(struct gna_private *gna_priv)
+{
+ return gna_priv->parent;
+}
+
static inline struct device *gna_dev(struct gna_private *gna_priv)
{
return gna_priv->parent;
diff --git a/drivers/misc/intel/gna/mem.c b/drivers/misc/intel/gna/mem.c
new file mode 100644
index 000000000000..bdc2771a0d18
--- /dev/null
+++ b/drivers/misc/intel/gna/mem.c
@@ -0,0 +1,418 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/pagemap.h>
+#include <linux/sched/mm.h>
+#include <linux/sched/task.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#include <uapi/misc/intel/gna.h>
+
+#include "hw.h"
+#include "device.h"
+#include "mem.h"
+
+static void gna_mmu_init(struct gna_private *gna_priv)
+{
+ struct gna_mmu_object *mmu;
+ dma_addr_t pagetable_dma;
+ u32 *pgdirn;
+ int i;
+
+ mmu = &gna_priv->mmu;
+
+ pgdirn = mmu->hwdesc->mmu.pagedir_n;
+
+ for (i = 0; i < mmu->num_pagetables; i++) {
+ pagetable_dma = mmu->pagetables_dma[i];
+ pgdirn[i] = pagetable_dma >> PAGE_SHIFT;
+ }
+
+ for (; i < GNA_PGDIRN_LEN; i++)
+ pgdirn[i] = GNA_PGDIR_INVALID;
+}
+
+/* descriptor and page tables allocation */
+int gna_mmu_alloc(struct gna_private *gna_priv)
+{
+ struct device *parent = gna_parent(gna_priv);
+ struct gna_mmu_object *mmu;
+ int desc_size;
+ int i;
+
+ if (gna_priv->info.num_pagetables > GNA_PGDIRN_LEN) {
+ dev_err(gna_dev(gna_priv), "too large number of pagetables requested\n");
+ return -EINVAL;
+ }
+
+ mmu = &gna_priv->mmu;
+
+ desc_size = round_up(gna_priv->info.desc_info.desc_size, PAGE_SIZE);
+
+ mmu->hwdesc = dmam_alloc_coherent(parent, desc_size, &mmu->hwdesc_dma,
+ GFP_KERNEL);
+ if (!mmu->hwdesc)
+ return -ENOMEM;
+
+ mmu->num_pagetables = gna_priv->info.num_pagetables;
+
+ mmu->pagetables_dma = devm_kmalloc_array(parent, mmu->num_pagetables, sizeof(*mmu->pagetables_dma),
+ GFP_KERNEL);
+ if (!mmu->pagetables_dma)
+ return -ENOMEM;
+
+ mmu->pagetables = devm_kmalloc_array(parent, mmu->num_pagetables, sizeof(*mmu->pagetables), GFP_KERNEL);
+
+ if (!mmu->pagetables)
+ return -ENOMEM;
+
+ for (i = 0; i < mmu->num_pagetables; i++) {
+ mmu->pagetables[i] = dmam_alloc_coherent(parent, PAGE_SIZE,
+ &mmu->pagetables_dma[i], GFP_KERNEL);
+ if (!mmu->pagetables[i])
+ return -ENOMEM;
+ }
+
+ gna_mmu_init(gna_priv);
+
+ return 0;
+}
+
+void gna_mmu_add(struct gna_private *gna_priv, struct gna_memory_object *mo)
+{
+ struct gna_mmu_object *mmu;
+ struct scatterlist *sgl;
+ dma_addr_t sg_page;
+ int sg_page_len;
+ u32 *pagetable;
+ u32 mmu_page;
+ int sg_pages;
+ int i;
+ int j;
+
+ mmu = &gna_priv->mmu;
+ mutex_lock(&gna_priv->mmu_lock);
+
+ j = mmu->filled_pages;
+ sgl = mo->sgt->sgl;
+ if (!sgl) {
+ dev_warn(gna_dev(gna_priv), "empty scatter list in memory object\n");
+ goto warn_empty_sgl;
+ }
+ sg_page = sg_dma_address(sgl);
+ sg_page_len = round_up(sg_dma_len(sgl), PAGE_SIZE) >> PAGE_SHIFT;
+ sg_pages = 0;
+
+ for (i = mmu->filled_pts; i < mmu->num_pagetables; i++) {
+ if (!sgl)
+ break;
+
+ pagetable = mmu->pagetables[i];
+
+ for (j = mmu->filled_pages; j < GNA_PT_LENGTH; j++) {
+ mmu_page = sg_page >> PAGE_SHIFT;
+ pagetable[j] = mmu_page;
+
+ mmu->filled_pages++;
+ sg_page += PAGE_SIZE;
+ sg_pages++;
+ if (sg_pages == sg_page_len) {
+ sgl = sg_next(sgl);
+ if (!sgl)
+ break;
+
+ sg_page = sg_dma_address(sgl);
+ sg_page_len =
+ round_up(sg_dma_len(sgl), PAGE_SIZE)
+ >> PAGE_SHIFT;
+ sg_pages = 0;
+ }
+ }
+
+ if (j == GNA_PT_LENGTH) {
+ mmu->filled_pages = 0;
+ mmu->filled_pts++;
+ }
+ }
+
+ mmu->hwdesc->mmu.vamaxaddr =
+ (mmu->filled_pts * PAGE_SIZE * GNA_PGDIR_ENTRIES) +
+ (mmu->filled_pages * PAGE_SIZE) - 1;
+ dev_dbg(gna_dev(gna_priv), "vamaxaddr set to %u\n", mmu->hwdesc->mmu.vamaxaddr);
+
+warn_empty_sgl:
+ mutex_unlock(&gna_priv->mmu_lock);
+}
+
+void gna_mmu_clear(struct gna_private *gna_priv)
+{
+ struct gna_mmu_object *mmu;
+ int i;
+
+ mmu = &gna_priv->mmu;
+ mutex_lock(&gna_priv->mmu_lock);
+
+ for (i = 0; i < mmu->filled_pts; i++)
+ memset(mmu->pagetables[i], 0, PAGE_SIZE);
+
+ if (mmu->filled_pages > 0)
+ memset(mmu->pagetables[mmu->filled_pts], 0, mmu->filled_pages * GNA_PT_ENTRY_SIZE);
+
+ mmu->filled_pts = 0;
+ mmu->filled_pages = 0;
+ mmu->hwdesc->mmu.vamaxaddr = 0;
+
+ mutex_unlock(&gna_priv->mmu_lock);
+}
+
+int gna_buffer_get_size(u64 offset, u64 size)
+{
+ u64 page_offset;
+
+ page_offset = offset & ~PAGE_MASK;
+ return round_up(page_offset + size, PAGE_SIZE);
+}
+
+/* must be called with gna_memory_object page_lock held */
+static int gna_get_pages(struct gna_memory_object *mo, u64 offset, u64 size)
+{
+ struct gna_private *gna_priv;
+ u64 effective_address;
+ struct mm_struct *mm;
+ struct sg_table *sgt;
+ struct page **pages;
+ int effective_size;
+ int num_pinned;
+ int num_pages;
+ int skip_size;
+ int ents;
+ int ret;
+
+ ret = 0;
+ gna_priv = mo->gna_priv;
+
+ if (mo->pages) {
+ dev_warn(gna_dev(gna_priv), "pages are already pinned\n");
+ return -EFAULT;
+ }
+
+ /* using vmalloc because num_pages can be large */
+ skip_size = round_down(offset, PAGE_SIZE);
+ effective_address = mo->user_address + skip_size;
+ dev_dbg(gna_dev(gna_priv), "user address %llx\n", mo->user_address);
+ dev_dbg(gna_dev(gna_priv), "effective user address %llx\n", effective_address);
+
+ effective_size = gna_buffer_get_size(offset, size);
+
+ num_pages = effective_size >> PAGE_SHIFT;
+ dev_dbg(gna_dev(gna_priv), "allocating %d pages\n", num_pages);
+ pages = kvmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
+ if (!pages) {
+ ret = -ENOMEM;
+ goto err_exit;
+ }
+
+ get_task_struct(mo->task);
+ mm = get_task_mm(mo->task);
+ if (!mm) {
+ ret = -ENOENT;
+ goto err_put_task;
+ }
+ mmap_read_lock(mm);
+ num_pinned = get_user_pages_remote(mm, effective_address, num_pages,
+ FOLL_WRITE, pages, NULL, NULL);
+ mmap_read_unlock(mm);
+ mmput(mm);
+
+ if (num_pinned <= 0) {
+ ret = num_pinned ? num_pinned : -EFAULT; /* zero pages pinned is still a failure */
+ dev_err(gna_dev(gna_priv), "function get_user_pages_remote() failed\n");
+ goto err_free_pages;
+ }
+ if (num_pinned < num_pages) {
+ ret = -EFAULT;
+ dev_err(gna_dev(gna_priv),
"get_user_pages_remote() pinned fewer pages than requested\n");
+ goto err_free_pages;
+ }
+
+ sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+ if (!sgt) {
+ ret = -ENOMEM;
+ goto err_put_pages;
+ }
+
+ ret = sg_alloc_table_from_pages(sgt, pages, num_pinned, 0, mo->memory_size, GFP_KERNEL);
+ if (ret) {
+ dev_err(gna_dev(gna_priv), "could not alloc scatter list\n");
+ goto err_free_sgt;
+ }
+
+ if (IS_ERR(sgt->sgl)) {
+ dev_err(gna_dev(gna_priv), "sgl allocation failed\n");
+ ret = PTR_ERR(sgt->sgl);
+ goto err_free_sgt;
+ }
+
+ ents = dma_map_sg(gna_parent(gna_priv), sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
+ if (ents <= 0) {
+ dev_err(gna_dev(gna_priv), "could not map scatter gather list\n");
+ ret = -EIO;
+ goto err_free_sgl;
+ }
+
+ mo->sgt = sgt;
+ mo->pages = pages;
+ mo->num_pinned = num_pinned;
+
+ return 0;
+
+err_free_sgl:
+ sg_free_table(sgt);
+
+err_free_sgt:
+ kfree(sgt);
+
+err_put_pages:
+ release_pages(pages, num_pinned);
+
+err_free_pages:
+ kvfree(pages);
+
+err_put_task:
+ put_task_struct(mo->task);
+
+err_exit:
+ return ret;
+}
+
+/* must be called with gna_memory_object page_lock held */
+static void gna_put_pages(struct gna_memory_object *mo)
+{
+ struct gna_private *gna_priv;
+ struct sg_table *sgt;
+
+ gna_priv = mo->gna_priv;
+
+ if (!mo->pages) {
+ dev_warn(gna_dev(gna_priv), "memory object %llu has no pages\n", mo->memory_id);
+ return;
+ }
+
+ sgt = mo->sgt;
+
+ dma_unmap_sg(gna_parent(gna_priv), sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
+ sg_free_table(sgt);
+ kfree(sgt);
+ mo->sgt = NULL;
+
+ release_pages(mo->pages, mo->num_pinned);
+ kvfree(mo->pages);
+ mo->pages = NULL;
+ mo->num_pinned = 0;
+
+ put_task_struct(mo->task);
+}
+
+void gna_memory_free(struct gna_private *gna_priv, struct gna_memory_object *mo)
+{
+ mutex_lock(&gna_priv->memidr_lock);
+ idr_remove(&gna_priv->memory_idr, mo->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+
+ cancel_work_sync(&mo->work);
+ kfree(mo);
+}
+
+static void gna_memory_release(struct work_struct *work)
+{
+ struct gna_memory_object *mo;
+
+ mo = container_of(work, struct gna_memory_object, work);
+
+ mo->user_ptr = NULL;
+
+ wake_up_interruptible(&mo->waitq);
+}
+
+static const struct gna_memory_operations memory_ops = {
+ .get_pages = gna_get_pages,
+ .put_pages = gna_put_pages,
+};
+
+int gna_map_memory(struct gna_file_private *file_priv, union gna_memory_map *gna_mem)
+{
+ struct gna_memory_object *mo;
+ struct gna_private *gna_priv;
+ int memory_id;
+ int ret;
+
+ ret = 0;
+
+ gna_priv = file_priv->gna_priv;
+
+ if (gna_mem->in.address & ~PAGE_MASK) {
+ dev_err(gna_dev(gna_priv), "user pointer not page aligned\n");
+ return -EINVAL;
+ }
+
+ if (!gna_mem->in.size) {
+ dev_err(gna_dev(gna_priv), "invalid user memory size\n");
+ return -EINVAL;
+ }
+
+ if (!access_ok(u64_to_user_ptr(gna_mem->in.address), gna_mem->in.size)) {
+ dev_err(gna_dev(gna_priv), "invalid user pointer\n");
+ return -EINVAL;
+ }
+
+ mo = kzalloc(sizeof(*mo), GFP_KERNEL);
+ if (!mo)
+ return -ENOMEM;
+
+ mo->fd = file_priv->fd;
+ mo->gna_priv = gna_priv;
+ mo->ops = &memory_ops;
+ mo->user_address = gna_mem->in.address;
+ mo->memory_size = gna_mem->in.size;
+ mo->user_ptr = u64_to_user_ptr(gna_mem->in.address);
+ mo->num_pages = round_up(gna_mem->in.size, PAGE_SIZE) >> PAGE_SHIFT;
+ mo->task = current;
+ INIT_WORK(&mo->work, gna_memory_release);
+ init_waitqueue_head(&mo->waitq);
+ mutex_init(&mo->page_lock);
+
+ mutex_lock(&gna_priv->memidr_lock);
+ memory_id = idr_alloc(&gna_priv->memory_idr, mo, 1, 0, GFP_KERNEL);
+ mutex_unlock(&gna_priv->memidr_lock);
+
+ if (memory_id < 0) {
+ dev_err(gna_dev(gna_priv), "idr allocation for memory failed\n");
+ ret = -EFAULT;
+ goto err_free_mo;
+ }
+
+ mo->memory_id = (u64)memory_id;
+
+ mutex_lock(&file_priv->memlist_lock);
+ list_add_tail(&mo->file_mem_list, &file_priv->memory_list);
+ mutex_unlock(&file_priv->memlist_lock);
+
+ gna_mem->out.memory_id = mo->memory_id;
+
+ return 0;
+
+err_free_mo:
+ kfree(mo);
+ return ret;
+}
diff --git a/drivers/misc/intel/gna/mem.h b/drivers/misc/intel/gna/mem.h
new file mode 100644
index 000000000000..b198b35cbb68
--- /dev/null
+++ b/drivers/misc/intel/gna/mem.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_MEM_H__
+#define __GNA_MEM_H__
+
+#include <linux/mmu_notifier.h>
+#include <linux/workqueue.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/list.h>
+#include <linux/wait.h>
+
+#include "hw.h"
+
+struct gna_file_private;
+union gna_memory_map;
+struct gna_private;
+struct task_struct;
+struct mm_struct;
+struct file;
+
+struct gna_xnn_descriptor {
+ u32 labase;
+ u16 lacount;
+ u16 _rsvd;
+};
+
+struct gna_mmu {
+ u32 vamaxaddr;
+ u8 __res_204[12];
+ u32 pagedir_n[GNA_PGDIRN_LEN];
+};
+
+struct gna_hw_descriptor {
+ u8 __res_0000[256];
+ struct gna_xnn_descriptor xnn_config;
+ u8 __unused[248];
+ struct gna_mmu mmu;
+};
+
+struct gna_mmu_object {
+ struct gna_hw_descriptor *hwdesc;
+
+ dma_addr_t hwdesc_dma;
+
+ u32 **pagetables;
+ dma_addr_t *pagetables_dma;
+
+ u32 num_pagetables;
+
+ u32 filled_pts;
+ u32 filled_pages;
+};
+
+struct gna_memory_operations;
+
+struct gna_memory_object {
+ u64 memory_id;
+
+ const struct gna_memory_operations *ops;
+
+ struct gna_private *gna_priv;
+ struct file *fd;
+
+ void __user *user_ptr;
+ u64 user_address;
+ u64 memory_size;
+
+ struct page **pages;
+ struct sg_table *sgt;
+ int num_pages;
+ int num_pinned;
+ struct mutex page_lock; /* protects get/put pages operations */
+
+ struct task_struct *task;
+
+ struct list_head file_mem_list;
+
+ struct work_struct work;
+
+ struct wait_queue_head waitq;
+};
+
+struct gna_mmu_notifier {
+ struct gna_file_private *file_priv;
+ struct gna_private *gna_priv;
+ struct gna_memory_object *mo;
+ struct mmu_notifier mn;
+ struct mm_struct *mm;
+};
+
+struct gna_memory_operations {
+ /* pins pages */
+ int (*get_pages)(struct gna_memory_object *mo, u64 offset, u64 size);
+
+ /* puts previously pinned pages */
+ void (*put_pages)(struct gna_memory_object *mo);
+};
+
+int gna_buffer_get_size(u64 offset, u64 size);
+
+int gna_map_memory(struct gna_file_private *file_priv, union gna_memory_map *gna_mem);
+
+int gna_mmu_alloc(struct gna_private *gna_priv);
+
+void gna_mmu_free(struct gna_private *gna_priv);
+
+void gna_mmu_add(struct gna_private *gna_priv, struct gna_memory_object *object);
+
+void gna_mmu_clear(struct gna_private *gna_priv);
+
+void gna_memory_free(struct gna_private *gna_priv, struct gna_memory_object *mo);
+
+#endif // __GNA_MEM_H__
diff --git a/include/uapi/misc/intel/gna.h b/include/uapi/misc/intel/gna.h
index 16a44efd0f76..a4b6a1e2010c 100644
--- a/include/uapi/misc/intel/gna.h
+++ b/include/uapi/misc/intel/gna.h
@@ -40,6 +40,18 @@ struct gna_compute_cfg {
__u8 pad[5];
};

+union gna_memory_map {
+ struct {
+ __u64 address;
+ __u32 size;
+ __u32 pad;
+ } in;
+
+ struct {
+ __u64 memory_id;
+ } out;
+};
+
#if defined(__cplusplus)
}
#endif
--
2.28.0


2021-05-13 11:50:46

by Maciej Kwapulinski

Subject: [PATCH v3 11/14] intel_gna: add ioctl handler

From: Tomasz Jankowski <[email protected]>

Add an ioctl handler to the GNA driver.
The ioctl interface provides the ability to do the following:
- Map and unmap memory buffers for GNA computation requests.
- Retrieve capabilities of the underlying GNA IP.
- Submit GNA computation requests.
- Request notification of scoring completion.

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/Kbuild | 2 +-
drivers/misc/intel/gna/device.c | 47 ++++++
drivers/misc/intel/gna/device.h | 2 +
drivers/misc/intel/gna/ioctl.c | 257 ++++++++++++++++++++++++++++++++
drivers/misc/intel/gna/ioctl.h | 11 ++
include/uapi/misc/intel/gna.h | 53 +++++++
6 files changed, 371 insertions(+), 1 deletion(-)
create mode 100644 drivers/misc/intel/gna/ioctl.c
create mode 100644 drivers/misc/intel/gna/ioctl.h

diff --git a/drivers/misc/intel/gna/Kbuild b/drivers/misc/intel/gna/Kbuild
index 38ff97360ed8..745a192a7304 100644
--- a/drivers/misc/intel/gna/Kbuild
+++ b/drivers/misc/intel/gna/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only

-intel_gna-y := device.o hw.o mem.o pci.o request.o score.o
+intel_gna-y := device.o hw.o ioctl.o mem.o pci.o request.o score.o

obj-$(CONFIG_INTEL_GNA) += intel_gna.o
diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index 75d8e1675485..0e31b8c6bb70 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -6,8 +6,11 @@
#include <linux/interrupt.h>
#include <linux/module.h>

+#include <uapi/misc/intel/gna.h>
+
#include "device.h"
#include "hw.h"
+#include "ioctl.h"
#include "request.h"

static int recovery_timeout = 60;
@@ -145,6 +148,50 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
return 0;
}

+static u32 gna_device_type_by_hwid(u32 hwid)
+{
+ switch (hwid) {
+ case GNA_DEV_HWID_CNL:
+ return GNA_DEV_TYPE_0_9;
+ case GNA_DEV_HWID_GLK:
+ case GNA_DEV_HWID_EHL:
+ case GNA_DEV_HWID_ICL:
+ return GNA_DEV_TYPE_1_0;
+ case GNA_DEV_HWID_JSL:
+ case GNA_DEV_HWID_TGL:
+ case GNA_DEV_HWID_RKL:
+ return GNA_DEV_TYPE_2_0;
+ case GNA_DEV_HWID_ADL:
+ case GNA_DEV_HWID_RPL:
+ return GNA_DEV_TYPE_3_0;
+ default:
+ return 0;
+ }
+}
+
+int gna_getparam(struct gna_private *gna_priv, union gna_parameter *param)
+{
+ switch (param->in.id) {
+ case GNA_PARAM_DEVICE_ID:
+ param->out.value = gna_priv->info.hwid;
+ break;
+ case GNA_PARAM_RECOVERY_TIMEOUT:
+ param->out.value = jiffies_to_msecs(gna_priv->recovery_timeout_jiffies) / 1000;
+ break;
+ case GNA_PARAM_INPUT_BUFFER_S:
+ param->out.value = gna_priv->hw_info.in_buf_s;
+ break;
+ case GNA_PARAM_DEVICE_TYPE:
+ param->out.value = gna_device_type_by_hwid(gna_priv->info.hwid);
+ break;
+ default:
+ dev_err(gna_dev(gna_priv), "unknown parameter id %llu\n", param->in.id);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
MODULE_AUTHOR("Intel Corporation");
MODULE_DESCRIPTION("Intel(R) Gaussian & Neural Accelerator (Intel(R) GNA) Driver");
MODULE_LICENSE("GPL");
diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index d3c86d649b5c..75784882f57c 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -17,6 +17,7 @@
#define GNA_DV_NAME "intel_gna"

struct workqueue_struct;
+union gna_parameter;
struct device;
struct file;

@@ -71,6 +72,7 @@ struct gna_private {
};

int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase, int irq);
+int gna_getparam(struct gna_private *gna_priv, union gna_parameter *param);

static inline u32 gna_reg_read(struct gna_private *gna_priv, u32 reg)
{
diff --git a/drivers/misc/intel/gna/ioctl.c b/drivers/misc/intel/gna/ioctl.c
new file mode 100644
index 000000000000..4a90135b3cc6
--- /dev/null
+++ b/drivers/misc/intel/gna/ioctl.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/jiffies.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/uaccess.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
+
+#include <uapi/misc/intel/gna.h>
+
+#include "device.h"
+#include "ioctl.h"
+#include "mem.h"
+#include "request.h"
+#include "score.h"
+
+static int gna_ioctl_score(struct gna_file_private *file_priv, void __user *argptr)
+{
+ union gna_compute score_args;
+ struct gna_private *gna_priv;
+ u64 request_id;
+ int ret;
+
+ gna_priv = file_priv->gna_priv;
+
+ if (copy_from_user(&score_args, argptr, sizeof(score_args))) {
+ dev_err(gna_dev(gna_priv), "could not copy score ioctl config from user\n");
+ return -EFAULT;
+ }
+
+ ret = gna_validate_score_config(&score_args.in.config, file_priv);
+ if (ret) {
+ dev_err(gna_dev(gna_priv), "request not valid\n");
+ return ret;
+ }
+
+ ret = gna_enqueue_request(&score_args.in.config, file_priv, &request_id);
+ if (ret) {
+ dev_err(gna_dev(gna_priv), "could not enqueue score request %d\n", ret);
+ return ret;
+ }
+
+ score_args.out.request_id = request_id;
+ if (copy_to_user(argptr, &score_args, sizeof(score_args))) {
+ dev_err(gna_dev(gna_priv), "could not copy score ioctl status to user\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+static int gna_ioctl_wait(struct file *f, void __user *argptr)
+{
+ struct gna_file_private *file_priv;
+ struct gna_request *score_request;
+ struct gna_private *gna_priv;
+ union gna_wait wait_data;
+ u64 request_id;
+ u32 timeout;
+ int ret;
+
+ file_priv = (struct gna_file_private *)f->private_data;
+ gna_priv = file_priv->gna_priv;
+
+ ret = 0;
+
+ if (copy_from_user(&wait_data, argptr, sizeof(wait_data))) {
+ dev_err(gna_dev(gna_priv), "could not copy wait ioctl data from user\n");
+ return -EFAULT;
+ }
+
+ request_id = wait_data.in.request_id;
+ timeout = wait_data.in.timeout;
+
+ score_request = gna_find_request_by_id(request_id, gna_priv);
+
+ if (!score_request) {
+ dev_err(gna_dev(gna_priv), "could not find request with id: %llu\n", request_id);
+ return -EINVAL;
+ }
+
+ if (score_request->fd != f) {
+ kref_put(&score_request->refcount, gna_request_release);
+ return -EINVAL;
+ }
+
+ dev_dbg(gna_dev(gna_priv), "waiting for request %llu for timeout %u\n", request_id, timeout);
+
+ ret = wait_event_interruptible_timeout(score_request->waitq, score_request->state == DONE,
+ msecs_to_jiffies(timeout));
+ if (ret == 0 || ret == -ERESTARTSYS) {
+ dev_err(gna_dev(gna_priv), "request timed out, id: %llu\n", request_id);
+ kref_put(&score_request->refcount, gna_request_release);
+ return -EBUSY;
+ }
+
+ dev_dbg(gna_dev(gna_priv), "request wait completed with %d req id %llu\n", ret, request_id);
+
+ wait_data.out.hw_perf = score_request->hw_perf;
+ wait_data.out.drv_perf = score_request->drv_perf;
+ wait_data.out.hw_status = score_request->hw_status;
+
+ ret = score_request->status;
+
+ dev_dbg(gna_dev(gna_priv), "request status %d, hw status: %#x\n",
+ score_request->status, score_request->hw_status);
+ kref_put(&score_request->refcount, gna_request_release);
+
+ gna_delete_request_by_id(request_id, gna_priv);
+
+ if (copy_to_user(argptr, &wait_data, sizeof(wait_data))) {
+ dev_err(gna_dev(gna_priv), "could not copy wait ioctl status to user\n");
+ ret = -EFAULT;
+ }
+
+ return ret;
+}
+
+static int gna_ioctl_map(struct gna_file_private *file_priv, void __user *argptr)
+{
+ struct gna_private *gna_priv;
+ union gna_memory_map gna_mem;
+ int ret;
+
+ gna_priv = file_priv->gna_priv;
+
+ if (copy_from_user(&gna_mem, argptr, sizeof(gna_mem))) {
+ dev_err(gna_dev(gna_priv), "could not copy userptr ioctl data from user\n");
+ return -EFAULT;
+ }
+
+ ret = gna_map_memory(file_priv, &gna_mem);
+ if (ret)
+ return ret;
+
+ if (copy_to_user(argptr, &gna_mem, sizeof(gna_mem))) {
+ dev_err(gna_dev(gna_priv), "could not copy userptr ioctl status to user\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+static int gna_ioctl_free(struct gna_file_private *file_priv, unsigned long arg)
+{
+ struct gna_memory_object *iter_mo, *temp_mo;
+ struct gna_memory_object *mo;
+ struct gna_private *gna_priv;
+
+ u64 memory_id = arg;
+
+ gna_priv = file_priv->gna_priv;
+
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+
+ if (!mo) {
+ dev_warn(gna_dev(gna_priv), "memory object not found\n");
+ return -EINVAL;
+ }
+
+ queue_work(gna_priv->request_wq, &mo->work);
+ if (wait_event_interruptible(mo->waitq, true)) {
+ dev_dbg(gna_dev(gna_priv), "wait interrupted\n");
+ return -ETIME;
+ }
+
+ mutex_lock(&file_priv->memlist_lock);
+ list_for_each_entry_safe(iter_mo, temp_mo, &file_priv->memory_list, file_mem_list) {
+ if (iter_mo->memory_id == memory_id) {
+ list_del(&iter_mo->file_mem_list);
+ break;
+ }
+ }
+ mutex_unlock(&file_priv->memlist_lock);
+
+ gna_memory_free(gna_priv, mo);
+
+ return 0;
+}
+
+static int gna_ioctl_getparam(struct gna_private *gna_priv, void __user *argptr)
+{
+ union gna_parameter param;
+ int ret;
+
+ if (copy_from_user(&param, argptr, sizeof(param))) {
+ dev_err(gna_dev(gna_priv), "could not copy getparam ioctl data from user\n");
+ return -EFAULT;
+ }
+
+ ret = gna_getparam(gna_priv, &param);
+ if (ret)
+ return ret;
+
+ if (copy_to_user(argptr, &param, sizeof(param))) {
+ dev_err(gna_dev(gna_priv), "could not copy getparam ioctl status to user\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+long gna_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+{
+ struct gna_file_private *file_priv;
+ struct gna_private *gna_priv;
+ void __user *argptr;
+ int ret = 0;
+
+ argptr = (void __user *)arg;
+
+ file_priv = (struct gna_file_private *)f->private_data;
+ // TODO following is always false?
+ if (!file_priv)
+ return -ENODEV;
+
+ gna_priv = file_priv->gna_priv;
+ if (!gna_priv)
+ return -ENODEV;
+
+ switch (cmd) {
+ case GNA_GET_PARAMETER:
+ ret = gna_ioctl_getparam(gna_priv, argptr);
+ break;
+
+ case GNA_MAP_MEMORY:
+ ret = gna_ioctl_map(file_priv, argptr);
+ break;
+
+ case GNA_UNMAP_MEMORY:
+ ret = gna_ioctl_free(file_priv, arg);
+ break;
+
+ case GNA_COMPUTE:
+ ret = gna_ioctl_score(file_priv, argptr);
+ break;
+
+ case GNA_WAIT:
+ ret = gna_ioctl_wait(f, argptr);
+ break;
+
+ default:
+ dev_warn(gna_dev(gna_priv), "wrong ioctl %#x\n", cmd);
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
diff --git a/drivers/misc/intel/gna/ioctl.h b/drivers/misc/intel/gna/ioctl.h
new file mode 100644
index 000000000000..c10dba7afa18
--- /dev/null
+++ b/drivers/misc/intel/gna/ioctl.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_IOCTL_H__
+#define __GNA_IOCTL_H__
+
+struct file;
+
+long gna_ioctl(struct file *f, unsigned int cmd, unsigned long arg);
+
+#endif // __GNA_IOCTL_H__
diff --git a/include/uapi/misc/intel/gna.h b/include/uapi/misc/intel/gna.h
index b531beb35bd9..0b55deb7a0ea 100644
--- a/include/uapi/misc/intel/gna.h
+++ b/include/uapi/misc/intel/gna.h
@@ -8,12 +8,19 @@
extern "C" {
#endif

+#include <linux/const.h>
+#include <linux/ioctl.h>
#include <linux/types.h>

/* Operation modes */
#define GNA_MODE_GMM 0
#define GNA_MODE_XNN 1

+#define GNA_PARAM_DEVICE_ID 1
+#define GNA_PARAM_RECOVERY_TIMEOUT 2
+#define GNA_PARAM_DEVICE_TYPE 3
+#define GNA_PARAM_INPUT_BUFFER_S 4
+
#define GNA_STS_SCORE_COMPLETED _BITUL(0)
#define GNA_STS_STATISTICS_VALID _BITUL(3)
#define GNA_STS_PCI_MMU_ERR _BITUL(4)
@@ -30,6 +37,11 @@ extern "C" {
GNA_STS_PARAM_OOR |\
GNA_STS_VA_OOR)

+#define GNA_DEV_TYPE_0_9 0x09
+#define GNA_DEV_TYPE_1_0 0x10
+#define GNA_DEV_TYPE_2_0 0x20
+#define GNA_DEV_TYPE_3_0 0x30
+
/*
* Structure describes part of memory to be overwritten before starting GNA
*/
@@ -81,6 +93,16 @@ struct gna_compute_cfg {
__u8 pad[5];
};

+union gna_parameter {
+ struct {
+ __u64 id;
+ } in;
+
+ struct {
+ __u64 value;
+ } out;
+};
+
union gna_memory_map {
struct {
__u64 address;
@@ -93,6 +115,37 @@ union gna_memory_map {
} out;
};

+union gna_compute {
+ struct {
+ struct gna_compute_cfg config;
+ } in;
+
+ struct {
+ __u64 request_id;
+ } out;
+};
+
+union gna_wait {
+ struct {
+ __u64 request_id;
+ __u32 timeout;
+ __u32 pad;
+ } in;
+
+ struct {
+ __u32 hw_status;
+ __u32 pad;
+ struct gna_drv_perf drv_perf;
+ struct gna_hw_perf hw_perf;
+ } out;
+};
+
+#define GNA_GET_PARAMETER _IOWR('C', 0x01, union gna_parameter)
+#define GNA_MAP_MEMORY _IOWR('C', 0x02, union gna_memory_map)
+#define GNA_UNMAP_MEMORY _IOWR('C', 0x03, __u64)
+#define GNA_COMPUTE _IOWR('C', 0x04, union gna_compute)
+#define GNA_WAIT _IOWR('C', 0x05, union gna_wait)
+
#if defined(__cplusplus)
}
#endif
--
2.28.0


2021-05-13 11:51:17

by Maciej Kwapulinski

Subject: [PATCH v3 12/14] intel_gna: add a 'misc' device

The new 'misc' device is the node for applications in user space to
interact with the driver.

Signed-off-by: Maciej Kwapulinski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
---
drivers/misc/intel/gna/device.c | 52 +++++++++++++++++++++++++++++++--
drivers/misc/intel/gna/device.h | 11 +++----
2 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index 0e31b8c6bb70..1e6345a8325b 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -20,6 +20,18 @@ module_param(recovery_timeout, int, 0644);
MODULE_PARM_DESC(recovery_timeout, "Recovery timeout in seconds");
#endif

+struct file;
+
+static int gna_open(struct inode *inode, struct file *f)
+{
+ return -EPERM;
+}
+
+static const struct file_operations gna_file_ops = {
+ .owner = THIS_MODULE,
+ .open = gna_open,
+};
+
static void gna_devm_idr_destroy(void *data)
{
struct idr *idr = data;
@@ -27,6 +39,36 @@ static void gna_devm_idr_destroy(void *data)
idr_destroy(idr);
}

+static void gna_devm_deregister_misc_dev(void *data)
+{
+ struct miscdevice *misc = data;
+
+ misc_deregister(misc);
+}
+
+static int gna_devm_register_misc_dev(struct device *parent, struct miscdevice *misc)
+{
+ int ret;
+
+ ret = misc_register(misc);
+ if (ret) {
+ dev_err(parent, "misc device %s registering failed. errcode: %d\n",
+ misc->name, ret);
+ gna_devm_deregister_misc_dev(misc);
+ } else {
+ dev_dbg(parent, "device: %s registered\n",
+ misc->name);
+ }
+
+ ret = devm_add_action(parent, gna_devm_deregister_misc_dev, misc);
+ if (ret) {
+ dev_err(parent, "could not add devm action for %s misc deregister\n", misc->name);
+ gna_devm_deregister_misc_dev(misc);
+ }
+
+ return ret;
+}
+
static irqreturn_t gna_interrupt(int irq, void *priv)
{
struct gna_private *gna_priv;
@@ -81,13 +123,13 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
gna_priv->recovery_timeout_jiffies = msecs_to_jiffies(recovery_timeout * 1000);
gna_priv->iobase = iobase;
gna_priv->info = *dev_info;
- gna_priv->parent = parent;
+ gna_priv->misc.parent = parent;

dev_misc_name = devm_kasprintf(parent, GFP_KERNEL, "%s%d", GNA_DV_NAME, gna_priv->index);
if (!dev_misc_name)
return -ENOMEM;

- gna_priv->name = dev_misc_name;
+ gna_priv->misc.name = dev_misc_name;

if (!(sizeof(dma_addr_t) > 4) ||
dma_set_mask(parent, DMA_BIT_MASK(64))) {
@@ -145,7 +187,11 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
return ret;
}

- return 0;
+ gna_priv->misc.minor = MISC_DYNAMIC_MINOR;
+ gna_priv->misc.fops = &gna_file_ops;
+ gna_priv->misc.mode = 0666;
+
+ return gna_devm_register_misc_dev(parent, &gna_priv->misc);
}

static u32 gna_device_type_by_hwid(u32 hwid)
diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index 75784882f57c..ea3caf679c77 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -8,6 +8,7 @@
#include <linux/idr.h>
#include <linux/io.h>
#include <linux/list.h>
+#include <linux/miscdevice.h>
#include <linux/mutex.h>
#include <linux/types.h>

@@ -41,8 +42,8 @@ struct gna_private {

int recovery_timeout_jiffies;

- const char *name;
- struct device *parent;
+ /* gna misc-device */
+ struct miscdevice misc;

/* hardware status set by interrupt handler */
u32 hw_status;
@@ -86,17 +87,17 @@ static inline void gna_reg_write(struct gna_private *gna_priv, u32 reg, u32 val)

static inline struct device *gna_parent(struct gna_private *gna_priv)
{
- return gna_priv->parent;
+ return gna_priv->misc.parent;
}

static inline const char *gna_name(struct gna_private *gna_priv)
{
- return gna_priv->name;
+ return gna_priv->misc.name;
}

static inline struct device *gna_dev(struct gna_private *gna_priv)
{
- return gna_priv->parent;
+ return gna_priv->misc.this_device;
}

#endif /* __GNA_DEVICE_H__ */
--
2.28.0


2021-05-13 11:52:25

by Maciej Kwapulinski

Subject: [PATCH v3 13/14] intel_gna: add file operations to a 'misc' device

From: Tomasz Jankowski <[email protected]>

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/device.c | 60 +++++++++++++++++++++++++++++++--
1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index 1e6345a8325b..c882055de8cf 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -4,7 +4,9 @@
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/interrupt.h>
+#include <linux/fs.h>
#include <linux/module.h>
+#include <linux/slab.h>

#include <uapi/misc/intel/gna.h>

@@ -20,16 +22,68 @@ module_param(recovery_timeout, int, 0644);
MODULE_PARM_DESC(recovery_timeout, "Recovery timeout in seconds");
#endif

-struct file;
-
static int gna_open(struct inode *inode, struct file *f)
{
- return -EPERM;
+ struct gna_file_private *file_priv;
+ struct gna_private *gna_priv;
+
+ gna_priv = container_of(f->private_data, struct gna_private, misc);
+
+ file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
+ if (!file_priv)
+ return -ENOMEM;
+
+ file_priv->fd = f;
+ file_priv->gna_priv = gna_priv;
+
+ mutex_init(&file_priv->memlist_lock);
+ INIT_LIST_HEAD(&file_priv->memory_list);
+
+ INIT_LIST_HEAD(&file_priv->flist);
+
+ mutex_lock(&gna_priv->flist_lock);
+ list_add_tail(&file_priv->flist, &gna_priv->file_list);
+ mutex_unlock(&gna_priv->flist_lock);
+
+ f->private_data = file_priv;
+
+ return 0;
+}
+
+static int gna_release(struct inode *inode, struct file *f)
+{
+ struct gna_memory_object *iter_mo, *temp_mo;
+ struct gna_file_private *file_priv;
+ struct gna_private *gna_priv;
+
+ /* free all memory objects created by that file */
+ file_priv = (struct gna_file_private *)f->private_data;
+ gna_priv = file_priv->gna_priv;
+
+ mutex_lock(&file_priv->memlist_lock);
+ list_for_each_entry_safe(iter_mo, temp_mo, &file_priv->memory_list, file_mem_list) {
+ queue_work(gna_priv->request_wq, &iter_mo->work);
+ wait_event(iter_mo->waitq, true);
+ gna_memory_free(gna_priv, iter_mo);
+ }
+ mutex_unlock(&file_priv->memlist_lock);
+
+ gna_delete_file_requests(f, gna_priv);
+
+ mutex_lock(&gna_priv->flist_lock);
+ list_del_init(&file_priv->flist);
+ mutex_unlock(&gna_priv->flist_lock);
+ kfree(file_priv);
+ f->private_data = NULL;
+
+ return 0;
}

static const struct file_operations gna_file_ops = {
.owner = THIS_MODULE,
.open = gna_open,
+ .release = gna_release,
+ .unlocked_ioctl = gna_ioctl,
};

static void gna_devm_idr_destroy(void *data)
--
2.28.0


2021-05-13 17:17:57

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 12/14] intel_gna: add a 'misc' device

On Thu, May 13, 2021 at 07:06:18PM +0200, Maciej Kwapulinski wrote:
>
> Greg Kroah-Hartman <[email protected]> writes:
>
> > On Thu, May 13, 2021 at 01:00:38PM +0200, Maciej Kwapulinski wrote:
> >> The new 'misc' device is the node for applications in user space to
> >> interact with the driver.
> >>
> >> Signed-off-by: Maciej Kwapulinski <[email protected]>
> >> Tested-by: Savo Novakovic <[email protected]>
> >> ---
> >> drivers/misc/intel/gna/device.c | 52 +++++++++++++++++++++++++++++++--
> >> drivers/misc/intel/gna/device.h | 11 +++----
> >> 2 files changed, 55 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
> >> index 0e31b8c6bb70..1e6345a8325b 100644
> >> --- a/drivers/misc/intel/gna/device.c
> >> +++ b/drivers/misc/intel/gna/device.c
> >> @@ -20,6 +20,18 @@ module_param(recovery_timeout, int, 0644);
> >> MODULE_PARM_DESC(recovery_timeout, "Recovery timeout in seconds");
> >> #endif
> >>
> >> +struct file;
> >> +
> >> +static int gna_open(struct inode *inode, struct file *f)
> >> +{
> >> + return -EPERM;
> >> +}
> >
> > That sucks, why have an open that does nothing but fail?
>
> next patch provides complete implementation of gna_open(), here it's
> just a protection if someone would incidentally run gna in the middle of patch series

Then don't provide an open at all, and it will be fine :)

>
> >
> >> +
> >> +static const struct file_operations gna_file_ops = {
> >> + .owner = THIS_MODULE,
> >> + .open = gna_open,
> >> +};
> >> +
> >> static void gna_devm_idr_destroy(void *data)
> >> {
> >> struct idr *idr = data;
> >> @@ -27,6 +39,36 @@ static void gna_devm_idr_destroy(void *data)
> >> idr_destroy(idr);
> >> }
> >>
> >> +static void gna_devm_deregister_misc_dev(void *data)
> >
> > Why is this a void *?
>
> it goes to devm_add_action() api.

Ah. That's not obvious :(




2021-05-13 17:48:51

by Maciej Kwapulinski

Subject: [PATCH v3 02/14] intel_gna: add component of hardware operation

From: Tomasz Jankowski <[email protected]>

Add definitions and utilities to interact with the hardware
device.

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/Kbuild | 2 +-
drivers/misc/intel/gna/device.h | 16 +++++
drivers/misc/intel/gna/hw.c | 118 ++++++++++++++++++++++++++++++++
drivers/misc/intel/gna/hw.h | 50 ++++++++++++++
include/uapi/misc/intel/gna.h | 47 +++++++++++++
5 files changed, 232 insertions(+), 1 deletion(-)
create mode 100644 drivers/misc/intel/gna/hw.c
create mode 100644 include/uapi/misc/intel/gna.h

diff --git a/drivers/misc/intel/gna/Kbuild b/drivers/misc/intel/gna/Kbuild
index 02d758132d32..69e20c8c22bd 100644
--- a/drivers/misc/intel/gna/Kbuild
+++ b/drivers/misc/intel/gna/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only

-intel_gna-y := device.o pci.o
+intel_gna-y := device.o hw.o pci.o

obj-$(CONFIG_INTEL_GNA) += intel_gna.o
diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index e12bac64fbf4..a188f482a273 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -4,6 +4,7 @@
#ifndef __GNA_DEVICE_H__
#define __GNA_DEVICE_H__

+#include <linux/io.h>
#include <linux/types.h>

#include "hw.h"
@@ -27,4 +28,19 @@ struct gna_private {

int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase);

+static inline u32 gna_reg_read(struct gna_private *gna_priv, u32 reg)
+{
+ return readl(gna_priv->iobase + reg);
+}
+
+static inline void gna_reg_write(struct gna_private *gna_priv, u32 reg, u32 val)
+{
+ writel(val, gna_priv->iobase + reg);
+}
+
+static inline struct device *gna_dev(struct gna_private *gna_priv)
+{
+ return gna_priv->parent;
+}
+
#endif /* __GNA_DEVICE_H__ */
diff --git a/drivers/misc/intel/gna/hw.c b/drivers/misc/intel/gna/hw.c
new file mode 100644
index 000000000000..5288d554c77c
--- /dev/null
+++ b/drivers/misc/intel/gna/hw.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/bitfield.h>
+#include <linux/device.h>
+#include <linux/iopoll.h>
+
+#include <uapi/misc/intel/gna.h>
+
+#include "device.h"
+#include "hw.h"
+
+int gna_parse_hw_status(struct gna_private *gna_priv, u32 hw_status)
+{
+ if (hw_status & GNA_ERROR) {
+ dev_dbg(gna_dev(gna_priv), "GNA completed with errors: %#x\n", hw_status);
+ return -EIO;
+ }
+
+ if (hw_status & GNA_STS_SCORE_COMPLETED) {
+ dev_dbg(gna_dev(gna_priv), "GNA completed successfully: %#x\n", hw_status);
+ return 0;
+ }
+
+ dev_err(gna_dev(gna_priv), "GNA not completed, status: %#x\n", hw_status);
+ return -ENODATA;
+}
+
+void gna_print_error_status(struct gna_private *gna_priv, u32 hw_status)
+{
+ if (hw_status & GNA_STS_PARAM_OOR)
+ dev_dbg(gna_dev(gna_priv), "GNA error: Param Out Range Error\n");
+
+ if (hw_status & GNA_STS_VA_OOR)
+ dev_dbg(gna_dev(gna_priv), "GNA error: VA Out of Range Error\n");
+
+ if (hw_status & GNA_STS_PCI_MMU_ERR)
+ dev_dbg(gna_dev(gna_priv), "GNA error: PCI MMU Error\n");
+
+ if (hw_status & GNA_STS_PCI_DMA_ERR)
+ dev_dbg(gna_dev(gna_priv), "GNA error: PCI DMA Error\n");
+
+ if (hw_status & GNA_STS_PCI_UNEXCOMPL_ERR)
+ dev_dbg(gna_dev(gna_priv), "GNA error: PCI Unexpected Completion Error\n");
+
+ if (hw_status & GNA_STS_SATURATE)
+ dev_dbg(gna_dev(gna_priv), "GNA error: Saturation Reached!\n");
+}
+
+bool gna_hw_perf_enabled(struct gna_private *gna_priv)
+{
+ u32 ctrl = gna_reg_read(gna_priv, GNA_MMIO_CTRL);
+
+ return !!FIELD_GET(GNA_CTRL_COMP_STATS_EN, ctrl);
+}
+
+void gna_start_scoring(struct gna_private *gna_priv,
+ struct gna_compute_cfg *compute_cfg)
+{
+ u32 ctrl = gna_reg_read(gna_priv, GNA_MMIO_CTRL);
+
+ ctrl |= GNA_CTRL_START_ACCEL | GNA_CTRL_COMP_INT_EN | GNA_CTRL_ERR_INT_EN;
+
+ ctrl &= ~GNA_CTRL_COMP_STATS_EN;
+ ctrl |= FIELD_PREP(GNA_CTRL_COMP_STATS_EN,
+ compute_cfg->hw_perf_encoding & FIELD_MAX(GNA_CTRL_COMP_STATS_EN));
+
+ ctrl &= ~GNA_CTRL_ACTIVE_LIST_EN;
+ ctrl |= FIELD_PREP(GNA_CTRL_ACTIVE_LIST_EN,
+ compute_cfg->active_list_on & FIELD_MAX(GNA_CTRL_ACTIVE_LIST_EN));
+
+ ctrl &= ~GNA_CTRL_OP_MODE;
+ ctrl |= FIELD_PREP(GNA_CTRL_OP_MODE,
+ compute_cfg->gna_mode & FIELD_MAX(GNA_CTRL_OP_MODE));
+
+ gna_reg_write(gna_priv, GNA_MMIO_CTRL, ctrl);
+
+ dev_dbg(gna_dev(gna_priv), "scoring started...\n");
+}
+
+static void gna_clear_saturation(struct gna_private *gna_priv)
+{
+ u32 val;
+
+ val = gna_reg_read(gna_priv, GNA_MMIO_STS);
+ if (val & GNA_STS_SATURATE) {
+ dev_dbg(gna_dev(gna_priv), "saturation reached\n");
+ dev_dbg(gna_dev(gna_priv), "status: %#x\n", val);
+
+ val = val & GNA_STS_SATURATE;
+ gna_reg_write(gna_priv, GNA_MMIO_STS, val);
+ }
+}
+
+void gna_abort_hw(struct gna_private *gna_priv)
+{
+ u32 val;
+ int ret;
+
+ /* saturation bit in the GNA status register needs
+ * to be explicitly cleared.
+ */
+ gna_clear_saturation(gna_priv);
+
+ val = gna_reg_read(gna_priv, GNA_MMIO_STS);
+ dev_dbg(gna_dev(gna_priv), "status before abort: %#x\n", val);
+
+ val = gna_reg_read(gna_priv, GNA_MMIO_CTRL);
+ val |= GNA_CTRL_ABORT_CLR_ACCEL;
+ gna_reg_write(gna_priv, GNA_MMIO_CTRL, val);
+
+ ret = readl_poll_timeout(gna_priv->iobase + GNA_MMIO_STS, val,
+ !(val & 0x1),
+ 0, 1000);
+
+ if (ret)
+ dev_err(gna_dev(gna_priv), "abort did not complete\n");
+}
diff --git a/drivers/misc/intel/gna/hw.h b/drivers/misc/intel/gna/hw.h
index 8ef8dc182b92..f9c0566cbe60 100644
--- a/drivers/misc/intel/gna/hw.h
+++ b/drivers/misc/intel/gna/hw.h
@@ -4,14 +4,64 @@
#ifndef __GNA_HW_H__
#define __GNA_HW_H__

+#include <linux/bits.h>
#include <linux/mm_types.h>

+struct gna_compute_cfg;
+struct gna_private;
+
+/* GNA MMIO registers */
+#define GNA_MMIO_STS 0x80
+#define GNA_MMIO_CTRL 0x84
+#define GNA_MMIO_PTC 0x8C
+#define GNA_MMIO_PSC 0x90
+#define GNA_MMIO_DESBASE 0xB0
+#define GNA_MMIO_IBUFFS 0xB4
+
+#define GNA_PT_ENTRY_SIZE 4
+/* there are up to 1024 32-bit pointers in one page in Page Table (L1) */
+#define GNA_PT_LENGTH (PAGE_SIZE / GNA_PT_ENTRY_SIZE)
+
+#define GNA_PGDIRN_LEN 64
+#define GNA_PGDIR_ENTRIES 1024 /* 32-bit page addresses */
+#define GNA_PGDIR_INVALID 1
+
+#define GNA_CTRL_START_ACCEL BIT(0)
+#define GNA_CTRL_ACTIVE_LIST_EN BIT(1)
+#define GNA_CTRL_ABORT_CLR_ACCEL BIT(2)
+#define GNA_CTRL_OP_MODE GENMASK(6, 5)
+#define GNA_CTRL_COMP_INT_EN BIT(8)
+#define GNA_CTRL_ERR_INT_EN BIT(10)
+#define GNA_CTRL_COMP_STATS_EN GENMASK(15, 12)
+
+struct gna_mmu_info {
+ u32 vamax_size;
+ u32 rsvd_size;
+ u32 pd_size;
+};
+
+struct gna_desc_info {
+ u32 rsvd_size;
+ u32 cfg_size;
+ u32 desc_size;
+ struct gna_mmu_info mmu_info;
+};
+
struct gna_dev_info {
u32 hwid;
u32 num_pagetables;
u32 num_page_entries;
u32 max_layer_count;
u64 max_hw_mem;
+
+ struct gna_desc_info desc_info;
};

+void gna_abort_hw(struct gna_private *gna_priv);
+bool gna_hw_perf_enabled(struct gna_private *gna_priv);
+int gna_parse_hw_status(struct gna_private *gna_priv, u32 hw_status);
+void gna_print_error_status(struct gna_private *gna_priv, u32 hw_status);
+void gna_start_scoring(struct gna_private *gna_priv,
+ struct gna_compute_cfg *compute_cfg);
+
#endif // __GNA_HW_H__
diff --git a/include/uapi/misc/intel/gna.h b/include/uapi/misc/intel/gna.h
new file mode 100644
index 000000000000..16a44efd0f76
--- /dev/null
+++ b/include/uapi/misc/intel/gna.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef _UAPI_GNA_H_
+#define _UAPI_GNA_H_
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#include <linux/types.h>
+
+#define GNA_STS_SCORE_COMPLETED _BITUL(0)
+#define GNA_STS_STATISTICS_VALID _BITUL(3)
+#define GNA_STS_PCI_MMU_ERR _BITUL(4)
+#define GNA_STS_PCI_DMA_ERR _BITUL(5)
+#define GNA_STS_PCI_UNEXCOMPL_ERR _BITUL(6)
+#define GNA_STS_VA_OOR _BITUL(7)
+#define GNA_STS_PARAM_OOR _BITUL(8)
+#define GNA_STS_SATURATE _BITUL(17)
+
+#define GNA_ERROR \
+ (GNA_STS_PCI_DMA_ERR |\
+ GNA_STS_PCI_MMU_ERR |\
+ GNA_STS_PCI_UNEXCOMPL_ERR |\
+ GNA_STS_PARAM_OOR |\
+ GNA_STS_VA_OOR)
+
+struct gna_compute_cfg {
+ __u32 layer_base;
+ __u32 layer_count;
+
+ /* List of GNA memory buffers */
+ __u64 buffers_ptr;
+ __u64 buffer_count;
+
+ __u8 active_list_on;
+ __u8 gna_mode;
+ __u8 hw_perf_encoding;
+ __u8 pad[5];
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* _UAPI_GNA_H_ */
--
2.28.0


2021-05-13 17:48:56

by Maciej Kwapulinski

Subject: [PATCH v3 03/14] intel_gna: read hardware info in the driver

From: Tomasz Jankowski <[email protected]>

Get the hardware information from the MMIO_IBUFFS register.

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/device.c | 4 ++++
drivers/misc/intel/gna/device.h | 1 +
drivers/misc/intel/gna/hw.h | 4 ++++
3 files changed, 9 insertions(+)

diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index 8722935d26bf..45b4b6df64ef 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -21,6 +21,7 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
static atomic_t dev_last_idx = ATOMIC_INIT(-1);
struct gna_private *gna_priv;
const char *dev_misc_name;
+ u32 bld_reg;
int ret;

gna_priv = devm_kzalloc(parent, sizeof(*gna_priv), GFP_KERNEL);
@@ -48,6 +49,9 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
}
}

+ bld_reg = gna_reg_read(gna_priv, GNA_MMIO_IBUFFS);
+ gna_priv->hw_info.in_buf_s = bld_reg & GENMASK(7, 0);
+
return 0;
}

diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index a188f482a273..057dec0e1983 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -24,6 +24,7 @@ struct gna_private {
/* device related resources */
void __iomem *iobase;
struct gna_dev_info info;
+ struct gna_hw_info hw_info;
};

int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase);
diff --git a/drivers/misc/intel/gna/hw.h b/drivers/misc/intel/gna/hw.h
index f9c0566cbe60..f61792f9dd52 100644
--- a/drivers/misc/intel/gna/hw.h
+++ b/drivers/misc/intel/gna/hw.h
@@ -47,6 +47,10 @@ struct gna_desc_info {
struct gna_mmu_info mmu_info;
};

+struct gna_hw_info {
+ u8 in_buf_s;
+};
+
struct gna_dev_info {
u32 hwid;
u32 num_pagetables;
--
2.28.0


2021-05-13 17:51:42

by Maciej Kwapulinski

Subject: [PATCH v3 14/14] intel_gna: add power management

Implement power management in the intel_gna driver.

Signed-off-by: Maciej Kwapulinski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Tomasz Jankowski <[email protected]>
Signed-off-by: Tomasz Jankowski <[email protected]>
---
drivers/misc/intel/gna/device.c | 55 +++++++++++++++++++++++++++++++-
drivers/misc/intel/gna/device.h | 3 ++
drivers/misc/intel/gna/hw.h | 1 +
drivers/misc/intel/gna/pci.c | 3 ++
drivers/misc/intel/gna/request.c | 15 +++++++++
5 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index c882055de8cf..d7b0697bdd7c 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -22,6 +22,30 @@ module_param(recovery_timeout, int, 0644);
MODULE_PARM_DESC(recovery_timeout, "Recovery timeout in seconds");
#endif

+static int __maybe_unused gna_runtime_suspend(struct device *dev)
+{
+ struct gna_private *gna_priv = dev_get_drvdata(dev);
+ u32 val = gna_reg_read(gna_priv, GNA_MMIO_D0I3C);
+
+ dev_dbg(dev, "%s D0I3, reg %.8x\n", __func__, val);
+
+ return 0;
+}
+
+static int __maybe_unused gna_runtime_resume(struct device *dev)
+{
+ struct gna_private *gna_priv = dev_get_drvdata(dev);
+ u32 val = gna_reg_read(gna_priv, GNA_MMIO_D0I3C);
+
+ dev_dbg(dev, "%s D0I3, reg %.8x\n", __func__, val);
+
+ return 0;
+}
+
+const struct dev_pm_ops __maybe_unused gna_pm = {
+ SET_RUNTIME_PM_OPS(gna_runtime_suspend, gna_runtime_resume, NULL)
+};
+
static int gna_open(struct inode *inode, struct file *f)
{
struct gna_file_private *file_priv;
@@ -123,6 +147,22 @@ static int gna_devm_register_misc_dev(struct device *parent, struct miscdevice *
return ret;
}

+static void gna_pm_init(struct device *dev)
+{
+ pm_runtime_set_autosuspend_delay(dev, 200);
+ pm_runtime_use_autosuspend(dev);
+ pm_runtime_mark_last_busy(dev);
+ pm_runtime_allow(dev);
+ pm_runtime_put_noidle(dev);
+}
+
+static void gna_pm_remove(void *data)
+{
+ struct device *dev = data;
+
+ pm_runtime_get_noresume(dev);
+}
+
static irqreturn_t gna_interrupt(int irq, void *priv)
{
struct gna_private *gna_priv;
@@ -245,7 +285,20 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
gna_priv->misc.fops = &gna_file_ops;
gna_priv->misc.mode = 0666;

- return gna_devm_register_misc_dev(parent, &gna_priv->misc);
+ ret = gna_devm_register_misc_dev(parent, &gna_priv->misc);
+ if (ret)
+ return ret;
+
+ dev_set_drvdata(parent, gna_priv);
+
+ gna_pm_init(parent);
+ ret = devm_add_action(parent, gna_pm_remove, parent);
+ if (ret) {
+ dev_err(parent, "could not add devm action for pm\n");
+ return ret;
+ }
+
+ return 0;
}

static u32 gna_device_type_by_hwid(u32 hwid)
diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index ea3caf679c77..a0e28d05ebfa 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -10,6 +10,7 @@
#include <linux/list.h>
#include <linux/miscdevice.h>
#include <linux/mutex.h>
+#include <linux/pm_runtime.h>
#include <linux/types.h>

#include "hw.h"
@@ -75,6 +76,8 @@ struct gna_private {
int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase, int irq);
int gna_getparam(struct gna_private *gna_priv, union gna_parameter *param);

+extern const struct dev_pm_ops __maybe_unused gna_pm;
+
static inline u32 gna_reg_read(struct gna_private *gna_priv, u32 reg)
{
return readl(gna_priv->iobase + reg);
diff --git a/drivers/misc/intel/gna/hw.h b/drivers/misc/intel/gna/hw.h
index 2a6890fb748e..c0a8120b84d9 100644
--- a/drivers/misc/intel/gna/hw.h
+++ b/drivers/misc/intel/gna/hw.h
@@ -49,6 +49,7 @@ struct gna_private;
#define GNA_MMIO_CTRL 0x84
#define GNA_MMIO_PTC 0x8C
#define GNA_MMIO_PSC 0x90
+#define GNA_MMIO_D0I3C 0xA8
#define GNA_MMIO_DESBASE 0xB0
#define GNA_MMIO_IBUFFS 0xB4

diff --git a/drivers/misc/intel/gna/pci.c b/drivers/misc/intel/gna/pci.c
index a31f0142a4f2..4aad4cf702b7 100644
--- a/drivers/misc/intel/gna/pci.c
+++ b/drivers/misc/intel/gna/pci.c
@@ -139,6 +139,9 @@ static struct pci_driver gna_pci_driver = {
.name = GNA_DV_NAME,
.id_table = gna_pci_ids,
.probe = gna_pci_probe,
+ .driver = {
+ .pm = &gna_pm,
+ },
};

module_pci_driver(gna_pci_driver);
diff --git a/drivers/misc/intel/gna/request.c b/drivers/misc/intel/gna/request.c
index eabbab8924be..e923f0d2651d 100644
--- a/drivers/misc/intel/gna/request.c
+++ b/drivers/misc/intel/gna/request.c
@@ -6,6 +6,7 @@
#include <linux/idr.h>
#include <linux/mm.h>
#include <linux/mutex.h>
+#include <linux/pm_runtime.h>
#include <linux/uaccess.h>
#include <linux/slab.h>

@@ -65,6 +66,14 @@ static void gna_request_process(struct work_struct *work)

score_request->drv_perf.pre_processing = ktime_get_ns();

+ ret = pm_runtime_get_sync(gna_parent(gna_priv));
+ if (ret < 0 && ret != -EACCES) {
+ dev_warn(gna_dev(gna_priv), "pm_runtime_get_sync() failed: %d\n", ret);
+ score_request->status = -ENODEV;
+ pm_runtime_put_noidle(gna_parent(gna_priv));
+ goto end;
+ }
+
+ /* Set busy flag before kicking off HW. The isr will clear it and wake us up. There is
* no difference if isr is missed in a timeout situation of the last request. We just
* always set it busy and let the wait_event_timeout check the reset.
@@ -75,6 +84,8 @@ static void gna_request_process(struct work_struct *work)

ret = gna_score(score_request);
if (ret) {
+ if (pm_runtime_put(gna_parent(gna_priv)) < 0)
+ dev_warn(gna_dev(gna_priv), "pm_runtime_put() failed: %d\n", ret);
score_request->status = ret;
goto end;
}
@@ -94,6 +105,10 @@ static void gna_request_process(struct work_struct *work)
gna_request_update_status(score_request);
gna_abort_hw(gna_priv);

+ ret = pm_runtime_put(gna_parent(gna_priv));
+ if (ret < 0)
+ dev_warn(gna_dev(gna_priv), "pm_runtime_put() failed: %d\n", ret);
+
buffer = score_request->buffer_list;
for (i = 0; i < score_request->buffer_count; i++, buffer++) {
mutex_lock(&gna_priv->memidr_lock);
--
2.28.0


2021-05-13 17:54:51

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 02/14] intel_gna: add component of hardware operation

On Thu, May 13, 2021 at 01:00:28PM +0200, Maciej Kwapulinski wrote:
> +void gna_start_scoring(struct gna_private *gna_priv,
> + struct gna_compute_cfg *compute_cfg)
> +{
> + u32 ctrl = gna_reg_read(gna_priv, GNA_MMIO_CTRL);
> +
> + ctrl |= GNA_CTRL_START_ACCEL | GNA_CTRL_COMP_INT_EN | GNA_CTRL_ERR_INT_EN;
> +
> + ctrl &= ~GNA_CTRL_COMP_STATS_EN;
> + ctrl |= FIELD_PREP(GNA_CTRL_COMP_STATS_EN,
> + compute_cfg->hw_perf_encoding & FIELD_MAX(GNA_CTRL_COMP_STATS_EN));
> +
> + ctrl &= ~GNA_CTRL_ACTIVE_LIST_EN;
> + ctrl |= FIELD_PREP(GNA_CTRL_ACTIVE_LIST_EN,
> + compute_cfg->active_list_on & FIELD_MAX(GNA_CTRL_ACTIVE_LIST_EN));
> +
> + ctrl &= ~GNA_CTRL_OP_MODE;
> + ctrl |= FIELD_PREP(GNA_CTRL_OP_MODE,
> + compute_cfg->gna_mode & FIELD_MAX(GNA_CTRL_OP_MODE));
> +
> + gna_reg_write(gna_priv, GNA_MMIO_CTRL, ctrl);
> +
> + dev_dbg(gna_dev(gna_priv), "scoring started...\n");

ftrace is your friend, no need for lines like this.

> +void gna_abort_hw(struct gna_private *gna_priv)
> +{
> + u32 val;
> + int ret;
> +
> + /* saturation bit in the GNA status register needs
> + * to be explicitly cleared.
> + */
> + gna_clear_saturation(gna_priv);
> +
> + val = gna_reg_read(gna_priv, GNA_MMIO_STS);
> + dev_dbg(gna_dev(gna_priv), "status before abort: %#x\n", val);
> +
> + val = gna_reg_read(gna_priv, GNA_MMIO_CTRL);
> + val |= GNA_CTRL_ABORT_CLR_ACCEL;
> + gna_reg_write(gna_priv, GNA_MMIO_CTRL, val);
> +
> + ret = readl_poll_timeout(gna_priv->iobase + GNA_MMIO_STS, val,
> + !(val & 0x1),
> + 0, 1000);
> +
> + if (ret)
> + dev_err(gna_dev(gna_priv), "abort did not complete\n");
> +}

If "abort_hw" can fail, then return an error. What could a user do with
an error message in the kernel log like the above one? The driver
obviously did not recover from it, so what can they do?

> --- /dev/null
> +++ b/include/uapi/misc/intel/gna.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +/* Copyright(c) 2017-2021 Intel Corporation */
> +
> +#ifndef _UAPI_GNA_H_
> +#define _UAPI_GNA_H_
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif

These C++ things should not be needed in kernel uapi header files,
right?

thanks,

greg k-h

2021-05-13 17:54:51

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 12/14] intel_gna: add a 'misc' device

On Thu, May 13, 2021 at 01:00:38PM +0200, Maciej Kwapulinski wrote:
> The new 'misc' device is the node for applications in user space to
> interact with the driver.
>
> Signed-off-by: Maciej Kwapulinski <[email protected]>
> Tested-by: Savo Novakovic <[email protected]>
> ---
> drivers/misc/intel/gna/device.c | 52 +++++++++++++++++++++++++++++++--
> drivers/misc/intel/gna/device.h | 11 +++----
> 2 files changed, 55 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
> index 0e31b8c6bb70..1e6345a8325b 100644
> --- a/drivers/misc/intel/gna/device.c
> +++ b/drivers/misc/intel/gna/device.c
> @@ -20,6 +20,18 @@ module_param(recovery_timeout, int, 0644);
> MODULE_PARM_DESC(recovery_timeout, "Recovery timeout in seconds");
> #endif
>
> +struct file;
> +
> +static int gna_open(struct inode *inode, struct file *f)
> +{
> + return -EPERM;
> +}

That sucks, why have an open that does nothing but fail?

> +
> +static const struct file_operations gna_file_ops = {
> + .owner = THIS_MODULE,
> + .open = gna_open,
> +};
> +
> static void gna_devm_idr_destroy(void *data)
> {
> struct idr *idr = data;
> @@ -27,6 +39,36 @@ static void gna_devm_idr_destroy(void *data)
> idr_destroy(idr);
> }
>
> +static void gna_devm_deregister_misc_dev(void *data)

Why is this a void *?

This isn't windows, use real pointer types everywhere in the kernel
please.

> +{
> + struct miscdevice *misc = data;
> +
> + misc_deregister(misc);
> +}
> +
> +static int gna_devm_register_misc_dev(struct device *parent, struct miscdevice *misc)
> +{
> + int ret;
> +
> + ret = misc_register(misc);
> + if (ret) {
> + dev_err(parent, "misc device %s registering failed. errcode: %d\n",
> + misc->name, ret);
> + gna_devm_deregister_misc_dev(misc);
> + } else {
> + dev_dbg(parent, "device: %s registered\n",
> + misc->name);

You have loads of debugging in this driver still, is it really needed?

> + }
> +
> + ret = devm_add_action(parent, gna_devm_deregister_misc_dev, misc);

Why do you need this?


thanks,

greg k-h

2021-05-13 18:48:30

by Maciej Kwapulinski

Subject: [PATCH v3 01/14] intel_gna: add driver module

Add a new PCI driver for the Intel(R) Gaussian & Neural Accelerator
with basic support for module loading and unloading. The driver's full
functionality will be added by subsequent changes.

Signed-off-by: Maciej Kwapulinski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Tomasz Jankowski <[email protected]>
Signed-off-by: Tomasz Jankowski <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
---
Documentation/misc-devices/index.rst | 1 +
Documentation/misc-devices/intel/gna.rst | 48 ++++++++++++++++
.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 7 +++
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/intel/gna/Kbuild | 5 ++
drivers/misc/intel/gna/Kconfig | 13 +++++
drivers/misc/intel/gna/device.c | 56 +++++++++++++++++++
drivers/misc/intel/gna/device.h | 30 ++++++++++
drivers/misc/intel/gna/hw.h | 17 ++++++
drivers/misc/intel/gna/pci.c | 48 ++++++++++++++++
drivers/misc/intel/gna/pci.h | 12 ++++
13 files changed, 240 insertions(+)
create mode 100644 Documentation/misc-devices/intel/gna.rst
create mode 100644 drivers/misc/intel/gna/Kbuild
create mode 100644 drivers/misc/intel/gna/Kconfig
create mode 100644 drivers/misc/intel/gna/device.c
create mode 100644 drivers/misc/intel/gna/device.h
create mode 100644 drivers/misc/intel/gna/hw.h
create mode 100644 drivers/misc/intel/gna/pci.c
create mode 100644 drivers/misc/intel/gna/pci.h

diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
index 64420b3314fe..2069d5c81a81 100644
--- a/Documentation/misc-devices/index.rst
+++ b/Documentation/misc-devices/index.rst
@@ -21,6 +21,7 @@ fit into other categories.
c2port
ibmvmc
ics932s401
+ intel/gna
isl29003
lis3lv02d
max6875
diff --git a/Documentation/misc-devices/intel/gna.rst b/Documentation/misc-devices/intel/gna.rst
new file mode 100644
index 000000000000..6bd4a663b9bb
--- /dev/null
+++ b/Documentation/misc-devices/intel/gna.rst
@@ -0,0 +1,48 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================================================
+Intel(R) Gaussian & Neural Accelerator (Intel(R) GNA)
+=====================================================
+
+Acronyms
+--------
+GNA - Gaussian & Neural Accelerator
+GMM - Gaussian Mixture Model
+CNN - Convolutional Neural Network
+RNN - Recurrent Neural Network
+DNN - Deep Neural Network
+
+Introduction
+------------
+The Intel(R) GNA is an internal, fixed PCI device available on several Intel platforms/SoCs.
+The feature set depends on the Intel chipset SKU.
+
+Intel(R) GNA provides hardware accelerated computation for GMMs and Neural Networks.
+It supports several layer types: affine, recurrent, and convolutional among others.
+Hardware also provides helper layer types for copying and transposing matrices.
+
+Linux Driver
+------------
+The driver registers a character device to expose file operations via a dev node.
+
+The driver probes/removes a PCI device, implements file operations, handles runtime
+power management, and interacts with hardware through MMIO registers.
+
+Multiple processes can independently submit requests to the driver. The requests are
+queued and processed in FIFO order; the hardware handles one request at a time.
+
+IOCTL
+-----
+Intel(R) GNA driver controls the device through IOCTL interfaces.
+Following IOCTL commands are supported:
+
+GNA_IOCTL_PARAM_GET gets driver and device capabilities.
+
+GNA_IOCTL_MEMORY_MAP locks user pages and sets up the GNA MMU for DMA transfers.
+
+GNA_IOCTL_MEMORY_UNMAP unlocks user pages and releases GNA MMU structures.
+
+GNA_IOCTL_COMPUTE submits a request to the device queue.
+
+GNA_IOCTL_WAIT blocks and waits on the submitted request.
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 599bd4493944..6c8ff126e49d 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -115,6 +115,7 @@ Code Seq# Include File Comments
'B' C0-FF advanced bbus <mailto:[email protected]>
'C' all linux/soundcard.h conflict!
'C' 01-2F linux/capi.h conflict!
+'C' 01-5F uapi/misc/intel/gna.h conflict!
'C' F0-FF drivers/net/wan/cosa.h conflict!
'D' all arch/s390/include/asm/dasd.h
'D' 40-5F drivers/scsi/dpt/dtpi_ioctl.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 9450e052f1b1..b279dac57192 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8972,6 +8972,13 @@ S: Maintained
F: Documentation/fb/intelfb.rst
F: drivers/video/fbdev/intelfb/

+INTEL GNA PCI DRIVER
+M: Maciej Kwapulinski <[email protected]>
+S: Maintained
+F: Documentation/misc-devices/intel/gna.rst
+F: drivers/misc/intel/gna/*
+F: include/uapi/misc/intel/gna.h
+
INTEL GPIO DRIVERS
M: Andy Shevchenko <[email protected]>
L: [email protected]
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index f532c59bb59b..3675d2786b4f 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -461,4 +461,5 @@ source "drivers/misc/bcm-vk/Kconfig"
source "drivers/misc/cardreader/Kconfig"
source "drivers/misc/habanalabs/Kconfig"
source "drivers/misc/uacce/Kconfig"
+source "drivers/misc/intel/gna/Kconfig"
endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 99b6f15a3c70..3acf0367333d 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -56,3 +56,4 @@ obj-$(CONFIG_HABANA_AI) += habanalabs/
obj-$(CONFIG_UACCE) += uacce/
obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
obj-$(CONFIG_HISI_HIKEY_USB) += hisi_hikey_usb.o
+obj-$(CONFIG_INTEL_GNA) += intel/gna/
diff --git a/drivers/misc/intel/gna/Kbuild b/drivers/misc/intel/gna/Kbuild
new file mode 100644
index 000000000000..02d758132d32
--- /dev/null
+++ b/drivers/misc/intel/gna/Kbuild
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+intel_gna-y := device.o pci.o
+
+obj-$(CONFIG_INTEL_GNA) += intel_gna.o
diff --git a/drivers/misc/intel/gna/Kconfig b/drivers/misc/intel/gna/Kconfig
new file mode 100644
index 000000000000..ed8bd4788525
--- /dev/null
+++ b/drivers/misc/intel/gna/Kconfig
@@ -0,0 +1,13 @@
+#
+# Intel(R) Gaussian & Neural Accelerator (Intel(R) GNA)
+#
+
+config INTEL_GNA
+ tristate "Intel(R) Gaussian & Neural Accelerator (Intel(R) GNA)"
+ depends on X86 && PCI
+ help
+ This option enables the Intel(R) Gaussian & Neural Accelerator
+ (Intel(R) GNA) driver: intel_gna.
+ User space interface is defined in include/uapi/misc/intel/gna.h, while
+ information about functionality is in
+ Documentation/misc-devices/intel/gna.rst
diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
new file mode 100644
index 000000000000..8722935d26bf
--- /dev/null
+++ b/drivers/misc/intel/gna/device.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/atomic.h>
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+
+#include "device.h"
+#include "hw.h"
+
+static int recovery_timeout = 60;
+
+#ifdef CONFIG_DEBUG_INTEL_GNA
+module_param(recovery_timeout, int, 0644);
+MODULE_PARM_DESC(recovery_timeout, "Recovery timeout in seconds");
+#endif
+
+int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase)
+{
+ static atomic_t dev_last_idx = ATOMIC_INIT(-1);
+ struct gna_private *gna_priv;
+ const char *dev_misc_name;
+ int ret;
+
+ gna_priv = devm_kzalloc(parent, sizeof(*gna_priv), GFP_KERNEL);
+ if (!gna_priv)
+ return -ENOMEM;
+
+ gna_priv->index = atomic_inc_return(&dev_last_idx);
+ gna_priv->recovery_timeout_jiffies = msecs_to_jiffies(recovery_timeout * 1000);
+ gna_priv->iobase = iobase;
+ gna_priv->info = *dev_info;
+ gna_priv->parent = parent;
+
+ dev_misc_name = devm_kasprintf(parent, GFP_KERNEL, "%s%d", GNA_DV_NAME, gna_priv->index);
+ if (!dev_misc_name)
+ return -ENOMEM;
+
+ gna_priv->name = dev_misc_name;
+
+ if (!(sizeof(dma_addr_t) > 4) ||
+ dma_set_mask(parent, DMA_BIT_MASK(64))) {
+ ret = dma_set_mask(parent, DMA_BIT_MASK(32));
+ if (ret) {
+ dev_err(parent, "dma_set_mask error: %d\n", ret);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("Intel(R) Gaussian & Neural Accelerator (Intel(R) GNA) Driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
new file mode 100644
index 000000000000..e12bac64fbf4
--- /dev/null
+++ b/drivers/misc/intel/gna/device.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_DEVICE_H__
+#define __GNA_DEVICE_H__
+
+#include <linux/types.h>
+
+#include "hw.h"
+
+#define GNA_DV_NAME "intel_gna"
+
+struct device;
+
+struct gna_private {
+ int index;
+
+ int recovery_timeout_jiffies;
+
+ const char *name;
+ struct device *parent;
+
+ /* device related resources */
+ void __iomem *iobase;
+ struct gna_dev_info info;
+};
+
+int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase);
+
+#endif /* __GNA_DEVICE_H__ */
diff --git a/drivers/misc/intel/gna/hw.h b/drivers/misc/intel/gna/hw.h
new file mode 100644
index 000000000000..8ef8dc182b92
--- /dev/null
+++ b/drivers/misc/intel/gna/hw.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_HW_H__
+#define __GNA_HW_H__
+
+#include <linux/mm_types.h>
+
+struct gna_dev_info {
+ u32 hwid;
+ u32 num_pagetables;
+ u32 num_page_entries;
+ u32 max_layer_count;
+ u64 max_hw_mem;
+};
+
+#endif // __GNA_HW_H__
diff --git a/drivers/misc/intel/gna/pci.c b/drivers/misc/intel/gna/pci.c
new file mode 100644
index 000000000000..ade9076db097
--- /dev/null
+++ b/drivers/misc/intel/gna/pci.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/module.h>
+#include <linux/pci.h>
+
+#include "device.h"
+#include "pci.h"
+
+int gna_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pci_id)
+{
+ struct gna_dev_info *dev_info;
+ void __iomem *iobase;
+ int ret;
+
+ ret = pcim_enable_device(pcidev);
+ if (ret) {
+ dev_err(&pcidev->dev, "pci device can't be enabled\n");
+ return ret;
+ }
+
+ ret = pcim_iomap_regions(pcidev, BIT(0), pci_name(pcidev));
+ if (ret) {
+ dev_err(&pcidev->dev, "cannot iomap regions\n");
+ return ret;
+ }
+
+ iobase = pcim_iomap_table(pcidev)[0];
+
+ pci_set_master(pcidev);
+
+ dev_info = (struct gna_dev_info *)pci_id->driver_data;
+
+ ret = gna_probe(&pcidev->dev, dev_info, iobase);
+ if (ret) {
+ dev_err(&pcidev->dev, "could not initialize device\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static struct pci_driver gna_pci_driver = {
+ .name = GNA_DV_NAME,
+ .probe = gna_pci_probe,
+};
+
+module_pci_driver(gna_pci_driver);
diff --git a/drivers/misc/intel/gna/pci.h b/drivers/misc/intel/gna/pci.h
new file mode 100644
index 000000000000..a8ea77d34f26
--- /dev/null
+++ b/drivers/misc/intel/gna/pci.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_PCI_H__
+#define __GNA_PCI_H__
+
+struct pci_device_id;
+struct pci_dev;
+
+int gna_pci_probe(struct pci_dev *dev, const struct pci_device_id *id);
+
+#endif /* __GNA_PCI_H__ */
--
2.28.0


2021-05-13 18:48:41

by Maciej Kwapulinski

Subject: [PATCH v3 07/14] intel_gna: add request component

From: Tomasz Jankowski <[email protected]>

The scoring work submitted to the GNA driver is implemented as a
list of requests that will be processed by the hardware.

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Anisha Dattatraya Kulkarni <[email protected]>
Signed-off-by: Anisha Dattatraya Kulkarni <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/Kbuild | 2 +-
drivers/misc/intel/gna/device.c | 7 +-
drivers/misc/intel/gna/device.h | 6 +
drivers/misc/intel/gna/mem.c | 3 +
drivers/misc/intel/gna/request.c | 350 +++++++++++++++++++++++++++++++
drivers/misc/intel/gna/request.h | 65 ++++++
include/uapi/misc/intel/gna.h | 37 ++++
7 files changed, 468 insertions(+), 2 deletions(-)
create mode 100644 drivers/misc/intel/gna/request.c
create mode 100644 drivers/misc/intel/gna/request.h

diff --git a/drivers/misc/intel/gna/Kbuild b/drivers/misc/intel/gna/Kbuild
index 64e8f10fd891..81d8da8f24aa 100644
--- a/drivers/misc/intel/gna/Kbuild
+++ b/drivers/misc/intel/gna/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only

-intel_gna-y := device.o hw.o mem.o pci.o
+intel_gna-y := device.o hw.o mem.o pci.o request.o

obj-$(CONFIG_INTEL_GNA) += intel_gna.o
diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index 50cac8139dcc..375342c3c140 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -1,13 +1,13 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright(c) 2017-2021 Intel Corporation

-#include <linux/atomic.h>
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/module.h>

#include "device.h"
#include "hw.h"
+#include "request.h"

static int recovery_timeout = 60;

@@ -82,6 +82,11 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem

mutex_init(&gna_priv->memidr_lock);

+ atomic_set(&gna_priv->request_count, 0);
+
+ mutex_init(&gna_priv->reqlist_lock);
+ INIT_LIST_HEAD(&gna_priv->request_list);
+
return 0;
}

diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index f74c773867aa..6345295ea589 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -4,6 +4,7 @@
#ifndef __GNA_DEVICE_H__
#define __GNA_DEVICE_H__

+#include <linux/atomic.h>
#include <linux/idr.h>
#include <linux/io.h>
#include <linux/list.h>
@@ -44,6 +45,11 @@ struct gna_private {
struct gna_mmu_object mmu;
struct mutex mmu_lock;

+ struct list_head request_list;
+ /* protects request_list */
+ struct mutex reqlist_lock;
+ atomic_t request_count;
+
/* memory objects' store */
struct idr memory_idr;
/* lock protecting memory_idr */
diff --git a/drivers/misc/intel/gna/mem.c b/drivers/misc/intel/gna/mem.c
index bdc2771a0d18..de8c58d0f80a 100644
--- a/drivers/misc/intel/gna/mem.c
+++ b/drivers/misc/intel/gna/mem.c
@@ -21,6 +21,7 @@
#include "hw.h"
#include "device.h"
#include "mem.h"
+#include "request.h"

static void gna_mmu_init(struct gna_private *gna_priv)
{
@@ -340,6 +341,8 @@ static void gna_memory_release(struct work_struct *work)

mo = container_of(work, struct gna_memory_object, work);

+ gna_delete_memory_requests(mo->memory_id, mo->gna_priv);
+
mo->user_ptr = NULL;

wake_up_interruptible(&mo->waitq);
diff --git a/drivers/misc/intel/gna/request.c b/drivers/misc/intel/gna/request.c
new file mode 100644
index 000000000000..d9e7bc5d2d3a
--- /dev/null
+++ b/drivers/misc/intel/gna/request.c
@@ -0,0 +1,350 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/atomic.h>
+#include <linux/device.h>
+#include <linux/idr.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+
+#include "device.h"
+#include "mem.h"
+#include "request.h"
+
+static struct gna_request *gna_request_create(struct gna_file_private *file_priv,
+ struct gna_compute_cfg *compute_cfg)
+{
+ struct gna_request *score_request;
+ struct gna_private *gna_priv;
+
+ gna_priv = file_priv->gna_priv;
+ if (IS_ERR(gna_priv))
+ return NULL;
+
+ score_request = kzalloc(sizeof(*score_request), GFP_KERNEL);
+ if (!score_request)
+ return NULL;
+ kref_init(&score_request->refcount);
+
+ dev_dbg(gna_dev(gna_priv), "layer_base %d layer_count %d\n",
+ compute_cfg->layer_base, compute_cfg->layer_count);
+
+ score_request->request_id = atomic_inc_return(&gna_priv->request_count);
+ score_request->compute_cfg = *compute_cfg;
+ score_request->fd = file_priv->fd;
+ score_request->gna_priv = gna_priv;
+ score_request->state = NEW;
+ init_waitqueue_head(&score_request->waitq);
+
+ return score_request;
+}
+
+/*
+ * returns true if [inner_offset, inner_offset + inner_size) lies within [0, outer_size). False otherwise.
+ */
+static bool gna_validate_ranges(u64 outer_size, u64 inner_offset, u64 inner_size)
+{
+ return inner_offset < outer_size &&
+ inner_size <= (outer_size - inner_offset);
+}
+
+static int gna_validate_patches(struct gna_private *gna_priv, __u64 buffer_size,
+ struct gna_memory_patch *patches, u64 count)
+{
+ u64 idx;
+
+ for (idx = 0; idx < count; ++idx) {
+ if (patches[idx].size > 8) {
+ dev_err(gna_dev(gna_priv), "invalid patch size: %llu\n", patches[idx].size);
+ return -EINVAL;
+ }
+
+ if (!gna_validate_ranges(buffer_size, patches[idx].offset, patches[idx].size)) {
+ dev_err(gna_dev(gna_priv),
+ "patch out of bounds. buffer size: %llu, patch offset/size:%llu/%llu\n",
+ buffer_size, patches[idx].offset, patches[idx].size);
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
+static int gna_buffer_fill_patches(struct gna_buffer *buffer, struct gna_private *gna_priv)
+{
+ __u64 patches_user = buffer->patches_ptr;
+ struct gna_memory_patch *patches;
+ /* At this point, the buffer points to a memory region in kernel space where the copied
+ * patches_ptr also lives, but the value of it is still an address from user space. This
+ * function will set patches_ptr to either an address in kernel space or null before it
+ * exits.
+ */
+ u64 patch_count;
+ int ret;
+
+ buffer->patches_ptr = 0;
+ patch_count = buffer->patch_count;
+ if (!patch_count)
+ return 0;
+
+ patches = kvmalloc_array(patch_count, sizeof(struct gna_memory_patch), GFP_KERNEL);
+ if (!patches)
+ return -ENOMEM;
+
+ if (copy_from_user(patches, u64_to_user_ptr(patches_user),
+ sizeof(struct gna_memory_patch) * patch_count)) {
+ dev_err(gna_dev(gna_priv), "copy %llu patches from user failed\n", patch_count);
+ ret = -EFAULT;
+ goto err_fill_patches;
+ }
+
+ ret = gna_validate_patches(gna_priv, buffer->size, patches, patch_count);
+ if (ret) {
+ dev_err(gna_dev(gna_priv), "patches failed validation\n");
+ goto err_fill_patches;
+ }
+
+ buffer->patches_ptr = (uintptr_t)patches;
+
+ return 0;
+
+err_fill_patches:
+ kvfree(patches);
+ return ret;
+}
+
+static int gna_request_fill_buffers(struct gna_request *score_request,
+ struct gna_compute_cfg *compute_cfg)
+{
+ struct gna_buffer *buffer_list;
+ struct gna_memory_object *mo;
+ struct gna_private *gna_priv;
+ u64 buffers_total_size = 0;
+ struct gna_buffer *buffer;
+ u64 buffer_count;
+ u64 memory_id;
+ u64 i, j;
+ int ret;
+
+ gna_priv = score_request->gna_priv;
+
+ buffer_count = compute_cfg->buffer_count;
+ buffer_list = kvmalloc_array(buffer_count, sizeof(struct gna_buffer), GFP_KERNEL);
+ if (!buffer_list)
+ return -ENOMEM;
+
+ if (copy_from_user(buffer_list, u64_to_user_ptr(compute_cfg->buffers_ptr),
+ sizeof(*buffer_list) * buffer_count)) {
+ dev_err(gna_dev(gna_priv), "copying %llu buffers failed\n", buffer_count);
+ ret = -EFAULT;
+ goto err_free_buffers;
+ }
+
+ for (i = 0; i < buffer_count; i++) {
+ buffer = &buffer_list[i];
+ memory_id = buffer->memory_id;
+
+ for (j = 0; j < i; j++) {
+ if (buffer_list[j].memory_id == memory_id) {
+ dev_err(gna_dev(gna_priv),
+ "duplicate memory id in score config. id:%llu\n", memory_id);
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+ }
+
+ buffers_total_size +=
+ gna_buffer_get_size(buffer->offset, buffer->size);
+ if (buffers_total_size > gna_priv->info.max_hw_mem) {
+ dev_err(gna_dev(gna_priv), "buffers' total size too big\n");
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, memory_id);
+ if (!mo) {
+ mutex_unlock(&gna_priv->memidr_lock);
+ dev_err(gna_dev(gna_priv), "memory object %llu not found\n", memory_id);
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+ mutex_unlock(&gna_priv->memidr_lock);
+
+ if (mo->fd != score_request->fd) {
+ dev_err(gna_dev(gna_priv),
+ "memory object from another file. %p != %p\n",
+ mo->fd, score_request->fd);
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+
+ if (!gna_validate_ranges(mo->memory_size, buffer->offset, buffer->size)) {
+ dev_err(gna_dev(gna_priv),
+ "buffer out of bounds. mo size: %llu, buffer offset/size:%llu/%llu\n",
+ mo->memory_size, buffer->offset, buffer->size);
+ ret = -EINVAL;
+ goto err_zero_patch_ptr;
+ }
+
+ ret = gna_buffer_fill_patches(buffer, gna_priv);
+ if (ret)
+ goto err_free_patches;
+ }
+
+ score_request->buffer_list = buffer_list;
+ score_request->buffer_count = buffer_count;
+
+ return 0;
+
+err_zero_patch_ptr:
+ /* patches_ptr may still hold an address in userspace.
+ * Don't pass it to kvfree().
+ */
+ buffer->patches_ptr = 0;
+
+err_free_patches:
+ /* patches_ptr of each processed buffer should be either
+ * null or pointing to an allocated memory block in the
+ * kernel at this point.
+ */
+ for (j = 0; j <= i; j++)
+ kvfree((void *)(uintptr_t)buffer_list[j].patches_ptr);
+
+err_free_buffers:
+ kvfree(buffer_list);
+ return ret;
+}
+
+int gna_enqueue_request(struct gna_compute_cfg *compute_cfg,
+ struct gna_file_private *file_priv, u64 *request_id)
+{
+ struct gna_request *score_request;
+ struct gna_private *gna_priv;
+ int ret;
+
+ if (!file_priv)
+ return -EINVAL;
+
+ gna_priv = file_priv->gna_priv;
+
+ score_request = gna_request_create(file_priv, compute_cfg);
+ if (!score_request)
+ return -ENOMEM;
+
+ ret = gna_request_fill_buffers(score_request, compute_cfg);
+ if (ret) {
+ kref_put(&score_request->refcount, gna_request_release);
+ return ret;
+ }
+
+ kref_get(&score_request->refcount);
+ mutex_lock(&gna_priv->reqlist_lock);
+ list_add_tail(&score_request->node, &gna_priv->request_list);
+ mutex_unlock(&gna_priv->reqlist_lock);
+
+ kref_put(&score_request->refcount, gna_request_release);
+
+ *request_id = score_request->request_id;
+
+ return 0;
+}
+
+void gna_request_release(struct kref *ref)
+{
+ struct gna_request *score_request =
+ container_of(ref, struct gna_request, refcount);
+ kfree(score_request);
+}
+
+struct gna_request *gna_find_request_by_id(u64 req_id, struct gna_private *gna_priv)
+{
+ struct gna_request *req, *found_req;
+ struct list_head *reqs_list;
+
+ mutex_lock(&gna_priv->reqlist_lock);
+
+ reqs_list = &gna_priv->request_list;
+ found_req = NULL;
+ if (!list_empty(reqs_list)) {
+ list_for_each_entry(req, reqs_list, node) {
+ if (req_id == req->request_id) {
+ found_req = req;
+ kref_get(&found_req->refcount);
+ break;
+ }
+ }
+ }
+
+ mutex_unlock(&gna_priv->reqlist_lock);
+
+ return found_req;
+}
+
+void gna_delete_request_by_id(u64 req_id, struct gna_private *gna_priv)
+{
+ struct gna_request *req, *temp_req;
+ struct list_head *reqs_list;
+
+ mutex_lock(&gna_priv->reqlist_lock);
+
+ reqs_list = &gna_priv->request_list;
+ if (!list_empty(reqs_list)) {
+ list_for_each_entry_safe(req, temp_req, reqs_list, node) {
+ if (req->request_id == req_id) {
+ list_del(&req->node);
+ kref_put(&req->refcount, gna_request_release);
+ break;
+ }
+ }
+ }
+
+ mutex_unlock(&gna_priv->reqlist_lock);
+}
+
+void gna_delete_file_requests(struct file *fd, struct gna_private *gna_priv)
+{
+ struct gna_request *req, *temp_req;
+ struct list_head *reqs_list;
+
+ mutex_lock(&gna_priv->reqlist_lock);
+
+ reqs_list = &gna_priv->request_list;
+ if (!list_empty(reqs_list)) {
+ list_for_each_entry_safe(req, temp_req, reqs_list, node) {
+ if (req->fd == fd) {
+ list_del(&req->node);
+ kref_put(&req->refcount, gna_request_release);
+ }
+ }
+ }
+
+ mutex_unlock(&gna_priv->reqlist_lock);
+}
+
+void gna_delete_memory_requests(u64 memory_id, struct gna_private *gna_priv)
+{
+ struct gna_request *req, *temp_req;
+ struct list_head *reqs_list;
+ int i;
+
+ mutex_lock(&gna_priv->reqlist_lock);
+
+ reqs_list = &gna_priv->request_list;
+ if (!list_empty(reqs_list)) {
+ list_for_each_entry_safe(req, temp_req, reqs_list, node) {
+ for (i = 0; i < req->buffer_count; ++i) {
+ if (req->buffer_list[i].memory_id == memory_id) {
+ list_del(&req->node);
+ kref_put(&req->refcount, gna_request_release);
+ break;
+ }
+ }
+ }
+ }
+
+ mutex_unlock(&gna_priv->reqlist_lock);
+}
diff --git a/drivers/misc/intel/gna/request.h b/drivers/misc/intel/gna/request.h
new file mode 100644
index 000000000000..f34e974ad25c
--- /dev/null
+++ b/drivers/misc/intel/gna/request.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_REQUEST_H__
+#define __GNA_REQUEST_H__
+
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+
+#include <uapi/misc/intel/gna.h>
+
+struct gna_private;
+struct file;
+
+enum gna_request_state {
+ NEW,
+ ACTIVE,
+ DONE,
+};
+
+struct gna_file_private;
+
+struct gna_request {
+ u64 request_id;
+
+ struct kref refcount;
+
+ struct gna_private *gna_priv;
+ struct file *fd;
+
+ u32 hw_status;
+
+ enum gna_request_state state;
+
+ int status;
+
+ struct gna_hw_perf hw_perf;
+ struct gna_drv_perf drv_perf;
+
+ struct list_head node;
+
+ struct gna_compute_cfg compute_cfg;
+
+ struct gna_buffer *buffer_list;
+ u64 buffer_count;
+
+ struct wait_queue_head waitq;
+};
+
+int gna_enqueue_request(struct gna_compute_cfg *compute_cfg,
+ struct gna_file_private *file_priv, u64 *request_id);
+
+void gna_request_release(struct kref *ref);
+
+struct gna_request *gna_find_request_by_id(u64 req_id, struct gna_private *gna_priv);
+
+void gna_delete_request_by_id(u64 req_id, struct gna_private *gna_priv);
+
+void gna_delete_file_requests(struct file *fd, struct gna_private *gna_priv);
+
+void gna_delete_memory_requests(u64 memory_id, struct gna_private *gna_priv);
+
+#endif // __GNA_REQUEST_H__
diff --git a/include/uapi/misc/intel/gna.h b/include/uapi/misc/intel/gna.h
index a4b6a1e2010c..473ce569d146 100644
--- a/include/uapi/misc/intel/gna.h
+++ b/include/uapi/misc/intel/gna.h
@@ -26,6 +26,43 @@ extern "C" {
GNA_STS_PARAM_OOR |\
GNA_STS_VA_OOR)

+/*
+ * Structure describes part of memory to be overwritten before starting GNA
+ */
+struct gna_memory_patch {
+ /* offset from targeted memory */
+ __u64 offset;
+
+ __u64 size;
+ __u64 value;
+};
+
+struct gna_buffer {
+ __u64 memory_id;
+
+ __u64 offset;
+ __u64 size;
+
+ __u64 patch_count;
+ __u64 patches_ptr;
+};
+
+/*
+ * Driver performance timestamps in nanoseconds.
+ * Values regard system boot time, but do not count during suspend.
+ */
+struct gna_drv_perf {
+ __u64 pre_processing; /* driver starts pre-processing */
+ __u64 processing; /* hw starts processing */
+ __u64 hw_completed; /* hw finishes processing */
+ __u64 completion; /* driver finishes post-processing */
+};
+
+struct gna_hw_perf {
+ __u64 total;
+ __u64 stall;
+};
+
struct gna_compute_cfg {
__u32 layer_base;
__u32 layer_count;
--
2.28.0


2021-05-13 18:48:42

by Maciej Kwapulinski

Subject: [PATCH v3 06/14] intel_gna: add hardware ids

From: Tomasz Jankowski <[email protected]>

Add the PCI IDs of the Intel(R) Gaussian & Neural Accelerator on
supported platforms.

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/hw.h | 34 +++++++++++++++++++
drivers/misc/intel/gna/pci.c | 65 ++++++++++++++++++++++++++++++++++++
2 files changed, 99 insertions(+)

diff --git a/drivers/misc/intel/gna/hw.h b/drivers/misc/intel/gna/hw.h
index f61792f9dd52..2a6890fb748e 100644
--- a/drivers/misc/intel/gna/hw.h
+++ b/drivers/misc/intel/gna/hw.h
@@ -10,6 +10,40 @@
struct gna_compute_cfg;
struct gna_private;

+#define GNA_FEATURES \
+ .max_hw_mem = 256 * 1024 * 1024, \
+ .num_pagetables = 64, \
+ .num_page_entries = PAGE_SIZE / sizeof(u32), \
+ /* desc_info all in bytes */ \
+ .desc_info = { \
+ .rsvd_size = 256, \
+ .cfg_size = 256, \
+ .desc_size = 784, \
+ .mmu_info = { \
+ .vamax_size = 4, \
+ .rsvd_size = 12, \
+ .pd_size = 4 * 64, \
+ }, \
+ }
+
+#define GNA_GEN1_FEATURES \
+ GNA_FEATURES, \
+ .max_layer_count = 1024
+
+#define GNA_GEN2_FEATURES \
+ GNA_FEATURES, \
+ .max_layer_count = 4096
+
+#define GNA_DEV_HWID_CNL 0x5A11
+#define GNA_DEV_HWID_EHL 0x4511
+#define GNA_DEV_HWID_GLK 0x3190
+#define GNA_DEV_HWID_ICL 0x8A11
+#define GNA_DEV_HWID_JSL 0x4E11
+#define GNA_DEV_HWID_TGL 0x9A11
+#define GNA_DEV_HWID_RKL 0x4C11
+#define GNA_DEV_HWID_ADL 0x464F
+#define GNA_DEV_HWID_RPL 0xA74F
+
/* GNA MMIO registers */
#define GNA_MMIO_STS 0x80
#define GNA_MMIO_CTRL 0x84
diff --git a/drivers/misc/intel/gna/pci.c b/drivers/misc/intel/gna/pci.c
index ade9076db097..525e9de9d577 100644
--- a/drivers/misc/intel/gna/pci.c
+++ b/drivers/misc/intel/gna/pci.c
@@ -5,8 +5,70 @@
#include <linux/pci.h>

#include "device.h"
+#include "hw.h"
#include "pci.h"

+static const struct gna_dev_info cnl_dev_info = {
+ .hwid = GNA_DEV_HWID_CNL,
+ GNA_GEN1_FEATURES
+};
+
+static const struct gna_dev_info glk_dev_info = {
+ .hwid = GNA_DEV_HWID_GLK,
+ GNA_GEN1_FEATURES
+};
+
+static const struct gna_dev_info ehl_dev_info = {
+ .hwid = GNA_DEV_HWID_EHL,
+ GNA_GEN1_FEATURES
+};
+
+static const struct gna_dev_info icl_dev_info = {
+ .hwid = GNA_DEV_HWID_ICL,
+ GNA_GEN1_FEATURES
+};
+
+static const struct gna_dev_info jsl_dev_info = {
+ .hwid = GNA_DEV_HWID_JSL,
+ GNA_GEN2_FEATURES
+};
+
+static const struct gna_dev_info tgl_dev_info = {
+ .hwid = GNA_DEV_HWID_TGL,
+ GNA_GEN2_FEATURES
+};
+
+static const struct gna_dev_info rkl_dev_info = {
+ .hwid = GNA_DEV_HWID_RKL,
+ GNA_GEN2_FEATURES
+};
+
+static const struct gna_dev_info adl_dev_info = {
+ .hwid = GNA_DEV_HWID_ADL,
+ GNA_GEN2_FEATURES
+};
+
+static const struct gna_dev_info rpl_dev_info = {
+ .hwid = GNA_DEV_HWID_RPL,
+ GNA_GEN2_FEATURES
+};
+
+#define INTEL_GNA_DEVICE(hwid, info) \
+ { PCI_VDEVICE(INTEL, hwid), (kernel_ulong_t)(info) }
+
+static const struct pci_device_id gna_pci_ids[] = {
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_CNL, &cnl_dev_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_EHL, &ehl_dev_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_GLK, &glk_dev_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_ICL, &icl_dev_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_JSL, &jsl_dev_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_TGL, &tgl_dev_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_RKL, &rkl_dev_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_ADL, &adl_dev_info),
+ INTEL_GNA_DEVICE(GNA_DEV_HWID_RPL, &rpl_dev_info),
+ { }
+};
+
int gna_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pci_id)
{
struct gna_dev_info *dev_info;
@@ -42,7 +104,10 @@ int gna_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pci_id)

static struct pci_driver gna_pci_driver = {
.name = GNA_DV_NAME,
+ .id_table = gna_pci_ids,
.probe = gna_pci_probe,
};

module_pci_driver(gna_pci_driver);
+
+MODULE_DEVICE_TABLE(pci, gna_pci_ids);
--
2.28.0


2021-05-13 18:48:52

by Maciej Kwapulinski

Subject: [PATCH v3 09/14] intel_gna: add a work queue to process scoring requests

From: Tomasz Jankowski <[email protected]>

The new workqueue processes the list of requests in FIFO order. For
each request it kicks off the hardware and then waits for completion,
sleeping until it is woken up by an interrupt; the interrupt handler
itself is added in a following change.

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Anisha Dattatraya Kulkarni <[email protected]>
Signed-off-by: Anisha Dattatraya Kulkarni <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/device.c | 34 +++++++++
drivers/misc/intel/gna/device.h | 14 ++++
drivers/misc/intel/gna/request.c | 115 +++++++++++++++++++++++++++++++
drivers/misc/intel/gna/request.h | 2 +
4 files changed, 165 insertions(+)

diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index c8a127cc3039..ca988d3ec408 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -23,6 +23,34 @@ static void gna_devm_idr_destroy(void *data)
idr_destroy(idr);
}

+static void gna_devm_destroy_workqueue(void *data)
+{
+ struct workqueue_struct *request_wq = data;
+
+ destroy_workqueue(request_wq);
+}
+
+static int gna_devm_create_singlethread_workqueue(struct gna_private *gna_priv)
+{
+ struct device *dev = gna_parent(gna_priv);
+ const char *name = gna_name(gna_priv);
+ int ret;
+
+ gna_priv->request_wq = create_singlethread_workqueue(name);
+ if (!gna_priv->request_wq) {
+ dev_err(dev, "could not create %s workqueue\n", name);
+ return -EFAULT;
+ }
+
+ ret = devm_add_action(dev, gna_devm_destroy_workqueue, gna_priv->request_wq);
+ if (ret) {
+ dev_err(dev, "could not add devm action for %s workqueue\n", name);
+ gna_devm_destroy_workqueue(gna_priv->request_wq);
+ }
+
+ return ret;
+}
+
int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase)
{
static atomic_t dev_last_idx = ATOMIC_INIT(-1);
@@ -90,6 +118,12 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
mutex_init(&gna_priv->reqlist_lock);
INIT_LIST_HEAD(&gna_priv->request_list);

+ init_waitqueue_head(&gna_priv->dev_busy_waitq);
+
+ ret = gna_devm_create_singlethread_workqueue(gna_priv);
+ if (ret)
+ return ret;
+
return 0;
}

diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index e879ac045928..65856d08729f 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -16,6 +16,7 @@

#define GNA_DV_NAME "intel_gna"

+struct workqueue_struct;
struct device;
struct file;

@@ -42,6 +43,9 @@ struct gna_private {
const char *name;
struct device *parent;

+ /* hardware status set by interrupt handler */
+ u32 hw_status;
+
/* device related resources */
void __iomem *iobase;
struct gna_dev_info info;
@@ -50,9 +54,14 @@ struct gna_private {
struct gna_mmu_object mmu;
struct mutex mmu_lock;

+ /* if true, then gna device is processing */
+ bool dev_busy;
+ struct wait_queue_head dev_busy_waitq;
+
struct list_head request_list;
/* protects request_list */
struct mutex reqlist_lock;
+ struct workqueue_struct *request_wq;
atomic_t request_count;

/* memory objects' store */
@@ -78,6 +87,11 @@ static inline struct device *gna_parent(struct gna_private *gna_priv)
return gna_priv->parent;
}

+static inline const char *gna_name(struct gna_private *gna_priv)
+{
+ return gna_priv->name;
+}
+
static inline struct device *gna_dev(struct gna_private *gna_priv)
{
return gna_priv->parent;
diff --git a/drivers/misc/intel/gna/request.c b/drivers/misc/intel/gna/request.c
index d9e7bc5d2d3a..eabbab8924be 100644
--- a/drivers/misc/intel/gna/request.c
+++ b/drivers/misc/intel/gna/request.c
@@ -10,8 +10,118 @@
#include <linux/slab.h>

#include "device.h"
+#include "hw.h"
#include "mem.h"
#include "request.h"
+#include "score.h"
+
+static void gna_request_update_status(struct gna_request *score_request)
+{
+ struct gna_private *gna_priv = score_request->gna_priv;
+ /* The gna_priv's hw_status should be updated first */
+ u32 hw_status = gna_priv->hw_status;
+ u32 stall_cycles;
+ u32 total_cycles;
+
+ /* Technically, the time stamp can be a bit later than
+ * when the hw actually completed scoring. Here we just
+ * do our best in a deferred work, unless we want to
+ * tax isr for a more accurate record.
+ */
+ score_request->drv_perf.hw_completed = ktime_get_ns();
+
+ score_request->hw_status = hw_status;
+
+ score_request->status = gna_parse_hw_status(gna_priv, hw_status);
+
+ if (gna_hw_perf_enabled(gna_priv)) {
+ if (hw_status & GNA_STS_STATISTICS_VALID) {
+ total_cycles = gna_reg_read(gna_priv, GNA_MMIO_PTC);
+ stall_cycles = gna_reg_read(gna_priv, GNA_MMIO_PSC);
+ score_request->hw_perf.total = total_cycles;
+ score_request->hw_perf.stall = stall_cycles;
+ } else
+ dev_warn(gna_dev(gna_priv), "GNA statistics missing\n");
+ }
+ if (unlikely(hw_status & GNA_ERROR))
+ gna_print_error_status(gna_priv, hw_status);
+}
+
+static void gna_request_process(struct work_struct *work)
+{
+ struct gna_request *score_request;
+ struct gna_memory_object *mo;
+ struct gna_private *gna_priv;
+ struct gna_buffer *buffer;
+ unsigned long hw_timeout;
+ int ret;
+ u64 i;
+
+ score_request = container_of(work, struct gna_request, work);
+ gna_priv = score_request->gna_priv;
+ dev_dbg(gna_dev(gna_priv), "processing request %llu\n", score_request->request_id);
+
+ score_request->state = ACTIVE;
+
+ score_request->drv_perf.pre_processing = ktime_get_ns();
+
+ /* Set busy flag before kicking off HW. The isr will clear it and wake up us. There is
+ * no difference if isr is missed in a timeout situation of the last request. We just
+ * always set it busy and let the wait_event_timeout check the reset.
+ * wq: X -> true
+ * isr: X -> false
+ */
+ gna_priv->dev_busy = true;
+
+ ret = gna_score(score_request);
+ if (ret) {
+ score_request->status = ret;
+ goto end;
+ }
+
+ score_request->drv_perf.processing = ktime_get_ns();
+
+ hw_timeout = gna_priv->recovery_timeout_jiffies;
+
+ hw_timeout = wait_event_timeout(gna_priv->dev_busy_waitq,
+ !gna_priv->dev_busy, hw_timeout);
+
+ if (!hw_timeout)
+ dev_warn(gna_dev(gna_priv), "hardware timeout occurred\n");
+
+ gna_priv->hw_status = gna_reg_read(gna_priv, GNA_MMIO_STS);
+
+ gna_request_update_status(score_request);
+ gna_abort_hw(gna_priv);
+
+ buffer = score_request->buffer_list;
+ for (i = 0; i < score_request->buffer_count; i++, buffer++) {
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, buffer->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+ if (mo) {
+ mutex_lock(&mo->page_lock);
+ mo->ops->put_pages(mo);
+ mutex_unlock(&mo->page_lock);
+ } else {
+ dev_warn(gna_dev(gna_priv), "mo not found %llu\n", buffer->memory_id);
+ }
+ }
+
+ /* patches_ptr's are already freed by ops->score() function */
+ kvfree(score_request->buffer_list);
+ score_request->buffer_list = NULL;
+ score_request->buffer_count = 0;
+
+ gna_mmu_clear(gna_priv);
+
+end:
+ score_request->drv_perf.completion = ktime_get_ns();
+ dev_dbg(gna_dev(gna_priv), "request %llu done, waking processes\n",
+ score_request->request_id);
+ score_request->state = DONE;
+ wake_up_interruptible_all(&score_request->waitq);
+}

static struct gna_request *gna_request_create(struct gna_file_private *file_priv,
struct gna_compute_cfg *compute_cfg)
@@ -37,6 +147,7 @@ static struct gna_request *gna_request_create(struct gna_file_private *file_priv
score_request->gna_priv = gna_priv;
score_request->state = NEW;
init_waitqueue_head(&score_request->waitq);
+ INIT_WORK(&score_request->work, gna_request_process);

return score_request;
}
@@ -245,6 +356,7 @@ int gna_enqueue_request(struct gna_compute_cfg *compute_cfg,
list_add_tail(&score_request->node, &gna_priv->request_list);
mutex_unlock(&gna_priv->reqlist_lock);

+ queue_work(gna_priv->request_wq, &score_request->work);
kref_put(&score_request->refcount, gna_request_release);

*request_id = score_request->request_id;
@@ -295,6 +407,7 @@ void gna_delete_request_by_id(u64 req_id, struct gna_private *gna_priv)
list_for_each_entry_safe(req, temp_req, reqs_list, node) {
if (req->request_id == req_id) {
list_del(&req->node);
+ cancel_work_sync(&req->work);
kref_put(&req->refcount, gna_request_release);
break;
}
@@ -316,6 +429,7 @@ void gna_delete_file_requests(struct file *fd, struct gna_private *gna_priv)
list_for_each_entry_safe(req, temp_req, reqs_list, node) {
if (req->fd == fd) {
list_del(&req->node);
+ cancel_work_sync(&req->work);
kref_put(&req->refcount, gna_request_release);
break;
}
@@ -339,6 +453,7 @@ void gna_delete_memory_requests(u64 memory_id, struct gna_private *gna_priv)
for (i = 0; i < req->buffer_count; ++i) {
if (req->buffer_list[i].memory_id == memory_id) {
list_del(&req->node);
+ cancel_work_sync(&req->work);
kref_put(&req->refcount, gna_request_release);
break;
}
diff --git a/drivers/misc/intel/gna/request.h b/drivers/misc/intel/gna/request.h
index f34e974ad25c..f95947dd7272 100644
--- a/drivers/misc/intel/gna/request.h
+++ b/drivers/misc/intel/gna/request.h
@@ -8,6 +8,7 @@
#include <linux/list.h>
#include <linux/types.h>
#include <linux/wait.h>
+#include <linux/workqueue.h>

#include <uapi/misc/intel/gna.h>

@@ -46,6 +47,7 @@ struct gna_request {
struct gna_buffer *buffer_list;
u64 buffer_count;

+ struct work_struct work;
struct wait_queue_head waitq;
};

--
2.28.0


2021-05-13 18:48:54

by Maciej Kwapulinski

Subject: [PATCH v3 10/14] intel_gna: add interrupt handler

An interrupt is generated by the hardware when a scoring job is
done. The interrupt handler wakes up the work queue so it can resume
processing the current request.

Signed-off-by: Maciej Kwapulinski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Tomasz Jankowski <[email protected]>
Signed-off-by: Tomasz Jankowski <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
---
drivers/misc/intel/gna/device.c | 20 ++++++++++++++++++-
drivers/misc/intel/gna/device.h | 2 +-
drivers/misc/intel/gna/pci.c | 35 ++++++++++++++++++++++++++++++++-
3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index ca988d3ec408..75d8e1675485 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -3,6 +3,7 @@

#include <linux/device.h>
#include <linux/dma-mapping.h>
+#include <linux/interrupt.h>
#include <linux/module.h>

#include "device.h"
@@ -23,6 +24,16 @@ static void gna_devm_idr_destroy(void *data)
idr_destroy(idr);
}

+static irqreturn_t gna_interrupt(int irq, void *priv)
+{
+ struct gna_private *gna_priv;
+
+ gna_priv = (struct gna_private *)priv;
+ gna_priv->dev_busy = false;
+ wake_up(&gna_priv->dev_busy_waitq);
+ return IRQ_HANDLED;
+}
+
static void gna_devm_destroy_workqueue(void *data)
{
struct workqueue_struct *request_wq = data;
@@ -51,7 +62,7 @@ static int gna_devm_create_singlethread_workqueue(struct gna_private *gna_priv)
return ret;
}

-int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase)
+int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase, int irq)
{
static atomic_t dev_last_idx = ATOMIC_INIT(-1);
struct gna_private *gna_priv;
@@ -124,6 +135,13 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
if (ret)
return ret;

+ ret = devm_request_irq(parent, irq, gna_interrupt,
+ IRQF_SHARED, dev_misc_name, gna_priv);
+ if (ret) {
+ dev_err(parent, "could not register for interrupt\n");
+ return ret;
+ }
+
return 0;
}

diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index 65856d08729f..d3c86d649b5c 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -70,7 +70,7 @@ struct gna_private {
struct mutex memidr_lock;
};

-int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase);
+int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase, int irq);

static inline u32 gna_reg_read(struct gna_private *gna_priv, u32 reg)
{
diff --git a/drivers/misc/intel/gna/pci.c b/drivers/misc/intel/gna/pci.c
index 525e9de9d577..a31f0142a4f2 100644
--- a/drivers/misc/intel/gna/pci.c
+++ b/drivers/misc/intel/gna/pci.c
@@ -69,10 +69,33 @@ static const struct pci_device_id gna_pci_ids[] = {
{ }
};

+static void gna_pcim_free_irq_vectors(void *data)
+{
+ struct pci_dev *pcidev = data;
+
+ pci_free_irq_vectors(pcidev);
+}
+
+static int gna_pcim_alloc_irq_vectors(struct pci_dev *pcidev)
+{
+ int ret;
+
+ ret = pci_alloc_irq_vectors(pcidev, 1, 1, PCI_IRQ_ALL_TYPES);
+ if (ret < 0)
+ return ret;
+
+ ret = devm_add_action(&pcidev->dev, gna_pcim_free_irq_vectors, pcidev);
+ if (ret)
+ gna_pcim_free_irq_vectors(pcidev);
+
+ return ret;
+}
+
int gna_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pci_id)
{
struct gna_dev_info *dev_info;
void __iomem *iobase;
+ int irq;
int ret;

ret = pcim_enable_device(pcidev);
@@ -91,9 +114,19 @@ int gna_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pci_id)

pci_set_master(pcidev);

+ ret = gna_pcim_alloc_irq_vectors(pcidev);
+ if (ret < 0)
+ return ret;
+
+ irq = pci_irq_vector(pcidev, 0);
+ if (unlikely(irq < 0)) {
+ dev_err(&pcidev->dev, "could not get irq number\n");
+ return -EIO;
+ }
+
dev_info = (struct gna_dev_info *)pci_id->driver_data;

- ret = gna_probe(&pcidev->dev, dev_info, iobase);
+ ret = gna_probe(&pcidev->dev, dev_info, iobase, irq);
if (ret) {
dev_err(&pcidev->dev, "could not initialize device\n");
return ret;
--
2.28.0


2021-05-13 18:48:55

by Maciej Kwapulinski

Subject: [PATCH v3 05/14] intel_gna: initialize mmu

From: Tomasz Jankowski <[email protected]>

Set up the MMU in the driver with a new memory component.

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/device.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index ed7d3c0223df..50cac8139dcc 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -59,6 +59,18 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
bld_reg = gna_reg_read(gna_priv, GNA_MMIO_IBUFFS);
gna_priv->hw_info.in_buf_s = bld_reg & GENMASK(7, 0);

+ ret = gna_mmu_alloc(gna_priv);
+ if (ret) {
+ dev_err(parent, "mmu allocation failed\n");
+ return ret;
+
+ }
+ dev_dbg(parent, "maximum memory size %llu num pd %d\n",
+ gna_priv->info.max_hw_mem, gna_priv->info.num_pagetables);
+ dev_dbg(parent, "desc rsvd size %d mmu vamax size %d\n",
+ gna_priv->info.desc_info.rsvd_size,
+ gna_priv->info.desc_info.mmu_info.vamax_size);
+
mutex_init(&gna_priv->mmu_lock);

idr_init(&gna_priv->memory_idr);
--
2.28.0


2021-05-13 18:49:00

by Maciej Kwapulinski

Subject: [PATCH v3 08/14] intel_gna: implement scoring

From: Tomasz Jankowski <[email protected]>

Add a new component implementing the scoring logic: configuring and
kicking off the hardware.

Signed-off-by: Tomasz Jankowski <[email protected]>
Tested-by: Savo Novakovic <[email protected]>
Co-developed-by: Jianxun Zhang <[email protected]>
Signed-off-by: Jianxun Zhang <[email protected]>
Co-developed-by: Maciej Kwapulinski <[email protected]>
Signed-off-by: Maciej Kwapulinski <[email protected]>
---
drivers/misc/intel/gna/Kbuild | 2 +-
drivers/misc/intel/gna/device.c | 3 +
drivers/misc/intel/gna/device.h | 5 +
drivers/misc/intel/gna/score.c | 291 ++++++++++++++++++++++++++++++++
drivers/misc/intel/gna/score.h | 17 ++
include/uapi/misc/intel/gna.h | 4 +
6 files changed, 321 insertions(+), 1 deletion(-)
create mode 100644 drivers/misc/intel/gna/score.c
create mode 100644 drivers/misc/intel/gna/score.h

diff --git a/drivers/misc/intel/gna/Kbuild b/drivers/misc/intel/gna/Kbuild
index 81d8da8f24aa..38ff97360ed8 100644
--- a/drivers/misc/intel/gna/Kbuild
+++ b/drivers/misc/intel/gna/Kbuild
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only

-intel_gna-y := device.o hw.o mem.o pci.o request.o
+intel_gna-y := device.o hw.o mem.o pci.o request.o score.o

obj-$(CONFIG_INTEL_GNA) += intel_gna.o
diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
index 375342c3c140..c8a127cc3039 100644
--- a/drivers/misc/intel/gna/device.c
+++ b/drivers/misc/intel/gna/device.c
@@ -82,6 +82,9 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem

mutex_init(&gna_priv->memidr_lock);

+ mutex_init(&gna_priv->flist_lock);
+ INIT_LIST_HEAD(&gna_priv->file_list);
+
atomic_set(&gna_priv->request_count, 0);

mutex_init(&gna_priv->reqlist_lock);
diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
index 6345295ea589..e879ac045928 100644
--- a/drivers/misc/intel/gna/device.h
+++ b/drivers/misc/intel/gna/device.h
@@ -30,6 +30,11 @@ struct gna_file_private {
};

struct gna_private {
+ /* list of opened files */
+ struct list_head file_list;
+ /* protects file_list */
+ struct mutex flist_lock;
+
int index;

int recovery_timeout_jiffies;
diff --git a/drivers/misc/intel/gna/score.c b/drivers/misc/intel/gna/score.c
new file mode 100644
index 000000000000..a3be0d62393a
--- /dev/null
+++ b/drivers/misc/intel/gna/score.c
@@ -0,0 +1,291 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2017-2021 Intel Corporation
+
+#include <linux/device.h>
+#include <linux/idr.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+
+#include <uapi/misc/intel/gna.h>
+
+#include "device.h"
+#include "mem.h"
+#include "request.h"
+#include "score.h"
+
+int gna_validate_score_config(struct gna_compute_cfg *compute_cfg,
+ struct gna_file_private *file_priv)
+{
+ struct gna_private *gna_priv;
+ size_t buffers_size;
+
+ gna_priv = file_priv->gna_priv;
+
+ if (compute_cfg->gna_mode > GNA_MODE_XNN) {
+ dev_err(gna_dev(gna_priv), "invalid mode\n");
+ return -EINVAL;
+ }
+
+ if (compute_cfg->layer_count > gna_priv->info.max_layer_count) {
+ dev_err(gna_dev(gna_priv), "max layer count exceeded\n");
+ return -EINVAL;
+ }
+
+ if (compute_cfg->buffer_count == 0) {
+ dev_err(gna_dev(gna_priv), "no buffers\n");
+ return -EINVAL;
+ }
+
+ buffers_size = sizeof(struct gna_buffer) * compute_cfg->buffer_count;
+ if (!access_ok(u64_to_user_ptr(compute_cfg->buffers_ptr), buffers_size)) {
+ dev_err(gna_dev(gna_priv), "invalid buffers pointer\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int gna_do_patch_memory(struct gna_private *gna_priv, struct gna_memory_object *mo,
+ struct gna_memory_patch *patch, void *vaddr)
+{
+ size_t size;
+ void *dest;
+ u64 value;
+
+ value = patch->value;
+ size = patch->size;
+ dest = (u8 *)vaddr + patch->offset;
+ dev_dbg(gna_dev(gna_priv), "patch offset: %llu, size: %zu, value: %llu\n",
+ patch->offset, size, value);
+
+ switch (size) {
+ case 0:
+ return -EFAULT;
+ case sizeof(u8):
+ *((u8 *)dest) = (u8)value;
+ break;
+ case sizeof(u16):
+ *((u16 *)dest) = (u16)value;
+ break;
+ case sizeof(u32):
+ *((u32 *)dest) = (u32)value;
+ break;
+ case sizeof(u64):
+ *((u64 *)dest) = (u64)value;
+ break;
+ default:
+ // should never happen
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int gna_mem_patch_memory(struct gna_private *gna_priv, struct gna_buffer *buffer)
+{
+ struct gna_memory_patch *patch;
+ struct gna_memory_object *mo;
+ void *vaddr;
+ int ret = 0;
+ u32 i;
+
+ dev_dbg(gna_dev(gna_priv), "memory_id: %llu, patch_count, %llu\n",
+ buffer->memory_id, buffer->patch_count);
+
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, buffer->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+ if (!mo)
+ return -EINVAL;
+
+ mutex_lock(&mo->page_lock);
+ ret = mo->ops->get_pages(mo, buffer->offset, buffer->size);
+ mutex_unlock(&mo->page_lock);
+ if (ret)
+ return ret;
+
+ if (buffer->patch_count) {
+ vaddr = vm_map_ram(mo->pages, mo->num_pinned, 0);
+ if (!vaddr)
+ return -ENOMEM;
+
+ patch = (struct gna_memory_patch *)(uintptr_t)buffer->patches_ptr;
+ for (i = 0; i < buffer->patch_count; i++, patch++) {
+ ret = gna_do_patch_memory(gna_priv, mo, patch, vaddr + buffer->offset);
+ if (ret)
+ break;
+ }
+
+ kvfree((void *)(uintptr_t)buffer->patches_ptr);
+ buffer->patches_ptr = 0;
+ vm_unmap_ram(vaddr, mo->num_pages);
+
+ if (ret)
+ return ret;
+ }
+
+ gna_mmu_add(gna_priv, mo);
+
+ return ret;
+}
+
+static struct gna_buffer *gna_find_buffer(struct gna_buffer *buffer_list, u32 buffer_count,
+ u32 mmu_offset, u32 *memory_offset)
+{
+ struct gna_buffer *buffer;
+ u32 page_offset;
+ u32 memory_size;
+ u32 offset;
+ u32 i;
+
+ offset = 0;
+ for (i = 0; i < buffer_count; i++) {
+ buffer = buffer_list + i;
+ page_offset = buffer->offset & ~PAGE_MASK;
+ memory_size = round_up(page_offset + buffer->size, PAGE_SIZE);
+ if (mmu_offset < offset + memory_size) {
+ *memory_offset = offset;
+ return buffer;
+ }
+ offset += memory_size;
+ }
+
+ return NULL;
+}
+
+static int gna_copy_gmm_config(struct gna_private *gna_priv,
+ struct gna_buffer *buffer_list,
+ u32 buffer_count, u32 mmu_offset)
+{
+ struct gna_hw_descriptor *hwdesc;
+ struct gna_memory_object *mo;
+ struct gna_mmu_object *mmu;
+ struct gna_buffer *buffer;
+ u32 memory_offset;
+ u32 skip_offset;
+ u8 *gmm_desc;
+ void *vaddr;
+
+ mmu = &gna_priv->mmu;
+ hwdesc = mmu->hwdesc;
+
+ buffer = gna_find_buffer(buffer_list, buffer_count, mmu_offset, &memory_offset);
+ if (!buffer) {
+ dev_dbg(gna_dev(gna_priv), "buffer not found\n");
+ return -EINVAL;
+ }
+
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, buffer->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+ if (!mo) {
+ dev_dbg(gna_dev(gna_priv), "memory object not found\n");
+ return -EFAULT;
+ }
+
+ vaddr = vm_map_ram(mo->pages, mo->num_pinned, 0);
+ if (!vaddr) {
+ dev_dbg(gna_dev(gna_priv), "mapping failed\n");
+ return -EFAULT;
+ }
+
+ skip_offset = round_down(buffer->offset, PAGE_SIZE);
+ gmm_desc = (u8 *)vaddr + skip_offset + (mmu_offset - memory_offset);
+ memcpy(&hwdesc->xnn_config, gmm_desc, sizeof(struct gna_xnn_descriptor));
+ vm_unmap_ram(vaddr, mo->num_pages);
+
+ return 0;
+}
+
+int gna_score(struct gna_request *score_request)
+{
+ struct gna_xnn_descriptor *xnn_config;
+ struct gna_compute_cfg *compute_cfg;
+ struct gna_private *gna_priv;
+ struct gna_memory_object *mo;
+ struct gna_mmu_object *mmu;
+ struct gna_buffer *buffer;
+ bool mo_valid = true;
+ u64 buffer_count;
+ u32 desc_base;
+ int ret;
+ u64 i;
+
+ ret = 0;
+
+ gna_priv = score_request->gna_priv;
+
+ mmu = &gna_priv->mmu;
+ xnn_config = &mmu->hwdesc->xnn_config;
+ compute_cfg = &score_request->compute_cfg;
+
+ buffer = score_request->buffer_list;
+ buffer_count = score_request->buffer_count;
+ dev_dbg(gna_dev(gna_priv), "buffer count: %llu\n", buffer_count);
+ for (i = 0; i < buffer_count; i++, buffer++) {
+ dev_dbg(gna_dev(gna_priv), "patch count: %llu\n", buffer->patch_count);
+ ret = gna_mem_patch_memory(gna_priv, buffer);
+ if (ret)
+ goto err_put_pages;
+ }
+
+ switch (compute_cfg->gna_mode) {
+ case GNA_MODE_XNN:
+ dev_dbg(gna_dev(gna_priv), "xNN mode, labase: %d, lacount: %d\n",
+ compute_cfg->layer_base, compute_cfg->layer_count);
+ xnn_config->labase = compute_cfg->layer_base;
+ xnn_config->lacount = compute_cfg->layer_count;
+ break;
+ case GNA_MODE_GMM:
+ dev_dbg(gna_dev(gna_priv), "GMM mode, offset: %d\n", compute_cfg->layer_base);
+ ret = gna_copy_gmm_config(gna_priv, score_request->buffer_list,
+ buffer_count, compute_cfg->layer_base);
+ if (ret)
+ goto err_put_pages_decr;
+ break;
+ default:
+ ret = -EINVAL;
+ goto err_put_pages_decr;
+ }
+
+ desc_base = (u32)(mmu->hwdesc_dma >> PAGE_SHIFT);
+ gna_reg_write(gna_priv, GNA_MMIO_DESBASE, desc_base);
+
+ gna_start_scoring(gna_priv, compute_cfg);
+
+ return 0;
+
+err_put_pages_decr:
+ i--;
+ buffer--;
+err_put_pages:
+ do {
+ mutex_lock(&gna_priv->memidr_lock);
+ mo = idr_find(&gna_priv->memory_idr, buffer->memory_id);
+ mutex_unlock(&gna_priv->memidr_lock);
+ if (mo) {
+ mutex_lock(&mo->page_lock);
+ mo->ops->put_pages(mo);
+ mutex_unlock(&mo->page_lock);
+ } else {
+ mo_valid = false;
+ dev_warn(gna_dev(gna_priv), "memory object not found %llu\n",
+ buffer->memory_id);
+ }
+ buffer--;
+ } while (i--);
+
+ if (mo_valid) {
+ i = score_request->buffer_count;
+ while (i--)
+ kvfree((void *)(uintptr_t)score_request->buffer_list[i].patches_ptr);
+ kvfree(score_request->buffer_list);
+ }
+ score_request->buffer_list = NULL;
+ score_request->buffer_count = 0;
+
+ return ret;
+}
diff --git a/drivers/misc/intel/gna/score.h b/drivers/misc/intel/gna/score.h
new file mode 100644
index 000000000000..28aeab33e452
--- /dev/null
+++ b/drivers/misc/intel/gna/score.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2017-2021 Intel Corporation */
+
+#ifndef __GNA_SCORE_H__
+#define __GNA_SCORE_H__
+
+struct gna_file_private;
+struct gna_compute_cfg;
+struct gna_private;
+struct gna_request;
+
+int gna_validate_score_config(struct gna_compute_cfg *compute_cfg,
+ struct gna_file_private *file_priv);
+
+int gna_score(struct gna_request *score_request);
+
+#endif // __GNA_SCORE_H__
diff --git a/include/uapi/misc/intel/gna.h b/include/uapi/misc/intel/gna.h
index 473ce569d146..b531beb35bd9 100644
--- a/include/uapi/misc/intel/gna.h
+++ b/include/uapi/misc/intel/gna.h
@@ -10,6 +10,10 @@ extern "C" {

#include <linux/types.h>

+/* Operation modes */
+#define GNA_MODE_GMM 0
+#define GNA_MODE_XNN 1
+
#define GNA_STS_SCORE_COMPLETED _BITUL(0)
#define GNA_STS_STATISTICS_VALID _BITUL(3)
#define GNA_STS_PCI_MMU_ERR _BITUL(4)
--
2.28.0


2021-05-13 18:49:39

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 13/14] intel_gna: add file operations to a 'misc' device

On Thu, May 13, 2021 at 01:00:39PM +0200, Maciej Kwapulinski wrote:
> From: Tomasz Jankowski <[email protected]>
>
> Signed-off-by: Tomasz Jankowski <[email protected]>
> Tested-by: Savo Novakovic <[email protected]>
> Co-developed-by: Jianxun Zhang <[email protected]>
> Signed-off-by: Jianxun Zhang <[email protected]>
> Co-developed-by: Maciej Kwapulinski <[email protected]>
> Signed-off-by: Maciej Kwapulinski <[email protected]>
> ---
> drivers/misc/intel/gna/device.c | 60 +++++++++++++++++++++++++++++++--
> 1 file changed, 57 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
> index 1e6345a8325b..c882055de8cf 100644
> --- a/drivers/misc/intel/gna/device.c
> +++ b/drivers/misc/intel/gna/device.c
> @@ -4,7 +4,9 @@
> #include <linux/device.h>
> #include <linux/dma-mapping.h>
> #include <linux/interrupt.h>
> +#include <linux/fs.h>
> #include <linux/module.h>
> +#include <linux/slab.h>
>
> #include <uapi/misc/intel/gna.h>
>
> @@ -20,16 +22,68 @@ module_param(recovery_timeout, int, 0644);
> MODULE_PARM_DESC(recovery_timeout, "Recovery timeout in seconds");
> #endif
>
> -struct file;
> -
> static int gna_open(struct inode *inode, struct file *f)
> {
> - return -EPERM;
> + struct gna_file_private *file_priv;
> + struct gna_private *gna_priv;
> +
> + gna_priv = container_of(f->private_data, struct gna_private, misc);
> +
> + file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
> + if (!file_priv)
> + return -ENOMEM;
> +
> + file_priv->fd = f;
> + file_priv->gna_priv = gna_priv;
> +
> + mutex_init(&file_priv->memlist_lock);
> + INIT_LIST_HEAD(&file_priv->memory_list);
> +
> + INIT_LIST_HEAD(&file_priv->flist);
> +
> + mutex_lock(&gna_priv->flist_lock);
> + list_add_tail(&file_priv->flist, &gna_priv->file_list);
> + mutex_unlock(&gna_priv->flist_lock);
> +
> + f->private_data = file_priv;
> +
> + return 0;
> +}
> +
> +static int gna_release(struct inode *inode, struct file *f)
> +{
> + struct gna_memory_object *iter_mo, *temp_mo;
> + struct gna_file_private *file_priv;
> + struct gna_private *gna_priv;
> +
> + /* free all memory objects created by that file */
> + file_priv = (struct gna_file_private *)f->private_data;
> + gna_priv = file_priv->gna_priv;
> +
> + mutex_lock(&file_priv->memlist_lock);
> + list_for_each_entry_safe(iter_mo, temp_mo, &file_priv->memory_list, file_mem_list) {
> + queue_work(gna_priv->request_wq, &iter_mo->work);
> + wait_event(iter_mo->waitq, true);
> + gna_memory_free(gna_priv, iter_mo);
> + }
> + mutex_unlock(&file_priv->memlist_lock);
> +
> + gna_delete_file_requests(f, gna_priv);
> +
> + mutex_lock(&gna_priv->flist_lock);
> + list_del_init(&file_priv->flist);
> + mutex_unlock(&gna_priv->flist_lock);
> + kfree(file_priv);
> + f->private_data = NULL;
> +
> + return 0;
> }
>
> static const struct file_operations gna_file_ops = {
> .owner = THIS_MODULE,
> .open = gna_open,
> + .release = gna_release,
> + .unlocked_ioctl = gna_ioctl,

Wait, where's the ioctl? You added it earlier in the series?

gotta go dig now...


greg k-h

2021-05-13 18:49:50

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 11/14] intel_gna: add ioctl handler

On Thu, May 13, 2021 at 01:00:37PM +0200, Maciej Kwapulinski wrote:
> From: Tomasz Jankowski <[email protected]>
>
> Add ioctl handler into GNA driver.
> The ioctl interface provides the ability to do the following:
> - Map and unmap memory buffers for GNA computation requests.
> - Retrieve capabilities of the underlying GNA IP.
> - Submit GNA computation requests.
> - Request notification of scoring completion.

Do you have a pointer to the userspace code that uses this ioctl?
That's kind of required here, otherwise we have no idea how this all
works.

>
> Signed-off-by: Tomasz Jankowski <[email protected]>
> Tested-by: Savo Novakovic <[email protected]>
> Co-developed-by: Jianxun Zhang <[email protected]>
> Signed-off-by: Jianxun Zhang <[email protected]>
> Co-developed-by: Maciej Kwapulinski <[email protected]>
> Signed-off-by: Maciej Kwapulinski <[email protected]>
> ---
> drivers/misc/intel/gna/Kbuild | 2 +-
> drivers/misc/intel/gna/device.c | 47 ++++++
> drivers/misc/intel/gna/device.h | 2 +
> drivers/misc/intel/gna/ioctl.c | 257 ++++++++++++++++++++++++++++++++
> drivers/misc/intel/gna/ioctl.h | 11 ++
> include/uapi/misc/intel/gna.h | 53 +++++++
> 6 files changed, 371 insertions(+), 1 deletion(-)
> create mode 100644 drivers/misc/intel/gna/ioctl.c
> create mode 100644 drivers/misc/intel/gna/ioctl.h
>
> diff --git a/drivers/misc/intel/gna/Kbuild b/drivers/misc/intel/gna/Kbuild
> index 38ff97360ed8..745a192a7304 100644
> --- a/drivers/misc/intel/gna/Kbuild
> +++ b/drivers/misc/intel/gna/Kbuild
> @@ -1,5 +1,5 @@
> # SPDX-License-Identifier: GPL-2.0-only
>
> -intel_gna-y := device.o hw.o mem.o pci.o request.o score.o
> +intel_gna-y := device.o hw.o ioctl.o mem.o pci.o request.o score.o
>
> obj-$(CONFIG_INTEL_GNA) += intel_gna.o
> diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
> index 75d8e1675485..0e31b8c6bb70 100644
> --- a/drivers/misc/intel/gna/device.c
> +++ b/drivers/misc/intel/gna/device.c
> @@ -6,8 +6,11 @@
> #include <linux/interrupt.h>
> #include <linux/module.h>
>
> +#include <uapi/misc/intel/gna.h>
> +
> #include "device.h"
> #include "hw.h"
> +#include "ioctl.h"
> #include "request.h"
>
> static int recovery_timeout = 60;
> @@ -145,6 +148,50 @@ int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem
> return 0;
> }
>
> +static u32 gna_device_type_by_hwid(u32 hwid)
> +{
> + switch (hwid) {
> + case GNA_DEV_HWID_CNL:
> + return GNA_DEV_TYPE_0_9;
> + case GNA_DEV_HWID_GLK:
> + case GNA_DEV_HWID_EHL:
> + case GNA_DEV_HWID_ICL:
> + return GNA_DEV_TYPE_1_0;
> + case GNA_DEV_HWID_JSL:
> + case GNA_DEV_HWID_TGL:
> + case GNA_DEV_HWID_RKL:
> + return GNA_DEV_TYPE_2_0;
> + case GNA_DEV_HWID_ADL:
> + case GNA_DEV_HWID_RPL:
> + return GNA_DEV_TYPE_3_0;
> + default:
> + return 0;
> + }
> +}
> +
> +int gna_getparam(struct gna_private *gna_priv, union gna_parameter *param)
> +{
> + switch (param->in.id) {
> + case GNA_PARAM_DEVICE_ID:
> + param->out.value = gna_priv->info.hwid;
> + break;

Why do you need an ioctl to get the device id? What's wrong with sysfs?
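
For reference, a read-only sysfs attribute would only be a few lines. This is an illustrative sketch, not the driver's code (kernel-side, not compiled here; the dev_set_drvdata()/dev_get_drvdata() wiring and attribute registration are assumed, only info.hwid comes from this patch):

```c
/* Hypothetical sketch: exposing the hardware id as a read-only sysfs
 * attribute instead of an ioctl parameter. Assumes driver data was set
 * with dev_set_drvdata() during probe. */
static ssize_t hwid_show(struct device *dev, struct device_attribute *attr,
			 char *buf)
{
	struct gna_private *gna_priv = dev_get_drvdata(dev);

	return sysfs_emit(buf, "%#x\n", gna_priv->info.hwid);
}
static DEVICE_ATTR_RO(hwid);
```

Userspace then reads the value with a plain `cat /sys/.../hwid`, no ioctl needed.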

> + case GNA_PARAM_RECOVERY_TIMEOUT:
> + param->out.value = jiffies_to_msecs(gna_priv->recovery_timeout_jiffies) / 1000;
> + break;
> + case GNA_PARAM_INPUT_BUFFER_S:
> + param->out.value = gna_priv->hw_info.in_buf_s;
> + break;
> + case GNA_PARAM_DEVICE_TYPE:
> + param->out.value = gna_device_type_by_hwid(gna_priv->info.hwid);

Same here, why isn't this a sysfs file?

> + break;
> + default:
> + dev_err(gna_dev(gna_priv), "unknown parameter id %llu\n", param->in.id);

Userspace can cause syslog DoS? Not nice :(

Please just eat the error and move on.

> + return -EINVAL;

Wrong error value for invalid ioctl value.


> + }
> +
> + return 0;
> +}
> +
> MODULE_AUTHOR("Intel Corporation");
> MODULE_DESCRIPTION("Intel(R) Gaussian & Neural Accelerator (Intel(R) GNA) Driver");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/misc/intel/gna/device.h b/drivers/misc/intel/gna/device.h
> index d3c86d649b5c..75784882f57c 100644
> --- a/drivers/misc/intel/gna/device.h
> +++ b/drivers/misc/intel/gna/device.h
> @@ -17,6 +17,7 @@
> #define GNA_DV_NAME "intel_gna"
>
> struct workqueue_struct;
> +union gna_parameter;
> struct device;
> struct file;
>
> @@ -71,6 +72,7 @@ struct gna_private {
> };
>
> int gna_probe(struct device *parent, struct gna_dev_info *dev_info, void __iomem *iobase, int irq);
> +int gna_getparam(struct gna_private *gna_priv, union gna_parameter *param);
>
> static inline u32 gna_reg_read(struct gna_private *gna_priv, u32 reg)
> {
> diff --git a/drivers/misc/intel/gna/ioctl.c b/drivers/misc/intel/gna/ioctl.c
> new file mode 100644
> index 000000000000..4a90135b3cc6
> --- /dev/null
> +++ b/drivers/misc/intel/gna/ioctl.c
> @@ -0,0 +1,257 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright(c) 2017-2021 Intel Corporation
> +
> +#include <linux/device.h>
> +#include <linux/fs.h>
> +#include <linux/idr.h>
> +#include <linux/jiffies.h>
> +#include <linux/kref.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/uaccess.h>
> +#include <linux/wait.h>
> +#include <linux/workqueue.h>
> +
> +#include <uapi/misc/intel/gna.h>
> +
> +#include "device.h"
> +#include "ioctl.h"
> +#include "mem.h"
> +#include "request.h"
> +#include "score.h"
> +
> +static int gna_ioctl_score(struct gna_file_private *file_priv, void __user *argptr)
> +{
> + union gna_compute score_args;
> + struct gna_private *gna_priv;
> + u64 request_id;
> + int ret;
> +
> + gna_priv = file_priv->gna_priv;
> +
> + if (copy_from_user(&score_args, argptr, sizeof(score_args))) {
> + dev_err(gna_dev(gna_priv), "could not copy score ioctl config from user\n");

No need for errors that userspace can cause, please drop, you already
got a message if there needed to be one.

> + return -EFAULT;
> + }
> +
> + ret = gna_validate_score_config(&score_args.in.config, file_priv);

This function is in a different patch? Now I have to dig through that
to try to figure out if you really are validating the data properly?
That's just mean to reviewers, would you want to review code like this?
Please fix.

> + if (ret) {
> + dev_err(gna_dev(gna_priv), "request not valid\n");

Same here, clean up all error reporting in your ioctl to be none at all
please.

> + return ret;
> + }
> +
> + ret = gna_enqueue_request(&score_args.in.config, file_priv, &request_id);

Same here, where is this function to review?

Same for all your other ioctl handlers, please fix up, this is rough to
review...

greg k-h

2021-05-13 21:31:49

by Matthew Wilcox

Subject: Re: [PATCH v3 11/14] intel_gna: add ioctl handler

On Thu, May 13, 2021 at 01:00:37PM +0200, Maciej Kwapulinski wrote:
> +#include <linux/idr.h>

Please don't use the IDR in new code. Use the XArray instead.

> + mutex_lock(&gna_priv->memidr_lock);
> + mo = idr_find(&gna_priv->memory_idr, memory_id);
> + mutex_unlock(&gna_priv->memidr_lock);

You don't need your own lock with the XArray.
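
For context, the suggested conversion is roughly the following (a kernel-side sketch, not compiled here; field names mirror the idr-based originals in the patch):

```c
#include <linux/xarray.h>

/* in struct gna_private, the XArray replaces both memory_idr and
 * memidr_lock -- it serializes access with its own internal lock */
struct xarray memory_xa;

/* init; XA_FLAGS_ALLOC enables xa_alloc() id allocation */
xa_init_flags(&gna_priv->memory_xa, XA_FLAGS_ALLOC);

/* id allocation, replacing idr_alloc() */
ret = xa_alloc(&gna_priv->memory_xa, &memory_id, mo,
	       xa_limit_32b, GFP_KERNEL);

/* lookup, replacing the mutex_lock()/idr_find()/mutex_unlock() triple */
mo = xa_load(&gna_priv->memory_xa, memory_id);
```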


2021-05-13 22:39:20

by Maciej Kwapulinski

Subject: Re: [PATCH v3 12/14] intel_gna: add a 'misc' device


Greg Kroah-Hartman <[email protected]> writes:

> On Thu, May 13, 2021 at 01:00:38PM +0200, Maciej Kwapulinski wrote:
>> The new 'misc' device is the node for applications in user space to
>> interact with the driver.
>>
>> Signed-off-by: Maciej Kwapulinski <[email protected]>
>> Tested-by: Savo Novakovic <[email protected]>
>> ---
>> drivers/misc/intel/gna/device.c | 52 +++++++++++++++++++++++++++++++--
>> drivers/misc/intel/gna/device.h | 11 +++----
>> 2 files changed, 55 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/misc/intel/gna/device.c b/drivers/misc/intel/gna/device.c
>> index 0e31b8c6bb70..1e6345a8325b 100644
>> --- a/drivers/misc/intel/gna/device.c
>> +++ b/drivers/misc/intel/gna/device.c
>> @@ -20,6 +20,18 @@ module_param(recovery_timeout, int, 0644);
>> MODULE_PARM_DESC(recovery_timeout, "Recovery timeout in seconds");
>> #endif
>>
>> +struct file;
>> +
>> +static int gna_open(struct inode *inode, struct file *f)
>> +{
>> + return -EPERM;
>> +}
>
> That sucks, why have an open that does nothing but fail?

the next patch provides the complete implementation of gna_open(); here it's
just a safeguard in case someone accidentally runs GNA in the middle of the patch series

>
>> +
>> +static const struct file_operations gna_file_ops = {
>> + .owner = THIS_MODULE,
>> + .open = gna_open,
>> +};
>> +
>> static void gna_devm_idr_destroy(void *data)
>> {
>> struct idr *idr = data;
>> @@ -27,6 +39,36 @@ static void gna_devm_idr_destroy(void *data)
>> idr_destroy(idr);
>> }
>>
>> +static void gna_devm_deregister_misc_dev(void *data)
>
> Why is this a void *?

it is passed to the devm_add_action() API.

>
> This isn't windows, use real pointer types everywhere in the kernel
> please.
>
>> +{
>> + struct miscdevice *misc = data;
>> +
>> + misc_deregister(misc);
>> +}
>> +
>> +static int gna_devm_register_misc_dev(struct device *parent, struct miscdevice *misc)
>> +{
>> + int ret;
>> +
>> + ret = misc_register(misc);
>> + if (ret) {
>> + dev_err(parent, "misc device %s registering failed. errcode: %d\n",
>> + misc->name, ret);
>> + gna_devm_deregister_misc_dev(misc);
>> + } else {
>> + dev_dbg(parent, "device: %s registered\n",
>> + misc->name);
>
> You have loads of debugging in this driver still, is it really needed?
>
>> + }
>> +
>> + ret = devm_add_action(parent, gna_devm_deregister_misc_dev, misc);
>
> Why do you need this?

I'd like to avoid having a failure path in gna_probe() at all.
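
For what it's worth, there is a helper for exactly this pattern; a sketch (kernel-side, not compiled here):

```c
/* devm_add_action_or_reset() registers the cleanup and, if the
 * registration itself fails, runs the action immediately and returns
 * the error -- so the probe path still needs no manual unwinding: */
ret = devm_add_action_or_reset(parent, gna_devm_deregister_misc_dev, misc);
if (ret)
	return ret;
```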

>
>
> thanks,
>
> greg k-h


2021-05-14 10:14:30

by Maciej Kwapulinski

Subject: Re: [PATCH v3 11/14] intel_gna: add ioctl handler


Greg Kroah-Hartman <[email protected]> writes:

> On Thu, May 13, 2021 at 01:00:37PM +0200, Maciej Kwapulinski wrote:
>> From: Tomasz Jankowski <[email protected]>
>>
>> Add ioctl handler into GNA driver.
>> The ioctl interface provides the ability to do the following:
>> - Map and unmap memory buffers for GNA computation requests.
>> - Retrieve capabilities of the underlying GNA IP.
>> - Submit GNA computation requests.
>> - Request notification of scoring completion.
>
> Do you have a pointer to the userspace code that uses this ioctl?
> That's kind of required here, otherwise we have no idea how this all
> works.
>

yes, it's available at the following link:

https://github.com/intel/gna

regards,
Maciej

2021-05-14 10:36:02

by Arnd Bergmann

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
<[email protected]> wrote:
> On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> > Dear kernel maintainers,
> >
> > This submission is a kernel driver to support Intel(R) Gaussian & Neural
> > Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> > available on multiple Intel platforms. AI developers and users can offload
> > continuous inference workloads to an Intel(R) GNA device in order to free
> > processor resources and save power. Noise reduction and speech recognition
> > are the examples of the workloads Intel(R) GNA deals with while its usage
> > is not limited to the two.
>
> How does this compare with the "nnpi" driver being proposed here:
> https://lore.kernel.org/r/[email protected]
>
> Please work with those developers to share code and userspace api and
> tools. Having the community review two totally different apis and
> drivers for the same type of functionality from the same company is
> totally wasteful of our time and energy.

Agreed, but I think we should go further than this and work towards a
subsystem across companies for machine learning and neural networks
accelerators for both inferencing and training.

We have support for Intel habanalabs hardware in drivers/misc, and there are
countless hardware solutions out of tree that would hopefully go the same
way with an upstream submission and open source user space, including

- Intel/Mobileye EyeQ
- Intel/Movidius Keembay
- Nvidia NVDLA
- Gyrfalcon Lightspeeur
- Apple Neural Engine
- Google TPU
- Arm Ethos

plus many more that are somewhat less likely to gain fully open source
driver stacks.

Arnd

2021-05-14 15:01:32

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 11/14] intel_gna: add ioctl handler

On Fri, May 14, 2021 at 10:20:42AM +0200, Maciej Kwapulinski wrote:
>
> Greg Kroah-Hartman <[email protected]> writes:
>
> > On Thu, May 13, 2021 at 01:00:37PM +0200, Maciej Kwapulinski wrote:
> >> From: Tomasz Jankowski <[email protected]>
> >>
> >> Add ioctl handler into GNA driver.
> >> The ioctl interface provides the ability to do the following:
> >> - Map and unmap memory buffers for GNA computation requests.
> >> - Retrieve capabilities of the underlying GNA IP.
> >> - Submit GNA computation requests.
> >> - Request notification of scoring completion.
> >
> > Do you have a pointer to the userspace code that uses this ioctl?
> > That's kind of required here, otherwise we have no idea how this all
> > works.
> >
>
> yes, it's available at the following link:
>
> https://github.com/intel/gna

Then that needs to go here in this changelog text, right?

2021-05-14 15:02:46

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> Dear kernel maintainers,
>
> This submission is a kernel driver to support Intel(R) Gaussian & Neural
> Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> available on multiple Intel platforms. AI developers and users can offload
> continuous inference workloads to an Intel(R) GNA device in order to free
> processor resources and save power. Noise reduction and speech recognition
> are the examples of the workloads Intel(R) GNA deals with while its usage
> is not limited to the two.

How does this compare with the "nnpi" driver being proposed here:
https://lore.kernel.org/r/[email protected]

Please work with those developers to share code and userspace api and
tools. Having the community review two totally different apis and
drivers for the same type of functionality from the same company is
totally wasteful of our time and energy.

thanks,

greg k-h

2021-05-14 23:07:03

by Maciej Kwapulinski

Subject: Re: [PATCH v3 11/14] intel_gna: add ioctl handler


Hillf Danton <[email protected]> writes:

> On Thu, 13 May 2021 13:00:37 +0200 Tomasz Jankowski wrote:
Greg, Matthew, Hillf,

thank you for your comments.

Next week I'll start applying them to the code.

regards,
Maciej

2021-05-17 08:52:13

by Daniel Vetter

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, May 17, 2021 at 10:00 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Mon, May 17, 2021 at 09:40:53AM +0200, Daniel Vetter wrote:
> > On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> > > On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > > On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> > > > > Dear kernel maintainers,
> > > > >
> > > > > This submission is a kernel driver to support Intel(R) Gaussian & Neural
> > > > > Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> > > > > available on multiple Intel platforms. AI developers and users can offload
> > > > > continuous inference workloads to an Intel(R) GNA device in order to free
> > > > > processor resources and save power. Noise reduction and speech recognition
> > > > > are the examples of the workloads Intel(R) GNA deals with while its usage
> > > > > is not limited to the two.
> > > >
> > > > How does this compare with the "nnpi" driver being proposed here:
> > > > https://lore.kernel.org/r/[email protected]
> > > >
> > > > Please work with those developers to share code and userspace api and
> > > > tools. Having the community review two totally different apis and
> > > > drivers for the same type of functionality from the same company is
> > > > totally wasteful of our time and energy.
> > >
> > > Agreed, but I think we should go further than this and work towards a
> > > subsystem across companies for machine learning and neural networks
> > > accelerators for both inferencing and training.
> >
> > We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> > think G as in General, not Graphics.
> >
> > > We have support for Intel habanalabs hardware in drivers/misc, and there are
> > > countless hardware solutions out of tree that would hopefully go the same
> > > way with an upstream submission and open source user space, including
> > >
> > > - Intel/Mobileye EyeQ
> > > - Intel/Movidius Keembay
> > > - Nvidia NVDLA
> > > - Gyrfalcon Lightspeeur
> > > - Apple Neural Engine
> > > - Google TPU
> > > - Arm Ethos
> > >
> > > plus many more that are somewhat less likely to gain fully open source
> > > driver stacks.
> >
> > We also had this entire discussion 2 years ago with habanalabs. The
> > hang-up is that drivers/gpu folks require fully open source userspace,
> > including compiler and anything else you need to actually use the chip.
> > Greg doesn't, he's happy if all he has is the runtime library with some
> > tests.

I guess we're really going to beat this horse into pulp ... oh well.

> All you need is a library, what you write on top of that is always
> application-specific, so how can I ask for "more"?

This is like accepting a new cpu port, where all you require is that
the libc port is open source, but the cpu compiler is totally fine as
a blob (doable with llvm now being supported). It makes no sense at
all, at least to people who have worked with accelerators like this
before.

We are not requiring that applications are open. We're only requiring
that at least one of the compilers you need (no need to open the fully
optimized one with all the magic sauce) to create any kind of
applications is open, because without that you can't use the device,
you can't analyze the stack, and you have no idea at all about what
exactly it is you're merging. With these devices, the uapi visible in
include/uapi is the smallest part of the interface exposed to
userspace.

> > These two drivers here look a lot more like classic gpus than habanalabs
> > did, at least from a quick look they operate with explicit buffer
> > allocations/registration model. So even more reasons to just reuse all the
> > stuff we have already. But also I don't expect these drivers here to come
> > with open compilers, they never do, not initially at least before you
> > started talking with the vendor. Hence I expect there'll be more
> > drivers/totally-not-drm acceleration subsystem nonsense.
>
> As these are both from Intel, why aren't they using the same open
> compiler? Why aren't they using the same userspace api as well? What's
> preventing them from talking to each other about this and not forcing
> the community (i.e. outsiders) from being the one to force this to
> happen?

I'm unfortunately not the CEO of this company. Also you're the one who
keeps accepting drivers that the accel folks (aka dri-devel community)
said shouldn't be merged, so my internal bargaining power is zero to
force something reasonable here. So please don't blame me for this
mess, this is yours entirely.

> > Anyway this horse has been thoroughly beaten to death and more, the
> > agreement is that accel drivers in drivers/misc must not use any gpu
> > stuff, so that drivers/gpu people don't end up in a prickly situation they
> > never signed up for. E.g. I removed some code sharing from habanalabs.
> > This means interop between gpu and nn/ai drivers will be no-go until this
> > is resolved, but *shrug*.
>
> I'm all for making this unified, but these are not really devices doing
> graphics so putting it all into DRM always feels wrong to me. The fact
> that people abuse GPU devices for non-graphics usages would indicate to
> me that that code should be moving _out_ of the drm subsystem :)

Like I said, if the 'g' really annoys you that much, feel free to send
in a patch to rename drivers/gpu to drivers/xpu.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-05-17 08:59:19

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, May 17, 2021 at 10:49:09AM +0200, Daniel Vetter wrote:
> On Mon, May 17, 2021 at 10:00 AM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Mon, May 17, 2021 at 09:40:53AM +0200, Daniel Vetter wrote:
> > > On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> > > > On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> > > > <[email protected]> wrote:
> > > > > On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> > > > > > Dear kernel maintainers,
> > > > > >
> > > > > > This submission is a kernel driver to support Intel(R) Gaussian & Neural
> > > > > > Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> > > > > > available on multiple Intel platforms. AI developers and users can offload
> > > > > > continuous inference workloads to an Intel(R) GNA device in order to free
> > > > > > processor resources and save power. Noise reduction and speech recognition
> > > > > > are the examples of the workloads Intel(R) GNA deals with while its usage
> > > > > > is not limited to the two.
> > > > >
> > > > > How does this compare with the "nnpi" driver being proposed here:
> > > > > https://lore.kernel.org/r/[email protected]
> > > > >
> > > > > Please work with those developers to share code and userspace api and
> > > > > tools. Having the community review two totally different apis and
> > > > > drivers for the same type of functionality from the same company is
> > > > > totally wasteful of our time and energy.
> > > >
> > > > Agreed, but I think we should go further than this and work towards a
> > > > subsystem across companies for machine learning and neural networks
> > > > accelerators for both inferencing and training.
> > >
> > > We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> > > think G as in General, not Graphics.
> > >
> > > > We have support for Intel habanalabs hardware in drivers/misc, and there are
> > > > countless hardware solutions out of tree that would hopefully go the same
> > > > way with an upstream submission and open source user space, including
> > > >
> > > > - Intel/Mobileye EyeQ
> > > > - Intel/Movidius Keembay
> > > > - Nvidia NVDLA
> > > > - Gyrfalcon Lightspeeur
> > > > - Apple Neural Engine
> > > > - Google TPU
> > > > - Arm Ethos
> > > >
> > > > plus many more that are somewhat less likely to gain fully open source
> > > > driver stacks.
> > >
> > > We also had this entire discussion 2 years ago with habanalabs. The
> > > hang-up is that drivers/gpu folks require fully open source userspace,
> > > including compiler and anything else you need to actually use the chip.
> > > Greg doesn't, he's happy if all he has is the runtime library with some
> > > tests.
>
> I guess we're really going to beat this horse into pulp ... oh well.
>
> > All you need is a library, what you write on top of that is always
> > application-specific, so how can I ask for "more"?
>
> This is like accepting a new cpu port, where all you require is that
> the libc port is open source, but the cpu compiler is totally fine as
> a blob (doable with llvm now being supported). It makes no sense at
> all, at least to people who have worked with accelerators like this
> before.
>
> We are not requiring that applications are open. We're only requiring
> that at least one of the compilers you need (no need to open the fully
> optimized one with all the magic sauce) to create any kind of
> applications is open, because without that you can't use the device,
> you can't analyze the stack, and you have no idea at all about what
> exactly it is you're merging. With these devices, the uapi visible in
> include/uapi is the smallest part of the interface exposed to
> userspace.

Ok, sorry, I was not aware that the habanalabs compiler was not
available to all under an open source license. All I was trying to
enforce was that the library to use the kernel api was open so that
anyone could use it. Trying to enforce compiler requirements like this
might feel like a bit of a reach, as the CPU on the hardware really
doesn't fall under the license of the operating system running on this
CPU over here :)

thanks,

greg k-h

2021-05-17 12:59:49

by Daniel Vetter

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> <[email protected]> wrote:
> > On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> > > Dear kernel maintainers,
> > >
> > > This submission is a kernel driver to support Intel(R) Gaussian & Neural
> > > Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> > > available on multiple Intel platforms. AI developers and users can offload
> > > continuous inference workloads to an Intel(R) GNA device in order to free
> > > processor resources and save power. Noise reduction and speech recognition
> > > are the examples of the workloads Intel(R) GNA deals with while its usage
> > > is not limited to the two.
> >
> > How does this compare with the "nnpi" driver being proposed here:
> > https://lore.kernel.org/r/[email protected]
> >
> > Please work with those developers to share code and userspace api and
> > tools. Having the community review two totally different apis and
> > drivers for the same type of functionality from the same company is
> > totally wasteful of our time and energy.
>
> Agreed, but I think we should go further than this and work towards a
> subsystem across companies for machine learning and neural networks
> accelerators for both inferencing and training.

We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
think G as in General, not Graphics.

> We have support for Intel habanalabs hardware in drivers/misc, and there are
> countless hardware solutions out of tree that would hopefully go the same
> way with an upstream submission and open source user space, including
>
> - Intel/Mobileye EyeQ
> - Intel/Movidius Keembay
> - Nvidia NVDLA
> - Gyrfalcon Lightspeeur
> - Apple Neural Engine
> - Google TPU
> - Arm Ethos
>
> plus many more that are somewhat less likely to gain fully open source
> driver stacks.

We also had this entire discussion 2 years ago with habanalabs. The
hang-up is that drivers/gpu folks require fully open source userspace,
including compiler and anything else you need to actually use the chip.
Greg doesn't, he's happy if all he has is the runtime library with some
tests.

These two drivers here look a lot more like classic gpus than habanalabs
did, at least from a quick look they operate with explicit buffer
allocations/registration model. So even more reasons to just reuse all the
stuff we have already. But also I don't expect these drivers here to come
with open compilers, they never do, not initially at least before you
started talking with the vendor. Hence I expect there'll be more
drivers/totally-not-drm acceleration subsystem nonsense.

Anyway this horse has been thoroughly beaten to death and more, the
agreement is that accel drivers in drivers/misc must not use any gpu
stuff, so that drivers/gpu people don't end up in a prickly situation they
never signed up for. E.g. I removed some code sharing from habanalabs.
This means interop between gpu and nn/ai drivers will be no-go until this
is resolved, but *shrug*.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-05-17 13:40:54

by Greg Kroah-Hartman

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, May 17, 2021 at 09:40:53AM +0200, Daniel Vetter wrote:
> On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> > On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> > <[email protected]> wrote:
> > > On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> > > > Dear kernel maintainers,
> > > >
> > > > This submission is a kernel driver to support Intel(R) Gaussian & Neural
> > > > Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> > > > available on multiple Intel platforms. AI developers and users can offload
> > > > continuous inference workloads to an Intel(R) GNA device in order to free
> > > > processor resources and save power. Noise reduction and speech recognition
> > > > are the examples of the workloads Intel(R) GNA deals with while its usage
> > > > is not limited to the two.
> > >
> > > How does this compare with the "nnpi" driver being proposed here:
> > > https://lore.kernel.org/r/[email protected]
> > >
> > > Please work with those developers to share code and userspace api and
> > > tools. Having the community review two totally different apis and
> > > drivers for the same type of functionality from the same company is
> > > totally wasteful of our time and energy.
> >
> > Agreed, but I think we should go further than this and work towards a
> > subsystem across companies for machine learning and neural networks
> > accelerators for both inferencing and training.
>
> We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> think G as in General, not Graphics.
>
> > We have support for Intel habanalabs hardware in drivers/misc, and there are
> > countless hardware solutions out of tree that would hopefully go the same
> > way with an upstream submission and open source user space, including
> >
> > - Intel/Mobileye EyeQ
> > - Intel/Movidius Keembay
> > - Nvidia NVDLA
> > - Gyrfalcon Lightspeeur
> > - Apple Neural Engine
> > - Google TPU
> > - Arm Ethos
> >
> > plus many more that are somewhat less likely to gain fully open source
> > driver stacks.
>
> We also had this entire discussion 2 years ago with habanalabs. The
> hang-up is that drivers/gpu folks require fully open source userspace,
> including compiler and anything else you need to actually use the chip.
> Greg doesn't, he's happy if all he has is the runtime library with some
> tests.

All you need is a library, what you write on top of that is always
application-specific, so how can I ask for "more"?

> These two drivers here look a lot more like classic gpus than habanalabs
> did, at least from a quick look they operate with explicit buffer
> allocations/registration model. So even more reasons to just reuse all the
> stuff we have already. But also I don't expect these drivers here to come
> with open compilers, they never do, not initially at least before you
> started talking with the vendor. Hence I expect there'll be more
> drivers/totally-not-drm acceleration subsystem nonsense.

As these are both from Intel, why aren't they using the same open
compiler? Why aren't they using the same userspace api as well? What's
preventing them from talking to each other about this and not forcing
the community (i.e. outsiders) from being the one to force this to
happen?

> Anyway this horse has been thoroughly beaten to death and more, the
> agreement is that accel drivers in drivers/misc must not use any gpu
> stuff, so that drivers/gpu people don't end up in a prickly situation they
> never signed up for. E.g. I removed some code sharing from habanalabs.
> This means interop between gpu and nn/ai drivers will be no-go until this
> is resolved, but *shrug*.

I'm all for making this unified, but these are not really devices doing
graphics so putting it all into DRM always feels wrong to me. The fact
that people abuse GPU devices for non-graphics usages would indicate to
me that that code should be moving _out_ of the drm subsystem :)

thanks,

greg k-h

2021-05-17 13:53:50

by Daniel Vetter

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, May 17, 2021 at 10:55 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Mon, May 17, 2021 at 10:49:09AM +0200, Daniel Vetter wrote:
> > On Mon, May 17, 2021 at 10:00 AM Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > On Mon, May 17, 2021 at 09:40:53AM +0200, Daniel Vetter wrote:
> > > > On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> > > > > On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> > > > > <[email protected]> wrote:
> > > > > > On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> > > > > > > Dear kernel maintainers,
> > > > > > >
> > > > > > > This submission is a kernel driver to support Intel(R) Gaussian & Neural
> > > > > > > Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> > > > > > > available on multiple Intel platforms. AI developers and users can offload
> > > > > > > continuous inference workloads to an Intel(R) GNA device in order to free
> > > > > > > processor resources and save power. Noise reduction and speech recognition
> > > > > > > are the examples of the workloads Intel(R) GNA deals with while its usage
> > > > > > > is not limited to the two.
> > > > > >
> > > > > > How does this compare with the "nnpi" driver being proposed here:
> > > > > > https://lore.kernel.org/r/[email protected]
> > > > > >
> > > > > > Please work with those developers to share code and userspace api and
> > > > > > tools. Having the community review two totally different apis and
> > > > > > drivers for the same type of functionality from the same company is
> > > > > > totally wasteful of our time and energy.
> > > > >
> > > > > Agreed, but I think we should go further than this and work towards a
> > > > > subsystem across companies for machine learning and neural networks
> > > > > accelerators for both inferencing and training.
> > > >
> > > > We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> > > > think G as in General, not Graphisc.
> > > >
> > > > > We have support for Intel habanalabs hardware in drivers/misc, and there are
> > > > > countless hardware solutions out of tree that would hopefully go the same
> > > > > way with an upstream submission and open source user space, including
> > > > >
> > > > > - Intel/Mobileye EyeQ
> > > > > - Intel/Movidius Keembay
> > > > > - Nvidia NVDLA
> > > > > - Gyrfalcon Lightspeeur
> > > > > - Apple Neural Engine
> > > > > - Google TPU
> > > > > - Arm Ethos
> > > > >
> > > > > plus many more that are somewhat less likely to gain fully open source
> > > > > driver stacks.
> > > >
> > > > We also had this entire discussion 2 years ago with habanalabs. The
> > > > hang-up is that drivers/gpu folks require fully open source userspace,
> > > > including compiler and anything else you need to actually use the chip.
> > > > Greg doesn't, he's happy if all he has is the runtime library with some
> > > > tests.
> >
> > I guess we're really going to beat this horse into pulp ... oh well.
> >
> > > All you need is a library, what you write on top of that is always
> > > application-specific, so how can I ask for "more"?
> >
> > This is like accepting a new cpu port, where all you require is that
> > the libc port is open source, but the cpu compiler is totally fine as
> > a blob (doable with llvm now being supported). It makes no sense at
> > all, at least to people who have worked with accelerators like this
> > before.
> >
> > We are not requiring that applications are open. We're only requiring
> > that at least one of the compilers you need (no need to open the fully
> > optimized one with all the magic sauce) to create any kind of
> > applications is open, because without that you can't use the device,
> > you can't analyze the stack, and you have no idea at all about what
> > exactly it is you're merging. With these devices, the uapi visible in
> > include/uapi is the smallest part of the interface exposed to
> > userspace.
>
> Ok, sorry, I was not aware that the habanalabs compiler was not
> available to all under an open source license. All I was trying to
> enforce was that the library to use the kernel api was open so that
> anyone could use it. Trying to enforce compiler requirements like this
> might feel to be a bit of a reach as the CPU on the hardware really
> doesn't fall under the license of the operating system running on this
> CPU over here :)

Experience says if you don't, forget about supporting your
drivers/subsystem long-term. At best you're stuck with a per-device
fragmented mess that vendors might or might not support. This has
nothing to do with GPL licensing or not, but about making sure you can
do proper engineering/support/review of the driver stack. At least in
the GPU world we're already making it rather clear that running blobby
userspace is fine with us (as long as it's using the exact same uapi
as the truly open stack, no exceptions/hacks/abuse are supported).

Also yes vendors don't like it. But they also don't like that they
have to open source their kernel drivers, or runtime library. Lots of
background chats over years, and a very clear line in the sand helps
to get there, and also makes sure that the vendors who got here don't
return to the old closed source ways they love so much.

Anyway, we had all these discussions 2 years ago and nothing has changed
(well on the gpu side we managed to get ARM officially on board with
fully open stack paid by them meanwhile, other discussions still
ongoing). I just wanted to re-iterate that if we'd really care about
having a proper accel subsystem, there's people who've been doing this
for decades.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-05-18 22:30:23

by Dave Airlie

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, 17 May 2021 at 19:12, Daniel Vetter <[email protected]> wrote:
>
> On Mon, May 17, 2021 at 10:55 AM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Mon, May 17, 2021 at 10:49:09AM +0200, Daniel Vetter wrote:
> > > On Mon, May 17, 2021 at 10:00 AM Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > >
> > > > On Mon, May 17, 2021 at 09:40:53AM +0200, Daniel Vetter wrote:
> > > > > On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> > > > > > On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> > > > > > <[email protected]> wrote:
> > > > > > > On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> > > > > > > > Dear kernel maintainers,
> > > > > > > >
> > > > > > > > This submission is a kernel driver to support Intel(R) Gaussian & Neural
> > > > > > > > Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> > > > > > > > available on multiple Intel platforms. AI developers and users can offload
> > > > > > > > continuous inference workloads to an Intel(R) GNA device in order to free
> > > > > > > > processor resources and save power. Noise reduction and speech recognition
> > > > > > > > are the examples of the workloads Intel(R) GNA deals with while its usage
> > > > > > > > is not limited to the two.
> > > > > > >
> > > > > > > How does this compare with the "nnpi" driver being proposed here:
> > > > > > > https://lore.kernel.org/r/[email protected]
> > > > > > >
> > > > > > > Please work with those developers to share code and userspace api and
> > > > > > > tools. Having the community review two totally different apis and
> > > > > > > drivers for the same type of functionality from the same company is
> > > > > > > totally wasteful of our time and energy.
> > > > > >
> > > > > > Agreed, but I think we should go further than this and work towards a
> > > > > > subsystem across companies for machine learning and neural networks
> > > > > > accelerators for both inferencing and training.
> > > > >
> > > > > We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> > > > > think G as in General, not Graphisc.
> > > > >
> > > > > > We have support for Intel habanalabs hardware in drivers/misc, and there are
> > > > > > countless hardware solutions out of tree that would hopefully go the same
> > > > > > way with an upstream submission and open source user space, including
> > > > > >
> > > > > > - Intel/Mobileye EyeQ
> > > > > > - Intel/Movidius Keembay
> > > > > > - Nvidia NVDLA
> > > > > > - Gyrfalcon Lightspeeur
> > > > > > - Apple Neural Engine
> > > > > > - Google TPU
> > > > > > - Arm Ethos
> > > > > >
> > > > > > plus many more that are somewhat less likely to gain fully open source
> > > > > > driver stacks.
> > > > >
> > > > > We also had this entire discussion 2 years ago with habanalabs. The
> > > > > hang-up is that drivers/gpu folks require fully open source userspace,
> > > > > including compiler and anything else you need to actually use the chip.
> > > > > Greg doesn't, he's happy if all he has is the runtime library with some
> > > > > tests.
> > >
> > > I guess we're really going to beat this horse into pulp ... oh well.
> > >
> > > > All you need is a library, what you write on top of that is always
> > > > application-specific, so how can I ask for "more"?
> > >
> > > This is like accepting a new cpu port, where all you require is that
> > > the libc port is open source, but the cpu compiler is totally fine as
> > > a blob (doable with llvm now being supported). It makes no sense at
> > > all, at least to people who have worked with accelerators like this
> > > before.
> > >
> > > We are not requiring that applications are open. We're only requiring
> > > that at least one of the compilers you need (no need to open the fully
> > > optimized one with all the magic sauce) to create any kind of
> > > applications is open, because without that you can't use the device,
> > > you can't analyze the stack, and you have no idea at all about what
> > > exactly it is you're merging. With these devices, the uapi visible in
> > > include/uapi is the smallest part of the interface exposed to
> > > userspace.
> >
> > Ok, sorry, I was not aware that the habanalabs compiler was not
> > available to all under an open source license. All I was trying to
> > enforce was that the library to use the kernel api was open so that
> > anyone could use it. Trying to enforce compiler requirements like this
> > might feel to be a bit of a reach as the CPU on the hardware really
> > doesn't fall under the license of the operating system running on this
> > CPU over here :)
>
> Experience says if you don't, forget about supporting your
> drivers/subsystem long-term. At best you're stuck with a per-device
> fragmented mess that vendors might or might not support. This has
> nothing to do with GPL licensing or not, but about making sure you can
> do proper engineering/support/review of the driver stack. At least in
> the GPU world we're already making it rather clear that running blobby
> userspace is fine with us (as long as it's using the exact same uapi
> as the truly open stack, no exceptions/hacks/abuse are supported).
>
> Also yes vendors don't like it. But they also don't like that they
> have to open source their kernel drivers, or runtime library. Lots of
> background chats over years, and a very clear line in the sand helps
> to get there, and also makes sure that the vendors who got here don't
> return to the old closed source ways they love so much.
>
> Anyway we've had all this discussions 2 years ago, nothing has changed
> (well on the gpu side we managed to get ARM officially on board with
> fully open stack paid by them meanwhile, other discussions still
> ongoing). I just wanted to re-iterate that if we'd really care about
> having a proper accel subsystem, there's people who've been doing this
> for decades.


I think the other point worth reiterating is that most of these
devices are unobtainium for your average kernel maintainer. It's hard
to create a subsystem standard when you don't have access to a
collection of devices + the complete picture of what the stack is
doing and how it interoperates with the ecosystem at large, not just
the kernel. Kernel maintainers need to help ensure there is a viable
ecosystem beyond the kernel before merging stuff that is clearly a
large kernel + user stack architecture. I.e. for misc USB drivers, merge
away; for misc small layer drivers for larger vendor-specific ecosystems
we need to tread more carefully, as long-term we do nobody any favours.

Dave.
>
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

2021-05-19 01:21:56

by Thomas Zimmermann

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

Hi

On 17.05.21 09:40, Daniel Vetter wrote:
> On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
>> On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
>> <[email protected]> wrote:
>>> On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
>>>> Dear kernel maintainers,
>>>>
>>>> This submission is a kernel driver to support Intel(R) Gaussian & Neural
>>>> Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
>>>> available on multiple Intel platforms. AI developers and users can offload
>>>> continuous inference workloads to an Intel(R) GNA device in order to free
>>>> processor resources and save power. Noise reduction and speech recognition
>>>> are the examples of the workloads Intel(R) GNA deals with while its usage
>>>> is not limited to the two.
>>>
>>> How does this compare with the "nnpi" driver being proposed here:
>>> https://lore.kernel.org/r/[email protected]
>>>
>>> Please work with those developers to share code and userspace api and
>>> tools. Having the community review two totally different apis and
>>> drivers for the same type of functionality from the same company is
>>> totally wasteful of our time and energy.
>>
>> Agreed, but I think we should go further than this and work towards a
>> subsystem across companies for machine learning and neural networks
>> accelerators for both inferencing and training.
>
> We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> think G as in General, not Graphisc.

I hope this was a joke.

Just some thoughts:

AFAICT AI first came as an application of GPUs, but has now
evolved/specialized into something of its own. I can imagine sharing
some code among the various subsystems, say GEM/TTM internals for memory
management. Besides that there's probably little that can be shared in
the userspace interfaces. A GPU is a device that puts an image onto the
screen and an AI accelerator isn't. Treating both as the same, even if
they share similar chip architectures, seems like a stretch. They might
evolve in different directions and fit less and less under the same
umbrella.

And as Dave mentioned, these devices are hard to obtain. We don't really
know what we sign up for.

Just my 2 cents.

Best regards
Thomas



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



2021-05-19 01:22:11

by Alex Deucher

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, May 17, 2021 at 3:12 PM Thomas Zimmermann <[email protected]> wrote:
>
> Hi
>
> Am 17.05.21 um 09:40 schrieb Daniel Vetter:
> > On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> >> On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> >> <[email protected]> wrote:
> >>> On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> >>>> Dear kernel maintainers,
> >>>>
> >>>> This submission is a kernel driver to support Intel(R) Gaussian & Neural
> >>>> Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> >>>> available on multiple Intel platforms. AI developers and users can offload
> >>>> continuous inference workloads to an Intel(R) GNA device in order to free
> >>>> processor resources and save power. Noise reduction and speech recognition
> >>>> are the examples of the workloads Intel(R) GNA deals with while its usage
> >>>> is not limited to the two.
> >>>
> >>> How does this compare with the "nnpi" driver being proposed here:
> >>> https://lore.kernel.org/r/[email protected]
> >>>
> >>> Please work with those developers to share code and userspace api and
> >>> tools. Having the community review two totally different apis and
> >>> drivers for the same type of functionality from the same company is
> >>> totally wasteful of our time and energy.
> >>
> >> Agreed, but I think we should go further than this and work towards a
> >> subsystem across companies for machine learning and neural networks
> >> accelerators for both inferencing and training.
> >
> > We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> > think G as in General, not Graphisc.
>
> I hope this was a joke.
>
> Just some thoughts:
>
> AFAICT AI first came as an application of GPUs, but has now
> evolved/specialized into something of its own. I can imagine sharing
> some code among the various subsystems, say GEM/TTM internals for memory
> management. Besides that there's probably little that can be shared in
> the userspace interfaces. A GPU is device that puts an image onto the
> screen and an AI accelerator isn't. Treating both as the same, even if
> they share similar chip architectures, seems like a stretch. They might
> evolve in different directions and fit less and less under the same
> umbrella.

Putting something on the screen is just a tiny part of what GPUs
do these days. Many GPUs don't even have display hardware anymore.
Even with drawing APIs, it's just some operation that you do with
memory. The display may be another device entirely. GPUs also do
video encode and decode, jpeg acceleration, etc. drivers/gpu seems
like a logical place to me. Call it drivers/accelerators if you like.
Other than modesetting most of the shared infrastructure in
drivers/gpu is around memory management and synchronization which are
all the hard parts. Better to try and share that than to reinvent
that in some other subsystem.

Alex

>
> And as Dave mentioned, these devices are hard to obtain. We don't really
> know what we sign up for.
>
> Just my 2 cents.
>
> Best regards
> Thomas
>
>
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
>

2021-05-19 01:42:04

by Daniel Stone

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

Hi,

On Mon, 17 May 2021 at 20:12, Thomas Zimmermann <[email protected]> wrote:
> Am 17.05.21 um 09:40 schrieb Daniel Vetter:
> > We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> > think G as in General, not Graphisc.
>
> I hope this was a joke.
>
> Just some thoughts:
>
> AFAICT AI first came as an application of GPUs, but has now
> evolved/specialized into something of its own. I can imagine sharing
> some code among the various subsystems, say GEM/TTM internals for memory
> management. Besides that there's probably little that can be shared in
> the userspace interfaces. A GPU is device that puts an image onto the
> screen and an AI accelerator isn't.

But it isn't. A GPU is a device that has a kernel-arbitrated MMU
hosting kernel-managed buffers, executes user-supplied compiled
programs with reference to those buffers and other jobs, and informs
the kernel about progress.

KMS lies under the same third-level directory, but even when GPU and
display are on the same die, they're totally different IP blocks
developed on different schedules which are just periodically glued
together.

> Treating both as the same, even if
> they share similar chip architectures, seems like a stretch. They might
> evolve in different directions and fit less and less under the same
> umbrella.

Why not? All we have in common in GPU land right now is MMU + buffer
references + job scheduling + synchronisation. None of this has a common
top-level API, or even a common top-level model. It's not just ISA
differences, but we have very old-school devices where the kernel
needs to register fill on every job, living next to middle-age devices
where the kernel and userspace co-operate to fill a ring buffer,
living next to modern devices where userspace does some stuff and then
the hardware makes it happen with the bare minimum of kernel
awareness.

Honestly I think there's more difference between lima and amdgpu than
there is between amdgpu and current NN/ML devices.

Cheers,
Daniel

2021-05-19 01:52:04

by Daniel Vetter

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, May 17, 2021 at 9:23 PM Alex Deucher <[email protected]> wrote:
>
> On Mon, May 17, 2021 at 3:12 PM Thomas Zimmermann <[email protected]> wrote:
> >
> > Hi
> >
> > Am 17.05.21 um 09:40 schrieb Daniel Vetter:
> > > On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> > >> On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> > >> <[email protected]> wrote:
> > >>> On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> > >>>> Dear kernel maintainers,
> > >>>>
> > >>>> This submission is a kernel driver to support Intel(R) Gaussian & Neural
> > >>>> Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> > >>>> available on multiple Intel platforms. AI developers and users can offload
> > >>>> continuous inference workloads to an Intel(R) GNA device in order to free
> > >>>> processor resources and save power. Noise reduction and speech recognition
> > >>>> are the examples of the workloads Intel(R) GNA deals with while its usage
> > >>>> is not limited to the two.
> > >>>
> > >>> How does this compare with the "nnpi" driver being proposed here:
> > >>> https://lore.kernel.org/r/[email protected]
> > >>>
> > >>> Please work with those developers to share code and userspace api and
> > >>> tools. Having the community review two totally different apis and
> > >>> drivers for the same type of functionality from the same company is
> > >>> totally wasteful of our time and energy.
> > >>
> > >> Agreed, but I think we should go further than this and work towards a
> > >> subsystem across companies for machine learning and neural networks
> > >> accelerators for both inferencing and training.
> > >
> > > We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> > > think G as in General, not Graphisc.
> >
> > I hope this was a joke.
> >
> > Just some thoughts:
> >
> > AFAICT AI first came as an application of GPUs, but has now
> > evolved/specialized into something of its own. I can imagine sharing
> > some code among the various subsystems, say GEM/TTM internals for memory
> > management. Besides that there's probably little that can be shared in
> > the userspace interfaces. A GPU is device that puts an image onto the
> > screen and an AI accelerator isn't. Treating both as the same, even if
> > they share similar chip architectures, seems like a stretch. They might
> > evolve in different directions and fit less and less under the same
> > umbrella.
>
> The putting something on the screen is just a tiny part of what GPUs
> do these days. Many GPUs don't even have display hardware anymore.
> Even with drawing APIs, it's just some operation that you do with
> memory. The display may be another device entirely. GPUs also do
> video encode and decode, jpeg acceleration, etc. drivers/gpu seems
> like a logical place to me. Call it drivers/accelerators if you like.
> Other than modesetting most of the shared infrastructure in
> drivers/gpu is around memory management and synchronization which are
> all the hard parts. Better to try and share that than to reinvent
> that in some other subsystem.

Maybe to add: Most of our driver stack is in userspace (like for NN/AI
chips too), both where high amounts of code sharing are the norm (like
with mesa3d) and areas where the landscape is a lot more fragmented
(like compute and media, where the userspace driver APIs are all
different for each vendor, or at least highly specialized). That's
another thing which I don't think any other kernel subsystem has, at
least as much as we do.

So for both the big design questions on how the overall stack is
organized down to the details like code sharing, drivers/g^Hxpu should
be the best place. Aside from the pesky problem that we do actually
look at the userspace side and have some expectations on that too, not
just on the kernel code alone.
-Daniel

>
> Alex
>
> >
> > And as Dave mentioned, these devices are hard to obtain. We don't really
> > know what we sign up for.
> >
> > Just my 2 cents.
> >
> > Best regards
> > Thomas
> >
> >
> >
> > --
> > Thomas Zimmermann
> > Graphics Driver Developer
> > SUSE Software Solutions Germany GmbH
> > Maxfeldstr. 5, 90409 Nürnberg, Germany
> > (HRB 36809, AG Nürnberg)
> > Geschäftsführer: Felix Imendörffer
> >



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-05-19 02:23:03

by Thomas Zimmermann

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

Hi

On 17.05.21 21:23, Alex Deucher wrote:
> On Mon, May 17, 2021 at 3:12 PM Thomas Zimmermann <[email protected]> wrote:
>>
>> Hi
>>
>> Am 17.05.21 um 09:40 schrieb Daniel Vetter:
>>> On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
>>>> On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
>>>> <[email protected]> wrote:
>>>>> On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
>>>>>> Dear kernel maintainers,
>>>>>>
>>>>>> This submission is a kernel driver to support Intel(R) Gaussian & Neural
>>>>>> Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
>>>>>> available on multiple Intel platforms. AI developers and users can offload
>>>>>> continuous inference workloads to an Intel(R) GNA device in order to free
>>>>>> processor resources and save power. Noise reduction and speech recognition
>>>>>> are the examples of the workloads Intel(R) GNA deals with while its usage
>>>>>> is not limited to the two.
>>>>>
>>>>> How does this compare with the "nnpi" driver being proposed here:
>>>>> https://lore.kernel.org/r/[email protected]
>>>>>
>>>>> Please work with those developers to share code and userspace api and
>>>>> tools. Having the community review two totally different apis and
>>>>> drivers for the same type of functionality from the same company is
>>>>> totally wasteful of our time and energy.
>>>>
>>>> Agreed, but I think we should go further than this and work towards a
>>>> subsystem across companies for machine learning and neural networks
>>>> accelerators for both inferencing and training.
>>>
>>> We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
>>> think G as in General, not Graphisc.
>>
>> I hope this was a joke.
>>
>> Just some thoughts:
>>
>> AFAICT AI first came as an application of GPUs, but has now
>> evolved/specialized into something of its own. I can imagine sharing
>> some code among the various subsystems, say GEM/TTM internals for memory
>> management. Besides that there's probably little that can be shared in
>> the userspace interfaces. A GPU is device that puts an image onto the
>> screen and an AI accelerator isn't. Treating both as the same, even if
>> they share similar chip architectures, seems like a stretch. They might
>> evolve in different directions and fit less and less under the same
>> umbrella.
>
> The putting something on the screen is just a tiny part of what GPUs
> do these days. Many GPUs don't even have display hardware anymore.
> Even with drawing APIs, it's just some operation that you do with
> memory. The display may be another device entirely. GPUs also do
> video encode and decode, jpeg acceleration, etc. drivers/gpu seems
> like a logical place to me. Call it drivers/accelerators if you like.
> Other than modesetting most of the shared infrastructure in
> drivers/gpu is around memory management and synchronization which are
> all the hard parts. Better to try and share that than to reinvent
> that in some other subsystem.

I'm not sure whether we're on the same page or not.

I look at this from the UAPI perspective: the only interfaces that we
really standardize among GPUs are modesetting, dumb buffers, and GEM. The
sophisticated rendering is done with per-driver interfaces. And
modesetting is the thing that AI does not do.

Sharing common code among subsystems is not a problem. Many of our
more-sophisticated helpers are located in DRM because no other
subsystems have the requirements yet. Maybe AI now has and we can move
the respective shareable code to a common location. But AI is still not a GPU. To
give a bad analogy: GPUs transmit audio these days. Yet we don't treat
them as sound cards.


Best regards
Thomas

>
> Alex
>
>>
>> And as Dave mentioned, these devices are hard to obtain. We don't really
>> know what we sign up for.
>>
>> Just my 2 cents.
>>
>> Best regards
>> Thomas
>>
>>
>>
>> --
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Software Solutions Germany GmbH
>> Maxfeldstr. 5, 90409 Nürnberg, Germany
>> (HRB 36809, AG Nürnberg)
>> Geschäftsführer: Felix Imendörffer
>>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



2021-05-19 02:48:33

by Daniel Vetter

Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, May 17, 2021 at 9:49 PM Thomas Zimmermann <[email protected]> wrote:
>
> Hi
>
> Am 17.05.21 um 21:23 schrieb Alex Deucher:
> > On Mon, May 17, 2021 at 3:12 PM Thomas Zimmermann <[email protected]> wrote:
> >>
> >> Hi
> >>
> >> Am 17.05.21 um 09:40 schrieb Daniel Vetter:
> >>> On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> >>>> On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> >>>> <[email protected]> wrote:
> >>>>> On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> >>>>>> Dear kernel maintainers,
> >>>>>>
> >>>>>> This submission is a kernel driver to support Intel(R) Gaussian & Neural
> >>>>>> Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> >>>>>> available on multiple Intel platforms. AI developers and users can offload
> >>>>>> continuous inference workloads to an Intel(R) GNA device in order to free
> >>>>>> processor resources and save power. Noise reduction and speech recognition
> >>>>>> are the examples of the workloads Intel(R) GNA deals with while its usage
> >>>>>> is not limited to the two.
> >>>>>
> >>>>> How does this compare with the "nnpi" driver being proposed here:
> >>>>> https://lore.kernel.org/r/[email protected]
> >>>>>
> >>>>> Please work with those developers to share code and userspace api and
> >>>>> tools. Having the community review two totally different apis and
> >>>>> drivers for the same type of functionality from the same company is
> >>>>> totally wasteful of our time and energy.
> >>>>
> >>>> Agreed, but I think we should go further than this and work towards a
> >>>> subsystem across companies for machine learning and neural networks
> >>>> accelerators for both inferencing and training.
> >>>
> >>> We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> >>> think G as in General, not Graphisc.
> >>
> >> I hope this was a joke.
> >>
> >> Just some thoughts:
> >>
> >> AFAICT AI first came as an application of GPUs, but has now
> >> evolved/specialized into something of its own. I can imagine sharing
> >> some code among the various subsystems, say GEM/TTM internals for memory
> >> management. Besides that there's probably little that can be shared in
> >> the userspace interfaces. A GPU is device that puts an image onto the
> >> screen and an AI accelerator isn't. Treating both as the same, even if
> >> they share similar chip architectures, seems like a stretch. They might
> >> evolve in different directions and fit less and less under the same
> >> umbrella.
> >
> > The putting something on the screen is just a tiny part of what GPUs
> > do these days. Many GPUs don't even have display hardware anymore.
> > Even with drawing APIs, it's just some operation that you do with
> > memory. The display may be another device entirely. GPUs also do
> > video encode and decode, jpeg acceleration, etc. drivers/gpu seems
> > like a logical place to me. Call it drivers/accelerators if you like.
> > Other than modesetting most of the shared infrastructure in
> > drivers/gpu is around memory management and synchronization which are
> > all the hard parts. Better to try and share that than to reinvent
> > that in some other subsystem.
>
> I'm not sure whether we're on the same page or not.
>
> I look at this from the UAPI perspective: the only interfaces that we
> really standardize among GPUs are modesetting, dumb buffers, and GEM. The
> sophisticated rendering is done with per-driver interfaces. And
> modesetting is the thing that AI does not do.

Yeah, but the people who know what should be standardized and what
should not be standardized for accel drivers are here. Because we've
done both models in the past, and pretty much everything in between.

Also like Daniel said, we support hw (and know how to drive it) for
anything from "kernel bashes register values" (gpus worked like that
20 years ago) to "mostly direct userspace submit" (amdkfd and parts of
nouveau work like this).

There isn't any other subsystem with as much knowledge about how to
stand up the entire accelerator stack without making it suck too
badly. That is the real value of dri-devel and the community we have
here, not the code sharing we occasionally tend to do.

> Sharing common code among subsystems is not a problem. Many of our
> more-sophisticated helpers are located in DRM because no other
> subsystems have the requirements yet. Maybe AI now has and we can move
> the respective shareable code to a common location. But AI is still no GPU. To
> give a bad analogy: GPUs transmit audio these days. Yet we don't treat
> them as sound cards.

We actually do, there are full blown sound drivers for them over in
sound/ (ok I think they're all in sound/hda for pci gpus or in
sound/soc actually). There's some glue to tie it together because it
requires coordination between the gpu and sound side of things, but
that's it.

Also I think it would be extremely silly to remove all the drm_ stuff
just because it originated from GPUs, and therefore absolutely
cannot be used by other accelerators. I'm not seeing the point in
that, but if someone has a convincing technical argument for this we
could do it. A tree-wide s/drm_/xpu_/ might make some sense perhaps if
that makes people more comfortable with the idea of reusing code from
gpu origins for accelerators in general.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-05-19 04:53:48

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

Hi

Am 17.05.21 um 21:32 schrieb Daniel Stone:
> Hi,
>
> On Mon, 17 May 2021 at 20:12, Thomas Zimmermann <[email protected]> wrote:
>> Am 17.05.21 um 09:40 schrieb Daniel Vetter:
>>> We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
>>> think G as in General, not Graphics.
>>
>> I hope this was a joke.
>>
>> Just some thoughts:
>>
>> AFAICT AI first came as an application of GPUs, but has now
>> evolved/specialized into something of its own. I can imagine sharing
>> some code among the various subsystems, say GEM/TTM internals for memory
>> management. Besides that there's probably little that can be shared in
>> the userspace interfaces. A GPU is a device that puts an image onto the
>> screen and an AI accelerator isn't.
>
> But it isn't. A GPU is a device that has a kernel-arbitrated MMU
> hosting kernel-managed buffers, executes user-supplied compiled
> programs with reference to those buffers and other jobs, and informs
> the kernel about progress.
>
> KMS lies under the same third-level directory, but even when GPU and
> display are on the same die, they're totally different IP blocks
> developed on different schedules which are just periodically glued
> together.

I mentioned this elsewhere: it's not about the chip architecture, it's
about the UAPI. In the end, the GPU is about displaying things on a
screen. Even if the rendering and the scanout engines are on different
IP blocks. (Or different devices.)

The fact that one can do general purpose computing on a GPU is a
byproduct of the evolution of graphics hardware. It never was the goal.


>
>> Treating both as the same, even if
>> they share similar chip architectures, seems like a stretch. They might
>> evolve in different directions and fit less and less under the same
>> umbrella.
>
> Why not? All we have in common in GPU land right now is MMU + buffer
> references + job scheduling + synchronisation. None of this has common
> top-level API, or even a common top-level model. It's not just ISA
> differences, but we have very old-school devices where the kernel
> needs to register fill on every job, living next to middle-age devices
> where the kernel and userspace co-operate to fill a ring buffer,
> living next to modern devices where userspace does some stuff and then
> the hardware makes it happen with the bare minimum of kernel
> awareness.

I see all this as an example of why AI should not live under gpu/. There
are already many generations of GPUs with different feature sets
supported. Why lump more behind the same abstractions if AI can take a
fresh start? Why should we care about AI, and why should AI care about
all our legacy?

We can still share all the internal code if AI needs any of it.
Meanwhile AI drivers can provide their own UAPIs until a common
framework emerges.

Again, just my 2 cents.

Best regards
Thomas

>
> > Honestly I think there's more difference between lima and amdgpu than
> there is between amdgpu and current NN/ML devices.
>
> Cheers,
> Daniel
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


Attachments:
OpenPGP_signature (855.00 B)
OpenPGP digital signature

2021-05-19 04:55:32

by Thomas Zimmermann

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

Hi

Am 17.05.21 um 22:00 schrieb Daniel Vetter:

>> Sharing common code among subsystems is not a problem. Many of our
>> more-sophisticated helpers are located in DRM because no other
>> subsystems have the requirements yet. Maybe AI now has and we can move
>> the respective shareable code to a common location. But AI is still no GPU. To
>> give a bad analogy: GPUs transmit audio these days. Yet we don't treat
>> them as sound cards.
>
> We actually do, there are full blown sound drivers for them over in
> sound/ (ok I think they're all in sound/hda for pci gpus or in
> sound/soc actually). There's some glue to tie it together because it
> requires coordination between the gpu and sound side of things, but
> that's it.

I know. But we don't merge both subsystems just because the devices
have some overlap in functionality.

Best regards
Thomas

>
> Also I think it would be extremely silly to remove all the drm_ stuff
> just because it originated from GPUs, and therefore absolutely
> cannot be used by other accelerators. I'm not seeing the point in
> that, but if someone has a convincing technical argument for this we
> could do it. A tree-wide s/drm_/xpu_/ might make some sense perhaps if
> that makes people more comfortable with the idea of reusing code from
> gpu origins for accelerators in general.
> -Daniel
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


Attachments:
OpenPGP_signature (855.00 B)
OpenPGP digital signature

2021-05-19 08:09:45

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, May 17, 2021 at 10:10 PM Thomas Zimmermann <[email protected]> wrote:
>
> Hi
>
> Am 17.05.21 um 21:32 schrieb Daniel Stone:
> > Hi,
> >
> > On Mon, 17 May 2021 at 20:12, Thomas Zimmermann <[email protected]> wrote:
> >> Am 17.05.21 um 09:40 schrieb Daniel Vetter:
> >>> We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> >>> think G as in General, not Graphics.
> >>
> >> I hope this was a joke.
> >>
> >> Just some thoughts:
> >>
> >> AFAICT AI first came as an application of GPUs, but has now
> >> evolved/specialized into something of its own. I can imagine sharing
> >> some code among the various subsystems, say GEM/TTM internals for memory
> >> management. Besides that there's probably little that can be shared in
> >> the userspace interfaces. A GPU is a device that puts an image onto the
> >> screen and an AI accelerator isn't.
> >
> > But it isn't. A GPU is a device that has a kernel-arbitrated MMU
> > hosting kernel-managed buffers, executes user-supplied compiled
> > programs with reference to those buffers and other jobs, and informs
> > the kernel about progress.
> >
> > KMS lies under the same third-level directory, but even when GPU and
> > display are on the same die, they're totally different IP blocks
> > developed on different schedules which are just periodically glued
> > together.
>
> I mentioned this elsewhere: it's not about the chip architecture, it's
> about the UAPI. In the end, the GPU is about displaying things on a
> screen. Even if the rendering and the scanout engines are on different
> IP blocks. (Or different devices.)

Sure, but that's ignoring the reality that there's an enormous amount
of code needed to make this rendering possible. All of which keeps
existing if you take away the display, use your gpu to do compute,
throw out the raster and texture fetch blocks and rebalance your
compute units to be much faster at the bfloat16 and u8 math (or
whatever it is the NN people love today) than fp32, where traditional
render gpus are king. At that point you have a NN/AI chip, and like
Daniel Stone says, the difference here is often much smaller than the
difference between drm/lima and drm/amdgpu. Which at least on the 3d
side happen to share large chunks of our stack (more sharing in
userspace than the kernel, but still quite some sharing overall in
concepts and code).

There's overall substantially more code to make this work than the
modeset drivers you think are the cornerstone of a gpu driver.

Also if you want to do broad-strokes refactoring like pulling the
memory management/command submission stuff out of drm, then the right
thing would be to pull the modeset stuff out and merge it with maybe
v4l. Modesetting was a 10-years-later addition to drm; this entire
thing started with memory/command submission management.

And a lot of people got rather mad that the drm folks reinvented their
own modeset api and didn't use one of the existing ones. We've eclipsed
them by now with atomic support, so it's a somewhat moot point now, but
it wasn't when it landed 10 years ago.

> The fact that one can do general purpose computing on a GPU is a
> byproduct of the evolution of graphics hardware. It never was the goal.

I think we've now crossed the point where 50% of gpu sales are
displayless. It stopped being a byproduct long ago and became the main
goal in many areas and markets.

But also the core of drivers/gpu _is_ the memory management stuff.
That's what this subsystem has been doing for 20 years or so by now.
The modeset stuff is a comparatively recent addition (but has grown a
lot thanks to tons of new drivers that landed and fbdev essentially
dying).

> >> Treating both as the same, even if
> >> they share similar chip architectures, seems like a stretch. They might
> >> evolve in different directions and fit less and less under the same
> >> umbrella.
> >
> > Why not? All we have in common in GPU land right now is MMU + buffer
> > references + job scheduling + synchronisation. None of this has common
> > top-level API, or even a common top-level model. It's not just ISA
> > differences, but we have very old-school devices where the kernel
> > needs to register fill on every job, living next to middle-age devices
> > where the kernel and userspace co-operate to fill a ring buffer,
> > living next to modern devices where userspace does some stuff and then
> > the hardware makes it happen with the bare minimum of kernel
> > awareness.
>
> I see all this as an example of why AI should not live under gpu/. There
> are already many generations of GPUs with different feature sets
> supported. Why lump more behind the same abstractions if AI can take a
> fresh start? Why should we care about AI, and why should AI care about
> all our legacy?

Fresh start here means "ignore all the lessons learned from 20 years
of accelerator driver hacking" I think.

> We can still share all the internal code if AI needs any of it.
> Meanwhile AI drivers can provide their own UAPIs until a common
> framework emerges.

Again the no 1 lesson of writing accel drivers is that you need the
fully open userspace stack, or it's game over long term. No amount of
"we'll share code later on" will save you from that, because that's
just not going to be an option. There's a few other lessons like you
don't actually want to have a standardized uapi for the accelerator
command submission and memory management, but there are some
standardized approaches that make sense (we've probably tried them
all).

This has nothing to do with how you organize the kernel subsystem, but
all about how you set up the overall driver stack. Of which the
userspace side is the important part.

And back to your point that display is the main reason why drivers/gpu
exists: None of this has anything to do with display, but is exactly
what the render _accelerator_ part of dri-devel has been doing for a
rather long time by now. Which is why other accelerators should
probably do the same thing instead of going "nah we're different,
there's no DP output connected to our accelerator".

Cheers, Daniel

PS: Also there are NN chips with DP/HDMI ports thrown in for the lolz.
Turns out that these NN things are pretty useful when building video
processing pipelines.

> Again, just my 2 cents.
>
> Best regards
> Thomas
>
> >
> > Honestly I think there's more difference between lima and amdgpu than
> > there is between amdgpu and current NN/ML devices.
> >
> > Cheers,
> > Daniel
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
>


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-05-19 08:10:09

by Dave Airlie

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Tue, 18 May 2021 at 06:10, Thomas Zimmermann <[email protected]> wrote:
>
> Hi
>
> Am 17.05.21 um 21:32 schrieb Daniel Stone:
> > Hi,
> >
> > On Mon, 17 May 2021 at 20:12, Thomas Zimmermann <[email protected]> wrote:
> >> Am 17.05.21 um 09:40 schrieb Daniel Vetter:
> >>> We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> >>> think G as in General, not Graphics.
> >>
> >> I hope this was a joke.
> >>
> >> Just some thoughts:
> >>
> >> AFAICT AI first came as an application of GPUs, but has now
> >> evolved/specialized into something of its own. I can imagine sharing
> >> some code among the various subsystems, say GEM/TTM internals for memory
> >> management. Besides that there's probably little that can be shared in
> >> the userspace interfaces. A GPU is a device that puts an image onto the
> >> screen and an AI accelerator isn't.
> >
> > But it isn't. A GPU is a device that has a kernel-arbitrated MMU
> > hosting kernel-managed buffers, executes user-supplied compiled
> > programs with reference to those buffers and other jobs, and informs
> > the kernel about progress.
> >
> > KMS lies under the same third-level directory, but even when GPU and
> > display are on the same die, they're totally different IP blocks
> > developed on different schedules which are just periodically glued
> > together.
>
> I mentioned this elsewhere: it's not about the chip architecture, it's
> about the UAPI. In the end, the GPU is about displaying things on a
> screen. Even if the rendering and the scanout engines are on different
> IP blocks. (Or different devices.)
>
> The fact that one can do general purpose computing on a GPU is a
> byproduct of the evolution of graphics hardware. It never was the goal.

But then we would have a subsystem for AI accelerators excluding GPUs;
do we then start to layer that subsystem onto drivers/gpu? At which
point, why bother?

The thing is, UAPI and stack architecture are important, but what is
more important than any of that is that there is a place where the
people invested in the area can come together outside of company
boundaries and discuss ideas and bounce designs around each other to
come to an agreement without the overheads of company interactions.
dri-devel + mesa have managed this for graphics but it's taken years
and we are still fighting that battle within major companies who even
when they know it produces good results can't drag themselves to give
up control over anything unless given no other choice.

I expect the accel teams in these companies need to step outside their
productization timelines and powerpoints and start discussing uAPI
designs with the other companies in the area. Until that happens I
expect upstreaming any of these should be a default no.

Dave.

2021-05-21 02:44:48

by Linus Walleij

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Thu, May 13, 2021 at 1:01 PM Maciej Kwapulinski
<[email protected]> wrote:

> This submission is a kernel driver to support Intel(R) Gaussian & Neural
> Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> available on multiple Intel platforms.

I replied to v2 and did not get any answer:
https://lore.kernel.org/lkml/CACRpkdaHMKueLr9Q5CAXQXN5A5FwZScfroE-DYfK+NaGXaqN1A@mail.gmail.com/

Now more kernel maintainers are saying more or less the
same thing: please address and reply to serious
architectural comments instead of sending new iterations of
the same patch set.

Also include me and Olof Johansson when reposting, thanks.

Yours,
Linus Walleij

2021-05-24 10:45:15

by Maciej Kwapulinski

[permalink] [raw]
Subject: Re: [PATCH v3 11/14] intel_gna: add ioctl handler


Greg Kroah-Hartman <[email protected]> writes:

> On Fri, May 14, 2021 at 10:20:42AM +0200, Maciej Kwapulinski wrote:
>>
>> Greg Kroah-Hartman <[email protected]> writes:
>>
>> > On Thu, May 13, 2021 at 01:00:37PM +0200, Maciej Kwapulinski wrote:
>> >> From: Tomasz Jankowski <[email protected]>
>> >>
>> >> Add ioctl handler into GNA driver.
>> >> The ioctl interface provides the ability to do the following:
>> >> - Map and unmap memory buffers for GNA computation requests.
>> >> - Retrieve capabilities of the underlying GNA IP.
>> >> - Submit GNA computation requests.
>> >> - Request notification of scoring completion.
>> >
>> > Do you have a pointer to the userspace code that uses this ioctl?
>> > That's kind of required here, otherwise we have no idea how this all
>> > works.
>> >
>>
>> yes, it's present under following link:
>>
>> https://github.com/intel/gna
>
> Then that needs to go here in this changelog text, right?

The link to the library is already present in 00/14; I didn't want to
have it in two places, which is why it's not repeated here.

2021-05-24 10:50:54

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 11/14] intel_gna: add ioctl handler

On Mon, May 24, 2021 at 12:43:25PM +0200, Maciej Kwapulinski wrote:
>
> Greg Kroah-Hartman <[email protected]> writes:
>
> > On Fri, May 14, 2021 at 10:20:42AM +0200, Maciej Kwapulinski wrote:
> >>
> >> Greg Kroah-Hartman <[email protected]> writes:
> >>
> >> > On Thu, May 13, 2021 at 01:00:37PM +0200, Maciej Kwapulinski wrote:
> >> >> From: Tomasz Jankowski <[email protected]>
> >> >>
> >> >> Add ioctl handler into GNA driver.
> >> >> The ioctl interface provides the ability to do the following:
> >> >> - Map and unmap memory buffers for GNA computation requests.
> >> >> - Retrieve capabilities of the underlying GNA IP.
> >> >> - Submit GNA computation requests.
> >> >> - Request notification of scoring completion.
> >> >
> >> > Do you have a pointer to the userspace code that uses this ioctl?
> >> > That's kind of required here, otherwise we have no idea how this all
> >> > works.
> >> >
> >>
> >> yes, it's present under following link:
> >>
> >> https://github.com/intel/gna
> >
> > Then that needs to go here in this changelog text, right?
>
> The link to the library is already present in 00/14; I didn't want to
> have it in two places, which is why it's not repeated here.

Commit 00/XX never shows up in the changelog :(

2021-05-25 10:24:49

by Maciej Kwapulinski

[permalink] [raw]
Subject: Re: [PATCH v3 11/14] intel_gna: add ioctl handler


Greg Kroah-Hartman <[email protected]> writes:

> On Mon, May 24, 2021 at 12:43:25PM +0200, Maciej Kwapulinski wrote:
>>
>> Greg Kroah-Hartman <[email protected]> writes:
>>
>> > On Fri, May 14, 2021 at 10:20:42AM +0200, Maciej Kwapulinski wrote:
>> >>
>> >> Greg Kroah-Hartman <[email protected]> writes:
>> >>
>> >> > On Thu, May 13, 2021 at 01:00:37PM +0200, Maciej Kwapulinski wrote:
>> >> >> From: Tomasz Jankowski <[email protected]>
>> >> >>
>> >> >> Add ioctl handler into GNA driver.
>> >> >> The ioctl interface provides the ability to do the following:
>> >> >> - Map and unmap memory buffers for GNA computation requests.
>> >> >> - Retrieve capabilities of the underlying GNA IP.
>> >> >> - Submit GNA computation requests.
>> >> >> - Request notification of scoring completion.
>> >> >
>> >> > Do you have a pointer to the userspace code that uses this ioctl?
>> >> > That's kind of required here, otherwise we have no idea how this all
>> >> > works.
>> >> >
>> >>
>> >> yes, it's present under following link:
>> >>
>> >> https://github.com/intel/gna
>> >
>> > Then that needs to go here in this changelog text, right?
>>
>> The link to the library is already present in 00/14; I didn't want to
>> have it in two places, which is why it's not repeated here.
>
> Commit 00/XX never shows up in the changelog :(

right

2021-06-16 07:39:41

by Maciej Kwapulinski

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator


Greg Kroah-Hartman <[email protected]> writes:

> On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
>> Dear kernel maintainers,
>>
>> This submission is a kernel driver to support Intel(R) Gaussian & Neural
>> Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
>> available on multiple Intel platforms. AI developers and users can offload
>> continuous inference workloads to an Intel(R) GNA device in order to free
>> processor resources and save power. Noise reduction and speech recognition
>> are the examples of the workloads Intel(R) GNA deals with while its usage
>> is not limited to the two.
>
> How does this compare with the "nnpi" driver being proposed here:
> https://lore.kernel.org/r/[email protected]
>
> Please work with those developers to share code and userspace api and
> tools. Having the community review two totally different apis and
> drivers for the same type of functionality from the same company is
> totally wasteful of our time and energy.
>
> thanks,
>
> greg k-h

After consulting, we will try to share the API and some kernel code
between the two drivers.

The following is a reference for more information:
https://lore.kernel.org/lkml/[email protected]/

regards,
Maciej

2022-06-20 10:04:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, Jun 20, 2022 at 11:49:07AM +0200, [email protected] wrote:
> Please share your thoughts.

No code here to share thoughts about :(

2022-06-20 10:06:39

by Maciej Kwapulinski

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Wed, 16 Jun 2021 09:38:14 +0200, Maciej Kwapulinski wrote:
> After consulting, we will try to share the API and some kernel code
> between the two drivers.

We prepared a prototype based on Daniel Vetter’s invitation to
evaluate the DRM framework as a prospective fit.

Early results look quite promising. The DRM framework can be leveraged
effectively for a non-GPU driver: the GNA driver source code was
simplified by offloading some areas to the framework (device management
+ memory region management).

As a result, the GNA driver source code has been reduced by 15% (420
lines). DRM framework complexity has not been an issue so far.

The plan for the next step is to publish the code to the dri-devel ML.
Please share your thoughts.

Regards,
Maciej

2022-06-20 10:21:29

by Maciej Kwapulinski

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator


Greg KH <[email protected]> writes:

> On Mon, Jun 20, 2022 at 11:49:07AM +0200, [email protected] wrote:
>> Please share your thoughts.
>
> No code here to share thoughts about :(

Code will be published in the coming weeks to the dri-devel ML.

2022-06-20 10:28:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, Jun 20, 2022 at 12:08:34PM +0200, Maciej Kwapulinski wrote:
>
> Greg KH <[email protected]> writes:
>
> > On Mon, Jun 20, 2022 at 11:49:07AM +0200, [email protected] wrote:
> >> Please share your thoughts.
> >
> > No code here to share thoughts about :(
>
> Code will be published in the coming weeks to the dri-devel ML.

Saying "we will send patches sometime in the future" is a bit odd when
we have thousands of patches each week in our inbox to review now. Why
not help us out with that workload? :)

thanks,

greg k-h

2022-06-25 18:02:19

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

On Mon, 20 Jun 2022 at 11:51, <[email protected]> wrote:
> On Wed, 16 Jun 2021 09:38:14 +0200, Maciej Kwapulinski wrote:
> > After consulting, we will try to share the API and some kernel code
> > between the two drivers.
>
> We prepared a prototype based on Daniel Vetter’s invitation to
> evaluate the DRM framework as a prospective fit.
>
> Early results look quite promising. The DRM framework can be leveraged
> effectively for a non-GPU driver: the GNA driver source code was
> simplified by offloading some areas to the framework (device management
> + memory region management).
>
> As a result, the GNA driver source code has been reduced by 15% (420
> lines). DRM framework complexity has not been an issue so far.

Nice.

> The plan for the next step is to publish the code to the dri-devel ML.
> Please share your thoughts.

Sounds like solid progress and looking forward for some patches!

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch