Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752387AbaGJWvk (ORCPT ); Thu, 10 Jul 2014 18:51:40 -0400 Received: from mail-bn1blp0181.outbound.protection.outlook.com ([207.46.163.181]:16323 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751548AbaGJWvj (ORCPT ); Thu, 10 Jul 2014 18:51:39 -0400 X-WSS-ID: 0N8IQ5S-08-7SQ-02 X-M-MSG: From: "Gabbay, Oded" To: "j.glisse@gmail.com" CC: "linux-kernel@vger.kernel.org" , "Bridgman, John" , "Deucher, Alexander" , "Lewycky, Andrew" , "joro@8bytes.org" , "akpm@linux-foundation.org" , "dri-devel@lists.freedesktop.org" , "airlied@linux.ie" , "oded.gabbay@gmail.com" Subject: Re: [PATCH 00/83] AMD HSA kernel driver Thread-Topic: [PATCH 00/83] AMD HSA kernel driver Thread-Index: AQHPnIhRCXzXB0cUQEGn7YgiYKQ8Y5uaJWaAgAAHYIA= Date: Thu, 10 Jul 2014 22:51:29 +0000 Message-ID: <019CCE693E457142B37B791721487FD91809E4C2@storexdag01.amd.com> References: <1405028727-5276-1-git-send-email-oded.gabbay@amd.com> <20140710222423.GA14219@gmail.com> In-Reply-To: <20140710222423.GA14219@gmail.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.181.40.9] Content-Type: text/plain; charset="utf-8" Content-ID: <84C4EFE27DC2A948BFEFCEBD3E5DC241@amd.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-Forefront-Antispam-Report: CIP:165.204.84.222;CTRY:US;IPV:NLI;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(6009001)(428002)(24454002)(189002)(199002)(377424004)(51704005)(81342001)(106466001)(95666004)(46102001)(92566001)(85306003)(101416001)(83322001)(84676001)(4396001)(2351001)(47776003)(107046002)(19580405001)(50986999)(23676002)(54356999)(83072002)(81542001)(106116001)(87936001)(44976005)(20776003)(85852003)(53416004)(68736004)(76176999)(33656002)(74662001)(50466002)(79102001)(86362001)(110136001)(2656002)(21056001)(99396002)(76482001)(80022001)(77982001)(55846006)(97736001)(64706001)(31966008)(19580395003)(92726001)(105586002)(77096002)(74502001)(217873001);DIR:OUT;SFP:;SCL:1;SRVR:BY2PR02MB041;H:atltwp02.amd.com;FPR:;MLV:sfv;PTR:InfoDomainNonexistent;MX:1;LANG:en; X-Microsoft-Antispam: BCL:0;PCL:0;RULEID: X-Forefront-PRVS: 0268246AE7 Authentication-Results: spf=none (sender IP is 165.204.84.222) smtp.mailfrom=Oded.Gabbay@amd.com; X-OriginatorOrg: amd4.onmicrosoft.com Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id s6AMptEp013918 On Thu, 2014-07-10 at 18:24 -0400, Jerome Glisse wrote: > On Fri, Jul 11, 2014 at 12:45:27AM +0300, Oded Gabbay wrote: > > This patch set implements a Heterogeneous System Architecture > > (HSA) driver > > for radeon-family GPUs. > > This is just quick comments on few things. Given size of this, people > will need to have time to review things. > > > HSA allows different processor types (CPUs, DSPs, GPUs, etc..) to > > share > > system resources more effectively via HW features including > > shared pageable > > memory, userspace-accessible work queues, and platform-level > > atomics. In > > addition to the memory protection mechanisms in GPUVM and > > IOMMUv2, the Sea > > Islands family of GPUs also performs HW-level validation of > > commands passed > > in through the queues (aka rings). > > The code in this patch set is intended to serve both as a sample > > driver for > > other HSA-compatible hardware devices and as a production driver > > for > > radeon-family processors. The code is architected to support > > multiple CPUs > > each with connected GPUs, although the current implementation > > focuses on a > > single Kaveri/Berlin APU, and works alongside the existing radeon > > kernel > > graphics driver (kgd). > > AMD GPUs designed for use with HSA (Sea Islands and up) share > > some hardware > > functionality between HSA compute and regular gfx/compute (memory, > > interrupts, registers), while other functionality has been added > > specifically for HSA compute (hw scheduler for virtualized > > compute rings). > > All shared hardware is owned by the radeon graphics driver, and > > an interface > > between kfd and kgd allows the kfd to make use of those shared > > resources, > > while HSA-specific functionality is managed directly by kfd by > > submitting > > packets into an HSA-specific command queue (the "HIQ"). > > During kfd module initialization a char device node (/dev/kfd) is > > created > > (surviving until module exit), with ioctls for queue creation & > > management, > > and data structures are initialized for managing HSA device > > topology. > > The rest of the initialization is driven by calls from the radeon > > kgd at > > the following points : > > - radeon_init (kfd_init) > > - radeon_exit (kfd_fini) > > - radeon_driver_load_kms (kfd_device_probe, kfd_device_init) > > - radeon_driver_unload_kms (kfd_device_fini) > > During the probe and init processing per-device data structures > > are > > established which connect to the associated graphics kernel > > driver. This > > information is exposed to userspace via sysfs, along with a > > version number > > allowing userspace to determine if a topology change has occurred > > while it > > was reading from sysfs. > > The interface between kfd and kgd also allows the kfd to request > > buffer > > management services from kgd, and allows kgd to route interrupt > > requests to > > kfd code since the interrupt block is shared between regular > > graphics/compute and HSA compute subsystems in the GPU. > > The kfd code works with an open source usermode library > > ("libhsakmt") which > > is in the final stages of IP review and should be published in a > > separate > > repo over the next few days. > > The code operates in one of three modes, selectable via the > > sched_policy > > module parameter : > > - sched_policy=0 uses a hardware scheduler running in the MEC > > block within > > CP, and allows oversubscription (more queues than HW slots) > > - sched_policy=1 also uses HW scheduling but does not allow > > oversubscription, so create_queue requests fail when we run out > > of HW slots > > - sched_policy=2 does not use HW scheduling, so the driver > > manually assigns > > queues to HW slots by programming registers > > The "no HW scheduling" option is for debug & new hardware bringup > > only, so > > has less test coverage than the other options. Default in the > > current code > > is "HW scheduling without oversubscription" since that is where > > we have the > > most test coverage but we expect to change the default to "HW > > scheduling > > with oversubscription" after further testing. This effectively > > removes the > > HW limit on the number of work queues available to applications. > > Programs running on the GPU are associated with an address space > > through the > > VMID field, which is translated to a unique PASID at access time > > via a set > > of 16 VMID-to-PASID mapping registers. The available VMIDs > > (currently 16) > > are partitioned (under control of the radeon kgd) between current > > gfx/compute and HSA compute, with each getting 8 in the current > > code. The > > VMID-to-PASID mapping registers are updated by the HW scheduler > > when used, > > and by driver code if HW scheduling is not being used. > > The Sea Islands compute queues use a new "doorbell" mechanism > > instead of the > > earlier kernel-managed write pointer registers. Doorbells use a > > separate BAR > > dedicated for this purpose, and pages within the doorbell > > aperture are > > mapped to userspace (each page mapped to only one user address > > space). > > Writes to the doorbell aperture are intercepted by GPU hardware, > > allowing > > userspace code to safely manage work queues (rings) without > > requiring a > > kernel call for every ring update. > > First step for an application process is to open the kfd device. > > Calls to > > open create a kfd "process" structure only for the first thread > > of the > > process. Subsequent open calls are checked to see if they are > > from processes > > using the same mm_struct and, if so, don't do anything. The kfd > > per-process > > data lives as long as the mm_struct exists. Each mm_struct is > > associated > > with a unique PASID, allowing the IOMMUv2 to make userspace > > process memory > > accessible to the GPU. > > Next step is for the application to collect topology information > > via sysfs. > > This gives userspace enough information to be able to identify > > specific > > nodes (processors) in subsequent queue management calls. > > Application > > processes can create queues on multiple processors, and > > processors support > > queues from multiple processes. > > I am not a fan to use sysfs to discover topoly. > > > At this point the application can create work queues in userspace > > memory and > > pass them through the usermode library to kfd to have them mapped > > onto HW > > queue slots so that commands written to the queues can be > > executed by the > > GPU. Queue operations specify a processor node, and so the bulk > > of this code > > is device-specific. > > Written by John Bridgman > > So general comment is you need to collapse many patches things like > 58 fixing > kernel style should be collapsed ie fix all previous patch that have > broken > style. > > Even worse is thing like 71, removing code you just added few patch > earlier > in the patchset. Not quite, the code was added on patch 11. > This means that if i start reviewing following > patch order > i might spend time on code that is later deleted/modified/fixed ie > time i > spend understanding and reading some code might be just wasted. Quick answer is that you are of course right, but having said that, I think it would be not simple at all to do that squashing. I squashed what I could, and probably I can do a little more (like 58), but the major problem is that one of the main modules of the driver - the scheduler (QCM) - was completely re-written (patches 46-53). Therefore, from patch 1 to 53, we use the old scheduler code and from 54 we use the new QCM (and the old scheduler code was completely remove at 71). So I could maybe squash 71 into 54, but that won't help you much, I think. So, the current advice I can give is to completely ignore the following files because they do not exist in the final commit: - kfd_sched_cik_static.c (stopped using in 54) - kfd_registers.c (stopped using in 81 because we moved all register writes to radeon) - kfd_hw_pointer_store.c (alive from 46 to 67) - kfd_hw_pointer_store.h (alive from 46 to 67) Oded > I will come back with individual patch comments and also general > remarks. > > Cheers, > Jérôme > > > > Alexey Skidanov (4): > > hsa/radeon: 32-bit processes support > > hsa/radeon: NULL pointer dereference bug workaround > > hsa/radeon: HSA64/HSA32 modes support > > hsa/radeon: Add local memory to topology > > Andrew Lewycky (3): > > hsa/radeon: Make binding of process to device permanent > > hsa/radeon: Implement hsaKmtSetMemoryPolicy > > mm: Change timing of notification to IOMMUs about a page to be > > invalidated > > Ben Goz (20): > > hsa/radeon: Add queue and hw_pointer_store modules > > hsa/radeon: Add support allocating kernel doorbells > > hsa/radeon: Add mqd_manager module > > hsa/radeon: Add kernel queue support for KFD > > hsa/radeon: Add module parameter of scheduling policy > > hsa/radeon: Add packet manager module > > hsa/radeon: Add process queue manager module > > hsa/radeon: Add device queue manager module > > hsa/radeon: Switch to new queue scheduler > > hsa/radeon: Add IOCTL for update queue > > hsa/radeon: Queue Management integration with Memory Management > > hsa/radeon: update queue fault handling > > hsa/radeon: fixing a bug to support 32b processes > > hsa/radeon: Fix number of pipes per ME > > hsa/radeon: Removing hw pointer store module > > hsa/radeon: Adding some error messages > > hsa/radeon: Fixing minor issues with kernel queues (DIQ) > > drm/radeon: Add register access functions to kfd2kgd interface > > hsa/radeon: Eliminating all direct register accesses > > drm/radeon: Remove lock functions from kfd2kgd interface > > Evgeny Pinchuk (9): > > hsa/radeon: fix the OEMID assignment in kfd_topology > > drm/radeon: extending kfd-kgd interface > > hsa/radeon: implementing IOCTL for clock counters > > drm/radeon: adding synchronization for GRBM GFX > > hsa/radeon: fixing clock counters bug > > drm/radeon: Extending kfd interface > > hsa/radeon: Adding max clock speeds to topology > > hsa/radeon: Alternating the source of max clock > > hsa/radeon: Exclusive access for perf. counters > > Michael Varga (1): > > hsa/radeon: debugging print statements > > Oded Gabbay (45): > > mm: Add kfd_process pointer to mm_struct > > drm/radeon: reduce number of free VMIDs and pipes in KV > > drm/radeon: Report doorbell configuration to kfd > > drm/radeon: Add radeon <--> kfd interface > > drm/radeon: Add kfd-->kgd interface to get virtual ram size > > drm/radeon: Add kfd-->kgd interfaces of memory > > allocation/mapping > > drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl > > register > > drm/radeon: Add calls to initialize and finalize kfd from radeon > > hsa/radeon: Add code base of hsa driver for AMD's GPUs > > hsa/radeon: Add initialization and unmapping of doorbell > > aperture > > hsa/radeon: Add scheduler code > > hsa/radeon: Add kfd mmap handler > > hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and > > DESTROY_QUEUE > > hsa/radeon: Update MAINTAINERS and CREDITS files > > hsa/radeon: Add interrupt handling module > > hsa/radeon: Add the isr function of the KFD scehduler > > hsa/radeon: Handle deactivation of queues using interrupts > > hsa/radeon: Enable interrupts in KFD scheduler > > hsa/radeon: Enable/Disable KFD interrupt module > > hsa/radeon: Add interrupt callback function to kgd2kfd interface > > hsa/radeon: Add kgd-->kfd interfaces for suspend and resume > > drm/radeon: Add calls to suspend and resume of kfd driver > > drm/radeon/cik: Don't touch int of pipes 1-7 > > drm/radeon/cik: Call kfd isr function > > hsa/radeon: Fix memory size allocated for HPD > > hsa/radeon: Fix list of supported devices > > hsa/radeon: Fix coding style in cik_int.h > > hsa/radeon: Print ioctl commnad only in debug mode > > hsa/radeon: Print ISR info only in debug mode > > hsa/radeon: Workaround for a bug in amd_iommu > > hsa/radeon: Eliminate warnings in compilation > > hsa/radeon: Various kernel styling fixes > > hsa/radeon: Rearrange structures in kfd_ioctl.h > > hsa/radeon: change another pr_info to pr_debug > > hsa/radeon: Fix timeout calculation in sync_with_hw > > hsa/radeon: Update module information and version > > hsa/radeon: Update module version to 0.6.0 > > hsa/radeon: Fix initialization of sh_mem registers > > hsa/radeon: Fix compilation warnings > > hsa/radeon: Remove old scheduler code > > hsa/radeon: Static analysis (smatch) fixes > > hsa/radeon: Check oversubscription before destroying runlist > > hsa/radeon: Don't verify cksum when parsing CRAT table > > hsa/radeon: Update module version to 0.6.1 > > hsa/radeon: Update module version to 0.6.2 > > Yair Shachar (1): > > hsa/radeon: Adding qcm fence return status > > CREDITS | 7 + > > MAINTAINERS | 8 + > > drivers/Kconfig | 2 + > > drivers/gpu/Makefile | 1 + > > drivers/gpu/drm/radeon/Makefile | 1 + > > drivers/gpu/drm/radeon/cik.c | 156 +-- > > drivers/gpu/drm/radeon/cikd.h | 51 +- > > drivers/gpu/drm/radeon/radeon.h | 9 + > > drivers/gpu/drm/radeon/radeon_device.c | 32 + > > drivers/gpu/drm/radeon/radeon_drv.c | 6 + > > drivers/gpu/drm/radeon/radeon_kfd.c | 630 > > ++++++++++ > > drivers/gpu/drm/radeon/radeon_kms.c | 9 + > > drivers/gpu/hsa/Kconfig | 20 + > > drivers/gpu/hsa/Makefile | 1 + > > drivers/gpu/hsa/radeon/Makefile | 12 + > > drivers/gpu/hsa/radeon/cik_int.h | 50 + > > drivers/gpu/hsa/radeon/cik_mqds.h | 250 ++++ > > drivers/gpu/hsa/radeon/cik_regs.h | 220 ++++ > > drivers/gpu/hsa/radeon/kfd_aperture.c | 123 ++ > > drivers/gpu/hsa/radeon/kfd_chardev.c | 530 > > +++++++++ > > drivers/gpu/hsa/radeon/kfd_crat.h | 294 +++++ > > drivers/gpu/hsa/radeon/kfd_device.c | 244 ++++ > > drivers/gpu/hsa/radeon/kfd_device_queue_manager.c | 981 > > ++++++++++++++++ > > drivers/gpu/hsa/radeon/kfd_device_queue_manager.h | 101 ++ > > drivers/gpu/hsa/radeon/kfd_doorbell.c | 242 ++++ > > drivers/gpu/hsa/radeon/kfd_interrupt.c | 177 +++ > > drivers/gpu/hsa/radeon/kfd_kernel_queue.c | 305 +++++ > > drivers/gpu/hsa/radeon/kfd_kernel_queue.h | 66 ++ > > drivers/gpu/hsa/radeon/kfd_module.c | 130 +++ > > drivers/gpu/hsa/radeon/kfd_mqd_manager.c | 290 +++++ > > drivers/gpu/hsa/radeon/kfd_mqd_manager.h | 54 + > > drivers/gpu/hsa/radeon/kfd_packet_manager.c | 488 > > ++++++++ > > drivers/gpu/hsa/radeon/kfd_pasid.c | 97 ++ > > drivers/gpu/hsa/radeon/kfd_pm4_headers.h | 682 > > +++++++++++ > > drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h | 107 ++ > > drivers/gpu/hsa/radeon/kfd_priv.h | 475 > > ++++++++ > > drivers/gpu/hsa/radeon/kfd_process.c | 391 +++++++ > > drivers/gpu/hsa/radeon/kfd_process_queue_manager.c | 343 ++++++ > > drivers/gpu/hsa/radeon/kfd_queue.c | 109 ++ > > drivers/gpu/hsa/radeon/kfd_scheduler.h | 72 ++ > > drivers/gpu/hsa/radeon/kfd_topology.c | 1207 > > ++++++++++++++++++++ > > drivers/gpu/hsa/radeon/kfd_topology.h | 168 +++ > > drivers/gpu/hsa/radeon/kfd_vidmem.c | 97 ++ > > include/linux/mm_types.h | 14 + > > include/linux/radeon_kfd.h | 106 ++ > > include/uapi/linux/kfd_ioctl.h | 133 +++ > > mm/rmap.c | 8 +- > > 47 files changed, 9402 insertions(+), 97 deletions(-) > > create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c > > create mode 100644 drivers/gpu/hsa/Kconfig > > create mode 100644 drivers/gpu/hsa/Makefile > > create mode 100644 drivers/gpu/hsa/radeon/Makefile > > create mode 100644 drivers/gpu/hsa/radeon/cik_int.h > > create mode 100644 drivers/gpu/hsa/radeon/cik_mqds.h > > create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_aperture.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c > > create mode 100644 > > drivers/gpu/hsa/radeon/kfd_device_queue_manager.c > > create mode 100644 > > drivers/gpu/hsa/radeon/kfd_device_queue_manager.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_kernel_queue.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_mqd_manager.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_packet_manager.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_headers.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_pm4_opcodes.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c > > create mode 100644 > > drivers/gpu/hsa/radeon/kfd_process_queue_manager.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_queue.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c > > create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h > > create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c > > create mode 100644 include/linux/radeon_kfd.h > > create mode 100644 include/uapi/linux/kfd_ioctl.h > > -- > > 1.9.1 ????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?