Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753779AbcKBMp5 (ORCPT ); Wed, 2 Nov 2016 08:45:57 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:6336 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753148AbcKBMpz (ORCPT ); Wed, 2 Nov 2016 08:45:55 -0400 X-PGP-Universal: processed; by hqnvupgp07.nvidia.com on Tue, 01 Nov 2016 17:45:39 -0700 Subject: Re: [Qemu-devel] [PATCH v9 04/12] vfio iommu: Add support for mediated devices To: Jike Song , Alexey Kardashevskiy References: <1476739332-4911-1-git-send-email-kwankhede@nvidia.com> <1476739332-4911-5-git-send-email-kwankhede@nvidia.com> <62ade373-6edc-c7f3-c205-200cf4fd211f@nvidia.com> <45b517de-3766-e96b-fec0-2b77da4dcf8d@nvidia.com> <695ca09f-b332-d33a-22fb-073f03dfaebf@ozlabs.ru> <5819DA45.9000200@intel.com> CC: , , , , , , , , X-Nvconfidentiality: public From: Kirti Wankhede Message-ID: <64066f1a-3d25-fede-f383-078b5f8f0fa0@nvidia.com> Date: Wed, 2 Nov 2016 18:11:47 +0530 MIME-Version: 1.0 In-Reply-To: <5819DA45.9000200@intel.com> X-Originating-IP: [10.24.216.210] X-ClientProxiedBy: BGMAIL101.nvidia.com (10.25.59.10) To bgmail102.nvidia.com (10.25.59.11) Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3487 Lines: 84 On 11/2/2016 5:51 PM, Jike Song wrote: > On 11/02/2016 12:09 PM, Alexey Kardashevskiy wrote: >> On 02/11/16 14:29, Kirti Wankhede wrote: >>> >>> >>> On 11/2/2016 6:54 AM, Alexey Kardashevskiy wrote: >>>> On 02/11/16 01:01, Kirti Wankhede wrote: >>>>> >>>>> >>>>> On 10/28/2016 7:48 AM, Alexey Kardashevskiy wrote: >>>>>> On 27/10/16 23:31, Kirti Wankhede wrote: >>>>>>> >>>>>>> >>>>>>> On 10/27/2016 12:50 PM, Alexey Kardashevskiy wrote: >>>>>>>> On 18/10/16 08:22, Kirti Wankhede wrote: >>>>>>>>> VFIO IOMMU drivers are designed for the devices which are IOMMU capable. >>>>>>>>> Mediated device only uses IOMMU APIs, the underlying hardware can be >>>>>>>>> managed by an IOMMU domain. >>>>>>>>> >>>>>>>>> Aim of this change is: >>>>>>>>> - To use most of the code of TYPE1 IOMMU driver for mediated devices >>>>>>>>> - To support direct assigned device and mediated device in single module >>>>>>>>> >>>>>>>>> Added two new callback functions to struct vfio_iommu_driver_ops. Backend >>>>>>>>> IOMMU module that supports pining and unpinning pages for mdev devices >>>>>>>>> should provide these functions. >>>>>>>>> Added APIs for pining and unpining pages to VFIO module. These calls back >>>>>>>>> into backend iommu module to actually pin and unpin pages. >>>>>>>>> >>>>>>>>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU >>>>>>>>> backend module. More details: >>>>>>>>> - When iommu_group of mediated devices is attached, task structure is >>>>>>>>> cached which is used later to pin pages and page accounting. >>>>>>>> >>>>>>>> >>>>>>>> For SPAPR TCE IOMMU driver, I ended up caching mm_struct with >>>>>>>> atomic_inc(&container->mm->mm_count) (patches are on the way) instead of >>>>>>>> using @current or task as the process might be gone while VFIO container is >>>>>>>> still alive and @mm might be needed to do proper cleanup; this might not be >>>>>>>> an issue with this patchset now but still you seem to only use @mm from >>>>>>>> task_struct. >>>>>>>> >>>>>>> >>>>>>> Consider the example of QEMU process which creates VFIO container, QEMU >>>>>>> in its teardown path would release the container. How could container be >>>>>>> alive when process is gone? >>>>>> >>>>>> do_exit() in kernel/exit.c calls exit_mm() (which sets NULL to tsk->mm) >>>>>> first, and then releases open files by calling exit_files(). So >>>>>> container's release() does not have current->mm. >>>>>> >>>>> >>>>> Incrementing usage count (get_task_struct()) while saving task structure >>>>> and decementing it (put_task_struct()) from release() should work here. >>>>> Updating the patch. >>>> >>>> I cannot see how the task->usage counter prevents do_exit() from performing >>>> the exit, can you? >>>> >>> >>> It will not prevent exit from do_exit(), but that will make sure that we >>> don't have stale pointer of task structure. Then we can check whether >>> the task is alive and get mm pointer in teardown path as below: >> >> >> Or you could just reference and use @mm as KVM and others do. Or there is >> anything else you need from @current than just @mm? >> > > I agree. If @mm is the only thing needed, there is really no reason to > refer to the @task :-) > In vfio_lock_acct(), that is for page accounting, if mm->mmap_sem is already held then page accounting is deferred, where task structure is used to get mm and work is deferred only if mm exist: mm = get_task_mm(task); That is where this module need task structure. Thanks, Kirti