Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp6123imm; Wed, 5 Sep 2018 19:50:50 -0700 (PDT) X-Google-Smtp-Source: ANB0VdaPgCaIAG3+9ynd7poHNGvvd0OXsZUifHemSkLWAneryfvAa5F9O9zwmRrj7MaOJCK+zFXq X-Received: by 2002:a17:902:49:: with SMTP id 67-v6mr676061pla.206.1536202250309; Wed, 05 Sep 2018 19:50:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1536202250; cv=none; d=google.com; s=arc-20160816; b=bMYBBlJCeXS8JNQazZx+e68QIKdCZjWVGIX5uA3ng9MmEjqm0y3/maUnsZJ/kggv8k qsj5wNg2nONdliX94HxB4D2idnpQH+L2V8PZa3FDhSfb2uqj5gqzVg8DjFyoIyzCU8ft +dsOwJN+SrKPE81ILsiGbdF3fm2IxG0zb90f2ECLDsUriVBTVATbx1myqIz1XVcRzeLA RGxbcqxa0myxL8HcZI4O7qDUEwmr/MsB38a/mS00Rp09w6CO8xfLbkWg/wUjzVTRrelJ lqm/zQ85bQZpltQ+ODbdwC5GfAHb1tuca5lrrLGBti9uitdXdZCJ7fyCqEv4Ci453iTY wl7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:to:subject:cc; bh=X7jeTYFGBWQA9ezMCS/DE7vvR6sWfiK4eCej2OEvsRw=; b=URAyKRuClO0IPSJ+h/8novO7MBkDMHhG7EHmaKUIJHyyZQquD50kV6z1qbPMzMDM0t 581plW867yFvgcxpVXiiMVm2kkn3e0/2Ce++4PeAR1T3y3dBMhDLyx96F7/nZiVh/h+D ykKcRI5A3rCWIG+wLjxHlR3IOkA6ENN5imJo9aICv6ryPgwsaVU9DWPDFGZ0Roybwqi/ VO+/8sl8q/bdv3f4xFgnso+KpLSrofO2qEUm0hvdsvxgQS5h+aHzFzJEbxwUq0vMPUu6 F2b9lzDrHLDSyuCQBtH6ZiyDBVfmv0ogtq8MPn4JIrMajoc2mIdFmBLjQARbLXBFDpnx JdCw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d11-v6si884270pla.245.2018.09.05.19.50.33; Wed, 05 Sep 2018 19:50:50 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727665AbeIFHWU (ORCPT + 99 others); Thu, 6 Sep 2018 03:22:20 -0400 Received: from mga01.intel.com ([192.55.52.88]:3961 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725959AbeIFHWT (ORCPT ); Thu, 6 Sep 2018 03:22:19 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Sep 2018 19:47:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,334,1531810800"; d="scan'208";a="68741689" Received: from allen-box.sh.intel.com (HELO [10.239.161.122]) ([10.239.161.122]) by fmsmga008.fm.intel.com with ESMTP; 05 Sep 2018 19:47:07 -0700 Cc: baolu.lu@linux.intel.com, "Raj, Ashok" , "Kumar, Sanjay K" , "Pan, Jacob jun" , "Liu, Yi L" , "Sun, Yi Y" , "peterx@redhat.com" , Jean-Philippe Brucker , "iommu@lists.linux-foundation.org" , "linux-kernel@vger.kernel.org" , Jacob Pan Subject: Re: [PATCH v2 02/12] iommu/vt-d: Manage scalalble mode PASID tables To: "Tian, Kevin" , Joerg Roedel , David Woodhouse References: <20180830013524.28743-1-baolu.lu@linux.intel.com> <20180830013524.28743-3-baolu.lu@linux.intel.com> From: Lu Baolu Message-ID: <44298d5c-5720-a382-07d1-a90a072ff24b@linux.intel.com> Date: Thu, 6 Sep 2018 10:46:03 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On 09/06/2018 10:14 AM, Tian, Kevin wrote: >> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] >> Sent: Thursday, August 30, 2018 9:35 AM >> >> In scalable mode, pasid structure is a two level table with >> a pasid directory table and a pasid table. Any pasid entry >> can be identified by a pasid value in below way. >> >> 1 >> 9 6 5 0 >> .-----------------------.-------. >> | PASID | | >> '-----------------------'-------' .-------------. >> | | | | >> | | | | >> | | | | >> | .-----------. | .-------------. >> | | | |----->| PASID Entry | >> | | | | '-------------' >> | | | |Plus | | >> | .-----------. | | | >> |---->| DIR Entry |-------->| | >> | '-----------' '-------------' >> .---------. |Plus | | >> | Context | | | | >> | Entry |------->| | >> '---------' '-----------' >> >> This changes the pasid table APIs to support scalable mode >> PASID directory and PASID table. It also adds a helper to >> get the PASID table entry according to the pasid value. >> >> Cc: Ashok Raj >> Cc: Jacob Pan >> Cc: Kevin Tian >> Cc: Liu Yi L >> Signed-off-by: Sanjay Kumar >> Signed-off-by: Lu Baolu >> Reviewed-by: Ashok Raj >> --- >> drivers/iommu/intel-iommu.c | 2 +- >> drivers/iommu/intel-pasid.c | 72 ++++++++++++++++++++++++++++++++---- >> - >> drivers/iommu/intel-pasid.h | 10 +++++- >> drivers/iommu/intel-svm.c | 6 +--- >> 4 files changed, 74 insertions(+), 16 deletions(-) >> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c >> index 5845edf4dcf9..b0da4f765274 100644 >> --- a/drivers/iommu/intel-iommu.c >> +++ b/drivers/iommu/intel-iommu.c >> @@ -2507,7 +2507,7 @@ static struct dmar_domain >> *dmar_insert_one_dev_info(struct intel_iommu *iommu, >> if (dev) >> dev->archdata.iommu = info; >> >> - if (dev && dev_is_pci(dev) && info->pasid_supported) { >> + if (dev && dev_is_pci(dev) && sm_supported(iommu)) { > > worthy of a comment here that PASID table now is mandatory in > scalable mode, instead of optional for 1st level usage before. Fair enough. Will add in the next version. > >> ret = intel_pasid_alloc_table(dev); >> if (ret) { >> __dmar_remove_one_dev_info(info); >> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c >> index fe95c9bd4d33..d6e90cd5b062 100644 >> --- a/drivers/iommu/intel-pasid.c >> +++ b/drivers/iommu/intel-pasid.c >> @@ -127,8 +127,7 @@ int intel_pasid_alloc_table(struct device *dev) >> int ret, order; >> >> info = dev->archdata.iommu; >> - if (WARN_ON(!info || !dev_is_pci(dev) || >> - !info->pasid_supported || info->pasid_table)) >> + if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table)) >> return -EINVAL; > > following same logic should you check sm_supported here? If not sm_supported, info->pasid_table should be NULL. Checking info->pasid_table is better since even sm_supported, the pasid table pointer could still possible to be empty. > >> >> /* DMA alias device already has a pasid table, use it: */ >> @@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev) >> return -ENOMEM; >> INIT_LIST_HEAD(&pasid_table->dev); >> >> - size = sizeof(struct pasid_entry); >> + size = sizeof(struct pasid_dir_entry); >> count = min_t(int, pci_max_pasids(to_pci_dev(dev)), >> intel_pasid_max_id); >> + count >>= PASID_PDE_SHIFT; >> order = get_order(size * count); >> pages = alloc_pages_node(info->iommu->node, >> GFP_ATOMIC | __GFP_ZERO, >> @@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev) >> >> pasid_table->table = page_address(pages); >> pasid_table->order = order; >> - pasid_table->max_pasid = count; >> + pasid_table->max_pasid = count << PASID_PDE_SHIFT; > > are you sure of that count is PDE_SHIFT aligned? otherwise >> > then << would lose some bits. If sure, then better add some check. I am making the max_pasid PDE_SHIFT aligned as the result of shift operations. > >> >> attach_out: >> device_attach_pasid_table(info, pasid_table); >> @@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev) >> return 0; >> } >> >> +/* Get PRESENT bit of a PASID directory entry. */ >> +static inline bool >> +pasid_pde_is_present(struct pasid_dir_entry *pde) >> +{ >> + return READ_ONCE(pde->val) & PASID_PTE_PRESENT; > > curious why adding READ_ONCE specifically for PASID structure, > but not used for any other existing vtd structures? Is it to address > some specific requirement on PASID structure as defined in spec? READ/WRITE_ONCE are used in pasid entry read/write to prevent the compiler from merging, refetching or reordering successive instances of read/write. > >> +} >> + >> +/* Get PASID table from a PASID directory entry. */ >> +static inline struct pasid_entry * >> +get_pasid_table_from_pde(struct pasid_dir_entry *pde) >> +{ >> + if (!pasid_pde_is_present(pde)) >> + return NULL; >> + >> + return phys_to_virt(READ_ONCE(pde->val) & PDE_PFN_MASK); >> +} >> + >> void intel_pasid_free_table(struct device *dev) >> { >> struct device_domain_info *info; >> struct pasid_table *pasid_table; >> + struct pasid_dir_entry *dir; >> + struct pasid_entry *table; >> + int i, max_pde; >> >> info = dev->archdata.iommu; >> - if (!info || !dev_is_pci(dev) || >> - !info->pasid_supported || !info->pasid_table) >> + if (!info || !dev_is_pci(dev) || !info->pasid_table) >> return; >> >> pasid_table = info->pasid_table; >> @@ -178,6 +197,14 @@ void intel_pasid_free_table(struct device *dev) >> if (!list_empty(&pasid_table->dev)) >> return; >> >> + /* Free scalable mode PASID directory tables: */ >> + dir = pasid_table->table; >> + max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT; >> + for (i = 0; i < max_pde; i++) { >> + table = get_pasid_table_from_pde(&dir[i]); >> + free_pgtable_page(table); >> + } >> + >> free_pages((unsigned long)pasid_table->table, pasid_table->order); >> kfree(pasid_table); >> } >> @@ -206,17 +233,37 @@ int intel_pasid_get_dev_max_id(struct device >> *dev) >> >> struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid) >> { >> + struct device_domain_info *info; >> struct pasid_table *pasid_table; >> + struct pasid_dir_entry *dir; >> struct pasid_entry *entries; >> + int dir_index, index; >> >> pasid_table = intel_pasid_get_table(dev); >> if (WARN_ON(!pasid_table || pasid < 0 || >> pasid >= intel_pasid_get_dev_max_id(dev))) >> return NULL; >> >> - entries = pasid_table->table; >> + dir = pasid_table->table; >> + info = dev->archdata.iommu; >> + dir_index = pasid >> PASID_PDE_SHIFT; >> + index = pasid & PASID_PTE_MASK; >> + >> + spin_lock(&pasid_lock); >> + entries = get_pasid_table_from_pde(&dir[dir_index]); >> + if (!entries) { >> + entries = alloc_pgtable_page(info->iommu->node); >> + if (!entries) { >> + spin_unlock(&pasid_lock); >> + return NULL; >> + } >> + >> + WRITE_ONCE(dir[dir_index].val, >> + (u64)virt_to_phys(entries) | PASID_PTE_PRESENT); >> + } >> + spin_unlock(&pasid_lock); >> >> - return &entries[pasid]; >> + return &entries[index]; >> } >> >> /* >> @@ -224,7 +271,14 @@ struct pasid_entry *intel_pasid_get_entry(struct >> device *dev, int pasid) >> */ >> static inline void pasid_clear_entry(struct pasid_entry *pe) >> { >> - WRITE_ONCE(pe->val, 0); >> + WRITE_ONCE(pe->val[0], 0); >> + WRITE_ONCE(pe->val[1], 0); >> + WRITE_ONCE(pe->val[2], 0); >> + WRITE_ONCE(pe->val[3], 0); >> + WRITE_ONCE(pe->val[4], 0); >> + WRITE_ONCE(pe->val[5], 0); >> + WRITE_ONCE(pe->val[6], 0); >> + WRITE_ONCE(pe->val[7], 0); > > memset? The order is important here. Otherwise, the PRESENT bit of this pasid entry might still set while other fields contains invalid values. > >> } >> >> void intel_pasid_clear_entry(struct device *dev, int pasid) >> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h >> index 1c05ed6fc5a5..12f480c2bb8b 100644 >> --- a/drivers/iommu/intel-pasid.h >> +++ b/drivers/iommu/intel-pasid.h >> @@ -12,11 +12,19 @@ >> >> #define PASID_MIN 0x1 >> #define PASID_MAX 0x100000 >> +#define PASID_PTE_MASK 0x3F >> +#define PASID_PTE_PRESENT 1 >> +#define PDE_PFN_MASK PAGE_MASK >> +#define PASID_PDE_SHIFT 6 >> >> -struct pasid_entry { >> +struct pasid_dir_entry { >> u64 val; >> }; >> >> +struct pasid_entry { >> + u64 val[8]; >> +}; >> + >> /* The representative of a PASID table */ >> struct pasid_table { >> void *table; /* pasid table pointer */ >> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c >> index 4a03e5090952..6c0bd9ee9602 100644 >> --- a/drivers/iommu/intel-svm.c >> +++ b/drivers/iommu/intel-svm.c >> @@ -65,8 +65,6 @@ int intel_svm_init(struct intel_iommu *iommu) >> >> order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max); >> if (ecap_dis(iommu->ecap)) { >> - /* Just making it explicit... */ >> - BUILD_BUG_ON(sizeof(struct pasid_entry) != sizeof(struct >> pasid_state_entry)); >> pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order); >> if (pages) >> iommu->pasid_state_table = page_address(pages); >> @@ -406,9 +404,7 @@ int intel_svm_bind_mm(struct device *dev, int >> *pasid, int flags, struct svm_dev_ >> pasid_entry_val |= PASID_ENTRY_FLPM_5LP; >> >> entry = intel_pasid_get_entry(dev, svm->pasid); >> - entry->val = pasid_entry_val; >> - >> - wmb(); >> + WRITE_ONCE(entry->val[0], pasid_entry_val); >> >> /* >> * Flush PASID cache when a PASID table entry becomes >> -- >> 2.17.1 > > Best regards, Lu Baolu