2014-01-07 09:00:09

by Jiang Liu

Subject: [Patch Part2 V1 00/14] Enhance DMAR drivers to handle PCI/memory hotplug events

Intel DMA/interrupt remapping drivers scan available PCI/memory devices
at startup and cache the discovered hardware topology. They do not
update the cached information when a PCI/memory hotplug event happens
at runtime, so the stale information may break DMA/interrupt remapping
logic.

This patchset first (Patches 1-8) introduces some helper functions
and fixes several bugs, then (Patches 9-10) uses a global rwsem and
RCU to protect global DMA/interrupt remapping data structures, and
finally (Patches 11-14) hooks PCI/memory hotplug events to update the
cached information.

It is also a preparation for supporting DMA/interrupt remapping
hotplug.
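
As a rough sketch, the locking rules introduced by Patches 9-10 look
like this (illustrative kernel-style code, not literal call sites;
free_dmaru() is a hypothetical teardown helper):

```c
/* Writer (PCI/memory hotplug path, process context): take the global
 * rwsem for writing and unpublish list entries with the RCU list
 * primitives. */
down_write(&dmar_global_lock);
list_del_rcu(&dmaru->list);
up_write(&dmar_global_lock);
synchronize_rcu();              /* wait for in-flight RCU readers */
free_dmaru(dmaru);              /* hypothetical teardown helper */

/* Reader in process context: holding the rwsem is sufficient. */
down_read(&dmar_global_lock);
for_each_drhd_unit(dmaru)
        /* ... use dmaru ... */;
up_read(&dmar_global_lock);

/* Reader in interrupt context: cannot sleep on the rwsem, so it
 * relies on RCU alone (Patch 10). */
rcu_read_lock();
for_each_drhd_unit(dmaru)
        /* ... use dmaru ... */;
rcu_read_unlock();
```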

Jiang Liu (14):
iommu/vt-d: factor out dmar_alloc_dev_scope() for later reuse
iommu/vt-d: move private structures and variables into intel-iommu.c
iommu/vt-d: simplify function get_domain_for_dev()
iommu/vt-d: free resources if failed to create domain for PCIe
endpoint
iommu/vt-d: create device_domain_info structure for intermediate P2P
bridges
iommu/vt-d: fix incorrect iommu_count for si_domain
iommu/vt-d: fix error in detect ATS capability
iommu/vt-d: introduce macro for_each_dev_scope() to walk device scope
entries
iommu/vt-d: introduce a rwsem to protect global data structures
iommu/vt-d: use RCU to protect global resources in interrupt context
iommu/vt-d, PCI: update DRHD/RMRR/ATSR device scope caches when PCI
hotplug happens
iommu/vt-d, PCI: unify the way to process DMAR device scope array
iommu/vt-d: update device to static identity domain mapping for PCI
hotplug
iommu/vt-d: update IOMMU state when memory hotplug happens

drivers/iommu/dmar.c | 412 +++++++++++++++++--------
drivers/iommu/intel-iommu.c | 583 +++++++++++++++++++++--------------
drivers/iommu/intel_irq_remapping.c | 108 ++++---
include/linux/dmar.h | 75 +++--
4 files changed, 753 insertions(+), 425 deletions(-)

--
1.7.10.4


2014-01-07 09:00:13

by Jiang Liu

Subject: [Patch Part2 V1 01/14] iommu/vt-d: factor out dmar_alloc_dev_scope() for later reuse

Factor out function dmar_alloc_dev_scope() from dmar_parse_dev_scope()
for later reuse.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/dmar.c | 28 ++++++++++++++++------------
include/linux/dmar.h | 1 +
2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 248332e..4a4c9a1 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -117,13 +117,9 @@ static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope *scope,
return 0;
}

-int __init dmar_parse_dev_scope(void *start, void *end, int *cnt,
- struct pci_dev ***devices, u16 segment)
+void *dmar_alloc_dev_scope(void *start, void *end, int *cnt)
{
struct acpi_dmar_device_scope *scope;
- void * tmp = start;
- int index;
- int ret;

*cnt = 0;
while (start < end) {
@@ -138,15 +134,24 @@ int __init dmar_parse_dev_scope(void *start, void *end, int *cnt,
start += scope->length;
}
if (*cnt == 0)
- return 0;
+ return NULL;
+
+ return kcalloc(*cnt, sizeof(struct pci_dev *), GFP_KERNEL);
+}
+
+int __init dmar_parse_dev_scope(void *start, void *end, int *cnt,
+ struct pci_dev ***devices, u16 segment)
+{
+ struct acpi_dmar_device_scope *scope;
+ int index, ret;

- *devices = kcalloc(*cnt, sizeof(struct pci_dev *), GFP_KERNEL);
- if (!*devices)
+ *devices = dmar_alloc_dev_scope(start, end, cnt);
+ if (*cnt == 0)
+ return 0;
+ else if (!*devices)
return -ENOMEM;

- start = tmp;
- index = 0;
- while (start < end) {
+ for (index = 0; start < end; start += scope->length) {
scope = start;
if (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT ||
scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE) {
@@ -158,7 +163,6 @@ int __init dmar_parse_dev_scope(void *start, void *end, int *cnt,
}
index ++;
}
- start += scope->length;
}

return 0;
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index bd5026b..198eb09 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -151,6 +151,7 @@ struct dmar_atsr_unit {
int dmar_parse_rmrr_atsr_dev(void);
extern int dmar_parse_one_rmrr(struct acpi_dmar_header *header);
extern int dmar_parse_one_atsr(struct acpi_dmar_header *header);
+extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt);
extern int dmar_parse_dev_scope(void *start, void *end, int *cnt,
struct pci_dev ***devices, u16 segment);
extern void dmar_free_dev_scope(struct pci_dev ***devices, int *cnt);
--
1.7.10.4

2014-01-07 09:00:17

by Jiang Liu

Subject: [Patch Part2 V1 03/14] iommu/vt-d: simplify function get_domain_for_dev()

Function get_domain_for_dev() is a little complex; simplify it
by factoring out dmar_search_domain_by_dev_info() and
dmar_insert_dev_info().

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/intel-iommu.c | 161 ++++++++++++++++++++-----------------------
1 file changed, 75 insertions(+), 86 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1704e97..da65884 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1957,18 +1957,70 @@ find_domain(struct pci_dev *pdev)
return NULL;
}

+static inline struct dmar_domain *
+dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
+{
+ struct device_domain_info *info;
+
+ list_for_each_entry(info, &device_domain_list, global)
+ if (info->segment == segment && info->bus == bus &&
+ info->devfn == devfn)
+ return info->domain;
+
+ return NULL;
+}
+
+static int dmar_insert_dev_info(int segment, int bus, int devfn,
+ struct pci_dev *dev, struct dmar_domain **domp)
+{
+ struct dmar_domain *found, *domain = *domp;
+ struct device_domain_info *info;
+ unsigned long flags;
+
+ info = alloc_devinfo_mem();
+ if (!info)
+ return -ENOMEM;
+
+ info->segment = segment;
+ info->bus = bus;
+ info->devfn = devfn;
+ info->dev = dev;
+ info->domain = domain;
+ if (!dev)
+ domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ if (dev)
+ found = find_domain(dev);
+ else
+ found = dmar_search_domain_by_dev_info(segment, bus, devfn);
+ if (found) {
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+ free_devinfo_mem(info);
+ if (found != domain) {
+ domain_exit(domain);
+ *domp = found;
+ }
+ } else {
+ list_add(&info->link, &domain->devices);
+ list_add(&info->global, &device_domain_list);
+ if (dev)
+ dev->dev.archdata.iommu = info;
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+ }
+
+ return 0;
+}
+
/* domain is initialized */
static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
{
- struct dmar_domain *domain, *found = NULL;
+ struct dmar_domain *domain;
struct intel_iommu *iommu;
struct dmar_drhd_unit *drhd;
- struct device_domain_info *info, *tmp;
- struct pci_dev *dev_tmp;
+ struct pci_dev *bridge;
unsigned long flags;
- int bus = 0, devfn = 0;
- int segment;
- int ret;
+ int segment, bus, devfn;

domain = find_domain(pdev);
if (domain)
@@ -1976,119 +2028,56 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)

segment = pci_domain_nr(pdev->bus);

- dev_tmp = pci_find_upstream_pcie_bridge(pdev);
- if (dev_tmp) {
- if (pci_is_pcie(dev_tmp)) {
- bus = dev_tmp->subordinate->number;
+ bridge = pci_find_upstream_pcie_bridge(pdev);
+ if (bridge) {
+ if (pci_is_pcie(bridge)) {
+ bus = bridge->subordinate->number;
devfn = 0;
} else {
- bus = dev_tmp->bus->number;
- devfn = dev_tmp->devfn;
+ bus = bridge->bus->number;
+ devfn = bridge->devfn;
}
spin_lock_irqsave(&device_domain_lock, flags);
- list_for_each_entry(info, &device_domain_list, global) {
- if (info->segment == segment &&
- info->bus == bus && info->devfn == devfn) {
- found = info->domain;
- break;
- }
- }
+ domain = dmar_search_domain_by_dev_info(segment, bus, devfn);
spin_unlock_irqrestore(&device_domain_lock, flags);
/* pcie-pci bridge already has a domain, uses it */
- if (found) {
- domain = found;
+ if (domain)
goto found_domain;
- }
}

- domain = alloc_domain();
- if (!domain)
- goto error;
-
- /* Allocate new domain for the device */
drhd = dmar_find_matched_drhd_unit(pdev);
if (!drhd) {
printk(KERN_ERR "IOMMU: can't find DMAR for device %s\n",
pci_name(pdev));
- free_domain_mem(domain);
return NULL;
}
iommu = drhd->iommu;

- ret = iommu_attach_domain(domain, iommu);
- if (ret) {
+ /* Allocate new domain for the device */
+ domain = alloc_domain();
+ if (!domain)
+ goto error;
+ if (iommu_attach_domain(domain, iommu)) {
free_domain_mem(domain);
goto error;
}
-
if (domain_init(domain, gaw)) {
domain_exit(domain);
goto error;
}

/* register pcie-to-pci device */
- if (dev_tmp) {
- info = alloc_devinfo_mem();
- if (!info) {
+ if (bridge) {
+ if (dmar_insert_dev_info(segment, bus, devfn, NULL, &domain)) {
domain_exit(domain);
goto error;
}
- info->segment = segment;
- info->bus = bus;
- info->devfn = devfn;
- info->dev = NULL;
- info->domain = domain;
- /* This domain is shared by devices under p2p bridge */
- domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
-
- /* pcie-to-pci bridge already has a domain, uses it */
- found = NULL;
- spin_lock_irqsave(&device_domain_lock, flags);
- list_for_each_entry(tmp, &device_domain_list, global) {
- if (tmp->segment == segment &&
- tmp->bus == bus && tmp->devfn == devfn) {
- found = tmp->domain;
- break;
- }
- }
- if (found) {
- spin_unlock_irqrestore(&device_domain_lock, flags);
- free_devinfo_mem(info);
- domain_exit(domain);
- domain = found;
- } else {
- list_add(&info->link, &domain->devices);
- list_add(&info->global, &device_domain_list);
- spin_unlock_irqrestore(&device_domain_lock, flags);
- }
}

found_domain:
- info = alloc_devinfo_mem();
- if (!info)
- goto error;
- info->segment = segment;
- info->bus = pdev->bus->number;
- info->devfn = pdev->devfn;
- info->dev = pdev;
- info->domain = domain;
- spin_lock_irqsave(&device_domain_lock, flags);
- /* somebody is fast */
- found = find_domain(pdev);
- if (found != NULL) {
- spin_unlock_irqrestore(&device_domain_lock, flags);
- if (found != domain) {
- domain_exit(domain);
- domain = found;
- }
- free_devinfo_mem(info);
+ if (dmar_insert_dev_info(segment, pdev->bus->number, pdev->devfn,
+ pdev, &domain) == 0)
return domain;
- }
- list_add(&info->link, &domain->devices);
- list_add(&info->global, &device_domain_list);
- pdev->dev.archdata.iommu = info;
- spin_unlock_irqrestore(&device_domain_lock, flags);
- return domain;
error:
/* recheck it here, maybe others set it */
return find_domain(pdev);
--
1.7.10.4

2014-01-07 09:00:24

by Jiang Liu

Subject: [Patch Part2 V1 04/14] iommu/vt-d: free resources if failed to create domain for PCIe endpoint

Enhance function get_domain_for_dev() to release allocated resources
if it fails to create a domain for a PCIe endpoint; otherwise the
allocated resources are leaked.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/intel-iommu.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index da65884..2bbb877 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2078,6 +2078,8 @@ found_domain:
if (dmar_insert_dev_info(segment, pdev->bus->number, pdev->devfn,
pdev, &domain) == 0)
return domain;
+ else if (!bridge)
+ domain_exit(domain);
error:
/* recheck it here, maybe others set it */
return find_domain(pdev);
--
1.7.10.4

2014-01-07 09:00:36

by Jiang Liu

Subject: [Patch Part2 V1 07/14] iommu/vt-d: fix error in detect ATS capability

The current Intel IOMMU driver only matches a PCIe root port against
the first DRHD unit with the same segment number. It may report a
false result if there are multiple DRHD units with the same segment
number, thus failing to detect the ATS capability of some PCIe
devices.

This patch refines function dmar_find_matched_atsr_unit() to search all
DRHD units with the same segment number.

Example DMAR table entries are shown below:
[1D0h 0464 2] Subtable Type : 0002 <Root Port ATS Capability>
[1D2h 0466 2] Length : 0028
[1D4h 0468 1] Flags : 00
[1D5h 0469 1] Reserved : 00
[1D6h 0470 2] PCI Segment Number : 0000

[1D8h 0472 1] Device Scope Entry Type : 02
[1D9h 0473 1] Entry Length : 08
[1DAh 0474 2] Reserved : 0000
[1DCh 0476 1] Enumeration ID : 00
[1DDh 0477 1] PCI Bus Number : 00
[1DEh 0478 2] PCI Path : [02, 00]

[1E0h 0480 1] Device Scope Entry Type : 02
[1E1h 0481 1] Entry Length : 08
[1E2h 0482 2] Reserved : 0000
[1E4h 0484 1] Enumeration ID : 00
[1E5h 0485 1] PCI Bus Number : 00
[1E6h 0486 2] PCI Path : [03, 00]

[1E8h 0488 1] Device Scope Entry Type : 02
[1E9h 0489 1] Entry Length : 08
[1EAh 0490 2] Reserved : 0000
[1ECh 0492 1] Enumeration ID : 00
[1EDh 0493 1] PCI Bus Number : 00
[1EEh 0494 2] PCI Path : [03, 02]

[1F0h 0496 1] Device Scope Entry Type : 02
[1F1h 0497 1] Entry Length : 08
[1F2h 0498 2] Reserved : 0000
[1F4h 0500 1] Enumeration ID : 00
[1F5h 0501 1] PCI Bus Number : 00
[1F6h 0502 2] PCI Path : [03, 03]

[1F8h 0504 2] Subtable Type : 0002 <Root Port ATS Capability>
[1FAh 0506 2] Length : 0020
[1FCh 0508 1] Flags : 00
[1FDh 0509 1] Reserved : 00
[1FEh 0510 2] PCI Segment Number : 0000

[200h 0512 1] Device Scope Entry Type : 02
[201h 0513 1] Entry Length : 08
[202h 0514 2] Reserved : 0000
[204h 0516 1] Enumeration ID : 00
[205h 0517 1] PCI Bus Number : 40
[206h 0518 2] PCI Path : [02, 00]

[208h 0520 1] Device Scope Entry Type : 02
[209h 0521 1] Entry Length : 08
[20Ah 0522 2] Reserved : 0000
[20Ch 0524 1] Enumeration ID : 00
[20Dh 0525 1] PCI Bus Number : 40
[20Eh 0526 2] PCI Path : [02, 02]

[210h 0528 1] Device Scope Entry Type : 02
[211h 0529 1] Entry Length : 08
[212h 0530 2] Reserved : 0000
[214h 0532 1] Enumeration ID : 00
[215h 0533 1] PCI Bus Number : 40
[216h 0534 2] PCI Path : [03, 00]

[218h 0536 2] Subtable Type : 0002 <Root Port ATS Capability>
[21Ah 0538 2] Length : 0020
[21Ch 0540 1] Flags : 00
[21Dh 0541 1] Reserved : 00
[21Eh 0542 2] PCI Segment Number : 0000

[220h 0544 1] Device Scope Entry Type : 02
[221h 0545 1] Entry Length : 08
[222h 0546 2] Reserved : 0000
[224h 0548 1] Enumeration ID : 00
[225h 0549 1] PCI Bus Number : 80
[226h 0550 2] PCI Path : [02, 00]

[228h 0552 1] Device Scope Entry Type : 02
[229h 0553 1] Entry Length : 08
[22Ah 0554 2] Reserved : 0000
[22Ch 0556 1] Enumeration ID : 00
[22Dh 0557 1] PCI Bus Number : 80
[22Eh 0558 2] PCI Path : [02, 02]

[230h 0560 1] Device Scope Entry Type : 02
[231h 0561 1] Entry Length : 08
[232h 0562 2] Reserved : 0000
[234h 0564 1] Enumeration ID : 00
[235h 0565 1] PCI Bus Number : 80
[236h 0566 2] PCI Path : [03, 00]

[238h 0568 2] Subtable Type : 0002 <Root Port ATS Capability>
[23Ah 0570 2] Length : 0020
[23Ch 0572 1] Flags : 00
[23Dh 0573 1] Reserved : 00
[23Eh 0574 2] PCI Segment Number : 0000

[240h 0576 1] Device Scope Entry Type : 02
[241h 0577 1] Entry Length : 08
[242h 0578 2] Reserved : 0000
[244h 0580 1] Enumeration ID : 00
[245h 0581 1] PCI Bus Number : C0
[246h 0582 2] PCI Path : [02, 00]

[248h 0584 1] Device Scope Entry Type : 02
[249h 0585 1] Entry Length : 08
[24Ah 0586 2] Reserved : 0000
[24Ch 0588 1] Enumeration ID : 00
[24Dh 0589 1] PCI Bus Number : C0
[24Eh 0590 2] PCI Path : [02, 02]

[250h 0592 1] Device Scope Entry Type : 02
[251h 0593 1] Entry Length : 08
[252h 0594 2] Reserved : 0000
[254h 0596 1] Enumeration ID : 00
[255h 0597 1] PCI Bus Number : C0
[256h 0598 2] PCI Path : [03, 00]

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/intel-iommu.c | 37 +++++++++++++++++--------------------
1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 7038b38..b0028e2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3556,37 +3556,34 @@ int dmar_find_matched_atsr_unit(struct pci_dev *dev)
{
int i;
struct pci_bus *bus;
+ struct pci_dev *bridge = NULL;
struct acpi_dmar_atsr *atsr;
struct dmar_atsr_unit *atsru;

dev = pci_physfn(dev);
-
- list_for_each_entry(atsru, &dmar_atsr_units, list) {
- atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
- if (atsr->segment == pci_domain_nr(dev->bus))
- goto found;
- }
-
- return 0;
-
-found:
for (bus = dev->bus; bus; bus = bus->parent) {
- struct pci_dev *bridge = bus->self;
-
+ bridge = bus->self;
if (!bridge || !pci_is_pcie(bridge) ||
pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
return 0;
-
- if (pci_pcie_type(bridge) == PCI_EXP_TYPE_ROOT_PORT) {
- for (i = 0; i < atsru->devices_cnt; i++)
- if (atsru->devices[i] == bridge)
- return 1;
+ if (pci_pcie_type(bridge) == PCI_EXP_TYPE_ROOT_PORT)
break;
- }
}
+ if (!bridge)
+ return 0;

- if (atsru->include_all)
- return 1;
+ list_for_each_entry_rcu(atsru, &dmar_atsr_units, list) {
+ atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+ if (atsr->segment != pci_domain_nr(dev->bus))
+ continue;
+
+ for (i = 0; i < atsru->devices_cnt; i++)
+ if (atsru->devices[i] == bridge)
+ return 1;
+
+ if (atsru->include_all)
+ return 1;
+ }

return 0;
}
--
1.7.10.4

2014-01-07 09:00:32

by Jiang Liu

Subject: [Patch Part2 V1 05/14] iommu/vt-d: create device_domain_info structure for intermediate P2P bridges

When creating a DMAR domain for a PCI device, we only associate the
PCI device and its upstream (PCIe) bridge with the domain and skip all
intermediate P2P bridges.

Function domain_context_mapping() attaches the created domain to the
PCI device, the upstream (PCIe) bridge and all intermediate P2P
bridges. Later, when domain_remove_dev_info() is called, it only
detaches the domain from the PCI device and its upstream (PCIe)
bridge, leaving the domain attached to all intermediate P2P bridges.

This stale state may cause trouble when the affected PCI hierarchy is
later assigned to a virtual machine. So also create device_domain_info
structures for intermediate P2P bridges when creating a domain for a
PCI device.

Signed-off-by: Jiang Liu <[email protected]>
Cc: Alex Williamson <[email protected]>
---
drivers/iommu/intel-iommu.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2bbb877..d5ad21d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2018,7 +2018,7 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
struct dmar_domain *domain;
struct intel_iommu *iommu;
struct dmar_drhd_unit *drhd;
- struct pci_dev *bridge;
+ struct pci_dev *bridge, *parent;
unsigned long flags;
int segment, bus, devfn;

@@ -2072,6 +2072,13 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
domain_exit(domain);
goto error;
}
+ /* also register intermediate P2P bridges */
+ for (parent = pdev->bus->self; parent != bridge;
+ parent = parent->bus->self) {
+ if (dmar_insert_dev_info(segment, parent->bus->number,
+ parent->devfn, NULL, &domain))
+ goto error;
+ }
}

found_domain:
--
1.7.10.4

2014-01-07 09:00:45

by Jiang Liu

Subject: [Patch Part2 V1 10/14] iommu/vt-d: use RCU to protect global resources in interrupt context

Global DMA and interrupt remapping resources may be accessed in
interrupt context, so use RCU instead of rwsem to protect them
in such cases.
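
As a sketch, a lookup that may run in interrupt context now brackets
the device-scope walk with rcu_read_lock() instead of taking
dmar_global_lock (illustrative kernel-style code, not a literal call
site; the real call sites are in the diff below):

```c
/* Sketch only: interrupt-safe traversal of an RCU-protected
 * device-scope array. for_each_active_dev_scope() fetches each entry
 * via rcu_dereference_check(), whose lockdep condition is satisfied
 * by rcu_read_lock() here, or by holding dmar_global_lock, or by
 * running single-threaded during boot. */
bool found = false;
struct pci_dev *dev;
int i;

rcu_read_lock();
for_each_active_dev_scope(atsru->devices, atsru->devices_cnt, i, dev)
        if (dev == bridge) {
                found = true;
                break;
        }
rcu_read_unlock();
```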

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/dmar.c | 33 ++++++++++++++++++++-------------
drivers/iommu/intel-iommu.c | 22 ++++++++++++++++++----
include/linux/dmar.h | 30 ++++++++++++++++++++----------
3 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 2831442..31c72d5 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -71,13 +71,13 @@ static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
* the very end.
*/
if (drhd->include_all)
- list_add_tail(&drhd->list, &dmar_drhd_units);
+ list_add_tail_rcu(&drhd->list, &dmar_drhd_units);
else
- list_add(&drhd->list, &dmar_drhd_units);
+ list_add_rcu(&drhd->list, &dmar_drhd_units);
}

static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope *scope,
- struct pci_dev **dev, u16 segment)
+ struct pci_dev __rcu **dev, u16 segment)
{
struct pci_bus *bus;
struct pci_dev *pdev = NULL;
@@ -122,7 +122,9 @@ static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope *scope,
pci_name(pdev));
return -EINVAL;
}
- *dev = pdev;
+
+ rcu_assign_pointer(*dev, pdev);
+
return 0;
}

@@ -149,7 +151,7 @@ void *dmar_alloc_dev_scope(void *start, void *end, int *cnt)
}

int __init dmar_parse_dev_scope(void *start, void *end, int *cnt,
- struct pci_dev ***devices, u16 segment)
+ struct pci_dev __rcu ***devices, u16 segment)
{
struct acpi_dmar_device_scope *scope;
int index, ret;
@@ -177,7 +179,7 @@ int __init dmar_parse_dev_scope(void *start, void *end, int *cnt,
return 0;
}

-void dmar_free_dev_scope(struct pci_dev ***devices, int *cnt)
+void dmar_free_dev_scope(struct pci_dev __rcu ***devices, int *cnt)
{
int i;
struct pci_dev *tmp_dev;
@@ -186,9 +188,10 @@ void dmar_free_dev_scope(struct pci_dev ***devices, int *cnt)
for_each_active_dev_scope(*devices, *cnt, i, tmp_dev)
pci_dev_put(tmp_dev);
kfree(*devices);
- *devices = NULL;
- *cnt = 0;
}
+
+ *devices = NULL;
+ *cnt = 0;
}

/**
@@ -410,7 +413,7 @@ parse_dmar_table(void)
return ret;
}

-static int dmar_pci_device_match(struct pci_dev *devices[], int cnt,
+static int dmar_pci_device_match(struct pci_dev __rcu *devices[], int cnt,
struct pci_dev *dev)
{
int index;
@@ -431,11 +434,12 @@ static int dmar_pci_device_match(struct pci_dev *devices[], int cnt,
struct dmar_drhd_unit *
dmar_find_matched_drhd_unit(struct pci_dev *dev)
{
- struct dmar_drhd_unit *dmaru = NULL;
+ struct dmar_drhd_unit *dmaru;
struct acpi_dmar_hardware_unit *drhd;

dev = pci_physfn(dev);

+ rcu_read_lock();
for_each_drhd_unit(dmaru) {
drhd = container_of(dmaru->hdr,
struct acpi_dmar_hardware_unit,
@@ -443,14 +447,17 @@ dmar_find_matched_drhd_unit(struct pci_dev *dev)

if (dmaru->include_all &&
drhd->segment == pci_domain_nr(dev->bus))
- return dmaru;
+ goto out;

if (dmar_pci_device_match(dmaru->devices,
dmaru->devices_cnt, dev))
- return dmaru;
+ goto out;
}
+ dmaru = NULL;
+out:
+ rcu_read_unlock();

- return NULL;
+ return dmaru;
}

int __init dmar_dev_scope_init(void)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 25e9b84..91ecb95 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -385,14 +385,14 @@ struct dmar_rmrr_unit {
struct acpi_dmar_header *hdr; /* ACPI header */
u64 base_address; /* reserved base address*/
u64 end_address; /* reserved end address */
- struct pci_dev **devices; /* target devices */
+ struct pci_dev __rcu **devices; /* target devices */
int devices_cnt; /* target device count */
};

struct dmar_atsr_unit {
struct list_head list; /* list of ATSR units */
struct acpi_dmar_header *hdr; /* ACPI header */
- struct pci_dev **devices; /* target devices */
+ struct pci_dev __rcu **devices; /* target devices */
int devices_cnt; /* target device count */
u8 include_all:1; /* include all ports */
};
@@ -632,12 +632,15 @@ static void domain_update_iommu_superpage(struct dmar_domain *domain)
}

/* set iommu_superpage to the smallest common denominator */
+ rcu_read_lock();
for_each_active_iommu(iommu, drhd) {
mask &= cap_super_page_val(iommu->cap);
if (!mask) {
break;
}
}
+ rcu_read_unlock();
+
domain->iommu_superpage = fls(mask);
}

@@ -656,6 +659,7 @@ static struct intel_iommu *device_to_iommu(int segment, u8 bus, u8 devfn)
struct pci_dev *dev;
int i;

+ rcu_read_lock();
for_each_active_iommu(iommu, drhd) {
if (segment != drhd->segment)
continue;
@@ -675,6 +679,7 @@ static struct intel_iommu *device_to_iommu(int segment, u8 bus, u8 devfn)
}
iommu = NULL;
out:
+ rcu_read_unlock();

return iommu;
}
@@ -1538,9 +1543,11 @@ static void domain_exit(struct dmar_domain *domain)
/* free page tables */
dma_pte_free_pagetable(domain, 0, DOMAIN_MAX_PFN(domain->gaw));

+ rcu_read_lock();
for_each_active_iommu(iommu, drhd)
if (test_bit(iommu->seq_id, domain->iommu_bmp))
iommu_detach_domain(domain, iommu);
+ rcu_read_unlock();

free_domain_mem(domain);
}
@@ -2334,6 +2341,7 @@ static bool device_has_rmrr(struct pci_dev *dev)
struct pci_dev *tmp;
int i;

+ rcu_read_lock();
for_each_rmrr_units(rmrr) {
/*
* Return TRUE if this RMRR contains the device that
@@ -2342,9 +2350,11 @@ static bool device_has_rmrr(struct pci_dev *dev)
for_each_active_dev_scope(rmrr->devices,
rmrr->devices_cnt, i, tmp)
if (tmp == dev) {
+ rcu_read_unlock();
return true;
}
}
+ rcu_read_unlock();
return false;
}

@@ -3506,7 +3516,7 @@ int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)
atsru->hdr = hdr;
atsru->include_all = atsr->flags & 0x1;

- list_add(&atsru->list, &dmar_atsr_units);
+ list_add_rcu(&atsru->list, &dmar_atsr_units);

return 0;
}
@@ -3568,6 +3578,7 @@ int dmar_find_matched_atsr_unit(struct pci_dev *dev)
if (!bridge)
return 0;

+ rcu_read_lock();
list_for_each_entry_rcu(atsru, &dmar_atsr_units, list) {
atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
if (atsr->segment != pci_domain_nr(dev->bus))
@@ -3582,6 +3593,7 @@ int dmar_find_matched_atsr_unit(struct pci_dev *dev)
}
ret = 0;
out:
+ rcu_read_unlock();

return ret;
}
@@ -3598,7 +3610,7 @@ int __init dmar_parse_rmrr_atsr_dev(void)
return ret;
}

- list_for_each_entry(atsr, &dmar_atsr_units, list) {
+ list_for_each_entry_rcu(atsr, &dmar_atsr_units, list) {
ret = atsr_parse_dev(atsr);
if (ret)
return ret;
@@ -3915,6 +3927,7 @@ static void iommu_free_vm_domain(struct dmar_domain *domain)
unsigned long i;
unsigned long ndomains;

+ rcu_read_lock();
for_each_active_iommu(iommu, drhd) {
ndomains = cap_ndoms(iommu->cap);
for_each_set_bit(i, iommu->domain_ids, ndomains) {
@@ -3927,6 +3940,7 @@ static void iommu_free_vm_domain(struct dmar_domain *domain)
}
}
}
+ rcu_read_unlock();
}

static void vm_domain_exit(struct dmar_domain *domain)
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 9572a4f..3e42960 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -26,6 +26,7 @@
#include <linux/msi.h>
#include <linux/irqreturn.h>
#include <linux/rwsem.h>
+#include <linux/rcupdate.h>

struct acpi_dmar_header;

@@ -41,7 +42,7 @@ struct dmar_drhd_unit {
struct list_head list; /* list of drhd units */
struct acpi_dmar_header *hdr; /* ACPI header */
u64 reg_base_addr; /* register base address*/
- struct pci_dev **devices; /* target device array */
+ struct pci_dev __rcu **devices;/* target device array */
int devices_cnt; /* target device count */
u16 segment; /* PCI domain */
u8 ignored:1; /* ignore drhd */
@@ -53,22 +54,31 @@ extern struct rw_semaphore dmar_global_lock;
extern struct list_head dmar_drhd_units;

#define for_each_drhd_unit(drhd) \
- list_for_each_entry(drhd, &dmar_drhd_units, list)
+ list_for_each_entry_rcu(drhd, &dmar_drhd_units, list)

#define for_each_active_drhd_unit(drhd) \
- list_for_each_entry(drhd, &dmar_drhd_units, list) \
+ list_for_each_entry_rcu(drhd, &dmar_drhd_units, list) \
if (drhd->ignored) {} else

#define for_each_active_iommu(i, drhd) \
- list_for_each_entry(drhd, &dmar_drhd_units, list) \
+ list_for_each_entry_rcu(drhd, &dmar_drhd_units, list) \
if (i=drhd->iommu, drhd->ignored) {} else

#define for_each_iommu(i, drhd) \
- list_for_each_entry(drhd, &dmar_drhd_units, list) \
+ list_for_each_entry_rcu(drhd, &dmar_drhd_units, list) \
if (i=drhd->iommu, 0) {} else

+static inline bool dmar_rcu_check(void)
+{
+ return rwsem_is_locked(&dmar_global_lock) ||
+ system_state == SYSTEM_BOOTING;
+}
+
+#define dmar_rcu_dereference(p) rcu_dereference_check((p), dmar_rcu_check())
+
#define for_each_dev_scope(a, c, p, d) \
- for ((p) = 0; ((d) = (p) < (c) ? (a)[(p)] : NULL, (p) < (c)); (p)++)
+ for ((p) = 0; ((d) = (p) < (c) ? dmar_rcu_dereference((a)[(p)]) : \
+ NULL, (p) < (c)); (p)++)

#define for_each_active_dev_scope(a, c, p, d) \
for_each_dev_scope((a), (c), (p), (d)) if (!(d)) { continue; } else
@@ -76,6 +86,10 @@ extern struct list_head dmar_drhd_units;
extern int dmar_table_init(void);
extern int dmar_dev_scope_init(void);
extern int enable_drhd_fault_handling(void);
+extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt);
+extern int dmar_parse_dev_scope(void *start, void *end, int *cnt,
+ struct pci_dev __rcu ***devices, u16 segment);
+extern void dmar_free_dev_scope(struct pci_dev __rcu ***devices, int *cnt);
#else
static inline int dmar_table_init(void)
{
@@ -138,10 +152,6 @@ extern int iommu_detected, no_iommu;
extern int dmar_parse_rmrr_atsr_dev(void);
extern int dmar_parse_one_rmrr(struct acpi_dmar_header *header);
extern int dmar_parse_one_atsr(struct acpi_dmar_header *header);
-extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt);
-extern int dmar_parse_dev_scope(void *start, void *end, int *cnt,
- struct pci_dev ***devices, u16 segment);
-extern void dmar_free_dev_scope(struct pci_dev ***devices, int *cnt);
extern int intel_iommu_init(void);
#else /* !CONFIG_INTEL_IOMMU: */
static inline int intel_iommu_init(void) { return -ENODEV; }
--
1.7.10.4

2014-01-07 09:00:59

by Jiang Liu

Subject: [RFC Patch Part2 V1 14/14] iommu/vt-d: update IOMMU state when memory hotplug happens

If the static identity domain is created, the IOMMU driver needs to
update the si_domain page table when a memory hotplug event happens;
otherwise PCI device DMA operations can't access the hot-added memory
regions.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/intel-iommu.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 83e3ed4..35a987d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -33,6 +33,7 @@
#include <linux/dmar.h>
#include <linux/dma-mapping.h>
#include <linux/mempool.h>
+#include <linux/memory.h>
#include <linux/timer.h>
#include <linux/iova.h>
#include <linux/iommu.h>
@@ -3689,6 +3690,54 @@ static struct notifier_block device_nb = {
.notifier_call = device_notifier,
};

+static int intel_iommu_memory_notifier(struct notifier_block *nb,
+ unsigned long val, void *v)
+{
+ struct memory_notify *mhp = v;
+ unsigned long long start, end;
+ struct iova *iova;
+
+ switch (val) {
+ case MEM_GOING_ONLINE:
+ start = mhp->start_pfn << PAGE_SHIFT;
+ end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
+ if (iommu_domain_identity_map(si_domain, start, end)) {
+ pr_warn("dmar: failed to build identity map for [%llx-%llx]\n",
+ start, end);
+ return NOTIFY_BAD;
+ }
+ break;
+ case MEM_OFFLINE:
+ case MEM_CANCEL_ONLINE:
+ /* TODO: enhance RB-tree and IOVA code to support of splitting iova */
+ iova = find_iova(&si_domain->iovad, mhp->start_pfn);
+ if (iova) {
+ unsigned long start_pfn, last_pfn;
+ struct dmar_drhd_unit *drhd;
+ struct intel_iommu *iommu;
+
+ start_pfn = mm_to_dma_pfn(iova->pfn_lo);
+ last_pfn = mm_to_dma_pfn(iova->pfn_hi + 1) - 1;
+ dma_pte_clear_range(si_domain, start_pfn, last_pfn);
+ dma_pte_free_pagetable(si_domain, start_pfn, last_pfn);
+ rcu_read_lock();
+ for_each_active_iommu(iommu, drhd)
+ iommu_flush_iotlb_psi(iommu, si_domain->id,
+ start_pfn, last_pfn - start_pfn + 1, 0);
+ rcu_read_unlock();
+ __free_iova(&si_domain->iovad, iova);
+ }
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block intel_iommu_memory_nb = {
+ .notifier_call = intel_iommu_memory_notifier,
+ .priority = 0
+};
+
int __init intel_iommu_init(void)
{
int ret = -ENODEV;
@@ -3761,8 +3810,9 @@ int __init intel_iommu_init(void)
init_iommu_pm_ops();

bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
-
bus_register_notifier(&pci_bus_type, &device_nb);
+ if (si_domain)
+ register_memory_notifier(&intel_iommu_memory_nb);

intel_iommu_enabled = 1;

--
1.7.10.4

2014-01-07 09:00:53

by Jiang Liu

Subject: [Patch Part2 V1 13/14] iommu/vt-d: update device to static identity domain mapping for PCI hotplug

When a PCI device hotplug event happens, we need to update the device
to static identity domain mapping so that a correct device-to-domain
mapping is maintained.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/intel-iommu.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 6a9a314..83e3ed4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3659,13 +3659,26 @@ static int device_notifier(struct notifier_block *nb,
return 0;

down_read(&dmar_global_lock);
- if (action == BUS_NOTIFY_UNBOUND_DRIVER && !iommu_pass_through) {
- domain_remove_one_dev_info(domain, pdev);
+ switch (action) {
+ case BUS_NOTIFY_ADD_DEVICE:
+ if (iommu_should_identity_map(pdev, 1))
+ domain_add_dev_info(si_domain, pdev,
+ hw_pass_through ?
+ CONTEXT_TT_PASS_THROUGH :
+ CONTEXT_TT_MULTI_LEVEL);
+ break;

+ case BUS_NOTIFY_UNBOUND_DRIVER:
+ if (iommu_pass_through)
+ break;
+ /* fall through */
+ case BUS_NOTIFY_DEL_DEVICE:
+ domain_remove_one_dev_info(domain, pdev);
if (!(domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE) &&
!(domain->flags & DOMAIN_FLAG_STATIC_IDENTITY) &&
list_empty(&domain->devices))
domain_exit(domain);
+ break;
}
up_read(&dmar_global_lock);

--
1.7.10.4

2014-01-07 09:00:50

by Jiang Liu

Subject: [Patch Part2 V1 12/14] iommu/vt-d, PCI: unify the way to process DMAR device scope array

Now that we have a PCI bus notification based mechanism to update the
DMAR device scope array, we can extend that mechanism to cover boot
time initialization too, which helps to unify and simplify the
implementation.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/dmar.c | 163 ++++++++++++-------------------------------
drivers/iommu/intel-iommu.c | 71 +++++--------------
include/linux/dmar.h | 7 --
3 files changed, 61 insertions(+), 180 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 15c9ce0..d82cee2 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -60,6 +60,7 @@ LIST_HEAD(dmar_drhd_units);

struct acpi_table_header * __initdata dmar_tbl;
static acpi_size dmar_tbl_size;
+static int dmar_dev_scope_status = 1;

static int alloc_iommu(struct dmar_drhd_unit *drhd);
static void free_iommu(struct intel_iommu *iommu);
@@ -76,58 +77,6 @@ static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
list_add_rcu(&drhd->list, &dmar_drhd_units);
}

-static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope *scope,
- struct pci_dev __rcu **dev, u16 segment)
-{
- struct pci_bus *bus;
- struct pci_dev *pdev = NULL;
- struct acpi_dmar_pci_path *path;
- int count;
-
- bus = pci_find_bus(segment, scope->bus);
- path = (struct acpi_dmar_pci_path *)(scope + 1);
- count = (scope->length - sizeof(struct acpi_dmar_device_scope))
- / sizeof(struct acpi_dmar_pci_path);
-
- while (count) {
- if (pdev)
- pci_dev_put(pdev);
- /*
- * Some BIOSes list non-exist devices in DMAR table, just
- * ignore it
- */
- if (!bus) {
- pr_warn("Device scope bus [%d] not found\n", scope->bus);
- break;
- }
- pdev = pci_get_slot(bus, PCI_DEVFN(path->device, path->function));
- if (!pdev) {
- /* warning will be printed below */
- break;
- }
- path ++;
- count --;
- bus = pdev->subordinate;
- }
- if (!pdev) {
- pr_warn("Device scope device [%04x:%02x:%02x.%02x] not found\n",
- segment, scope->bus, path->device, path->function);
- return 0;
- }
- if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT && \
- pdev->subordinate) || (scope->entry_type == \
- ACPI_DMAR_SCOPE_TYPE_BRIDGE && !pdev->subordinate)) {
- pci_dev_put(pdev);
- pr_warn("Device scope type does not match for %s\n",
- pci_name(pdev));
- return -EINVAL;
- }
-
- rcu_assign_pointer(*dev, pdev);
-
- return 0;
-}
-
void *dmar_alloc_dev_scope(void *start, void *end, int *cnt)
{
struct acpi_dmar_device_scope *scope;
@@ -150,35 +99,6 @@ void *dmar_alloc_dev_scope(void *start, void *end, int *cnt)
return kcalloc(*cnt, sizeof(struct pci_dev *), GFP_KERNEL);
}

-int __init dmar_parse_dev_scope(void *start, void *end, int *cnt,
- struct pci_dev __rcu ***devices, u16 segment)
-{
- struct acpi_dmar_device_scope *scope;
- int index, ret;
-
- *devices = dmar_alloc_dev_scope(start, end, cnt);
- if (*cnt == 0)
- return 0;
- else if (!*devices)
- return -ENOMEM;
-
- for (index = 0; start < end; start += scope->length) {
- scope = start;
- if (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT ||
- scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE) {
- ret = dmar_parse_one_dev_scope(scope,
- &(*devices)[index], segment);
- if (ret) {
- dmar_free_dev_scope(devices, cnt);
- return ret;
- }
- index ++;
- }
- }
-
- return 0;
-}
-
void dmar_free_dev_scope(struct pci_dev __rcu ***devices, int *cnt)
{
int i;
@@ -220,6 +140,8 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned long event)
if (!info) {
pr_warn("Out of memory when allocating notify_info "
"for %s.\n", pci_name(dev));
+ if (dmar_dev_scope_status == 0)
+ dmar_dev_scope_status = -ENOMEM;
return NULL;
}
}
@@ -349,6 +271,8 @@ static int dmar_pci_bus_add_dev(struct dmar_pci_notify_info *info)
}
if (ret >= 0)
ret = dmar_iommu_notify_scope_dev(info);
+ if (ret < 0 && dmar_dev_scope_status == 0)
+ dmar_dev_scope_status = ret;

return ret;
}
@@ -418,9 +342,21 @@ dmar_parse_one_drhd(struct acpi_dmar_header *header)
dmaru->reg_base_addr = drhd->address;
dmaru->segment = drhd->segment;
dmaru->include_all = drhd->flags & 0x1; /* BIT0: INCLUDE_ALL */
+ if (!dmaru->include_all) {
+ dmaru->devices = dmar_alloc_dev_scope((void *)(drhd + 1),
+ ((void *)drhd) + drhd->header.length,
+ &dmaru->devices_cnt);
+ if (dmaru->devices_cnt && dmaru->devices == NULL) {
+ kfree(dmaru);
+ return -ENOMEM;
+ }
+ }

ret = alloc_iommu(dmaru);
if (ret) {
+ if (!dmaru->include_all)
+ dmar_free_dev_scope(&dmaru->devices,
+ &dmaru->devices_cnt);
kfree(dmaru);
return ret;
}
@@ -437,21 +373,6 @@ static void dmar_free_drhd(struct dmar_drhd_unit *dmaru)
kfree(dmaru);
}

-static int __init dmar_parse_dev(struct dmar_drhd_unit *dmaru)
-{
- struct acpi_dmar_hardware_unit *drhd;
-
- drhd = (struct acpi_dmar_hardware_unit *) dmaru->hdr;
-
- if (dmaru->include_all)
- return 0;
-
- return dmar_parse_dev_scope((void *)(drhd + 1),
- ((void *)drhd) + drhd->header.length,
- &dmaru->devices_cnt, &dmaru->devices,
- drhd->segment);
-}
-
#ifdef CONFIG_ACPI_NUMA
static int __init
dmar_parse_one_rhsa(struct acpi_dmar_header *header)
@@ -665,34 +586,35 @@ out:

int __init dmar_dev_scope_init(void)
{
- static int dmar_dev_scope_initialized;
- struct dmar_drhd_unit *drhd;
- int ret = -ENODEV;
+ struct pci_dev *dev = NULL;
+ struct dmar_pci_notify_info *info;

- if (dmar_dev_scope_initialized)
- return dmar_dev_scope_initialized;
+ if (dmar_dev_scope_status != 1)
+ return dmar_dev_scope_status;

- if (list_empty(&dmar_drhd_units))
- goto fail;
+ if (list_empty(&dmar_drhd_units)) {
+ dmar_dev_scope_status = -ENODEV;
+ } else {
+ dmar_dev_scope_status = 0;
+
+ for_each_pci_dev(dev) {
+ if (dev->is_virtfn)
+ continue;
+
+ info = dmar_alloc_pci_notify_info(dev,
+ BUS_NOTIFY_ADD_DEVICE);
+ if (!info) {
+ return dmar_dev_scope_status;
+ } else {
+ dmar_pci_bus_add_dev(info);
+ dmar_free_pci_notify_info(info);
+ }
+ }

- for_each_drhd_unit(drhd) {
- ret = dmar_parse_dev(drhd);
- if (ret)
- goto fail;
+ bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
}

- ret = dmar_parse_rmrr_atsr_dev();
- if (ret)
- goto fail;
-
- bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
-
- dmar_dev_scope_initialized = 1;
- return 0;
-
-fail:
- dmar_dev_scope_initialized = ret;
- return ret;
+ return dmar_dev_scope_status;
}


@@ -1617,7 +1539,8 @@ static int __init dmar_free_unused_resources(void)
if (irq_remapping_enabled || intel_iommu_enabled)
return 0;

- bus_unregister_notifier(&pci_bus_type, &dmar_pci_bus_nb);
+ if (dmar_dev_scope_status != 1 && !list_empty(&dmar_drhd_units))
+ bus_unregister_notifier(&pci_bus_type, &dmar_pci_bus_nb);

down_write(&dmar_global_lock);
list_for_each_entry_safe(dmaru, dmaru_n, &dmar_drhd_units, list) {
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 010491f..6a9a314 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3467,11 +3467,6 @@ static void __init init_iommu_pm_ops(void)
static inline void init_iommu_pm_ops(void) {}
#endif /* CONFIG_PM */

-static void __init dmar_register_rmrr_unit(struct dmar_rmrr_unit *rmrr)
-{
- list_add(&rmrr->list, &dmar_rmrr_units);
-}
-

int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header)
{
@@ -3486,21 +3481,17 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header)
rmrr = (struct acpi_dmar_reserved_memory *)header;
rmrru->base_address = rmrr->base_address;
rmrru->end_address = rmrr->end_address;
+ rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1),
+ ((void *)rmrr) + rmrr->header.length,
+ &rmrru->devices_cnt);
+ if (rmrru->devices_cnt && rmrru->devices == NULL) {
+ kfree(rmrru);
+ return -ENOMEM;
+ }

- dmar_register_rmrr_unit(rmrru);
- return 0;
-}
+ list_add(&rmrru->list, &dmar_rmrr_units);

-static int __init
-rmrr_parse_dev(struct dmar_rmrr_unit *rmrru)
-{
- struct acpi_dmar_reserved_memory *rmrr;
-
- rmrr = (struct acpi_dmar_reserved_memory *) rmrru->hdr;
- return dmar_parse_dev_scope((void *)(rmrr + 1),
- ((void *)rmrr) + rmrr->header.length,
- &rmrru->devices_cnt, &rmrru->devices,
- rmrr->segment);
+ return 0;
}

int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)
@@ -3515,26 +3506,21 @@ int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)

atsru->hdr = hdr;
atsru->include_all = atsr->flags & 0x1;
+ if (!atsru->include_all) {
+ atsru->devices = dmar_alloc_dev_scope((void *)(atsr + 1),
+ (void *)atsr + atsr->header.length,
+ &atsru->devices_cnt);
+ if (atsru->devices_cnt && atsru->devices == NULL) {
+ kfree(atsru);
+ return -ENOMEM;
+ }
+ }

list_add_rcu(&atsru->list, &dmar_atsr_units);

return 0;
}

-static int __init atsr_parse_dev(struct dmar_atsr_unit *atsru)
-{
- struct acpi_dmar_atsr *atsr;
-
- if (atsru->include_all)
- return 0;
-
- atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
- return dmar_parse_dev_scope((void *)(atsr + 1),
- (void *)atsr + atsr->header.length,
- &atsru->devices_cnt, &atsru->devices,
- atsr->segment);
-}
-
static void intel_iommu_free_atsr(struct dmar_atsr_unit *atsru)
{
dmar_free_dev_scope(&atsru->devices, &atsru->devices_cnt);
@@ -3598,27 +3584,6 @@ out:
return ret;
}

-int __init dmar_parse_rmrr_atsr_dev(void)
-{
- struct dmar_rmrr_unit *rmrr;
- struct dmar_atsr_unit *atsr;
- int ret;
-
- list_for_each_entry(rmrr, &dmar_rmrr_units, list) {
- ret = rmrr_parse_dev(rmrr);
- if (ret)
- return ret;
- }
-
- list_for_each_entry_rcu(atsr, &dmar_atsr_units, list) {
- ret = atsr_parse_dev(atsr);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
{
int ret = 0;
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 78c15a45..242c25b 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -96,8 +96,6 @@ extern int dmar_table_init(void);
extern int dmar_dev_scope_init(void);
extern int enable_drhd_fault_handling(void);
extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt);
-extern int dmar_parse_dev_scope(void *start, void *end, int *cnt,
- struct pci_dev __rcu ***devices, u16 segment);
extern void dmar_free_dev_scope(struct pci_dev __rcu ***devices, int *cnt);
extern int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
void *start, void*end, u16 segment,
@@ -166,7 +164,6 @@ extern int arch_setup_dmar_msi(unsigned int irq);

#ifdef CONFIG_INTEL_IOMMU
extern int iommu_detected, no_iommu;
-extern int dmar_parse_rmrr_atsr_dev(void);
extern int dmar_parse_one_rmrr(struct acpi_dmar_header *header);
extern int dmar_parse_one_atsr(struct acpi_dmar_header *header);
extern int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info);
@@ -181,10 +178,6 @@ static inline int dmar_parse_one_atsr(struct acpi_dmar_header *header)
{
return 0;
}
-static inline int dmar_parse_rmrr_atsr_dev(void)
-{
- return 0;
-}
static inline int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
{
return 0;
--
1.7.10.4

2014-01-07 09:02:25

by Jiang Liu

Subject: [Patch Part2 V1 11/14] iommu/vt-d, PCI: update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens

The current Intel DMAR/IOMMU driver assumes that all PCI devices associated
with DMAR/RMRR/ATSR device scope arrays are created at boot time and
won't change at runtime, so it caches pointers to the associated PCI
device objects. That assumption may be wrong now due to:
1) the introduction of PCI host bridge hotplug
2) PCI device hotplug through sysfs interfaces.

Wang Yijing has tried to solve this issue by caching a <bus, dev, func>
tuple instead of the PCI device object pointer, but that's still
unreliable because the PCI bus number may change in case of hotplug.
Please refer to http://lkml.org/lkml/2013/11/5/64
Message from Yijing's mail:
after remove and rescan a pci device
[ 611.857095] dmar: DRHD: handling fault status reg 2
[ 611.857109] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff7000
[ 611.857109] DMAR:[fault reason 02] Present bit in context entry is clear
[ 611.857524] dmar: DRHD: handling fault status reg 102
[ 611.857534] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff6000
[ 611.857534] DMAR:[fault reason 02] Present bit in context entry is clear
[ 611.857936] dmar: DRHD: handling fault status reg 202
[ 611.857947] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff5000
[ 611.857947] DMAR:[fault reason 02] Present bit in context entry is clear
[ 611.858351] dmar: DRHD: handling fault status reg 302
[ 611.858362] dmar: DMAR:[DMA Read] Request device [86:00.3] fault addr ffff4000
[ 611.858362] DMAR:[fault reason 02] Present bit in context entry is clear
[ 611.860819] IPv6: ADDRCONF(NETDEV_UP): eth3: link is not ready
[ 611.860983] dmar: DRHD: handling fault status reg 402
[ 611.860995] dmar: INTR-REMAP: Request device [[86:00.3] fault index a4
[ 611.860995] INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

This patch introduces a new mechanism to update the DRHD/RMRR/ATSR device scope
caches by hooking PCI bus notifications.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/dmar.c | 207 +++++++++++++++++++++++++++++++++++++++++++
drivers/iommu/intel-iommu.c | 54 +++++++++++
include/linux/dmar.h | 22 +++++
3 files changed, 283 insertions(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 31c72d5..15c9ce0 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -194,6 +194,209 @@ void dmar_free_dev_scope(struct pci_dev __rcu ***devices, int *cnt)
*cnt = 0;
}

+/* Optimize out kzalloc()/kfree() for normal cases */
+static char dmar_pci_notify_info_buf[64];
+
+static struct dmar_pci_notify_info *
+dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned long event)
+{
+ int level = 0;
+ size_t size;
+ struct pci_dev *tmp;
+ struct dmar_pci_notify_info *info;
+
+ BUG_ON(dev->is_virtfn);
+
+ /* Only generate path[] for device addition event */
+ if (event == BUS_NOTIFY_ADD_DEVICE)
+ for (tmp = dev; tmp; tmp = tmp->bus->self)
+ level++;
+
+ size = sizeof(*info) + level * sizeof(struct acpi_dmar_pci_path);
+ if (size <= sizeof(dmar_pci_notify_info_buf)) {
+ info = (struct dmar_pci_notify_info *)dmar_pci_notify_info_buf;
+ } else {
+ info = kzalloc(size, GFP_KERNEL);
+ if (!info) {
+ pr_warn("Out of memory when allocating notify_info "
+ "for %s.\n", pci_name(dev));
+ return NULL;
+ }
+ }
+
+ info->event = event;
+ info->dev = dev;
+ info->seg = pci_domain_nr(dev->bus);
+ info->level = level;
+ if (event == BUS_NOTIFY_ADD_DEVICE) {
+ for (tmp = dev, level--; tmp; tmp = tmp->bus->self) {
+ info->path[level].device = PCI_SLOT(tmp->devfn);
+ info->path[level].function = PCI_FUNC(tmp->devfn);
+ if (pci_is_root_bus(tmp->bus))
+ info->bus = tmp->bus->number;
+ }
+ }
+
+ return info;
+}
+
+static inline void dmar_free_pci_notify_info(struct dmar_pci_notify_info *info)
+{
+ if ((void *)info != dmar_pci_notify_info_buf)
+ kfree(info);
+}
+
+static bool dmar_match_pci_path(struct dmar_pci_notify_info *info, int bus,
+ struct acpi_dmar_pci_path *path, int count)
+{
+ int i;
+
+ if (info->bus != bus)
+ return false;
+ if (info->level != count)
+ return false;
+
+ for (i = 0; i < count; i++) {
+ if (path[i].device != info->path[i].device ||
+ path[i].function != info->path[i].function)
+ return false;
+ }
+
+ return true;
+}
+
+/* Return: > 0 if match found, 0 if no match found, < 0 if error happens */
+int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
+ void *start, void*end, u16 segment,
+ struct pci_dev __rcu **devices, int devices_cnt)
+{
+ int i, level;
+ struct pci_dev *tmp, *dev = info->dev;
+ struct acpi_dmar_device_scope *scope;
+ struct acpi_dmar_pci_path *path;
+
+ if (segment != info->seg)
+ return 0;
+
+ for (; start < end; start += scope->length) {
+ scope = start;
+ if (scope->entry_type != ACPI_DMAR_SCOPE_TYPE_ENDPOINT &&
+ scope->entry_type != ACPI_DMAR_SCOPE_TYPE_BRIDGE)
+ continue;
+
+ path = (struct acpi_dmar_pci_path *)(scope + 1);
+ level = (scope->length - sizeof(*scope)) / sizeof(*path);
+ if (!dmar_match_pci_path(info, scope->bus, path, level))
+ continue;
+
+ if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT) ^
+ (dev->hdr_type == PCI_HEADER_TYPE_NORMAL)) {
+ pr_warn("Device scope type does not match for %s\n",
+ pci_name(dev));
+ return -EINVAL;
+ }
+
+ for_each_dev_scope(devices, devices_cnt, i, tmp)
+ if (tmp == NULL) {
+ rcu_assign_pointer(devices[i],
+ pci_dev_get(dev));
+ return 1;
+ }
+ BUG_ON(i >= devices_cnt);
+ }
+
+ return 0;
+}
+
+int dmar_remove_dev_scope(struct dmar_pci_notify_info *info, u16 segment,
+ struct pci_dev __rcu **devices, int count)
+{
+ int index;
+ struct pci_dev *tmp;
+
+ if (info->seg != segment)
+ return 0;
+
+ for_each_active_dev_scope(devices, count, index, tmp)
+ if (tmp == info->dev) {
+ rcu_assign_pointer(devices[index], NULL);
+ synchronize_rcu();
+ pci_dev_put(tmp);
+ return 1;
+ }
+
+ return 0;
+}
+
+static int dmar_pci_bus_add_dev(struct dmar_pci_notify_info *info)
+{
+ int ret = 0;
+ struct dmar_drhd_unit *dmaru;
+ struct acpi_dmar_hardware_unit *drhd;
+
+ for_each_drhd_unit(dmaru) {
+ if (dmaru->include_all)
+ continue;
+
+ drhd = container_of(dmaru->hdr,
+ struct acpi_dmar_hardware_unit, header);
+ ret = dmar_insert_dev_scope(info, (void *)(drhd + 1),
+ ((void *)drhd) + drhd->header.length,
+ dmaru->segment,
+ dmaru->devices, dmaru->devices_cnt);
+ if (ret != 0)
+ break;
+ }
+ if (ret >= 0)
+ ret = dmar_iommu_notify_scope_dev(info);
+
+ return ret;
+}
+
+static void dmar_pci_bus_del_dev(struct dmar_pci_notify_info *info)
+{
+ struct dmar_drhd_unit *dmaru;
+
+ for_each_drhd_unit(dmaru)
+ if (dmar_remove_dev_scope(info, dmaru->segment,
+ dmaru->devices, dmaru->devices_cnt))
+ break;
+ dmar_iommu_notify_scope_dev(info);
+}
+
+static int dmar_pci_bus_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct pci_dev *pdev = to_pci_dev(data);
+ struct dmar_pci_notify_info *info;
+
+ /* Only care about add/remove events for physical functions */
+ if (pdev->is_virtfn)
+ return NOTIFY_DONE;
+ if (action != BUS_NOTIFY_ADD_DEVICE && action != BUS_NOTIFY_DEL_DEVICE)
+ return NOTIFY_DONE;
+
+ info = dmar_alloc_pci_notify_info(pdev, action);
+ if (!info)
+ return NOTIFY_DONE;
+
+ down_write(&dmar_global_lock);
+ if (action == BUS_NOTIFY_ADD_DEVICE)
+ dmar_pci_bus_add_dev(info);
+ else if (action == BUS_NOTIFY_DEL_DEVICE)
+ dmar_pci_bus_del_dev(info);
+ up_write(&dmar_global_lock);
+
+ dmar_free_pci_notify_info(info);
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block dmar_pci_bus_nb = {
+ .notifier_call = dmar_pci_bus_notifier,
+ .priority = INT_MIN,
+};
+
/**
* dmar_parse_one_drhd - parses exactly one DMA remapping hardware definition
* structure which uniquely represent one DMA remapping hardware unit
@@ -482,6 +685,8 @@ int __init dmar_dev_scope_init(void)
if (ret)
goto fail;

+ bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
+
dmar_dev_scope_initialized = 1;
return 0;

@@ -1412,6 +1617,8 @@ static int __init dmar_free_unused_resources(void)
if (irq_remapping_enabled || intel_iommu_enabled)
return 0;

+ bus_unregister_notifier(&pci_bus_type, &dmar_pci_bus_nb);
+
down_write(&dmar_global_lock);
list_for_each_entry_safe(dmaru, dmaru_n, &dmar_drhd_units, list) {
list_del(&dmaru->list);
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 91ecb95..010491f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3619,6 +3619,60 @@ int __init dmar_parse_rmrr_atsr_dev(void)
return 0;
}

+int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
+{
+ int ret = 0;
+ struct dmar_rmrr_unit *rmrru;
+ struct dmar_atsr_unit *atsru;
+ struct acpi_dmar_atsr *atsr;
+ struct acpi_dmar_reserved_memory *rmrr;
+
+ if (!intel_iommu_enabled && system_state != SYSTEM_BOOTING)
+ return 0;
+
+ list_for_each_entry(rmrru, &dmar_rmrr_units, list) {
+ rmrr = container_of(rmrru->hdr,
+ struct acpi_dmar_reserved_memory, header);
+ if (info->event == BUS_NOTIFY_ADD_DEVICE) {
+ ret = dmar_insert_dev_scope(info, (void *)(rmrr + 1),
+ ((void *)rmrr) + rmrr->header.length,
+ rmrr->segment, rmrru->devices,
+ rmrru->devices_cnt);
+ if (ret > 0)
+ break;
+ else if(ret < 0)
+ return ret;
+ } else if (info->event == BUS_NOTIFY_DEL_DEVICE) {
+ if (dmar_remove_dev_scope(info, rmrr->segment,
+ rmrru->devices, rmrru->devices_cnt))
+ break;
+ }
+ }
+
+ list_for_each_entry(atsru, &dmar_atsr_units, list) {
+ if (atsru->include_all)
+ continue;
+
+ atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+ if (info->event == BUS_NOTIFY_ADD_DEVICE) {
+ ret = dmar_insert_dev_scope(info, (void *)(atsr + 1),
+ (void *)atsr + atsr->header.length,
+ atsr->segment, atsru->devices,
+ atsru->devices_cnt);
+ if (ret > 0)
+ break;
+ else if(ret < 0)
+ return ret;
+ } else if (info->event == BUS_NOTIFY_DEL_DEVICE) {
+ if (dmar_remove_dev_scope(info, atsr->segment,
+ atsru->devices, atsru->devices_cnt))
+ break;
+ }
+ }
+
+ return 0;
+}
+
/*
* Here we only respond to action of unbound device from driver.
*
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 3e42960..78c15a45 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -50,6 +50,15 @@ struct dmar_drhd_unit {
struct intel_iommu *iommu;
};

+struct dmar_pci_notify_info {
+ struct pci_dev *dev;
+ unsigned long event;
+ int bus;
+ u16 seg;
+ u16 level;
+ struct acpi_dmar_pci_path path[];
+} __attribute__((packed));
+
extern struct rw_semaphore dmar_global_lock;
extern struct list_head dmar_drhd_units;

@@ -90,7 +99,15 @@ extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt);
extern int dmar_parse_dev_scope(void *start, void *end, int *cnt,
struct pci_dev __rcu ***devices, u16 segment);
extern void dmar_free_dev_scope(struct pci_dev __rcu ***devices, int *cnt);
+extern int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
+ void *start, void*end, u16 segment,
+ struct pci_dev __rcu **devices,
+ int devices_cnt);
+extern int dmar_remove_dev_scope(struct dmar_pci_notify_info *info,
+ u16 segment, struct pci_dev __rcu **devices,
+ int count);
#else
+struct dmar_pci_notify_info;
static inline int dmar_table_init(void)
{
return -ENODEV;
@@ -152,6 +169,7 @@ extern int iommu_detected, no_iommu;
extern int dmar_parse_rmrr_atsr_dev(void);
extern int dmar_parse_one_rmrr(struct acpi_dmar_header *header);
extern int dmar_parse_one_atsr(struct acpi_dmar_header *header);
+extern int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info);
extern int intel_iommu_init(void);
#else /* !CONFIG_INTEL_IOMMU: */
static inline int intel_iommu_init(void) { return -ENODEV; }
@@ -167,6 +185,10 @@ static inline int dmar_parse_rmrr_atsr_dev(void)
{
return 0;
}
+static inline int dmar_iommu_notify_scope_dev(struct dmar_pci_notify_info *info)
+{
+ return 0;
+}
#endif /* CONFIG_INTEL_IOMMU */

#endif /* __DMAR_H__ */
--
1.7.10.4

2014-01-07 09:00:43

by Jiang Liu

Subject: [Patch Part2 V1 08/14] iommu/vt-d: introduce macro for_each_dev_scope() to walk device scope entries

Introduce for_each_dev_scope()/for_each_active_dev_scope() to walk
all (or only active) device scope entries. This will help the
following RCU lock related patches.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/dmar.c | 14 +++---
drivers/iommu/intel-iommu.c | 100 +++++++++++++++++++++----------------------
include/linux/dmar.h | 6 +++
3 files changed, 64 insertions(+), 56 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 4a4c9a1..ff33866 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -170,9 +170,12 @@ int __init dmar_parse_dev_scope(void *start, void *end, int *cnt,

void dmar_free_dev_scope(struct pci_dev ***devices, int *cnt)
{
+ int i;
+ struct pci_dev *tmp_dev;
+
if (*devices && *cnt) {
- while (--*cnt >= 0)
- pci_dev_put((*devices)[*cnt]);
+ for_each_active_dev_scope(*devices, *cnt, i, tmp_dev)
+ pci_dev_put(tmp_dev);
kfree(*devices);
*devices = NULL;
*cnt = 0;
@@ -402,10 +405,11 @@ static int dmar_pci_device_match(struct pci_dev *devices[], int cnt,
struct pci_dev *dev)
{
int index;
+ struct pci_dev *tmp;

while (dev) {
- for (index = 0; index < cnt; index++)
- if (dev == devices[index])
+ for_each_active_dev_scope(devices, cnt, index, tmp)
+ if (dev == tmp)
return 1;

/* Check our parent */
@@ -452,7 +456,7 @@ int __init dmar_dev_scope_init(void)
if (list_empty(&dmar_drhd_units))
goto fail;

- list_for_each_entry(drhd, &dmar_drhd_units, list) {
+ for_each_drhd_unit(drhd) {
ret = dmar_parse_dev(drhd);
if (ret)
goto fail;
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b0028e2..7e043ca 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -652,29 +652,31 @@ static void domain_update_iommu_cap(struct dmar_domain *domain)
static struct intel_iommu *device_to_iommu(int segment, u8 bus, u8 devfn)
{
struct dmar_drhd_unit *drhd = NULL;
+ struct intel_iommu *iommu;
+ struct pci_dev *dev;
int i;

- for_each_active_drhd_unit(drhd) {
+ for_each_active_iommu(iommu, drhd) {
if (segment != drhd->segment)
continue;

- for (i = 0; i < drhd->devices_cnt; i++) {
- if (drhd->devices[i] &&
- drhd->devices[i]->bus->number == bus &&
- drhd->devices[i]->devfn == devfn)
- return drhd->iommu;
- if (drhd->devices[i] &&
- drhd->devices[i]->subordinate &&
- drhd->devices[i]->subordinate->number <= bus &&
- drhd->devices[i]->subordinate->busn_res.end >= bus)
- return drhd->iommu;
+ for_each_active_dev_scope(drhd->devices,
+ drhd->devices_cnt, i, dev) {
+ if (dev->bus->number == bus && dev->devfn == devfn)
+ goto out;
+ if (dev->subordinate &&
+ dev->subordinate->number <= bus &&
+ dev->subordinate->busn_res.end >= bus)
+ goto out;
}

if (drhd->include_all)
- return drhd->iommu;
+ goto out;
}
+ iommu = NULL;
+out:

- return NULL;
+ return iommu;
}

static void domain_flush_cache(struct dmar_domain *domain,
@@ -2329,17 +2331,19 @@ static int domain_add_dev_info(struct dmar_domain *domain,
static bool device_has_rmrr(struct pci_dev *dev)
{
struct dmar_rmrr_unit *rmrr;
+ struct pci_dev *tmp;
int i;

for_each_rmrr_units(rmrr) {
- for (i = 0; i < rmrr->devices_cnt; i++) {
- /*
- * Return TRUE if this RMRR contains the device that
- * is passed in.
- */
- if (rmrr->devices[i] == dev)
+ /*
+ * Return TRUE if this RMRR contains the device that
+ * is passed in.
+ */
+ for_each_active_dev_scope(rmrr->devices,
+ rmrr->devices_cnt, i, tmp)
+ if (tmp == dev) {
return true;
- }
+ }
}
return false;
}
@@ -2589,14 +2593,9 @@ static int __init init_dmars(void)
*/
printk(KERN_INFO "IOMMU: Setting RMRR:\n");
for_each_rmrr_units(rmrr) {
- for (i = 0; i < rmrr->devices_cnt; i++) {
- pdev = rmrr->devices[i];
- /*
- * some BIOS lists non-exist devices in DMAR
- * table.
- */
- if (!pdev)
- continue;
+ /* some BIOS lists non-exist devices in DMAR table. */
+ for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
+ i, pdev) {
ret = iommu_prepare_rmrr_dev(rmrr, pdev);
if (ret)
printk(KERN_ERR
@@ -3282,13 +3281,14 @@ DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IOAT_SNB, quir
static void __init init_no_remapping_devices(void)
{
struct dmar_drhd_unit *drhd;
+ struct pci_dev *dev;
+ int i;

for_each_drhd_unit(drhd) {
if (!drhd->include_all) {
- int i;
- for (i = 0; i < drhd->devices_cnt; i++)
- if (drhd->devices[i] != NULL)
- break;
+ for_each_active_dev_scope(drhd->devices,
+ drhd->devices_cnt, i, dev)
+ break;
/* ignore DMAR unit if no pci devices exist */
if (i == drhd->devices_cnt)
drhd->ignored = 1;
@@ -3296,15 +3296,13 @@ static void __init init_no_remapping_devices(void)
}

for_each_active_drhd_unit(drhd) {
- int i;
if (drhd->include_all)
continue;

- for (i = 0; i < drhd->devices_cnt; i++)
- if (drhd->devices[i] &&
- !IS_GFX_DEVICE(drhd->devices[i]))
+ for_each_active_dev_scope(drhd->devices,
+ drhd->devices_cnt, i, dev)
+ if (!IS_GFX_DEVICE(dev))
break;
-
if (i < drhd->devices_cnt)
continue;

@@ -3314,11 +3312,9 @@ static void __init init_no_remapping_devices(void)
intel_iommu_gfx_mapped = 1;
} else {
drhd->ignored = 1;
- for (i = 0; i < drhd->devices_cnt; i++) {
- if (!drhd->devices[i])
- continue;
- drhd->devices[i]->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
- }
+ for_each_active_dev_scope(drhd->devices,
+ drhd->devices_cnt, i, dev)
+ dev->dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
}
}
}
@@ -3554,9 +3550,9 @@ static void intel_iommu_free_dmars(void)

int dmar_find_matched_atsr_unit(struct pci_dev *dev)
{
- int i;
+ int i, ret = 1;
struct pci_bus *bus;
- struct pci_dev *bridge = NULL;
+ struct pci_dev *bridge = NULL, *tmp;
struct acpi_dmar_atsr *atsr;
struct dmar_atsr_unit *atsru;

@@ -3577,22 +3573,24 @@ int dmar_find_matched_atsr_unit(struct pci_dev *dev)
if (atsr->segment != pci_domain_nr(dev->bus))
continue;

- for (i = 0; i < atsru->devices_cnt; i++)
- if (atsru->devices[i] == bridge)
- return 1;
+ for_each_dev_scope(atsru->devices, atsru->devices_cnt, i, tmp)
+ if (tmp == bridge)
+ goto out;

if (atsru->include_all)
- return 1;
+ goto out;
}
+ ret = 0;
+out:

- return 0;
+ return ret;
}

int __init dmar_parse_rmrr_atsr_dev(void)
{
struct dmar_rmrr_unit *rmrr;
struct dmar_atsr_unit *atsr;
- int ret = 0;
+ int ret;

list_for_each_entry(rmrr, &dmar_rmrr_units, list) {
ret = rmrr_parse_dev(rmrr);
@@ -3606,7 +3604,7 @@ int __init dmar_parse_rmrr_atsr_dev(void)
return ret;
}

- return ret;
+ return 0;
}

/*
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index dc8c7d5..8aa7561 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -65,6 +65,12 @@ extern struct list_head dmar_drhd_units;
list_for_each_entry(drhd, &dmar_drhd_units, list) \
if (i=drhd->iommu, 0) {} else

+#define for_each_dev_scope(a, c, p, d) \
+ for ((p) = 0; ((d) = (p) < (c) ? (a)[(p)] : NULL, (p) < (c)); (p)++)
+
+#define for_each_active_dev_scope(a, c, p, d) \
+ for_each_dev_scope((a), (c), (p), (d)) if (!(d)) { continue; } else
+
extern int dmar_table_init(void);
extern int dmar_dev_scope_init(void);
extern int enable_drhd_fault_handling(void);
--
1.7.10.4

2014-01-07 09:03:25

by Jiang Liu

Subject: [Patch Part2 V1 09/14] iommu/vt-d: introduce a rwsem to protect global data structures

Introduce a global rwsem dmar_global_lock, which will be used to
protect DMAR related global data structures from DMAR/PCI/memory
device hotplug operations in process context.

DMA and interrupt remapping related data structures are read-mostly,
and only change when a memory/PCI/DMAR hotplug event happens.
So a global rwsem solution is adopted as a balance between simplicity
and performance.

For the interrupt remapping driver, the functions
intel_irq_remapping_supported(), dmar_table_init(),
intel_enable_irq_remapping(), disable_irq_remapping(),
reenable_irq_remapping(), enable_drhd_fault_handling() etc. are called
during booting, suspending and resuming with interrupts disabled, so
there is no need to take the global lock.

For interrupt remapping entry allocation, the locking model is:
down_read(&dmar_global_lock);
/* Find corresponding iommu */
iommu = map_hpet_to_ir(id);
if (iommu)
/*
* Allocate remapping entry and mark entry busy,
* the IOMMU won't be hot-removed until the
* allocated entry has been released.
*/
index = alloc_irte(iommu, irq, 1);
up_read(&dmar_global_lock);

For the DMA remapping driver, we only use the dmar_global_lock rwsem to
protect functions which are only called in process context. Any
function which may be called in interrupt context will be protected
with RCU in the following patches.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/dmar.c | 19 +++++-
drivers/iommu/intel-iommu.c | 5 ++
drivers/iommu/intel_irq_remapping.c | 108 +++++++++++++++++++++++------------
include/linux/dmar.h | 2 +
4 files changed, 95 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index ff33866..2831442 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -43,10 +43,19 @@

#include "irq_remapping.h"

-/* No locks are needed as DMA remapping hardware unit
- * list is constructed at boot time and hotplug of
- * these units are not supported by the architecture.
+/*
+ * Assumptions:
+ * 1) The hotplug framework guarantees that a DMAR unit will be hot-added
+ * before the IO devices managed by that unit.
+ * 2) The hotplug framework guarantees that a DMAR unit will be hot-removed
+ * after the IO devices managed by that unit.
+ * 3) Hotplug events are rare.
+ *
+ * Locking rules for DMA and interrupt remapping related global data structures:
+ * 1) Use dmar_global_lock in process context
+ * 2) Use RCU in interrupt context
*/
+DECLARE_RWSEM(dmar_global_lock);
LIST_HEAD(dmar_drhd_units);

struct acpi_table_header * __initdata dmar_tbl;
@@ -565,6 +574,7 @@ static int __init detect_intel_iommu(void)
{
int ret;

+ down_write(&dmar_global_lock);
ret = dmar_table_detect();
if (ret)
ret = check_zero_address();
@@ -582,6 +592,7 @@ static int __init detect_intel_iommu(void)
}
early_acpi_os_unmap_memory((void __iomem *)dmar_tbl, dmar_tbl_size);
dmar_tbl = NULL;
+ up_write(&dmar_global_lock);

return ret ? 1 : -ENODEV;
}
@@ -1394,10 +1405,12 @@ static int __init dmar_free_unused_resources(void)
if (irq_remapping_enabled || intel_iommu_enabled)
return 0;

+ down_write(&dmar_global_lock);
list_for_each_entry_safe(dmaru, dmaru_n, &dmar_drhd_units, list) {
list_del(&dmaru->list);
dmar_free_drhd(dmaru);
}
+ up_write(&dmar_global_lock);

return 0;
}
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 7e043ca..25e9b84 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3627,6 +3627,7 @@ static int device_notifier(struct notifier_block *nb,
if (!domain)
return 0;

+ down_read(&dmar_global_lock);
if (action == BUS_NOTIFY_UNBOUND_DRIVER && !iommu_pass_through) {
domain_remove_one_dev_info(domain, pdev);

@@ -3635,6 +3636,7 @@ static int device_notifier(struct notifier_block *nb,
list_empty(&domain->devices))
domain_exit(domain);
}
+ up_read(&dmar_global_lock);

return 0;
}
@@ -3652,6 +3654,7 @@ int __init intel_iommu_init(void)
/* VT-d is required for a TXT/tboot launch, so enforce that */
force_on = tboot_force_iommu();

+ down_write(&dmar_global_lock);
if (dmar_table_init()) {
if (force_on)
panic("tboot: Failed to initialize DMAR table\n");
@@ -3701,6 +3704,7 @@ int __init intel_iommu_init(void)
printk(KERN_ERR "IOMMU: dmar init failed\n");
goto out_free_reserved_range;
}
+ up_write(&dmar_global_lock);
printk(KERN_INFO
"PCI-DMA: Intel(R) Virtualization Technology for Directed I/O\n");

@@ -3726,6 +3730,7 @@ out_free_mempool:
iommu_exit_mempool();
out_free_dmar:
intel_iommu_free_dmars();
+ up_write(&dmar_global_lock);
return ret;
}

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index f307a3f..009a2bb 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -38,6 +38,17 @@ static struct ioapic_scope ir_ioapic[MAX_IO_APICS];
static struct hpet_scope ir_hpet[MAX_HPET_TBS];
static int ir_ioapic_num, ir_hpet_num;

+/*
+ * Lock ordering:
+ * ->dmar_global_lock
+ * ->irq_2_ir_lock
+ * ->qi->q_lock
+ * ->iommu->register_lock
+ * Note:
+ * intel_irq_remap_ops.{supported,prepare,enable,disable,reenable} are called
+ * in a single-threaded environment with interrupts disabled, so no need to take
+ * the dmar_global_lock.
+ */
static DEFINE_RAW_SPINLOCK(irq_2_ir_lock);

static int __init parse_ioapics_under_ir(void);
@@ -312,12 +323,14 @@ static int set_ioapic_sid(struct irte *irte, int apic)
if (!irte)
return -1;

+ down_read(&dmar_global_lock);
for (i = 0; i < MAX_IO_APICS; i++) {
if (ir_ioapic[i].id == apic) {
sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
break;
}
}
+ up_read(&dmar_global_lock);

if (sid == 0) {
pr_warning("Failed to set source-id of IOAPIC (%d)\n", apic);
@@ -337,12 +350,14 @@ static int set_hpet_sid(struct irte *irte, u8 id)
if (!irte)
return -1;

+ down_read(&dmar_global_lock);
for (i = 0; i < MAX_HPET_TBS; i++) {
if (ir_hpet[i].id == id) {
sid = (ir_hpet[i].bus << 8) | ir_hpet[i].devfn;
break;
}
}
+ up_read(&dmar_global_lock);

if (sid == 0) {
pr_warning("Failed to set source-id of HPET block (%d)\n", id);
@@ -799,10 +814,16 @@ static int __init parse_ioapics_under_ir(void)

static int __init ir_dev_scope_init(void)
{
+ int ret;
+
if (!irq_remapping_enabled)
return 0;

- return dmar_dev_scope_init();
+ down_write(&dmar_global_lock);
+ ret = dmar_dev_scope_init();
+ up_write(&dmar_global_lock);
+
+ return ret;
}
rootfs_initcall(ir_dev_scope_init);

@@ -883,23 +904,27 @@ static int intel_setup_ioapic_entry(int irq,
struct io_apic_irq_attr *attr)
{
int ioapic_id = mpc_ioapic_id(attr->ioapic);
- struct intel_iommu *iommu = map_ioapic_to_ir(ioapic_id);
+ struct intel_iommu *iommu;
struct IR_IO_APIC_route_entry *entry;
struct irte irte;
int index;

+ down_read(&dmar_global_lock);
+ iommu = map_ioapic_to_ir(ioapic_id);
if (!iommu) {
pr_warn("No mapping iommu for ioapic %d\n", ioapic_id);
- return -ENODEV;
- }
-
- entry = (struct IR_IO_APIC_route_entry *)route_entry;
-
- index = alloc_irte(iommu, irq, 1);
- if (index < 0) {
- pr_warn("Failed to allocate IRTE for ioapic %d\n", ioapic_id);
- return -ENOMEM;
+ index = -ENODEV;
+ } else {
+ index = alloc_irte(iommu, irq, 1);
+ if (index < 0) {
+ pr_warn("Failed to allocate IRTE for ioapic %d\n",
+ ioapic_id);
+ index = -ENOMEM;
+ }
}
+ up_read(&dmar_global_lock);
+ if (index < 0)
+ return index;

prepare_irte(&irte, vector, destination);

@@ -918,6 +943,7 @@ static int intel_setup_ioapic_entry(int irq,
irte.avail, irte.vector, irte.dest_id,
irte.sid, irte.sq, irte.svt);

+ entry = (struct IR_IO_APIC_route_entry *)route_entry;
memset(entry, 0, sizeof(*entry));

entry->index2 = (index >> 15) & 0x1;
@@ -1048,20 +1074,23 @@ static int intel_msi_alloc_irq(struct pci_dev *dev, int irq, int nvec)
struct intel_iommu *iommu;
int index;

+ down_read(&dmar_global_lock);
iommu = map_dev_to_ir(dev);
if (!iommu) {
printk(KERN_ERR
"Unable to map PCI %s to iommu\n", pci_name(dev));
- return -ENOENT;
+ index = -ENOENT;
+ } else {
+ index = alloc_irte(iommu, irq, nvec);
+ if (index < 0) {
+ printk(KERN_ERR
+ "Unable to allocate %d IRTE for PCI %s\n",
+ nvec, pci_name(dev));
+ index = -ENOSPC;
+ }
}
+ up_read(&dmar_global_lock);

- index = alloc_irte(iommu, irq, nvec);
- if (index < 0) {
- printk(KERN_ERR
- "Unable to allocate %d IRTE for PCI %s\n", nvec,
- pci_name(dev));
- return -ENOSPC;
- }
return index;
}

@@ -1069,33 +1098,40 @@ static int intel_msi_setup_irq(struct pci_dev *pdev, unsigned int irq,
int index, int sub_handle)
{
struct intel_iommu *iommu;
+ int ret = -ENOENT;

+ down_read(&dmar_global_lock);
iommu = map_dev_to_ir(pdev);
- if (!iommu)
- return -ENOENT;
- /*
- * setup the mapping between the irq and the IRTE
- * base index, the sub_handle pointing to the
- * appropriate interrupt remap table entry.
- */
- set_irte_irq(irq, iommu, index, sub_handle);
+ if (iommu) {
+ /*
+ * setup the mapping between the irq and the IRTE
+ * base index, the sub_handle pointing to the
+ * appropriate interrupt remap table entry.
+ */
+ set_irte_irq(irq, iommu, index, sub_handle);
+ ret = 0;
+ }
+ up_read(&dmar_global_lock);

- return 0;
+ return ret;
}

static int intel_setup_hpet_msi(unsigned int irq, unsigned int id)
{
- struct intel_iommu *iommu = map_hpet_to_ir(id);
+ int ret = -1;
+ struct intel_iommu *iommu;
int index;

- if (!iommu)
- return -1;
-
- index = alloc_irte(iommu, irq, 1);
- if (index < 0)
- return -1;
+ down_read(&dmar_global_lock);
+ iommu = map_hpet_to_ir(id);
+ if (iommu) {
+ index = alloc_irte(iommu, irq, 1);
+ if (index >= 0)
+ ret = 0;
+ }
+ up_read(&dmar_global_lock);

- return 0;
+ return ret;
}

struct irq_remap_ops intel_irq_remap_ops = {
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 8aa7561..9572a4f 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -25,6 +25,7 @@
#include <linux/types.h>
#include <linux/msi.h>
#include <linux/irqreturn.h>
+#include <linux/rwsem.h>

struct acpi_dmar_header;

@@ -48,6 +49,7 @@ struct dmar_drhd_unit {
struct intel_iommu *iommu;
};

+extern struct rw_semaphore dmar_global_lock;
extern struct list_head dmar_drhd_units;

#define for_each_drhd_unit(drhd) \
--
1.7.10.4

2014-01-07 09:04:29

by Jiang Liu

[permalink] [raw]
Subject: [Patch Part2 V1 06/14] iommu/vt-d: fix incorrect iommu_count for si_domain

The iommu_count for si_domain (static identity) is always zero,
which will cause trouble when trying to tear down si_domain.

[ 14.609681] IOMMU: Setting RMRR:
[ 14.613496] Ignoring identity map for HW passthrough device 0000:00:1a.0 [0xbdcfd000 - 0xbdd1dfff]
[ 14.623809] Ignoring identity map for HW passthrough device 0000:00:1d.0 [0xbdcfd000 - 0xbdd1dfff]
[ 14.634162] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 14.640329] Ignoring identity map for HW passthrough device 0000:00:1f.0 [0x0 - 0xffffff]
[ 14.673360] IOMMU: dmar init failed
[ 14.678157] kmem_cache_destroy iommu_devinfo: Slab cache still has objects
[ 14.686076] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59
[ 14.694176] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[ 14.707412] 0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff880c2cc37c00
[ 14.716407] ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00
[ 14.725468] ffffffff81dc7a6a ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711
[ 14.734464] Call Trace:
[ 14.737453] [<ffffffff8156223d>] dump_stack+0x4d/0x66
[ 14.743430] [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100
[ 14.750279] [<ffffffff81dc7a6a>] intel_iommu_init+0x122/0x56a
[ 14.757035] [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[ 14.763491] [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[ 14.769846] [<ffffffff81000342>] do_one_initcall+0x122/0x180
[ 14.776506] [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[ 14.782866] [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[ 14.789994] [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[ 14.796556] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[ 14.802626] [<ffffffff8154ffce>] kernel_init+0xe/0x130
[ 14.808698] [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0
[ 14.814963] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[ 14.821640] kmem_cache_destroy iommu_domain: Slab cache still has objects
[ 14.829456] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59
[ 14.837562] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[ 14.850803] 0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff88102c1ee3c0
[ 14.861222] ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00
[ 14.870284] ffffffff81dc7a76 ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711
[ 14.879271] Call Trace:
[ 14.882227] [<ffffffff8156223d>] dump_stack+0x4d/0x66
[ 14.888197] [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100
[ 14.895034] [<ffffffff81dc7a76>] intel_iommu_init+0x12e/0x56a
[ 14.901781] [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[ 14.908238] [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[ 14.914594] [<ffffffff81000342>] do_one_initcall+0x122/0x180
[ 14.921244] [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[ 14.927598] [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[ 14.934738] [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[ 14.941309] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[ 14.947380] [<ffffffff8154ffce>] kernel_init+0xe/0x130
[ 14.953430] [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0
[ 14.959689] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[ 14.966299] kmem_cache_destroy iommu_iova: Slab cache still has objects
[ 14.973923] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59
[ 14.982020] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[ 14.995263] 0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff88042cb5c980
[ 15.004265] ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00
[ 15.013322] ffffffff81dc7a82 ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711
[ 15.022318] Call Trace:
[ 15.025238] [<ffffffff8156223d>] dump_stack+0x4d/0x66
[ 15.031202] [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100
[ 15.038038] [<ffffffff81dc7a82>] intel_iommu_init+0x13a/0x56a
[ 15.044786] [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d
[ 15.051242] [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52
[ 15.057601] [<ffffffff81000342>] do_one_initcall+0x122/0x180
[ 15.064254] [<ffffffff81077738>] ? parse_args+0x1e8/0x320
[ 15.070608] [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c
[ 15.077747] [<ffffffff81d84833>] ? do_early_param+0x88/0x88
[ 15.084300] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[ 15.090362] [<ffffffff8154ffce>] kernel_init+0xe/0x130
[ 15.096431] [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0
[ 15.102693] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0
[ 15.189273] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)

Signed-off-by: Jiang Liu <[email protected]>
Cc: Alex Williamson <[email protected]>
---
drivers/iommu/intel-iommu.c | 64 ++++++++++++++++++++++++-------------------
1 file changed, 36 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d5ad21d..7038b38 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -425,7 +425,8 @@ static LIST_HEAD(unmaps_to_do);
static int timer_on;
static long list_size;

-static void domain_remove_dev_info(struct dmar_domain *domain);
+static void domain_remove_dev_info(struct dmar_domain *domain,
+ struct intel_iommu *match);
static void domain_remove_one_dev_info(struct dmar_domain *domain,
struct pci_dev *pdev);

@@ -1300,15 +1301,22 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
for_each_set_bit(i, iommu->domain_ids, cap_ndoms(iommu->cap)) {
domain = iommu->domains[i];
clear_bit(i, iommu->domain_ids);
+ domain_remove_dev_info(domain, iommu);

spin_lock_irqsave(&domain->iommu_lock, flags);
- count = --domain->iommu_count;
+ if (test_and_clear_bit(iommu->seq_id,
+ domain->iommu_bmp))
+ count = --domain->iommu_count;
+ else
+ count = 1;
spin_unlock_irqrestore(&domain->iommu_lock, flags);
- if (count == 0) {
- if (domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE)
- vm_domain_exit(domain);
- else
- domain_exit(domain);
+
+ /* Keep VM domains, user still has reference to them */
+ if (count == 0 &&
+ !(domain->flags & DOMAIN_FLAG_VIRTUAL_MACHINE)) {
+ domain_exit(domain);
+ if (domain == si_domain)
+ si_domain = NULL;
}
}
}
@@ -1336,8 +1344,11 @@ static struct dmar_domain *alloc_domain(void)
return NULL;

domain->nid = -1;
+ domain->iommu_count = 0;
memset(domain->iommu_bmp, 0, sizeof(domain->iommu_bmp));
domain->flags = 0;
+ spin_lock_init(&domain->iommu_lock);
+ INIT_LIST_HEAD(&domain->devices);

return domain;
}
@@ -1361,6 +1372,7 @@ static int iommu_attach_domain(struct dmar_domain *domain,
}

domain->id = num;
+ domain->iommu_count++;
set_bit(num, iommu->domain_ids);
set_bit(iommu->seq_id, domain->iommu_bmp);
iommu->domains[num] = domain;
@@ -1461,8 +1473,6 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
unsigned long sagaw;

init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
- spin_lock_init(&domain->iommu_lock);
-
domain_reserve_special_ranges(domain);

/* calculate AGAW */
@@ -1481,7 +1491,6 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
return -ENODEV;
}
domain->agaw = agaw;
- INIT_LIST_HEAD(&domain->devices);

if (ecap_coherent(iommu->ecap))
domain->iommu_coherency = 1;
@@ -1494,7 +1503,6 @@ static int domain_init(struct dmar_domain *domain, int guest_width)
domain->iommu_snooping = 0;

domain->iommu_superpage = fls(cap_super_page_val(iommu->cap));
- domain->iommu_count = 1;
domain->nid = iommu->node;

/* always allocate the top pgd */
@@ -1518,7 +1526,7 @@ static void domain_exit(struct dmar_domain *domain)
if (!intel_iommu_strict)
flush_unmaps_timeout(0);

- domain_remove_dev_info(domain);
+ domain_remove_dev_info(domain, NULL);
/* destroy iovas */
put_iova_domain(&domain->iovad);

@@ -1918,27 +1926,29 @@ static inline void unlink_domain_info(struct device_domain_info *info)
info->dev->dev.archdata.iommu = NULL;
}

-static void domain_remove_dev_info(struct dmar_domain *domain)
+static void domain_remove_dev_info(struct dmar_domain *domain,
+ struct intel_iommu *match)
{
- struct device_domain_info *info;
+ struct device_domain_info *info, *tmp;
unsigned long flags;
struct intel_iommu *iommu;
+ LIST_HEAD(list);

spin_lock_irqsave(&device_domain_lock, flags);
- while (!list_empty(&domain->devices)) {
- info = list_entry(domain->devices.next,
- struct device_domain_info, link);
- unlink_domain_info(info);
- spin_unlock_irqrestore(&device_domain_lock, flags);
+ list_for_each_entry_safe(info, tmp, &domain->devices, link)
+ if (!match || match == device_to_iommu(info->segment,
+ info->bus, info->devfn)) {
+ unlink_domain_info(info);
+ list_add(&info->link, &list);
+ }
+ spin_unlock_irqrestore(&device_domain_lock, flags);

+ list_for_each_entry_safe(info, tmp, &list, link) {
iommu_disable_dev_iotlb(info);
iommu = device_to_iommu(info->segment, info->bus, info->devfn);
iommu_detach_dev(iommu, info->bus, info->devfn);
free_devinfo_mem(info);
-
- spin_lock_irqsave(&device_domain_lock, flags);
}
- spin_unlock_irqrestore(&device_domain_lock, flags);
}

/*
@@ -3862,8 +3872,11 @@ static struct dmar_domain *iommu_alloc_vm_domain(void)

domain->id = atomic_inc_return(&vm_domid);
domain->nid = -1;
+ domain->iommu_count = 0;
memset(domain->iommu_bmp, 0, sizeof(domain->iommu_bmp));
domain->flags = DOMAIN_FLAG_VIRTUAL_MACHINE;
+ spin_lock_init(&domain->iommu_lock);
+ INIT_LIST_HEAD(&domain->devices);

return domain;
}
@@ -3873,8 +3886,6 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
int adjust_width;

init_iova_domain(&domain->iovad, DMA_32BIT_PFN);
- spin_lock_init(&domain->iommu_lock);
-
domain_reserve_special_ranges(domain);

/* calculate AGAW */
@@ -3882,9 +3893,6 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
adjust_width = guestwidth_to_adjustwidth(guest_width);
domain->agaw = width_to_agaw(adjust_width);

- INIT_LIST_HEAD(&domain->devices);
-
- domain->iommu_count = 0;
domain->iommu_coherency = 0;
domain->iommu_snooping = 0;
domain->iommu_superpage = 0;
@@ -3993,7 +4001,7 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
dmar_domain->flags & DOMAIN_FLAG_STATIC_IDENTITY)
domain_remove_one_dev_info(old_domain, pdev);
else
- domain_remove_dev_info(old_domain);
+ domain_remove_dev_info(old_domain, NULL);
}
}

--
1.7.10.4
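The refcounting rule the patch above establishes — iommu_count starts at zero, is incremented by each attach, and the domain is only torn down when a detach drops it back to zero — can be sketched in isolation (plain C stand-ins, not the driver's data structures):

```c
#include <assert.h>

/* A domain tracks how many IOMMUs reference it; tearing it down while
 * iommu_count is still zero (never incremented) was the original bug. */
struct domain {
	int iommu_count;
	int exited;
};

void attach(struct domain *d)
{
	d->iommu_count++;	/* done by iommu_attach_domain() in the patch */
}

void detach(struct domain *d)
{
	assert(d->iommu_count > 0);	/* underflow means the attach was missed */
	if (--d->iommu_count == 0)
		d->exited = 1;		/* domain_exit() in the driver */
}
```

Moving the increment into the attach path is what keeps si_domain's count consistent with the number of IOMMUs actually referencing it.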

2014-01-07 09:07:09

by Jiang Liu

[permalink] [raw]
Subject: [Patch Part2 V1 02/14] iommu/vt-d: move private structures and variables into intel-iommu.c

Move private structures and variables into intel-iommu.c, which helps
to simplify the locking policy for hotplug. Also delete redundant
declarations.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/iommu/intel-iommu.c | 31 +++++++++++++++++++++++++------
include/linux/dmar.h | 23 +----------------------
2 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0863e25..1704e97 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -380,6 +380,29 @@ struct device_domain_info {
struct dmar_domain *domain; /* pointer to domain */
};

+struct dmar_rmrr_unit {
+ struct list_head list; /* list of rmrr units */
+ struct acpi_dmar_header *hdr; /* ACPI header */
+ u64 base_address; /* reserved base address*/
+ u64 end_address; /* reserved end address */
+ struct pci_dev **devices; /* target devices */
+ int devices_cnt; /* target device count */
+};
+
+struct dmar_atsr_unit {
+ struct list_head list; /* list of ATSR units */
+ struct acpi_dmar_header *hdr; /* ACPI header */
+ struct pci_dev **devices; /* target devices */
+ int devices_cnt; /* target device count */
+ u8 include_all:1; /* include all ports */
+};
+
+static LIST_HEAD(dmar_atsr_units);
+static LIST_HEAD(dmar_rmrr_units);
+
+#define for_each_rmrr_units(rmrr) \
+ list_for_each_entry(rmrr, &dmar_rmrr_units, list)
+
static void flush_unmaps_timeout(unsigned long data);

static DEFINE_TIMER(unmap_timer, flush_unmaps_timeout, 0, 0);
@@ -403,6 +426,8 @@ static int timer_on;
static long list_size;

static void domain_remove_dev_info(struct dmar_domain *domain);
+static void domain_remove_one_dev_info(struct dmar_domain *domain,
+ struct pci_dev *pdev);

#ifdef CONFIG_INTEL_IOMMU_DEFAULT_ON
int dmar_disabled = 0;
@@ -2243,8 +2268,6 @@ static int __init si_domain_init(int hw)
return 0;
}

-static void domain_remove_one_dev_info(struct dmar_domain *domain,
- struct pci_dev *pdev);
static int identity_mapping(struct pci_dev *pdev)
{
struct device_domain_info *info;
@@ -3430,8 +3453,6 @@ static void __init init_iommu_pm_ops(void)
static inline void init_iommu_pm_ops(void) {}
#endif /* CONFIG_PM */

-LIST_HEAD(dmar_rmrr_units);
-
static void __init dmar_register_rmrr_unit(struct dmar_rmrr_unit *rmrr)
{
list_add(&rmrr->list, &dmar_rmrr_units);
@@ -3468,8 +3489,6 @@ rmrr_parse_dev(struct dmar_rmrr_unit *rmrru)
rmrr->segment);
}

-static LIST_HEAD(dmar_atsr_units);
-
int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)
{
struct acpi_dmar_atsr *atsr;
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 198eb09..dc8c7d5 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -127,28 +127,7 @@ extern int arch_setup_dmar_msi(unsigned int irq);

#ifdef CONFIG_INTEL_IOMMU
extern int iommu_detected, no_iommu;
-extern struct list_head dmar_rmrr_units;
-struct dmar_rmrr_unit {
- struct list_head list; /* list of rmrr units */
- struct acpi_dmar_header *hdr; /* ACPI header */
- u64 base_address; /* reserved base address*/
- u64 end_address; /* reserved end address */
- struct pci_dev **devices; /* target devices */
- int devices_cnt; /* target device count */
-};
-
-#define for_each_rmrr_units(rmrr) \
- list_for_each_entry(rmrr, &dmar_rmrr_units, list)
-
-struct dmar_atsr_unit {
- struct list_head list; /* list of ATSR units */
- struct acpi_dmar_header *hdr; /* ACPI header */
- struct pci_dev **devices; /* target devices */
- int devices_cnt; /* target device count */
- u8 include_all:1; /* include all ports */
-};
-
-int dmar_parse_rmrr_atsr_dev(void);
+extern int dmar_parse_rmrr_atsr_dev(void);
extern int dmar_parse_one_rmrr(struct acpi_dmar_header *header);
extern int dmar_parse_one_atsr(struct acpi_dmar_header *header);
extern void *dmar_alloc_dev_scope(void *start, void *end, int *cnt);
--
1.7.10.4

2014-01-08 05:08:04

by Kai Huang

[permalink] [raw]
Subject: Re: [RFC Patch Part2 V1 14/14] iommu/vt-d: update IOMMU state when memory hotplug happens

On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]> wrote:
> If static identity domain is created, IOMMU driver needs to update
> si_domain page table when memory hotplug event happens. Otherwise
> PCI device DMA operations can't access the hot-added memory regions.
>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 83e3ed4..35a987d 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -33,6 +33,7 @@
> #include <linux/dmar.h>
> #include <linux/dma-mapping.h>
> #include <linux/mempool.h>
> +#include <linux/memory.h>
> #include <linux/timer.h>
> #include <linux/iova.h>
> #include <linux/iommu.h>
> @@ -3689,6 +3690,54 @@ static struct notifier_block device_nb = {
> .notifier_call = device_notifier,
> };
>
> +static int intel_iommu_memory_notifier(struct notifier_block *nb,
> + unsigned long val, void *v)
> +{
> + struct memory_notify *mhp = v;
> + unsigned long long start, end;
> + struct iova *iova;
> +
> + switch (val) {
> + case MEM_GOING_ONLINE:
> + start = mhp->start_pfn << PAGE_SHIFT;
> + end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
> + if (iommu_domain_identity_map(si_domain, start, end)) {
> + pr_warn("dmar: failed to build identity map for [%llx-%llx]\n",
> + start, end);
> + return NOTIFY_BAD;
> + }

Better to use iommu_prepare_identity_map? For si_domain, if
hw_pass_through is used, there's no page table.

> + break;
> + case MEM_OFFLINE:
> + case MEM_CANCEL_ONLINE:
> + /* TODO: enhance RB-tree and IOVA code to support of splitting iova */
> + iova = find_iova(&si_domain->iovad, mhp->start_pfn);
> + if (iova) {
> + unsigned long start_pfn, last_pfn;
> + struct dmar_drhd_unit *drhd;
> + struct intel_iommu *iommu;
> +
> + start_pfn = mm_to_dma_pfn(iova->pfn_lo);
> + last_pfn = mm_to_dma_pfn(iova->pfn_hi + 1) - 1;
> + dma_pte_clear_range(si_domain, start_pfn, last_pfn);
> + dma_pte_free_pagetable(si_domain, start_pfn, last_pfn);
> + rcu_read_lock();
> + for_each_active_iommu(iommu, drhd)
> + iommu_flush_iotlb_psi(iommu, si_domain->id,
> + start_pfn, last_pfn - start_pfn + 1, 0);
> + rcu_read_unlock();
> + __free_iova(&si_domain->iovad, iova);
> + }

The same as above. Looks we need to consider hw_pass_through for the si_domain.

-Kai

> + break;
> + }
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block intel_iommu_memory_nb = {
> + .notifier_call = intel_iommu_memory_notifier,
> + .priority = 0
> +};
> +
> int __init intel_iommu_init(void)
> {
> int ret = -ENODEV;
> @@ -3761,8 +3810,9 @@ int __init intel_iommu_init(void)
> init_iommu_pm_ops();
>
> bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> -
> bus_register_notifier(&pci_bus_type, &device_nb);
> + if (si_domain)
> + register_memory_notifier(&intel_iommu_memory_nb);
>
> intel_iommu_enabled = 1;
>
> --
> 1.7.10.4
>
> _______________________________________________
> iommu mailing list
> [email protected]
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
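The notifier in the patch quoted above converts a hot-plugged pfn range into an inclusive byte-address range [start, end]. That arithmetic can be checked in isolation (a standalone sketch assuming 4KiB pages, not kernel code):

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assume 4KiB pages, as on x86 */

/* start = start_pfn << PAGE_SHIFT: first byte of the first page */
unsigned long long range_start(unsigned long long start_pfn)
{
	return start_pfn << PAGE_SHIFT;
}

/* end = ((start_pfn + nr_pages) << PAGE_SHIFT) - 1: last byte of the
 * last page, matching the inclusive [start, end] the notifier maps. */
unsigned long long range_end(unsigned long long start_pfn,
			     unsigned long long nr_pages)
{
	return ((start_pfn + nr_pages) << PAGE_SHIFT) - 1;
}
```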

2014-01-08 05:48:45

by Jiang Liu

[permalink] [raw]
Subject: Re: [Patch Part2 V1 03/14] iommu/vt-d: simplify function get_domain_for_dev()



On 2014/1/8 11:06, Kai Huang wrote:
>
>
>
> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]> wrote:
>
> Function get_domain_for_dev() is a little complex, simplify it
> by factoring out dmar_search_domain_by_dev_info() and
> dmar_insert_dev_info().
>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 161
> ++++++++++++++++++++-----------------------
> 1 file changed, 75 insertions(+), 86 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 1704e97..da65884 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1957,18 +1957,70 @@ find_domain(struct pci_dev *pdev)
> return NULL;
> }
>
> +static inline struct dmar_domain *
> +dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
> +{
> + struct device_domain_info *info;
> +
> + list_for_each_entry(info, &device_domain_list, global)
> + if (info->segment == segment && info->bus == bus &&
> + info->devfn == devfn)
> + return info->domain;
> +
> + return NULL;
> +}
> +
> +static int dmar_insert_dev_info(int segment, int bus, int devfn,
> + struct pci_dev *dev, struct
> dmar_domain **domp)
> +{
> + struct dmar_domain *found, *domain = *domp;
> + struct device_domain_info *info;
> + unsigned long flags;
> +
> + info = alloc_devinfo_mem();
> + if (!info)
> + return -ENOMEM;
> +
> + info->segment = segment;
> + info->bus = bus;
> + info->devfn = devfn;
> + info->dev = dev;
> + info->domain = domain;
> + if (!dev)
> + domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + if (dev)
> + found = find_domain(dev);
> + else
> + found = dmar_search_domain_by_dev_info(segment, bus,
> devfn);
> + if (found) {
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> + free_devinfo_mem(info);
> + if (found != domain) {
> + domain_exit(domain);
> + *domp = found;
> + }
> + } else {
> + list_add(&info->link, &domain->devices);
> + list_add(&info->global, &device_domain_list);
> + if (dev)
> + dev->dev.archdata.iommu = info;
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> + }
> +
> + return 0;
> +}
>
>
> I am a little bit confused about the "alloc before check" sequence in
> the above function. I believe the purpose of allocating the
> device_domain_info before searching the domain with the spin_lock held
> is to avoid memory allocation inside the spin_lock? In that case, why
> can't we use a mutex instead of a spin_lock? In my understanding, if we
> used a mutex, we could first check whether the domain exists before
> actually allocating the device_domain_info, right?
Hi Kai,
Thanks for the review!
The domain->devices list may be accessed in interrupt context,
so we need to protect it with spin_lock_irqsave(). Otherwise it may
cause deadlock.
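The allocate-first pattern under discussion can be sketched in user space (a single-threaded toy list; the real locking calls are shown only as comments, since the point is purely the ordering of allocation, lookup and insertion):

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate outside the lock (allocation may sleep), then check for a
 * racing insertion under the lock and free the unused buffer if we
 * lost the race.  In the driver the lock is spin_lock_irqsave()
 * because the list is also touched from interrupt context, where
 * sleeping (and hence a mutex) is not allowed. */
struct info { int key; struct info *next; };

static struct info *head;

struct info *lookup(int key)
{
	struct info *p;

	for (p = head; p; p = p->next)
		if (p->key == key)
			return p;
	return NULL;
}

int insert(int key)
{
	struct info *info = malloc(sizeof(*info));	/* may sleep: do it first */

	if (!info)
		return -1;
	info->key = key;
	/* spin_lock_irqsave(&device_domain_lock, flags) in the real code */
	if (lookup(key)) {		/* somebody was faster */
		free(info);
	} else {
		info->next = head;
		head = info;
	}
	/* spin_unlock_irqrestore(&device_domain_lock, flags) */
	return 0;
}
```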

>
> Another question is when is info->iommu field initialized? Looks it is
> not initialized here?
It's set in function iommu_support_dev_iotlb() for devices supporting
device IOTLB.

>
> -Kai
>
> +
> /* domain is initialized */
> static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev,
> int gaw)
> {
> - struct dmar_domain *domain, *found = NULL;
> + struct dmar_domain *domain;
> struct intel_iommu *iommu;
> struct dmar_drhd_unit *drhd;
> - struct device_domain_info *info, *tmp;
> - struct pci_dev *dev_tmp;
> + struct pci_dev *bridge;
> unsigned long flags;
> - int bus = 0, devfn = 0;
> - int segment;
> - int ret;
> + int segment, bus, devfn;
>
> domain = find_domain(pdev);
> if (domain)
> @@ -1976,119 +2028,56 @@ static struct dmar_domain
> *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>
> segment = pci_domain_nr(pdev->bus);
>
> - dev_tmp = pci_find_upstream_pcie_bridge(pdev);
> - if (dev_tmp) {
> - if (pci_is_pcie(dev_tmp)) {
> - bus = dev_tmp->subordinate->number;
> + bridge = pci_find_upstream_pcie_bridge(pdev);
> + if (bridge) {
> + if (pci_is_pcie(bridge)) {
> + bus = bridge->subordinate->number;
> devfn = 0;
> } else {
> - bus = dev_tmp->bus->number;
> - devfn = dev_tmp->devfn;
> + bus = bridge->bus->number;
> + devfn = bridge->devfn;
> }
> spin_lock_irqsave(&device_domain_lock, flags);
> - list_for_each_entry(info, &device_domain_list, global) {
> - if (info->segment == segment &&
> - info->bus == bus && info->devfn == devfn) {
> - found = info->domain;
> - break;
> - }
> - }
> + domain = dmar_search_domain_by_dev_info(segment,
> bus, devfn);
> spin_unlock_irqrestore(&device_domain_lock, flags);
> /* pcie-pci bridge already has a domain, uses it */
> - if (found) {
> - domain = found;
> + if (domain)
> goto found_domain;
> - }
> }
>
> - domain = alloc_domain();
> - if (!domain)
> - goto error;
> -
> - /* Allocate new domain for the device */
> drhd = dmar_find_matched_drhd_unit(pdev);
> if (!drhd) {
> printk(KERN_ERR "IOMMU: can't find DMAR for device
> %s\n",
> pci_name(pdev));
> - free_domain_mem(domain);
> return NULL;
> }
> iommu = drhd->iommu;
>
> - ret = iommu_attach_domain(domain, iommu);
> - if (ret) {
> + /* Allocate new domain for the device */
> + domain = alloc_domain();
> + if (!domain)
> + goto error;
> + if (iommu_attach_domain(domain, iommu)) {
> free_domain_mem(domain);
> goto error;
> }
> -
> if (domain_init(domain, gaw)) {
> domain_exit(domain);
> goto error;
> }
>
> /* register pcie-to-pci device */
> - if (dev_tmp) {
> - info = alloc_devinfo_mem();
> - if (!info) {
> + if (bridge) {
> + if (dmar_insert_dev_info(segment, bus, devfn, NULL,
> &domain)) {
> domain_exit(domain);
> goto error;
> }
> - info->segment = segment;
> - info->bus = bus;
> - info->devfn = devfn;
> - info->dev = NULL;
> - info->domain = domain;
> - /* This domain is shared by devices under p2p bridge */
> - domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
> -
> - /* pcie-to-pci bridge already has a domain, uses it */
> - found = NULL;
> - spin_lock_irqsave(&device_domain_lock, flags);
> - list_for_each_entry(tmp, &device_domain_list, global) {
> - if (tmp->segment == segment &&
> - tmp->bus == bus && tmp->devfn == devfn) {
> - found = tmp->domain;
> - break;
> - }
> - }
> - if (found) {
> - spin_unlock_irqrestore(&device_domain_lock,
> flags);
> - free_devinfo_mem(info);
> - domain_exit(domain);
> - domain = found;
> - } else {
> - list_add(&info->link, &domain->devices);
> - list_add(&info->global, &device_domain_list);
> - spin_unlock_irqrestore(&device_domain_lock,
> flags);
> - }
> }
>
> found_domain:
> - info = alloc_devinfo_mem();
> - if (!info)
> - goto error;
> - info->segment = segment;
> - info->bus = pdev->bus->number;
> - info->devfn = pdev->devfn;
> - info->dev = pdev;
> - info->domain = domain;
> - spin_lock_irqsave(&device_domain_lock, flags);
> - /* somebody is fast */
> - found = find_domain(pdev);
> - if (found != NULL) {
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> - if (found != domain) {
> - domain_exit(domain);
> - domain = found;
> - }
> - free_devinfo_mem(info);
> + if (dmar_insert_dev_info(segment, pdev->bus->number,
> pdev->devfn,
> + pdev, &domain) == 0)
> return domain;
> - }
> - list_add(&info->link, &domain->devices);
> - list_add(&info->global, &device_domain_list);
> - pdev->dev.archdata.iommu = info;
> - spin_unlock_irqrestore(&device_domain_lock, flags);
> - return domain;
> error:
> /* recheck it here, maybe others set it */
> return find_domain(pdev);
> --
> 1.7.10.4
>
> _______________________________________________
> iommu mailing list
> [email protected]
> <mailto:[email protected]>
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>
>

2014-01-08 06:02:15

by Jiang Liu

Subject: Re: [RFC Patch Part2 V1 14/14] iommu/vt-d: update IOMMU state when memory hotplug happens



On 2014/1/8 13:07, Kai Huang wrote:
> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]> wrote:
>> If static identity domain is created, IOMMU driver needs to update
>> si_domain page table when memory hotplug event happens. Otherwise
>> PCI device DMA operations can't access the hot-added memory regions.
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> ---
>> drivers/iommu/intel-iommu.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 51 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 83e3ed4..35a987d 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -33,6 +33,7 @@
>> #include <linux/dmar.h>
>> #include <linux/dma-mapping.h>
>> #include <linux/mempool.h>
>> +#include <linux/memory.h>
>> #include <linux/timer.h>
>> #include <linux/iova.h>
>> #include <linux/iommu.h>
>> @@ -3689,6 +3690,54 @@ static struct notifier_block device_nb = {
>> .notifier_call = device_notifier,
>> };
>>
>> +static int intel_iommu_memory_notifier(struct notifier_block *nb,
>> + unsigned long val, void *v)
>> +{
>> + struct memory_notify *mhp = v;
>> + unsigned long long start, end;
>> + struct iova *iova;
>> +
>> + switch (val) {
>> + case MEM_GOING_ONLINE:
>> + start = mhp->start_pfn << PAGE_SHIFT;
>> + end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
>> + if (iommu_domain_identity_map(si_domain, start, end)) {
>> + pr_warn("dmar: failed to build identity map for [%llx-%llx]\n",
>> + start, end);
>> + return NOTIFY_BAD;
>> + }
>
> Better to use iommu_prepare_identity_map? For si_domain, if
> hw_pass_through is used, there's no page table.
Hi Kai,
Good catch!
It seems function iommu_prepare_identity_map() is designed to handle
RMRRs. So how about not registering the memory hotplug notifier
at all if hw_pass_through is true?

Thanks!
Gerry

>
>> + break;
>> + case MEM_OFFLINE:
>> + case MEM_CANCEL_ONLINE:
>> + /* TODO: enhance RB-tree and IOVA code to support of splitting iova */
>> + iova = find_iova(&si_domain->iovad, mhp->start_pfn);
>> + if (iova) {
>> + unsigned long start_pfn, last_pfn;
>> + struct dmar_drhd_unit *drhd;
>> + struct intel_iommu *iommu;
>> +
>> + start_pfn = mm_to_dma_pfn(iova->pfn_lo);
>> + last_pfn = mm_to_dma_pfn(iova->pfn_hi + 1) - 1;
>> + dma_pte_clear_range(si_domain, start_pfn, last_pfn);
>> + dma_pte_free_pagetable(si_domain, start_pfn, last_pfn);
>> + rcu_read_lock();
>> + for_each_active_iommu(iommu, drhd)
>> + iommu_flush_iotlb_psi(iommu, si_domain->id,
>> + start_pfn, last_pfn - start_pfn + 1, 0);
>> + rcu_read_unlock();
>> + __free_iova(&si_domain->iovad, iova);
>> + }
>
> The same as above. Looks we need to consider hw_pass_through for the si_domain.
>
> -Kai
>
>> + break;
>> + }
>> +
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block intel_iommu_memory_nb = {
>> + .notifier_call = intel_iommu_memory_notifier,
>> + .priority = 0
>> +};
>> +
>> int __init intel_iommu_init(void)
>> {
>> int ret = -ENODEV;
>> @@ -3761,8 +3810,9 @@ int __init intel_iommu_init(void)
>> init_iommu_pm_ops();
>>
>> bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
>> -
>> bus_register_notifier(&pci_bus_type, &device_nb);
>> + if (si_domain)
>> + register_memory_notifier(&intel_iommu_memory_nb);
>>
>> intel_iommu_enabled = 1;
>>
>> --
>> 1.7.10.4
>>
>> _______________________________________________
>> iommu mailing list
>> [email protected]
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu

2014-01-08 06:06:45

by Kai Huang

Subject: Re: [Patch Part2 V1 03/14] iommu/vt-d: simplify function get_domain_for_dev()

On Wed, Jan 8, 2014 at 1:48 PM, Jiang Liu <[email protected]> wrote:
>
>
> On 2014/1/8 11:06, Kai Huang wrote:
>>
>>
>>
>> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Function get_domain_for_dev() is a little complex, simplify it
>> by factoring out dmar_search_domain_by_dev_info() and
>> dmar_insert_dev_info().
>>
>> Signed-off-by: Jiang Liu <[email protected]
>> <mailto:[email protected]>>
>> ---
>> drivers/iommu/intel-iommu.c | 161
>> ++++++++++++++++++++-----------------------
>> 1 file changed, 75 insertions(+), 86 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 1704e97..da65884 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -1957,18 +1957,70 @@ find_domain(struct pci_dev *pdev)
>> return NULL;
>> }
>>
>> +static inline struct dmar_domain *
>> +dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
>> +{
>> + struct device_domain_info *info;
>> +
>> + list_for_each_entry(info, &device_domain_list, global)
>> + if (info->segment == segment && info->bus == bus &&
>> + info->devfn == devfn)
>> + return info->domain;
>> +
>> + return NULL;
>> +}
>> +
>> +static int dmar_insert_dev_info(int segment, int bus, int devfn,
>> + struct pci_dev *dev, struct
>> dmar_domain **domp)
>> +{
>> + struct dmar_domain *found, *domain = *domp;
>> + struct device_domain_info *info;
>> + unsigned long flags;
>> +
>> + info = alloc_devinfo_mem();
>> + if (!info)
>> + return -ENOMEM;
>> +
>> + info->segment = segment;
>> + info->bus = bus;
>> + info->devfn = devfn;
>> + info->dev = dev;
>> + info->domain = domain;
>> + if (!dev)
>> + domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>> +
>> + spin_lock_irqsave(&device_domain_lock, flags);
>> + if (dev)
>> + found = find_domain(dev);
>> + else
>> + found = dmar_search_domain_by_dev_info(segment, bus,
>> devfn);
>> + if (found) {
>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>> + free_devinfo_mem(info);
>> + if (found != domain) {
>> + domain_exit(domain);
>> + *domp = found;
>> + }
>> + } else {
>> + list_add(&info->link, &domain->devices);
>> + list_add(&info->global, &device_domain_list);
>> + if (dev)
>> + dev->dev.archdata.iommu = info;
>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>> + }
>> +
>> + return 0;
>> +}
>>
>>
>> I am a little bit confused about the "alloc before check" sequence in
>> above function. I believe the purpose of allocating the
>> device_domain_info before searching the domain with spin_lock hold is to
>> avoid the memory allocation in the spin_lock? In this case, why can't we
>> use mutex instead of spin_lock? In my understanding, if we use mutex, we
>> can first check if the domain exists or not before we really allocate
>> device_domain_info, right?
> Hi Kai,
> Thanks for review!
> The domain->devices list may be accessed in interrupt context,
> so we need to protect it with spin_lock_irqsave(). Otherwise it may
> cause a deadlock.
>

Thanks, that's what I was thinking. I observed that lots of other iommu
functions also use spin_lock.

>>
>> Another question is when is info->iommu field initialized? Looks it is
>> not initialized here?
> It's set in function iommu_support_dev_iotlb() for devices supporting
> device IOTLB.

Thanks. I see that in iommu_support_dev_iotlb(), info->iommu won't be set
unless the device supports both an IOTLB and ATS. Does this mean
info->iommu won't be used if the device doesn't support an IOTLB?

If this is true, it looks inconvenient to find the IOMMU that
handles the device via info, as different IOMMUs may
share the same domain, and we can't tell which IOMMU actually handles
this device if we try to find it via info->domain.

-Kai

>
>>
>> -Kai
>>
>> +
>> /* domain is initialized */
>> static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev,
>> int gaw)
>> {
>> - struct dmar_domain *domain, *found = NULL;
>> + struct dmar_domain *domain;
>> struct intel_iommu *iommu;
>> struct dmar_drhd_unit *drhd;
>> - struct device_domain_info *info, *tmp;
>> - struct pci_dev *dev_tmp;
>> + struct pci_dev *bridge;
>> unsigned long flags;
>> - int bus = 0, devfn = 0;
>> - int segment;
>> - int ret;
>> + int segment, bus, devfn;
>>
>> domain = find_domain(pdev);
>> if (domain)
>> @@ -1976,119 +2028,56 @@ static struct dmar_domain
>> *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>>
>> segment = pci_domain_nr(pdev->bus);
>>
>> - dev_tmp = pci_find_upstream_pcie_bridge(pdev);
>> - if (dev_tmp) {
>> - if (pci_is_pcie(dev_tmp)) {
>> - bus = dev_tmp->subordinate->number;
>> + bridge = pci_find_upstream_pcie_bridge(pdev);
>> + if (bridge) {
>> + if (pci_is_pcie(bridge)) {
>> + bus = bridge->subordinate->number;
>> devfn = 0;
>> } else {
>> - bus = dev_tmp->bus->number;
>> - devfn = dev_tmp->devfn;
>> + bus = bridge->bus->number;
>> + devfn = bridge->devfn;
>> }
>> spin_lock_irqsave(&device_domain_lock, flags);
>> - list_for_each_entry(info, &device_domain_list, global) {
>> - if (info->segment == segment &&
>> - info->bus == bus && info->devfn == devfn) {
>> - found = info->domain;
>> - break;
>> - }
>> - }
>> + domain = dmar_search_domain_by_dev_info(segment,
>> bus, devfn);
>> spin_unlock_irqrestore(&device_domain_lock, flags);
>> /* pcie-pci bridge already has a domain, uses it */
>> - if (found) {
>> - domain = found;
>> + if (domain)
>> goto found_domain;
>> - }
>> }
>>
>> - domain = alloc_domain();
>> - if (!domain)
>> - goto error;
>> -
>> - /* Allocate new domain for the device */
>> drhd = dmar_find_matched_drhd_unit(pdev);
>> if (!drhd) {
>> printk(KERN_ERR "IOMMU: can't find DMAR for device
>> %s\n",
>> pci_name(pdev));
>> - free_domain_mem(domain);
>> return NULL;
>> }
>> iommu = drhd->iommu;
>>
>> - ret = iommu_attach_domain(domain, iommu);
>> - if (ret) {
>> + /* Allocate new domain for the device */
>> + domain = alloc_domain();
>> + if (!domain)
>> + goto error;
>> + if (iommu_attach_domain(domain, iommu)) {
>> free_domain_mem(domain);
>> goto error;
>> }
>> -
>> if (domain_init(domain, gaw)) {
>> domain_exit(domain);
>> goto error;
>> }
>>
>> /* register pcie-to-pci device */
>> - if (dev_tmp) {
>> - info = alloc_devinfo_mem();
>> - if (!info) {
>> + if (bridge) {
>> + if (dmar_insert_dev_info(segment, bus, devfn, NULL,
>> &domain)) {
>> domain_exit(domain);
>> goto error;
>> }
>> - info->segment = segment;
>> - info->bus = bus;
>> - info->devfn = devfn;
>> - info->dev = NULL;
>> - info->domain = domain;
>> - /* This domain is shared by devices under p2p bridge */
>> - domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>> -
>> - /* pcie-to-pci bridge already has a domain, uses it */
>> - found = NULL;
>> - spin_lock_irqsave(&device_domain_lock, flags);
>> - list_for_each_entry(tmp, &device_domain_list, global) {
>> - if (tmp->segment == segment &&
>> - tmp->bus == bus && tmp->devfn == devfn) {
>> - found = tmp->domain;
>> - break;
>> - }
>> - }
>> - if (found) {
>> - spin_unlock_irqrestore(&device_domain_lock,
>> flags);
>> - free_devinfo_mem(info);
>> - domain_exit(domain);
>> - domain = found;
>> - } else {
>> - list_add(&info->link, &domain->devices);
>> - list_add(&info->global, &device_domain_list);
>> - spin_unlock_irqrestore(&device_domain_lock,
>> flags);
>> - }
>> }
>>
>> found_domain:
>> - info = alloc_devinfo_mem();
>> - if (!info)
>> - goto error;
>> - info->segment = segment;
>> - info->bus = pdev->bus->number;
>> - info->devfn = pdev->devfn;
>> - info->dev = pdev;
>> - info->domain = domain;
>> - spin_lock_irqsave(&device_domain_lock, flags);
>> - /* somebody is fast */
>> - found = find_domain(pdev);
>> - if (found != NULL) {
>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>> - if (found != domain) {
>> - domain_exit(domain);
>> - domain = found;
>> - }
>> - free_devinfo_mem(info);
>> + if (dmar_insert_dev_info(segment, pdev->bus->number,
>> pdev->devfn,
>> + pdev, &domain) == 0)
>> return domain;
>> - }
>> - list_add(&info->link, &domain->devices);
>> - list_add(&info->global, &device_domain_list);
>> - pdev->dev.archdata.iommu = info;
>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>> - return domain;
>> error:
>> /* recheck it here, maybe others set it */
>> return find_domain(pdev);
>> --
>> 1.7.10.4
>>
>> _______________________________________________
>> iommu mailing list
>> [email protected]
>> <mailto:[email protected]>
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>
>>

2014-01-08 06:14:23

by Kai Huang

Subject: Re: [RFC Patch Part2 V1 14/14] iommu/vt-d: update IOMMU state when memory hotplug happens

On Wed, Jan 8, 2014 at 2:01 PM, Jiang Liu <[email protected]> wrote:
>
>
> On 2014/1/8 13:07, Kai Huang wrote:
>> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]> wrote:
>>> If static identity domain is created, IOMMU driver needs to update
>>> si_domain page table when memory hotplug event happens. Otherwise
>>> PCI device DMA operations can't access the hot-added memory regions.
>>>
>>> Signed-off-by: Jiang Liu <[email protected]>
>>> ---
>>> drivers/iommu/intel-iommu.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 51 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index 83e3ed4..35a987d 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -33,6 +33,7 @@
>>> #include <linux/dmar.h>
>>> #include <linux/dma-mapping.h>
>>> #include <linux/mempool.h>
>>> +#include <linux/memory.h>
>>> #include <linux/timer.h>
>>> #include <linux/iova.h>
>>> #include <linux/iommu.h>
>>> @@ -3689,6 +3690,54 @@ static struct notifier_block device_nb = {
>>> .notifier_call = device_notifier,
>>> };
>>>
>>> +static int intel_iommu_memory_notifier(struct notifier_block *nb,
>>> + unsigned long val, void *v)
>>> +{
>>> + struct memory_notify *mhp = v;
>>> + unsigned long long start, end;
>>> + struct iova *iova;
>>> +
>>> + switch (val) {
>>> + case MEM_GOING_ONLINE:
>>> + start = mhp->start_pfn << PAGE_SHIFT;
>>> + end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
>>> + if (iommu_domain_identity_map(si_domain, start, end)) {
>>> + pr_warn("dmar: failed to build identity map for [%llx-%llx]\n",
>>> + start, end);
>>> + return NOTIFY_BAD;
>>> + }
>>
>> Better to use iommu_prepare_identity_map? For si_domain, if
>> hw_pass_through is used, there's no page table.
> Hi Kai,
> Good catch!
> Seems function iommu_prepare_identity_map() is designed to handle
> RMRRs. So how about avoiding of registering memory hotplug notifier
> if hw_pass_through is true?

I think that's also fine :)

Btw, I have a question related to memory hotplug, though not to the
Intel IOMMU specifically. For devices that use DMA remapping, suppose
a device is already using the memory that we are trying to remove.
In this case, it looks like we need to change the existing iova <-> pa
mappings for the pa that falls in the memory range about to be removed,
and remap to a different pa (the iova remains the same). Does the
existing code have this covered? Is there a generic IOMMU-layer memory
hotplug notifier to handle memory removal?

-Kai
>
> Thanks!
> Gerry
>
>>
>>> + break;
>>> + case MEM_OFFLINE:
>>> + case MEM_CANCEL_ONLINE:
>>> + /* TODO: enhance RB-tree and IOVA code to support of splitting iova */
>>> + iova = find_iova(&si_domain->iovad, mhp->start_pfn);
>>> + if (iova) {
>>> + unsigned long start_pfn, last_pfn;
>>> + struct dmar_drhd_unit *drhd;
>>> + struct intel_iommu *iommu;
>>> +
>>> + start_pfn = mm_to_dma_pfn(iova->pfn_lo);
>>> + last_pfn = mm_to_dma_pfn(iova->pfn_hi + 1) - 1;
>>> + dma_pte_clear_range(si_domain, start_pfn, last_pfn);
>>> + dma_pte_free_pagetable(si_domain, start_pfn, last_pfn);
>>> + rcu_read_lock();
>>> + for_each_active_iommu(iommu, drhd)
>>> + iommu_flush_iotlb_psi(iommu, si_domain->id,
>>> + start_pfn, last_pfn - start_pfn + 1, 0);
>>> + rcu_read_unlock();
>>> + __free_iova(&si_domain->iovad, iova);
>>> + }
>>
>> The same as above. Looks we need to consider hw_pass_through for the si_domain.
>>
>> -Kai
>>
>>> + break;
>>> + }
>>> +
>>> + return NOTIFY_OK;
>>> +}
>>> +
>>> +static struct notifier_block intel_iommu_memory_nb = {
>>> + .notifier_call = intel_iommu_memory_notifier,
>>> + .priority = 0
>>> +};
>>> +
>>> int __init intel_iommu_init(void)
>>> {
>>> int ret = -ENODEV;
>>> @@ -3761,8 +3810,9 @@ int __init intel_iommu_init(void)
>>> init_iommu_pm_ops();
>>>
>>> bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
>>> -
>>> bus_register_notifier(&pci_bus_type, &device_nb);
>>> + if (si_domain)
>>> + register_memory_notifier(&intel_iommu_memory_nb);
>>>
>>> intel_iommu_enabled = 1;
>>>
>>> --
>>> 1.7.10.4
>>>
>>> _______________________________________________
>>> iommu mailing list
>>> [email protected]
>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu

2014-01-08 06:22:03

by Jiang Liu

Subject: Re: [RFC Patch Part2 V1 14/14] iommu/vt-d: update IOMMU state when memory hotplug happens



On 2014/1/8 14:14, Kai Huang wrote:
> On Wed, Jan 8, 2014 at 2:01 PM, Jiang Liu <[email protected]> wrote:
>>
>>
>> On 2014/1/8 13:07, Kai Huang wrote:
>>> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]> wrote:
>>>> If static identity domain is created, IOMMU driver needs to update
>>>> si_domain page table when memory hotplug event happens. Otherwise
>>>> PCI device DMA operations can't access the hot-added memory regions.
>>>>
>>>> Signed-off-by: Jiang Liu <[email protected]>
>>>> ---
>>>> drivers/iommu/intel-iommu.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
>>>> 1 file changed, 51 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>>> index 83e3ed4..35a987d 100644
>>>> --- a/drivers/iommu/intel-iommu.c
>>>> +++ b/drivers/iommu/intel-iommu.c
>>>> @@ -33,6 +33,7 @@
>>>> #include <linux/dmar.h>
>>>> #include <linux/dma-mapping.h>
>>>> #include <linux/mempool.h>
>>>> +#include <linux/memory.h>
>>>> #include <linux/timer.h>
>>>> #include <linux/iova.h>
>>>> #include <linux/iommu.h>
>>>> @@ -3689,6 +3690,54 @@ static struct notifier_block device_nb = {
>>>> .notifier_call = device_notifier,
>>>> };
>>>>
>>>> +static int intel_iommu_memory_notifier(struct notifier_block *nb,
>>>> + unsigned long val, void *v)
>>>> +{
>>>> + struct memory_notify *mhp = v;
>>>> + unsigned long long start, end;
>>>> + struct iova *iova;
>>>> +
>>>> + switch (val) {
>>>> + case MEM_GOING_ONLINE:
>>>> + start = mhp->start_pfn << PAGE_SHIFT;
>>>> + end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
>>>> + if (iommu_domain_identity_map(si_domain, start, end)) {
>>>> + pr_warn("dmar: failed to build identity map for [%llx-%llx]\n",
>>>> + start, end);
>>>> + return NOTIFY_BAD;
>>>> + }
>>>
>>> Better to use iommu_prepare_identity_map? For si_domain, if
>>> hw_pass_through is used, there's no page table.
>> Hi Kai,
>> Good catch!
>> Seems function iommu_prepare_identity_map() is designed to handle
>> RMRRs. So how about avoiding of registering memory hotplug notifier
>> if hw_pass_through is true?
>
> I think that's also fine :)
>
> Btw, I have a related question to memory hotplug but not related to
> intel IOMMU specifically. For the devices use DMA remapping, suppose
> the device is already using the memory that we are trying to remove,
> is this case, looks we need to change the existing iova <-> pa
> mappings for the pa that is in the memory range about to be removed,
> and reset the mapping to different pa (iova remains the same). Does
> existing code have this covered? Is there a generic IOMMU layer memory
> hotplug notifier to handle memory removal?
That's the big issue of how to reclaim memory that is still in use. The
current rule is that memory used for DMA won't be removed until it is released.

>
> -Kai
>>
>> Thanks!
>> Gerry
>>
>>>
>>>> + break;
>>>> + case MEM_OFFLINE:
>>>> + case MEM_CANCEL_ONLINE:
>>>> + /* TODO: enhance RB-tree and IOVA code to support of splitting iova */
>>>> + iova = find_iova(&si_domain->iovad, mhp->start_pfn);
>>>> + if (iova) {
>>>> + unsigned long start_pfn, last_pfn;
>>>> + struct dmar_drhd_unit *drhd;
>>>> + struct intel_iommu *iommu;
>>>> +
>>>> + start_pfn = mm_to_dma_pfn(iova->pfn_lo);
>>>> + last_pfn = mm_to_dma_pfn(iova->pfn_hi + 1) - 1;
>>>> + dma_pte_clear_range(si_domain, start_pfn, last_pfn);
>>>> + dma_pte_free_pagetable(si_domain, start_pfn, last_pfn);
>>>> + rcu_read_lock();
>>>> + for_each_active_iommu(iommu, drhd)
>>>> + iommu_flush_iotlb_psi(iommu, si_domain->id,
>>>> + start_pfn, last_pfn - start_pfn + 1, 0);
>>>> + rcu_read_unlock();
>>>> + __free_iova(&si_domain->iovad, iova);
>>>> + }
>>>
>>> The same as above. Looks we need to consider hw_pass_through for the si_domain.
>>>
>>> -Kai
>>>
>>>> + break;
>>>> + }
>>>> +
>>>> + return NOTIFY_OK;
>>>> +}
>>>> +
>>>> +static struct notifier_block intel_iommu_memory_nb = {
>>>> + .notifier_call = intel_iommu_memory_notifier,
>>>> + .priority = 0
>>>> +};
>>>> +
>>>> int __init intel_iommu_init(void)
>>>> {
>>>> int ret = -ENODEV;
>>>> @@ -3761,8 +3810,9 @@ int __init intel_iommu_init(void)
>>>> init_iommu_pm_ops();
>>>>
>>>> bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
>>>> -
>>>> bus_register_notifier(&pci_bus_type, &device_nb);
>>>> + if (si_domain)
>>>> + register_memory_notifier(&intel_iommu_memory_nb);
>>>>
>>>> intel_iommu_enabled = 1;
>>>>
>>>> --
>>>> 1.7.10.4
>>>>
>>>> _______________________________________________
>>>> iommu mailing list
>>>> [email protected]
>>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu

2014-01-08 06:28:08

by Kai Huang

Subject: Re: [RFC Patch Part2 V1 14/14] iommu/vt-d: update IOMMU state when memory hotplug happens

On Wed, Jan 8, 2014 at 2:21 PM, Jiang Liu <[email protected]> wrote:
>
>
> On 2014/1/8 14:14, Kai Huang wrote:
>> On Wed, Jan 8, 2014 at 2:01 PM, Jiang Liu <[email protected]> wrote:
>>>
>>>
>>> On 2014/1/8 13:07, Kai Huang wrote:
>>>> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]> wrote:
>>>>> If static identity domain is created, IOMMU driver needs to update
>>>>> si_domain page table when memory hotplug event happens. Otherwise
>>>>> PCI device DMA operations can't access the hot-added memory regions.
>>>>>
>>>>> Signed-off-by: Jiang Liu <[email protected]>
>>>>> ---
>>>>> drivers/iommu/intel-iommu.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
>>>>> 1 file changed, 51 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>>>> index 83e3ed4..35a987d 100644
>>>>> --- a/drivers/iommu/intel-iommu.c
>>>>> +++ b/drivers/iommu/intel-iommu.c
>>>>> @@ -33,6 +33,7 @@
>>>>> #include <linux/dmar.h>
>>>>> #include <linux/dma-mapping.h>
>>>>> #include <linux/mempool.h>
>>>>> +#include <linux/memory.h>
>>>>> #include <linux/timer.h>
>>>>> #include <linux/iova.h>
>>>>> #include <linux/iommu.h>
>>>>> @@ -3689,6 +3690,54 @@ static struct notifier_block device_nb = {
>>>>> .notifier_call = device_notifier,
>>>>> };
>>>>>
>>>>> +static int intel_iommu_memory_notifier(struct notifier_block *nb,
>>>>> + unsigned long val, void *v)
>>>>> +{
>>>>> + struct memory_notify *mhp = v;
>>>>> + unsigned long long start, end;
>>>>> + struct iova *iova;
>>>>> +
>>>>> + switch (val) {
>>>>> + case MEM_GOING_ONLINE:
>>>>> + start = mhp->start_pfn << PAGE_SHIFT;
>>>>> + end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
>>>>> + if (iommu_domain_identity_map(si_domain, start, end)) {
>>>>> + pr_warn("dmar: failed to build identity map for [%llx-%llx]\n",
>>>>> + start, end);
>>>>> + return NOTIFY_BAD;
>>>>> + }
>>>>
>>>> Better to use iommu_prepare_identity_map? For si_domain, if
>>>> hw_pass_through is used, there's no page table.
>>> Hi Kai,
>>> Good catch!
>>> Seems function iommu_prepare_identity_map() is designed to handle
>>> RMRRs. So how about avoiding of registering memory hotplug notifier
>>> if hw_pass_through is true?
>>
>> I think that's also fine :)
>>
>> Btw, I have a related question to memory hotplug but not related to
>> intel IOMMU specifically. For the devices use DMA remapping, suppose
>> the device is already using the memory that we are trying to remove,
>> is this case, looks we need to change the existing iova <-> pa
>> mappings for the pa that is in the memory range about to be removed,
>> and reset the mapping to different pa (iova remains the same). Does
>> existing code have this covered? Is there a generic IOMMU layer memory
>> hotplug notifier to handle memory removal?
> That's a big issue about how to reclaim memory in use. Current rule
> is that memory used by DMA won't be removed until released.
>

Understood. Thanks.

-Kai
>>
>> -Kai
>>>
>>> Thanks!
>>> Gerry
>>>
>>>>
>>>>> + break;
>>>>> + case MEM_OFFLINE:
>>>>> + case MEM_CANCEL_ONLINE:
>>>>> + /* TODO: enhance RB-tree and IOVA code to support of splitting iova */
>>>>> + iova = find_iova(&si_domain->iovad, mhp->start_pfn);
>>>>> + if (iova) {
>>>>> + unsigned long start_pfn, last_pfn;
>>>>> + struct dmar_drhd_unit *drhd;
>>>>> + struct intel_iommu *iommu;
>>>>> +
>>>>> + start_pfn = mm_to_dma_pfn(iova->pfn_lo);
>>>>> + last_pfn = mm_to_dma_pfn(iova->pfn_hi + 1) - 1;
>>>>> + dma_pte_clear_range(si_domain, start_pfn, last_pfn);
>>>>> + dma_pte_free_pagetable(si_domain, start_pfn, last_pfn);
>>>>> + rcu_read_lock();
>>>>> + for_each_active_iommu(iommu, drhd)
>>>>> + iommu_flush_iotlb_psi(iommu, si_domain->id,
>>>>> + start_pfn, last_pfn - start_pfn + 1, 0);
>>>>> + rcu_read_unlock();
>>>>> + __free_iova(&si_domain->iovad, iova);
>>>>> + }
>>>>
>>>> The same as above. Looks we need to consider hw_pass_through for the si_domain.
>>>>
>>>> -Kai
>>>>
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> + return NOTIFY_OK;
>>>>> +}
>>>>> +
>>>>> +static struct notifier_block intel_iommu_memory_nb = {
>>>>> + .notifier_call = intel_iommu_memory_notifier,
>>>>> + .priority = 0
>>>>> +};
>>>>> +
>>>>> int __init intel_iommu_init(void)
>>>>> {
>>>>> int ret = -ENODEV;
>>>>> @@ -3761,8 +3810,9 @@ int __init intel_iommu_init(void)
>>>>> init_iommu_pm_ops();
>>>>>
>>>>> bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
>>>>> -
>>>>> bus_register_notifier(&pci_bus_type, &device_nb);
>>>>> + if (si_domain)
>>>>> + register_memory_notifier(&intel_iommu_memory_nb);
>>>>>
>>>>> intel_iommu_enabled = 1;
>>>>>
>>>>> --
>>>>> 1.7.10.4
>>>>>
>>>>> _______________________________________________
>>>>> iommu mailing list
>>>>> [email protected]
>>>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu

2014-01-08 06:31:24

by Jiang Liu

Subject: Re: [Patch Part2 V1 03/14] iommu/vt-d: simplify function get_domain_for_dev()



On 2014/1/8 14:06, Kai Huang wrote:
> On Wed, Jan 8, 2014 at 1:48 PM, Jiang Liu <[email protected]> wrote:
>>
>>
>> On 2014/1/8 11:06, Kai Huang wrote:
>>>
>>>
>>>
>>> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> Function get_domain_for_dev() is a little complex, simplify it
>>> by factoring out dmar_search_domain_by_dev_info() and
>>> dmar_insert_dev_info().
>>>
>>> Signed-off-by: Jiang Liu <[email protected]
>>> <mailto:[email protected]>>
>>> ---
>>> drivers/iommu/intel-iommu.c | 161
>>> ++++++++++++++++++++-----------------------
>>> 1 file changed, 75 insertions(+), 86 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index 1704e97..da65884 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -1957,18 +1957,70 @@ find_domain(struct pci_dev *pdev)
>>> return NULL;
>>> }
>>>
>>> +static inline struct dmar_domain *
>>> +dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
>>> +{
>>> + struct device_domain_info *info;
>>> +
>>> + list_for_each_entry(info, &device_domain_list, global)
>>> + if (info->segment == segment && info->bus == bus &&
>>> + info->devfn == devfn)
>>> + return info->domain;
>>> +
>>> + return NULL;
>>> +}
>>> +
>>> +static int dmar_insert_dev_info(int segment, int bus, int devfn,
>>> + struct pci_dev *dev, struct
>>> dmar_domain **domp)
>>> +{
>>> + struct dmar_domain *found, *domain = *domp;
>>> + struct device_domain_info *info;
>>> + unsigned long flags;
>>> +
>>> + info = alloc_devinfo_mem();
>>> + if (!info)
>>> + return -ENOMEM;
>>> +
>>> + info->segment = segment;
>>> + info->bus = bus;
>>> + info->devfn = devfn;
>>> + info->dev = dev;
>>> + info->domain = domain;
>>> + if (!dev)
>>> + domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>>> +
>>> + spin_lock_irqsave(&device_domain_lock, flags);
>>> + if (dev)
>>> + found = find_domain(dev);
>>> + else
>>> + found = dmar_search_domain_by_dev_info(segment, bus,
>>> devfn);
>>> + if (found) {
>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>> + free_devinfo_mem(info);
>>> + if (found != domain) {
>>> + domain_exit(domain);
>>> + *domp = found;
>>> + }
>>> + } else {
>>> + list_add(&info->link, &domain->devices);
>>> + list_add(&info->global, &device_domain_list);
>>> + if (dev)
>>> + dev->dev.archdata.iommu = info;
>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>> + }
>>> +
>>> + return 0;
>>> +}
>>>
>>>
>>> I am a little bit confused about the "alloc before check" sequence
>>> in the above function. I believe the purpose of allocating the
>>> device_domain_info before searching the domain with the spin_lock
>>> held is to avoid memory allocation under the spin_lock? In this
>>> case, why can't we use a mutex instead of a spin_lock? In my
>>> understanding, if we use a mutex, we can first check whether the
>>> domain exists before we really allocate the device_domain_info, right?
>> Hi Kai,
>> Thanks for review!
>> The domain->devices list may be accessed in interrupt context,
>> so we need to protect it with spin_lock_irqsave(). Otherwise it may
>> cause deadlock.
>>
>
> Thanks. That's what I am thinking. I observed lots of other iommu
> functions also use spin_lock.
>
>>>
>>> Another question is when is info->iommu field initialized? Looks it is
>>> not initialized here?
>> It's set in function iommu_support_dev_iotlb() for devices supporting
>> device IOTLB.
>
> Thanks. I see in iommu_support_dev_iotlb, info->iommu won't be set
> unless the device supports iotlb and ATS. Does this mean the
> info->iommu won't be used if the device doesn't support iotlb?
>
> If this is true, looks it's not convenient to find the IOMMU that
> handles the device via info, as it's possible that different IOMMUs
> share the same domain, and we don't know which IOMMU actually handles
> this device if we try to find it via the info->domain.
It's a little heavyweight to find the info by walking the list too :)
Please refer to domain_get_iommu() for the way to find the iommu
associated with a domain.

>
> -Kai
>
>>
>>>
>>> -Kai
>>>
>>> +
>>> /* domain is initialized */
>>> static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev,
>>> int gaw)
>>> {
>>> - struct dmar_domain *domain, *found = NULL;
>>> + struct dmar_domain *domain;
>>> struct intel_iommu *iommu;
>>> struct dmar_drhd_unit *drhd;
>>> - struct device_domain_info *info, *tmp;
>>> - struct pci_dev *dev_tmp;
>>> + struct pci_dev *bridge;
>>> unsigned long flags;
>>> - int bus = 0, devfn = 0;
>>> - int segment;
>>> - int ret;
>>> + int segment, bus, devfn;
>>>
>>> domain = find_domain(pdev);
>>> if (domain)
>>> @@ -1976,119 +2028,56 @@ static struct dmar_domain
>>> *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>>>
>>> segment = pci_domain_nr(pdev->bus);
>>>
>>> - dev_tmp = pci_find_upstream_pcie_bridge(pdev);
>>> - if (dev_tmp) {
>>> - if (pci_is_pcie(dev_tmp)) {
>>> - bus = dev_tmp->subordinate->number;
>>> + bridge = pci_find_upstream_pcie_bridge(pdev);
>>> + if (bridge) {
>>> + if (pci_is_pcie(bridge)) {
>>> + bus = bridge->subordinate->number;
>>> devfn = 0;
>>> } else {
>>> - bus = dev_tmp->bus->number;
>>> - devfn = dev_tmp->devfn;
>>> + bus = bridge->bus->number;
>>> + devfn = bridge->devfn;
>>> }
>>> spin_lock_irqsave(&device_domain_lock, flags);
>>> - list_for_each_entry(info, &device_domain_list, global) {
>>> - if (info->segment == segment &&
>>> - info->bus == bus && info->devfn == devfn) {
>>> - found = info->domain;
>>> - break;
>>> - }
>>> - }
>>> + domain = dmar_search_domain_by_dev_info(segment,
>>> bus, devfn);
>>> spin_unlock_irqrestore(&device_domain_lock, flags);
>>> /* pcie-pci bridge already has a domain, uses it */
>>> - if (found) {
>>> - domain = found;
>>> + if (domain)
>>> goto found_domain;
>>> - }
>>> }
>>>
>>> - domain = alloc_domain();
>>> - if (!domain)
>>> - goto error;
>>> -
>>> - /* Allocate new domain for the device */
>>> drhd = dmar_find_matched_drhd_unit(pdev);
>>> if (!drhd) {
>>> printk(KERN_ERR "IOMMU: can't find DMAR for device
>>> %s\n",
>>> pci_name(pdev));
>>> - free_domain_mem(domain);
>>> return NULL;
>>> }
>>> iommu = drhd->iommu;
>>>
>>> - ret = iommu_attach_domain(domain, iommu);
>>> - if (ret) {
>>> + /* Allocate new domain for the device */
>>> + domain = alloc_domain();
>>> + if (!domain)
>>> + goto error;
>>> + if (iommu_attach_domain(domain, iommu)) {
>>> free_domain_mem(domain);
>>> goto error;
>>> }
>>> -
>>> if (domain_init(domain, gaw)) {
>>> domain_exit(domain);
>>> goto error;
>>> }
>>>
>>> /* register pcie-to-pci device */
>>> - if (dev_tmp) {
>>> - info = alloc_devinfo_mem();
>>> - if (!info) {
>>> + if (bridge) {
>>> + if (dmar_insert_dev_info(segment, bus, devfn, NULL,
>>> &domain)) {
>>> domain_exit(domain);
>>> goto error;
>>> }
>>> - info->segment = segment;
>>> - info->bus = bus;
>>> - info->devfn = devfn;
>>> - info->dev = NULL;
>>> - info->domain = domain;
>>> - /* This domain is shared by devices under p2p bridge */
>>> - domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>>> -
>>> - /* pcie-to-pci bridge already has a domain, uses it */
>>> - found = NULL;
>>> - spin_lock_irqsave(&device_domain_lock, flags);
>>> - list_for_each_entry(tmp, &device_domain_list, global) {
>>> - if (tmp->segment == segment &&
>>> - tmp->bus == bus && tmp->devfn == devfn) {
>>> - found = tmp->domain;
>>> - break;
>>> - }
>>> - }
>>> - if (found) {
>>> - spin_unlock_irqrestore(&device_domain_lock,
>>> flags);
>>> - free_devinfo_mem(info);
>>> - domain_exit(domain);
>>> - domain = found;
>>> - } else {
>>> - list_add(&info->link, &domain->devices);
>>> - list_add(&info->global, &device_domain_list);
>>> - spin_unlock_irqrestore(&device_domain_lock,
>>> flags);
>>> - }
>>> }
>>>
>>> found_domain:
>>> - info = alloc_devinfo_mem();
>>> - if (!info)
>>> - goto error;
>>> - info->segment = segment;
>>> - info->bus = pdev->bus->number;
>>> - info->devfn = pdev->devfn;
>>> - info->dev = pdev;
>>> - info->domain = domain;
>>> - spin_lock_irqsave(&device_domain_lock, flags);
>>> - /* somebody is fast */
>>> - found = find_domain(pdev);
>>> - if (found != NULL) {
>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>> - if (found != domain) {
>>> - domain_exit(domain);
>>> - domain = found;
>>> - }
>>> - free_devinfo_mem(info);
>>> + if (dmar_insert_dev_info(segment, pdev->bus->number,
>>> pdev->devfn,
>>> + pdev, &domain) == 0)
>>> return domain;
>>> - }
>>> - list_add(&info->link, &domain->devices);
>>> - list_add(&info->global, &device_domain_list);
>>> - pdev->dev.archdata.iommu = info;
>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>> - return domain;
>>> error:
>>> /* recheck it here, maybe others set it */
>>> return find_domain(pdev);
>>> --
>>> 1.7.10.4
>>>
>>> _______________________________________________
>>> iommu mailing list
>>> [email protected]
>>> <mailto:[email protected]>
>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>>
>>>

2014-01-08 06:49:08

by Kai Huang

[permalink] [raw]
Subject: Re: [Patch Part2 V1 03/14] iommu/vt-d: simplify function get_domain_for_dev()

On Wed, Jan 8, 2014 at 2:31 PM, Jiang Liu <[email protected]> wrote:
>
>
> On 2014/1/8 14:06, Kai Huang wrote:
>> On Wed, Jan 8, 2014 at 1:48 PM, Jiang Liu <[email protected]> wrote:
>>>
>>>
>>> On 2014/1/8 11:06, Kai Huang wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>> Function get_domain_for_dev() is a little complex, simplify it
>>>> by factoring out dmar_search_domain_by_dev_info() and
>>>> dmar_insert_dev_info().
>>>>
>>>> Signed-off-by: Jiang Liu <[email protected]
>>>> <mailto:[email protected]>>
>>>> ---
>>>> drivers/iommu/intel-iommu.c | 161
>>>> ++++++++++++++++++++-----------------------
>>>> 1 file changed, 75 insertions(+), 86 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>>> index 1704e97..da65884 100644
>>>> --- a/drivers/iommu/intel-iommu.c
>>>> +++ b/drivers/iommu/intel-iommu.c
>>>> @@ -1957,18 +1957,70 @@ find_domain(struct pci_dev *pdev)
>>>> return NULL;
>>>> }
>>>>
>>>> +static inline struct dmar_domain *
>>>> +dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
>>>> +{
>>>> + struct device_domain_info *info;
>>>> +
>>>> + list_for_each_entry(info, &device_domain_list, global)
>>>> + if (info->segment == segment && info->bus == bus &&
>>>> + info->devfn == devfn)
>>>> + return info->domain;
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +static int dmar_insert_dev_info(int segment, int bus, int devfn,
>>>> + struct pci_dev *dev, struct
>>>> dmar_domain **domp)
>>>> +{
>>>> + struct dmar_domain *found, *domain = *domp;
>>>> + struct device_domain_info *info;
>>>> + unsigned long flags;
>>>> +
>>>> + info = alloc_devinfo_mem();
>>>> + if (!info)
>>>> + return -ENOMEM;
>>>> +
>>>> + info->segment = segment;
>>>> + info->bus = bus;
>>>> + info->devfn = devfn;
>>>> + info->dev = dev;
>>>> + info->domain = domain;
>>>> + if (!dev)
>>>> + domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>>>> +
>>>> + spin_lock_irqsave(&device_domain_lock, flags);
>>>> + if (dev)
>>>> + found = find_domain(dev);
>>>> + else
>>>> + found = dmar_search_domain_by_dev_info(segment, bus,
>>>> devfn);
>>>> + if (found) {
>>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>>> + free_devinfo_mem(info);
>>>> + if (found != domain) {
>>>> + domain_exit(domain);
>>>> + *domp = found;
>>>> + }
>>>> + } else {
>>>> + list_add(&info->link, &domain->devices);
>>>> + list_add(&info->global, &device_domain_list);
>>>> + if (dev)
>>>> + dev->dev.archdata.iommu = info;
>>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>>
>>>>
>>>> I am a little bit confused about the "alloc before check" sequence
>>>> in the above function. I believe the purpose of allocating the
>>>> device_domain_info before searching the domain with the spin_lock
>>>> held is to avoid memory allocation under the spin_lock? In this
>>>> case, why can't we use a mutex instead of a spin_lock? In my
>>>> understanding, if we use a mutex, we can first check whether the
>>>> domain exists before we really allocate the device_domain_info, right?
>>> Hi Kai,
>>> Thanks for review!
>>> The domain->devices list may be accessed in interrupt context,
>>> so we need to protect it with spin_lock_irqsave(). Otherwise it may
>>> cause deadlock.
>>>
>>
>> Thanks. That's what I am thinking. I observed lots of other iommu
>> functions also use spin_lock.
>>
>>>>
>>>> Another question is when is info->iommu field initialized? Looks it is
>>>> not initialized here?
>>> It's set in function iommu_support_dev_iotlb() for devices supporting
>>> device IOTLB.
>>
>> Thanks. I see in iommu_support_dev_iotlb, info->iommu won't be set
>> unless the device supports iotlb and ATS. Does this mean the
>> info->iommu won't be used if the device doesn't support iotlb?
>>
>> If this is true, looks it's not convenient to find the IOMMU that
>> handles the device via info, as it's possible that different IOMMUs
>> share the same domain, and we don't know which IOMMU actually handles
>> this device if we try to find it via the info->domain.
> It's a little heavy to find info by walking the list too:)
> Please refer to domain_get_iommu() for the way to find iommu associated
> with a domain.
>
It looks like domain_get_iommu() just returns the first IOMMU in the
bitmap, so it can't return the IOMMU that *really* handles the device
in hardware. But I guess from the domain's view we probably don't care
which IOMMU really handles the device, because if multiple IOMMUs share
the same domain, the IOMMUs should have the same (or very similar)
capabilities, so referencing the first IOMMU to check some capabilities
(that's what I see in the code so far) should work for the domain-layer
API.

But it looks safe to set info->iommu in get_domain_for_dev() anyway
(and remove the assignment from iommu_support_dev_iotlb())? In that
case it's easier to get the IOMMU from the info.

Of course, it's just my opinion and may well be wrong; it's totally up
to you whether to do this or not. :)

-Kai

>>
>> -Kai
>>
>>>
>>>>
>>>> -Kai
>>>>
>>>> +
>>>> /* domain is initialized */
>>>> static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev,
>>>> int gaw)
>>>> {
>>>> - struct dmar_domain *domain, *found = NULL;
>>>> + struct dmar_domain *domain;
>>>> struct intel_iommu *iommu;
>>>> struct dmar_drhd_unit *drhd;
>>>> - struct device_domain_info *info, *tmp;
>>>> - struct pci_dev *dev_tmp;
>>>> + struct pci_dev *bridge;
>>>> unsigned long flags;
>>>> - int bus = 0, devfn = 0;
>>>> - int segment;
>>>> - int ret;
>>>> + int segment, bus, devfn;
>>>>
>>>> domain = find_domain(pdev);
>>>> if (domain)
>>>> @@ -1976,119 +2028,56 @@ static struct dmar_domain
>>>> *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>>>>
>>>> segment = pci_domain_nr(pdev->bus);
>>>>
>>>> - dev_tmp = pci_find_upstream_pcie_bridge(pdev);
>>>> - if (dev_tmp) {
>>>> - if (pci_is_pcie(dev_tmp)) {
>>>> - bus = dev_tmp->subordinate->number;
>>>> + bridge = pci_find_upstream_pcie_bridge(pdev);
>>>> + if (bridge) {
>>>> + if (pci_is_pcie(bridge)) {
>>>> + bus = bridge->subordinate->number;
>>>> devfn = 0;
>>>> } else {
>>>> - bus = dev_tmp->bus->number;
>>>> - devfn = dev_tmp->devfn;
>>>> + bus = bridge->bus->number;
>>>> + devfn = bridge->devfn;
>>>> }
>>>> spin_lock_irqsave(&device_domain_lock, flags);
>>>> - list_for_each_entry(info, &device_domain_list, global) {
>>>> - if (info->segment == segment &&
>>>> - info->bus == bus && info->devfn == devfn) {
>>>> - found = info->domain;
>>>> - break;
>>>> - }
>>>> - }
>>>> + domain = dmar_search_domain_by_dev_info(segment,
>>>> bus, devfn);
>>>> spin_unlock_irqrestore(&device_domain_lock, flags);
>>>> /* pcie-pci bridge already has a domain, uses it */
>>>> - if (found) {
>>>> - domain = found;
>>>> + if (domain)
>>>> goto found_domain;
>>>> - }
>>>> }
>>>>
>>>> - domain = alloc_domain();
>>>> - if (!domain)
>>>> - goto error;
>>>> -
>>>> - /* Allocate new domain for the device */
>>>> drhd = dmar_find_matched_drhd_unit(pdev);
>>>> if (!drhd) {
>>>> printk(KERN_ERR "IOMMU: can't find DMAR for device
>>>> %s\n",
>>>> pci_name(pdev));
>>>> - free_domain_mem(domain);
>>>> return NULL;
>>>> }
>>>> iommu = drhd->iommu;
>>>>
>>>> - ret = iommu_attach_domain(domain, iommu);
>>>> - if (ret) {
>>>> + /* Allocate new domain for the device */
>>>> + domain = alloc_domain();
>>>> + if (!domain)
>>>> + goto error;
>>>> + if (iommu_attach_domain(domain, iommu)) {
>>>> free_domain_mem(domain);
>>>> goto error;
>>>> }
>>>> -
>>>> if (domain_init(domain, gaw)) {
>>>> domain_exit(domain);
>>>> goto error;
>>>> }
>>>>
>>>> /* register pcie-to-pci device */
>>>> - if (dev_tmp) {
>>>> - info = alloc_devinfo_mem();
>>>> - if (!info) {
>>>> + if (bridge) {
>>>> + if (dmar_insert_dev_info(segment, bus, devfn, NULL,
>>>> &domain)) {
>>>> domain_exit(domain);
>>>> goto error;
>>>> }
>>>> - info->segment = segment;
>>>> - info->bus = bus;
>>>> - info->devfn = devfn;
>>>> - info->dev = NULL;
>>>> - info->domain = domain;
>>>> - /* This domain is shared by devices under p2p bridge */
>>>> - domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>>>> -
>>>> - /* pcie-to-pci bridge already has a domain, uses it */
>>>> - found = NULL;
>>>> - spin_lock_irqsave(&device_domain_lock, flags);
>>>> - list_for_each_entry(tmp, &device_domain_list, global) {
>>>> - if (tmp->segment == segment &&
>>>> - tmp->bus == bus && tmp->devfn == devfn) {
>>>> - found = tmp->domain;
>>>> - break;
>>>> - }
>>>> - }
>>>> - if (found) {
>>>> - spin_unlock_irqrestore(&device_domain_lock,
>>>> flags);
>>>> - free_devinfo_mem(info);
>>>> - domain_exit(domain);
>>>> - domain = found;
>>>> - } else {
>>>> - list_add(&info->link, &domain->devices);
>>>> - list_add(&info->global, &device_domain_list);
>>>> - spin_unlock_irqrestore(&device_domain_lock,
>>>> flags);
>>>> - }
>>>> }
>>>>
>>>> found_domain:
>>>> - info = alloc_devinfo_mem();
>>>> - if (!info)
>>>> - goto error;
>>>> - info->segment = segment;
>>>> - info->bus = pdev->bus->number;
>>>> - info->devfn = pdev->devfn;
>>>> - info->dev = pdev;
>>>> - info->domain = domain;
>>>> - spin_lock_irqsave(&device_domain_lock, flags);
>>>> - /* somebody is fast */
>>>> - found = find_domain(pdev);
>>>> - if (found != NULL) {
>>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>>> - if (found != domain) {
>>>> - domain_exit(domain);
>>>> - domain = found;
>>>> - }
>>>> - free_devinfo_mem(info);
>>>> + if (dmar_insert_dev_info(segment, pdev->bus->number,
>>>> pdev->devfn,
>>>> + pdev, &domain) == 0)
>>>> return domain;
>>>> - }
>>>> - list_add(&info->link, &domain->devices);
>>>> - list_add(&info->global, &device_domain_list);
>>>> - pdev->dev.archdata.iommu = info;
>>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>>> - return domain;
>>>> error:
>>>> /* recheck it here, maybe others set it */
>>>> return find_domain(pdev);
>>>> --
>>>> 1.7.10.4
>>>>
>>>>
>>>>

2014-01-08 06:56:58

by Kai Huang

[permalink] [raw]
Subject: Re: [Patch Part2 V1 03/14] iommu/vt-d: simplify function get_domain_for_dev()

On Wed, Jan 8, 2014 at 2:48 PM, Kai Huang <[email protected]> wrote:
> On Wed, Jan 8, 2014 at 2:31 PM, Jiang Liu <[email protected]> wrote:
>>
>>
>> On 2014/1/8 14:06, Kai Huang wrote:
>>> On Wed, Jan 8, 2014 at 1:48 PM, Jiang Liu <[email protected]> wrote:
>>>>
>>>>
>>>> On 2014/1/8 11:06, Kai Huang wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]
>>>>> <mailto:[email protected]>> wrote:
>>>>>
>>>>> Function get_domain_for_dev() is a little complex, simplify it
>>>>> by factoring out dmar_search_domain_by_dev_info() and
>>>>> dmar_insert_dev_info().
>>>>>
>>>>> Signed-off-by: Jiang Liu <[email protected]
>>>>> <mailto:[email protected]>>
>>>>> ---
>>>>> drivers/iommu/intel-iommu.c | 161
>>>>> ++++++++++++++++++++-----------------------
>>>>> 1 file changed, 75 insertions(+), 86 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>>>> index 1704e97..da65884 100644
>>>>> --- a/drivers/iommu/intel-iommu.c
>>>>> +++ b/drivers/iommu/intel-iommu.c
>>>>> @@ -1957,18 +1957,70 @@ find_domain(struct pci_dev *pdev)
>>>>> return NULL;
>>>>> }
>>>>>
>>>>> +static inline struct dmar_domain *
>>>>> +dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
>>>>> +{
>>>>> + struct device_domain_info *info;
>>>>> +
>>>>> + list_for_each_entry(info, &device_domain_list, global)
>>>>> + if (info->segment == segment && info->bus == bus &&
>>>>> + info->devfn == devfn)
>>>>> + return info->domain;
>>>>> +
>>>>> + return NULL;
>>>>> +}
>>>>> +
>>>>> +static int dmar_insert_dev_info(int segment, int bus, int devfn,
>>>>> + struct pci_dev *dev, struct
>>>>> dmar_domain **domp)
>>>>> +{
>>>>> + struct dmar_domain *found, *domain = *domp;
>>>>> + struct device_domain_info *info;
>>>>> + unsigned long flags;
>>>>> +
>>>>> + info = alloc_devinfo_mem();
>>>>> + if (!info)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + info->segment = segment;
>>>>> + info->bus = bus;
>>>>> + info->devfn = devfn;
>>>>> + info->dev = dev;
>>>>> + info->domain = domain;
>>>>> + if (!dev)
>>>>> + domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>>>>> +
>>>>> + spin_lock_irqsave(&device_domain_lock, flags);
>>>>> + if (dev)
>>>>> + found = find_domain(dev);
>>>>> + else
>>>>> + found = dmar_search_domain_by_dev_info(segment, bus,
>>>>> devfn);
>>>>> + if (found) {
>>>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> + free_devinfo_mem(info);
>>>>> + if (found != domain) {
>>>>> + domain_exit(domain);
>>>>> + *domp = found;
>>>>> + }
>>>>> + } else {
>>>>> + list_add(&info->link, &domain->devices);
>>>>> + list_add(&info->global, &device_domain_list);
>>>>> + if (dev)
>>>>> + dev->dev.archdata.iommu = info;
>>>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> + }
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>>
>>>>>
>>>>> I am a little bit confused about the "alloc before check" sequence
>>>>> in the above function. I believe the purpose of allocating the
>>>>> device_domain_info before searching the domain with the spin_lock
>>>>> held is to avoid memory allocation under the spin_lock? In this
>>>>> case, why can't we use a mutex instead of a spin_lock? In my
>>>>> understanding, if we use a mutex, we can first check whether the
>>>>> domain exists before we really allocate the device_domain_info, right?
>>>> Hi Kai,
>>>> Thanks for review!
>>>> The domain->devices list may be accessed in interrupt context,
>>>> so we need to protect it with spin_lock_irqsave(). Otherwise it may
>>>> cause deadlock.
>>>>
>>>
>>> Thanks. That's what I am thinking. I observed lots of other iommu
>>> functions also use spin_lock.
>>>
>>>>>
>>>>> Another question is when is info->iommu field initialized? Looks it is
>>>>> not initialized here?
>>>> It's set in function iommu_support_dev_iotlb() for devices supporting
>>>> device IOTLB.
>>>
>>> Thanks. I see in iommu_support_dev_iotlb, info->iommu won't be set
>>> unless the device supports iotlb and ATS. Does this mean the
>>> info->iommu won't be used if the device doesn't support iotlb?
>>>
>>> If this is true, looks it's not convenient to find the IOMMU that
>>> handles the device via info, as it's possible that different IOMMUs
>>> share the same domain, and we don't know which IOMMU actually handles
>>> this device if we try to find it via the info->domain.
>> It's a little heavy to find info by walking the list too:)
>> Please refer to domain_get_iommu() for the way to find iommu associated
>> with a domain.
>>
> Looks domain_get_iommu just returns the first IOMMU in the bitmap, so
> it can't return the IOMMU that *really* handles the device in
> hardware. But I guess from domain's view, probably we don't care which
> IOMMU really handles the device, cause if multiple IOMMUs share the
> same domain, the IOMMUs should have the same (or very similar)
> capabilities, so referencing the first IOMMU to check some
> capabilities (that's what I see in code so far) should work for the
> domain layer API.
>

And I just realized that domain_get_iommu() takes only the domain as an
argument, so it's really not meant to get the IOMMU from a device. I am
not sure whether the current code has an interface like
get_iommu_from_device(struct pci_dev *dev); probably we don't need
it. :)

-Kai

> But looks it's safe to set info->iommu in get_domain_for_dev anyway
> (and remove it in iommu_support_dev_iotlb)? In this case it's easier
> when we want to get IOMMU from info.
>
> Of course, it's just my opinion, and probably is wrong, it's totally
> up to you if want to do this or not. :)
>
> -Kai
>
>>>
>>> -Kai
>>>
>>>>
>>>>>
>>>>> -Kai
>>>>>
>>>>> +
>>>>> /* domain is initialized */
>>>>> static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev,
>>>>> int gaw)
>>>>> {
>>>>> - struct dmar_domain *domain, *found = NULL;
>>>>> + struct dmar_domain *domain;
>>>>> struct intel_iommu *iommu;
>>>>> struct dmar_drhd_unit *drhd;
>>>>> - struct device_domain_info *info, *tmp;
>>>>> - struct pci_dev *dev_tmp;
>>>>> + struct pci_dev *bridge;
>>>>> unsigned long flags;
>>>>> - int bus = 0, devfn = 0;
>>>>> - int segment;
>>>>> - int ret;
>>>>> + int segment, bus, devfn;
>>>>>
>>>>> domain = find_domain(pdev);
>>>>> if (domain)
>>>>> @@ -1976,119 +2028,56 @@ static struct dmar_domain
>>>>> *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>>>>>
>>>>> segment = pci_domain_nr(pdev->bus);
>>>>>
>>>>> - dev_tmp = pci_find_upstream_pcie_bridge(pdev);
>>>>> - if (dev_tmp) {
>>>>> - if (pci_is_pcie(dev_tmp)) {
>>>>> - bus = dev_tmp->subordinate->number;
>>>>> + bridge = pci_find_upstream_pcie_bridge(pdev);
>>>>> + if (bridge) {
>>>>> + if (pci_is_pcie(bridge)) {
>>>>> + bus = bridge->subordinate->number;
>>>>> devfn = 0;
>>>>> } else {
>>>>> - bus = dev_tmp->bus->number;
>>>>> - devfn = dev_tmp->devfn;
>>>>> + bus = bridge->bus->number;
>>>>> + devfn = bridge->devfn;
>>>>> }
>>>>> spin_lock_irqsave(&device_domain_lock, flags);
>>>>> - list_for_each_entry(info, &device_domain_list, global) {
>>>>> - if (info->segment == segment &&
>>>>> - info->bus == bus && info->devfn == devfn) {
>>>>> - found = info->domain;
>>>>> - break;
>>>>> - }
>>>>> - }
>>>>> + domain = dmar_search_domain_by_dev_info(segment,
>>>>> bus, devfn);
>>>>> spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> /* pcie-pci bridge already has a domain, uses it */
>>>>> - if (found) {
>>>>> - domain = found;
>>>>> + if (domain)
>>>>> goto found_domain;
>>>>> - }
>>>>> }
>>>>>
>>>>> - domain = alloc_domain();
>>>>> - if (!domain)
>>>>> - goto error;
>>>>> -
>>>>> - /* Allocate new domain for the device */
>>>>> drhd = dmar_find_matched_drhd_unit(pdev);
>>>>> if (!drhd) {
>>>>> printk(KERN_ERR "IOMMU: can't find DMAR for device
>>>>> %s\n",
>>>>> pci_name(pdev));
>>>>> - free_domain_mem(domain);
>>>>> return NULL;
>>>>> }
>>>>> iommu = drhd->iommu;
>>>>>
>>>>> - ret = iommu_attach_domain(domain, iommu);
>>>>> - if (ret) {
>>>>> + /* Allocate new domain for the device */
>>>>> + domain = alloc_domain();
>>>>> + if (!domain)
>>>>> + goto error;
>>>>> + if (iommu_attach_domain(domain, iommu)) {
>>>>> free_domain_mem(domain);
>>>>> goto error;
>>>>> }
>>>>> -
>>>>> if (domain_init(domain, gaw)) {
>>>>> domain_exit(domain);
>>>>> goto error;
>>>>> }
>>>>>
>>>>> /* register pcie-to-pci device */
>>>>> - if (dev_tmp) {
>>>>> - info = alloc_devinfo_mem();
>>>>> - if (!info) {
>>>>> + if (bridge) {
>>>>> + if (dmar_insert_dev_info(segment, bus, devfn, NULL,
>>>>> &domain)) {
>>>>> domain_exit(domain);
>>>>> goto error;
>>>>> }
>>>>> - info->segment = segment;
>>>>> - info->bus = bus;
>>>>> - info->devfn = devfn;
>>>>> - info->dev = NULL;
>>>>> - info->domain = domain;
>>>>> - /* This domain is shared by devices under p2p bridge */
>>>>> - domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>>>>> -
>>>>> - /* pcie-to-pci bridge already has a domain, uses it */
>>>>> - found = NULL;
>>>>> - spin_lock_irqsave(&device_domain_lock, flags);
>>>>> - list_for_each_entry(tmp, &device_domain_list, global) {
>>>>> - if (tmp->segment == segment &&
>>>>> - tmp->bus == bus && tmp->devfn == devfn) {
>>>>> - found = tmp->domain;
>>>>> - break;
>>>>> - }
>>>>> - }
>>>>> - if (found) {
>>>>> - spin_unlock_irqrestore(&device_domain_lock,
>>>>> flags);
>>>>> - free_devinfo_mem(info);
>>>>> - domain_exit(domain);
>>>>> - domain = found;
>>>>> - } else {
>>>>> - list_add(&info->link, &domain->devices);
>>>>> - list_add(&info->global, &device_domain_list);
>>>>> - spin_unlock_irqrestore(&device_domain_lock,
>>>>> flags);
>>>>> - }
>>>>> }
>>>>>
>>>>> found_domain:
>>>>> - info = alloc_devinfo_mem();
>>>>> - if (!info)
>>>>> - goto error;
>>>>> - info->segment = segment;
>>>>> - info->bus = pdev->bus->number;
>>>>> - info->devfn = pdev->devfn;
>>>>> - info->dev = pdev;
>>>>> - info->domain = domain;
>>>>> - spin_lock_irqsave(&device_domain_lock, flags);
>>>>> - /* somebody is fast */
>>>>> - found = find_domain(pdev);
>>>>> - if (found != NULL) {
>>>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> - if (found != domain) {
>>>>> - domain_exit(domain);
>>>>> - domain = found;
>>>>> - }
>>>>> - free_devinfo_mem(info);
>>>>> + if (dmar_insert_dev_info(segment, pdev->bus->number,
>>>>> pdev->devfn,
>>>>> + pdev, &domain) == 0)
>>>>> return domain;
>>>>> - }
>>>>> - list_add(&info->link, &domain->devices);
>>>>> - list_add(&info->global, &device_domain_list);
>>>>> - pdev->dev.archdata.iommu = info;
>>>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> - return domain;
>>>>> error:
>>>>> /* recheck it here, maybe others set it */
>>>>> return find_domain(pdev);
>>>>> --
>>>>> 1.7.10.4
>>>>>
>>>>>
>>>>>

2014-01-08 06:57:48

by Jiang Liu

[permalink] [raw]
Subject: Re: [Patch Part2 V1 03/14] iommu/vt-d: simplify function get_domain_for_dev()



On 2014/1/8 14:48, Kai Huang wrote:
> On Wed, Jan 8, 2014 at 2:31 PM, Jiang Liu <[email protected]> wrote:
>>
>>
>> On 2014/1/8 14:06, Kai Huang wrote:
>>> On Wed, Jan 8, 2014 at 1:48 PM, Jiang Liu <[email protected]> wrote:
>>>>
>>>>
>>>> On 2014/1/8 11:06, Kai Huang wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 7, 2014 at 5:00 PM, Jiang Liu <[email protected]
>>>>> <mailto:[email protected]>> wrote:
>>>>>
>>>>> Function get_domain_for_dev() is a little complex, simplify it
>>>>> by factoring out dmar_search_domain_by_dev_info() and
>>>>> dmar_insert_dev_info().
>>>>>
>>>>> Signed-off-by: Jiang Liu <[email protected]
>>>>> <mailto:[email protected]>>
>>>>> ---
>>>>> drivers/iommu/intel-iommu.c | 161
>>>>> ++++++++++++++++++++-----------------------
>>>>> 1 file changed, 75 insertions(+), 86 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>>>> index 1704e97..da65884 100644
>>>>> --- a/drivers/iommu/intel-iommu.c
>>>>> +++ b/drivers/iommu/intel-iommu.c
>>>>> @@ -1957,18 +1957,70 @@ find_domain(struct pci_dev *pdev)
>>>>> return NULL;
>>>>> }
>>>>>
>>>>> +static inline struct dmar_domain *
>>>>> +dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
>>>>> +{
>>>>> + struct device_domain_info *info;
>>>>> +
>>>>> + list_for_each_entry(info, &device_domain_list, global)
>>>>> + if (info->segment == segment && info->bus == bus &&
>>>>> + info->devfn == devfn)
>>>>> + return info->domain;
>>>>> +
>>>>> + return NULL;
>>>>> +}
>>>>> +
>>>>> +static int dmar_insert_dev_info(int segment, int bus, int devfn,
>>>>> + struct pci_dev *dev, struct
>>>>> dmar_domain **domp)
>>>>> +{
>>>>> + struct dmar_domain *found, *domain = *domp;
>>>>> + struct device_domain_info *info;
>>>>> + unsigned long flags;
>>>>> +
>>>>> + info = alloc_devinfo_mem();
>>>>> + if (!info)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + info->segment = segment;
>>>>> + info->bus = bus;
>>>>> + info->devfn = devfn;
>>>>> + info->dev = dev;
>>>>> + info->domain = domain;
>>>>> + if (!dev)
>>>>> + domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>>>>> +
>>>>> + spin_lock_irqsave(&device_domain_lock, flags);
>>>>> + if (dev)
>>>>> + found = find_domain(dev);
>>>>> + else
>>>>> + found = dmar_search_domain_by_dev_info(segment, bus,
>>>>> devfn);
>>>>> + if (found) {
>>>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> + free_devinfo_mem(info);
>>>>> + if (found != domain) {
>>>>> + domain_exit(domain);
>>>>> + *domp = found;
>>>>> + }
>>>>> + } else {
>>>>> + list_add(&info->link, &domain->devices);
>>>>> + list_add(&info->global, &device_domain_list);
>>>>> + if (dev)
>>>>> + dev->dev.archdata.iommu = info;
>>>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> + }
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>>
>>>>>
>>>>> I am a little bit confused about the "alloc before check" sequence
>>>>> in the above function. I believe the purpose of allocating the
>>>>> device_domain_info before searching the domain with the spin_lock
>>>>> held is to avoid memory allocation under the spin_lock? In this
>>>>> case, why can't we use a mutex instead of a spin_lock? In my
>>>>> understanding, if we use a mutex, we can first check whether the
>>>>> domain exists before we really allocate the device_domain_info, right?
>>>> Hi Kai,
>>>> Thanks for review!
>>>> The domain->devices list may be accessed in interrupt context,
>>>> so we need to protect it with spin_lock_irqsave(). Otherwise it may
>>>> cause deadlock.
>>>>
>>>
>>> Thanks. That's what I am thinking. I observed lots of other iommu
>>> functions also use spin_lock.
>>>
>>>>>
>>>>> Another question is when is info->iommu field initialized? Looks it is
>>>>> not initialized here?
>>>> It's set in function iommu_support_dev_iotlb() for devices supporting
>>>> device IOTLB.
>>>
>>> Thanks. I see in iommu_support_dev_iotlb, info->iommu won't be set
>>> unless the device supports iotlb and ATS. Does this mean the
>>> info->iommu won't be used if the device doesn't support iotlb?
>>>
>>> If this is true, looks it's not convenient to find the IOMMU that
>>> handles the device via info, as it's possible that different IOMMUs
>>> share the same domain, and we don't know which IOMMU actually handles
>>> this device if we try to find it via the info->domain.
>> It's a little heavy to find info by walking the list too:)
>> Please refer to domain_get_iommu() for the way to find iommu associated
>> with a domain.
>>
> It looks like domain_get_iommu() just returns the first IOMMU in the
> bitmap, so it can't return the IOMMU that *really* handles the device
> in hardware. But I guess from the domain's view, we probably don't care
> which IOMMU really handles the device, because if multiple IOMMUs share
> the same domain, the IOMMUs should have the same (or very similar)
> capabilities, so referencing the first IOMMU to check some capabilities
> (that's what I see in the code so far) should work for the domain-layer
> API.
>
> But it looks safe to set info->iommu in get_domain_for_dev() anyway
> (and remove it from iommu_support_dev_iotlb())? In this case it's
> easier when we want to get the IOMMU from the info.
>
> Of course, it's just my opinion, and it may well be wrong; it's totally
> up to you whether to do this or not. :)
Actually I'm on your side; I think it's better to initialize info->iommu
in dmar_insert_dev_info() too. I will try it that way.
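To make the point concrete, here is a minimal userspace model (not the kernel code; the names and layout are hypothetical) of the behavior discussed above: domain_get_iommu() simply picks the first IOMMU whose bit is set in the domain's bitmap, so when several IOMMUs share one domain the caller gets a representative unit, not necessarily the one that actually handles a given device.

```c
#include <assert.h>

/* Illustrative model only: bit i set in iommu_bmp means IOMMU i is
 * attached to this domain. */
#define MAX_IOMMUS 64

struct model_domain {
    unsigned long iommu_bmp;
};

/* Return the index of the first attached IOMMU, or -1 if none.
 * This mirrors the "first bit in the bitmap" behavior: the result is
 * a representative IOMMU for capability checks, not necessarily the
 * unit wired to a particular device. */
static int model_domain_get_iommu(const struct model_domain *d)
{
    for (int i = 0; i < MAX_IOMMUS; i++)
        if (d->iommu_bmp & (1UL << i))
            return i;
    return -1;
}
```

With iommu_bmp having bits 1 and 2 set, this returns IOMMU 1 even when the device in question is behind IOMMU 2, which is why recording the real unit in info->iommu at insertion time is attractive.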

>
> -Kai
>
>>>
>>> -Kai
>>>
>>>>
>>>>>
>>>>> -Kai
>>>>>
>>>>> +
>>>>> /* domain is initialized */
>>>>> static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev,
>>>>> int gaw)
>>>>> {
>>>>> - struct dmar_domain *domain, *found = NULL;
>>>>> + struct dmar_domain *domain;
>>>>> struct intel_iommu *iommu;
>>>>> struct dmar_drhd_unit *drhd;
>>>>> - struct device_domain_info *info, *tmp;
>>>>> - struct pci_dev *dev_tmp;
>>>>> + struct pci_dev *bridge;
>>>>> unsigned long flags;
>>>>> - int bus = 0, devfn = 0;
>>>>> - int segment;
>>>>> - int ret;
>>>>> + int segment, bus, devfn;
>>>>>
>>>>> domain = find_domain(pdev);
>>>>> if (domain)
>>>>> @@ -1976,119 +2028,56 @@ static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>>>>>
>>>>> segment = pci_domain_nr(pdev->bus);
>>>>>
>>>>> - dev_tmp = pci_find_upstream_pcie_bridge(pdev);
>>>>> - if (dev_tmp) {
>>>>> - if (pci_is_pcie(dev_tmp)) {
>>>>> - bus = dev_tmp->subordinate->number;
>>>>> + bridge = pci_find_upstream_pcie_bridge(pdev);
>>>>> + if (bridge) {
>>>>> + if (pci_is_pcie(bridge)) {
>>>>> + bus = bridge->subordinate->number;
>>>>> devfn = 0;
>>>>> } else {
>>>>> - bus = dev_tmp->bus->number;
>>>>> - devfn = dev_tmp->devfn;
>>>>> + bus = bridge->bus->number;
>>>>> + devfn = bridge->devfn;
>>>>> }
>>>>> spin_lock_irqsave(&device_domain_lock, flags);
>>>>> - list_for_each_entry(info, &device_domain_list, global) {
>>>>> - if (info->segment == segment &&
>>>>> - info->bus == bus && info->devfn == devfn) {
>>>>> - found = info->domain;
>>>>> - break;
>>>>> - }
>>>>> - }
>>>>> + domain = dmar_search_domain_by_dev_info(segment, bus, devfn);
>>>>> spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> /* pcie-pci bridge already has a domain, uses it */
>>>>> - if (found) {
>>>>> - domain = found;
>>>>> + if (domain)
>>>>> goto found_domain;
>>>>> - }
>>>>> }
>>>>>
>>>>> - domain = alloc_domain();
>>>>> - if (!domain)
>>>>> - goto error;
>>>>> -
>>>>> - /* Allocate new domain for the device */
>>>>> drhd = dmar_find_matched_drhd_unit(pdev);
>>>>> if (!drhd) {
>>>>> printk(KERN_ERR "IOMMU: can't find DMAR for device %s\n",
>>>>> pci_name(pdev));
>>>>> - free_domain_mem(domain);
>>>>> return NULL;
>>>>> }
>>>>> iommu = drhd->iommu;
>>>>>
>>>>> - ret = iommu_attach_domain(domain, iommu);
>>>>> - if (ret) {
>>>>> + /* Allocate new domain for the device */
>>>>> + domain = alloc_domain();
>>>>> + if (!domain)
>>>>> + goto error;
>>>>> + if (iommu_attach_domain(domain, iommu)) {
>>>>> free_domain_mem(domain);
>>>>> goto error;
>>>>> }
>>>>> -
>>>>> if (domain_init(domain, gaw)) {
>>>>> domain_exit(domain);
>>>>> goto error;
>>>>> }
>>>>>
>>>>> /* register pcie-to-pci device */
>>>>> - if (dev_tmp) {
>>>>> - info = alloc_devinfo_mem();
>>>>> - if (!info) {
>>>>> + if (bridge) {
>>>>> + if (dmar_insert_dev_info(segment, bus, devfn, NULL, &domain)) {
>>>>> domain_exit(domain);
>>>>> goto error;
>>>>> }
>>>>> - info->segment = segment;
>>>>> - info->bus = bus;
>>>>> - info->devfn = devfn;
>>>>> - info->dev = NULL;
>>>>> - info->domain = domain;
>>>>> - /* This domain is shared by devices under p2p bridge */
>>>>> - domain->flags |= DOMAIN_FLAG_P2P_MULTIPLE_DEVICES;
>>>>> -
>>>>> - /* pcie-to-pci bridge already has a domain, uses it */
>>>>> - found = NULL;
>>>>> - spin_lock_irqsave(&device_domain_lock, flags);
>>>>> - list_for_each_entry(tmp, &device_domain_list, global) {
>>>>> - if (tmp->segment == segment &&
>>>>> - tmp->bus == bus && tmp->devfn == devfn) {
>>>>> - found = tmp->domain;
>>>>> - break;
>>>>> - }
>>>>> - }
>>>>> - if (found) {
>>>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> - free_devinfo_mem(info);
>>>>> - domain_exit(domain);
>>>>> - domain = found;
>>>>> - } else {
>>>>> - list_add(&info->link, &domain->devices);
>>>>> - list_add(&info->global, &device_domain_list);
>>>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> - }
>>>>> }
>>>>>
>>>>> found_domain:
>>>>> - info = alloc_devinfo_mem();
>>>>> - if (!info)
>>>>> - goto error;
>>>>> - info->segment = segment;
>>>>> - info->bus = pdev->bus->number;
>>>>> - info->devfn = pdev->devfn;
>>>>> - info->dev = pdev;
>>>>> - info->domain = domain;
>>>>> - spin_lock_irqsave(&device_domain_lock, flags);
>>>>> - /* somebody is fast */
>>>>> - found = find_domain(pdev);
>>>>> - if (found != NULL) {
>>>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> - if (found != domain) {
>>>>> - domain_exit(domain);
>>>>> - domain = found;
>>>>> - }
>>>>> - free_devinfo_mem(info);
>>>>> + if (dmar_insert_dev_info(segment, pdev->bus->number, pdev->devfn,
>>>>> + pdev, &domain) == 0)
>>>>> return domain;
>>>>> - }
>>>>> - list_add(&info->link, &domain->devices);
>>>>> - list_add(&info->global, &device_domain_list);
>>>>> - pdev->dev.archdata.iommu = info;
>>>>> - spin_unlock_irqrestore(&device_domain_lock, flags);
>>>>> - return domain;
>>>>> error:
>>>>> /* recheck it here, maybe others set it */
>>>>> return find_domain(pdev);
>>>>> --
>>>>> 1.7.10.4
>>>>>
>>>>> _______________________________________________
>>>>> iommu mailing list
>>>>> [email protected]
>>>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>>>>
>>>>>

2014-01-08 20:43:41

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Patch Part2 V1 00/14] Enhance DMAR drivers to handle PCI/memory hotplug events

On Tue, Jan 7, 2014 at 1:00 AM, Jiang Liu <[email protected]> wrote:
> Intel DMA/interrupt remapping drivers scan available PCI/memory devices
> at startup and cache the discovered hardware topologies. They don't
> update the cached information if a PCI/memory hotplug event happens at
> runtime, so the stale information may break the DMA/interrupt remapping
> logic.
>
> This patchset first (Patches 1-8) introduces some helper functions and
> fixes several bugs, then (Patches 9-10) uses a global rwsem and RCU to
> protect global DMA/interrupt remapping data structures, and finally
> (Patches 11-14) hooks PCI/memory hotplug events to update the cached
> information.
>
> It also prepares for supporting DMA/interrupt remapping hotplug.
>
> Jiang Liu (14):
> iommu/vt-d: factor out dmar_alloc_dev_scope() for later reuse
> iommu/vt-d: move private structures and variables into intel-iommu.c
> iommu/vt-d: simplify function get_domain_for_dev()
> iommu/vt-d: free resources if failed to create domain for PCIe
> endpoint
> iommu/vt-d: create device_domain_info structure for intermediate P2P
> bridges
> iommu/vt-d: fix incorrect iommu_count for si_domain
> iommu/vt-d: fix error in detect ATS capability
> iommu/vt-d: introduce macro for_each_dev_scope() to walk device scope
> entries
> iommu/vt-d: introduce a rwsem to protect global data structures
> iommu/vt-d: use RCU to protect global resources in interrupt context
> iommu/vt-d, PCI: update DRHD/RMRR/ATSR device scope caches when PCI
> hotplug happens
> iommu/vt-d, PCI: unify the way to process DMAR device scope array
> iommu/vt-d: update device to static identity domain mapping for PCI
> hotplug
> iommu/vt-d: update IOMMU state when memory hotplug happens
>
> drivers/iommu/dmar.c | 412 +++++++++++++++++--------
> drivers/iommu/intel-iommu.c | 583 +++++++++++++++++++++--------------
> drivers/iommu/intel_irq_remapping.c | 108 ++++---
> include/linux/dmar.h | 75 +++--
> 4 files changed, 753 insertions(+), 425 deletions(-)


Hi, Jiang,

What is the relationship between your two patchsets and my iommu hotplug patchset?

https://lkml.org/lkml/2014/1/2/527

Or will yours just replace mine?

Thanks

Yinghai

2014-01-09 00:41:35

by Jiang Liu

[permalink] [raw]
Subject: Re: [Patch Part2 V1 00/14] Enhance DMAR drivers to handle PCI/memory hotplug events



On 2014/1/9 4:43, Yinghai Lu wrote:
> On Tue, Jan 7, 2014 at 1:00 AM, Jiang Liu <[email protected]> wrote:
>> Intel DMA/interrupt remapping drivers scan available PCI/memory devices
>> at startup and cache the discovered hardware topologies. They don't
>> update the cached information if a PCI/memory hotplug event happens at
>> runtime, so the stale information may break the DMA/interrupt remapping
>> logic.
>>
>> This patchset first (Patches 1-8) introduces some helper functions and
>> fixes several bugs, then (Patches 9-10) uses a global rwsem and RCU to
>> protect global DMA/interrupt remapping data structures, and finally
>> (Patches 11-14) hooks PCI/memory hotplug events to update the cached
>> information.
>>
>> It also prepares for supporting DMA/interrupt remapping hotplug.
>>
>> Jiang Liu (14):
>> iommu/vt-d: factor out dmar_alloc_dev_scope() for later reuse
>> iommu/vt-d: move private structures and variables into intel-iommu.c
>> iommu/vt-d: simplify function get_domain_for_dev()
>> iommu/vt-d: free resources if failed to create domain for PCIe
>> endpoint
>> iommu/vt-d: create device_domain_info structure for intermediate P2P
>> bridges
>> iommu/vt-d: fix incorrect iommu_count for si_domain
>> iommu/vt-d: fix error in detect ATS capability
>> iommu/vt-d: introduce macro for_each_dev_scope() to walk device scope
>> entries
>> iommu/vt-d: introduce a rwsem to protect global data structures
>> iommu/vt-d: use RCU to protect global resources in interrupt context
>> iommu/vt-d, PCI: update DRHD/RMRR/ATSR device scope caches when PCI
>> hotplug happens
>> iommu/vt-d, PCI: unify the way to process DMAR device scope array
>> iommu/vt-d: update device to static identity domain mapping for PCI
>> hotplug
>> iommu/vt-d: update IOMMU state when memory hotplug happens
>>
>> drivers/iommu/dmar.c | 412 +++++++++++++++++--------
>> drivers/iommu/intel-iommu.c | 583 +++++++++++++++++++++--------------
>> drivers/iommu/intel_irq_remapping.c | 108 ++++---
>> include/linux/dmar.h | 75 +++--
>> 4 files changed, 753 insertions(+), 425 deletions(-)
>
>
> Hi, Jiang,
>
> What is the relationship between your two patchsets and my iommu hotplug patchset?
>
> https://lkml.org/lkml/2014/1/2/527
>
> Or will yours just replace mine?
Hi Yinghai,
I have reviewed your v2 patch set, and I think we are doing the
same task. If you agree, I will try to combine our two versions.

Thanks!
Gerry
>
> Thanks
>
> Yinghai
>

2014-01-09 03:12:49

by Yijing Wang

[permalink] [raw]
Subject: Re: [Patch Part2 V1 07/14] iommu/vt-d: fix error in detect ATS capability

This is an issue; our BIOS also provides several ATSR entries that have
the same segment. Good fix :)
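The bug being fixed can be sketched with a small userspace model (illustrative structures only, not the kernel's): with several ATSR units on the same PCI segment, matching only the *first* unit for that segment misses root ports listed in later units.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an ATSR unit: a segment number, an optional
 * include-all flag, and the root-port ids in its device scope. */
struct model_atsr {
    int segment;
    int include_all;
    size_t ndevs;
    const int *devs;
};

/* Old behavior: stop at the first unit whose segment matches. */
static int match_first_unit(const struct model_atsr *u, size_t n,
                            int segment, int bridge)
{
    for (size_t i = 0; i < n; i++) {
        if (u[i].segment != segment)
            continue;
        for (size_t j = 0; j < u[i].ndevs; j++)
            if (u[i].devs[j] == bridge)
                return 1;
        return u[i].include_all;   /* never looks at later units */
    }
    return 0;
}

/* Fixed behavior: scan every unit with a matching segment. */
static int match_all_units(const struct model_atsr *u, size_t n,
                           int segment, int bridge)
{
    for (size_t i = 0; i < n; i++) {
        if (u[i].segment != segment)
            continue;
        for (size_t j = 0; j < u[i].ndevs; j++)
            if (u[i].devs[j] == bridge)
                return 1;
        if (u[i].include_all)
            return 1;
    }
    return 0;
}
```

With two units on segment 0 and the bridge listed only in the second, the old scheme reports no ATS capability while the fixed scheme finds it.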

On 2014/1/7 17:00, Jiang Liu wrote:
> The current Intel IOMMU driver only matches a PCIe root port with the
> first DRHD unit having the same segment number. It will report a false
> result if there are multiple DRHD units with the same segment number,
> and thus fail to detect the ATS capability for some PCIe devices.
>
> This patch refines function dmar_find_matched_atsr_unit() to search all
> DRHD units with the same segment number.
>
> Example DMAR table entries are shown below:
> [1D0h 0464 2] Subtable Type : 0002 <Root Port ATS Capability>
> [1D2h 0466 2] Length : 0028
> [1D4h 0468 1] Flags : 00
> [1D5h 0469 1] Reserved : 00
> [1D6h 0470 2] PCI Segment Number : 0000
>
> [1D8h 0472 1] Device Scope Entry Type : 02
> [1D9h 0473 1] Entry Length : 08
> [1DAh 0474 2] Reserved : 0000
> [1DCh 0476 1] Enumeration ID : 00
> [1DDh 0477 1] PCI Bus Number : 00
> [1DEh 0478 2] PCI Path : [02, 00]
>
> [1E0h 0480 1] Device Scope Entry Type : 02
> [1E1h 0481 1] Entry Length : 08
> [1E2h 0482 2] Reserved : 0000
> [1E4h 0484 1] Enumeration ID : 00
> [1E5h 0485 1] PCI Bus Number : 00
> [1E6h 0486 2] PCI Path : [03, 00]
>
> [1E8h 0488 1] Device Scope Entry Type : 02
> [1E9h 0489 1] Entry Length : 08
> [1EAh 0490 2] Reserved : 0000
> [1ECh 0492 1] Enumeration ID : 00
> [1EDh 0493 1] PCI Bus Number : 00
> [1EEh 0494 2] PCI Path : [03, 02]
>
> [1F0h 0496 1] Device Scope Entry Type : 02
> [1F1h 0497 1] Entry Length : 08
> [1F2h 0498 2] Reserved : 0000
> [1F4h 0500 1] Enumeration ID : 00
> [1F5h 0501 1] PCI Bus Number : 00
> [1F6h 0502 2] PCI Path : [03, 03]
>
> [1F8h 0504 2] Subtable Type : 0002 <Root Port ATS Capability>
> [1FAh 0506 2] Length : 0020
> [1FCh 0508 1] Flags : 00
> [1FDh 0509 1] Reserved : 00
> [1FEh 0510 2] PCI Segment Number : 0000
>
> [200h 0512 1] Device Scope Entry Type : 02
> [201h 0513 1] Entry Length : 08
> [202h 0514 2] Reserved : 0000
> [204h 0516 1] Enumeration ID : 00
> [205h 0517 1] PCI Bus Number : 40
> [206h 0518 2] PCI Path : [02, 00]
>
> [208h 0520 1] Device Scope Entry Type : 02
> [209h 0521 1] Entry Length : 08
> [20Ah 0522 2] Reserved : 0000
> [20Ch 0524 1] Enumeration ID : 00
> [20Dh 0525 1] PCI Bus Number : 40
> [20Eh 0526 2] PCI Path : [02, 02]
>
> [210h 0528 1] Device Scope Entry Type : 02
> [211h 0529 1] Entry Length : 08
> [212h 0530 2] Reserved : 0000
> [214h 0532 1] Enumeration ID : 00
> [215h 0533 1] PCI Bus Number : 40
> [216h 0534 2] PCI Path : [03, 00]
>
> [218h 0536 2] Subtable Type : 0002 <Root Port ATS Capability>
> [21Ah 0538 2] Length : 0020
> [21Ch 0540 1] Flags : 00
> [21Dh 0541 1] Reserved : 00
> [21Eh 0542 2] PCI Segment Number : 0000
>
> [220h 0544 1] Device Scope Entry Type : 02
> [221h 0545 1] Entry Length : 08
> [222h 0546 2] Reserved : 0000
> [224h 0548 1] Enumeration ID : 00
> [225h 0549 1] PCI Bus Number : 80
> [226h 0550 2] PCI Path : [02, 00]
>
> [228h 0552 1] Device Scope Entry Type : 02
> [229h 0553 1] Entry Length : 08
> [22Ah 0554 2] Reserved : 0000
> [22Ch 0556 1] Enumeration ID : 00
> [22Dh 0557 1] PCI Bus Number : 80
> [22Eh 0558 2] PCI Path : [02, 02]
>
> [230h 0560 1] Device Scope Entry Type : 02
> [231h 0561 1] Entry Length : 08
> [232h 0562 2] Reserved : 0000
> [234h 0564 1] Enumeration ID : 00
> [235h 0565 1] PCI Bus Number : 80
> [236h 0566 2] PCI Path : [03, 00]
>
> [238h 0568 2] Subtable Type : 0002 <Root Port ATS Capability>
> [23Ah 0570 2] Length : 0020
> [23Ch 0572 1] Flags : 00
> [23Dh 0573 1] Reserved : 00
> [23Eh 0574 2] PCI Segment Number : 0000
>
> [240h 0576 1] Device Scope Entry Type : 02
> [241h 0577 1] Entry Length : 08
> [242h 0578 2] Reserved : 0000
> [244h 0580 1] Enumeration ID : 00
> [245h 0581 1] PCI Bus Number : C0
> [246h 0582 2] PCI Path : [02, 00]
>
> [248h 0584 1] Device Scope Entry Type : 02
> [249h 0585 1] Entry Length : 08
> [24Ah 0586 2] Reserved : 0000
> [24Ch 0588 1] Enumeration ID : 00
> [24Dh 0589 1] PCI Bus Number : C0
> [24Eh 0590 2] PCI Path : [02, 02]
>
> [250h 0592 1] Device Scope Entry Type : 02
> [251h 0593 1] Entry Length : 08
> [252h 0594 2] Reserved : 0000
> [254h 0596 1] Enumeration ID : 00
> [255h 0597 1] PCI Bus Number : C0
> [256h 0598 2] PCI Path : [03, 00]
>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 37 +++++++++++++++++--------------------
> 1 file changed, 17 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 7038b38..b0028e2 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3556,37 +3556,34 @@ int dmar_find_matched_atsr_unit(struct pci_dev *dev)
> {
> int i;
> struct pci_bus *bus;
> + struct pci_dev *bridge = NULL;
> struct acpi_dmar_atsr *atsr;
> struct dmar_atsr_unit *atsru;
>
> dev = pci_physfn(dev);
> -
> - list_for_each_entry(atsru, &dmar_atsr_units, list) {
> - atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
> - if (atsr->segment == pci_domain_nr(dev->bus))
> - goto found;
> - }
> -
> - return 0;
> -
> -found:
> for (bus = dev->bus; bus; bus = bus->parent) {
> - struct pci_dev *bridge = bus->self;
> -
> + bridge = bus->self;
> if (!bridge || !pci_is_pcie(bridge) ||
> pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
> return 0;
> -
> - if (pci_pcie_type(bridge) == PCI_EXP_TYPE_ROOT_PORT) {
> - for (i = 0; i < atsru->devices_cnt; i++)
> - if (atsru->devices[i] == bridge)
> - return 1;
> + if (pci_pcie_type(bridge) == PCI_EXP_TYPE_ROOT_PORT)
> break;
> - }
> }
> + if (!bridge)
> + return 0;
>
> - if (atsru->include_all)
> - return 1;
> + list_for_each_entry_rcu(atsru, &dmar_atsr_units, list) {
> + atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
> + if (atsr->segment != pci_domain_nr(dev->bus))
> + continue;
> +
> + for (i = 0; i < atsru->devices_cnt; i++)
> + if (atsru->devices[i] == bridge)
> + return 1;
> +
> + if (atsru->include_all)
> + return 1;
> + }
>
> return 0;
> }
>


--
Thanks!
Yijing

2014-01-09 08:55:07

by Yijing Wang

[permalink] [raw]
Subject: Re: [Patch Part2 V1 11/14] iommu/vt-d, PCI: update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens

Hi Gerry,
This patch solution is better than mine; it's more reliable.
I tested this patch on my Huawei RH5885 platform, and after applying it
the issue is gone. But I cannot test the case where the bus number
changes after hotplug. Anyway, it's a good solution.

Thanks!
Yijing.

On 2014/1/7 17:00, Jiang Liu wrote:
> The current Intel DMAR/IOMMU driver assumes that all PCI devices
> associated with the DMAR/RMRR/ATSR device scope arrays are created at
> boot time and won't change at runtime, so it caches pointers to the
> associated PCI device objects. That assumption may now be wrong due to:
> 1) introduction of PCI host bridge hotplug
> 2) PCI device hotplug through sysfs interfaces.
>

> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 31c72d5..15c9ce0 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -194,6 +194,209 @@ void dmar_free_dev_scope(struct pci_dev __rcu ***devices, int *cnt)
> *cnt = 0;
> }
>
> +/* Optimize out kzalloc()/kfree() for normal cases */
> +static char dmar_pci_notify_info_buf[64];

In my view, PCI device add/remove is not a frequent action, so maybe
using kzalloc and kfree would make the code simpler; this is just my
personal opinion.

> +
> +static struct dmar_pci_notify_info *
> +dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned long event)
> +{
> + int level = 0;
> + size_t size;
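The trade-off under discussion can be modeled in a few lines of userspace C (a sketch only; the names and sizes are hypothetical, not the kernel's): serve the common, small notification from a preallocated static buffer and fall back to heap allocation only when the request is too large or the buffer is already in use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative model: one small static buffer covers the common case,
 * so the hot path avoids any heap allocation entirely. */
#define NOTIFY_BUF_SIZE 64

static char notify_buf[NOTIFY_BUF_SIZE];
static int notify_buf_busy;

static void *notify_info_alloc(size_t size)
{
    if (size <= sizeof(notify_buf) && !notify_buf_busy) {
        notify_buf_busy = 1;
        memset(notify_buf, 0, sizeof(notify_buf));
        return notify_buf;          /* fast path: no allocation */
    }
    return calloc(1, size);         /* rare slow path */
}

static void notify_info_free(void *p)
{
    if (p == (void *)notify_buf)
        notify_buf_busy = 0;
    else
        free(p);
}
```

The counter-argument in the comment above stands: if add/remove events are rare, a plain kzalloc()/kfree() pair is simpler and the fast path buys little.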

2014-01-09 20:30:18

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Patch Part2 V1 00/14] Enhance DMAR drivers to handle PCI/memory hotplug events

On Wed, Jan 8, 2014 at 4:41 PM, Jiang Liu <[email protected]> wrote:
>
> I have reviewed your v2 patch set, and I think we are doing the
> same task. If you agree, I will try to combine our two versions.

Sure. Please do.

Thanks

Yinghai