So far it has been assumed possible to map guest RAM 1:1 to the bus, which
works with a small number of devices. SR-IOV changes this, as the user can
configure hundreds of VFs; and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per PE to limit the waste of physical pages.
As of today, if the assumed direct mapping is not possible, DDW creation
is skipped and the default DMA window "ibm,dma-window" is used instead.
Using a DDW instead of the default DMA window may allow expanding the
amount of memory that can be DMA-mapped: the number of pages (TCEs)
may stay the same (or increase), and while the default DMA window offers
only 4K pages, a DDW may offer larger pages (4K, 64K, 16M ...).
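To illustrate why the larger DDW page sizes matter, here is a toy calculation (window_mappable_bytes() is just a sketch for this cover letter, not a kernel helper): with the same TCE budget, moving from 4K to 64K pages covers 16x more memory, and 16M pages cover 4096x more.

```c
#include <stdint.h>

/* Hypothetical helper: bytes of DMA space a window can cover, given
 * its number of TCEs and the IOMMU page shift. Each TCE maps exactly
 * one IOMMU page, so coverage scales linearly with page size. */
static inline uint64_t window_mappable_bytes(uint64_t num_tces,
					     unsigned int page_shift)
{
	return num_tces << page_shift;
}
```

For example, 1M TCEs cover 4GB with 4K pages (shift 12), 64GB with 64K pages (shift 16), and 16TB with 16M pages (shift 24).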
Patch #1 replaces hard-coded 4K page size with a variable containing the
correct page size for the window.
Patch #2 introduces iommu_table_in_use() and replaces the manual bitmap
checking where it's used. It will also be used to abort enable_ddw() if
there is any current iommu allocation while attempting indirect mapping
on a single window.
Patch #3 introduces iommu_pseries_alloc_table() that will be helpful
when indirect mapping needs to replace the iommu_table.
Patch #4 adds helpers for adding DDWs in the list.
Patch #5 refactors enable_ddw() so it returns whether direct mapping is
possible, instead of the DMA offset. This helps the next patches on
indirect DMA mapping and also allows DMA windows starting at 0x00.
Patch #6 brings a new helper to simplify enable_ddw(), allowing
some reorganization for introducing indirect mapping DDW.
Patch #7 adds a new helper, _iommu_table_setparms(), and uses it in the
other *setparms*() functions to fill the iommu_table. It will also be used
for creating a new iommu_table for indirect mapping.
Patch #8 updates remove_dma_window() to accept different property names,
so we can introduce a new property for indirect mapping.
Patch #9 extracts find_existing_ddw_windows() into
find_existing_ddw_windows_named(), and calls it by its property name.
This will be useful when the property for indirect mapping is created,
so we can search the device-tree for both properties.
Patch #10:
Instead of destroying the created DDW if it doesn't map the whole
partition, make use of it in place of the default DMA window, as this
improves performance. Also, update the iommu_table and re-generate the pools.
It introduces a new property name for DDW with indirect DMA mapping.
Patch #11:
Does some renaming of 'direct window' to 'dma window', given that the DDW
created can now also be used for indirect mapping when direct mapping is
not available.
All patches were tested on an LPAR with a virtio-net interface, which
allows the default DMA window and DDW to coexist.
Changes since v3:
- Fixed inverted free order at ddw_property_create()
- Updated goto tag naming
Changes since v2:
- Some patches got removed from the series and sent by themselves,
- New tbl created for DDW + indirect mapping reserves MMIO32 space,
- Improved reserved area algorithm,
- Improved commit messages,
- Removed define for default DMA window prop name,
- Avoided some unnecessary renaming,
- Removed some unnecessary empty lines,
- Changed some code moving to forward declarations.
v2
Link: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=201210&state=%2A&archive=both
Leonardo Bras (11):
powerpc/pseries/iommu: Replace hard-coded page shift
powerpc/kernel/iommu: Add new iommu_table_in_use() helper
powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
powerpc/pseries/iommu: Add ddw_list_new_entry() helper
powerpc/pseries/iommu: Allow DDW windows starting at 0x00
powerpc/pseries/iommu: Add ddw_property_create() and refactor
enable_ddw()
powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new
helper
powerpc/pseries/iommu: Update remove_dma_window() to accept property
name
powerpc/pseries/iommu: Find existing DDW with given property name
powerpc/pseries/iommu: Make use of DDW for indirect mapping
powerpc/pseries/iommu: Rename "direct window" to "dma window"
arch/powerpc/include/asm/iommu.h | 1 +
arch/powerpc/include/asm/tce.h | 8 -
arch/powerpc/kernel/iommu.c | 65 ++--
arch/powerpc/platforms/pseries/iommu.c | 504 +++++++++++++++----------
4 files changed, 338 insertions(+), 240 deletions(-)
--
2.30.2
Creates a helper to allow allocating a new iommu_table without the need
to reallocate the iommu_group.
This will be helpful for replacing the iommu_table for the new DMA window,
after we remove the old one with iommu_tce_table_put().
Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 796ab356341c..d02359ca1f9f 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,28 +53,31 @@ enum {
DDW_EXT_QUERY_OUT_SIZE = 2
};
-static struct iommu_table_group *iommu_pseries_alloc_group(int node)
+static struct iommu_table *iommu_pseries_alloc_table(int node)
{
- struct iommu_table_group *table_group;
struct iommu_table *tbl;
- table_group = kzalloc_node(sizeof(struct iommu_table_group), GFP_KERNEL,
- node);
- if (!table_group)
- return NULL;
-
tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
if (!tbl)
- goto free_group;
+ return NULL;
INIT_LIST_HEAD_RCU(&tbl->it_group_list);
kref_init(&tbl->it_kref);
+ return tbl;
+}
- table_group->tables[0] = tbl;
+static struct iommu_table_group *iommu_pseries_alloc_group(int node)
+{
+ struct iommu_table_group *table_group;
+
+ table_group = kzalloc_node(sizeof(*table_group), GFP_KERNEL, node);
+ if (!table_group)
+ return NULL;
- return table_group;
+ table_group->tables[0] = iommu_pseries_alloc_table(node);
+ if (table_group->tables[0])
+ return table_group;
-free_group:
kfree(table_group);
return NULL;
}
--
2.30.2
Having a function to check whether the iommu table has any allocation
helps decide if a tbl can be reset for using a new DMA window.
It should be enough to replace all instances of !bitmap_empty(tbl...).
iommu_table_in_use() skips reserved memory, so we don't need to worry about
releasing it before testing. This causes iommu_table_release_pages() to
become unnecessary, given it is only used to remove reserved memory for
testing.
Also, only allow storing reserved memory values in tbl if they are valid
in the table, so there is no need to check it in the new helper.
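To make the "skips reserved memory" point concrete, here is a standalone toy model of the check (toy_tbl/toy_table_in_use are illustrations for this commit message, not the kernel implementation, which uses find_next_bit() on the real it_map):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the table bitmap: one byte per TCE entry. */
struct toy_tbl {
	unsigned char map[32];     /* stands in for it_map */
	size_t size;               /* it_size, in entries */
	size_t res_start, res_end; /* reserved range, table-relative */
};

/* Sketch of the iommu_table_in_use() idea: any entry set outside the
 * reserved range means the table is in use, so the reserved entries
 * never have to be cleared (released) before testing. */
static bool toy_table_in_use(const struct toy_tbl *t)
{
	for (size_t i = 0; i < t->size; i++) {
		if (i >= t->res_start && i < t->res_end)
			continue; /* skip reserved entries */
		if (t->map[i])
			return true;
	}
	return false;
}
```

With this shape, a bit set inside [res_start, res_end) does not count as "in use", which is exactly why iommu_table_release_pages() becomes unnecessary.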
Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/include/asm/iommu.h | 1 +
arch/powerpc/kernel/iommu.c | 65 ++++++++++++++++----------------
2 files changed, 34 insertions(+), 32 deletions(-)
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index deef7c94d7b6..bf3b84128525 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -154,6 +154,7 @@ extern int iommu_tce_table_put(struct iommu_table *tbl);
*/
extern struct iommu_table *iommu_init_table(struct iommu_table *tbl,
int nid, unsigned long res_start, unsigned long res_end);
+bool iommu_table_in_use(struct iommu_table *tbl);
#define IOMMU_TABLE_GROUP_MAX_TABLES 2
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ad82dda81640..5e168bd91401 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -691,32 +691,24 @@ static void iommu_table_reserve_pages(struct iommu_table *tbl,
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);
- tbl->it_reserved_start = res_start;
- tbl->it_reserved_end = res_end;
-
- /* Check if res_start..res_end isn't empty and overlaps the table */
- if (res_start && res_end &&
- (tbl->it_offset + tbl->it_size < res_start ||
- res_end < tbl->it_offset))
- return;
+ if (res_start < tbl->it_offset)
+ res_start = tbl->it_offset;
- for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
- set_bit(i - tbl->it_offset, tbl->it_map);
-}
+ if (res_end > (tbl->it_offset + tbl->it_size))
+ res_end = tbl->it_offset + tbl->it_size;
-static void iommu_table_release_pages(struct iommu_table *tbl)
-{
- int i;
+ /* Check if res_start..res_end is a valid range in the table */
+ if (res_start >= res_end) {
+ tbl->it_reserved_start = tbl->it_offset;
+ tbl->it_reserved_end = tbl->it_offset;
+ return;
+ }
- /*
- * In case we have reserved the first bit, we should not emit
- * the warning below.
- */
- if (tbl->it_offset == 0)
- clear_bit(0, tbl->it_map);
+ tbl->it_reserved_start = res_start;
+ tbl->it_reserved_end = res_end;
for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
- clear_bit(i - tbl->it_offset, tbl->it_map);
+ set_bit(i - tbl->it_offset, tbl->it_map);
}
/*
@@ -781,6 +773,22 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
return tbl;
}
+bool iommu_table_in_use(struct iommu_table *tbl)
+{
+ unsigned long start = 0, end;
+
+ /* ignore reserved bit0 */
+ if (tbl->it_offset == 0)
+ start = 1;
+ end = tbl->it_reserved_start - tbl->it_offset;
+ if (find_next_bit(tbl->it_map, end, start) != end)
+ return true;
+
+ start = tbl->it_reserved_end - tbl->it_offset;
+ end = tbl->it_size;
+ return find_next_bit(tbl->it_map, end, start) != end;
+}
+
static void iommu_table_free(struct kref *kref)
{
unsigned long bitmap_sz;
@@ -799,10 +807,8 @@ static void iommu_table_free(struct kref *kref)
iommu_debugfs_del(tbl);
- iommu_table_release_pages(tbl);
-
/* verify that table contains no entries */
- if (!bitmap_empty(tbl->it_map, tbl->it_size))
+ if (iommu_table_in_use(tbl))
pr_warn("%s: Unexpected TCEs\n", __func__);
/* calculate bitmap size in bytes */
@@ -1108,18 +1114,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
for (i = 0; i < tbl->nr_pools; i++)
spin_lock(&tbl->pools[i].lock);
- iommu_table_release_pages(tbl);
-
- if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
+ if (iommu_table_in_use(tbl)) {
pr_err("iommu_tce: it_map is not empty");
ret = -EBUSY;
- /* Undo iommu_table_release_pages, i.e. restore bit#0, etc */
- iommu_table_reserve_pages(tbl, tbl->it_reserved_start,
- tbl->it_reserved_end);
- } else {
- memset(tbl->it_map, 0xff, sz);
}
+ memset(tbl->it_map, 0xff, sz);
+
for (i = 0; i < tbl->nr_pools; i++)
spin_unlock(&tbl->pools[i].lock);
spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
--
2.30.2
Code that creates a ddw property, previously scattered in
enable_ddw(), is now gathered in ddw_property_create(), which handles
allocating and filling the property, leaving it ready for
of_add_property(), which now occurs in sequence.
This created an opportunity to reorganize the second part of enable_ddw():
Without this patch enable_ddw() does, in order:
kzalloc() property & members, create_ddw(), fill ddwprop inside property,
ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
of_add_property(), and list_add().
With this patch enable_ddw() does, in order:
create_ddw(), ddw_property_create(), of_add_property(),
ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
and list_add().
This change requires of_remove_property() in case anything fails after
of_add_property(), but tce_setrange_multi_pSeriesLP_walk over all
memory, which looks like the most expensive operation, is now done only
if everything else succeeds.
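The new ordering and its unwind path can be sketched as plain C (the step functions are stubs standing in for the calls named above; this is an illustration of the goto-cleanup structure, not the kernel code):

```c
static int steps_done;

/* Each stub "fails" when its index matches fail_at, so the
 * unwind behavior can be exercised deterministically. */
static int step(int idx, int fail_at)
{
	if (idx == fail_at)
		return 0;
	steps_done++;
	return 1;
}

/* Post-patch order: property work first, the expensive TCE walk
 * last, with each label undoing exactly what already succeeded. */
static int toy_enable_ddw(int fail_at)
{
	steps_done = 0;
	if (!step(1, fail_at))  /* create_ddw() */
		goto out_failed;
	if (!step(2, fail_at))  /* ddw_property_create() */
		goto out_remove_win;
	if (!step(3, fail_at))  /* of_add_property() */
		goto out_free_prop;
	if (!step(4, fail_at))  /* ddw_list_new_entry() */
		goto out_del_prop;
	if (!step(5, fail_at))  /* tce walk: most expensive, done last */
		goto out_del_list;
	return 0;

out_del_list:   /* kfree(window) */
out_del_prop:   /* of_remove_property() */
out_free_prop:  /* kfree(name/value/prop) */
out_remove_win: /* remove_ddw() */
out_failed:
	return -1;
}
```

The key property is that a failure at step N only ever unwinds steps 1..N-1, and the walk over all memory never runs unless steps 1-4 succeeded.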
Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 93 ++++++++++++++++----------
1 file changed, 57 insertions(+), 36 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 955cf095416c..5a70ecd579b8 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1122,6 +1122,35 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
ret);
}
+static struct property *ddw_property_create(const char *propname, u32 liobn, u64 dma_addr,
+ u32 page_shift, u32 window_shift)
+{
+ struct dynamic_dma_window_prop *ddwprop;
+ struct property *win64;
+
+ win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
+ if (!win64)
+ return NULL;
+
+ win64->name = kstrdup(propname, GFP_KERNEL);
+ ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
+ win64->value = ddwprop;
+ win64->length = sizeof(*ddwprop);
+ if (!win64->name || !win64->value) {
+ kfree(win64->name);
+ kfree(win64->value);
+ kfree(win64);
+ return NULL;
+ }
+
+ ddwprop->liobn = cpu_to_be32(liobn);
+ ddwprop->dma_base = cpu_to_be64(dma_addr);
+ ddwprop->tce_shift = cpu_to_be32(page_shift);
+ ddwprop->window_shift = cpu_to_be32(window_shift);
+
+ return win64;
+}
+
/* Return largest page shift based on "IO Page Sizes" output of ibm,query-pe-dma-window. */
static int iommu_get_page_shift(u32 query_page_size)
{
@@ -1167,11 +1196,11 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
struct ddw_query_response query;
struct ddw_create_response create;
int page_shift;
+ u64 win_addr;
struct device_node *dn;
u32 ddw_avail[DDW_APPLICABLE_SIZE];
struct direct_window *window;
struct property *win64 = NULL;
- struct dynamic_dma_window_prop *ddwprop;
struct failed_ddw_pdn *fpdn;
bool default_win_removed = false;
bool pmem_present;
@@ -1286,65 +1315,54 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
1ULL << page_shift);
goto out_failed;
}
- win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
- if (!win64) {
- dev_info(&dev->dev,
- "couldn't allocate property for 64bit dma window\n");
- goto out_failed;
- }
- win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
- win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
- win64->length = sizeof(*ddwprop);
- if (!win64->name || !win64->value) {
- dev_info(&dev->dev,
- "couldn't allocate property name and value\n");
- goto out_free_prop;
- }
ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
if (ret != 0)
- goto out_free_prop;
-
- ddwprop->liobn = cpu_to_be32(create.liobn);
- ddwprop->dma_base = cpu_to_be64(((u64)create.addr_hi << 32) |
- create.addr_lo);
- ddwprop->tce_shift = cpu_to_be32(page_shift);
- ddwprop->window_shift = cpu_to_be32(len);
+ goto out_failed;
dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
create.liobn, dn);
- window = ddw_list_new_entry(pdn, ddwprop);
+ win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
+ win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
+ page_shift, len);
+ if (!win64) {
+ dev_info(&dev->dev,
+ "couldn't allocate property, property name, or value\n");
+ goto out_remove_win;
+ }
+
+ ret = of_add_property(pdn, win64);
+ if (ret) {
+ dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
+ pdn, ret);
+ goto out_free_prop;
+ }
+
+ window = ddw_list_new_entry(pdn, win64->value);
if (!window)
- goto out_clear_window;
+ goto out_del_prop;
ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
win64->value, tce_setrange_multi_pSeriesLP_walk);
if (ret) {
dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
dn, ret);
- goto out_free_window;
- }
-
- ret = of_add_property(pdn, win64);
- if (ret) {
- dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
- pdn, ret);
- goto out_free_window;
+ goto out_del_list;
}
spin_lock(&direct_window_list_lock);
list_add(&window->list, &direct_window_list);
spin_unlock(&direct_window_list_lock);
- dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
+ dev->dev.archdata.dma_offset = win_addr;
goto out_unlock;
-out_free_window:
+out_del_list:
kfree(window);
-out_clear_window:
- remove_ddw(pdn, true);
+out_del_prop:
+ of_remove_property(pdn, win64);
out_free_prop:
kfree(win64->name);
@@ -1352,6 +1370,9 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
kfree(win64);
win64 = NULL;
+out_remove_win:
+ remove_ddw(pdn, true);
+
out_failed:
if (default_win_removed)
reset_dma_window(dev, pdn);
--
2.30.2
enable_ddw() currently returns the address of the DMA window, which is
considered invalid if it has the value 0x00.
It also only considers an address returned from find_existing_ddw()
valid if it is not 0x00.
Changing this behavior makes sense, given the users of enable_ddw() only
need to know if direct mapping is possible. It can also allow a DMA window
starting at 0x00 to be used.
This will be helpful for using a DDW with indirect mapping, as the window
address will be different from 0x00, but it will not map the whole
partition.
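The problem with the sentinel return value can be shown with a toy model (these two functions are illustrations of the API change, not the kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Before: returning the address itself forces 0x00 to mean "not
 * found", so a window legitimately starting at 0x00 can never be
 * distinguished from a missing window. */
static uint64_t find_addr_sentinel(bool present, uint64_t addr)
{
	return present ? addr : 0;
}

/* After: a bool result plus an out-parameter keeps "found" and
 * "address" separate, so addr == 0 becomes a valid window start. */
static bool find_addr(bool present, uint64_t addr, uint64_t *out)
{
	if (!present)
		return false;
	*out = addr;
	return true;
}
```

This is the same shape as the find_existing_ddw() change below: the caller gets a boolean, and the address lands directly in dev->dev.archdata.dma_offset.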
Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 35 ++++++++++++--------------
1 file changed, 16 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 6f14894d2d04..955cf095416c 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -849,25 +849,26 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
np, ret);
}
-static u64 find_existing_ddw(struct device_node *pdn, int *window_shift)
+static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
{
struct direct_window *window;
const struct dynamic_dma_window_prop *direct64;
- u64 dma_addr = 0;
+ bool found = false;
spin_lock(&direct_window_list_lock);
/* check if we already created a window and dupe that config if so */
list_for_each_entry(window, &direct_window_list, list) {
if (window->device == pdn) {
direct64 = window->prop;
- dma_addr = be64_to_cpu(direct64->dma_base);
+ *dma_addr = be64_to_cpu(direct64->dma_base);
*window_shift = be32_to_cpu(direct64->window_shift);
+ found = true;
break;
}
}
spin_unlock(&direct_window_list_lock);
- return dma_addr;
+ return found;
}
static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
@@ -1157,20 +1158,19 @@ static int iommu_get_page_shift(u32 query_page_size)
* pdn: the parent pe node with the ibm,dma_window property
* Future: also check if we can remap the base window for our base page size
*
- * returns the dma offset for use by the direct mapped DMA code.
+ * returns true if it can map all pages (direct mapping), false otherwise.
*/
-static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
+static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
{
int len = 0, ret;
int max_ram_len = order_base_2(ddw_memory_hotplug_max());
struct ddw_query_response query;
struct ddw_create_response create;
int page_shift;
- u64 dma_addr;
struct device_node *dn;
u32 ddw_avail[DDW_APPLICABLE_SIZE];
struct direct_window *window;
- struct property *win64;
+ struct property *win64 = NULL;
struct dynamic_dma_window_prop *ddwprop;
struct failed_ddw_pdn *fpdn;
bool default_win_removed = false;
@@ -1182,8 +1182,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
mutex_lock(&direct_window_init_mutex);
- dma_addr = find_existing_ddw(pdn, &len);
- if (dma_addr != 0)
+ if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len))
goto out_unlock;
/*
@@ -1338,7 +1337,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
list_add(&window->list, &direct_window_list);
spin_unlock(&direct_window_list_lock);
- dma_addr = be64_to_cpu(ddwprop->dma_base);
+ dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
goto out_unlock;
out_free_window:
@@ -1351,6 +1350,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
kfree(win64->name);
kfree(win64->value);
kfree(win64);
+ win64 = NULL;
out_failed:
if (default_win_removed)
@@ -1370,10 +1370,10 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
* as RAM, then we failed to create a window to cover persistent
* memory and need to set the DMA limit.
*/
- if (pmem_present && dma_addr && (len == max_ram_len))
- dev->dev.bus_dma_limit = dma_addr + (1ULL << len);
+ if (pmem_present && win64 && (len == max_ram_len))
+ dev->dev.bus_dma_limit = dev->dev.archdata.dma_offset + (1ULL << len);
- return dma_addr;
+ return win64;
}
static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
@@ -1452,11 +1452,8 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
break;
}
- if (pdn && PCI_DN(pdn)) {
- pdev->dev.archdata.dma_offset = enable_ddw(pdev, pdn);
- if (pdev->dev.archdata.dma_offset)
- return true;
- }
+ if (pdn && PCI_DN(pdn))
+ return enable_ddw(pdev, pdn);
return false;
}
--
2.30.2
Add a new helper _iommu_table_setparms(), and use it in
iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
code.
Also, setting tbl->it_ops was happening outside iommu_table_setparms*(),
so move it into the new helper. Since the iommu_table_ops need to be
declared before they are used, move iommu_table_lpar_multi_ops and
iommu_table_pseries_ops to before their respective iommu_table_setparms*().
Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 100 ++++++++++++-------------
1 file changed, 50 insertions(+), 50 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 5a70ecd579b8..89cb6e9e9f31 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,6 +53,11 @@ enum {
DDW_EXT_QUERY_OUT_SIZE = 2
};
+#ifdef CONFIG_IOMMU_API
+static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned long *tce,
+ enum dma_data_direction *direction, bool realmode);
+#endif
+
static struct iommu_table *iommu_pseries_alloc_table(int node)
{
struct iommu_table *tbl;
@@ -501,6 +506,28 @@ static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
}
+static inline void _iommu_table_setparms(struct iommu_table *tbl, unsigned long busno,
+ unsigned long liobn, unsigned long win_addr,
+ unsigned long window_size, unsigned long page_shift,
+ unsigned long base, struct iommu_table_ops *table_ops)
+{
+ tbl->it_busno = busno;
+ tbl->it_index = liobn;
+ tbl->it_offset = win_addr >> page_shift;
+ tbl->it_size = window_size >> page_shift;
+ tbl->it_page_shift = page_shift;
+ tbl->it_base = base;
+ tbl->it_blocksize = 16;
+ tbl->it_type = TCE_PCI;
+ tbl->it_ops = table_ops;
+}
+
+struct iommu_table_ops iommu_table_pseries_ops = {
+ .set = tce_build_pSeries,
+ .clear = tce_free_pSeries,
+ .get = tce_get_pseries
+};
+
static void iommu_table_setparms(struct pci_controller *phb,
struct device_node *dn,
struct iommu_table *tbl)
@@ -509,8 +536,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
const unsigned long *basep;
const u32 *sizep;
- node = phb->dn;
+ /* Test if we are going over 2GB of DMA space */
+ if (phb->dma_window_base_cur + phb->dma_window_size > SZ_2G) {
+ udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
+ panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
+ }
+ node = phb->dn;
basep = of_get_property(node, "linux,tce-base", NULL);
sizep = of_get_property(node, "linux,tce-size", NULL);
if (basep == NULL || sizep == NULL) {
@@ -519,33 +551,25 @@ static void iommu_table_setparms(struct pci_controller *phb,
return;
}
- tbl->it_base = (unsigned long)__va(*basep);
+ _iommu_table_setparms(tbl, phb->bus->number, 0, phb->dma_window_base_cur,
+ phb->dma_window_size, IOMMU_PAGE_SHIFT_4K,
+ (unsigned long)__va(*basep), &iommu_table_pseries_ops);
if (!is_kdump_kernel())
memset((void *)tbl->it_base, 0, *sizep);
- tbl->it_busno = phb->bus->number;
- tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
-
- /* Units of tce entries */
- tbl->it_offset = phb->dma_window_base_cur >> tbl->it_page_shift;
-
- /* Test if we are going over 2GB of DMA space */
- if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
- udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
- panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
- }
-
phb->dma_window_base_cur += phb->dma_window_size;
-
- /* Set the tce table size - measured in entries */
- tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
-
- tbl->it_index = 0;
- tbl->it_blocksize = 16;
- tbl->it_type = TCE_PCI;
}
+struct iommu_table_ops iommu_table_lpar_multi_ops = {
+ .set = tce_buildmulti_pSeriesLP,
+#ifdef CONFIG_IOMMU_API
+ .xchg_no_kill = tce_exchange_pseries,
+#endif
+ .clear = tce_freemulti_pSeriesLP,
+ .get = tce_get_pSeriesLP
+};
+
/*
* iommu_table_setparms_lpar
*
@@ -557,28 +581,17 @@ static void iommu_table_setparms_lpar(struct pci_controller *phb,
struct iommu_table_group *table_group,
const __be32 *dma_window)
{
- unsigned long offset, size;
+ unsigned long offset, size, liobn;
- of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
+ of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
- tbl->it_busno = phb->bus->number;
- tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
- tbl->it_base = 0;
- tbl->it_blocksize = 16;
- tbl->it_type = TCE_PCI;
- tbl->it_offset = offset >> tbl->it_page_shift;
- tbl->it_size = size >> tbl->it_page_shift;
+ _iommu_table_setparms(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, 0,
+ &iommu_table_lpar_multi_ops);
table_group->tce32_start = offset;
table_group->tce32_size = size;
}
-struct iommu_table_ops iommu_table_pseries_ops = {
- .set = tce_build_pSeries,
- .clear = tce_free_pSeries,
- .get = tce_get_pseries
-};
-
static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
{
struct device_node *dn;
@@ -647,7 +660,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
tbl = pci->table_group->tables[0];
iommu_table_setparms(pci->phb, dn, tbl);
- tbl->it_ops = &iommu_table_pseries_ops;
iommu_init_table(tbl, pci->phb->node, 0, 0);
/* Divide the rest (1.75GB) among the children */
@@ -664,7 +676,7 @@ static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
bool realmode)
{
long rc;
- unsigned long ioba = (unsigned long) index << tbl->it_page_shift;
+ unsigned long ioba = (unsigned long)index << tbl->it_page_shift;
unsigned long flags, oldtce = 0;
u64 proto_tce = iommu_direction_to_tce_perm(*direction);
unsigned long newtce = *tce | proto_tce;
@@ -686,15 +698,6 @@ static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
}
#endif
-struct iommu_table_ops iommu_table_lpar_multi_ops = {
- .set = tce_buildmulti_pSeriesLP,
-#ifdef CONFIG_IOMMU_API
- .xchg_no_kill = tce_exchange_pseries,
-#endif
- .clear = tce_freemulti_pSeriesLP,
- .get = tce_get_pSeriesLP
-};
-
static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
{
struct iommu_table *tbl;
@@ -729,7 +732,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
tbl = ppci->table_group->tables[0];
iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
ppci->table_group, dma_window);
- tbl->it_ops = &iommu_table_lpar_multi_ops;
iommu_init_table(tbl, ppci->phb->node, 0, 0);
iommu_register_group(ppci->table_group,
pci_domain_nr(bus), 0);
@@ -758,7 +760,6 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
PCI_DN(dn)->table_group = iommu_pseries_alloc_group(phb->node);
tbl = PCI_DN(dn)->table_group->tables[0];
iommu_table_setparms(phb, dn, tbl);
- tbl->it_ops = &iommu_table_pseries_ops;
iommu_init_table(tbl, phb->node, 0, 0);
set_iommu_table_base(&dev->dev, tbl);
return;
@@ -1436,7 +1437,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
tbl = pci->table_group->tables[0];
iommu_table_setparms_lpar(pci->phb, pdn, tbl,
pci->table_group, dma_window);
- tbl->it_ops = &iommu_table_lpar_multi_ops;
iommu_init_table(tbl, pci->phb->node, 0, 0);
iommu_register_group(pci->table_group,
pci_domain_nr(pci->phb->bus), 0);
--
2.30.2
So far it has been assumed possible to map guest RAM 1:1 to the bus, which
works with a small number of devices. SR-IOV changes this, as the user can
configure hundreds of VFs; and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per PE to limit the waste of physical pages.
As of today, if the assumed direct mapping is not possible, DDW creation
is skipped and the default DMA window "ibm,dma-window" is used instead.
By using DDW, indirect mapping can get more TCEs than are available for
the default DMA window, and also gains access to much larger page sizes
(16MB as implemented in qemu vs. 4K from the default DMA window), causing
a significant increase in the maximum amount of memory that can be IOMMU
mapped at the same time.
Indirect mapping will only be used if direct mapping is not
possible.
For indirect mapping, it's necessary to re-create the iommu_table with
the new DMA window parameters, so iommu_alloc() can use it.
Removing the default DMA window in order to use a DDW with indirect
mapping is only allowed if there is no IOMMU memory currently allocated
in the iommu_table. enable_ddw() is aborted otherwise.
Even though there won't be both direct and indirect mappings at the
same time, we can't reuse the DIRECT64_PROPNAME property name, or else
an older kexec()ed kernel can assume direct mapping, and skip
iommu_alloc(), causing undesirable behavior.
So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
was created to represent a DDW that does not allow direct mapping.
Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 87 +++++++++++++++++++++-----
1 file changed, 72 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index de54ddd9decd..572879af0211 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,6 +53,7 @@ enum {
DDW_EXT_QUERY_OUT_SIZE = 2
};
+static phys_addr_t ddw_memory_hotplug_max(void);
#ifdef CONFIG_IOMMU_API
static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned long *tce,
enum dma_data_direction *direction, bool realmode);
@@ -380,6 +381,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
/* protects initializing window twice for same device */
static DEFINE_MUTEX(direct_window_init_mutex);
#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
unsigned long num_pfn, const void *arg)
@@ -918,6 +920,7 @@ static int find_existing_ddw_windows(void)
return 0;
find_existing_ddw_windows_named(DIRECT64_PROPNAME);
+ find_existing_ddw_windows_named(DMA64_PROPNAME);
return 0;
}
@@ -1207,10 +1210,13 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
struct device_node *dn;
u32 ddw_avail[DDW_APPLICABLE_SIZE];
struct direct_window *window;
+ const char *win_name;
struct property *win64 = NULL;
struct failed_ddw_pdn *fpdn;
- bool default_win_removed = false;
+ bool default_win_removed = false, direct_mapping = false;
bool pmem_present;
+ struct pci_dn *pci = PCI_DN(pdn);
+ struct iommu_table *tbl = pci->table_group->tables[0];
dn = of_find_node_by_type(NULL, "ibm,pmemory");
pmem_present = dn != NULL;
@@ -1218,8 +1224,12 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
mutex_lock(&direct_window_init_mutex);
- if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len))
- goto out_unlock;
+ if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
+ direct_mapping = (len >= max_ram_len);
+
+ mutex_unlock(&direct_window_init_mutex);
+ return direct_mapping;
+ }
/*
* If we already went through this for a previous function of
@@ -1298,7 +1308,6 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
goto out_failed;
}
/* verify the window * number of ptes will map the partition */
- /* check largest block * page size > max memory hotplug addr */
/*
* The "ibm,pmemory" can appear anywhere in the address space.
* Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
@@ -1320,6 +1329,17 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
1ULL << len,
query.largest_available_block,
1ULL << page_shift);
+
+ len = order_base_2(query.largest_available_block << page_shift);
+ win_name = DMA64_PROPNAME;
+ } else {
+ direct_mapping = true;
+ win_name = DIRECT64_PROPNAME;
+ }
+
+ /* DDW + IOMMU on single window may fail if there is any allocation */
+ if (default_win_removed && !direct_mapping && iommu_table_in_use(tbl)) {
+ dev_dbg(&dev->dev, "current IOMMU table in use, can't be replaced.\n");
goto out_failed;
}
@@ -1331,8 +1351,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
create.liobn, dn);
win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
- win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
- page_shift, len);
+ win64 = ddw_property_create(win_name, create.liobn, win_addr, page_shift, len);
if (!win64) {
dev_info(&dev->dev,
"couldn't allocate property, property name, or value\n");
@@ -1350,12 +1369,47 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
if (!window)
goto out_del_prop;
- ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
- win64->value, tce_setrange_multi_pSeriesLP_walk);
- if (ret) {
- dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
- dn, ret);
- goto out_del_list;
+ if (direct_mapping) {
+ /* DDW maps the whole partition, so enable direct DMA mapping */
+ ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
+ win64->value, tce_setrange_multi_pSeriesLP_walk);
+ if (ret) {
+ dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
+ dn, ret);
+ goto out_del_list;
+ }
+ } else {
+ struct iommu_table *newtbl;
+ int i;
+
+ /* New table for using DDW instead of the default DMA window */
+ newtbl = iommu_pseries_alloc_table(pci->phb->node);
+ if (!newtbl) {
+ dev_dbg(&dev->dev, "couldn't create new IOMMU table\n");
+ goto out_del_list;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
+ const unsigned long mask = IORESOURCE_MEM_64 | IORESOURCE_MEM;
+
+ /* Look for MMIO32 */
+ if ((pci->phb->mem_resources[i].flags & mask) == IORESOURCE_MEM)
+ break;
+ }
+
+ _iommu_table_setparms(newtbl, pci->phb->bus->number, create.liobn, win_addr,
+ 1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
+ iommu_init_table(newtbl, pci->phb->node, pci->phb->mem_resources[i].start,
+ pci->phb->mem_resources[i].end);
+
+ if (default_win_removed)
+ iommu_tce_table_put(tbl);
+ else
+ pci->table_group->tables[1] = tbl;
+
+ pci->table_group->tables[0] = newtbl;
+
+ set_iommu_table_base(&dev->dev, newtbl);
}
spin_lock(&direct_window_list_lock);
@@ -1398,10 +1452,10 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
* as RAM, then we failed to create a window to cover persistent
* memory and need to set the DMA limit.
*/
- if (pmem_present && win64 && (len == max_ram_len))
+ if (pmem_present && direct_mapping && len == max_ram_len)
dev->dev.bus_dma_limit = dev->dev.archdata.dma_offset + (1ULL << len);
- return win64;
+ return win64 && direct_mapping;
}
static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
@@ -1542,7 +1596,10 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
* we have to remove the property when releasing
* the device node.
*/
- remove_ddw(np, false, DIRECT64_PROPNAME);
+
+ if (remove_ddw(np, false, DIRECT64_PROPNAME))
+ remove_ddw(np, false, DMA64_PROPNAME);
+
if (pci && pci->table_group)
iommu_pseries_free_group(pci->table_group,
np->full_name);
--
2.30.2
A previous change introduced the use of DDW as a larger indirect DMA
mapping when the available DDW size is not enough to map the whole
partition.
As most of the code that manipulates direct mappings was reused for
indirect mappings, rename the identifiers and debug/info messages to
reflect that they can be used for both kinds of mapping.
This should cause no behavioural change, just adjust naming.
Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 93 +++++++++++++-------------
1 file changed, 48 insertions(+), 45 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 572879af0211..ce7b841fb10f 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -355,7 +355,7 @@ struct dynamic_dma_window_prop {
__be32 window_shift; /* ilog2(tce_window_size) */
};
-struct direct_window {
+struct dma_win {
struct device_node *device;
const struct dynamic_dma_window_prop *prop;
struct list_head list;
@@ -375,11 +375,11 @@ struct ddw_create_response {
u32 addr_lo;
};
-static LIST_HEAD(direct_window_list);
+static LIST_HEAD(dma_win_list);
/* prevents races between memory on/offline and window creation */
-static DEFINE_SPINLOCK(direct_window_list_lock);
+static DEFINE_SPINLOCK(dma_win_list_lock);
/* protects initializing window twice for same device */
-static DEFINE_MUTEX(direct_window_init_mutex);
+static DEFINE_MUTEX(dma_win_init_mutex);
#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
@@ -712,7 +712,10 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n",
dn);
- /* Find nearest ibm,dma-window, walking up the device tree */
+ /*
+ * Find nearest ibm,dma-window (default DMA window), walking up the
+ * device tree
+ */
for (pdn = dn; pdn != NULL; pdn = pdn->parent) {
dma_window = of_get_property(pdn, "ibm,dma-window", NULL);
if (dma_window != NULL)
@@ -816,11 +819,11 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
ret = rtas_call(ddw_avail[DDW_REMOVE_PE_DMA_WIN], 1, 1, NULL, liobn);
if (ret)
- pr_warn("%pOF: failed to remove direct window: rtas returned "
+ pr_warn("%pOF: failed to remove DMA window: rtas returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
else
- pr_debug("%pOF: successfully removed direct window: rtas returned "
+ pr_debug("%pOF: successfully removed DMA window: rtas returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
}
@@ -848,37 +851,37 @@ static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_
ret = of_remove_property(np, win);
if (ret)
- pr_warn("%pOF: failed to remove direct window property: %d\n",
+ pr_warn("%pOF: failed to remove DMA window property: %d\n",
np, ret);
return 0;
}
static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
{
- struct direct_window *window;
- const struct dynamic_dma_window_prop *direct64;
+ struct dma_win *window;
+ const struct dynamic_dma_window_prop *dma64;
bool found = false;
- spin_lock(&direct_window_list_lock);
+ spin_lock(&dma_win_list_lock);
/* check if we already created a window and dupe that config if so */
- list_for_each_entry(window, &direct_window_list, list) {
+ list_for_each_entry(window, &dma_win_list, list) {
if (window->device == pdn) {
- direct64 = window->prop;
- *dma_addr = be64_to_cpu(direct64->dma_base);
- *window_shift = be32_to_cpu(direct64->window_shift);
+ dma64 = window->prop;
+ *dma_addr = be64_to_cpu(dma64->dma_base);
+ *window_shift = be32_to_cpu(dma64->window_shift);
found = true;
break;
}
}
- spin_unlock(&direct_window_list_lock);
+ spin_unlock(&dma_win_list_lock);
return found;
}
-static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
- const struct dynamic_dma_window_prop *dma64)
+static struct dma_win *ddw_list_new_entry(struct device_node *pdn,
+ const struct dynamic_dma_window_prop *dma64)
{
- struct direct_window *window;
+ struct dma_win *window;
window = kzalloc(sizeof(*window), GFP_KERNEL);
if (!window)
@@ -894,7 +897,7 @@ static void find_existing_ddw_windows_named(const char *name)
{
int len;
struct device_node *pdn;
- struct direct_window *window;
+ struct dma_win *window;
const struct dynamic_dma_window_prop *dma64;
for_each_node_with_property(pdn, name) {
@@ -908,9 +911,9 @@ static void find_existing_ddw_windows_named(const char *name)
if (!window)
break;
- spin_lock(&direct_window_list_lock);
- list_add(&window->list, &direct_window_list);
- spin_unlock(&direct_window_list_lock);
+ spin_lock(&dma_win_list_lock);
+ list_add(&window->list, &dma_win_list);
+ spin_unlock(&dma_win_list_lock);
}
}
@@ -1209,7 +1212,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
u64 win_addr;
struct device_node *dn;
u32 ddw_avail[DDW_APPLICABLE_SIZE];
- struct direct_window *window;
+ struct dma_win *window;
const char *win_name;
struct property *win64 = NULL;
struct failed_ddw_pdn *fpdn;
@@ -1222,12 +1225,12 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
pmem_present = dn != NULL;
of_node_put(dn);
- mutex_lock(&direct_window_init_mutex);
+ mutex_lock(&dma_win_init_mutex);
if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
direct_mapping = (len >= max_ram_len);
- mutex_unlock(&direct_window_init_mutex);
+ mutex_unlock(&dma_win_init_mutex);
return direct_mapping;
}
@@ -1303,8 +1306,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
page_shift = iommu_get_page_shift(query.page_size);
if (!page_shift) {
- dev_dbg(&dev->dev, "no supported direct page size in mask %x",
- query.page_size);
+ dev_dbg(&dev->dev, "no supported page size in mask %x",
+ query.page_size);
goto out_failed;
}
/* verify the window * number of ptes will map the partition */
@@ -1360,7 +1363,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
ret = of_add_property(pdn, win64);
if (ret) {
- dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
+ dev_err(&dev->dev, "unable to add DMA window property for %pOF: %d",
pdn, ret);
goto out_free_prop;
}
@@ -1374,7 +1377,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
win64->value, tce_setrange_multi_pSeriesLP_walk);
if (ret) {
- dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
+ dev_info(&dev->dev, "failed to map DMA window for %pOF: %d\n",
dn, ret);
goto out_del_list;
}
@@ -1412,9 +1415,9 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
set_iommu_table_base(&dev->dev, newtbl);
}
- spin_lock(&direct_window_list_lock);
- list_add(&window->list, &direct_window_list);
- spin_unlock(&direct_window_list_lock);
+ spin_lock(&dma_win_list_lock);
+ list_add(&window->list, &dma_win_list);
+ spin_unlock(&dma_win_list_lock);
dev->dev.archdata.dma_offset = win_addr;
goto out_unlock;
@@ -1445,7 +1448,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
list_add(&fpdn->list, &failed_ddw_pdn_list);
out_unlock:
- mutex_unlock(&direct_window_init_mutex);
+ mutex_unlock(&dma_win_init_mutex);
/*
* If we have persistent memory and the window size is only as big
@@ -1542,29 +1545,29 @@ static bool iommu_bypass_supported_pSeriesLP(struct pci_dev *pdev, u64 dma_mask)
static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
void *data)
{
- struct direct_window *window;
+ struct dma_win *window;
struct memory_notify *arg = data;
int ret = 0;
switch (action) {
case MEM_GOING_ONLINE:
- spin_lock(&direct_window_list_lock);
- list_for_each_entry(window, &direct_window_list, list) {
+ spin_lock(&dma_win_list_lock);
+ list_for_each_entry(window, &dma_win_list, list) {
ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
arg->nr_pages, window->prop);
/* XXX log error */
}
- spin_unlock(&direct_window_list_lock);
+ spin_unlock(&dma_win_list_lock);
break;
case MEM_CANCEL_ONLINE:
case MEM_OFFLINE:
- spin_lock(&direct_window_list_lock);
- list_for_each_entry(window, &direct_window_list, list) {
+ spin_lock(&dma_win_list_lock);
+ list_for_each_entry(window, &dma_win_list, list) {
ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
arg->nr_pages, window->prop);
/* XXX log error */
}
- spin_unlock(&direct_window_list_lock);
+ spin_unlock(&dma_win_list_lock);
break;
default:
break;
@@ -1585,7 +1588,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
struct of_reconfig_data *rd = data;
struct device_node *np = rd->dn;
struct pci_dn *pci = PCI_DN(np);
- struct direct_window *window;
+ struct dma_win *window;
switch (action) {
case OF_RECONFIG_DETACH_NODE:
@@ -1604,15 +1607,15 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
iommu_pseries_free_group(pci->table_group,
np->full_name);
- spin_lock(&direct_window_list_lock);
- list_for_each_entry(window, &direct_window_list, list) {
+ spin_lock(&dma_win_list_lock);
+ list_for_each_entry(window, &dma_win_list, list) {
if (window->device == np) {
list_del(&window->list);
kfree(window);
break;
}
}
- spin_unlock(&direct_window_list_lock);
+ spin_unlock(&dma_win_list_lock);
break;
default:
err = NOTIFY_DONE;
--
2.30.2
Some functions assume the IOMMU page size can only be 4K (pageshift == 12).
Update them to accept any page size passed, so we can use 64K pages.
In the process, some defines like TCE_SHIFT became obsolete and were
removed.
IODA3 Revision 3.0_prd1 (OpenPower Foundation), Figures 3.4 and 3.5, shows
a 52-bit RPN and assumes a 12-bit page shift, so there should be no need
for TCE_RPN_MASK, which masks out any bit above 40 in the RPN. Its usage
is removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
tce_buildmulti_pSeriesLP().
Most places had a tbl struct, so using tbl->it_page_shift was simple.
tce_free_pSeriesLP() was a special case, since its callers do not always
have a tbl struct, so adding a tceshift parameter seems the right thing
to do.
Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/include/asm/tce.h | 8 ------
arch/powerpc/platforms/pseries/iommu.c | 39 +++++++++++++++-----------
2 files changed, 23 insertions(+), 24 deletions(-)
diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index db5fc2f2262d..0c34d2756d92 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -19,15 +19,7 @@
#define TCE_VB 0
#define TCE_PCI 1
-/* TCE page size is 4096 bytes (1 << 12) */
-
-#define TCE_SHIFT 12
-#define TCE_PAGE_SIZE (1 << TCE_SHIFT)
-
#define TCE_ENTRY_SIZE 8 /* each TCE is 64 bits */
-
-#define TCE_RPN_MASK 0xfffffffffful /* 40-bit RPN (4K pages) */
-#define TCE_RPN_SHIFT 12
#define TCE_VALID 0x800 /* TCE valid */
#define TCE_ALLIO 0x400 /* TCE valid for all lpars */
#define TCE_PCI_WRITE 0x2 /* write from PCI allowed */
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 67c9953a6503..796ab356341c 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -107,6 +107,8 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
u64 proto_tce;
__be64 *tcep;
u64 rpn;
+ const unsigned long tceshift = tbl->it_page_shift;
+ const unsigned long pagesize = IOMMU_PAGE_SIZE(tbl);
proto_tce = TCE_PCI_READ; // Read allowed
@@ -117,10 +119,10 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
while (npages--) {
/* can't move this out since we might cross MEMBLOCK boundary */
- rpn = __pa(uaddr) >> TCE_SHIFT;
- *tcep = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
+ rpn = __pa(uaddr) >> tceshift;
+ *tcep = cpu_to_be64(proto_tce | rpn << tceshift);
- uaddr += TCE_PAGE_SIZE;
+ uaddr += pagesize;
tcep++;
}
return 0;
@@ -146,7 +148,7 @@ static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
return be64_to_cpu(*tcep);
}
-static void tce_free_pSeriesLP(unsigned long liobn, long, long);
+static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
@@ -166,12 +168,12 @@ static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
proto_tce |= TCE_PCI_WRITE;
while (npages--) {
- tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
+ tce = proto_tce | rpn << tceshift;
rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
ret = (int)rc;
- tce_free_pSeriesLP(liobn, tcenum_start,
+ tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
(npages_start - (npages + 1)));
break;
}
@@ -205,10 +207,11 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
long tcenum_start = tcenum, npages_start = npages;
int ret = 0;
unsigned long flags;
+ const unsigned long tceshift = tbl->it_page_shift;
if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {
return tce_build_pSeriesLP(tbl->it_index, tcenum,
- tbl->it_page_shift, npages, uaddr,
+ tceshift, npages, uaddr,
direction, attrs);
}
@@ -225,13 +228,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
if (!tcep) {
local_irq_restore(flags);
return tce_build_pSeriesLP(tbl->it_index, tcenum,
- tbl->it_page_shift,
+ tceshift,
npages, uaddr, direction, attrs);
}
__this_cpu_write(tce_page, tcep);
}
- rpn = __pa(uaddr) >> TCE_SHIFT;
+ rpn = __pa(uaddr) >> tceshift;
proto_tce = TCE_PCI_READ;
if (direction != DMA_TO_DEVICE)
proto_tce |= TCE_PCI_WRITE;
@@ -245,12 +248,12 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
limit = min_t(long, npages, 4096/TCE_ENTRY_SIZE);
for (l = 0; l < limit; l++) {
- tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
+ tcep[l] = cpu_to_be64(proto_tce | rpn << tceshift);
rpn++;
}
rc = plpar_tce_put_indirect((u64)tbl->it_index,
- (u64)tcenum << 12,
+ (u64)tcenum << tceshift,
(u64)__pa(tcep),
limit);
@@ -277,12 +280,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
return ret;
}
-static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long npages)
+static void tce_free_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,
+ long npages)
{
u64 rc;
while (npages--) {
- rc = plpar_tce_put((u64)liobn, (u64)tcenum << 12, 0);
+ rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, 0);
if (rc && printk_ratelimit()) {
printk("tce_free_pSeriesLP: plpar_tce_put failed. rc=%lld\n", rc);
@@ -301,9 +305,11 @@ static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long n
u64 rc;
if (!firmware_has_feature(FW_FEATURE_STUFF_TCE))
- return tce_free_pSeriesLP(tbl->it_index, tcenum, npages);
+ return tce_free_pSeriesLP(tbl->it_index, tcenum,
+ tbl->it_page_shift, npages);
- rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);
+ rc = plpar_tce_stuff((u64)tbl->it_index,
+ (u64)tcenum << tbl->it_page_shift, 0, npages);
if (rc && printk_ratelimit()) {
printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");
@@ -319,7 +325,8 @@ static unsigned long tce_get_pSeriesLP(struct iommu_table *tbl, long tcenum)
u64 rc;
unsigned long tce_ret;
- rc = plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12, &tce_ret);
+ rc = plpar_tce_get((u64)tbl->it_index,
+ (u64)tcenum << tbl->it_page_shift, &tce_ret);
if (rc && printk_ratelimit()) {
printk("tce_get_pSeriesLP: plpar_tce_get failed. rc=%lld\n", rc);
--
2.30.2
At the moment pseries stores information about the created directly mapped
DDW window in the DIRECT64_PROPNAME property.
With the objective of implementing indirect DMA mapping with DDW, it's
necessary to have another property name to make sure kexec'ing into older
kernels does not break, as it would if we reused DIRECT64_PROPNAME.
In order to have this, find_existing_ddw_windows() needs to be able to
look for different property names.
Extract find_existing_ddw_windows() into find_existing_ddw_windows_named()
and call it with the current property name.
Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index f8922fcf34b6..de54ddd9decd 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -888,24 +888,21 @@ static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
return window;
}
-static int find_existing_ddw_windows(void)
+static void find_existing_ddw_windows_named(const char *name)
{
int len;
struct device_node *pdn;
struct direct_window *window;
- const struct dynamic_dma_window_prop *direct64;
-
- if (!firmware_has_feature(FW_FEATURE_LPAR))
- return 0;
+ const struct dynamic_dma_window_prop *dma64;
- for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
- direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
- if (!direct64 || len < sizeof(*direct64)) {
- remove_ddw(pdn, true, DIRECT64_PROPNAME);
+ for_each_node_with_property(pdn, name) {
+ dma64 = of_get_property(pdn, name, &len);
+ if (!dma64 || len < sizeof(*dma64)) {
+ remove_ddw(pdn, true, name);
continue;
}
- window = ddw_list_new_entry(pdn, direct64);
+ window = ddw_list_new_entry(pdn, dma64);
if (!window)
break;
@@ -913,6 +910,14 @@ static int find_existing_ddw_windows(void)
list_add(&window->list, &direct_window_list);
spin_unlock(&direct_window_list_lock);
}
+}
+
+static int find_existing_ddw_windows(void)
+{
+ if (!firmware_has_feature(FW_FEATURE_LPAR))
+ return 0;
+
+ find_existing_ddw_windows_named(DIRECT64_PROPNAME);
return 0;
}
--
2.30.2
There are two functions creating direct_window_list entries in a
similar way, so create ddw_list_new_entry() to avoid duplication and
simplify those functions.
Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 32 +++++++++++++++++---------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index d02359ca1f9f..6f14894d2d04 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -870,6 +870,21 @@ static u64 find_existing_ddw(struct device_node *pdn, int *window_shift)
return dma_addr;
}
+static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
+ const struct dynamic_dma_window_prop *dma64)
+{
+ struct direct_window *window;
+
+ window = kzalloc(sizeof(*window), GFP_KERNEL);
+ if (!window)
+ return NULL;
+
+ window->device = pdn;
+ window->prop = dma64;
+
+ return window;
+}
+
static int find_existing_ddw_windows(void)
{
int len;
@@ -882,18 +897,15 @@ static int find_existing_ddw_windows(void)
for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
- if (!direct64)
- continue;
-
- window = kzalloc(sizeof(*window), GFP_KERNEL);
- if (!window || len < sizeof(struct dynamic_dma_window_prop)) {
- kfree(window);
+ if (!direct64 || len < sizeof(*direct64)) {
remove_ddw(pdn, true);
continue;
}
- window->device = pdn;
- window->prop = direct64;
+ window = ddw_list_new_entry(pdn, direct64);
+ if (!window)
+ break;
+
spin_lock(&direct_window_list_lock);
list_add(&window->list, &direct_window_list);
spin_unlock(&direct_window_list_lock);
@@ -1303,7 +1315,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
create.liobn, dn);
- window = kzalloc(sizeof(*window), GFP_KERNEL);
+ window = ddw_list_new_entry(pdn, ddwprop);
if (!window)
goto out_clear_window;
@@ -1322,8 +1334,6 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
goto out_free_window;
}
- window->device = pdn;
- window->prop = ddwprop;
spin_lock(&direct_window_list_lock);
list_add(&window->list, &direct_window_list);
spin_unlock(&direct_window_list_lock);
--
2.30.2
Update remove_dma_window() so it can be used to remove a DDW with a given
property name.
This enables creating new property names for DDW, so we can have
different uses for it, such as indirect mapping.
Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 89cb6e9e9f31..f8922fcf34b6 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -823,31 +823,32 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
}
-static void remove_ddw(struct device_node *np, bool remove_prop)
+static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_name)
{
struct property *win;
u32 ddw_avail[DDW_APPLICABLE_SIZE];
int ret = 0;
+ win = of_find_property(np, win_name, NULL);
+ if (!win)
+ return -EINVAL;
+
ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
&ddw_avail[0], DDW_APPLICABLE_SIZE);
if (ret)
- return;
-
- win = of_find_property(np, DIRECT64_PROPNAME, NULL);
- if (!win)
- return;
+ return 0;
if (win->length >= sizeof(struct dynamic_dma_window_prop))
remove_dma_window(np, ddw_avail, win);
if (!remove_prop)
- return;
+ return 0;
ret = of_remove_property(np, win);
if (ret)
pr_warn("%pOF: failed to remove direct window property: %d\n",
np, ret);
+ return 0;
}
static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
@@ -900,7 +901,7 @@ static int find_existing_ddw_windows(void)
for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
if (!direct64 || len < sizeof(*direct64)) {
- remove_ddw(pdn, true);
+ remove_ddw(pdn, true, DIRECT64_PROPNAME);
continue;
}
@@ -1372,7 +1373,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
win64 = NULL;
out_remove_win:
- remove_ddw(pdn, true);
+ remove_ddw(pdn, true, DIRECT64_PROPNAME);
out_failed:
if (default_win_removed)
@@ -1536,7 +1537,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
* we have to remove the property when releasing
* the device node.
*/
- remove_ddw(np, false);
+ remove_ddw(np, false, DIRECT64_PROPNAME);
if (pci && pci->table_group)
iommu_pseries_free_group(pci->table_group,
np->full_name);
--
2.30.2
On 5/1/21 02:31, Leonardo Bras wrote:
> Having a function to check if the iommu table has any allocation helps
> deciding if a tbl can be reset for using a new DMA window.
>
> It should be enough to replace all instances of !bitmap_empty(tbl...).
>
> iommu_table_in_use() skips reserved memory, so we don't need to worry about
> releasing it before testing. This causes iommu_table_release_pages() to
> become unnecessary, given it is only used to remove reserved memory for
> testing.
>
> Also, only allow storing reserved memory values in tbl if they are valid
> in the table, so there is no need to check it in the new helper.
>
> Signed-off-by: Leonardo Bras <[email protected]>
> ---
> arch/powerpc/include/asm/iommu.h | 1 +
> arch/powerpc/kernel/iommu.c | 65 ++++++++++++++++----------------
> 2 files changed, 34 insertions(+), 32 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
> index deef7c94d7b6..bf3b84128525 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -154,6 +154,7 @@ extern int iommu_tce_table_put(struct iommu_table *tbl);
> */
> extern struct iommu_table *iommu_init_table(struct iommu_table *tbl,
> int nid, unsigned long res_start, unsigned long res_end);
> +bool iommu_table_in_use(struct iommu_table *tbl);
>
> #define IOMMU_TABLE_GROUP_MAX_TABLES 2
>
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index ad82dda81640..5e168bd91401 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -691,32 +691,24 @@ static void iommu_table_reserve_pages(struct iommu_table *tbl,
> if (tbl->it_offset == 0)
> set_bit(0, tbl->it_map);
>
> - tbl->it_reserved_start = res_start;
> - tbl->it_reserved_end = res_end;
> -
> - /* Check if res_start..res_end isn't empty and overlaps the table */
> - if (res_start && res_end &&
> - (tbl->it_offset + tbl->it_size < res_start ||
> - res_end < tbl->it_offset))
> - return;
> + if (res_start < tbl->it_offset)
> + res_start = tbl->it_offset;
>
> - for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
> - set_bit(i - tbl->it_offset, tbl->it_map);
> -}
> + if (res_end > (tbl->it_offset + tbl->it_size))
> + res_end = tbl->it_offset + tbl->it_size;
>
> -static void iommu_table_release_pages(struct iommu_table *tbl)
> -{
> - int i;
> + /* Check if res_start..res_end is a valid range in the table */
> + if (res_start >= res_end) {
> + tbl->it_reserved_start = tbl->it_offset;
> + tbl->it_reserved_end = tbl->it_offset;
> + return;
> + }
>
> - /*
> - * In case we have reserved the first bit, we should not emit
> - * the warning below.
> - */
> - if (tbl->it_offset == 0)
> - clear_bit(0, tbl->it_map);
> + tbl->it_reserved_start = res_start;
> + tbl->it_reserved_end = res_end;
>
> for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
> - clear_bit(i - tbl->it_offset, tbl->it_map);
> + set_bit(i - tbl->it_offset, tbl->it_map);
git produced a messy chunk here. The new logic is:
static void iommu_table_reserve_pages(struct iommu_table *tbl,
unsigned long res_start, unsigned long res_end)
{
int i;
WARN_ON_ONCE(res_end < res_start);
/*
* Reserve page 0 so it will not be used for any mappings.
* This avoids buggy drivers that consider page 0 to be invalid
* to crash the machine or even lose data.
*/
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);
if (res_start < tbl->it_offset)
res_start = tbl->it_offset;
if (res_end > (tbl->it_offset + tbl->it_size))
res_end = tbl->it_offset + tbl->it_size;
/* Check if res_start..res_end is a valid range in the table */
if (res_start >= res_end) {
tbl->it_reserved_start = tbl->it_offset;
tbl->it_reserved_end = tbl->it_offset;
return;
}
It is just hard to read. A code reviewer would assume res_end >=
res_start (as there is a WARN_ON), but later we allow res_end to end up
less than res_start.
But maybe it is just me :)
Otherwise looks good.
Reviewed-by: Alexey Kardashevskiy <[email protected]>
> }
>
> /*
> @@ -781,6 +773,22 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid,
> return tbl;
> }
>
> +bool iommu_table_in_use(struct iommu_table *tbl)
> +{
> + unsigned long start = 0, end;
> +
> + /* ignore reserved bit0 */
> + if (tbl->it_offset == 0)
> + start = 1;
> + end = tbl->it_reserved_start - tbl->it_offset;
> + if (find_next_bit(tbl->it_map, end, start) != end)
> + return true;
> +
> + start = tbl->it_reserved_end - tbl->it_offset;
> + end = tbl->it_size;
> + return find_next_bit(tbl->it_map, end, start) != end;
> +}
> +
> static void iommu_table_free(struct kref *kref)
> {
> unsigned long bitmap_sz;
> @@ -799,10 +807,8 @@ static void iommu_table_free(struct kref *kref)
>
> iommu_debugfs_del(tbl);
>
> - iommu_table_release_pages(tbl);
> -
> /* verify that table contains no entries */
> - if (!bitmap_empty(tbl->it_map, tbl->it_size))
> + if (iommu_table_in_use(tbl))
> pr_warn("%s: Unexpected TCEs\n", __func__);
>
> /* calculate bitmap size in bytes */
> @@ -1108,18 +1114,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
> for (i = 0; i < tbl->nr_pools; i++)
> spin_lock(&tbl->pools[i].lock);
>
> - iommu_table_release_pages(tbl);
> -
> - if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
> + if (iommu_table_in_use(tbl)) {
> pr_err("iommu_tce: it_map is not empty");
> ret = -EBUSY;
> - /* Undo iommu_table_release_pages, i.e. restore bit#0, etc */
> - iommu_table_reserve_pages(tbl, tbl->it_reserved_start,
> - tbl->it_reserved_end);
> - } else {
> - memset(tbl->it_map, 0xff, sz);
> }
>
> + memset(tbl->it_map, 0xff, sz);
> +
> for (i = 0; i < tbl->nr_pools; i++)
> spin_unlock(&tbl->pools[i].lock);
> spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
>
--
Alexey
On 5/1/21 02:31, Leonardo Bras wrote:
> Add a new helper _iommu_table_setparms(), and use it in
> iommu_table_setparms() and iommu_table_setparms_lpar() to avoid duplicated
> code.
>
> Also, setting tbl->it_ops was happening outsite iommu_table_setparms*(),
> so move it to the new helper. Since we need the iommu_table_ops to be
> declared before used, move iommu_table_lpar_multi_ops and
> iommu_table_pseries_ops to before their respective iommu_table_setparms*().
>
> Signed-off-by: Leonardo Bras <[email protected]>
This does not apply anymore as it conflicts with my 4be518d838809e2135.
> ---
> arch/powerpc/platforms/pseries/iommu.c | 100 ++++++++++++-------------
> 1 file changed, 50 insertions(+), 50 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 5a70ecd579b8..89cb6e9e9f31 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -53,6 +53,11 @@ enum {
> DDW_EXT_QUERY_OUT_SIZE = 2
> };
>
> +#ifdef CONFIG_IOMMU_API
> +static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned long *tce,
> + enum dma_data_direction *direction, bool realmode);
> +#endif
Instead of declaring this so far from the code which needs it, maybe add
struct iommu_table_ops iommu_table_lpar_multi_ops;
right before iommu_table_setparms() (as the struct is what you actually
want there), and you won't need to move iommu_table_pseries_ops as well.
> +
> static struct iommu_table *iommu_pseries_alloc_table(int node)
> {
> struct iommu_table *tbl;
> @@ -501,6 +506,28 @@ static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
> return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
> }
>
> +static inline void _iommu_table_setparms(struct iommu_table *tbl, unsigned long busno,
The underscore is confusing; maybe iommu_table_do_setparms()?
iommu_table_setparms_common()? Not sure. I cannot recall a single
function with just one leading underscore, I suspect I was pushed back
when I tried adding one ages ago :) "inline" seems excessive, the
compiler will probably figure it out anyway.
> + unsigned long liobn, unsigned long win_addr,
> + unsigned long window_size, unsigned long page_shift,
> + unsigned long base, struct iommu_table_ops *table_ops)
Make "base" a pointer. Or, better, just keep setting it directly in
iommu_table_setparms() rather than passing 0 around.
The same comment about "liobn" - set it in iommu_table_setparms_lpar().
The reviewer will see what field matters in what situation imho.
> +{
> + tbl->it_busno = busno;
> + tbl->it_index = liobn;
> + tbl->it_offset = win_addr >> page_shift;
> + tbl->it_size = window_size >> page_shift;
> + tbl->it_page_shift = page_shift;
> + tbl->it_base = base;
> + tbl->it_blocksize = 16;
> + tbl->it_type = TCE_PCI;
> + tbl->it_ops = table_ops;
> +}
> +
> +struct iommu_table_ops iommu_table_pseries_ops = {
> + .set = tce_build_pSeries,
> + .clear = tce_free_pSeries,
> + .get = tce_get_pseries
> +};
> +
> static void iommu_table_setparms(struct pci_controller *phb,
> struct device_node *dn,
> struct iommu_table *tbl)
> @@ -509,8 +536,13 @@ static void iommu_table_setparms(struct pci_controller *phb,
> const unsigned long *basep;
> const u32 *sizep;
>
> - node = phb->dn;
> + /* Test if we are going over 2GB of DMA space */
> + if (phb->dma_window_base_cur + phb->dma_window_size > SZ_2G) {
> + udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> + panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> + }
>
> + node = phb->dn;
> basep = of_get_property(node, "linux,tce-base", NULL);
> sizep = of_get_property(node, "linux,tce-size", NULL);
> if (basep == NULL || sizep == NULL) {
> @@ -519,33 +551,25 @@ static void iommu_table_setparms(struct pci_controller *phb,
> return;
> }
>
> - tbl->it_base = (unsigned long)__va(*basep);
> + _iommu_table_setparms(tbl, phb->bus->number, 0, phb->dma_window_base_cur,
> + phb->dma_window_size, IOMMU_PAGE_SHIFT_4K,
> + (unsigned long)__va(*basep), &iommu_table_pseries_ops);
>
> if (!is_kdump_kernel())
> memset((void *)tbl->it_base, 0, *sizep);
>
> - tbl->it_busno = phb->bus->number;
> - tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> -
> - /* Units of tce entries */
> - tbl->it_offset = phb->dma_window_base_cur >> tbl->it_page_shift;
> -
> - /* Test if we are going over 2GB of DMA space */
> - if (phb->dma_window_base_cur + phb->dma_window_size > 0x80000000ul) {
> - udbg_printf("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> - panic("PCI_DMA: Unexpected number of IOAs under this PHB.\n");
> - }
> -
> phb->dma_window_base_cur += phb->dma_window_size;
> -
> - /* Set the tce table size - measured in entries */
> - tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
> -
> - tbl->it_index = 0;
> - tbl->it_blocksize = 16;
> - tbl->it_type = TCE_PCI;
> }
>
> +struct iommu_table_ops iommu_table_lpar_multi_ops = {
> + .set = tce_buildmulti_pSeriesLP,
> +#ifdef CONFIG_IOMMU_API
> + .xchg_no_kill = tce_exchange_pseries,
> +#endif
> + .clear = tce_freemulti_pSeriesLP,
> + .get = tce_get_pSeriesLP
> +};
> +
> /*
> * iommu_table_setparms_lpar
> *
> @@ -557,28 +581,17 @@ static void iommu_table_setparms_lpar(struct pci_controller *phb,
> struct iommu_table_group *table_group,
> const __be32 *dma_window)
> {
> - unsigned long offset, size;
> + unsigned long offset, size, liobn;
>
> - of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset, &size);
> + of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
>
> - tbl->it_busno = phb->bus->number;
> - tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> - tbl->it_base = 0;
> - tbl->it_blocksize = 16;
> - tbl->it_type = TCE_PCI;
> - tbl->it_offset = offset >> tbl->it_page_shift;
> - tbl->it_size = size >> tbl->it_page_shift;
> + _iommu_table_setparms(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, 0,
> + &iommu_table_lpar_multi_ops);
>
> table_group->tce32_start = offset;
> table_group->tce32_size = size;
> }
>
> -struct iommu_table_ops iommu_table_pseries_ops = {
> - .set = tce_build_pSeries,
> - .clear = tce_free_pSeries,
> - .get = tce_get_pseries
> -};
> -
> static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
> {
> struct device_node *dn;
> @@ -647,7 +660,6 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
> tbl = pci->table_group->tables[0];
>
> iommu_table_setparms(pci->phb, dn, tbl);
> - tbl->it_ops = &iommu_table_pseries_ops;
> iommu_init_table(tbl, pci->phb->node, 0, 0);
>
> /* Divide the rest (1.75GB) among the children */
> @@ -664,7 +676,7 @@ static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
> bool realmode)
> {
> long rc;
> - unsigned long ioba = (unsigned long) index << tbl->it_page_shift;
> + unsigned long ioba = (unsigned long)index << tbl->it_page_shift;
Unrelated change, why, did checkpatch.pl complain?
> unsigned long flags, oldtce = 0;
> u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> unsigned long newtce = *tce | proto_tce;
> @@ -686,15 +698,6 @@ static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
> }
> #endif
>
> -struct iommu_table_ops iommu_table_lpar_multi_ops = {
> - .set = tce_buildmulti_pSeriesLP,
> -#ifdef CONFIG_IOMMU_API
> - .xchg_no_kill = tce_exchange_pseries,
> -#endif
> - .clear = tce_freemulti_pSeriesLP,
> - .get = tce_get_pSeriesLP
> -};
> -
> static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
> {
> struct iommu_table *tbl;
> @@ -729,7 +732,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
> tbl = ppci->table_group->tables[0];
> iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
> ppci->table_group, dma_window);
> - tbl->it_ops = &iommu_table_lpar_multi_ops;
> iommu_init_table(tbl, ppci->phb->node, 0, 0);
> iommu_register_group(ppci->table_group,
> pci_domain_nr(bus), 0);
> @@ -758,7 +760,6 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
> PCI_DN(dn)->table_group = iommu_pseries_alloc_group(phb->node);
> tbl = PCI_DN(dn)->table_group->tables[0];
> iommu_table_setparms(phb, dn, tbl);
> - tbl->it_ops = &iommu_table_pseries_ops;
> iommu_init_table(tbl, phb->node, 0, 0);
> set_iommu_table_base(&dev->dev, tbl);
> return;
> @@ -1436,7 +1437,6 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> tbl = pci->table_group->tables[0];
> iommu_table_setparms_lpar(pci->phb, pdn, tbl,
> pci->table_group, dma_window);
> - tbl->it_ops = &iommu_table_lpar_multi_ops;
> iommu_init_table(tbl, pci->phb->node, 0, 0);
> iommu_register_group(pci->table_group,
> pci_domain_nr(pci->phb->bus), 0);
>
--
Alexey
On 5/1/21 02:31, Leonardo Bras wrote:
> Code used to create a ddw property that was previously scattered in
> enable_ddw() is now gathered in ddw_property_create(), which deals with
> allocation and filling of the property, leaving it ready for
> of_property_add(), which now occurs in sequence.
>
> This created an opportunity to reorganize the second part of enable_ddw():
>
> Without this patch enable_ddw() does, in order:
> kzalloc() property & members, create_ddw(), fill ddwprop inside property,
> ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> of_add_property(), and list_add().
>
> With this patch enable_ddw() does, in order:
> create_ddw(), ddw_property_create(), of_add_property(),
> ddw_list_new_entry(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> and list_add().
>
> This change requires of_remove_property() in case anything fails after
> of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk
> in all memory, which looks the most expensive operation, only if
> everything else succeeds.
>
> Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 93 ++++++++++++++++----------
> 1 file changed, 57 insertions(+), 36 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 955cf095416c..5a70ecd579b8 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1122,6 +1122,35 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
> ret);
> }
>
> +static struct property *ddw_property_create(const char *propname, u32 liobn, u64 dma_addr,
> + u32 page_shift, u32 window_shift)
> +{
> + struct dynamic_dma_window_prop *ddwprop;
> + struct property *win64;
> +
> + win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
> + if (!win64)
> + return NULL;
> +
> + win64->name = kstrdup(propname, GFP_KERNEL);
> + ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
> + win64->value = ddwprop;
> + win64->length = sizeof(*ddwprop);
> + if (!win64->name || !win64->value) {
> + kfree(win64->name);
> + kfree(win64->value);
> + kfree(win64);
> + return NULL;
> + }
> +
> + ddwprop->liobn = cpu_to_be32(liobn);
> + ddwprop->dma_base = cpu_to_be64(dma_addr);
> + ddwprop->tce_shift = cpu_to_be32(page_shift);
> + ddwprop->window_shift = cpu_to_be32(window_shift);
> +
> + return win64;
> +}
> +
> /* Return largest page shift based on "IO Page Sizes" output of ibm,query-pe-dma-window. */
> static int iommu_get_page_shift(u32 query_page_size)
> {
> @@ -1167,11 +1196,11 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> struct ddw_query_response query;
> struct ddw_create_response create;
> int page_shift;
> + u64 win_addr;
> struct device_node *dn;
> u32 ddw_avail[DDW_APPLICABLE_SIZE];
> struct direct_window *window;
> struct property *win64 = NULL;
> - struct dynamic_dma_window_prop *ddwprop;
> struct failed_ddw_pdn *fpdn;
> bool default_win_removed = false;
> bool pmem_present;
> @@ -1286,65 +1315,54 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> 1ULL << page_shift);
> goto out_failed;
> }
> - win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
> - if (!win64) {
> - dev_info(&dev->dev,
> - "couldn't allocate property for 64bit dma window\n");
> - goto out_failed;
> - }
> - win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
> - win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
> - win64->length = sizeof(*ddwprop);
> - if (!win64->name || !win64->value) {
> - dev_info(&dev->dev,
> - "couldn't allocate property name and value\n");
> - goto out_free_prop;
> - }
>
> ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
> if (ret != 0)
> - goto out_free_prop;
> -
> - ddwprop->liobn = cpu_to_be32(create.liobn);
> - ddwprop->dma_base = cpu_to_be64(((u64)create.addr_hi << 32) |
> - create.addr_lo);
> - ddwprop->tce_shift = cpu_to_be32(page_shift);
> - ddwprop->window_shift = cpu_to_be32(len);
> + goto out_failed;
>
> dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
> create.liobn, dn);
>
> - window = ddw_list_new_entry(pdn, ddwprop);
> + win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> + win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
> + page_shift, len);
> + if (!win64) {
> + dev_info(&dev->dev,
> + "couldn't allocate property, property name, or value\n");
> + goto out_remove_win;
> + }
> +
> + ret = of_add_property(pdn, win64);
> + if (ret) {
> + dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
> + pdn, ret);
> + goto out_free_prop;
> + }
> +
> + window = ddw_list_new_entry(pdn, win64->value);
> if (!window)
> - goto out_clear_window;
> + goto out_del_prop;
>
> ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> win64->value, tce_setrange_multi_pSeriesLP_walk);
> if (ret) {
> dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> dn, ret);
> - goto out_free_window;
> - }
> -
> - ret = of_add_property(pdn, win64);
> - if (ret) {
> - dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
> - pdn, ret);
> - goto out_free_window;
> + goto out_del_list;
> }
>
> spin_lock(&direct_window_list_lock);
> list_add(&window->list, &direct_window_list);
> spin_unlock(&direct_window_list_lock);
>
> - dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);
> + dev->dev.archdata.dma_offset = win_addr;
> goto out_unlock;
>
> -out_free_window:
> +out_del_list:
> kfree(window);
>
> -out_clear_window:
> - remove_ddw(pdn, true);
> +out_del_prop:
> + of_remove_property(pdn, win64);
>
> out_free_prop:
> kfree(win64->name);
> @@ -1352,6 +1370,9 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> kfree(win64);
> win64 = NULL;
>
> +out_remove_win:
> + remove_ddw(pdn, true);
> +
> out_failed:
> if (default_win_removed)
> reset_dma_window(dev, pdn);
>
--
Alexey
On 01/05/2021 02:31, Leonardo Bras wrote:
> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> works with a small number of devices. SRIOV changes it as the user can
> configure hundreds of VFs, and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per PE to limit the waste of physical pages.
>
> As of today, if the assumed direct mapping is not possible, DDW creation
> is skipped and the default DMA window "ibm,dma-window" is used instead.
>
> By using DDW, indirect mapping can get more TCEs than available for the
> default DMA window, and also get access to using much larger pagesizes
> (16MB as implemented in qemu vs 4k from the default DMA window), causing a
> significant increase in the maximum amount of memory that can be
> IOMMU-mapped at the same time.
>
> Indirect mapping will only be used if direct mapping is not a
> possibility.
>
> For indirect mapping, it's necessary to re-create the iommu_table with
> the new DMA window parameters, so iommu_alloc() can use it.
>
> Removing the default DMA window for using DDW with indirect mapping
> is only allowed if there is no current IOMMU memory allocated in
> the iommu_table. enable_ddw() is aborted otherwise.
>
> Even though there won't be both direct and indirect mappings at the
> same time, we can't reuse the DIRECT64_PROPNAME property name, or else
> an older kexec()ed kernel can assume direct mapping, and skip
> iommu_alloc(), causing undesirable behavior.
> So a new property name DMA64_PROPNAME "linux,dma64-ddr-window-info"
> was created to represent a DDW that does not allow direct mapping.
>
> Signed-off-by: Leonardo Bras <[email protected]>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 87 +++++++++++++++++++++-----
> 1 file changed, 72 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index de54ddd9decd..572879af0211 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -53,6 +53,7 @@ enum {
> DDW_EXT_QUERY_OUT_SIZE = 2
> };
>
> +static phys_addr_t ddw_memory_hotplug_max(void);
> #ifdef CONFIG_IOMMU_API
> static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned long *tce,
> enum dma_data_direction *direction, bool realmode);
> @@ -380,6 +381,7 @@ static DEFINE_SPINLOCK(direct_window_list_lock);
> /* protects initializing window twice for same device */
> static DEFINE_MUTEX(direct_window_init_mutex);
> #define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>
> static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
> unsigned long num_pfn, const void *arg)
> @@ -918,6 +920,7 @@ static int find_existing_ddw_windows(void)
> return 0;
>
> find_existing_ddw_windows_named(DIRECT64_PROPNAME);
> + find_existing_ddw_windows_named(DMA64_PROPNAME);
>
> return 0;
> }
> @@ -1207,10 +1210,13 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> struct device_node *dn;
> u32 ddw_avail[DDW_APPLICABLE_SIZE];
> struct direct_window *window;
> + const char *win_name;
> struct property *win64 = NULL;
> struct failed_ddw_pdn *fpdn;
> - bool default_win_removed = false;
> + bool default_win_removed = false, direct_mapping = false;
> bool pmem_present;
> + struct pci_dn *pci = PCI_DN(pdn);
> + struct iommu_table *tbl = pci->table_group->tables[0];
>
> dn = of_find_node_by_type(NULL, "ibm,pmemory");
> pmem_present = dn != NULL;
> @@ -1218,8 +1224,12 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>
> mutex_lock(&direct_window_init_mutex);
>
> - if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len))
> - goto out_unlock;
> + if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset, &len)) {
> + direct_mapping = (len >= max_ram_len);
> +
> + mutex_unlock(&direct_window_init_mutex);
> + return direct_mapping;
Doesn't this break the existing case when direct_mapping == true by
skipping the setting of dev->dev.bus_dma_limit before returning?
> + }
>
> /*
> * If we already went through this for a previous function of
> @@ -1298,7 +1308,6 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> goto out_failed;
> }
> /* verify the window * number of ptes will map the partition */
> - /* check largest block * page size > max memory hotplug addr */
> /*
> * The "ibm,pmemory" can appear anywhere in the address space.
> * Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
> @@ -1320,6 +1329,17 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> 1ULL << len,
> query.largest_available_block,
> 1ULL << page_shift);
> +
> + len = order_base_2(query.largest_available_block << page_shift);
> + win_name = DMA64_PROPNAME;
[1] ....
> + } else {
> + direct_mapping = true;
> + win_name = DIRECT64_PROPNAME;
> + }
> +
> + /* DDW + IOMMU on single window may fail if there is any allocation */
> + if (default_win_removed && !direct_mapping && iommu_table_in_use(tbl)) {
> + dev_dbg(&dev->dev, "current IOMMU table in use, can't be replaced.\n");
... remove !direct_mapping and move to [1]?
> goto out_failed;
> }
>
> @@ -1331,8 +1351,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> create.liobn, dn);
>
> win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> - win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn, win_addr,
> - page_shift, len);
> + win64 = ddw_property_create(win_name, create.liobn, win_addr, page_shift, len);
> if (!win64) {
> dev_info(&dev->dev,
> "couldn't allocate property, property name, or value\n");
> @@ -1350,12 +1369,47 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> if (!window)
> goto out_del_prop;
>
> - ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> - win64->value, tce_setrange_multi_pSeriesLP_walk);
> - if (ret) {
> - dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> - dn, ret);
> - goto out_del_list;
> + if (direct_mapping) {
> + /* DDW maps the whole partition, so enable direct DMA mapping */
> + ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
> + win64->value, tce_setrange_multi_pSeriesLP_walk);
> + if (ret) {
> + dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
> + dn, ret);
> + goto out_del_list;
> + }
> + } else {
> + struct iommu_table *newtbl;
> + int i;
> +
> + /* New table for using DDW instead of the default DMA window */
> + newtbl = iommu_pseries_alloc_table(pci->phb->node);
> + if (!newtbl) {
> + dev_dbg(&dev->dev, "couldn't create new IOMMU table\n");
> + goto out_del_list;
> + }
> +
> + for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
> + const unsigned long mask = IORESOURCE_MEM_64 | IORESOURCE_MEM;
> +
> + /* Look for MMIO32 */
> + if ((pci->phb->mem_resources[i].flags & mask) == IORESOURCE_MEM)
> + break;
What if there is no IORESOURCE_MEM? pci->phb->mem_resources[i].start
below will have garbage.
> + }
> +
> + _iommu_table_setparms(newtbl, pci->phb->bus->number, create.liobn, win_addr,
> + 1UL << len, page_shift, 0, &iommu_table_lpar_multi_ops);
> + iommu_init_table(newtbl, pci->phb->node, pci->phb->mem_resources[i].start,
> + pci->phb->mem_resources[i].end);
> +
> + if (default_win_removed)
> + iommu_tce_table_put(tbl);
iommu_tce_table_put() should have been called when the window was removed.
Also after some thinking - what happens if there were 2 devices in the
PE and one requested 64bit DMA? This will only update
set_iommu_table_base() for the 64bit one but not for the other.
I think the right thing to do is:
1. check if table[0] is in use, if yes => fail (which this does already)
2. remove default dma window but keep the iommu_table struct with one
change - set it_size to 0 (and free it_map) so the 32bit device won't
look at a stale structure and think there is some window (an imaginary
situation for phyp but easy to recreate in qemu).
3. use table[1] for newly created indirect DDW window.
4. change get_iommu_table_base() to return a usable table (or may be not
needed?).
If this sounds reasonable (does it?), the question is now if you have
time to do that and the hardware to test that, or I'll have to finish
the work :)
> + else
> + pci->table_group->tables[1] = tbl;
What is this for?
> +
> + pci->table_group->tables[0] = newtbl;
> +
> + set_iommu_table_base(&dev->dev, newtbl);
> }
>
> spin_lock(&direct_window_list_lock);
> @@ -1398,10 +1452,10 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> * as RAM, then we failed to create a window to cover persistent
> * memory and need to set the DMA limit.
> */
> - if (pmem_present && win64 && (len == max_ram_len))
> + if (pmem_present && direct_mapping && len == max_ram_len)
> dev->dev.bus_dma_limit = dev->dev.archdata.dma_offset + (1ULL << len);
>
> - return win64;
> + return win64 && direct_mapping;
> }
>
> static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> @@ -1542,7 +1596,10 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
> * we have to remove the property when releasing
> * the device node.
> */
> - remove_ddw(np, false, DIRECT64_PROPNAME);
> +
> + if (remove_ddw(np, false, DIRECT64_PROPNAME))
> + remove_ddw(np, false, DMA64_PROPNAME);
> +
> if (pci && pci->table_group)
> iommu_pseries_free_group(pci->table_group,
> np->full_name);
>
--
Alexey
On 01/05/2021 02:31, Leonardo Bras wrote:
> At the moment pseries stores information about the created directly-mapped
> DDW window in DIRECT64_PROPNAME.
>
> With the objective of implementing indirect DMA mapping with DDW, it's
> necessary to have another property name to make sure kexec'ing into older
> kernels does not break, as it would if we reuse DIRECT64_PROPNAME.
>
> In order to have this, find_existing_ddw_windows() needs to be able to
> look for different property names.
>
> Extract find_existing_ddw_windows() into find_existing_ddw_windows_named()
> and call it with the current property name.
>
> Signed-off-by: Leonardo Bras <[email protected]>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 25 +++++++++++++++----------
> 1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index f8922fcf34b6..de54ddd9decd 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -888,24 +888,21 @@ static struct direct_window *ddw_list_new_entry(struct device_node *pdn,
> return window;
> }
>
> -static int find_existing_ddw_windows(void)
> +static void find_existing_ddw_windows_named(const char *name)
I'd suggest find_existing_ddw_windows_by_name() but this is nitpicking.
Reviewed-by: Alexey Kardashevskiy <[email protected]>
> {
> int len;
> struct device_node *pdn;
> struct direct_window *window;
> - const struct dynamic_dma_window_prop *direct64;
> -
> - if (!firmware_has_feature(FW_FEATURE_LPAR))
> - return 0;
> + const struct dynamic_dma_window_prop *dma64;
>
> - for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
> - direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
> - if (!direct64 || len < sizeof(*direct64)) {
> - remove_ddw(pdn, true, DIRECT64_PROPNAME);
> + for_each_node_with_property(pdn, name) {
> + dma64 = of_get_property(pdn, name, &len);
> + if (!dma64 || len < sizeof(*dma64)) {
> + remove_ddw(pdn, true, name);
> continue;
> }
>
> - window = ddw_list_new_entry(pdn, direct64);
> + window = ddw_list_new_entry(pdn, dma64);
> if (!window)
> break;
>
> @@ -913,6 +910,14 @@ static int find_existing_ddw_windows(void)
> list_add(&window->list, &direct_window_list);
> spin_unlock(&direct_window_list_lock);
> }
> +}
> +
> +static int find_existing_ddw_windows(void)
> +{
> + if (!firmware_has_feature(FW_FEATURE_LPAR))
> + return 0;
> +
> + find_existing_ddw_windows_named(DIRECT64_PROPNAME);
>
> return 0;
> }
>
--
Alexey
On 01/05/2021 02:31, Leonardo Bras wrote:
> Update remove_dma_window() so it can be used to remove DDW with a given
> property name.
>
> This enables the creation of new property names for DDW, so we can
> have different usage for it, like indirect mapping.
>
> Signed-off-by: Leonardo Bras <[email protected]>
Reviewed-by: Alexey Kardashevskiy <[email protected]>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 89cb6e9e9f31..f8922fcf34b6 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -823,31 +823,32 @@ static void remove_dma_window(struct device_node *np, u32 *ddw_avail,
> np, ret, ddw_avail[DDW_REMOVE_PE_DMA_WIN], liobn);
> }
>
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static int remove_ddw(struct device_node *np, bool remove_prop, const char *win_name)
> {
> struct property *win;
> u32 ddw_avail[DDW_APPLICABLE_SIZE];
> int ret = 0;
>
> + win = of_find_property(np, win_name, NULL);
> + if (!win)
> + return -EINVAL;
> +
> ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> &ddw_avail[0], DDW_APPLICABLE_SIZE);
> if (ret)
> - return;
> -
> - win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> - if (!win)
> - return;
> + return 0;
>
> if (win->length >= sizeof(struct dynamic_dma_window_prop))
> remove_dma_window(np, ddw_avail, win);
>
> if (!remove_prop)
> - return;
> + return 0;
>
> ret = of_remove_property(np, win);
> if (ret)
> pr_warn("%pOF: failed to remove direct window property: %d\n",
> np, ret);
> + return 0;
> }
>
> static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, int *window_shift)
> @@ -900,7 +901,7 @@ static int find_existing_ddw_windows(void)
> for_each_node_with_property(pdn, DIRECT64_PROPNAME) {
> direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
> if (!direct64 || len < sizeof(*direct64)) {
> - remove_ddw(pdn, true);
> + remove_ddw(pdn, true, DIRECT64_PROPNAME);
> continue;
> }
>
> @@ -1372,7 +1373,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> win64 = NULL;
>
> out_remove_win:
> - remove_ddw(pdn, true);
> + remove_ddw(pdn, true, DIRECT64_PROPNAME);
>
> out_failed:
> if (default_win_removed)
> @@ -1536,7 +1537,7 @@ static int iommu_reconfig_notifier(struct notifier_block *nb, unsigned long acti
> * we have to remove the property when releasing
> * the device node.
> */
> - remove_ddw(np, false);
> + remove_ddw(np, false, DIRECT64_PROPNAME);
> if (pci && pci->table_group)
> iommu_pseries_free_group(pci->table_group,
> np->full_name);
>
--
Alexey
Hello Alexey, thanks for this feedback!
On Mon, 2021-05-10 at 17:34 +1000, Alexey Kardashevskiy wrote:
>
>
> This does not apply anymore as it conflicts with my 4be518d838809e2135.
ok, rebasing on top of torvalds/master
>
>
> > ---
> >   arch/powerpc/platforms/pseries/iommu.c | 100 ++++++++++++-------------
> > 1 file changed, 50 insertions(+), 50 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index 5a70ecd579b8..89cb6e9e9f31 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -53,6 +53,11 @@ enum {
> > DDW_EXT_QUERY_OUT_SIZE = 2
> > };
> >
> > +#ifdef CONFIG_IOMMU_API
> > +static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned long *tce,
> > +				enum dma_data_direction *direction, bool realmode);
> > +#endif
>
>
> Instead of declaring this so far from the the code which needs it, may
> be add
>
> struct iommu_table_ops iommu_table_lpar_multi_ops;
>
> right before iommu_table_setparms() (as the sctruct is what you
> actually
> want there),
> and you won't need to move iommu_table_pseries_ops as well.
Oh, I was not aware I could declare a variable and then define it
statically.
I mean, it does make sense, but I never thought of that.
I will change that :)
>
>
> > +
> > static struct iommu_table *iommu_pseries_alloc_table(int node)
> > {
> > struct iommu_table *tbl;
> > @@ -501,6 +506,28 @@ static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
> > return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
> > }
> >
> > +static inline void _iommu_table_setparms(struct iommu_table *tbl, unsigned long busno,
>
>
> The underscore is confusing, may be iommu_table_do_setparms()?
> iommu_table_setparms_common()? Not sure. I cannot recall a single
> function with just one leading underscore, I suspect I was pushed back
> when I tried adding one ages ago :) "inline" seems excessive, the
> compiler will probably figure it out anyway.
>
>
sure, done.
>
> > +					 unsigned long liobn, unsigned long win_addr,
> > +					 unsigned long window_size, unsigned long page_shift,
> > +					 unsigned long base, struct iommu_table_ops *table_ops)
>
>
> Make "base" a pointer. Or, better, just keep setting it directly in
> iommu_table_setparms() rather than passing 0 around.
>
> The same comment about "liobn" - set it in iommu_table_setparms_lpar().
> The reviewer will see what field atters in what situation imho.
>
The idea here was to keep the setting of all tbl parameters in
_iommu_table_setparms (or iommu_table_setparms_common).
I understand that each of those fields is optional in the other
case, but should we keep whatever value is present in the other
variable (not zeroing it), or do something like:
tbl->it_index = 0;
tbl->it_base = basep;
(in iommu_table_setparms)
tbl->it_index = liobn;
tbl->it_base = 0;
(in iommu_table_setparms_lpar)
>
> > +{
> > + tbl->it_busno = busno;
> > + tbl->it_index = liobn;
> > + tbl->it_offset = win_addr >> page_shift;
> > + tbl->it_size = window_size >> page_shift;
> > + tbl->it_page_shift = page_shift;
> > + tbl->it_base = base;
> > + tbl->it_blocksize = 16;
> > + tbl->it_type = TCE_PCI;
> > + tbl->it_ops = table_ops;
> > +}
> > +
> > +struct iommu_table_ops iommu_table_pseries_ops = {
> > + .set = tce_build_pSeries,
> > + .clear = tce_free_pSeries,
> > + .get = tce_get_pseries
> > +};
> > +
> > static void iommu_table_setparms(struct pci_controller *phb,
> > struct device_node *dn,
> > struct iommu_table *tbl)
> > @@ -509,8 +536,13 @@ static void iommu_table_setparms(struct
> > pci_controller *phb,
> > const unsigned long *basep;
> > const u32 *sizep;
> >
> > - node = phb->dn;
> > + /* Test if we are going over 2GB of DMA space */
> > + if (phb->dma_window_base_cur + phb->dma_window_size > SZ_2G)
> > {
> > + udbg_printf("PCI_DMA: Unexpected number of IOAs under
> > this PHB.\n");
> > + panic("PCI_DMA: Unexpected number of IOAs under this
> > PHB.\n");
> > + }
> >
> > + node = phb->dn;
> > basep = of_get_property(node, "linux,tce-base", NULL);
> > sizep = of_get_property(node, "linux,tce-size", NULL);
> > if (basep == NULL || sizep == NULL) {
> > @@ -519,33 +551,25 @@ static void iommu_table_setparms(struct
> > pci_controller *phb,
> > return;
> > }
> >
> > - tbl->it_base = (unsigned long)__va(*basep);
> > + _iommu_table_setparms(tbl, phb->bus->number, 0, phb-
> > >dma_window_base_cur,
> > + phb->dma_window_size,
> > IOMMU_PAGE_SHIFT_4K,
> > + (unsigned long)__va(*basep),
> > &iommu_table_pseries_ops);
> >
> > if (!is_kdump_kernel())
> > memset((void *)tbl->it_base, 0, *sizep);
> >
> > - tbl->it_busno = phb->bus->number;
> > - tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> > -
> > - /* Units of tce entries */
> > - tbl->it_offset = phb->dma_window_base_cur >> tbl-
> > >it_page_shift;
> > -
> > - /* Test if we are going over 2GB of DMA space */
> > - if (phb->dma_window_base_cur + phb->dma_window_size >
> > 0x80000000ul) {
> > - udbg_printf("PCI_DMA: Unexpected number of IOAs under
> > this PHB.\n");
> > - panic("PCI_DMA: Unexpected number of IOAs under this
> > PHB.\n");
> > - }
> > -
> > phb->dma_window_base_cur += phb->dma_window_size;
> > -
> > - /* Set the tce table size - measured in entries */
> > - tbl->it_size = phb->dma_window_size >> tbl->it_page_shift;
> > -
> > - tbl->it_index = 0;
> > - tbl->it_blocksize = 16;
> > - tbl->it_type = TCE_PCI;
> > }
> >
> > +struct iommu_table_ops iommu_table_lpar_multi_ops = {
> > + .set = tce_buildmulti_pSeriesLP,
> > +#ifdef CONFIG_IOMMU_API
> > + .xchg_no_kill = tce_exchange_pseries,
> > +#endif
> > + .clear = tce_freemulti_pSeriesLP,
> > + .get = tce_get_pSeriesLP
> > +};
> > +
> > /*
> > * iommu_table_setparms_lpar
> > *
> > @@ -557,28 +581,17 @@ static void iommu_table_setparms_lpar(struct
> > pci_controller *phb,
> > struct iommu_table_group
> > *table_group,
> > const __be32 *dma_window)
> > {
> > - unsigned long offset, size;
> > + unsigned long offset, size, liobn;
> >
> > - of_parse_dma_window(dn, dma_window, &tbl->it_index, &offset,
> > &size);
> > + of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
> >
> > - tbl->it_busno = phb->bus->number;
> > - tbl->it_page_shift = IOMMU_PAGE_SHIFT_4K;
> > - tbl->it_base = 0;
> > - tbl->it_blocksize = 16;
> > - tbl->it_type = TCE_PCI;
> > - tbl->it_offset = offset >> tbl->it_page_shift;
> > - tbl->it_size = size >> tbl->it_page_shift;
> > + _iommu_table_setparms(tbl, phb->bus->number, liobn, offset,
> > size, IOMMU_PAGE_SHIFT_4K, 0,
> > + &iommu_table_lpar_multi_ops);
> >
> > table_group->tce32_start = offset;
> > table_group->tce32_size = size;
> > }
> >
> > -struct iommu_table_ops iommu_table_pseries_ops = {
> > - .set = tce_build_pSeries,
> > - .clear = tce_free_pSeries,
> > - .get = tce_get_pseries
> > -};
> > -
> > static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
> > {
> > struct device_node *dn;
> > @@ -647,7 +660,6 @@ static void pci_dma_bus_setup_pSeries(struct
> > pci_bus *bus)
> > tbl = pci->table_group->tables[0];
> >
> > iommu_table_setparms(pci->phb, dn, tbl);
> > - tbl->it_ops = &iommu_table_pseries_ops;
> > iommu_init_table(tbl, pci->phb->node, 0, 0);
> >
> > /* Divide the rest (1.75GB) among the children */
> > @@ -664,7 +676,7 @@ static int tce_exchange_pseries(struct
> > iommu_table *tbl, long index, unsigned
> > bool realmode)
> > {
> > long rc;
> > - unsigned long ioba = (unsigned long) index << tbl-
> > >it_page_shift;
> > + unsigned long ioba = (unsigned long)index << tbl-
> > >it_page_shift;
>
>
> Unrelated change, why, did checkpatch.pl complain?
My bad, this one could pass my git-add unnoticed.
Reverting.
>
> > unsigned long flags, oldtce = 0;
> > u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> > unsigned long newtce = *tce | proto_tce;
> > @@ -686,15 +698,6 @@ static int tce_exchange_pseries(struct
> > iommu_table *tbl, long index, unsigned
> > }
> > #endif
> >
> > -struct iommu_table_ops iommu_table_lpar_multi_ops = {
> > - .set = tce_buildmulti_pSeriesLP,
> > -#ifdef CONFIG_IOMMU_API
> > - .xchg_no_kill = tce_exchange_pseries,
> > -#endif
> > - .clear = tce_freemulti_pSeriesLP,
> > - .get = tce_get_pSeriesLP
> > -};
> > -
> > static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
> > {
> > struct iommu_table *tbl;
> > @@ -729,7 +732,6 @@ static void pci_dma_bus_setup_pSeriesLP(struct
> > pci_bus *bus)
> > tbl = ppci->table_group->tables[0];
> > iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
> > ppci->table_group, dma_window);
> > - tbl->it_ops = &iommu_table_lpar_multi_ops;
> > iommu_init_table(tbl, ppci->phb->node, 0, 0);
> > iommu_register_group(ppci->table_group,
> > pci_domain_nr(bus), 0);
> > @@ -758,7 +760,6 @@ static void pci_dma_dev_setup_pSeries(struct
> > pci_dev *dev)
> > PCI_DN(dn)->table_group =
> > iommu_pseries_alloc_group(phb->node);
> > tbl = PCI_DN(dn)->table_group->tables[0];
> > iommu_table_setparms(phb, dn, tbl);
> > - tbl->it_ops = &iommu_table_pseries_ops;
> > iommu_init_table(tbl, phb->node, 0, 0);
> > set_iommu_table_base(&dev->dev, tbl);
> > return;
> > @@ -1436,7 +1437,6 @@ static void
> > pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> > tbl = pci->table_group->tables[0];
> > iommu_table_setparms_lpar(pci->phb, pdn, tbl,
> > pci->table_group, dma_window);
> > - tbl->it_ops = &iommu_table_lpar_multi_ops;
> > iommu_init_table(tbl, pci->phb->node, 0, 0);
> > iommu_register_group(pci->table_group,
> > pci_domain_nr(pci->phb->bus), 0);
> >
>
Best regards,
Leonardo Bras
On Tue, 2021-05-11 at 17:57 +1000, Alexey Kardashevskiy wrote:
>
>
> On 01/05/2021 02:31, Leonardo Bras wrote:
> > [...]
> > pmem_present = dn != NULL;
> > @@ -1218,8 +1224,12 @@ static bool enable_ddw(struct pci_dev *dev,
> > struct device_node *pdn)
> >
> > mutex_lock(&direct_window_init_mutex);
> >
> > - if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset,
> > &len))
> > - goto out_unlock;
> > + if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset,
> > &len)) {
> > + direct_mapping = (len >= max_ram_len);
> > +
> > + mutex_unlock(&direct_window_init_mutex);
> > + return direct_mapping;
>
> Does not this break the existing case when direct_mapping==true by
> skipping setting dev->dev.bus_dma_limit before returning?
>
Yes, it does. Good catch!
I changed it to use a flag instead of win64 for return, and now I can
use the same success exit path for both the new config and the config
found in the list. (out_unlock)
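The single-exit rework described here might be sketched like this (a standalone mock, not the real enable_ddw(); the limit condition is simplified, and the real code also considers persistent memory):

```c
#include <stdbool.h>

/* Simplified stand-ins for the device fields touched by enable_ddw(). */
struct dev_mock {
	unsigned long long dma_offset;
	unsigned long long bus_dma_limit;
};

/*
 * Both the "found in list" path and the "newly created" path fall through
 * to the same out_unlock label, so bus_dma_limit is set in one place and
 * an early return can no longer skip it.
 */
static bool enable_ddw_sketch(struct dev_mock *dev, bool found_in_list,
			      unsigned int len, unsigned int max_ram_len)
{
	bool direct_mapping = false;

	if (found_in_list) {
		direct_mapping = (len >= max_ram_len);
		goto out_unlock;
	}

	/* ... query, create window, map TCEs or set up a new table ... */
	direct_mapping = (len >= max_ram_len);

out_unlock:
	/* Simplified: the real check also requires pmem to be present. */
	if (direct_mapping && len == max_ram_len)
		dev->bus_dma_limit = dev->dma_offset + (1ULL << len);
	return direct_mapping;
}
```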
>
>
> > + }
> >
> > /*
> > * If we already went through this for a previous function of
> > @@ -1298,7 +1308,6 @@ static bool enable_ddw(struct pci_dev *dev,
> > struct device_node *pdn)
> > goto out_failed;
> > }
> > /* verify the window * number of ptes will map the partition
> > */
> > - /* check largest block * page size > max memory hotplug addr
> > */
> > /*
> > * The "ibm,pmemory" can appear anywhere in the address
> > space.
> > * Assuming it is still backed by page structs, try
> > MAX_PHYSMEM_BITS
> > @@ -1320,6 +1329,17 @@ static bool enable_ddw(struct pci_dev *dev,
> > struct device_node *pdn)
> > 1ULL << len,
> > query.largest_available_block,
> > 1ULL << page_shift);
> > +
> > + len = order_base_2(query.largest_available_block <<
> > page_shift);
> > + win_name = DMA64_PROPNAME;
>
> [1] ....
>
>
> > + } else {
> > + direct_mapping = true;
> > + win_name = DIRECT64_PROPNAME;
> > + }
> > +
> > + /* DDW + IOMMU on single window may fail if there is any
> > allocation */
> > + if (default_win_removed && !direct_mapping &&
> > iommu_table_in_use(tbl)) {
> > + dev_dbg(&dev->dev, "current IOMMU table in use, can't
> > be replaced.\n");
>
>
> ... remove !direct_mapping and move to [1]?
sure, done!
>
>
> > goto out_failed;
> > }
> >
> > @@ -1331,8 +1351,7 @@ static bool enable_ddw(struct pci_dev *dev,
> > struct device_node *pdn)
> > create.liobn, dn);
> >
> > win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> > - win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn,
> > win_addr,
> > - page_shift, len);
> > + win64 = ddw_property_create(win_name, create.liobn, win_addr,
> > page_shift, len);
> > if (!win64) {
> > dev_info(&dev->dev,
> > "couldn't allocate property, property name,
> > or value\n");
> > @@ -1350,12 +1369,47 @@ static bool enable_ddw(struct pci_dev *dev,
> > struct device_node *pdn)
> > if (!window)
> > goto out_del_prop;
> >
> > - ret = walk_system_ram_range(0, memblock_end_of_DRAM() >>
> > PAGE_SHIFT,
> > - win64->value,
> > tce_setrange_multi_pSeriesLP_walk);
> > - if (ret) {
> > - dev_info(&dev->dev, "failed to map direct window for
> > %pOF: %d\n",
> > - dn, ret);
> > - goto out_del_list;
> > + if (direct_mapping) {
> > + /* DDW maps the whole partition, so enable direct DMA
> > mapping */
> > + ret = walk_system_ram_range(0, memblock_end_of_DRAM()
> > >> PAGE_SHIFT,
> > + win64->value,
> > tce_setrange_multi_pSeriesLP_walk);
> > + if (ret) {
> > + dev_info(&dev->dev, "failed to map direct
> > window for %pOF: %d\n",
> > + dn, ret);
> > + goto out_del_list;
> > + }
> > + } else {
> > + struct iommu_table *newtbl;
> > + int i;
> > +
> > + /* New table for using DDW instead of the default DMA
> > window */
> > + newtbl = iommu_pseries_alloc_table(pci->phb->node);
> > + if (!newtbl) {
> > + dev_dbg(&dev->dev, "couldn't create new IOMMU
> > table\n");
> > + goto out_del_list;
> > + }
> > +
> > + for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources);
> > i++) {
> > + const unsigned long mask = IORESOURCE_MEM_64
> > | IORESOURCE_MEM;
> > +
> > + /* Look for MMIO32 */
> > + if ((pci->phb->mem_resources[i].flags & mask)
> > == IORESOURCE_MEM)
> > + break;
>
> What if there is no IORESOURCE_MEM? pci->phb->mem_resources[i].start
> below will have garbage.
Yeah, that makes sense. I will add these lines after the 'for':
if (i == ARRAY_SIZE(pci->phb->mem_resources)) {
iommu_tce_table_put(newtbl);
goto out_del_list;
}
What do you think?
>
>
> > + }
> > +
> > + _iommu_table_setparms(newtbl, pci->phb->bus->number,
> > create.liobn, win_addr,
> > + 1UL << len, page_shift, 0,
> > &iommu_table_lpar_multi_ops);
> > + iommu_init_table(newtbl, pci->phb->node, pci->phb-
> > >mem_resources[i].start,
> > + pci->phb->mem_resources[i].end);
> > +
> > + if (default_win_removed)
> > + iommu_tce_table_put(tbl);
>
>
> iommu_tce_table_put() should have been called when the window was
> removed.
>
> Also after some thinking - what happens if there were 2 devices in the
> PE and one requested 64bit DMA? This will only update
> set_iommu_table_base() for the 64bit one but not for the other.
>
> I think the right thing to do is:
>
> 1. check if table[0] is in use, if yes => fail (which this does
> already)
>
> 2. remove default dma window but keep the iommu_table struct with one
> change - set it_size to 0 (and free it_map) so the 32bit device won't
> look at a stale structure and think there is some window (imaginary
> situation for phyp but easy to recreate in qemu).
>
> 3. use table[1] for newly created indirect DDW window.
>
> 4. change get_iommu_table_base() to return a usable table (or may be
> not
> needed?).
>
> If this sounds reasonable (does it?),
Looks ok, I will try your suggestion.
I was not aware of how pci->table_group->tables[] worked, so I replaced
pci->table_group->tables[0] with the new tbl, while moving the older one
to pci->table_group->tables[1].
(4) get_iommu_table_base() does not seem to need updating, as it returns
the tbl set by set_iommu_table_base(), which is already called in the
!direct_mapping path in current patch.
> the question is now if you have
> time to do that and the hardware to test that, or I'll have to finish
> the work :)
Sorry, for some reason part of this got lost in the Evolution mail client.
If possible, I do want to finish this work, and I am talking to IBM
Virt people in order to get testing HW.
>
>
> > + else
> > + pci->table_group->tables[1] = tbl;
>
>
> What is this for?
I was thinking of adding the older table to pci->table_group->tables[1]
while keeping the newer table on pci->table_group->tables[0].
This did work, but I think your suggestion may work better.
Best regards,
Leonardo Bras
Hello Alexey,
On Fri, 2021-06-18 at 19:26 -0300, Leonardo Brás wrote:
> >
> > > + unsigned long liobn,
> > > unsigned long win_addr,
> > > + unsigned long
> > > window_size,
> > > unsigned long page_shift,
> > > + unsigned long base,
> > > struct
> > > iommu_table_ops *table_ops)
> >
> >
> > iommu_table_setparms() rather than passing 0 around.
> >
> > The same comment about "liobn" - set it in
> > iommu_table_setparms_lpar().
> > The reviewer will see what field atters in what situation imho.
> >
>
> The idea here was to keep all tbl parameters setting in
> _iommu_table_setparms (or iommu_table_setparms_common).
>
> I understand the idea that each one of those is optional in the other
> case, but should we keep whatever value is present in the other
> variable (not zeroing the other variable), or do something like:
>
> tbl->it_index = 0;
> tbl->it_base = basep;
> (in iommu_table_setparms)
>
> tbl->it_index = liobn;
> tbl->it_base = 0;
> (in iommu_table_setparms_lpar)
>
This one is supposed to be a question, but I missed the question mark.
Sorry about that.
I would like to get your opinion on this :)
On 13/07/2021 14:47, Leonardo Brás wrote:
> Hello Alexey,
>
> On Fri, 2021-06-18 at 19:26 -0300, Leonardo Brás wrote:
>>>
>>>> + unsigned long liobn,
>>>> unsigned long win_addr,
>>>> + unsigned long
>>>> window_size,
>>>> unsigned long page_shift,
>>>> + unsigned long base,
>>>> struct
>>>> iommu_table_ops *table_ops)
>>>
>>>
>>> iommu_table_setparms() rather than passing 0 around.
>>>
>>> The same comment about "liobn" - set it in
>>> iommu_table_setparms_lpar().
>>> The reviewer will see what field matters in what situation imho.
>>>
>>
>> The idea here was to keep all tbl parameters setting in
>> _iommu_table_setparms (or iommu_table_setparms_common).
>>
>> I understand the idea that each one of those is optional in the other
>> case, but should we keep whatever value is present in the other
>> variable (not zeroing the other variable), or do something like:
>>
>> tbl->it_index = 0;
>> tbl->it_base = basep;
>> (in iommu_table_setparms)
>>
>> tbl->it_index = liobn;
>> tbl->it_base = 0;
>> (in iommu_table_setparms_lpar)
>>
>
> This one is supposed to be a question, but I missed the question mark.
> Sorry about that.
Ah ok :)
> I would like to get your opinion on this :)
Besides making the "base" parameter a pointer, I really do not have
strong preference, just make it not hurt the eyes of a reader, that's all :)
imho in general, rather than answering 5 weeks later, it is more
productive to address whatever comments were made, add comments (in the
code or commit logs) explaining why you are sticking to your initial approach,
rebase and repost the whole thing. Thanks,
--
Alexey
On 13/07/2021 14:36, Leonardo Brás wrote:
> On Tue, 2021-05-11 at 17:57 +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 01/05/2021 02:31, Leonardo Bras wrote:
>>> [...]
>>> pmem_present = dn != NULL;
>>> @@ -1218,8 +1224,12 @@ static bool enable_ddw(struct pci_dev *dev,
>>> struct device_node *pdn)
>>>
>>> mutex_lock(&direct_window_init_mutex);
>>>
>>> - if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset,
>>> &len))
>>> - goto out_unlock;
>>> + if (find_existing_ddw(pdn, &dev->dev.archdata.dma_offset,
>>> &len)) {
>>> + direct_mapping = (len >= max_ram_len);
>>> +
>>> + mutex_unlock(&direct_window_init_mutex);
>>> + return direct_mapping;
>>
>> Does not this break the existing case when direct_mapping==true by
>> skipping setting dev->dev.bus_dma_limit before returning?
>>
>
> Yes, it does. Good catch!
> I changed it to use a flag instead of win64 for return, and now I can
> use the same success exit path for both the new config and the config
> found in the list. (out_unlock)
>
>>
>>
>>> + }
>>>
>>> /*
>>> * If we already went through this for a previous function of
>>> @@ -1298,7 +1308,6 @@ static bool enable_ddw(struct pci_dev *dev,
>>> struct device_node *pdn)
>>> goto out_failed;
>>> }
>>> /* verify the window * number of ptes will map the partition
>>> */
>>> - /* check largest block * page size > max memory hotplug addr
>>> */
>>> /*
>>> * The "ibm,pmemory" can appear anywhere in the address
>>> space.
>>> * Assuming it is still backed by page structs, try
>>> MAX_PHYSMEM_BITS
>>> @@ -1320,6 +1329,17 @@ static bool enable_ddw(struct pci_dev *dev,
>>> struct device_node *pdn)
>>> 1ULL << len,
>>> query.largest_available_block,
>>> 1ULL << page_shift);
>>> +
>>> + len = order_base_2(query.largest_available_block <<
>>> page_shift);
>>> + win_name = DMA64_PROPNAME;
>>
>> [1] ....
>>
>>
>>> + } else {
>>> + direct_mapping = true;
>>> + win_name = DIRECT64_PROPNAME;
>>> + }
>>> +
>>> + /* DDW + IOMMU on single window may fail if there is any
>>> allocation */
>>> + if (default_win_removed && !direct_mapping &&
>>> iommu_table_in_use(tbl)) {
>>> + dev_dbg(&dev->dev, "current IOMMU table in use, can't
>>> be replaced.\n");
>>
>>
>> ... remove !direct_mapping and move to [1]?
>
>
> sure, done!
>
>>
>>
>>> goto out_failed;
>>> }
>>>
>>> @@ -1331,8 +1351,7 @@ static bool enable_ddw(struct pci_dev *dev,
>>> struct device_node *pdn)
>>> create.liobn, dn);
>>>
>>> win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
>>> - win64 = ddw_property_create(DIRECT64_PROPNAME, create.liobn,
>>> win_addr,
>>> - page_shift, len);
>>> + win64 = ddw_property_create(win_name, create.liobn, win_addr,
>>> page_shift, len);
>>> if (!win64) {
>>> dev_info(&dev->dev,
>>> "couldn't allocate property, property name,
>>> or value\n");
>>> @@ -1350,12 +1369,47 @@ static bool enable_ddw(struct pci_dev *dev,
>>> struct device_node *pdn)
>>> if (!window)
>>> goto out_del_prop;
>>>
>>> - ret = walk_system_ram_range(0, memblock_end_of_DRAM() >>
>>> PAGE_SHIFT,
>>> - win64->value,
>>> tce_setrange_multi_pSeriesLP_walk);
>>> - if (ret) {
>>> - dev_info(&dev->dev, "failed to map direct window for
>>> %pOF: %d\n",
>>> - dn, ret);
>>> - goto out_del_list;
>>> + if (direct_mapping) {
>>> + /* DDW maps the whole partition, so enable direct DMA
>>> mapping */
>>> + ret = walk_system_ram_range(0, memblock_end_of_DRAM()
>>> >> PAGE_SHIFT,
>>> + win64->value,
>>> tce_setrange_multi_pSeriesLP_walk);
>>> + if (ret) {
>>> + dev_info(&dev->dev, "failed to map direct
>>> window for %pOF: %d\n",
>>> + dn, ret);
>>> + goto out_del_list;
>>> + }
>>> + } else {
>>> + struct iommu_table *newtbl;
>>> + int i;
>>> +
>>> + /* New table for using DDW instead of the default DMA
>>> window */
>>> + newtbl = iommu_pseries_alloc_table(pci->phb->node);
>>> + if (!newtbl) {
>>> + dev_dbg(&dev->dev, "couldn't create new IOMMU
>>> table\n");
>>> + goto out_del_list;
>>> + }
>>> +
>>> + for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources);
>>> i++) {
>>> + const unsigned long mask = IORESOURCE_MEM_64
>>> | IORESOURCE_MEM;
>>> +
>>> + /* Look for MMIO32 */
>>> + if ((pci->phb->mem_resources[i].flags & mask)
>>> == IORESOURCE_MEM)
>>> + break;
>>
>> What if there is no IORESOURCE_MEM? pci->phb->mem_resources[i].start
>> below will have garbage.
>
>
>
> Yeah, that makes sense. I will add these lines after the 'for':
>
> if (i == ARRAY_SIZE(pci->phb->mem_resources)) {
> iommu_tce_table_put(newtbl);
> goto out_del_list;
> }
>
> What do you think?
Move this and that "for" before iommu_pseries_alloc_table() so you won't
need to free if there is no IORESOURCE_MEM.
>
>
>>
>>
>>> + }
>>> +
>>> + _iommu_table_setparms(newtbl, pci->phb->bus->number,
>>> create.liobn, win_addr,
>>> + 1UL << len, page_shift, 0,
>>> &iommu_table_lpar_multi_ops);
>>> + iommu_init_table(newtbl, pci->phb->node, pci->phb-
>>> >mem_resources[i].start,
>>> + pci->phb->mem_resources[i].end);
>>> +
>>> + if (default_win_removed)
>>> + iommu_tce_table_put(tbl);
>>
>>
>> iommu_tce_table_put() should have been called when the window was
>> removed.
>>
>> Also after some thinking - what happens if there were 2 devices in the
>> PE and one requested 64bit DMA? This will only update
>> set_iommu_table_base() for the 64bit one but not for the other.
>>
>> I think the right thing to do is:
>>
>> 1. check if table[0] is in use, if yes => fail (which this does
>> already)
>>
>> 2. remove default dma window but keep the iommu_table struct with one
>> change - set it_size to 0 (and free it_map) so the 32bit device won't
>> look at a stale structure and think there is some window (imaginery
>> situation for phyp but easy to recreate in qemu).
>>
>> 3. use table[1] for newly created indirect DDW window.
>>
>> 4. change get_iommu_table_base() to return a usable table (or may be
>> not
>> needed?).
>>
>> If this sounds reasonable (does it?),
>
> Looks ok, I will try your suggestion.
> I was not aware of how pci->table_group->tables[] worked, so I replaced
> pci->table_group->tables[0] with the new tbl, while moving the older one to
> pci->table_group->tables[1].
pci->table_group->tables[0] is window#0 at @0.
pci->table_group->tables[1] is window#1 at 0x0800.0000.0000.0000. That
is all :)
pseries does not use tables[1] but powernv does (by VFIO only though).
> (4) get_iommu_table_base() does not seem to need updating, as it returns
> the tbl set by set_iommu_table_base(), which is already called in the
> !direct_mapping path in current patch.
Sounds right.
>
>> the question is now if you have
>> time to do that and the hardware to test that, or I'll have to finish
>> the work :)
>
> Sorry, for some reason part of this got lost in the Evolution mail client.
>
> If possible, I do want to finish this work, and I am talking to IBM
> Virt people in order to get testing HW.
Even I struggle to find a powervm machine :)
>>
>>
>>> + else
>>> + pci->table_group->tables[1] = tbl;
>>
>>
>> What is this for?
>
> I was thinking of adding the older table to pci->table_group->tables[1]
> while keeping the newer table on pci->table_group->tables[0].
> This did work, but I think your suggestion may work better.
>
> Best regards,
> Leonardo Bras
>
>
--
Alexey
On Wed, 2021-07-14 at 18:32 +1000, Alexey Kardashevskiy wrote:
>
>
> On 13/07/2021 14:47, Leonardo Brás wrote:
> > Hello Alexey,
> >
> > On Fri, 2021-06-18 at 19:26 -0300, Leonardo Brás wrote:
> > > >
> > > > > + unsigned long liobn,
> > > > > unsigned long win_addr,
> > > > > + unsigned long
> > > > > window_size,
> > > > > unsigned long page_shift,
> > > > > + unsigned long base,
> > > > > struct
> > > > > iommu_table_ops *table_ops)
> > > >
> > > >
> > > > iommu_table_setparms() rather than passing 0 around.
> > > >
> > > > The same comment about "liobn" - set it in
> > > > iommu_table_setparms_lpar().
> > > > The reviewer will see what field matters in what situation imho.
> > > >
> > >
> > > The idea here was to keep all tbl parameters setting in
> > > _iommu_table_setparms (or iommu_table_setparms_common).
> > >
> > > I understand the idea that each one of those is optional in the
> > > other
> > > case, but should we keep whatever value is present in the other
> > > variable (not zeroing the other variable), or do something like:
> > >
> > > tbl->it_index = 0;
> > > tbl->it_base = basep;
> > > (in iommu_table_setparms)
> > >
> > > tbl->it_index = liobn;
> > > tbl->it_base = 0;
> > > (in iommu_table_setparms_lpar)
> > >
> >
> > This one is supposed to be a question, but I missed the question
> > mark.
> > Sorry about that.
>
> Ah ok :)
>
> > I would like to get your opinion on this :)
>
> Besides making the "base" parameter a pointer, I really do not have
> strong preference, just make it not hurt the eyes of a reader, that's
> all :)
Ok, done :)
> imho in general, rather than answering 5 weeks later, it is more
> productive to address whatever comments were made, add comments (in
> the
> code or commit logs) explaining why you are sticking to your initial approach,
> rebase and repost the whole thing. Thanks,
Thanks for the tip, and for the reviewing :)
Best regards,
Leonardo Bras
On Wed, 2021-07-14 at 18:38 +1000, Alexey Kardashevskiy wrote:
> > for (i = 0; i <
> > > > ARRAY_SIZE(pci->phb->mem_resources);
> > > > i++) {
> > > > + const unsigned long mask =
> > > > IORESOURCE_MEM_64
> > > > | IORESOURCE_MEM;
> > > > +
> > > > + /* Look for MMIO32 */
> > > > + if ((pci->phb->mem_resources[i].flags &
> > > > mask)
> > > > == IORESOURCE_MEM)
> > > > + break;
> > >
> > > What if there is no IORESOURCE_MEM? pci->phb-
> > > >mem_resources[i].start
> > > below will have garbage.
> >
> >
> >
> > Yeah, that makes sense. I will add these lines after the 'for':
> >
> > if (i == ARRAY_SIZE(pci->phb->mem_resources)) {
> > iommu_tce_table_put(newtbl);
> > goto out_del_list;
> > }
> >
> > What do you think?
>
>
> Move this and that "for" before iommu_pseries_alloc_table() so you
> won't
> need to free if there is no IORESOURCE_MEM.
Done!
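The agreed ordering can be sketched standalone (flag values mirror include/linux/ioport.h; the struct is a simplified stand-in for pci->phb->mem_resources, not the kernel's struct resource):

```c
#include <stddef.h>

/* Flag values mirror include/linux/ioport.h. */
#define IORESOURCE_MEM		0x00000200
#define IORESOURCE_MEM_64	0x00100000

struct resource_mock {
	unsigned long flags;
	unsigned long start;
	unsigned long end;
};

/*
 * Scan for a 32-bit MMIO window before allocating anything: a resource
 * with IORESOURCE_MEM set but IORESOURCE_MEM_64 clear. Returning -1 lets
 * the caller bail out before iommu_pseries_alloc_table(), so there is
 * nothing to free on the error path.
 */
static int find_mmio32(const struct resource_mock *res, size_t n)
{
	const unsigned long mask = IORESOURCE_MEM_64 | IORESOURCE_MEM;
	size_t i;

	for (i = 0; i < n; i++)
		if ((res[i].flags & mask) == IORESOURCE_MEM)
			return (int)i;
	return -1;
}
```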
>
>
> >
> >
> > >
> > >
> > > > + }
> > > > +
> > > > + _iommu_table_setparms(newtbl, pci->phb->bus-
> > > > >number,
> > > > create.liobn, win_addr,
> > > > + 1UL << len, page_shift,
> > > > 0,
> > > > &iommu_table_lpar_multi_ops);
> > > > + iommu_init_table(newtbl, pci->phb->node, pci-
> > > > >phb-
> > > > >mem_resources[i].start,
> > > > + pci->phb-
> > > > >mem_resources[i].end);
> > > > +
> > > > + if (default_win_removed)
> > > > + iommu_tce_table_put(tbl);
> > >
> > >
> > > iommu_tce_table_put() should have been called when the window was
> > > removed.
> > >
> > > Also after some thinking - what happens if there were 2 devices
> > > in the
> > > PE and one requested 64bit DMA? This will only update
> > > set_iommu_table_base() for the 64bit one but not for the other.
> > >
> > > I think the right thing to do is:
> > >
> > > 1. check if table[0] is in use, if yes => fail (which this does
> > > already)
> > >
> > > 2. remove default dma window but keep the iommu_table struct with
> > > one
> > > change - set it_size to 0 (and free it_map) so the 32bit device
> > > won't
> > > look at a stale structure and think there is some window
> > > (imaginary
> > > situation for phyp but easy to recreate in qemu).
> > >
> > > 3. use table[1] for newly created indirect DDW window.
> > >
> > > 4. change get_iommu_table_base() to return a usable table (or may
> > > be
> > > not
> > > needed?).
> > >
> > > If this sounds reasonable (does it?),
> >
> > Looks ok, I will try your suggestion.
> > I was not aware of how pci->table_group->tables[] worked, so I
> > replaced
> > pci->table_group->tables[0] with the new tbl, while moving the
> > older one to
> > pci->table_group->tables[1].
>
>
> pci->table_group->tables[0] is window#0 at @0.
> pci->table_group->tables[1] is window#1 at 0x0800.0000.0000.0000.
> That
> is all :)
>
> pseries does not use tables[1] but powernv does (by VFIO only
> though).
Thanks! This helped a lot!
>
>
> > (4) get_iommu_table_base() does not seem to need updating, as it
> > returns
> > the tbl set by set_iommu_table_base(), which is already called in
> > the
> > !direct_mapping path in current patch.
>
> Sounds right.
>
> >
> > > the question is now if you have
> > > time to do that and the hardware to test that, or I'll have to
> > > finish
> > > the work :)
> >
> > Sorry, for some reason part of this got lost in the Evolution mail
> > client.
> >
> > If possible, I do want to finish this work, and I am talking to IBM
> > Virt people in order to get testing HW.
>
>
> Even I struggle to find a powervm machine :)
>
> > >
> > >
> > > > + else
> > > > + pci->table_group->tables[1] = tbl;
> > >
> > >
> > > What is this for?
> >
> > I was thinking of adding the older table to pci->table_group-
> > >tables[1]
> > while keeping the newer table on pci->table_group->tables[0].
> > This did work, but I think your suggestion may work better.
> >
> > Best regards,
> > Leonardo Bras
> >
> >
>
Thanks a lot for reviewing Alexey!
I will send a v5 soon.
Best regards,
Leonardo Bras