2020-06-19 05:09:38

by Leonardo Brás

[permalink] [raw]
Subject: [PATCH 0/4] Remove default DMA window before creating DDW

There are some devices that only allow 1 DMA window to exist at a time,
and in those cases, a DDW is never created to them, since the default DMA
window keeps using this resource.

LoPAR recommends this procedure:
1. Remove the default DMA window,
2. Query for which configs the DDW can be created,
3. Create a DDW.

Patch #1:
- After LoPAR level 2.8, there is an extension that can make
ibm,query-pe-dma-windows to have 6 outputs instead of 5. This changes the
order of the outputs, and that can cause some trouble.
- query_ddw() was updated to check how many outputs the
ibm,query-pe-dma-windows is supposed to have, update the rtas_call() and
deal correctly with the outputs in both cases.
- This patch looks somehow unrelated to the series, but it can avoid future
problems on DDW creation.

Patch #2 implements a new rtas call to recover the default DMA window,
in case anything fails after it was removed, and a DDW couldn't be created.

Patch #3 moves the window-removing code from remove_ddw() to
remove_dma_window(), creating a way to delete any DMA window, so it can be
used to delete the default DMA window.

Patch #4 makes use of the remove_dma_window() from patch #3 to remove the
default DMA window before query_ddw() and the rtas call from patch #2
to recover it if something goes wrong.

All patches were tested into an LPAR with an Ethernet VF:
4005:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family
[ConnectX-4 Virtual Function]

Leonardo Bras (4):
powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call
powerpc/pseries/iommu: Move window-removing part of remove_ddw into
remove_dma_window
powerpc/pseries/iommu: Remove default DMA window before creating DDW

arch/powerpc/platforms/pseries/iommu.c | 163 +++++++++++++++++++------
1 file changed, 127 insertions(+), 36 deletions(-)

--
2.25.4


2020-06-19 05:09:44

by Leonardo Brás

[permalink] [raw]
Subject: [PATCH 1/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows

From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
outputs from "ibm,query-pe-dma-windows" go from 5 to 6.

This change of output size is meant to expand the address size of
largest_available_block PE TCE from 32-bit to 64-bit, which ends up
shifting page_size and migration_capable.

This ends up requiring the update of
ddw_query_response->largest_available_block from u32 to u64, and manually
assigning the values from the buffer into this struct, according to
output size.

Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 57 +++++++++++++++++++++-----
1 file changed, 46 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 6d47b4a3ce39..e5a617738c8b 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -334,7 +334,7 @@ struct direct_window {
/* Dynamic DMA Window support */
struct ddw_query_response {
u32 windows_available;
- u32 largest_available_block;
+ u64 largest_available_block;
u32 page_size;
u32 migration_capable;
};
@@ -869,14 +869,32 @@ static int find_existing_ddw_windows(void)
}
machine_arch_initcall(pseries, find_existing_ddw_windows);

+/*
+ * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many output
+ * parameters ibm,query-pe-dma-windows will have, ranging from 5 to 6.
+ */
+
+static int query_ddw_out_sz(struct device_node *par_dn)
+{
+ int ret;
+ u32 ddw_ext[3];
+
+ ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
+ &ddw_ext[0], 3);
+ if (ret || ddw_ext[0] < 2 || ddw_ext[2] != 1)
+ return 5;
+ return 6;
+}
+
static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
- struct ddw_query_response *query)
+ struct ddw_query_response *query,
+ struct device_node *par_dn)
{
struct device_node *dn;
struct pci_dn *pdn;
- u32 cfg_addr;
+ u32 cfg_addr, query_out[5];
u64 buid;
- int ret;
+ int ret, out_sz;

/*
* Get the config address and phb buid of the PE window.
@@ -888,12 +906,29 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
pdn = PCI_DN(dn);
buid = pdn->phb->buid;
cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
+ out_sz = query_ddw_out_sz(par_dn);
+
+ ret = rtas_call(ddw_avail[0], 3, out_sz, query_out,
+ cfg_addr, BUID_HI(buid), BUID_LO(buid));
+ dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned %d\n",
+ ddw_avail[0], cfg_addr, BUID_HI(buid), BUID_LO(buid), ret);
+
+ switch (out_sz) {
+ case 5:
+ query->windows_available = query_out[0];
+ query->largest_available_block = query_out[1];
+ query->page_size = query_out[2];
+ query->migration_capable = query_out[3];
+ break;
+ case 6:
+ query->windows_available = query_out[0];
+ query->largest_available_block = ((u64)query_out[1] << 32) |
+ query_out[2];
+ query->page_size = query_out[3];
+ query->migration_capable = query_out[4];
+ break;
+ }

- ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
- cfg_addr, BUID_HI(buid), BUID_LO(buid));
- dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
- " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
- BUID_LO(buid), ret);
return ret;
}

@@ -1040,7 +1075,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
* of page sizes: supported and supported for migrate-dma.
*/
dn = pci_device_to_OF_node(dev);
- ret = query_ddw(dev, ddw_avail, &query);
+ ret = query_ddw(dev, ddw_avail, &query, pdn);
if (ret != 0)
goto out_failed;

@@ -1068,7 +1103,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
/* check largest block * page size > max memory hotplug addr */
max_addr = ddw_memory_hotplug_max();
if (query.largest_available_block < (max_addr >> page_shift)) {
- dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
+ dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
"%llu-sized pages\n", max_addr, query.largest_available_block,
1ULL << page_shift);
goto out_failed;
--
2.25.4

2020-06-19 05:10:42

by Leonardo Brás

[permalink] [raw]
Subject: [PATCH 2/4] powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call

Platforms supporting the DDW option starting with LoPAR level 2.7 implement
ibm,ddw-extensions. The first extension available (index 2) carries the
token for ibm,reset-pe-dma-windows rtas call, which is used to restore
the default DMA window for a device, if it has been deleted.

It does so by resetting the TCE table allocation for the PE to it's
boot time value, available in "ibm,dma-window" device tree node.

Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 33 ++++++++++++++++++++++++++
1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e5a617738c8b..5e1fbc176a37 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1012,6 +1012,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
return max_addr;
}

+/*
+ * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
+ * ibm,ddw-extensions, which carries the rtas token for
+ * ibm,reset-pe-dma-windows.
+ * That rtas-call can be used to restore the default DMA window for the device.
+ */
+static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
+{
+ int ret;
+ u32 cfg_addr, ddw_ext[3];
+ u64 buid;
+ struct device_node *dn;
+ struct pci_dn *pdn;
+
+ ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
+ &ddw_ext[0], 3);
+ if (ret)
+ return;
+
+ dn = pci_device_to_OF_node(dev);
+ pdn = PCI_DN(dn);
+ buid = pdn->phb->buid;
+ cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
+
+ ret = rtas_call(ddw_ext[1], 3, 1, NULL, cfg_addr,
+ BUID_HI(buid), BUID_LO(buid));
+ if (ret)
+ dev_info(&dev->dev,
+ "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
+ ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
+ ret);
+}
+
/*
* If the PE supports dynamic dma windows, and there is space for a table
* that can map all pages in a linear offset, then setup such a table,
--
2.25.4

2020-06-19 05:10:57

by Leonardo Brás

[permalink] [raw]
Subject: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

Move the window-removing part of remove_ddw into a new function
(remove_dma_window), so it can be used to remove other DMA windows.

It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
property, like the default DMA window from the device, which uses
"ibm,dma-window".

Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 53 +++++++++++++++-----------
1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 5e1fbc176a37..de633f6ae093 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -767,25 +767,14 @@ static int __init disable_ddw_setup(char *str)

early_param("disable_ddw", disable_ddw_setup);

-static void remove_ddw(struct device_node *np, bool remove_prop)
+static void remove_dma_window(struct device_node *pdn, u32 *ddw_avail,
+ struct property *win)
{
struct dynamic_dma_window_prop *dwp;
- struct property *win64;
- u32 ddw_avail[3];
u64 liobn;
- int ret = 0;
-
- ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
- &ddw_avail[0], 3);
-
- win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
- if (!win64)
- return;
-
- if (ret || win64->length < sizeof(*dwp))
- goto delprop;
+ int ret;

- dwp = win64->value;
+ dwp = win->value;
liobn = (u64)be32_to_cpu(dwp->liobn);

/* clear the whole window, note the arg is in kernel pages */
@@ -793,24 +782,44 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
if (ret)
pr_warn("%pOF failed to clear tces in window.\n",
- np);
+ pdn);
else
pr_debug("%pOF successfully cleared tces in window.\n",
- np);
+ pdn);

ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
if (ret)
pr_warn("%pOF: failed to remove direct window: rtas returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
- np, ret, ddw_avail[2], liobn);
+ pdn, ret, ddw_avail[2], liobn);
else
pr_debug("%pOF: successfully removed direct window: rtas returned "
"%d to ibm,remove-pe-dma-window(%x) %llx\n",
- np, ret, ddw_avail[2], liobn);
+ pdn, ret, ddw_avail[2], liobn);
+}
+
+static void remove_ddw(struct device_node *np, bool remove_prop)
+{
+ struct property *win;
+ u32 ddw_avail[3];
+ int ret = 0;
+
+ ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
+ &ddw_avail[0], 3);
+ if (ret)
+ return;
+
+ win = of_find_property(np, DIRECT64_PROPNAME, NULL);
+ if (!win)
+ return;
+
+ if (win->length >= sizeof(struct dynamic_dma_window_prop))
+ remove_dma_window(np, ddw_avail, win);
+
+ if (!remove_prop)
+ return;

-delprop:
- if (remove_prop)
- ret = of_remove_property(np, win64);
+ ret = of_remove_property(np, win);
if (ret)
pr_warn("%pOF: failed to remove direct window property: %d\n",
np, ret);
--
2.25.4

2020-06-19 05:13:32

by Leonardo Brás

[permalink] [raw]
Subject: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW

On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
default DMA window for the device, before attempting to configure a DDW,
in order to make the maximum resources available for the next DDW to be
created.

This is a requirement for some devices to use DDW, given they only
allow one DMA window.

If setting up a new DDW fails anywhere after the removal of this
default DMA window, restore it using reset_dma_window.

Signed-off-by: Leonardo Bras <[email protected]>
---
arch/powerpc/platforms/pseries/iommu.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index de633f6ae093..68d1ea957ac7 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1074,8 +1074,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
u64 dma_addr, max_addr;
struct device_node *dn;
u32 ddw_avail[3];
+
struct direct_window *window;
- struct property *win64;
+ struct property *win64, *dfl_win;
struct dynamic_dma_window_prop *ddwprop;
struct failed_ddw_pdn *fpdn;

@@ -1110,8 +1111,19 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
if (ret)
goto out_failed;

- /*
- * Query if there is a second window of size to map the
+ /*
+ * First step of setting up DDW is removing the default DMA window,
+ * if it's present. It will make all the resources available to the
+ * new DDW window.
+ * If anything fails after this, we need to restore it.
+ */
+
+ dfl_win = of_find_property(pdn, "ibm,dma-window", NULL);
+ if (dfl_win)
+ remove_dma_window(pdn, ddw_avail, dfl_win);
+
+ /*
+ * Query if there is a window of size to map the
* whole partition. Query returns number of windows, largest
* block assigned to PE (partition endpoint), and two bitmasks
* of page sizes: supported and supported for migrate-dma.
@@ -1219,6 +1231,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
kfree(win64);

out_failed:
+ if (dfl_win)
+ reset_dma_window(dev, pdn);

fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
if (!fpdn)
--
2.25.4

2020-06-20 06:21:58

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW

Hi Leonardo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on v5.8-rc1 next-20200618]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Leonardo-Bras/Remove-default-DMA-window-before-creating-DDW/20200619-131022
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r031-20200619 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 63700971ac9cdf198faa4a3a7c226fa579e49206)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> arch/powerpc/platforms/pseries/iommu.c:1111:6: warning: variable 'dfl_win' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (ret)
^~~
arch/powerpc/platforms/pseries/iommu.c:1234:6: note: uninitialized use occurs here
if (dfl_win)
^~~~~~~
arch/powerpc/platforms/pseries/iommu.c:1111:2: note: remove the 'if' if its condition is always false
if (ret)
^~~~~~~~
arch/powerpc/platforms/pseries/iommu.c:1079:34: note: initialize the variable 'dfl_win' to silence this warning
struct property *win64, *dfl_win;
^
= NULL
1 warning generated.

vim +1111 arch/powerpc/platforms/pseries/iommu.c

715a454e17d328 Leonardo Bras 2020-06-19 1056
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1057 /*
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1058 * If the PE supports dynamic dma windows, and there is space for a table
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1059 * that can map all pages in a linear offset, then setup such a table,
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1060 * and record the dma-offset in the struct device.
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1061 *
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1062 * dev: the pci device we are checking
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1063 * pdn: the parent pe node with the ibm,dma_window property
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1064 * Future: also check if we can remap the base window for our base page size
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1065 *
9ae2fddeda4cbf Christoph Hellwig 2019-02-13 1066 * returns the dma offset for use by the direct mapped DMA code.
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1067 */
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1068 static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1069 {
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1070 int len, ret;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1071 struct ddw_query_response query;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1072 struct ddw_create_response create;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1073 int page_shift;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1074 u64 dma_addr, max_addr;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1075 struct device_node *dn;
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1076 u32 ddw_avail[3];
3248d5f65aac44 Leonardo Bras 2020-06-19 1077
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1078 struct direct_window *window;
3248d5f65aac44 Leonardo Bras 2020-06-19 1079 struct property *win64, *dfl_win;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1080 struct dynamic_dma_window_prop *ddwprop;
61435690a9c781 Nishanth Aravamudan 2013-03-07 1081 struct failed_ddw_pdn *fpdn;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1082
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1083 mutex_lock(&direct_window_init_mutex);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1084
b73a635f348610 Milton Miller 2011-05-11 1085 dma_addr = find_existing_ddw(pdn);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1086 if (dma_addr != 0)
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1087 goto out_unlock;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1088
61435690a9c781 Nishanth Aravamudan 2013-03-07 1089 /*
61435690a9c781 Nishanth Aravamudan 2013-03-07 1090 * If we already went through this for a previous function of
61435690a9c781 Nishanth Aravamudan 2013-03-07 1091 * the same device and failed, we don't want to muck with the
61435690a9c781 Nishanth Aravamudan 2013-03-07 1092 * DMA window again, as it will race with in-flight operations
61435690a9c781 Nishanth Aravamudan 2013-03-07 1093 * and can lead to EEHs. The above mutex protects access to the
61435690a9c781 Nishanth Aravamudan 2013-03-07 1094 * list.
61435690a9c781 Nishanth Aravamudan 2013-03-07 1095 */
61435690a9c781 Nishanth Aravamudan 2013-03-07 1096 list_for_each_entry(fpdn, &failed_ddw_pdn_list, list) {
b7c670d673d118 Rob Herring 2017-08-21 1097 if (fpdn->pdn == pdn)
61435690a9c781 Nishanth Aravamudan 2013-03-07 1098 goto out_unlock;
61435690a9c781 Nishanth Aravamudan 2013-03-07 1099 }
61435690a9c781 Nishanth Aravamudan 2013-03-07 1100
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1101 /*
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1102 * the ibm,ddw-applicable property holds the tokens for:
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1103 * ibm,query-pe-dma-window
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1104 * ibm,create-pe-dma-window
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1105 * ibm,remove-pe-dma-window
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1106 * for the given node in that order.
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1107 * the property is actually in the parent, not the PE
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1108 */
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1109 ret = of_property_read_u32_array(pdn, "ibm,ddw-applicable",
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1110 &ddw_avail[0], 3);
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 @1111 if (ret)
ae69e1eddc646f Nishanth Aravamudan 2014-01-10 1112 goto out_failed;
25ebc45b93452d Nishanth Aravamudan 2012-05-15 1113
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1114 /*
3248d5f65aac44 Leonardo Bras 2020-06-19 1115 * First step of setting up DDW is removing the default DMA window,
3248d5f65aac44 Leonardo Bras 2020-06-19 1116 * if it's present. It will make all the resources available to the
3248d5f65aac44 Leonardo Bras 2020-06-19 1117 * new DDW window.
3248d5f65aac44 Leonardo Bras 2020-06-19 1118 * If anything fails after this, we need to restore it.
3248d5f65aac44 Leonardo Bras 2020-06-19 1119 */
3248d5f65aac44 Leonardo Bras 2020-06-19 1120
3248d5f65aac44 Leonardo Bras 2020-06-19 1121 dfl_win = of_find_property(pdn, "ibm,dma-window", NULL);
3248d5f65aac44 Leonardo Bras 2020-06-19 1122 if (dfl_win)
3248d5f65aac44 Leonardo Bras 2020-06-19 1123 remove_dma_window(pdn, ddw_avail, dfl_win);
3248d5f65aac44 Leonardo Bras 2020-06-19 1124
3248d5f65aac44 Leonardo Bras 2020-06-19 1125 /*
3248d5f65aac44 Leonardo Bras 2020-06-19 1126 * Query if there is a window of size to map the
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1127 * whole partition. Query returns number of windows, largest
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1128 * block assigned to PE (partition endpoint), and two bitmasks
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1129 * of page sizes: supported and supported for migrate-dma.
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1130 */
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1131 dn = pci_device_to_OF_node(dev);
0ef1ee0bda323e Leonardo Bras 2020-06-19 1132 ret = query_ddw(dev, ddw_avail, &query, pdn);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1133 if (ret != 0)
ae69e1eddc646f Nishanth Aravamudan 2014-01-10 1134 goto out_failed;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1135
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1136 if (query.windows_available == 0) {
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1137 /*
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1138 * no additional windows are available for this device.
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1139 * We might be able to reallocate the existing window,
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1140 * trading in for a larger page size.
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1141 */
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1142 dev_dbg(&dev->dev, "no free dynamic windows");
ae69e1eddc646f Nishanth Aravamudan 2014-01-10 1143 goto out_failed;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1144 }
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1145 if (query.page_size & 4) {
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1146 page_shift = 24; /* 16MB */
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1147 } else if (query.page_size & 2) {
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1148 page_shift = 16; /* 64kB */
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1149 } else if (query.page_size & 1) {
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1150 page_shift = 12; /* 4kB */
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1151 } else {
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1152 dev_dbg(&dev->dev, "no supported direct page size in mask %x",
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1153 query.page_size);
ae69e1eddc646f Nishanth Aravamudan 2014-01-10 1154 goto out_failed;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1155 }
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1156 /* verify the window * number of ptes will map the partition */
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1157 /* check largest block * page size > max memory hotplug addr */
68c0449ea16d77 Alexey Kardashevskiy 2018-12-19 1158 max_addr = ddw_memory_hotplug_max();
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1159 if (query.largest_available_block < (max_addr >> page_shift)) {
0ef1ee0bda323e Leonardo Bras 2020-06-19 1160 dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1161 "%llu-sized pages\n", max_addr, query.largest_available_block,
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1162 1ULL << page_shift);
ae69e1eddc646f Nishanth Aravamudan 2014-01-10 1163 goto out_failed;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1164 }
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1165 len = order_base_2(max_addr);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1166 win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1167 if (!win64) {
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1168 dev_info(&dev->dev,
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1169 "couldn't allocate property for 64bit dma window\n");
ae69e1eddc646f Nishanth Aravamudan 2014-01-10 1170 goto out_failed;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1171 }
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1172 win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1173 win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
767303349e052a Nishanth Aravamudan 2011-05-06 1174 win64->length = sizeof(*ddwprop);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1175 if (!win64->name || !win64->value) {
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1176 dev_info(&dev->dev,
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1177 "couldn't allocate property name and value\n");
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1178 goto out_free_prop;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1179 }
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1180
b73a635f348610 Milton Miller 2011-05-11 1181 ret = create_ddw(dev, ddw_avail, &create, page_shift, len);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1182 if (ret != 0)
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1183 goto out_free_prop;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1184
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1185 ddwprop->liobn = cpu_to_be32(create.liobn);
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1186 ddwprop->dma_base = cpu_to_be64(((u64)create.addr_hi << 32) |
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1187 create.addr_lo);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1188 ddwprop->tce_shift = cpu_to_be32(page_shift);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1189 ddwprop->window_shift = cpu_to_be32(len);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1190
b7c670d673d118 Rob Herring 2017-08-21 1191 dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
b7c670d673d118 Rob Herring 2017-08-21 1192 create.liobn, dn);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1193
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1194 window = kzalloc(sizeof(*window), GFP_KERNEL);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1195 if (!window)
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1196 goto out_clear_window;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1197
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1198 ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1199 win64->value, tce_setrange_multi_pSeriesLP_walk);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1200 if (ret) {
b7c670d673d118 Rob Herring 2017-08-21 1201 dev_info(&dev->dev, "failed to map direct window for %pOF: %d\n",
b7c670d673d118 Rob Herring 2017-08-21 1202 dn, ret);
7a19081fc26581 Julia Lawall 2011-08-08 1203 goto out_free_window;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1204 }
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1205
79d1c712958f94 Nathan Fontenot 2012-10-02 1206 ret = of_add_property(pdn, win64);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1207 if (ret) {
b7c670d673d118 Rob Herring 2017-08-21 1208 dev_err(&dev->dev, "unable to add dma window property for %pOF: %d",
b7c670d673d118 Rob Herring 2017-08-21 1209 pdn, ret);
7a19081fc26581 Julia Lawall 2011-08-08 1210 goto out_free_window;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1211 }
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1212
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1213 window->device = pdn;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1214 window->prop = ddwprop;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1215 spin_lock(&direct_window_list_lock);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1216 list_add(&window->list, &direct_window_list);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1217 spin_unlock(&direct_window_list_lock);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1218
9410e0185e6539 Alexey Kardashevskiy 2014-09-25 1219 dma_addr = be64_to_cpu(ddwprop->dma_base);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1220 goto out_unlock;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1221
7a19081fc26581 Julia Lawall 2011-08-08 1222 out_free_window:
7a19081fc26581 Julia Lawall 2011-08-08 1223 kfree(window);
7a19081fc26581 Julia Lawall 2011-08-08 1224
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1225 out_clear_window:
5efbabe09d986f Gavin Shan 2014-08-11 1226 remove_ddw(pdn, true);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1227
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1228 out_free_prop:
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1229 kfree(win64->name);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1230 kfree(win64->value);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1231 kfree(win64);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1232
ae69e1eddc646f Nishanth Aravamudan 2014-01-10 1233 out_failed:
3248d5f65aac44 Leonardo Bras 2020-06-19 1234 if (dfl_win)
3248d5f65aac44 Leonardo Bras 2020-06-19 1235 reset_dma_window(dev, pdn);
25ebc45b93452d Nishanth Aravamudan 2012-05-15 1236
61435690a9c781 Nishanth Aravamudan 2013-03-07 1237 fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
61435690a9c781 Nishanth Aravamudan 2013-03-07 1238 if (!fpdn)
61435690a9c781 Nishanth Aravamudan 2013-03-07 1239 goto out_unlock;
61435690a9c781 Nishanth Aravamudan 2013-03-07 1240 fpdn->pdn = pdn;
61435690a9c781 Nishanth Aravamudan 2013-03-07 1241 list_add(&fpdn->list, &failed_ddw_pdn_list);
61435690a9c781 Nishanth Aravamudan 2013-03-07 1242
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1243 out_unlock:
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1244 mutex_unlock(&direct_window_init_mutex);
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1245 return dma_addr;
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1246 }
4e8b0cf46b2570 Nishanth Aravamudan 2011-02-10 1247

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (17.69 kB)
.config.gz (39.41 kB)
Download all attachments

2020-06-22 10:06:22

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 2/4] powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call



On 19/06/2020 15:06, Leonardo Bras wrote:
> Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> ibm,ddw-extensions. The first extension available (index 2) carries the
> token for ibm,reset-pe-dma-windows rtas call, which is used to restore
> the default DMA window for a device, if it has been deleted.
>
> It does so by resetting the TCE table allocation for the PE to it's
> boot time value, available in "ibm,dma-window" device tree node.
>
> Signed-off-by: Leonardo Bras <[email protected]>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 33 ++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index e5a617738c8b..5e1fbc176a37 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1012,6 +1012,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
> return max_addr;
> }
>
> +/*
> + * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> + * ibm,ddw-extensions, which carries the rtas token for
> + * ibm,reset-pe-dma-windows.
> + * That rtas-call can be used to restore the default DMA window for the device.
> + */
> +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
> +{
> + int ret;
> + u32 cfg_addr, ddw_ext[3];
> + u64 buid;
> + struct device_node *dn;
> + struct pci_dn *pdn;
> +
> + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
> + &ddw_ext[0], 3);

s/3/2/ as for the reset extension you do not need the "64bit largest
block" extension.


> + if (ret)
> + return;
> +
> + dn = pci_device_to_OF_node(dev);
> + pdn = PCI_DN(dn);
> + buid = pdn->phb->buid;
> + cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> +
> + ret = rtas_call(ddw_ext[1], 3, 1, NULL, cfg_addr,


Here the "reset" extention is in ddw_ext[1]. Hm. 1/4 has a bug then.

And I am pretty sure it won't compile as reset_dma_window() is not used
and it is static so fold it into one the next patches. Thanks,


> + BUID_HI(buid), BUID_LO(buid));
> + if (ret)
> + dev_info(&dev->dev,
> + "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
> + ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
> + ret);
> +}
> +
> /*
> * If the PE supports dynamic dma windows, and there is space for a table
> * that can map all pages in a linear offset, then setup such a table,
>

--
Alexey

2020-06-22 10:08:11

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW



On 19/06/2020 15:06, Leonardo Bras wrote:
> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
> default DMA window for the device, before attempting to configure a DDW,
> in order to make the maximum resources available for the next DDW to be
> created.
>
> This is a requirement for some devices to use DDW, given they only
> allow one DMA window.
>
> If setting up a new DDW fails anywhere after the removal of this
> default DMA window, restore it using reset_dma_window.

Nah... If we do it like this, then under pHyp we lose 32bit DMA for good
as pHyp can only create a single window and it has to map at
0x800.0000.0000.0000. They probably do not care though.

Under KVM, this will fail as VFIO allows creating 2 windows and it
starts from 0 but the existing iommu_bypass_supported_pSeriesLP() treats
the window address == 0 as a failure. And we want to keep both DMA
windows for PCI adapters with both 64bit and 32bit PCI functions (I
heard AMD GPU video + audio are like this) or someone could hotplug
32bit DMA device on a vphb with already present 64bit DMA window so we
do not remove the default window.

The last discussed thing I remember was that there was supposed to be a
new bit in "ibm,architecture-vec-5" (forgot the details), we could use
that to decide whether to keep the default window or not, like this.

>
> Signed-off-by: Leonardo Bras <[email protected]>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index de633f6ae093..68d1ea957ac7 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1074,8 +1074,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> u64 dma_addr, max_addr;
> struct device_node *dn;
> u32 ddw_avail[3];
> +
> struct direct_window *window;
> - struct property *win64;
> + struct property *win64, *dfl_win;

Make it "default_win" or "def_win", "dfl" hurts to read :)

> struct dynamic_dma_window_prop *ddwprop;
> struct failed_ddw_pdn *fpdn;
>
> @@ -1110,8 +1111,19 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> if (ret)
> goto out_failed;
>
> - /*
> - * Query if there is a second window of size to map the
> + /*
> + * First step of setting up DDW is removing the default DMA window,
> + * if it's present. It will make all the resources available to the
> + * new DDW window.
> + * If anything fails after this, we need to restore it.
> + */
> +
> + dfl_win = of_find_property(pdn, "ibm,dma-window", NULL);
> + if (dfl_win)
> + remove_dma_window(pdn, ddw_avail, dfl_win);

Before doing so, you want to make sure that the "reset" is actually
supported. Thanks,


> +
> + /*
> + * Query if there is a window of size to map the
> * whole partition. Query returns number of windows, largest
> * block assigned to PE (partition endpoint), and two bitmasks
> * of page sizes: supported and supported for migrate-dma.
> @@ -1219,6 +1231,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> kfree(win64);
>
> out_failed:
> + if (dfl_win)
> + reset_dma_window(dev, pdn);
>
> fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
> if (!fpdn)
>

--
Alexey

2020-06-22 10:08:12

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 1/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows



On 19/06/2020 15:06, Leonardo Bras wrote:
> From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
> outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
>
> This change of output size is meant to expand the address size of
> largest_available_block PE TCE from 32-bit to 64-bit, which ends up
> shifting page_size and migration_capable.
>
> This ends up requiring the update of
> ddw_query_response->largest_available_block from u32 to u64, and manually
> assigning the values from the buffer into this struct, according to
> output size.
>
> Signed-off-by: Leonardo Bras <[email protected]>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 57 +++++++++++++++++++++-----
> 1 file changed, 46 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 6d47b4a3ce39..e5a617738c8b 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -334,7 +334,7 @@ struct direct_window {
> /* Dynamic DMA Window support */
> struct ddw_query_response {
> u32 windows_available;
> - u32 largest_available_block;
> + u64 largest_available_block;
> u32 page_size;
> u32 migration_capable;
> };
> @@ -869,14 +869,32 @@ static int find_existing_ddw_windows(void)
> }
> machine_arch_initcall(pseries, find_existing_ddw_windows);
>
> +/*
> + * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many output
> + * parameters ibm,query-pe-dma-windows will have, ranging from 5 to 6.
> + */
> +
> +static int query_ddw_out_sz(struct device_node *par_dn)

Can easily be folded into query_ddw().

> +{
> + int ret;
> + u32 ddw_ext[3];
> +
> + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
> + &ddw_ext[0], 3);
> + if (ret || ddw_ext[0] < 2 || ddw_ext[2] != 1)


Oh that PAPR thing again :-/

===
The “ibm,ddw-extensions” property value is a list of integers the first
integer indicates the number of extensions implemented and subsequent
integers, one per extension, provide a value associated with that
extension.
===

So ddw_ext[0] is length.
Listindex==2 is for "reset" says PAPR and
Listindex==3 is for this new 64bit "largest_available_block".

So I'd expect ddw_ext[2] to have the "reset" token and ddw_ext[3] to
have "1" for this new feature but indexes are smaller. I am confused.
Either way these "2" and "3" needs to be defined in macros, "0" probably
too.

Please post 'lsprop "ibm,ddw-extensions"' here. Thanks,



> + return 5;
> + return 6;
> +}
> +
> static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> - struct ddw_query_response *query)
> + struct ddw_query_response *query,
> + struct device_node *par_dn)
> {
> struct device_node *dn;
> struct pci_dn *pdn;
> - u32 cfg_addr;
> + u32 cfg_addr, query_out[5];
> u64 buid;
> - int ret;
> + int ret, out_sz;
>
> /*
> * Get the config address and phb buid of the PE window.
> @@ -888,12 +906,29 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> pdn = PCI_DN(dn);
> buid = pdn->phb->buid;
> cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> + out_sz = query_ddw_out_sz(par_dn);
> +
> + ret = rtas_call(ddw_avail[0], 3, out_sz, query_out,
> + cfg_addr, BUID_HI(buid), BUID_LO(buid));
> + dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned %d\n",
> + ddw_avail[0], cfg_addr, BUID_HI(buid), BUID_LO(buid), ret);
> +
> + switch (out_sz) {
> + case 5:
> + query->windows_available = query_out[0];
> + query->largest_available_block = query_out[1];
> + query->page_size = query_out[2];
> + query->migration_capable = query_out[3];
> + break;
> + case 6:
> + query->windows_available = query_out[0];
> + query->largest_available_block = ((u64)query_out[1] << 32) |
> + query_out[2];
> + query->page_size = query_out[3];
> + query->migration_capable = query_out[4];
> + break;
> + }
>
> - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> - cfg_addr, BUID_HI(buid), BUID_LO(buid));
> - dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> - " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
> - BUID_LO(buid), ret);
> return ret;
> }
>
> @@ -1040,7 +1075,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> * of page sizes: supported and supported for migrate-dma.
> */
> dn = pci_device_to_OF_node(dev);
> - ret = query_ddw(dev, ddw_avail, &query);
> + ret = query_ddw(dev, ddw_avail, &query, pdn);
> if (ret != 0)
> goto out_failed;
>
> @@ -1068,7 +1103,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> /* check largest block * page size > max memory hotplug addr */
> max_addr = ddw_memory_hotplug_max();
> if (query.largest_available_block < (max_addr >> page_shift)) {
> - dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
> + dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
> "%llu-sized pages\n", max_addr, query.largest_available_block,
> 1ULL << page_shift);
> goto out_failed;
>

--
Alexey

2020-06-22 10:08:20

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window



On 19/06/2020 15:06, Leonardo Bras wrote:
> Move the window-removing part of remove_ddw into a new function
> (remove_dma_window), so it can be used to remove other DMA windows.
>
> It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
> property, like the default DMA window from the device, which uses
> "ibm,dma-window".
>
> Signed-off-by: Leonardo Bras <[email protected]>
> ---
> arch/powerpc/platforms/pseries/iommu.c | 53 +++++++++++++++-----------
> 1 file changed, 31 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 5e1fbc176a37..de633f6ae093 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -767,25 +767,14 @@ static int __init disable_ddw_setup(char *str)
>
> early_param("disable_ddw", disable_ddw_setup);
>
> -static void remove_ddw(struct device_node *np, bool remove_prop)
> +static void remove_dma_window(struct device_node *pdn, u32 *ddw_avail,

You do not need the entire ddw_avail here, pass just the token you need.

Also, despite this particular file, the "pdn" name is usually used for
struct pci_dn (not device_node), let's keep it that way.


> + struct property *win)
> {
> struct dynamic_dma_window_prop *dwp;
> - struct property *win64;
> - u32 ddw_avail[3];
> u64 liobn;
> - int ret = 0;
> -
> - ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> - &ddw_avail[0], 3);
> -
> - win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> - if (!win64)
> - return;
> -
> - if (ret || win64->length < sizeof(*dwp))
> - goto delprop;
> + int ret;
>
> - dwp = win64->value;
> + dwp = win->value;
> liobn = (u64)be32_to_cpu(dwp->liobn);
>
> /* clear the whole window, note the arg is in kernel pages */
> @@ -793,24 +782,44 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
> 1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
> if (ret)
> pr_warn("%pOF failed to clear tces in window.\n",
> - np);
> + pdn);
> else
> pr_debug("%pOF successfully cleared tces in window.\n",
> - np);
> + pdn);
>
> ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> if (ret)
> pr_warn("%pOF: failed to remove direct window: rtas returned "
> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + pdn, ret, ddw_avail[2], liobn);
> else
> pr_debug("%pOF: successfully removed direct window: rtas returned "
> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> - np, ret, ddw_avail[2], liobn);
> + pdn, ret, ddw_avail[2], liobn);
> +}
> +
> +static void remove_ddw(struct device_node *np, bool remove_prop)
> +{
> + struct property *win;
> + u32 ddw_avail[3];
> + int ret = 0;
> +
> + ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> + &ddw_avail[0], 3);
> + if (ret)
> + return;
> +
> + win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> + if (!win)
> + return;
> +
> + if (win->length >= sizeof(struct dynamic_dma_window_prop))


Any good reason not to make it "=="? Is there something optional or we
expect extension (which may not grow from the end but may add cells in
between). Thanks,


> + remove_dma_window(np, ddw_avail, win);
> +
> + if (!remove_prop)
> + return;
>
> -delprop:
> - if (remove_prop)
> - ret = of_remove_property(np, win64);
> + ret = of_remove_property(np, win);
> if (ret)
> pr_warn("%pOF: failed to remove direct window property: %d\n",
> np, ret);
>

--
Alexey

2020-06-22 19:01:46

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 1/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows

Hello Alexey, thank you for the feedback!

On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>
> On 19/06/2020 15:06, Leonardo Bras wrote:
> > From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
> > outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
> >
> > This change of output size is meant to expand the address size of
> > largest_available_block PE TCE from 32-bit to 64-bit, which ends up
> > shifting page_size and migration_capable.
> >
> > This ends up requiring the update of
> > ddw_query_response->largest_available_block from u32 to u64, and manually
> > assigning the values from the buffer into this struct, according to
> > output size.
> >
> > Signed-off-by: Leonardo Bras <[email protected]>
> > ---
> > arch/powerpc/platforms/pseries/iommu.c | 57 +++++++++++++++++++++-----
> > 1 file changed, 46 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index 6d47b4a3ce39..e5a617738c8b 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -334,7 +334,7 @@ struct direct_window {
> > /* Dynamic DMA Window support */
> > struct ddw_query_response {
> > u32 windows_available;
> > - u32 largest_available_block;
> > + u64 largest_available_block;
> > u32 page_size;
> > u32 migration_capable;
> > };
> > @@ -869,14 +869,32 @@ static int find_existing_ddw_windows(void)
> > }
> > machine_arch_initcall(pseries, find_existing_ddw_windows);
> >
> > +/*
> > + * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many output
> > + * parameters ibm,query-pe-dma-windows will have, ranging from 5 to 6.
> > + */
> > +
> > +static int query_ddw_out_sz(struct device_node *par_dn)
>
> Can easily be folded into query_ddw().

Sure, but it will get inlined by the compiler, and I think it reads
better this way.

I mean, I understand you have a reason to think it's better to fold it
in query_ddw(), and I would like to better understand that to improve
my code in the future.

> > +{
> > + int ret;
> > + u32 ddw_ext[3];
> > +
> > + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
> > + &ddw_ext[0], 3);
> > + if (ret || ddw_ext[0] < 2 || ddw_ext[2] != 1)
>
> Oh that PAPR thing again :-/
>
> ===
> The “ibm,ddw-extensions” property value is a list of integers the first
> integer indicates the number of extensions implemented and subsequent
> integers, one per extension, provide a value associated with that
> extension.
> ===
>
> So ddw_ext[0] is length.
> Listindex==2 is for "reset" says PAPR and
> Listindex==3 is for this new 64bit "largest_available_block".
>
> So I'd expect ddw_ext[2] to have the "reset" token and ddw_ext[3] to
> have "1" for this new feature but indexes are smaller. I am confused.
> Either way these "2" and "3" needs to be defined in macros, "0" probably
> too.

Remember these indexes are not C-like 0-starting indexes, where the
size would be Listindex==1.
Basically, in C-like array it's :
a[0] == size,
a[1] == reset_token,
a[2] == new 64bit "largest_available_block"

> Please post 'lsprop "ibm,ddw-extensions"' here. Thanks,

Sure:
[root@host pci@800000029004005]# lsprop "ibm,ddw-extensions"
ibm,dd
w-extensions
00000002 00000056 00000000


>
> > + return 5;
> > + return 6;
> > +}
> > +
> > static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> > - struct ddw_query_response *query)
> > + struct ddw_query_response *query,
> > + struct device_node *par_dn)
> > {
> > struct device_node *dn;
> > struct pci_dn *pdn;
> > - u32 cfg_addr;
> > + u32 cfg_addr, query_out[5];
> > u64 buid;
> > - int ret;
> > + int ret, out_sz;
> >
> > /*
> > * Get the config address and phb buid of the PE window.
> > @@ -888,12 +906,29 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> > pdn = PCI_DN(dn);
> > buid = pdn->phb->buid;
> > cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> > + out_sz = query_ddw_out_sz(par_dn);
> > +
> > + ret = rtas_call(ddw_avail[0], 3, out_sz, query_out,
> > + cfg_addr, BUID_HI(buid), BUID_LO(buid));
> > + dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned %d\n",
> > + ddw_avail[0], cfg_addr, BUID_HI(buid), BUID_LO(buid), ret);
> > +
> > + switch (out_sz) {
> > + case 5:
> > + query->windows_available = query_out[0];
> > + query->largest_available_block = query_out[1];
> > + query->page_size = query_out[2];
> > + query->migration_capable = query_out[3];
> > + break;
> > + case 6:
> > + query->windows_available = query_out[0];
> > + query->largest_available_block = ((u64)query_out[1] << 32) |
> > + query_out[2];
> > + query->page_size = query_out[3];
> > + query->migration_capable = query_out[4];
> > + break;
> > + }
> >
> > - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> > - cfg_addr, BUID_HI(buid), BUID_LO(buid));
> > - dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> > - " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
> > - BUID_LO(buid), ret);
> > return ret;
> > }
> >
> > @@ -1040,7 +1075,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > * of page sizes: supported and supported for migrate-dma.
> > */
> > dn = pci_device_to_OF_node(dev);
> > - ret = query_ddw(dev, ddw_avail, &query);
> > + ret = query_ddw(dev, ddw_avail, &query, pdn);
> > if (ret != 0)
> > goto out_failed;
> >
> > @@ -1068,7 +1103,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > /* check largest block * page size > max memory hotplug addr */
> > max_addr = ddw_memory_hotplug_max();
> > if (query.largest_available_block < (max_addr >> page_shift)) {
> > - dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
> > + dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
> > "%llu-sized pages\n", max_addr, query.largest_available_block,
> > 1ULL << page_shift);
> > goto out_failed;
> >

Best regards,
Leonardo

2020-06-22 19:01:51

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 2/4] powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call

Hello Alexey, thanks for the feedback!

On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>
> On 19/06/2020 15:06, Leonardo Bras wrote:
> > Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> > ibm,ddw-extensions. The first extension available (index 2) carries the
> > token for ibm,reset-pe-dma-windows rtas call, which is used to restore
> > the default DMA window for a device, if it has been deleted.
> >
> > It does so by resetting the TCE table allocation for the PE to it's
> > boot time value, available in "ibm,dma-window" device tree node.
> >
> > Signed-off-by: Leonardo Bras <[email protected]>
> > ---
> > arch/powerpc/platforms/pseries/iommu.c | 33 ++++++++++++++++++++++++++
> > 1 file changed, 33 insertions(+)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index e5a617738c8b..5e1fbc176a37 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -1012,6 +1012,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
> > return max_addr;
> > }
> >
> > +/*
> > + * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> > + * ibm,ddw-extensions, which carries the rtas token for
> > + * ibm,reset-pe-dma-windows.
> > + * That rtas-call can be used to restore the default DMA window for the device.
> > + */
> > +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
> > +{
> > + int ret;
> > + u32 cfg_addr, ddw_ext[3];
> > + u64 buid;
> > + struct device_node *dn;
> > + struct pci_dn *pdn;
> > +
> > + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
> > + &ddw_ext[0], 3);
>
> s/3/2/ as for the reset extension you do not need the "64bit largest
> block" extension.

Sure, I will update this.

>
>
> > + if (ret)
> > + return;
> > +
> > + dn = pci_device_to_OF_node(dev);
> > + pdn = PCI_DN(dn);
> > + buid = pdn->phb->buid;
> > + cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> > +
> > + ret = rtas_call(ddw_ext[1], 3, 1, NULL, cfg_addr,
>
> Here the "reset" extention is in ddw_ext[1]. Hm. 1/4 has a bug then.

Humm, in 1/4 I used dd_ext[0] (how many extensions) and ddw_ext[2] (64-
bit largest window count). I fail to see the bug here.

> And I am pretty sure it won't compile as reset_dma_window() is not used
> and it is static so fold it into one the next patches. Thanks,

Sure, I will do that.
I was questioning myself about this and thought it would be better to
split for easier revision.

>
>
> > + BUID_HI(buid), BUID_LO(buid));
> > + if (ret)
> > + dev_info(&dev->dev,
> > + "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
> > + ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
> > + ret);
> > +}
> > +
> > /*
> > * If the PE supports dynamic dma windows, and there is space for a table
> > * that can map all pages in a linear offset, then setup such a table,
> >

Best regards,
Leonardo

2020-06-22 19:04:04

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

Hello Alexey, thanks for the feedback!

On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>
> On 19/06/2020 15:06, Leonardo Bras wrote:
> > Move the window-removing part of remove_ddw into a new function
> > (remove_dma_window), so it can be used to remove other DMA windows.
> >
> > It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
> > property, like the default DMA window from the device, which uses
> > "ibm,dma-window".
> >
> > Signed-off-by: Leonardo Bras <[email protected]>
> > ---
> > arch/powerpc/platforms/pseries/iommu.c | 53 +++++++++++++++-----------
> > 1 file changed, 31 insertions(+), 22 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index 5e1fbc176a37..de633f6ae093 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -767,25 +767,14 @@ static int __init disable_ddw_setup(char *str)
> >
> > early_param("disable_ddw", disable_ddw_setup);
> >
> > -static void remove_ddw(struct device_node *np, bool remove_prop)
> > +static void remove_dma_window(struct device_node *pdn, u32 *ddw_avail,
>
> You do not need the entire ddw_avail here, pass just the token you need.

Well, I just emulated the behavior of create_ddw() and query_ddw() as
both just pass the array instead of the token, even though they only
use a single token.

I think it's to make the rest of the code independent of the design of
the "ibm,ddw-applicable" array, and if it changes, only local changes
on the functions will be needed.

> Also, despite this particular file, the "pdn" name is usually used for
> struct pci_dn (not device_node), let's keep it that way.

Sure, I got confused for some time about this, as we have:
static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn).
but on *_ddw() we have "struct pci_dn *pdn".

I will also add a patch that renames those 'struct device_node *pdn' to
something like 'struct device_node *parent_dn'.

> > + struct property *win)
> > {
> > struct dynamic_dma_window_prop *dwp;
> > - struct property *win64;
> > - u32 ddw_avail[3];
> > u64 liobn;
> > - int ret = 0;
> > -
> > - ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> > - &ddw_avail[0], 3);
> > -
> > - win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> > - if (!win64)
> > - return;
> > -
> > - if (ret || win64->length < sizeof(*dwp))
> > - goto delprop;
> > + int ret;
> >
> > - dwp = win64->value;
> > + dwp = win->value;
> > liobn = (u64)be32_to_cpu(dwp->liobn);
> >
> > /* clear the whole window, note the arg is in kernel pages */
> > @@ -793,24 +782,44 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
> > 1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
> > if (ret)
> > pr_warn("%pOF failed to clear tces in window.\n",
> > - np);
> > + pdn);
> > else
> > pr_debug("%pOF successfully cleared tces in window.\n",
> > - np);
> > + pdn);
> >
> > ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> > if (ret)
> > pr_warn("%pOF: failed to remove direct window: rtas returned "
> > "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> > - np, ret, ddw_avail[2], liobn);
> > + pdn, ret, ddw_avail[2], liobn);
> > else
> > pr_debug("%pOF: successfully removed direct window: rtas returned "
> > "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> > - np, ret, ddw_avail[2], liobn);
> > + pdn, ret, ddw_avail[2], liobn);
> > +}
> > +
> > +static void remove_ddw(struct device_node *np, bool remove_prop)
> > +{
> > + struct property *win;
> > + u32 ddw_avail[3];
> > + int ret = 0;
> > +
> > + ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> > + &ddw_avail[0], 3);
> > + if (ret)
> > + return;
> > +
> > + win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> > + if (!win)
> > + return;
> > +
> > + if (win->length >= sizeof(struct dynamic_dma_window_prop))
>
> Any good reason not to make it "=="? Is there something optional or we
> expect extension (which may not grow from the end but may add cells in
> between). Thanks,

Well, it comes from the old behavior of remove_ddw():
- if (ret || win64->length < sizeof(*dwp))
- goto delprop;

As I reversed the logic from 'if (test) go out' to 'if (!test) do
stuff', I also reversed (a < b) to !(a < b) => (a >= b).

I have no problem changing that to '==', but it will produce a
different behavior than before.

>
>
> > + remove_dma_window(np, ddw_avail, win);
> > +
> > + if (!remove_prop)
> > + return;
> >
> > -delprop:
> > - if (remove_prop)
> > - ret = of_remove_property(np, win64);
> > + ret = of_remove_property(np, win);
> > if (ret)
> > pr_warn("%pOF: failed to remove direct window property: %d\n",
> > np, ret);
> >

Best regards,
Leonardo

2020-06-22 19:04:39

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW

Hello Alexey, thanks for the feedback!

On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>
> On 19/06/2020 15:06, Leonardo Bras wrote:
> > On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
> > default DMA window for the device, before attempting to configure a DDW,
> > in order to make the maximum resources available for the next DDW to be
> > created.
> >
> > This is a requirement for some devices to use DDW, given they only
> > allow one DMA window.
> >
> > If setting up a new DDW fails anywhere after the removal of this
> > default DMA window, restore it using reset_dma_window.
>
> Nah... If we do it like this, then under pHyp we lose 32bit DMA for good
> as pHyp can only create a single window and it has to map at
> 0x800.0000.0000.0000. They probably do not care though.
>
> Under KVM, this will fail as VFIO allows creating 2 windows and it
> starts from 0 but the existing iommu_bypass_supported_pSeriesLP() treats
> the window address == 0 as a failure. And we want to keep both DMA
> windows for PCI adapters with both 64bit and 32bit PCI functions (I
> heard AMD GPU video + audio are like this) or someone could hotplug
> 32bit DMA device on a vphb with already present 64bit DMA window so we
> do not remove the default window.

Well, then I would suggest doing something like this:
query_ddw(...);
if (query.windows_available == 0){
remove_dma_window(...,default_win);
query_ddw(...);
}

This would make sure to cover cases of windows available == 1
and windows available > 1;

> The last discussed thing I remember was that there was supposed to be a
> new bit in "ibm,architecture-vec-5" (forgot the details), we could use
> that to decide whether to keep the default window or not, like this.

I checked on the latest LoPAR draft (soon to be published), for the
ibm,architecture-vec 'option array 5' and this entry was the only
recently added one that is related to this patchset:

Byte 8 - Bit 0:
SRIOV Virtual Functions Support Dynamic DMA Windows (DDW):
0: SRIOV Virtual Functions do not support DDW
1: SRIOV Virtual Functions do support DDW

Isn't this equivalent to having a "ibm,ddw-applicable" property?


>
> > Signed-off-by: Leonardo Bras <[email protected]>
> > ---
> > arch/powerpc/platforms/pseries/iommu.c | 20 +++++++++++++++++---
> > 1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index de633f6ae093..68d1ea957ac7 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -1074,8 +1074,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > u64 dma_addr, max_addr;
> > struct device_node *dn;
> > u32 ddw_avail[3];
> > +
> > struct direct_window *window;
> > - struct property *win64;
> > + struct property *win64, *dfl_win;
>
> Make it "default_win" or "def_win", "dfl" hurts to read :)

Sure, no problem :)

>
> > struct dynamic_dma_window_prop *ddwprop;
> > struct failed_ddw_pdn *fpdn;
> >
> > @@ -1110,8 +1111,19 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > if (ret)
> > goto out_failed;
> >
> > - /*
> > - * Query if there is a second window of size to map the
> > + /*
> > + * First step of setting up DDW is removing the default DMA window,
> > + * if it's present. It will make all the resources available to the
> > + * new DDW window.
> > + * If anything fails after this, we need to restore it.
> > + */
> > +
> > + dfl_win = of_find_property(pdn, "ibm,dma-window", NULL);
> > + if (dfl_win)
> > + remove_dma_window(pdn, ddw_avail, dfl_win);
>
> Before doing so, you want to make sure that the "reset" is actually
> supported. Thanks,

Good catch, I will improve that.

>
>
> > +
> > + /*
> > + * Query if there is a window of size to map the
> > * whole partition. Query returns number of windows, largest
> > * block assigned to PE (partition endpoint), and two bitmasks
> > * of page sizes: supported and supported for migrate-dma.
> > @@ -1219,6 +1231,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > kfree(win64);
> >
> > out_failed:
> > + if (dfl_win)
> > + reset_dma_window(dev, pdn);
> >
> > fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
> > if (!fpdn)
> >

Best regards,
Leonardo

2020-06-23 01:14:10

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW



On 23/06/2020 04:59, Leonardo Bras wrote:
> Hello Alexey, thanks for the feedback!
>
> On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>>
>> On 19/06/2020 15:06, Leonardo Bras wrote:
>>> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
>>> default DMA window for the device, before attempting to configure a DDW,
>>> in order to make the maximum resources available for the next DDW to be
>>> created.
>>>
>>> This is a requirement for some devices to use DDW, given they only
>>> allow one DMA window.
>>>
>>> If setting up a new DDW fails anywhere after the removal of this
>>> default DMA window, restore it using reset_dma_window.
>>
>> Nah... If we do it like this, then under pHyp we lose 32bit DMA for good
>> as pHyp can only create a single window and it has to map at
>> 0x800.0000.0000.0000. They probably do not care though.
>>
>> Under KVM, this will fail as VFIO allows creating 2 windows and it
>> starts from 0 but the existing iommu_bypass_supported_pSeriesLP() treats
>> the window address == 0 as a failure. And we want to keep both DMA
>> windows for PCI adapters with both 64bit and 32bit PCI functions (I
>> heard AMD GPU video + audio are like this) or someone could hotplug
>> 32bit DMA device on a vphb with already present 64bit DMA window so we
>> do not remove the default window.
>
> Well, then I would suggest doing something like this:
> query_ddw(...);
> if (query.windows_available == 0){
> remove_dma_window(...,default_win);
> query_ddw(...);
> }
>
> This would make sure to cover cases of windows available == 1
> and windows available > 1;


Is "1" what pHyp returns on query? And was it always like that? Then it
is probably ok. I just never really explored the idea of removing the
default window as we did not have to.


>> The last discussed thing I remember was that there was supposed to be a
>> new bit in "ibm,architecture-vec-5" (forgot the details), we could use
>> that to decide whether to keep the default window or not, like this.
>
> I checked on the latest LoPAR draft (soon to be published), for the
> ibm,architecture-vec 'option array 5' and this entry was the only
> recently added one that is related to this patchset:
>
> Byte 8 - Bit 0:
> SRIOV Virtual Functions Support Dynamic DMA Windows (DDW):
> 0: SRIOV Virtual Functions do not support DDW
> 1: SRIOV Virtual Functions do support DDW
>
> Isn't this equivalent to having a "ibm,ddw-applicable" property?

I am not sure, is there anything else to this new bit? I'd think if the
client supports it, then pHyp will create one 64bit window per a PE and
DDW API won't be needed. Thanks,


>
>
>>
>>> Signed-off-by: Leonardo Bras <[email protected]>
>>> ---
>>> arch/powerpc/platforms/pseries/iommu.c | 20 +++++++++++++++++---
>>> 1 file changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index de633f6ae093..68d1ea957ac7 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -1074,8 +1074,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>> u64 dma_addr, max_addr;
>>> struct device_node *dn;
>>> u32 ddw_avail[3];
>>> +
>>> struct direct_window *window;
>>> - struct property *win64;
>>> + struct property *win64, *dfl_win;
>>
>> Make it "default_win" or "def_win", "dfl" hurts to read :)
>
> Sure, no problem :)
>
>>
>>> struct dynamic_dma_window_prop *ddwprop;
>>> struct failed_ddw_pdn *fpdn;
>>>
>>> @@ -1110,8 +1111,19 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>> if (ret)
>>> goto out_failed;
>>>
>>> - /*
>>> - * Query if there is a second window of size to map the
>>> + /*
>>> + * First step of setting up DDW is removing the default DMA window,
>>> + * if it's present. It will make all the resources available to the
>>> + * new DDW window.
>>> + * If anything fails after this, we need to restore it.
>>> + */
>>> +
>>> + dfl_win = of_find_property(pdn, "ibm,dma-window", NULL);
>>> + if (dfl_win)
>>> + remove_dma_window(pdn, ddw_avail, dfl_win);
>>
>> Before doing so, you want to make sure that the "reset" is actually
>> supported. Thanks,
>
> Good catch, I will improve that.
>
>>
>>
>>> +
>>> + /*
>>> + * Query if there is a window of size to map the
>>> * whole partition. Query returns number of windows, largest
>>> * block assigned to PE (partition endpoint), and two bitmasks
>>> * of page sizes: supported and supported for migrate-dma.
>>> @@ -1219,6 +1231,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>> kfree(win64);
>>>
>>> out_failed:
>>> + if (dfl_win)
>>> + reset_dma_window(dev, pdn);
>>>
>>> fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
>>> if (!fpdn)
>>>
>
> Best regards,
> Leonardo
>

--
Alexey

2020-06-23 01:14:31

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 2/4] powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call



On 23/06/2020 04:58, Leonardo Bras wrote:
> Hello Alexey, thanks for the feedback!
>
> On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>>
>> On 19/06/2020 15:06, Leonardo Bras wrote:
>>> Platforms supporting the DDW option starting with LoPAR level 2.7 implement
>>> ibm,ddw-extensions. The first extension available (index 2) carries the
>>> token for ibm,reset-pe-dma-windows rtas call, which is used to restore
>>> the default DMA window for a device, if it has been deleted.
>>>
>>> It does so by resetting the TCE table allocation for the PE to it's
>>> boot time value, available in "ibm,dma-window" device tree node.
>>>
>>> Signed-off-by: Leonardo Bras <[email protected]>
>>> ---
>>> arch/powerpc/platforms/pseries/iommu.c | 33 ++++++++++++++++++++++++++
>>> 1 file changed, 33 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index e5a617738c8b..5e1fbc176a37 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -1012,6 +1012,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>>> return max_addr;
>>> }
>>>
>>> +/*
>>> + * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
>>> + * ibm,ddw-extensions, which carries the rtas token for
>>> + * ibm,reset-pe-dma-windows.
>>> + * That rtas-call can be used to restore the default DMA window for the device.
>>> + */
>>> +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
>>> +{
>>> + int ret;
>>> + u32 cfg_addr, ddw_ext[3];
>>> + u64 buid;
>>> + struct device_node *dn;
>>> + struct pci_dn *pdn;
>>> +
>>> + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
>>> + &ddw_ext[0], 3);
>>
>> s/3/2/ as for the reset extension you do not need the "64bit largest
>> block" extension.
>
> Sure, I will update this.
>
>>
>>
>>> + if (ret)
>>> + return;
>>> +
>>> + dn = pci_device_to_OF_node(dev);
>>> + pdn = PCI_DN(dn);
>>> + buid = pdn->phb->buid;
>>> + cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>>> +
>>> + ret = rtas_call(ddw_ext[1], 3, 1, NULL, cfg_addr,
>>
>> Here the "reset" extention is in ddw_ext[1]. Hm. 1/4 has a bug then.
>
> Humm, in 1/4 I used dd_ext[0] (how many extensions) and ddw_ext[2] (64-
> bit largest window count). I fail to see the bug here.

There is none, my bad :)


>> And I am pretty sure it won't compile as reset_dma_window() is not used
>> and it is static so fold it into one the next patches. Thanks,
>
> Sure, I will do that.
> I was questioning myself about this and thought it would be better to
> split for easier revision.

People separate things when a patch is really huge but even then I miss
the point - I'd really like to see a new function _and_ its uses in the
same patch, otherwise I either need to jump between mails or apply the
series, either is little but extra work :) Thanks,


>>
>>
>>> + BUID_HI(buid), BUID_LO(buid));
>>> + if (ret)
>>> + dev_info(&dev->dev,
>>> + "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
>>> + ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
>>> + ret);
>>> +}
>>> +
>>> /*
>>> * If the PE supports dynamic dma windows, and there is space for a table
>>> * that can map all pages in a linear offset, then setup such a table,
>>>
>
> Best regards,
> Leonardo
>

--
Alexey

2020-06-23 01:16:32

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window



On 23/06/2020 04:59, Leonardo Bras wrote:
> Hello Alexey, thanks for the feedback!
>
> On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>>
>> On 19/06/2020 15:06, Leonardo Bras wrote:
>>> Move the window-removing part of remove_ddw into a new function
>>> (remove_dma_window), so it can be used to remove other DMA windows.
>>>
>>> It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
>>> property, like the default DMA window from the device, which uses
>>> "ibm,dma-window".
>>>
>>> Signed-off-by: Leonardo Bras <[email protected]>
>>> ---
>>> arch/powerpc/platforms/pseries/iommu.c | 53 +++++++++++++++-----------
>>> 1 file changed, 31 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index 5e1fbc176a37..de633f6ae093 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -767,25 +767,14 @@ static int __init disable_ddw_setup(char *str)
>>>
>>> early_param("disable_ddw", disable_ddw_setup);
>>>
>>> -static void remove_ddw(struct device_node *np, bool remove_prop)
>>> +static void remove_dma_window(struct device_node *pdn, u32 *ddw_avail,
>>
>> You do not need the entire ddw_avail here, pass just the token you need.
>
> Well, I just emulated the behavior of create_ddw() and query_ddw() as
> both just pass the array instead of the token, even though they only
> use a single token.

True, there is a pattern.

> I think it's to make the rest of the code independent of the design of
> the "ibm,ddw-applicable" array, and if it changes, only local changes
> on the functions will be needed.

The helper removes a window, if you are going to call other operations
in remove_dma_window(), then you'll have to change its name ;)


>> Also, despite this particular file, the "pdn" name is usually used for
>> struct pci_dn (not device_node), let's keep it that way.
>
> Sure, I got confused for some time about this, as we have:
> static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn).
> but on *_ddw() we have "struct pci_dn *pdn".

True again, not the cleanest style here.


> I will also add a patch that renames those 'struct device_node *pdn' to
> something like 'struct device_node *parent_dn'.

I would not go that far, we (well, Oliver) are getting rid of many
occurrences of pci_dn and Oliver may have a stronger opinion here.


>
>>> + struct property *win)
>>> {
>>> struct dynamic_dma_window_prop *dwp;
>>> - struct property *win64;
>>> - u32 ddw_avail[3];
>>> u64 liobn;
>>> - int ret = 0;
>>> -
>>> - ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
>>> - &ddw_avail[0], 3);
>>> -
>>> - win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
>>> - if (!win64)
>>> - return;
>>> -
>>> - if (ret || win64->length < sizeof(*dwp))
>>> - goto delprop;
>>> + int ret;
>>>
>>> - dwp = win64->value;
>>> + dwp = win->value;
>>> liobn = (u64)be32_to_cpu(dwp->liobn);
>>>
>>> /* clear the whole window, note the arg is in kernel pages */
>>> @@ -793,24 +782,44 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
>>> 1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
>>> if (ret)
>>> pr_warn("%pOF failed to clear tces in window.\n",
>>> - np);
>>> + pdn);
>>> else
>>> pr_debug("%pOF successfully cleared tces in window.\n",
>>> - np);
>>> + pdn);
>>>
>>> ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
>>> if (ret)
>>> pr_warn("%pOF: failed to remove direct window: rtas returned "
>>> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
>>> - np, ret, ddw_avail[2], liobn);
>>> + pdn, ret, ddw_avail[2], liobn);
>>> else
>>> pr_debug("%pOF: successfully removed direct window: rtas returned "
>>> "%d to ibm,remove-pe-dma-window(%x) %llx\n",
>>> - np, ret, ddw_avail[2], liobn);
>>> + pdn, ret, ddw_avail[2], liobn);
>>> +}
>>> +
>>> +static void remove_ddw(struct device_node *np, bool remove_prop)
>>> +{
>>> + struct property *win;
>>> + u32 ddw_avail[3];
>>> + int ret = 0;
>>> +
>>> + ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
>>> + &ddw_avail[0], 3);
>>> + if (ret)
>>> + return;
>>> +
>>> + win = of_find_property(np, DIRECT64_PROPNAME, NULL);
>>> + if (!win)
>>> + return;
>>> +
>>> + if (win->length >= sizeof(struct dynamic_dma_window_prop))
>>
>> Any good reason not to make it "=="? Is there something optional or we
>> expect extension (which may not grow from the end but may add cells in
>> between). Thanks,
>
> Well, it comes from the old behavior of remove_ddw():
> - if (ret || win64->length < sizeof(*dwp))
> - goto delprop;
> As I reversed the logic from 'if (test) go out' to 'if (!test) do
> stuff', I also reversed (a < b) to !(a < b) => (a >= b).
>
> I have no problem changing that to '==', but it will produce a
> different behavior than before.

I missed than, never mind then.


>
>>
>>
>>> + remove_dma_window(np, ddw_avail, win);
>>> +
>>> + if (!remove_prop)
>>> + return;
>>>
>>> -delprop:
>>> - if (remove_prop)
>>> - ret = of_remove_property(np, win64);
>>> + ret = of_remove_property(np, win);
>>> if (ret)
>>> pr_warn("%pOF: failed to remove direct window property: %d\n",
>>> np, ret);
>>>
>
> Best regards,
> Leonardo
>

--
Alexey

2020-06-23 01:17:38

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 1/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows



On 23/06/2020 04:58, Leonardo Bras wrote:
> Hello Alexey, thank you for the feedback!
>
> On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>>
>> On 19/06/2020 15:06, Leonardo Bras wrote:
>>> From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can make the number of
>>> outputs from "ibm,query-pe-dma-windows" go from 5 to 6.
>>>
>>> This change of output size is meant to expand the address size of
>>> largest_available_block PE TCE from 32-bit to 64-bit, which ends up
>>> shifting page_size and migration_capable.
>>>
>>> This ends up requiring the update of
>>> ddw_query_response->largest_available_block from u32 to u64, and manually
>>> assigning the values from the buffer into this struct, according to
>>> output size.
>>>
>>> Signed-off-by: Leonardo Bras <[email protected]>
>>> ---
>>> arch/powerpc/platforms/pseries/iommu.c | 57 +++++++++++++++++++++-----
>>> 1 file changed, 46 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index 6d47b4a3ce39..e5a617738c8b 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -334,7 +334,7 @@ struct direct_window {
>>> /* Dynamic DMA Window support */
>>> struct ddw_query_response {
>>> u32 windows_available;
>>> - u32 largest_available_block;
>>> + u64 largest_available_block;
>>> u32 page_size;
>>> u32 migration_capable;
>>> };
>>> @@ -869,14 +869,32 @@ static int find_existing_ddw_windows(void)
>>> }
>>> machine_arch_initcall(pseries, find_existing_ddw_windows);
>>>
>>> +/*
>>> + * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many output
>>> + * parameters ibm,query-pe-dma-windows will have, ranging from 5 to 6.
>>> + */
>>> +
>>> +static int query_ddw_out_sz(struct device_node *par_dn)
>>
>> Can easily be folded into query_ddw().
>
> Sure, but it will get inlined by the compiler, and I think it reads
> better this way.


> I mean, I understand you have a reason to think it's better to fold it
> in query_ddw(), and I would like to better understand that to improve
> my code in the future.


You have numbers 5 and 6 (the number of parameters) twice in the file,
this is why I brought it up. query_ddw_out_sz() can potentially return
something else than 5 or 6 and you will have to change the callsite(s)
then, since these are not macros, this allows to think there may be more
places with 5 and 6. Dunno. A single function will simplify things imho.


>
>>> +{
>>> + int ret;
>>> + u32 ddw_ext[3];
>>> +
>>> + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
>>> + &ddw_ext[0], 3);
>>> + if (ret || ddw_ext[0] < 2 || ddw_ext[2] != 1)
>>
>> Oh that PAPR thing again :-/
>>
>> ===
>> The “ibm,ddw-extensions” property value is a list of integers the first
>> integer indicates the number of extensions implemented and subsequent
>> integers, one per extension, provide a value associated with that
>> extension.
>> ===
>>
>> So ddw_ext[0] is length.
>> Listindex==2 is for "reset" says PAPR and
>> Listindex==3 is for this new 64bit "largest_available_block".
>>
>> So I'd expect ddw_ext[2] to have the "reset" token and ddw_ext[3] to
>> have "1" for this new feature but indexes are smaller. I am confused.
>> Either way these "2" and "3" needs to be defined in macros, "0" probably
>> too.
>
> Remember these indexes are not C-like 0-starting indexes, where the
> size would be Listindex==1.

Yeah I can see that is the assumption but out of curiosity - is it
written anywhere? Across PAPR, they index bytes from 1 but bits from 0 :-/

Either way make them macros.


> Basically, in C-like array it's :
> a[0] == size,
> a[1] == reset_token,
> a[2] == new 64bit "largest_available_block"
>
>> Please post 'lsprop "ibm,ddw-extensions"' here. Thanks,
>
> Sure:
> [root@host pci@800000029004005]# lsprop "ibm,ddw-extensions"
> ibm,dd
> w-extensions
> 00000002 00000056 00000000

Right, good. Thanks,


>
>
>>
>>> + return 5;
>>> + return 6;
>>> +}
>>> +
>>> static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
>>> - struct ddw_query_response *query)
>>> + struct ddw_query_response *query,
>>> + struct device_node *par_dn)
>>> {
>>> struct device_node *dn;
>>> struct pci_dn *pdn;
>>> - u32 cfg_addr;
>>> + u32 cfg_addr, query_out[5];
>>> u64 buid;
>>> - int ret;
>>> + int ret, out_sz;
>>>
>>> /*
>>> * Get the config address and phb buid of the PE window.
>>> @@ -888,12 +906,29 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
>>> pdn = PCI_DN(dn);
>>> buid = pdn->phb->buid;
>>> cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>>> + out_sz = query_ddw_out_sz(par_dn);
>>> +
>>> + ret = rtas_call(ddw_avail[0], 3, out_sz, query_out,
>>> + cfg_addr, BUID_HI(buid), BUID_LO(buid));
>>> + dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned %d\n",
>>> + ddw_avail[0], cfg_addr, BUID_HI(buid), BUID_LO(buid), ret);
>>> +
>>> + switch (out_sz) {
>>> + case 5:
>>> + query->windows_available = query_out[0];
>>> + query->largest_available_block = query_out[1];
>>> + query->page_size = query_out[2];
>>> + query->migration_capable = query_out[3];
>>> + break;
>>> + case 6:
>>> + query->windows_available = query_out[0];
>>> + query->largest_available_block = ((u64)query_out[1] << 32) |
>>> + query_out[2];
>>> + query->page_size = query_out[3];
>>> + query->migration_capable = query_out[4];
>>> + break;
>>> + }
>>>
>>> - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
>>> - cfg_addr, BUID_HI(buid), BUID_LO(buid));
>>> - dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
>>> - " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
>>> - BUID_LO(buid), ret);
>>> return ret;
>>> }
>>>
>>> @@ -1040,7 +1075,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>> * of page sizes: supported and supported for migrate-dma.
>>> */
>>> dn = pci_device_to_OF_node(dev);
>>> - ret = query_ddw(dev, ddw_avail, &query);
>>> + ret = query_ddw(dev, ddw_avail, &query, pdn);
>>> if (ret != 0)
>>> goto out_failed;
>>>
>>> @@ -1068,7 +1103,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>> /* check largest block * page size > max memory hotplug addr */
>>> max_addr = ddw_memory_hotplug_max();
>>> if (query.largest_available_block < (max_addr >> page_shift)) {
>>> - dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
>>> + dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
>>> "%llu-sized pages\n", max_addr, query.largest_available_block,
>>> 1ULL << page_shift);
>>> goto out_failed;
>>>
>
> Best regards,
> Leonardo
>

--
Alexey

2020-06-23 02:08:16

by Oliver O'Halloran

[permalink] [raw]
Subject: Re: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

On Tue, Jun 23, 2020 at 11:12 AM Alexey Kardashevskiy <[email protected]> wrote:
>
> On 23/06/2020 04:59, Leonardo Bras wrote:
> >
> >> Also, despite this particular file, the "pdn" name is usually used for
> >> struct pci_dn (not device_node), let's keep it that way.
> >
> > Sure, I got confused for some time about this, as we have:
> > static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn).
> > but on *_ddw() we have "struct pci_dn *pdn".
>
> True again, not the cleanest style here.
>
>
> > I will also add a patch that renames those 'struct device_node *pdn' to
> > something like 'struct device_node *parent_dn'.

I usually go with "np" or "node". In this case I'd use "parent_np" or
just "parent." As you said pci_dn conventionally uses pdn so that
should be avoided if at all possible. There's some places that just
use "dn" for device_node, but I don't think that's something we should
encourage due to how similar it is to pdn.

> I would not go that far, we (well, Oliver) are getting rid of many
> occurrences of pci_dn and Oliver may have a stronger opinion here.

I'm trying to remove the use of pci_dn from non-RTAS platforms which
doesn't apply to pseries. For RTAS platforms having pci_dn sort of
makes sense since it's used to cache data from the device_node and
having it saves you from needing to parse and validate the DT at
runtime since we're supposed to be relying on the FW provided settings
in the DT. I want to get rid of it on PowerNV because it's become a
dumping ground for random bits and pieces of platform specific data.
It's confusing at best and IMO it duplicates a lot of what's already
available in the per-PHB structures which the platform specific stuff
should actually be looking at.

Oliver

2020-06-23 02:20:13

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 1/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows

On Tue, 2020-06-23 at 11:12 +1000, Alexey Kardashevskiy wrote:
[snip]
> > > > +static int query_ddw_out_sz(struct device_node *par_dn)
> > >
> > > Can easily be folded into query_ddw().
> >
> > Sure, but it will get inlined by the compiler, and I think it reads
> > better this way.
> > I mean, I understand you have a reason to think it's better to fold it
> > in query_ddw(), and I would like to better understand that to improve
> > my code in the future.
>
> You have numbers 5 and 6 (the number of parameters) twice in the file,
> this is why I brought it up. query_ddw_out_sz() can potentially return
> something else than 5 or 6 and you will have to change the callsite(s)
> then, since these are not macros, this allows to think there may be more
> places with 5 and 6. Dunno. A single function will simplify things imho.

That's a good point. Thanks!

>
>
> > > > +{
> > > > + int ret;
> > > > + u32 ddw_ext[3];
> > > > +
> > > > + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
> > > > + &ddw_ext[0], 3);
> > > > + if (ret || ddw_ext[0] < 2 || ddw_ext[2] != 1)
> > >
> > > Oh that PAPR thing again :-/
> > >
> > > ===
> > > The “ibm,ddw-extensions” property value is a list of integers the first
> > > integer indicates the number of extensions implemented and subsequent
> > > integers, one per extension, provide a value associated with that
> > > extension.
> > > ===
> > >
> > > So ddw_ext[0] is length.
> > > Listindex==2 is for "reset" says PAPR and
> > > Listindex==3 is for this new 64bit "largest_available_block".
> > >
> > > So I'd expect ddw_ext[2] to have the "reset" token and ddw_ext[3] to
> > > have "1" for this new feature but indexes are smaller. I am confused.
> > > Either way these "2" and "3" needs to be defined in macros, "0" probably
> > > too.
> >
> > Remember these indexes are not C-like 0-starting indexes, where the
> > size would be Listindex==1.
>
> Yeah I can see that is the assumption but out of curiosity - is it
> written anywhere? Across PAPR, they index bytes from 1 but bits from 0 :-/

From LoPAR:
The “ibm,ddw-extensions” property value is a list of integers the first
integer indicates the number of extensions implemented and subsequent
integers, one per extension, provide a value associated with that
extension.

And the list/table then shows extensions from 2 on onwards:
List index 2 : Token of the ibm,reset-pe-dma-windows RTAS Call
(...)

> Either way make them macros.
>
>
> > Basically, in C-like array it's :
> > a[0] == size,
> > a[1] == reset_token,
> > a[2] == new 64bit "largest_available_block"
> >
> > > Please post 'lsprop "ibm,ddw-extensions"' here. Thanks,
> >
> > Sure:
> > [root@host pci@800000029004005]# lsprop "ibm,ddw-extensions"
> > ibm,dd
> > w-extensions
> > 00000002 00000056 00000000
>
> Right, good. Thanks,

Thank you for reviewing!

>
> >
> > > > + return 5;
> > > > + return 6;
> > > > +}
> > > > +
> > > > static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> > > > - struct ddw_query_response *query)
> > > > + struct ddw_query_response *query,
> > > > + struct device_node *par_dn)
> > > > {
> > > > struct device_node *dn;
> > > > struct pci_dn *pdn;
> > > > - u32 cfg_addr;
> > > > + u32 cfg_addr, query_out[5];
> > > > u64 buid;
> > > > - int ret;
> > > > + int ret, out_sz;
> > > >
> > > > /*
> > > > * Get the config address and phb buid of the PE window.
> > > > @@ -888,12 +906,29 @@ static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> > > > pdn = PCI_DN(dn);
> > > > buid = pdn->phb->buid;
> > > > cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> > > > + out_sz = query_ddw_out_sz(par_dn);
> > > > +
> > > > + ret = rtas_call(ddw_avail[0], 3, out_sz, query_out,
> > > > + cfg_addr, BUID_HI(buid), BUID_LO(buid));
> > > > + dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x returned %d\n",
> > > > + ddw_avail[0], cfg_addr, BUID_HI(buid), BUID_LO(buid), ret);
> > > > +
> > > > + switch (out_sz) {
> > > > + case 5:
> > > > + query->windows_available = query_out[0];
> > > > + query->largest_available_block = query_out[1];
> > > > + query->page_size = query_out[2];
> > > > + query->migration_capable = query_out[3];
> > > > + break;
> > > > + case 6:
> > > > + query->windows_available = query_out[0];
> > > > + query->largest_available_block = ((u64)query_out[1] << 32) |
> > > > + query_out[2];
> > > > + query->page_size = query_out[3];
> > > > + query->migration_capable = query_out[4];
> > > > + break;
> > > > + }
> > > >
> > > > - ret = rtas_call(ddw_avail[0], 3, 5, (u32 *)query,
> > > > - cfg_addr, BUID_HI(buid), BUID_LO(buid));
> > > > - dev_info(&dev->dev, "ibm,query-pe-dma-windows(%x) %x %x %x"
> > > > - " returned %d\n", ddw_avail[0], cfg_addr, BUID_HI(buid),
> > > > - BUID_LO(buid), ret);
> > > > return ret;
> > > > }
> > > >
> > > > @@ -1040,7 +1075,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > > > * of page sizes: supported and supported for migrate-dma.
> > > > */
> > > > dn = pci_device_to_OF_node(dev);
> > > > - ret = query_ddw(dev, ddw_avail, &query);
> > > > + ret = query_ddw(dev, ddw_avail, &query, pdn);
> > > > if (ret != 0)
> > > > goto out_failed;
> > > >
> > > > @@ -1068,7 +1103,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > > > /* check largest block * page size > max memory hotplug addr */
> > > > max_addr = ddw_memory_hotplug_max();
> > > > if (query.largest_available_block < (max_addr >> page_shift)) {
> > > > - dev_dbg(&dev->dev, "can't map partition max 0x%llx with %u "
> > > > + dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
> > > > "%llu-sized pages\n", max_addr, query.largest_available_block,
> > > > 1ULL << page_shift);
> > > > goto out_failed;
> > > >
> >
> > Best regards,
> > Leonardo
> >

Best regards,
Leonardo Bras

2020-06-23 02:24:05

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 2/4] powerpc/pseries/iommu: Implement ibm,reset-pe-dma-windows rtas call

On Tue, 2020-06-23 at 11:11 +1000, Alexey Kardashevskiy wrote:
>
> On 23/06/2020 04:58, Leonardo Bras wrote:
> > Hello Alexey, thanks for the feedback!
> >
> > On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
> > > On 19/06/2020 15:06, Leonardo Bras wrote:
> > > > Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> > > > ibm,ddw-extensions. The first extension available (index 2) carries the
> > > > token for ibm,reset-pe-dma-windows rtas call, which is used to restore
> > > > the default DMA window for a device, if it has been deleted.
> > > >
> > > > It does so by resetting the TCE table allocation for the PE to it's
> > > > boot time value, available in "ibm,dma-window" device tree node.
> > > >
> > > > Signed-off-by: Leonardo Bras <[email protected]>
> > > > ---
> > > > arch/powerpc/platforms/pseries/iommu.c | 33 ++++++++++++++++++++++++++
> > > > 1 file changed, 33 insertions(+)
> > > >
> > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > > index e5a617738c8b..5e1fbc176a37 100644
> > > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > > @@ -1012,6 +1012,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
> > > > return max_addr;
> > > > }
> > > >
> > > > +/*
> > > > + * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> > > > + * ibm,ddw-extensions, which carries the rtas token for
> > > > + * ibm,reset-pe-dma-windows.
> > > > + * That rtas-call can be used to restore the default DMA window for the device.
> > > > + */
> > > > +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
> > > > +{
> > > > + int ret;
> > > > + u32 cfg_addr, ddw_ext[3];
> > > > + u64 buid;
> > > > + struct device_node *dn;
> > > > + struct pci_dn *pdn;
> > > > +
> > > > + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
> > > > + &ddw_ext[0], 3);
> > >
> > > s/3/2/ as for the reset extension you do not need the "64bit largest
> > > block" extension.
> >
> > Sure, I will update this.
> >
> > >
> > > > + if (ret)
> > > > + return;
> > > > +
> > > > + dn = pci_device_to_OF_node(dev);
> > > > + pdn = PCI_DN(dn);
> > > > + buid = pdn->phb->buid;
> > > > + cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> > > > +
> > > > + ret = rtas_call(ddw_ext[1], 3, 1, NULL, cfg_addr,
> > >
> > > Here the "reset" extention is in ddw_ext[1]. Hm. 1/4 has a bug then.
> >
> > Humm, in 1/4 I used dd_ext[0] (how many extensions) and ddw_ext[2] (64-
> > bit largest window count). I fail to see the bug here.
>
> There is none, my bad :)
>
>
> > > And I am pretty sure it won't compile as reset_dma_window() is not used
> > > and it is static so fold it into one the next patches. Thanks,
> >
> > Sure, I will do that.
> > I was questioning myself about this and thought it would be better to
> > split for easier revision.
>
> People separate things when a patch is really huge but even then I miss
> the point - I'd really like to see a new function _and_ its uses in the
> same patch, otherwise I either need to jump between mails or apply the
> series, either is little but extra work :) Thanks,


Sure, that makes sense.
I will keep that in mind for future patchsets (and v2).

Thank you!

>
>
> > >
> > > > + BUID_HI(buid), BUID_LO(buid));
> > > > + if (ret)
> > > > + dev_info(&dev->dev,
> > > > + "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
> > > > + ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
> > > > + ret);
> > > > +}
> > > > +
> > > > /*
> > > > * If the PE supports dynamic dma windows, and there is space for a table
> > > > * that can map all pages in a linear offset, then setup such a table,
> > > >
> >
> > Best regards,
> > Leonardo
> >

2020-06-23 02:27:05

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

On Tue, 2020-06-23 at 11:12 +1000, Alexey Kardashevskiy wrote:
>
> On 23/06/2020 04:59, Leonardo Bras wrote:
> > Hello Alexey, thanks for the feedback!
> >
> > On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
> > > On 19/06/2020 15:06, Leonardo Bras wrote:
> > > > Move the window-removing part of remove_ddw into a new function
> > > > (remove_dma_window), so it can be used to remove other DMA windows.
> > > >
> > > > It's useful for removing DMA windows that don't create DIRECT64_PROPNAME
> > > > property, like the default DMA window from the device, which uses
> > > > "ibm,dma-window".
> > > >
> > > > Signed-off-by: Leonardo Bras <[email protected]>
> > > > ---
> > > > arch/powerpc/platforms/pseries/iommu.c | 53 +++++++++++++++-----------
> > > > 1 file changed, 31 insertions(+), 22 deletions(-)
> > > >
> > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > > index 5e1fbc176a37..de633f6ae093 100644
> > > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > > @@ -767,25 +767,14 @@ static int __init disable_ddw_setup(char *str)
> > > >
> > > > early_param("disable_ddw", disable_ddw_setup);
> > > >
> > > > -static void remove_ddw(struct device_node *np, bool remove_prop)
> > > > +static void remove_dma_window(struct device_node *pdn, u32 *ddw_avail,
> > >
> > > You do not need the entire ddw_avail here, pass just the token you need.
> >
> > Well, I just emulated the behavior of create_ddw() and query_ddw() as
> > both just pass the array instead of the token, even though they only
> > use a single token.
>
> True, there is a pattern.
>
> > I think it's to make the rest of the code independent of the design of
> > the "ibm,ddw-applicable" array, and if it changes, only local changes
> > on the functions will be needed.
>
> The helper removes a window, if you are going to call other operations
> in remove_dma_window(), then you'll have to change its name ;)

Not only doing new stuff, it can change the order for some reason (as
the order of the output of query), and it would need not change the
caller.

>
>
> > > Also, despite this particular file, the "pdn" name is usually used for
> > > struct pci_dn (not device_node), let's keep it that way.
> >
> > Sure, I got confused for some time about this, as we have:
> > static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn).
> > but on *_ddw() we have "struct pci_dn *pdn".
>
> True again, not the cleanest style here.
>
>
> > I will also add a patch that renames those 'struct device_node *pdn' to
> > something like 'struct device_node *parent_dn'.
>
> I would not go that far, we (well, Oliver) are getting rid of many
> occurrences of pci_dn and Oliver may have a stronger opinion here.
>
>
> > > > + struct property *win)
> > > > {
> > > > struct dynamic_dma_window_prop *dwp;
> > > > - struct property *win64;
> > > > - u32 ddw_avail[3];
> > > > u64 liobn;
> > > > - int ret = 0;
> > > > -
> > > > - ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> > > > - &ddw_avail[0], 3);
> > > > -
> > > > - win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
> > > > - if (!win64)
> > > > - return;
> > > > -
> > > > - if (ret || win64->length < sizeof(*dwp))
> > > > - goto delprop;
> > > > + int ret;
> > > >
> > > > - dwp = win64->value;
> > > > + dwp = win->value;
> > > > liobn = (u64)be32_to_cpu(dwp->liobn);
> > > >
> > > > /* clear the whole window, note the arg is in kernel pages */
> > > > @@ -793,24 +782,44 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
> > > > 1ULL << (be32_to_cpu(dwp->window_shift) - PAGE_SHIFT), dwp);
> > > > if (ret)
> > > > pr_warn("%pOF failed to clear tces in window.\n",
> > > > - np);
> > > > + pdn);
> > > > else
> > > > pr_debug("%pOF successfully cleared tces in window.\n",
> > > > - np);
> > > > + pdn);
> > > >
> > > > ret = rtas_call(ddw_avail[2], 1, 1, NULL, liobn);
> > > > if (ret)
> > > > pr_warn("%pOF: failed to remove direct window: rtas returned "
> > > > "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> > > > - np, ret, ddw_avail[2], liobn);
> > > > + pdn, ret, ddw_avail[2], liobn);
> > > > else
> > > > pr_debug("%pOF: successfully removed direct window: rtas returned "
> > > > "%d to ibm,remove-pe-dma-window(%x) %llx\n",
> > > > - np, ret, ddw_avail[2], liobn);
> > > > + pdn, ret, ddw_avail[2], liobn);
> > > > +}
> > > > +
> > > > +static void remove_ddw(struct device_node *np, bool remove_prop)
> > > > +{
> > > > + struct property *win;
> > > > + u32 ddw_avail[3];
> > > > + int ret = 0;
> > > > +
> > > > + ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
> > > > + &ddw_avail[0], 3);
> > > > + if (ret)
> > > > + return;
> > > > +
> > > > + win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> > > > + if (!win)
> > > > + return;
> > > > +
> > > > + if (win->length >= sizeof(struct dynamic_dma_window_prop))
> > >
> > > Any good reason not to make it "=="? Is there something optional or we
> > > expect extension (which may not grow from the end but may add cells in
> > > between). Thanks,
> >
> > Well, it comes from the old behavior of remove_ddw():
> > - if (ret || win64->length < sizeof(*dwp))
> > - goto delprop;
> > As I reversed the logic from 'if (test) go out' to 'if (!test) do
> > stuff', I also reversed (a < b) to !(a < b) => (a >= b).
> >
> > I have no problem changing that to '==', but it will produce a
> > different behavior than before.
>
> I missed than, never mind then.
>
>
> > >
> > > > + remove_dma_window(np, ddw_avail, win);
> > > > +
> > > > + if (!remove_prop)
> > > > + return;
> > > >
> > > > -delprop:
> > > > - if (remove_prop)
> > > > - ret = of_remove_property(np, win64);
> > > > + ret = of_remove_property(np, win);
> > > > if (ret)
> > > > pr_warn("%pOF: failed to remove direct window property: %d\n",
> > > > np, ret);
> > > >
> >
> > Best regards,
> > Leonardo
> >

2020-06-23 02:34:10

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 1/4] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows



On 23/06/2020 12:14, Leonardo Bras wrote:
> On Tue, 2020-06-23 at 11:12 +1000, Alexey Kardashevskiy wrote:
> [snip]
>>>>> +static int query_ddw_out_sz(struct device_node *par_dn)
>>>>
>>>> Can easily be folded into query_ddw().
>>>
>>> Sure, but it will get inlined by the compiler, and I think it reads
>>> better this way.
>>> I mean, I understand you have a reason to think it's better to fold it
>>> in query_ddw(), and I would like to better understand that to improve
>>> my code in the future.
>>
>> You have numbers 5 and 6 (the number of parameters) twice in the file,
>> this is why I brought it up. query_ddw_out_sz() can potentially return
>> something else than 5 or 6 and you will have to change the callsite(s)
>> then, since these are not macros, this allows to think there may be more
>> places with 5 and 6. Dunno. A single function will simplify things imho.
>
> That's a good point. Thanks!
>
>>
>>
>>>>> +{
>>>>> + int ret;
>>>>> + u32 ddw_ext[3];
>>>>> +
>>>>> + ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
>>>>> + &ddw_ext[0], 3);
>>>>> + if (ret || ddw_ext[0] < 2 || ddw_ext[2] != 1)
>>>>
>>>> Oh that PAPR thing again :-/
>>>>
>>>> ===
>>>> The “ibm,ddw-extensions” property value is a list of integers the first
>>>> integer indicates the number of extensions implemented and subsequent
>>>> integers, one per extension, provide a value associated with that
>>>> extension.
>>>> ===
>>>>
>>>> So ddw_ext[0] is length.
>>>> Listindex==2 is for "reset" says PAPR and
>>>> Listindex==3 is for this new 64bit "largest_available_block".
>>>>
>>>> So I'd expect ddw_ext[2] to have the "reset" token and ddw_ext[3] to
>>>> have "1" for this new feature but indexes are smaller. I am confused.
>>>> Either way these "2" and "3" needs to be defined in macros, "0" probably
>>>> too.
>>>
>>> Remember these indexes are not C-like 0-starting indexes, where the
>>> size would be Listindex==1.
>>
>> Yeah I can see that is the assumption but out of curiosity - is it
>> written anywhere? Across PAPR, they index bytes from 1 but bits from 0 :-/
>
> From LoPAR:
> The “ibm,ddw-extensions” property value is a list of integers the first
> integer indicates the number of extensions implemented and subsequent
> integers, one per extension, provide a value associated with that
> extension.
>
> And the list/table then shows extensions from 2 on onwards:
> List index 2 : Token of the ibm,reset-pe-dma-windows RTAS Call
> (...)


I means a place saying "we number things starting from 1 and not from
zero", this kind of thing. The code implementing the spec uses the C
language so it would make sense to count from zero, otoh the writer
probably did not write any code for ages :)



--
Alexey

2020-06-23 02:36:09

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW

On Tue, 2020-06-23 at 11:11 +1000, Alexey Kardashevskiy wrote:
>
> On 23/06/2020 04:59, Leonardo Bras wrote:
> > Hello Alexey, thanks for the feedback!
> >
> > On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
> > > On 19/06/2020 15:06, Leonardo Bras wrote:
> > > > On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
> > > > default DMA window for the device, before attempting to configure a DDW,
> > > > in order to make the maximum resources available for the next DDW to be
> > > > created.
> > > >
> > > > This is a requirement for some devices to use DDW, given they only
> > > > allow one DMA window.
> > > >
> > > > If setting up a new DDW fails anywhere after the removal of this
> > > > default DMA window, restore it using reset_dma_window.
> > >
> > > Nah... If we do it like this, then under pHyp we lose 32bit DMA for good
> > > as pHyp can only create a single window and it has to map at
> > > 0x800.0000.0000.0000. They probably do not care though.
> > >
> > > Under KVM, this will fail as VFIO allows creating 2 windows and it
> > > starts from 0 but the existing iommu_bypass_supported_pSeriesLP() treats
> > > the window address == 0 as a failure. And we want to keep both DMA
> > > windows for PCI adapters with both 64bit and 32bit PCI functions (I
> > > heard AMD GPU video + audio are like this) or someone could hotplug
> > > 32bit DMA device on a vphb with already present 64bit DMA window so we
> > > do not remove the default window.
> >
> > Well, then I would suggest doing something like this:
> > query_ddw(...);
> > if (query.windows_available == 0){
> > remove_dma_window(...,default_win);
> > query_ddw(...);
> > }
> >
> > This would make sure to cover cases of windows available == 1
> > and windows available > 1;
>
> Is "1" what pHyp returns on query? And was it always like that? Then it
> is probably ok. I just never really explored the idea of removing the
> default window as we did not have to.

I am not sure if this is true in general, but in this device (SR-IOV
VF) I am testing it will return 0 windows if the default DMA window is
not deleted, and 1 after it's deleted.

>
>
> > > The last discussed thing I remember was that there was supposed to be a
> > > new bit in "ibm,architecture-vec-5" (forgot the details), we could use
> > > that to decide whether to keep the default window or not, like this.
> >
> > I checked on the latest LoPAR draft (soon to be published), for the
> > ibm,architecture-vec 'option array 5' and this entry was the only
> > recently added one that is related to this patchset:
> >
> > Byte 8 - Bit 0:
> > SRIOV Virtual Functions Support Dynamic DMA Windows (DDW):
> > 0: SRIOV Virtual Functions do not support DDW
> > 1: SRIOV Virtual Functions do support DDW
> >
> > Isn't this equivalent to having a "ibm,ddw-applicable" property?
>
> I am not sure, is there anything else to this new bit?

I copied everything from the LoPAR, and IIRC the ACR for this change
only described this change in the document.


> I'd think if the
> client supports it, then pHyp will create one 64bit window per a PE and
> DDW API won't be needed. Thanks,

That would make sense, and be great.
I will dig some more.

Thank you!

> > > > Signed-off-by: Leonardo Bras <[email protected]>
> > > > ---
> > > > arch/powerpc/platforms/pseries/iommu.c | 20 +++++++++++++++++---
> > > > 1 file changed, 17 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > > index de633f6ae093..68d1ea957ac7 100644
> > > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > > @@ -1074,8 +1074,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > > > u64 dma_addr, max_addr;
> > > > struct device_node *dn;
> > > > u32 ddw_avail[3];
> > > > +
> > > > struct direct_window *window;
> > > > - struct property *win64;
> > > > + struct property *win64, *dfl_win;
> > >
> > > Make it "default_win" or "def_win", "dfl" hurts to read :)
> >
> > Sure, no problem :)
> >
> > > > struct dynamic_dma_window_prop *ddwprop;
> > > > struct failed_ddw_pdn *fpdn;
> > > >
> > > > @@ -1110,8 +1111,19 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > > > if (ret)
> > > > goto out_failed;
> > > >
> > > > - /*
> > > > - * Query if there is a second window of size to map the
> > > > + /*
> > > > + * First step of setting up DDW is removing the default DMA window,
> > > > + * if it's present. It will make all the resources available to the
> > > > + * new DDW window.
> > > > + * If anything fails after this, we need to restore it.
> > > > + */
> > > > +
> > > > + dfl_win = of_find_property(pdn, "ibm,dma-window", NULL);
> > > > + if (dfl_win)
> > > > + remove_dma_window(pdn, ddw_avail, dfl_win);
> > >
> > > Before doing so, you want to make sure that the "reset" is actually
> > > supported. Thanks,
> >
> > Good catch, I will improve that.
> >
> > >
> > > > +
> > > > + /*
> > > > + * Query if there is a window of size to map the
> > > > * whole partition. Query returns number of windows, largest
> > > > * block assigned to PE (partition endpoint), and two bitmasks
> > > > * of page sizes: supported and supported for migrate-dma.
> > > > @@ -1219,6 +1231,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> > > > kfree(win64);
> > > >
> > > > out_failed:
> > > > + if (dfl_win)
> > > > + reset_dma_window(dev, pdn);
> > > >
> > > > fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
> > > > if (!fpdn)
> > > >
> >
> > Best regards,
> > Leonardo
> >

2020-06-23 02:41:23

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW



On 23/06/2020 12:31, Leonardo Bras wrote:
> On Tue, 2020-06-23 at 11:11 +1000, Alexey Kardashevskiy wrote:
>>
>> On 23/06/2020 04:59, Leonardo Bras wrote:
>>> Hello Alexey, thanks for the feedback!
>>>
>>> On Mon, 2020-06-22 at 20:02 +1000, Alexey Kardashevskiy wrote:
>>>> On 19/06/2020 15:06, Leonardo Bras wrote:
>>>>> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
>>>>> default DMA window for the device, before attempting to configure a DDW,
>>>>> in order to make the maximum resources available for the next DDW to be
>>>>> created.
>>>>>
>>>>> This is a requirement for some devices to use DDW, given they only
>>>>> allow one DMA window.
>>>>>
>>>>> If setting up a new DDW fails anywhere after the removal of this
>>>>> default DMA window, restore it using reset_dma_window.
>>>>
>>>> Nah... If we do it like this, then under pHyp we lose 32bit DMA for good
>>>> as pHyp can only create a single window and it has to map at
>>>> 0x800.0000.0000.0000. They probably do not care though.
>>>>
>>>> Under KVM, this will fail as VFIO allows creating 2 windows and it
>>>> starts from 0 but the existing iommu_bypass_supported_pSeriesLP() treats
>>>> the window address == 0 as a failure. And we want to keep both DMA
>>>> windows for PCI adapters with both 64bit and 32bit PCI functions (I
>>>> heard AMD GPU video + audio are like this) or someone could hotplug
>>>> 32bit DMA device on a vphb with already present 64bit DMA window so we
>>>> do not remove the default window.
>>>
>>> Well, then I would suggest doing something like this:
>>> query_ddw(...);
>>> if (query.windows_available == 0){
>>> remove_dma_window(...,default_win);
>>> query_ddw(...);
>>> }
>>>
>>> This would make sure to cover cases of windows available == 1
>>> and windows available > 1;
>>
>> Is "1" what pHyp returns on query? And was it always like that? Then it
>> is probably ok. I just never really explored the idea of removing the
>> default window as we did not have to.
>
> I am not sure if this is true in general, but in this device (SR-IOV
> VF) I am testing it will return 0 windows if the default DMA window is
> not deleted, and 1 after it's deleted.


Since pHyp can only create windows in "64bit space", now (I did not know
until a few month back) I expect that thing to return "1" always no
matter what happened to the default window. And removing the default
window will only affect the maximum number of TCEs but not the number of
possible windows.


--
Alexey

2020-06-23 02:45:43

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW

On Tue, 2020-06-23 at 12:35 +1000, Alexey Kardashevskiy wrote:
> > I am not sure if this is true in general, but in this device (SR-IOV
> > VF) I am testing it will return 0 windows if the default DMA window is
> > not deleted, and 1 after it's deleted.
>
> Since pHyp can only create windows in "64bit space", now (I did not know
> until a few month back) I expect that thing to return "1" always no
> matter what happened to the default window. And removing the default
> window will only affect the maximum number of TCEs but not the number of
> possible windows.

Humm, something gone wrong then.

This patchset was developed mostly because on testing, DDW would never
be created because windows_available would always be 0.

I will take a deeper look in that.

Best regards,
Leonardo

2020-06-23 03:29:12

by Leonardo Brás

[permalink] [raw]
Subject: Re: [PATCH 3/4] powerpc/pseries/iommu: Move window-removing part of remove_ddw into remove_dma_window

On Tue, 2020-06-23 at 11:33 +1000, Oliver O'Halloran wrote:
> On Tue, Jun 23, 2020 at 11:12 AM Alexey Kardashevskiy <[email protected]> wrote:
> > On 23/06/2020 04:59, Leonardo Bras wrote:
> > > > Also, despite this particular file, the "pdn" name is usually used for
> > > > struct pci_dn (not device_node), let's keep it that way.
> > >
> > > Sure, I got confused for some time about this, as we have:
> > > static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn).
> > > but on *_ddw() we have "struct pci_dn *pdn".
> >
> > True again, not the cleanest style here.
> >
> >
> > > I will also add a patch that renames those 'struct device_node *pdn' to
> > > something like 'struct device_node *parent_dn'.
>
> I usually go with "np" or "node". In this case I'd use "parent_np" or
> just "parent." As you said pci_dn conventionally uses pdn so that
> should be avoided if at all possible. There's some places that just
> use "dn" for device_node, but I don't think that's something we should
> encourage due to how similar it is to pdn.

Sure, I will try that.

>
> > I would not go that far, we (well, Oliver) are getting rid of many
> > occurrences of pci_dn and Oliver may have a stronger opinion here.
>
> I'm trying to remove the use of pci_dn from non-RTAS platforms which
> doesn't apply to pseries. For RTAS platforms having pci_dn sort of
> makes sense since it's used to cache data from the device_node and
> having it saves you from needing to parse and validate the DT at
> runtime since we're supposed to be relying on the FW provided settings
> in the DT. I want to get rid of it on PowerNV because it's become a
> dumping ground for random bits and pieces of platform specific data.
> It's confusing at best and IMO it duplicates a lot of what's already
> available in the per-PHB structures which the platform specific stuff
> should actually be looking at.
>
> Oliver

Best regards,
Leonardo Bras

2020-06-23 03:55:14

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 4/4] powerpc/pseries/iommu: Remove default DMA window before creating DDW



On 23/06/2020 12:43, Leonardo Bras wrote:
> On Tue, 2020-06-23 at 12:35 +1000, Alexey Kardashevskiy wrote:
>>> I am not sure if this is true in general, but in this device (SR-IOV
>>> VF) I am testing it will return 0 windows if the default DMA window is
>>> not deleted, and 1 after it's deleted.
>>
>> Since pHyp can only create windows in "64bit space", now (I did not know
>> until a few month back) I expect that thing to return "1" always no
>> matter what happened to the default window. And removing the default
>> window will only affect the maximum number of TCEs but not the number of
>> possible windows.
>
> Humm, something gone wrong then.
>
> This patchset was developed mostly because on testing, DDW would never
> be created because windows_available would always be 0.

? On phyp, if there is a huge window, then it can be 0 or 1. On KVM, it
is 0 or 1 or 2.


>
> I will take a deeper look in that.
>
> Best regards,
> Leonardo
>

--
Alexey