2015-05-03 20:49:59

by Robert Richter

[permalink] [raw]
Subject: [PATCH 0/4] arm64: gicv3: its: Fixes and updates for ThunderX

From: Robert Richter <[email protected]>

This patch series adds fixes and updates for ThunderX.

Patches are unrelated each other and can be applied individually.


Radha Mohan Chintakuntla (1):
arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

Robert Richter (2):
arm64: gicv3: its: Add range check for number of allocated pages
arm64: gicv3: its: Read typer register outside the loop

Tirumalesh Chalamarla (1):
arm64: gicv3: its: Encode domain number in PCI stream id

arch/arm64/Kconfig | 1 +
drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++-----
include/linux/irqchip/arm-gic-v3.h | 1 +
3 files changed, 15 insertions(+), 5 deletions(-)

--
2.1.1


2015-05-03 20:50:06

by Robert Richter

[permalink] [raw]
Subject: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id

From: Tirumalesh Chalamarla <[email protected]>

PCI stream ids need to consider pci bridge number to be unique on the
system. Using only bus and devfn can't do the trick in systems that
have multiple pci bridges.

Signed-off-by: Tirumalesh Chalamarla <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
drivers/irqchip/irq-gic-v3-its.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 9687f8afebff..e30b4de04c6c 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1186,7 +1186,7 @@ static int its_get_pci_alias(struct pci_dev *pdev, u16 alias, void *data)
{
struct its_pci_alias *dev_alias = data;

- dev_alias->dev_id = alias;
+ dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
if (pdev != dev_alias->pdev)
dev_alias->count += its_pci_msi_vec_count(dev_alias->pdev);

--
2.1.1

2015-05-03 20:50:19

by Robert Richter

[permalink] [raw]
Subject: [PATCH 2/4] arm64: gicv3: its: Add range check for number of allocated pages

From: Robert Richter <[email protected]>

The number of pages for the its table may exceed the maximum of 256.
Adding a range check and limitting the number to its maximum.

Based on a patch from Tirumalesh Chalamarla <[email protected]>.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/irqchip/irq-gic-v3-its.c | 11 ++++++++++-
include/linux/irqchip/arm-gic-v3.h | 1 +
2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index e30b4de04c6c..a2619a1d5bb3 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -810,6 +810,7 @@ static int its_alloc_tables(struct its_node *its)
u64 entry_size = GITS_BASER_ENTRY_SIZE(val);
int order = get_order(psz);
int alloc_size;
+ int alloc_pages;
u64 tmp;
void *base;

@@ -837,6 +838,14 @@ static int its_alloc_tables(struct its_node *its)
}

alloc_size = (1 << order) * PAGE_SIZE;
+ alloc_pages = (alloc_size / psz);
+ if (alloc_pages > GITS_BASER_PAGES_MAX) {
+ alloc_pages = GITS_BASER_PAGES_MAX;
+ order = get_order(GITS_BASER_PAGES_MAX * psz);
+ pr_warn("%s: Device Table too large, reduce its page order to %u (%u pages)\n",
+ its->msi_chip.of_node->full_name, order, alloc_pages);
+ }
+
base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
if (!base) {
err = -ENOMEM;
@@ -865,7 +874,7 @@ static int its_alloc_tables(struct its_node *its)
break;
}

- val |= (alloc_size / psz) - 1;
+ val |= alloc_pages - 1;

writeq_relaxed(val, its->base + GITS_BASER + i * 8);
tmp = readq_relaxed(its->base + GITS_BASER + i * 8);
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index ffbc034c8810..f28da189c4aa 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -229,6 +229,7 @@
#define GITS_BASER_PAGE_SIZE_16K (1UL << GITS_BASER_PAGE_SIZE_SHIFT)
#define GITS_BASER_PAGE_SIZE_64K (2UL << GITS_BASER_PAGE_SIZE_SHIFT)
#define GITS_BASER_PAGE_SIZE_MASK (3UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGES_MAX 256

#define GITS_BASER_TYPE_NONE 0
#define GITS_BASER_TYPE_DEVICE 1
--
2.1.1

2015-05-03 20:50:12

by Robert Richter

[permalink] [raw]
Subject: [PATCH 3/4] arm64: gicv3: its: Read typer register outside the loop

From: Robert Richter <[email protected]>

No need to read the typer register in the loop. Values do not change.

Signed-off-by: Robert Richter <[email protected]>
---
drivers/irqchip/irq-gic-v3-its.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index a2619a1d5bb3..916c4724c771 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -803,6 +803,8 @@ static int its_alloc_tables(struct its_node *its)
int psz = SZ_64K;
u64 shr = GITS_BASER_InnerShareable;
u64 cache = GITS_BASER_WaWb;
+ u64 typer = readq_relaxed(its->base + GITS_TYPER);
+ u32 ids = GITS_TYPER_DEVBITS(typer);

for (i = 0; i < GITS_BASER_NR_REGS; i++) {
u64 val = readq_relaxed(its->base + GITS_BASER + i * 8);
@@ -826,9 +828,6 @@ static int its_alloc_tables(struct its_node *its)
* For other tables, only allocate a single page.
*/
if (type == GITS_BASER_TYPE_DEVICE) {
- u64 typer = readq_relaxed(its->base + GITS_TYPER);
- u32 ids = GITS_TYPER_DEVBITS(typer);
-
order = get_order((1UL << ids) * entry_size);
if (order >= MAX_ORDER) {
order = MAX_ORDER - 1;
--
2.1.1

2015-05-03 20:50:27

by Robert Richter

[permalink] [raw]
Subject: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

From: Radha Mohan Chintakuntla <[email protected]>

In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
which is bigger than the allowed max order. So we are forcing it only in
case of 4KB page size.

Signed-off-by: Radha Mohan Chintakuntla <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1b8e97331ffb..c519f3777453 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -542,6 +542,7 @@ config XEN
config FORCE_MAX_ZONEORDER
int
default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
+ default "13" if (ARCH_THUNDER && !ARM64_64K_PAGES)
default "11"

menuconfig ARMV8_DEPRECATED
--
2.1.1

2015-05-05 10:53:50

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> From: Radha Mohan Chintakuntla <[email protected]>
>
> In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> which is bigger than the allowed max order. So we are forcing it only in
> case of 4KB page size.

Does this problem disappear if the ITS driver uses dma_alloc_coherent
instead? That would also allow us to remove the __flush_dcache_area abuse
from the driver.

Will

2015-05-11 09:48:15

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On 05.05.15 11:53:29, Will Deacon wrote:
> On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > From: Radha Mohan Chintakuntla <[email protected]>
> >
> > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > which is bigger than the allowed max order. So we are forcing it only in
> > case of 4KB page size.
>
> Does this problem disappear if the ITS driver uses dma_alloc_coherent
> instead? That would also allow us to remove the __flush_dcache_area abuse
> from the driver.

__get_free_pages() is also used internally in dma_alloc_coherent().

There is another case if the device brings dma mem with it. I am not
sure if it would be possible to assign some phys memory via devicetree
to the interrupt controller and then assign that range for its table
allocation.

Another option would be to allocate a hugepage. This would require
setting up hugepages during boottime. I need to figure out whether
that could work.

What about on the remaining 3 patches?

Thanks,

-Robert

2015-05-12 12:31:15

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> On 05.05.15 11:53:29, Will Deacon wrote:
> > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > From: Radha Mohan Chintakuntla <[email protected]>
> > >
> > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > which is bigger than the allowed max order. So we are forcing it only in
> > > case of 4KB page size.
> >
> > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > instead? That would also allow us to remove the __flush_dcache_area abuse
> > from the driver.
>
> __get_free_pages() is also used internally in dma_alloc_coherent().
>
> There is another case if the device brings dma mem with it. I am not
> sure if it would be possible to assign some phys memory via devicetree
> to the interrupt controller and then assign that range for its table
> allocation.
>
> Another option would be to allocate a hugepage. This would require
> setting up hugepages during boottime. I need to figure out whether
> that could work.
>
> What about on the remaining 3 patches?

Marc would be the best guy to review those, but he's on holiday for a couple
of weeks at the moment.

Will

2015-05-12 16:21:19

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

Will,

On 12.05.15 13:30:57, Will Deacon wrote:
> On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> > On 05.05.15 11:53:29, Will Deacon wrote:
> > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > > From: Radha Mohan Chintakuntla <[email protected]>
> > > >
> > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > > which is bigger than the allowed max order. So we are forcing it only in
> > > > case of 4KB page size.
> > >
> > > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > > instead? That would also allow us to remove the __flush_dcache_area abuse
> > > from the driver.
> >
> > __get_free_pages() is also used internally in dma_alloc_coherent().
> >
> > There is another case if the device brings dma mem with it. I am not
> > sure if it would be possible to assign some phys memory via devicetree
> > to the interrupt controller and then assign that range for its table
> > allocation.
> >
> > Another option would be to allocate a hugepage. This would require
> > setting up hugepages during boottime. I need to figure out whether
> > that could work.

For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
default pagesize) I see this different approaches:

* set FORCE_MAX_ZONEORDER to 13 as default,

* set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set,

* set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch),

* use hugepages if enabled (defconfig has the following options
enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might
work with current default kernel without changing defconfig
options),

* use devicetree to reserve mem for gicv3 (need to check ACPI).

Do you see any direction?

> > What about on the remaining 3 patches?
>
> Marc would be the best guy to review those, but he's on holiday for a couple
> of weeks at the moment.

Thanks for the note and your comments.

-Robert

2015-05-12 17:24:20

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> On 12.05.15 13:30:57, Will Deacon wrote:
> > On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> > > On 05.05.15 11:53:29, Will Deacon wrote:
> > > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > > > From: Radha Mohan Chintakuntla <[email protected]>
> > > > >
> > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > > > which is bigger than the allowed max order. So we are forcing it only in
> > > > > case of 4KB page size.
> > > >
> > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > > > instead? That would also allow us to remove the __flush_dcache_area abuse
> > > > from the driver.
> > >
> > > __get_free_pages() is also used internally in dma_alloc_coherent().
> > >
> > > There is another case if the device brings dma mem with it. I am not
> > > sure if it would be possible to assign some phys memory via devicetree
> > > to the interrupt controller and then assign that range for its table
> > > allocation.
> > >
> > > Another option would be to allocate a hugepage. This would require
> > > setting up hugepages during boottime. I need to figure out whether
> > > that could work.
>
> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> default pagesize) I see this different approaches:

16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
a sparse DeviceID space or both?

> * set FORCE_MAX_ZONEORDER to 13 as default,
>
> * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set,
>
> * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch),

I'm not hugely fond of these suggestions, as there's still no guarantee
that such a huge allocation is going to succeed and we end up bumping
MAX_ORDER for all platforms in defconfig if we enable THUNDER there.

> * use hugepages if enabled (defconfig has the following options
> enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might
> work with current default kernel without changing defconfig
> options),

I don't think hugepages help with DMA.

> * use devicetree to reserve mem for gicv3 (need to check ACPI).

Using a carveout like this might be the best bet. I assume the memory used
by the ITS can never be reclaimed by the syste (and therefore there's no
issue with wastage)?

> Do you see any direction?

Dunno, does CMA also require the MAX_ORDER bump?

Will

2015-05-12 17:47:16

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On 12.05.15 18:24:16, Will Deacon wrote:
> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > default pagesize) I see this different approaches:
>
> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> a sparse DeviceID space or both?
>
> > * set FORCE_MAX_ZONEORDER to 13 as default,
> >
> > * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set,
> >
> > * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch),
>
> I'm not hugely fond of these suggestions, as there's still no guarantee
> that such a huge allocation is going to succeed and we end up bumping
> MAX_ORDER for all platforms in defconfig if we enable THUNDER there.

I actually was expecting this...

> > * use hugepages if enabled (defconfig has the following options
> > enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might
> > work with current default kernel without changing defconfig
> > options),
>
> I don't think hugepages help with DMA.
>
> > * use devicetree to reserve mem for gicv3 (need to check ACPI).

I am quite a bit concerned letting firmware handle this. But if that
would solve it, fine.

> Using a carveout like this might be the best bet. I assume the memory used
> by the ITS can never be reclaimed by the syste (and therefore there's no
> issue with wastage)?
>
> > Do you see any direction?
>
> Dunno, does CMA also require the MAX_ORDER bump?

Looks promising at the first glance. Will look into it.

Thanks,

-Robert

2015-05-20 12:10:05

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id

On Sun, 3 May 2015 21:49:29 +0100
Robert Richter <[email protected]> wrote:

> From: Tirumalesh Chalamarla <[email protected]>
>
> PCI stream ids need to consider pci bridge number to be unique on the
> system. Using only bus and devfn can't do the trick in systems that
> have multiple pci bridges.
>
> Signed-off-by: Tirumalesh Chalamarla <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 9687f8afebff..e30b4de04c6c 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1186,7 +1186,7 @@ static int its_get_pci_alias(struct pci_dev *pdev, u16 alias, void *data)
> {
> struct its_pci_alias *dev_alias = data;
>
> - dev_alias->dev_id = alias;
> + dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
> if (pdev != dev_alias->pdev)
> dev_alias->count += its_pci_msi_vec_count(dev_alias->pdev);
>

This feels very scary. We're now assuming that the domain number will
always be presented to the doorbell. What guarantee do we have that
this is always the case, irrespective of the platform?

Also, domains have no PCI reality, they are a Linux thing. And they can
be "randomly" assigned, unless you force the domain in DT with a
linux,pci-domain property. This looks even more wrong, specially
considering ACPI.

It really feels like we need a way to describe how the BDF numbering is
augmented. We also need to guarantee that we get the actual bridge
number, as opposed to the domain number.

Thoughts?

M.
--
Without deviation from the norm, progress is not possible.

2015-05-20 12:12:59

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 2/4] arm64: gicv3: its: Add range check for number of allocated pages

On Sun, 3 May 2015 21:49:30 +0100
Robert Richter <[email protected]> wrote:

> From: Robert Richter <[email protected]>
>
> The number of pages for the its table may exceed the maximum of 256.
> Adding a range check and limitting the number to its maximum.
>
> Based on a patch from Tirumalesh Chalamarla <[email protected]>.
>
> Signed-off-by: Robert Richter <[email protected]>

Looks good to me.

Reviewed-by: Marc Zyngier <[email protected]>

M.

> ---
> drivers/irqchip/irq-gic-v3-its.c | 11 ++++++++++-
> include/linux/irqchip/arm-gic-v3.h | 1 +
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index e30b4de04c6c..a2619a1d5bb3 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -810,6 +810,7 @@ static int its_alloc_tables(struct its_node *its)
> u64 entry_size = GITS_BASER_ENTRY_SIZE(val);
> int order = get_order(psz);
> int alloc_size;
> + int alloc_pages;
> u64 tmp;
> void *base;
>
> @@ -837,6 +838,14 @@ static int its_alloc_tables(struct its_node *its)
> }
>
> alloc_size = (1 << order) * PAGE_SIZE;
> + alloc_pages = (alloc_size / psz);
> + if (alloc_pages > GITS_BASER_PAGES_MAX) {
> + alloc_pages = GITS_BASER_PAGES_MAX;
> + order = get_order(GITS_BASER_PAGES_MAX * psz);
> + pr_warn("%s: Device Table too large, reduce its page order to %u (%u pages)\n",
> + its->msi_chip.of_node->full_name, order, alloc_pages);
> + }
> +
> base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> if (!base) {
> err = -ENOMEM;
> @@ -865,7 +874,7 @@ static int its_alloc_tables(struct its_node *its)
> break;
> }
>
> - val |= (alloc_size / psz) - 1;
> + val |= alloc_pages - 1;
>
> writeq_relaxed(val, its->base + GITS_BASER + i * 8);
> tmp = readq_relaxed(its->base + GITS_BASER + i * 8);
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index ffbc034c8810..f28da189c4aa 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -229,6 +229,7 @@
> #define GITS_BASER_PAGE_SIZE_16K (1UL << GITS_BASER_PAGE_SIZE_SHIFT)
> #define GITS_BASER_PAGE_SIZE_64K (2UL << GITS_BASER_PAGE_SIZE_SHIFT)
> #define GITS_BASER_PAGE_SIZE_MASK (3UL << GITS_BASER_PAGE_SIZE_SHIFT)
> +#define GITS_BASER_PAGES_MAX 256
>
> #define GITS_BASER_TYPE_NONE 0
> #define GITS_BASER_TYPE_DEVICE 1



--
Without deviation from the norm, progress is not possible.

2015-05-20 12:13:46

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 3/4] arm64: gicv3: its: Read typer register outside the loop

On Sun, 3 May 2015 21:49:31 +0100
Robert Richter <[email protected]> wrote:

> From: Robert Richter <[email protected]>
>
> No need to read the typer register in the loop. Values do not change.
>
> Signed-off-by: Robert Richter <[email protected]>

Fair enough.

Acked-by: Marc Zyngier <[email protected]>

M.

> ---
> drivers/irqchip/irq-gic-v3-its.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index a2619a1d5bb3..916c4724c771 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -803,6 +803,8 @@ static int its_alloc_tables(struct its_node *its)
> int psz = SZ_64K;
> u64 shr = GITS_BASER_InnerShareable;
> u64 cache = GITS_BASER_WaWb;
> + u64 typer = readq_relaxed(its->base + GITS_TYPER);
> + u32 ids = GITS_TYPER_DEVBITS(typer);
>
> for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> u64 val = readq_relaxed(its->base + GITS_BASER + i * 8);
> @@ -826,9 +828,6 @@ static int its_alloc_tables(struct its_node *its)
> * For other tables, only allocate a single page.
> */
> if (type == GITS_BASER_TYPE_DEVICE) {
> - u64 typer = readq_relaxed(its->base + GITS_TYPER);
> - u32 ids = GITS_TYPER_DEVBITS(typer);
> -
> order = get_order((1UL << ids) * entry_size);
> if (order >= MAX_ORDER) {
> order = MAX_ORDER - 1;



--
Without deviation from the norm, progress is not possible.

2015-05-20 12:20:35

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On Tue, 12 May 2015 18:24:16 +0100
Will Deacon <[email protected]> wrote:

> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > On 12.05.15 13:30:57, Will Deacon wrote:
> > > On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> > > > On 05.05.15 11:53:29, Will Deacon wrote:
> > > > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > > > > From: Radha Mohan Chintakuntla <[email protected]>
> > > > > >
> > > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > > > > which is bigger than the allowed max order. So we are forcing it only in
> > > > > > case of 4KB page size.
> > > > >
> > > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > > > > instead? That would also allow us to remove the __flush_dcache_area abuse
> > > > > from the driver.
> > > >
> > > > __get_free_pages() is also used internally in dma_alloc_coherent().
> > > >
> > > > There is another case if the device brings dma mem with it. I am not
> > > > sure if it would be possible to assign some phys memory via devicetree
> > > > to the interrupt controller and then assign that range for its table
> > > > allocation.
> > > >
> > > > Another option would be to allocate a hugepage. This would require
> > > > setting up hugepages during boottime. I need to figure out whether
> > > > that could work.
> >
> > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > default pagesize) I see this different approaches:
>
> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> a sparse DeviceID space or both?

That's probably due to the sparseness of the DeviceID space. With some
form of bridge number encoded on top of the BFD number, the device
table is enormous, and I don't see a nice way to avoid it...

M.
--
Without deviation from the norm, progress is not possible.

2015-05-20 12:32:28

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On 20.05.15 13:22:13, Marc Zyngier wrote:
> On Tue, 12 May 2015 18:24:16 +0100
> Will Deacon <[email protected]> wrote:
> > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > > On 12.05.15 13:30:57, Will Deacon wrote:

> > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > > default pagesize) I see this different approaches:
> >
> > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> > a sparse DeviceID space or both?
>
> That's probably due to the sparseness of the DeviceID space. With some
> form of bridge number encoded on top of the BFD number, the device
> table is enormous, and I don't see a nice way to avoid it...

Right. At the momement out of 21 bits (16MB) we currently have 2 spare
bits, which reduces the actually size used to 4MB. Though, for the
current cpu model we can reduce it at least to 8MB total.

I will come up with an additional patch setting this to 8MB.

As said before, I also write on a patch to use CMA.

Thanks,

-Robert

2015-05-20 12:48:47

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id

Mark,

thanks for review, also of the other patches of this series.

See below

On 20.05.15 13:11:38, Marc Zyngier wrote:
> > - dev_alias->dev_id = alias;
> > + dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;

> This feels very scary. We're now assuming that the domain number will
> always be presented to the doorbell. What guarantee do we have that
> this is always the case, irrespective of the platform?
>
> Also, domains have no PCI reality, they are a Linux thing. And they can
> be "randomly" assigned, unless you force the domain in DT with a
> linux,pci-domain property. This looks even more wrong, specially
> considering ACPI.

The main problem here is that device ids (32 bits) are system
specific. Since we have more than one PCI root complex we need the
upper 16 bits in the devid for mapping. Using pci_domain_nr for this
fits our needs for now and shouldn't affect systems with a single RC
only as the domain nr is zero then.

The domain number is incremented during initialization beginnig with
zero and the order of it is fixed since it is taken from DT or ACPI
tables. So we have full controll of it. I don't see issues here.

> It really feels like we need a way to describe how the BDF numbering is
> augmented. We also need to guarantee that we get the actual bridge
> number, as opposed to the domain number.

But true, the obove is just intermediate. In the end we need some sort
of handler that is setup during cpu initialization that registers a
callback for the gic to determine the device id of that paricular
system.

-Robert

2015-05-20 16:49:06

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
> On 20.05.15 13:22:13, Marc Zyngier wrote:
> > On Tue, 12 May 2015 18:24:16 +0100
> > Will Deacon <[email protected]> wrote:
> > > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > > > On 12.05.15 13:30:57, Will Deacon wrote:
>
> > > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > > > default pagesize) I see this different approaches:
> > >
> > > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> > > a sparse DeviceID space or both?
> >
> > That's probably due to the sparseness of the DeviceID space. With some
> > form of bridge number encoded on top of the BFD number, the device
> > table is enormous, and I don't see a nice way to avoid it...
>
> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
> bits, which reduces the actually size used to 4MB. Though, for the
> current cpu model we can reduce it at least to 8MB total.
>
> I will come up with an additional patch setting this to 8MB.
>
> As said before, I also write on a patch to use CMA.

Can we not reserve a chunk of memory and pass the information to the
kernel via DT (/memreserve/ and a new GIC-specific binding)?

--
Catalin

2015-05-21 08:35:56

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On 20/05/15 17:48, Catalin Marinas wrote:
> On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
>> On 20.05.15 13:22:13, Marc Zyngier wrote:
>>> On Tue, 12 May 2015 18:24:16 +0100
>>> Will Deacon <[email protected]> wrote:
>>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
>>>>> On 12.05.15 13:30:57, Will Deacon wrote:
>>
>>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
>>>>> default pagesize) I see this different approaches:
>>>>
>>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
>>>> a sparse DeviceID space or both?
>>>
>>> That's probably due to the sparseness of the DeviceID space. With some
>>> form of bridge number encoded on top of the BFD number, the device
>>> table is enormous, and I don't see a nice way to avoid it...
>>
>> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
>> bits, which reduces the actually size used to 4MB. Though, for the
>> current cpu model we can reduce it at least to 8MB total.
>>
>> I will come up with an additional patch setting this to 8MB.
>>
>> As said before, I also write on a patch to use CMA.
>
> Can we not reserve a chunk of memory and pass the information to the
> kernel via DT (/memreserve/ and a new GIC-specific binding)?

That would have to be done on a per-table basis then. And how would that
work with ACPI? I don't think the ACPI ITS table specifies anything in
that respect.

We're just facing the horrible reality that linear tables are not very
well suited to sparse addressing. Nobody copied the VAX MMU model for a
reason... until now.

M.
--
Jazz is not dead. It just smells funny...

2015-05-21 12:13:26

by Robert Richter

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

On 21.05.15 09:35:47, Marc Zyngier wrote:
> On 20/05/15 17:48, Catalin Marinas wrote:
> > On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
> >> On 20.05.15 13:22:13, Marc Zyngier wrote:
> >>> On Tue, 12 May 2015 18:24:16 +0100
> >>> Will Deacon <[email protected]> wrote:
> >>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> >>>>> On 12.05.15 13:30:57, Will Deacon wrote:
> >>
> >>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> >>>>> default pagesize) I see this different approaches:
> >>>>
> >>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> >>>> a sparse DeviceID space or both?
> >>>
> >>> That's probably due to the sparseness of the DeviceID space. With some
> >>> form of bridge number encoded on top of the BFD number, the device
> >>> table is enormous, and I don't see a nice way to avoid it...
> >>
> >> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
> >> bits, which reduces the actually size used to 4MB. Though, for the
> >> current cpu model we can reduce it at least to 8MB total.
> >>
> >> I will come up with an additional patch setting this to 8MB.
> >>
> >> As said before, I also write on a patch to use CMA.
> >
> > Can we not reserve a chunk of memory and pass the information to the
> > kernel via DT (/memreserve/ and a new GIC-specific binding)?
>
> That would have to be done on a per-table basis then. And how would that
> work with ACPI? I don't think the ACPI ITS table specifies anything in
> that respect.
>
> We're just facing the horrible reality that linear tables are not very
> well suited to sparse addressing. Nobody copied the VAX MMU model for a
> reason... until now.

We could still fall back to mem alloc if the DT or ACPI does not
provide a base address for the table.

For know I would prefer to just implement mem allocation with CMA.

-Robert

2015-05-21 12:34:14

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

[off the list]

On 21/05/15 13:13, Robert Richter wrote:
> On 21.05.15 09:35:47, Marc Zyngier wrote:
>> On 20/05/15 17:48, Catalin Marinas wrote:
>>> On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
>>>> On 20.05.15 13:22:13, Marc Zyngier wrote:
>>>>> On Tue, 12 May 2015 18:24:16 +0100
>>>>> Will Deacon <[email protected]> wrote:
>>>>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
>>>>>>> On 12.05.15 13:30:57, Will Deacon wrote:
>>>>
>>>>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
>>>>>>> default pagesize) I see this different approaches:
>>>>>>
>>>>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
>>>>>> a sparse DeviceID space or both?
>>>>>
>>>>> That's probably due to the sparseness of the DeviceID space. With some
>>>>> form of bridge number encoded on top of the BFD number, the device
>>>>> table is enormous, and I don't see a nice way to avoid it...
>>>>
>>>> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
>>>> bits, which reduces the actually size used to 4MB. Though, for the
>>>> current cpu model we can reduce it at least to 8MB total.
>>>>
>>>> I will come up with an additional patch setting this to 8MB.
>>>>
>>>> As said before, I also write on a patch to use CMA.
>>>
>>> Can we not reserve a chunk of memory and pass the information to the
>>> kernel via DT (/memreserve/ and a new GIC-specific binding)?
>>
>> That would have to be done on a per-table basis then. And how would that
>> work with ACPI? I don't think the ACPI ITS table specifies anything in
>> that respect.
>>
>> We're just facing the horrible reality that linear tables are not very
>> well suited to sparse addressing. Nobody copied the VAX MMU model for a
>> reason... until now.
>
> We could still fall back to mem alloc if the DT or ACPI does not
> provide a base address for the table.
>
> For know I would prefer to just implement mem allocation with CMA.

I suppose your ITS implementation doesn't have support for the indirect
tables (where bit 62 of GITS_BASERn can be 1)? If it did, that would
solve most of the issues.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2015-05-22 08:26:36

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id

On 20/05/15 13:48, Robert Richter wrote:
> Mark,
>
> thanks for review, also of the other patches of this series.
>
> See below
>
> On 20.05.15 13:11:38, Marc Zyngier wrote:
>>> - dev_alias->dev_id = alias;
>>> + dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
>
>> This feels very scary. We're now assuming that the domain number will
>> always be presented to the doorbell. What guarantee do we have that
>> this is always the case, irrespective of the platform?
>>
>> Also, domains have no PCI reality, they are a Linux thing. And they can
>> be "randomly" assigned, unless you force the domain in DT with a
>> linux,pci-domain property. This looks even more wrong, specially
>> considering ACPI.
>
> The main problem here is that device ids (32 bits) are system
> specific. Since we have more than one PCI root complex we need the
> upper 16 bits in the devid for mapping. Using pci_domain_nr for this
> fits our needs for now and shouldn't affect systems with a single RC
> only as the domain nr is zero then.
>
> The domain number is incremented during initialization beginnig with
> zero and the order of it is fixed since it is taken from DT or ACPI
> tables. So we have full controll of it. I don't see issues here.

This may match what you have on ThunderX (as long as the kernel doesn't
adopt another behaviour when allocating the domain number). But other
platforms may have a completely different numbering, which will mess
them up entirely.

>> It really feels like we need a way to describe how the BDF numbering is
>> augmented. We also need to guarantee that we get the actual bridge
>> number, as opposed to the domain number.
>
> But true, the obove is just intermediate. In the end we need some sort
> of handler that is setup during cpu initialization that registers a
> callback for the gic to determine the device id of that paricular
> system.

I don't really like the idea of a callback from the GIC - I'd prefer it
to be standalone, and rely on the topology information to build the
DeviceID. Mark Rutland had some ideas for DT (he posted an RFC a while
ago), maybe it would be good to get back to that and find out what we
can do. ACPI should also have similar information (IORT?).

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2015-05-23 01:31:30

by Chalamarla, Tirumalesh

[permalink] [raw]
Subject: Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id


> On May 22, 2015, at 1:26 AM, Marc Zyngier <[email protected]> wrote:
>
> On 20/05/15 13:48, Robert Richter wrote:
>> Mark,
>>
>> thanks for review, also of the other patches of this series.
>>
>> See below
>>
>> On 20.05.15 13:11:38, Marc Zyngier wrote:
>>>> - dev_alias->dev_id = alias;
>>>> + dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
>>
>>> This feels very scary. We're now assuming that the domain number will
>>> always be presented to the doorbell. What guarantee do we have that
>>> this is always the case, irrespective of the platform?
>>>
>>> Also, domains have no PCI reality, they are a Linux thing. And they can
>>> be "randomly" assigned, unless you force the domain in DT with a
>>> linux,pci-domain property. This looks even more wrong, specially
>>> considering ACPI.
>>
>> The main problem here is that device ids (32 bits) are system
>> specific. Since we have more than one PCI root complex we need the
>> upper 16 bits in the devid for mapping. Using pci_domain_nr for this
>> fits our needs for now and shouldn't affect systems with a single RC
>> only as the domain nr is zero then.
>>
>> The domain number is incremented during initialization beginnig with
>> zero and the order of it is fixed since it is taken from DT or ACPI
>> tables. So we have full controll of it. I don't see issues here.
>
> This may match what you have on ThunderX (as long as the kernel doesn't
> adopt another behaviour when allocating the domain number). But other
> platforms may have a completely different numbering, which will mess
> them up entirely.
>
>>> It really feels like we need a way to describe how the BDF numbering is
>>> augmented. We also need to guarantee that we get the actual bridge
>>> number, as opposed to the domain number.
>>
>> But true, the obove is just intermediate. In the end we need some sort
>> of handler that is setup during cpu initialization that registers a
>> callback for the gic to determine the device id of that paricular
>> system.
>
> I don't really like the idea of a callback from the GIC - I'd prefer it
> to be standalone, and rely on the topology information to build the
> DeviceID. Mark Rutland had some ideas for DT (he posted an RFC a while
> ago), maybe it would be good to get back to that and find out what we
> can do. ACPI should also have similar information (IORT?).
>

How can some one pass this from DT, especially in GIC entry. i still think it is bus owner responsibility and call back is better idea.
but if some one has a better idea for DT and ACPI, we are fine as long as it works on ThunderX.

Thanks,
Tirumalesh.


> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...

2015-05-25 10:38:35

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id

On Fri, 22 May 2015 23:57:40 +0100
"Chalamarla, Tirumalesh" <[email protected]>
wrote:

Hi Tirumalesh,

>
> > On May 22, 2015, at 1:26 AM, Marc Zyngier <[email protected]> wrote:
> >
> > On 20/05/15 13:48, Robert Richter wrote:
> >> Mark,
> >>
> >> thanks for review, also of the other patches of this series.
> >>
> >> See below
> >>
> >> On 20.05.15 13:11:38, Marc Zyngier wrote:
> >>>> - dev_alias->dev_id = alias;
> >>>> + dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
> >>
> >>> This feels very scary. We're now assuming that the domain number will
> >>> always be presented to the doorbell. What guarantee do we have that
> >>> this is always the case, irrespective of the platform?
> >>>
> >>> Also, domains have no PCI reality, they are a Linux thing. And they can
> >>> be "randomly" assigned, unless you force the domain in DT with a
> >>> linux,pci-domain property. This looks even more wrong, specially
> >>> considering ACPI.
> >>
> >> The main problem here is that device ids (32 bits) are system
> >> specific. Since we have more than one PCI root complex we need the
> >> upper 16 bits in the devid for mapping. Using pci_domain_nr for this
> >> fits our needs for now and shouldn't affect systems with a single RC
> >> only as the domain nr is zero then.
> >>
> >> The domain number is incremented during initialization beginnig with
> >> zero and the order of it is fixed since it is taken from DT or ACPI
> >> tables. So we have full controll of it. I don't see issues here.
> >
> > This may match what you have on ThunderX (as long as the kernel doesn't
> > adopt another behaviour when allocating the domain number). But other
> > platforms may have a completely different numbering, which will mess
> > them up entirely.
> >
> >>> It really feels like we need a way to describe how the BDF numbering is
> >>> augmented. We also need to guarantee that we get the actual bridge
> >>> number, as opposed to the domain number.
> >>
> >> But true, the obove is just intermediate. In the end we need some sort
> >> of handler that is setup during cpu initialization that registers a
> >> callback for the gic to determine the device id of that paricular
> >> system.
> >
> > I don't really like the idea of a callback from the GIC - I'd prefer it
> > to be standalone, and rely on the topology information to build the
> > DeviceID. Mark Rutland had some ideas for DT (he posted an RFC a while
> > ago), maybe it would be good to get back to that and find out what we
> > can do. ACPI should also have similar information (IORT?).
> >
>
> How can some one pass this from DT, especially in GIC entry. i still
> think it is bus owner responsibility and call back is better idea.
> but if some one has a better idea for DT and ACPI, we are fine as
> long as it works on ThunderX.

A callback would have to be bus-specific, and depends from the observer
of the access. There is strictly no guarantee that a single write from
the device is performed using the same ID to the IOMMU and to the MSI
doorbell. Actually, they are very likely to be different. A generic
callback would have to know about the point where this access is
observed, and expressing this is a nightmare.

Also, I'm really opposed to having platform-specific code that has for
sole purpose to describe the hardware. This is why we have DT (and to a
lesser extent ACPI). We've been there on 32bit, and learned our lesson.
It doesn't scale, it leads to a bunch of hacks in all corners, and I
don't feel like being on the receiving end of something like this.

I really suggest you look at Mark's suggestion:


http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/333199.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/341584.html

because so far, this is the only proposal that makes any sense to me in
the long run. Feel free to comment on it and help us making something
that also work for your favorite SoC.

Thanks,

M.
--
Without deviation from the norm, progress is not possible.