2019-09-16 10:51:24

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH v2 0/2] powerpc/mm: Conditionally call H_BLOCK_REMOVE

Since the commit ba2dd8a26baa ("powerpc/pseries/mm: call H_BLOCK_REMOVE"),
the call to H_BLOCK_REMOVE is always done if the feature is exhibited.

However, the hypervisor may not support all the block size for the hcall
H_BLOCK_REMOVE depending on the segment base page size and actual page
size.

When unsupported block size is used, the hcall H_BLOCK_REMOVE is returning
H_PARAM, which is triggering a BUG_ON check leading to a panic like this:

The PAPR document specifies the TLB Block Invalidate Characteristics which
tells for each couple segment base page size, actual page size, the size of
the block the hcall H_BLOCK_REMOVE is supporting.

Supporting various block sizes doesn't seem needed at that time since all
systems I was able to play with was supporting an 8 addresses block size,
which is the maximum through the hcall, or none at all. Supporting various
size would complexify the algorithm in call_block_remove() so unless this
is required, this is not done.

In the case of block size different from 8, a warning message is displayed
at boot time and that block size will be ignored checking for the
H_BLOCK_REMOVE support.

Due to the minimal amount of hardware showing a limited set of
H_BLOCK_REMOVE supported page size, I don't think there is a need to push
this series to the stable mailing list.

The first patch is reading the characteristic through the hcall
ibm,get-system-parameter and record the supported block size for each page
size. The second patch is changing the check used to detect the
H_BLOCK_REMOVE availability to take care of the base page size and page
size couple.

Changes since V1:

- Remove penc initialisation, this is already done in
mmu_psize_set_default_penc()
- Add details on the TLB Block Invalidate Characteristics's buffer format
- Introduce #define instead of using direct numerical values
- Function reading the characteristics is now directly called from
pSeries_setup_arch()
- The characteristics are now stored in a dedciated table static to lpar.c

Laurent Dufour (2):
powperc/mm: read TLB Block Invalidate Characteristics
powerpc/mm: call H_BLOCK_REMOVE when supported

.../include/asm/book3s/64/tlbflush-hash.h | 1 +
arch/powerpc/platforms/pseries/lpar.c | 173 +++++++++++++++++-
arch/powerpc/platforms/pseries/setup.c | 1 +
3 files changed, 173 insertions(+), 2 deletions(-)

--
2.23.0


2019-09-16 17:41:59

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH v2 1/2] powperc/mm: read TLB Block Invalidate Characteristics

The PAPR document specifies the TLB Block Invalidate Characteristics which
tells for each couple segment base page size, actual page size, the size of
the block the hcall H_BLOCK_REMOVE is supporting.

These characteristics are loaded at boot time in a new table hblkr_size.
The table is appart the mmu_psize_def because this is specific to the
pseries architecture.

A new init service, pseries_lpar_read_hblkr_characteristics() is added to
read the characteristics. In that function, the size of the buffer is set
to twice the number of known page size, plus 10 bytes to ensure we have
enough place. This new init function is called from pSeries_setup_arch().

Signed-off-by: Laurent Dufour <[email protected]>
---
.../include/asm/book3s/64/tlbflush-hash.h | 1 +
arch/powerpc/platforms/pseries/lpar.c | 138 ++++++++++++++++++
arch/powerpc/platforms/pseries/setup.c | 1 +
3 files changed, 140 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
index 64d02a704bcb..74155cc8cf89 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -117,4 +117,5 @@ extern void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
unsigned long end);
extern void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd,
unsigned long addr);
+extern void pseries_lpar_read_hblkr_characteristics(void);
#endif /* _ASM_POWERPC_BOOK3S_64_TLBFLUSH_HASH_H */
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 36b846f6e74e..98a5c2ff9a0b 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -56,6 +56,15 @@ EXPORT_SYMBOL(plpar_hcall);
EXPORT_SYMBOL(plpar_hcall9);
EXPORT_SYMBOL(plpar_hcall_norets);

+/*
+ * H_BLOCK_REMOVE supported block size for this page size in segment who's base
+ * page size is that page size.
+ *
+ * The first index is the segment base page size, the second one is the actual
+ * page size.
+ */
+static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT];
+
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
static u8 dtl_mask = DTL_LOG_PREEMPT;
#else
@@ -1311,6 +1320,135 @@ static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
(void)call_block_remove(pix, param, true);
}

+/*
+ * TLB Block Invalidate Characteristics These characteristics define the size of
+ * the block the hcall H_BLOCK_REMOVE is able to process for each couple segment
+ * base page size, actual page size.
+ *
+ * The ibm,get-system-parameter properties is returning a buffer with the
+ * following layout:
+ *
+ * [ 2 bytes size of the RTAS buffer (without these 2 bytes) ]
+ * -----------------
+ * TLB Block Invalidate Specifiers:
+ * [ 1 byte LOG base 2 of the TLB invalidate block size being specified ]
+ * [ 1 byte Number of page sizes (N) that are supported for the specified
+ * TLB invalidate block size ]
+ * [ 1 byte Encoded segment base page size and actual page size
+ * MSB=0 means 4k segment base page size and actual page size
+ * MSB=1 the penc value in mmu_psize_def ]
+ * ...
+ * -----------------
+ * Next TLB Block Invalidate Specifiers...
+ * -----------------
+ * [ 0 ]
+ */
+static inline void __init set_hblk_bloc_size(int bpsize, int psize,
+ unsigned int block_size)
+{
+ if (block_size > hblkr_size[bpsize][psize])
+ hblkr_size[bpsize][psize] = block_size;
+}
+
+/*
+ * Decode the Encoded segment base page size and actual page size.
+ * PAPR specifies with bit 0 as MSB:
+ * - bit 0 is the L bit
+ * - bits 2-7 are the penc value
+ * If the L bit is 0, this means 4K segment base page size and actual page size
+ * otherwise the penc value should be readed.
+ */
+#define HBLKR_L_BIT_MASK 0x80
+#define HBLKR_PENC_MASK 0x3f
+static inline void __init check_lp_set_hblk(unsigned int lp,
+ unsigned int block_size)
+{
+ unsigned int bpsize, psize;
+
+
+ /* First, check the L bit, if not set, this means 4K */
+ if ((lp & HBLKR_L_BIT_MASK) == 0) {
+ set_hblk_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size);
+ return;
+ }
+
+ lp &= HBLKR_PENC_MASK;
+ for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++) {
+ struct mmu_psize_def *def = &mmu_psize_defs[bpsize];
+
+ for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
+ if (def->penc[psize] == lp) {
+ set_hblk_bloc_size(bpsize, psize, block_size);
+ return;
+ }
+ }
+ }
+}
+
+#define SPLPAR_TLB_BIC_TOKEN 50
+#define SPLPAR_TLB_BIC_MAXLENGTH (MMU_PAGE_COUNT*2 + 10)
+void __init pseries_lpar_read_hblkr_characteristics(void)
+{
+ int call_status;
+ unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
+ int len, idx, bpsize;
+
+ if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+ pr_info("H_BLOCK_REMOVE is not supported");
+ return;
+ }
+
+ memset(local_buffer, 0, SPLPAR_TLB_BIC_MAXLENGTH);
+
+ spin_lock(&rtas_data_buf_lock);
+ memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
+ call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
+ NULL,
+ SPLPAR_TLB_BIC_TOKEN,
+ __pa(rtas_data_buf),
+ RTAS_DATA_BUF_SIZE);
+ memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
+ local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
+ spin_unlock(&rtas_data_buf_lock);
+
+ if (call_status != 0) {
+ pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
+ __FILE__, __func__, call_status);
+ return;
+ }
+
+ /*
+ * The first two (2) bytes of the data in the buffer are the length of
+ * the returned data, not counting these first two (2) bytes.
+ */
+ len = local_buffer[0] * 256 + local_buffer[1] + 2;
+ if (len >= SPLPAR_TLB_BIC_MAXLENGTH) {
+ pr_warn("%s too large returned buffer %d", __func__, len);
+ return;
+ }
+
+ idx = 2;
+ while (idx < len) {
+ unsigned int block_size = local_buffer[idx++];
+ unsigned int npsize;
+
+ if (!block_size)
+ break;
+
+ block_size = 1 << block_size;
+
+ for (npsize = local_buffer[idx++]; npsize > 0; npsize--)
+ check_lp_set_hblk((unsigned int) local_buffer[idx++],
+ block_size);
+ }
+
+ for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
+ for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
+ if (hblkr_size[bpsize][idx])
+ pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d block size:%d",
+ bpsize, idx, hblkr_size[bpsize][idx]);
+}
+
/*
* Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
* lock.
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index f8adcd0e4589..015b7ba13ee4 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -744,6 +744,7 @@ static void __init pSeries_setup_arch(void)

pseries_setup_rfi_flush();
setup_stf_barrier();
+ pseries_lpar_read_hblkr_characteristics();

/* By default, only probe PCI (can be overridden by rtas_pci) */
pci_add_flags(PCI_PROBE_ONLY);
--
2.23.0

2019-09-16 17:45:45

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH v2 2/2] powerpc/mm: call H_BLOCK_REMOVE when supported

Now we do not call _BLOCK_REMOVE all the time when the feature is
exhibited.

Depending on the hardware and the hypervisor, the hcall H_BLOCK_REMOVE may
not be able to process all the page size for a segment base page size, as
reported by the TLB Invalidate Characteristics.o

For each couple base segment page size and actual page size, this
characteristic is telling the size of the block the hcall is supporting.

Due to the involve complexity in do_block_remove() and call_block_remove(),
and the fact currently a 8 size block is returned by the hypervisor, we
are only supporting 8 size block to the H_BLOCK_REMOVE hcall.

Furthermore a warning message is displayed at boot time in the case of an
unsupported block size.

In order to identify this limitation easily in the code,a local define
HBLKR_SUPPORTED_SIZE defining the currently supported block size, and a
dedicated checking helper is_supported_hlbkr() are introduced.

For regular pages and hugetlb, the assumption is made that the page size is
equal to the base page size. For THP the page size is assumed to be 16M.

Signed-off-by: Laurent Dufour <[email protected]>
---
arch/powerpc/platforms/pseries/lpar.c | 35 +++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 98a5c2ff9a0b..e2ad9b3b1097 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -65,6 +65,13 @@ EXPORT_SYMBOL(plpar_hcall_norets);
*/
static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT];

+/*
+ * Due to the involved complexity, and that the current hypervisor is only
+ * returning this value or 0, we are limiting the support of the H_BLOCK_REMOVE
+ * buffer size to 8 size block.
+ */
+#define HBLKR_SUPPORTED_BLOCK_SIZE 8
+
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
static u8 dtl_mask = DTL_LOG_PREEMPT;
#else
@@ -993,6 +1000,15 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL

+/*
+ * Returned true if we are supporting this block size for the specified segment
+ * base page size and actual page size.
+ */
+static inline bool is_supported_hlbkr(int bpsize, int psize)
+{
+ return (hblkr_size[bpsize][psize] == HBLKR_SUPPORTED_BLOCK_SIZE);
+}
+
/**
* H_BLOCK_REMOVE caller.
* @idx should point to the latest @param entry set with a PTEX.
@@ -1152,7 +1168,11 @@ static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
if (lock_tlbie)
spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);

- if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
+ /*
+ * Assuming THP size is 16M, and we only support 8 bytes size buffer
+ * for the momment.
+ */
+ if (is_supported_hlbkr(psize, MMU_PAGE_16M))
hugepage_block_invalidate(slot, vpn, count, psize, ssize);
else
hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
@@ -1437,6 +1457,14 @@ void __init pseries_lpar_read_hblkr_characteristics(void)

block_size = 1 << block_size;

+ /*
+ * If the block size is not supported by the kernel, report it,
+ * but continue reading the values, and the following blocks.
+ */
+ if (block_size != HBLKR_SUPPORTED_BLOCK_SIZE)
+ pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
+ block_size);
+
for (npsize = local_buffer[idx++]; npsize > 0; npsize--)
check_lp_set_hblk((unsigned int) local_buffer[idx++],
block_size);
@@ -1468,7 +1496,10 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
if (lock_tlbie)
spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);

- if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+ /*
+ * Currently, we only support 8 bytes size buffer in do_block_remove().
+ */
+ if (is_supported_hlbkr(batch->psize, batch->psize)) {
do_block_remove(number, batch, param);
goto out;
}
--
2.23.0

2019-09-17 18:35:16

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] powerpc/mm: Conditionally call H_BLOCK_REMOVE

On 9/16/19 3:25 PM, Laurent Dufour wrote:
> Since the commit ba2dd8a26baa ("powerpc/pseries/mm: call H_BLOCK_REMOVE"),
> the call to H_BLOCK_REMOVE is always done if the feature is exhibited.
>
> However, the hypervisor may not support all the block size for the hcall
> H_BLOCK_REMOVE depending on the segment base page size and actual page
> size.
>
> When unsupported block size is used, the hcall H_BLOCK_REMOVE is returning
> H_PARAM, which is triggering a BUG_ON check leading to a panic like this:
>
> The PAPR document specifies the TLB Block Invalidate Characteristics which
> tells for each couple segment base page size, actual page size, the size of
> the block the hcall H_BLOCK_REMOVE is supporting.
>
> Supporting various block sizes doesn't seem needed at that time since all
> systems I was able to play with was supporting an 8 addresses block size,
> which is the maximum through the hcall, or none at all. Supporting various
> size would complexify the algorithm in call_block_remove() so unless this
> is required, this is not done.
>
> In the case of block size different from 8, a warning message is displayed
> at boot time and that block size will be ignored checking for the
> H_BLOCK_REMOVE support.
>
> Due to the minimal amount of hardware showing a limited set of
> H_BLOCK_REMOVE supported page size, I don't think there is a need to push
> this series to the stable mailing list.
>
> The first patch is reading the characteristic through the hcall
> ibm,get-system-parameter and record the supported block size for each page
> size. The second patch is changing the check used to detect the
> H_BLOCK_REMOVE availability to take care of the base page size and page
> size couple.
>
> Changes since V1:
>
> - Remove penc initialisation, this is already done in
> mmu_psize_set_default_penc()
> - Add details on the TLB Block Invalidate Characteristics's buffer format
> - Introduce #define instead of using direct numerical values
> - Function reading the characteristics is now directly called from
> pSeries_setup_arch()
> - The characteristics are now stored in a dedciated table static to lpar.c
>

you can use

Reviewed-by: Aneesh Kumar K.V <[email protected]>

for the series.

-aneesh

2019-09-18 13:46:12

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powperc/mm: read TLB Block Invalidate Characteristics

Hi Laurent,

Comments below ...

Laurent Dufour <[email protected]> writes:
> The PAPR document specifies the TLB Block Invalidate Characteristics which
> tells for each couple segment base page size, actual page size, the size of
^
"pair of" again

> the block the hcall H_BLOCK_REMOVE is supporting.
^
"supports" is fine.

> These characteristics are loaded at boot time in a new table hblkr_size.
> The table is appart the mmu_psize_def because this is specific to the
^
"separate from"

> pseries architecture.
^
platform
>
> A new init service, pseries_lpar_read_hblkr_characteristics() is added to
^
function

> read the characteristics. In that function, the size of the buffer is set
> to twice the number of known page size, plus 10 bytes to ensure we have
> enough place. This new init function is called from pSeries_setup_arch().
^
space
>
> Signed-off-by: Laurent Dufour <[email protected]>
> ---
> .../include/asm/book3s/64/tlbflush-hash.h | 1 +
> arch/powerpc/platforms/pseries/lpar.c | 138 ++++++++++++++++++
> arch/powerpc/platforms/pseries/setup.c | 1 +
> 3 files changed, 140 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> index 64d02a704bcb..74155cc8cf89 100644
> --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
> @@ -117,4 +117,5 @@ extern void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
> unsigned long end);
> extern void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd,
> unsigned long addr);
> +extern void pseries_lpar_read_hblkr_characteristics(void);

This doesn't need "extern", and also should go in
arch/powerpc/platforms/pseries/pseries.h as it's local to that directory.

You're using "hblkr" in a few places, can we instead make it "hblkrm" -
"rm" is a well known abbreviation for "remove".


> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 36b846f6e74e..98a5c2ff9a0b 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -56,6 +56,15 @@ EXPORT_SYMBOL(plpar_hcall);
> EXPORT_SYMBOL(plpar_hcall9);
> EXPORT_SYMBOL(plpar_hcall_norets);
>
> +/*
> + * H_BLOCK_REMOVE supported block size for this page size in segment who's base
> + * page size is that page size.
> + *
> + * The first index is the segment base page size, the second one is the actual
> + * page size.
> + */
> +static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT];

Can you make that __ro_after_init, it goes at the end, eg:

static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT] __ro_after_init;

> @@ -1311,6 +1320,135 @@ static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
> (void)call_block_remove(pix, param, true);
> }
>
> +/*
> + * TLB Block Invalidate Characteristics These characteristics define the size of
^
newline before here?

> + * the block the hcall H_BLOCK_REMOVE is able to process for each couple segment
> + * base page size, actual page size.
> + *
> + * The ibm,get-system-parameter properties is returning a buffer with the
> + * following layout:
> + *
> + * [ 2 bytes size of the RTAS buffer (without these 2 bytes) ]
^
"excluding"

> + * -----------------
> + * TLB Block Invalidate Specifiers:
> + * [ 1 byte LOG base 2 of the TLB invalidate block size being specified ]
> + * [ 1 byte Number of page sizes (N) that are supported for the specified
> + * TLB invalidate block size ]
> + * [ 1 byte Encoded segment base page size and actual page size
> + * MSB=0 means 4k segment base page size and actual page size
> + * MSB=1 the penc value in mmu_psize_def ]
> + * ...
> + * -----------------
> + * Next TLB Block Invalidate Specifiers...
> + * -----------------
> + * [ 0 ]
> + */
> +static inline void __init set_hblk_bloc_size(int bpsize, int psize,
> + unsigned int block_size)

"static inline" and __init are sort of contradictory.

Just make it "static void __init" and the compiler will inline it
anyway, but if it decides not to the section will be correct.

This one uses "hblk"? Should it be set_hblkrm_block_size() ?

> +{
> + if (block_size > hblkr_size[bpsize][psize])
> + hblkr_size[bpsize][psize] = block_size;
> +}
> +
> +/*
> + * Decode the Encoded segment base page size and actual page size.
> + * PAPR specifies with bit 0 as MSB:
> + * - bit 0 is the L bit
> + * - bits 2-7 are the penc value

Can we just convert those to normal bit ordering for the comment, eg:

PAPR specifies:
- bit 7 is the L bit
- bits 0-5 are the penc value

> + * If the L bit is 0, this means 4K segment base page size and actual page size
> + * otherwise the penc value should be readed.
^
"read" ?
> + */
> +#define HBLKR_L_BIT_MASK 0x80

"HBLKRM_L_MASK" would do I think?

> +#define HBLKR_PENC_MASK 0x3f
> +static inline void __init check_lp_set_hblk(unsigned int lp,
> + unsigned int block_size)
> +{
> + unsigned int bpsize, psize;
> +

One blank line is sufficient :)

> +
> + /* First, check the L bit, if not set, this means 4K */
> + if ((lp & HBLKR_L_BIT_MASK) == 0) {
> + set_hblk_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size);
> + return;
> + }
> +
> + lp &= HBLKR_PENC_MASK;
> + for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++) {
> + struct mmu_psize_def *def = &mmu_psize_defs[bpsize];
^
stray space there
> +
> + for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
> + if (def->penc[psize] == lp) {
> + set_hblk_bloc_size(bpsize, psize, block_size);
> + return;
> + }
> + }
> + }
> +}
> +
> +#define SPLPAR_TLB_BIC_TOKEN 50
> +#define SPLPAR_TLB_BIC_MAXLENGTH (MMU_PAGE_COUNT*2 + 10)

The +10 is just a guess I think?

If I'm counting right that ends up as 42 bytes, which is not much at
all. You could probably just make it a cache line, 128 bytes, which
should be plenty of space?

> +void __init pseries_lpar_read_hblkr_characteristics(void)
> +{
> + int call_status;

This should be grouped with the other ints below on one line.

> + unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
> + int len, idx, bpsize;
> +
> + if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
> + pr_info("H_BLOCK_REMOVE is not supported");

That's going to trigger on a lot of older machines, and all KVM VMs, so
just return silently.

> + return;
> + }
> +
> + memset(local_buffer, 0, SPLPAR_TLB_BIC_MAXLENGTH);

Here you memset the whole buffer ...

> + spin_lock(&rtas_data_buf_lock);
> + memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
> + call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
> + NULL,
> + SPLPAR_TLB_BIC_TOKEN,
> + __pa(rtas_data_buf),
> + RTAS_DATA_BUF_SIZE);
> + memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);

But then here you memcpy over the entire buffer, making the memset above
unnecessary.

> + local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
> + spin_unlock(&rtas_data_buf_lock);
> +
> + if (call_status != 0) {
> + pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
> + __FILE__, __func__, call_status);

__FILE__ and __func__ is pretty verbose for what should be a rare case
(I realise you copied that from existing code).

pr_warn("Error calling get-system-parameter(%d, ...) (0x%x)\n",
SPLPAR_TLB_BIC_TOKEN, call_status);

Should do I think?

> + return;
> + }
> +
> + /*
> + * The first two (2) bytes of the data in the buffer are the length of
> + * the returned data, not counting these first two (2) bytes.
> + */
> + len = local_buffer[0] * 256 + local_buffer[1] + 2;

This took me a minute to parse. I guess I was expecting something more
like:
len = be16_to_cpu(local_buffer) + 2;

??

> + if (len >= SPLPAR_TLB_BIC_MAXLENGTH) {

I think it's allowed for len to be == SPLPAR_TLB_BIC_MAXLENGTH isn't it?

> + pr_warn("%s too large returned buffer %d", __func__, len);
> + return;
> + }
> +
> + idx = 2;
> + while (idx < len) {
> + unsigned int block_size = local_buffer[idx++];

This is a little subtle, local_buffer is char, so no endian issue, but
you're loading that into an unsigned int which makes it look like there
should be an endian swap.

Maybe instead do:
u8 block_shift = local_buffer[idx++];
u32 block_size;

if (!block_shift)
break;

block_size = 1 << block_shift;

??

> + unsigned int npsize;
> +
> + if (!block_size)
> + break;
> +
> + block_size = 1 << block_size;
> +
> + for (npsize = local_buffer[idx++]; npsize > 0; npsize--)
> + check_lp_set_hblk((unsigned int) local_buffer[idx++],
> + block_size);

This could overflow if npsize is too big. I think you can just add
"&& idx < len" to the for condition to make it safe?

> + }
> +
> + for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
> + for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
> + if (hblkr_size[bpsize][idx])
> + pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d block size:%d",
> + bpsize, idx, hblkr_size[bpsize][idx]);

I think this is probably too verbose, except for debugging. Probably
just remove it?

cheers

2019-09-18 13:46:23

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/mm: call H_BLOCK_REMOVE when supported

Hi Laurent,

Few comments ...

Laurent Dufour <[email protected]> writes:
> Now we do not call _BLOCK_REMOVE all the time when the feature is
> exhibited.

This isn't true until after the patch is applied, ie. the tense is
wrong. The rest of the change log explains things fine, so just drop
that sentence I think.

Can you include the info about the oops in here.

> Depending on the hardware and the hypervisor, the hcall H_BLOCK_REMOVE may
> not be able to process all the page size for a segment base page size, as
^
sizes
> reported by the TLB Invalidate Characteristics.o
^
stray "o"
>
> For each couple base segment page size and actual page size, this
^
"pair of"
> characteristic is telling the size of the block the hcall is supporting.
^ ^
"tells us" supports
>
> Due to the involve complexity in do_block_remove() and call_block_remove(),
^
"required" is better I think
> and the fact currently a 8 size block is returned by the hypervisor, we
^ ^
that "block of size 8"
> are only supporting 8 size block to the H_BLOCK_REMOVE hcall.
>
> Furthermore a warning message is displayed at boot time in the case of an
> unsupported block size.

I'm not sure we should be doing that? It could be unnecessarily spammy.

> In order to identify this limitation easily in the code,a local define
> HBLKR_SUPPORTED_SIZE defining the currently supported block size, and a
> dedicated checking helper is_supported_hlbkr() are introduced.
>
> For regular pages and hugetlb, the assumption is made that the page size is
> equal to the base page size. For THP the page size is assumed to be 16M.
>
> Signed-off-by: Laurent Dufour <[email protected]>
> ---
> arch/powerpc/platforms/pseries/lpar.c | 35 +++++++++++++++++++++++++--
> 1 file changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 98a5c2ff9a0b..e2ad9b3b1097 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -65,6 +65,13 @@ EXPORT_SYMBOL(plpar_hcall_norets);
> */
> static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT];
>
> +/*
> + * Due to the involved complexity, and that the current hypervisor is only
> + * returning this value or 0, we are limiting the support of the H_BLOCK_REMOVE
> + * buffer size to 8 size block.
> + */
> +#define HBLKR_SUPPORTED_BLOCK_SIZE 8
> +
> #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> static u8 dtl_mask = DTL_LOG_PREEMPT;
> #else
> @@ -993,6 +1000,15 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
> #define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
> #define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
>
> +/*
> + * Returned true if we are supporting this block size for the specified segment
> + * base page size and actual page size.
> + */
> +static inline bool is_supported_hlbkr(int bpsize, int psize)
> +{
> + return (hblkr_size[bpsize][psize] == HBLKR_SUPPORTED_BLOCK_SIZE);
> +}
> +
> /**
> * H_BLOCK_REMOVE caller.
> * @idx should point to the latest @param entry set with a PTEX.
> @@ -1152,7 +1168,11 @@ static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
> if (lock_tlbie)
> spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>
> - if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
> + /*
> + * Assuming THP size is 16M, and we only support 8 bytes size buffer
> + * for the momment.
> + */
> + if (is_supported_hlbkr(psize, MMU_PAGE_16M))

It's not very clear that this is correct in all cases. ie. how do we
know we're being called for THP and not regular huge page?

I think we're only called via:
flush_hash_hugepage()
-> mmu_hash_ops.hugepage_invalidate()
pSeries_lpar_hugepage_invalidate()
-> __pSeries_lpar_hugepage_invalidate()

And flush_hash_hugepage() is called via:
__hash_page_thp()
and
hpte_do_hugepage_flush()

The first is presumably fine, the 2nd is called in a few places:
__flush_hash_table_range() under if (is_thp)
hash__pmd_hugepage_update()


But it's a little bit fragile if the code ever evolves. Not sure if
there's a better solution, other than just documenting it.

> hugepage_block_invalidate(slot, vpn, count, psize, ssize);
> else
> hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
> @@ -1437,6 +1457,14 @@ void __init pseries_lpar_read_hblkr_characteristics(void)
>
> block_size = 1 << block_size;
>
> + /*
> + * If the block size is not supported by the kernel, report it,
> + * but continue reading the values, and the following blocks.
> + */
> + if (block_size != HBLKR_SUPPORTED_BLOCK_SIZE)
> + pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
> + block_size);

Does this need a printk? I'm worried it could end up triggering and
scaring people unnecessarily.

> +
> for (npsize = local_buffer[idx++]; npsize > 0; npsize--)
> check_lp_set_hblk((unsigned int) local_buffer[idx++],
> block_size);
> @@ -1468,7 +1496,10 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
> if (lock_tlbie)
> spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>
> - if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
> + /*
> + * Currently, we only support 8 bytes size buffer in do_block_remove().
> + */
> + if (is_supported_hlbkr(batch->psize, batch->psize)) {
> do_block_remove(number, batch, param);
> goto out;
> }
> --
> 2.23.0

cheers

2019-09-18 13:48:05

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] powerpc/mm: Conditionally call H_BLOCK_REMOVE

Hi Laurent,

Thanks for fixing this, just a few comments.

Laurent Dufour <[email protected]> writes:
> Since the commit ba2dd8a26baa ("powerpc/pseries/mm: call H_BLOCK_REMOVE"),
> the call to H_BLOCK_REMOVE is always done if the feature is exhibited.
>
> However, the hypervisor may not support all the block size for the hcall
> H_BLOCK_REMOVE depending on the segment base page size and actual page
> size.
>
> When unsupported block size is used, the hcall H_BLOCK_REMOVE is returning
> H_PARAM, which is triggering a BUG_ON check leading to a panic like this:

Missing panic :)

Also can you put that detail in the 2nd commit, so that it's obvious
that it is a fix for an oops.

> The PAPR document specifies the TLB Block Invalidate Characteristics which

Can you give a section/page number for the PAPR reference.

> tells for each couple segment base page size, actual page size, the size of
^
"pair of" is better than "couple" I think

> the block the hcall H_BLOCK_REMOVE is supporting.
>
> Supporting various block sizes doesn't seem needed at that time since all
> systems I was able to play with was supporting an 8 addresses block size,
> which is the maximum through the hcall, or none at all. Supporting various
> size would complexify the algorithm in call_block_remove() so unless this
> is required, this is not done.
>
> In the case of block size different from 8, a warning message is displayed
> at boot time and that block size will be ignored checking for the
> H_BLOCK_REMOVE support.
>
> Due to the minimal amount of hardware showing a limited set of
> H_BLOCK_REMOVE supported page size, I don't think there is a need to push
> this series to the stable mailing list.

But the hardware that is exhibiting it, by crashing and not booting, is
mostly (all?) older hardware. So I think we probably should send it back
to stable, otherwise those machines may never get an updated kernel.

Can you add Fixes: tags to the commits, they should point at the commit
that added H_BLOCK_REMOVE support.

cheers

> The first patch is reading the characteristic through the hcall
> ibm,get-system-parameter and record the supported block size for each page
> size. The second patch is changing the check used to detect the
> H_BLOCK_REMOVE availability to take care of the base page size and page
> size couple.
>
> Changes since V1:
>
> - Remove penc initialisation, this is already done in
> mmu_psize_set_default_penc()
> - Add details on the TLB Block Invalidate Characteristics's buffer format
> - Introduce #define instead of using direct numerical values
> - Function reading the characteristics is now directly called from
> pSeries_setup_arch()
> - The characteristics are now stored in a dedciated table static to lpar.c
>
> Laurent Dufour (2):
> powperc/mm: read TLB Block Invalidate Characteristics
> powerpc/mm: call H_BLOCK_REMOVE when supported
>
> .../include/asm/book3s/64/tlbflush-hash.h | 1 +
> arch/powerpc/platforms/pseries/lpar.c | 173 +++++++++++++++++-
> arch/powerpc/platforms/pseries/setup.c | 1 +
> 3 files changed, 173 insertions(+), 2 deletions(-)
>
> --
> 2.23.0

2019-09-19 17:47:45

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/mm: call H_BLOCK_REMOVE when supported

Le 18/09/2019 à 15:42, Michael Ellerman a écrit :
> Hi Laurent,
>
> Few comments ...

Hi Michael,

Thanks for the review and the nitpicking ;)
>
> Laurent Dufour <[email protected]> writes:
>> Now we do not call _BLOCK_REMOVE all the time when the feature is
>> exhibited.
>
> This isn't true until after the patch is applied, ie. the tense is
> wrong. The rest of the change log explains things fine, so just drop
> that sentence I think.
>
> Can you include the info about the oops in here.
>
>> Depending on the hardware and the hypervisor, the hcall H_BLOCK_REMOVE may
>> not be able to process all the page size for a segment base page size, as
> ^
> sizes
>> reported by the TLB Invalidate Characteristics.o
> ^
> stray "o"
>>
>> For each couple base segment page size and actual page size, this
> ^
> "pair of"
>> characteristic is telling the size of the block the hcall is supporting.
> ^ ^
> "tells us" supports
>>
>> Due to the involve complexity in do_block_remove() and call_block_remove(),
> ^
> "required" is better I think
>> and the fact currently a 8 size block is returned by the hypervisor, we
> ^ ^
> that "block of size 8"
>> are only supporting 8 size block to the H_BLOCK_REMOVE hcall.
>>
>> Furthermore a warning message is displayed at boot time in the case of an
>> unsupported block size.
>
> I'm not sure we should be doing that? It could be unnecessarily spammy.
>
>> In order to identify this limitation easily in the code,a local define
>> HBLKR_SUPPORTED_SIZE defining the currently supported block size, and a
>> dedicated checking helper is_supported_hlbkr() are introduced.
>>
>> For regular pages and hugetlb, the assumption is made that the page size is
>> equal to the base page size. For THP the page size is assumed to be 16M.
>>
>> Signed-off-by: Laurent Dufour <[email protected]>
>> ---
>> arch/powerpc/platforms/pseries/lpar.c | 35 +++++++++++++++++++++++++--
>> 1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 98a5c2ff9a0b..e2ad9b3b1097 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -65,6 +65,13 @@ EXPORT_SYMBOL(plpar_hcall_norets);
>> */
>> static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT];
>>
>> +/*
>> + * Due to the involved complexity, and that the current hypervisor is only
>> + * returning this value or 0, we are limiting the support of the H_BLOCK_REMOVE
>> + * buffer size to 8 size block.
>> + */
>> +#define HBLKR_SUPPORTED_BLOCK_SIZE 8
>> +
>> #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>> static u8 dtl_mask = DTL_LOG_PREEMPT;
>> #else
>> @@ -993,6 +1000,15 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
>> #define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
>> #define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
>>
>> +/*
>> + * Returned true if we are supporting this block size for the specified segment
>> + * base page size and actual page size.
>> + */
>> +static inline bool is_supported_hlbkr(int bpsize, int psize)
>> +{
>> + return (hblkr_size[bpsize][psize] == HBLKR_SUPPORTED_BLOCK_SIZE);
>> +}
>> +
>> /**
>> * H_BLOCK_REMOVE caller.
>> * @idx should point to the latest @param entry set with a PTEX.
>> @@ -1152,7 +1168,11 @@ static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
>> if (lock_tlbie)
>> spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>>
>> - if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
>> + /*
>> + * Assuming THP size is 16M, and we only support 8 bytes size buffer
>> + * for the momment.
>> + */
>> + if (is_supported_hlbkr(psize, MMU_PAGE_16M))
>
> It's not very clear that this is correct in all cases. ie. how do we
> know we're being called for THP and not regular huge page?
>
> I think we're only called via:
> flush_hash_hugepage()
> -> mmu_hash_ops.hugepage_invalidate()
> pSeries_lpar_hugepage_invalidate()
> -> __pSeries_lpar_hugepage_invalidate()
>
> And flush_hash_hugepage() is called via:
> __hash_page_thp()
> and
> hpte_do_hugepage_flush()
>
> The first is presumably fine, the 2nd is called in a few places:
> __flush_hash_table_range() under if (is_thp)
> hash__pmd_hugepage_update()
>
>
> But it's a little bit fragile if the code ever evolves. Not sure if
> there's a better solution, other than just documenting it.

Indeed __pSeries_lpar_hugepage_invalidate() can only be called for THP.
flush_hash_hugepage() and hpte_do_hugepage_flush() are only defined (or
valid) with CONFIG_TRANSPARENT_HUGEPAGE.

As Aneesh remind me, "hugepage" stands for THP.

>
>> hugepage_block_invalidate(slot, vpn, count, psize, ssize);
>> else
>> hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
>> @@ -1437,6 +1457,14 @@ void __init pseries_lpar_read_hblkr_characteristics(void)
>>
>> block_size = 1 << block_size;
>>
>> + /*
>> + * If the block size is not supported by the kernel, report it,
>> + * but continue reading the values, and the following blocks.
>> + */
>> + if (block_size != HBLKR_SUPPORTED_BLOCK_SIZE)
>> + pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
>> + block_size);
>
> Does this need a printk? I'm worried it could end up triggering and
> scaring people unnecessarily.

I agree, will remove.

>
>> +
>> for (npsize = local_buffer[idx++]; npsize > 0; npsize--)
>> check_lp_set_hblk((unsigned int) local_buffer[idx++],
>> block_size);
>> @@ -1468,7 +1496,10 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
>> if (lock_tlbie)
>> spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>>
>> - if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
>> + /*
>> + * Currently, we only support 8 bytes size buffer in do_block_remove().
>> + */
>> + if (is_supported_hlbkr(batch->psize, batch->psize)) {
>> do_block_remove(number, batch, param);
>> goto out;
>> }
>> --
>> 2.23.0
>
> cheers
>

2019-09-19 21:33:24

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powperc/mm: read TLB Block Invalidate Characteristics

Le 18/09/2019 à 15:42, Michael Ellerman a écrit :
> Hi Laurent,
>
> Comments below ...

Hi Michael,

Thanks for your review.

One comment below (at the end).

>
> Laurent Dufour <[email protected]> writes:
>> The PAPR document specifies the TLB Block Invalidate Characteristics which
>> tells for each couple segment base page size, actual page size, the size of
> ^
> "pair of" again
>
>> the block the hcall H_BLOCK_REMOVE is supporting.
> ^
> "supports" is fine.
>
>> These characteristics are loaded at boot time in a new table hblkr_size.
>> The table is appart the mmu_psize_def because this is specific to the
> ^
> "separate from"
>
>> pseries architecture.
> ^
> platform
>>
>> A new init service, pseries_lpar_read_hblkr_characteristics() is added to
> ^
> function
>
>> read the characteristics. In that function, the size of the buffer is set
>> to twice the number of known page size, plus 10 bytes to ensure we have
>> enough place. This new init function is called from pSeries_setup_arch().
> ^
> space
>>
>> Signed-off-by: Laurent Dufour <[email protected]>
>> ---
>> .../include/asm/book3s/64/tlbflush-hash.h | 1 +
>> arch/powerpc/platforms/pseries/lpar.c | 138 ++++++++++++++++++
>> arch/powerpc/platforms/pseries/setup.c | 1 +
>> 3 files changed, 140 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
>> index 64d02a704bcb..74155cc8cf89 100644
>> --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
>> +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
>> @@ -117,4 +117,5 @@ extern void __flush_hash_table_range(struct mm_struct *mm, unsigned long start,
>> unsigned long end);
>> extern void flush_tlb_pmd_range(struct mm_struct *mm, pmd_t *pmd,
>> unsigned long addr);
>> +extern void pseries_lpar_read_hblkr_characteristics(void);
>
> This doesn't need "extern", and also should go in
> arch/powerpc/platforms/pseries/pseries.h as it's local to that directory.
>
> You're using "hblkr" in a few places, can we instead make it "hblkrm" -
> "rm" is a well known abbreviation for "remove".
>
>
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 36b846f6e74e..98a5c2ff9a0b 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -56,6 +56,15 @@ EXPORT_SYMBOL(plpar_hcall);
>> EXPORT_SYMBOL(plpar_hcall9);
>> EXPORT_SYMBOL(plpar_hcall_norets);
>>
>> +/*
>> + * H_BLOCK_REMOVE supported block size for this page size in segment who's base
>> + * page size is that page size.
>> + *
>> + * The first index is the segment base page size, the second one is the actual
>> + * page size.
>> + */
>> +static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT];
>
> Can you make that __ro_after_init, it goes at the end, eg:
>
> static int hblkr_size[MMU_PAGE_COUNT][MMU_PAGE_COUNT] __ro_after_init;
>
>> @@ -1311,6 +1320,135 @@ static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
>> (void)call_block_remove(pix, param, true);
>> }
>>
>> +/*
>> + * TLB Block Invalidate Characteristics These characteristics define the size of
> ^
> newline before here?
>
>> + * the block the hcall H_BLOCK_REMOVE is able to process for each couple segment
>> + * base page size, actual page size.
>> + *
>> + * The ibm,get-system-parameter properties is returning a buffer with the
>> + * following layout:
>> + *
>> + * [ 2 bytes size of the RTAS buffer (without these 2 bytes) ]
> ^
> "excluding"
>
>> + * -----------------
>> + * TLB Block Invalidate Specifiers:
>> + * [ 1 byte LOG base 2 of the TLB invalidate block size being specified ]
>> + * [ 1 byte Number of page sizes (N) that are supported for the specified
>> + * TLB invalidate block size ]
>> + * [ 1 byte Encoded segment base page size and actual page size
>> + * MSB=0 means 4k segment base page size and actual page size
>> + * MSB=1 the penc value in mmu_psize_def ]
>> + * ...
>> + * -----------------
>> + * Next TLB Block Invalidate Specifiers...
>> + * -----------------
>> + * [ 0 ]
>> + */
>> +static inline void __init set_hblk_bloc_size(int bpsize, int psize,
>> + unsigned int block_size)
>
> "static inline" and __init are sort of contradictory.
>
> Just make it "static void __init" and the compiler will inline it
> anyway, but if it decides not to the section will be correct.
>
> This one uses "hblk"? Should it be set_hblkrm_block_size() ?
>
>> +{
>> + if (block_size > hblkr_size[bpsize][psize])
>> + hblkr_size[bpsize][psize] = block_size;
>> +}
>> +
>> +/*
>> + * Decode the Encoded segment base page size and actual page size.
>> + * PAPR specifies with bit 0 as MSB:
>> + * - bit 0 is the L bit
>> + * - bits 2-7 are the penc value
>
> Can we just convert those to normal bit ordering for the comment, eg:
>
> PAPR specifies:
> - bit 7 is the L bit
> - bits 0-5 are the penc value
>
>> + * If the L bit is 0, this means 4K segment base page size and actual page size
>> + * otherwise the penc value should be readed.
> ^
> "read" ?
>> + */
>> +#define HBLKR_L_BIT_MASK 0x80
>
> "HBLKRM_L_MASK" would do I think?
>
>> +#define HBLKR_PENC_MASK 0x3f
>> +static inline void __init check_lp_set_hblk(unsigned int lp,
>> + unsigned int block_size)
>> +{
>> + unsigned int bpsize, psize;
>> +
>
> One blank line is sufficient :)
>
>> +
>> + /* First, check the L bit, if not set, this means 4K */
>> + if ((lp & HBLKR_L_BIT_MASK) == 0) {
>> + set_hblk_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size);
>> + return;
>> + }
>> +
>> + lp &= HBLKR_PENC_MASK;
>> + for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++) {
>> + struct mmu_psize_def *def = &mmu_psize_defs[bpsize];
> ^
> stray space there
>> +
>> + for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
>> + if (def->penc[psize] == lp) {
>> + set_hblk_bloc_size(bpsize, psize, block_size);
>> + return;
>> + }
>> + }
>> + }
>> +}
>> +
>> +#define SPLPAR_TLB_BIC_TOKEN 50
>> +#define SPLPAR_TLB_BIC_MAXLENGTH (MMU_PAGE_COUNT*2 + 10)
>
> The +10 is just a guess I think?
>
> If I'm counting right that ends up as 42 bytes, which is not much at
> all. You could probably just make it a cache line, 128 bytes, which
> should be plenty of space?
>
>> +void __init pseries_lpar_read_hblkr_characteristics(void)
>> +{
>> + int call_status;
>
> This should be grouped with the other ints below on one line.
>
>> + unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
>> + int len, idx, bpsize;
>> +
>> + if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
>> + pr_info("H_BLOCK_REMOVE is not supported");
>
> That's going to trigger on a lot of older machines, and all KVM VMs, so
> just return silently.
>
>> + return;
>> + }
>> +
>> + memset(local_buffer, 0, SPLPAR_TLB_BIC_MAXLENGTH);
>
> Here you memset the whole buffer ...
>
>> + spin_lock(&rtas_data_buf_lock);
>> + memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
>> + call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
>> + NULL,
>> + SPLPAR_TLB_BIC_TOKEN,
>> + __pa(rtas_data_buf),
>> + RTAS_DATA_BUF_SIZE);
>> + memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
>
> But then here you memcpy over the entire buffer, making the memset above
> unnecessary.
>
>> + local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
>> + spin_unlock(&rtas_data_buf_lock);
>> +
>> + if (call_status != 0) {
>> + pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
>> + __FILE__, __func__, call_status);
>
> __FILE__ and __func__ is pretty verbose for what should be a rare case
> (I realise you copied that from existing code).
>
> pr_warn("Error calling get-system-parameter(%d, ...) (0x%x)\n",
> SPLPAR_TLB_BIC_TOKEN, call_status);
>
> Should do I think?
>
>> + return;
>> + }
>> +
>> + /*
>> + * The first two (2) bytes of the data in the buffer are the length of
>> + * the returned data, not counting these first two (2) bytes.
>> + */
>> + len = local_buffer[0] * 256 + local_buffer[1] + 2;
>
> This took me a minute to parse. I guess I was expecting something more
> like:
> len = be16_to_cpu(local_buffer) + 2;
>
> ??
>
>> + if (len >= SPLPAR_TLB_BIC_MAXLENGTH) {
>
> I think it's allowed for len to be == SPLPAR_TLB_BIC_MAXLENGTH isn't it?
>
>> + pr_warn("%s too large returned buffer %d", __func__, len);
>> + return;
>> + }
>> +
>> + idx = 2;
>> + while (idx < len) {
>> + unsigned int block_size = local_buffer[idx++];
>
> This is a little subtle, local_buffer is char, so no endian issue, but
> you're loading that into an unsigned int which makes it look like there
> should be an endian swap.
>
> Maybe instead do:
> u8 block_shift = local_buffer[idx++];
> u32 block_size;
>
> if (!block_shift)
> break;
>
> block_size = 1 << block_shift;
>
> ??
>
>> + unsigned int npsize;
>> +
>> + if (!block_size)
>> + break;
>> +
>> + block_size = 1 << block_size;
>> +
>> + for (npsize = local_buffer[idx++]; npsize > 0; npsize--)
>> + check_lp_set_hblk((unsigned int) local_buffer[idx++],
>> + block_size);
>
> This could overflow if npsize is too big. I think you can just ad
> "&& idx < len" to the for condition to make it safe?
>
>> + }
>> +
>> + for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
>> + for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
>> + if (hblkr_size[bpsize][idx])
>> + pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d block size:%d",
>> + bpsize, idx, hblkr_size[bpsize][idx]);
>
> I think this is probably too verbose, except for debugging. Probably
> just remove it?

I'd like to keep that displayed because this is the only way to get these
values.
I tried to use pr_debug() and rely on a dyndbg kernel command line option,
but the dynamic debug system is not yet initialised at this time of boot.

When a system is booting, those messages are interleaved with other user
cryptic messages ;):

[ 0.000000] numa: NODE_DATA [mem 0x7ffdd7000-0x7ffddbfff]
[ 0.000000] numa: NODE_DATA(0) on node 2
[ 0.000000] numa: NODE_DATA [mem 0x7ffdd2000-0x7ffdd6fff]
[ 0.000000] rfi-flush: fallback displacement flush available
[ 0.000000] rfi-flush: mttrig type flush available
[ 0.000000] count-cache-flush: full software flush sequence enabled.
[ 0.000000] stf-barrier: eieio barrier available
[ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:0 psize:0 block size:8
[ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:0 psize:2 block size:8
[ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:0 psize:10 block size:8
[ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:2 psize:2 block size:8
[ 0.000000] lpar: H_BLOCK_REMOVE supports base psize:2 psize:10 block size:8
[ 0.000000] PPC64 nvram contains 15360 bytes
[ 0.000000] barrier-nospec: using ORI speculation barrier
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x00000007ffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 2: [mem 0x0000000000000000-0x00000007ffffffff]
[ 0.000000] Could not find start_pfn for node 0

Is this an issue to keep those lines ?

Cheers