2018-07-27 13:54:05

by Laurent Dufour

[permalink] [raw]
Subject: [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE

[Resending so everyone is getting the cover letter]

On very large system we could see soft lockup fired when a process is exiting

watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
NIP: c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
REGS: c00006b0645b7610 TRAP: 0901 Not tainted (4.17.0)
MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22042082 XER: 00000000
CFAR: 00000000006cf8f0 SOFTE: 0
GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000
GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480
GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480
GPR12: ffffffffffffffff c00000001e690800
NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
Call Trace:
[c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
[c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
[c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
[c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
[c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
[c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
[c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
[c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
[c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
[c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
[c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
Instruction dump:
60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378
e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008

This happens when removing the PTE by calling the hypervisor using the
H_BULK_REMOVE call. This call is processing up to 4 PTEs but is doing a
tlbie for each PTE it is processing. This could lead to long time spent in
the hypervisor (sometimes up to 4s) and soft lockup being raised because
the scheduler is not called in zap_pte_range().

Since the Power7's time, the hypervisor is providing a new hcall
H_BLOCK_REMOVE allowing processing up to 8 PTEs with one call to
tlbie. By limiting the amount of tlbie generated, this reduces the time
spent invalidating the PTEs.

This hcall requires that the pages are "all within the same naturally
aligned 8 page virtual address block".

With this patch series applied, I couldn't see any soft lockup raised on
the victim LPAR I was running the test one.

This series is covering both normal pages and huge pages.

Laurent Dufour (3):
powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
powerpc/pseries/mm: factorize PTE slot computation
powerpc/pseries/mm: call H_BLOCK_REMOVE

arch/powerpc/include/asm/firmware.h | 3 +-
arch/powerpc/include/asm/hvcall.h | 1 +
arch/powerpc/platforms/pseries/firmware.c | 1 +
arch/powerpc/platforms/pseries/lpar.c | 250 ++++++++++++++++++++++++++----
4 files changed, 228 insertions(+), 27 deletions(-)

--
2.7.4



2018-07-27 13:52:50

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE

This feature tells if the hcall H_BLOCK_REMOVE is available.

Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
---
arch/powerpc/include/asm/firmware.h | 3 ++-
arch/powerpc/platforms/pseries/firmware.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 535add3f7791..360ba197f9d2 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -53,6 +53,7 @@
#define FW_FEATURE_PRRN ASM_CONST(0x0000000200000000)
#define FW_FEATURE_DRMEM_V2 ASM_CONST(0x0000000400000000)
#define FW_FEATURE_DRC_INFO ASM_CONST(0x0000000800000000)
+#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000)

#ifndef __ASSEMBLY__

@@ -70,7 +71,7 @@ enum {
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
- FW_FEATURE_DRC_INFO,
+ FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index a3bbeb43689e..1624501386f4 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
{FW_FEATURE_SET_MODE, "hcall-set-mode"},
{FW_FEATURE_BEST_ENERGY, "hcall-best-energy-1*"},
{FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
+ {FW_FEATURE_BLOCK_REMOVE, "hcall-block-remove"},
};

/* Build up the firmware features bitmask using the contents of
--
2.7.4


2018-07-27 13:52:52

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

This hypervisor call allows to remove up to 8 ptes with only call to tlbie.

The virtual pages must be all within the same naturally aligned 8 page
virtual address block and have the same page and segment size encodings.

Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
---
arch/powerpc/include/asm/hvcall.h | 1 +
arch/powerpc/platforms/pseries/lpar.c | 223 +++++++++++++++++++++++++++++++---
2 files changed, 205 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 662c8347d699..e403d574651d 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -278,6 +278,7 @@
#define H_COP 0x304
#define H_GET_MPP_X 0x314
#define H_SET_MODE 0x31C
+#define H_BLOCK_REMOVE 0x328
#define H_CLEAR_HPT 0x358
#define H_REQUEST_VMC 0x360
#define H_RESIZE_HPT_PREPARE 0x36C
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 96b8cd8a802d..41ed03245eb4 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
BUG_ON(lpar_rc != H_SUCCESS);
}

+
+/*
+ * As defined in the PAPR's section 14.5.4.1.8
+ * The control mask doesn't include the returned reference and change bit from
+ * the processed PTE.
+ */
+#define HBLKR_AVPN 0x0100000000000000UL
+#define HBLKR_CTRL_MASK 0xf800000000000000UL
+#define HBLKR_CTRL_SUCCESS 0x8000000000000000UL
+#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
+#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
+
+/**
+ * H_BLOCK_REMOVE caller.
+ * @idx should point to the latest @param entry set with a PTEX.
+ * If PTE cannot be processed because another CPUs has already locked that
+ * group, those entries are put back in @param starting at index 1.
+ * If entries has to be retried and @retry_busy is set to true, these entries
+ * are retried until success. If @retry_busy is set to false, the returned
+ * is the number of entries yet to process.
+ */
+static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
+ bool retry_busy)
+{
+ unsigned long i, rc, new_idx;
+ unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+again:
+ new_idx = 0;
+ BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
+ if (idx < PLPAR_HCALL9_BUFSIZE)
+ param[idx] = HBR_END;
+
+ rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
+ param[0], /* AVA */
+ param[1], param[2], param[3], param[4], /* TS0-7 */
+ param[5], param[6], param[7], param[8]);
+ if (rc == H_SUCCESS)
+ return 0;
+
+ BUG_ON(rc != H_PARTIAL);
+
+ /* Check that the unprocessed entries were 'not found' or 'busy' */
+ for (i = 0; i < idx-1; i++) {
+ unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
+
+ if (ctrl == HBLKR_CTRL_ERRBUSY) {
+ param[++new_idx] = param[i+1];
+ continue;
+ }
+
+ BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
+ && ctrl != HBLKR_CTRL_ERRNOTFOUND);
+ }
+
+ /*
+ * If there were entries found busy, retry these entries if requested,
+ * of if all the entries have to be retried.
+ */
+ if (new_idx && (retry_busy || new_idx == (PLPAR_HCALL9_BUFSIZE-1))) {
+ idx = new_idx + 1;
+ goto again;
+ }
+
+ return new_idx;
+}
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/*
* Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
@@ -425,17 +492,59 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
*/
#define PPC64_HUGE_HPTE_BATCH 12

-static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
- unsigned long *vpn, int count,
- int psize, int ssize)
+static void hugepage_block_invalidate(unsigned long *slot, unsigned long *vpn,
+ int count, int psize, int ssize)
{
unsigned long param[PLPAR_HCALL9_BUFSIZE];
- int i = 0, pix = 0, rc;
- unsigned long flags = 0;
- int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+ unsigned long shift, current_vpgb, vpgb;
+ int i, pix = 0;

- if (lock_tlbie)
- spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+ shift = mmu_psize_defs[psize].shift;
+
+ for (i = 0; i < count; i++) {
+ /*
+ * Shifting 3 bits more on the right to get a
+ * 8 pages aligned virtual addresse.
+ */
+ vpgb = (vpn[i] >> (shift - VPN_SHIFT + 3));
+ if (!pix || vpgb != current_vpgb) {
+ /*
+ * Need to start a new 8 pages block, flush
+ * the current one if needed.
+ */
+ if (pix)
+ (void)call_block_remove(pix, param, true);
+ current_vpgb = vpgb;
+ param[0] = hpte_encode_avpn(vpn[i], psize, ssize);
+ pix = 1;
+ }
+
+ param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot[i];
+ if (pix == PLPAR_HCALL9_BUFSIZE) {
+ pix = call_block_remove(pix, param, false);
+ /*
+ * pix = 0 means that all the entries were
+ * removed, we can start a new block.
+ * Otherwise, this means that there are entries
+ * to retry, and pix points to latest one, so
+ * we should increment it and try to continue
+ * the same block.
+ */
+ if (!pix)
+ current_vpgb = 0;
+ else
+ pix++;
+ }
+ }
+ if (pix)
+ (void)call_block_remove(pix, param, true);
+}
+
+static void hugepage_bulk_invalidate(unsigned long *slot, unsigned long *vpn,
+ int count, int psize, int ssize)
+{
+ unsigned long param[PLPAR_HCALL9_BUFSIZE];
+ int i = 0, pix = 0, rc;

for (i = 0; i < count; i++) {

@@ -443,17 +552,6 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize, 0,
ssize, 0);
} else {
- param[pix] = HBR_REQUEST | HBR_AVPN | slot[i];
- param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize);
- pix += 2;
- if (pix == 8) {
- rc = plpar_hcall9(H_BULK_REMOVE, param,
- param[0], param[1], param[2],
- param[3], param[4], param[5],
- param[6], param[7]);
- BUG_ON(rc != H_SUCCESS);
- pix = 0;
- }
}
}
if (pix) {
@@ -463,6 +561,23 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
param[6], param[7]);
BUG_ON(rc != H_SUCCESS);
}
+}
+
+static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
+ unsigned long *vpn,
+ int count, int psize,
+ int ssize)
+{
+ unsigned long flags = 0;
+ int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+
+ if (lock_tlbie)
+ spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+
+ if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
+ hugepage_block_invalidate(slot, vpn, count, psize, ssize);
+ else
+ hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);

if (lock_tlbie)
spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
@@ -565,6 +680,70 @@ static inline unsigned long compute_slot(real_pte_t pte,
return slot;
}

+/**
+ * The hcall H_BLOCK_REMOVE implies that the virtual pages to processed are
+ * "all within the same naturally aligned 8 page virtual address block".
+ */
+static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
+ unsigned long *param)
+{
+ unsigned long vpn;
+ unsigned long i, pix = 0;
+ unsigned long index, shift, slot, current_vpgb, vpgb;
+ real_pte_t pte;
+ int psize, ssize;
+
+ psize = batch->psize;
+ ssize = batch->ssize;
+
+ for (i = 0; i < number; i++) {
+ vpn = batch->vpn[i];
+ pte = batch->pte[i];
+ pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+ /*
+ * Shifting 3 bits more on the right to get a
+ * 8 pages aligned virtual addresse.
+ */
+ vpgb = (vpn >> (shift - VPN_SHIFT + 3));
+ if (!pix || vpgb != current_vpgb) {
+ /*
+ * Need to start a new 8 pages block, flush
+ * the current one if needed.
+ */
+ if (pix)
+ (void)call_block_remove(pix, param,
+ true);
+ current_vpgb = vpgb;
+ param[0] = hpte_encode_avpn(vpn, psize,
+ ssize);
+ pix = 1;
+ }
+
+ slot = compute_slot(pte, vpn, index, shift, ssize);
+ param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot;
+
+ if (pix == PLPAR_HCALL9_BUFSIZE) {
+ pix = call_block_remove(pix, param, false);
+ /*
+ * pix = 0 means that all the entries were
+ * removed, we can start a new block.
+ * Otherwise, this means that there are entries
+ * to retry, and pix points to latest one, so
+ * we should increment it and try to continue
+ * the same block.
+ */
+ if (!pix)
+ current_vpgb = 0;
+ else
+ pix++;
+ }
+ } pte_iterate_hashed_end();
+ }
+
+ if (pix > 1)
+ (void)call_block_remove(pix, param, true);
+}
+
/*
* Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
* lock.
@@ -584,6 +763,11 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
if (lock_tlbie)
spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);

+ if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+ do_block_remove(number, batch, param);
+ goto out;
+ }
+
psize = batch->psize;
ssize = batch->ssize;
pix = 0;
@@ -622,6 +806,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
BUG_ON(rc != H_SUCCESS);
}

+out:
if (lock_tlbie)
spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
}
--
2.7.4


2018-07-27 13:53:00

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation

This part of code will be called also when dealing with H_BLOCK_REMOVE.

Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
---
arch/powerpc/platforms/pseries/lpar.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 52eeff1297f4..96b8cd8a802d 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -547,6 +547,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
return 0;
}

+
+static inline unsigned long compute_slot(real_pte_t pte,
+ unsigned long vpn,
+ unsigned long index,
+ unsigned long shift,
+ int ssize)
+{
+ unsigned long slot, hash, hidx;
+
+ hash = hpt_hash(vpn, shift, ssize);
+ hidx = __rpte_to_hidx(pte, index);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+ slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+ slot += hidx & _PTEIDX_GROUP_IX;
+ return slot;
+}
+
/*
* Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
* lock.
@@ -559,7 +577,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
unsigned long param[PLPAR_HCALL9_BUFSIZE];
- unsigned long hash, index, shift, hidx, slot;
+ unsigned long index, shift, slot;
real_pte_t pte;
int psize, ssize;

@@ -573,12 +591,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
vpn = batch->vpn[i];
pte = batch->pte[i];
pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
- hash = hpt_hash(vpn, shift, ssize);
- hidx = __rpte_to_hidx(pte, index);
- if (hidx & _PTEIDX_SECONDARY)
- hash = ~hash;
- slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
- slot += hidx & _PTEIDX_GROUP_IX;
+ slot = compute_slot(pte, vpn, index, shift, ssize);
if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
/*
* lpar doesn't use the passed actual page size
--
2.7.4


2018-07-30 13:48:13

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

Hi Laurent,

Just one comment below.

Laurent Dufour <[email protected]> writes:
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 96b8cd8a802d..41ed03245eb4 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
> BUG_ON(lpar_rc != H_SUCCESS);
> }
>
> +
> +/*
> + * As defined in the PAPR's section 14.5.4.1.8
> + * The control mask doesn't include the returned reference and change bit from
> + * the processed PTE.
> + */
> +#define HBLKR_AVPN 0x0100000000000000UL
> +#define HBLKR_CTRL_MASK 0xf800000000000000UL
> +#define HBLKR_CTRL_SUCCESS 0x8000000000000000UL
> +#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
> +#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
> +
> +/**
> + * H_BLOCK_REMOVE caller.
> + * @idx should point to the latest @param entry set with a PTEX.
> + * If PTE cannot be processed because another CPUs has already locked that
> + * group, those entries are put back in @param starting at index 1.
> + * If entries has to be retried and @retry_busy is set to true, these entries
> + * are retried until success. If @retry_busy is set to false, the returned
> + * is the number of entries yet to process.
> + */
> +static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
> + bool retry_busy)
> +{
> + unsigned long i, rc, new_idx;
> + unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
> +
> +again:
> + new_idx = 0;
> + BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));

I count 1 ..

> + if (idx < PLPAR_HCALL9_BUFSIZE)
> + param[idx] = HBR_END;
> +
> + rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
> + param[0], /* AVA */
> + param[1], param[2], param[3], param[4], /* TS0-7 */
> + param[5], param[6], param[7], param[8]);
> + if (rc == H_SUCCESS)
> + return 0;
> +
> + BUG_ON(rc != H_PARTIAL);

2 ...

> + /* Check that the unprocessed entries were 'not found' or 'busy' */
> + for (i = 0; i < idx-1; i++) {
> + unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
> +
> + if (ctrl == HBLKR_CTRL_ERRBUSY) {
> + param[++new_idx] = param[i+1];
> + continue;
> + }
> +
> + BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
> + && ctrl != HBLKR_CTRL_ERRNOTFOUND);

3 ...

BUG_ON()s.

I know the code in this file is already pretty liberal with the use of
BUG_ON() but I'd prefer if we don't make it any worse.

Given this is an optimisation it seems like we should be able to fall
back to the existing implementation in the case of error (which will
probably then BUG_ON() ????)

If there's some reason we can't then I guess I can live with it.

cheers

2018-07-30 14:20:22

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE

Laurent Dufour <[email protected]> writes:

> This feature tells if the hcall H_BLOCK_REMOVE is available.
>
> Cc: "Aneesh Kumar K.V" <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Signed-off-by: Laurent Dufour <[email protected]>

Reviewed-by: Aneesh Kumar K.V <[email protected]>

> ---
> arch/powerpc/include/asm/firmware.h | 3 ++-
> arch/powerpc/platforms/pseries/firmware.c | 1 +
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
> index 535add3f7791..360ba197f9d2 100644
> --- a/arch/powerpc/include/asm/firmware.h
> +++ b/arch/powerpc/include/asm/firmware.h
> @@ -53,6 +53,7 @@
> #define FW_FEATURE_PRRN ASM_CONST(0x0000000200000000)
> #define FW_FEATURE_DRMEM_V2 ASM_CONST(0x0000000400000000)
> #define FW_FEATURE_DRC_INFO ASM_CONST(0x0000000800000000)
> +#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000)
>
> #ifndef __ASSEMBLY__
>
> @@ -70,7 +71,7 @@ enum {
> FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
> FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
> FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
> - FW_FEATURE_DRC_INFO,
> + FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE,
> FW_FEATURE_PSERIES_ALWAYS = 0,
> FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
> FW_FEATURE_POWERNV_ALWAYS = 0,
> diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
> index a3bbeb43689e..1624501386f4 100644
> --- a/arch/powerpc/platforms/pseries/firmware.c
> +++ b/arch/powerpc/platforms/pseries/firmware.c
> @@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
> {FW_FEATURE_SET_MODE, "hcall-set-mode"},
> {FW_FEATURE_BEST_ENERGY, "hcall-best-energy-1*"},
> {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
> + {FW_FEATURE_BLOCK_REMOVE, "hcall-block-remove"},
> };
>
> /* Build up the firmware features bitmask using the contents of
> --
> 2.7.4


2018-07-30 14:21:19

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation

Laurent Dufour <[email protected]> writes:

> This part of code will be called also when dealing with H_BLOCK_REMOVE.
>
> Cc: "Aneesh Kumar K.V" <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Signed-off-by: Laurent Dufour <[email protected]>

Reviewed-by: Aneesh Kumar K.V <[email protected]>

> ---
> arch/powerpc/platforms/pseries/lpar.c | 27 ++++++++++++++++++++-------
> 1 file changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 52eeff1297f4..96b8cd8a802d 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -547,6 +547,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
> return 0;
> }
>
> +
> +static inline unsigned long compute_slot(real_pte_t pte,
> + unsigned long vpn,
> + unsigned long index,
> + unsigned long shift,
> + int ssize)
> +{
> + unsigned long slot, hash, hidx;
> +
> + hash = hpt_hash(vpn, shift, ssize);
> + hidx = __rpte_to_hidx(pte, index);
> + if (hidx & _PTEIDX_SECONDARY)
> + hash = ~hash;
> + slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> + slot += hidx & _PTEIDX_GROUP_IX;
> + return slot;
> +}
> +
> /*
> * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
> * lock.
> @@ -559,7 +577,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
> struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
> int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
> unsigned long param[PLPAR_HCALL9_BUFSIZE];
> - unsigned long hash, index, shift, hidx, slot;
> + unsigned long index, shift, slot;
> real_pte_t pte;
> int psize, ssize;
>
> @@ -573,12 +591,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
> vpn = batch->vpn[i];
> pte = batch->pte[i];
> pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
> - hash = hpt_hash(vpn, shift, ssize);
> - hidx = __rpte_to_hidx(pte, index);
> - if (hidx & _PTEIDX_SECONDARY)
> - hash = ~hash;
> - slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> - slot += hidx & _PTEIDX_GROUP_IX;
> + slot = compute_slot(pte, vpn, index, shift, ssize);
> if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
> /*
> * lpar doesn't use the passed actual page size
> --
> 2.7.4


2018-07-30 14:24:01

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

Michael Ellerman <[email protected]> writes:

> Hi Laurent,
>
> Just one comment below.
>
> Laurent Dufour <[email protected]> writes:
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 96b8cd8a802d..41ed03245eb4 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
>> BUG_ON(lpar_rc != H_SUCCESS);
>> }
>>
>> +
>> +/*
>> + * As defined in the PAPR's section 14.5.4.1.8
>> + * The control mask doesn't include the returned reference and change bit from
>> + * the processed PTE.
>> + */
>> +#define HBLKR_AVPN 0x0100000000000000UL
>> +#define HBLKR_CTRL_MASK 0xf800000000000000UL
>> +#define HBLKR_CTRL_SUCCESS 0x8000000000000000UL
>> +#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
>> +#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
>> +
>> +/**
>> + * H_BLOCK_REMOVE caller.
>> + * @idx should point to the latest @param entry set with a PTEX.
>> + * If PTE cannot be processed because another CPUs has already locked that
>> + * group, those entries are put back in @param starting at index 1.
>> + * If entries has to be retried and @retry_busy is set to true, these entries
>> + * are retried until success. If @retry_busy is set to false, the returned
>> + * is the number of entries yet to process.
>> + */
>> +static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
>> + bool retry_busy)
>> +{
>> + unsigned long i, rc, new_idx;
>> + unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
>> +
>> +again:
>> + new_idx = 0;
>> + BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
>
> I count 1 ..
>
>> + if (idx < PLPAR_HCALL9_BUFSIZE)
>> + param[idx] = HBR_END;
>> +
>> + rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
>> + param[0], /* AVA */
>> + param[1], param[2], param[3], param[4], /* TS0-7 */
>> + param[5], param[6], param[7], param[8]);
>> + if (rc == H_SUCCESS)
>> + return 0;
>> +
>> + BUG_ON(rc != H_PARTIAL);
>
> 2 ...
>
>> + /* Check that the unprocessed entries were 'not found' or 'busy' */
>> + for (i = 0; i < idx-1; i++) {
>> + unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
>> +
>> + if (ctrl == HBLKR_CTRL_ERRBUSY) {
>> + param[++new_idx] = param[i+1];
>> + continue;
>> + }
>> +
>> + BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
>> + && ctrl != HBLKR_CTRL_ERRNOTFOUND);
>
> 3 ...
>
> BUG_ON()s.
>
> I know the code in this file is already pretty liberal with the use of
> BUG_ON() but I'd prefer if we don't make it any worse.
>
> Given this is an optimisation it seems like we should be able to fall
> back to the existing implementation in the case of error (which will
> probably then BUG_ON() ????)
>
> If there's some reason we can't then I guess I can live with it.

It would be nice to log the error in case we are not expecting the
error return. We recently did
https://marc.info/[email protected]

-aneesh


2018-08-01 01:56:30

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE

On Fri, 27 Jul 2018 15:51:30 +0200
Laurent Dufour <[email protected]> wrote:

> [Resending so everyone is getting the cover letter]
>
> On very large system we could see soft lockup fired when a process is exiting
>
> watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
> Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
> CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
> NIP: c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
> REGS: c00006b0645b7610 TRAP: 0901 Not tainted (4.17.0)
> MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22042082 XER: 00000000
> CFAR: 00000000006cf8f0 SOFTE: 0
> GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000
> GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480
> GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480
> GPR12: ffffffffffffffff c00000001e690800
> NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
> LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
> Call Trace:
> [c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
> [c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
> [c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
> [c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
> [c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
> [c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
> [c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
> [c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
> [c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
> [c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
> [c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
> Instruction dump:
> 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378
> e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008
>
> This happens when removing the PTE by calling the hypervisor using the
> H_BULK_REMOVE call. This call is processing up to 4 PTEs but is doing a
> tlbie for each PTE it is processing. This could lead to long time spent in
> the hypervisor (sometimes up to 4s) and soft lockup being raised because
> the scheduler is not called in zap_pte_range().
>
> Since the Power7's time, the hypervisor is providing a new hcall
> H_BLOCK_REMOVE allowing processing up to 8 PTEs with one call to
> tlbie. By limiting the amount of tlbie generated, this reduces the time
> spent invalidating the PTEs.

Oh that's a nice feature. I must have an ancient PAPR because I don't
have it. It could be a good project for someone to implement it in KVM
too.

>
> This hcall requires that the pages are "all within the same naturally
> aligned 8 page virtual address block".
>
> With this patch series applied, I couldn't see any soft lockup raised on
> the victim LPAR I was running the test one.
>
> This series is covering both normal pages and huge pages.

Really nice, thanks for working on the problem.

Thanks,
Nick

2018-08-03 02:30:42

by Michael Ellerman

[permalink] [raw]
Subject: Re: [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE

Nicholas Piggin <[email protected]> writes:
> On Fri, 27 Jul 2018 15:51:30 +0200
> Laurent Dufour <[email protected]> wrote:
>> [Resending so everyone is getting the cover letter]
>> On very large system we could see soft lockup fired when a process is exiting
>>
>> watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
>> Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
>> CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
>> NIP: c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
>> REGS: c00006b0645b7610 TRAP: 0901 Not tainted (4.17.0)
>> MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22042082 XER: 00000000
>> CFAR: 00000000006cf8f0 SOFTE: 0
>> GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000
>> GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480
>> GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480
>> GPR12: ffffffffffffffff c00000001e690800
>> NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
>> LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
>> Call Trace:
>> [c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
>> [c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
>> [c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
>> [c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
>> [c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
>> [c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
>> [c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
>> [c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
>> [c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
>> [c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
>> [c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
>> Instruction dump:
>> 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378
>> e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008
>>
>> This happens when removing the PTE by calling the hypervisor using the
>> H_BULK_REMOVE call. This call is processing up to 4 PTEs but is doing a
>> tlbie for each PTE it is processing. This could lead to long time spent in
>> the hypervisor (sometimes up to 4s) and soft lockup being raised because
>> the scheduler is not called in zap_pte_range().
>>
>> Since the Power7's time, the hypervisor is providing a new hcall
>> H_BLOCK_REMOVE allowing processing up to 8 PTEs with one call to
>> tlbie. By limiting the amount of tlbie generated, this reduces the time
>> spent invalidating the PTEs.
>
> Oh that's a nice feature. I must have an ancient PAPR because I don't
> have it.

The only public document is LoPAPR, which is a stripped down version of
the 2012 PAPR.

cheers

2018-08-16 18:06:45

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

On 30/07/2018 15:47, Michael Ellerman wrote:
> Hi Laurent,
>
> Just one comment below.
>
> Laurent Dufour <[email protected]> writes:
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 96b8cd8a802d..41ed03245eb4 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
>> BUG_ON(lpar_rc != H_SUCCESS);
>> }
>>
>> +
>> +/*
>> + * As defined in the PAPR's section 14.5.4.1.8
>> + * The control mask doesn't include the returned reference and change bit from
>> + * the processed PTE.
>> + */
>> +#define HBLKR_AVPN 0x0100000000000000UL
>> +#define HBLKR_CTRL_MASK 0xf800000000000000UL
>> +#define HBLKR_CTRL_SUCCESS 0x8000000000000000UL
>> +#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
>> +#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
>> +
>> +/**
>> + * H_BLOCK_REMOVE caller.
>> + * @idx should point to the latest @param entry set with a PTEX.
>> + * If PTE cannot be processed because another CPUs has already locked that
>> + * group, those entries are put back in @param starting at index 1.
>> + * If entries has to be retried and @retry_busy is set to true, these entries
>> + * are retried until success. If @retry_busy is set to false, the returned
>> + * is the number of entries yet to process.
>> + */
>> +static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
>> + bool retry_busy)
>> +{
>> + unsigned long i, rc, new_idx;
>> + unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
>> +
>> +again:
>> + new_idx = 0;
>> + BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
>
> I count 1 ..
>
>> + if (idx < PLPAR_HCALL9_BUFSIZE)
>> + param[idx] = HBR_END;
>> +
>> + rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
>> + param[0], /* AVA */
>> + param[1], param[2], param[3], param[4], /* TS0-7 */
>> + param[5], param[6], param[7], param[8]);
>> + if (rc == H_SUCCESS)
>> + return 0;
>> +
>> + BUG_ON(rc != H_PARTIAL);
>
> 2 ...
>
>> + /* Check that the unprocessed entries were 'not found' or 'busy' */
>> + for (i = 0; i < idx-1; i++) {
>> + unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
>> +
>> + if (ctrl == HBLKR_CTRL_ERRBUSY) {
>> + param[++new_idx] = param[i+1];
>> + continue;
>> + }
>> +
>> + BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
>> + && ctrl != HBLKR_CTRL_ERRNOTFOUND);
>
> 3 ...
>
> BUG_ON()s.
>
> I know the code in this file is already pretty liberal with the use of
> BUG_ON() but I'd prefer if we don't make it any worse.

The first one is clearly not required. But I would keep the following twos
because this call is not expected to fail except if there is a discrepancy
between the linux kernel HASH views and the hypervisor's one, which could be
dramatic in the consequences.

>
> Given this is an optimisation it seems like we should be able to fall
> back to the existing implementation in the case of error (which will
> probably then BUG_ON() ????)

I don't think falling back to the H_BULK call will be helpfull since it is
doing the same so the same errors are expected. Furthermore, this hcall can do
a partial work which means complex code to fallback on H_BULK as we should
identify to already processed entries.

> If there's some reason we can't then I guess I can live with it.

I'm proposing to send a new series with _only_ 2 calls to BUG_ON().

Furthermore this patch is not correct on the way the huge pages are managed. I
was too hurry to push it last time.

Cheers,
Laurent.


2018-08-16 18:49:00

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE

On 30/07/2018 16:22, Aneesh Kumar K.V wrote:
> Michael Ellerman <[email protected]> writes:
>
>> Hi Laurent,
>>
>> Just one comment below.
>>
>> Laurent Dufour <[email protected]> writes:
>>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>>> index 96b8cd8a802d..41ed03245eb4 100644
>>> --- a/arch/powerpc/platforms/pseries/lpar.c
>>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>>> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
>>> BUG_ON(lpar_rc != H_SUCCESS);
>>> }
>>>
>>> +
>>> +/*
>>> + * As defined in the PAPR's section 14.5.4.1.8
>>> + * The control mask doesn't include the returned reference and change bit from
>>> + * the processed PTE.
>>> + */
>>> +#define HBLKR_AVPN 0x0100000000000000UL
>>> +#define HBLKR_CTRL_MASK 0xf800000000000000UL
>>> +#define HBLKR_CTRL_SUCCESS 0x8000000000000000UL
>>> +#define HBLKR_CTRL_ERRNOTFOUND 0x8800000000000000UL
>>> +#define HBLKR_CTRL_ERRBUSY 0xa000000000000000UL
>>> +
>>> +/**
>>> + * H_BLOCK_REMOVE caller.
>>> + * @idx should point to the latest @param entry set with a PTEX.
>>> + * If PTE cannot be processed because another CPUs has already locked that
>>> + * group, those entries are put back in @param starting at index 1.
>>> + * If entries has to be retried and @retry_busy is set to true, these entries
>>> + * are retried until success. If @retry_busy is set to false, the returned
>>> + * is the number of entries yet to process.
>>> + */
>>> +static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
>>> + bool retry_busy)
>>> +{
>>> + unsigned long i, rc, new_idx;
>>> + unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
>>> +
>>> +again:
>>> + new_idx = 0;
>>> + BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
>>
>> I count 1 ..
>>
>>> + if (idx < PLPAR_HCALL9_BUFSIZE)
>>> + param[idx] = HBR_END;
>>> +
>>> + rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
>>> + param[0], /* AVA */
>>> + param[1], param[2], param[3], param[4], /* TS0-7 */
>>> + param[5], param[6], param[7], param[8]);
>>> + if (rc == H_SUCCESS)
>>> + return 0;
>>> +
>>> + BUG_ON(rc != H_PARTIAL);
>>
>> 2 ...
>>
>>> + /* Check that the unprocessed entries were 'not found' or 'busy' */
>>> + for (i = 0; i < idx-1; i++) {
>>> + unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
>>> +
>>> + if (ctrl == HBLKR_CTRL_ERRBUSY) {
>>> + param[++new_idx] = param[i+1];
>>> + continue;
>>> + }
>>> +
>>> + BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
>>> + && ctrl != HBLKR_CTRL_ERRNOTFOUND);
>>
>> 3 ...
>>
>> BUG_ON()s.
>>
>> I know the code in this file is already pretty liberal with the use of
>> BUG_ON() but I'd prefer if we don't make it any worse.
>>
>> Given this is an optimisation it seems like we should be able to fall
>> back to the existing implementation in the case of error (which will
>> probably then BUG_ON() ????)
>>
>> If there's some reason we can't then I guess I can live with it.
>
> It would be nice to log the error in case we are not expecting the
> error return. We recently did
> https://marc.info/[email protected]

I'm not sure that a failure during an invalidation should just result in an
error message being displayed because the page remains accessible and could
potentially be accessed later.
A comment in the caller hash__tlb_flush(), is quite explicit about that:
/* If there's a TLB batch pending, then we must flush it because the
* pages are going to be freed and we really don't want to have a CPU
* access a freed page because it has a stale TLB
*/

Getting an error when adding an entry may not be fatal but when removing one,
this could lead to data being exposed.

Laurent.