This patch series implements multiple mitigations for the speculative
execution findings:
1. The definition of the gmb() barrier as currently used by the
distributions; we may have to find a better name for it
2. The architecture code for the nospec interfaces; the macros for
nospec_ptr and nospec_load just use the gmb() barrier
3. The enablement for firmware features to switch between different
branch prediction modes. It comes with a config option
CONFIG_KERNEL_NOBP, two new kernel parameters "nobp=[0|1]" and
"nospec", and a new system call s390_modify_bp.
With CONFIG_KERNEL_NOBP=y the new branch prediction mode is active
for the kernel code by default and can be switched off with "nospec"
or "nobp=0". With CONFIG_KERNEL_NOBP=n the new mode is inactive for
kernel code unless "nobp=1" is specified.
User space code can use the trapdoor system call s390_modify_bp to
set the new TIF_NOBP bit. This switches to the new branch prediction
mode for the lifetime of the task; any children of the task will
inherit this attribute.
The vCPU of a KVM guest will run with the new branch prediction
mode if either the associated qemu task has TIF_NOBP set or if the
KVM kernel code sets TIF_NOBP_GUEST. The latter will require a small
update to the KVM backend (see the sketch after this list).
4. Transport channel reduction by clearing registers on interrupts,
system calls and KVM guest exits.
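For item 3, a minimal sketch of the KVM backend update mentioned above
(the hook name and the point where it would be called are hypothetical;
only the TIF_NOBP_GUEST flag itself is part of this series):

	/* hypothetical KVM hook, called from the vcpu thread itself */
	static void kvm_s390_enable_nobp_guest(void)
	{
		/* per-task flag, evaluated on the SIE entry/exit path */
		set_thread_flag(TIF_NOBP_GUEST);
	}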
We are working on an equivalent for retpoline; stay tuned.
@Greg: I have started with the backports for the stable kernel releases,
but unless the interface for gmb/nospec_ptr/nospec_load is cast in stone,
does it make sense to send them?
Christian Borntraeger (1):
KVM: s390: wire up seb feature
Martin Schwidefsky (5):
s390/alternative: use a copy of the facility bit mask
s390: implement nospec_[load|ptr]
s390: add options to change branch prediction behaviour for the kernel
s390: add system call to run tasks with modified branch prediction
s390: scrub registers on kernel entry and KVM exit
arch/s390/Kconfig | 17 +++++
arch/s390/include/asm/barrier.h | 38 ++++++++++
arch/s390/include/asm/facility.h | 18 +++++
arch/s390/include/asm/kvm_host.h | 3 +-
arch/s390/include/asm/lowcore.h | 3 +-
arch/s390/include/asm/processor.h | 1 +
arch/s390/include/asm/thread_info.h | 4 ++
arch/s390/include/uapi/asm/kvm.h | 4 +-
arch/s390/include/uapi/asm/unistd.h | 3 +-
arch/s390/kernel/alternative.c | 33 ++++++++-
arch/s390/kernel/early.c | 5 ++
arch/s390/kernel/entry.S | 134 +++++++++++++++++++++++++++++++++++-
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/setup.c | 4 +-
arch/s390/kernel/smp.c | 6 +-
arch/s390/kernel/sys_s390.c | 8 +++
arch/s390/kernel/syscalls.S | 1 +
arch/s390/kvm/kvm-s390.c | 11 +++
arch/s390/kvm/vsie.c | 8 +++
include/uapi/linux/kvm.h | 1 +
20 files changed, 294 insertions(+), 9 deletions(-)
--
2.7.4
Add the PPA instruction to the system entry and exit paths to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is "n", the "nobp=1" parameter can be used
to enable the changed kernel branch prediction.
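For reference, the two opcodes added below decode to PPA (perform
processor assist) with function codes 12 and 13. The mnemonic form is
given here as an illustration only; the function code semantics are
inferred from this patch, not from public documentation:

	ppa	%r0,%r0,12	# 0xb2e8c000, used by BPOFF
	ppa	%r0,%r0,13	# 0xb2e8d000, used by BPON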
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
---
arch/s390/Kconfig | 17 +++++++++++++
arch/s390/include/asm/processor.h | 1 +
arch/s390/kernel/alternative.c | 23 ++++++++++++++++++
arch/s390/kernel/early.c | 2 ++
arch/s390/kernel/entry.S | 50 ++++++++++++++++++++++++++++++++++++++-
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/smp.c | 2 ++
7 files changed, 95 insertions(+), 1 deletion(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 829c679..a818644 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -541,6 +541,23 @@ config ARCH_RANDOM
If unsure, say Y.
+config KERNEL_NOBP
+ def_bool n
+ prompt "Enable modified branch prediction for the kernel by default"
+ help
+ If this option is selected the kernel will switch to a modified
+ branch prediction mode if the firmware interface is available.
+ The modified branch prediction mode improves the behaviour in
+ regard to speculative execution.
+
+ With the option enabled the kernel parameter "nobp=0" or "nospec"
+ can be used to run the kernel in the normal branch prediction mode.
+
+ With the option disabled the modified branch prediction mode is
+ enabled with the "nobp=1" kernel parameter.
+
+ If unsure, say N.
+
endmenu
menu "Memory setup"
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index bfbfad4..5f37f9c 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -91,6 +91,7 @@ void cpu_detect_mhz_feature(void);
extern const struct seq_operations cpuinfo_op;
extern int sysctl_ieee_emulation_warnings;
extern void execve_tail(void);
+extern void __bpon(void);
/*
* User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit.
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index 33d2e88..2546bfb 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -15,6 +15,29 @@ static int __init disable_alternative_instructions(char *str)
early_param("noaltinstr", disable_alternative_instructions);
+static int __init nobp_setup_early(char *str)
+{
+ bool enabled;
+ int rc;
+
+ rc = kstrtobool(str, &enabled);
+ if (rc)
+ return rc;
+ if (enabled && test_facility(82))
+ __set_facility(82, S390_lowcore.alt_stfle_fac_list);
+ else
+ __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
+ return 0;
+}
+early_param("nobp", nobp_setup_early);
+
+static int __init nospec_setup_early(char *str)
+{
+ __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
+ return 0;
+}
+early_param("nospec", nospec_setup_early);
+
static int __init nogmb_setup_early(char *str)
{
__clear_facility(81, S390_lowcore.alt_stfle_fac_list);
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 510f218..ac707a9 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -196,6 +196,8 @@ static noinline __init void setup_facility_list(void)
memcpy(S390_lowcore.alt_stfle_fac_list,
S390_lowcore.stfle_fac_list,
sizeof(S390_lowcore.alt_stfle_fac_list));
+ if (!IS_ENABLED(CONFIG_KERNEL_NOBP))
+ __clear_facility(82, S390_lowcore.alt_stfle_fac_list);
}
static __init void detect_diag9c(void)
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 9e5f6cd..dab716b 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -159,6 +159,34 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
tm off+\addr, \mask
.endm
+ .macro BPOFF
+ .pushsection .altinstr_replacement, "ax"
+660: .long 0xb2e8c000
+ .popsection
+661: .long 0x47000000
+ .pushsection .altinstructions, "a"
+ .long 661b - .
+ .long 660b - .
+ .word 82
+ .byte 4
+ .byte 4
+ .popsection
+ .endm
+
+ .macro BPON
+ .pushsection .altinstr_replacement, "ax"
+662: .long 0xb2e8d000
+ .popsection
+663: .long 0x47000000
+ .pushsection .altinstructions, "a"
+ .long 663b - .
+ .long 662b - .
+ .word 82
+ .byte 4
+ .byte 4
+ .popsection
+ .endm
+
.section .kprobes.text, "ax"
.Ldummy:
/*
@@ -171,6 +199,11 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
*/
nop 0
+ENTRY(__bpon)
+ .globl __bpon
+ BPON
+ br %r14
+
/*
* Scheduler resume function, called by switch_to
* gpr2 = (task_struct *) prev
@@ -226,8 +259,11 @@ ENTRY(sie64a)
jnz .Lsie_skip
TSTMSK __LC_CPU_FLAGS,_CIF_FPU
jo .Lsie_skip # exit if fp/vx regs changed
+ BPON
.Lsie_entry:
sie 0(%r14)
+.Lsie_exit:
+ BPOFF
.Lsie_skip:
ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE
lctlg %c1,%c1,__LC_USER_ASCE # load primary asce
@@ -273,6 +309,7 @@ ENTRY(system_call)
stpt __LC_SYNC_ENTER_TIMER
.Lsysc_stmg:
stmg %r8,%r15,__LC_SAVE_AREA_SYNC
+ BPOFF
lg %r12,__LC_CURRENT
lghi %r13,__TASK_thread
lghi %r14,_PIF_SYSCALL
@@ -317,6 +354,7 @@ ENTRY(system_call)
jnz .Lsysc_work # check for work
TSTMSK __LC_CPU_FLAGS,_CIF_WORK
jnz .Lsysc_work
+ BPON
.Lsysc_restore:
lg %r14,__LC_VDSO_PER_CPU
lmg %r0,%r10,__PT_R0(%r11)
@@ -522,6 +560,7 @@ ENTRY(kernel_thread_starter)
ENTRY(pgm_check_handler)
stpt __LC_SYNC_ENTER_TIMER
+ BPOFF
stmg %r8,%r15,__LC_SAVE_AREA_SYNC
lg %r10,__LC_LAST_BREAK
lg %r12,__LC_CURRENT
@@ -620,6 +659,7 @@ ENTRY(pgm_check_handler)
ENTRY(io_int_handler)
STCK __LC_INT_CLOCK
stpt __LC_ASYNC_ENTER_TIMER
+ BPOFF
stmg %r8,%r15,__LC_SAVE_AREA_ASYNC
lg %r12,__LC_CURRENT
larl %r13,cleanup_critical
@@ -660,9 +700,13 @@ ENTRY(io_int_handler)
lg %r14,__LC_VDSO_PER_CPU
lmg %r0,%r10,__PT_R0(%r11)
mvc __LC_RETURN_PSW(16),__PT_PSW(%r11)
+ tm __PT_PSW+1(%r11),0x01 # returning to user ?
+ jno .Lio_exit_kernel
+ BPON
.Lio_exit_timer:
stpt __LC_EXIT_TIMER
mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER
+.Lio_exit_kernel:
lmg %r11,%r15,__PT_R11(%r11)
lpswe __LC_RETURN_PSW
.Lio_done:
@@ -833,6 +877,7 @@ ENTRY(io_int_handler)
ENTRY(ext_int_handler)
STCK __LC_INT_CLOCK
stpt __LC_ASYNC_ENTER_TIMER
+ BPOFF
stmg %r8,%r15,__LC_SAVE_AREA_ASYNC
lg %r12,__LC_CURRENT
larl %r13,cleanup_critical
@@ -871,6 +916,7 @@ ENTRY(psw_idle)
.Lpsw_idle_stcctm:
#endif
oi __LC_CPU_FLAGS+7,_CIF_ENABLED_WAIT
+ BPON
STCK __CLOCK_IDLE_ENTER(%r2)
stpt __TIMER_IDLE_ENTER(%r2)
.Lpsw_idle_lpsw:
@@ -971,6 +1017,7 @@ load_fpu_regs:
*/
ENTRY(mcck_int_handler)
STCK __LC_MCCK_CLOCK
+ BPOFF
la %r1,4095 # validate r1
spt __LC_CPU_TIMER_SAVE_AREA-4095(%r1) # validate cpu timer
sckc __LC_CLOCK_COMPARATOR # validate comparator
@@ -1071,6 +1118,7 @@ ENTRY(mcck_int_handler)
mvc __LC_RETURN_MCCK_PSW(16),__PT_PSW(%r11) # move return PSW
tm __LC_RETURN_MCCK_PSW+1,0x01 # returning to user ?
jno 0f
+ BPON
stpt __LC_EXIT_TIMER
mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER
0: lmg %r11,%r15,__PT_R11(%r11)
@@ -1385,7 +1433,7 @@ cleanup_critical:
.Lsie_crit_mcck_start:
.quad .Lsie_entry
.Lsie_crit_mcck_length:
- .quad .Lsie_skip - .Lsie_entry
+ .quad .Lsie_exit - .Lsie_entry
#endif
.section .rodata, "a"
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index 8ecb872..443b9b8 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -547,6 +547,7 @@ static struct kset *ipl_kset;
static void __ipl_run(void *unused)
{
+ __bpon();
diag308(DIAG308_LOAD_CLEAR, NULL);
if (MACHINE_IS_VM)
__cpcmd("IPL", NULL, 0, NULL);
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 4ce68ca..9cd1696 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -319,6 +319,7 @@ static void pcpu_delegate(struct pcpu *pcpu, void (*func)(void *),
mem_assign_absolute(lc->restart_fn, (unsigned long) func);
mem_assign_absolute(lc->restart_data, (unsigned long) data);
mem_assign_absolute(lc->restart_source, source_cpu);
+ __bpon();
asm volatile(
"0: sigp 0,%0,%2 # sigp restart to target cpu\n"
" brc 2,0b # busy, try again\n"
@@ -903,6 +904,7 @@ void __cpu_die(unsigned int cpu)
void __noreturn cpu_die(void)
{
idle_task_exit();
+ __bpon();
pcpu_sigp_retry(pcpu_devices + smp_processor_id(), SIGP_STOP, 0);
for (;;) ;
}
--
2.7.4
From: Christian Borntraeger <[email protected]>
The new firmware interfaces for branch prediction behaviour changes
are transparently available to the guest. Nevertheless, there is
new state attached that should be migrated and properly reset.
Provide a mechanism for handling reset, migration and VSIE.
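An illustrative user space snippet for the new interface (the field and
flag names are from this patch; the surrounding QEMU logic is a
hypothetical sketch, not part of the series):

	/* request spec blocking for this vCPU if the kernel offers it */
	if (run->kvm_valid_regs & KVM_SYNC_SEBC) {
		run->s.regs.sebc = 1;
		run->kvm_dirty_regs |= KVM_SYNC_SEBC;
	}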
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
---
arch/s390/include/asm/kvm_host.h | 3 ++-
arch/s390/include/uapi/asm/kvm.h | 4 +++-
arch/s390/kvm/kvm-s390.c | 11 +++++++++++
arch/s390/kvm/vsie.c | 8 ++++++++
include/uapi/linux/kvm.h | 1 +
5 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index e14f381..a2f6c5d 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -207,7 +207,8 @@ struct kvm_s390_sie_block {
__u16 ipa; /* 0x0056 */
__u32 ipb; /* 0x0058 */
__u32 scaoh; /* 0x005c */
- __u8 reserved60; /* 0x0060 */
+#define FPF_SEBC 0x20
+ __u8 fpf; /* 0x0060 */
#define ECB_GS 0x40
#define ECB_TE 0x10
#define ECB_SRSI 0x04
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 38535a57..20b9e9f 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -224,6 +224,7 @@ struct kvm_guest_debug_arch {
#define KVM_SYNC_RICCB (1UL << 7)
#define KVM_SYNC_FPRS (1UL << 8)
#define KVM_SYNC_GSCB (1UL << 9)
+#define KVM_SYNC_SEBC (1UL << 10)
/* length and alignment of the sdnx as a power of two */
#define SDNXC 8
#define SDNXL (1UL << SDNXC)
@@ -247,7 +248,8 @@ struct kvm_sync_regs {
};
__u8 reserved[512]; /* for future vector expansion */
__u32 fpc; /* valid on KVM_SYNC_VRS or KVM_SYNC_FPRS */
- __u8 padding1[52]; /* riccb needs to be 64byte aligned */
+ __u8 sebc:1; /* spec blocking */
+ __u8 padding1[51]; /* riccb needs to be 64byte aligned */
__u8 riccb[64]; /* runtime instrumentation controls block */
__u8 padding2[192]; /* sdnx needs to be 256byte aligned */
union {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2c93cbb..0c18f73 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -421,6 +421,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_GS:
r = test_facility(133);
break;
+ case KVM_CAP_S390_SEB:
+ r = test_facility(82);
+ break;
default:
r = 0;
}
@@ -2198,6 +2201,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
kvm_s390_set_prefix(vcpu, 0);
if (test_kvm_facility(vcpu->kvm, 64))
vcpu->run->kvm_valid_regs |= KVM_SYNC_RICCB;
+ if (test_kvm_facility(vcpu->kvm, 82))
+ vcpu->run->kvm_valid_regs |= KVM_SYNC_SEBC;
if (test_kvm_facility(vcpu->kvm, 133))
vcpu->run->kvm_valid_regs |= KVM_SYNC_GSCB;
/* fprs can be synchronized via vrs, even if the guest has no vx. With
@@ -2339,6 +2344,7 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
current->thread.fpu.fpc = 0;
vcpu->arch.sie_block->gbea = 1;
vcpu->arch.sie_block->pp = 0;
+ vcpu->arch.sie_block->fpf &= ~FPF_SEBC;
vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
kvm_clear_async_pf_completion_queue(vcpu);
if (!kvm_s390_user_cpu_state_ctrl(vcpu->kvm))
@@ -3298,6 +3304,10 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
vcpu->arch.sie_block->ecd |= ECD_HOSTREGMGMT;
vcpu->arch.gs_enabled = 1;
}
+ if (kvm_run->kvm_dirty_regs & KVM_SYNC_SEBC) {
+ vcpu->arch.sie_block->fpf &= ~FPF_SEBC;
+ vcpu->arch.sie_block->fpf |= kvm_run->s.regs.sebc ? FPF_SEBC : 0;
+ }
save_access_regs(vcpu->arch.host_acrs);
restore_access_regs(vcpu->run->s.regs.acrs);
/* save host (userspace) fprs/vrs */
@@ -3344,6 +3354,7 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
kvm_run->s.regs.pft = vcpu->arch.pfault_token;
kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
+ kvm_run->s.regs.sebc = (vcpu->arch.sie_block->fpf & FPF_SEBC) == FPF_SEBC;
save_access_regs(vcpu->run->s.regs.acrs);
restore_access_regs(vcpu->arch.host_acrs);
/* Save guest register state */
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 5d6ae03..10ea208 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -223,6 +223,10 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
memcpy(scb_o->gcr, scb_s->gcr, 128);
scb_o->pp = scb_s->pp;
+ /* speculative blocking */
+ scb_o->fpf &= ~FPF_SEBC;
+ scb_o->fpf |= scb_s->fpf & FPF_SEBC;
+
/* interrupt intercept */
switch (scb_s->icptcode) {
case ICPT_PROGI:
@@ -265,6 +269,7 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
scb_s->ecb3 = 0;
scb_s->ecd = 0;
scb_s->fac = 0;
+ scb_s->fpf = 0;
rc = prepare_cpuflags(vcpu, vsie_page);
if (rc)
@@ -324,6 +329,9 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
prefix_unmapped(vsie_page);
scb_s->ecb |= scb_o->ecb & ECB_TE;
}
+ /* speculative blocking */
+ if (test_kvm_facility(vcpu->kvm, 82))
+ scb_s->fpf |= scb_o->fpf & FPF_SEBC;
/* SIMD */
if (test_kvm_facility(vcpu->kvm, 129)) {
scb_s->eca |= scb_o->eca & ECA_VX;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 496e59a..066d7ff 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_HYPERV_SYNIC2 148
#define KVM_CAP_HYPERV_VP_INDEX 149
#define KVM_CAP_S390_AIS_MIGRATION 150
+#define KVM_CAP_S390_SEB 151
#ifdef KVM_CAP_IRQ_ROUTING
--
2.7.4
Add a system call to switch a task from the standard branch prediction
mode to a modified, more secure but slower behaviour.
A user space wrapper to start a program with the modified branch
prediction:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/unistd.h>		/* for __NR_s390_modify_bp (381) */

int main(int argc, char *argv[], char *envp[])
{
	int rc;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s <file-to-exec> <arguments>\n",
			argv[0]);
		exit(EXIT_FAILURE);
	}
	rc = syscall(__NR_s390_modify_bp);
	if (rc) {
		perror("s390_modify_bp");
		exit(EXIT_FAILURE);
	}
	execve(argv[1], argv + 1, envp);
	perror("execve"); /* execve() returns only on error */
	exit(EXIT_FAILURE);
}
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
---
arch/s390/include/asm/thread_info.h | 4 +++
arch/s390/include/uapi/asm/unistd.h | 3 ++-
arch/s390/kernel/entry.S | 51 +++++++++++++++++++++++++++++++++----
arch/s390/kernel/sys_s390.c | 8 ++++++
arch/s390/kernel/syscalls.S | 1 +
5 files changed, 61 insertions(+), 6 deletions(-)
diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h
index 0880a37..ccf37c2 100644
--- a/arch/s390/include/asm/thread_info.h
+++ b/arch/s390/include/asm/thread_info.h
@@ -60,6 +60,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
#define TIF_GUARDED_STORAGE 4 /* load guarded storage control block */
#define TIF_PATCH_PENDING 5 /* pending live patching update */
#define TIF_PGSTE 6 /* New mm's will use 4K page tables */
+#define TIF_NOBP 8 /* Run process with BP off */
+#define TIF_NOBP_GUEST 9 /* Run KVM guests with BP off */
#define TIF_31BIT 16 /* 32bit process */
#define TIF_MEMDIE 17 /* is terminating due to OOM killer */
@@ -80,6 +82,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
#define _TIF_UPROBE _BITUL(TIF_UPROBE)
#define _TIF_GUARDED_STORAGE _BITUL(TIF_GUARDED_STORAGE)
#define _TIF_PATCH_PENDING _BITUL(TIF_PATCH_PENDING)
+#define _TIF_NOBP _BITUL(TIF_NOBP)
+#define _TIF_NOBP_GUEST _BITUL(TIF_NOBP_GUEST)
#define _TIF_31BIT _BITUL(TIF_31BIT)
#define _TIF_SINGLE_STEP _BITUL(TIF_SINGLE_STEP)
diff --git a/arch/s390/include/uapi/asm/unistd.h b/arch/s390/include/uapi/asm/unistd.h
index 7251209..8803723 100644
--- a/arch/s390/include/uapi/asm/unistd.h
+++ b/arch/s390/include/uapi/asm/unistd.h
@@ -317,7 +317,8 @@
#define __NR_s390_guarded_storage 378
#define __NR_statx 379
#define __NR_s390_sthyi 380
-#define NR_syscalls 381
+#define __NR_s390_modify_bp 381
+#define NR_syscalls 382
/*
* There are some system calls that are not present on 64 bit, some
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index dab716b..2a22c03 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -107,6 +107,7 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
aghi %r15,-(STACK_FRAME_OVERHEAD + __PT_SIZE)
j 3f
1: UPDATE_VTIME %r14,%r15,\timer
+ BPENTER __TI_flags(%r12),_TIF_NOBP
2: lg %r15,__LC_ASYNC_STACK # load async stack
3: la %r11,STACK_FRAME_OVERHEAD(%r15)
.endm
@@ -187,6 +188,40 @@ _PIF_WORK = (_PIF_PER_TRAP | _PIF_SYSCALL_RESTART)
.popsection
.endm
+ .macro BPENTER tif_ptr,tif_mask
+ .pushsection .altinstr_replacement, "ax"
+662: .word 0xc004, 0x0000, 0x0000 # 6 byte nop
+ .word 0xc004, 0x0000, 0x0000 # 6 byte nop
+ .popsection
+664: TSTMSK \tif_ptr,\tif_mask
+ jz . + 8
+ .long 0xb2e8d000
+ .pushsection .altinstructions, "a"
+ .long 664b - .
+ .long 662b - .
+ .word 82
+ .byte 12
+ .byte 12
+ .popsection
+ .endm
+
+ .macro BPEXIT tif_ptr,tif_mask
+ TSTMSK \tif_ptr,\tif_mask
+ .pushsection .altinstr_replacement, "ax"
+662: jnz . + 8
+ .long 0xb2e8d000
+ .popsection
+664: jz . + 8
+ .long 0xb2e8c000
+ .pushsection .altinstructions, "a"
+ .long 664b - .
+ .long 662b - .
+ .word 82
+ .byte 8
+ .byte 8
+ .popsection
+ .endm
+
.section .kprobes.text, "ax"
.Ldummy:
/*
@@ -240,9 +275,11 @@ ENTRY(__switch_to)
*/
ENTRY(sie64a)
stmg %r6,%r14,__SF_GPRS(%r15) # save kernel registers
+ lg %r12,__LC_CURRENT
stg %r2,__SF_EMPTY(%r15) # save control block pointer
stg %r3,__SF_EMPTY+8(%r15) # save guest register save area
xc __SF_EMPTY+16(8,%r15),__SF_EMPTY+16(%r15) # reason code = 0
+ mvc __SF_EMPTY+24(8,%r15),__TI_flags(%r12) # copy thread flags
TSTMSK __LC_CPU_FLAGS,_CIF_FPU # load guest fp/vx registers ?
jno .Lsie_load_guest_gprs
brasl %r14,load_fpu_regs # load guest fp/vx regs
@@ -259,11 +296,12 @@ ENTRY(sie64a)
jnz .Lsie_skip
TSTMSK __LC_CPU_FLAGS,_CIF_FPU
jo .Lsie_skip # exit if fp/vx regs changed
- BPON
+ BPEXIT __SF_EMPTY+24(%r15),(_TIF_NOBP|_TIF_NOBP_GUEST)
.Lsie_entry:
sie 0(%r14)
.Lsie_exit:
BPOFF
+ BPENTER __SF_EMPTY+24(%r15),(_TIF_NOBP|_TIF_NOBP_GUEST)
.Lsie_skip:
ni __SIE_PROG0C+3(%r14),0xfe # no longer in SIE
lctlg %c1,%c1,__LC_USER_ASCE # load primary asce
@@ -318,6 +356,7 @@ ENTRY(system_call)
la %r11,STACK_FRAME_OVERHEAD(%r15) # pointer to pt_regs
.Lsysc_vtime:
UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
+ BPENTER __TI_flags(%r12),_TIF_NOBP
stmg %r0,%r7,__PT_R0(%r11)
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
@@ -354,7 +393,7 @@ ENTRY(system_call)
jnz .Lsysc_work # check for work
TSTMSK __LC_CPU_FLAGS,_CIF_WORK
jnz .Lsysc_work
- BPON
+ BPEXIT __TI_flags(%r12),_TIF_NOBP
.Lsysc_restore:
lg %r14,__LC_VDSO_PER_CPU
lmg %r0,%r10,__PT_R0(%r11)
@@ -589,6 +628,7 @@ ENTRY(pgm_check_handler)
aghi %r15,-(STACK_FRAME_OVERHEAD + __PT_SIZE)
j 4f
2: UPDATE_VTIME %r14,%r15,__LC_SYNC_ENTER_TIMER
+ BPENTER __TI_flags(%r12),_TIF_NOBP
lg %r15,__LC_KERNEL_STACK
lgr %r14,%r12
aghi %r14,__TASK_thread # pointer to thread_struct
@@ -702,7 +742,7 @@ ENTRY(io_int_handler)
mvc __LC_RETURN_PSW(16),__PT_PSW(%r11)
tm __PT_PSW+1(%r11),0x01 # returning to user ?
jno .Lio_exit_kernel
- BPON
+ BPEXIT __TI_flags(%r12),_TIF_NOBP
.Lio_exit_timer:
stpt __LC_EXIT_TIMER
mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER
@@ -1118,7 +1158,7 @@ ENTRY(mcck_int_handler)
mvc __LC_RETURN_MCCK_PSW(16),__PT_PSW(%r11) # move return PSW
tm __LC_RETURN_MCCK_PSW+1,0x01 # returning to user ?
jno 0f
- BPON
+ BPEXIT __TI_flags(%r12),_TIF_NOBP
stpt __LC_EXIT_TIMER
mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER
0: lmg %r11,%r15,__PT_R11(%r11)
@@ -1245,7 +1285,8 @@ cleanup_critical:
clg %r9,BASED(.Lsie_crit_mcck_length)
jh 1f
oi __LC_CPU_FLAGS+7, _CIF_MCCK_GUEST
-1: lg %r9,__SF_EMPTY(%r15) # get control block pointer
+1: BPENTER __SF_EMPTY+24(%r15),(_TIF_NOBP|_TIF_NOBP_GUEST)
+ lg %r9,__SF_EMPTY(%r15) # get control block pointer
ni __SIE_PROG0C+3(%r9),0xfe # no longer in SIE
lctlg %c1,%c1,__LC_USER_ASCE # load primary asce
larl %r9,sie_exit # skip forward to sie_exit
diff --git a/arch/s390/kernel/sys_s390.c b/arch/s390/kernel/sys_s390.c
index 0090037..7579c97 100644
--- a/arch/s390/kernel/sys_s390.c
+++ b/arch/s390/kernel/sys_s390.c
@@ -90,3 +90,11 @@ SYSCALL_DEFINE1(s390_personality, unsigned int, personality)
return ret;
}
+
+SYSCALL_DEFINE0(s390_modify_bp)
+{
+ if (!test_facility(82))
+ return -EOPNOTSUPP;
+ set_thread_flag(TIF_NOBP);
+ return 0;
+}
diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S
index f7fc633..0c6293b 100644
--- a/arch/s390/kernel/syscalls.S
+++ b/arch/s390/kernel/syscalls.S
@@ -390,3 +390,4 @@ SYSCALL(sys_pwritev2,compat_sys_pwritev2)
SYSCALL(sys_s390_guarded_storage,compat_sys_s390_guarded_storage) /* 378 */
SYSCALL(sys_statx,compat_sys_statx)
SYSCALL(sys_s390_sthyi,compat_sys_s390_sthyi)
+SYSCALL(sys_s390_modify_bp,sys_s390_modify_bp)
--
2.7.4
Clear all user space registers on entry to the kernel, and all KVM guest
registers on KVM guest exit, unless the register contains either a
parameter or a result value.
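As an illustration, the system call path below scrubs only %r0. This
reading of the register usage is an assumption based on the s390 calling
convention, not something spelled out in the patch:

	stmg	%r0,%r7,__PT_R0(%r11)	# user gprs saved to pt_regs first
	xgr	%r0,%r0			# %r0 carries no syscall argument
	# %r2-%r7 hold syscall arguments (and %r2 the result), so they
	# must stay live and are not cleared on this path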
Suggested-by: Christian Borntraeger <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
---
arch/s390/kernel/entry.S | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 2a22c03..47227d3 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -322,6 +322,12 @@ ENTRY(sie64a)
sie_exit:
lg %r14,__SF_EMPTY+8(%r15) # load guest register save area
stmg %r0,%r13,0(%r14) # save guest gprs 0-13
+ xgr %r0,%r0 # clear guest registers
+ xgr %r1,%r1
+ xgr %r2,%r2
+ xgr %r3,%r3
+ xgr %r4,%r4
+ xgr %r5,%r5
lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers
lg %r2,__SF_EMPTY+16(%r15) # return exit reason code
br %r14
@@ -358,6 +364,7 @@ ENTRY(system_call)
UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
BPENTER __TI_flags(%r12),_TIF_NOBP
stmg %r0,%r7,__PT_R0(%r11)
+ xgr %r0,%r0
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC
@@ -640,6 +647,14 @@ ENTRY(pgm_check_handler)
4: lgr %r13,%r11
la %r11,STACK_FRAME_OVERHEAD(%r15)
stmg %r0,%r7,__PT_R0(%r11)
+ xgr %r0,%r0 # clear user space registers
+ xgr %r1,%r1
+ xgr %r2,%r2
+ xgr %r3,%r3
+ xgr %r4,%r4
+ xgr %r5,%r5
+ xgr %r6,%r6
+ xgr %r7,%r7
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
stmg %r8,%r9,__PT_PSW(%r11)
mvc __PT_INT_CODE(4,%r11),__LC_PGM_ILC
@@ -706,6 +721,15 @@ ENTRY(io_int_handler)
lmg %r8,%r9,__LC_IO_OLD_PSW
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
stmg %r0,%r7,__PT_R0(%r11)
+ xgr %r0,%r0 # clear user space registers
+ xgr %r1,%r1
+ xgr %r2,%r2
+ xgr %r3,%r3
+ xgr %r4,%r4
+ xgr %r5,%r5
+ xgr %r6,%r6
+ xgr %r7,%r7
+ xgr %r10,%r10
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
stmg %r8,%r9,__PT_PSW(%r11)
mvc __PT_INT_CODE(12,%r11),__LC_SUBCHANNEL_ID
@@ -924,6 +948,15 @@ ENTRY(ext_int_handler)
lmg %r8,%r9,__LC_EXT_OLD_PSW
SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
stmg %r0,%r7,__PT_R0(%r11)
+ xgr %r0,%r0 # clear user space registers
+ xgr %r1,%r1
+ xgr %r2,%r2
+ xgr %r3,%r3
+ xgr %r4,%r4
+ xgr %r5,%r5
+ xgr %r6,%r6
+ xgr %r7,%r7
+ xgr %r10,%r10
mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
stmg %r8,%r9,__PT_PSW(%r11)
lghi %r1,__LC_EXT_PARAMS2
@@ -1133,6 +1166,14 @@ ENTRY(mcck_int_handler)
.Lmcck_skip:
lghi %r14,__LC_GPREGS_SAVE_AREA+64
stmg %r0,%r7,__PT_R0(%r11)
+ xgr %r0,%r0 # clear user space registers
+ xgr %r2,%r2
+ xgr %r3,%r3
+ xgr %r4,%r4
+ xgr %r5,%r5
+ xgr %r6,%r6
+ xgr %r7,%r7
+ xgr %r10,%r10
mvc __PT_R8(64,%r11),0(%r14)
stmg %r8,%r9,__PT_PSW(%r11)
xc __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
--
2.7.4
Implement nospec_load() and nospec_ptr() for s390 with the new
gmb() barrier between the boundary condition and the load that
may not be done speculatively.
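A hypothetical usage sketch (the table and its bounds are invented for
illustration; the semantics follow the macro below):

	/* yields table[idx] if &table[idx] lies inside [table, table + size),
	 * with gmb() ordering the bounds check before the load; 0 otherwise */
	unsigned long val = nospec_load(&table[idx], table, table + size);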
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
---
arch/s390/include/asm/barrier.h | 38 ++++++++++++++++++++++++++++++++++++++
arch/s390/kernel/alternative.c | 7 +++++++
2 files changed, 45 insertions(+)
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 1043260..b8836a6 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -8,6 +8,8 @@
#ifndef __ASM_BARRIER_H
#define __ASM_BARRIER_H
+#include <asm/alternative.h>
+
/*
* Force strict CPU ordering.
* And yes, this is required on UP too when we're talking
@@ -23,6 +25,42 @@
#define mb() do { asm volatile(__ASM_BARRIER : : : "memory"); } while (0)
+static inline void gmb(void)
+{
+ asm volatile(
+ ALTERNATIVE("", ".long 0xb2e8f000", 81)
+ : : : "memory");
+}
+#define gmb gmb
+
+#define nospec_ptr(ptr, lo, hi) \
+({ \
+ typeof (ptr) __ptr = (ptr); \
+ typeof (ptr) __lo = (lo); \
+ typeof (ptr) __hi = (hi); \
+ typeof (ptr) __vptr = NULL; \
+ \
+ if (__lo <= __ptr && __ptr < __hi) { \
+ gmb(); \
+ __vptr = __ptr; \
+ } \
+ __vptr; \
+})
+
+#define nospec_load(ptr, lo, hi) \
+({ \
+ typeof (ptr) __ptr = (ptr); \
+ typeof (ptr) __lo = (lo); \
+ typeof (ptr) __hi = (hi); \
+ typeof (*ptr) __v = (typeof(*__ptr))(unsigned long) 0; \
+ \
+ if (__lo <= __ptr && __ptr < __hi) { \
+ gmb(); \
+ __v = *__ptr; \
+ } \
+ __v; \
+})
+
#define rmb() barrier()
#define wmb() barrier()
#define dma_rmb() mb()
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index 1abf4f3..33d2e88 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -15,6 +15,13 @@ static int __init disable_alternative_instructions(char *str)
early_param("noaltinstr", disable_alternative_instructions);
+static int __init nogmb_setup_early(char *str)
+{
+ __clear_facility(81, S390_lowcore.alt_stfle_fac_list);
+ return 0;
+}
+early_param("nogmb", nogmb_setup_early);
+
struct brcl_insn {
u16 opc;
s32 disp;
--
2.7.4
To be able to switch off specific CPU alternatives with kernel parameters,
make a copy of the facility bit mask provided by STFLE and use the copy
for the decision whether to apply an alternative.
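A note on the bit numbering used below: facility bits count from the
most significant bit of the first byte, hence the "0x80 >> (nr & 7)"
mask. For example, the call used later in this series:

	/* facility 82: byte 10 (82 >> 3), mask 0x20 (0x80 >> (82 & 7)) */
	__clear_facility(82, S390_lowcore.alt_stfle_fac_list);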
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
---
arch/s390/include/asm/facility.h | 18 ++++++++++++++++++
arch/s390/include/asm/lowcore.h | 3 ++-
arch/s390/kernel/alternative.c | 3 ++-
arch/s390/kernel/early.c | 3 +++
arch/s390/kernel/setup.c | 4 +++-
arch/s390/kernel/smp.c | 4 +++-
6 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/arch/s390/include/asm/facility.h b/arch/s390/include/asm/facility.h
index f040644..2d58478 100644
--- a/arch/s390/include/asm/facility.h
+++ b/arch/s390/include/asm/facility.h
@@ -15,6 +15,24 @@
#define MAX_FACILITY_BIT (sizeof(((struct lowcore *)0)->stfle_fac_list) * 8)
+static inline void __set_facility(unsigned long nr, void *facilities)
+{
+ unsigned char *ptr = (unsigned char *) facilities;
+
+ if (nr >= MAX_FACILITY_BIT)
+ return;
+ ptr[nr >> 3] |= 0x80 >> (nr & 7);
+}
+
+static inline void __clear_facility(unsigned long nr, void *facilities)
+{
+ unsigned char *ptr = (unsigned char *) facilities;
+
+ if (nr >= MAX_FACILITY_BIT)
+ return;
+ ptr[nr >> 3] &= ~(0x80 >> (nr & 7));
+}
+
static inline int __test_facility(unsigned long nr, void *facilities)
{
unsigned char *ptr;
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index ec6592e..c63986a 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -151,7 +151,8 @@ struct lowcore {
__u8 pad_0x0e20[0x0f00-0x0e20]; /* 0x0e20 */
/* Extended facility list */
- __u64 stfle_fac_list[32]; /* 0x0f00 */
+ __u64 stfle_fac_list[16]; /* 0x0f00 */
+ __u64 alt_stfle_fac_list[16]; /* 0x0f80 */
__u8 pad_0x1000[0x11b0-0x1000]; /* 0x1000 */
/* Pointer to the machine check extended save area */
diff --git a/arch/s390/kernel/alternative.c b/arch/s390/kernel/alternative.c
index 574e776..1abf4f3 100644
--- a/arch/s390/kernel/alternative.c
+++ b/arch/s390/kernel/alternative.c
@@ -75,7 +75,8 @@ static void __init_or_module __apply_alternatives(struct alt_instr *start,
instr = (u8 *)&a->instr_offset + a->instr_offset;
replacement = (u8 *)&a->repl_offset + a->repl_offset;
- if (!test_facility(a->facility))
+ if (!__test_facility(a->facility,
+ S390_lowcore.alt_stfle_fac_list))
continue;
if (unlikely(a->instrlen % 2 || a->replacementlen % 2)) {
diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c
index 497a920..510f218 100644
--- a/arch/s390/kernel/early.c
+++ b/arch/s390/kernel/early.c
@@ -193,6 +193,9 @@ static noinline __init void setup_facility_list(void)
{
stfle(S390_lowcore.stfle_fac_list,
ARRAY_SIZE(S390_lowcore.stfle_fac_list));
+ memcpy(S390_lowcore.alt_stfle_fac_list,
+ S390_lowcore.stfle_fac_list,
+ sizeof(S390_lowcore.alt_stfle_fac_list));
}
static __init void detect_diag9c(void)
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 793da97..bcd2a4a 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -340,7 +340,9 @@ static void __init setup_lowcore(void)
lc->preempt_count = S390_lowcore.preempt_count;
lc->stfl_fac_list = S390_lowcore.stfl_fac_list;
memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list,
- MAX_FACILITY_BIT/8);
+ sizeof(lc->stfle_fac_list));
+ memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list,
+ sizeof(lc->alt_stfle_fac_list));
nmi_alloc_boot_cpu(lc);
vdso_alloc_boot_cpu(lc);
lc->sync_enter_timer = S390_lowcore.sync_enter_timer;
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index b8c1a85..4ce68ca 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -266,7 +266,9 @@ static void pcpu_prepare_secondary(struct pcpu *pcpu, int cpu)
__ctl_store(lc->cregs_save_area, 0, 15);
save_access_regs((unsigned int *) lc->access_regs_save_area);
memcpy(lc->stfle_fac_list, S390_lowcore.stfle_fac_list,
- MAX_FACILITY_BIT/8);
+ sizeof(lc->stfle_fac_list));
+ memcpy(lc->alt_stfle_fac_list, S390_lowcore.alt_stfle_fac_list,
+ sizeof(lc->alt_stfle_fac_list));
arch_spin_lock_setup(cpu);
}
--
2.7.4
On 01/17/2018 10:48 AM, Martin Schwidefsky wrote:
> rc = syscall(__NR_s390_modify_bp);
> if (rc) {
> perror("s390_modify_bp");
> exit(EXIT_FAILURE);
> }
Isn't this traditionally done through personality or prctl?
This looks like something other architectures may want as well.
Thanks,
Florian
On 17/01/2018 11:03, Florian Weimer wrote:
> On 01/17/2018 10:48 AM, Martin Schwidefsky wrote:
>> rc = syscall(__NR_s390_modify_bp);
>> if (rc) {
>> perror("s390_modify_bp");
>> exit(EXIT_FAILURE);
>> }
>
> Isn't this traditionally done through personality or prctl?
>
> This looks like something other architectures may want as well.
Yes, Intel would want to have a prctl or similar to enable STIBP
(single-thread indirect branch predictor).
Paolo
On 01/17/2018 11:03 AM, Florian Weimer wrote:
> On 01/17/2018 10:48 AM, Martin Schwidefsky wrote:
>> rc = syscall(__NR_s390_modify_bp);
>> if (rc) {
>> perror("s390_modify_bp");
>> exit(EXIT_FAILURE);
>> }
>
> Isn't this traditionally done through personality or prctl?
I think we want this per thread (and not per process). So I assume personality
will not work out. Can a prctl be done per thread?
>
> This looks like something other architectures may want as well.
Probably.
Paolo,
while this is kvm code, my current plan is to submit the "final" version after
review and probably some fixes/renames via Martin together with the other
patches. Are you ok with that? Right now it seems that the CAP number is still
fine.
Christian
> while this is kvm code, my current plan is to submit the "final"
> version after review and probably some fixes/renames via Martin
> together with the other patches. Are you ok with that? Right now it
> seems that the CAP number is still fine.
Sure, though there will be a capability introduced by PPC for similar
purposes, so check for conflicts.
On 17/01/2018 12:18, Christian Borntraeger wrote:
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 2c93cbb..0c18f73 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -421,6 +421,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_S390_GS:
> r = test_facility(133);
> break;
> + case KVM_CAP_S390_SEB:
> + r = test_facility(82);
> + break;
> default:
> r = 0;
Can you add a generic "test facility" capability and ioctl?
Paolo
On 01/17/2018 12:22 PM, Paolo Bonzini wrote:
>> while this is kvm code, my current plan is to submit the "final"
>> version after review and probably some fixes/renames via Martin
>> together with the other patches. Are you ok with that? Right now it
>> seems that the CAP number is still fine.
> Sure, though there will be a capability introduced by PPC for similar
> purposes, so check for conflicts.
>
> On 17/01/2018 12:18, Christian Borntraeger wrote:
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 2c93cbb..0c18f73 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -421,6 +421,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> case KVM_CAP_S390_GS:
>> r = test_facility(133);
>> break;
>> + case KVM_CAP_S390_SEB:
>> + r = test_facility(82);
>> + break;
>> default:
>> r = 0;
>
> Can you add a generic "test facility" capability and ioctl?
The problem is not that I announce the facility; I in fact announce that the
programmatic interface is available (the sebc sync reg and the usage of that
field). (So the CAP is part of this patch to have both in lockstep.)
A non-existing facility will then just disable that programmatic interface.
On 01/17/2018 12:28 PM, Christian Borntraeger wrote:
>
>
> On 01/17/2018 12:22 PM, Paolo Bonzini wrote:
>>> while this is kvm code, my current plan is to submit the "final"
>>> version after review and probably some fixes/renames via Martin
>>> together with the other patches. Are you ok with that? Right now it
>>> seems that the CAP number is still fine.
>> Sure, though there will be a capability introduced by PPC for similar
>> purposes, so check for conflicts.
>>
>> On 17/01/2018 12:18, Christian Borntraeger wrote:
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index 2c93cbb..0c18f73 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -421,6 +421,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>> case KVM_CAP_S390_GS:
>>> r = test_facility(133);
>>> break;
>>> + case KVM_CAP_S390_SEB:
>>> + r = test_facility(82);
>>> + break;
>>> default:
>>> r = 0;
>>
>> Can you add a generic "test facility" capability and ioctl?
>
> The problem is not that I announce the facility, I in fact announce that the
> programmatic interface is available (the sebc sync reg and the usage of that field).
> (So the CAP is part of this patch to have both in lockstep)
> A non-existing facility will then just disable that programmatic interface.
To put it differently: CAP_S390_GS and CAP_S390_SEB could also just
do a
return 1;
and QEMU has to check both (which it probably does anyway).
Christian
On 17/01/2018 12:29, Christian Borntraeger wrote:
>> The problem is not that I announce the facility, I in fact announce that the
>> programmatic interface is available (the sebc sync reg and the usage of that field).
>> (So the CAP is part of this patch to have both in lockstep)
>> A non-existing facility will then just disable that programmatic interface.
> To put it differently. CAP_S390_GS and CAP_S390_SEB could also just
> do a
>
> return 1;
>
> and the QEMU has to check both (which it probably does anyway)
I see. Thanks for the explanation!
Paolo
> #define ECB_GS 0x40
> #define ECB_TE 0x10
> #define ECB_SRSI 0x04
> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index 38535a57..20b9e9f 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -224,6 +224,7 @@ struct kvm_guest_debug_arch {
> #define KVM_SYNC_RICCB (1UL << 7)
> #define KVM_SYNC_FPRS (1UL << 8)
> #define KVM_SYNC_GSCB (1UL << 9)
> +#define KVM_SYNC_SEBC (1UL << 10)
> /* length and alignment of the sdnx as a power of two */
> #define SDNXC 8
> #define SDNXL (1UL << SDNXC)
> @@ -247,7 +248,8 @@ struct kvm_sync_regs {
> };
> __u8 reserved[512]; /* for future vector expansion */
> __u32 fpc; /* valid on KVM_SYNC_VRS or KVM_SYNC_FPRS */
> - __u8 padding1[52]; /* riccb needs to be 64byte aligned */
> + __u8 sebc:1; /* spec blocking */
do you want to define the unused bits as reserved? Nicer to read IMHO
(especially also using spaces "sebc : 1")
> + __u8 padding1[51]; /* riccb needs to be 64byte aligned */
> __u8 riccb[64]; /* runtime instrumentation controls block */
> __u8 padding2[192]; /* sdnx needs to be 256byte aligned */
> union {
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 2c93cbb..0c18f73 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -421,6 +421,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_S390_GS:
> r = test_facility(133);
> break;
> + case KVM_CAP_S390_SEB:
> + r = test_facility(82);
> + break;
> default:
> r = 0;
> }
> @@ -2198,6 +2201,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> kvm_s390_set_prefix(vcpu, 0);
> if (test_kvm_facility(vcpu->kvm, 64))
> vcpu->run->kvm_valid_regs |= KVM_SYNC_RICCB;
> + if (test_kvm_facility(vcpu->kvm, 82))
> + vcpu->run->kvm_valid_regs |= KVM_SYNC_SEBC;
> if (test_kvm_facility(vcpu->kvm, 133))
> vcpu->run->kvm_valid_regs |= KVM_SYNC_GSCB;
> /* fprs can be synchronized via vrs, even if the guest has no vx. With
> @@ -2339,6 +2344,7 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
> current->thread.fpu.fpc = 0;
> vcpu->arch.sie_block->gbea = 1;
> vcpu->arch.sie_block->pp = 0;
> + vcpu->arch.sie_block->fpf &= ~FPF_SEBC;
> vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
> kvm_clear_async_pf_completion_queue(vcpu);
> if (!kvm_s390_user_cpu_state_ctrl(vcpu->kvm))
> @@ -3298,6 +3304,10 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> vcpu->arch.sie_block->ecd |= ECD_HOSTREGMGMT;
> vcpu->arch.gs_enabled = 1;
> }
> + if (kvm_run->kvm_dirty_regs & KVM_SYNC_SEBC) {
We should test for test_facility(82). Otherwise user space can enable
undefined bits in the SCB on machines with !facility 82.
> + vcpu->arch.sie_block->fpf &= ~FPF_SEBC;
> + vcpu->arch.sie_block->fpf |= kvm_run->s.regs.sebc ? FPF_SEBC : 0;
> + }
> save_access_regs(vcpu->arch.host_acrs);
> restore_access_regs(vcpu->run->s.regs.acrs);
> /* save host (userspace) fprs/vrs */
> @@ -3344,6 +3354,7 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> kvm_run->s.regs.pft = vcpu->arch.pfault_token;
> kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
> kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
> + kvm_run->s.regs.sebc = (vcpu->arch.sie_block->fpf & FPF_SEBC) == FPF_SEBC;
> save_access_regs(vcpu->run->s.regs.acrs);
> restore_access_regs(vcpu->arch.host_acrs);
> /* Save guest register state */
> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
> index 5d6ae03..10ea208 100644
> --- a/arch/s390/kvm/vsie.c
> +++ b/arch/s390/kvm/vsie.c
> @@ -223,6 +223,10 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
> memcpy(scb_o->gcr, scb_s->gcr, 128);
> scb_o->pp = scb_s->pp;
>
> + /* speculative blocking */
This field should only be written back with test_kvm_facility(vcpu->kvm, 82)
(no public documentation, this looks like the SIE can modify this field?
Triggered by which instruction?)
> + scb_o->fpf &= ~FPF_SEBC;
> + scb_o->fpf |= scb_s->fpf & FPF_SEBC;
> +
> /* interrupt intercept */
> switch (scb_s->icptcode) {
> case ICPT_PROGI:
> @@ -265,6 +269,7 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
> scb_s->ecb3 = 0;
> scb_s->ecd = 0;
> scb_s->fac = 0;
> + scb_s->fpf = 0;
>
> rc = prepare_cpuflags(vcpu, vsie_page);
> if (rc)
> @@ -324,6 +329,9 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
> prefix_unmapped(vsie_page);
> scb_s->ecb |= scb_o->ecb & ECB_TE;
> }
> + /* speculative blocking */
> + if (test_kvm_facility(vcpu->kvm, 82))
> + scb_s->fpf |= scb_o->fpf & FPF_SEBC;
> /* SIMD */
> if (test_kvm_facility(vcpu->kvm, 129)) {
> scb_s->eca |= scb_o->eca & ECA_VX;
--
Thanks,
David / dhildenb
On 01/17/2018 12:33 PM, David Hildenbrand wrote:
>
>> #define ECB_GS 0x40
>> #define ECB_TE 0x10
>> #define ECB_SRSI 0x04
>> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
>> index 38535a57..20b9e9f 100644
>> --- a/arch/s390/include/uapi/asm/kvm.h
>> +++ b/arch/s390/include/uapi/asm/kvm.h
>> @@ -224,6 +224,7 @@ struct kvm_guest_debug_arch {
>> #define KVM_SYNC_RICCB (1UL << 7)
>> #define KVM_SYNC_FPRS (1UL << 8)
>> #define KVM_SYNC_GSCB (1UL << 9)
>> +#define KVM_SYNC_SEBC (1UL << 10)
>> /* length and alignment of the sdnx as a power of two */
>> #define SDNXC 8
>> #define SDNXL (1UL << SDNXC)
>> @@ -247,7 +248,8 @@ struct kvm_sync_regs {
>> };
>> __u8 reserved[512]; /* for future vector expansion */
>> __u32 fpc; /* valid on KVM_SYNC_VRS or KVM_SYNC_FPRS */
>> - __u8 padding1[52]; /* riccb needs to be 64byte aligned */
>> + __u8 sebc:1; /* spec blocking */
>
> do you want to define the unused bits as reserved? Nicer to read IMHO
I certainly want to have these bits for future use. So maybe a
__u8 reserved : 7;
after that makes a lot of sense. Also the sebc : 1; (spaces)
(FWIW, I will rename that to bpbc for other reasons).
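The resulting layout would then be something like (a sketch combining
both suggestions; bpbc is the rename mentioned above):

	__u8 bpbc : 1;		/* branch prediction blocking */
	__u8 reserved : 7;
	__u8 padding1[51];	/* riccb needs to be 64byte aligned */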
>
> (especially also using spaces "sebc : 1")
>
>> + __u8 padding1[51]; /* riccb needs to be 64byte aligned */
>> __u8 riccb[64]; /* runtime instrumentation controls block */
>> __u8 padding2[192]; /* sdnx needs to be 256byte aligned */
>> union {
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 2c93cbb..0c18f73 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -421,6 +421,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> case KVM_CAP_S390_GS:
>> r = test_facility(133);
>> break;
>> + case KVM_CAP_S390_SEB:
>> + r = test_facility(82);
>> + break;
>> default:
>> r = 0;
>> }
>> @@ -2198,6 +2201,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>> kvm_s390_set_prefix(vcpu, 0);
>> if (test_kvm_facility(vcpu->kvm, 64))
>> vcpu->run->kvm_valid_regs |= KVM_SYNC_RICCB;
>> + if (test_kvm_facility(vcpu->kvm, 82))
>> + vcpu->run->kvm_valid_regs |= KVM_SYNC_SEBC;
>> if (test_kvm_facility(vcpu->kvm, 133))
>> vcpu->run->kvm_valid_regs |= KVM_SYNC_GSCB;
>> /* fprs can be synchronized via vrs, even if the guest has no vx. With
>> @@ -2339,6 +2344,7 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
>> current->thread.fpu.fpc = 0;
>> vcpu->arch.sie_block->gbea = 1;
>> vcpu->arch.sie_block->pp = 0;
>> + vcpu->arch.sie_block->fpf &= ~FPF_SEBC;
>> vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
>> kvm_clear_async_pf_completion_queue(vcpu);
>> if (!kvm_s390_user_cpu_state_ctrl(vcpu->kvm))
>> @@ -3298,6 +3304,10 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>> vcpu->arch.sie_block->ecd |= ECD_HOSTREGMGMT;
>> vcpu->arch.gs_enabled = 1;
>> }
>> + if (kvm_run->kvm_dirty_regs & KVM_SYNC_SEBC) {
>
> We should test for test_facility(82). Otherwise user space can enable
> undefined bits in the SCB on machines with !facility 82.
Agreed, will fix.
>
>> + vcpu->arch.sie_block->fpf &= ~FPF_SEBC;
>> + vcpu->arch.sie_block->fpf |= kvm_run->s.regs.sebc ? FPF_SEBC : 0;
>> + }
>> save_access_regs(vcpu->arch.host_acrs);
>> restore_access_regs(vcpu->run->s.regs.acrs);
>> /* save host (userspace) fprs/vrs */
>> @@ -3344,6 +3354,7 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>> kvm_run->s.regs.pft = vcpu->arch.pfault_token;
>> kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
>> kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
>> + kvm_run->s.regs.sebc = (vcpu->arch.sie_block->fpf & FPF_SEBC) == FPF_SEBC;
>> save_access_regs(vcpu->run->s.regs.acrs);
>> restore_access_regs(vcpu->arch.host_acrs);
>> /* Save guest register state */
>> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
>> index 5d6ae03..10ea208 100644
>> --- a/arch/s390/kvm/vsie.c
>> +++ b/arch/s390/kvm/vsie.c
>> @@ -223,6 +223,10 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>> memcpy(scb_o->gcr, scb_s->gcr, 128);
>> scb_o->pp = scb_s->pp;
>>
>> + /* speculative blocking */
>
> This field should only be written back with test_kvm_facility(vcpu->kvm, 82)
Agreed.
>
> (no public documentation, this looks like the SIE can modify this field?
> Triggered by which instruction?)
The instruction from patch 3.
>
>> + scb_o->fpf &= ~FPF_SEBC;
>> + scb_o->fpf |= scb_s->fpf & FPF_SEBC;
>> +
>> /* interrupt intercept */
>> switch (scb_s->icptcode) {
>> case ICPT_PROGI:
>> @@ -265,6 +269,7 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>> scb_s->ecb3 = 0;
>> scb_s->ecd = 0;
>> scb_s->fac = 0;
>> + scb_s->fpf = 0;
>>
>> rc = prepare_cpuflags(vcpu, vsie_page);
>> if (rc)
>> @@ -324,6 +329,9 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>> prefix_unmapped(vsie_page);
>> scb_s->ecb |= scb_o->ecb & ECB_TE;
>> }
>> + /* speculative blocking */
>> + if (test_kvm_facility(vcpu->kvm, 82))
>> + scb_s->fpf |= scb_o->fpf & FPF_SEBC;
>> /* SIMD */
>> if (test_kvm_facility(vcpu->kvm, 129)) {
>> scb_s->eca |= scb_o->eca & ECA_VX;
>
Thanks for the quick review.
On 17/01/2018 12:14, Christian Borntraeger wrote:
>
>
> On 01/17/2018 11:03 AM, Florian Weimer wrote:
>> On 01/17/2018 10:48 AM, Martin Schwidefsky wrote:
>>> rc = syscall(__NR_s390_modify_bp);
>>> if (rc) {
>>> perror("s390_modify_bp");
>>> exit(EXIT_FAILURE);
>>> }
>>
>> Isn't this traditionally done through personality or prctl?
>
> I think we want this per thread (and not per process). So I assume personality
> will not work out. Can a prctl be done per thread?
Yes, prctls can be either per-process (e.g. PR_SET_CHILD_SUBREAPER or
PR_SET_DUMPABLE) or per-thread (e.g. PR_SET_NAME or PR_SET_SECCOMP).
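For instance, PR_SET_NAME affects only the calling thread:

	#include <sys/prctl.h>

	/* per-thread: renames just this thread, not the whole process */
	prctl(PR_SET_NAME, (unsigned long) "worker-0", 0, 0, 0);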
Paolo
On Wed, 17 Jan 2018 12:14:52 +0100
Christian Borntraeger <[email protected]> wrote:
> On 01/17/2018 11:03 AM, Florian Weimer wrote:
> > On 01/17/2018 10:48 AM, Martin Schwidefsky wrote:
> >> rc = syscall(__NR_s390_modify_bp);
> >> if (rc) {
> >> perror("s390_modify_bp");
> >> exit(EXIT_FAILURE);
> >> }
> >
> > Isn't this traditionally done through personality or prctl?
>
> I think we want this per thread (and not per process). So I assume personality
> will not work out. Can a prctl be done per thread?
The prctl interface seems to be usable to set a per-thread control
as well. But there is no architecture-specific prctl as far as I
can see. Maybe a common PR_SET_NOBP with an arch function like
arch_set_nobp, along the lines of the sketch below.
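A rough sketch of what that could look like (PR_SET_NOBP and
arch_set_nobp are hypothetical names from the suggestion above; no such
interface exists yet):

	/* in the common prctl switch, per-thread like PR_SET_NAME */
	case PR_SET_NOBP:
		error = arch_set_nobp(current, arg2);
		break;

	/* s390 implementation, mirroring sys_s390_modify_bp in this series */
	int arch_set_nobp(struct task_struct *task, unsigned long enable)
	{
		if (!test_facility(82))
			return -EOPNOTSUPP;
		if (enable)
			set_tsk_thread_flag(task, TIF_NOBP);
		else
			clear_tsk_thread_flag(task, TIF_NOBP);
		return 0;
	}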
> >
> > This looks like something other architectures may want as well.
Yes, that is likely.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
On Wed, 17 Jan 2018 10:48:33 +0100
Martin Schwidefsky <[email protected]> wrote:
> This patch series implements multiple mitigations for the speculative
> execution findings:
> 1. The definition of the gmb() barrier as currently used by the
> distributions, we may have to find a better name for it
> 2. The architecture code for the nospec interfaces, the macros for
> nospec_ptr and nospec_load just use the gmb() barrier
> 3. The enablement for firmware features to switch between different
> branch prediction modes. It comes with a config option
> CONFIG_KERNEL_NOBP, two new kernel parameters "nobp=[0|1]" and
> "nospec", and a new system call s390_modify_bp.
> With CONFIG_KERNEL_NOBP=y the new branch prediction mode is active
> for the kernel code by default and can be switched off with "nospec"
> or "nobp=0". With CONFIG_KERNEL_NOBP=n the new mode is inactive for
> kernel code unless "nobp=1" is specified.
> User space code can use the trapdoor system call s390_modify_bp to
> set the new TIF_NOBP bit. This switches to the new branch prediction
> mode for the lifetime of the task, any children of the task will
> inherit this attribute.
> The vCPU of a KVM guest will run with the new branch prediction
> mode if either the associated qemu task has TIF_NOBP set or if the
> KVM kernel code sets TIF_NOBP_GUEST. The later will require a small
> update to KVM backend.
How does this interact with the facility bits? Bit 81 seems to indicate
function code f (gmb), while bit 82 seems to indicate function codes
c/d (branch prediction modes). Both seem to be in the range of bits
transparently passed through for kvm (although this still needs a qemu
update to the cpu models so the bits are not masked out as unknown.)
What happens if a certain branch prediction mode is set prior to
execution of SIE and the guest kernel is issuing PPA c/d itself?
> 4. Transport channel reduction by clearing registers on interrupts,
> system calls and KVM guest exits.
>
> We are working on an equivalent for retpoline, stay tuned.
>
> @Greg: I have started with the backports for the stable kernel releases,
> but unless the interface for gmp/nospec_ptr/nospec_load is cast in stone
> does it make sense to send them?
>
> Christian Borntraeger (1):
> KVM: s390: wire up seb feature
>
> Martin Schwidefsky (5):
> s390/alternative: use a copy of the facility bit mask
> s390: implement nospec_[load|ptr]
> s390: add options to change branch prediction behaviour for the kernel
> s390: add system call to run tasks with modified branch prediction
> s390: scrub registers on kernel entry and KVM exit
>
> [...]
>
Something under Documentation/ to document the new knobs would be nice.
On 01/17/2018 01:00 PM, Cornelia Huck wrote:
> On Wed, 17 Jan 2018 10:48:33 +0100
> Martin Schwidefsky <[email protected]> wrote:
>
>> [...]
>
> How does this interact with the facility bits? Bit 81 seems to indicate
> function code f (gmb), while bit 82 seems to indicate function codes
> c/d (branch prediction modes). Both seem to be in the range of bits
> transparently passed through for kvm (although this still needs a qemu
> update to the cpu models so the bits are not masked out as unknown.)
Correct.
I will send a qemu patch soon.
On Wed, 17 Jan 2018, Martin Schwidefsky wrote:
> Implement nospec_load() and nospec_ptr() for s390 with the new
> gmb() barrier between the boundary condition and the load that
> may not be done speculatively.
FWIW the naming seems to be changing constantly. The latest patchset from
Dan Williams [1] uses ifence_...().
[1] lkml.kernel.org/r/151586744180.5820.13215059696964205856.stgit@dwillia2-desk3.amr.corp.intel.com
--
Jiri Kosina
SUSE Labs
On Wed, Jan 17, 2018 at 12:55:06PM +0100, Martin Schwidefsky wrote:
> On Wed, 17 Jan 2018 12:14:52 +0100
> Christian Borntraeger <[email protected]> wrote:
>
> > On 01/17/2018 11:03 AM, Florian Weimer wrote:
> > > On 01/17/2018 10:48 AM, Martin Schwidefsky wrote:
> > >>         rc = syscall(__NR_s390_modify_bp);
> > >>         if (rc) {
> > >>                 perror("s390_modify_bp");
> > >>                 exit(EXIT_FAILURE);
> > >>         }
> > >
> > > Isn't this traditionally done through personality or prctl?
> >
> > I think we want this per thread (and not per process). So I assume personality
> > will not work out. Can a prctl be done per thread?
>
> The prctl interface seems to be usable to set a per-thread control
> as well. But there is no architecture specific prctl as far as I
> can see. Maybe a common PR_SET_NOBP with an arch function like
> arch_set_nobp.
There is for example PR_MPX_ENABLE_MANAGEMENT, which is x86 specific. On
the other hand x86 even has an arch_prctl() system call... ;)
On Wed, Jan 17, 2018 at 10:48:33AM +0100, Martin Schwidefsky wrote:
> @Greg: I have started with the backports for the stable kernel releases,
> but unless the interface for gmp/nospec_ptr/nospec_load is cast in stone
> does it make sense to send them?
No, I can't take anything until it is in Linus's tree. And as Jiri
pointed out, this isn't the same api as Dan last posted, so any
backports would be really foolish right now...
thanks,
greg k-h
The new firmware interfaces for branch prediction behaviour changes
are transparently available for the guest. Nevertheless, there is
new state attached that should be migrated and properly reset.
Provide a mechanism for handling reset, migration and VSIE.
Signed-off-by: Christian Borntraeger <[email protected]>
---
v1->v2: - review feedback from David
- rename seb(c) into bpb(c)
arch/s390/include/asm/kvm_host.h | 3 ++-
arch/s390/include/uapi/asm/kvm.h | 5 ++++-
arch/s390/kvm/kvm-s390.c | 12 ++++++++++++
arch/s390/kvm/vsie.c | 10 ++++++++++
include/uapi/linux/kvm.h | 1 +
5 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index e14f381..c1b0a9a 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -207,7 +207,8 @@ struct kvm_s390_sie_block {
__u16 ipa; /* 0x0056 */
__u32 ipb; /* 0x0058 */
__u32 scaoh; /* 0x005c */
- __u8 reserved60; /* 0x0060 */
+#define FPF_BPBC 0x20
+ __u8 fpf; /* 0x0060 */
#define ECB_GS 0x40
#define ECB_TE 0x10
#define ECB_SRSI 0x04
diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
index 38535a57..4cdaa55 100644
--- a/arch/s390/include/uapi/asm/kvm.h
+++ b/arch/s390/include/uapi/asm/kvm.h
@@ -224,6 +224,7 @@ struct kvm_guest_debug_arch {
#define KVM_SYNC_RICCB (1UL << 7)
#define KVM_SYNC_FPRS (1UL << 8)
#define KVM_SYNC_GSCB (1UL << 9)
+#define KVM_SYNC_BPBC (1UL << 10)
/* length and alignment of the sdnx as a power of two */
#define SDNXC 8
#define SDNXL (1UL << SDNXC)
@@ -247,7 +248,9 @@ struct kvm_sync_regs {
};
__u8 reserved[512]; /* for future vector expansion */
__u32 fpc; /* valid on KVM_SYNC_VRS or KVM_SYNC_FPRS */
- __u8 padding1[52]; /* riccb needs to be 64byte aligned */
+ __u8 bpbc : 1; /* bp mode */
+ __u8 reserved2 : 7;
+ __u8 padding1[51]; /* riccb needs to be 64byte aligned */
__u8 riccb[64]; /* runtime instrumentation controls block */
__u8 padding2[192]; /* sdnx needs to be 256byte aligned */
union {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2c93cbb..2598cf243 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -421,6 +421,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_S390_GS:
r = test_facility(133);
break;
+ case KVM_CAP_S390_BPB:
+ r = test_facility(82);
+ break;
default:
r = 0;
}
@@ -2198,6 +2201,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
kvm_s390_set_prefix(vcpu, 0);
if (test_kvm_facility(vcpu->kvm, 64))
vcpu->run->kvm_valid_regs |= KVM_SYNC_RICCB;
+ if (test_kvm_facility(vcpu->kvm, 82))
+ vcpu->run->kvm_valid_regs |= KVM_SYNC_BPBC;
if (test_kvm_facility(vcpu->kvm, 133))
vcpu->run->kvm_valid_regs |= KVM_SYNC_GSCB;
/* fprs can be synchronized via vrs, even if the guest has no vx. With
@@ -2339,6 +2344,7 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
current->thread.fpu.fpc = 0;
vcpu->arch.sie_block->gbea = 1;
vcpu->arch.sie_block->pp = 0;
+ vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
kvm_clear_async_pf_completion_queue(vcpu);
if (!kvm_s390_user_cpu_state_ctrl(vcpu->kvm))
@@ -3298,6 +3304,11 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
vcpu->arch.sie_block->ecd |= ECD_HOSTREGMGMT;
vcpu->arch.gs_enabled = 1;
}
+ if ((kvm_run->kvm_dirty_regs & KVM_SYNC_BPBC) &&
+ test_kvm_facility(vcpu->kvm, 82)) {
+ vcpu->arch.sie_block->fpf &= ~FPF_BPBC;
+ vcpu->arch.sie_block->fpf |= kvm_run->s.regs.bpbc ? FPF_BPBC : 0;
+ }
save_access_regs(vcpu->arch.host_acrs);
restore_access_regs(vcpu->run->s.regs.acrs);
/* save host (userspace) fprs/vrs */
@@ -3344,6 +3355,7 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
kvm_run->s.regs.pft = vcpu->arch.pfault_token;
kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
+ kvm_run->s.regs.bpbc = (vcpu->arch.sie_block->fpf & FPF_BPBC) == FPF_BPBC;
save_access_regs(vcpu->run->s.regs.acrs);
restore_access_regs(vcpu->arch.host_acrs);
/* Save guest register state */
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 5d6ae03..7513483 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -223,6 +223,12 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
memcpy(scb_o->gcr, scb_s->gcr, 128);
scb_o->pp = scb_s->pp;
+ /* branch prediction */
+ if (test_kvm_facility(vcpu->kvm, 82)) {
+ scb_o->fpf &= ~FPF_BPBC;
+ scb_o->fpf |= scb_s->fpf & FPF_BPBC;
+ }
+
/* interrupt intercept */
switch (scb_s->icptcode) {
case ICPT_PROGI:
@@ -265,6 +271,7 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
scb_s->ecb3 = 0;
scb_s->ecd = 0;
scb_s->fac = 0;
+ scb_s->fpf = 0;
rc = prepare_cpuflags(vcpu, vsie_page);
if (rc)
@@ -324,6 +331,9 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
prefix_unmapped(vsie_page);
scb_s->ecb |= scb_o->ecb & ECB_TE;
}
+ /* branch prediction */
+ if (test_kvm_facility(vcpu->kvm, 82))
+ scb_s->fpf |= scb_o->fpf & FPF_BPBC;
/* SIMD */
if (test_kvm_facility(vcpu->kvm, 129)) {
scb_s->eca |= scb_o->eca & ECA_VX;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 496e59a..79f6050 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -932,6 +932,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_HYPERV_SYNIC2 148
#define KVM_CAP_HYPERV_VP_INDEX 149
#define KVM_CAP_S390_AIS_MIGRATION 150
+#define KVM_CAP_S390_BPB 151
#ifdef KVM_CAP_IRQ_ROUTING
--
2.9.4
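As a usage sketch (not part of this patch; assumes headers with the new
definitions and an already set up VM and vcpu), userspace would consume
the interface roughly like this:

#include <linux/kvm.h>
#include <sys/ioctl.h>

void restore_bpbc(int vm_fd, struct kvm_run *run, __u8 saved_bpbc)
{
	/* only touch the bit if the kernel supports it */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_S390_BPB) <= 0)
		return;

	run->s.regs.bpbc = saved_bpbc;		/* from the migration stream */
	run->kvm_dirty_regs |= KVM_SYNC_BPBC;	/* sync on the next KVM_RUN */
}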
On 17.01.2018 14:44, Christian Borntraeger wrote:
> The new firmware interfaces for branch prediction behaviour changes
> are transparently available for the guest. Nevertheless, there is
> new state attached that should be migrated and properly reset.
> Provide a mechanism for handling reset, migration and VSIE.
>
> [...]
>
Reviewed-by: David Hildenbrand <[email protected]>
--
Thanks,
David / dhildenb
On 17.01.2018 10:48, Martin Schwidefsky wrote:
> To be able to switch off specific CPU alternatives with kernel parameters
> make a copy of the facility bit mask provided by STFLE and use the copy
> for the decision to apply an alternative.
>
> Reviewed-by: Christian Borntraeger <[email protected]>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
Reviewed-by: David Hildenbrand <[email protected]>
--
Thanks,
David / dhildenb
On 17.01.2018 10:48, Martin Schwidefsky wrote:
> Implement nospec_load() and nospec_ptr() for s390 with the new
> gmb() barrier between the boundary condition and the load that
> may not be done speculatively.
>
> Acked-by: Christian Borntraeger <[email protected]>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
> arch/s390/include/asm/barrier.h | 38 ++++++++++++++++++++++++++++++++++++++
> arch/s390/kernel/alternative.c | 7 +++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
> index 1043260..b8836a6 100644
> --- a/arch/s390/include/asm/barrier.h
> +++ b/arch/s390/include/asm/barrier.h
> @@ -8,6 +8,8 @@
> #ifndef __ASM_BARRIER_H
> #define __ASM_BARRIER_H
>
> +#include <asm/alternative.h>
> +
> /*
> * Force strict CPU ordering.
> * And yes, this is required on UP too when we're talking
> @@ -23,6 +25,42 @@
>
> #define mb() do { asm volatile(__ASM_BARRIER : : : "memory"); } while (0)
>
> +static inline void gmb(void)
> +{
> + asm volatile(
> + ALTERNATIVE("", ".long 0xb2e8f000", 81)
> + : : : "memory");
> +}
Just to be sure:
There are now 2 new facilities:
81 and 82.
Is 82 just the virtualization (SIE) support for 81?
--
Thanks,
David / dhildenb
On 01/17/2018 02:58 PM, David Hildenbrand wrote:
> On 17.01.2018 10:48, Martin Schwidefsky wrote:
>> Implement nospec_load() and nospec_ptr() for s390 with the new
>> gmb() barrier between the boundary condition and the load that
>> may not be done speculatively.
>>
>> Acked-by: Christian Borntraeger <[email protected]>
>> Signed-off-by: Martin Schwidefsky <[email protected]>
>>
>> [...]
>>
>> +static inline void gmb(void)
>> +{
>> + asm volatile(
>> + ALTERNATIVE("", ".long 0xb2e8f000", 81)
>> + : : : "memory");
>> +}
>
> Just to be sure:
>
> There are now 2 new facilities:
>
> 81 and 82.
>
> Is 82 just the virtualization (SIE) support for 81?
81 is for ppa15 (see this patch) and 82 is for ppa12 and 13 (see patch 3).
In KVM we want to provide both (and let the guest decide what to do).
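To illustrate the intended usage pattern of gmb() (a sketch only, not
the actual nospec_load() macro from patch 2):

/* the barrier sits between the bounds check and the dependent load */
static inline unsigned long load_nospec(unsigned long *array,
					unsigned long index,
					unsigned long size)
{
	unsigned long val = 0;

	if (index < size) {
		gmb();		/* no speculation past the check */
		val = array[index];
	}
	return val;
}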
On Wed, 17 Jan 2018 10:48:34 +0100
Martin Schwidefsky <[email protected]> wrote:
> To be able to switch off specific CPU alternatives with kernel parameters
> make a copy of the facility bit mask provided by STFLE and use the copy
> for the decision to apply an alternative.
>
> Reviewed-by: Christian Borntraeger <[email protected]>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
> arch/s390/include/asm/facility.h | 18 ++++++++++++++++++
> arch/s390/include/asm/lowcore.h | 3 ++-
> arch/s390/kernel/alternative.c | 3 ++-
> arch/s390/kernel/early.c | 3 +++
> arch/s390/kernel/setup.c | 4 +++-
> arch/s390/kernel/smp.c | 4 +++-
> 6 files changed, 31 insertions(+), 4 deletions(-)
Reviewed-by: Cornelia Huck <[email protected]>
On 01/17/2018 07:41 AM, Jiri Kosina wrote:
> On Wed, 17 Jan 2018, Martin Schwidefsky wrote:
>
>> Implement nospec_load() and nospec_ptr() for s390 with the new
>> gmb() barrier between the boundary condition and the load that
>> may not be done speculatively.
Thanks for the patches, Martin et al. I tested various earlier versions
and will run these latest ones through some tests and add a Tested-by.
> FWIW the naming seems to be changing constantly. The latest patchset from
> Dan Williams [1] uses ifence_...().
>
> [1] lkml.kernel.org/r/151586744180.5820.13215059696964205856.stgit@dwillia2-desk3.amr.corp.intel.com
This is getting a little silly. Not to bikeshed this to death, but
obviously gmb (what was that ever supposed to stand for, global?) was
the wrong name. We favored seb (speculative execution barrier), etc.
Still, "ifence"? What is that supposed to mean? That sounds very
architecture specific vs. what we're actually trying to ensure, which is
that we don't speculatively load a pointer.
Jon.
--
Computer Architect | Sent from my Fedora powered laptop
On 01/17/2018 02:51 PM, David Hildenbrand wrote:
> On 17.01.2018 14:44, Christian Borntraeger wrote:
>> The new firmware interfaces for branch prediction behaviour changes
>> are transparently available for the guest. Nevertheless, there is
>> new state attached that should be migrated and properly reset.
>> Provide a mechanism for handling reset, migration and VSIE.
>>
>> [...]
>
> Reviewed-by: David Hildenbrand <[email protected]>
Thanks.
Conny can you review and ack as well?
Paolo, Radim,
As the other patches need to sync on the ifetch/nospec/gmb naming I have changed my mind. :-)
This patch is independent of the other patches (as it just provides the guest facilities without caring
about what the host does).
It seems that you do a KVM pull request for 4.15 anyway (for POWER), so it might make sense to
apply this patch as well for 4.15. This will make it easier to also upstream the QEMU part in time,
as we need the uapi interfaces.
Christian
On Wed, 17 Jan 2018 22:43:24 +0100
Christian Borntraeger <[email protected]> wrote:
> Conny can you review and ack as well?
>
> Paolo, Radim,
>
> As the other patches need to sync on the ifetch/nospec/gmb naming I have changed my mind. :-)
> This patch is independent of the other patches (as it just provides the guest facilities without caring
> about what the host does).
>
> It seems that you do a KVM pull request for 4.15 anyway (for POWER), so it might make sense to
> apply this patch as well for 4.15. This will make it easier to also upstream the QEMU part in time,
> as we need the uapi interfaces.
Indeed, there is no real dependency. I am thinking about doing a split of my patches as well,
everything but the gmb/nospec_xxx/ifetch/isync/whatever part. The prctl part needs some more
discussion; I will prepare a patch. Or two.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
On Wed, 17 Jan 2018 10:48:36 +0100
Martin Schwidefsky <[email protected]> wrote:
> Add the PPA instruction to the system entry and exit path to switch
> the kernel to a different branch prediction behaviour. The instructions
> are added via CPU alternatives and can be disabled with the "nospec"
> or the "nobp=0" kernel parameter. If the default behaviour selected
> with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
> used to enable the changed kernel branch prediction.
>
> Acked-by: Christian Borntraeger <[email protected]>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
> arch/s390/Kconfig | 17 +++++++++++++
> arch/s390/include/asm/processor.h | 1 +
> arch/s390/kernel/alternative.c | 23 ++++++++++++++++++
> arch/s390/kernel/early.c | 2 ++
> arch/s390/kernel/entry.S | 50 ++++++++++++++++++++++++++++++++++++++-
> arch/s390/kernel/ipl.c | 1 +
> arch/s390/kernel/smp.c | 2 ++
> 7 files changed, 95 insertions(+), 1 deletion(-)
Acked-by: Cornelia Huck <[email protected]>
[This looks sane; but as I don't have insight into the details of the
new instructions, I don't feel confident enough to give an R-b]
On Wed, 17 Jan 2018 14:44:34 +0100
Christian Borntraeger <[email protected]> wrote:
> The new firmware interfaces for branch prediction behaviour changes
> are transparently available for the guest. Nevertheless, there is
> new state attached that should be migrated and properly reset.
> Provide a mechanism for handling reset, migration and VSIE.
>
> Signed-off-by: Christian Borntraeger <[email protected]>
> ---
> v1->v2: - review feedback from David
> - rename seb(c) into bpb(c)
> arch/s390/include/asm/kvm_host.h | 3 ++-
> arch/s390/include/uapi/asm/kvm.h | 5 ++++-
> arch/s390/kvm/kvm-s390.c | 12 ++++++++++++
> arch/s390/kvm/vsie.c | 10 ++++++++++
> include/uapi/linux/kvm.h | 1 +
> 5 files changed, 29 insertions(+), 2 deletions(-)
Reviewed-by: Cornelia Huck <[email protected]>
But I'd like to see some documentation (a line) for the new cap under
Documentation/. However, this can be done as a separate update (we
might be missing more).
On 01/18/2018 10:59 AM, Cornelia Huck wrote:
> On Wed, 17 Jan 2018 14:44:34 +0100
> Christian Borntraeger <[email protected]> wrote:
>
>> The new firmware interfaces for branch prediction behaviour changes
>> are transparently available for the guest. Nevertheless, there is
>> new state attached that should be migrated and properly reset.
>> Provide a mechanism for handling reset, migration and VSIE.
>>
>> Signed-off-by: Christian Borntraeger <[email protected]>
>> ---
>> v1->v2: - review feedback from David
>> - rename seb(c) into bpb(c)
>> arch/s390/include/asm/kvm_host.h | 3 ++-
>> arch/s390/include/uapi/asm/kvm.h | 5 ++++-
>> arch/s390/kvm/kvm-s390.c | 12 ++++++++++++
>> arch/s390/kvm/vsie.c | 10 ++++++++++
>> include/uapi/linux/kvm.h | 1 +
>> 5 files changed, 29 insertions(+), 2 deletions(-)
>
> Reviewed-by: Cornelia Huck <[email protected]>
Thanks
>
> But I'd like to see some documentation (a line) for the new cap under
> Documentation/. However, this can be done as a separate update (we
> might be missing more).
Ok, I will work on a doc update patch.
Paolo, Radim, can you take this patch for 4.15?
On 2018/1/17 17:48, Martin Schwidefsky wrote:
> Add the PPA instruction to the system entry and exit path to switch
> the kernel to a different branch prediction behaviour. The instructions
> are added via CPU alternatives and can be disabled with the "nospec"
> or the "nobp=0" kernel parameter. If the default behaviour selected
> with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
> used to enable the changed kernel branch prediction.
>
> Acked-by: Christian Borntraeger <[email protected]>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
> arch/s390/Kconfig | 17 +++++++++++++
> arch/s390/include/asm/processor.h | 1 +
> arch/s390/kernel/alternative.c | 23 ++++++++++++++++++
> arch/s390/kernel/early.c | 2 ++
> arch/s390/kernel/entry.S | 50 ++++++++++++++++++++++++++++++++++++++-
> arch/s390/kernel/ipl.c | 1 +
> arch/s390/kernel/smp.c | 2 ++
> 7 files changed, 95 insertions(+), 1 deletion(-)
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 829c679..a818644 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -541,6 +541,23 @@ config ARCH_RANDOM
>
> If unsure, say Y.
>
> +config KERNEL_NOBP
Just a question: can we add a control in procfs/debugfs to
enable/disable the facility for the whole system by default? Each
process could still override the default setting. This would give
the operator more flexibility to choose and to debug without
rebooting the system (a rough sketch of such a knob follows after
this mail), e.g.
echo 0 > /sys/kernel/debug/s390x/ibpb_enabled
Ref: https://access.redhat.com/articles/3311301
> + def_bool n
> + prompt "Enable modified branch prediction for the kernel by default"
> + help
> + If this option is selected the kernel will switch to a modified
> + branch prediction mode if the firmware interface is available.
> + The modified branch prediction mode improves the behaviour in
> + regard to speculative execution.
> +
> + With the option enabled the kernel parameter "nobp=0" or "nospec"
> + can be used to run the kernel in the normal branch prediction mode.
> +
> + With the option disabled the modified branch prediction mode is
> + enabled with the "nobp=1" kernel parameter.
> +
> + If unsure, say N.
> +
> endmenu
[...]
--
Regards
QingFeng Hao
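A rough sketch of such a knob (the path and all names are invented for
illustration; no such interface exists in this series):

#include <linux/debugfs.h>
#include <linux/init.h>

static bool nobp_enabled = true;	/* assumed system-wide default */

static int __init nobp_debugfs_init(void)
{
	struct dentry *dir = debugfs_create_dir("s390x", NULL);

	/* shows up as /sys/kernel/debug/s390x/nobp_enabled */
	debugfs_create_bool("nobp_enabled", 0600, dir, &nobp_enabled);
	return 0;
}
device_initcall(nobp_debugfs_init);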
On 2018/1/17 17:48, Martin Schwidefsky wrote:
> Clear all user space registers on entry to the kernel and all KVM guest
> registers on KVM guest exit if the register does not contain either a
> parameter or a result value.
I am not sure I understand this, but does it make things safer?
And could we abstract the operations into a macro like CLEAR_REG_7?
Thanks
>
> Suggested-by: Christian Borntraeger <[email protected]>
> Reviewed-by: Christian Borntraeger <[email protected]>
> Signed-off-by: Martin Schwidefsky <[email protected]>
> ---
> arch/s390/kernel/entry.S | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
> index 2a22c03..47227d3 100644
> --- a/arch/s390/kernel/entry.S
> +++ b/arch/s390/kernel/entry.S
> @@ -322,6 +322,12 @@ ENTRY(sie64a)
> sie_exit:
> lg %r14,__SF_EMPTY+8(%r15) # load guest register save area
> stmg %r0,%r13,0(%r14) # save guest gprs 0-13
> + xgr %r0,%r0 # clear guest registers
> + xgr %r1,%r1
> + xgr %r2,%r2
> + xgr %r3,%r3
> + xgr %r4,%r4
> + xgr %r5,%r5
> lmg %r6,%r14,__SF_GPRS(%r15) # restore kernel registers
> lg %r2,__SF_EMPTY+16(%r15) # return exit reason code
> br %r14
> @@ -358,6 +364,7 @@ ENTRY(system_call)
> UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
> BPENTER __TI_flags(%r12),_TIF_NOBP
> stmg %r0,%r7,__PT_R0(%r11)
> + xgr %r0,%r0
> mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
> mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
> mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC
> @@ -640,6 +647,14 @@ ENTRY(pgm_check_handler)
> 4: lgr %r13,%r11
> la %r11,STACK_FRAME_OVERHEAD(%r15)
> stmg %r0,%r7,__PT_R0(%r11)
> + xgr %r0,%r0 # clear user space registers
> + xgr %r1,%r1
> + xgr %r2,%r2
> + xgr %r3,%r3
> + xgr %r4,%r4
> + xgr %r5,%r5
> + xgr %r6,%r6
> + xgr %r7,%r7
> mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
> stmg %r8,%r9,__PT_PSW(%r11)
> mvc __PT_INT_CODE(4,%r11),__LC_PGM_ILC
> @@ -706,6 +721,15 @@ ENTRY(io_int_handler)
> lmg %r8,%r9,__LC_IO_OLD_PSW
> SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
> stmg %r0,%r7,__PT_R0(%r11)
> + xgr %r0,%r0 # clear user space registers
> + xgr %r1,%r1
> + xgr %r2,%r2
> + xgr %r3,%r3
> + xgr %r4,%r4
> + xgr %r5,%r5
> + xgr %r6,%r6
> + xgr %r7,%r7
> + xgr %r10,%r10
> mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
> stmg %r8,%r9,__PT_PSW(%r11)
> mvc __PT_INT_CODE(12,%r11),__LC_SUBCHANNEL_ID
> @@ -924,6 +948,15 @@ ENTRY(ext_int_handler)
> lmg %r8,%r9,__LC_EXT_OLD_PSW
> SWITCH_ASYNC __LC_SAVE_AREA_ASYNC,__LC_ASYNC_ENTER_TIMER
> stmg %r0,%r7,__PT_R0(%r11)
> + xgr %r0,%r0 # clear user space registers
> + xgr %r1,%r1
> + xgr %r2,%r2
> + xgr %r3,%r3
> + xgr %r4,%r4
> + xgr %r5,%r5
> + xgr %r6,%r6
> + xgr %r7,%r7
> + xgr %r10,%r10
> mvc __PT_R8(64,%r11),__LC_SAVE_AREA_ASYNC
> stmg %r8,%r9,__PT_PSW(%r11)
> lghi %r1,__LC_EXT_PARAMS2
> @@ -1133,6 +1166,14 @@ ENTRY(mcck_int_handler)
> .Lmcck_skip:
> lghi %r14,__LC_GPREGS_SAVE_AREA+64
> stmg %r0,%r7,__PT_R0(%r11)
> + xgr %r0,%r0 # clear user space registers
> + xgr %r2,%r2
> + xgr %r3,%r3
> + xgr %r4,%r4
> + xgr %r5,%r5
> + xgr %r6,%r6
> + xgr %r7,%r7
> + xgr %r10,%r10
> mvc __PT_R8(64,%r11),0(%r14)
> stmg %r8,%r9,__PT_PSW(%r11)
> xc __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
--
Regards
QingFeng Hao
On 01/19/2018 07:29 AM, QingFeng Hao wrote:
>
>
> On 2018/1/17 17:48, Martin Schwidefsky wrote:
>> Clear all user space registers on entry to the kernel and all KVM guest
>> registers on KVM guest exit if the register does not contain either a
>> parameter or a result value.
> I am not sure I understand this, but does it make things safer?
It is similar to commit 0cb5b30698fd ("kvm: vmx: Scrub hardware GPRs at VM-exit").
The idea is to minimize potential payload channels.
> And could we abstract the operations into a macro like CLEAR_REG_7?
No, please.
xgr %r7,%r7
is absolutely clear what it does, a MACRO often is not.
On 2018/1/19 15:57, Christian Borntraeger wrote:
>
> On 01/19/2018 07:29 AM, QingFeng Hao wrote:
>>
>> On 2018/1/17 17:48, Martin Schwidefsky wrote:
>>> Clear all user space registers on entry to the kernel and all KVM guest
>>> registers on KVM guest exit if the register does not contain either a
>>> parameter or a result value.
>> I am not sure I understand this, but does it make things safer?
> It is similar to commit 0cb5b30698fd ("kvm: vmx: Scrub hardware GPRs at VM-exit").
> The idea is to minimize potential payload channels.
Got it! Thanks for your explanation!
>
>> And could we abstract the operations into a macro like CLEAR_REG_7?
> No, please.
> xgr %r7,%r7
> is absolutely clear what it does, a MACRO often is not.
nod, this makes sense!
--
Regards
QingFeng Hao