From: Andy Lutomirski
Date: Thu, 13 Aug 2015 14:47:34 -0700
Subject: Re: [Regression v4.2 ?] 32-bit seccomp-BPF returned errno values wrong in VM?
To: Denys Vlasenko
Cc: Kees Cook, David Drysdale, "linux-kernel@vger.kernel.org", Will Drewry, Ingo Molnar, Alok Kataria, Linus Torvalds, Borislav Petkov, Alexei Starovoitov, Frederic Weisbecker, "H. Peter Anvin", Oleg Nesterov, Steven Rostedt, X86 ML
In-Reply-To: <55CD0DAC.9080809@redhat.com>
References: <55CCB510.3060807@redhat.com> <55CD0DAC.9080809@redhat.com>

On Thu, Aug 13, 2015 at 2:35 PM, Denys Vlasenko wrote:
> On 08/13/2015 08:47 PM, Kees Cook wrote:
>> On Thu, Aug 13, 2015 at 10:39 AM, David Drysdale wrote:
>>> On Thu, Aug 13, 2015 at 6:15 PM, Andy Lutomirski wrote:
>>>> On Thu, Aug 13, 2015 at 9:28 AM, David Drysdale wrote:
>>>>> On Thu, Aug 13, 2015 at 4:17 PM, Denys Vlasenko wrote:
>>>>>> On 08/13/2015 10:30 AM, David Drysdale wrote:
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I've got an odd regression with the v4.2 rc kernel, and I wondered if anyone
>>>>>>> else could reproduce it.
>>>>>>>
>>>>>>> The problem occurs with a seccomp-bpf filter program that's set up to return
>>>>>>> an errno value -- an errno of 1 is always returned instead of what's in the
>>>>>>> filter, plus other oddities (selftest output below).
>>>>>>>
>>>>>>> The problem seems to need a combination of circumstances to occur:
>>>>>>>
>>>>>>> - The seccomp-bpf userspace program needs to be 32-bit, running against a
>>>>>>>   64-bit kernel -- I'm testing with seccomp_bpf from
>>>>>>>   tools/testing/selftests/seccomp/, built via 'CFLAGS=-m32 make'.
>>>>>>
>>>>>> Does it work correctly when built as 64-bit program?
>>>>>
>>>>> Yep, 64-bit works fine (both at v4.2-rc6 and at commit 3f5159).
>>>>>
>>>>>>>
>>>>>>> - The kernel needs to be running as a VM guest -- it occurs inside my
>>>>>>>   VMware Fusion host, but not if I run on bare metal. Kees tells me he
>>>>>>>   cannot repro with a kvm guest though.
>>>>>>>
>>>>>>> Bisecting indicates that the commit that induces the problem is
>>>>>>> 3f5159a9221f19b0, "x86/asm/entry/32: Update -ENOSYS handling to match the
>>>>>>> 64-bit logic", included in all the v4.2-rc* candidates.
>>>>>>>
>>>>>>> Apologies if I've just got something odd with my local setup, but the
>>>>>>> bisection was unequivocal enough that I thought it worth reporting...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>> seccomp_bpf failure outputs:
>>>>>
>>>>> [snip]
>>>>>
>>>>>> End result should be:
>>>>>> pt_regs->ax = -E2BIG (via syscall_set_return_value())
>>>>>> pt_regs->orig_ax = -1 ("skip syscall")
>>>>>> and syscall_trace_enter_phase1() usually returns with 0,
>>>>>> meaning "re-execute syscall at once, no phase2 needed".
>>>>>>
>>>>>> This, in turn, is called from .S files, and when it returns there,
>>>>>> execution loops back to syscall dispatch.
>>>>>>
>>>>>> Because of orig_ax = -1, syscall dispatch should skip calling syscall.
>>>>>> So -E2BIG should survive and be returned...
>>>>>
>>>>> So I was just about to send:
>>>>>
>>>>> That makes sense, and given that exactly the same 32-bit binary
>>>>> runs fine on a different machine, there's presumably something up
>>>>> with my local setup. The failing machine is a VMware guest, but
>>>>> maybe that's not the relevant interaction -- particularly if no-one
>>>>> else can repro.
>>>>>
>>>>> But then I noticed some odd audit entries in the main log:
>>>>>
>>>>> Aug 13 16:52:56 ubuntu kernel: [ 20.687249] audit: type=1326
>>>>> audit(1439481176.034:62): auid=4294967295 uid=1000 gid=1000
>>>>> ses=4294967295 pid=2621 comm="secccomp_bpf.ke"
>>>>> exe="/home/dmd/secccomp_bpf.kees.m32" sig=9 arch=40000003 syscall=172
>>>>> compat=1 ip=0xf773cc90 code=0x0
>>>>> Aug 13 16:52:56 ubuntu kernel: [ 20.691157] audit: type=1326
>>>>> audit(1439481176.038:63): auid=4294967295 uid=1000 gid=1000
>>>>> ses=4294967295 pid=2631 comm="secccomp_bpf.ke"
>>>>> exe="/home/dmd/secccomp_bpf.kees.m32" sig=31 arch=40000003 syscall=20
>>>>> compat=1 ip=0xf773cc90 code=0x10000000
>>>>> ...
>>>>>
>>>>> I didn't think I had any audit stuff turned on, and indeed:
>>>>> # auditctl -l
>>>>> No rules
>>>>>
>>>>> But as soon as I'd run that auditctl command, the 32-bit
>>>>> seccomp_bpf binary started running fine!
>>>>>
>>>>> So now I'm confused, and I can no longer reproduce the
>>>>> problem. Which probably means this was a false alarm, in
>>>>> which case, my apologies.
>>>>
>>>> You might have triggered TIF_AUDIT or whatever it's called, which
>>>> causes a whole different path through the asm tangle, so you might
>>>> really have a problem.
>>>>
>>>> Try auditctl -a task,never. If that doesn't change anything, try
>>>> rebooting the guest.
>>>
>>> Aha, that seems to re-instate the problem -- with that auditctl setup
>>> I get the 32-bit seccomp failures on two different machines (one VM,
>>> one bare). So can anyone else repro?
>>>
>>> I guess the relevant steps are thus:
>>> - sudo auditctl -a task,never
>>> - cd tools/testing/selftests/seccomp
>>> - CFLAGS=-m32 make clean run_tests
>>
>> That was it! I can reproduce this now on kvm (after adding the auditctl rule).
>
> I suspect this change:
>
> .macro auditsys_entry_common
> ...
>         movl %ebx,%esi                  /* 2nd arg: 1st syscall arg */
>         movl %eax,%edi                  /* 1st arg: syscall number */
>         call __audit_syscall_entry
> -       movl RAX(%rsp),%eax             /* reload syscall number */
> -       cmpq $(IA32_NR_syscalls-1),%rax
> -       ja ia32_badsys
> +       movl ORIG_RAX(%rsp),%eax        /* reload syscall number */
>         movl %ebx,%edi                  /* reload 1st syscall arg */
>         movl RCX(%rsp),%esi             /* reload 2nd syscall arg */
>         movl RDX(%rsp),%edx             /* reload 3rd syscall arg */
>
> We were reloading syscall# from pt_regs->ax.

I am so glad that this code is gone in -tip. Good riddance!

>
> After the patch, pt_regs->ax isn't equal to syscall# on entry,
> instead it contains -ENOSYS. Therefore the change shown above
> was made, to reload it from pt_regs->orig_ax.
>
> Well. This still should work... in fact it is "more correct"
> than it was before...
>
> 64-bit code has no call to __audit_syscall_entry, it uses
> syscall_trace_enter_phase1/phase2 mechanism instead of
> "only audit" shortcut. If the bug is here (though I don't see it),
> it explains why 64-bit binary works.
>
>
> Now, how do we reach this bit of code?
>
> ia32_sysenter_target:
>         ...
>         testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
>         jnz sysenter_tracesys
>         ...
> sysenter_tracesys:
>         testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
>         jz sysenter_auditsys
>         ...
> sysenter_auditsys:
>         auditsys_entry_common           <== OUR MACRO
>         movl %ebp,%r9d                  /* reload 6th syscall arg */
>         jmp sysenter_dispatch
>
>
> ia32_cstar_target:
>         ...
>         testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
>         jnz cstar_tracesys
>         ...
> cstar_tracesys:
>         testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
>         jz cstar_auditsys
>         ...
> cstar_auditsys:
>         movl %r9d,R9(%rsp)              /* register to be clobbered by call */
>         auditsys_entry_common           <== OUR MACRO
>         movl R9(%rsp),%r9d              /* reload 6th syscall arg */
>         jmp cstar_dispatch
>

TIF_SECCOMP had better be set, so that code should be unreachable.

syscall_trace_enter_phase1 returns 0 if we hit SECCOMP_RET_ERRNO (i.e.
SECCOMP_PHASE1_SKIP). syscall_trace_enter sees that and returns
regs->orig_ax, which is -1.

It seems to me that the bug is that sysexit_from_sys_call isn't
reloading RAX from regs->ax.

--Andy
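
The register flow discussed in this thread can be sketched as a small userspace model. This is only an illustration of the hypothesis stated above, not kernel code: struct toy_regs, the toy_* functions and the reload_ax switch are all hypothetical names invented for this sketch, and the initial values (-ENOSYS in ax, syscall number 172) are taken from the discussion purely as sample inputs.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for the two pt_regs slots discussed above (not the kernel struct). */
struct toy_regs {
    long ax;      /* result slot: what the syscall should return */
    long orig_ax; /* syscall number; -1 means "skip the syscall" */
};

/* Models a SECCOMP_RET_ERRNO verdict: record -err and mark the syscall skipped. */
static void toy_seccomp_errno(struct toy_regs *regs, int err)
{
    assert(err > 0);
    regs->ax = -(long)err;
    regs->orig_ax = -1;
}

/* Models syscall_trace_enter as described in the thread: it hands back orig_ax. */
static long toy_trace_enter(const struct toy_regs *regs)
{
    return regs->orig_ax;
}

/*
 * Models the exit path. With reload_ax set, the result is taken from regs->ax.
 * Without it, the value left behind by toy_trace_enter leaks out instead
 * (the suspected behaviour, assumed here for illustration).
 */
static long toy_exit_value(const struct toy_regs *regs, int reload_ax)
{
    long rax = toy_trace_enter(regs);

    if (reload_ax)
        rax = regs->ax;
    return rax;
}

/* A filter asks for E2BIG; returns the raw value userspace would receive. */
static long toy_filtered_syscall(int reload_ax)
{
    struct toy_regs regs = { .ax = -ENOSYS, .orig_ax = 172 };

    toy_seccomp_errno(&regs, E2BIG);
    return toy_exit_value(&regs, reload_ax);
}
```

Under these assumptions, toy_filtered_syscall(1) yields -E2BIG, while toy_filtered_syscall(0) yields -1. Since -1 is -EPERM, userspace would see errno 1, which would be consistent with the "errno of 1 is always returned" symptom in the original report.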