Date: Thu, 23 Jul 2015 16:36:37 -0700
From: Kees Cook
To: Willy Tarreau
Cc: Andy Lutomirski, Peter Zijlstra, Steven Rostedt,
    "security@kernel.org", X86 ML, Borislav Petkov, Sasha Levin, LKML,
    Konrad Rzeszutek Wilk, Boris Ostrovsky, Andrew Cooper, Jan Beulich,
    xen-devel
Subject: Re: [PATCH v3 2/3] x86/ldt: Make modify_ldt optional
In-Reply-To: <20150723102434.GA2929@1wt.eu>
References: <7bfde005b84a90a83bf668a320c7d4ad1b940065.1437592883.git.luto@kernel.org>
    <20150723102434.GA2929@1wt.eu>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 23, 2015 at 3:24 AM, Willy Tarreau wrote:
> Hi Andy,
>
> On Wed, Jul 22, 2015 at 12:23:47PM -0700, Andy Lutomirski wrote:
>> The modify_ldt syscall exposes a large attack surface and is
>> unnecessary for modern userspace.  Make it optional.
>
> Wouldn't you prefer something like this which makes it possible to re-enable
> it at runtime so that we can hope distros ship with it disabled by default ?
>
> It's pretty efficient on your ldtgdt testcase :
>
> # echo 1 > /proc/sys/kernel/modify_ldt
> # ./a.out
> [OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000000A
> [OK]    LDT entry 0 has AR 0x00C0FA00 and limit 0x0000AFFF
> [OK]    LDT entry 1 is invalid
> [OK]    LDT entry 2 has AR 0x00C0FA00 and limit 0x0000AFFF
> [OK]    LDT entry 1 is invalid
> [OK]    LDT entry 2 has AR 0x00C0FA00 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D0FA00 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D07A00 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00907A00 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D07200 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D07000 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D07400 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00507600 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507E00 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507C00 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507A00 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507800 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507800 and limit 0x0000000A
> [RUN]   Test fork
> [OK]    LDT entry 2 has AR 0x00507800 and limit 0x0000000A
> [OK]    LDT entry 1 is invalid
> [OK]    LDT entry 0 has AR 0x0040FA00 and limit 0x0000000A
> [OK]    LDT entry 0 has AR 0x00C0FA00 and limit 0x0000AFFF
> [OK]    LDT entry 1 is invalid
> [OK]    LDT entry 2 has AR 0x00C0FA00 and limit 0x0000AFFF
> [OK]    LDT entry 1 is invalid
> [OK]    LDT entry 2 has AR 0x00C0FA00 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D0FA00 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D07A00 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00907A00 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D07200 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D07000 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00D07400 and limit 0x0000AFFF
> [OK]    LDT entry 2 has AR 0x00507600 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507E00 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507C00 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507A00 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507800 and limit 0x0000000A
> [OK]    LDT entry 2 has AR 0x00507800 and limit 0x0000000A
> [RUN]   Test fork
> [OK]    Child succeeded
> [OK]    modify_ldt failure 22
> [OK]    LDT entry 0 has AR 0x0000F200 and limit 0x00000000
> [OK]    LDT entry 0 has AR 0x00007200 and limit 0x00000000
> [OK]    LDT entry 0 has AR 0x0000F000 and limit 0x00000000
> [OK]    LDT entry 0 has AR 0x00007200 and limit 0x00000000
> [OK]    LDT entry 0 has AR 0x00007000 and limit 0x00000001
> [OK]    LDT entry 0 has AR 0x00007000 and limit 0x00000000
> [OK]    LDT entry 0 is invalid
> [OK]    LDT entry 0 has AR 0x0040F200 and limit 0x00000000
> [OK]    LDT entry 0 is invalid
> [SKIP]  Cannot set affinity to CPU 1
>
>
> # echo 0 > /proc/sys/kernel/modify_ldt
> # ./a.out
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    LDT entry 1 is invalid
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    LDT entry 1 is invalid
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [SKIP]  Skipping fork test because have no LDT
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [OK]    modify_ldt is returned -ENOSYS
> [SKIP]  Cannot set affinity to CPU 1
>
> The patch is quite small (I stole your comment for the config option).
>
> Willy
>
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 226d569..b926f65 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1012,6 +1012,23 @@ config X86_16BIT
>           this option saves about 300 bytes on i386, or around 6K text
>           plus 16K runtime memory on x86-64,
>
> +config DEFAULT_MODIFY_LDT_SYSCALL
> +       bool "Allow userspace to modify the LDT (local descriptor table)"
> +       default y
> +       ---help---
> +         Linux can allow user programs to install a per-process x86
> +         Local Descriptor Table (LDT) using the modify_ldt(2) system
> +         call.  This is required to run 16-bit or segmented code such as
> +         DOSEMU or some Wine programs.  It is also used by some very old
> +         threading libraries.
> +
> +         Enabling this feature increases the low-level kernel attack
> +         surface.  Disabling it disables the modify_ldt(2) system call by
> +         default.  Note that even when disabled it remains possible to
> +         enable it at runtime by setting the sys.kernel.modify_ldt sysctl.
> +
> +         Say 'N' here if you don't expect to use DOSEMU or Wine often.
> +
>  config X86_ESPFIX32
>         def_bool y
>         depends on X86_16BIT && X86_32
> diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
> index c37886d..2f10b6c 100644
> --- a/arch/x86/kernel/ldt.c
> +++ b/arch/x86/kernel/ldt.c
> @@ -20,6 +20,12 @@
>  #include
>  #include
>
> +#ifdef CONFIG_DEFAULT_MODIFY_LDT_SYSCALL
> +int sysctl_modify_ldt __read_mostly = 1;
> +#else
> +int sysctl_modify_ldt __read_mostly = 0;
> +#endif
> +
>  #ifdef CONFIG_SMP
>  static void flush_ldt(void *current_mm)
>  {
> @@ -254,6 +260,9 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr,
>  {
>         int ret = -ENOSYS;
>
> +       if (!sysctl_modify_ldt)
> +               return ret;
> +
>         switch (func) {
>         case 0:
>                 ret = read_ldt(ptr, bytecount);
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 2082b1a..60270c6 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -111,6 +111,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max;
>  #ifndef CONFIG_MMU
>  extern int sysctl_nr_trim_pages;
>  #endif
> +#ifdef CONFIG_X86
> +extern int sysctl_modify_ldt;
> +#endif
>
>  /* Constants used for minimum and maximum */
>  #ifdef CONFIG_LOCKUP_DETECTOR
> @@ -962,6 +965,13 @@ static struct ctl_table kern_table[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dointvec,
>         },
> +       {
> +               .procname       = "modify_ldt",
> +               .data           = &sysctl_modify_ldt,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec,
> +       },
>  #endif
>  #if defined(CONFIG_MMU)
>         {

I've been pondering something like this that is even MORE generic, for
any syscall. Something like a "syscalls" directory under
/proc/sys/kernel, with 1 entry per syscall. "0" is "available", "1" is
disabled, and "-1" disabled until next boot.

-Kees

-- 
Kees Cook
Chrome OS Security
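
For illustration only, the tri-state scheme described above ("0" available, "1" disabled, "-1" disabled until next boot) could be modeled roughly like this. It is a userspace sketch in C, not kernel code and not part of the quoted patch. The names toggle_table, syscall_toggle_write and syscall_check are hypothetical, the table size is arbitrary, and treating "-1" as a one-way latch that rejects later writes with -EPERM is an assumption about what "until next boot" would mean.

```c
#include <errno.h>

/* Hypothetical tri-state values, following the proposal above. */
enum syscall_state {
	SYSCALL_AVAILABLE = 0,	/* "0": syscall is available */
	SYSCALL_DISABLED  = 1,	/* "1": disabled, may be re-enabled */
	SYSCALL_LOCKED    = -1,	/* "-1": disabled until next boot */
};

#define NR_TOGGLES 512		/* arbitrary table size for this sketch */

/* One entry per syscall number; zero-initialized, so all available. */
static int toggle_table[NR_TOGGLES];

/*
 * Models a write to a hypothetical /proc/sys/kernel/syscalls/<nr> entry.
 * Returns 0 on success or a negative errno value.
 */
static int syscall_toggle_write(unsigned int nr, int val)
{
	if (nr >= NR_TOGGLES)
		return -EINVAL;
	if (val != SYSCALL_AVAILABLE && val != SYSCALL_DISABLED &&
	    val != SYSCALL_LOCKED)
		return -EINVAL;
	/* Assumed semantics: once locked, nothing changes until reboot. */
	if (toggle_table[nr] == SYSCALL_LOCKED)
		return -EPERM;
	toggle_table[nr] = val;
	return 0;
}

/*
 * Models the check at syscall entry: 0 if the call may proceed,
 * -ENOSYS otherwise (the same error the quoted patch returns).
 */
static int syscall_check(unsigned int nr)
{
	if (nr >= NR_TOGGLES)
		return -ENOSYS;
	return toggle_table[nr] == SYSCALL_AVAILABLE ? 0 : -ENOSYS;
}
```

A real version would presumably hang off a ctl_table with a custom proc_handler rather than a flat array, and the check would sit in the syscall entry path; both are left out here.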