2017-09-20 20:46:02

by Kees Cook

Subject: [PATCH v3 00/31] Hardened usercopy whitelisting

v3:
- added LKDTM update patch
- downgrade BUGs to WARNs and fail closed
- add Acks/Reviews from v2

v2:
- added tracing of allocation and usage
- refactored solutions for task_struct
- split up network patches for readability

I intend for this to land via my usercopy hardening tree, so Acks,
Reviewed-bys, and Tested-bys would be greatly appreciated. I have some
questions in a few patches (e.g. CIFS and thread_stack) that it would be
nice to get answered for completeness. FWIW, this series has survived
0-day testing: generally for weeks, and specifically for a couple of days
rebased on v4.14-rc1, so I intend to put this in -next shortly unless
there is further feedback.

----

This series is modified from Brad Spengler/PaX Team's PAX_USERCOPY code
in the last public patch of grsecurity/PaX based on our understanding
of the code. Changes or omissions from the original code are ours and
don't reflect the original grsecurity/PaX code.

David Windsor did the bulk of the porting, refactoring, splitting,
testing, etc; I did some extra tweaks, hunk moving, traces, and extra
patches.

Description from patch 1:


Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)
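
(Hypothetical illustration of that last point: given some kernel "struct
foo" instance and a user pointer "uptr", both of

        put_user(foo->counter, uptr);
        copy_to_user(uptr, &foo->counter, sizeof(foo->counter));

have sizes fixed at compile time and skip the runtime check, while

        copy_to_user(uptr, foo->buf, len);

with a runtime-variable "len" is exactly the kind of copy that hardened
usercopy validates. "struct foo" is invented for illustration.)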

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.
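
As a hypothetical example (names invented for illustration), a subsystem
whose "struct widget" only ever exposes its "name" field to userspace
could declare:

        widget_cachep = kmem_cache_create_usercopy("widget_cache",
                                sizeof(struct widget), 0,
                                SLAB_HWCACHE_ALIGN,
                                offsetof(struct widget, name),
                                sizeof_field(struct widget, name),
                                NULL);

leaving every other byte of the object off-limits to runtime-sized
copy_to_user()/copy_from_user() calls.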

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory: 48074720
Usercopyable Memory: 6367532 13.2%
task_struct 0.2% 4480/1630720
RAW 0.3% 300/96000
RAWv6 2.1% 1408/64768
ext4_inode_cache 3.0% 269760/8740224
dentry 11.1% 585984/5273856
mm_struct 29.1% 54912/188448
kmalloc-8 100.0% 24576/24576
kmalloc-16 100.0% 28672/28672
kmalloc-32 100.0% 81920/81920
kmalloc-192 100.0% 96768/96768
kmalloc-128 100.0% 143360/143360
names_cache 100.0% 163840/163840
kmalloc-64 100.0% 167936/167936
kmalloc-256 100.0% 339968/339968
kmalloc-512 100.0% 350720/350720
kmalloc-96 100.0% 455616/455616
kmalloc-8192 100.0% 655360/655360
kmalloc-1024 100.0% 812032/812032
kmalloc-4096 100.0% 819200/819200
kmalloc-2048 100.0% 1310720/1310720

After some kernel build workloads, the percentage drops under 10% (mainly
driven by the dentry and inode caches expanding):

Total Slab Memory: 95516184
Usercopyable Memory: 8497452 8.8%
task_struct 0.2% 4000/1456000
RAW 0.3% 300/96000
RAWv6 2.1% 1408/64768
ext4_inode_cache 3.0% 1217280/39439872
dentry 11.1% 1623200/14608800
mm_struct 29.1% 73216/251264
kmalloc-8 100.0% 24576/24576
kmalloc-16 100.0% 28672/28672
kmalloc-32 100.0% 94208/94208
kmalloc-192 100.0% 96768/96768
kmalloc-128 100.0% 143360/143360
names_cache 100.0% 163840/163840
kmalloc-64 100.0% 245760/245760
kmalloc-256 100.0% 339968/339968
kmalloc-512 100.0% 350720/350720
kmalloc-96 100.0% 563520/563520
kmalloc-8192 100.0% 655360/655360
kmalloc-1024 100.0% 794624/794624
kmalloc-4096 100.0% 819200/819200
kmalloc-2048 100.0% 1257472/1257472

------
The patches are broken into several stages of changes:

Prepare and whitelist kmalloc:
[PATCH 01/31] usercopy: Prepare for usercopy whitelisting
[PATCH 02/31] usercopy: Enforce slab cache usercopy region boundaries
[PATCH 03/31] usercopy: Mark kmalloc caches as usercopy caches

Update VFS layer for symlinks and other inline storage:
[PATCH 04/31] dcache: Define usercopy region in dentry_cache slab cache
[PATCH 05/31] vfs: Define usercopy region in names_cache slab caches
[PATCH 06/31] vfs: Copy struct mount.mnt_id to userspace using put_user()
[PATCH 07/31] ext4: Define usercopy region in ext4_inode_cache slab cache
[PATCH 08/31] ext2: Define usercopy region in ext2_inode_cache slab cache
[PATCH 09/31] jfs: Define usercopy region in jfs_ip slab cache
[PATCH 10/31] befs: Define usercopy region in befs_inode_cache slab cache
[PATCH 11/31] exofs: Define usercopy region in exofs_inode_cache slab cache
[PATCH 12/31] orangefs: Define usercopy region in orangefs_inode_cache slab cache
[PATCH 13/31] ufs: Define usercopy region in ufs_inode_cache slab cache
[PATCH 14/31] vxfs: Define usercopy region in vxfs_inode slab cache
[PATCH 15/31] xfs: Define usercopy region in xfs_inode slab cache
[PATCH 16/31] cifs: Define usercopy region in cifs_request slab cache

Update scsi layer for inline storage:
[PATCH 17/31] scsi: Define usercopy region in scsi_sense_cache slab cache

Whitelist a few network protocol-specific areas of memory:
[PATCH 18/31] net: Define usercopy region in struct proto slab cache
[PATCH 19/31] ip: Define usercopy region in IP proto slab cache
[PATCH 20/31] caif: Define usercopy region in caif proto slab cache
[PATCH 21/31] sctp: Define usercopy region in SCTP proto slab cache
[PATCH 22/31] sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
[PATCH 23/31] net: Restrict unwhitelisted proto caches to size 0

Whitelist areas of process memory:
[PATCH 24/31] fork: Define usercopy region in mm_struct slab caches
[PATCH 25/31] fork: Define usercopy region in thread_stack slab caches

Deal with per-architecture thread_struct whitelisting:
[PATCH 26/31] fork: Provide usercopy whitelisting for task_struct
[PATCH 27/31] x86: Implement thread_struct whitelist for hardened usercopy
[PATCH 28/31] arm64: Implement thread_struct whitelist for hardened usercopy
[PATCH 29/31] arm: Implement thread_struct whitelist for hardened usercopy

Make blacklisting the default:
[PATCH 30/31] usercopy: Restrict non-usercopy caches to size 0

Update LKDTM:
[PATCH 31/31] lkdtm: Update usercopy tests for whitelisting


Thanks!

-Kees (and David)


2017-09-20 20:46:12

by Kees Cook

Subject: [PATCH v3 06/31] vfs: Copy struct mount.mnt_id to userspace using put_user()

From: David Windsor <[email protected]>

The mnt_id field can be copied with put_user(), so there is no need to
use copy_to_user(). In both cases, hardened usercopy is being bypassed
since the size is constant, and not open to runtime manipulation.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log]
Cc: Alexander Viro <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
fs/fhandle.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/fhandle.c b/fs/fhandle.c
index 58a61f55e0d0..46e00ccca8f0 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -68,8 +68,7 @@ static long do_sys_name_to_handle(struct path *path,
} else
retval = 0;
/* copy the mount id */
- if (copy_to_user(mnt_id, &real_mount(path->mnt)->mnt_id,
- sizeof(*mnt_id)) ||
+ if (put_user(real_mount(path->mnt)->mnt_id, mnt_id) ||
copy_to_user(ufh, handle,
sizeof(struct file_handle) + handle_bytes))
retval = -EFAULT;
--
2.7.4

2017-09-20 20:46:23

by Kees Cook

Subject: [PATCH v3 20/31] caif: Define usercopy region in caif proto slab cache

From: David Windsor <[email protected]>

The CAIF channel connection request parameters need to be copied to/from
userspace. In support of usercopy hardening, this patch defines a region
in the struct proto slab cache in which userspace copy operations are
allowed.

example usage trace:

net/caif/caif_socket.c:
setsockopt(...):
...
copy_from_user(&cf_sk->conn_req.param.data, ..., ol)

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: split from network patch, provide usage trace]
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
net/caif/caif_socket.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 632d5a416d97..c76d513b9a7a 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -1032,6 +1032,8 @@ static int caif_create(struct net *net, struct socket *sock, int protocol,
static struct proto prot = {.name = "PF_CAIF",
.owner = THIS_MODULE,
.obj_size = sizeof(struct caifsock),
+ .useroffset = offsetof(struct caifsock, conn_req.param),
+ .usersize = sizeof_field(struct caifsock, conn_req.param)
};

if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN))
--
2.7.4

2017-09-20 20:46:20

by Kees Cook

Subject: [PATCH v3 16/31] cifs: Define usercopy region in cifs_request slab cache

From: David Windsor <[email protected]>

CIFS request buffers, stored in the cifs_request slab cache, need to be
copied to/from userspace.

cache object allocation:
fs/cifs/cifsfs.c:
cifs_init_request_bufs():
...
cifs_req_poolp = mempool_create_slab_pool(cifs_min_rcv,
cifs_req_cachep);

fs/cifs/misc.c:
cifs_buf_get():
...
ret_buf = mempool_alloc(cifs_req_poolp, GFP_NOFS);
...
return ret_buf;

In support of usercopy hardening, this patch defines a region in the
cifs_request slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: Steve French <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
I wasn't able to actually track down the _usage_ of the cifs_request where
it is copied to userspace. If any CIFS folks could help point that out, it
would be very welcome. :) I suspect it might be part of the debug routines,
but I never managed to exercise them.
---
fs/cifs/cifsfs.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 180b3356ff86..09dfdf76c738 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1229,9 +1229,11 @@ cifs_init_request_bufs(void)
cifs_dbg(VFS, "CIFSMaxBufSize %d 0x%x\n",
CIFSMaxBufSize, CIFSMaxBufSize);
*/
- cifs_req_cachep = kmem_cache_create("cifs_request",
+ cifs_req_cachep = kmem_cache_create_usercopy("cifs_request",
CIFSMaxBufSize + max_hdr_size, 0,
- SLAB_HWCACHE_ALIGN, NULL);
+ SLAB_HWCACHE_ALIGN, 0,
+ CIFSMaxBufSize + max_hdr_size,
+ NULL);
if (cifs_req_cachep == NULL)
return -ENOMEM;

@@ -1257,9 +1259,9 @@ cifs_init_request_bufs(void)
more SMBs to use small buffer alloc and is still much more
efficient to alloc 1 per page off the slab compared to 17K (5page)
alloc of large cifs buffers even when page debugging is on */
- cifs_sm_req_cachep = kmem_cache_create("cifs_small_rq",
+ cifs_sm_req_cachep = kmem_cache_create_usercopy("cifs_small_rq",
MAX_CIFS_SMALL_BUFFER_SIZE, 0, SLAB_HWCACHE_ALIGN,
- NULL);
+ 0, MAX_CIFS_SMALL_BUFFER_SIZE, NULL);
if (cifs_sm_req_cachep == NULL) {
mempool_destroy(cifs_req_poolp);
kmem_cache_destroy(cifs_req_cachep);
--
2.7.4

2017-09-20 20:46:58

by Kees Cook

Subject: [PATCH v3 14/31] vxfs: Define usercopy region in vxfs_inode slab cache

From: David Windsor <[email protected]>

vxfs symlink pathnames, stored in struct vxfs_inode_info field
vii_immed.vi_immed and therefore contained in the vxfs_inode slab cache,
need to be copied to/from userspace.

cache object allocation:
fs/freevxfs/vxfs_super.c:
vxfs_alloc_inode(...):
...
vi = kmem_cache_alloc(vxfs_inode_cachep, GFP_KERNEL);
...
return &vi->vfs_inode;

fs/freevxfs/vxfs_inode.c:
vxfs_iget(...):
...
inode->i_link = vip->vii_immed.vi_immed;

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);

(inlined in vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
vxfs_inode slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
fs/freevxfs/vxfs_super.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index 455ce5b77e9b..c143e18d5a65 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -332,9 +332,13 @@ vxfs_init(void)
{
int rv;

- vxfs_inode_cachep = kmem_cache_create("vxfs_inode",
+ vxfs_inode_cachep = kmem_cache_create_usercopy("vxfs_inode",
sizeof(struct vxfs_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL);
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+ offsetof(struct vxfs_inode_info, vii_immed.vi_immed),
+ sizeof_field(struct vxfs_inode_info,
+ vii_immed.vi_immed),
+ NULL);
if (!vxfs_inode_cachep)
return -ENOMEM;
rv = register_filesystem(&vxfs_fs_type);
--
2.7.4

2017-09-20 20:46:57

by Kees Cook

Subject: [PATCH v3 22/31] sctp: Copy struct sctp_sock.autoclose to userspace using put_user()

From: David Windsor <[email protected]>

The autoclose field can be copied with put_user(), so there is no need to
use copy_to_user(). In both cases, hardened usercopy is being bypassed
since the size is constant, and not open to runtime manipulation.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log]
Cc: Vlad Yasevich <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
net/sctp/socket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index aa4f86d64545..e070c0934638 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4893,7 +4893,7 @@ static int sctp_getsockopt_autoclose(struct sock *sk, int len, char __user *optv
len = sizeof(int);
if (put_user(len, optlen))
return -EFAULT;
- if (copy_to_user(optval, &sctp_sk(sk)->autoclose, sizeof(int)))
+ if (put_user(sctp_sk(sk)->autoclose, (int __user *)optval))
return -EFAULT;
return 0;
}
--
2.7.4

2017-09-20 20:46:55

by Kees Cook

Subject: [PATCH v3 30/31] usercopy: Restrict non-usercopy caches to size 0

With all known usercopied cache whitelists now defined in the
kernel, switch the default usercopy region of kmem_cache_create()
to size 0. Any new caches with usercopy regions will now need to use
kmem_cache_create_usercopy() instead of kmem_cache_create().

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Cc: David Windsor <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
mm/slab_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index d4e6442f9bbc..0ac45ba6685e 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -515,7 +515,7 @@ struct kmem_cache *
kmem_cache_create(const char *name, size_t size, size_t align,
unsigned long flags, void (*ctor)(void *))
{
- return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+ return kmem_cache_create_usercopy(name, size, align, flags, 0, 0,
ctor);
}
EXPORT_SYMBOL(kmem_cache_create);
--
2.7.4

2017-09-20 20:48:08

by Kees Cook

Subject: [PATCH v3 11/31] exofs: Define usercopy region in exofs_inode_cache slab cache

From: David Windsor <[email protected]>

The exofs short symlink names, stored in struct exofs_i_info.i_data and
therefore contained in the exofs_inode_cache slab cache, need to be copied
to/from userspace.

cache object allocation:
fs/exofs/super.c:
exofs_alloc_inode(...):
...
oi = kmem_cache_alloc(exofs_inode_cachep, GFP_KERNEL);
...
return &oi->vfs_inode;

fs/exofs/namei.c:
exofs_symlink(...):
...
inode->i_link = (char *)oi->i_data;

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);

(inlined in vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
exofs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: Boaz Harrosh <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
fs/exofs/super.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 819624cfc8da..e5c532875bb7 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -192,10 +192,13 @@ static void exofs_init_once(void *foo)
*/
static int init_inodecache(void)
{
- exofs_inode_cachep = kmem_cache_create("exofs_inode_cache",
+ exofs_inode_cachep = kmem_cache_create_usercopy("exofs_inode_cache",
sizeof(struct exofs_i_info), 0,
SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD |
- SLAB_ACCOUNT, exofs_init_once);
+ SLAB_ACCOUNT,
+ offsetof(struct exofs_i_info, i_data),
+ sizeof_field(struct exofs_i_info, i_data),
+ exofs_init_once);
if (exofs_inode_cachep == NULL)
return -ENOMEM;
return 0;
--
2.7.4

2017-09-20 20:48:07

by Kees Cook

Subject: [PATCH v3 13/31] ufs: Define usercopy region in ufs_inode_cache slab cache

From: David Windsor <[email protected]>

The ufs symlink pathnames, stored in struct ufs_inode_info.i_u1.i_symlink
and therefore contained in the ufs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
fs/ufs/super.c:
ufs_alloc_inode(...):
...
ei = kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
...
return &ei->vfs_inode;

fs/ufs/ufs.h:
UFS_I(struct inode *inode):
return container_of(inode, struct ufs_inode_info, vfs_inode);

fs/ufs/namei.c:
ufs_symlink(...):
...
inode->i_link = (char *)UFS_I(inode)->i_u1.i_symlink;

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);

(inlined in vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ufs_inode_cache slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: Evgeniy Dushistov <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
fs/ufs/super.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 6440003f8ddc..62b6a4aad809 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1466,11 +1466,14 @@ static void init_once(void *foo)

static int __init init_inodecache(void)
{
- ufs_inode_cachep = kmem_cache_create("ufs_inode_cache",
- sizeof(struct ufs_inode_info),
- 0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
- init_once);
+ ufs_inode_cachep = kmem_cache_create_usercopy("ufs_inode_cache",
+ sizeof(struct ufs_inode_info), 0,
+ (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+ SLAB_ACCOUNT),
+ offsetof(struct ufs_inode_info, i_u1.i_symlink),
+ sizeof_field(struct ufs_inode_info,
+ i_u1.i_symlink),
+ init_once);
if (ufs_inode_cachep == NULL)
return -ENOMEM;
return 0;
--
2.7.4

2017-09-20 20:48:59

by Kees Cook

Subject: [PATCH v3 12/31] orangefs: Define usercopy region in orangefs_inode_cache slab cache

From: David Windsor <[email protected]>

orangefs symlink pathnames, stored in struct orangefs_inode_s.link_target
and therefore contained in the orangefs_inode_cache, need to be copied
to/from userspace.

cache object allocation:
fs/orangefs/super.c:
orangefs_alloc_inode(...):
...
orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, ...);
...
return &orangefs_inode->vfs_inode;

fs/orangefs/orangefs-utils.c:
orangefs_inode_getattr(...):
...
inode->i_link = orangefs_inode->link_target;

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);

(inlined in vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
orangefs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: Mike Marshall <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
fs/orangefs/super.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 47f3fb9cbec4..ee7b8bfa47c2 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -624,11 +624,16 @@ void orangefs_kill_sb(struct super_block *sb)

int orangefs_inode_cache_initialize(void)
{
- orangefs_inode_cache = kmem_cache_create("orangefs_inode_cache",
- sizeof(struct orangefs_inode_s),
- 0,
- ORANGEFS_CACHE_CREATE_FLAGS,
- orangefs_inode_cache_ctor);
+ orangefs_inode_cache = kmem_cache_create_usercopy(
+ "orangefs_inode_cache",
+ sizeof(struct orangefs_inode_s),
+ 0,
+ ORANGEFS_CACHE_CREATE_FLAGS,
+ offsetof(struct orangefs_inode_s,
+ link_target),
+ sizeof_field(struct orangefs_inode_s,
+ link_target),
+ orangefs_inode_cache_ctor);

if (!orangefs_inode_cache) {
gossip_err("Cannot create orangefs_inode_cache\n");
--
2.7.4

2017-09-20 20:46:09

by Kees Cook

Subject: [PATCH v3 05/31] vfs: Define usercopy region in names_cache slab caches

From: David Windsor <[email protected]>

VFS pathnames are stored in the names_cache slab cache, either inline
or across an entire allocation entry (when approaching PATH_MAX). These
are copied to/from userspace, so they must be entirely whitelisted.

cache object allocation:
include/linux/fs.h:
#define __getname() kmem_cache_alloc(names_cachep, GFP_KERNEL)

example usage trace:
strncpy_from_user+0x4d/0x170
getname_flags+0x6f/0x1f0
user_path_at_empty+0x23/0x40
do_mount+0x69/0xda0
SyS_mount+0x83/0xd0

fs/namei.c:
getname_flags(...):
...
result = __getname();
...
kname = (char *)result->iname;
result->name = kname;
len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
...
if (unlikely(len == EMBEDDED_NAME_MAX)) {
const size_t size = offsetof(struct filename, iname[1]);
kname = (char *)result;

result = kzalloc(size, GFP_KERNEL);
...
result->name = kname;
len = strncpy_from_user(kname, filename, PATH_MAX);

In support of usercopy hardening, this patch defines the entire cache
object in the names_cache slab cache as whitelisted, since it may entirely
hold name strings to be copied to/from userspace.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, add usage trace]
Cc: Alexander Viro <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
fs/dcache.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5f5e7c1fcf4b..34ef9a9169be 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3642,8 +3642,8 @@ void __init vfs_caches_init_early(void)

void __init vfs_caches_init(void)
{
- names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
- SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+ names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
+ SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);

dcache_init();
inode_init();
--
2.7.4

2017-09-20 20:49:27

by Kees Cook

Subject: [PATCH v3 10/31] befs: Define usercopy region in befs_inode_cache slab cache

From: David Windsor <[email protected]>

befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
and therefore contained in the befs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
fs/befs/linuxvfs.c:
befs_alloc_inode(...):
...
bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
...
return &bi->vfs_inode;

befs_iget(...):
...
strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
BEFS_SYMLINK_LEN);
...
inode->i_link = befs_ino->i_data.symlink;

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);

(inlined in vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
befs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: Luis de Bethencourt <[email protected]>
Cc: Salah Triki <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Acked-by: Luis de Bethencourt <[email protected]>
---
fs/befs/linuxvfs.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index a92355cc453b..e5dcd26003dc 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -444,11 +444,15 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
static int __init
befs_init_inodecache(void)
{
- befs_inode_cachep = kmem_cache_create("befs_inode_cache",
- sizeof (struct befs_inode_info),
- 0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
- init_once);
+ befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
+ sizeof(struct befs_inode_info), 0,
+ (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+ SLAB_ACCOUNT),
+ offsetof(struct befs_inode_info,
+ i_data.symlink),
+ sizeof_field(struct befs_inode_info,
+ i_data.symlink),
+ init_once);
if (befs_inode_cachep == NULL)
return -ENOMEM;

--
2.7.4

2017-09-20 20:49:50

by Kees Cook

Subject: [PATCH v3 09/31] jfs: Define usercopy region in jfs_ip slab cache

From: David Windsor <[email protected]>

The jfs symlink pathnames, stored in struct jfs_inode_info.i_inline and
therefore contained in the jfs_ip slab cache, need to be copied to/from
userspace.

cache object allocation:
fs/jfs/super.c:
jfs_alloc_inode(...):
...
jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
...
return &jfs_inode->vfs_inode;

fs/jfs/jfs_incore.h:
JFS_IP(struct inode *inode):
return container_of(inode, struct jfs_inode_info, vfs_inode);

fs/jfs/inode.c:
jfs_iget(...):
...
inode->i_link = JFS_IP(inode)->i_inline;

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);

(inlined in vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
jfs_ip slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: Dave Kleikamp <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
fs/jfs/super.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 2f14677169c3..e018412608d4 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -966,9 +966,11 @@ static int __init init_jfs_fs(void)
int rc;

jfs_inode_cachep =
- kmem_cache_create("jfs_ip", sizeof(struct jfs_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
- init_once);
+ kmem_cache_create_usercopy("jfs_ip", sizeof(struct jfs_inode_info),
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+ offsetof(struct jfs_inode_info, i_inline),
+ sizeof_field(struct jfs_inode_info, i_inline),
+ init_once);
if (jfs_inode_cachep == NULL)
return -ENOMEM;

--
2.7.4

2017-09-20 20:50:26

by Kees Cook

Subject: [PATCH v3 08/31] ext2: Define usercopy region in ext2_inode_cache slab cache

From: David Windsor <[email protected]>

The ext2 symlink pathnames, stored in struct ext2_inode_info.i_data and
therefore contained in the ext2_inode_cache slab cache, need to be copied
to/from userspace.

cache object allocation:
fs/ext2/super.c:
ext2_alloc_inode(...):
struct ext2_inode_info *ei;
...
ei = kmem_cache_alloc(ext2_inode_cachep, GFP_NOFS);
...
return &ei->vfs_inode;

fs/ext2/ext2.h:
EXT2_I(struct inode *inode):
return container_of(inode, struct ext2_inode_info, vfs_inode);

fs/ext2/namei.c:
ext2_symlink(...):
...
inode->i_link = (char *)&EXT2_I(inode)->i_data;

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);

(inlined into vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ext2_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: Jan Kara <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Acked-by: Jan Kara <[email protected]>
---
fs/ext2/super.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 1458706bd2ec..789c29987b36 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -220,11 +220,13 @@ static void init_once(void *foo)

static int __init init_inodecache(void)
{
- ext2_inode_cachep = kmem_cache_create("ext2_inode_cache",
- sizeof(struct ext2_inode_info),
- 0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
- init_once);
+ ext2_inode_cachep = kmem_cache_create_usercopy("ext2_inode_cache",
+ sizeof(struct ext2_inode_info), 0,
+ (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+ SLAB_ACCOUNT),
+ offsetof(struct ext2_inode_info, i_data),
+ sizeof_field(struct ext2_inode_info, i_data),
+ init_once);
if (ext2_inode_cachep == NULL)
return -ENOMEM;
return 0;
--
2.7.4

2017-09-20 20:51:04

by Kees Cook

Subject: [PATCH v3 07/31] ext4: Define usercopy region in ext4_inode_cache slab cache

From: David Windsor <[email protected]>

The ext4 symlink pathnames, stored in struct ext4_inode_info.i_data
and therefore contained in the ext4_inode_cache slab cache, need
to be copied to/from userspace.

cache object allocation:
fs/ext4/super.c:
ext4_alloc_inode(...):
struct ext4_inode_info *ei;
...
ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS);
...
return &ei->vfs_inode;

include/trace/events/ext4.h:
#define EXT4_I(inode) \
(container_of(inode, struct ext4_inode_info, vfs_inode))

fs/ext4/namei.c:
ext4_symlink(...):
...
inode->i_link = (char *)&EXT4_I(inode)->i_data;

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len)

(inlined into vfs_readlink)
generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ext4_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: "Theodore Ts'o" <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
fs/ext4/super.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b104096fce9e..b5d393321b7b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1036,11 +1036,13 @@ static void init_once(void *foo)

static int __init init_inodecache(void)
{
- ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
- sizeof(struct ext4_inode_info),
- 0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
- init_once);
+ ext4_inode_cachep = kmem_cache_create_usercopy("ext4_inode_cache",
+ sizeof(struct ext4_inode_info), 0,
+ (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+ SLAB_ACCOUNT),
+ offsetof(struct ext4_inode_info, i_data),
+ sizeof_field(struct ext4_inode_info, i_data),
+ init_once);
if (ext4_inode_cachep == NULL)
return -ENOMEM;
return 0;
--
2.7.4

2017-09-20 20:51:29

by Kees Cook

Subject: [PATCH v3 02/31] usercopy: Enforce slab cache usercopy region boundaries

From: David Windsor <[email protected]>

This patch adds the enforcement component of usercopy cache whitelisting,
and is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting
code in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and
don't reflect the original grsecurity/PaX code.

The SLAB and SLUB allocators are modified to deny all copy operations
in which the kernel heap memory being copied falls outside of the cache's
defined usercopy region.
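
Distilled to a sketch (illustrative only; the SLAB and SLUB hunks below
differ in details such as redzone handling), the new per-cache check is:

        /*
         * "offset" is the start of the copy relative to the start of
         * the object; the copy must fall entirely within
         * [useroffset, useroffset + usersize).
         */
        static bool usercopy_range_ok(unsigned long offset, unsigned long n,
                                      unsigned long useroffset,
                                      unsigned long usersize)
        {
                if (offset < useroffset)
                        return false;   /* starts before the region */
                if (offset - useroffset > usersize)
                        return false;   /* starts past the region */
                if (n > usersize - (offset - useroffset))
                        return false;   /* overruns the region end */
                return true;
        }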

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log and comments]
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
mm/slab.c | 16 +++++++++++-----
mm/slub.c | 18 +++++++++++-------
mm/usercopy.c | 12 ++++++++++++
3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 87b6e5e0cdaf..df268999cf02 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4408,7 +4408,9 @@ module_init(slab_proc_init);

#ifdef CONFIG_HARDENED_USERCOPY
/*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
*
* Returns NULL if check passes, otherwise const char * to name of cache
* to indicate an error.
@@ -4428,11 +4430,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
/* Find offset within object. */
offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);

- /* Allow address range falling entirely within object size. */
- if (offset <= cachep->object_size && n <= cachep->object_size - offset)
- return NULL;
+ /* Make sure object falls entirely within cache's usercopy region. */
+ if (offset < cachep->useroffset)
+ return cachep->name;
+ if (offset - cachep->useroffset > cachep->usersize)
+ return cachep->name;
+ if (n > cachep->useroffset - offset + cachep->usersize)
+ return cachep->name;

- return cachep->name;
+ return NULL;
}
#endif /* CONFIG_HARDENED_USERCOPY */

diff --git a/mm/slub.c b/mm/slub.c
index fae637726c44..bbf73024be3a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3833,7 +3833,9 @@ EXPORT_SYMBOL(__kmalloc_node);

#ifdef CONFIG_HARDENED_USERCOPY
/*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
*
* Returns NULL if check passes, otherwise const char * to name of cache
* to indicate an error.
@@ -3843,11 +3845,9 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
{
struct kmem_cache *s;
unsigned long offset;
- size_t object_size;

/* Find object and usable object size. */
s = page->slab_cache;
- object_size = slab_ksize(s);

/* Reject impossible pointers. */
if (ptr < page_address(page))
@@ -3863,11 +3863,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
offset -= s->red_left_pad;
}

- /* Allow address range falling entirely within object size. */
- if (offset <= object_size && n <= object_size - offset)
- return NULL;
+ /* Make sure object falls entirely within cache's usercopy region. */
+ if (offset < s->useroffset)
+ return s->name;
+ if (offset - s->useroffset > s->usersize)
+ return s->name;
+ if (n > s->useroffset - offset + s->usersize)
+ return s->name;

- return s->name;
+ return NULL;
}
#endif /* CONFIG_HARDENED_USERCOPY */

diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b24715d..cbffde670c49 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -58,6 +58,18 @@ static noinline int check_stack_object(const void *obj, unsigned long len)
return GOOD_STACK;
}

+/*
+ * If this function is reached, then CONFIG_HARDENED_USERCOPY has found an
+ * unexpected state during a copy_from_user() or copy_to_user() call.
+ * There are several checks being performed on the buffer by the
+ * __check_object_size() function. Normal stack buffer usage should never
+ * trip the checks, and kernel text addressing will always trip the check.
+ * For cache objects, it is checking that only the whitelisted range of
+ * bytes for a given cache is being accessed (via the cache's usersize and
+ * useroffset fields). To adjust a cache whitelist, use the usercopy-aware
+ * kmem_cache_create_usercopy() function to create the cache (and
+ * carefully audit the whitelist range).
+ */
static void report_usercopy(const void *ptr, unsigned long len,
bool to_user, const char *type)
{
--
2.7.4

2017-09-20 20:52:03

by Kees Cook

Subject: [PATCH v3 01/31] usercopy: Prepare for usercopy whitelisting

From: David Windsor <[email protected]>

This patch prepares the slab allocator to handle caches having annotations
(useroffset and usersize) defining usercopy regions.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.
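
For the common single-field case, the patch also adds a
KMEM_CACHE_USERCOPY() helper mirroring KMEM_CACHE(). A hypothetical use
(names invented for illustration):

        widget_cachep = KMEM_CACHE_USERCOPY(widget, SLAB_HWCACHE_ALIGN, name);

which expands to kmem_cache_create_usercopy() with the offsetof() and
sizeof_field() of the named member.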

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory: 48074720
Usercopyable Memory: 6367532 13.2%
task_struct 0.2% 4480/1630720
RAW 0.3% 300/96000
RAWv6 2.1% 1408/64768
ext4_inode_cache 3.0% 269760/8740224
dentry 11.1% 585984/5273856
mm_struct 29.1% 54912/188448
kmalloc-8 100.0% 24576/24576
kmalloc-16 100.0% 28672/28672
kmalloc-32 100.0% 81920/81920
kmalloc-192 100.0% 96768/96768
kmalloc-128 100.0% 143360/143360
names_cache 100.0% 163840/163840
kmalloc-64 100.0% 167936/167936
kmalloc-256 100.0% 339968/339968
kmalloc-512 100.0% 350720/350720
kmalloc-96 100.0% 455616/455616
kmalloc-8192 100.0% 655360/655360
kmalloc-1024 100.0% 812032/812032
kmalloc-4096 100.0% 819200/819200
kmalloc-2048 100.0% 1310720/1310720

After some kernel build workloads, the percentage drops under 10% (mainly
driven by the dentry and inode caches expanding):

Total Slab Memory: 95516184
Usercopyable Memory: 8497452 8.8%
task_struct 0.2% 4000/1456000
RAW 0.3% 300/96000
RAWv6 2.1% 1408/64768
ext4_inode_cache 3.0% 1217280/39439872
dentry 11.1% 1623200/14608800
mm_struct 29.1% 73216/251264
kmalloc-8 100.0% 24576/24576
kmalloc-16 100.0% 28672/28672
kmalloc-32 100.0% 94208/94208
kmalloc-192 100.0% 96768/96768
kmalloc-128 100.0% 143360/143360
names_cache 100.0% 163840/163840
kmalloc-64 100.0% 245760/245760
kmalloc-256 100.0% 339968/339968
kmalloc-512 100.0% 350720/350720
kmalloc-96 100.0% 563520/563520
kmalloc-8192 100.0% 655360/655360
kmalloc-1024 100.0% 794624/794624
kmalloc-4096 100.0% 819200/819200
kmalloc-2048 100.0% 1257472/1257472

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, split out a few extra kmalloc hunks]
[kees: add field names to function declarations]
[kees: convert BUGs to WARNs and fail closed]
[kees: add attack surface reduction analysis to commit log]
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
include/linux/slab.h | 27 +++++++++++++++++++++------
include/linux/slab_def.h | 3 +++
include/linux/slub_def.h | 3 +++
include/linux/stddef.h | 2 ++
mm/slab.c | 2 +-
mm/slab.h | 5 ++++-
mm/slab_common.c | 46 ++++++++++++++++++++++++++++++++++++++--------
mm/slub.c | 11 +++++++++--
8 files changed, 81 insertions(+), 18 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41473df6dfb0..8b6cb384f8b6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -126,9 +126,13 @@ struct mem_cgroup;
void __init kmem_cache_init(void);
bool slab_is_available(void);

-struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
- unsigned long,
- void (*)(void *));
+struct kmem_cache *kmem_cache_create(const char *name, size_t size,
+ size_t align, unsigned long flags,
+ void (*ctor)(void *));
+struct kmem_cache *kmem_cache_create_usercopy(const char *name,
+ size_t size, size_t align, unsigned long flags,
+ size_t useroffset, size_t usersize,
+ void (*ctor)(void *));
void kmem_cache_destroy(struct kmem_cache *);
int kmem_cache_shrink(struct kmem_cache *);

@@ -144,9 +148,20 @@ void memcg_destroy_kmem_caches(struct mem_cgroup *);
* f.e. add ____cacheline_aligned_in_smp to the struct declaration
* then the objects will be properly aligned in SMP configurations.
*/
-#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
- sizeof(struct __struct), __alignof__(struct __struct),\
- (__flags), NULL)
+#define KMEM_CACHE(__struct, __flags) \
+ kmem_cache_create(#__struct, sizeof(struct __struct), \
+ __alignof__(struct __struct), (__flags), NULL)
+
+/*
+ * To whitelist a single field for copying to/from userspace, use this
+ * macro instead of KMEM_CACHE() above.
+ */
+#define KMEM_CACHE_USERCOPY(__struct, __flags, __field) \
+ kmem_cache_create_usercopy(#__struct, \
+ sizeof(struct __struct), \
+ __alignof__(struct __struct), (__flags), \
+ offsetof(struct __struct, __field), \
+ sizeof_field(struct __struct, __field), NULL)

/*
* Common kmalloc functions provided by all allocators
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 4ad2c5a26399..03eef0df8648 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -84,6 +84,9 @@ struct kmem_cache {
unsigned int *random_seq;
#endif

+ size_t useroffset; /* Usercopy region offset */
+ size_t usersize; /* Usercopy region size */
+
struct kmem_cache_node *node[MAX_NUMNODES];
};

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 0783b622311e..62866a1a767c 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -134,6 +134,9 @@ struct kmem_cache {
struct kasan_cache kasan_info;
#endif

+ size_t useroffset; /* Usercopy region offset */
+ size_t usersize; /* Usercopy region size */
+
struct kmem_cache_node *node[MAX_NUMNODES];
};

diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 9c61c7cda936..f00355086fb2 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -18,6 +18,8 @@ enum {
#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
#endif

+#define sizeof_field(structure, field) sizeof((((structure *)0)->field))
+
/**
* offsetofend(TYPE, MEMBER)
*
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48c3ed7..87b6e5e0cdaf 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1281,7 +1281,7 @@ void __init kmem_cache_init(void)
create_boot_cache(kmem_cache, "kmem_cache",
offsetof(struct kmem_cache, node) +
nr_node_ids * sizeof(struct kmem_cache_node *),
- SLAB_HWCACHE_ALIGN);
+ SLAB_HWCACHE_ALIGN, 0, 0);
list_add(&kmem_cache->list, &slab_caches);
slab_state = PARTIAL;

diff --git a/mm/slab.h b/mm/slab.h
index 073362816acc..044755ff9632 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -21,6 +21,8 @@ struct kmem_cache {
unsigned int size; /* The aligned/padded/added on size */
unsigned int align; /* Alignment as calculated */
unsigned long flags; /* Active flags on the slab */
+ size_t useroffset; /* Usercopy region offset */
+ size_t usersize; /* Usercopy region size */
const char *name; /* Slab name for sysfs */
int refcount; /* Use counter */
void (*ctor)(void *); /* Called on object slot creation */
@@ -97,7 +99,8 @@ extern int __kmem_cache_create(struct kmem_cache *, unsigned long flags);
extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
unsigned long flags);
extern void create_boot_cache(struct kmem_cache *, const char *name,
- size_t size, unsigned long flags);
+ size_t size, unsigned long flags, size_t useroffset,
+ size_t usersize);

int slab_unmergeable(struct kmem_cache *s);
struct kmem_cache *find_mergeable(size_t size, size_t align,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83be82de..36408f5f2a34 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -272,6 +272,9 @@ int slab_unmergeable(struct kmem_cache *s)
if (s->ctor)
return 1;

+ if (s->usersize)
+ return 1;
+
/*
* We may have set a slab to be unmergeable during bootstrap.
*/
@@ -357,12 +360,16 @@ unsigned long calculate_alignment(unsigned long flags,

static struct kmem_cache *create_cache(const char *name,
size_t object_size, size_t size, size_t align,
- unsigned long flags, void (*ctor)(void *),
+ unsigned long flags, size_t useroffset,
+ size_t usersize, void (*ctor)(void *),
struct mem_cgroup *memcg, struct kmem_cache *root_cache)
{
struct kmem_cache *s;
int err;

+ if (WARN_ON(useroffset + usersize > object_size))
+ useroffset = usersize = 0;
+
err = -ENOMEM;
s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
if (!s)
@@ -373,6 +380,8 @@ static struct kmem_cache *create_cache(const char *name,
s->size = size;
s->align = align;
s->ctor = ctor;
+ s->useroffset = useroffset;
+ s->usersize = usersize;

err = init_memcg_params(s, memcg, root_cache);
if (err)
@@ -397,11 +406,13 @@ static struct kmem_cache *create_cache(const char *name,
}

/*
- * kmem_cache_create - Create a cache.
+ * kmem_cache_create_usercopy - Create a cache.
* @name: A string which is used in /proc/slabinfo to identify this cache.
* @size: The size of objects to be created in this cache.
* @align: The required alignment for the objects.
* @flags: SLAB flags
+ * @useroffset: Usercopy region offset
+ * @usersize: Usercopy region size
* @ctor: A constructor for the objects.
*
* Returns a ptr to the cache on success, NULL on failure.
@@ -421,8 +432,9 @@ static struct kmem_cache *create_cache(const char *name,
* as davem.
*/
struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t align,
- unsigned long flags, void (*ctor)(void *))
+kmem_cache_create_usercopy(const char *name, size_t size, size_t align,
+ unsigned long flags, size_t useroffset, size_t usersize,
+ void (*ctor)(void *))
{
struct kmem_cache *s = NULL;
const char *cache_name;
@@ -453,7 +465,13 @@ kmem_cache_create(const char *name, size_t size, size_t align,
*/
flags &= CACHE_CREATE_MASK;

- s = __kmem_cache_alias(name, size, align, flags, ctor);
+ /* Fail closed on bad usersize or useroffset values. */
+ if (WARN_ON(!usersize && useroffset) ||
+ WARN_ON(size < usersize || size - usersize < useroffset))
+ usersize = useroffset = 0;
+
+ if (!usersize)
+ s = __kmem_cache_alias(name, size, align, flags, ctor);
if (s)
goto out_unlock;

@@ -465,7 +483,7 @@ kmem_cache_create(const char *name, size_t size, size_t align,

s = create_cache(cache_name, size, size,
calculate_alignment(flags, align, size),
- flags, ctor, NULL, NULL);
+ flags, useroffset, usersize, ctor, NULL, NULL);
if (IS_ERR(s)) {
err = PTR_ERR(s);
kfree_const(cache_name);
@@ -491,6 +509,15 @@ kmem_cache_create(const char *name, size_t size, size_t align,
}
return s;
}
+EXPORT_SYMBOL(kmem_cache_create_usercopy);
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+ unsigned long flags, void (*ctor)(void *))
+{
+ return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+ ctor);
+}
EXPORT_SYMBOL(kmem_cache_create);

static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
@@ -603,6 +630,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
s = create_cache(cache_name, root_cache->object_size,
root_cache->size, root_cache->align,
root_cache->flags & CACHE_CREATE_MASK,
+ root_cache->useroffset, root_cache->usersize,
root_cache->ctor, memcg, root_cache);
/*
* If we could not create a memcg cache, do not complain, because
@@ -870,13 +898,15 @@ bool slab_is_available(void)
#ifndef CONFIG_SLOB
/* Create a cache during boot when no slab services are available yet */
void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
- unsigned long flags)
+ unsigned long flags, size_t useroffset, size_t usersize)
{
int err;

s->name = name;
s->size = s->object_size = size;
s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+ s->useroffset = useroffset;
+ s->usersize = usersize;

slab_init_memcg_params(s);

@@ -897,7 +927,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
if (!s)
panic("Out of memory when creating slab %s\n", name);

- create_boot_cache(s, name, size, flags);
+ create_boot_cache(s, name, size, flags, 0, size);
list_add(&s->list, &slab_caches);
memcg_link_cache(s);
s->refcount = 1;
diff --git a/mm/slub.c b/mm/slub.c
index 163352c537ab..fae637726c44 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4201,7 +4201,7 @@ void __init kmem_cache_init(void)
kmem_cache = &boot_kmem_cache;

create_boot_cache(kmem_cache_node, "kmem_cache_node",
- sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN);
+ sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);

register_hotmemory_notifier(&slab_memory_callback_nb);

@@ -4211,7 +4211,7 @@ void __init kmem_cache_init(void)
create_boot_cache(kmem_cache, "kmem_cache",
offsetof(struct kmem_cache, node) +
nr_node_ids * sizeof(struct kmem_cache_node *),
- SLAB_HWCACHE_ALIGN);
+ SLAB_HWCACHE_ALIGN, 0, 0);

kmem_cache = bootstrap(&boot_kmem_cache);

@@ -5081,6 +5081,12 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
SLAB_ATTR_RO(cache_dma);
#endif

+static ssize_t usersize_show(struct kmem_cache *s, char *buf)
+{
+ return sprintf(buf, "%zu\n", s->usersize);
+}
+SLAB_ATTR_RO(usersize);
+
static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
{
return sprintf(buf, "%d\n", !!(s->flags & SLAB_TYPESAFE_BY_RCU));
@@ -5455,6 +5461,7 @@ static struct attribute *slab_attrs[] = {
#ifdef CONFIG_FAILSLAB
&failslab_attr.attr,
#endif
+ &usersize_attr.attr,

NULL
};
--
2.7.4

2017-09-20 20:52:51

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 28/31] arm64: Implement thread_struct whitelist for hardened usercopy

This whitelists the FPU register state portion of the thread_struct for
copying to userspace, instead of the default entire structure.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Morse <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Dave Martin <[email protected]>
Cc: zijun_hu <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/processor.h | 8 ++++++++
2 files changed, 9 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..e190f9901aef 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -73,6 +73,7 @@ config ARM64
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
select HAVE_ARCH_SECCOMP_FILTER
+ select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_ARCH_VMAP_STACK
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 29adab8138c3..759c4d90ac7f 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -90,6 +90,14 @@ struct thread_struct {
struct debug_info debug; /* debugging */
};

+/* Whitelist the fpsimd_state for copying to userspace. */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+ unsigned long *size)
+{
+ *offset = offsetof(struct thread_struct, fpsimd_state);
+ *size = sizeof(struct fpsimd_state);
+}
+
#ifdef CONFIG_COMPAT
#define task_user_tls(t) \
({ \
--
2.7.4

2017-09-20 20:52:58

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 26/31] fork: Provide usercopy whitelisting for task_struct

While the blocked and saved_sigmask fields of task_struct are copied to
userspace (via sigmask_to_save() and setup_rt_frame()), they are always
copied with a static length (i.e. sizeof(sigset_t)), so they are
implicitly whitelisted.
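
(For illustration, such a fixed-size copy looks roughly like the
following sketch, where uset is a placeholder user pointer; the snippet
is not quoted from the signal code:

    if (copy_to_user(uset, &current->blocked, sizeof(sigset_t)))
        return -EFAULT;

Since the length is a compile-time constant, hardened usercopy's dynamic
bounds checking is bypassed and no slab whitelist is consulted.)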

The only portion of task_struct that is potentially dynamically sized and
may be copied to userspace is in the architecture-specific thread_struct
at the end of task_struct.

cache object allocation:
kernel/fork.c:
alloc_task_struct_node(...):
return kmem_cache_alloc_node(task_struct_cachep, ...);

dup_task_struct(...):
...
tsk = alloc_task_struct_node(node);

copy_process(...):
...
dup_task_struct(...)

_do_fork(...):
...
copy_process(...)

example usage trace:

arch/x86/kernel/fpu/signal.c:
__fpu__restore_sig(...):
...
struct task_struct *tsk = current;
struct fpu *fpu = &tsk->thread.fpu;
...
__copy_from_user(&fpu->state.xsave, ..., state_size);

fpu__restore_sig(...):
...
return __fpu__restore_sig(...);

arch/x86/kernel/signal.c:
restore_sigcontext(...):
...
fpu__restore_sig(...)

This introduces arch_thread_struct_whitelist() to let an architecture
declare specifically where the whitelist should be within thread_struct.
If undefined, the entire thread_struct field is left whitelisted.
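
As a sketch (the fpu_regs member is a placeholder, not a real field; the
per-architecture patches later in this series provide the actual
implementations), an architecture selecting
HAVE_ARCH_THREAD_STRUCT_WHITELIST would provide something like:

    /* Whitelist only the FPU register state within thread_struct. */
    static inline void arch_thread_struct_whitelist(unsigned long *offset,
                                                    unsigned long *size)
    {
        *offset = offsetof(struct thread_struct, fpu_regs);
        *size = sizeof_field(struct thread_struct, fpu_regs);
    }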

Cc: Andrew Morton <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: "Mickaël Salaün" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Acked-by: Rik van Riel <[email protected]>
---
arch/Kconfig | 11 +++++++++++
include/linux/sched/task.h | 14 ++++++++++++++
kernel/fork.c | 22 ++++++++++++++++++++--
3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 1aafb4efbb51..43f2e7b033ca 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -241,6 +241,17 @@ config ARCH_INIT_TASK
config ARCH_TASK_STRUCT_ALLOCATOR
bool

+config HAVE_ARCH_THREAD_STRUCT_WHITELIST
+ bool
+ depends on !ARCH_TASK_STRUCT_ALLOCATOR
+ help
+ An architecture should select this to provide hardened usercopy
+ knowledge about what region of the thread_struct should be
+ whitelisted for copying to userspace. Normally this is only the
+ FPU registers. Specifically, arch_thread_struct_whitelist()
+ should be implemented. Without this, the entire thread_struct
+ field in task_struct will be left whitelisted.
+
# Select if arch has its private alloc_thread_stack() function
config ARCH_THREAD_STACK_ALLOCATOR
bool
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 79a2a744648d..a5e6f0913f74 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -103,6 +103,20 @@ extern int arch_task_struct_size __read_mostly;
# define arch_task_struct_size (sizeof(struct task_struct))
#endif

+#ifndef CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST
+/*
+ * If an architecture has not declared a thread_struct whitelist, we
+ * must assume something there may need to be copied to userspace.
+ */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+ unsigned long *size)
+{
+ *offset = 0;
+ /* Handle dynamically sized thread_struct. */
+ *size = arch_task_struct_size - offsetof(struct task_struct, thread);
+}
+#endif
+
#ifdef CONFIG_VMAP_STACK
static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
{
diff --git a/kernel/fork.c b/kernel/fork.c
index 720109dc723a..d8dcd8f8e82f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -454,6 +454,21 @@ static void set_max_threads(unsigned int max_threads_suggested)
int arch_task_struct_size __read_mostly;
#endif

+static void task_struct_whitelist(unsigned long *offset, unsigned long *size)
+{
+ /* Fetch thread_struct whitelist for the architecture. */
+ arch_thread_struct_whitelist(offset, size);
+
+ /*
+ * Handle zero-sized whitelist or empty thread_struct, otherwise
+ * adjust offset to position of thread_struct in task_struct.
+ */
+ if (unlikely(*size == 0))
+ *offset = 0;
+ else
+ *offset += offsetof(struct task_struct, thread);
+}
+
void __init fork_init(void)
{
int i;
@@ -462,11 +477,14 @@ void __init fork_init(void)
#define ARCH_MIN_TASKALIGN 0
#endif
int align = max_t(int, L1_CACHE_BYTES, ARCH_MIN_TASKALIGN);
+ unsigned long useroffset, usersize;

/* create a slab on which task_structs can be allocated */
- task_struct_cachep = kmem_cache_create("task_struct",
+ task_struct_whitelist(&useroffset, &usersize);
+ task_struct_cachep = kmem_cache_create_usercopy("task_struct",
arch_task_struct_size, align,
- SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT, NULL);
+ SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT,
+ useroffset, usersize, NULL);
#endif

/* do the arch specific task caches init */
--
2.7.4

2017-09-20 20:53:13

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 18/31] net: Define usercopy region in struct proto slab cache

From: David Windsor <[email protected]>

In support of usercopy hardening, this patch defines a region in the
struct proto slab cache in which userspace copy operations are allowed.
Some protocols need to copy objects to/from userspace, and they can
declare the region via their proto structure with the new usersize and
useroffset fields. Initially, if no region is specified (usersize ==
0), the entire object is marked as whitelisted. This allows protocols
to be whitelisted in subsequent patches. Once all protocols have been
annotated, the full-whitelist default can be removed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
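
As an illustration (struct foo_sock and its filter member are
placeholders, not part of this series; the raw IPv4/IPv6 and SCTP
patches provide concrete annotations), a protocol would declare:

    static struct proto foo_prot = {
        .name       = "FOO",
        .obj_size   = sizeof(struct foo_sock),
        /* Whitelist only the filter member for usercopy. */
        .useroffset = offsetof(struct foo_sock, filter),
        .usersize   = sizeof_field(struct foo_sock, filter),
    };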

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, split off per-proto patches]
[kees: add logic for by-default full-whitelist]
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: David Howells <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
include/net/sock.h | 2 ++
net/core/sock.c | 6 +++++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 03a362568357..13c2d1b48c86 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1106,6 +1106,8 @@ struct proto {
struct kmem_cache *slab;
unsigned int obj_size;
int slab_flags;
+ size_t useroffset; /* Usercopy region offset */
+ size_t usersize; /* Usercopy region size */

struct percpu_counter *orphan_count;

diff --git a/net/core/sock.c b/net/core/sock.c
index 9b7b6bbb2a23..832dfb03102e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3165,8 +3165,12 @@ static int req_prot_init(const struct proto *prot)
int proto_register(struct proto *prot, int alloc_slab)
{
if (alloc_slab) {
- prot->slab = kmem_cache_create(prot->name, prot->obj_size, 0,
+ prot->slab = kmem_cache_create_usercopy(prot->name,
+ prot->obj_size, 0,
SLAB_HWCACHE_ALIGN | prot->slab_flags,
+ prot->usersize ? prot->useroffset : 0,
+ prot->usersize ? prot->usersize
+ : prot->obj_size,
NULL);

if (prot->slab == NULL) {
--
2.7.4

2017-09-20 20:53:11

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 25/31] fork: Define usercopy region in thread_stack slab caches

From: David Windsor <[email protected]>

In support of usercopy hardening, this patch defines a region in the
thread_stack slab caches in which userspace copy operations are allowed.
Since the entire thread_stack needs to be available to userspace, the
entire slab contents are whitelisted. Note that the slab-based thread
stack is only present on systems with THREAD_SIZE < PAGE_SIZE and
!CONFIG_VMAP_STACK.

cache object allocation:
kernel/fork.c:
alloc_thread_stack_node(...):
return kmem_cache_alloc_node(thread_stack_cache, ...)

dup_task_struct(...):
...
stack = alloc_thread_stack_node(...)
...
tsk->stack = stack;

copy_process(...):
...
dup_task_struct(...)

_do_fork(...):
...
copy_process(...)

This region is known as the slab cache's usercopy region. Slab caches
can now check that each copy operation involving cache-managed memory
falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, split patch, provide usage trace]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Acked-by: Rik van Riel <[email protected]>
---
I wasn't able to test this, so testing by anyone with a system running
a large PAGE_SIZE and without VMAP_STACK would be appreciated.
---
kernel/fork.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index dc1437f8b702..720109dc723a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -278,8 +278,9 @@ static void free_thread_stack(struct task_struct *tsk)

void thread_stack_cache_init(void)
{
- thread_stack_cache = kmem_cache_create("thread_stack", THREAD_SIZE,
- THREAD_SIZE, 0, NULL);
+ thread_stack_cache = kmem_cache_create_usercopy("thread_stack",
+ THREAD_SIZE, THREAD_SIZE, 0, 0,
+ THREAD_SIZE, NULL);
BUG_ON(thread_stack_cache == NULL);
}
# endif
--
2.7.4

2017-09-20 20:56:16

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 19/31] ip: Define usercopy region in IP proto slab cache

From: David Windsor <[email protected]>

The ICMP filters for IPv4 and IPv6 raw sockets need to be copied to/from
userspace. In support of usercopy hardening, this patch defines a region
in the struct proto slab cache in which userspace copy operations are
allowed.

example usage trace:

net/ipv4/raw.c:
raw_seticmpfilter(...):
...
copy_from_user(&raw_sk(sk)->filter, ..., optlen)

raw_geticmpfilter(...):
...
copy_to_user(..., &raw_sk(sk)->filter, len)

net/ipv6/raw.c:
rawv6_seticmpfilter(...):
...
copy_from_user(&raw6_sk(sk)->filter, ..., optlen)

rawv6_geticmpfilter(...):
...
copy_to_user(..., &raw6_sk(sk)->filter, len)

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: split from network patch, provide usage trace]
Cc: "David S. Miller" <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
net/ipv4/raw.c | 2 ++
net/ipv6/raw.c | 2 ++
2 files changed, 4 insertions(+)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 33b70bfd1122..1b6fa4195ac9 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -970,6 +970,8 @@ struct proto raw_prot = {
.hash = raw_hash_sk,
.unhash = raw_unhash_sk,
.obj_size = sizeof(struct raw_sock),
+ .useroffset = offsetof(struct raw_sock, filter),
+ .usersize = sizeof_field(struct raw_sock, filter),
.h.raw_hash = &raw_v4_hashinfo,
#ifdef CONFIG_COMPAT
.compat_setsockopt = compat_raw_setsockopt,
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index e4462b0ff801..041d1cd5e774 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1268,6 +1268,8 @@ struct proto rawv6_prot = {
.hash = raw_hash_sk,
.unhash = raw_unhash_sk,
.obj_size = sizeof(struct raw6_sock),
+ .useroffset = offsetof(struct raw6_sock, filter),
+ .usersize = sizeof_field(struct raw6_sock, filter),
.h.raw_hash = &raw_v6_hashinfo,
#ifdef CONFIG_COMPAT
.compat_setsockopt = compat_rawv6_setsockopt,
--
2.7.4

2017-09-20 20:56:14

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 21/31] sctp: Define usercopy region in SCTP proto slab cache

From: David Windsor <[email protected]>

The SCTP socket event notification subscription information needs to be
copied to/from userspace. In support of usercopy hardening, this patch
defines a region in the struct proto slab cache in which userspace copy
operations are allowed. Additionally, the two fields involved are moved
adjacent to each other so that a single region can cover both (sketched
below).
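
With the two fields adjacent, a single window spans from the start of
subscribe to the end of initmsg. The hunk below spells this out with
sizeof_field(); equivalently, as a sketch using the existing
offsetofend() helper from <linux/stddef.h>:

    .useroffset = offsetof(struct sctp_sock, subscribe),
    .usersize   = offsetofend(struct sctp_sock, initmsg)
                  - offsetof(struct sctp_sock, subscribe),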

example usage trace:

net/sctp/socket.c:
sctp_getsockopt_events(...):
...
copy_to_user(..., &sctp_sk(sk)->subscribe, len)

sctp_setsockopt_events(...):
...
copy_from_user(&sctp_sk(sk)->subscribe, ..., optlen)

sctp_getsockopt_initmsg(...):
...
copy_to_user(..., &sctp_sk(sk)->initmsg, len)

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: split from network patch, move struct member adjacent, provide usage]
Cc: Vlad Yasevich <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
include/net/sctp/structs.h | 9 +++++++--
net/sctp/socket.c | 4 ++++
2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 0477945de1a3..f2da107983d9 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -202,12 +202,17 @@ struct sctp_sock {
/* Flags controlling Heartbeat, SACK delay, and Path MTU Discovery. */
__u32 param_flags;

- struct sctp_initmsg initmsg;
struct sctp_rtoinfo rtoinfo;
struct sctp_paddrparams paddrparam;
- struct sctp_event_subscribe subscribe;
struct sctp_assocparams assocparams;

+ /*
+ * These two structures must be grouped together for the usercopy
+ * whitelist region.
+ */
+ struct sctp_event_subscribe subscribe;
+ struct sctp_initmsg initmsg;
+
int user_frag;

__u32 autoclose;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index d4730ada7f32..aa4f86d64545 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -8246,6 +8246,10 @@ struct proto sctp_prot = {
.unhash = sctp_unhash,
.get_port = sctp_get_port,
.obj_size = sizeof(struct sctp_sock),
+ .useroffset = offsetof(struct sctp_sock, subscribe),
+ .usersize = offsetof(struct sctp_sock, initmsg) -
+ offsetof(struct sctp_sock, subscribe) +
+ sizeof_field(struct sctp_sock, initmsg),
.sysctl_mem = sysctl_sctp_mem,
.sysctl_rmem = sysctl_sctp_rmem,
.sysctl_wmem = sysctl_sctp_wmem,
--
2.7.4

2017-09-20 20:56:48

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 14/31] vxfs: Define usercopy region in vxfs_inode slab cache

Hi Kees,

I've only got this single email from you, which on its own doesn't
compile and seems to be part of a 31 patch series.

So as-is NAK, doesn't work.

Please make sure to always send every patch in a series to every
developer you want to include.

2017-09-20 20:57:16

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 24/31] fork: Define usercopy region in mm_struct slab caches

From: David Windsor <[email protected]>

In support of usercopy hardening, this patch defines a region in the
mm_struct slab caches in which userspace copy operations are allowed.
Only the auxv field is copied to userspace.

cache object allocation:
kernel/fork.c:
#define allocate_mm() (kmem_cache_alloc(mm_cachep, GFP_KERNEL))

dup_mm():
...
mm = allocate_mm();

copy_mm(...):
...
dup_mm();

copy_process(...):
...
copy_mm(...)

_do_fork(...):
...
copy_process(...)

example usage trace:

fs/binfmt_elf.c:
create_elf_tables(...):
...
elf_info = (elf_addr_t *)current->mm->saved_auxv;
...
copy_to_user(..., elf_info, ei_index * sizeof(elf_addr_t))

load_elf_binary(...):
...
create_elf_tables(...);

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, split patch, provide usage trace]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Acked-by: Rik van Riel <[email protected]>
---
kernel/fork.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 10646182440f..dc1437f8b702 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2207,9 +2207,11 @@ void __init proc_caches_init(void)
* maximum number of CPU's we can ever have. The cpumask_allocation
* is at the end of the structure, exactly for that reason.
*/
- mm_cachep = kmem_cache_create("mm_struct",
+ mm_cachep = kmem_cache_create_usercopy("mm_struct",
sizeof(struct mm_struct), ARCH_MIN_MMSTRUCT_ALIGN,
SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT,
+ offsetof(struct mm_struct, saved_auxv),
+ sizeof_field(struct mm_struct, saved_auxv),
NULL);
vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
mmap_init();
--
2.7.4

2017-09-20 20:58:04

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 15/31] xfs: Define usercopy region in xfs_inode slab cache

From: David Windsor <[email protected]>

The XFS inline inode data, stored in struct xfs_inode_t field
i_df.if_u2.if_inline_data and therefore contained in the xfs_inode slab
cache, needs to be copied to/from userspace.

cache object allocation:
fs/xfs/xfs_icache.c:
xfs_inode_alloc(...):
...
ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);

fs/xfs/libxfs/xfs_inode_fork.c:
xfs_init_local_fork(...):
...
if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
...

fs/xfs/xfs_symlink.c:
xfs_symlink(...):
...
xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen);

example usage trace:
readlink_copy+0x43/0x70
vfs_readlink+0x62/0x110
SyS_readlinkat+0x100/0x130

fs/xfs/xfs_iops.c:
(via inode->i_op->get_link)
xfs_vn_get_link_inline(...):
...
return XFS_I(inode)->i_df.if_u1.if_data;

fs/namei.c:
readlink_copy(..., link):
...
copy_to_user(..., link, len);

generic_readlink(dentry, ...):
struct inode *inode = d_inode(dentry);
const char *link = inode->i_link;
...
if (!link) {
link = inode->i_op->get_link(dentry, inode, &done);
...
readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
xfs_inode slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: "Darrick J. Wong" <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/xfs/kmem.h | 10 ++++++++++
fs/xfs/xfs_super.c | 7 +++++--
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 4d85992d75b2..08358f38dee6 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -110,6 +110,16 @@ kmem_zone_init_flags(int size, char *zone_name, unsigned long flags,
return kmem_cache_create(zone_name, size, 0, flags, construct);
}

+static inline kmem_zone_t *
+kmem_zone_init_flags_usercopy(int size, char *zone_name, unsigned long flags,
+ size_t useroffset, size_t usersize,
+ void (*construct)(void *))
+{
+ return kmem_cache_create_usercopy(zone_name, size, 0, flags,
+ useroffset, usersize, construct);
+}
+
+
static inline void
kmem_zone_free(kmem_zone_t *zone, void *ptr)
{
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index c996f4ae4a5f..1b4b67194538 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1846,9 +1846,12 @@ xfs_init_zones(void)
goto out_destroy_efd_zone;

xfs_inode_zone =
- kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode",
+ kmem_zone_init_flags_usercopy(sizeof(xfs_inode_t), "xfs_inode",
KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | KM_ZONE_SPREAD |
- KM_ZONE_ACCOUNT, xfs_fs_inode_init_once);
+ KM_ZONE_ACCOUNT,
+ offsetof(xfs_inode_t, i_df.if_u2.if_inline_data),
+ sizeof_field(xfs_inode_t, i_df.if_u2.if_inline_data),
+ xfs_fs_inode_init_once);
if (!xfs_inode_zone)
goto out_destroy_efi_zone;

--
2.7.4

2017-09-20 20:52:56

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 29/31] arm: Implement thread_struct whitelist for hardened usercopy

ARM does not carry FPU state in the thread structure, so it can declare
no usercopy whitelist at all.

Cc: Russell King <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
arch/arm/Kconfig | 1 +
arch/arm/include/asm/processor.h | 7 +++++++
2 files changed, 8 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 7888c9803eb0..4f1ab6c6b8c0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -48,6 +48,7 @@ config ARM
select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
+ select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARM_SMCCC if CPU_V7
select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index c3d5fc124a05..d6dc45c92ee5 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -45,6 +45,13 @@ struct thread_struct {
struct debug_info debug;
};

+/* Nothing needs to be usercopy-whitelisted from thread_struct. */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+ unsigned long *size)
+{
+ *offset = *size = 0;
+}
+
#define INIT_THREAD { }

#ifdef CONFIG_MMU
--
2.7.4

2017-09-20 20:58:33

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 23/31] net: Restrict unwhitelisted proto caches to size 0

Now that protocols have been annotated (the tcp.c matches below copy
icsk_ca_ops->name and icsk_ulp_ops->name, which point to ops structures
outside the slab cache and so need no whitelist):

$ git grep 'copy_.*_user.*sk.*->'
caif/caif_socket.c: copy_from_user(&cf_sk->conn_req.param.data, ov, ol)) {
ipv4/raw.c: if (copy_from_user(&raw_sk(sk)->filter, optval, optlen))
ipv4/raw.c: copy_to_user(optval, &raw_sk(sk)->filter, len))
ipv4/tcp.c: if (copy_to_user(optval, icsk->icsk_ca_ops->name, len))
ipv4/tcp.c: if (copy_to_user(optval, icsk->icsk_ulp_ops->name, len))
ipv6/raw.c: if (copy_from_user(&raw6_sk(sk)->filter, optval, optlen))
ipv6/raw.c: if (copy_to_user(optval, &raw6_sk(sk)->filter, len))
sctp/socket.c: if (copy_from_user(&sctp_sk(sk)->subscribe, optval, optlen))
sctp/socket.c: if (copy_to_user(optval, &sctp_sk(sk)->subscribe, len))
sctp/socket.c: if (copy_to_user(optval, &sctp_sk(sk)->initmsg, len))

we can switch the default proto usercopy region to size 0. Any protocol
that needs a whitelisted region must declare it via the useroffset and
usersize fields of struct proto.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: David Howells <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
net/core/sock.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 832dfb03102e..84cd0b362a02 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3168,9 +3168,7 @@ int proto_register(struct proto *prot, int alloc_slab)
prot->slab = kmem_cache_create_usercopy(prot->name,
prot->obj_size, 0,
SLAB_HWCACHE_ALIGN | prot->slab_flags,
- prot->usersize ? prot->useroffset : 0,
- prot->usersize ? prot->usersize
- : prot->obj_size,
+ prot->useroffset, prot->usersize,
NULL);

if (prot->slab == NULL) {
--
2.7.4

2017-09-20 20:59:08

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 17/31] scsi: Define usercopy region in scsi_sense_cache slab cache

From: David Windsor <[email protected]>

SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore
contained in the scsi_sense_cache slab cache, need to be copied to/from
userspace.

cache object allocation:
drivers/scsi/scsi_lib.c:
scsi_select_sense_cache(...):
return ... ? scsi_sense_isadma_cache : scsi_sense_cache

scsi_alloc_sense_buffer(...):
return kmem_cache_alloc_node(scsi_select_sense_cache(), ...);

scsi_init_request(...):
...
cmd->sense_buffer = scsi_alloc_sense_buffer(...);
...
cmd->req.sense = cmd->sense_buffer

example usage trace:

block/scsi_ioctl.c:
(inline from sg_io)
blk_complete_sghdr_rq(...):
struct scsi_request *req = scsi_req(rq);
...
copy_to_user(..., req->sense, len)

scsi_cmd_ioctl(...):
sg_io(...);

In support of usercopy hardening, this patch defines a region in
the scsi_sense_cache slab cache in which userspace copy operations
are allowed.

This region is known as the slab cache's usercopy region. Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust commit log, provide usage trace]
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
drivers/scsi/scsi_lib.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..88bfab251693 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -79,14 +79,15 @@ int scsi_init_sense_cache(struct Scsi_Host *shost)
if (shost->unchecked_isa_dma) {
scsi_sense_isadma_cache =
kmem_cache_create("scsi_sense_cache(DMA)",
- SCSI_SENSE_BUFFERSIZE, 0,
- SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
+ SCSI_SENSE_BUFFERSIZE, 0,
+ SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
if (!scsi_sense_isadma_cache)
ret = -ENOMEM;
} else {
scsi_sense_cache =
- kmem_cache_create("scsi_sense_cache",
- SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN, NULL);
+ kmem_cache_create_usercopy("scsi_sense_cache",
+ SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN,
+ 0, SCSI_SENSE_BUFFERSIZE, NULL);
if (!scsi_sense_cache)
ret = -ENOMEM;
}
--
2.7.4

2017-09-20 20:59:42

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 31/31] lkdtm: Update usercopy tests for whitelisting

This updates the USERCOPY_HEAP_FLAG_* tests to USERCOPY_HEAP_WHITELIST_*,
since the final form of usercopy whitelisting ended up using an offset/size
window instead of the earlier proposed allocation flags.
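
The whitelist window exercised by the new tests sits strictly inside the
object. This is a sketch of the cache created in lkdtm_usercopy_init()
below; with the default cache_size of 1024, the window covers bytes
[256, 320) of each object:

    whitelist_cache =
        kmem_cache_create_usercopy("lkdtm-usercopy", cache_size,
                                   0, 0,
                                   cache_size / 4,  /* useroffset */
                                   cache_size / 16, /* usersize */
                                   NULL);

A copy of cache_size / 16 bytes starting at offset cache_size / 4 must
succeed, while the same copy starting one byte earlier must trip the
whitelist check and Oops.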

Signed-off-by: Kees Cook <[email protected]>
---
drivers/misc/lkdtm.h | 4 +-
drivers/misc/lkdtm_core.c | 4 +-
drivers/misc/lkdtm_usercopy.c | 88 ++++++++++++++++++++++++-------------------
3 files changed, 53 insertions(+), 43 deletions(-)

diff --git a/drivers/misc/lkdtm.h b/drivers/misc/lkdtm.h
index bfb6c45b6130..327bcf46fab5 100644
--- a/drivers/misc/lkdtm.h
+++ b/drivers/misc/lkdtm.h
@@ -75,8 +75,8 @@ void __init lkdtm_usercopy_init(void);
void __exit lkdtm_usercopy_exit(void);
void lkdtm_USERCOPY_HEAP_SIZE_TO(void);
void lkdtm_USERCOPY_HEAP_SIZE_FROM(void);
-void lkdtm_USERCOPY_HEAP_FLAG_TO(void);
-void lkdtm_USERCOPY_HEAP_FLAG_FROM(void);
+void lkdtm_USERCOPY_HEAP_WHITELIST_TO(void);
+void lkdtm_USERCOPY_HEAP_WHITELIST_FROM(void);
void lkdtm_USERCOPY_STACK_FRAME_TO(void);
void lkdtm_USERCOPY_STACK_FRAME_FROM(void);
void lkdtm_USERCOPY_STACK_BEYOND(void);
diff --git a/drivers/misc/lkdtm_core.c b/drivers/misc/lkdtm_core.c
index 981b3ef71e47..6e2d767ecaaa 100644
--- a/drivers/misc/lkdtm_core.c
+++ b/drivers/misc/lkdtm_core.c
@@ -245,8 +245,8 @@ struct crashtype crashtypes[] = {
CRASHTYPE(ATOMIC_TIMING),
CRASHTYPE(USERCOPY_HEAP_SIZE_TO),
CRASHTYPE(USERCOPY_HEAP_SIZE_FROM),
- CRASHTYPE(USERCOPY_HEAP_FLAG_TO),
- CRASHTYPE(USERCOPY_HEAP_FLAG_FROM),
+ CRASHTYPE(USERCOPY_HEAP_WHITELIST_TO),
+ CRASHTYPE(USERCOPY_HEAP_WHITELIST_FROM),
CRASHTYPE(USERCOPY_STACK_FRAME_TO),
CRASHTYPE(USERCOPY_STACK_FRAME_FROM),
CRASHTYPE(USERCOPY_STACK_BEYOND),
diff --git a/drivers/misc/lkdtm_usercopy.c b/drivers/misc/lkdtm_usercopy.c
index df6ac985fbb5..f6055f4922bf 100644
--- a/drivers/misc/lkdtm_usercopy.c
+++ b/drivers/misc/lkdtm_usercopy.c
@@ -19,7 +19,7 @@
*/
static volatile size_t unconst = 0;
static volatile size_t cache_size = 1024;
-static struct kmem_cache *bad_cache;
+static struct kmem_cache *whitelist_cache;

static const unsigned char test_text[] = "This is a test.\n";

@@ -114,6 +114,10 @@ static noinline void do_usercopy_stack(bool to_user, bool bad_frame)
vm_munmap(user_addr, PAGE_SIZE);
}

+/*
+ * This checks for whole-object size validation with hardened usercopy,
+ * with or without usercopy whitelisting.
+ */
static void do_usercopy_heap_size(bool to_user)
{
unsigned long user_addr;
@@ -171,77 +175,79 @@ static void do_usercopy_heap_size(bool to_user)
kfree(two);
}

-static void do_usercopy_heap_flag(bool to_user)
+/*
+ * This checks for the specific whitelist window within an object. If this
+ * test passes, then do_usercopy_heap_size() tests will pass too.
+ */
+static void do_usercopy_heap_whitelist(bool to_user)
{
- unsigned long user_addr;
- unsigned char *good_buf = NULL;
- unsigned char *bad_buf = NULL;
+ unsigned long user_alloc;
+ unsigned char *buf = NULL;
+ unsigned char __user *user_addr;
+ size_t offset, size;

/* Make sure cache was prepared. */
- if (!bad_cache) {
+ if (!whitelist_cache) {
pr_warn("Failed to allocate kernel cache\n");
return;
}

/*
- * Allocate one buffer from each cache (kmalloc will have the
- * SLAB_USERCOPY flag already, but "bad_cache" won't).
+ * Allocate a buffer with a whitelisted window in the buffer.
*/
- good_buf = kmalloc(cache_size, GFP_KERNEL);
- bad_buf = kmem_cache_alloc(bad_cache, GFP_KERNEL);
- if (!good_buf || !bad_buf) {
- pr_warn("Failed to allocate buffers from caches\n");
+ buf = kmem_cache_alloc(whitelist_cache, GFP_KERNEL);
+ if (!buf) {
+ pr_warn("Failed to allocate buffer from whitelist cache\n");
goto free_alloc;
}

/* Allocate user memory we'll poke at. */
- user_addr = vm_mmap(NULL, 0, PAGE_SIZE,
+ user_alloc = vm_mmap(NULL, 0, PAGE_SIZE,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_ANONYMOUS | MAP_PRIVATE, 0);
- if (user_addr >= TASK_SIZE) {
+ if (user_alloc >= TASK_SIZE) {
pr_warn("Failed to allocate user memory\n");
goto free_alloc;
}
+ user_addr = (void __user *)user_alloc;

- memset(good_buf, 'A', cache_size);
- memset(bad_buf, 'B', cache_size);
+ memset(buf, 'B', cache_size);
+
+ /* Whitelisted window in buffer, from kmem_cache_create_usercopy. */
+ offset = (cache_size / 4) + unconst;
+ size = (cache_size / 16) + unconst;

if (to_user) {
- pr_info("attempting good copy_to_user with SLAB_USERCOPY\n");
- if (copy_to_user((void __user *)user_addr, good_buf,
- cache_size)) {
+ pr_info("attempting good copy_to_user inside whitelist\n");
+ if (copy_to_user(user_addr, buf + offset, size)) {
pr_warn("copy_to_user failed unexpectedly?!\n");
goto free_user;
}

- pr_info("attempting bad copy_to_user w/o SLAB_USERCOPY\n");
- if (copy_to_user((void __user *)user_addr, bad_buf,
- cache_size)) {
+ pr_info("attempting bad copy_to_user outside whitelist\n");
+ if (copy_to_user(user_addr, buf + offset - 1, size)) {
pr_warn("copy_to_user failed, but lacked Oops\n");
goto free_user;
}
} else {
- pr_info("attempting good copy_from_user with SLAB_USERCOPY\n");
- if (copy_from_user(good_buf, (void __user *)user_addr,
- cache_size)) {
+ pr_info("attempting good copy_from_user inside whitelist\n");
+ if (copy_from_user(buf + offset, user_addr, size)) {
pr_warn("copy_from_user failed unexpectedly?!\n");
goto free_user;
}

- pr_info("attempting bad copy_from_user w/o SLAB_USERCOPY\n");
- if (copy_from_user(bad_buf, (void __user *)user_addr,
- cache_size)) {
+ pr_info("attempting bad copy_from_user outside whitelist\n");
+ if (copy_from_user(buf + offset - 1, user_addr, size)) {
pr_warn("copy_from_user failed, but lacked Oops\n");
goto free_user;
}
}

free_user:
- vm_munmap(user_addr, PAGE_SIZE);
+ vm_munmap(user_alloc, PAGE_SIZE);
free_alloc:
- if (bad_buf)
- kmem_cache_free(bad_cache, bad_buf);
- kfree(good_buf);
+ if (buf)
+ kmem_cache_free(whitelist_cache, buf);
}

/* Callable tests. */
@@ -255,14 +261,14 @@ void lkdtm_USERCOPY_HEAP_SIZE_FROM(void)
do_usercopy_heap_size(false);
}

-void lkdtm_USERCOPY_HEAP_FLAG_TO(void)
+void lkdtm_USERCOPY_HEAP_WHITELIST_TO(void)
{
- do_usercopy_heap_flag(true);
+ do_usercopy_heap_whitelist(true);
}

-void lkdtm_USERCOPY_HEAP_FLAG_FROM(void)
+void lkdtm_USERCOPY_HEAP_WHITELIST_FROM(void)
{
- do_usercopy_heap_flag(false);
+ do_usercopy_heap_whitelist(false);
}

void lkdtm_USERCOPY_STACK_FRAME_TO(void)
@@ -313,11 +319,15 @@ void lkdtm_USERCOPY_KERNEL(void)
void __init lkdtm_usercopy_init(void)
{
/* Prepare cache with a usercopy whitelist window. */
- bad_cache = kmem_cache_create("lkdtm-no-usercopy", cache_size, 0,
- 0, NULL);
+ whitelist_cache =
+ kmem_cache_create_usercopy("lkdtm-usercopy", cache_size,
+ 0, 0,
+ cache_size / 4,
+ cache_size / 16,
+ NULL);
}

void __exit lkdtm_usercopy_exit(void)
{
- kmem_cache_destroy(bad_cache);
+ kmem_cache_destroy(whitelist_cache);
}
--
2.7.4

2017-09-20 21:00:27

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 27/31] x86: Implement thread_struct whitelist for hardened usercopy

This whitelists the FPU register state portion of the thread_struct for
copying to userspace, instead of the default entire struct.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Mathias Krause <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Acked-by: Rik van Riel <[email protected]>
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/processor.h | 8 ++++++++
2 files changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 971feac13506..6642e8eaff45 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -114,6 +114,7 @@ config X86
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
select HAVE_ARCH_SECCOMP_FILTER
+ select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3fa26a61eabc..868235b967ed 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -488,6 +488,14 @@ struct thread_struct {
*/
};

+/* Whitelist the FPU state from the thread_struct for hardened usercopy. */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+ unsigned long *size)
+{
+ *offset = offsetof(struct thread_struct, fpu.state);
+ *size = fpu_kernel_xstate_size;
+}
+
/*
* Thread-synchronous status.
*
--
2.7.4

2017-09-20 20:46:00

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 04/31] dcache: Define usercopy region in dentry_cache slab cache

From: David Windsor <[email protected]>

When a dentry name is short enough, it can be stored directly in the
dentry itself (instead of in a separate kmalloc allocation). These short
dentry names, stored in struct dentry.d_iname and therefore contained in
the dentry_cache slab cache, need to be copied to userspace.
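
For reference, KMEM_CACHE_USERCOPY() is the convenience wrapper added
earlier in this series for whitelisting a single named field; as a
sketch, it expands along the lines of:

    #define KMEM_CACHE_USERCOPY(__struct, __flags, __field) \
        kmem_cache_create_usercopy(#__struct, \
            sizeof(struct __struct), \
            __alignof__(struct __struct), (__flags), \
            offsetof(struct __struct, __field), \
            sizeof_field(struct __struct, __field), NULL)

so the hunk below whitelists exactly the d_iname field of struct dentry.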

cache object allocation:
fs/dcache.c:
__d_alloc(...):
...
dentry = kmem_cache_alloc(dentry_cache, ...);
...
dentry->d_name.name = dentry->d_iname;

example usage trace:
filldir+0xb0/0x140
dcache_readdir+0x82/0x170
iterate_dir+0x142/0x1b0
SyS_getdents+0xb5/0x160

fs/readdir.c:
(called via ctx.actor by dir_emit)
filldir(..., const char *name, ...):
...
copy_to_user(..., name, namlen)

fs/libfs.c:
dcache_readdir(...):
...
next = next_positive(dentry, p, 1)
...
dir_emit(..., next->d_name.name, ...)

In support of usercopy hardening, this patch defines a region in the
dentry_cache slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: adjust hunks for kmalloc-specific things moved later]
[kees: adjust commit log, provide usage trace]
Cc: Alexander Viro <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
fs/dcache.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..5f5e7c1fcf4b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3603,8 +3603,9 @@ static void __init dcache_init(void)
* but it is probably not worth it because of the cache nature
* of the dcache.
*/
- dentry_cache = KMEM_CACHE(dentry,
- SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
+ dentry_cache = KMEM_CACHE_USERCOPY(dentry,
+ SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+ d_iname);

/* Hash may have been set up in dcache_init_early */
if (!hashdist)
--
2.7.4

2017-09-20 21:02:43

by Kees Cook

[permalink] [raw]
Subject: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches

From: David Windsor <[email protected]>

Mark the kmalloc slab caches as entirely whitelisted. These caches
are frequently used to fulfill kernel allocations that contain data
to be copied to/from userspace. Internal-only uses are also common,
but are scattered in the kernel. For now, mark all the kmalloc caches
as whitelisted.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <[email protected]>
[kees: merged in moved kmalloc hunks, adjust commit log]
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
---
mm/slab.c | 3 ++-
mm/slab.h | 3 ++-
mm/slab_common.c | 10 ++++++----
3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index df268999cf02..9af16f675927 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void)
*/
kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
kmalloc_info[INDEX_NODE].name,
- kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
+ kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
+ 0, kmalloc_size(INDEX_NODE));
slab_state = PARTIAL_NODE;
setup_kmalloc_cache_index_table();

diff --git a/mm/slab.h b/mm/slab.h
index 044755ff9632..2e0fe357d777 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -97,7 +97,8 @@ struct kmem_cache *kmalloc_slab(size_t, gfp_t);
extern int __kmem_cache_create(struct kmem_cache *, unsigned long flags);

extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
- unsigned long flags);
+ unsigned long flags, size_t useroffset,
+ size_t usersize);
extern void create_boot_cache(struct kmem_cache *, const char *name,
size_t size, unsigned long flags, size_t useroffset,
size_t usersize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 36408f5f2a34..d4e6442f9bbc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -920,14 +920,15 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t siz
}

struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
- unsigned long flags)
+ unsigned long flags, size_t useroffset,
+ size_t usersize)
{
struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);

if (!s)
panic("Out of memory when creating slab %s\n", name);

- create_boot_cache(s, name, size, flags, 0, size);
+ create_boot_cache(s, name, size, flags, useroffset, usersize);
list_add(&s->list, &slab_caches);
memcg_link_cache(s);
s->refcount = 1;
@@ -1081,7 +1082,8 @@ void __init setup_kmalloc_cache_index_table(void)
static void __init new_kmalloc_cache(int idx, unsigned long flags)
{
kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
- kmalloc_info[idx].size, flags);
+ kmalloc_info[idx].size, flags, 0,
+ kmalloc_info[idx].size);
}

/*
@@ -1122,7 +1124,7 @@ void __init create_kmalloc_caches(unsigned long flags)

BUG_ON(!n);
kmalloc_dma_caches[i] = create_kmalloc_cache(n,
- size, SLAB_CACHE_DMA | flags);
+ size, SLAB_CACHE_DMA | flags, 0, 0);
}
}
#endif
--
2.7.4

2017-09-20 21:21:49

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v3 14/31] vxfs: Define usercopy region in vxfs_inode slab cache

On Wed, Sep 20, 2017 at 1:56 PM, Christoph Hellwig <[email protected]> wrote:
> Hi Kees,
>
> I've only got this single email from you, which on its own doesn't
> compile and seems to be part of a 31 patch series.
>
> So as-is NAK, doesn't work.
>
> Please make sure to always send every patch in a series to every
> developer you want to include.

This is why I included several other lists on the full CC (am I
unlucky enough to have you not subscribed to any of them?). Adding a
CC for everyone can result in a huge CC list, especially for the
forthcoming 300-patch timer_list series. ;)

Do you want me to resend the full series to you, or would you prefer
something else like a patchwork bundle? (I'll explicitly add you to CC
for any future versions, though.)

-Kees

--
Kees Cook
Pixel Security

2017-09-20 23:22:24

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 14/31] vxfs: Define usercopy region in vxfs_inode slab cache

On Wed, Sep 20, 2017 at 02:21:45PM -0700, Kees Cook wrote:
> This is why I included several other lists on the full CC (am I
> unlucky enough to have you not subscribed to any of them?). Adding a
> CC for everyone can result in a huge CC list, especially for the
> forthcoming 300-patch timer_list series. ;)

If you think the lists are enough to review the changes, include only
the lists and don't add CCs for individual patches. That's what
I usually do for cleanups that touch a lot of drivers but don't
really change actual logic in every little driver touched.

> Do you want me to resend the full series to you, or would you prefer
> something else like a patchwork bundle? (I'll explicitly add you to CC
> for any future versions, though.)

I'm fine with not being Cced at all if there isn't anything requiring
my urgent personal attention. It's up to you whom you want to Cc,
but my preference is generally for rather less than more people, and
rather more than less mailing lists.

But the important bit is to Cc a person or mailing list either on
all patches or on none, otherwise a good review isn't possible.

2017-09-21 09:34:59

by Luis de Bethencourt

[permalink] [raw]
Subject: Re: [PATCH v3 10/31] befs: Define usercopy region in befs_inode_cache slab cache

On 09/20/2017 09:45 PM, Kees Cook wrote:
> From: David Windsor <[email protected]>
>
> befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
> and therefore contained in the befs_inode_cache slab cache, need to be
> copied to/from userspace.
>
> cache object allocation:
> fs/befs/linuxvfs.c:
> befs_alloc_inode(...):
> ...
> bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
> ...
> return &bi->vfs_inode;
>
> befs_iget(...):
> ...
> strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
> BEFS_SYMLINK_LEN);
> ...
> inode->i_link = befs_ino->i_data.symlink;
>
> example usage trace:
> readlink_copy+0x43/0x70
> vfs_readlink+0x62/0x110
> SyS_readlinkat+0x100/0x130
>
> fs/namei.c:
> readlink_copy(..., link):
> ...
> copy_to_user(..., link, len);
>
> (inlined in vfs_readlink)
> generic_readlink(dentry, ...):
> struct inode *inode = d_inode(dentry);
> const char *link = inode->i_link;
> ...
> readlink_copy(..., link);
>
> In support of usercopy hardening, this patch defines a region in the
> befs_inode_cache slab cache in which userspace copy operations are
> allowed.
>
> This region is known as the slab cache's usercopy region. Slab caches can
> now check that each copy operation involving cache-managed memory falls
> entirely within the slab's usercopy region.
>
> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> whitelisting code in the last public patch of grsecurity/PaX based on my
> understanding of the code. Changes or omissions from the original code are
> mine and don't reflect the original grsecurity/PaX code.
>
> Signed-off-by: David Windsor <[email protected]>
> [kees: adjust commit log, provide usage trace]
> Cc: Luis de Bethencourt <[email protected]>
> Cc: Salah Triki <[email protected]>
> Signed-off-by: Kees Cook <[email protected]>
> Acked-by: Luis de Bethencourt <[email protected]>
> ---
> fs/befs/linuxvfs.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
> index a92355cc453b..e5dcd26003dc 100644
> --- a/fs/befs/linuxvfs.c
> +++ b/fs/befs/linuxvfs.c
> @@ -444,11 +444,15 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
> static int __init
> befs_init_inodecache(void)
> {
> - befs_inode_cachep = kmem_cache_create("befs_inode_cache",
> - sizeof (struct befs_inode_info),
> - 0, (SLAB_RECLAIM_ACCOUNT|
> - SLAB_MEM_SPREAD|SLAB_ACCOUNT),
> - init_once);
> + befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
> + sizeof(struct befs_inode_info), 0,
> + (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
> + SLAB_ACCOUNT),
> + offsetof(struct befs_inode_info,
> + i_data.symlink),
> + sizeof_field(struct befs_inode_info,
> + i_data.symlink),
> + init_once);
> if (befs_inode_cachep == NULL)
> return -ENOMEM;
>
>

No changes in the befs patch in v3. It goes without saying I continue to
Ack this.

Thanks Kees and David,
Luis

by Christopher Lameter

Subject: Re: [PATCH v3 01/31] usercopy: Prepare for usercopy whitelisting

On Wed, 20 Sep 2017, Kees Cook wrote:

> diff --git a/include/linux/stddef.h b/include/linux/stddef.h
> index 9c61c7cda936..f00355086fb2 100644
> --- a/include/linux/stddef.h
> +++ b/include/linux/stddef.h
> @@ -18,6 +18,8 @@ enum {
> #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
> #endif
>
> +#define sizeof_field(structure, field) sizeof((((structure *)0)->field))
> +
> /**
> * offsetofend(TYPE, MEMBER)
> *

Hmmm.. Is that really necessary? Code knows the type of field and can
use sizeof type.

Also this is a non slab change hidden in the patchset.

> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 904a83be82de..36408f5f2a34 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -272,6 +272,9 @@ int slab_unmergeable(struct kmem_cache *s)
> if (s->ctor)
> return 1;
>
> + if (s->usersize)
> + return 1;
> +
> /*
> * We may have set a slab to be unmergeable during bootstrap.
> */

This will ultimately make all slabs unmergeable at the end of your
patchset? Lots of space will be wasted. Is there any way to make this
feature optional?

#ifdef CONFIG_HARDENED around this?


> @@ -491,6 +509,15 @@ kmem_cache_create(const char *name, size_t size, size_t align,
> }
> return s;
> }
> +EXPORT_SYMBOL(kmem_cache_create_usercopy);
> +
> +struct kmem_cache *
> +kmem_cache_create(const char *name, size_t size, size_t align,
> + unsigned long flags, void (*ctor)(void *))
> +{
> + return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
> + ctor);
> +}
> EXPORT_SYMBOL(kmem_cache_create);

Well this makes the slab created unmergeable.

> @@ -897,7 +927,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
> if (!s)
> panic("Out of memory when creating slab %s\n", name);
>
> - create_boot_cache(s, name, size, flags);
> + create_boot_cache(s, name, size, flags, 0, size);

Ok this makes the kmalloc array unmergeable.

> @@ -5081,6 +5081,12 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
> SLAB_ATTR_RO(cache_dma);
> #endif
>
> +static ssize_t usersize_show(struct kmem_cache *s, char *buf)
> +{
> + return sprintf(buf, "%zu\n", s->usersize);
> +}
> +SLAB_ATTR_RO(usersize);
> +
> static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
> {
> return sprintf(buf, "%d\n", !!(s->flags & SLAB_TYPESAFE_BY_RCU));
> @@ -5455,6 +5461,7 @@ static struct attribute *slab_attrs[] = {
> #ifdef CONFIG_FAILSLAB
> &failslab_attr.attr,
> #endif
> + &usersize_attr.attr,

So useroffset is not exposed?

by Christopher Lameter

Subject: Re: [PATCH v3 02/31] usercopy: Enforce slab cache usercopy region boundaries

On Wed, 20 Sep 2017, Kees Cook wrote:

> diff --git a/mm/slab.c b/mm/slab.c
> index 87b6e5e0cdaf..df268999cf02 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -4408,7 +4408,9 @@ module_init(slab_proc_init);
>
> #ifdef CONFIG_HARDENED_USERCOPY
> /*
> - * Rejects objects that are incorrectly sized.
> + * Rejects incorrectly sized objects and objects that are to be copied
> + * to/from userspace but do not fall entirely within the containing slab
> + * cache's usercopy region.
> *
> * Returns NULL if check passes, otherwise const char * to name of cache
> * to indicate an error.
> @@ -4428,11 +4430,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
> /* Find offset within object. */
> offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
>
> - /* Allow address range falling entirely within object size. */
> - if (offset <= cachep->object_size && n <= cachep->object_size - offset)
> - return NULL;
> + /* Make sure object falls entirely within cache's usercopy region. */
> + if (offset < cachep->useroffset)
> + return cachep->name;
> + if (offset - cachep->useroffset > cachep->usersize)
> + return cachep->name;
> + if (n > cachep->useroffset - offset + cachep->usersize)
> + return cachep->name;
>
> - return cachep->name;
> + return NULL;
> }
> #endif /* CONFIG_HARDENED_USERCOPY */

Looks like this is almost the same for all allocators. Can we put this
into mm/slab_common.c?

by Christopher Lameter

Subject: Re: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches

On Wed, 20 Sep 2017, Kees Cook wrote:

> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void)
> */
> kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
> kmalloc_info[INDEX_NODE].name,
> - kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
> + kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
> + 0, kmalloc_size(INDEX_NODE));
> slab_state = PARTIAL_NODE;
> setup_kmalloc_cache_index_table();

Ok this presumes that at some point we will be able to restrict the number
of bytes writeable and thus set the offset and size field to different
values. Is that realistic?

We already whitelist all kmalloc caches (see first patch).

So what is the point of this patch?

2017-09-21 15:40:06

by Kees Cook

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches

On Thu, Sep 21, 2017 at 8:27 AM, Christopher Lameter <[email protected]> wrote:
> On Wed, 20 Sep 2017, Kees Cook wrote:
>
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void)
>> */
>> kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
>> kmalloc_info[INDEX_NODE].name,
>> - kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
>> + kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
>> + 0, kmalloc_size(INDEX_NODE));
>> slab_state = PARTIAL_NODE;
>> setup_kmalloc_cache_index_table();
>
> Ok, this presumes that at some point we will be able to restrict the number
> of bytes writeable and thus set the offset and size fields to different
> values. Is that realistic?
>
> We already whitelist all kmalloc caches (see first patch).
>
> So what is the point of this patch?

The DMA kmalloc caches are not whitelisted:

>> kmalloc_dma_caches[i] = create_kmalloc_cache(n,
>> - size, SLAB_CACHE_DMA | flags);
>> + size, SLAB_CACHE_DMA | flags, 0, 0);

So this is creating the distinction between the kmallocs that go to
userspace and those that don't. The expectation is that future work
can start to distinguish between "for userspace" and "only kernel"
kmalloc allocations, as is already done here for DMA.

-Kees

--
Kees Cook
Pixel Security
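
For completeness: with useroffset == 0 and usersize == 0, the bounds checks
from patch 2 reject every copy touching such a cache:

	offset < useroffset                   /* 0 < 0: never true */
	offset - useroffset > usersize        /* rejects any offset > 0 */
	n > useroffset - offset + usersize    /* at offset 0, rejects any n > 0 */

so any copy of one byte or more into or out of a DMA kmalloc object reports
the cache name as an error, which is exactly the "kernel-only" behavior
described above.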

Subject: Re: [kernel-hardening] Re: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches

On Thu, 21 Sep 2017, Kees Cook wrote:

> > So what is the point of this patch?
>
> The DMA kmalloc caches are not whitelisted:

The DMA kmalloc caches are pretty obsolete and mostly there for obscure
drivers.

??

> >> kmalloc_dma_caches[i] = create_kmalloc_cache(n,
> >> - size, SLAB_CACHE_DMA | flags);
> >> + size, SLAB_CACHE_DMA | flags, 0, 0);
>
> So this is creating the distinction between the kmallocs that go to
> userspace and those that don't. The expectation is that future work
> can start to distinguish between "for userspace" and "only kernel"
> kmalloc allocations, as is already done here for DMA.

The creation of the kmalloc caches in earlier patches already set up the
"whitelisting". Why do it twice?

2017-09-21 18:26:48

by Kees Cook

[permalink] [raw]
Subject: Re: [kernel-hardening] Re: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches

On Thu, Sep 21, 2017 at 9:04 AM, Christopher Lameter <[email protected]> wrote:
> On Thu, 21 Sep 2017, Kees Cook wrote:
>
>> > So what is the point of this patch?
>>
>> The DMA kmalloc caches are not whitelisted:
>
> The DMA kmalloc caches are pretty obsolete and mostly there for obscure
> drivers.
>
> ??

They may be obsolete, but they're still in the kernel, and they aren't
copied to userspace, so we can mark them.

>> >> kmalloc_dma_caches[i] = create_kmalloc_cache(n,
>> >> - size, SLAB_CACHE_DMA | flags);
>> >> + size, SLAB_CACHE_DMA | flags, 0, 0);
>>
>> So this is creating the distinction between the kmallocs that go to
>> userspace and those that don't. The expectation is that future work
>> can start to distinguish between "for userspace" and "only kernel"
>> kmalloc allocations, as is already done here for DMA.
>
> The creation of the kmalloc caches in earlier patches already set up the
> "whitelisting". Why do it twice?

Patch 1 adds the infrastructure for callers to mark their whitelists. Patch
30 then disables the implicit full whitelisting, since by that point every
region has been explicitly defined, so the kmalloc caches need to mark
themselves as whitelisted here.

Patch 1 leaves unmarked caches whitelisted so we can progressively tighten
the restriction and keep the series bisectable. (I.e. if one of the
whitelists in the series is wrong, a bisect will land on that patch, not
on the later patch that removes the global whitelist set up in patch 1.)

-Kees

--
Kees Cook
Pixel Security
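
A sketch of the two endpoints of the series (based on the cover letter, not
the exact hunks): after patch 1 the plain kmem_cache_create() whitelists
the whole object, and after patch 30 it whitelists nothing:

	/* After patch 1: whole-object whitelist by default. */
	struct kmem_cache *kmem_cache_create(const char *name, size_t size,
			size_t align, unsigned long flags,
			void (*ctor)(void *))
	{
		return kmem_cache_create_usercopy(name, size, align, flags,
						  0, size, ctor);
	}

	/* After patch 30: no whitelist unless explicitly declared. */
		return kmem_cache_create_usercopy(name, size, align, flags,
						  0, 0, ctor);

Caches that legitimately copy to/from userspace must have declared their
regions explicitly by then, which is what this patch does for the non-DMA
kmalloc caches.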

2017-09-22 02:55:37

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH v3 09/31] jfs: Define usercopy region in jfs_ip slab cache

Acked-by: Dave Kleikamp <[email protected]>

On 09/20/2017 03:45 PM, Kees Cook wrote:
> From: David Windsor <[email protected]>
>
> The jfs symlink pathnames, stored in struct jfs_inode_info.i_inline and
> therefore contained in the jfs_ip slab cache, need to be copied to/from
> userspace.
>
> cache object allocation:
> fs/jfs/super.c:
> jfs_alloc_inode(...):
> ...
> jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
> ...
> return &jfs_inode->vfs_inode;
>
> fs/jfs/jfs_incore.h:
> JFS_IP(struct inode *inode):
> return container_of(inode, struct jfs_inode_info, vfs_inode);
>
> fs/jfs/inode.c:
> jfs_iget(...):
> ...
> inode->i_link = JFS_IP(inode)->i_inline;
>
> example usage trace:
> readlink_copy+0x43/0x70
> vfs_readlink+0x62/0x110
> SyS_readlinkat+0x100/0x130
>
> fs/namei.c:
> readlink_copy(..., link):
> ...
> copy_to_user(..., link, len);
>
> (inlined in vfs_readlink)
> generic_readlink(dentry, ...):
> struct inode *inode = d_inode(dentry);
> const char *link = inode->i_link;
> ...
> readlink_copy(..., link);
>
> In support of usercopy hardening, this patch defines a region in the
> jfs_ip slab cache in which userspace copy operations are allowed.
>
> This region is known as the slab cache's usercopy region. Slab caches can
> now check that each copy operation involving cache-managed memory falls
> entirely within the slab's usercopy region.
>
> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> whitelisting code in the last public patch of grsecurity/PaX based on my
> understanding of the code. Changes or omissions from the original code are
> mine and don't reflect the original grsecurity/PaX code.
>
> Signed-off-by: David Windsor <[email protected]>
> [kees: adjust commit log, provide usage trace]
> Cc: Dave Kleikamp <[email protected]>
> Cc: [email protected]
> Signed-off-by: Kees Cook <[email protected]>
> ---
> fs/jfs/super.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/fs/jfs/super.c b/fs/jfs/super.c
> index 2f14677169c3..e018412608d4 100644
> --- a/fs/jfs/super.c
> +++ b/fs/jfs/super.c
> @@ -966,9 +966,11 @@ static int __init init_jfs_fs(void)
> int rc;
>
> jfs_inode_cachep =
> - kmem_cache_create("jfs_ip", sizeof(struct jfs_inode_info), 0,
> - SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
> - init_once);
> + kmem_cache_create_usercopy("jfs_ip", sizeof(struct jfs_inode_info),
> + 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
> + offsetof(struct jfs_inode_info, i_inline),
> + sizeof_field(struct jfs_inode_info, i_inline),
> + init_once);
> if (jfs_inode_cachep == NULL)
> return -ENOMEM;
>
>
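
The net effect, as a hypothetical illustration (ubuf and len are made up,
not from the patch): only the i_inline bytes of each jfs_ip object may
cross the user/kernel boundary.

	struct jfs_inode_info *jfs_ip = JFS_IP(inode);

	/* Falls entirely inside the whitelisted i_inline region
	 * (len <= sizeof_field(struct jfs_inode_info, i_inline)):
	 * the copy is allowed. */
	copy_to_user(ubuf, jfs_ip->i_inline, len);

	/* One byte past the region: hardened usercopy rejects the
	 * copy and reports the "jfs_ip" cache. */
	copy_to_user(ubuf, jfs_ip->i_inline,
		     sizeof_field(struct jfs_inode_info, i_inline) + 1);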