Received: by 10.192.165.148 with SMTP id m20csp182169imm; Fri, 4 May 2018 08:36:26 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpwwHsDsqC+m2Dq6R4/AeyIDc70gQhao74GhisiPRvbmhh5d6ZM5r88tp/bFJmE7ApvpDTG X-Received: by 2002:a17:902:6bca:: with SMTP id m10-v6mr28312370plt.6.1525448186935; Fri, 04 May 2018 08:36:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525448186; cv=none; d=google.com; s=arc-20160816; b=XhhEu/WbGbuL6s6LwyRiiyRUKqQK01XtGsVLqVXqTfJ9OUUEC8Xoz64q6VgsChNecA ZfuUlexni3bnBdBDX3aY33BHxIonBH6/OGHDn07cpp+ARorVZSfxXLy8O+dmuDhY9Tja 27dukj8LgCJ3/Laodb90sLKYtZUIsmRCsq7CHogegCfhyf4Fy+M4qfWeFcDhlLceGh7b vNzuKvhEF/x9hfue2Tx84DvwjD/Q3GOyrPUAbnISMObKs9xW+zPNrkO4ZsaXNRZn2Xj/ WOn774ndrTrkdomGHjwW0iHUXNxxI/MIoChiMx4MjgO+89dvFwgLA/Ch78WRhIL5Qaxg K8Dg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature :arc-authentication-results; bh=lyvWhGEkRb277ldoyqKBp0lx6jAgMJNEj1aaCsd2rk0=; b=XRvznstUOJkIWy9kt879WO9FHlkySa1AbNyD8Z4/7/hVE5J7TkiPAt2+Q1LGVeMVqI Vjf2RUKpLWE54iD6TZdVmSgn6NoshEM7Fneauo9aRLSFkW87ZIBAs5M7qLw3oQSDeEJi leSfPdkWgLGhBFKXPdRYLIV6ayWUOxP851a/U8l72C08qHVH+0mxZBl5O/lr9jkG49GN qaqzC5zlKVmnE2cH6mA788wrSrPfS5YEhHnsisitl7YnahhNa7h5JILlevsyodWlUtsY Ug1Y0zog0YQF0P3gVJVO9ZvzU8ArTg6GFwyOjJ0NtvY7X2JeVl1mb0wlBIq6401orD0S fYwQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=TJ9ojVN/; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a35-v6si12157539pli.85.2018.05.04.08.36.12; Fri, 04 May 2018 08:36:26 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=TJ9ojVN/; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751573AbeEDPfk (ORCPT + 99 others); Fri, 4 May 2018 11:35:40 -0400 Received: from mail-io0-f193.google.com ([209.85.223.193]:38394 "EHLO mail-io0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751538AbeEDPfh (ORCPT ); Fri, 4 May 2018 11:35:37 -0400 Received: by mail-io0-f193.google.com with SMTP id z4-v6so26125615iof.5 for ; Fri, 04 May 2018 08:35:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=lyvWhGEkRb277ldoyqKBp0lx6jAgMJNEj1aaCsd2rk0=; b=TJ9ojVN/llitNJiOhzKDq+NYU/skpAKPyRskPmnv+z4ARs22RTPN46yV2gXGt3IRzG gA+z5RFMLvJiOmsj7RPKKRkQupeD5eLYBMqLbnfMUYaYtkFJ+zrtOpbxBd7lwFOifLtr sORWSBrgZfbkUpPjR+yv4Ys0eaw+y97+HQhmo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=lyvWhGEkRb277ldoyqKBp0lx6jAgMJNEj1aaCsd2rk0=; b=HuaRaOEqN9NMbrW7ziqCx7wgQv6fjnyyTlmg6uY/vBIVK++O+sSUj88GUSREo1mDIW VL0j/VYjY9qwxtdcAr/BWGSyuRNWXseiqkD5a9mOmgCBH3N3Ll1huqn0ImeMVGdebs1Z +inVpE4YCD1dZVVWxohacwhl+rRYBbUgOI6Pm/L8nW1eFFgcRPoT0HWv5f7whQdBBv17 BkYo3pH2uVCEhAO7DTinsIlNJsObE+Yr52ZPdR6H4CSM22xVJ67arRXLCeX8iGni9QzD dAR0IjZB+iEVfHSWIiFUnZoC5yWd53XCy9Yybdm2fNtoYLktM9J+aCXP+3VkCJ49WcrW p7CQ== X-Gm-Message-State: ALQs6tChB/9YjG8CQUTUCNoZp+GmiDw3SIlQSftCkslAkJK7LBdKNgte UMBKwuPIf7uSQLCN5d+AearrOYYb+G3imCBi/9M= X-Received: by 2002:a6b:dc12:: with SMTP id s18-v6mr31526316ioc.203.1525448136953; Fri, 04 May 2018 08:35:36 -0700 (PDT) MIME-Version: 1.0 References: <20180214182618.14627-1-willy@infradead.org> <20180214182618.14627-3-willy@infradead.org> <20180504131441.GA24691@bombadil.infradead.org> In-Reply-To: <20180504131441.GA24691@bombadil.infradead.org> From: Linus Torvalds Date: Fri, 04 May 2018 15:35:26 +0000 Message-ID: Subject: Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct To: Matthew Wilcox Cc: Andrew Morton , Matthew Wilcox , linux-mm , Kees Cook , Linux Kernel Mailing List , Kernel Hardening Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 4, 2018 at 3:14 AM Matthew Wilcox wrote: > > In fact, the conversion I saw was buggy. You can *not* convert a GFP_ATOMIC > > user of kmalloc() to use kvmalloc. > Not sure which conversion you're referring to; not one of mine, I hope? I was thinking of the coccinelle patch in this thread, but just realized that that actually only did it for GFP_KERNEL, so I guess it would work, apart from the "oops, now it doesn't enforce the kmalloc limits any more" issue. > > - that divide is really really expensive on many architectures. > 'c' and 'size' are _supposed_ to be constant and get evaluated at > compile-time. ie you should get something like this on x86: I guess that willalways be true of the 'kvzalloc_struct() version that will always use a sizeof(). I was more thinking of any bare kvalloc_ab_c() cases, but maybe we'd discourage that to ever be used as such? Because we definitely have things like that, ie a quick grep finds f = kmalloc (sizeof (*f) + size*num, GFP_KERNEL); where 'size' is not obviously a constant. There may be others, but I didn't really try to grep any further. Maybe they aren't common, and maybe the occasional divide doesn't matter, but particularly if we use scripting to then catch and convert users, I really hate the idea of "let's introduce something that is potentially much more expensive than it needs to be". (And the automated coccinelle scripting it's also something where we must very much avoid then subtly lifting allocation size limits) > > Something like > > > > if (size > PAGE_SIZE) > > return NULL; > > if (elem > 65535) > > return NULL; > > if (offset > PAGE_SIZE) > > return NULL; > > return kzalloc(size*elem+offset); > > > > and now you (a) guarantee it can't overflow and (b) don't make people use > > crazy vmalloc() allocations when they damn well shouldn't. > I find your faith in the size of structs in the kernel touching ;-) I *really* hope some of those examples of yours aren't allocated with kmalloc using this pattern.. But yes, I may be naive on the sizes. > We really have two reasons for using vmalloc -- one is "fragmentation > currently makes it impossible to allocate enough contiguous memory > to satisfy your needs" and the other is "this request is for too much > memory to satisfy through the buddy allocator". kvmalloc is normally > (not always; see file descriptor example above) for the first kind of > problem, but I wonder if kvmalloc() shouldn't have the same limit as > kmalloc (2048 pages), then add a kvmalloc_large() which will not impose > that limit check. That would definitely solve at least one worry. We do potentially have users which use kmalloc optimistically and do not want to fall back to vmalloc (they'll fall back to smaller allocations entirely), but I guess if fwe make sure to not convert any __GFP_NOWARN/NORETRY users, that should all be ok. But honestly, I'd rather see just kmalloc users stay as kmalloc users. If instread of "kzvmalloc_ab_c()" you introduce the "struct size calculation" part as a "saturating non-overflow calculation", then it should be fairly easy to just do #define kzalloc_struct(p, member, n, gfp) \ kzalloc(struct_size(p, member, n, gfp) and you basically *know* that the only thing you changed was purely the overflow semantics. That also would take care of my diide worry, because now there wouldn't be any ab_c() calculations that might take a non-constant size. The "struct_size()" thing would always do the sizeof. So you'd have something like /* 'b' and 'c' are always constants */ static inline sizeof_t __ab_c_saturating(size_t a, size_t b, size_t c) { if (b && n > (SIZE_MAX -c) / size) return SIZE_MAX; return a*b+c; } #define struct_size(p, member, n) \ __ab_c_saturating(n, \ sizeof(*(p)->member) + __must_be_array((p)->member), \ offsetof(typeof(*(p)), member)) and then that kzalloc_struct() wrapper define above should just work. The above is entirely untested, but it *looks* sane, and avoids all semantic changes outside of the overflow protection. And nobody would use that __ab_c_saturating() by mistake, I hope. No? Linus