From: Evan Green
Date: Wed, 14 Jul 2021 15:39:25 -0700
Subject: Re: [PATCH v2] mm: Enable suspend-only swap spaces
To: Michal Hocko
Cc: Andrew Morton, David Hildenbrand, Pavel Machek, Alex Shi,
 Alistair Popple, Jens Axboe, Johannes Weiner, Joonsoo Kim,
 "Matthew Wilcox (Oracle)", Miaohe Lin, Minchan Kim, Vlastimil Babka,
 LKML, linux-mm@kvack.org, linux-api@vger.kernel.org
References: <20210709105012.v2.1.I09866d90c6de14f21223a03e9e6a31f8a02ecbaf@changeid>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 13, 2021 at 10:42 PM Michal Hocko wrote:
>
> On Mon 12-07-21 14:32:05, Evan Green wrote:
> > On Mon, Jul 12, 2021 at 12:03 AM Michal Hocko wrote:
> > >
> > > [Cc linux-api]
> > >
> > > On Fri 09-07-21 10:50:48, Evan Green wrote:
> > > > Currently it's not possible to enable hibernation without also enabling
> > > > generic swap
> > > > for a given swap area. These two use cases are not the
> > > > same. For example there may be users who want to enable hibernation,
> > > > but whose drives don't have the write endurance for generic swap
> > > > activities.
> > > >
> > > > Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
> > > > generic swapping to it. This region can still be wired up for use in
> > > > suspend-to-disk activities, but will never have regular pages swapped to
> > > > it.
> > >
> > > Could you expand some more on why a strict exclusion is really
> > > necessary? I do understand that one might not want to have swap storage
> > > available all the time but considering that swapon is really a light
> > > operation so something like the following should be a reasonable
> > > workaround, no?
> > >     swapon storage/file
> > >     s2disk
> > >     swapoff storage
> >
> > Broadly, it seemed like a reasonable thing for the kernel to be able
> > to do. The workaround you suggest does work for some use cases, but it
> > seems like a gap the kernel could more naturally fill.
> >
> > Without getting too off into the weeds, there a handful of factors
> > that make this change particularly useful to me:
> >
> > * Slicing off part of your SSD to be SLC (single level cell) is
> > expensive. From what I understand you gain endurance and speed at the
> > cost of 3-4x capacity. In other words for every 1GB of SLC space you
> > need for swap, it costs you 3-4GB of storage space out of the primary
> > namespace. So I'm incentivized to size this region as small as
> > possible. Hibernate's speed/endurance requirements are not quite as
> > harsh as regular swap. Steering them separately gives me the ability
> > to put the hibernate image in regular storage, and not be forced to
> > oversize expensive/fast swap space.
>
> OK, this is likely true but it doesn't really explain/justify a
> dedicated swap storage for hibernation.

Wait, yes it does.
Hibernation has less stringent write endurance and speed requirements
than swap, so it makes sense to point it at storage that doesn't pay
the 3x capacity penalty, and save the fancy fast stuff for swap. The
exclusivity makes sense since you're trying not to wear out your
higher capacity storage with unnecessary writes. I'd argue the API
addition is worth it for this reason by itself. Usermode has valid
reasons for wanting to disentangle these.

>
> > * Even with the workaround, swap can end up in the hibernate region.
> > Hibernate starts by allocating its giant 50%-of-memory region, which
> > is often the forcing function for pushing things into swap. With the
> > workaround, even if my hibernate region is in last priority, there's
> > still a reasonable chance I'll end up swapping into it.
>
> Right there is no guarantee but why does that matter at all. From the
> kernel point of view it doesn't really makes much difference what was
> the source of the swapout.
>
> > If I have
> > different security designs for swap space and hibernate, then even a
> > chance of some swap leaking into this region is a problem.
>
> Could you expand some more about the this part please?

Offline attacks (i.e., manipulating storage from underneath the
machine) are a major concern when enabling both swap and hibernate.
But the approach of adding integrity to mitigate offline attacks may
differ between swap and hibernate in the interest of performance. Swap,
for instance, essentially needs a per-page dictionary of hashes for
integrity, since pages can be added and removed arbitrarily. Hibernate,
however, just needs a single hash across the entire image to provide
integrity. If you have swap leaking onto a region where you don't have
integrity enabled (because, say, you handled integrity at the image
level for hibernate, and at the block layer for swap), your swap
integrity story is compromised.
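
To make the difference between the two integrity models concrete,
here is a minimal userspace sketch. It is not taken from the patch or
from any existing kernel code: the names SwapIntegrity, image_hash and
verify_image, the 4096-byte page size, and the choice of SHA-256 are
all assumptions made purely for illustration.

```python
import hashlib
from typing import Dict, Iterable

PAGE_SIZE = 4096  # assumed page size, for illustration only


def digest(data: bytes) -> bytes:
    """SHA-256 stands in for whatever hash a real design would choose."""
    return hashlib.sha256(data).digest()


class SwapIntegrity:
    """Hypothetical per-page integrity table for a swap region."""

    def __init__(self) -> None:
        # swap slot number -> digest of the page currently stored there
        self.page_hashes: Dict[int, bytes] = {}

    def swap_out(self, slot: int, page: bytes) -> None:
        # Pages come and go in arbitrary order, so every slot needs
        # its own entry.
        self.page_hashes[slot] = digest(page)

    def swap_in(self, slot: int, page: bytes) -> bool:
        # The entry is consumed on swap-in because the slot is then free.
        expected = self.page_hashes.pop(slot, None)
        return expected is not None and expected == digest(page)


def image_hash(pages: Iterable[bytes]) -> bytes:
    """Hypothetical hibernate integrity: one digest over the whole image."""
    h = hashlib.sha256()
    for page in pages:
        h.update(page)
    return h.digest()


def verify_image(pages: Iterable[bytes], expected: bytes) -> bool:
    """Recompute the single image digest at resume time and compare."""
    return image_hash(pages) == expected
```

In this model, a swapped page that lands in a region covered only by
the image-level check never gets an entry in page_hashes, so nothing
can vouch for it when it is read back. That is the kind of gap the
strict exclusion is meant to rule out.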
There's a (likely defunct) series from Matthew Garrett that expounds a
bit on some of this, though it's also partially tangential:
https://lore.kernel.org/lkml/20210220013255.1083202-1-matthewgarrett@google.com/

>
> > * I also want to limit the online attack surface that swap presents.
> > I can make headway here by disallowing open() calls on active swap
> > regions (via an LSM), and permanently disabling swapon/swapoff system
> > calls after early init. The workaround isn't great for me because I
> > want to set everything up at early init time and then not touch it. By
> > suspend time, on my system I no longer have the ability to make
> > swapon/swapoff calls.
>
> This is clearly a policy call.

The goal was to show examples of why the workaround was insufficient.
Yes, the response to any particular example could be "just don't
choose to do that", but I'm hoping to show examples from several
different angles of how the flag is a valuable knob for usermode to
have.

>
> All that being said, I am still missing any justification for the
> dedicated swap storage. This is an ABI thing so the reasoning should be
> really solid.

I'm hoping it is. I sympathize with the awkwardness of "swapon, but
don't swap!". But from what I can tell, there is no other route that
wouldn't be hugely disruptive and risk breaking compatibility for
folks who want to continue to combine their hibernate and swap
regions.

I don't think this digs the design hole deeper. Yes, the ship on this
design sailed long ago. But if we ever did try to dig ourselves out of
the swap/hibernate hole by providing new APIs to handle them
separately, this flag would serve as a good cutover to divert out of
the swap code and into the new shiny hibernate-only code. The APIs are
never going to be totally disentangled, so a clean cutover opportunity
is the best one can hope for.

-Evan