Date: Wed, 24 Mar 2021 17:24:09 -0400
From: Johannes Weiner
To: Michal Hocko
Cc: Arjun Roy, Andrew Morton, David Miller, netdev,
    Linux Kernel Mailing List, Cgroups, Linux MM, Shakeel Butt,
    Eric Dumazet, Soheil Hassas Yeganeh, Jakub Kicinski, Yang Shi,
    Roman Gushchin
Subject: Re: [mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy
References: <20210316041645.144249-1-arjunroy.kdev@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 24, 2021 at 10:12:46AM +0100, Michal Hocko wrote:
> On Tue 23-03-21 11:47:54, Arjun Roy wrote:
> > On Tue, Mar 23, 2021 at 7:34 AM Michal Hocko wrote:
> > >
> > > On Wed 17-03-21 18:12:55, Johannes Weiner wrote:
> > > [...]
> > > > Here is an idea of how it could work:
> > > >
> > > > struct page already has
> > > >
> > > >                 struct {        /* page_pool used by netstack */
> > > >                         /**
> > > >                          * @dma_addr: might require a 64-bit value even on
> > > >                          * 32-bit architectures.
> > > >                          */
> > > >                         dma_addr_t dma_addr;
> > > >                 };
> > > >
> > > > and as you can see from its union neighbors, there is quite a bit more
> > > > room to store private data necessary for the page pool.
> > > >
> > > > When a page's refcount hits zero and it's a networking page, we can
> > > > feed it back to the page pool instead of the page allocator.
> > > >
> > > > From a first look, we should be able to use the PG_owner_priv_1 page
> > > > flag for network pages (see how this flag is overloaded, we can add a
> > > > PG_network alias). With this, we can identify the page in __put_page()
> > > > and __release_page(). These functions are already aware of different
> > > > types of pages and do their respective cleanup handling. We can
> > > > similarly make network a first-class citizen and hand pages back to
> > > > the network allocator from in there.
> > >
> > > For compound pages we have a concept of destructors. Maybe we can extend
> > > that for order-0 pages as well. The struct page is heavily packed and
> > > compound_dtor shares the storage without other metadata
> > >         int                        pages;                 /*    16     4 */
> > >         unsigned char              compound_dtor;         /*    16     1 */
> > >         atomic_t                   hpage_pinned_refcount; /*    16     4 */
> > >         pgtable_t                  pmd_huge_pte;          /*    16     8 */
> > >         void *                     zone_device_data;      /*    16     8 */
> > >
> > > But none of those should really require to be valid when a page is freed
> > > unless I am missing something. It would really require to check their
> > > users whether they can leave the state behind. But if we can establish a
> > > contract that compound_dtor can be always valid when a page is freed
> > > this would be really a nice and useful abstraction because you wouldn't
> > > have to care about the specific type of page.

Yeah, technically nobody should leave these fields behind, but it
sounds pretty awkward to manage an overloaded destructor with a
refcounted object:

Either every put would have to check ref==1 first to see whether it
will be the one to free the page, and then set up the destructor
before putting the final ref. But that means we can't support lockless
tryget() schemes, like we have in the page cache, together with a
destructor.

Or you'd have to set up the destructor every time an overloaded field
reverts to its null state, e.g. when hpage_pinned_refcount goes back
to 0.

Neither of those sounds practical to me.

> > > But maybe I am just overlooking the real complexity there.
> > > --
> >
> > For now probably the easiest way is to have network pages be first
> > class with a specific flag as previously discussed and have concrete
> > handling for it, rather than trying to establish the contract across
> > page types.
>
> If you are going to claim a page flag then it would be much better to
> have it more generic. Flags are really scarce and if all you care about
> is PageHasDestructor() and provide one via page->dtor then the similar
> mechanism can be reused by somebody else. Or does anything prevent that?

I was suggesting to alias PG_owner_priv_1, which currently isn't used
on network pages. We don't need to allocate a brand-new page flag.

I agree that a generic destructor for order-0 pages would be nice, but
due to the decentralized nature of refcounting, the only way I think
it would work in practice is by adding a new field to struct page that
is not in conflict with any existing ones.

By comparison, creating a network page type consumes no additional
space.
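
To make the flag-alias idea concrete, here is a rough userspace
sketch. It is not kernel code and not a proposed patch. Only
PG_owner_priv_1 and the PG_network alias come from the discussion
above; struct fake_page, page_tryget(), page_put(), pool_recycle() and
allocator_free() are hypothetical stand-ins, and the bit value chosen
for the flag is arbitrary. It only models the point that a type flag
set at allocation time lets whichever put drops the last reference
pick the right release path, without any put having to know in advance
that it will be the final one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag layout: PG_network reuses an existing bit. */
enum {
	PG_owner_priv_1 = 1 << 0,
	PG_network      = PG_owner_priv_1,	/* alias, no new bit consumed */
};

/* Hypothetical stand-in for struct page: a refcount plus type flags. */
struct fake_page {
	atomic_int   refcount;
	unsigned int flags;
};

static int pool_returns;	/* pages handed back to the mock page pool */
static int allocator_frees;	/* pages handed back to the mock allocator */

static void pool_recycle(struct fake_page *p)   { (void)p; pool_returns++; }
static void allocator_free(struct fake_page *p) { (void)p; allocator_frees++; }

/* Lockless tryget: take a reference only while the count is nonzero. */
static bool page_tryget(struct fake_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old > 0) {
		/* On failure, 'old' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the put that reaches zero picks the release path. */
static void page_put(struct fake_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old >= 1);	/* catch refcount underflow in the sketch */
	if (old != 1)
		return;

	if (p->flags & PG_network)
		pool_recycle(p);	/* network page: back to the pool */
	else
		allocator_free(p);	/* everything else: back to the allocator */
}
```

Nothing is written to the page on the put side in this model: the flag
is static for the lifetime of the page, so page_tryget() and
page_put() can race freely and the release path is still decided by
whoever observes the 1 -> 0 transition.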