References: <20210316041645.144249-1-arjunroy.kdev@gmail.com>
From: Arjun Roy
Date: Wed, 24 Mar 2021 13:39:06 -0700
Subject: Re: [mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy
To: Michal Hocko
Cc: Johannes Weiner, Arjun Roy, Andrew Morton, David Miller, netdev,
    Linux Kernel Mailing List, Cgroups, Linux MM, Shakeel Butt, Eric Dumazet,
    Soheil Hassas Yeganeh, Jakub Kicinski, Yang Shi, Roman Gushchin
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 24, 2021 at 2:12 AM Michal Hocko wrote:
>
> On Tue 23-03-21 11:47:54, Arjun Roy wrote:
> > On Tue, Mar 23, 2021 at 7:34 AM Michal Hocko wrote:
> > >
> > > On Wed 17-03-21 18:12:55, Johannes Weiner wrote:
> > > [...]
> > > > Here is an idea of how it could work:
> > > >
> > > > struct page already has
> > > >
> > > >         struct {        /* page_pool used by netstack */
> > > >                 /**
> > > >                  * @dma_addr: might require a 64-bit value even on
> > > >                  * 32-bit architectures.
> > > >                  */
> > > >                 dma_addr_t dma_addr;
> > > >         };
> > > >
> > > > and as you can see from its union neighbors, there is quite a bit more
> > > > room to store private data necessary for the page pool.
> > > >
> > > > When a page's refcount hits zero and it's a networking page, we can
> > > > feed it back to the page pool instead of the page allocator.
> > > >
> > > > From a first look, we should be able to use the PG_owner_priv_1 page
> > > > flag for network pages (see how this flag is overloaded, we can add a
> > > > PG_network alias). With this, we can identify the page in __put_page()
> > > > and __release_page(). These functions are already aware of different
> > > > types of pages and do their respective cleanup handling. We can
> > > > similarly make network a first-class citizen and hand pages back to
> > > > the network allocator from in there.
> > >
> > > For compound pages we have a concept of destructors. Maybe we can extend
> > > that for order-0 pages as well. The struct page is heavily packed and
> > > compound_dtor shares the storage without other metadata
> > >         int pages;                        /*    16     4 */
> > >         unsigned char compound_dtor;      /*    16     1 */
> > >         atomic_t hpage_pinned_refcount;   /*    16     4 */
> > >         pgtable_t pmd_huge_pte;           /*    16     8 */
> > >         void * zone_device_data;          /*    16     8 */
> > >
> > > But none of those should really require to be valid when a page is freed
> > > unless I am missing something. It would really require to check their
> > > users whether they can leave the state behind. But if we can establish a
> > > contract that compound_dtor can be always valid when a page is freed
> > > this would be really a nice and useful abstraction because you wouldn't
> > > have to care about the specific type of page.
> > >
> > > But maybe I am just overlooking the real complexity there.
> > > --
> >
> > For now probably the easiest way is to have network pages be first
> > class with a specific flag as previously discussed and have concrete
> > handling for it, rather than trying to establish the contract across
> > page types.
>
> If you are going to claim a page flag then it would be much better to
> have it more generic.
> Flags are really scarce and if all you care about
> is PageHasDestructor() and provide one via page->dtor then the similar
> mechanism can be reused by somebody else. Or does anything prevent that?

The way I see it - the fundamental want here is, for some arbitrary
page that we are dropping a reference on, to be able to tell that the
provenance of the page is some network driver's page pool. If we added
an enum value to compound_dtor and then examined that offset in the
page, what guarantee do we have that the page isn't instead some other
kind of page, where the byte value there was just coincidentally the
one we were looking for (but it wasn't a network driver pool page)?

Existing users of compound_dtor seem to check first that
PageCompound() or PageHead() returns true. In the specific scenario
here, receiving network packets, the pages will tend not to be
compound (and more specifically, compound pages are explicitly
disallowed for TCP receive zerocopy).

Given that's the case, the options seem to be:
1) Use a page flag - with the downside that they are a severely
limited resource,
2) Use some bits inside page->memcg_data - this I believe Johannes had
reasons against, and it isn't always the case that MEMCG support is
enabled.
3) Use compound_dtor - but I think this would have problems for the
prior reasons.

Thanks,
-Arjun

> --
> Michal Hocko
> SUSE Labs
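
To make the aliasing concern above concrete, here is a minimal userspace
sketch. It is not kernel code: struct mock_page, the MOCK_* constants and
all helper names are hypothetical, and the union only imitates the way
compound_dtor shares storage with other fields. It shows that a byte
compare alone can match an unrelated page, that a PageHead()-style guard
rejects order-0 pool pages, and that a dedicated flag separates the two.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model for illustration; NOT the kernel's struct page. */
#define MOCK_PG_HEAD      (1UL << 0) /* stands in for PageHead() */
#define MOCK_PG_NETWORK   (1UL << 1) /* stands in for a PG_network-style flag */
#define MOCK_DTOR_NETWORK 3          /* made-up destructor id for pool pages */

struct mock_page {
	unsigned long flags;
	union {                          /* overlapping users of the same storage */
		unsigned char compound_dtor;
		void *zone_device_data;
		unsigned char raw[sizeof(void *)];
	};
};

/* Option 3 without a guard: trust the byte at the compound_dtor offset. */
static bool net_page_by_dtor_unguarded(const struct mock_page *p)
{
	return p->compound_dtor == MOCK_DTOR_NETWORK;
}

/* Option 3 with the head-page check that existing users appear to do first. */
static bool net_page_by_dtor_guarded(const struct mock_page *p)
{
	return (p->flags & MOCK_PG_HEAD) &&
	       p->compound_dtor == MOCK_DTOR_NETWORK;
}

/* Option 1: a dedicated page flag. */
static bool net_page_by_flag(const struct mock_page *p)
{
	return (p->flags & MOCK_PG_NETWORK) != 0;
}

/* An unrelated page whose overlapping data happens to start with byte 3. */
static struct mock_page make_unrelated_page(void)
{
	struct mock_page p = { .flags = 0 };
	p.raw[0] = MOCK_DTOR_NETWORK;
	return p;
}

/* An order-0 (non-compound) page owned by a network page pool. */
static struct mock_page make_pool_page(void)
{
	struct mock_page p = { .flags = MOCK_PG_NETWORK };
	p.compound_dtor = MOCK_DTOR_NETWORK;
	return p;
}

/* Walks through the three cases; call it from main() to run the checks. */
static void check_model(void)
{
	struct mock_page stray = make_unrelated_page();
	struct mock_page pool = make_pool_page();

	assert(net_page_by_dtor_unguarded(&stray)); /* false positive */
	assert(!net_page_by_dtor_guarded(&pool));   /* pool page is not a head page */
	assert(net_page_by_flag(&pool) && !net_page_by_flag(&stray));
}
```

In this model the unguarded byte check cannot tell the stray page from a
pool page, and the guarded check never matches an order-0 pool page, which
is why only the flag-based test gives the right answer for both pages.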