Date: Mon, 23 Dec 2019 20:45:51 +0000
From: Chris Down
To: Matthew Wilcox
Cc: Amir Goldstein, linux-fsdevel, Al Viro, Jeff Layton, Johannes Weiner,
    Tejun Heo, linux-kernel, kernel-team@fb.com, Hugh Dickins,
    Miklos Szeredi, "zhengbin (A)", Roman Gushchin
Subject: Re: [PATCH] fs: inode: Reduce volatile inode wraparound risk when ino_t is 64 bit
Message-ID: <20191223204551.GA272672@chrisdown.name>
References: <20191220024936.GA380394@chrisdown.name> <20191220121615.GB388018@chrisdown.name> <20191220164632.GA26902@bombadil.infradead.org> <20191220195025.GA9469@bombadil.infradead.org>
In-Reply-To: <20191220195025.GA9469@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Matthew Wilcox writes:
>On Fri, Dec 20, 2019 at 07:35:38PM +0200, Amir Goldstein wrote:
>> On Fri, Dec 20, 2019 at 6:46 PM Matthew Wilcox wrote:
>> >
>> > On Fri, Dec 20, 2019 at 03:41:11PM +0200, Amir Goldstein wrote:
>> > > Suggestion:
>> > > 1. Extend the kmem_cache API to let the ctor() know if it is
>> > > initializing an object
>> > > for the first time (new page) or recycling an object.
>> >
>> > Uh, what? The ctor is _only_ called when new pages are allocated.
>> > Part of the contract with the slab user is that objects are returned to
>> > the slab in an initialised state.
>>
>> Right. I mixed up the ctor() with alloc_inode().
>> So is there anything stopping us from reusing an existing non-zero
>> value of i_ino in shmem_get_inode()? for recycling shmem ino
>> numbers?
>
>I think that would be an excellent solution to the problem! At least,
>I can't think of any problems with it.

Thanks for the suggestions and feedback, Amir and Matthew :-)

The slab i_ino recycling approach works somewhat, but is unfortunately neutered
quite a lot by the fact that slab recycling is per-memcg. That is, replacing
with recycle_or_get_next_ino(old_ino)[0] for shmfs and a few other trivial
callsites only leads to about 10% slab reuse, which doesn't really stem the
bleeding of 32-bit inums on an affected workload:

    # tail -5000 /sys/kernel/debug/tracing/trace | grep -o 'recycle_or_get_next_ino:.*' | sort | uniq -c
       4454 recycle_or_get_next_ino: not recycled
        546 recycle_or_get_next_ino: recycled

Roman (who I've just added to cc) tells me that currently we only have
per-memcg slab reuse instead of global when using CONFIG_MEMCG. This
contributes fairly significantly here since there are multiple tasks across
multiple cgroups which are contributing to the get_next_ino() thrash.

I think this is a good start, but we need something of a different magnitude in
order to actually solve this problem with the current slab infrastructure. How
about something like the following?

1. Add get_next_ino_full, which uses whatever the full width of ino_t is
2. Use get_next_ino_full in tmpfs (et al.)
3. Add a mount option to tmpfs (et al.), say `32bit-inums`, which people can
   pass if they want the 32-bit inode numbers back. This would still allow
   people who want to make this tradeoff to use xino.
4. (If you like) Also add a CONFIG option to disable this at compile time.

I'd appreciate your thoughts on that approach or others you have ideas about.
Thanks!
:-)

0:

unsigned int recycle_or_get_next_ino(ino_t old_ino)
{
	/*
	 * get_next_ino returns unsigned int. If this fires then i_ino must be
	 * >32 bits and have been changed later, so the caller shouldn't be
	 * recycling inode numbers.
	 *
	 * Compare against UINT_MAX rather than shifting by the width of
	 * unsigned int, since that shift is undefined when ino_t is 32 bits.
	 */
	WARN_ONCE(old_ino > UINT_MAX,
		  "Recyclable i_ino uses more bits than unsigned int: %llu",
		  (u64)old_ino);

	if (old_ino) {
		/* Debug only: trace roughly 1% of calls to limit noise */
		if (prandom_u32() % 100 == 0)
			trace_printk("recycled\n");
		return old_ino;
	} else {
		if (prandom_u32() % 100 == 0)
			trace_printk("not recycled\n");
		return get_next_ino();
	}
}
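
For illustration only, steps 1 to 3 of the numbered proposal above could be modelled in userspace C roughly as follows. This is a sketch under stated assumptions, not kernel code: `struct ino_allocator`, `model_next_ino_full()` and `model_next_ino()` are hypothetical names, the `use_32bit_inums` flag stands in for the suggested `32bit-inums` mount option, and a plain non-atomic counter replaces whatever locking or batching a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct ino_allocator {
	uint64_t last;		/* last value handed out */
	bool use_32bit_inums;	/* models the proposed `32bit-inums` option */
};

/* Models step 1: hand out numbers using the full 64-bit width. */
static uint64_t model_next_ino_full(struct ino_allocator *a)
{
	assert(a != NULL);

	a->last++;
	if (a->last == 0)	/* 0 means "no number yet" to the recycling code */
		a->last++;
	return a->last;
}

/* Models steps 2 and 3: full width unless the caller opted out. */
static uint64_t model_next_ino(struct ino_allocator *a)
{
	uint64_t ino = model_next_ino_full(a);

	if (a->use_32bit_inums) {
		ino = (uint32_t)ino;	/* keep only the low 32 bits */
		if (ino == 0)		/* skip 0 after a 32-bit wrap */
			ino = (uint32_t)model_next_ino_full(a);
	}
	return ino;
}
```

Starting both variants at the 32-bit limit shows the tradeoff: the full-width allocator keeps counting upward, while the opted-out one wraps back to 1, which is where collisions with still-live inode numbers become possible.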