Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp4218637ybl; Sat, 21 Dec 2019 02:18:51 -0800 (PST) X-Google-Smtp-Source: APXvYqzSdav0nW2nshTvMo5uqkQLggSAdY5QzIpLy2d32SfO8wgC3aY8+uTGz3aWd3rGmOpOzcbA X-Received: by 2002:a9d:20e4:: with SMTP id x91mr19625288ota.335.1576923530955; Sat, 21 Dec 2019 02:18:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1576923530; cv=none; d=google.com; s=arc-20160816; b=l8r3lAS57+mLkth5EMTx4VlvoHAi+m+aFcp1BQKndptpvJC7XpMUsvuG09DbizqmE0 TZHvz4SVFBBf7mnZkbzE1AGdibhA3W71wSwOnMIUGHVU4UKD6/2cpTRd65Y6TB/+kGcj vLxJ75Vn5S8wc8hbX2wfsaV8jtwKvglYX50jZdjg+VkWxeqSYdabA1xNRNVJHq5h2jih /8bqceEKr5UBrVOeW4TVwPtqhQ8LEpG2E21V4VM2EjlZC9UeHVJiqCs9mqmXRLFLOH02 7s40wED43WQO0ECixeessXLcA1TVU8h6atxQlVd3GiTZkSFGAQcUTzqxuF2SfWj+PiE8 rbIg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=w7EMZsX1P6p24D8oxJ5lDhfZGtIC/1lVYlrnNXeuQH4=; b=nfeBAREEXK7Y+e7UO0FpeewyEJ0JAC1IsjQyLI8b0mswMi9nQlaIIp4PoWku8IADRA aQ7r0d1iujhZP/gysvMfBrSiVwY7tC4WSDfCx19vvZBXbSiffSHJeWG4bQYjgkkVv5bc zA/DJf25X/VoRKFDvHj2NAMZ+4mUjO6SLuHgArq/ikfgbIJjEWO2ZLR7nYD78tMzhISY RXDbtqrDRl30x0SnymQ+q8i1xSG6YrOvCjFBsis+m2xzn0OoLBTNRVg5tWH6XRjrvia/ IX6EkV9UR6oJFUSQsHn5Cw8vdRN7gJhzblLAaiSt6iVjhscu1B6hqH9KbUDAsxvCZbud IFSw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chrisdown.name header.s=google header.b=Rq4LsVBw; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chrisdown.name Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l7si6842275oth.11.2019.12.21.02.18.36; Sat, 21 Dec 2019 02:18:50 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@chrisdown.name header.s=google header.b=Rq4LsVBw; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chrisdown.name Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726189AbfLUKQ4 (ORCPT + 99 others); Sat, 21 Dec 2019 05:16:56 -0500 Received: from mail-wm1-f68.google.com ([209.85.128.68]:55190 "EHLO mail-wm1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725944AbfLUKQ4 (ORCPT ); Sat, 21 Dec 2019 05:16:56 -0500 Received: by mail-wm1-f68.google.com with SMTP id b19so11281648wmj.4 for ; Sat, 21 Dec 2019 02:16:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chrisdown.name; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=w7EMZsX1P6p24D8oxJ5lDhfZGtIC/1lVYlrnNXeuQH4=; b=Rq4LsVBwxYKw74dI8FSDPQwHMWSyqhrfuDWib+CxcFiQE+kws197g+JzT+PxYDTmD5 FMcelA158Vpx3rW0QMRhO9CsMd4cyy1sKDhJe80scYqoBrsxroKFfwk1JZpGhXoa5apQ MXFmF0Y3+qIxm+ClyYCqMHx9AnYAefTs7VmVg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=w7EMZsX1P6p24D8oxJ5lDhfZGtIC/1lVYlrnNXeuQH4=; b=DK9xhUEAvuc3yYKSOuiN2C6SUGCS281zB7SLNFsvolu/D8J9dPjPvMWdi/g5SKtKzg IqVJCTYQ3Kfs5mDLqKSbQ/2fFmH5Bhe62JtTeB2bADlAyQlmuo1YxWcUeqvjmUd/PDuY 572bpmRGoZmISj/mn75Gh/P7HBXYXyx6nu2fHvVVcrnTVttysrwTS9VVaVw7PfTYip6k PWuNotIqIXCryNo/8jX7qFaxJvGYop4KI/6NITg+4Uf/VqcOTozb0ru3iqgV4ZTFnvMK yuwobBgWD1m6O9izn/YH/YChhdxzfhERy5PdmcGcQiIX3YSx25hMsMFZB8KxVOZEw/Mq 1fIw== X-Gm-Message-State: APjAAAVJNYoVQauUGIXhhq/UW6UIvceoS8uFuE39qJd5jubJj//Uwb/t P+82QDrcNzKFAMe/1Lf+X8hZHg== X-Received: by 2002:a1c:9d52:: with SMTP id g79mr21587689wme.148.1576923414062; Sat, 21 Dec 2019 02:16:54 -0800 (PST) Received: from localhost ([185.69.145.42]) by smtp.gmail.com with ESMTPSA id f127sm11114118wma.4.2019.12.21.02.16.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 21 Dec 2019 02:16:53 -0800 (PST) Date: Sat, 21 Dec 2019 10:16:52 +0000 From: Chris Down To: "Darrick J. Wong" Cc: linux-fsdevel@vger.kernel.org, Al Viro , Jeff Layton , Johannes Weiner , Tejun Heo , linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: Re: [PATCH] fs: inode: Reduce volatile inode wraparound risk when ino_t is 64 bit Message-ID: <20191221101652.GA494948@chrisdown.name> References: <20191220024936.GA380394@chrisdown.name> <20191220213052.GB7476@magnolia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: <20191220213052.GB7476@magnolia> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Darrick J. Wong writes: >On Fri, Dec 20, 2019 at 02:49:36AM +0000, Chris Down wrote: >> In Facebook production we are seeing heavy inode number wraparounds on >> tmpfs. On affected tiers, in excess of 10% of hosts show multiple files >> with different content and the same inode number, with some servers even >> having as many as 150 duplicated inode numbers with differing file >> content. >> >> This causes actual, tangible problems in production. For example, we >> have complaints from those working on remote caches that their >> application is reporting cache corruptions because it uses (device, >> inodenum) to establish the identity of a particular cache object, but > >...but you cannot delete the (dev, inum) tuple from the cache index when >you remove a cache object?? There are some cache objects which may be long-lived. In these kinds of cases, the cache objects aren't removed until they're conclusively not needed. Since tmpfs shares the i_ino counter with every other user of get_next_ino, it's then entirely possible that we can thrash through 2^32 inodes within a period that it's possible for a single cache file to exist. >> because it's not unique any more, the application refuses to continue >> and reports cache corruption. Even worse, sometimes applications may not >> even detect the corruption but may continue anyway, causing phantom and >> hard to debug behaviour. >> >> In general, userspace applications expect that (device, inodenum) should >> be enough to be uniquely point to one inode, which seems fair enough. > >Except that it's not. (dev, inum, generation) uniquely points to an >instance of an inode from creation to the last unlink. I didn't mention generation because, even though it's set on tmpfs (to prandom_u32()), it's not possible to evaluate it from userspace since `ioctl` returns ENOTTY. We can't ask userspace applications to introspect on an inode attribute that they can't even access :-)