Date: Wed, 14 Mar 2018 22:06:53 -0700
From: "Darrick J. Wong"
To: Andiry Xu
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvdimm@lists.01.org, dan.j.williams@intel.com,
    andy.rudoff@intel.com, coughlan@redhat.com, swanson@cs.ucsd.edu,
    david@fromorbit.com, jack@suse.com, swhiteho@redhat.com,
    miklos@szeredi.hu, andiry.xu@gmail.com, Andiry Xu
Subject: Re: [RFC v2 04/83] NOVA inode definition.
Message-ID: <20180315050653.GC4860@magnolia>
References: <1520705944-6723-1-git-send-email-jix024@eng.ucsd.edu>
    <1520705944-6723-5-git-send-email-jix024@eng.ucsd.edu>
In-Reply-To: <1520705944-6723-5-git-send-email-jix024@eng.ucsd.edu>

On Sat, Mar 10, 2018 at 10:17:45AM -0800, Andiry Xu wrote:
> From: Andiry Xu
>
> inode.h defines the non-volatile and volatile NOVA inode data structures.
>
> The non-volatile NOVA inode (nova_inode) is aligned to 128 bytes and contains
> file/directory metadata information. The most important fields
> are log_head and log_tail. log_head points to the start of
> the log, and log_tail points to the end of the latest committed
> log entry. NOVA makes updates to the inode by appending
> to the log tail and updating the log_tail pointer atomically.
>
> The volatile NOVA inode (nova_inode_info) contains necessary
> information to limit access to the non-volatile NOVA inode during runtime.
> It has a radix tree to map file offset or filenames to the corresponding
> log entries.
>
> Signed-off-by: Andiry Xu
> ---
>  fs/nova/inode.h | 187 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 187 insertions(+)
>  create mode 100644 fs/nova/inode.h
>
> diff --git a/fs/nova/inode.h b/fs/nova/inode.h
> new file mode 100644
> index 0000000..f9187e3
> --- /dev/null
> +++ b/fs/nova/inode.h
> @@ -0,0 +1,187 @@
> +#ifndef __INODE_H
> +#define __INODE_H
> +
> +struct nova_inode_info_header;
> +struct nova_inode;
> +
> +#include "super.h"
> +
> +enum nova_new_inode_type {
> +	TYPE_CREATE = 0,
> +	TYPE_MKNOD,
> +	TYPE_SYMLINK,
> +	TYPE_MKDIR
> +};
> +
> +
> +/*
> + * Structure of an inode in PMEM
> + * Keep the inode size to within 120 bytes: We use the last eight bytes
> + * as inode table tail pointer.

I would've expected a

BUILD_BUG_ON(NOVA_INODE_SIZE - sizeof(struct nova_inode) == 8);

or something to enforce this.

(Or just equate inode number with byte offset?  I looked ahead at the
directory entries and they seem to be 64-bit...)

I guess I'm being lazy and doing an on-disk-format-only review. :)

> + */
> +struct nova_inode {
> +
> +	/* first 40 bytes */
> +	u8	i_rsvd;		/* reserved. used to be checksum */

Magic number?

> +	u8	valid;		/* Is this inode valid? */
> +	u8	deleted;	/* Is this inode deleted? */

Would i_mode == 0 cover these?
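For reference, the size constraint discussed above can be checked at compile time. The following is only a user-space sketch, not code from the series: the struct mirrors the fields of struct nova_inode as quoted in this patch, with plain fixed-width integers standing in for the __le types. NOVA_INODE_SIZE = 128 is an assumption taken from the commit message ("aligned to 128 bytes"), and NOVA_INODE_TAIL_PTR_SIZE, nova_inode_mirror, and nova_inode_mirror_slack() are hypothetical names. Note that BUILD_BUG_ON() breaks the build when its condition is true, so a kernel check in this spirit would test for the violating case (the struct growing past 120 bytes).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128 comes from the commit message, not from this header. */
#define NOVA_INODE_SIZE          128
/* Hypothetical name for the 8 bytes reserved for the table tail pointer. */
#define NOVA_INODE_TAIL_PTR_SIZE 8

/* User-space mirror of the quoted on-media layout (illustration only). */
struct nova_inode_mirror {
	/* first 40 bytes */
	uint8_t  i_rsvd;
	uint8_t  valid;
	uint8_t  deleted;
	uint8_t  i_blk_type;
	uint32_t i_flags;
	uint64_t i_size;
	uint32_t i_ctime;
	uint32_t i_mtime;
	uint32_t i_atime;
	uint16_t i_mode;
	uint16_t i_links_count;
	uint64_t i_xattr;

	/* second 40 bytes */
	uint32_t i_uid;
	uint32_t i_gid;
	uint32_t i_generation;
	uint32_t i_create_time;
	uint64_t nova_ino;
	uint64_t log_head;
	uint64_t log_tail;

	/* labelled "last 40 bytes" in the patch; 24 bytes as quoted */
	uint64_t create_epoch_id;
	uint64_t delete_epoch_id;
	struct {
		uint32_t rdev;
	} dev;
	uint32_t csum;
} __attribute__((__packed__));

/* The group boundaries named in the patch comments. */
static_assert(offsetof(struct nova_inode_mirror, i_uid) == 40,
	      "first group must be 40 bytes");
static_assert(offsetof(struct nova_inode_mirror, create_epoch_id) == 80,
	      "second group must end at byte 80");

/* The actual constraint: leave room for the 8-byte tail pointer. */
static_assert(sizeof(struct nova_inode_mirror) <=
	      NOVA_INODE_SIZE - NOVA_INODE_TAIL_PTR_SIZE,
	      "inode fields overlap the inode table tail pointer");

/* Unused bytes between the last field and the tail pointer slot. */
static inline size_t nova_inode_mirror_slack(void)
{
	return NOVA_INODE_SIZE - NOVA_INODE_TAIL_PTR_SIZE -
	       sizeof(struct nova_inode_mirror);
}
```

With the fields exactly as quoted, the packed struct comes to 104 bytes rather than 120, so an equality check against 8 bytes of slack would not hold; an upper-bound check matches the "within 120 bytes" wording of the comment.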
> +	u8	i_blk_type;	/* data block size this inode uses */

I would've thought these would just be bits of i_flags?

Also, if I have a 1G blocksize file and free space fragments to the
point that there's > 1G of free space but none of it contiguous, I guess
I can expect ENOSPC?

> +	__le32	i_flags;	/* Inode flags */
> +	__le64	i_size;		/* Size of data in bytes */
> +	__le32	i_ctime;	/* Inode modification time */
> +	__le32	i_mtime;	/* Inode b-tree Modification time */
> +	__le32	i_atime;	/* Access time */

Same y2038 grumble from the previous patch.

> +	__le16	i_mode;		/* File mode */
> +	__le16	i_links_count;	/* Links count */
> +
> +	__le64	i_xattr;	/* Extended attribute block */
> +
> +	/* second 40 bytes */
> +	__le32	i_uid;		/* Owner Uid */
> +	__le32	i_gid;		/* Group Id */
> +	__le32	i_generation;	/* File version (for NFS) */
> +	__le32	i_create_time;	/* Create time */
> +	__le64	nova_ino;	/* nova inode number */
> +
> +	__le64	log_head;	/* Log head pointer */
> +	__le64	log_tail;	/* Log tail pointer */
> +
> +	/* last 40 bytes */
> +	__le64	create_epoch_id; /* Transaction ID when create */
> +	__le64	delete_epoch_id; /* Transaction ID when deleted */
> +
> +	struct {
> +		__le32 rdev;	/* major/minor # */
> +	} dev;			/* device inode */
> +
> +	__le32	csum;		/* CRC32 checksum */
> +	/* Leave 8 bytes for inode table tail pointer */
> +} __attribute((__packed__));
> +
> +/*
> + * NOVA-specific inode state kept in DRAM
> + */
> +struct nova_inode_info_header {
> +	/* For files, tree holds a map from file offsets to
> +	 * write log entries.
> +	 *
> +	 * For directories, tree holds a map from a hash of the file name to
> +	 * dentry log entry.
> +	 */
> +	struct radix_tree_root tree;
> +	struct rw_semaphore i_sem;	/* Protect log and tree */
> +	unsigned short i_mode;		/* Dir or file? */
> +	unsigned int i_flags;
> +	unsigned long log_pages;	/* Num of log pages */
> +	unsigned long i_size;
> +	unsigned long i_blocks;
> +	unsigned long ino;
> +	unsigned long pi_addr;
> +	unsigned long valid_entries;	/* For thorough GC */
> +	unsigned long num_entries;	/* For thorough GC */
> +	u64 last_setattr;		/* Last setattr entry */
> +	u64 last_link_change;		/* Last link change entry */
> +	u64 last_dentry;		/* Last updated dentry */
> +	u64 trans_id;			/* Transaction ID */
> +	u64 log_head;			/* Log head pointer */
> +	u64 log_tail;			/* Log tail pointer */
> +	u8  i_blk_type;
> +};
> +
> +/*
> + * DRAM state for inodes
> + */
> +struct nova_inode_info {
> +	struct nova_inode_info_header header;
> +	struct inode vfs_inode;
> +};
> +
> +
> +static inline struct nova_inode_info *NOVA_I(struct inode *inode)
> +{
> +	return container_of(inode, struct nova_inode_info, vfs_inode);
> +}
> +
> +static inline void sih_lock(struct nova_inode_info_header *header)

"sih"?  What happened to the "nova" prefix?
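As background for the DRAM-side structures quoted above: NOVA_I() recovers the enclosing nova_inode_info from the embedded VFS inode via container_of(). The user-space sketch below illustrates that pointer arithmetic only. The container_of macro here is a simplified stand-in for the kernel's (no type checking), the struct definitions are heavily reduced placeholders, and nova_i_roundtrip() is a hypothetical helper added for demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel macro: step back from a member
 * pointer to the start of the structure that embeds it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced placeholders for the kernel types (illustration only). */
struct inode {
	unsigned long i_ino;
};

struct nova_inode_info_header {
	unsigned long ino;
	unsigned long pi_addr;
};

/* Same shape as in the patch: NOVA state first, VFS inode embedded after. */
struct nova_inode_info {
	struct nova_inode_info_header header;
	struct inode vfs_inode;
};

static inline struct nova_inode_info *NOVA_I(struct inode *inode)
{
	return container_of(inode, struct nova_inode_info, vfs_inode);
}

/* Hypothetical self-check: go from the embedded inode back to the
 * wrapper and read the NOVA-private header through it. */
static inline int nova_i_roundtrip(void)
{
	struct nova_inode_info si = {
		.header = { .ino = 42, .pi_addr = 0 },
		.vfs_inode = { .i_ino = 42 },
	};
	struct inode *vfs = &si.vfs_inode;	/* what the VFS hands back */

	assert(NOVA_I(vfs) == &si);
	return NOVA_I(vfs)->header.ino == vfs->i_ino;
}
```

The conversion is only valid for inodes that were allocated as part of a nova_inode_info in the first place; applying it to any other struct inode would yield a bogus pointer.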
--D

> +{
> +	down_write(&header->i_sem);
> +}
> +
> +static inline void sih_unlock(struct nova_inode_info_header *header)
> +{
> +	up_write(&header->i_sem);
> +}
> +
> +static inline void sih_lock_shared(struct nova_inode_info_header *header)
> +{
> +	down_read(&header->i_sem);
> +}
> +
> +static inline void sih_unlock_shared(struct nova_inode_info_header *header)
> +{
> +	up_read(&header->i_sem);
> +}
> +
> +static inline unsigned int
> +nova_inode_blk_shift(struct nova_inode_info_header *sih)
> +{
> +	return blk_type_to_shift[sih->i_blk_type];
> +}
> +
> +static inline uint32_t nova_inode_blk_size(struct nova_inode_info_header *sih)
> +{
> +	return blk_type_to_size[sih->i_blk_type];
> +}
> +
> +static inline u64 nova_get_reserved_inode_addr(struct super_block *sb,
> +	u64 inode_number)
> +{
> +	return (NOVA_DEF_BLOCK_SIZE_4K * RESERVE_INODE_START) +
> +			inode_number * NOVA_INODE_SIZE;
> +}
> +
> +static inline struct nova_inode *nova_get_reserved_inode(struct super_block *sb,
> +	u64 inode_number)
> +{
> +	struct nova_sb_info *sbi = NOVA_SB(sb);
> +	u64 addr;
> +
> +	addr = nova_get_reserved_inode_addr(sb, inode_number);
> +
> +	return (struct nova_inode *)(sbi->virt_addr + addr);
> +}
> +
> +static inline struct nova_inode *nova_get_inode_by_ino(struct super_block *sb,
> +						  u64 ino)
> +{
> +	if (ino == 0 || ino >= NOVA_NORMAL_INODE_START)
> +		return NULL;
> +
> +	return nova_get_reserved_inode(sb, ino);
> +}
> +
> +static inline struct nova_inode *nova_get_inode(struct super_block *sb,
> +	struct inode *inode)
> +{
> +	struct nova_inode_info *si = NOVA_I(inode);
> +	struct nova_inode_info_header *sih = &si->header;
> +	struct nova_inode fake_pi;
> +	void *addr;
> +	int rc;
> +
> +	addr = nova_get_block(sb, sih->pi_addr);
> +	rc = memcpy_mcsafe(&fake_pi, addr, sizeof(struct nova_inode));
> +	if (rc)
> +		return NULL;
> +
> +	return (struct nova_inode *)addr;
> +}
> +
> +static inline int nova_persist_inode(struct nova_inode *pi)
> +{
> +	nova_flush_buffer(pi, sizeof(struct nova_inode), 1);
> +	return 0;
> +}
> +
> +#endif
> -- 
> 2.7.4
>
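The commit message at the top describes inode updates as "append to the log, then update log_tail atomically". The sketch below shows that general append-then-publish pattern in user space with C11 atomics. It is not NOVA's implementation: struct log_sketch, LOG_BYTES, and the log_* helpers are hypothetical, a single writer is assumed (in the patch the writer would presumably hold i_sem), and the persistent-memory flush and fence that real code needs are only marked by a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define LOG_BYTES 256	/* hypothetical fixed log size for the sketch */

struct log_sketch {
	unsigned char buf[LOG_BYTES];
	uint64_t log_head;		/* offset of the first entry */
	_Atomic uint64_t log_tail;	/* end of the last committed entry */
};

static inline void log_init(struct log_sketch *log)
{
	memset(log->buf, 0, sizeof(log->buf));
	log->log_head = 0;
	atomic_init(&log->log_tail, 0);
}

/* Append one entry. Returns 0 on success, -1 if the entry does not fit.
 * Single-writer assumption: no other thread appends concurrently. */
static inline int log_append(struct log_sketch *log, const void *entry,
			     size_t len)
{
	uint64_t tail = atomic_load_explicit(&log->log_tail,
					     memory_order_relaxed);

	assert(tail <= LOG_BYTES);
	if (len > LOG_BYTES - tail)
		return -1;

	/* Step 1: write the entry beyond the committed tail. Readers that
	 * stop at log_tail never look at these bytes yet. */
	memcpy(log->buf + tail, entry, len);

	/* On persistent memory, the entry would be flushed and fenced here
	 * before the tail moves. */

	/* Step 2: publish the entry with one 64-bit store. */
	atomic_store_explicit(&log->log_tail, tail + len,
			      memory_order_release);
	return 0;
}

/* Number of bytes a reader may safely consume. */
static inline uint64_t log_committed_bytes(struct log_sketch *log)
{
	return atomic_load_explicit(&log->log_tail, memory_order_acquire) -
	       log->log_head;
}
```

The point of the ordering is that a crash or a concurrent reader observes either the old tail (entry invisible) or the new tail (entry complete), never a partially written entry.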