Received: by 10.213.65.68 with SMTP id h4csp1456036imn; Wed, 14 Mar 2018 23:17:24 -0700 (PDT) X-Google-Smtp-Source: AG47ELujOK+ZBtK0kfVgj0YCr1audT664N52pl8086HXoKEnCxmIAEuHgVUSsek0Sdguf1JwW3jI X-Received: by 10.98.178.17 with SMTP id x17mr6802136pfe.2.1521094644381; Wed, 14 Mar 2018 23:17:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1521094644; cv=none; d=google.com; s=arc-20160816; b=EOCvmDYJUmo3dR2BGDXwnELX74GFCFqugpfMZJmA4by4ZqNocd0x2J6MkQFQxq4ETg GwWXJDlMUu3MOirEzi5+oXxfhf78O7jY4CGOlRwHFdBe5SsnZsCKaZn04OPgAKYX6F6s j16VSbGnd31KyZzjW7XtdTUjw1/ilHe6qCtrcn/cAIk2RFF+Df/4VHKRBUCJMqbhT6cI d377JghMgCCDHP1wWJlLq0zok7MqKiGHopG/F93HKJ0JWsBoSCAy3qfiyTWAbEp5hYDe 8Uqf5BPUxxsifjBzbddvyVUZYjzXCOzUVwyARGxM5zAhGb6IusSVLYLTRwE4TiFyXSVc SAVw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=GZd+8SZe3bCLVAG50C8l2nmXR3obH4bCTsflZol/ZLM=; b=CJc8l8HU26e+kz0OY+WntBKhWmkZJCuZy6gMtlGEL6dwZcfQJkei1eknue79Afj9J7 8Mo7rVjc4U8IJ2FdjzBJm279vWmOvYkv8jnMhkn1/bEyvtxutbgor50Gikwb1MQjInSy a1ZQqCsrs2JhYLojLH1AcWen+GtZ1slaSSP5BtYHFLLaiY+lKfQwEfaNKqyPk1t0gwvj o6HJ1lZEy1KNsv2KN20ZZW1MLrkkuVYUPagx8EPtmVG2Rd23SXj86cBE/CmQAWe0FuJa Y+pe146vmSXMPkc/DQ8RW0LDv/YbITWjdO7Tj7SeX9CKOxWjQjuLs/oFzzK6w4cCA75w DIwQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@eng.ucsd.edu header.s=google header.b=HoPAvPge; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r3-v6si3420921plb.197.2018.03.14.23.17.10; Wed, 14 Mar 2018 23:17:24 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@eng.ucsd.edu header.s=google header.b=HoPAvPge; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751527AbeCOGQT (ORCPT + 99 others); Thu, 15 Mar 2018 02:16:19 -0400 Received: from mail-io0-f194.google.com ([209.85.223.194]:42427 "EHLO mail-io0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750745AbeCOGQR (ORCPT ); Thu, 15 Mar 2018 02:16:17 -0400 Received: by mail-io0-f194.google.com with SMTP id u84so7199740iod.9 for ; Wed, 14 Mar 2018 23:16:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eng.ucsd.edu; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=GZd+8SZe3bCLVAG50C8l2nmXR3obH4bCTsflZol/ZLM=; b=HoPAvPgeF0qE4KKvMhYhWQxddnxcmPs1grmTKVGtW40OZ36F152ZyJzCXql3D9kzI4 yq/rnp2BISlB+FteUtW476UaSJFGsukjgJPow4TA5FNuKU2b1BQNRfreB5MjjzJDl3By 242NAYJQBflpsc88dMgkt0EsXOrNcw3xWFOLE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=GZd+8SZe3bCLVAG50C8l2nmXR3obH4bCTsflZol/ZLM=; b=W+C5tSYMwiPn8x7xYTQGR6E9pKdlrf5sOvVy8tJKu0H2My9Namfr1fEUaVqcYg4Pdm gI94VJUg9QQZlGE84DesvT6+M8vSsxq3jeaHIvkBLJ8tAxu8NG6l810d9c4FO3y4mCMj ep5RdPaNpeTCh/MmJkWJSXGVJtmEkMen56XhJtM/MRlEw5Qww1ebFHmTuVkg+ulhNmiM WsvRxGOcQB7w04tASOfX6gXtYNGZmut9GdyKisCy5/BfAba8gIpEV9wQWHDYl+XMQXPw HgT7VtANSXCBWzY1rLthuNJJ0XAc3PJ5FuWXzlQHODDz4ZrArok7tY0VdlSl2zqet+1U M9fg== X-Gm-Message-State: AElRT7G+R6dwL1QZ1NcHUS2NLNwhsLQuMb/m8SeVNZBrJQxFklUdPK9t uePNLkj+K8T3jygFSogKlv+QfBdOYraOAgoA+bl6Ew== X-Received: by 10.107.22.1 with SMTP id 1mr7965712iow.238.1521094576631; Wed, 14 Mar 2018 23:16:16 -0700 (PDT) MIME-Version: 1.0 Received: by 10.79.195.72 with HTTP; Wed, 14 Mar 2018 23:16:15 -0700 (PDT) In-Reply-To: <20180315050653.GC4860@magnolia> References: <1520705944-6723-1-git-send-email-jix024@eng.ucsd.edu> <1520705944-6723-5-git-send-email-jix024@eng.ucsd.edu> <20180315050653.GC4860@magnolia> From: Andiry Xu Date: Wed, 14 Mar 2018 23:16:15 -0700 Message-ID: Subject: Re: [RFC v2 04/83] NOVA inode definition. To: "Darrick J. Wong" Cc: Linux FS Devel , linux-kernel@vger.kernel.org, "linux-nvdimm@lists.01.org" , Dan Williams , "Rudoff, Andy" , coughlan@redhat.com, Steven Swanson , Dave Chinner , jack@suse.com, swhiteho@redhat.com, miklos@szeredi.hu, Jian Xu , Andiry Xu Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 14, 2018 at 10:06 PM, Darrick J. Wong wrote: > On Sat, Mar 10, 2018 at 10:17:45AM -0800, Andiry Xu wrote: >> From: Andiry Xu >> >> inode.h defines the non-volatile and volatile NOVA inode data structures. >> >> The non-volatile NOVA inode (nova_inode) is aligned to 128 bytes and contains >> file/directory metadata information. The most important fields >> are log_head and log_tail. log_head points to the start of >> the log, and log_tail points to the end of the latest committed >> log entry. NOVA make updates to the inode by appending >> to the log tail and update the log_tail pointer atomically. >> >> The volatile NOVA inode (nova_inode_info) contains necessary >> information to limit access to the non-volatile NOVA inode during runtime. >> It has a radix tree to map file offset or filenames to the corresponding >> log entries. >> >> Signed-off-by: Andiry Xu >> --- >> fs/nova/inode.h | 187 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 187 insertions(+) >> create mode 100644 fs/nova/inode.h >> >> diff --git a/fs/nova/inode.h b/fs/nova/inode.h >> new file mode 100644 >> index 0000000..f9187e3 >> --- /dev/null >> +++ b/fs/nova/inode.h >> @@ -0,0 +1,187 @@ >> +#ifndef __INODE_H >> +#define __INODE_H >> + >> +struct nova_inode_info_header; >> +struct nova_inode; >> + >> +#include "super.h" >> + >> +enum nova_new_inode_type { >> + TYPE_CREATE = 0, >> + TYPE_MKNOD, >> + TYPE_SYMLINK, >> + TYPE_MKDIR >> +}; >> + >> + >> +/* >> + * Structure of an inode in PMEM >> + * Keep the inode size to within 120 bytes: We use the last eight bytes >> + * as inode table tail pointer. > > I would've expected a > BUILD_BUG_ON(NOVA_INODE_SIZE - sizeof(struct nova_inode) == 8); > or something to enforce this. > Thanks, will do. > (Or just equate inode number with byte offset? I looked ahead at the > directory entries and they seem to be 64-bit...) > > I guess I'm being lazy and doing a on-disk-format-only review. :) > >> + */ >> +struct nova_inode { >> + >> + /* first 40 bytes */ >> + u8 i_rsvd; /* reserved. used to be checksum */ > > Magic number? > OK. >> + u8 valid; /* Is this inode valid? */ >> + u8 deleted; /* Is this inode deleted? */ > > Would i_mode == 0 cover these? > Deleted flag comes from NOVA-Fortis code. I will check if i_mode can cover it. >> + u8 i_blk_type; /* data block size this inode uses */ > > I would've thought these would just be bits of i_flags? > > Also, if I have a 1G blocksize file and free space fragments to the > point that there's > 1G of free space but none of it contiguous, I guess > I can expect ENOSPC? > Yes, but 1G blocksize has not been tested. >> + __le32 i_flags; /* Inode flags */ >> + __le64 i_size; /* Size of data in bytes */ >> + __le32 i_ctime; /* Inode modification time */ >> + __le32 i_mtime; /* Inode b-tree Modification time */ >> + __le32 i_atime; /* Access time */ > > Same y2038 grumble from the previous patch. > Will fix. >> + __le16 i_mode; /* File mode */ >> + __le16 i_links_count; /* Links count */ >> + >> + __le64 i_xattr; /* Extended attribute block */ >> + >> + /* second 40 bytes */ >> + __le32 i_uid; /* Owner Uid */ >> + __le32 i_gid; /* Group Id */ >> + __le32 i_generation; /* File version (for NFS) */ >> + __le32 i_create_time; /* Create time */ >> + __le64 nova_ino; /* nova inode number */ >> + >> + __le64 log_head; /* Log head pointer */ >> + __le64 log_tail; /* Log tail pointer */ >> + >> + /* last 40 bytes */ >> + __le64 create_epoch_id; /* Transaction ID when create */ >> + __le64 delete_epoch_id; /* Transaction ID when deleted */ >> + >> + struct { >> + __le32 rdev; /* major/minor # */ >> + } dev; /* device inode */ >> + >> + __le32 csum; /* CRC32 checksum */ >> + /* Leave 8 bytes for inode table tail pointer */ >> +} __attribute((__packed__)); >> + >> +/* >> + * NOVA-specific inode state kept in DRAM >> + */ >> +struct nova_inode_info_header { >> + /* For files, tree holds a map from file offsets to >> + * write log entries. >> + * >> + * For directories, tree holds a map from a hash of the file name to >> + * dentry log entry. >> + */ >> + struct radix_tree_root tree; >> + struct rw_semaphore i_sem; /* Protect log and tree */ >> + unsigned short i_mode; /* Dir or file? */ >> + unsigned int i_flags; >> + unsigned long log_pages; /* Num of log pages */ >> + unsigned long i_size; >> + unsigned long i_blocks; >> + unsigned long ino; >> + unsigned long pi_addr; >> + unsigned long valid_entries; /* For thorough GC */ >> + unsigned long num_entries; /* For thorough GC */ >> + u64 last_setattr; /* Last setattr entry */ >> + u64 last_link_change; /* Last link change entry */ >> + u64 last_dentry; /* Last updated dentry */ >> + u64 trans_id; /* Transaction ID */ >> + u64 log_head; /* Log head pointer */ >> + u64 log_tail; /* Log tail pointer */ >> + u8 i_blk_type; >> +}; >> + >> +/* >> + * DRAM state for inodes >> + */ >> +struct nova_inode_info { >> + struct nova_inode_info_header header; >> + struct inode vfs_inode; >> +}; >> + >> + >> +static inline struct nova_inode_info *NOVA_I(struct inode *inode) >> +{ >> + return container_of(inode, struct nova_inode_info, vfs_inode); >> +} >> + >> +static inline void sih_lock(struct nova_inode_info_header *header) > > "sih"? What happened to the "nova" prefix? > This structure is born before the name NOVA was decided. Thanks, Andiry > --D > >> +{ >> + down_write(&header->i_sem); >> +} >> + >> +static inline void sih_unlock(struct nova_inode_info_header *header) >> +{ >> + up_write(&header->i_sem); >> +} >> + >> +static inline void sih_lock_shared(struct nova_inode_info_header *header) >> +{ >> + down_read(&header->i_sem); >> +} >> + >> +static inline void sih_unlock_shared(struct nova_inode_info_header *header) >> +{ >> + up_read(&header->i_sem); >> +} >> + >> +static inline unsigned int >> +nova_inode_blk_shift(struct nova_inode_info_header *sih) >> +{ >> + return blk_type_to_shift[sih->i_blk_type]; >> +} >> + >> +static inline uint32_t nova_inode_blk_size(struct nova_inode_info_header *sih) >> +{ >> + return blk_type_to_size[sih->i_blk_type]; >> +} >> + >> +static inline u64 nova_get_reserved_inode_addr(struct super_block *sb, >> + u64 inode_number) >> +{ >> + return (NOVA_DEF_BLOCK_SIZE_4K * RESERVE_INODE_START) + >> + inode_number * NOVA_INODE_SIZE; >> +} >> + >> +static inline struct nova_inode *nova_get_reserved_inode(struct super_block *sb, >> + u64 inode_number) >> +{ >> + struct nova_sb_info *sbi = NOVA_SB(sb); >> + u64 addr; >> + >> + addr = nova_get_reserved_inode_addr(sb, inode_number); >> + >> + return (struct nova_inode *)(sbi->virt_addr + addr); >> +} >> + >> +static inline struct nova_inode *nova_get_inode_by_ino(struct super_block *sb, >> + u64 ino) >> +{ >> + if (ino == 0 || ino >= NOVA_NORMAL_INODE_START) >> + return NULL; >> + >> + return nova_get_reserved_inode(sb, ino); >> +} >> + >> +static inline struct nova_inode *nova_get_inode(struct super_block *sb, >> + struct inode *inode) >> +{ >> + struct nova_inode_info *si = NOVA_I(inode); >> + struct nova_inode_info_header *sih = &si->header; >> + struct nova_inode fake_pi; >> + void *addr; >> + int rc; >> + >> + addr = nova_get_block(sb, sih->pi_addr); >> + rc = memcpy_mcsafe(&fake_pi, addr, sizeof(struct nova_inode)); >> + if (rc) >> + return NULL; >> + >> + return (struct nova_inode *)addr; >> +} >> + >> +static inline int nova_persist_inode(struct nova_inode *pi) >> +{ >> + nova_flush_buffer(pi, sizeof(struct nova_inode), 1); >> + return 0; >> +} >> + >> +#endif >> -- >> 2.7.4 >>