2012-10-23 02:21:57

by Jaegeuk Kim

Subject: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system

Change log from v1:

o Apply the recent user namespace changes [Eric]
o Remove unnecessary condition check [Al]
o Fix wrong description [Stefan]
o Fix f2fs document [Randy]
o Enlarge the volume label length to 256 unicodes [Martin]
o Support time resolution to nano scale [Boaz]
o Fix the wrong use of endian conversion [David]
o Fix the use of mutex and spinlocks [David]
o Remove the use of __GFP_NOFAIL, etc [Neil]
o Change the flow for readability [Neil]
o Reduce the lock contention in CP [Neil]
o Support multiples of section size [Arnd]
o Support configurable extension list [Arnd]
o Support configurable active log numbers [Arnd]

[Future works]
o Aware of file access pattern
o Erase block indirect
o Sub-page write avoidance
o In-line data
o Xattr optimization/in-line xattrs

I really appreciate the valuable comments from all of you in the community.

Note)
Due to the change of on-disk layout, please download f2fs-tools-1.1.0.tar.gz.

--------------------------------------------------------------------------------

This is a new patch set for the f2fs file system.

What is F2FS?
=============

NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
been widely used in systems ranging from mobile to server. Since they are known
to have different characteristics from conventional rotational disks, a file
system, as an upper layer to the storage device, should be designed from scratch
to adapt to them.

F2FS is a new file system carefully designed for NAND flash memory-based storage
devices. We chose a log-structured file system approach, but we tried to adapt it
to the new form of storage. We also remedy some known issues of the classic
log-structured file system, such as the snowball effect of the wandering tree and
high cleaning overhead.

Because a NAND-based storage device shows different characteristics according to
its internal geometry or flash memory management scheme aka FTL, we add various
parameters not only for configuring on-disk layout, but also for selecting allocation
and cleaning algorithms.

Patch set
=========

The patch #1 adds a document to Documentation/filesystems/.
The patch #2 adds a header file of on-disk layout to include/linux/.
The patches #3-#15 add f2fs source files to fs/f2fs/.
The last patch, patch #16, updates Makefile and Kconfig.

mkfs.f2fs
=========

The file system formatting tool, "mkfs.f2fs", is available from the following
download page: http://sourceforge.net/projects/f2fs-tools/

Usage
=====

If you'd like to experience f2fs, simply:
# mkfs.f2fs /dev/sdb1
# mount -t f2fs /dev/sdb1 /mnt/f2fs

Short log
=========

Jaegeuk Kim (16):
f2fs: add document
f2fs: add on-disk layout
f2fs: add superblock and major in-memory structure
f2fs: add super block operations
f2fs: add checkpoint operations
f2fs: add node operations
f2fs: add segment operations
f2fs: add file operations
f2fs: add address space operations for data
f2fs: add core inode operations
f2fs: add inode operations for special inodes
f2fs: add core directory operations
f2fs: add xattr and acl functionalities
f2fs: add garbage collection functions
f2fs: add recovery routines for roll-forward
f2fs: update Kconfig and Makefile

Documentation/filesystems/00-INDEX | 2 +
Documentation/filesystems/f2fs.txt | 404 ++++++++
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/f2fs/Kconfig | 55 ++
fs/f2fs/Makefile | 6 +
fs/f2fs/acl.c | 465 ++++++++++
fs/f2fs/acl.h | 57 ++
fs/f2fs/checkpoint.c | 795 ++++++++++++++++
fs/f2fs/data.c | 701 ++++++++++++++
fs/f2fs/dir.c | 657 +++++++++++++
fs/f2fs/f2fs.h | 982 ++++++++++++++++++++
fs/f2fs/file.c | 640 +++++++++++++
fs/f2fs/gc.c | 1139 +++++++++++++++++++++++
fs/f2fs/gc.h | 203 ++++
fs/f2fs/hash.c | 98 ++
fs/f2fs/inode.c | 262 ++++++
fs/f2fs/namei.c | 494 ++++++++++
fs/f2fs/node.c | 1782 +++++++++++++++++++++++++++++++++++
fs/f2fs/node.h | 330 +++++++
fs/f2fs/recovery.c | 375 ++++++++
fs/f2fs/segment.c | 1795 ++++++++++++++++++++++++++++++++++++
fs/f2fs/segment.h | 594 ++++++++++++
fs/f2fs/super.c | 590 ++++++++++++
fs/f2fs/xattr.c | 389 ++++++++
fs/f2fs/xattr.h | 145 +++
include/linux/f2fs_fs.h | 362 ++++++++
27 files changed, 13324 insertions(+)
create mode 100644 Documentation/filesystems/f2fs.txt
create mode 100644 fs/f2fs/Kconfig
create mode 100644 fs/f2fs/Makefile
create mode 100644 fs/f2fs/acl.c
create mode 100644 fs/f2fs/acl.h
create mode 100644 fs/f2fs/checkpoint.c
create mode 100644 fs/f2fs/data.c
create mode 100644 fs/f2fs/dir.c
create mode 100644 fs/f2fs/f2fs.h
create mode 100644 fs/f2fs/file.c
create mode 100644 fs/f2fs/gc.c
create mode 100644 fs/f2fs/gc.h
create mode 100644 fs/f2fs/hash.c
create mode 100644 fs/f2fs/inode.c
create mode 100644 fs/f2fs/namei.c
create mode 100644 fs/f2fs/node.c
create mode 100644 fs/f2fs/node.h
create mode 100644 fs/f2fs/recovery.c
create mode 100644 fs/f2fs/segment.c
create mode 100644 fs/f2fs/segment.h
create mode 100644 fs/f2fs/super.c
create mode 100644 fs/f2fs/xattr.c
create mode 100644 fs/f2fs/xattr.h
create mode 100644 include/linux/f2fs_fs.h

--
1.7.9.5




---
Jaegeuk Kim
Samsung



2012-10-23 02:25:17

by Jaegeuk Kim

Subject: [PATCH 01/16 v2] f2fs: add document

This adds a document describing the mount options, proc entries, usage, and
design of Flash-Friendly File System, namely F2FS.

Signed-off-by: Jaegeuk Kim <[email protected]>
---
Documentation/filesystems/00-INDEX | 2 +
Documentation/filesystems/f2fs.txt | 404 ++++++++++++++++++++++++++++++++++++
2 files changed, 406 insertions(+)
create mode 100644 Documentation/filesystems/f2fs.txt

diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 8c624a1..ce5fd46 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -48,6 +48,8 @@ ext4.txt
- info, mount options and specifications for the Ext4 filesystem.
files.txt
- info on file management in the Linux kernel.
+f2fs.txt
+ - info and mount options for the F2FS filesystem.
fuse.txt
- info on the Filesystem in User SpacE including mount options.
gfs2.txt
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
new file mode 100644
index 0000000..f2b4fde
--- /dev/null
+++ b/Documentation/filesystems/f2fs.txt
@@ -0,0 +1,404 @@
+================================================================================
+WHAT IS Flash-Friendly File System (F2FS)?
+================================================================================
+
+NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
+been widely used as storage in systems ranging from mobile to server. Since
+they are known to have different characteristics from conventional rotating
+disks, a file system, an upper layer to the storage device, should adapt to the
+changes from scratch at the design level.
+
+F2FS is a file system exploiting NAND flash memory-based storage devices, which
+is based on Log-structured File System (LFS). The design has been focused on
+addressing the fundamental issues in LFS, which are snowball effect of wandering
+tree and high cleaning overhead.
+
+Since a NAND flash memory-based storage device shows different characteristics
+according to its internal geometry or flash memory management scheme, namely FTL,
+F2FS and its tools support various parameters not only for configuring on-disk
+layout, but also for selecting allocation and cleaning algorithms.
+
+The file system formatting tool, "mkfs.f2fs", is available from the following
+download page: http://sourceforge.net/projects/f2fs-tools/
+
+================================================================================
+BACKGROUND AND DESIGN ISSUES
+================================================================================
+
+Log-structured File System (LFS)
+--------------------------------
+"A log-structured file system writes all modifications to disk sequentially in
+a log-like structure, thereby speeding up both file writing and crash recovery.
+The log is the only structure on disk; it contains indexing information so that
+files can be read back from the log efficiently. In order to maintain large free
+areas on disk for fast writing, we divide the log into segments and use a
+segment cleaner to compress the live information from heavily fragmented
+segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
+implementation of a log-structured file system", ACM Trans. Computer Systems
+10, 1, 26–52.
+
+Wandering Tree Problem
+----------------------
+In LFS, when file data is updated and written to the end of the log, its direct
+pointer block is updated due to the changed location. Then the indirect pointer
+block is also updated due to the direct pointer block update. In this manner,
+the upper index structures such as inode, inode map, and checkpoint block are
+also updated recursively. This is called the wandering tree problem [1], and
+in order to enhance performance, the file system should eliminate or relax the
+update propagation as much as possible.
+
+[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
+
+Cleaning Overhead
+-----------------
+Since LFS is based on out-of-place writes, it produces many obsolete blocks
+scattered across the whole storage. In order to serve new empty log space, it
+needs to reclaim these obsolete blocks transparently to users. This job is
+called the cleaning process.
+
+The process consists of four operations as follows.
+1. A victim segment is selected through referencing segment usage table.
+2. It loads parent index structures of all the data in the victim identified by
+ segment summary blocks.
+3. It checks the cross-reference between the data and its parent index structure.
+4. It moves valid data selectively.
+
+This cleaning job may cause unexpected long delays, so the most important goal
+is to hide the latencies from users. It should also reduce the amount of valid
+data to be moved, and move it quickly as well.
+
+================================================================================
+KEY FEATURES
+================================================================================
+
+Flash Awareness
+---------------
+- Enlarge the random write area for better performance, but provide the high
+ spatial locality
+- Align FS data structures to the operational units in FTL as best efforts
+
+Wandering Tree Problem
+----------------------
+- Use a term, “node”, that represents inodes as well as various pointer blocks
+- Introduce Node Address Table (NAT) containing the locations of all the “node”
+ blocks; this will cut off the update propagation.
+
+Cleaning Overhead
+-----------------
+- Support a background cleaning process
+- Support greedy and cost-benefit algorithms for victim selection policies
+- Support multi-head logs for static/dynamic hot and cold data separation
+- Introduce adaptive logging for efficient block allocation
+
+================================================================================
+MOUNT OPTIONS
+================================================================================
+
+background_gc_off Turn off cleaning operations, namely garbage collection,
+ triggered in background when I/O subsystem is idle.
+disable_roll_forward Disable the roll-forward recovery routine
+discard Issue discard/TRIM commands when a segment is cleaned.
+no_heap Disable heap-style segment allocation which finds free
+ segments for data from the beginning of main area, while
+ for node from the end of main area.
+nouser_xattr Disable Extended User Attributes. Note: xattr is enabled
+ by default if CONFIG_F2FS_FS_XATTR is selected.
+noacl Disable POSIX Access Control List. Note: acl is enabled
+ by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
+active_logs=%u Support configuring the number of active logs. In the
+ current design, f2fs supports only 2, 4, and 6 logs.
+ Default number is 6.
+disable_ext_identify Disable the extension list configured by mkfs, so f2fs
+ is not aware of cold files such as media files.
+
+================================================================================
+PROC ENTRIES
+================================================================================
+
+/proc/fs/f2fs/ contains information about partitions mounted as f2fs. For each
+partition, a corresponding directory, named as its device name, is provided with
+the following proc entries.
+
+- f2fs_stat major file system information managed by f2fs currently
+- f2fs_sit_stat average utilization information of the whole segments
+- f2fs_mem_stat current memory footprint consumed by f2fs
+
+e.g., in /proc/fs/f2fs/sdb1/
+
+================================================================================
+USAGE
+================================================================================
+
+1. Download userland tools
+
+2. Insmod f2fs.ko module:
+ # insmod f2fs.ko
+
+3. Create the directory to use as the mount point
+ # mkdir /mnt/f2fs
+
+4. Format the block device, and then mount as f2fs
+ # mkfs.f2fs -l label /dev/block_device
+ # mount -t f2fs /dev/block_device /mnt/f2fs
+
+Format options
+--------------
+-l [label] : Give a volume label, up to 256 Unicode characters.
+-a [0 or 1] : Split start location of each area for heap-based allocation.
+ 1 is set by default, which performs this.
+-o [int] : Set overprovision ratio in percent over volume size.
+ 5 is set by default.
+-s [int] : Set the number of segments per section.
+ 1 is set by default.
+-z [int] : Set the number of sections per zone.
+ 1 is set by default.
+-e [str] : Set basic extension list. e.g. "mp3,gif,mov"
+
+================================================================================
+DESIGN
+================================================================================
+
+On-disk Layout
+--------------
+
+F2FS divides the whole volume into a number of segments, each of which is 2MB in
+size by default. A section is composed of consecutive segments, and a zone
+consists of a set of sections.
+
+F2FS maintains logically six log areas. Except SB, all the log areas are managed
+in a unit of multiple segments. SB is located at the beginning of the partition,
+and there exist two superblocks to avoid file system crash. Other file system
+metadata such as CP, NAT, SIT, and SSA are located in the front part of the
+volume. Main area contains file and directory data including their indices.
+
+Each area manages the following contents.
+- CP File system information, bitmaps for valid NAT/SIT sets, orphan
+ inode lists, and summary entries of current active segments.
+- NAT Block address table for all the node blocks stored in Main area.
+- SIT Segment information such as valid block count and bitmap for the
+ validity of all the blocks.
+- SSA Summary entries which contain the owner information of all the
+ data and node blocks stored in Main area.
+- Main Node and data blocks.
+
+In order to avoid misalignment between file system and flash-based storage, F2FS
+aligns the start block address of CP with the segment size. Also, it aligns the
+start block address of Main area with the zone size by reserving some segments
+in SSA area.
+
+ align with the zone size <-|
+ |-> align with the segment size
+ _________________________________________________________________________
+ | | | Node | Segment | Segment | |
+ | Superblock | Checkpoint | Address | Info. | Summary | Main |
+ | (SB) | (CP) | Table (NAT) | Table (SIT) | Area (SSA) | |
+ |____________|_____2______|______N______|______N______|______N_____|__N___|
+ . .
+ . .
+ . .
+ ._________________________________________.
+ |_Segment_|_..._|_Segment_|_..._|_Segment_|
+ . .
+ ._________._________
+ |_section_|__...__|_
+ . .
+ .________.
+ |__zone__|
+
+
+File System Metadata Structure
+------------------------------
+
+F2FS adopts the checkpointing scheme to maintain file system consistency. At
+mount time, F2FS first tries to find the last valid checkpoint data by scanning
+CP area. In order to reduce the scanning time, F2FS uses only two copies of CP.
+One of them always indicates the last valid data, which is called the shadow copy
+mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
+
+For file system consistency, each CP points to which NAT and SIT copies are
+valid, as shown below.
+
+ +--------+----------+---------+
+ | CP | NAT | SIT |
+ +--------+----------+---------+
+ . . . .
+ . . . .
+ . . . .
+ +-------+-------+--------+--------+--------+--------+
+ | CP #0 | CP #1 | NAT #0 | NAT #1 | SIT #0 | SIT #1 |
+ +-------+-------+--------+--------+--------+--------+
+ | ^ ^
+ | | |
+ `----------------------------------------'
+
+Index Structure
+---------------
+
+The key data structure to manage the data locations is a "node". Similar to
+traditional file structures, F2FS has three types of node: inode, direct node,
+indirect node. F2FS assigns 4KB to an inode block which contains 927 data block
+indices, two direct node pointers, two indirect node pointers, and one double
+indirect node pointer as described below. One direct node block contains 1018
+data blocks, and one indirect node block contains also 1018 node blocks. Thus,
+one inode block (i.e., a file) covers:
+
+ 4KB * (927 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
+
+ Inode block (4KB)
+ |- data (927)
+ |- direct node (2)
+ | `- data (1018)
+ |- indirect node (2)
+ | `- direct node (1018)
+ | `- data (1018)
+ `- double indirect node (1)
+ `- indirect node (1018)
+ `- direct node (1018)
+ `- data (1018)
+
+Note that, all the node blocks are mapped by NAT which means the location of
+each node is translated by the NAT table. In the consideration of the wandering
+tree problem, F2FS is able to cut off the propagation of node updates caused by
+leaf data writes.
+
+Directory Structure
+-------------------
+
+A directory entry occupies 11 bytes, which consists of the following attributes.
+
+- hash hash value of the file name
+- ino inode number
+- len the length of file name
+- type file type such as directory, symlink, etc
+
+A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
+used to represent whether each dentry is valid or not. A dentry block occupies
+4KB with the following composition.
+
+ Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
+ dentries(11 * 214 bytes) + file name (8 * 214 bytes)
+
+ [Bucket]
+ +--------------------------------+
+ |dentry block 1 | dentry block 2 |
+ +--------------------------------+
+ . .
+ . .
+ . [Dentry Block Structure: 4KB] .
+ +--------+----------+----------+------------+
+ | bitmap | reserved | dentries | file names |
+ +--------+----------+----------+------------+
+ [Dentry Block: 4KB] . .
+ . .
+ . .
+ +------+------+-----+------+
+ | hash | ino | len | type |
+ +------+------+-----+------+
+ [Dentry Structure: 11 bytes]
+
+F2FS implements multi-level hash tables for directory structure. Each level has
+a hash table with dedicated number of hash buckets as shown below. Note that
+"A(2B)" means a bucket includes 2 data blocks.
+
+----------------------
+A : bucket
+B : block
+N : MAX_DIR_HASH_DEPTH
+----------------------
+
+level #0 | A(2B)
+ |
+level #1 | A(2B) - A(2B)
+ |
+level #2 | A(2B) - A(2B) - A(2B) - A(2B)
+ . | . . . .
+level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
+ . | . . . .
+level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
+
+The number of blocks and buckets are determined by,
+
+ ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
+ # of blocks in level #n = |
+ `- 4, Otherwise
+
+ ,- 2^n, if n < MAX_DIR_HASH_DEPTH / 2,
+ # of buckets in level #n = |
+ `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1), Otherwise
+
+When F2FS finds a file name in a directory, at first a hash value of the file
+name is calculated. Then, F2FS scans the hash table in level #0 to find the
+dentry consisting of the file name and its inode number. If not found, F2FS
+scans the next hash table in level #1. In this way, F2FS scans hash tables in
+each level incrementally from 1 to N. In each level F2FS needs to scan only
+one bucket determined by the following equation, which shows O(log(# of files))
+complexity.
+
+ bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
+
+In the case of file creation, F2FS finds empty consecutive slots that cover the
+file name. F2FS searches the empty slots in the hash tables of whole levels from
+1 to N in the same way as the lookup operation.
+
+The following figure shows an example of two cases holding children.
+ --------------> Dir <--------------
+ | |
+ child child
+
+ child - child [hole] - child
+
+ child - child - child [hole] - [hole] - child
+
+ Case 1: Case 2:
+ Number of children = 6, Number of children = 3,
+ File size = 7 File size = 7
+
+Default Block Allocation
+------------------------
+
+At runtime, F2FS manages six active logs inside "Main" area: Hot/Warm/Cold node
+and Hot/Warm/Cold data.
+
+- Hot node contains direct node blocks of directories.
+- Warm node contains direct node blocks except hot node blocks.
+- Cold node contains indirect node blocks
+- Hot data contains dentry blocks
+- Warm data contains data blocks except hot and cold data blocks
+- Cold data contains multimedia data or migrated data blocks
+
+LFS has two schemes for free space management: threaded log and
+copy-and-compaction. The copy-and-compaction scheme, which is known as cleaning,
+is well-suited for devices showing very good sequential write performance, since
+free segments are served all the time for writing new data. However, it suffers
+from cleaning overhead under high utilization. Conversely, the threaded log
+scheme suffers from random writes, but no cleaning process is needed. F2FS
+adopts a hybrid scheme where the copy-and-compaction scheme is adopted by
+default, but the policy is dynamically changed to the threaded log scheme
+according to the file system status.
+
+In order to align F2FS with underlying flash-based storage, F2FS allocates a
+segment in a unit of section. F2FS expects that the section size would be the
+same as the unit size of garbage collection in FTL. Furthermore, with respect
+to the mapping granularity in FTL, F2FS allocates each section of the active
+logs from different zones as much as possible, since FTL can write the data in
+the active logs into one allocation unit according to its mapping granularity.
+
+Cleaning process
+----------------
+
+F2FS does cleaning both on demand and in the background. On-demand cleaning is
+triggered when there are not enough free segments to serve VFS calls. Background
+cleaner is operated by a kernel thread, and triggers the cleaning job when the
+system is idle.
+
+F2FS supports two victim selection policies: greedy and cost-benefit algorithms.
+In the greedy algorithm, F2FS selects a victim segment having the smallest number
+of valid blocks. In the cost-benefit algorithm, F2FS selects a victim segment
+according to the segment age and the number of valid blocks in order to address
+log block thrashing problem in the greedy algorithm. F2FS adopts the greedy
+algorithm for on-demand cleaner, while background cleaner adopts cost-benefit
+algorithm.
+
+In order to identify whether the data in the victim segment are valid or not,
+F2FS manages a bitmap. Each bit represents the validity of a block, and the
+bitmap is composed of a bit stream covering all blocks in the main area.
--
1.7.9.5
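
For readers who want to check the per-file limit quoted in the "Index Structure"
section of f2fs.txt, here is a minimal userspace sketch of that arithmetic. The
constants are the ones given in the document (927 pointers in the inode, 1018
per direct or indirect node block, 4KB blocks). The helper names
max_file_blocks() and max_file_bytes() are made up for this illustration and
are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted from the f2fs.txt text in this patch. */
#define ADDRS_PER_INODE 927ULL	/* data pointers held directly in the inode */
#define ADDRS_PER_BLOCK 1018ULL	/* data pointers in one direct node block */
#define NIDS_PER_BLOCK  1018ULL	/* node ids in one indirect node block */
#define F2FS_BLKSIZE    4096ULL	/* block size in bytes */

/* Hypothetical helper: number of data blocks one inode can address. */
uint64_t max_file_blocks(void)
{
	/* two direct node pointers in the inode */
	uint64_t direct = 2 * ADDRS_PER_BLOCK;
	/* two indirect node pointers, each fanning out to direct nodes */
	uint64_t indirect = 2 * NIDS_PER_BLOCK * ADDRS_PER_BLOCK;
	/* one double indirect node pointer */
	uint64_t dindirect = NIDS_PER_BLOCK * NIDS_PER_BLOCK * ADDRS_PER_BLOCK;

	return ADDRS_PER_INODE + direct + indirect + dindirect;
}

/* Hypothetical helper: the same limit expressed in bytes. */
uint64_t max_file_bytes(void)
{
	uint64_t blocks = max_file_blocks();

	/* guard against overflow before multiplying by the block size */
	assert(blocks <= UINT64_MAX / F2FS_BLKSIZE);
	return blocks * F2FS_BLKSIZE;
}
```

Dividing max_file_bytes() by 2^40 gives roughly 3.94, which matches the 3.94TB
figure in the document when TB is read as a binary terabyte.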




---
Jaegeuk Kim
Samsung
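
The multi-level hash table rules in the "Directory Structure" section of
f2fs.txt can be sketched in plain C as below. This is only an illustration of
the three formulas in the document (blocks per bucket, buckets per level, and
the bucket to scan), not the code from fs/f2fs/dir.c. The value used for
MAX_DIR_HASH_DEPTH is an assumption picked for the example, since the text does
not state it, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration only; the document does not give it. */
#define MAX_DIR_HASH_DEPTH 64u

/* Number of hash buckets in level n, following the formula in f2fs.txt. */
unsigned int dir_buckets(unsigned int level)
{
	assert(level <= MAX_DIR_HASH_DEPTH);
	if (level < MAX_DIR_HASH_DEPTH / 2)
		return 1u << level;
	return 1u << ((MAX_DIR_HASH_DEPTH / 2) - 1);
}

/* Number of dentry blocks in each bucket of level n: 2 or 4. */
unsigned int bucket_blocks(unsigned int level)
{
	return level < MAX_DIR_HASH_DEPTH / 2 ? 2u : 4u;
}

/* Bucket to scan in level n: (hash value) % (# of buckets in level n). */
unsigned int bucket_to_scan(uint32_t hash, unsigned int level)
{
	return hash % dir_buckets(level);
}
```

A lookup would walk the levels in order and, in each one, scan only the
bucket_blocks(level) blocks of bucket bucket_to_scan(hash, level). That is why
the document describes the cost as logarithmic in the number of files.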


2012-10-23 02:26:06

by Jaegeuk Kim

Subject: [PATCH 02/16 v2] f2fs: add on-disk layout

This adds a header file describing the on-disk layout of f2fs.

Signed-off-by: Changman Lee <[email protected]>
Signed-off-by: Chul Lee <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
include/linux/f2fs_fs.h | 362 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 362 insertions(+)
create mode 100644 include/linux/f2fs_fs.h

diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
new file mode 100644
index 0000000..bd9c217
--- /dev/null
+++ b/include/linux/f2fs_fs.h
@@ -0,0 +1,362 @@
+/**
+ * include/linux/f2fs_fs.h
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _LINUX_F2FS_FS_H
+#define _LINUX_F2FS_FS_H
+
+#include <linux/pagemap.h>
+#include <linux/types.h>
+
+#define F2FS_SUPER_MAGIC 0xF2F52010
+#define F2FS_SUPER_OFFSET 0 /* start sector # for sb */
+#define F2FS_BLKSIZE 4096
+#define F2FS_MAX_EXTENSION 64
+
+#define NULL_ADDR 0x0U
+#define NEW_ADDR -1U
+
+#define F2FS_ROOT_INO(sbi) ((sbi)->root_ino_num)
+#define F2FS_NODE_INO(sbi) ((sbi)->node_ino_num)
+#define F2FS_META_INO(sbi) ((sbi)->meta_ino_num)
+
+#define GFP_F2FS_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_ZERO)
+
+#define MAX_ACTIVE_LOGS 16
+#define MAX_ACTIVE_NODE_LOGS 8
+#define MAX_ACTIVE_DATA_LOGS 8
+
+/*
+ * For superblock
+ */
+struct f2fs_super_block {
+ __le32 magic; /* Magic Number */
+ __le16 major_ver; /* Major Version */
+ __le16 minor_ver; /* Minor Version */
+ __le32 log_sectorsize; /* log2 (Sector size in bytes) */
+ __le32 log_sectors_per_block; /* log2 (Number of sectors per block) */
+ __le32 log_blocksize; /* log2 (Block size in bytes) */
+ __le32 log_blocks_per_seg; /* log2 (Number of blocks per segment) */
+ __le32 segs_per_sec; /* Number of segments per section */
+ __le32 secs_per_zone; /* Number of sections per zone */
+ __le32 checksum_offset; /* Checksum position in this super block */
+ __le64 block_count; /* Total number of blocks */
+ __le32 section_count; /* Total number of sections */
+ __le32 segment_count; /* Total number of segments */
+ __le32 segment_count_ckpt; /* Total number of segments
+ in Checkpoint area */
+ __le32 segment_count_sit; /* Total number of segments
+ in Segment information table */
+ __le32 segment_count_nat; /* Total number of segments
+ in Node address table */
+ /* Total number of segments in Segment summary area */
+ __le32 segment_count_ssa;
+ /* Total number of segments in Main area */
+ __le32 segment_count_main;
+ __le32 failure_safe_block_distance;
+ __le32 segment0_blkaddr; /* Start block address of Segment 0 */
+ __le32 start_segment_checkpoint; /* Start block address of ckpt */
+ __le32 sit_blkaddr; /* Start block address of SIT */
+ __le32 nat_blkaddr; /* Start block address of NAT */
+ __le32 ssa_blkaddr; /* Start block address of SSA */
+ __le32 main_blkaddr; /* Start block address of Main area */
+ __le32 root_ino; /* Root directory inode number */
+ __le32 node_ino; /* node inode number */
+ __le32 meta_ino; /* meta inode number */
+ __le32 volume_serial_number; /* VSN is optional field */
+ __le16 volume_name[512]; /* Volume Name */
+ __le32 extension_count;
+ __u8 extension_list[F2FS_MAX_EXTENSION][8]; /* extension array */
+} __packed;
+
+/*
+ * For checkpoint
+ */
+struct f2fs_checkpoint {
+ __le64 checkpoint_ver; /* Checkpoint block version number */
+ __le64 user_block_count; /* # of user blocks */
+ __le64 valid_block_count; /* # of valid blocks in Main area */
+ __le32 rsvd_segment_count; /* # of reserved segments for gc */
+ __le32 overprov_segment_count; /* # of overprovision segments */
+ __le32 free_segment_count; /* # of free segments in Main area */
+
+ /* information of current node segments */
+ __le32 cur_node_segno[MAX_ACTIVE_NODE_LOGS];
+ __le16 cur_node_blkoff[MAX_ACTIVE_NODE_LOGS];
+ __le16 nat_upd_blkoff[MAX_ACTIVE_NODE_LOGS];
+ /* information of current data segments */
+ __le32 cur_data_segno[MAX_ACTIVE_DATA_LOGS];
+ __le16 cur_data_blkoff[MAX_ACTIVE_DATA_LOGS];
+ __le32 ckpt_flags; /* Flags : umount and journal_present */
+ __le32 cp_pack_total_block_count;
+ __le32 cp_pack_start_sum; /* start block number of data summary */
+ __le32 valid_node_count; /* Total number of valid nodes */
+ __le32 valid_inode_count; /* Total number of valid inodes */
+ __le32 next_free_nid; /* Next free node number */
+ __le32 sit_ver_bitmap_bytesize; /* Default value 64 */
+ __le32 nat_ver_bitmap_bytesize; /* Default value 256 */
+ __le32 checksum_offset; /* Checksum position
+ in this checkpoint block */
+ __le64 elapsed_time; /* elapsed time while partition
+ is mounted */
+ /* allocation type of current segment */
+ unsigned char alloc_type[MAX_ACTIVE_LOGS];
+
+ /* SIT and NAT version bitmap */
+ unsigned char sit_nat_version_bitmap[1];
+} __packed;
+
+/*
+ * For orphan inode management
+ */
+#define F2FS_ORPHANS_PER_BLOCK 1020
+
+struct f2fs_orphan_block {
+ __le32 ino[F2FS_ORPHANS_PER_BLOCK]; /* inode numbers */
+ __le32 reserved;
+ __le16 blk_addr; /* block index in current CP */
+ __le16 blk_count; /* Number of orphan inode blocks in CP */
+ __le32 entry_count; /* Total number of orphan nodes in current CP */
+ __le32 check_sum; /* CRC32 for orphan inode block */
+} __packed;
+
+/*
+ * For NODE structure
+ */
+struct f2fs_extent {
+ __le32 fofs;
+ __le32 blk_addr;
+ __le32 len;
+} __packed;
+
+#define F2FS_MAX_NAME_LEN 256
+#define ADDRS_PER_INODE 927 /* Address Pointers in an Inode */
+#define ADDRS_PER_BLOCK 1018 /* Address Pointers in a Direct Block */
+#define NIDS_PER_BLOCK 1018 /* Node IDs in an Indirect Block */
+
+struct f2fs_inode {
+ __le16 i_mode; /* File mode */
+ __u8 i_advise; /* File hints */
+ __u8 i_reserved; /* Reserved */
+ __le32 i_uid; /* User ID */
+ __le32 i_gid; /* Group ID */
+ __le32 i_links; /* Links count */
+ __le64 i_size; /* File size in bytes */
+ __le64 i_blocks; /* File size in blocks */
+ __le64 i_ctime; /* Inode change time */
+ __le64 i_mtime; /* Modification time */
+ __le32 i_ctime_nsec;
+ __le32 i_mtime_nsec;
+ __le32 current_depth;
+ __le32 i_xattr_nid; /* nid to save xattr */
+ __le32 i_flags; /* file attributes */
+ __le32 i_pino; /* parent inode number */
+ __le32 i_namelen; /* file name length */
+ __u8 i_name[F2FS_MAX_NAME_LEN]; /* file name for SPOR */
+
+ struct f2fs_extent i_ext; /* caching a largest extent */
+
+ __le32 i_addr[ADDRS_PER_INODE]; /* Pointers to data blocks */
+
+ __le32 i_nid[5]; /* direct(2), indirect(2),
+ double_indirect(1) node id */
+} __packed;
+
+struct direct_node {
+ __le32 addr[ADDRS_PER_BLOCK]; /* array of data block address */
+} __packed;
+
+struct indirect_node {
+ __le32 nid[NIDS_PER_BLOCK]; /* array of node IDs */
+} __packed;
+
+enum {
+ COLD_BIT_SHIFT = 0,
+ FSYNC_BIT_SHIFT,
+ DENT_BIT_SHIFT,
+ OFFSET_BIT_SHIFT
+};
+
+struct node_footer {
+ __le32 nid; /* node id */
+ __le32 ino; /* inode number */
+ __le32 flag; /* include cold/fsync/dentry marks and offset */
+ __le64 cp_ver; /* checkpoint version */
+ __le32 next_blkaddr; /* next node page block address */
+} __packed;
+
+struct f2fs_node {
+ union {
+ struct f2fs_inode i;
+ struct direct_node dn;
+ struct indirect_node in;
+ };
+ struct node_footer footer;
+} __packed;
+
+/*
+ * For NAT entries
+ */
+#define NAT_ENTRY_PER_BLOCK (PAGE_CACHE_SIZE / sizeof(struct f2fs_nat_entry))
+
+struct f2fs_nat_entry {
+ __u8 version;
+ __le32 ino;
+ __le32 block_addr;
+} __packed;
+
+struct f2fs_nat_block {
+ struct f2fs_nat_entry entries[NAT_ENTRY_PER_BLOCK];
+} __packed;
+
+/*
+ * For SIT entries
+ */
+#define SIT_VBLOCK_MAP_SIZE 64
+#define SIT_ENTRY_PER_BLOCK (PAGE_CACHE_SIZE / sizeof(struct f2fs_sit_entry))
+
+struct f2fs_sit_entry {
+ __le16 vblocks;
+ __u8 valid_map[SIT_VBLOCK_MAP_SIZE];
+ __le64 mtime;
+} __packed;
+
+struct f2fs_sit_block {
+ struct f2fs_sit_entry entries[SIT_ENTRY_PER_BLOCK];
+} __packed;
+
+/**
+ * For segment summary
+ *
+ * NOTE : For initializing fields, you must use set_summary
+ *
+ * - If data page, nid represents dnode's nid
+ * - If node page, nid represents the node page's nid.
+ *
+ * The ofs_in_node is used only for data pages. It is the offset from the
+ * beginning of the node page at which the data block address is stored.
+ * ex) data_blkaddr = (block_t)(nodepage_start_address + ofs_in_node)
+ */
+struct f2fs_summary {
+ __le32 nid;
+ union {
+ __u8 reserved[3];
+ struct {
+ __u8 version;
+ __le16 ofs_in_node;
+ } __packed;
+ };
+} __packed;
+
+struct summary_footer {
+ unsigned char entry_type;
+ __u32 check_sum;
+} __packed;
+
+#define SUMMARY_SIZE (sizeof(struct f2fs_summary))
+#define SUM_FOOTER_SIZE (sizeof(struct summary_footer))
+#define ENTRIES_IN_SUM 512
+#define SUM_ENTRY_SIZE (SUMMARY_SIZE * ENTRIES_IN_SUM)
+#define SUM_JOURNAL_SIZE (PAGE_CACHE_SIZE - SUM_FOOTER_SIZE -\
+ SUM_ENTRY_SIZE)
+struct nat_journal_entry {
+ __le32 nid;
+ struct f2fs_nat_entry ne;
+} __packed;
+
+struct sit_journal_entry {
+ __le32 segno;
+ struct f2fs_sit_entry se;
+} __packed;
+
+#define NAT_JOURNAL_ENTRIES ((SUM_JOURNAL_SIZE - 2) /\
+ sizeof(struct nat_journal_entry))
+#define NAT_JOURNAL_RESERVED ((SUM_JOURNAL_SIZE - 2) %\
+ sizeof(struct nat_journal_entry))
+#define SIT_JOURNAL_ENTRIES ((SUM_JOURNAL_SIZE - 2) /\
+ sizeof(struct sit_journal_entry))
+#define SIT_JOURNAL_RESERVED ((SUM_JOURNAL_SIZE - 2) %\
+ sizeof(struct sit_journal_entry))
+enum {
+ NAT_JOURNAL = 0,
+ SIT_JOURNAL
+};
+
+struct nat_journal {
+ struct nat_journal_entry entries[NAT_JOURNAL_ENTRIES];
+ __u8 reserved[NAT_JOURNAL_RESERVED];
+} __packed;
+
+struct sit_journal {
+ struct sit_journal_entry entries[SIT_JOURNAL_ENTRIES];
+ __u8 reserved[SIT_JOURNAL_RESERVED];
+} __packed;
+
+struct f2fs_summary_block {
+ struct f2fs_summary entries[ENTRIES_IN_SUM];
+ union {
+ __le16 n_nats;
+ __le16 n_sits;
+ };
+ union {
+ struct nat_journal nat_j;
+ struct sit_journal sit_j;
+ };
+ struct summary_footer footer;
+} __packed;
+
+/*
+ * For directory operations
+ */
+#define F2FS_DOT_HASH 0
+#define F2FS_DDOT_HASH F2FS_DOT_HASH
+#define F2FS_MAX_HASH (~((0x3ULL) << 62))
+#define F2FS_HASH_COL_BIT ((0x1ULL) << 63)
+
+typedef __le32 f2fs_hash_t;
+
+#define F2FS_NAME_LEN 8
+#define NR_DENTRY_IN_BLOCK 214 /* the number of dentry in a block */
+#define MAX_DIR_HASH_DEPTH 63 /* MAX level for dir lookup */
+
+#define SIZE_OF_DIR_ENTRY 11 /* by byte */
+#define SIZE_OF_DENTRY_BITMAP ((NR_DENTRY_IN_BLOCK + BITS_PER_BYTE - 1) / \
+ BITS_PER_BYTE)
+#define SIZE_OF_RESERVED (PAGE_SIZE - ((SIZE_OF_DIR_ENTRY + \
+ F2FS_NAME_LEN) * \
+ NR_DENTRY_IN_BLOCK + SIZE_OF_DENTRY_BITMAP))
+
+struct f2fs_dir_entry {
+ __le32 hash_code; /* hash code of file name */
+ __le32 ino; /* node number of inode */
+ __le16 name_len; /* length of the file name
+ in unicode characters */
+ __u8 file_type;
+} __packed;
+
+struct f2fs_dentry_block {
+ __u8 dentry_bitmap[SIZE_OF_DENTRY_BITMAP];
+ __u8 reserved[SIZE_OF_RESERVED];
+ struct f2fs_dir_entry dentry[NR_DENTRY_IN_BLOCK];
+ __u8 filename[NR_DENTRY_IN_BLOCK][F2FS_NAME_LEN];
+} __packed;
+
+enum {
+ F2FS_FT_UNKNOWN,
+ F2FS_FT_REG_FILE,
+ F2FS_FT_DIR,
+ F2FS_FT_CHRDEV,
+ F2FS_FT_BLKDEV,
+ F2FS_FT_FIFO,
+ F2FS_FT_SOCK,
+ F2FS_FT_SYMLINK,
+ F2FS_FT_MAX
+};
+
+#endif /* _LINUX_F2FS_FS_H */
--
1.7.9.5




---
Jaegeuk Kim
Samsung


2012-10-23 02:27:12

by Jaegeuk Kim

Subject: [PATCH 03/16 v2] f2fs: add superblock and major in-memory structure

This adds the following major in-memory structures in f2fs.

- f2fs_sb_info:
contains f2fs-specific information, two special inode pointers for node and
meta address spaces, and orphan inode management.

- f2fs_inode_info:
contains vfs_inode and other fs-specific information.

- f2fs_nm_info:
contains node manager information such as NAT entry cache, free nid list,
and NAT page management.

- node_info:
represents a node as node id, inode number, block address, and its version.

- f2fs_sm_info:
contains segment manager information such as SIT entry cache, free segment
map, current active logs, dirty segment management, and segment utilization.
The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
and curseg_info.
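
As an illustration of the f2fs_inode_info item above, the following is a
minimal userspace sketch of the embedding pattern, in which the fs-specific
inode contains the generic inode and an accessor maps a pointer to the
embedded member back to its container. It is not code from this patch. The
names demo_inode, demo_inode_info, demo_container_of and DEMO_I are
hypothetical stand-ins for struct inode, f2fs_inode_info, the kernel's
container_of() and F2FS_I().

```c
#include <stddef.h>

/* Hypothetical stand-in for the VFS inode; not the kernel's struct inode. */
struct demo_inode {
	unsigned long i_ino;
	unsigned long long i_blocks;
};

/* Hypothetical fs-specific inode with the generic inode embedded in it. */
struct demo_inode_info {
	struct demo_inode vfs_inode;	/* embedded object, not a pointer */
	unsigned int i_xattr_nid;	/* example fs-private field */
	unsigned char i_advise;		/* example fs-private hint bits */
};

/* Userspace equivalent of container_of(), written out for this sketch. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the fs-specific inode from a pointer to the embedded inode. */
static inline struct demo_inode_info *DEMO_I(struct demo_inode *inode)
{
	return demo_container_of(inode, struct demo_inode_info, vfs_inode);
}
```

Because the generic inode is embedded rather than pointed to, one allocation
covers both objects and the accessor is only pointer arithmetic.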

Signed-off-by: Chul Lee <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/f2fs.h | 982 +++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/node.h | 330 ++++++++++++++++++
fs/f2fs/segment.h | 594 ++++++++++++++++++++++++++++++++
3 files changed, 1906 insertions(+)
create mode 100644 fs/f2fs/f2fs.h
create mode 100644 fs/f2fs/node.h
create mode 100644 fs/f2fs/segment.h

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
new file mode 100644
index 0000000..bbe2f02
--- /dev/null
+++ b/fs/f2fs/f2fs.h
@@ -0,0 +1,982 @@
+/**
+ * fs/f2fs/f2fs.h
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _LINUX_F2FS_H
+#define _LINUX_F2FS_H
+
+#include <linux/types.h>
+#include <linux/page-flags.h>
+#include <linux/buffer_head.h>
+#include <linux/version.h>
+#include <linux/slab.h>
+#include <linux/crc32.h>
+
+/**
+ * For mount options
+ */
+#define F2FS_MOUNT_BG_GC 0x00000001
+#define F2FS_MOUNT_DISABLE_ROLL_FORWARD 0x00000002
+#define F2FS_MOUNT_DISCARD 0x00000004
+#define F2FS_MOUNT_NOHEAP 0x00000008
+#define F2FS_MOUNT_XATTR_USER 0x00000010
+#define F2FS_MOUNT_POSIX_ACL 0x00000020
+#define F2FS_MOUNT_DISABLE_EXT_IDENTIFY 0x00000040
+
+#define clear_opt(sbi, option) (sbi->mount_opt.opt &= ~F2FS_MOUNT_##option)
+#define set_opt(sbi, option) (sbi->mount_opt.opt |= F2FS_MOUNT_##option)
+#define test_opt(sbi, option) (sbi->mount_opt.opt & F2FS_MOUNT_##option)
+
+#define ver_after(a, b) (typecheck(unsigned long long, a) && \
+ typecheck(unsigned long long, b) && \
+ ((long long)((a) - (b)) > 0))
+
+typedef u64 block_t;
+typedef u32 nid_t;
+
+struct f2fs_mount_info {
+ unsigned int opt;
+};
+
+static inline __u32 f2fs_crc32(void *buff, size_t len)
+{
+ return crc32_le(F2FS_SUPER_MAGIC, buff, len);
+}
+
+static inline bool f2fs_crc_valid(__u32 blk_crc, void *buff, size_t buff_size)
+{
+ return f2fs_crc32(buff, buff_size) == blk_crc;
+}
+
+/**
+ * For checkpoint manager
+ */
+#define CP_ERROR_FLAG 0x00000008
+#define CP_COMPACT_SUM_FLAG 0x00000004
+#define CP_ORPHAN_PRESENT_FLAG 0x00000002
+#define CP_UMOUNT_FLAG 0x00000001
+
+enum {
+ NAT_BITMAP,
+ SIT_BITMAP
+};
+
+struct orphan_inode_entry {
+ struct list_head list;
+ nid_t ino;
+};
+
+struct dir_inode_entry {
+ struct list_head list;
+ struct inode *inode;
+};
+
+struct fsync_inode_entry {
+ struct list_head list;
+ struct inode *inode;
+ block_t blkaddr;
+};
+
+#define nats_in_cursum(sum) (le16_to_cpu(sum->n_nats))
+#define sits_in_cursum(sum) (le16_to_cpu(sum->n_sits))
+
+#define nat_in_journal(sum, i) (sum->nat_j.entries[i].ne)
+#define nid_in_journal(sum, i) (sum->nat_j.entries[i].nid)
+#define sit_in_journal(sum, i) (sum->sit_j.entries[i].se)
+#define segno_in_journal(sum, i) (sum->sit_j.entries[i].segno)
+
+static inline int update_nats_in_cursum(struct f2fs_summary_block *rs, int i)
+{
+ int before = nats_in_cursum(rs);
+ rs->n_nats = cpu_to_le16(before + i);
+ return before;
+}
+
+static inline int update_sits_in_cursum(struct f2fs_summary_block *rs, int i)
+{
+ int before = sits_in_cursum(rs);
+ rs->n_sits = cpu_to_le16(before + i);
+ return before;
+}
+
+/**
+ * For INODE and NODE manager
+ */
+#define XATTR_NODE_OFFSET (-1)
+#define RDONLY_NODE 1
+#define F2FS_LINK_MAX 32000
+
+struct extent_info {
+ rwlock_t ext_lock;
+ unsigned int fofs;
+ u32 blk_addr;
+ unsigned int len;
+};
+
+#define FADVISE_COLD_BIT 0x01
+/*
+ * i_advise uses FADVISE_XXX_BIT. We can add additional hints later.
+ */
+struct f2fs_inode_info {
+ struct inode vfs_inode;
+ unsigned long i_flags;
+ unsigned long flags;
+ unsigned long long data_version;
+ atomic_t dirty_dents;
+ unsigned int current_depth;
+ f2fs_hash_t chash;
+ unsigned int clevel;
+ nid_t i_xattr_nid;
+ struct extent_info ext;
+ umode_t i_acl_mode;
+ unsigned char i_advise; /* hint bits; FADVISE_COLD_BIT marks cold data */
+};
+
+static inline void get_extent_info(struct extent_info *ext,
+ struct f2fs_extent i_ext)
+{
+ write_lock(&ext->ext_lock);
+ ext->fofs = le32_to_cpu(i_ext.fofs);
+ ext->blk_addr = le32_to_cpu(i_ext.blk_addr);
+ ext->len = le32_to_cpu(i_ext.len);
+ write_unlock(&ext->ext_lock);
+}
+
+static inline void set_raw_extent(struct extent_info *ext,
+ struct f2fs_extent *i_ext)
+{
+ read_lock(&ext->ext_lock);
+ i_ext->fofs = cpu_to_le32(ext->fofs);
+ i_ext->blk_addr = cpu_to_le32(ext->blk_addr);
+ i_ext->len = cpu_to_le32(ext->len);
+ read_unlock(&ext->ext_lock);
+}
+
+struct f2fs_nm_info {
+ block_t nat_blkaddr; /* base disk address of NAT */
+ unsigned int nat_segs; /* the number of nat segments */
+ unsigned int nat_blocks; /* the number of nat blocks of
+ one size */
+ nid_t max_nid;
+ unsigned int nat_cnt; /* the number of nodes in NAT Buffer */
+ struct radix_tree_root nat_root;
+ rwlock_t nat_tree_lock; /* Protect nat_root */
+ struct list_head nat_entries; /* cached nat entry list (clean) */
+ struct list_head dirty_nat_entries; /* cached nat entry list (dirty) */
+
+ unsigned int fcnt; /* the number of free node id */
+ struct mutex build_lock; /* lock for build free nids */
+
+ int nat_upd_blkoff[3]; /* Block offset
+ in the current journal segment
+ where the last NAT update happened */
+ int lst_upd_blkoff[3]; /* Block offset
+ in current journal segment */
+
+ unsigned int written_valid_node_count;
+ unsigned int written_valid_inode_count;
+ char *nat_bitmap; /* NAT bitmap pointer */
+ int bitmap_size; /* bitmap size */
+
+ nid_t init_scan_nid; /* the first nid to be scanned */
+ nid_t next_scan_nid; /* the next nid to be scanned */
+ struct list_head free_nid_list;
+ spinlock_t free_nid_list_lock; /* Protect free nid list */
+};
+
+struct dnode_of_data {
+ struct inode *inode;
+ struct page *inode_page;
+ struct page *node_page;
+ nid_t nid;
+ unsigned int ofs_in_node;
+ bool inode_page_locked;
+ block_t data_blkaddr;
+};
+
+static inline void set_new_dnode(struct dnode_of_data *dn, struct inode *inode,
+ struct page *ipage, struct page *npage, nid_t nid)
+{
+ dn->inode = inode;
+ dn->inode_page = ipage;
+ dn->node_page = npage;
+ dn->nid = nid;
+ dn->inode_page_locked = 0;
+}
+
+/**
+ * For SIT manager
+ */
+#define NR_CURSEG_DATA_TYPE (3)
+#define NR_CURSEG_NODE_TYPE (3)
+#define NR_CURSEG_TYPE (NR_CURSEG_DATA_TYPE + NR_CURSEG_NODE_TYPE)
+
+enum {
+ CURSEG_HOT_DATA = 0,
+ CURSEG_WARM_DATA,
+ CURSEG_COLD_DATA,
+ CURSEG_HOT_NODE,
+ CURSEG_WARM_NODE,
+ CURSEG_COLD_NODE,
+ NO_CHECK_TYPE
+};
+
+struct f2fs_sm_info {
+ /* SIT information */
+ struct sit_info *sit_info;
+
+ /* Free segmap information */
+ struct free_segmap_info *free_info;
+
+ /* Dirty segments list information for GC victim */
+ struct dirty_seglist_info *dirty_info;
+
+ /* Current working segments(i.e. logging point) information array */
+ struct curseg_info *curseg_array;
+
+ /* list head of all under-writeback pages for flush handling */
+ struct list_head wblist_head;
+ spinlock_t wblist_lock;
+
+ block_t seg0_blkaddr;
+ block_t main_blkaddr;
+ unsigned int segment_count;
+ unsigned int rsvd_segment_count;
+ unsigned int main_segment_count;
+ block_t ssa_blkaddr;
+ unsigned int segment_count_ssa;
+};
+
+/**
+ * For Garbage Collection
+ */
+struct f2fs_gc_info {
+#ifdef CONFIG_F2FS_STAT_FS
+ struct list_head stat_list;
+ struct f2fs_stat_info *stat_info;
+#endif
+ int cause;
+ int rsvd_segment_count;
+ int overp_segment_count;
+};
+
+/**
+ * For directory operation
+ */
+#define F2FS_INODE_SIZE (17 * 4 + F2FS_MAX_NAME_LEN)
+#define NODE_DIR1_BLOCK (ADDRS_PER_INODE + 1)
+#define NODE_DIR2_BLOCK (ADDRS_PER_INODE + 2)
+#define NODE_IND1_BLOCK (ADDRS_PER_INODE + 3)
+#define NODE_IND2_BLOCK (ADDRS_PER_INODE + 4)
+#define NODE_DIND_BLOCK (ADDRS_PER_INODE + 5)
+
+/**
+ * For superblock
+ */
+enum count_type {
+ F2FS_WRITEBACK,
+ F2FS_DIRTY_DENTS,
+ F2FS_DIRTY_NODES,
+ F2FS_DIRTY_META,
+ NR_COUNT_TYPE,
+};
+
+/*
+ * FS_LOCK nesting subclasses for the lock validator:
+ *
+ * The locking order between these classes is
+ * RENAME -> DENTRY_OPS -> DATA_WRITE -> DATA_NEW
+ * -> DATA_TRUNC -> NODE_WRITE -> NODE_NEW -> NODE_TRUNC
+ */
+enum lock_type {
+ RENAME, /* for renaming operations */
+ DENTRY_OPS, /* for directory operations */
+ DATA_WRITE, /* for data write */
+ DATA_NEW, /* for data allocation */
+ DATA_TRUNC, /* for data truncate */
+ NODE_NEW, /* for node allocation */
+ NODE_TRUNC, /* for node truncate */
+ NODE_WRITE, /* for node write */
+ NR_LOCK_TYPE,
+};
+
+/*
+ * The below are the page types of bios used in submit_bio().
+ * The available types are:
+ * DATA User data pages. It operates as async mode.
+ * NODE Node pages. It operates as async mode.
+ * META FS metadata pages such as SIT, NAT, CP.
+ * NR_PAGE_TYPE The number of page types.
+ * META_FLUSH Make sure the previous pages are written,
+ * waiting for the bio's completion
+ * ... Can only be used with META.
+ */
+enum page_type {
+ DATA,
+ NODE,
+ META,
+ NR_PAGE_TYPE,
+ META_FLUSH,
+};
+
+struct f2fs_sb_info {
+ struct super_block *sb; /* Pointer to VFS super block */
+ int s_dirty;
+ struct f2fs_super_block *raw_super; /* Pointer to the super block
+ in the buffer */
+ struct buffer_head *raw_super_buf; /* Buffer containing
+ the f2fs raw super block */
+ struct f2fs_checkpoint *ckpt; /* Pointer to the checkpoint
+ in the buffer */
+ struct mutex orphan_inode_mutex;
+ spinlock_t dir_inode_lock;
+ struct mutex cp_mutex;
+ /* orphan Inode list to be written in Journal block during CP */
+ struct list_head orphan_inode_list;
+ struct list_head dir_inode_list;
+ unsigned int n_orphans, n_dirty_dirs;
+
+ unsigned int log_sectorsize;
+ unsigned int log_sectors_per_block;
+ unsigned int log_blocksize;
+ unsigned int blocksize;
+ unsigned int root_ino_num; /* Root Inode Number*/
+ unsigned int node_ino_num; /* Node Inode Number */
+ unsigned int meta_ino_num; /* Meta Inode Number */
+ unsigned int log_blocks_per_seg;
+ unsigned int blocks_per_seg;
+ unsigned int segs_per_sec;
+ unsigned int secs_per_zone;
+ unsigned int total_sections;
+ unsigned int total_node_count;
+ unsigned int total_valid_node_count;
+ unsigned int total_valid_inode_count;
+ unsigned int segment_count[2];
+ unsigned int block_count[2];
+ unsigned int last_victim[2];
+ int active_logs;
+ block_t user_block_count;
+ block_t total_valid_block_count;
+ block_t alloc_valid_block_count;
+ block_t last_valid_block_count;
+ atomic_t nr_pages[NR_COUNT_TYPE];
+
+ struct f2fs_mount_info mount_opt;
+
+ /* related to NM */
+ struct f2fs_nm_info *nm_info; /* Node Manager information */
+
+ /* related to SM */
+ struct f2fs_sm_info *sm_info; /* Segment Manager
+ information */
+ int total_hit_ext, read_hit_ext;
+ int rr_flush;
+
+ /* related to GC */
+ struct proc_dir_entry *s_proc;
+ struct f2fs_gc_info *gc_info; /* Garbage Collector
+ information */
+ struct mutex gc_mutex; /* mutex for GC */
+ struct mutex fs_lock[NR_LOCK_TYPE]; /* mutex for GP */
+ struct mutex write_inode; /* mutex for write inode */
+ struct mutex writepages; /* mutex for writepages() */
+ struct f2fs_gc_kthread *gc_thread; /* GC thread */
+ int bg_gc;
+ int last_gc_status;
+ int por_doing;
+
+ struct inode *node_inode;
+ struct inode *meta_inode;
+
+ struct bio *bio[NR_PAGE_TYPE];
+ sector_t last_block_in_bio[NR_PAGE_TYPE];
+ struct rw_semaphore bio_sem;
+ spinlock_t stat_lock; /* lock for handling the number
+ of valid blocks and
+ valid nodes */
+};
+
+/**
+ * Inline functions
+ */
+static inline struct f2fs_inode_info *F2FS_I(struct inode *inode)
+{
+ return container_of(inode, struct f2fs_inode_info, vfs_inode);
+}
+
+static inline struct f2fs_sb_info *F2FS_SB(struct super_block *sb)
+{
+ return sb->s_fs_info;
+}
+
+static inline struct f2fs_super_block *F2FS_RAW_SUPER(struct f2fs_sb_info *sbi)
+{
+ return (struct f2fs_super_block *)(sbi->raw_super);
+}
+
+static inline struct f2fs_checkpoint *F2FS_CKPT(struct f2fs_sb_info *sbi)
+{
+ return (struct f2fs_checkpoint *)(sbi->ckpt);
+}
+
+static inline struct f2fs_nm_info *NM_I(struct f2fs_sb_info *sbi)
+{
+ return (struct f2fs_nm_info *)(sbi->nm_info);
+}
+
+static inline struct f2fs_sm_info *SM_I(struct f2fs_sb_info *sbi)
+{
+ return (struct f2fs_sm_info *)(sbi->sm_info);
+}
+
+static inline struct sit_info *SIT_I(struct f2fs_sb_info *sbi)
+{
+ return (struct sit_info *)(SM_I(sbi)->sit_info);
+}
+
+static inline struct free_segmap_info *FREE_I(struct f2fs_sb_info *sbi)
+{
+ return (struct free_segmap_info *)(SM_I(sbi)->free_info);
+}
+
+static inline struct dirty_seglist_info *DIRTY_I(struct f2fs_sb_info *sbi)
+{
+ return (struct dirty_seglist_info *)(SM_I(sbi)->dirty_info);
+}
+
+static inline void F2FS_SET_SB_DIRT(struct f2fs_sb_info *sbi)
+{
+ sbi->s_dirty = 1;
+}
+
+static inline void F2FS_RESET_SB_DIRT(struct f2fs_sb_info *sbi)
+{
+ sbi->s_dirty = 0;
+}
+
+static inline void mutex_lock_op(struct f2fs_sb_info *sbi, enum lock_type t)
+{
+ mutex_lock_nested(&sbi->fs_lock[t], t);
+}
+
+static inline void mutex_unlock_op(struct f2fs_sb_info *sbi, enum lock_type t)
+{
+ mutex_unlock(&sbi->fs_lock[t]);
+}
+
+/**
+ * Check whether the given nid is within node id range.
+ */
+static inline void check_nid_range(struct f2fs_sb_info *sbi, nid_t nid)
+{
+ BUG_ON((nid >= NM_I(sbi)->max_nid));
+}
+
+#define F2FS_DEFAULT_ALLOCATED_BLOCKS 1
+
+/**
+ * Check whether the inode has blocks or not
+ */
+static inline int F2FS_HAS_BLOCKS(struct inode *inode)
+{
+ if (F2FS_I(inode)->i_xattr_nid)
+ return (inode->i_blocks > F2FS_DEFAULT_ALLOCATED_BLOCKS + 1);
+ else
+ return (inode->i_blocks > F2FS_DEFAULT_ALLOCATED_BLOCKS);
+}
+
+static inline bool inc_valid_block_count(struct f2fs_sb_info *sbi,
+ struct inode *inode, blkcnt_t count)
+{
+ block_t valid_block_count;
+
+ spin_lock(&sbi->stat_lock);
+ valid_block_count =
+ sbi->total_valid_block_count + (block_t)count;
+ if (valid_block_count > sbi->user_block_count) {
+ spin_unlock(&sbi->stat_lock);
+ return false;
+ }
+ inode->i_blocks += count;
+ sbi->total_valid_block_count = valid_block_count;
+ sbi->alloc_valid_block_count += (block_t)count;
+ spin_unlock(&sbi->stat_lock);
+ return true;
+}
+
+static inline int dec_valid_block_count(struct f2fs_sb_info *sbi,
+ struct inode *inode,
+ blkcnt_t count)
+{
+ spin_lock(&sbi->stat_lock);
+ BUG_ON(sbi->total_valid_block_count < (block_t) count);
+ BUG_ON(inode->i_blocks < count);
+ inode->i_blocks -= count;
+ sbi->total_valid_block_count -= (block_t)count;
+ spin_unlock(&sbi->stat_lock);
+ return 0;
+}
+
+static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type)
+{
+ atomic_inc(&sbi->nr_pages[count_type]);
+ F2FS_SET_SB_DIRT(sbi);
+}
+
+static inline void inode_inc_dirty_dents(struct inode *inode)
+{
+ atomic_inc(&F2FS_I(inode)->dirty_dents);
+}
+
+static inline void dec_page_count(struct f2fs_sb_info *sbi, int count_type)
+{
+ atomic_dec(&sbi->nr_pages[count_type]);
+}
+
+static inline void inode_dec_dirty_dents(struct inode *inode)
+{
+ atomic_dec(&F2FS_I(inode)->dirty_dents);
+}
+
+static inline int get_pages(struct f2fs_sb_info *sbi, int count_type)
+{
+ return atomic_read(&sbi->nr_pages[count_type]);
+}
+
+static inline block_t valid_user_blocks(struct f2fs_sb_info *sbi)
+{
+ block_t ret;
+ spin_lock(&sbi->stat_lock);
+ ret = sbi->total_valid_block_count;
+ spin_unlock(&sbi->stat_lock);
+ return ret;
+}
+
+static inline unsigned long __bitmap_size(struct f2fs_sb_info *sbi, int flag)
+{
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+
+ /* return the NAT or SIT bitmap size */
+ if (flag == NAT_BITMAP)
+ return le32_to_cpu(ckpt->nat_ver_bitmap_bytesize);
+ else if (flag == SIT_BITMAP)
+ return le32_to_cpu(ckpt->sit_ver_bitmap_bytesize);
+
+ return 0;
+}
+
+static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
+{
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ int offset = (flag == NAT_BITMAP) ? le32_to_cpu(ckpt->sit_ver_bitmap_bytesize) : 0;
+ return &ckpt->sit_nat_version_bitmap + offset;
+}
+
+static inline block_t __start_cp_addr(struct f2fs_sb_info *sbi)
+{
+ block_t start_addr;
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ unsigned long long ckpt_version = le64_to_cpu(ckpt->checkpoint_ver);
+
+ start_addr = le64_to_cpu(F2FS_RAW_SUPER(sbi)->start_segment_checkpoint);
+
+ /*
+ * odd numbered checkpoint should be at cp segment 0
+ * and even numbered checkpoint must be at cp segment 1
+ */
+ if (!(ckpt_version & 1))
+ start_addr += sbi->blocks_per_seg;
+
+ return start_addr;
+}
+
+static inline block_t __start_sum_addr(struct f2fs_sb_info *sbi)
+{
+ return le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_start_sum);
+}
+
+static inline bool inc_valid_node_count(struct f2fs_sb_info *sbi,
+ struct inode *inode,
+ unsigned int count)
+{
+ block_t valid_block_count;
+ unsigned int valid_node_count;
+
+ spin_lock(&sbi->stat_lock);
+
+ valid_block_count = sbi->total_valid_block_count + (block_t)count;
+ sbi->alloc_valid_block_count += (block_t)count;
+ valid_node_count = sbi->total_valid_node_count + count;
+
+ if (valid_block_count > sbi->user_block_count) {
+ spin_unlock(&sbi->stat_lock);
+ return false;
+ }
+
+ if (valid_node_count > sbi->total_node_count) {
+ spin_unlock(&sbi->stat_lock);
+ return false;
+ }
+
+ if (inode)
+ inode->i_blocks += count;
+ sbi->total_valid_node_count = valid_node_count;
+ sbi->total_valid_block_count = valid_block_count;
+ spin_unlock(&sbi->stat_lock);
+
+ return true;
+}
+
+static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
+ struct inode *inode,
+ unsigned int count)
+{
+ spin_lock(&sbi->stat_lock);
+
+ BUG_ON(sbi->total_valid_block_count < count);
+ BUG_ON(sbi->total_valid_node_count < count);
+ BUG_ON(inode->i_blocks < count);
+
+ inode->i_blocks -= count;
+ sbi->total_valid_node_count -= count;
+ sbi->total_valid_block_count -= (block_t)count;
+
+ spin_unlock(&sbi->stat_lock);
+}
+
+static inline unsigned int valid_node_count(struct f2fs_sb_info *sbi)
+{
+ unsigned int ret;
+ spin_lock(&sbi->stat_lock);
+ ret = sbi->total_valid_node_count;
+ spin_unlock(&sbi->stat_lock);
+ return ret;
+}
+
+static inline void inc_valid_inode_count(struct f2fs_sb_info *sbi)
+{
+ spin_lock(&sbi->stat_lock);
+ BUG_ON(sbi->total_valid_inode_count == sbi->total_node_count);
+ sbi->total_valid_inode_count++;
+ spin_unlock(&sbi->stat_lock);
+}
+
+static inline int dec_valid_inode_count(struct f2fs_sb_info *sbi)
+{
+ spin_lock(&sbi->stat_lock);
+ BUG_ON(!sbi->total_valid_inode_count);
+ sbi->total_valid_inode_count--;
+ spin_unlock(&sbi->stat_lock);
+ return 0;
+}
+
+static inline unsigned int valid_inode_count(struct f2fs_sb_info *sbi)
+{
+ unsigned int ret;
+ spin_lock(&sbi->stat_lock);
+ ret = sbi->total_valid_inode_count;
+ spin_unlock(&sbi->stat_lock);
+ return ret;
+}
+
+static inline void f2fs_put_page(struct page *page, int unlock)
+{
+ if (!page || IS_ERR(page))
+ return;
+
+ if (unlock) {
+ BUG_ON(!PageLocked(page));
+ unlock_page(page);
+ }
+ page_cache_release(page);
+}
+
+static inline void f2fs_put_dnode(struct dnode_of_data *dn)
+{
+ if (dn->node_page)
+ f2fs_put_page(dn->node_page, 1);
+ if (dn->inode_page && dn->node_page != dn->inode_page)
+ f2fs_put_page(dn->inode_page, 0);
+ dn->node_page = NULL;
+ dn->inode_page = NULL;
+}
+
+static inline struct kmem_cache *f2fs_kmem_cache_create(const char *name,
+ size_t size, void (*ctor)(void *))
+{
+ return kmem_cache_create(name, size, 0, SLAB_RECLAIM_ACCOUNT, ctor);
+}
+
+#define RAW_IS_INODE(p) ((p)->footer.nid == (p)->footer.ino)
+
+static inline bool IS_INODE(struct page *page)
+{
+ struct f2fs_node *p = (struct f2fs_node *)page_address(page);
+ return RAW_IS_INODE(p);
+}
+
+static inline __le32 *blkaddr_in_node(struct f2fs_node *node)
+{
+ return RAW_IS_INODE(node) ? node->i.i_addr : node->dn.addr;
+}
+
+static inline block_t datablock_addr(struct page *node_page,
+ unsigned int offset)
+{
+ struct f2fs_node *raw_node;
+ __le32 *addr_array;
+ raw_node = (struct f2fs_node *)page_address(node_page);
+ addr_array = blkaddr_in_node(raw_node);
+ return le32_to_cpu(addr_array[offset]);
+}
+
+static inline int f2fs_test_bit(unsigned int nr, char *addr)
+{
+ int mask;
+
+ addr += (nr >> 3);
+ mask = 1 << (7 - (nr & 0x07));
+ return mask & *addr;
+}
+
+static inline int f2fs_set_bit(unsigned int nr, char *addr)
+{
+ int mask;
+ int ret;
+
+ addr += (nr >> 3);
+ mask = 1 << (7 - (nr & 0x07));
+ ret = mask & *addr;
+ *addr |= mask;
+ return ret;
+}
+
+static inline int f2fs_clear_bit(unsigned int nr, char *addr)
+{
+ int mask;
+ int ret;
+
+ addr += (nr >> 3);
+ mask = 1 << (7 - (nr & 0x07));
+ ret = mask & *addr;
+ *addr &= ~mask;
+ return ret;
+}
+
+enum {
+ FI_NEW_INODE,
+ FI_NEED_CP,
+ FI_INC_LINK,
+ FI_ACL_MODE,
+ FI_NO_ALLOC,
+};
+
+static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag)
+{
+ set_bit(flag, &fi->flags);
+}
+
+static inline int is_inode_flag_set(struct f2fs_inode_info *fi, int flag)
+{
+ return test_bit(flag, &fi->flags);
+}
+
+static inline void clear_inode_flag(struct f2fs_inode_info *fi, int flag)
+{
+ clear_bit(flag, &fi->flags);
+}
+
+static inline void set_acl_inode(struct f2fs_inode_info *fi, umode_t mode)
+{
+ fi->i_acl_mode = mode;
+ set_inode_flag(fi, FI_ACL_MODE);
+}
+
+static inline int cond_clear_inode_flag(struct f2fs_inode_info *fi, int flag)
+{
+ if (is_inode_flag_set(fi, FI_ACL_MODE)) {
+ clear_inode_flag(fi, FI_ACL_MODE);
+ return 1;
+ }
+ return 0;
+}
+
+/**
+ * file.c
+ */
+int f2fs_sync_file(struct file *, loff_t, loff_t, int);
+void truncate_data_blocks(struct dnode_of_data *);
+void f2fs_truncate(struct inode *);
+int f2fs_setattr(struct dentry *, struct iattr *);
+int truncate_hole(struct inode *, pgoff_t, pgoff_t);
+long f2fs_ioctl(struct file *, unsigned int, unsigned long);
+
+/**
+ * inode.c
+ */
+void f2fs_set_inode_flags(struct inode *);
+struct inode *f2fs_iget_nowait(struct super_block *, unsigned long);
+struct inode *f2fs_iget(struct super_block *, unsigned long);
+void update_inode(struct inode *, struct page *);
+int f2fs_write_inode(struct inode *, struct writeback_control *);
+void f2fs_evict_inode(struct inode *);
+
+/**
+ * dir.c
+ */
+struct f2fs_dir_entry *f2fs_find_entry(struct inode *, struct qstr *,
+ struct page **);
+struct f2fs_dir_entry *f2fs_parent_dir(struct inode *, struct page **);
+void f2fs_set_link(struct inode *, struct f2fs_dir_entry *,
+ struct page *, struct inode *);
+void init_dent_inode(struct dentry *, struct page *);
+int f2fs_add_link(struct dentry *, struct inode *);
+void f2fs_delete_entry(struct f2fs_dir_entry *, struct page *, struct inode *);
+int f2fs_make_empty(struct inode *, struct inode *);
+bool f2fs_empty_dir(struct inode *);
+
+/**
+ * super.c
+ */
+int f2fs_sync_fs(struct super_block *, int);
+
+/**
+ * hash.c
+ */
+f2fs_hash_t f2fs_dentry_hash(const char *, int);
+
+/**
+ * node.c
+ */
+struct dnode_of_data;
+struct node_info;
+
+int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
+void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);
+int get_dnode_of_data(struct dnode_of_data *, pgoff_t, int);
+int truncate_inode_blocks(struct inode *, pgoff_t);
+int remove_inode_page(struct inode *);
+int new_inode_page(struct inode *, struct dentry *);
+struct page *new_node_page(struct dnode_of_data *, unsigned int);
+void ra_node_page(struct f2fs_sb_info *, nid_t);
+struct page *get_node_page(struct f2fs_sb_info *, pgoff_t);
+struct page *get_node_page_ra(struct page *, int);
+void sync_inode_page(struct dnode_of_data *);
+int sync_node_pages(struct f2fs_sb_info *, nid_t, struct writeback_control *);
+bool alloc_nid(struct f2fs_sb_info *, nid_t *);
+void alloc_nid_done(struct f2fs_sb_info *, nid_t);
+void alloc_nid_failed(struct f2fs_sb_info *, nid_t);
+void recover_node_page(struct f2fs_sb_info *, struct page *,
+ struct f2fs_summary *, struct node_info *, block_t);
+int recover_inode_page(struct f2fs_sb_info *, struct page *);
+int restore_node_summary(struct f2fs_sb_info *, unsigned int,
+ struct f2fs_summary_block *);
+void flush_nat_entries(struct f2fs_sb_info *);
+int build_node_manager(struct f2fs_sb_info *);
+void destroy_node_manager(struct f2fs_sb_info *);
+int create_node_manager_caches(void);
+void destroy_node_manager_caches(void);
+
+/**
+ * segment.c
+ */
+void f2fs_balance_fs(struct f2fs_sb_info *);
+void invalidate_blocks(struct f2fs_sb_info *, block_t);
+void locate_dirty_segment(struct f2fs_sb_info *, unsigned int);
+void clear_prefree_segments(struct f2fs_sb_info *);
+int npages_for_summary_flush(struct f2fs_sb_info *);
+void allocate_new_segments(struct f2fs_sb_info *);
+struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
+struct bio *f2fs_bio_alloc(struct block_device *, sector_t, int, gfp_t);
+void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool sync);
+int write_meta_page(struct f2fs_sb_info *, struct page *,
+ struct writeback_control *);
+void write_node_page(struct f2fs_sb_info *, struct page *, unsigned int,
+ block_t, block_t *);
+void write_data_page(struct inode *, struct page *, struct dnode_of_data*,
+ block_t, block_t *);
+void rewrite_data_page(struct f2fs_sb_info *, struct page *, block_t);
+void recover_data_page(struct f2fs_sb_info *, struct page *,
+ struct f2fs_summary *, block_t, block_t);
+void rewrite_node_page(struct f2fs_sb_info *, struct page *,
+ struct f2fs_summary *, block_t, block_t);
+void write_data_summaries(struct f2fs_sb_info *, block_t);
+void write_node_summaries(struct f2fs_sb_info *, block_t);
+int lookup_journal_in_cursum(struct f2fs_summary_block *,
+ int, unsigned int, int);
+void flush_sit_entries(struct f2fs_sb_info *);
+int build_segment_manager(struct f2fs_sb_info *);
+void reset_victim_segmap(struct f2fs_sb_info *);
+void destroy_segment_manager(struct f2fs_sb_info *);
+
+/**
+ * checkpoint.c
+ */
+struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
+struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
+long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
+int check_orphan_space(struct f2fs_sb_info *);
+void add_orphan_inode(struct f2fs_sb_info *, nid_t);
+void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
+int recover_orphan_inodes(struct f2fs_sb_info *);
+int get_valid_checkpoint(struct f2fs_sb_info *);
+void set_dirty_dir_page(struct inode *, struct page *);
+void remove_dirty_dir_inode(struct inode *);
+void sync_dirty_dir_inodes(struct f2fs_sb_info *);
+void block_operations(struct f2fs_sb_info *);
+void write_checkpoint(struct f2fs_sb_info *, bool, bool);
+void init_orphan_info(struct f2fs_sb_info *);
+int create_checkpoint_caches(void);
+void destroy_checkpoint_caches(void);
+
+/**
+ * data.c
+ */
+int reserve_new_block(struct dnode_of_data *);
+void update_extent_cache(block_t, struct dnode_of_data *);
+struct page *find_data_page(struct inode *, pgoff_t);
+struct page *get_lock_data_page(struct inode *, pgoff_t);
+struct page *get_new_data_page(struct inode *, pgoff_t, bool);
+int f2fs_readpage(struct f2fs_sb_info *, struct page *, block_t, int);
+int do_write_data_page(struct page *);
+
+/**
+ * gc.c
+ */
+int start_gc_thread(struct f2fs_sb_info *);
+void stop_gc_thread(struct f2fs_sb_info *);
+block_t start_bidx_of_node(unsigned int);
+int f2fs_gc(struct f2fs_sb_info *, int);
+#ifdef CONFIG_F2FS_STAT_FS
+void f2fs_update_stat(struct f2fs_sb_info *);
+void f2fs_update_gc_metric(struct f2fs_sb_info *);
+int f2fs_stat_init(struct f2fs_sb_info *);
+void f2fs_stat_exit(struct f2fs_sb_info *);
+#endif
+int build_gc_manager(struct f2fs_sb_info *);
+void destroy_gc_manager(struct f2fs_sb_info *);
+int create_gc_caches(void);
+void destroy_gc_caches(void);
+
+/**
+ * recovery.c
+ */
+void recover_fsync_data(struct f2fs_sb_info *);
+bool space_for_roll_forward(struct f2fs_sb_info *);
+
+extern const struct file_operations f2fs_dir_operations;
+extern const struct file_operations f2fs_file_operations;
+extern const struct inode_operations f2fs_file_inode_operations;
+extern const struct address_space_operations f2fs_dblock_aops;
+extern const struct address_space_operations f2fs_node_aops;
+extern const struct address_space_operations f2fs_meta_aops;
+extern const struct inode_operations f2fs_dir_inode_operations;
+extern const struct inode_operations f2fs_symlink_inode_operations;
+extern const struct inode_operations f2fs_special_inode_operations;
+#endif
diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
new file mode 100644
index 0000000..99ac689
--- /dev/null
+++ b/fs/f2fs/node.h
@@ -0,0 +1,330 @@
+/**
+ * fs/f2fs/node.h
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#define START_NID(nid) ((nid / NAT_ENTRY_PER_BLOCK) * NAT_ENTRY_PER_BLOCK)
+#define NAT_BLOCK_OFFSET(start_nid) (start_nid / NAT_ENTRY_PER_BLOCK)
+
+#define FREE_NID_PAGES 4
+#define MAX_FREE_NIDS (NAT_ENTRY_PER_BLOCK * FREE_NID_PAGES)
+
+#define MAX_RA_NODE 128 /* Max. readahead size for node */
+#define NM_WOUT_THRESHOLD (64 * NAT_ENTRY_PER_BLOCK)
+#define NATVEC_SIZE 64
+
+/**
+ * For node information
+ */
+struct node_info {
+ nid_t nid; /* node id */
+ nid_t ino; /* inode number of the node's owner */
+ block_t blk_addr; /* block address of the node */
+ unsigned char version; /* version of the node */
+};
+
+static inline unsigned char inc_node_version(unsigned char version)
+{
+ return ++version;
+}
+
+struct nat_entry {
+ struct list_head list; /* for clean or dirty nat list */
+ bool checkpointed;
+ struct node_info ni;
+};
+
+#define nat_get_nid(nat) (nat->ni.nid)
+#define nat_set_nid(nat, n) (nat->ni.nid = n)
+#define nat_get_blkaddr(nat) (nat->ni.blk_addr)
+#define nat_set_blkaddr(nat, b) (nat->ni.blk_addr = b)
+#define nat_get_ino(nat) (nat->ni.ino)
+#define nat_set_ino(nat, i) (nat->ni.ino = i)
+#define nat_get_version(nat) (nat->ni.version)
+#define nat_set_version(nat, v) (nat->ni.version = v)
+#define __set_nat_cache_dirty(nm_i, ne) \
+ list_move_tail(&ne->list, &nm_i->dirty_nat_entries)
+#define __clear_nat_cache_dirty(nm_i, ne) \
+ list_move_tail(&ne->list, &nm_i->nat_entries)
+
+static inline void node_info_from_raw_nat(struct node_info *ni,
+ struct f2fs_nat_entry *raw_ne)
+{
+ ni->ino = le32_to_cpu(raw_ne->ino);
+ ni->blk_addr = le32_to_cpu(raw_ne->block_addr);
+ ni->version = raw_ne->version;
+}
+
+/**
+ * For free nid management
+ */
+enum nid_state {
+ NID_NEW,
+ NID_ALLOC
+};
+
+struct free_nid {
+ nid_t nid;
+ int state;
+ struct list_head list;
+};
+
+static inline int next_free_nid(struct f2fs_sb_info *sbi, nid_t *nid)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct free_nid *fnid;
+
+ if (nm_i->fcnt <= 0)
+ return -1;
+ spin_lock(&nm_i->free_nid_list_lock);
+ fnid = list_entry(nm_i->free_nid_list.next, struct free_nid, list);
+ *nid = fnid->nid;
+ spin_unlock(&nm_i->free_nid_list_lock);
+ return 0;
+}
+
+/**
+ * inline functions
+ */
+static inline void get_nat_bitmap(struct f2fs_sb_info *sbi, void *addr)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ memcpy(addr, nm_i->nat_bitmap, nm_i->bitmap_size);
+}
+
+static inline pgoff_t current_nat_addr(struct f2fs_sb_info *sbi, nid_t start)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ pgoff_t block_off;
+ pgoff_t block_addr;
+ int seg_off;
+
+ block_off = NAT_BLOCK_OFFSET(start);
+ seg_off = block_off >> sbi->log_blocks_per_seg;
+
+ block_addr = (pgoff_t)(nm_i->nat_blkaddr +
+ (seg_off << sbi->log_blocks_per_seg << 1) +
+ (block_off & ((1 << sbi->log_blocks_per_seg) - 1)));
+
+ if (f2fs_test_bit(block_off, nm_i->nat_bitmap))
+ block_addr += sbi->blocks_per_seg;
+
+ return block_addr;
+}
+
+static inline pgoff_t next_nat_addr(struct f2fs_sb_info *sbi,
+ pgoff_t block_addr)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+
+ block_addr -= nm_i->nat_blkaddr;
+ if ((block_addr >> sbi->log_blocks_per_seg) % 2)
+ block_addr -= sbi->blocks_per_seg;
+ else
+ block_addr += sbi->blocks_per_seg;
+
+ return block_addr + nm_i->nat_blkaddr;
+}
+
+static inline void set_to_next_nat(struct f2fs_nm_info *nm_i, nid_t start_nid)
+{
+ unsigned int block_off = NAT_BLOCK_OFFSET(start_nid);
+
+ if (f2fs_test_bit(block_off, nm_i->nat_bitmap))
+ f2fs_clear_bit(block_off, nm_i->nat_bitmap);
+ else
+ f2fs_set_bit(block_off, nm_i->nat_bitmap);
+}
+
+static inline void fill_node_footer(struct page *page, nid_t nid,
+ nid_t ino, unsigned int ofs, bool reset)
+{
+ void *kaddr = page_address(page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ if (reset)
+ memset(rn, 0, sizeof(*rn));
+ rn->footer.nid = cpu_to_le32(nid);
+ rn->footer.ino = cpu_to_le32(ino);
+ rn->footer.flag = cpu_to_le32(ofs << OFFSET_BIT_SHIFT);
+}
+
+static inline void copy_node_footer(struct page *dst, struct page *src)
+{
+ void *src_addr = page_address(src);
+ void *dst_addr = page_address(dst);
+ struct f2fs_node *src_rn = (struct f2fs_node *)src_addr;
+ struct f2fs_node *dst_rn = (struct f2fs_node *)dst_addr;
+ memcpy(&dst_rn->footer, &src_rn->footer, sizeof(struct node_footer));
+}
+
+static inline void fill_node_footer_blkaddr(struct page *page, block_t blkaddr)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ void *kaddr = page_address(page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ rn->footer.cp_ver = ckpt->checkpoint_ver;
+ rn->footer.next_blkaddr = cpu_to_le32(blkaddr);
+}
+
+static inline nid_t ino_of_node(struct page *node_page)
+{
+ void *kaddr = page_address(node_page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ return le32_to_cpu(rn->footer.ino);
+}
+
+static inline nid_t nid_of_node(struct page *node_page)
+{
+ void *kaddr = page_address(node_page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ return le32_to_cpu(rn->footer.nid);
+}
+
+static inline unsigned int ofs_of_node(struct page *node_page)
+{
+ void *kaddr = page_address(node_page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ unsigned flag = le32_to_cpu(rn->footer.flag);
+ return flag >> OFFSET_BIT_SHIFT;
+}
+
+static inline unsigned long long cpver_of_node(struct page *node_page)
+{
+ void *kaddr = page_address(node_page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ return le64_to_cpu(rn->footer.cp_ver);
+}
+
+static inline block_t next_blkaddr_of_node(struct page *node_page)
+{
+ void *kaddr = page_address(node_page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ return le32_to_cpu(rn->footer.next_blkaddr);
+}
+
+static inline bool IS_DNODE(struct page *node_page)
+{
+ unsigned int ofs = ofs_of_node(node_page);
+ if (ofs == 3 || ofs == 4 + NIDS_PER_BLOCK ||
+ ofs == 5 + 2 * NIDS_PER_BLOCK)
+ return false;
+ if (ofs >= 6 + 2 * NIDS_PER_BLOCK) {
+ ofs -= 6 + 2 * NIDS_PER_BLOCK;
+ if (!((long int)ofs % (NIDS_PER_BLOCK + 1)))
+ return false;
+ }
+ return true;
+}
+
+static inline void set_nid(struct page *p, int off, nid_t nid, bool i)
+{
+ struct f2fs_node *rn = (struct f2fs_node *)page_address(p);
+
+ wait_on_page_writeback(p);
+
+ if (i)
+ rn->i.i_nid[off - NODE_DIR1_BLOCK] = cpu_to_le32(nid);
+ else
+ rn->in.nid[off] = cpu_to_le32(nid);
+ set_page_dirty(p);
+}
+
+static inline nid_t get_nid(struct page *p, int off, bool i)
+{
+ struct f2fs_node *rn = (struct f2fs_node *)page_address(p);
+ if (i)
+ return le32_to_cpu(rn->i.i_nid[off - NODE_DIR1_BLOCK]);
+ return le32_to_cpu(rn->in.nid[off]);
+}
+
+/**
+ * Coldness identification:
+ * - Mark cold files in f2fs_inode_info
+ * - Mark cold node blocks in their node footer
+ * - Mark cold data pages in page cache
+ */
+static inline int is_cold_file(struct inode *inode)
+{
+ return F2FS_I(inode)->i_advise & FADVISE_COLD_BIT;
+}
+
+static inline int is_cold_data(struct page *page)
+{
+ return PageChecked(page);
+}
+
+static inline void set_cold_data(struct page *page)
+{
+ SetPageChecked(page);
+}
+
+static inline void clear_cold_data(struct page *page)
+{
+ ClearPageChecked(page);
+}
+
+static inline int is_cold_node(struct page *page)
+{
+ void *kaddr = page_address(page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ unsigned int flag = le32_to_cpu(rn->footer.flag);
+ return flag & (0x1 << COLD_BIT_SHIFT);
+}
+
+static inline unsigned char is_fsync_dnode(struct page *page)
+{
+ void *kaddr = page_address(page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ unsigned int flag = le32_to_cpu(rn->footer.flag);
+ return flag & (0x1 << FSYNC_BIT_SHIFT);
+}
+
+static inline unsigned char is_dent_dnode(struct page *page)
+{
+ void *kaddr = page_address(page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ unsigned int flag = le32_to_cpu(rn->footer.flag);
+ return flag & (0x1 << DENT_BIT_SHIFT);
+}
+
+static inline void set_cold_node(struct inode *inode, struct page *page)
+{
+ struct f2fs_node *rn = (struct f2fs_node *)page_address(page);
+ unsigned int flag = le32_to_cpu(rn->footer.flag);
+
+ if (S_ISDIR(inode->i_mode))
+ flag &= ~(0x1 << COLD_BIT_SHIFT);
+ else
+ flag |= (0x1 << COLD_BIT_SHIFT);
+ rn->footer.flag = cpu_to_le32(flag);
+}
+
+static inline void set_fsync_mark(struct page *page, int mark)
+{
+ void *kaddr = page_address(page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ unsigned int flag = le32_to_cpu(rn->footer.flag);
+ if (mark)
+ flag |= (0x1 << FSYNC_BIT_SHIFT);
+ else
+ flag &= ~(0x1 << FSYNC_BIT_SHIFT);
+ rn->footer.flag = cpu_to_le32(flag);
+}
+
+static inline void set_dentry_mark(struct page *page, int mark)
+{
+ void *kaddr = page_address(page);
+ struct f2fs_node *rn = (struct f2fs_node *)kaddr;
+ unsigned int flag = le32_to_cpu(rn->footer.flag);
+ if (mark)
+ flag |= (0x1 << DENT_BIT_SHIFT);
+ else
+ flag &= ~(0x1 << DENT_BIT_SHIFT);
+ rn->footer.flag = cpu_to_le32(flag);
+}
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
new file mode 100644
index 0000000..cd6268e
--- /dev/null
+++ b/fs/f2fs/segment.h
@@ -0,0 +1,594 @@
+/**
+ * fs/f2fs/segment.h
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+/* constant macro */
+#define NULL_SEGNO ((unsigned int)(~0))
+#define SUM_TYPE_NODE (1)
+#define SUM_TYPE_DATA (0)
+
+/* L: Logical segment # in volume, R: Relative segment # in main area */
+#define GET_L2R_SEGNO(free_i, segno) (segno - free_i->start_segno)
+#define GET_R2L_SEGNO(free_i, segno) (segno + free_i->start_segno)
+
+#define IS_DATASEG(t) \
+ ((t == CURSEG_HOT_DATA) || (t == CURSEG_COLD_DATA) || \
+ (t == CURSEG_WARM_DATA))
+
+#define IS_NODESEG(t) \
+ ((t == CURSEG_HOT_NODE) || (t == CURSEG_COLD_NODE) || \
+ (t == CURSEG_WARM_NODE))
+
+#define IS_CURSEG(sbi, segno) \
+ ((segno == CURSEG_I(sbi, CURSEG_HOT_DATA)->segno) || \
+ (segno == CURSEG_I(sbi, CURSEG_WARM_DATA)->segno) || \
+ (segno == CURSEG_I(sbi, CURSEG_COLD_DATA)->segno) || \
+ (segno == CURSEG_I(sbi, CURSEG_HOT_NODE)->segno) || \
+ (segno == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno) || \
+ (segno == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno))
+
+#define IS_CURSEC(sbi, secno) \
+ ((secno == CURSEG_I(sbi, CURSEG_HOT_DATA)->segno / \
+ sbi->segs_per_sec) || \
+ (secno == CURSEG_I(sbi, CURSEG_WARM_DATA)->segno / \
+ sbi->segs_per_sec) || \
+ (secno == CURSEG_I(sbi, CURSEG_COLD_DATA)->segno / \
+ sbi->segs_per_sec) || \
+ (secno == CURSEG_I(sbi, CURSEG_HOT_NODE)->segno / \
+ sbi->segs_per_sec) || \
+ (secno == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno / \
+ sbi->segs_per_sec) || \
+ (secno == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno / \
+ sbi->segs_per_sec))
+
+#define START_BLOCK(sbi, segno) \
+ (SM_I(sbi)->seg0_blkaddr + \
+ (GET_R2L_SEGNO(FREE_I(sbi), segno) << sbi->log_blocks_per_seg))
+#define NEXT_FREE_BLKADDR(sbi, curseg) \
+ (START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff)
+
+#define MAIN_BASE_BLOCK(sbi) (SM_I(sbi)->main_blkaddr)
+
+#define GET_SEGOFF_FROM_SEG0(sbi, blk_addr) \
+ ((blk_addr) - SM_I(sbi)->seg0_blkaddr)
+#define GET_SEGNO_FROM_SEG0(sbi, blk_addr) \
+ (GET_SEGOFF_FROM_SEG0(sbi, blk_addr) >> sbi->log_blocks_per_seg)
+#define GET_SEGNO(sbi, blk_addr) \
+ (((blk_addr == NULL_ADDR) || (blk_addr == NEW_ADDR)) ? \
+ NULL_SEGNO : GET_L2R_SEGNO(FREE_I(sbi), \
+ GET_SEGNO_FROM_SEG0(sbi, blk_addr)))
+#define GET_SECNO(sbi, segno) \
+ ((segno) / sbi->segs_per_sec)
+#define GET_ZONENO_FROM_SEGNO(sbi, segno) \
+ ((segno / sbi->segs_per_sec) / sbi->secs_per_zone)
+
+#define GET_SUM_BLOCK(sbi, segno) \
+ ((sbi->sm_info->ssa_blkaddr) + segno)
+
+#define GET_SUM_TYPE(footer) ((footer)->entry_type)
+#define SET_SUM_TYPE(footer, type) ((footer)->entry_type = type)
+
+#define SIT_ENTRY_OFFSET(sit_i, segno) \
+ (segno % sit_i->sents_per_block)
+#define SIT_BLOCK_OFFSET(sit_i, segno) \
+ (segno / SIT_ENTRY_PER_BLOCK)
+#define START_SEGNO(sit_i, segno) \
+ (SIT_BLOCK_OFFSET(sit_i, segno) * SIT_ENTRY_PER_BLOCK)
+#define f2fs_bitmap_size(nr) \
+ (BITS_TO_LONGS(nr) * sizeof(unsigned long))
+#define TOTAL_SEGS(sbi) (SM_I(sbi)->main_segment_count)
+
+enum {
+ LFS = 0,
+ SSR
+};
+
+enum {
+ ALLOC_RIGHT = 0,
+ ALLOC_LEFT
+};
+
+#define SET_SSR_TYPE(type) (((type) + 1) << 16)
+#define GET_SSR_TYPE(type) (((type) >> 16) - 1)
+#define IS_SSR_TYPE(type) ((type) >= (0x1 << 16))
+#define IS_NEXT_SEG(sbi, curseg, type) \
+ (DIRTY_I(sbi)->v_ops->get_victim(sbi, &(curseg)->next_segno, \
+ BG_GC, SET_SSR_TYPE(type)))
+/**
+ * The upper 6 bits of f2fs_sit_entry->vblocks hold the segment type,
+ * and the lower 10 bits hold the number of valid blocks.
+ */
+#define VBLOCKS_MASK ((1 << 10) - 1)
+
+#define GET_SIT_VBLOCKS(raw_sit) \
+ (le16_to_cpu((raw_sit)->vblocks) & VBLOCKS_MASK)
+#define GET_SIT_TYPE(raw_sit) \
+ ((le16_to_cpu((raw_sit)->vblocks) & ~VBLOCKS_MASK) >> 10)
+
+struct bio_private {
+ struct f2fs_sb_info *sbi;
+ bool is_sync;
+ void *wait;
+};
+
+enum {
+ GC_CB = 0,
+ GC_GREEDY
+};
+
+struct victim_sel_policy {
+ int alloc_mode;
+ int gc_mode;
+ int type;
+ unsigned long *dirty_segmap;
+ unsigned int offset;
+ unsigned int ofs_unit;
+ unsigned int min_cost;
+ unsigned int min_segno;
+};
+
+struct seg_entry {
+ unsigned short valid_blocks;
+ unsigned char *cur_valid_map;
+ unsigned short ckpt_valid_blocks;
+ unsigned char *ckpt_valid_map;
+ unsigned char type;
+ unsigned long long mtime;
+};
+
+struct sec_entry {
+ unsigned int valid_blocks;
+};
+
+struct segment_allocation {
+ void (*allocate_segment)(struct f2fs_sb_info *, int, bool);
+};
+
+struct sit_info {
+ const struct segment_allocation *s_ops;
+
+ block_t sit_base_addr;
+ block_t sit_blocks;
+ block_t written_valid_blocks; /* total number of valid blocks
+ in main area */
+ char *sit_bitmap; /* SIT bitmap pointer */
+ unsigned int bitmap_size;
+
+ unsigned int dirty_sentries; /* # of dirty sentries */
+ unsigned long *dirty_sentries_bitmap; /* bitmap for dirty sentries */
+ unsigned int sents_per_block; /* number of SIT entries
+ per SIT block */
+ struct mutex sentry_lock; /* to protect SIT entries */
+ struct seg_entry *sentries;
+ struct sec_entry *sec_entries;
+
+ unsigned long long elapsed_time;
+ unsigned long long mounted_time;
+ unsigned long long min_mtime;
+ unsigned long long max_mtime;
+};
+
+struct free_segmap_info {
+ unsigned int start_segno;
+ unsigned int free_segments;
+ unsigned int free_sections;
+ rwlock_t segmap_lock; /* free segmap lock */
+ unsigned long *free_segmap;
+ unsigned long *free_secmap;
+};
+
+/* Note: the order of dirty types is the same as CURSEG_XXX in f2fs.h */
+enum dirty_type {
+ DIRTY_HOT_DATA, /* a few valid blocks in a data segment */
+ DIRTY_WARM_DATA,
+ DIRTY_COLD_DATA,
+ DIRTY_HOT_NODE, /* a few valid blocks in a node segment */
+ DIRTY_WARM_NODE,
+ DIRTY_COLD_NODE,
+ DIRTY,
+ PRE, /* no valid blocks in a segment */
+ NR_DIRTY_TYPE
+};
+
+enum {
+ BG_GC,
+ FG_GC
+};
+
+struct dirty_seglist_info {
+ const struct victim_selection *v_ops;
+ struct mutex seglist_lock;
+ unsigned long *dirty_segmap[NR_DIRTY_TYPE];
+ int nr_dirty[NR_DIRTY_TYPE];
+ unsigned long *victim_segmap[2]; /* BG_GC, FG_GC */
+};
+
+struct victim_selection {
+ int (*get_victim)(struct f2fs_sb_info *, unsigned int *, int, int);
+};
+
+struct curseg_info {
+ struct mutex curseg_mutex;
+ struct f2fs_summary_block *sum_blk;
+ unsigned char alloc_type;
+ unsigned int segno;
+ unsigned short next_blkoff;
+ unsigned int zone;
+ unsigned int next_segno;
+};
+
+/**
+ * inline functions
+ */
+static inline struct curseg_info *CURSEG_I(struct f2fs_sb_info *sbi, int type)
+{
+ return (struct curseg_info *)(SM_I(sbi)->curseg_array + type);
+}
+
+static inline struct seg_entry *get_seg_entry(struct f2fs_sb_info *sbi,
+ unsigned int segno)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ return &sit_i->sentries[segno];
+}
+
+static inline struct sec_entry *get_sec_entry(struct f2fs_sb_info *sbi,
+ unsigned int segno)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ return &sit_i->sec_entries[GET_SECNO(sbi, segno)];
+}
+
+static inline unsigned int get_valid_blocks(struct f2fs_sb_info *sbi,
+ unsigned int segno, int section)
+{
+ if (section > 1)
+ return get_sec_entry(sbi, segno)->valid_blocks;
+ else
+ return get_seg_entry(sbi, segno)->valid_blocks;
+}
+
+static inline void seg_info_from_raw_sit(struct seg_entry *se,
+ struct f2fs_sit_entry *rs)
+{
+ se->valid_blocks = GET_SIT_VBLOCKS(rs);
+ se->ckpt_valid_blocks = GET_SIT_VBLOCKS(rs);
+ memcpy(se->cur_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
+ memcpy(se->ckpt_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
+ se->type = GET_SIT_TYPE(rs);
+ se->mtime = le64_to_cpu(rs->mtime);
+}
+
+static inline void seg_info_to_raw_sit(struct seg_entry *se,
+ struct f2fs_sit_entry *rs)
+{
+ unsigned short raw_vblocks = (se->type << 10) | se->valid_blocks;
+ rs->vblocks = cpu_to_le16(raw_vblocks);
+ memcpy(rs->valid_map, se->cur_valid_map, SIT_VBLOCK_MAP_SIZE);
+ memcpy(se->ckpt_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
+ se->ckpt_valid_blocks = se->valid_blocks;
+ rs->mtime = cpu_to_le64(se->mtime);
+}
+
+static inline unsigned int find_next_inuse(struct free_segmap_info *free_i,
+ unsigned int max, unsigned int segno)
+{
+ unsigned int ret;
+ read_lock(&free_i->segmap_lock);
+ ret = find_next_bit(free_i->free_segmap, max, segno);
+ read_unlock(&free_i->segmap_lock);
+ return ret;
+}
+
+static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+ struct free_segmap_info *free_i = FREE_I(sbi);
+ unsigned int secno = segno / sbi->segs_per_sec;
+ unsigned int start_segno = secno * sbi->segs_per_sec;
+ unsigned int next;
+
+ write_lock(&free_i->segmap_lock);
+ clear_bit(segno, free_i->free_segmap);
+ free_i->free_segments++;
+
+ next = find_next_bit(free_i->free_segmap, TOTAL_SEGS(sbi), start_segno);
+ if (next >= start_segno + sbi->segs_per_sec) {
+ clear_bit(secno, free_i->free_secmap);
+ free_i->free_sections++;
+ }
+ write_unlock(&free_i->segmap_lock);
+}
+
+static inline void __set_inuse(struct f2fs_sb_info *sbi,
+ unsigned int segno)
+{
+ struct free_segmap_info *free_i = FREE_I(sbi);
+ unsigned int secno = segno / sbi->segs_per_sec;
+ set_bit(segno, free_i->free_segmap);
+ free_i->free_segments--;
+ if (!test_and_set_bit(secno, free_i->free_secmap))
+ free_i->free_sections--;
+}
+
+static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
+ unsigned int segno)
+{
+ struct free_segmap_info *free_i = FREE_I(sbi);
+ unsigned int secno = segno / sbi->segs_per_sec;
+ unsigned int start_segno = secno * sbi->segs_per_sec;
+ unsigned int next;
+
+ write_lock(&free_i->segmap_lock);
+ if (test_and_clear_bit(segno, free_i->free_segmap)) {
+ free_i->free_segments++;
+
+ next = find_next_bit(free_i->free_segmap, TOTAL_SEGS(sbi),
+ start_segno);
+ if (next >= start_segno + sbi->segs_per_sec) {
+ if (test_and_clear_bit(secno, free_i->free_secmap))
+ free_i->free_sections++;
+ }
+ }
+ write_unlock(&free_i->segmap_lock);
+}
+
+static inline void __set_test_and_inuse(struct f2fs_sb_info *sbi,
+ unsigned int segno)
+{
+ struct free_segmap_info *free_i = FREE_I(sbi);
+ unsigned int secno = segno / sbi->segs_per_sec;
+ write_lock(&free_i->segmap_lock);
+ if (!test_and_set_bit(segno, free_i->free_segmap)) {
+ free_i->free_segments--;
+ if (!test_and_set_bit(secno, free_i->free_secmap))
+ free_i->free_sections--;
+ }
+ write_unlock(&free_i->segmap_lock);
+}
+
+static inline void get_sit_bitmap(struct f2fs_sb_info *sbi,
+ void *dst_addr)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ memcpy(dst_addr, sit_i->sit_bitmap, sit_i->bitmap_size);
+}
+
+static inline block_t written_block_count(struct f2fs_sb_info *sbi)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ block_t vblocks;
+
+ mutex_lock(&sit_i->sentry_lock);
+ vblocks = sit_i->written_valid_blocks;
+ mutex_unlock(&sit_i->sentry_lock);
+
+ return vblocks;
+}
+
+static inline unsigned int free_segments(struct f2fs_sb_info *sbi)
+{
+ struct free_segmap_info *free_i = FREE_I(sbi);
+ unsigned int free_segs;
+
+ read_lock(&free_i->segmap_lock);
+ free_segs = free_i->free_segments;
+ read_unlock(&free_i->segmap_lock);
+
+ return free_segs;
+}
+
+static inline int reserved_segments(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_info *gc_i = sbi->gc_info;
+ return gc_i->rsvd_segment_count;
+}
+
+static inline unsigned int free_sections(struct f2fs_sb_info *sbi)
+{
+ struct free_segmap_info *free_i = FREE_I(sbi);
+ unsigned int free_secs;
+
+ read_lock(&free_i->segmap_lock);
+ free_secs = free_i->free_sections;
+ read_unlock(&free_i->segmap_lock);
+
+ return free_secs;
+}
+
+static inline unsigned int prefree_segments(struct f2fs_sb_info *sbi)
+{
+ return DIRTY_I(sbi)->nr_dirty[PRE];
+}
+
+static inline unsigned int dirty_segments(struct f2fs_sb_info *sbi)
+{
+ return DIRTY_I(sbi)->nr_dirty[DIRTY_HOT_DATA] +
+ DIRTY_I(sbi)->nr_dirty[DIRTY_WARM_DATA] +
+ DIRTY_I(sbi)->nr_dirty[DIRTY_COLD_DATA] +
+ DIRTY_I(sbi)->nr_dirty[DIRTY_HOT_NODE] +
+ DIRTY_I(sbi)->nr_dirty[DIRTY_WARM_NODE] +
+ DIRTY_I(sbi)->nr_dirty[DIRTY_COLD_NODE];
+}
+
+static inline int overprovision_segments(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_info *gc_i = sbi->gc_info;
+ return gc_i->overp_segment_count;
+}
+
+static inline int overprovision_sections(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_info *gc_i = sbi->gc_info;
+ return ((unsigned int) gc_i->overp_segment_count) / sbi->segs_per_sec;
+}
+
+static inline int reserved_sections(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_info *gc_i = sbi->gc_info;
+ return ((unsigned int) gc_i->rsvd_segment_count) / sbi->segs_per_sec;
+}
+
+static inline bool need_SSR(struct f2fs_sb_info *sbi)
+{
+ return (free_sections(sbi) < overprovision_sections(sbi));
+}
+
+static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi)
+{
+ return free_sections(sbi) <= reserved_sections(sbi);
+}
+
+static inline int utilization(struct f2fs_sb_info *sbi)
+{
+ return (long int)valid_user_blocks(sbi) * 100 /
+ (long int)sbi->user_block_count;
+}
+
+/* Disable In-Place-Update by default */
+#define MIN_IPU_UTIL 100
+static inline bool need_inplace_update(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ if (S_ISDIR(inode->i_mode))
+ return false;
+ if (need_SSR(sbi) && utilization(sbi) > MIN_IPU_UTIL)
+ return true;
+ return false;
+}
+
+static inline unsigned int curseg_segno(struct f2fs_sb_info *sbi,
+ int type)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+ return curseg->segno;
+}
+
+static inline unsigned char curseg_alloc_type(struct f2fs_sb_info *sbi,
+ int type)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+ return curseg->alloc_type;
+}
+
+static inline unsigned short curseg_blkoff(struct f2fs_sb_info *sbi, int type)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+ return curseg->next_blkoff;
+}
+
+static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+ unsigned int end_segno = SM_I(sbi)->segment_count - 1;
+ BUG_ON(segno > end_segno);
+}
+
+/*
+ * This function is used only for debugging.
+ * NOTE: It should be removed in the future.
+ */
+static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
+{
+ struct f2fs_sm_info *sm_info = SM_I(sbi);
+ block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
+ block_t start_addr = sm_info->seg0_blkaddr;
+ block_t end_addr = start_addr + total_blks - 1;
+ BUG_ON(blk_addr < start_addr);
+ BUG_ON(blk_addr > end_addr);
+}
+
+/**
+ * A summary block is always treated as an invalid block
+ */
+static inline void check_block_count(struct f2fs_sb_info *sbi,
+ int segno, struct f2fs_sit_entry *raw_sit)
+{
+ struct f2fs_sm_info *sm_info = SM_I(sbi);
+ unsigned int end_segno = sm_info->segment_count - 1;
+ int valid_blocks = 0;
+ int i;
+
+ /* check segment usage */
+ BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg);
+
+ /* check boundary of a given segment number */
+ BUG_ON(segno > end_segno);
+
+ /* check bitmap with valid block count */
+ for (i = 0; i < sbi->blocks_per_seg; i++)
+ if (f2fs_test_bit(i, raw_sit->valid_map))
+ valid_blocks++;
+ BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks);
+}
+
+static inline pgoff_t current_sit_addr(struct f2fs_sb_info *sbi,
+ unsigned int start)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ unsigned int offset = SIT_BLOCK_OFFSET(sit_i, start);
+ block_t blk_addr = sit_i->sit_base_addr + offset;
+
+ check_seg_range(sbi, start);
+
+ /* calculate sit block address */
+ if (f2fs_test_bit(offset, sit_i->sit_bitmap))
+ blk_addr += sit_i->sit_blocks;
+
+ return blk_addr;
+}
+
+static inline pgoff_t next_sit_addr(struct f2fs_sb_info *sbi,
+ pgoff_t block_addr)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ block_addr -= sit_i->sit_base_addr;
+ if (block_addr < sit_i->sit_blocks)
+ block_addr += sit_i->sit_blocks;
+ else
+ block_addr -= sit_i->sit_blocks;
+
+ return block_addr + sit_i->sit_base_addr;
+}
+
+static inline void set_to_next_sit(struct sit_info *sit_i, unsigned int start)
+{
+ unsigned int block_off = SIT_BLOCK_OFFSET(sit_i, start);
+
+ if (f2fs_test_bit(block_off, sit_i->sit_bitmap))
+ f2fs_clear_bit(block_off, sit_i->sit_bitmap);
+ else
+ f2fs_set_bit(block_off, sit_i->sit_bitmap);
+}
+
+static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ return sit_i->elapsed_time + CURRENT_TIME_SEC.tv_sec -
+ sit_i->mounted_time;
+}
+
+static inline void set_summary(struct f2fs_summary *sum, nid_t nid,
+ unsigned int ofs_in_node, unsigned char version)
+{
+ sum->nid = cpu_to_le32(nid);
+ sum->ofs_in_node = cpu_to_le16(ofs_in_node);
+ sum->version = version;
+}
+
+static inline block_t start_sum_block(struct f2fs_sb_info *sbi)
+{
+ return __start_cp_addr(sbi) +
+ le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_start_sum);
+}
+
+static inline block_t sum_blk_addr(struct f2fs_sb_info *sbi, int base, int type)
+{
+ return __start_cp_addr(sbi) +
+ le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_total_block_count)
+ - (base + 1) + type;
+}
--
1.7.9.5




---
Jaegeuk Kim
Samsung


2012-10-23 02:27:26

by Jaegeuk Kim

[permalink] [raw]
Subject: [PATCH 04/16 v2] f2fs: add super block operations

This adds the implementation of superblock operations for f2fs, including:
- init_f2fs_fs/exit_f2fs_fs
- f2fs_mount
- super_operations of f2fs

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/super.c | 590 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 590 insertions(+)
create mode 100644 fs/f2fs/super.c

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
new file mode 100644
index 0000000..8e608a0
--- /dev/null
+++ b/fs/f2fs/super.c
@@ -0,0 +1,590 @@
+/**
+ * fs/f2fs/super.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <linux/statfs.h>
+#include <linux/proc_fs.h>
+#include <linux/buffer_head.h>
+#include <linux/backing-dev.h>
+#include <linux/kthread.h>
+#include <linux/parser.h>
+#include <linux/mount.h>
+#include <linux/seq_file.h>
+#include <linux/f2fs_fs.h>
+
+#include "f2fs.h"
+#include "node.h"
+#include "xattr.h"
+
+static struct kmem_cache *f2fs_inode_cachep;
+static struct proc_dir_entry *f2fs_proc_root;
+
+enum {
+ Opt_gc_background_off,
+ Opt_disable_roll_forward,
+ Opt_discard,
+ Opt_noheap,
+ Opt_nouser_xattr,
+ Opt_noacl,
+ Opt_active_logs,
+ Opt_disable_ext_identify,
+ Opt_err,
+};
+
+static match_table_t f2fs_tokens = {
+ {Opt_gc_background_off, "background_gc_off"},
+ {Opt_disable_roll_forward, "disable_roll_forward"},
+ {Opt_discard, "discard"},
+ {Opt_noheap, "no_heap"},
+ {Opt_nouser_xattr, "nouser_xattr"},
+ {Opt_noacl, "noacl"},
+ {Opt_active_logs, "active_logs=%u"},
+ {Opt_disable_ext_identify, "disable_ext_identify"},
+ {Opt_err, NULL},
+};
+
+static void init_once(void *foo)
+{
+ struct f2fs_inode_info *fi = (struct f2fs_inode_info *) foo;
+
+ memset(fi, 0, sizeof(*fi));
+ inode_init_once(&fi->vfs_inode);
+}
+
+static struct inode *f2fs_alloc_inode(struct super_block *sb)
+{
+ struct f2fs_inode_info *fi;
+
+ fi = kmem_cache_alloc(f2fs_inode_cachep, GFP_NOFS | __GFP_ZERO);
+ if (!fi)
+ return NULL;
+
+ init_once((void *) fi);
+
+ /* Initialize f2fs-specific inode info */
+ fi->vfs_inode.i_version = 1;
+ atomic_set(&fi->dirty_dents, 0);
+ fi->current_depth = 1;
+ fi->i_advise = 0;
+ rwlock_init(&fi->ext.ext_lock);
+
+ set_inode_flag(fi, FI_NEW_INODE);
+
+ return &fi->vfs_inode;
+}
+
+static void f2fs_i_callback(struct rcu_head *head)
+{
+ struct inode *inode = container_of(head, struct inode, i_rcu);
+ kmem_cache_free(f2fs_inode_cachep, F2FS_I(inode));
+}
+
+void f2fs_destroy_inode(struct inode *inode)
+{
+ call_rcu(&inode->i_rcu, f2fs_i_callback);
+}
+
+static void f2fs_put_super(struct super_block *sb)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+
+#ifdef CONFIG_F2FS_STAT_FS
+ if (sbi->s_proc) {
+ f2fs_stat_exit(sbi);
+ remove_proc_entry(sb->s_id, f2fs_proc_root);
+ }
+#endif
+ stop_gc_thread(sbi);
+
+ write_checkpoint(sbi, false, true);
+
+ iput(sbi->node_inode);
+ iput(sbi->meta_inode);
+
+ /* destroy f2fs internal modules */
+ destroy_gc_manager(sbi);
+ destroy_node_manager(sbi);
+ destroy_segment_manager(sbi);
+
+ kfree(sbi->ckpt);
+
+ sb->s_fs_info = NULL;
+ brelse(sbi->raw_super_buf);
+ kfree(sbi);
+}
+
+int f2fs_sync_fs(struct super_block *sb, int sync)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ int ret = 0;
+
+ if (!sbi->s_dirty && !get_pages(sbi, F2FS_DIRTY_NODES))
+ return 0;
+
+ if (sync)
+ write_checkpoint(sbi, false, false);
+
+ return ret;
+}
+
+static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+ struct super_block *sb = dentry->d_sb;
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ block_t total_count, user_block_count, start_count, ovp_count;
+
+ total_count = le64_to_cpu(sbi->raw_super->block_count);
+ user_block_count = sbi->user_block_count;
+ start_count = le32_to_cpu(sbi->raw_super->segment0_blkaddr);
+ ovp_count = sbi->gc_info->overp_segment_count
+ << sbi->log_blocks_per_seg;
+ buf->f_type = F2FS_SUPER_MAGIC;
+ buf->f_bsize = sbi->blocksize;
+
+ buf->f_blocks = total_count - start_count;
+ buf->f_bfree = buf->f_blocks - valid_user_blocks(sbi) - ovp_count;
+ buf->f_bavail = user_block_count - valid_user_blocks(sbi);
+
+ buf->f_files = valid_inode_count(sbi);
+ buf->f_ffree = sbi->total_node_count - valid_node_count(sbi);
+
+ buf->f_namelen = F2FS_MAX_NAME_LEN;
+
+ return 0;
+}
+
+static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(root->d_sb);
+
+ if (test_opt(sbi, BG_GC))
+ seq_puts(seq, ",background_gc_on");
+ else
+ seq_puts(seq, ",background_gc_off");
+ if (test_opt(sbi, DISABLE_ROLL_FORWARD))
+ seq_puts(seq, ",disable_roll_forward");
+ if (test_opt(sbi, DISCARD))
+ seq_puts(seq, ",discard");
+ if (test_opt(sbi, NOHEAP))
+ seq_puts(seq, ",no_heap");
+#ifdef CONFIG_F2FS_FS_XATTR
+ if (test_opt(sbi, XATTR_USER))
+ seq_puts(seq, ",user_xattr");
+ else
+ seq_puts(seq, ",nouser_xattr");
+#endif
+#ifdef CONFIG_F2FS_FS_POSIX_ACL
+ if (test_opt(sbi, POSIX_ACL))
+ seq_puts(seq, ",acl");
+ else
+ seq_puts(seq, ",noacl");
+#endif
+ if (test_opt(sbi, DISABLE_EXT_IDENTIFY))
+ seq_puts(seq, ",disable_ext_identify");
+
+ seq_printf(seq, ",active_logs=%u", sbi->active_logs);
+
+ return 0;
+}
+
+static struct super_operations f2fs_sops = {
+ .alloc_inode = f2fs_alloc_inode,
+ .destroy_inode = f2fs_destroy_inode,
+ .write_inode = f2fs_write_inode,
+ .show_options = f2fs_show_options,
+ .evict_inode = f2fs_evict_inode,
+ .put_super = f2fs_put_super,
+ .sync_fs = f2fs_sync_fs,
+ .statfs = f2fs_statfs,
+};
+
+static int parse_options(struct f2fs_sb_info *sbi, char *options)
+{
+ substring_t args[MAX_OPT_ARGS];
+ char *p;
+ int arg = 0;
+
+ if (!options)
+ return 0;
+
+ while ((p = strsep(&options, ",")) != NULL) {
+ int token;
+ if (!*p)
+ continue;
+ /*
+ * Initialize args struct so we know whether arg was
+ * found; some options take optional arguments.
+ */
+ args[0].to = args[0].from = NULL;
+ token = match_token(p, f2fs_tokens, args);
+
+ switch (token) {
+ case Opt_gc_background_off:
+ clear_opt(sbi, BG_GC);
+ break;
+ case Opt_disable_roll_forward:
+ set_opt(sbi, DISABLE_ROLL_FORWARD);
+ break;
+ case Opt_discard:
+ set_opt(sbi, DISCARD);
+ break;
+ case Opt_noheap:
+ set_opt(sbi, NOHEAP);
+ break;
+#ifdef CONFIG_F2FS_FS_XATTR
+ case Opt_nouser_xattr:
+ clear_opt(sbi, XATTR_USER);
+ break;
+#else
+ case Opt_nouser_xattr:
+ pr_info("nouser_xattr options not supported\n");
+ break;
+#endif
+#ifdef CONFIG_F2FS_FS_POSIX_ACL
+ case Opt_noacl:
+ clear_opt(sbi, POSIX_ACL);
+ break;
+#else
+ case Opt_noacl:
+ pr_info("noacl options not supported\n");
+ break;
+#endif
+ case Opt_active_logs:
+ if (args->from && match_int(args, &arg))
+ return -EINVAL;
+ if (arg != 2 && arg != 4 && arg != 6)
+ return -EINVAL;
+ sbi->active_logs = arg;
+ break;
+ case Opt_disable_ext_identify:
+ set_opt(sbi, DISABLE_EXT_IDENTIFY);
+ break;
+ default:
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
+
+static loff_t max_file_size(unsigned bits)
+{
+ loff_t result = ADDRS_PER_INODE;
+ loff_t leaf_count = ADDRS_PER_BLOCK;
+
+ result += (leaf_count * 2);
+
+ leaf_count *= NIDS_PER_BLOCK;
+ result += (leaf_count * 2);
+
+ leaf_count *= NIDS_PER_BLOCK;
+ result += (leaf_count * 2);
+
+ result <<= bits;
+ return result;
+}
+
+static int sanity_check_raw_super(struct f2fs_super_block *raw_super)
+{
+ unsigned int blocksize;
+
+ if (F2FS_SUPER_MAGIC != le32_to_cpu(raw_super->magic))
+ return 1;
+
+ /* Currently, support only 4KB block size */
+ blocksize = 1 << le32_to_cpu(raw_super->log_blocksize);
+ if (blocksize != PAGE_CACHE_SIZE)
+ return 1;
+ if (le32_to_cpu(raw_super->log_sectorsize) != 9)
+ return 1;
+ if (le32_to_cpu(raw_super->log_sectors_per_block) != 3)
+ return 1;
+ return 0;
+}
+
+static int sanity_check_ckpt(struct f2fs_super_block *raw_super,
+ struct f2fs_checkpoint *ckpt)
+{
+ unsigned int total, fsmeta;
+
+ total = le32_to_cpu(raw_super->segment_count);
+ fsmeta = le32_to_cpu(raw_super->segment_count_ckpt);
+ fsmeta += le32_to_cpu(raw_super->segment_count_sit);
+ fsmeta += le32_to_cpu(raw_super->segment_count_nat);
+ fsmeta += le32_to_cpu(ckpt->rsvd_segment_count);
+ fsmeta += le32_to_cpu(raw_super->segment_count_ssa);
+
+ if (fsmeta >= total)
+ return 1;
+ return 0;
+}
+
+static void init_sb_info(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_super_block *raw_super = sbi->raw_super;
+ int i;
+
+ sbi->log_sectorsize = le32_to_cpu(raw_super->log_sectorsize);
+ sbi->log_sectors_per_block =
+ le32_to_cpu(raw_super->log_sectors_per_block);
+ sbi->log_blocksize = le32_to_cpu(raw_super->log_blocksize);
+ sbi->blocksize = 1 << sbi->log_blocksize;
+ sbi->log_blocks_per_seg = le32_to_cpu(raw_super->log_blocks_per_seg);
+ sbi->blocks_per_seg = 1 << sbi->log_blocks_per_seg;
+ sbi->segs_per_sec = le32_to_cpu(raw_super->segs_per_sec);
+ sbi->secs_per_zone = le32_to_cpu(raw_super->secs_per_zone);
+ sbi->total_sections = le32_to_cpu(raw_super->section_count);
+ sbi->total_node_count =
+ (le32_to_cpu(raw_super->segment_count_nat) / 2)
+ * sbi->blocks_per_seg * NAT_ENTRY_PER_BLOCK;
+ sbi->root_ino_num = le32_to_cpu(raw_super->root_ino);
+ sbi->node_ino_num = le32_to_cpu(raw_super->node_ino);
+ sbi->meta_ino_num = le32_to_cpu(raw_super->meta_ino);
+
+ for (i = 0; i < NR_COUNT_TYPE; i++)
+ atomic_set(&sbi->nr_pages[i], 0);
+}
+
+static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
+{
+ struct f2fs_sb_info *sbi;
+ struct f2fs_super_block *raw_super;
+ struct buffer_head *raw_super_buf;
+ struct inode *root;
+ int i;
+
+ /* allocate memory for f2fs-specific super block info */
+ sbi = kzalloc(sizeof(struct f2fs_sb_info), GFP_KERNEL);
+ if (!sbi)
+ return -ENOMEM;
+
+ /* set a temporary block size */
+ if (!sb_set_blocksize(sb, F2FS_BLKSIZE))
+ goto free_sbi;
+
+ /* read f2fs raw super block */
+ raw_super_buf = sb_bread(sb, F2FS_SUPER_OFFSET);
+ if (!raw_super_buf)
+ goto free_sbi;
+ raw_super = (struct f2fs_super_block *) ((char *)raw_super_buf->b_data);
+
+ /* init some FS parameters */
+ sbi->active_logs = NR_CURSEG_TYPE;
+
+ set_opt(sbi, BG_GC);
+
+#ifdef CONFIG_F2FS_FS_XATTR
+ set_opt(sbi, XATTR_USER);
+#endif
+#ifdef CONFIG_F2FS_FS_POSIX_ACL
+ set_opt(sbi, POSIX_ACL);
+#endif
+ /* parse mount options */
+ if (parse_options(sbi, (char *)data))
+ goto free_sb_buf;
+
+ /* sanity checking of raw super */
+ if (sanity_check_raw_super(raw_super))
+ goto free_sb_buf;
+
+ sb->s_maxbytes = max_file_size(le32_to_cpu(raw_super->log_blocksize));
+ sb->s_max_links = F2FS_LINK_MAX;
+
+ sb->s_op = &f2fs_sops;
+ sb->s_xattr = f2fs_xattr_handlers;
+ sb->s_magic = F2FS_SUPER_MAGIC;
+ sb->s_fs_info = sbi;
+ sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
+ (test_opt(sbi, POSIX_ACL) ? MS_POSIXACL : 0);
+
+ /* init f2fs-specific super block info */
+ sbi->sb = sb;
+ sbi->raw_super = raw_super;
+ sbi->raw_super_buf = raw_super_buf;
+ mutex_init(&sbi->gc_mutex);
+ mutex_init(&sbi->write_inode);
+ mutex_init(&sbi->writepages);
+ mutex_init(&sbi->cp_mutex);
+ for (i = 0; i < NR_LOCK_TYPE; i++)
+ mutex_init(&sbi->fs_lock[i]);
+ sbi->por_doing = 0;
+ spin_lock_init(&sbi->stat_lock);
+ init_rwsem(&sbi->bio_sem);
+ init_sb_info(sbi);
+
+ /* get an inode for meta space */
+ sbi->meta_inode = f2fs_iget(sb, F2FS_META_INO(sbi));
+ if (IS_ERR(sbi->meta_inode))
+ goto free_sb_buf;
+
+ if (get_valid_checkpoint(sbi))
+ goto free_meta_inode;
+
+ /* sanity checking of checkpoint */
+ if (sanity_check_ckpt(raw_super, sbi->ckpt))
+ goto free_cp;
+
+ sbi->total_valid_node_count =
+ le32_to_cpu(sbi->ckpt->valid_node_count);
+ sbi->total_valid_inode_count =
+ le32_to_cpu(sbi->ckpt->valid_inode_count);
+ sbi->user_block_count = le64_to_cpu(sbi->ckpt->user_block_count);
+ sbi->total_valid_block_count =
+ le64_to_cpu(sbi->ckpt->valid_block_count);
+ sbi->last_valid_block_count = sbi->total_valid_block_count;
+ sbi->alloc_valid_block_count = 0;
+ INIT_LIST_HEAD(&sbi->dir_inode_list);
+ spin_lock_init(&sbi->dir_inode_lock);
+
+ /* init super block */
+ if (!sb_set_blocksize(sb, sbi->blocksize))
+ goto free_cp;
+
+ init_orphan_info(sbi);
+
+ /* setup f2fs internal modules */
+ if (build_segment_manager(sbi))
+ goto free_sm;
+ if (build_node_manager(sbi))
+ goto free_nm;
+ if (build_gc_manager(sbi))
+ goto free_gc;
+
+ /* get an inode for node space */
+ sbi->node_inode = f2fs_iget(sb, F2FS_NODE_INO(sbi));
+ if (IS_ERR(sbi->node_inode))
+ goto free_gc;
+
+ /* if there are any orphan nodes, free them */
+ if (recover_orphan_inodes(sbi))
+ goto free_node_inode;
+
+ /* read root inode and dentry */
+ root = f2fs_iget(sb, F2FS_ROOT_INO(sbi));
+ if (IS_ERR(root))
+ goto free_node_inode;
+ if (!S_ISDIR(root->i_mode) || !root->i_blocks || !root->i_size)
+ goto free_root_inode;
+
+ sb->s_root = d_make_root(root); /* allocate root dentry */
+ if (!sb->s_root)
+ goto free_root_inode;
+
+ /* recover fsynced data */
+ if (!test_opt(sbi, DISABLE_ROLL_FORWARD))
+ recover_fsync_data(sbi);
+
+ /* After POR, we can run background GC thread */
+ if (start_gc_thread(sbi))
+ goto fail;
+
+#ifdef CONFIG_F2FS_STAT_FS
+ if (f2fs_proc_root) {
+ sbi->s_proc = proc_mkdir(sb->s_id, f2fs_proc_root);
+ if (f2fs_stat_init(sbi))
+ goto fail;
+ }
+#endif
+ return 0;
+fail:
+ stop_gc_thread(sbi);
+free_root_inode:
+ make_bad_inode(root);
+ iput(root);
+free_node_inode:
+ make_bad_inode(sbi->node_inode);
+ iput(sbi->node_inode);
+free_gc:
+ destroy_gc_manager(sbi);
+free_nm:
+ destroy_node_manager(sbi);
+free_sm:
+ destroy_segment_manager(sbi);
+free_cp:
+ kfree(sbi->ckpt);
+free_meta_inode:
+ make_bad_inode(sbi->meta_inode);
+ iput(sbi->meta_inode);
+free_sb_buf:
+ brelse(raw_super_buf);
+free_sbi:
+ kfree(sbi);
+ return -EINVAL;
+}
+
+static struct dentry *f2fs_mount(struct file_system_type *fs_type, int flags,
+ const char *dev_name, void *data)
+{
+ return mount_bdev(fs_type, flags, dev_name, data, f2fs_fill_super);
+}
+
+static struct file_system_type f2fs_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "f2fs",
+ .mount = f2fs_mount,
+ .kill_sb = kill_block_super,
+ .fs_flags = FS_REQUIRES_DEV,
+};
+
+static int init_inodecache(void)
+{
+ f2fs_inode_cachep = f2fs_kmem_cache_create("f2fs_inode_cache",
+ sizeof(struct f2fs_inode_info), NULL);
+ if (f2fs_inode_cachep == NULL)
+ return -ENOMEM;
+ return 0;
+}
+
+static void destroy_inodecache(void)
+{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
+ kmem_cache_destroy(f2fs_inode_cachep);
+}
+
+static int __init init_f2fs_fs(void)
+{
+ if (init_inodecache())
+ goto fail;
+ if (create_node_manager_caches())
+ goto fail;
+ if (create_gc_caches())
+ goto fail;
+ if (create_checkpoint_caches())
+ goto fail;
+ if (register_filesystem(&f2fs_fs_type))
+ return -EBUSY;
+
+ f2fs_proc_root = proc_mkdir("fs/f2fs", NULL);
+ return 0;
+fail:
+ return -ENOMEM;
+}
+
+static void __exit exit_f2fs_fs(void)
+{
+ remove_proc_entry("fs/f2fs", NULL);
+ unregister_filesystem(&f2fs_fs_type);
+ destroy_checkpoint_caches();
+ destroy_gc_caches();
+ destroy_node_manager_caches();
+ destroy_inodecache();
+}
+
+module_init(init_f2fs_fs)
+module_exit(exit_f2fs_fs)
+
+MODULE_AUTHOR("Samsung Electronics's Praesto Team");
+MODULE_DESCRIPTION("Flash Friendly File System");
+MODULE_LICENSE("GPL");
--
1.7.9.5




---
Jaegeuk Kim
Samsung


2012-10-23 02:28:11

by Jaegeuk Kim

Subject: [PATCH 05/16 v2] f2fs: add checkpoint operations

This adds functions required by the checkpoint operations.

Basically, f2fs adopts a roll-back model with checkpoint blocks written in the
CP area. The checkpoint procedure is as follows.

- write_checkpoint()
1. block_operations() freezes VFS calls.
2. submit cached bios.
3. flush_nat_entries() writes NAT pages updated by dirty NAT entries.
4. flush_sit_entries() writes SIT pages updated by dirty SIT entries.
5. do_checkpoint() writes,
- checkpoint block (#0)
- orphan inode blocks
- summary blocks made by active logs
- checkpoint block (copy of #0)
6. unblock_operations()
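
As a rough illustration of step 5, the size of one checkpoint pack follows
from the orphan count and the number of data summary blocks. The sketch below
is stand-alone user-space C, not code from this patch. ORPHANS_PER_BLOCK and
NODE_SUMMARY_BLOCKS are assumed values (1020 is taken from the comments in
checkpoint.c, and 3 assumes one summary block per node log); the struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones live in the f2fs headers. */
#define ORPHANS_PER_BLOCK   1020u /* orphan entries that fit in one block */
#define NODE_SUMMARY_BLOCKS 3u    /* one summary block per node log */

/* Hypothetical helper type: where things sit inside one CP pack. */
struct cp_pack_layout {
	uint32_t start_sum;    /* offset of the first summary block */
	uint32_t total_blocks; /* blocks in the whole pack */
};

/* Round the orphan count up to whole orphan blocks. */
static uint32_t orphan_blocks_for(uint32_t n_orphans)
{
	return (n_orphans + ORPHANS_PER_BLOCK - 1) / ORPHANS_PER_BLOCK;
}

/*
 * Pack order: checkpoint block #0, orphan blocks, data summaries,
 * node summaries (only on umount), then the copy of block #0.
 */
static struct cp_pack_layout cp_pack_layout(uint32_t n_orphans,
					    uint32_t data_sum_blocks,
					    bool is_umount)
{
	struct cp_pack_layout layout;
	uint32_t orphan_blocks = orphan_blocks_for(n_orphans);

	layout.start_sum = 1 + orphan_blocks;
	layout.total_blocks = 2 + data_sum_blocks + orphan_blocks;
	if (is_umount)
		layout.total_blocks += NODE_SUMMARY_BLOCKS;
	return layout;
}

/* Usage example: 1021 orphans need two orphan blocks. */
static void cp_pack_layout_example(void)
{
	struct cp_pack_layout l = cp_pack_layout(1021, 3, true);

	assert(l.start_sum == 3);
	assert(l.total_blocks == 10);
}
```

The second checkpoint block is always the last block of the pack, which is
why the mount path can find it from the total block count stored in the
first one.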

In order to provide an address space for meta pages, f2fs_sb_info has a special
inode, namely meta_inode. This patch also adds the address space operations for
meta_inode.
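
The roll-back model depends on picking the newest intact checkpoint pack at
mount time. The following is a minimal user-space sketch of that selection,
assuming a pack counts as valid only when both of its checkpoint blocks pass
the CRC check and carry the same version. The types and function names are
hypothetical, and version_after() is only an assumption about how a
wraparound-safe comparison such as ver_after() might behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory view of one checkpoint block. */
struct cp_copy {
	bool crc_ok;      /* result of the CRC check on this block */
	uint64_t version; /* checkpoint version stored in this block */
};

/* A CP pack starts and ends with a copy of the checkpoint block. */
struct cp_pack {
	struct cp_copy head;
	struct cp_copy tail;
};

/* A pack is usable only if both copies are intact and agree. */
static bool pack_valid(const struct cp_pack *p, uint64_t *version)
{
	if (!p->head.crc_ok || !p->tail.crc_ok)
		return false;
	if (p->head.version != p->tail.version)
		return false; /* torn write: the pack was not completed */
	*version = p->head.version;
	return true;
}

/* Assumed wraparound-safe "a is newer than b" comparison. */
static bool version_after(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/* Returns the index (0 or 1) of the pack to use, or -1 if none is valid. */
static int pick_checkpoint(const struct cp_pack packs[2])
{
	uint64_t v0 = 0, v1 = 0;
	bool ok0 = pack_valid(&packs[0], &v0);
	bool ok1 = pack_valid(&packs[1], &v1);

	if (ok0 && ok1)
		return version_after(v1, v0) ? 1 : 0;
	if (ok0)
		return 0;
	if (ok1)
		return 1;
	return -1;
}

/* Usage example: a torn second pack makes the mount roll back to pack 0. */
static void pick_checkpoint_example(void)
{
	struct cp_pack packs[2] = {
		{ { true, 5 }, { true, 5 } },
		{ { true, 6 }, { true, 5 } },
	};

	assert(pick_checkpoint(packs) == 0);
}
```

Because a checkpoint is always written to the pack that is not currently in
use, an interrupted write leaves the older pack untouched, and the file
system rolls back to it.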

Signed-off-by: Chul Lee <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 795 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 795 insertions(+)
create mode 100644 fs/f2fs/checkpoint.c

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
new file mode 100644
index 0000000..a0601cc
--- /dev/null
+++ b/fs/f2fs/checkpoint.c
@@ -0,0 +1,795 @@
+/**
+ * fs/f2fs/checkpoint.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/bio.h>
+#include <linux/mpage.h>
+#include <linux/writeback.h>
+#include <linux/blkdev.h>
+#include <linux/f2fs_fs.h>
+#include <linux/pagevec.h>
+#include <linux/swap.h>
+
+#include "f2fs.h"
+#include "node.h"
+#include "segment.h"
+
+static struct kmem_cache *orphan_entry_slab;
+static struct kmem_cache *inode_entry_slab;
+
+/**
+ * We guarantee no failure on the returned page.
+ */
+struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index)
+{
+ struct address_space *mapping = sbi->meta_inode->i_mapping;
+ struct page *page = NULL;
+repeat:
+ page = grab_cache_page(mapping, index);
+ if (!page) {
+ cond_resched();
+ goto repeat;
+ }
+
+ /* We wait writeback only inside grab_meta_page() */
+ wait_on_page_writeback(page);
+ SetPageUptodate(page);
+ return page;
+}
+
+/**
+ * We guarantee no failure on the returned page.
+ */
+struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index)
+{
+ struct address_space *mapping = sbi->meta_inode->i_mapping;
+ struct page *page;
+repeat:
+ page = grab_cache_page(mapping, index);
+ if (!page) {
+ cond_resched();
+ goto repeat;
+ }
+ if (f2fs_readpage(sbi, page, index, READ_SYNC)) {
+ f2fs_put_page(page, 1);
+ goto repeat;
+ }
+ mark_page_accessed(page);
+
+ /* We do not allow returning an erroneous page */
+ return page;
+}
+
+static int f2fs_write_meta_page(struct page *page,
+ struct writeback_control *wbc)
+{
+ struct inode *inode = page->mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ int err;
+
+ wait_on_page_writeback(page);
+
+ err = write_meta_page(sbi, page, wbc);
+ if (err) {
+ wbc->pages_skipped++;
+ set_page_dirty(page);
+ }
+
+ dec_page_count(sbi, F2FS_DIRTY_META);
+
+ /* In this case, we should not unlock this page */
+ if (err != AOP_WRITEPAGE_ACTIVATE)
+ unlock_page(page);
+ return err;
+}
+
+static int f2fs_write_meta_pages(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(mapping->host->i_sb);
+ struct block_device *bdev = sbi->sb->s_bdev;
+ long written;
+
+ if (wbc->for_kupdate)
+ return 0;
+
+ if (get_pages(sbi, F2FS_DIRTY_META) == 0)
+ return 0;
+
+ /* if mounting has failed, skip writing node pages */
+ mutex_lock(&sbi->cp_mutex);
+ written = sync_meta_pages(sbi, META, bio_get_nr_vecs(bdev));
+ mutex_unlock(&sbi->cp_mutex);
+ wbc->nr_to_write -= written;
+ return 0;
+}
+
+long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
+ long nr_to_write)
+{
+ struct address_space *mapping = sbi->meta_inode->i_mapping;
+ pgoff_t index = 0, end = LONG_MAX;
+ struct pagevec pvec;
+ long nwritten = 0;
+ struct writeback_control wbc = {
+ .for_reclaim = 0,
+ };
+
+ pagevec_init(&pvec, 0);
+
+ while (index <= end) {
+ int i, nr_pages;
+ nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
+ PAGECACHE_TAG_DIRTY,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
+ if (nr_pages == 0)
+ break;
+
+ for (i = 0; i < nr_pages; i++) {
+ struct page *page = pvec.pages[i];
+ lock_page(page);
+ BUG_ON(page->mapping != mapping);
+ BUG_ON(!PageDirty(page));
+ clear_page_dirty_for_io(page);
+ f2fs_write_meta_page(page, &wbc);
+ if (nwritten++ >= nr_to_write)
+ break;
+ }
+ pagevec_release(&pvec);
+ cond_resched();
+ }
+
+ if (nwritten)
+ f2fs_submit_bio(sbi, type, nr_to_write == LONG_MAX);
+
+ return nwritten;
+}
+
+static int f2fs_set_meta_page_dirty(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ struct f2fs_sb_info *sbi = F2FS_SB(mapping->host->i_sb);
+
+ SetPageUptodate(page);
+ if (!PageDirty(page)) {
+ __set_page_dirty_nobuffers(page);
+ inc_page_count(sbi, F2FS_DIRTY_META);
+ F2FS_SET_SB_DIRT(sbi);
+ return 1;
+ }
+ return 0;
+}
+
+const struct address_space_operations f2fs_meta_aops = {
+ .writepage = f2fs_write_meta_page,
+ .writepages = f2fs_write_meta_pages,
+ .set_page_dirty = f2fs_set_meta_page_dirty,
+};
+
+int check_orphan_space(struct f2fs_sb_info *sbi)
+{
+ unsigned int max_orphans;
+ int err = 0;
+
+ /*
+ * Considering 512 blocks in a segment, 5 blocks are needed for cp
+ * and log segment summaries. The remaining blocks are used to keep
+ * orphan entries. With the limitation of one reserved segment
+ * for the cp pack, we can have at most 1020*507 orphan entries.
+ */
+ max_orphans = (sbi->blocks_per_seg - 5) * F2FS_ORPHANS_PER_BLOCK;
+ mutex_lock(&sbi->orphan_inode_mutex);
+ if (sbi->n_orphans >= max_orphans)
+ err = -ENOSPC;
+ mutex_unlock(&sbi->orphan_inode_mutex);
+ return err;
+}
+
+void add_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
+{
+ struct list_head *head, *this;
+ struct orphan_inode_entry *new = NULL, *orphan = NULL;
+
+ mutex_lock(&sbi->orphan_inode_mutex);
+ head = &sbi->orphan_inode_list;
+ list_for_each(this, head) {
+ orphan = list_entry(this, struct orphan_inode_entry, list);
+ if (orphan->ino == ino)
+ goto out;
+ if (orphan->ino > ino)
+ break;
+ orphan = NULL;
+ }
+retry:
+ new = kmem_cache_alloc(orphan_entry_slab, GFP_ATOMIC);
+ if (!new) {
+ cond_resched();
+ goto retry;
+ }
+ new->ino = ino;
+ INIT_LIST_HEAD(&new->list);
+
+ /* add new_oentry into list which is sorted by inode number */
+ if (orphan) {
+ struct orphan_inode_entry *prev;
+
+ /* get previous entry */
+ prev = list_entry(orphan->list.prev, typeof(*prev), list);
+ if (&prev->list != head)
+ /* insert new orphan inode entry */
+ list_add(&new->list, &prev->list);
+ else
+ list_add(&new->list, head);
+ } else {
+ list_add_tail(&new->list, head);
+ }
+ sbi->n_orphans++;
+out:
+ mutex_unlock(&sbi->orphan_inode_mutex);
+}
+
+void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
+{
+ struct list_head *this, *next, *head;
+ struct orphan_inode_entry *orphan;
+
+ mutex_lock(&sbi->orphan_inode_mutex);
+ head = &sbi->orphan_inode_list;
+ list_for_each_safe(this, next, head) {
+ orphan = list_entry(this, struct orphan_inode_entry, list);
+ if (orphan->ino == ino) {
+ list_del(&orphan->list);
+ kmem_cache_free(orphan_entry_slab, orphan);
+ sbi->n_orphans--;
+ break;
+ }
+ }
+ mutex_unlock(&sbi->orphan_inode_mutex);
+}
+
+static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
+{
+ struct inode *inode = f2fs_iget(sbi->sb, ino);
+ BUG_ON(IS_ERR(inode));
+ clear_nlink(inode);
+
+ /* truncate all the data during iput */
+ iput(inode);
+}
+
+int recover_orphan_inodes(struct f2fs_sb_info *sbi)
+{
+ block_t start_blk, orphan_blkaddr, i, j;
+
+ if (!(F2FS_CKPT(sbi)->ckpt_flags & CP_ORPHAN_PRESENT_FLAG))
+ return 0;
+
+ sbi->por_doing = 1;
+ start_blk = __start_cp_addr(sbi) + 1;
+ orphan_blkaddr = __start_sum_addr(sbi) - 1;
+
+ for (i = 0; i < orphan_blkaddr; i++) {
+ struct page *page = get_meta_page(sbi, start_blk + i);
+ struct f2fs_orphan_block *orphan_blk;
+
+ orphan_blk = (struct f2fs_orphan_block *)page_address(page);
+ for (j = 0; j < le32_to_cpu(orphan_blk->entry_count); j++) {
+ nid_t ino = le32_to_cpu(orphan_blk->ino[j]);
+ recover_orphan_inode(sbi, ino);
+ }
+ f2fs_put_page(page, 1);
+ }
+ /* clear Orphan Flag */
+ F2FS_CKPT(sbi)->ckpt_flags &= (~CP_ORPHAN_PRESENT_FLAG);
+ sbi->por_doing = 0;
+ return 0;
+}
+
+static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk)
+{
+ struct list_head *head, *this, *next;
+ struct f2fs_orphan_block *orphan_blk = NULL;
+ struct page *page = NULL;
+ unsigned int nentries = 0;
+ unsigned short index = 1;
+ unsigned short orphan_blocks;
+
+ orphan_blocks = (unsigned short)((sbi->n_orphans +
+ (F2FS_ORPHANS_PER_BLOCK - 1)) / F2FS_ORPHANS_PER_BLOCK);
+
+ mutex_lock(&sbi->orphan_inode_mutex);
+ head = &sbi->orphan_inode_list;
+
+ /* loop over each orphan inode entry and write it into the journal block */
+ list_for_each_safe(this, next, head) {
+ struct orphan_inode_entry *orphan;
+
+ orphan = list_entry(this, struct orphan_inode_entry, list);
+
+ if (nentries == F2FS_ORPHANS_PER_BLOCK) {
+ /*
+ * an orphan block is full once it holds 1020 entries,
+ * so we need to flush the current orphan block
+ * and bring another one into memory
+ */
+ orphan_blk->blk_addr = cpu_to_le16(index);
+ orphan_blk->blk_count = cpu_to_le16(orphan_blocks);
+ orphan_blk->entry_count = cpu_to_le32(nentries);
+ set_page_dirty(page);
+ f2fs_put_page(page, 1);
+ index++;
+ start_blk++;
+ nentries = 0;
+ page = NULL;
+ }
+ if (page)
+ goto page_exist;
+
+ page = grab_meta_page(sbi, start_blk);
+ orphan_blk = (struct f2fs_orphan_block *)page_address(page);
+ memset(orphan_blk, 0, sizeof(*orphan_blk));
+page_exist:
+ orphan_blk->ino[nentries++] = cpu_to_le32(orphan->ino);
+ }
+ if (!page)
+ goto end;
+
+ orphan_blk->blk_addr = cpu_to_le16(index);
+ orphan_blk->blk_count = cpu_to_le16(orphan_blocks);
+ orphan_blk->entry_count = cpu_to_le32(nentries);
+ set_page_dirty(page);
+ f2fs_put_page(page, 1);
+end:
+ mutex_unlock(&sbi->orphan_inode_mutex);
+}
+
+static struct page *validate_checkpoint(struct f2fs_sb_info *sbi,
+ block_t cp_addr, unsigned long long *version)
+{
+ struct page *cp_page_1, *cp_page_2 = NULL;
+ unsigned long blk_size = sbi->blocksize;
+ struct f2fs_checkpoint *cp_block;
+ unsigned long long cur_version = 0, pre_version = 0;
+ unsigned int crc = 0;
+ size_t crc_offset;
+
+ /* Read the 1st cp block in this CP pack */
+ cp_page_1 = get_meta_page(sbi, cp_addr);
+
+ /* get the version number */
+ cp_block = (struct f2fs_checkpoint *)page_address(cp_page_1);
+ crc_offset = le32_to_cpu(cp_block->checksum_offset);
+ if (crc_offset >= blk_size)
+ goto invalid_cp1;
+
+ crc = *(unsigned int *)((unsigned char *)cp_block + crc_offset);
+ if (!f2fs_crc_valid(crc, cp_block, crc_offset))
+ goto invalid_cp1;
+
+ pre_version = le64_to_cpu(cp_block->checkpoint_ver);
+
+ /* Read the 2nd cp block in this CP pack */
+ cp_addr += le64_to_cpu(cp_block->cp_pack_total_block_count) - 1;
+ cp_page_2 = get_meta_page(sbi, cp_addr);
+
+ cp_block = (struct f2fs_checkpoint *)page_address(cp_page_2);
+ crc_offset = le32_to_cpu(cp_block->checksum_offset);
+ if (crc_offset >= blk_size)
+ goto invalid_cp2;
+
+ crc = *(unsigned int *)((unsigned char *)cp_block + crc_offset);
+ if (!f2fs_crc_valid(crc, cp_block, crc_offset))
+ goto invalid_cp2;
+
+ cur_version = le64_to_cpu(cp_block->checkpoint_ver);
+
+ if (cur_version == pre_version) {
+ *version = cur_version;
+ f2fs_put_page(cp_page_2, 1);
+ return cp_page_1;
+ }
+invalid_cp2:
+ f2fs_put_page(cp_page_2, 1);
+invalid_cp1:
+ f2fs_put_page(cp_page_1, 1);
+ return NULL;
+}
+
+int get_valid_checkpoint(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_checkpoint *cp_block;
+ struct f2fs_super_block *fsb = sbi->raw_super;
+ struct page *cp1, *cp2, *cur_page;
+ unsigned long blk_size = sbi->blocksize;
+ unsigned long long cp1_version = 0, cp2_version = 0;
+ unsigned long long cp_start_blk_no;
+
+ sbi->ckpt = kzalloc(blk_size, GFP_KERNEL);
+ if (!sbi->ckpt)
+ return -ENOMEM;
+ /*
+ * Finding a valid cp block involves reading both
+ * sets (cp pack 1 and cp pack 2)
+ */
+ cp_start_blk_no = le32_to_cpu(fsb->start_segment_checkpoint);
+ cp1 = validate_checkpoint(sbi, cp_start_blk_no, &cp1_version);
+
+ /* The second checkpoint pack should start at the next segment */
+ cp_start_blk_no += 1 << le32_to_cpu(fsb->log_blocks_per_seg);
+ cp2 = validate_checkpoint(sbi, cp_start_blk_no, &cp2_version);
+
+ if (cp1 && cp2) {
+ if (ver_after(cp2_version, cp1_version))
+ cur_page = cp2;
+ else
+ cur_page = cp1;
+ } else if (cp1) {
+ cur_page = cp1;
+ } else if (cp2) {
+ cur_page = cp2;
+ } else {
+ goto fail_no_cp;
+ }
+
+ cp_block = (struct f2fs_checkpoint *)page_address(cur_page);
+ memcpy(sbi->ckpt, cp_block, blk_size);
+
+ f2fs_put_page(cp1, 1);
+ f2fs_put_page(cp2, 1);
+ return 0;
+
+fail_no_cp:
+ kfree(sbi->ckpt);
+ return -EINVAL;
+}
+
+void set_dirty_dir_page(struct inode *inode, struct page *page)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct list_head *head = &sbi->dir_inode_list;
+ struct dir_inode_entry *new;
+ struct list_head *this;
+
+ if (!S_ISDIR(inode->i_mode))
+ return;
+retry:
+ new = kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
+ if (!new) {
+ cond_resched();
+ goto retry;
+ }
+ new->inode = inode;
+ INIT_LIST_HEAD(&new->list);
+
+ spin_lock(&sbi->dir_inode_lock);
+ list_for_each(this, head) {
+ struct dir_inode_entry *entry;
+ entry = list_entry(this, struct dir_inode_entry, list);
+ if (entry->inode == inode) {
+ kmem_cache_free(inode_entry_slab, new);
+ goto out;
+ }
+ }
+ list_add_tail(&new->list, head);
+ sbi->n_dirty_dirs++;
+
+ BUG_ON(!S_ISDIR(inode->i_mode));
+out:
+ inc_page_count(sbi, F2FS_DIRTY_DENTS);
+ inode_inc_dirty_dents(inode);
+ SetPagePrivate(page);
+
+ spin_unlock(&sbi->dir_inode_lock);
+}
+
+void remove_dirty_dir_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct list_head *head = &sbi->dir_inode_list;
+ struct list_head *this;
+
+ if (!S_ISDIR(inode->i_mode))
+ return;
+
+ spin_lock(&sbi->dir_inode_lock);
+ if (atomic_read(&F2FS_I(inode)->dirty_dents))
+ goto out;
+
+ list_for_each(this, head) {
+ struct dir_inode_entry *entry;
+ entry = list_entry(this, struct dir_inode_entry, list);
+ if (entry->inode == inode) {
+ list_del(&entry->list);
+ kmem_cache_free(inode_entry_slab, entry);
+ sbi->n_dirty_dirs--;
+ break;
+ }
+ }
+out:
+ spin_unlock(&sbi->dir_inode_lock);
+}
+
+void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
+{
+ struct list_head *head = &sbi->dir_inode_list;
+ struct dir_inode_entry *entry;
+ struct inode *inode;
+retry:
+ spin_lock(&sbi->dir_inode_lock);
+ if (list_empty(head)) {
+ spin_unlock(&sbi->dir_inode_lock);
+ return;
+ }
+ entry = list_entry(head->next, struct dir_inode_entry, list);
+ inode = igrab(entry->inode);
+ spin_unlock(&sbi->dir_inode_lock);
+ if (inode) {
+ filemap_flush(inode->i_mapping);
+ iput(inode);
+ } else {
+ /*
+ * We should submit the bio, since several dentry pages
+ * are still under writeback in the inode being freed.
+ */
+ f2fs_submit_bio(sbi, DATA, true);
+ }
+ goto retry;
+}
+
+/**
+ * Freeze all the FS-operations for checkpoint.
+ */
+void block_operations(struct f2fs_sb_info *sbi)
+{
+ int t;
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = LONG_MAX,
+ .for_reclaim = 0,
+ };
+
+ /* Stop renaming operation */
+ mutex_lock_op(sbi, RENAME);
+ mutex_lock_op(sbi, DENTRY_OPS);
+
+retry_dents:
+ /* write all the dirty dentry pages */
+ sync_dirty_dir_inodes(sbi);
+
+ mutex_lock_op(sbi, DATA_WRITE);
+ if (get_pages(sbi, F2FS_DIRTY_DENTS)) {
+ mutex_unlock_op(sbi, DATA_WRITE);
+ goto retry_dents;
+ }
+
+ /* block all the operations */
+ for (t = DATA_NEW; t <= NODE_TRUNC; t++)
+ mutex_lock_op(sbi, t);
+
+ mutex_lock(&sbi->write_inode);
+
+ /*
+ * POR: we should ensure that there is no dirty node pages
+ * until finishing nat/sit flush.
+ */
+retry:
+ sync_node_pages(sbi, 0, &wbc);
+
+ mutex_lock_op(sbi, NODE_WRITE);
+
+ if (get_pages(sbi, F2FS_DIRTY_NODES)) {
+ mutex_unlock_op(sbi, NODE_WRITE);
+ goto retry;
+ }
+ mutex_unlock(&sbi->write_inode);
+}
+
+static void unblock_operations(struct f2fs_sb_info *sbi)
+{
+ int t;
+ for (t = NODE_WRITE; t >= RENAME; t--)
+ mutex_unlock_op(sbi, t);
+}
+
+static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
+{
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ nid_t last_nid = 0;
+ int nat_upd_blkoff[3];
+ block_t start_blk;
+ struct page *cp_page;
+ unsigned int data_sum_blocks, orphan_blocks;
+ void *kaddr;
+ __u32 crc32 = 0;
+ int i;
+
+ /* Flush all the NAT/SIT pages */
+ while (get_pages(sbi, F2FS_DIRTY_META))
+ sync_meta_pages(sbi, META, LONG_MAX);
+
+ next_free_nid(sbi, &last_nid);
+
+ /*
+ * modify checkpoint
+ * version number is already updated
+ */
+ ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi));
+ ckpt->valid_block_count = cpu_to_le64(valid_user_blocks(sbi));
+ ckpt->free_segment_count = cpu_to_le32(free_segments(sbi));
+ for (i = 0; i < 3; i++) {
+ ckpt->cur_node_segno[i] =
+ cpu_to_le32(curseg_segno(sbi, i + CURSEG_HOT_NODE));
+ ckpt->cur_node_blkoff[i] =
+ cpu_to_le16(curseg_blkoff(sbi, i + CURSEG_HOT_NODE));
+ nat_upd_blkoff[i] = NM_I(sbi)->nat_upd_blkoff[i];
+ ckpt->nat_upd_blkoff[i] = cpu_to_le16(nat_upd_blkoff[i]);
+ ckpt->alloc_type[i + CURSEG_HOT_NODE] =
+ curseg_alloc_type(sbi, i + CURSEG_HOT_NODE);
+ }
+ for (i = 0; i < 3; i++) {
+ ckpt->cur_data_segno[i] =
+ cpu_to_le32(curseg_segno(sbi, i + CURSEG_HOT_DATA));
+ ckpt->cur_data_blkoff[i] =
+ cpu_to_le16(curseg_blkoff(sbi, i + CURSEG_HOT_DATA));
+ ckpt->alloc_type[i + CURSEG_HOT_DATA] =
+ curseg_alloc_type(sbi, i + CURSEG_HOT_DATA);
+ }
+
+ ckpt->valid_node_count = cpu_to_le32(valid_node_count(sbi));
+ ckpt->valid_inode_count = cpu_to_le32(valid_inode_count(sbi));
+ ckpt->next_free_nid = cpu_to_le32(last_nid);
+
+ /* 2 cp + n data seg summary + orphan inode blocks */
+ data_sum_blocks = npages_for_summary_flush(sbi);
+ if (data_sum_blocks < 3)
+ ckpt->ckpt_flags |= CP_COMPACT_SUM_FLAG;
+ else
+ ckpt->ckpt_flags &= (~CP_COMPACT_SUM_FLAG);
+
+ orphan_blocks = (sbi->n_orphans + F2FS_ORPHANS_PER_BLOCK - 1)
+ / F2FS_ORPHANS_PER_BLOCK;
+ ckpt->cp_pack_start_sum = 1 + orphan_blocks;
+ ckpt->cp_pack_total_block_count = 2 + data_sum_blocks + orphan_blocks;
+
+ if (is_umount) {
+ ckpt->ckpt_flags |= CP_UMOUNT_FLAG;
+ ckpt->cp_pack_total_block_count += NR_CURSEG_NODE_TYPE;
+ } else {
+ ckpt->ckpt_flags &= (~CP_UMOUNT_FLAG);
+ }
+
+ if (sbi->n_orphans)
+ ckpt->ckpt_flags |= CP_ORPHAN_PRESENT_FLAG;
+ else
+ ckpt->ckpt_flags &= (~CP_ORPHAN_PRESENT_FLAG);
+
+ /* update SIT/NAT bitmap */
+ get_sit_bitmap(sbi, __bitmap_ptr(sbi, SIT_BITMAP));
+ get_nat_bitmap(sbi, __bitmap_ptr(sbi, NAT_BITMAP));
+
+ crc32 = f2fs_crc32(ckpt, le32_to_cpu(ckpt->checksum_offset));
+ *(__u32 *)((unsigned char *)ckpt +
+ le32_to_cpu(ckpt->checksum_offset))
+ = cpu_to_le32(crc32);
+
+ start_blk = __start_cp_addr(sbi);
+
+ /* write out checkpoint buffer at block 0 */
+ cp_page = grab_meta_page(sbi, start_blk++);
+ kaddr = page_address(cp_page);
+ memcpy(kaddr, ckpt, (1 << sbi->log_blocksize));
+ set_page_dirty(cp_page);
+ f2fs_put_page(cp_page, 1);
+
+ if (sbi->n_orphans) {
+ write_orphan_inodes(sbi, start_blk);
+ start_blk += orphan_blocks;
+ }
+
+ write_data_summaries(sbi, start_blk);
+ start_blk += data_sum_blocks;
+ if (is_umount) {
+ write_node_summaries(sbi, start_blk);
+ start_blk += NR_CURSEG_NODE_TYPE;
+ }
+
+ /* writeout checkpoint block */
+ cp_page = grab_meta_page(sbi, start_blk);
+ kaddr = page_address(cp_page);
+ memcpy(kaddr, ckpt, (1 << sbi->log_blocksize));
+ set_page_dirty(cp_page);
+ f2fs_put_page(cp_page, 1);
+
+ /* wait for previous submitted node/meta pages writeback */
+ while (get_pages(sbi, F2FS_WRITEBACK))
+ congestion_wait(BLK_RW_ASYNC, HZ / 50);
+
+ filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
+ filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
+
+ /* update user_block_counts */
+ sbi->last_valid_block_count = sbi->total_valid_block_count;
+ sbi->alloc_valid_block_count = 0;
+
+ /* Here, we only have one bio having CP pack */
+ if (sbi->ckpt->ckpt_flags & CP_ERROR_FLAG)
+ sbi->sb->s_flags |= MS_RDONLY;
+ else
+ sync_meta_pages(sbi, META_FLUSH, LONG_MAX);
+
+ clear_prefree_segments(sbi);
+ F2FS_RESET_SB_DIRT(sbi);
+}
+
+/**
+ * We guarantee that this checkpoint procedure should not fail.
+ */
+void write_checkpoint(struct f2fs_sb_info *sbi, bool blocked, bool is_umount)
+{
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ unsigned long long ckpt_ver;
+
+ if (!blocked) {
+ mutex_lock(&sbi->cp_mutex);
+ block_operations(sbi);
+ }
+
+ f2fs_submit_bio(sbi, DATA, true);
+ f2fs_submit_bio(sbi, NODE, true);
+ f2fs_submit_bio(sbi, META, true);
+
+ /*
+ * update checkpoint pack index
+ * Increase the version number so that
+ * SIT entries and seg summaries are written at correct place
+ */
+ ckpt_ver = le64_to_cpu(ckpt->checkpoint_ver);
+ ckpt->checkpoint_ver = cpu_to_le64(++ckpt_ver);
+
+ /* write cached NAT/SIT entries to NAT/SIT area */
+ flush_nat_entries(sbi);
+ flush_sit_entries(sbi);
+
+ reset_victim_segmap(sbi);
+
+ /* unlock all the fs_lock[] in do_checkpoint() */
+ do_checkpoint(sbi, is_umount);
+
+ unblock_operations(sbi);
+ mutex_unlock(&sbi->cp_mutex);
+}
+
+void init_orphan_info(struct f2fs_sb_info *sbi)
+{
+ mutex_init(&sbi->orphan_inode_mutex);
+ INIT_LIST_HEAD(&sbi->orphan_inode_list);
+ sbi->n_orphans = 0;
+}
+
+int create_checkpoint_caches(void)
+{
+ orphan_entry_slab = f2fs_kmem_cache_create("f2fs_orphan_entry",
+ sizeof(struct orphan_inode_entry), NULL);
+ if (unlikely(!orphan_entry_slab))
+ return -ENOMEM;
+ inode_entry_slab = f2fs_kmem_cache_create("f2fs_dirty_dir_entry",
+ sizeof(struct dir_inode_entry), NULL);
+ if (unlikely(!inode_entry_slab)) {
+ kmem_cache_destroy(orphan_entry_slab);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+void destroy_checkpoint_caches(void)
+{
+ kmem_cache_destroy(orphan_entry_slab);
+ kmem_cache_destroy(inode_entry_slab);
+}
--
1.7.9.5




---
Jaegeuk Kim
Samsung


2012-10-23 02:28:26

by Jaegeuk Kim

Subject: [PATCH 06/16 v2] f2fs: add node operations

This adds specific functions to manage NAT pages, a cache for NAT entries, free
nids, direct/indirect node blocks for indexing data, and address space for node
pages.

- The key information of an NAT entry consists of a node id and a block address.

- An NAT page is composed of block addresses covered by a certain range of NAT
entries, which is maintained by the address space of meta_inode.

- A radix tree structure is used to cache NAT entries. The index for the tree
is a node id.

- When there is no free nid, F2FS should scan NAT entries to find a new one. In
order to avoid scanning frequently, F2FS manages a list containing a number of
free nids in memory. The scanning process, build_free_nids(), is triggered only
when the free nids in the list are exhausted.

- F2FS has direct and indirect node blocks for indexing data. This patch adds
functions related to the node block management such as getting, allocating, and
truncating node blocks to index data.

- In order to cache node blocks in memory, F2FS has a node_inode with an address
space for node pages. This patch also adds the address space operations for
node_inode.
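
As an illustration of the free nid item above, the following is a minimal
user-space sketch, not the code in this patch. It assumes a tiny in-memory NAT
array and a fixed-size free nid cache; every name prefixed with sketch_ or
SKETCH_ is hypothetical. It only shows the idea that the NAT is scanned when
the in-memory list of free nids runs empty, not on every allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_NID   64u         /* hypothetical number of NAT entries */
#define SKETCH_BATCH     8u          /* hypothetical free nids cached per scan */
#define SKETCH_NULL_ADDR 0u          /* block address of an unused nid */
#define SKETCH_NEW_ADDR  UINT32_MAX  /* placeholder: allocated, not yet written */

struct sketch_nm {
	uint32_t nat[SKETCH_MAX_NID];     /* nid -> block address */
	uint32_t free_nids[SKETCH_BATCH]; /* in-memory cache of free nids */
	size_t nr_free;                   /* valid entries in free_nids[] */
	uint32_t next_scan_nid;           /* where the next scan resumes */
	unsigned int scans;               /* number of scans triggered so far */
};

static void sketch_init(struct sketch_nm *nm)
{
	memset(nm, 0, sizeof(*nm));
	nm->next_scan_nid = 1;            /* nid 0 is never handed out */
}

/* Walk the NAT from next_scan_nid and refill the free nid cache. */
static void sketch_build_free_nids(struct sketch_nm *nm)
{
	uint32_t nid = nm->next_scan_nid;
	uint32_t visited;

	nm->scans++;
	for (visited = 0;
	     visited < SKETCH_MAX_NID && nm->nr_free < SKETCH_BATCH;
	     visited++) {
		if (nid != 0 && nm->nat[nid] == SKETCH_NULL_ADDR)
			nm->free_nids[nm->nr_free++] = nid;
		nid = (nid + 1) % SKETCH_MAX_NID;
	}
	nm->next_scan_nid = nid;
}

/* Hand out a free nid; scan the NAT only when the cache is empty. */
static bool sketch_alloc_nid(struct sketch_nm *nm, uint32_t *nid)
{
	if (nm->nr_free == 0)
		sketch_build_free_nids(nm);
	if (nm->nr_free == 0)
		return false;             /* no free nid anywhere in the NAT */

	*nid = nm->free_nids[--nm->nr_free];
	nm->nat[*nid] = SKETCH_NEW_ADDR;  /* keep later scans from reusing it */
	return true;
}
```

With these assumed sizes, eight consecutive allocations are served by a single
scan, and the ninth one triggers the second scan. The patch itself uses a
linked list protected by a spinlock and also consults the NAT journal and the
NAT cache, which this sketch leaves out.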

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/node.c | 1782 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 1782 insertions(+)
create mode 100644 fs/f2fs/node.c

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
new file mode 100644
index 0000000..df69058
--- /dev/null
+++ b/fs/f2fs/node.c
@@ -0,0 +1,1782 @@
+/**
+ * fs/f2fs/node.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+#include <linux/mpage.h>
+#include <linux/backing-dev.h>
+#include <linux/blkdev.h>
+#include <linux/pagevec.h>
+#include <linux/swap.h>
+
+#include "f2fs.h"
+#include "node.h"
+#include "segment.h"
+
+static struct kmem_cache *nat_entry_slab;
+static struct kmem_cache *free_nid_slab;
+
+static void clear_node_page_dirty(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ struct f2fs_sb_info *sbi = F2FS_SB(mapping->host->i_sb);
+ unsigned long flags;
+
+ if (PageDirty(page)) {
+ spin_lock_irqsave(&mapping->tree_lock, flags);
+ radix_tree_tag_clear(&mapping->page_tree,
+ page_index(page),
+ PAGECACHE_TAG_DIRTY);
+ spin_unlock_irqrestore(&mapping->tree_lock, flags);
+
+ clear_page_dirty_for_io(page);
+ dec_page_count(sbi, F2FS_DIRTY_NODES);
+ }
+ ClearPageUptodate(page);
+}
+
+static struct page *get_current_nat_page(struct f2fs_sb_info *sbi, nid_t nid)
+{
+ pgoff_t index = current_nat_addr(sbi, nid);
+ return get_meta_page(sbi, index);
+}
+
+static struct page *get_next_nat_page(struct f2fs_sb_info *sbi, nid_t nid)
+{
+ struct page *src_page;
+ struct page *dst_page;
+ pgoff_t src_off;
+ pgoff_t dst_off;
+ void *src_addr;
+ void *dst_addr;
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+
+ src_off = current_nat_addr(sbi, nid);
+ dst_off = next_nat_addr(sbi, src_off);
+
+ /* get current nat block page with lock */
+ src_page = get_meta_page(sbi, src_off);
+
+ /* Dirty src_page means that it is already the new target NAT page. */
+ if (PageDirty(src_page))
+ return src_page;
+
+ dst_page = grab_meta_page(sbi, dst_off);
+
+ src_addr = page_address(src_page);
+ dst_addr = page_address(dst_page);
+ memcpy(dst_addr, src_addr, PAGE_CACHE_SIZE);
+ set_page_dirty(dst_page);
+ f2fs_put_page(src_page, 1);
+
+ set_to_next_nat(nm_i, nid);
+
+ return dst_page;
+}
+
+/**
+ * Readahead NAT pages
+ */
+static void ra_nat_pages(struct f2fs_sb_info *sbi, int nid)
+{
+ struct address_space *mapping = sbi->meta_inode->i_mapping;
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct page *page;
+ pgoff_t index;
+ int i;
+
+ for (i = 0; i < FREE_NID_PAGES; i++, nid += NAT_ENTRY_PER_BLOCK) {
+ if (nid >= nm_i->max_nid)
+ nid = 0;
+ index = current_nat_addr(sbi, nid);
+
+ page = grab_cache_page(mapping, index);
+ if (!page)
+ continue;
+ if (f2fs_readpage(sbi, page, index, READ)) {
+ f2fs_put_page(page, 1);
+ continue;
+ }
+ page_cache_release(page);
+ }
+}
+
+static struct nat_entry *__lookup_nat_cache(struct f2fs_nm_info *nm_i, nid_t n)
+{
+ return radix_tree_lookup(&nm_i->nat_root, n);
+}
+
+static unsigned int __gang_lookup_nat_cache(struct f2fs_nm_info *nm_i,
+ nid_t start, unsigned int nr, struct nat_entry **ep)
+{
+ return radix_tree_gang_lookup(&nm_i->nat_root, (void **)ep, start, nr);
+}
+
+static void __del_from_nat_cache(struct f2fs_nm_info *nm_i, struct nat_entry *e)
+{
+ list_del(&e->list);
+ radix_tree_delete(&nm_i->nat_root, nat_get_nid(e));
+ nm_i->nat_cnt--;
+ kmem_cache_free(nat_entry_slab, e);
+}
+
+int is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct nat_entry *e;
+ int is_cp = 1;
+
+ read_lock(&nm_i->nat_tree_lock);
+ e = __lookup_nat_cache(nm_i, nid);
+ if (e && !e->checkpointed)
+ is_cp = 0;
+ read_unlock(&nm_i->nat_tree_lock);
+ return is_cp;
+}
+
+static struct nat_entry *grab_nat_entry(struct f2fs_nm_info *nm_i, nid_t nid)
+{
+ struct nat_entry *new;
+
+ new = kmem_cache_alloc(nat_entry_slab, GFP_ATOMIC);
+ if (!new)
+ return NULL;
+ if (radix_tree_insert(&nm_i->nat_root, nid, new)) {
+ kmem_cache_free(nat_entry_slab, new);
+ return NULL;
+ }
+ memset(new, 0, sizeof(struct nat_entry));
+ nat_set_nid(new, nid);
+ list_add_tail(&new->list, &nm_i->nat_entries);
+ nm_i->nat_cnt++;
+ return new;
+}
+
+static void cache_nat_entry(struct f2fs_nm_info *nm_i, nid_t nid,
+ struct f2fs_nat_entry *ne)
+{
+ struct nat_entry *e;
+retry:
+ write_lock(&nm_i->nat_tree_lock);
+ e = __lookup_nat_cache(nm_i, nid);
+ if (!e) {
+ e = grab_nat_entry(nm_i, nid);
+ if (!e) {
+ write_unlock(&nm_i->nat_tree_lock);
+ goto retry;
+ }
+ nat_set_blkaddr(e, le32_to_cpu(ne->block_addr));
+ nat_set_ino(e, le32_to_cpu(ne->ino));
+ nat_set_version(e, ne->version);
+ e->checkpointed = true;
+ }
+ write_unlock(&nm_i->nat_tree_lock);
+}
+
+static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
+ block_t new_blkaddr)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct nat_entry *e;
+retry:
+ write_lock(&nm_i->nat_tree_lock);
+ e = __lookup_nat_cache(nm_i, ni->nid);
+ if (!e) {
+ e = grab_nat_entry(nm_i, ni->nid);
+ if (!e) {
+ write_unlock(&nm_i->nat_tree_lock);
+ goto retry;
+ }
+ e->ni = *ni;
+ e->checkpointed = true;
+ BUG_ON(ni->blk_addr == NEW_ADDR);
+ } else if (new_blkaddr == NEW_ADDR) {
+ /*
+ * When a nid is reallocated,
+ * the previous nat entry may remain in the nat cache.
+ * So, reinitialize it with the new information.
+ */
+ e->ni = *ni;
+ BUG_ON(ni->blk_addr != NULL_ADDR);
+ }
+
+ if (new_blkaddr == NEW_ADDR)
+ e->checkpointed = false;
+
+ /* sanity check */
+ BUG_ON(nat_get_blkaddr(e) != ni->blk_addr);
+ BUG_ON(nat_get_blkaddr(e) == NULL_ADDR &&
+ new_blkaddr == NULL_ADDR);
+ BUG_ON(nat_get_blkaddr(e) == NEW_ADDR &&
+ new_blkaddr == NEW_ADDR);
+ BUG_ON(nat_get_blkaddr(e) != NEW_ADDR &&
+ nat_get_blkaddr(e) != NULL_ADDR &&
+ new_blkaddr == NEW_ADDR);
+
+ /* increment the version number as the node is removed */
+ if (nat_get_blkaddr(e) != NEW_ADDR && new_blkaddr == NULL_ADDR) {
+ unsigned char version = nat_get_version(e);
+ nat_set_version(e, inc_node_version(version));
+ }
+
+ /* change address */
+ nat_set_blkaddr(e, new_blkaddr);
+ __set_nat_cache_dirty(nm_i, e);
+ write_unlock(&nm_i->nat_tree_lock);
+}
+
+static int try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+
+ if (nm_i->nat_cnt < 2 * NM_WOUT_THRESHOLD)
+ return 0;
+
+ write_lock(&nm_i->nat_tree_lock);
+ while (nr_shrink && !list_empty(&nm_i->nat_entries)) {
+ struct nat_entry *ne;
+ ne = list_first_entry(&nm_i->nat_entries,
+ struct nat_entry, list);
+ __del_from_nat_cache(nm_i, ne);
+ nr_shrink--;
+ }
+ write_unlock(&nm_i->nat_tree_lock);
+ return nr_shrink;
+}
+
+/**
+ * This function always succeeds
+ */
+void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
+ struct f2fs_summary_block *sum = curseg->sum_blk;
+ nid_t start_nid = START_NID(nid);
+ struct f2fs_nat_block *nat_blk;
+ struct page *page = NULL;
+ struct f2fs_nat_entry ne;
+ struct nat_entry *e;
+ int i;
+
+ ni->nid = nid;
+
+ /* Check nat cache */
+ read_lock(&nm_i->nat_tree_lock);
+ e = __lookup_nat_cache(nm_i, nid);
+ if (e) {
+ ni->ino = nat_get_ino(e);
+ ni->blk_addr = nat_get_blkaddr(e);
+ ni->version = nat_get_version(e);
+ }
+ read_unlock(&nm_i->nat_tree_lock);
+ if (e)
+ return;
+
+ /* Check current segment summary */
+ mutex_lock(&curseg->curseg_mutex);
+ i = lookup_journal_in_cursum(sum, NAT_JOURNAL, nid, 0);
+ if (i >= 0) {
+ ne = nat_in_journal(sum, i);
+ node_info_from_raw_nat(ni, &ne);
+ }
+ mutex_unlock(&curseg->curseg_mutex);
+ if (i >= 0)
+ goto cache;
+
+ /* Fill node_info from nat page */
+ page = get_current_nat_page(sbi, start_nid);
+ nat_blk = (struct f2fs_nat_block *)page_address(page);
+ ne = nat_blk->entries[nid - start_nid];
+ node_info_from_raw_nat(ni, &ne);
+ f2fs_put_page(page, 1);
+cache:
+ /* cache nat entry */
+ cache_nat_entry(NM_I(sbi), nid, &ne);
+}
+
+/**
+ * The maximum depth is four.
+ * Offset[0] will have raw inode offset.
+ */
+static int get_node_path(long block, int offset[4], unsigned int noffset[4])
+{
+ const long direct_index = ADDRS_PER_INODE;
+ const long direct_blks = ADDRS_PER_BLOCK;
+ const long dptrs_per_blk = NIDS_PER_BLOCK;
+ const long indirect_blks = ADDRS_PER_BLOCK * NIDS_PER_BLOCK;
+ const long dindirect_blks = indirect_blks * NIDS_PER_BLOCK;
+ int n = 0;
+ int level = 0;
+
+ noffset[0] = 0;
+
+ if (block < direct_index) {
+ offset[n++] = block;
+ level = 0;
+ goto got;
+ }
+ block -= direct_index;
+ if (block < direct_blks) {
+ offset[n++] = NODE_DIR1_BLOCK;
+ noffset[n] = 1;
+ offset[n++] = block;
+ level = 1;
+ goto got;
+ }
+ block -= direct_blks;
+ if (block < direct_blks) {
+ offset[n++] = NODE_DIR2_BLOCK;
+ noffset[n] = 2;
+ offset[n++] = block;
+ level = 1;
+ goto got;
+ }
+ block -= direct_blks;
+ if (block < indirect_blks) {
+ offset[n++] = NODE_IND1_BLOCK;
+ noffset[n] = 3;
+ offset[n++] = block / direct_blks;
+ noffset[n] = 4 + offset[n - 1];
+ offset[n++] = block % direct_blks;
+ level = 2;
+ goto got;
+ }
+ block -= indirect_blks;
+ if (block < indirect_blks) {
+ offset[n++] = NODE_IND2_BLOCK;
+ noffset[n] = 4 + dptrs_per_blk;
+ offset[n++] = block / direct_blks;
+ noffset[n] = 5 + dptrs_per_blk + offset[n - 1];
+ offset[n++] = block % direct_blks;
+ level = 2;
+ goto got;
+ }
+ block -= indirect_blks;
+ if (block < dindirect_blks) {
+ offset[n++] = NODE_DIND_BLOCK;
+ noffset[n] = 5 + (dptrs_per_blk * 2);
+ offset[n++] = block / indirect_blks;
+ noffset[n] = 6 + (dptrs_per_blk * 2) +
+ offset[n - 1] * (dptrs_per_blk + 1);
+ offset[n++] = (block / direct_blks) % dptrs_per_blk;
+ noffset[n] = 7 + (dptrs_per_blk * 2) +
+ offset[n - 2] * (dptrs_per_blk + 1) +
+ offset[n - 1];
+ offset[n++] = block % direct_blks;
+ level = 3;
+ goto got;
+ } else {
+ BUG();
+ }
+got:
+ return level;
+}
+
+/*
+ * Caller should call f2fs_put_dnode(dn).
+ */
+int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int ro)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
+ struct page *npage[4];
+ struct page *parent;
+ int offset[4];
+ unsigned int noffset[4];
+ nid_t nids[4];
+ int level, i;
+ int err = 0;
+
+ level = get_node_path(index, offset, noffset);
+
+ nids[0] = dn->inode->i_ino;
+ npage[0] = get_node_page(sbi, nids[0]);
+ if (IS_ERR(npage[0]))
+ return PTR_ERR(npage[0]);
+
+ parent = npage[0];
+ nids[1] = get_nid(parent, offset[0], true);
+ dn->inode_page = npage[0];
+ dn->inode_page_locked = true;
+
+ /* get indirect or direct nodes */
+ for (i = 1; i <= level; i++) {
+ bool done = false;
+
+ if (!nids[i] && !ro) {
+ mutex_lock_op(sbi, NODE_NEW);
+
+ /* alloc new node */
+ if (!alloc_nid(sbi, &(nids[i]))) {
+ mutex_unlock_op(sbi, NODE_NEW);
+ err = -ENOSPC;
+ goto release_pages;
+ }
+
+ dn->nid = nids[i];
+ npage[i] = new_node_page(dn, noffset[i]);
+ if (IS_ERR(npage[i])) {
+ alloc_nid_failed(sbi, nids[i]);
+ mutex_unlock_op(sbi, NODE_NEW);
+ err = PTR_ERR(npage[i]);
+ goto release_pages;
+ }
+
+ set_nid(parent, offset[i - 1], nids[i], i == 1);
+ alloc_nid_done(sbi, nids[i]);
+ mutex_unlock_op(sbi, NODE_NEW);
+ done = true;
+ } else if (ro && i == level && level > 1) {
+ npage[i] = get_node_page_ra(parent, offset[i - 1]);
+ if (IS_ERR(npage[i])) {
+ err = PTR_ERR(npage[i]);
+ goto release_pages;
+ }
+ done = true;
+ }
+ if (i == 1) {
+ dn->inode_page_locked = false;
+ unlock_page(parent);
+ } else {
+ f2fs_put_page(parent, 1);
+ }
+
+ if (!done) {
+ npage[i] = get_node_page(sbi, nids[i]);
+ if (IS_ERR(npage[i])) {
+ err = PTR_ERR(npage[i]);
+ f2fs_put_page(npage[0], 0);
+ goto release_out;
+ }
+ }
+ if (i < level) {
+ parent = npage[i];
+ nids[i + 1] = get_nid(parent, offset[i], false);
+ }
+ }
+ dn->nid = nids[level];
+ dn->ofs_in_node = offset[level];
+ dn->node_page = npage[level];
+ dn->data_blkaddr = datablock_addr(dn->node_page, dn->ofs_in_node);
+ return 0;
+
+release_pages:
+ f2fs_put_page(parent, 1);
+ if (i > 1)
+ f2fs_put_page(npage[0], 0);
+release_out:
+ dn->inode_page = NULL;
+ dn->node_page = NULL;
+ return err;
+}
+
+static void truncate_node(struct dnode_of_data *dn)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
+ struct node_info ni;
+
+ get_node_info(sbi, dn->nid, &ni);
+ BUG_ON(ni.blk_addr == NULL_ADDR);
+
+ if (ni.blk_addr != NULL_ADDR)
+ invalidate_blocks(sbi, ni.blk_addr);
+
+ /* Deallocate node address */
+ dec_valid_node_count(sbi, dn->inode, 1);
+ set_node_addr(sbi, &ni, NULL_ADDR);
+
+ if (dn->nid == dn->inode->i_ino) {
+ remove_orphan_inode(sbi, dn->nid);
+ dec_valid_inode_count(sbi);
+ } else {
+ sync_inode_page(dn);
+ }
+
+ clear_node_page_dirty(dn->node_page);
+ F2FS_SET_SB_DIRT(sbi);
+
+ f2fs_put_page(dn->node_page, 1);
+ dn->node_page = NULL;
+}
+
+static int truncate_dnode(struct dnode_of_data *dn)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
+ struct page *page;
+
+ if (dn->nid == 0)
+ return 1;
+
+ /* get direct node */
+ page = get_node_page(sbi, dn->nid);
+ if (IS_ERR(page) && PTR_ERR(page) == -ENOENT)
+ return 1;
+ else if (IS_ERR(page))
+ return PTR_ERR(page);
+
+ /* Make dnode_of_data for parameter */
+ dn->node_page = page;
+ dn->ofs_in_node = 0;
+ truncate_data_blocks(dn);
+ truncate_node(dn);
+ return 1;
+}
+
+static int truncate_nodes(struct dnode_of_data *dn, unsigned int nofs,
+ int ofs, int depth)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
+ struct dnode_of_data rdn = *dn;
+ struct page *page;
+ struct f2fs_node *rn;
+ nid_t child_nid;
+ unsigned int child_nofs;
+ int freed = 0;
+ int i, ret;
+
+ if (dn->nid == 0)
+ return NIDS_PER_BLOCK + 1;
+
+ page = get_node_page(sbi, dn->nid);
+ if (IS_ERR(page))
+ return PTR_ERR(page);
+
+ rn = (struct f2fs_node *)page_address(page);
+ if (depth < 3) {
+ for (i = ofs; i < NIDS_PER_BLOCK; i++, freed++) {
+ child_nid = le32_to_cpu(rn->in.nid[i]);
+ if (child_nid == 0)
+ continue;
+ rdn.nid = child_nid;
+ ret = truncate_dnode(&rdn);
+ if (ret < 0)
+ goto out_err;
+ set_nid(page, i, 0, false);
+ }
+ } else {
+ child_nofs = nofs + ofs * (NIDS_PER_BLOCK + 1) + 1;
+ for (i = ofs; i < NIDS_PER_BLOCK; i++) {
+ child_nid = le32_to_cpu(rn->in.nid[i]);
+ if (child_nid == 0) {
+ child_nofs += NIDS_PER_BLOCK + 1;
+ continue;
+ }
+ rdn.nid = child_nid;
+ ret = truncate_nodes(&rdn, child_nofs, 0, depth - 1);
+ if (ret == (NIDS_PER_BLOCK + 1)) {
+ set_nid(page, i, 0, false);
+ child_nofs += ret;
+ } else if (ret < 0 && ret != -ENOENT) {
+ goto out_err;
+ }
+ }
+ freed = child_nofs;
+ }
+
+ if (!ofs) {
+ /* remove current indirect node */
+ dn->node_page = page;
+ truncate_node(dn);
+ freed++;
+ } else {
+ f2fs_put_page(page, 1);
+ }
+ return freed;
+
+out_err:
+ f2fs_put_page(page, 1);
+ return ret;
+}
+
+static int truncate_partial_nodes(struct dnode_of_data *dn,
+ struct f2fs_inode *ri, int *offset, int depth)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
+ struct page *pages[2];
+ nid_t nid[3];
+ nid_t child_nid;
+ int err = 0;
+ int i;
+ int idx = depth - 2;
+
+ nid[0] = le32_to_cpu(ri->i_nid[offset[0] - NODE_DIR1_BLOCK]);
+ if (!nid[0])
+ return 0;
+
+ /* get indirect nodes in the path */
+ for (i = 0; i < depth - 1; i++) {
+ /* the reference count will be increased */
+ pages[i] = get_node_page(sbi, nid[i]);
+ if (IS_ERR(pages[i])) {
+ depth = i + 1;
+ err = PTR_ERR(pages[i]);
+ goto fail;
+ }
+ nid[i + 1] = get_nid(pages[i], offset[i + 1], false);
+ }
+
+ /* free direct nodes linked to a partial indirect node */
+ for (i = offset[depth - 1]; i < NIDS_PER_BLOCK; i++) {
+ child_nid = get_nid(pages[idx], i, false);
+ if (!child_nid)
+ continue;
+ dn->nid = child_nid;
+ err = truncate_dnode(dn);
+ if (err < 0)
+ goto fail;
+ set_nid(pages[idx], i, 0, false);
+ }
+
+ if (offset[depth - 1] == 0) {
+ dn->node_page = pages[idx];
+ dn->nid = nid[idx];
+ truncate_node(dn);
+ } else {
+ f2fs_put_page(pages[idx], 1);
+ }
+ offset[idx]++;
+ offset[depth - 1] = 0;
+fail:
+ for (i = depth - 3; i >= 0; i--)
+ f2fs_put_page(pages[i], 1);
+ return err;
+}
+
+/**
+ * All the block addresses of data and nodes should be nullified.
+ */
+int truncate_inode_blocks(struct inode *inode, pgoff_t from)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ int err = 0, cont = 1;
+ int level, offset[4], noffset[4];
+ unsigned int nofs;
+ struct f2fs_node *rn;
+ struct dnode_of_data dn;
+ struct page *page;
+
+ level = get_node_path(from, offset, noffset);
+
+ page = get_node_page(sbi, inode->i_ino);
+ if (IS_ERR(page))
+ return PTR_ERR(page);
+
+ set_new_dnode(&dn, inode, page, NULL, 0);
+ unlock_page(page);
+
+ rn = page_address(page);
+ switch (level) {
+ case 0:
+ case 1:
+ nofs = noffset[1];
+ break;
+ case 2:
+ nofs = noffset[1];
+ if (!offset[level - 1])
+ goto skip_partial;
+ err = truncate_partial_nodes(&dn, &rn->i, offset, level);
+ if (err < 0 && err != -ENOENT)
+ goto fail;
+ nofs += 1 + NIDS_PER_BLOCK;
+ break;
+ case 3:
+ nofs = 5 + 2 * NIDS_PER_BLOCK;
+ if (!offset[level - 1])
+ goto skip_partial;
+ err = truncate_partial_nodes(&dn, &rn->i, offset, level);
+ if (err < 0 && err != -ENOENT)
+ goto fail;
+ break;
+ default:
+ BUG();
+ }
+
+skip_partial:
+ while (cont) {
+ dn.nid = le32_to_cpu(rn->i.i_nid[offset[0] - NODE_DIR1_BLOCK]);
+ switch (offset[0]) {
+ case NODE_DIR1_BLOCK:
+ case NODE_DIR2_BLOCK:
+ err = truncate_dnode(&dn);
+ break;
+
+ case NODE_IND1_BLOCK:
+ case NODE_IND2_BLOCK:
+ err = truncate_nodes(&dn, nofs, offset[1], 2);
+ break;
+
+ case NODE_DIND_BLOCK:
+ err = truncate_nodes(&dn, nofs, offset[1], 3);
+ cont = 0;
+ break;
+
+ default:
+ BUG();
+ }
+ if (err < 0 && err != -ENOENT)
+ goto fail;
+ if (offset[1] == 0 &&
+ rn->i.i_nid[offset[0] - NODE_DIR1_BLOCK]) {
+ lock_page(page);
+ wait_on_page_writeback(page);
+ rn->i.i_nid[offset[0] - NODE_DIR1_BLOCK] = 0;
+ set_page_dirty(page);
+ unlock_page(page);
+ }
+ offset[1] = 0;
+ offset[0]++;
+ nofs += err;
+ }
+fail:
+ f2fs_put_page(page, 0);
+ return err > 0 ? 0 : err;
+}
+
+int remove_inode_page(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct page *page;
+ nid_t ino = inode->i_ino;
+ struct dnode_of_data dn;
+
+ mutex_lock_op(sbi, NODE_TRUNC);
+ page = get_node_page(sbi, ino);
+ if (IS_ERR(page)) {
+ mutex_unlock_op(sbi, NODE_TRUNC);
+ return PTR_ERR(page);
+ }
+
+ if (F2FS_I(inode)->i_xattr_nid) {
+ nid_t nid = F2FS_I(inode)->i_xattr_nid;
+ struct page *npage = get_node_page(sbi, nid);
+
+ if (IS_ERR(npage)) {
+ mutex_unlock_op(sbi, NODE_TRUNC);
+ return PTR_ERR(npage);
+ }
+
+ F2FS_I(inode)->i_xattr_nid = 0;
+ set_new_dnode(&dn, inode, page, npage, nid);
+ dn.inode_page_locked = 1;
+ truncate_node(&dn);
+ }
+ if (inode->i_blocks == 1) {
+ /* truncate_node() internally calls f2fs_put_page() */
+ set_new_dnode(&dn, inode, page, page, ino);
+ truncate_node(&dn);
+ } else if (inode->i_blocks == 0) {
+ struct node_info ni;
+ get_node_info(sbi, inode->i_ino, &ni);
+
+ /* called after f2fs_new_inode() has failed */
+ BUG_ON(ni.blk_addr != NULL_ADDR);
+ f2fs_put_page(page, 1);
+ } else {
+ BUG();
+ }
+ mutex_unlock_op(sbi, NODE_TRUNC);
+ return 0;
+}
+
+int new_inode_page(struct inode *inode, struct dentry *dentry)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct page *page;
+ struct dnode_of_data dn;
+
+ /* allocate inode page for new inode */
+ set_new_dnode(&dn, inode, NULL, NULL, inode->i_ino);
+ mutex_lock_op(sbi, NODE_NEW);
+ page = new_node_page(&dn, 0);
+ init_dent_inode(dentry, page);
+ mutex_unlock_op(sbi, NODE_NEW);
+ if (IS_ERR(page))
+ return PTR_ERR(page);
+ f2fs_put_page(page, 1);
+ return 0;
+}
+
+struct page *new_node_page(struct dnode_of_data *dn, unsigned int ofs)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
+ struct address_space *mapping = sbi->node_inode->i_mapping;
+ struct node_info old_ni, new_ni;
+ struct page *page;
+ int err;
+
+ if (is_inode_flag_set(F2FS_I(dn->inode), FI_NO_ALLOC))
+ return ERR_PTR(-EPERM);
+
+ page = grab_cache_page(mapping, dn->nid);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ get_node_info(sbi, dn->nid, &old_ni);
+
+ SetPageUptodate(page);
+ fill_node_footer(page, dn->nid, dn->inode->i_ino, ofs, true);
+
+ /* Reinitialize old_ni with new node page */
+ BUG_ON(old_ni.blk_addr != NULL_ADDR);
+ new_ni = old_ni;
+ new_ni.ino = dn->inode->i_ino;
+
+ if (!inc_valid_node_count(sbi, dn->inode, 1)) {
+ err = -ENOSPC;
+ goto fail;
+ }
+ set_node_addr(sbi, &new_ni, NEW_ADDR);
+
+ dn->node_page = page;
+ sync_inode_page(dn);
+ set_page_dirty(page);
+ set_cold_node(dn->inode, page);
+ if (ofs == 0)
+ inc_valid_inode_count(sbi);
+
+ return page;
+
+fail:
+ f2fs_put_page(page, 1);
+ return ERR_PTR(err);
+}
+
+static int read_node_page(struct page *page, int type)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
+ struct node_info ni;
+
+ get_node_info(sbi, page->index, &ni);
+
+ if (ni.blk_addr == NULL_ADDR)
+ return -ENOENT;
+ return f2fs_readpage(sbi, page, ni.blk_addr, type);
+}
+
+/**
+ * Readahead a node page
+ */
+void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid)
+{
+ struct address_space *mapping = sbi->node_inode->i_mapping;
+ struct page *apage;
+
+ apage = find_get_page(mapping, nid);
+ if (apage && PageUptodate(apage))
+ goto release_out;
+ f2fs_put_page(apage, 0);
+
+ apage = grab_cache_page(mapping, nid);
+ if (!apage)
+ return;
+
+ if (read_node_page(apage, READA))
+ goto unlock_out;
+
+ page_cache_release(apage);
+ return;
+
+unlock_out:
+ unlock_page(apage);
+release_out:
+ page_cache_release(apage);
+}
+
+struct page *get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid)
+{
+ int err;
+ struct page *page;
+ struct address_space *mapping = sbi->node_inode->i_mapping;
+
+ page = grab_cache_page(mapping, nid);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ err = read_node_page(page, READ_SYNC);
+ if (err) {
+ f2fs_put_page(page, 1);
+ return ERR_PTR(err);
+ }
+
+ BUG_ON(nid != nid_of_node(page));
+ mark_page_accessed(page);
+ return page;
+}
+
+/**
+ * Return a locked page for the desired node page.
+ * And, readahead MAX_RA_NODE number of node pages.
+ */
+struct page *get_node_page_ra(struct page *parent, int start)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(parent->mapping->host->i_sb);
+ struct address_space *mapping = sbi->node_inode->i_mapping;
+ int i, end;
+ int err = 0;
+ nid_t nid;
+ struct page *page;
+
+ /* First, try getting the desired direct node. */
+ nid = get_nid(parent, start, false);
+ if (!nid)
+ return ERR_PTR(-ENOENT);
+
+ page = find_get_page(mapping, nid);
+ if (page && PageUptodate(page))
+ goto page_hit;
+ f2fs_put_page(page, 0);
+
+repeat:
+ page = grab_cache_page(mapping, nid);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ err = read_node_page(page, READA);
+ if (err) {
+ f2fs_put_page(page, 1);
+ return ERR_PTR(err);
+ }
+
+ /* Then, try readahead for siblings of the desired node */
+ end = start + MAX_RA_NODE;
+ end = min(end, NIDS_PER_BLOCK);
+ for (i = start + 1; i < end; i++) {
+ nid = get_nid(parent, i, false);
+ if (!nid)
+ continue;
+ ra_node_page(sbi, nid);
+ }
+
+page_hit:
+ lock_page(page);
+ if (PageError(page)) {
+ f2fs_put_page(page, 1);
+ return ERR_PTR(-EIO);
+ }
+
+ /* Has the page been truncated? */
+ if (page->mapping != mapping) {
+ f2fs_put_page(page, 1);
+ goto repeat;
+ }
+ return page;
+}
+
+void sync_inode_page(struct dnode_of_data *dn)
+{
+ if (IS_INODE(dn->node_page) || dn->inode_page == dn->node_page) {
+ update_inode(dn->inode, dn->node_page);
+ } else if (dn->inode_page) {
+ if (!dn->inode_page_locked)
+ lock_page(dn->inode_page);
+ update_inode(dn->inode, dn->inode_page);
+ if (!dn->inode_page_locked)
+ unlock_page(dn->inode_page);
+ } else {
+ f2fs_write_inode(dn->inode, NULL);
+ }
+}
+
+int sync_node_pages(struct f2fs_sb_info *sbi, nid_t ino,
+ struct writeback_control *wbc)
+{
+ struct address_space *mapping = sbi->node_inode->i_mapping;
+ pgoff_t index, end;
+ struct pagevec pvec;
+ int step = ino ? 2 : 0;
+ int nwritten = 0, wrote = 0;
+
+ pagevec_init(&pvec, 0);
+
+next_step:
+ index = 0;
+ end = LONG_MAX;
+
+ while (index <= end) {
+ int i, nr_pages;
+ nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
+ PAGECACHE_TAG_DIRTY,
+ min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
+ if (nr_pages == 0)
+ break;
+
+ for (i = 0; i < nr_pages; i++) {
+ struct page *page = pvec.pages[i];
+
+ /*
+ * flushing sequence with step:
+ * 0. indirect nodes
+ * 1. dentry dnodes
+ * 2. file dnodes
+ */
+ if (step == 0 && IS_DNODE(page))
+ continue;
+ if (step == 1 && (!IS_DNODE(page) ||
+ is_cold_node(page)))
+ continue;
+ if (step == 2 && (!IS_DNODE(page) ||
+ !is_cold_node(page)))
+ continue;
+
+ /*
+ * In fsync mode,
+ * we should not skip writing node pages.
+ */
+ if (ino && ino_of_node(page) == ino)
+ lock_page(page);
+ else if (!trylock_page(page))
+ continue;
+
+ if (unlikely(page->mapping != mapping)) {
+continue_unlock:
+ unlock_page(page);
+ continue;
+ }
+ if (ino && ino_of_node(page) != ino)
+ goto continue_unlock;
+
+ if (!PageDirty(page)) {
+ /* someone wrote it for us */
+ goto continue_unlock;
+ }
+
+ if (!clear_page_dirty_for_io(page))
+ goto continue_unlock;
+
+ /* called by fsync() */
+ if (ino && IS_DNODE(page)) {
+ int mark = !is_checkpointed_node(sbi, ino);
+ set_fsync_mark(page, 1);
+ if (IS_INODE(page))
+ set_dentry_mark(page, mark);
+ nwritten++;
+ } else {
+ set_fsync_mark(page, 0);
+ set_dentry_mark(page, 0);
+ }
+ mapping->a_ops->writepage(page, wbc);
+ wrote++;
+
+ if (--wbc->nr_to_write == 0)
+ break;
+ }
+ pagevec_release(&pvec);
+ cond_resched();
+
+ if (wbc->nr_to_write == 0) {
+ step = 2;
+ break;
+ }
+ }
+
+ if (step < 2) {
+ step++;
+ goto next_step;
+ }
+
+ if (wrote)
+ f2fs_submit_bio(sbi, NODE, wbc->sync_mode == WB_SYNC_ALL);
+
+ return nwritten;
+}
+
+static int f2fs_write_node_page(struct page *page,
+ struct writeback_control *wbc)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
+ nid_t nid;
+ unsigned int nofs;
+ block_t new_addr;
+ struct node_info ni;
+
+ if (wbc->for_reclaim) {
+ dec_page_count(sbi, F2FS_DIRTY_NODES);
+ wbc->pages_skipped++;
+ set_page_dirty(page);
+ return AOP_WRITEPAGE_ACTIVATE;
+ }
+
+ wait_on_page_writeback(page);
+
+ mutex_lock_op(sbi, NODE_WRITE);
+
+ /* get old block addr of this node page */
+ nid = nid_of_node(page);
+ nofs = ofs_of_node(page);
+ BUG_ON(page->index != nid);
+
+ get_node_info(sbi, nid, &ni);
+
+ /* This page is already truncated */
+ if (ni.blk_addr == NULL_ADDR)
+ return 0;
+
+ set_page_writeback(page);
+
+ /* insert node offset */
+ write_node_page(sbi, page, nid, ni.blk_addr, &new_addr);
+ set_node_addr(sbi, &ni, new_addr);
+ dec_page_count(sbi, F2FS_DIRTY_NODES);
+
+ mutex_unlock_op(sbi, NODE_WRITE);
+ unlock_page(page);
+ return 0;
+}
+
+static int f2fs_write_node_pages(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(mapping->host->i_sb);
+ struct block_device *bdev = sbi->sb->s_bdev;
+ long nr_to_write = wbc->nr_to_write;
+
+ if (wbc->for_kupdate)
+ return 0;
+
+ if (get_pages(sbi, F2FS_DIRTY_NODES) == 0)
+ return 0;
+
+ if (try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK)) {
+ write_checkpoint(sbi, false, false);
+ return 0;
+ }
+
+ /* if mounting is failed, skip writing node pages */
+ wbc->nr_to_write = bio_get_nr_vecs(bdev);
+ sync_node_pages(sbi, 0, wbc);
+ wbc->nr_to_write = nr_to_write -
+ (bio_get_nr_vecs(bdev) - wbc->nr_to_write);
+ return 0;
+}
+
+static int f2fs_set_node_page_dirty(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ struct f2fs_sb_info *sbi = F2FS_SB(mapping->host->i_sb);
+
+ SetPageUptodate(page);
+ if (!PageDirty(page)) {
+ __set_page_dirty_nobuffers(page);
+ inc_page_count(sbi, F2FS_DIRTY_NODES);
+ SetPagePrivate(page);
+ return 1;
+ }
+ return 0;
+}
+
+static void f2fs_invalidate_node_page(struct page *page, unsigned long offset)
+{
+ struct inode *inode = page->mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ if (PageDirty(page))
+ dec_page_count(sbi, F2FS_DIRTY_NODES);
+ ClearPagePrivate(page);
+}
+
+static int f2fs_release_node_page(struct page *page, gfp_t wait)
+{
+ ClearPagePrivate(page);
+ return 0;
+}
+
+/**
+ * Structure of the f2fs node operations
+ */
+const struct address_space_operations f2fs_node_aops = {
+ .writepage = f2fs_write_node_page,
+ .writepages = f2fs_write_node_pages,
+ .set_page_dirty = f2fs_set_node_page_dirty,
+ .invalidatepage = f2fs_invalidate_node_page,
+ .releasepage = f2fs_release_node_page,
+};
+
+static struct free_nid *__lookup_free_nid_list(nid_t n, struct list_head *head)
+{
+ struct list_head *this;
+ struct free_nid *i = NULL;
+ list_for_each(this, head) {
+ i = list_entry(this, struct free_nid, list);
+ if (i->nid == n)
+ break;
+ i = NULL;
+ }
+ return i;
+}
+
+static void __del_from_free_nid_list(struct free_nid *i)
+{
+ list_del(&i->list);
+ kmem_cache_free(free_nid_slab, i);
+}
+
+static int add_free_nid(struct f2fs_nm_info *nm_i, nid_t nid)
+{
+ struct free_nid *i;
+
+ if (nm_i->fcnt > 2 * MAX_FREE_NIDS)
+ return 0;
+retry:
+ i = kmem_cache_alloc(free_nid_slab, GFP_NOFS);
+ if (!i) {
+ cond_resched();
+ goto retry;
+ }
+ i->nid = nid;
+ i->state = NID_NEW;
+
+ spin_lock(&nm_i->free_nid_list_lock);
+ if (__lookup_free_nid_list(nid, &nm_i->free_nid_list)) {
+ spin_unlock(&nm_i->free_nid_list_lock);
+ kmem_cache_free(free_nid_slab, i);
+ return 0;
+ }
+ list_add_tail(&i->list, &nm_i->free_nid_list);
+ nm_i->fcnt++;
+ spin_unlock(&nm_i->free_nid_list_lock);
+ return 1;
+}
+
+static void remove_free_nid(struct f2fs_nm_info *nm_i, nid_t nid)
+{
+ struct free_nid *i;
+ spin_lock(&nm_i->free_nid_list_lock);
+ i = __lookup_free_nid_list(nid, &nm_i->free_nid_list);
+ if (i && i->state == NID_NEW) {
+ __del_from_free_nid_list(i);
+ nm_i->fcnt--;
+ }
+ spin_unlock(&nm_i->free_nid_list_lock);
+}
+
+static int scan_nat_page(struct f2fs_nm_info *nm_i,
+ struct page *nat_page, nid_t start_nid)
+{
+ struct f2fs_nat_block *nat_blk = page_address(nat_page);
+ block_t blk_addr;
+ int fcnt = 0;
+ int i;
+
+ /* 0 nid should not be used */
+ if (start_nid == 0)
+ ++start_nid;
+
+ i = start_nid % NAT_ENTRY_PER_BLOCK;
+
+ for (; i < NAT_ENTRY_PER_BLOCK; i++, start_nid++) {
+ blk_addr = le32_to_cpu(nat_blk->entries[i].block_addr);
+ BUG_ON(blk_addr == NEW_ADDR);
+ if (blk_addr == NULL_ADDR)
+ fcnt += add_free_nid(nm_i, start_nid);
+ }
+ return fcnt;
+}
+
+static void build_free_nids(struct f2fs_sb_info *sbi)
+{
+ struct free_nid *fnid, *next_fnid;
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
+ struct f2fs_summary_block *sum = curseg->sum_blk;
+ nid_t nid = 0;
+ bool is_cycled = false;
+ int fcnt = 0;
+ int i;
+
+ nid = nm_i->next_scan_nid;
+ nm_i->init_scan_nid = nid;
+
+ ra_nat_pages(sbi, nid);
+
+ while (1) {
+ struct page *page = get_current_nat_page(sbi, nid);
+
+ fcnt += scan_nat_page(nm_i, page, nid);
+ f2fs_put_page(page, 1);
+
+ nid += (NAT_ENTRY_PER_BLOCK - (nid % NAT_ENTRY_PER_BLOCK));
+
+ if (nid >= nm_i->max_nid) {
+ nid = 0;
+ is_cycled = true;
+ }
+ if (fcnt > MAX_FREE_NIDS)
+ break;
+ if (is_cycled && nm_i->init_scan_nid <= nid)
+ break;
+ }
+
+ nm_i->next_scan_nid = nid;
+
+ /* find free nids from current sum_pages */
+ mutex_lock(&curseg->curseg_mutex);
+ for (i = 0; i < nats_in_cursum(sum); i++) {
+ block_t addr = le32_to_cpu(nat_in_journal(sum, i).block_addr);
+ nid = le32_to_cpu(nid_in_journal(sum, i));
+ if (addr == NULL_ADDR)
+ add_free_nid(nm_i, nid);
+ else
+ remove_free_nid(nm_i, nid);
+ }
+ mutex_unlock(&curseg->curseg_mutex);
+
+ /* drop nids that the nat cache shows as already allocated */
+ list_for_each_entry_safe(fnid, next_fnid, &nm_i->free_nid_list, list) {
+ struct nat_entry *ne;
+
+ read_lock(&nm_i->nat_tree_lock);
+ ne = __lookup_nat_cache(nm_i, fnid->nid);
+ if (ne && nat_get_blkaddr(ne) != NULL_ADDR)
+ remove_free_nid(nm_i, fnid->nid);
+ read_unlock(&nm_i->nat_tree_lock);
+ }
+}
+
+/*
+ * If this function returns true, the caller obtains a new nid through the
+ * second parameter. When an inode is created, the returned nid can be used
+ * as its ino as well as its nid.
+ */
+bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct free_nid *i = NULL;
+ struct list_head *this;
+retry:
+ mutex_lock(&nm_i->build_lock);
+ if (!nm_i->fcnt) {
+ /* scan NAT in order to build free nid list */
+ build_free_nids(sbi);
+ if (!nm_i->fcnt) {
+ mutex_unlock(&nm_i->build_lock);
+ return false;
+ }
+ }
+ mutex_unlock(&nm_i->build_lock);
+
+ /*
+ * We check fcnt again since the previous check is racy: we didn't
+ * hold free_nid_list_lock, so another thread could have consumed
+ * all of the free nids.
+ */
+ spin_lock(&nm_i->free_nid_list_lock);
+ if (!nm_i->fcnt) {
+ spin_unlock(&nm_i->free_nid_list_lock);
+ goto retry;
+ }
+
+ BUG_ON(list_empty(&nm_i->free_nid_list));
+ list_for_each(this, &nm_i->free_nid_list) {
+ i = list_entry(this, struct free_nid, list);
+ if (i->state == NID_NEW)
+ break;
+ }
+
+ BUG_ON(i->state != NID_NEW);
+ *nid = i->nid;
+ i->state = NID_ALLOC;
+ nm_i->fcnt--;
+ spin_unlock(&nm_i->free_nid_list_lock);
+ return true;
+}
+
+/**
+ * alloc_nid() should be called prior to this function.
+ */
+void alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct free_nid *i;
+
+ spin_lock(&nm_i->free_nid_list_lock);
+ i = __lookup_free_nid_list(nid, &nm_i->free_nid_list);
+ if (i) {
+ BUG_ON(i->state != NID_ALLOC);
+ __del_from_free_nid_list(i);
+ }
+ spin_unlock(&nm_i->free_nid_list_lock);
+}
+
+/**
+ * alloc_nid() should be called prior to this function.
+ */
+void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid)
+{
+ alloc_nid_done(sbi, nid);
+ add_free_nid(NM_I(sbi), nid);
+}
+
+void recover_node_page(struct f2fs_sb_info *sbi, struct page *page,
+ struct f2fs_summary *sum, struct node_info *ni,
+ block_t new_blkaddr)
+{
+ rewrite_node_page(sbi, page, sum, ni->blk_addr, new_blkaddr);
+ set_node_addr(sbi, ni, new_blkaddr);
+ clear_node_page_dirty(page);
+}
+
+int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
+{
+ struct address_space *mapping = sbi->node_inode->i_mapping;
+ struct f2fs_node *rn;
+ void *src, *dst;
+ nid_t ino = ino_of_node(page);
+ struct node_info old_ni, new_ni;
+ struct page *ipage;
+
+ ipage = grab_cache_page(mapping, ino);
+ if (!ipage)
+ return -ENOMEM;
+
+ /* Should not use this inode from free nid list */
+ remove_free_nid(NM_I(sbi), ino);
+
+ get_node_info(sbi, ino, &old_ni);
+ SetPageUptodate(ipage);
+ fill_node_footer(ipage, ino, ino, 0, true);
+
+ src = kmap_atomic(page);
+ dst = kmap_atomic(ipage);
+
+ memcpy(dst, src, F2FS_INODE_SIZE);
+ rn = (struct f2fs_node *)dst;
+ rn->i.i_size = 0;
+ rn->i.i_blocks = 1;
+ rn->i.i_links = 1;
+ rn->i.i_xattr_nid = 0;
+ kunmap_atomic(dst);
+ kunmap_atomic(src);
+
+ new_ni = old_ni;
+ new_ni.ino = ino;
+
+ set_node_addr(sbi, &new_ni, NEW_ADDR);
+ inc_valid_inode_count(sbi);
+
+ f2fs_put_page(ipage, 1);
+ return 0;
+}
+
+int restore_node_summary(struct f2fs_sb_info *sbi,
+ unsigned int segno, struct f2fs_summary_block *sum)
+{
+ struct f2fs_node *rn;
+ struct f2fs_summary *sum_entry;
+ struct page *page;
+ block_t addr;
+ int i, last_offset;
+
+ /* allocate a temporary page for reading node blocks */
+ page = alloc_page(GFP_NOFS | __GFP_ZERO);
+ if (!page)
+ return -ENOMEM;
+ lock_page(page);
+
+ /* scan the node segment */
+ last_offset = sbi->blocks_per_seg;
+ addr = START_BLOCK(sbi, segno);
+ sum_entry = &sum->entries[0];
+
+ for (i = 0; i < last_offset; i++, sum_entry++) {
+ if (f2fs_readpage(sbi, page, addr, READ_SYNC))
+ goto out;
+
+ rn = (struct f2fs_node *)page_address(page);
+ sum_entry->nid = rn->footer.nid;
+ sum_entry->version = 0;
+ sum_entry->ofs_in_node = 0;
+ addr++;
+
+ /*
+ * In order to read next node page,
+ * we must clear PageUptodate flag.
+ */
+ ClearPageUptodate(page);
+ }
+out:
+ unlock_page(page);
+ __free_pages(page, 0);
+ return 0;
+}
+
+static bool flush_nats_in_journal(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
+ struct f2fs_summary_block *sum = curseg->sum_blk;
+ int i;
+
+ mutex_lock(&curseg->curseg_mutex);
+
+ if (nats_in_cursum(sum) < NAT_JOURNAL_ENTRIES) {
+ mutex_unlock(&curseg->curseg_mutex);
+ return false;
+ }
+
+ for (i = 0; i < nats_in_cursum(sum); i++) {
+ struct nat_entry *ne;
+ struct f2fs_nat_entry raw_ne;
+ nid_t nid = le32_to_cpu(nid_in_journal(sum, i));
+
+ raw_ne = nat_in_journal(sum, i);
+retry:
+ write_lock(&nm_i->nat_tree_lock);
+ ne = __lookup_nat_cache(nm_i, nid);
+ if (ne) {
+ __set_nat_cache_dirty(nm_i, ne);
+ write_unlock(&nm_i->nat_tree_lock);
+ continue;
+ }
+ ne = grab_nat_entry(nm_i, nid);
+ if (!ne) {
+ write_unlock(&nm_i->nat_tree_lock);
+ goto retry;
+ }
+ nat_set_blkaddr(ne, le32_to_cpu(raw_ne.block_addr));
+ nat_set_ino(ne, le32_to_cpu(raw_ne.ino));
+ nat_set_version(ne, raw_ne.version);
+ __set_nat_cache_dirty(nm_i, ne);
+ write_unlock(&nm_i->nat_tree_lock);
+ }
+ update_nats_in_cursum(sum, -i);
+ mutex_unlock(&curseg->curseg_mutex);
+ return true;
+}
+
+/**
+ * This function is called during the checkpointing process.
+ */
+void flush_nat_entries(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
+ struct f2fs_summary_block *sum = curseg->sum_blk;
+ struct list_head *cur, *n;
+ struct page *page = NULL;
+ struct f2fs_nat_block *nat_blk = NULL;
+ nid_t start_nid = 0, end_nid = 0;
+ bool flushed;
+
+ flushed = flush_nats_in_journal(sbi);
+
+ if (!flushed)
+ mutex_lock(&curseg->curseg_mutex);
+
+ /* 1) flush dirty nat caches */
+ list_for_each_safe(cur, n, &nm_i->dirty_nat_entries) {
+ struct nat_entry *ne;
+ nid_t nid;
+ struct f2fs_nat_entry raw_ne;
+ int offset = -1;
+ block_t old_blkaddr, new_blkaddr;
+
+ ne = list_entry(cur, struct nat_entry, list);
+ nid = nat_get_nid(ne);
+
+ if (nat_get_blkaddr(ne) == NEW_ADDR)
+ continue;
+ if (flushed)
+ goto to_nat_page;
+
+ /* if there is room for nat entries in curseg->sumpage */
+ offset = lookup_journal_in_cursum(sum, NAT_JOURNAL, nid, 1);
+ if (offset >= 0) {
+ raw_ne = nat_in_journal(sum, offset);
+ old_blkaddr = le32_to_cpu(raw_ne.block_addr);
+ goto flush_now;
+ }
+to_nat_page:
+ if (!page || (start_nid > nid || nid > end_nid)) {
+ if (page) {
+ f2fs_put_page(page, 1);
+ page = NULL;
+ }
+ start_nid = START_NID(nid);
+ end_nid = start_nid + NAT_ENTRY_PER_BLOCK - 1;
+
+ /*
+ * get the nat block marked dirty, mapped and locked,
+ * with its reference count increased
+ */
+ page = get_next_nat_page(sbi, start_nid);
+ nat_blk = page_address(page);
+ }
+
+ BUG_ON(!nat_blk);
+ raw_ne = nat_blk->entries[nid - start_nid];
+ old_blkaddr = le32_to_cpu(raw_ne.block_addr);
+flush_now:
+ new_blkaddr = nat_get_blkaddr(ne);
+
+ raw_ne.ino = cpu_to_le32(nat_get_ino(ne));
+ raw_ne.block_addr = cpu_to_le32(new_blkaddr);
+ raw_ne.version = nat_get_version(ne);
+
+ if (offset < 0) {
+ nat_blk->entries[nid - start_nid] = raw_ne;
+ } else {
+ nat_in_journal(sum, offset) = raw_ne;
+ nid_in_journal(sum, offset) = cpu_to_le32(nid);
+ }
+
+ if (nat_get_blkaddr(ne) == NULL_ADDR) {
+ write_lock(&nm_i->nat_tree_lock);
+ __del_from_nat_cache(nm_i, ne);
+ write_unlock(&nm_i->nat_tree_lock);
+
+ /* We can reuse this freed nid at this point */
+ add_free_nid(NM_I(sbi), nid);
+ } else {
+ write_lock(&nm_i->nat_tree_lock);
+ __clear_nat_cache_dirty(nm_i, ne);
+ ne->checkpointed = true;
+ write_unlock(&nm_i->nat_tree_lock);
+ }
+ }
+ if (!flushed)
+ mutex_unlock(&curseg->curseg_mutex);
+
+ /*
+ * set block offset in cur_journal_segno1/2
+ * where the last NAT update happened
+ */
+ memcpy(nm_i->nat_upd_blkoff,
+ nm_i->lst_upd_blkoff, sizeof(int) * 3);
+ f2fs_put_page(page, 1);
+
+ /* 2) shrink nat caches if necessary */
+ try_to_free_nats(sbi, nm_i->nat_cnt - NM_WOUT_THRESHOLD);
+}
+
+static int init_node_manager(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_super_block *sb_raw = F2FS_RAW_SUPER(sbi);
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ unsigned char *version_bitmap;
+ int i;
+
+ nm_i->nat_blkaddr = le32_to_cpu(sb_raw->nat_blkaddr);
+
+ /* segment_count_nat includes the pair segments, so divide by 2. */
+ nm_i->nat_segs = le32_to_cpu(sb_raw->segment_count_nat) >> 1;
+ nm_i->nat_blocks = nm_i->nat_segs <<
+ le32_to_cpu(sb_raw->log_blocks_per_seg);
+ nm_i->max_nid = NAT_ENTRY_PER_BLOCK * nm_i->nat_blocks;
+ nm_i->fcnt = 0;
+ nm_i->nat_cnt = 0;
+
+ INIT_LIST_HEAD(&nm_i->free_nid_list);
+ INIT_RADIX_TREE(&nm_i->nat_root, GFP_ATOMIC);
+ INIT_LIST_HEAD(&nm_i->nat_entries);
+ INIT_LIST_HEAD(&nm_i->dirty_nat_entries);
+
+ mutex_init(&nm_i->build_lock);
+ spin_lock_init(&nm_i->free_nid_list_lock);
+ rwlock_init(&nm_i->nat_tree_lock);
+
+ for (i = 0; i < 3; i++) {
+ nm_i->lst_upd_blkoff[i] =
+ le16_to_cpu(sbi->ckpt->nat_upd_blkoff[i]);
+ nm_i->nat_upd_blkoff[i] =
+ le16_to_cpu(sbi->ckpt->nat_upd_blkoff[i]);
+ }
+
+ nm_i->written_valid_node_count = sbi->total_valid_node_count;
+ nm_i->written_valid_inode_count = sbi->total_valid_inode_count;
+
+ nm_i->bitmap_size = __bitmap_size(sbi, NAT_BITMAP);
+ nm_i->init_scan_nid = le32_to_cpu(sbi->ckpt->next_free_nid);
+ nm_i->next_scan_nid = le32_to_cpu(sbi->ckpt->next_free_nid);
+
+ nm_i->nat_bitmap = kzalloc(nm_i->bitmap_size, GFP_KERNEL);
+ if (!nm_i->nat_bitmap)
+ return -ENOMEM;
+ version_bitmap = __bitmap_ptr(sbi, NAT_BITMAP);
+ if (!version_bitmap)
+ return -EFAULT;
+
+ /* copy version bitmap */
+ memcpy(nm_i->nat_bitmap, version_bitmap, nm_i->bitmap_size);
+ return 0;
+}
+
+int build_node_manager(struct f2fs_sb_info *sbi)
+{
+ sbi->nm_info = kzalloc(sizeof(struct f2fs_nm_info), GFP_KERNEL);
+ if (!sbi->nm_info)
+ return -ENOMEM;
+
+ if (init_node_manager(sbi))
+ return -EINVAL;
+
+ build_free_nids(sbi);
+ return 0;
+}
+
+void destroy_node_manager(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ struct free_nid *i, *next_i;
+ struct nat_entry *natvec[NATVEC_SIZE];
+ nid_t nid = 0;
+ unsigned int found;
+
+ if (!nm_i)
+ return;
+
+ /* destroy free nid list */
+ spin_lock(&nm_i->free_nid_list_lock);
+ list_for_each_entry_safe(i, next_i, &nm_i->free_nid_list, list) {
+ BUG_ON(i->state == NID_ALLOC);
+ __del_from_free_nid_list(i);
+ nm_i->fcnt--;
+ }
+ BUG_ON(nm_i->fcnt);
+ spin_unlock(&nm_i->free_nid_list_lock);
+
+ /* destroy nat cache */
+ write_lock(&nm_i->nat_tree_lock);
+ while ((found = __gang_lookup_nat_cache(nm_i,
+ nid, NATVEC_SIZE, natvec))) {
+ unsigned idx;
+ for (idx = 0; idx < found; idx++) {
+ struct nat_entry *e = natvec[idx];
+ nid = nat_get_nid(e) + 1;
+ __del_from_nat_cache(nm_i, e);
+ }
+ }
+ BUG_ON(nm_i->nat_cnt);
+ write_unlock(&nm_i->nat_tree_lock);
+
+ kfree(nm_i->nat_bitmap);
+ sbi->nm_info = NULL;
+ kfree(nm_i);
+}
+
+int create_node_manager_caches(void)
+{
+ nat_entry_slab = f2fs_kmem_cache_create("nat_entry",
+ sizeof(struct nat_entry), NULL);
+ if (!nat_entry_slab)
+ return -ENOMEM;
+
+ free_nid_slab = f2fs_kmem_cache_create("free_nid",
+ sizeof(struct free_nid), NULL);
+ if (!free_nid_slab) {
+ kmem_cache_destroy(nat_entry_slab);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+void destroy_node_manager_caches(void)
+{
+ kmem_cache_destroy(free_nid_slab);
+ kmem_cache_destroy(nat_entry_slab);
+}
--
1.7.9.5
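
The free nid handling in the patch above follows a small protocol: alloc_nid() moves an entry from NID_NEW to NID_ALLOC, and the caller then finishes with alloc_nid_done() or alloc_nid_failed(). The user-space sketch below only illustrates that state machine. It is not the kernel code: the fixed array, struct nid_pool, lookup() and the NID_UNUSED state are assumptions made for the sketch, there is no locking, and entries are picked in slot order rather than list order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 16	/* hypothetical capacity of this sketch */

/* NID_UNUSED is an assumption of the sketch; the patch frees the entry. */
enum nid_state { NID_UNUSED, NID_NEW, NID_ALLOC };

struct free_nid {
	unsigned int nid;
	enum nid_state state;
};

/* Hypothetical stand-in for the free nid list and its counter. */
struct nid_pool {
	struct free_nid ent[MAX_ENTRIES];
	int fcnt;	/* number of entries in NID_NEW state */
};

static struct free_nid *lookup(struct nid_pool *p, unsigned int nid)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++)
		if (p->ent[i].state != NID_UNUSED && p->ent[i].nid == nid)
			return &p->ent[i];
	return NULL;
}

/* Returns 1 if nid was added as NID_NEW, 0 if already tracked or full. */
static int add_free_nid(struct nid_pool *p, unsigned int nid)
{
	if (lookup(p, nid))
		return 0;
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		if (p->ent[i].state == NID_UNUSED) {
			p->ent[i].nid = nid;
			p->ent[i].state = NID_NEW;
			p->fcnt++;
			return 1;
		}
	}
	return 0;
}

/* Hand out one NID_NEW entry and mark it NID_ALLOC. */
static bool alloc_nid(struct nid_pool *p, unsigned int *nid)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		if (p->ent[i].state == NID_NEW) {
			p->ent[i].state = NID_ALLOC;
			p->fcnt--;
			*nid = p->ent[i].nid;
			return true;
		}
	}
	return false;
}

/* The nid is now in use: forget the entry. */
static void alloc_nid_done(struct nid_pool *p, unsigned int nid)
{
	struct free_nid *e = lookup(p, nid);

	if (e) {
		assert(e->state == NID_ALLOC);
		e->state = NID_UNUSED;
	}
}

/* The caller could not use the nid: make it available again. */
static void alloc_nid_failed(struct nid_pool *p, unsigned int nid)
{
	alloc_nid_done(p, nid);
	add_free_nid(p, nid);
}
```

A caller would pair each successful alloc_nid() with exactly one of the two completion calls, which is why fcnt counts only NID_NEW entries.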




---
Jaegeuk Kim
Samsung


2012-10-23 02:29:06

by Jaegeuk Kim

Subject: [PATCH 07/16 v2] f2fs: add segment operations

This adds specific functions not only to manage dirty/free segments, SIT pages,
a cache for SIT entries, and summary entries, but also to allocate free blocks
and write three types of pages: data, node, and meta.

- F2FS maintains three types of bitmaps in memory, which indicate free, prefree,
and dirty segments respectively.

- The key information of an SIT entry consists of a segment number, the number
of valid blocks in the segment, and a bitmap identifying which blocks in the
segment are valid.

- An SIT page is composed of a certain range of SIT entries, which is maintained
by the address space of meta_inode.

- To cache SIT entries, a simple array is used. The index for the array is the
segment number.

- A summary entry for data contains the parent node information. A summary entry
for node contains its node offset from the inode.

- F2FS manages information about six active logs and their summary entries in
memory. Whenever an active log switches to another segment, its summary
entries are flushed to the summary page maintained by the address space of
meta_inode.

- This patch adds a default block allocation function which supports a
heap-based allocation policy.

- This patch adds core functions to write data, node, and meta pages. Since LFS
basically produces a series of sequential writes, F2FS merges sequential bios
into a single one as much as possible to reduce the IO scheduling overhead.
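
As a rough illustration of the SIT entry cache and the prefree/dirty distinction listed above, here is a minimal user-space sketch. It is not the patch's code: the names sit_cache, update_entry and classify, the eight-segment volume, and the 512-blocks-per-segment figure are assumptions chosen for the example, and the three-way test only mirrors the branch structure of locate_dirty_segment() for a segment that has just been updated.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKS_PER_SEG	512	/* assumed: 2MB segment of 4KB blocks */
#define NR_SEGS		8	/* hypothetical tiny volume */

enum seg_state { SEG_PREFREE, SEG_DIRTY, SEG_FULL };

/* In-memory SIT entry: valid-block count plus one validity bit per block. */
struct sit_entry {
	uint16_t valid_blocks;
	uint8_t valid_map[BLOCKS_PER_SEG / 8];
};

/* The cache is a plain array; the index is the segment number. */
static struct sit_entry sit_cache[NR_SEGS];

/* Mark one block valid (del = +1) or invalid (del = -1). */
static void update_entry(unsigned int segno, unsigned int off, int del)
{
	struct sit_entry *se = &sit_cache[segno];
	uint8_t mask = (uint8_t)(1u << (off % 8));

	if (del > 0) {
		/* a block must not be validated twice */
		assert(!(se->valid_map[off / 8] & mask));
		se->valid_map[off / 8] |= mask;
	} else {
		/* only a valid block can be invalidated */
		assert(se->valid_map[off / 8] & mask);
		se->valid_map[off / 8] &= (uint8_t)~mask;
	}
	se->valid_blocks = (uint16_t)(se->valid_blocks + del);
}

/* Classify a recently updated segment by its valid-block count. */
static enum seg_state classify(unsigned int segno)
{
	uint16_t v = sit_cache[segno].valid_blocks;

	if (v == 0)
		return SEG_PREFREE;	/* reusable after the next checkpoint */
	if (v < BLOCKS_PER_SEG)
		return SEG_DIRTY;	/* partially valid, a GC or SSR candidate */
	return SEG_FULL;		/* fully valid, not tracked as dirty */
}
```

Indexing the array by segment number keeps lookup constant-time, at the cost of holding one entry per segment in memory.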

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/segment.c | 1795 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 1795 insertions(+)
create mode 100644 fs/f2fs/segment.c

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
new file mode 100644
index 0000000..57d0931
--- /dev/null
+++ b/fs/f2fs/segment.c
@@ -0,0 +1,1795 @@
+/**
+ * fs/f2fs/segment.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+
+#include "f2fs.h"
+#include "segment.h"
+#include "node.h"
+
+static int need_to_flush(struct f2fs_sb_info *sbi)
+{
+ unsigned int pages_per_sec = (1 << sbi->log_blocks_per_seg) *
+ sbi->segs_per_sec;
+ int node_secs = ((get_pages(sbi, F2FS_DIRTY_NODES) + pages_per_sec - 1)
+ >> sbi->log_blocks_per_seg) / sbi->segs_per_sec;
+ int dent_secs = ((get_pages(sbi, F2FS_DIRTY_DENTS) + pages_per_sec - 1)
+ >> sbi->log_blocks_per_seg) / sbi->segs_per_sec;
+
+ if (sbi->por_doing)
+ return 0;
+
+ if (free_sections(sbi) <= (node_secs + 2 * dent_secs +
+ reserved_sections(sbi)))
+ return 1;
+ return 0;
+}
+
+/**
+ * This function balances dirty node and dentry pages.
+ * In addition, it controls garbage collection.
+ */
+void f2fs_balance_fs(struct f2fs_sb_info *sbi)
+{
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = LONG_MAX,
+ .for_reclaim = 0,
+ };
+
+ if (sbi->por_doing)
+ return;
+
+ /*
+ * Write back dirty dentry and node pages first when they would consume
+ * the remaining free sections. After that, run GC if space is still short.
+ */
+ if (need_to_flush(sbi)) {
+ sync_dirty_dir_inodes(sbi);
+ sync_node_pages(sbi, 0, &wbc);
+ }
+
+ if (has_not_enough_free_secs(sbi)) {
+ mutex_lock(&sbi->gc_mutex);
+ f2fs_gc(sbi, 1);
+ }
+}
+
+static void __locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno,
+ enum dirty_type dirty_type)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+
+ /* need not be added */
+ if (IS_CURSEG(sbi, segno))
+ return;
+
+ if (!test_and_set_bit(segno, dirty_i->dirty_segmap[dirty_type]))
+ dirty_i->nr_dirty[dirty_type]++;
+
+ if (dirty_type == DIRTY) {
+ struct seg_entry *sentry = get_seg_entry(sbi, segno);
+ dirty_type = sentry->type;
+ if (!test_and_set_bit(segno, dirty_i->dirty_segmap[dirty_type]))
+ dirty_i->nr_dirty[dirty_type]++;
+ }
+}
+
+static void __remove_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno,
+ enum dirty_type dirty_type)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+
+ if (test_and_clear_bit(segno, dirty_i->dirty_segmap[dirty_type]))
+ dirty_i->nr_dirty[dirty_type]--;
+
+ if (dirty_type == DIRTY) {
+ struct seg_entry *sentry = get_seg_entry(sbi, segno);
+ dirty_type = sentry->type;
+ if (test_and_clear_bit(segno,
+ dirty_i->dirty_segmap[dirty_type]))
+ dirty_i->nr_dirty[dirty_type]--;
+ clear_bit(segno, dirty_i->victim_segmap[FG_GC]);
+ clear_bit(segno, dirty_i->victim_segmap[BG_GC]);
+ }
+}
+
+/**
+ * This function must not fail with errors such as -ENOMEM.
+ * Adding a dirty entry to the seglist is not a critical operation.
+ * If the given segment is a current working segment, it is not added.
+ */
+void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ unsigned short valid_blocks;
+
+ if (segno == NULL_SEGNO || IS_CURSEG(sbi, segno))
+ return;
+
+ mutex_lock(&dirty_i->seglist_lock);
+
+ valid_blocks = get_valid_blocks(sbi, segno, 0);
+
+ if (valid_blocks == 0) {
+ __locate_dirty_segment(sbi, segno, PRE);
+ __remove_dirty_segment(sbi, segno, DIRTY);
+ } else if (valid_blocks < sbi->blocks_per_seg) {
+ __locate_dirty_segment(sbi, segno, DIRTY);
+ } else {
+ /* Recovery routine with SSR needs this */
+ __remove_dirty_segment(sbi, segno, DIRTY);
+ }
+
+ mutex_unlock(&dirty_i->seglist_lock);
+ return;
+}
+
+/**
+ * Should call clear_prefree_segments after checkpoint is done.
+ */
+static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ unsigned int segno, offset = 0;
+ unsigned int total_segs = TOTAL_SEGS(sbi);
+
+ mutex_lock(&dirty_i->seglist_lock);
+ while (1) {
+ segno = find_next_bit(dirty_i->dirty_segmap[PRE], total_segs,
+ offset);
+ if (segno >= total_segs)
+ break;
+ __set_test_and_free(sbi, segno);
+ offset = segno + 1;
+ }
+ mutex_unlock(&dirty_i->seglist_lock);
+}
+
+void clear_prefree_segments(struct f2fs_sb_info *sbi)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ unsigned int segno, offset = 0;
+ unsigned int total_segs = TOTAL_SEGS(sbi);
+
+ mutex_lock(&dirty_i->seglist_lock);
+ while (1) {
+ segno = find_next_bit(dirty_i->dirty_segmap[PRE], total_segs,
+ offset);
+ if (segno >= total_segs)
+ break;
+
+ offset = segno + 1;
+ if (test_and_clear_bit(segno, dirty_i->dirty_segmap[PRE]))
+ dirty_i->nr_dirty[PRE]--;
+
+ /* Let's use trim */
+ if (test_opt(sbi, DISCARD))
+ blkdev_issue_discard(sbi->sb->s_bdev,
+ START_BLOCK(sbi, segno) <<
+ sbi->log_sectors_per_block,
+ 1 << (sbi->log_sectors_per_block +
+ sbi->log_blocks_per_seg),
+ GFP_NOFS, 0);
+ }
+ mutex_unlock(&dirty_i->seglist_lock);
+}
+
+static void __mark_sit_entry_dirty(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ if (!__test_and_set_bit(segno, sit_i->dirty_sentries_bitmap))
+ sit_i->dirty_sentries++;
+}
+
+static void __set_sit_entry_type(struct f2fs_sb_info *sbi, int type,
+ unsigned int segno, int modified)
+{
+ struct seg_entry *se = get_seg_entry(sbi, segno);
+ se->type = type;
+ if (modified)
+ __mark_sit_entry_dirty(sbi, segno);
+}
+
+static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
+{
+ struct seg_entry *se;
+ unsigned int segno, offset;
+ long int new_vblocks;
+
+ segno = GET_SEGNO(sbi, blkaddr);
+
+ se = get_seg_entry(sbi, segno);
+ new_vblocks = se->valid_blocks + del;
+ offset = GET_SEGOFF_FROM_SEG0(sbi, blkaddr) & (sbi->blocks_per_seg - 1);
+
+ BUG_ON((new_vblocks >> (sizeof(unsigned short) << 3) ||
+ (new_vblocks > sbi->blocks_per_seg)));
+
+ se->valid_blocks = new_vblocks;
+ se->mtime = get_mtime(sbi);
+ SIT_I(sbi)->max_mtime = se->mtime;
+
+ /* Update valid block bitmap */
+ if (del > 0) {
+ if (f2fs_set_bit(offset, se->cur_valid_map))
+ BUG();
+ } else {
+ if (!f2fs_clear_bit(offset, se->cur_valid_map))
+ BUG();
+ }
+ if (!f2fs_test_bit(offset, se->ckpt_valid_map))
+ se->ckpt_valid_blocks += del;
+
+ __mark_sit_entry_dirty(sbi, segno);
+
+ /* update total number of valid blocks to be written in ckpt area */
+ SIT_I(sbi)->written_valid_blocks += del;
+
+ if (sbi->segs_per_sec > 1)
+ get_sec_entry(sbi, segno)->valid_blocks += del;
+}
+
+static void refresh_sit_entry(struct f2fs_sb_info *sbi,
+ block_t old_blkaddr, block_t new_blkaddr)
+{
+ update_sit_entry(sbi, new_blkaddr, 1);
+ if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO)
+ update_sit_entry(sbi, old_blkaddr, -1);
+}
+
+void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr)
+{
+ unsigned int segno = GET_SEGNO(sbi, addr);
+ struct sit_info *sit_i = SIT_I(sbi);
+
+ BUG_ON(addr == NULL_ADDR);
+ if (addr == NEW_ADDR)
+ return;
+
+ /* add it into sit main buffer */
+ mutex_lock(&sit_i->sentry_lock);
+
+ update_sit_entry(sbi, addr, -1);
+
+ /* add it into dirty seglist */
+ locate_dirty_segment(sbi, segno);
+
+ mutex_unlock(&sit_i->sentry_lock);
+}
+
+/**
+ * This function must be called with curseg_mutex held.
+ */
+static void __add_sum_entry(struct f2fs_sb_info *sbi, int type,
+ struct f2fs_summary *sum, unsigned short offset)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+ void *addr = curseg->sum_blk;
+ addr += offset * sizeof(struct f2fs_summary);
+ memcpy(addr, sum, sizeof(struct f2fs_summary));
+ return;
+}
+
+/**
+ * Calculate the number of current summary pages for writing
+ */
+int npages_for_summary_flush(struct f2fs_sb_info *sbi)
+{
+ int total_size_bytes = 0;
+ int valid_sum_count = 0;
+ int i, sum_space;
+
+ for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
+ if (sbi->ckpt->alloc_type[i] == SSR)
+ valid_sum_count += sbi->blocks_per_seg;
+ else
+ valid_sum_count += curseg_blkoff(sbi, i);
+ }
+
+ total_size_bytes = valid_sum_count * (SUMMARY_SIZE + 1)
+ + sizeof(struct nat_journal) + 2
+ + sizeof(struct sit_journal) + 2;
+ sum_space = PAGE_CACHE_SIZE - SUM_FOOTER_SIZE;
+ if (total_size_bytes < sum_space)
+ return 1;
+ else if (total_size_bytes < 2 * sum_space)
+ return 2;
+ return 3;
+}
+
+/**
+ * Caller should put this summary page
+ */
+struct page *get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+ return get_meta_page(sbi, GET_SUM_BLOCK(sbi, segno));
+}
+
+static void write_sum_page(struct f2fs_sb_info *sbi,
+ struct f2fs_summary_block *sum_blk, block_t blk_addr)
+{
+ struct page *page = grab_meta_page(sbi, blk_addr);
+ void *kaddr = page_address(page);
+ memcpy(kaddr, sum_blk, PAGE_CACHE_SIZE);
+ set_page_dirty(page);
+ f2fs_put_page(page, 1);
+}
+
+static unsigned int check_prefree_segments(struct f2fs_sb_info *sbi,
+ int ofs_unit, int type)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ unsigned long *prefree_segmap = dirty_i->dirty_segmap[PRE];
+ unsigned int segno, next_segno, i;
+ int ofs = 0;
+
+ /*
+ * If there are not enough reserved sections,
+ * we should not reuse prefree segments.
+ */
+ if (has_not_enough_free_secs(sbi))
+ return NULL_SEGNO;
+
+ /*
+ * Node pages should not reuse prefree segments,
+ * since that information is used for SPOR.
+ */
+ if (IS_NODESEG(type))
+ return NULL_SEGNO;
+next:
+ segno = find_next_bit(prefree_segmap, TOTAL_SEGS(sbi), ofs++);
+ ofs = ((segno / ofs_unit) * ofs_unit) + ofs_unit;
+ if (segno < TOTAL_SEGS(sbi)) {
+ /* skip intermediate segments in a section */
+ if (segno % ofs_unit)
+ goto next;
+
+ /* skip if whole section is not prefree */
+ next_segno = find_next_zero_bit(prefree_segmap,
+ TOTAL_SEGS(sbi), segno + 1);
+ if (next_segno - segno < ofs_unit)
+ goto next;
+
+ /* skip if whole section was not free at the last checkpoint */
+ for (i = 0; i < ofs_unit; i++)
+ if (get_seg_entry(sbi, segno + i)->ckpt_valid_blocks)
+ goto next;
+ return segno;
+ }
+ return NULL_SEGNO;
+}
+
+/**
+ * Find a new segment from the free segments bitmap in the right order.
+ * This function must succeed; otherwise it triggers BUG.
+ */
+static void get_new_segment(struct f2fs_sb_info *sbi,
+ unsigned int *newseg, bool new_sec, int dir)
+{
+ struct free_segmap_info *free_i = FREE_I(sbi);
+ unsigned int total_secs = sbi->total_sections;
+ unsigned int segno, secno, zoneno;
+ unsigned int total_zones = sbi->total_sections / sbi->secs_per_zone;
+ unsigned int hint = *newseg / sbi->segs_per_sec;
+ unsigned int old_zoneno = GET_ZONENO_FROM_SEGNO(sbi, *newseg);
+ unsigned int left_start = hint;
+ bool init = true;
+ int go_left = 0;
+ int i;
+
+ write_lock(&free_i->segmap_lock);
+
+ if (!new_sec && ((*newseg + 1) % sbi->segs_per_sec)) {
+ segno = find_next_zero_bit(free_i->free_segmap,
+ TOTAL_SEGS(sbi), *newseg + 1);
+ if (segno < TOTAL_SEGS(sbi))
+ goto got_it;
+ }
+find_other_zone:
+ secno = find_next_zero_bit(free_i->free_secmap, total_secs, hint);
+ if (secno >= total_secs) {
+ if (dir == ALLOC_RIGHT) {
+ secno = find_next_zero_bit(free_i->free_secmap,
+ total_secs, 0);
+ BUG_ON(secno >= total_secs);
+ } else {
+ go_left = 1;
+ left_start = hint - 1;
+ }
+ }
+ if (go_left == 0)
+ goto skip_left;
+
+ while (test_bit(left_start, free_i->free_secmap)) {
+ if (left_start > 0) {
+ left_start--;
+ continue;
+ }
+ left_start = find_next_zero_bit(free_i->free_secmap,
+ total_secs, 0);
+ BUG_ON(left_start >= total_secs);
+ break;
+ }
+ secno = left_start;
+skip_left:
+ hint = secno;
+ segno = secno * sbi->segs_per_sec;
+ zoneno = secno / sbi->secs_per_zone;
+
+ /* give up on finding another zone */
+ if (!init)
+ goto got_it;
+ if (sbi->secs_per_zone == 1)
+ goto got_it;
+ if (zoneno == old_zoneno)
+ goto got_it;
+ if (dir == ALLOC_LEFT) {
+ if (!go_left && zoneno + 1 >= total_zones)
+ goto got_it;
+ if (go_left && zoneno == 0)
+ goto got_it;
+ }
+ for (i = 0; i < NR_CURSEG_TYPE; i++)
+ if (CURSEG_I(sbi, i)->zone == zoneno)
+ break;
+
+ if (i < NR_CURSEG_TYPE) {
+ /* zone is in use, try another */
+ if (go_left)
+ hint = zoneno * sbi->secs_per_zone - 1;
+ else if (zoneno + 1 >= total_zones)
+ hint = 0;
+ else
+ hint = (zoneno + 1) * sbi->secs_per_zone;
+ init = false;
+ goto find_other_zone;
+ }
+got_it:
+ /* set it as dirty segment in free segmap */
+ BUG_ON(test_bit(segno, free_i->free_segmap));
+ __set_inuse(sbi, segno);
+ *newseg = segno;
+ write_unlock(&free_i->segmap_lock);
+}
+
+static void reset_curseg(struct f2fs_sb_info *sbi, int type, int modified)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+ struct summary_footer *sum_footer;
+
+ curseg->segno = curseg->next_segno;
+ curseg->zone = GET_ZONENO_FROM_SEGNO(sbi, curseg->segno);
+ curseg->next_blkoff = 0;
+ curseg->next_segno = NULL_SEGNO;
+
+ sum_footer = &(curseg->sum_blk->footer);
+ memset(sum_footer, 0, sizeof(struct summary_footer));
+ if (IS_DATASEG(type))
+ SET_SUM_TYPE(sum_footer, SUM_TYPE_DATA);
+ if (IS_NODESEG(type))
+ SET_SUM_TYPE(sum_footer, SUM_TYPE_NODE);
+ __set_sit_entry_type(sbi, type, curseg->segno, modified);
+}
+
+/**
+ * Allocate a current working segment.
+ * This function always allocates a free segment in LFS manner.
+ */
+static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+ unsigned int segno = curseg->segno;
+ int dir = ALLOC_LEFT;
+
+ write_sum_page(sbi, curseg->sum_blk,
+ GET_SUM_BLOCK(sbi, curseg->segno));
+ if (type == CURSEG_WARM_DATA || type == CURSEG_COLD_DATA)
+ dir = ALLOC_RIGHT;
+
+ if (test_opt(sbi, NOHEAP))
+ dir = ALLOC_RIGHT;
+
+ get_new_segment(sbi, &segno, new_sec, dir);
+ curseg->next_segno = segno;
+ reset_curseg(sbi, type, 1);
+ curseg->alloc_type = LFS;
+}
+
+static void __next_free_blkoff(struct f2fs_sb_info *sbi,
+ struct curseg_info *seg, block_t start)
+{
+ struct seg_entry *se = get_seg_entry(sbi, seg->segno);
+ block_t ofs;
+ for (ofs = start; ofs < sbi->blocks_per_seg; ofs++) {
+ if (!f2fs_test_bit(ofs, se->ckpt_valid_map)
+ && !f2fs_test_bit(ofs, se->cur_valid_map))
+ break;
+ }
+ seg->next_blkoff = ofs;
+}
+
+/**
+ * If a segment is written in LFS manner, the next block offset is obtained
+ * simply by incrementing the current one. If it is written in SSR manner,
+ * the next block offset is obtained by calling __next_free_blkoff.
+ */
+static void __refresh_next_blkoff(struct f2fs_sb_info *sbi,
+ struct curseg_info *seg)
+{
+ if (seg->alloc_type == SSR)
+ __next_free_blkoff(sbi, seg, seg->next_blkoff + 1);
+ else
+ seg->next_blkoff++;
+}
+
+/**
+ * This function always allocates a used segment (from the dirty seglist) in
+ * SSR manner, so it must recover the existing valid-block information.
+ */
+static void change_curseg(struct f2fs_sb_info *sbi, int type, bool reuse)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+ unsigned int new_segno = curseg->next_segno;
+ struct f2fs_summary_block *sum_node;
+ struct page *sum_page;
+
+ write_sum_page(sbi, curseg->sum_blk,
+ GET_SUM_BLOCK(sbi, curseg->segno));
+ __set_test_and_inuse(sbi, new_segno);
+
+ mutex_lock(&dirty_i->seglist_lock);
+ __remove_dirty_segment(sbi, new_segno, PRE);
+ __remove_dirty_segment(sbi, new_segno, DIRTY);
+ mutex_unlock(&dirty_i->seglist_lock);
+
+ reset_curseg(sbi, type, 1);
+ curseg->alloc_type = SSR;
+ __next_free_blkoff(sbi, curseg, 0);
+
+ if (reuse) {
+ sum_page = get_sum_page(sbi, new_segno);
+ sum_node = (struct f2fs_summary_block *)page_address(sum_page);
+ memcpy(curseg->sum_blk, sum_node, SUM_ENTRY_SIZE);
+ f2fs_put_page(sum_page, 1);
+ }
+}
+
+/*
+ * Flush out the current segment and replace it with a new segment.
+ * This function must succeed; otherwise it triggers BUG.
+ */
+static void allocate_segment_by_default(struct f2fs_sb_info *sbi,
+ int type, bool force)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+ unsigned int ofs_unit;
+
+ if (force) {
+ new_curseg(sbi, type, true);
+ goto out;
+ }
+
+ ofs_unit = need_SSR(sbi) ? 1 : sbi->segs_per_sec;
+ curseg->next_segno = check_prefree_segments(sbi, ofs_unit, type);
+
+ if (curseg->next_segno != NULL_SEGNO)
+ change_curseg(sbi, type, false);
+ else if (type == CURSEG_WARM_NODE)
+ new_curseg(sbi, type, false);
+ else if (need_SSR(sbi) && IS_NEXT_SEG(sbi, curseg, type))
+ change_curseg(sbi, type, true);
+ else
+ new_curseg(sbi, type, false);
+out:
+ sbi->segment_count[curseg->alloc_type]++;
+}
+
+void allocate_new_segments(struct f2fs_sb_info *sbi)
+{
+ struct curseg_info *curseg;
+ unsigned int old_curseg;
+ int i;
+
+ for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
+ curseg = CURSEG_I(sbi, i);
+ old_curseg = curseg->segno;
+ SIT_I(sbi)->s_ops->allocate_segment(sbi, i, true);
+ locate_dirty_segment(sbi, old_curseg);
+ }
+}
+
+static const struct segment_allocation default_salloc_ops = {
+ .allocate_segment = allocate_segment_by_default,
+};
+
+static void f2fs_end_io_write(struct bio *bio, int err)
+{
+ const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+ struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+ struct bio_private *p = bio->bi_private;
+
+ do {
+ struct page *page = bvec->bv_page;
+
+ if (--bvec >= bio->bi_io_vec)
+ prefetchw(&bvec->bv_page->flags);
+ if (!uptodate) {
+ SetPageError(page);
+ if (page->mapping)
+ set_bit(AS_EIO, &page->mapping->flags);
+ p->sbi->ckpt->ckpt_flags |= CP_ERROR_FLAG;
+ set_page_dirty(page);
+ }
+ end_page_writeback(page);
+ dec_page_count(p->sbi, F2FS_WRITEBACK);
+ } while (bvec >= bio->bi_io_vec);
+
+ if (p->is_sync)
+ complete(p->wait);
+ kfree(p);
+ bio_put(bio);
+}
+
+struct bio *f2fs_bio_alloc(struct block_device *bdev, sector_t first_sector,
+ int nr_vecs, gfp_t gfp_flags)
+{
+ struct bio *bio;
+repeat:
+ /* allocate new bio */
+ bio = bio_alloc(gfp_flags, nr_vecs);
+
+ if (bio == NULL && (current->flags & PF_MEMALLOC)) {
+ while (!bio && (nr_vecs /= 2))
+ bio = bio_alloc(gfp_flags, nr_vecs);
+ }
+ if (bio) {
+ bio->bi_bdev = bdev;
+ bio->bi_sector = first_sector;
+retry:
+ bio->bi_private = kmalloc(sizeof(struct bio_private),
+ GFP_NOFS | __GFP_HIGH);
+ if (!bio->bi_private) {
+ cond_resched();
+ goto retry;
+ }
+ }
+ if (bio == NULL) {
+ cond_resched();
+ goto repeat;
+ }
+ return bio;
+}
+
+static void do_submit_bio(struct f2fs_sb_info *sbi,
+ enum page_type type, bool sync)
+{
+ int rw = sync ? WRITE_SYNC : WRITE;
+ enum page_type btype = type > META ? META : type;
+
+ if (type >= META_FLUSH)
+ rw = WRITE_FLUSH_FUA;
+
+ if (sbi->bio[btype]) {
+ struct bio_private *p = sbi->bio[btype]->bi_private;
+ p->sbi = sbi;
+ sbi->bio[btype]->bi_end_io = f2fs_end_io_write;
+ if (type == META_FLUSH) {
+ DECLARE_COMPLETION_ONSTACK(wait);
+ p->is_sync = true;
+ p->wait = &wait;
+ submit_bio(rw, sbi->bio[btype]);
+ wait_for_completion(&wait);
+ } else {
+ p->is_sync = false;
+ submit_bio(rw, sbi->bio[btype]);
+ }
+ sbi->bio[btype] = NULL;
+ }
+}
+
+void f2fs_submit_bio(struct f2fs_sb_info *sbi, enum page_type type, bool sync)
+{
+ down_write(&sbi->bio_sem);
+ do_submit_bio(sbi, type, sync);
+ up_write(&sbi->bio_sem);
+}
+
+static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
+ block_t blk_addr, enum page_type type)
+{
+ struct block_device *bdev = sbi->sb->s_bdev;
+
+ verify_block_addr(sbi, blk_addr);
+
+ down_write(&sbi->bio_sem);
+
+ inc_page_count(sbi, F2FS_WRITEBACK);
+
+ if (sbi->bio[type] && sbi->last_block_in_bio[type] != blk_addr - 1)
+ do_submit_bio(sbi, type, false);
+alloc_new:
+ if (sbi->bio[type] == NULL)
+ sbi->bio[type] = f2fs_bio_alloc(bdev,
+ blk_addr << (sbi->log_blocksize - 9),
+ bio_get_nr_vecs(bdev), GFP_NOFS | __GFP_HIGH);
+
+ if (bio_add_page(sbi->bio[type], page, PAGE_CACHE_SIZE, 0) <
+ PAGE_CACHE_SIZE) {
+ do_submit_bio(sbi, type, false);
+ goto alloc_new;
+ }
+
+ sbi->last_block_in_bio[type] = blk_addr;
+
+ up_write(&sbi->bio_sem);
+}
+
+static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+ if (curseg->next_blkoff < sbi->blocks_per_seg)
+ return true;
+ return false;
+}
+
+static int __get_segment_type_2(struct page *page, enum page_type p_type)
+{
+ if (p_type == DATA)
+ return CURSEG_HOT_DATA;
+ else
+ return CURSEG_HOT_NODE;
+}
+
+static int __get_segment_type_4(struct page *page, enum page_type p_type)
+{
+ if (p_type == DATA) {
+ struct inode *inode = page->mapping->host;
+
+ if (S_ISDIR(inode->i_mode))
+ return CURSEG_HOT_DATA;
+ else
+ return CURSEG_COLD_DATA;
+ } else {
+ if (IS_DNODE(page) && !is_cold_node(page))
+ return CURSEG_HOT_NODE;
+ else
+ return CURSEG_COLD_NODE;
+ }
+}
+
+static int __get_segment_type_6(struct page *page, enum page_type p_type)
+{
+ if (p_type == DATA) {
+ struct inode *inode = page->mapping->host;
+
+ if (S_ISDIR(inode->i_mode))
+ return CURSEG_HOT_DATA;
+ else if (is_cold_data(page) || is_cold_file(inode))
+ return CURSEG_COLD_DATA;
+ else
+ return CURSEG_WARM_DATA;
+ } else {
+ if (IS_DNODE(page))
+ return is_cold_node(page) ? CURSEG_WARM_NODE :
+ CURSEG_HOT_NODE;
+ else
+ return CURSEG_COLD_NODE;
+ }
+}
+
+static int __get_segment_type(struct page *page, enum page_type p_type)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
+ switch (sbi->active_logs) {
+ case 2:
+ return __get_segment_type_2(page, p_type);
+ case 4:
+ return __get_segment_type_4(page, p_type);
+ case 6:
+ return __get_segment_type_6(page, p_type);
+ default:
+ BUG();
+ }
+}
+
+static void do_write_page(struct f2fs_sb_info *sbi, struct page *page,
+ block_t old_blkaddr, block_t *new_blkaddr,
+ struct f2fs_summary *sum, enum page_type p_type)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ struct curseg_info *curseg;
+ unsigned int old_cursegno;
+ int type;
+
+ type = __get_segment_type(page, p_type);
+ curseg = CURSEG_I(sbi, type);
+
+ mutex_lock(&curseg->curseg_mutex);
+
+ *new_blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
+ old_cursegno = curseg->segno;
+
+ /*
+ * __add_sum_entry must be called under the curseg_mutex
+ * because this function updates a summary entry in the
+ * current summary block.
+ */
+ __add_sum_entry(sbi, type, sum, curseg->next_blkoff);
+
+ mutex_lock(&sit_i->sentry_lock);
+ __refresh_next_blkoff(sbi, curseg);
+ sbi->block_count[curseg->alloc_type]++;
+
+ /*
+ * SIT information should be updated before segment allocation,
+ * since SSR needs latest valid block information.
+ */
+ refresh_sit_entry(sbi, old_blkaddr, *new_blkaddr);
+
+ if (!__has_curseg_space(sbi, type))
+ sit_i->s_ops->allocate_segment(sbi, type, false);
+
+ locate_dirty_segment(sbi, old_cursegno);
+ locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
+ mutex_unlock(&sit_i->sentry_lock);
+
+ if (p_type == NODE)
+ fill_node_footer_blkaddr(page, NEXT_FREE_BLKADDR(sbi, curseg));
+
+ /* writeout dirty page into bdev */
+ submit_write_page(sbi, page, *new_blkaddr, p_type);
+
+ mutex_unlock(&curseg->curseg_mutex);
+}
+
+int write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
+ struct writeback_control *wbc)
+{
+ if (wbc->for_reclaim)
+ return AOP_WRITEPAGE_ACTIVATE;
+
+ set_page_writeback(page);
+ submit_write_page(sbi, page, page->index, META);
+ return 0;
+}
+
+void write_node_page(struct f2fs_sb_info *sbi, struct page *page,
+ unsigned int nid, block_t old_blkaddr, block_t *new_blkaddr)
+{
+ struct f2fs_summary sum;
+ set_summary(&sum, nid, 0, 0);
+ do_write_page(sbi, page, old_blkaddr, new_blkaddr, &sum, NODE);
+}
+
+void write_data_page(struct inode *inode, struct page *page,
+ struct dnode_of_data *dn, block_t old_blkaddr,
+ block_t *new_blkaddr)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct f2fs_summary sum;
+ struct node_info ni;
+
+ BUG_ON(old_blkaddr == NULL_ADDR);
+ get_node_info(sbi, dn->nid, &ni);
+ set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version);
+
+ do_write_page(sbi, page, old_blkaddr,
+ new_blkaddr, &sum, DATA);
+}
+
+void rewrite_data_page(struct f2fs_sb_info *sbi, struct page *page,
+ block_t old_blk_addr)
+{
+ submit_write_page(sbi, page, old_blk_addr, DATA);
+}
+
+void recover_data_page(struct f2fs_sb_info *sbi,
+ struct page *page, struct f2fs_summary *sum,
+ block_t old_blkaddr, block_t new_blkaddr)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ struct curseg_info *curseg;
+ unsigned int segno, old_cursegno;
+ struct seg_entry *se;
+ int type;
+
+ segno = GET_SEGNO(sbi, new_blkaddr);
+ se = get_seg_entry(sbi, segno);
+ type = se->type;
+
+ if (se->valid_blocks == 0 && !IS_CURSEG(sbi, segno)) {
+ if (old_blkaddr == NULL_ADDR)
+ type = CURSEG_COLD_DATA;
+ else
+ type = CURSEG_WARM_DATA;
+ }
+ curseg = CURSEG_I(sbi, type);
+
+ mutex_lock(&curseg->curseg_mutex);
+ mutex_lock(&sit_i->sentry_lock);
+
+ old_cursegno = curseg->segno;
+
+ /* change the current segment */
+ if (segno != curseg->segno) {
+ curseg->next_segno = segno;
+ change_curseg(sbi, type, true);
+ }
+
+ curseg->next_blkoff = GET_SEGOFF_FROM_SEG0(sbi, new_blkaddr) &
+ (sbi->blocks_per_seg - 1);
+ __add_sum_entry(sbi, type, sum, curseg->next_blkoff);
+
+ refresh_sit_entry(sbi, old_blkaddr, new_blkaddr);
+
+ locate_dirty_segment(sbi, old_cursegno);
+ locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
+
+ mutex_unlock(&sit_i->sentry_lock);
+ mutex_unlock(&curseg->curseg_mutex);
+}
+
+void rewrite_node_page(struct f2fs_sb_info *sbi,
+ struct page *page, struct f2fs_summary *sum,
+ block_t old_blkaddr, block_t new_blkaddr)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ int type = CURSEG_WARM_NODE;
+ struct curseg_info *curseg;
+ unsigned int segno, old_cursegno;
+ block_t next_blkaddr = next_blkaddr_of_node(page);
+ unsigned int next_segno = GET_SEGNO(sbi, next_blkaddr);
+
+ curseg = CURSEG_I(sbi, type);
+
+ mutex_lock(&curseg->curseg_mutex);
+ mutex_lock(&sit_i->sentry_lock);
+
+ segno = GET_SEGNO(sbi, new_blkaddr);
+ old_cursegno = curseg->segno;
+
+ /* change the current segment */
+ if (segno != curseg->segno) {
+ curseg->next_segno = segno;
+ change_curseg(sbi, type, true);
+ }
+ curseg->next_blkoff = GET_SEGOFF_FROM_SEG0(sbi, new_blkaddr) &
+ (sbi->blocks_per_seg - 1);
+ __add_sum_entry(sbi, type, sum, curseg->next_blkoff);
+
+ /* change the current log to the next block addr in advance */
+ if (next_segno != segno) {
+ curseg->next_segno = next_segno;
+ change_curseg(sbi, type, true);
+ }
+ curseg->next_blkoff = GET_SEGOFF_FROM_SEG0(sbi, next_blkaddr) &
+ (sbi->blocks_per_seg - 1);
+
+ /* rewrite node page */
+ set_page_writeback(page);
+ submit_write_page(sbi, page, new_blkaddr, NODE);
+ f2fs_submit_bio(sbi, NODE, true);
+ refresh_sit_entry(sbi, old_blkaddr, new_blkaddr);
+
+ locate_dirty_segment(sbi, old_cursegno);
+ locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
+
+ mutex_unlock(&sit_i->sentry_lock);
+ mutex_unlock(&curseg->curseg_mutex);
+}
+
+static int read_compacted_summaries(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ struct curseg_info *seg_i;
+ unsigned char *kaddr;
+ struct page *page;
+ block_t start;
+ int i, j, offset;
+
+ start = start_sum_block(sbi);
+
+ page = get_meta_page(sbi, start++);
+ kaddr = (unsigned char *)page_address(page);
+
+ /* Step 1: restore nat cache */
+ seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
+ memcpy(&seg_i->sum_blk->n_nats, kaddr, SUM_JOURNAL_SIZE);
+
+ /* Step 2: restore sit cache */
+ seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA);
+ memcpy(&seg_i->sum_blk->n_sits, kaddr + SUM_JOURNAL_SIZE,
+ SUM_JOURNAL_SIZE);
+ offset = 2 * SUM_JOURNAL_SIZE;
+
+ /* Step 3: restore summary entries */
+ for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
+ unsigned short blk_off;
+ unsigned int segno;
+
+ seg_i = CURSEG_I(sbi, i);
+ segno = le32_to_cpu(ckpt->cur_data_segno[i]);
+ blk_off = le16_to_cpu(ckpt->cur_data_blkoff[i]);
+ seg_i->next_segno = segno;
+ reset_curseg(sbi, i, 0);
+ seg_i->alloc_type = ckpt->alloc_type[i];
+ seg_i->next_blkoff = blk_off;
+
+ if (seg_i->alloc_type == SSR)
+ blk_off = sbi->blocks_per_seg;
+
+ for (j = 0; j < blk_off; j++) {
+ struct f2fs_summary *s;
+ s = (struct f2fs_summary *)(kaddr + offset);
+ seg_i->sum_blk->entries[j] = *s;
+ offset += SUMMARY_SIZE;
+ if (offset + SUMMARY_SIZE <= PAGE_CACHE_SIZE -
+ SUM_FOOTER_SIZE)
+ continue;
+
+ f2fs_put_page(page, 1);
+ page = NULL;
+
+ page = get_meta_page(sbi, start++);
+ kaddr = (unsigned char *)page_address(page);
+ offset = 0;
+ }
+ }
+ f2fs_put_page(page, 1);
+ return 0;
+}
+
+static int read_normal_summaries(struct f2fs_sb_info *sbi, int type)
+{
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ struct f2fs_summary_block *sum;
+ struct curseg_info *curseg;
+ struct page *new;
+ unsigned short blk_off;
+ unsigned int segno = 0;
+ block_t blk_addr = 0;
+
+ /* get segment number and block addr */
+ if (IS_DATASEG(type)) {
+ segno = le32_to_cpu(ckpt->cur_data_segno[type]);
+ blk_off = le16_to_cpu(ckpt->cur_data_blkoff[type -
+ CURSEG_HOT_DATA]);
+ if (ckpt->ckpt_flags & CP_UMOUNT_FLAG)
+ blk_addr = sum_blk_addr(sbi, NR_CURSEG_TYPE, type);
+ else
+ blk_addr = sum_blk_addr(sbi, NR_CURSEG_DATA_TYPE, type);
+ } else {
+ segno = le32_to_cpu(ckpt->cur_node_segno[type -
+ CURSEG_HOT_NODE]);
+ blk_off = le16_to_cpu(ckpt->cur_node_blkoff[type -
+ CURSEG_HOT_NODE]);
+ if (ckpt->ckpt_flags & CP_UMOUNT_FLAG)
+ blk_addr = sum_blk_addr(sbi, NR_CURSEG_NODE_TYPE,
+ type - CURSEG_HOT_NODE);
+ else
+ blk_addr = GET_SUM_BLOCK(sbi, segno);
+ }
+
+ new = get_meta_page(sbi, blk_addr);
+ sum = (struct f2fs_summary_block *)page_address(new);
+
+ if (IS_NODESEG(type)) {
+ if (ckpt->ckpt_flags & CP_UMOUNT_FLAG) {
+ struct f2fs_summary *ns = &sum->entries[0];
+ int i;
+ for (i = 0; i < sbi->blocks_per_seg; i++, ns++) {
+ ns->version = 0;
+ ns->ofs_in_node = 0;
+ }
+ } else {
+ if (restore_node_summary(sbi, segno, sum)) {
+ f2fs_put_page(new, 1);
+ return -EINVAL;
+ }
+ }
+ }
+
+ /* set uncompleted segment to curseg */
+ curseg = CURSEG_I(sbi, type);
+ mutex_lock(&curseg->curseg_mutex);
+ memcpy(curseg->sum_blk, sum, PAGE_CACHE_SIZE);
+ curseg->next_segno = segno;
+ reset_curseg(sbi, type, 0);
+ curseg->alloc_type = ckpt->alloc_type[type];
+ curseg->next_blkoff = blk_off;
+ mutex_unlock(&curseg->curseg_mutex);
+ f2fs_put_page(new, 1);
+ return 0;
+}
+
+static int restore_curseg_summaries(struct f2fs_sb_info *sbi)
+{
+ int type = CURSEG_HOT_DATA;
+
+ if (sbi->ckpt->ckpt_flags & CP_COMPACT_SUM_FLAG) {
+ /* restore for compacted data summary */
+ if (read_compacted_summaries(sbi))
+ return -EINVAL;
+ type = CURSEG_HOT_NODE;
+ }
+
+ for (; type <= CURSEG_COLD_NODE; type++)
+ if (read_normal_summaries(sbi, type))
+ return -EINVAL;
+ return 0;
+}
+
+static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr)
+{
+ struct page *page;
+ unsigned char *kaddr;
+ struct f2fs_summary *summary;
+ struct curseg_info *seg_i;
+ int written_size = 0;
+ int i, j;
+
+ page = grab_meta_page(sbi, blkaddr++);
+ kaddr = (unsigned char *)page_address(page);
+
+ /* Step 1: write nat cache */
+ seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
+ memcpy(kaddr, &seg_i->sum_blk->n_nats, SUM_JOURNAL_SIZE);
+ written_size += SUM_JOURNAL_SIZE;
+
+ /* Step 2: write sit cache */
+ seg_i = CURSEG_I(sbi, CURSEG_COLD_DATA);
+ memcpy(kaddr + written_size, &seg_i->sum_blk->n_sits,
+ SUM_JOURNAL_SIZE);
+ written_size += SUM_JOURNAL_SIZE;
+
+ set_page_dirty(page);
+
+ /* Step 3: write summary entries */
+ for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
+ unsigned short blkoff;
+ seg_i = CURSEG_I(sbi, i);
+ if (sbi->ckpt->alloc_type[i] == SSR)
+ blkoff = sbi->blocks_per_seg;
+ else
+ blkoff = curseg_blkoff(sbi, i);
+
+ for (j = 0; j < blkoff; j++) {
+ if (!page) {
+ page = grab_meta_page(sbi, blkaddr++);
+ kaddr = (unsigned char *)page_address(page);
+ written_size = 0;
+ }
+ summary = (struct f2fs_summary *)(kaddr + written_size);
+ *summary = seg_i->sum_blk->entries[j];
+ written_size += SUMMARY_SIZE;
+ set_page_dirty(page);
+
+ if (written_size + SUMMARY_SIZE <= PAGE_CACHE_SIZE -
+ SUM_FOOTER_SIZE)
+ continue;
+
+ f2fs_put_page(page, 1);
+ page = NULL;
+ }
+ }
+ if (page)
+ f2fs_put_page(page, 1);
+}
+
+static void write_normal_summaries(struct f2fs_sb_info *sbi,
+ block_t blkaddr, int type)
+{
+ int i, end;
+ if (IS_DATASEG(type))
+ end = type + NR_CURSEG_DATA_TYPE;
+ else
+ end = type + NR_CURSEG_NODE_TYPE;
+
+ for (i = type; i < end; i++) {
+ struct curseg_info *sum = CURSEG_I(sbi, i);
+ mutex_lock(&sum->curseg_mutex);
+ write_sum_page(sbi, sum->sum_blk, blkaddr + (i - type));
+ mutex_unlock(&sum->curseg_mutex);
+ }
+}
+
+void write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
+{
+ if (sbi->ckpt->ckpt_flags & CP_COMPACT_SUM_FLAG)
+ write_compacted_summaries(sbi, start_blk);
+ else
+ write_normal_summaries(sbi, start_blk, CURSEG_HOT_DATA);
+}
+
+void write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
+{
+ if (sbi->ckpt->ckpt_flags & CP_UMOUNT_FLAG)
+ write_normal_summaries(sbi, start_blk, CURSEG_HOT_NODE);
+ return;
+}
+
+int lookup_journal_in_cursum(struct f2fs_summary_block *sum, int type,
+ unsigned int val, int alloc)
+{
+ int i;
+
+ if (type == NAT_JOURNAL) {
+ for (i = 0; i < nats_in_cursum(sum); i++) {
+ if (le32_to_cpu(nid_in_journal(sum, i)) == val)
+ return i;
+ }
+ if (alloc && nats_in_cursum(sum) < NAT_JOURNAL_ENTRIES)
+ return update_nats_in_cursum(sum, 1);
+ } else if (type == SIT_JOURNAL) {
+ for (i = 0; i < sits_in_cursum(sum); i++)
+ if (le32_to_cpu(segno_in_journal(sum, i)) == val)
+ return i;
+ if (alloc && sits_in_cursum(sum) < SIT_JOURNAL_ENTRIES)
+ return update_sits_in_cursum(sum, 1);
+ }
+ return -1;
+}
+
+static struct page *get_current_sit_page(struct f2fs_sb_info *sbi,
+ unsigned int segno)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ unsigned int offset = SIT_BLOCK_OFFSET(sit_i, segno);
+ block_t blk_addr = sit_i->sit_base_addr + offset;
+
+ check_seg_range(sbi, segno);
+
+ /* calculate sit block address */
+ if (f2fs_test_bit(offset, sit_i->sit_bitmap))
+ blk_addr += sit_i->sit_blocks;
+
+ return get_meta_page(sbi, blk_addr);
+}
+
+static struct page *get_next_sit_page(struct f2fs_sb_info *sbi,
+ unsigned int start)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ struct page *src_page, *dst_page;
+ pgoff_t src_off, dst_off;
+ void *src_addr, *dst_addr;
+
+ src_off = current_sit_addr(sbi, start);
+ dst_off = next_sit_addr(sbi, src_off);
+
+ /* get current sit block page without lock */
+ src_page = get_meta_page(sbi, src_off);
+ dst_page = grab_meta_page(sbi, dst_off);
+ BUG_ON(PageDirty(src_page));
+
+ src_addr = page_address(src_page);
+ dst_addr = page_address(dst_page);
+ memcpy(dst_addr, src_addr, PAGE_CACHE_SIZE);
+
+ set_page_dirty(dst_page);
+ f2fs_put_page(src_page, 1);
+
+ set_to_next_sit(sit_i, start);
+
+ return dst_page;
+}
+
+static bool flush_sits_in_journal(struct f2fs_sb_info *sbi)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
+ struct f2fs_summary_block *sum = curseg->sum_blk;
+ int i;
+
+ /*
+ * If the journal area in the current summary is full of sit entries,
+ * all the sit entries will be flushed. Otherwise the existing entries
+ * could not be replaced by newly hot sit entries.
+ */
+ if (sits_in_cursum(sum) >= SIT_JOURNAL_ENTRIES) {
+ for (i = sits_in_cursum(sum) - 1; i >= 0; i--) {
+ unsigned int segno;
+ segno = le32_to_cpu(segno_in_journal(sum, i));
+ __mark_sit_entry_dirty(sbi, segno);
+ }
+ update_sits_in_cursum(sum, -sits_in_cursum(sum));
+ return true;
+ }
+ return false;
+}
+
+/**
+ * CP calls this function, which flushes SIT entries including sit_journal,
+ * and moves prefree segs to free segs.
+ */
+void flush_sit_entries(struct f2fs_sb_info *sbi)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ unsigned long *bitmap = sit_i->dirty_sentries_bitmap;
+ struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
+ struct f2fs_summary_block *sum = curseg->sum_blk;
+ unsigned long nsegs = TOTAL_SEGS(sbi);
+ struct page *page = NULL;
+ struct f2fs_sit_block *raw_sit = NULL;
+ unsigned int start = 0, end = 0;
+ unsigned int segno = -1;
+ bool flushed;
+
+ mutex_lock(&curseg->curseg_mutex);
+ mutex_lock(&sit_i->sentry_lock);
+
+ /*
+ * "flushed" indicates whether sit entries in journal are flushed
+ * to the SIT area or not.
+ */
+ flushed = flush_sits_in_journal(sbi);
+
+ while ((segno = find_next_bit(bitmap, nsegs, segno + 1)) < nsegs) {
+ struct seg_entry *se = get_seg_entry(sbi, segno);
+ int sit_offset, offset;
+
+ sit_offset = SIT_ENTRY_OFFSET(sit_i, segno);
+
+ if (flushed)
+ goto to_sit_page;
+
+ offset = lookup_journal_in_cursum(sum, SIT_JOURNAL, segno, 1);
+ if (offset >= 0) {
+ segno_in_journal(sum, offset) = cpu_to_le32(segno);
+ seg_info_to_raw_sit(se, &sit_in_journal(sum, offset));
+ goto flush_done;
+ }
+to_sit_page:
+ if (!page || (start > segno) || (segno > end)) {
+ if (page) {
+ f2fs_put_page(page, 1);
+ page = NULL;
+ }
+
+ start = START_SEGNO(sit_i, segno);
+ end = start + SIT_ENTRY_PER_BLOCK - 1;
+
+ /* read sit block that will be updated */
+ page = get_next_sit_page(sbi, start);
+ raw_sit = page_address(page);
+ }
+
+ /* update entry in SIT block */
+ seg_info_to_raw_sit(se, &raw_sit->entries[sit_offset]);
+flush_done:
+ __clear_bit(segno, bitmap);
+ sit_i->dirty_sentries--;
+ }
+ mutex_unlock(&sit_i->sentry_lock);
+ mutex_unlock(&curseg->curseg_mutex);
+
+ /* writeout last modified SIT block */
+ f2fs_put_page(page, 1);
+
+ set_prefree_as_free_segments(sbi);
+}
+
+static int build_sit_info(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ struct sit_info *sit_i;
+ unsigned int sit_segs, start;
+ char *src_bitmap, *dst_bitmap;
+ unsigned int bitmap_size;
+
+ /* allocate memory for SIT information */
+ sit_i = kzalloc(sizeof(struct sit_info), GFP_KERNEL);
+ if (!sit_i)
+ return -ENOMEM;
+
+ SM_I(sbi)->sit_info = sit_i;
+
+ sit_i->sentries = vzalloc(TOTAL_SEGS(sbi) * sizeof(struct seg_entry));
+ if (!sit_i->sentries)
+ return -ENOMEM;
+
+ bitmap_size = f2fs_bitmap_size(TOTAL_SEGS(sbi));
+ sit_i->dirty_sentries_bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+ if (!sit_i->dirty_sentries_bitmap)
+ return -ENOMEM;
+
+ for (start = 0; start < TOTAL_SEGS(sbi); start++) {
+ sit_i->sentries[start].cur_valid_map
+ = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
+ sit_i->sentries[start].ckpt_valid_map
+ = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
+ if (!sit_i->sentries[start].cur_valid_map
+ || !sit_i->sentries[start].ckpt_valid_map)
+ return -ENOMEM;
+ }
+
+ if (sbi->segs_per_sec > 1) {
+ sit_i->sec_entries = vzalloc(sbi->total_sections *
+ sizeof(struct sec_entry));
+ if (!sit_i->sec_entries)
+ return -ENOMEM;
+ }
+
+ /* get information related to SIT */
+ sit_segs = le32_to_cpu(raw_super->segment_count_sit) >> 1;
+
+ /* set up SIT bitmap from checkpoint pack */
+ bitmap_size = __bitmap_size(sbi, SIT_BITMAP);
+ src_bitmap = __bitmap_ptr(sbi, SIT_BITMAP);
+
+ dst_bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+ if (!dst_bitmap)
+ return -ENOMEM;
+ memcpy(dst_bitmap, src_bitmap, bitmap_size);
+
+ /* init SIT information */
+ sit_i->s_ops = &default_salloc_ops;
+
+ sit_i->sit_base_addr = le32_to_cpu(raw_super->sit_blkaddr);
+ sit_i->sit_blocks = sit_segs << sbi->log_blocks_per_seg;
+ sit_i->written_valid_blocks = le64_to_cpu(ckpt->valid_block_count);
+ sit_i->sit_bitmap = dst_bitmap;
+ sit_i->bitmap_size = bitmap_size;
+ sit_i->dirty_sentries = 0;
+ sit_i->sents_per_block = SIT_ENTRY_PER_BLOCK;
+ sit_i->elapsed_time = le64_to_cpu(sbi->ckpt->elapsed_time);
+ sit_i->mounted_time = CURRENT_TIME_SEC.tv_sec;
+ mutex_init(&sit_i->sentry_lock);
+ return 0;
+}
+
+static int build_free_segmap(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_sm_info *sm_info = SM_I(sbi);
+ struct free_segmap_info *free_i;
+ unsigned int bitmap_size, sec_bitmap_size;
+
+ /* allocate memory for free segmap information */
+ free_i = kzalloc(sizeof(struct free_segmap_info), GFP_KERNEL);
+ if (!free_i)
+ return -ENOMEM;
+
+ SM_I(sbi)->free_info = free_i;
+
+ bitmap_size = f2fs_bitmap_size(TOTAL_SEGS(sbi));
+ free_i->free_segmap = kmalloc(bitmap_size, GFP_KERNEL);
+ if (!free_i->free_segmap)
+ return -ENOMEM;
+
+ sec_bitmap_size = f2fs_bitmap_size(sbi->total_sections);
+ free_i->free_secmap = kmalloc(sec_bitmap_size, GFP_KERNEL);
+ if (!free_i->free_secmap)
+ return -ENOMEM;
+
+ /* mark all segments as in use temporarily */
+ memset(free_i->free_segmap, 0xff, bitmap_size);
+ memset(free_i->free_secmap, 0xff, sec_bitmap_size);
+
+ /* init free segmap information */
+ free_i->start_segno =
+ (unsigned int) GET_SEGNO_FROM_SEG0(sbi, sm_info->main_blkaddr);
+ free_i->free_segments = 0;
+ free_i->free_sections = 0;
+ rwlock_init(&free_i->segmap_lock);
+ return 0;
+}
+
+static int build_curseg(struct f2fs_sb_info *sbi)
+{
+ struct curseg_info *array = NULL;
+ int i;
+
+ array = kzalloc(sizeof(*array) * NR_CURSEG_TYPE, GFP_KERNEL);
+ if (!array)
+ return -ENOMEM;
+
+ SM_I(sbi)->curseg_array = array;
+
+ for (i = 0; i < NR_CURSEG_TYPE; i++) {
+ mutex_init(&array[i].curseg_mutex);
+ array[i].sum_blk = kzalloc(PAGE_CACHE_SIZE, GFP_KERNEL);
+ if (!array[i].sum_blk)
+ return -ENOMEM;
+ array[i].segno = NULL_SEGNO;
+ array[i].next_blkoff = 0;
+ }
+ return restore_curseg_summaries(sbi);
+}
+
+static void build_sit_entries(struct f2fs_sb_info *sbi)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
+ struct f2fs_summary_block *sum = curseg->sum_blk;
+ unsigned int start;
+
+ for (start = 0; start < TOTAL_SEGS(sbi); start++) {
+ struct seg_entry *se = &sit_i->sentries[start];
+ struct f2fs_sit_block *sit_blk;
+ struct f2fs_sit_entry sit;
+ struct page *page;
+ int i;
+
+ mutex_lock(&curseg->curseg_mutex);
+ for (i = 0; i < sits_in_cursum(sum); i++) {
+ if (le32_to_cpu(segno_in_journal(sum, i)) == start) {
+ sit = sit_in_journal(sum, i);
+ mutex_unlock(&curseg->curseg_mutex);
+ goto got_it;
+ }
+ }
+ mutex_unlock(&curseg->curseg_mutex);
+ page = get_current_sit_page(sbi, start);
+ sit_blk = (struct f2fs_sit_block *)page_address(page);
+ sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)];
+ f2fs_put_page(page, 1);
+got_it:
+ check_block_count(sbi, start, &sit);
+ seg_info_from_raw_sit(se, &sit);
+ if (sbi->segs_per_sec > 1) {
+ struct sec_entry *e = get_sec_entry(sbi, start);
+ e->valid_blocks += se->valid_blocks;
+ }
+ }
+}
+
+static void init_free_segmap(struct f2fs_sb_info *sbi)
+{
+ unsigned int start;
+ int type;
+
+ for (start = 0; start < TOTAL_SEGS(sbi); start++) {
+ struct seg_entry *sentry = get_seg_entry(sbi, start);
+ if (!sentry->valid_blocks)
+ __set_free(sbi, start);
+ }
+
+ /* mark the current segments as in use */
+ for (type = CURSEG_HOT_DATA; type <= CURSEG_COLD_NODE; type++) {
+ struct curseg_info *curseg_t = CURSEG_I(sbi, type);
+ __set_test_and_inuse(sbi, curseg_t->segno);
+ }
+}
+
+static void init_dirty_segmap(struct f2fs_sb_info *sbi)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ struct free_segmap_info *free_i = FREE_I(sbi);
+ unsigned int segno = 0, offset = 0;
+ unsigned short valid_blocks;
+
+ while (segno < TOTAL_SEGS(sbi)) {
+ /* find dirty segment based on free segmap */
+ segno = find_next_inuse(free_i, TOTAL_SEGS(sbi), offset);
+ if (segno >= TOTAL_SEGS(sbi))
+ break;
+ offset = segno + 1;
+ valid_blocks = get_valid_blocks(sbi, segno, 0);
+ if (valid_blocks >= sbi->blocks_per_seg || !valid_blocks)
+ continue;
+ mutex_lock(&dirty_i->seglist_lock);
+ __locate_dirty_segment(sbi, segno, DIRTY);
+ mutex_unlock(&dirty_i->seglist_lock);
+ }
+}
+
+static int init_victim_segmap(struct f2fs_sb_info *sbi)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ unsigned int bitmap_size = f2fs_bitmap_size(TOTAL_SEGS(sbi));
+
+ dirty_i->victim_segmap[FG_GC] = kzalloc(bitmap_size, GFP_KERNEL);
+ dirty_i->victim_segmap[BG_GC] = kzalloc(bitmap_size, GFP_KERNEL);
+ if (!dirty_i->victim_segmap[FG_GC] || !dirty_i->victim_segmap[BG_GC])
+ return -ENOMEM;
+ return 0;
+}
+
+static int build_dirty_segmap(struct f2fs_sb_info *sbi)
+{
+ struct dirty_seglist_info *dirty_i;
+ unsigned int bitmap_size, i;
+
+ /* allocate memory for dirty segments list information */
+ dirty_i = kzalloc(sizeof(struct dirty_seglist_info), GFP_KERNEL);
+ if (!dirty_i)
+ return -ENOMEM;
+
+ SM_I(sbi)->dirty_info = dirty_i;
+ mutex_init(&dirty_i->seglist_lock);
+
+ bitmap_size = f2fs_bitmap_size(TOTAL_SEGS(sbi));
+
+ for (i = 0; i < NR_DIRTY_TYPE; i++) {
+ dirty_i->dirty_segmap[i] = kzalloc(bitmap_size, GFP_KERNEL);
+ dirty_i->nr_dirty[i] = 0;
+ if (!dirty_i->dirty_segmap[i])
+ return -ENOMEM;
+ }
+
+ init_dirty_segmap(sbi);
+ return init_victim_segmap(sbi);
+}
+
+/**
+ * Update min, max modified time for cost-benefit GC algorithm
+ */
+static void init_min_max_mtime(struct f2fs_sb_info *sbi)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ unsigned int segno;
+
+ mutex_lock(&sit_i->sentry_lock);
+
+ sit_i->min_mtime = LLONG_MAX;
+
+ for (segno = 0; segno < TOTAL_SEGS(sbi); segno += sbi->segs_per_sec) {
+ unsigned int i;
+ unsigned long long mtime = 0;
+
+ for (i = 0; i < sbi->segs_per_sec; i++)
+ mtime += get_seg_entry(sbi, segno + i)->mtime;
+
+ mtime = div_u64(mtime, sbi->segs_per_sec);
+
+ if (sit_i->min_mtime > mtime)
+ sit_i->min_mtime = mtime;
+ }
+ sit_i->max_mtime = get_mtime(sbi);
+ mutex_unlock(&sit_i->sentry_lock);
+}
+
+int build_segment_manager(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
+ struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+ struct f2fs_sm_info *sm_info = NULL;
+
+ sm_info = kzalloc(sizeof(struct f2fs_sm_info), GFP_KERNEL);
+ if (!sm_info)
+ return -ENOMEM;
+
+ /* init sm info */
+ sbi->sm_info = sm_info;
+ INIT_LIST_HEAD(&sm_info->wblist_head);
+ spin_lock_init(&sm_info->wblist_lock);
+ sm_info->seg0_blkaddr = le32_to_cpu(raw_super->segment0_blkaddr);
+ sm_info->main_blkaddr = le32_to_cpu(raw_super->main_blkaddr);
+ sm_info->segment_count = le32_to_cpu(raw_super->segment_count);
+ sm_info->rsvd_segment_count =
+ le32_to_cpu(ckpt->rsvd_segment_count);
+ sm_info->main_segment_count =
+ le32_to_cpu(raw_super->segment_count_main);
+ sm_info->ssa_blkaddr = le32_to_cpu(raw_super->ssa_blkaddr);
+ sm_info->segment_count_ssa =
+ le32_to_cpu(raw_super->segment_count_ssa);
+
+ if (build_sit_info(sbi))
+ return -EINVAL;
+ if (build_free_segmap(sbi))
+ return -EINVAL;
+ if (build_curseg(sbi))
+ return -EINVAL;
+
+ /* reinit free segmap based on SIT */
+ build_sit_entries(sbi);
+
+ init_free_segmap(sbi);
+ if (build_dirty_segmap(sbi))
+ return -EINVAL;
+
+ init_min_max_mtime(sbi);
+ return 0;
+}
+
+static void discard_dirty_segmap(struct f2fs_sb_info *sbi,
+ enum dirty_type dirty_type)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+
+ mutex_lock(&dirty_i->seglist_lock);
+ kfree(dirty_i->dirty_segmap[dirty_type]);
+ dirty_i->nr_dirty[dirty_type] = 0;
+ mutex_unlock(&dirty_i->seglist_lock);
+}
+
+void reset_victim_segmap(struct f2fs_sb_info *sbi)
+{
+ unsigned int bitmap_size = f2fs_bitmap_size(TOTAL_SEGS(sbi));
+ memset(DIRTY_I(sbi)->victim_segmap[FG_GC], 0, bitmap_size);
+}
+
+static void destroy_victim_segmap(struct f2fs_sb_info *sbi)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+
+ kfree(dirty_i->victim_segmap[FG_GC]);
+ kfree(dirty_i->victim_segmap[BG_GC]);
+}
+
+static void destroy_dirty_segmap(struct f2fs_sb_info *sbi)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ int i;
+
+ if (!dirty_i)
+ return;
+
+ /* discard pre-free/dirty segments list */
+ for (i = 0; i < NR_DIRTY_TYPE; i++)
+ discard_dirty_segmap(sbi, i);
+
+ destroy_victim_segmap(sbi);
+ SM_I(sbi)->dirty_info = NULL;
+ kfree(dirty_i);
+}
+
+static void destroy_curseg(struct f2fs_sb_info *sbi)
+{
+ struct curseg_info *array = SM_I(sbi)->curseg_array;
+ int i;
+
+ if (!array)
+ return;
+ SM_I(sbi)->curseg_array = NULL;
+ for (i = 0; i < NR_CURSEG_TYPE; i++)
+ kfree(array[i].sum_blk);
+ kfree(array);
+}
+
+static void destroy_free_segmap(struct f2fs_sb_info *sbi)
+{
+ struct free_segmap_info *free_i = SM_I(sbi)->free_info;
+ if (!free_i)
+ return;
+ SM_I(sbi)->free_info = NULL;
+ kfree(free_i->free_segmap);
+ kfree(free_i->free_secmap);
+ kfree(free_i);
+}
+
+static void destroy_sit_info(struct f2fs_sb_info *sbi)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ unsigned int start;
+
+ if (!sit_i)
+ return;
+
+ if (sit_i->sentries) {
+ for (start = 0; start < TOTAL_SEGS(sbi); start++) {
+ kfree(sit_i->sentries[start].cur_valid_map);
+ kfree(sit_i->sentries[start].ckpt_valid_map);
+ }
+ }
+ vfree(sit_i->sentries);
+ vfree(sit_i->sec_entries);
+ kfree(sit_i->dirty_sentries_bitmap);
+
+ SM_I(sbi)->sit_info = NULL;
+ kfree(sit_i->sit_bitmap);
+ kfree(sit_i);
+}
+
+void destroy_segment_manager(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_sm_info *sm_info = SM_I(sbi);
+ destroy_dirty_segmap(sbi);
+ destroy_curseg(sbi);
+ destroy_free_segmap(sbi);
+ destroy_sit_info(sbi);
+ sbi->sm_info = NULL;
+ kfree(sm_info);
+}
--
1.7.9.5




---
Jaegeuk Kim
Samsung
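
As a reading aid for the log selection in the patch above (__get_segment_type_2/4/6), the following is a minimal userspace sketch of the same decision table. It is not the kernel code: the names pick_log, page_kind, log_type and page_hints are hypothetical, and the boolean hints stand in for the S_ISDIR(), is_cold_data()/is_cold_file(), IS_DNODE() and is_cold_node() checks, which are assumed to have been evaluated by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's page_type and CURSEG_* values. */
enum page_kind { KIND_DATA, KIND_NODE };
enum log_type { HOT_DATA, WARM_DATA, COLD_DATA, HOT_NODE, WARM_NODE, COLD_NODE };

/* Pre-evaluated hints; in the patch these come from the inode and the page. */
struct page_hints {
	bool is_dir;     /* data page belongs to a directory inode */
	bool cold_data;  /* page or file is flagged cold */
	bool is_dnode;   /* node page is a direct node */
	bool cold_node;  /* node page is flagged cold */
};

/* Pick the active log for a page, given 2, 4 or 6 active logs. */
static enum log_type pick_log(int active_logs, enum page_kind kind,
			      const struct page_hints *h)
{
	/* The patch calls BUG() for any other value. */
	assert(active_logs == 2 || active_logs == 4 || active_logs == 6);

	/* Two logs: only data and node are separated. */
	if (active_logs == 2)
		return kind == KIND_DATA ? HOT_DATA : HOT_NODE;

	if (kind == KIND_DATA) {
		if (h->is_dir)
			return HOT_DATA;
		if (active_logs == 4)
			return COLD_DATA;
		return h->cold_data ? COLD_DATA : WARM_DATA;
	}

	/* Indirect node pages always go to the cold node log. */
	if (!h->is_dnode)
		return COLD_NODE;
	if (active_logs == 4)
		return h->cold_node ? COLD_NODE : HOT_NODE;
	return h->cold_node ? WARM_NODE : HOT_NODE;
}
```

With six logs, a regular-file data page that is not flagged cold lands in WARM_DATA, while the same page with four logs falls back to COLD_DATA; directory data is HOT_DATA in both cases.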


2012-10-23 02:29:27

by Jaegeuk Kim

Subject: [PATCH 08/16 v2] f2fs: add file operations

This adds memory operations and file/file_inode operations.

- F2FS supports fallocate(), mmap(), fsync(), and basic ioctl().
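
For illustration only, the userspace sketch below exercises three of the calls listed above on a single file: preallocation, a shared writable mapping, and fsync(). It is an assumption that, on an f2fs mount, these would reach the handlers this patch adds (for example, the first store through the mapping would go through f2fs_vm_page_mkwrite()); the program itself is filesystem-agnostic. The function name exercise_file_ops and the path passed to it are hypothetical, and posix_fallocate() is used here instead of the Linux-specific fallocate().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if every step succeeds and the data reads back, -1 otherwise. */
static int exercise_file_ops(const char *path)
{
	static const char msg[] = "hello";
	const off_t len = 4096;
	char back[sizeof(msg)] = { 0 };
	char *map;
	int fd, sync_err, unmap_err;
	int ret = -1;

	assert(path != NULL);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Preallocate one 4 KiB block so the mapping below has backing space. */
	if (posix_fallocate(fd, 0, len) != 0)
		goto out;

	map = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* The first store to the shared mapping makes the page writable. */
	memcpy(map, msg, sizeof(msg));
	sync_err = msync(map, (size_t)len, MS_SYNC);
	unmap_err = munmap(map, (size_t)len);
	if (sync_err != 0 || unmap_err != 0)
		goto out;

	/* Ask the filesystem to make data and metadata durable. */
	if (fsync(fd) != 0)
		goto out;

	/* Read back through the regular read path to confirm the contents. */
	if (pread(fd, back, sizeof(back), 0) != (ssize_t)sizeof(back))
		goto out;
	ret = memcmp(back, msg, sizeof(msg)) == 0 ? 0 : -1;
out:
	close(fd);
	unlink(path);
	return ret;
}
```

Calling exercise_file_ops() with a path on the filesystem under test returns 0 when all steps succeed; the file is removed again before returning.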

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/file.c | 640 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 640 insertions(+)
create mode 100644 fs/f2fs/file.c

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
new file mode 100644
index 0000000..81b1fd0
--- /dev/null
+++ b/fs/f2fs/file.c
@@ -0,0 +1,640 @@
+/**
+ * fs/f2fs/file.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+#include <linux/stat.h>
+#include <linux/buffer_head.h>
+#include <linux/writeback.h>
+#include <linux/falloc.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/mount.h>
+
+#include "f2fs.h"
+#include "node.h"
+#include "segment.h"
+#include "xattr.h"
+#include "acl.h"
+
+static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma,
+ struct vm_fault *vmf)
+{
+ struct page *page = vmf->page;
+ struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct page *node_page;
+ block_t old_blk_addr;
+ struct dnode_of_data dn;
+ int err;
+
+ f2fs_balance_fs(sbi);
+
+ sb_start_pagefault(inode->i_sb);
+
+ mutex_lock_op(sbi, DATA_NEW);
+
+ /* block allocation */
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = get_dnode_of_data(&dn, page->index, 0);
+ if (err) {
+ mutex_unlock_op(sbi, DATA_NEW);
+ goto out;
+ }
+
+ old_blk_addr = dn.data_blkaddr;
+ node_page = dn.node_page;
+
+ if (old_blk_addr == NULL_ADDR) {
+ err = reserve_new_block(&dn);
+ if (err) {
+ f2fs_put_dnode(&dn);
+ mutex_unlock_op(sbi, DATA_NEW);
+ goto out;
+ }
+ }
+ f2fs_put_dnode(&dn);
+
+ mutex_unlock_op(sbi, DATA_NEW);
+
+ lock_page(page);
+ if (page->mapping != inode->i_mapping ||
+ page_offset(page) >= i_size_read(inode) ||
+ !PageUptodate(page)) {
+ unlock_page(page);
+ err = -EFAULT;
+ goto out;
+ }
+
+ /*
+ * check to see if the page is mapped already (no holes)
+ */
+ if (PageMappedToDisk(page))
+ goto out;
+
+ /* fill the page */
+ wait_on_page_writeback(page);
+
+ /* page is wholly or partially inside EOF */
+ if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode)) {
+ unsigned offset;
+ offset = i_size_read(inode) & ~PAGE_CACHE_MASK;
+ zero_user_segment(page, offset, PAGE_CACHE_SIZE);
+ }
+ set_page_dirty(page);
+ SetPageUptodate(page);
+
+ file_update_time(vma->vm_file);
+out:
+ sb_end_pagefault(inode->i_sb);
+ return block_page_mkwrite_return(err);
+}
+
+static const struct vm_operations_struct f2fs_file_vm_ops = {
+ .fault = filemap_fault,
+ .page_mkwrite = f2fs_vm_page_mkwrite,
+};
+
+static int need_to_sync_dir(struct f2fs_sb_info *sbi, struct inode *inode)
+{
+ struct dentry *dentry;
+ nid_t pino;
+
+ inode = igrab(inode);
+ dentry = d_find_any_alias(inode);
+ if (!dentry) {
+ iput(inode);
+ return 0;
+ }
+ pino = dentry->d_parent->d_inode->i_ino;
+ dput(dentry);
+ iput(inode);
+ return !is_checkpointed_node(sbi, pino);
+}
+
+int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
+{
+ struct inode *inode = file->f_mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ unsigned long long cur_version;
+ int ret = 0;
+ bool need_cp = false;
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = LONG_MAX,
+ .for_reclaim = 0,
+ };
+
+ ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+ if (ret)
+ return ret;
+
+ mutex_lock(&inode->i_mutex);
+
+ if (inode->i_sb->s_flags & MS_RDONLY)
+ goto out;
+ if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
+ goto out;
+
+ mutex_lock(&sbi->cp_mutex);
+ cur_version = le64_to_cpu(F2FS_CKPT(sbi)->checkpoint_ver);
+ mutex_unlock(&sbi->cp_mutex);
+
+ if (F2FS_I(inode)->data_version != cur_version &&
+ !(inode->i_state & I_DIRTY))
+ goto out;
+ F2FS_I(inode)->data_version--;
+
+ if (!S_ISREG(inode->i_mode) || inode->i_nlink != 1)
+ need_cp = true;
+ if (is_inode_flag_set(F2FS_I(inode), FI_NEED_CP))
+ need_cp = true;
+ if (!space_for_roll_forward(sbi))
+ need_cp = true;
+ if (need_to_sync_dir(sbi, inode))
+ need_cp = true;
+
+ f2fs_write_inode(inode, NULL);
+
+ if (need_cp) {
+ /* all the dirty node pages should be flushed for POR */
+ ret = f2fs_sync_fs(inode->i_sb, 1);
+ clear_inode_flag(F2FS_I(inode), FI_NEED_CP);
+ } else {
+ while (sync_node_pages(sbi, inode->i_ino, &wbc) == 0)
+ f2fs_write_inode(inode, NULL);
+ filemap_fdatawait_range(sbi->node_inode->i_mapping,
+ 0, LONG_MAX);
+ }
+out:
+ mutex_unlock(&inode->i_mutex);
+ return ret;
+}
+
+static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ file_accessed(file);
+ vma->vm_ops = &f2fs_file_vm_ops;
+ return 0;
+}
+
+static int truncate_data_blocks_range(struct dnode_of_data *dn, int count)
+{
+ int nr_free = 0, ofs = dn->ofs_in_node;
+ struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
+ struct f2fs_node *raw_node;
+ __le32 *addr;
+
+ raw_node = page_address(dn->node_page);
+ addr = blkaddr_in_node(raw_node) + ofs;
+
+ for ( ; count > 0; count--, addr++, dn->ofs_in_node++) {
+ block_t blkaddr = le32_to_cpu(*addr);
+ if (blkaddr == NULL_ADDR)
+ continue;
+
+ update_extent_cache(NULL_ADDR, dn);
+ invalidate_blocks(sbi, blkaddr);
+ dec_valid_block_count(sbi, dn->inode, 1);
+ nr_free++;
+ }
+ if (nr_free) {
+ set_page_dirty(dn->node_page);
+ sync_inode_page(dn);
+ }
+ dn->ofs_in_node = ofs;
+ return nr_free;
+}
+
+void truncate_data_blocks(struct dnode_of_data *dn)
+{
+ truncate_data_blocks_range(dn, ADDRS_PER_BLOCK);
+}
+
+static void truncate_partial_data_page(struct inode *inode, u64 from)
+{
+ unsigned offset = from & (PAGE_CACHE_SIZE - 1);
+ struct page *page;
+
+ if (!offset)
+ return;
+
+ page = find_data_page(inode, from >> PAGE_CACHE_SHIFT);
+ if (IS_ERR(page))
+ return;
+
+ lock_page(page);
+ wait_on_page_writeback(page);
+ zero_user(page, offset, PAGE_CACHE_SIZE - offset);
+ set_page_dirty(page);
+ f2fs_put_page(page, 1);
+}
+
+static int truncate_blocks(struct inode *inode, u64 from)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ unsigned int blocksize = inode->i_sb->s_blocksize;
+ struct dnode_of_data dn;
+ pgoff_t free_from;
+ int count = 0;
+ int err;
+
+ free_from = (pgoff_t)
+ ((from + blocksize - 1) >> (sbi->log_blocksize));
+
+ mutex_lock_op(sbi, DATA_TRUNC);
+
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = get_dnode_of_data(&dn, free_from, RDONLY_NODE);
+ if (err) {
+ if (err == -ENOENT)
+ goto free_next;
+ mutex_unlock_op(sbi, DATA_TRUNC);
+ return err;
+ }
+
+ if (IS_INODE(dn.node_page))
+ count = ADDRS_PER_INODE;
+ else
+ count = ADDRS_PER_BLOCK;
+
+ count -= dn.ofs_in_node;
+ BUG_ON(count < 0);
+ if (dn.ofs_in_node || IS_INODE(dn.node_page)) {
+ truncate_data_blocks_range(&dn, count);
+ free_from += count;
+ }
+
+ f2fs_put_dnode(&dn);
+free_next:
+ err = truncate_inode_blocks(inode, free_from);
+ mutex_unlock_op(sbi, DATA_TRUNC);
+
+ /* finally, zero the tail of the partially truncated data page */
+ truncate_partial_data_page(inode, from);
+
+ return err;
+}
+
+void f2fs_truncate(struct inode *inode)
+{
+ if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+ S_ISLNK(inode->i_mode)))
+ return;
+
+ if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+ return;
+
+ if (!truncate_blocks(inode, i_size_read(inode))) {
+ inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+ mark_inode_dirty(inode);
+ }
+
+ f2fs_balance_fs(F2FS_SB(inode->i_sb));
+}
+
+static int f2fs_getattr(struct vfsmount *mnt,
+ struct dentry *dentry, struct kstat *stat)
+{
+ struct inode *inode = dentry->d_inode;
+ generic_fillattr(inode, stat);
+ stat->blocks <<= 3;
+ return 0;
+}
+
+#ifdef CONFIG_F2FS_FS_POSIX_ACL
+static void __setattr_copy(struct inode *inode, const struct iattr *attr)
+{
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ unsigned int ia_valid = attr->ia_valid;
+
+ if (ia_valid & ATTR_UID)
+ inode->i_uid = attr->ia_uid;
+ if (ia_valid & ATTR_GID)
+ inode->i_gid = attr->ia_gid;
+ if (ia_valid & ATTR_ATIME)
+ inode->i_atime = timespec_trunc(attr->ia_atime,
+ inode->i_sb->s_time_gran);
+ if (ia_valid & ATTR_MTIME)
+ inode->i_mtime = timespec_trunc(attr->ia_mtime,
+ inode->i_sb->s_time_gran);
+ if (ia_valid & ATTR_CTIME)
+ inode->i_ctime = timespec_trunc(attr->ia_ctime,
+ inode->i_sb->s_time_gran);
+ if (ia_valid & ATTR_MODE) {
+ umode_t mode = attr->ia_mode;
+
+ if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+ mode &= ~S_ISGID;
+ set_acl_inode(fi, mode);
+ }
+}
+#else
+#define __setattr_copy setattr_copy
+#endif
+
+int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
+{
+ struct inode *inode = dentry->d_inode;
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ int err;
+
+ err = inode_change_ok(inode, attr);
+ if (err)
+ return err;
+
+ if ((attr->ia_valid & ATTR_SIZE) &&
+ attr->ia_size != i_size_read(inode)) {
+ truncate_setsize(inode, attr->ia_size);
+ f2fs_truncate(inode);
+ }
+
+ __setattr_copy(inode, attr);
+
+ if (attr->ia_valid & ATTR_MODE) {
+ err = f2fs_acl_chmod(inode);
+ if (err || is_inode_flag_set(fi, FI_ACL_MODE)) {
+ inode->i_mode = fi->i_acl_mode;
+ clear_inode_flag(fi, FI_ACL_MODE);
+ }
+ }
+
+ mark_inode_dirty(inode);
+ return err;
+}
+
+const struct inode_operations f2fs_file_inode_operations = {
+ .getattr = f2fs_getattr,
+ .setattr = f2fs_setattr,
+ .get_acl = f2fs_get_acl,
+#ifdef CONFIG_F2FS_FS_XATTR
+ .setxattr = generic_setxattr,
+ .getxattr = generic_getxattr,
+ .listxattr = f2fs_listxattr,
+ .removexattr = generic_removexattr,
+#endif
+};
+
+static void fill_zero(struct inode *inode, pgoff_t index,
+ loff_t start, loff_t len)
+{
+ struct page *page;
+
+ if (!len)
+ return;
+
+ page = get_new_data_page(inode, index, false);
+
+ if (!IS_ERR(page)) {
+ wait_on_page_writeback(page);
+ zero_user(page, start, len);
+ set_page_dirty(page);
+ f2fs_put_page(page, 1);
+ }
+}
+
+int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end)
+{
+ pgoff_t index;
+ int err;
+
+ for (index = pg_start; index < pg_end; index++) {
+ struct dnode_of_data dn;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+
+ mutex_lock_op(sbi, DATA_TRUNC);
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = get_dnode_of_data(&dn, index, RDONLY_NODE);
+ if (err) {
+ mutex_unlock_op(sbi, DATA_TRUNC);
+ if (err == -ENOENT)
+ continue;
+ return err;
+ }
+
+ if (dn.data_blkaddr != NULL_ADDR)
+ truncate_data_blocks_range(&dn, 1);
+ f2fs_put_dnode(&dn);
+ mutex_unlock_op(sbi, DATA_TRUNC);
+ }
+ return 0;
+}
+
+static int punch_hole(struct inode *inode, loff_t offset, loff_t len, int mode)
+{
+ pgoff_t pg_start, pg_end;
+ loff_t off_start, off_end;
+ int ret = 0;
+
+ pg_start = ((unsigned long long) offset) >> PAGE_CACHE_SHIFT;
+ pg_end = ((unsigned long long) offset + len) >> PAGE_CACHE_SHIFT;
+
+ off_start = offset & (PAGE_CACHE_SIZE - 1);
+ off_end = (offset + len) & (PAGE_CACHE_SIZE - 1);
+
+ if (pg_start == pg_end) {
+ fill_zero(inode, pg_start, off_start,
+ off_end - off_start);
+ } else {
+ if (off_start)
+ fill_zero(inode, pg_start++, off_start,
+ PAGE_CACHE_SIZE - off_start);
+ if (off_end)
+ fill_zero(inode, pg_end, 0, off_end);
+
+ if (pg_start < pg_end) {
+ struct address_space *mapping = inode->i_mapping;
+ loff_t blk_start, blk_end;
+
+ blk_start = pg_start << PAGE_CACHE_SHIFT;
+ blk_end = pg_end << PAGE_CACHE_SHIFT;
+ truncate_inode_pages_range(mapping, blk_start,
+ blk_end - 1);
+ ret = truncate_hole(inode, pg_start, pg_end);
+ }
+ }
+
+ if (!(mode & FALLOC_FL_KEEP_SIZE) &&
+ i_size_read(inode) <= (offset + len)) {
+ i_size_write(inode, offset);
+ mark_inode_dirty(inode);
+ }
+
+ return ret;
+}
+
+static int expand_inode_data(struct inode *inode, loff_t offset,
+ loff_t len, int mode)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ pgoff_t index, pg_start, pg_end;
+ loff_t new_size = i_size_read(inode);
+ loff_t off_start, off_end;
+ int ret = 0;
+
+ ret = inode_newsize_ok(inode, (len + offset));
+ if (ret)
+ return ret;
+
+ pg_start = ((unsigned long long) offset) >> PAGE_CACHE_SHIFT;
+ pg_end = ((unsigned long long) offset + len) >> PAGE_CACHE_SHIFT;
+
+ off_start = offset & (PAGE_CACHE_SIZE - 1);
+ off_end = (offset + len) & (PAGE_CACHE_SIZE - 1);
+
+ for (index = pg_start; index <= pg_end; index++) {
+ struct dnode_of_data dn;
+
+ mutex_lock_op(sbi, DATA_NEW);
+
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ ret = get_dnode_of_data(&dn, index, 0);
+ if (ret) {
+ mutex_unlock_op(sbi, DATA_NEW);
+ break;
+ }
+
+ if (dn.data_blkaddr == NULL_ADDR) {
+ ret = reserve_new_block(&dn);
+ if (ret) {
+ f2fs_put_dnode(&dn);
+ mutex_unlock_op(sbi, DATA_NEW);
+ break;
+ }
+ }
+ f2fs_put_dnode(&dn);
+
+ mutex_unlock_op(sbi, DATA_NEW);
+
+ if (pg_start == pg_end)
+ new_size = offset + len;
+ else if (index == pg_start && off_start)
+ new_size = (index + 1) << PAGE_CACHE_SHIFT;
+ else if (index == pg_end)
+ new_size = (index << PAGE_CACHE_SHIFT) + off_end;
+ else
+ new_size += PAGE_CACHE_SIZE;
+ }
+
+ if (!(mode & FALLOC_FL_KEEP_SIZE) &&
+ i_size_read(inode) < new_size) {
+ i_size_write(inode, new_size);
+ mark_inode_dirty(inode);
+ }
+
+ return ret;
+}
+
+static long f2fs_fallocate(struct file *file, int mode,
+ loff_t offset, loff_t len)
+{
+ struct inode *inode = file->f_path.dentry->d_inode;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ long ret;
+
+ if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
+ return -EOPNOTSUPP;
+
+ if (mode & FALLOC_FL_PUNCH_HOLE)
+ ret = punch_hole(inode, offset, len, mode);
+ else
+ ret = expand_inode_data(inode, offset, len, mode);
+
+ f2fs_balance_fs(sbi);
+ return ret;
+}
+
+#define F2FS_REG_FLMASK (~(FS_DIRSYNC_FL | FS_TOPDIR_FL))
+#define F2FS_OTHER_FLMASK (FS_NODUMP_FL | FS_NOATIME_FL)
+
+static inline __u32 f2fs_mask_flags(umode_t mode, __u32 flags)
+{
+ if (S_ISDIR(mode))
+ return flags;
+ else if (S_ISREG(mode))
+ return flags & F2FS_REG_FLMASK;
+ else
+ return flags & F2FS_OTHER_FLMASK;
+}
+
+long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ struct inode *inode = filp->f_dentry->d_inode;
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ unsigned int flags;
+ int ret;
+
+ switch (cmd) {
+ case FS_IOC_GETFLAGS:
+ flags = fi->i_flags & FS_FL_USER_VISIBLE;
+ return put_user(flags, (int __user *) arg);
+ case FS_IOC_SETFLAGS:
+ {
+ unsigned int oldflags;
+
+ ret = mnt_want_write(filp->f_path.mnt);
+ if (ret)
+ return ret;
+
+ if (!inode_owner_or_capable(inode)) {
+ ret = -EACCES;
+ goto out;
+ }
+
+ if (get_user(flags, (int __user *) arg)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ flags = f2fs_mask_flags(inode->i_mode, flags);
+
+ mutex_lock(&inode->i_mutex);
+
+ oldflags = fi->i_flags;
+
+ if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
+ if (!capable(CAP_LINUX_IMMUTABLE)) {
+ mutex_unlock(&inode->i_mutex);
+ ret = -EPERM;
+ goto out;
+ }
+ }
+
+ flags = flags & FS_FL_USER_MODIFIABLE;
+ flags |= oldflags & ~FS_FL_USER_MODIFIABLE;
+ fi->i_flags = flags;
+ mutex_unlock(&inode->i_mutex);
+
+ f2fs_set_inode_flags(inode);
+ inode->i_ctime = CURRENT_TIME;
+ mark_inode_dirty(inode);
+out:
+ mnt_drop_write(filp->f_path.mnt);
+ return ret;
+ }
+ default:
+ return -ENOTTY;
+ }
+}
+
+const struct file_operations f2fs_file_operations = {
+ .llseek = generic_file_llseek,
+ .read = do_sync_read,
+ .write = do_sync_write,
+ .aio_read = generic_file_aio_read,
+ .aio_write = generic_file_aio_write,
+ .open = generic_file_open,
+ .mmap = f2fs_file_mmap,
+ .fsync = f2fs_sync_file,
+ .fallocate = f2fs_fallocate,
+ .unlocked_ioctl = f2fs_ioctl,
+ .splice_read = generic_file_splice_read,
+ .splice_write = generic_file_splice_write,
+};
--
1.7.9.5




---
Jaegeuk Kim
Samsung


2012-10-23 02:29:56

by Jaegeuk Kim

Subject: [PATCH 09/16 v2] f2fs: add address space operations for data

This adds address space operations for data.

- F2FS supports readpages(), writepages(), and direct_IO().

- Because F2FS writes data out of place, f2fs_direct_IO() does not perform
  direct writes: it returns 0 for WRITE and serves only direct reads.
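
For readers following update_extent_cache() in the diff below, here is a
minimal userspace sketch of the single-extent policy it applies: start a new
extent, merge at the front or back, or split and keep one side. It is an
illustration only and not part of the patch: struct ext_sketch, ext_update(),
and SKETCH_NULL_ADDR are hypothetical names, NULL_ADDR is assumed to be 0, and
the locking and node-page update done by the real function are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the "no block" address is 0. */
#define SKETCH_NULL_ADDR 0u

/* One cached extent: file offsets [fofs, fofs + len) map to
 * block addresses [blk_addr, blk_addr + len). len == 0 means empty. */
struct ext_sketch {
	uint32_t fofs;
	uint32_t blk_addr;
	uint32_t len;
};

/* Record that file offset fofs now lives at blk_addr. */
static void ext_update(struct ext_sketch *e, uint32_t fofs, uint32_t blk_addr)
{
	uint32_t start_fofs, end_fofs, start_blk, end_blk;

	assert(e);

	start_fofs = e->fofs;
	end_fofs = e->fofs + e->len - 1;
	start_blk = e->blk_addr;
	end_blk = e->blk_addr + e->len - 1;

	/* A one-block extent that is being rewritten is simply dropped. */
	if (e->len == 1 && fofs == start_fofs)
		e->len = 0;

	/* Empty cache: start a new one-block extent (unless freeing). */
	if (e->len == 0) {
		if (blk_addr != SKETCH_NULL_ADDR) {
			e->fofs = fofs;
			e->blk_addr = blk_addr;
			e->len = 1;
		}
		return;
	}

	/* Front merge: the new block sits right before the extent. */
	if (fofs == start_fofs - 1 && blk_addr == start_blk - 1) {
		e->fofs--;
		e->blk_addr--;
		e->len++;
		return;
	}

	/* Back merge: the new block sits right after the extent. */
	if (fofs == end_fofs + 1 && blk_addr == end_blk + 1) {
		e->len++;
		return;
	}

	/* Split: the updated offset is inside the extent; keep one side. */
	if (e->len > 1 && fofs >= start_fofs && fofs <= end_fofs) {
		if ((end_fofs - fofs) < (e->len >> 1)) {
			e->len = fofs - start_fofs;             /* keep the front */
		} else {
			e->fofs = fofs + 1;                     /* keep the back */
			e->blk_addr = start_blk + fofs - start_fofs + 1;
			e->len -= fofs - start_fofs + 1;
		}
	}
}
```

Because out-of-place writes move a block on every update, an in-range update
can never extend the extent; the sketch, like the code in the patch, can only
shrink it in that case.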

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/data.c | 701 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 701 insertions(+)
create mode 100644 fs/f2fs/data.c

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
new file mode 100644
index 0000000..c2fd0a8
--- /dev/null
+++ b/fs/f2fs/data.c
@@ -0,0 +1,701 @@
+/**
+ * fs/f2fs/data.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+#include <linux/buffer_head.h>
+#include <linux/mpage.h>
+#include <linux/writeback.h>
+#include <linux/backing-dev.h>
+#include <linux/blkdev.h>
+#include <linux/bio.h>
+
+#include "f2fs.h"
+#include "node.h"
+#include "segment.h"
+
+/**
+ * Lock ordering for the change of data block address:
+ * ->data_page
+ * ->node_page
+ * update block addresses in the node page
+ */
+static void __set_data_blkaddr(struct dnode_of_data *dn, block_t new_addr)
+{
+ struct f2fs_node *rn;
+ __le32 *addr_array;
+ struct page *node_page = dn->node_page;
+ unsigned int ofs_in_node = dn->ofs_in_node;
+
+ wait_on_page_writeback(node_page);
+
+ rn = (struct f2fs_node *)page_address(node_page);
+
+ /* Get physical address of data block */
+ addr_array = blkaddr_in_node(rn);
+ addr_array[ofs_in_node] = cpu_to_le32(new_addr);
+ set_page_dirty(node_page);
+}
+
+int reserve_new_block(struct dnode_of_data *dn)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
+
+ if (is_inode_flag_set(F2FS_I(dn->inode), FI_NO_ALLOC))
+ return -EPERM;
+ if (!inc_valid_block_count(sbi, dn->inode, 1))
+ return -ENOSPC;
+
+ __set_data_blkaddr(dn, NEW_ADDR);
+ dn->data_blkaddr = NEW_ADDR;
+ sync_inode_page(dn);
+ return 0;
+}
+
+static int check_extent_cache(struct inode *inode, pgoff_t pgofs,
+ struct buffer_head *bh_result)
+{
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ pgoff_t start_fofs, end_fofs;
+ block_t start_blkaddr;
+
+ read_lock(&fi->ext.ext_lock);
+ if (fi->ext.len == 0) {
+ read_unlock(&fi->ext.ext_lock);
+ return 0;
+ }
+
+ sbi->total_hit_ext++;
+ start_fofs = fi->ext.fofs;
+ end_fofs = fi->ext.fofs + fi->ext.len - 1;
+ start_blkaddr = fi->ext.blk_addr;
+
+ if (pgofs >= start_fofs && pgofs <= end_fofs) {
+ unsigned int blkbits = inode->i_sb->s_blocksize_bits;
+ size_t count;
+
+ clear_buffer_new(bh_result);
+ map_bh(bh_result, inode->i_sb,
+ start_blkaddr + pgofs - start_fofs);
+ count = end_fofs - pgofs + 1;
+ if (count < (UINT_MAX >> blkbits))
+ bh_result->b_size = (count << blkbits);
+ else
+ bh_result->b_size = UINT_MAX;
+
+ sbi->read_hit_ext++;
+ read_unlock(&fi->ext.ext_lock);
+ return 1;
+ }
+ read_unlock(&fi->ext.ext_lock);
+ return 0;
+}
+
+void update_extent_cache(block_t blk_addr, struct dnode_of_data *dn)
+{
+ struct f2fs_inode_info *fi = F2FS_I(dn->inode);
+ pgoff_t fofs, start_fofs, end_fofs;
+ block_t start_blkaddr, end_blkaddr;
+
+ BUG_ON(blk_addr == NEW_ADDR);
+ fofs = start_bidx_of_node(ofs_of_node(dn->node_page)) + dn->ofs_in_node;
+
+ /* Update the page address in the parent node */
+ __set_data_blkaddr(dn, blk_addr);
+
+ write_lock(&fi->ext.ext_lock);
+
+ start_fofs = fi->ext.fofs;
+ end_fofs = fi->ext.fofs + fi->ext.len - 1;
+ start_blkaddr = fi->ext.blk_addr;
+ end_blkaddr = fi->ext.blk_addr + fi->ext.len - 1;
+
+ /* Drop and initialize the matched extent */
+ if (fi->ext.len == 1 && fofs == start_fofs)
+ fi->ext.len = 0;
+
+ /* Initial extent */
+ if (fi->ext.len == 0) {
+ if (blk_addr != NULL_ADDR) {
+ fi->ext.fofs = fofs;
+ fi->ext.blk_addr = blk_addr;
+ fi->ext.len = 1;
+ }
+ goto end_update;
+ }
+
+ /* Front merge */
+ if (fofs == start_fofs - 1 && blk_addr == start_blkaddr - 1) {
+ fi->ext.fofs--;
+ fi->ext.blk_addr--;
+ fi->ext.len++;
+ goto end_update;
+ }
+
+ /* Back merge */
+ if (fofs == end_fofs + 1 && blk_addr == end_blkaddr + 1) {
+ fi->ext.len++;
+ goto end_update;
+ }
+
+ /* Split the existing extent */
+ if (fi->ext.len > 1 &&
+ fofs >= start_fofs && fofs <= end_fofs) {
+ if ((end_fofs - fofs) < (fi->ext.len >> 1)) {
+ fi->ext.len = fofs - start_fofs;
+ } else {
+ fi->ext.fofs = fofs + 1;
+ fi->ext.blk_addr = start_blkaddr +
+ fofs - start_fofs + 1;
+ fi->ext.len -= fofs - start_fofs + 1;
+ }
+ goto end_update;
+ }
+ write_unlock(&fi->ext.ext_lock);
+ return;
+
+end_update:
+ write_unlock(&fi->ext.ext_lock);
+ sync_inode_page(dn);
+ return;
+}
+
+struct page *find_data_page(struct inode *inode, pgoff_t index)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct address_space *mapping = inode->i_mapping;
+ struct dnode_of_data dn;
+ struct page *page;
+ int err;
+
+ page = find_get_page(mapping, index);
+ if (page && PageUptodate(page))
+ return page;
+ f2fs_put_page(page, 0);
+
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = get_dnode_of_data(&dn, index, RDONLY_NODE);
+ if (err)
+ return ERR_PTR(err);
+ f2fs_put_dnode(&dn);
+
+ if (dn.data_blkaddr == NULL_ADDR)
+ return ERR_PTR(-ENOENT);
+
+ /* After fallocate(), a block can be NEW_ADDR with no cached page */
+ if (dn.data_blkaddr == NEW_ADDR)
+ return ERR_PTR(-EINVAL);
+
+ page = grab_cache_page(mapping, index);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
+ if (err) {
+ f2fs_put_page(page, 1);
+ return ERR_PTR(err);
+ }
+ unlock_page(page);
+ return page;
+}
+
+/**
+ * Return an error when the requested index falls in a hole, because the
+ * callers (functions in dir.c and GC) need to know whether the page
+ * exists.
+ */
+struct page *get_lock_data_page(struct inode *inode, pgoff_t index)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct address_space *mapping = inode->i_mapping;
+ struct dnode_of_data dn;
+ struct page *page;
+ int err;
+
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = get_dnode_of_data(&dn, index, RDONLY_NODE);
+ if (err)
+ return ERR_PTR(err);
+ f2fs_put_dnode(&dn);
+
+ if (dn.data_blkaddr == NULL_ADDR)
+ return ERR_PTR(-ENOENT);
+
+ page = grab_cache_page(mapping, index);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ if (PageUptodate(page))
+ return page;
+
+ BUG_ON(dn.data_blkaddr == NEW_ADDR);
+ BUG_ON(dn.data_blkaddr == NULL_ADDR);
+
+ err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
+ if (err) {
+ f2fs_put_page(page, 1);
+ return ERR_PTR(err);
+ }
+ return page;
+}
+
+/**
+ * Caller ensures that this data page is never allocated.
+ * A new zero-filled data page is allocated in the page cache.
+ */
+struct page *get_new_data_page(struct inode *inode, pgoff_t index,
+ bool new_i_size)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct address_space *mapping = inode->i_mapping;
+ struct page *page;
+ struct dnode_of_data dn;
+ int err;
+
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = get_dnode_of_data(&dn, index, 0);
+ if (err)
+ return ERR_PTR(err);
+
+ if (dn.data_blkaddr == NULL_ADDR) {
+ if (reserve_new_block(&dn)) {
+ f2fs_put_dnode(&dn);
+ return ERR_PTR(-ENOSPC);
+ }
+ }
+ f2fs_put_dnode(&dn);
+
+ page = grab_cache_page(mapping, index);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ if (PageUptodate(page))
+ return page;
+
+ if (dn.data_blkaddr == NEW_ADDR) {
+ zero_user_segment(page, 0, PAGE_CACHE_SIZE);
+ } else {
+ err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
+ if (err) {
+ f2fs_put_page(page, 1);
+ return ERR_PTR(err);
+ }
+ }
+ SetPageUptodate(page);
+
+ if (new_i_size &&
+ i_size_read(inode) < ((index + 1) << PAGE_CACHE_SHIFT)) {
+ i_size_write(inode, ((index + 1) << PAGE_CACHE_SHIFT));
+ mark_inode_dirty_sync(inode);
+ }
+ return page;
+}
+
+static void read_end_io(struct bio *bio, int err)
+{
+ const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+ struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+
+ do {
+ struct page *page = bvec->bv_page;
+
+ if (--bvec >= bio->bi_io_vec)
+ prefetchw(&bvec->bv_page->flags);
+
+ if (uptodate) {
+ SetPageUptodate(page);
+ } else {
+ ClearPageUptodate(page);
+ SetPageError(page);
+ }
+ unlock_page(page);
+ } while (bvec >= bio->bi_io_vec);
+ kfree(bio->bi_private);
+ bio_put(bio);
+}
+
+/**
+ * Fill the locked page with the data stored at the given block address.
+ * For READ_SYNC, wait for completion and return with the page locked, so
+ * the caller must unlock it; otherwise read_end_io() unlocks the page.
+ */
+int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
+ block_t blk_addr, int type)
+{
+ struct block_device *bdev = sbi->sb->s_bdev;
+ bool sync = (type == READ_SYNC);
+ struct bio *bio;
+
+ /* This page may already have been read by another thread */
+ if (PageUptodate(page)) {
+ if (!sync)
+ unlock_page(page);
+ return 0;
+ }
+
+ down_read(&sbi->bio_sem);
+
+ /* Allocate a new bio */
+ bio = f2fs_bio_alloc(bdev, blk_addr << (sbi->log_blocksize - 9),
+ 1, GFP_NOFS | __GFP_HIGH);
+
+ /* Initialize the bio */
+ bio->bi_end_io = read_end_io;
+ if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
+ kfree(bio->bi_private);
+ bio_put(bio);
+ up_read(&sbi->bio_sem);
+ return -EFAULT;
+ }
+
+ submit_bio(type, bio);
+ up_read(&sbi->bio_sem);
+
+ /* wait for read completion if sync */
+ if (sync) {
+ lock_page(page);
+ if (PageError(page))
+ return -EIO;
+ }
+ return 0;
+}
+
+/**
+ * This function should be used by the data read flow only, because it
+ * does not support the "create" flag that requests block allocation.
+ * It exists so that reads can exploit the VFS readahead
+ * mechanism.
+ */
+static int get_data_block_ro(struct inode *inode, sector_t iblock,
+ struct buffer_head *bh_result, int create)
+{
+ unsigned int blkbits = inode->i_sb->s_blocksize_bits;
+ unsigned maxblocks = bh_result->b_size >> blkbits;
+ struct dnode_of_data dn;
+ pgoff_t pgofs;
+ int err;
+
+ /* Get the page offset from the block offset(iblock) */
+ pgofs = (pgoff_t)(iblock >> (PAGE_CACHE_SHIFT - blkbits));
+
+ if (check_extent_cache(inode, pgofs, bh_result))
+ return 0;
+
+ /* When reading holes, we need its node page */
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = get_dnode_of_data(&dn, pgofs, RDONLY_NODE);
+ if (err)
+ return (err == -ENOENT) ? 0 : err;
+
+ /* It does not support data allocation */
+ BUG_ON(create);
+
+ if (dn.data_blkaddr != NEW_ADDR && dn.data_blkaddr != NULL_ADDR) {
+ int i;
+ unsigned int end_offset;
+
+ end_offset = IS_INODE(dn.node_page) ?
+ ADDRS_PER_INODE :
+ ADDRS_PER_BLOCK;
+
+ clear_buffer_new(bh_result);
+
+ /* Give more consecutive addresses for the read ahead */
+ for (i = 0; i < end_offset - dn.ofs_in_node; i++)
+ if (((datablock_addr(dn.node_page,
+ dn.ofs_in_node + i))
+ != (dn.data_blkaddr + i)) || maxblocks == i)
+ break;
+ map_bh(bh_result, inode->i_sb, dn.data_blkaddr);
+ bh_result->b_size = (i << blkbits);
+ }
+ f2fs_put_dnode(&dn);
+ return 0;
+}
+
+static int f2fs_read_data_page(struct file *file, struct page *page)
+{
+ return mpage_readpage(page, get_data_block_ro);
+}
+
+static int f2fs_read_data_pages(struct file *file,
+ struct address_space *mapping,
+ struct list_head *pages, unsigned nr_pages)
+{
+ return mpage_readpages(mapping, pages, nr_pages, get_data_block_ro);
+}
+
+int do_write_data_page(struct page *page)
+{
+ struct inode *inode = page->mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ block_t old_blk_addr, new_blk_addr;
+ struct dnode_of_data dn;
+ int err = 0;
+
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = get_dnode_of_data(&dn, page->index, RDONLY_NODE);
+ if (err)
+ return err;
+
+ old_blk_addr = dn.data_blkaddr;
+
+ /* This page is already truncated */
+ if (old_blk_addr == NULL_ADDR)
+ goto out_writepage;
+
+ set_page_writeback(page);
+
+ /*
+ * If the current allocation needs SSR,
+ * it is better to write updated data in place.
+ */
+ if (old_blk_addr != NEW_ADDR && !is_cold_data(page) &&
+ need_inplace_update(inode)) {
+ rewrite_data_page(F2FS_SB(inode->i_sb), page,
+ old_blk_addr);
+ } else {
+ write_data_page(inode, page, &dn,
+ old_blk_addr, &new_blk_addr);
+ update_extent_cache(new_blk_addr, &dn);
+ F2FS_I(inode)->data_version =
+ le64_to_cpu(F2FS_CKPT(sbi)->checkpoint_ver);
+ }
+out_writepage:
+ f2fs_put_dnode(&dn);
+ return err;
+}
+
+static int f2fs_write_data_page(struct page *page,
+ struct writeback_control *wbc)
+{
+ struct inode *inode = page->mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ loff_t i_size = i_size_read(inode);
+ const pgoff_t end_index = ((unsigned long long) i_size)
+ >> PAGE_CACHE_SHIFT;
+ unsigned offset;
+ int err = 0;
+
+ if (page->index < end_index)
+ goto out;
+
+ /*
+ * If the offset is out-of-range of file size,
+ * this page does not have to be written to disk.
+ */
+ offset = i_size & (PAGE_CACHE_SIZE - 1);
+ if ((page->index >= end_index + 1) || !offset) {
+ if (S_ISDIR(inode->i_mode)) {
+ dec_page_count(sbi, F2FS_DIRTY_DENTS);
+ inode_dec_dirty_dents(inode);
+ }
+ goto unlock_out;
+ }
+
+ zero_user_segment(page, offset, PAGE_CACHE_SIZE);
+out:
+ if (sbi->por_doing)
+ goto redirty_out;
+
+ if (wbc->for_reclaim && !S_ISDIR(inode->i_mode) && !is_cold_data(page))
+ goto redirty_out;
+
+ mutex_lock_op(sbi, DATA_WRITE);
+ if (S_ISDIR(inode->i_mode)) {
+ dec_page_count(sbi, F2FS_DIRTY_DENTS);
+ inode_dec_dirty_dents(inode);
+ }
+ err = do_write_data_page(page);
+ if (err && err != -ENOENT) {
+ wbc->pages_skipped++;
+ set_page_dirty(page);
+ }
+ mutex_unlock_op(sbi, DATA_WRITE);
+
+ if (wbc->for_reclaim)
+ f2fs_submit_bio(sbi, DATA, true);
+
+ if (err == -ENOENT)
+ goto unlock_out;
+
+ clear_cold_data(page);
+ unlock_page(page);
+
+ if (!wbc->for_reclaim && !S_ISDIR(inode->i_mode))
+ f2fs_balance_fs(sbi);
+ return 0;
+
+unlock_out:
+ unlock_page(page);
+ return (err == -ENOENT) ? 0 : err;
+
+redirty_out:
+ wbc->pages_skipped++;
+ set_page_dirty(page);
+ return AOP_WRITEPAGE_ACTIVATE;
+}
+
+#define MAX_DESIRED_PAGES_WP 4096
+
+int f2fs_write_data_pages(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ struct inode *inode = mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ int ret;
+ long excess_nrtw = 0, desired_nrtw;
+
+ if (wbc->nr_to_write < MAX_DESIRED_PAGES_WP) {
+ desired_nrtw = MAX_DESIRED_PAGES_WP;
+ excess_nrtw = desired_nrtw - wbc->nr_to_write;
+ wbc->nr_to_write = desired_nrtw;
+ }
+
+ if (!S_ISDIR(inode->i_mode))
+ mutex_lock(&sbi->writepages);
+ ret = generic_writepages(mapping, wbc);
+ if (!S_ISDIR(inode->i_mode))
+ mutex_unlock(&sbi->writepages);
+ f2fs_submit_bio(sbi, DATA, (wbc->sync_mode == WB_SYNC_ALL));
+
+ remove_dirty_dir_inode(inode);
+
+ wbc->nr_to_write -= excess_nrtw;
+ return ret;
+}
+
+static int f2fs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ struct inode *inode = mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct page *page;
+ pgoff_t index = ((unsigned long long) pos) >> PAGE_CACHE_SHIFT;
+ struct dnode_of_data dn;
+ int err = 0;
+
+ /* for nobh_write_end */
+ *fsdata = NULL;
+
+ f2fs_balance_fs(sbi);
+
+ page = grab_cache_page_write_begin(mapping, index, flags);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
+
+ mutex_lock_op(sbi, DATA_NEW);
+
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ err = get_dnode_of_data(&dn, index, 0);
+ if (err) {
+ mutex_unlock_op(sbi, DATA_NEW);
+ f2fs_put_page(page, 1);
+ return err;
+ }
+
+ if (dn.data_blkaddr == NULL_ADDR) {
+ err = reserve_new_block(&dn);
+ if (err) {
+ f2fs_put_dnode(&dn);
+ mutex_unlock_op(sbi, DATA_NEW);
+ f2fs_put_page(page, 1);
+ return err;
+ }
+ }
+ f2fs_put_dnode(&dn);
+
+ mutex_unlock_op(sbi, DATA_NEW);
+
+ if ((len == PAGE_CACHE_SIZE) || PageUptodate(page))
+ return 0;
+
+ if ((pos & PAGE_CACHE_MASK) >= i_size_read(inode)) {
+ unsigned start = pos & (PAGE_CACHE_SIZE - 1);
+ unsigned end = start + len;
+
+ /* Reading beyond i_size is simple: memset to zero */
+ zero_user_segments(page, 0, start, end, PAGE_CACHE_SIZE);
+ return 0;
+ }
+
+ if (dn.data_blkaddr == NEW_ADDR) {
+ zero_user_segment(page, 0, PAGE_CACHE_SIZE);
+ } else {
+ err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
+ if (err) {
+ f2fs_put_page(page, 1);
+ return err;
+ }
+ }
+ SetPageUptodate(page);
+ clear_cold_data(page);
+ return 0;
+}
+
+static ssize_t f2fs_direct_IO(int rw, struct kiocb *iocb,
+ const struct iovec *iov, loff_t offset, unsigned long nr_segs)
+{
+ struct file *file = iocb->ki_filp;
+ struct inode *inode = file->f_mapping->host;
+
+ if (rw == WRITE)
+ return 0;
+
+ /* Needs synchronization with the cleaner */
+ return blockdev_direct_IO(rw, iocb, inode, iov, offset, nr_segs,
+ get_data_block_ro);
+}
+
+static void f2fs_invalidate_data_page(struct page *page, unsigned long offset)
+{
+ struct inode *inode = page->mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ if (S_ISDIR(inode->i_mode) && PageDirty(page)) {
+ dec_page_count(sbi, F2FS_DIRTY_DENTS);
+ inode_dec_dirty_dents(inode);
+ }
+ ClearPagePrivate(page);
+}
+
+static int f2fs_release_data_page(struct page *page, gfp_t wait)
+{
+ ClearPagePrivate(page);
+ return 0;
+}
+
+static int f2fs_set_data_page_dirty(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ struct inode *inode = mapping->host;
+
+ SetPageUptodate(page);
+ if (!PageDirty(page)) {
+ __set_page_dirty_nobuffers(page);
+ set_dirty_dir_page(inode, page);
+ return 1;
+ }
+ return 0;
+}
+
+const struct address_space_operations f2fs_dblock_aops = {
+ .readpage = f2fs_read_data_page,
+ .readpages = f2fs_read_data_pages,
+ .writepage = f2fs_write_data_page,
+ .writepages = f2fs_write_data_pages,
+ .write_begin = f2fs_write_begin,
+ .write_end = nobh_write_end,
+ .set_page_dirty = f2fs_set_data_page_dirty,
+ .invalidatepage = f2fs_invalidate_data_page,
+ .releasepage = f2fs_release_data_page,
+ .direct_IO = f2fs_direct_IO,
+};
--
1.7.9.5




---
Jaegeuk Kim
Samsung
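
A note on f2fs_write_begin() in the patch above: when the target page starts at or beyond i_size, the function skips the disk read and zeroes the parts of the page that the write will not cover. The following is a minimal userspace sketch of that arithmetic, not the kernel code. DEMO_PAGE_SIZE and zero_outside_write() are names made up for illustration, and the sketch assumes the write stays within a single page.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/*
 * Zero the parts of a page-sized buffer that a write of len bytes at file
 * offset pos will not cover: [0, start) and [end, DEMO_PAGE_SIZE).
 * The caller must guarantee that start + len <= DEMO_PAGE_SIZE.
 */
static void zero_outside_write(unsigned char *page, unsigned long long pos,
			       unsigned int len)
{
	/* offset of the write inside its page */
	unsigned int start = (unsigned int)(pos & (DEMO_PAGE_SIZE - 1));
	unsigned int end = start + len;

	memset(page, 0, start);
	memset(page + end, 0, DEMO_PAGE_SIZE - end);
}
```

The bytes in [start, end) are left alone because the caller is about to overwrite them with user data.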


2012-10-23 02:30:23

by Jaegeuk Kim

Subject: [PATCH 10/16 v2] f2fs: add core inode operations

This adds core functions to get, read, write, and evict an inode.
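
For readers skimming the patch, f2fs_set_inode_flags() translates the per-inode flags kept by the file system into the in-memory VFS flag bits: it first clears the five VFS bits it owns, then sets each one whose on-disk counterpart is present. A table-driven userspace sketch of the same idea follows. The DEMO_* constants and the helper name are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the on-disk (FS_*_FL) flag bits. */
#define DEMO_FS_SYNC_FL		0x00000008u
#define DEMO_FS_IMMUTABLE_FL	0x00000010u
#define DEMO_FS_APPEND_FL	0x00000020u
#define DEMO_FS_NOATIME_FL	0x00000080u
#define DEMO_FS_DIRSYNC_FL	0x00010000u

/* Hypothetical stand-ins for the in-memory (S_*) flag bits. */
#define DEMO_S_SYNC		0x01u
#define DEMO_S_NOATIME		0x02u
#define DEMO_S_APPEND		0x04u
#define DEMO_S_IMMUTABLE	0x08u
#define DEMO_S_DIRSYNC		0x40u

struct demo_flag_map {
	unsigned int disk;	/* bit in the on-disk flags */
	unsigned int vfs;	/* matching bit in the in-memory flags */
};

static const struct demo_flag_map demo_flag_table[] = {
	{ DEMO_FS_SYNC_FL,	DEMO_S_SYNC },
	{ DEMO_FS_APPEND_FL,	DEMO_S_APPEND },
	{ DEMO_FS_IMMUTABLE_FL,	DEMO_S_IMMUTABLE },
	{ DEMO_FS_NOATIME_FL,	DEMO_S_NOATIME },
	{ DEMO_FS_DIRSYNC_FL,	DEMO_S_DIRSYNC },
};

/* Return vfs_flags with the five mapped bits recomputed from disk_flags. */
static unsigned int demo_set_inode_flags(unsigned int vfs_flags,
					 unsigned int disk_flags)
{
	size_t n = sizeof(demo_flag_table) / sizeof(demo_flag_table[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		/* drop any stale value, then set it if the disk flag is on */
		vfs_flags &= ~demo_flag_table[i].vfs;
		if (disk_flags & demo_flag_table[i].disk)
			vfs_flags |= demo_flag_table[i].vfs;
	}
	return vfs_flags;
}
```

Bits outside the table are passed through untouched, which matches the mask-then-set structure of the function in the patch.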

Signed-off-by: Changman Lee <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/inode.c | 262 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 262 insertions(+)
create mode 100644 fs/f2fs/inode.c

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
new file mode 100644
index 0000000..0cf61da
--- /dev/null
+++ b/fs/f2fs/inode.c
@@ -0,0 +1,262 @@
+/**
+ * fs/f2fs/inode.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+#include <linux/buffer_head.h>
+#include <linux/writeback.h>
+
+#include "f2fs.h"
+#include "node.h"
+
+struct f2fs_iget_args {
+ u64 ino;
+ int on_free;
+};
+
+void f2fs_set_inode_flags(struct inode *inode)
+{
+ unsigned int flags = F2FS_I(inode)->i_flags;
+
+ inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE |
+ S_NOATIME | S_DIRSYNC);
+
+ if (flags & FS_SYNC_FL)
+ inode->i_flags |= S_SYNC;
+ if (flags & FS_APPEND_FL)
+ inode->i_flags |= S_APPEND;
+ if (flags & FS_IMMUTABLE_FL)
+ inode->i_flags |= S_IMMUTABLE;
+ if (flags & FS_NOATIME_FL)
+ inode->i_flags |= S_NOATIME;
+ if (flags & FS_DIRSYNC_FL)
+ inode->i_flags |= S_DIRSYNC;
+}
+
+static int f2fs_iget_test(struct inode *inode, void *data)
+{
+ struct f2fs_iget_args *args = data;
+
+ if (inode->i_ino != args->ino)
+ return 0;
+ if (inode->i_state & (I_FREEING | I_WILL_FREE)) {
+ args->on_free = 1;
+ return 0;
+ }
+ return 1;
+}
+
+struct inode *f2fs_iget_nowait(struct super_block *sb, unsigned long ino)
+{
+ struct f2fs_iget_args args = {
+ .ino = ino,
+ .on_free = 0
+ };
+ struct inode *inode = ilookup5(sb, ino, f2fs_iget_test, &args);
+
+ if (inode)
+ return inode;
+ if (!args.on_free)
+ return f2fs_iget(sb, ino);
+ return ERR_PTR(-ENOENT);
+}
+
+static int do_read_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ struct page *node_page;
+ struct f2fs_node *rn;
+ struct f2fs_inode *ri;
+
+ /* Check if ino is within scope */
+ check_nid_range(sbi, inode->i_ino);
+
+ node_page = get_node_page(sbi, inode->i_ino);
+ if (IS_ERR(node_page))
+ return PTR_ERR(node_page);
+
+ rn = page_address(node_page);
+ ri = &(rn->i);
+
+ inode->i_mode = le16_to_cpu(ri->i_mode);
+ i_uid_write(inode, le32_to_cpu(ri->i_uid));
+ i_gid_write(inode, le32_to_cpu(ri->i_gid));
+ set_nlink(inode, le32_to_cpu(ri->i_links));
+ inode->i_size = le64_to_cpu(ri->i_size);
+ inode->i_blocks = le64_to_cpu(ri->i_blocks);
+
+ inode->i_atime.tv_sec = le64_to_cpu(ri->i_mtime);
+ inode->i_ctime.tv_sec = le64_to_cpu(ri->i_ctime);
+ inode->i_mtime.tv_sec = le64_to_cpu(ri->i_mtime);
+ inode->i_atime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec);
+ inode->i_ctime.tv_nsec = le32_to_cpu(ri->i_ctime_nsec);
+ inode->i_mtime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec);
+
+ fi->current_depth = le32_to_cpu(ri->current_depth);
+ fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid);
+ fi->i_flags = le32_to_cpu(ri->i_flags);
+ fi->flags = 0;
+ fi->data_version = le64_to_cpu(F2FS_CKPT(sbi)->checkpoint_ver) - 1;
+ fi->i_advise = ri->i_advise;
+ get_extent_info(&fi->ext, ri->i_ext);
+ f2fs_put_page(node_page, 1);
+ return 0;
+}
+
+struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ struct inode *inode;
+ int ret;
+
+ inode = iget_locked(sb, ino);
+ if (!inode)
+ return ERR_PTR(-ENOMEM);
+ if (!(inode->i_state & I_NEW))
+ return inode;
+ if (ino == F2FS_NODE_INO(sbi) || ino == F2FS_META_INO(sbi))
+ goto make_now;
+
+ ret = do_read_inode(inode);
+ if (ret)
+ goto bad_inode;
+
+ if (!sbi->por_doing && inode->i_nlink == 0) {
+ ret = -ENOENT;
+ goto bad_inode;
+ }
+
+make_now:
+ if (ino == F2FS_NODE_INO(sbi)) {
+ inode->i_mapping->a_ops = &f2fs_node_aops;
+ mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_MOVABLE);
+ } else if (ino == F2FS_META_INO(sbi)) {
+ inode->i_mapping->a_ops = &f2fs_meta_aops;
+ mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_MOVABLE);
+ } else if (S_ISREG(inode->i_mode)) {
+ inode->i_op = &f2fs_file_inode_operations;
+ inode->i_fop = &f2fs_file_operations;
+ inode->i_mapping->a_ops = &f2fs_dblock_aops;
+ } else if (S_ISDIR(inode->i_mode)) {
+ inode->i_op = &f2fs_dir_inode_operations;
+ inode->i_fop = &f2fs_dir_operations;
+ inode->i_mapping->a_ops = &f2fs_dblock_aops;
+ mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER_MOVABLE |
+ __GFP_ZERO);
+ } else if (S_ISLNK(inode->i_mode)) {
+ inode->i_op = &f2fs_symlink_inode_operations;
+ inode->i_mapping->a_ops = &f2fs_dblock_aops;
+ } else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
+ S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
+ inode->i_op = &f2fs_special_inode_operations;
+ init_special_inode(inode, inode->i_mode, inode->i_rdev);
+ } else {
+ ret = -EIO;
+ goto bad_inode;
+ }
+ unlock_new_inode(inode);
+
+ return inode;
+
+bad_inode:
+ iget_failed(inode);
+ return ERR_PTR(ret);
+}
+
+void update_inode(struct inode *inode, struct page *node_page)
+{
+ struct f2fs_node *rn;
+ struct f2fs_inode *ri;
+
+ wait_on_page_writeback(node_page);
+
+ rn = page_address(node_page);
+ ri = &(rn->i);
+
+ ri->i_mode = cpu_to_le16(inode->i_mode);
+ ri->i_advise = F2FS_I(inode)->i_advise;
+ ri->i_uid = cpu_to_le32(i_uid_read(inode));
+ ri->i_gid = cpu_to_le32(i_gid_read(inode));
+ ri->i_links = cpu_to_le32(inode->i_nlink);
+ ri->i_size = cpu_to_le64(i_size_read(inode));
+ ri->i_blocks = cpu_to_le64(inode->i_blocks);
+ set_raw_extent(&F2FS_I(inode)->ext, &ri->i_ext);
+
+ ri->i_ctime = cpu_to_le64(inode->i_ctime.tv_sec);
+ ri->i_mtime = cpu_to_le64(inode->i_mtime.tv_sec);
+ ri->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
+ ri->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
+ ri->current_depth = cpu_to_le32(F2FS_I(inode)->current_depth);
+ ri->i_xattr_nid = cpu_to_le32(F2FS_I(inode)->i_xattr_nid);
+ ri->i_flags = cpu_to_le32(F2FS_I(inode)->i_flags);
+ set_page_dirty(node_page);
+}
+
+int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct page *node_page;
+ bool need_lock = false;
+
+ if (inode->i_ino == F2FS_NODE_INO(sbi) ||
+ inode->i_ino == F2FS_META_INO(sbi))
+ return 0;
+
+ node_page = get_node_page(sbi, inode->i_ino);
+ if (IS_ERR(node_page))
+ return PTR_ERR(node_page);
+
+ if (!PageDirty(node_page)) {
+ need_lock = true;
+ f2fs_put_page(node_page, 1);
+ mutex_lock(&sbi->write_inode);
+ node_page = get_node_page(sbi, inode->i_ino);
+ if (IS_ERR(node_page)) {
+ mutex_unlock(&sbi->write_inode);
+ return PTR_ERR(node_page);
+ }
+ }
+ update_inode(inode, node_page);
+ f2fs_put_page(node_page, 1);
+ if (need_lock)
+ mutex_unlock(&sbi->write_inode);
+ return 0;
+}
+
+/**
+ * Called at the last iput() if i_nlink is zero
+ */
+void f2fs_evict_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+
+ truncate_inode_pages(&inode->i_data, 0);
+
+ if (inode->i_ino == F2FS_NODE_INO(sbi) ||
+ inode->i_ino == F2FS_META_INO(sbi))
+ goto no_delete;
+
+ BUG_ON(atomic_read(&F2FS_I(inode)->dirty_dents));
+ remove_dirty_dir_inode(inode);
+
+ if (inode->i_nlink || is_bad_inode(inode))
+ goto no_delete;
+
+ set_inode_flag(F2FS_I(inode), FI_NO_ALLOC);
+ i_size_write(inode, 0);
+
+ if (F2FS_HAS_BLOCKS(inode))
+ f2fs_truncate(inode);
+
+ remove_inode_page(inode);
+no_delete:
+ clear_inode(inode);
+}
--
1.7.9.5




---
Jaegeuk Kim
Samsung


2012-10-23 02:30:49

by Jaegeuk Kim

Subject: [PATCH 11/16 v2] f2fs: add inode operations for special inodes

This adds inode operations for directory, symlink, and special inodes.
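
One detail of this patch that may be worth illustrating is the cold-file hint: at create time the file name is compared against a list of extensions, and a match sets FADVISE_COLD_BIT. is_multimedia_file() accepts the extension as stored, or entirely in upper case. Below is a rough userspace sketch of that suffix test. The function name and the 1-on-match return convention are choices made for this sketch (the kernel helper returns 0 on a match, like memcmp).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if name ends with ext exactly as given, or with ext converted
 * to upper case; return 0 otherwise. Mixed-case suffixes do not match.
 */
static int demo_suffix_matches(const char *name, const char *ext)
{
	size_t nlen = strlen(name);
	size_t elen = strlen(ext);
	const char *tail;
	size_t i;

	if (elen > nlen)
		return 0;

	tail = name + nlen - elen;
	if (memcmp(tail, ext, elen) == 0)
		return 1;

	/* second chance: compare against the upper-case form of ext */
	for (i = 0; i < elen; i++) {
		if ((unsigned char)tail[i] != toupper((unsigned char)ext[i]))
			return 0;
	}
	return 1;
}
```

With this sketch, "movie.mp4" and "MOVIE.MP4" both match the extension "mp4", while "movie.Mp4" does not.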

Signed-off-by: Changman Lee <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/namei.c | 494 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 494 insertions(+)
create mode 100644 fs/f2fs/namei.c

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
new file mode 100644
index 0000000..899d144
--- /dev/null
+++ b/fs/f2fs/namei.c
@@ -0,0 +1,494 @@
+/**
+ * fs/f2fs/namei.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+#include <linux/pagemap.h>
+#include <linux/sched.h>
+#include <linux/ctype.h>
+
+#include "f2fs.h"
+#include "xattr.h"
+#include "acl.h"
+
+static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
+{
+ struct super_block *sb = dir->i_sb;
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ nid_t ino;
+ struct inode *inode;
+ bool nid_free = false;
+ int err;
+
+ inode = new_inode(sb);
+ if (!inode)
+ return ERR_PTR(-ENOMEM);
+
+ mutex_lock_op(sbi, NODE_NEW);
+ if (!alloc_nid(sbi, &ino)) {
+ mutex_unlock_op(sbi, NODE_NEW);
+ err = -ENOSPC;
+ goto fail;
+ }
+ mutex_unlock_op(sbi, NODE_NEW);
+
+ inode->i_uid = current_fsuid();
+
+ if (dir->i_mode & S_ISGID) {
+ inode->i_gid = dir->i_gid;
+ if (S_ISDIR(mode))
+ mode |= S_ISGID;
+ } else {
+ inode->i_gid = current_fsgid();
+ }
+
+ inode->i_ino = ino;
+ inode->i_mode = mode;
+ inode->i_blocks = 0;
+ inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
+
+ err = insert_inode_locked(inode);
+ if (err) {
+ err = -EINVAL;
+ nid_free = true;
+ goto out;
+ }
+
+ mark_inode_dirty(inode);
+ return inode;
+
+out:
+ clear_nlink(inode);
+ unlock_new_inode(inode);
+fail:
+ iput(inode);
+ if (nid_free)
+ alloc_nid_failed(sbi, ino);
+ return ERR_PTR(err);
+}
+
+static int is_multimedia_file(const unsigned char *s, const char *sub)
+{
+ int slen = strlen(s);
+ int sublen = strlen(sub);
+ int ret;
+
+ if (sublen > slen)
+ return 1;
+
+ ret = memcmp(s + slen - sublen, sub, sublen);
+ if (ret) { /* compare upper case */
+ int i;
+ char upper_sub[8];
+ for (i = 0; i < sublen && i < sizeof(upper_sub); i++)
+ upper_sub[i] = toupper(sub[i]);
+ return memcmp(s + slen - sublen, upper_sub, sublen);
+ }
+
+ return ret;
+}
+
+/**
+ * Set multimedia files as cold files for hot/cold data separation
+ */
+static inline void set_cold_file(struct f2fs_sb_info *sbi, struct inode *inode,
+ const unsigned char *name)
+{
+ int i;
+ __u8 (*extlist)[8] = sbi->raw_super->extension_list;
+
+ int count = le32_to_cpu(sbi->raw_super->extension_count);
+ for (i = 0; i < count; i++) {
+ if (!is_multimedia_file(name, extlist[i])) {
+ F2FS_I(inode)->i_advise |= FADVISE_COLD_BIT;
+ break;
+ }
+ }
+}
+
+static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
+ bool excl)
+{
+ struct super_block *sb = dir->i_sb;
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ struct inode *inode;
+ nid_t ino = 0;
+ int err;
+
+ inode = f2fs_new_inode(dir, mode);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
+
+ if (!test_opt(sbi, DISABLE_EXT_IDENTIFY))
+ set_cold_file(sbi, inode, dentry->d_name.name);
+
+ inode->i_op = &f2fs_file_inode_operations;
+ inode->i_fop = &f2fs_file_operations;
+ inode->i_mapping->a_ops = &f2fs_dblock_aops;
+ ino = inode->i_ino;
+
+ err = f2fs_add_link(dentry, inode);
+ if (err)
+ goto out;
+
+ alloc_nid_done(sbi, ino);
+
+ if (!sbi->por_doing)
+ d_instantiate(dentry, inode);
+ unlock_new_inode(inode);
+
+ f2fs_balance_fs(sbi);
+ return 0;
+out:
+ clear_nlink(inode);
+ unlock_new_inode(inode);
+ iput(inode);
+ alloc_nid_failed(sbi, ino);
+ return err;
+}
+
+static int f2fs_link(struct dentry *old_dentry, struct inode *dir,
+ struct dentry *dentry)
+{
+ struct inode *inode = old_dentry->d_inode;
+ struct super_block *sb = dir->i_sb;
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ int err;
+
+ inode->i_ctime = CURRENT_TIME;
+ atomic_inc(&inode->i_count);
+
+ set_inode_flag(F2FS_I(inode), FI_INC_LINK);
+ err = f2fs_add_link(dentry, inode);
+ if (err)
+ goto out;
+
+ d_instantiate(dentry, inode);
+
+ f2fs_balance_fs(sbi);
+ return 0;
+out:
+ clear_inode_flag(F2FS_I(inode), FI_INC_LINK);
+ iput(inode);
+ return err;
+}
+
+static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
+ unsigned int flags)
+{
+ struct inode *inode = NULL;
+ struct f2fs_dir_entry *de;
+ struct page *page;
+
+ if (dentry->d_name.len > F2FS_MAX_NAME_LEN)
+ return ERR_PTR(-ENAMETOOLONG);
+
+ de = f2fs_find_entry(dir, &dentry->d_name, &page);
+ if (de) {
+ nid_t ino = le32_to_cpu(de->ino);
+ kunmap(page);
+ f2fs_put_page(page, 0);
+
+ inode = f2fs_iget(dir->i_sb, ino);
+ if (IS_ERR(inode))
+ return ERR_CAST(inode);
+ }
+
+ return d_splice_alias(inode, dentry);
+}
+
+static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
+{
+ struct super_block *sb = dir->i_sb;
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ struct inode *inode = dentry->d_inode;
+ struct f2fs_dir_entry *de;
+ struct page *page;
+ int err = -ENOENT;
+
+ de = f2fs_find_entry(dir, &dentry->d_name, &page);
+ if (!de)
+ goto fail;
+
+ err = check_orphan_space(sbi);
+ if (err) {
+ kunmap(page);
+ f2fs_put_page(page, 0);
+ goto fail;
+ }
+
+ f2fs_delete_entry(de, page, inode);
+
+ /* In order to evict this inode, we set it dirty */
+ mark_inode_dirty(inode);
+ f2fs_balance_fs(sbi);
+fail:
+ return err;
+}
+
+static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
+ const char *symname)
+{
+ struct super_block *sb = dir->i_sb;
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ struct inode *inode;
+ unsigned symlen = strlen(symname) + 1;
+ int err;
+
+ inode = f2fs_new_inode(dir, S_IFLNK | S_IRWXUGO);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
+
+ inode->i_op = &f2fs_symlink_inode_operations;
+ inode->i_mapping->a_ops = &f2fs_dblock_aops;
+
+ err = f2fs_add_link(dentry, inode);
+ if (err)
+ goto out;
+
+ err = page_symlink(inode, symname, symlen);
+ alloc_nid_done(sbi, inode->i_ino);
+
+ d_instantiate(dentry, inode);
+ unlock_new_inode(inode);
+
+ f2fs_balance_fs(sbi);
+
+ return err;
+out:
+ clear_nlink(inode);
+ unlock_new_inode(inode);
+ iput(inode);
+ alloc_nid_failed(sbi, inode->i_ino);
+ return err;
+}
+
+static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
+ struct inode *inode;
+ int err;
+
+ inode = f2fs_new_inode(dir, S_IFDIR | mode);
+ err = PTR_ERR(inode);
+ if (IS_ERR(inode))
+ return err;
+
+ inode->i_op = &f2fs_dir_inode_operations;
+ inode->i_fop = &f2fs_dir_operations;
+ inode->i_mapping->a_ops = &f2fs_dblock_aops;
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS | __GFP_ZERO);
+
+ set_inode_flag(F2FS_I(inode), FI_INC_LINK);
+ err = f2fs_add_link(dentry, inode);
+ if (err)
+ goto out_fail;
+
+ alloc_nid_done(sbi, inode->i_ino);
+
+ d_instantiate(dentry, inode);
+ unlock_new_inode(inode);
+
+ f2fs_balance_fs(sbi);
+ return 0;
+
+out_fail:
+ clear_inode_flag(F2FS_I(inode), FI_INC_LINK);
+ clear_nlink(inode);
+ unlock_new_inode(inode);
+ iput(inode);
+ alloc_nid_failed(sbi, inode->i_ino);
+ return err;
+}
+
+static int f2fs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+ struct inode *inode = dentry->d_inode;
+ if (f2fs_empty_dir(inode))
+ return f2fs_unlink(dir, dentry);
+ return -ENOTEMPTY;
+}
+
+static int f2fs_mknod(struct inode *dir, struct dentry *dentry,
+ umode_t mode, dev_t rdev)
+{
+ struct super_block *sb = dir->i_sb;
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ struct inode *inode;
+ int err = 0;
+
+ if (!new_valid_dev(rdev))
+ return -EINVAL;
+
+ inode = f2fs_new_inode(dir, mode);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
+
+ init_special_inode(inode, inode->i_mode, rdev);
+ inode->i_op = &f2fs_special_inode_operations;
+
+ err = f2fs_add_link(dentry, inode);
+ if (err)
+ goto out;
+
+ alloc_nid_done(sbi, inode->i_ino);
+ d_instantiate(dentry, inode);
+ unlock_new_inode(inode);
+
+ f2fs_balance_fs(sbi);
+
+ return 0;
+out:
+ clear_nlink(inode);
+ unlock_new_inode(inode);
+ iput(inode);
+ alloc_nid_failed(sbi, inode->i_ino);
+ return err;
+}
+
+static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+ struct super_block *sb = old_dir->i_sb;
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+ struct inode *old_inode = old_dentry->d_inode;
+ struct inode *new_inode = new_dentry->d_inode;
+ struct page *old_dir_page;
+ struct page *old_page;
+ struct f2fs_dir_entry *old_dir_entry = NULL;
+ struct f2fs_dir_entry *old_entry;
+ struct f2fs_dir_entry *new_entry;
+ int err = -ENOENT;
+
+ old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page);
+ if (!old_entry)
+ goto out;
+
+ if (S_ISDIR(old_inode->i_mode)) {
+ err = -EIO;
+ old_dir_entry = f2fs_parent_dir(old_inode, &old_dir_page);
+ if (!old_dir_entry)
+ goto out_old;
+ }
+
+ mutex_lock_op(sbi, RENAME);
+
+ if (new_inode) {
+ struct page *new_page;
+
+ err = -ENOTEMPTY;
+ if (old_dir_entry && !f2fs_empty_dir(new_inode))
+ goto out_dir;
+
+ err = -ENOENT;
+ new_entry = f2fs_find_entry(new_dir, &new_dentry->d_name,
+ &new_page);
+ if (!new_entry)
+ goto out_dir;
+
+ f2fs_set_link(new_dir, new_entry, new_page, old_inode);
+
+ new_inode->i_ctime = CURRENT_TIME;
+ if (old_dir_entry)
+ drop_nlink(new_inode);
+ drop_nlink(new_inode);
+ if (!new_inode->i_nlink)
+ add_orphan_inode(sbi, new_inode->i_ino);
+ f2fs_write_inode(new_inode, NULL);
+ } else {
+ err = f2fs_add_link(new_dentry, old_inode);
+ if (err)
+ goto out_dir;
+
+ if (old_dir_entry) {
+ inc_nlink(new_dir);
+ f2fs_write_inode(new_dir, NULL);
+ }
+ }
+
+ old_inode->i_ctime = CURRENT_TIME;
+ set_inode_flag(F2FS_I(old_inode), FI_NEED_CP);
+ mark_inode_dirty(old_inode);
+
+ f2fs_delete_entry(old_entry, old_page, NULL);
+
+ if (old_dir_entry) {
+ if (old_dir != new_dir) {
+ f2fs_set_link(old_inode, old_dir_entry,
+ old_dir_page, new_dir);
+ } else {
+ kunmap(old_dir_page);
+ f2fs_put_page(old_dir_page, 0);
+ }
+ drop_nlink(old_dir);
+ f2fs_write_inode(old_dir, NULL);
+ }
+
+ mutex_unlock_op(sbi, RENAME);
+
+ f2fs_balance_fs(sbi);
+ return 0;
+
+out_dir:
+ if (old_dir_entry) {
+ kunmap(old_dir_page);
+ f2fs_put_page(old_dir_page, 0);
+ }
+ mutex_unlock_op(sbi, RENAME);
+out_old:
+ kunmap(old_page);
+ f2fs_put_page(old_page, 0);
+out:
+ return err;
+}
+
+const struct inode_operations f2fs_dir_inode_operations = {
+ .create = f2fs_create,
+ .lookup = f2fs_lookup,
+ .link = f2fs_link,
+ .unlink = f2fs_unlink,
+ .symlink = f2fs_symlink,
+ .mkdir = f2fs_mkdir,
+ .rmdir = f2fs_rmdir,
+ .mknod = f2fs_mknod,
+ .rename = f2fs_rename,
+ .setattr = f2fs_setattr,
+ .get_acl = f2fs_get_acl,
+#ifdef CONFIG_F2FS_FS_XATTR
+ .setxattr = generic_setxattr,
+ .getxattr = generic_getxattr,
+ .listxattr = f2fs_listxattr,
+ .removexattr = generic_removexattr,
+#endif
+};
+
+const struct inode_operations f2fs_symlink_inode_operations = {
+ .readlink = generic_readlink,
+ .follow_link = page_follow_link_light,
+ .put_link = page_put_link,
+ .setattr = f2fs_setattr,
+#ifdef CONFIG_F2FS_FS_XATTR
+ .setxattr = generic_setxattr,
+ .getxattr = generic_getxattr,
+ .listxattr = f2fs_listxattr,
+ .removexattr = generic_removexattr,
+#endif
+};
+
+const struct inode_operations f2fs_special_inode_operations = {
+ .setattr = f2fs_setattr,
+ .get_acl = f2fs_get_acl,
+#ifdef CONFIG_F2FS_FS_XATTR
+ .setxattr = generic_setxattr,
+ .getxattr = generic_getxattr,
+ .listxattr = f2fs_listxattr,
+ .removexattr = generic_removexattr,
+#endif
+};
--
1.7.9.5




---
Jaegeuk Kim
Samsung
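
A note on f2fs_new_inode() in the patch above: the group of a new inode follows the usual setgid-directory rule. If the parent directory has S_ISGID set, the child takes the parent's gid, and a new subdirectory also inherits the S_ISGID bit; otherwise the caller's fsgid is used. A minimal sketch of that decision, with hypothetical DEMO_* constants and helper name standing in for the kernel's macros:

```c
/* Hypothetical stand-ins for the usual mode bits on Linux. */
#define DEMO_S_IFMT	0170000u
#define DEMO_S_IFDIR	0040000u
#define DEMO_S_ISGID	0002000u

/*
 * Pick the gid for a new inode and, for new directories under a setgid
 * parent, propagate the setgid bit into *mode.
 */
static unsigned int demo_new_inode_gid(unsigned int dir_mode,
				       unsigned int dir_gid,
				       unsigned int caller_fsgid,
				       unsigned int *mode)
{
	if (dir_mode & DEMO_S_ISGID) {
		if ((*mode & DEMO_S_IFMT) == DEMO_S_IFDIR)
			*mode |= DEMO_S_ISGID;
		return dir_gid;
	}
	return caller_fsgid;
}
```

Regular files created under a setgid directory get the directory's gid but do not get the setgid bit themselves.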


2012-10-23 02:31:39

by Jaegeuk Kim

Subject: [PATCH 12/16 v2] f2fs: add core directory operations

This adds core functions to find, add, delete, and link dentries.
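
The lookup and insert paths in this patch treat a directory as a multi-level hash table: level n has dir_buckets(n) buckets of bucket_blocks(n) blocks each, and the levels are laid out back to back in the directory file. The userspace sketch below mirrors those three helpers so the block arithmetic can be tried in isolation. DEMO_MAX_DIR_HASH_DEPTH is an assumed value, since the real constant is defined outside this patch, and demo_bucket_start() is a hypothetical wrapper added for the sketch.

```c
#define DEMO_MAX_DIR_HASH_DEPTH 63	/* assumed value, for illustration */

/* Number of buckets at a level: doubles per level, then stays constant. */
static unsigned int demo_dir_buckets(unsigned int level)
{
	if (level < DEMO_MAX_DIR_HASH_DEPTH / 2)
		return 1U << level;
	return 1U << ((DEMO_MAX_DIR_HASH_DEPTH / 2) - 1);
}

/* Blocks per bucket: 2 in the lower half of the levels, 4 above. */
static unsigned int demo_bucket_blocks(unsigned int level)
{
	return (level < DEMO_MAX_DIR_HASH_DEPTH / 2) ? 2 : 4;
}

/* First block of bucket idx at the given level. */
static unsigned long demo_dir_block_index(unsigned int level, unsigned int idx)
{
	unsigned long bidx = 0;
	unsigned int i;

	/* skip all blocks that belong to the lower levels */
	for (i = 0; i < level; i++)
		bidx += (unsigned long)demo_dir_buckets(i) *
			demo_bucket_blocks(i);
	bidx += (unsigned long)idx * demo_bucket_blocks(level);
	return bidx;
}

/* First block to scan for a name hash at a level (hypothetical wrapper). */
static unsigned long demo_bucket_start(unsigned int level, unsigned int hash)
{
	return demo_dir_block_index(level, hash % demo_dir_buckets(level));
}
```

Under these assumptions, level 0 occupies blocks 0-1, level 1 blocks 2-5, and level 2 blocks 6-13, so a lookup scans only the two blocks of the bucket selected at each level rather than the whole directory.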

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/dir.c | 657 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/hash.c | 98 +++++++++
2 files changed, 755 insertions(+)
create mode 100644 fs/f2fs/dir.c
create mode 100644 fs/f2fs/hash.c

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
new file mode 100644
index 0000000..f3de333
--- /dev/null
+++ b/fs/f2fs/dir.c
@@ -0,0 +1,657 @@
+/**
+ * fs/f2fs/dir.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+#include "f2fs.h"
+#include "acl.h"
+
+static unsigned long dir_blocks(struct inode *inode)
+{
+ return ((unsigned long long) (i_size_read(inode) + PAGE_CACHE_SIZE - 1))
+ >> PAGE_CACHE_SHIFT;
+}
+
+static unsigned int dir_buckets(unsigned int level)
+{
+ if (level < MAX_DIR_HASH_DEPTH / 2)
+ return 1 << level;
+ else
+ return 1 << ((MAX_DIR_HASH_DEPTH / 2) - 1);
+}
+
+static unsigned int bucket_blocks(unsigned int level)
+{
+ if (level < MAX_DIR_HASH_DEPTH / 2)
+ return 2;
+ else
+ return 4;
+}
+
+static unsigned char f2fs_filetype_table[F2FS_FT_MAX] = {
+ [F2FS_FT_UNKNOWN] = DT_UNKNOWN,
+ [F2FS_FT_REG_FILE] = DT_REG,
+ [F2FS_FT_DIR] = DT_DIR,
+ [F2FS_FT_CHRDEV] = DT_CHR,
+ [F2FS_FT_BLKDEV] = DT_BLK,
+ [F2FS_FT_FIFO] = DT_FIFO,
+ [F2FS_FT_SOCK] = DT_SOCK,
+ [F2FS_FT_SYMLINK] = DT_LNK,
+};
+
+#define S_SHIFT 12
+static unsigned char f2fs_type_by_mode[S_IFMT >> S_SHIFT] = {
+ [S_IFREG >> S_SHIFT] = F2FS_FT_REG_FILE,
+ [S_IFDIR >> S_SHIFT] = F2FS_FT_DIR,
+ [S_IFCHR >> S_SHIFT] = F2FS_FT_CHRDEV,
+ [S_IFBLK >> S_SHIFT] = F2FS_FT_BLKDEV,
+ [S_IFIFO >> S_SHIFT] = F2FS_FT_FIFO,
+ [S_IFSOCK >> S_SHIFT] = F2FS_FT_SOCK,
+ [S_IFLNK >> S_SHIFT] = F2FS_FT_SYMLINK,
+};
+
+static void set_de_type(struct f2fs_dir_entry *de, struct inode *inode)
+{
+ mode_t mode = inode->i_mode;
+ de->file_type = f2fs_type_by_mode[(mode & S_IFMT) >> S_SHIFT];
+}
+
+static unsigned long dir_block_index(unsigned int level, unsigned int idx)
+{
+ unsigned long i;
+ unsigned long bidx = 0;
+
+ for (i = 0; i < level; i++)
+ bidx += dir_buckets(i) * bucket_blocks(i);
+ bidx += idx * bucket_blocks(level);
+ return bidx;
+}
+
+static bool early_match_name(const char *name, int namelen,
+ f2fs_hash_t namehash, struct f2fs_dir_entry *de)
+{
+ if (le16_to_cpu(de->name_len) != namelen)
+ return false;
+
+ if (le32_to_cpu(de->hash_code) != namehash)
+ return false;
+
+ return true;
+}
+
+static struct f2fs_dir_entry *find_in_block(struct page *dentry_page,
+ const char *name, int namelen, int *max_slots,
+ f2fs_hash_t namehash, struct page **res_page)
+{
+ struct f2fs_dir_entry *de;
+ unsigned long bit_pos, end_pos, next_pos;
+ struct f2fs_dentry_block *dentry_blk = kmap(dentry_page);
+ int slots;
+
+ bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
+ NR_DENTRY_IN_BLOCK, 0);
+ while (bit_pos < NR_DENTRY_IN_BLOCK) {
+ de = &dentry_blk->dentry[bit_pos];
+ slots = (le16_to_cpu(de->name_len) + F2FS_NAME_LEN - 1) /
+ F2FS_NAME_LEN;
+
+ if (early_match_name(name, namelen, namehash, de)) {
+ if (!memcmp(dentry_blk->filename[bit_pos],
+ name, namelen)) {
+ *res_page = dentry_page;
+ goto found;
+ }
+ }
+ next_pos = bit_pos + slots;
+ bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
+ NR_DENTRY_IN_BLOCK, next_pos);
+ if (bit_pos >= NR_DENTRY_IN_BLOCK)
+ end_pos = NR_DENTRY_IN_BLOCK;
+ else
+ end_pos = bit_pos;
+ if (*max_slots < end_pos - next_pos)
+ *max_slots = end_pos - next_pos;
+ }
+
+ de = NULL;
+ kunmap(dentry_page);
+found:
+ return de;
+}
+
+static struct f2fs_dir_entry *find_in_level(struct inode *dir,
+ unsigned int level, const char *name, int namelen,
+ f2fs_hash_t namehash, struct page **res_page)
+{
+ int s = (namelen + F2FS_NAME_LEN - 1) / F2FS_NAME_LEN;
+ unsigned int nbucket, nblock;
+ unsigned int bidx, end_block;
+ struct page *dentry_page;
+ struct f2fs_dir_entry *de = NULL;
+ bool room = false;
+ int max_slots = 0;
+
+ BUG_ON(level > MAX_DIR_HASH_DEPTH);
+
+ nbucket = dir_buckets(level);
+ nblock = bucket_blocks(level);
+
+ bidx = dir_block_index(level, namehash % nbucket);
+ end_block = bidx + nblock;
+
+ for (; bidx < end_block; bidx++) {
+ /* no need to allocate new dentry pages to all the indices */
+ dentry_page = find_data_page(dir, bidx);
+ if (IS_ERR(dentry_page)) {
+ room = true;
+ continue;
+ }
+
+ de = find_in_block(dentry_page, name, namelen,
+ &max_slots, namehash, res_page);
+ if (de)
+ break;
+
+ if (max_slots >= s)
+ room = true;
+ f2fs_put_page(dentry_page, 0);
+ }
+
+ if (!de && room && F2FS_I(dir)->chash != namehash) {
+ F2FS_I(dir)->chash = namehash;
+ F2FS_I(dir)->clevel = level;
+ }
+
+ return de;
+}
+
+/*
+ * Find an entry in the specified directory with the wanted name.
+ * It returns the page where the entry was found (as a parameter - res_page),
+ * and the entry itself. Page is returned mapped and unlocked.
+ * Entry is guaranteed to be valid.
+ */
+struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
+ struct qstr *child, struct page **res_page)
+{
+ const char *name = child->name;
+ int namelen = child->len;
+ unsigned long npages = dir_blocks(dir);
+ struct f2fs_dir_entry *de = NULL;
+ f2fs_hash_t name_hash;
+ unsigned int max_depth;
+ unsigned int level;
+
+ if (npages == 0)
+ return NULL;
+
+ *res_page = NULL;
+
+ name_hash = f2fs_dentry_hash(name, namelen);
+ max_depth = F2FS_I(dir)->current_depth;
+
+ for (level = 0; level < max_depth; level++) {
+ de = find_in_level(dir, level, name,
+ namelen, name_hash, res_page);
+ if (de)
+ break;
+ }
+ if (!de && F2FS_I(dir)->chash != name_hash) {
+ F2FS_I(dir)->chash = name_hash;
+ F2FS_I(dir)->clevel = level - 1;
+ }
+ return de;
+}
+
+struct f2fs_dir_entry *f2fs_parent_dir(struct inode *dir, struct page **p)
+{
+ struct page *page = NULL;
+ struct f2fs_dir_entry *de = NULL;
+ struct f2fs_dentry_block *dentry_blk = NULL;
+
+ page = get_lock_data_page(dir, 0);
+ if (IS_ERR(page))
+ return NULL;
+
+ dentry_blk = kmap(page);
+ de = &dentry_blk->dentry[1];
+ *p = page;
+ unlock_page(page);
+ return de;
+}
+
+void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de,
+ struct page *page, struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
+
+ mutex_lock_op(sbi, DENTRY_OPS);
+ lock_page(page);
+ wait_on_page_writeback(page);
+ de->ino = cpu_to_le32(inode->i_ino);
+ set_de_type(de, inode);
+ kunmap(page);
+ set_page_dirty(page);
+ dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+ mark_inode_dirty(dir);
+ f2fs_put_page(page, 1);
+ mutex_unlock_op(sbi, DENTRY_OPS);
+}
+
+void init_dent_inode(struct dentry *dentry, struct page *ipage)
+{
+ struct inode *dir = dentry->d_parent->d_inode;
+ struct f2fs_node *rn;
+
+ if (IS_ERR(ipage))
+ return;
+
+ wait_on_page_writeback(ipage);
+
+ /* copy dentry info. to this inode page */
+ rn = (struct f2fs_node *)page_address(ipage);
+ rn->i.i_pino = cpu_to_le32(dir->i_ino);
+ rn->i.i_namelen = cpu_to_le32(dentry->d_name.len);
+ memcpy(rn->i.i_name, dentry->d_name.name, dentry->d_name.len);
+ set_page_dirty(ipage);
+}
+
+static int init_inode_metadata(struct inode *inode, struct dentry *dentry)
+{
+ struct inode *dir = dentry->d_parent->d_inode;
+
+ if (is_inode_flag_set(F2FS_I(inode), FI_NEW_INODE)) {
+ int err;
+ err = new_inode_page(inode, dentry);
+ if (err)
+ return err;
+
+ if (S_ISDIR(inode->i_mode)) {
+ err = f2fs_make_empty(inode, dir);
+ if (err) {
+ remove_inode_page(inode);
+ return err;
+ }
+ }
+
+ err = f2fs_init_acl(inode, dir);
+ if (err) {
+ remove_inode_page(inode);
+ return err;
+ }
+ } else {
+ struct page *ipage;
+ ipage = get_node_page(F2FS_SB(dir->i_sb), inode->i_ino);
+ if (IS_ERR(ipage))
+ return PTR_ERR(ipage);
+ init_dent_inode(dentry, ipage);
+ f2fs_put_page(ipage, 1);
+ }
+ if (is_inode_flag_set(F2FS_I(inode), FI_INC_LINK)) {
+ inc_nlink(inode);
+ f2fs_write_inode(inode, NULL);
+ }
+ return 0;
+}
+
+static void update_parent_metadata(struct inode *dir, struct inode *inode,
+ unsigned int current_depth)
+{
+ bool need_dir_update = false;
+
+ if (is_inode_flag_set(F2FS_I(inode), FI_NEW_INODE)) {
+ if (S_ISDIR(inode->i_mode)) {
+ inc_nlink(dir);
+ need_dir_update = true;
+ }
+ clear_inode_flag(F2FS_I(inode), FI_NEW_INODE);
+ }
+ dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+ if (F2FS_I(dir)->current_depth != current_depth) {
+ F2FS_I(dir)->current_depth = current_depth;
+ need_dir_update = true;
+ }
+
+ if (need_dir_update)
+ f2fs_write_inode(dir, NULL);
+ else
+ mark_inode_dirty(dir);
+
+ if (is_inode_flag_set(F2FS_I(inode), FI_INC_LINK))
+ clear_inode_flag(F2FS_I(inode), FI_INC_LINK);
+}
+
+static int room_for_filename(struct f2fs_dentry_block *dentry_blk, int slots)
+{
+ int bit_start = 0;
+ int zero_start, zero_end;
+next:
+ zero_start = find_next_zero_bit_le(&dentry_blk->dentry_bitmap,
+ NR_DENTRY_IN_BLOCK,
+ bit_start);
+ if (zero_start >= NR_DENTRY_IN_BLOCK)
+ return NR_DENTRY_IN_BLOCK;
+
+ zero_end = find_next_bit_le(&dentry_blk->dentry_bitmap,
+ NR_DENTRY_IN_BLOCK,
+ zero_start);
+ if (zero_end - zero_start >= slots)
+ return zero_start;
+
+ bit_start = zero_end + 1;
+
+ if (zero_end + 1 >= NR_DENTRY_IN_BLOCK)
+ return NR_DENTRY_IN_BLOCK;
+ goto next;
+}
+
+int f2fs_add_link(struct dentry *dentry, struct inode *inode)
+{
+ unsigned int bit_pos;
+ unsigned int level;
+ unsigned int current_depth;
+ unsigned long bidx, block;
+ f2fs_hash_t dentry_hash;
+ struct f2fs_dir_entry *de;
+ unsigned int nbucket, nblock;
+ struct inode *dir = dentry->d_parent->d_inode;
+ struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
+ const char *name = dentry->d_name.name;
+ int namelen = dentry->d_name.len;
+ struct page *dentry_page = NULL;
+ struct f2fs_dentry_block *dentry_blk = NULL;
+ int slots = (namelen + F2FS_NAME_LEN - 1) / F2FS_NAME_LEN;
+ int err = 0;
+ int i;
+
+ dentry_hash = f2fs_dentry_hash(name, dentry->d_name.len);
+ level = 0;
+ current_depth = F2FS_I(dir)->current_depth;
+ if (F2FS_I(dir)->chash == dentry_hash) {
+ level = F2FS_I(dir)->clevel;
+ F2FS_I(dir)->chash = 0;
+ }
+
+start:
+ if (current_depth == MAX_DIR_HASH_DEPTH)
+ return -ENOSPC;
+
+ /* Increase the depth, if required */
+ if (level == current_depth)
+ ++current_depth;
+
+ nbucket = dir_buckets(level);
+ nblock = bucket_blocks(level);
+
+ bidx = dir_block_index(level, (dentry_hash % nbucket));
+
+ for (block = bidx; block <= (bidx + nblock - 1); block++) {
+ mutex_lock_op(sbi, DENTRY_OPS);
+ dentry_page = get_new_data_page(dir, block, true);
+ if (IS_ERR(dentry_page)) {
+ mutex_unlock_op(sbi, DENTRY_OPS);
+ return PTR_ERR(dentry_page);
+ }
+
+ dentry_blk = kmap(dentry_page);
+ bit_pos = room_for_filename(dentry_blk, slots);
+ if (bit_pos < NR_DENTRY_IN_BLOCK)
+ goto add_dentry;
+
+ kunmap(dentry_page);
+ f2fs_put_page(dentry_page, 1);
+ mutex_unlock_op(sbi, DENTRY_OPS);
+ }
+
+ /* Move to next level to find the empty slot for new dentry */
+ ++level;
+ goto start;
+add_dentry:
+ err = init_inode_metadata(inode, dentry);
+ if (err)
+ goto fail;
+
+ wait_on_page_writeback(dentry_page);
+
+ de = &dentry_blk->dentry[bit_pos];
+ de->hash_code = cpu_to_le32(dentry_hash);
+ de->name_len = cpu_to_le16(namelen);
+ memcpy(dentry_blk->filename[bit_pos], name, namelen);
+ de->ino = cpu_to_le32(inode->i_ino);
+ set_de_type(de, inode);
+ for (i = 0; i < slots; i++)
+ test_and_set_bit_le(bit_pos + i, &dentry_blk->dentry_bitmap);
+ set_page_dirty(dentry_page);
+ update_parent_metadata(dir, inode, current_depth);
+fail:
+ kunmap(dentry_page);
+ f2fs_put_page(dentry_page, 1);
+ mutex_unlock_op(sbi, DENTRY_OPS);
+ return err;
+}
+
+/**
+ * It only removes the dentry from the dentry page; the corresponding name
+ * entry in the name page does not need to be touched during deletion.
+ */
+void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
+ struct inode *inode)
+{
+ struct f2fs_dentry_block *dentry_blk;
+ unsigned int bit_pos;
+ struct address_space *mapping = page->mapping;
+ struct inode *dir = mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
+ int slots = (le16_to_cpu(dentry->name_len) + F2FS_NAME_LEN - 1) /
+ F2FS_NAME_LEN;
+ void *kaddr = page_address(page);
+ int i;
+
+ mutex_lock_op(sbi, DENTRY_OPS);
+
+ lock_page(page);
+ wait_on_page_writeback(page);
+
+ dentry_blk = (struct f2fs_dentry_block *)kaddr;
+ bit_pos = dentry - (struct f2fs_dir_entry *)dentry_blk->dentry;
+ for (i = 0; i < slots; i++)
+ test_and_clear_bit_le(bit_pos + i, &dentry_blk->dentry_bitmap);
+
+ /* Let's check and deallocate this dentry page */
+ bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
+ NR_DENTRY_IN_BLOCK,
+ 0);
+ kunmap(page); /* kunmap - pair of f2fs_find_entry */
+ set_page_dirty(page);
+
+ dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+
+ if (inode && S_ISDIR(inode->i_mode)) {
+ drop_nlink(dir);
+ f2fs_write_inode(dir, NULL);
+ } else {
+ mark_inode_dirty(dir);
+ }
+
+ if (inode) {
+ inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+ drop_nlink(inode);
+ if (S_ISDIR(inode->i_mode)) {
+ drop_nlink(inode);
+ i_size_write(inode, 0);
+ }
+ f2fs_write_inode(inode, NULL);
+ if (inode->i_nlink == 0)
+ add_orphan_inode(sbi, inode->i_ino);
+ }
+
+ if (bit_pos == NR_DENTRY_IN_BLOCK) {
+ loff_t page_offset;
+ truncate_hole(dir, page->index, page->index + 1);
+ clear_page_dirty_for_io(page);
+ ClearPageUptodate(page);
+ dec_page_count(sbi, F2FS_DIRTY_DENTS);
+ inode_dec_dirty_dents(dir);
+ page_offset = page->index << PAGE_CACHE_SHIFT;
+ f2fs_put_page(page, 1);
+ } else {
+ f2fs_put_page(page, 1);
+ }
+ mutex_unlock_op(sbi, DENTRY_OPS);
+}
+
+int f2fs_make_empty(struct inode *inode, struct inode *parent)
+{
+ struct page *dentry_page;
+ struct f2fs_dentry_block *dentry_blk;
+ struct f2fs_dir_entry *de;
+ void *kaddr;
+
+ dentry_page = get_new_data_page(inode, 0, true);
+ if (IS_ERR(dentry_page))
+ return PTR_ERR(dentry_page);
+
+ kaddr = kmap_atomic(dentry_page);
+ dentry_blk = (struct f2fs_dentry_block *)kaddr;
+
+ de = &dentry_blk->dentry[0];
+ de->name_len = cpu_to_le16(1);
+ de->hash_code = 0;
+ de->ino = cpu_to_le32(inode->i_ino);
+ memcpy(dentry_blk->filename[0], ".", 1);
+ set_de_type(de, inode);
+
+ de = &dentry_blk->dentry[1];
+ de->hash_code = 0;
+ de->name_len = cpu_to_le16(2);
+ de->ino = cpu_to_le32(parent->i_ino);
+ memcpy(dentry_blk->filename[1], "..", 2);
+ set_de_type(de, inode);
+
+ test_and_set_bit_le(0, &dentry_blk->dentry_bitmap);
+ test_and_set_bit_le(1, &dentry_blk->dentry_bitmap);
+ kunmap_atomic(kaddr);
+
+ set_page_dirty(dentry_page);
+ f2fs_put_page(dentry_page, 1);
+ return 0;
+}
+
+bool f2fs_empty_dir(struct inode *dir)
+{
+ unsigned long bidx;
+ struct page *dentry_page;
+ unsigned int bit_pos;
+ struct f2fs_dentry_block *dentry_blk;
+ unsigned long nblock = dir_blocks(dir);
+
+ for (bidx = 0; bidx < nblock; bidx++) {
+ void *kaddr;
+ dentry_page = get_lock_data_page(dir, bidx);
+ if (IS_ERR(dentry_page)) {
+ if (PTR_ERR(dentry_page) == -ENOENT)
+ continue;
+ else
+ return false;
+ }
+
+ kaddr = kmap_atomic(dentry_page);
+ dentry_blk = (struct f2fs_dentry_block *)kaddr;
+ if (bidx == 0)
+ bit_pos = 2;
+ else
+ bit_pos = 0;
+ bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
+ NR_DENTRY_IN_BLOCK,
+ bit_pos);
+ kunmap_atomic(kaddr);
+
+ f2fs_put_page(dentry_page, 1);
+
+ if (bit_pos < NR_DENTRY_IN_BLOCK)
+ return false;
+ }
+ return true;
+}
+
+static int f2fs_readdir(struct file *file, void *dirent, filldir_t filldir)
+{
+ unsigned long pos = file->f_pos;
+ struct inode *inode = file->f_dentry->d_inode;
+ unsigned long npages = dir_blocks(inode);
+ unsigned char *types = NULL;
+ unsigned int bit_pos = 0, start_bit_pos = 0;
+ int over = 0;
+ struct f2fs_dentry_block *dentry_blk = NULL;
+ struct f2fs_dir_entry *de = NULL;
+ struct page *dentry_page = NULL;
+ unsigned int n = 0;
+ unsigned char *ptr = NULL;
+ unsigned char d_type = DT_UNKNOWN;
+ int slots;
+
+ types = f2fs_filetype_table;
+ bit_pos = (pos % NR_DENTRY_IN_BLOCK);
+ n = (pos / NR_DENTRY_IN_BLOCK);
+
+ for ( ; n < npages; n++) {
+ dentry_page = get_lock_data_page(inode, n);
+ if (IS_ERR(dentry_page))
+ continue;
+
+ start_bit_pos = bit_pos;
+ dentry_blk = kmap(dentry_page);
+ while (bit_pos < NR_DENTRY_IN_BLOCK) {
+ d_type = DT_UNKNOWN;
+ bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
+ NR_DENTRY_IN_BLOCK,
+ bit_pos);
+ if (bit_pos >= NR_DENTRY_IN_BLOCK)
+ break;
+
+ de = &dentry_blk->dentry[bit_pos];
+ if (types && de->file_type < F2FS_FT_MAX)
+ d_type = types[de->file_type];
+
+ ptr = dentry_blk->filename[bit_pos];
+ over = filldir(dirent,
+ dentry_blk->filename[bit_pos],
+ le16_to_cpu(de->name_len),
+ (n * NR_DENTRY_IN_BLOCK) + bit_pos,
+ le32_to_cpu(de->ino), d_type);
+ if (over) {
+ file->f_pos += bit_pos - start_bit_pos;
+ goto success;
+ }
+ slots = (le16_to_cpu(de->name_len) + F2FS_NAME_LEN - 1)
+ / F2FS_NAME_LEN;
+ bit_pos += slots;
+ }
+ bit_pos = 0;
+ file->f_pos = (n + 1) * NR_DENTRY_IN_BLOCK;
+ kunmap(dentry_page);
+ f2fs_put_page(dentry_page, 1);
+ dentry_page = NULL;
+ }
+success:
+ if (dentry_page && !IS_ERR(dentry_page)) {
+ kunmap(dentry_page);
+ f2fs_put_page(dentry_page, 1);
+ }
+
+ return 0;
+}
+
+const struct file_operations f2fs_dir_operations = {
+ .read = generic_read_dir,
+ .readdir = f2fs_readdir,
+ .fsync = f2fs_sync_file,
+ .unlocked_ioctl = f2fs_ioctl,
+};
diff --git a/fs/f2fs/hash.c b/fs/f2fs/hash.c
new file mode 100644
index 0000000..098a196
--- /dev/null
+++ b/fs/f2fs/hash.c
@@ -0,0 +1,98 @@
+/**
+ * fs/f2fs/hash.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * Portions of this code from linux/fs/ext3/hash.c
+ *
+ * Copyright (C) 2002 by Theodore Ts'o
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/types.h>
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+#include <linux/cryptohash.h>
+#include <linux/pagemap.h>
+
+#include "f2fs.h"
+
+/*
+ * Hashing code copied from ext3
+ */
+#define DELTA 0x9E3779B9
+
+static void TEA_transform(unsigned int buf[4], unsigned int const in[])
+{
+ __u32 sum = 0;
+ __u32 b0 = buf[0], b1 = buf[1];
+ __u32 a = in[0], b = in[1], c = in[2], d = in[3];
+ int n = 16;
+
+ do {
+ sum += DELTA;
+ b0 += ((b1 << 4)+a) ^ (b1+sum) ^ ((b1 >> 5)+b);
+ b1 += ((b0 << 4)+c) ^ (b0+sum) ^ ((b0 >> 5)+d);
+ } while (--n);
+
+ buf[0] += b0;
+ buf[1] += b1;
+}
+
+static void str2hashbuf(const char *msg, int len, unsigned int *buf, int num)
+{
+ unsigned pad, val;
+ int i;
+
+ pad = (__u32)len | ((__u32)len << 8);
+ pad |= pad << 16;
+
+ val = pad;
+ if (len > num * 4)
+ len = num * 4;
+ for (i = 0; i < len; i++) {
+ if ((i % 4) == 0)
+ val = pad;
+ val = msg[i] + (val << 8);
+ if ((i % 4) == 3) {
+ *buf++ = val;
+ val = pad;
+ num--;
+ }
+ }
+ if (--num >= 0)
+ *buf++ = val;
+ while (--num >= 0)
+ *buf++ = pad;
+}
+
+f2fs_hash_t f2fs_dentry_hash(const char *name, int len)
+{
+ __u32 hash, minor_hash;
+ f2fs_hash_t f2fs_hash;
+ const char *p;
+ __u32 in[8], buf[4];
+
+ /* Initialize the default seed for the hash checksum functions */
+ buf[0] = 0x67452301;
+ buf[1] = 0xefcdab89;
+ buf[2] = 0x98badcfe;
+ buf[3] = 0x10325476;
+
+ p = name;
+ while (len > 0) {
+ str2hashbuf(p, len, in, 4);
+ TEA_transform(buf, in);
+ len -= 16;
+ p += 16;
+ }
+ hash = buf[0];
+ minor_hash = buf[1];
+
+ f2fs_hash = hash;
+ f2fs_hash &= ~F2FS_HASH_COL_BIT;
+ return f2fs_hash;
+}
--
1.7.9.5
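
Note on the directory code in this patch: f2fs_add_link() reserves one bitmap
slot per F2FS_NAME_LEN bytes of file name, and room_for_filename() looks for a
run of that many clear bits in the dentry bitmap. The following is a minimal
userspace sketch of that search, not the kernel code. SLOT_LEN, NR_SLOTS,
name_slots(), slot_used() and find_room() are names and values made up for
illustration; the real constants are defined elsewhere in the patch set, and
the kernel version relies on find_next_zero_bit_le() and find_next_bit_le().

```c
#include <stddef.h>

/* Assumed values for illustration only; not taken from f2fs_fs.h. */
#define SLOT_LEN 8	/* name bytes per slot (stand-in for F2FS_NAME_LEN) */
#define NR_SLOTS 32	/* slots per block (stand-in for NR_DENTRY_IN_BLOCK) */

/* Number of slots needed for a name of the given length (rounds up). */
static int name_slots(int namelen)
{
	return (namelen + SLOT_LEN - 1) / SLOT_LEN;
}

/* Test one bit of a byte-array bitmap, lowest bit of byte 0 first. */
static int slot_used(const unsigned char *map, int bit)
{
	return (map[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Return the first position where `slots` consecutive bits are clear,
 * or NR_SLOTS if the block has no such run.
 */
static int find_room(const unsigned char *map, int slots)
{
	int start = 0;

	while (start < NR_SLOTS) {
		int end;

		/* Skip slots that are already in use. */
		while (start < NR_SLOTS && slot_used(map, start))
			start++;
		if (start >= NR_SLOTS)
			break;

		/* Measure the length of the free run starting here. */
		end = start;
		while (end < NR_SLOTS && !slot_used(map, end))
			end++;
		if (end - start >= slots)
			return start;

		/* Run too short: continue after the used bit that ended it. */
		start = end + 1;
	}
	return NR_SLOTS;
}
```

With a bitmap whose bits 0, 1 and 4 are set, a name needing two slots fits at
position 2, while a name needing three slots has to go to position 5. A return
value of NR_SLOTS corresponds to the case where f2fs_add_link() moves on to the
next block or the next hash level.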




---
Jaegeuk Kim
Samsung


2012-10-23 02:31:55

by Jaegeuk Kim

Subject: [PATCH 13/16 v2] f2fs: add xattr and acl functionalities

This implements xattr and acl functionalities.

- F2FS uses a node page to store extended attributes.

Signed-off-by: Changman Lee <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
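Note on the xattr layout: in f2fs_getxattr() and f2fs_setxattr() below, each
entry carries a name index, a name length and a value size, the value bytes
follow the name bytes directly, and a lookup matches on index plus name. The
following is a rough userspace sketch of that packing, not the kernel code. It
is host-endian, it omits the block header, and struct xentry, XALIGN(),
xattr_append() and xattr_find() are hypothetical names. The 4-byte padding and
the all-zero terminator word are assumptions, because the XATTR_ALIGN and
IS_XATTR_LAST_ENTRY macros are not visible in this part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, host-endian stand-in for struct f2fs_xattr_entry. */
struct xentry {
	uint8_t  name_index;
	uint8_t  name_len;
	uint16_t value_size;
	char     name[];	/* name bytes, immediately followed by the value */
};

/* Assumption: entries are padded to a multiple of 4 bytes. */
#define XALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)
#define XENTRY_SIZE(e) \
	XALIGN(sizeof(struct xentry) + (e)->name_len + (e)->value_size)

/* Assumption: four zero bytes mark the end of the entry list. */
static int xentry_is_last(const struct xentry *e)
{
	uint32_t word;

	memcpy(&word, e, sizeof(word));
	return word == 0;
}

static struct xentry *xentry_next(struct xentry *e)
{
	return (struct xentry *)((char *)e + XENTRY_SIZE(e));
}

/* Append an entry after the last one; 0 on success, -1 if it does not fit. */
static int xattr_append(void *area, size_t area_size, int index,
			const char *name, const void *value, size_t value_len)
{
	struct xentry *e = area;
	size_t name_len = strlen(name);
	size_t need = XALIGN(sizeof(*e) + name_len + value_len);
	size_t used;

	while (!xentry_is_last(e))
		e = xentry_next(e);

	/* Space check: keep room for the zero terminator word. */
	used = (size_t)((char *)e - (char *)area);
	if (used + need + sizeof(uint32_t) > area_size)
		return -1;

	memset(e, 0, need);
	e->name_index = (uint8_t)index;
	e->name_len = (uint8_t)name_len;
	e->value_size = (uint16_t)value_len;
	memcpy(e->name, name, name_len);
	memcpy(e->name + name_len, value, value_len);
	return 0;
}

/* Return a pointer to the value of (index, name), or NULL if absent. */
static const void *xattr_find(void *area, int index, const char *name,
			      size_t *value_len)
{
	size_t name_len = strlen(name);
	struct xentry *e;

	for (e = area; !xentry_is_last(e); e = xentry_next(e)) {
		if (e->name_index != index || e->name_len != name_len)
			continue;
		if (memcmp(e->name, name, name_len) == 0) {
			*value_len = e->value_size;
			return e->name + e->name_len;
		}
	}
	return NULL;
}
```

The sketch expects a zero-filled area (for example from calloc()) so that the
terminator is already in place. Two entries with the same name but different
name indexes are distinct, which is how the user and trusted namespaces can
share one node page.
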
fs/f2fs/acl.c | 465 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/acl.h | 57 +++++++
fs/f2fs/xattr.c | 389 ++++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/xattr.h | 145 +++++++++++++++++
4 files changed, 1056 insertions(+)
create mode 100644 fs/f2fs/acl.c
create mode 100644 fs/f2fs/acl.h
create mode 100644 fs/f2fs/xattr.c
create mode 100644 fs/f2fs/xattr.h
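
Note on the on-disk ACL size: f2fs_acl_size() and f2fs_acl_count() in acl.c
treat the first four entries as short entries (tag and permission only) and
every further entry as a full entry that also carries an id. The following is a
small userspace sketch of that arithmetic, not the kernel code. The byte sizes
are read off the struct definitions in acl.h (one __le32 header, two __le16
fields per short entry, plus one __le32 id per full entry); acl_size() and
acl_count() are hypothetical names, and the explicit check for a size smaller
than the header is an addition of this sketch.

```c
#include <stddef.h>

#define ACL_HDR_SIZE   4	/* a_version */
#define ACL_SHORT_SIZE 4	/* e_tag + e_perm */
#define ACL_FULL_SIZE  8	/* e_tag + e_perm + e_id */

/* On-disk size in bytes of an ACL with `count` entries. */
static size_t acl_size(int count)
{
	if (count <= 4)
		return ACL_HDR_SIZE + (size_t)count * ACL_SHORT_SIZE;
	return ACL_HDR_SIZE + 4 * ACL_SHORT_SIZE +
		(size_t)(count - 4) * ACL_FULL_SIZE;
}

/* Inverse of acl_size(): entry count for a given size, or -1 if invalid. */
static int acl_count(size_t size)
{
	size_t body;

	if (size < ACL_HDR_SIZE)
		return -1;
	body = size - ACL_HDR_SIZE;

	if (body < 4 * ACL_SHORT_SIZE) {
		/* Fewer than four entries: all of them are short. */
		if (body % ACL_SHORT_SIZE)
			return -1;
		return (int)(body / ACL_SHORT_SIZE);
	}

	/* Four short entries, then a whole number of full entries. */
	body -= 4 * ACL_SHORT_SIZE;
	if (body % ACL_FULL_SIZE)
		return -1;
	return (int)(body / ACL_FULL_SIZE) + 4;
}
```

For example, an ACL with six entries takes 4 + 16 + 16 = 36 bytes, and a
36-byte value maps back to six entries. A 22-byte value does not correspond to
any entry count and is rejected, which is the case where f2fs_acl_from_disk()
returns -EINVAL.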

diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
new file mode 100644
index 0000000..dff2a2b
--- /dev/null
+++ b/fs/f2fs/acl.c
@@ -0,0 +1,465 @@
+/**
+ * fs/f2fs/acl.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * Portions of this code from linux/fs/ext2/acl.c
+ *
+ * Copyright (C) 2001-2003 Andreas Gruenbacher, <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/f2fs_fs.h>
+#include "f2fs.h"
+#include "xattr.h"
+#include "acl.h"
+
+#define get_inode_mode(i) ((is_inode_flag_set(F2FS_I(i), FI_ACL_MODE)) ? \
+ (F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
+
+static inline size_t f2fs_acl_size(int count)
+{
+ if (count <= 4) {
+ return sizeof(struct f2fs_acl_header) +
+ count * sizeof(struct f2fs_acl_entry_short);
+ } else {
+ return sizeof(struct f2fs_acl_header) +
+ 4 * sizeof(struct f2fs_acl_entry_short) +
+ (count - 4) * sizeof(struct f2fs_acl_entry);
+ }
+}
+
+static inline int f2fs_acl_count(size_t size)
+{
+ ssize_t s;
+ size -= sizeof(struct f2fs_acl_header);
+ s = size - 4 * sizeof(struct f2fs_acl_entry_short);
+ if (s < 0) {
+ if (size % sizeof(struct f2fs_acl_entry_short))
+ return -1;
+ return size / sizeof(struct f2fs_acl_entry_short);
+ } else {
+ if (s % sizeof(struct f2fs_acl_entry))
+ return -1;
+ return s / sizeof(struct f2fs_acl_entry) + 4;
+ }
+}
+
+static struct posix_acl *f2fs_acl_from_disk(const char *value, size_t size)
+{
+ int i, count;
+ struct posix_acl *acl;
+ struct f2fs_acl_header *hdr = (struct f2fs_acl_header *)value;
+ struct f2fs_acl_entry *entry = (struct f2fs_acl_entry *)(hdr + 1);
+ const char *end = value + size;
+
+ if (hdr->a_version != cpu_to_le32(F2FS_ACL_VERSION))
+ return ERR_PTR(-EINVAL);
+
+ count = f2fs_acl_count(size);
+ if (count < 0)
+ return ERR_PTR(-EINVAL);
+ if (count == 0)
+ return NULL;
+
+ acl = posix_acl_alloc(count, GFP_KERNEL);
+ if (!acl)
+ return ERR_PTR(-ENOMEM);
+
+ for (i = 0; i < count; i++) {
+
+ if ((char *)entry > end)
+ goto fail;
+
+ acl->a_entries[i].e_tag = le16_to_cpu(entry->e_tag);
+ acl->a_entries[i].e_perm = le16_to_cpu(entry->e_perm);
+
+ switch (acl->a_entries[i].e_tag) {
+ case ACL_USER_OBJ:
+ case ACL_GROUP_OBJ:
+ case ACL_MASK:
+ case ACL_OTHER:
+ acl->a_entries[i].e_id = ACL_UNDEFINED_ID;
+ entry = (struct f2fs_acl_entry *)((char *)entry +
+ sizeof(struct f2fs_acl_entry_short));
+ break;
+
+ case ACL_USER:
+ acl->a_entries[i].e_uid =
+ make_kuid(&init_user_ns,
+ le32_to_cpu(entry->e_id));
+ entry = (struct f2fs_acl_entry *)((char *)entry +
+ sizeof(struct f2fs_acl_entry));
+ break;
+ case ACL_GROUP:
+ acl->a_entries[i].e_gid =
+ make_kgid(&init_user_ns,
+ le32_to_cpu(entry->e_id));
+ entry = (struct f2fs_acl_entry *)((char *)entry +
+ sizeof(struct f2fs_acl_entry));
+ break;
+ default:
+ goto fail;
+ }
+ }
+ if ((char *)entry != end)
+ goto fail;
+ return acl;
+fail:
+ posix_acl_release(acl);
+ return ERR_PTR(-EINVAL);
+}
+
+static void *f2fs_acl_to_disk(const struct posix_acl *acl, size_t *size)
+{
+ struct f2fs_acl_header *f2fs_acl;
+ struct f2fs_acl_entry *entry;
+ int i;
+
+ f2fs_acl = kmalloc(sizeof(struct f2fs_acl_header) + acl->a_count *
+ sizeof(struct f2fs_acl_entry), GFP_KERNEL);
+ if (!f2fs_acl)
+ return ERR_PTR(-ENOMEM);
+
+ f2fs_acl->a_version = cpu_to_le32(F2FS_ACL_VERSION);
+ entry = (struct f2fs_acl_entry *)(f2fs_acl + 1);
+
+ for (i = 0; i < acl->a_count; i++) {
+
+ entry->e_tag = cpu_to_le16(acl->a_entries[i].e_tag);
+ entry->e_perm = cpu_to_le16(acl->a_entries[i].e_perm);
+
+ switch (acl->a_entries[i].e_tag) {
+ case ACL_USER:
+ entry->e_id = cpu_to_le32(
+ from_kuid(&init_user_ns,
+ acl->a_entries[i].e_uid));
+ entry = (struct f2fs_acl_entry *)((char *)entry +
+ sizeof(struct f2fs_acl_entry));
+ break;
+ case ACL_GROUP:
+ entry->e_id = cpu_to_le32(
+ from_kgid(&init_user_ns,
+ acl->a_entries[i].e_gid));
+ entry = (struct f2fs_acl_entry *)((char *)entry +
+ sizeof(struct f2fs_acl_entry));
+ break;
+ case ACL_USER_OBJ:
+ case ACL_GROUP_OBJ:
+ case ACL_MASK:
+ case ACL_OTHER:
+ entry = (struct f2fs_acl_entry *)((char *)entry +
+ sizeof(struct f2fs_acl_entry_short));
+ break;
+ default:
+ goto fail;
+ }
+ }
+ *size = f2fs_acl_size(acl->a_count);
+ return (void *)f2fs_acl;
+
+fail:
+ kfree(f2fs_acl);
+ return ERR_PTR(-EINVAL);
+}
+
+struct posix_acl *f2fs_get_acl(struct inode *inode, int type)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ int name_index = F2FS_XATTR_INDEX_POSIX_ACL_DEFAULT;
+ void *value = NULL;
+ struct posix_acl *acl;
+ int retval;
+
+ if (!test_opt(sbi, POSIX_ACL))
+ return NULL;
+
+ acl = get_cached_acl(inode, type);
+ if (acl != ACL_NOT_CACHED)
+ return acl;
+
+ if (type == ACL_TYPE_ACCESS)
+ name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS;
+
+ retval = f2fs_getxattr(inode, name_index, "", NULL, 0);
+ if (retval > 0) {
+ value = kmalloc(retval, GFP_KERNEL);
+ if (!value)
+ return ERR_PTR(-ENOMEM);
+ retval = f2fs_getxattr(inode, name_index, "", value, retval);
+ }
+
+ if (retval < 0) {
+ if (retval == -ENODATA)
+ acl = NULL;
+ else
+ acl = ERR_PTR(retval);
+ } else {
+ acl = f2fs_acl_from_disk(value, retval);
+ }
+ kfree(value);
+ if (!IS_ERR(acl))
+ set_cached_acl(inode, type, acl);
+
+ return acl;
+}
+
+static int f2fs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ int name_index;
+ void *value = NULL;
+ size_t size = 0;
+ int error;
+
+ if (!test_opt(sbi, POSIX_ACL))
+ return 0;
+ if (S_ISLNK(inode->i_mode))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case ACL_TYPE_ACCESS:
+ name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS;
+ if (acl) {
+ error = posix_acl_equiv_mode(acl, &inode->i_mode);
+ if (error < 0)
+ return error;
+ set_acl_inode(fi, inode->i_mode);
+ if (error == 0)
+ acl = NULL;
+ }
+ break;
+
+ case ACL_TYPE_DEFAULT:
+ name_index = F2FS_XATTR_INDEX_POSIX_ACL_DEFAULT;
+ if (!S_ISDIR(inode->i_mode))
+ return acl ? -EACCES : 0;
+ break;
+
+ default:
+ return -EINVAL;
+ }
+
+ if (acl) {
+ value = f2fs_acl_to_disk(acl, &size);
+ if (IS_ERR(value)) {
+ cond_clear_inode_flag(fi, FI_ACL_MODE);
+ return (int)PTR_ERR(value);
+ }
+ }
+
+ error = f2fs_setxattr(inode, name_index, "", value, size);
+
+ kfree(value);
+ if (!error)
+ set_cached_acl(inode, type, acl);
+
+ cond_clear_inode_flag(fi, FI_ACL_MODE);
+ return error;
+}
+
+int f2fs_init_acl(struct inode *inode, struct inode *dir)
+{
+ struct posix_acl *acl = NULL;
+ struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
+ int error = 0;
+
+ if (!S_ISLNK(inode->i_mode)) {
+ if (test_opt(sbi, POSIX_ACL)) {
+ acl = f2fs_get_acl(dir, ACL_TYPE_DEFAULT);
+ if (IS_ERR(acl))
+ return PTR_ERR(acl);
+ }
+ if (!acl)
+ inode->i_mode &= ~current_umask();
+ }
+
+ if (test_opt(sbi, POSIX_ACL) && acl) {
+
+ if (S_ISDIR(inode->i_mode)) {
+ error = f2fs_set_acl(inode, ACL_TYPE_DEFAULT, acl);
+ if (error)
+ goto cleanup;
+ }
+ error = posix_acl_create(&acl, GFP_KERNEL, &inode->i_mode);
+ if (error < 0)
+ return error;
+ if (error > 0)
+ error = f2fs_set_acl(inode, ACL_TYPE_ACCESS, acl);
+ }
+cleanup:
+ posix_acl_release(acl);
+ return error;
+}
+
+int f2fs_acl_chmod(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct posix_acl *acl;
+ int error;
+ mode_t mode = get_inode_mode(inode);
+
+ if (!test_opt(sbi, POSIX_ACL))
+ return 0;
+ if (S_ISLNK(mode))
+ return -EOPNOTSUPP;
+
+ acl = f2fs_get_acl(inode, ACL_TYPE_ACCESS);
+ if (IS_ERR(acl) || !acl)
+ return PTR_ERR(acl);
+
+ error = posix_acl_chmod(&acl, GFP_KERNEL, mode);
+ if (error)
+ return error;
+ error = f2fs_set_acl(inode, ACL_TYPE_ACCESS, acl);
+ posix_acl_release(acl);
+ return error;
+}
+
+static size_t f2fs_xattr_list_acl(struct dentry *dentry, char *list,
+ size_t list_size, const char *name, size_t name_len, int type)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
+ const char *xname = POSIX_ACL_XATTR_DEFAULT;
+ size_t size;
+
+ if (!test_opt(sbi, POSIX_ACL))
+ return 0;
+
+ if (type == ACL_TYPE_ACCESS)
+ xname = POSIX_ACL_XATTR_ACCESS;
+
+ size = strlen(xname) + 1;
+ if (list && size <= list_size)
+ memcpy(list, xname, size);
+ return size;
+}
+
+static int f2fs_xattr_get_acl(struct dentry *dentry, const char *name,
+ void *buffer, size_t size, int type)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
+ struct posix_acl *acl;
+ int error;
+
+ if (strcmp(name, "") != 0)
+ return -EINVAL;
+ if (!test_opt(sbi, POSIX_ACL))
+ return -EOPNOTSUPP;
+
+ acl = f2fs_get_acl(dentry->d_inode, type);
+ if (IS_ERR(acl))
+ return PTR_ERR(acl);
+ if (!acl)
+ return -ENODATA;
+ error = posix_acl_to_xattr(&init_user_ns, acl, buffer, size);
+ posix_acl_release(acl);
+
+ return error;
+}
+
+static int f2fs_xattr_set_acl(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags, int type)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
+ struct inode *inode = dentry->d_inode;
+ struct posix_acl *acl = NULL;
+ int error;
+
+ if (strcmp(name, "") != 0)
+ return -EINVAL;
+ if (!test_opt(sbi, POSIX_ACL))
+ return -EOPNOTSUPP;
+ if (!inode_owner_or_capable(inode))
+ return -EPERM;
+
+ if (value) {
+ acl = posix_acl_from_xattr(&init_user_ns, value, size);
+ if (IS_ERR(acl))
+ return PTR_ERR(acl);
+ if (acl) {
+ error = posix_acl_valid(acl);
+ if (error)
+ goto release_and_out;
+ }
+ } else {
+ acl = NULL;
+ }
+
+ error = f2fs_set_acl(inode, type, acl);
+
+release_and_out:
+ posix_acl_release(acl);
+ return error;
+}
+
+const struct xattr_handler f2fs_xattr_acl_default_handler = {
+ .prefix = POSIX_ACL_XATTR_DEFAULT,
+ .flags = ACL_TYPE_DEFAULT,
+ .list = f2fs_xattr_list_acl,
+ .get = f2fs_xattr_get_acl,
+ .set = f2fs_xattr_set_acl,
+};
+
+const struct xattr_handler f2fs_xattr_acl_access_handler = {
+ .prefix = POSIX_ACL_XATTR_ACCESS,
+ .flags = ACL_TYPE_ACCESS,
+ .list = f2fs_xattr_list_acl,
+ .get = f2fs_xattr_get_acl,
+ .set = f2fs_xattr_set_acl,
+};
+
+static size_t f2fs_xattr_advise_list(struct dentry *dentry, char *list,
+ size_t list_size, const char *name, size_t name_len, int type)
+{
+ const char *xname = F2FS_SYSTEM_ADVISE_PREFIX;
+ size_t size;
+
+ if (type != F2FS_XATTR_INDEX_ADVISE)
+ return 0;
+
+ size = strlen(xname) + 1;
+ if (list && size <= list_size)
+ memcpy(list, xname, size);
+ return size;
+}
+
+static int f2fs_xattr_advise_get(struct dentry *dentry, const char *name,
+ void *buffer, size_t size, int type)
+{
+ struct inode *inode = dentry->d_inode;
+
+ if (strcmp(name, "") != 0)
+ return -EINVAL;
+
+ *((char *)buffer) = F2FS_I(inode)->i_advise;
+ return sizeof(char);
+}
+
+static int f2fs_xattr_advise_set(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags, int type)
+{
+ struct inode *inode = dentry->d_inode;
+
+ if (strcmp(name, "") != 0)
+ return -EINVAL;
+ if (!inode_owner_or_capable(inode))
+ return -EPERM;
+ if (value == NULL)
+ return -EINVAL;
+
+ F2FS_I(inode)->i_advise |= *(char *)value;
+ return 0;
+}
+
+const struct xattr_handler f2fs_xattr_advise_handler = {
+ .prefix = F2FS_SYSTEM_ADVISE_PREFIX,
+ .flags = F2FS_XATTR_INDEX_ADVISE,
+ .list = f2fs_xattr_advise_list,
+ .get = f2fs_xattr_advise_get,
+ .set = f2fs_xattr_advise_set,
+};
diff --git a/fs/f2fs/acl.h b/fs/f2fs/acl.h
new file mode 100644
index 0000000..c97675e
--- /dev/null
+++ b/fs/f2fs/acl.h
@@ -0,0 +1,57 @@
+/**
+ * fs/f2fs/acl.h
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * Portions of this code from linux/fs/ext2/acl.h
+ *
+ * Copyright (C) 2001-2003 Andreas Gruenbacher, <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __F2FS_ACL_H__
+#define __F2FS_ACL_H__
+
+#include <linux/posix_acl_xattr.h>
+
+#define F2FS_ACL_VERSION 0x0001
+
+struct f2fs_acl_entry {
+ __le16 e_tag;
+ __le16 e_perm;
+ __le32 e_id;
+};
+
+struct f2fs_acl_entry_short {
+ __le16 e_tag;
+ __le16 e_perm;
+};
+
+struct f2fs_acl_header {
+ __le32 a_version;
+};
+
+#ifdef CONFIG_F2FS_FS_POSIX_ACL
+
+extern struct posix_acl *f2fs_get_acl(struct inode *inode, int type);
+extern int f2fs_acl_chmod(struct inode *inode);
+extern int f2fs_init_acl(struct inode *inode, struct inode *dir);
+#else
+#define f2fs_check_acl NULL
+#define f2fs_get_acl NULL
+#define f2fs_set_acl NULL
+
+static inline int f2fs_acl_chmod(struct inode *inode)
+{
+ return 0;
+}
+
+static inline int f2fs_init_acl(struct inode *inode, struct inode *dir)
+{
+ return 0;
+}
+#endif
+#endif /* __F2FS_ACL_H__ */
diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
new file mode 100644
index 0000000..aca50fe
--- /dev/null
+++ b/fs/f2fs/xattr.c
@@ -0,0 +1,389 @@
+/**
+ * fs/f2fs/xattr.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * Portions of this code from linux/fs/ext2/xattr.c
+ *
+ * Copyright (C) 2001-2003 Andreas Gruenbacher <[email protected]>
+ *
+ * Fix by Harrison Xing <[email protected]>.
+ * Extended attributes for symlinks and special files added per
+ * suggestion of Luka Renko <[email protected]>.
+ * xattr consolidation Copyright (c) 2004 James Morris <[email protected]>,
+ * Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/rwsem.h>
+#include <linux/f2fs_fs.h>
+#include "f2fs.h"
+#include "xattr.h"
+
+static size_t f2fs_xattr_generic_list(struct dentry *dentry, char *list,
+ size_t list_size, const char *name, size_t name_len, int type)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
+ int total_len, prefix_len = 0;
+ const char *prefix = NULL;
+
+ switch (type) {
+ case F2FS_XATTR_INDEX_USER:
+ if (!test_opt(sbi, XATTR_USER))
+ return -EOPNOTSUPP;
+ prefix = XATTR_USER_PREFIX;
+ prefix_len = XATTR_USER_PREFIX_LEN;
+ break;
+ case F2FS_XATTR_INDEX_TRUSTED:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+ prefix = XATTR_TRUSTED_PREFIX;
+ prefix_len = XATTR_TRUSTED_PREFIX_LEN;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ total_len = prefix_len + name_len + 1;
+ if (list && total_len <= list_size) {
+ memcpy(list, prefix, prefix_len);
+ memcpy(list+prefix_len, name, name_len);
+ list[prefix_len + name_len] = '\0';
+ }
+ return total_len;
+}
+
+static int f2fs_xattr_generic_get(struct dentry *dentry, const char *name,
+ void *buffer, size_t size, int type)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
+
+ switch (type) {
+ case F2FS_XATTR_INDEX_USER:
+ if (!test_opt(sbi, XATTR_USER))
+ return -EOPNOTSUPP;
+ break;
+ case F2FS_XATTR_INDEX_TRUSTED:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+ break;
+ default:
+ return -EINVAL;
+ }
+ if (strcmp(name, "") == 0)
+ return -EINVAL;
+ return f2fs_getxattr(dentry->d_inode, type, name,
+ buffer, size);
+}
+
+static int f2fs_xattr_generic_set(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags, int type)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
+
+ switch (type) {
+ case F2FS_XATTR_INDEX_USER:
+ if (!test_opt(sbi, XATTR_USER))
+ return -EOPNOTSUPP;
+ break;
+ case F2FS_XATTR_INDEX_TRUSTED:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+ break;
+ default:
+ return -EINVAL;
+ }
+ if (strcmp(name, "") == 0)
+ return -EINVAL;
+
+ return f2fs_setxattr(dentry->d_inode, type, name, value, size);
+}
+
+const struct xattr_handler f2fs_xattr_user_handler = {
+ .prefix = XATTR_USER_PREFIX,
+ .flags = F2FS_XATTR_INDEX_USER,
+ .list = f2fs_xattr_generic_list,
+ .get = f2fs_xattr_generic_get,
+ .set = f2fs_xattr_generic_set,
+};
+
+const struct xattr_handler f2fs_xattr_trusted_handler = {
+ .prefix = XATTR_TRUSTED_PREFIX,
+ .flags = F2FS_XATTR_INDEX_TRUSTED,
+ .list = f2fs_xattr_generic_list,
+ .get = f2fs_xattr_generic_get,
+ .set = f2fs_xattr_generic_set,
+};
+
+static const struct xattr_handler *f2fs_xattr_handler_map[] = {
+ [F2FS_XATTR_INDEX_USER] = &f2fs_xattr_user_handler,
+#ifdef CONFIG_F2FS_FS_POSIX_ACL
+ [F2FS_XATTR_INDEX_POSIX_ACL_ACCESS] = &f2fs_xattr_acl_access_handler,
+ [F2FS_XATTR_INDEX_POSIX_ACL_DEFAULT] = &f2fs_xattr_acl_default_handler,
+#endif
+ [F2FS_XATTR_INDEX_TRUSTED] = &f2fs_xattr_trusted_handler,
+ [F2FS_XATTR_INDEX_ADVISE] = &f2fs_xattr_advise_handler,
+};
+
+const struct xattr_handler *f2fs_xattr_handlers[] = {
+ &f2fs_xattr_user_handler,
+#ifdef CONFIG_F2FS_FS_POSIX_ACL
+ &f2fs_xattr_acl_access_handler,
+ &f2fs_xattr_acl_default_handler,
+#endif
+ &f2fs_xattr_trusted_handler,
+ &f2fs_xattr_advise_handler,
+ NULL,
+};
+
+static inline const struct xattr_handler *f2fs_xattr_handler(int name_index)
+{
+ const struct xattr_handler *handler = NULL;
+
+ if (name_index > 0 && name_index < ARRAY_SIZE(f2fs_xattr_handler_map))
+ handler = f2fs_xattr_handler_map[name_index];
+ return handler;
+}
+
+int f2fs_getxattr(struct inode *inode, int name_index, const char *name,
+ void *buffer, size_t buffer_size)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ struct f2fs_xattr_entry *entry;
+ struct page *page;
+ void *base_addr;
+ int error = 0, found = 0;
+ int value_len, name_len;
+
+ if (name == NULL)
+ return -EINVAL;
+ name_len = strlen(name);
+
+ if (!fi->i_xattr_nid)
+ return -ENODATA;
+
+ page = get_node_page(sbi, fi->i_xattr_nid);
+ base_addr = page_address(page);
+
+ list_for_each_xattr(entry, base_addr) {
+ if (entry->e_name_index != name_index)
+ continue;
+ if (entry->e_name_len != name_len)
+ continue;
+ if (!memcmp(entry->e_name, name, name_len)) {
+ found = 1;
+ break;
+ }
+ }
+ if (!found) {
+ error = -ENODATA;
+ goto cleanup;
+ }
+
+ value_len = le16_to_cpu(entry->e_value_size);
+
+ if (buffer && value_len > buffer_size) {
+ error = -ERANGE;
+ goto cleanup;
+ }
+
+ if (buffer) {
+ char *pval = entry->e_name + entry->e_name_len;
+ memcpy(buffer, pval, value_len);
+ }
+ error = value_len;
+
+cleanup:
+ f2fs_put_page(page, 1);
+ return error;
+}
+
+ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
+{
+ struct inode *inode = dentry->d_inode;
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ struct f2fs_xattr_entry *entry;
+ struct page *page;
+ void *base_addr;
+ int error = 0;
+ size_t rest = buffer_size;
+
+ if (!fi->i_xattr_nid)
+ return 0;
+
+ page = get_node_page(sbi, fi->i_xattr_nid);
+ base_addr = page_address(page);
+
+ list_for_each_xattr(entry, base_addr) {
+ const struct xattr_handler *handler =
+ f2fs_xattr_handler(entry->e_name_index);
+ size_t size;
+
+ if (!handler)
+ continue;
+
+ size = handler->list(dentry, buffer, rest, entry->e_name,
+ entry->e_name_len, handler->flags);
+ if (buffer && size > rest) {
+ error = -ERANGE;
+ goto cleanup;
+ }
+
+ if (buffer)
+ buffer += size;
+ rest -= size;
+ }
+ error = buffer_size - rest;
+cleanup:
+ f2fs_put_page(page, 1);
+ return error;
+}
+
+int f2fs_setxattr(struct inode *inode, int name_index, const char *name,
+ const void *value, size_t value_len)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ struct f2fs_xattr_header *header = NULL;
+ struct f2fs_xattr_entry *here, *last;
+ struct page *page;
+ void *base_addr;
+ int error, found, free, name_len, newsize;
+ char *pval;
+
+ if (name == NULL)
+ return -EINVAL;
+ name_len = strlen(name);
+
+ if (value == NULL)
+ value_len = 0;
+
+ if (name_len > 255 || value_len > MAX_VALUE_LEN)
+ return -ERANGE;
+
+ mutex_lock_op(sbi, NODE_NEW);
+ if (!fi->i_xattr_nid) {
+ /* Allocate new attribute block */
+ struct dnode_of_data dn;
+
+ if (!alloc_nid(sbi, &fi->i_xattr_nid)) {
+ mutex_unlock_op(sbi, NODE_NEW);
+ return -ENOSPC;
+ }
+ set_new_dnode(&dn, inode, NULL, NULL, fi->i_xattr_nid);
+ mark_inode_dirty(inode);
+
+ page = new_node_page(&dn, XATTR_NODE_OFFSET);
+ if (IS_ERR(page)) {
+ alloc_nid_failed(sbi, fi->i_xattr_nid);
+ fi->i_xattr_nid = 0;
+ mutex_unlock_op(sbi, NODE_NEW);
+ return PTR_ERR(page);
+ }
+
+ alloc_nid_done(sbi, fi->i_xattr_nid);
+ base_addr = page_address(page);
+ header = XATTR_HDR(base_addr);
+ header->h_magic = cpu_to_le32(F2FS_XATTR_MAGIC);
+ header->h_refcount = cpu_to_le32(1);
+ } else {
+ /* The inode already has an extended attribute block. */
+ page = get_node_page(sbi, fi->i_xattr_nid);
+ if (IS_ERR(page)) {
+ mutex_unlock_op(sbi, NODE_NEW);
+ return PTR_ERR(page);
+ }
+
+ base_addr = page_address(page);
+ header = XATTR_HDR(base_addr);
+ }
+
+ if (le32_to_cpu(header->h_magic) != F2FS_XATTR_MAGIC) {
+ error = -EIO;
+ goto cleanup;
+ }
+
+ /* find entry with wanted name. */
+ found = 0;
+ list_for_each_xattr(here, base_addr) {
+ if (here->e_name_index != name_index)
+ continue;
+ if (here->e_name_len != name_len)
+ continue;
+ if (!memcmp(here->e_name, name, name_len)) {
+ found = 1;
+ break;
+ }
+ }
+
+ last = here;
+
+ while (!IS_XATTR_LAST_ENTRY(last))
+ last = XATTR_NEXT_ENTRY(last);
+
+ newsize = XATTR_ALIGN(sizeof(struct f2fs_xattr_entry) +
+ name_len + value_len);
+
+ /* 1. Check space */
+ if (value) {
+ /* If value is NULL, it is a remove operation.
+ * In case of an update operation, we calculate free.
+ */
+ free = MIN_OFFSET - ((char *)last - (char *)header);
+ if (found)
+ free = free - ENTRY_SIZE(here);
+
+ if (free < newsize) {
+ error = -ENOSPC;
+ goto cleanup;
+ }
+ }
+
+ /* 2. Remove old entry */
+ if (found) {
+ /* If entry is found, remove old entry.
+ * If not found, remove operation is not needed.
+ */
+ struct f2fs_xattr_entry *next = XATTR_NEXT_ENTRY(here);
+ int oldsize = ENTRY_SIZE(here);
+
+ memmove(here, next, (char *)last - (char *)next);
+ last = (struct f2fs_xattr_entry *)((char *)last - oldsize);
+ memset(last, 0, oldsize);
+ }
+
+ /* 3. Write new entry */
+ if (value) {
+ /* Before we come here, old entry is removed.
+ * We just write new entry. */
+ memset(last, 0, newsize);
+ last->e_name_index = name_index;
+ last->e_name_len = name_len;
+ memcpy(last->e_name, name, name_len);
+ pval = last->e_name + name_len;
+ memcpy(pval, value, value_len);
+ last->e_value_size = cpu_to_le16(value_len);
+ }
+
+ set_page_dirty(page);
+ f2fs_put_page(page, 1);
+
+ if (is_inode_flag_set(fi, FI_ACL_MODE)) {
+ inode->i_mode = fi->i_acl_mode;
+ inode->i_ctime = CURRENT_TIME;
+ clear_inode_flag(fi, FI_ACL_MODE);
+ }
+ f2fs_write_inode(inode, NULL);
+ mutex_unlock_op(sbi, NODE_NEW);
+
+ return 0;
+cleanup:
+ f2fs_put_page(page, 1);
+ mutex_unlock_op(sbi, NODE_NEW);
+ return error;
+}
diff --git a/fs/f2fs/xattr.h b/fs/f2fs/xattr.h
new file mode 100644
index 0000000..29b0a08
--- /dev/null
+++ b/fs/f2fs/xattr.h
@@ -0,0 +1,145 @@
+/**
+ * fs/f2fs/xattr.h
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * Portions of this code from linux/fs/ext2/xattr.h
+ *
+ * On-disk format of extended attributes for the ext2 filesystem.
+ *
+ * (C) 2001 Andreas Gruenbacher, <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __F2FS_XATTR_H__
+#define __F2FS_XATTR_H__
+
+#include <linux/init.h>
+#include <linux/xattr.h>
+
+/* Magic value in attribute blocks */
+#define F2FS_XATTR_MAGIC 0xF2F52011
+
+/* Maximum number of references to one attribute block */
+#define F2FS_XATTR_REFCOUNT_MAX 1024
+
+/* Name indexes */
+#define F2FS_SYSTEM_ADVISE_PREFIX "system.advise"
+#define F2FS_XATTR_INDEX_USER 1
+#define F2FS_XATTR_INDEX_POSIX_ACL_ACCESS 2
+#define F2FS_XATTR_INDEX_POSIX_ACL_DEFAULT 3
+#define F2FS_XATTR_INDEX_TRUSTED 4
+#define F2FS_XATTR_INDEX_LUSTRE 5
+#define F2FS_XATTR_INDEX_SECURITY 6
+#define F2FS_XATTR_INDEX_ADVISE 7
+
+struct f2fs_xattr_header {
+ __le32 h_magic; /* magic number for identification */
+ __le32 h_refcount; /* reference count */
+ __u32 h_reserved[4]; /* zero right now */
+};
+
+struct f2fs_xattr_entry {
+ __u8 e_name_index;
+ __u8 e_name_len;
+ __le16 e_value_size; /* size of attribute value */
+ char e_name[0]; /* attribute name */
+};
+
+#define XATTR_HDR(ptr) ((struct f2fs_xattr_header *)(ptr))
+#define XATTR_ENTRY(ptr) ((struct f2fs_xattr_entry *)(ptr))
+#define XATTR_FIRST_ENTRY(ptr) (XATTR_ENTRY(XATTR_HDR(ptr)+1))
+#define XATTR_ROUND (3)
+
+#define XATTR_ALIGN(size) ((size + XATTR_ROUND) & ~XATTR_ROUND)
+
+#define ENTRY_SIZE(entry) (XATTR_ALIGN(sizeof(struct f2fs_xattr_entry) + \
+ entry->e_name_len + le16_to_cpu(entry->e_value_size)))
+
+#define XATTR_NEXT_ENTRY(entry) ((struct f2fs_xattr_entry *)((char *)(entry) +\
+ ENTRY_SIZE(entry)))
+
+#define IS_XATTR_LAST_ENTRY(entry) (*(__u32 *)(entry) == 0)
+
+#define list_for_each_xattr(entry, addr) \
+ for (entry = XATTR_FIRST_ENTRY(addr);\
+ !IS_XATTR_LAST_ENTRY(entry);\
+ entry = XATTR_NEXT_ENTRY(entry))
+
+
+#define MIN_OFFSET XATTR_ALIGN(PAGE_SIZE - \
+ sizeof(struct node_footer) - \
+ sizeof(__u32))
+
+#define MAX_VALUE_LEN (MIN_OFFSET - sizeof(struct f2fs_xattr_header) - \
+ sizeof(struct f2fs_xattr_entry))
+
+/**
+ * On-disk structure of f2fs_xattr
+ * We use only 1 block for xattr.
+ *
+ * +--------------------+
+ * | f2fs_xattr_header |
+ * | |
+ * +--------------------+
+ * | f2fs_xattr_entry |
+ * | .e_name_index = 1 |
+ * | .e_name_len = 3 |
+ * | .e_value_size = 14 |
+ * | .e_name = "foo" |
+ * | "value_of_xattr" |<- value_offs = e_name + e_name_len
+ * +--------------------+
+ * | f2fs_xattr_entry |
+ * | .e_name_index = 4 |
+ * | .e_name = "bar" |
+ * +--------------------+
+ * | |
+ * | Free |
+ * | |
+ * +--------------------+<- MIN_OFFSET
+ * | node_footer |
+ * | (nid, ino, offset) |
+ * +--------------------+
+ *
+ **/
+
+#ifdef CONFIG_F2FS_FS_XATTR
+extern const struct xattr_handler f2fs_xattr_user_handler;
+extern const struct xattr_handler f2fs_xattr_trusted_handler;
+extern const struct xattr_handler f2fs_xattr_acl_access_handler;
+extern const struct xattr_handler f2fs_xattr_acl_default_handler;
+extern const struct xattr_handler f2fs_xattr_advise_handler;
+
+extern const struct xattr_handler *f2fs_xattr_handlers[];
+
+extern int f2fs_setxattr(struct inode *inode, int name_index, const char *name,
+ const void *value, size_t value_len);
+extern int f2fs_getxattr(struct inode *inode, int name_index, const char *name,
+ void *buffer, size_t buffer_size);
+extern ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer,
+ size_t buffer_size);
+
+#else
+
+#define f2fs_xattr_handlers NULL
+static inline int f2fs_setxattr(struct inode *inode, int name_index,
+ const char *name, const void *value, size_t value_len)
+{
+ return -EOPNOTSUPP;
+}
+static inline int f2fs_getxattr(struct inode *inode, int name_index,
+ const char *name, void *buffer, size_t buffer_size)
+{
+ return -EOPNOTSUPP;
+}
+static inline ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer,
+ size_t buffer_size)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
+#endif /* __F2FS_XATTR_H__ */
--
1.7.9.5




---
Jaegeuk Kim
Samsung


2012-10-23 02:32:26

by Jaegeuk Kim

Subject: [PATCH 14/16 v2] f2fs: add garbage collection functions

This adds on-demand and background cleaning functions.

- The basic background cleaning policy is to do as much cleaning as possible
whenever the system is idle. After each round of background cleaning, the
cleaner sleeps for a while so that it does not interfere with VFS calls. The
sleep time is adjusted dynamically according to the state of the segments as a
whole; it is decreased when all of the following conditions hold.

. GC is not currently in progress, and
. the IO subsystem is idle, judged by the number of requests in the bdev's
request list, and
. there are enough dirty segments.

Otherwise, the sleep time is increased step by step up to the maximum.
Note that the minimum and maximum sleep times are 10 secs and 30 secs by default.
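
As a rough illustration of the adjustment described above, the following
user-space sketch clamps the sleep time between the stated defaults. The step
size and every sketch_* / SKETCH_* name are assumptions made for this example;
they are not taken from gc.h, and the real thread also handles the case where
the GC mutex cannot be taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults quoted in the description above, in milliseconds. */
#define SKETCH_MIN_SLEEP_MS 10000L
#define SKETCH_MAX_SLEEP_MS 30000L
/* Assumed step size; the description does not state one. */
#define SKETCH_STEP_MS 10000L

/*
 * Return the next sleep time. The time shrinks only when all three
 * conditions hold; otherwise it grows, clamped to [MIN, MAX].
 */
static long sketch_next_sleep(long wait_ms, bool gc_running,
			      bool io_idle, bool enough_dirty)
{
	assert(wait_ms >= SKETCH_MIN_SLEEP_MS &&
	       wait_ms <= SKETCH_MAX_SLEEP_MS);

	if (!gc_running && io_idle && enough_dirty) {
		wait_ms -= SKETCH_STEP_MS;
		if (wait_ms < SKETCH_MIN_SLEEP_MS)
			wait_ms = SKETCH_MIN_SLEEP_MS;
	} else {
		wait_ms += SKETCH_STEP_MS;
		if (wait_ms > SKETCH_MAX_SLEEP_MS)
			wait_ms = SKETCH_MAX_SLEEP_MS;
	}
	return wait_ms;
}
```

With these assumed values, an idle system with plenty of dirty segments settles
at the 10 sec minimum, while a busy one backs off to the 30 sec maximum.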

- F2FS adopts a default victim selection policy where background cleaning uses
a cost-benefit algorithm, while on-demand cleaning uses a greedy algorithm.
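
To make the difference between the two policies concrete, here is a small
user-space sketch of both cost functions. The cost-benefit arithmetic mirrors
get_cb_cost() in this patch for a single segment; the sketch_* names, the
plain-integer parameters, and the assertions are assumptions made so that the
example stands alone. In both cases the candidate with the lowest cost wins.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Greedy: the cost is simply the number of valid blocks to copy. */
static unsigned int sketch_greedy_cost(unsigned int valid_blocks)
{
	return valid_blocks;
}

/*
 * Cost-benefit: weigh utilization (u, in percent) against age
 * (0..100, where 100 means least recently modified). Older and
 * emptier segments get a lower cost.
 */
static unsigned int sketch_cb_cost(unsigned int valid_blocks,
				   unsigned int blocks_per_seg,
				   uint64_t mtime, uint64_t min_mtime,
				   uint64_t max_mtime)
{
	unsigned int u, age = 0;

	assert(blocks_per_seg > 0 && valid_blocks <= blocks_per_seg);
	assert(min_mtime <= mtime && mtime <= max_mtime);

	u = valid_blocks * 100 / blocks_per_seg;
	if (max_mtime != min_mtime)
		age = 100 - (unsigned int)(100 * (mtime - min_mtime) /
					   (max_mtime - min_mtime));

	return UINT_MAX - (100 * (100 - u) * age) / (100 + u);
}
```

Under these assumptions, an old half-full segment can beat a recently written
quarter-full one in cost-benefit mode, whereas greedy mode always prefers the
segment with fewer valid blocks.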

- The method of moving data during the cleaning is slightly different between
background and on-demand cleaning schemes. In the case of background cleaning,
F2FS loads the data, and marks them as dirty. Then, F2FS expects that the data
will be moved by flusher or VM. In the case of on-demand cleaning, F2FS should
move the data right away.

- In order to identify valid blocks in a victim segment, F2FS scans the bitmap
of the segment managed as an SIT entry.
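
The bitmap scan can be pictured with the following user-space sketch, which
collects the offsets of valid blocks from a per-segment validity bitmap. The
sketch_* names are hypothetical, and the most-significant-bit-first ordering
within each byte is an assumption of this example rather than something
stated in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Test bit nr, assuming bit 0 is the most significant bit of byte 0. */
static int sketch_test_bit(unsigned int nr, const unsigned char *map)
{
	return (map[nr >> 3] >> (7 - (nr & 7))) & 1;
}

/*
 * Walk the validity bitmap of one segment and record the offset of
 * every valid block. Returns the number of offsets stored in out.
 */
static size_t sketch_collect_valid(const unsigned char *valid_map,
				   unsigned int blocks_per_seg,
				   unsigned int *out, size_t out_cap)
{
	size_t n = 0;
	unsigned int off;

	assert(valid_map != NULL && out != NULL);

	for (off = 0; off < blocks_per_seg && n < out_cap; off++)
		if (sketch_test_bit(off, valid_map))
			out[n++] = off;
	return n;
}
```

Only the offsets returned by such a scan need to be examined further; blocks
whose bit is clear are already invalid and are skipped by the cleaner.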

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/gc.c | 1139 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/gc.h | 203 +++++++++++
2 files changed, 1342 insertions(+)
create mode 100644 fs/f2fs/gc.c
create mode 100644 fs/f2fs/gc.h

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
new file mode 100644
index 0000000..753b05e
--- /dev/null
+++ b/fs/f2fs/gc.c
@@ -0,0 +1,1139 @@
+/**
+ * fs/f2fs/gc.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/backing-dev.h>
+#include <linux/proc_fs.h>
+#include <linux/init.h>
+#include <linux/f2fs_fs.h>
+#include <linux/kthread.h>
+#include <linux/delay.h>
+#include <linux/freezer.h>
+#include <linux/blkdev.h>
+
+#include "f2fs.h"
+#include "node.h"
+#include "segment.h"
+#include "gc.h"
+
+static LIST_HEAD(f2fs_stat_list);
+static struct kmem_cache *winode_slab;
+
+static int gc_thread_func(void *data)
+{
+ struct f2fs_sb_info *sbi = data;
+ wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
+ long wait_ms;
+
+ wait_ms = GC_THREAD_MIN_SLEEP_TIME;
+
+ do {
+ if (try_to_freeze())
+ continue;
+ else
+ wait_event_interruptible_timeout(*wq,
+ kthread_should_stop(),
+ msecs_to_jiffies(wait_ms));
+ if (kthread_should_stop())
+ break;
+
+ f2fs_balance_fs(sbi);
+
+ if (!test_opt(sbi, BG_GC))
+ continue;
+
+ /*
+ * [GC triggering condition]
+ * 0. GC is not conducted currently.
+ * 1. There are enough dirty segments.
+ * 2. IO subsystem is idle by checking the # of writeback pages.
+ * 3. IO subsystem is idle by checking the # of requests in
+ * bdev's request list.
+ *
+ * Note) We have to avoid triggering GCs too frequently,
+ * because some segments may be invalidated soon
+ * afterwards by user updates or deletions.
+ * So, I'd like to wait some time to collect dirty segments.
+ */
+ if (!mutex_trylock(&sbi->gc_mutex))
+ continue;
+
+ if (!is_idle(sbi)) {
+ wait_ms = increase_sleep_time(wait_ms);
+ mutex_unlock(&sbi->gc_mutex);
+ continue;
+ }
+
+ if (has_enough_invalid_blocks(sbi))
+ wait_ms = decrease_sleep_time(wait_ms);
+ else
+ wait_ms = increase_sleep_time(wait_ms);
+
+ sbi->bg_gc++;
+
+ if (f2fs_gc(sbi, 1) == GC_NONE)
+ wait_ms = GC_THREAD_NOGC_SLEEP_TIME;
+ else if (wait_ms == GC_THREAD_NOGC_SLEEP_TIME)
+ wait_ms = GC_THREAD_MAX_SLEEP_TIME;
+
+ } while (!kthread_should_stop());
+ return 0;
+}
+
+int start_gc_thread(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_kthread *gc_th = NULL;
+
+ gc_th = kmalloc(sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
+ if (!gc_th)
+ return -ENOMEM;
+
+ sbi->gc_thread = gc_th;
+ init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
+ sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
+ GC_THREAD_NAME);
+ if (IS_ERR(gc_th->f2fs_gc_task)) {
+ kfree(gc_th);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+void stop_gc_thread(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
+ if (!gc_th)
+ return;
+ kthread_stop(gc_th->f2fs_gc_task);
+ kfree(gc_th);
+ sbi->gc_thread = NULL;
+}
+
+static int select_gc_type(int gc_type)
+{
+ return (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
+}
+
+static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
+ int type, struct victim_sel_policy *p)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+
+ if (IS_SSR_TYPE(type)) {
+ p->alloc_mode = SSR;
+ p->gc_mode = GC_GREEDY;
+ p->type = GET_SSR_TYPE(type);
+ p->dirty_segmap = dirty_i->dirty_segmap[p->type];
+ p->ofs_unit = 1;
+ } else {
+ p->alloc_mode = LFS;
+ p->gc_mode = select_gc_type(gc_type);
+ p->type = 0;
+ p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
+ p->ofs_unit = sbi->segs_per_sec;
+ }
+ p->offset = sbi->last_victim[p->gc_mode];
+}
+
+static unsigned int get_max_cost(struct f2fs_sb_info *sbi,
+ struct victim_sel_policy *p)
+{
+ if (p->gc_mode == GC_GREEDY)
+ return (1 << sbi->log_blocks_per_seg) * p->ofs_unit;
+ else if (p->gc_mode == GC_CB)
+ return UINT_MAX;
+ else /* No other gc_mode */
+ return 0;
+}
+
+static unsigned int check_bg_victims(struct f2fs_sb_info *sbi)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ unsigned int segno;
+
+ /*
+ * If the gc_type is FG_GC, we can reuse victim segments
+ * that were selected earlier by background GC.
+ * Those segments are guaranteed to have few valid blocks.
+ */
+ segno = find_next_bit(dirty_i->victim_segmap[BG_GC],
+ TOTAL_SEGS(sbi), 0);
+ if (segno < TOTAL_SEGS(sbi)) {
+ clear_bit(segno, dirty_i->victim_segmap[BG_GC]);
+ return segno;
+ }
+ return NULL_SEGNO;
+}
+
+static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ unsigned int secno = GET_SECNO(sbi, segno);
+ unsigned int start = secno * sbi->segs_per_sec;
+ unsigned long long mtime = 0;
+ unsigned int vblocks;
+ unsigned char age = 0;
+ unsigned char u;
+ unsigned int i;
+
+ for (i = 0; i < sbi->segs_per_sec; i++)
+ mtime += get_seg_entry(sbi, start + i)->mtime;
+ vblocks = get_valid_blocks(sbi, segno, sbi->segs_per_sec);
+
+ mtime = div_u64(mtime, sbi->segs_per_sec);
+ vblocks = div_u64(vblocks, sbi->segs_per_sec);
+
+ u = (vblocks * 100) >> sbi->log_blocks_per_seg;
+
+ /* Handle if the system time is changed by user */
+ if (mtime < sit_i->min_mtime)
+ sit_i->min_mtime = mtime;
+ if (mtime > sit_i->max_mtime)
+ sit_i->max_mtime = mtime;
+ if (sit_i->max_mtime != sit_i->min_mtime)
+ age = 100 - div64_u64(100 * (mtime - sit_i->min_mtime),
+ sit_i->max_mtime - sit_i->min_mtime);
+
+ return UINT_MAX - ((100 * (100 - u) * age) / (100 + u));
+}
+
+static unsigned int get_gc_cost(struct f2fs_sb_info *sbi, unsigned int segno,
+ struct victim_sel_policy *p)
+{
+ if (p->alloc_mode == SSR)
+ return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
+
+ /* alloc_mode == LFS */
+ if (p->gc_mode == GC_GREEDY)
+ return get_valid_blocks(sbi, segno, sbi->segs_per_sec);
+ else
+ return get_cb_cost(sbi, segno);
+}
+
+/**
+ * This function is called from two paths.
+ * One is garbage collection and the other is SSR segment selection.
+ * When it is called during GC, it just gets a victim segment
+ * and it does not remove it from dirty seglist.
+ * When it is called from SSR segment selection, it finds a segment
+ * which has minimum valid blocks and removes it from dirty seglist.
+ */
+static int get_victim_by_default(struct f2fs_sb_info *sbi,
+ unsigned int *result, int gc_type, int type)
+{
+ struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+ struct victim_sel_policy p;
+ unsigned int segno;
+ int nsearched = 0;
+
+ select_policy(sbi, gc_type, type, &p);
+
+ p.min_segno = NULL_SEGNO;
+ p.min_cost = get_max_cost(sbi, &p);
+
+ mutex_lock(&dirty_i->seglist_lock);
+
+ if (p.alloc_mode == LFS && gc_type == FG_GC) {
+ p.min_segno = check_bg_victims(sbi);
+ if (p.min_segno != NULL_SEGNO)
+ goto got_it;
+ }
+
+ while (1) {
+ unsigned long cost;
+
+ segno = find_next_bit(p.dirty_segmap,
+ TOTAL_SEGS(sbi), p.offset);
+ if (segno >= TOTAL_SEGS(sbi)) {
+ if (sbi->last_victim[p.gc_mode]) {
+ sbi->last_victim[p.gc_mode] = 0;
+ p.offset = 0;
+ continue;
+ }
+ break;
+ }
+ p.offset = ((segno / p.ofs_unit) * p.ofs_unit) + p.ofs_unit;
+
+ if (test_bit(segno, dirty_i->victim_segmap[FG_GC]))
+ continue;
+ if (gc_type == BG_GC &&
+ test_bit(segno, dirty_i->victim_segmap[BG_GC]))
+ continue;
+ if (IS_CURSEC(sbi, GET_SECNO(sbi, segno)))
+ continue;
+
+ cost = get_gc_cost(sbi, segno, &p);
+
+ if (p.min_cost > cost) {
+ p.min_segno = segno;
+ p.min_cost = cost;
+ }
+
+ if (cost == get_max_cost(sbi, &p))
+ continue;
+
+ if (nsearched++ >= MAX_VICTIM_SEARCH) {
+ sbi->last_victim[p.gc_mode] = segno;
+ break;
+ }
+ }
+got_it:
+ if (p.min_segno != NULL_SEGNO) {
+ *result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
+ if (p.alloc_mode == LFS) {
+ int i;
+ for (i = 0; i < p.ofs_unit; i++)
+ set_bit(*result + i,
+ dirty_i->victim_segmap[gc_type]);
+ }
+ }
+ mutex_unlock(&dirty_i->seglist_lock);
+
+ return (p.min_segno == NULL_SEGNO) ? 0 : 1;
+}
+
+static const struct victim_selection default_v_ops = {
+ .get_victim = get_victim_by_default,
+};
+
+static struct inode *find_gc_inode(nid_t ino, struct list_head *ilist)
+{
+ struct list_head *this;
+ struct inode_entry *ie;
+
+ list_for_each(this, ilist) {
+ ie = list_entry(this, struct inode_entry, list);
+ if (ie->inode->i_ino == ino)
+ return ie->inode;
+ }
+ return NULL;
+}
+
+static void add_gc_inode(struct inode *inode, struct list_head *ilist)
+{
+ struct list_head *this;
+ struct inode_entry *new_ie, *ie;
+
+ list_for_each(this, ilist) {
+ ie = list_entry(this, struct inode_entry, list);
+ if (ie->inode == inode) {
+ iput(inode);
+ return;
+ }
+ }
+repeat:
+ new_ie = kmem_cache_alloc(winode_slab, GFP_NOFS);
+ if (!new_ie) {
+ cond_resched();
+ goto repeat;
+ }
+ new_ie->inode = inode;
+ list_add_tail(&new_ie->list, ilist);
+}
+
+static void put_gc_inode(struct list_head *ilist)
+{
+ struct inode_entry *ie, *next_ie;
+ list_for_each_entry_safe(ie, next_ie, ilist, list) {
+ iput(ie->inode);
+ list_del(&ie->list);
+ kmem_cache_free(winode_slab, ie);
+ }
+}
+
+static int check_valid_map(struct f2fs_sb_info *sbi,
+ unsigned int segno, int offset)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ struct seg_entry *sentry;
+ int ret;
+
+ mutex_lock(&sit_i->sentry_lock);
+ sentry = get_seg_entry(sbi, segno);
+ ret = f2fs_test_bit(offset, sentry->cur_valid_map);
+ mutex_unlock(&sit_i->sentry_lock);
+ return ret ? GC_OK : GC_NEXT;
+}
+
+/**
+ * This function compares the node address in the summary with that in NAT.
+ * If the node is valid, copy it with cold status; otherwise (invalid node)
+ * ignore it.
+ */
+static int gc_node_segment(struct f2fs_sb_info *sbi,
+ struct f2fs_summary *sum, unsigned int segno, int gc_type)
+{
+ bool initial = true;
+ struct f2fs_summary *entry;
+ int off;
+
+next_step:
+ entry = sum;
+ for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
+ nid_t nid = le32_to_cpu(entry->nid);
+ struct page *node_page;
+ int err;
+
+ /*
+ * It makes sure that free segments are able to write
+ * all the dirty node pages before CP after this CP.
+ * So let's check the space of dirty node pages.
+ */
+ if (should_do_checkpoint(sbi)) {
+ mutex_lock(&sbi->cp_mutex);
+ block_operations(sbi);
+ return GC_BLOCKED;
+ }
+
+ err = check_valid_map(sbi, segno, off);
+ if (err == GC_ERROR)
+ return err;
+ else if (err == GC_NEXT)
+ continue;
+
+ if (initial) {
+ ra_node_page(sbi, nid);
+ continue;
+ }
+ node_page = get_node_page(sbi, nid);
+ if (IS_ERR(node_page))
+ continue;
+
+ /* set page dirty and write it */
+ if (!PageWriteback(node_page))
+ set_page_dirty(node_page);
+ f2fs_put_page(node_page, 1);
+ gc_stat_inc_node_blk_count(sbi, 1);
+ }
+ if (initial) {
+ initial = false;
+ goto next_step;
+ }
+
+ if (gc_type == FG_GC) {
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = LONG_MAX,
+ .for_reclaim = 0,
+ };
+ sync_node_pages(sbi, 0, &wbc);
+ }
+ return GC_DONE;
+}
+
+/**
+ * Calculate start block index that this node page contains
+ */
+block_t start_bidx_of_node(unsigned int node_ofs)
+{
+ block_t start_bidx;
+ unsigned int bidx, indirect_blks;
+ int dec;
+
+ indirect_blks = 2 * NIDS_PER_BLOCK + 4;
+
+ start_bidx = 1;
+ if (node_ofs == 0) {
+ start_bidx = 0;
+ } else if (node_ofs <= 2) {
+ bidx = node_ofs - 1;
+ } else if (node_ofs <= indirect_blks) {
+ dec = (node_ofs - 4) / (NIDS_PER_BLOCK + 1);
+ bidx = node_ofs - 2 - dec;
+ } else {
+ dec = (node_ofs - indirect_blks - 3) / (NIDS_PER_BLOCK + 1);
+ bidx = node_ofs - 5 - dec;
+ }
+
+ if (start_bidx)
+ start_bidx = bidx * ADDRS_PER_BLOCK + ADDRS_PER_INODE;
+ return start_bidx;
+}
+
+static int check_dnode(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
+ struct node_info *dni, block_t blkaddr, unsigned int *nofs)
+{
+ struct page *node_page;
+ nid_t nid;
+ unsigned int ofs_in_node;
+ block_t source_blkaddr;
+
+ nid = le32_to_cpu(sum->nid);
+ ofs_in_node = le16_to_cpu(sum->ofs_in_node);
+
+ node_page = get_node_page(sbi, nid);
+ if (IS_ERR(node_page))
+ return GC_NEXT;
+
+ get_node_info(sbi, nid, dni);
+
+ if (sum->version != dni->version) {
+ f2fs_put_page(node_page, 1);
+ return GC_NEXT;
+ }
+
+ *nofs = ofs_of_node(node_page);
+ source_blkaddr = datablock_addr(node_page, ofs_in_node);
+ f2fs_put_page(node_page, 1);
+
+ if (source_blkaddr != blkaddr)
+ return GC_NEXT;
+ return GC_OK;
+}
+
+static void move_data_page(struct inode *inode, struct page *page, int gc_type)
+{
+ if (page->mapping != inode->i_mapping)
+ goto out;
+
+ if (inode != page->mapping->host)
+ goto out;
+
+ if (PageWriteback(page))
+ goto out;
+
+ if (gc_type == BG_GC) {
+ set_page_dirty(page);
+ set_cold_data(page);
+ } else {
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
+ mutex_lock_op(sbi, DATA_WRITE);
+ if (clear_page_dirty_for_io(page) &&
+ S_ISDIR(inode->i_mode)) {
+ dec_page_count(sbi, F2FS_DIRTY_DENTS);
+ inode_dec_dirty_dents(inode);
+ }
+ set_cold_data(page);
+ do_write_data_page(page);
+ mutex_unlock_op(sbi, DATA_WRITE);
+ clear_cold_data(page);
+ }
+out:
+ f2fs_put_page(page, 1);
+}
+
+/**
+ * This function tries to get parent node of victim data block, and identifies
+ * data block validity. If the block is valid, copy that with cold status and
+ * modify parent node.
+ * If the parent node is not valid or the data block address is different,
+ * the victim data block is ignored.
+ */
+static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
+ struct list_head *ilist, unsigned int segno, int gc_type)
+{
+ struct super_block *sb = sbi->sb;
+ struct f2fs_summary *entry;
+ block_t start_addr;
+ int err, off;
+ int phase = 0;
+
+ start_addr = START_BLOCK(sbi, segno);
+
+next_step:
+ entry = sum;
+ for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
+ struct page *data_page;
+ struct inode *inode;
+ struct node_info dni; /* dnode info for the data */
+ unsigned int ofs_in_node, nofs;
+ block_t start_bidx;
+
+ /*
+ * It makes sure that free segments are able to write
+ * all the dirty node pages before CP after this CP.
+ * So let's check the space of dirty node pages.
+ */
+ if (should_do_checkpoint(sbi)) {
+ mutex_lock(&sbi->cp_mutex);
+ block_operations(sbi);
+ err = GC_BLOCKED;
+ goto stop;
+ }
+
+ err = check_valid_map(sbi, segno, off);
+ if (err == GC_ERROR)
+ goto stop;
+ else if (err == GC_NEXT)
+ continue;
+
+ if (phase == 0) {
+ ra_node_page(sbi, le32_to_cpu(entry->nid));
+ continue;
+ }
+
+ /* Get an inode by ino with checking validity */
+ err = check_dnode(sbi, entry, &dni, start_addr + off, &nofs);
+ if (err == GC_ERROR)
+ goto stop;
+ else if (err == GC_NEXT)
+ continue;
+
+ if (phase == 1) {
+ ra_node_page(sbi, dni.ino);
+ continue;
+ }
+
+ start_bidx = start_bidx_of_node(nofs);
+ ofs_in_node = le16_to_cpu(entry->ofs_in_node);
+
+ if (phase == 2) {
+ inode = f2fs_iget_nowait(sb, dni.ino);
+ if (IS_ERR(inode))
+ continue;
+
+ data_page = find_data_page(inode,
+ start_bidx + ofs_in_node);
+ if (IS_ERR(data_page))
+ goto next_iput;
+
+ f2fs_put_page(data_page, 0);
+ add_gc_inode(inode, ilist);
+ } else {
+ inode = find_gc_inode(dni.ino, ilist);
+ if (inode) {
+ data_page = get_lock_data_page(inode,
+ start_bidx + ofs_in_node);
+ if (IS_ERR(data_page))
+ continue;
+ move_data_page(inode, data_page, gc_type);
+ gc_stat_inc_data_blk_count(sbi, 1);
+ }
+ }
+ continue;
+next_iput:
+ iput(inode);
+ }
+ if (++phase < 4)
+ goto next_step;
+ err = GC_DONE;
+stop:
+ if (gc_type == FG_GC)
+ f2fs_submit_bio(sbi, DATA, true);
+ return err;
+}
+
+static int __get_victim(struct f2fs_sb_info *sbi, unsigned int *result,
+ int gc_type, int type)
+{
+ struct sit_info *sit_i = SIT_I(sbi);
+ int ret;
+ mutex_lock(&sit_i->sentry_lock);
+ ret = DIRTY_I(sbi)->v_ops->get_victim(sbi, result, gc_type, type);
+ mutex_unlock(&sit_i->sentry_lock);
+ return ret;
+}
+
+static int do_garbage_collect(struct f2fs_sb_info *sbi, unsigned int segno,
+ struct list_head *ilist, int gc_type)
+{
+ struct page *sum_page;
+ struct f2fs_summary_block *sum;
+ int ret = GC_DONE;
+
+ /* read segment summary of victim */
+ sum_page = get_sum_page(sbi, segno);
+ if (IS_ERR(sum_page))
+ return GC_ERROR;
+
+ /*
+ * CP needs to lock sum_page. At this point, we don't need
+ * to lock this page, because this summary page is not going anywhere.
+ * Also, this page is not going to be updated before GC is done.
+ */
+ unlock_page(sum_page);
+ sum = page_address(sum_page);
+
+ switch (GET_SUM_TYPE((&sum->footer))) {
+ case SUM_TYPE_NODE:
+ ret = gc_node_segment(sbi, sum->entries, segno, gc_type);
+ break;
+ case SUM_TYPE_DATA:
+ ret = gc_data_segment(sbi, sum->entries, ilist, segno, gc_type);
+ break;
+ }
+ gc_stat_inc_seg_count(sbi, GET_SUM_TYPE((&sum->footer)));
+ gc_stat_inc_call_count(sbi->gc_info);
+
+ f2fs_put_page(sum_page, 0);
+ return ret;
+}
+
+int f2fs_gc(struct f2fs_sb_info *sbi, int nGC)
+{
+ unsigned int segno;
+ int old_free_secs, cur_free_secs;
+ int gc_status, nfree;
+ struct list_head ilist;
+ int gc_type = BG_GC;
+
+ INIT_LIST_HEAD(&ilist);
+gc_more:
+ nfree = 0;
+ gc_status = GC_NONE;
+
+ if (has_not_enough_free_secs(sbi))
+ old_free_secs = reserved_sections(sbi);
+ else
+ old_free_secs = free_sections(sbi);
+
+ while (sbi->sb->s_flags & MS_ACTIVE) {
+ int i;
+ if (has_not_enough_free_secs(sbi))
+ gc_type = FG_GC;
+
+ cur_free_secs = free_sections(sbi) + nfree;
+
+ /* We got free space successfully. */
+ if (nGC < cur_free_secs - old_free_secs)
+ break;
+
+ if (!__get_victim(sbi, &segno, gc_type, NO_CHECK_TYPE))
+ break;
+
+ for (i = 0; i < sbi->segs_per_sec; i++) {
+ /*
+ * do_garbage_collect will give us three gc_status:
+ * GC_ERROR, GC_DONE, and GC_BLOCKED.
+ * If GC is finished uncleanly, we have to return
+ * the victim to dirty segment list.
+ */
+ gc_status = do_garbage_collect(sbi, segno + i,
+ &ilist, gc_type);
+ if (gc_status != GC_DONE)
+ goto stop;
+ nfree++;
+ }
+ }
+stop:
+ if (has_not_enough_free_secs(sbi) || gc_status == GC_BLOCKED) {
+ write_checkpoint(sbi, (gc_status == GC_BLOCKED), false);
+ if (nfree)
+ goto gc_more;
+ }
+ sbi->last_gc_status = gc_status;
+ mutex_unlock(&sbi->gc_mutex);
+
+ put_gc_inode(&ilist);
+ BUG_ON(!list_empty(&ilist));
+ return gc_status;
+}
+
+#ifdef CONFIG_F2FS_STAT_FS
+void f2fs_update_stat(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_info *gc_i = sbi->gc_info;
+ struct f2fs_stat_info *si = gc_i->stat_info;
+ int i;
+
+ /* valid check of the segment numbers */
+ si->hit_ext = sbi->read_hit_ext;
+ si->total_ext = sbi->total_hit_ext;
+ si->ndirty_node = get_pages(sbi, F2FS_DIRTY_NODES);
+ si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS);
+ si->ndirty_dirs = sbi->n_dirty_dirs;
+ si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META);
+ si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg;
+ si->rsvd_segs = reserved_segments(sbi);
+ si->overp_segs = overprovision_segments(sbi);
+ si->valid_count = valid_user_blocks(sbi);
+ si->valid_node_count = valid_node_count(sbi);
+ si->valid_inode_count = valid_inode_count(sbi);
+ si->utilization = utilization(sbi);
+
+ si->free_segs = free_segments(sbi);
+ si->free_secs = free_sections(sbi);
+ si->prefree_count = prefree_segments(sbi);
+ si->dirty_count = dirty_segments(sbi);
+ si->node_pages = sbi->node_inode->i_mapping->nrpages;
+ si->meta_pages = sbi->meta_inode->i_mapping->nrpages;
+ si->nats = NM_I(sbi)->nat_cnt;
+ si->sits = SIT_I(sbi)->dirty_sentries;
+ si->fnids = NM_I(sbi)->fcnt;
+ si->bg_gc = sbi->bg_gc;
+ si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
+ * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
+ / 2;
+ si->util_valid = (int)(written_block_count(sbi) >>
+ sbi->log_blocks_per_seg)
+ * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
+ / 2;
+ si->util_invalid = 50 - si->util_free - si->util_valid;
+ for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_NODE; i++) {
+ struct curseg_info *curseg = CURSEG_I(sbi, i);
+ si->curseg[i] = curseg->segno;
+ si->cursec[i] = curseg->segno / sbi->segs_per_sec;
+ si->curzone[i] = si->cursec[i] / sbi->secs_per_zone;
+ }
+
+ for (i = 0; i < 2; i++) {
+ si->segment_count[i] = sbi->segment_count[i];
+ si->block_count[i] = sbi->block_count[i];
+ }
+}
+
+/**
+ * This function calculates the BDF over all segments
+ */
+void f2fs_update_gc_metric(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_info *gc_i = sbi->gc_info;
+ struct f2fs_stat_info *si = gc_i->stat_info;
+ unsigned int blks_per_sec, hblks_per_sec, total_vblocks, bimodal, dist;
+ struct sit_info *sit_i = SIT_I(sbi);
+ unsigned int segno, vblocks;
+ int ndirty = 0;
+
+ bimodal = 0;
+ total_vblocks = 0;
+ blks_per_sec = sbi->segs_per_sec * (1 << sbi->log_blocks_per_seg);
+ hblks_per_sec = blks_per_sec / 2;
+ mutex_lock(&sit_i->sentry_lock);
+ for (segno = 0; segno < TOTAL_SEGS(sbi); segno += sbi->segs_per_sec) {
+ vblocks = get_valid_blocks(sbi, segno, sbi->segs_per_sec);
+ dist = abs(vblocks - hblks_per_sec);
+ bimodal += dist * dist;
+
+ if (vblocks > 0 && vblocks < blks_per_sec) {
+ total_vblocks += vblocks;
+ ndirty++;
+ }
+ }
+ mutex_unlock(&sit_i->sentry_lock);
+ dist = sbi->total_sections * hblks_per_sec * hblks_per_sec / 100;
+ si->bimodal = bimodal / dist;
+ if (ndirty)
+ si->avg_vblocks = total_vblocks / ndirty;
+ else
+ si->avg_vblocks = 0;
+}
+
+static int f2fs_read_gc(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ struct f2fs_gc_info *gc_i, *next;
+ struct f2fs_stat_info *si;
+ char *buf = page;
+ int i = 0;
+
+ list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
+ int j;
+ si = gc_i->stat_info;
+
+ mutex_lock(&si->stat_list);
+ if (!si->sbi) {
+ mutex_unlock(&si->stat_list);
+ continue;
+ }
+ f2fs_update_stat(si->sbi);
+
+ buf += sprintf(buf, "=====[ partition info. #%d ]=====\n", i++);
+ buf += sprintf(buf, "[SB: 1] [CP: 2] [NAT: %d] [SIT: %d] ",
+ si->nat_area_segs, si->sit_area_segs);
+ buf += sprintf(buf, "[SSA: %d] [MAIN: %d",
+ si->ssa_area_segs, si->main_area_segs);
+ buf += sprintf(buf, "(OverProv:%d Resv:%d)]\n\n",
+ si->overp_segs, si->rsvd_segs);
+ buf += sprintf(buf, "Utilization: %d%% (%d valid blocks)\n",
+ si->utilization, si->valid_count);
+ buf += sprintf(buf, " - Node: %u (Inode: %u, ",
+ si->valid_node_count, si->valid_inode_count);
+ buf += sprintf(buf, "Other: %u)\n - Data: %u\n",
+ si->valid_node_count - si->valid_inode_count,
+ si->valid_count - si->valid_node_count);
+ buf += sprintf(buf, "\nMain area: %d segs, %d secs %d zones\n",
+ si->main_area_segs, si->main_area_sections,
+ si->main_area_zones);
+ buf += sprintf(buf, " - COLD data: %d, %d, %d\n",
+ si->curseg[CURSEG_COLD_DATA],
+ si->cursec[CURSEG_COLD_DATA],
+ si->curzone[CURSEG_COLD_DATA]);
+ buf += sprintf(buf, " - WARM data: %d, %d, %d\n",
+ si->curseg[CURSEG_WARM_DATA],
+ si->cursec[CURSEG_WARM_DATA],
+ si->curzone[CURSEG_WARM_DATA]);
+ buf += sprintf(buf, " - HOT data: %d, %d, %d\n",
+ si->curseg[CURSEG_HOT_DATA],
+ si->cursec[CURSEG_HOT_DATA],
+ si->curzone[CURSEG_HOT_DATA]);
+ buf += sprintf(buf, " - Dir dnode: %d, %d, %d\n",
+ si->curseg[CURSEG_HOT_NODE],
+ si->cursec[CURSEG_HOT_NODE],
+ si->curzone[CURSEG_HOT_NODE]);
+ buf += sprintf(buf, " - File dnode: %d, %d, %d\n",
+ si->curseg[CURSEG_WARM_NODE],
+ si->cursec[CURSEG_WARM_NODE],
+ si->curzone[CURSEG_WARM_NODE]);
+ buf += sprintf(buf, " - Indir nodes: %d, %d, %d\n",
+ si->curseg[CURSEG_COLD_NODE],
+ si->cursec[CURSEG_COLD_NODE],
+ si->curzone[CURSEG_COLD_NODE]);
+ buf += sprintf(buf, "\n - Valid: %d\n - Dirty: %d\n",
+ si->main_area_segs - si->dirty_count -
+ si->prefree_count - si->free_segs,
+ si->dirty_count);
+ buf += sprintf(buf, " - Prefree: %d\n - Free: %d (%d)\n\n",
+ si->prefree_count,
+ si->free_segs,
+ si->free_secs);
+ buf += sprintf(buf, "GC calls: %d (BG: %d)\n",
+ si->call_count, si->bg_gc);
+ buf += sprintf(buf, " - data segments : %d\n", si->data_segs);
+ buf += sprintf(buf, " - node segments : %d\n", si->node_segs);
+ buf += sprintf(buf, "Try to move %d blocks\n", si->tot_blks);
+ buf += sprintf(buf, " - data blocks : %d\n", si->data_blks);
+ buf += sprintf(buf, " - node blocks : %d\n", si->node_blks);
+ buf += sprintf(buf, "\nExtent Hit Ratio: %d / %d\n",
+ si->hit_ext, si->total_ext);
+ buf += sprintf(buf, "\nBalancing F2FS Async:\n");
+ buf += sprintf(buf, " - nodes %4d in %4d\n",
+ si->ndirty_node, si->node_pages);
+ buf += sprintf(buf, " - dents %4d in dirs:%4d\n",
+ si->ndirty_dent, si->ndirty_dirs);
+ buf += sprintf(buf, " - meta %4d in %4d\n",
+ si->ndirty_meta, si->meta_pages);
+ buf += sprintf(buf, " - NATs %5d > %lu\n",
+ si->nats, NM_WOUT_THRESHOLD);
+ buf += sprintf(buf, " - SITs: %5d\n - free_nids: %5d\n",
+ si->sits, si->fnids);
+ buf += sprintf(buf, "\nDistribution of User Blocks:");
+ buf += sprintf(buf, " [ valid | invalid | free ]\n");
+ buf += sprintf(buf, " [");
+ for (j = 0; j < si->util_valid; j++)
+ buf += sprintf(buf, "-");
+ buf += sprintf(buf, "|");
+ for (j = 0; j < si->util_invalid; j++)
+ buf += sprintf(buf, "-");
+ buf += sprintf(buf, "|");
+ for (j = 0; j < si->util_free; j++)
+ buf += sprintf(buf, "-");
+ buf += sprintf(buf, "]\n\n");
+ buf += sprintf(buf, "SSR: %u blocks in %u segments\n",
+ si->block_count[SSR], si->segment_count[SSR]);
+ buf += sprintf(buf, "LFS: %u blocks in %u segments\n",
+ si->block_count[LFS], si->segment_count[LFS]);
+ mutex_unlock(&si->stat_list);
+ }
+ return buf - page;
+}
+
+static int f2fs_read_sit(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ struct f2fs_gc_info *gc_i, *next;
+ struct f2fs_stat_info *si;
+ char *buf = page;
+
+ list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
+ si = gc_i->stat_info;
+
+ mutex_lock(&si->stat_list);
+ if (!si->sbi) {
+ mutex_unlock(&si->stat_list);
+ continue;
+ }
+ f2fs_update_gc_metric(si->sbi);
+
+ buf += sprintf(buf, "BDF: %u, avg. vblocks: %u\n",
+ si->bimodal, si->avg_vblocks);
+ mutex_unlock(&si->stat_list);
+ }
+ return buf - page;
+}
+
+static int f2fs_read_mem(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ struct f2fs_gc_info *gc_i, *next;
+ struct f2fs_stat_info *si;
+ char *buf = page;
+
+ list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
+ struct f2fs_sb_info *sbi = gc_i->stat_info->sbi;
+ unsigned npages;
+ unsigned base_mem = 0, cache_mem = 0;
+
+ si = gc_i->stat_info;
+ mutex_lock(&si->stat_list);
+ if (!si->sbi) {
+ mutex_unlock(&si->stat_list);
+ continue;
+ }
+ base_mem += sizeof(struct f2fs_sb_info) + sbi->sb->s_blocksize;
+ base_mem += 2 * sizeof(struct f2fs_inode_info);
+ base_mem += sizeof(*sbi->ckpt);
+
+ /* build sm */
+ base_mem += sizeof(struct f2fs_sm_info);
+
+ /* build sit */
+ base_mem += sizeof(struct sit_info);
+ base_mem += TOTAL_SEGS(sbi) * sizeof(struct seg_entry);
+ base_mem += f2fs_bitmap_size(TOTAL_SEGS(sbi));
+ base_mem += 2 * SIT_VBLOCK_MAP_SIZE * TOTAL_SEGS(sbi);
+ if (sbi->segs_per_sec > 1)
+ base_mem += sbi->total_sections *
+ sizeof(struct sec_entry);
+ base_mem += __bitmap_size(sbi, SIT_BITMAP);
+
+ /* build free segmap */
+ base_mem += sizeof(struct free_segmap_info);
+ base_mem += f2fs_bitmap_size(TOTAL_SEGS(sbi));
+ base_mem += f2fs_bitmap_size(sbi->total_sections);
+
+ /* build curseg */
+ base_mem += sizeof(struct curseg_info) * NR_CURSEG_TYPE;
+ base_mem += PAGE_CACHE_SIZE * NR_CURSEG_TYPE;
+
+ /* build dirty segmap */
+ base_mem += sizeof(struct dirty_seglist_info);
+ base_mem += NR_DIRTY_TYPE * f2fs_bitmap_size(TOTAL_SEGS(sbi));
+ base_mem += 2 * f2fs_bitmap_size(TOTAL_SEGS(sbi));
+
+ /* build nm */
+ base_mem += sizeof(struct f2fs_nm_info);
+ base_mem += __bitmap_size(sbi, NAT_BITMAP);
+
+ /* build gc */
+ base_mem += sizeof(struct f2fs_gc_info);
+ base_mem += sizeof(struct f2fs_gc_kthread);
+
+ /* free nids */
+ cache_mem += NM_I(sbi)->fcnt;
+ cache_mem += NM_I(sbi)->nat_cnt;
+ npages = sbi->node_inode->i_mapping->nrpages;
+ cache_mem += npages << PAGE_CACHE_SHIFT;
+ npages = sbi->meta_inode->i_mapping->nrpages;
+ cache_mem += npages << PAGE_CACHE_SHIFT;
+ cache_mem += sbi->n_orphans * sizeof(struct orphan_inode_entry);
+ cache_mem += sbi->n_dirty_dirs * sizeof(struct dir_inode_entry);
+
+ buf += sprintf(buf, "%u KB = static: %u + cached: %u\n",
+ (base_mem + cache_mem) >> 10,
+ base_mem >> 10,
+ cache_mem >> 10);
+ mutex_unlock(&si->stat_list);
+ }
+ return buf - page;
+}
+
+int f2fs_stat_init(struct f2fs_sb_info *sbi)
+{
+ struct proc_dir_entry *entry;
+
+ entry = create_proc_entry("f2fs_stat", 0, sbi->s_proc);
+ if (!entry)
+ return -ENOMEM;
+ entry->read_proc = f2fs_read_gc;
+ entry->write_proc = NULL;
+
+ entry = create_proc_entry("f2fs_sit_stat", 0, sbi->s_proc);
+ if (!entry) {
+ remove_proc_entry("f2fs_stat", sbi->s_proc);
+ return -ENOMEM;
+ }
+ entry->read_proc = f2fs_read_sit;
+ entry->write_proc = NULL;
+ entry = create_proc_entry("f2fs_mem_stat", 0, sbi->s_proc);
+ if (!entry) {
+ remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
+ remove_proc_entry("f2fs_stat", sbi->s_proc);
+ return -ENOMEM;
+ }
+ entry->read_proc = f2fs_read_mem;
+ entry->write_proc = NULL;
+ return 0;
+}
+
+void f2fs_stat_exit(struct f2fs_sb_info *sbi)
+{
+ if (sbi->s_proc) {
+ remove_proc_entry("f2fs_stat", sbi->s_proc);
+ remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
+ remove_proc_entry("f2fs_mem_stat", sbi->s_proc);
+ }
+}
+#endif
+
+int build_gc_manager(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_info *gc_i;
+ struct f2fs_checkpoint *ckp = F2FS_CKPT(sbi);
+#ifdef CONFIG_F2FS_STAT_FS
+ struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
+ struct f2fs_stat_info *si;
+#endif
+
+ gc_i = kzalloc(sizeof(struct f2fs_gc_info), GFP_KERNEL);
+ if (!gc_i)
+ return -ENOMEM;
+
+ sbi->gc_info = gc_i;
+ gc_i->rsvd_segment_count = le32_to_cpu(ckp->rsvd_segment_count);
+ gc_i->overp_segment_count = le32_to_cpu(ckp->overprov_segment_count);
+
+ DIRTY_I(sbi)->v_ops = &default_v_ops;
+
+#ifdef CONFIG_F2FS_STAT_FS
+ gc_i->stat_info = kzalloc(sizeof(struct f2fs_stat_info),
+ GFP_KERNEL);
+ if (!gc_i->stat_info)
+ return -ENOMEM;
+ si = gc_i->stat_info;
+ mutex_init(&si->stat_list);
+ list_add_tail(&gc_i->stat_list, &f2fs_stat_list);
+
+ si->all_area_segs = le32_to_cpu(raw_super->segment_count);
+ si->sit_area_segs = le32_to_cpu(raw_super->segment_count_sit);
+ si->nat_area_segs = le32_to_cpu(raw_super->segment_count_nat);
+ si->ssa_area_segs = le32_to_cpu(raw_super->segment_count_ssa);
+ si->main_area_segs = le32_to_cpu(raw_super->segment_count_main);
+ si->main_area_sections = le32_to_cpu(raw_super->section_count);
+ si->main_area_zones = si->main_area_sections /
+ le32_to_cpu(raw_super->secs_per_zone);
+ si->sbi = sbi;
+#endif
+ return 0;
+}
+
+void destroy_gc_manager(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_info *gc_i = sbi->gc_info;
+#ifdef CONFIG_F2FS_STAT_FS
+ struct f2fs_stat_info *si = gc_i ? gc_i->stat_info : NULL;
+#endif
+ if (!gc_i)
+ return;
+
+#ifdef CONFIG_F2FS_STAT_FS
+ list_del(&gc_i->stat_list);
+ mutex_lock(&si->stat_list);
+ si->sbi = NULL;
+ mutex_unlock(&si->stat_list);
+ kfree(gc_i->stat_info);
+#endif
+ sbi->gc_info = NULL;
+ kfree(gc_i);
+}
+
+int create_gc_caches(void)
+{
+ winode_slab = f2fs_kmem_cache_create("f2fs_gc_inodes",
+ sizeof(struct inode_entry), NULL);
+ if (!winode_slab)
+ return -ENOMEM;
+ return 0;
+}
+
+void destroy_gc_caches(void)
+{
+ kmem_cache_destroy(winode_slab);
+}
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
new file mode 100644
index 0000000..29b345d
--- /dev/null
+++ b/fs/f2fs/gc.h
@@ -0,0 +1,203 @@
+/**
+ * fs/f2fs/gc.h
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#define GC_THREAD_NAME "f2fs_gc_task"
+#define GC_THREAD_MIN_WB_PAGES 1 /*
+ * a threshold to determine
+ * whether IO subsystem is idle
+ * or not
+ */
+#define GC_THREAD_MIN_SLEEP_TIME 10000 /* milliseconds */
+#define GC_THREAD_MAX_SLEEP_TIME 30000
+#define GC_THREAD_NOGC_SLEEP_TIME 10000
+#define LIMIT_INVALID_BLOCK 40 /* percentage over total user space */
+#define LIMIT_FREE_BLOCK 40 /* percentage over invalid + free space */
+
+/* Search max. number of dirty segments to select a victim segment */
+#define MAX_VICTIM_SEARCH 20
+
+enum {
+ GC_NONE = 0,
+ GC_ERROR,
+ GC_OK,
+ GC_NEXT,
+ GC_BLOCKED,
+ GC_DONE,
+};
+
+#ifdef CONFIG_F2FS_STAT_FS
+struct f2fs_stat_info {
+ struct f2fs_sb_info *sbi;
+ struct mutex stat_list;
+ int all_area_segs;
+ int sit_area_segs;
+ int nat_area_segs;
+ int ssa_area_segs;
+ int main_area_segs;
+ int main_area_sections;
+ int main_area_zones;
+ int hit_ext, total_ext;
+ int ndirty_node;
+ int ndirty_dent;
+ int ndirty_dirs;
+ int ndirty_meta;
+ int nats, sits, fnids;
+ int total_count;
+ int utilization;
+ int bg_gc;
+ unsigned int valid_count;
+ unsigned int valid_node_count;
+ unsigned int valid_inode_count;
+ unsigned int bimodal, avg_vblocks;
+ int util_free, util_valid, util_invalid;
+ int rsvd_segs, overp_segs;
+ int dirty_count;
+ int node_pages;
+ int meta_pages;
+ int prefree_count;
+ int call_count;
+ int tot_segs;
+ int node_segs;
+ int data_segs;
+ int free_segs;
+ int free_secs;
+ int tot_blks;
+ int data_blks;
+ int node_blks;
+ int curseg[6];
+ int cursec[6];
+ int curzone[6];
+
+ unsigned int segment_count[2];
+ unsigned int block_count[2];
+};
+
+#define GC_STAT_I(gi) ((gi)->stat_info)
+
+#define gc_stat_inc_call_count(gi) ((GC_STAT_I(gi))->call_count++)
+
+#define gc_stat_inc_seg_count(sbi, type) \
+ do { \
+ struct f2fs_gc_info *gi = sbi->gc_info; \
+ GC_STAT_I(gi)->tot_segs++; \
+ if (type == SUM_TYPE_DATA) \
+ GC_STAT_I(gi)->data_segs++; \
+ else \
+ GC_STAT_I(gi)->node_segs++; \
+ } while (0)
+
+#define gc_stat_inc_tot_blk_count(gi, blks) \
+ ((GC_STAT_I(gi)->tot_blks) += (blks))
+
+#define gc_stat_inc_data_blk_count(sbi, blks) \
+ do { \
+ struct f2fs_gc_info *gi = sbi->gc_info; \
+ gc_stat_inc_tot_blk_count(gi, blks); \
+ GC_STAT_I(gi)->data_blks += (blks); \
+ } while (0)
+
+#define gc_stat_inc_node_blk_count(sbi, blks) \
+ do { \
+ struct f2fs_gc_info *gi = sbi->gc_info; \
+ gc_stat_inc_tot_blk_count(gi, blks); \
+ GC_STAT_I(gi)->node_blks += (blks); \
+ } while (0)
+
+#else
+#define gc_stat_inc_call_count(gi)
+#define gc_stat_inc_seg_count(gi, type)
+#define gc_stat_inc_tot_blk_count(gi, blks)
+#define gc_stat_inc_data_blk_count(gi, blks)
+#define gc_stat_inc_node_blk_count(sbi, blks)
+#endif
+
+struct f2fs_gc_kthread {
+ struct task_struct *f2fs_gc_task;
+ wait_queue_head_t gc_wait_queue_head;
+};
+
+struct inode_entry {
+ struct list_head list;
+ struct inode *inode;
+};
+
+/**
+ * inline functions
+ */
+static inline block_t free_user_blocks(struct f2fs_sb_info *sbi)
+{
+ if (free_segments(sbi) < overprovision_segments(sbi))
+ return 0;
+ else
+ return (free_segments(sbi) - overprovision_segments(sbi))
+ << sbi->log_blocks_per_seg;
+}
+
+static inline block_t limit_invalid_user_blocks(struct f2fs_sb_info *sbi)
+{
+ return (long)(sbi->user_block_count * LIMIT_INVALID_BLOCK) / 100;
+}
+
+static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi)
+{
+ block_t reclaimable_user_blocks = sbi->user_block_count -
+ written_block_count(sbi);
+ return (long)(reclaimable_user_blocks * LIMIT_FREE_BLOCK) / 100;
+}
+
+static inline long increase_sleep_time(long wait)
+{
+ wait += GC_THREAD_MIN_SLEEP_TIME;
+ if (wait > GC_THREAD_MAX_SLEEP_TIME)
+ wait = GC_THREAD_MAX_SLEEP_TIME;
+ return wait;
+}
+
+static inline long decrease_sleep_time(long wait)
+{
+ wait -= GC_THREAD_MIN_SLEEP_TIME;
+ if (wait <= GC_THREAD_MIN_SLEEP_TIME)
+ wait = GC_THREAD_MIN_SLEEP_TIME;
+ return wait;
+}
+
+static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi)
+{
+ block_t invalid_user_blocks = sbi->user_block_count -
+ written_block_count(sbi);
+ /*
+ * Background GC is triggered with the following condition.
+ * 1. There are a number of invalid blocks.
+ * 2. There is not enough free space.
+ */
+ if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
+ free_user_blocks(sbi) < limit_free_user_blocks(sbi))
+ return true;
+ return false;
+}
+
+static inline int is_idle(struct f2fs_sb_info *sbi)
+{
+ struct block_device *bdev = sbi->sb->s_bdev;
+ struct request_queue *q = bdev_get_queue(bdev);
+ struct request_list *rl = &q->root_rl;
+ return !(rl->count[BLK_RW_SYNC]) && !(rl->count[BLK_RW_ASYNC]);
+}
+
+static bool should_do_checkpoint(struct f2fs_sb_info *sbi)
+{
+ unsigned int pages_per_sec = sbi->segs_per_sec *
+ (1 << sbi->log_blocks_per_seg);
+ int node_secs = ((get_pages(sbi, F2FS_DIRTY_NODES) + pages_per_sec - 1)
+ >> sbi->log_blocks_per_seg) / sbi->segs_per_sec;
+ int dent_secs = ((get_pages(sbi, F2FS_DIRTY_DENTS) + pages_per_sec - 1)
+ >> sbi->log_blocks_per_seg) / sbi->segs_per_sec;
+ return free_sections(sbi) <= (node_secs + 2 * dent_secs + 2);
+}
--
1.7.9.5
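
The section arithmetic in should_do_checkpoint() above can be checked in user space with a small sketch. This is only an illustration: pages_to_sections() is a hypothetical helper name, and the dirty page counts are passed in as plain parameters instead of being read through get_pages(). The rounding expression and the "node_secs + 2 * dent_secs + 2" threshold are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: sections needed to hold `pages` pages, rounded up. */
static unsigned int pages_to_sections(unsigned int pages,
                                      unsigned int log_blocks_per_seg,
                                      unsigned int segs_per_sec)
{
    unsigned int pages_per_sec;

    assert(segs_per_sec > 0);
    pages_per_sec = segs_per_sec << log_blocks_per_seg;

    /* Same expression as in gc.h: round up to whole sections. */
    return ((pages + pages_per_sec - 1) >> log_blocks_per_seg) / segs_per_sec;
}

/* Mirrors should_do_checkpoint(), with the inputs made explicit. */
static bool should_do_checkpoint(unsigned int free_sections,
                                 unsigned int dirty_nodes,
                                 unsigned int dirty_dents,
                                 unsigned int log_blocks_per_seg,
                                 unsigned int segs_per_sec)
{
    unsigned int node_secs = pages_to_sections(dirty_nodes,
                                               log_blocks_per_seg,
                                               segs_per_sec);
    unsigned int dent_secs = pages_to_sections(dirty_dents,
                                               log_blocks_per_seg,
                                               segs_per_sec);

    return free_sections <= node_secs + 2 * dent_secs + 2;
}
```

For example, assuming 512 blocks per segment (log_blocks_per_seg = 9) and one segment per section, 1000 dirty node pages and 600 dirty dentry pages each round up to 2 sections, so the threshold is 2 + 2 * 2 + 2 = 8 free sections.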




---
Jaegeuk Kim
Samsung
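
As a rough illustration of the background GC policy in gc.h above, the following user-space sketch models has_enough_invalid_blocks() and the two sleep-time helpers. The struct gc_model and its fields are hypothetical stand-ins for the values the kernel reads from struct f2fs_sb_info; only the constants and the arithmetic come from the patch. Note that the value gc.h calls invalid_user_blocks is user_block_count minus the written blocks, so it covers invalid plus free space.

```c
#include <assert.h>
#include <stdbool.h>

#define LIMIT_INVALID_BLOCK      40     /* percentage over total user space */
#define LIMIT_FREE_BLOCK         40     /* percentage over invalid + free space */
#define GC_THREAD_MIN_SLEEP_TIME 10000  /* milliseconds */
#define GC_THREAD_MAX_SLEEP_TIME 30000

/* Hypothetical stand-in for the f2fs_sb_info state used by the policy. */
struct gc_model {
    unsigned long user_block_count;     /* blocks available to users */
    unsigned long written_block_count;  /* blocks holding valid data */
    unsigned long free_user_blocks;     /* free blocks beyond overprovision */
};

/* Background GC runs when much space is not valid and little of it is free. */
static bool has_enough_invalid_blocks(const struct gc_model *m)
{
    unsigned long reclaimable, limit_invalid, limit_free;

    assert(m->written_block_count <= m->user_block_count);
    reclaimable = m->user_block_count - m->written_block_count;
    limit_invalid = m->user_block_count * LIMIT_INVALID_BLOCK / 100;
    limit_free = reclaimable * LIMIT_FREE_BLOCK / 100;

    return reclaimable > limit_invalid && m->free_user_blocks < limit_free;
}

/* Back off by one step, capped at the maximum sleep time. */
static long increase_sleep_time(long wait)
{
    wait += GC_THREAD_MIN_SLEEP_TIME;
    if (wait > GC_THREAD_MAX_SLEEP_TIME)
        wait = GC_THREAD_MAX_SLEEP_TIME;
    return wait;
}

/* Speed up by one step, never dropping below the minimum sleep time. */
static long decrease_sleep_time(long wait)
{
    wait -= GC_THREAD_MIN_SLEEP_TIME;
    if (wait <= GC_THREAD_MIN_SLEEP_TIME)
        wait = GC_THREAD_MIN_SLEEP_TIME;
    return wait;
}
```

With user_block_count = 1000, written_block_count = 500 and free_user_blocks = 100, the two limits are 400 and 200 blocks, so background GC would be triggered; raising free_user_blocks to 300 turns it off. The sleep time moves in 10-second steps between 10 and 30 seconds.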


2012-10-23 02:33:00

by Jaegeuk Kim

Subject: [PATCH 15/16 v2] f2fs: add recovery routines for roll-forward

This adds roll-forward routines to recover fsynced data.

- F2FS basically uses a roll-back model with checkpointing.

- There are two approaches to implementing fsync():

1. A roll-back model with checkpointing at every fsync()
: This is a naive method that suffers from very low performance.

2. A roll-forward model
: F2FS adopts this model, in which all data fsynced after the last checkpoint
must be recovered. To identify that data, F2FS keeps an "fsync" mark in direct
node blocks. In addition, each direct node block records the location of the
next node block so that the chain of node blocks can be reconstructed during
recovery.

- To enhance performance, F2FS also keeps a "dentry" mark in direct node
blocks. If it is set during recovery, F2FS replays adding the dentry.

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/recovery.c | 375 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 375 insertions(+)
create mode 100644 fs/f2fs/recovery.c

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
new file mode 100644
index 0000000..59b6331
--- /dev/null
+++ b/fs/f2fs/recovery.c
@@ -0,0 +1,375 @@
+/**
+ * fs/f2fs/recovery.c
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/f2fs_fs.h>
+#include "f2fs.h"
+#include "node.h"
+#include "segment.h"
+
+static struct kmem_cache *fsync_entry_slab;
+
+bool space_for_roll_forward(struct f2fs_sb_info *sbi)
+{
+ if (sbi->last_valid_block_count + sbi->alloc_valid_block_count
+ > sbi->user_block_count)
+ return false;
+ return true;
+}
+
+static struct fsync_inode_entry *get_fsync_inode(struct list_head *head,
+ nid_t ino)
+{
+ struct list_head *this;
+ struct fsync_inode_entry *entry;
+
+ list_for_each(this, head) {
+ entry = list_entry(this, struct fsync_inode_entry, list);
+ if (entry->inode->i_ino == ino)
+ return entry;
+ }
+ return NULL;
+}
+
+static int recover_dentry(struct page *ipage, struct inode *inode)
+{
+ struct f2fs_node *raw_node = (struct f2fs_node *)kmap(ipage);
+ struct f2fs_inode *raw_inode = &(raw_node->i);
+ struct dentry dent, parent;
+ struct f2fs_dir_entry *de;
+ struct page *page;
+ struct inode *dir;
+ int err = 0;
+
+ if (!is_dent_dnode(ipage))
+ goto out;
+
+ dir = f2fs_iget(inode->i_sb, le32_to_cpu(raw_inode->i_pino));
+ if (IS_ERR(dir)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ parent.d_inode = dir;
+ dent.d_parent = &parent;
+ dent.d_name.len = le32_to_cpu(raw_inode->i_namelen);
+ dent.d_name.name = raw_inode->i_name;
+
+ de = f2fs_find_entry(dir, &dent.d_name, &page);
+ if (de) {
+ kunmap(page);
+ f2fs_put_page(page, 0);
+ } else {
+ f2fs_add_link(&dent, inode);
+ }
+ iput(dir);
+out:
+ kunmap(ipage);
+ return err;
+}
+
+static int recover_inode(struct inode *inode, struct page *node_page)
+{
+ void *kaddr = page_address(node_page);
+ struct f2fs_node *raw_node = (struct f2fs_node *)kaddr;
+ struct f2fs_inode *raw_inode = &(raw_node->i);
+
+ inode->i_mode = le32_to_cpu(raw_inode->i_mode);
+ i_size_write(inode, le64_to_cpu(raw_inode->i_size));
+ inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
+ inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
+ inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
+ inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+ inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
+ inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+
+ return recover_dentry(node_page, inode);
+}
+
+static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
+{
+ unsigned long long cp_ver = le64_to_cpu(sbi->ckpt->checkpoint_ver);
+ struct curseg_info *curseg;
+ struct page *page;
+ block_t blkaddr;
+ int err = 0;
+
+ /* get node pages in the current segment */
+ curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
+ blkaddr = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff;
+
+ /* read node page */
+ page = alloc_page(GFP_NOFS | __GFP_ZERO);
+ if (!page)
+ return -ENOMEM;
+ lock_page(page);
+
+ while (1) {
+ struct fsync_inode_entry *entry;
+
+ if (f2fs_readpage(sbi, page, blkaddr, READ_SYNC))
+ goto out;
+
+ if (cp_ver != cpver_of_node(page))
+ goto out;
+
+ if (!is_fsync_dnode(page))
+ goto next;
+
+ entry = get_fsync_inode(head, ino_of_node(page));
+ if (entry) {
+ entry->blkaddr = blkaddr;
+ if (IS_INODE(page) && is_dent_dnode(page))
+ set_inode_flag(F2FS_I(entry->inode),
+ FI_INC_LINK);
+ } else {
+ if (IS_INODE(page) && is_dent_dnode(page)) {
+ if (recover_inode_page(sbi, page)) {
+ err = -ENOMEM;
+ goto out;
+ }
+ }
+
+ /* add this fsync inode to the list */
+ entry = kmem_cache_alloc(fsync_entry_slab, GFP_NOFS);
+ if (!entry) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ INIT_LIST_HEAD(&entry->list);
+ list_add_tail(&entry->list, head);
+
+ entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
+ if (IS_ERR(entry->inode)) {
+ err = PTR_ERR(entry->inode);
+ goto out;
+ }
+ entry->blkaddr = blkaddr;
+ }
+ if (IS_INODE(page)) {
+ err = recover_inode(entry->inode, page);
+ if (err)
+ goto out;
+ }
+next:
+ /* check next segment */
+ blkaddr = next_blkaddr_of_node(page);
+ ClearPageUptodate(page);
+ }
+out:
+ unlock_page(page);
+ __free_pages(page, 0);
+ return err;
+}
+
+static void destroy_fsync_dnodes(struct f2fs_sb_info *sbi,
+ struct list_head *head)
+{
+ struct list_head *this, *next;
+ struct fsync_inode_entry *entry;
+ list_for_each_safe(this, next, head) {
+ entry = list_entry(this, struct fsync_inode_entry, list);
+ iput(entry->inode);
+ list_del(&entry->list);
+ kmem_cache_free(fsync_entry_slab, entry);
+ }
+}
+
+static void check_index_in_prev_nodes(struct f2fs_sb_info *sbi,
+ block_t blkaddr)
+{
+ struct seg_entry *sentry;
+ unsigned int segno = GET_SEGNO(sbi, blkaddr);
+ unsigned short blkoff = GET_SEGOFF_FROM_SEG0(sbi, blkaddr) &
+ (sbi->blocks_per_seg - 1);
+ struct f2fs_summary sum;
+ nid_t ino;
+ void *kaddr;
+ struct inode *inode;
+ struct page *node_page;
+ block_t bidx;
+ int i;
+
+ sentry = get_seg_entry(sbi, segno);
+ if (!f2fs_test_bit(blkoff, sentry->cur_valid_map))
+ return;
+
+ /* Get the previous summary */
+ for (i = CURSEG_WARM_DATA; i <= CURSEG_COLD_DATA; i++) {
+ struct curseg_info *curseg = CURSEG_I(sbi, i);
+ if (curseg->segno == segno) {
+ sum = curseg->sum_blk->entries[blkoff];
+ break;
+ }
+ }
+ if (i > CURSEG_COLD_DATA) {
+ struct page *sum_page = get_sum_page(sbi, segno);
+ struct f2fs_summary_block *sum_node;
+ kaddr = page_address(sum_page);
+ sum_node = (struct f2fs_summary_block *)kaddr;
+ sum = sum_node->entries[blkoff];
+ f2fs_put_page(sum_page, 1);
+ }
+
+ /* Get the node page */
+ node_page = get_node_page(sbi, le32_to_cpu(sum.nid));
+ bidx = start_bidx_of_node(ofs_of_node(node_page)) +
+ le16_to_cpu(sum.ofs_in_node);
+ ino = ino_of_node(node_page);
+ f2fs_put_page(node_page, 1);
+
+ /* Deallocate previous index in the node page */
+ inode = f2fs_iget_nowait(sbi->sb, ino);
+ truncate_hole(inode, bidx, bidx + 1);
+ iput(inode);
+}
+
+static void do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
+ struct page *page, block_t blkaddr)
+{
+ unsigned int start, end;
+ struct dnode_of_data dn;
+ struct f2fs_summary sum;
+ struct node_info ni;
+
+ start = start_bidx_of_node(ofs_of_node(page));
+ if (IS_INODE(page))
+ end = start + ADDRS_PER_INODE;
+ else
+ end = start + ADDRS_PER_BLOCK;
+
+ set_new_dnode(&dn, inode, NULL, NULL, 0);
+ if (get_dnode_of_data(&dn, start, 0))
+ return;
+
+ wait_on_page_writeback(dn.node_page);
+
+ get_node_info(sbi, dn.nid, &ni);
+ BUG_ON(ni.ino != ino_of_node(page));
+ BUG_ON(ofs_of_node(dn.node_page) != ofs_of_node(page));
+
+ for (; start < end; start++) {
+ block_t src, dest;
+
+ src = datablock_addr(dn.node_page, dn.ofs_in_node);
+ dest = datablock_addr(page, dn.ofs_in_node);
+
+ if (src != dest && dest != NEW_ADDR && dest != NULL_ADDR) {
+ if (src == NULL_ADDR) {
+ int err = reserve_new_block(&dn);
+ /* We should not get -ENOSPC */
+ BUG_ON(err);
+ }
+
+ /* Check the previous node page having this index */
+ check_index_in_prev_nodes(sbi, dest);
+
+ set_summary(&sum, dn.nid, dn.ofs_in_node, ni.version);
+
+ /* write dummy data page */
+ recover_data_page(sbi, NULL, &sum, src, dest);
+ update_extent_cache(dest, &dn);
+ }
+ dn.ofs_in_node++;
+ }
+
+ /* write node page in place */
+ set_summary(&sum, dn.nid, 0, 0);
+ if (IS_INODE(dn.node_page))
+ sync_inode_page(&dn);
+
+ copy_node_footer(dn.node_page, page);
+ fill_node_footer(dn.node_page, dn.nid, ni.ino,
+ ofs_of_node(page), false);
+ set_page_dirty(dn.node_page);
+
+ recover_node_page(sbi, dn.node_page, &sum, &ni, blkaddr);
+ f2fs_put_dnode(&dn);
+}
+
+static void recover_data(struct f2fs_sb_info *sbi,
+ struct list_head *head, int type)
+{
+ unsigned long long cp_ver = le64_to_cpu(sbi->ckpt->checkpoint_ver);
+ struct curseg_info *curseg;
+ struct page *page;
+ block_t blkaddr;
+
+ /* get node pages in the current segment */
+ curseg = CURSEG_I(sbi, type);
+ blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
+
+ /* read node page */
+ page = alloc_page(GFP_NOFS | __GFP_ZERO);
+ if (!page)
+ return;
+ lock_page(page);
+
+ while (1) {
+ struct fsync_inode_entry *entry;
+
+ if (f2fs_readpage(sbi, page, blkaddr, READ_SYNC))
+ goto out;
+
+ if (cp_ver != cpver_of_node(page))
+ goto out;
+
+ entry = get_fsync_inode(head, ino_of_node(page));
+ if (!entry)
+ goto next;
+
+ do_recover_data(sbi, entry->inode, page, blkaddr);
+
+ if (entry->blkaddr == blkaddr) {
+ iput(entry->inode);
+ list_del(&entry->list);
+ kmem_cache_free(fsync_entry_slab, entry);
+ }
+next:
+ /* check next segment */
+ blkaddr = next_blkaddr_of_node(page);
+ ClearPageUptodate(page);
+ }
+out:
+ unlock_page(page);
+ __free_pages(page, 0);
+
+ allocate_new_segments(sbi);
+}
+
+void recover_fsync_data(struct f2fs_sb_info *sbi)
+{
+ struct list_head inode_list;
+
+ fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
+ sizeof(struct fsync_inode_entry), NULL);
+ if (unlikely(!fsync_entry_slab))
+ return;
+
+ INIT_LIST_HEAD(&inode_list);
+
+ /* step #1: find fsynced inode numbers */
+ if (find_fsync_dnodes(sbi, &inode_list))
+ goto out;
+
+ if (list_empty(&inode_list))
+ goto out;
+
+ /* step #2: recover data */
+ sbi->por_doing = 1;
+ recover_data(sbi, &inode_list, CURSEG_WARM_NODE);
+ sbi->por_doing = 0;
+ BUG_ON(!list_empty(&inode_list));
+out:
+ destroy_fsync_dnodes(sbi, &inode_list);
+ kmem_cache_destroy(fsync_entry_slab);
+ write_checkpoint(sbi, false, false);
+}
--
1.7.9.5




---
Jaegeuk Kim
Samsung
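
The scan performed by find_fsync_dnodes() in the patch above can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel code: struct node_block, struct fsync_inode and find_fsync_inodes() are hypothetical names, the "disk" is an in-memory array indexed by block address, and the step bound is added here only to guard against a cyclic chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a direct node block footer. */
struct node_block {
    unsigned long long cp_ver; /* checkpoint version the block was written under */
    unsigned int ino;          /* owning inode number */
    int fsync_mark;            /* non-zero if the block was written by fsync() */
    size_t next_blkaddr;       /* address of the next node block in the chain */
};

/* One entry per inode that has at least one fsync-marked node block. */
struct fsync_inode {
    unsigned int ino;
    size_t last_blkaddr;       /* address of its last fsync-marked block */
};

/* Walk the node chain from `start` and collect the fsynced inodes. */
static size_t find_fsync_inodes(const struct node_block *disk, size_t nblocks,
                                size_t start, unsigned long long cp_ver,
                                struct fsync_inode *out, size_t max_out)
{
    size_t found = 0, blkaddr = start, steps;

    assert(disk != NULL && out != NULL);

    for (steps = 0; steps < nblocks && blkaddr < nblocks; steps++) {
        const struct node_block *nb = &disk[blkaddr];
        size_t i;

        /* A block from another checkpoint version ends the chain. */
        if (nb->cp_ver != cp_ver)
            break;

        if (nb->fsync_mark) {
            /* Look for an existing entry for this inode. */
            for (i = 0; i < found; i++)
                if (out[i].ino == nb->ino)
                    break;
            if (i == found) {
                if (found == max_out)
                    break;     /* no room left; stop scanning */
                out[found++].ino = nb->ino;
            }
            out[i].last_blkaddr = blkaddr;
        }
        blkaddr = nb->next_blkaddr;
    }
    return found;
}
```

In the patch, recover_data() then walks the same chain a second time and drops an inode's entry once it reaches the recorded block address (the entry->blkaddr == blkaddr test), which is why the last fsync-marked address is kept per inode.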


2012-10-23 02:33:32

by Jaegeuk Kim

Subject: [PATCH 16/16 v2] f2fs: update Kconfig and Makefile

This adds Makefile and Kconfig for f2fs, and updates Makefile and Kconfig files
in the fs directory.

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/f2fs/Kconfig | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/Makefile | 6 ++++++
4 files changed, 63 insertions(+)
create mode 100644 fs/f2fs/Kconfig
create mode 100644 fs/f2fs/Makefile

diff --git a/fs/Kconfig b/fs/Kconfig
index f95ae3a..e352b37 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -220,6 +220,7 @@ source "fs/pstore/Kconfig"
source "fs/sysv/Kconfig"
source "fs/ufs/Kconfig"
source "fs/exofs/Kconfig"
+source "fs/f2fs/Kconfig"

endif # MISC_FILESYSTEMS

diff --git a/fs/Makefile b/fs/Makefile
index 1d7af79..9d53192 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -123,6 +123,7 @@ obj-$(CONFIG_DEBUG_FS) += debugfs/
obj-$(CONFIG_OCFS2_FS) += ocfs2/
obj-$(CONFIG_BTRFS_FS) += btrfs/
obj-$(CONFIG_GFS2_FS) += gfs2/
+obj-$(CONFIG_F2FS_FS) += f2fs/
obj-y += exofs/ # Multiple modules
obj-$(CONFIG_CEPH_FS) += ceph/
obj-$(CONFIG_PSTORE) += pstore/
diff --git a/fs/f2fs/Kconfig b/fs/f2fs/Kconfig
new file mode 100644
index 0000000..8821c6a
--- /dev/null
+++ b/fs/f2fs/Kconfig
@@ -0,0 +1,55 @@
+config F2FS_FS
+ tristate "F2FS filesystem support (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ help
+ F2FS is based on Log-structured File System (LFS), which supports
+ versatile "flash-friendly" features. The design has been focused on
+ addressing the fundamental issues in LFS, which are snowball effect
+ of wandering tree and high cleaning overhead.
+
+ Since flash-based storages show different characteristics according to
+ the internal geometry or flash memory management schemes aka FTL, F2FS
+ and tools support various parameters not only for configuring on-disk
+ layout, but also for selecting allocation and cleaning algorithms.
+
+ If unsure, say N.
+
+config F2FS_STAT_FS
+ bool "F2FS Status Information"
+ depends on F2FS_FS
+ default y
+ help
+ /proc/fs/f2fs/ contains information about partitions mounted as f2fs.
+ For each partition, a corresponding directory, named as its device
+ name, is provided with the following proc entries.
+
+ f2fs_stat major file system information managed by f2fs currently
+ f2fs_sit_stat average SIT information about whole segments
+ f2fs_mem_stat current memory footprint consumed by f2fs
+
+ e.g., in /proc/fs/f2fs/sdb1/
+
+config F2FS_FS_XATTR
+ bool "F2FS extended attributes"
+ depends on F2FS_FS
+ default y
+ help
+ Extended attributes are name:value pairs associated with inodes by
+ the kernel or by users (see the attr(5) manual page, or visit
+ <http://acl.bestbits.at/> for details).
+
+ If unsure, say N.
+
+config F2FS_FS_POSIX_ACL
+ bool "F2FS Access Control Lists"
+ depends on F2FS_FS_XATTR
+ select FS_POSIX_ACL
+ default y
+ help
+ POSIX Access Control Lists (ACLs) support permissions for users and
+ groups beyond the owner/group/world scheme.
+
+ To learn more about Access Control Lists, visit the POSIX ACLs for
+ Linux website <http://acl.bestbits.at/>.
+
+ If you don't know what Access Control Lists are, say N.
diff --git a/fs/f2fs/Makefile b/fs/f2fs/Makefile
new file mode 100644
index 0000000..72fcf9a
--- /dev/null
+++ b/fs/f2fs/Makefile
@@ -0,0 +1,6 @@
+obj-$(CONFIG_F2FS_FS) += f2fs.o
+
+f2fs-y := dir.o file.o inode.o namei.o hash.o super.o
+f2fs-y += checkpoint.o gc.o data.o node.o segment.o recovery.o
+f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
+f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
--
1.7.9.5




---
Jaegeuk Kim
Samsung


2012-10-23 03:02:36

by Max Filippov

Subject: Re: [PATCH 07/16 v2] f2fs: add segment operations

Hi.

Building f2fs for ARM gives the following error:

CC fs/f2fs/segment.o
CC fs/f2fs/recovery.o
fs/f2fs/segment.c: In function 'build_sit_info':
fs/f2fs/segment.c:1399:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
fs/f2fs/segment.c:1399:18: warning: assignment makes pointer from integer without a cast [enabled by default]
fs/f2fs/segment.c:1419:22: warning: assignment makes pointer from integer without a cast [enabled by default]
fs/f2fs/segment.c: In function 'destroy_sit_info':
fs/f2fs/segment.c:1777:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors

The following fixes that:

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 57d0931..5bab838 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -12,6 +12,7 @@
#include <linux/f2fs_fs.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
+#include <linux/vmalloc.h>

#include "f2fs.h"
#include "segment.h"

--
Thanks.
-- Max

2012-10-23 03:03:41

by Greg Kroah-Hartman

Subject: Re: [PATCH 16/16 v2] f2fs: update Kconfig and Makefile

On Tue, Oct 23, 2012 at 11:33:28AM +0900, Jaegeuk Kim wrote:
> +config F2FS_STAT_FS
> + bool "F2FS Status Information"
> + depends on F2FS_FS
> + default y
> + help
> + /proc/fs/f2fs/ contains information about partitions mounted as f2fs.
> + For each partition, a corresponding directory, named as its device
> + name, is provided with the following proc entries.
> +
> + f2fs_stat major file system information managed by f2fs currently
> + f2fs_sit_stat average SIT information about whole segments
> + f2fs_mem_stat current memory footprint consumed by f2fs
> +
> + e.g., in /proc/fs/f2fs/sdb1/

Again, I will point out that this should either be in debugfs, or in
/sys/fs/ but it should NOT be in /proc/ at all.

If you need help, I will be glad to do this conversion, just let me know
and I'll send you a patch on top of this series that moves these entries
into debugfs, which is where I think they really belong.

thanks,

greg k-h

2012-10-23 03:21:27

by Jaegeuk Kim

Subject: RE: [PATCH 16/16 v2] f2fs: update Kconfig and Makefile

> On Tue, Oct 23, 2012 at 11:33:28AM +0900, Jaegeuk Kim wrote:
> > +config F2FS_STAT_FS
> > + bool "F2FS Status Information"
> > + depends on F2FS_FS
> > + default y
> > + help
> > + /proc/fs/f2fs/ contains information about partitions mounted as f2fs.
> > + For each partition, a corresponding directory, named as its device
> > + name, is provided with the following proc entries.
> > +
> > + f2fs_stat major file system information managed by f2fs currently
> > + f2fs_sit_stat average SIT information about whole segments
> > + f2fs_mem_stat current memory footprint consumed by f2fs
> > +
> > + e.g., in /proc/fs/f2fs/sdb1/
>
> Again, I will point out that this should either be in debugfs, or in
> /sys/fs/ but it should NOT be in /proc/ at all.
>
> If you need help, I will be glad to do this conversion, just let me know
> and I'll send you a patch on top of this series that moves these entries
> into debugfs, which is where I think they really belong.

Ok, please. :)
I really appreciate your kindness.

>
> thanks,
>
> greg k-h

2012-10-23 03:23:41

by Jaegeuk Kim

Subject: RE: [PATCH 07/16 v2] f2fs: add segment operations

> Hi.
>
> Building f2fs for ARM gives the following error:
>
> CC fs/f2fs/segment.o
> CC fs/f2fs/recovery.o
> fs/f2fs/segment.c: In function 'build_sit_info':
> fs/f2fs/segment.c:1399:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
> fs/f2fs/segment.c:1399:18: warning: assignment makes pointer from integer without a cast [enabled by default]
> fs/f2fs/segment.c:1419:22: warning: assignment makes pointer from integer without a cast [enabled by default]
> fs/f2fs/segment.c: In function 'destroy_sit_info':
> fs/f2fs/segment.c:1777:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
> cc1: some warnings being treated as errors
>
> The following fixes that:
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 57d0931..5bab838 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -12,6 +12,7 @@
> #include <linux/f2fs_fs.h>
> #include <linux/bio.h>
> #include <linux/blkdev.h>
> +#include <linux/vmalloc.h>
>
> #include "f2fs.h"
> #include "segment.h"
>

Thank you very much.
I'll apply this in v3.

> --
> Thanks.
> -- Max

2012-10-23 03:46:33

by NeilBrown

Subject: Re: [PATCH 02/16 v2] f2fs: add on-disk layout

On Tue, 23 Oct 2012 11:26:00 +0900 Jaegeuk Kim <[email protected]>
wrote:

> This adds a header file describing the on-disk layout of f2fs.
>


> +struct f2fs_inode {
> + __le16 i_mode; /* File mode */
> + __u8 i_advise; /* File hints */
> + __u8 i_reserved; /* Reserved */
> + __le32 i_uid; /* User ID */
> + __le32 i_gid; /* Group ID */
> + __le32 i_links; /* Links count */
> + __le64 i_size; /* File size in bytes */
> + __le64 i_blocks; /* File size in blocks */
> + __le64 i_ctime; /* Inode change time */
> + __le64 i_mtime; /* Modification time */
> + __le32 i_ctime_nsec;
> + __le32 i_mtime_nsec;
> + __le32 current_depth;
> + __le32 i_xattr_nid; /* nid to save xattr */
> + __le32 i_flags; /* file attributes */
> + __le32 i_pino; /* parent inode number */
> + __le32 i_namelen; /* file name length */
> + __u8 i_name[F2FS_MAX_NAME_LEN]; /* file name for SPOR */
> +
> + struct f2fs_extent i_ext; /* caching a largest extent */
> +
> + __le32 i_addr[ADDRS_PER_INODE]; /* Pointers to data blocks */
> +
> + __le32 i_nid[5]; /* direct(2), indirect(2),
> + double_indirect(1) node id */
> +} __packed;
> +


You appear to have dropped i_btime - no big deal, you weren't using it anyway.
However if you ever want to support NFS export you will need some value which
is assigned when the inode is allocated and never changed until it is
de-allocated. This is used to detect when an NFS file-handle refers to a
previous incarnation of an inode and so should be rejected as STALE.
i_btime could have possibly provided this, but not any more. You might want
to add something back.
ext3 uses "i_generation" and has an 's_next_generation' in the superblock to
ensure that each new inode gets a new generation number.
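
The generation check described above can be sketched in user-space C. This is only a model under stated assumptions, not f2fs or ext3 code: the struct and function names below (`model_sb`, `model_inode`, `model_fh`, and the helpers) are hypothetical. Only the idea comes from the text: a per-superblock counter, a value stamped once at inode allocation, and a comparison when a file handle is presented again.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory model; none of these names exist in f2fs. */
struct model_sb {
	uint32_t next_generation;	/* plays the role of s_next_generation */
};

struct model_inode {
	uint32_t ino;
	uint32_t generation;	/* set at allocation, never changed afterwards */
};

struct model_fh {
	uint32_t ino;
	uint32_t generation;
};

/* Allocate (or reuse) inode number 'ino' and stamp a fresh generation. */
void model_alloc_inode(struct model_sb *sb, struct model_inode *inode,
		       uint32_t ino)
{
	inode->ino = ino;
	inode->generation = sb->next_generation++;
}

/* Build the handle a remote client would keep for later requests. */
struct model_fh model_encode_fh(const struct model_inode *inode)
{
	struct model_fh fh = { inode->ino, inode->generation };

	return fh;
}

/* A handle is stale if it names a previous incarnation of the inode. */
bool model_fh_is_stale(const struct model_fh *fh,
		       const struct model_inode *inode)
{
	return fh->ino != inode->ino || fh->generation != inode->generation;
}
```

If inode number 7 is freed and allocated again, the second allocation gets a different generation, so a handle encoded from the first incarnation no longer matches and can be rejected.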

You've also dropped i_atime. I can certainly understand the desire to do
that, but I wonder if it is entirely wise. There are some use-cases where
i_mtime is a poor substitute.

Also 'current_depth' looks a little odd without a 'i_' prefix. It wouldn't
hurt to have a comment noting that it is for directories.

Thanks,
NeilBrown



2012-10-23 06:30:59

by Jaegeuk Kim

Subject: RE: [PATCH 02/16 v2] f2fs: add on-disk layout

> -----Original Message-----
> From: NeilBrown [mailto:[email protected]]
> Sent: Tuesday, October 23, 2012 12:47 PM
> To: Jaegeuk Kim
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 02/16 v2] f2fs: add on-disk layout
> Importance: High
>
> On Tue, 23 Oct 2012 11:26:00 +0900 Jaegeuk Kim <[email protected]>
> wrote:
>
> > This adds a header file describing the on-disk layout of f2fs.
> >
>
>
> > +struct f2fs_inode {
> > + __le16 i_mode; /* File mode */
> > + __u8 i_advise; /* File hints */
> > + __u8 i_reserved; /* Reserved */
> > + __le32 i_uid; /* User ID */
> > + __le32 i_gid; /* Group ID */
> > + __le32 i_links; /* Links count */
> > + __le64 i_size; /* File size in bytes */
> > + __le64 i_blocks; /* File size in blocks */
> > + __le64 i_ctime; /* Inode change time */
> > + __le64 i_mtime; /* Modification time */
> > + __le32 i_ctime_nsec;
> > + __le32 i_mtime_nsec;
> > + __le32 current_depth;
> > + __le32 i_xattr_nid; /* nid to save xattr */
> > + __le32 i_flags; /* file attributes */
> > + __le32 i_pino; /* parent inode number */
> > + __le32 i_namelen; /* file name length */
> > + __u8 i_name[F2FS_MAX_NAME_LEN]; /* file name for SPOR */
> > +
> > + struct f2fs_extent i_ext; /* caching a largest extent */
> > +
> > + __le32 i_addr[ADDRS_PER_INODE]; /* Pointers to data blocks */
> > +
> > + __le32 i_nid[5]; /* direct(2), indirect(2),
> > + double_indirect(1) node id */
> > +} __packed;
> > +
>
>
> You appear to have dropped i_btime - no big deal, you weren't using it anyway.
> However if you ever want to support NFS export you will need some value which
> is assigned when the inode is allocated and never changed until it is
> de-allocated. This is used to detect when an NFS file-handle refers to a
> previous incarnation of an inode and so should be rejected as STALE.
> i_btime could have possibly provided this, but not any more. You might want
> to add something back.
> ext3 uses "i_generation" and has an 's_next_generation' in the superblock to
> ensure that each new inode gets a new generation number.

Agreed. I'll check that.

>
> You've also dropped i_atime. I can certainly understand the desire to do
> that, but I wonder if it is entirely wise. There are some use-cases where
> i_mtime is a poor substitute.

Got it.

>
> Also 'current_depth' looks a little odd without a 'i_' prefix. It wouldn't
> hurt to have a comment noting that it is for directories.

Agreed.
Thank you for the comments. :)

>
> Thanks,
> NeilBrown


---
Jaegeuk Kim
Samsung

2012-10-23 06:47:21

by Marco Stornelli

Subject: Re: [PATCH 02/16 v2] f2fs: add on-disk layout

2012/10/23 Jaegeuk Kim <[email protected]>:
> This adds a header file describing the on-disk layout of f2fs.
>
> Signed-off-by: Changman Lee <[email protected]>
> Signed-off-by: Chul Lee <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> include/linux/f2fs_fs.h | 362 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 362 insertions(+)
> create mode 100644 include/linux/f2fs_fs.h
>
> diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
> new file mode 100644
> index 0000000..bd9c217
> --- /dev/null
> +++ b/include/linux/f2fs_fs.h
> @@ -0,0 +1,362 @@
> +/**
> + * include/linux/f2fs_fs.h

Is this file used by user space?

> + *
> + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com/
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef _LINUX_F2FS_FS_H
> +#define _LINUX_F2FS_FS_H
> +
> +#include <linux/pagemap.h>
> +#include <linux/types.h>
> +
> +#define F2FS_SUPER_MAGIC 0xF2F52010

In magic.h please.
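
One reason a shared magic definition is useful: f2fs_statfs() in this series reports the magic in f_type, so user space can compare against it. A minimal sketch, assuming a Linux system with statfs(2); the helper names `fs_type_is_f2fs` and `path_is_on_f2fs` are hypothetical, and the fallback define simply repeats the value from the patch in case the installed magic.h does not carry it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/vfs.h>

/* Value taken from the patch; used only if no header defined it already. */
#ifndef F2FS_SUPER_MAGIC
#define F2FS_SUPER_MAGIC 0xF2F52010
#endif

/* Compare only the low 32 bits so sign extension of f_type cannot matter. */
bool fs_type_is_f2fs(long f_type)
{
	return (uint32_t)f_type == (uint32_t)F2FS_SUPER_MAGIC;
}

/* Returns false both for non-f2fs paths and when statfs() itself fails. */
bool path_is_on_f2fs(const char *path)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return false;
	return fs_type_is_f2fs((long)st.f_type);
}
```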

Marco

2012-10-23 06:51:27

by Marco Stornelli

Subject: Re: [PATCH 04/16 v2] f2fs: add super block operations

2012/10/23 Jaegeuk Kim <[email protected]>:
> This adds the implementation of superblock operations for f2fs, which includes
> - init_f2fs_fs/exit_f2fs_fs
> - f2fs_mount
> - super_operations of f2fs
>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/super.c | 590 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 590 insertions(+)
> create mode 100644 fs/f2fs/super.c
>
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> new file mode 100644
> index 0000000..8e608a0
> --- /dev/null
> +++ b/fs/f2fs/super.c
> @@ -0,0 +1,590 @@
> +/**
> + * fs/f2fs/super.c
> + *
> + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com/
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/fs.h>
> +#include <linux/statfs.h>
> +#include <linux/proc_fs.h>
> +#include <linux/buffer_head.h>
> +#include <linux/backing-dev.h>
> +#include <linux/kthread.h>
> +#include <linux/parser.h>
> +#include <linux/mount.h>
> +#include <linux/seq_file.h>
> +#include <linux/f2fs_fs.h>
> +
> +#include "f2fs.h"
> +#include "node.h"
> +#include "xattr.h"
> +
> +static struct kmem_cache *f2fs_inode_cachep;
> +static struct proc_dir_entry *f2fs_proc_root;
> +
> +enum {
> + Opt_gc_background_off,
> + Opt_disable_roll_forward,
> + Opt_discard,
> + Opt_noheap,
> + Opt_nouser_xattr,
> + Opt_noacl,
> + Opt_active_logs,
> + Opt_disable_ext_identify,
> + Opt_err,
> +};
> +
> +static match_table_t f2fs_tokens = {
> + {Opt_gc_background_off, "background_gc_off"},
> + {Opt_disable_roll_forward, "disable_roll_forward"},
> + {Opt_discard, "discard"},
> + {Opt_noheap, "no_heap"},
> + {Opt_nouser_xattr, "nouser_xattr"},
> + {Opt_noacl, "noacl"},
> + {Opt_active_logs, "active_logs=%u"},
> + {Opt_disable_ext_identify, "disable_ext_identify"},
> + {Opt_err, NULL},
> +};
> +
> +static void init_once(void *foo)
> +{
> + struct f2fs_inode_info *fi = (struct f2fs_inode_info *) foo;
> +
> + memset(fi, 0, sizeof(*fi));
> + inode_init_once(&fi->vfs_inode);
> +}
> +
> +static struct inode *f2fs_alloc_inode(struct super_block *sb)
> +{
> + struct f2fs_inode_info *fi;
> +
> + fi = kmem_cache_alloc(f2fs_inode_cachep, GFP_NOFS | __GFP_ZERO);
> + if (!fi)
> + return NULL;
> +
> + init_once((void *) fi);
> +
> + /* Initialize f2fs-specific inode info */
> + fi->vfs_inode.i_version = 1;
> + atomic_set(&fi->dirty_dents, 0);
> + fi->current_depth = 1;
> + fi->i_advise = 0;
> + rwlock_init(&fi->ext.ext_lock);
> +
> + set_inode_flag(fi, FI_NEW_INODE);
> +
> + return &fi->vfs_inode;
> +}
> +
> +static void f2fs_i_callback(struct rcu_head *head)
> +{
> + struct inode *inode = container_of(head, struct inode, i_rcu);
> + kmem_cache_free(f2fs_inode_cachep, F2FS_I(inode));
> +}
> +
> +void f2fs_destroy_inode(struct inode *inode)
> +{
> + call_rcu(&inode->i_rcu, f2fs_i_callback);
> +}
> +
> +static void f2fs_put_super(struct super_block *sb)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(sb);
> +
> +#ifdef CONFIG_F2FS_STAT_FS
> + if (sbi->s_proc) {
> + f2fs_stat_exit(sbi);
> + remove_proc_entry(sb->s_id, f2fs_proc_root);
> + }
> +#endif
> + stop_gc_thread(sbi);
> +
> + write_checkpoint(sbi, false, true);
> +
> + iput(sbi->node_inode);
> + iput(sbi->meta_inode);
> +
> + /* destroy f2fs internal modules */
> + destroy_gc_manager(sbi);
> + destroy_node_manager(sbi);
> + destroy_segment_manager(sbi);
> +
> + kfree(sbi->ckpt);
> +
> + sb->s_fs_info = NULL;
> + brelse(sbi->raw_super_buf);
> + kfree(sbi);
> +}
> +
> +int f2fs_sync_fs(struct super_block *sb, int sync)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(sb);
> + int ret = 0;
> +
> + if (!sbi->s_dirty && !get_pages(sbi, F2FS_DIRTY_NODES))
> + return 0;
> +
> + if (sync)
> + write_checkpoint(sbi, false, false);
> +
> + return ret;
> +}
> +
> +static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
> +{
> + struct super_block *sb = dentry->d_sb;
> + struct f2fs_sb_info *sbi = F2FS_SB(sb);
> + block_t total_count, user_block_count, start_count, ovp_count;
> +
> + total_count = le64_to_cpu(sbi->raw_super->block_count);
> + user_block_count = sbi->user_block_count;
> + start_count = le32_to_cpu(sbi->raw_super->segment0_blkaddr);
> + ovp_count = sbi->gc_info->overp_segment_count
> + << sbi->log_blocks_per_seg;
> + buf->f_type = F2FS_SUPER_MAGIC;
> + buf->f_bsize = sbi->blocksize;
> +
> + buf->f_blocks = total_count - start_count;
> + buf->f_bfree = buf->f_blocks - valid_user_blocks(sbi) - ovp_count;
> + buf->f_bavail = user_block_count - valid_user_blocks(sbi);
> +
> + buf->f_files = valid_inode_count(sbi);
> + buf->f_ffree = sbi->total_node_count - valid_node_count(sbi);
> +
> + buf->f_namelen = F2FS_MAX_NAME_LEN;
> +
> + return 0;
> +}
> +
> +static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(root->d_sb);
> +
> + if (test_opt(sbi, BG_GC))
> + seq_puts(seq, ",background_gc_on");
> + else
> + seq_puts(seq, ",background_gc_off");
> + if (test_opt(sbi, DISABLE_ROLL_FORWARD))
> + seq_puts(seq, ",disable_roll_forward");
> + if (test_opt(sbi, DISCARD))
> + seq_puts(seq, ",discard");
> + if (test_opt(sbi, NOHEAP))
> + seq_puts(seq, ",no_heap_alloc");
> +#ifdef CONFIG_F2FS_FS_XATTR
> + if (test_opt(sbi, XATTR_USER))
> + seq_puts(seq, ",user_xattr");
> + else
> + seq_puts(seq, ",nouser_xattr");
> +#endif
> +#ifdef CONFIG_F2FS_FS_POSIX_ACL
> + if (test_opt(sbi, POSIX_ACL))
> + seq_puts(seq, ",acl");
> + else
> + seq_puts(seq, ",noacl");
> +#endif
> + if (test_opt(sbi, DISABLE_EXT_IDENTIFY))
> + seq_puts(seq, ",disable_ext_identify");
> +
> + seq_printf(seq, ",active_logs=%u", sbi->active_logs);
> +
> + return 0;
> +}
> +
> +static struct super_operations f2fs_sops = {
> + .alloc_inode = f2fs_alloc_inode,
> + .destroy_inode = f2fs_destroy_inode,
> + .write_inode = f2fs_write_inode,
> + .show_options = f2fs_show_options,
> + .evict_inode = f2fs_evict_inode,
> + .put_super = f2fs_put_super,
> + .sync_fs = f2fs_sync_fs,
> + .statfs = f2fs_statfs,
> +};
> +
> +static int parse_options(struct f2fs_sb_info *sbi, char *options)
> +{
> + substring_t args[MAX_OPT_ARGS];
> + char *p;
> + int arg = 0;
> +
> + if (!options)
> + return 0;
> +
> + while ((p = strsep(&options, ",")) != NULL) {
> + int token;
> + if (!*p)
> + continue;
> + /*
> + * Initialize args struct so we know whether arg was
> + * found; some options take optional arguments.
> + */
> + args[0].to = args[0].from = NULL;
> + token = match_token(p, f2fs_tokens, args);
> +
> + switch (token) {
> + case Opt_gc_background_off:
> + clear_opt(sbi, BG_GC);
> + break;
> + case Opt_disable_roll_forward:
> + set_opt(sbi, DISABLE_ROLL_FORWARD);
> + break;
> + case Opt_discard:
> + set_opt(sbi, DISCARD);
> + break;
> + case Opt_noheap:
> + set_opt(sbi, NOHEAP);
> + break;
> +#ifdef CONFIG_F2FS_FS_XATTR
> + case Opt_nouser_xattr:
> + clear_opt(sbi, XATTR_USER);
> + break;
> +#else
> + case Opt_nouser_xattr:
> + pr_info("nouser_xattr options not supported\n");
> + break;
> +#endif
> +#ifdef CONFIG_F2FS_FS_POSIX_ACL
> + case Opt_noacl:
> + clear_opt(sbi, POSIX_ACL);
> + break;
> +#else
> + case Opt_noacl:
> + pr_info("noacl options not supported\n");
> + break;
> +#endif
> + case Opt_active_logs:
> + if (args->from && match_int(args, &arg))
> + return -EINVAL;
> + if (arg != 2 && arg != 4 && arg != 6)
> + return -EINVAL;
> + sbi->active_logs = arg;
> + break;
> + case Opt_disable_ext_identify:
> + set_opt(sbi, DISABLE_EXT_IDENTIFY);
> + break;
> + default:
> + return -EINVAL;
> + }
> + }
> + return 0;
> +}
> +
> +static loff_t max_file_size(unsigned bits)
> +{
> + loff_t result = ADDRS_PER_INODE;
> + loff_t leaf_count = ADDRS_PER_BLOCK;
> +
> + result += (leaf_count * 2);
> +
> + leaf_count *= NIDS_PER_BLOCK;
> + result += (leaf_count * 2);
> +
> + leaf_count *= NIDS_PER_BLOCK;
> + result += (leaf_count * 2);
> +
> + result <<= bits;
> + return result;
> +}
> +
> +static int sanity_check_raw_super(struct f2fs_super_block *raw_super)
> +{
> + unsigned int blocksize;
> +
> + if (F2FS_SUPER_MAGIC != le32_to_cpu(raw_super->magic))
> + return 1;
> +
> + /* Currently, support only 4KB block size */
> + blocksize = 1 << le32_to_cpu(raw_super->log_blocksize);
> + if (blocksize != PAGE_CACHE_SIZE)
> + return 1;
> + if (le32_to_cpu(raw_super->log_sectorsize) != 9)
> + return 1;
> + if (le32_to_cpu(raw_super->log_sectors_per_block) != 3)
> + return 1;
> + return 0;
> +}
> +
> +static int sanity_check_ckpt(struct f2fs_super_block *raw_super,
> + struct f2fs_checkpoint *ckpt)
> +{
> + unsigned int total, fsmeta;
> +
> + total = le32_to_cpu(raw_super->segment_count);
> + fsmeta = le32_to_cpu(raw_super->segment_count_ckpt);
> + fsmeta += le32_to_cpu(raw_super->segment_count_sit);
> + fsmeta += le32_to_cpu(raw_super->segment_count_nat);
> + fsmeta += le32_to_cpu(ckpt->rsvd_segment_count);
> + fsmeta += le32_to_cpu(raw_super->segment_count_ssa);
> +
> + if (fsmeta >= total)
> + return 1;
> + return 0;
> +}
> +
> +static void init_sb_info(struct f2fs_sb_info *sbi)
> +{
> + struct f2fs_super_block *raw_super = sbi->raw_super;
> + int i;
> +
> + sbi->log_sectorsize = le32_to_cpu(raw_super->log_sectorsize);
> + sbi->log_sectors_per_block =
> + le32_to_cpu(raw_super->log_sectors_per_block);
> + sbi->log_blocksize = le32_to_cpu(raw_super->log_blocksize);
> + sbi->blocksize = 1 << sbi->log_blocksize;
> + sbi->log_blocks_per_seg = le32_to_cpu(raw_super->log_blocks_per_seg);
> + sbi->blocks_per_seg = 1 << sbi->log_blocks_per_seg;
> + sbi->segs_per_sec = le32_to_cpu(raw_super->segs_per_sec);
> + sbi->secs_per_zone = le32_to_cpu(raw_super->secs_per_zone);
> + sbi->total_sections = le32_to_cpu(raw_super->section_count);
> + sbi->total_node_count =
> + (le32_to_cpu(raw_super->segment_count_nat) / 2)
> + * sbi->blocks_per_seg * NAT_ENTRY_PER_BLOCK;
> + sbi->root_ino_num = le32_to_cpu(raw_super->root_ino);
> + sbi->node_ino_num = le32_to_cpu(raw_super->node_ino);
> + sbi->meta_ino_num = le32_to_cpu(raw_super->meta_ino);
> +
> + for (i = 0; i < NR_COUNT_TYPE; i++)
> + atomic_set(&sbi->nr_pages[i], 0);
> +}
> +
> +static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> +{
> + struct f2fs_sb_info *sbi;
> + struct f2fs_super_block *raw_super;
> + struct buffer_head *raw_super_buf;
> + struct inode *root;
> + int i;
> +
> + /* allocate memory for f2fs-specific super block info */
> + sbi = kzalloc(sizeof(struct f2fs_sb_info), GFP_KERNEL);
> + if (!sbi)
> + return -ENOMEM;
> +
> + /* set a temporary block size */
> + if (!sb_set_blocksize(sb, F2FS_BLKSIZE))
> + goto free_sbi;
> +
> + /* read f2fs raw super block */
> + raw_super_buf = sb_bread(sb, F2FS_SUPER_OFFSET);
> + if (!raw_super_buf)
> + goto free_sbi;
> + raw_super = (struct f2fs_super_block *) ((char *)raw_super_buf->b_data);
> +
> + /* init some FS parameters */
> + sbi->active_logs = NR_CURSEG_TYPE;
> +
> + set_opt(sbi, BG_GC);
> +
> +#ifdef CONFIG_F2FS_FS_XATTR
> + set_opt(sbi, XATTR_USER);
> +#endif
> +#ifdef CONFIG_F2FS_FS_POSIX_ACL
> + set_opt(sbi, POSIX_ACL);
> +#endif
> + /* parse mount options */
> + if (parse_options(sbi, (char *)data))
> + goto free_sb_buf;
> +
> + /* sanity checking of raw super */
> + if (sanity_check_raw_super(raw_super))
> + goto free_sb_buf;
> +
> + sb->s_maxbytes = max_file_size(raw_super->log_blocksize);
> + sb->s_max_links = F2FS_LINK_MAX;
> +
> + sb->s_op = &f2fs_sops;
> + sb->s_xattr = f2fs_xattr_handlers;
> + sb->s_magic = F2FS_SUPER_MAGIC;
> + sb->s_fs_info = sbi;

and s_time_gran?
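
To illustrate why the field matters for a filesystem that stores nanosecond timestamps, here is a user-space model of granularity truncation in the style of timespec_trunc(), which the file.c patch in this series calls with s_time_gran. It is a sketch, not kernel code: `model_timespec_trunc` and `MODEL_NSEC_PER_SEC` are hypothetical names, and it assumes that a superblock which never sets s_time_gran keeps a one-second default, in which case the nanosecond part is dropped.

```c
#include <time.h>

#define MODEL_NSEC_PER_SEC 1000000000L

/* Round a timestamp down to a multiple of 'gran' nanoseconds. */
struct timespec model_timespec_trunc(struct timespec t, long gran)
{
	if (gran == 1) {
		/* nanosecond granularity: nothing to drop */
	} else if (gran == MODEL_NSEC_PER_SEC) {
		/* one-second granularity: discard all nanoseconds */
		t.tv_nsec = 0;
	} else if (gran > 1 && gran < MODEL_NSEC_PER_SEC) {
		/* anything in between: round the nanosecond part down */
		t.tv_nsec -= t.tv_nsec % gran;
	}
	return t;
}
```

Under that assumption, setting the granularity to 1 in fill_super would be what lets the i_ctime_nsec and i_mtime_nsec fields carry real values.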

Marco

2012-10-23 06:58:59

by Marco Stornelli

Subject: Re: [PATCH 08/16 v2] f2fs: add file operations

2012/10/23 Jaegeuk Kim <[email protected]>:
> This adds memory operations and file/file_inode operations.
>
> - F2FS supports fallocate(), mmap(), fsync(), and basic ioctl().
>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/file.c | 640 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 640 insertions(+)
> create mode 100644 fs/f2fs/file.c
>
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> new file mode 100644
> index 0000000..81b1fd0
> --- /dev/null
> +++ b/fs/f2fs/file.c
> @@ -0,0 +1,640 @@
> +/**
> + * fs/f2fs/file.c
> + *
> + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com/
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#include <linux/fs.h>
> +#include <linux/f2fs_fs.h>
> +#include <linux/stat.h>
> +#include <linux/buffer_head.h>
> +#include <linux/writeback.h>
> +#include <linux/falloc.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <linux/mount.h>
> +
> +#include "f2fs.h"
> +#include "node.h"
> +#include "segment.h"
> +#include "xattr.h"
> +#include "acl.h"
> +
> +static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma,
> + struct vm_fault *vmf)
> +{
> + struct page *page = vmf->page;
> + struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
> + struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
> + struct page *node_page;
> + block_t old_blk_addr;
> + struct dnode_of_data dn;
> + int err;
> +
> + f2fs_balance_fs(sbi);
> +
> + sb_start_pagefault(inode->i_sb);
> +
> + mutex_lock_op(sbi, DATA_NEW);
> +
> + /* block allocation */
> + set_new_dnode(&dn, inode, NULL, NULL, 0);
> + err = get_dnode_of_data(&dn, page->index, 0);
> + if (err) {
> + mutex_unlock_op(sbi, DATA_NEW);
> + goto out;
> + }
> +
> + old_blk_addr = dn.data_blkaddr;
> + node_page = dn.node_page;
> +
> + if (old_blk_addr == NULL_ADDR) {
> + err = reserve_new_block(&dn);
> + if (err) {
> + f2fs_put_dnode(&dn);
> + mutex_unlock_op(sbi, DATA_NEW);
> + goto out;
> + }
> + }
> + f2fs_put_dnode(&dn);
> +
> + mutex_unlock_op(sbi, DATA_NEW);
> +
> + lock_page(page);
> + if (page->mapping != inode->i_mapping ||
> + page_offset(page) >= i_size_read(inode) ||
> + !PageUptodate(page)) {
> + unlock_page(page);
> + err = -EFAULT;
> + goto out;
> + }
> +
> + /*
> + * check to see if the page is mapped already (no holes)
> + */
> + if (PageMappedToDisk(page))
> + goto out;
> +
> + /* fill the page */
> + wait_on_page_writeback(page);
> +
> + /* page is wholly or partially inside EOF */
> + if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode)) {
> + unsigned offset;
> + offset = i_size_read(inode) & ~PAGE_CACHE_MASK;
> + zero_user_segment(page, offset, PAGE_CACHE_SIZE);
> + }
> + set_page_dirty(page);
> + SetPageUptodate(page);
> +
> + file_update_time(vma->vm_file);
> +out:
> + sb_end_pagefault(inode->i_sb);
> + return block_page_mkwrite_return(err);
> +}
> +
> +static const struct vm_operations_struct f2fs_file_vm_ops = {
> + .fault = filemap_fault,
> + .page_mkwrite = f2fs_vm_page_mkwrite,
> +};
> +
> +static int need_to_sync_dir(struct f2fs_sb_info *sbi, struct inode *inode)
> +{
> + struct dentry *dentry;
> + nid_t pino;
> +
> + inode = igrab(inode);
> + dentry = d_find_any_alias(inode);
> + if (!dentry) {
> + iput(inode);
> + return 0;
> + }
> + pino = dentry->d_parent->d_inode->i_ino;
> + dput(dentry);
> + iput(inode);
> + return !is_checkpointed_node(sbi, pino);
> +}
> +
> +int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> +{
> + struct inode *inode = file->f_mapping->host;
> + struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
> + unsigned long long cur_version;
> + int ret = 0;
> + bool need_cp = false;
> + struct writeback_control wbc = {
> + .sync_mode = WB_SYNC_ALL,
> + .nr_to_write = LONG_MAX,
> + .for_reclaim = 0,
> + };
> +
> + ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
> + if (ret)
> + return ret;
> +
> + mutex_lock(&inode->i_mutex);
> +
> + if (inode->i_sb->s_flags & MS_RDONLY)
> + goto out;
> + if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
> + goto out;
> +
> + mutex_lock(&sbi->cp_mutex);
> + cur_version = le64_to_cpu(F2FS_CKPT(sbi)->checkpoint_ver);
> + mutex_unlock(&sbi->cp_mutex);
> +
> + if (F2FS_I(inode)->data_version != cur_version &&
> + !(inode->i_state & I_DIRTY))
> + goto out;
> + F2FS_I(inode)->data_version--;
> +
> + if (!S_ISREG(inode->i_mode) || inode->i_nlink != 1)
> + need_cp = true;
> + if (is_inode_flag_set(F2FS_I(inode), FI_NEED_CP))
> + need_cp = true;
> + if (!space_for_roll_forward(sbi))
> + need_cp = true;
> + if (need_to_sync_dir(sbi, inode))
> + need_cp = true;
> +
> + f2fs_write_inode(inode, NULL);
> +
> + if (need_cp) {
> + /* all the dirty node pages should be flushed for POR */
> + ret = f2fs_sync_fs(inode->i_sb, 1);
> + clear_inode_flag(F2FS_I(inode), FI_NEED_CP);
> + } else {
> + while (sync_node_pages(sbi, inode->i_ino, &wbc) == 0)
> + f2fs_write_inode(inode, NULL);
> + filemap_fdatawait_range(sbi->node_inode->i_mapping,
> + 0, LONG_MAX);
> + }
> +out:
> + mutex_unlock(&inode->i_mutex);
> + return ret;
> +}
> +
> +static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + file_accessed(file);
> + vma->vm_ops = &f2fs_file_vm_ops;
> + return 0;
> +}
> +
> +static int truncate_data_blocks_range(struct dnode_of_data *dn, int count)
> +{
> + int nr_free = 0, ofs = dn->ofs_in_node;
> + struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
> + struct f2fs_node *raw_node;
> + __le32 *addr;
> +
> + raw_node = page_address(dn->node_page);
> + addr = blkaddr_in_node(raw_node) + ofs;
> +
> + for ( ; count > 0; count--, addr++, dn->ofs_in_node++) {
> + block_t blkaddr = le32_to_cpu(*addr);
> + if (blkaddr == NULL_ADDR)
> + continue;
> +
> + update_extent_cache(NULL_ADDR, dn);
> + invalidate_blocks(sbi, blkaddr);
> + dec_valid_block_count(sbi, dn->inode, 1);
> + nr_free++;
> + }
> + if (nr_free) {
> + set_page_dirty(dn->node_page);
> + sync_inode_page(dn);
> + }
> + dn->ofs_in_node = ofs;
> + return nr_free;
> +}
> +
> +void truncate_data_blocks(struct dnode_of_data *dn)
> +{
> + truncate_data_blocks_range(dn, ADDRS_PER_BLOCK);
> +}
> +
> +static void truncate_partial_data_page(struct inode *inode, u64 from)
> +{
> + unsigned offset = from & (PAGE_CACHE_SIZE - 1);
> + struct page *page;
> +
> + if (!offset)
> + return;
> +
> + page = find_data_page(inode, from >> PAGE_CACHE_SHIFT);
> + if (IS_ERR(page))
> + return;
> +
> + lock_page(page);
> + wait_on_page_writeback(page);
> + zero_user(page, offset, PAGE_CACHE_SIZE - offset);
> + set_page_dirty(page);
> + f2fs_put_page(page, 1);
> +}
> +
> +static int truncate_blocks(struct inode *inode, u64 from)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
> + unsigned int blocksize = inode->i_sb->s_blocksize;
> + struct dnode_of_data dn;
> + pgoff_t free_from;
> + int count = 0;
> + int err;
> +
> + free_from = (pgoff_t)
> + ((from + blocksize - 1) >> (sbi->log_blocksize));
> +
> + mutex_lock_op(sbi, DATA_TRUNC);
> +
> + set_new_dnode(&dn, inode, NULL, NULL, 0);
> + err = get_dnode_of_data(&dn, free_from, RDONLY_NODE);
> + if (err) {
> + if (err == -ENOENT)
> + goto free_next;
> + mutex_unlock_op(sbi, DATA_TRUNC);
> + return err;
> + }
> +
> + if (IS_INODE(dn.node_page))
> + count = ADDRS_PER_INODE;
> + else
> + count = ADDRS_PER_BLOCK;
> +
> + count -= dn.ofs_in_node;
> + BUG_ON(count < 0);
> + if (dn.ofs_in_node || IS_INODE(dn.node_page)) {
> + truncate_data_blocks_range(&dn, count);
> + free_from += count;
> + }
> +
> + f2fs_put_dnode(&dn);
> +free_next:
> + err = truncate_inode_blocks(inode, free_from);
> + mutex_unlock_op(sbi, DATA_TRUNC);
> +
> + /* lastly zero out the first data page */
> + truncate_partial_data_page(inode, from);
> +
> + return err;
> +}
> +
> +void f2fs_truncate(struct inode *inode)
> +{
> + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> + S_ISLNK(inode->i_mode)))
> + return;
> +
> + if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> + return;

No truncate for an append only file? You call f2fs_truncate from
evict_inode, so no block freeing when this kind of inode is deleted.
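
The concern can be shown with a small user-space model. Everything here is hypothetical (`trunc_inode` and the `model_*` helpers are not f2fs functions); the first function mirrors the early return in the quoted code, and the other two sketch one possible restructuring in which the append-only/immutable policy is applied only to user-requested size changes while eviction always frees blocks. It is not necessarily the fix the author chose.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode with allocated data blocks. */
struct trunc_inode {
	bool append_only;
	bool immutable;
	unsigned long blocks;
};

/* Mirrors the posted logic: the flag check sits inside truncate itself. */
void model_truncate_as_posted(struct trunc_inode *inode)
{
	if (inode->append_only || inode->immutable)
		return;			/* blocks stay allocated */
	inode->blocks = 0;
}

/* Eviction built on the posted truncate inherits the early return. */
void model_evict_as_posted(struct trunc_inode *inode)
{
	model_truncate_as_posted(inode);
}

/* Alternative: enforce the policy only on the user-visible path. */
int model_setattr_size(struct trunc_inode *inode)
{
	if (inode->append_only || inode->immutable)
		return -EPERM;
	inode->blocks = 0;
	return 0;
}

/* Alternative: eviction of a deleted inode frees blocks unconditionally. */
void model_evict(struct trunc_inode *inode)
{
	inode->blocks = 0;
}
```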

> +
> + if (!truncate_blocks(inode, i_size_read(inode))) {
> + inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> + mark_inode_dirty(inode);
> + }
> +
> + f2fs_balance_fs(F2FS_SB(inode->i_sb));
> +}
> +
> +static int f2fs_getattr(struct vfsmount *mnt,
> + struct dentry *dentry, struct kstat *stat)
> +{
> + struct inode *inode = dentry->d_inode;
> + generic_fillattr(inode, stat);
> + stat->blocks <<= 3;
> + return 0;
> +}
> +
> +#ifdef CONFIG_F2FS_FS_POSIX_ACL
> +static void __setattr_copy(struct inode *inode, const struct iattr *attr)
> +{
> + struct f2fs_inode_info *fi = F2FS_I(inode);
> + unsigned int ia_valid = attr->ia_valid;
> +
> + if (ia_valid & ATTR_UID)
> + inode->i_uid = attr->ia_uid;
> + if (ia_valid & ATTR_GID)
> + inode->i_gid = attr->ia_gid;
> + if (ia_valid & ATTR_ATIME)
> + inode->i_atime = timespec_trunc(attr->ia_atime,
> + inode->i_sb->s_time_gran);
> + if (ia_valid & ATTR_MTIME)
> + inode->i_mtime = timespec_trunc(attr->ia_mtime,
> + inode->i_sb->s_time_gran);
> + if (ia_valid & ATTR_CTIME)
> + inode->i_ctime = timespec_trunc(attr->ia_ctime,
> + inode->i_sb->s_time_gran);
> + if (ia_valid & ATTR_MODE) {
> + umode_t mode = attr->ia_mode;
> +
> + if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> + mode &= ~S_ISGID;
> + set_acl_inode(fi, mode);
> + }
> +}
> +#else
> +#define __setattr_copy setattr_copy
> +#endif
> +
> +int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
> +{
> + struct inode *inode = dentry->d_inode;
> + struct f2fs_inode_info *fi = F2FS_I(inode);
> + int err;
> +
> + err = inode_change_ok(inode, attr);
> + if (err)
> + return err;
> +
> + if ((attr->ia_valid & ATTR_SIZE) &&
> + attr->ia_size != i_size_read(inode)) {
> + truncate_setsize(inode, attr->ia_size);
> + f2fs_truncate(inode);

No need to call truncate_pagecache & co.?

Marco

2012-10-23 07:01:52

by Marco Stornelli

Subject: Re: [PATCH 11/16 v2] f2fs: add inode operations for special inodes

2012/10/23 Jaegeuk Kim <[email protected]>:
> This adds inode operations for directory, symlink, and special inodes.
>
> Signed-off-by: Changman Lee <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/namei.c | 494 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 494 insertions(+)
> create mode 100644 fs/f2fs/namei.c
>
> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
> new file mode 100644
> index 0000000..899d144
> --- /dev/null
> +++ b/fs/f2fs/namei.c
> @@ -0,0 +1,494 @@
> +/**
> + * fs/f2fs/namei.c
> + *
> + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com/
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#include <linux/fs.h>
> +#include <linux/f2fs_fs.h>
> +#include <linux/pagemap.h>
> +#include <linux/sched.h>
> +#include <linux/ctype.h>
> +
> +#include "f2fs.h"
> +#include "xattr.h"
> +#include "acl.h"
> +
> +static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
> +{
> + struct super_block *sb = dir->i_sb;
> + struct f2fs_sb_info *sbi = F2FS_SB(sb);
> + nid_t ino;
> + struct inode *inode;
> + bool nid_free = false;
> + int err;
> +
> + inode = new_inode(sb);
> + if (!inode)
> + return ERR_PTR(-ENOMEM);
> +
> + mutex_lock_op(sbi, NODE_NEW);
> + if (!alloc_nid(sbi, &ino)) {
> + mutex_unlock_op(sbi, NODE_NEW);
> + err = -ENOSPC;
> + goto fail;
> + }
> + mutex_unlock_op(sbi, NODE_NEW);
> +
> + inode->i_uid = current_fsuid();
> +
> + if (dir->i_mode & S_ISGID) {
> + inode->i_gid = dir->i_gid;
> + if (S_ISDIR(mode))
> + mode |= S_ISGID;
> + } else {
> + inode->i_gid = current_fsgid();
> + }
> +
> + inode->i_ino = ino;
> + inode->i_mode = mode;
> + inode->i_blocks = 0;
> + inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
> +
> + err = insert_inode_locked(inode);
> + if (err) {
> + err = -EINVAL;
> + nid_free = true;
> + goto out;
> + }
> +
> + mark_inode_dirty(inode);
> + return inode;
> +
> +out:
> + clear_nlink(inode);
> + unlock_new_inode(inode);
> +fail:
> + iput(inode);

make_bad_inode here?

Marco

2012-10-23 07:08:44

by Jaegeuk Kim

Subject: RE: [PATCH 02/16 v2] f2fs: add on-disk layout

> 2012/10/23 Jaegeuk Kim <[email protected]>:
> > This adds a header file describing the on-disk layout of f2fs.
> >
> > Signed-off-by: Changman Lee <[email protected]>
> > Signed-off-by: Chul Lee <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > include/linux/f2fs_fs.h | 362 +++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 362 insertions(+)
> > create mode 100644 include/linux/f2fs_fs.h
> >
> > diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
> > new file mode 100644
> > index 0000000..bd9c217
> > --- /dev/null
> > +++ b/include/linux/f2fs_fs.h
> > @@ -0,0 +1,362 @@
> > +/**
> > + * include/linux/f2fs_fs.h
>
> Is this file used by user space?

Currently, no.
But I plan to use this file in the format and fsck tools later.

>
> > + *
> > + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> > + * http://www.samsung.com/
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +#ifndef _LINUX_F2FS_FS_H
> > +#define _LINUX_F2FS_FS_H
> > +
> > +#include <linux/pagemap.h>
> > +#include <linux/types.h>
> > +
> > +#define F2FS_SUPER_MAGIC 0xF2F52010
>
> In magic.h please.

Ok, thank you.

>
> Marco


---
Jaegeuk Kim
Samsung

2012-10-23 07:09:40

by Jaegeuk Kim

Subject: RE: [PATCH 04/16 v2] f2fs: add super block operations

[snip]

> > + sb->s_op = &f2fs_sops;
> > + sb->s_xattr = f2fs_xattr_handlers;
> > + sb->s_magic = F2FS_SUPER_MAGIC;
> > + sb->s_fs_info = sbi;
>
> and s_time_gran?

Ok, I'll check this.
Thanks,

>
> Marco


---
Jaegeuk Kim
Samsung

2012-10-23 07:31:33

by Jaegeuk Kim

Subject: RE: [PATCH 08/16 v2] f2fs: add file operations

> > +void f2fs_truncate(struct inode *inode)
> > +{
> > + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> > + S_ISLNK(inode->i_mode)))
> > + return;
> > +
> > + if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> > + return;
>
> No truncate for an append only file? You call f2fs_truncate from
> evict_inode, so no block freeing when this kind of inode is deleted.

Agreed.

>
> > +
> > + if (!truncate_blocks(inode, i_size_read(inode))) {
> > + inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > + mark_inode_dirty(inode);
> > + }
> > +
> > + f2fs_balance_fs(F2FS_SB(inode->i_sb));
> > +}
> > +
> > +static int f2fs_getattr(struct vfsmount *mnt,
> > + struct dentry *dentry, struct kstat *stat)
> > +{
> > + struct inode *inode = dentry->d_inode;
> > + generic_fillattr(inode, stat);
> > + stat->blocks <<= 3;
> > + return 0;
> > +}
> > +
> > +#ifdef CONFIG_F2FS_FS_POSIX_ACL
> > +static void __setattr_copy(struct inode *inode, const struct iattr *attr)
> > +{
> > + struct f2fs_inode_info *fi = F2FS_I(inode);
> > + unsigned int ia_valid = attr->ia_valid;
> > +
> > + if (ia_valid & ATTR_UID)
> > + inode->i_uid = attr->ia_uid;
> > + if (ia_valid & ATTR_GID)
> > + inode->i_gid = attr->ia_gid;
> > + if (ia_valid & ATTR_ATIME)
> > + inode->i_atime = timespec_trunc(attr->ia_atime,
> > + inode->i_sb->s_time_gran);
> > + if (ia_valid & ATTR_MTIME)
> > + inode->i_mtime = timespec_trunc(attr->ia_mtime,
> > + inode->i_sb->s_time_gran);
> > + if (ia_valid & ATTR_CTIME)
> > + inode->i_ctime = timespec_trunc(attr->ia_ctime,
> > + inode->i_sb->s_time_gran);
> > + if (ia_valid & ATTR_MODE) {
> > + umode_t mode = attr->ia_mode;
> > +
> > + if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> > + mode &= ~S_ISGID;
> > + set_acl_inode(fi, mode);
> > + }
> > +}
> > +#else
> > +#define __setattr_copy setattr_copy
> > +#endif
> > +
> > +int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
> > +{
> > + struct inode *inode = dentry->d_inode;
> > + struct f2fs_inode_info *fi = F2FS_I(inode);
> > + int err;
> > +
> > + err = inode_change_ok(inode, attr);
> > + if (err)
> > + return err;
> > +
> > + if ((attr->ia_valid & ATTR_SIZE) &&
> > + attr->ia_size != i_size_read(inode)) {
> > + truncate_setsize(inode, attr->ia_size);
> > + f2fs_truncate(inode);
>
> No need to call truncate_pagecache & co.?

truncate_setsize() calls truncate_pagecache().
Any comment?

>
> Marco



---
Jaegeuk Kim
Samsung

2012-10-23 07:39:46

by Marco Stornelli

Subject: Re: [PATCH 08/16 v2] f2fs: add file operations

2012/10/23 Jaegeuk Kim <[email protected]>:
>> [snip]
>> > + if ((attr->ia_valid & ATTR_SIZE) &&
>> > + attr->ia_size != i_size_read(inode)) {
>> > + truncate_setsize(inode, attr->ia_size);
>> > + f2fs_truncate(inode);
>>
>> No need to call truncate_pagecache & co.?
>
> truncate_setsize() calls truncate_pagecache().
> Any comment?

Yep, I didn't get my coffee yet :)

Marco

2012-10-23 07:46:29

by Jaegeuk Kim

Subject: RE: [PATCH 11/16 v2] f2fs: add inode operations for special inodes


> -----Original Message-----
> From: Marco Stornelli [mailto:[email protected]]
> Sent: Tuesday, October 23, 2012 4:02 PM
> To: Jaegeuk Kim
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 11/16 v2] f2fs: add inode operations for special inodes
> Importance: High
>
> 2012/10/23 Jaegeuk Kim <[email protected]>:
> > This adds inode operations for directory, symlink, and special inodes.
> >
> > Signed-off-by: Changman Lee <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > fs/f2fs/namei.c | 494 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 494 insertions(+)
> > create mode 100644 fs/f2fs/namei.c
> >
> > diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
> > new file mode 100644
> > index 0000000..899d144
> > --- /dev/null
> > +++ b/fs/f2fs/namei.c
> > @@ -0,0 +1,494 @@
> > +/**
> > + * fs/f2fs/namei.c
> > + *
> > + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> > + * http://www.samsung.com/
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +#include <linux/fs.h>
> > +#include <linux/f2fs_fs.h>
> > +#include <linux/pagemap.h>
> > +#include <linux/sched.h>
> > +#include <linux/ctype.h>
> > +
> > +#include "f2fs.h"
> > +#include "xattr.h"
> > +#include "acl.h"
> > +
> > +static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
> > +{
> > + struct super_block *sb = dir->i_sb;
> > + struct f2fs_sb_info *sbi = F2FS_SB(sb);
> > + nid_t ino;
> > + struct inode *inode;
> > + bool nid_free = false;
> > + int err;
> > +
> > + inode = new_inode(sb);
> > + if (!inode)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + mutex_lock_op(sbi, NODE_NEW);
> > + if (!alloc_nid(sbi, &ino)) {
> > + mutex_unlock_op(sbi, NODE_NEW);
> > + err = -ENOSPC;
> > + goto fail;
> > + }
> > + mutex_unlock_op(sbi, NODE_NEW);
> > +
> > + inode->i_uid = current_fsuid();
> > +
> > + if (dir->i_mode & S_ISGID) {
> > + inode->i_gid = dir->i_gid;
> > + if (S_ISDIR(mode))
> > + mode |= S_ISGID;
> > + } else {
> > + inode->i_gid = current_fsgid();
> > + }
> > +
> > + inode->i_ino = ino;
> > + inode->i_mode = mode;
> > + inode->i_blocks = 0;
> > + inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
> > +
> > + err = insert_inode_locked(inode);
> > + if (err) {
> > + err = -EINVAL;
> > + nid_free = true;
> > + goto out;
> > + }
> > +
> > + mark_inode_dirty(inode);
> > + return inode;
> > +
> > +out:
> > + clear_nlink(inode);
> > + unlock_new_inode(inode);
> > +fail:
> > + iput(inode);
>
> make_bad_inode here?

I wanted to call f2fs_evict_inode() at this moment.
- f2fs_evict_inode()
- remove_inode_page()
-> check any erroneous conditions.

Got coffee? :)

>
> Marco


---
Jaegeuk Kim
Samsung

2012-10-23 08:20:48

by Marco Stornelli

Subject: Re: [PATCH 11/16 v2] f2fs: add inode operations for special inodes

2012/10/23 Jaegeuk Kim <[email protected]>:
>
>> [snip]
>> > +out:
>> > + clear_nlink(inode);
>> > + unlock_new_inode(inode);
>> > +fail:
>> > + iput(inode);
>>
>> make_bad_inode here?
>
> I wanted to call f2fs_evict_inode() at this moment.
> - f2fs_evict_inode()
> - remove_inode_page()
> -> check any erroneous conditions.
>
> Got coffee? :)
>

Not yet, I'm still reading my 240 emails :)
I didn't mean to replace iput() but to add make_bad_inode() before it (I
don't know if that was clear). I don't know if it's the right thing to
do. In case of "out" I'd do the "rollback" here.

Marco

2012-10-23 08:56:46

by Jaegeuk Kim

Subject: RE: [PATCH 11/16 v2] f2fs: add inode operations for special inodes

> 2012/10/23 Jaegeuk Kim <[email protected]>:
> >
> >> [snip]
> >> > +out:
> >> > + clear_nlink(inode);
> >> > + unlock_new_inode(inode);
> >> > +fail:
> >> > + iput(inode);
> >>
> >> make_bad_inode here?
> >
> > I wanted to call f2fs_evict_inode() at this moment.
> > - f2fs_evict_inode()
> > - remove_inode_page()
> > -> check any erroneous conditions.
> >
> > Got coffee? :)
> >
>
> Not yet, I'm still reading my 240 emails :)
> I didn't mean to replace iput() but to add make_bad_inode() before it (I
> don't know if that was clear). I don't know if it's the right thing to
> do. In case of "out" I'd do the "rollback" here.
>

Sorry, I misunderstood what you said. I need a cup of coffee.
IMHO, it seems there is no difference, since f2fs doesn't allow
a race condition on inodes with the same inode number.
(e.g., one is bad, and the other is newly allocated with the same
inode number.)

> Marco

2012-10-23 09:35:17

by Marco Stornelli

Subject: Re: [PATCH 11/16 v2] f2fs: add inode operations for special inodes

2012/10/23 Jaegeuk Kim <[email protected]>:
>> 2012/10/23 Jaegeuk Kim <[email protected]>:
>> >
>> >> [snip]
>> >> > +out:
>> >> > + clear_nlink(inode);
>> >> > + unlock_new_inode(inode);
>> >> > +fail:
>> >> > + iput(inode);
>> >>
>> >> make_bad_inode here?
>> >
>> > I wanted to call f2fs_evict_inode() at this moment.
>> > - f2fs_evict_inode()
>> > - remove_inode_page()
>> > -> check any erroneous conditions.
>> >
>> > Got coffee? :)
>> >
>>
>> Not yet, I'm still reading my 240 emails :)
>> I didn't mean to replace iput() but to add make_bad_inode() before it (I
>> don't know if that was clear). I don't know if it's the right thing to
>> do. In case of "out" I'd do the "rollback" here.
>>
>
> Sorry, I misunderstood what you said. I need a cup of coffee.
> IMHO, it seems there is no difference, since f2fs doesn't allow
> a race condition on inodes with the same inode number.
> (e.g., one is bad, and the other is newly allocated with the same
> inode number.)
>
>> Marco
>

It was only a suggestion :)

Marco

2012-10-23 11:41:24

by Viacheslav Dubeyko

Subject: Re: [PATCH 01/16 v2] f2fs: add document

On Tue, 2012-10-23 at 11:25 +0900, Jaegeuk Kim wrote:
> This adds a document describing the mount options, proc entries, usage, and
> design of Flash-Friendly File System, namely F2FS.
>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> Documentation/filesystems/00-INDEX | 2 +
> Documentation/filesystems/f2fs.txt | 404 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 406 insertions(+)
> create mode 100644 Documentation/filesystems/f2fs.txt
>
> diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
> index 8c624a1..ce5fd46 100644
> --- a/Documentation/filesystems/00-INDEX
> +++ b/Documentation/filesystems/00-INDEX
> @@ -48,6 +48,8 @@ ext4.txt
> - info, mount options and specifications for the Ext4 filesystem.
> files.txt
> - info on file management in the Linux kernel.
> +f2fs.txt
> + - info and mount options for the F2FS filesystem.
> fuse.txt
> - info on the Filesystem in User SpacE including mount options.
> gfs2.txt
> diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
> new file mode 100644
> index 0000000..f2b4fde
> --- /dev/null
> +++ b/Documentation/filesystems/f2fs.txt
> @@ -0,0 +1,404 @@
> +================================================================================
> +WHAT IS Flash-Friendly File System (F2FS)?
> +================================================================================
> +
> +NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
> +been widely being used for storage ranging from mobile to server systems. Since

Maybe "... have been widely being used ..." needs to be rephrased?

> +they are known to have different characteristics from the conventional rotating
> +disks, a file system, an upper layer to the storage device, should adapt to the
> +changes from the sketch in the design level.
> +
> +F2FS is a file system exploiting NAND flash memory-based storage devices, which
> +is based on Log-structured File System (LFS). The design has been focused on
> +addressing the fundamental issues in LFS, which are snowball effect of wandering
> +tree and high cleaning overhead.
> +
> +Since a NAND flash memory-based storage device shows different characteristic
> +according to its internal geometry or flash memory management scheme, namely FTL,
> +F2FS and its tools support various parameters not only for configuring on-disk
> +layout, but also for selecting allocation and cleaning algorithms.
> +
> +The file system formatting tool, "mkfs.f2fs", is available from the following
> +download page: http://sourceforge.net/projects/f2fs-tools/
> +
> +================================================================================
> +BACKGROUND AND DESIGN ISSUES
> +================================================================================
> +
> +Log-structured File System (LFS)
> +--------------------------------
> +"A log-structured file system writes all modifications to disk sequentially in
> +a log-like structure, thereby speeding up both file writing and crash recovery.
> +The log is the only structure on disk; it contains indexing information so that
> +files can be read back from the log efficiently. In order to maintain large free
> +areas on disk for fast writing, we divide the log into segments and use a
> +segment cleaner to compress the live information from heavily fragmented
> +segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
> +implementation of a log-structured file system", ACM Trans. Computer Systems
> +10, 1, 26–52.
> +
> +Wandering Tree Problem
> +----------------------
> +In LFS, when a file data is updated and written to the end of log, its direct
> +pointer block is updated due to the changed location. Then the indirect pointer
> +block is also updated due to the direct pointer block update. In this manner,
> +the upper index structures such as inode, inode map, and checkpoint block are
> +also updated recursively. This problem is called as wandering tree problem [1],
> +and in order to enhance the performance, it should eliminate or relax the update
> +propagation as much as possible.
> +
> +[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
> +
> +Cleaning Overhead
> +-----------------
> +Since LFS is based on out-of-place writes, it produces so many obsolete blocks
> +scattered across the whole storage. In order to serve new empty log space, it
> +needs to reclaim these obsolete blocks seamlessly to users. This job is called
> +as a cleaning process.
> +
> +The process consists of four operations as follows.
> +1. A victim segment is selected through referencing segment usage table.
> +2. It loads parent index structures of all the data in the victim identified by
> + segment summary blocks.
> +3. It checks the cross-reference between the data and its parent index structure.
> +4. It moves valid data selectively.
> +
> +This cleaning job may cause unexpected long delays, so the most important goal
> +is to hide the latencies to users. And also definitely, it should reduce the
> +amount of valid data to be moved, and move them quickly as well.
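
The victim-selection step can be sketched in userspace C. This is a minimal illustration, not the f2fs implementation: `struct seg_usage` and the `pick_*` helpers are hypothetical names, the 512 blocks per segment follow from the 2MB segment and 4KB block sizes given in the document, and the cost-benefit score is the one from the LFS paper cited above, (1 - u) * age / (1 + u). The policy f2fs actually uses may weigh age differently.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKS_PER_SEG 512  /* 2MB segment / 4KB block */

struct seg_usage {
    unsigned int valid_blocks;  /* live blocks that would have to be moved */
    unsigned long age;          /* time since last modification, any unit */
};

/* Greedy policy: choose the segment with the fewest valid blocks. */
size_t pick_greedy(const struct seg_usage *s, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (s[i].valid_blocks < s[best].valid_blocks)
            best = i;
    return best;
}

/* Cost-benefit score from the LFS paper: (1 - u) * age / (1 + u). */
double cb_score(const struct seg_usage *s)
{
    double u = (double)s->valid_blocks / BLOCKS_PER_SEG;

    return (1.0 - u) * (double)s->age / (1.0 + u);
}

/* Cost-benefit policy: choose the segment with the highest score. */
size_t pick_cost_benefit(const struct seg_usage *s, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (cb_score(&s[i]) > cb_score(&s[best]))
            best = i;
    return best;
}
```

Greedy minimizes the data moved right now, while cost-benefit also favours old, stable segments whose remaining live data is unlikely to be invalidated soon.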
> +
> +================================================================================
> +KEY FEATURES
> +================================================================================
> +
> +Flash Awareness
> +---------------
> +- Enlarge the random write area for better performance, but provide the high
> + spatial locality
> +- Align FS data structures to the operational units in FTL as best efforts
> +
> +Wandering Tree Problem
> +----------------------
> +- Use a term, “node”, that represents inodes as well as various pointer blocks
> +- Introduce Node Address Table (NAT) containing the locations of all the “node”
> + blocks; this will cut off the update propagation.
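
How the NAT cuts off update propagation can be shown with a toy model. This is a sketch under stated assumptions: the array-based `nat`, `struct toy_indirect_node` and the helper names are hypothetical, and only `nid_t` is borrowed from the patches. The point is that a parent stores node IDs, so rewriting a child at a new location changes one NAT entry and leaves the parent untouched.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 16

typedef uint32_t nid_t;      /* node ID, stable for the node's lifetime */
typedef uint32_t blkaddr_t;  /* on-disk block address, changes on rewrite */

/* Toy NAT: one block address per node ID. */
blkaddr_t nat[MAX_NODES];

/* An indirect node stores child node IDs, not child block addresses. */
struct toy_indirect_node {
    nid_t child[2];
};

/* Translate a node ID to its current block address. */
blkaddr_t nat_lookup(nid_t nid)
{
    assert(nid < MAX_NODES);
    return nat[nid];
}

/* Writing a node out of place only touches its own NAT entry. */
void node_relocate(nid_t nid, blkaddr_t new_addr)
{
    assert(nid < MAX_NODES);
    nat[nid] = new_addr;
}
```

Without the table, the parent would hold the child's block address and would itself have to be rewritten, and so on up to the inode.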
> +
> +Cleaning Overhead
> +-----------------
> +- Support a background cleaning process
> +- Support greedy and cost-benefit algorithms for victim selection policies
> +- Support multi-head logs for static/dynamic hot and cold data separation
> +- Introduce adaptive logging for efficient block allocation
> +
> +================================================================================
> +MOUNT OPTIONS
> +================================================================================
> +
> +background_gc_off Turn off cleaning operations, namely garbage collection,
> + triggered in background when I/O subsystem is idle.
> +disable_roll_forward Disable the roll-forward recovery routine
> +discard Issue discard/TRIM commands when a segment is cleaned.
> +no_heap Disable heap-style segment allocation which finds free
> + segments for data from the beginning of main area, while
> + for node from the end of main area.
> +nouser_xattr Disable Extended User Attributes. Note: xattr is enabled
> + by default if CONFIG_F2FS_FS_XATTR is selected.
> +noacl Disable POSIX Access Control List. Note: acl is enabled
> + by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
> +active_logs=%u Support configuring the number of active logs. In the
> + current design, f2fs supports only 2, 4, and 6 logs.
> + Default number is 6.
> +disable_ext_identify Disable the extension list configured by mkfs, so f2fs
> + is not aware of cold files such as media files.
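
As an illustration of the active_logs constraint, a small userspace parser that accepts only the documented values. `parse_active_logs` is a hypothetical helper written for this note and is not the option parser in the patch set.

```c
#include <stdio.h>

/*
 * Parse an "active_logs=%u" option string.
 * Returns 0 and stores the value on success, -1 if the string is malformed
 * or the value is not one of the documented choices (2, 4 or 6).
 */
int parse_active_logs(const char *opt, unsigned int *out)
{
    unsigned int n;
    int end = 0;

    /* %n records how many characters were consumed; it is not counted
     * in sscanf()'s return value. */
    if (sscanf(opt, "active_logs=%u%n", &n, &end) != 1 || opt[end] != '\0')
        return -1;
    if (n != 2 && n != 4 && n != 6)
        return -1;
    *out = n;
    return 0;
}
```

A caller would fall back to the default of 6 when the option is absent.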
> +
> +================================================================================
> +PROC ENTRIES
> +================================================================================
> +
> +/proc/fs/f2fs/ contains information about partitions mounted as f2fs. For each
> +partition, a corresponding directory, named as its device name, is provided with
> +the following proc entries.
> +
> +- f2fs_stat major file system information managed by f2fs currently
> +- f2fs_sit_stat average utilization information of the whole segments
> +- f2fs_mem_stat current memory footprint consumed by f2fs
> +
> +e.g., in /proc/fs/f2fs/sdb1/
> +
> +================================================================================
> +USAGE
> +================================================================================
> +
> +1. Download userland tools
> +
> +2. Insmod f2fs.ko module:
> + # insmod f2fs.ko
> +

What about the case of static compilation of f2fs in the kernel?

> +3. Check the directory trying to mount
> + # mkdir /mnt/f2fs
> +

Create or check?

> +4. Format the block device, and then mount as f2fs
> + # mkfs.f2fs -l label /dev/block_device
> + # mount -t f2fs /dev/block_device /mnt/f2fs
> +
> +Mount options

Sorry, are these really mount options? Maybe I misunderstand, but is it
possible to set a volume label during mount?

> +-------------
> +-l [label] : Give a volume label, up to 256 unicode name.
> +-a [0 or 1] : Split start location of each area for heap-based allocation.
> + 1 is set by default, which performs this.
> +-o [int] : Set overprovision ratio in percent over volume size.
> + 5 is set by default.
> +-s [int] : Set the number of segments per section.
> + 1 is set by default.
> +-z [int] : Set the number of sections per zone.
> + 1 is set by default.
> +-e [str] : Set basic extension list. e.g. "mp3,gif,mov"
> +
> +================================================================================
> +DESIGN
> +================================================================================
> +
> +On-disk Layout
> +--------------
> +
> +F2FS divides the whole volume into a number of segments, each of which is 2MB in
> +size by default. A section is composed of consecutive segments, and a zone
> +consists of a set of sections.
> +

Maybe it makes sense to describe the possible sizes of sections and
zones here?
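
Going only by the -s and -z options quoted above, the sizes would be plain multiples of the segment size. A minimal sketch, assuming the 2MB default segment and the 4KB block size given in the document; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define F2FS_BLOCK_BYTES   4096ULL               /* 4KB block */
#define F2FS_SEGMENT_BYTES (2ULL * 1024 * 1024)  /* 2MB default segment */

/* Section size for a given "-s" value (segments per section). */
uint64_t section_bytes(uint32_t segs_per_sec)
{
    return (uint64_t)segs_per_sec * F2FS_SEGMENT_BYTES;
}

/* Zone size for given "-s" and "-z" (sections per zone) values. */
uint64_t zone_bytes(uint32_t segs_per_sec, uint32_t secs_per_zone)
{
    return (uint64_t)secs_per_zone * section_bytes(segs_per_sec);
}
```

With the defaults (-s 1 -z 1) a section and a zone are both 2MB; with -s 4 -z 2 a section is 8MB and a zone is 16MB. Whether the tools restrict the allowed values is not stated in the quoted text.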

> +F2FS maintains logically six log areas. Except SB, all the log areas are managed
> +in a unit of multiple segments. SB is located at the beginning of the partition,
> +and there exist two superblocks to avoid file system crash. Other file system
> +metadata such as CP, NAT, SIT, and SSA are located in the front part of the
> +volume. Main area contains file and directory data including their indices.
> +

I feel the need for more details about the log concept here. Could you
add a bit more description of the logs?

> +Each area manages the following contents.
> +- CP File system information, bitmaps for valid NAT/SIT sets, orphan
> + inode lists, and summary entries of current active segments.
> +- NAT Block address table for all the node blocks stored in Main area.
> +- SIT Segment information such as valid block count and bitmap for the
> + validity of all the blocks.
> +- SSA Summary entries which contains the owner information of all the
> + data and node blocks stored in Main area.
> +- Main Node and data blocks.
> +

Could you also add definitions of the abbreviations here (for example, NAT
Node Address Table: <description>)?

> +In order to avoid misalignment between file system and flash-based storage, F2FS
> +aligns the start block address of CP with the segment size. Also, it aligns the
> +start block address of Main area with the zone size by reserving some segments
> +in SSA area.

Maybe it makes sense to add some technical details about the alignment
procedure here?
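
One generic way such an alignment can be computed is to round a start address up to the next multiple of the unit and treat the gap as reserved padding. This is only a sketch of the arithmetic; `align_up` and `pad_blocks` are hypothetical helpers, and how mkfs.f2fs actually derives the reserved segments is not shown in the quoted text.

```c
#include <assert.h>
#include <stdint.h>

/* Round `addr` up to the next multiple of `unit` (both in blocks). */
uint64_t align_up(uint64_t addr, uint64_t unit)
{
    assert(unit > 0);
    return (addr + unit - 1) / unit * unit;
}

/* Blocks that must be reserved so the next area starts on a unit boundary. */
uint64_t pad_blocks(uint64_t addr, uint64_t unit)
{
    return align_up(addr, unit) - addr;
}
```

For example, with 512-block segments an area that would otherwise start at block 1000 moves to block 1024, leaving 24 blocks of padding in the preceding area.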

> +
> + align with the zone size <-|
> + |-> align with the segment size
> + _________________________________________________________________________
> + | | | Node | Segment | Segment | |
> + | Superblock | Checkpoint | Address | Info. | Summary | Main |
> + | (SB) | (CP) | Table (NAT) | Table (SIT) | Area (SSA) | |
> + |____________|_____2______|______N______|______N______|______N_____|__N___|
> + . .
> + . .
> + . .
> + ._________________________________________.
> + |_Segment_|_..._|_Segment_|_..._|_Segment_|
> + . .
> + ._________._________
> + |_section_|__...__|_
> + . .
> + .________.
> + |__zone__|
> +
> +
> +File System Metadata Structure
> +------------------------------
> +
> +F2FS adopts the checkpointing scheme to maintain file system consistency. At
> +mount time, F2FS first tries to find the last valid checkpoint data by scanning
> +CP area. In order to reduce the scanning time, F2FS uses only two copies of CP.
> +One of them always indicates the last valid data, which is called as shadow copy
> +mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
> +
> +For file system consistency, each CP points to which NAT and SIT copies are
> +valid, as shown as below.
> +
> + +--------+----------+---------+
> + | CP | NAT | SIT |
> + +--------+----------+---------+
> + . . . .
> + . . . .
> + . . . .
> + +-------+-------+--------+--------+--------+--------+
> + | CP #0 | CP #1 | NAT #0 | NAT #1 | SIT #0 | SIT #1 |
> + +-------+-------+--------+--------+--------+--------+
> + | ^ ^
> + | | |
> + `----------------------------------------'
> +
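
The mount-time choice between the two CP copies can be sketched as follows. This is a toy model under loud assumptions: the `valid` flag stands for whatever integrity check the real code performs, and the `version` counter is assumed to grow with every checkpoint; neither field name comes from the quoted text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one checkpoint copy after reading it from disk. */
struct toy_cp {
    bool valid;        /* assumed: the copy passed its integrity check */
    uint64_t version;  /* assumed: increases with every checkpoint */
};

/* Return 0 or 1 for the copy to mount from, or -1 if neither is usable. */
int pick_checkpoint(const struct toy_cp cp[2])
{
    if (cp[0].valid && cp[1].valid)
        return cp[1].version > cp[0].version ? 1 : 0;
    if (cp[0].valid)
        return 0;
    if (cp[1].valid)
        return 1;
    return -1;
}
```

The chosen copy then tells the file system which NAT and SIT copies to trust, as in the diagram above.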
> +Index Structure
> +---------------
> +
> +The key data structure for managing data locations is a "node". Similar to
> +traditional file structures, F2FS has three types of node: inode, direct node,
> +and indirect node. F2FS assigns 4KB to an inode block, which contains 927 data
> +block indices, two direct node pointers, two indirect node pointers, and one
> +double indirect node pointer, as described below. One direct node block
> +contains 1018 data block indices, and one indirect node block also contains
> +1018 node block indices. Thus, one inode block (i.e., a file) covers:
> +
> +   4KB * (927 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
> +
> +   Inode block (4KB)
> +     |- data (927)
> +     |- direct node (2)
> +     |          `- data (1018)
> +     |- indirect node (2)
> +     |            `- direct node (1018)
> +     |                       `- data (1018)
> +     `- double indirect node (1)
> +                         `- indirect node (1018)
> +                                      `- direct node (1018)
> +                                                 `- data (1018)
> +
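
The coverage formula above can be checked with a few lines of C. This is only
a worked calculation using the numbers from the formula (927 and 1018); the
macro and function names are made up for the example.

```c
#include <stdint.h>

#define DEMO_BLOCK_SIZE      4096ULL /* 4KB block */
#define DEMO_ADDRS_PER_INODE 927ULL  /* data block indices in the inode block */
#define DEMO_ADDRS_PER_BLOCK 1018ULL /* entries per direct/indirect node block */

/* Number of data blocks reachable from one inode block. */
static uint64_t max_file_blocks(void)
{
	uint64_t direct    = 2 * DEMO_ADDRS_PER_BLOCK;
	uint64_t indirect  = 2 * DEMO_ADDRS_PER_BLOCK * DEMO_ADDRS_PER_BLOCK;
	uint64_t dindirect = DEMO_ADDRS_PER_BLOCK * DEMO_ADDRS_PER_BLOCK *
			     DEMO_ADDRS_PER_BLOCK;

	return DEMO_ADDRS_PER_INODE + direct + indirect + dindirect;
}

/* Same figure in bytes. */
static uint64_t max_file_bytes(void)
{
	return DEMO_BLOCK_SIZE * max_file_blocks();
}
```

This gives 1,057,053,443 blocks, i.e. 4,329,690,902,528 bytes, which is about
3.94 when divided by 2^40, so the "TB" in the formula is a binary terabyte.
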
> +Note that all node blocks are mapped by the NAT, which means that the location
> +of each node is translated through the NAT. This addresses the wandering tree
> +problem: F2FS can cut off the propagation of node updates caused by leaf data
> +writes.
> +
> +Directory Structure
> +-------------------
> +
> +A directory entry occupies 11 bytes and consists of the following attributes.
> +
> +- hash    hash value of the file name
> +- ino     inode number
> +- len     the length of the file name
> +- type    file type such as directory, symlink, etc.
> +
> +A dentry block consists of 214 dentry slots and file names. A bitmap in the
> +block records whether each dentry is valid. A dentry block occupies 4KB with
> +the following composition.
> +
> +  Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
> +                      dentries(11 * 214 bytes) + file name (8 * 214 bytes)
> +
> +                          [Bucket]
> +             +--------------------------------+
> +             |dentry block 1 | dentry block 2 |
> +             +--------------------------------+
> +             .               .
> +       .                             .
> +  .       [Dentry Block Structure: 4KB]       .
> +  +--------+----------+----------+------------+
> +  | bitmap | reserved | dentries | file names |
> +  +--------+----------+----------+------------+
> +  [Dentry Block: 4KB] .          .
> +                 .                  .
> +            .                          .
> +            +------+------+-----+------+
> +            | hash | ino  | len | type |
> +            +------+------+-----+------+
> +            [Dentry Structure: 11 bytes]
> +
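
The byte counts above add up to exactly 4096, which a compiler can confirm.
The sketch below is an illustration, not the on-disk definition: the field
widths (32-bit hash, 32-bit ino, 16-bit len, 8-bit type) are an assumption
chosen to reach the stated 11 bytes, all struct and macro names are made up,
and the packed attribute is a GCC/Clang extension. slots_for_name() reflects
my reading that a name takes consecutive 8-byte slots.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_DENTRY   214                          /* dentry slots per block */
#define DEMO_BITMAP_SIZE ((DEMO_NR_DENTRY + 7) / 8)   /* 27 bytes for 214 bits */
#define DEMO_RESERVED    3
#define DEMO_SLOT_LEN    8                            /* bytes of name per slot */

struct demo_dir_entry {
	uint32_t hash;   /* hash value of the file name */
	uint32_t ino;    /* inode number */
	uint16_t len;    /* length of the file name */
	uint8_t  type;   /* file type */
} __attribute__((packed));

struct demo_dentry_block {
	uint8_t bitmap[DEMO_BITMAP_SIZE];
	uint8_t reserved[DEMO_RESERVED];
	struct demo_dir_entry dentry[DEMO_NR_DENTRY];
	uint8_t filename[DEMO_NR_DENTRY][DEMO_SLOT_LEN];
} __attribute__((packed));

/* Compile-time checks of the sizes quoted in the text. */
_Static_assert(sizeof(struct demo_dir_entry) == 11, "dentry is 11 bytes");
_Static_assert(sizeof(struct demo_dentry_block) == 4096, "block is 4KB");

/* Consecutive name slots needed for a file name of name_len bytes. */
static unsigned int slots_for_name(unsigned int name_len)
{
	return (name_len + DEMO_SLOT_LEN - 1) / DEMO_SLOT_LEN;
}
```
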
> +F2FS implements multi-level hash tables for the directory structure. Each
> +level has a hash table with a dedicated number of hash buckets, as shown below.
> +Note that "A(2B)" means a bucket includes 2 data blocks.
> +
> +----------------------
> +A : bucket
> +B : block
> +N : MAX_DIR_HASH_DEPTH
> +----------------------
> +
> +level #0   | A(2B)
> +           |
> +level #1   | A(2B) - A(2B)
> +           |
> +level #2   | A(2B) - A(2B) - A(2B) - A(2B)
> +     .     |   .       .       .       .
> +level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
> +     .     |   .       .       .       .
> +level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
> +
> +The number of blocks per bucket and the number of buckets in each level are
> +determined by:
> +
> +                                       ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
> +  # of blocks per bucket in level #n = |
> +                                       `- 4, Otherwise
> +
> +                             ,- 2^n, if n < MAX_DIR_HASH_DEPTH / 2,
> +  # of buckets in level #n = |
> +                             `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1), Otherwise
> +
> +When F2FS looks up a file name in a directory, it first calculates a hash value
> +of the file name. Then F2FS scans the hash table in level #0 to find the
> +dentry consisting of the file name and its inode number. If it is not found,
> +F2FS scans the next hash table in level #1. In this way, F2FS scans the hash
> +tables of each level incrementally from 1 to N. In each level, F2FS needs to
> +scan only one bucket, determined by the following equation, which gives
> +O(log(# of files)) complexity.
> +
> +  bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
> +
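
The level formulas and the bucket equation translate directly into C. In this
sketch max_depth stands for MAX_DIR_HASH_DEPTH and is passed as a parameter
because its value is not given here; the function names are hypothetical. The
shifts assume 2 <= max_depth <= 64 so that they stay within 32 bits.

```c
#include <stdint.h>

/* Blocks per bucket in level n: 2 in the lower half of the levels, else 4. */
static unsigned int blocks_in_level(unsigned int n, unsigned int max_depth)
{
	return n < max_depth / 2 ? 2 : 4;
}

/* Buckets in level n: doubles per level, then stays at 2^(max_depth/2 - 1). */
static uint32_t buckets_in_level(unsigned int n, unsigned int max_depth)
{
	if (n < max_depth / 2)
		return (uint32_t)1 << n;
	return (uint32_t)1 << (max_depth / 2 - 1);
}

/* The single bucket that has to be scanned in level n for a given hash. */
static uint32_t bucket_to_scan(uint32_t hash, unsigned int n,
			       unsigned int max_depth)
{
	return hash % buckets_in_level(n, max_depth);
}
```

With max_depth = 8, for example, levels 0 to 3 have 1, 2, 4 and 8 buckets of
two blocks each, and every level from 4 on has 8 buckets of four blocks each.
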
> +In the case of file creation, F2FS finds empty consecutive slots that cover the
> +file name. F2FS searches for the empty slots in the hash tables of all levels
> +from 1 to N in the same way as the lookup operation.
> +
> +The following figure shows an example of two cases holding children.
> +       --------------> Dir <--------------
> +       |                                 |
> +    child                             child
> +
> +    child - child                     [hole] - child
> +
> +    child - child - child             [hole] - [hole] - child
> +
> +   Case 1:                           Case 2:
> +   Number of children = 6,           Number of children = 3,
> +   File size = 7                     File size = 7
> +
> +Default Block Allocation
> +------------------------
> +
> +At runtime, F2FS manages six active logs inside the "Main" area: Hot/Warm/Cold
> +node and Hot/Warm/Cold data.
> +
> +- Hot node contains direct node blocks of directories.
> +- Warm node contains direct node blocks except hot node blocks.
> +- Cold node contains indirect node blocks.
> +- Hot data contains dentry blocks.
> +- Warm data contains data blocks except hot and cold data blocks.
> +- Cold data contains multimedia data or migrated data blocks.
> +
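
Read as a decision procedure, the list above could look like the following
sketch. The enum, the function names and the boolean inputs are hypothetical,
and the order of the checks is my own reading of the list (for instance,
"indirect" is tested before "belongs to a directory"); how a block is
recognised as multimedia or migrated is outside this example.

```c
#include <stdbool.h>

enum demo_log {
	DEMO_HOT_DATA, DEMO_WARM_DATA, DEMO_COLD_DATA,
	DEMO_HOT_NODE, DEMO_WARM_NODE, DEMO_COLD_NODE,
};

/* Choose the active log for a node block. */
static enum demo_log pick_node_log(bool is_indirect, bool belongs_to_dir)
{
	if (is_indirect)
		return DEMO_COLD_NODE;   /* indirect node blocks */
	return belongs_to_dir ? DEMO_HOT_NODE   /* direct nodes of directories */
			      : DEMO_WARM_NODE; /* all other direct nodes */
}

/* Choose the active log for a data block. */
static enum demo_log pick_data_log(bool is_dentry, bool is_multimedia,
				   bool is_migrated)
{
	if (is_dentry)
		return DEMO_HOT_DATA;    /* dentry blocks */
	if (is_multimedia || is_migrated)
		return DEMO_COLD_DATA;   /* multimedia or migrated data */
	return DEMO_WARM_DATA;           /* everything else */
}
```
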
> +LFS has two schemes for free space management: threaded log and
> +copy-and-compaction. The copy-and-compaction scheme, which is known as
> +cleaning, is well-suited for devices showing very good sequential write
> +performance, since free segments are always available for writing new data.
> +However, it suffers from cleaning overhead under high utilization. Conversely,
> +the threaded log scheme suffers from random writes, but needs no cleaning
> +process. F2FS adopts a hybrid scheme in which the copy-and-compaction scheme
> +is used by default, but the policy is dynamically changed to the threaded log
> +scheme according to the file system status.
> +
> +In order to align F2FS with the underlying flash-based storage, F2FS allocates
> +segments in units of a section. F2FS expects the section size to be the same
> +as the unit size of garbage collection in the FTL. Furthermore, with respect
> +to the mapping granularity in the FTL, F2FS allocates each section of the
> +active logs from different zones as much as possible, since the FTL can write
> +the data in the active logs into one allocation unit according to its mapping
> +granularity.
> +
> +Cleaning process
> +----------------
> +
> +F2FS does cleaning both on demand and in the background. On-demand cleaning is
> +triggered when there are not enough free segments to serve VFS calls. The
> +background cleaner is run by a kernel thread, and triggers the cleaning job
> +when the system is idle.
> +
> +F2FS supports two victim selection policies: the greedy and cost-benefit
> +algorithms. In the greedy algorithm, F2FS selects the victim segment with the
> +smallest number of valid blocks. In the cost-benefit algorithm, F2FS selects a
> +victim segment according to the segment age and the number of valid blocks, in
> +order to address the log block thrashing problem of the greedy algorithm. F2FS
> +uses the greedy algorithm for the on-demand cleaner, while the background
> +cleaner uses the cost-benefit algorithm.
> +
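
To make the difference between the two policies concrete, here is a small
sketch. The greedy part follows the description above directly. For the
cost-benefit part the text only says that age and valid block count are
combined, so the score used below, (1 - u) * age / (1 + u) with u being the
fraction of valid blocks, is borrowed from the classic LFS cleaner and is an
assumption, not necessarily what F2FS computes. All names are hypothetical,
and both pickers expect n >= 1.

```c
#include <stddef.h>

struct demo_seg {
	unsigned int valid_blocks; /* blocks still in use in this segment */
	unsigned long age;         /* larger means not modified for longer */
};

/* Greedy: the segment with the fewest valid blocks wins. */
static size_t pick_greedy(const struct demo_seg *segs, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (segs[i].valid_blocks < segs[best].valid_blocks)
			best = i;
	return best;
}

/* Assumed score: space gained, weighted by age, over the cost of copying. */
static double cost_benefit(const struct demo_seg *s, unsigned int blks_per_seg)
{
	double u = (double)s->valid_blocks / blks_per_seg;

	return (1.0 - u) * (double)s->age / (1.0 + u);
}

/* Cost-benefit: the segment with the highest score wins. */
static size_t pick_cost_benefit(const struct demo_seg *segs, size_t n,
				unsigned int blks_per_seg)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (cost_benefit(&segs[i], blks_per_seg) >
		    cost_benefit(&segs[best], blks_per_seg))
			best = i;
	return best;
}
```

With three segments of 512 blocks holding (100 valid, age 10), (50 valid,
age 1) and (120 valid, age 1000), greedy picks the second one, while the
assumed cost-benefit score prefers the third because it has been stable for
much longer.
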
> +In order to identify whether the data in the victim segment are valid or not,
> +F2FS manages a bitmap. Each bit represents the validity of a block, and the
> +bitmap is composed of a bit stream covering all blocks in the main area.
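
A validity bitmap of this kind can be sketched in a few lines. The helper
names are made up, and the figure of 512 blocks per segment is an assumption
for the example rather than something stated in this section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BLOCKS_PER_SEG 512u /* assumed segment size in blocks */

/* Set or clear the validity bit of block number blk. */
static void mark_block(uint8_t *bitmap, size_t blk, bool valid)
{
	uint8_t mask = (uint8_t)(1u << (blk % 8));

	if (valid)
		bitmap[blk / 8] |= mask;
	else
		bitmap[blk / 8] &= (uint8_t)~mask;
}

/* Test the validity bit of block number blk. */
static bool block_is_valid(const uint8_t *bitmap, size_t blk)
{
	return (bitmap[blk / 8] >> (blk % 8)) & 1u;
}

/* Count the valid blocks of one segment; a cleaner must move exactly these. */
static unsigned int valid_blocks_in_seg(const uint8_t *bitmap, size_t segno)
{
	size_t start = segno * DEMO_BLOCKS_PER_SEG;
	unsigned int count = 0;

	for (size_t i = 0; i < DEMO_BLOCKS_PER_SEG; i++)
		count += block_is_valid(bitmap, start + i);
	return count;
}
```
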

With the best regards,
Vyacheslav Dubeyko.

2012-10-23 18:21:01

by Greg Kroah-Hartman

Subject: [PATCH 0/3] f2fs: move proc files to debugfs

Here are 3 patches, moving the proc file usage on f2fs to debugfs.

The first one fixes a bug in the gc.h file preventing it from being able
to be included by any other files.

The second patch moves all current proc file accesses to a single file,
removing all #ifdefs from the .c files. This should have been done in
the first place.
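
For readers who have not seen the idiom, this is roughly the header pattern
that makes the #ifdefs in the .c files unnecessary. It is a generic sketch:
CONFIG_DEMO_STATS and every demo_ name are invented for the example and are
not f2fs symbols.

```c
struct demo_sb_info {
	int mounted;
};

#ifdef CONFIG_DEMO_STATS
/* Real implementations live in a separate file built only when enabled. */
int demo_stat_init(struct demo_sb_info *sbi);
void demo_stat_exit(struct demo_sb_info *sbi);
#else
/* Empty inline stubs: callers compile unchanged and the calls vanish. */
static inline int demo_stat_init(struct demo_sb_info *sbi)
{
	(void)sbi;
	return 0;
}
static inline void demo_stat_exit(struct demo_sb_info *sbi)
{
	(void)sbi;
}
#endif

/* Core code calls the hooks unconditionally, with no #ifdef at the call site. */
static int demo_mount(struct demo_sb_info *sbi)
{
	int err = demo_stat_init(sbi);

	if (err)
		return err;
	sbi->mounted = 1;
	return 0;
}

static void demo_umount(struct demo_sb_info *sbi)
{
	demo_stat_exit(sbi);
	sbi->mounted = 0;
}
```
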

The last patch converts the files to use debugfs instead of proc.

Note, these patches have been compile tested only, I haven't tested them
out, as I haven't had the chance to yet. I'll go do that this afternoon
after I catch up on some other pending kernel work.

One question, it seems that the proc files show all information for all
super blocks in the system, no matter which subdirectory you are reading
from in the proc f2fs tree. Is that really what you want? Shouldn't we
only be showing the stats of the superblock we are saying we will
report? I'll test that later today, and if it really is wrong, will fix
the debugfs code up to handle this properly.

Do these patches look reasonable?

thanks,

greg k-h

2012-10-23 18:21:52

by Greg Kroah-Hartman

Subject: [PATCH 1/3] f2fs: gc.h: make should_do_checkpoint() inline

This should be an inline function, not a "real" function. Now other
files can properly include gc.h.

Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/f2fs/gc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -191,7 +191,7 @@ static inline int is_idle(struct f2fs_sb
 	return !(rl->count[BLK_RW_SYNC]) && !(rl->count[BLK_RW_ASYNC]);
 }
 
-static bool should_do_checkpoint(struct f2fs_sb_info *sbi)
+static inline bool should_do_checkpoint(struct f2fs_sb_info *sbi)
 {
 	unsigned int pages_per_sec = sbi->segs_per_sec *
 					(1 << sbi->log_blocks_per_seg);

2012-10-23 18:22:47

by Greg Kroah-Hartman

Subject: [PATCH 2/3] f2fs: move statistics code into one file

From: Greg Kroah-Hartman <[email protected]>

This moves all of the procfs statistics code into one file, debug.c and
removes the #ifdefs from the core f2fs code when calling statistic
functions.

This will make it more obvious how to move from procfs to debugfs, no
functionality was changed here at all.

Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/f2fs/Makefile | 1
fs/f2fs/debug.c | 414 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/f2fs/f2fs.h | 25 ++-
fs/f2fs/gc.c | 373 -------------------------------------------------
fs/f2fs/super.c | 21 --
5 files changed, 442 insertions(+), 392 deletions(-)

--- a/fs/f2fs/Makefile
+++ b/fs/f2fs/Makefile
@@ -2,5 +2,6 @@ obj-$(CONFIG_F2FS_FS) += f2fs.o

f2fs-y := dir.o file.o inode.o namei.o hash.o super.o
f2fs-y += checkpoint.o gc.o data.o node.o segment.o recovery.o
+f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o
f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
--- /dev/null
+++ b/fs/f2fs/debug.c
@@ -0,0 +1,414 @@
+/**
+ * f2fs debugging statistics
+ *
+ * Copyright (c) 2012 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com/
+ * Copyright (c) 2012 Linux Foundation
+ * Copyright (c) 2012 Greg Kroah-Hartman <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/backing-dev.h>
+#include <linux/proc_fs.h>
+#include <linux/f2fs_fs.h>
+#include <linux/blkdev.h>
+
+#include "f2fs.h"
+#include "node.h"
+#include "segment.h"
+#include "gc.h"
+
+static LIST_HEAD(f2fs_stat_list);
+static struct proc_dir_entry *f2fs_proc_root;
+
+
+void f2fs_update_stat(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_gc_info *gc_i = sbi->gc_info;
+ struct f2fs_stat_info *si = gc_i->stat_info;
+ int i;
+
+ /* valid check of the segment numbers */
+ si->hit_ext = sbi->read_hit_ext;
+ si->total_ext = sbi->total_hit_ext;
+ si->ndirty_node = get_pages(sbi, F2FS_DIRTY_NODES);
+ si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS);
+ si->ndirty_dirs = sbi->n_dirty_dirs;
+ si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META);
+ si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg;
+ si->rsvd_segs = reserved_segments(sbi);
+ si->overp_segs = overprovision_segments(sbi);
+ si->valid_count = valid_user_blocks(sbi);
+ si->valid_node_count = valid_node_count(sbi);
+ si->valid_inode_count = valid_inode_count(sbi);
+ si->utilization = utilization(sbi);
+
+ si->free_segs = free_segments(sbi);
+ si->free_secs = free_sections(sbi);
+ si->prefree_count = prefree_segments(sbi);
+ si->dirty_count = dirty_segments(sbi);
+ si->node_pages = sbi->node_inode->i_mapping->nrpages;
+ si->meta_pages = sbi->meta_inode->i_mapping->nrpages;
+ si->nats = NM_I(sbi)->nat_cnt;
+ si->sits = SIT_I(sbi)->dirty_sentries;
+ si->fnids = NM_I(sbi)->fcnt;
+ si->bg_gc = sbi->bg_gc;
+ si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
+ * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
+ / 2;
+ si->util_valid = (int)(written_block_count(sbi) >>
+ sbi->log_blocks_per_seg)
+ * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
+ / 2;
+ si->util_invalid = 50 - si->util_free - si->util_valid;
+ for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_NODE; i++) {
+ struct curseg_info *curseg = CURSEG_I(sbi, i);
+ si->curseg[i] = curseg->segno;
+ si->cursec[i] = curseg->segno / sbi->segs_per_sec;
+ si->curzone[i] = si->cursec[i] / sbi->secs_per_zone;
+ }
+
+ for (i = 0; i < 2; i++) {
+ si->segment_count[i] = sbi->segment_count[i];
+ si->block_count[i] = sbi->block_count[i];
+ }
+}
+
+/**
+ * This function calculates BDF of every segments
+ */
+static void f2fs_update_gc_metric(struct f2fs_sb_info *sbi)
+{
+	struct f2fs_gc_info *gc_i = sbi->gc_info;
+	struct f2fs_stat_info *si = gc_i->stat_info;
+	unsigned int blks_per_sec, hblks_per_sec, total_vblocks, bimodal, dist;
+	struct sit_info *sit_i = SIT_I(sbi);
+	unsigned int segno, vblocks;
+	int ndirty = 0;
+
+	bimodal = 0;
+	total_vblocks = 0;
+	blks_per_sec = sbi->segs_per_sec * (1 << sbi->log_blocks_per_seg);
+	hblks_per_sec = blks_per_sec / 2;
+	mutex_lock(&sit_i->sentry_lock);
+	for (segno = 0; segno < TOTAL_SEGS(sbi); segno += sbi->segs_per_sec) {
+		vblocks = get_valid_blocks(sbi, segno, sbi->segs_per_sec);
+		dist = abs(vblocks - hblks_per_sec);
+		bimodal += dist * dist;
+
+		if (vblocks > 0 && vblocks < blks_per_sec) {
+			total_vblocks += vblocks;
+			ndirty++;
+		}
+	}
+	mutex_unlock(&sit_i->sentry_lock);
+	dist = sbi->total_sections * hblks_per_sec * hblks_per_sec / 100;
+	si->bimodal = bimodal / dist;
+	if (si->dirty_count)
+		si->avg_vblocks = total_vblocks / ndirty;
+	else
+		si->avg_vblocks = 0;
+}
+
+static int f2fs_read_gc(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ struct f2fs_gc_info *gc_i, *next;
+ struct f2fs_stat_info *si;
+ char *buf = page;
+ int i = 0;
+
+ list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
+ int j;
+ si = gc_i->stat_info;
+
+ mutex_lock(&si->stat_list);
+ if (!si->sbi) {
+ mutex_unlock(&si->stat_list);
+ continue;
+ }
+ f2fs_update_stat(si->sbi);
+
+ buf += sprintf(buf, "=====[ partition info. #%d ]=====\n", i++);
+ buf += sprintf(buf, "[SB: 1] [CP: 2] [NAT: %d] [SIT: %d] ",
+ si->nat_area_segs, si->sit_area_segs);
+ buf += sprintf(buf, "[SSA: %d] [MAIN: %d",
+ si->ssa_area_segs, si->main_area_segs);
+ buf += sprintf(buf, "(OverProv:%d Resv:%d)]\n\n",
+ si->overp_segs, si->rsvd_segs);
+ buf += sprintf(buf, "Utilization: %d%% (%d valid blocks)\n",
+ si->utilization, si->valid_count);
+ buf += sprintf(buf, " - Node: %u (Inode: %u, ",
+ si->valid_node_count, si->valid_inode_count);
+ buf += sprintf(buf, "Other: %u)\n - Data: %u\n",
+ si->valid_node_count - si->valid_inode_count,
+ si->valid_count - si->valid_node_count);
+ buf += sprintf(buf, "\nMain area: %d segs, %d secs %d zones\n",
+ si->main_area_segs, si->main_area_sections,
+ si->main_area_zones);
+ buf += sprintf(buf, " - COLD data: %d, %d, %d\n",
+ si->curseg[CURSEG_COLD_DATA],
+ si->cursec[CURSEG_COLD_DATA],
+ si->curzone[CURSEG_COLD_DATA]);
+ buf += sprintf(buf, " - WARM data: %d, %d, %d\n",
+ si->curseg[CURSEG_WARM_DATA],
+ si->cursec[CURSEG_WARM_DATA],
+ si->curzone[CURSEG_WARM_DATA]);
+ buf += sprintf(buf, " - HOT data: %d, %d, %d\n",
+ si->curseg[CURSEG_HOT_DATA],
+ si->cursec[CURSEG_HOT_DATA],
+ si->curzone[CURSEG_HOT_DATA]);
+ buf += sprintf(buf, " - Dir dnode: %d, %d, %d\n",
+ si->curseg[CURSEG_HOT_NODE],
+ si->cursec[CURSEG_HOT_NODE],
+ si->curzone[CURSEG_HOT_NODE]);
+ buf += sprintf(buf, " - File dnode: %d, %d, %d\n",
+ si->curseg[CURSEG_WARM_NODE],
+ si->cursec[CURSEG_WARM_NODE],
+ si->curzone[CURSEG_WARM_NODE]);
+ buf += sprintf(buf, " - Indir nodes: %d, %d, %d\n",
+ si->curseg[CURSEG_COLD_NODE],
+ si->cursec[CURSEG_COLD_NODE],
+ si->curzone[CURSEG_COLD_NODE]);
+ buf += sprintf(buf, "\n - Valid: %d\n - Dirty: %d\n",
+ si->main_area_segs - si->dirty_count -
+ si->prefree_count - si->free_segs,
+ si->dirty_count);
+ buf += sprintf(buf, " - Prefree: %d\n - Free: %d (%d)\n\n",
+ si->prefree_count,
+ si->free_segs,
+ si->free_secs);
+ buf += sprintf(buf, "GC calls: %d (BG: %d)\n",
+ si->call_count, si->bg_gc);
+ buf += sprintf(buf, " - data segments : %d\n", si->data_segs);
+ buf += sprintf(buf, " - node segments : %d\n", si->node_segs);
+ buf += sprintf(buf, "Try to move %d blocks\n", si->tot_blks);
+ buf += sprintf(buf, " - data blocks : %d\n", si->data_blks);
+ buf += sprintf(buf, " - node blocks : %d\n", si->node_blks);
+ buf += sprintf(buf, "\nExtent Hit Ratio: %d / %d\n",
+ si->hit_ext, si->total_ext);
+ buf += sprintf(buf, "\nBalancing F2FS Async:\n");
+ buf += sprintf(buf, " - nodes %4d in %4d\n",
+ si->ndirty_node, si->node_pages);
+ buf += sprintf(buf, " - dents %4d in dirs:%4d\n",
+ si->ndirty_dent, si->ndirty_dirs);
+ buf += sprintf(buf, " - meta %4d in %4d\n",
+ si->ndirty_meta, si->meta_pages);
+ buf += sprintf(buf, " - NATs %5d > %lu\n",
+ si->nats, NM_WOUT_THRESHOLD);
+ buf += sprintf(buf, " - SITs: %5d\n - free_nids: %5d\n",
+ si->sits, si->fnids);
+ buf += sprintf(buf, "\nDistribution of User Blocks:");
+ buf += sprintf(buf, " [ valid | invalid | free ]\n");
+ buf += sprintf(buf, " [");
+ for (j = 0; j < si->util_valid; j++)
+ buf += sprintf(buf, "-");
+ buf += sprintf(buf, "|");
+ for (j = 0; j < si->util_invalid; j++)
+ buf += sprintf(buf, "-");
+ buf += sprintf(buf, "|");
+ for (j = 0; j < si->util_free; j++)
+ buf += sprintf(buf, "-");
+ buf += sprintf(buf, "]\n\n");
+ buf += sprintf(buf, "SSR: %u blocks in %u segments\n",
+ si->block_count[SSR], si->segment_count[SSR]);
+ buf += sprintf(buf, "LFS: %u blocks in %u segments\n",
+ si->block_count[LFS], si->segment_count[LFS]);
+ mutex_unlock(&si->stat_list);
+ }
+ return buf - page;
+}
+
+static int f2fs_read_sit(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ struct f2fs_gc_info *gc_i, *next;
+ struct f2fs_stat_info *si;
+ char *buf = page;
+
+ list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
+ si = gc_i->stat_info;
+
+ mutex_lock(&si->stat_list);
+ if (!si->sbi) {
+ mutex_unlock(&si->stat_list);
+ continue;
+ }
+ f2fs_update_gc_metric(si->sbi);
+
+ buf += sprintf(buf, "BDF: %u, avg. vblocks: %u\n",
+ si->bimodal, si->avg_vblocks);
+ mutex_unlock(&si->stat_list);
+ }
+ return buf - page;
+}
+
+static int f2fs_read_mem(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ struct f2fs_gc_info *gc_i, *next;
+ struct f2fs_stat_info *si;
+ char *buf = page;
+
+ list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
+ struct f2fs_sb_info *sbi = gc_i->stat_info->sbi;
+ unsigned npages;
+ unsigned base_mem = 0, cache_mem = 0;
+
+ si = gc_i->stat_info;
+ mutex_lock(&si->stat_list);
+ if (!si->sbi) {
+ mutex_unlock(&si->stat_list);
+ continue;
+ }
+ base_mem += sizeof(struct f2fs_sb_info) + sbi->sb->s_blocksize;
+ base_mem += 2 * sizeof(struct f2fs_inode_info);
+ base_mem += sizeof(*sbi->ckpt);
+
+ /* build sm */
+ base_mem += sizeof(struct f2fs_sm_info);
+
+ /* build sit */
+ base_mem += sizeof(struct sit_info);
+ base_mem += TOTAL_SEGS(sbi) * sizeof(struct seg_entry);
+ base_mem += f2fs_bitmap_size(TOTAL_SEGS(sbi));
+ base_mem += 2 * SIT_VBLOCK_MAP_SIZE * TOTAL_SEGS(sbi);
+ if (sbi->segs_per_sec > 1)
+ base_mem += sbi->total_sections *
+ sizeof(struct sec_entry);
+ base_mem += __bitmap_size(sbi, SIT_BITMAP);
+
+ /* build free segmap */
+ base_mem += sizeof(struct free_segmap_info);
+ base_mem += f2fs_bitmap_size(TOTAL_SEGS(sbi));
+ base_mem += f2fs_bitmap_size(sbi->total_sections);
+
+ /* build curseg */
+ base_mem += sizeof(struct curseg_info) * NR_CURSEG_TYPE;
+ base_mem += PAGE_CACHE_SIZE * NR_CURSEG_TYPE;
+
+ /* build dirty segmap */
+ base_mem += sizeof(struct dirty_seglist_info);
+ base_mem += NR_DIRTY_TYPE * f2fs_bitmap_size(TOTAL_SEGS(sbi));
+ base_mem += 2 * f2fs_bitmap_size(TOTAL_SEGS(sbi));
+
+ /* build nm */
+ base_mem += sizeof(struct f2fs_nm_info);
+ base_mem += __bitmap_size(sbi, NAT_BITMAP);
+
+ /* build gc */
+ base_mem += sizeof(struct f2fs_gc_info);
+ base_mem += sizeof(struct f2fs_gc_kthread);
+
+ /* free nids */
+ cache_mem += NM_I(sbi)->fcnt;
+ cache_mem += NM_I(sbi)->nat_cnt;
+ npages = sbi->node_inode->i_mapping->nrpages;
+ cache_mem += npages << PAGE_CACHE_SHIFT;
+ npages = sbi->meta_inode->i_mapping->nrpages;
+ cache_mem += npages << PAGE_CACHE_SHIFT;
+ cache_mem += sbi->n_orphans * sizeof(struct orphan_inode_entry);
+ cache_mem += sbi->n_dirty_dirs * sizeof(struct dir_inode_entry);
+
+ buf += sprintf(buf, "%u KB = static: %u + cached: %u\n",
+ (base_mem + cache_mem) >> 10,
+ base_mem >> 10,
+ cache_mem >> 10);
+ mutex_unlock(&si->stat_list);
+ }
+ return buf - page;
+}
+
+static int init_stats(struct f2fs_sb_info *sbi)
+{
+	struct f2fs_stat_info *si;
+	struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
+	struct f2fs_gc_info *gc_i = sbi->gc_info;
+
+	gc_i->stat_info = kzalloc(sizeof(struct f2fs_stat_info),
+			GFP_KERNEL);
+	if (!gc_i->stat_info)
+		return -ENOMEM;
+	si = gc_i->stat_info;
+	mutex_init(&si->stat_list);
+	list_add_tail(&gc_i->stat_list, &f2fs_stat_list);
+
+	si->all_area_segs = le32_to_cpu(raw_super->segment_count);
+	si->sit_area_segs = le32_to_cpu(raw_super->segment_count_sit);
+	si->nat_area_segs = le32_to_cpu(raw_super->segment_count_nat);
+	si->ssa_area_segs = le32_to_cpu(raw_super->segment_count_ssa);
+	si->main_area_segs = le32_to_cpu(raw_super->segment_count_main);
+	si->main_area_sections = le32_to_cpu(raw_super->section_count);
+	si->main_area_zones = si->main_area_sections /
+				le32_to_cpu(raw_super->secs_per_zone);
+	si->sbi = sbi;
+	return 0;
+}
+
+void f2fs_destroy_gci_stats(struct f2fs_gc_info *gc_i)
+{
+	struct f2fs_stat_info *si = gc_i->stat_info;
+
+	list_del(&gc_i->stat_list);
+	mutex_lock(&si->stat_list);
+	si->sbi = NULL;
+	mutex_unlock(&si->stat_list);
+	kfree(gc_i->stat_info);
+}
+
+int f2fs_stat_init(struct super_block *sb, struct f2fs_sb_info *sbi)
+{
+	struct proc_dir_entry *entry;
+	int retval;
+
+	if (!f2fs_proc_root)
+		f2fs_proc_root = proc_mkdir("fs/f2fs", NULL);
+
+	sbi->s_proc = proc_mkdir(sb->s_id, f2fs_proc_root);
+
+	retval = init_stats(sbi);
+	if (retval)
+		return retval;
+
+	entry = create_proc_entry("f2fs_stat", 0, sbi->s_proc);
+	if (!entry)
+		return -ENOMEM;
+	entry->read_proc = f2fs_read_gc;
+	entry->write_proc = NULL;
+
+	entry = create_proc_entry("f2fs_sit_stat", 0, sbi->s_proc);
+	if (!entry) {
+		remove_proc_entry("f2fs_stat", sbi->s_proc);
+		return -ENOMEM;
+	}
+	entry->read_proc = f2fs_read_sit;
+	entry->write_proc = NULL;
+	entry = create_proc_entry("f2fs_mem_stat", 0, sbi->s_proc);
+	if (!entry) {
+		remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
+		remove_proc_entry("f2fs_stat", sbi->s_proc);
+		return -ENOMEM;
+	}
+	entry->read_proc = f2fs_read_mem;
+	entry->write_proc = NULL;
+	return 0;
+}
+
+void f2fs_stat_exit(struct super_block *sb, struct f2fs_sb_info *sbi)
+{
+	if (sbi->s_proc) {
+		remove_proc_entry("f2fs_stat", sbi->s_proc);
+		remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
+		remove_proc_entry("f2fs_mem_stat", sbi->s_proc);
+		remove_proc_entry(sb->s_id, f2fs_proc_root);
+	}
+}
+
+void f2fs_remove_stats(void)
+{
+	remove_proc_entry("fs/f2fs", NULL);
+}
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -953,12 +953,6 @@ int start_gc_thread(struct f2fs_sb_info
 void stop_gc_thread(struct f2fs_sb_info *);
 block_t start_bidx_of_node(unsigned int);
 int f2fs_gc(struct f2fs_sb_info *, int);
-#ifdef CONFIG_F2FS_STAT_FS
-void f2fs_update_stat(struct f2fs_sb_info *);
-void f2fs_update_gc_metric(struct f2fs_sb_info *);
-int f2fs_stat_init(struct f2fs_sb_info *);
-void f2fs_stat_exit(struct f2fs_sb_info *);
-#endif
 int build_gc_manager(struct f2fs_sb_info *);
 void destroy_gc_manager(struct f2fs_sb_info *);
 int create_gc_caches(void);
@@ -970,6 +964,25 @@ void destroy_gc_caches(void);
 void recover_fsync_data(struct f2fs_sb_info *);
 bool space_for_roll_forward(struct f2fs_sb_info *);
 
+/**
+ * debug.c
+ */
+#ifdef CONFIG_F2FS_STAT_FS
+void f2fs_update_stat(struct f2fs_sb_info *);
+int f2fs_stat_init(struct super_block *sb, struct f2fs_sb_info *);
+void f2fs_stat_exit(struct super_block *sb, struct f2fs_sb_info *);
+void f2fs_destroy_gci_stats(struct f2fs_gc_info *gc_i);
+void f2fs_remove_stats(void);
+#else
+static inline void f2fs_update_stat(struct f2fs_sb_info *sbi) { }
+static inline int f2fs_stat_init(struct super_block *sb,
+				 struct f2fs_sb_info *sbi) { return 0; }
+static inline void f2fs_stat_exit(struct super_block *sb,
+				  struct f2fs_sb_info *sbi) { }
+static inline void f2fs_destroy_gci_stats(struct f2fs_gc_info *gc_i) { }
+static inline void f2fs_remove_stats(void) { }
+#endif
+
 extern const struct file_operations f2fs_dir_operations;
 extern const struct file_operations f2fs_file_operations;
 extern const struct inode_operations f2fs_file_inode_operations;
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -24,7 +24,6 @@
#include "segment.h"
#include "gc.h"

-static LIST_HEAD(f2fs_stat_list);
static struct kmem_cache *winode_slab;

static int gc_thread_func(void *data)
@@ -727,350 +726,10 @@ stop:
return gc_status;
}

-#ifdef CONFIG_F2FS_STAT_FS
-void f2fs_update_stat(struct f2fs_sb_info *sbi)
-{
- struct f2fs_gc_info *gc_i = sbi->gc_info;
- struct f2fs_stat_info *si = gc_i->stat_info;
- int i;
-
- /* valid check of the segment numbers */
- si->hit_ext = sbi->read_hit_ext;
- si->total_ext = sbi->total_hit_ext;
- si->ndirty_node = get_pages(sbi, F2FS_DIRTY_NODES);
- si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS);
- si->ndirty_dirs = sbi->n_dirty_dirs;
- si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META);
- si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg;
- si->rsvd_segs = reserved_segments(sbi);
- si->overp_segs = overprovision_segments(sbi);
- si->valid_count = valid_user_blocks(sbi);
- si->valid_node_count = valid_node_count(sbi);
- si->valid_inode_count = valid_inode_count(sbi);
- si->utilization = utilization(sbi);
-
- si->free_segs = free_segments(sbi);
- si->free_secs = free_sections(sbi);
- si->prefree_count = prefree_segments(sbi);
- si->dirty_count = dirty_segments(sbi);
- si->node_pages = sbi->node_inode->i_mapping->nrpages;
- si->meta_pages = sbi->meta_inode->i_mapping->nrpages;
- si->nats = NM_I(sbi)->nat_cnt;
- si->sits = SIT_I(sbi)->dirty_sentries;
- si->fnids = NM_I(sbi)->fcnt;
- si->bg_gc = sbi->bg_gc;
- si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
- * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
- / 2;
- si->util_valid = (int)(written_block_count(sbi) >>
- sbi->log_blocks_per_seg)
- * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
- / 2;
- si->util_invalid = 50 - si->util_free - si->util_valid;
- for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_NODE; i++) {
- struct curseg_info *curseg = CURSEG_I(sbi, i);
- si->curseg[i] = curseg->segno;
- si->cursec[i] = curseg->segno / sbi->segs_per_sec;
- si->curzone[i] = si->cursec[i] / sbi->secs_per_zone;
- }
-
- for (i = 0; i < 2; i++) {
- si->segment_count[i] = sbi->segment_count[i];
- si->block_count[i] = sbi->block_count[i];
- }
-}
-
-/**
- * This function calculates BDF of every segments
- */
-void f2fs_update_gc_metric(struct f2fs_sb_info *sbi)
-{
- struct f2fs_gc_info *gc_i = sbi->gc_info;
- struct f2fs_stat_info *si = gc_i->stat_info;
- unsigned int blks_per_sec, hblks_per_sec, total_vblocks, bimodal, dist;
- struct sit_info *sit_i = SIT_I(sbi);
- unsigned int segno, vblocks;
- int ndirty = 0;
-
- bimodal = 0;
- total_vblocks = 0;
- blks_per_sec = sbi->segs_per_sec * (1 << sbi->log_blocks_per_seg);
- hblks_per_sec = blks_per_sec / 2;
- mutex_lock(&sit_i->sentry_lock);
- for (segno = 0; segno < TOTAL_SEGS(sbi); segno += sbi->segs_per_sec) {
- vblocks = get_valid_blocks(sbi, segno, sbi->segs_per_sec);
- dist = abs(vblocks - hblks_per_sec);
- bimodal += dist * dist;
-
- if (vblocks > 0 && vblocks < blks_per_sec) {
- total_vblocks += vblocks;
- ndirty++;
- }
- }
- mutex_unlock(&sit_i->sentry_lock);
- dist = sbi->total_sections * hblks_per_sec * hblks_per_sec / 100;
- si->bimodal = bimodal / dist;
- if (si->dirty_count)
- si->avg_vblocks = total_vblocks / ndirty;
- else
- si->avg_vblocks = 0;
-}
-
-static int f2fs_read_gc(char *page, char **start, off_t off,
- int count, int *eof, void *data)
-{
- struct f2fs_gc_info *gc_i, *next;
- struct f2fs_stat_info *si;
- char *buf = page;
- int i = 0;
-
- list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
- int j;
- si = gc_i->stat_info;
-
- mutex_lock(&si->stat_list);
- if (!si->sbi) {
- mutex_unlock(&si->stat_list);
- continue;
- }
- f2fs_update_stat(si->sbi);
-
- buf += sprintf(buf, "=====[ partition info. #%d ]=====\n", i++);
- buf += sprintf(buf, "[SB: 1] [CP: 2] [NAT: %d] [SIT: %d] ",
- si->nat_area_segs, si->sit_area_segs);
- buf += sprintf(buf, "[SSA: %d] [MAIN: %d",
- si->ssa_area_segs, si->main_area_segs);
- buf += sprintf(buf, "(OverProv:%d Resv:%d)]\n\n",
- si->overp_segs, si->rsvd_segs);
- buf += sprintf(buf, "Utilization: %d%% (%d valid blocks)\n",
- si->utilization, si->valid_count);
- buf += sprintf(buf, " - Node: %u (Inode: %u, ",
- si->valid_node_count, si->valid_inode_count);
- buf += sprintf(buf, "Other: %u)\n - Data: %u\n",
- si->valid_node_count - si->valid_inode_count,
- si->valid_count - si->valid_node_count);
- buf += sprintf(buf, "\nMain area: %d segs, %d secs %d zones\n",
- si->main_area_segs, si->main_area_sections,
- si->main_area_zones);
- buf += sprintf(buf, " - COLD data: %d, %d, %d\n",
- si->curseg[CURSEG_COLD_DATA],
- si->cursec[CURSEG_COLD_DATA],
- si->curzone[CURSEG_COLD_DATA]);
- buf += sprintf(buf, " - WARM data: %d, %d, %d\n",
- si->curseg[CURSEG_WARM_DATA],
- si->cursec[CURSEG_WARM_DATA],
- si->curzone[CURSEG_WARM_DATA]);
- buf += sprintf(buf, " - HOT data: %d, %d, %d\n",
- si->curseg[CURSEG_HOT_DATA],
- si->cursec[CURSEG_HOT_DATA],
- si->curzone[CURSEG_HOT_DATA]);
- buf += sprintf(buf, " - Dir dnode: %d, %d, %d\n",
- si->curseg[CURSEG_HOT_NODE],
- si->cursec[CURSEG_HOT_NODE],
- si->curzone[CURSEG_HOT_NODE]);
- buf += sprintf(buf, " - File dnode: %d, %d, %d\n",
- si->curseg[CURSEG_WARM_NODE],
- si->cursec[CURSEG_WARM_NODE],
- si->curzone[CURSEG_WARM_NODE]);
- buf += sprintf(buf, " - Indir nodes: %d, %d, %d\n",
- si->curseg[CURSEG_COLD_NODE],
- si->cursec[CURSEG_COLD_NODE],
- si->curzone[CURSEG_COLD_NODE]);
- buf += sprintf(buf, "\n - Valid: %d\n - Dirty: %d\n",
- si->main_area_segs - si->dirty_count -
- si->prefree_count - si->free_segs,
- si->dirty_count);
- buf += sprintf(buf, " - Prefree: %d\n - Free: %d (%d)\n\n",
- si->prefree_count,
- si->free_segs,
- si->free_secs);
- buf += sprintf(buf, "GC calls: %d (BG: %d)\n",
- si->call_count, si->bg_gc);
- buf += sprintf(buf, " - data segments : %d\n", si->data_segs);
- buf += sprintf(buf, " - node segments : %d\n", si->node_segs);
- buf += sprintf(buf, "Try to move %d blocks\n", si->tot_blks);
- buf += sprintf(buf, " - data blocks : %d\n", si->data_blks);
- buf += sprintf(buf, " - node blocks : %d\n", si->node_blks);
- buf += sprintf(buf, "\nExtent Hit Ratio: %d / %d\n",
- si->hit_ext, si->total_ext);
- buf += sprintf(buf, "\nBalancing F2FS Async:\n");
- buf += sprintf(buf, " - nodes %4d in %4d\n",
- si->ndirty_node, si->node_pages);
- buf += sprintf(buf, " - dents %4d in dirs:%4d\n",
- si->ndirty_dent, si->ndirty_dirs);
- buf += sprintf(buf, " - meta %4d in %4d\n",
- si->ndirty_meta, si->meta_pages);
- buf += sprintf(buf, " - NATs %5d > %lu\n",
- si->nats, NM_WOUT_THRESHOLD);
- buf += sprintf(buf, " - SITs: %5d\n - free_nids: %5d\n",
- si->sits, si->fnids);
- buf += sprintf(buf, "\nDistribution of User Blocks:");
- buf += sprintf(buf, " [ valid | invalid | free ]\n");
- buf += sprintf(buf, " [");
- for (j = 0; j < si->util_valid; j++)
- buf += sprintf(buf, "-");
- buf += sprintf(buf, "|");
- for (j = 0; j < si->util_invalid; j++)
- buf += sprintf(buf, "-");
- buf += sprintf(buf, "|");
- for (j = 0; j < si->util_free; j++)
- buf += sprintf(buf, "-");
- buf += sprintf(buf, "]\n\n");
- buf += sprintf(buf, "SSR: %u blocks in %u segments\n",
- si->block_count[SSR], si->segment_count[SSR]);
- buf += sprintf(buf, "LFS: %u blocks in %u segments\n",
- si->block_count[LFS], si->segment_count[LFS]);
- mutex_unlock(&si->stat_list);
- }
- return buf - page;
-}
-
-static int f2fs_read_sit(char *page, char **start, off_t off,
- int count, int *eof, void *data)
-{
- struct f2fs_gc_info *gc_i, *next;
- struct f2fs_stat_info *si;
- char *buf = page;
-
- list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
- si = gc_i->stat_info;
-
- mutex_lock(&si->stat_list);
- if (!si->sbi) {
- mutex_unlock(&si->stat_list);
- continue;
- }
- f2fs_update_gc_metric(si->sbi);
-
- buf += sprintf(buf, "BDF: %u, avg. vblocks: %u\n",
- si->bimodal, si->avg_vblocks);
- mutex_unlock(&si->stat_list);
- }
- return buf - page;
-}
-
-static int f2fs_read_mem(char *page, char **start, off_t off,
- int count, int *eof, void *data)
-{
- struct f2fs_gc_info *gc_i, *next;
- struct f2fs_stat_info *si;
- char *buf = page;
-
- list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
- struct f2fs_sb_info *sbi = gc_i->stat_info->sbi;
- unsigned npages;
- unsigned base_mem = 0, cache_mem = 0;
-
- si = gc_i->stat_info;
- mutex_lock(&si->stat_list);
- if (!si->sbi) {
- mutex_unlock(&si->stat_list);
- continue;
- }
- base_mem += sizeof(struct f2fs_sb_info) + sbi->sb->s_blocksize;
- base_mem += 2 * sizeof(struct f2fs_inode_info);
- base_mem += sizeof(*sbi->ckpt);
-
- /* build sm */
- base_mem += sizeof(struct f2fs_sm_info);
-
- /* build sit */
- base_mem += sizeof(struct sit_info);
- base_mem += TOTAL_SEGS(sbi) * sizeof(struct seg_entry);
- base_mem += f2fs_bitmap_size(TOTAL_SEGS(sbi));
- base_mem += 2 * SIT_VBLOCK_MAP_SIZE * TOTAL_SEGS(sbi);
- if (sbi->segs_per_sec > 1)
- base_mem += sbi->total_sections *
- sizeof(struct sec_entry);
- base_mem += __bitmap_size(sbi, SIT_BITMAP);
-
- /* build free segmap */
- base_mem += sizeof(struct free_segmap_info);
- base_mem += f2fs_bitmap_size(TOTAL_SEGS(sbi));
- base_mem += f2fs_bitmap_size(sbi->total_sections);
-
- /* build curseg */
- base_mem += sizeof(struct curseg_info) * NR_CURSEG_TYPE;
- base_mem += PAGE_CACHE_SIZE * NR_CURSEG_TYPE;
-
- /* build dirty segmap */
- base_mem += sizeof(struct dirty_seglist_info);
- base_mem += NR_DIRTY_TYPE * f2fs_bitmap_size(TOTAL_SEGS(sbi));
- base_mem += 2 * f2fs_bitmap_size(TOTAL_SEGS(sbi));
-
- /* buld nm */
- base_mem += sizeof(struct f2fs_nm_info);
- base_mem += __bitmap_size(sbi, NAT_BITMAP);
-
- /* build gc */
- base_mem += sizeof(struct f2fs_gc_info);
- base_mem += sizeof(struct f2fs_gc_kthread);
-
- /* free nids */
- cache_mem += NM_I(sbi)->fcnt;
- cache_mem += NM_I(sbi)->nat_cnt;
- npages = sbi->node_inode->i_mapping->nrpages;
- cache_mem += npages << PAGE_CACHE_SHIFT;
- npages = sbi->meta_inode->i_mapping->nrpages;
- cache_mem += npages << PAGE_CACHE_SHIFT;
- cache_mem += sbi->n_orphans * sizeof(struct orphan_inode_entry);
- cache_mem += sbi->n_dirty_dirs * sizeof(struct dir_inode_entry);
-
- buf += sprintf(buf, "%u KB = static: %u + cached: %u\n",
- (base_mem + cache_mem) >> 10,
- base_mem >> 10,
- cache_mem >> 10);
- mutex_unlock(&si->stat_list);
- }
- return buf - page;
-}
-
-int f2fs_stat_init(struct f2fs_sb_info *sbi)
-{
- struct proc_dir_entry *entry;
-
- entry = create_proc_entry("f2fs_stat", 0, sbi->s_proc);
- if (!entry)
- return -ENOMEM;
- entry->read_proc = f2fs_read_gc;
- entry->write_proc = NULL;
-
- entry = create_proc_entry("f2fs_sit_stat", 0, sbi->s_proc);
- if (!entry) {
- remove_proc_entry("f2fs_stat", sbi->s_proc);
- return -ENOMEM;
- }
- entry->read_proc = f2fs_read_sit;
- entry->write_proc = NULL;
- entry = create_proc_entry("f2fs_mem_stat", 0, sbi->s_proc);
- if (!entry) {
- remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
- remove_proc_entry("f2fs_stat", sbi->s_proc);
- return -ENOMEM;
- }
- entry->read_proc = f2fs_read_mem;
- entry->write_proc = NULL;
- return 0;
-}
-
-void f2fs_stat_exit(struct f2fs_sb_info *sbi)
-{
- if (sbi->s_proc) {
- remove_proc_entry("f2fs_stat", sbi->s_proc);
- remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
- remove_proc_entry("f2fs_mem_stat", sbi->s_proc);
- }
-}
-#endif
-
int build_gc_manager(struct f2fs_sb_info *sbi)
{
struct f2fs_gc_info *gc_i;
struct f2fs_checkpoint *ckp = F2FS_CKPT(sbi);
-#ifdef CONFIG_F2FS_STAT_FS
- struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
- struct f2fs_stat_info *si;
-#endif

gc_i = kzalloc(sizeof(struct f2fs_gc_info), GFP_KERNEL);
if (!gc_i)
@@ -1082,44 +741,18 @@ int build_gc_manager(struct f2fs_sb_info

DIRTY_I(sbi)->v_ops = &default_v_ops;

-#ifdef CONFIG_F2FS_STAT_FS
- gc_i->stat_info = kzalloc(sizeof(struct f2fs_stat_info),
- GFP_KERNEL);
- if (!gc_i->stat_info)
- return -ENOMEM;
- si = gc_i->stat_info;
- mutex_init(&si->stat_list);
- list_add_tail(&gc_i->stat_list, &f2fs_stat_list);
-
- si->all_area_segs = le32_to_cpu(raw_super->segment_count);
- si->sit_area_segs = le32_to_cpu(raw_super->segment_count_sit);
- si->nat_area_segs = le32_to_cpu(raw_super->segment_count_nat);
- si->ssa_area_segs = le32_to_cpu(raw_super->segment_count_ssa);
- si->main_area_segs = le32_to_cpu(raw_super->segment_count_main);
- si->main_area_sections = le32_to_cpu(raw_super->section_count);
- si->main_area_zones = si->main_area_sections /
- le32_to_cpu(raw_super->secs_per_zone);
- si->sbi = sbi;
-#endif
return 0;
}

void destroy_gc_manager(struct f2fs_sb_info *sbi)
{
struct f2fs_gc_info *gc_i = sbi->gc_info;
-#ifdef CONFIG_F2FS_STAT_FS
- struct f2fs_stat_info *si = gc_i->stat_info;
-#endif
+
if (!gc_i)
return;

-#ifdef CONFIG_F2FS_STAT_FS
- list_del(&gc_i->stat_list);
- mutex_lock(&si->stat_list);
- si->sbi = NULL;
- mutex_unlock(&si->stat_list);
- kfree(gc_i->stat_info);
-#endif
+ f2fs_destroy_gci_stats(gc_i);
+
sbi->gc_info = NULL;
kfree(gc_i);
}
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -26,7 +26,6 @@
#include "xattr.h"

static struct kmem_cache *f2fs_inode_cachep;
-static struct proc_dir_entry *f2fs_proc_root;

enum {
Opt_gc_background_off,
@@ -97,12 +96,7 @@ static void f2fs_put_super(struct super_
{
struct f2fs_sb_info *sbi = F2FS_SB(sb);

-#ifdef CONFIG_F2FS_STAT_FS
- if (sbi->s_proc) {
- f2fs_stat_exit(sbi);
- remove_proc_entry(sb->s_id, f2fs_proc_root);
- }
-#endif
+ f2fs_stat_exit(sb, sbi);
stop_gc_thread(sbi);

write_checkpoint(sbi, false, true);
@@ -486,13 +480,9 @@ static int f2fs_fill_super(struct super_
if (start_gc_thread(sbi))
goto fail;

-#ifdef CONFIG_F2FS_STAT_FS
- if (f2fs_proc_root) {
- sbi->s_proc = proc_mkdir(sb->s_id, f2fs_proc_root);
- if (f2fs_stat_init(sbi))
- goto fail;
- }
-#endif
+ if (f2fs_stat_init(sb, sbi))
+ goto fail;
+
return 0;
fail:
stop_gc_thread(sbi);
@@ -566,7 +556,6 @@ static int __init init_f2fs_fs(void)
if (register_filesystem(&f2fs_fs_type))
return -EBUSY;

- f2fs_proc_root = proc_mkdir("fs/f2fs", NULL);
return 0;
fail:
return -ENOMEM;
@@ -574,7 +563,7 @@ fail:

static void __exit exit_f2fs_fs(void)
{
- remove_proc_entry("fs/f2fs", NULL);
+ f2fs_remove_stats();
unregister_filesystem(&f2fs_fs_type);
destroy_checkpoint_caches();
destroy_gc_caches();

2012-10-23 18:24:07

by Greg Kroah-Hartman

Subject: [PATCH 3/3] f2fs: move proc files to debugfs

From: Greg Kroah-Hartman <[email protected]>

This moves all of the f2fs debugging files into debugfs. The files are
located in /sys/kernel/debug/f2fs/

Note, I think we are generating all of the same information in each of
the files for every unique f2fs filesystem in the machine. This copies
the functionality that was present in the proc files, but this should be
fixed up in the future.

Compile-tested only.

Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/f2fs/Kconfig | 8 -
fs/f2fs/debug.c | 288 ++++++++++++++++++++++++++++++--------------------------
fs/f2fs/f2fs.h | 2
3 files changed, 161 insertions(+), 137 deletions(-)

--- a/fs/f2fs/Kconfig
+++ b/fs/f2fs/Kconfig
@@ -16,18 +16,18 @@ config F2FS_FS

config F2FS_STAT_FS
bool "F2FS Status Information"
- depends on F2FS_FS
+ depends on F2FS_FS && DEBUG_FS
default y
help
- /proc/fs/f2fs/ contains information about partitions mounted as f2fs.
+ /sys/kernel/debug/f2fs/ contains information about partitions mounted as f2fs.
For each partition, a corresponding directory, named as its device
- name, is provided with the following proc entries.
+ name, is provided with the following files:

f2fs_stat major file system information managed by f2fs currently
f2fs_sit_stat average SIT information about whole segments
f2fs_mem_stat current memory footprint consumed by f2fs

- e.g., in /proc/fs/f2fs/sdb1/
+ e.g., in /sys/kernel/debug/f2fs/sdb1/

config F2FS_FS_XATTR
bool "F2FS extended attributes"
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -16,6 +16,8 @@
#include <linux/proc_fs.h>
#include <linux/f2fs_fs.h>
#include <linux/blkdev.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>

#include "f2fs.h"
#include "node.h"
@@ -23,7 +25,7 @@
#include "gc.h"

static LIST_HEAD(f2fs_stat_list);
-static struct proc_dir_entry *f2fs_proc_root;
+static struct dentry *debugfs_root;


void f2fs_update_stat(struct f2fs_sb_info *sbi)
@@ -114,16 +116,14 @@ static void f2fs_update_gc_metric(struct
si->avg_vblocks = 0;
}

-static int f2fs_read_gc(char *page, char **start, off_t off,
- int count, int *eof, void *data)
+static int stat_show(struct seq_file *s, void *v)
{
struct f2fs_gc_info *gc_i, *next;
struct f2fs_stat_info *si;
- char *buf = page;
int i = 0;
+ int j;

list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
- int j;
si = gc_i->stat_info;

mutex_lock(&si->stat_list);
@@ -133,102 +133,111 @@ static int f2fs_read_gc(char *page, char
}
f2fs_update_stat(si->sbi);

- buf += sprintf(buf, "=====[ partition info. #%d ]=====\n", i++);
- buf += sprintf(buf, "[SB: 1] [CP: 2] [NAT: %d] [SIT: %d] ",
- si->nat_area_segs, si->sit_area_segs);
- buf += sprintf(buf, "[SSA: %d] [MAIN: %d",
- si->ssa_area_segs, si->main_area_segs);
- buf += sprintf(buf, "(OverProv:%d Resv:%d)]\n\n",
- si->overp_segs, si->rsvd_segs);
- buf += sprintf(buf, "Utilization: %d%% (%d valid blocks)\n",
- si->utilization, si->valid_count);
- buf += sprintf(buf, " - Node: %u (Inode: %u, ",
- si->valid_node_count, si->valid_inode_count);
- buf += sprintf(buf, "Other: %u)\n - Data: %u\n",
- si->valid_node_count - si->valid_inode_count,
- si->valid_count - si->valid_node_count);
- buf += sprintf(buf, "\nMain area: %d segs, %d secs %d zones\n",
- si->main_area_segs, si->main_area_sections,
- si->main_area_zones);
- buf += sprintf(buf, " - COLD data: %d, %d, %d\n",
- si->curseg[CURSEG_COLD_DATA],
- si->cursec[CURSEG_COLD_DATA],
- si->curzone[CURSEG_COLD_DATA]);
- buf += sprintf(buf, " - WARM data: %d, %d, %d\n",
- si->curseg[CURSEG_WARM_DATA],
- si->cursec[CURSEG_WARM_DATA],
- si->curzone[CURSEG_WARM_DATA]);
- buf += sprintf(buf, " - HOT data: %d, %d, %d\n",
- si->curseg[CURSEG_HOT_DATA],
- si->cursec[CURSEG_HOT_DATA],
- si->curzone[CURSEG_HOT_DATA]);
- buf += sprintf(buf, " - Dir dnode: %d, %d, %d\n",
- si->curseg[CURSEG_HOT_NODE],
- si->cursec[CURSEG_HOT_NODE],
- si->curzone[CURSEG_HOT_NODE]);
- buf += sprintf(buf, " - File dnode: %d, %d, %d\n",
- si->curseg[CURSEG_WARM_NODE],
- si->cursec[CURSEG_WARM_NODE],
- si->curzone[CURSEG_WARM_NODE]);
- buf += sprintf(buf, " - Indir nodes: %d, %d, %d\n",
- si->curseg[CURSEG_COLD_NODE],
- si->cursec[CURSEG_COLD_NODE],
- si->curzone[CURSEG_COLD_NODE]);
- buf += sprintf(buf, "\n - Valid: %d\n - Dirty: %d\n",
- si->main_area_segs - si->dirty_count -
- si->prefree_count - si->free_segs,
- si->dirty_count);
- buf += sprintf(buf, " - Prefree: %d\n - Free: %d (%d)\n\n",
- si->prefree_count,
- si->free_segs,
- si->free_secs);
- buf += sprintf(buf, "GC calls: %d (BG: %d)\n",
- si->call_count, si->bg_gc);
- buf += sprintf(buf, " - data segments : %d\n", si->data_segs);
- buf += sprintf(buf, " - node segments : %d\n", si->node_segs);
- buf += sprintf(buf, "Try to move %d blocks\n", si->tot_blks);
- buf += sprintf(buf, " - data blocks : %d\n", si->data_blks);
- buf += sprintf(buf, " - node blocks : %d\n", si->node_blks);
- buf += sprintf(buf, "\nExtent Hit Ratio: %d / %d\n",
- si->hit_ext, si->total_ext);
- buf += sprintf(buf, "\nBalancing F2FS Async:\n");
- buf += sprintf(buf, " - nodes %4d in %4d\n",
- si->ndirty_node, si->node_pages);
- buf += sprintf(buf, " - dents %4d in dirs:%4d\n",
- si->ndirty_dent, si->ndirty_dirs);
- buf += sprintf(buf, " - meta %4d in %4d\n",
- si->ndirty_meta, si->meta_pages);
- buf += sprintf(buf, " - NATs %5d > %lu\n",
- si->nats, NM_WOUT_THRESHOLD);
- buf += sprintf(buf, " - SITs: %5d\n - free_nids: %5d\n",
- si->sits, si->fnids);
- buf += sprintf(buf, "\nDistribution of User Blocks:");
- buf += sprintf(buf, " [ valid | invalid | free ]\n");
- buf += sprintf(buf, " [");
+ seq_printf(s, "=====[ partition info. #%d ]=====\n", i++);
+ seq_printf(s, "=====[ partition info. #%d ]=====\n", i++);
+ seq_printf(s, "[SB: 1] [CP: 2] [NAT: %d] [SIT: %d] ",
+ si->nat_area_segs, si->sit_area_segs);
+ seq_printf(s, "[SSA: %d] [MAIN: %d",
+ si->ssa_area_segs, si->main_area_segs);
+ seq_printf(s, "(OverProv:%d Resv:%d)]\n\n",
+ si->overp_segs, si->rsvd_segs);
+ seq_printf(s, "Utilization: %d%% (%d valid blocks)\n",
+ si->utilization, si->valid_count);
+ seq_printf(s, " - Node: %u (Inode: %u, ",
+ si->valid_node_count, si->valid_inode_count);
+ seq_printf(s, "Other: %u)\n - Data: %u\n",
+ si->valid_node_count - si->valid_inode_count,
+ si->valid_count - si->valid_node_count);
+ seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
+ si->main_area_segs, si->main_area_sections,
+ si->main_area_zones);
+ seq_printf(s, " - COLD data: %d, %d, %d\n",
+ si->curseg[CURSEG_COLD_DATA],
+ si->cursec[CURSEG_COLD_DATA],
+ si->curzone[CURSEG_COLD_DATA]);
+ seq_printf(s, " - WARM data: %d, %d, %d\n",
+ si->curseg[CURSEG_WARM_DATA],
+ si->cursec[CURSEG_WARM_DATA],
+ si->curzone[CURSEG_WARM_DATA]);
+ seq_printf(s, " - HOT data: %d, %d, %d\n",
+ si->curseg[CURSEG_HOT_DATA],
+ si->cursec[CURSEG_HOT_DATA],
+ si->curzone[CURSEG_HOT_DATA]);
+ seq_printf(s, " - Dir dnode: %d, %d, %d\n",
+ si->curseg[CURSEG_HOT_NODE],
+ si->cursec[CURSEG_HOT_NODE],
+ si->curzone[CURSEG_HOT_NODE]);
+ seq_printf(s, " - File dnode: %d, %d, %d\n",
+ si->curseg[CURSEG_WARM_NODE],
+ si->cursec[CURSEG_WARM_NODE],
+ si->curzone[CURSEG_WARM_NODE]);
+ seq_printf(s, " - Indir nodes: %d, %d, %d\n",
+ si->curseg[CURSEG_COLD_NODE],
+ si->cursec[CURSEG_COLD_NODE],
+ si->curzone[CURSEG_COLD_NODE]);
+ seq_printf(s, "\n - Valid: %d\n - Dirty: %d\n",
+ si->main_area_segs - si->dirty_count -
+ si->prefree_count - si->free_segs,
+ si->dirty_count);
+ seq_printf(s, " - Prefree: %d\n - Free: %d (%d)\n\n",
+ si->prefree_count, si->free_segs, si->free_secs);
+ seq_printf(s, "GC calls: %d (BG: %d)\n",
+ si->call_count, si->bg_gc);
+ seq_printf(s, " - data segments : %d\n", si->data_segs);
+ seq_printf(s, " - node segments : %d\n", si->node_segs);
+ seq_printf(s, "Try to move %d blocks\n", si->tot_blks);
+ seq_printf(s, " - data blocks : %d\n", si->data_blks);
+ seq_printf(s, " - node blocks : %d\n", si->node_blks);
+ seq_printf(s, "\nExtent Hit Ratio: %d / %d\n",
+ si->hit_ext, si->total_ext);
+ seq_printf(s, "\nBalancing F2FS Async:\n");
+ seq_printf(s, " - nodes %4d in %4d\n",
+ si->ndirty_node, si->node_pages);
+ seq_printf(s, " - dents %4d in dirs:%4d\n",
+ si->ndirty_dent, si->ndirty_dirs);
+ seq_printf(s, " - meta %4d in %4d\n",
+ si->ndirty_meta, si->meta_pages);
+ seq_printf(s, " - NATs %5d > %lu\n",
+ si->nats, NM_WOUT_THRESHOLD);
+ seq_printf(s, " - SITs: %5d\n - free_nids: %5d\n",
+ si->sits, si->fnids);
+ seq_printf(s, "\nDistribution of User Blocks:");
+ seq_printf(s, " [ valid | invalid | free ]\n");
+ seq_printf(s, " [");
for (j = 0; j < si->util_valid; j++)
- buf += sprintf(buf, "-");
- buf += sprintf(buf, "|");
+ seq_printf(s, "-");
+ seq_printf(s, "|");
for (j = 0; j < si->util_invalid; j++)
- buf += sprintf(buf, "-");
- buf += sprintf(buf, "|");
+ seq_printf(s, "-");
+ seq_printf(s, "|");
for (j = 0; j < si->util_free; j++)
- buf += sprintf(buf, "-");
- buf += sprintf(buf, "]\n\n");
- buf += sprintf(buf, "SSR: %u blocks in %u segments\n",
- si->block_count[SSR], si->segment_count[SSR]);
- buf += sprintf(buf, "LFS: %u blocks in %u segments\n",
- si->block_count[LFS], si->segment_count[LFS]);
+ seq_printf(s, "-");
+ seq_printf(s, "]\n\n");
+ seq_printf(s, "SSR: %u blocks in %u segments\n",
+ si->block_count[SSR], si->segment_count[SSR]);
+ seq_printf(s, "LFS: %u blocks in %u segments\n",
+ si->block_count[LFS], si->segment_count[LFS]);
mutex_unlock(&si->stat_list);
}
- return buf - page;
+ return 0;
+}
+
+static int stat_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, stat_show, inode->i_private);
}

-static int f2fs_read_sit(char *page, char **start, off_t off,
- int count, int *eof, void *data)
+static const struct file_operations stat_fops = {
+ .open = stat_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int sit_show(struct seq_file *s, void *v)
{
struct f2fs_gc_info *gc_i, *next;
struct f2fs_stat_info *si;
- char *buf = page;

list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
si = gc_i->stat_info;
@@ -240,19 +249,29 @@ static int f2fs_read_sit(char *page, cha
}
f2fs_update_gc_metric(si->sbi);

- buf += sprintf(buf, "BDF: %u, avg. vblocks: %u\n",
- si->bimodal, si->avg_vblocks);
+ seq_printf(s, "BDF: %u, avg. vblocks: %u\n",
+ si->bimodal, si->avg_vblocks);
mutex_unlock(&si->stat_list);
}
- return buf - page;
+ return 0;
}

-static int f2fs_read_mem(char *page, char **start, off_t off,
- int count, int *eof, void *data)
+static int sit_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, sit_show, inode->i_private);
+}
+
+static const struct file_operations sit_fops = {
+ .open = sit_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int mem_show(struct seq_file *s, void *v)
{
struct f2fs_gc_info *gc_i, *next;
struct f2fs_stat_info *si;
- char *buf = page;

list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
struct f2fs_sb_info *sbi = gc_i->stat_info->sbi;
@@ -314,15 +333,27 @@ static int f2fs_read_mem(char *page, cha
cache_mem += sbi->n_orphans * sizeof(struct orphan_inode_entry);
cache_mem += sbi->n_dirty_dirs * sizeof(struct dir_inode_entry);

- buf += sprintf(buf, "%u KB = static: %u + cached: %u\n",
- (base_mem + cache_mem) >> 10,
- base_mem >> 10,
- cache_mem >> 10);
+ seq_printf(s, "%u KB = static: %u + cached: %u\n",
+ (base_mem + cache_mem) >> 10,
+ base_mem >> 10, cache_mem >> 10);
mutex_unlock(&si->stat_list);
}
- return buf - page;
+ return 0;
}

+static int mem_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, mem_show, inode->i_private);
+}
+
+static const struct file_operations mem_fops = {
+ .open = mem_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+
static int init_stats(struct f2fs_sb_info *sbi)
{
struct f2fs_stat_info *si;
@@ -362,53 +393,46 @@ void f2fs_destroy_gci_stats(struct f2fs_

int f2fs_stat_init(struct super_block *sb, struct f2fs_sb_info *sbi)
{
- struct proc_dir_entry *entry;
int retval;

- if (!f2fs_proc_root)
- f2fs_proc_root = proc_mkdir("fs/f2fs", NULL);
+ if (!debugfs_root)
+ debugfs_root = debugfs_create_dir("f2fs", NULL);

- sbi->s_proc = proc_mkdir(sb->s_id, f2fs_proc_root);
+ sbi->s_debug = debugfs_create_dir(sb->s_id, debugfs_root);

retval = init_stats(sbi);
if (retval)
return retval;

- entry = create_proc_entry("f2fs_stat", 0, sbi->s_proc);
- if (!entry)
- return -ENOMEM;
- entry->read_proc = f2fs_read_gc;
- entry->write_proc = NULL;
+ if (!debugfs_create_file("f2fs_stat", S_IRUGO, debugfs_root,
+ NULL, &stat_fops))
+ goto failed;
+
+ if (!debugfs_create_file("f2fs_sit_stat", S_IRUGO, debugfs_root,
+ NULL, &sit_fops))
+ goto failed;
+
+ if (!debugfs_create_file("f2fs_mem_stat", S_IRUGO, debugfs_root,
+ NULL, &mem_fops))
+ goto failed;

- entry = create_proc_entry("f2fs_sit_stat", 0, sbi->s_proc);
- if (!entry) {
- remove_proc_entry("f2fs_stat", sbi->s_proc);
- return -ENOMEM;
- }
- entry->read_proc = f2fs_read_sit;
- entry->write_proc = NULL;
- entry = create_proc_entry("f2fs_mem_stat", 0, sbi->s_proc);
- if (!entry) {
- remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
- remove_proc_entry("f2fs_stat", sbi->s_proc);
- return -ENOMEM;
- }
- entry->read_proc = f2fs_read_mem;
- entry->write_proc = NULL;
return 0;
+failed:
+ debugfs_remove_recursive(sbi->s_debug);
+ sbi->s_debug = NULL;
+ return -EINVAL;
}

void f2fs_stat_exit(struct super_block *sb, struct f2fs_sb_info *sbi)
{
- if (sbi->s_proc) {
- remove_proc_entry("f2fs_stat", sbi->s_proc);
- remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
- remove_proc_entry("f2fs_mem_stat", sbi->s_proc);
- remove_proc_entry(sb->s_id, f2fs_proc_root);
+ if (sbi->s_debug) {
+ debugfs_remove_recursive(sbi->s_debug);
+ sbi->s_debug = NULL;
}
}

void f2fs_remove_stats(void)
{
- remove_proc_entry("fs/f2fs", NULL);
+ debugfs_remove_recursive(debugfs_root);
+ debugfs_root = NULL;
}
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -379,7 +379,7 @@ struct f2fs_sb_info {
int rr_flush;

/* related to GC */
- struct proc_dir_entry *s_proc;
+ struct dentry *s_debug;
struct f2fs_gc_info *gc_info; /* Garbage Collector
information */
struct mutex gc_mutex; /* mutex for GC */

2012-10-23 18:27:07

by Greg Kroah-Hartman

Subject: Re: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system

On Tue, Oct 23, 2012 at 11:21:53AM +0900, Jaegeuk Kim wrote:
> mkfs.f2fs
> =========
>
> The file system formatting tool, "mkfs.f2fs", is available from the following
> download page: http://sourceforge.net/projects/f2fs-tools/

Is there a git tree of this tool somewhere, so I don't have to
constantly suffer the sf.net download interface every time I want to get
the latest version?

thanks,

greg k-h

2012-10-23 18:57:57

by Greg Kroah-Hartman

Subject: Re: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system

On Tue, Oct 23, 2012 at 11:26:59AM -0700, Greg KH wrote:
> On Tue, Oct 23, 2012 at 11:21:53AM +0900, Jaegeuk Kim wrote:
> > mkfs.f2fs
> > =========
> >
> > The file system formatting tool, "mkfs.f2fs", is available from the following
> > download page: http://sourceforge.net/projects/f2fs-tools/
>
> Is there a git tree of this tool somewhere, so I don't have to
> constantly suffer the sf.net download interface every time I want to get
> the latest version?

Oh, and where do we report bugs for this tool? I just formatted a USB
stick with the mkfs.f2fs program, and it did not fully erase the old
filesystem that was on there (iso9660), so when I mounted the stick, it
was mounted as iso9660, not f2fs.

thanks,

greg k-h
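
To illustrate the stale-signature problem reported above: an ISO 9660 image keeps its volume descriptors starting at sector 16 (2048-byte sectors), with the identifier "CD001" one byte into the descriptor. If a formatter leaves that region untouched, a filesystem prober can still recognise the old format. The sketch below is a userspace illustration only; the function names are hypothetical and it is not taken from mkfs.f2fs, whose actual behaviour is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ISO 9660 volume descriptors begin at sector 16 of 2048-byte sectors. */
#define ISO9660_SECTOR_SIZE 2048
#define ISO9660_VD_OFFSET   (16 * ISO9660_SECTOR_SIZE)
#define ISO9660_MAGIC       "CD001"
#define ISO9660_MAGIC_LEN   5

/* Hypothetical helper: return 1 if the buffer still carries the
 * ISO 9660 identifier at byte 1 of the first volume descriptor. */
int has_iso9660_signature(const unsigned char *img, size_t len)
{
	assert(img != NULL);
	if (len < ISO9660_VD_OFFSET + 1 + ISO9660_MAGIC_LEN)
		return 0;
	return memcmp(img + ISO9660_VD_OFFSET + 1,
		      ISO9660_MAGIC, ISO9660_MAGIC_LEN) == 0;
}

/* Hypothetical helper: zero the first volume descriptor sector so the
 * old signature can no longer be detected. Does nothing if the buffer
 * is too short to contain that sector. */
void wipe_iso9660_signature(unsigned char *img, size_t len)
{
	assert(img != NULL);
	if (len < ISO9660_VD_OFFSET + ISO9660_SECTOR_SIZE)
		return;
	memset(img + ISO9660_VD_OFFSET, 0, ISO9660_SECTOR_SIZE);
}
```

A formatter working along these lines would call the check on the first 34 KiB of the device and wipe the descriptor sector before writing its own superblock. Whether mkfs.f2fs should do this itself or leave it to a separate tool is a design choice this sketch does not settle.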

2012-10-23 19:11:37

by Greg Kroah-Hartman

Subject: Re: [PATCH 0/3] f2fs: move proc files to debugfs

On Tue, Oct 23, 2012 at 11:20:55AM -0700, Greg KH wrote:
> Here are 3 patches, moving the proc file usage on f2fs to debugfs.
>
> The first one fixes a bug in the gc.h file preventing it from being able
> to be included by any other files.
>
> The second patch moves all current proc file accesses to a single file,
> removing all #ifdefs from the .c files. This should have been done in
> the first place.
>
> The last file converts the files to use debugfs instead of proc.
>
> Note, these patches have been compile tested only, I haven't tested them
> out, as I haven't had the chance to yet. I'll go do that this afternoon
> after I catch up on some other pending kernel work.
>
> One question, it seems that the proc files show all information for all
> super blocks in the system, no matter which subdirectory you are reading
> from in the proc f2fs tree. Is that really what you want? Shouldn't we
> only be showing the stats of the superblock we are saying we will
> report? I'll test that later today, and if it really is wrong, will fix
> the debugfs code up to handle this properly.

I just tested your patch set, and it looks like I see all partition
information in each file, no matter what subdir it is in.

So, do you want this to be broken up per partition/superblock, in a
subdir, like you intended? Or just 3 files, for all superblocks in the
system?

Oh, and the third patch is buggy, so don't apply it. I got the subdir
logic wrong; I'll go fix that up now.

thanks,

greg k-h
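
The two options raised above (every file dumps all superblocks, or each file reports only its own) can be modelled in plain userspace C. In the kernel, the data pointer given to debugfs_create_file() is stored in inode->i_private, and the third argument of single_open() becomes the seq_file's private pointer. The posted patch passes NULL as that data pointer and walks a global list instead. The sketch below is only an analogy under those assumptions: struct stat_info, show_all() and show_one() are hypothetical names, and snprintf() stands in for seq_printf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the per-filesystem statistics. */
struct stat_info {
	const char *dev;
	int valid_blocks;
};

/* Global-list style: whichever file is read, every filesystem is dumped.
 * Returns the number of bytes written, excluding the terminating NUL. */
int show_all(char *buf, size_t len, const struct stat_info *list, int n)
{
	size_t used = 0;
	int i;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		int w = snprintf(buf + used, len - used, "%s: %d valid blocks\n",
				 list[i].dev, list[i].valid_blocks);
		if (w < 0 || (size_t)w >= len - used) {
			buf[used] = '\0';	/* drop the truncated entry */
			break;
		}
		used += (size_t)w;
	}
	return (int)used;
}

/* Per-superblock style: the file carries a private pointer to exactly
 * one filesystem, so only that filesystem is reported. */
int show_one(char *buf, size_t len, const void *private_data)
{
	const struct stat_info *si = private_data;
	int w;

	assert(si != NULL);
	if (len == 0)
		return 0;
	w = snprintf(buf, len, "%s: %d valid blocks\n",
		     si->dev, si->valid_blocks);
	if (w < 0 || (size_t)w >= len) {
		buf[0] = '\0';
		return 0;
	}
	return w;
}
```

With two mounted filesystems, show_all() returns both entries from any file, which matches the behaviour observed in the test run described above, while show_one() returns only the entry whose pointer was attached to that file.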

2012-10-23 19:20:30

by Greg Kroah-Hartman

Subject: [PATCH 3/3 v2] f2fs: move proc files to debugfs

From: Greg Kroah-Hartman <[email protected]>

This moves all of the f2fs debugging files into debugfs. The files are
located in /sys/kernel/debug/f2fs/

Note, I think we are generating all of the same information in each of
the files for every unique f2fs filesystem in the machine. This copies
the functionality that was present in the proc files, but this should be
fixed up in the future.

Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
v2: run-time tested, and fixed subdirectory issue. Now files are
created where they were supposed to be.

fs/f2fs/Kconfig | 8 -
fs/f2fs/debug.c | 294 ++++++++++++++++++++++++++++++--------------------------
fs/f2fs/f2fs.h | 2
3 files changed, 165 insertions(+), 139 deletions(-)

--- a/fs/f2fs/Kconfig
+++ b/fs/f2fs/Kconfig
@@ -16,18 +16,18 @@ config F2FS_FS

config F2FS_STAT_FS
bool "F2FS Status Information"
- depends on F2FS_FS
+ depends on F2FS_FS && DEBUG_FS
default y
help
- /proc/fs/f2fs/ contains information about partitions mounted as f2fs.
+ /sys/kernel/debug/f2fs/ contains information about partitions mounted as f2fs.
For each partition, a corresponding directory, named as its device
- name, is provided with the following proc entries.
+ name, is provided with the following files:

f2fs_stat major file system information managed by f2fs currently
f2fs_sit_stat average SIT information about whole segments
f2fs_mem_stat current memory footprint consumed by f2fs

- e.g., in /proc/fs/f2fs/sdb1/
+ e.g., in /sys/kernel/debug/f2fs/sdb1/

config F2FS_FS_XATTR
bool "F2FS extended attributes"
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -16,6 +16,8 @@
#include <linux/proc_fs.h>
#include <linux/f2fs_fs.h>
#include <linux/blkdev.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>

#include "f2fs.h"
#include "node.h"
@@ -23,7 +25,7 @@
#include "gc.h"

static LIST_HEAD(f2fs_stat_list);
-static struct proc_dir_entry *f2fs_proc_root;
+static struct dentry *debugfs_root;


void f2fs_update_stat(struct f2fs_sb_info *sbi)
@@ -114,16 +116,14 @@ static void f2fs_update_gc_metric(struct
si->avg_vblocks = 0;
}

-static int f2fs_read_gc(char *page, char **start, off_t off,
- int count, int *eof, void *data)
+static int stat_show(struct seq_file *s, void *v)
{
struct f2fs_gc_info *gc_i, *next;
struct f2fs_stat_info *si;
- char *buf = page;
int i = 0;
+ int j;

list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
- int j;
si = gc_i->stat_info;

mutex_lock(&si->stat_list);
@@ -133,102 +133,111 @@ static int f2fs_read_gc(char *page, char
}
f2fs_update_stat(si->sbi);

- buf += sprintf(buf, "=====[ partition info. #%d ]=====\n", i++);
- buf += sprintf(buf, "[SB: 1] [CP: 2] [NAT: %d] [SIT: %d] ",
- si->nat_area_segs, si->sit_area_segs);
- buf += sprintf(buf, "[SSA: %d] [MAIN: %d",
- si->ssa_area_segs, si->main_area_segs);
- buf += sprintf(buf, "(OverProv:%d Resv:%d)]\n\n",
- si->overp_segs, si->rsvd_segs);
- buf += sprintf(buf, "Utilization: %d%% (%d valid blocks)\n",
- si->utilization, si->valid_count);
- buf += sprintf(buf, " - Node: %u (Inode: %u, ",
- si->valid_node_count, si->valid_inode_count);
- buf += sprintf(buf, "Other: %u)\n - Data: %u\n",
- si->valid_node_count - si->valid_inode_count,
- si->valid_count - si->valid_node_count);
- buf += sprintf(buf, "\nMain area: %d segs, %d secs %d zones\n",
- si->main_area_segs, si->main_area_sections,
- si->main_area_zones);
- buf += sprintf(buf, " - COLD data: %d, %d, %d\n",
- si->curseg[CURSEG_COLD_DATA],
- si->cursec[CURSEG_COLD_DATA],
- si->curzone[CURSEG_COLD_DATA]);
- buf += sprintf(buf, " - WARM data: %d, %d, %d\n",
- si->curseg[CURSEG_WARM_DATA],
- si->cursec[CURSEG_WARM_DATA],
- si->curzone[CURSEG_WARM_DATA]);
- buf += sprintf(buf, " - HOT data: %d, %d, %d\n",
- si->curseg[CURSEG_HOT_DATA],
- si->cursec[CURSEG_HOT_DATA],
- si->curzone[CURSEG_HOT_DATA]);
- buf += sprintf(buf, " - Dir dnode: %d, %d, %d\n",
- si->curseg[CURSEG_HOT_NODE],
- si->cursec[CURSEG_HOT_NODE],
- si->curzone[CURSEG_HOT_NODE]);
- buf += sprintf(buf, " - File dnode: %d, %d, %d\n",
- si->curseg[CURSEG_WARM_NODE],
- si->cursec[CURSEG_WARM_NODE],
- si->curzone[CURSEG_WARM_NODE]);
- buf += sprintf(buf, " - Indir nodes: %d, %d, %d\n",
- si->curseg[CURSEG_COLD_NODE],
- si->cursec[CURSEG_COLD_NODE],
- si->curzone[CURSEG_COLD_NODE]);
- buf += sprintf(buf, "\n - Valid: %d\n - Dirty: %d\n",
- si->main_area_segs - si->dirty_count -
- si->prefree_count - si->free_segs,
- si->dirty_count);
- buf += sprintf(buf, " - Prefree: %d\n - Free: %d (%d)\n\n",
- si->prefree_count,
- si->free_segs,
- si->free_secs);
- buf += sprintf(buf, "GC calls: %d (BG: %d)\n",
- si->call_count, si->bg_gc);
- buf += sprintf(buf, " - data segments : %d\n", si->data_segs);
- buf += sprintf(buf, " - node segments : %d\n", si->node_segs);
- buf += sprintf(buf, "Try to move %d blocks\n", si->tot_blks);
- buf += sprintf(buf, " - data blocks : %d\n", si->data_blks);
- buf += sprintf(buf, " - node blocks : %d\n", si->node_blks);
- buf += sprintf(buf, "\nExtent Hit Ratio: %d / %d\n",
- si->hit_ext, si->total_ext);
- buf += sprintf(buf, "\nBalancing F2FS Async:\n");
- buf += sprintf(buf, " - nodes %4d in %4d\n",
- si->ndirty_node, si->node_pages);
- buf += sprintf(buf, " - dents %4d in dirs:%4d\n",
- si->ndirty_dent, si->ndirty_dirs);
- buf += sprintf(buf, " - meta %4d in %4d\n",
- si->ndirty_meta, si->meta_pages);
- buf += sprintf(buf, " - NATs %5d > %lu\n",
- si->nats, NM_WOUT_THRESHOLD);
- buf += sprintf(buf, " - SITs: %5d\n - free_nids: %5d\n",
- si->sits, si->fnids);
- buf += sprintf(buf, "\nDistribution of User Blocks:");
- buf += sprintf(buf, " [ valid | invalid | free ]\n");
- buf += sprintf(buf, " [");
+ seq_printf(s, "=====[ partition info. #%d ]=====\n", i++);
+ seq_printf(s, "=====[ partition info. #%d ]=====\n", i++);
+ seq_printf(s, "[SB: 1] [CP: 2] [NAT: %d] [SIT: %d] ",
+ si->nat_area_segs, si->sit_area_segs);
+ seq_printf(s, "[SSA: %d] [MAIN: %d",
+ si->ssa_area_segs, si->main_area_segs);
+ seq_printf(s, "(OverProv:%d Resv:%d)]\n\n",
+ si->overp_segs, si->rsvd_segs);
+ seq_printf(s, "Utilization: %d%% (%d valid blocks)\n",
+ si->utilization, si->valid_count);
+ seq_printf(s, " - Node: %u (Inode: %u, ",
+ si->valid_node_count, si->valid_inode_count);
+ seq_printf(s, "Other: %u)\n - Data: %u\n",
+ si->valid_node_count - si->valid_inode_count,
+ si->valid_count - si->valid_node_count);
+ seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
+ si->main_area_segs, si->main_area_sections,
+ si->main_area_zones);
+ seq_printf(s, " - COLD data: %d, %d, %d\n",
+ si->curseg[CURSEG_COLD_DATA],
+ si->cursec[CURSEG_COLD_DATA],
+ si->curzone[CURSEG_COLD_DATA]);
+ seq_printf(s, " - WARM data: %d, %d, %d\n",
+ si->curseg[CURSEG_WARM_DATA],
+ si->cursec[CURSEG_WARM_DATA],
+ si->curzone[CURSEG_WARM_DATA]);
+ seq_printf(s, " - HOT data: %d, %d, %d\n",
+ si->curseg[CURSEG_HOT_DATA],
+ si->cursec[CURSEG_HOT_DATA],
+ si->curzone[CURSEG_HOT_DATA]);
+ seq_printf(s, " - Dir dnode: %d, %d, %d\n",
+ si->curseg[CURSEG_HOT_NODE],
+ si->cursec[CURSEG_HOT_NODE],
+ si->curzone[CURSEG_HOT_NODE]);
+ seq_printf(s, " - File dnode: %d, %d, %d\n",
+ si->curseg[CURSEG_WARM_NODE],
+ si->cursec[CURSEG_WARM_NODE],
+ si->curzone[CURSEG_WARM_NODE]);
+ seq_printf(s, " - Indir nodes: %d, %d, %d\n",
+ si->curseg[CURSEG_COLD_NODE],
+ si->cursec[CURSEG_COLD_NODE],
+ si->curzone[CURSEG_COLD_NODE]);
+ seq_printf(s, "\n - Valid: %d\n - Dirty: %d\n",
+ si->main_area_segs - si->dirty_count -
+ si->prefree_count - si->free_segs,
+ si->dirty_count);
+ seq_printf(s, " - Prefree: %d\n - Free: %d (%d)\n\n",
+ si->prefree_count, si->free_segs, si->free_secs);
+ seq_printf(s, "GC calls: %d (BG: %d)\n",
+ si->call_count, si->bg_gc);
+ seq_printf(s, " - data segments : %d\n", si->data_segs);
+ seq_printf(s, " - node segments : %d\n", si->node_segs);
+ seq_printf(s, "Try to move %d blocks\n", si->tot_blks);
+ seq_printf(s, " - data blocks : %d\n", si->data_blks);
+ seq_printf(s, " - node blocks : %d\n", si->node_blks);
+ seq_printf(s, "\nExtent Hit Ratio: %d / %d\n",
+ si->hit_ext, si->total_ext);
+ seq_printf(s, "\nBalancing F2FS Async:\n");
+ seq_printf(s, " - nodes %4d in %4d\n",
+ si->ndirty_node, si->node_pages);
+ seq_printf(s, " - dents %4d in dirs:%4d\n",
+ si->ndirty_dent, si->ndirty_dirs);
+ seq_printf(s, " - meta %4d in %4d\n",
+ si->ndirty_meta, si->meta_pages);
+ seq_printf(s, " - NATs %5d > %lu\n",
+ si->nats, NM_WOUT_THRESHOLD);
+ seq_printf(s, " - SITs: %5d\n - free_nids: %5d\n",
+ si->sits, si->fnids);
+ seq_printf(s, "\nDistribution of User Blocks:");
+ seq_printf(s, " [ valid | invalid | free ]\n");
+ seq_printf(s, " [");
for (j = 0; j < si->util_valid; j++)
- buf += sprintf(buf, "-");
- buf += sprintf(buf, "|");
+ seq_printf(s, "-");
+ seq_printf(s, "|");
for (j = 0; j < si->util_invalid; j++)
- buf += sprintf(buf, "-");
- buf += sprintf(buf, "|");
+ seq_printf(s, "-");
+ seq_printf(s, "|");
for (j = 0; j < si->util_free; j++)
- buf += sprintf(buf, "-");
- buf += sprintf(buf, "]\n\n");
- buf += sprintf(buf, "SSR: %u blocks in %u segments\n",
- si->block_count[SSR], si->segment_count[SSR]);
- buf += sprintf(buf, "LFS: %u blocks in %u segments\n",
- si->block_count[LFS], si->segment_count[LFS]);
+ seq_printf(s, "-");
+ seq_printf(s, "]\n\n");
+ seq_printf(s, "SSR: %u blocks in %u segments\n",
+ si->block_count[SSR], si->segment_count[SSR]);
+ seq_printf(s, "LFS: %u blocks in %u segments\n",
+ si->block_count[LFS], si->segment_count[LFS]);
mutex_unlock(&si->stat_list);
}
- return buf - page;
+ return 0;
+}
+
+static int stat_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, stat_show, inode->i_private);
}

-static int f2fs_read_sit(char *page, char **start, off_t off,
- int count, int *eof, void *data)
+static const struct file_operations stat_fops = {
+ .open = stat_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int sit_show(struct seq_file *s, void *v)
{
struct f2fs_gc_info *gc_i, *next;
struct f2fs_stat_info *si;
- char *buf = page;

list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
si = gc_i->stat_info;
@@ -240,19 +249,29 @@ static int f2fs_read_sit(char *page, cha
}
f2fs_update_gc_metric(si->sbi);

- buf += sprintf(buf, "BDF: %u, avg. vblocks: %u\n",
- si->bimodal, si->avg_vblocks);
+ seq_printf(s, "BDF: %u, avg. vblocks: %u\n",
+ si->bimodal, si->avg_vblocks);
mutex_unlock(&si->stat_list);
}
- return buf - page;
+ return 0;
}

-static int f2fs_read_mem(char *page, char **start, off_t off,
- int count, int *eof, void *data)
+static int sit_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, sit_show, inode->i_private);
+}
+
+static const struct file_operations sit_fops = {
+ .open = sit_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int mem_show(struct seq_file *s, void *v)
{
struct f2fs_gc_info *gc_i, *next;
struct f2fs_stat_info *si;
- char *buf = page;

list_for_each_entry_safe(gc_i, next, &f2fs_stat_list, stat_list) {
struct f2fs_sb_info *sbi = gc_i->stat_info->sbi;
@@ -314,15 +333,27 @@ static int f2fs_read_mem(char *page, cha
cache_mem += sbi->n_orphans * sizeof(struct orphan_inode_entry);
cache_mem += sbi->n_dirty_dirs * sizeof(struct dir_inode_entry);

- buf += sprintf(buf, "%u KB = static: %u + cached: %u\n",
- (base_mem + cache_mem) >> 10,
- base_mem >> 10,
- cache_mem >> 10);
+ seq_printf(s, "%u KB = static: %u + cached: %u\n",
+ (base_mem + cache_mem) >> 10,
+ base_mem >> 10, cache_mem >> 10);
mutex_unlock(&si->stat_list);
}
- return buf - page;
+ return 0;
+}
+
+static int mem_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, mem_show, inode->i_private);
}

+static const struct file_operations mem_fops = {
+ .open = mem_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+
static int init_stats(struct f2fs_sb_info *sbi)
{
struct f2fs_stat_info *si;
@@ -362,53 +393,48 @@ void f2fs_destroy_gci_stats(struct f2fs_

int f2fs_stat_init(struct super_block *sb, struct f2fs_sb_info *sbi)
{
- struct proc_dir_entry *entry;
int retval;

- if (!f2fs_proc_root)
- f2fs_proc_root = proc_mkdir("fs/f2fs", NULL);
-
- sbi->s_proc = proc_mkdir(sb->s_id, f2fs_proc_root);
-
retval = init_stats(sbi);
if (retval)
return retval;

- entry = create_proc_entry("f2fs_stat", 0, sbi->s_proc);
- if (!entry)
- return -ENOMEM;
- entry->read_proc = f2fs_read_gc;
- entry->write_proc = NULL;
+ if (!debugfs_root)
+ debugfs_root = debugfs_create_dir("f2fs", NULL);
+
+ sbi->s_debug = debugfs_create_dir(sb->s_id, debugfs_root);
+ if (!sbi->s_debug)
+ return -EINVAL;
+
+ if (!debugfs_create_file("f2fs_stat", S_IRUGO, sbi->s_debug,
+ NULL, &stat_fops))
+ goto failed;
+
+ if (!debugfs_create_file("f2fs_sit_stat", S_IRUGO, sbi->s_debug,
+ NULL, &sit_fops))
+ goto failed;
+
+ if (!debugfs_create_file("f2fs_mem_stat", S_IRUGO, sbi->s_debug,
+ NULL, &mem_fops))
+ goto failed;

- entry = create_proc_entry("f2fs_sit_stat", 0, sbi->s_proc);
- if (!entry) {
- remove_proc_entry("f2fs_stat", sbi->s_proc);
- return -ENOMEM;
- }
- entry->read_proc = f2fs_read_sit;
- entry->write_proc = NULL;
- entry = create_proc_entry("f2fs_mem_stat", 0, sbi->s_proc);
- if (!entry) {
- remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
- remove_proc_entry("f2fs_stat", sbi->s_proc);
- return -ENOMEM;
- }
- entry->read_proc = f2fs_read_mem;
- entry->write_proc = NULL;
return 0;
+failed:
+ debugfs_remove_recursive(sbi->s_debug);
+ sbi->s_debug = NULL;
+ return -EINVAL;
}

void f2fs_stat_exit(struct super_block *sb, struct f2fs_sb_info *sbi)
{
- if (sbi->s_proc) {
- remove_proc_entry("f2fs_stat", sbi->s_proc);
- remove_proc_entry("f2fs_sit_stat", sbi->s_proc);
- remove_proc_entry("f2fs_mem_stat", sbi->s_proc);
- remove_proc_entry(sb->s_id, f2fs_proc_root);
+ if (sbi->s_debug) {
+ debugfs_remove_recursive(sbi->s_debug);
+ sbi->s_debug = NULL;
}
}

void f2fs_remove_stats(void)
{
- remove_proc_entry("fs/f2fs", NULL);
+ debugfs_remove_recursive(debugfs_root);
+ debugfs_root = NULL;
}
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -379,7 +379,7 @@ struct f2fs_sb_info {
int rr_flush;

/* related to GC */
- struct proc_dir_entry *s_proc;
+ struct dentry *s_debug;
struct f2fs_gc_info *gc_info; /* Garbage Collector
information */
struct mutex gc_mutex; /* mutex for GC */
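
For readers comparing the two styles in this diff: the removed read_proc handlers advanced a raw pointer with `buf += sprintf(buf, ...)` and never checked how much of the output page was left, whereas seq_printf() tracks the remaining space itself. What follows is only a userspace sketch of that bounded-append idea, applied to the valid/invalid/free bar. `struct out_buf`, `out_printf()` and `render_bar()` are hypothetical names, not kernel APIs, and the real seq_file code grows its buffer and retries rather than truncating.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the bookkeeping a seq_file does. */
struct out_buf {
	char *data;
	size_t size;	/* capacity of data, including the trailing NUL */
	size_t len;	/* bytes written so far, excluding the NUL */
	int overflow;	/* set once any output did not fit */
};

/* Append formatted text without ever writing past the end of the buffer. */
static void out_printf(struct out_buf *o, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (o->overflow)
		return;
	va_start(ap, fmt);
	n = vsnprintf(o->data + o->len, o->size - o->len, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= o->size - o->len) {
		o->overflow = 1;
		o->data[o->len] = '\0';	/* drop the partial output */
		return;
	}
	o->len += (size_t)n;
}

/* Same loop structure as the "Distribution of User Blocks" bar above. */
static void render_bar(struct out_buf *o, int valid, int invalid, int free_cnt)
{
	int j;

	out_printf(o, "[");
	for (j = 0; j < valid; j++)
		out_printf(o, "-");
	out_printf(o, "|");
	for (j = 0; j < invalid; j++)
		out_printf(o, "-");
	out_printf(o, "|");
	for (j = 0; j < free_cnt; j++)
		out_printf(o, "-");
	out_printf(o, "]\n");
}
```

With a 64-byte buffer, render_bar(&o, 3, 2, 1) fills it with "[---|--|-]" and a newline; with a 4-byte buffer it sets the overflow flag instead of writing past the end.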

2012-10-23 23:14:49

by Jaegeuk Kim

Subject: RE: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system

> On Tue, Oct 23, 2012 at 11:21:53AM +0900, Jaegeuk Kim wrote:
> > mkfs.f2fs
> > =========
> >
> > The file system formatting tool, "mkfs.f2fs", is available from the following
> > download page: http://sourceforge.net/projects/f2fs-tools/
>
> Is there a git tree of this tool somewhere, so I don't have to
> constantly suffer the sf.net download interface every time I want to get
> the latest version?

I'd love to do that.
So far I've kept a git tree for the tools in house only, because of company confidentiality rules.
Would you suggest somewhere to host it?
I can maintain a tree outside the company, though.

>
> thanks,
>
> greg k-h

2012-10-23 23:18:41

by Jaegeuk Kim

Subject: RE: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system

> On Tue, Oct 23, 2012 at 11:26:59AM -0700, Greg KH wrote:
> > On Tue, Oct 23, 2012 at 11:21:53AM +0900, Jaegeuk Kim wrote:
> > > mkfs.f2fs
> > > =========
> > >
> > > The file system formatting tool, "mkfs.f2fs", is available from the following
> > > download page: http://sourceforge.net/projects/f2fs-tools/
> >
> > Is there a git tree of this tool somewhere, so I don't have to
> > constantly suffer the sf.net download interface every time I want to get
> > the latest version?
>
> Oh, and where do we report bugs for this tool? I just formatted a usb
> stick with the mkfs.f2fs program, and it did not fully erase the old
> filesystem that was on there (iso9660), so when I mounted it, it did so
> in iso9660 mode, not f2fs mode.
>

Any suggestion for reporting bugs?
Maybe via a mailing list?
Which version did you use? (1.1.0 is the correct one.)
We traced the cause to the 0th block and fixed it in v1.1.0.
Thanks,

> thanks,
>
> greg k-h


---
Jaegeuk Kim
Samsung

2012-10-24 03:00:34

by Greg Kroah-Hartman

Subject: Re: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system

On Wed, Oct 24, 2012 at 08:14:44AM +0900, Jaegeuk Kim wrote:
> > On Tue, Oct 23, 2012 at 11:21:53AM +0900, Jaegeuk Kim wrote:
> > > mkfs.f2fs
> > > =========
> > >
> > > The file system formatting tool, "mkfs.f2fs", is available from the following
> > > download page: http://sourceforge.net/projects/f2fs-tools/
> >
> > Is there a git tree of this tool somewhere, so I don't have to
> > constantly suffer the sf.net download interface every time I want to get
> > the latest version?
>
> I'd love to do like that.
> I've managed a git tree for tools in house only, due to the company secret.
> Would you suggest something for this?
> I can do managing the tree outside though.

git.kernel.org can work, so can github, and also, you have a sf.net
project, why not use the git tree it provides you? Right now it is
empty.

thanks,

greg k-h

2012-10-24 03:02:11

by Greg Kroah-Hartman

Subject: Re: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system

On Wed, Oct 24, 2012 at 08:18:36AM +0900, Jaegeuk Kim wrote:
> > On Tue, Oct 23, 2012 at 11:26:59AM -0700, Greg KH wrote:
> > > On Tue, Oct 23, 2012 at 11:21:53AM +0900, Jaegeuk Kim wrote:
> > > > mkfs.f2fs
> > > > =========
> > > >
> > > > The file system formatting tool, "mkfs.f2fs", is available from the following
> > > > download page: http://sourceforge.net/projects/f2fs-tools/
> > >
> > > Is there a git tree of this tool somewhere, so I don't have to
> > > constantly suffer the sf.net download interface every time I want to get
> > > the latest version?
> >
> > Oh, and where do we report bugs for this tool? I just formatted a usb
> > stick with the mkfs.f2fs program, and it did not fully erase the old
> > filesystem that was on there (iso9660), so when I mounted it, it did so
> > in iso9660 mode, not f2fs mode.
> >
>
> Any suggestion for reporting bugs?
> Maybe via a mailing list?

Mailing list is fine.

> What version did you use? (1.1.0 is correct.)

I used 1.1.0

> The reason we found was due to the 0'th block, so we fixed that in v1.1.0.

Hm, that's what I used. I zeroed out the whole usb disk and tried again
and it worked then, I was trying to debug the kernel changes, not the
userspace tool, so I didn't spend much time on it :)
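
One possible explanation, offered only as an assumption: ISO 9660 places its volume descriptors starting at 2048-byte sector 16 (byte offset 32768), with the identifier "CD001" in bytes 1 to 5 of each descriptor, so clearing block 0 alone would leave that signature in place. A small sketch of such a check follows; `has_iso9660_signature()` is a hypothetical helper and not part of f2fs-tools.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ISO9660_SECTOR_SIZE	2048
#define ISO9660_FIRST_VD_SECTOR	16	/* sectors 0..15 are the system area */
#define ISO9660_ID_OFFSET	1	/* byte 0 of a descriptor is its type */

/* Return 1 if the start of a device image still holds an ISO 9660 descriptor. */
static int has_iso9660_signature(const unsigned char *img, size_t len)
{
	size_t off = (size_t)ISO9660_FIRST_VD_SECTOR * ISO9660_SECTOR_SIZE
			+ ISO9660_ID_OFFSET;

	if (len < off + 5)
		return 0;
	return memcmp(img + off, "CD001", 5) == 0;
}
```

Run against the first 40 KB or so of the device, this would still report a stale signature after only the first 4 KB block has been zeroed.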

But, if you do get a public git tree up, I will at the very least,
provide a patch to handle '-h' properly for mkfs, that should work...

thanks,

greg k-h

2012-10-24 05:34:41

by Jaegeuk Kim

Subject: RE: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system

> -----Original Message-----
> From: 'Greg KH' [mailto:[email protected]]
> Sent: Wednesday, October 24, 2012 12:01 PM
> To: Jaegeuk Kim
> Subject: Re: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system
> Importance: High
>
> On Wed, Oct 24, 2012 at 08:14:44AM +0900, Jaegeuk Kim wrote:
> > > On Tue, Oct 23, 2012 at 11:21:53AM +0900, Jaegeuk Kim wrote:
> > > > mkfs.f2fs
> > > > =========
> > > >
> > > > The file system formatting tool, "mkfs.f2fs", is available from the following
> > > > download page: http://sourceforge.net/projects/f2fs-tools/
> > >
> > > Is there a git tree of this tool somewhere, so I don't have to
> > > constantly suffer the sf.net download interface every time I want to get
> > > the latest version?
> >
> > I'd love to do like that.
> > I've managed a git tree for tools in house only, due to the company secret.
> > Would you suggest something for this?
> > I can do managing the tree outside though.
>
> git.kernel.org can work, so can github, and also, you have a sf.net
> project, why not use the git tree it provides you? Right now it is
> empty.
>

Ok, I'll make a tree in sf.net. :)

> thanks,
>
> greg k-h


---
Jaegeuk Kim
Samsung


2012-10-24 05:35:30

by Jaegeuk Kim

Subject: RE: [PATCH 00/16 v2] f2fs: introduce flash-friendly file system

> On Wed, Oct 24, 2012 at 08:18:36AM +0900, Jaegeuk Kim wrote:
> > > On Tue, Oct 23, 2012 at 11:26:59AM -0700, Greg KH wrote:
> > > > On Tue, Oct 23, 2012 at 11:21:53AM +0900, Jaegeuk Kim wrote:
> > > > > mkfs.f2fs
> > > > > =========
> > > > >
> > > > > The file system formatting tool, "mkfs.f2fs", is available from the following
> > > > > download page: http://sourceforge.net/projects/f2fs-tools/
> > > >
> > > > Is there a git tree of this tool somewhere, so I don't have to
> > > > constantly suffer the sf.net download interface every time I want to get
> > > > the latest version?
> > >
> > > Oh, and where do we report bugs for this tool? I just formatted a usb
> > > stick with the mkfs.f2fs program, and it did not fully erase the old
> > > filesystem that was on there (iso9660), so when I mounted it, it did so
> > > in iso9660 mode, not f2fs mode.
> > >
> >
> > Any suggestion for reporting bugs?
> > Maybe via a mailing list?
>
> Mailing list is fine.
>
> > What version did you use? (1.1.0 is correct.)
>
> I used 1.1.0
>
> > The reason we found was due to the 0'th block, so we fixed that in v1.1.0.
>
> Hm, that's what I used. I zeroed out the whole usb disk and tried again
> and it worked then, I was trying to debug the kernel changes, not the
> userspace tool, so I didn't spend much time on it :)
>
> But, if you do get a public git tree up, I will at the very least,
> provide a patch to handle '-h' properly for mkfs, that should work...
>

Ok, thank you very much.

> thanks,
>
> greg k-h


---
Jaegeuk Kim
Samsung

2012-10-24 11:26:01

by Viacheslav Dubeyko

Subject: Re: [PATCH 02/16 v2] f2fs: add on-disk layout

Hi,

On Tue, 2012-10-23 at 11:26 +0900, Jaegeuk Kim wrote:
> This adds a header file describing the on-disk layout of f2fs.
>
> Signed-off-by: Changman Lee <[email protected]>
> Signed-off-by: Chul Lee <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> include/linux/f2fs_fs.h | 362 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 362 insertions(+)
> create mode 100644 include/linux/f2fs_fs.h
>
> diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
> new file mode 100644
> index 0000000..bd9c217
> --- /dev/null
> +++ b/include/linux/f2fs_fs.h
> @@ -0,0 +1,362 @@
> +/**
> + * include/linux/f2fs_fs.h
> + *
> + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com/
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef _LINUX_F2FS_FS_H
> +#define _LINUX_F2FS_FS_H
> +
> +#include <linux/pagemap.h>
> +#include <linux/types.h>
> +
> +#define F2FS_SUPER_MAGIC 0xF2F52010
> +#define F2FS_SUPER_OFFSET 0 /* start sector # for sb */

Does the f2fs superblock really have no offset from the beginning of the volume?

> +#define F2FS_BLKSIZE 4096
> +#define F2FS_MAX_EXTENSION 64
> +
> +#define NULL_ADDR 0x0U
> +#define NEW_ADDR -1U

Are the NULL_ADDR and NEW_ADDR declarations really needed? Doesn't the
kernel already have analogous definitions?
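
For reference, a minimal sketch of what the two sentinels evaluate to, assuming a 32-bit unsigned int and 32-bit block addresses as in the `__le32` fields of this header. `blkaddr_is_mapped()` is a hypothetical helper, not something the patch defines.

```c
#include <assert.h>
#include <stdint.h>

#define NULL_ADDR	0x0U
#define NEW_ADDR	-1U	/* unary minus on unsigned 1 wraps to UINT_MAX */

/* Hypothetical helper: true only for an address that points at a real block. */
static int blkaddr_is_mapped(uint32_t addr)
{
	return addr != NULL_ADDR && addr != NEW_ADDR;
}
```

Note that NEW_ADDR is defined without parentheses, so it is safest to use it only in simple comparisons like the one above.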

> +
> +#define F2FS_ROOT_INO(sbi) (sbi->root_ino_num)
> +#define F2FS_NODE_INO(sbi) (sbi->node_ino_num)
> +#define F2FS_META_INO(sbi) (sbi->meta_ino_num)
> +
> +#define GFP_F2FS_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_ZERO)
> +
> +#define MAX_ACTIVE_LOGS 16
> +#define MAX_ACTIVE_NODE_LOGS 8
> +#define MAX_ACTIVE_DATA_LOGS 8

I think it makes sense to add a comment explaining the reasons for the
limits in MAX_ACTIVE_LOGS, MAX_ACTIVE_NODE_LOGS, and MAX_ACTIVE_DATA_LOGS.

> +
> +/*
> + * For superblock
> + */
> +struct f2fs_super_block {
> + __le32 magic; /* Magic Number */
> + __le16 major_ver; /* Major Version */
> + __le16 minor_ver; /* Minor Version */
> + __le32 log_sectorsize; /* log2 (Sector size in bytes) */
> + __le32 log_sectors_per_block; /* log2 (Number of sectors per block */
> + __le32 log_blocksize; /* log2 (Block size in bytes) */
> + __le32 log_blocks_per_seg; /* log2 (Number of blocks per segment) */

From my point of view, __le32 is a large data type for a log2 (<value>)
field. What do you think?
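
To put numbers on this: the log2 of any size that fits in 64 bits is at most 63, so a single byte would be enough. A small sketch, assuming 512-byte sectors and the 4096-byte F2FS_BLKSIZE above; `log2_u64()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: floor(log2(v)) for v > 0. */
static uint8_t log2_u64(uint64_t v)
{
	uint8_t n = 0;

	while (v >>= 1)
		n++;
	return n;
}
```

Under those assumptions log_sectorsize would hold 9, log_blocksize 12, and log_sectors_per_block 3.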

> + __le32 segs_per_sec; /* Number of segments per section */
> + __le32 secs_per_zone; /* Number of sections per zone */
> + __le32 checksum_offset; /* Checksum position in this super block */
> + __le64 block_count; /* Total number of blocks */
> + __le32 section_count; /* Total number of sections */
> + __le32 segment_count; /* Total number of segments */
> + __le32 segment_count_ckpt; /* Total number of segments
> + in Checkpoint area */
> + __le32 segment_count_sit; /* Total number of segments
> + in Segment information table */
> + __le32 segment_count_nat; /* Total number of segments
> + in Node address table */
> + /*Total number of segments in Segment summary area */
> + __le32 segment_count_ssa;
> + /* Total number of segments in Main area */
> + __le32 segment_count_main;
> + __le32 failure_safe_block_distance;
> + __le32 segment0_blkaddr; /* Start block address of Segment 0 */
> + __le32 start_segment_checkpoint; /* Start block address of ckpt */
> + __le32 sit_blkaddr; /* Start block address of SIT */
> + __le32 nat_blkaddr; /* Start block address of NAT */
> + __le32 ssa_blkaddr; /* Start block address of SSA */
> + __le32 main_blkaddr; /* Start block address of Main area */
> + __le32 root_ino; /* Root directory inode number */
> + __le32 node_ino; /* node inode number */
> + __le32 meta_ino; /* meta inode number */
> + __le32 volume_serial_number; /* VSN is optional field */

Usually a 128-bit UUID is used as the serial number. Why do you use
__le32 for volume_serial_number?

> + __le16 volume_name[512]; /* Volume Name */
> + __le32 extension_count;
> + __u8 extension_list[F2FS_MAX_EXTENSION][8]; /* extension array */
> +} __packed;
> +
> +/*
> + * For checkpoint
> + */
> +struct f2fs_checkpoint {
> + __le64 checkpoint_ver; /* Checkpoint block version number */
> + __le64 user_block_count; /* # of user blocks */
> + __le64 valid_block_count; /* # of valid blocks in Main area */
> + __le32 rsvd_segment_count; /* # of reserved segments for gc */
> + __le32 overprov_segment_count; /* # of overprovision segments */
> + __le32 free_segment_count; /* # of free segments in Main area */
> +
> + /* information of current node segments */
> + __le32 cur_node_segno[MAX_ACTIVE_NODE_LOGS];
> + __le16 cur_node_blkoff[MAX_ACTIVE_NODE_LOGS];
> + __le16 nat_upd_blkoff[MAX_ACTIVE_NODE_LOGS];
> + /* information of current data segments */
> + __le32 cur_data_segno[MAX_ACTIVE_DATA_LOGS];
> + __le16 cur_data_blkoff[MAX_ACTIVE_DATA_LOGS];
> + __le32 ckpt_flags; /* Flags : umount and journal_present */
> + __le32 cp_pack_total_block_count;
> + __le32 cp_pack_start_sum; /* start block number of data summary */
> + __le32 valid_node_count; /* Total number of valid nodes */
> + __le32 valid_inode_count; /* Total number of valid inodes */
> + __le32 next_free_nid; /* Next free node number */
> + __le32 sit_ver_bitmap_bytesize; /* Default value 64 */
> + __le32 nat_ver_bitmap_bytesize; /* Default value 256 */
> + __le32 checksum_offset; /* Checksum position
> + in this checkpoint block */
> + __le64 elapsed_time; /* elapsed time while partition
> + is mounted */
> + /* allocation type of current segment */
> + unsigned char alloc_type[MAX_ACTIVE_LOGS];
> +
> + /* SIT and NAT version bitmap */
> + unsigned char sit_nat_version_bitmap[1];
> +} __packed;
> +
> +/*
> + * For orphan inode management
> + */
> +#define F2FS_ORPHANS_PER_BLOCK 1020
> +
> +struct f2fs_orphan_block {
> + __le32 ino[F2FS_ORPHANS_PER_BLOCK]; /* inode numbers */
> + __le32 reserved;
> + __le16 blk_addr; /* block index in current CP */
> + __le16 blk_count; /* Number of orphan inode blocks in CP */
> + __le32 entry_count; /* Total number of orphan nodes in current CP */
> + __le32 check_sum; /* CRC32 for orphan inode block */
> +} __packed;
> +
> +/*
> + * For NODE structure
> + */
> +struct f2fs_extent {
> + __le32 fofs;
> + __le32 blk_addr;
> + __le32 len;
> +} __packed;
> +
> +#define F2FS_MAX_NAME_LEN 256
> +#define ADDRS_PER_INODE 927 /* Address Pointers in an Inode */
> +#define ADDRS_PER_BLOCK 1018 /* Address Pointers in a Direct Block */
> +#define NIDS_PER_BLOCK 1018 /* Node IDs in an Indirect Block */
> +
> +struct f2fs_inode {
> + __le16 i_mode; /* File mode */
> + __u8 i_advise; /* File hints */
> + __u8 i_reserved; /* Reserved */
> + __le32 i_uid; /* User ID */
> + __le32 i_gid; /* Group ID */
> + __le32 i_links; /* Links count */
> + __le64 i_size; /* File size in bytes */
> + __le64 i_blocks; /* File size in blocks */
> + __le64 i_ctime; /* Inode change time */
> + __le64 i_mtime; /* Modification time */
> + __le32 i_ctime_nsec;
> + __le32 i_mtime_nsec;
> + __le32 current_depth;
> + __le32 i_xattr_nid; /* nid to save xattr */
> + __le32 i_flags; /* file attributes */
> + __le32 i_pino; /* parent inode number */
> + __le32 i_namelen; /* file name length */
> + __u8 i_name[F2FS_MAX_NAME_LEN]; /* file name for SPOR */
> +
> + struct f2fs_extent i_ext; /* caching a largest extent */
> +
> + __le32 i_addr[ADDRS_PER_INODE]; /* Pointers to data blocks */
> +
> + __le32 i_nid[5]; /* direct(2), indirect(2),
> + double_indirect(1) node id */
> +} __packed;
> +
> +struct direct_node {
> + __le32 addr[ADDRS_PER_BLOCK]; /* array of data block address */
> +} __packed;
> +
> +struct indirect_node {
> + __le32 nid[NIDS_PER_BLOCK]; /* array of data block address */
> +} __packed;
> +
> +enum {
> + COLD_BIT_SHIFT = 0,
> + FSYNC_BIT_SHIFT,
> + DENT_BIT_SHIFT,
> + OFFSET_BIT_SHIFT
> +};
> +
> +struct node_footer {
> + __le32 nid; /* node id */
> + __le32 ino; /* inode nunmber */
> + __le32 flag; /* include cold/fsync/dentry marks and offset */
> + __le64 cp_ver; /* checkpoint version */
> + __le32 next_blkaddr; /* next node page block address */
> +} __packed;
> +
> +struct f2fs_node {
> + union {
> + struct f2fs_inode i;
> + struct direct_node dn;
> + struct indirect_node in;
> + };
> + struct node_footer footer;
> +} __packed;
> +
> +/*
> + * For NAT entries
> + */
> +#define NAT_ENTRY_PER_BLOCK (PAGE_CACHE_SIZE / sizeof(struct f2fs_nat_entry))
> +
> +struct f2fs_nat_entry {
> + __u8 version;
> + __le32 ino;
> + __le32 block_addr;
> +} __packed;
> +
> +struct f2fs_nat_block {
> + struct f2fs_nat_entry entries[NAT_ENTRY_PER_BLOCK];
> +} __packed;
> +
> +/*
> + * For SIT entries
> + */
> +#define SIT_VBLOCK_MAP_SIZE 64
> +#define SIT_ENTRY_PER_BLOCK (PAGE_CACHE_SIZE / sizeof(struct f2fs_sit_entry))
> +
> +struct f2fs_sit_entry {
> + __le16 vblocks;
> + __u8 valid_map[SIT_VBLOCK_MAP_SIZE];
> + __le64 mtime;
> +} __packed;
> +
> +struct f2fs_sit_block {
> + struct f2fs_sit_entry entries[SIT_ENTRY_PER_BLOCK];
> +} __packed;
> +
> +/**
> + * For segment summary
> + *
> + * NOTE : For initializing fields, you must use set_summary
> + *
> + * - If data page, nid represents dnode's nid
> + * - If node page, nid represents the node page's nid.
> + *
> + * The ofs_in_node is used by only data page. It represents offset
> + * from node's page's beginning to get a data block address.
> + * ex) data_blkaddr = (block_t)(nodepage_start_address + ofs_in_node)
> + */
> +struct f2fs_summary {
> + __le32 nid;
> + union {
> + __u8 reserved[3];
> + struct {
> + __u8 version;
> + __le16 ofs_in_node;
> + } __packed;
> + };
> +} __packed;
> +
> +struct summary_footer {
> + unsigned char entry_type;
> + __u32 check_sum;
> +} __packed;
> +
> +#define SUMMARY_SIZE (sizeof(struct f2fs_summary))
> +#define SUM_FOOTER_SIZE (sizeof(struct summary_footer))
> +#define ENTRIES_IN_SUM 512
> +#define SUM_ENTRY_SIZE (SUMMARY_SIZE * ENTRIES_IN_SUM)
> +#define SUM_JOURNAL_SIZE (PAGE_CACHE_SIZE - SUM_FOOTER_SIZE -\
> + SUM_ENTRY_SIZE)
> +struct nat_journal_entry {
> + __le32 nid;
> + struct f2fs_nat_entry ne;
> +} __packed;
> +
> +struct sit_journal_entry {
> + __le32 segno;
> + struct f2fs_sit_entry se;
> +} __packed;
> +
> +#define NAT_JOURNAL_ENTRIES ((SUM_JOURNAL_SIZE - 2) /\
> + sizeof(struct nat_journal_entry))
> +#define NAT_JOURNAL_RESERVED ((SUM_JOURNAL_SIZE - 2) %\
> + sizeof(struct nat_journal_entry))
> +#define SIT_JOURNAL_ENTRIES ((SUM_JOURNAL_SIZE - 2) /\
> + sizeof(struct sit_journal_entry))
> +#define SIT_JOURNAL_RESERVED ((SUM_JOURNAL_SIZE - 2) %\
> + sizeof(struct sit_journal_entry))
> +enum {
> + NAT_JOURNAL = 0,
> + SIT_JOURNAL
> +};
> +
> +struct nat_journal {
> + struct nat_journal_entry entries[NAT_JOURNAL_ENTRIES];
> + __u8 reserved[NAT_JOURNAL_RESERVED];
> +} __packed;
> +
> +struct sit_journal {
> + struct sit_journal_entry entries[SIT_JOURNAL_ENTRIES];
> + __u8 reserved[SIT_JOURNAL_RESERVED];
> +} __packed;
> +
> +struct f2fs_summary_block {
> + struct f2fs_summary entries[ENTRIES_IN_SUM];
> + union {
> + __le16 n_nats;
> + __le16 n_sits;
> + };
> + union {
> + struct nat_journal nat_j;
> + struct sit_journal sit_j;
> + };
> + struct summary_footer footer;
> +} __packed;
> +
> +/*
> + * For directory operations
> + */
> +#define F2FS_DOT_HASH 0
> +#define F2FS_DDOT_HASH F2FS_DOT_HASH
> +#define F2FS_MAX_HASH (~((0x3ULL) << 62))
> +#define F2FS_HASH_COL_BIT ((0x1ULL) << 63)
> +
> +typedef __le32 f2fs_hash_t;
> +
> +#define F2FS_NAME_LEN 8

F2FS_MAX_NAME_LEN already exists. I think it makes sense to add a comment
here explaining the purpose of the F2FS_NAME_LEN declaration.
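
One possible reading of the layout, which the patch itself does not confirm: F2FS_NAME_LEN is the size of one filename[] slot, and a longer name occupies several consecutive slots. A sketch under that assumption, treating name_len as a byte count; `name_slots()` is a hypothetical helper.

```c
#include <assert.h>

#define F2FS_NAME_LEN		8	/* bytes per filename[] slot, from the patch */
#define F2FS_MAX_NAME_LEN	256

/* Hypothetical helper: slots needed for a name of name_len bytes (round up). */
static unsigned int name_slots(unsigned int name_len)
{
	return (name_len + F2FS_NAME_LEN - 1) / F2FS_NAME_LEN;
}
```

If that reading is right, a name of F2FS_MAX_NAME_LEN bytes would take 32 of the 214 slots in a dentry block.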

> +#define NR_DENTRY_IN_BLOCK 214 /* the number of dentry in a block */
> +#define MAX_DIR_HASH_DEPTH 63 /* MAX level for dir lookup */
> +
> +#define SIZE_OF_DIR_ENTRY 11 /* by byte */
> +#define SIZE_OF_DENTRY_BITMAP ((NR_DENTRY_IN_BLOCK + BITS_PER_BYTE - 1) / \
> + BITS_PER_BYTE)
> +#define SIZE_OF_RESERVED (PAGE_SIZE - ((SIZE_OF_DIR_ENTRY + \
> + F2FS_NAME_LEN) * \
> + NR_DENTRY_IN_BLOCK + SIZE_OF_DENTRY_BITMAP))
> +
> +struct f2fs_dir_entry {
> + __le32 hash_code; /* hash code of file name */
> + __le32 ino; /* node number of inode */
> + __le16 name_len; /* the size of file name
> + length in unicode characters */
> + __u8 file_type;
> +} __packed;
> +
> +struct f2fs_dentry_block {
> + __u8 dentry_bitmap[SIZE_OF_DENTRY_BITMAP];
> + __u8 reserved[SIZE_OF_RESERVED];
> + struct f2fs_dir_entry dentry[NR_DENTRY_IN_BLOCK];
> + __u8 filename[NR_DENTRY_IN_BLOCK][F2FS_NAME_LEN];
> +} __packed;
> +
> +enum {
> + F2FS_FT_UNKNOWN,
> + F2FS_FT_REG_FILE,
> + F2FS_FT_DIR,
> + F2FS_FT_CHRDEV,
> + F2FS_FT_BLKDEV,
> + F2FS_FT_FIFO,
> + F2FS_FT_SOCK,
> + F2FS_FT_SYMLINK,
> + F2FS_FT_MAX
> +};
> +
> +#endif /* _LINUX_F2FS_FS_H */
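
As a cross-check of the constants in this header, the arithmetic below confirms that a dentry block and a node block each add up to exactly one 4096-byte block. This is a userspace sketch that assumes PAGE_SIZE equals F2FS_BLKSIZE (4096) and restates the field sizes by hand, so it is only as accurate as that transcription.

```c
#include <assert.h>

#define BLKSIZE			4096	/* F2FS_BLKSIZE; PAGE_SIZE assumed equal */

/* Directory block constants, copied from the patch. */
#define F2FS_NAME_LEN		8
#define NR_DENTRY_IN_BLOCK	214
#define SIZE_OF_DIR_ENTRY	11	/* hash 4 + ino 4 + name_len 2 + type 1 */
#define SIZE_OF_DENTRY_BITMAP	((NR_DENTRY_IN_BLOCK + 8 - 1) / 8)
#define SIZE_OF_RESERVED	(BLKSIZE - ((SIZE_OF_DIR_ENTRY + F2FS_NAME_LEN) * \
				NR_DENTRY_IN_BLOCK + SIZE_OF_DENTRY_BITMAP))

/* Node block constants, copied from the patch. */
#define ADDRS_PER_INODE		927
#define ADDRS_PER_BLOCK		1018
#define NODE_FOOTER_SIZE	(4 + 4 + 4 + 8 + 4)	/* nid, ino, flag, cp_ver, next_blkaddr */
/*
 * f2fs_inode fields before i_addr[]: i_mode..i_links (16 bytes), four __le64
 * fields, seven __le32 fields, i_name[256], and the 12-byte i_ext.
 */
#define INODE_FIXED_SIZE	(2 + 1 + 1 + 4 + 4 + 4 + 8 * 4 + 4 * 7 + 256 + 12)

/* bitmap + reserved + dentry[] + filename[][] */
static int dentry_block_size(void)
{
	return SIZE_OF_DENTRY_BITMAP + SIZE_OF_RESERVED +
	       SIZE_OF_DIR_ENTRY * NR_DENTRY_IN_BLOCK +
	       F2FS_NAME_LEN * NR_DENTRY_IN_BLOCK;
}

/* fixed fields + i_addr[] + i_nid[5] + footer */
static int inode_block_size(void)
{
	return INODE_FIXED_SIZE + 4 * ADDRS_PER_INODE + 4 * 5 + NODE_FOOTER_SIZE;
}

/* addr[] + footer; an indirect node has the same shape with nid[] */
static int direct_block_size(void)
{
	return 4 * ADDRS_PER_BLOCK + NODE_FOOTER_SIZE;
}
```

All three functions evaluate to 4096, with a 27-byte dentry bitmap and 3 reserved bytes in the dentry block.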

With the best regards,
Vyacheslav Dubeyko.

2012-10-25 08:12:50

by Jaegeuk Kim

Subject: RE: [PATCH 0/3] f2fs: move proc files to debugfs

> On Tue, Oct 23, 2012 at 11:20:55AM -0700, Greg KH wrote:
> > Here are 3 patches, moving the proc file usage on f2fs to debugfs.
> >
> > The first one fixes a bug in the gc.h file preventing it from being able
> > to be included by any other files.
> >
> > The second patch moves all current proc file accesses to a single file,
> > removing all #ifdefs from the .c files. This should have been done in
> > the first place.
> >
> > The last file converts the files to use debugfs instead of proc.
> >
> > Note, these patches have been compile tested only, I haven't tested them
> > out, as I haven't had the chance to yet. I'll go do that this afternoon
> > after I catch up on some other pending kernel work.
> >
> > One question, it seems that the proc files show all information for all
> > super blocks in the system, no matter which subdirectory you are reading
> > from in the proc f2fs tree. Is that really what you want? Shouldn't we
> > only be showing the stats of the superblock we are saying we will
> > report? I'll test that later today, and if it really is wrong, will fix
> > the debugfs code up to handle this properly.
>
> I just tested your patch set, and it looks like I see all partition
> information in each file, no matter what subdir it is in.
>
> So, do you want this to be broken up per partition/superblock, in a
> subdir, like you intended? Or just 3 files, for all superblocks in the
> system?

Thank you for the great patches. I really appreciate them.
While merging them, I found some unnecessary code and memory structures,
such as gc_info, in the original code.
So I've done some additional work on top of the patches.
Furthermore, for readability, I merged the 3 stat files into one file and
let it show all the superblocks together, with no breakdown per partition.
I'll submit a v3 series with these changes soon.
Thanks,

>
> Oh, the third patch is buggy, don't apply it, I got the subdir logic
> wrong, I'll go fix that up now.
>
> thanks,
>
> greg k-h


---
Jaegeuk Kim
Samsung

2012-10-25 22:14:40

by Jaegeuk Kim

Subject: RE: [PATCH 01/16 v2] f2fs: add document

I'll improve the document as much as possible according to your
recommendations.
Thank you for the thorough review. :)

---
Jaegeuk Kim
Samsung

> On Tue, 2012-10-23 at 11:25 +0900, Jaegeuk Kim wrote:
> > This adds a document describing the mount options, proc entries, usage, and
> > design of Flash-Friendly File System, namely F2FS.
> >
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > Documentation/filesystems/00-INDEX | 2 +
> > Documentation/filesystems/f2fs.txt | 404 ++++++++++++++++++++++++++++++++++++
> > 2 files changed, 406 insertions(+)
> > create mode 100644 Documentation/filesystems/f2fs.txt
> >
> > diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
> > index 8c624a1..ce5fd46 100644
> > --- a/Documentation/filesystems/00-INDEX
> > +++ b/Documentation/filesystems/00-INDEX
> > @@ -48,6 +48,8 @@ ext4.txt
> > - info, mount options and specifications for the Ext4 filesystem.
> > files.txt
> > - info on file management in the Linux kernel.
> > +f2fs.txt
> > + - info and mount options for the F2FS filesystem.
> > fuse.txt
> > - info on the Filesystem in User SpacE including mount options.
> > gfs2.txt
> > diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
> > new file mode 100644
> > index 0000000..f2b4fde
> > --- /dev/null
> > +++ b/Documentation/filesystems/f2fs.txt
> > @@ -0,0 +1,404 @@
> > +================================================================================
> > +WHAT IS Flash-Friendly File System (F2FS)?
> > +================================================================================
> > +
> > +NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
> > +been widely being used for storage ranging from mobile to server systems. Since
>
> Maybe, it needs to reformulate "... have been widely being used ..."?
>
> > +they are known to have different characteristics from the conventional rotating
> > +disks, a file system, an upper layer to the storage device, should adapt to the
> > +changes from the sketch in the design level.
> > +
> > +F2FS is a file system exploiting NAND flash memory-based storage devices, which
> > +is based on Log-structured File System (LFS). The design has been focused on
> > +addressing the fundamental issues in LFS, which are snowball effect of wandering
> > +tree and high cleaning overhead.
> > +
> > +Since a NAND flash memory-based storage device shows different characteristic
> > +according to its internal geometry or flash memory management scheme, namely FTL,
> > +F2FS and its tools support various parameters not only for configuring on-disk
> > +layout, but also for selecting allocation and cleaning algorithms.
> > +
> > +The file system formatting tool, "mkfs.f2fs", is available from the following
> > +download page: http://sourceforge.net/projects/f2fs-tools/
> > +
> > +================================================================================
> > +BACKGROUND AND DESIGN ISSUES
> > +================================================================================
> > +
> > +Log-structured File System (LFS)
> > +--------------------------------
> > +"A log-structured file system writes all modifications to disk sequentially in
> > +a log-like structure, thereby speeding up both file writing and crash recovery.
> > +The log is the only structure on disk; it contains indexing information so that
> > +files can be read back from the log efficiently. In order to maintain large free
> > +areas on disk for fast writing, we divide the log into segments and use a
> > +segment cleaner to compress the live information from heavily fragmented
> > +segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
> > +implementation of a log-structured file system", ACM Trans. Computer Systems
> > +10, 1, 26–52.
> > +
> > +Wandering Tree Problem
> > +----------------------
> > +In LFS, when a file data is updated and written to the end of log, its direct
> > +pointer block is updated due to the changed location. Then the indirect pointer
> > +block is also updated due to the direct pointer block update. In this manner,
> > +the upper index structures such as inode, inode map, and checkpoint block are
> > +also updated recursively. This problem is called as wandering tree problem [1],
> > +and in order to enhance the performance, it should eliminate or relax the update
> > +propagation as much as possible.
> > +
> > +[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
> > +
> > +Cleaning Overhead
> > +-----------------
> > +Since LFS is based on out-of-place writes, it produces so many obsolete blocks
> > +scattered across the whole storage. In order to serve new empty log space, it
> > +needs to reclaim these obsolete blocks seamlessly to users. This job is called
> > +as a cleaning process.
> > +
> > +The process consists of three operations as follows.
> > +1. A victim segment is selected through referencing segment usage table.
> > +2. It loads parent index structures of all the data in the victim identified by
> > + segment summary blocks.
> > +3. It checks the cross-reference between the data and its parent index structure.
> > +4. It moves valid data selectively.
> > +
> > +This cleaning job may cause unexpected long delays, so the most important goal
> > +is to hide the latencies to users. And also definitely, it should reduce the
> > +amount of valid data to be moved, and move them quickly as well.
> > +
> > +================================================================================
> > +KEY FEATURES
> > +================================================================================
> > +
> > +Flash Awareness
> > +---------------
> > +- Enlarge the random write area for better performance, but provide the high
> > + spatial locality
> > +- Align FS data structures to the operational units in FTL as best efforts
> > +
> > +Wandering Tree Problem
> > +----------------------
> > +- Use a term, “node”, that represents inodes as well as various pointer blocks
> > +- Introduce Node Address Table (NAT) containing the locations of all the “node”
> > + blocks; this will cut off the update propagation.
> > +
> > +Cleaning Overhead
> > +-----------------
> > +- Support a background cleaning process
> > +- Support greedy and cost-benefit algorithms for victim selection policies
> > +- Support multi-head logs for static/dynamic hot and cold data separation
> > +- Introduce adaptive logging for efficient block allocation
> > +
> > +================================================================================
> > +MOUNT OPTIONS
> > +================================================================================
> > +
> > +background_gc_off Turn off cleaning operations, namely garbage collection,
> > + triggered in background when I/O subsystem is idle.
> > +disable_roll_forward Disable the roll-forward recovery routine
> > +discard Issue discard/TRIM commands when a segment is cleaned.
> > +no_heap Disable heap-style segment allocation which finds free
> > + segments for data from the beginning of main area, while
> > + for node from the end of main area.
> > +nouser_xattr Disable Extended User Attributes. Note: xattr is enabled
> > + by default if CONFIG_F2FS_FS_XATTR is selected.
> > +noacl Disable POSIX Access Control List. Note: acl is enabled
> > + by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
> > +active_logs=%u Support configuring the number of active logs. In the
> > + current design, f2fs supports only 2, 4, and 6 logs.
> > + Default number is 6.
> > +disable_ext_identify Disable the extension list configured by mkfs, so f2fs
> > + does not aware of cold files such as media files.
> > +
> > +================================================================================
> > +PROC ENTRIES
> > +================================================================================
> > +
> > +/proc/fs/f2fs/ contains information about partitions mounted as f2fs. For each
> > +partition, a corresponding directory, named as its device name, is provided with
> > +the following proc entries.
> > +
> > +- f2fs_stat major file system information managed by f2fs currently
> > +- f2fs_sit_stat average utilization information of the whole segments
> > +- f2fs_mem_stat current memory footprint consumed by f2fs
> > +
> > +e.g., in /proc/fs/f2fs/sdb1/
> > +
> > +================================================================================
> > +USAGE
> > +================================================================================
> > +
> > +1. Download userland tools
> > +
> > +2. Insmod f2fs.ko module:
> > + # insmod f2fs.ko
> > +
>
> What about the case of static compilation of f2fs in the kernel?
>
> > +3. Check the directory trying to mount
> > + # mkdir /mnt/f2fs
> > +
>
> Create or check?
>
> > +4. Format the block device, and then mount as f2fs
> > + # mkfs.f2fs -l label /dev/block_device
> > + # mount -t f2fs /dev/block_device /mnt/f2fs
> > +
> > +Mount options
>
> Sorry, is it really mount options? Maybe, I misunderstand possibility to
> set volume label during mount.
>
> > +-------------
> > +-l [label] : Give a volume label, up to 256 unicode name.
> > +-a [0 or 1] : Split start location of each area for heap-based allocation.
> > + 1 is set by default, which performs this.
> > +-o [int] : Set overprovision ratio in percent over volume size.
> > + 5 is set by default.
> > +-s [int] : Set the number of segments per section.
> > + 1 is set by default.
> > +-z [int] : Set the number of sections per zone.
> > + 1 is set by default.
> > +-e [str] : Set basic extension list. e.g. "mp3,gif,mov"
> > +
> > +================================================================================
> > +DESIGN
> > +================================================================================
> > +
> > +On-disk Layout
> > +--------------
> > +
> > +F2FS divides the whole volume into a number of segments, each of which is 2MB in
> > +size by default. A section is composed of consecutive segments, and a zone
> > +consists of a set of sections.
> > +
>
> Maybe, it makes sense to describe here possible sizes of sections and
> zones?
>
> > +F2FS maintains logically six log areas. Except SB, all the log areas are managed
> > +in a unit of multiple segments. SB is located at the beginning of the partition,
> > +and there exist two superblocks to avoid file system crash. Other file system
> > +metadata such as CP, NAT, SIT, and SSA are located in the front part of the
> > +volume. Main area contains file and directory data including their indices.
> > +
>
> I feel necessity to know more details about log concept here. Could you
> add slightly more description about log?
>
> > +Each area manages the following contents.
> > +- CP File system information, bitmaps for valid NAT/SIT sets, orphan
> > + inode lists, and summary entries of current active segments.
> > +- NAT Block address table for all the node blocks stored in Main area.
> > +- SIT Segment information such as valid block count and bitmap for the
> > + validity of all the blocks.
> > +- SSA Summary entries which contains the owner information of all the
> > + data and node blocks stored in Main area.
> > +- Main Node and data blocks.
> > +
>
> Could you add definition of abbreviations here also (for example, NAT
> Node Address Table: <description>)?
>
> > +In order to avoid misalignment between file system and flash-based storage, F2FS
> > +aligns the start block address of CP with the segment size. Also, it aligns the
> > +start block address of Main area with the zone size by reserving some segments
> > +in SSA area.
>
> Maybe, it makes sense to add some technical details about aligning
> procedure here?
>
> > +
> > + align with the zone size <-|
> > + |-> align with the segment size
> > + _________________________________________________________________________
> > + | | | Node | Segment | Segment | |
> > + | Superblock | Checkpoint | Address | Info. | Summary | Main |
> > + | (SB) | (CP) | Table (NAT) | Table (SIT) | Area (SSA) | |
> > + |____________|_____2______|______N______|______N______|______N_____|__N___|
> > + . .
> > + . .
> > + . .
> > + ._________________________________________.
> > + |_Segment_|_..._|_Segment_|_..._|_Segment_|
> > + . .
> > + ._________._________
> > + |_section_|__...__|_
> > + . .
> > + .________.
> > + |__zone__|
> > +
> > +
> > +File System Metadata Structure
> > +------------------------------
> > +
> > +F2FS adopts the checkpointing scheme to maintain file system consistency. At
> > +mount time, F2FS first tries to find the last valid checkpoint data by scanning
> > +CP area. In order to reduce the scanning time, F2FS uses only two copies of CP.
> > +One of them always indicates the last valid data, which is called as shadow copy
> > +mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
> > +
> > +For file system consistency, each CP points to which NAT and SIT copies are
> > +valid, as shown as below.
> > +
> > + +--------+----------+---------+
> > + | CP | NAT | SIT |
> > + +--------+----------+---------+
> > + . . . .
> > + . . . .
> > + . . . .
> > + +-------+-------+--------+--------+--------+--------+
> > + | CP #0 | CP #1 | NAT #0 | NAT #1 | SIT #0 | SIT #1 |
> > + +-------+-------+--------+--------+--------+--------+
> > + | ^ ^
> > + | | |
> > + `----------------------------------------'
> > +
> > +Index Structure
> > +---------------
> > +
> > +The key data structure to manage the data locations is a "node". Similar to
> > +traditional file structures, F2FS has three types of node: inode, direct node,
> > +indirect node. F2FS assigns 4KB to an inode block which contains 929 data block
> > +indices, two direct node pointers, two indirect node pointers, and one double
> > +indirect node pointer as described below. One direct node block contains 1018
> > +data blocks, and one indirect node block contains also 1018 node blocks. Thus,
> > +one inode block (i.e., a file) covers:
> > +
> > + 4KB * (927 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
> > +
> > + Inode block (4KB)
> > + |- data (927)
> > + |- direct node (2)
> > + | `- data (1018)
> > + |- indirect node (2)
> > + | `- direct node (1018)
> > + | `- data (1018)
> > + `- double indirect node (1)
> > + `- indirect node (1018)
> > + `- direct node (1018)
> > + `- data (1018)
> > +
> > +Note that, all the node blocks are mapped by NAT which means the location of
> > +each node is translated by the NAT table. In the consideration of the wandering
> > +tree problem, F2FS is able to cut off the propagation of node updates caused by
> > +leaf data writes.
> > +
> > +Directory Structure
> > +-------------------
> > +
> > +A directory entry occupies 11 bytes, which consists of the following attributes.
> > +
> > +- hash hash value of the file name
> > +- ino inode number
> > +- len the length of file name
> > +- type file type such as directory, symlink, etc
> > +
> > +A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
> > +used to represent whether each dentry is valid or not. A dentry block occupies
> > +4KB with the following composition.
> > +
> > + Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
> > + dentries(11 * 214 bytes) + file name (8 * 214 bytes)
> > +
> > + [Bucket]
> > + +--------------------------------+
> > + |dentry block 1 | dentry block 2 |
> > + +--------------------------------+
> > + . .
> > + . .
> > + . [Dentry Block Structure: 4KB] .
> > + +--------+----------+----------+------------+
> > + | bitmap | reserved | dentries | file names |
> > + +--------+----------+----------+------------+
> > + [Dentry Block: 4KB] . .
> > + . .
> > + . .
> > + +------+------+-----+------+
> > + | hash | ino | len | type |
> > + +------+------+-----+------+
> > + [Dentry Structure: 11 bytes]
> > +
> > +F2FS implements multi-level hash tables for directory structure. Each level has
> > +a hash table with dedicated number of hash buckets as shown below. Note that
> > +"A(2B)" means a bucket includes 2 data blocks.
> > +
> > +----------------------
> > +A : bucket
> > +B : block
> > +N : MAX_DIR_HASH_DEPTH
> > +----------------------
> > +
> > +level #0 | A(2B)
> > + |
> > +level #1 | A(2B) - A(2B)
> > + |
> > +level #2 | A(2B) - A(2B) - A(2B) - A(2B)
> > + . | . . . .
> > +level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
> > + . | . . . .
> > +level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
> > +
> > +The number of blocks and buckets are determined by,
> > +
> > + ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
> > + # of blocks in level #n = |
> > + `- 4, Otherwise
> > +
> > + ,- 2^n, if n < MAX_DIR_HASH_DEPTH / 2,
> > + # of buckets in level #n = |
> > + `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1), Otherwise
> > +
> > +When F2FS finds a file name in a directory, at first a hash value of the file
> > +name is calculated. Then, F2FS scans the hash table in level #0 to find the
> > +dentry consisting of the file name and its inode number. If not found, F2FS
> > +scans the next hash table in level #1. In this way, F2FS scans hash tables in
> > +each levels incrementally from 1 to N. In each levels F2FS needs to scan only
> > +one bucket determined by the following equation, which shows O(log(# of files))
> > +complexity.
> > +
> > + bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
> > +
> > +In the case of file creation, F2FS finds empty consecutive slots that cover the
> > +file name. F2FS searches the empty slots in the hash tables of whole levels from
> > +1 to N in the same way as the lookup operation.
> > +
> > +The following figure shows an example of two cases holding children.
> > + --------------> Dir <--------------
> > + | |
> > + child child
> > +
> > + child - child [hole] - child
> > +
> > + child - child - child [hole] - [hole] - child
> > +
> > + Case 1: Case 2:
> > + Number of children = 6, Number of children = 3,
> > + File size = 7 File size = 7
> > +
> > +Default Block Allocation
> > +------------------------
> > +
> > +At runtime, F2FS manages six active logs inside "Main" area: Hot/Warm/Cold node
> > +and Hot/Warm/Cold data.
> > +
> > +- Hot node contains direct node blocks of directories.
> > +- Warm node contains direct node blocks except hot node blocks.
> > +- Cold node contains indirect node blocks
> > +- Hot data contains dentry blocks
> > +- Warm data contains data blocks except hot and cold data blocks
> > +- Cold data contains multimedia data or migrated data blocks
> > +
> > +LFS has two schemes for free space management: threaded log and copy-and-compac-
> > +tion. The copy-and-compaction scheme which is known as cleaning, is well-suited
> > +for devices showing very good sequential write performance, since free segments
> > +are served all the time for writing new data. However, it suffers from cleaning
> > +overhead under high utilization. Contrarily, the threaded log scheme suffers
> > +from random writes, but no cleaning process is needed. F2FS adopts a hybrid
> > +scheme where the copy-and-compaction scheme is adopted by default, but the
> > +policy is dynamically changed to the threaded log scheme according to the file
> > +system status.
> > +
> > +In order to align F2FS with underlying flash-based storage, F2FS allocates a
> > +segment in a unit of section. F2FS expects that the section size would be the
> > +same as the unit size of garbage collection in FTL. Furthermore, with respect
> > +to the mapping granularity in FTL, F2FS allocates each section of the active
> > +logs from different zones as much as possible, since FTL can write the data in
> > +the active logs into one allocation unit according to its mapping granularity.
> > +
> > +Cleaning process
> > +----------------
> > +
> > +F2FS does cleaning both on demand and in the background. On-demand cleaning is
> > +triggered when there are not enough free segments to serve VFS calls. Background
> > +cleaner is operated by a kernel thread, and triggers the cleaning job when the
> > +system is idle.
> > +
> > +F2FS supports two victim selection policies: greedy and cost-benefit algorithms.
> > +In the greedy algorithm, F2FS selects a victim segment having the smallest number
> > +of valid blocks. In the cost-benefit algorithm, F2FS selects a victim segment
> > +according to the segment age and the number of valid blocks in order to address
> > +log block thrashing problem in the greedy algorithm. F2FS adopts the greedy
> > +algorithm for on-demand cleaner, while background cleaner adopts cost-benefit
> > +algorithm.
> > +
> > +In order to identify whether the data in the victim segment are valid or not,
> > +F2FS manages a bitmap. Each bit represents the validity of a block, and the
> > +bitmap is composed of a bit stream covering whole blocks in main area.
>
> With the best regards,
> Vyacheslav Dubeyko.
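
The size arithmetic in the quoted f2fs.txt can be checked mechanically. What
follows is a minimal sketch in plain user-space C, not code from the patch.
The function names are hypothetical; the constants (1018 entries per node
block, 27-byte bitmap, 3 reserved bytes, 11-byte dentries, 8-byte name slots)
are taken from the quoted text.

```c
#include <stdint.h>

/* Entries per direct/indirect node block, as stated in the quoted doc. */
#define ENTRIES_PER_NODE 1018ULL

/*
 * Hypothetical helper: number of 4KB data blocks one inode can address,
 * given the number of data block indices stored in the inode block itself.
 */
static uint64_t max_file_blocks(uint64_t addrs_in_inode)
{
	uint64_t n = ENTRIES_PER_NODE;

	return addrs_in_inode	/* data indices in the inode block */
		+ 2 * n		/* two direct node blocks */
		+ 2 * n * n	/* two indirect node blocks */
		+ n * n * n;	/* one double indirect node block */
}

/* Hypothetical helper: bytes used by a dentry block with `slots` slots. */
static unsigned int dentry_block_bytes(unsigned int slots)
{
	/* bitmap + reserved + dentries + file name slots */
	return 27 + 3 + 11 * slots + 8 * slots;
}
```

With 927 indices this gives 1,057,053,443 blocks, which at 4KB per block is
about 3.94 TB. Using 929 (the figure in the prose) adds only two blocks, so
the rounded total is the same. For 214 slots the dentry block comes to exactly
4096 bytes.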

2012-10-26 03:31:19

by Jaegeuk Kim

Subject: RE: [PATCH 02/16 v2] f2fs: add on-disk layout

[snip]
> > +#define F2FS_SUPER_MAGIC 0xF2F52010
> > +#define F2FS_SUPER_OFFSET 0 /* start sector # for sb */
>
> Does f2fs superblock really haven't any offset from the volume begin?

I changed this from 1 to 0 because Android recovery failed with the old value.
I don't know why recovery fails when the offset is 1, but it works fine once
the offset is changed to 0.
I suspect that the mount procedure inspects offset 0 to figure out which file
system is installed, by checking some kind of magic number.
Sometimes we have seen the mount program try to load the previously installed
file system even though mkfs.f2fs had been run.
Would you recommend something?

>
> > +#define F2FS_BLKSIZE 4096
> > +#define F2FS_MAX_EXTENSION 64
> > +
> > +#define NULL_ADDR 0x0U
> > +#define NEW_ADDR -1U
>
> Does NULL_ADDR and NEW_ADDR declarations really need? Does kernel
> haven't any analogous?

These are used for F2FS-specific block allocation, so for readability,
I don't want to change them.
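
For readers unfamiliar with the two sentinels, here is a minimal user-space
sketch of how such values can be told apart. The macro values are copied from
the patch (NEW_ADDR is parenthesized here). The enum, the helper name, and the
reading of NEW_ADDR as "reserved but not yet written" are assumptions made for
illustration, not statements about the actual f2fs code.

```c
#include <stdint.h>

#define NULL_ADDR 0x0U
#define NEW_ADDR  (-1U)	/* all bits set; 0xFFFFFFFF with a 32-bit unsigned int */

/* Hypothetical classification of a 32-bit block address. */
enum blkaddr_state {
	BLK_UNALLOCATED,	/* NULL_ADDR: no block assigned */
	BLK_RESERVED,		/* NEW_ADDR: assumed "allocated, no on-disk location yet" */
	BLK_ON_DISK		/* any other value: a real block address */
};

static enum blkaddr_state classify_blkaddr(uint32_t addr)
{
	if (addr == NULL_ADDR)
		return BLK_UNALLOCATED;
	if (addr == NEW_ADDR)
		return BLK_RESERVED;
	return BLK_ON_DISK;
}
```

Keeping both names makes call sites read as intent ("no block" versus "block
pending") rather than as bare 0 and -1 comparisons.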

>
> > +
> > +#define F2FS_ROOT_INO(sbi) (sbi->root_ino_num)
> > +#define F2FS_NODE_INO(sbi) (sbi->node_ino_num)
> > +#define F2FS_META_INO(sbi) (sbi->meta_ino_num)
> > +
> > +#define GFP_F2FS_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_ZERO)
> > +
> > +#define MAX_ACTIVE_LOGS 16
> > +#define MAX_ACTIVE_NODE_LOGS 8
> > +#define MAX_ACTIVE_DATA_LOGS 8
>
> I think that it makes sense to comment the reasons of such limitations
> in MAX_ACTIVE_LOGS, MAX_ACTIVE_NODE_LOGS, MAX_ACTIVE_DATA_LOGS.

The maximum number of logs was suggested by Arnd earlier.
As I understood it, he suggested such a large number to allow further
optimization of multiple logs without any on-disk layout changes.
And I think it is quite enough.

>
> > +
> > +/*
> > + * For superblock
> > + */
> > +struct f2fs_super_block {
> > + __le32 magic; /* Magic Number */
> > + __le16 major_ver; /* Major Version */
> > + __le16 minor_ver; /* Minor Version */
> > + __le32 log_sectorsize; /* log2 (Sector size in bytes) */
> > + __le32 log_sectors_per_block; /* log2 (Number of sectors per block */
> > + __le32 log_blocksize; /* log2 (Block size in bytes) */
> > + __le32 log_blocks_per_seg; /* log2 (Number of blocks per segment) */
>
> From my point of view, __le32 is big data type for log2 (<value>). What
> do you think?
>

Right, but it is the superblock. Do we have to consider space overhead there?

> > + __le32 segs_per_sec; /* Number of segments per section */
> > + __le32 secs_per_zone; /* Number of sections per zone */
> > + __le32 checksum_offset; /* Checksum position in this super block */
> > + __le64 block_count; /* Total number of blocks */
> > + __le32 section_count; /* Total number of sections */
> > + __le32 segment_count; /* Total number of segments */
> > + __le32 segment_count_ckpt; /* Total number of segments
> > + in Checkpoint area */
> > + __le32 segment_count_sit; /* Total number of segments
> > + in Segment information table */
> > + __le32 segment_count_nat; /* Total number of segments
> > + in Node address table */
> > + /*Total number of segments in Segment summary area */
> > + __le32 segment_count_ssa;
> > + /* Total number of segments in Main area */
> > + __le32 segment_count_main;
> > + __le32 failure_safe_block_distance;
> > + __le32 segment0_blkaddr; /* Start block address of Segment 0 */
> > + __le32 start_segment_checkpoint; /* Start block address of ckpt */
> > + __le32 sit_blkaddr; /* Start block address of SIT */
> > + __le32 nat_blkaddr; /* Start block address of NAT */
> > + __le32 ssa_blkaddr; /* Start block address of SSA */
> > + __le32 main_blkaddr; /* Start block address of Main area */
> > + __le32 root_ino; /* Root directory inode number */
> > + __le32 node_ino; /* node inode number */
> > + __le32 meta_ino; /* meta inode number */
> > + __le32 volume_serial_number; /* VSN is optional field */
>
> Usually, it is used 128-bits UUID for serial number. Why do you use
> __le32 as volume_serial_number?

Ok, I'll change.

[snip]
> > +/*
> > + * For directory operations
> > + */
> > +#define F2FS_DOT_HASH 0
> > +#define F2FS_DDOT_HASH F2FS_DOT_HASH
> > +#define F2FS_MAX_HASH (~((0x3ULL) << 62))
> > +#define F2FS_HASH_COL_BIT ((0x1ULL) << 63)
> > +
> > +typedef __le32 f2fs_hash_t;
> > +
> > +#define F2FS_NAME_LEN 8
>
> It exists F2FS_MAX_NAME_LEN. I think that it makes sense to comment here
> purpose of F2FS_NAME_LEN declaration.

Ok, thanks.

---
Jaegeuk Kim
Samsung

2012-10-26 08:18:35

by Arnd Bergmann

Subject: Re: [PATCH 02/16 v2] f2fs: add on-disk layout

On Friday 26 October 2012, Jaegeuk Kim wrote:

> > > +
> > > +#define F2FS_ROOT_INO(sbi) (sbi->root_ino_num)
> > > +#define F2FS_NODE_INO(sbi) (sbi->node_ino_num)
> > > +#define F2FS_META_INO(sbi) (sbi->meta_ino_num)
> > > +
> > > +#define GFP_F2FS_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_ZERO)
> > > +
> > > +#define MAX_ACTIVE_LOGS 16
> > > +#define MAX_ACTIVE_NODE_LOGS 8
> > > +#define MAX_ACTIVE_DATA_LOGS 8
> >
> > I think that it makes sense to comment the reasons of such limitations
> > in MAX_ACTIVE_LOGS, MAX_ACTIVE_NODE_LOGS, MAX_ACTIVE_DATA_LOGS.
>
> The maximum number of logs is suggested by arnd before.
> As I understood, why he suggested such a quite large number is for further
> optimization of multiple logs without any on-disk layout changes.
> And, I think it is quite enough.

I agree. I think Vyacheslav was just asking you to add a comment
explaining how we got to these numbers, like

/*
* The file format supports up to 16 active logs, which should be
* more than enough for future optimizations. The implementation
* currently uses no more than 6 logs.
* Half the logs are used for nodes, the other half are used for data.
*/
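
As a sketch of how that comment could sit next to the constants, assuming
plain C11 outside the kernel: the compile-time check and the validator below
are illustrative additions, and active_logs_valid() is a hypothetical name.
The accepted values 2, 4, and 6 come from the active_logs description in the
posted f2fs.txt.

```c
/*
 * The file format supports up to 16 active logs, which should be
 * more than enough for future optimizations. The implementation
 * currently uses no more than 6 logs.
 * Half the logs are used for nodes, the other half are used for data.
 */
#define MAX_ACTIVE_LOGS		16
#define MAX_ACTIVE_NODE_LOGS	8
#define MAX_ACTIVE_DATA_LOGS	8

/* C11 compile-time check; kernel code would use BUILD_BUG_ON() instead. */
_Static_assert(MAX_ACTIVE_NODE_LOGS + MAX_ACTIVE_DATA_LOGS == MAX_ACTIVE_LOGS,
	       "node and data logs must add up to the total");

/* Hypothetical validator for the active_logs= mount option. */
static int active_logs_valid(unsigned int n)
{
	return n == 2 || n == 4 || n == 6;
}
```

The static check keeps the three limits consistent if one of them is changed
later.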


Arnd

2012-10-26 08:31:52

by Jaegeuk Kim

Subject: RE: [PATCH 02/16 v2] f2fs: add on-disk layout

> On Friday 26 October 2012, Jaegeuk Kim wrote:
>
> > > > +
> > > > +#define F2FS_ROOT_INO(sbi) (sbi->root_ino_num)
> > > > +#define F2FS_NODE_INO(sbi) (sbi->node_ino_num)
> > > > +#define F2FS_META_INO(sbi) (sbi->meta_ino_num)
> > > > +
> > > > +#define GFP_F2FS_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_ZERO)
> > > > +
> > > > +#define MAX_ACTIVE_LOGS 16
> > > > +#define MAX_ACTIVE_NODE_LOGS 8
> > > > +#define MAX_ACTIVE_DATA_LOGS 8
> > >
> > > I think that it makes sense to comment the reasons of such limitations
> > > in MAX_ACTIVE_LOGS, MAX_ACTIVE_NODE_LOGS, MAX_ACTIVE_DATA_LOGS.
> >
> > The maximum number of logs is suggested by arnd before.
> > As I understood, why he suggested such a quite large number is for further
> > optimization of multiple logs without any on-disk layout changes.
> > And, I think it is quite enough.
>
> I agree. I think Vyacheslav was just asking you to add a comment
> explaining how we got to these numbers, like
>
> /*
> * The file format supports up to 16 active logs, which should be
> * more than enough for future optimizations. The implementation
> * currently uses no more than 6 logs.
> * Half the logs are used for nodes, the other half are used for data.
> */
>

Right. I added a comment like that.
Thank you for the explanation; I'd better add the additional wording you
described.

>
> Arnd


---
Jaegeuk Kim
Samsung

2012-10-26 12:48:56

by Viacheslav Dubeyko

Subject: RE: [PATCH 02/16 v2] f2fs: add on-disk layout

On Fri, 2012-10-26 at 12:31 +0900, Jaegeuk Kim wrote:
> [snip]
> > > +#define F2FS_SUPER_MAGIC 0xF2F52010
> > > +#define F2FS_SUPER_OFFSET 0 /* start sector # for sb */
> >
> > Does f2fs superblock really haven't any offset from the volume begin?
>
> The reason that I changed this from 1 to 0 is due to the failure during android
> recovery. I don't know why the recovery is failed when the offset is 1, but it
> works fine after the offset is changed to 0.
> I suspect that mount procedure inspects the 0'th offset to figure out what file
> system is installed by checking some kind of magic numbers.
> Sometimes, we've seen that the mount program tries to load previously installed
> file system even though mkfs.f2fs was conducted.
> Would you recommend something?
>

I thought that a superblock's placement is always defined from the design
point of view. :-)

As I understand it, many filesystems place the first superblock 1024 bytes
from the beginning of the volume. This reserves some space for a possible
volume boot record.

I am afraid that the Android recovery failure takes place because of
improper configuration or, maybe, some special modification of the Android
recovery code. Does it mean that it is not possible to use, for example,
ext4 or nilfs2 under Android?

Yes, you are right, the mount procedure tries to detect the filesystem type.
But a superblock can be placed in different locations. For example, FAT's
superblock has no offset from the beginning of the volume; the hfs+ superblock
is located 1024 bytes from the beginning of the volume; the reiserfs
superblock is located 64 KB from the beginning of the volume. The situation
where the mount utility tries to load another filesystem instead of f2fs is a
symptom of a mkfs.f2fs bug, from my point of view.

I think that it makes sense to have a superblock offset of at least 1024
bytes from the beginning of the volume.
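
To make the detection point concrete, here is a minimal user-space sketch of
probing for a magic number at a candidate offset. It assumes the magic is the
first field of the superblock (as in the posted struct) and is stored
little-endian; has_magic_at() and get_le32() are hypothetical helpers, not
functions from mount, libblkid, or f2fs-tools.

```c
#include <stddef.h>
#include <stdint.h>

#define F2FS_SUPER_MAGIC 0xF2F52010U

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Hypothetical probe: does the image carry `magic` at byte offset `off`?
 * Returns 0 when the buffer is too short to contain the field.
 */
static int has_magic_at(const uint8_t *img, size_t len, size_t off,
			uint32_t magic)
{
	if (len < 4 || off > len - 4)
		return 0;
	return get_le32(img + off) == magic;
}
```

A prober walks a table of (offset, magic) pairs like this for each known
filesystem. If mkfs leaves an old signature intact at some other offset, an
earlier table entry may match first, which would be consistent with the stale
detection described above.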

[snip]
> > > +
> > > +/*
> > > + * For superblock
> > > + */
> > > +struct f2fs_super_block {
> > > + __le32 magic; /* Magic Number */
> > > + __le16 major_ver; /* Major Version */
> > > + __le16 minor_ver; /* Minor Version */
> > > + __le32 log_sectorsize; /* log2 (Sector size in bytes) */
> > > + __le32 log_sectors_per_block; /* log2 (Number of sectors per block */
> > > + __le32 log_blocksize; /* log2 (Block size in bytes) */
> > > + __le32 log_blocks_per_seg; /* log2 (Number of blocks per segment) */
> >
> > From my point of view, __le32 is big data type for log2 (<value>). What
> > do you think?
> >
>
> Right, but it is superblock. Should we have to consider space overhead?
>

I simply think that __le16 is enough. Then all four fields
(log_sectorsize, log_sectors_per_block, log_blocksize,
log_blocks_per_seg) will occupy 8 bytes instead of 16, and the freed space
(8 bytes) can be used for the volume serial number field.
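
The saving can be confirmed with a small user-space sketch. uint32_t and
uint16_t stand in for the kernel's __le32 and __le16 here, and the two struct
names are hypothetical; only the four field names come from the patch.

```c
#include <stdint.h>

/* Layout as posted: four 32-bit log2 fields. */
struct log_fields_32 {
	uint32_t log_sectorsize;
	uint32_t log_sectors_per_block;
	uint32_t log_blocksize;
	uint32_t log_blocks_per_seg;
};

/* Proposed layout: the same fields narrowed to 16 bits. */
struct log_fields_16 {
	uint16_t log_sectorsize;
	uint16_t log_sectors_per_block;
	uint16_t log_blocksize;
	uint16_t log_blocks_per_seg;
};

/* C11 compile-time checks of the sizes quoted in the mail. */
_Static_assert(sizeof(struct log_fields_32) == 16, "4 x 4 bytes");
_Static_assert(sizeof(struct log_fields_16) == 8, "4 x 2 bytes");
```

Neither layout needs padding, so the difference is exactly the 8 bytes
mentioned above.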

With the best regards,
Vyacheslav Dubeyko.

2012-10-26 13:13:31

by Jaegeuk Kim

Subject: RE: [PATCH 02/16 v2] f2fs: add on-disk layout

2012-10-26 (Fri), 16:48 +0400, Vyacheslav Dubeyko:
> On Fri, 2012-10-26 at 12:31 +0900, Jaegeuk Kim wrote:
> > [snip]
> > > > +#define F2FS_SUPER_MAGIC 0xF2F52010
> > > > +#define F2FS_SUPER_OFFSET 0 /* start sector # for sb */
> > >
> > > Does f2fs superblock really haven't any offset from the volume begin?
> >
> > The reason that I changed this from 1 to 0 is due to the failure during android
> > recovery. I don't know why the recovery is failed when the offset is 1, but it
> > works fine after the offset is changed to 0.
> > I suspect that mount procedure inspects the 0'th offset to figure out what file
> > system is installed by checking some kind of magic numbers.
> > Sometimes, we've seen that the mount program tries to load previously installed
> > file system even though mkfs.f2fs was conducted.
> > Would you recommend something?
> >
>
> I thought that superblock's placement is defined always from design
> point of view. :-)
>
> As I understand, usually many filesystems places first superblock on
> 1024 bytes distance from a volume beginning. It reserves some place for
> possibility to have volume boot record.
>
> I am afraid that Android recovery failure takes place because of
> in-proper configuration or, maybe, some special Android recovery code's
> modification. Does it means that it is not possible to use, for example,
> ext4 or nilfs2 under Android?
>
> Yes, you are right, the mount procedure try to detect a filesystem type.
> But superblock can place in different locations. For example, FAT's
> superblock hasn't any offset from the volume begin; hfs+ superblock is
> located on 1024 bytes from volume begin; reiserfs superblock is located
> on 64 KB from volume begin. The situation when mount utility tries to
> load another filesystem instead of f2fs is a symptom of mkfs.f2fs bug,
> from my point of view.
>
> I think that it makes sense to have as minimum 1024 bytes superblock's
> offset from a volume begin.

Thank you.
I'll try a 1024-byte offset, on Android as well.

>
> [snip]
> > > > +
> > > > +/*
> > > > + * For superblock
> > > > + */
> > > > +struct f2fs_super_block {
> > > > + __le32 magic; /* Magic Number */
> > > > + __le16 major_ver; /* Major Version */
> > > > + __le16 minor_ver; /* Minor Version */
> > > > + __le32 log_sectorsize; /* log2 (Sector size in bytes) */
> > > > + __le32 log_sectors_per_block; /* log2 (Number of sectors per block */
> > > > + __le32 log_blocksize; /* log2 (Block size in bytes) */
> > > > + __le32 log_blocks_per_seg; /* log2 (Number of blocks per segment) */
> > >
> > > From my point of view, __le32 is big data type for log2 (<value>). What
> > > do you think?
> > >
> >
> > Right, but it is superblock. Should we have to consider space overhead?
> >
>
> I simply think that __le16 can be enough. So, all four fields
> (log_sectorsize, log_sectors_per_block, log_blocksize,
> log_blocks_per_seg) will occupy 8 bytes instead of 16. And this place (8
> bytes) can be used for volume serial number field.
>

Following your earlier suggestion, I have already added a uuid to the
superblock. At this point, we can still change the on-disk layout freely. :)
Thanks,

> With the best regards,
> Vyacheslav Dubeyko.

--
Jaegeuk Kim
Samsung

2012-10-27 13:58:51

by Viacheslav Dubeyko

Subject: Re: [PATCH 03/16 v2] f2fs: add superblock and major in-memory structure


On Oct 23, 2012, at 6:26 AM, Jaegeuk Kim wrote:

> This adds the following major in-memory structures in f2fs.
>
> - f2fs_sb_info:
> contains f2fs-specific information, two special inode pointers for node and
> meta address spaces, and orphan inode management.
>
> - f2fs_inode_info:
> contains vfs_inode and other fs-specific information.
>
> - f2fs_nm_info:
> contains node manager information such as NAT entry cache, free nid list,
> and NAT page management.
>
> - f2fs_node_info:
> represents a node as node id, inode number, block address, and its version.
>
> - f2fs_sm_info:
> contains segment manager information such as SIT entry cache, free segment
> map, current active logs, dirty segment management, and segment utilization.
> The specific structures are sit_info, free_segmap_info, dirty_seglist_info,
> curseg_info.
>
> Signed-off-by: Chul Lee <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/f2fs.h | 982 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/f2fs/node.h | 330 ++++++++++++++++++
> fs/f2fs/segment.h | 594 ++++++++++++++++++++++++++++++++
> 3 files changed, 1906 insertions(+)
> create mode 100644 fs/f2fs/f2fs.h
> create mode 100644 fs/f2fs/node.h
> create mode 100644 fs/f2fs/segment.h
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> new file mode 100644
> index 0000000..bbe2f02
> --- /dev/null
> +++ b/fs/f2fs/f2fs.h
> @@ -0,0 +1,982 @@
> +/**
> + * fs/f2fs/f2fs.h
> + *
> + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com/
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#ifndef _LINUX_F2FS_H
> +#define _LINUX_F2FS_H
> +
> +#include <linux/types.h>
> +#include <linux/page-flags.h>
> +#include <linux/buffer_head.h>
> +#include <linux/version.h>
> +#include <linux/slab.h>
> +#include <linux/crc32.h>
> +
> +/**
> + * For mount options
> + */
> +#define F2FS_MOUNT_BG_GC 0x00000001
> +#define F2FS_MOUNT_DISABLE_ROLL_FORWARD 0x00000002
> +#define F2FS_MOUNT_DISCARD 0x00000004
> +#define F2FS_MOUNT_NOHEAP 0x00000008
> +#define F2FS_MOUNT_XATTR_USER 0x00000010
> +#define F2FS_MOUNT_POSIX_ACL 0x00000020
> +#define F2FS_MOUNT_DISABLE_EXT_IDENTIFY 0x00000040
> +
> +#define clear_opt(sbi, option) (sbi->mount_opt.opt &= ~F2FS_MOUNT_##option)
> +#define set_opt(sbi, option) (sbi->mount_opt.opt |= F2FS_MOUNT_##option)
> +#define test_opt(sbi, option) (sbi->mount_opt.opt & F2FS_MOUNT_##option)
> +
> +#define ver_after(a, b) (typecheck(unsigned long long, a) && \
> + typecheck(unsigned long long, b) && \
> + ((long long)((a) - (b)) > 0))
> +
> +typedef u64 block_t;
> +typedef u32 nid_t;
> +
> +struct f2fs_mount_info {
> + unsigned int opt;
> +};
> +
> +static inline __u32 f2fs_crc32(void *buff, size_t len)
> +{
> + return crc32_le(F2FS_SUPER_MAGIC, buff, len);
> +}
> +
> +static inline bool f2fs_crc_valid(__u32 blk_crc, void *buff, size_t buff_size)
> +{
> + return f2fs_crc32(buff, buff_size) == blk_crc;
> +}
> +
> +/**
> + * For checkpoint manager
> + */
> +#define CP_ERROR_FLAG 0x00000008
> +#define CP_COMPACT_SUM_FLAG 0x00000004
> +#define CP_ORPHAN_PRESENT_FLAG 0x00000002
> +#define CP_UMOUNT_FLAG 0x00000001
> +
> +enum {
> + NAT_BITMAP,
> + SIT_BITMAP
> +};
> +
> +struct orphan_inode_entry {
> + struct list_head list;
> + nid_t ino;
> +};
> +
> +struct dir_inode_entry {
> + struct list_head list;
> + struct inode *inode;
> +};
> +
> +struct fsync_inode_entry {
> + struct list_head list;
> + struct inode *inode;
> + block_t blkaddr;
> +};
> +
> +#define nats_in_cursum(sum) (le16_to_cpu(sum->n_nats))
> +#define sits_in_cursum(sum) (le16_to_cpu(sum->n_sits))
> +
> +#define nat_in_journal(sum, i) (sum->nat_j.entries[i].ne)
> +#define nid_in_journal(sum, i) (sum->nat_j.entries[i].nid)
> +#define sit_in_journal(sum, i) (sum->sit_j.entries[i].se)
> +#define segno_in_journal(sum, i) (sum->sit_j.entries[i].segno)
> +
> +static inline int update_nats_in_cursum(struct f2fs_summary_block *rs, int i)
> +{
> + int before = nats_in_cursum(rs);
> + rs->n_nats = cpu_to_le16(before + i);
> + return before;
> +}
> +
> +static inline int update_sits_in_cursum(struct f2fs_summary_block *rs, int i)
> +{
> + int before = sits_in_cursum(rs);
> + rs->n_sits = cpu_to_le16(before + i);
> + return before;
> +}
> +
> +/**
> + * For INODE and NODE manager
> + */
> +#define XATTR_NODE_OFFSET (-1)

This is a really strange declaration. Why is the offset equal to -1?

> +#define RDONLY_NODE 1
> +#define F2FS_LINK_MAX 32000
> +
> +struct extent_info {
> + rwlock_t ext_lock;
> + unsigned int fofs;
> + u32 blk_addr;
> + unsigned int len;
> +};
> +
> +#define FADVISE_COLD_BIT 0x01
> +/*
> + * i_advise uses FADVISE_XXX_BIT. We can add additional hints later.
> + */

Maybe it makes sense to put this comment before the FADVISE_COLD_BIT declaration. Placed directly before struct f2fs_inode_info, such a comment can be confusing.

> +struct f2fs_inode_info {
> + struct inode vfs_inode;
> + unsigned long i_flags;
> + unsigned long flags;
> + unsigned long long data_version;
> + atomic_t dirty_dents;
> + unsigned int current_depth;
> + f2fs_hash_t chash;
> + unsigned int clevel;
> + nid_t i_xattr_nid;
> + struct extent_info ext;
> + umode_t i_acl_mode;
> + unsigned char i_advise; /* If true, this is cold data */
> +};

From my point of view, it is very important to have descriptive comments for every key structure. But, as far as I can see, many key structures have no comments at all.

> +
> +static inline void get_extent_info(struct extent_info *ext,
> + struct f2fs_extent i_ext)
> +{
> + write_lock(&ext->ext_lock);
> + ext->fofs = le32_to_cpu(i_ext.fofs);
> + ext->blk_addr = le32_to_cpu(i_ext.blk_addr);
> + ext->len = le32_to_cpu(i_ext.len);
> + write_unlock(&ext->ext_lock);
> +}
> +
> +static inline void set_raw_extent(struct extent_info *ext,
> + struct f2fs_extent *i_ext)
> +{
> + read_lock(&ext->ext_lock);
> + i_ext->fofs = cpu_to_le32(ext->fofs);
> + i_ext->blk_addr = cpu_to_le32(ext->blk_addr);
> + i_ext->len = cpu_to_le32(ext->len);
> + read_unlock(&ext->ext_lock);
> +}
> +
> +struct f2fs_nm_info {
> + block_t nat_blkaddr; /* base disk address of NAT */
> + unsigned int nat_segs; /* the number of nat segments */
> + unsigned int nat_blocks; /* the number of nat blocks of
> + one size */
> + nid_t max_nid;
> + unsigned int nat_cnt; /* the number of nodes in NAT Buffer */
> + struct radix_tree_root nat_root;
> + rwlock_t nat_tree_lock; /* Protect nat_tree_lock */
> + struct list_head nat_entries; /* cached nat entry list (clean) */
> + struct list_head dirty_nat_entries; /* cached nat entry list (dirty) */
> +
> + unsigned int fcnt; /* the number of free node id */
> + struct mutex build_lock; /* lock for build free nids */
> +
> + int nat_upd_blkoff[3]; /* Block offset
> + in the current journal segment
> + where the last NAT update happened */
> + int lst_upd_blkoff[3]; /* Block offset
> + in current journal segment */
> +
> + unsigned int written_valid_node_count;
> + unsigned int written_valid_inode_count;
> + char *nat_bitmap; /* NAT bitmap pointer */
> + int bitmap_size; /* bitmap size */
> +
> + nid_t init_scan_nid; /* the first nid to be scanned */
> + nid_t next_scan_nid; /* the next nid to be scanned */
> + struct list_head free_nid_list;
> + spinlock_t free_nid_list_lock; /* Protect free nid list */
> +};
> +
> +struct dnode_of_data {
> + struct inode *inode;
> + struct page *inode_page;
> + struct page *node_page;
> + nid_t nid;
> + unsigned int ofs_in_node;
> + bool inode_page_locked;
> + block_t data_blkaddr;
> +};
> +
> +static inline void set_new_dnode(struct dnode_of_data *dn, struct inode *inode,
> + struct page *ipage, struct page *npage, nid_t nid)
> +{
> + dn->inode = inode;
> + dn->inode_page = ipage;
> + dn->node_page = npage;
> + dn->nid = nid;
> + dn->inode_page_locked = 0;
> +}
> +
> +/**
> + * For SIT manager
> + */
> +#define NR_CURSEG_DATA_TYPE (3)
> +#define NR_CURSEG_NODE_TYPE (3)
> +#define NR_CURSEG_TYPE (NR_CURSEG_DATA_TYPE + NR_CURSEG_NODE_TYPE)

It is really confusing to see two declarations with equal values. Why are they not different?

I think the purpose of these declarations needs a comment here, because it is not easy to understand why NR_CURSEG_TYPE has this value.
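
As a sketch of the kind of comment meant here, the declarations could be documented roughly as follows. The hot/warm/cold explanation is inferred from the names of the enum constants in the patch, not confirmed by the author, and the two compile-time checks are additions for illustration only.

```c
#include <assert.h>

/*
 * Active logs are split by content (data vs. node) and, judging by the
 * enum names, by temperature (hot, warm, cold). Each content kind
 * therefore gets three current segments, six in total.
 */
#define NR_CURSEG_DATA_TYPE	(3)	/* hot, warm and cold data logs */
#define NR_CURSEG_NODE_TYPE	(3)	/* hot, warm and cold node logs */
#define NR_CURSEG_TYPE	(NR_CURSEG_DATA_TYPE + NR_CURSEG_NODE_TYPE)

enum {
	CURSEG_HOT_DATA = 0,
	CURSEG_WARM_DATA,
	CURSEG_COLD_DATA,
	CURSEG_HOT_NODE,
	CURSEG_WARM_NODE,
	CURSEG_COLD_NODE,
	NO_CHECK_TYPE
};

/* Illustrative checks tying the counts to the enum layout. */
static_assert(CURSEG_HOT_NODE == NR_CURSEG_DATA_TYPE,
	      "node logs start right after the data logs");
static_assert(NO_CHECK_TYPE == NR_CURSEG_TYPE,
	      "the sentinel equals the number of log types");
```

With such a comment, the equal values read as a consequence of the three temperature classes rather than as a coincidence.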


> +
> +enum {
> + CURSEG_HOT_DATA = 0,
> + CURSEG_WARM_DATA,
> + CURSEG_COLD_DATA,
> + CURSEG_HOT_NODE,
> + CURSEG_WARM_NODE,
> + CURSEG_COLD_NODE,
> + NO_CHECK_TYPE
> +};
> +
> +struct f2fs_sm_info {
> + /* SIT information */
> + struct sit_info *sit_info;
> +
> + /* Free segmap infomation */
> + struct free_segmap_info *free_info;
> +
> + /* Dirty segments list information for GC victim */
> + struct dirty_seglist_info *dirty_info;
> +
> + /* Current working segments(i.e. logging point) information array */
> + struct curseg_info *curseg_array;
> +
> + /* list head of all under-writeback pages for flush handling */
> + struct list_head wblist_head;
> + spinlock_t wblist_lock;
> +
> + block_t seg0_blkaddr;
> + block_t main_blkaddr;
> + unsigned int segment_count;
> + unsigned int rsvd_segment_count;
> + unsigned int main_segment_count;
> + block_t ssa_blkaddr;
> + unsigned int segment_count_ssa;
> +};
> +
> +/**
> + * For Garbage Collection
> + */
> +struct f2fs_gc_info {
> +#ifdef CONFIG_F2FS_STAT_FS
> + struct list_head stat_list;
> + struct f2fs_stat_info *stat_info;
> +#endif
> + int cause;
> + int rsvd_segment_count;
> + int overp_segment_count;
> +};
> +
> +/**
> + * For directory operation
> + */
> +#define F2FS_INODE_SIZE (17 * 4 + F2FS_MAX_NAME_LEN)
> +#define NODE_DIR1_BLOCK (ADDRS_PER_INODE + 1)
> +#define NODE_DIR2_BLOCK (ADDRS_PER_INODE + 2)
> +#define NODE_IND1_BLOCK (ADDRS_PER_INODE + 3)
> +#define NODE_IND2_BLOCK (ADDRS_PER_INODE + 4)
> +#define NODE_DIND_BLOCK (ADDRS_PER_INODE + 5)
> +

I think it is really hard to understand why F2FS_INODE_SIZE is calculated as (17 * 4 + F2FS_MAX_NAME_LEN). What does 17 * 4 mean?
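
One way to avoid a hand-counted constant like this is to derive the size from the layout itself. The sketch below uses a deliberately small, hypothetical structure (demo_inode, with made-up fields and name length); it does not reproduce the real f2fs inode layout or explain what the 17 fields in the patch are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_NAME_LEN 256	/* hypothetical; not the patch's value */

/* Hypothetical on-disk record: fixed 32-bit fields followed by a name. */
struct demo_inode {
	uint32_t mode;
	uint32_t uid;
	uint32_t gid;
	uint32_t links;
	uint8_t  name[DEMO_MAX_NAME_LEN];
};

/* Size of the fixed part, taken from the layout instead of counted by hand. */
#define DEMO_FIXED_SIZE	offsetof(struct demo_inode, name)
#define DEMO_INODE_SIZE	(DEMO_FIXED_SIZE + DEMO_MAX_NAME_LEN)

static_assert(DEMO_FIXED_SIZE == 4 * sizeof(uint32_t),
	      "four 32-bit fields precede the name");
```

If a field is added or removed, the derived size follows automatically, and the reader can see where the number comes from.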

> +/**
> + * For superblock
> + */
> +enum count_type {
> + F2FS_WRITEBACK,
> + F2FS_DIRTY_DENTS,
> + F2FS_DIRTY_NODES,
> + F2FS_DIRTY_META,
> + NR_COUNT_TYPE,
> +};
> +
> +/*
> + * FS_LOCK nesting subclasses for the lock validator:
> + *
> + * The locking order between these classes is
> + * RENAME -> DENTRY_OPS -> DATA_WRITE -> DATA_NEW
> + * -> DATA_TRUNC -> NODE_WRITE -> NODE_NEW -> NODE_TRUNC
> + */
> +enum lock_type {
> + RENAME, /* for renaming operations */
> + DENTRY_OPS, /* for directory operations */
> + DATA_WRITE, /* for data write */
> + DATA_NEW, /* for data allocation */
> + DATA_TRUNC, /* for data truncate */
> + NODE_NEW, /* for node allocation */
> + NODE_TRUNC, /* for node truncate */
> + NODE_WRITE, /* for node write */
> + NR_LOCK_TYPE,
> +};
> +
> +/*
> + * The below are the page types of bios used in submti_bio().
> + * The available types are:
> + * DATA User data pages. It operates as async mode.
> + * NODE Node pages. It operates as async mode.
> + * META FS metadata pages such as SIT, NAT, CP.
> + * NR_PAGE_TYPE The number of page types.
> + * META_FLUSH Make sure the previous pages are written
> + * with waiting the bio's completion
> + * ... Only can be used with META.
> + */
> +enum page_type {
> + DATA,
> + NODE,
> + META,
> + NR_PAGE_TYPE,
> + META_FLUSH,
> +};
> +
> +struct f2fs_sb_info {
> + struct super_block *sb; /* Pointer to VFS super block */
> + int s_dirty;
> + struct f2fs_super_block *raw_super; /* Pointer to the super block
> + in the buffer */
> + struct buffer_head *raw_super_buf; /* Buffer containing
> + the f2fs raw super block */
> + struct f2fs_checkpoint *ckpt; /* Pointer to the checkpoint
> + in the buffer */
> + struct mutex orphan_inode_mutex;
> + spinlock_t dir_inode_lock;
> + struct mutex cp_mutex;
> + /* orphan Inode list to be written in Journal block during CP */
> + struct list_head orphan_inode_list;
> + struct list_head dir_inode_list;
> + unsigned int n_orphans, n_dirty_dirs;
> +
> + unsigned int log_sectorsize;
> + unsigned int log_sectors_per_block;
> + unsigned int log_blocksize;
> + unsigned int blocksize;
> + unsigned int root_ino_num; /* Root Inode Number*/
> + unsigned int node_ino_num; /* Root Inode Number*/
> + unsigned int meta_ino_num; /* Root Inode Number*/
> + unsigned int log_blocks_per_seg;
> + unsigned int blocks_per_seg;
> + unsigned int segs_per_sec;
> + unsigned int secs_per_zone;
> + unsigned int total_sections;
> + unsigned int total_node_count;
> + unsigned int total_valid_node_count;
> + unsigned int total_valid_inode_count;
> + unsigned int segment_count[2];
> + unsigned int block_count[2];
> + unsigned int last_victim[2];
> + int active_logs;
> + block_t user_block_count;
> + block_t total_valid_block_count;
> + block_t alloc_valid_block_count;
> + block_t last_valid_block_count;
> + atomic_t nr_pages[NR_COUNT_TYPE];
> +
> + struct f2fs_mount_info mount_opt;
> +
> + /* related to NM */
> + struct f2fs_nm_info *nm_info; /* Node Manager information */
> +
> + /* related to SM */
> + struct f2fs_sm_info *sm_info; /* Segment Manager
> + information */
> + int total_hit_ext, read_hit_ext;
> + int rr_flush;
> +
> + /* related to GC */
> + struct proc_dir_entry *s_proc;
> + struct f2fs_gc_info *gc_info; /* Garbage Collector
> + information */
> + struct mutex gc_mutex; /* mutex for GC */
> + struct mutex fs_lock[NR_LOCK_TYPE]; /* mutex for GP */
> + struct mutex write_inode; /* mutex for write inode */
> + struct mutex writepages; /* mutex for writepages() */
> + struct f2fs_gc_kthread *gc_thread; /* GC thread */
> + int bg_gc;
> + int last_gc_status;
> + int por_doing;
> +
> + struct inode *node_inode;
> + struct inode *meta_inode;
> +
> + struct bio *bio[NR_PAGE_TYPE];
> + sector_t last_block_in_bio[NR_PAGE_TYPE];
> + struct rw_semaphore bio_sem;
> + spinlock_t stat_lock; /* lock for handling the number
> + of valid blocks and
> + valid nodes */
> +};
> +
> +/**
> + * Inline functions
> + */
> +static inline struct f2fs_inode_info *F2FS_I(struct inode *inode)
> +{
> + return container_of(inode, struct f2fs_inode_info, vfs_inode);
> +}
> +
> +static inline struct f2fs_sb_info *F2FS_SB(struct super_block *sb)
> +{
> + return sb->s_fs_info;
> +}
> +
> +static inline struct f2fs_super_block *F2FS_RAW_SUPER(struct f2fs_sb_info *sbi)
> +{
> + return (struct f2fs_super_block *)(sbi->raw_super);
> +}
> +
> +static inline struct f2fs_checkpoint *F2FS_CKPT(struct f2fs_sb_info *sbi)
> +{
> + return (struct f2fs_checkpoint *)(sbi->ckpt);
> +}
> +
> +static inline struct f2fs_nm_info *NM_I(struct f2fs_sb_info *sbi)
> +{
> + return (struct f2fs_nm_info *)(sbi->nm_info);
> +}
> +
> +static inline struct f2fs_sm_info *SM_I(struct f2fs_sb_info *sbi)
> +{
> + return (struct f2fs_sm_info *)(sbi->sm_info);
> +}
> +
> +static inline struct sit_info *SIT_I(struct f2fs_sb_info *sbi)
> +{
> + return (struct sit_info *)(SM_I(sbi)->sit_info);
> +}
> +
> +static inline struct free_segmap_info *FREE_I(struct f2fs_sb_info *sbi)
> +{
> + return (struct free_segmap_info *)(SM_I(sbi)->free_info);
> +}
> +
> +static inline struct dirty_seglist_info *DIRTY_I(struct f2fs_sb_info *sbi)
> +{
> + return (struct dirty_seglist_info *)(SM_I(sbi)->dirty_info);
> +}
> +
> +static inline void F2FS_SET_SB_DIRT(struct f2fs_sb_info *sbi)
> +{
> + sbi->s_dirty = 1;
> +}
> +
> +static inline void F2FS_RESET_SB_DIRT(struct f2fs_sb_info *sbi)
> +{
> + sbi->s_dirty = 0;
> +}
> +
> +static inline void mutex_lock_op(struct f2fs_sb_info *sbi, enum lock_type t)
> +{
> + mutex_lock_nested(&sbi->fs_lock[t], t);
> +}
> +
> +static inline void mutex_unlock_op(struct f2fs_sb_info *sbi, enum lock_type t)
> +{
> + mutex_unlock(&sbi->fs_lock[t]);
> +}
> +
> +/**
> + * Check whether the given nid is within node id range.
> + */
> +static inline void check_nid_range(struct f2fs_sb_info *sbi, nid_t nid)
> +{
> + BUG_ON((nid >= NM_I(sbi)->max_nid));
> +}
> +
> +#define F2FS_DEFAULT_ALLOCATED_BLOCKS 1
> +
> +/**
> + * Check whether the inode has blocks or not
> + */
> +static inline int F2FS_HAS_BLOCKS(struct inode *inode)
> +{
> + if (F2FS_I(inode)->i_xattr_nid)
> + return (inode->i_blocks > F2FS_DEFAULT_ALLOCATED_BLOCKS + 1);
> + else
> + return (inode->i_blocks > F2FS_DEFAULT_ALLOCATED_BLOCKS);
> +}
> +
> +static inline bool inc_valid_block_count(struct f2fs_sb_info *sbi,
> + struct inode *inode, blkcnt_t count)
> +{
> + block_t valid_block_count;
> +
> + spin_lock(&sbi->stat_lock);
> + valid_block_count =
> + sbi->total_valid_block_count + (block_t)count;
> + if (valid_block_count > sbi->user_block_count) {
> + spin_unlock(&sbi->stat_lock);
> + return false;
> + }
> + inode->i_blocks += count;
> + sbi->total_valid_block_count = valid_block_count;
> + sbi->alloc_valid_block_count += (block_t)count;
> + spin_unlock(&sbi->stat_lock);
> + return true;
> +}
> +
> +static inline int dec_valid_block_count(struct f2fs_sb_info *sbi,
> + struct inode *inode,
> + blkcnt_t count)
> +{
> + spin_lock(&sbi->stat_lock);
> + BUG_ON(sbi->total_valid_block_count < (block_t) count);
> + BUG_ON(inode->i_blocks < count);
> + inode->i_blocks -= count;
> + sbi->total_valid_block_count -= (block_t)count;
> + spin_unlock(&sbi->stat_lock);
> + return 0;
> +}
> +
> +static inline void inc_page_count(struct f2fs_sb_info *sbi, int count_type)
> +{
> + atomic_inc(&sbi->nr_pages[count_type]);
> + F2FS_SET_SB_DIRT(sbi);
> +}
> +
> +static inline void inode_inc_dirty_dents(struct inode *inode)
> +{
> + atomic_inc(&F2FS_I(inode)->dirty_dents);
> +}
> +
> +static inline void dec_page_count(struct f2fs_sb_info *sbi, int count_type)
> +{
> + atomic_dec(&sbi->nr_pages[count_type]);
> +}
> +
> +static inline void inode_dec_dirty_dents(struct inode *inode)
> +{
> + atomic_dec(&F2FS_I(inode)->dirty_dents);
> +}
> +
> +static inline int get_pages(struct f2fs_sb_info *sbi, int count_type)
> +{
> + return atomic_read(&sbi->nr_pages[count_type]);
> +}
> +
> +static inline block_t valid_user_blocks(struct f2fs_sb_info *sbi)
> +{
> + block_t ret;
> + spin_lock(&sbi->stat_lock);
> + ret = sbi->total_valid_block_count;
> + spin_unlock(&sbi->stat_lock);
> + return ret;
> +}
> +
> +static inline unsigned long __bitmap_size(struct f2fs_sb_info *sbi, int flag)
> +{
> + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
> +
> + /* return NAT or SIT bitmap */
> + if (flag == NAT_BITMAP)
> + return le32_to_cpu(ckpt->nat_ver_bitmap_bytesize);
> + else if (flag == SIT_BITMAP)
> + return le32_to_cpu(ckpt->sit_ver_bitmap_bytesize);
> +
> + return 0;
> +}
> +
> +static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
> +{
> + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
> + int offset = (flag == NAT_BITMAP) ? ckpt->sit_ver_bitmap_bytesize : 0;
> + return &ckpt->sit_nat_version_bitmap + offset;
> +}
> +
> +static inline block_t __start_cp_addr(struct f2fs_sb_info *sbi)
> +{
> + block_t start_addr;
> + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
> + unsigned long long ckpt_version = le64_to_cpu(ckpt->checkpoint_ver);
> +
> + start_addr = le64_to_cpu(F2FS_RAW_SUPER(sbi)->start_segment_checkpoint);
> +
> + /*
> + * odd numbered checkpoint shoukd at cp segment 0

"shoukd" should be corrected to "should".

> + * and even segent must be at cp segment 1
> + */
> + if (!(ckpt_version & 1))
> + start_addr += sbi->blocks_per_seg;
> +
> + return start_addr;
> +}
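
The parity rule in __start_cp_addr() above can be restated as a small userspace sketch. The function name cp_start_addr and its plain parameters are stand-ins for the superblock and checkpoint fields used in the patch; the selection logic itself is copied from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t block_t;	/* mirrors "typedef u64 block_t" in the patch */

static_assert(sizeof(block_t) == 8, "block addresses are 64-bit here");

/*
 * Odd checkpoint versions live in the first checkpoint segment,
 * even versions in the second one.
 */
static block_t cp_start_addr(block_t cp_base, unsigned int blocks_per_seg,
			     unsigned long long ckpt_version)
{
	block_t start_addr = cp_base;

	if (!(ckpt_version & 1))
		start_addr += blocks_per_seg;

	return start_addr;
}
```

For example, with a checkpoint area starting at block 512 and 512 blocks per segment, versions 1 and 3 map to block 512 and version 2 maps to block 1024.
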
> +
> +static inline block_t __start_sum_addr(struct f2fs_sb_info *sbi)
> +{
> + return le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_start_sum);
> +}
> +
> +static inline bool inc_valid_node_count(struct f2fs_sb_info *sbi,
> + struct inode *inode,
> + unsigned int count)
> +{
> + block_t valid_block_count;
> + unsigned int valid_node_count;
> +
> + spin_lock(&sbi->stat_lock);
> +
> + valid_block_count = sbi->total_valid_block_count + (block_t)count;
> + sbi->alloc_valid_block_count += (block_t)count;
> + valid_node_count = sbi->total_valid_node_count + count;
> +
> + if (valid_block_count > sbi->user_block_count) {
> + spin_unlock(&sbi->stat_lock);
> + return false;
> + }
> +
> + if (valid_node_count > sbi->total_node_count) {
> + spin_unlock(&sbi->stat_lock);
> + return false;
> + }
> +
> + if (inode)
> + inode->i_blocks += count;
> + sbi->total_valid_node_count = valid_node_count;
> + sbi->total_valid_block_count = valid_block_count;
> + spin_unlock(&sbi->stat_lock);
> +
> + return true;
> +}
> +
> +static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
> + struct inode *inode,
> + unsigned int count)
> +{
> + spin_lock(&sbi->stat_lock);
> +
> + BUG_ON(sbi->total_valid_block_count < count);
> + BUG_ON(sbi->total_valid_node_count < count);
> + BUG_ON(inode->i_blocks < count);
> +
> + inode->i_blocks -= count;
> + sbi->total_valid_node_count -= count;
> + sbi->total_valid_block_count -= (block_t)count;
> +
> + spin_unlock(&sbi->stat_lock);
> +}
> +
> +static inline unsigned int valid_node_count(struct f2fs_sb_info *sbi)
> +{
> + unsigned int ret;
> + spin_lock(&sbi->stat_lock);
> + ret = sbi->total_valid_node_count;
> + spin_unlock(&sbi->stat_lock);
> + return ret;
> +}
> +
> +static inline void inc_valid_inode_count(struct f2fs_sb_info *sbi)
> +{
> + spin_lock(&sbi->stat_lock);
> + BUG_ON(sbi->total_valid_inode_count == sbi->total_node_count);
> + sbi->total_valid_inode_count++;
> + spin_unlock(&sbi->stat_lock);
> +}
> +
> +static inline int dec_valid_inode_count(struct f2fs_sb_info *sbi)
> +{
> + spin_lock(&sbi->stat_lock);
> + BUG_ON(!sbi->total_valid_inode_count);
> + sbi->total_valid_inode_count--;
> + spin_unlock(&sbi->stat_lock);
> + return 0;
> +}
> +
> +static inline unsigned int valid_inode_count(struct f2fs_sb_info *sbi)
> +{
> + unsigned int ret;
> + spin_lock(&sbi->stat_lock);
> + ret = sbi->total_valid_inode_count;
> + spin_unlock(&sbi->stat_lock);
> + return ret;
> +}
> +
> +static inline void f2fs_put_page(struct page *page, int unlock)
> +{
> + if (!page || IS_ERR(page))
> + return;
> +
> + if (unlock) {
> + BUG_ON(!PageLocked(page));
> + unlock_page(page);
> + }
> + page_cache_release(page);
> +}
> +
> +static inline void f2fs_put_dnode(struct dnode_of_data *dn)
> +{
> + if (dn->node_page)
> + f2fs_put_page(dn->node_page, 1);
> + if (dn->inode_page && dn->node_page != dn->inode_page)
> + f2fs_put_page(dn->inode_page, 0);
> + dn->node_page = NULL;
> + dn->inode_page = NULL;
> +}
> +
> +static inline struct kmem_cache *f2fs_kmem_cache_create(const char *name,
> + size_t size, void (*ctor)(void *))
> +{
> + return kmem_cache_create(name, size, 0, SLAB_RECLAIM_ACCOUNT, ctor);
> +}
> +
> +#define RAW_IS_INODE(p) ((p)->footer.nid == (p)->footer.ino)
> +
> +static inline bool IS_INODE(struct page *page)
> +{
> + struct f2fs_node *p = (struct f2fs_node *)page_address(page);
> + return RAW_IS_INODE(p);
> +}
> +
> +static inline __le32 *blkaddr_in_node(struct f2fs_node *node)
> +{
> + return RAW_IS_INODE(node) ? node->i.i_addr : node->dn.addr;
> +}
> +
> +static inline block_t datablock_addr(struct page *node_page,
> + unsigned int offset)
> +{
> + struct f2fs_node *raw_node;
> + __le32 *addr_array;
> + raw_node = (struct f2fs_node *)page_address(node_page);
> + addr_array = blkaddr_in_node(raw_node);
> + return le32_to_cpu(addr_array[offset]);
> +}
> +
> +static inline int f2fs_test_bit(unsigned int nr, char *addr)
> +{
> + int mask;
> +
> + addr += (nr >> 3);
> + mask = 1 << (7 - (nr & 0x07));
> + return mask & *addr;
> +}
> +
> +static inline int f2fs_set_bit(unsigned int nr, char *addr)
> +{
> + int mask;
> + int ret;
> +
> + addr += (nr >> 3);
> + mask = 1 << (7 - (nr & 0x07));
> + ret = mask & *addr;
> + *addr |= mask;
> + return ret;
> +}
> +
> +static inline int f2fs_clear_bit(unsigned int nr, char *addr)
> +{
> + int mask;
> + int ret;
> +
> + addr += (nr >> 3);
> + mask = 1 << (7 - (nr & 0x07));
> + ret = mask & *addr;
> + *addr &= ~mask;
> + return ret;
> +}
> +
> +enum {
> + FI_NEW_INODE,
> + FI_NEED_CP,
> + FI_INC_LINK,
> + FI_ACL_MODE,
> + FI_NO_ALLOC,
> +};

I think these declarations need some comments.
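
A sketch of what such comments could look like is given below. The descriptions are guesses based on the flag names and on how set_acl_inode() uses FI_ACL_MODE in this patch; they are not confirmed by the author. FI_NR_FLAGS and the demo_* helpers are hypothetical, non-atomic userspace stand-ins for the kernel bit operations.

```c
#include <assert.h>

/* Bit positions within the per-inode "flags" word (meanings assumed). */
enum {
	FI_NEW_INODE,	/* inode was newly allocated (assumed) */
	FI_NEED_CP,	/* a checkpoint is needed for this inode (assumed) */
	FI_INC_LINK,	/* link count is being incremented (assumed) */
	FI_ACL_MODE,	/* i_acl_mode holds a mode set via set_acl_inode() */
	FI_NO_ALLOC,	/* block allocation is not allowed (assumed) */
	FI_NR_FLAGS	/* hypothetical sentinel, not in the patch */
};

static_assert(FI_NR_FLAGS <= 8 * sizeof(unsigned long),
	      "all flags must fit in one unsigned long");

/* Non-atomic stand-in for set_bit(). */
static void demo_set_flag(unsigned long *flags, int flag)
{
	*flags |= 1UL << flag;
}

/* Non-atomic stand-in for test_bit(). */
static int demo_test_flag(const unsigned long *flags, int flag)
{
	return (int)((*flags >> flag) & 1UL);
}

/* Non-atomic stand-in for clear_bit(). */
static void demo_clear_flag(unsigned long *flags, int flag)
{
	*flags &= ~(1UL << flag);
}
```

Because the constants are bit numbers rather than masks, a short comment per entry also makes it clear that they are meant to be passed to set_bit() and friends, not OR-ed together directly.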

> +
> +static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag)
> +{
> + set_bit(flag, &fi->flags);
> +}
> +
> +static inline int is_inode_flag_set(struct f2fs_inode_info *fi, int flag)
> +{
> + return test_bit(flag, &fi->flags);
> +}
> +
> +static inline void clear_inode_flag(struct f2fs_inode_info *fi, int flag)
> +{
> + clear_bit(flag, &fi->flags);
> +}
> +
> +static inline void set_acl_inode(struct f2fs_inode_info *fi, umode_t mode)
> +{
> + fi->i_acl_mode = mode;
> + set_inode_flag(fi, FI_ACL_MODE);
> +}
> +
> +static inline int cond_clear_inode_flag(struct f2fs_inode_info *fi, int flag)
> +{
> + if (is_inode_flag_set(fi, FI_ACL_MODE)) {
> + clear_inode_flag(fi, FI_ACL_MODE);
> + return 1;
> + }
> + return 0;
> +}
> +
> +/**
> + * file.c
> + */
> +int f2fs_sync_file(struct file *, loff_t, loff_t, int);
> +void truncate_data_blocks(struct dnode_of_data *);
> +void f2fs_truncate(struct inode *);
> +int f2fs_setattr(struct dentry *, struct iattr *);
> +int truncate_hole(struct inode *, pgoff_t, pgoff_t);
> +long f2fs_ioctl(struct file *, unsigned int, unsigned long);
> +
> +/**
> + * inode.c
> + */
> +void f2fs_set_inode_flags(struct inode *);
> +struct inode *f2fs_iget_nowait(struct super_block *, unsigned long);
> +struct inode *f2fs_iget(struct super_block *, unsigned long);
> +void update_inode(struct inode *, struct page *);
> +int f2fs_write_inode(struct inode *, struct writeback_control *);
> +void f2fs_evict_inode(struct inode *);
> +
> +/**
> + * dir.c
> + */
> +struct f2fs_dir_entry *f2fs_find_entry(struct inode *, struct qstr *,
> + struct page **);
> +struct f2fs_dir_entry *f2fs_parent_dir(struct inode *, struct page **);
> +void f2fs_set_link(struct inode *, struct f2fs_dir_entry *,
> + struct page *, struct inode *);
> +void init_dent_inode(struct dentry *, struct page *);
> +int f2fs_add_link(struct dentry *, struct inode *);
> +void f2fs_delete_entry(struct f2fs_dir_entry *, struct page *, struct inode *);
> +int f2fs_make_empty(struct inode *, struct inode *);
> +bool f2fs_empty_dir(struct inode *);
> +
> +/**
> + * super.c
> + */
> +int f2fs_sync_fs(struct super_block *, int);
> +
> +/**
> + * hash.c
> + */
> +f2fs_hash_t f2fs_dentry_hash(const char *, int);
> +
> +/**
> + * node.c
> + */
> +struct dnode_of_data;
> +struct node_info;
> +
> +int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
> +void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);
> +int get_dnode_of_data(struct dnode_of_data *, pgoff_t, int);
> +int truncate_inode_blocks(struct inode *, pgoff_t);
> +int remove_inode_page(struct inode *);
> +int new_inode_page(struct inode *, struct dentry *);
> +struct page *new_node_page(struct dnode_of_data *, unsigned int);
> +void ra_node_page(struct f2fs_sb_info *, nid_t);
> +struct page *get_node_page(struct f2fs_sb_info *, pgoff_t);
> +struct page *get_node_page_ra(struct page *, int);
> +void sync_inode_page(struct dnode_of_data *);
> +int sync_node_pages(struct f2fs_sb_info *, nid_t, struct writeback_control *);
> +bool alloc_nid(struct f2fs_sb_info *, nid_t *);
> +void alloc_nid_done(struct f2fs_sb_info *, nid_t);
> +void alloc_nid_failed(struct f2fs_sb_info *, nid_t);
> +void recover_node_page(struct f2fs_sb_info *, struct page *,
> + struct f2fs_summary *, struct node_info *, block_t);
> +int recover_inode_page(struct f2fs_sb_info *, struct page *);
> +int restore_node_summary(struct f2fs_sb_info *, unsigned int,
> + struct f2fs_summary_block *);
> +void flush_nat_entries(struct f2fs_sb_info *);
> +int build_node_manager(struct f2fs_sb_info *);
> +void destroy_node_manager(struct f2fs_sb_info *);
> +int create_node_manager_caches(void);
> +void destroy_node_manager_caches(void);
> +
> +/**
> + * segment.c
> + */
> +void f2fs_balance_fs(struct f2fs_sb_info *);
> +void invalidate_blocks(struct f2fs_sb_info *, block_t);
> +void locate_dirty_segment(struct f2fs_sb_info *, unsigned int);
> +void clear_prefree_segments(struct f2fs_sb_info *);
> +int npages_for_summary_flush(struct f2fs_sb_info *);
> +void allocate_new_segments(struct f2fs_sb_info *);
> +struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
> +struct bio *f2fs_bio_alloc(struct block_device *, sector_t, int, gfp_t);
> +void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool sync);
> +int write_meta_page(struct f2fs_sb_info *, struct page *,
> + struct writeback_control *);
> +void write_node_page(struct f2fs_sb_info *, struct page *, unsigned int,
> + block_t, block_t *);
> +void write_data_page(struct inode *, struct page *, struct dnode_of_data*,
> + block_t, block_t *);
> +void rewrite_data_page(struct f2fs_sb_info *, struct page *, block_t);
> +void recover_data_page(struct f2fs_sb_info *, struct page *,
> + struct f2fs_summary *, block_t, block_t);
> +void rewrite_node_page(struct f2fs_sb_info *, struct page *,
> + struct f2fs_summary *, block_t, block_t);
> +void write_data_summaries(struct f2fs_sb_info *, block_t);
> +void write_node_summaries(struct f2fs_sb_info *, block_t);
> +int lookup_journal_in_cursum(struct f2fs_summary_block *,
> + int, unsigned int, int);
> +void flush_sit_entries(struct f2fs_sb_info *);
> +int build_segment_manager(struct f2fs_sb_info *);
> +void reset_victim_segmap(struct f2fs_sb_info *);
> +void destroy_segment_manager(struct f2fs_sb_info *);
> +
> +/**
> + * checkpoint.c
> + */
> +struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
> +struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
> +long sync_meta_pages(struct f2fs_sb_info *, enum page_type, long);
> +int check_orphan_space(struct f2fs_sb_info *);
> +void add_orphan_inode(struct f2fs_sb_info *, nid_t);
> +void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
> +int recover_orphan_inodes(struct f2fs_sb_info *);
> +int get_valid_checkpoint(struct f2fs_sb_info *);
> +void set_dirty_dir_page(struct inode *, struct page *);
> +void remove_dirty_dir_inode(struct inode *);
> +void sync_dirty_dir_inodes(struct f2fs_sb_info *);
> +void block_operations(struct f2fs_sb_info *);
> +void write_checkpoint(struct f2fs_sb_info *, bool, bool);
> +void init_orphan_info(struct f2fs_sb_info *);
> +int create_checkpoint_caches(void);
> +void destroy_checkpoint_caches(void);
> +
> +/**
> + * data.c
> + */
> +int reserve_new_block(struct dnode_of_data *);
> +void update_extent_cache(block_t, struct dnode_of_data *);
> +struct page *find_data_page(struct inode *, pgoff_t);
> +struct page *get_lock_data_page(struct inode *, pgoff_t);
> +struct page *get_new_data_page(struct inode *, pgoff_t, bool);
> +int f2fs_readpage(struct f2fs_sb_info *, struct page *, block_t, int);
> +int do_write_data_page(struct page *);
> +
> +/**
> + * gc.c
> + */
> +int start_gc_thread(struct f2fs_sb_info *);
> +void stop_gc_thread(struct f2fs_sb_info *);
> +block_t start_bidx_of_node(unsigned int);
> +int f2fs_gc(struct f2fs_sb_info *, int);
> +#ifdef CONFIG_F2FS_STAT_FS
> +void f2fs_update_stat(struct f2fs_sb_info *);
> +void f2fs_update_gc_metric(struct f2fs_sb_info *);
> +int f2fs_stat_init(struct f2fs_sb_info *);
> +void f2fs_stat_exit(struct f2fs_sb_info *);
> +#endif
> +int build_gc_manager(struct f2fs_sb_info *);
> +void destroy_gc_manager(struct f2fs_sb_info *);
> +int create_gc_caches(void);
> +void destroy_gc_caches(void);
> +
> +/**
> + * recovery.c
> + */
> +void recover_fsync_data(struct f2fs_sb_info *);
> +bool space_for_roll_forward(struct f2fs_sb_info *);
> +
> +extern const struct file_operations f2fs_dir_operations;
> +extern const struct file_operations f2fs_file_operations;
> +extern const struct inode_operations f2fs_file_inode_operations;
> +extern const struct address_space_operations f2fs_dblock_aops;
> +extern const struct address_space_operations f2fs_node_aops;
> +extern const struct address_space_operations f2fs_meta_aops;
> +extern const struct inode_operations f2fs_dir_inode_operations;
> +extern const struct inode_operations f2fs_symlink_inode_operations;
> +extern const struct inode_operations f2fs_special_inode_operations;
> +#endif
> diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
> new file mode 100644
> index 0000000..99ac689
> --- /dev/null
> +++ b/fs/f2fs/node.h
> @@ -0,0 +1,330 @@
> +/**
> + * fs/f2fs/node.h
> + *
> + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com/
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#define START_NID(nid) ((nid / NAT_ENTRY_PER_BLOCK) * NAT_ENTRY_PER_BLOCK)
> +#define NAT_BLOCK_OFFSET(start_nid) (start_nid / NAT_ENTRY_PER_BLOCK)
> +
> +#define FREE_NID_PAGES 4
> +#define MAX_FREE_NIDS (NAT_ENTRY_PER_BLOCK * FREE_NID_PAGES)
> +
> +#define MAX_RA_NODE 128 /* Max. readahead size for node */
> +#define NM_WOUT_THRESHOLD (64 * NAT_ENTRY_PER_BLOCK)
> +#define NATVEC_SIZE 64
> +
> +/**
> + * For node information
> + */
> +struct node_info {
> + nid_t nid; /* node id */
> + nid_t ino; /* inode number of the node's owner */
> + block_t blk_addr; /* block address of the node */
> + unsigned char version; /* version of the node */
> +};
> +
> +static inline unsigned char inc_node_version(unsigned char version)
> +{
> + return ++version;
> +}
> +
> +struct nat_entry {
> + struct list_head list; /* for clean or dirty nat list */
> + bool checkpointed;
> + struct node_info ni;
> +};
> +
> +#define nat_get_nid(nat) (nat->ni.nid)
> +#define nat_set_nid(nat, n) (nat->ni.nid = n)
> +#define nat_get_blkaddr(nat) (nat->ni.blk_addr)
> +#define nat_set_blkaddr(nat, b) (nat->ni.blk_addr = b)
> +#define nat_get_ino(nat) (nat->ni.ino)
> +#define nat_set_ino(nat, i) (nat->ni.ino = i)
> +#define nat_get_version(nat) (nat->ni.version)
> +#define nat_set_version(nat, v) (nat->ni.version = v)
> +#define __set_nat_cache_dirty(nm_i, ne) \
> + list_move_tail(&ne->list, &nm_i->dirty_nat_entries);
> +#define __clear_nat_cache_dirty(nm_i, ne) \
> + list_move_tail(&ne->list, &nm_i->nat_entries);
> +
> +static inline void node_info_from_raw_nat(struct node_info *ni,
> + struct f2fs_nat_entry *raw_ne)
> +{
> + ni->ino = le32_to_cpu(raw_ne->ino);
> + ni->blk_addr = le32_to_cpu(raw_ne->block_addr);
> + ni->version = raw_ne->version;
> +}
> +
> +/**
> + * For free nid mangement
> + */
> +enum nid_state {
> + NID_NEW,
> + NID_ALLOC
> +};
> +
> +struct free_nid {
> + nid_t nid;
> + int state;
> + struct list_head list;
> +};
> +
> +static inline int next_free_nid(struct f2fs_sb_info *sbi, nid_t *nid)
> +{
> + struct f2fs_nm_info *nm_i = NM_I(sbi);
> + struct free_nid *fnid;
> +
> + if (nm_i->fcnt <= 0)
> + return -1;
> + spin_lock(&nm_i->free_nid_list_lock);
> + fnid = list_entry(nm_i->free_nid_list.next, struct free_nid, list);
> + *nid = fnid->nid;
> + spin_unlock(&nm_i->free_nid_list_lock);
> + return 0;
> +}
> +
> +/**
> + * inline functions
> + */
> +static inline void get_nat_bitmap(struct f2fs_sb_info *sbi, void *addr)
> +{
> + struct f2fs_nm_info *nm_i = NM_I(sbi);
> + memcpy(addr, nm_i->nat_bitmap, nm_i->bitmap_size);
> +}
> +
> +static inline pgoff_t current_nat_addr(struct f2fs_sb_info *sbi, nid_t start)
> +{
> + struct f2fs_nm_info *nm_i = NM_I(sbi);
> + pgoff_t block_off;
> + pgoff_t block_addr;
> + int seg_off;
> +
> + block_off = NAT_BLOCK_OFFSET(start);
> + seg_off = block_off >> sbi->log_blocks_per_seg;
> +
> + block_addr = (pgoff_t)(nm_i->nat_blkaddr +
> + (seg_off << sbi->log_blocks_per_seg << 1) +
> + (block_off & ((1 << sbi->log_blocks_per_seg) - 1)));
> +
> + if (f2fs_test_bit(block_off, nm_i->nat_bitmap))
> + block_addr += sbi->blocks_per_seg;
> +
> + return block_addr;
> +}
> +
> +static inline pgoff_t next_nat_addr(struct f2fs_sb_info *sbi,
> + pgoff_t block_addr)
> +{
> + struct f2fs_nm_info *nm_i = NM_I(sbi);
> +
> + block_addr -= nm_i->nat_blkaddr;
> + if ((block_addr >> sbi->log_blocks_per_seg) % 2)
> + block_addr -= sbi->blocks_per_seg;
> + else
> + block_addr += sbi->blocks_per_seg;
> +
> + return block_addr + nm_i->nat_blkaddr;
> +}
> +
> +static inline void set_to_next_nat(struct f2fs_nm_info *nm_i, nid_t start_nid)
> +{
> + unsigned int block_off = NAT_BLOCK_OFFSET(start_nid);
> +
> + if (f2fs_test_bit(block_off, nm_i->nat_bitmap))
> + f2fs_clear_bit(block_off, nm_i->nat_bitmap);
> + else
> + f2fs_set_bit(block_off, nm_i->nat_bitmap);
> +}
> +
> +static inline void fill_node_footer(struct page *page, nid_t nid,
> + nid_t ino, unsigned int ofs, bool reset)
> +{
> + void *kaddr = page_address(page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + if (reset)
> + memset(rn, 0, sizeof(*rn));
> + rn->footer.nid = cpu_to_le32(nid);
> + rn->footer.ino = cpu_to_le32(ino);
> + rn->footer.flag = cpu_to_le32(ofs << OFFSET_BIT_SHIFT);
> +}
> +
> +static inline void copy_node_footer(struct page *dst, struct page *src)
> +{
> + void *src_addr = page_address(src);
> + void *dst_addr = page_address(dst);
> + struct f2fs_node *src_rn = (struct f2fs_node *)src_addr;
> + struct f2fs_node *dst_rn = (struct f2fs_node *)dst_addr;
> + memcpy(&dst_rn->footer, &src_rn->footer, sizeof(struct node_footer));
> +}
> +
> +static inline void fill_node_footer_blkaddr(struct page *page, block_t blkaddr)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
> + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
> + void *kaddr = page_address(page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + rn->footer.cp_ver = ckpt->checkpoint_ver;
> + rn->footer.next_blkaddr = blkaddr;
> +}
> +
> +static inline nid_t ino_of_node(struct page *node_page)
> +{
> + void *kaddr = page_address(node_page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + return le32_to_cpu(rn->footer.ino);
> +}
> +
> +static inline nid_t nid_of_node(struct page *node_page)
> +{
> + void *kaddr = page_address(node_page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + return le32_to_cpu(rn->footer.nid);
> +}
> +
> +static inline unsigned int ofs_of_node(struct page *node_page)
> +{
> + void *kaddr = page_address(node_page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + unsigned flag = le32_to_cpu(rn->footer.flag);
> + return flag >> OFFSET_BIT_SHIFT;
> +}
> +
> +static inline unsigned long long cpver_of_node(struct page *node_page)
> +{
> + void *kaddr = page_address(node_page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + return le64_to_cpu(rn->footer.cp_ver);
> +}
> +
> +static inline block_t next_blkaddr_of_node(struct page *node_page)
> +{
> + void *kaddr = page_address(node_page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + return le32_to_cpu(rn->footer.next_blkaddr);
> +}
> +
> +static inline bool IS_DNODE(struct page *node_page)
> +{
> + unsigned int ofs = ofs_of_node(node_page);
> + if (ofs == 3 || ofs == 4 + NIDS_PER_BLOCK ||
> + ofs == 5 + 2 * NIDS_PER_BLOCK)
> + return false;
> + if (ofs >= 6 + 2 * NIDS_PER_BLOCK) {
> + ofs -= 6 + 2 * NIDS_PER_BLOCK;
> + if ((long int)ofs % (NIDS_PER_BLOCK + 1))
> + return false;
> + }
> + return true;
> +}

IS_DNODE() contains a lot of hardcoded constants, and it is not easy to understand why these particular values are used. I suggest replacing the hardcoded values with named declarations.
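
For example, a minimal sketch of the kind of change I mean. The constant names, the NIDS_PER_BLOCK value, and the meaning I attach to each offset are my own assumptions and are not taken from the patch; the logic itself is kept exactly as quoted above:

```c
#include <assert.h>	/* only needed when trying the sketch out */
#include <stdbool.h>

/* Assumed value for illustration; the real one comes from f2fs_fs.h. */
#define NIDS_PER_BLOCK		1018

/* Hypothetical names; the offsets follow my reading of the node tree. */
#define NODE_OFS_IND1		3				/* 1st indirect node */
#define NODE_OFS_IND2		(4 + NIDS_PER_BLOCK)		/* 2nd indirect node */
#define NODE_OFS_DIND		(5 + 2 * NIDS_PER_BLOCK)	/* double indirect node */
#define NODE_OFS_DIND_FIRST	(6 + 2 * NIDS_PER_BLOCK)	/* first node below it */
#define NODE_DIND_STRIDE	(NIDS_PER_BLOCK + 1)		/* indirect + its children */

/* Same checks as the quoted IS_DNODE(), taking the offset directly. */
static inline bool is_dnode_ofs(unsigned int ofs)
{
	if (ofs == NODE_OFS_IND1 || ofs == NODE_OFS_IND2 ||
			ofs == NODE_OFS_DIND)
		return false;
	if (ofs >= NODE_OFS_DIND_FIRST) {
		ofs -= NODE_OFS_DIND_FIRST;
		/* condition kept exactly as in the patch */
		if ((long int)ofs % NODE_DIND_STRIDE)
			return false;
	}
	return true;
}
```

With names like these, a reader can at least see which node of the tree every comparison refers to.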

> +
> +static inline void set_nid(struct page *p, int off, nid_t nid, bool i)
> +{
> + struct f2fs_node *rn = (struct f2fs_node *)page_address(p);
> +
> + wait_on_page_writeback(p);
> +
> + if (i)
> + rn->i.i_nid[off - NODE_DIR1_BLOCK] = cpu_to_le32(nid);
> + else
> + rn->in.nid[off] = cpu_to_le32(nid);
> + set_page_dirty(p);
> +}
> +
> +static inline nid_t get_nid(struct page *p, int off, bool i)
> +{
> + struct f2fs_node *rn = (struct f2fs_node *)page_address(p);
> + if (i)
> + return le32_to_cpu(rn->i.i_nid[off - NODE_DIR1_BLOCK]);
> + return le32_to_cpu(rn->in.nid[off]);
> +}
> +
> +/**
> + * Coldness identification:
> + * - Mark cold files in f2fs_inode_info
> + * - Mark cold node blocks in their node footer
> + * - Mark cold data pages in page cache
> + */
> +static inline int is_cold_file(struct inode *inode)
> +{
> + return F2FS_I(inode)->i_advise & FADVISE_COLD_BIT;
> +}
> +
> +static inline int is_cold_data(struct page *page)
> +{
> + return PageChecked(page);
> +}
> +
> +static inline void set_cold_data(struct page *page)
> +{
> + SetPageChecked(page);
> +}
> +
> +static inline void clear_cold_data(struct page *page)
> +{
> + ClearPageChecked(page);
> +}
> +
> +static inline int is_cold_node(struct page *page)
> +{
> + void *kaddr = page_address(page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + unsigned int flag = le32_to_cpu(rn->footer.flag);
> + return flag & (0x1 << COLD_BIT_SHIFT);
> +}
> +
> +static inline unsigned char is_fsync_dnode(struct page *page)
> +{
> + void *kaddr = page_address(page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + unsigned int flag = le32_to_cpu(rn->footer.flag);
> + return flag & (0x1 << FSYNC_BIT_SHIFT);
> +}
> +
> +static inline unsigned char is_dent_dnode(struct page *page)
> +{
> + void *kaddr = page_address(page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + unsigned int flag = le32_to_cpu(rn->footer.flag);
> + return flag & (0x1 << DENT_BIT_SHIFT);
> +}
> +
> +static inline void set_cold_node(struct inode *inode, struct page *page)
> +{
> + struct f2fs_node *rn = (struct f2fs_node *)page_address(page);
> + unsigned int flag = le32_to_cpu(rn->footer.flag);
> +
> + if (S_ISDIR(inode->i_mode))
> + flag &= ~(0x1 << COLD_BIT_SHIFT);
> + else
> + flag |= (0x1 << COLD_BIT_SHIFT);
> + rn->footer.flag = cpu_to_le32(flag);
> +}
> +
> +static inline void set_fsync_mark(struct page *page, int mark)
> +{
> + void *kaddr = page_address(page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + unsigned int flag = le32_to_cpu(rn->footer.flag);
> + if (mark)
> + flag |= (0x1 << FSYNC_BIT_SHIFT);
> + else
> + flag &= ~(0x1 << FSYNC_BIT_SHIFT);
> + rn->footer.flag = cpu_to_le32(flag);
> +}
> +
> +static inline void set_dentry_mark(struct page *page, int mark)
> +{
> + void *kaddr = page_address(page);
> + struct f2fs_node *rn = (struct f2fs_node *)kaddr;
> + unsigned int flag = le32_to_cpu(rn->footer.flag);
> + if (mark)
> + flag |= (0x1 << DENT_BIT_SHIFT);
> + else
> + flag &= ~(0x1 << DENT_BIT_SHIFT);
> + rn->footer.flag = cpu_to_le32(flag);
> +}
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> new file mode 100644
> index 0000000..cd6268e
> --- /dev/null
> +++ b/fs/f2fs/segment.h
> @@ -0,0 +1,594 @@
> +/**
> + * fs/f2fs/segment.h
> + *
> + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com/
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +/* constant macro */
> +#define NULL_SEGNO ((unsigned int)(~0))
> +#define SUM_TYPE_NODE (1)
> +#define SUM_TYPE_DATA (0)
> +
> +/* V: Logical segment # in volume, R: Relative segment # in main area */
> +#define GET_L2R_SEGNO(free_i, segno) (segno - free_i->start_segno)
> +#define GET_R2L_SEGNO(free_i, segno) (segno + free_i->start_segno)
> +
> +#define IS_DATASEG(t) \
> + ((t == CURSEG_HOT_DATA) || (t == CURSEG_COLD_DATA) || \
> + (t == CURSEG_WARM_DATA))
> +
> +#define IS_NODESEG(t) \
> + ((t == CURSEG_HOT_NODE) || (t == CURSEG_COLD_NODE) || \
> + (t == CURSEG_WARM_NODE))
> +
> +#define IS_CURSEG(sbi, segno) \
> + ((segno == CURSEG_I(sbi, CURSEG_HOT_DATA)->segno) || \
> + (segno == CURSEG_I(sbi, CURSEG_WARM_DATA)->segno) || \
> + (segno == CURSEG_I(sbi, CURSEG_COLD_DATA)->segno) || \
> + (segno == CURSEG_I(sbi, CURSEG_HOT_NODE)->segno) || \
> + (segno == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno) || \
> + (segno == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno))
> +
> +#define IS_CURSEC(sbi, secno) \
> + ((secno == CURSEG_I(sbi, CURSEG_HOT_DATA)->segno / \
> + sbi->segs_per_sec) || \
> + (secno == CURSEG_I(sbi, CURSEG_WARM_DATA)->segno / \
> + sbi->segs_per_sec) || \
> + (secno == CURSEG_I(sbi, CURSEG_COLD_DATA)->segno / \
> + sbi->segs_per_sec) || \
> + (secno == CURSEG_I(sbi, CURSEG_HOT_NODE)->segno / \
> + sbi->segs_per_sec) || \
> + (secno == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno / \
> + sbi->segs_per_sec) || \
> + (secno == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno / \
> + sbi->segs_per_sec)) \
> +
> +#define START_BLOCK(sbi, segno) \
> + (SM_I(sbi)->seg0_blkaddr + \
> + (GET_R2L_SEGNO(FREE_I(sbi), segno) << sbi->log_blocks_per_seg))
> +#define NEXT_FREE_BLKADDR(sbi, curseg) \
> + (START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff)
> +
> +#define MAIN_BASE_BLOCK(sbi) (SM_I(sbi)->main_blkaddr)
> +
> +#define GET_SEGOFF_FROM_SEG0(sbi, blk_addr) \
> + ((blk_addr) - SM_I(sbi)->seg0_blkaddr)
> +#define GET_SEGNO_FROM_SEG0(sbi, blk_addr) \
> + (GET_SEGOFF_FROM_SEG0(sbi, blk_addr) >> sbi->log_blocks_per_seg)
> +#define GET_SEGNO(sbi, blk_addr) \
> + (((blk_addr == NULL_ADDR) || (blk_addr == NEW_ADDR)) ? \
> + NULL_SEGNO : GET_L2R_SEGNO(FREE_I(sbi), \
> + GET_SEGNO_FROM_SEG0(sbi, blk_addr)))
> +#define GET_SECNO(sbi, segno) \
> + ((segno) / sbi->segs_per_sec)
> +#define GET_ZONENO_FROM_SEGNO(sbi, segno) \
> + ((segno / sbi->segs_per_sec) / sbi->secs_per_zone)
> +
> +#define GET_SUM_BLOCK(sbi, segno) \
> + ((sbi->sm_info->ssa_blkaddr) + segno)
> +
> +#define GET_SUM_TYPE(footer) ((footer)->entry_type)
> +#define SET_SUM_TYPE(footer, type) ((footer)->entry_type = type)
> +
> +#define SIT_ENTRY_OFFSET(sit_i, segno) \
> + (segno % sit_i->sents_per_block)
> +#define SIT_BLOCK_OFFSET(sit_i, segno) \
> + (segno / SIT_ENTRY_PER_BLOCK)
> +#define START_SEGNO(sit_i, segno) \
> + (SIT_BLOCK_OFFSET(sit_i, segno) * SIT_ENTRY_PER_BLOCK)
> +#define f2fs_bitmap_size(nr) \
> + (BITS_TO_LONGS(nr) * sizeof(unsigned long))
> +#define TOTAL_SEGS(sbi) (SM_I(sbi)->main_segment_count)
> +
> +enum {
> + LFS = 0,
> + SSR
> +};

I can't understand what LFS and SSR mean. I think these declarations need comments.
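
If my guess is right (please correct me), LFS is the usual log-structured allocation and SSR is a "slack space recycling" mode that reuses invalid blocks. In that case something like the following would be enough. The comments and the helper are my assumptions, not taken from the patch:

```c
#include <assert.h>

/* Allocation type of a current segment (my guess at the meaning) */
enum {
	LFS = 0,	/* log-structured: append into a completely free segment */
	SSR		/* slack space recycling: fill invalid blocks of a dirty segment */
};

/* Hypothetical helper for debug output; not part of the patch. */
static inline const char *alloc_type_name(int alloc_type)
{
	assert(alloc_type == LFS || alloc_type == SSR);
	return alloc_type == LFS ? "LFS" : "SSR";
}
```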

> +
> +enum {
> + ALLOC_RIGHT = 0,
> + ALLOC_LEFT
> +};
> +
> +#define SET_SSR_TYPE(type) (((type) + 1) << 16)
> +#define GET_SSR_TYPE(type) (((type) >> 16) - 1)
> +#define IS_SSR_TYPE(type) ((type) >= (0x1 << 16))

I can't understand what these calculations with hardcoded constants mean.
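
As far as I can tell from the arithmetic alone, the segment type is stored above the low 16 bits, and the "+ 1" keeps type 0 distinguishable from a value that carries no SSR request. A sketch with named constants follows; SSR_TYPE_SHIFT, SSR_TYPE_BIAS and the round-trip helper are hypothetical names of mine:

```c
#include <assert.h>

/* Hypothetical names for the two hardcoded values */
#define SSR_TYPE_SHIFT	16	/* the SSR request lives above the low 16 bits */
#define SSR_TYPE_BIAS	1	/* keeps type 0 distinguishable from "no SSR" */

#define SET_SSR_TYPE(type)	(((type) + SSR_TYPE_BIAS) << SSR_TYPE_SHIFT)
#define GET_SSR_TYPE(type)	(((type) >> SSR_TYPE_SHIFT) - SSR_TYPE_BIAS)
#define IS_SSR_TYPE(type)	((type) >= (0x1 << SSR_TYPE_SHIFT))

/* Encode a segment type, check that it is seen as SSR, decode it again. */
static inline int ssr_type_roundtrip(int type)
{
	int encoded = SET_SSR_TYPE(type);

	assert(IS_SSR_TYPE(encoded));
	return GET_SSR_TYPE(encoded);
}
```

Without the bias, SET_SSR_TYPE(0) would be 0 and IS_SSR_TYPE() could not recognize it.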

> +#define IS_NEXT_SEG(sbi, curseg, type) \
> + (DIRTY_I(sbi)->v_ops->get_victim(sbi, &(curseg)->next_segno, \
> + BG_GC, SET_SSR_TYPE(type)))
> +/**
> + * The MSB 6 bits of f2fs_sit_entry->vblocks has segment type,
> + * and LSB 10 bits has valid blocks.
> + */
> +#define VBLOCKS_MASK ((1 << 10) - 1)
> +
> +#define GET_SIT_VBLOCKS(raw_sit) \
> + (le16_to_cpu((raw_sit)->vblocks) & VBLOCKS_MASK)
> +#define GET_SIT_TYPE(raw_sit) \
> + ((le16_to_cpu((raw_sit)->vblocks) & ~VBLOCKS_MASK) >> 10)
> +
> +struct bio_private {
> + struct f2fs_sb_info *sbi;
> + bool is_sync;
> + void *wait;
> +};
> +
> +enum {
> + GC_CB = 0,
> + GC_GREEDY
> +};

I guess that GC_GREEDY means the greedy GC policy. But what does GC_CB mean?
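
If CB stands for "cost-benefit", as in the classic log-structured file system cleaner, then comments like the ones below would answer the question. The cost function is only my illustration of the textbook formula (1 - u) * age / (1 + u); I don't claim that gc.c computes it this way, and victim_cost() is a hypothetical name:

```c
#include <assert.h>
#include <limits.h>

enum {
	GC_CB = 0,	/* cost-benefit: weigh utilization against age (my guess) */
	GC_GREEDY	/* greedy: pick the segment with the fewest valid blocks */
};

/*
 * Hypothetical helper: a lower return value means a better victim.
 * valid and total are block counts, age is on a 0..100 scale (assumption).
 */
static inline unsigned int victim_cost(int gc_mode, unsigned int valid,
				unsigned int total, unsigned int age)
{
	unsigned int u;

	assert(total > 0 && valid <= total && age <= 100);
	if (gc_mode == GC_GREEDY)
		return valid;

	u = valid * 100 / total;	/* utilization in percent */
	/* the benefit grows with age and with free space, so invert it */
	return UINT_MAX - (100 * (100 - u) * age) / (100 + u);
}
```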

> +
> +struct victim_sel_policy {
> + int alloc_mode;
> + int gc_mode;
> + int type;
> + unsigned long *dirty_segmap;
> + unsigned int offset;
> + unsigned int ofs_unit;
> + unsigned int min_cost;
> + unsigned int min_segno;
> +};

This structure needs comments, in my opinion.

> +
> +struct seg_entry {
> + unsigned short valid_blocks;
> + unsigned char *cur_valid_map;
> + unsigned short ckpt_valid_blocks;
> + unsigned char *ckpt_valid_map;
> + unsigned char type;
> + unsigned long long mtime;
> +};

This structure needs comments, in my opinion.
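
For instance, something along these lines for struct seg_entry. The comments are my reading of how seg_info_from_raw_sit() and seg_info_to_raw_sit() use the fields, so they may be wrong; the helper at the end is hypothetical and only shows how the two counters relate:

```c
#include <assert.h>
#include <stdbool.h>

/* In-memory state of one segment (comments are my reading of the code) */
struct seg_entry {
	unsigned short valid_blocks;	/* # of valid blocks right now */
	unsigned char *cur_valid_map;	/* current per-block validity bitmap */
	unsigned short ckpt_valid_blocks;	/* # of valid blocks as last written to SIT */
	unsigned char *ckpt_valid_map;	/* validity bitmap as last written to SIT */
	unsigned char type;		/* segment type, like CURSEG_XXX */
	unsigned long long mtime;	/* modification time of the segment */
};

/* Hypothetical helper: has the block count changed since the last SIT write? */
static inline bool seg_entry_count_changed(const struct seg_entry *se)
{
	assert(se);
	return se->valid_blocks != se->ckpt_valid_blocks;
}
```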

> +
> +struct sec_entry {
> + unsigned int valid_blocks;
> +};
> +
> +struct segment_allocation {
> + void (*allocate_segment)(struct f2fs_sb_info *, int, bool);
> +};

This structure needs comments, in my opinion.

> +
> +struct sit_info {
> + const struct segment_allocation *s_ops;
> +
> + block_t sit_base_addr;
> + block_t sit_blocks;
> + block_t written_valid_blocks; /* total number of valid blocks
> + in main area */
> + char *sit_bitmap; /* SIT bitmap pointer */
> + unsigned int bitmap_size;
> +
> + unsigned int dirty_sentries; /* # of dirty sentries */
> + unsigned long *dirty_sentries_bitmap; /* bitmap for dirty sentries */
> + unsigned int sents_per_block; /* number of SIT entries
> + per SIT block */
> + struct mutex sentry_lock; /* to protect SIT entries */
> + struct seg_entry *sentries;
> + struct sec_entry *sec_entries;
> +
> + unsigned long long elapsed_time;
> + unsigned long long mounted_time;
> + unsigned long long min_mtime;
> + unsigned long long max_mtime;
> +};

This structure needs comments, in my opinion.

> +
> +struct free_segmap_info {
> + unsigned int start_segno;
> + unsigned int free_segments;
> + unsigned int free_sections;
> + rwlock_t segmap_lock; /* free segmap lock */
> + unsigned long *free_segmap;
> + unsigned long *free_secmap;
> +};

This structure needs comments, in my opinion.

> +
> +/* Notice: The order of dirty type is same with CURSEG_XXX in f2fs.h */
> +enum dirty_type {
> + DIRTY_HOT_DATA, /* a few valid blocks in a data segment */
> + DIRTY_WARM_DATA,
> + DIRTY_COLD_DATA,
> + DIRTY_HOT_NODE, /* a few valid blocks in a node segment */
> + DIRTY_WARM_NODE,
> + DIRTY_COLD_NODE,
> + DIRTY,
> + PRE, /* no valid blocks in a segment */
> + NR_DIRTY_TYPE
> +};
> +
> +enum {
> + BG_GC,
> + FG_GC
> +};

What do these declarations mean? They need comments, in my opinion.

> +
> +struct dirty_seglist_info {
> + const struct victim_selection *v_ops;
> + struct mutex seglist_lock;
> + unsigned long *dirty_segmap[NR_DIRTY_TYPE];
> + int nr_dirty[NR_DIRTY_TYPE];
> + unsigned long *victim_segmap[2]; /* BG_GC, FG_GC */
> +};

This structure needs comments, in my opinion.

> +
> +struct victim_selection {
> + int (*get_victim)(struct f2fs_sb_info *, unsigned int *, int, int);
> +};

This structure needs comments, in my opinion.

> +
> +struct curseg_info {
> + struct mutex curseg_mutex;
> + struct f2fs_summary_block *sum_blk;
> + unsigned char alloc_type;
> + unsigned int segno;
> + unsigned short next_blkoff;
> + unsigned int zone;
> + unsigned int next_segno;
> +};

This structure needs comments, in my opinion.

> +
> +/**
> + * inline functions
> + */
> +static inline struct curseg_info *CURSEG_I(struct f2fs_sb_info *sbi, int type)
> +{
> + return (struct curseg_info *)(SM_I(sbi)->curseg_array + type);
> +}
> +
> +static inline struct seg_entry *get_seg_entry(struct f2fs_sb_info *sbi,
> + unsigned int segno)
> +{
> + struct sit_info *sit_i = SIT_I(sbi);
> + return &sit_i->sentries[segno];
> +}
> +
> +static inline struct sec_entry *get_sec_entry(struct f2fs_sb_info *sbi,
> + unsigned int segno)
> +{
> + struct sit_info *sit_i = SIT_I(sbi);
> + return &sit_i->sec_entries[GET_SECNO(sbi, segno)];
> +}
> +
> +static inline unsigned int get_valid_blocks(struct f2fs_sb_info *sbi,
> + unsigned int segno, int section)
> +{
> + if (section > 1)
> + return get_sec_entry(sbi, segno)->valid_blocks;
> + else
> + return get_seg_entry(sbi, segno)->valid_blocks;
> +}

What is the difference between the two branches in get_valid_blocks()?
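
My reading, which a comment in the code should confirm or correct: the third argument looks like a segments-per-section value, so "section > 1" returns the count for the whole section and anything else returns the count for the single segment. A toy model of that reading follows; the toy_ names and the sizes are assumptions of mine:

```c
#include <assert.h>

/* Toy stand-ins for sit_info; the sizes and names are assumptions. */
#define SEGS_PER_SEC	4
#define NR_SEGS		8

struct toy_sit {
	unsigned short seg_valid[NR_SEGS];			/* per-segment counts */
	unsigned int sec_valid[NR_SEGS / SEGS_PER_SEC];	/* per-section counts */
};

/* Mirrors the branch structure of the quoted get_valid_blocks(). */
static inline unsigned int toy_get_valid_blocks(const struct toy_sit *sit,
				unsigned int segno, int section)
{
	assert(segno < NR_SEGS);
	if (section > 1)
		return sit->sec_valid[segno / SEGS_PER_SEC];
	return sit->seg_valid[segno];
}
```

If this is the intent, a more descriptive parameter name than "section" would also help.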

> +
> +static inline void seg_info_from_raw_sit(struct seg_entry *se,
> + struct f2fs_sit_entry *rs)
> +{
> + se->valid_blocks = GET_SIT_VBLOCKS(rs);
> + se->ckpt_valid_blocks = GET_SIT_VBLOCKS(rs);
> + memcpy(se->cur_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
> + memcpy(se->ckpt_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
> + se->type = GET_SIT_TYPE(rs);
> + se->mtime = le64_to_cpu(rs->mtime);
> +}
> +
> +static inline void seg_info_to_raw_sit(struct seg_entry *se,
> + struct f2fs_sit_entry *rs)
> +{
> + unsigned short raw_vblocks = (se->type << 10) | se->valid_blocks;

What does the hardcoded value 10 mean here?
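
I suppose it is the same 10 as in VBLOCKS_MASK and GET_SIT_TYPE(), i.e. the low 10 bits hold the valid block count and the upper 6 bits hold the segment type. If so, one named shift could be shared by all three places. A sketch follows; SIT_VBLOCKS_SHIFT and the helper names are hypothetical, and the endian conversion is left out for brevity:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the hardcoded 10 */
#define SIT_VBLOCKS_SHIFT	10
#define VBLOCKS_MASK		((1 << SIT_VBLOCKS_SHIFT) - 1)

/* Pack: upper 6 bits hold the segment type, lower 10 bits the valid blocks. */
static inline uint16_t pack_vblocks(unsigned char type,
				unsigned short valid_blocks)
{
	assert(type < (1 << 6) && valid_blocks <= VBLOCKS_MASK);
	return (uint16_t)((type << SIT_VBLOCKS_SHIFT) | valid_blocks);
}

/* Unpack the valid block count from the low bits. */
static inline unsigned int unpack_valid_blocks(uint16_t raw)
{
	return raw & VBLOCKS_MASK;
}

/* Unpack the segment type from the high bits. */
static inline unsigned int unpack_type(uint16_t raw)
{
	return (raw & ~VBLOCKS_MASK) >> SIT_VBLOCKS_SHIFT;
}
```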

> + rs->vblocks = cpu_to_le16(raw_vblocks);
> + memcpy(rs->valid_map, se->cur_valid_map, SIT_VBLOCK_MAP_SIZE);
> + memcpy(se->ckpt_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
> + se->ckpt_valid_blocks = se->valid_blocks;
> + rs->mtime = cpu_to_le64(se->mtime);
> +}
> +
> +static inline unsigned int find_next_inuse(struct free_segmap_info *free_i,
> + unsigned int max, unsigned int segno)
> +{
> + unsigned int ret;
> + read_lock(&free_i->segmap_lock);
> + ret = find_next_bit(free_i->free_segmap, max, segno);
> + read_unlock(&free_i->segmap_lock);
> + return ret;
> +}
> +
> +static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
> +{
> + struct free_segmap_info *free_i = FREE_I(sbi);
> + unsigned int secno = segno / sbi->segs_per_sec;
> + unsigned int start_segno = secno * sbi->segs_per_sec;
> + unsigned int next;
> +
> + write_lock(&free_i->segmap_lock);
> + clear_bit(segno, free_i->free_segmap);
> + free_i->free_segments++;
> +
> + next = find_next_bit(free_i->free_segmap, TOTAL_SEGS(sbi), start_segno);
> + if (next >= start_segno + sbi->segs_per_sec) {
> + clear_bit(secno, free_i->free_secmap);
> + free_i->free_sections++;
> + }
> + write_unlock(&free_i->segmap_lock);
> +}
> +
> +static inline void __set_inuse(struct f2fs_sb_info *sbi,
> + unsigned int segno)
> +{
> + struct free_segmap_info *free_i = FREE_I(sbi);
> + unsigned int secno = segno / sbi->segs_per_sec;
> + set_bit(segno, free_i->free_segmap);
> + free_i->free_segments--;
> + if (!test_and_set_bit(secno, free_i->free_secmap))
> + free_i->free_sections--;
> +}
> +
> +static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
> + unsigned int segno)
> +{
> + struct free_segmap_info *free_i = FREE_I(sbi);
> + unsigned int secno = segno / sbi->segs_per_sec;
> + unsigned int start_segno = secno * sbi->segs_per_sec;
> + unsigned int next;
> +
> + write_lock(&free_i->segmap_lock);
> + if (test_and_clear_bit(segno, free_i->free_segmap)) {
> + free_i->free_segments++;
> +
> + next = find_next_bit(free_i->free_segmap, TOTAL_SEGS(sbi),
> + start_segno);
> + if (next >= start_segno + sbi->segs_per_sec) {
> + if (test_and_clear_bit(secno, free_i->free_secmap))
> + free_i->free_sections++;
> + }
> + }
> + write_unlock(&free_i->segmap_lock);
> +}
> +
> +static inline void __set_test_and_inuse(struct f2fs_sb_info *sbi,
> + unsigned int segno)
> +{
> + struct free_segmap_info *free_i = FREE_I(sbi);
> + unsigned int secno = segno / sbi->segs_per_sec;
> + write_lock(&free_i->segmap_lock);
> + if (!test_and_set_bit(segno, free_i->free_segmap)) {
> + free_i->free_segments--;
> + if (!test_and_set_bit(secno, free_i->free_secmap))
> + free_i->free_sections--;
> + }
> + write_unlock(&free_i->segmap_lock);
> +}
> +
> +static inline void get_sit_bitmap(struct f2fs_sb_info *sbi,
> + void *dst_addr)
> +{
> + struct sit_info *sit_i = SIT_I(sbi);
> + memcpy(dst_addr, sit_i->sit_bitmap, sit_i->bitmap_size);
> +}
> +
> +static inline block_t written_block_count(struct f2fs_sb_info *sbi)
> +{
> + struct sit_info *sit_i = SIT_I(sbi);
> + block_t vblocks;
> +
> + mutex_lock(&sit_i->sentry_lock);
> + vblocks = sit_i->written_valid_blocks;
> + mutex_unlock(&sit_i->sentry_lock);
> +
> + return vblocks;
> +}
> +
> +static inline unsigned int free_segments(struct f2fs_sb_info *sbi)
> +{
> + struct free_segmap_info *free_i = FREE_I(sbi);
> + unsigned int free_segs;
> +
> + read_lock(&free_i->segmap_lock);
> + free_segs = free_i->free_segments;
> + read_unlock(&free_i->segmap_lock);
> +
> + return free_segs;
> +}
> +
> +static inline int reserved_segments(struct f2fs_sb_info *sbi)
> +{
> + struct f2fs_gc_info *gc_i = sbi->gc_info;
> + return gc_i->rsvd_segment_count;
> +}
> +
> +static inline unsigned int free_sections(struct f2fs_sb_info *sbi)
> +{
> + struct free_segmap_info *free_i = FREE_I(sbi);
> + unsigned int free_secs;
> +
> + read_lock(&free_i->segmap_lock);
> + free_secs = free_i->free_sections;
> + read_unlock(&free_i->segmap_lock);
> +
> + return free_secs;
> +}
> +
> +static inline unsigned int prefree_segments(struct f2fs_sb_info *sbi)
> +{
> + return DIRTY_I(sbi)->nr_dirty[PRE];
> +}
> +
> +static inline unsigned int dirty_segments(struct f2fs_sb_info *sbi)
> +{
> + return DIRTY_I(sbi)->nr_dirty[DIRTY_HOT_DATA] +
> + DIRTY_I(sbi)->nr_dirty[DIRTY_WARM_DATA] +
> + DIRTY_I(sbi)->nr_dirty[DIRTY_COLD_DATA] +
> + DIRTY_I(sbi)->nr_dirty[DIRTY_HOT_NODE] +
> + DIRTY_I(sbi)->nr_dirty[DIRTY_WARM_NODE] +
> + DIRTY_I(sbi)->nr_dirty[DIRTY_COLD_NODE];
> +}
> +
> +static inline int overprovision_segments(struct f2fs_sb_info *sbi)
> +{
> + struct f2fs_gc_info *gc_i = sbi->gc_info;
> + return gc_i->overp_segment_count;
> +}
> +
> +static inline int overprovision_sections(struct f2fs_sb_info *sbi)
> +{
> + struct f2fs_gc_info *gc_i = sbi->gc_info;
> + return ((unsigned int) gc_i->overp_segment_count) / sbi->segs_per_sec;
> +}
> +
> +static inline int reserved_sections(struct f2fs_sb_info *sbi)
> +{
> + struct f2fs_gc_info *gc_i = sbi->gc_info;
> + return ((unsigned int) gc_i->rsvd_segment_count) / sbi->segs_per_sec;
> +}
> +
> +static inline bool need_SSR(struct f2fs_sb_info *sbi)
> +{
> + return (free_sections(sbi) < overprovision_sections(sbi));
> +}
> +
> +static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi)
> +{
> + return free_sections(sbi) <= reserved_sections(sbi);
> +}
> +
> +static inline int utilization(struct f2fs_sb_info *sbi)
> +{
> + return (long int)valid_user_blocks(sbi) * 100 /
> + (long int)sbi->user_block_count;
> +}
> +
> +/* Disable In-Place-Update by default */
> +#define MIN_IPU_UTIL 100

What does 100 mean?
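
If I read utilization() correctly, it returns a percentage, so a threshold of 100 can never be exceeded while the valid blocks do not outnumber the user blocks, and that is how in-place update ends up disabled. A comment saying so would help. The reduced model below shows the arithmetic; utilization_pct() and ipu_wanted() are hypothetical names of mine:

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold in percent; 100 is never exceeded, so IPU stays off. */
#define MIN_IPU_UTIL	100

/* Same arithmetic as the quoted utilization(): an integer percentage. */
static inline int utilization_pct(long valid_user_blocks, long user_block_count)
{
	assert(user_block_count > 0);
	return (int)(valid_user_blocks * 100 / user_block_count);
}

/* Hypothetical reduction of need_inplace_update() to its two inputs. */
static inline bool ipu_wanted(bool need_ssr, int util_pct)
{
	return need_ssr && util_pct > MIN_IPU_UTIL;
}
```

Lowering MIN_IPU_UTIL to, say, 70 would enable in-place updates once the volume is more than 70% full and SSR is needed.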

> +static inline bool need_inplace_update(struct inode *inode)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
> + if (S_ISDIR(inode->i_mode))
> + return false;
> + if (need_SSR(sbi) && utilization(sbi) > MIN_IPU_UTIL)
> + return true;
> + return false;
> +}
> +
> +static inline unsigned int curseg_segno(struct f2fs_sb_info *sbi,
> + int type)
> +{
> + struct curseg_info *curseg = CURSEG_I(sbi, type);
> + return curseg->segno;
> +}
> +
> +static inline unsigned char curseg_alloc_type(struct f2fs_sb_info *sbi,
> + int type)
> +{
> + struct curseg_info *curseg = CURSEG_I(sbi, type);
> + return curseg->alloc_type;
> +}
> +
> +static inline unsigned short curseg_blkoff(struct f2fs_sb_info *sbi, int type)
> +{
> + struct curseg_info *curseg = CURSEG_I(sbi, type);
> + return curseg->next_blkoff;
> +}
> +
> +static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno)
> +{
> + unsigned int end_segno = SM_I(sbi)->segment_count - 1;
> + BUG_ON(segno > end_segno);
> +}
> +
> +/*
> + * This function is used for only debugging.
> + * NOTE: In future, we have to remove this function.
> + */
> +static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
> +{
> + struct f2fs_sm_info *sm_info = SM_I(sbi);
> + block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
> + block_t start_addr = sm_info->seg0_blkaddr;
> + block_t end_addr = start_addr + total_blks - 1;
> + BUG_ON(blk_addr < start_addr);
> + BUG_ON(blk_addr > end_addr);
> +}
> +
> +/**
> + * Summary block is always treated as invalid block
> + */
> +static inline void check_block_count(struct f2fs_sb_info *sbi,
> + int segno, struct f2fs_sit_entry *raw_sit)
> +{
> + struct f2fs_sm_info *sm_info = SM_I(sbi);
> + unsigned int end_segno = sm_info->segment_count - 1;
> + int valid_blocks = 0;
> + int i;
> +
> + /* check segment usage */
> + BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg);
> +
> + /* check boundary of a given segment number */
> + BUG_ON(segno > end_segno);
> +
> + /* check bitmap with valid block count */
> + for (i = 0; i < sbi->blocks_per_seg; i++)
> + if (f2fs_test_bit(i, raw_sit->valid_map))
> + valid_blocks++;
> + BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks);
> +}
> +
> +static inline pgoff_t current_sit_addr(struct f2fs_sb_info *sbi,
> + unsigned int start)
> +{
> + struct sit_info *sit_i = SIT_I(sbi);
> + unsigned int offset = SIT_BLOCK_OFFSET(sit_i, start);
> + block_t blk_addr = sit_i->sit_base_addr + offset;
> +
> + check_seg_range(sbi, start);
> +
> + /* calculate sit block address */
> + if (f2fs_test_bit(offset, sit_i->sit_bitmap))
> + blk_addr += sit_i->sit_blocks;
> +
> + return blk_addr;
> +}
> +
> +static inline pgoff_t next_sit_addr(struct f2fs_sb_info *sbi,
> + pgoff_t block_addr)
> +{
> + struct sit_info *sit_i = SIT_I(sbi);
> + block_addr -= sit_i->sit_base_addr;
> + if (block_addr < sit_i->sit_blocks)
> + block_addr += sit_i->sit_blocks;
> + else
> + block_addr -= sit_i->sit_blocks;
> +
> + return block_addr + sit_i->sit_base_addr;
> +}
> +
> +static inline void set_to_next_sit(struct sit_info *sit_i, unsigned int start)
> +{
> + unsigned int block_off = SIT_BLOCK_OFFSET(sit_i, start);
> +
> + if (f2fs_test_bit(block_off, sit_i->sit_bitmap))
> + f2fs_clear_bit(block_off, sit_i->sit_bitmap);
> + else
> + f2fs_set_bit(block_off, sit_i->sit_bitmap);
> +}
> +
> +static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi)
> +{
> + struct sit_info *sit_i = SIT_I(sbi);
> + return sit_i->elapsed_time + CURRENT_TIME_SEC.tv_sec -
> + sit_i->mounted_time;
> +}
> +
> +static inline void set_summary(struct f2fs_summary *sum, nid_t nid,
> + unsigned int ofs_in_node, unsigned char version)
> +{
> + sum->nid = cpu_to_le32(nid);
> + sum->ofs_in_node = cpu_to_le16(ofs_in_node);
> + sum->version = version;
> +}
> +
> +static inline block_t start_sum_block(struct f2fs_sb_info *sbi)
> +{
> + return __start_cp_addr(sbi) +
> + le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_start_sum);
> +}
> +
> +static inline block_t sum_blk_addr(struct f2fs_sb_info *sbi, int base, int type)
> +{
> + return __start_cp_addr(sbi) +
> + le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_total_block_count)
> + - (base + 1) + type;
> +}
> --
> 1.7.9.5
>
>
>
>
> ---
> Jaegeuk Kim
> Samsung

With the best regards,
Vyacheslav Dubeyko.

2012-10-28 11:09:05

by Viacheslav Dubeyko

Subject: Re: [PATCH 04/16 v2] f2fs: add super block operations


On Oct 23, 2012, at 6:27 AM, Jaegeuk Kim wrote:

> This adds the implementation of superblock operations for f2fs, which includes
> - init_f2fs_fs/exit_f2fs_fs
> - f2fs_mount
> - super_operations of f2fs
>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/super.c | 590 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 590 insertions(+)
> create mode 100644 fs/f2fs/super.c
>
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> new file mode 100644
> index 0000000..8e608a0
> --- /dev/null
> +++ b/fs/f2fs/super.c
> @@ -0,0 +1,590 @@
> +/**
> + * fs/f2fs/super.c
> + *
> + * Copyright (c) 2012 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com/
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/fs.h>
> +#include <linux/statfs.h>
> +#include <linux/proc_fs.h>
> +#include <linux/buffer_head.h>
> +#include <linux/backing-dev.h>
> +#include <linux/kthread.h>
> +#include <linux/parser.h>
> +#include <linux/mount.h>
> +#include <linux/seq_file.h>
> +#include <linux/f2fs_fs.h>
> +
> +#include "f2fs.h"
> +#include "node.h"
> +#include "xattr.h"
> +
> +static struct kmem_cache *f2fs_inode_cachep;
> +static struct proc_dir_entry *f2fs_proc_root;
> +
> +enum {
> + Opt_gc_background_off,
> + Opt_disable_roll_forward,
> + Opt_discard,
> + Opt_noheap,
> + Opt_nouser_xattr,
> + Opt_noacl,
> + Opt_active_logs,
> + Opt_disable_ext_identify,
> + Opt_err,
> +};
> +
> +static match_table_t f2fs_tokens = {
> + {Opt_gc_background_off, "background_gc_off"},
> + {Opt_disable_roll_forward, "disable_roll_forward"},
> + {Opt_discard, "discard"},
> + {Opt_noheap, "no_heap"},
> + {Opt_nouser_xattr, "nouser_xattr"},
> + {Opt_noacl, "noacl"},
> + {Opt_active_logs, "active_logs=%u"},
> + {Opt_disable_ext_identify, "disable_ext_identify"},
> + {Opt_err, NULL},
> +};
> +
> +static void init_once(void *foo)
> +{
> + struct f2fs_inode_info *fi = (struct f2fs_inode_info *) foo;
> +
> + memset(fi, 0, sizeof(*fi));
> + inode_init_once(&fi->vfs_inode);
> +}
> +
> +static struct inode *f2fs_alloc_inode(struct super_block *sb)
> +{
> + struct f2fs_inode_info *fi;
> +
> + fi = kmem_cache_alloc(f2fs_inode_cachep, GFP_NOFS | __GFP_ZERO);
> + if (!fi)
> + return NULL;
> +
> + init_once((void *) fi);
> +
> + /* Initilize f2fs-specific inode info */
> + fi->vfs_inode.i_version = 1;
> + atomic_set(&fi->dirty_dents, 0);
> + fi->current_depth = 1;
> + fi->i_advise = 0;
> + rwlock_init(&fi->ext.ext_lock);
> +
> + set_inode_flag(fi, FI_NEW_INODE);
> +
> + return &fi->vfs_inode;
> +}
> +
> +static void f2fs_i_callback(struct rcu_head *head)
> +{
> + struct inode *inode = container_of(head, struct inode, i_rcu);
> + kmem_cache_free(f2fs_inode_cachep, F2FS_I(inode));
> +}
> +
> +void f2fs_destroy_inode(struct inode *inode)
> +{
> + call_rcu(&inode->i_rcu, f2fs_i_callback);
> +}
> +
> +static void f2fs_put_super(struct super_block *sb)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(sb);
> +
> +#ifdef CONFIG_F2FS_STAT_FS
> + if (sbi->s_proc) {
> + f2fs_stat_exit(sbi);
> + remove_proc_entry(sb->s_id, f2fs_proc_root);
> + }
> +#endif
> + stop_gc_thread(sbi);
> +
> + write_checkpoint(sbi, false, true);
> +
> + iput(sbi->node_inode);
> + iput(sbi->meta_inode);
> +
> + /* destroy f2fs internal modules */
> + destroy_gc_manager(sbi);
> + destroy_node_manager(sbi);
> + destroy_segment_manager(sbi);
> +
> + kfree(sbi->ckpt);
> +
> + sb->s_fs_info = NULL;
> + brelse(sbi->raw_super_buf);
> + kfree(sbi);
> +}
> +
> +int f2fs_sync_fs(struct super_block *sb, int sync)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(sb);
> + int ret = 0;
> +
> + if (!sbi->s_dirty && !get_pages(sbi, F2FS_DIRTY_NODES))
> + return 0;
> +
> + if (sync)
> + write_checkpoint(sbi, false, false);
> +
> + return ret;
> +}
> +
> +static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
> +{
> + struct super_block *sb = dentry->d_sb;
> + struct f2fs_sb_info *sbi = F2FS_SB(sb);
> + block_t total_count, user_block_count, start_count, ovp_count;
> +
> + total_count = le64_to_cpu(sbi->raw_super->block_count);
> + user_block_count = sbi->user_block_count;
> + start_count = le32_to_cpu(sbi->raw_super->segment0_blkaddr);
> + ovp_count = sbi->gc_info->overp_segment_count
> + << sbi->log_blocks_per_seg;
> + buf->f_type = F2FS_SUPER_MAGIC;
> + buf->f_bsize = sbi->blocksize;
> +
> + buf->f_blocks = total_count - start_count;
> + buf->f_bfree = buf->f_blocks - valid_user_blocks(sbi) - ovp_count;
> + buf->f_bavail = user_block_count - valid_user_blocks(sbi);
> +
> + buf->f_files = valid_inode_count(sbi);
> + buf->f_ffree = sbi->total_node_count - valid_node_count(sbi);
> +
> + buf->f_namelen = F2FS_MAX_NAME_LEN;
> +
> + return 0;
> +}
> +
> +static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(root->d_sb);
> +
> + if (test_opt(sbi, BG_GC))
> + seq_puts(seq, ",background_gc_on");
> + else
> + seq_puts(seq, ",background_gc_off");
> + if (test_opt(sbi, DISABLE_ROLL_FORWARD))
> + seq_puts(seq, ",disable_roll_forward");
> + if (test_opt(sbi, DISCARD))
> + seq_puts(seq, ",discard");
> + if (test_opt(sbi, NOHEAP))
> + seq_puts(seq, ",no_heap_alloc");
> +#ifdef CONFIG_F2FS_FS_XATTR
> + if (test_opt(sbi, XATTR_USER))
> + seq_puts(seq, ",user_xattr");
> + else
> + seq_puts(seq, ",nouser_xattr");
> +#endif
> +#ifdef CONFIG_F2FS_FS_POSIX_ACL
> + if (test_opt(sbi, POSIX_ACL))
> + seq_puts(seq, ",acl");
> + else
> + seq_puts(seq, ",noacl");
> +#endif
> + if (test_opt(sbi, DISABLE_EXT_IDENTIFY))
> + seq_puts(seq, ",disable_ext_indentify");
> +
> + seq_printf(seq, ",active_logs=%u", sbi->active_logs);
> +
> + return 0;
> +}
> +
> +static struct super_operations f2fs_sops = {
> + .alloc_inode = f2fs_alloc_inode,
> + .destroy_inode = f2fs_destroy_inode,
> + .write_inode = f2fs_write_inode,
> + .show_options = f2fs_show_options,
> + .evict_inode = f2fs_evict_inode,
> + .put_super = f2fs_put_super,
> + .sync_fs = f2fs_sync_fs,
> + .statfs = f2fs_statfs,
> +};
> +
> +static int parse_options(struct f2fs_sb_info *sbi, char *options)
> +{
> + substring_t args[MAX_OPT_ARGS];
> + char *p;
> + int arg = 0;
> +
> + if (!options)
> + return 0;
> +
> + while ((p = strsep(&options, ",")) != NULL) {
> + int token;
> + if (!*p)
> + continue;
> + /*
> + * Initialize args struct so we know whether arg was
> + * found; some options take optional arguments.
> + */
> + args[0].to = args[0].from = NULL;
> + token = match_token(p, f2fs_tokens, args);
> +
> + switch (token) {
> + case Opt_gc_background_off:
> + clear_opt(sbi, BG_GC);
> + break;
> + case Opt_disable_roll_forward:
> + set_opt(sbi, DISABLE_ROLL_FORWARD);
> + break;
> + case Opt_discard:
> + set_opt(sbi, DISCARD);
> + break;
> + case Opt_noheap:
> + set_opt(sbi, NOHEAP);
> + break;
> +#ifdef CONFIG_F2FS_FS_XATTR
> + case Opt_nouser_xattr:
> + clear_opt(sbi, XATTR_USER);
> + break;
> +#else
> + case Opt_nouser_xattr:
> + pr_info("nouser_xattr options not supported\n");
> + break;
> +#endif
> +#ifdef CONFIG_F2FS_FS_POSIX_ACL
> + case Opt_noacl:
> + clear_opt(sbi, POSIX_ACL);
> + break;
> +#else
> + case Opt_noacl:
> + pr_info("noacl options not supported\n");
> + break;
> +#endif
> + case Opt_active_logs:
> + if (args->from && match_int(args, &arg))
> + return -EINVAL;
> + if (arg != 2 && arg != 4 && arg != 6)
> + return -EINVAL;
> + sbi->active_logs = arg;
> + break;
> + case Opt_disable_ext_identify:
> + set_opt(sbi, DISABLE_EXT_IDENTIFY);
> + break;
> + default:
> + return -EINVAL;
> + }
> + }
> + return 0;
> +}
> +
> +static loff_t max_file_size(unsigned bits)
> +{
> + loff_t result = ADDRS_PER_INODE;
> + loff_t leaf_count = ADDRS_PER_BLOCK;
> +
> + result += (leaf_count * 2);
> +
> + leaf_count *= NIDS_PER_BLOCK;
> + result += (leaf_count * 2);
> +
> + leaf_count *= NIDS_PER_BLOCK;
> + result += (leaf_count * 2);
> +
> + result <<= bits;
> + return result;
> +}

I think the logic of max_file_size() needs a comment. It is not clear why leaf_count has to be multiplied by 2 at each step.
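To show the kind of comments I mean, here is a rough userspace sketch. It assumes the inode carries two direct, two indirect and one double indirect node pointers; that is only my reading of the layout, so please correct me if it is wrong. If it is right, the last "* 2" in the patch looks suspicious, and the sketch counts the double indirect block once. The constant values are placeholders for illustration, not the real ones from f2fs_fs.h, and max_file_size_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only, not taken from f2fs_fs.h. */
#define ADDRS_PER_INODE 923	/* block addresses held in the inode itself */
#define ADDRS_PER_BLOCK 1018	/* block addresses held in one direct node */
#define NIDS_PER_BLOCK  1018	/* node ids held in one indirect node */

/* Hypothetical commented variant of max_file_size(). */
static int64_t max_file_size_sketch(unsigned bits)
{
	int64_t result = ADDRS_PER_INODE;
	int64_t leaf_count = ADDRS_PER_BLOCK;

	assert(bits < 32);

	/* two direct node blocks (assumed layout) */
	result += leaf_count * 2;

	/* two indirect node blocks (assumed layout) */
	leaf_count *= NIDS_PER_BLOCK;
	result += leaf_count * 2;

	/* one double indirect node block (assumed; the patch counts two) */
	leaf_count *= NIDS_PER_BLOCK;
	result += leaf_count;

	/* convert the number of blocks into bytes */
	result <<= bits;
	return result;
}
```

With comments like these a reader can check each term against the on-disk inode layout.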

> +
> +static int sanity_check_raw_super(struct f2fs_super_block *raw_super)
> +{
> + unsigned int blocksize;
> +
> + if (F2FS_SUPER_MAGIC != le32_to_cpu(raw_super->magic))
> + return 1;
> +
> + /* Currently, support only 4KB block size */
> + blocksize = 1 << le32_to_cpu(raw_super->log_blocksize);
> + if (blocksize != PAGE_CACHE_SIZE)
> + return 1;
> + if (le32_to_cpu(raw_super->log_sectorsize) != 9)

I think it makes sense to introduce a named constant for the hardcoded 9.

> + return 1;
> + if (le32_to_cpu(raw_super->log_sectors_per_block) != 3)

I think it makes sense to introduce a named constant for the hardcoded 3 as well.
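Something along these lines is what I have in mind for both checks. This is only a userspace sketch: the macro names and the cut-down struct are hypothetical, and the le32_to_cpu() conversions are left out, so the fields are assumed to be in host byte order already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values are the ones hardcoded in the patch. */
#define F2FS_LOG_SECTOR_SIZE		9	/* 2^9 = 512-byte sectors */
#define F2FS_LOG_SECTORS_PER_BLOCK	3	/* 2^3 = 8 sectors per 4KB block */

/* Cut-down stand-in for struct f2fs_super_block, host byte order. */
struct raw_super_sketch {
	uint32_t log_sectorsize;
	uint32_t log_sectors_per_block;
};

/* Returns 0 if the geometry is supported and 1 otherwise, as in the patch. */
static int sanity_check_geometry(const struct raw_super_sketch *rs)
{
	assert(rs != NULL);

	if (rs->log_sectorsize != F2FS_LOG_SECTOR_SIZE)
		return 1;
	if (rs->log_sectors_per_block != F2FS_LOG_SECTORS_PER_BLOCK)
		return 1;
	return 0;
}
```

The names then document the supported geometry at the place where it is checked.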

> + return 1;
> + return 0;
> +}
> +
> +static int sanity_check_ckpt(struct f2fs_super_block *raw_super,
> + struct f2fs_checkpoint *ckpt)
> +{
> + unsigned int total, fsmeta;
> +
> + total = le32_to_cpu(raw_super->segment_count);
> + fsmeta = le32_to_cpu(raw_super->segment_count_ckpt);
> + fsmeta += le32_to_cpu(raw_super->segment_count_sit);
> + fsmeta += le32_to_cpu(raw_super->segment_count_nat);
> + fsmeta += le32_to_cpu(ckpt->rsvd_segment_count);
> + fsmeta += le32_to_cpu(raw_super->segment_count_ssa);
> +
> + if (fsmeta >= total)
> + return 1;
> + return 0;
> +}
> +
> +static void init_sb_info(struct f2fs_sb_info *sbi)
> +{
> + struct f2fs_super_block *raw_super = sbi->raw_super;
> + int i;
> +
> + sbi->log_sectorsize = le32_to_cpu(raw_super->log_sectorsize);
> + sbi->log_sectors_per_block =
> + le32_to_cpu(raw_super->log_sectors_per_block);
> + sbi->log_blocksize = le32_to_cpu(raw_super->log_blocksize);
> + sbi->blocksize = 1 << sbi->log_blocksize;
> + sbi->log_blocks_per_seg = le32_to_cpu(raw_super->log_blocks_per_seg);
> + sbi->blocks_per_seg = 1 << sbi->log_blocks_per_seg;
> + sbi->segs_per_sec = le32_to_cpu(raw_super->segs_per_sec);
> + sbi->secs_per_zone = le32_to_cpu(raw_super->secs_per_zone);
> + sbi->total_sections = le32_to_cpu(raw_super->section_count);
> + sbi->total_node_count =
> + (le32_to_cpu(raw_super->segment_count_nat) / 2)
> + * sbi->blocks_per_seg * NAT_ENTRY_PER_BLOCK;
> + sbi->root_ino_num = le32_to_cpu(raw_super->root_ino);
> + sbi->node_ino_num = le32_to_cpu(raw_super->node_ino);
> + sbi->meta_ino_num = le32_to_cpu(raw_super->meta_ino);
> +
> + for (i = 0; i < NR_COUNT_TYPE; i++)
> + atomic_set(&sbi->nr_pages[i], 0);
> +}
> +
> +static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> +{
> + struct f2fs_sb_info *sbi;
> + struct f2fs_super_block *raw_super;
> + struct buffer_head *raw_super_buf;
> + struct inode *root;
> + int i;
> +
> + /* allocate memory for f2fs-specific super block info */
> + sbi = kzalloc(sizeof(struct f2fs_sb_info), GFP_KERNEL);
> + if (!sbi)
> + return -ENOMEM;
> +
> + /* set a temporary block size */
> + if (!sb_set_blocksize(sb, F2FS_BLKSIZE))
> + goto free_sbi;
> +
> + /* read f2fs raw super block */
> + raw_super_buf = sb_bread(sb, F2FS_SUPER_OFFSET);
> + if (!raw_super_buf)
> + goto free_sbi;
> + raw_super = (struct f2fs_super_block *) ((char *)raw_super_buf->b_data);
> +
> + /* init some FS parameters */
> + sbi->active_logs = NR_CURSEG_TYPE;
> +
> + set_opt(sbi, BG_GC);
> +
> +#ifdef CONFIG_F2FS_FS_XATTR
> + set_opt(sbi, XATTR_USER);
> +#endif
> +#ifdef CONFIG_F2FS_FS_POSIX_ACL
> + set_opt(sbi, POSIX_ACL);
> +#endif
> + /* parse mount options */
> + if (parse_options(sbi, (char *)data))
> + goto free_sb_buf;
> +
> + /* sanity checking of raw super */
> + if (sanity_check_raw_super(raw_super))
> + goto free_sb_buf;
> +
> + sb->s_maxbytes = max_file_size(raw_super->log_blocksize);
> + sb->s_max_links = F2FS_LINK_MAX;
> +
> + sb->s_op = &f2fs_sops;
> + sb->s_xattr = f2fs_xattr_handlers;
> + sb->s_magic = F2FS_SUPER_MAGIC;
> + sb->s_fs_info = sbi;
> + sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
> + (test_opt(sbi, POSIX_ACL) ? MS_POSIXACL : 0);
> +
> + /* init f2fs-specific super block info */
> + sbi->sb = sb;
> + sbi->raw_super = raw_super;
> + sbi->raw_super_buf = raw_super_buf;
> + mutex_init(&sbi->gc_mutex);
> + mutex_init(&sbi->write_inode);
> + mutex_init(&sbi->writepages);
> + mutex_init(&sbi->cp_mutex);
> + for (i = 0; i < NR_LOCK_TYPE; i++)
> + mutex_init(&sbi->fs_lock[i]);
> + sbi->por_doing = 0;
> + spin_lock_init(&sbi->stat_lock);
> + init_rwsem(&sbi->bio_sem);
> + init_sb_info(sbi);
> +
> + /* get an inode for meta space */
> + sbi->meta_inode = f2fs_iget(sb, F2FS_META_INO(sbi));
> + if (IS_ERR(sbi->meta_inode))
> + goto free_sb_buf;
> +
> + if (get_valid_checkpoint(sbi))
> + goto free_meta_inode;
> +
> + /* sanity checking of checkpoint */
> + if (sanity_check_ckpt(raw_super, sbi->ckpt))
> + goto free_cp;
> +
> + sbi->total_valid_node_count =
> + le32_to_cpu(sbi->ckpt->valid_node_count);
> + sbi->total_valid_inode_count =
> + le32_to_cpu(sbi->ckpt->valid_inode_count);
> + sbi->user_block_count = le64_to_cpu(sbi->ckpt->user_block_count);
> + sbi->total_valid_block_count =
> + le64_to_cpu(sbi->ckpt->valid_block_count);
> + sbi->last_valid_block_count = sbi->total_valid_block_count;
> + sbi->alloc_valid_block_count = 0;
> + INIT_LIST_HEAD(&sbi->dir_inode_list);
> + spin_lock_init(&sbi->dir_inode_lock);
> +
> + /* init super block */
> + if (!sb_set_blocksize(sb, sbi->blocksize))
> + goto free_cp;
> +
> + init_orphan_info(sbi);
> +
> + /* setup f2fs internal modules */
> + if (build_segment_manager(sbi))
> + goto free_sm;
> + if (build_node_manager(sbi))
> + goto free_nm;
> + if (build_gc_manager(sbi))
> + goto free_gc;
> +
> + /* get an inode for node space */
> + sbi->node_inode = f2fs_iget(sb, F2FS_NODE_INO(sbi));
> + if (IS_ERR(sbi->node_inode))
> + goto free_gc;
> +
> + /* if there are nt orphan nodes free them */
> + if (recover_orphan_inodes(sbi))
> + goto free_node_inode;
> +
> + /* read root inode and dentry */
> + root = f2fs_iget(sb, F2FS_ROOT_INO(sbi));
> + if (IS_ERR(root))
> + goto free_node_inode;
> + if (!S_ISDIR(root->i_mode) || !root->i_blocks || !root->i_size)
> + goto free_root_inode;
> +
> + sb->s_root = d_make_root(root); /* allocate root dentry */
> + if (!sb->s_root)
> + goto free_root_inode;
> +
> + /* recover fsynced data */
> + if (!test_opt(sbi, DISABLE_ROLL_FORWARD))
> + recover_fsync_data(sbi);
> +
> + /* After POR, we can run background GC thread */
> + if (start_gc_thread(sbi))
> + goto fail;
> +
> +#ifdef CONFIG_F2FS_STAT_FS
> + if (f2fs_proc_root) {
> + sbi->s_proc = proc_mkdir(sb->s_id, f2fs_proc_root);
> + if (f2fs_stat_init(sbi))
> + goto fail;
> + }
> +#endif
> + return 0;
> +fail:
> + stop_gc_thread(sbi);
> +free_root_inode:
> + make_bad_inode(root);
> + iput(root);
> +free_node_inode:
> + make_bad_inode(sbi->node_inode);
> + iput(sbi->node_inode);
> +free_gc:
> + destroy_gc_manager(sbi);
> +free_nm:
> + destroy_node_manager(sbi);
> +free_sm:
> + destroy_segment_manager(sbi);
> +free_cp:
> + kfree(sbi->ckpt);
> +free_meta_inode:
> + make_bad_inode(sbi->meta_inode);
> + iput(sbi->meta_inode);
> +free_sb_buf:
> + brelse(raw_super_buf);
> +free_sbi:
> + kfree(sbi);
> + return -EINVAL;

fill_super() can fail for different reasons, but as far as I understand every path through the error labels returns -EINVAL. I think this is not a fully correct way of reporting the failure; the error code should reflect the actual reason.
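A common pattern is to keep the error in a local variable, set it before each goto, and return it from the shared cleanup path. Below is a minimal userspace sketch of that pattern. The fail_step enum and fill_super_sketch() are hypothetical, malloc()/free() only stand in for the real allocations, and the choice of -EIO for a failed superblock read is my assumption, not something the patch defines.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical switch that lets the caller pick which step fails. */
enum fail_step { FAIL_NONE, FAIL_ALLOC, FAIL_READ_SB, FAIL_BAD_SB };

static int fill_super_sketch(enum fail_step fail)
{
	void *sbi;
	void *sb_buf;
	int err;

	/* stands in for kzalloc() of the sb info */
	sbi = (fail == FAIL_ALLOC) ? NULL : malloc(64);
	if (!sbi)
		return -ENOMEM;

	/* stands in for sb_bread() of the raw super block */
	sb_buf = (fail == FAIL_READ_SB) ? NULL : malloc(64);
	if (!sb_buf) {
		err = -EIO;	/* assumed code for a failed read */
		goto free_sbi;
	}

	/* stands in for the sanity check of the on-disk data */
	if (fail == FAIL_BAD_SB) {
		err = -EINVAL;
		goto free_sb_buf;
	}

	/* the sketch owns both buffers, so release them on success too */
	free(sb_buf);
	free(sbi);
	return 0;

free_sb_buf:
	free(sb_buf);
free_sbi:
	free(sbi);
	return err;	/* the specific reason reaches the caller */
}
```

In the real function, the helpers that already return an error code (for example the ones returning ERR_PTR values) could simply hand their code to err.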

> +}
> +
> +static struct dentry *f2fs_mount(struct file_system_type *fs_type, int flags,
> + const char *dev_name, void *data)
> +{
> + return mount_bdev(fs_type, flags, dev_name, data, f2fs_fill_super);
> +}
> +
> +static struct file_system_type f2fs_fs_type = {
> + .owner = THIS_MODULE,
> + .name = "f2fs",
> + .mount = f2fs_mount,
> + .kill_sb = kill_block_super,
> + .fs_flags = FS_REQUIRES_DEV,
> +};
> +
> +static int init_inodecache(void)
> +{
> + f2fs_inode_cachep = f2fs_kmem_cache_create("f2fs_inode_cache",
> + sizeof(struct f2fs_inode_info), NULL);
> + if (f2fs_inode_cachep == NULL)
> + return -ENOMEM;
> + return 0;
> +}
> +
> +static void destroy_inodecache(void)
> +{
> + /*
> + * Make sure all delayed rcu free inodes are flushed before we
> + * destroy cache.
> + */
> + rcu_barrier();
> + kmem_cache_destroy(f2fs_inode_cachep);
> +}
> +
> +static int __init init_f2fs_fs(void)
> +{
> + if (init_inodecache())
> + goto fail;
> + if (create_node_manager_caches())
> + goto fail;
> + if (create_gc_caches())
> + goto fail;
> + if (create_checkpoint_caches())
> + goto fail;
> + if (register_filesystem(&f2fs_fs_type))
> + return -EBUSY;

Why -EBUSY? Usually the error code returned by register_filesystem() is passed back to the caller.
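Roughly, I would expect something like the following. This is a userspace sketch only: the two stub functions are hypothetical stand-ins for the cache setup and for register_filesystem(), and the caches_alive flag just models that the caches get torn down again when registration fails.

```c
#include <errno.h>

/* Hypothetical stubs; each returns the code passed in, 0 or -errno. */
static int stub_create_caches(int rc) { return rc; }
static int stub_register_filesystem(int rc) { return rc; }

/* Models whether the caches currently exist. */
static int caches_alive;

static int init_sketch(int cache_rc, int register_rc)
{
	int err;

	err = stub_create_caches(cache_rc);
	if (err)
		return err;
	caches_alive = 1;

	err = stub_register_filesystem(register_rc);
	if (err)
		goto free_caches;
	return 0;

free_caches:
	caches_alive = 0;	/* undo the cache setup */
	return err;		/* pass the original error code back */
}
```

This way the caller sees whatever register_filesystem() reported instead of a fixed value.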

> +
> + f2fs_proc_root = proc_mkdir("fs/f2fs", NULL);
> + return 0;
> +fail:
> + return -ENOMEM;
> +}
> +
> +static void __exit exit_f2fs_fs(void)
> +{
> + remove_proc_entry("fs/f2fs", NULL);
> + unregister_filesystem(&f2fs_fs_type);
> + destroy_checkpoint_caches();
> + destroy_gc_caches();
> + destroy_node_manager_caches();
> + destroy_inodecache();
> +}
> +
> +module_init(init_f2fs_fs)
> +module_exit(exit_f2fs_fs)
> +
> +MODULE_AUTHOR("Samsung Electronics's Praesto Team");
> +MODULE_DESCRIPTION("Flash Friendly File System");
> +MODULE_LICENSE("GPL");
> --
> 1.7.9.5
>
>
>
>
> ---
> Jaegeuk Kim
> Samsung
>
>
>

With the best regards,
Vyacheslav Dubeyko.